[
  {
    "path": "2024/ACMMM.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ACMMM2024\n\n#### [1] LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field\n- **🧑‍🔬 作者**：Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He\n- **🏫 单位**：Zhejiang University HangZhou, China ⟐ Hefei University of Technology Hefei, China ⟐ University of Manchester Manchester, United Kingdom\n- **🔗 链接**：[[中英摘要](../abs/2404.08966.md)] [[arXiv:2404.08966](https://arxiv.org/abs/2404.08966)] [[Code](https://github.com/Pokerlishao/LoopGaussian)]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [2] GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu\n- **🏫 单位**：Alibaba Group ⟐ Zhejiang University ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2404.14037.md)] [[arXiv:2404.14037](https://arxiv.org/abs/2404.14037)] [Code]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [3] F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiangyu Sun, Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Usman Ali, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ KT\n- **🔗 链接**：[[中英摘要](../abs/2405.17083.md)] [[arXiv:2405.17083](https://arxiv.org/abs/2405.17083)] [[Code](https://github.com/Xiangyu1Sun/Factorize-3DGS)]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [4] GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane\n- **🧑‍🔬 作者**：Yansong Qu, Shaohui Dai, Xinyang Li, Jianghang Lin, Liujuan Cao, Shengchuan Zhang, Rongrong Ji\n- **🏫 单位**：Xiamen University\n- **🔗 链接**：[[中英摘要](./abs/2405.17596.md)] [[arXiv:2405.17596](https://arxiv.org/abs/2405.17596)] [[Code](https://github.com/Quyans/GOI-Hyperplane)]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [5] PlacidDreamer: Advancing Harmony in Text-to-3D Generation\n- **🧑‍🔬 作者**：Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia\n- **🏫 单位**：Tsinghua University ⟐ Kuaishou Technology\n- **🔗 链接**：[[中英摘要](../abs/2407.13976.md)] [[arXiv:2407.13976](https://arxiv.org/abs/2407.13976)] [[Code](https://github.com/HansenHuang0823/PlacidDreamer)]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [6] A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness\n- **🧑‍🔬 作者**：Lutao Jiang, Hangyu Li, Lin Wang\n- **🏫 单位**：HKUST (Guangzhou)\n- **🔗 链接**：[[中英摘要](../abs/2408.01269.md)] [[arXiv:2408.01269](https://arxiv.org/abs/2408.01269)] [Code]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [7] Large Point-to-Gaussian Model for Image-to-3D Generation\n- **🧑‍🔬 作者**：Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, Shu-Tao Xia\n- **🏫 单位**：Tsinghua University ⟐ Tencent, Hunyuan ⟐ Shenzhen University ⟐ Peng Cheng Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2408.10935.md)] [[arXiv:2408.10935](https://arxiv.org/abs/2408.10935)] [Code]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [8] SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhiru Wang, Shiyun Xie, Chengwei Pan, Guoping Wang\n- **🏫 单位**： Beihang University ⟐ Peking University\n- **🔗 链接**：[[中英摘要](../abs/2409.05868.md)] [[arXiv:2409.05868](https://arxiv.org/abs/2409.05868)] [[Code](https://github.com/MarcWangzhiru/SpeclatentGS)]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [9] Hi3D: Pursuing High-Resolution Image-to-3D 
Generation with Video Diffusion Models\n- **🧑‍🔬 作者**：Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Chong-Wah Ngo, Tao Mei\n- **🏫 单位**：Fudan University ⟐ HiDream.ai Inc., China ⟐ Singapore Management University\n- **🔗 链接**：[[中英摘要](../abs/2409.07452.md)] [[arXiv:2409.07452](https://arxiv.org/abs/2409.07452)] [[Code](https://github.com/yanghb22-fdu/Hi3D-Official)]\n- **📝 说明**：🏆 Accepted to ACM MM 2024\n\n#### [10] 4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes\n- **🧑‍🔬 作者**：Jinbo Yan, Rui Peng, Luyang Tang, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2412.06299.md)] [[arXiv:2412.06299](https://arxiv.org/abs/2412.06299)] [[Code](https://github.com/yjb6/SaRO-GS)]\n- **📝 说明**：🏆 Accepted to ACM MM 2024 Best Paper Candidate\n"
  },
  {
    "path": "2024/Accepted.md",
    "content": "# 3D Gaussian Splatting Papers Accepted in 2024\n\n#### [1] Dynamic 3d gaussians: Tracking by persistent dynamic view synthesis\n- **🧑‍🔬 作者**：Jonathon Luiten, Georgios Kopanas, Bastian Leibe, Deva Ramanan\n- **🏫 单位**：Carnegie Mellon University ⟐ RWTH Aachen University ⟐ Inria & Universite C´ ote d’Azur\n- **🔗 链接**：[[中英摘要](../abs/2308.09713.md)] [[arXiv:2308.09713](https://arxiv.org/abs/2308.09713)] [[Code](https://github.com/JonathonLuiten/Dynamic3DGaussians)]\n- **📝 说明**：🏆 Accepted to 3DV 2024\n\n#### [2] Dynamic Gaussian Splatting from Markerless Motion Capture can Reconstruct Infants Movements\n- **🧑‍🔬 作者**：R. James Cotton, Colleen Peyton\n- **🏫 单位**：Shirley Ryan AbilityLab ⟐ Northwestern University\n- **🔗 链接**：[[中英摘要](../abs/2310.19441.md)] [[arXiv:2310.19441](https://arxiv.org/abs/2310.19441)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2024 Workshop\n\n#### [3] DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation\n- **🧑‍🔬 作者**：Bardienus P. Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Jenny Seidenschwarz, Mike Zheng Shou, Deva Ramanan, Shuran Song, Stan Birchfield, Bowen Wen, Jeffrey Ichnowski\n- **🏫 单位**：Carnegie Mellon University ⟐ Stanford University ⟐ NVIDIA ⟐ National University of Singapore ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2312.00583.md)] [[arXiv:2312.00583](https://arxiv.org/abs/2312.00583)] [[Code](https://github.com/momentum-robotics-lab/deformgs)]\n- **📝 说明**：🏆 Accepted to WAFR 2024\n\n#### [4] FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding\n- **🧑‍🔬 作者**：Xingxing Zuo, Pouya Samangouei, Yunwen Zhou, Yan Di, Mingyang Li\n- **🏫 单位**：Google\n- **🔗 链接**：[[中英摘要](../abs/2401.01970.md)] [[arXiv:2401.01970](https://arxiv.org/abs/2401.01970)] [Code]\n- **📝 说明**：🏆 Accepted to IJCV 2024\n\n#### [5] TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering\n- **🧑‍🔬 作者**：Linus Franke, Darius Rückert, Laura Fink, Marc Stamminger\n- **🏫 单位**：Friedrich-Alexander-Universität Erlangen-Nürnberg\n- **🔗 链接**：[[中英摘要](../abs/2401.06003.md)] [[arXiv:2401.06003](https://arxiv.org/abs/2401.06003)] [[Code](https://github.com/lfranke/trips)]\n- **📝 说明**：🏆 Accepted to Eurographics 2024\n\n#### [6] LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering\n- **🧑‍🔬 作者**：Sheng Hong, Junjie He, Xinhu Zheng, Hesheng Wang, Hao Fang, Kangcheng Liu, Chunran Zheng, Shaojie Shen\n- **🏫 单位**：HKUST\n- **🔗 链接**：[[中英摘要](./abs/2401.14857.md)] [[arXiv:2401.14857](https://arxiv.org/abs/2401.14857)] [[Code](https://github.com/sheng00125/LIV-GaussMap)]\n- **📝 说明**：🏆 Accepted to RAL 2024\n\n#### [7] Gaussian Splatting in Style\n- **🧑‍🔬 作者**：Abhishek Saroha, Mariia Gladkova, Cecilia Curreli, Tarun Yenamandra, Daniel Cremers\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2403.08498.md)] [[arXiv:2403.08498](https://arxiv.org/abs/2403.08498)] [Code]\n- **📝 说明**：🏆 Accepted to GCPR 2024\n\n#### [8] Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph\n- **🧑‍🔬 作者**：Donglin Di, Jiahui Yang, Chaofan Luo, Zhou Xue, Wei Chen, Xun Yang, Yue Gao\n- **🏫 单位**：Space AI, Li Auto ⟐ Tsinghua University ⟐ University of Science and Technology of China ⟐ Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2403.09236.md)] [[arXiv:2403.09236](https://arxiv.org/abs/2403.09236)] [[Code](https://github.com/yjhboy/Hyper3DG)]\n- **📝 说明**：🏆 Accepted to IJCV\n\n#### [9] GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping\n- **🧑‍🔬 
作者**：Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang\n- **🏫 单位**：Beihang University ⟐ Chinese Academy of Sciences ⟐ Tsinghua University ⟐ Imperial College London ⟐ China Mobile Research Institute ⟐ Wuhan University ⟐ Shanghai AI Laboratory ⟐ University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2403.09637.md)] [[arXiv:2403.09637](https://arxiv.org/abs/2403.09637)] [[Code](https://github.com/MrSecant/GaussianGrasper)]\n- **📝 说明**：🏆 Accepted to RA-L 2024\n\n#### [10] NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yiming Ji, Yang Liu, Guanghu Xie, Boyu Ma, Zongwu Xie\n- **🏫 单位**：Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2403.11679.md)] [[arXiv:2403.11679](https://arxiv.org/abs/2403.11679)] [Code]\n- **📝 说明**：🏆 Accepted to RA-L 2024\n\n#### [11] Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jiarui Meng, Haijie Li, Yanmin Wu, Qiankun Gao, Shuzhou Yang, Jian Zhang, Siwei Ma\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](../abs/2404.01168.md)] [[arXiv:2404.01168](https://arxiv.org/abs/2404.01168)] [Code]\n- **📝 说明**：🏆 Accepted to VCIP 2024\n\n#### [12] HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2\n- **🧑‍🔬 作者**：Miriam Jäger, Theodor Kapler, Michael Feßenbecker, Felix Birkelbach, Markus Hillemann, Boris Jutzi\n- **🏫 单位**：Institute of Photogrammetry and Remote Sensing (IPF), Karlsruhe Institute of Technology (KIT)\n- **🔗 链接**：[[中英摘要](../abs/2405.02005.md)] [[arXiv:2405.02005](https://arxiv.org/abs/2405.02005)] [Code]\n- **📝 说明**：🏆 Accepted to ISPRS\n\n#### [13] FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting\n- **🧑‍🔬 作者**：Yikun Ma, Dandan Zhan, Zhi Jin\n- **🏫 单位**：Sun Yat-sen University ⟐ Guangdong Provincial Key Laboratory of Fire Science and Intelligent Emergency Technology\n- **🔗 链接**：[[中英摘要](../abs/2405.05768.md)] [[arXiv:2405.05768](https://arxiv.org/abs/2405.05768)] [Code]\n- **📝 说明**：🏆 Accepted to IJCAI 2024\n\n#### [14] Object-centric Reconstruction and Tracking of Dynamic Unknown Objects using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kuldeep R Barad, Antoine Richard, Jan Dentler, Miguel Olivares-Mendez, Carol Martinez\n- **🏫 单位**：University of Luxembourg ⟐ Redwire Space Europe\n- **🔗 链接**：[[中英摘要](../abs/2405.20104.md)] [[arXiv:2405.20104](https://arxiv.org/abs/2405.20104)] [Code]\n- **📝 说明**：🏆 Accepted to iSpaRo 2024\n\n#### [15] PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction\n- **🧑‍🔬 作者**：Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang, Haomin Liu, Hujun Bao, Guofeng Zhang\n- **🏫 单位**：Zhejiang University ⟐ SenseTime Research ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2406.06521.md)] [[arXiv:2406.06521](https://arxiv.org/abs/2406.06521)] [[Code](https://github.com/zju3dv/PGSR)]\n- **📝 说明**：🏆 Accepted to TVCG 2024\n\n#### [16] Reducing the Memory Footprint of 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Panagiotis Papantonakis, Georgios Kopanas, Bernhard Kerbl, Alexandre Lanvin, George Drettakis\n- **🏫 单位**：Inria, Université Côte d’Azur\n- **🔗 链接**：[[中英摘要](../abs/2406.17074.md)] [[arXiv:2406.17074](https://arxiv.org/abs/2406.17074)] [Code]\n- **📝 说明**：🏆 Accepted to I3D 2024\n\n#### [17] RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated 
Rendering\n- **🧑‍🔬 作者**：Weikai Lin, Yu Feng, Yuhao Zhu\n- **🏫 单位**：University of Rochester ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](../abs/2407.00435.md)] [[arXiv:2407.00435](https://arxiv.org/abs/2407.00435)] [[Code](https://github.com/horizon-research/Fov-3DGS)]\n- **📝 说明**：🏆 Accepted to ASPLOS 2025\n\n#### [18] DRAGON: Drone and Ground Gaussian Splatting for 3D Building Reconstruction\n- **🧑‍🔬 作者**：Yujin Ham, Mateusz Michalkiewicz, Guha Balakrishnan\n- **🏫 单位**：Rice University\n- **🔗 链接**：[[中英摘要](../abs/2407.01761.md)] [[arXiv:2407.01761](https://arxiv.org/abs/2407.01761)] [Code]\n- **📝 说明**：🏆 Accepted to ICCP 2024\n\n#### [19] Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting\n- **🧑‍🔬 作者**：Sheng Ye, Zhen-Hui Dong, Yubin Hu, Yu-Hui Wen, Yong-Jin Liu\n- **🏫 单位**：MOE-Key Laboratory of Pervasive Computing, the Department of Computer Science and Technology, Tsinghua University ⟐ Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University\n- **🔗 链接**：[[中英摘要](../abs/2408.09130.md)] [[arXiv:2408.09130](https://arxiv.org/abs/2408.09130)] [[Code](https://github.com/yec22/Gaussian-DK)]\n- **📝 说明**：🏆 Accepted to Pacific Graphics 2024\n\n#### [20] Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors\n- **🧑‍🔬 作者**：Paul Ungermann, Armin Ettenhofer, Matthias Nießner, Barbara Roessle\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2408.11697.md)] [[arXiv:2408.11697](https://arxiv.org/abs/2408.11697)] [Code]\n- **📝 说明**：🏆 Accepted to GCPR 2024\n\n#### [21] Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks\n- **🧑‍🔬 作者**：Ruihan Xu, Anthony Opipari, Joshua Mah, Stanley Lewis, Haoran Zhang, Hanzhe Guo, Odest Chadwicke Jenkins\n- **🏫 单位**：University of Michigan\n- **🔗 链接**：[[中英摘要](../abs/2409.07245.md)] [[arXiv:2409.07245](https://arxiv.org/abs/2409.07245)] [Code]\n- **📝 说明**：🏆 Accepted to RSS 2024 Workshop on Geometric and Algebraic Structure in Robot Learning\n\n#### [22] A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis\n- **🧑‍🔬 作者**：Yohan Poirier-Ginter, Alban Gauthier, Julien Philip, Jean-Francois Lalonde, George Drettakis\n- **🏫 单位**：Inria, Université Côte d'Azur ⟐ Université Laval ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](../abs/2409.08947.md)] [[arXiv:2409.08947](https://arxiv.org/abs/2409.08947)] [[Code](https://repo-sam.inria.fr/fungraph/generative-radiance-field-relighting/)]\n- **📝 说明**：🏆 Accepted to EGSR 2024\n\n#### [23] MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering\n- **🧑‍🔬 作者**：Jaehoon Choi, Yonghan Lee, Hyungtae Lee, Heesung Kwon, Dinesh Manocha\n- **🏫 单位**：University of Maryland ⟐ DEVCOM Army Research Laboratory ⟐ BlueHalo, Rockville\n- **🔗 链接**：[[中英摘要](../abs/2410.08941.md)] [[arXiv:2410.08941](https://arxiv.org/abs/2410.08941)] [Code]\n- **📝 说明**：🏆 Accepted to ACCV 2024\n\n#### [24] Hybrid bundle-adjusting 3D Gaussians for view consistent rendering with pose optimization\n- **🧑‍🔬 作者**：Yanan Guo, Ying Xie, Ying Chang, Benkui Zhang, Bo Jia, Lin Cao\n- **🏫 单位**：Beijing Information Science and Technology University ⟐ Aerospace Information Research Institute\n- **🔗 链接**：[[中英摘要](../abs/2410.13280.md)] [[arXiv:2410.13280](https://arxiv.org/abs/2410.13280)] [[Code](https://github.com/Bistu3DV/hybridBA)]\n- **📝 说明**：🏆 Accepted to Photonics Asia 2024\n\n#### [25] GGAvatar: Reconstructing Garment-Separated 3D Gaussian 
Splatting Avatars from Monocular Video\n- **🧑‍🔬 作者**：Jingxuan Chen\n- **🏫 单位**：Jinan University-University of Birmingham Joint Institute\n- **🔗 链接**：[[中英摘要](../abs/2411.09952.md)] [[arXiv:2411.09952](https://arxiv.org/abs/2411.09952)] [[Code](https://github.com/J-X-Chen/GGAvatar/)]\n- **📝 说明**：🏆 Accepted to MMAsia 2024\n\n#### [26] GSEditPro: 3D Gaussian Splatting Editing with Attention-based Progressive Localization\n- **🧑‍🔬 作者**：Yanhao Sun, RunZe Tian, Xiao Han, XinYao Liu, Yan Zhang, Kai Xu\n- **🏫 单位**：State Key Laboratory for Novel Software Technology of Nanjing University, China ⟐ National University of Defense Technology, China\n- **🔗 链接**：[[中英摘要](../abs/2411.10033.md)] [[arXiv:2411.10033](https://arxiv.org/abs/2411.10033)] [Code]\n- **📝 说明**：🏆 Accepted to Pacific Graphics 2024\n\n#### [27] 2DGS-Avatar: Animatable High-fidelity Clothed Avatar via 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Qipeng Yan, Mingyang Sun, Lihua Zhang\n- **🏫 单位**：Academy for Engineering and Technology, Fudan University\n- **🔗 链接**：[[中英摘要](../abs/2503.02452.md)] [[arXiv:2503.02452](https://arxiv.org/abs/2503.02452)] [Code]\n- **📝 说明**：🏆 Accepted to ICVRV 2024\n"
  },
  {
    "path": "2024/BMVC.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to BMVC2024\n\n#### [1] AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field\n- **🧑‍🔬 作者**：Rong Liu, Rui Xu, Yue Hu, Meida Chen, Andrew Feng\n- **🏫 单位**：University of Southern California\n- **🔗 链接**：[[中英摘要](../abs/2405.12369.md)] [[arXiv:2405.12369](https://arxiv.org/abs/2405.12369)] [[Code](https://github.com/RongLiu-Leo/AtomGS)]\n- **📝 说明**：🏆 Accepted to BMVC 2024\n\n#### [2] HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction\n- **🧑‍🔬 作者**：Haoyu Zhao, Xingyue Zhao, Lingting Zhu, Weixi Zheng, Yongchao Xu\n- **🏫 单位**：WuHan University ⟐ Xi’an Jiaotong University ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2405.17872.md)] [[arXiv:2405.17872](https://arxiv.org/abs/2405.17872)] [Code]\n- **📝 说明**：🏆 Accepted to BMVC 2024\n\n#### [3] RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields\n- **🧑‍🔬 作者**：Mihnea-Bogdan Jurca, Remco Royen, Ion Giosan, Adrian Munteanu\n- **🏫 单位**：Vrije Universiteit Brussel ⟐ Technical University of Cluj-Napoca\n- **🔗 链接**：[[中英摘要](../abs/2405.18033.md)] [[arXiv:2405.18033](https://arxiv.org/abs/2405.18033)] [Code]\n- **📝 说明**：🏆 Accepted to BMVC 2024\n\n#### [4] Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning\n- **🧑‍🔬 作者**：Muhammad Salman Ali, Maryam Qamar, Sung-Ho Bae, Enzo Tartaglione\n- **🏫 单位**： Institut Polytechnique de Paris ⟐ Kyung Hee University\n- **🔗 链接**：[[中英摘要](../abs/2406.18214.md)] [[arXiv:2406.18214](https://arxiv.org/abs/2406.18214)] [[Code](https://github.com/salmanali96/Trimming-the-Fat)]\n- **📝 说明**：🏆 Accepted to BMVC 2024\n\n#### [5] HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images\n- **🧑‍🔬 作者**：Shreyas Singh, Aryan Garg, Kaushik Mitra\n- **🏫 单位**：Indian Institute of Technology, Madra\n- **🔗 链接**：[[中英摘要](../abs/2407.16503.md)] [[arXiv:2407.16503](https://arxiv.org/abs/2407.16503)] [[Code](https://github.com/shreyesss/HDRSplat)]\n- **📝 说明**：🏆 Accepted to BMVC 2024\n\n#### [6] Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty\n- **🧑‍🔬 作者**：Saining Zhang, Baijun Ye, Xiaoxue Chen, Yuantao Chen, Zongzheng Zhang, Cheng Peng, Yongliang Shi, Hao Zhao\n- **🏫 单位**：Tsinghua University ⟐ Nanyang Technological University ⟐ Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2408.15242.md)] [[arXiv:2408.15242](https://arxiv.org/abs/2408.15242)] [[Code](https://github.com/SainingZhang/UC-GS)]\n- **📝 说明**：🏆 Accepted to BMVC 2024\n\n#### [7] Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs\n- **🧑‍🔬 作者**：Sadra Safadoust, Fabio Tosi, Fatma Güney, Matteo Poggi\n- **🏫 单位**：Koç University ⟐ University of Bologna\n- **🔗 链接**：[[中英摘要](../abs/2409.07456.md)] [[arXiv:2409.07456](https://arxiv.org/abs/2409.07456)] [Code]\n- **📝 说明**：🏆 Accepted to BMVC 2024\n\n#### [8] Gaussian Splatting in Mirrors: Reflection-Aware Rendering via Virtual Camera Optimization\n- **🧑‍🔬 作者**：Zihan Wang, Shuzhe Wang, Matias Turkulainen, Junyuan Fang, Juho Kannala\n- **🏫 单位**：Aalto University ⟐ University of Oulu ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2410.01614.md)] [[arXiv:2410.01614](https://arxiv.org/abs/2410.01614)] [Code]\n- **📝 说明**：🏆 Accepted to BMVC 2024\n"
  },
  {
    "path": "2024/CVPR.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to CVPR2024\n\n#### [1] Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, Xiaogang Jin\n- **🏫 单位**： Zhejiang University ⟐ ByteDance Inc.\n- **🔗 链接**：[[中英摘要](../abs/2309.13101.md)] [[arXiv:2309.13101](https://arxiv.org/abs/2309.13101)] [[Code](https://github.com/ingra14m/Deformable-3D-Gaussians)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [2] Text-to-3D using Gaussian Splatting\n- **🧑‍🔬 作者**：Zilong Chen, Feng Wang, Huaping Liu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2309.16585.md)] [[arXiv:2309.16585](https://arxiv.org/abs/2309.16585)] [[Code](https://github.com/gsgen3d/gsgen)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [3] 4d gaussian splatting for real-time dynamic scene rendering\n- **🧑‍🔬 作者**：Guanjun Wu, Taoran Yi, Jiemin Fang, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Qi Tian, Xinggang Wang\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ Huawei Inc.\n- **🔗 链接**：[[中英摘要](../abs/2310.08528.md)] [[arXiv:2310.08528](https://arxiv.org/abs/2310.08528)] [[Code](https://github.com/hustvl/4DGaussians)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [4] GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models\n- **🧑‍🔬 作者**：Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ Huawei Inc.\n- **🔗 链接**：[[中英摘要](../abs/2310.08529.md)] [[arXiv:2310.08529](https://arxiv.org/abs/2310.08529)] [[Code](https://github.com/hustvl/GaussianDreamer)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [5] GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Chi Yan, Delin Qu, Dong Wang, Dan Xu, Zhigang Wang, Bin Zhao, Xuelong Li\n- **🏫 单位**：Shanghai AI Laboratory ⟐ Fudan University ⟐ Northwestern Polytechnical University ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](../abs/2311.11700.md)] [[arXiv:2311.11700](https://arxiv.org/abs/2311.11700)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [6] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics\n- **🧑‍🔬 作者**：Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang\n- **🏫 单位**：UCLA ⟐ Zhejiang University ⟐ University of Utah\n- **🔗 链接**：[[中英摘要](../abs/2311.12198.md)] [[arXiv:2311.12198](https://arxiv.org/abs/2311.12198)] [[Code](https://github.com/XPandora/PhysGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [7] SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering\n- **🧑‍🔬 作者**：Antoine Guédon, Vincent Lepetit\n- **🏫 单位**：Univ Gustave Eiffel\n- **🔗 链接**：[[中英摘要](../abs/2311.12775.md)] [[arXiv:2311.12775](https://arxiv.org/abs/2311.12775)] [[Code](https://github.com/Anttwo/SuGaR)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [8] Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images\n- **🧑‍🔬 作者**：Jaeyoung Chung, Jeongtaek Oh, Kyoung Mu Lee\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2311.13398.md)] [[arXiv:2311.13398](https://arxiv.org/abs/2311.13398)] [[Code](https://github.com/robot0321/DepthRegularizedGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2024 Workshop on 3DMV\n\n#### [9] Compact 3D Gaussian Representation for Radiance Field\n- **🧑‍🔬 作者**：Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan 
University ⟐ KT\n- **🔗 链接**：[[中英摘要](../abs/2311.13681.md)] [[arXiv:2311.13681](https://arxiv.org/abs/2311.13681)] [[Code](https://github.com/maincold2/Compact-3DGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [10] GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting\n- **🧑‍🔬 作者**：Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin\n- **🏫 单位**：Nanyang Technological University ⟐ Tsinghua University ⟐ SenseTime Research\n- **🔗 链接**：[[中英摘要](../abs/2311.14521.md)] [[arXiv:2311.14521](https://arxiv.org/abs/2311.14521)] [[Code](https://github.com/buaacyw/GaussianEditor)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [11] GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions\n- **🧑‍🔬 作者**：Jiemin Fang, Junjie Wang, Xiaopeng Zhang, Lingxi Xie, Qi Tian\n- **🏫 单位**：Huawei Inc.\n- **🔗 链接**：[[中英摘要](../abs/2311.16037.md)] [[arXiv:2311.16037](https://arxiv.org/abs/2311.16037)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [12] Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling\n- **🧑‍🔬 作者**：Zhe Li, Zerong Zheng, Lizhen Wang, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ NNKosmos Technology\n- **🔗 链接**：[[中英摘要](../abs/2311.16096.md)] [[arXiv:2311.16096](https://arxiv.org/abs/2311.16096)] [[Code](https://github.com/lizhe00/AnimatableGaussians)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [13] GART: Gaussian Articulated Template Models\n- **🧑‍🔬 作者**：Jiahui Lei, Yufu Wang, Georgios Pavlakos, Lingjie Liu, Kostas Daniilidis\n- **🏫 单位**：University of Pennsylvania ⟐ UC Berkeley ⟐ Archimedes, Athena RC\n- **🔗 链接**：[[中英摘要](../abs/2311.16099.md)] [[arXiv:2311.16099](https://arxiv.org/abs/2311.16099)] [[Code](https://github.com/JiahuiLei/GART)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [14] GS-IR: 3D Gaussian Splatting for Inverse Rendering\n- **🧑‍🔬 作者**：Zhihao Liang, Qi Zhang, Ying Feng, Ying Shan, Kui Jia\n- **🏫 单位**：South China University of Technology ⟐ Tencent AI Lab ⟐ The Chinese University of Hong Kong, Shenzhen\n- **🔗 链接**：[[中英摘要](../abs/2311.16473.md)] [[arXiv:2311.16473](https://arxiv.org/abs/2311.16473)] [[Code](https://github.com/lzhnb/GS-IR)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [15] Mip-Splatting: Alias-free 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, Andreas Geiger\n- **🏫 单位**：University of Tübingen ⟐ Tübingen AI Center ⟐ ShanghaiTech University ⟐ Czech Technical University in Prague\n- **🔗 链接**：[[中英摘要](../abs/2311.16493.md)] [[arXiv:2311.16493](https://arxiv.org/abs/2311.16493)] [[Code](https://github.com/autonomousvision/mip-splatting)]\n- **📝 说明**：🏆 CVPR 2024 Best Student Paper\n\n#### [16] HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting\n- **🧑‍🔬 作者**：Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu\n- **🏫 单位**：CUHK ⟐ Tencent AI Lab ⟐ PKU ⟐ HKU ⟐ NTU\n- **🔗 链接**：[[中英摘要](../abs/2311.17061.md)] [[arXiv:2311.17061](https://arxiv.org/abs/2311.17061)] [[Code](https://github.com/alvinliu0/HumanGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [17] Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering\n- **🧑‍🔬 作者**：Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2311.17089.md)] [[arXiv:2311.17089](https://arxiv.org/abs/2311.17089)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [18] Human Gaussian Splatting: Real-time Rendering of Animatable 
Avatars\n- **🧑‍🔬 作者**：Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero\n- **🏫 单位**：Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2311.17113.md)] [[arXiv:2311.17113](https://arxiv.org/abs/2311.17113)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [19] Gaussian Shell Maps for Efficient 3D Human Generation\n- **🧑‍🔬 作者**：Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein\n- **🏫 单位**：Stanford University ⟐ HKUST\n- **🔗 链接**：[[中英摘要](../abs/2311.17857.md)] [[arXiv:2311.17857](https://arxiv.org/abs/2311.17857)] [[Code](https://github.com/computational-imaging/GSM)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [20] HUGS: Human Gaussian Splats\n- **🧑‍🔬 作者**：Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan\n- **🏫 单位**：Apple ⟐ Max Planck Institute for Intelligent Systems ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2311.17910.md)] [[arXiv:2311.17910](https://arxiv.org/abs/2311.17910)] [[Code](https://github.com/apple/ml-hugs)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [21] GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces\n- **🧑‍🔬 作者**：Yingwenqi Jiang, Jiadong Tu, Yuan Liu, Xifeng Gao, Xiaoxiao Long, Wenping Wang, Yuexin Ma\n- **🏫 单位**：ShanghaiTech University ⟐ The University of Hong Kong ⟐ Tencent America ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](../abs/2311.17977.md)] [[arXiv:2311.17977](https://arxiv.org/abs/2311.17977)] [[Code](https://github.com/Asparagus15/GaussianShader)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [22] Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding\n- **🧑‍🔬 作者**：Jin-Chuan Shi, Miao Wang, Hao-Bin Duan, Shao-Hua Guan\n- **🏫 单位**：Beihang University ⟐ Zhongguancun Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2311.18482.md)] [[arXiv:2311.18482](https://arxiv.org/abs/2311.18482)] [[Code](https://github.com/Chuan-10/LEGaussians)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [23] Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering\n- **🧑‍🔬 作者**：Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory ⟐ The Chinese University of Hong Kong ⟐ Nanjing University ⟐ Cornell University\n- **🔗 链接**：[[中英摘要](../abs/2312.00109.md)] [[arXiv:2312.00109](https://arxiv.org/abs/2312.00109)] [[Code](https://github.com/city-super/Scaffold-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [24] Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction\n- **🧑‍🔬 作者**：Devikalyan Das, Christopher Wewer, Raza Yunus, Eddy Ilg, Jan Eric Lenssen\n- **🏫 单位**：Saarland University ⟐ Max Planck Institute for Informatics\n- **🔗 链接**：[[中英摘要](../abs/2312.01196.md)] [[arXiv:2312.01196](https://arxiv.org/abs/2312.01196)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [25] GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians\n- **🧑‍🔬 作者**：Shenhan Qian, Tobias Kirschstein, Liam Schoneveld, Davide Davoli, Simon Giebenhain, Matthias Nießner\n- **🏫 单位**：Technical University of Munich ⟐ Woven by Toyota ⟐ Toyota Motor Europe NV/SA\n- **🔗 链接**：[[中英摘要](../abs/2312.02069.md)] [[arXiv:2312.02069](https://arxiv.org/abs/2312.02069)] [[Code](https://github.com/ShenhanQian/GaussianAvatars)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [26] SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM\n- **🧑‍🔬 作者**：Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon 
Luiten\n- **🏫 单位**：CMU ⟐ MIT\n- **🔗 链接**：[[中英摘要](../abs/2312.02126.md)] [[arXiv:2312.02126](https://arxiv.org/abs/2312.02126)] [[Code](https://github.com/spla-tam/SplaTAM)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [27] GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians\n- **🧑‍🔬 作者**：Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang, Boyao Zhou, Boning Liu, Shengping Zhang, Liqiang Nie\n- **🏫 单位**：Harbin Institute of Technology ⟐ Beijing Normal University ⟐  Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2312.02134.md)] [[arXiv:2312.02134](https://arxiv.org/abs/2312.02134)] [[Code](https://github.com/huliangxiao/GaussianAvatar)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [28] MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians\n- **🧑‍🔬 作者**：Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar\n- **🏫 单位**：IIIT Hyderabad ⟐ Brown University\n- **🔗 链接**：[[中英摘要](../abs/2312.02137.md)] [[arXiv:2312.02137](https://arxiv.org/abs/2312.02137)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [29] GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis\n- **🧑‍🔬 作者**：Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu\n- **🏫 单位**：Harbin Institute of Technology ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2312.02155.md)] [[arXiv:2312.02155](https://arxiv.org/abs/2312.02155)] [[Code](https://github.com/ShunyuanZheng/GPS-Gaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [30] GauHuman: Articulated Gaussian Splatting for Real-Time 3D Human Rendering\n- **🧑‍🔬 作者**：Shoukang Hu, Tao Hu, Ziwei Liu\n- **🏫 单位**：Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2312.02973.md)] [[arXiv:2312.02973](https://arxiv.org/abs/2312.02973)] [[Code](https://github.com/skhu101/GauHuman)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [31] Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians\n- **🧑‍🔬 作者**：Yuelang Xu, Benwang Chen, Zhe Li, Hongwen Zhang, Lizhen Wang, Zerong Zheng, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ NNKosmos\n- **🔗 链接**：[[中英摘要](../abs/2312.03029.md)] [[arXiv:2312.03029](https://arxiv.org/abs/2312.03029)] [[Code](https://github.com/YuelangX/Gaussian-Head-Avatar)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [32] Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields\n- **🧑‍🔬 作者**：Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi\n- **🏫 单位**：University of California, Los Angeles ⟐ University of Texas at Austin ⟐ DEVCOM Army Research Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2312.03203.md)] [[arXiv:2312.03203](https://arxiv.org/abs/2312.03203)] [[Code](https://github.com/ShijieZhou-UCLA/feature-3dgs)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [33] Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle\n- **🧑‍🔬 作者**：Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao\n- **🏫 单位**：Nanjing University ⟐ Alibaba Group ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](../abs/2312.03431.md)] [[arXiv:2312.03431](https://arxiv.org/abs/2312.03431)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [34] HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting\n- **🧑‍🔬 作者**：Yuheng Jiang, Zhehao Shen, Penghao Wang, Zhuo Su, Yu Hong, Yingliang Zhang, Jingyi Yu, Lan Xu\n- **🏫 单位**：ShanghaiTech University ⟐ NeuDim ⟐ ByteDance ⟐ DGene\n- **🔗 
链接**：[[中英摘要](../abs/2312.03461.md)] [[arXiv:2312.03461](https://arxiv.org/abs/2312.03461)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [35] Relightable Gaussian Codec Avatars\n- **🧑‍🔬 作者**：Shunsuke Saito, Gabriel Schwartz, Tomas Simon, Junxuan Li, Giljoo Nam\n- **🏫 单位**：Codec Avatars Lab, Meta\n- **🔗 链接**：[[中英摘要](../abs/2312.03704.md)] [[arXiv:2312.03704](https://arxiv.org/abs/2312.03704)] [[Code](https://github.com/shunsukesaito/rgca)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [36] CoGS: Controllable Gaussian Splatting\n- **🧑‍🔬 作者**：Heng Yu, Joel Julin, Zoltán Á. Milacski, Koichiro Niinuma, László A. Jeni\n- **🏫 单位**：Carnegie Mellon University ⟐ Fujitsu Research of America\n- **🔗 链接**：[[中英摘要](../abs/2312.05664.md)] [[arXiv:2312.05664](https://arxiv.org/abs/2312.05664)] [[Code](https://github.com/Heng14/CoGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [37] ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering\n- **🧑‍🔬 作者**：Haokai Pang, Heming Zhu, Adam Kortylewski, Christian Theobalt, Marc Habermann\n- **🏫 单位**：Max Planck Institute for Informatics ⟐ ETH Zurich ⟐ Universität Freiburg ⟐ Saarbrücken Research Center for Visual Computing, Interaction and AI\n- **🔗 链接**：[[中英摘要](../abs/2312.05941.md)] [[arXiv:2312.05941](https://arxiv.org/abs/2312.05941)] [[Code](https://github.com/kv2000/ASH)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [38] Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Hidenobu Matsuki, Riku Murai, Paul H.J. Kelly, Andrew J. Davison\n- **🏫 单位**：Imperial College London\n- **🔗 链接**：[[中英摘要](../abs/2312.06741.md)] [[arXiv:2312.06741](https://arxiv.org/abs/2312.06741)] [[Code](https://github.com/muskie82/MonoGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [39] COLMAP-Free 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yang Fu, Sifei Liu, Amey Kulkarni, Jan Kautz, Alexei A. 
Efros, Xiaolong Wang\n- **🏫 单位**：UC San Diego ⟐ NVIDIA ⟐ UC Berkeley\n- **🔗 链接**：[[中英摘要](../abs/2312.07504.md)] [[arXiv:2312.07504](https://arxiv.org/abs/2312.07504)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [40] DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes\n- **🧑‍🔬 作者**：Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang\n- **🏫 单位**：Peking University ⟐ Google Research ⟐ University of California, Merced\n- **🔗 链接**：[[中英摘要](../abs/2312.07920.md)] [[arXiv:2312.07920](https://arxiv.org/abs/2312.07920)] [[Code](https://github.com/VDIGPKU/DrivingGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [41] Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers\n- **🧑‍🔬 作者**：Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Yan-Pei Cao, Song-Hai Zhang\n- **🏫 单位**：Tsinghua University ⟐ VAST\n- **🔗 链接**：[[中英摘要](../abs/2312.09147.md)] [[arXiv:2312.09147](https://arxiv.org/abs/2312.09147)] [[Code](https://github.com/VAST-AI-Research/TriplaneGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [42] 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhiyin Qian, Shaofei Wang, Marko Mihajlovic, Andreas Geiger, Siyu Tang\n- **🏫 单位**：ETH Zürich ⟐ University of Tübingen ⟐ Tübingen AI Center\n- **🔗 链接**：[[中英摘要](../abs/2312.09228.md)] [[arXiv:2312.09228](https://arxiv.org/abs/2312.09228)] [[Code](https://github.com/mikeqzy/3dgs-avatar-release)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [43] GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning\n- **🧑‍🔬 作者**：Ye Yuan, Xueting Li, Yangyi Huang, Shalini De Mello, Koki Nagano, Jan Kautz, Umar Iqbal\n- **🏫 单位**：NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2312.11461.md)] [[arXiv:2312.11461](https://arxiv.org/abs/2312.11461)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [44] pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction\n- **🧑‍🔬 作者**：David Charatan, Sizhe Li, Andrea Tagliasacchi, Vincent Sitzmann\n- **🏫 单位**：Massachusetts Institute of Technology ⟐ Simon Fraser University ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](../abs/2312.12337.md)] [[arXiv:2312.12337](https://arxiv.org/abs/2312.12337)] [[Code](https://github.com/dcharatan/pixelsplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [45] SpecNeRF: Gaussian Directional Encoding for Specular Reflections\n- **🧑‍🔬 作者**：Li Ma, Vasu Agrawal, Haithem Turki, Changil Kim, Chen Gao, Pedro Sander, Michael Zollhöfer, Christian Richardt\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ Meta Reality Labs ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2312.13102.md)] [[arXiv:2312.13102](https://arxiv.org/abs/2312.13102)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [46] Splatter Image: Ultra-Fast Single-View 3D Reconstruction\n- **🧑‍🔬 作者**：Stanislaw Szymanowicz, Christian Rupprecht, Andrea Vedaldi\n- **🏫 单位**：University of Oxford (VGG)\n- **🔗 链接**：[[中英摘要](../abs/2312.13150.md)] [[arXiv:2312.13150](https://arxiv.org/abs/2312.13150)] [[Code](https://github.com/szymanowiczs/splatter-image)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [47] Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models\n- **🧑‍🔬 作者**：Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis\n- **🏫 单位**：NVIDIA ⟐ Vector Institute ⟐ University of Toronto ⟐ MIT\n- **🔗 链接**：[[中英摘要](../abs/2312.13763.md)] 
[[arXiv:2312.13763](https://arxiv.org/abs/2312.13763)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [48] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes\n- **🧑‍🔬 作者**：Yi-Hua Huang, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi\n- **🏫 单位**：The University of Hong Kong ⟐ VAST ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2312.14937.md)] [[arXiv:2312.14937](https://arxiv.org/abs/2312.14937)] [[Code](https://github.com/yihua7/SC-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [49] LangSplat: 3D Language Gaussian Splatting\n- **🧑‍🔬 作者**：Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister\n- **🏫 单位**：Tsinghua University ⟐ Harvard University\n- **🔗 链接**：[[中英摘要](../abs/2312.16084.md)] [[arXiv:2312.16084](https://arxiv.org/abs/2312.16084)] [[Code](https://github.com/minghanqin/LangSplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [50] Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis\n- **🧑‍🔬 作者**：Zhan Li, Zhang Chen, Zhong Li, Yi Xu\n- **🏫 单位**：OPPO US Research Center ⟐ Portland State University\n- **🔗 链接**：[[中英摘要](../abs/2312.16812.md)] [[arXiv:2312.16812](https://arxiv.org/abs/2312.16812)] [[Code](https://github.com/oppo-us-research/SpacetimeGaussians)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [51] Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis\n- **🧑‍🔬 作者**：Simon Niedermayr, Josef Stumpfegger, Rüdiger Westermann\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2401.02436.md)] [[arXiv:2401.02436](https://arxiv.org/abs/2401.02436)] [[Code](https://github.com/KeKsBoTer/c3dgs)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [52] Gaussian Shadow Casting for Neural Characters\n- **🧑‍🔬 作者**：Luis Bolanos, Shih-Yang Su, Helge Rhodin\n- **🏫 单位**：The University of British Columbia\n- **🔗 链接**：[[中英摘要](../abs/2401.06116.md)] [[arXiv:2401.06116](https://arxiv.org/abs/2401.06116)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [53] GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering\n- **🧑‍🔬 作者**：Abdullah Hamdi, Luke Melas-Kyriazi, Guocheng Qian, Jinjie Mai, Ruoshi Liu, Carl Vondrick, Bernard Ghanem, Andrea Vedaldi\n- **🏫 单位**：VGG, University of Oxford ⟐ KAUST ⟐ Columbia University ⟐ Snap Inc.\n- **🔗 链接**：[[中英摘要](../abs/2402.10128.md)] [[arXiv:2402.10128](https://arxiv.org/abs/2402.10128)] [[Code](https://github.com/ajhamdi/ges-splatting)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [54] VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction\n- **🧑‍🔬 作者**：Jiaqi Lin, Zhihao Li, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Jiayue Liu, Yangdi Lu, Xiaofei Wu, Songcen Xu, Youliang Yan, Wenming Yang\n- **🏫 单位**：Tsinghua University ⟐ Huawei Noah's Ark Lab ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](../abs/2402.17427.md)] [[arXiv:2402.17427](https://arxiv.org/abs/2402.17427)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [55] 3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos\n- **🧑‍🔬 作者**：Jiakai Sun, Han Jiao, Guangyuan Li, Zhanjie Zhang, Lei Zhao, Wei Xing\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2403.01444.md)] [[arXiv:2403.01444](https://arxiv.org/abs/2403.01444)] [[Code](https://github.com/SJoJoK/3DGStream)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [56] SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting\n- **🧑‍🔬 作者**：Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu 
Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou) ⟐ Tsinghua University ⟐ Prometheus Vision Technology Co., Ltd ⟐ Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](../abs/2403.05087.md)] [[arXiv:2403.05087](https://arxiv.org/abs/2403.05087)] [[Code](https://github.com/initialneil/SplattingAvatar)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [57] FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization\n- **🧑‍🔬 作者**：Jiahui Zhang, Fangneng Zhan, Muyu Xu, Shijian Lu, Eric P. Xing\n- **🏫 单位**：Nanyang Technological University ⟐ Max Planck Institute for Informatics ⟐ Carnegie Mellon University ⟐ MBZUAI\n- **🔗 链接**：[[中英摘要](../abs/2403.06908.md)] [[arXiv:2403.06908](https://arxiv.org/abs/2403.06908)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [58] DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization\n- **🧑‍🔬 作者**：Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu\n- **🏫 单位**：Beihang University ⟐ Chinese Academy of Sciences ⟐ Griffith University ⟐ RIKEN AIP ⟐ The University of Tokyo\n- **🔗 链接**：[[中英摘要](../abs/2403.06912.md)] [[arXiv:2403.06912](https://arxiv.org/abs/2403.06912)] [[Code](https://github.com/Fictionarry/DNGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [59] HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting\n- **🧑‍🔬 作者**：Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao\n- **🏫 单位**：Zhejiang University ⟐ Huawei Noah’s Ark Lab ⟐ University of Tübingen ⟐ Tübingen AI Center\n- **🔗 链接**：[[中英摘要](../abs/2403.12722.md)] [[arXiv:2403.12722](https://arxiv.org/abs/2403.12722)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [60] 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis\n- **🧑‍🔬 作者**：Zhicheng Lu, Xiang Guo, Le Hui, Tianrui Chen, Min Yang, Xiao Tang, Feng Zhu, Yuchao Dai\n- **🏫 单位**：Northwestern Polytechnical University ⟐ Samsung R&D Institute\n- **🔗 链接**：[[中英摘要](../abs/2404.06270.md)] [[arXiv:2404.06270](https://arxiv.org/abs/2404.06270)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [61] GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh\n- **🧑‍🔬 作者**：Jing Wen, Xiaoming Zhao, Zhongzheng Ren, Alexander G. 
Schwing, Shenlong Wang\n- **🏫 单位**：University of Illinois at Urbana-Champaign\n- **🔗 链接**：[[中英摘要](../abs/2404.07991.md)] [[arXiv:2404.07991](https://arxiv.org/abs/2404.07991)] [[Code](https://github.com/wenj/GoMAvatar)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [62] Gaussian Splatting Decoder for 3D‑aware Generative Adversarial Networks\n- **🧑‍🔬 作者**：Florian Barthel, Arian Beckmann, Wieland Morgenstern, Anna Hilsmann, Peter Eisert\n- **🏫 单位**：Heinrich-Hertz Institut, Humboldt University of Berlin\n- **🔗 链接**：[[中英摘要](../abs/2404.10625.md)] [[arXiv:2404.10625](https://arxiv.org/abs/2404.10625)] [[Code](https://github.com/fraunhoferhhi/gaussian_gan_decoder)]\n- **📝 说明**：🏆 Accepted to CVPR 2024 Workshop on Generative Models for Computer Vision\n\n#### [63] Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses\n- **🧑‍🔬 作者**：Inhee Lee, Byungjun Kim, Hanbyul Joo\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2404.14410.md)] [[arXiv:2404.14410](https://arxiv.org/abs/2404.14410)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [64] Interactive3D: Create What You Want by Interactive 3D Generation\n- **🧑‍🔬 作者**：Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu\n- **🏫 单位**：Hong Kong University of Science and Technology ⟐ The Chinese University of Hong Kong ⟐ SenseTime Research ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2404.16510.md)] [[arXiv:2404.16510](https://arxiv.org/abs/2404.16510)] [[Code](https://github.com/interactive-3d/interactive3d)]\n- **📝 说明**：🏆 Accepted to CVPR 2024\n\n#### [65] ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation\n- **🧑‍🔬 作者**：Pengzhi Li, Chengshuai Tang, Qinxuan Huang, Zhiheng Li\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School ⟐ Tsinghua-Berkeley Shenzhen Institute\n- **🔗 链接**：[[中英摘要](../abs/2405.10508.md)] [[arXiv:2405.10508](https://arxiv.org/abs/2405.10508)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024 Workshop on AI3DG\n\n#### [66] ICE-G: Image Conditional Editing of 3D Gaussian Splats\n- **🧑‍🔬 作者**：Vishnu Jaganathan, Hannah Hanyun Huang, Muhammad Zubair Irshad, Varun Jampani, Amit Raj, Zsolt Kira\n- **🏫 单位**：Georgia Institute of Technology ⟐ Toyota Research Institute ⟐ Stability AI ⟐ Google Research\n- **🔗 链接**：[[中英摘要](../abs/2406.08488.md)] [[arXiv:2406.08488](https://arxiv.org/abs/2406.08488)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2024 Workshop on AI4CC\n\n#### [67] Low Latency Point Cloud Rendering with Learned Splatting\n- **🧑‍🔬 作者**：Gennady Sidorov, Malik Mohrat, Ksenia Lebedeva, Ruslan Rakhimov, Sergey Kolyubin\n- **🏫 单位**：Tandon School of Engineering, New York University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2409.16504.md)] [[arXiv:2409.16504](https://arxiv.org/abs/2409.16504)] [[Code](https://github.com/huzi96/gaussian-pcloud-render)]\n- **📝 说明**：🏆 Accepted to CVPR 2024 Workshop on AIS: Vision, Graphics and AI for Streaming\n"
  },
  {
    "path": "2024/CoRL.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to CoRL2024\n\n#### [1] Event3DGS: Event-based 3D Gaussian Splatting for Fast Egomotion\n- **🧑‍🔬 作者**：Tianyi Xiong, Jiayi Wu, Botao He, Cornelia Fermuller, Yiannis Aloimonos, Heng Huang, Christopher A. Metzler\n- **🏫 单位**：University of Maryland, College Park\n- **🔗 链接**：[[中英摘要](../abs/2406.02972.md)] [[arXiv:2406.02972](https://arxiv.org/abs/2406.02972)] [Code]\n- **📝 说明**：🏆 Accepted to CoRL 2024\n\n#### [2] Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics\n- **🧑‍🔬 作者**：Jad Abou-Chakra, Krishan Rana, Feras Dayoub, Niko Sünderhauf\n- **🏫 单位**：Queensland University of Technology ⟐ University of Adelaide\n- **🔗 链接**：[[中英摘要](./abs/2406.10788.md)] [[arXiv:2406.10788](https://arxiv.org/abs/2406.10788)] [[Code](https://github.com/bdaiinstitute/embodied_gaussians)]\n- **📝 说明**: 🏆 Accepted to CoRL 2024\n\n#### [3] SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting\n- **🧑‍🔬 作者**：Mohammad Nomaan Qureshi, Sparsh Garg, Francisco Yandun, David Held, George Kantor, Abhishesh Silwal\n- **🏫 单位**：Carnegie Mellon University, USA\n- **🔗 链接**：[[中英摘要](../abs/2409.10161.md)] [[arXiv:2409.10161](https://arxiv.org/abs/2409.10161)] [[Code](https://splatsim.github.io/)]\n- **📝 说明**：🏆 Accepted to CoRL 2024 MRM-D Workshop\n\n#### [4] Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling\n- **🧑‍🔬 作者**：Mingtong Zhang, Kaifeng Zhang, Yunzhu Li\n- **🏫 单位**：University of Illinois Urbana-Champaign ⟐ Columbia University\n- **🔗 链接**：[[中英摘要](../abs/2410.18912.md)] [[arXiv:2410.18912](https://arxiv.org/abs/2410.18912)] [[Code](https://github.com/robo-alex/gs-dynamics)]\n- **📝 说明**：🏆 Accepted to CoRL 2024\n\n#### [5] Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Michael Büttner, Jonathan Francis, Helge Rhodin, Andrew Melnik\n- **🏫 单位**：Bielefeld University ⟐ Carnegie Mellon University ⟐ Bremen University\n- **🔗 链接**：[[中英摘要](../abs/2411.03555.md)] [[arXiv:2411.03555](https://arxiv.org/abs/2411.03555)] [Code]\n- **📝 说明**：🏆 Accepted to CoRL 2024 Workshop on Lifelong Learning for Home Robots\n\n#### [6] Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision\n- **🧑‍🔬 作者**：Alberta Longhini, Marcel Büsching, Bardienus P. Duisterhof, Jens Lundell, Jeffrey Ichnowski, Mårten Björkman, Danica Kragic\n- **🏫 单位**: KTH Royal Institute of Technology ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2501.01715.md)] [[arXiv:2501.01715](https://arxiv.org/abs/2501.01715)] [[Code](https://github.com/KTH-RPL/cloth-splatting)]\n- **📝 说明**：🏆 Accepted to CoRL 2024\n"
  },
  {
    "path": "2024/ECCV.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ECCV2024\n\n#### [1] An Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes\n- **🧑‍🔬 作者**：Kai Katsumata, Duc Minh Vo, Hideki Nakayama\n- **🏫 单位**：The University of Tokyo\n- **🔗 链接**：[[中英摘要](../abs/2311.12897.md)] [[arXiv:2311.12897](https://arxiv.org/abs/2311.12897)] [[Code](https://github.com/raven38/EfficientDynamic3DGaussian)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [2] Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing\n- **🧑‍🔬 作者**：Jian Gao, Chun Gu, Youtian Lin, Hao Zhu, Xun Cao, Li Zhang, Yao Yao\n- **🏫 单位**：Nanjing University ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](../abs/2311.16043.md)] [[arXiv:2311.16043](https://arxiv.org/abs/2311.16043)] [[Code](https://github.com/NJU-3DV/Relightable3DGaussian)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [3] Compact3D: Smaller and Faster Gaussian Splatting with Vector Quantization\n- **🧑‍🔬 作者**：KL Navaneet, Kossar Pourahmadi Meibodi, Soroush Abbasi Koohpayegani, Hamed Pirsiavash\n- **🏫 单位**：University of California, Davis\n- **🔗 链接**：[[中英摘要](../abs/2311.18159.md)] [[arXiv:2311.18159](https://arxiv.org/abs/2311.18159)] [[Code](https://github.com/UCDvision/compact3d)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [4] DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Agelos Kratimenos, Jiahui Lei, Kostas Daniilidis\n- **🏫 单位**：University of Pennsylvania\n- **🔗 链接**：[[中英摘要](../abs/2312.00112.md)] [[arXiv:2312.00112](https://arxiv.org/abs/2312.00112)] [[Code](https://github.com/agelosk/dynmf)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [5] FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting\n- **🧑‍🔬 作者**：Zehao Zhu, Zhiwen Fan, Yifan Jiang, Zhangyang Wang\n- **🏫 单位**：University of Texas at Austin\n- **🔗 链接**：[[中英摘要](../abs/2312.00451.md)] [[arXiv:2312.00451](https://arxiv.org/abs/2312.00451)] [[Code](https://github.com/VITA-Group/FSGS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [6] Gaussian Grouping: Segment and Edit Anything in 3D Scenes\n- **🧑‍🔬 作者**：Mingqiao Ye, Martin Danelljan, Fisher Yu, Lei Ke\n- **🏫 单位**：ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2312.00732.md)] [[arXiv:2312.00732](https://arxiv.org/abs/2312.00732)] [[Code](https://github.com/lkeab/gaussian-grouping)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [7] HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Helisa Dhamo, Yinyu Nie, Arthur Moreau, Jifei Song, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero\n- **🏫 单位**：Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2312.02902.md)] [[arXiv:2312.02902](https://arxiv.org/abs/2312.02902)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [8] EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS\n- **🧑‍🔬 作者**：Sharath Girish, Kamal Gupta, Abhinav Shrivastava\n- **🏫 单位**：University of Maryland\n- **🔗 链接**：[[中英摘要](../abs/2312.04564.md)] [[arXiv:2312.04564](https://arxiv.org/abs/2312.04564)] [[Code](https://github.com/Sharath-girish/efficientgaussian)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [9] Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaofeng Yang, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu, Guosheng Lin\n- **🏫 单位**：Nanyang Technological University ⟐ OPPO US Research Center ⟐ A*STAR\n- **🔗 链接**：[[中英摘要](../abs/2312.04820.md)] 
[[arXiv:2312.04820](https://arxiv.org/abs/2312.04820)] [[Code](https://github.com/yangxiaofeng/LODS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [10] Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting\n- **🧑‍🔬 作者**：Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory ⟐ National University of Singapore ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](../abs/2312.13271.md)] [[arXiv:2312.13271](https://arxiv.org/abs/2312.13271)] [[Code](https://github.com/PKU-YuanGroup/repaint123)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [11] Compact 3D Scene Representation via Self-Organizing Gaussian Grids\n- **🧑‍🔬 作者**：Wieland Morgenstern, Florian Barthel, Anna Hilsmann, Peter Eisert\n- **🏫 单位**：Fraunhofer Heinrich Hertz Institute ⟐ Humboldt University of Berlin\n- **🔗 链接**：[[中英摘要](../abs/2312.13299.md)] [[arXiv:2312.13299](https://arxiv.org/abs/2312.13299)] [[Code](https://github.com/fraunhoferhhi/Self-Organizing-Gaussians)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [12] Deblurring 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Byeonghyeon Lee, Howoong Lee, Xiangyu Sun, Usman Ali, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ Hanwha Vision\n- **🔗 链接**：[[中英摘要](../abs/2401.00834.md)] [[arXiv:2401.00834](https://arxiv.org/abs/2401.00834)] [[Code](https://github.com/benhenryL/Deblurring-3D-Gaussian-Splatting)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [13] Street Gaussians for Modeling Dynamic Urban Scenes\n- **🧑‍🔬 作者**：Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng\n- **🏫 单位**：Zhejiang University ⟐ Li Auto\n- **🔗 链接**：[[中英摘要](../abs/2401.01339.md)] [[arXiv:2401.01339](https://arxiv.org/abs/2401.01339)] [[Code](https://github.com/zju3dv/street_gaussians)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [14] On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy\n- **🧑‍🔬 作者**：Letian Huang, Jiayang Bai, Jie Guo, Yuanqi Li, Yanwen Guo\n- **🏫 单位**：Nanjing University\n- **🔗 链接**：[[中英摘要](../abs/2402.00752.md)] [[arXiv:2402.00752](https://arxiv.org/abs/2402.00752)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [15] SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM\n- **🧑‍🔬 作者**：Mingrui Li, Shuhong Liu, Heng Zhou\n- **🏫 单位**：Dalian University of Technology ⟐ The University of Tokyo ⟐ Columbia University\n- **🔗 链接**：[[中英摘要](../abs/2402.03246.md)] [[arXiv:2402.03246](https://arxiv.org/abs/2402.03246)] [[Code](https://github.com/ShuhongLL/SGS-SLAM)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [16] LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation\n- **🧑‍🔬 作者**：Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, Ziwei Liu\n- **🏫 单位**：Peking University ⟐ Nanyang Technological University ⟐ Shanghai AI Lab\n- **🔗 链接**：[[中英摘要](../abs/2402.05054.md)] [[arXiv:2402.05054](https://arxiv.org/abs/2402.05054)] [[Code](https://github.com/3DTopia/LGM)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [17] HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2402.06149.md)] [[arXiv:2402.06149](https://arxiv.org/abs/2402.06149)] [[Code](https://github.com/ZhenglinZhou/HeadStudio)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [18] Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis\n- **🧑‍🔬 作者**：Yuanhao Cai, 
Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille\n- **🏫 单位**：Johns Hopkins University ⟐ HKUST(GZ) ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](../abs/2403.04116.md)] [[arXiv:2403.04116](https://arxiv.org/abs/2403.04116)] [[Code](https://github.com/caiyuanhao1998/X-Gaussian)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [19] BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling\n- **🧑‍🔬 作者**：Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa\n- **🏫 单位**：Johns Hopkins University\n- **🔗 链接**：[[中英摘要](../abs/2403.04926.md)] [[arXiv:2403.04926](https://arxiv.org/abs/2403.04926)] [[Code](https://github.com/snldmt/BAGS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [20] ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation\n- **🧑‍🔬 作者**：Guanxing Lu, Shiyi Zhang, Ziwei Wang, Changliu Liu, Jiwen Lu, Yansong Tang\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School ⟐ Carnegie Mellon University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2403.08321.md)] [[arXiv:2403.08321](https://arxiv.org/abs/2403.08321)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [21] GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing\n- **🧑‍🔬 作者**：Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, Victor Adrian Prisacariu\n- **🏫 单位**：University of Oxford ⟐ Mohamed bin Zayed University of Artificial Intelligence\n- **🔗 链接**：[[中英摘要](../abs/2403.08733.md)] [[arXiv:2403.08733](https://arxiv.org/abs/2403.08733)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [22] Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians\n- **🧑‍🔬 作者**：Licheng Zhong, Hong-Xing Yu, Jiajun Wu, Yunzhu Li\n- **🏫 单位**：Stanford University ⟐ University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](../abs/2403.09434.md)] [[arXiv:2403.09434](https://arxiv.org/abs/2403.09434)] [[Code](https://github.com/Colmar-zlicheng/Spring-Gaus)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [23] GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time\n- **🧑‍🔬 作者**：Hao Li, Yuanyuan Gao, Dingwen Zhang, Chenming Wu, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han\n- **🏫 单位**：Northwestern Polytechnical University ⟐ Baidu Inc. 
⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2403.10147.md)] [[arXiv:2403.10147](https://arxiv.org/abs/2403.10147)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [24] Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration\n- **🧑‍🔬 作者**：Zhihao Liang, Qi Zhang, Wenbo Hu, Ying Feng, Lei Zhu, Kui Jia\n- **🏫 单位**：South China University of Technology ⟐ Tencent AI Lab ⟐ City University of Hong Kong ⟐ The Chinese University of Hong Kong, Shenzhen\n- **🔗 链接**：[[中英摘要](../abs/2403.11056.md)] [[arXiv:2403.11056](https://arxiv.org/abs/2403.11056)] [[Code](https://github.com/lzhnb/Analytic-Splatting)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [25] GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering\n- **🧑‍🔬 作者**：Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari\n- **🏫 单位**：Technical University of Munich ⟐ Tianjin University ⟐ National University of Singapore ⟐ Google\n- **🔗 链接**：[[中英摘要](../abs/2403.11324.md)] [[arXiv:2403.11324](https://arxiv.org/abs/2403.11324)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [26] BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting\n- **🧑‍🔬 作者**：Lingzhe Zhao, Peng Wang, Peidong Liu\n- **🏫 单位**：Westlake University ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2403.11831.md)] [[arXiv:2403.11831](https://arxiv.org/abs/2403.11831)] [[Code](https://github.com/WU-CVGL/BAD-Gaussians)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [27] View-Consistent 3D Editing with Gaussian Splatting\n- **🧑‍🔬 作者**：Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang\n- **🏫 单位**：Nanyang Technological University ⟐ Singapore University of Technology and Design ⟐ Hong Kong University of Science and Technology ⟐ Skywork AI\n- **🔗 链接**：[[中英摘要](../abs/2403.11868.md)] [[arXiv:2403.11868](https://arxiv.org/abs/2403.11868)] [[Code](https://github.com/Yuxuan-W/VcEdit)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [28] RGBD GS-ICP SLAM\n- **🧑‍🔬 作者**：Seongbo Ha, Jiung Yeon, Hyeonwoo Yu\n- **🏫 单位**：Sungkyunkwan University\n- **🔗 链接**：[[中英摘要](../abs/2403.12550.md)] [[arXiv:2403.12550](https://arxiv.org/abs/2403.12550)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [29] GVGEN: Text-to-3D Generation with Volumetric Representation\n- **🧑‍🔬 作者**：Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He\n- **🏫 单位**：Shanghai AI Lab ⟐ Tsinghua Shenzhen International Graduate School ⟐ Shanghai Jiao Tong University ⟐ Zhejiang University ⟐ VAST\n- **🔗 链接**：[[中英摘要](../abs/2403.12957.md)] [[arXiv:2403.12957](https://arxiv.org/abs/2403.12957)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [30] Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion\n- **🧑‍🔬 作者**：Otto Seiskari, Jerry Ylilammi, Valtteri Kaatrasalo, Pekka Rantalankila, Matias Turkulainen, Juho Kannala, Esa Rahtu, Arno Solin\n- **🏫 单位**：Spectacular AI ⟐ ETH Zurich ⟐ Aalto University ⟐ Tampere University\n- **🔗 链接**：[[中英摘要](../abs/2403.13327.md)] [[arXiv:2403.13327](https://arxiv.org/abs/2403.13327)] [[Code](https://github.com/SpectacularAI/3dgs-deblur)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [31] Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians\n- **🧑‍🔬 作者**：Guangchi Fang, Bing Wang\n- **🏫 单位**：The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](../abs/2403.14166.md)] [[arXiv:2403.14166](https://arxiv.org/abs/2403.14166)] [[Code](https://github.com/fatPeter/mini-splatting)]\n- **📝 说明**：🏆 Accepted to ECCV 
2024\n\n#### [32] HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression\n- **🧑‍🔬 作者**：Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Monash University\n- **🔗 链接**：[[中英摘要](../abs/2403.14530.md)] [[arXiv:2403.14530](https://arxiv.org/abs/2403.14530)] [[Code](https://github.com/YihangChen-ee/HAC)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [33] Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering\n- **🧑‍🔬 作者**：Antoine Guédon, Vincent Lepetit\n- **🏫 单位**：Univ Gustave Eiffel\n- **🔗 链接**：[[中英摘要](../abs/2403.14554.md)] [[arXiv:2403.14554](https://arxiv.org/abs/2403.14554)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [34] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images\n- **🧑‍🔬 作者**：Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai\n- **🏫 单位**：Monash University ⟐ ETH Zurich ⟐ University of Tübingen ⟐ University of Oxford ⟐ Microsoft ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2403.14627.md)] [[arXiv:2403.14627](https://arxiv.org/abs/2403.14627)] [[Code](https://github.com/donydchen/mvsplat)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [35] STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians\n- **🧑‍🔬 作者**：Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao\n- **🏫 单位**：Nanjing University ⟐ Institute of Automation, Chinese Academy of Sciences ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](../abs/2403.14939.md)] [[arXiv:2403.14939](https://arxiv.org/abs/2403.14939)] [[Code](https://github.com/zeng-yifei/STAG4D)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [36] Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zheng Zhang, Wenbo Hu, Yixing Lao, Tong He, Hengshuang Zhao\n- **🏫 单位**：The University of Hong Kong ⟐ Tencent AI Lab ⟐ Shanghai AI Lab\n- **🔗 链接**：[[中英摘要](../abs/2403.15530.md)] [[arXiv:2403.15530](https://arxiv.org/abs/2403.15530)] [[Code](https://github.com/zhengzhang01/Pixel-GS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [37] Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections\n- **🧑‍🔬 作者**：Dongbin Zhang, Chuming Wang, Weitao Wang, Peihao Li, Minghan Qin, Haoqian Wang\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2403.15704.md)] [[arXiv:2403.15704](https://arxiv.org/abs/2403.15704)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [38] CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field\n- **🧑‍🔬 作者**：Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2403.16095.md)] [[arXiv:2403.16095](https://arxiv.org/abs/2403.16095)] [[Code](https://github.com/hjr37/CG-SLAM)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [39] latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction\n- **🧑‍🔬 作者**：Christopher Wewer, Kevin Raj, Eddy Ilg, Bernt Schiele, Jan Eric Lenssen\n- **🏫 单位**：Max Planck Institute for Informatics ⟐ Saarland University\n- **🔗 链接**：[[中英摘要](../abs/2403.16292.md)] [[arXiv:2403.16292](https://arxiv.org/abs/2403.16292)] [[Code](https://github.com/Chrixtar/latentsplat)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [40] CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians\n- **🧑‍🔬 作者**：Avinash Paliwal, Wei Ye, Jinhui Xiong, Dmytro Kotovenko, Rakesh Ranjan, Vikas 
Chandra, Nima Khademi Kalantari\n- **🏫 单位**：Texas A&M University ⟐ Meta Reality Labs ⟐ LMU Munich\n- **🔗 链接**：[[中英摘要](../abs/2403.19495.md)] [[arXiv:2403.19495](https://arxiv.org/abs/2403.19495)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [41] CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians\n- **🧑‍🔬 作者**：Yang Liu, He Guan, Chuanchen Luo, Lue Fan, Junran Peng, Zhaoxiang Zhang\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Centre for Artificial Intelligence and Robotics ⟐ State Key Laboratory of Multimodal Artificial Intelligence Systems\n- **🔗 链接**：[[中英摘要](../abs/2404.01133.md)] [[arXiv:2404.01133](https://arxiv.org/abs/2404.01133)] [[Code](https://github.com/DekuLiuTesla/CityGaussian)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [42] Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing\n- **🧑‍🔬 作者**：Ri-Zhao Qiu, Ge Yang, Weijia Zeng, Xiaolong Wang\n- **🏫 单位**：University of California San Diego ⟐ Massachusetts Institute of Technology ⟐ Institute for Artificial Intelligence and Fundamental Interactions\n- **🔗 链接**：[[中英摘要](../abs/2404.01223.md)] [[arXiv:2404.01223](https://arxiv.org/abs/2404.01223)] [[Code](https://github.com/vuer-ai/feature_splatting)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [43] Surface Reconstruction from Gaussian Splatting via Novel Stereo Views\n- **🧑‍🔬 作者**：Yaniv Wolf, Amit Bracha, Ron Kimmel\n- **🏫 单位**：Israel Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2404.01810.md)] [[arXiv:2404.01810](https://arxiv.org/abs/2404.01810)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [44] Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jeongmin Bae, Seoha Kim, Youngsik Yun, Hahyun Lee, Gun Bang, Youngjung Uh\n- **🏫 单位**：Yonsei University ⟐ Electronics and Telecommunications Research Institute\n- **🔗 链接**：[[中英摘要](../abs/2404.03613.md)] [[arXiv:2404.03613](https://arxiv.org/abs/2404.03613)] [[Code](https://github.com/JeongminB/E-D3DGS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [45] PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations\n- **🧑‍🔬 作者**：Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein\n- **🏫 单位**：Stanford University ⟐ Carnegie Mellon University ⟐ Google ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2404.04421.md)] [[arXiv:2404.04421](https://arxiv.org/abs/2404.04421)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [46] Dual-Camera Smooth Zoom on Mobile Phones\n- **🧑‍🔬 作者**：Renlong Wu, Zhilu Zhang, Yu Yang, Wangmeng Zuo\n- **🏫 单位**：Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2404.04908.md)] [[arXiv:2404.04908](https://arxiv.org/abs/2404.04908)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [47] DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting\n- **🧑‍🔬 作者**：Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi\n- **🏫 单位**：University of California, Los Angeles ⟐ University of Texas at Austin ⟐ DEVCOM Army Research Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2404.06903.md)] [[arXiv:2404.06903](https://arxiv.org/abs/2404.06903)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [48] GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal\n- **🧑‍🔬 作者**：Yuxin Wang, Qianyi Wu, Guofeng Zhang, Dan Xu\n- **🏫 
单位**：HKUST ⟐ Monash University ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2404.13679.md)] [[arXiv:2404.13679](https://arxiv.org/abs/2404.13679)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [49] TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting\n- **🧑‍🔬 作者**：Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu\n- **🏫 单位**：Beihang University ⟐ Chinese Academy of Sciences ⟐ Griffith University ⟐ RIKEN AIP ⟐ The University of Tokyo\n- **🔗 链接**：[[中英摘要](../abs/2404.15264.md)] [[arXiv:2404.15264](https://arxiv.org/abs/2404.15264)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [50] DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing\n- **🧑‍🔬 作者**：Minghao Chen, Iro Laina, Andrea Vedaldi\n- **🏫 单位**：Visual Geometry Group, University of Oxford\n- **🔗 链接**：[[中英摘要](../abs/2404.18929.md)] [[arXiv:2404.18929](https://arxiv.org/abs/2404.18929)] [[Code](https://github.com/silent-chen/DGE)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [51] SAGS: Structure-Aware 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song, Jiankang Deng, Stefanos Zafeiriou\n- **🏫 单位**：Imperial College London ⟐ Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2404.19149.md)] [[arXiv:2404.19149](https://arxiv.org/abs/2404.19149)] [[Code](https://github.com/eververas/SAGS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [52] MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections\n- **🧑‍🔬 作者**：Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan\n- **🏫 单位**：Tsinghua University ⟐ Huawei Noah’s Ark Lab ⟐ University of Toronto ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](../abs/2405.11921.md)] [[arXiv:2405.11921](https://arxiv.org/abs/2405.11921)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [53] CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization\n- **🧑‍🔬 作者**：Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng, Xiao Bai\n- **🏫 单位**：Beihang University ⟐  Macquarie University ⟐ RIKEN AIP ⟐ The University of Tokyo\n- **🔗 链接**：[[中英摘要](../abs/2405.12110.md)] [[arXiv:2405.12110](https://arxiv.org/abs/2405.12110)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [54] Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo\n- **🧑‍🔬 作者**：Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu\n- **🏫 单位**：Huazhong University of Science and Technology ⟐  Nanyang Technological University ⟐  Great Bay University ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2405.12218.md)] [[arXiv:2405.12218](https://arxiv.org/abs/2405.12218)] [[Code](https://github.com/TQTQliu/MVSGaussian)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [55] GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction\n- **🧑‍🔬 作者**：Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu\n- **🏫 单位**：Tsinghua University ⟐ University of California, Berkeley ⟐ PhiGent Robotics\n- **🔗 链接**：[[中英摘要](../abs/2405.17429.md)] [[arXiv:2405.17429](https://arxiv.org/abs/2405.17429)] [[Code](https://github.com/huang-yh/GaussianFormer)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [56] DGD: Dynamic 3D Gaussians Distillation\n- **🧑‍🔬 作者**：Isaac Labe, Noam Issachar, Itai Lang, Sagie Benaim\n- **🏫 单位**：The Hebrew University of Jerusalem ⟐ University of Chicago\n- **🔗 链接**：[[中英摘要](../abs/2405.19321.md)] 
[[arXiv:2405.19321](https://arxiv.org/abs/2405.19321)] [[Code](https://github.com/Isaaclabe/DGD-Dynamic-3D-Gaussians-Distillation)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [57] Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture\n- **🧑‍🔬 作者**：X. Li, Y. Cheng, X. Ren, H. Jia, D. Xu, W. Zhu, Y. Yan\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Huawei Cloud Computing Technologies Co., Ltd\n- **🔗 链接**：[[中英摘要](../abs/2406.00440.md)] [[arXiv:2406.00440](https://arxiv.org/abs/2406.00440)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [58] SuperGaussian: Repurposing Video Models for 3D Super Resolution\n- **🧑‍🔬 作者**：Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J. Mitra, Shenlong Wang, Anna Frühstück\n- **🏫 单位**：University of Illinois at Urbana-Champaign ⟐ Adobe Research ⟐ University College London\n- **🔗 链接**：[[中英摘要](../abs/2406.00609.md)] [[arXiv:2406.00609](https://arxiv.org/abs/2406.00609)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [59] End-to-End Rate-Distortion Optimized 3D Gaussian Representation\n- **🧑‍🔬 作者**：Henan Wang, Hanxin Zhu, Tianyu He, Runsen Feng, Jiajun Deng, Jiang Bian, Zhibo Chen\n- **🏫 单位**：University of Science and Technology of China ⟐ Microsoft Research Asia ⟐ The University of Adelaide\n- **🔗 链接**：[[中英摘要](../abs/2406.01597.md)] [[arXiv:2406.01597](https://arxiv.org/abs/2406.01597)] [[Code](https://github.com/USTC-IMCL/RDO-Gaussian)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [60] VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors\n- **🧑‍🔬 作者**：Sungwon Hwang, Min-Jung Kim, Taewoong Kang, Jayeon Kang, Jaegul Choo\n- **🏫 单位**：KAIST ⟐ Ghent University\n- **🔗 链接**：[[中英摘要](../abs/2407.02945.md)] [[arXiv:2407.02945](https://arxiv.org/abs/2407.02945)] [[Code](https://github.com/deepshwang/vegs)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [61] GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction\n- **🧑‍🔬 作者**：Yuxuan Mu, Xinxin Zuo, Chuan Guo, Yilin Wang, Juwei Lu, Xiaofeng Wu, Songcen Xu, Peng Dai, Youliang Yan, Li Cheng\n- **🏫 单位**：University of Alberta\n- **🔗 链接**：[[中英摘要](../abs/2407.04237.md)] [[arXiv:2407.04237](https://arxiv.org/abs/2407.04237)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [62] LaRa: Efficient Large-Baseline Radiance Fields\n- **🧑‍🔬 作者**：Anpei Chen, Haofei Xu, Stefano Esposito, Siyu Tang, Andreas Geiger\n- **🏫 单位**：University of Tübingen ⟐ ETH Zürich\n- **🔗 链接**：[[中英摘要](../abs/2407.04699.md)] [[arXiv:2407.04699](https://arxiv.org/abs/2407.04699)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [63] GaussReg: Fast 3D Registration with Gaussian Splatting\n- **🧑‍🔬 作者**：Jiahao Chang, Yinglin Xu, Yihao Li, Yuantao Chen, Xiaoguang Han\n- **🏫 单位**：School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen ⟐ The Future Network of Intelligence Institute, CUHK-Shenzhen\n- **🔗 链接**：[[中英摘要](../abs/2407.05254.md)] [[arXiv:2407.05254](https://arxiv.org/abs/2407.05254)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [64] MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition\n- **🧑‍🔬 作者**：Aggelina Chatziagapi, Grigorios G. 
Chrysos, Dimitris Samaras\n- **🏫 单位**：Stony Brook University ⟐ University of Wisconsin-Madison\n- **🔗 链接**：[[中英摘要](../abs/2407.07284.md)] [[arXiv:2407.07284](https://arxiv.org/abs/2407.07284)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [65] 3DEgo: 3D Editing on the Go!\n- **🧑‍🔬 作者**：Umar Khalid, Hasan Iqbal, Azib Farooq, Jing Hua, Chen Chen\n- **🏫 单位**：University of Central Florida, Orlando ⟐ Wayne State University ⟐ Miami University\n- **🔗 链接**：[[中英摘要](../abs/2407.10102.md)] [[arXiv:2407.10102](https://arxiv.org/abs/2407.10102)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [66] iHuman: Instant Animatable Digital Humans From Monocular Videos\n- **🧑‍🔬 作者**：Pramish Paudel, Anubhav Khanal, Ajad Chhatkuli, Danda Pani Paudel, Jyoti Tandukar\n- **🏫 单位**：Tribhuvan University, Lalitpur, Nepal ⟐ ETH Zürich ⟐ NAAMI, Kathmandu ⟐ INSAIT, Sofia\n- **🔗 链接**：[[中英摘要](../abs/2407.11174.md)] [[arXiv:2407.11174](https://arxiv.org/abs/2407.11174)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [67] Click-Gaussian: Interactive Segmentation to Any 3D Gaussians\n- **🧑‍🔬 作者**：Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, Hoseok Do\n- **🏫 单位**：AI Lab, CTO Division, LG Electronics, Republic of Korea ⟐  Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2407.11793.md)] [[arXiv:2407.11793](https://arxiv.org/abs/2407.11793)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [68] Generalizable Human Gaussians for Sparse View Synthesis\n- **🧑‍🔬 作者**：Youngjoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella-Montoro, Jianjin Xu, Shingo Takagi, Daeil Kim, Aayush Prakash, Fernando De la Torre\n- **🏫 单位**：Carnegie Mellon University ⟐ Meta Reality Labs\n- **🔗 链接**：[[中英摘要](../abs/2407.12777.md)] [[arXiv:2407.12777](https://arxiv.org/abs/2407.12777)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [69] Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation\n- **🧑‍🔬 作者**：Zongrui Li, Minghui Hu, Qian Zheng, Xudong Jiang\n- **🏫 单位**： Nanyang Technological University ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2407.13584.md)] [[arXiv:2407.13584](https://arxiv.org/abs/2407.13584)] [[Code](https://github.com/LMozart/ECCV2024-GCS-BEG)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [70] 3D Gaussian Parametric Head Model\n- **🧑‍🔬 作者**：Yuelang Xu, Lizhen Wang, Zerong Zheng, Zhaoqi Su, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ NNKosmos\n- **🔗 链接**：[[中英摘要](../abs/2407.15070.md)] [[arXiv:2407.15070](https://arxiv.org/abs/2407.15070)] [[Code](https://github.com/YuelangX/3DGPHM)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [71] 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model\n- **🧑‍🔬 作者**：Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue\n- **🏫 单位**：Fondazione Istituto Italiano di Tecnologia (IIT) ⟐ Fondazione Bruno Kessler (FBK) ⟐ Università di Trento ⟐ Durham University\n- **🔗 链接**：[[中英摘要](../abs/2407.15484.md)] [[arXiv:2407.15484](https://arxiv.org/abs/2407.15484)] [[Code](https://github.com/mbortolon97/6dgs)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [72] Expressive Whole-Body 3D Gaussian Avatar\n- **🧑‍🔬 作者**：Gyeongsik Moon, Takaaki Shiratori, Shunsuke Saito\n- **🏫 单位**：DGIST ⟐ Codec Avatars Lab, Meta\n- **🔗 链接**：[[中英摘要](../abs/2407.21686.md)] [[arXiv:2407.21686](https://arxiv.org/abs/2407.21686)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [73] EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head\n- **🧑‍🔬 作者**：Qianyun He, Xinya 
Ji, Yicheng Gong, Yuanxun Lu, Zhengyu Diao, Linjia Huang, Yao Yao, Siyu Zhu, Zhan Ma, Songcen Xu, Xiaofei Wu, Zixiao Zhang, Xun Cao, Hao Zhu\n- **🏫 单位**：Nanjing University ⟐ Fudan University ⟐ Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2408.00297.md)] [[arXiv:2408.00297](https://arxiv.org/abs/2408.00297)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [74] 3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhe Jun Tang, Tat-Jen Cham\n- **🏫 单位**：Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2408.03753.md)] [[arXiv:2408.03753](https://arxiv.org/abs/2408.03753)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [75] Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras\n- **🧑‍🔬 作者**：Zimu Liao, Siyan Chen, Rong Fu, Yi Wang, Zhongling Su, Hao Luo, Linning Xu, Bo Dai, Hengjie Li, Zhilin Pei, Xingcheng Zhang\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2409.04751.md)] [[arXiv:2409.04751](https://arxiv.org/abs/2409.04751)] [[Code](https://github.com/zmliao/Fisheye-GS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024 Workshop NFBCC\n\n#### [76] Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis\n- **🧑‍🔬 作者**：Qian Chen, Shihao Shu, Xiangzhi Bai\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](../abs/2409.08042.md)] [[arXiv:2409.08042](https://arxiv.org/abs/2409.08042)] [[Code](https://github.com/mzzcdf/Thermal3DGS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [77] FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally\n- **🧑‍🔬 作者**：Qiuhong Shen, Xingyi Yang, Xinchao Wang\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2409.08270.md)] [[arXiv:2409.08270](https://arxiv.org/abs/2409.08270)] [[Code](https://github.com/florinshen/FlashSplat)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [78] MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation\n- **🧑‍🔬 作者**：Shuzhao Xie, Weixiang Zhang, Chen Tang, Yunpeng Bai, Rongwei Lu, Shijia Ge, Zhi Wang\n- **🏫 单位**：SIGS & TBSI, Tsinghua University ⟐ Peng Cheng Laboratory ⟐ MMLab, The Chinese University of Hong Kong ⟐ The University of Texas at Austin\n- **🔗 链接**：[[中英摘要](../abs/2409.09756.md)] [[arXiv:2409.09756](https://arxiv.org/abs/2409.09756)] [[Code](https://github.com/ShuzhaoXie/MesonGS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [79] SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction\n- **🧑‍🔬 作者**：Marko Mihajlovic, Sergey Prokudin, Siyu Tang, Robert Maier, Federica Bogo, Tony Tung, Edmond Boyer\n- **🏫 单位**：ETH Zurich ⟐ Meta Reality Labs ⟐ Balgrist University Hospital\n- **🔗 链接**：[[中英摘要](../abs/2409.11211.md)] [[arXiv:2409.11211](https://arxiv.org/abs/2409.11211)] [[Code](https://github.com/markomih/SplatFields)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [80] Vista3D: Unravel the 3D Darkside of a Single Image\n- **🧑‍🔬 作者**：Qiuhong Shen, Xingyi Yang, Michael Bi Mi, Xinchao Wang\n- **🏫 单位**：National University of Singapore ⟐ Huawei Technologies Ltd\n- **🔗 链接**：[[中英摘要](../abs/2409.12193.md)] [[arXiv:2409.12193](https://arxiv.org/abs/2409.12193)] [[Code](https://github.com/florinshen/Vista3D)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [81] MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views\n- **🧑‍🔬 作者**：Wangze Xu, Huachen Gao, Shihe Shen, Rui Peng, Jianbo Jiao, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ University of Birmingham ⟐ Peng Cheng 
Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2409.14316.md)] [[arXiv:2409.14316](https://arxiv.org/abs/2409.14316)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [82] Gaussian Heritage: 3D Digitization of Cultural Heritage with Integrated Object Segmentation\n- **🧑‍🔬 作者**：Mahtab Dahaghin, Myrna Castillo, Kourosh Riahidehkordi, Matteo Toso, Alessio Del Bue\n- **🏫 单位**：Pattern Analysis and Computer Vision (PAVIS) lab ⟐ Italian Institute of Technology (IIT)\n- **🔗 链接**：[[中英摘要](../abs/2409.19039.md)] [[arXiv:2409.19039](https://arxiv.org/abs/2409.19039)] [[Code](https://github.com/mahtaabdn/GaussianHeritage)]\n- **📝 说明**：🏆 Accepted to ECCV 2024 VISART Workshop\n\n#### [83] Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats\n- **🧑‍🔬 作者**：Mingyang Xie, Haoming Cai, Sachin Shah, Yiran Xu, Brandon Y. Feng, Jia-Bin Huang, Christopher A. Metzler\n- **🏫 单位**：University of Maryland ⟐ Massachusetts Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2410.02764.md)] [[arXiv:2410.02764](https://arxiv.org/abs/2410.02764)] [[Code](https://flash-splat.github.io/)]\n- **📝 说明**：🏆 Accepted to ECCV 2024\n\n#### [84] Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Raja Kumar, Vanshika Vats\n- **🏫 单位**：University of California Santa Cruz\n- **🔗 链接**：[[中英摘要](../abs/2410.11080.md)] [[arXiv:2410.11080](https://arxiv.org/abs/2410.11080)] [[Code](https://github.com/raja-kumar/depth-aware-3DGS)]\n- **📝 说明**：🏆 Accepted to ECCV 2024 S3DSGR Workshop\n\n#### [85] Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yuanbo Chen, Chengyu Zhang, Jason Wang, Xuefan Gao, Avideh Zakhor\n- **🏫 单位**：UC Berkeley\n- **🔗 链接**：[[中英摘要](../abs/2410.11285.md)] [[arXiv:2410.11285](https://arxiv.org/abs/2410.11285)] [Code]\n- **📝 说明**：🏆 Accepted to ECCV 2024 S3DSGR Workshop\n\n#### [86] ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting\n- **🧑‍🔬 作者**：Takuma Nishimura, Andreea Dogaru, Martin Oeggerli, Bernhard Egger\n- **🏫 单位**：Friedrich-Alexander-Universität Erlangen-Nürnberg\n- **🔗 链接**：[[中英摘要](../abs/2410.21310.md)] [[arXiv:2410.21310](https://arxiv.org/abs/2410.21310)] [[Code](https://ronly2460.github.io/ArCSEM/)]\n- **📝 说明**：🏆 Accepted to ECCV 2024 AI for Visual Arts Workshop and Challenges\n"
  },
  {
    "path": "2024/ICLR.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICLR2024\n\n#### [1] DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation\n- **🧑‍🔬 作者**：Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng\n- **🏫 单位**：Peking University ⟐ Nanyang Technological University ⟐ Baidu Inc.\n- **🔗 链接**：[[中英摘要](../abs/2309.16653.md)] [[arXiv:2309.16653](https://arxiv.org/abs/2309.16653)] [[OpenReview](https://openreview.net/forum?id=UyNXMqnN3c)] [[Code](https://github.com/dreamgaussian/dreamgaussian)]\n- **📝 说明**：🏆 ICLR2024 Oral; 🌟 OpenReview Ratings: 8, 10, 8, 8\n\n#### [2] Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting\n- **🧑‍🔬 作者**：Zeyu Yang, Hongye Yang, Zijie Pan, Xiatian Zhu, Li Zhang\n- **🏫 单位**：Fudan University ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](../abs/2310.10642.md)] [[arXiv:2310.10642](https://arxiv.org/abs/2310.10642)] [[OpenReview](https://openreview.net/forum?id=WhgB5sispV)] [[Code](https://github.com/fudan-zvg/4d-gaussian-splatting)]\n- **📝 说明**：🏆 ICLR 2024 poster; 🌟 OpenReview Ratings: 8, 6, 6\n"
  },
  {
    "path": "2024/ICML.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICML2024\n\n#### [1] GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang\n- **🏫 单位**：Peking University ⟐ Google Research ⟐ University of California, Merced\n- **🔗 链接**：[[中英摘要](../abs/2402.07207.md)] [[arXiv:2402.07207](https://arxiv.org/abs/2402.07207)] [[Code](https://github.com/VDIGPKU/GALA3D)]\n- **📝 说明**：🏆 Accepted to ICML 2024\n\n#### [2] GaussianPro: 3D Gaussian Splatting with Progressive Propagation\n- **🧑‍🔬 作者**：Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, Xuejin Chen\n- **🏫 单位**：University of Science and Technology of China ⟐ The University of Hong Kong ⟐ Nanjing University ⟐ The University of Adelaide ⟐ ShanghaiTech University ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](../abs/2402.14650.md)] [[arXiv:2402.14650](https://arxiv.org/abs/2402.14650)] [[Code](https://github.com/kcheng1021/GaussianPro)]\n- **📝 说明**：🏆 Accepted to ICML 2024\n\n#### [3] EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting\n- **🧑‍🔬 作者**：Jiaxu Wang, Junhao He, Ziyi Zhang, Mingyuan Sun, Jingkai Sun, Renjing Xu\n- **🏫 单位**：Hong Kong University of Science and Technology, Guangzhou ⟐ Northeastern University, China\n- **🔗 链接**：[[中英摘要](../abs/2405.14959.md)] [[arXiv:2405.14959](https://arxiv.org/abs/2405.14959)] [Code]\n- **📝 说明**：🏆 Accepted to ICML 2024\n\n#### [4] Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Diwen Wan, Ruijie Lu, Gang Zeng\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](../abs/2406.03697.md)] [[arXiv:2406.03697](https://arxiv.org/abs/2406.03697)] [Code]\n- **📝 说明**：🏆 Accepted to ICML 2024\n"
  },
  {
    "path": "2024/IROS.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to IROS2024\n\n#### [1] PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation\n- **🧑‍🔬 作者**：Lukas Meyer, Floris Erich, Yusuke Yoshiyasu, Marc Stamminger, Noriaki Ando, Yukiyasu Domae\n- **🏫 单位**：Friedrich-Alexander-Universität Erlangen-Nürnberg-Fürth ⟐ Industrial CPS Research Center, National Institute of Advanced Industrial Science and Technology, Japan\n- **🔗 链接**：[[中英摘要](../abs/2401.02281.md)] [[arXiv:2401.02281](https://arxiv.org/abs/2401.02281)] [[Code](https://github.com/meyerls/PEGASUS)]\n- **📝 说明**：🏆 Accepted to IROS 2024\n\n#### [2] Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Aiden Swann, Matthew Strong, Won Kyung Do, Gadiel Sznaier Camps, Mac Schwager, Monroe Kennedy III\n- **🏫 单位**：Stanford University\n- **🔗 链接**：[[中英摘要](../abs/2403.09875.md)] [[arXiv:2403.09875](https://arxiv.org/abs/2403.09875)] [Code]\n- **📝 说明**：🏆 Accepted to IROS 2024\n\n#### [3] DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark\n- **🧑‍🔬 作者**：Tianyi Zhang, Kaining Huang, Weiming Zhi, Matthew Johnson-Roberson\n- **🏫 单位**：Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2403.10814.md)] [[arXiv:2403.10814](https://arxiv.org/abs/2403.10814)] [[Code](https://github.com/tyz1030/neuralight)]\n- **📝 说明**：🏆 Accepted to IROS 2024\n\n#### [4] 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization\n- **🧑‍🔬 作者**：Peng Jiang, Gaurav Pandey, Srikanth Saripalli\n- **🏫 单位**：Texas A&M University\n- **🔗 链接**：[[中英摘要](../abs/2403.11367.md)] [[arXiv:2403.11367](https://arxiv.org/abs/2403.11367)] [Code]\n- **📝 说明**：🏆 Accepted to IROS 2024\n\n#### [5] 3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration\n- **🧑‍🔬 作者**：Quentin Herau, Moussab Bennehar, Arthur Moreau, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric Demonceaux\n- **🏫 单位**：Noah’s Ark ⟐ Universite de Bourgogne ⟐ Universite de Picardie Jules Verne\n- **🔗 链接**：[[中英摘要](../abs/2403.11577.md)] [[arXiv:2403.11577](https://arxiv.org/abs/2403.11577)] [Code]\n- **📝 说明**：🏆 Accepted to IROS 2024\n\n#### [6] High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization\n- **🧑‍🔬 作者**：Shuo Sun, Malcolm Mielle, Achim J. Lilienthal, Martin Magnusson\n- **🏫 单位**：Orebro University ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2403.12535.md)] [[arXiv:2403.12535](https://arxiv.org/abs/2403.12535)] [Code]\n- **📝 说明**：🏆 Accepted to IROS 2024\n\n#### [7] MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements\n- **🧑‍🔬 作者**：Lisong C. Sun, Neel P. Bhatt, Jonathan C. Liu, Zhiwen Fan, Zhangyang Wang, Todd E. 
Humphreys, Ufuk Topcu\n- **🏫 单位**：University of Texas at Austin\n- **🔗 链接**：[[中英摘要](../abs/2404.00923.md)] [[arXiv:2404.00923](https://arxiv.org/abs/2404.00923)] [Code]\n- **📝 说明**：🏆 Accepted to IROS 2024\n\n#### [8] Reality Fusion: Robust Real-time Immersive Mobile Robot Teleoperation with Volumetric Visual Data Fusion\n- **🧑‍🔬 作者**：Ke Li, Reinhard Bacher, Susanne Schmidt, Wim Leemans, Frank Steinicke\n- **🏫 单位**：Deutsches Elektronen-Synchrotron DESY ⟐ University of Hamburg\n- **🔗 链接**：[[中英摘要](../abs/2408.01225.md)] [[arXiv:2408.01225](https://arxiv.org/abs/2408.01225)] [[Code](https://github.com/uhhhci/RealityFusion)]\n- **📝 说明**：🏆 Accepted to IROS 2024\n\n#### [9] Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization\n- **🧑‍🔬 作者**：Christian Schmidt, Jens Piekenbrinck, Bastian Leibe\n- **🏫 单位**：RWTH Aachen University\n- **🔗 链接**：[[中英摘要](../abs/2410.08743.md)] [[arXiv:2410.08743](https://arxiv.org/abs/2410.08743)] [Code]\n- **📝 说明**：🏆 Accepted to IROS 2024\n"
  },
  {
    "path": "2024/MICCAI.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to MICCAI2024\n\n#### [1] Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting\n- **🧑‍🔬 作者**：Lingting Zhu, Zhao Wang, Zhenchao Jin, Guying Lin, Lequan Yu\n- **🏫 单位**：The University of Hong Kong ⟐  The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2401.11535.md)] [[arXiv:2401.11535](https://arxiv.org/abs/2401.11535)] [[Code](https://github.com/HKU-MedAI/EndoGS)]\n- **📝 说明**：🏆 Accepted to MICCAI 2024\n\n#### [2] Endo-4DGS: Distilling Depth Ranking for Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting\n- **🧑‍🔬 作者**：Yiming Huang, Beilei Cui, Long Bai, Ziqi Guo, Mengya Xu, Hongliang Ren\n- **🏫 单位**：The Chinese University of Hong Kong ⟐  Shun Hing Institute of Advanced Engineering, CUHK ⟐ Shenzhen Research Institute, CUHK\n- **🔗 链接**：[[中英摘要](./abs/2401.16416.md)] [[arXiv:2401.16416](https://arxiv.org/abs/2401.16416)] [Code]\n- **📝 说明**：🏆 Accepted to MICCAI 2024\n\n#### [3] Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting\n- **🧑‍🔬 作者**：Shuojue Yang, Qian Li, Daiyun Shen, Bingchen Gong, Qi Dou, Yueming Jin\n- **🏫 单位**： National University of Singapore ⟐ Tsinghua University ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2405.17835.md)] [[arXiv:2405.17835](https://arxiv.org/abs/2405.17835)] [[Code](https://github.com/jinlab-imvr/Deform3DGS)]\n- **📝 说明**：🏆 Accepted to MICCAI 2024\n\n#### [4] LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction\n- **🧑‍🔬 作者**：Hengyu Liu, Yifan Liu, Chenxin Li, Wuyang Li, Yixuan Yuan\n- **🏫 单位**：The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2406.16073.md)] [[arXiv:2406.16073](https://arxiv.org/abs/2406.16073)] [[Code](https://github.com/CUHK-AIM-Group/LGS)]\n- **📝 说明**：🏆 Accepted by MICCAI 2024\n\n#### [5] EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting\n- **🧑‍🔬 作者**：Chenxin Li, Brandon Y. 
Feng, Yifan Liu, Hengyu Liu, Cheng Wang, Weihao Yu, Yixuan Yuan\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Massachusetts Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2407.01029.md)] [[arXiv:2407.01029](https://arxiv.org/abs/2407.01029)] [[Code](https://github.com/CUHK-AIM-Group/EndoSparse)]\n- **📝 说明**：🏆 Accepted to MICCAI 2024\n\n#### [6] Learning 3D Gaussians for Extremely Sparse-View Cone-Beam CT Reconstruction\n- **🧑‍🔬 作者**：Yiqun Lin, Hualiang Wang, Jixiang Chen, Xiaomeng Li\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ HKUST Shenzhen-Hong Kong Collaborative Innovation Research Institute\n- **🔗 链接**：[[中英摘要](../abs/2407.01090.md)] [[arXiv:2407.01090](https://arxiv.org/abs/2407.01090)] [[Code](https://github.com/xmed-lab/DIF-Gaussian)]\n- **📝 说明**：🏆 Accepted to MICCAI 2024\n\n#### [7] Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction\n- **🧑‍🔬 作者**：Jiaxin Guo, Jiangliu Wang, Di Kang, Wenzhen Dong, Wenting Wang, Yun-hui Liu\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Tencent AI Lab ⟐ Hong Kong Center for Logistics Robotics\n- **🔗 链接**：[[中英摘要](../abs/2407.02918.md)] [[arXiv:2407.02918](https://arxiv.org/abs/2407.02918)] [[Code](https://github.com/wrld/Free-SurGS)]\n- **📝 说明**：🏆 Accepted to MICCAI 2024\n\n#### [8] Realistic Surgical Image Dataset Generation Based On 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Tianle Zeng, Gerardo Loza Galindo, Junlei Hu, Pietro Valdastri, Dominic Jones\n- **🏫 单位**：University of Leeds\n- **🔗 链接**：[[中英摘要](../abs/2407.14846.md)] [[arXiv:2407.14846](https://arxiv.org/abs/2407.14846)] [Code]\n- **📝 说明**：🏆 Accepted to MICCAI 2024\n"
  },
  {
    "path": "2024/NeurIPS.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to NeurIPS2024\n\n#### [1] LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS\n- **🧑‍🔬 作者**：Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang\n- **🏫 单位**：University of Texas at Austin ⟐ Xiamen University\n- **🔗 链接**：[[中英摘要](../abs/2311.17245.md)] [[arXiv:2311.17245](https://arxiv.org/abs/2311.17245)] [[Code](https://github.com/VITA-Group/LightGaussian)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [2] Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Ziyi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin\n- **🏫 单位**：Zhejiang University ⟐ The University of Hong Kong ⟐ ByteDance Inc.\n- **🔗 链接**：[[中英摘要](../abs/2402.15870.md)] [[arXiv:2402.15870](https://arxiv.org/abs/2402.15870)] [[Code](https://github.com/ingra14m/Specular-Gaussians)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [3] GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction\n- **🧑‍🔬 作者**：Mulin Yu, Tao Lu, Linning Xu, Lihan Jiang, Yuanbo Xiangli, Bo Dai\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory ⟐ The Chinese University of Hong Kong ⟐ University of Science and Technology of China ⟐ Cornell University\n- **🔗 链接**：[[中英摘要](../abs/2403.16964.md)] [[arXiv:2403.16964](https://arxiv.org/abs/2403.16964)] [[Code](https://github.com/city-super/GSDF)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [4] GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling\n- **🧑‍🔬 作者**：Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo\n- **🏫 单位**：University of Science and Technology of China ⟐ Tsinghua University ⟐ Microsoft Research Asia\n- **🔗 链接**：[[中英摘要](../abs/2403.19655.md)] [[arXiv:2403.19655](https://arxiv.org/abs/2403.19655)] [[Code](https://github.com/GaussianCube/GaussianCube)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [5] 3D Gaussian Splatting as Markov Chain Monte Carlo\n- **🧑‍🔬 作者**：Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Weiwei Sun, Jeff Tseng, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi\n- **🏫 单位**：University of British Columbia ⟐ Google Research ⟐ Google DeepMind ⟐ Simon Fraser University ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](../abs/2404.09591.md)] [[arXiv:2404.09591](https://arxiv.org/abs/2404.09591)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [6] DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus\n- **🧑‍🔬 作者**：Yu Chen, Gim Hee Lee\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2405.13943.md)] [[arXiv:2405.13943](https://arxiv.org/abs/2405.13943)] [[Code](https://github.com/AIBluefisher/DOGS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [7] NeuroGauss4D-PCI: 4D Neural Fields and Gaussian Deformation Fields for Point Cloud Interpolation\n- **🧑‍🔬 作者**：Chaokang Jiang, Dalong Du, Jiuming Liu, Siting Zhu, Zhenqiang Liu, Zhuang Ma, Zhujin Liang, Jie Zhou\n- **🏫 单位**：PhiGent Robotics ⟐ Shanghai Jiaotong University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2405.14241.md)] [[arXiv:2405.14241](https://arxiv.org/abs/2405.14241)] [[Code](https://github.com/jiangchaokang/NeuroGauss4D-PCI)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [8] D-MiSo: Editing Dynamic 3D Scenes using Multi-Gaussians Soup\n- **🧑‍🔬 作者**：Joanna Waczyńska, Piotr Borycki, Joanna Kaleta, Sławomir Tadeja, Przemysław Spurek\n- 
**🏫 单位**：Jagiellonian University ⟐ Warsaw University of Technology ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](../abs/2405.14276.md)] [[arXiv:2405.14276](https://arxiv.org/abs/2405.14276)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [9] GS-Hider: Hiding Messages into 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang\n- **🏫 单位**：Peking University ⟐ Harbin Institute of Technology (Shenzhen)\n- **🔗 链接**：[[中英摘要](../abs/2405.15118.md)] [[arXiv:2405.15118](https://arxiv.org/abs/2405.15118)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [10] HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting\n- **🧑‍🔬 作者**：Yuanhao Cai, Zihao Xiao, Yixun Liang, Yulun Zhang, Xiaokang Yang, Yaoyao Liu, Alan Yuille\n- **🏫 单位**：Johns Hopkins University ⟐ HKUST (GZ) ⟐ Tsinghua University ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](../abs/2405.15125.md)] [[arXiv:2405.15125](https://arxiv.org/abs/2405.15125)] [[Code](https://github.com/caiyuanhao1998/HDR-GS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [11] DisC-GS: Discontinuity-aware Gaussian Splatting\n- **🧑‍🔬 作者**：Haoxuan Qu, Zhuoling Li, Hossein Rahmani, Yujun Cai, Jun Liu\n- **🏫 单位**：Singapore University of Technology and Design ⟐ Central South University ⟐ Lancaster University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2405.15196.md)] [[arXiv:2405.15196](https://arxiv.org/abs/2405.15196)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [12] Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels\n- **🧑‍🔬 作者**：Yikai Wang, Xinzhou Wang, Zilong Chen, Zhengyi Wang, Fuchun Sun, Jun Zhu\n- **🏫 单位**：Tsinghua University ⟐ ShengShu ⟐ Tongji University\n- **🔗 链接**：[[中英摘要](../abs/2405.16822.md)] [[arXiv:2405.16822](https://arxiv.org/abs/2405.16822)] [[Code](https://github.com/yikaiw/vidu4d)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [13] Memorize What Matters: Emergent Scene Decomposition from Multitraverse\n- **🧑‍🔬 作者**：Yiming Li, Zehong Wang, Yue Wang, Zhiding Yu, Zan Gojcic, Marco Pavone, Chen Feng, Jose M. 
Alvarez\n- **🏫 单位**：NYU ⟐ NVIDIA ⟐ USC ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](../abs/2405.17187.md)] [[arXiv:2405.17187](https://arxiv.org/abs/2405.17187)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [14] DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos\n- **🧑‍🔬 作者**：Linhan Wang, Kai Cheng, Shuo Lei, Shengkun Wang, Wei Yin, Chenyang Lei, Xiaoxiao Long, Chang-Tien Lu\n- **🏫 单位**：Virginia Tech ⟐ Hong Kong University ⟐ USTC ⟐ University of Adelaide ⟐ CAIR ⟐ Sony Research\n- **🔗 链接**：[[中英摘要](../abs/2405.17705.md)] [[arXiv:2405.17705](https://arxiv.org/abs/2405.17705)] [[Code](https://github.com/linhanwang/DC-Gaussian)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [15] FreeSplat: Generalizable 3D Gaussian Splatting Towards Free View Synthesis of Indoor Scenes\n- **🧑‍🔬 作者**：Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2405.17958.md)] [[arXiv:2405.17958](https://arxiv.org/abs/2405.17958)] [[Code](https://github.com/wangys16/FreeSplat)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [16] R2-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction\n- **🧑‍🔬 作者**：Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li\n- **🏫 单位**：Australian National University ⟐ Johns Hopkins University ⟐ University of Technology Sydney\n- **🔗 链接**：[[中英摘要](../abs/2405.20693.md)] [[arXiv:2405.20693](https://arxiv.org/abs/2405.20693)] [[Code](https://github.com/Ruyi-Zha/r2_gaussian)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [17] ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model\n- **🧑‍🔬 作者**：Yufei Wang, Zhihao Li, Lanqing Guo, Wenhan Yang, Alex C. Kot, Bihan Wen\n- **🏫 单位**：Nanyang Technological University ⟐ PengCheng Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2405.20721.md)] [[arXiv:2405.20721](https://arxiv.org/abs/2405.20721)] [[Code](https://github.com/wyf0912/ContextGS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [18] Tetrahedron Splatting for 3D Generation\n- **🧑‍🔬 作者**：Chun Gu, Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang\n- **🏫 单位**：Fudan University ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](../abs/2406.01579.md)] [[arXiv:2406.01579](https://arxiv.org/abs/2406.01579)] [[Code](https://github.com/fudan-zvg/tet-splatting)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [19] OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding\n- **🧑‍🔬 作者**：Yanmin Wu, Jiarui Meng, Haijie Li, Chenming Wu, Yahao Shi, Xinhua Cheng, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Jian Zhang\n- **🏫 单位**：Peking University ⟐ Baidu VIS ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](../abs/2406.02058.md)] [[arXiv:2406.02058](https://arxiv.org/abs/2406.02058)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [20] DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering\n- **🧑‍🔬 作者**：Zhongpai Gao, Benjamin Planche, Meng Zheng, Xiao Chen, Terrence Chen, Ziyan Wu\n- **🏫 单位**：United Imaging Intelligence, Boston, MA\n- **🔗 链接**：[[中英摘要](../abs/2406.02518.md)] [[arXiv:2406.02518](https://arxiv.org/abs/2406.02518)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [21] GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats\n- **🧑‍🔬 作者**：Sangeek Hyun, Jae-Pil Heo\n- **🏫 单位**：Sungkyunkwan University\n- **🔗 链接**：[[中英摘要](../abs/2406.02968.md)] [[arXiv:2406.02968](https://arxiv.org/abs/2406.02968)] [[Code](https://github.com/hse1032/GSGAN)]\n- **📝 说明**：🏆 Accepted to NeurIPS 
2024\n\n#### [22] Dynamic 3D Gaussian Fields for Urban Areas\n- **🧑‍🔬 作者**：Tobias Fischer, Jonas Kulhanek, Samuel Rota Bulò, Lorenzo Porzi, Marc Pollefeys, Peter Kontschieder\n- **🏫 单位**：ETH Zürich ⟐ Meta Reality Labs ⟐ CTU Prague\n- **🔗 链接**：[[中英摘要](../abs/2406.03175.md)] [[arXiv:2406.03175](https://arxiv.org/abs/2406.03175)] [[Code](https://github.com/tobiasfshr/map4d)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [23] VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction\n- **🧑‍🔬 作者**：Hanlin Chen, Fangyin Wei, Chen Li, Tianxin Huang, Yunsong Wang, Gim Hee Lee\n- **🏫 单位**：National University of Singapore ⟐ Princeton University\n- **🔗 链接**：[[中英摘要](../abs/2406.05774.md)] [[arXiv:2406.05774](https://arxiv.org/abs/2406.05774)] [[Code](https://github.com/HLinChen/VCR-GauS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [24] Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis\n- **🧑‍🔬 作者**：Xin Jin, Pengyi Jiao, Zheng-Peng Duan, Xingchao Yang, Chun-Le Guo, Bo Ren, Chongyi Li\n- **🏫 单位**：Nankai University ⟐ MEGVII Technology\n- **🔗 链接**：[[中英摘要](../abs/2406.06216.md)] [[arXiv:2406.06216](https://arxiv.org/abs/2406.06216)] [[Code](https://github.com/Srameo/LE3D)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [25] MVGamba: Unify 3D Content Generation as State Space Sequence Modeling\n- **🧑‍🔬 作者**：Xuanyu Yi, Zike Wu, Qiuhong Shen, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Shuicheng Yan, Xinchao Wang, Hanwang Zhang\n- **🏫 单位**：Nanyang Technological University ⟐ National University of Singapore ⟐ University of British Columbia ⟐ Singapore Management University ⟐ Institute for Infocomm Research ⟐ Skywork AI\n- **🔗 链接**：[[中英摘要](../abs/2406.06367.md)] [[arXiv:2406.06367](https://arxiv.org/abs/2406.06367)] [[Code](https://github.com/SkyworkAI/MVGamba)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [26] From Chaos to Clarity: 3DGS in the Dark\n- **🧑‍🔬 作者**：Zhihao Li, Yufei Wang, Alex Kot, Bihan Wen\n- **🏫 单位**：Department of EEE ⟐ Nanyang Technological University, Singapore\n- **🔗 链接**：[[中英摘要](../abs/2406.08300.md)] [[arXiv:2406.08300](https://arxiv.org/abs/2406.08300)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [27] Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models\n- **🧑‍🔬 作者**：Yuxuan Xue, Xianghui Xie, Riccardo Marin, Gerard Pons-Moll\n- **🏫 单位**：University of Tübingen ⟐ Tübingen AI Center ⟐ Max Planck Institute for Informatics, Saarland Informatics Campus\n- **🔗 链接**：[[中英摘要](../abs/2406.08475.md)] [[arXiv:2406.08475](https://arxiv.org/abs/2406.08475)] [[Code](https://github.com/YuxuanSnow/Human3Diffusion/)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [28] L4GM: Large 4D Gaussian Reconstruction Model\n- **🧑‍🔬 作者**：Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling\n- **🏫 单位**：NVIDIA ⟐ University of Toronto ⟐ University of Cambridge ⟐ MIT ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2406.10324.md)] [[arXiv:2406.10324](https://arxiv.org/abs/2406.10324)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [29] Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Junha Hyung, Susung Hong, Sungwon Hwang, Jaeseong Lee, Jaegul Choo, Jin-Hwa Kim\n- **🏫 单位**：KAIST ⟐ NAVER AI Lab ⟐ SNU AIIS ⟐ Korea University\n- **🔗 链接**：[[中英摘要](../abs/2406.11672.md)] [[arXiv:2406.11672](https://arxiv.org/abs/2406.11672)] [Code]\n- **📝 说明**：🏆 Accepted 
to NeurIPS 2024\n\n#### [30] HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors\n- **🧑‍🔬 作者**：Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu\n- **🏫 单位**：ByteDance ⟐ Peking University ⟐ Xiamen University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2406.12459.md)] [[arXiv:2406.12459](https://arxiv.org/abs/2406.12459)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [31] Splatter a Video: Video Gaussian Representation for Versatile Processing\n- **🧑‍🔬 作者**：Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi\n- **🏫 单位**：The University of Hong Kong ⟐ VAST\n- **🔗 链接**：[[中英摘要](../abs/2406.13870.md)] [[arXiv:2406.13870](https://arxiv.org/abs/2406.13870)] [[Code](https://github.com/SunYangtian/Splatter_A_Video)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [32] Gaussian-Informed Continuum for Physical Property Identification and Simulation\n- **🧑‍🔬 作者**：Junhao Cai, Yuji Yang, Weihao Yuan, Yisheng He, Zilong Dong, Liefeng Bo, Hui Cheng, Qifeng Chen\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ Sun Yat-sen University ⟐ Alibaba Group\n- **🔗 链接**：[[中英摘要](../abs/2406.14927.md)] [[arXiv:2406.14927](https://arxiv.org/abs/2406.14927)] [[Code](https://github.com/Jukgei/gic)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [33] GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation\n- **🧑‍🔬 作者**：Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang\n- **🏫 单位**：Tsinghua University ⟐ Alibaba Group\n- **🔗 链接**：[[中英摘要](../abs/2406.15333.md)] [[arXiv:2406.15333](https://arxiv.org/abs/2406.15333)] [[Code](https://github.com/alibaba-yuanjing-aigclab/GeoLRM)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [34] Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text\n- **🧑‍🔬 作者**：Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji\n- **🏫 单位**：Xiamen University ⟐  Shanghai Artificial Intelligence Laboratory ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2406.17601.md)] [[arXiv:2406.17601](https://arxiv.org/abs/2406.17601)] [[Code](https://github.com/imlixinyang/director3d)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [35] Expressive Gaussian Human Avatars from Monocular RGB Video\n- **🧑‍🔬 作者**：Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang\n- **🏫 单位**：University of Texas at Austin  ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](../abs/2407.03204.md)] [[arXiv:2407.03204](https://arxiv.org/abs/2407.03204)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [36] Reference-based Controllable Scene Stylization with Gaussian Splatting\n- **🧑‍🔬 作者**：Yiqun Mei, Jiacong Xu, Vishal M. 
Patel\n- **🏫 单位**：Johns Hopkins University\n- **🔗 链接**：[[中英摘要](../abs/2407.07220.md)] [[arXiv:2407.07220](https://arxiv.org/abs/2407.07220)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [37] WildGaussians: 3D Gaussian Splatting in the Wild\n- **🧑‍🔬 作者**：Jonas Kulhanek, Songyou Peng, Zuzana Kukelova, Marc Pollefeys, Torsten Sattler\n- **🏫 单位**： Czech Technical University in Prague ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2407.08447.md)] [[arXiv:2407.08447](https://arxiv.org/abs/2407.08447)] [[Code](https://github.com/jkulhanek/wild-gaussians/)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [38] DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation\n- **🧑‍🔬 作者**：Zhiqi Li, Yiming Chen, Peidong Liu\n- **🏫 单位**：Zhejiang University ⟐ Westlake University\n- **🔗 链接**：[[中英摘要](../abs/2410.06756.md)] [[arXiv:2410.06756](https://arxiv.org/abs/2410.06756)] [[Code](https://github.com/WU-CVGL/DreamMesh4D)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [39] MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Ruijie Zhu, Yanzhe Liang, Hanzhi Chang, Jiacheng Deng, Jiahao Lu, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](../abs/2410.07707.md)] [[arXiv:2410.07707](https://arxiv.org/abs/2410.07707)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [40] Generalizable and Animatable Gaussian Head Avatar\n- **🧑‍🔬 作者**：Xuangeng Chu, Tatsuya Harada\n- **🏫 单位**：The University of Tokyo\n- **🔗 链接**：[[中英摘要](../abs/2410.07971.md)] [[arXiv:2410.07971](https://arxiv.org/abs/2410.07971)] [[Code](https://github.com/xg-chu/GAGAvatar)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [41] Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics\n- **🧑‍🔬 作者**：Junyi Cao, Shanyan Guan, Yanhao Ge, Wei Li, Xiaokang Yang, Chao Ma\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ vivo Mobile Communication Co., Ltd.\n- **🔗 链接**：[[中英摘要](../abs/2410.08257.md)] [[arXiv:2410.08257](https://arxiv.org/abs/2410.08257)] [[Code](https://github.com/XJay18/NeuMA)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [42] Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars\n- **🧑‍🔬 作者**：Xuan Huang, Hanhui Li, Wanquan Liu, Xiaodan Liang, Yiqiang Yan, Yuhao Cheng, Chengqiang Gao\n- **🏫 单位**：Shenzhen Campus of Sun Yat-Sen University ⟐ Lenovo Research\n- **🔗 链接**：[[中英摘要](../abs/2410.08840.md)] [[arXiv:2410.08840](https://arxiv.org/abs/2410.08840)] [[Code](https://github.com/XuanHuang0/GuassianHand)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [43] DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering\n- **🧑‍🔬 作者**：Jiahao Lu, Jiacheng Deng, Ruijie Zhu, Yanzhe Liang, Wenfei Yang, Tianzhu Zhang, Xu Zhou\n- **🏫 单位**：University of Science and Technology of China ⟐ Deep Space Exploration Lab ⟐ Sangfor Technologies Inc\n- **🔗 链接**：[[中英摘要](../abs/2410.13607.md)] [[arXiv:2410.13607](https://arxiv.org/abs/2410.13607)] [[Code](https://github.com/peoplelu/DN-4DGS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [44] Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set\n- **🧑‍🔬 作者**：Wenyuan Zhang, Yu-Shen Liu, Zhizhong Han\n- **🏫 单位**：Tsinghua University ⟐ Wayne State University\n- **🔗 链接**：[[中英摘要](../abs/2410.14189.md)] [[arXiv:2410.14189](https://arxiv.org/abs/2410.14189)] [[Code](https://github.com/wen-yuan-zhang/GS-Pull)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [45] 
Fully Explicit Dynamic Gaussian Splatting\n- **🧑‍🔬 作者**：Junoh Lee, Chang-Yeon Won, Hyunjun Jung, Inhwan Bae, Hae-Gon Jeon\n- **🏫 单位**：School of Electrical Engineering and Computer Science, GIST ⟐ AI Graduate School, GIST\n- **🔗 链接**：[[中英摘要](../abs/2410.15629.md)] [[arXiv:2410.15629](https://arxiv.org/abs/2410.15629)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [46] 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors\n- **🧑‍🔬 作者**：Xi Liu, Chaoyi Zhou, Siyu Huang\n- **🏫 单位**：Clemson University\n- **🔗 链接**：[[中英摘要](../abs/2410.16266.md)] [[arXiv:2410.16266](https://arxiv.org/abs/2410.16266)] [[Code](https://github.com/xiliu8006/3DGS-Enhancer)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [47] Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis\n- **🧑‍🔬 作者**：Liang Han, Junsheng Zhou, Yu-Shen Liu, Zhizhong Han\n- **🏫 单位**：Tsinghua University ⟐ Wayne State University\n- **🔗 链接**：[[中英摘要](../abs/2410.18822.md)] [[arXiv:2410.18822](https://arxiv.org/abs/2410.18822)] [[Code](https://github.com/hanl2010/Binocular3DGS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [48] DiffGS: Functional Gaussian Splatting Diffusion\n- **🧑‍🔬 作者**：Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2410.19657.md)] [[arXiv:2410.19657](https://arxiv.org/abs/2410.19657)] [[Code](https://github.com/weiqi-zhang/DiffGS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [49] SCube: Instant Large-Scale Scene Reconstruction using VoxSplats\n- **🧑‍🔬 作者**：Xuanchi Ren, Yifan Lu, Hanxue Liang, Zhangjie Wu, Huan Ling, Mike Chen, Sanja Fidler, Francis Williams, Jiahui Huang\n- **🏫 单位**：NVIDIA ⟐ University of Toronto ⟐ Vector Institute ⟐ Shanghai Jiao Tong University ⟐ University of Cambridge ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2410.20030.md)] [[arXiv:2410.20030](https://arxiv.org/abs/2410.20030)] [[Code](https://github.com/nv-tlabs/SCube)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [50] Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering\n- **🧑‍🔬 作者**：Meng Wei, Qianyi Wu, Jianmin Zheng, Hamid Rezatofighi, Jianfei Cai\n- **🏫 单位**：Monash University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2410.20593.md)] [[arXiv:2410.20593](https://arxiv.org/abs/2410.20593)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [51] ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings\n- **🧑‍🔬 作者**：Suyoung Lee, Jaeyoung Chung, Jaeyoo Huh, Kyoung Mu Lee\n- **🏫 单位**：Dept.
of ECE & ASRI ⟐ Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2410.20686.md)] [[arXiv:2410.20686](https://arxiv.org/abs/2410.20686)] [[Code](https://github.com/esw0116/ODGS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [52] Grid4D: 4D Decomposed Hash Encoding for High-fidelity Dynamic Gaussian Splatting\n- **🧑‍🔬 作者**：Jiawei Xu, Zexin Fan, Jian Yang, Jin Xie\n- **🏫 单位**：Nankai University ⟐ Nanjing University\n- **🔗 链接**：[[中英摘要](../abs/2410.20815.md)] [[arXiv:2410.20815](https://arxiv.org/abs/2410.20815)] [[Code](https://github.com/JiaweiXu8/Grid4D)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [53] MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps\n- **🧑‍🔬 作者**：Yating Xu, Chen Li, Gim Hee Lee\n- **🏫 单位**：Department of Computer Science, National University of Singapore ⟐ Institute of High Performance Computing, A*STAR ⟐ Centre for Frontier AI Research, A*STAR\n- **🔗 链接**：[[中英摘要](../abs/2410.21566.md)] [[arXiv:2410.21566](https://arxiv.org/abs/2410.21566)] [[Code](https://github.com/Pixie8888/MVSDet)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [54] Geometry Cloak: Preventing TGS-based 3D Reconstruction from Copyrighted Images\n- **🧑‍🔬 作者**：Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan\n- **🏫 单位**：Department of Computer Science, Hong Kong Baptist University ⟐ NVIDIA AI Technology Center, NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2410.22705.md)] [[arXiv:2410.22705](https://arxiv.org/abs/2410.22705)] [[Code](https://github.com/qsong2001/Geometry-Cloak)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [55] Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis\n- **🧑‍🔬 作者**：Zhiyuan Min, Yawei Luo, Jianwen Sun, Yi Yang\n- **🏫 单位**：Zhejiang University ⟐ Central China Normal University\n- **🔗 链接**：[[中英摘要](../abs/2410.22817.md)] [[arXiv:2410.22817](https://arxiv.org/abs/2410.22817)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [56] GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring\n- **🧑‍🔬 作者**：Dongwoo Lee, Joonkyu Park, Kyoung Mu Lee\n- **🏫 单位**：Dept.
of ECE&ASRI ⟐ IPAI, Seoul National University, Korea\n- **🔗 链接**：[[中英摘要](../abs/2410.23658.md)] [[arXiv:2410.23658](https://arxiv.org/abs/2410.23658)] [[Code](https://github.com/dongwoohhh/GS-Blur)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024 Datasets & Benchmarks Track\n\n#### [57] GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiufeng Huang, Ruiqi Li, Yiu-ming Cheung, Ka Chun Cheung, Simon See, Renjie Wan\n- **🏫 单位**：Department of Computer Science, Hong Kong Baptist University ⟐ NVIDIA AI Technology Center, NVAITC\n- **🔗 链接**：[[中英摘要](../abs/2410.23718.md)] [[arXiv:2410.23718](https://arxiv.org/abs/2410.23718)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [58] GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes\n- **🧑‍🔬 作者**：Gaochao Song, Chong Cheng, Hao Wang\n- **🏫 单位**：AI Thrust, HKUST(GZ)\n- **🔗 链接**：[[中英摘要](../abs/2411.01853.md)] [[arXiv:2411.01853](https://arxiv.org/abs/2411.01853)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [59] FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training\n- **🧑‍🔬 作者**：Ruihong Yin, Vladimir Yugay, Yue Li, Sezer Karaoglu, Theo Gevers\n- **🏫 单位**：University of Amsterdam\n- **🔗 链接**：[[中英摘要](../abs/2411.02229.md)] [[arXiv:2411.02229](https://arxiv.org/abs/2411.02229)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [60] Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis\n- **🧑‍🔬 作者**：Rui Peng, Wangze Xu, Luyang Tang, Liwei Liao, Jianbo Jiao, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory ⟐ University of Birmingham\n- **🔗 链接**：[[中英摘要](../abs/2411.03637.md)] [[arXiv:2411.03637](https://arxiv.org/abs/2411.03637)] [[Code](https://github.com/prstrive/SCGaussian)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [61] MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views\n- **🧑‍🔬 作者**：Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, Jianfei Cai\n- **🏫 单位**：Monash University ⟐ VGG, University of Oxford ⟐ ETH Zurich ⟐ University of Tübingen ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2411.04924.md)] [[arXiv:2411.04924](https://arxiv.org/abs/2411.04924)] [[Code](https://github.com/donydchen/mvsplat360)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [62] ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing\n- **🧑‍🔬 作者**：Jun-Kun Chen, Yu-Xiong Wang\n- **🏫 单位**：University of Illinois at Urbana-Champaign\n- **🔗 链接**：[[中英摘要](../abs/2411.05006.md)] [[arXiv:2411.05006](https://arxiv.org/abs/2411.05006)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [63] HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Qiankun Gao, Jiarui Meng, Chengxiang Wen, Jie Chen, Jian Zhang\n- **🏫 单位**：School of Electronic and Computer Engineering, Peking University ⟐ Peng Cheng Laboratory ⟐ Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology\n- **🔗 链接**：[[中英摘要](../abs/2411.07541.md)] [[arXiv:2411.07541](https://arxiv.org/abs/2411.07541)] [[Code](https://github.com/gqk/HiCoM)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [64] GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Umangi Jain, Ashkan Mirzaei, Igor Gilitschenski\n- **🏫 单位**：University of Toronto\n- **🔗 链接**：[[中英摘要](../abs/2411.07555.md)] [[arXiv:2411.07555](https://arxiv.org/abs/2411.07555)] [Code]\n- **📝 说明**：🏆 
Accepted to NeurIPS 2024\n\n#### [65] DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization\n- **🧑‍🔬 作者**：Yueming Xu, Haochen Jiang, Zhongyang Xiao, Jianfeng Feng, Li Zhang\n- **🏫 单位**：Fudan University ⟐ Autonomous Driving Division, NIO\n- **🔗 链接**：[[中英摘要](../abs/2411.08373.md)] [[arXiv:2411.08373](https://arxiv.org/abs/2411.08373)] [[Code](https://github.com/fudan-zvg/DG-SLAM)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [66] 4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization\n- **🧑‍🔬 作者**：Mijeong Kim, Jongwoo Lim, Bohyung Han\n- **🏫 单位**：ECE ⟐ ME ⟐ IPAI, Seoul National University, South Korea\n- **🔗 链接**：[[中英摘要](../abs/2411.08879.md)] [[arXiv:2411.08879](https://arxiv.org/abs/2411.08879)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [67] Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects\n- **🧑‍🔬 作者**：Abdurrahman Zeybey, Mehmet Ergezer, Tommy Nguyen\n- **🏫 单位**：Wentworth Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2412.02803.md)] [[arXiv:2412.02803](https://arxiv.org/abs/2412.02803)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024 Safe Generative AI Workshop\n\n#### [68] QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos\n- **🧑‍🔬 作者**：Sharath Girish, Tianye Li, Amrita Mazumdar, Abhinav Shrivastava, David Luebke, Shalini De Mello\n- **🏫 单位**：University of Maryland ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2412.04469.md)] [[arXiv:2412.04469](https://arxiv.org/abs/2412.04469)] [Code]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [69] Template-free Articulated Gaussian Splatting for Real-time Reposable Dynamic View Synthesis\n- **🧑‍🔬 作者**：Diwen Wan, Yuxiang Wang, Ruijie Lu, Gang Zeng\n- **🏫 单位**：National Key Laboratory of General Artificial Intelligence, School of IST, Peking University, China\n- **🔗 链接**：[[中英摘要](../abs/2412.05570.md)] [[arXiv:2412.05570](https://arxiv.org/abs/2412.05570)] [[Code](https://github.com/dnvtmf/SK_GS)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n\n#### [70] Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images\n- **🧑‍🔬 作者**：Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, Yueqi Duan\n- **🏫 单位**: Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2503.16338.md)] [[arXiv:2503.16338](https://arxiv.org/abs/2503.16338)] [[Code](https://github.com/shengjun-zhang/GGN)]\n- **📝 说明**：🏆 Accepted to NeurIPS 2024\n"
  },
  {
    "path": "2024/SIGGRAPH.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to SIGGRAPH2024\n\n#### [1] MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar\n- **🧑‍🔬 作者**：Yufan Chen, Lizhen Wang, Qijing Li, Hongjiang Xiao, Shengping Zhang, Hongxun Yao, Yebin Liu\n- **🏫 单位**：Harbin Institute of Technology ⟐ Tsinghua University ⟐ Communication University of China\n- **🔗 链接**：[[中英摘要](../abs/2312.04558.md)] [[arXiv:2312.04558](https://arxiv.org/abs/2312.04558)] [[Code](https://github.com/yufan1012/MonoGaussianAvatar)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [2] VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality\n- **🧑‍🔬 作者**：Ying Jiang, Chang Yu, Tianyi Xie, Xuan Li, Yutao Feng, Huamin Wang, Minchen Li, Henry Lau, Feng Gao, Yin Yang, Chenfanfu Jiang\n- **🏫 单位**：UCLA ⟐ HKU ⟐ Utah ⟐ ZJU ⟐ Style3D Research ⟐ CMU ⟐ Amazon\n- **🔗 链接**：[[中英摘要](../abs/2401.16663.md)] [[arXiv:2401.16663](https://arxiv.org/abs/2401.16663)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [3] StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering\n- **🧑‍🔬 作者**：Lukas Radl, Michael Steiner, Mathias Parger, Alexander Weinrauch, Bernhard Kerbl, Markus Steinberger\n- **🏫 单位**：Graz University of Technology ⟐ TU Wien, Austria ⟐ Huawei Technologies, Austria\n- **🔗 链接**：[[中英摘要](../abs/2402.00525.md)] [[arXiv:2402.00525](https://arxiv.org/abs/2402.00525)] [[Code](https://github.com/r4dl/StopThePop)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024 (Journal Track)\n\n#### [4] 4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes\n- **🧑‍🔬 作者**：Yuanxing Duan, Fangyin Wei, Qiyu Dai, Yuhang He, Wenzheng Chen, Baoquan Chen\n- **🏫 单位**：Peking University ⟐ Princeton University ⟐ NVIDIA ⟐ National Key Lab of General AI, China\n- **🔗 链接**：[[中英摘要](../abs/2402.03307.md)] [[arXiv:2402.03307](https://arxiv.org/abs/2402.03307)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [5] GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting\n- **🧑‍🔬 作者**：Chen Yang, Sikuang Li, Jiemin Fang, Ruofan Liang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Huawei ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](../abs/2402.10259.md)] [[arXiv:2402.10259](https://arxiv.org/abs/2402.10259)] [[Code](https://github.com/GaussianObject/GaussianObject)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [6] StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting\n- **🧑‍🔬 作者**：Kunhao Liu, Fangneng Zhan, Muyu Xu, Christian Theobalt, Ling Shao, Shijian Lu\n- **🏫 单位**：Nanyang Technological University ⟐ Max Planck Institute for Informatics ⟐ UCAS-Terminus AI Lab\n- **🔗 链接**：[[中英摘要](../abs/2403.07807.md)] [[arXiv:2403.07807](https://arxiv.org/abs/2403.07807)] [[Code](https://github.com/Kunhao-Liu/StyleGaussian)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024 (Technical Communications)\n\n#### [7] LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives\n- **🧑‍🔬 作者**：Jiadi Cui, Junming Cao, Yuhui Zhong, Liao Wang, Fuqiang Zhao, Penghao Wang, Yifan Chen, Zhipeng He, Lan Xu, Yujiao Shi, Yingliang Zhang, Jingyi Yu\n- **🏫 单位**：ShanghaiTech University ⟐ DGene ⟐ Stereye ⟐ NeuDim\n- **🔗 链接**：[[中英摘要](../abs/2404.09748.md)] [[arXiv:2404.09748](https://arxiv.org/abs/2404.09748)] [[Code](https://github.com/zhaofuq/LOD-3DGS)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [8] Gaussian Opacity Fields: Efficient and Compact Surface Reconstruction in Unbounded Scenes\n- 
**🧑‍🔬 作者**：Zehao Yu, Torsten Sattler, Andreas Geiger\n- **🏫 单位**：University of Tübingen, Tübingen AI Center, Germany ⟐ Czech Technical University in Prague, Czech Republic\n- **🔗 链接**：[[中英摘要](../abs/2404.10772.md)] [[arXiv:2404.10772](https://arxiv.org/abs/2404.10772)] [[Code](https://github.com/autonomousvision/gaussian-opacity-fields)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024 (Journal Track)\n\n#### [9] 3D Gaussian Blendshapes for Head Avatar Animation\n- **🧑‍🔬 作者**：Shengjie Ma, Yanlin Weng, Tianjia Shao, Kun Zhou\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2404.19398.md)] [[arXiv:2404.19398](https://arxiv.org/abs/2404.19398)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [10] RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting\n- **🧑‍🔬 作者**：Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou\n- **🏫 单位**：Zhejiang University ⟐ University of Utah ⟐ Baidu Research\n- **🔗 链接**：[[中英摘要](../abs/2404.19706.md)] [[arXiv:2404.19706](https://arxiv.org/abs/2404.19706)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [11] A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose\n- **🧑‍🔬 作者**：Kaiwen Jiang, Yang Fu, Mukund Varma T, Yash Belhe, Xiaolong Wang, Hao Su, Ravi Ramamoorthi\n- **🏫 单位**：University of California\n- **🔗 链接**：[[中英摘要](../abs/2405.03659.md)] [[arXiv:2405.03659](https://arxiv.org/abs/2405.03659)] [[Code](https://github.com/RaymondJiangkw/COGS)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [12] LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer\n- **🧑‍🔬 作者**：Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ NNKosmos Technology ⟐ Beijing Normal University\n- **🔗 链接**：[[中英摘要](../abs/2405.07319.md)] [[arXiv:2405.07319](https://arxiv.org/abs/2405.07319)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [13] NPGA: Neural Parametric Gaussian Avatars\n- **🧑‍🔬 作者**：Simon Giebenhain, Tobias Kirschstein, Martin Rünz, Lourdes Agapito, Matthias Nießner\n- **🏫 单位**：Technical University of Munich ⟐ Synthesia ⟐ University College London\n- **🔗 链接**：[[中英摘要](../abs/2405.19331.md)] [[arXiv:2405.19331](https://arxiv.org/abs/2405.19331)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [14] GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis\n- **🧑‍🔬 作者**：Boming Zhao, Yuan Li, Ziyu Sun, Lin Zeng, Yujun Shen, Rui Ma, Yinda Zhang, Hujun Bao, Zhaopeng Cui\n- **🏫 单位**：Zhejiang University ⟐ Jilin University ⟐ Ant Group ⟐ Google Inc.\n- **🔗 链接**：[[中英摘要](../abs/2405.19745.md)] [[arXiv:2405.19745](https://arxiv.org/abs/2405.19745)] [[Code](https://github.com/BoMingZhao/GaussianPrediction)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [15] GGHead: Fast and Generalizable 3D Gaussian Heads\n- **🧑‍🔬 作者**：Tobias Kirschstein, Simon Giebenhain, Jiapeng Tang, Markos Georgopoulos, Matthias Nießner\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2406.09377.md)] [[arXiv:2406.09377](https://arxiv.org/abs/2406.09377)] [[Code](https://github.com/tobias-kirschstein/gghead)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [16] Modeling Ambient Scene Dynamics for Free-view Synthesis\n- **🧑‍🔬 作者**：Meng-Li Shih, Jia-Bin Huang, Changil Kim, Rajvi Shah, Johannes Kopf, Chen Gao\n- **🏫 单位**：University of Washington ⟐ University of Maryland ⟐ Meta\n- **🔗 链接**：[[中英摘要](../abs/2406.09395.md)] [[arXiv:2406.09395](https://arxiv.org/abs/2406.09395)] [Code]\n- **📝 说明**：🏆
Accepted to SIGGRAPH 2024\n\n#### [17] Projecting Radiance Fields to Mesh Surfaces\n- **🧑‍🔬 作者**：Adrian Xuan Wei Lim, Lynnette Hui Xian Ng, Nicholas Kyger, Tomo Michigami, Faraz Baghernezhad\n- **🏫 单位**：Roblox ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2406.11570.md)] [[arXiv:2406.11570](https://arxiv.org/abs/2406.11570)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Posters 2024\n\n#### [18] A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets\n- **🧑‍🔬 作者**：Bernhard Kerbl, Andréas Meuleman, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, George Drettakis\n- **🏫 单位**：Inria ⟐ Université Côte d’Azur, France ⟐ TU Wien, Austria\n- **🔗 链接**：[[中英摘要](../abs/2406.12080.md)] [[arXiv:2406.12080](https://arxiv.org/abs/2406.12080)] [[Code](https://github.com/graphdeco-inria/hierarchical-3d-gaussians)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2024\n\n#### [19] Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos\n- **🧑‍🔬 作者**：Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, Leonidas Guibas\n- **🏫 单位**：Stanford University ⟐ Google\n- **🔗 链接**：[[中英摘要](../abs/2406.18717.md)] [[arXiv:2406.18717](https://arxiv.org/abs/2406.18717)] [[Code](https://github.com/coltonstearns/dynamic-gaussian-marbles)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [20] 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes\n- **🧑‍🔬 作者**：Nicolas Moenne-Loccoz, Ashkan Mirzaei, Or Perel, Riccardo de Lutio, Janick Martinez Esturo, Gavriel State, Sanja Fidler, Nicholas Sharp, Zan Gojcic\n- **🏫 单位**：NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2407.07090.md)] [[arXiv:2407.07090](https://arxiv.org/abs/2407.07090)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [21] Pano2Room: Novel View Synthesis from a Single Indoor Panorama\n- **🧑‍🔬 作者**：Guo Pu, Yiming Zhao, Zhouhui Lian\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](../abs/2408.11413.md)] [[arXiv:2408.11413](https://arxiv.org/abs/2408.11413)] [[Code](https://github.com/TrickyGo/Pano2Room)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [22] Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos\n- **🧑‍🔬 作者**：Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu\n- **🏫 单位**：ShanghaiTech University ⟐ NeuDim Digital Technology ⟐ DGene Digital Technology\n- **🔗 链接**：[[中英摘要](../abs/2409.08353.md)] [[arXiv:2409.08353](https://arxiv.org/abs/2409.08353)] [[Code](https://github.com/HiFi-Human/DualGS)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [23] AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius\n- **🧑‍🔬 作者**：Xinzhe Wang, Ran Yi, Lizhuang Ma\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](../abs/2409.08669.md)] [[arXiv:2409.08669](https://arxiv.org/abs/2409.08669)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [24] SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation\n- **🧑‍🔬 作者**：Mingze Sun, Chen Guo, Puhua Jiang, Shiwei Mao, Yurun Chen, Ruqi Huang\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School ⟐ Peng Cheng Lab\n- **🔗 链接**：[[中英摘要](../abs/2409.11682.md)] [[arXiv:2409.11682](https://arxiv.org/abs/2409.11682)] [[Code](https://github.com/rqhuang88/SRIF)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [25] GS^3: Efficient Relighting with Triple Gaussian Splatting\n- **🧑‍🔬 作者**：Zoubin Bi, Yixin Zeng, Chong Zeng, Fan Pei, Xiang Feng, Kun Zhou, Hongzhi Wu\n- **🏫 单位**：Zhejiang University\n-
**🔗 链接**：[[中英摘要](../abs/2410.11419.md)] [[arXiv:2410.11419](https://arxiv.org/abs/2410.11419)] [[Code](https://github.com/gsrelight/gs-relight)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [26] L3DG: Latent 3D Gaussian Diffusion\n- **🧑‍🔬 作者**：Barbara Roessle, Norman Müller, Lorenzo Porzi, Samuel Rota Bulò, Peter Kontschieder, Angela Dai, Matthias Nießner\n- **🏫 单位**：Technical University of Munich ⟐ Meta Reality Labs Zurich\n- **🔗 链接**：[[中英摘要](../abs/2410.13530.md)] [[arXiv:2410.13530](https://arxiv.org/abs/2410.13530)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n\n#### [27] URAvatar: Universal Relightable Gaussian Codec Avatars\n- **🧑‍🔬 作者**：Junxuan Li, Chen Cao, Gabriel Schwartz, Rawal Khirodkar, Christian Richardt, Tomas Simon, Yaser Sheikh, Shunsuke Saito\n- **🏫 单位**：Codec Avatars Lab, Meta\n- **🔗 链接**：[[中英摘要](../abs/2410.24223.md)] [[arXiv:2410.24223](https://arxiv.org/abs/2410.24223)] [Code]\n- **📝 说明**：🏆 Accepted to SIGGRAPH Asia 2024\n"
  },
  {
    "path": "2025/3DV.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to 3DV2025\n\n#### [1] Drivable 3D Gaussian Avatars\n- **🧑‍🔬 作者**：Wojciech Zielonka, Timur Bagautdinov, Shunsuke Saito, Michael Zollhöfer, Justus Thies, Javier Romero\n- **🏫 单位**：Meta Reality Labs Research ⟐ Technical University of Darmstadt ⟐ Max Planck Institute for Intelligent Systems\n- **🔗 链接**：[[中英摘要](../abs/2311.08581.md)] [[arXiv:2311.08581](https://arxiv.org/abs/2311.08581)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [2] SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting\n- **🧑‍🔬 作者**：Haolin Xiong, Sairisheek Muttukuru, Rishi Upadhyay, Pradyumna Chari, Achuta Kadambi\n- **🏫 单位**：University of California, Los Angeles\n- **🔗 链接**：[[中英摘要](../abs/2312.00206.md)] [[arXiv:2312.00206](https://arxiv.org/abs/2312.00206)] [[Code](https://github.com/ForMyCat/SparseGS)]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [3] Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting\n- **🧑‍🔬 作者**：Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu\n- **🏫 单位**：Zhejiang University ⟐ Westlake University ⟐ Tongji University\n- **🔗 链接**：[[中英摘要](../abs/2403.09981.md)] [[arXiv:2403.09981](https://arxiv.org/abs/2403.09981)] [[Code](https://github.com/WU-CVGL/MVControl-threestudio)]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [4] GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation\n- **🧑‍🔬 作者**：Dingding Cai, Janne Heikkilä, Esa Rahtu\n- **🏫 单位**：Tampere University ⟐ University of Oulu\n- **🔗 链接**：[[中英摘要](../abs/2403.10683.md)] [[arXiv:2403.10683](https://arxiv.org/abs/2403.10683)] [[Code](https://github.com/dingdingcai/GSPose)]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [5] RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS\n- **🧑‍🔬 作者**：Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Oechsle, Daniel Duckworth, Rama Gosula, Keisuke Tateno, John Bates, Dominik Kaeser, Federico Tombari\n- **🏫 单位**：Google\n- **🔗 链接**：[[中英摘要](../abs/2403.13806.md)] [[arXiv:2403.13806](https://arxiv.org/abs/2403.13806)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [6] RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion\n- **🧑‍🔬 作者**：Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi\n- **🏫 单位**：University of California, San Diego ⟐  AMAP ⟐ University of Pennsylvania\n- **🔗 链接**：[[中英摘要](../abs/2404.07199.md)] [[arXiv:2404.07199](https://arxiv.org/abs/2404.07199)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [7] GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details\n- **🧑‍🔬 作者**：Boqian Li, Xuan Li, Ying Jiang, Tianyi Xie, Feng Gao, Huamin Wang, Yin Yang, Chenfanfu Jiang\n- **🏫 单位**：UCLA ⟐ Utah ⟐ HKU ⟐ Amazon ⟐ Style3D Research\n- **🔗 链接**：[[中英摘要](../abs/2405.12420.md)] [[arXiv:2405.12420](https://arxiv.org/abs/2405.12420)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [8] EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang\n- **🏫 单位**：ETH Zürich ⟐ Google ⟐ Microsoft ⟐ KU Leuven ⟐ INSAIT, Sofia\n- **🔗 链接**：[[中英摘要](./abs/2406.19811.md)] [[arXiv:2406.19811](https://arxiv.org/abs/2406.19811)] [[Code](https://github.com/zdwww/EgoGaussian)]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [9] HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors\n- **🧑‍🔬 作者**：Xiaozheng Zheng, Chao Wen, 
Zhaohu Li, Weiyi Zhang, Zhuo Su, Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, Yongjie Zhang, Guidong Wang, Lan Xu\n- **🏫 单位**：ByteDance ⟐ ShanghaiTech University\n- **🔗 链接**：[[中英摘要](../abs/2408.06019.md)] [[arXiv:2408.06019](https://arxiv.org/abs/2408.06019)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [10] WaterSplatting: Fast Underwater 3D Scene Reconstruction Using Gaussian Splatting\n- **🧑‍🔬 作者**：Huapeng Li, Wenxuan Song, Tianao Xu, Alexandre Elsig, Jonas Kulhanek\n- **🏫 单位**：University of Zurich ⟐ ETH Zurich ⟐ CTU in Prague\n- **🔗 链接**：[[中英摘要](../abs/2408.08206.md)] [[arXiv:2408.08206](https://arxiv.org/abs/2408.08206)] [[Code](https://github.com/water-splatting/water-splatting)]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [11] LoopSplat: Loop Closure by Registering 3D Gaussian Splats\n- **🧑‍🔬 作者**：Liyuan Zhu, Yue Li, Erik Sandström, Shengyu Huang, Konrad Schindler, Iro Armeni\n- **🏫 单位**：Stanford University ⟐ University of Amsterdam ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2408.10154.md)] [[arXiv:2408.10154](https://arxiv.org/abs/2408.10154)] [[Code](https://github.com/GradientSpaces/LoopSplat)]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [12] DEGAS: Detailed Expressions on Full-Body Gaussian Avatars\n- **🧑‍🔬 作者**：Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou) ⟐ Prometheus Vision Technology Co., Ltd. ⟐ The Hong Kong University of Science and Technology ⟐ Swinburne University of Technology\n- **🔗 链接**：[[中英摘要](../abs/2408.10588.md)] [[arXiv:2408.10588](https://arxiv.org/abs/2408.10588)] [[Code](https://github.com/initialneil/DEGAS)]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [13] ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining\n- **🧑‍🔬 作者**：Qi Ma, Yue Li, Bin Ren, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Danda Pani Paudel\n- **🏫 单位**：Computer Vision Lab, ETH Zurich ⟐ INSAIT, Sofia University ⟐ University of Amsterdam ⟐ University of Pisa ⟐ University of Trento\n- **🔗 链接**：[[中英摘要](../abs/2408.10906.md)] [[arXiv:2408.10906](https://arxiv.org/abs/2408.10906)] [[Code](https://github.com/qimaqi/ShapeSplat-Gaussian_MAE)]\n- **📝 说明**: 🏆 Accepted to 3DV 2025\n\n#### [14] LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming\n- **🧑‍🔬 作者**：Yuang Shi, Simone Gasparini, Géraldine Morin, Wei Tsang Ooi\n- **🏫 单位**：National University of Singapore ⟐ IRIT - University of Toulouse\n- **🔗 链接**：[[中英摘要](../abs/2408.14823.md)] [[arXiv:2408.14823](https://arxiv.org/abs/2408.14823)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [15] DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction\n- **🧑‍🔬 作者**：Jenny Seidenschwarz, Qunjie Zhou, Bardienus Duisterhof, Deva Ramanan, Laura Leal-Taixé\n- **🏫 单位**：Technical University of Munich ⟐ Carnegie Mellon University ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2409.02104.md)] [[arXiv:2409.02104](https://arxiv.org/abs/2409.02104)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [16] Direct and Explicit 3D Generation from a Single Image\n- **🧑‍🔬 作者**：Haoyu Wu, Meher Gitika Karumuri, Chuhang Zou, Seungbae Bang, Yuelong Li, Dimitris Samaras, Sunil Hadap\n- **🏫 单位**：Stony Brook University ⟐ Amazon Inc.\n- **🔗 链接**：[[中英摘要](../abs/2411.10947.md)] [[arXiv:2411.10947](https://arxiv.org/abs/2411.10947)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [17] Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting
Scenes\n- **🧑‍🔬 作者**：Thomas Wimmer, Michael Oechsle, Michael Niemeyer, Federico Tombari\n- **🏫 单位**：Technical University of Munich ⟐ Google\n- **🔗 链接**：[[中英摘要](../abs/2411.19233.md)] [[arXiv:2411.19233](https://arxiv.org/abs/2411.19233)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [18] AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones\n- **🧑‍🔬 作者**：Xuqian Ren, Matias Turkulainen, Jiepeng Wang, Otto Seiskari, Iaroslav Melekhov, Juho Kannala, Esa Rahtu\n- **🏫 单位**：Tampere University ⟐ Aalto University ⟐ University of Hong Kong ⟐ Spectacular AI ⟐ University of Oulu\n- **🔗 链接**：[[中英摘要](../abs/2411.19271.md)] [[arXiv:2411.19271](https://arxiv.org/abs/2411.19271)] [[Code](https://github.com/maturk/dn-splatter)]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [19] GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor\n- **🧑‍🔬 作者**：Xiangyue Liu, Kunming Luo, Heng Li, Qi Zhang, Yuan Liu, Li Yi, Ping Tan\n- **🏫 单位**： Hong Kong University of Science and Technology ⟐ Tencent AI Lab ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2501.09978.md)] [[arXiv:2501.09978](https://arxiv.org/abs/2501.09978)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [20] E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Sohaib Zahid, Viktor Rudnev, Eddy Ilg, Vladislav Golyanik\n- **🏫 单位**：Saarland University ⟐ MPI for Informatics, SIC\n- **🔗 链接**：[[中英摘要](../abs/2502.10827.md)] [[arXiv:2502.10827](https://arxiv.org/abs/2502.10827)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n\n#### [21] Open-Vocabulary Semantic Part Segmentation of 3D Human\n- **🧑‍🔬 作者**：Keito Suzuki, Bang Du, Girish Krishnan, Kunyao Chen, Runfa Blark Li, Truong Nguyen\n- **🏫 单位**: University of California, San Diego ⟐ Qualcomm\n- **🔗 链接**：[[中英摘要](../abs/2502.19782.md)] [[arXiv:2502.19782](https://arxiv.org/abs/2502.19782)] [Code]\n- **📝 说明**：🏆 Accepted to 3DV 2025\n"
  },
  {
    "path": "2025/AAAI.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to AAAI2025\n\n#### [1] Segment Any 3D Gaussians\n- **🧑‍🔬 作者**：Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Huawei Inc.\n- **🔗 链接**：[[中英摘要](../abs/2312.00860.md)] [[arXiv:2312.00860](https://arxiv.org/abs/2312.00860)] [[Code](https://github.com/Jumpat/SegAnyGAussians)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [2] GFlow: Recovering 4D World from Monocular Video\n- **🧑‍🔬 作者**：Shizun Wang, Xingyi Yang, Qiuhong Shen, Zhenxiang Jiang, Xinchao Wang\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2405.18426.md)] [[arXiv:2405.18426](https://arxiv.org/abs/2405.18426)] [[Code](https://github.com/littlepure2333/gflow)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [3] DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors\n- **🧑‍🔬 作者**：Tianyu Huang, Yihan Zeng, Hui Li, Wangmeng Zuo, Rynson W. H. Lau\n- **🏫 单位**：City University of Hong Kong ⟐ Harbin Institute of Technology ⟐ Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2406.01476.md)] [[arXiv:2406.01476](https://arxiv.org/abs/2406.01476)] [[Code](https://github.com/tyhuang0428/DreamPhysics)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [4] FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping\n- **🧑‍🔬 作者**：Yuzhou Ji, He Zhu, Junshu Tang, Wuyi Liu, Zhizhong Zhang, Yuan Xie, Lizhuang Ma, Xin Tan\n- **🏫 单位**：East China Normal University ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](../abs/2406.01916.md)] [[arXiv:2406.01916](https://arxiv.org/abs/2406.01916)] [Code]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [5] TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers\n- **🧑‍🔬 作者**：Chuanrui Zhang, Yingshuang Zou, Zhuoling Li, Minmin Yi, Haoqian Wang\n- **🏫 单位**：Tsinghua University ⟐ The University of Hong Kong ⟐ E-surfing Vision Technology Co., Ltd\n- **🔗 链接**：[[中英摘要](../abs/2408.13770.md)] [[arXiv:2408.13770](https://arxiv.org/abs/2408.13770)] [[Code](https://github.com/xingyoujun/transplat)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [6] DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input\n- **🧑‍🔬 作者**：Qijian Tian, Xin Tan, Yuan Xie, Lizhuang Ma\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ East China Normal University\n- **🔗 链接**：[[中英摘要](../abs/2409.12753.md)] [[arXiv:2409.12753](https://arxiv.org/abs/2409.12753)] [[Code](https://github.com/fangzhou2000/DrivingForward)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [7] Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding\n- **🧑‍🔬 作者**：Wenbo Zhang, Lu Zhang, Ping Hu, Liqian Ma, Yunzhi Zhuge, Huchuan Lu\n- **🏫 单位**：Dalian University of Technology ⟐ University of Electronic Science and Technology of China ⟐ ZMOAI\n- **🔗 链接**：[[中英摘要](./abs/2411.19551.md)] [[arXiv:2411.19551](https://arxiv.org/abs/2411.19551)] [[Code](https://github.com/wb014/FreeGS)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [8] PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis\n- **🧑‍🔬 作者**：Yifan Xie, Tao Feng, Xin Zhang, Xiangyang Luo, Zixuan Guo, Weijiang Yu, Heng Chang, Fei Ma, Fei Richard Yu\n- **🏫 单位**：Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) ⟐ Xi’an Jiaotong University ⟐ Peking University ⟐ Sun Yat-sen University ⟐ Tsinghua University ⟐ Shenzhen University ⟐ Carleton University\n- **🔗 
链接**：[[中英摘要](../abs/2412.08504.md)] [[arXiv:2412.08504](https://arxiv.org/abs/2412.08504)] [Code]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [9] 3D2-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling\n- **🧑‍🔬 作者**：Zichen Tang, Hongyu Yang, Hanchen Zhang, Jiaxin Chen, Di Huang\n- **🏫 单位**：Beihang University ⟐ Shanghai Artificial Intelligence Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2412.11599.md)] [[arXiv:2412.11599](https://arxiv.org/abs/2412.11599)] [[Code](https://github.com/silence-tang/GaussianActor)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [10] GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians\n- **🧑‍🔬 作者**：Xiaobao Wei, Peng Chen, Ming Lu, Hui Chen, Feng Tian\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Intel Labs China\n- **🔗 链接**：[[中英摘要](../abs/2412.13983.md)] [[arXiv:2412.13983](https://arxiv.org/abs/2412.13983)] [[Code](https://github.com/ucwxb/GraphAvatar)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [11] EGSRAL: An Enhanced 3D Gaussian Splatting based Renderer with Automated Labeling for Large-Scale Driving Scene\n- **🧑‍🔬 作者**：Yixiong Huo, Guangfeng Jiang, Hongyang Wei, Ji Liu, Song Zhang, Han Liu, Xingliang Huang, Mingjie Lu, Jinzhang Peng, Dong Li, Lu Tian, Emad Barsoum\n- **🏫 单位**：Advanced Micro Devices, Inc.\n- **🔗 链接**：[[中英摘要](../abs/2412.15550.md)] [[arXiv:2412.15550](https://arxiv.org/abs/2412.15550)] [[Code](https://github.com/jiangxb98/EGSRAL)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [12] Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity\n- **🧑‍🔬 作者**：Tianqi Shen, Shaohua Liu, Jiaqi Feng, Ziye Ma, Ning An\n- **🏫 单位**：City University of Hong Kong ⟐ Beihang University ⟐ China Coal Research Institute ⟐ State Key Laboratory of Intelligent Coal Mining and Strata Control\n- **🔗 链接**：[[中英摘要](../abs/2412.16619.md)] [[arXiv:2412.16619](https://arxiv.org/abs/2412.16619)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2025\n\n#### [13] GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance\n- **🧑‍🔬 作者**：Jingqiu Zhou, Lue Fan, Xuesong Chen, Linjiang Huang, Si Liu, Hongsheng Li\n- **🏫 单位**：Beihang University ⟐ The Chinese University of Hong Kong ⟐ Chinese Academy of Sciences ⟐ Centre for Perceptual and Interactive Intelligence\n- **🔗 链接**：[[中英摘要](../abs/2412.17715.md)] [[arXiv:2412.17715](https://arxiv.org/abs/2412.17715)] [[Code](https://github.com/zhou745/GaussianPainter)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [14] KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences\n- **🧑‍🔬 作者**：Keng-Wei Chang, Zi-Ming Wang, Shang-Hong Lai\n- **🏫 单位**: National Tsing Hua University\n- **🔗 链接**：[[中英摘要](../abs/2412.20767.md)] [[arXiv:2412.20767](https://arxiv.org/abs/2412.20767)] [Code]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [15] HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation\n- **🧑‍🔬 作者**：Wentian Qu, Jiahe Li, Jian Cheng, Jian Shi, Chenyu Meng, Cuixia Ma, Hongan Wang, Xiaoming Deng, Yinda Zhang\n- **🏫 单位**: Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Google\n- **🔗 链接**：[[中英摘要](../abs/2501.02845.md)] [[arXiv:2501.02845](https://arxiv.org/abs/2501.02845)] [Code]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [16] DehazeGS: Seeing Through Fog with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jinze Yu, Yiqun Wang, Zhengda Lu, Jianwei Guo, Yong Li, Hongxing Qin, Xiaopeng Zhang\n- **🏫
单位**：Chongqing University ⟐ University of Chinese Academy of Sciences ⟐ Beijing Normal University ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](../abs/2501.03659.md)] [[arXiv:2501.03659](https://arxiv.org/abs/2501.03659)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2025\n\n#### [17] FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian Splatting with Depth-Feature Consistency\n- **🧑‍🔬 作者**：Han Huang, Yulun Wu, Chao Deng, Ge Gao, Ming Gu, Yu-Shen Liu\n- **🏫 单位**: Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2501.04628.md)] [[arXiv:2501.04628](https://arxiv.org/abs/2501.04628)] [[Code](https://github.com/yulunwu0108/FatesGS)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [18] BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation\n- **🧑‍🔬 作者**：Xiaolu Hou, Mingcheng Li, Dingkang Yang, Jiawei Chen, Ziyun Qian, Xiao Zhao, Yue Jiang, Jinjie Wei, Qingyao Xu, Lihua Zhang\n- **🏫 单位**：Fudan University ⟐ Engineering Research Center of AI and Robotics, Ministry of Education ⟐ Jilin Provincial Key Laboratory of Intelligence Science and Engineering ⟐ Artificial Intelligence and Unmanned Systems Engineering Research Center of Jilin Province ⟐ ByteDance Inc\n- **🔗 链接**：[[中英摘要](../abs/2501.10462.md)] [[arXiv:2501.10462](https://arxiv.org/abs/2501.10462)] [[Code](https://github.com/SparklingH/BloomScene)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [19] Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting\n- **🧑‍🔬 作者**：Jiaqi Lin, Zhihao Li, Binxiao Huang, Xiao Tang, Jianzhuang Liu, Shiyong Liu, Xiaofei Wu, Fenglong Song, Wenming Yang\n- **🏫 单位**：Tsinghua University ⟐ Huawei Noah’s Ark Lab ⟐ The University of Hong Kong ⟐ Shenzhen Institute of Advanced Technology\n- **🔗 链接**：[[中英摘要](../abs/2501.10788.md)] [[arXiv:2501.10788](https://arxiv.org/abs/2501.10788)] [Code]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [20] Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images\n- **🧑‍🔬 作者**：Yihui Li, Chengxin Lv, Hongyu Yang, Di Huang\n- **🏫 单位**：State Key Laboratory of Complex and Critical Software Environment, Beijing ⟐ Beihang University ⟐ Shanghai Artificial Intelligence Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2501.14231.md)] [[arXiv:2501.14231](https://arxiv.org/abs/2501.14231)] [Code]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [21] Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Lingting Zhu, Guying Lin, Jinnan Chen, Xinjie Zhang, Zhenchao Jin, Zhao Wang, Lequan Yu\n- **🏫 单位**：The University of Hong Kong ⟐ Carnegie Mellon University ⟐ National University of Singapore ⟐ The Hong Kong University of Science and Technology ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2502.09039.md)] [[arXiv:2502.09039](https://arxiv.org/abs/2502.09039)] [[Code](https://github.com/HKU-MedAI/LIG)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [22] Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling\n- **🧑‍🔬 作者**：Hanyang Kong, Xingyi Yang, Xinchao Wang\n- **🏫 单位**: National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2502.20378.md)] [[arXiv:2502.20378](https://arxiv.org/abs/2502.20378)] [Code]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [23] ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting\n- **🧑‍🔬 作者**：Dexter Ong, Yuezhan Tao, Varun Murali, Igor Spasojevic, Vijay Kumar, Pratik Chaudhari\n- **🏫 单位**: GRASP Laboratory, University of
Pennsylvania\n- **🔗 链接**：[[中英摘要](../abs/2502.20386.md)] [[arXiv:2502.20386](https://arxiv.org/abs/2502.20386)] [[Code](https://atlasnav.github.io/)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [24] Frequency-Aware Density Control via Reparameterization for High-Quality Rendering of 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhaojie Zeng, Yuesong Wang, Lili Ju, Tao Guan\n- **🏫 单位**: Huazhong University of Science and Technology ⟐ University of South Carolina\n- **🔗 链接**：[[中英摘要](../abs/2503.07000.md)] [[arXiv:2503.07000](https://arxiv.org/abs/2503.07000)] [[Code](https://github.com/whoiszzj/FDS-GS)]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [25] CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image\n- **🧑‍🔬 作者**：Arindam Dutta, Meng Zheng, Zhongpai Gao, Benjamin Planche, Anwesha Choudhuri, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu\n- **🏫 单位**：University of California, Riverside ⟐ United Imaging Intelligence, Boston\n- **🔗 链接**：[[中英摘要](../abs/2503.15671.md)] [[arXiv:2503.15671](https://arxiv.org/abs/2503.15671)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2025\n\n#### [26] Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes\n- **🧑‍🔬 作者**：Sarosij Bose, Arindam Dutta, Sayak Nag, Junge Zhang, Jiachen Li, Konstantinos Karydis, Amit K. Roy Chowdhury\n- **🏫 单位**：University of California\n- **🔗 链接**：[[中英摘要](../abs/2503.15742.md)] [[arXiv:2503.15742](https://arxiv.org/abs/2503.15742)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2025\n\n#### [27] Enhancing Close-up Novel View Synthesis via Pseudo-labeling\n- **🧑‍🔬 作者**：Jiatong Xia, Libo Sun, Lingqiao Liu\n- **🏫 单位**: Australian Institute for Machine Learning, The University of Adelaide\n- **🔗 链接**：[[中英摘要](../abs/2503.15908.md)] [[arXiv:2503.15908](https://arxiv.org/abs/2503.15908)] [Code]\n- **📝 说明**：🏆 Accepted to AAAI 2025\n\n#### [28] Multi-StyleGS: Stylizing Gaussian Splatting with Multiple Styles\n- **🧑‍🔬 作者**：Yangkai Lin, Jiabao Lei, Kui Jia\n- **🏫 单位**：South China University of Technology ⟐ The Chinese University of Hong Kong, Shenzhen\n- **🔗 链接**：[[中英摘要](../abs/2506.06846.md)] [[arXiv:2506.06846](https://arxiv.org/abs/2506.06846)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2025\n"
  },
  {
    "path": "2025/ACMMM.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ACMMM2025\n\n#### [1] Street Gaussians without 3D Object Tracker\n- **🧑‍🔬 作者**：Ruida Zhang, Chengxi Li, Chenyangguang Zhang, Xingyu Liu, Haili Yuan, Yanyan Li, Xiangyang Ji, Gim Hee Lee\n- **🏫 单位**：Tsinghua University ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2412.05548.md)] [[arXiv:2412.05548](https://arxiv.org/abs/2412.05548)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [2] SLGaussian: Fast Language Gaussian Splatting in Sparse Views\n- **🧑‍🔬 作者**：Kangjie Chen, BingQuan Dai, Minghan Qin, Dongbin Zhang, Peihao Li, Yingshuang Zou, Haoqian Wang\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School, Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2412.08331.md)] [[arXiv:2412.08331](https://arxiv.org/abs/2412.08331)] [[Code](https://github.com/chenkangjie1123/SLGaussian)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [3] EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler\n- **🧑‍🔬 作者**：Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang\n- **🏫 单位**：Peking University ⟐ Nanjing University ⟐ University of California, Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2504.09540.md)] [[arXiv:2504.09540](https://arxiv.org/abs/2504.09540)] [[Code](https://github.com/PKUHaoWang/EmbodiedOcc2)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [4] TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors\n- **🧑‍🔬 作者**：Mingwei Li, Pu Pang, Hehe Fan, Hua Huang, Yi Yang\n- **🏫 单位**：Zhejiang University ⟐ Zhongguancun Academy, Beijing ⟐ Xi'an Jiaotong University ⟐ Beijing Normal University\n- **🔗 链接**：[[中英摘要](./abs/2504.12799.md)] [[arXiv:2504.12799](https://arxiv.org/abs/2504.12799)] [[Code](https://github.com/longxiang-ai/TSGS)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [5] CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos\n- **🧑‍🔬 作者**：Keyang Ye, Tianjia Shao, Kun Zhou\n- **🏫 单位**：Westlake University ⟐ Wuhan University ⟐ ETH Zürich ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2504.17728.md)] [[arXiv:2504.17728](https://arxiv.org/abs/2504.17728)] [[Code](https://github.com/WU-CVGL/CasualHDRSplat)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [6] FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors\n- **🧑‍🔬 作者**：Chenxi Li, Weijie Wang, Qiang Li, Bruno Lepri, Nicu Sebe, Weizhi Nie\n- **🏫 单位**：Tianjin University ⟐ University of Trento ⟐ Fondazione Bruno Kessler ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2505.01322.md)] [[arXiv:2505.01322](https://arxiv.org/abs/2505.01322)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [7] FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Boyuan Tian, Qizhe Gao, Siran Xianyu, Xiaotong Cui, Minjia Zhang\n- **🏫 单位**：UIUC\n- **🔗 链接**：[[中英摘要](./abs/2507.06671.md)] [[arXiv:2507.06671](https://arxiv.org/abs/2507.06671)] [[Code](https://github.com/Supercomputing-System-AI-Lab/FlexGaussian)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [8] Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections\n- **🧑‍🔬 作者**：Yongtang Bao, Chengjie Tang, Yuze Wang, Haojie Li\n- **🏫 单位**：University of Science and Technology ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2507.07395.md)] [[arXiv:2507.07395](https://arxiv.org/abs/2507.07395)] 
[[Code](https://github.com/Sugar0725/Seg-Wild)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [9] Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction\n- **🧑‍🔬 作者**：Hyungjun Doh, Dong In Lee, Seunggeun Chi, Pin-Hao Huang, Kwonjoon Lee, Sangpil Kim, Karthik Ramani\n- **🏫 单位**：Purdue University ⟐ Korea University ⟐ Honda Research Institute USA\n- **🔗 链接**：[[中英摘要](./abs/2507.08137.md)] [[arXiv:2507.08137](https://arxiv.org/abs/2507.08137)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [10] Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction\n- **🧑‍🔬 作者**：Xiufeng Huang, Ka Chun Cheung, Runmin Cong, Simon See, Renjie Wan\n- **🏫 单位**：Hong Kong Baptist University ⟐ NVIDIA ⟐ Shandong University\n- **🔗 链接**：[[中英摘要](./abs/2507.14921.md)] [[arXiv:2507.14921](https://arxiv.org/abs/2507.14921)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [11] SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian Representations\n- **🧑‍🔬 作者**：Chunshi Wang, Hongxing Li, Yawei Luo\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2507.19835.md)] [[arXiv:2507.19835](https://arxiv.org/abs/2507.19835)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [12] GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting\n- **🧑‍🔬 作者**：Lei Yao, Yi Wang, Yi Zhang, Moyun Liu, Lap-Pui Chau\n- **🏫 单位**：Hong Kong Polytechnic University ⟐ Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.02172.md)] [[arXiv:2508.02172](https://arxiv.org/abs/2508.02172)] [[Code](https://github.com/RayYoh/GaussianCross)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [13] 3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering\n- **🧑‍🔬 作者**：Junyu Zhou, Yuyang Huang, Wenrui Dai, Junni Zou, Ziyang Zheng, Nuowen Kan, Chenglin Li, Hongkai Xiong\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2508.05343.md)] [[arXiv:2508.05343](https://arxiv.org/abs/2508.05343)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [14] Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball Rotation\n- **🧑‍🔬 作者**：YoungChan Choi, HengFei Wang, YiHua Cheng, Boeun Kim, Hyung Jin Chang, YoungGeun Choi, Sang-Il Choi\n- **🏫 单位**：Dankook University ⟐ University of Birmingham\n- **🔗 链接**：[[中英摘要](./abs/2508.06136.md)] [[arXiv:2508.06136](https://arxiv.org/abs/2508.06136)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [15] E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras\n- **🧑‍🔬 作者**：Chaoran Feng, Zhenyu Tang, Wangbo Yu, Yatian Pang, Yian Zhao, Jianbin Zhao, Li Yuan, Yonghong Tian\n- **🏫 单位**：Peking University ⟐ National University of Singapore ⟐ Dalian University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.09912.md)] [[arXiv:2508.09912](https://arxiv.org/abs/2508.09912)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [16] TiP4GEN: Text to Immersive Panorama 4D Scene Generation\n- **🧑‍🔬 作者**：Ke Xing, Hanwen Liang, Dejia Xu, Yuyang Yin, Konstantinos N. 
Plataniotis, Yao Zhao, Yunchao Wei\n- **🏫 单位**：Beijing Jiaotong University ⟐ University of Toronto ⟐ The University of Texas at Austin\n- **🔗 链接**：[[中英摘要](./abs/2508.12415.md)] [[arXiv:2508.12415](https://arxiv.org/abs/2508.12415)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [17] SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion\n- **🧑‍🔬 作者**：Zhiwen Yang, Yuxin Peng\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2509.11171.md)] [[arXiv:2509.11171](https://arxiv.org/abs/2509.11171)] [[Code](https://github.com/PKU-ICST-MIPL/SPHERE_ACMMM2025)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [18] SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments\n- **🧑‍🔬 作者**：Ruiyan Wang, Zhengxue Cheng, Zonghao Lin, Jun Ling, Yuzhou Liu, Yanru An, Rong Xie, Li Song\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2509.16960.md)] [[arXiv:2509.16960](https://arxiv.org/abs/2509.16960)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025 AECAI Workshop\n\n#### [19] PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion\n- **🧑‍🔬 作者**：Zhiwei Zhang, Ruikai Xu, Weijian Zhang, Zhizhong Zhang, Xin Tan, Jingyu Gong, Yuan Xie, Lizhuang Ma\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ East China Normal University ⟐ Shanghai Key Laboratory of Computer Software Evaluating and Testing\n- **🔗 链接**：[[中英摘要](./abs/2509.26008.md)] [[arXiv:2509.26008](https://arxiv.org/abs/2509.26008)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [20] BSGS: Bi-stage 3D Gaussian Splatting for Camera Motion Deblurring\n- **🧑‍🔬 作者**：An Zhao, Piaopiao Yu, Zhe Zhu, Mingqiang Wei\n- **🏫 单位**：Nanjing University of Aeronautics and Astronautics\n- **🔗 链接**：[[中英摘要](./abs/2510.12493.md)] [[arXiv:2510.12493](https://arxiv.org/abs/2510.12493)] [[Code](https://github.com/wsxujm/bsgs)]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [21] HGC-Avatar: Hierarchical Gaussian Compression for Streamable Dynamic 3D Avatars\n- **🧑‍🔬 作者**：Haocheng Tang, Ruoke Yan, Xinhui Yin, Qi Zhang, Xinfeng Zhang, Siwei Ma, Wen Gao, Chuanmin Jia\n- **🏫 单位**：Peking University ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2510.16463.md)] [[arXiv:2510.16463](https://arxiv.org/abs/2510.16463)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n\n#### [22] Relightable and Dynamic Gaussian Avatar Reconstruction from Monocular Video\n- **🧑‍🔬 作者**：Seonghwa Choi, Moonkyeong Choi, Mingyu Jang, Jaekyung Kim, Jianfei Cai, Wen-Huang Cheng, Sanghoon Lee\n- **🏫 单位**：Yonsei University\n- **🔗 链接**：[[中英摘要](../abs/2512.09335.md)] [[arXiv:2512.09335](https://arxiv.org/abs/2512.09335)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MM 2025\n"
  },
  {
    "path": "2025/Accepted.md",
    "content": "# 3D Gaussian Splatting Papers Accepted in 2025\n\n#### [1] GaussianHead: Impressive 3D Gaussian-based Head Avatars with Dynamic Hybrid Neural Field\n- **🧑‍🔬 作者**：Jie Wang, Xianyan Li, Jiucheng Xie, Feng Xu, Hao Gao\n- **🏫 单位**：Nanjing University of Posts and Telecommunications ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2312.01632.md)] [[arXiv:2312.01632](https://arxiv.org/abs/2312.01632)] [[Code](https://github.com/chiehwangs/gaussian-head)]\n- **📝 说明**：🏆 Accepted to TVCG 2025\n\n#### [2] GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization\n- **🧑‍🔬 作者**：Yahao Shi, Yanmin Wu, Chenming Wu, Xing Liu, Chen Zhao, Haocheng Feng, Jingtuo Liu, Liangjun Zhang, Jian Zhang, Bin Zhou, Errui Ding, Jingdong Wang\n- **🏫 单位**：Beihang University ⟐ Peking University ⟐ Baidu VIS\n- **🔗 链接**：[[中英摘要](./abs/2312.05133.md)] [[arXiv:2312.05133](https://arxiv.org/abs/2312.05133)] [[Code](https://github.com/guduxiaolang/GIR)]\n- **📝 说明**：🏆 Accepted to TPAMI 2025\n\n#### [3] EndoGaussian: Gaussian Splatting for Deformable Surgical Scene Reconstruction\n- **🧑‍🔬 作者**：Yifan Liu, Chenxin Li, Chen Yang, Yixuan Yuan\n- **🏫 单位**：Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2401.12561.md)] [[arXiv:2401.12561](https://arxiv.org/abs/2401.12561)] [[Code](https://github.com/yifliu3/EndoGaussian)]\n- **📝 说明**：🏆 Accepted to TMI 2025\n\n#### [4] Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction\n- **🧑‍🔬 作者**：Qiuhong Shen, Xuanyu Yi, Zike Wu, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang\n- **🏫 单位**：National University of Singapore ⟐ Singapore Management University ⟐ Nanyang Technology University ⟐ Sea AI Lab ⟐ Skywork AI\n- **🔗 链接**：[[中英摘要](./abs/2403.18795.md)] [[arXiv:2403.18795](https://arxiv.org/abs/2403.18795)] [[Code](https://github.com/SkyworkAI/Gamba)]\n- **📝 说明**：🏆 Accepted to TPAMI 2025\n\n#### [5] StylizedGS: Controllable Stylization for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Dingxi Zhang, Zhuoxun Chen, Yu-Jie Yuan, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, Lin Gao\n- **🏫 单位**：The University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2404.05220.md)] [[arXiv:2404.05220](https://arxiv.org/abs/2404.05220)] [Code]\n- **📝 说明**: 🏆 Accepted to TPAMI 2025\n\n#### [6] CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding\n- **🧑‍🔬 作者**：Guibiao Liao, Jiankun Li, Zhenyu Bao, Xiaoqing Ye, Jingdong Wang, Qing Li, Kanglin Liu\n- **🏫 单位**：Peking University ⟐ Baidu Inc. 
⟐ Peng Cheng Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2404.14249.md)] [[arXiv:2404.14249](https://arxiv.org/abs/2404.14249)] [[Code](https://github.com/gbliao/CLIP-GS)]\n- **📝 说明**: 🏆 Accepted to ACM TOMM 2025\n\n#### [7] Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Inkyu Shin, Qihang Yu, Xiaohui Shen, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen\n- **🏫 单位**：KAIST ⟐ ByteDance\n- **🔗 链接**：[[中英摘要](./abs/2406.02541.md)] [[arXiv:2406.02541](https://arxiv.org/abs/2406.02541)] [[Code](https://github.com/dlsrbgg33/Video-3DGS)]\n- **📝 说明**：🏆 Accepted to TMLR 2025\n\n#### [8] Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth of Field\n- **🧑‍🔬 作者**：Chao Wang, Krzysztof Wolski, Bernhard Kerbl, Ana Serrano, Mojtaba Bemana, Hans-Peter Seidel, Karol Myszkowski, Thomas Leimkühler\n- **🏫 单位**：Max-Planck-Institut für Informatik, Germany ⟐ Universidad de Zaragoza, I3A, Spain ⟐ Technische Universität Wien, Austria ⟐ Carnegie Mellon University, USA\n- **🔗 链接**：[[中英摘要](./abs/2406.07329.md)] [[arXiv:2406.07329](https://arxiv.org/abs/2406.07329)] [[Code](https://github.com/Hans1984/CineGS)]\n- **📝 说明**：🏆 Accepted to Pacific Graphics 2024\n\n#### [9] Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering\n- **🧑‍🔬 作者**：Jingqian Wu, Shuo Zhu, Chutian Wang, Edmund Y. Lam\n- **🏫 单位**：The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2407.11343.md)] [[arXiv:2407.11343](https://arxiv.org/abs/2407.11343)] [Code]\n- **📝 说明**：🏆 Accepted to IEEE MLSP 2024\n\n#### [10] GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion\n- **🧑‍🔬 作者**：Jiaxin Wei, Stefan Leutenegger\n- **🏫 单位**：Technical University of Munich ⟐ Imperial College London ⟐ Munich Institute of Robotics and Machine Intelligence\n- **🔗 链接**：[[中英摘要](../abs/2408.12677.md)] [[arXiv:2408.12677](https://arxiv.org/abs/2408.12677)] [[Code](https://github.com/GS-Fusion/GSFusion)]\n- **📝 说明**：🏆 Accepted to RAL 2025\n\n#### [11] gsplat: An Open-Source Library for Gaussian Splatting\n- **🧑‍🔬 作者**：Vickie Ye, Ruilong Li, Justin Kerr, Matias Turkulainen, Brent Yi, Zhuoyang Pan, Otto Seiskari, Jianbo Ye, Jeffrey Hu, Matthew Tancik, Angjoo Kanazawa\n- **🏫 单位**：UC Berkeley ⟐ Aalto University ⟐ ShanghaiTech University ⟐ SpectacularAI ⟐ Amazon ⟐ Luma AI\n- **🔗 链接**：[[中英摘要](../abs/2409.06765.md)] [[arXiv:2409.06765](https://arxiv.org/abs/2409.06765)] [[Code](https://github.com/nerfstudio-project/gsplat)]\n- **📝 说明**：🏆 Accepted to JMLR 2025\n\n#### [12] SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality\n- **🧑‍🔬 作者**：Hongjia Zhai, Xiyu Zhang, Boming Zhao, Hai Li, Yijia He, Zhaopeng Cui, Hujun Bao, Guofeng Zhang\n- **🏫 单位**：Zhejiang University ⟐ RayNeo\n- **🔗 链接**：[[中英摘要](../abs/2409.14067.md)] [[arXiv:2409.14067](https://arxiv.org/abs/2409.14067)] [[Code](https://github.com/zhaihongjia/SplatLoc)]\n- **📝 说明**：🏆 Accepted to TVCG 2025\n\n#### [13] Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model\n- **🧑‍🔬 作者**：Hongliang Zhong, Can Wang, Jingbo Zhang, Jing Liao\n- **🏫 单位**：Department of Computer Science, City University of Hong Kong, Hong Kong, China\n- **🔗 链接**：[[中英摘要](./abs/2409.16938.md)] [[arXiv:2409.16938](https://arxiv.org/abs/2409.16938)] [[Code](https://github.com/JiuTongBro/MultiView_Inpaint)]\n- **📝 说明**：🏆 Accepted to Visual Informatics 2025\n\n#### [14] Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance\n- **🧑‍🔬 作者**：Hongchao 
Shu, Mingxu Liu, Lalithkumar Seenivasan, Suxi Gu, Ping-Cheng Ku, Jonathan Knopf, Russell Taylor, Mathias Unberath\n- **🏫 单位**：Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA ⟐ Department of Orthopedics, Tsinghua Changgung Hospital, Tsinghua University, School of Medicine, Beijing, China ⟐ Arthrex, Inc. Naples, Florida, USA\n- **🔗 链接**：[[中英摘要](./abs/2410.00386.md)] [[arXiv:2410.00386](https://arxiv.org/abs/2410.00386)] [Code]\n- **📝 说明**：🏆 Accepted to AE-CAI 2024\n\n#### [15] CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Dapeng Feng, Zhiqiang Chen, Yizhen Yin, Shipeng Zhong, Yuhua Qi, Hongbo Chen\n- **🏫 单位**：Sun Yat-sen University ⟐ The University of Hong Kong ⟐ WeRide Inc.\n- **🔗 链接**：[[中英摘要](../abs/2410.00486.md)] [[arXiv:2410.00486](https://arxiv.org/abs/2410.00486)] [[Code](https://github.com/dapengfeng/cartgs)]\n- **📝 说明**：🏆 Accepted to RAL 2025\n\n#### [16] Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency\n- **🧑‍🔬 作者**：Florian Hahlbohm, Fabian Friederichs, Tim Weyrich, Linus Franke, Moritz Kappel, Susana Castillo, Marc Stamminger, Martin Eisemann, Marcus Magnor\n- **🏫 单位**：Computer Graphics Lab, TU Braunschweig, Germany ⟐ Visual Computing Erlangen, FAU Erlangen-Nürnberg, Germany ⟐ University College London (UCL), United Kingdom\n- **🔗 链接**：[[中英摘要](../abs/2410.08129.md)] [[arXiv:2410.08129](https://arxiv.org/abs/2410.08129)] [[Code](https://github.com/nerficg-project/HTGS)]\n- **📝 说明**：🏆 Accepted to Eurographics 2025\n\n#### [17] 4-LEGS: 4D Language Embedded Gaussian Splatting\n- **🧑‍🔬 作者**：Gal Fiebelman, Tamir Cohen, Ayellet Morgenstern, Peter Hedman, Hadar Averbuch-Elor\n- **🏫 单位**：Tel Aviv University ⟐ Google Research\n- **🔗 链接**：[[中英摘要](../abs/2410.10719.md)] [[arXiv:2410.10719](https://arxiv.org/abs/2410.10719)] [[Code](https://github.com/TAU-VAILab/4-LEGS)]\n- **📝 说明**：🏆 Accepted to Eurographics 2025\n\n#### [18] GSORB-SLAM: Gaussian Splatting SLAM benefits from ORB features and Transmittance information\n- **🧑‍🔬 作者**：Wancai Zheng, Xinyi Yu, Jintao Rong, Linlin Ou, Yan Wei, Libo Zhou\n- **🏫 单位**：Zhejiang University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2410.11356.md)] [[arXiv:2410.11356](https://arxiv.org/abs/2410.11356)] [[Code](https://github.com/Aczheng-cai/GSORB-SLAM)]\n- **📝 说明**: 🏆 Accepted to RAL 2025\n\n#### [19] MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields\n- **🧑‍🔬 作者**：Yuru Xiao, Deming Zhai, Wenbo Zhao, Kui Jiang, Junjun Jiang, Xianming Liu\n- **🏫 单位**：Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2410.11394.md)] [[arXiv:2410.11394](https://arxiv.org/abs/2410.11394)] [Code]\n- **📝 说明**: 🏆 Accepted to TPAMI 2025\n\n#### [20] E-3DGS: Gaussian Splatting with Exposure and Motion Events\n- **🧑‍🔬 作者**：Xiaoting Yin, Hao Shi, Yuhan Bao, Zhenshan Bing, Yiyi Liao, Kailun Yang, Kaiwei Wang\n- **🏫 单位**：Zhejiang University ⟐ Hunan University ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2410.16995.md)] [[arXiv:2410.16995](https://arxiv.org/abs/2410.16995)] [[Code](https://github.com/MasterHow/E-3DGS)]\n- **📝 说明**：🏆 Accepted to Applied Optics 2025\n\n#### [21] VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points\n- **🧑‍🔬 作者**：Linus Franke, Laura Fink, Marc Stamminger\n- **🏫 单位**：Friedrich-Alexander-Universität Erlangen-Nürnberg\n- **🔗 链接**：[[中英摘要](../abs/2410.17932.md)] [[arXiv:2410.17932](https://arxiv.org/abs/2410.17932)] [[Code](https://github.com/lfranke/vr_splatting)]\n- 
**📝 说明**：🏆 Accepted to i3D 2025\n\n#### [22] ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting\n- **🧑‍🔬 作者**：Yuetao Li, Zijia Kuang, Ting Li, Guyue Zhou, Shaohui Zhang, Zike Yan\n- **🏫 单位**：Beijing Institute of Technology ⟐ AIR, Tsinghua University ⟐ Centre for Frontier AI Research, A*STAR\n- **🔗 链接**：[[中英摘要](./abs/2410.21955.md)] [[arXiv:2410.21955](https://arxiv.org/abs/2410.21955)] [[Code](https://github.com/Li-Yuetao/ActiveSplat)]\n- **📝 说明**：🏆 Accepted to RAL 2025\n\n#### [23] MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation\n- **🧑‍🔬 作者**：Peng Wang, Lingzhe Zhao, Yin Zhang, Shiyu Zhao, Peidong Liu\n- **🏫 单位**：College of Computer Science and Technology, Zhejiang University ⟐ School of Engineering, Westlake University\n- **🔗 链接**：[[中英摘要](./abs/2411.08279.md)] [[arXiv:2411.08279](https://arxiv.org/abs/2411.08279)] [[Code](https://github.com/WU-CVGL/MBA-SLAM)]\n- **📝 说明**: 🏆 Accepted to TPAMI 2025\n\n#### [24] GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views\n- **🧑‍🔬 作者**：Boyao Zhou, Shunyuan Zheng, Hanzhang Tu, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ Harbin Institute of Technology ⟐ Harbin Institute of Technology, Shenzhen\n- **🔗 链接**：[[中英摘要](./abs/2411.11363.md)] [[arXiv:2411.11363](https://arxiv.org/abs/2411.11363)] [[Code](https://github.com/YaourtB/GPS_plus)]\n- **📝 说明**：🏆 Accepted to TPAMI 2025\n\n#### [25] Monocular Dynamic Gaussian Splatting is Fast and Brittle but Smooth Motion Helps\n- **🧑‍🔬 作者**：Yiqing Liang, Mikhail Okunev, Mikaela Angelina Uy, Runfeng Li, Leonidas Guibas, James Tompkin, Adam W. Harley\n- **🏫 单位**：Brown University ⟐ Stanford University ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2412.04457.md)] [[arXiv:2412.04457](https://arxiv.org/abs/2412.04457)] [[Code](https://github.com/lynl7130/MonoDyGauBench_code)]\n- **📝 说明**: 🏆 Accepted to TMLR 2025\n\n#### [26] 4DRGS: 4D Radiative Gaussian Splatting for Efficient 3D Vessel Reconstruction from Sparse-View Dynamic DSA Images\n- **🧑‍🔬 作者**：Zhentao Liu, Ruyi Zha, Huangxuan Zhao, Hongdong Li, Zhiming Cui\n- **🏫 单位**：ShanghaiTech University ⟐ Australian National University, Canberra, Australia ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2412.12919.md)] [[arXiv:2412.12919](https://arxiv.org/abs/2412.12919)] [Code]\n- **📝 说明**：🏆 Accepted to IPMI 2025\n\n#### [27] ActiveGS: Active Scene Reconstruction using Gaussian Splatting\n- **🧑‍🔬 作者**：Liren Jin, Xingguang Zhong, Yue Pan, Jens Behley, Cyrill Stachniss, Marija Popović\n- **🏫 单位**：University of Bonn ⟐ MAVLab ⟐ Lamarr Institute for Machine Learning and Artificial Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2412.17769.md)] [[arXiv:2412.17769](https://arxiv.org/abs/2412.17769)] [Code]\n- **📝 说明**：🏆 Accepted to RAL 2025\n\n#### [28] ActiveGAMER: Active GAussian Mapping through Efficient Rendering\n- **🧑‍🔬 作者**：Liyan Chen, Huangying Zhan, Kevin Chen, Xiangyu Xu, Qingan Yan, Changjiang Cai, Yi Xu\n- **🏫 单位**：OPPO US Research Center ⟐ Stevens Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2501.06897.md)] [[arXiv:2501.06897](https://arxiv.org/abs/2501.06897)] [Code]\n- **📝 说明**：🏆 Accepted to I3D 2025\n\n#### [29] RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video Based on Rectified Mesh-embedded Gaussians\n- **🧑‍🔬 作者**：Sen Peng, Weixing Xie, Zilong Wang, Xiaohu Guo, Zhonggui Chen, Baorong Yang, Xiao Dong\n- **🏫 单位**：Jimei 
University ⟐ Xiamen University ⟐ The University of Texas at Dallas ⟐ BNU-HKBU United International College\n- **🔗 链接**：[[中英摘要](../abs/2501.07104.md)] [[arXiv:2501.07104](https://arxiv.org/abs/2501.07104)] [[Code](https://github.com/RMAvatar/RMAvatar)]\n- **📝 说明**：🏆 Accepted to CVM 2025\n\n#### [30] Object-Centric 2D Gaussian Splatting: Background Removal and Occlusion-Aware Pruning for Compact Object Models\n- **🧑‍🔬 作者**：Marcel Rogge, Didier Stricker\n- **🏫 单位**：University of Kaiserslautern-Landau ⟐ Deutsches Forschungszentrum für Künstliche Intelligenz\n- **🔗 链接**：[[中英摘要](../abs/2501.08174.md)] [[arXiv:2501.08174](https://arxiv.org/abs/2501.08174)] [[Code](https://github.com/MarcelRogge/object-centric-2dgs)]\n- **📝 说明**：🏆 Accepted to ICPRAM 2025\n\n#### [31] Creating Virtual Environments with 3D Gaussian Splatting: A Comparative Study\n- **🧑‍🔬 作者**：Shi Qiu, Binzhu Xie, Qixuan Liu, Pheng-Ann Heng\n- **🏫 单位**：The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2501.09302.md)] [[arXiv:2501.09302](https://arxiv.org/abs/2501.09302)] [Code]\n- **📝 说明**：🏆 Accepted to IEEE VR 2025\n\n#### [32] HAC++: Towards 100X Compression of 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yihang Chen, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai\n- **🏫 单位**：Monash University ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2501.12255.md)] [[arXiv:2501.12255](https://arxiv.org/abs/2501.12255)] [[Code](https://github.com/YihangChen-ee/HAC-plus)]\n- **📝 说明**：🏆 Accepted to TPAMI 2025\n\n#### [33] HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting\n- **🧑‍🔬 作者**：Javier Yu, Timothy Chen, Mac Schwager\n- **🏫 单位**：Stanford University Department of Aeronautics and Astronautics\n- **🔗 链接**：[[中英摘要](./abs/2501.14147.md)] [[arXiv:2501.14147](https://arxiv.org/abs/2501.14147)] [Code]\n- **📝 说明**: 🏆 Accepted to RAL 2025\n\n#### [34] VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion\n- **🧑‍🔬 作者**：Shaoting Zhu, Linzhan Mou, Derun Li, Baijun Ye, Runhan Huang, Hang Zhao\n- **🏫 单位**：IIIS, Tsinghua University ⟐ Galaxea AI ⟐ Shanghai Qi Zhi Institute ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2502.01536.md)] [[arXiv:2502.01536](https://arxiv.org/abs/2502.01536)] [[Code](https://github.com/zst1406217/VR-Robo)]\n- **📝 说明**：🏆 Accepted to RAL 2025\n\n#### [35] PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map\n- **🧑‍🔬 作者**：Yue Pan, Xingguang Zhong, Liren Jin, Louis Wiesmann, Marija Popović, Jens Behley, Cyrill Stachniss\n- **🏫 单位**：University of Bonn ⟐ MAVLab ⟐ Lamarr Institute for Machine Learning and Artificial Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2502.05752.md)] [[arXiv:2502.05752](https://arxiv.org/abs/2502.05752)] [Code]\n- **📝 说明**: 🏆 Accepted to RSS 2025\n\n#### [36] GS-TransUNet: Integrated 2D Gaussian Splatting and Transformer UNet for Accurate Skin Lesion Analysis\n- **🧑‍🔬 作者**：Anand Kumar, Kavinder Roghit Kanthen, Josna John\n- **🏫 单位**：University of California, San Diego\n- **🔗 链接**：[[中英摘要](./abs/2502.16748.md)] [[arXiv:2502.16748](https://arxiv.org/abs/2502.16748)] [Code]\n- **📝 说明**：🏆 Accepted to SPIE Medical Imaging 2025\n\n#### [37] Does 3D Gaussian Splatting Need Accurate Volumetric Rendering?\n- **🧑‍🔬 作者**：Adam Celarek, George Kopanas, George Drettakis, Michael Wimmer, Bernhard Kerbl\n- **🏫 单位**：TU Wien, Austria ⟐ Google, United Kingdom ⟐ Inria, France ⟐ Université Côte d’Azur, France\n- **🔗 链接**：[[中英摘要](../abs/2502.19318.md)] [[arXiv:2502.19318](https://arxiv.org/abs/2502.19318)] 
[[Code](https://github.com/cg-tuwien/does_3d_gaussian_splatting_need_accurate_volumetric_rendering)]\n- **📝 说明**：🏆 Accepted to Eurographics 2025\n\n#### [38] GaussianSeal: Rooting Adaptive Watermarks for 3D Gaussian Generation Model\n- **🧑‍🔬 作者**：Runyi Li, Xuanyu Zhang, Chuhan Tong, Zhipei Xu, Jian Zhang\n- **🏫 单位**：School of Electronic and Computer Engineering, Peking University, Shenzhen, China\n- **🔗 链接**：[[中英摘要](./abs/2503.00531.md)] [[arXiv:2503.00531](https://arxiv.org/abs/2503.00531)] [Code]\n- **📝 说明**: 🏆 Accepted to MIR 2025\n\n#### [39] Learning High-Fidelity Robot Self-Model with Articulated 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kejun Hu, Peng Yu, Ning Tan\n- **🏫 单位**：School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China\n- **🔗 链接**：[[中英摘要](./abs/2503.05398.md)] [[arXiv:2503.05398](https://arxiv.org/abs/2503.05398)] [Code]\n- **📝 说明**: 🏆 Accepted to IJRR 2025\n\n#### [40] Introducing Unbiased Depth into 2D Gaussian Splatting for High-accuracy Surface Reconstruction\n- **🧑‍🔬 作者**：Xiaoming Peng, Yixin Yang, Yang Zhou, Hui Huang\n- **🏫 单位**：Visual Computing Research Center, CCSE, Shenzhen University\n- **🔗 链接**：[[中英摘要](./abs/2503.06587.md)] [[arXiv:2503.06587](https://arxiv.org/abs/2503.06587)] [Code]\n- **📝 说明**: 🏆 Accepted to Pacific Graphics 2025\n\n#### [41] Motion Blender Gaussian Splatting for Dynamic Reconstruction\n- **🧑‍🔬 作者**：Xinyu Zhang, Haonan Chang, Yuhan Liu, Abdeslam Boularias\n- **🏫 单位**：Department of Computer Science, Rutgers University\n- **🔗 链接**：[[中英摘要](./abs/2503.09040.md)] [[arXiv:2503.09040](https://arxiv.org/abs/2503.09040)] [[Code](https://github.com/mlzxy/motion-blender-gs)]\n- **📝 说明**: 🏆 Accepted to CoRL 2025\n\n#### [42] GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Sixu Li, Ben Keller, Yingyan Celine Lin, Brucek Khailany\n- **🏫 单位**：School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA ⟐ NVIDIA Research, NVIDIA, Santa Clara, CA, USA ⟐ NVIDIA Research, NVIDIA, Austin, TX, USA\n- **🔗 链接**：[[中英摘要](../abs/2503.16681.md)] [[arXiv:2503.16681](https://arxiv.org/abs/2503.16681)] [Code]\n- **📝 说明**：🏆 Accepted to DAC 2025\n\n#### [43] VizFlyt: Perception-centric Pedagogical Framework For Autonomous Aerial Robots\n- **🧑‍🔬 作者**：Kushagra Srivastava, Rutwik Kulkarni, Manoj Velmurugan, Nitin J. 
Sanket\n- **🏫 单位**：Perception and Autonomous Robotics Group (PeAR)\n- **🔗 链接**：[[中英摘要](../abs/2503.22876.md)] [[arXiv:2503.22876](https://arxiv.org/abs/2503.22876)] [[Code](https://github.com/pearwpi/VizFlyt)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [44] Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR\n- **🧑‍🔬 作者**：Zhifan Ye, Yonggan Fu, Jingqun Zhang, Leshu Li, Yongan Zhang, Sixu Li, Cheng Wan, Chenxi Wan, Chaojian Li, Sreemanth Prathipati, Yingyan Celine Lin\n- **🏫 单位**：Georgia Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2503.23625.md)] [[arXiv:2503.23625](https://arxiv.org/abs/2503.23625)] [Code]\n- **📝 说明**：🏆 Accepted to HPCA 2025\n\n#### [45] Robust LiDAR-Camera Calibration with 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Shuyi Zhou, Shuxiang Xie, Ryoichi Ishikawa, Takeshi Oishi\n- **🏫 单位**：The Institute of Industrial Science, The University of Tokyo, Japan\n- **🔗 链接**：[[中英摘要](../abs/2504.00525.md)] [[arXiv:2504.00525](https://arxiv.org/abs/2504.00525)] [[Code](https://github.com/ShuyiZhou495/RobustCalibration)]\n- **📝 说明**：🏆 Accepted to RAL 2025\n\n#### [46] FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking\n- **🧑‍🔬 作者**：Ulas Gunes, Matias Turkulainen, Xuqian Ren, Arno Solin, Juho Kannala, Esa Rahtu\n- **🏫 单位**：Tampere University, Finland ⟐ Aalto University, Finland\n- **🔗 链接**：[[中英摘要](../abs/2504.01732.md)] [[arXiv:2504.01732](https://arxiv.org/abs/2504.01732)] [Code]\n- **📝 说明**：🏆 Accepted to SCIA 2025\n\n#### [47] Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization\n- **🧑‍🔬 作者**：Haishan Wang, Mohammad Hassan Vali, Arno Solin\n- **🏫 单位**：Department of Computer Science, Aalto University, Espoo, Finland\n- **🔗 链接**：[[中英摘要](../abs/2504.03059.md)] [[arXiv:2504.03059](https://arxiv.org/abs/2504.03059)] [Code]\n- **📝 说明**：🏆 Accepted to SCIA 2025\n\n#### [48] Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization\n- **🧑‍🔬 作者**：Yikai Wang, Guangce Liu, Xinzhou Wang, Zilong Chen, Jiafang Li, Xin Liang, Fuchun Sun, Jun Zhu\n- **🏫 单位**： Tsinghua University ⟐ ShengShu ⟐ Tongji University\n- **🔗 链接**：[[中英摘要](../abs/2504.04153.md)] [[arXiv:2504.04153](https://arxiv.org/abs/2504.04153)] [[Code](https://github.com/yikaiw/Vidu4D)]\n- **📝 说明**：🏆 Accepted to TPAMI 2025\n\n#### [49] Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation\n- **🧑‍🔬 作者**：Bram Vanherle, Brent Zoomers, Jeroen Put, Frank Van Reeth, Nick Michiels\n- **🏫 单位**：Hasselt University\n- **🔗 链接**：[[中英摘要](./abs/2504.08473.md)] [[arXiv:2504.08473](https://arxiv.org/abs/2504.08473)] [Code]\n- **📝 说明**: 🏆 Accepted to ROBOVIS 2025\n\n#### [50] 3D Gabor Splatting: Reconstruction of High-frequency Surface Texture using Gabor Noise\n- **🧑‍🔬 作者**：Haato Watanabe, Kenji Tojo, Nobuyuki Umetani\n- **🏫 单位**：The University of Tokyo\n- **🔗 链接**：[[中英摘要](./abs/2504.11003.md)] [[arXiv:2504.11003](https://arxiv.org/abs/2504.11003)] [Code]\n- **📝 说明**: 🏆 Accepted to Eurographics 2025\n\n#### [51] Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation\n- **🧑‍🔬 作者**：Sizhe Yang, Wenye Yu, Jia Zeng, Jun Lv, Kerui Ren, Cewu Lu, Dahua Lin, Jiangmiao Pang\n- **🏫 单位**：Shanghai AI Laboratory ⟐ The Chinese University of Hong Kong ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](../abs/2504.13175.md)] [[arXiv:2504.13175](https://arxiv.org/abs/2504.13175)] [Code]\n- **📝 说明**：🏆 Accepted to RSS 2025\n\n#### [52] Green Robotic Mixed Reality with Gaussian 
Splatting\n- **🧑‍🔬 作者**：Chenxuan Liu, He Li, Zongze Li, Shuai Wang, Wei Xu, Kejiang Ye, Derrick Wing Kwan Ng, Chengzhong Xu\n- **🏫 单位**：Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China ⟐ University of Chinese Academy of Sciences ⟐ State Key Laboratory of IOTSC, Department of Computer and Information Science, University of Macau, Macau, China ⟐ Peng Cheng Laboratory, Shenzhen, China ⟐ Manifold Tech Limited, Hong Kong, China ⟐ School of Electrical Engineering and Telecommunications, the University of New South Wales, Australia\n- **🔗 链接**：[[中英摘要](./abs/2504.13697.md)] [[arXiv:2504.13697](https://arxiv.org/abs/2504.13697)] [Code]\n- **📝 说明**: 🏆 Accepted to IEEE INFOCOM 2025 Workshop on Networked Robotics and Communication Systems\n\n#### [53] NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation\n- **🧑‍🔬 作者**：Junyuan Fang, Zihan Wang, Yejun Zhang, Shuzhe Wang, Iaroslav Melekhov, Juho Kannala\n- **🏫 单位**：Aalto University, Espoo, Finland ⟐ University of Helsinki, Helsinki, Finland ⟐ University of Oulu, Oulu, Finland\n- **🔗 链接**：[[中英摘要](./abs/2504.14638.md)] [[arXiv:2504.14638](https://arxiv.org/abs/2504.14638)] [Code]\n- **📝 说明**: 🏆 Accepted to SCIA 2025\n\n#### [54] Immersive Teleoperation Framework for Locomanipulation Tasks\n- **🧑‍🔬 作者**：Takuya Boehringer, Jonathan Embley-Riches, Karim Hammoud, Valerio Modugno, Dimitrios Kanoulas\n- **🏫 单位**：Department of Computer Science, University College London\n- **🔗 链接**：[[中英摘要](../abs/2504.15229.md)] [[arXiv:2504.15229](https://arxiv.org/abs/2504.15229)] [Code]\n- **📝 说明**：🏆 Accepted to CASE 2025\n\n#### [55] PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation\n- **🧑‍🔬 作者**：Wenxuan Li, Hang Zhao, Zhiyuan Yu, Yu Du, Qin Zou, Ruizhen Hu, Kai Xu\n- **🏫 单位**：National University of Defense Technology ⟐ Wuhan University ⟐ Shenzhen University ⟐ Guangdong Laboratory of Artificial Intelligence and Digital Economy\n- **🔗 链接**：[[中英摘要](../abs/2504.16693.md)] [[arXiv:2504.16693](https://arxiv.org/abs/2504.16693)] [[Code](https://github.com/XuAdventurer/PIN-WM)]\n- **📝 说明**: 🏆 Accepted to RSS 2025\n\n#### [56] iVR-GS: Inverse Volume Rendering for Explorable Visualization via Editable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kaiyuan Tang, Siyuan Yao, Chaoli Wang\n- **🏫 单位**：Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA\n- **🔗 链接**：[[中英摘要](../abs/2504.17954.md)] [[arXiv:2504.17954](https://arxiv.org/abs/2504.17954)] [[Code](https://github.com/TouKaienn/iVR-GS)]\n- **📝 说明**: 🏆 Accepted to TVCG 2025\n\n#### [57] PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models\n- **🧑‍🔬 作者**：Michel Gokan Khan, Renan Guarese, Fabian Johnson, Xi Vincent Wang, Anders Bergman, Benjamin Edvinsson, Mario Romero, Jérémy Vachier, Jan Kronqvist\n- **🏫 单位**：School of Engineering Sciences, KTH Royal Institute of Technology, Stockholm, Sweden ⟐ Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden ⟐ School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden ⟐ AstraZeneca, Stockholm, Sweden ⟐ School of Industrial Engineering and Management, KTH Royal Institute of Technology, Stockholm, Sweden ⟐ Department of Science and Technology, Linköping University, Norrköping, Sweden\n- **🔗 链接**：[[中英摘要](./abs/2504.18165.md)] [[arXiv:2504.18165](https://arxiv.org/abs/2504.18165)] [Code]\n- **📝 说明**: 🏆 Accepted to IEEE Access 2025\n\n#### 
[58] EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian\n- **🧑‍🔬 作者**：Hao Tian, Rui Liu, Wen Shen, Yilong Hu, Zhihao Zheng, Xiaolin Qin\n- **🏫 单位**：Chengdu Institute of Computer Applications, Chinese Academy of Sciences ⟐ Minzu University of China ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2504.20607.md)] [[arXiv:2504.20607](https://arxiv.org/abs/2504.20607)] [Code]\n- **📝 说明**: 🏆 Accepted to IJCNN 2025\n\n#### [59] GarmentGS: Point-Cloud Guided Gaussian Splatting for High-Fidelity Non-Watertight 3D Garment Reconstruction\n- **🧑‍🔬 作者**：Zhihao Tang, Shenghao Yang, Hongtao Zhang, Mingbo Zhao\n- **🏫 单位**：Donghua University\n- **🔗 链接**：[[中英摘要](../abs/2505.02126.md)] [[arXiv:2505.02126](https://arxiv.org/abs/2505.02126)] [Code]\n- **📝 说明**: 🏆 Accepted to ICMR 2025\n\n#### [60] Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting\n- **🧑‍🔬 作者**：Feng Yang, Wenliang Qian, Wangmeng Zuo, Hui Li\n- **🏫 单位**：Key Lab of Smart Prevention and Mitigation of Civil Engineering Disasters of the Ministry of Industry and Information Technology, Harbin Institute of Technology ⟐ School of Computer Science and Technology, Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2505.04262.md)] [[arXiv:2505.04262](https://arxiv.org/abs/2505.04262)] [Code]\n- **📝 说明**: 🏆 Accepted to Neural Networks 2025\n\n#### [61] VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality\n- **🧑‍🔬 作者**：Xuechang Tu, Lukas Radl, Michael Steiner, Markus Steinberger, Bernhard Kerbl, Fernando de la Torre\n- **🏫 单位**：Peking University ⟐ Graz University of Technology ⟐ Huawei Technologies ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2505.10144.md)] [[arXiv:2505.10144](https://arxiv.org/abs/2505.10144)] [[Code](https://github.com/Cekavis/VRSplat)]\n- **📝 说明**: 🏆 Accepted to I3D 2025\n\n#### [62] Multiview Geometric Regularization of Gaussian Splatting for Accurate Radiance Fields\n- **🧑‍🔬 作者**：Jungeon Kim, Geonsoo Park, Seungyong Lee\n- **🏫 单位**：POSTECH\n- **🔗 链接**：[[中英摘要](./abs/2506.13508.md)] [[arXiv:2506.13508](https://arxiv.org/abs/2506.13508)] [Code]\n- **📝 说明**: 🏆 Accepted to EGSR 2025\n\n#### [63] A3FR: Agile 3D Gaussian Splatting with Incremental Gaze Tracked Foveated Rendering in Virtual Reality\n- **🧑‍🔬 作者**：Shuo Xin, Haiyu Wang, Sai Qian Zhang\n- **🏫 单位**：New York University\n- **🔗 链接**：[[中英摘要](./abs/2507.04147.md)] [[arXiv:2507.04147](https://arxiv.org/abs/2507.04147)] [Code]\n- **📝 说明**: 🏆 Accepted to ICS 2025\n\n#### [64] NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kuangshi Ai, Kaiyuan Tang, Chaoli Wang\n- **🏫 单位**：University of Notre Dame\n- **🔗 链接**：[[中英摘要](./abs/2507.12621.md)] [[arXiv:2507.12621](https://arxiv.org/abs/2507.12621)] [[Code](https://github.com/KuangshiAi/nli4volvis)]\n- **📝 说明**: 🏆 Accepted to IEEE VIS 2025 (Best Paper Award)\n\n#### [65] TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting\n- **🧑‍🔬 作者**：Kaiyuan Tang, Kuangshi Ai, Jun Han, Chaoli Wang\n- **🏫 单位**：University of Notre Dame ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2507.13586.md)] [[arXiv:2507.13586](https://arxiv.org/abs/2507.13586)] [Code]\n- **📝 说明**: 🏆 Accepted to IEEE VIS 2025\n\n#### [66] DWTGS: Rethinking Frequency Regularization for Sparse-view 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hung Nguyen, Runfa Li, An Le, Truong Nguyen\n- 
**🏫 单位**：University of California San Diego\n- **🔗 链接**：[[中英摘要](./abs/2507.15690.md)] [[arXiv:2507.15690](https://arxiv.org/abs/2507.15690)] [Code]\n- **📝 说明**: 🏆 Accepted to VCIP 2025\n\n#### [67] LT-Gaussian: Long-Term Map Update Using 3D Gaussian Splatting for Autonomous Driving\n- **🧑‍🔬 作者**：Luqi Cheng, Zhangshuo Qi, Zijie Zhou, Chao Lu, Guangming Xiong\n- **🏫 单位**：Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.01704.md)] [[arXiv:2508.01704](https://arxiv.org/abs/2508.01704)] [[Code](https://github.com/ChengLuqi/LT-gaussian)]\n- **📝 说明**: 🏆 Accepted to IV 2025\n\n#### [68] Low-Frequency First: Eliminating Floating Artifacts in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jianchao Wang, Peng Zhou, Cen Li, Rong Quan, Jie Qin\n- **🏫 单位**：Nanjing University of Aeronautics and Astronautics\n- **🔗 链接**：[[中英摘要](./abs/2508.02493.md)] [[arXiv:2508.02493](https://arxiv.org/abs/2508.02493)] [[Code](https://github.com/jcwang-gh/EFA-GS)]\n- **📝 说明**: 🏆 Accepted to CW 2025\n\n#### [69] EGS-SLAM: RGB-D Gaussian Splatting SLAM with Events\n- **🧑‍🔬 作者**：Siyu Chen, Shenghai Yuan, Thien-Minh Nguyen, Zhuyu Huang, Chenyang Shi, Jin Jing, Lihua Xie\n- **🏫 单位**：Nanyang Technological University ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2508.07003.md)] [[arXiv:2508.07003](https://arxiv.org/abs/2508.07003)] [[Code](https://github.com/Chensiyu00/EGS-SLAM)]\n- **📝 说明**: 🏆 Accepted to RAL 2025\n\n#### [70] GS4Buildings: Prior-Guided Gaussian Splatting for 3D Building Reconstruction\n- **🧑‍🔬 作者**：Qilin Zhang, Olaf Wysocki, Boris Jutzi\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2508.07355.md)] [[arXiv:2508.07355](https://arxiv.org/abs/2508.07355)] [[Code](https://github.com/zqlin0521/GS4Buildings)]\n- **📝 说明**: 🏆 Accepted to ISPRS 3D GeoInfo & Smart Data, Smart Cities 2025\n\n#### [71] IntelliCap: Intelligent Guidance for Consistent View Sampling\n- **🧑‍🔬 作者**：Ayaka Yasunaga, Hideo Saito, Dieter Schmalstieg, Shohei Mori\n- **🏫 单位**：Keio University ⟐ University of Stuttgart\n- **🔗 链接**：[[中英摘要](./abs/2508.13043.md)] [[arXiv:2508.13043](https://arxiv.org/abs/2508.13043)] [Code]\n- **📝 说明**: 🏆 Accepted to ISMAR 2025\n\n#### [72] UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation\n- **🧑‍🔬 作者**：Zhaodong Jiang, Ashish Sinha, Tongtong Cao, Yuan Ren, Bingbing Liu, Binbin Xu\n- **🏫 单位**：Huawei Noah’s Ark Lab ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2508.15972.md)] [[arXiv:2508.15972](https://arxiv.org/abs/2508.15972)] [Code]\n- **📝 说明**: 🏆 Accepted to CoRL 2025\n\n#### [73] LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation\n- **🧑‍🔬 作者**：Yupeng Zhang, Dezhi Zheng, Ping Lu, Han Zhang, Lei Wang, Liping Xiang, Cheng Luo, Kaijun Deng, Xiaowen Fu, Linlin Shen, Jinbao Wang\n- **🏫 单位**：Shenzhen University ⟐ ZTE Co., Ltd ⟐ King Abdullah University of Science and Technology ⟐ Guangdong Provincial Key Laboratory of Intelligent Information Processing\n- **🔗 链接**：[[中英摘要](./abs/2508.19699.md)] [[arXiv:2508.19699](https://arxiv.org/abs/2508.19699)] [[Code](https://github.com/garrisonz/LabelGS)]\n- **📝 说明**: 🏆 Accepted to PRCV 2025\n\n#### [74] RadGS-Reg: Registering Spine CT with Biplanar X-rays via Joint 3D Radiative Gaussians Reconstruction and 3D/3D Registration\n- **🧑‍🔬 作者**：Ao Shen, Xueming Fu, Junfeng Jiang, Qiang Zeng, Ye Tang, Zhengming Chen, Luming Nong, Feng Wang, S. 
Kevin Zhou\n- **🏫 单位**：Hohai University ⟐ University of Science and Technology of China ⟐ The Third Affiliated Hospital of Nanjing Medical University ⟐ Tuodao Medical Technology Co., Ltd.\n- **🔗 链接**：[[中英摘要](./abs/2508.21154.md)] [[arXiv:2508.21154](https://arxiv.org/abs/2508.21154)] [[Code](https://github.com/shenao1995/RadGS_Reg)]\n- **📝 说明**: 🏆 Accepted to MICCAI 2025\n\n#### [75] GS-TG: 3D Gaussian Splatting Accelerator with Tile Grouping for Reducing Redundant Sorting while Preserving Rasterization Efficiency\n- **🧑‍🔬 作者**：Joongho Jo, Jongsun Park\n- **🏫 单位**：Korea University\n- **🔗 链接**：[[中英摘要](./abs/2509.00911.md)] [[arXiv:2509.00911](https://arxiv.org/abs/2509.00911)] [Code]\n- **📝 说明**: 🏆 Accepted to DAC 2025\n\n#### [76] Towards Integrating Multi-Spectral Imaging with Gaussian Splatting\n- **🧑‍🔬 作者**：Josef Grün, Lukas Meyer, Maximilian Weiherer, Bernhard Egger, Marc Stamminger, Linus Franke\n- **🏫 单位**：Friedrich-Alexander-Universität Erlangen-Nürnberg\n- **🔗 链接**：[[中英摘要](./abs/2509.00989.md)] [[arXiv:2509.00989](https://arxiv.org/abs/2509.00989)] [Code]\n- **📝 说明**: 🏆 Accepted to VMV 2025\n\n#### [77] Efficient Geometry Compression and Communication for 3D Gaussian Splatting Point Clouds\n- **🧑‍🔬 作者**：Liang Xie, Yanting Li, Luyang Tang, Wei Gao\n- **🏫 单位**：Guangdong University of Technology Guangzhou ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2509.02232.md)] [[arXiv:2509.02232](https://arxiv.org/abs/2509.02232)] [Code]\n- **📝 说明**: 🏆 Accepted to ACM MOBICOM 2025\n\n#### [78] On the Geometric Accuracy of Implicit and Primitive-based Representations Derived from View Rendering Constraints\n- **🧑‍🔬 作者**：Elias De Smijter, Renaud Detry, Christophe De Vleeschouwer\n- **🏫 单位**：KU Leuven ⟐ UCLouvain\n- **🔗 链接**：[[中英摘要](./abs/2509.10241.md)] [[arXiv:2509.10241](https://arxiv.org/abs/2509.10241)] [Code]\n- **📝 说明**: 🏆 Accepted to ASTRA 2025\n\n#### [79] A Controllable 3D Deepfake Generation Framework with Gaussian Splatting\n- **🧑‍🔬 作者**：Wending Liu, Siyun Liang, Huy H. Nguyen, Isao Echizen\n- **🏫 单位**：The University of Tokyo ⟐ National Institute of Informatics ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2509.11624.md)] [[arXiv:2509.11624](https://arxiv.org/abs/2509.11624)] [Code]\n- **📝 说明**: 🏆 Accepted to IJCB 2025\n\n#### [80] E2-BKI: Evidential Ellipsoidal Bayesian Kernel Inference for Uncertainty-aware Gaussian Semantic Mapping\n- **🧑‍🔬 作者**：Junyoung Kim, Minsik Jeon, Jihong Min, Kiho Kwak, Junwon Seo\n- **🏫 单位**：Agency for Defense Development ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2509.11964.md)] [[arXiv:2509.11964](https://arxiv.org/abs/2509.11964)] [Code]\n- **📝 说明**: 🏆 Accepted to IEEE RA-L 2025\n\n#### [81] Perception-Integrated Safety Critical Control via Analytic Collision Cone Barrier Functions on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Dario Tscholl, Yashwanth Nakka, Brian Gunter\n- **🏫 单位**： Georgia Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.14421.md)] [[arXiv:2509.14421](https://arxiv.org/abs/2509.14421)] [Code]\n- **📝 说明**: 🏆 Accepted to IEEE L-CSS 2025\n\n#### [82] ConfidentSplat: Confidence-Weighted Depth Fusion for Accurate 3D Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Amanuel T. 
Dufera, Yuan-Li Cai\n- **🏫 单位**：Xi’an Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2509.16863.md)] [[arXiv:2509.16863](https://arxiv.org/abs/2509.16863)] [Code]\n- **📝 说明**: 🏆 Accepted to CCSSTA 2025\n\n#### [83] PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control\n- **🧑‍🔬 作者**：Tianheng Zhu, Yinfeng Yu, Liejun Wang, Fuchun Sun, Wendong Zheng\n- **🏫 单位**：Xinjiang University ⟐ Tsinghua University ⟐ Tianjin University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.16922.md)] [[arXiv:2509.16922](https://arxiv.org/abs/2509.16922)] [Code]\n- **📝 说明**: 🏆 Accepted to ICONIP 2025\n\n#### [84] GaussEdit: Adaptive 3D Scene Editing with Text and Image Prompts\n- **🧑‍🔬 作者**：Zhenyu Shu, Junlong Yu, Kai Chao, Shiqing Xin, Ligang Liu\n- **🏫 单位**：NingboTech University ⟐ Zhejiang University ⟐ Xi’an Jiaotong University ⟐ Shandong University ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2509.26055.md)] [[arXiv:2509.26055](https://arxiv.org/abs/2509.26055)] [Code]\n- **📝 说明**: 🏆 Accepted to TVCG 2025\n\n#### [85] RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction\n- **🧑‍🔬 作者**：Leshu Li, Jiayin Qin, Jie Peng, Zishen Wan, Huaizhi Qu, Ye Han, Pingqing Zheng, Hongsen Zhang, Yu Cao, Tianlong Chen, Yang Katie Zhao\n- **🏫 单位**：University of Minnesota ⟐ University of North Carolina at Chapel Hill ⟐ Georgia Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.06644.md)] [[arXiv:2510.06644](https://arxiv.org/abs/2510.06644)] [[Code](https://github.com/UMN-ZhaoLab/RTGS)]\n- **📝 说明**: 🏆 Accepted to MICRO 2025\n\n#### [86] SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis\n- **🧑‍🔬 作者**：Jipeng Lyu, Jiahua Dong, Yu-Xiong Wang\n- **🏫 单位**：University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2510.06694.md)] [[arXiv:2510.06694](https://arxiv.org/abs/2510.06694)] [Code]\n- **📝 说明**: 🏆 Accepted to TMLR 2025\n\n#### [87] DEGS: Deformable Event-based 3D Gaussian Splatting from RGB and Event Stream\n- **🧑‍🔬 作者**：Junhao He, Jiaxu Wang, Jia Li, Mingyuan Sun, Qiang Zhang, Jiahang Cao, Ziyi Zhang, Yi Gu, Jingkai Sun, Renjing Xu\n- **🏫 单位**：Hong Kong University of Science and Technology (Guangzhou) ⟐ Hong Kong University of Science and Technology ⟐ University of Hong Kong ⟐ Northeastern University\n- **🔗 链接**：[[中英摘要](./abs/2510.07752.md)] [[arXiv:2510.07752](https://arxiv.org/abs/2510.07752)] [Code]\n- **📝 说明**: 🏆 Accepted to TVCG 2025\n\n#### [88] Efficient Label Refinement for Face Parsing Under Extreme Poses Using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Ankit Gahlawat, Anirban Mukherjee, Dinesh Babu Jayagopi\n- **🏫 单位**：International Institute of Information Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.08096.md)] [[arXiv:2510.08096](https://arxiv.org/abs/2510.08096)] [Code]\n- **📝 说明**: 🏆 Accepted to VCIP 2025\n\n#### [89] EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation\n- **🧑‍🔬 作者**：Tianheng Zhu, Yinfeng Yu, Liejun Wang, Fuchun Sun, Wendong Zheng\n- **🏫 单位**：Xinjiang University ⟐ Tsinghua University ⟐ Tianjin University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.08587.md)] [[arXiv:2510.08587](https://arxiv.org/abs/2510.08587)] [[Code](https://github.com/ZhuTianheng/EGSTalker)]\n- **📝 说明**: 🏆 Accepted to SMC 2025\n\n#### [90] Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications\n- **🧑‍🔬 作者**：Naruya Kondo, Yuto Asano, Yoichi Ochiai\n- **🏫 单位**：University of Tsukuba\n- **🔗 链接**：[[中英摘要](./abs/2510.13978.md)] 
[[arXiv:2510.13978](https://arxiv.org/abs/2510.13978)] [Code]\n- **📝 说明**: 🏆 Accepted to SUI 2025 Demo Track\n\n#### [91] BalanceGS: Algorithm-System Co-design for Efficient 3D Gaussian Splatting Training on GPU\n- **🧑‍🔬 作者**：Junyi Wu, Jiaming Xu, Jinhao Li, Yongkang Zhou, Jiayi Pan, Xingyang Li, Guohao Dai\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Infinigence-AI ⟐ SII\n- **🔗 链接**：[[中英摘要](./abs/2510.14564.md)] [[arXiv:2510.14564](https://arxiv.org/abs/2510.14564)] [Code]\n- **📝 说明**: 🏆 Accepted to ASP-DAC 2026\n\n#### [92] From Volume Rendering to 3D Gaussian Splatting: Theory and Applications\n- **🧑‍🔬 作者**：Vitor Pereira Matias, Daniel Perazzo, Vinicius Silva, Alberto Raposo, Luiz Velho, Afonso Paiva, Tiago Novello\n- **🏫 单位**：ICMC-USP ⟐ IMPA ⟐ PUC-RIO\n- **🔗 链接**：[[中英摘要](./abs/2510.18101.md)] [[arXiv:2510.18101](https://arxiv.org/abs/2510.18101)] [Code]\n- **📝 说明**: 🏆 Accepted to SIBGRAPI 2025\n\n#### [93] Explicit Memory through Online 3D Gaussian Splatting Improves Class-Agnostic Video Segmentation\n- **🧑‍🔬 作者**：Anthony Opipari, Aravindhan K Krishnan, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo, Arnie Sen, Odest Chadwicke Jenkins\n- **🏫 单位**：University of Michigan ⟐ Amazon Inc.\n- **🔗 链接**：[[中英摘要](./abs/2510.23521.md)] [[arXiv:2510.23521](https://arxiv.org/abs/2510.23521)] [Code]\n- **📝 说明**: 🏆 Accepted to RAL 2025\n\n#### [94] Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation\n- **🧑‍🔬 作者**：Yuxiang Mao, Zhijie Zhang, Zhiheng Zhang, Jiawei Liu, Chen Zeng, Shihong Xia\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Huadian (Beijing) Co-Generation Co., Ltd.\n- **🔗 链接**：[[中英摘要](./abs/2510.25234.md)] [[arXiv:2510.25234](https://arxiv.org/abs/2510.25234)] [Code]\n- **📝 说明**: 🏆 Accepted to ICXR 2025\n\n#### [95] CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hexu Zhao, Xiwen Min, Xiaoteng Liu, Moonjun Gong, Yiming Li, Ang Li, Saining Xie, Jinyang Li, Aurojit Panda\n- **🏫 单位**：New York University ⟐ Pacific Northwest National Laboratory ⟐ University of Washington\n- **🔗 链接**：[[中英摘要](./abs/2511.04951.md)] [[arXiv:2511.04951](https://arxiv.org/abs/2511.04951)] [Code]\n- **📝 说明**: 🏆 Accepted to ASPLOS 2026\n\n#### [96] EOGS++: Earth Observation Gaussian Splatting with Internal Camera Refinement and Direct Panchromatic Rendering\n- **🧑‍🔬 作者**：Pierrick Bournez, Luca Savant Aira, Thibaud Ehret, Gabriele Facciolo\n- **🏫 单位**：Université Paris-Saclay ⟐ Politecnico di Torino ⟐ AMIAD\n- **🔗 链接**：[[中英摘要](../abs/2511.16542.md)] [[arXiv:2511.16542](https://arxiv.org/abs/2511.16542)] [Code]\n- **📝 说明**: 🏆 Accepted to ISPRS 2025\n\n#### [97] FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception\n- **🧑‍🔬 作者**：Shubham Sonarghare, Prasad Deshpande, Ciaran Hogan, Deepika-Rani Kaliappan-Mahalingam, Ganesh Sistu\n- **🏫 单位**：Valeo Vision Systems\n- **🔗 链接**：[[中英摘要](../abs/2511.17210.md)] [[arXiv:2511.17210](https://arxiv.org/abs/2511.17210)] [Code]\n- **📝 说明**: 🏆 Accepted to IMVIP 2025\n\n#### [98] ReCoGS: Real-time ReColoring for Gaussian Splatting scenes\n- **🧑‍🔬 作者**：Lorenzo Rutayisire, Nicola Capodieci, Fabio Pellacini\n- **🏫 单位**：University of Modena and Reggio Emilia\n- **🔗 链接**：[[中英摘要](../abs/2511.18441.md)] [[arXiv:2511.18441](https://arxiv.org/abs/2511.18441)] [[Code](https://github.com/loryruta/recogs)]\n- **📝 说明**: 🏆 Accepted to STAG 2025\n\n#### [99] Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs\n- **🧑‍🔬 
作者**：Cyrus Vachha, Yixiao Kang, Zach Dive, Ashwat Chidambaram, Anik Gupta, Eunice Jun, Bjoern Hartmann\n- **🏫 单位**：UC Berkeley ⟐ UCLA\n- **🔗 链接**：[[中英摘要](../abs/2512.20129.md)] [[arXiv:2512.20129](https://arxiv.org/abs/2512.20129)] [Code]\n- **📝 说明**: 🏆 Accepted to CHI 2025\n"
  },
  {
    "path": "2025/BMVC.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to BMVC2025\n\n#### [1] RUSplatting: Robust 3D Gaussian Splatting for Sparse-View Underwater Scene Reconstruction\n- **🧑‍🔬 作者**：Zhuodong Jiang, Haoran Wang, Guoxi Huang, Brett Seymour, Nantheera Anantrasirichai\n- **🏫 单位**：University of Bristol ⟐ Submerged Resources Centre\n- **🔗 链接**：[[中英摘要](./abs/2505.15737.md)] [[arXiv:2505.15737](https://arxiv.org/abs/2505.15737)] [Code]\n- **📝 说明**: 🏆 Accepted to BMVC 2025\n\n#### [2] HairGS: Hair Strand Reconstruction based on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yimin Pan, Matthias Nießner, Tobias Kirschstein\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2509.07774.md)] [[arXiv:2509.07774](https://arxiv.org/abs/2509.07774)] [Code]\n- **📝 说明**: 🏆 Accepted to BMVC 2025\n\n#### [3] Ev4DGS: Novel-view Rendering of Non-Rigid Objects from Monocular Event Streams\n- **🧑‍🔬 作者**：Takuya Nakabayashi, Navami Kairanda, Hideo Saito, Vladislav Golyanik\n- **🏫 单位**：Keio University ⟐ Max Planck Institute for Informatics\n- **🔗 链接**：[[中英摘要](./abs/2510.11717.md)] [[arXiv:2510.11717](https://arxiv.org/abs/2510.11717)] [Code]\n- **📝 说明**: 🏆 Accepted to BMVC 2025\n"
  },
  {
    "path": "2025/CVPR.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to CVPR2025\n\n#### [1] Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering\n- **🧑‍🔬 作者**：Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang\n- **🏫 单位**：University of Utah ⟐ Zhejiang University ⟐ UCLA\n- **🔗 链接**：[[中英摘要](../abs/2401.15318.md)] [[arXiv:2401.15318](https://arxiv.org/abs/2401.15318)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [2] MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing\n- **🧑‍🔬 作者**：Cong Wang, Di Kang, He-Yi Sun, Shen-Han Qian, Zi-Xuan Wang, Linchao Bao, Song-Hai Zhang\n- **🏫 单位**：Tsinghua University ⟐ Tencent AI Lab ⟐ Technical University of Munich ⟐ Carnegie Mellon University ⟐ University of Birmingham\n- **🔗 链接**：[[中英摘要](../abs/2404.19026.md)] [[arXiv:2404.19026](https://arxiv.org/abs/2404.19026)] [[Code](https://github.com/conallwang/MeGA)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [3] DOF-GS:Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal\n- **🧑‍🔬 作者**：Yujie Wang, Praneeth Chakravarthula, Baoquan Chen\n- **🏫 单位**：National Key Lab of General AI, China ⟐ Peking University ⟐ University of North Carolina at Chapel Hill\n- **🔗 链接**：[[中英摘要](../abs/2405.17351.md)] [[arXiv:2405.17351](https://arxiv.org/abs/2405.17351)] [[Code](https://github.com/rongduo/DOFGS_implementation)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [4] Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh\n- **🧑‍🔬 作者**：Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ Tencent AI Lab ⟐ Nanjing University\n- **🔗 链接**：[[中英摘要](../abs/2405.17811.md)] [[arXiv:2405.17811](https://arxiv.org/abs/2405.17811)] [[Code](https://github.com/gaoxiangjun/Mani-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [5] 3D-HGS: 3D Half-Gaussian Splatting\n- **🧑‍🔬 作者**：Haolin Li, Jinyang Liu, Mario Sznaier, Octavia Camps\n- **🏫 单位**：Northeastern University\n- **🔗 链接**：[[中英摘要](../abs/2406.02720.md)] [[arXiv:2406.02720](https://arxiv.org/abs/2406.02720)] [[Code](https://github.com/lihaolin88/3D-Half-Gaussian-Splatting)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [6] Improving Gaussian Splatting with Localized Points Management\n- **🧑‍🔬 作者**：Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu\n- **🏫 单位**：University of Surrey ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2406.04251.md)] [[arXiv:2406.04251](https://arxiv.org/abs/2406.04251)] [[Code](https://github.com/Surrey-UP-Lab/GS-LPM)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [7] Generative Gaussian Splatting for Unbounded 3D City Generation\n- **🧑‍🔬 作者**：Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu\n- **🏫 单位**：Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2406.06526.md)] [[arXiv:2406.06526](https://arxiv.org/abs/2406.06526)] [[Code](https://github.com/hzxie/GaussianCity)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [8] WonderWorld: Interactive 3D Scene Generation from a Single Image\n- **🧑‍🔬 作者**：Hong-Xing Yu, Haoyi Duan, Charles Herrmann, William T. 
Freeman, Jiajun Wu\n- **🏫 单位**：Stanford University ⟐ MIT\n- **🔗 链接**：[[中英摘要](../abs/2406.09394.md)] [[arXiv:2406.09394](https://arxiv.org/abs/2406.09394)] [[Code](https://github.com/KovenYu/WonderWorld)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [9] PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Alex Hanson, Allen Tu, Vasu Singla, Mayuka Jayawardhana, Matthias Zwicker, Tom Goldstein\n- **🏫 单位**：University of Maryland, College Park\n- **🔗 链接**：[[中英摘要](../abs/2406.10219.md)] [[arXiv:2406.10219](https://arxiv.org/abs/2406.10219)] [[Code](https://github.com/j-alex-hanson/gaussian-splatting-pup)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [10] Gaussian Eigen Models for Human Heads\n- **🧑‍🔬 作者**：Wojciech Zielonka, Timo Bolkart, Thabo Beeler, Justus Thies\n- **🏫 单位**：Max Planck Institute for Intelligent Systems ⟐ Google ⟐ TU Darmstadt\n- **🔗 链接**：[[中英摘要](../abs/2407.04545.md)] [[arXiv:2407.04545](https://arxiv.org/abs/2407.04545)] [[Code](https://github.com/Zielon/GEM)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [11] FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering\n- **🧑‍🔬 作者**：Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai\n- **🏫 单位**：Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2408.07967.md)] [[arXiv:2408.07967](https://arxiv.org/abs/2408.07967)] [[Code](https://github.com/InternLandMark/FlashGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [12] Towards Realistic Example-based Modeling via 3D Gaussian Stitching\n- **🧑‍🔬 作者**：Xinyu Gao, Ziyi Yang, Bingchen Gong, Xiaoguang Han, Sipeng Yang, Xiaogang Jin\n- **🏫 单位**：Zhejiang University ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2408.15708.md)] [[arXiv:2408.15708](https://arxiv.org/abs/2408.15708)] [[Code](https://github.com/gs-learner/gs_stitching)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [13] GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers\n- **🧑‍🔬 作者**：Lorenza Prospero, Abdullah Hamdi, Joao F. 
Henriques, Christian Rupprecht\n- **🏫 单位**：Visual Geometry Group, University of Oxford\n- **🔗 链接**：[[中英摘要](./abs/2409.04196.md)] [[arXiv:2409.04196](https://arxiv.org/abs/2409.04196)] [[Code](https://github.com/prosperolo/GST)]\n- **📝 说明**：🏆 Accepted to CVPR 2025 CVSports Workshop\n\n#### [14] 3D-GSW: 3D Gaussian Splatting Watermark for Protecting Copyrights in Radiance Fields\n- **🧑‍🔬 作者**：Youngdong Jang, Hyunje Park, Feng Yang, Heeju Ko, Euijin Choo, Sangpil Kim\n- **🏫 单位**：Korea University ⟐ Google Research ⟐ University of Alberta\n- **🔗 链接**：[[中英摘要](./abs/2409.13222.md)] [[arXiv:2409.13222](https://arxiv.org/abs/2409.13222)] [[Code](https://github.com/kuai-lab/cvpr25_3D-GSW)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [15] Disco4D: Disentangled 4D Human Generation and Animation from a Single Image\n- **🧑‍🔬 作者**：Hui En Pang, Shuai Liu, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu\n- **🏫 单位**：S-Lab, Nanyang Technological University ⟐ SenseTime Research ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2409.17280.md)] [[arXiv:2409.17280](https://arxiv.org/abs/2409.17280)] [[Code](https://github.com/disco-4d/Disco4D)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [16] RNG: Relightable Neural Gaussians\n- **🧑‍🔬 作者**：Jiahui Fan, Fujun Luan, Jian Yang, Miloš Hašan, Beibei Wang\n- **🏫 单位**：Nanjing University of Science and Technology, China ⟐ Adobe Research, USA ⟐ Nanjing University, China\n- **🔗 链接**：[[中英摘要](../abs/2409.19702.md)] [[arXiv:2409.19702](https://arxiv.org/abs/2409.19702)] [[Code](https://github.com/sssssy/RNG_release)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [17] IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera\n- **🧑‍🔬 作者**：Jian Huang, Chengrui Dong, Peidong Liu\n- **🏫 单位**：Zhejiang University ⟐ Westlake University\n- **🔗 链接**：[[中英摘要](../abs/2410.08107.md)] [[arXiv:2410.08107](https://arxiv.org/abs/2410.08107)] [[Code](https://github.com/wu-cvgl/IncEventGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [18] DepthSplat: Connecting Gaussian Splatting and Depth\n- **🧑‍🔬 作者**：Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, Marc Pollefeys\n- **🏫 单位**：ETH Zurich ⟐ Skywork AI ⟐ University of Tübingen, Tübingen AI Center ⟐ Microsoft\n- **🔗 链接**：[[中英摘要](../abs/2410.13862.md)] [[arXiv:2410.13862](https://arxiv.org/abs/2410.13862)] [[Code](https://github.com/cvg/depthsplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [19] SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes\n- **🧑‍🔬 作者**：Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu, Jie-Ying Lee, Jiun-Long Huang, Yu-Chee Tseng, Yu-Lun Liu\n- **🏫 单位**：National Yang Ming Chiao Tung University ⟐ University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](../abs/2410.17249.md)] [[arXiv:2410.17249](https://arxiv.org/abs/2410.17249)] [[Code](https://github.com/cdfan0627/SpectroMotion)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [20] GaussianSpa: An Optimizing-Sparsifying Simplification Framework for Compact and High-Quality 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yangming Zhang, Wenqi Jia, Wei Niu, Miao Yin\n- **🏫 单位**：Department of Computer Science, University of Texas at Arlington ⟐ School of Computing, University of Georgia\n- **🔗 链接**：[[中英摘要](../abs/2411.06019.md)] [[arXiv:2411.06019](https://arxiv.org/abs/2411.06019)] [[Code](https://github.com/miaoyin390/GaussianSpa)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [21] USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting\n- **🧑‍🔬 作者**：Kang Chen, Jiyuan Zhang, Zecheng Hao, Yajing 
Zheng, Tiejun Huang, Zhaofei Yu\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](../abs/2411.10504.md)] [[arXiv:2411.10504](https://arxiv.org/abs/2411.10504)] [[Code](https://github.com/chenkang455/USP-Gaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [22] DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes\n- **🧑‍🔬 作者**：Chensheng Peng, Chengwei Zhang, Yixiao Wang, Chenfeng Xu, Yichen Xie, Wenzhao Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan\n- **🏫 单位**：UC Berkeley\n- **🔗 链接**：[[中英摘要](../abs/2411.11921.md)] [[arXiv:2411.11921](https://arxiv.org/abs/2411.11921)] [[Code](https://github.com/chengweialan/DeSiRe-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [23] FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting\n- **🧑‍🔬 作者**：Fangyu Wu, Yuhao Chen\n- **🏫 单位**：University of Waterloo\n- **🔗 链接**：[[中英摘要](./abs/2411.12089.md)] [[arXiv:2411.12089](https://arxiv.org/abs/2411.12089)] [[Code](https://github.com/fanguw/FruitNinja3DInterior)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [24] VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving\n- **🧑‍🔬 作者**：Haiming Zhang, Wending Zhou, Yiyao Zhu, Xu Yan, Jiantao Gao, Dongfeng Bai, Yingjie Cai, Bingbing Liu, Shuguang Cui, Zhen Li\n- **🏫 单位**：FNii, Shenzhen ⟐ CUHK-Shenzhen ⟐ HKUST ⟐ Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](./abs/2411.14716.md)] [[arXiv:2411.14716](https://arxiv.org/abs/2411.14716)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [25] 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes\n- **🧑‍🔬 作者**：Jan Held, Renaud Vandeghen, Abdullah Hamdi, Adrien Deliege, Anthony Cioppa, Silvio Giancola, Andrea Vedaldi, Bernard Ghanem, Marc Van Droogenbroeck\n- **🏫 单位**：University of Liège ⟐ KAUST ⟐ University of Oxford\n- **🔗 链接**：[[中英摘要](./abs/2411.14974.md)] [[arXiv:2411.14974](https://arxiv.org/abs/2411.14974)] [[Code](https://github.com/convexsplatting/convex-splatting)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [26] SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis\n- **🧑‍🔬 作者**：Hyojun Go, Byeongjun Park, Jiho Jang, Jin-Young Kim, Soonwoo Kwon, Changick Kim\n- **🏫 单位**：Twelve Labs ⟐ KAIST\n- **🔗 链接**：[[中英摘要](../abs/2411.16443.md)] [[arXiv:2411.16443](https://arxiv.org/abs/2411.16443)] [[Code](https://github.com/gohyojun15/SplatFlow)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [27] MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM\n- **🧑‍🔬 作者**：Vladimir Yugay, Theo Gevers, Martin R. 
Oswald\n- **🏫 单位**：University of Amsterdam\n- **🔗 链接**：[[中英摘要](../abs/2411.16785.md)] [[arXiv:2411.16785](https://arxiv.org/abs/2411.16785)] [[Code](https://github.com/VladimirYugay/MAGiC-SLAM)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [28] SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving\n- **🧑‍🔬 作者**：Georg Hess, Carl Lindström, Maryam Fatemi, Christoffer Petersson, Lennart Svensson\n- **🏫 单位**：Zenseact ⟐ Chalmers University of Technology\n- **🔗 链接**：[[中英摘要](../abs/2411.16816.md)] [[arXiv:2411.16816](https://arxiv.org/abs/2411.16816)] [[Code](https://github.com/carlinds/splatad)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [29] Geometry Field Splatting with Gaussian Surfels\n- **🧑‍🔬 作者**：Kaiwen Jiang, Venkataram Sivaram, Cheng Peng, Ravi Ramamoorthi\n- **🏫 单位**：UC San Diego ⟐ Johns Hopkins University\n- **🔗 链接**：[[中英摘要](../abs/2411.17067.md)] [[arXiv:2411.17067](https://arxiv.org/abs/2411.17067)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [30] SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Gyeongjin Kang, Jisang Yoo, Jihyeon Park, Seungtae Nam, Hyeonsoo Im, Sangheon Shin, Sangpil Kim, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ Hanwha Systems ⟐ Korea University\n- **🔗 链接**：[[中英摘要](../abs/2411.17190.md)] [[arXiv:2411.17190](https://arxiv.org/abs/2411.17190)] [[Code](https://github.com/Gynjn/selfsplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [31] Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters\n- **🧑‍🔬 作者**：Zhiyang Guo, Jinxu Xiang, Kai Ma, Wengang Zhou, Houqiang Li, Ran Zhang\n- **🏫 单位**：CAS Key Laboratory of Technology in GIPAS, EEIS Department ⟐ University of Science and Technology of China ⟐ Tencent PCG\n- **🔗 链接**：[[中英摘要](../abs/2411.18197.md)] [[arXiv:2411.18197](https://arxiv.org/abs/2411.18197)] [[Code](https://github.com/jasongzy/Make-It-Animatable)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [32] Textured Gaussians for Enhanced 3D Scene Appearance Modeling\n- **🧑‍🔬 作者**：Brian Chao, Hung-Yu Tseng, Lorenzo Porzi, Chen Gao, Tuotuo Li, Qinbo Li, Ayush Saraf, Jia-Bin Huang, Johannes Kopf, Gordon Wetzstein, Changil Kim\n- **🏫 单位**：Stanford University ⟐ Meta ⟐ University of Maryland College Park\n- **🔗 链接**：[[中英摘要](../abs/2411.18625.md)] [[arXiv:2411.18625](https://arxiv.org/abs/2411.18625)] [[Code](https://github.com/bchao1/textured_gaussians)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [33] InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception\n- **🧑‍🔬 作者**：Haijie Li, Yanmin Wu, Jiarui Meng, Qiankun Gao, Zhiyao Zhang, Ronggang Wang, Jian Zhang\n- **🏫 单位**：School of Electronic and Computer Engineering, Peking University, China ⟐ Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Peking University Shenzhen Graduate School, China ⟐ College of Information Science and Engineering, Northeastern University, China\n- **🔗 链接**：[[中英摘要](../abs/2411.19235.md)] [[arXiv:2411.19235](https://arxiv.org/abs/2411.19235)] [[Code](https://github.com/lhj-git/InstanceGasuusian_code)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [34] TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Bojun Xiong, Jialun Liu, Jiakui Hu, Chenming Wu, Jinbo Wu, Xing Liu, Chen Zhao, Errui Ding, Zhouhui Lian\n- **🏫 单位**：Wangxuan Institute of Computer Technology, Peking University ⟐ Baidu VIS ⟐ Institute of Medical Technology, Peking 
University\n- **🔗 链接**：[[中英摘要](../abs/2411.19654.md)] [[arXiv:2411.19654](https://arxiv.org/abs/2411.19654)] [[Code](https://github.com/ymxbj/TexGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [35] GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zixuan Chen, Guangcong Wang, Jiahao Zhu, Jianhuang Lai, Xiaohua Xie\n- **🏫 单位**：Sun Yat-Sen University ⟐ Great Bay University\n- **🔗 链接**：[[中英摘要](../abs/2411.19895.md)] [[arXiv:2411.19895](https://arxiv.org/abs/2411.19895)] [[Code](https://github.com/NarcissusEx/GuardSplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [36] Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives\n- **🧑‍🔬 作者**：Alex Hanson, Allen Tu, Geng Lin, Vasu Singla, Matthias Zwicker, Tom Goldstein\n- **🏫 单位**：University of Maryland, College Park\n- **🔗 链接**：[[中英摘要](./abs/2412.00578.md)] [[arXiv:2412.00578](https://arxiv.org/abs/2412.00578)] [[Code](https://github.com/j-alex-hanson/speedy-splat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [37] Ref-GS: Directional Factorization for 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Youjia Zhang, Anpei Chen, Yumin Wan, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ University of Tübingen ⟐ Westlake University ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2412.00905.md)] [[arXiv:2412.00905](https://arxiv.org/abs/2412.00905)] [[Code](https://github.com/YoujiaZhang/Ref-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [38] SfM-Free 3D Gaussian Splatting via Hierarchical Training\n- **🧑‍🔬 作者**：Bo Ji, Angela Yao\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2412.01553.md)] [[arXiv:2412.01553](https://arxiv.org/abs/2412.01553)] [[Code](https://github.com/jibo27/3DGS_Hierarchical_Training)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [39] Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes\n- **🧑‍🔬 作者**：Lihan Jiang, Kerui Ren, Mulin Yu, Linning Xu, Junting Dong, Tao Lu, Feng Zhao, Dahua Lin, Bo Dai\n- **🏫 单位**：University of Science and Technology of China ⟐ Shanghai Jiao Tong University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ The Chinese University of Hong Kong ⟐ Brown University ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2412.01745.md)] [[arXiv:2412.01745](https://arxiv.org/abs/2412.01745)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [40] AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction\n- **🧑‍🔬 作者**：Lingteng Qiu, Shenhao Zhu, Qi Zuo, Xiaodong Gu, Yuan Dong, Junfei Zhang, Chao Xu, Zhe Li, Weihao Yuan, Liefeng Bo, Guanying Chen, Zilong Dong\n- **🏫 单位**：Alibaba Group ⟐ Sun Yat-sen University ⟐ Nanjing University ⟐ Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](../abs/2412.02684.md)] [[arXiv:2412.02684](https://arxiv.org/abs/2412.02684)] [[Code](https://github.com/aigc3d/AniGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [41] Volumetrically Consistent 3D Gaussian Rasterization\n- **🧑‍🔬 作者**：Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi, Nicholas Antipa\n- **🏫 单位**：University of California San Diego\n- **🔗 链接**：[[中英摘要](./abs/2412.03378.md)] [[arXiv:2412.03378](https://arxiv.org/abs/2412.03378)] [[Code](https://github.com/chinmay0301ucsd/Vol3DGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [42] HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jingyu Lin, Jiaqi Gu, Lubin Fan, Bojian Wu, Yujing Lou, Renjie Chen, Ligang Liu, Jieping 
Ye\n- **🏫 单位**：University of Science and Technology of China ⟐ Alibaba Cloud Computing ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2412.03844.md)] [[arXiv:2412.03844](https://arxiv.org/abs/2412.03844)] [[Code](https://github.com/Yeyuqqwx/HybridGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [43] Multi-View Pose-Agnostic Change Localization with Zero Labels\n- **🧑‍🔬 作者**：Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim, Donald Dansereau, Niko Suenderhauf, Dimity Miller\n- **🏫 单位**：QUT Centre for Robotics ⟐ ARIAM Hub ⟐ University of Sydney ⟐ Abyss Solutions\n- **🔗 链接**：[[中英摘要](../abs/2412.03911.md)] [[arXiv:2412.03911](https://arxiv.org/abs/2412.03911)] [[Code](https://github.com/Chumsy0725/PASLCD)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [44] Turbo3D: Ultra-fast Text-to-3D Generation\n- **🧑‍🔬 作者**：Hanzhe Hu, Tianwei Yin, Fujun Luan, Yiwei Hu, Hao Tan, Zexiang Xu, Sai Bi, Shubham Tulsiani, Kai Zhang\n- **🏫 单位**：Carnegie Mellon University ⟐ Massachusetts Institute of Technology ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](../abs/2412.04470.md)] [[arXiv:2412.04470](https://arxiv.org/abs/2412.04470)] [[Code](https://github.com/hzhupku/Turbo3D)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [45] Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction\n- **🧑‍🔬 作者**：Seungtae Nam, Xiangyu Sun, Gyeongjin Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University\n- **🔗 链接**：[[中英摘要](../abs/2412.06234.md)] [[arXiv:2412.06234](https://arxiv.org/abs/2412.06234)] [[Code](https://github.com/stnamjef/GenerativeDensification)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [46] Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images\n- **🧑‍🔬 作者**：Zheng Chen, Chenming Wu, Zhelun Shen, Chen Zhao, Weicai Ye, Haocheng Feng, Errui Ding, Song-Hai Zhang\n- **🏫 单位**：Tsinghua University ⟐ Baidu VIS ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2412.06250.md)] [[arXiv:2412.06250](https://arxiv.org/abs/2412.06250)] [[Code](https://github.com/thucz/splatter360)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [47] Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction\n- **🧑‍🔬 作者**：Dongxu Wei, Zhiqi Li, Peidong Liu\n- **🏫 单位**：Westlake University ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2412.06273.md)] [[arXiv:2412.06273](https://arxiv.org/abs/2412.06273)] [[Code](https://github.com/WU-CVGL/OmniScene)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [48] MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views\n- **🧑‍🔬 作者**：Antoine Guédon, Tomoki Ichikawa, Kohei Yamashita, Ko Nishino\n- **🏫 单位**：Univ Gustave Eiffel ⟐ Kyoto University\n- **🔗 链接**：[[中英摘要](../abs/2412.06767.md)] [[arXiv:2412.06767](https://arxiv.org/abs/2412.06767)] [[Code](https://github.com/Anttwo/MAtCha)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [49] GASP: Gaussian Avatars with Synthetic Priors\n- **🧑‍🔬 作者**：Jack Saunders, Charlie Hewitt, Yanan Jian, Marek Kowalski, Tadas Baltrusaitis, Yiye Chen, Darren Cosker, Virginia Estellers, Nicholas Gyde, Vinay P. 
Namboodiri, Benjamin E Lundell\n- **🏫 单位**：Microsoft ⟐ University of Bath ⟐ Georgia Tech\n- **🔗 链接**：[[中英摘要](../abs/2412.07739.md)] [[arXiv:2412.07739](https://arxiv.org/abs/2412.07739)] [[Code](https://microsoft.github.io/GASP/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [50] GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency\n- **🧑‍🔬 作者**：Dongyue Lu, Lingdong Kong, Tianxin Huang, Gim Hee Lee\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2412.09511.md)] [[arXiv:2412.09511](https://arxiv.org/abs/2412.09511)] [[Code](https://github.com/DylanOrange/geal)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [51] Feat2GS: Probing Visual Foundation Models with Gaussian Splatting\n- **🧑‍🔬 作者**：Yue Chen, Xingyu Chen, Anpei Chen, Gerard Pons-Moll, Yuliang Xiu\n- **🏫 单位**：Westlake University ⟐ Max Planck Institute for Intelligent Systems ⟐ University of Tübingen, Tübingen AI Center ⟐ Max Planck Institute for Informatics, Saarland Informatics Campus\n- **🔗 链接**：[[中英摘要](../abs/2412.09606.md)] [[arXiv:2412.09606](https://arxiv.org/abs/2412.09606)] [[Code](https://fanegg.github.io/Feat2GS/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [52] MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction\n- **🧑‍🔬 作者**：Xiaohao Xu, Feng Xue, Shibo Zhao, Yike Pan, Sebastian Scherer, Xiaonan Huang\n- **🏫 单位**：University of Michigan, Ann Arbor ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2412.09723.md)] [[arXiv:2412.09723](https://arxiv.org/abs/2412.09723)] [[Code](https://github.com/Xiaohao-Xu/MAC-Ego3D)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [53] SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video\n- **🧑‍🔬 作者**：Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, Munchurl Kim\n- **🏫 单位**：KAIST ⟐ Chung-Ang University\n- **🔗 链接**：[[中英摘要](../abs/2412.09982.md)] [[arXiv:2412.09982](https://arxiv.org/abs/2412.09982)] [[Code](https://github.com/KAIST-VICLab/SplineGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [54] GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion\n- **🧑‍🔬 作者**：Jiapeng Tang, Davide Davoli, Tobias Kirschstein, Liam Schoneveld, Matthias Niessner\n- **🏫 单位**：Technical University of Munich ⟐ Toyota Motor Europe NV/SA ⟐ Woven by Toyota\n- **🔗 链接**：[[中英摘要](../abs/2412.10209.md)] [[arXiv:2412.10209](https://arxiv.org/abs/2412.10209)] [[Code](https://github.com/tangjiapeng/GAF)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [55] DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting\n- **🧑‍🔬 作者**：Luis Wiedmann, Luca Wiehe, David Rozenberszki\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2412.10972.md)] [[arXiv:2412.10972](https://arxiv.org/abs/2412.10972)] [[Code](https://github.com/lusxvr/dcseg)]\n- **📝 说明**：🏆 Accepted to CVPR 2025 OpenSUN3D Workshop\n\n#### [56] PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting\n- **🧑‍🔬 作者**：Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gambardella, Dinh Phung, Jianfei Cai\n- **🏫 单位**：Monash University ⟐ Building 4.0 CRC, Caulfield East, Victoria, Australia ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2412.12096.md)] [[arXiv:2412.12096](https://arxiv.org/abs/2412.12096)] [[Code](https://github.com/chengzhag/PanSplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [57] 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting\n- **🧑‍🔬 作者**：Qi Wu, 
Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moenne-Loccoz, Zan Gojcic\n- **🏫 单位**：NVIDIA ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2412.12507.md)] [[arXiv:2412.12507](https://arxiv.org/abs/2412.12507)] [[Code](https://github.com/nv-tlabs/3dgrut)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [58] Gaussian Splatting for Efficient Satellite Image Photogrammetry\n- **🧑‍🔬 作者**：Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret\n- **🏫 单位**：Politecnico di Torino ⟐ Université Paris-Saclay ⟐ AMIAD, Pôle Recherche, France\n- **🔗 链接**：[[中英摘要](./abs/2412.13047.md)] [[arXiv:2412.13047](https://arxiv.org/abs/2412.13047)] [[Code](https://github.com/mezzelfo/EOGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [59] GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding\n- **🧑‍🔬 作者**：Haoyi Jiang, Liu Liu, Tianheng Cheng, Xinjie Wang, Tianwei Lin, Zhizhong Su, Wenyu Liu, Xinggang Wang\n- **🏫 单位**：Huazhong University of Science & Technology ⟐ Horizon Robotics\n- **🔗 链接**：[[中英摘要](./abs/2412.13193.md)] [[arXiv:2412.13193](https://arxiv.org/abs/2412.13193)] [[Code](https://github.com/hustvl/GaussTR)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [60] IDOL: Instant Photorealistic 3D Human Creation from a Single Image\n- **🧑‍🔬 作者**：Yiyu Zhuang, Jiaxi Lv, Hao Wen, Qing Shuai, Ailing Zeng, Hao Zhu, Shifeng Chen, Yujiu Yang, Xun Cao, Wei Liu\n- **🏫 单位**：Nanjing University ⟐ Chinese Academy of Sciences ⟐ Tsinghua University ⟐ Tencent ⟐ Shenzhen University of Advanced Technology\n- **🔗 链接**：[[中英摘要](./abs/2412.14963.md)] [[arXiv:2412.14963](https://arxiv.org/abs/2412.14963)] [[Code](https://github.com/yiyuzhuang/IDOL)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [61] EnvGS: Modeling View-Dependent Appearance with Environment Gaussian\n- **🧑‍🔬 作者**：Tao Xie, Xi Chen, Zhen Xu, Yiman Xie, Yudong Jin, Yujun Shen, Sida Peng, Hujun Bao, Xiaowei Zhou\n- **🏫 单位**：Zhejiang University ⟐ Ant Group\n- **🔗 链接**：[[中英摘要](../abs/2412.15215.md)] [[arXiv:2412.15215](https://arxiv.org/abs/2412.15215)] [[Code](https://github.com/zju3dv/EnvGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [62] IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing\n- **🧑‍🔬 作者**：Chun Gu, Xiaofei Wei, Zixuan Zeng, Yuxuan Yao, Li Zhang\n- **🏫 单位**：Fudan University\n- **🔗 链接**：[[中英摘要](../abs/2412.15867.md)] [[arXiv:2412.15867](https://arxiv.org/abs/2412.15867)] [[Code](https://github.com/fudan-zvg/IRGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [63] CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images\n- **🧑‍🔬 作者**：Jungho Lee, Suhwan Cho, Taeoh Kim, Ho-Deok Jang, Minhyeok Lee, Geonho Cha, Dongyoon Wee, Dogyoon Lee, Sangyoun Lee\n- **🏫 单位**：Yonsei University ⟐ Naver Cloud\n- **🔗 链接**：[[中英摘要](../abs/2412.16028.md)] [[arXiv:2412.16028](https://arxiv.org/abs/2412.16028)] [[Code](https://github.com/Jho-Yonsei/CoCoGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [64] OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities\n- **🧑‍🔬 作者**：Suyoung Lee, Jaeyoung Chung, Kihoon Kim, Jaeyoo Huh, Gunhee Lee, Minsoo Lee, Kyoung Mu Lee\n- **🏫 单位**：Seoul National University ⟐ LG AI Research\n- **🔗 链接**：[[中英摘要](./abs/2412.16604.md)] [[arXiv:2412.16604](https://arxiv.org/abs/2412.16604)] [[Code](https://github.com/esw0116/OmniSplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [65] MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks\n- **🧑‍🔬 作者**：Yifei Liu, Zhihang Zhong, Yifan Zhan, Sheng Xu, 
Xiao Sun\n- **🏫 单位**：Shanghai AI Laboratory ⟐ Beihang University ⟐ The University of Tokyo\n- **🔗 链接**：[[中英摘要](../abs/2412.20522.md)] [[arXiv:2412.20522](https://arxiv.org/abs/2412.20522)] [[Code](https://github.com/kaikai23/MaskGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [66] PERSE: Personalized 3D Generative Avatars from A Single Portrait\n- **🧑‍🔬 作者**：Hyunsoo Cha, Inhee Lee, Hanbyul Joo\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2412.21206.md)] [[arXiv:2412.21206](https://arxiv.org/abs/2412.21206)] [[Code](https://github.com/snuvclab/perse)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [67] MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong, Won-Sik Cheong, Jihyong Oh, Munchurl Kim\n- **🏫 单位**：ETRI, South Korea ⟐ KAIST, South Korea ⟐ Chung-Ang University, South Korea\n- **🔗 链接**：[[中英摘要](../abs/2501.03714.md)] [[arXiv:2501.03714](https://arxiv.org/abs/2501.03714)] [[Code](https://github.com/skwak-kaist/MoDec-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [68] Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance\n- **🧑‍🔬 作者**：Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias, Alexandros Lattas, Stefanos Zafeiriou\n- **🏫 单位**：Imperial College London, UK\n- **🔗 链接**：[[中英摘要](../abs/2501.05379.md)] [[arXiv:2501.05379](https://arxiv.org/abs/2501.05379)] [[Code](https://github.com/dimgerogiannis/Arc2Avatar)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [69] GauSTAR: Gaussian Surface Tracking and Reconstruction\n- **🧑‍🔬 作者**：Chengwei Zheng, Lixin Xue, Juan Zarate, Jie Song\n- **🏫 单位**：ETH Zurich ⟐ HKUST(GZ) ⟐ HKUST\n- **🔗 链接**：[[中英摘要](../abs/2501.10283.md)] [[arXiv:2501.10283](https://arxiv.org/abs/2501.10283)] [[Code](https://github.com/eth-ait/GauSTAR)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [70] Dense-SfM: Structure from Motion with Dense Consistent Matching\n- **🧑‍🔬 作者**：JongMin Lee, Sungjoo Yoo\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2501.14277.md)] [[arXiv:2501.14277](https://arxiv.org/abs/2501.14277)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [71] UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping\n- **🧑‍🔬 作者**：Aashish Rai, Dilin Wang, Mihir Jain, Nikolaos Sarafianos, Arthur Chen, Srinath Sridhar, Aayush Prakash\n- **🏫 单位**：Brown University ⟐ Meta Reality Labs\n- **🔗 链接**：[[中英摘要](../abs/2502.01846.md)] [[arXiv:2502.01846](https://arxiv.org/abs/2502.01846)] [[Code](https://aashishrai3799.github.io/uvgs/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [72] Instruct-4DGS: Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation\n- **🧑‍🔬 作者**：Joohyun Kwon, Hanbyel Cho, Junmo Kim\n- **🏫 单位**：DGIST ⟐ KAIST\n- **🔗 链接**：[[中英摘要](../abs/2502.02091.md)] [[arXiv:2502.02091](https://arxiv.org/abs/2502.02091)] [[Code](https://github.com/juhyeon-kwon/efficient_4d_gaussian_editing)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [73] AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting\n- **🧑‍🔬 作者**：Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen, Jie-Ying Lee, Bo-Hsu Ke, Chun-Wei Tuan Mu, Yi-Chuan Huang, Chin-Yang Lin, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu\n- **🏫 单位**：National Yang Ming Chiao Tung University ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2502.05176.md)] [[arXiv:2502.05176](https://arxiv.org/abs/2502.05176)] 
[[Code](https://github.com/kkennethwu/AuraFusion360_official)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [74] 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency\n- **🧑‍🔬 作者**：Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang\n- **🏫 单位**：National Taiwan University ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2502.11801.md)] [[arXiv:2502.11801](https://arxiv.org/abs/2502.11801)] [[Code](https://github.com/peterjohnsonhuang/3dgic)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [75] Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration\n- **🧑‍🔬 作者**：Kim Jun-Seong, GeonU Kim, Kim Yu-Ji, Yu-Chiang Frank Wang, Jaesung Choe, Tae-Hyun Oh\n- **🏫 单位**: POSTECH ⟐ NVIDIA ⟐ KAIST\n- **🔗 链接**：[[中英摘要](../abs/2502.16652.md)] [[arXiv:2502.16652](https://arxiv.org/abs/2502.16652)] [[Code](https://github.com/kaist-ami/Dr-Splat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [76] DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Liao Shen, Tianqi Liu, Huiqiang Sun, Jiaqi Li, Zhiguo Cao, Wei Li, Chen Change Loy\n- **🏫 单位**: School of AIA, Huazhong University of Science and Technology ⟐ S-Lab, Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2503.00746.md)] [[arXiv:2503.00746](https://arxiv.org/abs/2503.00746)] [[Code](https://dof-gaussian.github.io/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [77] Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior\n- **🧑‍🔬 作者**：Chen Guo, Junxuan Li, Yash Kant, Yaser Sheikh, Shunsuke Saito, Chen Cao\n- **🏫 单位**: Codec Avatars Lab, Meta ⟐ ETH Zurich ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](../abs/2503.01610.md)] [[arXiv:2503.01610](https://arxiv.org/abs/2503.01610)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [78] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models\n- **🧑‍🔬 作者**：Jay Zhangjie Wu, Yuxuan Zhang, Haithem Turki, Xuanchi Ren, Jun Gao, Mike Zheng Shou, Sanja Fidler, Zan Gojcic, Huan Ling\n- **🏫 单位**: NVIDIA ⟐ National U. of Singapore ⟐ University of Toronto ⟐ Vector Institute\n- **🔗 链接**：[[中英摘要](../abs/2503.01774.md)] [[arXiv:2503.01774](https://arxiv.org/abs/2503.01774)] [[Code](https://research.nvidia.com/labs/toronto-ai/difix3d/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [79] Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization\n- **🧑‍🔬 作者**：Jamie Wynn, Zawar Qureshi, Jakub Powierza, Jamie Watson, Mohamed Sayed\n- **🏫 单位**：Niantic ⟐ UCL\n- **🔗 链接**：[[中英摘要](../abs/2503.02009.md)] [[arXiv:2503.02009](https://arxiv.org/abs/2503.02009)] [[Code](https://nianticlabs.github.io/morpheus/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [80] NTR-Gaussian: Nighttime Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics\n- **🧑‍🔬 作者**：Kun Yang, Yuxiang Liu, Zeyu Cui, Yu Liu, Maojun Zhang, Shen Yan, Qing Wang\n- **🏫 单位**: Northwestern Polytechnical University ⟐ National University of Defense Technology\n- **🔗 链接**：[[中英摘要](../abs/2503.03115.md)] [[arXiv:2503.03115](https://arxiv.org/abs/2503.03115)] [[Code](https://github.com/ykykykykyk/NTR-Gaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [81] S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Shuojue Yang, Zijian Wu, Mingxuan Hong, Qian Li, Daiyun Shen, Septimiu E. 
Salcudean, Yueming Jin\n- **🏫 单位**: China University of Petroleum (East China) ⟐ Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2503.04314.md)] [[arXiv:2503.04314](https://arxiv.org/abs/2503.04314)] [[Code](https://jeasco.github.io/S2Gaussian/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [82] Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs\n- **🧑‍🔬 作者**：Yingji Zhong, Zhihao Li, Dave Zhenyu Chen, Lanqing Hong, Dan Xu\n- **🏫 单位**: The Hong Kong University of Science and Technology ⟐ Huawei Noah's Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2503.05082.md)] [[arXiv:2503.05082](https://arxiv.org/abs/2503.05082)] [[Code](https://github.com/zhongyingji/guidedvd-3dgs)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [83] DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction\n- **🧑‍🔬 作者**：Miaowei Wang, Yibo Zhang, Rui Ma, Weiwei Xu, Changqing Zou, Daniel Morris\n- **🏫 单位**: The University of Edinburgh ⟐ Jilin University ⟐ Zhejiang University ⟐ Michigan State University\n- **🔗 链接**：[[中英摘要](../abs/2503.05484.md)] [[arXiv:2503.05484](https://arxiv.org/abs/2503.05484)] [[Code](https://github.com/wangmiaowei/DecoupledGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [84] DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation\n- **🧑‍🔬 作者**：Xiaoliang Ju, Hongsheng Li\n- **🏫 单位**: CUHK MMLab ⟐ CPII under InnoHK\n- **🔗 链接**：[[中英摘要](../abs/2503.06900.md)] [[arXiv:2503.06900](https://arxiv.org/abs/2503.06900)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [85] SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jiahui Zhang, Fangneng Zhan, Ling Shao, Shijian Lu\n- **🏫 单位**: Nanyang Technological University ⟐ Harvard University ⟐ MIT ⟐ UCAS-Terminus AI Lab, University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](../abs/2503.07476.md)] [[arXiv:2503.07476](https://arxiv.org/abs/2503.07476)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [86] ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, Ruizhen Hu\n- **🏫 单位**：University of Science and Technology of China ⟐ National University of Defense Technology ⟐ Shenzhen University\n- **🔗 链接**：[[中英摘要](../abs/2503.08135.md)] [[arXiv:2503.08135](https://arxiv.org/abs/2503.08135)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [87] HRAvatar: High-Quality and Relightable Gaussian Head Avatar\n- **🧑‍🔬 作者**：Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Kangjie Chen, Minghan Qin, Yu Li, Haoqian Wang\n- **🏫 单位**: Tsinghua Shenzhen International Graduate School ⟐ International Digital Economy Academy (IDEA)\n- **🔗 链接**：[[中英摘要](../abs/2503.08224.md)] [[arXiv:2503.08224](https://arxiv.org/abs/2503.08224)] [[Code](https://github.com/Pixel-Talk/HRAvatar)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [88] Mitigating Ambiguities in 3D Classification with Gaussian Splatting\n- **🧑‍🔬 作者**：Ruiqi Zhang, Hao Zhu, Jingyi Zhao, Qi Zhang, Xun Cao, Zhan Ma\n- **🏫 单位**: Nanjing University ⟐ Imperial College London ⟐ Vivo Company\n- **🔗 链接**：[[中英摘要](../abs/2503.08352.md)] [[arXiv:2503.08352](https://arxiv.org/abs/2503.08352)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [89] GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping\n- **🧑‍🔬 作者**：Jinfeng Liu, Lingtong Kong, Bo Li, Dan Xu\n- **🏫 单位**: The Hong Kong University of Science and Technology ⟐ vivo Mobile Communication Co.\n- 
**🔗 链接**：[[中英摘要](../abs/2503.10143.md)] [[arXiv:2503.10143](https://arxiv.org/abs/2503.10143)] [[Code](https://github.com/LiuJF1226/GaussHDR)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [90] 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models\n- **🧑‍🔬 作者**：Zhiqi Li, Chengrui Dong, Yiming Chen, Zhangchi Huang, Peidong Liu\n- **🏫 单位**: Harvard University ⟐ Tsinghua University ⟐ Stony Brook University ⟐ Brown University ⟐ ETH Zürich\n- **🔗 链接**：[[中英摘要](../abs/2503.10437.md)] [[arXiv:2503.10437](https://arxiv.org/abs/2503.10437)] [[Code](https://github.com/zrporz/4DLangSplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [91] SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs\n- **🧑‍🔬 作者**：Songen Gu, Haoxuan Song, Binjie Liu, Qian Yu, Sanyi Zhang, Haiyong Jiang, Jin Huang, Feng Tian\n- **🏫 单位**: Peking University ⟐ Pengcheng Laboratory ⟐ University of Nottingham\n- **🔗 链接**：[[中英摘要](../abs/2503.12535.md)] [[arXiv:2503.12535](https://arxiv.org/abs/2503.12535)] [[Code](https://github.com/gbliao/SPC-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [92] RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars\n- **🧑‍🔬 作者**：Linzhou Li, Yumeng Li, Yanlin Weng, Youyi Zheng, Kun Zhou\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2503.12886.md)] [[arXiv:2503.12886](https://arxiv.org/abs/2503.12886)] [[Code](https://github.com/gapszju/RGBAvatar)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [93] Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting\n- **🧑‍🔬 作者**：Runsong Zhu, Shi Qiu, Zhengzhe Liu, Ka-Hei Hui, Qianyi Wu, Pheng-Ann Heng, Chi-Wing Fu\n- **🏫 单位**: The Chinese University of Hong Kong ⟐ Lingnan University ⟐ Carnegie Mellon University ⟐ Monash University\n- **🔗 链接**：[[中英摘要](../abs/2503.14029.md)] [[arXiv:2503.14029](https://arxiv.org/abs/2503.14029)] [[Code](https://github.com/Runsong123/Unified-Lift)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [94] RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images\n- **🧑‍🔬 作者**：Junjin Xiao, Qing Zhang, Yongwei Nie, Lei Zhu, Wei-Shi Zheng\n- **🏫 单位**: School of Computer Science and Engineering, Sun Yat-sen University, China ⟐ South China University of Technology ⟐ Hong Kong University of Science and Technology (Guangzhou) ⟐ Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China\n- **🔗 链接**：[[中英摘要](../abs/2503.14198.md)] [[arXiv:2503.14198](https://arxiv.org/abs/2503.14198)] [[Code](https://github.com/iSEE-Laboratory/RoGSplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [95] BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting\n- **🧑‍🔬 作者**：Yiren Lu, Yunlai Zhou, Disheng Liu, Tuo Liang, Yu Yin\n- **🏫 单位**: Case Western Reserve University\n- **🔗 链接**：[[中英摘要](../abs/2503.15835.md)] [[arXiv:2503.15835](https://arxiv.org/abs/2503.15835)] [[Code](https://vulab-ai.github.io/BARD-GS/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [96] RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos\n- **🧑‍🔬 作者**：Yuxin Yao, Zhi Deng, Junhui Hou\n- **🏫 单位**：City University of Hong Kong ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](../abs/2503.16822.md)] [[arXiv:2503.16822](https://arxiv.org/abs/2503.16822)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [97] Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting\n- **🧑‍🔬 
作者**：Jinbo Yan, Rui Peng, Zhiyan Wang, Luyang Tang, Jiayu Yang, Jie Liang, Jiahao Wu, Ronggang Wang\n- **🏫 单位**: Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology ⟐ Shenzhen Graduate School, Peking University ⟐ Pengcheng Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2503.16979.md)] [[arXiv:2503.16979](https://arxiv.org/abs/2503.16979)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [98] TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jinbo Yan, Rui Peng, Zhiyan Wang, Luyang Tang, Jiayu Yang, Jie Liang, Jiahao Wu, Ronggang Wang\n- **🏫 单位**: Alibaba Group, Hangzhou, China\n- **🔗 链接**：[[中英摘要](../abs/2503.17032.md)] [[arXiv:2503.17032](https://arxiv.org/abs/2503.17032)] [[Code](https://pixelai-team.github.io/TaoAvatar/)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [99] PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding\n- **🧑‍🔬 作者**：Hongjia Zhai, Hai Li, Zhenzhe Li, Xiaokun Pan, Yijia He, Guofeng Zhang\n- **🏫 单位**：State Key Lab of CAD & CG, Zhejiang University ⟐ RayNeo\n- **🔗 链接**：[[中英摘要](../abs/2503.18107.md)] [[arXiv:2503.18107](https://arxiv.org/abs/2503.18107)] [[Code](https://github.com/zhaihongjia/PanoGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [100] DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds\n- **🧑‍🔬 作者**：Youyu Chen, Junjun Jiang, Kui Jiang, Xiao Tang, Zhihao Li, Xianming Liu, Yinyu Nie\n- **🏫 单位**：Harbin Institute of Technology ⟐ Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2503.18402.md)] [[arXiv:2503.18402](https://arxiv.org/abs/2503.18402)] [[Code](https://github.com/YouyuChen0207/DashGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [101] 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video\n- **🧑‍🔬 作者**：Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](../abs/2503.18421.md)] [[arXiv:2503.18421](https://arxiv.org/abs/2503.18421)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [102] Hardware-Rasterized Ray-Based Gaussian Splatting\n- **🧑‍🔬 作者**：Samuel Rota Bulò, Nemanja Bartolovic, Lorenzo Porzi, Peter Kontschieder\n- **🏫 单位**：Meta Reality Labs, Zurich\n- **🔗 链接**：[[中英摘要](../abs/2503.18682.md)] [[arXiv:2503.18682](https://arxiv.org/abs/2503.18682)] [[Code](https://github.com/facebookresearch/vkraygs)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [103] NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yulong Zheng, Zicheng Jiang, Shengfeng He, Yandu Sun, Junyu Dong, Huaidong Zhang, Yong Du\n- **🏫 单位**：Ocean University of China ⟐ Singapore Management University ⟐ South China University of Technology\n- **🔗 链接**：[[中英摘要](../abs/2503.18794.md)] [[arXiv:2503.18794](https://arxiv.org/abs/2503.18794)] [[Code](https://github.com/USMizuki/NexusGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [104] HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting\n- **🧑‍🔬 作者**：Xinpeng Liu, Zeyi Huang, Fumio Okura, Yasuyuki Matsushita\n- **🏫 单位**：The University of Osaka ⟐ Microsoft Research Asia – Tokyo\n- **🔗 链接**：[[中英摘要](../abs/2503.19232.md)] [[arXiv:2503.19232](https://arxiv.org/abs/2503.19232)] [[Code](https://github.com/huntorochi/HoGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [105] From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting\n- **🧑‍🔬 
作者**：Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, Guofeng Zhang\n- **🏫 单位**：State Key Lab of CAD & CG, Zhejiang University ⟐ SenseTime Research\n- **🔗 链接**：[[中英摘要](../abs/2503.19358.md)] [[arXiv:2503.19358](https://arxiv.org/abs/2503.19358)] [[Code](https://github.com/zju3dv/STDLoc)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [106] COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting\n- **🧑‍🔬 作者**：Jiaxin Zhang, Junjun Jiang, Youyu Chen, Kui Jiang, Xianming Liu\n- **🏫 单位**：Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2503.19443.md)] [[arXiv:2503.19443](https://arxiv.org/abs/2503.19443)] [[Code](https://github.com/ZestfulJX/COB-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [107] GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Shujuan Li, Yu-Shen Liu, Zhizhong Han\n- **🏫 单位**：School of Software, Tsinghua University ⟐ Department of Computer Science, Wayne State University\n- **🔗 链接**：[[中英摘要](../abs/2503.19458.md)] [[arXiv:2503.19458](https://arxiv.org/abs/2503.19458)] [[Code](https://github.com/lisj575/GaussianUDF)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [108] PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model\n- **🧑‍🔬 作者**：Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao\n- **🏫 单位**：Tsinghua University ⟐ University of Michigan ⟐ Peking University ⟐ BAAI\n- **🔗 链接**：[[中英摘要](../abs/2503.19913.md)] [[arXiv:2503.19913](https://arxiv.org/abs/2503.19913)] [[Code](https://github.com/GasaiYU/PartRM)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [109] Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields\n- **🧑‍🔬 作者**：Navami Kairanda, Marc Habermann, Shanthika Naik, Christian Theobalt, Vladislav Golyanik\n- **🏫 单位**：MPI for Informatics, SIC ⟐ VIA Research Center ⟐ IIT Jodhpur\n- **🔗 链接**：[[中英摘要](../abs/2503.19976.md)] [[arXiv:2503.19976](https://arxiv.org/abs/2503.19976)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [110] EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis\n- **🧑‍🔬 作者**：Sheng Miao, Jiaxin Huang, Dongfeng Bai, Xu Yan, Hongyu Zhou, Yue Wang, Bingbing Liu, Andreas Geiger, Yiyi Liao\n- **🏫 单位**：Zhejiang University ⟐ Huawei Noah’s Ark Lab ⟐ University of Tübingen ⟐ Tübingen AI Center\n- **🔗 链接**：[[中英摘要](../abs/2503.20168.md)] [[arXiv:2503.20168](https://arxiv.org/abs/2503.20168)] [[Code](https://github.com/Miaosheng1/EVolSplat)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [111] Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields\n- **🧑‍🔬 作者**：Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas Guibas, Achuta Kadambi\n- **🏫 单位**：UCLA ⟐ Great Bay University ⟐ MIT ⟐ UT Austin ⟐ DEVCOM ARL\n- **🔗 链接**：[[中英摘要](../abs/2503.20776.md)] [[arXiv:2503.20776](https://arxiv.org/abs/2503.20776)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [112] Photorealistic Simulation-Ready Garments from a Single Pose\n- **🧑‍🔬 作者**：Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban, Nikolaos Sarafianos, Hsiao-yu Chen, Oshri Halimi, Aljaž Božič, Shunsuke Saito, Jiajun Wu, C. 
Karen Liu, Tuur Stuyck, Egor Larionov\n- **🏫 单位**：Stanford University ⟐ Meta Reality Labs\n- **🔗 链接**：[[中英摘要](../abs/2503.20779.md)] [[arXiv:2503.20779](https://arxiv.org/abs/2503.20779)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [113] CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis\n- **🧑‍🔬 作者**：Youngkyoon Jang, Eduardo Pérez-Pellitero\n- **🏫 单位**：Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2503.20998.md)] [[arXiv:2503.20998](https://arxiv.org/abs/2503.20998)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [114] RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting\n- **🧑‍🔬 作者**：Qiyu Dai, Xingyu Ni, Qianfan Shen, Wenzheng Chen, Baoquan Chen, Mengyu Chu\n- **🏫 单位**：Peking University ⟐ State Key Laboratory of Multimedia Information Processing ⟐ State Key Laboratory of General Artificial Intelligence\n- **🔗 链接**：[[中英摘要](../abs/2503.21442.md)] [[arXiv:2503.21442](https://arxiv.org/abs/2503.21442)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [115] EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis\n- **🧑‍🔬 作者**：Weihao Yu, Yuanhao Cai, Ruyi Zha, Zhiwen Fan, Chenxin Li, Yixuan Yuan\n- **🏫 单位**：Meitu Inc.\n- **🔗 链接**：[[中英摘要](../abs/2503.21816.md)] [[arXiv:2503.21816](https://arxiv.org/abs/2503.21816)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [116] ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning\n- **🧑‍🔬 作者**：Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue\n- **🏫 单位**：Fudan University ⟐ Nanyang Technological University ⟐ Shanghai Innovation Institute ⟐ NeuHelium Co., Ltd\n- **🔗 链接**：[[中英摘要](../abs/2503.23297.md)] [[arXiv:2503.23297](https://arxiv.org/abs/2503.23297)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [117] DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Seungjun Lee, Gim Hee Lee\n- **🏫 单位**：Department of Computer Science, National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2503.24210.md)] [[arXiv:2503.24210](https://arxiv.org/abs/2503.24210)] [[Code](https://github.com/DiET-GS/DiET-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [118] Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views\n- **🧑‍🔬 作者**：Maxim V. Shugaev, Vincent Chen, Maxim Karrenbach, Kyle Ashley, Bridget Kennedy, Naresh P. 
Cuntoor\n- **🏫 单位**：State Key Lab of CAD & CG, Zhejiang University ⟐ ETH Zürich ⟐ University of Tübingen\n- **🔗 链接**：[[中英摘要](../abs/2503.24382.md)] [[arXiv:2503.24382](https://arxiv.org/abs/2503.24382)] [[Code](https://github.com/chobao/Free360)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [119] LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors\n- **🧑‍🔬 作者**：Han Zhou, Wei Dong, Jun Chen\n- **🏫 单位**：McMaster University\n- **🔗 链接**：[[中英摘要](../abs/2504.00219.md)] [[arXiv:2504.00219](https://arxiv.org/abs/2504.00219)] [[Code](https://github.com/LowLevelAI/LITA-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [120] Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration\n- **🧑‍🔬 作者**：Zilong Huang, Jun He, Junyan Ye, Lihan Jiang, Weijia Li, Yiping Chen, Ting Han\n- **🏫 单位**：Sun Yat-sen University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](../abs/2504.00387.md)] [[arXiv:2504.00387](https://arxiv.org/abs/2504.00387)] [[Code](https://github.com/LongHZ140516/Scene4U)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [121] Monocular and Generalizable Gaussian Talking Head Animation\n- **🧑‍🔬 作者**：Shengjie Gong, Haojie Li, Jiapeng Tang, Dongming Hu, Shuangping Huang, Hao Chen, Tianshui Chen, Zhuoman Liu\n- **🏫 单位**：South China University of Technology ⟐ Technical University of Munich ⟐ Pazhou Laboratory ⟐ Guangdong University of Technology ⟐ The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](../abs/2504.00665.md)] [[arXiv:2504.00665](https://arxiv.org/abs/2504.00665)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [122] UnIRe: Unsupervised Instance Decomposition for Dynamic Urban Scene Reconstruction\n- **🧑‍🔬 作者**：Yunxuan Mao, Rong Xiong, Yue Wang, Yiyi Liao\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2504.00763.md)] [[arXiv:2504.00763](https://arxiv.org/abs/2504.00763)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [123] DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting\n- **🧑‍🔬 作者**：Hyunwoo Park, Gun Ryu, Wonjun Kim\n- **🏫 单位**：Konkuk University\n- **🔗 链接**：[[中英摘要](../abs/2504.00773.md)] [[arXiv:2504.00773](https://arxiv.org/abs/2504.00773)] [[Code](https://github.com/DCVL-3D/DropGaussian)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [124] Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment\n- **🧑‍🔬 作者**：Ziteng Cui, Xuangeng Chu, Tatsuya Harada\n- **🏫 单位**：The University of Tokyo ⟐ RIKEN AIP\n- **🔗 链接**：[[中英摘要](../abs/2504.01503.md)] [[arXiv:2504.01503](https://arxiv.org/abs/2504.01503)] [[Code](https://github.com/cuiziteng/Luminance-GS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [125] Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting\n- **🧑‍🔬 作者**：Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen\n- **🏫 单位**：National Yang Ming Chiao Tung University ⟐ Atmanity Inc.\n- **🔗 链接**：[[中英摘要](../abs/2504.01957.md)] [[arXiv:2504.01957](https://arxiv.org/abs/2504.01957)] [[Code](https://github.com/HCIS-Lab/GaussianLSS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [126] Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model\n- **🧑‍🔬 作者**：Shengjun Zhang, Jinzhao Li, Xin Fei, Hao Liu, Yueqi Duan\n- **🏫 单位**：Tsinghua University ⟐ WeChat Vision, Tencent Inc.\n- **🔗 链接**：[[中英摘要](../abs/2504.02764.md)] [[arXiv:2504.02764](https://arxiv.org/abs/2504.02764)] 
[Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [127] WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments\n- **🧑‍🔬 作者**：Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, Iro Armeni\n- **🏫 单位**：Stanford University ⟐ ETH Zürich ⟐ Microsoft\n- **🔗 链接**：[[中英摘要](../abs/2504.03886.md)] [[arXiv:2504.03886](https://arxiv.org/abs/2504.03886)] [[Code](https://github.com/GradientSpaces/WildGS-SLAM)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [128] PanoDreamer: Consistent Text to 360-Degree Scene Generation\n- **🧑‍🔬 作者**：Isha Sharma, Dieter Schmalstieg\n- **🏫 单位**：OPPO US Research Center ⟐ Washington University in St. Louis\n- **🔗 链接**：[[中英摘要](../abs/2504.05152.md)] [[arXiv:2504.05152](https://arxiv.org/abs/2504.05152)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025 Workshop\n\n#### [129] HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation\n- **🧑‍🔬 作者**：Yiming Liang, Tianhan Xu, Yuta Kikuchi\n- **🏫 单位**：Waseda University ⟐ Preferred Networks, Inc.\n- **🔗 链接**：[[中英摘要](../abs/2504.06210.md)] [[arXiv:2504.06210](https://arxiv.org/abs/2504.06210)] [[Code](https://github.com/pfnet-research/himor)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [130] Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting\n- **🧑‍🔬 作者**：Daiwei Zhang, Joaquin Gajardo, Tomislav Medic, Isinsu Katircioglu, Mike Boss, Norbert Kirchgessner, Achim Walter, Lukas Roth\n- **🏫 单位**：ETH Zurich ⟐ Swiss Data Science Center\n- **🔗 链接**：[[中英摘要](./abs/2504.06978.md)] [[arXiv:2504.06978](https://arxiv.org/abs/2504.06978)] [[Code](https://github.com/zdwww/Wheat-3DGS)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025 Workshop\n\n#### [131] BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Junuk Cha, Soohyun Hwang, Hyein Hwang, Seungryul Baek\n- **🏫 单位**：UNIST, South Korea\n- **🔗 链接**：[[中英摘要](../abs/2504.09097.md)] [[arXiv:2504.09097](https://arxiv.org/abs/2504.09097)] [Code]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [132] DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering\n- **🧑‍🔬 作者**：Yexing Xu, Longguang Wang, Minglin Chen, Sheng Ao, Li Li, Yulan Guo\n- **🏫 单位**：The Shenzhen Campus, Sun Yat-Sen University ⟐ Xiamen University ⟐ University of Macau\n- **🔗 链接**：[[中英摘要](../abs/2504.09491.md)] [[arXiv:2504.09491](https://arxiv.org/abs/2504.09491)] [[Code](https://github.com/xuyx55/DropoutGS)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [133] ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos\n- **🧑‍🔬 作者**：Zetong Zhang, Manuel Kaufmann, Lixin Xue, Jie Song, Martin R. 
Oswald\n- **🏫 单位**：ETH Zürich ⟐ HKUST(GZ) ⟐ HKUST ⟐ University of Amsterdam\n- **🔗 链接**：[[中英摘要](../abs/2504.13167.md)] [[arXiv:2504.13167](https://arxiv.org/abs/2504.13167)] [[Code](https://github.com/eth-ait/ODHSR)]\n- **📝 说明**：🏆 Accepted to CVPR 2025\n\n#### [134] SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos\n- **🧑‍🔬 作者**：Yuxin Yao, Yan Zhang, Zhening Huang, Joan Lasenby\n- **🏫 单位**：University of Cambridge ⟐ Meshcapade\n- **🔗 链接**：[[中英摘要](../abs/2504.17810.md)] [[arXiv:2504.17810](https://arxiv.org/abs/2504.17810)] [[Code](https://github.com/YuxinYao620/SmallGS-release)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025 PBVS Workshop\n\n#### [135] Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views\n- **🧑‍🔬 作者**：Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, Yanning Zhang\n- **🏫 单位**：Northwestern Polytechnical University\n- **🔗 链接**：[[中英摘要](../abs/2504.20378.md)] [[arXiv:2504.20378](https://arxiv.org/abs/2504.20378)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [136] SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Shubhendu Jena, Shishir Reddy Vutukur, Adnane Boukhayma\n- **🏫 单位**：INRIA ⟐ TUM\n- **🔗 链接**：[[中英摘要](../abs/2505.02175.md)] [[arXiv:2505.02175](https://arxiv.org/abs/2505.02175)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025 Workshop\n\n#### [137] SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction\n- **🧑‍🔬 作者**：Xinran Yang, Donghao Ji, Yuanqi Li, Jie Guo, Yanwen Guo, Junyuan Xie\n- **🏫 单位**：Nanjing University, Nanjing, China\n- **🔗 链接**：[[中英摘要](../abs/2505.04668.md)] [[arXiv:2505.04668](https://arxiv.org/abs/2505.04668)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [138] Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields\n- **🧑‍🔬 作者**：Xinran Yang, Donghao Ji, Yuanqi Li, Jie Guo, Yanwen Guo, Junyuan Xie\n- **🏫 单位**：Brown University ⟐ Meta Reality Labs ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2505.05356.md)] [[arXiv:2505.05356](https://arxiv.org/abs/2505.05356)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [139] SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation\n- **🧑‍🔬 作者**：Yonwoo Choi\n- **🏫 单位**：SECERNAI\n- **🔗 链接**：[[中英摘要](../abs/2505.05475.md)] [[arXiv:2505.05475](https://arxiv.org/abs/2505.05475)] [[Code](https://github.com/yc4ny/SVAD)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025 SyntaGen Workshop\n\n#### [140] Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation\n- **🧑‍🔬 作者**：Yiming Qin, Zhu Xu, Yang Liu\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2505.05505.md)] [[arXiv:2505.05505](https://arxiv.org/abs/2505.05505)] [[Code](https://github.com/Wakals/GASCOL)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [141] Steepest Descent Density Control for Compact 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Peihao Wang, Yuehao Wang, Dilin Wang, Sreyas Mohan, Zhiwen Fan, Lemeng Wu, Ruisi Cai, Yu-Ying Yeh, Zhangyang Wang, Qiang Liu, Rakesh Ranjan\n- **🏫 单位**：The University of Texas at Austin ⟐ Meta Reality Labs\n- **🔗 链接**：[[中英摘要](../abs/2505.05587.md)] [[arXiv:2505.05587](https://arxiv.org/abs/2505.05587)] [[Code](https://github.com/facebookresearch/SteepGS)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [142] Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians\n- **🧑‍🔬 作者**：Ma Changfeng, Bi Ran, Guo Jie, Wang Chongjun, Guo Yanwen\n- **🏫 单位**：Nanjing University ⟐ North University of 
China\n- **🔗 链接**：[[中英摘要](./abs/2505.09413.md)] [[arXiv:2505.09413](https://arxiv.org/abs/2505.09413)] [[Code](https://github.com/murcherful/GauPCRender)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [143] iSegMan: Interactive Segment-and-Manipulate 3D Gaussians\n- **🧑‍🔬 作者**：Yian Zhao, Wanshi Xu, Ruochong Zheng, Pengchong Qiao, Chang Liu, Jie Chen\n- **🏫 单位**：Peking University Shenzhen Graduate School ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2505.11934.md)] [[arXiv:2505.11934](https://arxiv.org/abs/2505.11934)] [[Code](https://github.com/Zhao-Yian/iSegMan)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [144] MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models\n- **🧑‍🔬 作者**：Yifan Liu, Keyu Fan, Weihao Yu, Chenxin Li, Hao Lu, Yixuan Yuan\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Tsinghua University (Shenzhen) ⟐ HKUST (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2505.15185.md)] [[arXiv:2505.15185](https://arxiv.org/abs/2505.15185)] [[Code](https://github.com/CUHK-AIM-Group/MonoSplat)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [145] 4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians\n- **🧑‍🔬 作者**：Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison\n- **🏫 单位**：Imperial College London\n- **🔗 链接**：[[中英摘要](./abs/2505.22859.md)] [[arXiv:2505.22859](https://arxiv.org/abs/2505.22859)] [[Code](https://github.com/muskie82/4dtam)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [146] 3D Gaussian Splat Vulnerabilities\n- **🧑‍🔬 作者**：Matthew Hull, Haoyang Yang, Pratham Mehta, Mansi Phute, Aeree Cho, Haoran Wang, Matthew Lau, Wenke Lee, Willian T. Lunardi, Martin Andreoni, Polo Chau\n- **🏫 单位**：Georgia Tech ⟐ Technology Innovation Institute\n- **🔗 链接**：[[中英摘要](./abs/2506.00280.md)] [[arXiv:2506.00280](https://arxiv.org/abs/2506.00280)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025 Workshop\n\n#### [147] FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hengyu Liu, Yuehao Wang, Chenxin Li, Ruisi Cai, Kevin Wang, Wuyang Li, Pavlo Molchanov, Peihao Wang, Zhangyang Wang\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ University of Texas at Austin ⟐ Nvidia\n- **🔗 链接**：[[中英摘要](./abs/2506.04174.md)] [[arXiv:2506.04174](https://arxiv.org/abs/2506.04174)] [[Code](https://github.com/LiuHengyu321/FlexGS)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [148] FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, Yong Chen, Hujun Bao, Sida Peng, Xiaowei Zhou\n- **🏫 单位**：Zhejiang University ⟐ Geely Automobile Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2506.05348.md)] [[arXiv:2506.05348](https://arxiv.org/abs/2506.05348)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [149] Gen4D: Synthesizing Humans and Scenes in the Wild\n- **🧑‍🔬 作者**：Jerrin Bright, Zhibo Wang, Yuhao Chen, Sirisha Rambhatla, John Zelek, David Clausi\n- **🏫 单位**：Vision and Image Processing Lab ⟐ Critical ML Lab ⟐ University of Waterloo\n- **🔗 链接**：[[中英摘要](./abs/2506.05397.md)] [[arXiv:2506.05397](https://arxiv.org/abs/2506.05397)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025 Workshop\n\n#### [150] VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction\n- **🧑‍🔬 作者**：Ziyue Zhu, Shenlong Wang, Jin Xie, Jiang-jiang Liu, Jingdong Wang, Jian Yang\n- **🏫 单位**：Nankai University ⟐ UIUC ⟐ Nanjing University ⟐ Baidu\n- **🔗 链接**：[[中英摘要](./abs/2506.05563.md)] 
[[arXiv:2506.05563](https://arxiv.org/abs/2506.05563)] [[Code](https://github.com/ZZY816/VoxelSplat)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [151] FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity\n- **🧑‍🔬 作者**：Jinxi Li, Ziyang Song, Siyuan Zhou, Bo Yang\n- **🏫 单位**：The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](./abs/2506.07865.md)] [[arXiv:2506.07865](https://arxiv.org/abs/2506.07865)] [[Code](https://github.com/vLAR-group/FreeGave)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [152] UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting\n- **🧑‍🔬 作者**：Ziyi Wang, Yanran Zhang, Jie Zhou, Jiwen Lu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2506.09952.md)] [[arXiv:2506.09952](https://arxiv.org/abs/2506.09952)] [[Code](https://github.com/wangzy22/UniPre3D)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [153] GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction\n- **🧑‍🔬 作者**：Jinguang Tong, Xuesong Li, Fahira Afzal Maken, Sundaram Muthu, Lars Petersson, Chuong Nguyen, Hongdong Li\n- **🏫 单位**：Australian National University ⟐ CSIRO ⟐ Indian Institute of Technology Madras\n- **🔗 链接**：[[中英摘要](./abs/2506.13110.md)] [[arXiv:2506.13110](https://arxiv.org/abs/2506.13110)] [[Code](https://github.com/hirotong/GS2DGS)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [154] SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting\n- **🧑‍🔬 作者**：Ziqiao Peng, Wentao Hu, Junyuan Ma, Xiangyu Zhu, Xiaomei Zhang, Hao Zhao, Hui Tian, Jun He, Hongyan Liu, Zhaoxin Fan\n- **🏫 单位**：Renmin University of China ⟐ Beijing University of Posts and Telecommunications ⟐ Chinese Academy of Sciences ⟐ Tsinghua University ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2506.14742.md)] [[arXiv:2506.14742](https://arxiv.org/abs/2506.14742)] [[Code](https://github.com/ziqiaopeng/SyncTalk)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [155] DBMovi-GS: Dynamic View Synthesis from Blurry Monocular Video via Sparse-Controlled Gaussian Splatting\n- **🧑‍🔬 作者**：Yeon-Ji Song, Jaein Kim, Byung-Ju Kim, Byoung-Tak Zhang\n- **🏫 单位**：Interdisciplinary Program in Neuroscience ⟐ IPAI ⟐ Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2506.20998.md)] [[arXiv:2506.20998](https://arxiv.org/abs/2506.20998)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025 Neural Fields Beyond Conventional Cameras Workshop\n\n#### [156] Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning\n- **🧑‍🔬 作者**：Buzhen Huang, Chen Li, Chongyang Xu, Dongyue Lu, Jinnan Chen, Yangang Wang, Gim Hee Lee\n- **🏫 单位**：Southeast University ⟐ National University of Singapore ⟐ Sichuan University ⟐ Agency for Science, Technology and Research, Singapore\n- **🔗 链接**：[[中英摘要](./abs/2507.02565.md)] [[arXiv:2507.02565](https://arxiv.org/abs/2507.02565)] [[Code](https://github.com/boycehbz/CloseApp)]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n\n#### [157] Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization\n- **🧑‍🔬 作者**：Maxime Pietrantoni, Gabriela Csurka, Torsten Sattler\n- **🏫 单位**：Czech Technical University in Prague ⟐ NAVER LABS Europe\n- **🔗 链接**：[[中英摘要](./abs/2507.23569.md)] [[arXiv:2507.23569](https://arxiv.org/abs/2507.23569)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2025\n"
  },
  {
    "path": "2025/ICASSP.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICASSP2025\n\n#### [1] DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis\n- **🧑‍🔬 作者**：Kaijun Deng, Dezhi Zheng, Jindong Xie, Jinbao Wang, Weicheng Xie, Linlin Shen, Siyang Song\n- **🏫 单位**：Shenzhen University ⟐ Guangdong Provincial Key Laboratory of Intelligent Information Processing ⟐ University of Exeter\n- **🔗 链接**：[[中英摘要](../abs/2412.20148.md)] [[arXiv:2412.20148](https://arxiv.org/abs/2412.20148)] [[Code](https://github.com/CVI-SZU/DEGSTalk)]\n- **📝 说明**：🏆 Accepted to ICASSP 2025\n\n#### [2] ConcealGS: Concealing Invisible Copyright Information in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yifeng Yang, Hengyu Liu, Chenxin Li, Yining Sun, Wuyang Li, Yifan Liu, Yiyang Lin, Yixuan Yuan, Nanyang Ye\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ The Chinese University of Hong Kong ⟐ Johns Hopkins University\n- **🔗 链接**：[[中英摘要](../abs/2501.03605.md)] [[arXiv:2501.03605](https://arxiv.org/abs/2501.03605)] [[Code](https://github.com/zxk1212/ConcealGS)]\n- **📝 说明**：🏆 Accepted to ICASSP 2025\n\n#### [3] 3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering\n- **🧑‍🔬 作者**：Meenakshi Krishnan, Liam Fowl, Ramani Duraiswami\n- **🏫 单位**：Univ. of Maryland ⟐ Google, New York\n- **🔗 链接**：[[中英摘要](../abs/2501.08370.md)] [[arXiv:2501.08370](https://arxiv.org/abs/2501.08370)] [Code]\n- **📝 说明**：🏆 Accepted to ICASSP 2025 Workshop\n\n#### [4] See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization\n- **🧑‍🔬 作者**：Zongqi He, Zhe Xiao, Kin-Chung Chan, Yushen Zuo, Jun Xiao, Kin-Man Lam\n- **🏫 单位**：The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](../abs/2501.11508.md)] [[arXiv:2501.11508](https://arxiv.org/abs/2501.11508)] [Code]\n- **📝 说明**：🏆 Accepted to ICASSP 2025\n\n#### [5] Trick-GS: A Balanced Bag of Tricks for Efficient Gaussian Splatting\n- **🧑‍🔬 作者**：Anil Armagan, Albert Saà-Garriga, Bruno Manganelli, Mateusz Nowak, Mehmet Kerim Yucel\n- **🏫 单位**：Samsung R&D Institute UK (SRUK)\n- **🔗 链接**：[[中英摘要](../abs/2501.14534.md)] [[arXiv:2501.14534](https://arxiv.org/abs/2501.14534)] [Code]\n- **📝 说明**：🏆 Accepted to ICASSP 2025\n"
  },
  {
    "path": "2025/ICCV.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICCV2025\n\n#### [1] GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting\n- **🧑‍🔬 作者**：Wanshui Gan, Fang Liu, Hongbin Xu, Ningkai Mo, Naoto Yokoya\n- **🏫 单位**：University of Tokyo ⟐ RIKEN ⟐ South China University of Technology ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2408.11447.md)] [[arXiv:2408.11447](https://arxiv.org/abs/2408.11447)] [[Code](https://github.com/GANWANSHUI/GaussianOcc)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [2] 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt\n- **🧑‍🔬 作者**：Lukas Höllein, Aljaž Božič, Michael Zollhöfer, Matthias Nießner\n- **🏫 单位**：Technical University of Munich ⟐ Meta\n- **🔗 链接**：[[中英摘要](./abs/2409.12892.md)] [[arXiv:2409.12892](https://arxiv.org/abs/2409.12892)] [[Code](https://github.com/lukasHoel/3DGS-LM)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [3] Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats\n- **🧑‍🔬 作者**：Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, Zexiang Xu\n- **🏫 单位**：Oregon State University ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](./abs/2410.12781.md)] [[arXiv:2410.12781](https://arxiv.org/abs/2410.12781)] [[Code](https://github.com/arthurhero/Long-LRM)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [4] MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes\n- **🧑‍🔬 作者**：Xinjie Zhang, Zhening Liu, Yifan Zhang, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Zehong Lin, Shuicheng Yan, Jun Zhang\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ Skywork AI ⟐ The Chinese University of Hong Kong ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2410.13613.md)] [[arXiv:2410.13613](https://arxiv.org/abs/2410.13613)] [[Code](https://github.com/Xinjie-Q/MEGA)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [5] LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes\n- **🧑‍🔬 作者**：Juliette Marrie, Romain Ménégaux, Michael Arbel, Diane Larlus, Julien Mairal\n- **🏫 单位**：Univ. 
Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK ⟐ NAVER LABS Europe\n- **🔗 链接**：[[中英摘要](./abs/2410.14462.md)] [[arXiv:2410.14462](https://arxiv.org/abs/2410.14462)] [[Code](https://github.com/naver/ludvig)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [6] Multimodal LLM Guided Exploration and Active Mapping using Fisher Information\n- **🧑‍🔬 作者**：Wen Jiang, Boshu Lei, Katrina Ashton, Kostas Daniilidis\n- **🏫 单位**：University of Pennsylvania\n- **🔗 链接**：[[中英摘要](./abs/2410.17422.md)] [[arXiv:2410.17422](https://arxiv.org/abs/2410.17422)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [7] GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering\n- **🧑‍🔬 作者**：Kai Ye, Chong Gao, Guanbin Li, Wenzheng Chen, Baoquan Chen\n- **🏫 单位**：Peking University ⟐ Sun Yat-sen University ⟐ State Key Laboratory of General AI\n- **🔗 链接**：[[中英摘要](./abs/2410.24204.md)] [[arXiv:2410.24204](https://arxiv.org/abs/2410.24204)] [[Code](https://github.com/PKU-VCL-Geometry/GeoSplatting)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [8] Self-Ensembling Gaussian Splatting for Few-shot Novel View Synthesis\n- **🧑‍🔬 作者**：Chen Zhao, Xuan Wang, Tong Zhang, Saqib Javed, Mathieu Salzmann\n- **🏫 单位**：EPFL ⟐ Ant Group ⟐ Swiss Data Science Center\n- **🔗 链接**：[[中英摘要](./abs/2411.00144.md)] [[arXiv:2411.00144](https://arxiv.org/abs/2411.00144)] [[Code](https://github.com/sailor-z/SE-GS)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [9] TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction\n- **🧑‍🔬 作者**：DaDong Jiang, Zhihui Ke, Xiaobo Zhou, Zhi Hou, Xianghui Yang, Wenbo Hu, Tie Qiu, Chunchao Guo\n- **🏫 单位**：Tianjin University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Tencent Hunyuan ⟐ Tencent AI Lab\n- **🔗 链接**：[[中英摘要](./abs/2411.11941.md)] [[arXiv:2411.11941](https://arxiv.org/abs/2411.11941)] [[Code](https://github.com/PatrickDDj/TimeFormer-Code)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [10] Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting\n- **🧑‍🔬 作者**：Haoyu Zhao, Hao Wang, Xingyue Zhao, Hongqiu Wang, Zhiyu Wu, Chengjiang Long, Hua Zou\n- **🏫 单位**：Wuhan University ⟐ Huazhong University of Science and Technology ⟐ Xi’an Jiaotong University ⟐ Hong Kong University of Science and Technology (Guangzhou) ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2411.12789.md)] [[arXiv:2411.12789](https://arxiv.org/abs/2411.12789)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [11] GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaobao Wei, Peng Chen, Guangyu Li, Ming Lu, Hui Chen, Feng Tian\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Intel Labs China\n- **🔗 链接**：[[中英摘要](./abs/2411.12981.md)] [[arXiv:2411.12981](https://arxiv.org/abs/2411.12981)] [[Code](https://github.com/ucwxb/GazeGaussian)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [12] Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction\n- **🧑‍🔬 作者**：Yuanhao Cai, He Zhang, Kai Zhang, Yixun Liang, Mengwei Ren, Fujun Luan, Qing Liu, Soo Ye Kim, Jianming Zhang, Zhifei Zhang, Yuqian Zhou, Yulun Zhang, Xiaokang Yang, Zhe Lin, Alan Yuille\n- **🏫 单位**：Johns Hopkins University ⟐ Adobe Research ⟐ Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2411.14384.md)] [[arXiv:2411.14384](https://arxiv.org/abs/2411.14384)] [[Code](https://github.com/caiyuanhao1998/Open-DiffusionGS)]\n- **📝 说明**: 🏆 Accepted to 
ICCV 2025\n\n#### [13] EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaobao Wei, Qingpo Wuwu, Zhongyu Zhao, Zhuangzhe Wu, Nan Huang, Ming Lu, Ningning MA, Shanghang Zhang\n- **🏫 单位**：Peking University ⟐ Autonomous Driving Development, NIO\n- **🔗 链接**：[[中英摘要](./abs/2411.15582.md)] [[arXiv:2411.15582](https://arxiv.org/abs/2411.15582)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [14] Sequential Gaussian Avatars with Hierarchical Motion Context\n- **🧑‍🔬 作者**：Wangze Xu, Yifan Zhan, Zhihang Zhong, Xiao Sun\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory ⟐ The University of Tokyo\n- **🔗 链接**：[[中英摘要](./abs/2411.16768.md)] [[arXiv:2411.16768](https://arxiv.org/abs/2411.16768)] [[Code](https://github.com/zezeaaa/SeqAvatar)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [15] A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision\n- **🧑‍🔬 作者**：Chensheng Peng, Ido Sobol, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu, Or Litany\n- **🏫 单位**：UC Berkeley ⟐ Technion\n- **🔗 链接**：[[中英摘要](./abs/2412.00623.md)] [[arXiv:2412.00623](https://arxiv.org/abs/2412.00623)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [16] InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models\n- **🧑‍🔬 作者**：Yifan Lu, Xuanchi Ren, Jiawei Yang, Tianchang Shen, Zhangjie Wu, Jun Gao, Yue Wang, Siheng Chen, Mike Chen, Sanja Fidler, Jiahui Huang\n- **🏫 单位**：NVIDIA ⟐ Shanghai Jiao Tong University ⟐ University of Toronto ⟐ Vector Institute ⟐ University of Southern California\n- **🔗 链接**：[[中英摘要](./abs/2412.03934.md)] [[arXiv:2412.03934](https://arxiv.org/abs/2412.03934)] [[Code](https://github.com/nv-tlabs/InfiniCube)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [17] EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding\n- **🧑‍🔬 作者**：Yuqi Wu, Wenzhao Zheng, Sicheng Zuo, Yuanhui Huang, Jie Zhou, Jiwen Lu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2412.04380.md)] [[arXiv:2412.04380](https://arxiv.org/abs/2412.04380)] [[Code](https://github.com/YkiWu/EmbodiedOcc)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [18] FaceLift: Single Image to 3D Head with View Generation and GS-LRM\n- **🧑‍🔬 作者**：Weijie Lyu, Yi Zhou, Ming-Hsuan Yang, Zhixin Shu\n- **🏫 单位**：University of California, Merced ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](./abs/2412.17812.md)] [[arXiv:2412.17812](https://arxiv.org/abs/2412.17812)] [[Code](https://github.com/weijielyu/FaceLift)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [19] SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding\n- **🧑‍🔬 作者**：Wen Tianci, Liu Zhiang, Lu Biao, Fang Yongchun\n- **🏫 单位**：Nankai University\n- **🔗 链接**：[[中英摘要](./abs/2501.05242.md)] [[arXiv:2501.05242](https://arxiv.org/abs/2501.05242)] [[Code](https://github.com/leaner-forever/SEGS-SLAM)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [20] Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution\n- **🧑‍🔬 作者**：Du Chen, Liyi Chen, Zhengqiang Zhang, Lei Zhang\n- **🏫 单位**：The Hong Kong Polytechnic University ⟐ OPPO Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2501.06838.md)] [[arXiv:2501.06838](https://arxiv.org/abs/2501.06838)] [[Code](https://github.com/ChrisDud0257/GSASR)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [21] AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaoyu Zhou, Jingqi Wang, Yongtao Wang, Yufei Wei, Nan 
Dong, Ming-Hsuan Yang\n- **🏫 单位**：Wangxuan Institute of Computer Technology, Peking University ⟐ Chongqing Changan Automobile Co., Ltd ⟐ University of California, Merced\n- **🔗 链接**：[[中英摘要](./abs/2502.04981.md)] [[arXiv:2502.04981](https://arxiv.org/abs/2502.04981)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [22] GaussRender: Learning 3D Occupancy with Gaussian Rendering\n- **🧑‍🔬 作者**：Loick Chambon, Eloi Zablocki, Alexandre Boulch, Mickael Chen, Matthieu Cord\n- **🏫 单位**：Valeo.ai, Paris, France ⟐ Sorbonne Université, Paris, France ⟐ Hcompany.ai, Paris, France\n- **🔗 链接**：[[中英摘要](./abs/2502.05040.md)] [[arXiv:2502.05040](https://arxiv.org/abs/2502.05040)] [[Code](https://github.com/valeoai/GaussRender)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [23] Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction\n- **🧑‍🔬 作者**：Youming Deng, Wenqi Xian, Guandao Yang, Leonidas Guibas, Gordon Wetzstein, Steve Marschner, Paul Debevec\n- **🏫 单位**：Cornell University ⟐ Netflix Eyeline Studios ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2502.09563.md)] [[arXiv:2502.09563](https://arxiv.org/abs/2502.09563)] [[Code](https://github.com/denghilbert/Self-Cali-GS)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [24] GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow\n- **🧑‍🔬 作者**：Simon Boeder, Fabian Gigengack, Benjamin Risse\n- **🏫 单位**：Bosch Research ⟐ University of Münster\n- **🔗 链接**：[[中英摘要](./abs/2502.17288.md)] [[arXiv:2502.17288](https://arxiv.org/abs/2502.17288)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [25] Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars\n- **🧑‍🔬 作者**：Tobias Kirschstein, Javier Romero, Artem Sevastopolsky, Matthias Nießner, Shunsuke Saito\n- **🏫 单位**：Technical University of Munich ⟐ Meta Reality Labs Pittsburgh\n- **🔗 链接**：[[中英摘要](./abs/2502.20220.md)] [[arXiv:2502.20220](https://arxiv.org/abs/2502.20220)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [26] MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions\n- **🧑‍🔬 作者**：Qingyuan Zhou, Yuehu Gong, Weidong Yang, Jiaze Li, Yeqi Luo, Baixin Xu, Shuhao Li, Ben Fei, Ying He\n- **🏫 单位**：Fudan University ⟐ Nanyang Technological University ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2503.05182.md)] [[arXiv:2503.05182](https://arxiv.org/abs/2503.05182)] [[Code](https://github.com/TsingyuanChou/MGSR)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [27] CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images\n- **🧑‍🔬 作者**：Jungho Lee, Donghyeong Kim, Dogyoon Lee, Suhwan Cho, Minhyeok Lee, Wonjoon Lee, Taeoh Kim, Dongyoon Wee, Sangyoun Lee\n- **🏫 单位**：Yonsei University ⟐ NAVER Cloud\n- **🔗 链接**：[[中英摘要](./abs/2503.05332.md)] [[arXiv:2503.05332](https://arxiv.org/abs/2503.05332)] [[Code](https://github.com/Jho-Yonsei/CoMoGaussian)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [28] SplatTalk: 3D VQA with Gaussian Splatting\n- **🧑‍🔬 作者**：Anh Thai, Songyou Peng, Kyle Genova, Leonidas Guibas, Thomas Funkhouser\n- **🏫 单位**：Georgia Institute of Technology ⟐ Google DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2503.06271.md)] [[arXiv:2503.06271](https://arxiv.org/abs/2503.06271)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [29] 7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting\n- **🧑‍🔬 作者**：Zhongpai Gao, Benjamin Planche, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Ziyan Wu\n- **🏫 单位**：United 
Imaging Intelligence, Boston MA, USA\n- **🔗 链接**：[[中英摘要](./abs/2503.07946.md)] [[arXiv:2503.07946](https://arxiv.org/abs/2503.07946)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [30] DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction\n- **🧑‍🔬 作者**：Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang\n- **🏫 单位**：ETH Zurich\n- **🔗 链接**：[[中英摘要](./abs/2503.13176.md)] [[arXiv:2503.13176](https://arxiv.org/abs/2503.13176)] [[Code](https://github.com/BatFaceWayne/DeGauss)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [31] Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation\n- **🧑‍🔬 作者**：Tiange Xiang, Kai Li, Chengjiang Long, Christian Häne, Peihong Guo, Scott Delp, Ehsan Adeli, Li Fei-Fei\n- **🏫 单位**：Stanford University ⟐ Meta Reality Labs\n- **🔗 链接**：[[中英摘要](./abs/2503.15877.md)] [[arXiv:2503.15877](https://arxiv.org/abs/2503.15877)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [32] OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering\n- **🧑‍🔬 作者**：Shiyong Liu, Xiao Tang, Zhihao Li, Yingfan He, Chongjie Ye, Jianzhuang Liu, Binxiao Huang, Shunbo Zhou, Xiaofei Wu\n- **🏫 单位**：Huawei Noah's Ark Lab ⟐ The Chinese University of Hong Kong (Shenzhen) ⟐ Shenzhen Institute of Advanced Technology ⟐ The University of Hong Kong ⟐ Huawei Embodied Intelligence Lab\n- **🔗 链接**：[[中英摘要](./abs/2503.16177.md)] [[arXiv:2503.16177](https://arxiv.org/abs/2503.16177)] [[Code](https://occlugaussian.github.io/)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [33] X^2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction\n- **🧑‍🔬 作者**：Weihao Yu, Yuanhao Cai, Ruyi Zha, Zhiwen Fan, Chenxin Li, Yixuan Yuan\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Johns Hopkins University ⟐ The Australian National University ⟐ University of Texas at Austin\n- **🔗 链接**：[[中英摘要](./abs/2503.21779.md)] [[arXiv:2503.21779](https://arxiv.org/abs/2503.21779)] [[Code](https://github.com/yuyouxixi/x2-gaussian)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [34] FlowR: Flowing from Sparse to Dense 3D Reconstructions\n- **🧑‍🔬 作者**：Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang, Nikhil Varma Keetha, Lorenzo Porzi, Norman Müller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, Peter Kontschieder\n- **🏫 单位**：ETH Zurich ⟐ Meta Reality Labs Zurich ⟐ CMU\n- **🔗 链接**：[[中英摘要](./abs/2504.01647.md)] [[arXiv:2504.01647](https://arxiv.org/abs/2504.01647)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [35] SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets\n- **🧑‍🔬 作者**：Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong\n- **🏫 单位**：USTC ⟐ Shanghai AI Lab ⟐ SJTU ⟐ CMU\n- **🔗 链接**：[[中英摘要](./abs/2504.06982.md)] [[arXiv:2504.06982](https://arxiv.org/abs/2504.06982)] [[Code](https://github.com/yyvhang/SIGMAN_release)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [36] DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting\n- **🧑‍🔬 作者**：Zeren Jiang, Shaofei Wang, Siyu Tang\n- **🏫 单位**：Visual Geometry Group, University of Oxford ⟐ ETH Zürich\n- **🔗 链接**：[[中英摘要](./abs/2504.10486.md)] [[arXiv:2504.10486](https://arxiv.org/abs/2504.10486)] [[Code](https://github.com/jzr99/DNF-Avatar)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [37] GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR\n- **🧑‍🔬 作者**：Christophe Bolduc, Yannick Hold-Geoffroy, Zhixin Shu, Jean-François Lalonde\n- **🏫 
单位**：Université Laval ⟐ Adobe\n- **🔗 链接**：[[中英摘要](./abs/2504.10809.md)] [[arXiv:2504.10809](https://arxiv.org/abs/2504.10809)] [[Code](https://github.com/lvsn/gaslight)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [38] HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction\n- **🧑‍🔬 作者**：Zhongtao Wang, Mai Su, Huishan Au, Yilong Li, Xizhe Cao, Chengwei Pan, Yisong Chen, Guoping Wang\n- **🏫 单位**：Peking University ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2504.16606.md)] [[arXiv:2504.16606](https://arxiv.org/abs/2504.16606)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [39] Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning\n- **🧑‍🔬 作者**：Mingxuan Cui, Qing Guo, Yuyi Wang, Hongkai Yu, Di Lin, Qin Zou, Ming-Ming Cheng, Xi Li\n- **🏫 单位**：Zhejiang University ⟐ CFAR and IHPC, A*STAR, Singapore ⟐ CRRC Zhuzhou Institute & Tengen Intelligence Institute ⟐ Cleveland State University ⟐ Tianjin University ⟐ Wuhan University ⟐ Nankai University\n- **🔗 链接**：[[中英摘要](./abs/2504.17815.md)] [[arXiv:2504.17815](https://arxiv.org/abs/2504.17815)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [40] Sparfels: Fast Reconstruction from Sparse Unposed Imagery\n- **🧑‍🔬 作者**：Shubhendu Jena, Amine Ouasfi, Mae Younes, Adnane Boukhayma\n- **🏫 单位**：INRIA\n- **🔗 链接**：[[中英摘要](./abs/2505.02178.md)] [[arXiv:2505.02178](https://arxiv.org/abs/2505.02178)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [41] GUAVA: Generalizable Upper Body 3D Gaussian Avatar\n- **🧑‍🔬 作者**：Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Yang Li, Minghan Qin, Yu Li, Haoqian Wang\n- **🏫 单位**：Tsinghua University ⟐ International Digital Economy Academy\n- **🔗 链接**：[[中英摘要](./abs/2505.03351.md)] [[arXiv:2505.03351](https://arxiv.org/abs/2505.03351)] [[Code](https://github.com/Pixel-Talk/GUAVA)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [42] QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization\n- **🧑‍🔬 作者**：Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner, Angela Dai\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2505.05591.md)] [[arXiv:2505.05591](https://arxiv.org/abs/2505.05591)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [43] SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations\n- **🧑‍🔬 作者**：Songchun Zhang, Huiyao Xu, Sitong Guo, Zhongwei Xie, Pengwei Liu, Hujun Bao, Weiwei Xu, Changqing Zou\n- **🏫 单位**：Zhejiang University ⟐ Zhejiang Lab ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2505.11992.md)] [[arXiv:2505.11992](https://arxiv.org/abs/2505.11992)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [44] CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting\n- **🧑‍🔬 作者**：Lei Tian, Xiaomin Li, Liqian Ma, Hefei Huang, Zirui Zheng, Hao Yin, Taiqing Li, Huchuan Lu, Xu Jia\n- **🏫 单位**：Dalian University of Technology ⟐ ZMO AI\n- **🔗 链接**：[[中英摘要](./abs/2505.20469.md)] [[arXiv:2505.20469](https://arxiv.org/abs/2505.20469)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [45] SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images\n- **🧑‍🔬 作者**：Yu Sheng, Jiajun Deng, Xinran Zhang, Yu Zhang, Bei Hua, Yanyong Zhang, Jianmin Ji\n- **🏫 单位**：University of Science and Technology of China ⟐ The University of Adelaide\n- **🔗 链接**：[[中英摘要](./abs/2505.23044.md)] [[arXiv:2505.23044](https://arxiv.org/abs/2505.23044)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [46] AdaHuman: Animatable Detailed 3D Human Generation with Compositional 
Multiview Diffusion\n- **🧑‍🔬 作者**：Yangyi Huang, Ye Yuan, Xueting Li, Jan Kautz, Umar Iqbal\n- **🏫 单位**：NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2505.24877.md)] [[arXiv:2505.24877](https://arxiv.org/abs/2505.24877)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [47] RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS\n- **🧑‍🔬 作者**：Chuanyu Fu, Yuqi Zhang, Kunbin Yao, Guanying Chen, Yuan Xiong, Chuan Huang, Shuguang Cui, Xiaochun Cao\n- **🏫 单位**：Sun Yat-sen University ⟐ FNii-Shenzhen ⟐ CUHKSZ\n- **🔗 链接**：[[中英摘要](./abs/2506.02751.md)] [[arXiv:2506.02751](https://arxiv.org/abs/2506.02751)] [[Code](https://github.com/fcyycf/RobustSplat)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [48] CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization\n- **🧑‍🔬 作者**：Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas Guibas, Songyou Peng\n- **🏫 单位**：ETH Zurich ⟐ Stanford University ⟐ CTU Prague ⟐ Google DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2506.21117.md)] [[arXiv:2506.21117](https://arxiv.org/abs/2506.21117)] [[Code](https://github.com/jan-ackermann/cl-splats)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [49] Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction\n- **🧑‍🔬 作者**：Zhirui Gao, Renjiao Yi, Yaqiao Dai, Xuening Zhu, Wei Chen, Chenyang Zhu, Kai Xu\n- **🏫 单位**：National University of Defense Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.21401.md)] [[arXiv:2506.21401](https://arxiv.org/abs/2506.21401)] [[Code](https://github.com/zhirui-gao/Curve-Gaussian)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [50] GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation\n- **🧑‍🔬 作者**：Wentao Hu, Shunkai Li, Ziqiao Peng, Haoxian Zhang, Fan Shi, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Hui Tian\n- **🏫 单位**：Beijing University of Posts and Telecommunications ⟐ Kuaishou Technology ⟐ Renmin University of China\n- **🔗 链接**：[[中英摘要](./abs/2506.21513.md)] [[arXiv:2506.21513](https://arxiv.org/abs/2506.21513)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [51] RoboPearls: Editable Video Simulation for Robot Manipulation\n- **🧑‍🔬 作者**：Tao Tang, Likui Zhang, Youpeng Wen, Kaidong Zhang, Jia-Wang Bian, Xia Zhou, Tianyi Yan, Kun Zhan, Peng Jia, Hefeng Wu, Liang Lin, Xiaodan Liang\n- **🏫 单位**：Shenzhen Campus of Sun Yat-sen University ⟐ Sun Yat-sen University ⟐ Bytedance Seed ⟐ Li Auto Inc.\n- **🔗 链接**：[[中英摘要](./abs/2506.22756.md)] [[arXiv:2506.22756](https://arxiv.org/abs/2506.22756)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [52] VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding\n- **🧑‍🔬 作者**：Minchao Jiang, Shunyu Jia, Jiaming Gu, Xiaoyuan Lu, Guangming Zhu, Anqi Dong, Liang Zhang\n- **🏫 单位**：Xidian University ⟐ Algorithm R&D Center ⟐ Shanghai Pudong Cryptography Research Institute ⟐ KTH Royal Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.22799.md)] [[arXiv:2506.22799](https://arxiv.org/abs/2506.22799)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [53] RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors\n- **🧑‍🔬 作者**：Sicong Du, Jiarun Liu, Qifeng Chen, Hao-Xiang Chen, Tai-Jiang Mu, Sheng Yang\n- **🏫 单位**：CaiNiao Inc. 
⟐ Zhejiang University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2506.22800.md)] [[arXiv:2506.22800](https://arxiv.org/abs/2506.22800)] [[Code](https://github.com/CN-ADLab/RGE-GS)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [54] From Coarse to Fine: Learnable Discrete Wavelet Transforms for Efficient 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hung Nguyen, An Le, Runfa Li, Truong Nguyen\n- **🏫 单位**：UC San Diego\n- **🔗 链接**：[[中英摘要](./abs/2506.23042.md)] [[arXiv:2506.23042](https://arxiv.org/abs/2506.23042)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025 Workshop\n\n#### [55] Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space\n- **🧑‍🔬 作者**：Yingping Liang, Yutao Hu, Wenqi Shao, Ying Fu\n- **🏫 单位**：Beijing Institute of Technology ⟐ Southeast University ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2507.00392.md)] [[arXiv:2507.00392](https://arxiv.org/abs/2507.00392)] [[Code](https://github.com/Sharpiless/L2M)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [56] 3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation\n- **🧑‍🔬 作者**：Tianrui Lou, Xiaojun Jia, Siyuan Liang, Jiawei Liang, Ming Zhang, Yanjun Xiao, Xiaochun Cao\n- **🏫 单位**：Sun Yat-sen University ⟐ Peng Cheng Laboratory ⟐ Nanyang Technological University ⟐ National University of Singapore ⟐ National Key Laboratory of Science and Technology on Information System Security ⟐ Nsfocus\n- **🔗 链接**：[[中英摘要](./abs/2507.01367.md)] [[arXiv:2507.01367](https://arxiv.org/abs/2507.01367)] [[Code](https://github.com/TRLou/PGA)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [57] LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling\n- **🧑‍🔬 作者**：Jiahao Wu, Rui Peng, Jianbo Jiao, Jiayu Yang, Luyang Tang, Kaiqiang Xiong, Jie Liang, Jinbo Yan, Runling Liu, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ Pengcheng Lab ⟐ University of Birmingham\n- **🔗 链接**：[[中英摘要](./abs/2507.02363.md)] [[arXiv:2507.02363](https://arxiv.org/abs/2507.02363)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [58] Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps\n- **🧑‍🔬 作者**：Chong Cheng, Sicheng Yu, Zijian Wang, Yifan Zhou, Hao Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2507.03737.md)] [[arXiv:2507.03737](https://arxiv.org/abs/2507.03737)] [[Code](https://github.com/3DAgentWorld/S3PO-GS)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [59] VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis\n- **🧑‍🔬 作者**：Alexandre Symeonidis-Herzig, Özge Mercanoğlu Sincan, Richard Bowden\n- **🏫 单位**：University of Surrey\n- **🔗 链接**：[[中英摘要](./abs/2507.06060.md)] [[arXiv:2507.06060](https://arxiv.org/abs/2507.06060)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025 Workshop\n\n#### [60] RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration\n- **🧑‍🔬 作者**：Chong Cheng, Yu Hu, Sicheng Yu, Beizhen Zhao, Zijian Wang, Hao Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2507.08136.md)] [[arXiv:2507.08136](https://arxiv.org/abs/2507.08136)] [[Code](https://3dagentworld.github.io/reggs/)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [61] Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling\n- **🧑‍🔬 作者**：Hayeon Kim, Ji Ha Jang, Se Young Chun\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2507.11061.md)] [[arXiv:2507.11061](https://arxiv.org/abs/2507.11061)] 
[[Code](https://github.com/janeyeon/romap-code)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [62] TRAN-D: 2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update\n- **🧑‍🔬 作者**：Jeongyun Kim, Seunghoon Jeong, Giseop Kim, Myung-Hwan Jeon, Eunji Jun, Ayoung Kim\n- **🏫 单位**：Seoul National University ⟐ DGIST ⟐ University of Illinois Urbana-Champaign ⟐ Hyundai Motor Group\n- **🔗 链接**：[[中英摘要](./abs/2507.11069.md)] [[arXiv:2507.11069](https://arxiv.org/abs/2507.11069)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [63] DCHM: Depth-Consistent Human Modeling for Multiview Detection\n- **🧑‍🔬 作者**：Jiahao Ma, Tianyu Wang, Miaomiao Liu, David Ahmedt-Aristizabal, Chuong Nguyen\n- **🏫 单位**：Australian National University ⟐ CSIRO Data61\n- **🔗 链接**：[[中英摘要](./abs/2507.14505.md)] [[arXiv:2507.14505](https://arxiv.org/abs/2507.14505)] [[Code](https://github.com/Jiahao-Ma/DCHM-code)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [64] ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting\n- **🧑‍🔬 作者**：Ruijie Zhu, Mulin Yu, Linning Xu, Lihan Jiang, Yixuan Li, Tianzhu Zhang, Jiangmiao Pang, Bo Dai\n- **🏫 单位**：University of Science and Technology of China ⟐ Shanghai Artificial Intelligence Laboratory ⟐ The Chinese University of Hong Kong ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2507.15454.md)] [[arXiv:2507.15454](https://arxiv.org/abs/2507.15454)] [[Code](https://github.com/RuijieZhu94/ObjectGS)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [65] SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting\n- **🧑‍🔬 作者**：Zihui Gao, Jia-Wang Bian, Guosheng Lin, Hao Chen, Chunhua Shen\n- **🏫 单位**：Zhejiang University ⟐ ByteDance Seed ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2507.15602.md)] [[arXiv:2507.15602](https://arxiv.org/abs/2507.15602)] [[Code](https://github.com/aim-uofa/SurfaceSplat)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [66] Gaussian Splatting with Discretized SDF for Relightable Assets\n- **🧑‍🔬 作者**：Zuo-Liang Zhu, Jian Yang, Beibei Wang\n- **🏫 单位**：Nankai University ⟐ Nanjing University\n- **🔗 链接**：[[中英摘要](./abs/2507.15629.md)] [[arXiv:2507.15629](https://arxiv.org/abs/2507.15629)] [[Code](https://github.com/NK-CS-ZZL/DiscretizedSDF)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [67] GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar\n- **🧑‍🔬 作者**：SeungJun Moon, Hah Min Lew, Seungeun Lee, Ji-Su Kang, Gyeong-Moon Park\n- **🏫 单位**：Klleon AI Research ⟐ Korea University\n- **🔗 链接**：[[中英摘要](./abs/2507.18155.md)] [[arXiv:2507.18155](https://arxiv.org/abs/2507.18155)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [68] DASH: 4D Hash Encoding with Self-Supervised Decomposition for Real-Time Dynamic Scene Rendering\n- **🧑‍🔬 作者**：Jie Chen, Zhangchi Hu, Peixi Wu, Huyue Zhu, Hebei Li, Xiaoyan Sun\n- **🏫 单位**：University of Science and Technology of China ⟐ Hefei Comprehensive National Science Center\n- **🔗 链接**：[[中英摘要](./abs/2507.19141.md)] [[arXiv:2507.19141](https://arxiv.org/abs/2507.19141)] [[Code](https://github.com/chenj02/DASH)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [69] HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars\n- **🧑‍🔬 作者**：Byungjun Kim, Shunsuke Saito, Giljoo Nam, Tomas Simon, Jason Saragih, Hanbyul Joo, Junxuan Li\n- **🏫 单位**：Seoul National University ⟐ Codec Avatars Lab, Meta\n- **🔗 链接**：[[中英摘要](./abs/2507.19481.md)] [[arXiv:2507.19481](https://arxiv.org/abs/2507.19481)] [Code]\n- **📝 说明**: 🏆 
Accepted to ICCV 2025\n\n#### [70] From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos\n- **🧑‍🔬 作者**：Chenjian Gao, Lihe Ding, Rui Han, Zhanpeng Huang, Zibin Wang, Tianfan Xue\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ SenseTime Research ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2507.20331.md)] [[arXiv:2507.20331](https://arxiv.org/abs/2507.20331)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [71] GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections\n- **🧑‍🔬 作者**：Haiyang Bai, Jiaqi Zhu, Songru Jiang, Wei Huang, Tao Lu, Yuanqi Li, Jie Guo, Runze Fu, Yanwen Guo, Lijun Chen\n- **🏫 单位**：Nanjing University ⟐ Brown University ⟐ JSTI Group\n- **🔗 链接**：[[中英摘要](./abs/2507.20512.md)] [[arXiv:2507.20512](https://arxiv.org/abs/2507.20512)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [72] Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction\n- **🧑‍🔬 作者**：Zhensheng Yuan, Haozhi Huang, Zhen Xiong, Di Wang, Guanghua Yang\n- **🏫 单位**：Jinan University ⟐ University of Macau\n- **🔗 链接**：[[中英摘要](./abs/2507.23006.md)] [[arXiv:2507.23006](https://arxiv.org/abs/2507.23006)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [73] NeRF Is a Valuable Assistant for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Shuangkang Fang, I-Chao Shen, Takeo Igarashi, Yufeng Wang, ZeSheng Wang, Yi Yang, Wenrui Ding, Shuchang Zhou\n- **🏫 单位**：Beihang University ⟐ The University of Tokyo ⟐ StepFun\n- **🔗 链接**：[[中英摘要](./abs/2507.23374.md)] [[arXiv:2507.23374](https://arxiv.org/abs/2507.23374)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [74] MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction\n- **🧑‍🔬 作者**：Zijian Dong, Longteng Duan, Jie Song, Michael J. Black, Andreas Geiger\n- **🏫 单位**：ETH Zurich ⟐ University of Tübingen ⟐ HKUST ⟐ Max Planck Institute for Intelligent Systems\n- **🔗 链接**：[[中英摘要](./abs/2507.23597.md)] [[arXiv:2507.23597](https://arxiv.org/abs/2507.23597)] [[Code](https://github.com/zj-dong/MoGA)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [75] Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis\n- **🧑‍🔬 作者**：Bowen Zhang, Sicheng Xu, Chuxin Wang, Jiaolong Yang, Feng Zhao, Dong Chen, Baining Guo\n- **🏫 单位**：University of Science and Technology of China ⟐ Microsoft Research Asia\n- **🔗 链接**：[[中英摘要](./abs/2507.23785.md)] [[arXiv:2507.23785](https://arxiv.org/abs/2507.23785)] [[Code](https://github.com/ForeverFancy/gvfdiffusion)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [76] IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation\n- **🧑‍🔬 作者**：Wenxuan Guo, Xiuwei Xu, Hang Yin, Ziwei Wang, Jianjiang Feng, Jie Zhou, Jiwen Lu\n- **🏫 单位**：Tsinghua University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2508.00823.md)] [[arXiv:2508.00823](https://arxiv.org/abs/2508.00823)] [[Code](https://github.com/GWxuan/IGL-Nav)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [77] No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views\n- **🧑‍🔬 作者**：Ranran Huang, Krystian Mikolajczyk\n- **🏫 单位**：Imperial College London\n- **🔗 链接**：[[中英摘要](./abs/2508.01171.md)] [[arXiv:2508.01171](https://arxiv.org/abs/2508.01171)] [[Code](https://github.com/ranrhuang/SPFSplat)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [78] OCSplats: Observation Completeness Quantification and Label Noise Separation in 3DGS\n- **🧑‍🔬 作者**：Han Ling, Xian Xu, Yinghui Sun, Quansen Sun\n- **🏫 单位**：Nanjing University of Science and Technology ⟐ Southeast 
University ⟐ CityU HK\n- **🔗 链接**：[[中英摘要](./abs/2508.01239.md)] [[arXiv:2508.01239](https://arxiv.org/abs/2508.01239)] [[Code](https://github.com/HanLingsgjk/OCSplats)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [79] Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians\n- **🧑‍🔬 作者**：Quankai Gao, Iliyan Georgiev, Tuanfeng Y. Wang, Krishna Kumar Singh, Ulrich Neumann, Jae Shin Yoon\n- **🏫 单位**：University of Southern California ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](./abs/2508.01464.md)] [[arXiv:2508.01464](https://arxiv.org/abs/2508.01464)] [[Code](https://github.com/Zerg-Overmind/Can3Tok)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [80] Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing\n- **🧑‍🔬 作者**：Hongyu Shen, Junfeng Ni, Yixin Chen, Weishuo Li, Mingtao Pei, Siyuan Huang\n- **🏫 单位**：Beijing Institute of Technology ⟐ BIGAI ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2508.03227.md)] [[arXiv:2508.03227](https://arxiv.org/abs/2508.03227)] [[Code](https://github.com/trace-3d/Trace3D)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [81] Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework\n- **🧑‍🔬 作者**：Yi-Ting Chen, Ting-Hsuan Liao, Pengsheng Guo, Alexander Schwing, Jia-Bin Huang\n- **🏫 单位**：University of Maryland, College Park ⟐ Carnegie Mellon University ⟐ University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2508.04090.md)] [[arXiv:2508.04090](https://arxiv.org/abs/2508.04090)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [82] MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction\n- **🧑‍🔬 作者**：Yaopeng Lou, Liao Shen, Tianqi Liu, Jiaqi Li, Zihao Huang, Huiqiang Sun, Zhiguo Cao\n- **🏫 单位**：Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.04297.md)] [[arXiv:2508.04297](https://arxiv.org/abs/2508.04297)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [83] CF3: Compact and Fast 3D Feature Fields\n- **🧑‍🔬 作者**：Hyunjoon Lee, Joonkyu Min, Jaesik Park\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2508.05254.md)] [[arXiv:2508.05254](https://arxiv.org/abs/2508.05254)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [84] GAP: Gaussianize Any Point Clouds with Text Guidance\n- **🧑‍🔬 作者**：Weiqi Zhang, Junsheng Zhou, Haotian Geng, Wenyuan Zhang, Yu-Shen Liu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2508.05631.md)] [[arXiv:2508.05631](https://arxiv.org/abs/2508.05631)] [[Code](https://github.com/weiqi-zhang/GAP)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [85] ExploreGS: Explorable 3D Scene Reconstruction with Virtual Camera Samplings and Diffusion Priors\n- **🧑‍🔬 作者**：Minsu Kim, Subin Jeon, In Cho, Mijin Yoo, Seon Joo Kim\n- **🏫 单位**：Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2508.06014.md)] [[arXiv:2508.06014](https://arxiv.org/abs/2508.06014)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [86] Learning an Implicit Physics Model for Image-based Fluid Simulation\n- **🧑‍🔬 作者**：Emily Yue-Ting Jia, Jiageng Mao, Zhiyuan Gao, Yajie Zhao, Yue Wang\n- **🏫 单位**：University of Southern California\n- **🔗 链接**：[[中英摘要](./abs/2508.08254.md)] [[arXiv:2508.08254](https://arxiv.org/abs/2508.08254)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [87] GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments\n- **🧑‍🔬 作者**：Lin Zeng, Boming Zhao, Jiarui Hu, Xujie Shen, Ziqiang Dang, Hujun Bao, Zhaopeng Cui\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2508.08867.md)] 
[[arXiv:2508.08867](https://arxiv.org/abs/2508.08867)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [88] SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing\n- **🧑‍🔬 作者**：Heyi Sun, Cong Wang, Tian-Xing Xu, Jingwei Huang, Di Kang, Chunchao Guo, Song-Hai Zhang\n- **🏫 单位**：Tsinghua University ⟐ Tencent Hunyuan\n- **🔗 链接**：[[中英摘要](./abs/2508.09597.md)] [[arXiv:2508.09597](https://arxiv.org/abs/2508.09597)] [[Code](https://github.com/heyy-sun/SVG-Head)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [89] TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos\n- **🧑‍🔬 作者**：Jinxi Li, Ziyang Song, Bo Yang\n- **🏫 单位**：The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](./abs/2508.09811.md)] [[arXiv:2508.09811](https://arxiv.org/abs/2508.09811)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [90] WIPES: Wavelet-based Visual Primitives\n- **🧑‍🔬 作者**：Wenhao Zhang, Hao Zhu, Delong Wu, Di Kang, Linchao Bao, Xun Cao, Zhan Ma\n- **🏫 单位**：Nanjing University ⟐ Tencent\n- **🔗 链接**：[[中英摘要](./abs/2508.12615.md)] [[arXiv:2508.12615](https://arxiv.org/abs/2508.12615)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [91] LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos\n- **🧑‍🔬 作者**：Chin-Yang Lin, Cheng Sun, Fu-En Yang, Min-Hung Chen, Yen-Yu Lin, Yu-Lun Liu\n- **🏫 单位**：National Yang Ming Chiao Tung University ⟐ NVIDIA Research\n- **🔗 链接**：[[中英摘要](./abs/2508.14041.md)] [[arXiv:2508.14041](https://arxiv.org/abs/2508.14041)] [[Code](https://github.com/NVlabs/LongSplat)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [92] GWM: Towards Scalable Gaussian World Models for Robotic Manipulation\n- **🧑‍🔬 作者**：Guanxing Lu, Baoxiong Jia, Puhao Li, Yixin Chen, Ziwei Wang, Yansong Tang, Siyuan Huang\n- **🏫 单位**：Tsinghua University ⟐ BIGAI ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2508.17600.md)] [[arXiv:2508.17600](https://arxiv.org/abs/2508.17600)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [93] GSVisLoc: Generalizable Visual Localization for Gaussian Splatting Scene Representations\n- **🧑‍🔬 作者**：Fadi Khatib, Dror Moran, Guy Trostianetsky, Yoni Kasten, Meirav Galun, Ronen Basri\n- **🏫 单位**：Weizmann Institute of Science ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2508.18242.md)] [[arXiv:2508.18242](https://arxiv.org/abs/2508.18242)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025 CALIPOSE Workshop\n\n#### [94] Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images\n- **🧑‍🔬 作者**：Changha Shin, Woong Oh Cho, Seon Joo Kim\n- **🏫 单位**：Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2508.20080.md)] [[arXiv:2508.20080](https://arxiv.org/abs/2508.20080)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [95] Im2Haircut: Single-view Strand-based Hair Reconstruction for Human Avatars\n- **🧑‍🔬 作者**：Vanessa Sklyarova, Egor Zakharov, Malte Prinzler, Giorgio Becherini, Michael J. Black, Justus Thies\n- **🏫 单位**：Max Planck Institute for Intelligent Systems ⟐ ETH Zurich ⟐ Technical University of Darmstadt\n- **🔗 链接**：[[中英摘要](./abs/2509.01469.md)] [[arXiv:2509.01469](https://arxiv.org/abs/2509.01469)] [[Code](https://github.com/Vanessik/Im2Haircut)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [96] T2Bs: Text-to-Character Blendshapes via Video Generation\n- **🧑‍🔬 作者**：Jiahao Luo, Chaoyang Wang, Michael Vasilkovsky, Vladislav Shakhrai, Di Liu, Peiye Zhuang, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee, James Davis, Jian Wang\n- **🏫 单位**：University of California ⟐ Snap Inc. 
⟐ Rutgers University ⟐ KAUST\n- **🔗 链接**：[[中英摘要](./abs/2509.10678.md)] [[arXiv:2509.10678](https://arxiv.org/abs/2509.10678)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [97] EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device\n- **🧑‍🔬 作者**：Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad, Zsolt Kira\n- **🏫 单位**：Georgia Tech ⟐ Toyota Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2509.17430.md)] [[arXiv:2509.17430](https://arxiv.org/abs/2509.17430)] [[Code](https://github.com/gchhablani/embodied-splat-v1)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [98] PolGS: Polarimetric Gaussian Splatting for Fast Reflective Surface Reconstruction\n- **🧑‍🔬 作者**：Yufei Han, Bowen Tie, Heng Guo, Youwei Lyu, Si Li, Boxin Shi, Yunpeng Jia, Zhanyu Ma\n- **🏫 单位**：Beijing University of Posts and Telecommunications ⟐ Xiong’an Aerospace Information Research Institute ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2509.19726.md)] [[arXiv:2509.19726](https://arxiv.org/abs/2509.19726)] [[Code](https://github.com/PRIS-CV/PolGS)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [99] PU-Gaussian: Point Cloud Upsampling using 3D Gaussian Representation\n- **🧑‍🔬 作者**：Mahmoud Khater, Mona Strauss, Philipp von Olshausen, Alexander Reiterer\n- **🏫 单位**：University of Freiburg ⟐ Fraunhofer IPM\n- **🔗 链接**：[[中英摘要](./abs/2509.20207.md)] [[arXiv:2509.20207](https://arxiv.org/abs/2509.20207)] [[Code](https://github.com/mvg-inatech/PU-Gaussian)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025 e2e3D Workshop\n\n#### [100] SeHDR: Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing\n- **🧑‍🔬 作者**：Yiyu Li, Haoyuan Wang, Ke Xu, Gerhard Petrus Hancke, Rynson W.H. Lau\n- **🏫 单位**：City University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2509.20400.md)] [[arXiv:2509.20400](https://arxiv.org/abs/2509.20400)] [[Code](https://github.com/yiyulics/SeHDR)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [101] StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions\n- **🧑‍🔬 作者**：Bo-Hsu Ke, You-Zhe Xie, Yu-Lun Liu, Wei-Chen Chiu\n- **🏫 单位**：National Yang Ming Chiao Tung University\n- **🔗 链接**：[[中英摘要](./abs/2510.02314.md)] [[arXiv:2510.02314](https://arxiv.org/abs/2510.02314)] [[Code](https://github.com/Hentci/StealthAttack_official)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [102] Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency\n- **🧑‍🔬 作者**：Yuxin Cheng, Binxiao Huang, Taiqiang Wu, Wenyong Zhou, Chenchen Ding, Zhengwu Liu, Graziano Chesi, Ngai Wong\n- **🏫 单位**：The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2510.10993.md)] [[arXiv:2510.10993](https://arxiv.org/abs/2510.10993)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [103] Hybrid Gaussian Splatting for Novel Urban View Synthesis\n- **🧑‍🔬 作者**：Mohamed Omran, Farhad Zanjani, Davide Abati, Jens Petersen, Amirhossein Habibian\n- **🏫 单位**：Qualcomm AI Research\n- **🔗 链接**：[[中英摘要](./abs/2510.12308.md)] [[arXiv:2510.12308](https://arxiv.org/abs/2510.12308)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025 RealADSim Workshop\n\n#### [104] Leveraging 2D Priors and SDF Guidance for Dynamic Urban Scene Rendering\n- **🧑‍🔬 作者**：Siddharth Tourani, Jayaram Reddy, Akash Kumbar, Satyajit Tourani, Nishant Goyal, Madhava Krishna, N. 
Dinesh Reddy, Muhammad Haris Khan\n- **🏫 单位**：IIIT Hyderabad ⟐ MBZUAI ⟐ University of Heidelberg ⟐ VLM Run ⟐ IIT Kharagpur\n- **🔗 链接**：[[中英摘要](./abs/2510.13381.md)] [[arXiv:2510.13381](https://arxiv.org/abs/2510.13381)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [105] Capture, Canonicalize, Splat: Zero-Shot 3D Gaussian Avatars from Unstructured Phone Images\n- **🧑‍🔬 作者**：Emanuel Garbin, Guy Adam, Oded Krams, Zohar Barzelay, Eran Guendelman, Michael Schwarz, Matteo Presutto, Moran Vatelmacher, Yigal Shenkman, Eli Peker, Itai Druker, Uri Patish, Yoav Blum, Max Bluvstein, Junxuan Li, Rawal Khirodkar, Shunsuke Saito\n- **🏫 单位**：Meta\n- **🔗 链接**：[[中英摘要](./abs/2510.14081.md)] [[arXiv:2510.14081](https://arxiv.org/abs/2510.14081)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025 AMFG Workshop\n\n#### [106] Leveraging Learned Image Prior for 3D Gaussian Compression\n- **🧑‍🔬 作者**：Seungjoo Shin, Jaesik Park, Sunghyun Cho\n- **🏫 单位**：POSTECH ⟐ Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2510.14705.md)] [[arXiv:2510.14705](https://arxiv.org/abs/2510.14705)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025 Workshop on ECLR\n\n#### [107] InsideOut: Integrated RGB-Radiative Gaussian Splatting for Comprehensive 3D Object Representation\n- **🧑‍🔬 作者**：Jungmin Lee, Seonghyuk Hong, Juyong Lee, Jaeyoon Lee, Jongwon Choi\n- **🏫 单位**：Chung-Ang University ⟐ National Research Institute of Cultural Heritage, Republic of Korea\n- **🔗 链接**：[[中英摘要](./abs/2510.17864.md)] [[arXiv:2510.17864](https://arxiv.org/abs/2510.17864)] [Code]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [108] HouseTour: A Virtual Real Estate A(I)gent\n- **🧑‍🔬 作者**：Ata Çelen, Marc Pollefeys, Daniel Barath, Iro Armeni\n- **🏫 单位**：ETH Zurich ⟐ Stanford University ⟐ Microsoft Spatial AI Lab ⟐ HUN-REN SZTAKI\n- **🔗 链接**：[[中英摘要](./abs/2510.18054.md)] [[arXiv:2510.18054](https://arxiv.org/abs/2510.18054)] [[Code](https://github.com/GradientSpaces/HouseTour)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n\n#### [109] DIMO: Diverse 3D Motion Generation for Arbitrary Objects\n- **🧑‍🔬 作者**：Linzhan Mou, Jiahui Lei, Chen Wang, Lingjie Liu, Kostas Daniilidis\n- **🏫 单位**：University of Pennsylvania ⟐ Archimedes, Athena RC\n- **🔗 链接**：[[中英摘要](./abs/2511.07409.md)] [[arXiv:2511.07409](https://arxiv.org/abs/2511.07409)] [[Code](https://github.com/Friedrich-M/DIMO)]\n- **📝 说明**: 🏆 Accepted to ICCV 2025\n"
  },
  {
    "path": "2025/ICIP.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICIP2025\n\n#### [1] Sparse2DGS: Sparse-View Surface Reconstruction using 2D Gaussian Splatting with Dense Point Cloud\n- **🧑‍🔬 作者**：Natsuki Takama, Shintaro Ito, Koichi Ito, Hwann-Tzong Chen, Takafumi Aoki\n- **🏫 单位**：Tohoku University ⟐ National Tsing Hua University\n- **🔗 链接**：[[中英摘要](./abs/2505.19854.md)] [[arXiv:2505.19854](https://arxiv.org/abs/2505.19854)] [Code]\n- **📝 说明**: 🏆 Accepted to ICIP 2025\n\n#### [2] ErpGS: Equirectangular Image Rendering enhanced with 3D Gaussian Regularization\n- **🧑‍🔬 作者**：Shintaro Ito, Natsuki Takama, Koichi Ito, Hwann-Tzong Chen, Takafumi Aoki\n- **🏫 单位**：Tohoku University ⟐ National Tsing Hua University\n- **🔗 链接**：[[中英摘要](./abs/2505.19883.md)] [[arXiv:2505.19883](https://arxiv.org/abs/2505.19883)] [Code]\n- **📝 说明**: 🏆 Accepted to ICIP 2025\n\n#### [3] Pose-free 3D Gaussian splatting via shape-ray estimation\n- **🧑‍🔬 作者**：Youngju Na, Taeyeon Kim, Jumin Lee, Kyu Beom Han, Woo Jae Kim, Sung-eui Yoon\n- **🏫 单位**：KAIST\n- **🔗 链接**：[[中英摘要](./abs/2505.22978.md)] [[arXiv:2505.22978](https://arxiv.org/abs/2505.22978)] [Code]\n- **📝 说明**: 🏆 Accepted to ICIP 2025\n\n#### [4] ICP-3DGS: SfM-free 3D Gaussian Splatting for Large-scale Unbounded Scenes\n- **🧑‍🔬 作者**：Chenhao Zhang, Yezhi Shen, Fengqing Zhu\n- **🏫 单位**：Purdue University\n- **🔗 链接**：[[中英摘要](./abs/2506.21629.md)] [[arXiv:2506.21629](https://arxiv.org/abs/2506.21629)] [[Code](https://github.com/Chenhao-Z/ICP-3DGS)]\n- **📝 说明**: 🏆 Accepted to ICIP 2025\n"
  },
  {
    "path": "2025/ICLR.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICLR2025\n\n#### [1] Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos\n- **🧑‍🔬 作者**：Isabella Liu, Hao Su, Xiaolong Wang\n- **🏫 单位**：University of California, San Diego\n- **🔗 链接**：[[中英摘要](../abs/2404.12379.md)] [[arXiv:2404.12379](https://arxiv.org/abs/2404.12379)] [[Code](https://github.com/Isabella98Liu/DG-Mesh)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [2] EG4D: Explicit Generation of 4D Object without Score Distillation\n- **🧑‍🔬 作者**：Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li\n- **🏫 单位**：USTC ⟐ City University of Hong Kong ⟐ Cornell University\n- **🔗 链接**：[[中英摘要](../abs/2405.18132.md)] [[arXiv:2405.18132](https://arxiv.org/abs/2405.18132)] [[Code](https://github.com/jasongzy/EG4D)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [3] 3D StreetUnveiler with Semantic-Aware 2DGS\n- **🧑‍🔬 作者**：Jingwei Xu, Yikai Wang, Yiqun Zhao, Yanwei Fu, Shenghua Gao\n- **🏫 单位**：ShanghaiTech University ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](../abs/2405.18416.md)] [[arXiv:2405.18416](https://arxiv.org/abs/2405.18416)] [[Code](https://github.com/DavidXu-JJ/StreetUnveiler)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [4] 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting\n- **🧑‍🔬 作者**：Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Stanford University ⟐ Snap Inc. ⟐ University of California Los Angeles ⟐ ByteDance\n- **🔗 链接**：[[中英摘要](../abs/2405.18424.md)] [[arXiv:2405.18424](https://arxiv.org/abs/2405.18424)] [[Code](https://github.com/zqh0253/3DitScene)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [5] MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors\n- **🧑‍🔬 作者**：Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lv, Peng Wang, Wenping Wang, Junhui Hou\n- **🏫 单位**：City University of Hong kong ⟐ The University of Hong kong ⟐ Texas A&M University, U.S.A\n- **🔗 链接**：[[中英摘要](../abs/2406.00434.md)] [[arXiv:2406.00434](https://arxiv.org/abs/2406.00434)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [6] Generalizable Human Gaussians from Single-View Image\n- **🧑‍🔬 作者**：Jinnan Chen, Chen Li, Jianfeng Zhang, Hanlin Chen, Buzhen Huang, Gim Hee Lee\n- **🏫 单位**：National University of Singapore ⟐  Bytedance\n- **🔗 链接**：[[中英摘要](../abs/2406.06050.md)] [[arXiv:2406.06050](https://arxiv.org/abs/2406.06050)] [[Code](https://github.com/jinnan-chen/HGM)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [7] InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting\n- **🧑‍🔬 作者**：Chenxin Li, Hengyu Liu, Zhiwen Fan, Wuyang Li, Yifan Liu, Panwang Pan, Yixuan Yuan\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ University of Texas at Austin ⟐ ByteDance\n- **🔗 链接**：[[中英摘要](./abs/2407.01301.md)] [[arXiv:2407.01301](https://arxiv.org/abs/2407.01301)] [[Code](https://github.com/CUHK-AIM-Group/GaussianStego)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [8] Gaussian Splatting Lucas-Kanade\n- **🧑‍🔬 作者**：Liuyue Xie, Joel Julin, Koichiro Niinuma, Laszlo A. 
Jeni\n- **🏫 单位**：Carnegie Mellon University ⟐ Fujitsu Research of America\n- **🔗 链接**：[[中英摘要](./abs/2407.11309.md)] [[arXiv:2407.11309](https://arxiv.org/abs/2407.11309)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [9] GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Zirui Wang, Ming Cheng, Victor Adrian Prisacariu, Tristan Braud\n- **🏫 单位**：HKUST ⟐ University of Oxford ⟐ Dartmouth College\n- **🔗 链接**：[[中英摘要](../abs/2408.11085.md)] [[arXiv:2408.11085](https://arxiv.org/abs/2408.11085)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [10] Atlas Gaussians Diffusion for 3D Generation with Infinite Number of Points\n- **🧑‍🔬 作者**：Haitao Yang, Yuan Dong, Hanwen Jiang, Dejia Xu, Georgios Pavlakos, Qixing Huang\n- **🏫 单位**：The University of Texas at Austin ⟐ Alibaba Group\n- **🔗 链接**：[[中英摘要](./abs/2408.13055.md)] [[arXiv:2408.13055](https://arxiv.org/abs/2408.13055)] [[Code](https://github.com/yanghtr/AtlasGaussians)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [11] OmniRe: Omni Urban Scene Reconstruction\n- **🧑‍🔬 作者**：Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, Li Song, Yue Wang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Technion ⟐ University of Toronto ⟐ Stanford University ⟐ NVIDIA Research ⟐ University of Southern California\n- **🔗 链接**：[[中英摘要](../abs/2408.16760.md)] [[arXiv:2408.16760](https://arxiv.org/abs/2408.16760)] [[Code](https://github.com/ziyc/drivestudio)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [12] ThermalGaussian: Thermal 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Rongfeng Lu, Hangyu Chen, Zunjie Zhu, Yuhang Qin, Ming Lu, Le Zhang, Chenggang Yan, Anke Xue\n- **🏫 单位**：Hangzhou Dianzi University ⟐ Intel Labs China ⟐ State Key Lab of CAD&CG, Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2409.07200.md)] [[arXiv:2409.07200](https://arxiv.org/abs/2409.07200)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [13] Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection\n- **🧑‍🔬 作者**：Hongru Yan, Yu Zheng, Yueqi Duan\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2410.01404.md)] [[arXiv:2410.01404](https://arxiv.org/abs/2410.01404)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [14] GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering\n- **🧑‍🔬 作者**：Hongze Chen, Zehong Lin, Jun Zhang\n- **🏫 单位**：The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2410.02619.md)] [[arXiv:2410.02619](https://arxiv.org/abs/2410.02619)] [[Code](https://github.com/stopaimme/GI-GS)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [15] 6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering\n- **🧑‍🔬 作者**：Zhongpai Gao, Benjamin Planche, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Ziyan Wu\n- **🏫 单位**：United Imaging Intelligence, Boston MA, USA\n- **🔗 链接**：[[中英摘要](../abs/2410.04974.md)] [[arXiv:2410.04974](https://arxiv.org/abs/2410.04974)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [16] RelitLRM: Generative Relightable Radiance for Large Reconstruction Models\n- **🧑‍🔬 作者**：Tianyuan Zhang, Zhengfei Kuang, Haian Jin, Zexiang Xu, Sai Bi, Hao Tan, He Zhang, Yiwei Hu, Milos Hasan, William T. 
Freeman, Kai Zhang, Fujun Luan\n- **🏫 单位**：Massachusetts Institute of Technology ⟐ Stanford University ⟐ Cornell University ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](../abs/2410.06231.md)] [[arXiv:2410.06231](https://arxiv.org/abs/2410.06231)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [17] 3D Vision-Language Gaussian Splatting\n- **🧑‍🔬 作者**：Qucheng Peng, Benjamin Planche, Zhongpai Gao, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Chen Chen, Ziyan Wu\n- **🏫 单位**：University of Central Florida ⟐ United Imaging Intelligence, Boston MA\n- **🔗 链接**：[[中英摘要](./abs/2410.07577.md)] [[arXiv:2410.07577](https://arxiv.org/abs/2410.07577)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [18] Fast Feedforward 3D Gaussian Splatting Compression\n- **🧑‍🔬 作者**：Yihang Chen, Qianyi Wu, Mengyao Li, Weiyao Lin, Mehrtash Harandi, Jianfei Cai\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Monash University ⟐ Shanghai University\n- **🔗 链接**：[[中英摘要](../abs/2410.08017.md)] [[arXiv:2410.08017](https://arxiv.org/abs/2410.08017)] [[Code](https://github.com/YihangChen-ee/FCGS)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [19] Poison-splat: Computation Cost Attack on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jiahao Lu, Yifan Zhang, Qiuhong Shen, Xinchao Wang, Shuicheng Yan\n- **🏫 单位**：National University of Singapore ⟐ Skywork AI\n- **🔗 链接**：[[中英摘要](../abs/2410.08190.md)] [[arXiv:2410.08190](https://arxiv.org/abs/2410.08190)] [[Code](https://github.com/jiahaolu97/poison-splat)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [20] SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars\n- **🧑‍🔬 作者**：Jaeseong Lee, Taewoong Kang, Marcel C. Bühler, Min-Jung Kim, Sungwon Hwang, Junha Hyung, Hyojin Jang, Jaegul Choo\n- **🏫 单位**：KAIST ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](./abs/2410.11682.md)] [[arXiv:2410.11682](https://arxiv.org/abs/2410.11682)] [[Code](https://github.com/summertight/surfhead_repo)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [21] Sort-free Gaussian Splatting via Weighted Sum Rendering\n- **🧑‍🔬 作者**：Qiqi Hou, Randall Rauwendaal, Zifeng Li, Hoang Le, Farzad Farhadzadeh, Fatih Porikli, Alexei Bourd, Amir Said\n- **🏫 单位**：Qualcomm AI Research ⟐ Graphics Research Team\n- **🔗 链接**：[[中英摘要](./abs/2410.18931.md)] [[arXiv:2410.18931](https://arxiv.org/abs/2410.18931)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [22] No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images\n- **🧑‍🔬 作者**：Botao Ye, Sifei Liu, Haofei Xu, Xueting Li, Marc Pollefeys, Ming-Hsuan Yang, Songyou Peng\n- **🏫 单位**：ETH Zurich ⟐ NVIDIA ⟐ Microsoft ⟐ UC Merced\n- **🔗 链接**：[[中英摘要](../abs/2410.24207.md)] [[arXiv:2410.24207](https://arxiv.org/abs/2410.24207)] [[Code](https://github.com/cvg/NoPoSplat)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [23] CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes\n- **🧑‍🔬 作者**：Yang Liu, Chuanchen Luo, Zhongkai Mao, Junran Peng, Zhaoxiang Zhang\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ HKIS ⟐ Shandong University ⟐ University of Science and Technology Beijing\n- **🔗 链接**：[[中英摘要](../abs/2411.00771.md)] [[arXiv:2411.00771](https://arxiv.org/abs/2411.00771)] [[Code](https://github.com/DekuLiuTesla/CityGaussian)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [24] SplatFormer: Point Transformer for Robust 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, Siyu Tang\n- **🏫 单位**：ETH Zurich ⟐ University of Maryland, College Park ⟐ ROCS, 
University Hospital Balgrist, University of Zurich\n- **🔗 链接**：[[中英摘要](../abs/2411.06390.md)] [[arXiv:2411.06390](https://arxiv.org/abs/2411.06390)] [[Code](https://github.com/ChenYutongTHU/SplatFormer)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [25] Reflective Gaussian Splatting\n- **🧑‍🔬 作者**：Yuxuan Yao, Zixuan Zeng, Chun Gu, Xiatian Zhu, Li Zhang\n- **🏫 单位**：School of Data Science, Fudan University ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](../abs/2412.19282.md)] [[arXiv:2412.19282](https://arxiv.org/abs/2412.19282)] [[Code](https://github.com/fudan-zvg/ref-gaussian)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [26] STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes\n- **🧑‍🔬 作者**：Jiawei Yang, Jiahui Huang, Yuxiao Chen, Yan Wang, Boyi Li, Yurong You, Apoorva Sharma, Maximilian Igl, Peter Karkus, Danfei Xu, Boris Ivanovic, Yue Wang, Marco Pavone\n- **🏫 单位**：University of Southern California ⟐ Georgia Institute of Technology ⟐ Stanford University ⟐ NVIDIA Research\n- **🔗 链接**：[[中英摘要](./abs/2501.00602.md)] [[arXiv:2501.00602](https://arxiv.org/abs/2501.00602)] [[Code](https://github.com/NVlabs/GaussianSTORM)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [27] Locality-aware Gaussian Compression for Fast and High-quality Rendering\n- **🧑‍🔬 作者**：Seungjoo Shin, Jaesik Park, Sunghyun Cho\n- **🏫 单位**：POSTECH ⟐ Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2501.05757.md)] [[arXiv:2501.05757](https://arxiv.org/abs/2501.05757)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [28] Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video\n- **🧑‍🔬 作者**：Xiaohao Xu, Tianyi Zhang, Shibo Zhao, Xiang Li, Sibo Wang, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Sebastian Scherer, Xiaonan Huang\n- **🏫 单位**：University of Michigan ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2501.14319.md)] [[arXiv:2501.14319](https://arxiv.org/abs/2501.14319)] [[Code](https://github.com/Xiaohao-Xu/SLAM-under-Perturbation)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [29] DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation\n- **🧑‍🔬 作者**：Chenguo Lin, Panwang Pan, Bangbang Yang, Zeming Li, Yadong Mu\n- **🏫 单位**：Peking University ⟐ ByteDance\n- **🔗 链接**：[[中英摘要](../abs/2501.16764.md)] [[arXiv:2501.16764](https://arxiv.org/abs/2501.16764)] [[Code](https://github.com/chenguolin/DiffSplat)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [30] OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation\n- **🧑‍🔬 作者**：Yuchen Lin, Chenguo Lin, Jianjin Xu, Yadong Mu\n- **🏫 单位**：Peking University ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2501.18982.md)] [[arXiv:2501.18982](https://arxiv.org/abs/2501.18982)] [[Code](https://github.com/wgsxm/OmniPhysGS)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [31] SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting\n- **🧑‍🔬 作者**：Huajian Huang, Yingshu Chen, Longwei Li, Hui Cheng, Tristan Braud, Yajie Zhao, Sai-Kit Yeung\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ Sun Yat-sen University ⟐ Institute for Creative Technologies, University of Southern California\n- **🔗 链接**：[[中英摘要](../abs/2502.04734.md)] [[arXiv:2502.04734](https://arxiv.org/abs/2502.04734)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [32] Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors\n- **🧑‍🔬 作者**：Lin-Zhuo Chen, Kangjie Liu, Youtian Lin, Siyu Zhu, Zhihao Li, Xun Cao, Yao Yao\n- **🏫 
单位**：Nanjing University ⟐ Fudan University ⟐ Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](../abs/2502.07615.md)] [[arXiv:2502.07615](https://arxiv.org/abs/2502.07615)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [33] OMG: Opacity Matters in Material Modeling with Gaussian Splatting\n- **🧑‍🔬 作者**：Silong Yong, Venkata Nagarjun Pudureddiyur Manivannan, Bernhard Kerbl, Zifu Wan, Simon Stepputtis, Katia Sycara, Yaqi Xie\n- **🏫 单位**：Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2502.10988.md)] [[arXiv:2502.10988](https://arxiv.org/abs/2502.10988)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [34] High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation\n- **🧑‍🔬 作者**：Ziye Wang, Yiran Qin, Lin Zeng, Ruimao Zhang\n- **🏫 单位**：Sun Yat-sen University ⟐ The Chinese University of Hong Kong, Shenzhen ⟐ Guangzhou Meteorological Observatory\n- **🔗 链接**：[[中英摘要](../abs/2502.14895.md)] [[arXiv:2502.14895](https://arxiv.org/abs/2502.14895)] [[Code](https://github.com/Ziyeeee/STC-GS)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [35] Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Chong Cheng, Gaochao Song, Yiyang Yao, Qinzheng Zhou, Gangjian Zhang, Hao Wang\n- **🏫 单位**：HKUST(GZ) ⟐ HKU ⟐ SCUT ⟐ UC Berkeley\n- **🔗 链接**：[[中英摘要](../abs/2502.17377.md)] [[arXiv:2502.17377](https://arxiv.org/abs/2502.17377)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [36] UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting\n- **🧑‍🔬 作者**：Haoyuan Li, Yanpeng Zhou, Tao Tang, Jifei Song, Yihan Zeng, Michael Kampffmeyer, Hang Xu, Xiaodan Liang\n- **🏫 单位**：Shenzhen Campus of Sun Yat-sen University ⟐ Huawei Noah’s Ark Lab ⟐ UiT The Arctic University of Norway ⟐ Peng Cheng Laboratory ⟐ Guangdong Key Laboratory of Big Data Analysis and Processing\n- **🔗 链接**：[[中英摘要](../abs/2502.17860.md)] [[arXiv:2502.17860](https://arxiv.org/abs/2502.17860)] [[Code](https://github.com/Li-Hao-yuan/UniGS)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [37] Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting\n- **🧑‍🔬 作者**：Yu Liu, Baoxiong Jia, Ruijie Lu, Junfeng Ni, Song-Chun Zhu, Siyuan Huang\n- **🏫 单位**：Tsinghua University ⟐ State Key Laboratory of General Artificial Intelligence, BIGAI ⟐ Peking University\n- **🔗 链接**：[[中英摘要](../abs/2502.19459.md)] [[arXiv:2502.19459](https://arxiv.org/abs/2502.19459)] [[Code](https://github.com/YuLiu-LY/ArtGS)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [38] CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression\n- **🧑‍🔬 作者**：Yu-Ting Zhan, Cheng-Yuan Ho, Hebi Yang, Yi-Hsin Chen, Jui Chiu Chiang, Yu-Lun Liu, Wen-Hsiao Peng\n- **🏫 单位**：National Yang Ming Chiao Tung University, Taiwan\n- **🔗 链接**：[[中英摘要](../abs/2503.00357.md)] [[arXiv:2503.00357](https://arxiv.org/abs/2503.00357)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [39] SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography\n- **🧑‍🔬 作者**：Xuanyu Zhang, Jiarui Meng, Zhipei Xu, Shuzhou Yang, Yanmin Wu, Ronggang Wang, Jian Zhang\n- **🏫 单位**：Peking University ⟐ Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology\n- **🔗 链接**：[[中英摘要](../abs/2503.06118.md)] [[arXiv:2503.06118](https://arxiv.org/abs/2503.06118)] [Code]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [40] Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene\n- **🧑‍🔬 
作者**：Jiahao Wu, Rui Peng, Zhiyan Wang, Lu Xiao, Luyang Tang, Jinbo Yan, Kaiqiang Xiong, Ronggang Wang\n- **🏫 单位**: Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology ⟐ Shenzhen Graduate School, Peking University\n- **🔗 链接**：[[中英摘要](../abs/2503.12307.md)] [[arXiv:2503.12307](https://arxiv.org/abs/2503.12307)] [[Code](https://github.com/WuJH2001/swift4d)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [41] 3D-Spatial MultiModal Memory\n- **🧑‍🔬 作者**：Xueyan Zou, Yuchen Song, Ri-Zhao Qiu, Xuanbin Peng, Jianglong Ye, Sifei Liu, Xiaolong Wang\n- **🏫 单位**: UC San Diego ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2503.16413.md)] [[arXiv:2503.16413](https://arxiv.org/abs/2503.16413)] [[Code](https://github.com/MaureenZOU/m3-spatial)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n\n#### [42] Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images\n- **🧑‍🔬 作者**：In-Hwan Jin, Haesoo Choo, Seong-Hun Jeong, Heemoon Park, Junghwan Kim, Oh-joon Kwon, Kyeongbo Kong\n- **🏫 单位**：Pusan National University ⟐ Busan Munhwa Broadcasting Corporation ⟐ Korea University ⟐ DM Studio\n- **🔗 链接**：[[中英摘要](../abs/2504.05458.md)] [[arXiv:2504.05458](https://arxiv.org/abs/2504.05458)] [[Code](https://github.com/cvsp-lab/ICLR2025_3D-MOM)]\n- **📝 说明**：🏆 Accepted to ICLR 2025\n"
  },
  {
    "path": "2025/ICME.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICME2025\n\n#### [1] Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussian Spatting\n- **🧑‍🔬 作者**：Zhiying Yan, Yiyuan Liang, Shilv Cai, Tao Zhang, Sheng Zhong, Luxin Yan, Xu Zou\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ National Key Laboratory of Multispectral Information Intelligent Processing Technology, China ⟐ Nanyang Technological University, Singapore\n- **🔗 链接**：[[中英摘要](../abs/2503.19332.md)] [[arXiv:2503.19332](https://arxiv.org/abs/2503.19332)] [Code]\n- **📝 说明**：🏆 Accepted to ICME 2025\n\n#### [2] TC-GS: Tri-plane based compression for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Taorui Wang, Zitong Yu, Yong Xu\n- **🏫 单位**：Harbin Institute of Technology Shenzhen ⟐ Great Bay University ⟐ Dongguan Key Laboratory for Intelligence and Information Technology\n- **🔗 链接**：[[中英摘要](../abs/2503.20221.md)] [[arXiv:2503.20221](https://arxiv.org/abs/2503.20221)] [[Code](https://github.com/timwang2001/TC-GS)]\n- **📝 说明**：🏆 Accepted to ICME 2025\n\n#### [3] ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Wenjie Liu, Zhongliang Liu, Xiaoyan Yang, Man Sha, Yang Li\n- **🏫 单位**：School of Computer Science and Technology, East China Normal University, Shanghai, China ⟐ School of Software Engineering, East China Normal University, Shanghai, China ⟐ Shanghai Chinafortune Co Ltd, Shanghai, China\n- **🔗 链接**：[[中英摘要](../abs/2503.22218.md)] [[arXiv:2503.22218](https://arxiv.org/abs/2503.22218)] [[Code](https://github.com/vpx-ecnu/ABC-GS)]\n- **📝 说明**：🏆 Accepted to ICME 2025\n\n#### [4] Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction\n- **🧑‍🔬 作者**：Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue\n- **🏫 单位**：Fudan University ⟐ Nanyang Technological University ⟐ Shanghai Innovation Institute ⟐ NeuHelium Co., Ltd\n- **🔗 链接**：[[中英摘要](../abs/2503.23337.md)] [[arXiv:2503.23337](https://arxiv.org/abs/2503.23337)] [Code]\n- **📝 说明**：🏆 Accepted to ICME 2025\n\n#### [5] ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image\n- **🧑‍🔬 作者**：Tianyi Gong, Boyan Li, Yifei Zhong, Fangxin Wang\n- **🏫 单位**：School of Science and Engineering, The Chinese University of Hongkong, Shenzhen\n- **🔗 链接**：[[中英摘要](../abs/2503.23881.md)] [[arXiv:2503.23881](https://arxiv.org/abs/2503.23881)] [Code]\n- **📝 说明**：🏆 Accepted to ICME 2025\n\n#### [6] 3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hao Wu, Hao Wang, Ruochong Li, Xuran Ma, Hui Xiong\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou), China\n- **🔗 链接**：[[中英摘要](../abs/2504.01619.md)] [[arXiv:2504.01619](https://arxiv.org/abs/2504.01619)] [Code]\n- **📝 说明**：🏆 Accepted to ICME 2025\n\n#### [7] ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting\n- **🧑‍🔬 作者**：Junbang Liu, Enpei Huang, Dongxing Mao, Hui Zhang, Xinyuan Song, Yongxin Ni\n- **🏫 单位**：Beijing Normal-Hong Kong Baptist University ⟐ National University of Singapore ⟐ Emory University\n- **🔗 链接**：[[中英摘要](./abs/2504.08100.md)] [[arXiv:2504.08100](https://arxiv.org/abs/2504.08100)] [Code]\n- **📝 说明**: 🏆 Accepted to ICME 2025\n\n#### [8] SMPL Normal Map Is All You Need for Single-view Textured Human Reconstruction\n- **🧑‍🔬 作者**：Wenhao Shen, Gangjian Zhang, Jianfeng Zhang, Yu Feng, Nanjie Yao, Xuanmeng Zhang, Hao Wang\n- **🏫 单位**：Nanyang Technological University ⟐ The 
Hong Kong University of Science and Technology (Guangzhou) ⟐ National University of Singapore ⟐ Zhejiang University of Technology ⟐ University of Technology Sydney\n- **🔗 链接**：[[中英摘要](./abs/2506.12793.md)] [[arXiv:2506.12793](https://arxiv.org/abs/2506.12793)] [Code]\n- **📝 说明**: 🏆 Accepted to ICME 2025\n"
  },
  {
    "path": "2025/ICML.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICML2025\n\n#### [1] PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jisang Han, Jiaolong Yang, Chong Luo, Seungryong Kim\n- **🏫 单位**：Korea University ⟐ KAIST ⟐ Microsoft Research Asia\n- **🔗 链接**：[[中英摘要](./abs/2410.22128.md)] [[arXiv:2410.22128](https://arxiv.org/abs/2410.22128)] [[Code](https://github.com/cvlab-kaist/PF3plat)]\n- **📝 说明**: 🏆 Accepted to ICML 2025\n\n#### [2] PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering\n- **🧑‍🔬 作者**：Junxi Jin, Xiulai Li, Haiping Huang, Lianjun Liu, Yujie Sun\n- **🏫 单位**：Hainan University, Haikou, China\n- **🔗 链接**：[[中英摘要](./abs/2411.05731.md)] [[arXiv:2411.05731](https://arxiv.org/abs/2411.05731)] [[Code](https://github.com/cvlab-kaist/PF3plat)]\n- **📝 说明**：🏆 Accepted to ICML 2025\n\n#### [3] HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder\n- **🧑‍🔬 作者**：Qi Yang, Le Yang, Geert Van Der Auwera, Zhu Li\n- **🏫 单位**：School of Science and Engineering, University of MissouriKansas City, Kansas, US ⟐ Electrical and Computer Engineering, University of Canterbury, Christchurch, New Zealand ⟐ Qualcomm, California, US\n- **🔗 链接**：[[中英摘要](../abs/2505.01938.md)] [[arXiv:2505.01938](https://arxiv.org/abs/2505.01938)] [[Code](https://github.com/Qi-Yangsjtu/HybridGS)]\n- **📝 说明**: 🏆 Accepted to ICML 2025\n\n#### [4] Tackling View-Dependent Semantics in 3D Language Gaussian Splatting\n- **🧑‍🔬 作者**：Jiazhong Cen, Xudong Zhou, Jiemin Fang, Changsong Wen, Lingxi Xie, Xiaopeng Zhang, Wei Shen, Qi Tian\n- **🏫 单位**：Huawei ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2505.24746.md)] [[arXiv:2505.24746](https://arxiv.org/abs/2505.24746)] [[Code](https://github.com/SJTU-DeepVisionLab/LaGa)]\n- **📝 说明**: 🏆 Accepted to ICML 2025\n\n#### [5] VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians\n- **🧑‍🔬 作者**：Pengchong Hu, Zhizhong Han\n- **🏫 单位**：Wayne State University\n- **🔗 链接**：[[中英摘要](./abs/2506.02741.md)] [[arXiv:2506.02741](https://arxiv.org/abs/2506.02741)] [[Code](https://github.com/MachinePerceptionLab/VTGaussian-SLAM)]\n- **📝 说明**: 🏆 Accepted to ICML 2025\n\n#### [6] Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting\n- **🧑‍🔬 作者**：Hongbi Zhou, Zhangkai Ni\n- **🏫 单位**：Tongji University\n- **🔗 链接**：[[中英摘要](./abs/2506.12400.md)] [[arXiv:2506.12400](https://arxiv.org/abs/2506.12400)] [[Code](https://github.com/eezkni/Perceptual-GS)]\n- **📝 说明**: 🏆 Accepted to ICML 2025\n\n#### [7] ReferSplat: Referring Segmentation in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Shuting He, Guangquan Jie, Changshuo Wang, Yun Zhou, Shuming Hu, Guanbin Li, Henghui Ding\n- **🏫 单位**：Shanghai University of Finance and Economics ⟐ Fudan University ⟐ Nanyang Technological University ⟐ Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2508.08252.md)] [[arXiv:2508.08252](https://arxiv.org/abs/2508.08252)] [[Code](https://github.com/heshuting555/ReferSplat)]\n- **📝 说明**: 🏆 Accepted to ICML 2025 Oral\n"
  },
  {
    "path": "2025/ICRA.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICRA2025\n\n#### [1] Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaolei Lang, Laijian Li, Hang Zhang, Feng Xiong, Mu Xu, Yong Liu, Xingxing Zuo, Jiajun Lv\n- **🏫 单位**：Zhejiang University ⟐  AMAP ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2404.06926.md)] [[arXiv:2404.06926](https://arxiv.org/abs/2404.06926)] [[Code](https://github.com/APRIL-ZJU/Gaussian-LIC)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [2] AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction\n- **🧑‍🔬 作者**：Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, Bingbing Liu\n- **🏫 单位**：University of Toronto ⟐ Noah’s Ark Lab, Huawei Technologies\n- **🔗 链接**：[[中英摘要](../abs/2407.02598.md)] [[arXiv:2407.02598](https://arxiv.org/abs/2407.02598)] [Code]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [3] SAFER-Splat: A Control Barrier Function for Safe Navigation with Online Gaussian Splatting Maps\n- **🧑‍🔬 作者**：Timothy Chen, Aiden Swann, Javier Yu, Ola Shorinwa, Riku Murai, Monroe Kennedy III, Mac Schwager\n- **🏫 单位**：Stanford University, Stanford, CA, USA ⟐ Imperial College London, London, UK\n- **🔗 链接**：[[中英摘要](../abs/2409.09868.md)] [[arXiv:2409.09868](https://arxiv.org/abs/2409.09868)] [[Code](https://chengine.github.io/safer-splat/)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [4] RenderWorld: World Model with Self-Supervised 3D Label\n- **🧑‍🔬 作者**：Ziyang Yan, Wenzhen Dong, Yihua Shao, Yuhang Lu, Liu Haiyang, Jingwen Liu, Haozhe Wang, Zhe Wang, Yan Wang, Fabio Remondino, Yuexin Ma\n- **🏫 单位**：ShanghaiTech University ⟐ Fondazione Bruno Kessler ⟐ University of Trento ⟐ Tsinghua University ⟐ The University of Science and Technology Beijing ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](../abs/2409.11356.md)] [[arXiv:2409.11356](https://arxiv.org/abs/2409.11356)] [Code]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [5] Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting\n- **🧑‍🔬 作者**：Boying Li, Zhixi Cai, Yuan-Fang Li, Ian Reid, Hamid Rezatofighi\n- **🏫 单位**：Faculty of Information Technology, Monash University, Australia ⟐ Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates\n- **🔗 链接**：[[中英摘要](../abs/2409.12518.md)] [[arXiv:2409.12518](https://arxiv.org/abs/2409.12518)] [[Code](https://github.com/LeeBY68/Hier-SLAM)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [6] MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yan Song Hu, Nicolas Abboud, Muhammad Qasim Ali, Adam Srebrnjak Yang, Imad Elhajj, Daniel Asmar, Yuhao Chen, John S. Zelek\n- **🏫 单位**：University of Waterloo ⟐ University of Beirut\n- **🔗 链接**：[[中英摘要](./abs/2409.13055.md)] [[arXiv:2409.13055](https://arxiv.org/abs/2409.13055)] [Code]\n- **📝 说明**: 🏆 Accepted to ICRA 2025\n\n#### [7] SeaSplat: Representing Underwater Scenes with 3D Gaussian Splatting and a Physically Grounded Image Formation Model\n- **🧑‍🔬 作者**：Daniel Yang, John J. 
Leonard, Yogesh Girdhar\n- **🏫 单位**：MIT CSAIL ⟐ Woods Hole Oceanographic Institution\n- **🔗 链接**：[[中英摘要](./abs/2409.17345.md)] [[arXiv:2409.17345](https://arxiv.org/abs/2409.17345)] [[Code](https://github.com/dxyang/seasplat)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [8] RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning\n- **🧑‍🔬 作者**：Yuxuan Wu, Lei Pan, Wenhua Wu, Guangming Wang, Yanzi Miao, Hesheng Wang\n- **🏫 单位**：Department of Automation, Shanghai Jiao Tong University ⟐ School of Information and Control Engineering, China University of Mining and Technology ⟐ MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University ⟐ Department of Engineering, University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2409.20291.md)] [[arXiv:2409.20291](https://arxiv.org/abs/2409.20291)] [[Code](https://github.com/FurryGreen/RL-GS-Bridge)]\n- **📝 说明**: 🏆 Accepted to ICRA 2025\n\n#### [9] Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Matthew Strong, Boshu Lei, Aiden Swann, Wen Jiang, Kostas Daniilidis, Monroe Kennedy III\n- **🏫 单位**：Stanford University ⟐ University of Pennsylvania\n- **🔗 链接**：[[中英摘要](../abs/2410.04680.md)] [[arXiv:2410.04680](https://arxiv.org/abs/2410.04680)] [[Code](https://github.com/armlabstanford/NextBestSense)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [10] DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes\n- **🧑‍🔬 作者**：Hao Li, Yuanyuan Gao, Haosong Peng, Chenming Wu, Weicai Ye, Yufeng Zhan, Chen Zhao, Dingwen Zhang, Jingdong Wang, Junwei Han\n- **🏫 单位**：NWPU ⟐ Baidu VIS ⟐ Beijing Institute of Technology ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2411.12309.md)] [[arXiv:2411.12309](https://arxiv.org/abs/2411.12309)] [Code]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [11] WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian Splatting\n- **🧑‍🔬 作者**：Chenghao Qian, Yuhu Guo, Wenjing Li, Gustav Markkula\n- **🏫 单位**：University of Leeds ⟐ Carnegie Mellon University ⟐ 42dot ⟐  School of Computing, KAIST\n- **🔗 链接**：[[中英摘要](./abs/2412.18862.md)] [[arXiv:2412.18862](https://arxiv.org/abs/2412.18862)] [[Code](https://github.com/Jumponthemoon/WeatherGS)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [12] Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping\n- **🧑‍🔬 作者**：Yiming Huang, Beilei Cui, Long Bai, Zhen Chen, Jinlin Wu, Zhen Li, Hongbin Liu, Hongliang Ren\n- **🏫 单位**：CUHK\n- **🔗 链接**：[[中英摘要](../abs/2501.19319.md)] [[arXiv:2501.19319](https://arxiv.org/abs/2501.19319)] [[Code](https://github.com/lastbasket/Endo-2DTAM)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [13] GARAD-SLAM: 3D GAussian splatting for Real-time Anti Dynamic SLAM\n- **🧑‍🔬 作者**：Mingrui Li, Weijian Chen, Na Cheng, Jingyuan Xu, Dong Li, Hongyu Wang\n- **🏫 单位**：Dalian University of Technology ⟐ Sun Yat-sen University ⟐ University of Macau\n- **🔗 链接**：[[中英摘要](../abs/2502.03228.md)] [[arXiv:2502.03228](https://arxiv.org/abs/2502.03228)] [Code]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [14] PUGS: Zero-shot Physical Understanding with Gaussian Splatting\n- **🧑‍🔬 作者**：Yinghao Shuai, Ran Yu, Yuantao Chen, Zijian Jiang, Xiaowei Song, Nan Wang, Jv Zheng, Jianzhu Ma, Meng Yang, Zhicheng Wang, Wenbo Ding, Hao Zhao\n- **🏫 单位**: CAD Research Center, Tongji University ⟐ Shenzhen Ubiquitous Data Enabling Key Lab, Shenzhen International Graduate School, Tsinghua University ⟐ The Chinese University of 
Hong Kong, Shenzhen ⟐ MGI Tech ⟐ Chulalongkorn University ⟐ Institute for AI Industry Research (AIR), Tsinghua University ⟐ Beijing Academy of Artificial Intelligence(BAAI) ⟐ Lightwheel AI\n- **🔗 链接**：[[中英摘要](../abs/2502.12231.md)] [[arXiv:2502.12231](https://arxiv.org/abs/2502.12231)] [[Code](https://github.com/EverNorif/PUGS)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [15] RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes\n- **🧑‍🔬 作者**：Sicheng Yu, Chong Cheng, Yifan Zhou, Xiaojun Yang, Hao Wang\n- **🏫 单位**: The Hong Kong University of Science and Technology (GuangZhou)\n- **🔗 链接**：[[中英摘要](../abs/2502.15633.md)] [[arXiv:2502.15633](https://arxiv.org/abs/2502.15633)] [[Code](https://github.com/3DAgentWorld/OpenGS-SLAM)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [16] OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding\n- **🧑‍🔬 作者**：Dianyi Yang, Yu Gao, Xihan Wang, Yufeng Yue, Yi Yang, Mengyin Fu\n- **🏫 单位**: Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](../abs/2503.01646.md)] [[arXiv:2503.01646](https://arxiv.org/abs/2503.01646)] [[Code](https://young-bit.github.io/opengs-github.github.io/)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [17] Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects\n- **🧑‍🔬 作者**：Justin Yu, Kush Hari, Karim El-Refai, Arnav Dalal, Justin Kerr, Chung Min Kim, Richard Cheng, Muhammad Zubair Irshad, Ken Goldberg\n- **🏫 单位**: University of California, Berkeley ⟐ Toyota Research Institute\n- **🔗 链接**：[[中英摘要](../abs/2503.05189.md)] [[arXiv:2503.05189](https://arxiv.org/abs/2503.05189)] [[Code](https://github.com/uynitsuj/pogs)]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [18] LiDAR-enhanced 3D Gaussian Splatting Mapping\n- **🧑‍🔬 作者**：Jian Shen, Huai Yu, Ji Wu, Wen Yang, Gui-Song Xia\n- **🏫 单位**: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China\n- **🔗 链接**：[[中英摘要](../abs/2503.05425.md)] [[arXiv:2503.05425](https://arxiv.org/abs/2503.05425)] [Code]\n- **📝 说明**：🏆 Accepted to ICRA 2025\n\n#### [19] DLO-Splatting: Tracking Deformable Linear Objects Using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Holly Dinkel, Marcel Büsching, Alberta Longhini, Brian Coltin, Trey Smith, Danica Kragic, Mårten Björkman, Timothy Bretl\n- **🏫 单位**：University of Illinois Urbana-Champaign ⟐ KTH Royal Institute of Technology ⟐ NASA Ames Research Center\n- **🔗 链接**：[[中英摘要](./abs/2505.08644.md)] [[arXiv:2505.08644](https://arxiv.org/abs/2505.08644)] [Code]\n- **📝 说明**: 🏆 Accepted to ICRA 2025 RMDO workshop\n\n#### [20] Large-Scale Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Zhe Xin, Chenyang Wu, Penghui Huang, Yanyong Zhang, Yinian Mao, Guoquan Huang\n- **🏫 单位**：Meituan ⟐ University of Science and Technology of China ⟐ University of Delaware\n- **🔗 链接**：[[中英摘要](./abs/2505.09915.md)] [[arXiv:2505.09915](https://arxiv.org/abs/2505.09915)] [[Code](https://github.com/lsg-slam/LSG-SLAM)]\n- **📝 说明**: 🏆 Accepted to ICRA 2025\n\n#### [21] Gaussian-LIC2: LiDAR-Inertial-Camera Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Xiaolei Lang, Jiajun Lv, Kai Tang, Laijian Li, Jianxin Huang, Lina Liu, Yong Liu, Xingxing Zuo\n- **🏫 单位**：Zhejiang University ⟐ MBZUAI\n- **🔗 链接**：[[中英摘要](./abs/2507.04004.md)] [[arXiv:2507.04004](https://arxiv.org/abs/2507.04004)] [[Code](https://github.com/APRIL-ZJU/Gaussian-LIC)]\n- **📝 说明**: 🏆 Accepted to ICRA 2025\n\n#### [22] FGO-SLAM: Enhancing Gaussian SLAM with Globally Consistent Opacity Radiance Field\n- **🧑‍🔬 作者**：Fan Zhu, 
Yifan Zhao, Ziyu Chen, Biao Yu, Hui Zhu\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2509.01547.md)] [[arXiv:2509.01547](https://arxiv.org/abs/2509.01547)] [Code]\n- **📝 说明**: 🏆 Accepted to ICRA 2025\n\n#### [23] A Shared-Autonomy Construction Robotic System for Overhead Works\n- **🧑‍🔬 作者**：David Minkwan Kim, K. M. Brian Lee, Yong Hyeok Seo, Nikola Raicevic, Runfa Blark Li, Kehan Long, Chan Seon Yoon, Dong Min Kang, Byeong Jo Lim, Young Pyoung Kim, Nikolay Atanasov, Truong Nguyen, Se Woong Jun, Young Wook Kim\n- **🏫 单位**：Korea Electronics Technology Institute (KETI) ⟐ University of California ⟐ ITONE Inc.\n- **🔗 链接**：[[中英摘要](./abs/2511.09695.md)] [[arXiv:2511.09695](https://arxiv.org/abs/2511.09695)] [Code]\n- **📝 说明**: 🏆 Accepted to ICRA 2025 Construction Workshop\n"
  },
  {
    "path": "2025/IROS.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to IROS2025\n\n#### [1] SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Siting Zhu, Renjie Qin, Guangming Wang, Jiuming Liu, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2403.07494.md)] [[arXiv:2403.07494](https://arxiv.org/abs/2403.07494)] [Code]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [2] GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics\n- **🧑‍🔬 作者**：Qianzhong Chen, Jiankai Sun, Naixiang Gao, JunEn Low, Timothy Chen, Mac Schwager\n- **🏫 单位**：Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2503.03984.md)] [[arXiv:2503.03984](https://arxiv.org/abs/2503.03984)] [[Code](https://github.com/Qianzhong-Chen/grad_nav)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [3] GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction\n- **🧑‍🔬 作者**：Jianheng Liu, Yunfei Wan, Bowen Wang, Chunran Zheng, Jiarong Lin, Fu Zhang\n- **🏫 单位**：The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2503.10170.md)] [[arXiv:2503.10170](https://arxiv.org/abs/2503.10170)] [[Code](https://github.com/hku-mars/GS-SDF)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [4] Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Zhicong Sun, Jacqueline Lo, Jinxing Hu\n- **🏫 单位**：Hong Kong Polytechnic University, Hong Kong SAR, China ⟐ Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China\n- **🔗 链接**：[[中英摘要](./abs/2504.04844.md)] [[arXiv:2504.04844](https://arxiv.org/abs/2504.04844)] [Code]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [5] FOCI: Trajectory Optimization on Gaussian Splats\n- **🧑‍🔬 作者**：Mario Gomez Andreu, Maximum Wilder-Smith, Victor Klemm, Vaishakh Patil, Jesus Tordesillas, Marco Hutter\n- **🏫 单位**：ETH Zurich ⟐ Comillas Pontifical University\n- **🔗 链接**：[[中英摘要](./abs/2505.08510.md)] [[arXiv:2505.08510](https://arxiv.org/abs/2505.08510)] [[Code](https://github.com/leggedrobotics/foci)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [6] 3D Gaussian Splatting for Fine-Detailed Surface Reconstruction in Large-Scale Scene\n- **🧑‍🔬 作者**：Shihan Chen, Zhaojin Li, Zeyu Chen, Qingsong Yan, Gaoyang Shen, Ran Duan\n- **🏫 单位**：The Hong Kong Polytechnic University ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2506.17636.md)] [[arXiv:2506.17636](https://arxiv.org/abs/2506.17636)] [Code]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [7] ArtGS:3D Gaussian Splatting for Interactive Visual-Physical Modeling and Manipulation of Articulated Objects\n- **🧑‍🔬 作者**：Qiaojun Yu, Xibin Yuan, Yu jiang, Junting Chen, Dongzhe Zheng, Ce Hao, Yang You, Yixing Chen, Yao Mu, Liu Liu, Cewu Lu\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Shanghai AI Laboratory ⟐ National University of Singapore ⟐ Princeton University ⟐ Stanford University ⟐ Hefei University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2507.02600.md)] [[arXiv:2507.02600](https://arxiv.org/abs/2507.02600)] [Code]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [8] SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation\n- **🧑‍🔬 作者**：Beining Xu, Siting Zhu, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2507.12027.md)] [[arXiv:2507.12027](https://arxiv.org/abs/2507.12027)] [[Code](https://github.com/IRMVLab/SGLoc)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [9] CRUISE: Cooperative Reconstruction and Editing 
in V2X Scenarios using Gaussian Splatting\n- **🧑‍🔬 作者**：Haoran Xu, Saining Zhang, Peishuo Li, Baijun Ye, Xiaoxue Chen, Huan-ang Gao, Jv Zheng, Xiaowei Song, Ziqiao Peng, Run Miao, Jinrang Jia, Yifeng Shi, Guangqi Yi, Hang Zhao, Hao Tang, Hongyang Li, Kaicheng Yu, Hao Zhao\n- **🏫 单位**：Tsinghua University ⟐ Beijing Institute of Technology ⟐ Nanyang Technological University ⟐ Tsinghua University ⟐ Renmin University of China ⟐ Beijing University of Technology ⟐ Baidu Inc. ⟐ Peking University ⟐ Shanghai AI Lab ⟐ Westlake University ⟐ Beijing Academy of Artificial Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2507.18473.md)] [[arXiv:2507.18473](https://arxiv.org/abs/2507.18473)] [[Code](https://github.com/SainingZhang/CRUISE)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [10] Automated 3D-GS Registration and Fusion via Skeleton Alignment and Gaussian-Adaptive Features\n- **🧑‍🔬 作者**：Shiyang Liu, Dianyi Yang, Yu Gao, Bohan Ren, Yi Yang, Mengyin Fu\n- **🏫 单位**：Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2507.20480.md)] [[arXiv:2507.20480](https://arxiv.org/abs/2507.20480)] [Code]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [11] OpenGS-Fusion: Open-Vocabulary Dense Mapping with Hybrid 3D Gaussian Splatting for Refined Object-Level Understanding\n- **🧑‍🔬 作者**：Dianyi Yang, Xihan Wang, Yu Gao, Shiyang Liu, Bohan Ren, Yufeng Yue, Yi Yang\n- **🏫 单位**：Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.01150.md)] [[arXiv:2508.01150](https://arxiv.org/abs/2508.01150)] [[Code](https://github.com/YOUNG-bit/OpenGS-Fusion)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [12] Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline\n- **🧑‍🔬 作者**：Linqing Zhao, Xiuwei Xu, Yirui Wang, Hao Wang, Wenzhao Zheng, Yansong Tang, Haibin Yan, Jiwen Lu\n- **🏫 单位**：Tsinghua University ⟐ Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2508.04597.md)] [[arXiv:2508.04597](https://arxiv.org/abs/2508.04597)] [[Code](https://github.com/wangyr22/DepthGS)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [13] Multi-view Normal and Distance Guidance Gaussian Splatting for Surface Reconstruction\n- **🧑‍🔬 作者**：Bo Jia, Yanan Guo, Ying Chang, Benkui Zhang, Ying Xie, Kangning Du, Lin Cao\n- **🏫 单位**：Beijing Information Science and Technology University ⟐ CAS\n- **🔗 链接**：[[中英摘要](./abs/2508.07701.md)] [[arXiv:2508.07701](https://arxiv.org/abs/2508.07701)] [[Code](https://github.com/Bistu3DV/MND-GS)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [14] Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats\n- **🧑‍🔬 作者**：Simeon Adebola, Chung Min Kim, Justin Kerr, Shuangyu Xie, Prithvi Akella, Jose Luis Susa Rincon, Eugen Solowjow, Ken Goldberg\n- **🏫 单位**：UC Berkeley ⟐ Siemens Research Lab\n- **🔗 链接**：[[中英摘要](./abs/2510.17783.md)] [[arXiv:2510.17783](https://arxiv.org/abs/2510.17783)] [Code]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [15] GRASPLAT: Enabling dexterous grasping through novel view synthesis\n- **🧑‍🔬 作者**：Matteo Bortolon, Nuno Ferreira Duarte, Plinio Moreno, Fabio Poiesi, José Santos-Victor, Alessio Del Bue\n- **🏫 单位**：Fondazione Bruno Kessler ⟐ Fondazione Istituto Italiano di Tecnologia ⟐ Universidade de Lisboa ⟐ University of Trento\n- **🔗 链接**：[[中英摘要](./abs/2510.19200.md)] [[arXiv:2510.19200](https://arxiv.org/abs/2510.19200)] [Code]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [16] STG-Avatar: Animatable Human Avatars via Spacetime Gaussian\n- **🧑‍🔬 作者**：Guangan Jiang, Tianzi Zhang, Dong Li, Zhenjun Zhao, Haoang Li, Mingrui Li, Hongyu 
Wang\n- **🏫 单位**：Dalian University of Technology ⟐ Fudan University ⟐ University of Macau ⟐ University of Zaragoza ⟐ Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2510.22140.md)] [[arXiv:2510.22140](https://arxiv.org/abs/2510.22140)] [[Code](https://github.com/jiangguangan/STG-Avatar)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [17] Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes\n- **🧑‍🔬 作者**：Meijun Guo, Yongliang Shi, Caiyun Liu, Yixiao Feng, Ming Ma, Tinghai Yan, Weining Lu, Bin Liang\n- **🏫 单位**：Beijing Institute of Technology ⟐ Beijing National Research Center for Information Science and Technology ⟐ Qiyuan Lab ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2511.06765.md)] [[arXiv:2511.06765](https://arxiv.org/abs/2511.06765)] [Code]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n\n#### [18] iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion\n- **🧑‍🔬 作者**：Hao Wang, Linqing Zhao, Xiuwei Xu, Jiwen Lu, Haibin Yan\n- **🏫 单位**：Beijing University of Posts and Telecommunications ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2511.14149.md)] [[arXiv:2511.14149](https://arxiv.org/abs/2511.14149)] [[Code](https://github.com/pythongod-exe/iGaussian)]\n- **📝 说明**: 🏆 Accepted to IROS 2025\n"
  },
  {
    "path": "2025/MICCAI.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to MICCAI2025\n\n#### [1] EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting\n- **🧑‍🔬 作者**：Taoyu Wu, Yiyi Miao, Zhuoxiao Li, Haocheng Zhao, Kang Dang, Jionglong Su, Limin Yu, Haoang Li\n- **🏫 单位**：Xi’an Jiaotong Liverpool University ⟐ University of Liverpool ⟐ The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2506.21420.md)] [[arXiv:2506.21420](https://arxiv.org/abs/2506.21420)] [Code]\n- **📝 说明**: 🏆 Accepted to MICCAI 2025\n\n#### [2] DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model\n- **🧑‍🔬 作者**：Yuliang Huang, Imraj Singh, Thomas Joyce, Kris Thielemans, Jamie R. McClelland\n- **🏫 单位**：University College London\n- **🔗 链接**：[[中英摘要](./abs/2506.22280.md)] [[arXiv:2506.22280](https://arxiv.org/abs/2506.22280)] [[Code](https://github.com/Yuliang-Huang/DIGS)]\n- **📝 说明**: 🏆 Accepted to MICCAI 2025\n\n#### [3] Endo-4DGX: Robust Endoscopic Scene Reconstruction and Illumination Correction with Gaussian Splatting\n- **🧑‍🔬 作者**：Yiming Huang, Long Bai, Beilei Cui, Yanheng Li, Tong Chen, Jie Wang, Jinlin Wu, Zhen Lei, Hongbin Liu, Hongliang Ren\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ City University of Hong Kong ⟐ The University of Sydney ⟐ Chinese Academy of Sciences ⟐ Shenzhen Research Institute, CUHK\n- **🔗 链接**：[[中英摘要](./abs/2506.23308.md)] [[arXiv:2506.23308](https://arxiv.org/abs/2506.23308)] [[Code](https://github.com/lastbasket/Endo-4DGX)]\n- **📝 说明**: 🏆 Accepted to MICCAI 2025\n\n#### [4] SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting\n- **🧑‍🔬 作者**：Yiming Huang, Long Bai, Beilei Cui, Kun Yuan, Guankun Wang, Mobarak I. Hoque, Nicolas Padoy, Nassir Navab, Hongliang Ren\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Shenzhen Research Institute, CUHK ⟐ Technical University of Munich ⟐ University of Strasbourg & IHU Strasbourg ⟐ University College London\n- **🔗 链接**：[[中英摘要](./abs/2506.23309.md)] [[arXiv:2506.23309](https://arxiv.org/abs/2506.23309)] [[Code](https://github.com/lastbasket/SurgTPGS)]\n- **📝 说明**: 🏆 Accepted to MICCAI 2025\n\n#### [5] Dyna3DGR: 4D Cardiac Motion Tracking with Dynamic 3D Gaussian Representation\n- **🧑‍🔬 作者**：Xueming Fu, Pei Wu, Yingtai Li, Xin Luo, Zihang Jiang, Junhao Mei, Jian Lu, Gao-Jun Teng, S. 
Kevin Zhou\n- **🏫 单位**：USTC ⟐ Jiangsu Provincial Key Laboratory of Multimodal Digital Twin Technology ⟐ Southeast University\n- **🔗 链接**：[[中英摘要](./abs/2507.16608.md)] [[arXiv:2507.16608](https://arxiv.org/abs/2507.16608)] [[Code](https://github.com/windrise/Dyna3DGR)]\n- **📝 说明**: 🏆 Accepted to MICCAI 2025\n\n#### [6] Efficient 3D Scene Reconstruction and Simulation from Sparse Endoscopic Views\n- **🧑‍🔬 作者**：Zhenya Yang\n- **🏫 单位**：The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2509.17027.md)] [[arXiv:2509.17027](https://arxiv.org/abs/2509.17027)] [Code]\n- **📝 说明**: 🏆 Accepted to MICCAI 2025 AECAI Workshop\n\n#### [7] BridgeSplat: Bidirectionally Coupled CT and Non-Rigid Gaussian Splatting for Deformable Intraoperative Surgical Navigation\n- **🧑‍🔬 作者**：Maximilian Fehrentz, Alexander Winkler, Thomas Heiliger, Nazim Haouchine, Christian Heiliger, Nassir Navab\n- **🏫 单位**：TU Munich ⟐ Hospital of the LMU Munich, Ludwig-Maximilians-Universität (LMU) ⟐ Harvard Medical School, Brigham and Women’s Hospital\n- **🔗 链接**：[[中英摘要](./abs/2509.18501.md)] [[arXiv:2509.18501](https://arxiv.org/abs/2509.18501)] [Code]\n- **📝 说明**: 🏆 Accepted to MICCAI 2025\n"
  },
  {
    "path": "2025/NeurIPS.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to NeurIPS2025\n\n#### [1] EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation\n- **🧑‍🔬 作者**：Siyuan Huang, Liliang Chen, Pengfei Zhou, Shengcong Chen, Zhengkai Jiang, Yue Hu, Peng Gao, Hongsheng Li, Maoqing Yao, Guanghui Ren\n- **🏫 单位**：AgiBot ⟐ Shanghai AI Lab ⟐ CUHK ⟐ SJTU ⟐ FDU ⟐ HKUST ⟐ HIT\n- **🔗 链接**：[[中英摘要](./abs/2501.01895.md)] [[arXiv:2501.01895](https://arxiv.org/abs/2501.01895)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [2] Learning Efficient Fuse-and-Refine for Feed-Forward 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yiming Wang, Lucy Chai, Xuan Luo, Michael Niemeyer, Manuel Lagunas, Stephen Lombardi, Siyu Tang, Tiancheng Sun\n- **🏫 单位**：ETHZurich ⟐ Google\n- **🔗 链接**：[[中英摘要](./abs/2503.14698.md)] [[arXiv:2503.14698](https://arxiv.org/abs/2503.14698)] [[Code](https://19reborn.github.io/SplatVoxel/)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [3] Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting\n- **🧑‍🔬 作者**：Yiren Lu, Yunlai Zhou, Yiran Qiao, Chaoda Song, Tuo Liang, Jing Ma, Yu Yin\n- **🏫 单位**：Case Western Reserve University\n- **🔗 链接**：[[中英摘要](./abs/2503.22204.md)] [[arXiv:2503.22204](https://arxiv.org/abs/2503.22204)] [[Code](https://vulab-ai.github.io/Segment-then-Splat/)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [4] Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction\n- **🧑‍🔬 作者**：Jiacong Chen, Qingyu Mao, Youneng Bao, Xiandong Meng, Fanyang Meng, Ronggang Wang, Yongsheng Liang\n- **🏫 单位**：Shenzhen University ⟐ Shenzhen Technology University ⟐  City University of Hong Kong ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2505.16533.md)] [[arXiv:2505.16533](https://arxiv.org/abs/2505.16533)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [5] CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting\n- **🧑‍🔬 作者**：Kornel Howil, Joanna Waczyńska, Piotr Borycki, Tadeusz Dziarmaga, Marcin Mazur, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University\n- **🔗 链接**：[[中英摘要](./abs/2505.22854.md)] [[arXiv:2505.22854](https://arxiv.org/abs/2505.22854)] [[Code](https://github.com/kornelhowil/CLIPGaussian)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [6] LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering\n- **🧑‍🔬 作者**：Jonas Kulhanek, Marie-Julie Rakotosaona, Fabian Manhardt, Christina Tsalicoglou, Michael Niemeyer, Torsten Sattler, Songyou Peng, Federico Tombari\n- **🏫 单位**：Google ⟐ CTU in Prague ⟐ Google DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2505.23158.md)] [[arXiv:2505.23158](https://arxiv.org/abs/2505.23158)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [7] ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS\n- **🧑‍🔬 作者**：Weijie Wang, Donny Y. 
Chen, Zeyu Zhang, Duochao Shi, Akide Liu, Bohan Zhuang\n- **🏫 单位**：Zhejiang University ⟐ Monash University\n- **🔗 链接**：[[中英摘要](./abs/2505.23734.md)] [[arXiv:2505.23734](https://arxiv.org/abs/2505.23734)] [[Code](https://github.com/ziplab/ZPressor)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [8] Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Chengqi Li, Zhihao Shi, Yangdi Lu, Wenbo He, Xiangyu Xu\n- **🏫 单位**：McMaster University\n- **🔗 链接**：[[中英摘要](./abs/2506.03538.md)] [[arXiv:2506.03538](https://arxiv.org/abs/2506.03538)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [9] BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading\n- **🧑‍🔬 作者**：Jonathan Schmidt, Simon Giebenhain, Matthias Niessner\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2506.06271.md)] [[arXiv:2506.06271](https://arxiv.org/abs/2506.06271)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [10] HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene\n- **🧑‍🔬 作者**：Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ The University of Queensland ⟐ Jiangsu University\n- **🔗 链接**：[[中英摘要](./abs/2506.09518.md)] [[arXiv:2506.09518](https://arxiv.org/abs/2506.09518)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [11] Metropolis-Hastings Sampling for 3D Gaussian Reconstruction\n- **🧑‍🔬 作者**：Hyunjin Kim, Haebeom Jung, Jaesik Park\n- **🏫 单位**：UC San Diego ⟐ Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2506.12945.md)] [[arXiv:2506.12945](https://arxiv.org/abs/2506.12945)] [[Code](https://github.com/hjhyunjinkim/MH-3DGS)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [12] LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS\n- **🧑‍🔬 作者**：Wanhua Li, Yujie Zhao, Minghan Qin, Yang Liu, Yuanhao Cai, Chuang Gan, Hanspeter Pfister\n- **🏫 单位**：Harvard University ⟐ University of Chinese Academy of Sciences ⟐ Tsinghua University ⟐ Johns Hopkins University ⟐ MIT-IBM Watson AI Lab ⟐ UMass Amherst\n- **🔗 链接**：[[中英摘要](./abs/2507.07136.md)] [[arXiv:2507.07136](https://arxiv.org/abs/2507.07136)] [[Code](https://github.com/ZhaoYujie2002/LangSplatV2)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [13] Temporal Smoothness-Aware Rate-Distortion Optimized 4D Gaussian Splatting\n- **🧑‍🔬 作者**：Hyeongmin Lee, Kyungjune Baek\n- **🏫 单位**：Twelve Labs ⟐ Sejong University\n- **🔗 链接**：[[中英摘要](./abs/2507.17336.md)] [[arXiv:2507.17336](https://arxiv.org/abs/2507.17336)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [14] Omni-Scan: Creating Visually-Accurate Digital Twin Object Models Using a Bimanual Robot with Handover and Gaussian Splat Merging\n- **🧑‍🔬 作者**：Tianshuang Qiu, Zehan Ma, Karim El-Refai, Hiya Shah, Chung Min Kim, Justin Kerr, Ken Goldberg\n- **🏫 单位**：University of California, Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2508.00354.md)] [[arXiv:2508.00354](https://arxiv.org/abs/2508.00354)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [15] Quantifying and Alleviating Co-Adaptation in Sparse-View 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kangjie Chen, Yingji Zhong, Zhihao Li, Jiaqi Lin, Youyu Chen, Minghan Qin, Haoqian Wang\n- **🏫 单位**：Tsinghua University ⟐ HKUST ⟐ Huawei Noah’s Ark Lab ⟐ Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.12720.md)] [[arXiv:2508.12720](https://arxiv.org/abs/2508.12720)] [Code]\n- **📝 
说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [16] SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving\n- **🧑‍🔬 作者**：Haiming Zhang, Yiyao Zhu, Wending Zhou, Xu Yan, Yingjie Cai, Bingbing Liu, Shuguang Cui, Zhen Li\n- **🏫 单位**：FNii, Shenzhen ⟐ CUHK-Shenzhen ⟐ HKUST ⟐ Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](./abs/2509.16588.md)] [[arXiv:2509.16588](https://arxiv.org/abs/2509.16588)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [17] HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis\n- **🧑‍🔬 作者**：Zipeng Wang, Dan Xu\n- **🏫 单位**：The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.17083.md)] [[arXiv:2509.17083](https://arxiv.org/abs/2509.17083)] [[Code](https://github.com/wzpscott/hybrid-radiance-fields)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [18] Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos\n- **🧑‍🔬 作者**：Junyi Wu, Jiachen Tao, Haoxuan Wang, Gaowen Liu, Ramana Rao Kompella, Yan Yan\n- **🏫 单位**：University of Illinois Chicago ⟐ Cisco Research\n- **🔗 链接**：[[中英摘要](./abs/2509.23492.md)] [[arXiv:2509.23492](https://arxiv.org/abs/2509.23492)] [[Code](https://github.com/adreamwu/OriGS)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [19] MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics\n- **🧑‍🔬 作者**：Changmin Lee, Jihyun Lee, Tae-Kyun Kim\n- **🏫 单位**：KAIST\n- **🔗 链接**：[[中英摘要](./abs/2510.01619.md)] [[arXiv:2510.01619](https://arxiv.org/abs/2510.01619)] [[Code](https://github.com/KAISTChangmin/MPMAvatar)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [20] VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment\n- **🧑‍🔬 作者**：Qing Li, Huifang Feng, Xun Gong, Yu-Shen Liu\n- **🏫 单位**：Southwest Jiaotong University ⟐ Xihua University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2510.11473.md)] [[arXiv:2510.11473](https://arxiv.org/abs/2510.11473)] [[Code](https://github.com/LeoQLi/VA-GS)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [21] DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion\n- **🧑‍🔬 作者**：Weijie Wang, Jiagang Zhu, Zeyu Zhang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Haoxiao Wang, Guan Huang, Xinze Chen, Yukun Zhou, Wenkang Qin, Duochao Shi, Haoyun Li, Guanghong Jia, Jiwen Lu\n- **🏫 单位**：GigaAI ⟐ Zhejiang University ⟐ Tsinghua University ⟐ Chinese Academy of Sciences ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2510.15264.md)] [[arXiv:2510.15264](https://arxiv.org/abs/2510.15264)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025 Workshop on Next Practices in Video Generation and Evaluation (Short Paper Track)\n\n#### [22] Fix False Transparency by Noise Guided Splatting\n- **🧑‍🔬 作者**：Aly El Hakie, Yiren Lu, Yu Yin, Michael Jenkins, Yehe Liu\n- **🏫 单位**：OpsiClear LLC ⟐ Case Western Reserve University\n- **🔗 链接**：[[中英摘要](./abs/2510.15736.md)] [[arXiv:2510.15736](https://arxiv.org/abs/2510.15736)] [[Code](https://github.com/OpsiClear/noise_guided_splatting)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [23] COS3D: Collaborative Open-Vocabulary 3D Segmentation\n- **🧑‍🔬 作者**：Runsong Zhu, Ka-Hei Hui, Zhengzhe Liu, Qianyi Wu, Weiliang Tang, Shi Qiu, Pheng-Ann Heng, Chi-Wing Fu\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Autodesk AI Lab ⟐ Lingnan University ⟐ Monash University\n- **🔗 链接**：[[中英摘要](./abs/2510.20238.md)] [[arXiv:2510.20238](https://arxiv.org/abs/2510.20238)] [[Code](https://github.com/Runsong123/COS3D)]\n- 
**📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [24] OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects\n- **🧑‍🔬 作者**：Mark He Huang, Lin Geng Foo, Christian Theobalt, Ying Sun, De Wen Soh\n- **🏫 单位**：Singapore University of Technology and Design ⟐ Max Planck Institute for Informatics ⟐ A*STAR\n- **🔗 链接**：[[中英摘要](./abs/2510.20605.md)] [[arXiv:2510.20605](https://arxiv.org/abs/2510.20605)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [25] VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hoonhee Cho, Jae-Young Kang, Giwon Lee, Hyemin Yang, Heejun Park, Seokwoo Jung, Kuk-Jin Yoon\n- **🏫 单位**：KAIST ⟐ 42dot\n- **🔗 链接**：[[中英摘要](./abs/2510.23205.md)] [[arXiv:2510.23205](https://arxiv.org/abs/2510.23205)] [[Code](https://github.com/mickeykang16/VR-Drive)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [26] PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors\n- **🧑‍🔬 作者**：Xirui Jin, Renbiao Jin, Boying Li, Danping Zou, Wenxian Yu\n- **🏫 单位**：Monash University ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2510.23930.md)] [[arXiv:2510.23930](https://arxiv.org/abs/2510.23930)] [[Code](https://github.com/SJTU-ViSYS-team/PlanarGS)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [27] AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians\n- **🧑‍🔬 作者**：Xiyu Zhang, Chong Bao, Yipeng Chen, Hongjia Zhai, Yitong Dong, Hujun Bao, Zhaopeng Cui, Guofeng Zhang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2510.25129.md)] [[arXiv:2510.25129](https://arxiv.org/abs/2510.25129)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [28] EA3D: Online Open-World 3D Object Extraction from Streaming Videos\n- **🧑‍🔬 作者**：Xiaoyu Zhou, Jingqi Wang, Yuang Jia, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang\n- **🏫 单位**： Peking University ⟐ Google DeepMind ⟐ University of California\n- **🔗 链接**：[[中英摘要](./abs/2510.25146.md)] [[arXiv:2510.25146](https://arxiv.org/abs/2510.25146)] [[Code](https://github.com/VDIGPKU/EA3D)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [29] HEIR: Learning Graph-Based Motion Hierarchies\n- **🧑‍🔬 作者**：Cheng Zheng, William Koch, Baiang Li, Felix Heide\n- **🏫 单位**：Princeton University ⟐ Torc Robotics\n- **🔗 链接**：[[中英摘要](./abs/2510.26786.md)] [[arXiv:2510.26786](https://arxiv.org/abs/2510.26786)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [30] DC4GS: Directional Consistency-Driven Adaptive Density Control for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Moonsoo Jeong, Dongbeen Kim, Minseong Kim, Sungkil Lee\n- **🏫 单位**：Sungkyunkwan University\n- **🔗 链接**：[[中英摘要](./abs/2510.26921.md)] [[arXiv:2510.26921](https://arxiv.org/abs/2510.26921)] [[Code](https://github.com/cgskku/dc4gs)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [31] GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies\n- **🧑‍🔬 作者**：Ziye Wang, Li Kang, Yiran Qin, Jiahua Ma, Zhanglin Peng, Lei Bai, Ruimao Zhang\n- **🏫 单位**：Sun Yat-sen University ⟐ The University of Hong Kong ⟐ Shanghai Jiao Tong University ⟐ The Chinese University of Hong Kong, Shenzhen ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2511.00998.md)] [[arXiv:2511.00998](https://arxiv.org/abs/2511.00998)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [32] IBGS: Image-Based Gaussian Splatting\n- **🧑‍🔬 作者**：Hoang Chuong Nguyen, Wei Mao, Jose M. 
Alvarez, Miaomiao Liu\n- **🏫 单位**：Australian National University ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](../abs/2511.14357.md)] [[arXiv:2511.14357](https://arxiv.org/abs/2511.14357)] [[Code](https://github.com/HoangChuongNguyen/ibgs)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [33] TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming\n- **🧑‍🔬 作者**：Zeyuan Yin, Xiaoming Liu\n- **🏫 单位**：Michigan State University\n- **🔗 链接**：[[中英摘要](../abs/2511.16642.md)] [[arXiv:2511.16642](https://arxiv.org/abs/2511.16642)] [[Code](https://github.com/zeyuanyin/TRIM)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [34] Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion\n- **🧑‍🔬 作者**：Yan Xu, Yixing Wang, Stella X. Yu\n- **🏫 单位**：University of Michigan ⟐ UC Berkeley\n- **🔗 链接**：[[中英摘要](../abs/2511.17932.md)] [[arXiv:2511.17932](https://arxiv.org/abs/2511.17932)] [[Code](https://github.com/DecaYale/SYN3R)]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n\n#### [35] DGH: Dynamic Gaussian Hair\n- **🧑‍🔬 作者**：Junying Wang, Yuanlu Xu, Edith Tretschk, Ziyan Wang, Anastasia Ianina, Aljaz Bozic, Ulrich Neumann, Tony Tung\n- **🏫 单位**：University of Southern California ⟐ Meta Reality Labs Research\n- **🔗 链接**：[[中英摘要](../abs/2512.17094.md)] [[arXiv:2512.17094](https://arxiv.org/abs/2512.17094)] [Code]\n- **📝 说明**: 🏆 Accepted to NeurIPS 2025\n"
  },
  {
    "path": "2025/SIGGRAPH.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to SIGGRAPH2025\n\n#### [1] SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Sara Sabour, Lily Goli, George Kopanas, Mark Matthews, Dmitry Lagun, Leonidas Guibas, Alec Jacobson, David J. Fleet, Andrea Tagliasacchi\n- **🏫 单位**：Google DeepMind ⟐ University of Toronto ⟐ Stanford University ⟐ Simon Fraser University\n- **🔗 链接**：[[中英摘要](./abs/2406.20055.md)] [[arXiv:2406.20055](https://arxiv.org/abs/2406.20055)] [[Code](https://github.com/lilygoli/SpotLessSplats)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2025\n\n#### [2] FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering\n- **🧑‍🔬 作者**：Yunji Seo, Young Sun Choi, Hyun Seung Son, Youngjung Uh\n- **🏫 单位**：Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2408.12894.md)] [[arXiv:2408.12894](https://arxiv.org/abs/2408.12894)] [[Code](https://github.com/3DGS-FLoD/flod)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [3] LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation\n- **🧑‍🔬 作者**：Shuai Yang, Jing Tan, Mengchen Zhang, Tong Wu, Yixuan Li, Gordon Wetzstein, Ziwei Liu, Dahua Lin\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Shanghai AI Laboratory ⟐ CUHK ⟐ Zhejiang University ⟐ Stanford University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2408.13252.md)] [[arXiv:2408.13252](https://arxiv.org/abs/2408.13252)] [[Code](https://github.com/3DTopia/LayerPano3D)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2025\n\n#### [4] SqueezeMe: Efficient Gaussian Avatars for VR\n- **🧑‍🔬 作者**：Shunsuke Saito, Stanislav Pidhorskyi, Igor Santesteban, Forrest Iandola, Divam Gupta, Anuj Pahuja, Nemanja Bartolovic, Frank Yu, Emanuel Garbin, Tomas Simon\n- **🏫 单位**：Meta Reality Labs, USA ⟐ Meta Reality Labs, Switzerland\n- **🔗 链接**：[[中英摘要](./abs/2412.15171.md)] [[arXiv:2412.15171](https://arxiv.org/abs/2412.15171)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [5] SplatMAP: Online Dense Monocular SLAM with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yue Hu, Rong Liu, Meida Chen, Peter Beerel, Andrew Feng\n- **🏫 单位**：University of Southern California\n- **🔗 链接**：[[中英摘要](./abs/2501.07015.md)] [[arXiv:2501.07015](https://arxiv.org/abs/2501.07015)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025 Symposium on Interactive 3D Graphics and Games\n\n#### [6] Deformable Beta Splatting\n- **🧑‍🔬 作者**：Rong Liu, Dylan Sun, Meida Chen, Yue Wang, Andrew Feng\n- **🏫 单位**：University of Southern California\n- **🔗 链接**：[[中英摘要](./abs/2501.18630.md)] [[arXiv:2501.18630](https://arxiv.org/abs/2501.18630)] [[Code](https://github.com/RongLiu-Leo/beta-splatting)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2025\n\n#### [7] Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yansong Qu, Dian Chen, Xinyang Li, Xiaofan Li, Shengchuan Zhang, Liujuan Cao, Rongrong Ji\n- **🏫 单位**：Xiamen University ⟐ Baidu Inc.\n- **🔗 链接**：[[中英摘要](./abs/2501.18672.md)] [[arXiv:2501.18672](https://arxiv.org/abs/2501.18672)] [[Code](https://github.com/Quyans/Drag-Your-Gaussian)]\n- **📝 说明**：🏆 Accepted to SIGGRAPH 2025\n\n#### [8] GigaSLAM: Large-Scale Monocular SLAM with Hierachical Gaussian Splats\n- **🧑‍🔬 作者**：Kai Deng, Jian Yang, Shenlong Wang, Jin Xie\n- **🏫 单位**：Nankai University ⟐ University of Illinois at Urbana-Champaign ⟐ Nanjing University Suzhou Campus\n- **🔗 链接**：[[中英摘要](./abs/2503.08071.md)] [[arXiv:2503.08071](https://arxiv.org/abs/2503.08071)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Aisa 2025\n\n#### [9] Generating 
360° Video is What You Need For a 3D Scene\n- **🧑‍🔬 作者**：Zhaoyang Zhang, Yannick Hold-Geoffroy, Miloš Hašan, Ziwen Chen, Fujun Luan, Julie Dorsey, Yiwei Hu\n- **🏫 单位**：Adobe ⟐ Yale University ⟐ Oregon State University\n- **🔗 链接**：[[中英摘要](./abs/2504.02045.md)] [[arXiv:2504.02045](https://arxiv.org/abs/2504.02045)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [10] In-2-4D: Inbetweening from Two Single-View Images to 4D Generation\n- **🧑‍🔬 作者**：Sauradip Nag, Daniel Cohen-Or, Hao Zhang, Ali Mahdavi-Amiri\n- **🏫 单位**：Simon Fraser University ⟐ Tel Aviv University\n- **🔗 链接**：[[中英摘要](./abs/2504.08366.md)] [[arXiv:2504.08366](https://arxiv.org/abs/2504.08366)] [[Code](https://github.com/sauradip/In-2-4D)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [11] TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians\n- **🧑‍🔬 作者**：Letian Huang, Dongwei Ye, Jialin Dan, Chengzhi Tao, Huiwen Liu, Kun Zhou, Bo Ren, Yuanqi Li, Yanwen Guo, Jie Guo\n- **🏫 单位**：Nanjing University ⟐ Nankai University ⟐ Zhejiang University ⟐ Institute of Hangzhou Holographic Intelligent Technology\n- **🔗 链接**：[[中英摘要](../abs/2504.18768.md)] [[arXiv:2504.18768](https://arxiv.org/abs/2504.18768)] [[Code](https://github.com/LetianHuang/transparentgs)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [12] Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Youngsik Yun, Jeongmin Bae, Hyunseung Son, Seoha Kim, Hahyun Lee, Gun Bang, Youngjung Uh\n- **🏫 单位**：Yonsei University ⟐ Electronics and Telecommunications Research Institute\n- **🔗 链接**：[[中英摘要](../abs/2505.01235.md)] [[arXiv:2505.01235](https://arxiv.org/abs/2505.01235)] [[Code](https://github.com/bbangsik13/OR2)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [13] TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling\n- **🧑‍🔬 作者**：Gengyan Li, Paulo Gotardo, Timo Bolkart, Stephan Garbin, Kripasindhu Sarkar, Abhimitra Meka, Alexandros Lattas, Thabo Beeler\n- **🏫 单位**：ETH Zurich, Switzerland and Google, Switzerland ⟐ Google, Switzerland ⟐ Google, United Kingdom\n- **🔗 链接**：[[中英摘要](../abs/2505.05672.md)] [[arXiv:2505.05672](https://arxiv.org/abs/2505.05672)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [14] Virtualized 3D Gaussians: Flexible Cluster-based Level-of-Detail System for Real-Time Rendering of Composed Scenes\n- **🧑‍🔬 作者**：Xijie Yang, Linning Xu, Lihan Jiang, Dahua Lin, Bo Dai\n- **🏫 单位**：Zhejiang University, China and Shanghai Artificial Intelligence Laboratory, China ⟐ The Chinese University of Hong Kong, China ⟐ University of Science and Technology of China, China and Shanghai Artificial Intelligence Laboratory, China ⟐ The Chinese University of Hong Kong, China and Shanghai Artificial Intelligence Laboratory, China ⟐ The University of Hong Kong, China and Feeling AI, China\n- **🔗 链接**：[[中英摘要](../abs/2505.06523.md)] [[arXiv:2505.06523](https://arxiv.org/abs/2505.06523)] [[Code](https://github.com/city-super/V3DG)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [15] Monocular Online Reconstruction with Enhanced Detail Preservation\n- **🧑‍🔬 作者**：Songyin Wu, Zhaoyang Lv, Yufeng Zhu, Duncan Frost, Zhengqin Li, Ling-Qi Yan, Carl Ren, Richard Newcombe, Zhao Dong\n- **🏫 单位**：University of California Santa Barbara ⟐ Meta Reality Labs Research\n- **🔗 链接**：[[中英摘要](./abs/2505.07887.md)] [[arXiv:2505.07887](https://arxiv.org/abs/2505.07887)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [16] EVA: Expressive Virtual Avatars from 
Multi-view Videos\n- **🧑‍🔬 作者**：Hendrik Junkawitsch, Guoxing Sun, Heming Zhu, Christian Theobalt, Marc Habermann\n- **🏫 单位**：Max Planck Institute for Informatics\n- **🔗 链接**：[[中英摘要](./abs/2505.15385.md)] [[arXiv:2505.15385](https://arxiv.org/abs/2505.15385)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [17] AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views\n- **🧑‍🔬 作者**：Lihan Jiang, Yucheng Mao, Linning Xu, Tao Lu, Kerui Ren, Yichen Jin, Xudong Xu, Mulin Yu, Jiangmiao Pang, Feng Zhao, Dahua Lin, Bo Dai\n- **🏫 单位**：The University of Science and Technology of China ⟐ Shanghai Artificial Intelligence Laboratory ⟐ The Chinese University of Hong Kong ⟐ Brown University ⟐ Shanghai Jiao Tong University ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2505.23716.md)] [[arXiv:2505.23716](https://arxiv.org/abs/2505.23716)] [[Code](https://github.com/OpenRobotLab/AnySplat)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [18] WorldExplorer: Towards Generating Fully Navigable 3D Scenes\n- **🧑‍🔬 作者**：Manuel-Andreas Schneider, Lukas Höllein, Matthias Nießner\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2506.01799.md)] [[arXiv:2506.01799](https://arxiv.org/abs/2506.01799)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [19] On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images\n- **🧑‍🔬 作者**：Andreas Meuleman, Ishaan Shah, Alexandre Lanvin, Bernhard Kerbl, George Drettakis\n- **🏫 单位**：Inria, Université Côte d’Azur ⟐ TU Wien\n- **🔗 链接**：[[中英摘要](./abs/2506.05558.md)] [[arXiv:2506.05558](https://arxiv.org/abs/2506.05558)] [[Code](https://github.com/graphdeco-inria/on-the-fly-nvs)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [20] Splat and Replace: 3D Reconstruction with Repetitive Elements\n- **🧑‍🔬 作者**：Nicolás Violante, Andreas Meuleman, Alban Gauthier, Frédo Durand, Thibault Groueix, George Drettakis\n- **🏫 单位**：Inria & Université Côte d’Azur ⟐ Adobe ⟐ MIT\n- **🔗 链接**：[[中英摘要](./abs/2506.06462.md)] [[arXiv:2506.06462](https://arxiv.org/abs/2506.06462)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [21] SOF: Sorted Opacity Fields for Fast Unbounded Surface Reconstruction\n- **🧑‍🔬 作者**：Lukas Radl, Felix Windisch, Thomas Deixelberger, Jozef Hladky, Michael Steiner, Dieter Schmalstieg, Markus Steinberger\n- **🏫 单位**：Graz University of Technology ⟐ Huawei Technologies ⟐ University of Stuttgart\n- **🔗 链接**：[[中英摘要](./abs/2506.19139.md)] [[arXiv:2506.19139](https://arxiv.org/abs/2506.19139)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [22] 3DGH: 3D Head Generation with Composable Hair and Face\n- **🧑‍🔬 作者**：Chengan He, Junxuan Li, Tobias Kirschstein, Artem Sevastopolsky, Shunsuke Saito, Qingyang Tan, Javier Romero, Chen Cao, Holly Rushmeier, Giljoo Nam\n- **🏫 单位**：Yale University ⟐ Meta Codec Avatars Lab ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2506.20875.md)] [[arXiv:2506.20875](https://arxiv.org/abs/2506.20875)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [23] GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering\n- **🧑‍🔬 作者**：Zinuo You, Stamatios Georgoulis, Anpei Chen, Siyu Tang, Dengxin Dai\n- **🏫 单位**：ETH Zürich ⟐ Huawei Research Zürich\n- **🔗 链接**：[[中英摘要](./abs/2506.23957.md)] [[arXiv:2506.23957](https://arxiv.org/abs/2506.23957)] [[Code](https://github.com/sinoyou/GaVS)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [24] Gbake: Baking 3D Gaussian Splats into Reflection Probes\n- **🧑‍🔬 
作者**：Stephen Pasch, Joel K. Salzman, Changxi Zheng\n- **🏫 单位**：Columbia University ⟐ Brown University\n- **🔗 链接**：[[中英摘要](../abs/2507.02257.md)] [[arXiv:2507.02257](https://arxiv.org/abs/2507.02257)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [25] ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions\n- **🧑‍🔬 作者**：Shivangi Aneja, Sebastian Weiss, Irene Baeza, Prashanth Chandran, Gaspard Zoss, Matthias Nießner, Derek Bradley\n- **🏫 单位**：Technical University of Munich ⟐ DisneyResearch|Studios\n- **🔗 链接**：[[中英摘要](../abs/2507.10542.md)] [[arXiv:2507.10542](https://arxiv.org/abs/2507.10542)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH 2025\n\n#### [26] AD-GS: Alternating Densification for Sparse-Input 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Gurutva Patle, Nilay Girgaonkar, Nagabhushan Somraj, Rajiv Soundararajan\n- **🏫 单位**：Indian Institute of Science ⟐ Birla Institute of Technology and Science\n- **🔗 链接**：[[中英摘要](../abs/2509.11003.md)] [[arXiv:2509.11003](https://arxiv.org/abs/2509.11003)] [[Code](https://github.com/gurutvapatle/AD-GS)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [27] GS-RoadPatching: Inpainting Gaussians via 3D Searching and Placing for Driving Scenes\n- **🧑‍🔬 作者**：Guo Chen, Jiarun Liu, Sicong Du, Chenming Wu, Deqi Li, Shi-Sheng Huang, Guofeng Zhang, Sheng Yang\n- **🏫 单位**：Beijing Normal University ⟐ Alibaba ⟐ Zhejiang University ⟐ Baidu\n- **🔗 链接**：[[中英摘要](../abs/2509.19937.md)] [[arXiv:2509.19937](https://arxiv.org/abs/2509.19937)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [28] PowerGS: Display-Rendering Power Co-Optimization for Neural Rendering in Power-Constrained XR Systems\n- **🧑‍🔬 作者**：Weikai Lin, Sushant Kondguli, Carl Marshall, Yuhao Zhu\n- **🏫 单位**：University of Rochester ⟐ Reality Labs Research, Meta\n- **🔗 链接**：[[中英摘要](../abs/2509.21702.md)] [[arXiv:2509.21702](https://arxiv.org/abs/2509.21702)] [[Code](https://github.com/horizon-research/PowerGS)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [29] LVT: Large-Scale Scene Reconstruction via Local View Transformers\n- **🧑‍🔬 作者**：Tooba Imtiaz, Lucy Chai, Kathryn Heal, Xuan Luo, Jungyeon Park, Jennifer Dy, John Flynn\n- **🏫 单位**：Google ⟐ Northeastern University\n- **🔗 链接**：[[中英摘要](../abs/2509.25001.md)] [[arXiv:2509.25001](https://arxiv.org/abs/2509.25001)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [30] PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos\n- **🧑‍🔬 作者**：Ting-Hsuan Liao, Haowen Liu, Yiran Xu, Songwei Ge, Gengshan Yang, Jia-Bin Huang\n- **🏫 单位**：University of Maryland College Park\n- **🔗 链接**：[[中英摘要](../abs/2509.25183.md)] [[arXiv:2509.25183](https://arxiv.org/abs/2509.25183)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [31] Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures\n- **🧑‍🔬 作者**：Yuancheng Xu, Wenqi Xian, Li Ma, Julien Philip, Ahmet Levent Taşel, Yiwei Zhao, Ryan Burgert, Mingming He, Oliver Hermann, Oliver Pilarski, Rahul Garg, Paul Debevec, Ning Yu\n- **🏫 单位**：Eyeline Labs ⟐ Netflix\n- **🔗 链接**：[[中英摘要](../abs/2510.14179.md)] [[arXiv:2510.14179](https://arxiv.org/abs/2510.14179)] [Code]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [32] Gaussian See, Gaussian Do: Semantic 3D Motion Transfer from Multiview Video\n- **🧑‍🔬 作者**：Yarin Bekor, Gal Michael Harari, Or Perel, Or Litany\n- **🏫 单位**：Israel Institute of Technology ⟐ NVIDIA ⟐ University of Toronto ⟐ Vector Institute\n- **🔗 链接**：[[中英摘要](../abs/2511.14848.md)] 
[[arXiv:2511.14848](https://arxiv.org/abs/2511.14848)] [[Code](https://github.com/GSGD-MotionTransfer/GSGD)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [33] Clustered Error Correction with Grouped 4D Gaussian Splatting\n- **🧑‍🔬 作者**：Taeho Kang, Jaeyeon Park, Kyungjin Lee, Youngki Lee\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2511.16112.md)] [[arXiv:2511.16112](https://arxiv.org/abs/2511.16112)] [[Code](https://github.com/tho-kn/cem-4dgs)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [34] Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction\n- **🧑‍🔬 作者**：Yiming Wang, Shaofei Wang, Marko Mihajlovic, Siyu Tang\n- **🏫 单位**：ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2511.18873.md)] [[arXiv:2511.18873](https://arxiv.org/abs/2511.18873)] [[Code](https://github.com/19reborn/neural-texture-splatting)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n\n#### [35] DeMapGS: Simultaneous Mesh Deformation and Surface Attribute Mapping via Gaussian Splatting\n- **🧑‍🔬 作者**：Shuyi Zhou, Shengze Zhong, Kenshi Takayama, Takafumi Taketomi, Takeshi Oishi\n- **🏫 单位**：The University of Tokyo ⟐ CyberAgent\n- **🔗 链接**：[[中英摘要](../abs/2512.10572.md)] [[arXiv:2512.10572](https://arxiv.org/abs/2512.10572)] [[Code](https://github.com/CyberAgentAILab/DeMapGS)]\n- **📝 说明**: 🏆 Accepted to SIGGRAPH Asia 2025\n"
  },
  {
    "path": "2025/WACV.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to WACV2025\n\n#### [1] GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis\n- **🧑‍🔬 作者**：Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao\n- **🏫 单位**：Meta ⟐ Brown University\n- **🔗 链接**：[[中英摘要](../abs/2312.11458.md)] [[arXiv:2312.11458](https://arxiv.org/abs/2312.11458)] [[Supp](https://lynl7130.github.io/gaufre/static/pdfs/suppl.pdf)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [2] DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing\n- **🧑‍🔬 作者**：Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala\n- **🏫 单位**：ETH Zurich ⟐ Tampere University ⟐ Aalto University ⟐ Spectacular AI\n- **🔗 链接**：[[中英摘要](../abs/2403.17822.md)] [[arXiv:2403.17822](https://arxiv.org/abs/2403.17822)] [[Code](https://github.com/maturk/dn-splatter)]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [3] OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images\n- **🧑‍🔬 作者**：Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng\n- **🏫 单位**：Sun Yat-Sen University\n- **🔗 链接**：[[中英摘要](../abs/2404.03202.md)] [[arXiv:2404.03202](https://arxiv.org/abs/2404.03202)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [4] GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation\n- **🧑‍🔬 作者**：Florian Chabot, Nicolas Granger, Guillaume Lapouge\n- **🏫 单位**：CEA, List, F-91120, Palaiseau, France\n- **🔗 链接**：[[中英摘要](../abs/2407.14108.md)] [[arXiv:2407.14108](https://arxiv.org/abs/2407.14108)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [5] Localized Gaussian Splatting Editing with Contextual Awareness\n- **🧑‍🔬 作者**：Hanyuan Xiao, Yingshu Chen, Huajian Huang, Haolin Xiong, Jing Yang, Pratusha Prasad, Yajie Zhao\n- **🏫 单位**：University of Southern California ⟐ Institute for Creative Technologies ⟐ HKUST ⟐ University of California, Los Angeles\n- **🔗 链接**：[[中英摘要](../abs/2408.00083.md)] [[arXiv:2408.00083](https://arxiv.org/abs/2408.00083)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [6] LumiGauss: High-Fidelity Outdoor Relighting with 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Joanna Kaleta, Kacper Kania, Tomasz Trzcinski, Marek Kowalski\n- **🏫 单位**：Warsaw University of Technology ⟐ Sano Centre for Computational Medicine ⟐ Microsoft ⟐ IDEAS NCBR ⟐ Tooploox\n- **🔗 链接**：[[中英摘要](../abs/2408.04474.md)] [[arXiv:2408.04474](https://arxiv.org/abs/2408.04474)] [[Code](https://github.com/joaxkal/lumigauss)]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [7] EdgeGaussians -- 3D Edge Mapping via Gaussian Splatting\n- **🧑‍🔬 作者**：Kunal Chelani, Assia Benbihi, Torsten Sattler, Fredrik Kahl\n- **🏫 单位**：Chalmers University of Technology ⟐ Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague\n- **🔗 链接**：[[中英摘要](../abs/2409.12886.md)] [[arXiv:2409.12886](https://arxiv.org/abs/2409.12886)] [[Code](https://github.com/kunalchelani/EdgeGaussians)]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [8] GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling\n- **🧑‍🔬 作者**：Victor Rong, Jingxiang Chen, Sherwin Bahmani, Kiriakos N. Kutulakos, David B. 
Lindell\n- **🏫 单位**：University of Toronto\n- **🔗 链接**：[[中英摘要](../abs/2409.12954.md)] [[arXiv:2409.12954](https://arxiv.org/abs/2409.12954)] [[Code](https://lessvrong.com/cs/gstex/)]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [9] Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities\n- **🧑‍🔬 作者**：Peizhi Yan, Rabab Ward, Qiang Tang, Shan Du\n- **🏫 单位**：University of British Columbia ⟐ Huawei Canada ⟐ University of British Columbia (Okanagan)\n- **🔗 链接**：[[中英摘要](../abs/2409.16147.md)] [[arXiv:2409.16147](https://arxiv.org/abs/2409.16147)] [[Code](https://github.com/PeizhiYan/gaussian-dejavu)]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [10] UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction\n- **🧑‍🔬 作者**：Haoran Wang, Nantheera Anantrasirichai, Fan Zhang, David Bull\n- **🏫 单位**：School of Computer Science, University of Bristol, Bristol, UK\n- **🔗 链接**：[[中英摘要](../abs/2410.01517.md)] [[arXiv:2410.01517](https://arxiv.org/abs/2410.01517)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [11] ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Muhammad Salman Ali, Sung-Ho Bae, Enzo Tartaglione\n- **🏫 单位**：LTCI, Télécom Paris, Institut Polytechnique de Paris, France ⟐ Kyung Hee University, Republic of Korea\n- **🔗 链接**：[[中英摘要](../abs/2410.23213.md)] [[arXiv:2410.23213](https://arxiv.org/abs/2410.23213)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [12] Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction\n- **🧑‍🔬 作者**：Decai Chen, Brianne Oberson, Ingo Feldmann, Oliver Schreer, Anna Hilsmann, Peter Eisert\n- **🏫 单位**：Fraunhofer HHI ⟐ Humboldt University of Berlin ⟐ Technical University of Berlin\n- **🔗 链接**：[[中英摘要](../abs/2411.06602.md)] [[arXiv:2411.06602](https://arxiv.org/abs/2411.06602)] [[Code](https://github.com/fraunhoferhhi/AT-GS)]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [13] Planar Gaussian Splatting\n- **🧑‍🔬 作者**：Farhad G. Zanjani, Hong Cai, Hanno Ackermann, Leila Mirvakhabova, Fatih Porikli\n- **🏫 单位**：Qualcomm AI Research\n- **🔗 链接**：[[中英摘要](../abs/2412.01931.md)] [[arXiv:2412.01931](https://arxiv.org/abs/2412.01931)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [14] BeSplat -- Gaussian Splatting from a Single Blurry Image and Event Stream\n- **🧑‍🔬 作者**：Gopi Raju Matta, Reddypalli Trisha, Kaushik Mitra\n- **🏫 单位**：IIT Madras ⟐ IIIT RGUKT RKValley\n- **🔗 链接**：[[中英摘要](../abs/2412.19370.md)] [[arXiv:2412.19370](https://arxiv.org/abs/2412.19370)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025 EVGEN Workshop\n\n#### [15] Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation\n- **🧑‍🔬 作者**：Rohan Chacko, Nicolai Haeni, Eldar Khaliullin, Lin Sun, Douglas Lee\n- **🏫 单位**：Magic Leap Inc.\n- **🔗 链接**：[[中英摘要](../abs/2502.00173.md)] [[arXiv:2502.00173](https://arxiv.org/abs/2502.00173)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025\n\n#### [16] Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis\n- **🧑‍🔬 作者**：Niluthpol Chowdhury Mithun, Tuan Pham, Qiao Wang, Ben Southall, Kshitij Minhas, Bogdan Matei, Stephan Mandt, Supun Samarasekera, Rakesh Kumar\n- **🏫 单位**：SRI International, Princeton, NJ, USA ⟐ University of California, Irvine, CA, USA\n- **🔗 链接**：[[中英摘要](../abs/2504.01960.md)] [[arXiv:2504.01960](https://arxiv.org/abs/2504.01960)] [Code]\n- **📝 说明**：🏆 Accepted to WACV 2025 ULTRRA Workshop\n"
  },
  {
    "path": "2026/3DV.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to 3DV2026\n\n#### [1] Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars\n- **🧑‍🔬 作者**：Marcel C. Bühler, Ye Yuan, Xueting Li, Yangyi Huang, Koki Nagano, Umar Iqbal\n- **🏫 单位**：ETH Zurich ⟐ NVIDIA ⟐ Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2507.15979.md)] [[arXiv:2507.15979](https://arxiv.org/abs/2507.15979)] [Code]\n- **📝 说明**: 🏆 Accepted to 3DV 2026\n\n#### [2] GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects\n- **🧑‍🔬 作者**：Licheng Shen, Saining Zhang, Honghan Li, Peilin Yang, Zihao Huang, Zongzheng Zhang, Hao Zhao\n- **🏫 单位**：THU ⟐ NTU ⟐ BIT ⟐ HUST ⟐ BAAI\n- **🔗 链接**：[[中英摘要](../abs/2508.14891.md)] [[arXiv:2508.14891](https://arxiv.org/abs/2508.14891)] [Code]\n- **📝 说明**: 🏆 Accepted to 3DV 2026\n\n#### [3] Proxy-Free Gaussian Splats Deformation with Splat-Based Surface Estimation\n- **🧑‍🔬 作者**：Jaeyeong Kim, Seungwoo Yoo, Minhyuk Sung\n- **🏫 单位**：KAIST\n- **🔗 链接**：[[中英摘要](../abs/2511.19542.md)] [[arXiv:2511.19542](https://arxiv.org/abs/2511.19542)] [[Code](https://github.com/kjae0/SpLap)]\n- **📝 说明**: 🏆 Accepted to 3DV 2026\n\n#### [4] TexAvatars: Hybrid Texel-3D Representations for Stable Rigging of Photorealistic Gaussian Head Avatars\n- **🧑‍🔬 作者**：Jaeseong Lee, Junyeong Ahn, Taewoong Kang, Jaegul Choo\n- **🏫 单位**：KAIST ⟐ Hanyang University\n- **🔗 链接**：[[中英摘要](../abs/2512.21099.md)] [[arXiv:2512.21099](https://arxiv.org/abs/2512.21099)] [[Code](https://github.com/summertight/TexAvatars_repo)]\n- **📝 说明**: 🏆 Accepted to 3DV 2026\n\n#### [5] From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors\n- **🧑‍🔬 作者**：Ding-Jiun Huang, Yuanhao Wang, Shao-Ji Yuan, Albert Mosella-Montoro, Francisco Vicente Carrasco, Cheng Zhang, Fernando De la Torre\n- **🏫 单位**：Carnegie Mellon University ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](../abs/2602.06122.md)] [[arXiv:2602.06122](https://arxiv.org/abs/2602.06122)] [[Code](https://github.com/humansensinglab/super-head)]\n- **📝 说明**: 🏆 Accepted to 3DV 2026\n"
  },
  {
    "path": "2026/AAAI.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to AAAI2026\n\n#### [1] FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yitong Yang, Yinglin Wang, Changshuo Wang, Huajie Wang, Shuting He\n- **🏫 单位**：Shanghai University of Finance and Economics ⟐ Shandong University of Finance and Economics ⟐ University College London\n- **🔗 链接**：[[中英摘要](../abs/2508.08136.md)] [[arXiv:2508.08136](https://arxiv.org/abs/2508.08136)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [2] Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction\n- **🧑‍🔬 作者**：Cheng Chen, Hao Huang, Saurabh Bagchi\n- **🏫 单位**：Purdue University ⟐ New York University, Abu Dhabi\n- **🔗 链接**：[[中英摘要](../abs/2508.10936.md)] [[arXiv:2508.10936](https://arxiv.org/abs/2508.10936)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [3] Arbitrary-Scale 3D Gaussian Super-Resolution\n- **🧑‍🔬 作者**：Huimin Zeng, Yue Bai, Yun Fu\n- **🏫 单位**：Northeastern University\n- **🔗 链接**：[[中英摘要](../abs/2508.16467.md)] [[arXiv:2508.16467](https://arxiv.org/abs/2508.16467)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [4] MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting\n- **🧑‍🔬 作者**：Hanzhi Chang, Ruijie Zhu, Wenjie Chang, Mulin Yu, Yanzhe Liang, Jiahao Lu, Zhuoyuan Li, Tianzhu Zhang\n- **🏫 单位**：University of Science and Technology of China ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](../abs/2508.17811.md)] [[arXiv:2508.17811](https://arxiv.org/abs/2508.17811)] [[Code](https://github.com/HanzhiChang/MeshSplat)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [5] StreamSTGS: Streaming Spatial and Temporal Gaussian Grids for Real-Time Free-Viewpoint Video\n- **🧑‍🔬 作者**：Zhihui Ke, Yuyang Liu, Xiaobo Zhou, Tie Qiu\n- **🏫 单位**：Tianjin University\n- **🔗 链接**：[[中英摘要](../abs/2511.06046.md)] [[arXiv:2511.06046](https://arxiv.org/abs/2511.06046)] [[Code](https://github.com/kkkzh/StreamSTGS)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [6] Physics-Informed Deformable Gaussian Splatting: Towards Unified Constitutive Laws for Time-Evolving Material Field\n- **🧑‍🔬 作者**：Haoqin Hong, Ding Fan, Fubin Dou, Zhi-Li Zhou, Haoran Sun, Congcong Zhu, Jingrun Chen\n- **🏫 单位**：University of Science and Technology of China ⟐ University of Illinois Urbana-Champaign ⟐ Suzhou Big Data & AI Research and Engineering Center\n- **🔗 链接**：[[中英摘要](../abs/2511.06299.md)] [[arXiv:2511.06299](https://arxiv.org/abs/2511.06299)] [[Code](https://github.com/SCAILab-USTC/Physics-Informed-Deformable-Gaussian-Splatting)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [7] Rethinking Rainy 3D Scene Reconstruction via Perspective Transforming and Brightness Tuning\n- **🧑‍🔬 作者**：Qianfeng Yang, Xiang Chen, Pengpeng Li, Qiyuan Guan, Guiyue Jin, Jiyu Jin\n- **🏫 单位**：Dalian Polytechnic University ⟐ Nanjing University of Science and Technology\n- **🔗 链接**：[[中英摘要](../abs/2511.06734.md)] [[arXiv:2511.06734](https://arxiv.org/abs/2511.06734)] [[Code](https://github.com/ncfjd/REVR-GSNet)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [8] 4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation\n- **🧑‍🔬 作者**：Mengmeng Liu, Jiuming Liu, Yunpeng Zhang, Jiangtao Li, Michael Ying Yang, Francesco Nex, Hao Cheng\n- **🏫 单位**：University of Twente ⟐ University of Cambridge ⟐ PhiGent Robotics ⟐ University of Bath\n- **🔗 链接**：[[中英摘要](../abs/2511.07241.md)] [[arXiv:2511.07241](https://arxiv.org/abs/2511.07241)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 
2026\n\n#### [9] Perceptual Quality Assessment of 3D Gaussian Splatting: A Subjective Dataset and Prediction Metric\n- **🧑‍🔬 作者**：Zhaolin Wan, Yining Diao, Jingqi Xu, Hao Wang, Zhiyang Li, Xiaopeng Fan, Wangmeng Zuo, Debin Zhao\n- **🏫 单位**：Harbin Institute of Technology ⟐ Dalian Maritime University ⟐ City University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2511.08032.md)] [[arXiv:2511.08032](https://arxiv.org/abs/2511.08032)] [[Code](https://github.com/diaoyn/3DGSQA)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [10] TSPE-GS: Probabilistic Depth Extraction for Semi-Transparent Surface Reconstruction via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhiyuan Xu, Nan Min, Yuhang Guo, Tong Wei\n- **🏫 单位**：Southeast University\n- **🔗 链接**：[[中英摘要](../abs/2511.09944.md)] [[arXiv:2511.09944](https://arxiv.org/abs/2511.09944)] [[Code](https://github.com/nortonii/TSPE-GS)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [11] Multivariate Gaussian Representation Learning for Medical Action Evaluation\n- **🧑‍🔬 作者**：Luming Yang, Haoxian Liu, Siqing Li, Alper Yilmaz\n- **🏫 单位**：The Ohio State University ⟐ Hong Kong University of Science and Technology ⟐ Southern University of Science and Technology\n- **🔗 链接**：[[中英摘要](../abs/2511.10060.md)] [[arXiv:2511.10060](https://arxiv.org/abs/2511.10060)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [12] PINGS-X: Physics-Informed Normalized Gaussian Splatting with Axes Alignment for Efficient Super-Resolution of 4D Flow MRI\n- **🧑‍🔬 作者**：Sun Jo, Seok Young Hong, JinHyun Kim, Seungmin Kang, Ahjin Choi, Don-Gwan An, Simon Song, Je Hyeong Hong\n- **🏫 单位**：Hanyang University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](../abs/2511.11048.md)] [[arXiv:2511.11048](https://arxiv.org/abs/2511.11048)] [[Code](https://github.com/SpatialAILab/PINGS-X)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [13] SRSplat: Feed-Forward Super-Resolution Gaussian Splatting from Sparse Multi-View Images\n- **🧑‍🔬 作者**：Xinyuan Hu, Changyue Shi, Chuxiao Yang, Minghao Chen, Jiajun Ding, Tao Wei, Chen Wei, Zhou Yu, Min Tan\n- **🏫 单位**：Hangzhou Dianzi University ⟐ Peking University\n- **🔗 链接**：[[中英摘要](../abs/2511.12040.md)] [[arXiv:2511.12040](https://arxiv.org/abs/2511.12040)] [[Code](https://github.com/XinyuanHu66/SRSplat_Code)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [14] GUIDE: Gaussian Unified Instance Detection for Enhanced Obstacle Perception in Autonomous Driving\n- **🧑‍🔬 作者**：Chunyong Hu, Qi Luo, Jianyun Xu, Song Wang, Qiang Li, Sheng Yang\n- **🏫 单位**：CaiNiao Inc. 
⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](../abs/2511.12941.md)] [[arXiv:2511.12941](https://arxiv.org/abs/2511.12941)] [[Code](https://github.com/CN-ADLab/GUIDE)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [15] Opt3DGS: Optimizing 3D Gaussian Splatting with Adaptive Exploration and Curvature-Aware Exploitation\n- **🧑‍🔬 作者**：Ziyang Huang, Jiagang Chen, Jin Liu, Shunping Ji\n- **🏫 单位**：Wuhan University ⟐ Hangzhou Dianzi University\n- **🔗 链接**：[[中英摘要](../abs/2511.13571.md)] [[arXiv:2511.13571](https://arxiv.org/abs/2511.13571)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [16] SparseSurf: Sparse-View 3D Gaussian Splatting for Surface Reconstruction\n- **🧑‍🔬 作者**：Meiying Gu, Jiawei Zhang, Jiahe Li, Xiaohan Yu, Haonan Luo, Jin Zheng, Xiao Bai\n- **🏫 单位**：Beihang University ⟐ Macquarie University ⟐ Southwest Jiaotong University ⟐ State Key Laboratory of Virtual Reality Technology and Systems\n- **🔗 链接**：[[中英摘要](../abs/2511.14633.md)] [[arXiv:2511.14633](https://arxiv.org/abs/2511.14633)] [[Code](https://github.com/miya-oi/SparseSurf)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [17] Gaussian Blending: Rethinking Alpha Blending in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Junseo Koo, Jinseo Jeong, Gunhee Kim\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2511.15102.md)] [[arXiv:2511.15102](https://arxiv.org/abs/2511.15102)] [[Code](https://github.com/1207koo/gaussian_blending)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [18] GS-Checker: Tampering Localization for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Haoliang Han, Ziyuan Luo, Jun Qi, Anderson Rocha, Renjie Wan\n- **🏫 单位**：Hong Kong Baptist University ⟐ University of Campinas\n- **🔗 链接**：[[中英摘要](../abs/2511.20354.md)] [[arXiv:2511.20354](https://arxiv.org/abs/2511.20354)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [19] Debiasing Diffusion Priors via 3D Attention for Consistent Gaussian Splatting\n- **🧑‍🔬 作者**：Shilong Jin, Haoran Duan, Litao Hua, Wentao Huang, Yuan Zhou\n- **🏫 单位**：Nanjing University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2512.07345.md)] [[arXiv:2512.07345](https://arxiv.org/abs/2512.07345)] [[Code](https://github.com/kimslong/AAAI26-TDAttn)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [20] UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning\n- **🧑‍🔬 作者**：Ankit Dhiman, Srinath R, Jaswanth Reddy, Lokesh R Boregowda, Venkatesh Babu Radhakrishnan\n- **🏫 单位**：Indian Institute of Science, Bangalore ⟐ Samsung R&D Institute India - Bangalore\n- **🔗 链接**：[[中英摘要](../abs/2512.24763.md)] [[arXiv:2512.24763](https://arxiv.org/abs/2512.24763)] [[Code](https://github.com/val-iisc/UniC-Lift)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [21] OceanSplat: Object-aware Gaussian Splatting with Trinocular View Consistency for Underwater Scene Reconstruction\n- **🧑‍🔬 作者**：Minseong Kweon, Jinsun Park\n- **🏫 单位**：University of Minnesota ⟐ Pusan National University\n- **🔗 链接**：[[中英摘要](../abs/2601.04984.md)] [[arXiv:2601.04984](https://arxiv.org/abs/2601.04984)] [[Code](https://github.com/mnseong/oceansplat)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [22] TG-Field: Geometry-Aware Radiative Gaussian Fields for Tomographic Reconstruction\n- **🧑‍🔬 作者**：Yuxiang Zhong, Jun Wei, Chaoqi Chen, Senyou An, Hui Huang\n- **🏫 单位**：Shenzhen University\n- **🔗 链接**：[[中英摘要](../abs/2602.11705.md)] [[arXiv:2602.11705](https://arxiv.org/abs/2602.11705)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n"
  },
  {
    "path": "2026/Accepted.md",
    "content": "# 3D Gaussian Splatting Papers Accepted in 2026\n\n#### [1] GS-2M: Gaussian Splatting for Joint Mesh Reconstruction and Material Decomposition\n- **🧑‍🔬 作者**：Dinh Minh Nguyen, Malte Avenhaus, Thomas Lindemeier\n- **🏫 单位**：Norwegian University of Science and Technology ⟐ Carl Zeiss AG\n- **🔗 链接**：[[中英摘要](../abs/2509.22276.md)] [[arXiv:2509.22276](https://arxiv.org/abs/2509.22276)] [[Code](https://github.com/ndming/GS-2M)]\n- **📝 说明**: 🏆 Accepted to Eurographics 2026\n\n#### [2] RoGER-SLAM: A Robust Gaussian Splatting SLAM System for Noisy and Low-light Environment Resilience\n- **🧑‍🔬 作者**：Huilin Yin, Zhaolin Yang, Linchuan Zhang, Gerhard Rigoll, Johannes Betz\n- **🏫 单位**：Tongji University ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2510.22600.md)] [[arXiv:2510.22600](https://arxiv.org/abs/2510.22600)] [Code]\n- **📝 说明**: 🏆 Accepted to TIM 2026\n\n#### [3] Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing\n- **🧑‍🔬 作者**：Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, Minyi Guo\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Shanghai Qi Zhi Institute ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](../abs/2511.18755.md)] [[arXiv:2511.18755](https://arxiv.org/abs/2511.18755)] [Code]\n- **📝 说明**: 🏆 Accepted to HPCA 2026\n\n#### [4] GTAvatar: Bridging Gaussian Splatting and Texture Mapping for Relightable and Editable Gaussian Avatars\n- **🧑‍🔬 作者**：Kelian Baert, Mae Younes, Francois Bourel, Marc Christie, Adnane Boukhayma\n- **🏫 单位**：Univ Rennes ⟐ Inria ⟐ CNRS ⟐ IRISA\n- **🔗 链接**：[[中英摘要](../abs/2512.09162.md)] [[arXiv:2512.09162](https://arxiv.org/abs/2512.09162)] [[Code](https://github.com/KelianB/GTAvatar)]\n- **📝 说明**: 🏆 Accepted to Eurographics 2026\n\n#### [5] Lightweight 3D Gaussian Splatting Compression via Video Codec\n- **🧑‍🔬 作者**：Qi Yang, Geert Van Der Auwera, Zhu Li\n- **🏫 单位**：University of Missouri - Kansas City\n- **🔗 链接**：[[中英摘要](../abs/2512.11186.md)] [[arXiv:2512.11186](https://arxiv.org/abs/2512.11186)] [[Code](https://github.com/Qi-Yangsjtu/LGSCV)]\n- **📝 说明**: 🏆 Accepted to DCC 2026\n\n#### [6] Voxel-GS: Quantized Scaffold Gaussian Splatting Compression with Run-Length Coding\n- **🧑‍🔬 作者**：Chunyang Fu, Xiangrui Liu, Shiqi Wang, Zhu Li\n- **🏫 单位**：City University of Hong Kong ⟐ University of Missouri-Kansas City\n- **🔗 链接**：[[中英摘要](../abs/2512.17528.md)] [[arXiv:2512.17528](https://arxiv.org/abs/2512.17528)] [[Code](https://github.com/zb12138/VoxelGS)]\n- **📝 说明**: 🏆 Accepted to DCC 2026\n\n#### [7] GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering\n- **🧑‍🔬 作者**：Junseo Lee, Sangyun Jeon, Jungi Lee, Junyong Park, Jaewoong Sim\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2601.20429.md)] [[arXiv:2601.20429](https://arxiv.org/abs/2601.20429)] [Code]\n- **📝 说明**: 🏆 Accepted to HPCA 2026\n\n#### [8] VRGaussianAvatar: Integrating 3D Gaussian Avatars into VR\n- **🧑‍🔬 作者**：Hail Song, Boram Yoon, Seokhwan Yang, Seoyoung Kang, Hyunjeong Kim, Henning Metzmacher, Woontack Woo\n- **🏫 单位**：KAIST ⟐ Hansung University ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2602.01674.md)] [[arXiv:2602.01674](https://arxiv.org/abs/2602.01674)] [[Code](https://github.com/hailsong/VRGaussianAvatar)]\n- **📝 说明**: 🏆 Accepted to TVCG 2026\n\n#### [9] OFERA: Blendshape-driven 3D Gaussian Control for Occluded Facial Expression to Realistic Avatars in VR\n- **🧑‍🔬 作者**：Seokhwan Yang, Boram Yoon, Seoyoung Kang, Hail Song, Woontack Woo\n- **🏫 单位**：KAIST\n- **🔗 
链接**：[[中英摘要](../abs/2602.01748.md)] [[arXiv:2602.01748](https://arxiv.org/abs/2602.01748)] [Code]\n- **📝 说明**: 🏆 Accepted to TVCG 2026\n\n#### [10] SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization\n- **🧑‍🔬 作者**：Lifan Wu, Ruijie Zhu, Yubo Ai, Tianzhu Zhang\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](../abs/2602.04271.md)] [[arXiv:2602.04271](https://arxiv.org/abs/2602.04271)] [[Code](https://github.com/wusar/SkeletonGaussian)]\n- **📝 说明**: 🏆 Accepted to CVM 2026\n\n#### [11] LeafFit: Plant Assets Creation from 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Chang Luo, Nobuyuki Umetani\n- **🏫 单位**：The University of Tokyo\n- **🔗 链接**：[[中英摘要](../abs/2602.11577.md)] [[arXiv:2602.11577](https://arxiv.org/abs/2602.11577)] [[Code](https://github.com/netbeifeng/leaf_fit)]\n- **📝 说明**: 🏆 Accepted to Eurographics 2026\n"
  },
  {
    "path": "2026/CVPR.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to CVPR2026\n\n#### [1] HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars\n- **🧑‍🔬 作者**：Gent Serifi, Marcel C. Bühler\n- **🏫 单位**：ETH Zurich\n- **🔗 链接**：[[中英摘要](../abs/2507.02803.md)] [[arXiv:2507.02803](https://arxiv.org/abs/2507.02803)] [[Code](https://github.com/gserifi/HyperGaussians)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [2] MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second\n- **🧑‍🔬 作者**：Chenguo Lin, Yuchen Lin, Panwang Pan, Yifan Yu, Honglei Yan, Katerina Fragkiadaki, Yadong Mu\n- **🏫 单位**：Peking University ⟐ ByteDance ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](../abs/2507.10065.md)] [[arXiv:2507.10065](https://arxiv.org/abs/2507.10065)] [[Code](https://github.com/chenguolin/MoVieS)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [3] iLRM: An Iterative Large 3D Reconstruction Model\n- **🧑‍🔬 作者**：Gyeongjin Kang, Seungtae Nam, Xiangyu Sun, Sameh Khamis, Abdelrahman Mohamed, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ Yonsei University ⟐ Rembrand\n- **🔗 链接**：[[中英摘要](../abs/2507.23277.md)] [[arXiv:2507.23277](https://arxiv.org/abs/2507.23277)] [[Code](https://github.com/Gynjn/iLRM)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [4] Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images\n- **🧑‍🔬 作者**：Xiangyu Sun, Haoyi Jiang, Liu Liu, Seungtae Nam, Gyeongjin Kang, Xinjie Wang, Wei Sui, Zhizhong Su, Wenyu Liu, Xinggang Wang, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ Huazhong University of Science & Technology ⟐ Yonsei University ⟐ Horizon Robotics ⟐ D-Robotics\n- **🔗 链接**：[[中英摘要](../abs/2508.03643.md)] [[arXiv:2508.03643](https://arxiv.org/abs/2508.03643)] [[Code](https://github.com/HorizonRobotics/Uni3R)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [5] REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting\n- **🧑‍🔬 作者**：Changyue Shi, Minghao Chen, Yiping Mao, Chuxiao Yang, Xinyuan Hu, Jiajun Ding, Zhou Yu\n- **🏫 单位**：Peking University ⟐ Hangzhou Dianzi University\n- **🔗 链接**：[[中英摘要](../abs/2510.16410.md)] [[arXiv:2510.16410](https://arxiv.org/abs/2510.16410)] [[Code](https://github.com/ChangyueShi/REALM-Code)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [6] Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models\n- **🧑‍🔬 作者**：Panwang Pan, Chenguo Lin, Jingjing Zhao, Chenxin Li, Yuchen Lin, Haopeng Li, Honglei Yan, Kairun Wen, Yunlong Lin, Yixuan Yuan, Yadong Mu\n- **🏫 单位**：Peking University ⟐ The Chinese University of Hong Kong ⟐ Xiamen University\n- **🔗 链接**：[[中英摘要](../abs/2511.00503.md)] [[arXiv:2511.00503](https://arxiv.org/abs/2511.00503)] [[Code](https://github.com/paulpanwang/Diff4Splat)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [7] FastGS: Training 3D Gaussian Splatting in 100 Seconds\n- **🧑‍🔬 作者**：Shiwei Ren, Tianci Wen, Yongchun Fang, Biao Lu\n- **🏫 单位**：Nankai University\n- **🔗 链接**：[[中英摘要](../abs/2511.04283.md)] [[arXiv:2511.04283](https://arxiv.org/abs/2511.04283)] [[Code](https://github.com/fastgs/FastGS)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [8] STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction\n- **🧑‍🔬 作者**：Jiankuo Zhao, Xiangyu Zhu, Zidu Wang, Zhen Lei\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](../abs/2511.19854.md)] 
[[arXiv:2511.19854](https://arxiv.org/abs/2511.19854)] [[Code](https://github.com/LCFAW/STAvatar)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [9] EcoSplat: Efficiency-controllable Feed-forward 3D Gaussian Splatting from Multi-view Images\n- **🧑‍🔬 作者**：Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, Munchurl Kim\n- **🏫 单位**：KAIST ⟐ Chung-Ang University\n- **🔗 链接**：[[中英摘要](../abs/2512.18692.md)] [[arXiv:2512.18692](https://arxiv.org/abs/2512.18692)] [[Code](https://github.com/KAIST-VICLab/EcoSplat)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [10] GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation\n- **🧑‍🔬 作者**：Tianchen Deng, Xuefeng Chen, Yi Chen, Qu Chen, Yuyao Xu, Lijin Yang, Le Xu, Yu Zhang, Bo Zhang, Wuxiong Huang, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Tsinghua University ⟐ MEGVII Technology ⟐ Mach Drive\n- **🔗 链接**：[[中英摘要](../abs/2512.23180.md)] [[arXiv:2512.23180](https://arxiv.org/abs/2512.23180)] [[Code](https://github.com/dtc111111/GaussianDWM)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [11] ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking\n- **🧑‍🔬 作者**：Xiaobao Wei, Zhangjie Ye, Yuxiang Gu, Zunjie Zhu, Yunfei Guo, Yingying Shen, Shan Zhao, Ming Lu, Haiyang Sun, Bing Wang, Guang Chen, Rongfeng Lu, Hangjun Ye\n- **🏫 单位**：Xiaomi EV ⟐ Hangzhou Dianzi University\n- **🔗 链接**：[[中英摘要](../abs/2601.01386.md)] [[arXiv:2601.01386](https://arxiv.org/abs/2601.01386)] [[Code](https://github.com/wm-research/ParkGaussian)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [12] Faster-GS: Analyzing and Improving Gaussian Splatting Optimization\n- **🧑‍🔬 作者**：Florian Hahlbohm, Linus Franke, Martin Eisemann, Marcus Magnor\n- **🏫 单位**：TU Braunschweig ⟐ Inria ⟐ Université Côte d’Azur\n- **🔗 链接**：[[中英摘要](../abs/2602.09999.md)] [[arXiv:2602.09999](https://arxiv.org/abs/2602.09999)] [[Code](https://github.com/nerficg-project/faster-gaussian-splatting)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [13] B$^3$-Seg: Camera-Free, Training-Free 3DGS Segmentation via Analytic EIG and Beta-Bernoulli Bayesian Updates\n- **🧑‍🔬 作者**：Hiromichi Kamata, Samuel Arthur Munro, Fuminori Homma\n- **🏫 单位**：Sony Group Corporation ⟐ Pixomondo\n- **🔗 链接**：[[中英摘要](../abs/2602.17134.md)] [[arXiv:2602.17134](https://arxiv.org/abs/2602.17134)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [14] RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing\n- **🧑‍🔬 作者**：Kaifa Yang, Qi Yang, Yiling Xu, Zhu Li\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Missouri–Kansas City\n- **🔗 链接**：[[中英摘要](../abs/2602.19753.md)] [[arXiv:2602.19753](https://arxiv.org/abs/2602.19753)] [[Code](https://github.com/yyyykf/RAP)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [15] tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction\n- **🧑‍🔬 作者**：Chen Wang, Hao Tan, Wang Yifan, Zhiqin Chen, Yuheng Liu, Kalyan Sunkavalli, Sai Bi, Lingjie Liu, Yiwei Hu\n- **🏫 单位**：University of Pennsylvania ⟐ Adobe Research ⟐ UCI\n- **🔗 链接**：[[中英摘要](../abs/2602.20160.md)] [[arXiv:2602.20160](https://arxiv.org/abs/2602.20160)] [[Code](https://github.com/cwchenwang/tttLRM)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [16] RU4D-SLAM: Reweighting Uncertainty in Gaussian Splatting SLAM for 4D Scene Reconstruction\n- **🧑‍🔬 作者**：Yangfan Zhao, Hanwei Zhang, Ke Huang, Qiufeng Wang, Zhenzhou Shao, Dengyu Wu\n- **🏫 单位**：Capital Normal 
University ⟐ Saarland University ⟐ Xi’an Jiaotong-Liverpool University ⟐ King’s College London\n- **🔗 链接**：[[中英摘要](../abs/2602.20807.md)] [[arXiv:2602.20807](https://arxiv.org/abs/2602.20807)] [[Code](https://github.com/CNU-Bot-Group/ru4dslam)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026 Findings Track\n\n#### [17] Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting\n- **🧑‍🔬 作者**：Shuangkang Fang, I-Chao Shen, Xuanyang Zhang, Zesheng Wang, Yufeng Wang, Wenrui Ding, Gang Yu, Takeo Igarashi\n- **🏫 单位**：Beihang University ⟐ The University of Tokyo ⟐ StepFun\n- **🔗 链接**：[[中英摘要](../abs/2602.20933.md)] [[arXiv:2602.20933](https://arxiv.org/abs/2602.20933)] [[Code](https://github.com/Fangkang515/DropAnSH-GS)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [18] BrepGaussian: CAD reconstruction from Multi-View Images with Gaussian Splatting\n- **🧑‍🔬 作者**：Jiaxing Yu, Dongyang Ren, Hangyu Xu, Zhouyuxiao Yang, Yuanqi Li, Jie Guo, Zhengkang Zhou, Yanwen Guo\n- **🏫 单位**：Nanjing University ⟐ Nanjing Bridge Intelligent Management Co., Ltd.\n- **🔗 链接**：[[中英摘要](../abs/2602.21105.md)] [[arXiv:2602.21105](https://arxiv.org/abs/2602.21105)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [19] AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction\n- **🧑‍🔬 作者**：Hanyang Liu, Rongjun Qin\n- **🏫 单位**：The Ohio State University\n- **🔗 链接**：[[中英摘要](../abs/2602.22376.md)] [[arXiv:2602.22376](https://arxiv.org/abs/2602.22376)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [20] OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution\n- **🧑‍🔬 作者**：Chong Xia, Fangfu Liu, Yule Wang, Yize Pang, Yueqi Duan\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](../abs/2603.02134.md)] [[arXiv:2603.02134](https://arxiv.org/abs/2603.02134)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2026 Findings Track\n\n#### [21] EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding\n- **🧑‍🔬 作者**：Seungjun Lee, Zihan Wang, Yunsong Wang, Gim Hee Lee\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2603.04254.md)] [[arXiv:2603.04254](https://arxiv.org/abs/2603.04254)] [[Code](https://github.com/0nandon/EmbodiedSplat)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [22] Speeding Up the Learning of 3D Gaussians with Much Shorter Gaussian Lists\n- **🧑‍🔬 作者**：Jiaqi Liu, Zhizhong Han\n- **🏫 单位**：Wayne State University\n- **🔗 链接**：[[中英摘要](../abs/2603.09277.md)] [[arXiv:2603.09277](https://arxiv.org/abs/2603.09277)] [[Code](https://github.com/MachinePerceptionLab/ShorterSplatting)]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n\n#### [23] VarSplat: Uncertainty-aware 3D Gaussian Splatting for Robust RGB-D SLAM\n- **🧑‍🔬 作者**：Anh Thuan Tran, Jana Kosecka\n- **🏫 单位**：George Mason University\n- **🔗 链接**：[[中英摘要](../abs/2603.09673.md)] [[arXiv:2603.09673](https://arxiv.org/abs/2603.09673)] [Code]\n- **📝 说明**: 🏆 Accepted to CVPR 2026\n"
  },
  {
    "path": "2026/ICASSP.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICASSP2026\n\n#### [1] Gaussian Mesh Renderer for Lightweight Differentiable Rendering\n- **🧑‍🔬 作者**：Xinpeng Liu, Fumio Okura\n- **🏫 单位**：The University of Osaka\n- **🔗 链接**：[[中英摘要](../abs/2602.14493.md)] [[arXiv:2602.14493](https://arxiv.org/abs/2602.14493)] [[Code](https://github.com/huntorochi/Gaussian-Mesh-Renderer)]\n- **📝 说明**: 🏆 Accepted to ICASSP 2026\n\n#### [2] Physics-Driven 3D Gaussian Rendering for Zero-Shot MRI Super-Resolution\n- **🧑‍🔬 作者**：Shuting Liu, Lei Zhang, Wei Huang, Zhao Zhang, Zizhou Wang\n- **🏫 单位**：Sichuan University ⟐ A*STAR\n- **🔗 链接**：[[中英摘要](../abs/2603.09621.md)] [[arXiv:2603.09621](https://arxiv.org/abs/2603.09621)] [Code]\n- **📝 说明**: 🏆 Accepted to ICASSP 2026\n"
  },
  {
    "path": "2026/ICLR.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICLR2026\n\n#### [1] Gradient-Direction-Aware Density Control for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Jia-Chen Zhang, Hong-Jian Zhan\n- **🏫 单位**：Shanghai University of Engineering Science\n- **🔗 链接**：[[中英摘要](../abs/2508.09239.md)] [[arXiv:2508.09239](https://arxiv.org/abs/2508.09239)] [[Code](https://github.com/zzcqz/GDAGS)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [2] FastAvatar: Towards Unified Fast High-Fidelity 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers\n- **🧑‍🔬 作者**：Yue Wu, Yufan Wu, Wen Li, Yuxi Lu, Kairui Feng, Xuanhong Chen\n- **🏫 单位**：Tongji University ⟐ Shanghai Innovation Institute ⟐ Shanghai Jiao Tong University ⟐ AKool\n- **🔗 链接**：[[中英摘要](../abs/2508.19754.md)] [[arXiv:2508.19754](https://arxiv.org/abs/2508.19754)] [[Code](https://github.com/TyrionWuYue/FastAvatar)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [3] MEGS2: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning\n- **🧑‍🔬 作者**：Jiarui Chen, Yikeng Chen, Yingshuang Zou, Ye Huang, Peng Wang, Yuan Liu, Yujing Sun, Wenping Wang\n- **🏫 单位**：HKUST ⟐ SZU ⟐ SYSU ⟐ Adobe ⟐ NTU ⟐ TAMU\n- **🔗 链接**：[[中英摘要](../abs/2509.07021.md)] [[arXiv:2509.07021](https://arxiv.org/abs/2509.07021)] [[Code](https://github.com/IGL-HKUST/MEGS-2)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [4] Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation\n- **🧑‍🔬 作者**：Sherwin Bahmani, Tianchang Shen, Jiawei Ren, Jiahui Huang, Yifeng Jiang, Haithem Turki, Andrea Tagliasacchi, David B. Lindell, Zan Gojcic, Sanja Fidler, Huan Ling, Jun Gao, Xuanchi Ren\n- **🏫 单位**：NVIDIA ⟐ University of Toronto ⟐ Vector Institute ⟐ Simon Fraser University\n- **🔗 链接**：[[中英摘要](../abs/2509.19296.md)] [[arXiv:2509.19296](https://arxiv.org/abs/2509.19296)] [[Code](https://github.com/nv-tlabs/lyra)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [5] WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving\n- **🧑‍🔬 作者**：Ziyue Zhu, Zhanqian Wu, Zhenxin Zhu, Lijun Zhou, Haiyang Sun, Bing Wan, Kun Ma, Guang Chen, Hangjun Ye, Jin Xie, Jian Yang\n- **🏫 单位**：Nankai University ⟐ Xiaomi EV ⟐ Nanjing University\n- **🔗 链接**：[[中英摘要](../abs/2509.23402.md)] [[arXiv:2509.23402](https://arxiv.org/abs/2509.23402)] [[Code](https://github.com/wm-research/worldsplat)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [6] Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting\n- **🧑‍🔬 作者**：Hanzhou Liu, Jia Huang, Mi Lu, Srikanth Saripalli, Peng Jiang\n- **🏫 单位**：Texas A&M University\n- **🔗 链接**：[[中英摘要](../abs/2509.26455.md)] [[arXiv:2509.26455](https://arxiv.org/abs/2509.26455)] [[Code](https://github.com/HanzhouLiu/StylOS)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [7] Universal Beta Splatting\n- **🧑‍🔬 作者**：Rong Liu, Zhongpai Gao, Benjamin Planche, Meida Chen, Van Nguyen Nguyen, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Yue Wang, Andrew Feng, Ziyan Wu\n- **🏫 单位**：University of Southern California ⟐ United Imaging Intelligence\n- **🔗 链接**：[[中英摘要](../abs/2510.03312.md)] [[arXiv:2510.03312](https://arxiv.org/abs/2510.03312)] [[Code](https://github.com/RongLiu-Leo/universal-beta-splatting)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [8] Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction\n- **🧑‍🔬 作者**：Chi Yan, Dan Xu\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ ZEEKR Automobile R&D Co., Ltd\n- **🔗 
链接**：[[中英摘要](../abs/2510.04759.md)] [[arXiv:2510.04759](https://arxiv.org/abs/2510.04759)] [[Code](https://github.com/yanchi-3dv/PG-Occ)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [9] ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation\n- **🧑‍🔬 作者**：Guanghao Li, Kerui Ren, Linning Xu, Zhewen Zheng, Changjian Jiang, Xin Gao, Bo Dai, Jian Pu, Mulin Yu, Jiangmiao Pang\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory ⟐ Fudan University ⟐ Shanghai Innovation Institute ⟐ Shanghai Jiao Tong University ⟐ The Chinese University of Hong Kong ⟐ Carnegie Mellon University ⟐ Zhejiang University ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2510.08551.md)] [[arXiv:2510.08551](https://arxiv.org/abs/2510.08551)] [[Code](https://github.com/InternRobotics/ARTDECO)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [10] D2GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction\n- **🧑‍🔬 作者**：Meixi Song, Xin Lin, Dizhe Zhang, Haodong Li, Xiangtai Li, Bo Du, Lu Qi\n- **🏫 单位**：Insta360 Research ⟐ Tsinghua University ⟐ University of California, San Diego ⟐ Nanyang Technological University ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](../abs/2510.08566.md)] [[arXiv:2510.08566](https://arxiv.org/abs/2510.08566)] [[Code](https://github.com/Insta360-Research-Team/DDGS)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [11] CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhigang Cheng, Mingchao Sun, Yu Liu, Zengye Ge, Luyang Tang, Mu Xu, Yangyan Li, Peng Pan\n- **🏫 单位**：Tsinghua University ⟐ AMAP ⟐ Ant Group\n- **🔗 链接**：[[中英摘要](../abs/2510.09997.md)] [[arXiv:2510.09997](https://arxiv.org/abs/2510.09997)] [Code]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [12] Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer\n- **🧑‍🔬 作者**：Yecong Wan, Mingwen Shao, Renlong Wu, Wangmeng Zuo\n- **🏫 单位**：Harbin Institute of Technology ⟐ Shenzhen University of Advanced Technology\n- **🔗 链接**：[[中英摘要](../abs/2510.10152.md)] [[arXiv:2510.10152](https://arxiv.org/abs/2510.10152)] [[Code](https://github.com/yecongwan/Color3D)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [13] G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior\n- **🧑‍🔬 作者**：Junfeng Ni, Yixin Chen, Zhifei Yang, Yu Liu, Ruijie Lu, Song-Chun Zhu, Siyuan Huang\n- **🏫 单位**：Tsinghua University ⟐ BIGAI ⟐ Peking University\n- **🔗 链接**：[[中英摘要](../abs/2510.12099.md)] [[arXiv:2510.12099](https://arxiv.org/abs/2510.12099)] [[Code](https://github.com/DaLi-Jack/G4Splat)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [14] Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction\n- **🧑‍🔬 作者**：Fengzhi Guo, Chih-Chuan Hsu, Sihao Ding, Cheng Zhang\n- **🏫 单位**：Texas A&M University ⟐ Mercedes-Benz North America\n- **🔗 链接**：[[中英摘要](../abs/2510.12768.md)] [[arXiv:2510.12768](https://arxiv.org/abs/2510.12768)] [[Code](https://github.com/TAMU-Visual-AI/usplat4d)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [15] VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator\n- **🧑‍🔬 作者**：Hyojun Go, Dominik Narnhofer, Goutam Bhat, Prune Truong, Federico Tombari, Konrad Schindler\n- **🏫 单位**：ETH Zurich ⟐ Google\n- **🔗 链接**：[[中英摘要](../abs/2510.13454.md)] [[arXiv:2510.13454](https://arxiv.org/abs/2510.13454)] [[Code](https://github.com/gohyojun15/VIST3A)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026 Oral\n\n#### [16] FlashWorld: High-quality 3D Scene Generation within Seconds\n- **🧑‍🔬 
作者**：Xinyang Li, Tengfei Wang, Zixiao Gu, Shengchuan Zhang, Chunchao Guo, Liujuan Cao\n- **🏫 单位**：Xiamen University ⟐ Tencent ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](../abs/2510.13678.md)] [[arXiv:2510.13678](https://arxiv.org/abs/2510.13678)] [[Code](https://github.com/imlixinyang/FlashWorld)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026 Oral\n\n#### [17] MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting\n- **🧑‍🔬 作者**：In-Hwan Jin, Hyeongju Mun, Joonsoo Kim, Kugjin Yun, Kyeongbo Kong\n- **🏫 单位**：Pusan National University ⟐ Electronics and Telecommunications Research Institute\n- **🔗 链接**：[[中英摘要](../abs/2510.19210.md)] [[arXiv:2510.19210](https://arxiv.org/abs/2510.19210)] [Code]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [18] UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction\n- **🧑‍🔬 作者**：Chen Shi, Shaoshuai Shi, Xiaoyang Lyu, Chunyang Liu, Kehua Sheng, Bo Zhang, Li Jiang\n- **🏫 单位**：The Chinese University of Hong Kong, Shenzhen ⟐ Didi Chuxing ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](../abs/2511.04595.md)] [[arXiv:2511.04595](https://arxiv.org/abs/2511.04595)] [[Code](https://github.com/chenshi3/UniSplat)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [19] Sharp Monocular View Synthesis in Less Than a Second\n- **🧑‍🔬 作者**：Lars Mescheder, Wei Dong, Shiwei Li, Xuyang Bai, Marcel Santos, Peiyun Hu, Bruno Lecouat, Mingmin Zhen, Amaël Delaunoy, Tian Fang, Yanghai Tsin, Stephan R. Richter, Vladlen Koltun\n- **🏫 单位**：Apple\n- **🔗 链接**：[[中英摘要](../abs/2512.10685.md)] [[arXiv:2512.10685](https://arxiv.org/abs/2512.10685)] [[Code](https://github.com/apple/ml-sharp)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [20] A Step to Decouple Optimization in 3DGS\n- **🧑‍🔬 作者**：Renjie Ding, Yaonan Wang, Min Liu, Jialin Zhu, Jiazheng Wang, Jiahao Zhao, Wenting Shen, Feixiang He, Xiang Chen\n- **🏫 单位**：Hunan University ⟐ Baidu ⟐ Central South University\n- **🔗 链接**：[[中英摘要](../abs/2601.16736.md)] [[arXiv:2601.16736](https://arxiv.org/abs/2601.16736)] [Code]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [21] SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors\n- **🧑‍🔬 作者**：Bing He, Jingnan Gao, Yunuo Chen, Ning Cao, Gang Chen, Zhengxue Cheng, Li Song, Wenjun Zhang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Tianyi Shilian Technology Co., Ltd.\n- **🔗 链接**：[[中英摘要](../abs/2602.02000.md)] [[arXiv:2602.02000](https://arxiv.org/abs/2602.02000)] [[Code](https://github.com/hebing-sjtu/SurfSplat)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [22] UrbanGS: A Scalable and Efficient Architecture for Geometrically Accurate Large-Scene Reconstruction\n- **🧑‍🔬 作者**：Changbai Li, Haodong Zhu, Hanlin Chen, Xiuping Liang, Tongfei Chen, Shuwei Shao, Linlin Yang, Huobin Tan, Baochang Zhang\n- **🏫 单位**：Beihang University ⟐ Communication University of China ⟐ National University of Singapore ⟐ Shandong University ⟐ Lobachevsky State University\n- **🔗 链接**：[[中英摘要](../abs/2602.02089.md)] [[arXiv:2602.02089](https://arxiv.org/abs/2602.02089)] [Code]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [23] Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation\n- **🧑‍🔬 作者**：David Shavin, Sagie Benaim\n- **🏫 单位**：The Hebrew University of Jerusalem\n- **🔗 链接**：[[中英摘要](../abs/2602.06032.md)] [[arXiv:2602.06032](https://arxiv.org/abs/2602.06032)] [[Code](https://github.com/davidshavin4/Splat-and-Distill)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [24] Augmented Radiance Field: A General Framework for 
Enhanced Gaussian Splatting\n- **🧑‍🔬 作者**：Yixin Yang, Bojian Wu, Yang Zhou, Hui Huang\n- **🏫 单位**：Shenzhen University ⟐ Tencent Games\n- **🔗 链接**：[[中英摘要](../abs/2602.19916.md)] [[arXiv:2602.19916](https://arxiv.org/abs/2602.19916)] [Code]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [25] Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement\n- **🧑‍🔬 作者**：Hao Ai, Wenjie Chang, Jianbo Jiao, Ales Leonardis, Ofek Eyal\n- **🏫 单位**：University of Birmingham ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](../abs/2603.02910.md)] [[arXiv:2603.02910](https://arxiv.org/abs/2603.02910)] [[Code](https://github.com/haoai-1997/AiM)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n\n#### [26] DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics\n- **🧑‍🔬 作者**：Yuanhang Lei, Boming Zhao, Zesong Yang, Xingxuan Li, Tao Cheng, Haocheng Peng, Ru Zhang, Yang Yang, Siyuan Huang, Yujun Shen, Ruizhen Hu, Hujun Bao, Zhaopeng Cui\n- **🏫 单位**：Zhejiang University ⟐ BIGAI ⟐ Ant Group ⟐ Shenzhen University\n- **🔗 链接**：[[中英摘要](../abs/2603.09668.md)] [[arXiv:2603.09668](https://arxiv.org/abs/2603.09668)] [[Code](https://github.com/zju3dv/DiffWind)]\n- **📝 说明**: 🏆 Accepted to ICLR 2026\n"
  },
  {
    "path": "2026/ICRA.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to ICRA2026\n\n#### [1] RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI\n- **🧑‍🔬 作者**：Cong Tai, Zhaoyu Zheng, Haixu Long, Hansheng Wu, Haodong Xiang, Zhengbin Long, Jun Xiong, Rong Shi, Shizhuang Zhang, Gang Qiu, He Wang, Ruifeng Li, Jun Huang, Bin Chang, Shuai Feng, Tao Shen\n- **🏫 单位**：ZTE Corporation ⟐ The Chinese University of Hong Kong, Shenzhen\n- **🔗 链接**：[[中英摘要](../abs/2509.14687.md)] [[arXiv:2509.14687](https://arxiv.org/abs/2509.14687)] [[Code](https://github.com/terminators2025/RealMirror)]\n- **📝 说明**: 🏆 Accepted to ICRA 2026\n\n#### [2] Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation\n- **🧑‍🔬 作者**：Maggie Wang, Stephen Tian, Aiden Swann, Ola Shorinwa, Jiajun Wu, Mac Schwager\n- **🏫 单位**：Stanford University ⟐ Princeton University\n- **🔗 链接**：[[中英摘要](../abs/2510.11689.md)] [[arXiv:2510.11689](https://arxiv.org/abs/2510.11689)] [Code]\n- **📝 说明**: 🏆 Accepted to ICRA 2026\n\n#### [3] Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes\n- **🧑‍🔬 作者**：Seunghoon Jeong, Eunho Lee, Jeongyun Kim, Ayoung Kim\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](../abs/2602.08266.md)] [[arXiv:2602.08266](https://arxiv.org/abs/2602.08266)] [Code]\n- **📝 说明**: 🏆 Accepted to ICRA 2026\n\n#### [4] GaussianCaR: Gaussian Splatting for Efficient Camera-Radar Fusion\n- **🧑‍🔬 作者**：Santiago Montiel-Marín, Miguel Antunes-García, Fabio Sánchez-García, Angel Llamazares, Holger Caesar, Luis M. Bergasa\n- **🏫 单位**：University of Alcalá ⟐ Delft University of Technology\n- **🔗 链接**：[[中英摘要](../abs/2602.08784.md)] [[arXiv:2602.08784](https://arxiv.org/abs/2602.08784)] [Code]\n- **📝 说明**: 🏆 Accepted to ICRA 2026\n\n#### [5] MipSLAM: Alias-Free Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Yingzhao Li, Yan Li, Shixiong Tian, Yanjie Liu, Lijun Zhao, Gim Hee Lee\n- **🏫 单位**：Harbin Institute of Technology ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](../abs/2603.06989.md)] [[arXiv:2603.06989](https://arxiv.org/abs/2603.06989)] [[Code](https://github.com/yzli1998/MipSLAM)]\n- **📝 说明**: 🏆 Accepted to ICRA 2026\n"
  },
  {
    "path": "2026/WACV.md",
    "content": "# 3D Gaussian Splatting Papers Accepted to WACV2026\n\n#### [1] Optimization-Free Style Transfer for 3D Gaussian Splats\n- **🧑‍🔬 作者**：Raphael Du Sablon, David Hart\n- **🏫 单位**：East Carolina University\n- **🔗 链接**：[[中英摘要](../abs/2508.05813.md)] [[arXiv:2508.05813](https://arxiv.org/abs/2508.05813)] [[Code](https://github.com/davidmhart/FastSplatStyler)]\n- **📝 说明**: 🏆 Accepted to WACV 2026\n\n#### [2] Inpaint360GS: Efficient Object-Aware 3D Inpainting via Gaussian Splatting for 360° Scenes\n- **🧑‍🔬 作者**：Shaoxiang Wang, Shihong Zhang, Christen Millerdurai, Rüdiger Westermann, Didier Stricker, Alain Pagani\n- **🏫 单位**：German Research Center for Artificial Intelligence ⟐ RPTU ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](../abs/2511.06457.md)] [[arXiv:2511.06457](https://arxiv.org/abs/2511.06457)] [[Code](https://github.com/dfki-av/Inpaint360GS)]\n- **📝 说明**: 🏆 Accepted to WACV 2026\n\n#### [3] SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering\n- **🧑‍🔬 作者**：Laura Bragagnolo, Leonardo Barcellona, Stefano Ghidoni\n- **🏫 单位**：University of Padova ⟐ University of Amsterdam\n- **🔗 链接**：[[中英摘要](../abs/2511.08294.md)] [[arXiv:2511.08294](https://arxiv.org/abs/2511.08294)] [[Code](https://github.com/laurabragagnolo/SkelSplat)]\n- **📝 说明**: 🏆 Accepted to WACV 2026\n\n#### [4] CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yu-Jen Tseng, Chia-Hao Kao, Jing-Zhong Chen, Alessandro Gnutti, Shao-Yuan Lo, Yen-Yu Lin, Wen-Hsiao Peng\n- **🏫 单位**：National Yang Ming Chiao Tung University ⟐ University of Brescia ⟐ National Taiwan University\n- **🔗 链接**：[[中英摘要](../abs/2601.12814.md)] [[arXiv:2601.12814](https://arxiv.org/abs/2601.12814)] [Code]\n- **📝 说明**: 🏆 Accepted to WACV 2026\n"
  },
  {
    "path": "Changelog.md",
    "content": "# Changelog\n\n### 2026/03/11\n\nAdd \"GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System\"\n\nAdd \"ProGS: Towards Progressive Coding for 3D Gaussian Splatting\"\n\nAdd \"VarSplat: Uncertainty-aware 3D Gaussian Splatting for Robust RGB-D SLAM\"\n\nAdd \"DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics\"\n\nAdd \"X-GS: An Extensible Open Framework Unifying 3DGS Architectures with Downstream Multimodal Models\"\n\nAdd \"Physics-Driven 3D Gaussian Rendering for Zero-Shot MRI Super-Resolution\"\n\nAdd \"DenoiseSplat: Feed-Forward Gaussian Splatting for Noisy 3D Scene Reconstruction\"\n\nAdd \"Speeding Up the Learning of 3D Gaussians with Much Shorter Gaussian Lists\"\n\nAdd \"GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models\"\n\nAdd \"SkipGS: Post-Densification Backward Skipping for Efficient 3DGS Training\"\n\nAdd \"SurgCalib: Gaussian Splatting-Based Hand-Eye Calibration for Robot-Assisted Minimally Invasive Surgery\"\n\nAdd \"Where, What, Why: Toward Explainable 3D-GS Watermarking\"\n\nAdd \"ImprovedGS+: A High-Performance C++/CUDA Re-Implementation Strategy for 3D Gaussian Splatting\"\n\nAdd \"Spherical-GOF: Geometry-Aware Panoramic Gaussian Opacity Fields for 3D Scene Reconstruction\"\n\nAdd \"DynamicVGGT: Learning Dynamic Point Maps for 4D Scene Reconstruction in Autonomous Driving\"\n\nAdd \"Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence\"\n\nAdd \"EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation\"\n\nAdd \"3DGS-HPC: Distractor-free 3D Gaussian Splatting with Hybrid Patch-wise Classification\"\n\nAdd \"ReconDrive: Fast Feed-Forward 4D Gaussian Splatting for Autonomous Driving Scene Reconstruction\"\n\nAdd \"MipSLAM: Alias-Free Gaussian Splatting SLAM\"\n\nAdd \"ColonSplat: Reconstruction of Peristaltic Motion in Colonoscopy with Dynamic Gaussian Splatting\"\n\nAdd \"Active View Selection with Perturbed Gaussian Ensemble for Tomographic Reconstruction\"\n\nAdd \"EntON: Eigenentropy-Optimized Neighborhood Densification in 3D Gaussian Splatting\"\n\nAdd \"VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction\"\n\nAdd \"Transforming Omnidirectional RGB-LiDAR data into 3D Gaussian Splatting\"\n\nAdd \"FTSplat: Feed-forward Triangle Splatting Network\"\n\nAdd \"CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis\"\n\nAdd \"Cog2Gen3D: Sculpturing 3D Semantic-Geometric Cognition for 3D Generation\"\n\nAdd \"SSR-GS: Separating Specular Reflection in Gaussian Splatting for Glossy Surface Reconstruction\"\n\nAdd \"GaussTwin: Unified Simulation and Correction with Gaussian Splatting for Robotic Digital Twins\"\n\nAdd \"GloSplat: Joint Pose-Appearance Optimization for Faster and More Accurate 3D Reconstruction\"\n\nAdd \"DSA-SRGS: Super-Resolution Gaussian Splatting for Dynamic Sparse-View DSA Reconstruction\"\n\nAdd \"Gaussian Wardrobe: Compositional 3D Gaussian Avatars for Free-Form Virtual Try-On\"\n\nAdd \"EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding\"\n\nAdd \"DM-CFO: A Diffusion Model for Compositional 3D Tooth Generation with Collision-Free Optimization\"\n\nAdd \"VIRGi: View-dependent Instant Recoloring of 3D Gaussians Splats\"\n\nAdd \"Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement\"\n\nAdd \"Intrinsic 
Geometry-Appearance Consistency Optimization for Sparse-View Gaussian Splatting\"\n\nAdd \"Generalized non-exponential Gaussian splatting\"\n\nAdd \"Multimodal-Prior-Guided Importance Sampling for Hierarchical Gaussian Splatting in Sparse-View Novel View Synthesis\"\n\nAdd \"R3GW: Relightable 3D Gaussians for Outdoor Scenes in the Wild\"\n\nAdd \"SemGS: Feed-Forward Semantic 3D Gaussian Splatting from Sparse Views for Generalizable Scene Understanding\"\n\nAdd \"OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution\"\n\nAdd \"LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation\"\n\nAdd \"Sparse View Distractor-Free Gaussian Splatting\"\n\nAdd \"FLICKER: A Fine-Grained Contribution-Aware Accelerator for Real-Time 3D Gaussian Splatting\"\n\nAdd \"HeroGS: Hierarchical Guidance for Robust 3D Gaussian Splatting under Sparse Views\"\n\nAdd \"Decoupling Motion and Geometry in 4D Gaussian Splatting\"\n\nAdd \"TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction\"\n\nAdd \"Zero-Shot Robotic Manipulation via 3D Gaussian Splatting-Enhanced Multimodal Retrieval-Augmented Generation\"\n\nAdd \"ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models\"\n\nAdd \"UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images\"\n\nAdd \"GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction\"\n\nAdd \"Prune Wisely, Reconstruct Sharply: Compact 3D Gaussian Splatting via Adaptive Pruning and Difference-of-Gaussian Primitives\"\n\nAdd \"DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer\"\n\nAdd \"SR3R: Rethinking Super-Resolution 3D Reconstruction With Feed-Forward Gaussian Splatting\"\n\nAdd \"No Calibration, No Depth, No Problem: Cross-Sensor View Synthesis with 3D Consistency\"\n\nAdd \"Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking\"\n\nAdd \"GSTurb: Gaussian Splatting for Atmospheric Turbulence Mitigation\"\n\nAdd \"Sapling-NeRF: Geo-Localised Sapling Reconstruction in Forests for Ecological Monitoring\"\n\nAdd \"ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals\"\n\nAdd \"BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model\"\n\nAdd \"GIFSplat: Generative Prior-Guided Iterative Feed-Forward 3D Gaussian Splatting from Sparse Views\"\n\nAdd \"SwiftNDC: Fast Neural Depth Correction for High-Fidelity 3D Reconstruction\"\n\nAdd \"AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction\"\n\nAdd \"Interactive Augmented Reality-enabled Outdoor Scene Visualization For Enhanced Real-time Disaster Response\"\n\nAdd \"DAGS-SLAM: Dynamic-Aware 3DGS SLAM via Spatiotemporal Motion Probability and Uncertainty-Aware Scheduling\"\n\nAdd \"Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting\"\n\nAdd \"M-Gaussian: A Magnetic Gaussian Framework for Efficient Multi-Stack MRI Reconstruction\"\n\nAdd \"RU4D-SLAM: Reweighting Uncertainty in Gaussian Splatting SLAM for 4D Scene Reconstruction\"\n\nAdd \"Monocular Endoscopic Tissue 3D Reconstruction with Multi-Level Geometry Regularization\"\n\nAdd \"WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos\"\n\nAdd \"Aesthetic Camera Viewpoint Suggestion with 3D Aesthetic Field\"\n\nAdd \"Large-scale Photorealistic Outdoor 3D Scene Reconstruction from 
UAV Imagery Using Gaussian Splatting Techniques\"\n\nAdd \"tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction\"\n\nAdd \"Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting\"\n\nAdd \"RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing\"\n\nAdd \"DefenseSplat: Enhancing the Robustness of 3D Gaussian Splatting via Frequency-Aware Filtering\"\n\nAdd \"Spatial-Temporal State Propagation Autoregressive Model for 4D Object Generation\"\n\nAdd \"Unifying Color and Lightness Correction with View-Adaptive Curve Adjustment for Robust 3D Novel View Synthesis\"\n\nAdd \"4D Monocular Surgical Reconstruction under Arbitrary Camera Motions\"\n\nAdd \"NRGS-SLAM: Monocular Non-Rigid SLAM for Endoscopy via Deformation-Aware 3D Gaussian Splatting\"\n\nAdd \"B$^3$-Seg: Camera-Free, Training-Free 3DGS Segmentation via Analytic EIG and Beta-Bernoulli Bayesian Updates\"\n\nAdd \"3D Scene Rendering with Multimodal Gaussian Splatting\"\n\nAdd \"i-PhysGaussian: Implicit Physical Simulation for 3D Gaussian Splatting\"\n\nAdd \"Semantic-Guided 3D Gaussian Splatting for Transient Object Removal\"\n\nAdd \"DAV-GSWT: Diffusion-Active-View Sampling for Data-Efficient Gaussian Splatting Wang Tiles\"\n\nAdd \"Time-Archival Camera Virtualization for Sports and Visual Performances\"\n\nAdd \"Wrivinder: Towards Spatial Intelligence for Geo-locating Ground Images onto Satellite Imagery\"\n\nAdd \"Gaussian Mesh Renderer for Lightweight Differentiable Rendering\"\n\nAdd \"Learnable Multi-level Discrete Wavelet Transforms for 3D Gaussian Splatting Frequency Modulation\"\n\nAdd \"Gaussian Sequences with Multi-Scale Dynamics for 4D Reconstruction from Monocular Casual Videos\"\n\nAdd \"Joint Orientation and Weight Optimization for Robust Watertight Surface Reconstruction via Dirichlet-Regularized Winding Fields\"\n\nAdd \"Nighttime Autonomous Driving Scene Reconstruction with Physically-Based Gaussian Splatting\"\n\nAdd \"FlowHOI: Flow-based Semantics-Grounded Generation of Hand-Object Interactions for Dexterous Robot Manipulation\"\n\nAdd \"GSM-GS: Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction\"\n\nAdd \"LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning\"\n\nAdd \"3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting\"\n\nAdd \"TG-Field: Geometry-Aware Radiative Gaussian Fields for Tomographic Reconstruction\"\n\nAdd \"OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars\"\n\nAdd \"GR-Diffusion: 3D Gaussian Representation Meets Diffusion in Whole-Body PET Reconstruction\"\n\nAdd \"Variation-aware Flexible 3D Gaussian Editing\"\n\nAdd \"LeafFit: Plant Assets Creation from 3D Gaussian Splatting\"\n\nAdd \"ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles\"\n\nAdd \"ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting\"\n\nAdd \"XSPLAIN: XAI-enabling Splat-based Prototype Learning for Attribute-aware INterpretability\"\n\nAdd \"Faster-GS: Analyzing and Improving Gaussian Splatting Optimization\"\n\nAdd \"ArtisanGS: Interactive Tools for Gaussian Splat Selection with AI and Human in the Loop\"\n\nAdd \"CompSplat: Compression-aware 3D Gaussian Splatting for Real-world Video\"\n\nAdd \"Toward Fine-Grained Facial Control in 3D Talking Head 
Generation\"\n\nAdd \"Grow with the Flow: 4D Reconstruction of Growing Plants with Gaussian Flow Fields\"\n\nAdd \"Analysis of Converged 3D Gaussian Splatting Solutions: Density Effects and Prediction Limit\"\n\nAdd \"GaussianCaR: Gaussian Splatting for Efficient Camera-Radar Fusion\"\n\nAdd \"FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction\"\n\nAdd \"Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes\"\n\nAdd \"Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video\"\n\nAdd \"DynFOA: Generating First-Order Ambisonics with Conditional Diffusion for Dynamic and Acoustically Complex 360-Degree Videos\"\n\nAdd \"GaussianPOP: Principled Simplification Framework for Compact 3D Gaussian Splatting via Error Quantification\"\n\nAdd \"Zero-Shot UAV Navigation in Forests via Relightable 3D Gaussian Splatting\"\n\nAdd \"LangGS-SLAM: Real-Time Language-Feature Gaussian Splatting SLAM\"\n\nAdd \"TFusionOcc: Student's t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction\"\n\nAdd \"Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering\"\n\nAdd \"Pseudo-View Enhancement via Confidence Fusion for Unposed Sparse-View Reconstruction\"\n\nAdd \"BrepGaussian: CAD reconstruction from Multi-View Images with Gaussian Splatting\"\n\nAdd \"Three-dimensional Damage Visualization of Civil Structures via Gaussian Splatting-enabled Digital Twins\"\n\nAdd \"From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors\"\n\nAdd \"Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation\"\n\nAdd \"Unified Sensor Simulation for Autonomous Driving\"\n\nAdd \"QuantumGS: Quantum Encoding Framework for Gaussian Splatting\"\n\nAdd \"Nix and Fix: Targeting 1000x Compression of 3D Gaussian Splatting with Diffusion Models\"\n\nAdd \"VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image\"\n\nAdd \"JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction\"\n\nAdd \"SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization\"\n\nAdd \"Towards Next-Generation SLAM: A Survey on 3DGS-SLAM Focusing on Performance, Robustness, and Future Directions\"\n\nAdd \"AnyStyle: Single-Pass Multimodal Stylization for 3D Gaussian Splatting\"\n\nAdd \"Constrained Dynamic Gaussian Splatting\"\n\nAdd \"Pi-GS: Sparse-View Gaussian Splatting with Dense π^3 Initialization\"\n\nAdd \"SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan Modulation\"\n\nAdd \"SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation\"\n\nAdd \"Intellectual Property Protection for 3D Gaussian Splatting Assets: A Survey\"\n\nAdd \"UrbanGS: A Scalable and Efficient Architecture for Geometrically Accurate Large-Scene Reconstruction\"\n\nAdd \"SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors\"\n\nAdd \"OFERA: Blendshape-driven 3D Gaussian Control for Occluded Facial Expression to Realistic Avatars in VR\"\n\nAdd \"FastPhysGS: Accelerating Physics-based Dynamic 3DGS Simulation via Interior Completion and Adaptive Optimization\"\n\nAdd \"VRGaussianAvatar: Integrating 3D Gaussian Avatars into VR\"\n\nAdd \"Position: 3D Gaussian Splatting Watermarking Should Be Scenario-Driven and Threat-Model Explicit\"\n\nAdd \"Split&Splat: Zero-Shot Panoptic Segmentation via Explicit Instance 
Modeling and 3D Gaussian Splatting\"\n\nAdd \"Radioactive 3D Gaussian Ray Tracing for Tomographic Reconstruction\"\n\nAdd \"HPC: Hierarchical Point-based Latent Representation for Streaming Dynamic Gaussian Splatting Compression\"\n\nAdd \"PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting\"\n\nAdd \"3DGS$^2$-TR: Scalable Second-Order Trust-Region Method for 3D Gaussian Splatting\"\n\nAdd \"EAG-PT: Emission-Aware Gaussians and Path Tracing for Indoor Scene Reconstruction and Editing\"\n\nAdd \"Self-Supervised Slice-to-Volume Reconstruction with Gaussian Representations for Fetal MRI\"\n\nAdd \"PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction\"\n\nAdd \"FreeFix: Boosting 3D Gaussian Splatting via Fine-Tuning-Free Diffusion Models\"\n\nAdd \"GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering\"\n\nAdd \"GVGS: Gaussian Visibility-Aware Multi-View Geometry for Accurate Surface Reconstruction\"\n\nAdd \"Graphical X Splatting (GraphiXS): A Graphical Model for 4D Gaussian Splatting under Uncertainty\"\n\nAdd \"WaterClear-GS: Optical-Aware Gaussian Splatting for Underwater Reconstruction and Restoration\"\n\nAdd \"DiffStyle3D: Consistent 3D Gaussian Stylization via Attention Optimization\"\n\nAdd \"Fast Converging 3D Gaussian Splatting for 1-Minute Reconstruction\"\n\nAdd \"UniMGS: Unifying Mesh and 3D Gaussian Splatting with Single-Pass Rasterization and Proxy-Based Deformation\"\n\nAdd \"Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting\"\n\nAdd \"ExoGS: A 4D Real-to-Sim-to-Real Framework for Scalable Manipulation Data Collection\"\n\nAdd \"LoD-Structured 3D Gaussian Splatting for Streaming Video Reconstruction\"\n\nAdd \"Geometry-Grounded Gaussian Splatting\"\n\nAdd \"PocketGS: On-Device Training of 3D Gaussian Splatting for High Perceptual Modeling\"\n\nAdd \"LGDWT-GS: Local and Global Discrete Wavelet-Regularized 3D Gaussian Splatting for Sparse-View Scene Reconstruction\"\n\nAdd \"A Step to Decouple Optimization in 3DGS\"\n\nAdd \"EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis\"\n\nAdd \"ThermoSplat: Cross-Modal 3D Gaussian Splatting with Feature Modulation and Geometry Decoupling\"\n\nAdd \"LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting\"\n\nAdd \"LL-GaussianMap: Zero-shot Low-Light Image Enhancement via 2D Gaussian Splatting Guided Gain Maps\"\n\nAdd \"SplatBus: A Gaussian Splatting Viewer Framework via GPU Interprocess Communication\"\n\nAdd \"POTR: Post-Training 3DGS Compression\"\n\nAdd \"Structured Image-based Coding for Efficient Gaussian Splatting Compression\"\n\nAdd \"Rig-Aware 3D Reconstruction of Vehicle Undercarriages using Gaussian Splatting\"\n\nAdd \"GaussExplorer: 3D Gaussian Splatting for Embodied Exploration and Reasoning\"\n\nAdd \"TreeDGS: Aerial Gaussian Splatting for Distant DBH Measurement\"\n\nAdd \"CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting\"\n\nAdd \"GaussianTrimmer: Online Trimming Boundaries for 3DGS Segmentation\"\n\nAdd \"Active Semantic Mapping of Horticultural Environments Using Gaussian Splatting\"\n\nAdd \"studentSplat: Your Student Model Learns Single-view 3D Gaussian Splatting\"\n\nAdd \"Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting\"\n\nAdd \"TIDI-GS: Floater Suppression in 3D Gaussian Splatting for Enhanced Indoor Scene Fidelity\"\n\nAdd \"3DGS-Drag: Dragging Gaussians 
for Intuitive Point-Based 3D Editing\"\n\nAdd \"NAS-GS: Noise-Aware Sonar Gaussian Splatting\"\n\nAdd \"LayerGS: Decomposition and Inpainting of Layered 3D Human Avatars via 2D Gaussian Splatting\"\n\nAdd \"FeatureSLAM: Feature-enriched 3D gaussian splatting SLAM in real time\"\n\nAdd \"GS-DMSR: Dynamic Sensitive Multi-scale Manifold Enhancement for Accelerated High-Quality 3D Gaussian Splatting\"\n\nAdd \"GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting\"\n\nAdd \"OceanSplat: Object-aware Gaussian Splatting with Trinocular View Consistency for Underwater Scene Reconstruction\"\n\nAdd \"ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting\"\n\nAdd \"SCAR-GS: Spatial Context Attention for Residuals in Progressive Gaussian Splatting\"\n\nAdd \"IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting\"\n\nAdd \"CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature\"\n\nAdd \"A High-Fidelity Digital Twin for Robotic Manipulation Based on 3D Gaussian Splatting\"\n\nAdd \"SA-ResGS: Self-Augmented Residual 3D Gaussian Splatting for Next Best View Selection\"\n\nAdd \"360-GeoGS: Geometrically Consistent Feed-Forward 3D Gaussian Splatting Reconstruction for 360 Images\"\n\nAdd \"SketchRodGS: Sketch-based Extraction of Slender Geometries for Animating Gaussian Splatting Scenes\"\n\nAdd \"ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting\"\n\nAdd \"ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking\"\n\nAdd \"Clean-GS: Semantic Mask-Guided Pruning for 3D Gaussian Splatting\"\n\nAdd \"ShadowGS: Shadow-Aware 3D Gaussian Splatting for Satellite Imagery\"\n\nAdd \"RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization\"\n\nAdd \"SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting\"\n\nAdd \"PhysTalk: Language-driven Real-time Physics in 3D Gaussian Scenes\"\n\nAdd \"UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning\"\n\nAdd \"Splatwizard: A Benchmark Toolkit for 3D Gaussian Splatting Compression\"\n\nAdd \"Improved 3D Gaussian Splatting of Unknown Spacecraft Structure Using Space Environment Illumination Knowledge\"\n\nAdd \"GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation\"\n\nAdd \"Next Best View Selections for Semantic and Dynamic 3D Gaussian Splatting\"\n\nAdd \"TexAvatars: Hybrid Texel-3D Representations for Stable Rigging of Photorealistic Gaussian Head Avatars\"\n\nAdd \"AirGS: Real-Time 4D Gaussian Streaming for Free-Viewpoint Video Experiences\"\n\nAdd \"Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting\"\n\nAdd \"Nebula: Enable City-Scale 3D Gaussian Splatting in Virtual Reality via Collaborative Rendering and Accelerated Stereo Rasterization\"\n\nAdd \"Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS)\"\n\nAdd \"Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs\"\n\nAdd \"HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction\"\n\nAdd \"WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion\"\n\nAdd \"4D Gaussian Splatting as a Learned Dynamical System\"\n\nAdd \"EcoSplat: Efficiency-controllable Feed-forward 3D Gaussian Splatting from Multi-view Images\"\n\nAdd \"SplatBright: Generalizable Low-Light 
Scene Reconstruction from Sparse Views via Physically-Guided Gaussian Enhancement\"\n\nAdd \"Geometric-Photometric Event-based 3D Gaussian Ray Tracing\"\n\nAdd \"MatSpray: Fusing 2D Material World Knowledge on 3D Geometry\"\n\nAdd \"Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding\"\n\nAdd \"Animate Any Character in Any World\"\n\nAdd \"G3Splat: Geometrically Consistent Generalizable Gaussian Splatting\"\n\nAdd \"FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views\"\n\nAdd \"Voxel-GS: Quantized Scaffold Gaussian Splatting Compression with Run-Length Coding\"\n\nAdd \"Using Gaussian Splats to Create High-Fidelity Facial Geometry and Texture\"\n\nAdd \"DGH: Dynamic Gaussian Hair\"\n\nAdd \"Instant Expressive Gaussian Head Avatar via 3D-Aware Expression Distillation\"\n\nAdd \"Flying in Clutter on Monocular RGB by Learning in 3D Radiance Fields with Domain Adaptation\"\n\nAdd \"SDFoam: Signed-Distance Foam for explicit surface reconstruction\"\n\nAdd \"Gaussian Pixel Codec Avatars: A Hybrid Representation for Efficient Rendering\"\n\nAdd \"Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting\"\n\nAdd \"VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments\"\n\nAdd \"MVGSR: Multi-View Consistent 3D Gaussian Super-Resolution via Epipolar Guidance\"\n\nAdd \"Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos\"\n\nAdd \"HGS: Hybrid Gaussian Splatting with Static-Dynamic Decomposition for Compact Dynamic View Synthesis\"\n\nAdd \"Beyond a Single Light: A Large-Scale Aerial Dataset for Urban Scene Reconstruction Under Varying Illumination\"\n\nAdd \"Spherical Voronoi: Directional Appearance as a Differentiable Partition of the Sphere\"\n\nAdd \"Consistent Instance Field for Dynamic Scene Understanding\"\n\nAdd \"GaussianPlant: Structure-aligned Gaussian Splatting for 3D Reconstruction of Plants\"\n\nAdd \"ASAP-Textured Gaussians: Enhancing Textured Gaussians with Adaptive Sampling and Anisotropic Parameterization\"\n\nAdd \"Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries\"\n\nAdd \"Computer vision training dataset generation for robotic environments using Gaussian splatting\"\n\nAdd \"Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance\"\n\nAdd \"Fast and Explicit: Slice-to-Volume Reconstruction via 3D Gaussian Primitives with Analytic Point Spread Function Modeling\"\n\nAdd \"Prior-Enhanced Gaussian Splatting for Dynamic Scene Reconstruction from Casual Video\"\n\nAdd \"Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction\"\n\nAdd \"Lightweight 3D Gaussian Splatting Compression via Video Codec\"\n\nAdd \"GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting\"\n\nAdd \"Sharp Monocular View Synthesis in Less Than a Second\"\n\nAdd \"DeMapGS: Simultaneous Mesh Deformation and Surface Attribute Mapping via Gaussian Splatting\"\n\nAdd \"Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views\"\n\nAdd \"TraceFlow: Dynamic 3D Reconstruction of Specular Scenes Driven by Ray Tracing\"\n\nAdd \"GAINS: Gaussian-based Inverse Rendering from Sparse Multi-View Captures\"\n\nAdd \"Splatent: Splatting Diffusion Latents for Novel View Synthesis\"\n\nAdd \"YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos\"\n\nAdd \"Super4DR: 4D Radar-centric Self-supervised 
Odometry and Gaussian-based Map Optimization\"\n\nAdd \"D$^2$GSLAM: 4D Dynamic Gaussian Splatting SLAM\"\n\nAdd \"Relightable and Dynamic Gaussian Avatar Reconstruction from Monocular Video\"\n\nAdd \"GTAvatar: Bridging Gaussian Splatting and Texture Mapping for Relightable and Editable Gaussian Avatars\"\n\nAdd \"OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics\"\n\nAdd \"HybridSplat: Fast Reflection-baked Gaussian Tracing using Hybrid Splatting\"\n\nAdd \"Tessellation GS: Neural Mesh Gaussians for Robust Monocular Reconstruction of Dynamic Objects\"\n\nAdd \"Debiasing Diffusion Priors via 3D Attention for Consistent Gaussian Splatting\"\n\n### 2026/02/02\n\nAdd \"GS-Checker: Tampering Localization for 3D Gaussian Splatting\"\n\nAdd \"Material-informed Gaussian Splatting for 3D World Reconstruction in a Digital Twin\"\n\nAdd \"Active3D: Active High-Fidelity 3D Reconstruction via Hierarchical Uncertainty Quantification\"\n\nAdd \"GigaWorld-0: World Models as Data Engine to Empower Embodied AI\"\n\nAdd \"STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction\"\n\nAdd \"Proxy-Free Gaussian Splats Deformation with Splat-Based Surface Estimation\"\n\nAdd \"DensifyBeforehand: LiDAR-assisted Content-aware Densification for Efficient and Quality 3D Gaussian Splatting\"\n\nAdd \"IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes\"\n\nAdd \"NVGS: Neural Visibility for Occlusion Culling in 3D Gaussian Splatting\"\n\nAdd \"MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes\"\n\nAdd \"Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction\"\n\nAdd \"Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing\"\n\nAdd \"NeAR: Coupled Neural Asset-Renderer Stack\"\n\nAdd \"PhysGS: Bayesian-Inferred Gaussian Splatting for Physical Property Estimation\"\n\nAdd \"ReCoGS: Real-time ReColoring for Gaussian Splatting scenes\"\n\nAdd \"SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation\"\n\nAdd \"Observer Actor: Active Vision Imitation Learning with Sparse View Gaussian Splatting\"\n\nAdd \"RoboArmGS: High-Quality Robotic Arm Splatting via Bézier Curve Refinement\"\n\nAdd \"Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion\"\n\nAdd \"Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization\"\n\nAdd \"CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation\"\n\nAdd \"AEGIS: Preserving privacy of 3D Facial Avatars with Adversarial Perturbations\"\n\nAdd \"FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception\"\n\nAdd \"SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors\"\n\nAdd \"PEGS: Physics-Event Enhanced Large Spatiotemporal Motion Reconstruction via 3D Gaussian Splatting\"\n\nAdd \"SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting\"\n\nAdd \"RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation\"\n\nAdd \"PhysMorph-GS: Differentiable Shape Morphing via Joint Optimization of Physics and Rendering Objectives\"\n\nAdd \"Gradient-Driven Natural Selection for Compact 3D Gaussian Splatting\"\n\nAdd \"One Walk is All You Need: Data-Efficient 3D RF Scene Reconstruction with Human Movements\"\n\nAdd \"Vorion: A RISC-V 
GPU with Hardware-Accelerated 3D Gaussian Rendering and Training\"\n\nAdd \"TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming\"\n\nAdd \"EOGS++: Earth Observation Gaussian Splatting with Internal Camera Refinement and Direct Panchromatic Rendering\"\n\nAdd \"Optimizing 3D Gaussian Splatting for Mobile GPUs\"\n\nAdd \"LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM\"\n\n### 2026/01/20\n\nAdd \"Clustered Error Correction with Grouped 4D Gaussian Splatting\"\n\nAdd \"Rad-GS: Radar-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments\"\n\nAdd \"CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis\"\n\nAdd \"Gaussian Blending: Rethinking Alpha Blending in 3D Gaussian Splatting\"\n\nAdd \"Gaussian See, Gaussian Do: Semantic 3D Motion Transfer from Multiview Video\"\n\nAdd \"SparseSurf: Sparse-View 3D Gaussian Splatting for Surface Reconstruction\"\n\nAdd \"Interaction-Aware 4D Gaussian Splatting for Dynamic Hand-Object Interaction Reconstruction\"\n\nAdd \"IBGS: Image-Based Gaussian Splatting\"\n\nAdd \"Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs\"\n\nAdd \"GEN3D: Generating Domain-Free 3D Scenes from a Single Image\"\n\nAdd \"iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion\"\n\nAdd \"Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting\"\n\nAdd \"Opt3DGS: Optimizing 3D Gaussian Splatting with Adaptive Exploration and Curvature-Aware Exploitation\"\n\nAdd \"SF-Recon: Simplification-Free Lightweight Building Reconstruction via 3D Gaussian Splatting\"\n\nAdd \"SymGS: Leveraging Local Symmetries for 3D Gaussian Splatting Compression\"\n\nAdd \"Monocular 3D Lane Detection via Structure Uncertainty-Aware Network with Curve-Point Queries\"\n\n### 2025/12/15\n\nAdd \"Beyond Darkness: Thermal-Supervised 3D Gaussian Splatting for Low-Light Novel View Synthesis\"\n\nAdd \"TR-Gaussians: High-fidelity Real-time Rendering of Planar Transmission and Reflection with 3D Gaussian Splatting\"\n\nAdd \"SplatSearch: Instance Image Goal Navigation for Mobile Robots using 3D Gaussian Splatting and Diffusion Models\"\n\nAdd \"GUIDE: Gaussian Unified Instance Detection for Enhanced Obstacle Perception in Autonomous Driving\"\n\nAdd \"Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration\"\n\nAdd \"Reconstructing 3D Scenes in Native High Dynamic Range\"\n\nAdd \"Changes in Real Time: Online Scene Change Detection with Multi-View Fusion\"\n\nAdd \"SRSplat: Feed-Forward Super-Resolution Gaussian Splatting from Sparse Multi-View Images\"\n\nAdd \"3D Gaussian and Diffusion-Based Gaze Redirection\"\n\nAdd \"RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting\"\n\nAdd \"PINGS-X: Physics-Informed Normalized Gaussian Splatting with Axes Alignment for Efficient Super-Resolution of 4D Flow MRI\"\n\nAdd \"Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision\"\n\nAdd \"Multivariate Gaussian Representation Learning for Medical Action Evaluation\"\n\nAdd \"TSPE-GS: Probabilistic Depth Extraction for Semi-Transparent Surface Reconstruction via 3D Gaussian Splatting\"\n\nAdd \"AHA! 
Animating Human Avatars in Diverse Scenes with Gaussian Splatting\"\n\nAdd \"Lumos3D: A Single-Forward Framework for Low-Light 3D Scene Restoration\"\n\nAdd \"A Shared-Autonomy Construction Robotic System for Overhead Works\"\n\nAdd \"OUGS: Active View Selection via Object-aware Uncertainty Estimation in 3DGS\"\n\nAdd \"SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering\"\n\nAdd \"Perceptual Quality Assessment of 3D Gaussian Splatting: A Subjective Dataset and Prediction Metric\"\n\nAdd \"DIMO: Diverse 3D Motion Generation for Arbitrary Objects\"\n\nAdd \"YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting\"\n\nAdd \"4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation\"\n\nAdd \"GFix: Perceptually Enhanced Gaussian Splatting Video Compression\"\n\nAdd \"MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks\"\n\nAdd \"ConeGS: Error-Guided Densification Using Pixel Cones for Improved Reconstruction with Fewer Primitives\"\n\nAdd \"Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes\"\n\nAdd \"Rethinking Rainy 3D Scene Reconstruction via Perspective Transforming and Brightness Tuning\"\n\nAdd \"DIAL-GS: Dynamic Instance Aware Reconstruction for Label-free Street Scenes with 4D Gaussian Splatting\"\n\nAdd \"Inpaint360GS: Efficient Object-Aware 3D Inpainting via Gaussian Splatting for 360° Scenes\"\n\nAdd \"Physics-Informed Deformable Gaussian Splatting: Towards Unified Constitutive Laws for Time-Evolving Material Field\"\n\nAdd \"StreamSTGS: Streaming Spatial and Temporal Gaussian Grids for Real-Time Free-Viewpoint Video\"\n\nAdd \"4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos\"\n\n### 2025/11/25\n\nAdd \"Efficient representation of 3D spatial data for defense-related applications\"\n\nAdd \"CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting\"\n\nAdd \"3D Gaussian Point Encoders\"\n\nAdd \"Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions\"\n\nAdd \"UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction\"\n\nAdd \"FastGS: Training 3D Gaussian Splatting in 100 Seconds\"\n\nAdd \"CaRF: Enhancing Multi-View Consistency in Referring 3D Gaussian Splatting Segmentation\"\n\nAdd \"Improving Multi-View Reconstruction via Texture-Guided Gaussian-Mesh Joint Optimization\"\n\nAdd \"DentalSplat: Dental Occlusion Novel View Synthesis from Sparse Intra-Oral Photographs\"\n\nAdd \"PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing\"\n\nAdd \"Object-Centric 3D Gaussian Splatting for Strawberry Plant Reconstruction and Phenotyping\"\n\nAdd \"3D Gaussian Radiation Field Modeling for Integrated RIS-FAS Systems: Analysis and Optimization\"\n\nAdd \"GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies\"\n\nAdd \"4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Gaussian Splatting\"\n\nAdd \"Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models\"\n\nAdd \"Object-Aware 4D Human Motion Generation\"\n\nAdd \"SAGS: Self-Adaptive Alias-Free Gaussian Splatting for Dynamic Surgical Endoscopic Reconstruction\"\n\n### 2025/11/18\n\nAdd \"DC4GS: Directional Consistency-Driven Adaptive 
Density Control for 3D Gaussian Splatting\"\n\nAdd \"HEIR: Learning Graph-Based Motion Hierarchies\"\n\nAdd \"AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM\"\n\nAdd \"6D Channel Knowledge Map Construction via Bidirectional Wireless Gaussian Splatting\"\n\nAdd \"JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting\"\n\nAdd \"Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation\"\n\nAdd \"EA3D: Online Open-World 3D Object Extraction from Streaming Videos\"\n\nAdd \"AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians\"\n\nAdd \"DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes\"\n\nAdd \"NVSim: Novel View Synthesis Simulator for Large Scale Indoor Navigation\"\n\nAdd \"LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation\"\n\nAdd \"PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors\"\n\nAdd \"Explicit Memory through Online 3D Gaussian Splatting Improves Class-Agnostic Video Segmentation\"\n\nAdd \"VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting\"\n\nAdd \"EndoWave: Rational-Wavelet 4D Gaussian Splatting for Endoscopic Reconstruction\"\n\nAdd \"Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression\"\n\nAdd \"Region-Adaptive Learned Hierarchical Encoding for 3D Gaussian Splatting Data\"\n\nAdd \"LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes via Hierarchical Explicit-Implicit Representation Collaboration Rendering\"\n\nAdd \"RoGER-SLAM: A Robust Gaussian Splatting SLAM System for Noisy and Low-light Environment Resilience\"\n\nAdd \"DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss\"\n\nAdd \"DynamicTree: Interactive Real Tree Animation via Sparse Voxel Spectrum\"\n\nAdd \"STG-Avatar: Animatable Human Avatars via Spacetime Gaussian\"\n\nAdd \"Towards Physically Executable 3D Gaussian for Embodied Navigation\"\n\n### 2025/11/10\n\nAdd \"GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation\"\n\nAdd \"OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects\"\n\nAdd \"COS3D: Collaborative Open-Vocabulary 3D Segmentation\"\n\nAdd \"Extreme Views: 3DGS Filter for Novel View Synthesis from Out-of-Distribution Camera Poses\"\n\nAdd \"Re-Activating Frozen Primitives for 3D Gaussian Splatting\"\n\nAdd \"MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting\"\n\nAdd \"GRASPLAT: Enabling dexterous grasping through novel view synthesis\"\n\nAdd \"Moving Light Adaptive Colonoscopy Reconstruction via Illumination-Attenuation-Aware 3D Gaussian Splatting\"\n\nAdd \"OpenInsGaussian: Open-vocabulary Instance Gaussian Segmentation with Context-aware Cross-view Fusion\"\n\nAdd \"From Volume Rendering to 3D Gaussian Splatting: Theory and Applications\"\n\n### 2025/11/06\n\nAdd \"HouseTour: A Virtual Real Estate A(I)gent\"\n\nAdd \"InsideOut: Integrated RGB-Radiative Gaussian Splatting for Comprehensive 3D Object Representation\"\n\nAdd \"Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats\"\n\nAdd \"Raindrop GS: A Benchmark for 3D Gaussian Splatting under Raindrop Conditions\"\n\nAdd \"Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS\"\n\nAdd \"GSPlane: Concise and Accurate Planar Reconstruction via 
Structured Representation\"\n\nAdd \"2DGS-R: Revisiting the Normal Consistency Regularization in 2D Gaussian Splatting\"\n\nAdd \"GS2POSE: Marry Gaussian Splatting to 6D Object Pose Estimation\"\n\nAdd \"HGC-Avatar: Hierarchical Gaussian Compression for Streamable Dynamic 3D Avatars\"\n\nAdd \"REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting\"\n\nAdd \"Fix False Transparency by Noise Guided Splatting\"\n\nAdd \"PFGS: Pose-Fused 3D Gaussian Splatting for Complete Multi-Pose Object Reconstruction\"\n\n### 2025/11/03\n\nAdd \"GaussGym: An open-source real-to-sim framework for learning locomotion from pixels\"\n\nAdd \"DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion\"\n\nAdd \"SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed Images\"\n\nAdd \"Terra: Explorable Native 3D World Model with Point Latents\"\n\nAdd \"Leveraging Learned Image Prior for 3D Gaussian Compression\"\n\nAdd \"BalanceGS: Algorithm-System Co-design for Efficient 3D Gaussian Splatting Training on GPU\"\n\nAdd \"GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering\"\n\nAdd \"Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures\"\n\nAdd \"Capture, Canonicalize, Splat: Zero-Shot 3D Gaussian Avatars from Unstructured Phone Images\"\n\nAdd \"Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications\"\n\nAdd \"FlashWorld: High-quality 3D Scene Generation within Seconds\"\n\nAdd \"VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator\"\n\nAdd \"Leveraging 2D Priors and SDF Guidance for Dynamic Urban Scene Rendering\"\n\nAdd \"SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms\"\n\nAdd \"Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction\"\n\nAdd \"BSGS: Bi-stage 3D Gaussian Splatting for Camera Motion Deblurring\"\n\nAdd \"Hybrid Gaussian Splatting for Novel Urban View Synthesis\"\n\nAdd \"PAGS: Priority-Adaptive Gaussian Splatting for Dynamic Driving Scenes\"\n\nAdd \"UniGS: Unified Geometry-Aware Gaussian Splatting for Multimodal Rendering\"\n\nAdd \"Gaussian Semantic Field for One-shot LiDAR Global Localization\"\n\nAdd \"G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior\"\n\nAdd \"GS-Verse: Mesh-based Gaussian Splatting for Physics-aware Interaction in Virtual Reality\"\n\nAdd \"Ev4DGS: Novel-view Rendering of Non-Rigid Objects from Monocular Event Streams\"\n\n### 2025/11/02\n\nAdd \"Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation\"\n\nAdd \"VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment\"\n\nAdd \"Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency\"\n\nAdd \"WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting\"\n\nAdd \"High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting\"\n\nAdd \"Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework\"\n\nAdd \"Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting\"\n\nAdd \"Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer\"\n\nAdd \"Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting\"\n\nAdd \"P-4DGS: Predictive 4D Gaussian 
Splatting with 90× Compression\"\n\nAdd \"CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting\"\n\nAdd \"VG-Mapping: Variation-Aware 3D Gaussians for Online Semi-static Scene Mapping\"\n\nAdd \"LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates\"\n\nAdd \"Mono4DEditor: Text-Driven 4D Scene Editing from Monocular Video via Point-Level Localization of Language-Embedded Gaussians\"\n\nAdd \"Visibility-Aware Densification for 3D Gaussian Splatting in Dynamic Urban Scenes\"\n\nAdd \"EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation\"\n\n### 2025/10/31\n\nAdd \"ReSplat: Learning Recurrent Gaussian Splats\"\n\nAdd \"D2GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction\"\n\nAdd \"ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation\"\n\nAdd \"Splat the Net: Radiance Fields with Splattable Neural Primitives\"\n\nAdd \"Efficient Label Refinement for Face Parsing Under Extreme Poses Using 3D Gaussian Splatting\"\n\nAdd \"CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving\"\n\nAdd \"PrismGS: Physically-Grounded Anti-Aliasing for High-Fidelity Large-Scale 3D Gaussian Splatting\"\n\nAdd \"DEGS: Deformable Event-based 3D Gaussian Splatting from RGB and Event Stream\"\n\nAdd \"ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes\"\n\nAdd \"Generating Surface for Text-to-3D using 2D Gaussian Splatting\"\n\nAdd \"Capture and Interact: Rapid 3D Object Acquisition and Rendering with Gaussian Splatting in Unity\"\n\nAdd \"SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis\"\n\nAdd \"RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction\"\n\nAdd \"ArchitectHead: Continuous Level of Detail Control for 3D Gaussian Head Avatars\"\n\nAdd \"Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction\"\n\nAdd \"SketchPlan: Diffusion Based Drone Planning From Human Sketches\"\n\nAdd \"Universal Beta Splatting\"\n\nAdd \"GS-Share: Enabling High-fidelity Map Sharing with Incremental Gaussian Splatting\"\n\nAdd \"From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting\"\n\nAdd \"StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions\"\n\nAdd \"GaussianMorphing: Mesh-Guided 3D Gaussians for Semantic-Aware Object Morphing\"\n\nAdd \"ROI-GS: Interest-based Local Quality 3D Gaussian Splatting\"\n\nAdd \"LOBE-GS: Load-Balanced and Efficient 3D Gaussian Splatting for Large-Scale Scene Reconstruction\"\n\nAdd \"MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics\"\n\nAdd \"HART: Human Aligned Reconstruction Transformer\"\n\n### 2025/10/28\n\nAdd \"Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting\"\n\nAdd \"GaussEdit: Adaptive 3D Scene Editing with Text and Image Prompts\"\n\nAdd \"PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion\"\n\nAdd \"LLM-Powered Code Analysis and Optimization for Gaussian Splatting Kernels\"\n\nAdd \"GaussianLens: Localized High-Resolution Reconstruction via On-Demand Gaussian Densification\"\n\nAdd \"PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos\"\n\nAdd \"Triangle Splatting+: Differentiable Rendering with Opaque Triangles\"\n\nAdd \"UniLat3D: 
Geometry-Appearance Unified Latents for Single-Stage 3D Generation\"\n\nAdd \"GEM: 3D Gaussian Splatting for Efficient and Accurate Cryo-EM Reconstruction\"\n\nAdd \"LVT: Large-Scale Scene Reconstruction via Local View Transformers\"\n\nAdd \"HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping\"\n\nAdd \"ExGS: Extreme 3D Gaussian Compression with Diffusion Priors\"\n\nAdd \"Proxy-GS: Efficient 3D Gaussian Splatting via Proxy Mesh\"\n\nAdd \"Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos\"\n\nAdd \"CrashSplat: 2D to 3D Vehicle Damage Segmentation in Gaussian Splatting\"\n\nAdd \"Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos\"\n\nAdd \"WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving\"\n\nAdd \"OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting\"\n\nAdd \"Learning Unified Representation of 3D Gaussian Splatting\"\n\nAdd \"HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes\"\n\nAdd \"GS-2M: Gaussian Splatting for Joint Mesh Reconstruction and Material Decomposition\"\n\nAdd \"Polysemous Language Gaussian Splatting via Matching-based Mask Lifting\"\n\nAdd \"Rigidity-Aware 3D Gaussian Deformation from a Single Image\"\n\nAdd \"Large Material Gaussian Model for Relightable 3D Generation\"\n\nAdd \"Dynamic Novel View Synthesis in High Dynamic Range\"\n\nAdd \"PowerGS: Display-Rendering Power Co-Optimization for Neural Rendering in Power-Constrained XR Systems\"\n\n### 2025/10/23\n\nAdd \"SeHDR: Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing\"\n\nAdd \"4D Driving Scene Generation With Stereo Forcing\"\n\nAdd \"PU-Gaussian: Point Cloud Upsampling using 3D Gaussian Representation\"\n\nAdd \"GS-RoadPatching: Inpainting Gaussians via 3D Searching and Placing for Driving Scenes\"\n\nAdd \"Aerial-Ground Image Feature Matching via 3D Gaussian Splatting-based Intermediate View Rendering\"\n\nAdd \"BiTAA: A Bi-Task Adversarial Attack for Object Detection and Depth Estimation via 3D Gaussian Splatting\"\n\nAdd \"PolGS: Polarimetric Gaussian Splatting for Fast Reflective Surface Reconstruction\"\n\nAdd \"VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction\"\n\nAdd \"Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation\"\n\nAdd \"WaveletGaussian: Wavelet-domain Diffusion for Sparse-view 3D Gaussian Object Reconstruction\"\n\nAdd \"Seeing Through Reflections: Advancing 3D Scene Reconstruction in Mirror-Containing Environments with Gaussian Splatting\"\n\nAdd \"DeblurSplat: SfM-free 3D Gaussian Splatting with Event Camera for Robust Deblurring\"\n\nAdd \"FixingGS: Enhancing 3D Gaussian Splatting via Training-Free Score Distillation\"\n\nAdd \"Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction\"\n\nAdd \"BridgeSplat: Bidirectionally Coupled CT and Non-Rigid Gaussian Splatting for Deformable Intraoperative Surgical Navigation\"\n\nAdd \"ProDyG: Progressive Dynamic Scene Reconstruction via Gaussian Splatting from Monocular Videos\"\n\nAdd \"From Restoration to Reconstruction: Rethinking 3D Gaussian Splatting for Underwater Scenes\"\n\nAdd \"EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device\"\n\n### 2025/10/14\n\nAdd \"FGGS-LiDAR: Ultra-Fast, GPU-Accelerated Simulation from General 3DGS Models to LiDAR\"\n\nAdd \"SmokeSeer: 3D 
Gaussian Splatting for Smoke Removal and Scene Reconstruction\"\n\nAdd \"SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views\"\n\nAdd \"HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis\"\n\nAdd \"Efficient 3D Scene Reconstruction and Simulation from Sparse Endoscopic Views\"\n\nAdd \"SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments\"\n\nAdd \"PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control\"\n\nAdd \"ConfidentSplat: Confidence-Weighted Depth Fusion for Accurate 3D Gaussian Splatting SLAM\"\n\nAdd \"MedGS: Gaussian Splatting for Multi-Modal 3D Medical Imaging\"\n\nAdd \"SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving\"\n\nAdd \"ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting\"\n\nAdd \"3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction\"\n\nAdd \"RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars\"\n\nAdd \"Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval\"\n\nAdd \"Camera Splatting for Continuous View Optimization\"\n\nAdd \"FingerSplat: Contactless Fingerprint 3D Reconstruction and Generation based on 3D Gaussian Splatting\"\n\nAdd \"GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading\"\n\nAdd \"MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild\"\n\nAdd \"Causal Reasoning Elicits Controllable 3D Scene Generation\"\n\nAdd \"FMGS-Avatar: Mesh-Guided 2D Gaussian Splatting with Foundation Model Priors for 3D Monocular Avatar Reconstruction\"\n\nAdd \"RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI\"\n\nAdd \"Perception-Integrated Safety Critical Control via Analytic Collision Cone Barrier Functions on 3D Gaussian Splatting\"\n\nAdd \"MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping\"\n\nAdd \"Plug-and-Play PDE Optimization for 3D Gaussian Splatting: Toward High-Quality Rendering and Reconstruction\"\n\nAdd \"MemGS: Memory-Efficient Gaussian Splatting for Real-Time SLAM\"\n\nAdd \"Improving 3D Gaussian Splatting Compression by Scene-Adaptive Lattice Vector Quantization\"\n\nAdd \"Dream3DAvatar: Text-Controlled 3D Avatar Reconstruction from a Single Image\"\n\n### 2025/10/08\n\nAdd \"Beyond Averages: Open-Vocabulary 3D Scene Understanding with Gaussian Splatting and Bag of Embeddings\"\n\nAdd \"4DRadar-GS: Self-Supervised Dynamic Driving Scene Reconstruction with 4D Radar\"\n\nAdd \"Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization\"\n\nAdd \"E2-BKI: Evidential Ellipsoidal Bayesian Kernel Inference for Uncertainty-aware Gaussian Semantic Mapping\"\n\nAdd \"Segmentation-Driven Initialization for Sparse-view 3D Gaussian Splatting\"\n\nAdd \"A Controllable 3D Deepfake Generation Framework with Gaussian Splatting\"\n\nAdd \"Gaussian-Plus-SDF SLAM: High-fidelity 3D Reconstruction at 150+ fps\"\n\nAdd \"ROSGS: Relightable Outdoor Scenes With Gaussian Splatting\"\n\nAdd \"SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion\"\n\nAdd \"SVR-GS: Spatially Variant Regularization for Probabilistic Masks in 3D Gaussian Splatting\"\n\nAdd \"AD-GS: Alternating Densification for Sparse-Input 3D Gaussian Splatting\"\n\nAdd \"T2Bs: Text-to-Character Blendshapes via Video 
Generation\"\n\nAdd \"On the Geometric Accuracy of Implicit and Primitive-based Representations Derived from View Rendering Constraints\"\n\nAdd \"SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting\"\n\nAdd \"HairGS: Hair Strand Reconstruction based on 3D Gaussian Splatting\"\n\nAdd \"PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image\"\n\nAdd \"Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning\"\n\nAdd \"DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation\"\n\nAdd \"MEGS2: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning\"\n\nAdd \"3DOF+Quantization: 3DGS quantization for large scenes with limited Degrees of Freedom\"\n\nAdd \"Reconstruction and Reenactment Separated Method for Realistic Gaussian Head\"\n\nAdd \"Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting\"\n\nAdd \"GeoSplat: A Deep Dive into Geometry-Constrained Gaussian Splatting\"\n\nAdd \"SSGaussian: Semantic-Aware and Structure-Preserving 3D Style Transfer\"\n\n### 2025/09/28\n\nAdd \"ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction\"\n\nAdd \"Efficient Geometry Compression and Communication for 3D Gaussian Splatting Point Clouds\"\n\nAdd \"GRMM: Real-Time High-Fidelity Gaussian Morphable Head Model with Learned Residuals\"\n\nAdd \"2D Gaussian Splatting with Semantic Alignment for Image Inpainting\"\n\nAdd \"FGO-SLAM: Enhancing Gaussian SLAM with Globally Consistent Opacity Radiance Field\"\n\nAdd \"Im2Haircut: Single-view Strand-based Hair Reconstruction for Human Avatars\"\n\nAdd \"Towards Integrating Multi-Spectral Imaging with Gaussian Splatting\"\n\nAdd \"GS-TG: 3D Gaussian Splatting Accelerator with Tile Grouping for Reducing Redundant Sorting while Preserving Rasterization Efficiency\"\n\nAdd \"UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring\"\n\nAdd \"SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting\"\n\nAdd \"MarkSplatter: Generalizable Watermarking for 3D Gaussian Splatting Model via Splatter Image Structure\"\n\nAdd \"AGS: Accelerating 3D Gaussian Splatting SLAM via CODEC-Assisted Frame Covisibility Detection\"\n\nAdd \"Complete Gaussian Splats from a Single Image with Denoising Diffusion Models\"\n\nAdd \"Scale-GS: Efficient Scalable Gaussian Splatting via Redundancy-filtering Training on Streaming Content\"\n\nAdd \"ARGS: Advanced Regularization on Aligning Gaussians over the Surface\"\n\nAdd \"RadGS-Reg: Registering Spine CT with Biplanar X-rays via Joint 3D Radiative Gaussians Reconstruction and 3D/3D Registration\"\n\nAdd \"DrivingGaussian++: Towards Realistic Reconstruction and Editable Simulation for Surrounding Dynamic Driving Scenes\"\n\nAdd \"AvatarBack: Back-Head Generation for Complete 3D Avatars from Front-View Images\"\n\nAdd \"Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation\"\n\nAdd \"Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images\"\n\nAdd \"MAPo: Motion-Aware Partitioning of Deformable 3D Gaussian Splatting for High-Fidelity Dynamic Scene Reconstruction\"\n\nAdd \"FastAvatar: Towards Unified Fast High-Fidelity 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers\"\n\nAdd \"LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation\"\n\nAdd \"Style4D-Bench: A Benchmark Suite for 4D 
Stylization\"\n\nAdd \"ColorGS: High-fidelity Surgical Scene Reconstruction with Colored Gaussian Splatting\"\n\nAdd \"FastAvatar: Instant 3D Gaussian Splatting for Faces from Single Unconstrained Poses\"\n\n### 2025/09/25\n\nAdd \"GSVisLoc: Generalizable Visual Localization for Gaussian Splatting Scene Representations\"\n\nAdd \"Camera Pose Refinement via 3D Gaussian Splatting\"\n\nAdd \"MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting\"\n\nAdd \"GWM: Towards Scalable Gaussian World Models for Robotic Manipulation\"\n\nAdd \"IDU: Incremental Dynamic Update of Existing 3D Virtual Environments with New Imagery Data\"\n\nAdd \"Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels\"\n\nAdd \"Fiducial Marker Splatting for High-Fidelity Robotics Simulations\"\n\nAdd \"Arbitrary-Scale 3D Gaussian Super-Resolution\"\n\nAdd \"UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation\"\n\nAdd \"Enhancing Novel View Synthesis from extremely sparse views with SfM-free 3D Gaussian Splatting Framework\"\n\nAdd \"DriveSplat: Decoupled Driving Scene Reconstruction with Geometry-enhanced Partitioned Neural Gaussians\"\n\nAdd \"Image-Conditioned 3D Gaussian Splat Quantization\"\n\nAdd \"MeSS: City Mesh-Guided Outdoor Scene Generation with Cross-View Consistent Diffusion\"\n\nAdd \"Zero-shot Volumetric CT Super-Resolution using 3D Gaussian Splatting with Upsampled 2D X-ray Projection Priors\"\n\nAdd \"Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds\"\n\n### 2025/09/17\n\nAdd \"GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects\"\n\nAdd \"GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting\"\n\nAdd \"GeMS: Efficient Gaussian Splatting for Extreme Motion Blur\"\n\nAdd \"GOGS: High-Fidelity Geometry and Relighting for Glossy Objects via Gaussian Surfels\"\n\nAdd \"From Slices to Structures: Unsupervised 3D Reconstruction of Female Pelvic Anatomy from Freehand Transvaginal Ultrasound\"\n\nAdd \"D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis\"\n\nAdd \"Reconstruction Using the Invisible: Intuition from NIR and Metadata for Enhanced 3D Gaussian Splatting\"\n\nAdd \"GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting\"\n\nAdd \"LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos\"\n\nAdd \"Distilled-3DGS:Distilled 3D Gaussian Splatting\"\n\nAdd \"Online 3D Gaussian Splatting Modeling with Novel View Selection\"\n\nAdd \"PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis\"\n\nAdd \"EAvatar: Expression-Aware Head Avatar Reconstruction with Generative Geometry Priors\"\n\nAdd \"InnerGS: Internal Scenes Rendering via Factorized 3D Gaussian Splatting\"\n\nAdd \"IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion\"\n\nAdd \"IntelliCap: Intelligent Guidance for Consistent View Sampling\"\n\nAdd \"Quantifying and Alleviating Co-Adaptation in Sparse-View 3D Gaussian Splatting\"\n\nAdd \"WIPES: Wavelet-based Visual Primitives\"\n\nAdd \"TiP4GEN: Text to Immersive Panorama 4D Scene Generation\"\n\nAdd \"Improving Densification in 3D Gaussian Splatting for High-Fidelity Rendering\"\n\nAdd \"InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes\"\n\nAdd \"ComplicitSplat: Downstream Models are Vulnerable to Blackbox Attacks by 3D Gaussian Splat Camouflages\"\n\nAdd \"Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy 
Prediction\"\n\n### 2025/08/17\n\nAdd \"Multi-Sample Anti-Aliasing and Constrained Optimization for 3D Gaussian Splatting\"\n\nAdd \"EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting\"\n\nAdd \"E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras\"\n\nAdd \"TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos\"\n\nAdd \"GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors\"\n\nAdd \"Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation\"\n\nAdd \"DualPhys-GS: Dual Physically-Guided 3D Gaussian Splatting for Underwater Scene Reconstruction\"\n\nAdd \"SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing\"\n\nAdd \"SkySplat: Generalizable 3D Gaussian Splatting from Multi-Temporal Sparse Satellite Images\"\n\nAdd \"Gradient-Direction-Aware Density Control for 3D Gaussian Splatting\"\n\nAdd \"GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments\"\n\nAdd \"Learning an Implicit Physics Model for Image-based Fluid Simulation\"\n\nAdd \"ReferSplat: Referring Segmentation in 3D Gaussian Splatting\"\n\nAdd \"SAGOnline: Segment Any Gaussians Online\"\n\nAdd \"FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting\"\n\nAdd \"NeeCo: Image Synthesis of Novel Instrument States Based on Dynamic and Deformable 3D Gaussian Reconstruction\"\n\nAdd \"Touch-Augmented Gaussian Splatting for Enhanced 3D Scene Reconstruction\"\n\nAdd \"Multi-view Normal and Distance Guidance Gaussian Splatting for Surface Reconstruction\"\n\nAdd \"Novel View Synthesis with Gaussian Splatting: Impact on Photogrammetry Model Accuracy and Resolution\"\n\nAdd \"DIP-GS: Deep Image Prior For Gaussian Splatting Sparse View Recovery\"\n\nAdd \"GS4Buildings: Prior-Guided Gaussian Splatting for 3D Building Reconstruction\"\n\nAdd \"Fading the Digital Ink: A Universal Black-Box Attack Framework for 3DGS Watermarking Systems\"\n\nAdd \"3D Gaussian Representations with Motion Trajectory Field for Dynamic Scene Reconstruction\"\n\nAdd \"3DGS-VBench: A Comprehensive Video Quality Evaluation Benchmark for 3DGS Compression\"\n\nAdd \"EGS-SLAM: RGB-D Gaussian Splatting SLAM with Events\"\n\nAdd \"Evaluating Fisheye-Compatible 3D Gaussian Splatting Methods on Real Images Beyond 180 Degree Field of View\"\n\nAdd \"A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation\"\n\n### 2025/08/12\n\nAdd \"UW-3DGS: Underwater 3D Reconstruction with Physics-Aware Gaussian Splatting\"\n\nAdd \"Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball Rotation\"\n\nAdd \"ExploreGS: Explorable 3D Scene Reconstruction with Virtual Camera Samplings and Diffusion Priors\"\n\nAdd \"A 3DGS-Diffusion Self-Supervised Framework for Normal Estimation from a Single Image\"\n\nAdd \"Optimization-Free Style Transfer for 3D Gaussian Splats\"\n\nAdd \"GAP: Gaussianize Any Point Clouds with Text Guidance\"\n\nAdd \"3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering\"\n\nAdd \"CF3: Compact and Fast 3D Feature Fields\"\n\nAdd \"Refining Gaussian Splatting: A Volumetric Densification Approach\"\n\nAdd \"UGOD: Uncertainty-Guided Differentiable Opacity and Soft Dropout for Enhanced Sparse-View 3DGS\"\n\nAdd \"Laplacian Analysis Meets Dynamics Modelling: Gaussian Splatting for 4D Reconstruction\"\n\nAdd \"Perceive-Sample-Compress: Towards Real-Time 3D Gaussian Splatting\"\n\nAdd \"CryoGS: Gaussian 
Splatting for Cryo-EM Homogeneous Reconstruction\"\n\nAdd \"Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline\"\n\nAdd \"Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds\"\n\nAdd \"MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction\"\n\nAdd \"SplitGaussian: Reconstructing Dynamic Scenes via Visual Geometry Decomposition\"\n\nAdd \"DET-GS: Depth- and Edge-Aware Regularization for High-Fidelity 3D Gaussian Splatting\"\n\nAdd \"Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework\"\n\nAdd \"RLGS: Reinforcement Learning-Based Adaptive Hyperparameter Tuning for Gaussian Splatting\"\n\nAdd \"Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images\"\n\nAdd \"Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing\"\n\nAdd \"Duplex-GS: Proxy-Guided Weighted Blending for Real-Time Order-Independent Gaussian Splatting\"\n\nAdd \"RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions\"\n\nAdd \"SA-3DGS: A Self-Adaptive Compression Method for 3D Gaussian Splatting\"\n\nAdd \"GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing\"\n\nAdd \"PMGS: Reconstruction of Projectile Motion across Large Spatiotemporal Spans via 3D Gaussian Splatting\"\n\nAdd \"Low-Frequency First: Eliminating Floating Artifacts in 3D Gaussian Splatting\"\n\nAdd \"GR-Gaussian: Graph-Based Radiative Gaussian Splatting for Sparse-View CT Reconstruction\"\n\nAdd \"SplatSSC: Decoupled Depth-Guided Gaussian Splatting for Semantic Scene Completion\"\n\nAdd \"GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting\"\n\nAdd \"AG2aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing\"\n\nAdd \"LT-Gaussian: Long-Term Map Update Using 3D Gaussian Splatting for Autonomous Driving\"\n\nAdd \"Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians\"\n\nAdd \"OCSplats: Observation Completeness Quantification and Label Noise Separation in 3DGS\"\n\nAdd \"MoGaFace: Momentum-Guided and Texture-Aware Gaussian Avatars for Consistent Facial Geometry\"\n\nAdd \"No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views\"\n\nAdd \"OpenGS-Fusion: Open-Vocabulary Dense Mapping with Hybrid 3D Gaussian Splatting for Refined Object-Level Understanding\"\n\nAdd \"IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation\"\n\nAdd \"Omni-Scan: Creating Visually-Accurate Digital Twin Object Models Using a Bimanual Robot with Handover and Gaussian Splat Merging\"\n\nAdd \"PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting\"\n\nAdd \"Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis\"\n\nAdd \"SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting\"\n\nAdd \"Enhanced Velocity Field Modeling for Gaussian Video Reconstruction\"\n\nAdd \"I2V-GS: Infrastructure-to-Vehicle View Transformation with Gaussian Splatting for Autonomous Driving Data Generation\"\n\nAdd \"Stereo 3D Gaussian Splatting SLAM for Outdoor Urban Scenes\"\n\n### 2025/08/11\n\nAdd \"MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction\"\n\nAdd \"Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization\"\n\nAdd \"NeRF Is a Valuable Assistant for 3D Gaussian Splatting\"\n\nAdd \"iLRM: An 
Iterative Large 3D Reconstruction Model\"\n\nAdd \"GSFusion: Globally Optimized LiDAR-Inertial-Visual Mapping for Gaussian Splatting\"\n\nAdd \"Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction\"\n\nAdd \"UFV-Splatter: Pose-Free Feed-Forward 3D Gaussian Splatting Adapted to Unfavorable Views\"\n\nAdd \"MultiEditor: Controllable Multimodal Object Editing for Driving Scenarios Using 3D Gaussian Splatting Priors\"\n\nAdd \"S3LAM: Surfel Splatting SLAM for Geometrically Accurate Tracking and Mapping\"\n\nAdd \"GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections\"\n\nAdd \"Automated 3D-GS Registration and Fusion via Skeleton Alignment and Gaussian-Adaptive Features\"\n\nAdd \"From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos\"\n\nAdd \"Decomposing Densification in Gaussian Splatting for Faster 3D Scene Reconstruction\"\n\nAdd \"RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection\"\n\nAdd \"SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian Representations\"\n\nAdd \"Taking Language Embedded 3D Gaussian Splatting into the Wild\"\n\nAdd \"GSCache: Real-Time Radiance Caching for Volume Path Tracing using 3D Gaussian Splatting\"\n\nAdd \"HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars\"\n\nAdd \"DASH: 4D Hash Encoding with Self-Supervised Decomposition for Real-Time Dynamic Scene Rendering\"\n\nAdd \"Gaussian Set Surface Reconstruction through Per-Gaussian Optimization\"\n\nAdd \"Learning Efficient and Generalizable Human Representation with Human Gaussian Model\"\n\nAdd \"Unposed 3DGS Reconstruction with Probabilistic Procrustes Mapping\"\n\nAdd \"GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians\"\n\nAdd \"CRUISE: Cooperative Reconstruction and Editing in V2X Scenarios using Gaussian Splatting\"\n\nAdd \"MVG4D: Image Matrix-Based Multi-View and Motion Generation for 4D Content Creation from a Single Image\"\n\nAdd \"G2S-ICP SLAM: Geometry-aware Gaussian Splatting ICP SLAM\"\n\nAdd \"PS-GS: Gaussian Splatting for Multi-View Photometric Stereo\"\n\nAdd \"GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar\"\n\nAdd \"High-fidelity 3D Gaussian Inpainting: preserving multi-view consistency and photorealistic details\"\n\nAdd \"Temporal Smoothness-Aware Rate-Distortion Optimized 4D Gaussian Splatting\"\n\nAdd \"StreamME: Simplify 3D Gaussian Avatar within Live Stream\"\n\nAdd \"Dyna3DGR: 4D Cardiac Motion Tracking with Dynamic 3D Gaussian Representation\"\n\nAdd \"LongSplat: Online Generalizable 3D Gaussian Splatting from Long Sequence Images\"\n\nAdd \"Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars\"\n\nAdd \"Appearance Harmonization via Bilateral Grid Prediction with Transformers for 3DGS\"\n\n### 2025/08/09\n\nAdd \"DWTGS: Rethinking Frequency Regularization for Sparse-view 3D Gaussian Splatting\"\n\nAdd \"Hi^2-GSLoc: Dual-Hierarchical Gaussian-Specific Visual Relocalization for Remote Sensing\"\n\nAdd \"Gaussian Splatting with Discretized SDF for Relightable Assets\"\n\nAdd \"SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting\"\n\nAdd \"ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting\"\n\nAdd \"Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction\"\n\nAdd \"DCHM: Depth-Consistent Human Modeling for Multiview Detection\"\n\nAdd \"Adaptive 3D 
Gaussian Splatting Video Streaming: Visual Saliency-Aware Tiling and Meta-Learning-Based Bitrate Adaptation\"\n\nAdd \"Adaptive 3D Gaussian Splatting Video Streaming\"\n\nAdd \"DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation\"\n\nAdd \"PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations\"\n\nAdd \"TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting\"\n\nAdd \"VolSegGS: Segmentation and Tracking in Dynamic Volumetric Scenes via Deformable 3D Gaussians\"\n\nAdd \"NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting\"\n\nAdd \"Wavelet-GS: 3D Gaussian Splatting with Wavelet Decomposition\"\n\nAdd \"BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images\"\n\nAdd \"SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation\"\n\nAdd \"Dark-EvGS: Event Camera as an Eye for Radiance Field in the Dark\"\n\nAdd \"A Mixed-Primitive-based Gaussian Splatting Method for Surface Reconstruction\"\n\nAdd \"TRAN-D: 2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update\"\n\nAdd \"Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling\"\n\nAdd \"ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions\"\n\nAdd \"MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second\"\n\nAdd \"3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving\"\n\nAdd \"RePaintGS: Reference-Guided Gaussian Splatting for Realistic and View-Consistent 3D Scene Inpainting\"\n\nAdd \"Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction\"\n\n### 2025/08/08\n\nAdd \"RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration\"\n\nAdd \"RTR-GS: 3D Gaussian Splatting for Inverse Rendering with Radiance Transfer and Reflection\"\n\nAdd \"SD-GS: Structured Deformable 3D Gaussians for Efficient Dynamic Scene Reconstruction\"\n\nAdd \"Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections\"\n\nAdd \"LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS\"\n\nAdd \"FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian Splatting\"\n\nAdd \"LighthouseGS: Indoor Structure-aware 3D Gaussian Splatting for Panorama-Style Mobile Captures\"\n\nAdd \"Reflections Unlock: Geometry-Aware Reflection Disentanglement in 3D Gaussian Splatting for Photorealistic Scenes Rendering\"\n\nAdd \"VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis\"\n\nAdd \"D-FCGS: Feedforward Compression of Dynamic Gaussian Splatting for Free-Viewpoint Videos\"\n\nAdd \"3DGS_LSR: Large_Scale Relocation for Autonomous Driving Based on 3D Gaussian Splatting\"\n\nAdd \"Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors\"\n\nAdd \"InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior\"\n\nAdd \"A3FR: Agile 3D Gaussian Splatting with Incremental Gaze Tracked Foveated Rendering in Virtual Reality\"\n\nAdd \"Gaussian-LIC2: LiDAR-Inertial-Camera Gaussian Splatting SLAM\"\n\nAdd \"ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments\"\n\nAdd \"Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps\"\n\nAdd \"HyperGaussians: High-Dimensional 
Gaussian Splatting for High-Fidelity Animatable Face Avatars\"\n\nAdd \"ArtGS: 3D Gaussian Splatting for Interactive Visual-Physical Modeling and Manipulation of Articulated Objects\"\n\nAdd \"Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning\"\n\nAdd \"AvatarMakeup: Realistic Makeup Transfer for 3D Animatable Head Avatars\"\n\nAdd \"LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling\"\n\nAdd \"Gbake: Baking 3D Gaussian Splats into Reflection Probes\"\n\nAdd \"3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation\"\n\nAdd \"VISTA: Open-Vocabulary, Task-Relevant Robot Exploration with Online Semantic Gaussian Splatting\"\n\nAdd \"Masks make discriminative models great again!\"\n\nAdd \"GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond\"\n\nAdd \"LOD-GS: Level-of-Detail-Sensitive 3D Gaussian Splatting for Detail Conserved Anti-Aliasing\"\n\nAdd \"Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space\"\n\nAdd \"GDGS: 3D Gaussian Splatting Via Geometry-Guided Initialization And Dynamic Density Control\"\n\nAdd \"MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction\"\n\nAdd \"GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering\"\n\nAdd \"AttentionGS: Towards Initialization-Free 3D Gaussian Splatting via Structural Attention\"\n\n### 2025/08/07\n\nAdd \"SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting\"\n\nAdd \"Endo-4DGX: Robust Endoscopic Scene Reconstruction and Illumination Correction with Gaussian Splatting\"\n\nAdd \"TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints\"\n\nAdd \"From Coarse to Fine: Learnable Discrete Wavelet Transforms for Efficient 3D Gaussian Splatting\"\n\nAdd \"Confident Splatting: Confidence-Based Compression of 3D Gaussian Splatting via Learnable Beta Distributions\"\n\nAdd \"RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors\"\n\nAdd \"VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding\"\n\nAdd \"RoboPearls: Editable Video Simulation for Robot Manipulation\"\n\nAdd \"Part Segmentation and Motion Estimation for Articulated Objects with Dynamic 3D Gaussians\"\n\nAdd \"DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model\"\n\nAdd \"Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field\"\n\nAdd \"SAR-GS: 3D Gaussian Splatting for Synthetic Aperture Radar Target Reconstruction\"\n\nAdd \"ICP-3DGS: SfM-free 3D Gaussian Splatting for Large-scale Unbounded Scenes\"\n\nAdd \"MADrive: Memory-Augmented Driving Scene Modeling\"\n\nAdd \"GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation\"\n\nAdd \"EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting\"\n\nAdd \"Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction\"\n\nAdd \"Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image\"\n\nAdd \"CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization\"\n\nAdd \"DBMovi-GS: Dynamic View Synthesis from Blurry Monocular Video via Sparse-Controlled Gaussian Splatting\"\n\nAdd \"3DGH: 3D Head Generation with Composable Hair and 
Face\"\n\nAdd \"Virtual Memory for 3D Gaussian Splatting\"\n\nAdd \"SOF: Sorted Opacity Fields for Fast Unbounded Surface Reconstruction\"\n\nAdd \"GRAND-SLAM: Local Optimization for Globally Consistent Large-Scale Multi-Agent Gaussian SLAM\"\n\nAdd \"4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation\"\n\nAdd \"BulletGen: Improving 4D Reconstruction with Bullet-Time Generation\"\n\nAdd \"2D Triangle Splatting for Direct Differentiable Mesh Training\"\n\nAdd \"3D Gaussian Splatting for Fine-Detailed Surface Reconstruction in Large-Scale Scene\"\n\nAdd \"Part2GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting\"\n\nAdd \"GraphGSOcc: Semantic-Geometric Graph Transformer with Dynamic-Static Decoupling for 3D Gaussian Splatting-based Occupancy Prediction\"\n\nAdd \"SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting\"\n\nAdd \"3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting\"\n\n### 2025/08/06\n\nAdd \"HRGS: Hierarchical Gaussian Splatting for Memory-Efficient High-Resolution 3D Reconstruction\"\n\nAdd \"GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation\"\n\nAdd \"GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics\"\n\nAdd \"PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images\"\n\nAdd \"Micro-macro Gaussian Splatting with Enhanced Scalability for Unconstrained Scene Reconstruction\"\n\nAdd \"Multiview Geometric Regularization of Gaussian Splatting for Accurate Radiance Fields\"\n\nAdd \"GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction\"\n\nAdd \"Metropolis-Hastings Sampling for 3D Gaussian Reconstruction\"\n\nAdd \"SMPL Normal Map Is All You Need for Single-view Textured Human Reconstruction\"\n\nAdd \"Efficient multi-view training for 3D Gaussian Splatting\"\n\nAdd \"Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors\"\n\nAdd \"Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting\"\n\nAdd \"PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting\"\n\nAdd \"DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos\"\n\n### 2025/08/05\n\nAdd \"UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting\"\n\nAdd \"Self-Supervised Multi-Part Articulated Objects Modeling via Deformable Gaussian Splatting and Progressive Primitive Segmentation\"\n\nAdd \"SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields\"\n\nAdd \"Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS\"\n\nAdd \"HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene\"\n\nAdd \"TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation\"\n\nAdd \"ODG: Occupancy Prediction Using Dual Gaussians\"\n\nAdd \"UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images\"\n\nAdd \"STREAMINGGS: Voxel-Based Streaming 3D Gaussian Splatting with Memory Optimization and Architectural Support\"\n\nAdd \"StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams\"\n\nAdd \"Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting\"\n\nAdd 
\"SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting\"\n\nAdd \"TraGraph-GS: Trajectory Graph-based Gaussian Splatting for Arbitrary Large-Scale Scene Rendering\"\n\nAdd \"Complex-Valued Holographic Radiance Fields\"\n\nAdd \"Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes\"\n\nAdd \"GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution\"\n\n### 2025/07/31\n\nAdd \"FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity\"\n\nAdd \"R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation\"\n\nAdd \"OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting\"\n\nAdd \"ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views\"\n\nAdd \"PIG: Physically-based Multi-Material Interaction with 3D Gaussians\"\n\nAdd \"Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation\"\n\nAdd \"Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization\"\n\nAdd \"Hybrid Mesh-Gaussian Representation for Efficient Indoor Scene Reconstruction\"\n\nAdd \"Gaussian Mapping for Evolving Scenes\"\n\nAdd \"Multi-StyleGS: Stylizing Gaussian Splatting with Multiple Styles\"\n\nAdd \"Hi-LSplat: Hierarchical 3D Language Gaussian Splatting\"\n\nAdd \"Parametric Gaussian Human Model: Generalizable Prior for Efficient and Realistic Human Avatar Modeling\"\n\nAdd \"GS4: Generalizable Sparse Splatting Semantic SLAM\"\n\nAdd \"Splat and Replace: 3D Reconstruction with Repetitive Elements\"\n\nAdd \"BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading\"\n\nAdd \"Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments\"\n\nAdd \"SurGSplat: Progressive Geometry-Constrained Gaussian Splatting for Surgical Scene Reconstruction\"\n\nAdd \"Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy\"\n\nAdd \"VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction\"\n\n### 2025/06/27\n\nAdd \"On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images\"\n\nAdd \"ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting\"\n\nAdd \"S2GO: Streaming Sparse Gaussian Occupancy Prediction\"\n\nAdd \"Gen4D: Synthesizing Humans and Scenes in the Wild\"\n\nAdd \"FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction\"\n\nAdd \"Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting\"\n\nAdd \"DSG-World: Learning a 3D Gaussian World Model from Dual State Videos\"\n\nAdd \"OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View\"\n\nAdd \"Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training\"\n\nAdd \"UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting\"\n\nAdd \"Point Cloud Segmentation of Agricultural Vehicles using 3D Gaussian Splatting\"\n\nAdd \"Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer\"\n\nAdd \"Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations\"\n\nAdd \"HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting\"\n\nAdd \"FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting\"\n\nAdd \"Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot 
Data\"\n\nAdd \"JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting\"\n\nAdd \"SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting\"\n\nAdd \"Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting\"\n\nAdd \"Multi-Spectral Gaussian Splatting with Neural Color Representation\"\n\nAdd \"LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM\"\n\nAdd \"Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone\"\n\nAdd \"RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS\"\n\nAdd \"VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians\"\n\nAdd \"EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR\"\n\nAdd \"GSCodec Studio: A Modular Framework for Gaussian Splat Compression\"\n\nAdd \"WorldExplorer: Towards Generating Fully Navigable 3D Scenes\"\n\nAdd \"RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes\"\n\nAdd \"CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting\"\n\nAdd \"PromptVFX: Text-Driven Fields for Open-World 3D Gaussian Animation\"\n\nAdd \"Globally Consistent RGB-D SLAM with 2D Gaussian Splatting\"\n\nAdd \"3D Gaussian Splat Vulnerabilities\"\n\nAdd \"Adaptive Voxelization for Transform coding of 3D Gaussian splatting data\"\n\nAdd \"Understanding while Exploring: Semantics-driven Active Mapping\"\n\nAdd \"AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion\"\n\nAdd \"TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores\"\n\nAdd \"Tackling View-Dependent Semantics in 3D Language Gaussian Splatting\"\n\nAdd \"GARLIC: GAussian Representation LearnIng for spaCe partitioning\"\n\nAdd \"3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians\"\n\nAdd \"ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS\"\n\nAdd \"AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views\"\n\nAdd \"Holistic Large-Scale Scene Reconstruction via Mixed Gaussian Splatting\"\n\n### 2025/06/25\n\nAdd \"LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering\"\n\nAdd \"SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images\"\n\nAdd \"Pose-free 3D Gaussian splatting via shape-ray estimation\"\n\nAdd \"3DGS Compression with Sparsity-guided Hierarchical Transform Coding\"\n\nAdd \"4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians\"\n\nAdd \"CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting\"\n\nAdd \"STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering\"\n\nAdd \"UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments\"\n\nAdd \"Learning Fine-Grained Geometry for Sparse-View Splatting via Cascade Depth Loss\"\n\nAdd \"Hyperspectral Gaussian Splatting\"\n\nAdd \"Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis\"\n\nAdd \"MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation\"\n\nAdd \"Plenodium: UnderWater 3D Scene Reconstruction with Plenoptic Medium Representation\"\n\n### 2025/06/24\n\nAdd \"3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics Based Appearance-Medium Decoupling\"\n\nAdd \"CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual 
Gaussians\"\n\nAdd \"ProBA: Probabilistic Bundle Adjustment with the Bhattacharyya Coefficient\"\n\nAdd \"Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting\"\n\nAdd \"Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting\"\n\nAdd \"OmniIndoor3D: Comprehensive Indoor 3D Reconstruction\"\n\nAdd \"WeatherEdit: Controllable Weather Editing with 4D Gaussian Field\"\n\nAdd \"CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting\"\n\nAdd \"ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation\"\n\nAdd \"HaloGS: Loose Coupling of Compact Geometry and Gaussian Splats for 3D Scenes\"\n\nAdd \"ErpGS: Equirectangular Image Rendering enhanced with 3D Gaussian Regularization\"\n\nAdd \"Sparse2DGS: Sparse-View Surface Reconstruction using 2D Gaussian Splatting with Dense Point Cloud\"\n\nAdd \"ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting\"\n\nAdd \"Improving Novel view synthesis of 360° Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images\"\n\nAdd \"Triangle Splatting for Real-Time Radiance Field Rendering\"\n\nAdd \"FHGS: Feature-Homogenized Gaussian Splatting\"\n\nAdd \"Veta-GS: View-dependent deformable 3D Gaussian Splatting for thermal infrared Novel-view Synthesis\"\n\nAdd \"VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes\"\n\nAdd \"Efficient Differentiable Hardware Rasterization for 3D Gaussian Splatting\"\n\nAdd \"SuperGS: Consistent and Detailed 3D Super-Resolution Scene Reconstruction via Gaussian Splatting\"\n\nAdd \"Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance\"\n\nAdd \"CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis\"\n\nAdd \"From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation\"\n\nAdd \"Render-FM: A Foundation Model for Real-time Photorealistic Volumetric Rendering\"\n\n### 2025/06/19\n\nAdd \"Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction\"\n\nAdd \"RUSplatting: Robust 3D Gaussian Splatting for Sparse-View Underwater Scene Reconstruction\"\n\nAdd \"PlantDreamer: Achieving Realistic 3D Plant Models with Diffusion-Guided Gaussian Splatting\"\n\nAdd \"EVA: Expressive Virtual Avatars from Multi-view Videos\"\n\nAdd \"GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation\"\n\nAdd \"X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography\"\n\nAdd \"GT^2-GS: Geometry-aware Texture Transfer for Gaussian Splatting\"\n\nAdd \"MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models\"\n\nAdd \"Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning\"\n\nAdd \"Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image\"\n\nAdd \"MGStream: Motion-aware 3D Gaussian for Streamable Dynamic Scene Reconstruction\"\n\nAdd \"Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos\"\n\nAdd \"Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation\"\n\nAdd \"3D Gaussian Adaptive Reconstruction for Fourier Light-Field Microscopy\"\n\nAdd \"TACOcc: Target-Adaptive Cross-Modal Fusion with Volume Rendering for 3D Semantic Occupancy\"\n\nAdd \"SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from 
Limited Observations\"\n\nAdd \"iSegMan: Interactive Segment-and-Manipulate 3D Gaussians\"\n\nAdd \"GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity\"\n\nAdd \"MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos\"\n\nAdd \"GrowSplat: Constructing Temporal Digital Twins of Plants with Gaussian Splats\"\n\nAdd \"EA-3DGS: Efficient and Adaptive 3D Gaussians with Highly Enhanced Quality for outdoor scenes\"\n\nAdd \"GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention\"\n\nAdd \"ExploreGS: a vision-based low overhead framework for 3D scene reconstruction\"\n\nAdd \"Consistent Quantity-Quality Control across Scenes for Deployment-Aware Gaussian Splatting\"\n\nAdd \"VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality\"\n\nAdd \"ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars\"\n\nAdd \"Large-Scale Gaussian Splatting SLAM\"\n\nAdd \"Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware\"\n\nAdd \"Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians\"\n\nAdd \"Neural Video Compression using 2D Gaussian Splatting\"\n\nAdd \"TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian\"\n\nAdd \"DLO-Splatting: Tracking Deformable Linear Objects Using 3D Gaussian Splatting\"\n\nAdd \"FOCI: Trajectory Optimization on Gaussian Splats\"\n\nAdd \"SLAG: Scalable Language-Augmented Gaussian Splatting\"\n\nAdd \"Monocular Online Reconstruction with Enhanced Detail Preservation\"\n\n### 2025/05/28\n\nAdd \"3D Scene Generation: A Survey\"\n\nAdd \"Virtualized 3D Gaussians: Flexible Cluster-based Level-of-Detail System for Real-Time Rendering of Composed Scenes\"\n\nAdd \"TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling\"\n\nAdd \"UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes\"\n\nAdd \"QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization\"\n\nAdd \"Steepest Descent Density Control for Compact 3D Gaussian Splatting\"\n\nAdd \"Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation\"\n\nAdd \"SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation\"\n\nAdd \"Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields\"\n\nAdd \"MoRe-3DGSMR: Motion-resolved reconstruction framework for free-breathing pulmonary MRI based on 3D Gaussian representation\"\n\nAdd \"SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction\"\n\nAdd \"GSsplat: Generalizable Semantic Gaussian Splatting for Novel-view Synthesis in 3D Scenes\"\n\nAdd \"Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting\"\n\nAdd \"GUAVA: Generalizable Upper Body 3D Gaussian Avatar\"\n\nAdd \"3D Gaussian Splatting Data Compression with Mixture of Priors\"\n\nAdd \"Sparfels: Fast Reconstruction from Sparse Unposed Imagery\"\n\nAdd \"SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting\"\n\nAdd \"GarmentGS: Point-Cloud Guided Gaussian Splatting for High-Fidelity Non-Watertight 3D Garment Reconstruction\"\n\nAdd \"SignSplat: Rendering Sign Language via Gaussian Splatting\"\n\nAdd \"HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder\"\n\nAdd \"GenSync: A Generalized Talking 
Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting\"\n\nAdd \"AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting\"\n\nAdd \"FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors\"\n\nAdd \"Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting\"\n\nAdd \"Real-Time Animatable 2DGS-Avatars with Detail Enhancement from Monocular Videos\"\n\nAdd \"HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation\"\n\nAdd \"GauSS-MI: Gaussian Splatting Shannon Mutual Information for Active 3D Reconstruction\"\n\nAdd \"GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion\"\n\nAdd \"EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian\"\n\nAdd \"Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting\"\n\nAdd \"GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting\"\n\nAdd \"Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views\"\n\n### 2025/05/19\n\nAdd \"GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field\"\n\nAdd \"Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting\"\n\nAdd \"4DGS-CC: A Contextual Coding Framework for 4D Gaussian Splatting Data Compression\"\n\nAdd \"TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians\"\n\nAdd \"RGS-DR: Reflective Gaussian Surfels with Deferred Rendering for Shiny Objects\"\n\nAdd \"STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting\"\n\nAdd \"PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models\"\n\nAdd \"iVR-GS: Inverse Volume Rendering for Explorable Visualization via Editable 3D Gaussian Splatting\"\n\nAdd \"Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning\"\n\nAdd \"SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos\"\n\nAdd \"CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos\"\n\nAdd \"When Gaussian Meets Surfel: Ultra-fast High-fidelity Radiance Field Rendering\"\n\nAdd \"Gaussian Splatting is an Effective Data Generator for 3D Object Detection\"\n\nAdd \"PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation\"\n\nAdd \"HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction\"\n\nAdd \"ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration\"\n\nAdd \"StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians\"\n\nAdd \"Immersive Teleoperation Framework for Locomanipulation Tasks\"\n\nAdd \"MoBGS: Motion Deblurring Dynamic 3D Gaussian Splatting for Blurry Monocular Video\"\n\nAdd \"IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays\"\n\nAdd \"NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation\"\n\nAdd \"VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control\"\n\nAdd \"Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding\"\n\nAdd \"SEGA: Drivable 3D Gaussian Head Avatar from a Single Image\"\n\nAdd \"Green Robotic Mixed Reality with Gaussian Splatting\"\n\nAdd \"EG-Gaussian: Epipolar Geometry and Graph 
Network Enhanced 3D Gaussian Splatting\"\n\nAdd \"Volume Encoding Gaussians: Transfer Function-Agnostic 3D Gaussians for Volume Rendering\"\n\nAdd \"BEV-GS: Feed-forward Gaussian Splatting in Bird's-Eye-View for Road Reconstruction\"\n\nAdd \"EDGS: Eliminating Densification for Efficient Convergence of 3DGS\"\n\n### 2025/04/20\n\nAdd \"Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation\"\n\nAdd \"ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos\"\n\nAdd \"Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs\"\n\nAdd \"CompGS++: Compressed Gaussian Splatting for Static and Dynamic Scene Representation\"\n\nAdd \"GSAC: Leveraging Gaussian Splatting for Photorealistic Avatar Creation with Unity Integration\"\n\nAdd \"Second-order Optimization of Gaussian Splats with Importance Sampling\"\n\nAdd \"AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering\"\n\nAdd \"CAGE-GS: High-fidelity Cage Based 3D Gaussian Splatting Deformation\"\n\nAdd \"TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors\"\n\nAdd \"ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior\"\n\nAdd \"SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians\"\n\nAdd \"CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting\"\n\nAdd \"3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians\"\n\nAdd \"Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation\"\n\nAdd \"3D Gabor Splatting: Reconstruction of High-frequency Surface Texture using Gabor Noise\"\n\nAdd \"GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR\"\n\nAdd \"DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting\"\n\nAdd \"LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis\"\n\nAdd \"ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting\"\n\nAdd \"EBAD-Gaussian: Event-driven Bundle Adjusted Deblur Gaussian Splatting\"\n\nAdd \"GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting\"\n\nAdd \"MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling\"\n\nAdd \"LightHeadEd: Relightable & Editable Head Avatars from a Smartphone\"\n\nAdd \"TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting\"\n\nAdd \"EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler\"\n\nAdd \"DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering\"\n\nAdd \"A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds\"\n\nAdd \"BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting\"\n\nAdd \"You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting\"\n\nAdd \"BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting\"\n\n### 2025/04/11\n\nAdd \"FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents\"\n\nAdd \"Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation\"\n\nAdd \"In-2-4D: Inbetweening from Two Single-View Images to 4D 
Generation\"\n\nAdd \"ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting\"\n\nAdd \"InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians\"\n\nAdd \"View-Dependent Uncertainty Estimation of 3D Gaussian Splatting\"\n\nAdd \"GIGA: Generalizable Sparse Image-driven Gaussian Avatars\"\n\nAdd \"SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets\"\n\nAdd \"Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting\"\n\nAdd \"IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments\"\n\nAdd \"SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering\"\n\nAdd \"GSta: Efficient Training Scheme with Siestaed Gaussians for Monocular 3D Scene Reconstruction\"\n\nAdd \"Stochastic Ray Tracing of 3D Transparent Gaussians\"\n\nAdd \"HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation\"\n\nAdd \"econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians\"\n\nAdd \"Micro-splatting: Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting\"\n\nAdd \"View-Dependent Deformation Fields for 2D Editing of 3D Models\"\n\nAdd \"L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery\"\n\nAdd \"Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images\"\n\n### 2025/04/08\n\nAdd \"Let it Snow! Animating Static Gaussian Scenes With Dynamic Weather Effects\"\n\nAdd \"PanoDreamer: Consistent Text to 360-Degree Scene Generation\"\n\nAdd \"3D Gaussian Particle Approximation of VDB Datasets: A Study for Scientific Visualization\"\n\nAdd \"Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM\"\n\nAdd \"3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS\"\n\nAdd \"Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning\"\n\nAdd \"Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization\"\n\nAdd \"WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments\"\n\nAdd \"HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration\"\n\nAdd \"Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization\"\n\nAdd \"Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model\"\n\nAdd \"MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM\"\n\nAdd \"ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation\"\n\nAdd \"Digital-twin imaging based on descattering Gaussian splatting\"\n\nAdd \"UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting\"\n\nAdd \"WorldPrompter: Traversable Text-to-Scene Generation\"\n\nAdd \"Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis\"\n\nAdd \"Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting\"\n\nAdd \"BOGausS: Better Optimized Gaussian Splatting\"\n\nAdd \"FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking\"\n\nAdd \"FlowR: Flowing from Sparse to Dense 3D Reconstructions\"\n\nAdd \"3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting\"\n\nAdd \"RealityAvatar: Towards Realistic Loose Clothing Modeling in Animatable 3D Gaussian Avatar\"\n\nAdd \"High-fidelity 3D Object Generation from Single 
Image with RGBN-Volume Gaussian Reconstruction Model\"\n\nAdd \"Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment\"\n\nAdd \"3D Gaussian Inverse Rendering with Approximated Global Illumination\"\n\nAdd \"DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting\"\n\nAdd \"UnIRe: Unsupervised Instance Decomposition for Dynamic Urban Scene Reconstruction\"\n\nAdd \"Monocular and Generalizable Gaussian Talking Head Animation\"\n\nAdd \"Coca-Splat: Collaborative Optimization for Camera Parameters and 3D Gaussians\"\n\nAdd \"Robust LiDAR-Camera Calibration with 2D Gaussian Splatting\"\n\nAdd \"Distilling Multi-view Diffusion Models into 3D Generators\"\n\nAdd \"ADGaussian: Generalizable Gaussian Splatting for Autonomous Driving with Multi-modal Inputs\"\n\nAdd \"Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration\"\n\nAdd \"LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors\"\n\nAdd \"SonarSplat: Novel View Synthesis of Imaging Sonar via Gaussian Splatting\"\n\nAdd \"Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views\"\n\n### 2025/04/07\n\nAdd \"StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting\"\n\nAdd \"Visual Acoustic Fields\"\n\nAdd \"DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting\"\n\nAdd \"Learning 3D-Gaussian Simulators from RGB Videos\"\n\nAdd \"ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image\"\n\nAdd \"Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR\"\n\nAdd \"Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction\"\n\nAdd \"ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning\"\n\nAdd \"NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations\"\n\nAdd \"CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction\"\n\nAdd \"FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction\"\n\nAdd \"VizFlyt: Perception-centric Pedagogical Framework For Autonomous Aerial Robots\"\n\nAdd \"TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting\"\n\nAdd \"Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis\"\n\nAdd \"EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and Gaussian Splatting\"\n\nAdd \"AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation\"\n\nAdd \"Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance\"\n\nAdd \"ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting\"\n\nAdd \"Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting\"\n\nAdd \"Disentangled 4D Gaussian Splatting: Towards Faster and More Efficient Dynamic Scene Rendering\"\n\n### 2025/03/28\n\nAdd \"EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis\"\n\nAdd \"X^2-Gaussian: 4D Radiative Gaussian Splatting for 
Continuous-time Tomographic Reconstruction\"\n\nAdd \"Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying\"\n\nAdd \"RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting\"\n\nAdd \"STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM\"\n\nAdd \"Frequency-Aware Gaussian Splatting Decomposition\"\n\nAdd \"CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis\"\n\nAdd \"PGC: Physics-Based Gaussian Cloth from a Single Pose\"\n\nAdd \"Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields\"\n\nAdd \"TC-GS: Tri-plane based compression for 3D Gaussian Splatting\"\n\nAdd \"EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis\"\n\nAdd \"PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model\"\n\nAdd \"Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields\"\n\nAdd \"High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting\"\n\nAdd \"GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting\"\n\nAdd \"SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors\"\n\nAdd \"COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting\"\n\nAdd \"From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting\"\n\nAdd \"Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussian Splatting\"\n\nAdd \"MATT-GS: Masked Attention-based 3DGS for Robot Perception and Object Detection\"\n\nAdd \"HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting\"\n\nAdd \"NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting\"\n\nAdd \"GS-Marker: Generalizable and Robust Watermarking for 3D Gaussian Splatting\"\n\nAdd \"Hardware-Rasterized Ray-Based Gaussian Splatting\"\n\nAdd \"LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment\"\n\nAdd \"StableGS: A Floater-Free Framework for 3D Gaussian Splatting\"\n\nAdd \"ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation\"\n\nAdd \"4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video\"\n\nAdd \"DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds\"\n\nAdd \"GI-SLAM: Gaussian-Inertial SLAM\"\n\nAdd \"Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving\"\n\nAdd \"PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding\"\n\nAdd \"PanopticSplatting: End-to-End Panoptic Gaussian Splatting\"\n\nAdd \"SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining\"\n\nAdd \"Real-time Global Illumination for Dynamic 3D Gaussian Scenes\"\n\nAdd \"GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting\"\n\nAdd \"GS-LTS: 3D Gaussian Splatting-Based Adaptive Modeling for Long-Term Service Robots\"\n\nAdd \"Is there anything left? 
Measuring semantic residuals of objects removed from 3D Gaussian Splatting\"\n\nAdd \"Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping\"\n\nAdd \"ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes\"\n\n### 2025/03/24\n\nAdd \"TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting\"\n\nAdd \"Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting\"\n\nAdd \"DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery\"\n\nAdd \"Optimized Minimal 3D Gaussian Splatting\"\n\nAdd \"RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos\"\n\nAdd \"SAGE: Semantic-Driven Adaptive Gaussian Splatting in Extended Reality\"\n\nAdd \"4D Gaussian Splatting SLAM\"\n\nAdd \"GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting\"\n\nAdd \"1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering\"\n\nAdd \"M3: 3D-Spatial MultiModal Memory\"\n\nAdd \"Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images\"\n\nAdd \"OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering\"\n\nAdd \"Enhancing Close-up Novel View Synthesis via Pseudo-labeling\"\n\nAdd \"Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation\"\n\nAdd \"VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling\"\n\nAdd \"BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting\"\n\nAdd \"Controlling Avatar Diffusion with Learnable Gaussian Embedding\"\n\nAdd \"Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes\"\n\nAdd \"CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image\"\n\nAdd \"ClimateGS: Real-Time Climate Simulation with 3D Gaussian Style Transfer\"\n\n### 2025/03/21\n\nAdd \"Online Language Splatting\"\n\nAdd \"Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation\"\n\nAdd \"FPGS: Feed-Forward Semantic-aware Photorealistic Style Transfer of Large-Scale Gaussian Splatting\"\n\nAdd \"Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting\"\n\nAdd \"TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness\"\n\nAdd \"GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping\"\n\nAdd \"3D Student Splatting and Scooping\"\n\nAdd \"GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction\"\n\nAdd \"ROODI: Reconstructing Occluded Objects with Denoising Inpainters\"\n\nAdd \"VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames\"\n\nAdd \"4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models\"\n\nAdd \"MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction\"\n\nAdd \"LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds\"\n\nAdd \"RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors\"\n\nAdd \"Uncertainty-Aware Normal-Guided Gaussian Splatting for Surface Reconstruction from Sparse Image Sequences\"\n\nAdd \"EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian 
Splatting\"\n\nAdd \"Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information\"\n\nAdd \"Industrial-Grade Sensor Simulation via Gaussian Splatting: A Modular Framework for Scalable Editing and Full-Stack Validation\"\n\nAdd \"Snapmoji: Instant Generation of Animatable Dual-Stylized Avatars\"\n\nAdd \"DynaGSLAM: Real-Time Gaussian-Splatting SLAM for Online Rendering, Tracking, Motion Predictions of Moving Objects in Dynamic Scenes\"\n\nAdd \"DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting\"\n\nAdd \"3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction\"\n\nAdd \"REdiSplats: Ray Tracing for Editable Gaussian Splatting\"\n\nAdd \"Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene\"\n\nAdd \"GS-I3: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images\"\n\nAdd \"TopoGaussian: Inferring Internal Topology Structures from Visual Clues\"\n\nAdd \"VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting\"\n\nAdd \"SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs\"\n\nAdd \"MTGS: Multi-Traversal Gaussian Splatting\"\n\nAdd \"Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View\"\n\nAdd \"Deblur Gaussian Splatting SLAM\"\n\nAdd \"CompMarkGS: Robust Watermarking for Compression 3D Gaussian Splatting\"\n\nAdd \"CAT-3DGS Pro: A New Benchmark for Efficient 3DGS Compression\"\n\nAdd \"RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars\"\n\nAdd \"Gaussian On-the-Fly Splatting: A Progressive Framework for Robust Near Real-Time 3DGS Optimization\"\n\nAdd \"DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction\"\n\nAdd \"Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors\"\n\nAdd \"Light4GS: Lightweight Compact 4D Gaussian Splatting Generation via Context Model\"\n\nAdd \"BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering\"\n\nAdd \"Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting\"\n\nAdd \"Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images\"\n\nAdd \"RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images\"\n\nAdd \"Improving Adaptive Density Control for 3D Gaussian Splatting\"\n\nAdd \"Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation\"\n\nAdd \"SplatVoxel: History-Aware Novel View Streaming without Temporal Training\"\n\nAdd \"HandSplat: Embedding-Driven Gaussian Splatting for High-Fidelity Hand Rendering\"\n\nAdd \"SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting\"\n\n### 2025/03/20\n\nAdd \"MVD-HuGaS: Human Gaussians from a Single Image via 3D Human Multi-view Diffusion Prior\"\n\nAdd \"S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction\"\n\nAdd \"Dynamic Scene Reconstruction: Recent Advance in Real-time Rendering and Streaming\"\n\nAdd \"DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction\"\n\nAdd \"Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior\"\n\nAdd \"Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars\"\n\nAdd \"LUCAS: Layered Universal Codec Avatars\"\n\n### 
2025/03/17\n\nAdd \"Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training\"\n\nAdd \"GASPACHO: Gaussian Splatting for Controllable Humans and Objects\"\n\nAdd \"Motion Blender Gaussian Splatting for Dynamic Reconstruction\"\n\nAdd \"PCGS: Progressive Compression of 3D Gaussian Splatting\"\n\nAdd \"TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting\"\n\nAdd \"Mitigating Ambiguities in 3D Classification with Gaussian Splatting\"\n\nAdd \"Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios\"\n\nAdd \"HRAvatar: High-Quality and Relightable Gaussian Head Avatar\"\n\nAdd \"ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting\"\n\nAdd \"MVGSR: Multi-View Consistency Gaussian Splatting for Robust Surface Reconstruction\"\n\nAdd \"GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats\"\n\nAdd \"7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting\"\n\nAdd \"POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality\"\n\nAdd \"SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting\"\n\nAdd \"EigenGS Representation: From Eigenspace to Gaussian Image Space\"\n\nAdd \"All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting\"\n\nAdd \"Frequency-Aware Density Control via Reparameterization for High-Quality Rendering of 3D Gaussian Splatting\"\n\nAdd \"DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation\"\n\nAdd \"ActiveInitSplat: How Active Image Selection Helps Gaussian Splatting\"\n\nAdd \"Gaussian RBFNet: Gaussian Radial Basis Functions for Fast and Accurate Representation and Reconstruction of Neural Fields\"\n\nAdd \"CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving\"\n\nAdd \"D3DR: Lighting-Aware Object Insertion in Gaussian Splatting\"\n\nAdd \"REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints\"\n\nAdd \"Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling\"\n\nAdd \"Introducing Unbiased Depth into 2D Gaussian Splatting for High-accuracy Surface Reconstruction\"\n\nAdd \"StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting\"\n\nAdd \"SplatTalk: 3D VQA with Gaussian Splatting\"\n\nAdd \"StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams\"\n\nAdd \"ForestSplats: Deformable transient field for Gaussian Splatting in the Wild\"\n\nAdd \"Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction\"\n\nAdd \"GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation\"\n\nAdd \"SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography\"\n\nAdd \"Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting\"\n\n### 2025/03/10\n\nAdd \"D2GV: Deformable 2D Gaussian Splatting for Video Representation in 400FPS\"\n\nAdd \"Free Your Hands: Lightweight Relightable Turntable Capture Pipeline\"\n\nAdd \"LiDAR-enhanced 3D Gaussian Splatting Mapping\"\n\nAdd \"Self-Modeling Robots by Photographing\"\n\nAdd \"CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images\"\n\nAdd \"STGA: Selective-Training Gaussian Head 
Avatars\"\n\nAdd \"Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects\"\n\nAdd \"MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions\"\n\nAdd \"SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting\"\n\nAdd \"SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting\"\n\nAdd \"EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation\"\n\nAdd \"GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting\"\n\nAdd \"GSplatVNM: Point-of-View Synthesis for Visual Navigation Models Using Gaussian Splatting\"\n\nAdd \"Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs\"\n\nAdd \"GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting\"\n\nAdd \"S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting\"\n\nAdd \"Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting\"\n\nAdd \"Surgical Gaussian Surfels: Highly Accurate Real-time Surgical Scene Rendering\"\n\nAdd \"Beyond Existance: Fulfill 3D Reconstructed Scenes with Pseudo Details\"\n\nAdd \"GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding\"\n\nAdd \"GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics\"\n\nAdd \"LensDFF: Language-enhanced Sparse Feature Distillation for Efficient Few-Shot Dexterous Manipulation\"\n\nAdd \"NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics\"\n\nAdd \"Compression in 3D Gaussian Splatting: A Survey of Methods, Trends, and Future Directions\"\n\nAdd \"2DGS-Avatar: Animatable High-fidelity Clothed Avatar via 2D Gaussian Splatting\"\n\nAdd \"DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting\"\n\nAdd \"Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization\"\n\nAdd \"Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models\"\n\nAdd \"OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding\"\n\nAdd \"LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training\"\n\nAdd \"FGS-SLAM: Fourier-based Gaussian Splatting for Real-time SLAM with Sparse and Dense Map Fusion\"\n\nAdd \"Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization\"\n\nAdd \"Vid2Fluid: 3D Dynamic Fluid Assets from Single-View Videos with Generative Gaussian Splatting\"\n\nAdd \"PSRGS:Progressive Spectral Residual of 3D Gaussian for High-Frequency Recovery\"\n\nAdd \"Enhancing Monocular 3D Scene Completion with Diffusion Model\"\n\nAdd \"GaussianSeal: Rooting Adaptive Watermarks for 3D Gaussian Generation Model\"\n\nAdd \"Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups\"\n\nAdd \"CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression\"\n\nAdd \"Abstract Rendering: Computing All that is Seen in Gaussian Splat Scenes\"\n\nAdd \"Seeing A 3D World in A Grain of Sand\"\n\nAdd \"FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering\"\n\nAdd \"EndoPBR: Material and Lighting Estimation for Photorealistic Surgical 
Simulations via Physically-based Rendering\"\n\nAdd \"ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting\"\n\nAdd \"Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling\"\n\nAdd \"No Parameters, No Problem: 3D Gaussian Splatting without Camera Intrinsics and Extrinsics\"\n\nAdd \"Open-Vocabulary Semantic Part Segmentation of 3D Human\"\n\nAdd \"Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting\"\n\nAdd \"Does 3D Gaussian Splatting Need Accurate Volumetric Rendering?\"\n\nAdd \"OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation\"\n\n### 2025/02/27\n\nAdd \"UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting\"\n\nAdd \"Laplace-Beltrami Operator for Gaussian Splatting\"\n\nAdd \"Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting\"\n\nAdd \"GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow\"\n\nAdd \"GS-TransUNet: Integrated 2D Gaussian Splatting and Transformer UNet for Accurate Skin Lesion Analysis\"\n\nAdd \"Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration\"\n\nAdd \"Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control\"\n\nAdd \"RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes\"\n\nAdd \"DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation\"\n\nAdd \"GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models\"\n\nAdd \"Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting\"\n\nAdd \"High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation\"\n\n### 2025/02/21\n\nAdd \"CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting\"\n\nAdd \"OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving\"\n\nAdd \"GlossGau: Efficient Inverse Rendering for Glossy Surface with Anisotropic Spherical Gaussian\"\n\nAdd \"Inter3D: A Benchmark and Strong Baseline for Human-Interactive 3D Object Reconstruction\"\n\nAdd \"3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments\"\n\nAdd \"GS-QA: Comprehensive Quality Assessment Benchmark for Gaussian Splatting View Synthesis\"\n\nAdd \"RadSplatter: Extending 3D Gaussian Splatting to Radio Frequencies for Wireless Radiomap Extrapolation\"\n\nAdd \"PUGS: Zero-shot Physical Understanding with Gaussian Splatting\"\n\n### 2025/02/18\n\nAdd \"3D Gaussian Inpainting with Depth-Guided Cross-View Consistency\"\n\nAdd \"Exploring the Versal AI Engine for 3D Gaussian Splatting\"\n\nAdd \"GaussianMotion: End-to-End Learning of Animatable Gaussian Avatars with Pose Guidance from Text\"\n\nAdd \"OMG: Opacity Matters in Material Modeling with Gaussian Splatting\"\n\nAdd \"GS-GVINS: A Tightly-integrated GNSS-Visual-Inertial Navigation System Augmented by 3D Gaussian Splatting\"\n\nAdd \"E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting\"\n\nAdd \"X-SG2S: Safe and Generalizable Gaussian Splatting with X-dimensional Watermarks\"\n\nAdd \"Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction\"\n\nAdd \"DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior\"\n\nAdd \"Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian 
Splatting\"\n\nAdd \"TranSplat: Surface Embedding-guided 3D Gaussian Splatting for Transparent Object Manipulation\"\n\nAdd \"MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization\"\n\nAdd \"Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors\"\n\nAdd \"SIREN: Semantic, Initialization-Free Registration of Multi-Robot Gaussian Splatting Maps\"\n\nAdd \"Three-Dimensional MRI Reconstruction with Gaussian Representations: Tackling the Undersampling Problem\"\n\nAdd \"Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform\"\n\nAdd \"PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map\"\n\nAdd \"AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting\"\n\nAdd \"GaussRender: Learning 3D Occupancy with Gaussian Rendering\"\n\nAdd \"OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting\"\n\nAdd \"SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting\"\n\nAdd \"High-Speed Dynamic 3D Imaging with Sensor Fusion Splatting\"\n\nAdd \"GARAD-SLAM: 3D GAussian splatting for Real-time Anti Dynamic SLAM\"\n\nAdd \"GP-GS: Gaussian Processes for Enhanced Gaussian Splatting\"\n\nAdd \"LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation\"\n\nAdd \"UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping\"\n\nAdd \"Scalable 3D Gaussian Splatting-Based RF Signal Spatial Propagation Modeling\"\n\nAdd \"VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion\"\n\nAdd \"Radiant Foam: Real-Time Differentiable Ray Tracing\"\n\n### 2025/02/04\n\nAdd \"PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation\"\n\nAdd \"EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis\"\n\nAdd \"Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation\"\n\nAdd \"Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping\"\n\nAdd \"RaySplats: Ray Tracing based Gaussian Splatting\"\n\nAdd \"JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting\"\n\nAdd \"OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation\"\n\nAdd \"Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting\"\n\nAdd \"Deformable Beta Splatting\"\n\nAdd \"StructuredField: Unifying Structured Geometry and Radiance Field\"\n\n### 2025/02/03\n\nAdd \"VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting\"\n\nAdd \"CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering\"\n\nAdd \"FeatureGS: Eigenvalue-Feature Optimization in 3D Gaussian Splatting for Geometrically Accurate and Artifact-Reduced Reconstruction\"\n\nAdd \"Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds\"\n\nAdd \"DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation\"\n\nAdd \"GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting\"\n\nAdd \"Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos\"\n\nAdd \"HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion\"\n\nAdd \"Trick-GS: A Balanced Bag of Tricks for Efficient Gaussian Splatting\"\n\nAdd \"Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D 
Reconstruction from Noisy Video\"\n\nAdd \"Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images\"\n\nAdd \"Dense-SfM: Structure from Motion with Dense Consistent Matching\"\n\nAdd \"HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting\"\n\nAdd \"3DGS2: Near Second-order Converging 3D Gaussian Splatting\"\n\nAdd \"GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting\"\n\nAdd \"GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression\"\n\nAdd \"MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance\"\n\nAdd \"GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization\"\n\nAdd \"VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM\"\n\nAdd \"Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos\"\n\nAdd \"Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes\"\n\n### 2025/01/22\n\nAdd \"DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions\"\n\nAdd \"HAC++: Towards 100X Compression of 3D Gaussian Splatting\"\n\nAdd \"GaussianVideo: Efficient Video Representation Through 2D Gaussian Splatting\"\n\nAdd \"See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization\"\n\nAdd \"RDG-GS: Relative Depth Guidance with Gaussian Splatting for Real-time Sparse-View 3D Rendering\"\n\nAdd \"Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car Reconstruction\"\n\nAdd \"Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting\"\n\nAdd \"BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation\"\n\nAdd \"GSTAR: Gaussian Surface Tracking and Reconstruction\"\n\nAdd \"GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor\"\n\nAdd \"Creating Virtual Environments with 3D Gaussian Splatting: A Comparative Study\"\n\n### 2025/01/16\n\nAdd \"CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with Gaussian Representation\"\n\nAdd \"GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping\"\n\nAdd \"3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering\"\n\nAdd \"VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes\"\n\nAdd \"Evaluating Human Perception of Novel View Synthesis: Subjective Quality Assessment of Gaussian Splatting and NeRF in Dynamic Scenes\"\n\nAdd \"3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud or Mesh\"\n\nAdd \"RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video Based on Rectified Mesh-embedded Gaussians\"\n\nAdd \"SplatMAP: Online Dense Monocular SLAM with 3D Gaussian Splatting\"\n\nAdd \"ActiveGAMER: Active GAussian Mapping through Efficient Rendering\"\n\nAdd \"Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution\"\n\nAdd \"F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting\"\n\n### 2025/01/13\n\nAdd \"Locality-aware Gaussian Compression for Fast and High-quality Rendering\"\n\nAdd \"Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation\"\n\nAdd \"Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance\"\n\nAdd \"Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and Photorealistic Mapping\"\n\nAdd \"GaussianVideo: 
Efficient Video Representation via Hierarchical Gaussian Splatting\"\n\nAdd \"FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian Splatting with Depth-Feature Consistency\"\n\nAdd \"Spatiotemporal Gaussian Optimization for 4D Cone Beam CT Reconstruction from Sparse Projections\"\n\nAdd \"ZDySS -- Zero-Shot Dynamic Scene Stylization using Gaussian Splatting\"\n\nAdd \"MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting\"\n\nAdd \"DehazeGS: Seeing Through Fog with 3D Gaussian Splatting\"\n\nAdd \"ConcealGS: Concealing Invisible Copyright Information in 3D Gaussian Splatting\"\n\nAdd \"Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs\"\n\nAdd \"Gaussian Masked Autoencoders\"\n\nAdd \"HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation\"\n\nAdd \"GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking\"\n\nAdd \"EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation\"\n\n### 2025/01/06\n\nAdd \"Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision\"\n\nAdd \"CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction\"\n\nAdd \"PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping\"\n\nAdd \"Deformable Gaussian Splatting for Efficient and High-Fidelity Reconstruction of Surgical Scenes\"\n\nAdd \"EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy\"\n\nAdd \"Gaussian Building Mesh (GBM): Extract a Building's 3D Mesh with Google Earth and Gaussian Splatting\"\n\nAdd \"STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes\"\n\nAdd \"DreamDrive: Generative 4D Scene Modeling from Street View Images\"\n\nAdd \"PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM\"\n\nAdd \"SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians\"\n\nAdd \"OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies\"\n\nAdd \"PERSE: Personalized 3D Generative Avatars from A Single Portrait\"\n\nAdd \"KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences\"\n\nAdd \"4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives\"\n\nAdd \"MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks\"\n\nAdd \"DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis\"\n\nAdd \"GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting\"\n\nAdd \"FlameGS: Reconstruct flame light field via Gaussian Splatting\"\n\nAdd \"DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction\"\n\nAdd \"Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images\"\n\nAdd \"Learning Radiance Fields from a Single Snapshot Compressive Image\"\n\nAdd \"BeSplat -- Gaussian Splatting from a Single Blurry Image and Event Stream\"\n\nAdd \"Reflective Gaussian Splatting\"\n\nAdd \"Generating Editable Head Avatars with 3D Gaussian GANs\"\n\nAdd \"CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting\"\n\nAdd \"MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo\"\n\nAdd \"WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian Splatting\"\n\nAdd \"ArtNVG: Content-Style Separated Artistic 
Neighboring-View Gaussian Stylization\"\n\n### 2024/12/26\n\nAdd \"RSGaussian:3D Gaussian Splatting with LiDAR for Aerial Remote Sensing Novel View Synthesis\"\n\nAdd \"FaceLift: Single Image to 3D Head with View Generation and GS-LRM\"\n\nAdd \"ActiveGS: Active Scene Reconstruction using Gaussian Splatting\"\n\nAdd \"GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance\"\n\nAdd \"LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding\"\n\nAdd \"CoSurfGS:Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction\"\n\nAdd \"Exploring Dynamic Novel View Synthesis Technologies for Cinematography\"\n\nAdd \"Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling\"\n\nAdd \"GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs\"\n\nAdd \"GeoTexDensifier: Geometry-Texture-Aware Densification for High-Quality Photorealistic 3D Gaussian Splatting\"\n\nAdd \"Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity\"\n\nAdd \"OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities\"\n\nAdd \"SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum\"\n\nAdd \"Interactive Scene Authoring with Specialized Generative Primitives\"\n\nAdd \"CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images\"\n\nAdd \"IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing\"\n\nAdd \"AvatarPerfect: User-Assisted 3D Gaussian Splatting Avatar Refinement with Automatic Pose Suggestion\"\n\nAdd \"EGSRAL: An Enhanced 3D Gaussian Splatting based Renderer with Automated Labeling for Large-Scale Driving Scene\"\n\nAdd \"LiHi-GS: LiDAR-Supervised Gaussian Splatting for Highway Driving Scene Reconstruction\"\n\nAdd \"SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction\"\n\nAdd \"EnvGS: Modeling View-Dependent Appearance with Environment Gaussian\"\n\n### 2024/12/20\n\nAdd \"SqueezeMe: Efficient Gaussian Avatars for VR\"\n\nAdd \"IDOL: Instant Photorealistic 3D Human Creation from a Single Image\"\n\nAdd \"GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting\"\n\nAdd \"Improving Geometry in Sparse-View 3DGS via Reprojection-based DoF Separation\"\n\nAdd \"GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians\"\n\nAdd \"GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting\"\n\nAdd \"4D Radar-Inertial Odometry based on Gaussian Modeling and Multi-Hypothesis Scan Matching\"\n\nAdd \"Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields\"\n\nAdd \"GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding\"\n\nAdd \"NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment\"\n\nAdd \"EOGS: Gaussian Splatting for Earth Observation\"\n\nAdd \"4DRGS: 4D Radiative Gaussian Splatting for Efficient 3D Vessel Reconstruction from Sparse-View Dynamic DSA Images\"\n\nAdd \"CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image\"\n\nAdd \"HyperGS: Hyperspectral 3D Gaussian Splatting\"\n\nAdd \"Gaussian Billboards: Expressive 2D Gaussian Splatting with Textures\"\n\nAdd \"3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting\"\n\nAdd \"PanSplat: 4K Panorama Synthesis 
with Feed-Forward Gaussian Splatting\"\n\nAdd \"Wonderland: Navigating 3D Scenes from a Single Image\"\n\nAdd \"GS-ProCams: Gaussian Splatting-based Projector-Camera Systems\"\n\nAdd \"Deformable Radial Kernel Splatting\"\n\nAdd \"3D2-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling\"\n\nAdd \"SweepEvGS: Event-Based 3D Gaussian Splatting for Macro and Micro Radiance Field Rendering from a Single Sweep\"\n\nAdd \"EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting\"\n\nAdd \"GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs\"\n\nAdd \"DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting\"\n\nAdd \"GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction\"\n\nAdd \"GaussianAD: Gaussian-Centric End-to-End Autonomous Driving\"\n\nAdd \"SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians\"\n\nAdd \"GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion\"\n\nAdd \"Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories\"\n\nAdd \"TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views\"\n\nAdd \"SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video\"\n\nAdd \"RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting\"\n\nAdd \"MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction\"\n\nAdd \"DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models\"\n\n### 2024/12/15\n\nAdd \"Feat2GS: Probing Visual Foundation Models with Gaussian Splatting\"\n\nAdd \"LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors\"\n\nAdd \"FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction\"\n\nAdd \"SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing\"\n\nAdd \"GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency\"\n\nAdd \"LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting\"\n\nAdd \"PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis\"\n\nAdd \"SLGaussian: Fast Language Gaussian Splatting in Sparse Views\"\n\nAdd \"ProGDF: Progressive Gaussian Differential Field for Controllable and Flexible 3D Editing\"\n\nAdd \"Diffusion-Based Attention Warping for Consistent 3D Scene Editing\"\n\nAdd \"GASP: Gaussian Avatars with Synthetic Priors\"\n\nAdd \"Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians\"\n\nAdd \"Faster and Better 3D Splatting via Group Training\"\n\nAdd \"ReCap: Better Gaussian Relighting with Cross-Environment Captures\"\n\nAdd \"ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery\"\n\nAdd \"EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering\"\n\nAdd \"MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds\"\n\nAdd \"MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views\"\n\nAdd \"Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video\"\n\nAdd \"4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes\"\n\nAdd \"Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene 
Reconstruction\"\n\nAdd \"Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects\"\n\n### 2024/12/11\n\nAdd \"Splatter-360: Generalizable 360∘ Gaussian Splatting for Wide-baseline Panoramic Images\"\n\nAdd \"Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction\"\n\nAdd \"Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation\"\n\nAdd \"SizeGS: Size-aware Compression of 3D Gaussians with Hierarchical Mixed Precision Quantization\"\n\nAdd \"GBR: Generative Bundle Refinement for High-fidelity Gaussian Splatting and Meshing\"\n\nAdd \"Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes\"\n\nAdd \"WATER-GS: Toward Copyright Protection for 3D Gaussian Splatting via Universal Watermarking\"\n\nAdd \"Template-free Articulated Gaussian Splatting for Real-time Reposable Dynamic View Synthesis\"\n\nAdd \"Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation\"\n\nAdd \"Street Gaussians without 3D Object Tracker\"\n\nAdd \"Radiant: Large-scale 3D Gaussian Rendering based on Hierarchical Framework\"\n\nAdd \"Extrapolated Urban View Synthesis Benchmark\"\n\nAdd \"MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting\"\n\nAdd \"Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction\"\n\nAdd \"WRF-GS: Wireless Radiation Field Reconstruction with 3D Gaussian Splatting\"\n\nAdd \"Pushing Rendering Boundaries: Hard Gaussian Splatting\"\n\nAdd \"Turbo3D: Ultra-fast Text-to-3D Generation\"\n\n### 2024/12/09\n\nAdd \"QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos\"\n\nAdd \"Monocular Dynamic Gaussian Splatting is Fast and Brittle but Smooth Motion Helps\"\n\nAdd \"PBDyG: Position Based Dynamic Gaussians for Motion-Aware Clothed Human Avatars\"\n\nAdd \"EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding\"\n\nAdd \"InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models\"\n\nAdd \"Multi-View Pose-Agnostic Change Localization with Zero Labels\"\n\nAdd \"DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D Reconstruction\"\n\nAdd \"HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting\"\n\nAdd \"Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos\"\n\nAdd \"Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction\"\n\nAdd \"2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction\"\n\nAdd \"Volumetrically Consistent 3D Gaussian Rasterization\"\n\nAdd \"SGSST: Scaling Gaussian Splatting StyleTransfer\"\n\nAdd \"Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting\"\n\nAdd \"RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos\"\n\nAdd \"Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects\"\n\nAdd \"AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction\"\n\nAdd \"RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians\"\n\nAdd \"GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos\"\n\nAdd \"Multi-robot autonomous 3D reconstruction using Gaussian splatting with Semantic guidance\"\n\nAdd \"SparseLGS: Sparse View Language Embedded Gaussian 
Splatting\"\n\nAdd \"SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images\"\n\nAdd \"Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion\"\n\nAdd \"Planar Gaussian Splatting\"\n\nAdd \"HDGS: Textured 2D Gaussian Splatting for Enhanced Scene Rendering\"\n\nAdd \"Occam's LGS: A Simple Approach for Language Gaussian Splatting\"\n\nAdd \"Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes\"\n\nAdd \"3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting\"\n\nAdd \"SfM-Free 3D Gaussian Splatting via Hierarchical Training\"\n\nAdd \"6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting\"\n\nAdd \"ULSR-GS: Ultra Large-scale Surface Reconstruction Gaussian Splatting with Multi-View Geometric Consistency\"\n\nAdd \"RGBDS-SLAM: A RGB-D Semantic Dense SLAM Based on 3D Multi Level Pyramid Gaussian Splatting\"\n\nAdd \"Ref-GS: Directional Factorization for 2D Gaussian Splatting\"\n\nAdd \"DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair\"\n\nAdd \"ChatSplat: 3D Conversational Gaussian Splatting\"\n\nAdd \"FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting\"\n\nAdd \"A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision\"\n\nAdd \"Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives\"\n\nAdd \"LineGS : 3D Line Segment Representation on 3D Gaussian Splatting\"\n\nAdd \"GradiSeg: Gradient-Guided Gaussian Segmentation with Enhanced 3D Boundary Precision\"\n\nAdd \"Gaussians on their Way: Wasserstein-Constrained 4D Gaussian Splatting with State-Space Modeling\"\n\nAdd \"T-3DGS: Removing Transient Objects for 3D Scene Reconstruction\"\n\n### 2024/12/02\n\nAdd \"GuardSplat: Robust and Efficient Watermarking for 3D Gaussian Splatting\"\n\nAdd \"DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering\"\n\nAdd \"TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting\"\n\nAdd \"Tortho-Gaussian: Splatting True Digital Orthophoto Maps\"\n\nAdd \"Gaussian Splashing: Direct Volumetric Rendering Underwater\"\n\nAdd \"Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding\"\n\nAdd \"GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction\"\n\nAdd \"SADG: Segment Any Dynamic Gaussian Without Object Trackers\"\n\nAdd \"AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones\"\n\nAdd \"InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception\"\n\nAdd \"Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes\"\n\nAdd \"SuperGaussians: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors\"\n\nAdd \"RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning\"\n\nAdd \"GaussianSpeech: Audio-Driven Gaussian Avatars\"\n\nAdd \"Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting\"\n\n### 2024/11/29\n\nAdd \"Textured Gaussians for Enhanced 3D Scene Appearance Modeling\"\n\nAdd \"PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image\"\n\nAdd \"HEMGS: A Hybrid Entropy Model for 3D Gaussian Splatting Data Compression\"\n\nAdd \"Neural Surface Priors for Editable Gaussian Splatting\"\n\nAdd \"Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 
3D Characters\"\n\nAdd \"SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images\"\n\nAdd \"GLS: Geometry-aware 3D Language Gaussian Splatting\"\n\nAdd \"HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction\"\n\nAdd \"DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting\"\n\nAdd \"Distractor-free Generalizable 3D Gaussian Splatting\"\n\nAdd \"SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting\"\n\nAdd \"Geometry Field Splatting with Gaussian Surfels\"\n\nAdd \"4D Scaffold Gaussian Splatting for Memory Efficient Dynamic Scene Reconstruction\"\n\nAdd \"G2SDF: Surface Reconstruction from Explicit Gaussians with Implicit SDFs\"\n\nAdd \"PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence\"\n\nAdd \"SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving\"\n\nAdd \"MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM\"\n\nAdd \"NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model\"\n\nAdd \"GAST: Sequential Gaussian Avatars with Hierarchical Spatio-temporal Context\"\n\nAdd \"Bundle Adjusted Gaussian Avatars Deblurring\"\n\nAdd \"SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis\"\n\nAdd \"Quadratic Gaussian Splatting for Efficient and Detailed Surface Reconstruction\"\n\nAdd \"Event-boosted Deformable 3D Gaussians for Fast Dynamic Scene Reconstruction\"\n\nAdd \"UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation\"\n\nAdd \"Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors\"\n\nAdd \"PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments\"\n\nAdd \"ZeroGS: Training 3D Gaussian Splatting from Unposed Images\"\n\nAdd \"DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models\"\n\nAdd \"GSurf: 3D Reconstruction via Signed Distance Fields with Direct Gaussian Supervision\"\n\nAdd \"EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting\"\n\nAdd \"SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving\"\n\nAdd \"Gassidy: Gaussian Splatting SLAM in Dynamic Environments\"\n\nAdd \"SplatSDF: Boosting Neural Implicit SDF via Gaussian Splatting Fusion\"\n\nAdd \"UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations\"\n\nAdd \"Gradient-Weighted Feature Back-Projection: A Fast Alternative to Feature Distillation in 3D Gaussian Splatting\"\n\nAdd \"Neural 4D Evolution under Large Topological Changes from 2D Images\"\n\n### 2024/11/25\n\nAdd \"3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes\"\n\nAdd \"Dynamics-Aware Gaussian Splatting Streaming Towards Fast On-the-Fly Training for 4D Reconstruction\"\n\nAdd \"VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving\"\n\nAdd \"NexusSplats: Efficient 3D Gaussian Splatting in the Wild\"\n\nAdd \"Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation\"\n\nAdd \"FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting\"\n\nAdd \"GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting\"\n\nAdd \"Sim Anything: Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting\"\n\nAdd \"Mini-Splatting2: Building 360 Scenes within Minutes via 
Aggressive Gaussian Densification\"\n\nAdd \"PR-ENDO: Physically Based Relightable Gaussian Splatting for Endoscopy\"\n\nAdd \"SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image\"\n\nAdd \"GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving\"\n\nAdd \"Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels\"\n\nAdd \"DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes\"\n\nAdd \"LiV-GS: LiDAR-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments\"\n\nAdd \"Sketch-guided Cage-based 3D Gaussian Splatting Deformation\"\n\nAdd \"FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting\"\n\nAdd \"TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction\"\n\nAdd \"DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes\"\n\nAdd \"RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator\"\n\nAdd \"GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views\"\n\nAdd \"VeGaS: Video Gaussian Splatting\"\n\nAdd \"Direct and Explicit 3D Generation from a Single Image\"\n\nAdd \"DGS-SLAM: Gaussian Splatting SLAM in Dynamic Environment\"\n\nAdd \"USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting\"\n\n### 2024/11/18\n\nAdd \"Efficient Density Control for 3D Gaussian Splatting\"\n\nAdd \"GSEditPro: 3D Gaussian Splatting Editing with Attention-based Progressive Localization\"\n\nAdd \"GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video\"\n\nAdd \"DyGASR: Dynamic Generalized Exponential Splatting with Surface Alignment for Accelerated 3D Mesh Reconstruction\"\n\n### 2024/11/12\n\nAdd \"4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization\"\n\nAdd \"BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis\"\n\nAdd \"DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization\"\n\nAdd \"MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation\"\n\nAdd \"Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation\"\n\nAdd \"GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting\"\n\nAdd \"HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting\"\n\nAdd \"GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering\"\n\nAdd \"A Hierarchical Compression Technique for 3D Gaussian Splatting Compression\"\n\nAdd \"Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction\"\n\nAdd \"SplatFormer: Point Transformer for Robust 3D Gaussian Splatting\"\n\nAdd \"GaussianSpa: An Optimizing-Sparsifying Simplification Framework for Compact and High-Quality 3D Gaussian Splatting\"\n\nAdd \"PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering\"\n\nAdd \"ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing\"\n\n### 2024/11/10\n\nAdd \"MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views\"\n\nAdd \"GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting\"\n\nAdd \"3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement\"\n\nAdd \"Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View 
Synthesis\"\n\nAdd \"Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting\"\n\n### 2024/11/06\n\nAdd \"HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features\"\n\nAdd \"LVI-GS: Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting\"\n\nAdd \"Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting\"\n\nAdd \"FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training\"\n\nAdd \"GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes\"\n\nAdd \"Real-Time Spatio-Temporal Reconstruction of Dynamic Endoscopic Scenes with 4D Gaussian Splatting\"\n\nAdd \"CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes\"\n\nAdd \"Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes\"\n\nAdd \"Self-Ensembling Gaussian Splatting for Few-shot Novel View Synthesis\"\n\nAdd \"URAvatar: Universal Relightable Gaussian Codec Avatars\"\n\n### 2024/11/04\n\nAdd \"No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images\"\n\nAdd \"GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering\"\n\nAdd \"GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting\"\n\nAdd \"GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring\"\n\nAdd \"ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting\"\n\nAdd \"Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis\"\n\nAdd \"Geometry Cloak: Preventing TGS-based 3D Reconstruction from Copyrighted Images\"\n\nAdd \"PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting\"\n\nAdd \"FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives\"\n\nAdd \"ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting\"\n\nAdd \"ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting\"\n\n### 2024/10/29\n\nAdd \"Grid4D: 4D Decomposed Hash Encoding for High-fidelity Dynamic Gaussian Splatting\"\n\nAdd \"LoDAvatar: Hierarchical Embedding and Adaptive Levels of Detail with Gaussian Splatting for Enhanced Human Avatars\"\n\nAdd \"CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians\"\n\nAdd \"ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings\"\n\nAdd \"Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering\"\n\nAdd \"SCube: Instant Large-Scale Scene Reconstruction using VoxSplats\"\n\nAdd \"DiffGS: Functional Gaussian Splatting Diffusion\"\n\nAdd \"PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views\"\n\nAdd \"Sort-free Gaussian Splatting via Weighted Sum Rendering\"\n\nAdd \"Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling\"\n\nAdd \"Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis\"\n\nAdd \"VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points\"\n\nAdd \"PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting\"\n\nAdd \"AG-SLAM: Active Gaussian Splatting SLAM\"\n\nAdd \"SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes\"\n\n### 2024/10/23\n\nAdd \"GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting\"\n\nAdd \"E-3DGS: Gaussian Splatting with Exposure and Motion Events\"\n\nAdd \"Multi-Layer Gaussian Splatting for 
Immersive Anatomy Visualization\"\n\nAdd \"MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors\"\n\nAdd \"3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors\"\n\nAdd \"LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images\"\n\nAdd \"Fully Explicit Dynamic Gaussian Splatting\"\n\nAdd \"EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting\"\n\n### 2024/10/21\n\nAdd \"3D Gaussian Splatting in Robotics: A Survey\"\n\nAdd \"LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes\"\n\nAdd \"Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set\"\n\nAdd \"DepthSplat: Connecting Gaussian Splatting and Depth\"\n\nAdd \"MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes\"\n\nAdd \"DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering\"\n\nAdd \"GlossyGS: Inverse Rendering of Glossy Objects with 3D Gaussian Splatting\"\n\nAdd \"Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats\"\n\nAdd \"LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images\"\n\nAdd \"GS^3: Efficient Relighting with Triple Gaussian Splatting\"\n\nAdd \"MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields\"\n\nAdd \"GSORB-SLAM: Gaussian Splatting SLAM benefits from ORB features and Transmittance information\"\n\nAdd \"Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting\"\n\nAdd \"Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting\"\n\nAdd \"4-LEGS: 4D Language Embedded Gaussian Splatting\"\n\nAdd \"4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting\"\n\nAdd \"Gaussian Splatting Visual MPC for Granular Media Manipulation\"\n\nAdd \"Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors\"\n\nAdd \"SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction\"\n\nAdd \"MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering\"\n\nAdd \"Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars\"\n\nAdd \"Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization\"\n\nAdd \"FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction\"\n\nAdd \"Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics\"\n\n### 2024/10/12\n\nAdd \"Poison-splat: Computation Cost Attack on 3D Gaussian Splatting\"\n\nAdd \"RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image\"\n\nAdd \"Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency\"\n\nAdd \"IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera\"\n\nAdd \"Fast Feedforward 3D Gaussian Splatting Compression\"\n\nAdd \"Generalizable and Animatable Gaussian Head Avatar\"\n\nAdd \"MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting\"\n\nAdd \"3D Vision-Language Gaussian Splatting\"\n\nAdd \"Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting\"\n\nAdd \"DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation\"\n\nAdd \"ES-Gaussian: Gaussian Splatting Mapping via Error Space-Based Gaussian 
Completion\"\n\nAdd \"HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction\"\n\nAdd \"RelitLRM: Generative Relightable Radiance for Large Reconstruction Models\"\n\nAdd \"GSLoc: Visual Localization with 3D Gaussian Splatting\"\n\nAdd \"SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting\"\n\n### 2024/10/09\n\nAdd \"GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting\"\n\nAdd \"LiDAR-GS:Real-time LiDAR Re-Simulation using Gaussian Splatting\"\n\nAdd \"PhotoReg: Photometrically Registering 3D Gaussian Splatting Models\"\n\nAdd \"6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering\"\n\nAdd \"Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting\"\n\nAdd \"Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering\"\n\n### 2024/10/08\n\nAdd \"StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting\"\n\nAdd \"Variational Bayes Gaussian Splatting\"\n\nAdd \"Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats\"\n\nAdd \"GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering\"\n\nAdd \"SuperGS: Super-Resolution 3D Gaussian Splatting via Latent Feature Field and Gradient-guided Splitting\"\n\nAdd \"MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis\"\n\nAdd \"EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis\"\n\nAdd \"3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection\"\n\nAdd \"Gaussian Splatting in Mirrors: Reflection-Aware Rendering via Virtual Camera Optimization\"\n\nAdd \"GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians\"\n\nAdd \"MiraGe: Editable 2D Images using Gaussian Splatting\"\n\nAdd \"UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction\"\n\nAdd \"EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings\"\n\nAdd \"Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection\"\n\n### 2024/10/02\n\nAdd \"CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM\"\n\nAdd \"Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance\"\n\nAdd \"GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving\"\n\nAdd \"RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning\"\n\nAdd \"Robust Gaussian Splatting SLAM by Leveraging Loop Closure\"\n\nAdd \"RNG: Relightable Neural Gaussians\"\n\nAdd \"GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting\"\n\nAdd \"1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction\"\n\nAdd \"Gaussian Heritage: 3D Digitization of Cultural Heritage with Integrated Object Segmentation\"\n\nAdd \"Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes\"\n\nAdd \"RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration\"\n\nAdd \"Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot\"\n\nAdd \"WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians\"\n\nAdd \"HGS-Planner: Hierarchical Planning Framework for Active Scene 
Reconstruction Using 3D Gaussian Splatting\"\n\nAdd \"SeaSplat: Representing Underwater Scenes with 3D Gaussian Splatting and a Physically Grounded Image Formation Model\"\n\nAdd \"Disco4D: Disentangled 4D Human Generation and Animation from a Single Image\"\n\nAdd \"Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM\"\n\nAdd \"Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model\"\n\nAdd \"Let's Make a Splan: Risk-Aware Trajectory Optimization in a Normalized Gaussian Splat\"\n\nAdd \"Low Latency Point Cloud Rendering with Learned Splatting\"\n\nAdd \"GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization\"\n\nAdd \"Frequency-based View Selection in Gaussian Splatting Reconstruction\"\n\nAdd \"LiDAR-3DGS: LiDAR Reinforced 3D Gaussian Splatting for Multimodal Radiance Field Rendering\"\n\nAdd \"Gaussian Déjà-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities\"\n\nAdd \"Semantics-Controlled Gaussian Splatting for Outdoor Scene Reconstruction and Rendering in Virtual Reality\"\n\nAdd \"Plenoptic PNG: Real-Time Neural Radiance Fields in 150 KB\"\n\n### 2024/09/24\n\nAdd \"Human Hair Reconstruction with Strand-Aligned 3D Gaussians\"\n\nAdd \"MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views\"\n\nAdd \"SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality\"\n\nAdd \"Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors\"\n\nAdd \"3D-GSW: 3D Gaussian Splatting Watermark for Protecting Copyrights in Radiance Fields\"\n\nAdd \"MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting\"\n\n### 2024/09/22\n\nAdd \"GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling\"\n\nAdd \"LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction\"\n\nAdd \"3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt\"\n\nAdd \"EdgeGaussians -- 3D Edge Mapping via Gaussian Splatting\"\n\nAdd \"GaRField++: Reinforced Gaussian Radiance Fields for Large-Scale 3D Scene Reconstruction\"\n\nAdd \"Spectral-GS: Taming 3D Gaussian Splatting with Spectral Entropy\"\n\nAdd \"DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input\"\n\nAdd \"CrossRT: A cross platform programming technology for hardware-accelerated ray tracing in CG and CV applications\"\n\nAdd \"Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting\"\n\nAdd \"Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus\"\n\nAdd \"Vista3D: Unravel the 3D Darkside of a Single Image\"\n\nAdd \"SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation\"\n\nAdd \"Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks\"\n\n### 2024/09/18\n\nAdd \"RenderWorld: World Model with Self-Supervised 3D Label\"\n\nAdd \"GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module\"\n\nAdd \"SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction\"\n\nAdd \"GLC-SLAM: Gaussian Splatting SLAM with Efficient Loop Closure\"\n\nAdd \"Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering\"\n\nAdd \"BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting\"\n\nAdd \"SplatSim: 
Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting\"\n\nAdd \"DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments\"\n\nAdd \"SAFER-Splat: A Control Barrier Function for Safe Navigation with Online Gaussian Splatting Maps\"\n\nAdd \"MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation\"\n\nAdd \"GEVO: Memory-Efficient Monocular Visual Odometry Using Gaussians\"\n\nAdd \"A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis\"\n\n### 2024/09/16\n\nAdd \"AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius\"\n\nAdd \"Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints\"\n\nAdd \"CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting\"\n\nAdd \"Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos\"\n\n### 2024/09/13\n\nAdd \"FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally\"\n\nAdd \"Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis\"\n\nAdd \"SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length\"\n\nAdd \"Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs\"\n\nAdd \"Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models\"\n\nAdd \"Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering\"\n\nAdd \"Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks\"\n\nAdd \"ThermalGaussian: Thermal 3D Gaussian Splatting\"\n\nAdd \"gsplat: An Open-Source Library for Gaussian Splatting\"\n\nAdd \"GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction\"\n\nAdd \"SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting\"\n\n### 2024/09/10\n\nAdd \"GASP: Gaussian Splatting for Physic-Based Simulations\"\n\nAdd \"Lagrangian Hashing for Compressed Neural Field Representations\"\n\nAdd \"GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning\"\n\nAdd \"Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras\"\n\nAdd \"GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers\"\n\nAdd \"3D-GP-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors\"\n\n### 2024/09/09\n\nAdd \"LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors\"\n\nAdd \"Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction\"\n\nAdd \"Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models\"\n\nAdd \"Object Gaussian for Monocular 6D Pose Estimation from Sparse Views\"\n\nAdd \"GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving\"\n\nAdd \"DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction\"\n\nAdd \"PRoGS: Progressive Rendering of Gaussian Splats\"\n\nAdd \"GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting\"\n\nAdd \"Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos\"\n\nAdd \"3D Gaussian Splatting for Large-scale 3D Surface Reconstruction from Aerial Images\"\n\nAdd \"UDGS-SLAM : UniDepth Assisted Gaussian Splatting for Monocular SLAM\"\n\n### 
2024/09/03\n\nAdd \"OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping\"\n\nAdd \"2DGH: 2D Gaussian-Hermite Splatting for High-quality Rendering and Better Geometry Reconstruction\"\n\nAdd \"ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model\"\n\nAdd \"OmniRe: Omni Urban Scene Reconstruction\"\n\n### 2024/08/29\n\nAdd \"Towards Realistic Example-based Modeling via 3D Gaussian Stitching\"\n\nAdd \"G-Style: Stylized Gaussian Splatting\"\n\nAdd \"Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty\"\n\nAdd \"Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation\"\n\nAdd \"LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming\"\n\nAdd \"Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control\"\n\nAdd \"DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting\"\n\nAdd \"Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs\"\n\nAdd \"TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers\"\n\nAdd \"SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting\"\n\nAdd \"BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting\"\n\nAdd \"Atlas Gaussians Diffusion for 3D Generation with Infinite Number of Points\"\n\nAdd \"LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation\"\n\nAdd \"S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points\"\n\nAdd \"FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering\"\n\nAdd \"GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion\"\n\n### 2024/08/21\n\nAdd \"Subsurface Scattering for 3D Gaussian Splatting\"\n\nAdd \"Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors\"\n\nAdd \"DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments\"\n\nAdd \"GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting\"\n\nAdd \"Pano2Room: Novel View Synthesis from a Single Indoor Panorama\"\n\nAdd \"GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting\"\n\nAdd \"Large Point-to-Gaussian Model for Image-to-3D Generation\"\n\nAdd \"ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining\"\n\nAdd \"DEGAS: Detailed Expressions on Full-Body Gaussian Avatars\"\n\nAdd \"LoopSplat: Loop Closure by Registering 3D Gaussian Splats\"\n\nAdd \"Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation\"\n\nAdd \"SG-GS: Photo-realistic Animatable Human Avatars with Semantically-Guided Gaussian Splatting\"\n\nAdd \"CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning\"\n\nAdd \"Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting\"\n\nAdd \"Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS\"\n\nAdd \"GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization\"\n\n### 2024/08/16\n\nAdd \"WaterSplatting: Fast Underwater 3D Scene Reconstruction Using Gaussian Splatting\"\n\nAdd \"FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering\"\n\nAdd \"3D Gaussian Editing with A Single Image\"\n\nAdd \"SpectralGaussians: Semantic, spectral 3D Gaussian 
splatting for multi-spectral scene representation, visualization and analysis\"\n\nAdd \"HDRGS: High Dynamic Range Gaussian Splatting\"\n\nAdd \"Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering\"\n\nAdd \"HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors\"\n\nAdd \"Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis\"\n\nAdd \"PRTGaussian: Efficient Relighting Using 3D Gaussians with Precomputed Radiance Transfer\"\n\nAdd \"Self-augmented Gaussian Splatting with Structure-aware Masks for Sparse-view 3D Reconstruction\"\n\nAdd \"LumiGauss: High-Fidelity Outdoor Relighting with 2D Gaussian Splatting\"\n\nAdd \"InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting\"\n\n### 2024/08/10\n\nAdd \"Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM\"\n\nAdd \"Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields\"\n\nAdd \"3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting\"\n\nAdd \"PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting\"\n\nAdd \"MGFs: Masked Gaussian Fields for Meshing Building based on Multi-View Images\"\n\nAdd \"A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness\"\n\nAdd \"Reality Fusion: Robust Real-time Immersive Mobile Robot Teleoperation with Volumetric Visual Data Fusion\"\n\nAdd \"IG-SLAM: Instant Gaussian SLAM\"\n\nAdd \"EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head\"\n\nAdd \"LoopSparseGS: Loop Based Sparse-View Friendly Gaussian Splatting\"\n\nAdd \"Localized Gaussian Splatting Editing with Contextual Awareness\"\n\nAdd \"Expressive Whole-Body 3D Gaussian Avatar\"\n\nAdd \"Registering Neural 4D Gaussians for Endoscopic Surgery\"\n\nAdd \"ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting\"\n\n### 2024/07/26\n\nAdd \"3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities\"\n\nAdd \"DHGS: Decoupled Hybrid Gaussian Splatting for Driving Scene\"\n\nAdd \"HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images\"\n\nAdd \"Integrating Meshes and 3D Gaussians for Indoor Scene Reconstruction with SAM Mask Guidance\"\n\n### 2024/07/23\n\nAdd \"6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model\"\n\nAdd \"Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures\"\n\nAdd \"HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions\"\n\nAdd \"3D Gaussian Parametric Head Model\"\n\nAdd \"Realistic Surgical Image Dataset Generation Based On 3D Gaussian Splatting\"\n\nAdd \"GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation\"\n\nAdd \"PlacidDreamer: Advancing Harmony in Text-to-3D Generation\"\n\n### 2024/07/19\n\nAdd \"Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation\"\n\nAdd \"EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting\"\n\nAdd \"Generalizable Human Gaussians for Sparse View Synthesis\"\n\nAdd \"Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections\"\n\n### 2024/07/17\n\nAdd \"MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification\"\n\nAdd \"Click-Gaussian: Interactive Segmentation to Any 3D Gaussians\"\n\nAdd \"Ev-GS: 
Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering\"\n\nAdd \"Gaussian Splatting Lucas-Kanade\"\n\nAdd \"iHuman: Instant Animatable Digital Humans From Monocular Videos\"\n\n### 2024/07/16\n\nAdd \"RecGS: Removing Water Caustic with Recurrent Gaussian Splatting\"\n\nAdd \"3DEgo: 3D Editing on the Go!\"\n\nAdd \"SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion\"\n\nAdd \"3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods\"\n\nAdd \"StyleSplat: 3D Object Style Transfer with Gaussian Splatting\"\n\nAdd \"WildGaussians: 3D Gaussian Splatting in the Wild\"\n\n### 2024/07/11\n\nAdd \"MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition\"\n\nAdd \"Reference-based Controllable Scene Stylization with Gaussian Splatting\"\n\nAdd \"3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes\"\n\nAdd \"PICA: Physics-Integrated Clothed Avatar\"\n\nAdd \"GaussReg: Fast 3D Registration with Gaussian Splatting\"\n\nAdd \"SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction\"\n\n### 2024/07/08\n\nAdd \"LaRa: Efficient Large-Baseline Radiance Fields\"\n\nAdd \"Gaussian Eigen Models for Human Heads\"\n\nAdd \"Segment Any 4D Gaussians\"\n\nAdd \"GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction\"\n\nAdd \"CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images\"\n\nAdd \"PFGS: High Fidelity Point Cloud Rendering via Feature Splatting\"\n\n### 2024/07/04\n\nAdd \"Expressive Gaussian Human Avatars from Monocular RGB Video\"\n\nAdd \"VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors\"\n\nAdd \"Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction\"\n\nAdd \"AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction\"\n\nAdd \"TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation\"\n\nAdd \"DRAGON: Drone and Ground Gaussian Splatting for 3D Building Reconstruction\"\n\n### 2024/07/03\n\nAdd \"GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting\"\n\nAdd \"Learning 3D Gaussians for Extremely Sparse-View Cone-Beam CT Reconstruction\"\n\nAdd \"EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting\"\n\nAdd \"RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering\"\n\nAdd \"OccFusion: Rendering Occluded Humans with Generative Diffusion Priors\"\n\n### 2024/07/01\n\nAdd \"SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting\"\n\nAdd \"EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting\"\n\nAdd \"Lightweight Predictive 3D Gaussian Splats\"\n\n### 2024/06/30\n\nAdd \"FAGhead: Fully Animate Gaussian Head from Monocular Videos\"\n\nAdd \"Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos\"\n\nAdd \"GS-ROR: 3D Gaussian Splatting for Reflective Object Relighting via SDF Priors\"\n\nAdd \"On Scaling Up 3D Gaussian Splatting Training\"\n\nAdd \"GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality\"\n\nAdd \"Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning\"\n\nAdd \"GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting\"\n\nAdd \"VDG: Vision-Only Dynamic Gaussian for Driving Simulation\"\n\nAdd \"Director3D: Real-world 
Camera Trajectory and 3D Scene Generation from Text\"\n\nAdd \"Reducing the Memory Footprint of 3D Gaussian Splatting\"\n\nAdd \"ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians\"\n\nAdd \"Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling\"\n\nAdd \"LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction\"\n\nAdd \"Taming 3DGS: High-Quality Radiance Fields with Limited Resources\"\n\n### 2024/06/24\n\nAdd \"GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation\"\n\nAdd \"Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks\"\n\nAdd \"E2GS: Event Enhanced Gaussian Splatting\"\n\nAdd \"Gaussian-Informed Continuum for Physical Property Identification and Simulation\"\n\n### 2024/06/22\n\nAdd \"Splatter a Video: Video Gaussian Representation for Versatile Processing\"\n\nAdd \"Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models\"\n\n### 2024/06/19\n\nAdd \"HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors\"\n\nAdd \"A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets\"\n\nAdd \"RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians\"\n\nAdd \"Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting\"\n\nAdd \"Projecting Radiance Fields to Mesh Surfaces\"\n\n### 2024/06/18\n\nAdd \"Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics\"\n\nAdd \"Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections\"\n\nAdd \"L4GM: Large 4D Gaussian Reconstruction Model\"\n\nAdd \"PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting\"\n\nAdd \"Unified Gaussian Primitives for Scene Representation and Rendering\"\n\nAdd \"GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion\"\n\nAdd \"GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors\"\n\n### 2024/06/14\n\nAdd \"Modeling Ambient Scene Dynamics for Free-view Synthesis\"\n\nAdd \"WonderWorld: Interactive 3D Scene Generation from a Single Image\"\n\nAdd \"GGHead: Fast and Generalizable 3D Gaussian Heads\"\n\nAdd \"Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling\"\n\nAdd \"ICE-G: Image Conditional Editing of 3D Gaussian Splats\"\n\nAdd \"Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models\"\n\nAdd \"From Chaos to Clarity: 3DGS in the Dark\"\n\nAdd \"Trim 3D Gaussian Splatting for Accurate Geometry Representation\"\n\nAdd \"Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth of Field\"\n\n### 2024/06/11\n\nAdd \"VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction\"\n\nAdd \"RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering\"\n\nAdd \"InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping\"\n\nAdd \"Generalizable Human Gaussians from Single-View Image\"\n\nAdd \"Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis\"\n\nAdd \"MVGamba: Unify 3D Content Generation as State Space Sequence Modeling\"\n\nAdd \"PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction\"\n\nAdd \"GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation\"\n\n### 2024/06/07\n\nAdd 
\"Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image\"\n\nAdd \"Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion\"\n\nAdd \"Localized Gaussian Point Management\"\n\nAdd \"Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction\"\n\nAdd \"Gaussian Representation for Deformable Image Registration\"\n\n### 2024/06/06\n\nAdd \"Dynamic 3D Gaussian Fields for Urban Areas\"\n\nAdd \"Event3DGS: Event-based 3D Gaussian Splatting for Fast Egomotion\"\n\nAdd \"Adversarial Generation of Hierarchical Gaussians for 3D Generative Model\"\n\nAdd \"3D-HGS: 3D Half-Gaussian Splatting\"\n\nAdd \"Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting\"\n\nAdd \"SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition\"\n\nAdd \"DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering\"\n\nAdd \"WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections\"\n\nAdd \"Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning\"\n\nAdd \"OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding\"\n\nAdd \"FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping\"\n\nAdd \"End-to-End Rate-Distortion Optimized 3D Gaussian Representation\"\n\nAdd \"Tetrahedron Splatting for 3D Generation\"\n\n### 2024/06/04\n\nAdd \"MoDGS: Dynamic Gaussian Splatting from Causually-captured Monocular Videos\"\n\nAdd \"Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture\"\n\nAdd \"SuperGaussian: Repurposing Video Models for 3D Super Resolution\"\n\nAdd \"Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting\"\n\nAdd \"RaDe-GS: Rasterizing Depth in Gaussian Splatting\"\n\nAdd \"DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors\"\n\nAdd \"Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting\"\n\n### 2024/06/03\n\nAdd \"Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation\"\n\nAdd \"R2-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction\"\n\nAdd \"ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model\"\n\nAdd \"GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis\"\n\n### 2024/05/31\n\nAdd \"DGD: Dynamic 3D Gaussians Distillation\"\n\nAdd \"NPGA: Neural Parametric Gaussian Avatars\"\n\nAdd \"TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM\"\n\nAdd \"Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian\"\n\nAdd \"GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction\"\n\nAdd \"GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis\"\n\nAdd \"PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting\"\n\nAdd \"Object-centric Reconstruction and Tracking of Dynamic Unknown Objects using 3D Gaussian Splatting\"\n\nAdd \"EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images\"\n\nAdd \"A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction\"\n\nAdd \"S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving\"\n\n### 2024/05/30\n\nAdd \"GOI: Find 3D Gaussians of 
Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane\"\n\nAdd \"DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos\"\n\nAdd \"SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction\"\n\nAdd \"Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh\"\n\nAdd \"Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting\"\n\nAdd \"HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction\"\n\nAdd \"A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction\"\n\nAdd \"FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes\"\n\nAdd \"RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields\"\n\nAdd \"EG4D: Explicit Generation of 4D Object without Score Distillation\"\n\nAdd \"A Grid-Free Fluid Solver based on Gaussian Spatial Representation\"\n\nAdd \"NegGS: Negative Gaussian Splatting\"\n\nAdd \"3D StreetUnveiler with Semantic-Aware 2DGS\"\n\nAdd \"3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting\"\n\nAdd \"GFlow: Recovering 4D World from Monocular Video\"\n\nAdd \"LP-3DGS: Learning to Prune 3D Gaussian Splatting\"\n\nAdd \"E3Gen: Efficient, Expressive and Editable Avatars Generation\"\n\n### 2024/05/28\n\nAdd \"Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors\"\n\nAdd \"Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians\"\n\nAdd \"PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting\"\n\nAdd \"Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation\"\n\nAdd \"SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain\"\n\nAdd \"F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting\"\n\nAdd \"Memorize What Matters: Emergent Scene Decomposition from Multitraverse\"\n\nAdd \"DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Refocusing,Defocus Rendering and Blur Removal\"\n\nAdd \"GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction\"\n\n### 2024/05/27\n\nAdd \"EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting\"\n\nAdd \"GS-Hider: Hiding Messages into 3D Gaussian Splatting\"\n\nAdd \"HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting\"\n\nAdd \"DisC-GS: Discontinuity-aware Gaussian Splatting\"\n\nAdd \"GSDeformer: Direct Cage-based Deformation for 3D Gaussian Splatting\"\n\nAdd \"Feature Splatting for Better Novel View Synthesis with Low Overlap\"\n\n### 2024/05/24\n\nAdd \"Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances\"\n\nAdd \"Monocular Gaussian SLAM with Language Extended Loop Closure\"\n\nAdd \"DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus\"\n\nAdd \"NeuroGauss4D-PCI: 4D Neural Fields and Gaussian Deformation Fields for Point Cloud Interpolation\"\n\nAdd \"D-MiSo: Editing Dynamic 3D Scenes using Multi-Gaussians Soup\"\n\nAdd \"RoGS: Large Scale Road Surface Reconstruction based on 2D Gaussian Splatting\"\n\nAdd \"TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing\"\n\nAdd \"MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes\"\n\n### 2024/05/22\n\nAdd 
\"AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field\"\n\nAdd \"GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details\"\n\nAdd \"Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery\"\n\nAdd \"LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting\"\n\nAdd \"MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video\"\n\n### 2024/05/21\n\nAdd \"Photorealistic 3D Urban Scene Reconstruction and Point Cloud Extraction using Google Earth Imagery and Gaussian Splatting\"\n\nAdd \"MotionGS : Compact Gaussian Splatting SLAM by Motion Filter\"\n\nAdd \"MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections\"\n\nAdd \"GGAvatar: Geometric Adjustment of Gaussian Head Avatar\"\n\nAdd \"Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping\"\n\nAdd \"CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization\"\n\nAdd \"Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo\"\n\n### 2024/05/19\n\nAdd \"I3DGS: Improve 3D Gaussian Splatting from Multiple Dimension\"\n\nAdd \"OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation\"\n\nAdd \"Direct Learning of Mesh and Appearance via 3D Gaussian Splatting\"\n\nAdd \"LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer\"\n\nAdd \"GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting\"\n\nAdd \"GS-Planner: A Gaussian-Splatting-based Planning Framework for Active High-Fidelity Reconstruction\"\n\nAdd \"ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation\"\n\n### 2024/05/10\n\nAdd \"GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields\"\n\nAdd \"NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap\"\n\nAdd \"FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting\"\n\nAdd \"DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation\"\n\n### 2024/05/08\n\nAdd \"Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting\"\n\n### 2024/05/07\n\nAdd \"HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2\"\n\nAdd \"DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos\"\n\nAdd \"Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review\"\n\nAdd \"A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose\"\n\n### 2024/05/05\n\nAdd \"Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians\"\n\n### 2024/05/02\n\nAdd \"MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing\"\n\nAdd \"GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting\"\n\nAdd \"SAGS: Structure-Aware 3D Gaussian Splatting\"\n\nAdd \"3D Gaussian Blendshapes for Head Avatar Animation\"\n\nAdd \"MicroDreamer: Zero-shot 3D Generation in ∼20 Seconds by Score-based Iterative Reconstruction\"\n\nAdd \"GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting\"\n\nAdd \"RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting\"\n\nAdd \"Spectrally Pruned Gaussian Fields with Neural Compensation\"\n\n### 2024/04/30\n\nAdd \"SLAM for Indoor Mapping of Wide Area Construction Environments\"\n\nAdd \"Reconstructing Satellites in 3D from Amateur Telescope Images\"\n\nAdd \"3D Gaussian 
Splatting with Deferred Reflection\"\n\nAdd \"Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting\"\n\nAdd \"DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing\"\n\n### 2024/04/26\n\nAdd \"DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction\"\n\n### 2024/04/25\n\nAdd \"TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting\"\n\nAdd \"OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation\"\n\nAdd \"GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting\"\n\n### 2024/04/23\n\nAdd \"GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal\"\n\nAdd \"GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting\"\n\nAdd \"CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding\"\n\nAdd \"Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses\"\n\n### 2024/04/22\n\nAdd \"Does Gaussian Splatting need SFM Initialization?\"\n\nAdd \"EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation\"\n\nAdd \"Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation\"\n\n### 2024/04/19\n\nAdd \"Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos\"\n\n### 2024/04/18\n\nAdd \"DeblurGS: Gaussian Splatting for Camera Motion Blur\"\n\nAdd \"InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior\"\n\n### 2024/04/17\n\nAdd \"SRGS: Super-Resolution 3D Gaussian Splatting\"\n\nAdd \"AbsGS: Recovering Fine Details for 3D Gaussian Splatting\"\n\nAdd \"Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks\"\n\nAdd \"Gaussian Opacity Fields: Efficient and Compact Surface Reconstruction in Unbounded Scenes\"\n\n### 2024/04/16\n\nAdd \"LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field\"\n\nAdd \"EGGS: Edge Guided Gaussian Splatting for Radiance Fields\"\n\nAdd \"DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling\"\n\nAdd \"DeferredGS: Decoupled and Editable Gaussian Splatting with Deferred Shading\"\n\nAdd \"CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting\"\n\nAdd \"3D Gaussian Splatting as Markov Chain Monte Carlo\"\n\nAdd \"LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives\"\n\n### 2024/04/15\n\nAdd \"OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering\"\n\n### 2024/04/11\n\nAdd \"DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting\"\n\nAdd \"Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting\"\n\nAdd \"RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion\"\n\nAdd \"Reinforcement Learning with Generalizable Gaussian Splatting\"\n\n### 2024/04/10\n\nAdd \"Dual-Camera Smooth Zoom on Mobile Phones\"\n\nAdd \"Revising Densification in Gaussian Splatting\"\n\nAdd \"Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction\"\n\n### 2024/04/09\n\nAdd \"Robust Gaussian Splatting\"\n\nAdd \"Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion\"\n\nAdd \"GauU-Scene V2: Expanse Lidar Image Dataset Shows Unreliable Geometric Reconstruction Using Gaussian Splatting and NeRF\"\n\nAdd \"StylizedGS: Controllable Stylization 
for 3D Gaussian Splatting\"\n\n### 2024/04/06\n\nAdd \"3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting\"\n\nAdd \"MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements\"\n\nAdd \"CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians\"\n\nAdd \"Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting\"\n\nAdd \"Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing\"\n\nAdd \"Surface Reconstruction from Gaussian Splatting via Novel Stereo Views\"\n\nAdd \"TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes\"\n\nAdd \"GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis\"\n\nAdd \"OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images\"\n\nAdd \"Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting\"\n\n### 2024/04/01\n\nAdd \"HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes\"\n\nAdd \"SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior\"\n\nAdd \"HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes\"\n\nAdd \"Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces\"\n\nAdd \"InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds\"\n\n### 2024/03/29\n\nAdd \"Modeling uncertainty for Gaussian Splatting\"\n\nAdd \"SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface\"\n\nAdd \"Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction\"\n\nAdd \"CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians\"\n\nAdd \"SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing\"\n\nAdd \"GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond\"\n\nAdd \"GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling\"\n\n### 2024/03/27\n\nAdd \"STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians\"\n\nAdd \"EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting\"\n\nAdd \"Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting\"\n\nAdd \"Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting\"\n\nAdd \"Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections\"\n\nAdd \"CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field\"\n\nAdd \"latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction\"\n\nAdd \"GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction\"\n\nAdd \"DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion\"\n\nAdd \"DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing\"\n\nAdd \"2D Gaussian Splatting for Geometrically Accurate Radiance Fields\"\n\nAdd \"Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians\"\n\n### 2024/03/23\n\nAdd \"Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion\"\n\nAdd \"RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS\"\n\nAdd \"Mini-Splatting: Representing Scenes with a Constrained Number of Gaussian\"\n\n### 2024/03/22\n\nAdd \"Isotropic Gaussian Splatting for Real-Time 
Radiance Field Rendering\"\n\nAdd \"Hash-grid Assisted Context for 3D Gaussian Splatting Compression\"\n\nAdd \"Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering\"\n\nAdd \"GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation\"\n\nAdd \"MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images\"\n\n### 2024/03/20\n\nAdd \"GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation\"\n\nAdd \"High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization\"\n\nAdd \"RGBD GS-ICP SLAM\"\n\nAdd \"GVGEN: Text-to-3D Generation with Volumetric Representation\"\n\n### 2024/03/19\n\nAdd \"Recent Advances in 3D Gaussian Splatting\"\n\nAdd \"GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation\"\n\nAdd \"DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark\"\n\nAdd \"Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration\"\n\nAdd \"Compact 3D Gaussian Splatting For Dense Visual SLAM\"\n\nAdd \"BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis\"\n\nAdd \"GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering\"\n\nAdd \"3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization\"\n\nAdd \"BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors\"\n\nAdd \"Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction\"\n\nAdd \"Bridging 3D Gaussian and Mesh for Freeview Video Rendering\"\n\nAdd \"Fed3DGS: Scalable 3D Gaussian Splatting with Federated Learning\"\n\nAdd \"3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration\"\n\nAdd \"UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling\"\n\nAdd \"GaussNav: Gaussian Splatting for Visual Navigation\"\n\nAdd \"NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting\"\n\nAdd \"BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting\"\n\nAdd \"View-Consistent 3D Editing with Gaussian Splatting\"\n\nAdd \"VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model\"\n\n### 2024/03/18\n\nAdd \"Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting\"\n\nAdd \"Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting\"\n\nAdd \"Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing\"\n\nAdd \"GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time\"\n\nAdd \"FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model\"\n\nAdd \"SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians\"\n\n### 2024/03/15\n\nAdd \"A New Split Algorithm for 3D Gaussian Splatting\"\n\nAdd \"Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph\"\n\nAdd \"Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting\"\n\nAdd \"Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians\"\n\nAdd \"GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping\"\n\n### 2024/03/14\n\nAdd \"ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation\"\n\nAdd \"Gaussian Splatting in Style\"\n\nAdd \"GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing\"\n\n### 2024/03/13\n\nAdd \"SemGauss-SLAM: Dense Semantic 
Gaussian Splatting SLAM\"\n\nAdd \"StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting\"\n\n### 2024/03/11\n\nAdd \"BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling\"\n\nAdd \"GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting\"\n\n### 2024/03/09\n\nAdd \"Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps\"\n\nAdd \"Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis\"\n\n### 2024/03/01\n\nAdd \"3D Gaussian Model for Animation and Texturing\"\n\n### 2024/02/28\n\nAdd \"VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction\"\n\n### 2024/02/27\n\nAdd \"Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting\"\n\nAdd \"GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video\"\n\n### 2024/02/23\n\nAdd \"Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting\"\n\nAdd \"GaussianPro: 3D Gaussian Splatting with Progressive Propagation\"\n\n### 2024/02/19\n\nAdd \"GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting\"\n\nAdd \"GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians\"\n\n### 2024/02/17\n\nAdd \"GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering\"\n\n### 2024/02/13\n\nAdd \"3D Gaussian as a New Vision Era: A Survey\"\n\nAdd \"GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting\"\n\n### 2024/02/12\n\nAdd \"HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting\"\n\nAdd \"GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data\"\n\n### 2024/02/08\n\nAdd \"Mesh-based Gaussian Splatting for Real-time Large-scale Deformation\"\n\nAdd \"LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation\"\n\n### 2024/02/07\n\nAdd \"Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos\"\n\n### 2024/02/06\n\nAdd \"SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM\"\n\nAdd \"4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes\"\n\n### 2024/02/05\n\nAdd \"GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting\"\n\n### 2024/02/03\n\nAdd \"Segment Anything in 3D Gaussians\"\n\nAdd \"StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering\"\n\nAdd \"GS++: Error Analyzing and Optimal Gaussian Splatting\"\n\nAdd \"360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming\"\n\n### 2024/01/31\n\nAdd \"VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality\"\n\n### 2024/01/30\n\nAdd \"LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering\"\n\nAdd \"Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting\"\n\nAdd \"Endo-4DGS: Distilling Depth Ranking for Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting\"\n\n### 2024/01/24\n\nAdd \"EndoGaussian: Gaussian Splatting for Deformable Surgical Scene Reconstruction\"\n\nAdd \"PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Creation with 3D Gaussian Splatting\"\n\n### 2024/01/23\n\nAdd \"Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting\"\n\n### 2024/01/19\n\nAdd \"GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting\"\n\n### 2024/01/18\n\nAdd \"Fast Dynamic 3D Object Generation from a Single-view Video\"\n\n### 
2024/01/12\n\nAdd \"TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering\"\n\nAdd \"CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians\"\n\nAdd \"DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines\"\n\n### 2024/01/09\n\nAdd \"A Survey on 3D Gaussian Splatting\"\n\nAdd \"AGG: Amortized Generative 3D Gaussians for Single Image to 3D\"\n\n### 2024/01/08\n\nAdd \"Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis\"\n\nAdd \"Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting\"\n\n### 2024/01/05\n\nAdd \"Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting\"\n\nAdd \"FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding\"\n\nAdd \"PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation\"\n\n### 2024/01/04\n\nAdd \"Text-to-3D using Gaussian Splatting\"\n\nAdd \"GS-IR: 3D Gaussian Splatting for Inverse Rendering\"\n\nAdd \"Mip-Splatting: Alias-free 3D Gaussian Splatting\"\n\nAdd \"HUGS: Human Gaussian Splats\"\n\nAdd \"Human101: Training 100+FPS Human Gaussians in 100s from 1 View\"\n\nAdd \"Deformable 3D Gaussian Splatting for Animatable Human Avatars\"\n\nAdd \"GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning\"\n\nAdd \"Exploring the Feasibility of Generating Realistic 3D Models of Endangered Species Using DreamGaussian: An Analysis of Elevation Angle's Impact on Model Generation\"\n\nAdd \"HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting\"\n\nAdd \"Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering\"\n\n### 2024/01/03\n\nAdd \"Street Gaussians for Modeling Dynamic Urban Scenes\"\n\n### 2024/01/02\n\nAdd \"Deblurring 3D Gaussian Splatting\"\n\n### 2024/01/01\n\nAdd \"GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis\"\n\nAdd \"pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction\"\n\nAdd \"SpecNeRF: Gaussian Directional Encoding for Specular Reflections\"\n\nAdd \"Compact 3D Scene Representation via Self-Organizing Gaussian Grids\"\n\nAdd \"SWAGS: Sampling Windows Adaptively for Dynamic 3D Gaussian Splatting\"\n\nAdd \"Gaussian Splatting with NeRF-based Color and Opacity\"\n\nAdd \"Sparse-view CT Reconstruction with 3D Gaussian Volumetric Representation\"\n\nAdd \"2D-Guided 3D Gaussian Segmentation\"\n\nAdd \"Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis\"\n\nAdd \"4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency\"\n\n### 2023/12/29\n\nAdd \"DreamGaussian4D: Generative 4D Gaussian Splatting\"\n\n### 2023/12/28\n\nAdd \"SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes\"\n\n### 2023/12/27\n\nAdd \"LangSplat: 3D Language Gaussian Splatting\"\n\n### 2023/12/26\n\nAdd \"Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle\"\n\nAdd \"GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis\"\n\n### 2023/12/24\n\nAdd \"Splatter Image: Ultra-Fast Single-View 3D Reconstruction\"\n\n"
  },
  {
    "path": "README.md",
    "content": "# 3D Gaussian Splatting Papers\n\n[![更新日志](https://img.shields.io/badge/💡-更新日志-informational.svg?style=flat)](Changelog.md)\n[![发现错误](https://img.shields.io/badge/🐛-发现错误-yellow.svg?style=flat)](https://github.com/Awesome3DGS/3D-Gaussian-Splatting-Papers/issues)\n[![提交修改](https://img.shields.io/badge/👐-提交修改-brightgreen.svg?style=flat)](https://github.com/Awesome3DGS/3D-Gaussian-Splatting-Papers/pulls)\n\n> 📢 本项目现由 Agent 协助维护。如发现论文遗漏、链接错误或信息有误，欢迎提交 Issue 勘误。\n\n#### **📚 会议期刊**\n\n- **2024**\n\n  - [[ICLR](./2024/ICLR.md)] (2 篇)\n    [[CVPR](./2024/CVPR.md)] (67 篇)\n    [[ECCV](./2024/ECCV.md)] (86 篇)\n    [[ACM MM](./2024/ACMMM.md)] (10 篇)\n    [[MICCAI](./2024/MICCAI.md)] (8 篇)\n    [[SIGGRAPH](./2024/SIGGRAPH.md)] (27 篇)\n  - [[NeurIPS](./2024/NeurIPS.md)] (70 篇)\n    [[ICML](./2024/ICML.md)] (4 篇)\n    [[BMVC](./2024/BMVC.md)] (8 篇)\n    [[CoRL](./2024/CoRL.md)] (6 篇)\n    [[IROS](./2024/IROS.md)] (9 篇)\n    [[others](./2024/Accepted.md)] (27 篇)\n\n- **2025**\n\n  - [[3DV](./2025/3DV.md)] (21 篇)\n    [[WACV](./2025/WACV.md)] (16 篇)\n    [[AAAI](./2025/AAAI.md)] (28 篇)\n    [[ICLR](./2025/ICLR.md)] (42 篇)\n    [[ICASSP](./2025/ICASSP.md)] (5 篇)\n    [[ICRA](./2025/ICRA.md)] (23 篇)\n  - [[CVPR](./2025/CVPR.md)] (157 篇)\n    [[ICCV](./2025/ICCV.md)] (109 篇)\n    [[ACM MM](./2025/ACMMM.md)] (22 篇)\n    [[MICCAI](./2025/MICCAI.md)] (7 篇)\n    [[SIGGRAPH](./2025/SIGGRAPH.md)] (35 篇)\n  - [[ICML](./2025/ICML.md)] (7 篇)\n    [[IROS](./2025/IROS.md)] (18 篇)\n    [[ICME](./2025/ICME.md)] (8 篇)\n    [[ICIP](./2025/ICIP.md)] (4 篇)\n    [[BMVC](./2025/BMVC.md)] (3 篇)\n  - [[NeurIPS](./2025/NeurIPS.md)] (35 篇)\n    [[others](./2025/Accepted.md)] (99 篇)\n\n- **2026**\n\n  - [[3DV](./2026/3DV.md)] (5 篇)\n    [[WACV](./2026/WACV.md)] (4 篇)\n    [[AAAI](./2026/AAAI.md)] (22 篇)\n    [[ICLR](./2026/ICLR.md)] (26 篇)\n    [[ICASSP](./2026/ICASSP.md)] (2 篇)\n    [[ICRA](./2026/ICRA.md)] (5 篇)\n  - [[CVPR](./2026/CVPR.md)] (23 篇)\n    [[others](./2026/Accepted.md)] (11 篇)\n\n#### **📂 归档论文**\n\n- [[Survey Papers](./Survey.md)] (12 篇)\n\n- **归档时间**\n\n  - [[2024/07/01](./archive/202407.md)] (176 篇)\n    [[2024/10/01](./archive/202410.md)] (99 篇)\n    [[2025/01/01](./archive/202501.md)] (196 篇)\n    [[2025/04/01](./archive/202504.md)] (203 篇)\n  - [[2025/07/01](./archive/202507.md)] (216 篇)\n\n---\n\n#### [1] GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System\n- **🧑‍🔬 作者**：Zhiye Tang, Qiudan Zhang, Lei Zhang, Junhui Hou, You Yang, Xu Wang\n- **🏫 单位**：Shenzhen University ⟐ City University of Hong Kong ⟐ Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2603.09718.md)] [[arXiv:2603.09718](https://arxiv.org/abs/2603.09718)] [Code]\n- **📝 说明**:\n\n#### [2] ProGS: Towards Progressive Coding for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhiye Tang, Lingzhuo Liu, Shengjie Jiao, Qiudan Zhang, Junhui Hou, You Yang, Xu Wang\n- **🏫 单位**：Shenzhen University ⟐ City University of Hong Kong ⟐ Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2603.09703.md)] [[arXiv:2603.09703](https://arxiv.org/abs/2603.09703)] [Code]\n- **📝 说明**:\n\n#### [3] X-GS: An Extensible Open Framework Unifying 3DGS Architectures with Downstream Multimodal Models\n- **🧑‍🔬 作者**：Yueen Ma, Irwin King\n- **🏫 单位**：The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2603.09632.md)] [[arXiv:2603.09632](https://arxiv.org/abs/2603.09632)] [Code]\n- **📝 说明**:\n\n#### [4] DenoiseSplat: Feed-Forward Gaussian Splatting for Noisy 3D Scene Reconstruction\n- **🧑‍🔬 作者**：Fuzhen Jiang, 
Zhuoran Li, Yinlin Zhang\n- **🏫 单位**：Hangzhou Dianzi University\n- **🔗 链接**：[[中英摘要](./abs/2603.09291.md)] [[arXiv:2603.09291](https://arxiv.org/abs/2603.09291)] [Code]\n- **📝 说明**:\n\n#### [5] GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models\n- **🧑‍🔬 作者**：Md Selim Sarowar, Omer Tariq, Sungho Kim\n- **🏫 单位**：Yeungnam University ⟐ KAIST\n- **🔗 链接**：[[中英摘要](./abs/2603.09079.md)] [[arXiv:2603.09079](https://arxiv.org/abs/2603.09079)] [Code]\n- **📝 说明**:\n\n#### [6] SkipGS: Post-Densification Backward Skipping for Efficient 3DGS Training\n- **🧑‍🔬 作者**：Jingxing Li, Yongjae Lee, Deliang Fan\n- **🏫 单位**：Arizona State University\n- **🔗 链接**：[[中英摘要](./abs/2603.08997.md)] [[arXiv:2603.08997](https://arxiv.org/abs/2603.08997)] [Code]\n- **📝 说明**:\n\n#### [7] SurgCalib: Gaussian Splatting-Based Hand-Eye Calibration for Robot-Assisted Minimally Invasive Surgery\n- **🧑‍🔬 作者**：Zijian Wu, Shuojue Yang, Yu Chung Lee, Eitan Prisman, Yueming Jin, Septimiu E. Salcudean\n- **🏫 单位**：University of British Columbia ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2603.08983.md)] [[arXiv:2603.08983](https://arxiv.org/abs/2603.08983)] [Code]\n- **📝 说明**:\n\n#### [8] Where, What, Why: Toward Explainable 3D-GS Watermarking\n- **🧑‍🔬 作者**：Mingshu Cai, Jiajun Li, Osamu Yoshie, Yuya Ieiri, Yixuan Li\n- **🏫 单位**：Waseda University ⟐ Southeast University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2603.08809.md)] [[arXiv:2603.08809](https://arxiv.org/abs/2603.08809)] [Code]\n- **📝 说明**:\n\n#### [9] ImprovedGS+: A High-Performance C++/CUDA Re-Implementation Strategy for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jordi Muñoz Vicente\n- **🏫 单位**：Universidad de Murcia\n- **🔗 链接**：[[中英摘要](./abs/2603.08661.md)] [[arXiv:2603.08661](https://arxiv.org/abs/2603.08661)] [Code]\n- **📝 说明**:\n\n#### [10] Spherical-GOF: Geometry-Aware Panoramic Gaussian Opacity Fields for 3D Scene Reconstruction\n- **🧑‍🔬 作者**：Zhe Yang, Guoqiang Zhao, Sheng Wu, Kai Luo, Kailun Yang\n- **🏫 单位**：Hunan University\n- **🔗 链接**：[[中英摘要](./abs/2603.08503.md)] [[arXiv:2603.08503](https://arxiv.org/abs/2603.08503)] [[Code](https://github.com/1170632760/Spherical-GOF)]\n- **📝 说明**:\n\n#### [11] DynamicVGGT: Learning Dynamic Point Maps for 4D Scene Reconstruction in Autonomous Driving\n- **🧑‍🔬 作者**：Zhuolin He, Jing Li, Guanghao Li, Xiaolei Chen, Jiacheng Tang, Siyang Zhang, Zhounan Jin, Feipeng Cai, Bin Li, Jian Pu, Jia Cai, Xiangyang Xue\n- **🏫 单位**：Fudan University ⟐ Huawei ⟐ Yinwang Intelligent Technology ⟐ Shanghai Innovation Institute ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2603.08254.md)] [[arXiv:2603.08254](https://arxiv.org/abs/2603.08254)] [Code]\n- **📝 说明**:\n\n#### [12] Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence\n- **🧑‍🔬 作者**：Yuanyuan Gao, Hao Li, Yifei Liu, Xinhao Ji, Yuning Gong, Yuanjun Liao, Fangfu Liu, Manyuan Zhang, Yuchen Yang, Dan Xu, Xue Yang, Huaxi Huang, Hongjie Zhang, Ziwei Liu, Xiao Sun, Dingwen Zhang, Zhihang Zhong\n- **🏫 单位**：Northwestern Polytechnical University ⟐ Shanghai Jiao Tong University ⟐ Peking University ⟐ Nanyang Technological University ⟐ Beihang University ⟐ Sichuan University ⟐ Tsinghua University ⟐ The Chinese University of Hong Kong ⟐ Fudan University ⟐ Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2603.07660.md)] [[arXiv:2603.07660](https://arxiv.org/abs/2603.07660)] [[Code](https://github.com/Visionary-Laboratory/Holi-Spatial)]\n- **📝 说明**:\n\n#### [13] EmbedTalk: 
Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation\n- **🧑‍🔬 作者**：Arpita Saggar, Jonathan C. Darling, Duygu Sarikaya, David C. Hogg\n- **🏫 单位**：University of Leeds\n- **🔗 链接**：[[中英摘要](./abs/2603.07604.md)] [[arXiv:2603.07604](https://arxiv.org/abs/2603.07604)] [Code]\n- **📝 说明**:\n\n#### [14] 3DGS-HPC: Distractor-free 3D Gaussian Splatting with Hybrid Patch-wise Classification\n- **🧑‍🔬 作者**：Jiahao Chen, Yipeng Qin, Ganlong Zhao, Xin Li, Wenping Wang, Guanbin Li\n- **🏫 单位**：Sun Yat-sen University ⟐ Texas A&M University ⟐ Cardiff University\n- **🔗 链接**：[[中英摘要](./abs/2603.07587.md)] [[arXiv:2603.07587](https://arxiv.org/abs/2603.07587)] [Code]\n- **📝 说明**:\n\n#### [15] ReconDrive: Fast Feed-Forward 4D Gaussian Splatting for Autonomous Driving Scene Reconstruction\n- **🧑‍🔬 作者**：Haibao Yu, Kuntao Xiao, Jiahang Wang, Ruiyang Hao, Yuxin Huang, Guoran Hu, Haifang Qin, Bowen Jing, Yuntian Bo, Ping Luo\n- **🏫 单位**：Tuojing Intelligence ⟐ The University of Hong Kong ⟐ King's College London ⟐ The University of Sydney ⟐ Mohamed bin Zayed University of Artificial Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2603.07552.md)] [[arXiv:2603.07552](https://arxiv.org/abs/2603.07552)] [Code]\n- **📝 说明**:\n\n#### [16] ColonSplat: Reconstruction of Peristaltic Motion in Colonoscopy with Dynamic Gaussian Splatting\n- **🧑‍🔬 作者**：Weronika Smolak-Dyżewska, Joanna Kaleta, Diego Dall'Alba, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ Warsaw University of Technology ⟐ Sano Centre for Computational Medicine ⟐ University of Verona ⟐ IDEAS Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2603.06860.md)] [[arXiv:2603.06860](https://arxiv.org/abs/2603.06860)] [[Code](https://github.com/wMito/ColonSplat)]\n- **📝 说明**:\n\n#### [17] Active View Selection with Perturbed Gaussian Ensemble for Tomographic Reconstruction\n- **🧑‍🔬 作者**：Yulun Wu, Ruyi Zha, Wei Cao, Yingying Li, Yuanhao Cai, Yaoyao Liu\n- **🏫 单位**：University of Illinois Urbana-Champaign ⟐ Australian National University ⟐ Johns Hopkins University\n- **🔗 链接**：[[中英摘要](./abs/2603.06852.md)] [[arXiv:2603.06852](https://arxiv.org/abs/2603.06852)] [Code]\n- **📝 说明**:\n\n#### [18] EntON: Eigenentropy-Optimized Neighborhood Densification in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Miriam Jäger, Boris Jutzi\n- **🏫 单位**：Karlsruhe Institute of Technology (KIT)\n- **🔗 链接**：[[中英摘要](./abs/2603.06216.md)] [[arXiv:2603.06216](https://arxiv.org/abs/2603.06216)] [Code]\n- **📝 说明**:\n\n#### [19] VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction\n- **🧑‍🔬 作者**：Xiaoyang Yan, Muleilan Pei, Shaojie Shen\n- **🏫 单位**：The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2603.06210.md)] [[arXiv:2603.06210](https://arxiv.org/abs/2603.06210)] [Code]\n- **📝 说明**:\n\n#### [20] Transforming Omnidirectional RGB-LiDAR data into 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Semin Bae, Hansol Lim, Jongseong Brad Choi\n- **🏫 单位**：State University of New York\n- **🔗 链接**：[[中英摘要](./abs/2603.06061.md)] [[arXiv:2603.06061](https://arxiv.org/abs/2603.06061)] [Code]\n- **📝 说明**:\n\n#### [21] FTSplat: Feed-forward Triangle Splatting Network\n- **🧑‍🔬 作者**：Xiong Jinlin, Li Can, Shen Jiawei, Qi Zhigang, Sun Lei, Zhao Dongyang\n- **🏫 单位**：Nankai University ⟐ Beijing Institute of Computer Application\n- **🔗 链接**：[[中英摘要](./abs/2603.05932.md)] [[arXiv:2603.05932](https://arxiv.org/abs/2603.05932)] [Code]\n- **📝 说明**:\n\n#### [22] CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis\n- **🧑‍🔬 作者**：Qiwei 
Wang, Xianghui Ze, Jingyi Yu, Yujiao Shi\n- **🏫 单位**：ShanghaiTech University ⟐ Nanjing University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2603.05882.md)] [[arXiv:2603.05882](https://arxiv.org/abs/2603.05882)] [Code]\n- **📝 说明**:\n\n#### [23] Cog2Gen3D: Sculpturing 3D Semantic-Geometric Cognition for 3D Generation\n- **🧑‍🔬 作者**：Haonan Wang, Hanyu Zhou, Haoyue Liu, Tao Gu, Luxin Yan\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ National University of Singapore ⟐ Macquarie University\n- **🔗 链接**：[[中英摘要](./abs/2603.05845.md)] [[arXiv:2603.05845](https://arxiv.org/abs/2603.05845)] [Code]\n- **📝 说明**:\n\n#### [24] SSR-GS: Separating Specular Reflection in Gaussian Splatting for Glossy Surface Reconstruction\n- **🧑‍🔬 作者**：Ningjing Fan, Yiqun Wang\n- **🏫 单位**：Chongqing University\n- **🔗 链接**：[[中英摘要](./abs/2603.05152.md)] [[arXiv:2603.05152](https://arxiv.org/abs/2603.05152)] [Code]\n- **📝 说明**:\n\n#### [25] GaussTwin: Unified Simulation and Correction with Gaussian Splatting for Robotic Digital Twins\n- **🧑‍🔬 作者**：Yichen Cai, Paul Jansonnie, Cristiana de Farias, Oleg Arenz, Jan Peters\n- **🏫 单位**：Technical University of Darmstadt ⟐ Hessian.AI ⟐ German Research Center for AI (DFKI) ⟐ Robotics Institute Germany (RIG) ⟐ NAVER LABS Europe\n- **🔗 链接**：[[中英摘要](./abs/2603.05108.md)] [[arXiv:2603.05108](https://arxiv.org/abs/2603.05108)] [Code]\n- **📝 说明**:\n\n#### [26] GloSplat: Joint Pose-Appearance Optimization for Faster and More Accurate 3D Reconstruction\n- **🧑‍🔬 作者**：Tianyu Xiong, Rui Li, Linjie Li, Jiaqi Yang\n- **🏫 单位**：Northwestern Polytechnical University ⟐ King Abdullah University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2603.04847.md)] [[arXiv:2603.04847](https://arxiv.org/abs/2603.04847)] [Code]\n- **📝 说明**:\n\n#### [27] DSA-SRGS: Super-Resolution Gaussian Splatting for Dynamic Sparse-View DSA Reconstruction\n- **🧑‍🔬 作者**：Shiyu Zhang, Zhicong Wu, Huangxuan Zhao, Zhentao Liu, Lei Chen, Yong Luo, Lefei Zhang, Zhiming Cui, Ziwen Ke, Bo Du\n- **🏫 单位**：Wuhan University ⟐ Xiamen University ⟐ ShanghaiTech University ⟐ Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2603.04770.md)] [[arXiv:2603.04770](https://arxiv.org/abs/2603.04770)] [Code]\n- **📝 说明**:\n\n#### [28] Gaussian Wardrobe: Compositional 3D Gaussian Avatars for Free-Form Virtual Try-On\n- **🧑‍🔬 作者**：Zhiyi Chen, Hsuan-I Ho, Tianjian Jiang, Jie Song, Manuel Kaufmann, Chen Guo\n- **🏫 单位**：ETH Zurich ⟐ ETH AI Center ⟐ The Hong Kong University of Science and Technology (Guangzhou) ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2603.04290.md)] [[arXiv:2603.04290](https://arxiv.org/abs/2603.04290)] [Code]\n- **📝 说明**:\n\n#### [29] DM-CFO: A Diffusion Model for Compositional 3D Tooth Generation with Collision-Free Optimization\n- **🧑‍🔬 作者**：Yan Tian, Pengcheng Xue, Weiping Ding, Mahmoud Hassaballah, Karen Egiazarian, Aura Conci, Abdulkadir Sengur, Leszek Rutkowski\n- **🏫 单位**：Zhejiang Gongshang University ⟐ Shining3D Tech Co., Ltd. 
⟐ Nantong University ⟐ Prince Sattam Bin Abdulaziz University ⟐ Qena University ⟐ Tampere University ⟐ Universidade Federal Fluminense ⟐ Firat University ⟐ Systems Research Institute of the Polish Academy of Sciences ⟐ AGH University of Krakow ⟐ SAN University\n- **🔗 链接**：[[中英摘要](./abs/2603.03602.md)] [[arXiv:2603.03602](https://arxiv.org/abs/2603.03602)] [Code]\n- **📝 说明**:\n\n#### [30] VIRGi: View-dependent Instant Recoloring of 3D Gaussians Splats\n- **🧑‍🔬 作者**：Alessio Mazzucchelli, Ivan Ojeda-Martin, Fernando Rivas-Manzaneque, Elena Garces, Adrian Penate-Sanchez, Francesc Moreno-Noguer\n- **🏫 单位**：Arquimea Research Center ⟐ Universidad Politécnica de Catalunya ⟐ Volinga AI ⟐ Universidad Politécnica de Madrid ⟐ Universidad Rey Juan Carlos ⟐ Universidad de Las Palmas de Gran Canaria ⟐ Institut de Robòtica i Informàtica Industrial (IRI) ⟐ CSIC-UPC\n- **🔗 链接**：[[中英摘要](./abs/2603.02986.md)] [[arXiv:2603.02986](https://arxiv.org/abs/2603.02986)] [Code]\n- **📝 说明**:\n\n#### [31] Intrinsic Geometry-Appearance Consistency Optimization for Sparse-View Gaussian Splatting\n- **🧑‍🔬 作者**：Kaiqiang Xiong, Rui Peng, Jiahao Wu, Zhanke Wang, Jie Liang, Xiaoyun Zheng, Feng Gao, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory ⟐ Migu Culture Technology Co., Ltd ⟐ Alibaba Group\n- **🔗 链接**：[[中英摘要](./abs/2603.02893.md)] [[arXiv:2603.02893](https://arxiv.org/abs/2603.02893)] [Code]\n- **📝 说明**:\n\n#### [32] Generalized non-exponential Gaussian splatting\n- **🧑‍🔬 作者**：Sébastien Speierer, Adrian Jarabo\n- **🏫 单位**：Meta\n- **🔗 链接**：[[中英摘要](./abs/2603.02887.md)] [[arXiv:2603.02887](https://arxiv.org/abs/2603.02887)] [Code]\n- **📝 说明**:\n\n#### [33] Multimodal-Prior-Guided Importance Sampling for Hierarchical Gaussian Splatting in Sparse-View Novel View Synthesis\n- **🧑‍🔬 作者**：Kaiqiang Xiong, Zhanke Wang, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory ⟐ Migu Culture Technology Co., Ltd.\n- **🔗 链接**：[[中英摘要](./abs/2603.02866.md)] [[arXiv:2603.02866](https://arxiv.org/abs/2603.02866)] [Code]\n- **📝 说明**:\n\n#### [34] R3GW: Relightable 3D Gaussians for Outdoor Scenes in the Wild\n- **🧑‍🔬 作者**：Margherita Lea Corona, Wieland Morgenstern, Peter Eisert, Anna Hilsmann\n- **🏫 单位**：Fraunhofer Heinrich Hertz Institute ⟐ Humboldt Universität zu Berlin\n- **🔗 链接**：[[中英摘要](./abs/2603.02801.md)] [[arXiv:2603.02801](https://arxiv.org/abs/2603.02801)] [Code]\n- **📝 说明**:\n\n#### [35] SemGS: Feed-Forward Semantic 3D Gaussian Splatting from Sparse Views for Generalizable Scene Understanding\n- **🧑‍🔬 作者**：Sheng Ye, Zhen-Hui Dong, Ruoyu Fan, Tian Lv, Yong-Jin Liu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2603.02548.md)] [[arXiv:2603.02548](https://arxiv.org/abs/2603.02548)] [Code]\n- **📝 说明**:\n\n#### [36] LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation\n- **🧑‍🔬 作者**：Hualiang Wei, Shunran Jia, Jialun Liu, Wenhui Li\n- **🏫 单位**：Jilin University ⟐ Impressed Inc DBA SocialBook ⟐ Institute of Artificial Intelligence of China Telecom\n- **🔗 链接**：[[中英摘要](./abs/2603.02129.md)] [[arXiv:2603.02129](https://arxiv.org/abs/2603.02129)] [Code]\n- **📝 说明**:\n\n#### [37] Sparse View Distractor-Free Gaussian Splatting\n- **🧑‍🔬 作者**：Yi Gu, Zhaorui Wang, Jiahang Cao, Jiaxu Wang, Mingle Zhao, Dongjun Ye, Renjing Xu\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou) ⟐ University of Macau\n- **🔗 链接**：[[中英摘要](./abs/2603.01603.md)] [[arXiv:2603.01603](https://arxiv.org/abs/2603.01603)] [Code]\n- **📝 说明**:\n\n#### [38] FLICKER: A 
Fine-Grained Contribution-Aware Accelerator for Real-Time 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Wenhui Ou, Zhuoyu Wu, Yipu Zhang, Dongjun Wu, Freddy Ziyang Hong, Chik Patrick Yue\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ Monash University, Malaysia Campus\n- **🔗 链接**：[[中英摘要](./abs/2603.01158.md)] [[arXiv:2603.01158](https://arxiv.org/abs/2603.01158)] [Code]\n- **📝 说明**:\n\n#### [39] HeroGS: Hierarchical Guidance for Robust 3D Gaussian Splatting under Sparse Views\n- **🧑‍🔬 作者**：Jiashu Li, Xumeng Han, Zhaoyang Wei, Zipeng Wang, Kuiran Wang, Guorong Li, Zhenjun Han, Jianbin Jiao\n- **🏫 单位**：University of the Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2603.01099.md)] [[arXiv:2603.01099](https://arxiv.org/abs/2603.01099)] [Code]\n- **📝 说明**:\n\n#### [40] Decoupling Motion and Geometry in 4D Gaussian Splatting\n- **🧑‍🔬 作者**：Yi Zhang, Yulei Kang, Jian-Fang Hu\n- **🏫 单位**：Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2603.00952.md)] [[arXiv:2603.00952](https://arxiv.org/abs/2603.00952)] [Code]\n- **📝 说明**:\n\n#### [41] TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction\n- **🧑‍🔬 作者**：Yihui Li, Chengxin Lv, Zichen Tang, Hongyu Yang, Di Huang\n- **🏫 单位**：State Key Laboratory of Complex and Critical Software Environment ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2603.00697.md)] [[arXiv:2603.00697](https://arxiv.org/abs/2603.00697)] [Code]\n- **📝 说明**:\n\n#### [42] Zero-Shot Robotic Manipulation via 3D Gaussian Splatting-Enhanced Multimodal Retrieval-Augmented Generation\n- **🧑‍🔬 作者**：Zilong Xie, Jingyu Gong, Xin Tan, Zhizhong Zhang, Yuan Xie\n- **🏫 单位**：East China Normal University ⟐ Chongqing Institute of East China Normal University ⟐ Shanghai Key Laboratory of Computer Software Evaluating and Testing\n- **🔗 链接**：[[中英摘要](./abs/2603.00500.md)] [[arXiv:2603.00500](https://arxiv.org/abs/2603.00500)] [Code]\n- **📝 说明**:\n\n#### [43] ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models\n- **🧑‍🔬 作者**：Riccardo de Lutio, Tobias Fischer, Yen-Yu Chang, Yuxuan Zhang, Jay Zhangjie Wu, Xuanchi Ren, Tianchang Shen, Katarina Tothova, Zan Gojcic, Haithem Turki\n- **🏫 单位**：NVIDIA ⟐ ETH Zurich ⟐ Cornell University ⟐ University of Toronto ⟐ Vector Institute\n- **🔗 链接**：[[中英摘要](./abs/2603.00492.md)] [[arXiv:2603.00492](https://arxiv.org/abs/2603.00492)] [Code]\n- **📝 说明**:\n\n#### [44] UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images\n- **🧑‍🔬 作者**：Junhwa Hur, Charles Herrmann, Songyou Peng, Philipp Henzler, Zeyu Ma, Todd Zickler, Deqing Sun\n- **🏫 单位**：Google ⟐ Princeton University ⟐ Harvard University\n- **🔗 链接**：[[中英摘要](./abs/2602.24290.md)] [[arXiv:2602.24290](https://arxiv.org/abs/2602.24290)] [Code]\n- **📝 说明**:\n\n#### [45] GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction\n- **🧑‍🔬 作者**：Chao Xu, Xiaochen Zhao, Xiang Deng, Jingxiang Sun, Zhuo Su, Donglin Di, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ ByteDance ⟐ Li Auto\n- **🔗 链接**：[[中英摘要](./abs/2602.24161.md)] [[arXiv:2602.24161](https://arxiv.org/abs/2602.24161)] [Code]\n- **📝 说明**:\n\n#### [46] Prune Wisely, Reconstruct Sharply: Compact 3D Gaussian Splatting via Adaptive Pruning and Difference-of-Gaussian Primitives\n- **🧑‍🔬 作者**：Haoran Wang, Guoxi Huang, Fan Zhang, David Bull, Nantheera Anantrasirichai\n- **🏫 单位**：University of Bristol\n- **🔗 链接**：[[中英摘要](./abs/2602.24136.md)] [[arXiv:2602.24136](https://arxiv.org/abs/2602.24136)] [Code]\n- **📝 说明**:\n\n#### [47] DiffusionHarmonizer: Bridging Neural 
Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer\n- **🧑‍🔬 作者**：Yuxuan Zhang, Katarína Tóthová, Zian Wang, Kangxue Yin, Haithem Turki, Riccardo de Lutio, Yen-Yu Chang, Or Litany, Sanja Fidler, Zan Gojcic\n- **🏫 单位**：NVIDIA ⟐ University of Toronto ⟐ Cornell University ⟐ Technion\n- **🔗 链接**：[[中英摘要](./abs/2602.24096.md)] [[arXiv:2602.24096](https://arxiv.org/abs/2602.24096)] [Code]\n- **📝 说明**:\n\n#### [48] SR3R: Rethinking Super-Resolution 3D Reconstruction With Feed-Forward Gaussian Splatting\n- **🧑‍🔬 作者**：Xiang Feng, Xiangbo Wang, Tieshi Zhong, Chengkai Wang, Yiting Zhao, Tianxiang Xu, Zhenzhong Kuang, Feiwei Qin, Xuefei Yin, Yanming Zhu\n- **🏫 单位**：Hangzhou Dianzi University ⟐ ShanghaiTech University ⟐ Griffith University ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2602.24020.md)] [[arXiv:2602.24020](https://arxiv.org/abs/2602.24020)] [Code]\n- **📝 说明**:\n\n#### [49] No Calibration, No Depth, No Problem: Cross-Sensor View Synthesis with 3D Consistency\n- **🧑‍🔬 作者**：Cho-Ying Wu, Zixun Huang, Xinyu Huang, Liu Ren\n- **🏫 单位**：Bosch Research North America ⟐ Bosch Center for Artificial Intelligence (BCAI)\n- **🔗 链接**：[[中英摘要](./abs/2602.23559.md)] [[arXiv:2602.23559](https://arxiv.org/abs/2602.23559)] [Code]\n- **📝 说明**:\n\n#### [50] Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking\n- **🧑‍🔬 作者**：Maximilian Luz, Rohit Mohan, Thomas Nürnberg, Yakov Miron, Daniele Cattaneo, Abhinav Valada\n- **🏫 单位**：University of Freiburg ⟐ Bosch Research ⟐ University of Haifa\n- **🔗 链接**：[[中英摘要](./abs/2602.23172.md)] [[arXiv:2602.23172](https://arxiv.org/abs/2602.23172)] [Code]\n- **📝 说明**:\n\n#### [51] GSTurb: Gaussian Splatting for Atmospheric Turbulence Mitigation\n- **🧑‍🔬 作者**：Hanliang Du, Zhangji Lu, Zewei Cai, Qijian Tang, Qifeng Yu, Xiaoli Liu\n- **🏫 单位**：University of Macau ⟐ Shenzhen University ⟐ State Key Laboratory of Radio Frequency Heterogeneous Integration (Shenzhen University) ⟐ Shenzhen Key Laboratory of Intelligent Optical Measurement and Detection ⟐ Hunan Provincial Key Laboratory of Image Measurement and Vision Navigation ⟐ National University of Defense Technology\n- **🔗 链接**：[[中英摘要](./abs/2602.22800.md)] [[arXiv:2602.22800](https://arxiv.org/abs/2602.22800)] [Code]\n- **📝 说明**:\n\n#### [52] Sapling-NeRF: Geo-Localised Sapling Reconstruction in Forests for Ecological Monitoring\n- **🧑‍🔬 作者**：Miguel Ángel Muñoz-Bañón, Nived Chebrolu, Sruthi M. Krishna Moorthy, Yifu Tao, Fernando Torres, Roberto Salguero-Gómez, Maurice Fallon\n- **🏫 单位**：Oxford Robotics Institute ⟐ University of Oxford ⟐ University of Alicante\n- **🔗 链接**：[[中英摘要](./abs/2602.22731.md)] [[arXiv:2602.22731](https://arxiv.org/abs/2602.22731)] [Code]\n- **📝 说明**:\n\n#### [53] ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals\n- **🧑‍🔬 作者**：Xuelu Li, Zhaonan Wang, Xiaogang Wang, Lei Wu, Manyi Li, Changhe Tu\n- **🏫 单位**：Shandong University ⟐ Southwest University\n- **🔗 链接**：[[中英摘要](./abs/2602.22666.md)] [[arXiv:2602.22666](https://arxiv.org/abs/2602.22666)] [Code]\n- **📝 说明**:\n\n#### [54] BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model\n- **🧑‍🔬 作者**：Yuci Han, Charles Toth, John E. Anderson, William J. 
Shuart, Alper Yilmaz\n- **🏫 单位**：The Ohio State University ⟐ USACE ERDC GRL\n- **🔗 链接**：[[中英摘要](./abs/2602.22596.md)] [[arXiv:2602.22596](https://arxiv.org/abs/2602.22596)] [Code]\n- **📝 说明**:\n\n#### [55] GIFSplat: Generative Prior-Guided Iterative Feed-Forward 3D Gaussian Splatting from Sparse Views\n- **🧑‍🔬 作者**：Tianyu Chen, Wei Xiang, Kang Han, Yu Lu, Di Wu, Gaowen Liu, Ramana Rao Kompella\n- **🏫 单位**：La Trobe University ⟐ Cisco Research\n- **🔗 链接**：[[中英摘要](./abs/2602.22571.md)] [[arXiv:2602.22571](https://arxiv.org/abs/2602.22571)] [Code]\n- **📝 说明**:\n\n#### [56] SwiftNDC: Fast Neural Depth Correction for High-Fidelity 3D Reconstruction\n- **🧑‍🔬 作者**：Kang Han, Wei Xiang, Lu Yu, Mathew Wyatt, Gaowen Liu, Ramana Rao Kompella\n- **🏫 单位**：La Trobe University ⟐ Australian Institute of Marine Science ⟐ Cisco Research\n- **🔗 链接**：[[中英摘要](./abs/2602.22565.md)] [[arXiv:2602.22565](https://arxiv.org/abs/2602.22565)] [Code]\n- **📝 说明**:\n\n#### [57] Interactive Augmented Reality-enabled Outdoor Scene Visualization For Enhanced Real-time Disaster Response\n- **🧑‍🔬 作者**：Dimitrios Apostolakis, Georgios Angelidis, Vasileios Argyriou, Panagiotis Sarigiannidis, Georgios Th. Papadopoulos\n- **🏫 单位**：Harokopio University of Athens ⟐ Archimedes, Athena Research Center ⟐ Kingston University ⟐ University of Western Macedonia\n- **🔗 链接**：[[中英摘要](./abs/2602.21874.md)] [[arXiv:2602.21874](https://arxiv.org/abs/2602.21874)] [Code]\n- **📝 说明**:\n\n#### [58] DAGS-SLAM: Dynamic-Aware 3DGS SLAM via Spatiotemporal Motion Probability and Uncertainty-Aware Scheduling\n- **🧑‍🔬 作者**：Li Zhang, Yu-An Liu, Xijia Jiang, Conghao Huang, Danyang Li, Yanyong Zhang\n- **🏫 单位**：Hefei University of Technology ⟐ Tsinghua University ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2602.21644.md)] [[arXiv:2602.21644](https://arxiv.org/abs/2602.21644)] [Code]\n- **📝 说明**:\n\n#### [59] M-Gaussian: An Magnetic Gaussian Framework for Efficient Multi-Stack MRI Reconstruction\n- **🧑‍🔬 作者**：Kangyuan Zheng, Xuan Cai, Jiangqi Wang, Guixing Fu, Zhuoshuo Li, Yazhou Chen, Xinting Ge, Liangqiong Qu, Mengting Liu\n- **🏫 单位**：Sun Yat-sen University ⟐ Shandong Normal University ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2603.00145.md)] [[arXiv:2603.00145](https://arxiv.org/abs/2603.00145)] [Code]\n- **📝 说明**:\n\n#### [60] Monocular Endoscopic Tissue 3D Reconstruction with Multi-Level Geometry Regularization\n- **🧑‍🔬 作者**：Yangsen Chen, Hao Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2602.20718.md)] [[arXiv:2602.20718](https://arxiv.org/abs/2602.20718)] [Code]\n- **📝 说明**:\n\n#### [61] WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos\n- **🧑‍🔬 作者**：Hanhui Li, Xuan Huang, Wanquan Liu, Yuhao Cheng, Long Chen, Yiqiang Yan, Xiaodan Liang, Chenqiang Gao\n- **🏫 单位**：Sun Yat-sen University ⟐ Lenovo Research Group\n- **🔗 链接**：[[中英摘要](./abs/2602.20556.md)] [[arXiv:2602.20556](https://arxiv.org/abs/2602.20556)] [[Code](https://github.com/XuanHuang0/WildGHand)]\n- **📝 说明**:\n\n#### [62] Aesthetic Camera Viewpoint Suggestion with 3D Aesthetic Field\n- **🧑‍🔬 作者**：Sheyang Tang, Armin Shafiee Sarvestani, Jialu Xu, Xiaoyu Xu, Zhou Wang\n- **🏫 单位**：University of Waterloo ⟐ City University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2602.20363.md)] [[arXiv:2602.20363](https://arxiv.org/abs/2602.20363)] [Code]\n- **📝 说明**:\n\n#### [63] Large-scale Photorealistic Outdoor 3D Scene Reconstruction from UAV Imagery Using Gaussian 
Splatting Techniques\n- **🧑‍🔬 作者**：Christos Maikos, Georgios Angelidis, Georgios Th. Papadopoulos\n- **🏫 单位**：Harokopio University of Athens ⟐ Archimedes, Athena Research Center\n- **🔗 链接**：[[中英摘要](./abs/2602.20342.md)] [[arXiv:2602.20342](https://arxiv.org/abs/2602.20342)] [Code]\n- **📝 说明**:\n\n#### [64] DefenseSplat: Enhancing the Robustness of 3D Gaussian Splatting via Frequency-Aware Filtering\n- **🧑‍🔬 作者**：Yiran Qiao, Yiren Lu, Yunlai Zhou, Rui Yang, Linlin Hou, Yu Yin, Jing Ma\n- **🏫 单位**：Case Western Reserve University\n- **🔗 链接**：[[中英摘要](./abs/2602.19323.md)] [[arXiv:2602.19323](https://arxiv.org/abs/2602.19323)] [Code]\n- **📝 说明**:\n\n#### [65] Spatial-Temporal State Propagation Autoregressive Model for 4D Object Generation\n- **🧑‍🔬 作者**：Liying Yang, Jialun Liu, Jiakui Hu, Chenhao Guan, Haibin Huang, Fangqiu Yi, Chi Zhang, Yanyan Liang\n- **🏫 单位**：Macau University of Science and Technology ⟐ TeleAI ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2602.18830.md)] [[arXiv:2602.18830](https://arxiv.org/abs/2602.18830)] [Code]\n- **📝 说明**:\n\n#### [66] Unifying Color and Lightness Correction with View-Adaptive Curve Adjustment for Robust 3D Novel View Synthesis\n- **🧑‍🔬 作者**：Ziteng Cui, Shuhong Liu, Xiaoyu Dong, Xuangeng Chu, Lin Gu, Ming-Hsuan Yang, Tatsuya Harada\n- **🏫 单位**：The University of Tokyo ⟐ Tohoku University ⟐ University of California at Merced ⟐ Google DeepMind ⟐ RIKEN AIP\n- **🔗 链接**：[[中英摘要](./abs/2602.18322.md)] [[arXiv:2602.18322](https://arxiv.org/abs/2602.18322)] [Code]\n- **📝 说明**:\n\n#### [67] 4D Monocular Surgical Reconstruction under Arbitrary Camera Motions\n- **🧑‍🔬 作者**：Jiwei Shan, Zeyu Cai, Cheng-Tai Hsieh, Yirui Li, Hao Liu, Lijun Han, Hesheng Wang, Shing Shin Cheng\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Shenyang Institute of Automation, Chinese Academy of Sciences ⟐ State Key Laboratory of Robotics and Intelligent Systems ⟐ Shanghai Jiao Tong University ⟐ Key Laboratory of System Control and Information Processing\n- **🔗 链接**：[[中英摘要](./abs/2602.17473.md)] [[arXiv:2602.17473](https://arxiv.org/abs/2602.17473)] [Code]\n- **📝 说明**:\n\n#### [68] NRGS-SLAM: Monocular Non-Rigid SLAM for Endoscopy via Deformation-Aware 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jiwei Shan, Zeyu Cai, Yirui Li, Yongbo Chen, Lijun Han, Yun-hui Liu, Hesheng Wang, Shing Shin Cheng\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2602.17182.md)] [[arXiv:2602.17182](https://arxiv.org/abs/2602.17182)] [Code]\n- **📝 说明**:\n\n#### [69] 3D Scene Rendering with Multimodal Gaussian Splatting\n- **🧑‍🔬 作者**：Chi-Shiang Gau, Konstantinos D. 
Polyzos, Athanasios Bacharis, Saketh Madhuvarasu, Tara Javidi\n- **🏫 单位**：University of California San Diego ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2602.17124.md)] [[arXiv:2602.17124](https://arxiv.org/abs/2602.17124)] [Code]\n- **📝 说明**:\n\n#### [70] i-PhysGaussian: Implicit Physical Simulation for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yicheng Cao, Zhuo Huang, Yu Yao, Yiming Ying, Daoyi Dong, Tongliang Liu\n- **🏫 单位**：The University of Sydney ⟐ University of Technology Sydney\n- **🔗 链接**：[[中英摘要](./abs/2602.17117.md)] [[arXiv:2602.17117](https://arxiv.org/abs/2602.17117)] [Code]\n- **📝 说明**:\n\n#### [71] Semantic-Guided 3D Gaussian Splatting for Transient Object Removal\n- **🧑‍🔬 作者**：Aditi Prabakaran, Priyesh Shukla\n- **🏫 单位**：SRM University ⟐ International Institute of Information Technology, Hyderabad\n- **🔗 链接**：[[中英摘要](./abs/2602.15516.md)] [[arXiv:2602.15516](https://arxiv.org/abs/2602.15516)] [Code]\n- **📝 说明**:\n\n#### [72] DAV-GSWT: Diffusion-Active-View Sampling for Data-Efficient Gaussian Splatting Wang Tiles\n- **🧑‍🔬 作者**：Rong Fu, Jiekai Wu, Haiyun Wei, Yee Tan Jia, Yang Li, Xiaowen Ma, Wangyu Wu, Simon Fong\n- **🏫 单位**：University of Macau ⟐ Juntendo University ⟐ Tongji University ⟐ Renmin University of China ⟐ University of Chinese Academy of Sciences ⟐ Zhejiang University ⟐ University of Liverpool\n- **🔗 链接**：[[中英摘要](./abs/2602.15355.md)] [[arXiv:2602.15355](https://arxiv.org/abs/2602.15355)] [Code]\n- **📝 说明**:\n\n#### [73] Time-Archival Camera Virtualization for Sports and Visual Performances\n- **🧑‍🔬 作者**：Yunxiao Zhang, William Stone, Suryansh Kumar\n- **🏫 单位**：Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2602.15181.md)] [[arXiv:2602.15181](https://arxiv.org/abs/2602.15181)] [[Code](https://github.com/JackZhang-SH/Time-Archival-Camera-Virtualization-for-Sports-and-Visual-Performance)]\n- **📝 说明**:\n\n#### [74] Wrivinder: Towards Spatial Intelligence for Geo-locating Ground Images onto Satellite Imagery\n- **🧑‍🔬 作者**：Chandrakanth Gudavalli, Tajuddin Manhar Mohammed, Abhay Yadav, Ananth Vishnu Bhaskar, Hardik Prajapati, Cheng Peng, Rama Chellappa, Shivkumar Chandrasekaran, B. S. Manjunath\n- **🏫 单位**：Mayachitra, Inc. 
⟐ Johns Hopkins University\n- **🔗 链接**：[[中英摘要](./abs/2602.14929.md)] [[arXiv:2602.14929](https://arxiv.org/abs/2602.14929)] [Code]\n- **📝 说明**:\n\n#### [75] Learnable Multi-level Discrete Wavelet Transforms for 3D Gaussian Splatting Frequency Modulation\n- **🧑‍🔬 作者**：Hung Nguyen, An Le, Truong Nguyen\n- **🏫 单位**：UC San Diego\n- **🔗 链接**：[[中英摘要](./abs/2602.14199.md)] [[arXiv:2602.14199](https://arxiv.org/abs/2602.14199)] [Code]\n- **📝 说明**:\n\n#### [76] Gaussian Sequences with Multi-Scale Dynamics for 4D Reconstruction from Monocular Casual Videos\n- **🧑‍🔬 作者**：Can Li, Jie Gu, Jingmin Chen, Fangzhou Qiu, Lei Sun\n- **🏫 单位**：Nankai University ⟐ Rightly Robotics\n- **🔗 链接**：[[中英摘要](./abs/2602.13806.md)] [[arXiv:2602.13806](https://arxiv.org/abs/2602.13806)] [Code]\n- **📝 说明**:\n\n#### [77] Joint Orientation and Weight Optimization for Robust Watertight Surface Reconstruction via Dirichlet-Regularized Winding Fields\n- **🧑‍🔬 作者**：Jiaze Li, Daisheng Jin, Fei Hou, Junhui Hou, Zheng Liu, Shiqing Xin, Wenping Wang, Ying He\n- **🏫 单位**：Nanyang Technological University ⟐ Institute of Software, Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ City University of Hong Kong ⟐ China University of Geosciences (Wuhan) ⟐ Shandong University ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2602.13801.md)] [[arXiv:2602.13801](https://arxiv.org/abs/2602.13801)] [Code]\n- **📝 说明**:\n\n#### [78] Nighttime Autonomous Driving Scene Reconstruction with Physically-Based Gaussian Splatting\n- **🧑‍🔬 作者**：Tae-Kyeong Kim, Xingxin Chen, Guile Wu, Chengjie Huang, Dongfeng Bai, Bingbing Liu\n- **🏫 单位**：Huawei Noah’s Ark Lab ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2602.13549.md)] [[arXiv:2602.13549](https://arxiv.org/abs/2602.13549)] [Code]\n- **📝 说明**:\n\n#### [79] FlowHOI: Flow-based Semantics-Grounded Generation of Hand-Object Interactions for Dexterous Robot Manipulation\n- **🧑‍🔬 作者**：Huajian Zeng, Lingyun Chen, Jiaqi Yang, Yuantai Zhang, Fan Shi, Peidong Liu, Xingxing Zuo\n- **🏫 单位**：Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) ⟐ Technical University of Munich ⟐ National University of Singapore ⟐ Westlake University\n- **🔗 链接**：[[中英摘要](./abs/2602.13444.md)] [[arXiv:2602.13444](https://arxiv.org/abs/2602.13444)] [[Code](https://github.com/huajian-zeng/flowhoi)]\n- **📝 说明**:\n\n#### [80] GSM-GS: Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction\n- **🧑‍🔬 作者**：Xiao Ren, Yu Liu, Ning An, Jian Cheng, Xin Qiao, He Kong\n- **🏫 单位**：Southern University of Science and Technology ⟐ Research Institute of Mine Artificial Intelligence ⟐ China Coal Research Institute ⟐ State Key Laboratory of Intelligent Coal Mining and Strata Control ⟐ Xi’an Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2602.12796.md)] [[arXiv:2602.12796](https://arxiv.org/abs/2602.12796)] [[Code](https://github.com/AISLAB-sustech/GSM-GS)]\n- **📝 说明**:\n\n#### [81] LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning\n- **🧑‍🔬 作者**：Junwoon Lee, Yulun Tian\n- **🏫 单位**：University of Michigan\n- **🔗 链接**：[[中英摘要](./abs/2602.12314.md)] [[arXiv:2602.12314](https://arxiv.org/abs/2602.12314)] [Code]\n- **📝 说明**:\n\n#### [82] 3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Wancai Zheng, Hao Chen, Xianlong Lu, Linlin Ou, Xinyi Yu\n- **🏫 单位**：Zhejiang University of Technology ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2602.12159.md)] 
[[arXiv:2602.12159](https://arxiv.org/abs/2602.12159)] [Code]\n- **📝 说明**:\n\n#### [83] OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars\n- **🧑‍🔬 作者**：Zehao Xia, Yiqun Wang, Zhengda Lu, Kai Liu, Jun Xiao, Peter Wonka\n- **🏫 单位**：Chongqing University ⟐ University of Chinese Academy of Sciences ⟐ KAUST\n- **🔗 链接**：[[中英摘要](./abs/2602.11693.md)] [[arXiv:2602.11693](https://arxiv.org/abs/2602.11693)] [Code]\n- **📝 说明**:\n\n#### [84] GR-Diffusion: 3D Gaussian Representation Meets Diffusion in Whole-Body PET Reconstruction\n- **🧑‍🔬 作者**：Mengxiao Geng, Zijie Chen, Ran Hong, Bingxuan Li, Qiegen Liu\n- **🏫 单位**：Nanchang University ⟐ Institute of Artificial Intelligence, Hefei Comprehensive National Science Center\n- **🔗 链接**：[[中英摘要](./abs/2602.11653.md)] [[arXiv:2602.11653](https://arxiv.org/abs/2602.11653)] [Code]\n- **📝 说明**:\n\n#### [85] Variation-aware Flexible 3D Gaussian Editing\n- **🧑‍🔬 作者**：Hao Qin, Yukai Sun, Meng Wang, Ming Kong, Mengxu Lu, Qiang Zhu\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2602.11638.md)] [[arXiv:2602.11638](https://arxiv.org/abs/2602.11638)] [Code]\n- **📝 说明**:\n\n#### [86] ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles\n- **🧑‍🔬 作者**：Seungyeon Yoo, Youngseok Jang, Dabin Kim, Youngsoo Han, Seungwoo Jung, H. Jin Kim\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2602.11575.md)] [[arXiv:2602.11575](https://arxiv.org/abs/2602.11575)] [Code]\n- **📝 说明**:\n\n#### [87] ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zehua Ma, Hanhui Li, Zhenyu Xie, Xiaonan Luo, Michael Kampffmeyer, Feng Gao, Xiaodan Liang\n- **🏫 单位**：Shenzhen Campus of Sun Yat-sen University ⟐ Peking University ⟐ Mohamed bin Zayed University of Artificial Intelligence ⟐ Guilin University of Electronic Technology ⟐ UiT The Arctic University of Norway\n- **🔗 链接**：[[中英摘要](./abs/2602.10278.md)] [[arXiv:2602.10278](https://arxiv.org/abs/2602.10278)] [Code]\n- **📝 说明**:\n\n#### [88] XSPLAIN: XAI-enabling Splat-based Prototype Learning for Attribute-aware INterpretability\n- **🧑‍🔬 作者**：Dominik Galus, Julia Farganus, Tymoteusz Zapala, Mikołaj Czachorowski, Piotr Borycki, Przemysław Spurek, Piotr Syga\n- **🏫 单位**：Wrocław University of Science and Technology ⟐ Jagiellonian University ⟐ IDEAS Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2602.10239.md)] [[arXiv:2602.10239](https://arxiv.org/abs/2602.10239)] [[Code](https://github.com/Solvro/ml-splat-xai)]\n- **📝 说明**:\n\n#### [89] ArtisanGS: Interactive Tools for Gaussian Splat Selection with AI and Human in the Loop\n- **🧑‍🔬 作者**：Clement Fuji Tsang, Anita Hu, Or Perel, Carsten Kolve, Maria Shugrina\n- **🏫 单位**：NVIDIA ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2602.10173.md)] [[arXiv:2602.10173](https://arxiv.org/abs/2602.10173)] [Code]\n- **📝 说明**:\n\n#### [90] CompSplat: Compression-aware 3D Gaussian Splatting for Real-world Video\n- **🧑‍🔬 作者**：Hojun Song, Heejung Choi, Aro Kim, Chae-yeong Song, Gahyeon Kim, Soo Ye Kim, Jaehyup Lee, Sang-hyo Park\n- **🏫 单位**：Kyungpook National University ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](./abs/2602.09816.md)] [[arXiv:2602.09816](https://arxiv.org/abs/2602.09816)] [Code]\n- **📝 说明**:\n\n#### [91] Toward Fine-Grained Facial Control in 3D Talking Head Generation\n- **🧑‍🔬 作者**：Shaoyang Xie, Xiaofeng Cong, Baosheng Yu, Zhipeng Gui, Jie Gui, Yuan Yan Tang, James Tin-Yau Kwok\n- **🏫 单位**：Southeast University ⟐ Nanyang Technological University ⟐ Wuhan University 
⟐ Purple Mountain Laboratories ⟐ University of Macau ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2602.09736.md)] [[arXiv:2602.09736](https://arxiv.org/abs/2602.09736)] [Code]\n- **📝 说明**:\n\n#### [92] Grow with the Flow: 4D Reconstruction of Growing Plants with Gaussian Flow Fields\n- **🧑‍🔬 作者**：Weihan Luo, Lily Goli, Sherwin Bahmani, Felix Taubner, Andrea Tagliasacchi, David B. Lindell\n- **🏫 单位**：University of Toronto ⟐ Vector Institute ⟐ Simon Fraser University\n- **🔗 链接**：[[中英摘要](./abs/2602.08958.md)] [[arXiv:2602.08958](https://arxiv.org/abs/2602.08958)] [[Code](https://github.com/weihan1/growflow)]\n- **📝 说明**:\n\n#### [93] Analysis of Converged 3D Gaussian Splatting Solutions: Density Effects and Prediction Limit\n- **🧑‍🔬 作者**：Zhendong Wang, Cihan Ruan, Jingchuan Xiao, Chuqing Shi, Wei Jiang, Wei Wang, Wenjie Liu, Nam Ling\n- **🏫 单位**：Santa Clara University ⟐ Mary Immaculate College ⟐ University of California, San Diego ⟐ Futurewei Technologies Inc. ⟐ East China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2602.08909.md)] [[arXiv:2602.08909](https://arxiv.org/abs/2602.08909)] [Code]\n- **📝 说明**:\n\n#### [94] FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction\n- **🧑‍🔬 作者**：Guan Yuan Tan, Ngoc Tuan Vu, Arghya Pal, Sailaja Rajanala, Raphael Phan C. -W., Mettu Srinivas, Chee-Ming Ting\n- **🏫 单位**：Monash University ⟐ National Institute of Technology Warangal\n- **🔗 链接**：[[中英摘要](./abs/2602.08558.md)] [[arXiv:2602.08558](https://arxiv.org/abs/2602.08558)] [Code]\n- **📝 说明**:\n\n#### [95] Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video\n- **🧑‍🔬 作者**：Zihui Gao, Ke Liu, Donny Y. Chen, Duochao Shi, Guosheng Lin, Hao Chen, Chunhua Shen\n- **🏫 单位**：Zhejiang University ⟐ Nanyang Technological University ⟐ Independent Researcher ⟐ Zhejiang University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2602.07891.md)] [[arXiv:2602.07891](https://arxiv.org/abs/2602.07891)] [Code]\n- **📝 说明**:\n\n#### [96] DynFOA: Generating First-Order Ambisonics with Conditional Diffusion for Dynamic and Acoustically Complex 360-Degree Videos\n- **🧑‍🔬 作者**：Ziyu Luo, Lin Chen, Qiang Qu, Xiaoming Chen, Yiran Shen\n- **🏫 单位**：Beijing Technology and Business University ⟐ The University of Sydney ⟐ Shandong University\n- **🔗 链接**：[[中英摘要](./abs/2602.06846.md)] [[arXiv:2602.06846](https://arxiv.org/abs/2602.06846)] [Code]\n- **📝 说明**:\n\n#### [97] GaussianPOP: Principled Simplification Framework for Compact 3D Gaussian Splatting via Error Quantification\n- **🧑‍🔬 作者**：Soonbin Lee, Yeong-Gyu Kim, Simon Sasse, Tomas M. 
Borges, Yago Sanchez, Eun-Seok Ryu, Thomas Schierl, Cornelius Hellge\n- **🏫 单位**：Fraunhofer Heinrich-Hertz-Institute (HHI) ⟐ Sungkyunkwan University (SKKU)\n- **🔗 链接**：[[中英摘要](./abs/2602.06830.md)] [[arXiv:2602.06830](https://arxiv.org/abs/2602.06830)] [Code]\n- **📝 说明**:\n\n#### [98] Zero-Shot UAV Navigation in Forests via Relightable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zinan Lv, Yeqian Qian, Chen Sang, Hao Liu, Danping Zou, Ming Yang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2602.07101.md)] [[arXiv:2602.07101](https://arxiv.org/abs/2602.07101)] [Code]\n- **📝 说明**:\n\n#### [99] LangGS-SLAM: Real-Time Language-Feature Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Seongbo Ha, Sibaek Lee, Kyungsu Kang, Joonyeol Choi, Seungjun Tak, Hyeonwoo Yu\n- **🏫 单位**：Sungkyunkwan University\n- **🔗 链接**：[[中英摘要](./abs/2602.06991.md)] [[arXiv:2602.06991](https://arxiv.org/abs/2602.06991)] [Code]\n- **📝 说明**:\n\n#### [100] TFusionOcc: Student's t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction\n- **🧑‍🔬 作者**：Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall\n- **🏫 单位**：The University of Sydney\n- **🔗 链接**：[[中英摘要](./abs/2602.06400.md)] [[arXiv:2602.06400](https://arxiv.org/abs/2602.06400)] [Code]\n- **📝 说明**:\n\n#### [101] Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering\n- **🧑‍🔬 作者**：Weiquan Wang, Feifei Shao, Lin Li, Zhen Wang, Jun Xiao, Long Chen\n- **🏫 单位**：Zhejiang University ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2602.06343.md)] [[arXiv:2602.06343](https://arxiv.org/abs/2602.06343)] [Code]\n- **📝 说明**:\n\n#### [102] Pseudo-View Enhancement via Confidence Fusion for Unposed Sparse-View Reconstruction\n- **🧑‍🔬 作者**：Beizhen Zhao, Sicheng Yu, Guanzhi Ding, Yu Hu, Hao Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2602.21535.md)] [[arXiv:2602.21535](https://arxiv.org/abs/2602.21535)] [Code]\n- **📝 说明**:\n\n#### [103] Three-dimensional Damage Visualization of Civil Structures via Gaussian Splatting-enabled Digital Twins\n- **🧑‍🔬 作者**：Shuo Wang, Shuo Wang, Xin Nie, Yasutaka Narazaki, Thomas Matiki, Billie F. 
Spencer Jr.\n- **🏫 单位**：Tsinghua University ⟐ Zhejiang University-University of Illinois Urbana-Champaign Institute ⟐ Zhejiang University ⟐ University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2602.16713.md)] [[arXiv:2602.16713](https://arxiv.org/abs/2602.16713)] [Code]\n- **📝 说明**:\n\n#### [104] Unified Sensor Simulation for Autonomous Driving\n- **🧑‍🔬 作者**：Nikolay Patakin, Arsenii Shirokov, Anton Konushin, Dmitry Senushkin\n- **🏫 单位**：Lomonosov Moscow State University\n- **🔗 链接**：[[中英摘要](./abs/2602.05617.md)] [[arXiv:2602.05617](https://arxiv.org/abs/2602.05617)] [Code]\n- **📝 说明**:\n\n#### [105] QuantumGS: Quantum Encoding Framework for Gaussian Splatting\n- **🧑‍🔬 作者**：Grzegorz Wilczyński, Rafał Tobiasz, Paweł Gora, Marcin Mazur, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ IDEAS Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2602.05047.md)] [[arXiv:2602.05047](https://arxiv.org/abs/2602.05047)] [[Code](https://github.com/gwilczynski95/QuantumGS)]\n- **📝 说明**:\n\n#### [106] Nix and Fix: Targeting 1000x Compression of 3D Gaussian Splatting with Diffusion Models\n- **🧑‍🔬 作者**：Cem Eteke, Enzo Tartaglione\n- **🏫 单位**：Munich Institute of Robotics and Machine Intelligence ⟐ Technical University of Munich ⟐ LTCI ⟐ Télécom Paris ⟐ Institut Polytechnique de Paris\n- **🔗 链接**：[[中英摘要](./abs/2602.04549.md)] [[arXiv:2602.04549](https://arxiv.org/abs/2602.04549)] [Code]\n- **📝 说明**:\n\n#### [107] VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image\n- **🧑‍🔬 作者**：Teng-Fang Hsiao, Bo-Kai Ruan, Yu-Lun Liu, Hong-Han Shuai\n- **🏫 单位**：National Yang Ming Chiao Tung University\n- **🔗 链接**：[[中英摘要](./abs/2602.04349.md)] [[arXiv:2602.04349](https://arxiv.org/abs/2602.04349)] [[Code](https://github.com/BlueDyee/VecSet-Edit)]\n- **📝 说明**:\n\n#### [108] JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction\n- **🧑‍🔬 作者**：Zihan Lou, Jinlong Fan, Sihan Ma, Yuxiang Yang, Jing Zhang\n- **🏫 单位**：Wuhan University ⟐ Hangzhou Dianzi University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2602.04317.md)] [[arXiv:2602.04317](https://arxiv.org/abs/2602.04317)] [[Code](https://github.com/MiliLab/JOintGS)]\n- **📝 说明**:\n\n#### [109] Towards Next-Generation SLAM: A Survey on 3DGS-SLAM Focusing on Performance, Robustness, and Future Directions\n- **🧑‍🔬 作者**：Li Wang, Ruixuan Gong, Yumo Han, Lei Yang, Lu Yang, Ying Li, Bin Xu, Huaping Liu, Rong Fu\n- **🏫 单位**：Beijing Institute of Technology ⟐ Chongqing Innovation Center, Beijing Institute of Technology ⟐ University of Science and Technology Beijing ⟐ Nanyang Technological University ⟐ Tsinghua University ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2602.04251.md)] [[arXiv:2602.04251](https://arxiv.org/abs/2602.04251)] [Code]\n- **📝 说明**:\n\n#### [110] AnyStyle: Single-Pass Multimodal Stylization for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Joanna Kaleta, Bartosz Świrta, Kacper Kania, Przemysław Spurek, Marek Kowalski\n- **🏫 单位**：Warsaw University of Technology ⟐ Sano Centre for Computational Medicine ⟐ Jagiellonian University ⟐ IDEAS NCBR ⟐ Microsoft\n- **🔗 链接**：[[中英摘要](./abs/2602.04043.md)] [[arXiv:2602.04043](https://arxiv.org/abs/2602.04043)] [[Code](https://github.com/joaxkal/AnyStyle)]\n- **📝 说明**:\n\n#### [111] Constrained Dynamic Gaussian Splatting\n- **🧑‍🔬 作者**：Zihan Zheng, Zhenglong Wu, Xuanxuan Wang, Houqiang Zhong, Xiaoyun Zhang, Qiang Hu, Guangtao Zhai, Wenjun Zhang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2602.03538.md)] 
[[arXiv:2602.03538](https://arxiv.org/abs/2602.03538)] [Code]\n- **📝 说明**:\n\n#### [112] Pi-GS: Sparse-View Gaussian Splatting with Dense π^3 Initialization\n- **🧑‍🔬 作者**：Manuel Hofer, Markus Steinberger, Thomas Köhler\n- **🏫 单位**：Graz University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2602.03327.md)] [[arXiv:2602.03327](https://arxiv.org/abs/2602.03327)] [Code]\n- **📝 说明**:\n\n#### [113] SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan Modulation\n- **🧑‍🔬 作者**：Zhanfeng Liao, Jiajun Zhang, Hanzhang Tu, Zhixi Wang, Yunqi Gao, Hongwen Zhang, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ Beijing University of Posts and Telecommunications ⟐ Beijing Normal University ⟐ Central China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2602.02989.md)] [[arXiv:2602.02989](https://arxiv.org/abs/2602.02989)] [Code]\n- **📝 说明**:\n\n#### [114] SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation\n- **🧑‍🔬 作者**：Mu Huang, Hui Wang, Kerui Ren, Linning Xu, Yunsong Zhou, Mulin Yu, Bo Dai, Jiangmiao Pang\n- **🏫 单位**：Fudan University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Shanghai Jiao Tong University ⟐ The Chinese University of Hong Kong ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2602.02402.md)] [[arXiv:2602.02402](https://arxiv.org/abs/2602.02402)] [[Code](https://github.com/Wrioste/SoMA)]\n- **📝 说明**:\n\n#### [115] Intellectual Property Protection for 3D Gaussian Splatting Assets: A Survey\n- **🧑‍🔬 作者**：Longjie Zhao, Ziming Hong, Jiaxin Huang, Runnan Chen, Mingming Gong, Tongliang Liu\n- **🏫 单位**：The University of Sydney ⟐ The University of Melbourne ⟐ Mohamed bin Zayed University of Artificial Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2602.03878.md)] [[arXiv:2602.03878](https://arxiv.org/abs/2602.03878)] [[Code](https://github.com/tmllab/Awesome-3DGS-IP-Protection)]\n- **📝 说明**:\n\n#### [116] FastPhysGS: Accelerating Physics-based Dynamic 3DGS Simulation via Interior Completion and Adaptive Optimization\n- **🧑‍🔬 作者**：Yikun Ma, Yiqing Li, Jingwen Ye, Zhongkai Wu, Weidong Zhang, Lin Gao, Zhi Jin\n- **🏫 单位**：Sun Yat-sen University ⟐ Anonymous Institution ⟐ Institute of Computing Technology, Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Guangdong Provincial Key Laboratory of Fire Science and Intelligent Emergency Technology ⟐ Guangdong Provincial Key Laboratory of Robotics and Digital Intelligent Manufacturing Technology\n- **🔗 链接**：[[中英摘要](./abs/2602.01723.md)] [[arXiv:2602.01723](https://arxiv.org/abs/2602.01723)] [Code]\n- **📝 说明**:\n\n#### [117] Position: 3D Gaussian Splatting Watermarking Should Be Scenario-Driven and Threat-Model Explicit\n- **🧑‍🔬 作者**：Yangfan Deng, Anirudh Nakra, Min Wu\n- **🏫 单位**：University of Maryland, College Park\n- **🔗 链接**：[[中英摘要](./abs/2602.02602.md)] [[arXiv:2602.02602](https://arxiv.org/abs/2602.02602)] [Code]\n- **📝 说明**:\n\n#### [118] Split&Splat: Zero-Shot Panoptic Segmentation via Explicit Instance Modeling and 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Leonardo Monchieri, Elena Camuffo, Francesco Barbato, Pietro Zanuttigh, Simone Milani\n- **🏫 单位**：University of Padova\n- **🔗 链接**：[[中英摘要](./abs/2602.03809.md)] [[arXiv:2602.03809](https://arxiv.org/abs/2602.03809)] [Code]\n- **📝 说明**:\n\n#### [119] Radioactive 3D Gaussian Ray Tracing for Tomographic Reconstruction\n- **🧑‍🔬 作者**：Ling Chen, Bao Yang\n- **🏫 单位**：Independent Researcher ⟐ Southern Medical University\n- **🔗 链接**：[[中英摘要](./abs/2602.01057.md)] [[arXiv:2602.01057](https://arxiv.org/abs/2602.01057)] [Code]\n- **📝 说明**:\n\n#### [120] HPC: 
Hierarchical Point-based Latent Representation for Streaming Dynamic Gaussian Splatting Compression\n- **🧑‍🔬 作者**：Yangzhi Ma, Bojun Liu, Wenting Liao, Dong Liu, Zhu Li, Li Li\n- **🏫 单位**：University of Science and Technology of China ⟐ University of Missouri–Kansas City\n- **🔗 链接**：[[中英摘要](./abs/2602.00671.md)] [[arXiv:2602.00671](https://arxiv.org/abs/2602.00671)] [Code]\n- **📝 说明**:\n\n#### [121] PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting\n- **🧑‍🔬 作者**：Xin Zhang, Shen Chen, Jiale Zhou, Lei Li\n- **🏫 单位**：East China University of Science and Technology ⟐ Zhejiang University ⟐ Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2602.00463.md)] [[arXiv:2602.00463](https://arxiv.org/abs/2602.00463)] [Code]\n- **📝 说明**:\n\n#### [122] 3DGS$^2$-TR: Scalable Second-Order Trust-Region Method for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Roger Hsiao, Yuchen Fang, Xiangru Huang, Ruilong Li, Hesam Rabeti, Zan Gojcic, Javad Lavaei, James Demmel, Sophia Shao\n- **🏫 单位**：University of California, Berkeley ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2602.00395.md)] [[arXiv:2602.00395](https://arxiv.org/abs/2602.00395)] [Code]\n- **📝 说明**:\n\n#### [123] EAG-PT: Emission-Aware Gaussians and Path Tracing for Indoor Scene Reconstruction and Editing\n- **🧑‍🔬 作者**：Xijie Yang, Mulin Yu, Changjian Jiang, Kerui Ren, Tao Lu, Jiangmiao Pang, Dahua Lin, Bo Dai, Linning Xu\n- **🏫 单位**：Zhejiang University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Shanghai Jiao Tong University ⟐ The Chinese University of Hong Kong ⟐ The University of Hong Kong ⟐ Feeling AI\n- **🔗 链接**：[[中英摘要](./abs/2601.23065.md)] [[arXiv:2601.23065](https://arxiv.org/abs/2601.23065)] [Code]\n- **📝 说明**:\n\n#### [124] Self-Supervised Slice-to-Volume Reconstruction with Gaussian Representations for Fetal MRI\n- **🧑‍🔬 作者**：Yinsong Wang, Thomas Fletcher, Xinzhe Luo, Aine Travers Dineen, Rhodri Cusack, Chen Qin\n- **🏫 单位**：Imperial College London ⟐ Trinity College Institute of Neuroscience, Trinity College Dublin\n- **🔗 链接**：[[中英摘要](./abs/2601.22990.md)] [[arXiv:2601.22990](https://arxiv.org/abs/2601.22990)] [Code]\n- **📝 说明**:\n\n#### [125] PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction\n- **🧑‍🔬 作者**：Changjian Jiang, Kerui Ren, Xudong Li, Kaiwen Song, Guanghao Li, Linning Xu, Tao Lu, Junting Dong, Yu Zhang, Bo Dai, Mulin Yu\n- **🏫 单位**：Zhejiang University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Shanghai Jiao Tong University ⟐ Northwestern Polytechnical University ⟐ University of Science and Technology of China ⟐ Fudan University ⟐ The Chinese University of Hong Kong ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2601.22046.md)] [[arXiv:2601.22046](https://arxiv.org/abs/2601.22046)] [[Code](https://github.com/InternRobotics/PLANING)]\n- **📝 说明**:\n\n#### [126] FreeFix: Boosting 3D Gaussian Splatting via Fine-Tuning-Free Diffusion Models\n- **🧑‍🔬 作者**：Hongyu Zhou, Zisen Shao, Sheng Miao, Pan Wang, Dongfeng Bai, Bingbing Liu, Yiyi Liao\n- **🏫 单位**：Zhejiang University ⟐ University of Maryland, College Park ⟐ Huawei\n- **🔗 链接**：[[中英摘要](./abs/2601.20857.md)] [[arXiv:2601.20857](https://arxiv.org/abs/2601.20857)] [[Code](https://github.com/hyzhou404/FreeFix)]\n- **📝 说明**:\n\n#### [127] GVGS: Gaussian Visibility-Aware Multi-View Geometry for Accurate Surface Reconstruction\n- **🧑‍🔬 作者**：Mai Su, Qihan Yu, Zhongtao Wang, Yilong Li, Chengwei Pan, Yisong Chen, Guoping Wang\n- **🏫 单位**：Peking University ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2601.20331.md)] 
[[arXiv:2601.20331](https://arxiv.org/abs/2601.20331)] [[Code](https://github.com/GVGScode/GVGS)]\n- **📝 说明**:\n\n#### [128] Graphical X Splatting (GraphiXS): A Graphical Model for 4D Gaussian Splatting under Uncertainty\n- **🧑‍🔬 作者**：Doğa Yılmaz, Jialin Zhu, Deshan Gong, He Wang\n- **🏫 单位**：University College London ⟐ Baidu Research ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2601.19843.md)] [[arXiv:2601.19843](https://arxiv.org/abs/2601.19843)] [Code]\n- **📝 说明**:\n\n#### [129] WaterClear-GS: Optical-Aware Gaussian Splatting for Underwater Reconstruction and Restoration\n- **🧑‍🔬 作者**：Xinrui Zhang, Yufeng Wang, Shuangkang Fang, Zesheng Wang, Dacheng Qi, Wenrui Ding\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2601.19753.md)] [[arXiv:2601.19753](https://arxiv.org/abs/2601.19753)] [Code]\n- **📝 说明**:\n\n#### [130] DiffStyle3D: Consistent 3D Gaussian Stylization via Attention Optimization\n- **🧑‍🔬 作者**：Yitong Yang, Xuexin Liu, Yinglin Wang, Jing Wang, Hao Dou, Changshuo Wang, Shuting He\n- **🏫 单位**：Shanghai University of Finance and Economics ⟐ University College London\n- **🔗 链接**：[[中英摘要](./abs/2601.19717.md)] [[arXiv:2601.19717](https://arxiv.org/abs/2601.19717)] [Code]\n- **📝 说明**:\n\n#### [131] Fast Converging 3D Gaussian Splatting for 1-Minute Reconstruction\n- **🧑‍🔬 作者**：Ziyu Zhang, Tianle Liu, Diantao Tu, Shuhan Shen\n- **🏫 单位**：University of Chinese Academy of Sciences ⟐ Institute of Automation, Chinese Academy of Sciences ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2601.19489.md)] [[arXiv:2601.19489](https://arxiv.org/abs/2601.19489)] [[Code](https://github.com/will-zzy/siggraph_asia)]\n- **📝 说明**：🏆 First place in the SIGGRAPH Asia 2025 3DGS Challenge\n\n#### [132] UniMGS: Unifying Mesh and 3D Gaussian Splatting with Single-Pass Rasterization and Proxy-Based Deformation\n- **🧑‍🔬 作者**：Zeyu Xiao, Mingyang Sun, Yimin Cong, Lintao Wang, Dongliang Kou, Zhenyi Wu, Dingkang Yang, Peng Zhai, Zeyu Wang, Lihua Zhang\n- **🏫 单位**：Fudan University ⟐ Fysics Intelligence Technologies Co., Ltd. 
⟐ The Hong Kong University of Science and Technology (Guangzhou) ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2601.19233.md)] [[arXiv:2601.19233](https://arxiv.org/abs/2601.19233)] [Code]\n- **📝 说明**:\n\n#### [133] Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting\n- **🧑‍🔬 作者**：Tong Shi, Melonie de Almeida, Daniela Ivanova, Nicolas Pugeault, Paul Henderson\n- **🏫 单位**：University of Glasgow\n- **🔗 链接**：[[中英摘要](./abs/2601.18633.md)] [[arXiv:2601.18633](https://arxiv.org/abs/2601.18633)] [[Code](https://github.com/stonewalking/Splat-portrait)]\n- **📝 说明**:\n\n#### [134] ExoGS: A 4D Real-to-Sim-to-Real Framework for Scalable Manipulation Data Collection\n- **🧑‍🔬 作者**：Yiming Wang, Ruogu Zhang, Minyang Li, Hao Shi, Junbo Wang, Deyi Li, Jieji Ren, Wenhai Liu, Weiming Wang, Hao-Shu Fang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2601.18629.md)] [[arXiv:2601.18629](https://arxiv.org/abs/2601.18629)] [[Code](https://github.com/zaixiabalala/ExoGS)]\n- **📝 说明**:\n\n#### [135] LoD-Structured 3D Gaussian Splatting for Streaming Video Reconstruction\n- **🧑‍🔬 作者**：Xinhui Liu, Can Wang, Lei Liu, Zhenghao Chen, Wei Jiang, Wei Wang, Dong Xu\n- **🏫 单位**：The University of Hong Kong ⟐ Futurewei Technologies Inc ⟐ The University of Newcastle\n- **🔗 链接**：[[中英摘要](./abs/2601.18475.md)] [[arXiv:2601.18475](https://arxiv.org/abs/2601.18475)] [Code]\n- **📝 说明**:\n\n#### [136] Geometry-Grounded Gaussian Splatting\n- **🧑‍🔬 作者**：Baowen Zhang, Chenxing Jiang, Heng Li, Shaojie Shen, Ping Tan\n- **🏫 单位**：The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2601.17835.md)] [[arXiv:2601.17835](https://arxiv.org/abs/2601.17835)] [Code]\n- **📝 说明**:\n\n#### [137] PocketGS: On-Device Training of 3D Gaussian Splatting for High Perceptual Modeling\n- **🧑‍🔬 作者**：Wenzhi Guo, Guangchi Fang, Shu Yang, Bing Wang\n- **🏫 单位**：The Hong Kong Polytechnic University ⟐ Nanjing University\n- **🔗 链接**：[[中英摘要](./abs/2601.17354.md)] [[arXiv:2601.17354](https://arxiv.org/abs/2601.17354)] [Code]\n- **📝 说明**:\n\n#### [138] LGDWT-GS: Local and Global Discrete Wavelet-Regularized 3D Gaussian Splatting for Sparse-View Scene Reconstruction\n- **🧑‍🔬 作者**：Shima Salehi, Atharva Agashe, Andrew J. 
McFarland, Joshua Peeples\n- **🏫 单位**：Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2601.17185.md)] [[arXiv:2601.17185](https://arxiv.org/abs/2601.17185)] [Code]\n- **📝 说明**:\n\n#### [139] EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis\n- **🧑‍🔬 作者**：Sheng Miao, Sijin Li, Pan Wang, Dongfeng Bai, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao\n- **🏫 单位**：Zhejiang University ⟐ Huawei ⟐ University of Tübingen\n- **🔗 链接**：[[中英摘要](./abs/2601.15951.md)] [[arXiv:2601.15951](https://arxiv.org/abs/2601.15951)] [Code]\n- **📝 说明**:\n\n#### [140] ThermoSplat: Cross-Modal 3D Gaussian Splatting with Feature Modulation and Geometry Decoupling\n- **🧑‍🔬 作者**：Zhaoqi Su, Shihai Chen, Xinyan Lin, Liqin Huang, Zhipeng Su, Xiaoqiang Lu\n- **🏫 单位**：Fuzhou University\n- **🔗 链接**：[[中英摘要](./abs/2601.15897.md)] [[arXiv:2601.15897](https://arxiv.org/abs/2601.15897)] [Code]\n- **📝 说明**:\n\n#### [141] LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Yuhan Chen, Wenxuan Yu, Guofa Li, Yijun Xu, Ying Fang, Yicui Shi, Long Cao, Wenbo Chu, Keqiang Li\n- **🏫 单位**：College of Mechanical and Vehicle Engineering, Chongqing University ⟐ School of Electronic Information, Wuhan University ⟐ National Innovation Center of Intelligent and Connected Vehicles ⟐ School of Vehicle and Mobility, Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2601.15772.md)] [[arXiv:2601.15772](https://arxiv.org/abs/2601.15772)] [Code]\n- **📝 说明**:\n\n#### [142] LL-GaussianMap: Zero-shot Low-Light Image Enhancement via 2D Gaussian Splatting Guided Gain Maps\n- **🧑‍🔬 作者**：Yuhan Chen, Ying Fang, Guofa Li, Wenxuan Yu, Yicui Shi, Jingrui Zhang, Kefei Qian, Wenbo Chu, Keqiang Li\n- **🏫 单位**：College of Mechanical and Vehicle Engineering, Chongqing University ⟐ National Innovation Center of Intelligent and Connected Vehicles ⟐ School of Computer Science, Wuhan University ⟐ School of Vehicle and Mobility, Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2601.15766.md)] [[arXiv:2601.15766](https://arxiv.org/abs/2601.15766)] [Code]\n- **📝 说明**:\n\n#### [143] SplatBus: A Gaussian Splatting Viewer Framework via GPU Interprocess Communication\n- **🧑‍🔬 作者**：Yinghan Xu, Théo Morales, John Dingliana\n- **🏫 单位**：Trinity College Dublin\n- **🔗 链接**：[[中英摘要](./abs/2601.15431.md)] [[arXiv:2601.15431](https://arxiv.org/abs/2601.15431)] [[Code](https://github.com/RockyXu66/splatbus)]\n- **📝 说明**:\n\n#### [144] POTR: Post-Training 3DGS Compression\n- **🧑‍🔬 作者**：Bert Ramlot, Martijn Courteaux, Peter Lambert, Glenn Van Wallendael\n- **🏫 单位**：Ghent University ⟐ imec\n- **🔗 链接**：[[中英摘要](./abs/2601.14821.md)] [[arXiv:2601.14821](https://arxiv.org/abs/2601.14821)] [Code]\n- **📝 说明**:\n\n#### [145] Structured Image-based Coding for Efficient Gaussian Splatting Compression\n- **🧑‍🔬 作者**：Pedro Martin, Antonio Rodrigues, Joao Ascenso, Maria Paula Queluz\n- **🏫 单位**：Instituto de Telecomunicacoes ⟐ Instituto Superior Tecnico ⟐ University of Lisbon\n- **🔗 链接**：[[中英摘要](./abs/2601.14510.md)] [[arXiv:2601.14510](https://arxiv.org/abs/2601.14510)] [Code]\n- **📝 说明**:\n\n#### [146] Rig-Aware 3D Reconstruction of Vehicle Undercarriages using Gaussian Splatting\n- **🧑‍🔬 作者**：Nitin Kulkarni, Akhil Devarashetti, Charlie Cluss, Livio Forte, Dan Buckmaster, Philip Schneider, Chunming Qiao, Alina Vereshchaka\n- **🏫 单位**：University at Buffalo ⟐ ACV Auctions\n- **🔗 链接**：[[中英摘要](./abs/2601.14208.md)] [[arXiv:2601.14208](https://arxiv.org/abs/2601.14208)] [Code]\n- **📝 说明**:\n\n#### [147] 
GaussExplorer: 3D Gaussian Splatting for Embodied Exploration and Reasoning\n- **🧑‍🔬 作者**：Kim Yu-Ji, Dahye Lee, Kim Jun-Seong, GeonU Kim, Nam Hyeon-Woo, Yongjin Kwon, Yu-Chiang Frank Wang, Jaesung Choe, Tae-Hyun Oh\n- **🏫 单位**：POSTECH ⟐ KAIST ⟐ ETRI ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2601.13132.md)] [[arXiv:2601.13132](https://arxiv.org/abs/2601.13132)] [Code]\n- **📝 说明**:\n\n#### [148] TreeDGS: Aerial Gaussian Splatting for Distant DBH Measurement\n- **🧑‍🔬 作者**：Belal Shaheen, Minh-Hieu Nguyen, Bach-Thuan Bui, Shubham, Tim Wu, Michael Fairley, Matthew David Zane, Michael Wu, James Tompkin\n- **🏫 单位**：Coolant ⟐ Brown University\n- **🔗 链接**：[[中英摘要](./abs/2601.12823.md)] [[arXiv:2601.12823](https://arxiv.org/abs/2601.12823)] [Code]\n- **📝 说明**:\n\n#### [149] GaussianTrimmer: Online Trimming Boundaries for 3DGS Segmentation\n- **🧑‍🔬 作者**：Liwei Liao, Ronggang Wang\n- **🏫 单位**：Peking University Shenzhen Graduate School\n- **🔗 链接**：[[中英摘要](./abs/2601.12683.md)] [[arXiv:2601.12683](https://arxiv.org/abs/2601.12683)] [Code]\n- **📝 说明**:\n\n#### [150] Active Semantic Mapping of Horticultural Environments Using Gaussian Splatting\n- **🧑‍🔬 作者**：Jose Cuaran, Naveen Kumar Uppalapati, Girish Chowdhary\n- **🏫 单位**：Siebel School of Computing and Data Science ⟐ Department of Agricultural and Biological Engineering ⟐ National Center for Supercomputing Applications ⟐ University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2601.12122.md)] [[arXiv:2601.12122](https://arxiv.org/abs/2601.12122)] [Code]\n- **📝 说明**:\n\n#### [151] studentSplat: Your Student Model Learns Single-view 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yimu Pan, Hongda Mao, Qingshuang Chen, Yelin Kim\n- **🏫 单位**：The Pennsylvania State University ⟐ Amazon\n- **🔗 链接**：[[中英摘要](./abs/2601.11772.md)] [[arXiv:2601.11772](https://arxiv.org/abs/2601.11772)] [Code]\n- **📝 说明**:\n\n#### [152] Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhendong Wang, Lebin Zhou, Jingchuan Xiao, Rongduo Han, Nam Ling, Cihan Ruan\n- **🏫 单位**：Santa Clara University ⟐ Mary Immaculate College ⟐ Nankai University\n- **🔗 链接**：[[中英摘要](./abs/2601.10075.md)] [[arXiv:2601.10075](https://arxiv.org/abs/2601.10075)] [Code]\n- **📝 说明**:\n\n#### [153] TIDI-GS: Floater Suppression in 3D Gaussian Splatting for Enhanced Indoor Scene Fidelity\n- **🧑‍🔬 作者**：Sooyeun Yang, Cheyul Im, Jee Won Lee, Jongseong Brad Choi\n- **🏫 单位**：State University of New York, Korea ⟐ State University of New York, Stony Brook\n- **🔗 链接**：[[中英摘要](./abs/2601.09291.md)] [[arXiv:2601.09291](https://arxiv.org/abs/2601.09291)] [Code]\n- **📝 说明**:\n\n#### [154] 3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing\n- **🧑‍🔬 作者**：Jiahua Dong, Yu-Xiong Wang\n- **🏫 单位**：University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2601.07963.md)] [[arXiv:2601.07963](https://arxiv.org/abs/2601.07963)] [Code]\n- **📝 说明**:\n\n#### [155] NAS-GS: Noise-Aware Sonar Gaussian Splatting\n- **🧑‍🔬 作者**：Shida Xu, Jingqi Jiang, Jonatan Scharff Willners, Sen Wang\n- **🏫 单位**：I-X ⟐ Department of Electrical and Electronic Engineering, Imperial College London ⟐ Frontier Robotics ⟐ The National Robotarium\n- **🔗 链接**：[[中英摘要](./abs/2601.06285.md)] [[arXiv:2601.06285](https://arxiv.org/abs/2601.06285)] [Code]\n- **📝 说明**:\n\n#### [156] LayerGS: Decomposition and Inpainting of Layered 3D Human Avatars via 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Yinghan Xu, John Dingliana\n- **🏫 单位**：Trinity College Dublin\n- **🔗 链接**：[[中英摘要](./abs/2601.05853.md)] 
[[arXiv:2601.05853](https://arxiv.org/abs/2601.05853)] [[Code](https://github.com/RockyXu66/LayerGS)]\n- **📝 说明**:\n\n#### [157] FeatureSLAM: Feature-enriched 3D gaussian splatting SLAM in real time\n- **🧑‍🔬 作者**：Christopher Thirgood, Oscar Mendez, Erin Ling, Jon Storey, Simon Hadfield\n- **🏫 单位**：University of Surrey ⟐ I3D Robotics\n- **🔗 链接**：[[中英摘要](./abs/2601.05738.md)] [[arXiv:2601.05738](https://arxiv.org/abs/2601.05738)] [Code]\n- **📝 说明**:\n\n#### [158] GS-DMSR: Dynamic Sensitive Multi-scale Manifold Enhancement for Accelerated High-Quality 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Nengbo Lu, Minghua Pan, Shaohua Sun, Yizhou Liang\n- **🏫 单位**：School of Artificial Intelligence, Guilin University of Electronic Technology ⟐ School of Computer Science and Information Security, Guilin University of Electronic Technology ⟐ Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology\n- **🔗 链接**：[[中英摘要](./abs/2601.05584.md)] [[arXiv:2601.05584](https://arxiv.org/abs/2601.05584)] [Code]\n- **📝 说明**:\n\n#### [159] GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xuan Cheng, Jiahao Rao, Chengyang Li, Wenhao Wang, Weilin Chen, Lvqing Yang\n- **🏫 单位**：School of Informatics, Xiamen University\n- **🔗 链接**：[[中英摘要](./abs/2601.05511.md)] [[arXiv:2601.05511](https://arxiv.org/abs/2601.05511)] [Code]\n- **📝 说明**:\n\n#### [160] ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yen-Jen Chiou, Wei-Tse Cheng, Yuan-Fu Yang\n- **🏫 单位**：National Yang Ming Chiao Tung University\n- **🔗 链接**：[[中英摘要](./abs/2601.04754.md)] [[arXiv:2601.04754](https://arxiv.org/abs/2601.04754)] [[Code](https://github.com/chiou1203/ProFuse)]\n- **📝 说明**:\n\n#### [161] SCAR-GS: Spatial Context Attention for Residuals in Progressive Gaussian Splatting\n- **🧑‍🔬 作者**：Diego Revilla, Pooja Suresh, Anand Bhojan, Wei Tsang Ooi\n- **🏫 单位**：National University of Singapore ⟐ University of Deusto\n- **🔗 链接**：[[中英摘要](./abs/2601.04348.md)] [[arXiv:2601.04348](https://arxiv.org/abs/2601.04348)] [Code]\n- **📝 说明**:\n\n#### [162] IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Wei Long, Haifeng Wu, Shiyin Jiang, Jinhua Zhang, Xinchun Ji, Shuhang Gu\n- **🏫 单位**：University of Electronic Science and Technology of China ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2601.03824.md)] [[arXiv:2601.03824](https://arxiv.org/abs/2601.03824)] [Code]\n- **📝 说明**:\n\n#### [163] CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature\n- **🧑‍🔬 作者**：Eldad Matmon, Amit Bracha, Noam Rotstein, Ron Kimmel\n- **🏫 单位**：Technion - Israel Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2601.03319.md)] [[arXiv:2601.03319](https://arxiv.org/abs/2601.03319)] [Code]\n- **📝 说明**:\n\n#### [164] A High-Fidelity Digital Twin for Robotic Manipulation Based on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Ziyang Sun, Lingfan Bao, Tianhu Peng, Jingcheng Sun, Chengxu Zhou\n- **🏫 单位**：Department of Computer Science, University College London\n- **🔗 链接**：[[中英摘要](./abs/2601.03200.md)] [[arXiv:2601.03200](https://arxiv.org/abs/2601.03200)] [Code]\n- **📝 说明**:\n\n#### [165] SA-ResGS: Self-Augmented Residual 3D Gaussian Splatting for Next Best View Selection\n- **🧑‍🔬 作者**：Kim Jun-Seong, Tae-Hyun Oh, Eduardo Perez-Pellitero, Youngkyoon Jang\n- **🏫 单位**：POSTECH ⟐ KAIST ⟐ Huawei Noah's Ark Lab\n- **🔗 链接**：[[中英摘要](./abs/2601.03024.md)] 
[[arXiv:2601.03024](https://arxiv.org/abs/2601.03024)] [Code]\n- **📝 说明**:\n\n#### [166] 360-GeoGS: Geometrically Consistent Feed-Forward 3D Gaussian Splatting Reconstruction for 360 Images\n- **🧑‍🔬 作者**：Jiaqi Yao, Zhongmiao Yan, Jingyi Xu, Songpengcheng Xia, Yan Xiang, Ling Pei\n- **🏫 单位**：Shanghai Key Laboratory of Navigation and Location Based Services, Shanghai Jiao Tong University ⟐ State Key Laboratory of Submarine Geoscience, School of Automation and Intelligent Sensing, Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2601.02102.md)] [[arXiv:2601.02102](https://arxiv.org/abs/2601.02102)] [Code]\n- **📝 说明**:\n\n#### [167] SketchRodGS: Sketch-based Extraction of Slender Geometries for Animating Gaussian Splatting Scenes\n- **🧑‍🔬 作者**：Haato Watanabe, Nobuyuki Umetani\n- **🏫 单位**：The University of Tokyo\n- **🔗 链接**：[[中英摘要](./abs/2601.02072.md)] [[arXiv:2601.02072](https://arxiv.org/abs/2601.02072)] [Code]\n- **📝 说明**:\n\n#### [168] ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Chuhang Ma, Shuai Tan, Ye Pan, Jiaolong Yang, Xin Tong\n- **🏫 单位**：JHC & AI Institute, Shanghai Jiao Tong University ⟐ Microsoft Research Asia\n- **🔗 链接**：[[中英摘要](./abs/2601.01847.md)] [[arXiv:2601.01847](https://arxiv.org/abs/2601.01847)] [Code]\n- **📝 说明**:\n\n#### [169] Clean-GS: Semantic Mask-Guided Pruning for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Subhankar Mishra\n- **🏫 单位**：School of Computer Sciences ⟐ National Institute of Science Education and Research\n- **🔗 链接**：[[中英摘要](./abs/2601.00913.md)] [[arXiv:2601.00913](https://arxiv.org/abs/2601.00913)] [[Code](https://github.com/smlab-niser/clean-gs)]\n- **📝 说明**:\n\n#### [170] ShadowGS: Shadow-Aware 3D Gaussian Splatting for Satellite Imagery\n- **🧑‍🔬 作者**：Feng Luo, Hongbo Pan, Xiang Yang, Baoyu Jiang, Fengqing Liu, Tao Huang\n- **🏫 单位**：Central South University\n- **🔗 链接**：[[中英摘要](./abs/2601.00939.md)] [[arXiv:2601.00939](https://arxiv.org/abs/2601.00939)] [Code]\n- **📝 说明**:\n\n#### [171] RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization\n- **🧑‍🔬 作者**：Wei-Tse Cheng, Yen-Jen Chiou, Yuan-Fu Yang\n- **🏫 单位**：National Yang Ming Chiao Tung University\n- **🔗 链接**：[[中英摘要](./abs/2601.00705.md)] [[arXiv:2601.00705](https://arxiv.org/abs/2601.00705)] [[Code](https://github.com/Breeze1124/RGS-SLAM)]\n- **📝 说明**:\n\n#### [172] SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting\n- **🧑‍🔬 作者**：Jun-Jee Chao, Volkan Isler\n- **🏫 单位**：University of Minnesota ⟐ The University of Texas at Austin\n- **🔗 链接**：[[中英摘要](./abs/2601.00285.md)] [[arXiv:2601.00285](https://arxiv.org/abs/2601.00285)] [Code]\n- **📝 说明**:\n\n#### [173] PhysTalk: Language-driven Real-time Physics in 3D Gaussian Scenes\n- **🧑‍🔬 作者**：Luca Collorone, Mert Kiray, Indro Spinelli, Fabio Galasso, Benjamin Busam\n- **🏫 单位**：Sapienza University of Rome ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2512.24986.md)] [[arXiv:2512.24986](https://arxiv.org/abs/2512.24986)] [Code]\n- **📝 说明**:\n\n#### [174] Splatwizard: A Benchmark Toolkit for 3D Gaussian Splatting Compression\n- **🧑‍🔬 作者**：Xiang Liu, Yimin Zhou, Jinxiang Wang, Yujun Huang, Shuzhao Xie, Shiyu Qin, Mingyao Hong, Jiawei Li, Yaowei Wang, Zhi Wang, Shu-Tao Xia, Bin Chen\n- **🏫 单位**：Tsinghua University ⟐ Harbin Institute of Technology, Shenzhen ⟐ Huawei\n- **🔗 链接**：[[中英摘要](./abs/2512.24742.md)] [[arXiv:2512.24742](https://arxiv.org/abs/2512.24742)] [[Code](https://github.com/splatwizard/splatwizard)]\n- **📝 说明**:\n\n#### [175] 
Improved 3D Gaussian Splatting of Unknown Spacecraft Structure Using Space Environment Illumination Knowledge\n- **🧑‍🔬 作者**：Tae Ha Park, Simone D'Amico\n- **🏫 单位**：Nara Space Technology Inc ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2512.23998.md)] [[arXiv:2512.23998](https://arxiv.org/abs/2512.23998)] [Code]\n- **📝 说明**:\n\n#### [176] Next Best View Selections for Semantic and Dynamic 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yiqian Li, Wen Jiang, Kostas Daniilidis\n- **🏫 单位**：University of Pennsylvania\n- **🔗 链接**：[[中英摘要](./abs/2512.22771.md)] [[arXiv:2512.22771](https://arxiv.org/abs/2512.22771)] [Code]\n- **📝 说明**:\n\n#### [177] AirGS: Real-Time 4D Gaussian Streaming for Free-Viewpoint Video Experiences\n- **🧑‍🔬 作者**：Zhe Wang, Jinghang Li, Yifei Zhu\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2512.20943.md)] [[arXiv:2512.20943](https://arxiv.org/abs/2512.20943)] [Code]\n- **📝 说明**:\n\n#### [178] Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yoonwoo Jeong, Cheng Sun, Frank Wang, Minsu Cho, Jaesung Choe\n- **🏫 单位**：NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2512.20927.md)] [[arXiv:2512.20927](https://arxiv.org/abs/2512.20927)] [Code]\n- **📝 说明**:\n\n#### [179] Nebula: Enable City-Scale 3D Gaussian Splatting in Virtual Reality via Collaborative Rendering and Accelerated Stereo Rasterization\n- **🧑‍🔬 作者**：He Zhu, Zheng Liu, Xingyang Li, Anbang Wu, Jieru Zhao, Fangxin Liu, Yiming Gan, Jingwen Leng, Yu Feng\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Shanghai Qi Zhi Institute ⟐ Institute of Computing Technology, Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2512.20495.md)] [[arXiv:2512.20495](https://arxiv.org/abs/2512.20495)] [Code]\n- **📝 说明**:\n\n#### [180] Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS)\n- **🧑‍🔬 作者**：Robert van de Ven, Trim Bresilla, Bram Nelissen, Ard Nieuwenhuizen, Eldert J. 
van Henten, Gert Kootstra\n- **🏫 单位**：Agricultural Biosystems Engineering, Wageningen University & Research ⟐ Agrosystems Research, Wageningen University & Research\n- **🔗 链接**：[[中英摘要](./abs/2512.20148.md)] [[arXiv:2512.20148](https://arxiv.org/abs/2512.20148)] [Code]\n- **📝 说明**:\n\n#### [181] HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction\n- **🧑‍🔬 作者**：Jong Wook Kim, Wonseok Roh, Ha Dam Baek, Pilhyeon Lee, Jonghyun Choi, Sangpil Kim\n- **🏫 单位**：Korea University ⟐ Hyundai Motor Company ⟐ Inha University\n- **🔗 链接**：[[中英摘要](./abs/2512.19871.md)] [[arXiv:2512.19871](https://arxiv.org/abs/2512.19871)] [Code]\n- **📝 说明**:\n\n#### [182] WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion\n- **🧑‍🔬 作者**：Hanyang Kong, Xingyi Yang, Xiaoxu Zheng, Xinchao Wang\n- **🏫 单位**：National University of Singapore ⟐ The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](./abs/2512.19678.md)] [[arXiv:2512.19678](https://arxiv.org/abs/2512.19678)] [[Code](https://github.com/HyoKong/WorldWarp)]\n- **📝 说明**:\n\n#### [183] 4D Gaussian Splatting as a Learned Dynamical System\n- **🧑‍🔬 作者**：Arnold Caleb Asiimwe, Carl Vondrick\n- **🏫 单位**：Princeton University ⟐ Columbia University\n- **🔗 链接**：[[中英摘要](./abs/2512.19648.md)] [[arXiv:2512.19648](https://arxiv.org/abs/2512.19648)] [Code]\n- **📝 说明**:\n\n#### [184] SplatBright: Generalizable Low-Light Scene Reconstruction from Sparse Views via Physically-Guided Gaussian Enhancement\n- **🧑‍🔬 作者**：Yue Wen, Liang Song, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ China DXR Technology Co., Ltd.\n- **🔗 链接**：[[中英摘要](./abs/2512.18655.md)] [[arXiv:2512.18655](https://arxiv.org/abs/2512.18655)] [Code]\n- **📝 说明**:\n\n#### [185] Geometric-Photometric Event-based 3D Gaussian Ray Tracing\n- **🧑‍🔬 作者**：Kai Kohyama, Yoshimitsu Aoki, Guillermo Gallego, Shintaro Shiba\n- **🏫 单位**：Keio University ⟐ Robotics Institute Germany\n- **🔗 链接**：[[中英摘要](./abs/2512.18640.md)] [[arXiv:2512.18640](https://arxiv.org/abs/2512.18640)] [Code]\n- **📝 说明**:\n\n#### [186] MatSpray: Fusing 2D Material World Knowledge on 3D Geometry\n- **🧑‍🔬 作者**：Philipp Langsteiner, Jan-Niklas Dihlmann, Hendrik P.A. Lensch\n- **🏫 单位**：University of Tübingen\n- **🔗 链接**：[[中英摘要](./abs/2512.18314.md)] [[arXiv:2512.18314](https://arxiv.org/abs/2512.18314)] [Code]\n- **📝 说明**:\n\n#### [187] Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding\n- **🧑‍🔬 作者**：Yue Li, Qi Ma, Runyi Yang, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Theo Gevers, Luc Van Gool, Danda Pani Paudel, Martin R. 
Oswald\n- **🏫 单位**：University of Amsterdam ⟐ ETH Zurich ⟐ Sofia University ⟐ University of Trento\n- **🔗 链接**：[[中英摘要](./abs/2512.17817.md)] [[arXiv:2512.17817](https://arxiv.org/abs/2512.17817)] [Code]\n- **📝 说明**:\n\n#### [188] Animate Any Character in Any World\n- **🧑‍🔬 作者**：Yitong Wang, Fangyun Wei, Hongyang Zhang, Bo Dai, Yan Lu\n- **🏫 单位**：Fudan University ⟐ Microsoft Research ⟐ University of Waterloo ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2512.17796.md)] [[arXiv:2512.17796](https://arxiv.org/abs/2512.17796)] [[Code](https://github.com/snowflakewang/AniX)]\n- **📝 说明**:\n\n#### [189] G3Splat: Geometrically Consistent Generalizable Gaussian Splatting\n- **🧑‍🔬 作者**：Mehdi Hosseinzadeh, Shin-Fang Chng, Yi Xu, Simon Lucey, Ian Reid, Ravi Garg\n- **🏫 单位**：Australian Institute for Machine Learning ⟐ Goertek Alpha Labs ⟐ MBZUAI\n- **🔗 链接**：[[中英摘要](./abs/2512.17547.md)] [[arXiv:2512.17547](https://arxiv.org/abs/2512.17547)] [[Code](https://github.com/m80hz/g3splat)]\n- **📝 说明**:\n\n#### [190] FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views\n- **🧑‍🔬 作者**：Qijian Tian, Xin Tan, Jiayu Ying, Xuhong Wang, Yuan Xie, Lizhuang Ma\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ East China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2512.17541.md)] [[arXiv:2512.17541](https://arxiv.org/abs/2512.17541)] [Code]\n- **📝 说明**:\n\n#### [191] Using Gaussian Splats to Create High-Fidelity Facial Geometry and Texture\n- **🧑‍🔬 作者**：Haodi He, Jihun Yu, Ronald Fedkiw\n- **🏫 单位**：Epic Games ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2512.16397.md)] [[arXiv:2512.16397](https://arxiv.org/abs/2512.16397)] [Code]\n- **📝 说明**:\n\n#### [192] Instant Expressive Gaussian Head Avatar via 3D-Aware Expression Distillation\n- **🧑‍🔬 作者**：Kaiwen Jiang, Xueting Li, Seonwook Park, Ravi Ramamoorthi, Shalini De Mello, Koki Nagano\n- **🏫 单位**：University of California, San Diego ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2512.16893.md)] [[arXiv:2512.16893](https://arxiv.org/abs/2512.16893)] [Code]\n- **📝 说明**:\n\n#### [193] Flying in Clutter on Monocular RGB by Learning in 3D Radiance Fields with Domain Adaptation\n- **🧑‍🔬 作者**：Xijie Huang, Jinhan Li, Tianyue Wu, Xin Zhou, Zhichao Han, Fei Gao\n- **🏫 单位**：State Key Laboratory of Industrial Control Technology, Zhejiang University ⟐ Differential Robotics\n- **🔗 链接**：[[中英摘要](./abs/2512.17349.md)] [[arXiv:2512.17349](https://arxiv.org/abs/2512.17349)] [Code]\n- **📝 说明**:\n\n#### [194] SDFoam: Signed-Distance Foam for explicit surface reconstruction\n- **🧑‍🔬 作者**：Antonella Rech, Nicola Conci, Nicola Garau\n- **🏫 单位**：University of Trento\n- **🔗 链接**：[[中英摘要](./abs/2512.16706.md)] [[arXiv:2512.16706](https://arxiv.org/abs/2512.16706)] [Code]\n- **📝 说明**:\n\n#### [195] Gaussian Pixel Codec Avatars: A Hybrid Representation for Efficient Rendering\n- **🧑‍🔬 作者**：Divam Gupta, Anuj Pahuja, Nemanja Bartolovic, Tomas Simon, Forrest Iandola, Giljoo Nam\n- **🏫 单位**：Meta\n- **🔗 链接**：[[中英摘要](./abs/2512.15711.md)] [[arXiv:2512.15711](https://arxiv.org/abs/2512.15711)] [Code]\n- **📝 说明**:\n\n#### [196] Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Arthur Moreau, Richard Shaw, Michal Nazarczuk, Jisu Shin, Thomas Tanay, Zhensong Zhang, Songcen Xu, Eduardo Pérez-Pellitero\n- **🏫 单位**：Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](./abs/2512.15508.md)] [[arXiv:2512.15508](https://arxiv.org/abs/2512.15508)] [Code]\n- **📝 说明**:\n\n#### [197] VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex 
Environments\n- **🧑‍🔬 作者**：Yuze Wu, Mo Zhu, Xingxing Li, Yuheng Du, Yuxin Fan, Wenjun Li, Zhichao Han, Xin Zhou, Fei Gao\n- **🏫 单位**：Zhejiang University ⟐ Differential Robotics\n- **🔗 链接**：[[中英摘要](./abs/2512.15258.md)] [[arXiv:2512.15258](https://arxiv.org/abs/2512.15258)] [Code]\n- **📝 说明**:\n\n#### [198] MVGSR: Multi-View Consistent 3D Gaussian Super-Resolution via Epipolar Guidance\n- **🧑‍🔬 作者**：Kaizhe Zhang, Shinan Chen, Qian Zhao, Weizhan Zhang, Caixia Yan, Yudeng Xin\n- **🏫 单位**：Xi’an Jiaotong University ⟐ University of Melbourne\n- **🔗 链接**：[[中英摘要](./abs/2512.15048.md)] [[arXiv:2512.15048](https://arxiv.org/abs/2512.15048)] [Code]\n- **📝 说明**:\n\n#### [199] Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos\n- **🧑‍🔬 作者**：Guofan Fan, Yihang Chen, Yingshu Chen, Xiaofeng Wang, Yang Li, Ruigang Yang\n- **🏫 单位**：The University of Hong Kong ⟐ GigaAI\n- **🔗 链接**：[[中英摘要](./abs/2512.14406.md)] [[arXiv:2512.14406](https://arxiv.org/abs/2512.14406)] [Code]\n- **📝 说明**:\n\n#### [200] HGS: Hybrid Gaussian Splatting with Static-Dynamic Decomposition for Compact Dynamic View Synthesis\n- **🧑‍🔬 作者**：Kaizhe Zhang, Yijie Zhou, Weizhan Zhang, Caixia Yan, Haipeng Du, Yugui Xie, Yu-Hui Wen, Yong-Jin Liu\n- **🏫 单位**：Xi’an Jiaotong University ⟐ Beijing Jiaotong University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2512.14352.md)] [[arXiv:2512.14352](https://arxiv.org/abs/2512.14352)] [Code]\n- **📝 说明**:\n\n#### [201] Beyond a Single Light: A Large-Scale Aerial Dataset for Urban Scene Reconstruction Under Varying Illumination\n- **🧑‍🔬 作者**：Zhuoxiao Li, Wenzong Ma, Taoyu Wu, Jinjing Zhu, Zhenchao Q, Shuai Zhang, Jing Ou, Yinrui Ren, Weiqing Qi, Guobin Shen, Hui Xiong, Wufan Zhao\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou) ⟐ University of Liverpool\n- **🔗 链接**：[[中英摘要](./abs/2512.14200.md)] [[arXiv:2512.14200](https://arxiv.org/abs/2512.14200)] [Code]\n- **📝 说明**:\n\n#### [202] Spherical Voronoi: Directional Appearance as a Differentiable Partition of the Sphere\n- **🧑‍🔬 作者**：Francesco Di Sario, Daniel Rebain, Dor Verbin, Marco Grangetto, Andrea Tagliasacchi\n- **🏫 单位**：University of Torino ⟐ Simon Fraser University ⟐ University of British Columbia ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2512.14180.md)] [[arXiv:2512.14180](https://arxiv.org/abs/2512.14180)] [Code]\n- **📝 说明**:\n\n#### [203] Consistent Instance Field for Dynamic Scene Understanding\n- **🧑‍🔬 作者**：Minghao Hou, Xuqi Wang, Yiming Zhang, Qiyuan Sun, Jing Zhang, Nick Barnes\n- **🏫 单位**：Australian National University ⟐ The University of Sydney\n- **🔗 链接**：[[中英摘要](./abs/2512.14126.md)] [[arXiv:2512.14126](https://arxiv.org/abs/2512.14126)] [Code]\n- **📝 说明**:\n\n#### [204] GaussianPlant: Structure-aligned Gaussian Splatting for 3D Reconstruction of Plants\n- **🧑‍🔬 作者**：Yang Yang, Risa Shinoda, Hiroaki Santo, Fumio Okura\n- **🏫 单位**：The University of Osaka\n- **🔗 链接**：[[中英摘要](./abs/2512.14087.md)] [[arXiv:2512.14087](https://arxiv.org/abs/2512.14087)] [Code]\n- **📝 说明**:\n\n#### [205] ASAP-Textured Gaussians: Enhancing Textured Gaussians with Adaptive Sampling and Anisotropic Parameterization\n- **🧑‍🔬 作者**：Meng Wei, Cheng Zhang, Jianmin Zheng, Hamid Rezatofighi, Jianfei Cai\n- **🏫 单位**：Monash University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2512.14039.md)] [[arXiv:2512.14039](https://arxiv.org/abs/2512.14039)] [Code]\n- **📝 说明**:\n\n#### [206] Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries\n- **🧑‍🔬 作者**：Victor 
Rong, Jan Held, Victor Chu, Daniel Rebain, Marc Van Droogenbroeck, Kiriakos N. Kutulakos, Andrea Tagliasacchi, David B. Lindell\n- **🏫 单位**：University of Toronto ⟐ Vector Institute ⟐ Simon Fraser University ⟐ University of Liège ⟐ University of British Columbia\n- **🔗 链接**：[[中英摘要](./abs/2512.13796.md)] [[arXiv:2512.13796](https://arxiv.org/abs/2512.13796)] [[Code](https://github.com/victor-rong/nexels)]\n- **📝 说明**:\n\n#### [207] Computer vision training dataset generation for robotic environments using Gaussian splatting\n- **🧑‍🔬 作者**：Patryk Niżeniec, Marcin Iwanowski\n- **🏫 单位**：Institute of Engineering and Technology ⟐ Nicolaus Copernicus University in Toruń\n- **🔗 链接**：[[中英摘要](./abs/2512.13411.md)] [[arXiv:2512.13411](https://arxiv.org/abs/2512.13411)] [[Code](https://github.com/PatrykNi/UnitySplat2Data)]\n- **📝 说明**:\n\n#### [208] Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance\n- **🧑‍🔬 作者**：Jan U. Müller, Robin Tim Landsgesell, Leif Van Holland, Patrick Stotko, Reinhard Klein\n- **🏫 单位**：University of Bonn\n- **🔗 链接**：[[中英摘要](./abs/2512.11800.md)] [[arXiv:2512.11800](https://arxiv.org/abs/2512.11800)] [Code]\n- **📝 说明**:\n\n#### [209] Fast and Explicit: Slice-to-Volume Reconstruction via 3D Gaussian Primitives with Analytic Point Spread Function Modeling\n- **🧑‍🔬 作者**：Maik Dannecker, Steven Jia, Nil Stolt-Ansó, Nadine Girard, Guillaume Auzias, François Rousseau, Daniel Rueckert\n- **🏫 单位**：TUM University Hospital ⟐ Technical University of Munich ⟐ Institut de Neurosciences de la Timone ⟐ Aix-Marseille Université ⟐ IMT Atlantique ⟐ Imperial College London\n- **🔗 链接**：[[中英摘要](./abs/2512.11624.md)] [[arXiv:2512.11624](https://arxiv.org/abs/2512.11624)] [[Code](https://github.com/m-dannecker/Gaussian-Primitives-for-Fast-SVR)]\n- **📝 说明**:\n\n#### [210] Prior-Enhanced Gaussian Splatting for Dynamic Scene Reconstruction from Casual Video\n- **🧑‍🔬 作者**：Meng-Li Shih, Ying-Huan Chen, Yu-Lun Liu, Brian Curless\n- **🏫 单位**：University of Washington\n- **🔗 链接**：[[中英摘要](./abs/2512.11356.md)] [[arXiv:2512.11356](https://arxiv.org/abs/2512.11356)] [Code]\n- **📝 说明**:\n\n#### [211] Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction\n- **🧑‍🔬 作者**：Chen Ziwen, Hao Tan, Peng Wang, Zexiang Xu, Li Fuxin\n- **🏫 单位**：Adobe Research ⟐ Tripo AI ⟐ Hillbot ⟐ Oregon State University\n- **🔗 链接**：[[中英摘要](./abs/2512.10267.md)] [[arXiv:2512.10267](https://arxiv.org/abs/2512.10267)] [Code]\n- **📝 说明**:\n\n#### [212] GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting\n- **🧑‍🔬 作者**：Madhav Agarwal, Mingtian Zhang, Laura Sevilla-Lara, Steven McDonagh\n- **🏫 单位**：University of Edinburgh ⟐ University College London\n- **🔗 链接**：[[中英摘要](./abs/2512.10939.md)] [[arXiv:2512.10939](https://arxiv.org/abs/2512.10939)] [Code]\n- **📝 说明**:\n\n#### [213] Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views\n- **🧑‍🔬 作者**：Zhankuo Xu, Chaoran Feng, Yingtao Li, Jianbin Zhao, Jiashu Yang, Wangbo Yu, Li Yuan, Yonghong Tian\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2512.10369.md)] [[arXiv:2512.10369](https://arxiv.org/abs/2512.10369)] [[Code](https://github.com/PotatoBigRoom/CoherentGS)]\n- **📝 说明**:\n\n#### [214] TraceFlow: Dynamic 3D Reconstruction of Specular Scenes Driven by Ray Tracing\n- **🧑‍🔬 作者**：Jiachen Tao, Junyi Wu, Haoxuan Wang, Zongxin Yang, Dawen Cai, Yan Yan\n- **🏫 单位**：University of Illinois Chicago ⟐ Harvard Medical School ⟐ University of Michigan\n- 
**🔗 链接**：[[中英摘要](./abs/2512.10095.md)] [[arXiv:2512.10095](https://arxiv.org/abs/2512.10095)] [Code]\n- **📝 说明**:\n\n#### [215] GAINS: Gaussian-based Inverse Rendering from Sparse Multi-View Captures\n- **🧑‍🔬 作者**：Patrick Noras, Jun Myeong Choi, Didier Stricker, Pieter Peers, Roni Sengupta\n- **🏫 单位**：University of Kaiserslautern-Landau ⟐ German Research Center for Artificial Intelligence ⟐ University of North Carolina at Chapel Hill ⟐ College of William & Mary\n- **🔗 链接**：[[中英摘要](./abs/2512.09925.md)] [[arXiv:2512.09925](https://arxiv.org/abs/2512.09925)] [Code]\n- **📝 说明**:\n\n#### [216] Splatent: Splatting Diffusion Latents for Novel View Synthesis\n- **🧑‍🔬 作者**：Or Hirschorn, Omer Sela, Inbar Huberman-Spiegelglas, Netalee Efrat, Eli Alshan, Ianir Ideses, Frederic Devernay, Yochai Zvik, Lior Fritz\n- **🏫 单位**：Amazon Prime Video ⟐ Tel-Aviv University\n- **🔗 链接**：[[中英摘要](./abs/2512.09923.md)] [[arXiv:2512.09923](https://arxiv.org/abs/2512.09923)] [Code]\n- **📝 说明**:\n\n#### [217] YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos\n- **🧑‍🔬 作者**：Ryan Meegan, Adam D'Souza, Bryan Bo Cao, Shubham Jain, Kristin Dana\n- **🏫 单位**：Rutgers University ⟐ Stony Brook University\n- **🔗 链接**：[[中英摘要](./abs/2512.09903.md)] [[arXiv:2512.09903](https://arxiv.org/abs/2512.09903)] [Code]\n- **📝 说明**:\n\n#### [218] Super4DR: 4D Radar-centric Self-supervised Odometry and Gaussian-based Map Optimization\n- **🧑‍🔬 作者**：Zhiheng Li, Weihua Wang, Qiang Shen, Yichen Zhao, Zheng Fang\n- **🏫 单位**：Northeastern University\n- **🔗 链接**：[[中英摘要](./abs/2512.09608.md)] [[arXiv:2512.09608](https://arxiv.org/abs/2512.09608)] [Code]\n- **📝 说明**:\n\n#### [219] D$^2$GSLAM: 4D Dynamic Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Siting Zhu, Yuxiang Huang, Wenhua Wu, Chaokang Jiang, Yongbo Chen, I-Ming Chen, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Bosch ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2512.09411.md)] [[arXiv:2512.09411](https://arxiv.org/abs/2512.09411)] [Code]\n- **📝 说明**:\n\n#### [220] OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics\n- **🧑‍🔬 作者**：Jisang Yoo, Gyeongjin Kang, Hyun-kyu Ko, Hyeonwoo Yu, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2512.08625.md)] [[arXiv:2512.08625](https://arxiv.org/abs/2512.08625)] [Code]\n- **📝 说明**:\n\n#### [221] HybridSplat: Fast Reflection-baked Gaussian Tracing using Hybrid Splatting\n- **🧑‍🔬 作者**：Chang Liu, Hongliang Yuan, Lianghao Zhang, Sichao Wang, Jianwei Guo, Shi-Sheng Huang\n- **🏫 单位**：Beijing Normal University ⟐ Xiaomi Inc\n- **🔗 链接**：[[中英摘要](./abs/2512.08334.md)] [[arXiv:2512.08334](https://arxiv.org/abs/2512.08334)] [Code]\n- **📝 说明**:\n\n#### [222] Tessellation GS: Neural Mesh Gaussians for Robust Monocular Reconstruction of Dynamic Objects\n- **🧑‍🔬 作者**：Shuohan Tao, Boyao Zhou, Hanzhang Tu, Yuwang Wang, Yebin Liu\n- **🏫 单位**：University of Cambridge ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2512.07381.md)] [[arXiv:2512.07381](https://arxiv.org/abs/2512.07381)] [Code]\n- **📝 说明**:\n\n#### [223] Material-informed Gaussian Splatting for 3D World Reconstruction in a Digital Twin\n- **🧑‍🔬 作者**：Andy Huynh, João Malheiro Silva, Holger Caesar, Tong Duy Son\n- **🏫 单位**：Siemens Digital Industries Software ⟐ Delft University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2511.20348.md)] [[arXiv:2511.20348](https://arxiv.org/abs/2511.20348)] [Code]\n- **📝 说明**:\n\n#### [224] Active3D: Active High-Fidelity 3D Reconstruction via Hierarchical Uncertainty Quantification\n- 
**🧑‍🔬 作者**：Yan Li, Yingzhao Li, Gim Hee Lee\n- **🏫 单位**：National University of Singapore ⟐ Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2511.20050.md)] [[arXiv:2511.20050](https://arxiv.org/abs/2511.20050)] [Code]\n- **📝 说明**:\n\n#### [225] GigaWorld-0: World Models as Data Engine to Empower Embodied AI\n- **🧑‍🔬 作者**：GigaWorld Team, Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Haoyun Li, Jiagang Zhu, Kerui Li, Mengyuan Xu, Qiuping Deng, Siting Wang, Wenkang Qin, Xinze Chen, Xiaofeng Wang, Yankai Wang, Yu Cao, Yifan Chang, Yuan Xu, Yun Ye, Yang Wang, Yukun Zhou, Zhengyuan Zhang, Zhehao Dong, Zheng Zhu\n- **🏫 单位**：GigaAI\n- **🔗 链接**：[[中英摘要](./abs/2511.19861.md)] [[arXiv:2511.19861](https://arxiv.org/abs/2511.19861)] [[Code](https://github.com/open-gigaai/giga-world-0)]\n- **📝 说明**:\n\n#### [226] DensifyBeforehand: LiDAR-assisted Content-aware Densification for Efficient and Quality 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Phurtivilai Patt, Leyang Huang, Yinqiang Zhang, Yang Lei\n- **🏫 单位**：The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2511.19294.md)] [[arXiv:2511.19294](https://arxiv.org/abs/2511.19294)] [Code]\n- **📝 说明**:\n\n#### [227] IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes\n- **🧑‍🔬 作者**：Carl Lindström, Mahan Rafidashti, Maryam Fatemi, Lars Hammarstrand, Martin R. Oswald, Lennart Svensson\n- **🏫 单位**：Zenseact ⟐ Chalmers University of Technology ⟐ University of Amsterdam\n- **🔗 链接**：[[中英摘要](./abs/2511.19235.md)] [[arXiv:2511.19235](https://arxiv.org/abs/2511.19235)] [Code]\n- **📝 说明**:\n\n#### [228] NVGS: Neural Visibility for Occlusion Culling in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Brent Zoomers, Florian Hahlbohm, Joni Vanherck, Lode Jorissen, Marcus Magnor, Nick Michiels\n- **🏫 单位**：Hasselt University ⟐ TU Braunschweig\n- **🔗 链接**：[[中英摘要](./abs/2511.19202.md)] [[arXiv:2511.19202](https://arxiv.org/abs/2511.19202)] [Code]\n- **📝 说明**:\n\n#### [229] MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes\n- **🧑‍🔬 作者**：Kehua Chen, Tianlu Mao, Zhuxin Ma, Hao Jiang, Zehao Li, Zihan Liu, Shuqi Gao, Honglong Zhao, Feng Dai, Yucheng Zhang, Zhaoqi Wang\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2511.19172.md)] [[arXiv:2511.19172](https://arxiv.org/abs/2511.19172)] [Code]\n- **📝 说明**:\n\n#### [230] NeAR: Coupled Neural Asset-Renderer Stack\n- **🧑‍🔬 作者**：Hong Li, Chongjie Ye, Houyuan Chen, Weiqing Xiao, Ziyang Yan, Lixing Xiao, Zhaoxi Chen, Jianfeng Xiang, Shaocong Xu, Xuhui Liu, Yikai Wang, Baochang Zhang, Xiaoguang Han, Jiaolong Yang, Hao Zhao\n- **🏫 单位**：BAAI ⟐ BUAA ⟐ CUHKSZ ⟐ NJU ⟐ UniTn ⟐ ZJU ⟐ NTU ⟐ THU ⟐ BNU\n- **🔗 链接**：[[中英摘要](./abs/2511.18600.md)] [[arXiv:2511.18600](https://arxiv.org/abs/2511.18600)] [Code]\n- **📝 说明**:\n\n#### [231] PhysGS: Bayesian-Inferred Gaussian Splatting for Physical Property Estimation\n- **🧑‍🔬 作者**：Samarth Chopra, Jing Liang, Gershom Seneviratne, Dinesh Manocha\n- **🏫 单位**：University of Maryland ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2511.18570.md)] [[arXiv:2511.18570](https://arxiv.org/abs/2511.18570)] [Code]\n- **📝 说明**:\n\n#### [232] SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation\n- **🧑‍🔬 作者**：Peter Siegel, Federico Tombari, Marc Pollefeys, Daniel Barath\n- **🏫 单位**：ETH Zurich ⟐ Google ⟐ Microsoft\n- **🔗 链接**：[[中英摘要](./abs/2511.18386.md)] [[arXiv:2511.18386](https://arxiv.org/abs/2511.18386)] [Code]\n- **📝 说明**:\n\n#### [233] 
Observer Actor: Active Vision Imitation Learning with Sparse View Gaussian Splatting\n- **🧑‍🔬 作者**：Yilong Wang, Cheng Qian, Ruomeng Fan, Edward Johns\n- **🏫 单位**：Imperial College London\n- **🔗 链接**：[[中英摘要](./abs/2511.18140.md)] [[arXiv:2511.18140](https://arxiv.org/abs/2511.18140)] [Code]\n- **📝 说明**:\n\n#### [234] RoboArmGS: High-Quality Robotic Arm Splatting via Bézier Curve Refinement\n- **🧑‍🔬 作者**：Hao Wang, Xiaobao Wei, Ying Li, Qingpo Wuwu, Dongli Wu, Jiajun Cao, Ming Lu, Wenzhao Zheng, Shanghang Zhang\n- **🏫 单位**：Peking University ⟐ University of California\n- **🔗 链接**：[[中英摘要](./abs/2511.17961.md)] [[arXiv:2511.17961](https://arxiv.org/abs/2511.17961)] [Code]\n- **📝 说明**:\n\n#### [235] Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization\n- **🧑‍🔬 作者**：Youngsik Yun, Dongjun Gu, Youngjung Uh\n- **🏫 单位**：Yonsei University ⟐ UNIST ⟐ Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2511.17918.md)] [[arXiv:2511.17918](https://arxiv.org/abs/2511.17918)] [[Code](https://github.com/bbangsik13/FASR)]\n- **📝 说明**:\n\n#### [236] CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation\n- **🧑‍🔬 作者**：Yuhang Ming, Chenxin Fang, Xingyuan Yu, Fan Zhang, Weichen Dai, Wanzeng Kong, Guofeng Zhang\n- **🏫 单位**： Hangzhou Dianzi University ⟐ Zhejiang University ⟐ University of Bristol\n- **🔗 链接**：[[中英摘要](./abs/2511.17904.md)] [[arXiv:2511.17904](https://arxiv.org/abs/2511.17904)] [Code]\n- **📝 说明**:\n\n#### [237] AEGIS: Preserving privacy of 3D Facial Avatars with Adversarial Perturbations\n- **🧑‍🔬 作者**：Dawid Wolkiewicz, Anastasiya Pechko, Przemysław Spurek, Piotr Syga\n- **🏫 单位**：Wroclaw University of Science and Technology ⟐ Jagiellonian University\n- **🔗 链接**：[[中英摘要](./abs/2511.17747.md)] [[arXiv:2511.17747](https://arxiv.org/abs/2511.17747)] [Code]\n- **📝 说明**:\n\n#### [238] SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors\n- **🧑‍🔬 作者**：Kunyi Li, Michael Niemeyer, Sen Wang, Stefano Gasperini, Nassir Navab, Federico Tombari\n- **🏫 单位**：Technical University of Munich ⟐ Google ⟐ Munich Center for Machine Learning ⟐ VisualAIs\n- **🔗 链接**：[[中英摘要](./abs/2511.17207.md)] [[arXiv:2511.17207](https://arxiv.org/abs/2511.17207)] [Code]\n- **📝 说明**:\n\n#### [239] PEGS: Physics-Event Enhanced Large Spatiotemporal Motion Reconstruction via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yijun Xu, Jingrui Zhang, Hongyi Liu, Yuhan Chen, Yuanyang Wang, Qingyao Guo, Dingwen Wang, Lei Yu, Chu He\n- **🏫 单位**：Wuhan University ⟐ Chongqing University\n- **🔗 链接**：[[中英摘要](./abs/2511.17116.md)] [[arXiv:2511.17116](https://arxiv.org/abs/2511.17116)] [Code]\n- **📝 说明**:\n\n#### [240] SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting\n- **🧑‍🔬 作者**：Di Wu, Liu Liu, Xueyu Yuan, Qiaojun Yu, Wenxiao Chen, Ruilong Yan, Yiming Tang, Liangtu Song\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Science and Technology of China ⟐ Hefei University of Technology ⟐ Shanghai AI Laboratory ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2511.17092.md)] [[arXiv:2511.17092](https://arxiv.org/abs/2511.17092)] [Code]\n- **📝 说明**:\n\n#### [241] RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation\n- **🧑‍🔬 作者**：Wenzhuo Sun, Mingjian Liang, Wenxuan Song, Xuelian Cheng, Zongyuan Ge\n- **🏫 单位**：Monash University ⟐ The Hong Kong University of Science and Technology (GZ)\n- **🔗 链接**：[[中英摘要](./abs/2511.17048.md)] 
[[arXiv:2511.17048](https://arxiv.org/abs/2511.17048)] [Code]\n- **📝 说明**:\n\n#### [242] PhysMorph-GS: Differentiable Shape Morphing via Joint Optimization of Physics and Rendering Objectives\n- **🧑‍🔬 作者**：Chang-Yong Song, David Hyde\n- **🏫 单位**：Vanderbilt University\n- **🔗 链接**：[[中英摘要](./abs/2511.16988.md)] [[arXiv:2511.16988](https://arxiv.org/abs/2511.16988)] [Code]\n- **📝 说明**:\n\n#### [243] Gradient-Driven Natural Selection for Compact 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaobin Deng, Qiuli Yu, Changyu Diao, Min Li, Duanqing Xu\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2511.16980.md)] [[arXiv:2511.16980](https://arxiv.org/abs/2511.16980)] [[Code](https://github.com/XiaoBin2001/GNS)]\n- **📝 说明**:\n\n#### [244] One Walk is All You Need: Data-Efficient 3D RF Scene Reconstruction with Human Movements\n- **🧑‍🔬 作者**：Yiheng Bian, Zechen Li, Lanqing Yang, Hao Pan, Yezhou Wang, Longyuan Ge, Jeffery Wu, Ruiheng Liu, Yongjian Fu, Yichao Chen, Guangtao Xue\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Central South University\n- **🔗 链接**：[[中英摘要](./abs/2511.16966.md)] [[arXiv:2511.16966](https://arxiv.org/abs/2511.16966)] [Code]\n- **📝 说明**:\n\n#### [245] Vorion: A RISC-V GPU with Hardware-Accelerated 3D Gaussian Rendering and Training\n- **🧑‍🔬 作者**：Yipeng Wang, Mengtian Yang, Chieh-pu Lo, Jaydeep P. Kulkarni\n- **🏫 单位**：University of Texas at Austin\n- **🔗 链接**：[[中英摘要](./abs/2511.16831.md)] [[arXiv:2511.16831](https://arxiv.org/abs/2511.16831)] [Code]\n- **📝 说明**:\n\n#### [246] Optimizing 3D Gaussian Splattering for Mobile GPUs\n- **🧑‍🔬 作者**：Md Musfiqur Rahman Sanim, Zhihao Shu, Bahram Afsharmanesh, AmirAli Mirian, Jiexiong Guan, Wei Niu, Bin Ren, Gagan Agrawal\n- **🏫 单位**：University of Georgia ⟐ William & Mary\n- **🔗 链接**：[[中英摘要](./abs/2511.16298.md)] [[arXiv:2511.16298](https://arxiv.org/abs/2511.16298)] [Code]\n- **📝 说明**:\n\n#### [247] LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM\n- **🧑‍🔬 作者**：Sibaek Lee, Seongbo Ha, Kyeongsu Kang, Joonyeol Choi, Seungjun Tak, Hyeonwoo Yu\n- **🏫 单位**：Sungkyunkwan University\n- **🔗 链接**：[[中英摘要](./abs/2511.16144.md)] [[arXiv:2511.16144](https://arxiv.org/abs/2511.16144)] [[Code](https://github.com/Lab-of-AI-and-Robotics/LEGO-SLAM)]\n- **📝 说明**:\n\n#### [248] Rad-GS: Radar-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments\n- **🧑‍🔬 作者**：Renxiang Xiao, Wei Liu, Yuanfan Zhang, Yushuai Chen, Jinming Chen, Zilu Wang, Liang Hu\n- **🏫 单位**：Harbin Institute of Technology, Shenzhen\n- **🔗 链接**：[[中英摘要](./abs/2511.16091.md)] [[arXiv:2511.16091](https://arxiv.org/abs/2511.16091)] [Code]\n- **📝 说明**:\n\n#### [249] CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis\n- **🧑‍🔬 作者**：Zijian Wu, Mingfeng Jiang, Zidian Lin, Ying Song, Hanjie Ma, Qun Wu, Dongping Zhang, Guiyang Pu\n- **🏫 单位**：Zhejiang Sci-Tech University ⟐ China Jiliang University ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2511.16030.md)] [[arXiv:2511.16030](https://arxiv.org/abs/2511.16030)] [Code]\n- **📝 说明**:\n\n#### [250] Interaction-Aware 4D Gaussian Splatting for Dynamic Hand-Object Interaction Reconstruction\n- **🧑‍🔬 作者**：Hao Tian, Chenyangguang Zhang, Rui Liu, Wen Shen, Xiaolin Qin\n- **🏫 单位**：Chinese Academy of Sciences ⟐ Tsinghua University ⟐ Minzu University of China ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2511.14540.md)] [[arXiv:2511.14540](https://arxiv.org/abs/2511.14540)] [Code]\n- **📝 说明**:\n\n#### [251] Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs\n- 
**🧑‍🔬 作者**：Yiyi Miao, Taoyu Wu, Tong Chen, Ji Jiang, Zhe Tang, Zhengyong Jiang, Angelos Stefanidis, Limin Yu, Jionglong Su\n- **🏫 单位**：Xi’an Jiaotong-Liverpool University ⟐ University of Liverpool ⟐ Zhejiang University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2511.14315.md)] [[arXiv:2511.14315](https://arxiv.org/abs/2511.14315)] [Code]\n- **📝 说明**:\n\n#### [252] GEN3D: Generating Domain-Free 3D Scenes from a Single Image\n- **🧑‍🔬 作者**：Yuxin Zhang, Ziyu Lu, Hongbo Duan, Keyu Fan, Pengting Luo, Peiyu Zhuang, Mengyu Yang, Houde Liu\n- **🏫 单位**： Tsinghua University, Shenzhen ⟐ Huawei\n- **🔗 链接**：[[中英摘要](./abs/2511.14291.md)] [[arXiv:2511.14291](https://arxiv.org/abs/2511.14291)] [Code]\n- **📝 说明**:\n\n#### [253] Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting\n- **🧑‍🔬 作者**：Jiangnan Ye, Jiedong Zhuang, Lianrui Mu, Wenjie Zheng, Jiaqi Hu, Xingze Zou, Jing Wang, Haoji Hu\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2511.13684.md)] [[arXiv:2511.13684](https://arxiv.org/abs/2511.13684)] [Code]\n- **📝 说明**:\n\n#### [254] SF-Recon: Simplification-Free Lightweight Building Reconstruction via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zihan Li, Tengfei Wang, Wentian Gan, Hao Zhan, Xin Wang, Zongqian Zhan\n- **🏫 单位**：Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2511.13278.md)] [[arXiv:2511.13278](https://arxiv.org/abs/2511.13278)] [Code]\n- **📝 说明**:\n\n#### [255] SymGS: Leveraging Local Symmetries for 3D Gaussian Splatting Compression\n- **🧑‍🔬 作者**：Keshav Gupta, Akshat Sanghvi, Shreyas Reddy Palley, Astitva Srivastava, Charu Sharma, Avinash Sharma\n- **🏫 单位**：IIIT Hyderabad ⟐ IIT Jodhpur ⟐ University of California\n- **🔗 链接**：[[中英摘要](./abs/2511.13264.md)] [[arXiv:2511.13264](https://arxiv.org/abs/2511.13264)] [[Code](https://github.com/SymGS/symgs)]\n- **📝 说明**:\n\n#### [256] Monocular 3D Lane Detection via Structure Uncertainty-Aware Network with Curve-Point Queries\n- **🧑‍🔬 作者**：Ruixin Liu, Zejian Yuan\n- **🏫 单位**：Xi’an Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2511.13055.md)] [[arXiv:2511.13055](https://arxiv.org/abs/2511.13055)] [[Code](https://github.com/lrx02/MonoUnc)]\n- **📝 说明**:\n\n#### [257] Beyond Darkness: Thermal-Supervised 3D Gaussian Splatting for Low-Light Novel View Synthesis\n- **🧑‍🔬 作者**：Qingsen Ma, Chen Zou, Dianyun Wang, Jia Wang, Liuyu Xiang, Zhaofeng He\n- **🏫 单位**：Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2511.13011.md)] [[arXiv:2511.13011](https://arxiv.org/abs/2511.13011)] [Code]\n- **📝 说明**:\n\n#### [258] TR-Gaussians: High-fidelity Real-time Rendering of Planar Transmission and Reflection with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yong Liu, Keyang Ye, Tianjia Shao, Kun Zhou\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2511.13009.md)] [[arXiv:2511.13009](https://arxiv.org/abs/2511.13009)] [Code]\n- **📝 说明**:\n\n#### [259] SplatSearch: Instance Image Goal Navigation for Mobile Robots using 3D Gaussian Splatting and Diffusion Models\n- **🧑‍🔬 作者**：Siddarth Narasimhan, Matthew Lisondra, Haitong Wang, Goldie Nejat\n- **🏫 单位**：University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2511.12972.md)] [[arXiv:2511.12972](https://arxiv.org/abs/2511.12972)] [[Code](https://github.com/Quest2GM/SplatSearch)]\n- **📝 说明**:\n\n#### [260] Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration\n- **🧑‍🔬 作者**：Changhun Oh, Seongryong Oh, Jinwoo Hwang, Yoonsung Kim, Hardik Sharma, Jongse Park\n- **🏫 单位**：KAIST\n- **🔗 链接**：[[中英摘要](./abs/2511.12930.md)] 
[[arXiv:2511.12930](https://arxiv.org/abs/2511.12930)] [Code]\n- **📝 说明**:\n\n#### [261] Reconstructing 3D Scenes in Native High Dynamic Range\n- **🧑‍🔬 作者**：Kaixuan Zhang, Minxian Li, Mingwu Ren, Jiankang Deng, Xiatian Zhu\n- **🏫 单位**：Nanjing University of Science and Technology ⟐ Imperial College London ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](./abs/2511.12895.md)] [[arXiv:2511.12895](https://arxiv.org/abs/2511.12895)] [Code]\n- **📝 说明**:\n\n#### [262] Changes in Real Time: Online Scene Change Detection with Multi-View Fusion\n- **🧑‍🔬 作者**：Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim, Donald Dansereau, Niko Sünderhauf, Dimity Miller\n- **🏫 单位**：QUT Centre for Robotics ⟐ ARIAM ⟐ University of Sydney ⟐ Abyss Solutions\n- **🔗 链接**：[[中英摘要](./abs/2511.12370.md)] [[arXiv:2511.12370](https://arxiv.org/abs/2511.12370)] [Code]\n- **📝 说明**:\n\n#### [263] 3D Gaussian and Diffusion-Based Gaze Redirection\n- **🧑‍🔬 作者**：Abiram Panchalingam, Indu Bodala, Stuart Middleton\n- **🏫 单位**：University of Southampton\n- **🔗 链接**：[[中英摘要](./abs/2511.11231.md)] [[arXiv:2511.11231](https://arxiv.org/abs/2511.11231)] [Code]\n- **📝 说明**:\n\n#### [264] RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting\n- **🧑‍🔬 作者**：Ruocheng Wu, Haolan He, Yufei Wang, Zhihao Li, Bihan Wen\n- **🏫 单位**：University of Electronic Science and Technology of China ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2511.11213.md)] [[arXiv:2511.11213](https://arxiv.org/abs/2511.11213)] [Code]\n- **📝 说明**:\n\n#### [265] Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision\n- **🧑‍🔬 作者**：Yu Deng, Baozhu Zhao, Junyan Su, Xiaohan Zhang, Qi Liu\n- **🏫 单位**：South China University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2511.10316.md)] [[arXiv:2511.10316](https://arxiv.org/abs/2511.10316)] [Code]\n- **📝 说明**:\n\n#### [266] AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting\n- **🧑‍🔬 作者**：Aymen Mir, Jian Wang, Riza Alp Guler, Chuan Guo, Gerard Pons-Moll, Bing Zhou\n- **🏫 单位**：Snap Inc. 
⟐ University of Tübingen\n- **🔗 链接**：[[中英摘要](./abs/2511.09827.md)] [[arXiv:2511.09827](https://arxiv.org/abs/2511.09827)] [Code]\n- **📝 说明**:\n\n#### [267] Lumos3D: A Single-Forward Framework for Low-Light 3D Scene Restoration\n- **🧑‍🔬 作者**：Hanzhou Liu, Peng Jiang, Jia Huang, Mi Lu\n- **🏫 单位**：Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2511.09818.md)] [[arXiv:2511.09818](https://arxiv.org/abs/2511.09818)] [Code]\n- **📝 说明**:\n\n#### [268] OUGS: Active View Selection via Object-aware Uncertainty Estimation in 3DGS\n- **🧑‍🔬 作者**：Haiyi Li, Qi Chen, Denis Kalkofen, Hsiang-Ting Chen\n- **🏫 单位**：Adelaide University\n- **🔗 链接**：[[中英摘要](./abs/2511.09397.md)] [[arXiv:2511.09397](https://arxiv.org/abs/2511.09397)] [Code]\n- **📝 说明**:\n\n#### [269] YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Botao Ye, Boqi Chen, Haofei Xu, Daniel Barath, Marc Pollefeys\n- **🏫 单位**：ETH Zurich ⟐ ETH AI Center ⟐ Microsoft\n- **🔗 链接**：[[中英摘要](./abs/2511.07321.md)] [[arXiv:2511.07321](https://arxiv.org/abs/2511.07321)] [Code]\n- **📝 说明**:\n\n#### [270] GFix: Perceptually Enhanced Gaussian Splatting Video Compression\n- **🧑‍🔬 作者**：Siyue Teng, Ge Gao, Duolikun Danier, Yuxuan Jiang, Fan Zhang, Thomas Davis, Zoe Liu, David Bull\n- **🏫 单位**：University of Bristol ⟐ University of Edinburgh ⟐ Visionular Inc.\n- **🔗 链接**：[[中英摘要](./abs/2511.06953.md)] [[arXiv:2511.06953](https://arxiv.org/abs/2511.06953)] [Code]\n- **📝 说明**:\n\n#### [271] MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks\n- **🧑‍🔬 作者**：Tianang Chen, Jian Jin, Shilv Cai, Zhuangzi Li, Weisi Lin\n- **🏫 单位**：Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2511.06830.md)] [[arXiv:2511.06830](https://arxiv.org/abs/2511.06830)] [Code]\n- **📝 说明**:\n\n#### [272] ConeGS: Error-Guided Densification Using Pixel Cones for Improved Reconstruction with Fewer Primitives\n- **🧑‍🔬 作者**：Bartłomiej Baranowski, Stefano Esposito, Patricia Gschoßmann, Anpei Chen, Andreas Geiger\n- **🏫 单位**：University of Tübingen\n- **🔗 链接**：[[中英摘要](./abs/2511.06810.md)] [[arXiv:2511.06810](https://arxiv.org/abs/2511.06810)] [Code]\n- **📝 说明**:\n\n#### [273] DIAL-GS: Dynamic Instance Aware Reconstruction for Label-free Street Scenes with 4D Gaussian Splatting\n- **🧑‍🔬 作者**：Chenpeng Su, Wenhua Wu, Chensheng Peng, Tianchen Deng, Zhe Liu, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2511.06632.md)] [[arXiv:2511.06632](https://arxiv.org/abs/2511.06632)] [Code]\n- **📝 说明**:\n\n#### [274] 4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos\n- **🧑‍🔬 作者**：Mengqi Guo, Bo Xu, Yanyan Li, Gim Hee Lee\n- **🏫 单位**：National University of Singapore ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2511.05229.md)] [[arXiv:2511.05229](https://arxiv.org/abs/2511.05229)] [Code]\n- **📝 说明**:\n\n#### [275] Efficient representation of 3D spatial data for defense-related applications\n- **🧑‍🔬 作者**：Benjamin Kahl, Marcus Hebel, Michael Arens\n- **🏫 单位**：Fraunhofer Institute of Optronics, System Technologies and Image Exploitation\n- **🔗 链接**：[[中英摘要](./abs/2511.05109.md)] [[arXiv:2511.05109](https://arxiv.org/abs/2511.05109)] [Code]\n- **📝 说明**:\n\n#### [276] 3D Gaussian Point Encoders\n- **🧑‍🔬 作者**：Jim James, Ben Wilson, Simon Lucey, James Hays\n- **🏫 单位**：Georgia Tech ⟐ University of Adelaide\n- **🔗 链接**：[[中英摘要](./abs/2511.04797.md)] [[arXiv:2511.04797](https://arxiv.org/abs/2511.04797)] [Code]\n- **📝 说明**:\n\n#### [277] Real-to-Sim Robot 
Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions\n- **🧑‍🔬 作者**：Kaifeng Zhang, Shuo Sha, Hanxiao Jiang, Matthew Loper, Hyunjong Song, Guangyan Cai, Zhuo Xu, Xiaochen Hu, Changxi Zheng, Yunzhu Li\n- **🏫 单位**：Columbia University ⟐ SceniX Inc. ⟐ Google DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2511.04665.md)] [[arXiv:2511.04665](https://arxiv.org/abs/2511.04665)] [[Code](https://github.com/kywind/real2sim-eval)]\n- **📝 说明**:\n\n#### [278] CaRF: Enhancing Multi-View Consistency in Referring 3D Gaussian Splatting Segmentation\n- **🧑‍🔬 作者**：Yuwen Tao, Kanglei Zhou, Xin Tan, Yuan Xie\n- **🏫 单位**：Shanghai Pinghe High School ⟐ Tsinghua University ⟐ East China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2511.03992.md)] [[arXiv:2511.03992](https://arxiv.org/abs/2511.03992)] [Code]\n- **📝 说明**:\n\n#### [279] Improving Multi-View Reconstruction via Texture-Guided Gaussian-Mesh Joint Optimization\n- **🧑‍🔬 作者**：Zhejia Cai, Puhua Jiang, Shiwei Mao, Hongkun Cao, Ruqi Huang\n- **🏫 单位**：Tsinghua University ⟐ Peng Cheng Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2511.03950.md)] [[arXiv:2511.03950](https://arxiv.org/abs/2511.03950)] [Code]\n- **📝 说明**:\n\n#### [280] DentalSplat: Dental Occlusion Novel View Synthesis from Sparse Intra-Oral Photographs\n- **🧑‍🔬 作者**：Yiyi Miao, Taoyu Wu, Tong Chen, Sihao Li, Ji Jiang, Youpeng Yang, Angelos Stefanidis, Limin Yu, Jionglong Su\n- **🏫 单位**：Xi’an Jiaotong-Liverpool University ⟐ University of Liverpool ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2511.03099.md)] [[arXiv:2511.03099](https://arxiv.org/abs/2511.03099)] [Code]\n- **📝 说明**:\n\n#### [281] PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing\n- **🧑‍🔬 作者**：Antonio Oroz, Matthias Nießner, Tobias Kirschstein\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2511.02777.md)] [[arXiv:2511.02777](https://arxiv.org/abs/2511.02777)] [Code]\n- **📝 说明**:\n\n#### [282] Object-Centric 3D Gaussian Splatting for Strawberry Plant Reconstruction and Phenotyping\n- **🧑‍🔬 作者**：Jiajia Li, Keyi Zhu, Qianwen Zhang, Dong Chen, Qi Sun, Zhaojian Li\n- **🏫 单位**：Michigan State University ⟐ Mississippi State University ⟐ New York University\n- **🔗 链接**：[[中英摘要](./abs/2511.02207.md)] [[arXiv:2511.02207](https://arxiv.org/abs/2511.02207)] [Code]\n- **📝 说明**:\n\n#### [283] 3D Gaussian Radiation Field Modeling for Integrated RIS-FAS Systems: Analysis and Optimization\n- **🧑‍🔬 作者**：Kaining Wang, Bo Yang, Yusheng Lei, Zhiwen Yu, Xuelin Cao, Liang Wang, Bin Guo, George C. 
Alexandropoulos, Mérouane Debbah, Zhu Han\n- **🏫 单位**：Northwestern Polytechnical University ⟐ National and Kapodistrian University of Athens ⟐ Khalifa University ⟐ University of Houston ⟐ Kyung Hee University\n- **🔗 链接**：[[中英摘要](./abs/2511.01373.md)] [[arXiv:2511.01373](https://arxiv.org/abs/2511.01373)] [Code]\n- **📝 说明**:\n\n#### [284] 4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Gaussian Splatting\n- **🧑‍🔬 作者**：Chun-Tin Wu, Jun-Cheng Chen\n- **🏫 单位**：National Taiwan University ⟐ Academia Sinica\n- **🔗 链接**：[[中英摘要](./abs/2511.00560.md)] [[arXiv:2511.00560](https://arxiv.org/abs/2511.00560)] [Code]\n- **📝 说明**:\n\n#### [285] Object-Aware 4D Human Motion Generation\n- **🧑‍🔬 作者**：Shurui Gui, Deep Anil Patel, Xiner Li, Martin Renqiang Min\n- **🏫 单位**：Texas A&M University ⟐ NEC Laboratories America\n- **🔗 链接**：[[中英摘要](./abs/2511.00248.md)] [[arXiv:2511.00248](https://arxiv.org/abs/2511.00248)] [Code]\n- **📝 说明**:\n\n#### [286] SAGS: Self-Adaptive Alias-Free Gaussian Splatting for Dynamic Surgical Endoscopic Reconstruction\n- **🧑‍🔬 作者**：Wenfeng Huang, Xiangyun Liao, Yinling Qian, Hao Liu, Yongming Yang, Wenjing Jia, Qiong Wang\n- **🏫 单位**：University of Technology Sydney ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2510.27318.md)] [[arXiv:2510.27318](https://arxiv.org/abs/2510.27318)] [Code]\n- **📝 说明**:\n\n#### [287] AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Mirko Usuelli, David Rapado-Rincon, Gert Kootstra, Matteo Matteucci\n- **🏫 单位**：Politecnico di Milano ⟐ Wageningen University\n- **🔗 链接**：[[中英摘要](./abs/2510.26358.md)] [[arXiv:2510.26358](https://arxiv.org/abs/2510.26358)] [Code]\n- **📝 说明**:\n\n#### [288] 6D Channel Knowledge Map Construction via Bidirectional Wireless Gaussian Splatting\n- **🧑‍🔬 作者**：Juncong Zhou, Chao Hu, Guanlin Wu, Zixiang Ren, Han Hu, Juyong Zhang, Rui Zhang, Jie Xu\n- **🏫 单位**：The Chinese University of Hong Kong, Shenzhen ⟐ Beijing Institute of Technology ⟐ University of Science and Technology of China ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2510.26166.md)] [[arXiv:2510.26166](https://arxiv.org/abs/2510.26166)] [Code]\n- **📝 说明**:\n\n#### [289] JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yuxuan Li, Tao Wang, Xianben Yang\n- **🏫 单位**：Beijing Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2510.26117.md)] [[arXiv:2510.26117](https://arxiv.org/abs/2510.26117)] [Code]\n- **📝 说明**:\n\n#### [290] DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes\n- **🧑‍🔬 作者**：Qirui Hou, Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, Jianxun Cui\n- **🏫 单位**：Harbin Institute of Technology ⟐ Chongqing Research Institute of HIT ⟐ Li Auto\n- **🔗 链接**：[[中英摘要](./abs/2510.24734.md)] [[arXiv:2510.24734](https://arxiv.org/abs/2510.24734)] [Code]\n- **📝 说明**:\n\n#### [291] NVSim: Novel View Synthesis Simulator for Large Scale Indoor Navigation\n- **🧑‍🔬 作者**：Mingyu Jeong, Eunsung Kim, Sehun Park, Andrew Jaeyong Choi\n- **🏫 单位**： Gachon University\n- **🔗 链接**：[[中英摘要](./abs/2510.24335.md)] [[arXiv:2510.24335](https://arxiv.org/abs/2510.24335)] [Code]\n- **📝 说明**:\n\n#### [292] LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation\n- **🧑‍🔬 作者**：Haotian Zhou, Xiaole Wang, He Li, Fusheng Sun, Shengyu Guo, Guolei Qi, Jianghuan Xu, Huijing Zhao\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2510.24118.md)] 
[[arXiv:2510.24118](https://arxiv.org/abs/2510.24118)] [Code]\n- **📝 说明**:\n\n#### [293] EndoWave: Rational-Wavelet 4D Gaussian Splatting for Endoscopic Reconstruction\n- **🧑‍🔬 作者**：Taoyu Wu, Yiyi Miao, Jiaxin Guo, Ziyan Chen, Sihang Zhao, Zhuoxiao Li, Zhe Tang, Baoru Huang, Limin Yu\n- **🏫 单位**：Xi’an Jiaotong-Liverpool University ⟐ University of Liverpool ⟐ The Chinese University of Hong Kong ⟐ The Hong Kong University of Science and Technology (Guangzhou) ⟐ Zhejiang University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.23087.md)] [[arXiv:2510.23087](https://arxiv.org/abs/2510.23087)] [Code]\n- **📝 说明**:\n\n#### [294] Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression\n- **🧑‍🔬 作者**：Pranav Saxena\n- **🏫 单位**：Birla Institute of Technology and Science Pilani\n- **🔗 链接**：[[中英摘要](./abs/2510.22930.md)] [[arXiv:2510.22930](https://arxiv.org/abs/2510.22930)] [[Code](https://github.com/Pranav-Saxena/Gen-LangSplat)]\n- **📝 说明**:\n\n#### [295] Region-Adaptive Learned Hierarchical Encoding for 3D Gaussian Splatting Data\n- **🧑‍🔬 作者**：Shashank N. Sridhara, Birendra Kathariya, Fangjun Pu, Peng Yin, Eduardo Pavez, Antonio Ortega\n- **🏫 单位**：University of Southern California ⟐ Dolby Laboratories, Inc.\n- **🔗 链接**：[[中英摘要](./abs/2510.22812.md)] [[arXiv:2510.22812](https://arxiv.org/abs/2510.22812)] [Code]\n- **📝 说明**:\n\n#### [296] LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes via Hierarchical Explicit-Implicit Representation Collaboration Rendering\n- **🧑‍🔬 作者**：Wenkai Zhu, Xu Li, Qimin Xu, Benwu Wang, Kun Wei, Yiming Peng, Zihang Wang\n- **🏫 单位**：Southeast University\n- **🔗 链接**：[[中英摘要](./abs/2510.22669.md)] [[arXiv:2510.22669](https://arxiv.org/abs/2510.22669)] [[Code](https://github.com/zwk0901/LVD-GS2025)]\n- **📝 说明**:\n\n#### [297] DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss\n- **🧑‍🔬 作者**：Jing Yang, Yufeng Yang\n- **🏫 单位**：Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2510.22473.md)] [[arXiv:2510.22473](https://arxiv.org/abs/2510.22473)] [Code]\n- **📝 说明**:\n\n#### [298] DynamicTree: Interactive Real Tree Animation via Sparse Voxel Spectrum\n- **🧑‍🔬 作者**：Yaokun Li, Lihe Ding, Xiao Chen, Guang Tan, Tianfan Xue\n- **🏫 单位**：Sun Yat-sen University ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2510.22213.md)] [[arXiv:2510.22213](https://arxiv.org/abs/2510.22213)] [[Code](https://github.com/Iron-LYK/DynamicTree)]\n- **📝 说明**:\n\n#### [299] Towards Physically Executable 3D Gaussian for Embodied Navigation\n- **🧑‍🔬 作者**：Bingchen Miao, Rong Wei, Zhiqi Ge, Xiaoquan Sun, Shiqi Gao, Jingzhe Zhu, Renhan Wang, Siliang Tang, Jun Xiao, Rui Tang, Juncheng Li\n- **🏫 单位**：Zhejiang University ⟐ Manycore Tech Inc ⟐ Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.21307.md)] [[arXiv:2510.21307](https://arxiv.org/abs/2510.21307)] [Code]\n- **📝 说明**:\n\n#### [300] GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation\n- **🧑‍🔬 作者**：Guangqi Jiang, Haoran Chang, Ri-Zhao Qiu, Yutong Liang, Mazeyu Ji, Jiyue Zhu, Zhao Dong, Xueyan Zou, Xiaolong Wang\n- **🏫 单位**：UC San Diego ⟐ UC Los Angeles ⟐ Meta\n- **🔗 链接**：[[中英摘要](./abs/2510.20813.md)] [[arXiv:2510.20813](https://arxiv.org/abs/2510.20813)] [[Code](https://github.com/luccachiang/GSWorld)]\n- **📝 说明**:\n\n#### [301] Extreme Views: 3DGS Filter for Novel View Synthesis from Out-of-Distribution Camera Poses\n- **🧑‍🔬 作者**：Damian Bowness, Charalambos Poullis\n- **🏫 单位**：Concordia University\n- **🔗 链接**：[[中英摘要](./abs/2510.20027.md)] 
[[arXiv:2510.20027](https://arxiv.org/abs/2510.20027)] [Code]\n- **📝 说明**:\n\n#### [302] Re-Activating Frozen Primitives for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yuxin Cheng, Binxiao Huang, Wenyong Zhou, Taiqiang Wu, Zhengwu Liu, Graziano Chesi, Ngai Wong\n- **🏫 单位**：The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2510.19653.md)] [[arXiv:2510.19653](https://arxiv.org/abs/2510.19653)] [[Code](https://github.com/react-gs/ReAct-GS)]\n- **📝 说明**:\n\n#### [303] Moving Light Adaptive Colonoscopy Reconstruction via Illumination-Attenuation-Aware 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hao Wang, Ying Zhou, Haoyu Zhao, Rui Wang, Qiang Hu, Xing Zhang, Qiang Li, Zhiwei Wang\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ Wuhan University ⟐ Wuhan United Imaging Healthcare Surgical Technology Co., Ltd\n- **🔗 链接**：[[中英摘要](./abs/2510.18739.md)] [[arXiv:2510.18739](https://arxiv.org/abs/2510.18739)] [Code]\n- **📝 说明**:\n\n#### [304] OpenInsGaussian: Open-vocabulary Instance Gaussian Segmentation with Context-aware Cross-view Fusion\n- **🧑‍🔬 作者**：Tianyu Huang, Runnan Chen, Dongting Hu, Fengming Huang, Mingming Gong, Tongliang Liu\n- **🏫 单位**：University of Sydney ⟐ University of Melbourne\n- **🔗 链接**：[[中英摘要](./abs/2510.18253.md)] [[arXiv:2510.18253](https://arxiv.org/abs/2510.18253)] [Code]\n- **📝 说明**:\n\n#### [305] Raindrop GS: A Benchmark for 3D Gaussian Splatting under Raindrop Conditions\n- **🧑‍🔬 作者**：Zhiqiang Teng, Beibei Lin, Tingting Chen, Zifeng Yuan, Xuanyi Li, Xuanyu Zhang, Shunli Zhang\n- **🏫 单位**：Beijing Jiaotong University ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2510.17719.md)] [[arXiv:2510.17719](https://arxiv.org/abs/2510.17719)] [Code]\n- **📝 说明**:\n\n#### [306] Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS\n- **🧑‍🔬 作者**：Feng Zhou, Wenkai Guo, Pu Cao, Zhicheng Zhang, Jianqin Yin\n- **🏫 单位**：Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2510.17479.md)] [[arXiv:2510.17479](https://arxiv.org/abs/2510.17479)] [[Code](https://github.com/zss171999645/ItG-GS)]\n- **📝 说明**:\n\n#### [307] GSPlane: Concise and Accurate Planar Reconstruction via Structured Representation\n- **🧑‍🔬 作者**：Ruitong Gan, Junran Peng, Yang Liu, Chuanchen Luo, Qing Li, Zhaoxiang Zhang\n- **🏫 单位**：The Hong Kong Polytechnic University ⟐ University of Science and Technology Beijing ⟐ Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Shandong University ⟐ Linketic\n- **🔗 链接**：[[中英摘要](./abs/2510.17095.md)] [[arXiv:2510.17095](https://arxiv.org/abs/2510.17095)] [Code]\n- **📝 说明**:\n\n#### [308] 2DGS-R: Revisiting the Normal Consistency Regularization in 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Haofan Ren, Qingsong Yan, Ming Lu, Rongfeng Lu, Zunjie Zhu\n- **🏫 单位**：Hangzhou Dianzi University ⟐ Wuhan University ⟐ Intel Labs China\n- **🔗 链接**：[[中英摘要](./abs/2510.16837.md)] [[arXiv:2510.16837](https://arxiv.org/abs/2510.16837)] [Code]\n- **📝 说明**:\n\n#### [309] GS2POSE: Marry Gaussian Splatting to 6D Object Pose Estimation\n- **🧑‍🔬 作者**：Junbo Li, Weimin Yuan, Yinuo Wang, Yue Zeng, Shihao Shu, Cai Meng, Xiangzhi Bai\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2510.16777.md)] [[arXiv:2510.16777](https://arxiv.org/abs/2510.16777)] [Code]\n- **📝 说明**:\n\n#### [310] PFGS: Pose-Fused 3D Gaussian Splatting for Complete Multi-Pose Object Reconstruction\n- **🧑‍🔬 作者**：Ting-Yu Yen, Yu-Sheng Chiu, Shih-Hsuan Hung, Peter Wonka, Hung-Kuo Chu\n- **🏫 单位**：National Tsing Hua University ⟐ King Abdullah University of Science 
and Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.15386.md)] [[arXiv:2510.15386](https://arxiv.org/abs/2510.15386)] [Code]\n- **📝 说明**:\n\n#### [311] GaussGym: An open-source real-to-sim framework for learning locomotion from pixels\n- **🧑‍🔬 作者**：Alejandro Escontrela, Justin Kerr, Arthur Allshire, Jonas Frey, Rocky Duan, Carmelo Sferrazza, Pieter Abbeel\n- **🏫 单位**：UC Berkeley ⟐ ETH Zurich ⟐ Amazon FAR\n- **🔗 链接**：[[中英摘要](./abs/2510.15352.md)] [[arXiv:2510.15352](https://arxiv.org/abs/2510.15352)] [[Code](https://github.com/escontra/gauss_gym)]\n- **📝 说明**:\n\n#### [312] SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed Images\n- **🧑‍🔬 作者**：Jiaxin Guo, Tongfan Guan, Wenzhen Dong, Wenzhao Zheng, Wenting Wang, Yue Wang, Yeung Yam, Yun-Hui Liu\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Hong Kong Center for Logistics Robotics ⟐ University of California, Berkeley ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2510.15072.md)] [[arXiv:2510.15072](https://arxiv.org/abs/2510.15072)] [Code]\n- **📝 说明**:\n\n#### [313] Terra: Explorable Native 3D World Model with Point Latents\n- **🧑‍🔬 作者**：Yuanhui Huang, Weiliang Chen, Wenzhao Zheng, Xin Tao, Pengfei Wan, Jie Zhou, Jiwen Lu\n- **🏫 单位**：Tsinghua University ⟐ Kuaishou Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.14977.md)] [[arXiv:2510.14977](https://arxiv.org/abs/2510.14977)] [Code]\n- **📝 说明**:\n\n#### [314] GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering\n- **🧑‍🔬 作者**：Alexander Valverde, Brian Xu, Yuyin Zhou, Meng Xu, Hongyun Wang\n- **🏫 单位**：University of California ⟐ Brown University ⟐ Kean University\n- **🔗 链接**：[[中英摘要](./abs/2510.14270.md)] [[arXiv:2510.14270](https://arxiv.org/abs/2510.14270)] [Code]\n- **📝 说明**:\n\n#### [315] SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms\n- **🧑‍🔬 作者**：Haithem Turki, Qi Wu, Xin Kang, Janick Martinez Esturo, Shengyu Huang, Ruilong Li, Zan Gojcic, Riccardo de Lutio\n- **🏫 单位**：NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2510.12901.md)] [[arXiv:2510.12901](https://arxiv.org/abs/2510.12901)] [Code]\n- **📝 说明**:\n\n#### [316] PAGS: Priority-Adaptive Gaussian Splatting for Dynamic Driving Scenes\n- **🧑‍🔬 作者**：Ying A, Wenzhang Sun, Chang Zeng, Chunfeng Wang, Hao Li, Jianxun Cui\n- **🏫 单位**：Harbin Institute of Technology ⟐ Li Auto ⟐ Chongqing Research Institute of HIT\n- **🔗 链接**：[[中英摘要](./abs/2510.12282.md)] [[arXiv:2510.12282](https://arxiv.org/abs/2510.12282)] [Code]\n- **📝 说明**:\n\n#### [317] UniGS: Unified Geometry-Aware Gaussian Splatting for Multimodal Rendering\n- **🧑‍🔬 作者**：Yusen Xie, Zhenmin Huang, Jianhao Jiao, Dimitrios Kanoulas, Jun Ma\n- **🏫 单位**：HKUST (GZ) ⟐ HKUST ⟐ UCL\n- **🔗 链接**：[[中英摘要](./abs/2510.12174.md)] [[arXiv:2510.12174](https://arxiv.org/abs/2510.12174)] [[Code](https://github.com/xieyuser/UniGS)]\n- **📝 说明**:\n\n#### [318] Gaussian Semantic Field for One-shot LiDAR Global Localization\n- **🧑‍🔬 作者**：Pengyu Yin, Shenghai Yuan, Haozhi Cao, Xingyu Ji, Ruofei Bai, Siyu Chen, Lihua Xie\n- **🏫 单位**：Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2510.12101.md)] [[arXiv:2510.12101](https://arxiv.org/abs/2510.12101)] [Code]\n- **📝 说明**:\n\n#### [319] GS-Verse: Mesh-based Gaussian Splatting for Physics-aware Interaction in Virtual Reality\n- **🧑‍🔬 作者**：Anastasiya Pechko, Piotr Borycki, Joanna Waczyńska, Daniel Barczyk, Agata Szymańska, Sławomir Tadeja, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ University of Cambridge ⟐ IDEAS Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2510.11878.md)] 
[[arXiv:2510.11878](https://arxiv.org/abs/2510.11878)] [[Code](https://github.com/Anastasiya999/GS-Verse)]\n- **📝 说明**:\n\n#### [320] WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting\n- **🧑‍🔬 作者**：Yifan Liu, Zhiyuan Min, Zhenwei Wang, Junta Wu, Tengfei Wang, Yixuan Yuan, Yawei Luo, Chunchao Guo\n- **🏫 单位**：Zhejiang University ⟐ Chinese University of Hong Kong ⟐ Tencent Hunyuan\n- **🔗 链接**：[[中英摘要](./abs/2510.10726.md)] [[arXiv:2510.10726](https://arxiv.org/abs/2510.10726)] [Code]\n- **📝 说明**:\n\n#### [321] High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting\n- **🧑‍🔬 作者**：Haoyu Zhao, Cheng Zeng, Linghao Zhuang, Yaxi Zhao, Shengke Xue, Hao Wang, Xingyue Zhao, Zhongyu Li, Kehan Li, Siteng Huang, Mingxiu Chen, Xin Li, Deli Zhao, Hua Zou\n- **🏫 单位**：Wuhan University ⟐ DAMO Academy ⟐ Hupan Lab ⟐ The Chinese University of Hong Kong ⟐ Tsinghua University ⟐ Huazhong University of Science and Technology ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2510.10637.md)] [[arXiv:2510.10637](https://arxiv.org/abs/2510.10637)] [[Code](https://github.com/Maxwell-Zhao/RoboSimGS)]\n- **📝 说明**:\n\n#### [322] Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework\n- **🧑‍🔬 作者**：Shanzhi Yin, Bolin Chen, Xinju Wu, Ru-Ling Liao, Jie Chen, Shiqi Wang, Yan Ye\n- **🏫 单位**：City University of Hong Kong ⟐ DAMO Academy ⟐ HuPan Laboratory ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2510.10492.md)] [[arXiv:2510.10492](https://arxiv.org/abs/2510.10492)] [Code]\n- **📝 说明**:\n\n#### [323] Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Abdelrhman Elrawy, Emad A. Mohammed\n- **🏫 单位**：Wilfrid Laurier University\n- **🔗 链接**：[[中英摘要](./abs/2510.10257.md)] [[arXiv:2510.10257](https://arxiv.org/abs/2510.10257)] [Code]\n- **📝 说明**:\n\n#### [324] Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting\n- **🧑‍🔬 作者**：Jiahui Lu, Haihong Xiao, Xueyan Zhao, Wenxiong Kang\n- **🏫 单位**：SCUT\n- **🔗 链接**：[[中英摘要](./abs/2510.10097.md)] [[arXiv:2510.10097](https://arxiv.org/abs/2510.10097)] [Code]\n- **📝 说明**:\n\n#### [325] P-4DGS: Predictive 4D Gaussian Splatting with 90× Compression\n- **🧑‍🔬 作者**：Henan Wang, Hanxin Zhu, Xinliang Gong, Tianyu He, Xin Li, Zhibo Chen\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2510.10030.md)] [[arXiv:2510.10030](https://arxiv.org/abs/2510.10030)] [Code]\n- **📝 说明**:\n\n#### [326] VG-Mapping: Variation-Aware 3D Gaussians for Online Semi-static Scene Mapping\n- **🧑‍🔬 作者**：Yicheng He, Jingwen Yu, Guangcheng Chen, Hong Zhang\n- **🏫 单位**：Southern University of Science and Technology ⟐ Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.09962.md)] [[arXiv:2510.09962](https://arxiv.org/abs/2510.09962)] [Code]\n- **📝 说明**:\n\n#### [327] LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates\n- **🧑‍🔬 作者**：Minkwan Kim, Seungmin Lee, Junho Kim, Young Min Kim\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2510.09881.md)] [[arXiv:2510.09881](https://arxiv.org/abs/2510.09881)] [Code]\n- **📝 说明**:\n\n#### [328] Mono4DEditor: Text-Driven 4D Scene Editing from Monocular Video via Point-Level Localization of Language-Embedded Gaussians\n- **🧑‍🔬 作者**：Jin-Chuan Shi, Chengye Su, Jiajun Wang, Ariel Shamir, Miao Wang\n- **🏫 单位**：Zhejiang University ⟐ Beihang University ⟐ Reichman University\n- **🔗 链接**：[[中英摘要](./abs/2510.09438.md)] 
[[arXiv:2510.09438](https://arxiv.org/abs/2510.09438)] [Code]\n- **📝 说明**:\n\n#### [329] Visibility-Aware Densification for 3D Gaussian Splatting in Dynamic Urban Scenes\n- **🧑‍🔬 作者**：Yikang Zhang, Rui Fan\n- **🏫 单位**：Tongji University\n- **🔗 链接**：[[中英摘要](./abs/2510.09364.md)] [[arXiv:2510.09364](https://arxiv.org/abs/2510.09364)] [Code]\n- **📝 说明**:\n\n#### [330] ReSplat: Learning Recurrent Gaussian Splats\n- **🧑‍🔬 作者**：Haofei Xu, Daniel Barath, Andreas Geiger, Marc Pollefeys\n- **🏫 单位**：ETH Zurich ⟐ University of Tübingen ⟐ Microsoft\n- **🔗 链接**：[[中英摘要](./abs/2510.08575.md)] [[arXiv:2510.08575](https://arxiv.org/abs/2510.08575)] [Code]\n- **📝 说明**:\n\n#### [331] Splat the Net: Radiance Fields with Splattable Neural Primitives\n- **🧑‍🔬 作者**：Xilong Zhou, Bao-Huy Nguyen, Loïc Magne, Vladislav Golyanik, Thomas Leimkühler, Christian Theobalt\n- **🏫 单位**：Max Planck Institute for Informatics\n- **🔗 链接**：[[中英摘要](./abs/2510.08491.md)] [[arXiv:2510.08491](https://arxiv.org/abs/2510.08491)] [[Code](https://github.com/huynguyenbao/splat-the-net)]\n- **📝 说明**:\n\n#### [332] CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving\n- **🧑‍🔬 作者**：Tianrui Zhang, Yichen Liu, Zilin Guo, Yuxin Guo, Jingcheng Ni, Chenjing Ding, Dan Xu, Lewei Lu, Zehuan Wu\n- **🏫 单位**：Sensetime Research ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.07944.md)] [[arXiv:2510.07944](https://arxiv.org/abs/2510.07944)] [[Code](https://github.com/SenseTime-FVG/OpenDWM)]\n- **📝 说明**:\n\n#### [333] PrismGS: Physically-Grounded Anti-Aliasing for High-Fidelity Large-Scale 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Houqiang Zhong, Zhenglong Wu, Sihua Fu, Zihan Zheng, Xin Jin, Xiaoyun Zhang, Li Song, Qiang Hu\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Eastern Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2510.07830.md)] [[arXiv:2510.07830](https://arxiv.org/abs/2510.07830)] [Code]\n- **📝 说明**:\n\n#### [334] ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes\n- **🧑‍🔬 作者**：Jian Gao, Mengqi Yuan, Yifei Zeng, Chang Zeng, Zhihao Li, Zhenyu Chen, Weichao Qiu, Xiao-Xiao Long, Hao Zhu, Xun Cao, Yao Yao\n- **🏫 单位**：Nanjing University ⟐ Huawei Noah's Ark Lab\n- **🔗 链接**：[[中英摘要](./abs/2510.07729.md)] [[arXiv:2510.07729](https://arxiv.org/abs/2510.07729)] [Code]\n- **📝 说明**:\n\n#### [335] Generating Surface for Text-to-3D using 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Huanning Dong, Fan Li, Ping Kuang, Jianwen Min\n- **🏫 单位**：University of Electronic Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2510.06967.md)] [[arXiv:2510.06967](https://arxiv.org/abs/2510.06967)] [Code]\n- **📝 说明**:\n\n#### [336] Capture and Interact: Rapid 3D Object Acquisition and Rendering with Gaussian Splatting in Unity\n- **🧑‍🔬 作者**：Islomjon Shukhratov, Sergey Gorinsky\n- **🏫 单位**：IMDEA Networks Institute\n- **🔗 链接**：[[中英摘要](./abs/2510.06802.md)] [[arXiv:2510.06802](https://arxiv.org/abs/2510.06802)] [Code]\n- **📝 说明**:\n\n#### [337] ArchitectHead: Continuous Level of Detail Control for 3D Gaussian Head Avatars\n- **🧑‍🔬 作者**：Peizhi Yan, Rabab Ward, Qiang Tang, Shan Du\n- **🏫 单位**：University of British Columbia\n- **🔗 链接**：[[中英摘要](./abs/2510.05488.md)] [[arXiv:2510.05488](https://arxiv.org/abs/2510.05488)] [Code]\n- **📝 说明**:\n\n#### [338] SketchPlan: Diffusion Based Drone Planning From Human Sketches\n- **🧑‍🔬 作者**：Sixten Norelius, Aaron O. 
Feldman, Mac Schwager\n- **🏫 单位**：KTH Royal Institute of Technology ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2510.03545.md)] [[arXiv:2510.03545](https://arxiv.org/abs/2510.03545)] [[Code](https://github.com/sixnor/SketchPlan)]\n- **📝 说明**:\n\n#### [339] GS-Share: Enabling High-fidelity Map Sharing with Incremental Gaussian Splatting\n- **🧑‍🔬 作者**：Xinran Zhang, Hanqi Zhu, Yifan Duan, Yanyong Zhang\n- **🏫 单位**：University of Science and Technology of China ⟐ Hefei Comprehensive National Science Center\n- **🔗 链接**：[[中英摘要](./abs/2510.02884.md)] [[arXiv:2510.02884](https://arxiv.org/abs/2510.02884)] [Code]\n- **📝 说明**:\n\n#### [340] From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ The University of Queensland\n- **🔗 链接**：[[中英摘要](./abs/2510.02732.md)] [[arXiv:2510.02732](https://arxiv.org/abs/2510.02732)] [Code]\n- **📝 说明**:\n\n#### [341] GaussianMorphing: Mesh-Guided 3D Gaussians for Semantic-Aware Object Morphing\n- **🧑‍🔬 作者**：Mengtian Li, Yunshu Bai, Yimin Chu, Yijun Shen, Zhongmei Li, Weifeng Ge, Zhifeng Xie, Chaofeng Chen\n- **🏫 单位**：Shanghai University ⟐ Wuhan University ⟐ East China University of Science and Technology ⟐ Fudan University ⟐ Shanghai Engineering Research Center of Motion Picture Special Effects\n- **🔗 链接**：[[中英摘要](./abs/2510.02034.md)] [[arXiv:2510.02034](https://arxiv.org/abs/2510.02034)] [Code]\n- **📝 说明**:\n\n#### [342] ROI-GS: Interest-based Local Quality 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Quoc-Anh Bui, Gilles Rougeron, Géraldine Morin, Simone Gasparini\n- **🏫 单位**：Université Paris-Saclay ⟐ Université de Toulouse\n- **🔗 链接**：[[中英摘要](./abs/2510.01978.md)] [[arXiv:2510.01978](https://arxiv.org/abs/2510.01978)] [Code]\n- **📝 说明**:\n\n#### [343] LOBE-GS: Load-Balanced and Efficient 3D Gaussian Splatting for Large-Scale Scene Reconstruction\n- **🧑‍🔬 作者**：Sheng-Hsiang Hung, Ting-Yu Yen, Wei-Fang Sun, Simon See, Shih-Hsuan Hung, Hung-Kuo Chu\n- **🏫 单位**：National Tsing Hua University ⟐ NVIDIA AI Technology Center\n- **🔗 链接**：[[中英摘要](./abs/2510.01767.md)] [[arXiv:2510.01767](https://arxiv.org/abs/2510.01767)] [Code]\n- **📝 说明**:\n\n#### [344] HART: Human Aligned Reconstruction Transformer\n- **🧑‍🔬 作者**：Xiyi Chen, Shaofei Wang, Marko Mihajlovic, Taewon Kang, Sergey Prokudin, Ming Lin\n- **🏫 单位**：University of Maryland ⟐ BIGAI ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](./abs/2509.26621.md)] [[arXiv:2509.26621](https://arxiv.org/abs/2509.26621)] [Code]\n- **📝 说明**:\n\n#### [345] LLM-Powered Code Analysis and Optimization for Gaussian Splatting Kernels\n- **🧑‍🔬 作者**：Yi Hu, Huiyang Zhou\n- **🏫 单位**：North Carolina State University\n- **🔗 链接**：[[中英摘要](./abs/2509.25626.md)] [[arXiv:2509.25626](https://arxiv.org/abs/2509.25626)] [Code]\n- **📝 说明**:\n\n#### [346] GaussianLens: Localized High-Resolution Reconstruction via On-Demand Gaussian Densification\n- **🧑‍🔬 作者**：Yijia Weng, Zhicheng Wang, Songyou Peng, Saining Xie, Howard Zhou, Leonidas J. Guibas\n- **🏫 单位**：Stanford University ⟐ Google DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2509.25603.md)] [[arXiv:2509.25603](https://arxiv.org/abs/2509.25603)] [Code]\n- **📝 说明**:\n\n#### [347] Triangle Splatting+: Differentiable Rendering with Opaque Triangles\n- **🧑‍🔬 作者**：Jan Held, Renaud Vandeghen, Sanghyun Son, Daniel Rebain, Matheus Gadelha, Yi Zhou, Ming C. 
Lin, Marc Van Droogenbroeck, Andrea Tagliasacchi\n- **🏫 单位**：University of Liège ⟐ Simon Fraser University ⟐ University of Maryland ⟐ University of British Columbia ⟐ University of Toronto ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](./abs/2509.25122.md)] [[arXiv:2509.25122](https://arxiv.org/abs/2509.25122)] [[Code](https://github.com/trianglesplatting2/triangle-splatting2)]\n- **📝 说明**:\n\n#### [348] UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation\n- **🧑‍🔬 作者**：Guanjun Wu, Jiemin Fang, Chen Yang, Sikuang Li, Taoran Yi, Jia Lu, Zanwei Zhou, Jiazhong Cen, Lingxi Xie, Xiaopeng Zhang, Wei Wei, Wenyu Liu, Xinggang Wang, Qi Tian\n- **🏫 单位**：Huawei Inc. ⟐ Huazhong University of Science and Technology ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2509.25079.md)] [[arXiv:2509.25079](https://arxiv.org/abs/2509.25079)] [[Code](https://github.com/UniLat3D/UniLat3D)]\n- **📝 说明**:\n\n#### [349] GEM: 3D Gaussian Splatting for Efficient and Accurate Cryo-EM Reconstruction\n- **🧑‍🔬 作者**：Huaizhi Qu, Xiao Wang, Gengwei Zhang, Jie Peng, Tianlong Chen\n- **🏫 单位**：University of North Carolina at Chapel Hill ⟐ University of Washington\n- **🔗 链接**：[[中英摘要](./abs/2509.25075.md)] [[arXiv:2509.25075](https://arxiv.org/abs/2509.25075)] [Code]\n- **📝 说明**:\n\n#### [350] HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping\n- **🧑‍🔬 作者**：Yu Ma, Guoliang Wei, Haihong Xiao, Yue Cheng\n- **🏫 单位**：University of Shanghai for Science and Technology ⟐ Hefei University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.24893.md)] [[arXiv:2509.24893](https://arxiv.org/abs/2509.24893)] [Code]\n- **📝 说明**:\n\n#### [351] ExGS: Extreme 3D Gaussian Compression with Diffusion Priors\n- **🧑‍🔬 作者**：Jiaqi Chen, Xinhao Ji, Yuanyuan Gao, Hao Li, Yuning Gong, Yifei Liu, Dan Xu, Zhihang Zhong, Dingwen Zhang, Xiao Sun\n- **🏫 单位**：Northwestern Polytechnical University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.24758.md)] [[arXiv:2509.24758](https://arxiv.org/abs/2509.24758)] [Code]\n- **📝 说明**:\n\n#### [352] Proxy-GS: Efficient 3D Gaussian Splatting via Proxy Mesh\n- **🧑‍🔬 作者**：Yuanyuan Gao, Yuning Gong, Yifei Liu, Li Jingfeng, Zhihang Zhong, Dingwen Zhang, Yanci Zhang, Dan Xu, Xiao Sun\n- **🏫 单位**：Northwestern Polytechnical University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Sichuan University ⟐ Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.24421.md)] [[arXiv:2509.24421](https://arxiv.org/abs/2509.24421)] [[Code](https://github.com/gyy456/Proxy-GS)]\n- **📝 说明**:\n\n#### [353] Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos\n- **🧑‍🔬 作者**：Yingdong Hu, Yisheng He, Jinnan Chen, Weihao Yuan, Kejie Qiu, Zehong Lin, Siyu Zhu, Zilong Dong, Jun Zhang\n- **🏫 单位**：HKUST ⟐ Tongyi Lab, Alibaba Group ⟐ NUS ⟐ FDU\n- **🔗 链接**：[[中英摘要](./abs/2509.24209.md)] [[arXiv:2509.24209](https://arxiv.org/abs/2509.24209)] [[Code](https://github.com/zhenliuZJU/Forge4D)]\n- **📝 说明**:\n\n#### [354] CrashSplat: 2D to 3D Vehicle Damage Segmentation in Gaussian Splatting\n- **🧑‍🔬 作者**：Dragoş-Andrei Chileban, Andrei-Ştefan Bulzan, Cosmin Cernăzanu-Glăvan\n- **🏫 单位**：Politehnica University of Timișoara\n- **🔗 链接**：[[中英摘要](./abs/2509.23947.md)] [[arXiv:2509.23947](https://arxiv.org/abs/2509.23947)] [Code]\n- **📝 说明**:\n\n#### [355] OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting\n- **🧑‍🔬 作者**：Atakan 
Topaloglu, Kunyi Li, Michael Niemeyer, Nassir Navab, A. Murat Tekalp, Federico Tombari\n- **🏫 单位**：ETH Zurich ⟐ Koç University ⟐ KUIS AI Center ⟐ Technical University of Munich ⟐ Google ⟐ Munich Center for Machine Learning\n- **🔗 链接**：[[中英摘要](./abs/2509.23258.md)] [[arXiv:2509.23258](https://arxiv.org/abs/2509.23258)] [Code]\n- **📝 说明**:\n\n#### [356] Learning Unified Representation of 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yuelin Xin, Yuheng Liu, Xiaohui Xie, Xinke Li\n- **🏫 单位**：UC Irvine ⟐ City University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2509.22917.md)] [[arXiv:2509.22917](https://arxiv.org/abs/2509.22917)] [Code]\n- **📝 说明**:\n\n#### [357] HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes\n- **🧑‍🔬 作者**：Katrina Ashton, Chahyon Ku, Shrey Shah, Wen Jiang, Kostas Daniilidis, Bernadette Bucher\n- **🏫 单位**：University of Pennsylvania ⟐ University of Michigan\n- **🔗 链接**：[[中英摘要](./abs/2509.22498.md)] [[arXiv:2509.22498](https://arxiv.org/abs/2509.22498)] [[Code](https://github.com/m-and-m-lab/helios)]\n- **📝 说明**:\n\n#### [358] Polysemous Language Gaussian Splatting via Matching-based Mask Lifting\n- **🧑‍🔬 作者**：Jiayu Ding, Xinpeng Liu, Zhiyi Pan, Shiqiang Long, Ge Li\n- **🏫 单位**：Peking University ⟐ Tianjin University ⟐ Guangdong Bohua UHD Innovation Center Co., Ltd.\n- **🔗 链接**：[[中英摘要](./abs/2509.22225.md)] [[arXiv:2509.22225](https://arxiv.org/abs/2509.22225)] [Code]\n- **📝 说明**:\n\n#### [359] Rigidity-Aware 3D Gaussian Deformation from a Single Image\n- **🧑‍🔬 作者**：Jinhyeok Kim, Jaehun Bang, Seunghyun Seo, Kyungdon Joo\n- **🏫 单位**：UNIST\n- **🔗 链接**：[[中英摘要](./abs/2509.22222.md)] [[arXiv:2509.22222](https://arxiv.org/abs/2509.22222)] [Code]\n- **📝 说明**:\n\n#### [360] Large Material Gaussian Model for Relightable 3D Generation\n- **🧑‍🔬 作者**：Jingrui Ye, Lingting Zhu, Runze Zhang, Zeyu Hu, Yingda Yin, Lanjiong Li, Lequan Yu, Qingmin Liao\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School ⟐ The University of Hong Kong ⟐ LIGHTSPEED ⟐ The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2509.22112.md)] [[arXiv:2509.22112](https://arxiv.org/abs/2509.22112)] [Code]\n- **📝 说明**:\n\n#### [361] Dynamic Novel View Synthesis in High Dynamic Range\n- **🧑‍🔬 作者**：Kaixuan Zhang, Zhipeng Xiong, Minxian Li, Mingwu Ren, Jiankang Deng, Xiatian Zhu\n- **🏫 单位**：Nanjing University of Science and Technology ⟐ Imperial College London ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](./abs/2509.21853.md)] [[arXiv:2509.21853](https://arxiv.org/abs/2509.21853)] [Code]\n- **📝 说明**:\n\n#### [362] 4D Driving Scene Generation With Stereo Forcing\n- **🧑‍🔬 作者**：Hao Lu, Zhuang Ma, Guangfeng Jiang, Wenhang Ge, Bohan Li, Yuzhan Cai, Wenzhao Zheng, Yunpeng Zhang, Yingcong Chen\n- **🏫 单位**：Hong Kong University of Science and Technology (Guangzhou) ⟐ University of Science and Technology of China ⟐ Shanghai Jiao Tong University, Shanghai ⟐ University of California, Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2509.20251.md)] [[arXiv:2509.20251](https://arxiv.org/abs/2509.20251)] [Code]\n- **📝 说明**:\n\n#### [363] Aerial-Ground Image Feature Matching via 3D Gaussian Splatting-based Intermediate View Rendering\n- **🧑‍🔬 作者**：Jiangxue Yu, Hui Wang, San Jiang, Xing Zhang, Dejin Zhang, Qingquan Li\n- **🏫 单位**：China University of Geosciences ⟐ Shenzhen University ⟐ Engineering Research Center of Natural Resource Information Management and Digital Twin Engineering Software\n- **🔗 链接**：[[中英摘要](./abs/2509.19898.md)] [[arXiv:2509.19898](https://arxiv.org/abs/2509.19898)] [Code]\n- 
**📝 说明**:\n\n#### [364] BiTAA: A Bi-Task Adversarial Attack for Object Detection and Depth Estimation via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yixun Zhang, Feng Zhou, Jianqin Yin\n- **🏫 单位**：Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2509.19793.md)] [[arXiv:2509.19793](https://arxiv.org/abs/2509.19793)] [Code]\n- **📝 说明**:\n\n#### [365] VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction\n- **🧑‍🔬 作者**：Weijie Wang, Yeqing Chen, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Zheng Zhu, Donny Y. Chen, Bohan Zhuang\n- **🏫 单位**：Zhejiang University ⟐ GigaAI ⟐ University of Electronic Science and Technology of China ⟐ The Chinese University of Hong Kong ⟐ Tsinghua University ⟐ Monash University\n- **🔗 链接**：[[中英摘要](./abs/2509.19297.md)] [[arXiv:2509.19297](https://arxiv.org/abs/2509.19297)] [[Code](https://github.com/ziplab/VolSplat)]\n- **📝 说明**:\n\n#### [366] WaveletGaussian: Wavelet-domain Diffusion for Sparse-view 3D Gaussian Object Reconstruction\n- **🧑‍🔬 作者**：Hung Nguyen, Runfa Li, An Le, Truong Nguyen\n- **🏫 单位**：UC San Diego\n- **🔗 链接**：[[中英摘要](./abs/2509.19073.md)] [[arXiv:2509.19073](https://arxiv.org/abs/2509.19073)] [Code]\n- **📝 说明**:\n\n#### [367] Seeing Through Reflections: Advancing 3D Scene Reconstruction in Mirror-Containing Environments with Gaussian Splatting\n- **🧑‍🔬 作者**：Zijing Guo, Yunyang Zhao, Lin Wang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2509.18956.md)] [[arXiv:2509.18956](https://arxiv.org/abs/2509.18956)] [Code]\n- **📝 说明**:\n\n#### [368] DeblurSplat: SfM-free 3D Gaussian Splatting with Event Camera for Robust Deblurring\n- **🧑‍🔬 作者**：Pengteng Li, Yunfan Lu, Pinhao Song, Weiyu Guo, Huizai Yao, F. Richard Yu, Hui Xiong\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou) ⟐ KU Leuven ⟐ Carleton University\n- **🔗 链接**：[[中英摘要](./abs/2509.18898.md)] [[arXiv:2509.18898](https://arxiv.org/abs/2509.18898)] [Code]\n- **📝 说明**:\n\n#### [369] FixingGS: Enhancing 3D Gaussian Splatting via Training-Free Score Distillation\n- **🧑‍🔬 作者**：Zhaorui Wang, Yi Gu, Deming Zhou, Renjing Xu\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2509.18759.md)] [[arXiv:2509.18759](https://arxiv.org/abs/2509.18759)] [Code]\n- **📝 说明**:\n\n#### [370] Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction\n- **🧑‍🔬 作者**：Xiaoting Yin, Hao Shi, Kailun Yang, Jiajun Zhai, Shangwei Guo, Lin Wang, Kaiwei Wang\n- **🏫 单位**：Zhejiang University ⟐ Nanyang Technological University ⟐ Hunan University\n- **🔗 链接**：[[中英摘要](./abs/2509.18566.md)] [[arXiv:2509.18566](https://arxiv.org/abs/2509.18566)] [Code]\n- **📝 说明**:\n\n#### [371] ProDyG: Progressive Dynamic Scene Reconstruction via Gaussian Splatting from Monocular Videos\n- **🧑‍🔬 作者**：Shi Chen, Erik Sandström, Sandro Lombardi, Siyuan Li, Martin R. 
Oswald\n- **🏫 单位**：ETH Zürich ⟐ Google ⟐ University of Amsterdam\n- **🔗 链接**：[[中英摘要](./abs/2509.17864.md)] [[arXiv:2509.17864](https://arxiv.org/abs/2509.17864)] [Code]\n- **📝 说明**:\n\n#### [372] From Restoration to Reconstruction: Rethinking 3D Gaussian Splatting for Underwater Scenes\n- **🧑‍🔬 作者**：Guoxi Huang, Haoran Wang, Zipeng Qi, Wenjun Lu, David Bull, Nantheera Anantrasirichai\n- **🏫 单位**：University of Bristol ⟐ Beijing University of Aeronautics and Astronautics ⟐ The University of Sydney\n- **🔗 链接**：[[中英摘要](./abs/2509.17789.md)] [[arXiv:2509.17789](https://arxiv.org/abs/2509.17789)] [Code]\n- **📝 说明**:\n\n#### [373] FGGS-LiDAR: Ultra-Fast, GPU-Accelerated Simulation from General 3DGS Models to LiDAR\n- **🧑‍🔬 作者**：Junzhe Wu, Yufei Jia, Yiyi Yan, Zhixing Chen, Tiao Tan, Zifan Wang, Guangyu Wang\n- **🏫 单位**：Tsinghua University ⟐ DISCOVER Robotics ⟐ Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2509.17390.md)] [[arXiv:2509.17390](https://arxiv.org/abs/2509.17390)] [Code]\n- **📝 说明**:\n\n#### [374] SmokeSeer: 3D Gaussian Splatting for Smoke Removal and Scene Reconstruction\n- **🧑‍🔬 作者**：Neham Jain, Andrew Jong, Sebastian Scherer, Ioannis Gkioulekas\n- **🏫 单位**：Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2509.17329.md)] [[arXiv:2509.17329](https://arxiv.org/abs/2509.17329)] [[Code](https://github.com/cmu-ci-lab/SmokeSeer)]\n- **📝 说明**:\n\n#### [375] SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views\n- **🧑‍🔬 作者**：Ranran Huang, Krystian Mikolajczyk\n- **🏫 单位**：Imperial College London\n- **🔗 链接**：[[中英摘要](./abs/2509.17246.md)] [[arXiv:2509.17246](https://arxiv.org/abs/2509.17246)] [Code]\n- **📝 说明**:\n\n#### [376] MedGS: Gaussian Splatting for Multi-Modal 3D Medical Imaging\n- **🧑‍🔬 作者**：Kacper Marzol, Ignacy Kolton, Weronika Smolak-Dyżewska, Joanna Kaleta, Marcin Mazur, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ Warsaw University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.16806.md)] [[arXiv:2509.16806](https://arxiv.org/abs/2509.16806)] [[Code](https://github.com/gmum/MedGS)]\n- **📝 说明**:\n\n#### [377] ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaoyang Yan, Muleilan Pei, Shaojie Shen\n- **🏫 单位**：The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.16552.md)] [[arXiv:2509.16552](https://arxiv.org/abs/2509.16552)] [Code]\n- **📝 说明**:\n\n#### [378] 3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction\n- **🧑‍🔬 作者**：\n- **🏫 单位**：Simon Fraser University ⟐ University of Toronto ⟐ University of Bologna ⟐ University of British Columbia\n- **🔗 链接**：[[中英摘要](./abs/2509.16423.md)] [[arXiv:2509.16423](https://arxiv.org/abs/2509.16423)] [Code]\n- **📝 说明**:\n\n#### [379] RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars\n- **🧑‍🔬 作者**：Weiyi Xiong, Bing Zhu, Tao Huang, Zewei Zheng\n- **🏫 单位**：Beihang University ⟐ James Cook University\n- **🔗 链接**：[[中英摘要](./abs/2509.16119.md)] [[arXiv:2509.16119](https://arxiv.org/abs/2509.16119)] [Code]\n- **📝 说明**:\n\n#### [380] Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval\n- **🧑‍🔬 作者**：Liwei Liao, Xufeng Li, Xiaoyun Zheng, Boning Liu, Feng Gao, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory ⟐ City University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2509.15871.md)] [[arXiv:2509.15871](https://arxiv.org/abs/2509.15871)] 
[[Code](https://github.com/leviome/GVR_demos)]\n- **📝 说明**:\n\n#### [381] Camera Splatting for Continuous View Optimization\n- **🧑‍🔬 作者**：Gahye Lee, Hyomin Kim, Gwangjin Ju, Jooeun Son, Hyejeong Yoon, Seungyong Lee\n- **🏫 单位**：POSTECH\n- **🔗 链接**：[[中英摘要](./abs/2509.15677.md)] [[arXiv:2509.15677](https://arxiv.org/abs/2509.15677)] [Code]\n- **📝 说明**:\n\n#### [382] FingerSplat: Contactless Fingerprint 3D Reconstruction and Generation based on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yuwei Jia, Yutang Lu, Zhe Cui, Fei Su\n- **🏫 单位**：Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2509.15648.md)] [[arXiv:2509.15648](https://arxiv.org/abs/2509.15648)] [Code]\n- **📝 说明**:\n\n#### [383] GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading\n- **🧑‍🔬 作者**：Donghyun Lee, Dawoon Jeong, Jae W. Lee, Hongil Yoon\n- **🏫 单位**：Seoul National University ⟐ Google\n- **🔗 链接**：[[中英摘要](./abs/2509.15645.md)] [[arXiv:2509.15645](https://arxiv.org/abs/2509.15645)] [Code]\n- **📝 说明**:\n\n#### [384] MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild\n- **🧑‍🔬 作者**：Deming Li, Kaiwen Jiang, Yutao Tang, Ravi Ramamoorthi, Rama Chellappa, Cheng Peng\n- **🏫 单位**：Johns Hopkins University ⟐ University of California ⟐ University of Virginia\n- **🔗 链接**：[[中英摘要](./abs/2509.15548.md)] [[arXiv:2509.15548](https://arxiv.org/abs/2509.15548)] [Code]\n- **📝 说明**:\n\n#### [385] Causal Reasoning Elicits Controllable 3D Scene Generation\n- **🧑‍🔬 作者**：Shen Chen, Ruiyu Zhao, Jiale Zhou, Zongkai Wu, Jenq-Neng Hwang, Lei Li\n- **🏫 单位**：Zhejiang University ⟐ East China University of Science and Technology ⟐ Skai Intelligence ⟐ University of Washington ⟐ VitaSight\n- **🔗 链接**：[[中英摘要](./abs/2509.15249.md)] [[arXiv:2509.15249](https://arxiv.org/abs/2509.15249)] [[Code](https://github.com/gokucs/causalstruct)]\n- **📝 说明**:\n\n#### [386] FMGS-Avatar: Mesh-Guided 2D Gaussian Splatting with Foundation Model Priors for 3D Monocular Avatar Reconstruction\n- **🧑‍🔬 作者**：Jinlong Fan, Bingyu Hu, Xingguang Li, Yuxiang Yang, Jing Zhang\n- **🏫 单位**：Hangzhou Dianzi University ⟐ Shenzhen Polytechnic University ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2509.14739.md)] [[arXiv:2509.14739](https://arxiv.org/abs/2509.14739)] [Code]\n- **📝 说明**:\n\n#### [387] MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping\n- **🧑‍🔬 作者**：Zhihao Cao, Hanyu Wu, Li Wa Tang, Zizhou Luo, Zihan Zhu, Wei Zhang, Marc Pollefeys, Martin R. 
Oswald\n- **🏫 单位**：ETH Zurich ⟐ University of Zurich ⟐ University of Stuttgart ⟐ Microsoft Mixed Reality and AI Lab ⟐ University of Amsterdam\n- **🔗 链接**：[[中英摘要](./abs/2509.14191.md)] [[arXiv:2509.14191](https://arxiv.org/abs/2509.14191)] [Code]\n- **📝 说明**:\n\n#### [388] Plug-and-Play PDE Optimization for 3D Gaussian Splatting: Toward High-Quality Rendering and Reconstruction\n- **🧑‍🔬 作者**：Yifan Mo, Youcheng Cai, Ligang Liu\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2509.13938.md)] [[arXiv:2509.13938](https://arxiv.org/abs/2509.13938)] [Code]\n- **📝 说明**:\n\n#### [389] MemGS: Memory-Efficient Gaussian Splatting for Real-Time SLAM\n- **🧑‍🔬 作者**：Yinlong Bai, Hongxin Zhang, Sheng Zhong, Junkai Niu, Hai Li, Yijia He, Yi Zhou\n- **🏫 单位**：Hunan University ⟐ TCL RayNeo\n- **🔗 链接**：[[中英摘要](./abs/2509.13536.md)] [[arXiv:2509.13536](https://arxiv.org/abs/2509.13536)] [[Code](https://github.com/NAIL-HNU/MemGS_SLAM)]\n- **📝 说明**:\n\n#### [390] Improving 3D Gaussian Splatting Compression by Scene-Adaptive Lattice Vector Quantization\n- **🧑‍🔬 作者**：Hao Xu, Xiaolin Wu, Xi Zhang\n- **🏫 单位**：McMaster University ⟐ Southwest Jiaotong University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2509.13482.md)] [[arXiv:2509.13482](https://arxiv.org/abs/2509.13482)] [[Code](https://github.com/hxu160/SALVQ)]\n- **📝 说明**:\n\n#### [391] Dream3DAvatar: Text-Controlled 3D Avatar Reconstruction from a Single Image\n- **🧑‍🔬 作者**：Gaofeng Liu, Hengsen Li, Ruoyu Gao, Xuetong Li, Zhiyuan Ma, Tao Fang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2509.13013.md)] [[arXiv:2509.13013](https://arxiv.org/abs/2509.13013)] [Code]\n- **📝 说明**:\n\n#### [392] Beyond Averages: Open-Vocabulary 3D Scene Understanding with Gaussian Splatting and Bag of Embeddings\n- **🧑‍🔬 作者**：Abdalla Arafa, Didier Stricker\n- **🏫 单位**：German Research Center for Artificial Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2509.12938.md)] [[arXiv:2509.12938](https://arxiv.org/abs/2509.12938)] [Code]\n- **📝 说明**:\n\n#### [393] 4DRadar-GS: Self-Supervised Dynamic Driving Scene Reconstruction with 4D Radar\n- **🧑‍🔬 作者**：Xiao Tang, Guirong Zhuo, Cong Wang, Boyuan Zheng, Minqing Huang, Lianqing Zheng, Long Chen, Shouyi Lu\n- **🏫 单位**：Tongji University ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2509.12931.md)] [[arXiv:2509.12931](https://arxiv.org/abs/2509.12931)] [Code]\n- **📝 说明**:\n\n#### [394] Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization\n- **🧑‍🔬 作者**：Mengjiao Han, Andres Sewell, Joseph Insley, Janet Knowles, Victor A. Mateevitsi, Michael E. 
Papka, Steve Petruzza, Silvio Rizzi\n- **🏫 单位**：Argonne National Laboratory ⟐ Utah State University\n- **🔗 链接**：[[中英摘要](./abs/2509.12138.md)] [[arXiv:2509.12138](https://arxiv.org/abs/2509.12138)] [[Code](https://github.com/MengjiaoH/Grendel-GS-SciVIS)]\n- **📝 说明**:\n\n#### [395] Segmentation-Driven Initialization for Sparse-view 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yi-Hsin Li, Thomas Sikora, Sebastian Knorr, Mårten Sjöström\n- **🏫 单位**：Mid Sweden University ⟐ Technical University of Berlin ⟐ Hochschule für Technik und Wirtschaft Berlin\n- **🔗 链接**：[[中英摘要](./abs/2509.11853.md)] [[arXiv:2509.11853](https://arxiv.org/abs/2509.11853)] [Code]\n- **📝 说明**:\n\n#### [396] Gaussian-Plus-SDF SLAM: High-fidelity 3D Reconstruction at 150+ fps\n- **🧑‍🔬 作者**：Zhexi Peng, Kun Zhou, Tianjia Shao\n- **🏫 单位**：Zhejiang University ⟐ Hangzhou Research Institute of AI and Holographic Technology\n- **🔗 链接**：[[中英摘要](./abs/2509.11574.md)] [[arXiv:2509.11574](https://arxiv.org/abs/2509.11574)] [Code]\n- **📝 说明**:\n\n#### [397] ROSGS: Relightable Outdoor Scenes With Gaussian Splatting\n- **🧑‍🔬 作者**：Lianjun Liao, Chunhui Zhang, Tong Wu, Henglei Lv, Bailin Deng, Lin Gao\n- **🏫 单位**：North China University of Technology ⟐ Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Cardiff University\n- **🔗 链接**：[[中英摘要](./abs/2509.11275.md)] [[arXiv:2509.11275](https://arxiv.org/abs/2509.11275)] [Code]\n- **📝 说明**:\n\n#### [398] SVR-GS: Spatially Variant Regularization for Probabilistic Masks in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Ashkan Taghipour, Vahid Naghshin, Benjamin Southwell, Farid Boussaid, Hamid Laga, Mohammed Bennamoun\n- **🏫 单位**：The University of Western Australia ⟐ Dolby Laboratories ⟐ Murdoch University\n- **🔗 链接**：[[中英摘要](./abs/2509.11116.md)] [[arXiv:2509.11116](https://arxiv.org/abs/2509.11116)] [Code]\n- **📝 说明**:\n\n#### [399] SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting\n- **🧑‍🔬 作者**：Mahtab Dahaghin, Milind G. 
Padalkar, Matteo Toso, Alessio Del Bue\n- **🏫 单位**：Istituto Italiano di Tecnologia\n- **🔗 链接**：[[中英摘要](./abs/2509.07809.md)] [[arXiv:2509.07809](https://arxiv.org/abs/2509.07809)] [Code]\n- **📝 说明**:\n\n#### [400] PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image\n- **🧑‍🔬 作者**：Peng Li, Yisheng He, Yingdong Hu, Yuan Dong, Weihao Yuan, Yuan Liu, Zilong Dong, Yike Guo\n- **🏫 单位**：HKUST ⟐ Tongyi Lab, Alibaba Group\n- **🔗 链接**：[[中英摘要](./abs/2509.07552.md)] [[arXiv:2509.07552](https://arxiv.org/abs/2509.07552)] [Code]\n- **📝 说明**:\n\n#### [401] Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning\n- **🧑‍🔬 作者**：Wenzhi Guo, Bing Wang\n- **🏫 单位**：The Hong Kong Polytechnic University ⟐ Nanjing University ⟐ Nanjing University of Aeronautics and Astronautics\n- **🔗 链接**：[[中英摘要](./abs/2509.07493.md)] [[arXiv:2509.07493](https://arxiv.org/abs/2509.07493)] [[Code](https://github.com/DARYL-GWZ/DIGS)]\n- **📝 说明**:\n\n#### [402] DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation\n- **🧑‍🔬 作者**：Ze-Xin Yin, Jiaxiong Qiu, Liu Liu, Xinjie Wang, Wei Sui, Zhizhong Su, Jian Yang, Jin Xie\n- **🏫 单位**：Nankai University ⟐ Nanjing University ⟐ Horizon Robotics ⟐ D-Robotics\n- **🔗 链接**：[[中英摘要](./abs/2509.07435.md)] [[arXiv:2509.07435](https://arxiv.org/abs/2509.07435)] [[Code](https://github.com/ZX-Yin/DreamLifting)]\n- **📝 说明**:\n\n#### [403] 3DOF+Quantization: 3DGS quantization for large scenes with limited Degrees of Freedom\n- **🧑‍🔬 作者**：Matthieu Gendrin, Stéphane Pateux, Théo Ladune\n- **🏫 单位**：Orange Innovation\n- **🔗 链接**：[[中英摘要](./abs/2509.06400.md)] [[arXiv:2509.06400](https://arxiv.org/abs/2509.06400)] [Code]\n- **📝 说明**:\n\n#### [404] Reconstruction and Reenactment Separated Method for Realistic Gaussian Head\n- **🧑‍🔬 作者**：Zhiling Ye, Cong Zhou, Xiubao Zhang, Haifeng Shen, Weihong Deng, Quan Lu\n- **🏫 单位**：Mashang Consumer Finance Co., Ltd.\n- **🔗 链接**：[[中英摘要](./abs/2509.05582.md)] [[arXiv:2509.05582](https://arxiv.org/abs/2509.05582)] [Code]\n- **📝 说明**:\n\n#### [405] Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Sen Wang, Kunyi Li, Siyun Liang, Elena Alegret, Jing Ma, Nassir Navab, Stefano Gasperini\n- **🏫 单位**：Technical University of Munich ⟐ Munich Center for Machine Learning ⟐ VisualAIs ⟐ Ludwig Maximilian University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2509.05515.md)] [[arXiv:2509.05515](https://arxiv.org/abs/2509.05515)] [Code]\n- **📝 说明**:\n\n#### [406] GeoSplat: A Deep Dive into Geometry-Constrained Gaussian Splatting\n- **🧑‍🔬 作者**：Yangming Li, Chaoyu Liu, Lihao Liu, Simon Masnou, Carola-Bibiane Schönlieb\n- **🏫 单位**：University of Cambridge ⟐ Université Claude Bernard Lyon\n- **🔗 链接**：[[中英摘要](./abs/2509.05075.md)] [[arXiv:2509.05075](https://arxiv.org/abs/2509.05075)] [Code]\n- **📝 说明**:\n\n#### [407] SSGaussian: Semantic-Aware and Structure-Preserving 3D Style Transfer\n- **🧑‍🔬 作者**：Jimin Xu, Bosheng Qin, Tao Jin, Zhou Zhao, Zhenhui Ye, Jun Yu, Fei Wu\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2509.04379.md)] [[arXiv:2509.04379](https://arxiv.org/abs/2509.04379)] [Code]\n- **📝 说明**:\n\n#### [408] ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction\n- **🧑‍🔬 作者**：Sankeerth Durvasula, Sharanshangar Muhunthan, Zain Moustafa, Richard Chen, Ruofan Liang, Yushi Guan, Nilesh Ahuja, Nilesh Jain, Selvakumar Panneer, Nandita Vijaykumar\n- **🏫 单位**：University of Toronto ⟐ Intel\n- 
**🔗 链接**：[[中英摘要](./abs/2509.03775.md)] [[arXiv:2509.03775](https://arxiv.org/abs/2509.03775)] [Code]\n- **📝 说明**:\n\n#### [409] GRMM: Real-Time High-Fidelity Gaussian Morphable Head Model with Learned Residuals\n- **🧑‍🔬 作者**：Mohit Mendiratta, Mayur Deshmukh, Kartik Teotia, Vladislav Golyanik, Adam Kortylewski, Christian Theobalt\n- **🏫 单位**：Max Planck Institute for Informatics and Saarland University ⟐ University of Freiburg\n- **🔗 链接**：[[中英摘要](./abs/2509.02141.md)] [[arXiv:2509.02141](https://arxiv.org/abs/2509.02141)] [Code]\n- **📝 说明**:\n\n#### [410] 2D Gaussian Splatting with Semantic Alignment for Image Inpainting\n- **🧑‍🔬 作者**：Hongyu Li, Chaofeng Chen, Xiaoming Li, Guangming Lu\n- **🏫 单位**：Harbin Institute of Technology, Shenzhen ⟐ Wuhan University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2509.01964.md)] [[arXiv:2509.01964](https://arxiv.org/abs/2509.01964)] [[Code](https://github.com/hitlhy715/2DGS_inpaint)]\n- **📝 说明**:\n\n#### [411] UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring\n- **🧑‍🔬 作者**：Zhijing Wu, Longguang Wang\n- **🏫 单位**：University of Cambridge ⟐ The Shenzhen Campus of Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2509.00831.md)] [[arXiv:2509.00831](https://arxiv.org/abs/2509.00831)] [Code]\n- **📝 说明**:\n\n#### [412] SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting\n- **🧑‍🔬 作者**：Zhuodong Jiang, Haoran Wang, Guoxi Huang, Brett Seymour, Nantheera Anantrasirichai\n- **🏫 单位**：University of Bristol ⟐ Submerged Resources Centre\n- **🔗 链接**：[[中英摘要](./abs/2509.00800.md)] [[arXiv:2509.00800](https://arxiv.org/abs/2509.00800)] [Code]\n- **📝 说明**:\n\n#### [413] MarkSplatter: Generalizable Watermarking for 3D Gaussian Splatting Model via Splatter Image Structure\n- **🧑‍🔬 作者**：Xiufeng Huang, Ziyuan Luo, Qi Song, Ruofei Wang, Renjie Wan\n- **🏫 单位**：Hong Kong Baptist University\n- **🔗 链接**：[[中英摘要](./abs/2509.00757.md)] [[arXiv:2509.00757](https://arxiv.org/abs/2509.00757)] [Code]\n- **📝 说明**:\n\n#### [414] AGS: Accelerating 3D Gaussian Splatting SLAM via CODEC-Assisted Frame Covisibility Detection\n- **🧑‍🔬 作者**：Houshu He, Naifeng Jing, Li Jiang, Xiaoyao Liang, Zhuoran Song\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2509.00433.md)] [[arXiv:2509.00433](https://arxiv.org/abs/2509.00433)] [Code]\n- **📝 说明**:\n\n#### [415] Complete Gaussian Splats from a Single Image with Denoising Diffusion Models\n- **🧑‍🔬 作者**：Ziwei Liao, Mohamed Sayed, Steven L. 
Waslander, Sara Vicente, Daniyar Turmukhambetov, Michael Firman\n- **🏫 单位**：University of Toronto ⟐ Niantic Spatial\n- **🔗 链接**：[[中英摘要](./abs/2508.21542.md)] [[arXiv:2508.21542](https://arxiv.org/abs/2508.21542)] [Code]\n- **📝 说明**:\n\n#### [416] Scale-GS: Efficient Scalable Gaussian Splatting via Redundancy-filtering Training on Streaming Content\n- **🧑‍🔬 作者**：Jiayu Yang, Weijian Su, Songqian Zhang, Yuqi Han, Jinli Suo, Qiang Zhang\n- **🏫 单位**：Dalian University of Technology ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2508.21444.md)] [[arXiv:2508.21444](https://arxiv.org/abs/2508.21444)] [Code]\n- **📝 说明**:\n\n#### [417] ARGS: Advanced Regularization on Aligning Gaussians over the Surface\n- **🧑‍🔬 作者**：Jeong Uk Lee, Sung Hee Choi\n- **🏫 单位**：KAIST\n- **🔗 链接**：[[中英摘要](./abs/2508.21344.md)] [[arXiv:2508.21344](https://arxiv.org/abs/2508.21344)] [Code]\n- **📝 说明**:\n\n#### [418] DrivingGaussian++: Towards Realistic Reconstruction and Editable Simulation for Surrounding Dynamic Driving Scenes\n- **🧑‍🔬 作者**：Yajiao Xiong, Xiaoyu Zhou, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang\n- **🏫 单位**：Peking University ⟐ Google DeepMind ⟐ University of California, Merced\n- **🔗 链接**：[[中英摘要](./abs/2508.20965.md)] [[arXiv:2508.20965](https://arxiv.org/abs/2508.20965)] [Code]\n- **📝 说明**: Preliminary work DrivingGaussian has been published in CVPR 2024\n\n#### [419] AvatarBack: Back-Head Generation for Complete 3D Avatars from Front-View Images\n- **🧑‍🔬 作者**：Shiqi Xin, Xiaolin Zhang, Yanbin Liu, Peng Zhang, Caifeng Shan\n- **🏫 单位**：Shandong University of Science and Technology ⟐ Auckland University of Technology ⟐ Nanjing University\n- **🔗 链接**：[[中英摘要](./abs/2508.20623.md)] [[arXiv:2508.20623](https://arxiv.org/abs/2508.20623)] [Code]\n- **📝 说明**:\n\n#### [420] Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation\n- **🧑‍🔬 作者**：Jiusi Li, Jackson Jiang, Jinyu Miao, Miao Long, Tuopu Wen, Peijin Jia, Shengxiang Liu, Chunlei Yu, Maolin Liu, Yuzhan Cai, Kun Jiang, Mengmeng Yang, Diange Yang\n- **🏫 单位**：Tsinghua University ⟐ WUWEN AI ⟐ PhiGent Robotics\n- **🔗 链接**：[[中英摘要](./abs/2508.20471.md)] [[arXiv:2508.20471](https://arxiv.org/abs/2508.20471)] [Code]\n- **📝 说明**:\n\n#### [421] MAPo: Motion-Aware Partitioning of Deformable 3D Gaussian Splatting for High-Fidelity Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Han Jiao, Jiakai Sun, Yexing Xu, Lei Zhao, Wei Xing, Huaizhong Lin\n- **🏫 单位**：Zhejiang University ⟐ Sun Yat-Sen University\n- **🔗 链接**：[[中英摘要](./abs/2508.19786.md)] [[arXiv:2508.19786](https://arxiv.org/abs/2508.19786)] [Code]\n- **📝 说明**:\n\n#### [422] Style4D-Bench: A Benchmark Suite for 4D Stylization\n- **🧑‍🔬 作者**：Beiqi Chen, Shuai Shao, Haitang Feng, Jianhuang Lai, Jianlou Si, Guangcong Wang\n- **🏫 单位**：Harbin Institute of Technology ⟐ Great Bay University ⟐ Nanjing University ⟐ Sun Yat-Sen University ⟐ Alibaba Group\n- **🔗 链接**：[[中英摘要](./abs/2508.19243.md)] [[arXiv:2508.19243](https://arxiv.org/abs/2508.19243)] [[Code](https://github.com/Becky-catherine/Style4D-Bench)]\n- **📝 说明**:\n\n#### [423] ColorGS: High-fidelity Surgical Scene Reconstruction with Colored Gaussian Splatting\n- **🧑‍🔬 作者**：Qun Ji, Peng Li, Mingqiang Wei\n- **🏫 单位**：Nanjing University of Aeronautics and Astronautics\n- **🔗 链接**：[[中英摘要](./abs/2508.18696.md)] [[arXiv:2508.18696](https://arxiv.org/abs/2508.18696)] [Code]\n- **📝 说明**:\n\n#### [424] FastAvatar: Instant 3D Gaussian Splatting for Faces from Single Unconstrained Poses\n- **🧑‍🔬 作者**：Hao Liang, Zhixuan Ge, Ashish Tiwari, Soumendu Majee, G.M. 
Dilshan Godaliyadda, Ashok Veeraraghavan, Guha Balakrishnan\n- **🏫 单位**：Rice University ⟐ Samsung Research America\n- **🔗 链接**：[[中英摘要](./abs/2508.18389.md)] [[arXiv:2508.18389](https://arxiv.org/abs/2508.18389)] [[Code](https://github.com/hliang2/FastAvatar)]\n- **📝 说明**:\n\n#### [425] Camera Pose Refinement via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Lulu Hao, Lipu Zhou, Zhenzhong Wei, Xu Wang\n- **🏫 单位**：Beihang University ⟐ Meituan\n- **🔗 链接**：[[中英摘要](./abs/2508.17876.md)] [[arXiv:2508.17876](https://arxiv.org/abs/2508.17876)] [Code]\n- **📝 说明**:\n\n#### [426] IDU: Incremental Dynamic Update of Existing 3D Virtual Environments with New Imagery Data\n- **🧑‍🔬 作者**：Meida Chen, Luis Leal, Yue Hu, Rong Liu, Butian Xiong, Andrew Feng, Jiuyi Xu, Yangming Shi\n- **🏫 单位**：USC Institute for Creative Technologies ⟐ Colorado School of Mines\n- **🔗 链接**：[[中英摘要](./abs/2508.17579.md)] [[arXiv:2508.17579](https://arxiv.org/abs/2508.17579)] [Code]\n- **📝 说明**:\n\n#### [427] Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels\n- **🧑‍🔬 作者**：Long Le, Ryan Lucas, Chen Wang, Chuhao Chen, Dinesh Jayaraman, Eric Eaton, Lingjie Liu\n- **🏫 单位**：University of Pennsylvania ⟐ Massachusetts Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.17437.md)] [[arXiv:2508.17437](https://arxiv.org/abs/2508.17437)] [[Code](https://github.com/vlongle/pixie)]\n- **📝 说明**:\n\n#### [428] Fiducial Marker Splatting for High-Fidelity Robotics Simulations\n- **🧑‍🔬 作者**：Diram Tabaa, Gianni Di Caro\n- **🏫 单位**：Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2508.17012.md)] [[arXiv:2508.17012](https://arxiv.org/abs/2508.17012)] [Code]\n- **📝 说明**:\n\n#### [429] Enhancing Novel View Synthesis from extremely sparse views with SfM-free 3D Gaussian Splatting Framework\n- **🧑‍🔬 作者**：Zongqi He, Hanmin Li, Kin-Chung Chan, Yushen Zuo, Hao Xie, Zhe Xiao, Jun Xiao, Kin-Man Lam\n- **🏫 单位**：The Hong Kong Polytechnic University ⟐ Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2508.15457.md)] [[arXiv:2508.15457](https://arxiv.org/abs/2508.15457)] [Code]\n- **📝 说明**:\n\n#### [430] DriveSplat: Decoupled Driving Scene Reconstruction with Geometry-enhanced Partitioned Neural Gaussians\n- **🧑‍🔬 作者**：Cong Wang, Xianda Guo, Wenbo Xu, Wei Tian, Ruiqi Song, Chenming Zhang, Lingxi Li, Long Chen\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Wuhan University ⟐ Waytous ⟐ Tongji University ⟐ Xi’an Jiaotong University ⟐ Purdue University\n- **🔗 链接**：[[中英摘要](./abs/2508.15376.md)] [[arXiv:2508.15376](https://arxiv.org/abs/2508.15376)] [[Code](https://github.com/Michael-Evans-Savitar/DriveSplat)]\n- **📝 说明**:\n\n#### [431] Image-Conditioned 3D Gaussian Splat Quantization\n- **🧑‍🔬 作者**：Xinshuang Liu, Runfa Blark Li, Keito Suzuki, Truong Nguyen\n- **🏫 单位**：University of California, San Diego\n- **🔗 链接**：[[中英摘要](./abs/2508.15372.md)] [[arXiv:2508.15372](https://arxiv.org/abs/2508.15372)] [[Code](https://github.com/XinshuangL/ICGS-Quantizer)]\n- **📝 说明**:\n\n#### [432] MeSS: City Mesh-Guided Outdoor Scene Generation with Cross-View Consistent Diffusion\n- **🧑‍🔬 作者**：Xuyang Chen, Zhijun Zhai, Kaixuan Zhou, Zengmao Wang, Jianan He, Dong Wang, Yanfeng Zhang, Mingwei Sun, Rüdiger Westermann, Konrad Schindler, Liqiu Meng\n- **🏫 单位**：TU Munich ⟐ Wuhan University ⟐ Huawei Riemann Lab ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](./abs/2508.15169.md)] [[arXiv:2508.15169](https://arxiv.org/abs/2508.15169)] [Code]\n- **📝 说明**:\n\n#### [433] Zero-shot Volumetric CT Super-Resolution using 3D Gaussian Splatting with Upsampled 2D X-ray 
Projection Priors\n- **🧑‍🔬 作者**：Jeonghyun Noh, Hyun-Jic Oh, Byungju Chae, Won-Ki Jeong\n- **🏫 单位**：Korea University\n- **🔗 链接**：[[中英摘要](./abs/2508.15151.md)] [[arXiv:2508.15151](https://arxiv.org/abs/2508.15151)] [Code]\n- **📝 说明**:\n\n#### [434] Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds\n- **🧑‍🔬 作者**：Jia Lu, Taoran Yi, Jiemin Fang, Chen Yang, Chuiyun Wu, Wei Shen, Wenyu Liu, Qi Tian, Xinggang Wang\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ Huawei Inc. ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2508.14892.md)] [[arXiv:2508.14892](https://arxiv.org/abs/2508.14892)] [[Code](https://github.com/hustvl/Snap-Snap)]\n- **📝 说明**:\n\n#### [435] GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting\n- **🧑‍🔬 作者**：Jiaxin Wei, Stefan Leutenegger, Simon Schaefer\n- **🏫 单位**：Technical University of Munich ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](./abs/2508.14717.md)] [[arXiv:2508.14717](https://arxiv.org/abs/2508.14717)] [[Code](https://github.com/GSFix3D/GSFix3D)]\n- **📝 说明**:\n\n#### [436] GeMS: Efficient Gaussian Splatting for Extreme Motion Blur\n- **🧑‍🔬 作者**：Gopi Raju Matta, Trisha Reddypalli, Vemunuri Divya Madhuri, Kaushik Mitra\n- **🏫 单位**：IIT Madras\n- **🔗 链接**：[[中英摘要](./abs/2508.14682.md)] [[arXiv:2508.14682](https://arxiv.org/abs/2508.14682)] [Code]\n- **📝 说明**:\n\n#### [437] GOGS: High-Fidelity Geometry and Relighting for Glossy Objects via Gaussian Surfels\n- **🧑‍🔬 作者**：Xingyuan Yang, Min Wei\n- **🏫 单位**：Chengdu University of Information Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.14563.md)] [[arXiv:2508.14563](https://arxiv.org/abs/2508.14563)] [Code]\n- **📝 说明**:\n\n#### [438] From Slices to Structures: Unsupervised 3D Reconstruction of Female Pelvic Anatomy from Freehand Transvaginal Ultrasound\n- **🧑‍🔬 作者**：Max Krähenmann, Sergio Tascon-Morales, Fabian Laumer, Julia E. Vogt, Ece Ozkan\n- **🏫 单位**：ETH Zurich ⟐ Scanvio Medical AG\n- **🔗 链接**：[[中英摘要](./abs/2508.14552.md)] [[arXiv:2508.14552](https://arxiv.org/abs/2508.14552)] [Code]\n- **📝 说明**:\n\n#### [439] D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis\n- **🧑‍🔬 作者**：Yuhang Guo, Kaijun Deng, Siyang Song, Jindong Xie, Wenhui Ma, Linlin Shen\n- **🏫 单位**：Shenzhen University ⟐ Guangdong Provincial Key Laboratory of Intelligent Information Processing ⟐ University of Exeter\n- **🔗 链接**：[[中英摘要](./abs/2508.14449.md)] [[arXiv:2508.14449](https://arxiv.org/abs/2508.14449)] [Code]\n- **📝 说明**:\n\n#### [440] Reconstruction Using the Invisible: Intuition from NIR and Metadata for Enhanced 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Gyusam Chang, Tuan-Anh Vu, Vivek Alumootil, Harris Song, Deanna Pham, Sangpil Kim, M. 
Khalid Jawed\n- **🏫 单位**：Korea University ⟐ University of California\n- **🔗 链接**：[[中英摘要](./abs/2508.14443.md)] [[arXiv:2508.14443](https://arxiv.org/abs/2508.14443)] [Code]\n- **📝 说明**:\n\n#### [441] GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting\n- **🧑‍🔬 作者**：Elena Alegret, Kunyi Li, Sen Wang, Siyun Liang, Michael Niemeyer, Stefano Gasperini, Nassir Navab, Federico Tombari\n- **🏫 单位**：Technical University of Munich ⟐ Universitat Politècnica de Catalunya ⟐ Google ⟐ Munich Center for Machine Learning ⟐ University of Tübingen ⟐ ETH Zurich ⟐ VisualAIs\n- **🔗 链接**：[[中英摘要](./abs/2508.14278.md)] [[arXiv:2508.14278](https://arxiv.org/abs/2508.14278)] [Code]\n- **📝 说明**:\n\n#### [442] Distilled-3DGS: Distilled 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Lintao Xiang, Xinkai Chen, Jianhuang Lai, Guangcong Wang\n- **🏫 单位**：The University of Manchester ⟐ Great Bay University ⟐ Sun Yat-Sen University\n- **🔗 链接**：[[中英摘要](./abs/2508.14037.md)] [[arXiv:2508.14037](https://arxiv.org/abs/2508.14037)] [[Code](https://github.com/lt-xiang/Distilled-3DGS)]\n- **📝 说明**:\n\n#### [443] Online 3D Gaussian Splatting Modeling with Novel View Selection\n- **🧑‍🔬 作者**：Byeonggwon Lee, Junkyu Park, Khang Truong Giang, Soohwan Song\n- **🏫 单位**：Dongguk University ⟐ 42dot\n- **🔗 链接**：[[中英摘要](./abs/2508.14014.md)] [[arXiv:2508.14014](https://arxiv.org/abs/2508.14014)] [Code]\n- **📝 说明**:\n\n#### [444] PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis\n- **🧑‍🔬 作者**：Chunji Lv, Zequn Chen, Donglin Di, Weinan Zhang, Hao Li, Wei Chen, Changsheng Li\n- **🏫 单位**：Beijing Institute of Technology ⟐ Li Auto Inc. ⟐ Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.13911.md)] [[arXiv:2508.13911](https://arxiv.org/abs/2508.13911)] [Code]\n- **📝 说明**:\n\n#### [445] EAvatar: Expression-Aware Head Avatar Reconstruction with Generative Geometry Priors\n- **🧑‍🔬 作者**：Shikun Zhang, Cunjian Chen, Yiqun Wang, Qiuhong Ke, Yong Li\n- **🏫 单位**：Monash University ⟐ Chongqing University\n- **🔗 链接**：[[中英摘要](./abs/2508.13537.md)] [[arXiv:2508.13537](https://arxiv.org/abs/2508.13537)] [[Code](https://github.com/Kkun12345/EAvatar)]\n- **📝 说明**:\n\n#### [446] InnerGS: Internal Scenes Rendering via Factorized 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Shuxin Liang, Yihan Xiao, Wenlu Tang\n- **🏫 单位**：University of Alberta ⟐ Sichuan University ⟐ University of Alberta\n- **🔗 链接**：[[中英摘要](./abs/2508.13287.md)] [[arXiv:2508.13287](https://arxiv.org/abs/2508.13287)] [[Code](https://github.com/Shuxin-Liang/InnerGS)]\n- **📝 说明**:\n\n#### [447] IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion\n- **🧑‍🔬 作者**：Wenhao Hu, Zesheng Li, Haonan Zhou, Liu Liu, Xuexiang Wen, Zhizhong Su, Xi Li, Gaoang Wang\n- **🏫 单位**：Zhejiang University ⟐ Nanyang Technological University ⟐ Horizon Robotics\n- **🔗 链接**：[[中英摘要](./abs/2508.13153.md)] [[arXiv:2508.13153](https://arxiv.org/abs/2508.13153)] [[Code](https://github.com/whhu7/IGFuse-code)]\n- **📝 说明**:\n\n#### [448] Improving Densification in 3D Gaussian Splatting for High-Fidelity Rendering\n- **🧑‍🔬 作者**：Xiaobin Deng, Changyu Diao, Min Li, Ruohan Yu, Duanqing Xu\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2508.12313.md)] [[arXiv:2508.12313](https://arxiv.org/abs/2508.12313)] [[Code](https://github.com/XiaoBin2001/Improved-GS)]\n- **📝 说明**:\n\n#### [449] InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes\n- **🧑‍🔬 作者**：Hongyuan Liu, Haochen Yu, Jianfei Jiang, Qiankun Liu, Jiansheng Chen, Huimin Ma\n- **🏫 单位**：University of 
Science and Technology Beijing\n- **🔗 链接**：[[中英摘要](./abs/2508.12015.md)] [[arXiv:2508.12015](https://arxiv.org/abs/2508.12015)] [Code]\n- **📝 说明**:\n\n#### [450] ComplicitSplat: Downstream Models are Vulnerable to Blackbox Attacks by 3D Gaussian Splat Camouflages\n- **🧑‍🔬 作者**：Matthew Hull, Haoyang Yang, Pratham Mehta, Mansi Phute, Aeree Cho, Haorang Wang, Matthew Lau, Wenke Lee, Wilian Lunardi, Martin Andreoni, Duen Horng Chau\n- **🏫 单位**：Georgia Tech ⟐ Technology Innovation Institute\n- **🔗 链接**：[[中英摘要](./abs/2508.11854.md)] [[arXiv:2508.11854](https://arxiv.org/abs/2508.11854)] [Code]\n- **📝 说明**:\n\n#### [451] Multi-Sample Anti-Aliasing and Constrained Optimization for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zheng Zhou, Jia-Chen Zhang, Yu-Jie Xiong, Chun-Ming Xia\n- **🏫 单位**：Shanghai University of Engineering Science\n- **🔗 链接**：[[中英摘要](./abs/2508.10507.md)] [[arXiv:2508.10507](https://arxiv.org/abs/2508.10507)] [Code]\n- **📝 说明**:\n\n#### [452] EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yuning Huang, Jiahao Pang, Fengqing Zhu, Dong Tian\n- **🏫 单位**：InterDigital\n- **🔗 链接**：[[中英摘要](./abs/2508.10227.md)] [[arXiv:2508.10227](https://arxiv.org/abs/2508.10227)] [Code]\n- **📝 说明**:\n\n#### [453] GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors\n- **🧑‍🔬 作者**：Xingyilang Yin, Qi Zhang, Jiahao Chang, Ying Feng, Qingnan Fan, Xi Yang, Chi-Man Pun, Huaqi Zhang, Xiaodong Cun\n- **🏫 单位**：University of Macau ⟐ VIVO ⟐ CUHKSZ ⟐ Xidian University ⟐ Great Bay University\n- **🔗 链接**：[[中英摘要](./abs/2508.09667.md)] [[arXiv:2508.09667](https://arxiv.org/abs/2508.09667)] [[Code](https://github.com/GVCLab/GSFixer)]\n- **📝 说明**:\n\n#### [454] Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation\n- **🧑‍🔬 作者**：Xu Tang, Junan Jia, Yijing Wang, Jingjing Ma, Xiangrong Zhang\n- **🏫 单位**：Xidian University\n- **🔗 链接**：[[中英摘要](./abs/2508.09626.md)] [[arXiv:2508.09626](https://arxiv.org/abs/2508.09626)] [Code]\n- **📝 说明**:\n\n#### [455] DualPhys-GS: Dual Physically-Guided 3D Gaussian Splatting for Underwater Scene Reconstruction\n- **🧑‍🔬 作者**：Jiachen Li, Guangzhi Han, Jin Wan, Yuan Gao, Delong Han\n- **🏫 单位**：Shandong Computer Science Center ⟐ Shandong Fundamental Research Center for Computer Science\n- **🔗 链接**：[[中英摘要](./abs/2508.09610.md)] [[arXiv:2508.09610](https://arxiv.org/abs/2508.09610)] [Code]\n- **📝 说明**:\n\n#### [456] SkySplat: Generalizable 3D Gaussian Splatting from Multi-Temporal Sparse Satellite Images\n- **🧑‍🔬 作者**：Xuejun Huang, Xinyi Liu, Yi Wan, Zhi Zheng, Bin Zhang, Mingtao Xiong, Yingying Pei, Yongjun Zhang\n- **🏫 单位**：Wuhan University ⟐ Technology Innovation Center for Collaborative Applications of Natural Resources Data in GBA, Ministry of Natural Resources ⟐ The Chinese University of Hong Kong ⟐ China Railway Siyuan Survey and Design Group Co., LTD\n- **🔗 链接**：[[中英摘要](./abs/2508.09479.md)] [[arXiv:2508.09479](https://arxiv.org/abs/2508.09479)] [Code]\n- **📝 说明**:\n\n#### [457] SAGOnline: Segment Any Gaussians Online\n- **🧑‍🔬 作者**：Wentao Sun, Quanyun Wu, Hanqing Xu, Kyle Gao, Zhengsen Xu, Yiping Chen, Dedong Zhang, Lingfei Ma, John S. 
Zelek, Jonathan Li\n- **🏫 单位**：East China Normal University ⟐ University of Waterloo ⟐ University of Calgary ⟐ Sun Yat-Sen University\n- **🔗 链接**：[[中英摘要](./abs/2508.08219.md)] [[arXiv:2508.08219](https://arxiv.org/abs/2508.08219)] [Code]\n- **📝 说明**:\n\n#### [458] NeeCo: Image Synthesis of Novel Instrument States Based on Dynamic and Deformable 3D Gaussian Reconstruction\n- **🧑‍🔬 作者**：Tianle Zeng, Junlei Hu, Gerardo Loza Galindo, Sharib Ali, Duygu Sarikaya, Pietro Valdastri, Dominic Jones\n- **🏫 单位**：University of Leeds ⟐ STORM Lab\n- **🔗 链接**：[[中英摘要](./abs/2508.07897.md)] [[arXiv:2508.07897](https://arxiv.org/abs/2508.07897)] [Code]\n- **📝 说明**:\n\n#### [459] Touch-Augmented Gaussian Splatting for Enhanced 3D Scene Reconstruction\n- **🧑‍🔬 作者**：Yuchen Gao, Xiao Xu, Eckehard Steinbach, Daniel E. Lucani, Qi Zhang\n- **🏫 单位**：Aarhus University ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2508.07717.md)] [[arXiv:2508.07717](https://arxiv.org/abs/2508.07717)] [Code]\n- **📝 说明**:\n\n#### [460] Novel View Synthesis with Gaussian Splatting: Impact on Photogrammetry Model Accuracy and Resolution\n- **🧑‍🔬 作者**：Pranav Chougule\n- **🏫 单位**：Arizona State University\n- **🔗 链接**：[[中英摘要](./abs/2508.07483.md)] [[arXiv:2508.07483](https://arxiv.org/abs/2508.07483)] [[Code](https://github.com/pranavc2255/gaussian-splatting-novel-view-render)]\n- **📝 说明**:\n\n#### [461] DIP-GS: Deep Image Prior For Gaussian Splatting Sparse View Recovery\n- **🧑‍🔬 作者**：Rajaei Khatib, Raja Giryes\n- **🏫 单位**：Tel Aviv University\n- **🔗 链接**：[[中英摘要](./abs/2508.07372.md)] [[arXiv:2508.07372](https://arxiv.org/abs/2508.07372)] [Code]\n- **📝 说明**:\n\n#### [462] Fading the Digital Ink: A Universal Black-Box Attack Framework for 3DGS Watermarking Systems\n- **🧑‍🔬 作者**：Qingyuan Zeng, Shu Jiang, Jiajing Lin, Zhenzhong Wang, Kay Chen Tan, Min Jiang\n- **🏫 单位**：Xiamen University ⟐ The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](./abs/2508.07263.md)] [[arXiv:2508.07263](https://arxiv.org/abs/2508.07263)] [Code]\n- **📝 说明**:\n\n#### [463] 3D Gaussian Representations with Motion Trajectory Field for Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Xuesong Li, Lars Petersson, Vivien Rolland\n- **🏫 单位**：CSIRO ⟐ The Australian National University\n- **🔗 链接**：[[中英摘要](./abs/2508.07182.md)] [[arXiv:2508.07182](https://arxiv.org/abs/2508.07182)] [Code]\n- **📝 说明**:\n\n#### [464] 3DGS-VBench: A Comprehensive Video Quality Evaluation Benchmark for 3DGS Compression\n- **🧑‍🔬 作者**：Yuke Xing, William Gordon, Qi Yang, Kaifa Yang, Jiarui Wang, Yiling Xu\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Basis Independent Silicon Valley ⟐ University of Missouri-Kansas City\n- **🔗 链接**：[[中英摘要](./abs/2508.07038.md)] [[arXiv:2508.07038](https://arxiv.org/abs/2508.07038)] [[Code](https://github.com/YukeXing/3DGS-VBench)]\n- **📝 说明**:\n\n#### [465] Evaluating Fisheye-Compatible 3D Gaussian Splatting Methods on Real Images Beyond 180 Degree Field of View\n- **🧑‍🔬 作者**：Ulas Gunes, Matias Turkulainen, Juho Kannala, Esa Rahtu\n- **🏫 单位**：Tampere University ⟐ Aalto University ⟐ University of Oulu\n- **🔗 链接**：[[中英摘要](./abs/2508.06968.md)] [[arXiv:2508.06968](https://arxiv.org/abs/2508.06968)] [Code]\n- **📝 说明**:\n\n#### [466] UW-3DGS: Underwater 3D Reconstruction with Physics-Aware Gaussian Splatting\n- **🧑‍🔬 作者**：Wenpeng Xing, Jie Chen, Zaifeng Yang, Changting Lin, Jianfeng Dong, Chaochao Chen, Xun Zhou, Meng Han\n- **🏫 单位**：Zhejiang University ⟐ Hong Kong Baptist University ⟐ A*STAR ⟐ Binjiang Institute of Zhejiang University ⟐ Zhejiang Gongshang 
University ⟐ Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.06169.md)] [[arXiv:2508.06169](https://arxiv.org/abs/2508.06169)] [Code]\n- **📝 说明**:\n\n#### [467] A 3DGS-Diffusion Self-Supervised Framework for Normal Estimation from a Single Image\n- **🧑‍🔬 作者**：Yanxing Liang, Yinghui Wang, Jinlong Yang, Wei Li\n- **🏫 单位**：Jiangnan University\n- **🔗 链接**：[[中英摘要](./abs/2508.05950.md)] [[arXiv:2508.05950](https://arxiv.org/abs/2508.05950)] [Code]\n- **📝 说明**:\n\n#### [468] Refining Gaussian Splatting: A Volumetric Densification Approach\n- **🧑‍🔬 作者**：Mohamed Abdul Gafoor, Marius Preda, Titus Zaharia\n- **🏫 单位**：Institut Polytechnique de Paris\n- **🔗 链接**：[[中英摘要](./abs/2508.05187.md)] [[arXiv:2508.05187](https://arxiv.org/abs/2508.05187)] [Code]\n- **📝 说明**:\n\n#### [469] UGOD: Uncertainty-Guided Differentiable Opacity and Soft Dropout for Enhanced Sparse-View 3DGS\n- **🧑‍🔬 作者**：Zhihao Guo, Peng Wang, Zidong Chen, Xiangyu Kong, Yan Lyu, Guanyu Gao, Liangxiu Han\n- **🏫 单位**：Manchester Metropolitan University ⟐ Imperial College London ⟐ University of Exeter ⟐ Southeast University ⟐ Nanjing University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2508.04968.md)] [[arXiv:2508.04968](https://arxiv.org/abs/2508.04968)] [Code]\n- **📝 说明**:\n\n#### [470] Laplacian Analysis Meets Dynamics Modelling: Gaussian Splatting for 4D Reconstruction\n- **🧑‍🔬 作者**：Yifan Zhou, Beizhen Zhao, Pengcheng Wu, Hao Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou) ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2508.04966.md)] [[arXiv:2508.04966](https://arxiv.org/abs/2508.04966)] [Code]\n- **📝 说明**:\n\n#### [471] Perceive-Sample-Compress: Towards Real-Time 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zijian Wang, Beizhen Zhao, Hao Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2508.04965.md)] [[arXiv:2508.04965](https://arxiv.org/abs/2508.04965)] [Code]\n- **📝 说明**:\n\n#### [472] CryoGS: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction\n- **🧑‍🔬 作者**：Suyi Chen, Haibin Ling\n- **🏫 单位**：Stony Brook University\n- **🔗 链接**：[[中英摘要](./abs/2508.04929.md)] [[arXiv:2508.04929](https://arxiv.org/abs/2508.04929)] [Code]\n- **📝 说明**:\n\n#### [473] Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds\n- **🧑‍🔬 作者**：Haodong Zhu, Changbai Li, Yangyang Ren, Zichao Feng, Xuhui Liu, Hanlin Chen, Xiantong Zhen, Baochang Zhang\n- **🏫 单位**：Beihang University ⟐ National University of Singapore ⟐ United Imaging ⟐ Zhongguancun Academy\n- **🔗 链接**：[[中英摘要](./abs/2508.04508.md)] [[arXiv:2508.04508](https://arxiv.org/abs/2508.04508)] [Code]\n- **📝 说明**:\n\n#### [474] SplitGaussian: Reconstructing Dynamic Scenes via Visual Geometry Decomposition\n- **🧑‍🔬 作者**：Jiahui Li, Shengeng Tang, Jingxuan He, Gang Huang, Zhangye Wang, Yantao Pan, Lechao Cheng\n- **🏫 单位**：Zhejiang University ⟐ Hefei University of Technology ⟐ KAIYANG Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2508.04224.md)] [[arXiv:2508.04224](https://arxiv.org/abs/2508.04224)] [Code]\n- **📝 说明**:\n\n#### [475] DET-GS: Depth- and Edge-Aware Regularization for High-Fidelity 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zexu Huang, Min Xu, Stuart Perry\n- **🏫 单位**：University of Technology Sydney\n- **🔗 链接**：[[中英摘要](./abs/2508.04099.md)] [[arXiv:2508.04099](https://arxiv.org/abs/2508.04099)] [Code]\n- **📝 说明**:\n\n#### [476] RLGS: Reinforcement Learning-Based Adaptive Hyperparameter Tuning for Gaussian Splatting\n- **🧑‍🔬 作者**：Zhan Li, Huangying Zhan, Changyang Li, Qingan Yan, Yi 
Xu\n- **🏫 单位**：Goertek Alpha Labs\n- **🔗 链接**：[[中英摘要](./abs/2508.04078.md)] [[arXiv:2508.04078](https://arxiv.org/abs/2508.04078)] [Code]\n- **📝 说明**:\n\n#### [477] Duplex-GS: Proxy-Guided Weighted Blending for Real-Time Order-Independent Gaussian Splatting\n- **🧑‍🔬 作者**：Weihang Liu, Yuke Li, Yuxuan Li, Jingyi Yu, Xin Lou\n- **🏫 单位**：ShanghaiTech University\n- **🔗 链接**：[[中英摘要](./abs/2508.03180.md)] [[arXiv:2508.03180](https://arxiv.org/abs/2508.03180)] [Code]\n- **📝 说明**:\n\n#### [478] RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions\n- **🧑‍🔬 作者**：Anran Wu, Long Peng, Xin Di, Xueyuan Dai, Chen Wu, Yang Wang, Xueyang Fu, Yang Cao, Zheng-Jun Zha\n- **🏫 单位**：University of Science and Technology of China ⟐ Chang’an University\n- **🔗 链接**：[[中英摘要](./abs/2508.03077.md)] [[arXiv:2508.03077](https://arxiv.org/abs/2508.03077)] [Code]\n- **📝 说明**:\n\n#### [479] SA-3DGS: A Self-Adaptive Compression Method for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Liheng Zhang, Weihao Yu, Zubo Lu, Haozhi Gu, Jin Huang\n- **🏫 单位**：South China Normal University ⟐ China Telecom Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2508.03017.md)] [[arXiv:2508.03017](https://arxiv.org/abs/2508.03017)] [Code]\n- **📝 说明**: This paper has been withdrawn\n\n#### [480] GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing\n- **🧑‍🔬 作者**：Mikołaj Zieliński, Krzysztof Byrski, Tomasz Szczepanik, Przemysław Spurek\n- **🏫 单位**：Poznan University of Technology ⟐ Jagiellonian University ⟐ IDEAS Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2508.02831.md)] [[arXiv:2508.02831](https://arxiv.org/abs/2508.02831)] [[Code](https://github.com/MikolajZielinski/genie)]\n- **📝 说明**:\n\n#### [481] PMGS: Reconstruction of Projectile Motion across Large Spatiotemporal Spans via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yijun Xu, Jingrui Zhang, Yuhan Chen, Dingwen Wang, Lei Yu, Chu He\n- **🏫 单位**：Wuhan University ⟐ Chongqing University\n- **🔗 链接**：[[中英摘要](./abs/2508.02660.md)] [[arXiv:2508.02660](https://arxiv.org/abs/2508.02660)] [Code]\n- **📝 说明**:\n\n#### [482] GR-Gaussian: Graph-Based Radiative Gaussian Splatting for Sparse-View CT Reconstruction\n- **🧑‍🔬 作者**：Yikuang Yuluo, Yue Ma, Kuan Shen, Tongtong Jin, Wang Liao, Yangpu Ma, Fuquan Wang\n- **🏫 单位**：Chongqing University ⟐ HKUST\n- **🔗 链接**：[[中英摘要](./abs/2508.02408.md)] [[arXiv:2508.02408](https://arxiv.org/abs/2508.02408)] [Code]\n- **📝 说明**:\n\n#### [483] SplatSSC: Decoupled Depth-Guided Gaussian Splatting for Semantic Scene Completion\n- **🧑‍🔬 作者**：Rui Qian, Haozhi Cao, Tianchen Deng, Shenghai Yuan, Lihua Xie\n- **🏫 单位**：Nanyang Technological University ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2508.02261.md)] [[arXiv:2508.02261](https://arxiv.org/abs/2508.02261)] [Code]\n- **📝 说明**:\n\n#### [484] AG2aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing\n- **🧑‍🔬 作者**：Zhaonan Wang, Manyi Li, Changhe Tu\n- **🏫 单位**：Shandong University\n- **🔗 链接**：[[中英摘要](./abs/2508.01740.md)] [[arXiv:2508.01740](https://arxiv.org/abs/2508.01740)] [Code]\n- **📝 说明**:\n\n#### [485] MoGaFace: Momentum-Guided and Texture-Aware Gaussian Avatars for Consistent Facial Geometry\n- **🧑‍🔬 作者**：Yujian Liu, Linlang Cao, Chuang Chen, Fanyu Geng, Dongxu Shen, Peng Cao, Shidang Xu, Xiaoli Liu\n- **🏫 单位**：AiShiWeiLai AI Research ⟐ South China University of Technology ⟐ Hong Kong University of Science and Technology (Guangzhou) ⟐ Northeastern University\n- **🔗 链接**：[[中英摘要](./abs/2508.01218.md)] 
[[arXiv:2508.01218](https://arxiv.org/abs/2508.01218)] [Code]\n- **📝 说明**:\n\n#### [486] PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting\n- **🧑‍🔬 作者**：Wentao Sun, Hanqing Xu, Quanyun Wu, Dedong Zhang, Yiping Chen, Lingfei Ma, John S. Zelek, Jonathan Li\n- **🏫 单位**：University of Waterloo ⟐ East China Normal University ⟐ Sun Yat-Sen University\n- **🔗 链接**：[[中英摘要](./abs/2508.00259.md)] [[arXiv:2508.00259](https://arxiv.org/abs/2508.00259)] [Code]\n- **📝 说明**:\n\n#### [487] SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Di Li, Jie Feng, Jiahao Chen, Weisheng Dong, Guanbin Li, Yuhui Zheng, Mingtao Feng, Guangming Shi\n- **🏫 单位**：Xidian University ⟐ Sun Yat-sen University ⟐ Qinghai Normal University\n- **🔗 链接**：[[中英摘要](./abs/2507.23772.md)] [[arXiv:2507.23772](https://arxiv.org/abs/2507.23772)] [Code]\n- **📝 说明**:\n\n#### [488] Enhanced Velocity Field Modeling for Gaussian Video Reconstruction\n- **🧑‍🔬 作者**：Zhenyang Li, Xiaoyang Bai, Tongchen Zhang, Pengfei Shen, Weiwei Xu, Yifan Peng\n- **🏫 单位**：The University of Hong Kong ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2507.23704.md)] [[arXiv:2507.23704](https://arxiv.org/abs/2507.23704)] [Code]\n- **📝 说明**:\n\n#### [489] I2V-GS: Infrastructure-to-Vehicle View Transformation with Gaussian Splatting for Autonomous Driving Data Generation\n- **🧑‍🔬 作者**：Jialei Chen, Wuhao Xu, Sipeng He, Baoru Huang, Dongchun Ren\n- **🏫 单位**：Yootta ⟐ Soochow University ⟐ Southeast University ⟐ University of Liverpool\n- **🔗 链接**：[[中英摘要](./abs/2507.23683.md)] [[arXiv:2507.23683](https://arxiv.org/abs/2507.23683)] [Code]\n- **📝 说明**:\n\n#### [490] Stereo 3D Gaussian Splatting SLAM for Outdoor Urban Scenes\n- **🧑‍🔬 作者**：Xiaohan Li, Ziren Gong, Fabio Tosi, Matteo Poggi, Stefano Mattoccia, Dong Liu, Jun Wu\n- **🏫 单位**：University of Science and Technology of China ⟐ University of Bologna ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2507.23677.md)] [[arXiv:2507.23677](https://arxiv.org/abs/2507.23677)] [Code]\n- **📝 说明**:\n\n#### [491] GSFusion:Globally Optimized LiDAR-Inertial-Visual Mapping for Gaussian Splatting\n- **🧑‍🔬 作者**：Jaeseok Park, Chanoh Park, Minsu Kim, Soohwan Kim\n- **🏫 单位**：RovifyLab ⟐ Kwangwoon University\n- **🔗 链接**：[[中英摘要](./abs/2507.23273.md)] [[arXiv:2507.23273](https://arxiv.org/abs/2507.23273)] [Code]\n- **📝 说明**:\n\n#### [492] UFV-Splatter: Pose-Free Feed-Forward 3D Gaussian Splatting Adapted to Unfavorable Views\n- **🧑‍🔬 作者**：Yuki Fujimura, Takahiro Kushida, Kazuya Kitano, Takuya Funatomi, Yasuhiro Mukaigawa\n- **🏫 单位**：NAIST ⟐ Ritsumeikan University ⟐ Kyoto University\n- **🔗 链接**：[[中英摘要](./abs/2507.22342.md)] [[arXiv:2507.22342](https://arxiv.org/abs/2507.22342)] [[Code](https://github.com/yfujimura/UFV-Splatter)]\n- **📝 说明**:\n\n#### [493] MultiEditor: Controllable Multimodal Object Editing for Driving Scenarios Using 3D Gaussian Splatting Priors\n- **🧑‍🔬 作者**：Shouyi Lu, Zihan Lin, Chao Lu, Huanran Wang, Guirong Zhuo, Lianqing Zheng\n- **🏫 单位**：Tongji University ⟐ Mach Drive\n- **🔗 链接**：[[中英摘要](./abs/2507.21872.md)] [[arXiv:2507.21872](https://arxiv.org/abs/2507.21872)] [Code]\n- **📝 说明**:\n\n#### [494] S3LAM: Surfel Splatting SLAM for Geometrically Accurate Tracking and Mapping\n- **🧑‍🔬 作者**：Ruoyu Fan, Yuhui Wen, Jiajia Dai, Tao Zhang, Long Zeng, Yong-jin Liu\n- **🏫 单位**：Tsinghua University ⟐ Beijing Jiaotong University ⟐ Pudu Robotics ⟐ Tsinghua Shenzhen International Graduate School\n- **🔗 链接**：[[中英摘要](./abs/2507.20854.md)] 
[[arXiv:2507.20854](https://arxiv.org/abs/2507.20854)] [Code]\n- **📝 说明**:\n\n#### [495] Decomposing Densification in Gaussian Splatting for Faster 3D Scene Reconstruction\n- **🧑‍🔬 作者**：Binxiao Huang, Zhengwu Liu, Ngai Wong\n- **🏫 单位**：The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2507.20239.md)] [[arXiv:2507.20239](https://arxiv.org/abs/2507.20239)] [Code]\n- **📝 说明**:\n\n#### [496] RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection\n- **🧑‍🔬 作者**：Xiaokai Bai, Chenxu Zhou, Lianqing Zheng, Si-Yuan Cao, Jianan Liu, Xiaohan Zhang, Zhengzhuang Zhang, Hui-liang Shen\n- **🏫 单位**：Zhejiang University ⟐ Tongji University ⟐ Momoni AI\n- **🔗 链接**：[[中英摘要](./abs/2507.19856.md)] [[arXiv:2507.19856](https://arxiv.org/abs/2507.19856)] [Code]\n- **📝 说明**:\n\n#### [497] Taking Language Embedded 3D Gaussian Splatting into the Wild\n- **🧑‍🔬 作者**：Yuze Wang, Yue Qi\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2507.19830.md)] [[arXiv:2507.19830](https://arxiv.org/abs/2507.19830)] [[Code](https://github.com/yuzewang1998/takinglangsplatw)]\n- **📝 说明**:\n\n#### [498] GSCache: Real-Time Radiance Caching for Volume Path Tracing using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：David Bauer, Qi Wu, Hamid Gadirov, Kwan-Liu Ma\n- **🏫 单位**：University of California at Davis ⟐ NVIDIA Research ⟐ University of Groningen\n- **🔗 链接**：[[中英摘要](./abs/2507.19718.md)] [[arXiv:2507.19718](https://arxiv.org/abs/2507.19718)] [Code]\n- **📝 说明**:\n\n#### [499] Gaussian Set Surface Reconstruction through Per-Gaussian Optimization\n- **🧑‍🔬 作者**：Zhentao Huang, Di Wu, Zhenbang He, Minglun Gong\n- **🏫 单位**：University of Guelph ⟐ University of Macau ⟐ University of British Columbia Okanagan\n- **🔗 链接**：[[中英摘要](./abs/2507.18923.md)] [[arXiv:2507.18923](https://arxiv.org/abs/2507.18923)] [Code]\n- **📝 说明**:\n\n#### [500] Learning Efficient and Generalizable Human Representation with Human Gaussian Model\n- **🧑‍🔬 作者**：Yifan Liu, Shengjun Zhang, Chensheng Dai, Yang Chen, Hao Liu, Chen Li, Yueqi Duan\n- **🏫 单位**：Tsinghua University ⟐ WeChat Vision, Tencent Inc. 
⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2507.18758.md)] [[arXiv:2507.18758](https://arxiv.org/abs/2507.18758)] [Code]\n- **📝 说明**:\n\n#### [501] Unposed 3DGS Reconstruction with Probabilistic Procrustes Mapping\n- **🧑‍🔬 作者**：Chong Cheng, Zijian Wang, Sicheng Yu, Yu Hu, Nanjie Yao, Hao Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2507.18541.md)] [[arXiv:2507.18541](https://arxiv.org/abs/2507.18541)] [Code]\n- **📝 说明**:\n\n#### [502] GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians\n- **🧑‍🔬 作者**：Tomislav Pavković, Mohammad-Ali Nikouei Mahani, Johannes Niedermayer, Johannes Betz\n- **🏫 单位**：Technical University of Munich ⟐ BMW Group\n- **🔗 链接**：[[中英摘要](./abs/2507.18522.md)] [[arXiv:2507.18522](https://arxiv.org/abs/2507.18522)] [Code]\n- **📝 说明**:\n\n#### [503] MVG4D: Image Matrix-Based Multi-View and Motion Generation for 4D Content Creation from a Single Image\n- **🧑‍🔬 作者**：DongFu Yin, Xiaotian Chen, Fei Richard Yu, Xuanchen Li, Xinhao Zhang\n- **🏫 单位**：Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen) ⟐ Shenzhen University ⟐ Tsinghua University ⟐ Dalian University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2507.18371.md)] [[arXiv:2507.18371](https://arxiv.org/abs/2507.18371)] [Code]\n- **📝 说明**:\n\n#### [504] G2S-ICP SLAM: Geometry-aware Gaussian Splatting ICP SLAM\n- **🧑‍🔬 作者**：Gyuhyeon Pak, Hae Min Cho, Euntai Kim\n- **🏫 单位**：Yonsei University ⟐ Gachon University\n- **🔗 链接**：[[中英摘要](./abs/2507.18344.md)] [[arXiv:2507.18344](https://arxiv.org/abs/2507.18344)] [Code]\n- **📝 说明**:\n\n#### [505] PS-GS: Gaussian Splatting for Multi-View Photometric Stereo\n- **🧑‍🔬 作者**：Yixiao Chen, Bin Liang, Hanzhi Guo, Yongqing Cheng, Jiayi Zhao, Dongdong Weng\n- **🏫 单位**：Beijing Institute of Technology ⟐ China Software Testing Center\n- **🔗 链接**：[[中英摘要](./abs/2507.18231.md)] [[arXiv:2507.18231](https://arxiv.org/abs/2507.18231)] [Code]\n- **📝 说明**:\n\n#### [506] High-fidelity 3D Gaussian Inpainting: preserving multi-view consistency and photorealistic details\n- **🧑‍🔬 作者**：Jun Zhou, Dinghao Li, Nannan Li, Mingjie Wang\n- **🏫 单位**：Dalian Maritime University ⟐ Zhejiang Sci-Tech University\n- **🔗 链接**：[[中英摘要](./abs/2507.18023.md)] [[arXiv:2507.18023](https://arxiv.org/abs/2507.18023)] [Code]\n- **📝 说明**:\n\n#### [507] StreamME: Simplify 3D Gaussian Avatar within Live Stream\n- **🧑‍🔬 作者**：Luchuan Song, Yang Zhou, Zhan Xu, Yi Zhou, Deepali Aneja, Chenliang Xu\n- **🏫 单位**：University of Rochester ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](./abs/2507.17029.md)] [[arXiv:2507.17029](https://arxiv.org/abs/2507.17029)] [[Code](https://github.com/Songluchuan/StreamMEcode)]\n- **📝 说明**:\n\n#### [508] LongSplat: Online Generalizable 3D Gaussian Splatting from Long Sequence Images\n- **🧑‍🔬 作者**：Guichen Huang, Ruoyu Wang, Xiangjun Gao, Che Sun, Yuwei Wu, Shenghua Gao, Yunde Jia\n- **🏫 单位**：Beijing Institute of Technology ⟐ Shenzhen MSU-BIT University ⟐ Transcengram ⟐ The Hong Kong University of Science and Technology ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2507.16144.md)] [[arXiv:2507.16144](https://arxiv.org/abs/2507.16144)] [Code]\n- **📝 说明**:\n\n#### [509] Appearance Harmonization via Bilateral Grid Prediction with Transformers for 3DGS\n- **🧑‍🔬 作者**：Jisu Shin, Richard Shaw, Seunghyun Shin, Anton Pelykh, Zhensong Zhang, Hae-Gon Jeon, Eduardo Perez-Pellitero\n- **🏫 单位**：Huawei Noah’s Ark Lab ⟐ GIST AI Graduate School ⟐ University of Surrey\n- **🔗 
链接**：[[中英摘要](./abs/2507.15748.md)] [[arXiv:2507.15748](https://arxiv.org/abs/2507.15748)] [Code]\n- **📝 说明**:\n\n#### [510] Hi^2-GSLoc: Dual-Hierarchical Gaussian-Specific Visual Relocalization for Remote Sensing\n- **🧑‍🔬 作者**：Boni Hu, Zhenyu Xia, Lin Chen, Pengcheng Han, Shuhui Bu\n- **🏫 单位**：Northwestern Polytechnical University ⟐ National Key Laboratory of Aircraft Configuration Design\n- **🔗 链接**：[[中英摘要](./abs/2507.15683.md)] [[arXiv:2507.15683](https://arxiv.org/abs/2507.15683)] [Code]\n- **📝 说明**:\n\n#### [511] Adaptive 3D Gaussian Splatting Video Streaming: Visual Saliency-Aware Tiling and Meta-Learning-Based Bitrate Adaptation\n- **🧑‍🔬 作者**：Han Gong, Qiyue Li, Jie Li, Zhi Liu\n- **🏫 单位**：Hefei University of Technology ⟐ The University of Electro-Communications\n- **🔗 链接**：[[中英摘要](./abs/2507.14454.md)] [[arXiv:2507.14454](https://arxiv.org/abs/2507.14454)] [Code]\n- **📝 说明**:\n\n#### [512] Adaptive 3D Gaussian Splatting Video Streaming\n- **🧑‍🔬 作者**：Han Gong, Qiyue Li, Zhi Liu, Hao Zhou, Peng Yuan Zhou, Zhu Li, Jie Li\n- **🏫 单位**：Hefei University of Technology ⟐ The University of Electro-Communications ⟐ University of Science and Technology of China ⟐ Aarhus University ⟐ University of Missouri–Kansas City\n- **🔗 链接**：[[中英摘要](./abs/2507.14432.md)] [[arXiv:2507.14432](https://arxiv.org/abs/2507.14432)] [Code]\n- **📝 说明**:\n\n#### [513] DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation\n- **🧑‍🔬 作者**：Haoran Li, Yuli Tian, Kun Lan, Yong Liao, Lin Wang, Pan Hui, Peng Yuan Zhou\n- **🏫 单位**：University of Science and Technology of China ⟐ Nanyang Technological University ⟐ Hong Kong University of Science and Technology (Guangzhou) ⟐ Aarhus University\n- **🔗 链接**：[[中英摘要](./abs/2507.13985.md)] [[arXiv:2507.13985](https://arxiv.org/abs/2507.13985)] [[Code](https://github.com/DreamScene-Project/DreamScene)]\n- **📝 说明**:\n\n#### [514] PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations\n- **🧑‍🔬 作者**：Yu Wei, Jiahui Zhang, Xiaoqin Zhang, Ling Shao, Shijian Lu\n- **🏫 单位**：Nanyang Technological University ⟐ Zhejiang University of Technology ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2507.13891.md)] [[arXiv:2507.13891](https://arxiv.org/abs/2507.13891)] [Code]\n- **📝 说明**:\n\n#### [515] VolSegGS: Segmentation and Tracking in Dynamic Volumetric Scenes via Deformable 3D Gaussians\n- **🧑‍🔬 作者**：Siyuan Yao, Chaoli Wang\n- **🏫 单位**：University of Notre Dame\n- **🔗 链接**：[[中英摘要](./abs/2507.12667.md)] [[arXiv:2507.12667](https://arxiv.org/abs/2507.12667)] [Code]\n- **📝 说明**:\n\n#### [516] Wavelet-GS: 3D Gaussian Splatting with Wavelet Decomposition\n- **🧑‍🔬 作者**：Beizhen Zhao, Yifan Zhou, Sicheng Yu, Zijian Wang, Hao Wang\n- **🏫 单位**：HKUST(GZ)\n- **🔗 链接**：[[中英摘要](./abs/2507.12498.md)] [[arXiv:2507.12498](https://arxiv.org/abs/2507.12498)] [Code]\n- **📝 说明**:\n\n#### [517] BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images\n- **🧑‍🔬 作者**：Davide Di Nucci, Matteo Tomei, Guido Borghi, Luca Ciuffreda, Roberto Vezzani, Rita Cucchiara\n- **🏫 单位**：University of Modena and Reggio Emilia ⟐ Prometeia\n- **🔗 链接**：[[中英摘要](./abs/2507.12095.md)] [[arXiv:2507.12095](https://arxiv.org/abs/2507.12095)] [Code]\n- **📝 说明**:\n\n#### [518] Dark-EvGS: Event Camera as an Eye for Radiance Field in the Dark\n- **🧑‍🔬 作者**：Jingqian Wu, Peiqi Duan, Zongqiang Wang, Changwei Wang, Boxin Shi, Edmund Y. 
Lam\n- **🏫 单位**：The University of Hong Kong ⟐ Peking University ⟐ Chinese Academy of Sciences ⟐ Qilu University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2507.11931.md)] [[arXiv:2507.11931](https://arxiv.org/abs/2507.11931)] [Code]\n- **📝 说明**:\n\n#### [519] A Mixed-Primitive-based Gaussian Splatting Method for Surface Reconstruction\n- **🧑‍🔬 作者**：Haoxuan Qu, Yujun Cai, Hossein Rahmani, Ajay Kumar, Junsong Yuan, Jun Liu\n- **🏫 单位**：Lancaster University ⟐ The University of Queensland ⟐ The Hong Kong Polytechnic University ⟐ University at Buffalo\n- **🔗 链接**：[[中英摘要](./abs/2507.11321.md)] [[arXiv:2507.11321](https://arxiv.org/abs/2507.11321)] [Code]\n- **📝 说明**:\n\n#### [520] 3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving\n- **🧑‍🔬 作者**：Yixun Zhang, Lizhi Wang, Junjun Zhao, Wending Zhao, Feng Zhou, Yonghao Dang, Jianqin Yin\n- **🏫 单位**：Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2507.09993.md)] [[arXiv:2507.09993](https://arxiv.org/abs/2507.09993)] [Code]\n- **📝 说明**:\n\n#### [521] RePaintGS: Reference-Guided Gaussian Splatting for Realistic and View-Consistent 3D Scene Inpainting\n- **🧑‍🔬 作者**：Ji Hyun Seo, Byounhyun Yoo, Gerard Jounghyun Kim\n- **🏫 单位**：Korea Institute of Science and Technology ⟐ Korea University ⟐ Korea National University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2507.08434.md)] [[arXiv:2507.08434](https://arxiv.org/abs/2507.08434)] [Code]\n- **📝 说明**:\n\n#### [522] RTR-GS: 3D Gaussian Splatting for Inverse Rendering with Radiance Transfer and Reflection\n- **🧑‍🔬 作者**：Yongyang Zhou, Fang-Lue Zhang, Zichen Wang, Lei Zhang\n- **🏫 单位**：Beijing Institute of Technology ⟐ Victoria University of Wellington\n- **🔗 链接**：[[中英摘要](./abs/2507.07733.md)] [[arXiv:2507.07733](https://arxiv.org/abs/2507.07733)] [Code]\n- **📝 说明**:\n\n#### [523] SD-GS: Structured Deformable 3D Gaussians for Efficient Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Wei Yao, Shuzhao Xie, Letian Li, Weixiang Zhang, Zhixin Lai, Shiqi Dai, Ke Zhang, Zhi Wang\n- **🏫 单位**：Tsinghua University ⟐ Google ⟐ Soochow University\n- **🔗 链接**：[[中英摘要](./abs/2507.07465.md)] [[arXiv:2507.07465](https://arxiv.org/abs/2507.07465)] [Code]\n- **📝 说明**:\n\n#### [524] LighthouseGS: Indoor Structure-aware 3D Gaussian Splatting for Panorama-Style Mobile Captures\n- **🧑‍🔬 作者**：Seungoh Han, Jaehoon Jang, Hyunsu Kim, Jaeheung Surh, Junhyung Kwak, Hyowon Ha, Kyungdon Joo\n- **🏫 单位**：UNIST ⟐ Bucketplace, Co., Ltd.\n- **🔗 链接**：[[中英摘要](./abs/2507.06109.md)] [[arXiv:2507.06109](https://arxiv.org/abs/2507.06109)] [Code]\n- **📝 说明**:\n\n#### [525] Reflections Unlock: Geometry-Aware Reflection Disentanglement in 3D Gaussian Splatting for Photorealistic Scenes Rendering\n- **🧑‍🔬 作者**：Jiayi Song, Zihan Ye, Qingyuan Zhou, Weidong Yang, Ben Fei, Jingyi Xu, Ying He, Wanli Ouyang\n- **🏫 单位**：Fudan University ⟐ The Chinese University of Hong Kong ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2507.06103.md)] [[arXiv:2507.06103](https://arxiv.org/abs/2507.06103)] [[Code](https://github.com/Kallyelish/Ref-Unlock)]\n- **📝 说明**:\n\n#### [526] D-FCGS: Feedforward Compression of Dynamic Gaussian Splatting for Free-Viewpoint Videos\n- **🧑‍🔬 作者**：Wenkang Zhang, Yan Zhao, Qiang Wang, Li Song, Zhengxue Cheng\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Visionstar Information Technology\n- **🔗 链接**：[[中英摘要](./abs/2507.05859.md)] [[arXiv:2507.05859](https://arxiv.org/abs/2507.05859)] [Code]\n- **📝 说明**:\n\n#### [527] 3DGS_LSR:Large_Scale Relocation for Autonomous Driving Based 
on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Haitao Lu, Haijier Chen, Haoze Liu, Shoujian Zhang, Bo Xu, Ziao Liu\n- **🏫 单位**：Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2507.05661.md)] [[arXiv:2507.05661](https://arxiv.org/abs/2507.05661)] [Code]\n- **📝 说明**:\n\n#### [528] Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors\n- **🧑‍🔬 作者**：Lanqing Guo, Yufei Wang, Hezhen Hu, Yan Zheng, Yeying Jin, Siyu Huang, Zhangyang Wang\n- **🏫 单位**：The University of Texas at Austin ⟐ Snap Research ⟐ Tencent ⟐ Clemson University\n- **🔗 链接**：[[中英摘要](./abs/2507.05426.md)] [[arXiv:2507.05426](https://arxiv.org/abs/2507.05426)] [Code]\n- **📝 说明**:\n\n#### [529] InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior\n- **🧑‍🔬 作者**：Minghao Wen, Shengjie Wu, Kangkan Wang, Dong Liang\n- **🏫 单位**：Nanjing University of Aeronautics and Astronautics ⟐ Nanjing University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2507.04961.md)] [[arXiv:2507.04961](https://arxiv.org/abs/2507.04961)] [Code]\n- **📝 说明**:\n\n#### [530] ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments\n- **🧑‍🔬 作者**：Guile Wu, Dongfeng Bai, Bingbing Liu\n- **🏫 单位**：Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](./abs/2507.03886.md)] [[arXiv:2507.03886](https://arxiv.org/abs/2507.03886)] [Code]\n- **📝 说明**:\n\n#### [531] AvatarMakeup: Realistic Makeup Transfer for 3D Animatable Head Avatars\n- **🧑‍🔬 作者**：Yiming Zhong, Xiaolin Zhang, Ligang Liu, Yao Zhao, Yunchao Wei\n- **🏫 单位**：Beijing Jiaotong University ⟐ Shandong University of Science and Technology ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2507.02419.md)] [[arXiv:2507.02419](https://arxiv.org/abs/2507.02419)] [Code]\n- **📝 说明**:\n\n#### [532] VISTA: Open-Vocabulary, Task-Relevant Robot Exploration with Online Semantic Gaussian Splatting\n- **🧑‍🔬 作者**：Keiko Nagami, Timothy Chen, Javier Yu, Ola Shorinwa, Maximilian Adang, Carlyn Dougherty, Eric Cristofalo, Mac Schwager\n- **🏫 单位**：Stanford University ⟐ MIT Lincoln Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2507.01125.md)] [[arXiv:2507.01125](https://arxiv.org/abs/2507.01125)] [Code]\n- **📝 说明**:\n\n#### [533] Masks make discriminative models great again!\n- **🧑‍🔬 作者**：Tianshi Cao, Marie-Julie Rakotosaona, Ben Poole, Federico Tombari, Michael Niemeyer\n- **🏫 单位**：University of Toronto ⟐ Google\n- **🔗 链接**：[[中英摘要](./abs/2507.00916.md)] [[arXiv:2507.00916](https://arxiv.org/abs/2507.00916)] [Code]\n- **📝 说明**:\n\n#### [534] GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond\n- **🧑‍🔬 作者**：Anna-Maria Halacheva, Jan-Nico Zaech, Xi Wang, Danda Pani Paudel, Luc Van Gool\n- **🏫 单位**：Sofia University ⟐ ETH Zurich ⟐ TU Munich\n- **🔗 链接**：[[中英摘要](./abs/2507.00886.md)] [[arXiv:2507.00886](https://arxiv.org/abs/2507.00886)] [Code]\n- **📝 说明**:\n\n#### [535] LOD-GS: Level-of-Detail-Sensitive 3D Gaussian Splatting for Detail Conserved Anti-Aliasing\n- **🧑‍🔬 作者**：Zhenya Yang, Bingchen Gong, Kai Chen\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Ecole Polytechnique\n- **🔗 链接**：[[中英摘要](./abs/2507.00554.md)] [[arXiv:2507.00554](https://arxiv.org/abs/2507.00554)] [[Code](https://github.com/Huster-YZY/LOD-GS)]\n- **📝 说明**:\n\n#### [536] GDGS: 3D Gaussian Splatting Via Geometry-Guided Initialization And Dynamic Density Control\n- **🧑‍🔬 作者**：Xingjun Wang, Lianlei Shan\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2507.00363.md)] 
[[arXiv:2507.00363](https://arxiv.org/abs/2507.00363)] [Code]\n- **📝 说明**:\n\n---\n\n**更早论文已按时间归档，跳转到**[[归档](#-归档论文)]**查看**\n"
  },
  {
    "path": "Survey.md",
    "content": "# Survey Papers\n\n#### [1] A Survey on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Guikun Chen, Wenguan Wang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2401.03890.md)] [[arXiv:2401.03890](https://arxiv.org/abs/2401.03890)]\n- **📝 说明**：🔥 首篇综述\n\n#### [2] 3D Gaussian as a New Vision Era: A Survey\n- **🧑‍🔬 作者**：Ben Fei, Jingyi Xu, Rui Zhang, Qingyuan Zhou, Weidong Yang, Ying He\n- **🏫 单位**：Fudan University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2402.07181.md)] [[arXiv:2402.07181](https://arxiv.org/abs/2402.07181)]\n- **📝 说明**：🏆 Accepted to IEEE TVCG 2024\n\n#### [3] Recent Advances in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao\n- **🏫 单位**：Chinese Academy of Sciences ⟐ VAST ⟐  University of California\n- **🔗 链接**：[[中英摘要](./abs/2403.11134.md)] [[arXiv:2403.11134](https://arxiv.org/abs/2403.11134)]\n- **📝 说明**：🔥 第三篇综述\n\n#### [4] Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review\n- **🧑‍🔬 作者**：Anurag Dalal, Daniel Hagen, Kjell G. Robbersmyr, Kristian Muri Knausgård\n- **🏫 单位**：University of Agder\n- **🔗 链接**：[[中英摘要](./abs/2405.03417.md)] [[arXiv:2405.03417](https://arxiv.org/abs/2405.03417)]\n- **📝 说明**：🔥 第四篇综述\n\n#### [5] 3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods\n- **🧑‍🔬 作者**：Milena T. Bagdasarian, Paul Knoll, Florian Barthel, Anna Hilsmann, Peter Eisert, Wieland Morgenstern\n- **🏫 单位**：Fraunhofer Heinrich Hertz ⟐ Humboldt University of Berlin\n- **🔗 链接**：[[中英摘要](./abs/2407.09510.md)] [[arXiv:2407.09510](https://arxiv.org/abs/2407.09510)]\n- **📝 说明**：🧩 高斯压缩综述\n\n#### [6] 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities\n- **🧑‍🔬 作者**：Yanqi Bao, Tianyu Ding, Jing Huo, Yaoli Liu, Yuxin Li, Wenbin Li, Yang Gao, Jiebo Luo\n- **🏫 单位**：State Key Laboratory for Novel Software Technology, Nanjing University, China ⟐  the Applied Sciences Group, Microsoft Corporation, Redmond, USA ⟐ the Department of Computer Science, University of Rochester, America\n- **🔗 链接**：[[中英摘要](./abs/2407.17418.md)] [[arXiv:2407.17418](https://arxiv.org/abs/2407.17418)]\n- **📝 说明**：🔥 分类比较细\n\n#### [7] 3D Gaussian Splatting in Robotics: A Survey\n- **🧑‍🔬 作者**：Siting Zhu, Guangming Wang, Dezhi Kong, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2410.12262.md)] [[arXiv:2407.12262](https://arxiv.org/abs/2410.12262)]\n- **📝 说明**：🤖 Robotics\n\n#### [8] Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects\n- **🧑‍🔬 作者**：Shi Qiu, Binzhu Xie, Qixuan Liu, Pheng-Ann Heng\n- **🏫 单位**：The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2412.06257.md)] [[arXiv:2412.06257](https://arxiv.org/abs/2412.06257)]\n- **📝 说明**：🏆 Accepted to IEEE AIxVR 2025\n\n#### [9] Compression in 3D Gaussian Splatting: A Survey of Methods, Trends, and Future Directions\n- **🧑‍🔬 作者**：Muhammad Salman Ali, Chaoning Zhang, Marco Cagnazzo, Giuseppe Valenzise, Enzo Tartaglione, Sung-Ho Bae\n- **🏫 单位**：Kyung Hee University ⟐ UESTC ⟐ Télécom Paris ⟐ Institut Polytechnique de Paris ⟐ Universita degli Studi di Padova ⟐ Université Paris-Saclay\n- **🔗 链接**：[[中英摘要](./abs/2502.19457.md)] [[arXiv:2502.19457](https://arxiv.org/abs/2502.19457)] [Code]\n- **📝 说明**：🧩 高斯压缩综述\n\n#### [10] Dynamic Scene Reconstruction: Recent Advance in Real-time Rendering and Streaming\n- **🧑‍🔬 作者**：Jiaxuan Zhu, Hao Tang\n- **🏫 单位**：Southeast University ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2503.08166.md)] 
[[arXiv:2503.08166](https://arxiv.org/abs/2503.08166)] [Code]\n- **📝 说明**：🎥 动态场景重建综述\n\n#### [11] 3D Scene Generation: A Survey\n- **🧑‍🔬 作者**：Beichen Wen, Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu\n- **🏫 单位**： Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2505.05474.md)] [[arXiv:2505.05474](https://arxiv.org/abs/2505.05474)] [[Code](https://github.com/hzxie/Awesome-3D-Scene-Generation)]\n- **📝 说明**：场景生成综述\n\n#### [12] A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation\n- **🧑‍🔬 作者**：Shuting He, Peilin Ji, Yitong Yang, Changshuo Wang, Jiayi Ji, Yinglin Wang, Henghui Ding\n- **🏫 单位**：Shanghai University of Finance and Economics ⟐ University College London ⟐ National University of Singapore ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2508.09977.md)] [[arXiv:2508.09977](https://arxiv.org/abs/2508.09977)] [[Code](https://github.com/heshuting555/Awesome-3DGS-Applications)]\n- **📝 说明**：应用综述，包括分割、编辑和生成\n"
  },
  {
    "path": "abs/2308.04079.md",
    "content": "### 3D Gaussian Splatting for Real-Time Radiance Field Rendering\n\nRadiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows realtime rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.\n\n辐射场方法最近彻底改变了使用多张照片或视频捕捉的场景的新视角合成技术。然而，要实现高视觉质量仍然需要成本高昂的神经网络来训练和渲染，而近期较快的方法不可避免地在速度和质量之间做出妥协。对于未受限且完整的场景（而不是孤立对象）和1080p分辨率渲染，目前还没有方法能实现实时显示率。我们引入了三个关键元素，使我们能够在保持竞争力的训练时间的同时，实现行业领先的视觉质量，并且重要的是，支持高质量的实时（>= 30 fps）新视角合成，分辨率为1080p。首先，从相机校准过程中产生的稀疏点出发，我们用3D高斯分布来表示场景，它保留了连续体积辐射场对场景优化的有益属性，同时避免了在空旷空间中的不必要计算；其次，我们进行3D高斯的交错优化/密度控制，特别是优化各向异性协方差以实现场景的精确表现；第三，我们开发了一种快速的可感知可见性的渲染算法，支持各向异性溅射，加速了训练并允许实时渲染。我们在几个已建立的数据集上展示了行业领先的视觉质量和实时渲染。\n"
  },
  {
    "path": "abs/2308.09713.md",
    "content": "### Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis\n\nWe present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model dynamic scenes, we allow Gaussians to move and rotate over time while enforcing that they have persistent color, opacity, and size. By regularizing Gaussians' motion and rotation with local-rigidity constraints, we show that our Dynamic 3D Gaussians correctly model the same area of physical space over time, including the rotation of that space. Dense 6-DOF tracking and dynamic reconstruction emerges naturally from persistent dynamic view synthesis, without requiring any correspondence or flow as input. We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing.\n\n我们提出了一种方法，同时解决了动态场景新视角合成和所有密集场景元素的六自由度（6-DOF）跟踪任务。我们遵循分析-合成框架，受到最近将场景建模为一组3D高斯分布的工作启发，这些高斯分布通过可微渲染优化来重构输入图像。为了建模动态场景，我们允许高斯分布随时间移动和旋转，同时确保它们保持持久的颜色、不透明度和大小。通过对高斯分布的运动和旋转施加局部刚性约束，我们展示了我们的动态3D高斯分布如何正确地随时间模拟物理空间的相同区域，包括该空间的旋转。密集的6-DOF跟踪和动态重构自然地从持久的动态视角合成中产生，而不需要任何对应或流作为输入。我们展示了我们的表征所支持的大量下游应用，包括第一人称视角合成、动态组合场景合成和4D视频编辑。\n"
  },
  {
    "path": "abs/2308.14737.md",
    "content": "### Flexible Techniques for Differentiable Rendering with 3D Gaussians\n\nFast, reliable shape reconstruction is an essential ingredient in many computer vision applications. Neural Radiance Fields demonstrated that photorealistic novel view synthesis is within reach, but was gated by performance requirements for fast reconstruction of real scenes and objects. Several recent approaches have built on alternative shape representations, in particular, 3D Gaussians. We develop extensions to these renderers, such as integrating differentiable optical flow, exporting watertight meshes and rendering per-ray normals. Additionally, we show how two of the recent methods are interoperable with each other. These reconstructions are quick, robust, and easily performed on GPU or CPU.\n\n快速、可靠的形状重建是许多计算机视觉应用中的重要组成部分。神经辐射场表明，真实感的新视角合成是可行的，但其性能要求限制了对真实场景和物体的快速重建。最近几种方法基于替代形状表示，特别是3D高斯分布。我们对这些渲染器进行了扩展，例如集成了可微光流、导出密闭网格和渲染每条光线的法线。此外，我们展示了最近的两种方法如何相互兼容。这些重建快速、稳健，且可以轻松地在GPU或CPU上执行。\n"
  },
  {
    "path": "abs/2309.13101.md",
    "content": "### Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction\n\nImplicit neural representation has paved the way for new approaches to dynamic scene reconstruction and rendering. Nonetheless, cutting-edge dynamic neural rendering methods rely heavily on these implicit representations, which frequently struggle to capture the intricate details of objects in the scene. Furthermore, implicit methods have difficulty achieving real-time rendering in general dynamic scenes, limiting their use in a variety of tasks. To address the issues, we propose a deformable 3D Gaussians Splatting method that reconstructs scenes using 3D Gaussians and learns them in canonical space with a deformation field to model monocular dynamic scenes. We also introduce an annealing smoothing training mechanism with no extra overhead, which can mitigate the impact of inaccurate poses on the smoothness of time interpolation tasks in real-world datasets. Through a differential Gaussian rasterizer, the deformable 3D Gaussians not only achieve higher rendering quality but also real-time rendering speed. Experiments show that our method outperforms existing methods significantly in terms of both rendering quality and speed, making it well-suited for tasks such as novel-view synthesis, time interpolation, and real-time rendering.\n\n隐式神经表示为动态场景重建和渲染开辟了新的方法。然而，尖端的动态神经渲染方法严重依赖这些隐式表示，这些表示通常难以捕捉场景中物体的复杂细节。此外，隐式方法通常难以实现一般动态场景的实时渲染，限制了它们在各种任务中的使用。为了解决这些问题，我们提出了一种可变形3D高斯溅射方法，该方法使用3D高斯分布重建场景，并在规范空间中学习它们，配合变形场来模拟单目动态场景。我们还引入了一个无额外开销的退火平滑训练机制，可以缓解不准确姿态对实际数据集中时间插值任务平滑性的影响。通过差分高斯光栅化器，可变形3D高斯不仅实现了更高的渲染质量，还实现了实时渲染速度。实验表明，我们的方法在渲染质量和速度方面都显著优于现有方法，非常适合新视角合成、时间插值和实时渲染等任务。\n"
  },
  {
    "path": "abs/2309.16585.md",
    "content": "### Text-to-3D using Gaussian Splatting\n\nIn this paper, we present Gaussian Splatting based text-to-3D generation (GSGEN), a novel approach for generating high-quality 3D objects. Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of 3D prior and proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address existing shortcomings by exploiting the explicit nature that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components.\n\n在本文中，我们提出了基于高斯溅射的文本到3D生成方法（GSGEN），这是一种生成高质量3D对象的新颖方法。以前的方法由于缺乏3D先验和适当的表示，而受到几何精度不准确和保真度有限的困扰。我们利用3D高斯溅射——一种最新的顶尖表示法，通过利用其明确性质来解决现有的不足，使得能够结合3D先验。具体来说，我们的方法采用了一个渐进的优化策略，包括一个几何优化阶段和一个外观细化阶段。在几何优化中，根据3D几何先验以及常规的2D SDS损失，建立了一个粗略的表示，确保了一个合理且与3D一致的粗略形状。随后，获得的高斯进行迭代细化以丰富细节。在这个阶段，我们通过基于紧凑性的增密来增加高斯数量，以提高连续性和改善保真度。通过这些设计，我们的方法能够生成具有精致细节和更准确几何形状的3D内容。广泛的评估证明了我们方法的有效性，尤其是在捕捉高频组件方面。\n"
  },
  {
    "path": "abs/2309.16653.md",
    "content": "### DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation\n\nRecent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods.\n\n最近在3D内容创作方面的进展主要利用基于优化的3D生成，通过分数蒸馏采样（SDS）。尽管展示了有希望的结果，但这些方法通常受到每个样本优化速度慢的限制，限制了它们的实际使用。在本文中，我们提出了DreamGaussian，一种新型的3D内容生成框架，同时实现了效率和质量。我们的关键见解是设计一个生成性的3D高斯溅射模型，并配备网格提取和UV空间的纹理细化。与神经辐射场中使用的占用率修剪相比，我们证明了3D高斯的逐步密集化对于3D生成任务的收敛速度明显更快。为了进一步提高纹理质量并促进下游应用，我们引入了一种高效的算法，将3D高斯转换为带纹理的网格，并应用微调阶段来细化细节。广泛的实验表明我们提出的方法具有超越的效率和竞争性的生成质量。值得注意的是，DreamGaussian能够在仅用单视图图像的情况下在2分钟内产生高质量的带纹理网格，与现有方法相比大约实现了10倍的加速。\n"
  },
  {
    "path": "abs/2310.08528.md",
    "content": "### 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering\n\nRepresenting and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to guarantee. To achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency, we propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes rather than applying 3D-GS for each individual frame. In 4D-GS, a novel explicit representation containing both 3D Gaussians and 4D neural voxels is proposed. A decomposed neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build Gaussian features from 4D neural voxels and then a lightweight MLP is applied to predict Gaussian deformations at novel timestamps. Our 4D-GS method achieves real-time rendering under high resolutions, 82 FPS at an 800×800 resolution on an RTX 3090 GPU while maintaining comparable or better quality than previous state-of-the-art methods.\n\n表示和渲染动态场景一直是一个重要但具有挑战性的任务。特别是，要准确地模拟复杂的运动，通常很难保证高效率。为了实现实时动态场景渲染，同时享有高训练和存储效率，我们提出了4D高斯溅射（4D-GS）作为动态场景的整体表示，而不是为每个单独的帧应用3D-GS。在4D-GS中，提出了一个包含3D高斯和4D神经体素的新型显式表示。我们受到HexPlane启发，提出了一个分解的神经体素编码算法，以有效地从4D神经体素构建高斯特征，然后应用轻量级的多层感知器（MLP）来预测新时间戳的高斯变形。我们的4D-GS方法在高分辨率下实现了实时渲染，在RTX 3090 GPU上以800×800分辨率达到82 FPS，同时保持与之前最先进方法相当或更好的质量。\n"
  },
  {
    "path": "abs/2310.08529.md",
    "content": "### GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models\n\nIn recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can help generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine generation, but 3D consistency is hard to guarantee. This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation. A fast 3D object generation framework, named as GaussianDreamer, is proposed, where the 3D diffusion model provides priors for initialization and the 2D diffusion model enriches the geometry and appearance. Operations of noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. Our GaussianDreamer can generate a high-quality 3D instance or 3D avatar within 15 minutes on one GPU, much faster than previous methods, while the generated instances can be directly rendered in real time.\n\n近期，从文本提示生成3D资源取得了令人印象深刻的成果。2D和3D扩散模型都能帮助基于提示生成合理的3D对象。3D扩散模型具有良好的3D一致性，但由于可训练的3D数据昂贵且难以获得，其质量和泛化能力受限。2D扩散模型具有强大的泛化能力和细致的生成能力，但很难保证3D一致性。本文尝试通过最近的显式和高效3D高斯溅射表示，将这两种扩散模型的优势结合起来。我们提出了一种快速的3D对象生成框架，名为GaussianDreamer，其中3D扩散模型提供初始化的先验，而2D扩散模型丰富几何形状和外观。引入了噪点生长和颜色扰动操作来增强初始化的高斯分布。我们的GaussianDreamer可以在15分钟内在单个GPU上生成高质量的3D实例或3D头像，比之前的方法快得多，同时生成的实例可以直接实时渲染。\n"
  },
  {
    "path": "abs/2310.10642.md",
    "content": "### Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting\n\nReconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. Despite advancements in neural implicit models, limitations persist: (i) Inadequate Scene Structure: Existing methods struggle to reveal the spatial and temporal structure of dynamic scenes from directly learning the complex 6D plenoptic function. (ii) Scaling Deformation Modeling: Explicitly modeling scene element deformation becomes impractical for complex dynamics. To address these issues, we consider the spacetime as an entirety and propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. Learning to optimize the 4D primitives enables us to synthesize novel views at any desired time with our tailored rendering routine. Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficient of 4D spherindrical harmonics. This approach offers simplicity, flexibility for variable-length video and end-to-end training, and efficient real-time rendering, making it suitable for capturing complex dynamic scene motions. Experiments across various benchmarks, including monocular and multi-view scenarios, demonstrate our 4DGS model's superior visual quality and efficiency.\n\n从2D图像重建动态3D场景并随时间生成多样化视图是一个挑战，因为场景复杂性和时间动态性。尽管在神经隐式模型方面取得了进步，但仍存在局限性：（i）场景结构不足：现有方法在直接学习复杂的6D全视函数时，难以揭示动态场景的空间和时间结构。 （ii）缩放变形建模：对于复杂动态，显式建模场景元素的变形变得不切实际。为解决这些问题，我们将时空视为一个整体，提出通过优化一组4D原语来近似动态场景的潜在时空4D体积，同时进行显式的几何和外观建模。学习优化4D原语使我们能够在任何期望的时间使用我们定制的渲染程序合成新视图。我们的模型在概念上简单，由4D高斯构成，由各向异性椭圆参数化，这些椭圆可以在空间和时间中任意旋转，以及视角依赖和随时间演变的外观，由4D球柱谐波系数表示。这种方法提供了简单性、适应可变长度视频和端到端训练的灵活性，以及高效的实时渲染，使其适合捕捉复杂的动态场景运动。在各种基准测试中，包括单眼和多视图场景，我们的4DGS模型展示了卓越的视觉质量和效率。\n"
  },
  {
    "path": "abs/2310.19441.md",
    "content": "### Dynamic Gaussian Splatting from Markerless Motion Capture can Reconstruct Infants Movements\n\nEasy access to precise 3D tracking of movement could benefit many aspects of rehabilitation. A challenge to achieving this goal is that while there are many datasets and pretrained algorithms for able-bodied adults, algorithms trained on these datasets often fail to generalize to clinical populations including people with disabilities, infants, and neonates. Reliable movement analysis of infants and neonates is important as spontaneous movement behavior is an important indicator of neurological function and neurodevelopmental disability, which can help guide early interventions. We explored the application of dynamic Gaussian splatting to sparse markerless motion capture (MMC) data. Our approach leverages semantic segmentation masks to focus on the infant, significantly improving the initialization of the scene. Our results demonstrate the potential of this method in rendering novel views of scenes and tracking infant movements. This work paves the way for advanced movement analysis tools that can be applied to diverse clinical populations, with a particular emphasis on early detection in infants.\n\n精确的3D运动跟踪易于获取，可以使康复的许多方面受益。实现这一目标的挑战在于，尽管有许多面向健全成人的数据集和预训练算法，但在这些数据集上训练的算法常常无法泛化到包括残疾人、婴儿和新生儿在内的临床人群。对婴儿和新生儿的可靠运动分析很重要，因为自发运动行为是神经功能和神经发育障碍的重要指标，有助于指导早期干预。我们探索了将动态高斯溅射应用于稀疏无标记运动捕捉（MMC）数据。我们的方法利用语义分割掩码专注于婴儿，显著改善了场景的初始化。我们的结果展示了这种方法在渲染场景的新视角和跟踪婴儿运动方面的潜力。这项工作为可以应用于不同临床人群的先进运动分析工具铺平了道路，特别强调了对婴儿的早期检测。\n"
  },
  {
    "path": "abs/2311.08581.md",
    "content": "### Drivable 3D Gaussian Avatars\n\nWe present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data.\n\n我们提出了可驾驶的3D高斯头像（D3GA），这是第一个使用高斯溅射渲染的可控制人体3D模型。目前的逼真可驾驶头像在训练期间要求精确的3D注册，在测试期间需要密集的输入图像，或者两者兼而有之。基于神经辐射场的头像也倾向于在远程呈现应用中过于缓慢。这项工作使用最近提出的3D高斯溅射（3DGS）技术，使用密集校准的多视图视频作为输入，以实时帧率渲染逼真的人类。为了变形这些原语，我们放弃了常用的点变形方法线性混合蒙皮（LBS），使用一种经典的体积变形方法：笼形变形。考虑到它们较小的尺寸，我们用关节角度和关键点来驱动这些变形，这更适合通信应用。我们对九个身材、衣着和动作各异的受试者进行的实验，当使用相同的训练和测试数据时，获得了比现有最先进方法更高质量的结果。\n"
  },
  {
    "path": "abs/2311.10812.md",
    "content": "### SplatArmor: Articulated Gaussian splatting for animatable humans from monocular RGB videos\n\nWe propose SplatArmor, a novel approach for recovering detailed and animatable human models by 'armoring' a parameterized body model with 3D Gaussians. Our approach represents the human as a set of 3D Gaussians within a canonical space, whose articulation is defined by extending the skinning of the underlying SMPL geometry to arbitrary locations in the canonical space. To account for pose-dependent effects, we introduce a SE(3) field, which allows us to capture both the location and anisotropy of the Gaussians. Furthermore, we propose the use of a neural color field to provide color regularization and 3D supervision for the precise positioning of these Gaussians. We show that Gaussian splatting provides an interesting alternative to neural rendering based methods by leverging a rasterization primitive without facing any of the non-differentiability and optimization challenges typically faced in such approaches. The rasterization paradigms allows us to leverage forward skinning, and does not suffer from the ambiguities associated with inverse skinning and warping. We show compelling results on the ZJU MoCap and People Snapshot datasets, which underscore the effectiveness of our method for controllable human synthesis.\n\n我们提出了SplatArmor，这是一种通过使用3D高斯分布“加固”参数化人体模型来恢复详细和可动画人类模型的新方法。我们的方法将人体表示为规范空间中的一组3D高斯分布，其关节动作是通过将底层SMPL几何体的蒙皮扩展到规范空间中的任意位置来定义的。为了解释姿势依赖的效应，我们引入了一个SE(3)场，它允许我们捕捉高斯分布的位置和各向异性。此外，我们提出使用神经颜色场来提供颜色规范化和3D监督，以精确定位这些高斯分布。我们展示了高斯溅射提供了一种有趣的替代方案，通过利用光栅化原语，而不面临神经渲染基方法中通常遇到的任何不可微分和优化挑战。光栅化范式允许我们利用正向蒙皮，不会受到与逆向蒙皮和扭曲相关的模糊性的影响。我们在ZJU MoCap和People Snapshot数据集上展示了引人注目的结果，这强调了我们方法在可控人类合成方面的有效性。\n"
  },
  {
    "path": "abs/2311.11221.md",
    "content": "### GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise\n\nText-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the amalgamation of Nerf and 2D diffusion models frequently yields oversaturated images, posing severe limitations on downstream industrial applications due to the constraints of pixelwise rendering method. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text to 3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through individual Gaussian sphere transparencies, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. We ingeniously devise an efficient method to generate noise that produces Gaussian noise from diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts like floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian splatting across the entire spectrum of 3D content generation processes.\n\n文本到3D技术，以其高效的生成方法和广泛的创造潜力而在人工智能生成内容（AIGC）领域受到了显著关注。然而，Nerf和2D扩散模型的结合经常产生过饱和的图像，由于像素级渲染方法的限制，这对下游工业应用造成了严重的限制。最近，高斯溅射已经取代了在基于NeRF的方法中普遍存在的传统逐点采样技术，彻底改变了3D重建的各个方面。本文介绍了一种基于高斯溅射的新型文本到3D内容生成框架，通过控制每个高斯球体的透明度，实现对图像饱和度的精细控制，从而产生更逼真的图像。在3D生成中实现多视图一致性的挑战显著阻碍了建模的复杂性和准确性。受到SJC的启发，我们探索使用多视图噪声分布来扰动由3D高斯溅射生成的图像，旨在纠正多视图几何中的不一致性。我们巧妙地设计了一种高效的方法来生成噪声，从不同视角产生高斯噪声，所有噪声均源自共享的噪声源。此外，普通的基于3D高斯的生成倾向于使模型陷入局部最小值，导致浮游物、毛刺或过度生长等伪影。为了缓解这些问题，我们提出了变分高斯溅射技术，以提高3D外观的质量和稳定性。据我们所知，我们的方法代表了对整个3D内容生成过程中高斯溅射的首次全面利用。\n"
  },
  {
    "path": "abs/2311.11700.md",
    "content": "### GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting\n\nIn this paper, we introduce GS-SLAM that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D re-rendering. Specifically, we propose an adaptive expansion strategy that adds new or deletes noisy 3D Gaussian in order to efficiently reconstruct new observed scene geometry and improve the mapping of previously observed areas. This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets. The source code will be released soon.\n\n在这篇论文中，我们介绍了首次在同时定位与建图（SLAM）系统中利用3D高斯表示的GS-SLAM。它促进了效率和准确性之间的更好平衡。与最近采用神经隐式表示的SLAM方法相比，我们的方法利用了一个实时可微分的溅射渲染管线，为地图优化和RGB-D重渲染提供了显著的加速。具体来说，我们提出了一种自适应扩展策略，以添加新的或删除噪声3D高斯，从而高效地重建新观测到的场景几何，并改进之前观测区域的映射。这一策略对于将3D高斯表示扩展到重建整个场景而不是在现有方法中合成静态对象至关重要。此外，在姿态跟踪过程中，设计了一种有效的粗到细技术，用于选择可靠的3D高斯表示来优化相机姿态，从而减少运行时间并提供鲁棒的估计。我们的方法在Replica、TUM-RGBD数据集上与现有最先进的实时方法相比达到了竞争性能。源代码将很快发布。\n"
  },
  {
    "path": "abs/2311.12198.md",
    "content": "### PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics\n\nWe introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, \"cage meshes,\" or any other geometry embedding, highlighting the principle of \"what you see is what you simulate (WS2).\" Our method demonstrates exceptional versatility across a wide variety of materials--including elastic entities, metals, non-Newtonian fluids, and granular materials--showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements.\n\n我们介绍了PhysGaussian，这是一种新方法，它将物理上基于牛顿动力学的原理无缝融入到3D高斯分布中，以实现高质量的新颖运动合成。通过使用定制的物质点方法（MPM），我们的方法丰富了3D高斯核，增加了物理意义上的运动变形和机械应力属性，所有这些都符合连续介质力学原理。我们方法的一个显著特点是物理模拟与视觉渲染之间的无缝集成：两个组件都使用相同的3D高斯核作为它们的离散表示。这消除了对三角形/四面体网格化、行进立方体、“笼状网格”或任何其他几何嵌入的必要性，突出了“所见即所模拟（WS2）”的原则。我们的方法在广泛的材料上展示了卓越的多功能性——包括弹性实体、金属、非牛顿流体和颗粒材料——展示了其在创造具有新视角和运动的多样化视觉内容方面的强大能力。\n"
  },
  {
    "path": "abs/2311.12775.md",
    "content": "### SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering\n\nWe propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D gaussians as these gaussians tend to be unorganized after optimization and no method has been proposed so far. Our first key contribution is a regularization term that encourages the gaussians to align well with the surface of the scene. We then introduce a method that exploits this alignment to extract a mesh from the Gaussians using Poisson reconstruction, which is fast, scalable, and preserves details, in contrast to the Marching Cubes algorithm usually applied to extract meshes from Neural SDFs. Finally, we introduce an optional refinement strategy that binds gaussians to the surface of the mesh, and jointly optimizes these Gaussians and the mesh through Gaussian splatting rendering. This enables easy editing, sculpting, rigging, animating, compositing and relighting of the Gaussians using traditional softwares by manipulating the mesh instead of the gaussians themselves. Retrieving such an editable mesh for realistic rendering is done within minutes with our method, compared to hours with the state-of-the-art methods on neural SDFs, while providing a better rendering quality.\n\n我们提出了一种从3D高斯溅射中精确且极快提取网格的方法。高斯溅射最近变得非常流行，因为它产生逼真的渲染，同时训练速度比神经辐射场（NeRFs）快得多。然而，从数百万个微小的3D高斯中提取网格是具有挑战性的，因为这些高斯在优化后往往是无序的，而且迄今为止还没有提出方法。我们的第一个关键贡献是一个正则化项，它鼓励高斯与场景表面良好对齐。然后，我们引入了一种利用这种对齐从高斯中提取网格的方法，使用泊松重建，这种方法快速、可扩展且保留细节，与通常用于从神经SDF中提取网格的行进立方体算法形成对比。最后，我们引入了一种可选的细化策略，将高斯绑定到网格的表面，并通过高斯溅射渲染联合优化这些高斯和网格。这使得通过操作网格而不是高斯本身，使用传统软件轻松编辑、雕刻、装配、动画、合成和重新照明高斯成为可能。与神经SDF上的最先进方法相比，我们的方法在几分钟内就可以检索到这样一个可编辑的网格用于逼真渲染，同时提供更好的渲染质量。\n"
  },
  {
    "path": "abs/2311.12897.md",
    "content": "### An Efficient 3D Gaussian Representation for Monocular/Multi-view Dynamic Scenes\n\nIn novel view synthesis of scenes from multiple input views, 3D Gaussian splatting emerges as a viable alternative to existing radiance field approaches, delivering great visual quality and real-time rendering. While successful in static scenes, the present advancement of 3D Gaussian representation, however, faces challenges in dynamic scenes in terms of memory consumption and the need for numerous observations per time step, due to the onus of storing 3D Gaussian parameters per time step. In this study, we present an efficient 3D Gaussian representation tailored for dynamic scenes in which we define positions and rotations as functions of time while leaving other time-invariant properties of the static 3D Gaussian unchanged. Notably, our representation reduces memory usage, which is consistent regardless of the input sequence length. Additionally, it mitigates the risk of overfitting observed frames by accounting for temporal changes. The optimization of our Gaussian representation based on image and flow reconstruction results in a powerful framework for dynamic scene view synthesis in both monocular and multi-view cases. We obtain the highest rendering speed of 118 frames per second (FPS) at a resolution of 1352×1014 with a single GPU, showing the practical usability and effectiveness of our proposed method in dynamic scene rendering scenarios.\n\n在从多个输入视图合成场景的新视角中，3D高斯溅射作为现有辐射场方法的一种可行替代方案出现，提供了出色的视觉质量和实时渲染。虽然在静态场景中取得了成功，但目前3D高斯表示在动态场景中面临着挑战，主要是由于需要存储每个时间步的3D高斯参数，导致内存消耗大和每个时间步需要大量观测。在这项研究中，我们提出了一种针对动态场景量身定制的高效3D高斯表示，我们将位置和旋转定义为时间的函数，同时保持静态3D高斯的其他时间不变属性不变。值得注意的是，我们的表示减少了内存使用，这与输入序列的长度无关。此外，它通过考虑时间变化，减轻了过度拟合观测帧的风险。我们基于图像和流重建优化的高斯表示形成了一个强大的框架，用于在单目和多视图情况下进行动态场景视图合成。我们在单个GPU上以1352×1014的分辨率实现了最高118帧每秒（FPS）的渲染速度，显示了我们提出的方法在动态场景渲染场景中的实用性和有效性。\n"
  },
  {
    "path": "abs/2311.13384.md",
    "content": "### LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes\n\nWith the widespread usage of VR devices and contents, demands for 3D scene generation techniques become more popular. Existing 3D scene generation models, however, limit the target scene to specific domain, primarily due to their training strategies using 3D scan dataset that is far from the real-world. To address such limitation, we propose LucidDreamer, a domain-free scene generation pipeline by fully leveraging the power of existing large-scale diffusion-based generative model. Our LucidDreamer has two alternate steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of point cloud to the desired view and provide the projection as a guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing a new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly-detailed compared to the previous 3D scene generation methods, with no constraint on domain of the target scene.\n\n随着VR设备和内容的广泛使用，对3D场景生成技术的需求变得越来越流行。然而，现有的3D场景生成模型限制了目标场景到特定领域，主要是因为它们使用远离现实世界的3D扫描数据集进行训练。为了解决这一限制，我们提出了LucidDreamer，这是一种域自由的场景生成管道，充分利用现有的大规模基于扩散的生成模型的能力。我们的LucidDreamer有两个交替步骤：梦境和对齐。首先，为了从输入生成多视图一致的图像，我们将点云设置为每个图像生成的几何指南。具体来说，我们将点云的一部分投影到期望的视图，并将投影作为使用生成模型进行修补的指南。用估计的深度图提升到3D空间的修补图像，组成了新的点。其次，为了将新点聚合到3D场景中，我们提出了一种对齐算法，和谐地集成了新生成的3D场景部分。最终获得的3D场景作为优化高斯溅射的初始点。与以前的3D场景生成方法相比，LucidDreamer产生的高斯溅射细节丰富，且目标场景的领域没有限制。\n"
  },
  {
    "path": "abs/2311.13398.md",
    "content": "### Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images\n\nIn this paper, we present a method to optimize Gaussian splatting with a limited number of images while avoiding overfitting. Representing a 3D scene by combining numerous Gaussian splats has yielded outstanding visual quality. However, it tends to overfit the training views when only a small number of images are available. To address this issue, we introduce a dense depth map as a geometry guide to mitigate overfitting. We obtained the depth map using a pre-trained monocular depth estimation model and aligning the scale and offset using sparse COLMAP feature points. The adjusted depth aids in the color-based optimization of 3D Gaussian splatting, mitigating floating artifacts, and ensuring adherence to geometric constraints. We verify the proposed method on the NeRF-LLFF dataset with varying numbers of few images. Our approach demonstrates robust geometry compared to the original method that relies solely on images.\n\n在本文中，我们提出了一种在有限数量的图像下优化高斯涂抹的方法，同时避免过拟合。通过组合大量高斯斑点来表示3D场景已经取得了卓越的视觉质量。然而，当仅有少量图像可用时，它倾向于过度拟合训练视图。为了解决这个问题，我们引入了密集深度图作为几何指南来缓解过拟合。我们使用预训练的单目深度估计模型获得深度图，并使用稀疏COLMAP特征点来调整比例和偏移。调整后的深度有助于优化基于颜色的3D高斯涂抹，减少浮动伪影，并确保遵循几何约束。我们在NeRF-LLFF数据集上验证了所提出的方法，使用不同数量的少量图像。与仅依赖图像的原始方法相比，我们的方法展示了更强的几何稳健性。\n"
  },
  {
    "path": "abs/2311.13404.md",
    "content": "### Animatable 3D Gaussians for High-fidelity Synthesis of Human Motions\n\nWe present a novel animatable 3D Gaussian model for rendering high-fidelity free-view human motions in real time. Compared to existing NeRF-based methods, the model owns better capability in synthesizing high-frequency details without the jittering problem across video frames. The core of our model is a novel augmented 3D Gaussian representation, which attaches each Gaussian with a learnable code. The learnable code serves as a pose-dependent appearance embedding for refining the erroneous appearance caused by geometric transformation of Gaussians, based on which an appearance refinement model is learned to produce residual Gaussian properties to match the appearance in target pose. To force the Gaussians to learn the foreground human only without background interference, we further design a novel alpha loss to explicitly constrain the Gaussians within the human body. We also propose to jointly optimize the human joint parameters to improve the appearance accuracy. The animatable 3D Gaussian model can be learned with shallow MLPs, so new human motions can be synthesized in real time (66 fps on avarage). Experiments show that our model has superior performance over NeRF-based methods.\n\n我们提出了一种新颖的可动画化的三维高斯模型，用于实时渲染高保真的自由视角人体动作。与现有基于NeRF的方法相比，该模型在合成高频细节方面具有更好的能力，且在视频帧间没有抖动问题。我们模型的核心是一种新颖的增强型三维高斯表示，它为每个高斯分配了一个可学习的编码。这个可学习的编码作为姿势依赖的外观嵌入，用于修正高斯变换造成的外观错误，基于此我们学习了一个外观精炼模型，以产生残差高斯属性以匹配目标姿势中的外观。为了迫使高斯学习仅前景人体而不受背景干扰，我们进一步设计了一种新颖的alpha损失，显式地限制高斯在人体内部。我们还建议联合优化人体关节参数以提高外观准确性。这种可动画的三维高斯模型可以通过浅层MLPs学习，因此可以实时合成新的人体动作（平均66 fps）。实验表明，我们的模型在性能上优于基于NeRF的方法。\n"
  },
  {
    "path": "abs/2311.13681.md",
    "content": "### Compact 3D Gaussian Representation for Radiance Field\n\nNeural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussisan-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussian by vector quantization. In our extensive experiments, we consistently show over 10× reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering.\n\n神经辐射场（NeRFs）在捕捉高保真复杂三维场景方面显示出了显著的潜力。然而，阻碍NeRFs广泛应用的一个持续挑战是由于体积渲染导致的计算瓶颈。另一方面，3D高斯飞溅（3DGS）最近作为一种替代表达方式出现，它利用基于3D高斯的表示，并采用光栅化管线而不是体积渲染来渲染图像，实现了非常快速的渲染速度和有希望的图像质量。然而，一个显著的缺点是3DGS需要大量的3D高斯来维持渲染图像的高保真度，这需要大量的内存和存储空间。为了解决这个关键问题，我们特别强调两个关键目标：在不牺牲性能的情况下减少高斯点的数量，并压缩高斯属性，如视角依赖的颜色和协方差。为此，我们提出了一种可学习的掩码策略，显著减少了高斯数量同时保持了高性能。此外，我们提出了一种基于网格的神经场来有效表示视角依赖的颜色，而不是依赖于球谐函数。最后，我们通过向量量化学习码本来紧凑地表示高斯的几何属性。在我们广泛的实验中，与3DGS相比，我们一致地展示了超过10倍的存储减少和增强的渲染速度，同时保持了场景表示的质量。我们的工作提供了一个全面的三维场景表示框架，实现了高性能、快速训练、紧凑性和实时渲染。\n"
  },
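As a sketch of the learnable mask strategy: a per-Gaussian gate trained with a straight-through hard threshold, whose mean acts as a sparsity penalty. The threshold value and the penalty form are assumptions for illustration; the codebook and grid-based color components are elided.

```python
import torch

def apply_learnable_mask(opacity, mask_logits, eps=0.01):
    """opacity: (N, 1); mask_logits: (N, 1) learnable per-Gaussian parameters."""
    soft = torch.sigmoid(mask_logits)      # differentiable gate in (0, 1)
    hard = (soft > eps).float()            # binary keep/drop decision
    gate = hard + soft - soft.detach()     # straight-through gradient estimator
    sparsity_loss = soft.mean()            # pushes unneeded gates toward zero
    return opacity * gate, sparsity_loss
```

Gaussians whose gate falls below the threshold contribute nothing to rendering and can be physically deleted after training, which is how the Gaussian count is reduced without a hand-tuned pruning heuristic.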
  {
    "path": "abs/2311.14521.md",
    "content": "### GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting\n\n3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation. GaussianEditor enhances precision and control in editing through our proposed Gaussian semantic tracing, which traces the editing target throughout the training process. Additionally, we propose Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing.\n\n三维编辑在许多领域（如游戏和虚拟现实）中扮演着关键角色。传统的三维编辑方法，依赖于像网格和点云这样的表示，往往在真实地描绘复杂场景方面存在不足。另一方面，基于隐式三维表示的方法，如神经辐射场（NeRF），虽然能有效渲染复杂场景，但却因处理速度慢和对特定场景区域控制有限而受到限制。针对这些挑战，我们的论文介绍了GaussianEditor，一种基于高斯飞溅（GS）的创新高效三维编辑算法，GS是一种新颖的三维表示。GaussianEditor通过我们提出的高斯语义跟踪来提高编辑中的精确度和控制，该技术在整个训练过程中跟踪编辑目标。此外，我们提出了分层高斯飞溅（HGS），在二维扩散模型的随机生成指导下实现稳定和精细的结果。我们还开发了有效的对象移除和整合的编辑策略，这是现有方法的一个挑战性任务。我们的综合实验展示了GaussianEditor在控制、效率和快速性能方面的优越性，标志着三维编辑领域的重大进展。\n"
  },
  {
    "path": "abs/2311.16037.md",
    "content": "### GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions\n\nRecently, impressive results have been achieved in 3D scene editing with text instructions based on a 2D diffusion model. However, current diffusion models primarily generate images by predicting noise in the latent space, and the editing is usually applied to the whole image, which makes it challenging to perform delicate, especially localized, editing for 3D scenes. Inspired by recent 3D Gaussian splatting, we propose a systematic framework, named GaussianEditor, to edit 3D scenes delicately via 3D Gaussians with text instructions. Benefiting from the explicit property of 3D Gaussians, we design a series of techniques to achieve delicate editing. Specifically, we first extract the region of interest (RoI) corresponding to the text instruction, aligning it to 3D Gaussians. The Gaussian RoI is further used to control the editing process. Our framework can achieve more delicate and precise editing of 3D scenes than previous methods while enjoying much faster training speed, i.e. within 20 minutes on a single V100 GPU, more than twice as fast as Instruct-NeRF2NeRF (45 minutes -- 2 hours).\n\n近期，在基于2D扩散模型的3D场景文本指令编辑方面取得了令人印象深刻的成果。然而，目前的扩散模型主要通过预测潜空间中的噪声来生成图像，而编辑通常应用于整个图像，这使得对3D场景进行精细的、尤其是局部的编辑变得具有挑战性。受到最近3D高斯飞溅的启发，我们提出了一个名为GaussianEditor的系统框架，通过3D高斯和文本指令来精细地编辑3D场景。得益于3D高斯的显式属性，我们设计了一系列技术来实现精细编辑。具体来说，我们首先提取与文本指令相对应的兴趣区域（RoI），并将其与3D高斯对齐。接着，利用高斯RoI来控制编辑过程。我们的框架能够比以前的方法更精细、更准确地编辑3D场景，同时享受更快的训练速度，即在单个V100 GPU上仅需20分钟，比Instruct-NeRF2NeRF（45分钟至2小时）快两倍以上。\n"
  },
  {
    "path": "abs/2311.16043.md",
    "content": "### Relightable 3D Gaussian: Real-time Point Cloud Relighting with BRDF Decomposition and Ray Tracing\n\nWe present a novel differentiable point-based rendering framework for material and lighting decomposition from multi-view images, enabling editing, ray-tracing, and real-time relighting of the 3D point cloud. Specifically, a 3D scene is represented as a set of relightable 3D Gaussian points, where each point is additionally associated with a normal direction, BRDF parameters, and incident lights from different directions. To achieve robust lighting estimation, we further divide incident lights of each point into global and local components, as well as view-dependent visibilities. The 3D scene is optimized through the 3D Gaussian Splatting technique while BRDF and lighting are decomposed by physically-based differentiable rendering. Moreover, we introduce an innovative point-based ray-tracing approach based on the bounding volume hierarchy for efficient visibility baking, enabling real-time rendering and relighting of 3D Gaussian points with accurate shadow effects. Extensive experiments demonstrate improved BRDF estimation and novel view rendering results compared to state-of-the-art material estimation approaches. Our framework showcases the potential to revolutionize the mesh-based graphics pipeline with a relightable, traceable, and editable rendering pipeline solely based on point cloud.\n\n我们提出了一种新颖的可微分点基渲染框架，用于从多视图图像中进行材质和光照分解，使得3D点云的编辑、光线追踪和实时重新照明成为可能。具体来说，一个3D场景被表示为一组可重新照明的3D高斯点，其中每个点额外关联有法线方向、BRDF参数和来自不同方向的入射光。为了实现稳健的光照估计，我们进一步将每个点的入射光分为全局和局部组成部分，以及视角依赖的可见性。3D场景通过3D高斯飞溅技术进行优化，而BRDF和光照通过基于物理的可微分渲染进行分解。此外，我们引入了一种基于边界体积层次结构的创新点基光线追踪方法，用于高效的可见性烘焙，使得3D高斯点的实时渲染和重新照明能够实现准确的阴影效果。广泛的实验展示了与最先进的材质估计方法相比，我们的框架在BRDF估计和新视角渲染结果方面的改进。我们的框架展示了用基于点云的可重新照明、可追踪和可编辑渲染管线革命性地替代基于网格的图形管线的潜力。\n"
  },
  {
    "path": "abs/2311.16096.md",
    "content": "### Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling\n\nModeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two front \\& back canonical Gaussian maps where each pixel represents a 3D Gaussian. The learned template is adaptive to the wearing garments for modeling looser clothes like dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization given novel poses. Overall, our method can create lifelike avatars with dynamic, realistic and generalized appearances. Experiments show that our method outperforms other state-of-the-art approaches.\n\n从RGB视频中建模可动画的人类化身是一个长期存在且充满挑战的问题。近期的作品通常采用基于MLP的神经辐射场（NeRF）来表示3D人类，但对于纯MLP来说，回归姿势依赖的服装细节仍然是困难的。为此，我们引入了可动画高斯体，这是一种新的化身表示方法，它利用强大的2D CNNs和3D高斯飞溅来创建高保真化身。为了将3D高斯与可动画化身关联起来，我们从输入视频中学习一个参数化模板，然后在两个前后标准的高斯图上对模板进行参数化，其中每个像素代表一个3D高斯。所学习的模板能够适应所穿着的服装，以模拟像连衣裙这样的宽松衣服。这种模板引导的2D参数化使我们能够采用强大的基于StyleGAN的CNN来学习姿势依赖的高斯图，以模拟详细的动态外观。此外，我们引入了一种姿势投影策略，以便在给定新姿势时更好地泛化。总的来说，我们的方法可以创建逼真的、动态的、真实的且泛化性强的化身。实验表明，我们的方法优于其他最先进的方法。\n"
  },
  {
    "path": "abs/2311.16099.md",
    "content": "### GART: Gaussian Articulated Template Models\n\nWe introduce Gaussian Articulated Template Model GART, an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnable forward skinning while further generalizing to more complex non-rigid deformations with novel latent bones. GART can be reconstructed via differentiable rendering from monocular videos in seconds or minutes and rendered in novel poses faster than 150fps.\n\n我们介绍了高斯铰接模板模型GART，这是一种明确的、高效的、表现力丰富的表示方法，用于从单眼视频捕捉和渲染非刚性铰接主体。GART利用移动的3D高斯混合显式近似可变形主体的几何和外观。它利用分类模板模型先验（如SMPL、SMAL等）以及可学习的前向蒙皮技术，同时通过新颖的潜在骨骼进一步泛化到更复杂的非刚性变形。GART可以通过单眼视频的可微分渲染在几秒或几分钟内重建，并且在新姿势下的渲染速度超过150fps。\n"
  },
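A minimal sketch of the forward-skinning step that moves canonical Gaussians into a posed space. It assumes per-bone 4×4 transforms and per-Gaussian skinning weights are already available (e.g., from an SMPL-like template); rotation and covariance updates are elided.

```python
import torch

def skin_gaussian_centers(xyz, weights, bone_T):
    """xyz: (N, 3) canonical centers; weights: (N, B); bone_T: (B, 4, 4)."""
    xyz_h = torch.cat([xyz, torch.ones_like(xyz[:, :1])], dim=-1)    # homogeneous (N, 4)
    T = torch.einsum("nb,bij->nij", weights, bone_T)                 # blended per-point transform
    posed = torch.einsum("nij,nj->ni", T, xyz_h)
    return posed[:, :3]
```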
  {
    "path": "abs/2311.16473.md",
    "content": "### GS-IR: 3D Gaussian Splatting for Inverse Rendering\n\nWe propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g. NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not support producing plausible normal natively; 2) forward mapping (e.g. rasterization and splatting) cannot trace the occlusion like backward mapping (e.g. ray tracing). To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes.\n\n我们提出了一种基于3D高斯溅射（GS）的新型逆向渲染方法GS-IR，该方法利用正向映射体积渲染实现逼真的新视角合成和重新照明效果。与以前使用隐式神经表示和体积渲染的工作（例如NeRF）不同，这些工作受限于表达能力低和计算复杂度高，我们扩展了GS——一种用于新视角合成的顶级性能表示，以从在未知照明条件下捕获的多视图图像中估计场景几何、表面材料和环境照明。在将GS引入逆向渲染时存在两个主要问题：1）GS本身不支持产生合理的法线；2）正向映射（例如光栅化和溅射）无法像反向映射（例如光线追踪）那样追踪遮挡。为了解决这些挑战，我们的GS-IR提出了一种有效的优化方案，该方案结合了基于深度导数的规范化用于法线估计和基于烘焙的遮挡以模拟间接照明。灵活且富有表现力的GS表示使我们能够实现快速紧凑的几何重建、逼真的新视角合成和有效的基于物理的渲染。我们通过对各种具有挑战性的场景进行定性和定量评估，展示了我们方法相对于基线方法的优越性。\n"
  },
  {
    "path": "abs/2311.16482.md",
    "content": "### Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars\n\nNeural radiance fields are capable of reconstructing high-quality drivable human avatars but are expensive to train and render. To reduce consumption, we propose Animatable 3D Gaussian, which learns human avatars from input images and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of skinned 3D Gaussians and a corresponding skeleton in canonical space and deforming 3D Gaussians to posed space according to the input poses. We introduce hash-encoded shape and appearance to speed up training and propose time-dependent ambient occlusion to achieve high-quality reconstructions in scenes containing complex motions and dynamic shadows. On both novel view synthesis and novel pose synthesis tasks, our method outperforms existing methods in terms of training time, rendering speed, and reconstruction quality. Our method can be easily extended to multi-human scenes and achieve comparable novel view synthesis results on a scene with ten people in only 25 seconds of training.\n\n神经辐射场能够重建高质量的可驱动人类化身，但训练和渲染成本高昂。为了减少消耗，我们提出了可动画的3D高斯方法，它从输入图像和姿势中学习人类化身。我们通过建模一组被蒙皮的3D高斯和相应的骨骼在标准空间中，根据输入的姿势将3D高斯变形到姿势空间，将3D高斯扩展到动态人类场景。我们引入哈希编码的形状和外观来加速训练，并提出时间依赖的环境遮挡，以实现在包含复杂运动和动态阴影的场景中高质量的重建。在新视角合成和新姿势合成任务上，我们的方法在训练时间、渲染速度和重建质量方面均优于现有方法。我们的方法可以轻松扩展到多人场景，并在只有25秒的训练中实现十人场景的可比新视角合成结果。\n"
  },
  {
    "path": "abs/2311.16493.md",
    "content": "### Mip-Splatting: Alias-free 3D Gaussian Splatting\n\nRecently, 3D Gaussian Splatting has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, e.g., by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high-frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. Our evaluation, including scenarios such a training on single-scale images and testing on multiple scales, validates the effectiveness of our approach.\n\n最近，三维高斯溅点在新视角合成方面展现了令人印象深刻的成果，达到了高保真度和高效率。然而，当改变采样率时，例如通过改变焦距或相机距离，可以观察到强烈的失真现象。我们发现，这一现象的源头可以归因于缺乏三维频率约束和使用二维扩张滤波器。为了解决这个问题，我们引入了一个三维平滑滤波器，该滤波器基于输入视图引起的最大采样频率来约束三维高斯原语的大小，消除了放大时的高频失真。此外，用二维 Mip 滤波器替换二维扩张滤波器，该滤波器模拟二维盒式滤波器，有效缓解了混叠和扩张问题。我们的评估，包括在单尺度图像上训练和在多尺度上测试的场景，验证了我们方法的有效性。\n"
  },
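A sketch of the 3D smoothing filter idea: dilate each Gaussian's covariance with an isotropic low-pass kernel whose width is set by the maximal sampling frequency the training views induce on that primitive. The constant and the per-Gaussian frequency input are illustrative assumptions; the 2D Mip filter on the screen-space footprint is elided.

```python
import torch

def smooth_3d_gaussians(cov3d, max_freq, c=0.2):
    """cov3d: (N, 3, 3) covariances; max_freq: (N,) max samples per world unit."""
    var = (c / max_freq) ** 2                              # kernel variance per Gaussian
    eye = torch.eye(3, device=cov3d.device).expand_as(cov3d)
    return cov3d + var[:, None, None] * eye                # band-limit each primitive
```

Since the kernel width scales inversely with the sampling frequency, no primitive can become thinner than what the training views could ever resolve, which is exactly the frequency constraint the abstract describes.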
  {
    "path": "abs/2311.17061.md",
    "content": "### HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting\n\nRealistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaussian densification and pruning process. 2) Moreover, we devise an Annealed Negative Prompt Guidance by decomposing SDS into a noisier generative score and a cleaner classifier score, which well addresses the over-saturation issue. The floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, rendering vivid 3D humans under diverse scenarios.\n\n从文本提示中生成逼真的3D人类是一个令人向往但具有挑战性的任务。现有方法通过评分蒸馏采样（SDS）优化像网格或神经场这样的3D表示，但这些方法存在细节不足或训练时间过长的问题。在本文中，我们提出了一种高效且有效的框架HumanGaussian，它生成具有细致几何结构和逼真外观的高质量3D人类。我们的关键见解是，3D高斯飞溅是一种高效的渲染器，具有周期性的高斯缩减或增长，这种自适应密度控制可以自然地由人类内在结构引导。具体来说，1）我们首先提出了一种结构感知的SDS，它同时优化人类外观和几何结构。利用RGB和深度空间的多模态评分函数来蒸馏高斯密化和修剪过程。2）此外，我们设计了一种退火负面提示指导，通过将SDS分解为噪声较大的生成评分和较清晰的分类器评分，有效解决了过饱和问题。基于高斯大小的修剪阶段进一步消除了浮动伪影，以增强生成的平滑度。广泛的实验表明，我们的框架在效率上卓越，并在渲染多样化场景下的3D人类方面具有竞争力的质量。\n"
  },
  {
    "path": "abs/2311.17089.md",
    "content": "### Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering\n\n3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite its high rendering quality and speed at high resolutions, they both deteriorate drastically when rendered at lower resolutions or from far away camera position. During low resolution or far away rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian and leads to aliasing effect. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13%-66% PSNR and 160%-2400% rendering speed improvement at 4×-128× scale rendering on Mip-NeRF360 dataset compared to the single scale 3D Gaussian splatting.\n\n3D高斯最近已成为3D重建和渲染的一个高效表示方法。尽管在高分辨率下其渲染质量和速度都很高，但在较低分辨率或远距离摄像机位置进行渲染时，它们都会急剧恶化。在低分辨率或远距离渲染时，与每个飞溅的3D高斯的屏幕尺寸相比，图像的像素大小可能会低于奈奎斯特频率，从而导致走样效应。渲染速度也因为每个像素飞溅更多高斯的顺序阿尔法混合而大大减慢。为了解决这些问题，我们提出了一种多尺度3D高斯飞溅算法，它保持了不同尺度的高斯来表示同一场景。高分辨率图像使用更多的小高斯渲染，而低分辨率图像使用更少的大高斯渲染。在类似的训练时间内，与单尺度3D高斯飞溅相比，我们的算法在Mip-NeRF360数据集上的4×-128×尺度渲染中可以实现13％-66％的PSNR和160％-2400％的渲染速度提升。\n"
  },
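A sketch of the scale-dependent selection implied above: at a coarser rendering scale, keep only Gaussians whose projected footprint still covers at least roughly a pixel, so tiny primitives that would alias (and slow down per-pixel blending) are skipped. The actual method also aggregates small Gaussians into larger ones per level; that step, and the exact threshold, are elided assumptions here.

```python
import numpy as np

def keep_mask_for_scale(world_radius, depth, focal, downsample):
    """world_radius, depth: (N,) per-Gaussian; focal in pixels; downsample >= 1."""
    pixel_world_size = depth * downsample / focal   # world extent of one output pixel
    return world_radius >= pixel_world_size         # boolean mask of Gaussians to splat
```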
  {
    "path": "abs/2311.17113.md",
    "content": "### Human Gaussian Splatting: Real-time Rendering of Animatable Avatars\n\nThis work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While the classical approaches to model and render virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real-time and their quality degrades when the character is animated with body poses different than the training observations. We propose the first animatable human model based on 3D Gaussian Splatting, that has recently emerged as a very efficient alternative to neural radiance fields. Our body is represented by a set of gaussian primitives in a canonical space which are deformed in a coarse to fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting (OURS) model in an end-to-end fashion from multi-view observations, and evaluate it against the state-of-the-art approaches for novel pose synthesis of clothed body. Our method presents a PSNR 1.5dbB better than the state-of-the-art on THuman4 dataset while being able to render at 20fps or more.\n\n本工作解决了从多视角视频中学习的真实感人体虚拟形象的实时渲染问题。尽管传统的虚拟人类建模和渲染方法通常使用纹理网格，但近期的研究已经开发出了神经体表示，实现了令人印象深刻的视觉质量。然而，这些模型难以实时渲染，并且当角色的身体姿势与训练观测不同时，其质量会降低。我们提出了基于3D高斯喷溅（Gaussian Splatting）的第一个可动画人类模型，这是一种非常高效的替代神经辐射场的新方法。我们的身体由一组高斯原始体在规范空间中表示，并通过结合前向蒙皮和局部非刚性精细化的方法进行粗到细的变形。我们描述了如何从多视角观察中端到端地学习我们的人类高斯喷溅（OURS）模型，并将其与最新技术方法在新姿势合成有衣体方面进行评估。我们的方法在THuman4数据集上比现有技术高出1.5dbB的PSNR，同时能够以20fps或更高速度渲染。\n"
  },
  {
    "path": "abs/2311.17245.md",
    "content": "### LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS\n\nRecent advancements in real-time neural rendering using point-based techniques have paved the way for the widespread adoption of 3D representations. However, foundational approaches like 3D Gaussian Splatting come with a substantial storage overhead caused by growing the SfM points to millions, often demanding gigabyte-level disk space for a single unbounded scene, posing significant scalability challenges and hindering the splatting efficiency.\n\nTo address this challenge, we introduce LightGaussian, a novel method designed to transform 3D Gaussians into a more efficient and compact format. Drawing inspiration from the concept of Network Pruning, LightGaussian identifies Gaussians that are insignificant in contributing to the scene reconstruction and adopts a pruning and recovery process, effectively reducing redundancy in Gaussian counts while preserving visual effects. Additionally, LightGaussian employs distillation and pseudo-view augmentation to distill spherical harmonics to a lower degree, allowing knowledge transfer to more compact representations while maintaining reflectance. Furthermore, we propose a hybrid scheme, VecTree Quantization, to quantize all attributes, resulting in lower bitwidth representations with minimal accuracy losses.\n\nIn summary, LightGaussian achieves an averaged compression rate over 15x while boosting the FPS from 139 to 215, enabling an efficient representation of complex scenes on Mip-NeRF 360, Tank and Temple datasets.\n\n最近在实时神经渲染中使用基于点的技术取得的进步，为3D表示的广泛采用铺平了道路。然而，像3D高斯喷溅这样的基础方法带来了大量的存储开销，因为它将SfM点增长到数百万，通常需要单个无限制场景的千兆字节级磁盘空间，这对可扩展性构成了重大挑战，并阻碍了喷溅效率。\n\n为了应对这一挑战，我们引入了LightGaussian，这是一种旨在将3D高斯转换成更高效、更紧凑格式的新方法。从网络剪枝的概念中获得灵感，LightGaussian识别出在场景重建中贡献不大的高斯，并采用剪枝和恢复过程，有效减少了高斯数量的冗余，同时保留了视觉效果。此外，LightGaussian运用蒸馏和伪视图增强，将球面谐波蒸馏到更低的程度，允许知识转移到更紧凑的表示中，同时保持反射率。此外，我们提出了一种混合方案，VecTree量化，用于对所有属性进行量化，从而实现更低位宽表示，同时将准确性损失降至最低。\n\n总结来说，LightGaussian实现了平均压缩率超过15倍，同时将FPS从139提升到215，使得在Mip-NeRF 360、Tank和Temple数据集上高效地表示复杂场景成为可能。\n"
  },
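A sketch of the prune step: score each Gaussian's significance and drop the least important ones before the recovery (fine-tuning) phase. The opacity-times-volume score here is a crude stand-in; the paper's global significance additionally accumulates each Gaussian's ray-hit contribution over all training views.

```python
import torch

def prune_gaussians(opacity, scales, keep_ratio=0.34):
    """opacity: (N, 1) in [0, 1]; scales: (N, 3) per-axis extents."""
    score = opacity.squeeze(-1) * scales.prod(dim=-1)   # crude importance proxy
    k = max(1, int(keep_ratio * score.numel()))
    keep = torch.topk(score, k).indices                 # indices of surviving Gaussians
    return keep  # use this to index every per-Gaussian attribute tensor
```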
  {
    "path": "abs/2311.17857.md",
    "content": "### Gaussian Shell Maps for Efficient 3D Human Generation\n\nEfficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering the GAN training and requiring multi-view-inconsistent 2D upsamplers. Here, we introduce Gaussian Shell Maps (GSMs) as a framework that connects SOTA generator network architectures with emerging 3D Gaussian rendering primitives using an articulable multi shell--based scaffold. In this setting, a CNN generates a 3D texture stack with features that are mapped to the shells. The latter represent inflated and deflated versions of a template surface of a digital human in a canonical body pose. Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells whose attributes are encoded in the texture features. These Gaussians are efficiently and differentiably rendered. The ability to articulate the shells is important during GAN training and, at inference time, to deform a body into arbitrary user-defined poses. Our efficient rendering scheme bypasses the need for view-inconsistent upsamplers and achieves high-quality multi-view consistent renderings at a native resolution of 512×512 pixels. We demonstrate that GSMs successfully generate 3D humans when trained on single-view datasets, including SHHQ and DeepFashion.\n\n在多个行业中，包括虚拟现实、社交媒体和电影制作，高效生成3D数字人类是非常重要的。3D生成对抗网络（GANs）已经展示了生成资产的最新（SOTA）质量和多样性。然而，当前的3D GAN架构通常依赖于体积表示，这使得渲染变慢，从而阻碍了GAN训练，并且需要多视图不一致的2D上采样器。在这里，我们引入了高斯壳映射（GSMs）作为一个框架，它将最新的生成器网络架构与新兴的3D高斯渲染原语结合起来，使用一个可操作的多壳基础架构。在这种设置下，CNN生成一个具有特征的3D纹理堆栈，这些特征被映射到壳上。后者代表了数字人类模板表面在规范身体姿势下的膨胀和收缩版本。我们不是直接光栅化这些壳，而是在壳上采样3D高斯，其属性被编码在纹理特征中。这些高斯可以高效且可微地渲染。在GAN训练期间以及在推理时将身体变形为任意用户定义姿势时，操作壳的能力非常重要。我们的高效渲染方案绕过了对视图不一致上采样器的需求，并以512×512像素的原生分辨率实现了高质量的多视图一致渲染。我们证明了GSMs在训练于单视图数据集，包括SHHQ和DeepFashion时，成功地生成了3D人类。\n"
  },
  {
    "path": "abs/2311.17907.md",
    "content": "### CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting\n\nWith the onset of diffusion-based generative models and their ability to generate text-conditioned images, content generation has received a massive invigoration. Recently, these models have been shown to provide useful guidance for the generation of 3D graphics assets. However, existing work in text-conditioned 3D generation faces fundamental constraints: (i) inability to generate detailed, multi-object scenes, (ii) inability to textually control multi-object configurations, and (iii) physically realistic scene composition. In this work, we propose CG3D, a method for compositionally generating scalable 3D assets that resolves these constraints. We find that explicit Gaussian radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes. By utilizing a guidance framework built around this explicit representation, we show state of the art results, capable of even exceeding the guiding diffusion model in terms of object combinations and physics accuracy.\n\n随着基于扩散的生成模型的出现以及它们生成文本条件图像的能力，内容生成得到了巨大的振兴。最近，这些模型已被证明可为3D图形资产的生成提供有用的指导。然而，现有的文本条件3D生成工作面临着基本限制：（i）无法生成详细的多对象场景，（ii）无法通过文本控制多对象配置，以及（iii）物理上逼真的场景构成。在这项工作中，我们提出了CG3D，一种用于组合生成可扩展3D资产的方法，解决了这些约束。我们发现，明确的高斯辐射场，被参数化以允许对象的组合，具备实现语义和物理上一致场景的能力。通过利用围绕这种明确表示构建的指导框架，我们展示了最新的成果，甚至能够在对象组合和物理精确度方面超越引导扩散模型。\n"
  },
  {
    "path": "abs/2311.17910.md",
    "content": "### HUGS: Human Gaussian Splats\n\nRecent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number of (50-100) frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We utilize the SMPL body model to initialize the human Gaussians. To capture details that are not modeled by SMPL (e.g. cloth, hairs), we allow the 3D Gaussians to deviate from the human body model. Utilizing 3D Gaussians for animated humans brings new challenges, including the artifacts created when articulating the Gaussians. We propose to jointly optimize the linear blend skinning weights to coordinate the movements of individual Gaussians during animation. Our approach enables novel-pose synthesis of human and novel view synthesis of both the human and the scene. We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train over previous work.\n\n最近在神经渲染方面的进步大幅提高了训练和渲染时间。虽然这些方法展示了最先进的质量和速度，但它们主要设计用于静态场景的摄影测量，对于环境中自由移动的人类并不适用。在这项工作中，我们引入了人类高斯溅点（Human Gaussian Splats，简称 HUGS），它使用三维高斯溅点（3D Gaussian Splatting，简称 3DGS）来表示可动画的人类和场景。我们的方法仅需要一个单目视频，帧数在50到100之间，它能在30分钟内自动学习区分静态场景和完全可动画的人类化身。我们利用SMPL身体模型初始化人类高斯。为了捕捉SMPL无法模拟的细节（例如衣物、头发），我们允许三维高斯偏离人体模型。使用三维高斯来实现人类动画带来了新的挑战，包括在高斯关节活动时产生的失真。我们提出共同优化线性混合蒙皮权重，以协调动画过程中各个高斯的运动。我们的方法使得人类新姿势的合成和人类及场景的新视角合成成为可能。我们实现了最先进的渲染质量，并且渲染速度达到60 FPS，训练速度比以往的工作快约100倍。\n"
  },
  {
    "path": "abs/2311.17977.md",
    "content": "### GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces\n\nThe advent of neural 3D Gaussians has recently brought about a revolution in the field of neural rendering, facilitating the generation of high-quality renderings at real-time speeds. However, the explicit and discrete representation encounters challenges when applied to scenes featuring reflective surfaces. In this paper, we present GaussianShader, a novel method that applies a simplified shading function on 3D Gaussians to enhance the neural rendering in scenes with reflective surfaces while preserving the training and rendering efficiency. The main challenge in applying the shading function lies in the accurate normal estimation on discrete 3D Gaussians. Specifically, we proposed a novel normal estimation framework based on the shortest axis directions of 3D Gaussians with a delicately designed loss to make the consistency between the normals and the geometries of Gaussian spheres. Experiments show that GaussianShader strikes a commendable balance between efficiency and visual quality. Our method surpasses Gaussian Splatting in PSNR on specular object datasets, exhibiting an improvement of 1.57dB. When compared to prior works handling reflective surfaces, such as Ref-NeRF, our optimization time is significantly accelerated (23h vs. 0.58h).\n\n神经3D高斯的出现最近在神经渲染领域引发了一场革命，促进了以实时速度生成高质量渲染的能力。然而，这种明确和离散的表示在应用于具有反射表面的场景时遇到了挑战。在这篇论文中，我们介绍了GaussianShader，这是一种新方法，它在3D高斯上应用了简化的着色函数，以增强具有反射表面的场景中的神经渲染，同时保持训练和渲染效率。应用着色函数的主要挑战在于对离散3D高斯上的精确法线估计。具体来说，我们提出了一种基于3D高斯最短轴方向的新颖法线估计框架，并设计了一个精细的损失函数，以确保法线与高斯球的几何形状之间的一致性。实验表明，GaussianShader在效率和视觉质量之间取得了值得称赞的平衡。我们的方法在反射物体数据集上的PSNR上超过了高斯喷溅，展示了1.57dB的改进。与处理反射表面的先前工作（如Ref-NeRF）相比，我们的优化时间显著加快（23小时对比0.58小时）。\n"
  },
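A sketch of the shortest-axis normal described above: for each Gaussian, take the rotation column associated with its smallest scale and flip it toward the camera. The sign disambiguation is a common convention assumed here, not a detail stated in the abstract.

```python
import torch

def shortest_axis_normals(rot, scales, view_dirs):
    """rot: (N, 3, 3) rotation matrices; scales: (N, 3); view_dirs: (N, 3) toward camera."""
    idx = scales.argmin(dim=-1)                                  # thinnest axis per Gaussian
    n = rot[torch.arange(rot.shape[0]), :, idx]                  # that column is the normal
    sign = torch.sign((n * view_dirs).sum(-1, keepdim=True))
    sign = torch.where(sign == 0, torch.ones_like(sign), sign)   # avoid zeroing degenerate cases
    return n * sign
```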
  {
    "path": "abs/2311.18159.md",
    "content": "### Compact3D: Compressing Gaussian Splat Radiance Field Models with Vector Quantization\n\n3D Gaussian Splatting is a new method for modeling and rendering 3D radiance fields that achieves much faster learning and rendering time compared to SOTA NeRF methods. However, it comes with a drawback in the much larger storage demand compared to NeRF methods since it needs to store the parameters for several 3D Gaussians. We notice that many Gaussians may share similar parameters, so we introduce a simple vector quantization method based on \\kmeans algorithm to quantize the Gaussian parameters. Then, we store the small codebook along with the index of the code for each Gaussian. Moreover, we compress the indices further by sorting them and using a method similar to run-length encoding. We do extensive experiments on standard benchmarks as well as a new benchmark which is an order of magnitude larger than the standard benchmarks. We show that our simple yet effective method can reduce the storage cost for the original 3D Gaussian Splatting method by a factor of almost 20× with a very small drop in the quality of rendered images.\n\n3D高斯喷溅是一种新的建模和渲染3D辐射场的方法，与最新的NeRF方法相比，它实现了更快的学习和渲染时间。然而，与NeRF方法相比，它的一个缺点是需要更大的存储需求，因为它需要存储几个3D高斯的参数。我们注意到许多高斯可能具有相似的参数，因此我们引入了一种基于\\kmeans算法的简单向量量化方法来量化高斯参数。然后，我们存储小型码本以及每个高斯的码索引。此外，我们通过排序索引并使用类似于游程编码的方法进一步压缩索引。我们在标准基准测试以及一个比标准基准测试大一个数量级的新基准测试上进行了广泛的实验。我们展示了我们这种简单而有效的方法可以将原始3D高斯喷溅方法的存储成本减少近20倍，同时渲染图像的质量只有非常小的下降。\n"
  },
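A sketch of the two compression steps: quantize per-Gaussian attribute vectors with K-means, then sort the code indices so a run-length-style encoding becomes effective. The codebook size and the exact encoding are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def compress_attributes(attrs, k=4096):
    """attrs: (N, D) one attribute vector (e.g., covariance params) per Gaussian."""
    km = KMeans(n_clusters=k, n_init=10).fit(attrs)
    codebook, idx = km.cluster_centers_, km.labels_     # (k, D) codebook, (N,) indices
    order = np.argsort(idx, kind="stable")              # sorting groups equal codes into runs
    sorted_idx = idx[order]
    starts = np.flatnonzero(np.diff(sorted_idx, prepend=sorted_idx[0] - 1))
    values = sorted_idx[starts]                         # one code id per run
    lengths = np.diff(np.append(starts, len(sorted_idx)))
    return codebook, order, values, lengths             # store these instead of raw attrs
```

The permutation `order` must also be stored to restore the original Gaussian ordering, but since sorted indices compress extremely well, the net storage still drops sharply.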
  {
    "path": "abs/2311.18482.md",
    "content": "### Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding\n\nOpen-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU.\n\n在3D空间中进行开放词汇查询是具有挑战性但对于场景理解任务（如对象定位和分割）至关重要的。语言嵌入式场景表示通过将语言特征融入到3D空间中取得了进展。然而，它们的有效性严重依赖于训练和渲染中资源密集的神经网络。尽管最近的3D高斯提供了高效和高质量的新视图合成，但直接在其中嵌入语言特征会导致过高的内存使用和性能下降。在这项工作中，我们引入了语言嵌入式3D高斯，这是一种用于开放词汇查询任务的新型场景表示。我们提出了一种专用的量化方案，大幅减轻了内存需求，而不是在3D高斯上嵌入高维原始语义特征。我们还提出了一种新的嵌入过程，实现了更平滑但高精度的查询，以应对多视图特征不一致性和基于点的表示中的高频感应偏差。我们的全面实验表明，我们的表示在当前语言嵌入式表示中实现了最佳的视觉质量和语言查询准确性，同时在单个桌面GPU上保持实时渲染帧率。\n"
  },
  {
    "path": "abs/2311.18561.md",
    "content": "### Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering\n\nModeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors, separating static and dynamic elements, resulting in suboptimal capture of their synergistic interactions. To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes. To enhance temporally coherent representation learning with sparse training data, we introduce a novel flow-based temporal smoothing mechanism and a position-aware adaptive control strategy. Extensive experiments on Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes. Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits 50/6000-fold acceleration in training/rendering over the best alternative.\n\n由于城市场景具有高度复杂的几何结构和在空间和时间上不受限制的动态变化，对其进行动态大规模建模是一项挑战。以往的方法常常采用高级建筑先验，将静态和动态元素分离，从而未能充分捕捉它们的协同互动。为了应对这一挑战，我们提出了一种统一的表示模型，称为周期性振动高斯（PVG）。PVG在高效的三维高斯溅射技术基础上构建，该技术最初是为静态场景表示而设计的，通过引入基于周期性振动的时间动态，PVG能够优雅且统一地表示动态城市场景中各种对象和元素的特性。为了在稀疏训练数据下增强时间上的连贯表示学习，我们引入了一种新颖的基于流的时间平滑机制和位置感知的自适应控制策略。在Waymo开放数据集和KITTI基准测试上的广泛实验表明，PVG在重建和新视角合成方面均超越了现有的最先进方法，无论是对动态场景还是静态场景。值得注意的是，PVG实现了这一点，而无需依赖手动标记的对象边界框或昂贵的光流估计。此外，PVG在训练/渲染方面比最佳替代方案快了50/6000倍。\n"
  },
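A sketch of the periodic-vibration dynamics: each Gaussian's center oscillates sinusoidally around a mean position with a learned amplitude and phase (its life peak). The shared period, the parameter names, and the omitted time-varying opacity envelope are assumptions for illustration.

```python
import math
import torch

def vibrating_centers(mean_xyz, amplitude, tau, t, period=0.2):
    """mean_xyz, amplitude: (N, 3); tau: (N, 1) per-Gaussian phase; t: scalar time."""
    phase = 2.0 * math.pi * (t - tau) / period
    return mean_xyz + amplitude * torch.sin(phase)   # static points simply learn amplitude ~ 0
```

One representation thus covers both cases: static structures keep near-zero amplitude, while moving elements are expressed through their vibration parameters, with no static/dynamic separation imposed up front.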
  {
    "path": "abs/2312.00109.md",
    "content": "### Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering\n\nNeural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved the state-of-the-art rendering quality and speed combining the benefits of both primitive-based representations and volumetric representations. However, it often leads to heavily redundant Gaussians that try to fit every training view, neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less area and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrates an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing the rendering speed.\n\n神经渲染方法在各种学术和工业应用中显著推进了真实感3D场景渲染。最近的3D高斯喷溅方法结合了基于原始体表示和体积表示的优势，实现了最新的渲染质量和速度。然而，它经常导致大量冗余的高斯尝试适应每个训练视图，而忽视了潜在的场景几何。因此，生成的模型对重大视图变化、无纹理区域和光照效果的鲁棒性变差。我们引入了Scaffold-GS，它使用锚点来分布局部3D高斯，并根据观看方向和视锥内的距离即时预测它们的属性。我们根据神经高斯的重要性，开发了锚点生长和剪枝策略，可靠地提高场景覆盖率。我们展示了我们的方法有效地减少了冗余高斯，同时提供高质量的渲染。我们还展示了在不牺牲渲染速度的情况下，增强了适应具有不同细节级别和视图依赖观察的场景的能力。\n"
  },
  {
    "path": "abs/2312.00112.md",
    "content": "### DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting\n\nAccurately and efficiently modeling dynamic scenes and motions is considered so challenging a task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our carefully designed neural framework consisting of a tiny set of learned basis queried only in time allows for rendering speed similar to 3D Gaussian Splatting, surpassing 120 FPS, while at the same time, requiring only double the storage compared to static scenes. Our neural representation adequately constrains the inherently underconstrained motion field of a dynamic scene leading to effective and fast optimization. This is done by biding each point to motion coefficients that enforce the per-point sharing of basis trajectories. By carefully applying a sparsity loss to the motion coefficients, we are able to disentangle the motions that comprise the scene, independently control them, and generate novel motion combinations that have never been seen before. We can reach state-of-the-art render quality within just 5 minutes of training and in less than half an hour, we can synthesize novel views of dynamic scenes with superior photorealistic quality. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions, in monocular and multi-view scenarios.\n\n准确高效地建模动态场景和运动被认为是一个极具挑战性的任务，因为时间动态和运动复杂性。为了应对这些挑战，我们提出了DynMF，一种紧凑高效的表示，将动态场景分解为少量神经轨迹。我们认为，动态场景的每个点的运动可以分解为一小组显式或学习的轨迹。我们精心设计的神经框架由一小组仅在时间上查询的学习基础组成，允许与3D高斯喷溅相似的渲染速度，超过120 FPS，同时仅需要与静态场景相比两倍的存储空间。我们的神经表示充分约束了动态场景本质上不受约束的运动场，从而实现了有效且快速的优化。这是通过将每个点绑定到运动系数上实现的，这些运动系数强制每个点共享基础轨迹。通过对运动系数仔细应用稀疏损失，我们能够分离构成场景的运动，独立控制它们，并生成之前从未见过的新的运动组合。我们可以在短短5分钟的训练内达到最新的渲染质量，并在不到半小时内，我们可以合成具有卓越真实感质量的动态场景的新视图。我们的表示是可解释的、高效的，并且足够表现力，以提供复杂动态场景运动的实时视图合成，无论是单眼还是多视图场景。\n"
  },
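A sketch of the motion factorization: a small set of shared trajectories, mixed per point by learned coefficients, with an L1 penalty encouraging each point to use few bases. Here the time-queried basis is a simple lookup table; the paper uses a tiny learned network, and all sizes are illustrative.

```python
import torch

class MotionBases(torch.nn.Module):
    """Sketch: per-point motion = learned mixture of a few shared trajectories."""

    def __init__(self, num_points, num_bases=16, num_steps=300):
        super().__init__()
        self.bases = torch.nn.Parameter(torch.zeros(num_steps, num_bases, 3))
        self.coeff = torch.nn.Parameter(torch.zeros(num_points, num_bases))

    def forward(self, xyz0, t_idx):
        """xyz0: (N, 3) canonical centers; t_idx: integer time step."""
        delta = self.coeff @ self.bases[t_idx]    # (N, B) @ (B, 3) -> per-point displacement
        sparsity_loss = self.coeff.abs().mean()   # encourages trajectory sharing/disentanglement
        return xyz0 + delta, sparsity_loss
```

Because only the coefficients scale with the number of points while the trajectories are shared, storage stays close to that of a static scene, matching the "double the storage" claim above.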
  {
    "path": "abs/2312.00206.md",
    "content": "### SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting\n\nThe problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views are reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360 scenes from sparse training views. We find that using naive depth priors is not sufficient and integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by up to 30.5% and NeRF-based methods by up to 15.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost.\n\n近来，随着神经辐射场（NeRFs）和其他隐式场景表示方法的引入，新视图合成问题的受欢迎程度显著增长。最近的一个进展，3D高斯喷溅（3DGS），利用显式表示实现了高质量的实时渲染。然而，3DGS仍然需要大量的训练视图来生成一个连贯的场景表示。在少量样本设置中，与NeRF类似，3DGS倾向于过拟合训练视图，导致背景崩塌和过多的漂浮物，尤其是当训练视图数量减少时。我们提出了一种方法，使得能够从稀疏训练视图中训练出连贯的基于3DGS的辐射场，用于360度场景。我们发现，仅使用原始的深度先验是不够的，并将深度先验与生成和显式约束结合起来，以减少背景崩塌，移除漂浮物，并增强从未见过视点的一致性。实验表明，我们的方法在MipNeRF-360数据集上的LPIPS性能比基础3DGS提高了高达30.5%，比基于NeRF的方法提高了高达15.6%，同时训练和推理成本大大减少。\n"
  },
  {
    "path": "abs/2312.00451.md",
    "content": "### FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting\n\nNovel view synthesis from limited observations remains an important and persistent task. However, high efficiency in existing NeRF-based few-shot view synthesis is often compromised to obtain an accurate 3D representation. To address this challenge, we propose a few-shot view synthesis framework based on 3D Gaussian Splatting that enables real-time and photo-realistic view synthesis with as few as three training views. The proposed method, dubbed FSGS, handles the extremely sparse initialized SfM points with a thoughtfully designed Gaussian Unpooling process. Our method iteratively distributes new Gaussians around the most representative locations, subsequently infilling local details in vacant areas. We also integrate a large-scale pre-trained monocular depth estimator within the Gaussians optimization process, leveraging online augmented views to guide the geometric optimization towards an optimal solution. Starting from sparse points observed from limited input viewpoints, our FSGS can accurately grow into unseen regions, comprehensively covering the scene and boosting the rendering quality of novel views. Overall, FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets, including LLFF, Mip-NeRF360, and Blender.\n\n从有限的观察中合成新视图仍然是一个重要且持续的任务。然而，在现有基于NeRF的少样本视图合成中，为了获得准确的3D表示，往往会牺牲高效性。为了应对这一挑战，我们提出了一种基于3D高斯喷溅的少样本视图合成框架，该框架能够仅使用三个训练视图实现实时和真实感视图合成。我们提出的方法，称为FSGS，通过精心设计的高斯Unpooling过程处理极其稀疏的初始化SfM点。我们的方法迭代地在最具代表性的位置周围分布新高斯，随后在空白区域填充局部细节。我们还在高斯优化过程中整合了一个大规模预训练的单目深度估计器，利用在线增强视图指导几何优化朝着最优解发展。从有限输入视点观察到的稀疏点开始，我们的FSGS能够准确地扩展到未见区域，全面覆盖场景并提升新视图的渲染质量。总体而言，FSGS在多种数据集上都实现了最新的性能，包括LLFF、Mip-NeRF360和Blender，无论是在准确性还是渲染效率方面。\n"
  },
  {
    "path": "abs/2312.00583.md",
    "content": "### MD-Splatting: Learning Metric Deformation from 4D Gaussians in Highly Deformable Scenes\n\nAccurate 3D tracking in highly deformable scenes with occlusions and shadows can facilitate new applications in robotics, augmented reality, and generative AI. However, tracking under these conditions is extremely challenging due to the ambiguity that arises with large deformations, shadows, and occlusions. We introduce MD-Splatting, an approach for simultaneous 3D tracking and novel view synthesis, using video captures of a dynamic scene from various camera poses. MD-Splatting builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel view synthesis. MD-Splatting learns a deformation function to project a set of Gaussians with non-metric, thus canonical, properties into metric space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer Gaussian position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on local rigidity, conservation of momentum, and isometry, which leads to trajectories with smaller trajectory errors. MD-Splatting achieves high-quality 3D tracking on highly deformable scenes with shadows and occlusions. Compared to state-of-the-art, we improve 3D tracking by an average of 23.9 %, while simultaneously achieving high-quality novel view synthesis. With sufficient texture such as in scene 6, MD-Splatting achieves a median tracking error of 3.39 mm on a cloth of 1 x 1 meters in size.\n\n在高度可变形的场景中进行精确的3D跟踪，可以为机器人技术、增强现实和生成式人工智能开辟新的应用领域。然而，在这些条件下进行跟踪极具挑战性，因为大幅度的变形、阴影和遮挡会产生歧义。我们引入了MD-Splatting，这是一种同时进行3D跟踪和新视图合成的方法，使用从不同摄像机角度捕捉的动态场景视频。MD-Splatting建立在高斯喷溅的最新进展之上，这是一种学习大量高斯属性以实现最新和快速新视图合成的方法。MD-Splatting学习一个变形函数，将具有非度量（即规范）属性的一组高斯投影到度量空间中。变形函数使用神经体素编码和多层感知器（MLP）来推断高斯的位置、旋转和阴影标量。我们强制执行基于局部刚性、动量守恒和等距的物理启发式正则化项，这导致具有较小轨迹误差的轨迹。MD-Splatting在具有阴影和遮挡的高度可变形场景中实现了高质量的3D跟踪。与最新技术相比，我们将3D跟踪平均提高了23.9％，同时实现了高质量的新视图合成。在具有足够纹理的场景中，如场景6，MD-Splatting在1 x 1米大小的布料上实现了3.39毫米的中位跟踪误差。\n"
  },
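A sketch of one of the physics-inspired regularizers: an isometry term that penalizes changes in distance between each Gaussian and its canonical-space nearest neighbours after deformation. Neighbour precomputation and the local-rigidity and momentum terms are elided; names are illustrative.

```python
import torch

def isometry_loss(xyz_canon, xyz_deformed, nn_idx):
    """xyz_*: (N, 3); nn_idx: (N, K) neighbour indices found in canonical space."""
    d_canon = (xyz_canon[:, None, :] - xyz_canon[nn_idx]).norm(dim=-1)        # (N, K)
    d_deform = (xyz_deformed[:, None, :] - xyz_deformed[nn_idx]).norm(dim=-1)  # (N, K)
    return (d_canon - d_deform).abs().mean()   # preserve local metric structure
```

Penalizing metric distortion like this is what keeps tracked trajectories coherent under the large deformations, shadows, and occlusions the abstract highlights.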
  {
    "path": "abs/2312.00732.md",
    "content": "### Gaussian Grouping: Segment and Edit Anything in 3D Scenes\n\nThe recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of the 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during the differentiable rendering by leveraging the 2D mask predictions by SAM, along with introduced 3D spatial consistency regularization. Comparing to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization and scene recomposition.\n\n最近的高斯喷溅技术实现了3D场景的高质量和实时新视图合成。然而，它仅专注于外观和几何建模，而缺乏细粒度的对象级场景理解。为了解决这个问题，我们提出了高斯分组，这是对高斯喷溅的扩展，用于同时重建和分割开放世界3D场景中的任何事物。我们为每个高斯增加了一个紧凑的身份编码，允许根据3D场景中的对象实例或材料成员将高斯进行分组。我们不是求助于昂贵的3D标签，而是在可微渲染过程中通过利用SAM的2D遮罩预测来监督身份编码，同时引入了3D空间一致性正则化。与隐式的NeRF表示相比，我们展示了离散且分组的3D高斯可以以高视觉质量、细粒度和效率在3D中重建、分割和编辑任何事物。基于高斯分组，我们进一步提出了一种局部高斯编辑方案，该方案在多种场景编辑应用中显示出有效性，包括3D对象移除、修复、上色和场景重组。\n"
  },
  {
    "path": "abs/2312.00846.md",
    "content": "### NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance\n\nExisting neural implicit surface reconstruction methods have achieved impressive performance in multi-view 3D reconstruction by leveraging explicit geometry priors such as depth maps or point clouds as regularization. However, the reconstruction results still lack fine details because of the over-smoothed depth map or sparse point cloud. In this work, we propose a neural implicit surface reconstruction pipeline with guidance from 3D Gaussian Splatting to recover highly detailed surfaces. The advantage of 3D Gaussian Splatting is that it can generate dense point clouds with detailed structure. Nonetheless, a naive adoption of 3D Gaussian Splatting can fail since the generated points are the centers of 3D Gaussians that do not necessarily lie on the surface. We thus introduce a scale regularizer to pull the centers close to the surface by enforcing the 3D Gaussians to be extremely thin. Moreover, we propose to refine the point cloud from 3D Gaussians Splatting with the normal priors from the surface predicted by neural implicit models instead of using a fixed set of points as guidance. Consequently, the quality of surface reconstruction improves from the guidance of the more accurate 3D Gaussian splatting. By jointly optimizing the 3D Gaussian Splatting and the neural implicit model, our approach benefits from both representations and generates complete surfaces with intricate details. Experiments on Tanks and Temples verify the effectiveness of our proposed method.\n\n现有的神经隐式表面重建方法在多视图3D重建中取得了令人印象深刻的表现，这归功于利用显式几何先验（如深度图或点云）作为正则化。然而，重建结果仍然缺乏细节，因为深度图过于平滑或点云稀疏。在这项工作中，我们提出了一种以3D高斯喷溅为指导的神经隐式表面重建流程，以恢复高度详细的表面。3D高斯喷溅的优势在于它能生成具有详细结构的密集点云。尽管如此，简单地采用3D高斯喷溅可能会失败，因为生成的点是3D高斯的中心，这些中心不一定位于表面上。因此，我们引入了一个尺度正则化器，通过强制3D高斯变得极其细长，将中心点拉近到表面。此外，我们提出使用由神经隐式模型预测的表面法线先验来细化3D高斯喷溅产生的点云，而不是使用固定点集作为指导。结果是，更准确的3D高斯喷溅指导下，表面重建质量得到了提升。通过联合优化3D高斯喷溅和神经隐式模型，我们的方法受益于两种表示，并生成了具有复杂细节的完整表面。在“坦克与庙宇”数据集上的实验验证了我们提出方法的有效性。\n"
  },
  {
    "path": "abs/2312.00860.md",
    "content": "### Segment Any 3D Gaussians\n\nInteractive 3D segmentation in radiance fields is an appealing task since its importance in 3D scene understanding and manipulation. However, existing methods face challenges in either achieving fine-grained, multi-granularity segmentation or contending with substantial computational overhead, inhibiting real-time interaction. In this paper, we introduce Segment Any 3D GAussians (SAGA), a novel 3D interactive segmentation approach that seamlessly blends a 2D segmentation foundation model with 3D Gaussian Splatting (3DGS), a recent breakthrough of radiance fields. SAGA efficiently embeds multi-granularity 2D segmentation results generated by the segmentation foundation model into 3D Gaussian point features through well-designed contrastive training. Evaluation on existing benchmarks demonstrates that SAGA can achieve competitive performance with state-of-the-art methods. Moreover, SAGA achieves multi-granularity segmentation and accommodates various prompts, including points, scribbles, and 2D masks. Notably, SAGA can finish the 3D segmentation within milliseconds, achieving nearly 1000x acceleration compared to previous SOTA.\n\n交互式3D辐射场分割是一个吸引人的任务，因为它在3D场景理解和操纵中非常重要。然而，现有方法在实现细粒度、多粒度分割或应对大量计算开销方面面临挑战，这限制了实时互动。在这篇论文中，我们介绍了“分割任意3D高斯”（SAGA），这是一种新颖的3D交互式分割方法，它将2D分割基础模型与3D高斯喷溅（3DGS）无缝融合，后者是辐射场的最新突破。SAGA通过精心设计的对比训练，高效地将分割基础模型生成的多粒度2D分割结果嵌入到3D高斯点特征中。在现有基准测试上的评估表明，SAGA能够与最先进的方法竞争。此外，SAGA实现了多粒度分割，并适应各种提示，包括点、涂鸦和2D蒙版。值得注意的是，SAGA可以在几毫秒内完成3D分割，与以前的最先进技术相比，几乎实现了1000倍的加速。\n"
  },
  {
    "path": "abs/2312.01196.md",
    "content": "### Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction\n\nReconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by imposing a two-stage approach: first, we fit a low-rank neural deformation model, which then is used as regularization for non-rigid reconstruction in the second stage. The first stage learns the object's deformations such that it preserves consistency in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in and deformed by local oriented volumes. The resulting combined model can be rendered as radiance fields, resulting in high-quality photo-realistic reconstructions of the non-rigidly deforming objects, maintaining 3D consistency across novel views. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues.\n\n从单目视频中重建动态对象是一个受限严重且具有挑战性的问题，近期的工作已经从各个方向对其进行了探索。然而，由于这个问题本质上是不适定的，目前还没有解决方案能够从与训练视图显著不同的相机位置提供一致、高质量的新视图。在这项工作中，我们引入了神经参数高斯（NPGs）来应对这一挑战，采用了两阶段方法：首先，我们拟合一个低秩神经变形模型，然后在第二阶段用作非刚性重建的正则化。第一阶段学习对象的变形，以保持新视图中的一致性。第二阶段通过优化由粗略模型驱动的3D高斯来获得高重建质量。为此，我们引入了一种局部3D高斯表示，其中时间共享的高斯被固定在局部定向体积中并由其变形。最终组合模型可以被渲染为辐射场，从而实现对非刚性变形对象的高质量、逼真的重建，同时在新视图中保持3D一致性。我们展示了NPGs与先前工作相比，在具有少量多视图提示的挑战性场景中取得了更优异的结果。\n"
  },
  {
    "path": "abs/2312.01632.md",
    "content": "### GaussianHead: Impressive Head Avatars with Learnable Gaussian Diffusion\n\nPrevious head avatar methods have primarily relied on fixed-shape scene primitives, lacking a balance between geometric topology, texture details, and computational efficiency. Some hybrid neural network methods (e.g., planes and voxels) gained advantages in fast rendering, but they all used axis-aligned mappings to extract features explicitly, leading to issues of axis-aligned bias and feature dilution. We present GaussianHead, which utilizes deformable 3D Gaussians as building blocks for the head avatars. We propose a novel methodology where the core Gaussians designated for rendering undergo dynamic diffusion before being mapped onto a factor plane to acquire canonical sub-factors. Through our factor blending strategy, the canonical features for the core Gaussians used in rendering are obtained. This approach deviates from the previous practice of utilizing axis-aligned mappings, especially improving the representation capability of subtle structures such as teeth, wrinkles, hair, and even facial pores. In comparison to state-of-the-art methods, our unique primitive selection and factor decomposition in GaussianHead deliver superior visual results while maintaining rendering performance (0.1 seconds per frame). Code will released for research.\n\n以往的头像头部虚拟化方法主要依赖于固定形状的场景基元，缺乏几何拓扑、纹理细节和计算效率之间的平衡。一些混合神经网络方法（例如，平面和体素）在快速渲染方面获得了优势，但它们都使用轴对齐映射来显式提取特征，导致了轴对齐偏差和特征稀释的问题。我们提出了GaussianHead，它利用可变形的3D高斯作为头像头部的构建块。我们提出了一种新颖的方法，其中用于渲染的核心高斯在映射到因子平面以获取规范子因子之前会经历动态扩散。通过我们的因子混合策略，用于渲染中的核心高斯的规范特征得以获得。这种方法偏离了以前使用轴对齐映射的做法，尤其是在提升细微结构（如牙齿、皱纹、头发甚至面部毛孔）的表示能力方面。与最先进的方法相比，我们的GaussianHead独特的原始选择和因子分解提供了卓越的视觉结果，同时保持了渲染性能（每帧0.1秒）。\n"
  },
  {
    "path": "abs/2312.02069.md",
    "content": "### GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians\n\nWe introduce GaussianAvatars, a new method to create photorealistic head avatars that are fully controllable in terms of expression, pose, and viewpoint. The core idea is a dynamic 3D representation based on 3D Gaussian splats that are rigged to a parametric morphable face model. This combination facilitates photorealistic rendering while allowing for precise animation control via the underlying parametric model, e.g., through expression transfer from a driving sequence or by manually changing the morphable model parameters. We parameterize each splat by a local coordinate frame of a triangle and optimize for explicit displacement offset to obtain a more accurate geometric representation. During avatar reconstruction, we jointly optimize for the morphable model parameters and Gaussian splat parameters in an end-to-end fashion. We demonstrate the animation capabilities of our photorealistic avatar in several challenging scenarios. For instance, we show reenactments from a driving video, where our method outperforms existing works by a significant margin.\n\n我们介绍了GaussianAvatars，这是一种创建逼真的头部虚拟化形象的新方法，这些形象在表情、姿势和视点方面都是完全可控的。核心思想是基于3D高斯喷溅的动态3D表示，这些喷溅被绑定到一个参数化的可变形面部模型。这种组合促进了逼真的渲染，同时允许通过底层参数模型进行精确的动画控制，例如，通过从驱动序列传递表情或手动更改可变形模型参数。我们通过三角形的局部坐标框架参数化每个喷溅，并优化显式位移偏移，以获得更准确的几何表示。在虚拟化形象重建过程中，我们以端到端的方式联合优化可变形模型参数和高斯喷溅参数。我们在几个具有挑战性的场景中展示了我们逼真虚拟化形象的动画能力。例如，我们展示了从驱动视频中的再现，其中我们的方法在性能上显著超越了现有工作。\n"
  },
  {
    "path": "abs/2312.02126.md",
    "content": "### SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM\n\nDense simultaneous localization and mapping (SLAM) is pivotal for embodied scene understanding. Recent work has shown that 3D Gaussians enable high-quality reconstruction and real-time rendering of scenes using multiple posed cameras. In this light, we show for the first time that representing a scene by 3D Gaussians can enable dense SLAM using a single unposed monocular RGB-D camera. Our method, SplaTAM, addresses the limitations of prior radiance field-based representations, including fast rendering and optimization, the ability to determine if areas have been previously mapped, and structured map expansion by adding more Gaussians. We employ an online tracking and mapping pipeline while tailoring it to specifically use an underlying Gaussian representation and silhouette-guided optimization via differentiable rendering. Extensive experiments show that SplaTAM achieves up to 2X state-of-the-art performance in camera pose estimation, map construction, and novel-view synthesis, demonstrating its superiority over existing approaches, while allowing real-time rendering of a high-resolution dense 3D map.\n\n密集型同时定位与地图构建（SLAM）对于具身场景理解至关重要。最近的工作显示，使用多个摆放的摄像头，3D高斯可以实现高质量重建和实时渲染场景。基于此，我们首次展示了用3D高斯表示一个场景，可以使用单个未摆放的单目RGB-D摄像头实现密集型SLAM。我们的方法，SplaTAM，解决了以前基于辐射场表示的限制，包括快速渲染和优化、判断区域是否已经被映射的能力，以及通过增加更多高斯来结构化地扩展地图。我们采用在线跟踪和映射流程，同时专门使用基于高斯的表示和通过可微分渲染的轮廓引导优化。广泛的实验表明，SplaTAM在相机姿态估计、地图构建和新视图合成方面实现了多达2倍的最先进性能，证明了其优于现有方法，同时允许对高分辨率密集3D地图进行实时渲染。\n"
  },
  {
    "path": "abs/2312.02134.md",
    "content": "### GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians\n\nWe present GaussianAvatar, an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video. We start by introducing animatable 3D Gaussians to explicitly represent humans in various poses and clothing styles. Such an explicit and animatable representation can fuse 3D appearances more efficiently and consistently from 2D observations. Our representation is further augmented with dynamic properties to support pose-dependent appearance modeling, where a dynamic appearance network along with an optimizable feature tensor is designed to learn the motion-to-appearance mapping. Moreover, by leveraging the differentiable motion condition, our method enables a joint optimization of motions and appearances during avatar modeling, which helps to tackle the long-standing issue of inaccurate motion estimation in monocular settings. The efficacy of GaussianAvatar is validated on both the public dataset and our collected dataset, demonstrating its superior performances in terms of appearance quality and rendering efficiency.\n\n我们介绍了 GaussianAvatar，这是一种从单个视频创建逼真的动态三维人类形象的高效方法。我们首先引入可动画的三维高斯模型来明确表示各种姿势和服装风格的人类。这种明确且可动画的表示可以更有效、更一致地从二维观测中融合三维外观。我们的表示进一步增加了动态属性，以支持依赖于姿势的外观建模，其中设计了一个动态外观网络和一个可优化的特征张量，用于学习运动到外观的映射。此外，通过利用可微分的运动条件，我们的方法可以在形象建模过程中对运动和外观进行联合优化，这有助于解决单眼设置中运动估计不准确的长期问题。在公共数据集和我们收集的数据集上验证了 GaussianAvatar 的有效性，其在外观质量和渲染效率方面表现出色。\n"
  },
  {
    "path": "abs/2312.02137.md",
    "content": "### MANUS: Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians\n\nUnderstanding how we grasp objects with our hands has important applications in areas like robotics and mixed reality. However, this challenging problem requires accurate modeling of the contact between hands and objects. To capture grasps, existing methods use skeletons, meshes, or parametric models that can cause misalignments resulting in inaccurate contacts. We present MANUS, a method for Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians. We build a novel articulated 3D Gaussians representation that extends 3D Gaussian splatting for high-fidelity representation of articulating hands. Since our representation uses Gaussian primitives, it enables us to efficiently and accurately estimate contacts between the hand and the object. For the most accurate results, our method requires tens of camera views that current datasets do not provide. We therefore build MANUS-Grasps, a new dataset that contains hand-object grasps viewed from 53 cameras across 30+ scenes, 3 subjects, and comprising over 7M frames. In addition to extensive qualitative results, we also show that our method outperforms others on a quantitative contact evaluation method that uses paint transfer from the object to the hand.\n\n理解我们如何用手抓握物体在机器人技术和混合现实等领域具有重要应用。然而，这个挑战性问题需要准确建模手和物体之间的接触。为了捕捉抓握动作，现有方法使用骨架、网格或参数模型，这可能导致不对齐，从而导致接触不准确。我们提出了一种名为 MANUS 的方法，用于无标记的手-物体抓握捕捉，使用可动的三维高斯模型。我们构建了一种新颖的可动三维高斯表示，扩展了三维高斯分散技术，以高保真度表示关节手。由于我们的表示使用高斯原始体，它使我们能够高效且准确地估计手和物体之间的接触。为了获得最准确的结果，我们的方法需要数十个摄像机视角，而当前的数据集并未提供。因此，我们构建了 MANUS-Grasps，一个新的数据集，包含从53个摄像机拍摄的30多个场景、3个主体以及超过700万帧的手-物体抓握动作。除了大量定性结果外，我们还展示了我们的方法在定量接触评估方法上的表现优于其他方法，该方法使用从物体到手的颜料转移。\n"
  },
  {
    "path": "abs/2312.02155.md",
    "content": "### GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis\n\nWe present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimizations, we introduce Gaussian parameter maps defined on the source views and regress directly Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.\n\n我们提出了一种新的方法，名为GPS-Gaussian，用于实时合成角色的新视图。所提出的方法能够在稀疏视图摄像头设置下实现2K分辨率渲染。与原始的高斯喷溅或神经隐式渲染方法需要对每个主题进行优化不同，我们在源视图上引入了定义的高斯参数图，并直接回归高斯喷溅属性以实现即时新视图合成，无需任何微调或优化。为此，我们在大量人类扫描数据上训练我们的高斯参数回归模块，与深度估计模块联合使用，将2D参数图提升到3D空间。所提出的框架完全可微分，且在几个数据集上的实验表明，我们的方法在渲染速度上超越了最先进的方法。\n"
  },
  {
    "path": "abs/2312.02902.md",
    "content": "### HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting\n\n3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by the advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, the first model to use 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper we introduce a hybrid model that extends the explicit representation from 3DGS with a base of learnable latent features, which can be linearly blended with low-dimensional parameters from parametric head models to obtain expression-dependent final color and opacity values. We demonstrate that HeadGaS delivers state-of-the-art results in real-time inference frame rates, which surpasses baselines by up to ~2dB, while accelerating rendering speed by over x10.\n\n三维头部动画在过去几年里取得了重大的质量和运行时间改进，特别是受益于可微分渲染和神经辐射场的进步。实时渲染是现实世界应用中非常渴望达到的目标。我们提出了 HeadGaS，这是第一个使用三维高斯分散（3DGS）进行三维头部重建和动画的模型。在本文中，我们介绍了一种混合模型，该模型将来自3DGS的显式表示与可学习的潜在特征基底相结合，这些特征可以与参数头部模型中的低维参数线性混合，以获得表情依赖的最终颜色和不透明度值。我们展示了 HeadGaS 在实时推理帧率方面提供了最先进的结果，其性能超过基准线高达约2dB，同时加速渲染速度超过10倍。\n"
  },
  {
    "path": "abs/2312.02973.md",
    "content": "### GauHuman: Articulated Gaussian Splatting from Monocular Human Videos\n\nWe present, GauHuman, a 3D human model with Gaussian Splatting for both fast training (1 ~ 2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modelling frameworks demanding hours of training and seconds of rendering per frame. Specifically, GauHuman encodes Gaussian Splatting in the canonical space and transforms 3D Gaussians from canonical space to posed space with linear blend skinning (LBS), in which effective pose and LBS refinement modules are designed to learn fine details of 3D humans under negligible computational cost. Moreover, to enable fast optimization of GauHuman, we initialize and prune 3D Gaussians with 3D human prior, while splitting/cloning via KL divergence guidance, along with a novel merge operation for further speeding up. Extensive experiments on ZJU_Mocap and MonoCap datasets demonstrate that GauHuman achieves state-of-the-art performance quantitatively and qualitatively with fast training and real-time rendering speed. Notably, without sacrificing rendering quality, GauHuman can fast model the 3D human performer with ~13k 3D Gaussians.\n\n我们介绍了GauHuman，一种具有高斯投影的三维人体模型，相比现有基于NeRF的隐式表示建模框架，GauHuman不仅训练速度快（1～2分钟），而且能实时渲染（最高达189 FPS），现有模型需要几小时的训练时间和每帧几秒的渲染时间。具体来说，GauHuman在典型空间内编码高斯投影，并通过线性混合皮肤（LBS）将三维高斯从典型空间转换到姿态空间，在此过程中，我们设计了有效的姿态和LBS精炼模块，以微不足道的计算成本学习三维人体的细节。此外，为了快速优化GauHuman，我们使用三维人体先验初始化和修剪三维高斯，同时通过KL散度指导进行分裂/克隆，并引入一种新的合并操作以进一步加速。在ZJU_Mocap和MonoCap数据集上的广泛实验表明，GauHuman在快速训练和实时渲染速度方面定量和定性上均达到了最先进的性能。值得注意的是，GauHuman能够快速建模三维人体表演者，使用约13k个三维高斯，而不牺牲渲染质量。\n"
  },
  {
    "path": "abs/2312.03029.md",
    "content": "### Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians\n\nCreating high-fidelity 3D head avatars has always been a research hotspot, but there remains a great challenge under lightweight sparse view setups. In this paper, we propose Gaussian Head Avatar represented by controllable 3D Gaussians for high-fidelity head avatar modeling. We optimize the neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, thereby our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a well-designed geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. Experiments show our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions.\n\n创建高保真三维头部形象一直是研究热点，但在轻量级稀疏视图设置下仍然存在巨大挑战。在这篇论文中，我们提出了由可控三维高斯表示的 Gaussian Head Avatar，用于高保真头部形象建模。我们优化了中性的三维高斯模型和一个完全学习的基于 MLP 的变形场，以捕捉复杂的表情。这两部分相辅相成，因此我们的方法可以在确保表情准确性的同时，建模精细的动态细节。此外，我们设计了一个基于隐式SDF和深度行进四面体的精心设计的几何引导初始化策略，以确保训练程序的稳定性和收敛性。实验表明，我们的方法在超高保真渲染质量方面超越了其他最先进的稀疏视图方法，即使在夸张的表情下也能实现2K分辨率。\n"
  },
  {
    "path": "abs/2312.03203.md",
    "content": "### Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields\n\n3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework leads to warp-level divergence. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model.\n\n近年来，三维场景表示在技术界获得了巨大的关注。使用神经辐射场 (NeRF) 的方法对传统任务如新视角合成表现出多功能性。近期，一些工作出现，旨在扩展 NeRF 功能超越视角合成，用于诸如编辑和分割等具有语义意识的任务，这是通过从二维基础模型提炼三维特征场实现的。然而，这些方法有两个主要限制：(a) 它们受限于 NeRF 管道的渲染速度，以及 (b) 隐式表示的特征场遭受连续性伪影，降低了特征质量。最近，三维高斯分散在实时辐射场渲染上展现了最先进的性能。在这项工作中，我们更进一步：除了辐射场渲染外，我们还通过二维基础模型提炼，使三维高斯分散能够应用于任意维度的语义特征。这种转换并不简单：直接将特征场纳入3DGS框架会导致层级发散。我们提出了架构和训练的改变，以有效地避免这个问题。我们提出的方法是通用的，我们的实验展示了新视角的语义分割、语言引导的编辑，以及通过学习来自最先进的二维基础模型如 SAM 和 CLIP-LSeg 的特征场，实现“分割任何事物”。在实验中，我们的提炼方法能够提供可比较或更好的结果，同时在训练和渲染方面显著更快。此外，据我们所知，我们是第一个通过利用 SAM 模型，实现辐射场操纵的点和边界框提示的方法。\n"
  },
  {
    "path": "abs/2312.03431.md",
    "content": "### Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle\n\nWe introduce Gaussian-Flow, a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos. In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS). Specifically, a novel Dual-Domain Deformation Model (DDDM) is proposed to explicitly model attribute deformations of each Gaussian point, where the time-dependent residual of each attribute is captured by a polynomial fitting in the time domain, and a Fourier series fitting in the frequency domain. The proposed DDDM is capable of modeling complex scene deformations across long video footage, eliminating the need for training separate 3DGS for each frame or introducing an additional implicit neural field to model 3D dynamics. Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction. Our proposed approach showcases a substantial efficiency improvement, achieving a 5× faster training speed compared to the per-frame 3DGS modeling. In addition, quantitative results demonstrate that the proposed Gaussian-Flow significantly outperforms previous leading methods in novel view rendering quality.\n\n我们介绍了 Gaussian-Flow，这是一种新颖的基于点的方法，用于快速动态场景重建和从多视角及单眼视频实时渲染。与受到训练和渲染速度缓慢困扰的流行 NeRF-based 方法不同，我们的方法利用了最新的点基三维高斯分散 (3DGS) 技术。具体来说，我们提出了一种新颖的双域变形模型 (DDDM)，用于显式建模每个高斯点的属性变形，其中每个属性的时域残差通过多项式拟合捕获，频域残差通过傅立叶级数拟合捕获。所提出的 DDDM 能够模拟长时间视频材料中复杂的场景变形，消除了为每一帧训练单独 3DGS 或引入额外隐式神经场来模拟三维动态的需求。此外，对离散高斯点的显式变形建模确保了四维场景的超快训练和渲染速度，与原始设计用于静态三维重建的 3DGS 相当。我们提出的方法展示了显著的效率提升，与每帧 3DGS 建模相比，训练速度提高了 5 倍。此外，定量结果表明，所提出的 Gaussian-Flow 在新视角渲染质量方面显著优于先前领先的方法。\n"
  },
  {
    "path": "abs/2312.03461.md",
    "content": "### HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting\n\nWe have recently seen tremendous progress in photo-real human modeling and rendering. Yet, efficiently rendering realistic human performance and integrating it into the rasterization pipeline remains challenging. In this paper, we present HiFi4G, an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking, achieving a compact and compression-friendly representation. We first propose a dual-graph mechanism to obtain motion priors, with a coarse deformation graph for effective initialization and a fine-grained Gaussian graph to enforce subsequent constraints. Then, we utilize a 4D Gaussian optimization scheme with adaptive spatial-temporal regularizers to effectively balance the non-rigid prior and Gaussian updating. We also present a companion compression scheme with residual compensation for immersive experiences on various platforms. It achieves a substantial compression rate of approximately 25 times, with less than 2MB of storage per frame. Extensive experiments demonstrate the effectiveness of our approach, which significantly outperforms existing approaches in terms of optimization speed, rendering quality, and storage overhead.\n\n近期，在逼真人类建模和渲染方面取得了巨大进展。然而，高效地渲染逼真的人类表现并将其集成到光栅化流程中仍然具有挑战性。在这篇论文中，我们介绍了HiFi4G，这是一种基于高斯的明确且紧凑的方法，用于从密集的影像资料中渲染高保真度的人类表现。我们的核心直觉是将三维高斯表征与非刚性跟踪结合起来，实现一个紧凑且便于压缩的表征。首先，我们提出了一种双图机制来获取运动先验，包括一个粗略变形图用于有效初始化，以及一个细粒度高斯图用于实施后续约束。接着，我们利用一种具有自适应时空正则化器的4D高斯优化方案，有效平衡非刚性先验和高斯更新。我们还提出了一种伴随的压缩方案，带有残差补偿，以在各种平台上实现沉浸式体验。该方案实现了约25倍的显著压缩率，每帧存储量不到2MB。广泛的实验表明，我们的方法在优化速度、渲染质量和存储开销方面显著优于现有方法。\n"
  },
  {
    "path": "abs/2312.03704.md",
    "content": "### Relightable Gaussian Codec Avatars\n\nThe fidelity of relighting is bounded by both geometry and appearance representations. For geometry, both mesh and volumetric approaches have difficulty modeling intricate structures like 3D hair geometry. For appearance, existing relighting models are limited in fidelity and often too slow to render in real-time with high-resolution continuous environments. In this work, we present Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. Our geometry model based on 3D Gaussians can capture 3D-consistent sub-millimeter details such as hair strands and pores on dynamic face sequences. To support diverse materials of human heads such as the eyes, skin, and hair in a unified manner, we present a novel relightable appearance model based on learnable radiance transfer. Together with global illumination-aware spherical harmonics for the diffuse components, we achieve real-time relighting with spatially all-frequency reflections using spherical Gaussians. This appearance model can be efficiently relit under both point light and continuous illumination. We further improve the fidelity of eye reflections and enable explicit gaze control by introducing relightable explicit eye models. Our method outperforms existing approaches without compromising real-time performance. We also demonstrate real-time relighting of avatars on a tethered consumer VR headset, showcasing the efficiency and fidelity of our avatars.\n\n在照明再现的真实度方面，几何和外观表示都是限制因素。对于几何来说，网格和体积方法在模拟诸如三维头发几何这样复杂结构时都存在困难。在外观方面，现有的照明再现模型在真实度上有限，并且在高分辨率连续环境中实时渲染过于缓慢。在这项工作中，我们提出了 Relightable Gaussian Codec Avatars 方法，用于构建高保真、可重新照明的头部形象，这些形象可以被动画化以生成新的表情。我们基于三维高斯的几何模型能够捕获三维一致的亚毫米级细节，如动态面部序列中的头发丝和毛孔。为了统一支持人类头部的各种材料，如眼睛、皮肤和头发，我们提出了一种基于可学习辐射传递的新颖可照明外观模型。结合全局照明感知的球谐函数用于漫反射组件，我们使用球形高斯实现了实时照明，具有空间上全频率的反射。这种外观模型可以在点光源和连续照明下有效地重新照明。我们进一步通过引入可照明的显式眼睛模型来提高眼睛反射的真实度，并实现显式凝视控制。我们的方法在不妥协实时性能的情况下，超越了现有方法。我们还展示了在有线消费者 VR 头显上实时重新照明形象，展示了我们形象的效率和真实度。\n"
  },
  {
    "path": "abs/2312.04558.md",
    "content": "### MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar\n\nThe ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent advancements in head avatar techniques, including explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representation have been exploited for this ongoing research. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due to the extensive quantity of points involved, and the last ones suffer from limitations in deformation flexibility and rendering efficiency. In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that harnesses 3D Gaussian point representation coupled with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points characterized by adaptable shapes, enabling flexible topology. These points exhibit movement with a Gaussian deformation field in alignment with the target pose and expression of a person, facilitating efficient deformation. Additionally, the Gaussian points have controllable shape, size, color, and opacity combined with Gaussian splatting, allowing for efficient training and rendering. Experiments demonstrate the superior performance of our method, which achieves state-of-the-art results among previous methods.\n\n在单目人像视频序列中重建逼真的头部形象的能力，代表了连接虚拟世界与现实世界之间的关键一步。最近在头部形象技术方面的进展，包括显式的三维可变形网格（3DMM）、点云和神经隐式表示，已被用于这一持续的研究。然而，基于3DMM的方法受限于其固定的拓扑结构，基于点的方法由于涉及的点数量庞大而承受着繁重的训练负担，最后一种方法在变形灵活性和渲染效率上存在局限性。为了应对这些挑战，我们提出了 MonoGaussianAvatar（单目高斯点基头部形象），这是一种新颖的方法，它利用三维高斯点表示与高斯变形场结合，从单目人像视频中学习显式头部形象。我们用具有可调整形状的高斯点定义头部形象，从而实现灵活的拓扑结构。这些点随着目标姿势和人的表情，通过高斯变形场展示运动，从而实现高效的变形。此外，高斯点具有可控的形状、大小、颜色和不透明度，结合高斯分散，允许高效的训练和渲染。实验展示了我们方法的卓越性能，它在先前方法中实现了最先进的结果。\n"
  },
  {
    "path": "abs/2312.04564.md",
    "content": "### EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS\n\nRecently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach results in scene representations with fewer Gaussians and quantized representations, leading to faster training times and rendering speeds for real-time rendering of high resolution scenes. We reduce memory by more than an order of magnitude all while maintaining the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes preserving the visual quality while consuming 10-20x less memory and faster training/inference speed.\n\n近来，三维高斯分散（3D-GS）在新视角场景合成中获得了人气。它解决了与神经辐射场（NeRFs）相关的漫长训练时间和缓慢的渲染速度的挑战。通过快速、可微的三维高斯光栅化，3D-GS实现了实时渲染和加速训练。然而，它们需要大量的内存资源用于训练和存储，因为每个场景的点云表示需要数百万个高斯点。我们提出了一种使用量化嵌入的技术，显著降低了内存存储需求，并采用了从粗到细的训练策略，以更快、更稳定地优化高斯点云。我们的方法导致使用更少的高斯点和量化表示的场景表示，从而实现了更快的训练时间和渲染速度，用于实时渲染高分辨率场景。我们在保持重建质量的同时，将内存需求减少了一个数量级以上。我们在各种数据集和场景上验证了我们方法的有效性，同时保持了视觉质量，同时消耗了比以往少10-20倍的内存，并且训练/推断速度更快。\n"
  },
  {
    "path": "abs/2312.04820.md",
    "content": "### Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting\n\nWe propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. Despite the critical importance of these tasks, existing methodologies often struggle to generate high-caliber results. We begin by examining the inherent limitations in previous diffusion priors. We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, we propose a novel, unified framework that iteratively optimizes both the 3D model and the diffusion prior. Leveraging the different learnable parameters of the diffusion prior, our approach offers multiple configurations, affording various trade-offs between performance and implementation complexity. Notably, our experimental results demonstrate that our method markedly surpasses existing techniques, establishing new state-of-the-art in the realm of text-to-3D generation. Furthermore, our approach exhibits impressive performance on both NeRF and the newly introduced 3D Gaussian Splatting backbones. Additionally, our framework yields insightful contributions to the understanding of recent score distillation methods, such as the VSD and DDS loss.\n\n我们提出了一个旨在增强三维生成任务扩散先验的统一框架。尽管这些任务极其重要，现有方法在生成高质量结果方面往往面临挑战。我们首先审视了以往扩散先验中的固有局限性。我们发现了扩散先验与扩散模型训练程序之间的偏差，这大大降低了三维生成的质量。为了解决这个问题，我们提出了一个新颖的统一框架，该框架迭代优化三维模型和扩散先验。利用扩散先验的不同可学习参数，我们的方法提供了多种配置，允许在性能和实施复杂性之间进行各种权衡。值得注意的是，我们的实验结果表明，我们的方法显著超越了现有技术，在文本到三维生成领域确立了新的最先进水平。此外，我们的方法在神经辐射场（NeRF）和新引入的三维高斯分散骨架上都表现出色。此外，我们的框架对最近的得分蒸馏方法，如 VSD 和 DDS 损失的理解，提供了有意义的贡献。\n"
  },
  {
    "path": "abs/2312.05133.md",
    "content": "### GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization\n\nThis paper presents GIR, a 3D Gaussian Inverse Rendering method for relightable scene factorization. Compared to existing methods leveraging discrete meshes or neural implicit fields for inverse rendering, our method utilizes 3D Gaussians to estimate the material properties, illumination, and geometry of an object from multi-view images. Our study is motivated by the evidence showing that 3D Gaussian is a more promising backbone than neural fields in terms of performance, versatility, and efficiency. In this paper, we aim to answer the question: \"How can 3D Gaussian be applied to improve the performance of inverse rendering?\" To address the complexity of estimating normals based on discrete and often in-homogeneous distributed 3D Gaussian representations, we proposed an efficient self-regularization method that facilitates the modeling of surface normals without the need for additional supervision. To reconstruct indirect illumination, we propose an approach that simulates ray tracing. Extensive experiments demonstrate our proposed GIR's superior performance over existing methods across multiple tasks on a variety of widely used datasets in inverse rendering. This substantiates its efficacy and broad applicability, highlighting its potential as an influential tool in relighting and reconstruction.\n\n这篇论文介绍了一种名为GIR的3D高斯逆渲染方法，用于可重照明的场景分解。与现有利用离散网格或神经隐式场进行逆渲染的方法相比，我们的方法使用3D高斯来估计从多视图图像中物体的材料属性、照明和几何形状。我们的研究是由证据激发的，这些证据表明，3D高斯作为骨干网络在性能、多功能性和效率方面比神经场更有前景。在本文中，我们旨在回答这样一个问题：“3D高斯如何应用于提高逆渲染的性能？”为了解决基于离散且通常是不均匀分布的3D高斯表示估计法线的复杂性，我们提出了一种有效的自我调节方法，该方法有助于在不需要额外监督的情况下对表面法线进行建模。为了重建间接照明，我们提出了一种模拟光线追踪的方法。大量实验表明，我们提出的GIR在多项任务上的性能优于现有方法，这些任务在逆渲染中广泛使用的各种数据集上进行。这证实了其有效性和广泛的适用性，突出了其作为重照明和重建中有影响力工具的潜力。\n"
  },
  {
    "path": "abs/2312.05664.md",
    "content": "### CoGS: Controllable Gaussian Splatting\n\nCapturing and re-animating the 3D structure of articulated objects present significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and rendering costs. 3D Gaussian Splatting would be a suitable alternative but for two reasons. Firstly, existing methods for 3D dynamic Gaussians require synchronized multi-view cameras, and secondly, the lack of controllability in dynamic scenarios. We present CoGS, a method for Controllable Gaussian Splatting, that enables the direct manipulation of scene elements, offering real-time control of dynamic scenes without the prerequisite of pre-computing control signals. We evaluated CoGS using both synthetic and real-world datasets that include dynamic objects that differ in degree of difficulty. In our evaluations, CoGS consistently outperformed existing dynamic and controllable neural representations in terms of visual fidelity.\n\n捕捉和重现关节物体的三维结构面临着重大障碍。一方面，需要广泛校准的多视图设置方法过于复杂和资源密集，限制了它们的实际应用性。另一方面，虽然单摄像头的神经辐射场（NeRFs）提供了一种更加简洁的方法，但它们的训练和渲染成本过高。三维高斯投影（3D Gaussian Splatting）本可以是一个合适的替代方法，但存在两个问题。首先，现有的三维动态高斯方法需要同步的多视图摄像头，其次，动态场景下缺乏可控性。我们提出了CoGS，一种可控高斯投影方法，能够直接操作场景元素，提供动态场景的实时控制，无需预先计算控制信号。我们使用包括动态对象在内的合成和真实世界数据集对CoGS进行了评估，这些动态对象在难度上有所不同。在我们的评估中，CoGS在视觉保真度方面始终优于现有的动态和可控神经表示。\n"
  },
  {
    "path": "abs/2312.05941.md",
    "content": "### ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering\n\nReal-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real-time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model, and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH with competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.\n\n实时渲染逼真且可控的人类虚拟形象是计算机视觉和图形学的一个基石。虽然近期在神经隐式渲染方面的进展为数字化虚拟形象解锁了前所未有的逼真度，但实时性能大多只在静态场景中得到展示。为了解决这个问题，我们提出了ASH，一种可动画的3D高斯喷溅方法，用于实时渲染动态人类的逼真图像。我们将穿衣的人类参数化为可动画的3D高斯，这些高斯可以高效地喷溅到图像空间中以生成最终渲染。然而，天真地在3D空间中学习高斯参数在计算上提出了严峻挑战。相反，我们将高斯附加到一个可变形的角色模型上，并在2D纹理空间中学习它们的参数，这允许利用高效的2D卷积架构，轻松地扩展所需的高斯数量。我们在姿势可控虚拟形象上对ASH进行了基准测试，并与竞争方法进行了比较，证明我们的方法在实时方法上的性能远远超过现有方法，并且显示出与离线方法相当甚至更好的结果。\n"
  },
  {
    "path": "abs/2312.06741.md",
    "content": "### Gaussian Splatting SLAM\n\nWe present the first application of 3D Gaussian Splatting to incremental 3D reconstruction using a single moving monocular or RGB-D camera. Our Simultaneous Localisation and Mapping (SLAM) method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation, but also reconstruction of tiny and even transparent objects.\n\n我们首次将3D高斯喷溅应用于使用单个移动的单眼或RGB-D摄像机进行增量式3D重建。我们的同时定位与地图构建（SLAM）方法实时运行，速度为每秒3帧，仅使用高斯作为唯一的3D表示，统一了精确、高效跟踪、绘图和高质量渲染所需的表示。要从实时摄像机不断重建高保真度的3D场景，需要多项创新。首先，为了超越原始的3D高斯喷溅（3DGS）算法，该算法需要离线结构运动（SfM）系统提供精确的姿态，我们为3DGS制定了针对3D高斯进行直接优化的摄像机跟踪方法，并展示了这使得快速且稳健的跟踪成为可能，并具有广泛的收敛盆地。其次，通过利用高斯的显性特性，我们引入了几何验证和规范化来处理增量式3D密集重建中出现的歧义。最后，我们引入了一个完整的SLAM系统，它不仅在新视图合成和轨迹估计方面实现了最新技术水平，而且还能重建微小甚至透明的物体。\n"
  },
  {
    "path": "abs/2312.07504.md",
    "content": "### COLMAP-Free 3D Gaussian Splatting\n\nWhile neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.\n\n虽然神经渲染在场景重建和新视角合成方面取得了令人印象深刻的进展，但它严重依赖于精确预计算的摄像机姿态。为了放宽这一限制，已经做出了多种努力，以在没有预处理摄像机姿态的情况下训练神经辐射场（NeRFs）。然而，NeRFs的隐式表示为同时优化3D结构和摄像机姿态带来了额外的挑战。另一方面，最近提出的3D高斯喷溅由于其明确的点云表示，提供了新的机会。本文利用明确的几何表示和输入视频流的连续性，无需进行任何SfM预处理，就可以进行新视角合成。我们以序列方式处理输入帧，并通过一次处理一个输入帧，逐步增长3D高斯集，无需预先计算摄像机姿态。我们的方法在大幅运动变化下的视图合成和摄像机姿态估计方面，显著改善了以前的方法。\n"
  },
  {
    "path": "abs/2312.07920.md",
    "content": "### DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes\n\nWe present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes. For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene with incremental static 3D Gaussians. We then leverage a composite dynamic Gaussian graph to handle multiple moving objects, individually reconstructing each object and restoring their accurate positions and occlusion relationships within the scene. We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes with greater details and maintain panoramic consistency. DrivingGaussian outperforms existing methods in driving scene reconstruction and enables photorealistic surround-view synthesis with high-fidelity and multi-camera consistency. The source code and trained models will be released.\n\n我们提出了DrivingGaussian，这是一个高效且有效的框架，用于处理环绕动态自动驾驶场景。对于具有移动物体的复杂场景，我们首先使用增量静态3D高斯，顺序且逐步地模拟整个场景的静态背景。然后，我们利用复合动态高斯图来处理多个移动物体，分别重建每个物体，并恢复它们在场景中的准确位置和遮挡关系。我们进一步使用激光雷达先验进行高斯喷溅，以重建更多细节的场景，并保持全景一致性。DrivingGaussian在驾驶场景重建方面优于现有方法，并能实现高保真度和多摄像机一致性的逼真环绕视图合成。源代码和训练好的模型将会发布。\n"
  },
  {
    "path": "abs/2312.09031.md",
    "content": "### iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching\n\nWe present a method named iComMa to address the 6D pose estimation problem in computer vision. The conventional pose estimation methods typically rely on the target's CAD model or necessitate specific network training tailored to particular object classes. Some existing methods address mesh-free 6D pose estimation by employing the inversion of a Neural Radiance Field (NeRF), aiming to overcome the aforementioned constraints. However, it still suffers from adverse initializations. By contrast, we model the pose estimation as the problem of inverting the 3D Gaussian Splatting (3DGS) with both the comparing and matching loss. In detail, a render-and-compare strategy is adopted for the precise estimation of poses. Additionally, a matching module is designed to enhance the model's robustness against adverse initializations by minimizing the distances between 2D keypoints. This framework systematically incorporates the distinctive characteristics and inherent rationale of render-and-compare and matching-based approaches. This comprehensive consideration equips the framework to effectively address a broader range of intricate and challenging scenarios, including instances with substantial angular deviations, all while maintaining a high level of prediction accuracy. Experimental results demonstrate the superior precision and robustness of our proposed jointly optimized framework when evaluated on synthetic and complex real-world data in challenging scenarios.\n\n我们提出了一种名为iComMa的方法，用于解决计算机视觉中的6D姿态估计问题。传统的姿态估计方法通常依赖于目标的CAD模型或需要针对特定物体类别进行特定的网络训练。一些现有方法通过采用神经辐射场（NeRF）的逆变换来解决无网格的6D姿态估计，旨在克服上述限制。然而，它仍然受到不良初始化的困扰。相比之下，我们将姿态估计建模为逆转3D高斯喷溅（3DGS）的问题，并结合了比较和匹配损失。具体来说，采用了渲染并比较策略来精确估计姿态。此外，设计了一个匹配模块，通过最小化2D关键点之间的距离，增强模型对不良初始化的鲁棒性。该框架系统性地结合了渲染并比较以及基于匹配方法的独特特征和内在逻辑。这种全面的考虑使该框架能够有效地解决更广泛的复杂和具有挑战性的场景，包括具有显著角度偏差的情况，同时保持高水平的预测准确性。实验结果表明，我们提出的联合优化框架在评估合成和复杂的现实世界数据的具有挑战性的场景时，展现出卓越的精度和鲁棒性。\n"
  },
  {
    "path": "abs/2312.09147.md",
    "content": "### Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers\n\nRecent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality than explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques.\n\n近期在从单张图片进行3D重建方面的进展，是由生成模型的发展驱动的。其中突出的方法基于得分蒸馏采样（SDS）和扩散模型在3D领域的适应。尽管这些技术取得了进展，但由于优化或渲染过程缓慢，通常面临着漫长的训练和优化时间的限制。在这篇论文中，我们介绍了一种用于单视图重建的新方法，该方法通过前馈推理有效地从单张图片生成3D模型。我们的方法使用两个基于变换器的网络，即点解码器和三平面解码器，利用混合三平面-高斯中间表示来重建3D对象。这种混合表示实现了平衡，在与隐式表示相比提供更快的渲染速度的同时，也比显式表示提供了更优秀的渲染质量。点解码器旨在从单张图片生成点云，提供了一个显式表示，然后被三平面解码器用来为每个点查询高斯特征。这种设计选择解决了直接回归非结构性质的显式3D高斯属性所带来的挑战。随后，3D高斯由一个多层感知机（MLP）解码，以通过喷溅实现快速渲染。这两个解码器都建立在可扩展的基于变换器的架构之上，并已在大规模3D数据集上进行了高效训练。在合成数据集和真实世界图片上进行的评估表明，我们的方法不仅实现了更高的质量，而且与以前的最新技术相比，确保了更快的运行时间。\n"
  },
  {
    "path": "abs/2312.09228.md",
    "content": "### 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting\n\nWe introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively.\n\n我们介绍了一种方法，使用3D高斯喷溅（3DGS）从单目视频中创建可动画的人类虚拟形象。现有基于神经辐射场（NeRFs）的方法能够实现高质量的新视角/新姿态图像合成，但通常需要数天的训练时间，并且在推理时极其缓慢。最近，研究社区探索了快速网格结构，用于高效训练穿着服装的虚拟形象。尽管这些方法在训练时极为快速，但它们几乎只能达到大约15 FPS的交互式渲染帧率。在这篇论文中，我们使用3D高斯喷溅，并学习一个非刚性变形网络，以重建可动画的穿着服装的人类虚拟形象，这些虚拟形象可以在30分钟内训练完成，并以实时帧率（50+ FPS）渲染。鉴于我们表示的明确性，我们进一步引入尽可能等距的规范化，作用于高斯均值向量和协方差矩阵上，增强了我们模型在高度关节化未见姿态上的泛化能力。实验结果显示，与最先进的方法相比，我们的方法在从单目输入创建可动画虚拟形象方面实现了可比甚至更好的性能，同时在训练和推理方面分别快了400倍和250倍。\n"
  },
  {
    "path": "abs/2312.09242.md",
    "content": "### Text2Immersion: Generative Immersive Scene with 3D Gaussians\n\nWe introduce Text2Immersion, an elegant method for producing high-quality 3D immersive scenes from text prompts. Our proposed pipeline initiates by progressively generating a Gaussian cloud using pre-trained 2D diffusion and depth estimation models. This is followed by a refining stage on the Gaussian cloud, interpolating and refining it to enhance the details of the generated scene. Distinct from prevalent methods that focus on single object or indoor scenes, or employ zoom-out trajectories, our approach generates diverse scenes with various objects, even extending to the creation of imaginary scenes. Consequently, Text2Immersion can have wide-ranging implications for various applications such as virtual reality, game development, and automated content creation. Extensive evaluations demonstrate that our system surpasses other methods in rendering quality and diversity, further progressing towards text-driven 3D scene generation.\n\n我们介绍了一种名为Text2Immersion的优雅方法，用于从文本提示生成高质量的3D沉浸式场景。我们提出的流程首先使用预训练的2D扩散和深度估计模型逐步生成高斯云。这之后是对高斯云的细化阶段，对其进行插值和细化，以增强生成场景的细节。与侧重于单个对象或室内场景的流行方法不同，或使用缩放轨迹，我们的方法生成具有各种对象的多样场景，甚至扩展到创造想象中的场景。因此，Text2Immersion可广泛应用于各种应用程序，如虚拟现实、游戏开发和自动内容创建。广泛的评估表明，我们的系统在渲染质量和多样性方面超越了其他方法，进一步推动了文本驱动的3D场景生成。\n"
  },
  {
    "path": "abs/2312.09682.md",
    "content": "### Exploring the Feasibility of Generating Realistic 3D Models of Endangered Species Using DreamGaussian: An Analysis of Elevation Angle's Impact on Model Generation\n\nMany species face the threat of extinction. It's important to study these species and gather information about them as much as possible to preserve biodiversity. Due to the rarity of endangered species, there is a limited amount of data available, making it difficult to apply data requiring generative AI methods to this domain. We aim to study the feasibility of generating consistent and real-like 3D models of endangered animals using limited data. Such a phenomenon leads us to utilize zero-shot stable diffusion models that can generate a 3D model out of a single image of the target species. This paper investigates the intricate relationship between elevation angle and the output quality of 3D model generation, focusing on the innovative approach presented in DreamGaussian. DreamGaussian, a novel framework utilizing Generative Gaussian Splatting along with novel mesh extraction and refinement algorithms, serves as the focal point of our study. We conduct a comprehensive analysis, analyzing the effect of varying elevation angles on DreamGaussian's ability to reconstruct 3D scenes accurately. Through an empirical evaluation, we demonstrate how changes in elevation angle impact the generated images' spatial coherence, structural integrity, and perceptual realism. We observed that giving a correct elevation angle with the input image significantly affects the result of the generated 3D model. We hope this study to be influential for the usability of AI to preserve endangered animals; while the penultimate aim is to obtain a model that can output biologically consistent 3D models via small samples, the qualitative interpretation of an existing state-of-the-art model such as DreamGaussian will be a step forward in our goal.\n\n许多物种面临灭绝的威胁。为了保护生物多样性，研究这些物种并尽可能多地收集关于它们的信息非常重要。由于濒危物种的稀有性，可用数据量有限，这使得难以将需要大量数据的生成型人工智能方法应用于此领域。我们的目标是研究使用有限数据生成一致且逼真的濒危动物三维模型的可行性。这种现象促使我们利用零样本稳定扩散模型，它能够仅根据目标物种的单一图像生成三维模型。本文研究了高度角与三维模型生成输出质量之间的复杂关系，重点关注DreamGaussian中提出的创新方法。DreamGaussian是一个新颖的框架，它结合了生成型高斯溅射技术以及新型网格提取和细化算法，是我们研究的焦点。我们进行了全面的分析，分析了不同高度角对DreamGaussian准确重建三维场景能力的影响。通过实证评估，我们展示了高度角变化如何影响生成图像的空间一致性、结构完整性和感知真实性。我们观察到，输入图像时给出正确的高度角对生成的三维模型结果有显著影响。我们希望这项研究对利用人工智能保护濒危动物的可用性产生影响；虽然最终目标是获得一个能够通过少量样本输出生物学上一致的三维模型的模型，但对现有先进模型如DreamGaussian的定性解释将是我们实现目标的一大步。\n"
  },
  {
    "path": "abs/2312.11458.md",
    "content": "### GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis\n\nWe propose a method for dynamic scene reconstruction using deformable 3D Gaussians that is tailored for monocular video. Building upon the efficiency of Gaussian splatting, our approach extends the representation to accommodate dynamic elements via a deformable set of Gaussians residing in a canonical space, and a time-dependent deformation field defined by a multi-layer perceptron (MLP). Moreover, under the assumption that most natural scenes have large regions that remain static, we allow the MLP to focus its representational power by additionally including a static Gaussian point cloud. The concatenated dynamic and static point clouds form the input for the Gaussian Splatting rasterizer, enabling real-time rendering. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Our method achieves results that are comparable to state-of-the-art dynamic neural radiance field methods while allowing much faster optimization and rendering.\n\n我们提出了一种适用于单目视频的动态场景重建方法，该方法使用可变形的3D高斯函数。基于高斯飞溅效率，我们的方法将表征扩展到通过位于规范空间中的一组可变形高斯函数和由多层感知器（MLP）定义的时变形变场来适应动态元素。此外，基于大多数自然场景有大面积静态区域的假设，我们允许MLP通过额外包括一个静态高斯点云来聚焦其表征能力。连接的动态和静态点云形成了高斯飞溅光栅化器的输入，实现实时渲染。这种可微分的管道通过自我监督的渲染损失进行端到端优化。我们的方法在与最先进的动态神经辐射场方法相当的情况下，实现了更快的优化和渲染。\n"
  },
  {
    "path": "abs/2312.11461.md",
    "content": "### GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning\n\nGaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.\n\n高斯溅射作为一种强大的三维表征手段，融合了显式（网格）和隐式（NeRF）三维表征的优势。在这篇论文中，我们试图利用高斯溅射技术，仅通过文本描述生成真实可动的虚拟形象，解决了基于网格或NeRF表征的限制（例如，灵活性和效率）。然而，简单应用高斯溅射技术无法生成高质量的可动虚拟形象，并且面临学习不稳定性；它也无法捕捉精细的虚拟形象几何结构，常导致身体部位退化。为解决这些问题，我们首先提出了一种基于原始体的三维高斯表征方法，其中高斯函数定义在受姿势驱动的原始体内以便于动画制作。其次，为了稳定学习数百万个高斯函数并减轻学习负担，我们提出使用神经隐式场预测高斯属性（例如，颜色）。最后，为了捕捉精细的虚拟形象几何结构并提取详细的网格，我们提出了一种新颖的基于SDF的隐式网格学习方法，用于三维高斯处理，这种方法规范了底层几何结构，并提取了高度详细的带纹理网格。我们提出的方法，GAvatar，使得使用文本提示大规模生成多样的可动虚拟形象成为可能。GAvatar在外观和几何质量方面显著超越现有方法，并在1K分辨率下实现了极快的渲染速度（100帧/秒）。\n"
  },
  {
    "path": "abs/2312.12337.md",
    "content": "### pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction\n\nWe introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.\n\n我们介绍了一种名为pixelSplat的前馈模型，它学会从成对图像中重建由3D高斯原语参数化的3D辐射场。我们的模型具有实时和内存高效的渲染功能，适用于可扩展训练，以及在推理时快速的3D重建。为了克服稀疏和局部支持表征固有的局部最小值问题，我们预测3D上的密集概率分布，并从该概率分布中采样高斯均值。我们通过重新参数化技巧使这种采样操作可微分，从而能够通过高斯飞溅表示反向传播梯度。我们在真实世界的RealEstate10k和ACID数据集上对我们的方法进行了基准测试，在广基线新视角合成方面，我们的性能超越了最先进的光场变换器，并在重建可解释和可编辑的3D辐射场时加速了渲染速度达2.5个数量级。\n"
  },
  {
    "path": "abs/2312.13102.md",
    "content": "### SpecNeRF: Gaussian Directional Encoding for Specular Reflections\n\nNeural radiance fields have achieved remarkable performance in modeling the appearance of 3D scenes. However, existing approaches still struggle with the view-dependent appearance of glossy surfaces, especially under complex lighting of indoor environments. Unlike existing methods, which typically assume distant lighting like an environment map, we propose a learnable Gaussian directional encoding to better model the view-dependent effects under near-field lighting conditions. Importantly, our new directional encoding captures the spatially-varying nature of near-field lighting and emulates the behavior of prefiltered environment maps. As a result, it enables the efficient evaluation of preconvolved specular color at any 3D location with varying roughness coefficients. We further introduce a data-driven geometry prior that helps alleviate the shape radiance ambiguity in reflection modeling. We show that our Gaussian directional encoding and geometry prior significantly improve the modeling of challenging specular reflections in neural radiance fields, which helps decompose appearance into more physically meaningful components.\n\n神经辐射场在模拟3D场景的外观方面取得了显著成绩。然而，现有方法在处理光泽表面的视角依赖外观时仍存在困难，特别是在复杂的室内环境光照下。与通常假设远场光照如环境图的现有方法不同，我们提出了一种可学习的高斯方向编码，以更好地模拟近场光照条件下的视角依赖效应。重要的是，我们的新方向编码捕捉了近场光照的空间变化特性，并模仿了预过滤环境图的行为。结果是，它使得在任何具有不同粗糙度系数的3D位置高效评估预卷积镜面颜色成为可能。我们进一步引入了一个数据驱动的几何形状先验，有助于减轻反射建模中的形状辐射模糊性。我们展示了我们的高斯方向编码和几何形状先验在神经辐射场中显著改善了具有挑战性的镜面反射建模，有助于将外观分解为更物理意义的组成部分。\n"
  },
  {
    "path": "abs/2312.13150.md",
    "content": "### Splatter Image: Ultra-Fast Single-View 3D Reconstruction\n\nWe introduce the Splatter Image, an ultra-fast approach for monocular 3D object reconstruction which operates at 38 FPS. Splatter Image is based on Gaussian Splatting, which has recently brought real-time rendering, fast training, and excellent scaling to multi-view reconstruction. For the first time, we apply Gaussian Splatting in a monocular reconstruction setting. Our approach is learning-based, and, at test time, reconstruction only requires the feed-forward evaluation of a neural network. The main innovation of Splatter Image is the surprisingly straightforward design: it uses a 2D image-to-image network to map the input image to one 3D Gaussian per pixel. The resulting Gaussians thus have the form of an image, the Splatter Image. We further extend the method to incorporate more than one image as input, which we do by adding cross-view attention. Owning to the speed of the renderer (588 FPS), we can use a single GPU for training while generating entire images at each iteration in order to optimize perceptual metrics like LPIPS. On standard benchmarks, we demonstrate not only fast reconstruction but also better results than recent and much more expensive baselines in terms of PSNR, LPIPS, and other metrics.\n\n我们介绍了“飞溅图像”(Splatter Image)，这是一种超快速的单目三维物体重建方法，工作速度可达每秒38帧。飞溅图像基于高斯飞溅(Gaussian Splatting)技术，该技术最近在多视角重建领域带来了实时渲染、快速训练和优秀的扩展能力。这是我们首次将高斯飞溅应用于单目重建环境。我们的方法基于学习，且在测试时，重建仅需要神经网络的前向评估。飞溅图像的主要创新在于其出人意料的简洁设计：它使用二维图像到图像的网络将输入图像映射到每个像素的一个三维高斯上。因此，生成的高斯以图像的形式呈现，即飞溅图像。我们进一步扩展了该方法，以纳入多个图像作为输入，这是通过添加交叉视图注意力实现的。由于渲染器的高速（每秒588帧），我们可以在单个GPU上进行训练，同时在每次迭代中生成整个图像，以优化感知度量，如LPIPS。在标准基准测试中，我们不仅展示了快速的重建速度，还在PSNR、LPIPS和其他指标上展示了比近期更昂贵的基线方法更好的结果。\n"
  },
  {
    "path": "abs/2312.13271.md",
    "content": "### Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting\n\nRecent one image to 3D generation methods commonly adopt Score Distillation Sampling (SDS). Despite the impressive results, there are multiple deficiencies including multi-view inconsistency, over-saturated and over-smoothed textures, as well as the slow generation speed. To address these deficiencies, we present Repaint123 to alleviate multi-view bias as well as texture degradation and speed up the generation process. The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency. We further propose visibility-aware adaptive repainting strength for overlap regions to enhance the generated image quality in the repainting process. The generated high-quality and multi-view consistent images enable the use of simple Mean Square Error (MSE) loss for fast 3D content generation. We conduct extensive experiments and show that our method has a superior ability to generate high-quality 3D content with multi-view consistency and fine textures in 2 minutes from scratch.\n\n最近的一种从单张图片生成三维内容的方法通常采用分数蒸馏采样（SDS）。尽管结果令人印象深刻，但还存在多个缺陷，包括多视角不一致性、过饱和和过平滑的纹理，以及生成速度慢。为了解决这些缺陷，我们提出了 Repaint123，旨在减轻多视角偏见以及纹理退化，并加快生成过程。核心思想是结合二维扩散模型的强大图像生成能力和重绘策略的纹理对齐能力，生成具有一致性的高质量多视角图像。我们进一步提出了可见性感知的自适应重绘强度，以增强重绘过程中生成图像的质量。生成的高质量且多视角一致的图像使得使用简单的均方误差（MSE）损失快速生成三维内容成为可能。我们进行了广泛的实验，并展示了我们的方法在2分钟内从零开始生成具有多视角一致性和精细纹理的高质量三维内容的卓越能力。\n"
  },
  {
    "path": "abs/2312.13299.md",
    "content": "### Compact 3D Scene Representation via Self-Organizing Gaussian Grids\n\n3D Gaussian Splatting has recently emerged as a highly promising technique for modeling of static 3D scenes. In contrast to Neural Radiance Fields, it utilizes efficient rasterization allowing for very fast rendering at high-quality. However, the storage size is significantly higher, which hinders practical deployment, e.g.~on resource constrained devices. In this paper, we introduce a compact scene representation organizing the parameters of 3D Gaussian Splatting (3DGS) into a 2D grid with local homogeneity, ensuring a drastic reduction in storage requirements without compromising visual quality during rendering. Central to our idea is the explicit exploitation of perceptual redundancies present in natural scenes. In essence, the inherent nature of a scene allows for numerous permutations of Gaussian parameters to equivalently represent it. To this end, we propose a novel highly parallel algorithm that regularly arranges the high-dimensional Gaussian parameters into a 2D grid while preserving their neighborhood structure. During training, we further enforce local smoothness between the sorted parameters in the grid. The uncompressed Gaussians use the same structure as 3DGS, ensuring a seamless integration with established renderers. Our method achieves a reduction factor of 8x to 26x in size for complex scenes with no increase in training time, marking a substantial leap forward in the domain of 3D scene distribution and consumption.\n\n三维高斯飞溅(3D Gaussian Splatting)技术近来已成为静态三维场景建模的一种非常有前景的技术。与神经辐射场(Neural Radiance Fields)相比，它利用高效的光栅化实现了高质量的快速渲染。然而，它的存储大小显著增加，这限制了它在资源受限设备上的实际部署。在本文中，我们引入了一种紧凑的场景表示方法，将三维高斯飞溅的参数组织到具有局部同质性的二维网格中，从而大幅减少存储需求，同时在渲染过程中不影响视觉质量。我们的想法核心是明确利用自然场景中存在的感知冗余。本质上，场景的固有特性允许使用众多高斯参数的排列来等效地表示它。为此，我们提出了一种新颖的高度并行算法，它将高维高斯参数有规律地排列到二维网格中，同时保留它们的邻域结构。在训练过程中，我们进一步在网格中对排序的参数施加局部平滑性。未压缩的高斯使用与三维高斯飞溅相同的结构，确保与现有渲染器的无缝集成。我们的方法在复杂场景的大小上实现了8倍至26倍的减少，且不增加训练时间，标志着在三维场景分发和消费领域的一大飞跃。\n"
  },
  {
    "path": "abs/2312.13308.md",
    "content": "### SWAGS: Sampling Windows Adaptively for Dynamic 3D Gaussian Splatting\n\nNovel view synthesis has shown rapid progress recently, with methods capable of producing evermore photo-realistic results. 3D Gaussian Splatting has emerged as a particularly promising method, producing high-quality renderings of static scenes and enabling interactive viewing at real-time frame rates. However, it is currently limited to static scenes only. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model the dynamics of a scene using a tunable MLP, which learns the deformation field from a canonical space to a set of 3D Gaussians per frame. To disentangle the static and dynamic parts of the scene, we learn a tuneable parameter for each Gaussian, which weighs the respective MLP parameters to focus attention on the dynamic parts. This improves the model's ability to capture dynamics in scenes with an imbalance of static to dynamic regions. To handle scenes of arbitrary length whilst maintaining high rendering quality, we introduce an adaptive window sampling strategy to partition the sequence into windows based on the amount of movement in the sequence. We train a separate dynamic Gaussian Splatting model for each window, allowing the canonical representation to change, thus enabling the reconstruction of scenes with significant geometric or topological changes. Temporal consistency is enforced using a fine-tuning step with self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real-time with our dynamic interactive viewer.\n\n近期，新视角合成技术取得了迅速的进展，这些方法能够制作出越来越逼真的照片级结果。三维高斯飞溅(3D Gaussian Splatting)作为一种特别有前景的方法，已经出现，能够产生高质量的静态场景渲染，并支持实时帧率的交互式查看。然而，它目前仅限于静态场景。在这项工作中，我们将三维高斯飞溅扩展到动态场景重建。我们使用可调整的多层感知器(MLP)来模拟场景的动态，它学习从规范空间到每帧的一组三维高斯的变形场。为了分离场景的静态和动态部分，我们为每个高斯学习一个可调整的参数，该参数加权了相应的MLP参数，以专注于动态部分。这提高了模型捕捉静态与动态区域不平衡场景动态的能力。为了处理任意长度的场景，同时保持高渲染质量，我们引入了一种自适应窗口采样策略，根据序列中的运动量将序列划分为窗口。我们为每个窗口训练一个单独的动态高斯飞溅模型，允许规范表示发生变化，从而使得具有显著几何或拓扑变化的场景得以重建。使用随机采样的新视角上的自监督一致性损失进行微调，以强制时间一致性。结果是，我们的方法生成了具有竞争性定量表现的一般动态场景的高质量渲染，并且可以通过我们的动态交互式查看器实时查看。\n"
  },
  {
    "path": "abs/2312.13729.md",
    "content": "### Gaussian Splatting with NeRF-based Color and Opacity\n\nNeural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding its versatility. In contrast, Gaussian Splatting (GS) offers a similar renders quality with faster training and inference as it does not need neural networks to work. We encode information about the 3D objects in the set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS are difficult to condition since they usually require circa hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model that uses GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e. means of Gaussian), shape (i.e. covariance of Gaussian), color and opacity, and neural network, which takes parameters of Gaussian and viewing direction to produce changes in color and opacity. Consequently, our model better describes shadows, light reflections, and transparency of 3D objects.\n\n神经辐射场(NeRFs)已经展示了神经网络捕捉三维物体复杂性的惊人潜力。通过在神经网络权重中编码形状和颜色信息，NeRFs在生成三维物体的新视角方面表现出色，能产生极为清晰的图像。最近，利用生成模型的NeRFs的众多泛化版本已经出现，扩展了其多功能性。相比之下，高斯飞溅(Gaussian Splatting, GS)提供了类似的渲染质量，且由于不需要神经网络就可以工作，因此在训练和推理上更快。我们将关于三维物体的信息编码在一组高斯分布中，这些分布可以像传统网格一样在三维中渲染。不幸的是，GS很难进行条件设定，因为它们通常需要大约十万个高斯组件。为了缓解这两种模型的局限，我们提出了一种混合模型，它使用GS表示三维物体的形状，以及基于NeRF的颜色和不透明度编码。我们的模型使用具有可训练位置（即高斯的均值）、形状（即高斯的协方差）、颜色和不透明度的高斯分布，以及一个神经网络，该网络采用高斯的参数和观看方向来产生颜色和不透明度的变化。因此，我们的模型更好地描述了三维物体的阴影、光反射和透明度。\n"
  },
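As a rough illustration of the hybrid design above, the sketch below pairs per-Gaussian parameters with a small network that outputs view-dependent color/opacity offsets. The 14-dimensional parameter layout, layer sizes, and class name are assumptions made for the sketch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ViewDependentHead(nn.Module):
    """Tiny MLP that perturbs each Gaussian's base color/opacity given
    its parameters and the viewing direction, in the spirit of the
    GS-shape + NeRF-appearance hybrid described above."""
    def __init__(self, hidden=64):
        super().__init__()
        # per-Gaussian input: mean (3) + log-scale (3) + rotation quat (4)
        # + base color (3) + base opacity (1) = 14, plus view direction (3)
        self.mlp = nn.Sequential(
            nn.Linear(17, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                 # (delta RGB, delta opacity)
        )

    def forward(self, gauss_params, view_dir):
        n = gauss_params.shape[0]
        x = torch.cat([gauss_params, view_dir.expand(n, 3)], dim=-1)
        delta = self.mlp(x)
        return delta[:, :3], delta[:, 3:]         # color shift, opacity shift

head = ViewDependentHead()
params = torch.randn(1000, 14)                    # 1000 Gaussians
d_rgb, d_alpha = head(params, torch.tensor([0.0, 0.0, 1.0]))
print(d_rgb.shape, d_alpha.shape)                 # (1000, 3) (1000, 1)
```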
  {
    "path": "abs/2312.13763.md",
    "content": "### Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models\n\nText-guided diffusion models have revolutionized image and video generation and have also been successfully used for optimization-based 3D object synthesis. Here, we instead focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects using score distillation methods with an additional temporal dimension. Compared to previous work, we pursue a novel compositional generation-based approach, and combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization, thereby simultaneously enforcing temporal consistency, high-quality visual appearance and realistic geometry. Our method, called Align Your Gaussians (AYG), leverages dynamic 3D Gaussian Splatting with deformation fields as 4D representation. Crucial to AYG is a novel method to regularize the distribution of the moving 3D Gaussians and thereby stabilize the optimization and induce motion. We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation. These techniques allow us to synthesize vivid dynamic scenes, outperform previous work qualitatively and quantitatively and achieve state-of-the-art text-to-4D performance. Due to the Gaussian 4D representation, different 4D animations can be seamlessly combined, as we demonstrate. AYG opens up promising avenues for animation, simulation and digital content creation as well as synthetic data generation.\n\n本文中，我们重点关注尚未充分探索的文本到4D（三维动画）转换，并使用带有额外时间维度的分数蒸馏方法合成动态的、有动画效果的三维物体。与之前的工作相比，我们采取了一种全新的组合式生成方法，结合了文本到图像、文本到视频和具有三维意识的多视图扩散模型，以在4D物体优化过程中提供反馈，从而同时实现时间一致性、高质量视觉外观和真实的几何形状。我们的方法，称为“对齐你的高斯”(Align Your Gaussians, AYG)，利用动态三维高斯飞溅与变形场作为4D表示。AYG的关键是一种新颖的方法，用于规范移动的三维高斯的分布，从而稳定优化过程并诱导运动。我们还提出了一种运动放大机制以及一种新的自回归合成方案，用于生成和组合多个4D序列，实现更长时间的生成。这些技术使我们能够合成生动的动态场景，从定性和定量上超越以往的工作，并实现最先进的文本到4D性能。由于采用了高斯4D表示，不同的4D动画可以无缝结合，正如我们所展示的。AYG为动画、模拟、数字内容创作以及合成数据生成开辟了有前途的途径。\n"
  },
  {
    "path": "abs/2312.14937.md",
    "content": "### SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes\n\nNovel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6 DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of 3D Gaussians. We employ a deformation MLP to predict time-varying 6 DoF transformations for each control point, which reduces learning complexities, enhances learning abilities, and facilitates obtaining temporal and spatial coherent motion patterns. Then, we jointly learn the 3D Gaussians, the canonical space locations of control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss following the principle of as rigid as possible is developed to enforce spatial continuity and local rigidity of learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method can enable user-controlled motion editing while retaining high-fidelity appearances. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserved motion editing applications.\n\n在计算机视觉和图形学中，动态场景的新视角合成仍然是一个挑战性问题。最近，高斯涂抹技术作为一种表示静态场景的稳健技术浮现出来，使得高质量和实时的新视角合成成为可能。基于这种技术，我们提出了一种新的表征方式，它显式地将动态场景的运动和外观分解为稀疏控制点和密集高斯模型。我们的关键思想是使用数量远少于高斯模型的稀疏控制点来学习紧凑的6自由度（DoF）转换基础，这些基础可以通过学习到的插值权重在局部插值，从而产生3D高斯模型的运动场。我们采用变形多层感知器（MLP）来预测每个控制点的时变6自由度转换，这降低了学习复杂性，增强了学习能力，并有助于获得时空连贯的运动模式。然后，我们联合学习3D高斯模型、控制点的典型空间位置和变形MLP，以重建3D场景的外观、几何和动态特性。在学习过程中，控制点的位置和数量会根据不同区域的运动复杂性自适应地调整，并且开发了一个遵循尽可能刚性原则的ARAP（As Rigid As Possible）损失，以强制学习到的运动具有空间连续性和局部刚性。最后，由于明确的稀疏运动表征及其与外观的分解，我们的方法能够在保持高保真外观的同时，实现用户控制的运动编辑。广泛的实验表明，我们的方法在新视角合成方面超越了现有方法，并具有高渲染速度，使得新的保持外观的运动编辑应用成为可能。\n"
  },
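The sparse-control idea above lends itself to a compact sketch: blend the k nearest control points' rigid transforms into each Gaussian's motion. The distance-based softmax weights and fixed k below are simplifications made for the sketch; the paper learns the interpolation weights and predicts the time-varying transforms with a deformation MLP.

```python
import torch

def interpolate_motion(points, ctrl_xyz, ctrl_rot, ctrl_trans, k=4, beta=10.0):
    """Blend per-control-point rigid transforms into a dense motion field:
    each Gaussian center is moved by a weighted sum of its k nearest
    control points' 6-DoF transforms (rotation matrices + translations)."""
    d2 = torch.cdist(points, ctrl_xyz) ** 2            # (N, M) squared dists
    knn_d2, idx = d2.topk(k, largest=False)            # k nearest controls
    w = torch.softmax(-beta * knn_d2, dim=-1)          # interpolation weights
    R = ctrl_rot[idx]                                  # (N, k, 3, 3)
    t = ctrl_trans[idx]                                # (N, k, 3)
    c = ctrl_xyz[idx]                                  # (N, k, 3)
    local = points[:, None, :] - c                     # offsets to controls
    moved = torch.einsum('nkij,nkj->nki', R, local) + c + t
    return (w[..., None] * moved).sum(dim=1)           # (N, 3) new centers

pts = torch.randn(2048, 3)
M = 64
out = interpolate_motion(pts, torch.randn(M, 3),
                         torch.eye(3).expand(M, 3, 3).clone(),
                         torch.zeros(M, 3))
print(torch.allclose(out, pts, atol=1e-5))             # identity transforms
```

The ARAP loss mentioned in the abstract would then penalize neighboring control points whose relative transforms deviate from a rigid motion.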
  {
    "path": "abs/2312.15059.md",
    "content": "### Deformable 3D Gaussian Splatting for Animatable Human Avatars\n\nRecent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: A first module that deforms canonical 3D Gaussians according to SMPL vertices and a consecutive module that further takes their designed joint encodings and predicts per Gaussian deformations to deal with dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatars learning is free of additional annotations such as masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware. We provide experimental evidence to show that ParDy-Human outperforms state-of-the-art methods on ZJU-MoCap and THUman4.0 datasets both quantitatively and visually.\n\n近期在神经辐射场方面的进展使得在动态设置中合成逼真图像的新视角成为可能，这可以应用于包含人类动画的场景。通常用于建立精确模型的隐式主干网络，然而，需要多个输入视图和额外的标注，如人类遮罩、UV映射和深度图。在这项工作中，我们提出了ParDy-Human（参数化动态人类化身），这是一种完全显式的方法，用来从单一的单眼序列构建数字化身。ParDy-Human将参数驱动的动态引入到3D高斯涂抹中，其中3D高斯通过人体姿态模型变形以动画化身。我们的方法由两部分组成：第一个模块根据SMPL顶点变形典型的3D高斯；接着的模块进一步采用它们设计的关节编码，并预测每个高斯的变形，以处理超出SMPL顶点变形的动态。然后通过光栅器合成图像。ParDy-Human构成了一个对于逼真动态人类化身的显式模型，它需要显著更少的训练视图和图像。我们的化身学习不需要额外的标注，如遮罩，并且可以在可变背景下进行训练，同时即使在消费级硬件上也能高效地推断出全分辨率图像。我们提供实验证据表明，ParDy-Human在ZJU-MoCap和THUman4.0数据集上在定量和视觉上都优于最先进的方法。\n"
  },
  {
    "path": "abs/2312.15258.md",
    "content": "### Human101: Training 100+FPS Human Gaussians in 100s from 1 View\n\nReconstructing the human body from single-view videos plays a pivotal role in the virtual reality domain. One prevalent application scenario necessitates the rapid reconstruction of high-fidelity 3D digital humans while simultaneously ensuring real-time rendering and interaction. Existing methods often struggle to fulfill both requirements. In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS. Our method leverages the strengths of 3D Gaussian Splatting, which provides an explicit and efficient representation of 3D humans. Standing apart from prior NeRF-based pipelines, Human101 ingeniously applies a Human-centric Forward Gaussian Animation method to deform the parameters of 3D Gaussians, thereby enhancing rendering speed (i.e., rendering 1024-resolution images at an impressive 60+ FPS and rendering 512-resolution images at 100+ FPS). Experimental results indicate that our approach substantially eclipses current methods, clocking up to a 10 times surge in frames per second and delivering comparable or superior rendering quality.\n\n在虚拟现实领域，从单视图视频重建人体扮演着关键角色。一种流行的应用场景需要快速重建高保真度的3D数字人类，同时确保实时渲染和交互。现有方法通常难以满足这两个要求。在本文中，我们介绍了Human101，这是一个新颖的框架，能够通过在100秒内训练3D高斯并以100+ FPS的速度渲染，从1视图视频中生成高保真度的动态3D人类重建。我们的方法利用了3D高斯涂抹的优势，为3D人类提供了一种显式且高效的表征。与以往基于NeRF的管道不同，Human101巧妙地应用了以人为中心的前向高斯动画方法来变形3D高斯的参数，从而提高了渲染速度（即以60+ FPS渲染1024分辨率的图像，以100+ FPS渲染512分辨率的图像）。实验结果表明，我们的方法大幅超越了当前的方法，帧数每秒提高了高达10倍，并提供了可比拟或更优越的渲染质量。\n"
  },
  {
    "path": "abs/2312.15676.md",
    "content": "### Sparse-view CT Reconstruction with 3D Gaussian Volumetric Representation\n\nSparse-view CT is a promising strategy for reducing the radiation dose of traditional CT scans, but reconstructing high-quality images from incomplete and noisy data is challenging. Recently, 3D Gaussian has been applied to model complex natural scenes, demonstrating fast convergence and better rendering of novel views compared to implicit neural representations (INRs). Taking inspiration from the successful application of 3D Gaussians in natural scene modeling and novel view synthesis, we investigate their potential for sparse-view CT reconstruction. We leverage prior information from the filtered-backprojection reconstructed image to initialize the Gaussians; and update their parameters via comparing difference in the projection space. Performance is further enhanced by adaptive density control. Compared to INRs, 3D Gaussians benefit more from prior information to explicitly bypass learning in void spaces and allocate the capacity efficiently, accelerating convergence. 3D Gaussians also efficiently learn high-frequency details. Trained in a self-supervised manner, 3D Gaussians avoid the need for large-scale paired data. Our experiments on the AAPM-Mayo dataset demonstrate that 3D Gaussians can provide superior performance compared to INR-based methods.\n\n稀疏视图CT（计算机断层扫描）是一种减少传统CT扫描辐射剂量的有前景策略，但从不完整和噪声数据中重建高质量图像是具有挑战性的。最近，3D高斯模型已经被应用于复杂自然场景建模，与隐式神经表示（INRs）相比，展示出快速收敛和更好的新视角渲染。受3D高斯在自然场景建模和新视角合成应用成功的启发，我们研究了它们在稀疏视图CT重建中的潜力。我们利用过滤反投影重建图像中的先验信息来初始化高斯模型；并通过比较投影空间中的差异来更新它们的参数。通过自适应密度控制进一步提高性能。与INRs相比，3D高斯更能从先验信息中受益，明确地绕过在空白空间的学习，并高效地分配容量，加速收敛。3D高斯还能有效地学习高频细节。采用自我监督的方式训练，3D高斯避免了对大规模配对数据的需求。我们在AAPM-Mayo数据集上的实验表明，与基于INR的方法相比，3D高斯可以提供更优越的性能。\n"
  },
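To make the projection-space optimization concrete, here is a toy 2D analogue: represent the image as a few isotropic Gaussians and fit them by matching two axis-aligned projections only. The grid, the two-view "sinogram", and the perturbed initialization are stand-ins for the paper's FBP prior and a real differentiable X-ray projector.

```python
import torch

H = W = 64
ys, xs = torch.meshgrid(torch.arange(H).float(), torch.arange(W).float(),
                        indexing='ij')

def splat(mu, sigma, amp):
    """Render N isotropic 2D Gaussians onto the HxW grid."""
    d2 = (ys[None] - mu[:, 0, None, None])**2 + (xs[None] - mu[:, 1, None, None])**2
    return (amp[:, None, None] * torch.exp(-d2 / (2 * sigma[:, None, None]**2))).sum(0)

# ground-truth image and its row/column sums standing in for projections
target = splat(torch.tensor([[20., 20.], [44., 40.]]),
               torch.tensor([4., 6.]), torch.tensor([1.0, 0.8]))
meas = torch.stack([target.sum(0), target.sum(1)])

# initialize near the (here: perturbed ground-truth) peaks, as the FBP
# prior would suggest, then optimize purely in projection space
mu = torch.tensor([[18., 23.], [40., 38.]], requires_grad=True)
sigma = torch.tensor([5., 5.], requires_grad=True)
amp = torch.tensor([0.5, 0.5], requires_grad=True)
opt = torch.optim.Adam([mu, sigma, amp], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    img = splat(mu, sigma, amp)
    loss = ((torch.stack([img.sum(0), img.sum(1)]) - meas)**2).mean()
    loss.backward()
    opt.step()
print(float(loss))   # converges toward the measured projections
```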
  {
    "path": "abs/2312.16047.md",
    "content": "### 2D-Guided 3D Gaussian Segmentation\n\nRecently, 3D Gaussian, as an explicit 3D representation method, has demonstrated strong competitiveness over NeRF (Neural Radiance Fields) in terms of expressing complex scenes and training duration. These advantages signal a wide range of applications for 3D Gaussians in 3D understanding and editing. Meanwhile, the segmentation of 3D Gaussians is still in its infancy. The existing segmentation methods are not only cumbersome but also incapable of segmenting multiple objects simultaneously in a short amount of time. In response, this paper introduces a 3D Gaussian segmentation method implemented with 2D segmentation as supervision. This approach uses input 2D segmentation maps to guide the learning of the added 3D Gaussian semantic information, while nearest neighbor clustering and statistical filtering refine the segmentation results. Experiments show that our concise method can achieve comparable performances on mIOU and mAcc for multi-object segmentation as previous single-object segmentation methods.\n\n最近，作为一种显式的3D表征方法，3D高斯在表达复杂场景和训练时长方面展现出了相对于神经辐射场（NeRF）的强大竞争力。这些优势预示着3D高斯在3D理解和编辑方面的广泛应用潜力。同时，3D高斯的分割还处于初级阶段。现有的分割方法不仅繁琐，而且无法在短时间内同时分割多个对象。为此，本文介绍了一种用2D分割作为监督的3D高斯分割方法。这种方法使用输入的2D分割图来指导增加的3D高斯语义信息的学习，同时最近邻聚类和统计过滤用于精炼分割结果。实验表明，我们这种简洁的方法在多对象分割的平均交并比（mIOU）和平均准确度（mAcc）上能够达到与之前单对象分割方法相当的性能。\n"
  },
  {
    "path": "abs/2312.16084.md",
    "content": "### LangSplat: 3D Language Gaussian Splatting\n\nHuman lives in a 3D world and commonly uses natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP language embeddings in a NeRF model, LangSplat advances the field by utilizing a collection of 3D Gaussians, each encoding language features distilled from CLIP, to represent the language field. By employing a tile-based splatting technique for rendering language features, we circumvent the costly rendering process inherent in NeRF. Instead of directly learning CLIP embeddings, LangSplat first trains a scene-wise language autoencoder and then learns language features on the scene-specific latent space, thereby alleviating substantial memory demands imposed by explicit modeling. Existing methods struggle with imprecise and vague 3D language fields, which fail to discern clear boundaries between objects. We delve into this issue and propose to learn hierarchical semantics using SAM, thereby eliminating the need for extensively querying the language field across various scales and the regularization of DINO features. Extensive experiments on open-vocabulary 3D object localization and semantic segmentation demonstrate that LangSplat significantly outperforms the previous state-of-the-art method LERF by a large margin. Notably, LangSplat is extremely efficient, achieving a {\\speed} × speedup compared to LERF at the resolution of 1440 × 1080.\n\n人类生活在一个3D世界中，并通常使用自然语言与3D场景进行交互。近来，建模一个支持在3D中进行开放式语言查询的3D语言场受到越来越多的关注。本文介绍了LangSplat，它构建了一个3D语言场，使得在3D空间中进行精确且高效的开放词汇查询成为可能。与现有方法不同，后者在NeRF模型中嵌入CLIP语言特征，LangSplat通过使用一系列3D高斯模型推进了这一领域，每个高斯模型都编码了从CLIP中提炼出的语言特征来代表语言场。我们采用基于瓦片的涂抹技术来渲染语言特征，从而避开了NeRF中固有的成本高昂的渲染过程。LangSplat首先训练一个场景级别的语言自编码器，然后在场景特定的潜在空间上学习语言特征，而不是直接学习CLIP嵌入，从而减轻了显式建模所带来的大量内存需求。现有方法在3D语言场中往往存在不精确和模糊的问题，无法清晰区分对象之间的边界。我们深入研究了这个问题，并提出使用SAM来学习层次化语义，从而消除了在不同尺度上广泛查询语言场和DINO特征的正则化的需要。广泛的实验表明，在开放词汇的3D对象定位和语义分割方面，LangSplat显著超越了之前的最先进方法LERF。值得注意的是，LangSplat非常高效，在1440 × 1080的分辨率下比LERF快{\\speed}倍。\n\n\n"
  },
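A minimal sketch of the scene-wise language autoencoder idea above: compress high-dimensional CLIP features into a small per-scene latent that each Gaussian can carry cheaply. The 512→3 sizes, layer widths, and cosine reconstruction loss are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SceneLanguageAutoencoder(nn.Module):
    """Scene-wise autoencoder: squeeze CLIP features into a
    low-dimensional latent so each Gaussian stores only a few floats."""
    def __init__(self, feat_dim=512, latent_dim=3):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))

    def forward(self, f):
        z = self.enc(f)
        return self.dec(z), z

ae = SceneLanguageAutoencoder()
clip_feats = torch.randn(4096, 512)            # per-pixel CLIP features
recon, latent = ae(clip_feats)
loss = 1 - torch.cosine_similarity(recon, clip_feats, dim=-1).mean()
print(latent.shape, float(loss))               # (4096, 3) latents per pixel
```

At query time, rendered latents would be decoded back to CLIP space and compared against the text embedding of the query.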
  {
    "path": "abs/2312.16812.md",
    "content": "### Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis\n\nNovel view synthesis of dynamic scenes has been an intriguing yet challenging problem. Despite recent advancements, simultaneously achieving high-resolution photorealistic results, real-time rendering, and compact storage remains a formidable task. To address these challenges, we propose Spacetime Gaussian Feature Splatting as a novel dynamic scene representation, composed of three pivotal components. First, we formulate expressive Spacetime Gaussians by enhancing 3D Gaussians with temporal opacity and parametric motion/rotation. This enables Spacetime Gaussians to capture static, dynamic, as well as transient content within a scene. Second, we introduce splatted feature rendering, which replaces spherical harmonics with neural features. These features facilitate the modeling of view- and time-dependent appearance while maintaining small size. Third, we leverage the guidance of training error and coarse depth to sample new Gaussians in areas that are challenging to converge with existing pipelines. Experiments on several established real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and speed, while retaining compact storage. At 8K resolution, our lite-version model can render at 60 FPS on an Nvidia RTX 4090 GPU.\n\n动态场景的新视角合成一直是一个有趣但具有挑战性的问题。尽管近期取得了一些进展，但要同时实现高分辨率的逼真结果、实时渲染和紧凑存储仍然是一个艰巨的任务。为了解决这些挑战，我们提出了时空高斯特征涂抹作为一种新的动态场景表征，它由三个关键组成部分构成。首先，我们通过增强3D高斯模型与时间不透明度和参数化运动/旋转，构建了表现力强的时空高斯。这使时空高斯能够捕捉场景内的静态、动态以及瞬时内容。其次，我们引入了涂抹特征渲染，用神经特征替代球形谐波。这些特征有助于建模视角和时间依赖的外观，同时保持小尺寸。第三，我们利用训练误差和粗略深度的指导，在现有管道难以收敛的区域采样新的高斯模型。在几个已建立的真实世界数据集上的实验表明，我们的方法在渲染质量和速度方面达到了最先进水平，同时保持了紧凑的存储。在8K分辨率下，我们的轻量版模型可以在Nvidia RTX 4090 GPU上以60 FPS的速度渲染。\n"
  },
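The first component above reduces to simple formulas: a 1D Gaussian in time for temporal opacity and a polynomial in time for motion. The sketch below is a simplified reading of that idea; the paper's exact parameterization (including parametric rotation) is richer.

```python
import torch

def spacetime_opacity(base_opacity, t, t_center, t_scale):
    """Temporal opacity as a 1D Gaussian in time: a splat fades in and
    out around its temporal center, letting it model transient content."""
    return base_opacity * torch.exp(-0.5 * ((t - t_center) / t_scale) ** 2)

def spacetime_position(mu0, coeffs, t, t_center):
    """Polynomial motion: position(t) = mu0 + sum_k c_k * (t - tc)^k."""
    dt = t - t_center
    pos = mu0
    for k, c in enumerate(coeffs, start=1):        # c: (N, 3) per order
        pos = pos + c * (dt ** k)
    return pos

N = 5
mu0 = torch.zeros(N, 3)
coeffs = [torch.ones(N, 3) * 0.1, torch.ones(N, 3) * 0.01]  # linear, quadratic
print(spacetime_position(mu0, coeffs, t=0.5, t_center=0.0)[0])
print(spacetime_opacity(torch.full((N,), 0.9), 0.5, 0.0, torch.full((N,), 0.2))[0])
```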
  {
    "path": "abs/2312.17142.md",
    "content": "### DreamGaussian4D: Generative 4D Gaussian Splatting\n\nRemarkable progress has been made in 4D content generation recently. However, existing methods suffer from long optimization time, lack of motion controllability, and a low level of detail. In this paper, we introduce DreamGaussian4D, an efficient 4D generation framework that builds on 4D Gaussian Splatting representation. Our key insight is that the explicit modeling of spatial transformations in Gaussian Splatting makes it more suitable for the 4D generation setting compared with implicit representations. DreamGaussian4D reduces the optimization time from several hours to just a few minutes, allows flexible control of the generated 3D motion, and produces animated meshes that can be efficiently rendered in 3D engines.\n\n最近在4D内容生成方面取得了显著进展。然而，现有方法存在优化时间长、运动可控性差和细节水平低的问题。在本文中，我们介绍了DreamGaussian4D，这是一个高效的4D生成框架，建立在4D高斯涂抹表征之上。我们的关键洞察是，高斯涂抹中对空间转换的显式建模使其更适合4D生成设置，与隐式表征相比。DreamGaussian4D将优化时间从几小时减少到仅几分钟，允许灵活控制生成的3D运动，并产生可以在3D引擎中高效渲染的动画网格。\n"
  },
  {
    "path": "abs/2312.17225.md",
    "content": "### 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency\n\nAided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. However, as these pipelines generate 4D content from text or image inputs, they incur significant time and effort in prompt engineering through trial and error. This work introduces 4DGen, a novel, holistic framework for grounded 4D content creation that decomposes the 4D generation task into multiple stages. We identify static 3D assets and monocular video sequences as key components in constructing the 4D content. Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos), thus offering superior control over content creation. Furthermore, we construct our 4D representation using dynamic 3D Gaussians, which permits efficient, high-resolution supervision through rendering during training, thereby facilitating high-quality 4D generation. Additionally, we employ spatial-temporal pseudo labels on anchor frames, along with seamless consistency priors implemented through 3D-aware score distillation sampling and smoothness regularizations. Compared to existing baselines, our approach yields competitive results in faithfully reconstructing input signals and realistically inferring renderings from novel viewpoints and timesteps. Most importantly, our method supports grounded generation, offering users enhanced control, a feature difficult to achieve with previous methods.\n\n在文本到图像和文本到视频扩散模型的帮助下，现有的4D内容创建管道使用分数蒸馏采样来优化整个动态3D场景。然而，由于这些管道是从文本或图像输入生成4D内容的，它们在通过试错进行提示工程时耗费了大量时间和精力。本工作介绍了4DGen，这是一个用于根据地面情况创建4D内容的新颖、整体框架，它将4D生成任务分解为多个阶段。我们确定静态3D资产和单镜头视频序列是构建4D内容的关键组成部分。我们的管道促进了条件性4D生成，使用户能够指定几何（3D资产）和运动（单镜头视频），从而提供对内容创建的更高控制。此外，我们使用动态3D高斯构建我们的4D表征，这允许在训练期间通过渲染进行高效、高分辨率的监督，从而促进高质量的4D生成。此外，我们在锚定帧上使用时空伪标签，通过3D感知的分数蒸馏采样和平滑度正则化实现无缝一致性先验。与现有基线相比，我们的方法在忠实重建输入信号和从新视点和时间步骤现实地推断渲染方面取得了竞争性结果。最重要的是，我们的方法支持根据地面情况的生成，为用户提供增强的控制，这是以前方法难以实现的功能。\n"
  },
  {
    "path": "abs/2401.00834.md",
    "content": "### Deblurring 3D Gaussian Splatting\n\nRecent studies in Radiance Fields have paved the robust way for novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately 3D Gaussians splatting-based approach has been proposed to model the 3D scene, and it achieves remarkable visual quality while rendering the images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to the lens defocusing, object motion, and camera shake, and it inevitably intervenes in clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring.\n\n最近在辐射场的研究为新视角合成铺平了一条坚实的道路，其逼真的渲染质量令人印象深刻。然而，它们通常采用神经网络和体积渲染，这在训练上成本高昂，并且由于渲染时间过长，阻碍了它们在各种实时应用中的广泛使用。最近，基于3D高斯涂抹的方法被提出来模拟3D场景，并在实时渲染图像时实现了显著的视觉质量。然而，如果训练图像模糊，它会严重降低渲染质量。由于镜头失焦、物体运动和相机抖动，模糊通常发生，并且不可避免地干扰了清晰图像的获取。一些先前的研究已经尝试使用神经场从模糊输入图像中渲染出清晰锐利的图像。然而，这些工作的大多数只设计用于基于体积渲染的神经辐射场，并不适用于基于光栅化的3D高斯涂抹方法。因此，我们提出了一种新的实时去模糊框架，使用一个小型的多层感知器（MLP）操作每个3D高斯的协方差来模拟场景模糊度，从而去模糊3D高斯涂抹。虽然去模糊3D高斯涂抹仍然可以实现实时渲染，但它可以从模糊图像中重建出细腻和锐利的细节。我们在基准上进行了多种实验，结果显示了我们方法的去模糊效果。\n"
  },
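A hedged sketch of the core mechanism described above: a small MLP predicts per-Gaussian factors that inflate the (here, diagonal) scales during training so the model can explain blur, while inference renders the unmodified sharp Gaussians. The input layout, layer sizes, and scale-only simplification are assumptions; the paper also adjusts rotations.

```python
import torch
import torch.nn as nn

class BlurCovarianceMLP(nn.Module):
    """Predicts per-Gaussian scale multipliers (>= 1) that widen the
    covariance during training to absorb blur; dropped at test time."""
    def __init__(self, hidden=32):
        super().__init__()
        # input: Gaussian position (3) + a per-image embedding (4), both
        # illustrative; Softplus keeps the predicted offset non-negative
        self.mlp = nn.Sequential(nn.Linear(7, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3), nn.Softplus())

    def forward(self, xyz, view_embed):
        x = torch.cat([xyz, view_embed.expand(xyz.shape[0], -1)], dim=-1)
        return 1.0 + self.mlp(x)               # multipliers >= 1 inflate scales

mlp = BlurCovarianceMLP()
xyz = torch.randn(1000, 3)
scales = torch.rand(1000, 3) * 0.05            # sharp per-Gaussian scales
train_scales = scales * mlp(xyz, torch.randn(4))  # widened only for training
print(train_scales.shape)                      # (1000, 3)
```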
  {
    "path": "abs/2401.01339.md",
    "content": "### Street Gaussians for Modeling Dynamic Urban Scenes\n\nThis paper aims to tackle the problem of modeling dynamic urban street scenes from monocular videos. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes. However, significant limitations are their slow training and rendering speed, coupled with the critical need for high precision in tracked vehicle poses. We introduce Street Gaussians, a new explicit scene representation that tackles all these limitations. Specifically, the dynamic urban street is represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background. To model the dynamics of foreground object vehicles, each object point cloud is optimized with optimizable tracked poses, along with a dynamic spherical harmonics model for the dynamic appearance. The explicit representation allows easy composition of object vehicles and background, which in turn allows for scene editing operations and rendering at 133 FPS (1066×1600 resolution) within half an hour of training. The proposed method is evaluated on multiple challenging benchmarks, including KITTI and Waymo Open datasets. Experiments show that the proposed method consistently outperforms state-of-the-art methods across all datasets. Furthermore, the proposed representation delivers performance on par with that achieved using precise ground-truth poses, despite relying only on poses from an off-the-shelf tracker.\n\n本文旨在解决从单目视频建模动态城市街景的问题。最近的方法通过结合跟踪的车辆姿态来扩展NeRF，以激活车辆，实现动态城市街景的逼真视角合成。然而，这些方法的显著局限性在于它们的训练和渲染速度缓慢，加上对跟踪车辆姿态高精度的关键需求。我们引入了Street Gaussians，这是一种新的显式场景表征，解决了所有这些限制。具体来说，动态城市街道被表示为一组点云，配备语义逻辑和3D高斯，每个高斯都与前景车辆或背景相关联。为了模拟前景物体车辆的动态，每个物体点云都通过可优化的跟踪姿态进行优化，同时还有一个动态球形谐波模型来表达动态外观。显式表征允许轻松组合物体车辆和背景，这反过来允许进行场景编辑操作，并在半小时的训练内以133 FPS（1066×1600分辨率）渲染。所提出的方法在包括KITTI和Waymo Open数据集在内的多个具有挑战性的基准上进行了评估。实验表明，提出的方法在所有数据集上始终优于最先进的方法。此外，尽管仅依赖于现成跟踪器的姿态，提出的表征在性能上与使用精确地面真实姿态达到的水平相当。\n"
  },
  {
    "path": "abs/2401.01970.md",
    "content": "### FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding\n\nPrecisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present Foundation Model Embedded Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated from image-based foundation models into those rendered from our 3D model. To ensure high-quality rendering and fast training, we introduce a novel scene representation by integrating strengths from both GS and multi-resolution hash encodings (MHE). Our effective training procedure also introduces a pixel alignment loss that makes the rendered feature distance of same semantic entities close, following the pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection, despite that we are 851× faster for inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments.\n\n为了准确感知现实世界三维物体的几何和语义属性，对于增强现实和机器人应用的持续发展至关重要。为此，我们提出了一种集成视觉-语言嵌入基础模型的三维高斯溅射方法（FMGS），将基础模型的视觉-语言嵌入融入到三维高斯溅射（GS）中。这项工作的主要贡献是一种高效的三维视觉-语言模型重建和表示方法。这是通过将基于图像的基础模型生成的特征图蒸馏到我们的三维模型渲染的特征图中来实现的。为了确保高质量的渲染和快速训练，我们通过结合GS和多分辨率哈希编码（MHE）的优势，引入了一种新颖的场景表示方法。我们的有效训练程序还引入了像素对齐损失，使相同语义实体的渲染特征距离靠近，遵循像素级语义边界。我们的结果展示了显著的多视图语义一致性，有助于多种下游任务，打败了最先进方法，在开放词汇语言基础的物体检测上提高了10.2%，尽管我们的推理速度快851倍。这项研究探索了视觉、语言和三维场景表示的交集，为在不受控的现实世界环境中增强场景理解铺平了道路。\n"
  },
  {
    "path": "abs/2401.02281.md",
    "content": "### PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DOF Object Pose Dataset Generation\n\nWe introduce Physically Enhanced Gaussian Splatting Simulation System (PEGASUS) for 6DOF object pose dataset generation, a versatile dataset generator based on 3D Gaussian Splatting. Environment and object representations can be easily obtained using commodity cameras to reconstruct with Gaussian Splatting. PEGASUS allows the composition of new scenes by merging the respective underlying Gaussian Splatting point cloud of an environment with one or multiple objects. Leveraging a physics engine enables the simulation of natural object placement within a scene through interaction between meshes extracted for the objects and the environment. Consequently, an extensive amount of new scenes - static or dynamic - can be created by combining different environments and objects. By rendering scenes from various perspectives, diverse data points such as RGB images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data generated by PEGASUS enables pose estimation networks to successfully transfer from synthetic data to real-world data. Moreover, we introduce the Ramen dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical scans that captures images from both object hemisphere and the Gaussian Splatting reconstruction, making them compatible with PEGASUS.\n\n\n我们介绍了一种用于生成六自由度（6DOF）物体姿态数据集的多功能数据集生成器——物理增强高斯溅射模拟系统（PEGASUS），它基于三维高斯溅射技术。使用普通相机就能轻松获得环境和物体的表示，并通过高斯溅射重建。PEGASUS允许通过合并环境与一个或多个物体各自的高斯溅射点云，来组成新场景。利用物理引擎可以模拟自然物体在场景中的放置，通过物体与环境提取的网格之间的相互作用来实现。因此，通过结合不同的环境和物体，可以创建大量新场景——无论是静态的还是动态的。通过从各种视角渲染场景，可以提取出多样的数据点，如RGB图像、深度图、语义掩膜和6DoF物体姿态。我们的研究表明，通过在PEGASUS生成的数据上进行训练，姿态估计网络能够成功地从合成数据转移到现实世界数据。此外，我们还介绍了拉面数据集，包括30种日式杯装面条项目。该数据集包括球形扫描，能够捕获物体半球和高斯溅射重建的图像，使其与PEGASUS兼容。\n"
  },
  {
    "path": "abs/2401.02436.md",
    "content": "### Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis\n\nRecently, high-fidelity scene reconstruction with an optimized 3D Gaussian splat representation has been introduced for novel view synthesis from sparse image sets. Making such representations suitable for applications like network streaming and rendering on low-power devices requires significantly reduced memory consumption as well as improved rendering efficiency. We propose a compressed 3D Gaussian splat representation that utilizes sensitivity-aware vector clustering with quantization-aware training to compress directional colors and Gaussian parameters. The learned codebooks have low bitrates and achieve a compression rate of up to 31× on real-world scenes with only minimal degradation of visual quality. We demonstrate that the compressed splat representation can be efficiently rendered with hardware rasterization on lightweight GPUs at up to 4× higher framerates than reported via an optimized GPU compute pipeline. Extensive experiments across multiple datasets demonstrate the robustness and rendering speed of the proposed approach.\n\n最近，为了从稀疏图像集合合成新视图，引入了一种优化的3D高斯散点表示进行高保真场景重建。要使这些表示适用于网络流媒体和低功耗设备上的渲染，需要显著降低内存消耗并提高渲染效率。我们提出了一种压缩的3D高斯散点表示，该表示利用敏感度感知的向量聚类和量化感知训练来压缩方向颜色和高斯参数。学习得到的代码本具有低比特率，并在真实世界场景中实现了高达31倍的压缩率，同时仅对视觉质量造成极小的降级。我们展示了压缩的散点表示可以在轻量级GPU上通过硬件光栅化高效渲染，与通过优化的GPU计算管线报告的帧率相比，最高可提高4倍。在多个数据集上的广泛实验展示了所提方法的鲁棒性和渲染速度。\n"
  },
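The codebook idea above can be sketched with plain k-means vector quantization: store one shared codebook plus a one-byte index per Gaussian instead of full per-Gaussian colors. This omits the paper's sensitivity weighting and quantization-aware training, and all sizes below are illustrative.

```python
import numpy as np

def kmeans_codebook(vectors, k=256, iters=10, seed=0):
    """Plain k-means VQ: returns a k-entry codebook and, for each input
    vector, the index of its nearest codebook entry."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        d = ((vectors[:, None, :] - codebook[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook, assign

# 20k Gaussians' RGB colors -> 256-entry codebook + 1 byte per Gaussian
colors = np.random.rand(20_000, 3).astype(np.float32)
cb, idx = kmeans_codebook(colors, k=256, iters=5)
raw = colors.nbytes
compressed = cb.nbytes + idx.astype(np.uint8).nbytes
print(f"{raw / compressed:.1f}x smaller")      # ~12 bytes -> ~1 byte/Gaussian
```

The same pattern extends to spherical-harmonic color coefficients and covariance parameters, which is where most of the reported compression comes from.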
  {
    "path": "abs/2401.02588.md",
    "content": "### Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting\n\nThe accelerating deployment of spacecraft in orbit have generated interest in on-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possible unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. This requires robust characterization of the target's geometry. In this article, we present an approach for mapping geometries of satellites on orbit based on 3D Gaussian Splatting that can run on computing resources available on current spaceflight hardware. We demonstrate model training and 3D rendering performance on a hardware-in-the-loop satellite mock-up under several realistic lighting and motion conditions. Our model is shown to be capable of training on-board and rendering higher quality novel views of an unknown satellite nearly 2 orders of magnitude faster than previous NeRF-based algorithms. Such on-board capabilities are critical to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks.\n\n随着轨道航天器部署的加速，对轨道服务（OOS）、航天器检查和主动碎片清除（ADR）的兴趣日益增长。这类任务需要在非合作、可能未知的在轨驻留空间物体附近进行精确的会合和邻近操作。载人任务的安全顾虑和地面控制的延迟时间要求完全自主性。这需要对目标几何形状进行稳健的表征。在本文中，我们提出了一种基于3D高斯散点的方法，用于绘制轨道上卫星的几何图形，该方法可以在当前航天硬件上可用的计算资源上运行。我们展示了在若干现实照明和运动条件下，硬件在环卫星模拟器上的模型训练和3D渲染性能。我们的模型被证明能够在轨道上进行训练，并且比以前基于NeRF的算法快近两个数量级，从而呈现出更高质量的未知卫星的新视图。这种机载能力对于实现自主引导、导航和控制任务所需的下游机器智能任务至关重要。\n"
  },
  {
    "path": "abs/2401.03890.md",
    "content": "### A Survey on 3D Gaussian Splatting\n\n3D Gaussian splatting (3D GS) has recently emerged as a transformative technique in the explicit radiance field and computer graphics landscape. This innovative approach, characterized by the utilization of millions of 3D Gaussians, represents a significant departure from the neural radiance field (NeRF) methodologies, which predominantly use implicit, coordinate-based models to map spatial coordinates to pixel values. 3D GS, with its explicit scene representations and differentiable rendering algorithms, not only promises real-time rendering capabilities but also introduces unprecedented levels of control and editability. This positions 3D GS as a potential game-changer for the next generation of 3D reconstruction and representation. In the present paper, we provide the first systematic overview of the recent developments and critical contributions in the domain of 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the advent of 3D GS, setting the stage for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS. By facilitating real-time performance, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain. Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representation.\n\n3D 高斯溅射（3D Gaussian Splatting，简称3D GS）最近作为一种变革性技术，在显式辐射场和计算机图形领域中崭露头角。这种创新方法的特点是使用数以百万计的3D 高斯函数，代表了与神经辐射场（Neural Radiance Field，简称NeRF）方法论的显著不同，后者主要使用隐式的、基于坐标的模型将空间坐标映射到像素值。3D GS凭借其显式的场景表示和可微渲染算法，不仅承诺实时渲染能力，而且引入了前所未有的控制和可编辑性级别。这使得3D GS成为下一代3D重建和表示的潜在游戏规则改变者。在本文中，我们提供了对3D GS领域近期发展和关键贡献的首次系统概述。我们首先详细探讨了3D GS兴起背后的基本原理和驱动力，为理解其重要性奠定基础。我们讨论的一个焦点是3D GS的实际应用性。通过促进实时性能，3D GS开启了从虚拟现实到交互媒体等众多应用的大门。这一点通过比较分析领先的3D GS模型在各种基准任务上的表现，突显了它们的性能和实用性。调查总结了当前挑战，并提出了在这一领域未来研究的潜在途径。通过这项调查，我们旨在为新手和资深研究人员提供一个宝贵的资源，促进在适用和显式辐射场表示方面的进一步探索和发展。\n"
  },
  {
    "path": "abs/2401.04099.md",
    "content": "### AGG: Amortized Generative 3D Gaussians for Single Image to 3D\n\nGiven the growing need for automatic 3D content creation pipelines, various 3D representations have been studied to generate 3D objects from a single image. Due to its superior rendering efficiency, 3D Gaussian splatting-based models have recently excelled in both 3D reconstruction and generation. 3D Gaussian splatting approaches for image to 3D generation are often optimization-based, requiring many computationally expensive score-distillation steps. To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization. Utilizing an intermediate hybrid representation, AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. Moreover, we propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module. Our method is evaluated against existing optimization-based 3D Gaussian frameworks and sampling-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster.\n\n随着自动化3D内容创建流程需求的增长，研究人员已经研究了多种3D表示方法，以从单个图像生成3D对象。由于其卓越的渲染效率，基于3D高斯溅射的模型在3D重建和生成方面近期取得了显著成绩。用于将图像转换为3D生成的3D高斯溅射方法通常是基于优化的，需要许多计算成本高昂的得分提取步骤。为了克服这些挑战，我们引入了一种摊销生成的3D高斯框架（Amortized Generative 3D Gaussian，简称AGG），它能够从单个图像中即时产生3D高斯函数，消除了每个实例优化的需求。AGG利用一个中间的混合表示，对3D高斯位置和其他外观属性的生成进行分解，以实现联合优化。此外，我们提出了一个级联流程，首先生成3D数据的粗略表示，然后通过3D高斯超分辨率模块进行上采样。我们的方法与现有基于优化的3D高斯框架以及利用其他3D表示的采样基流程进行了比较，AGG在质量和数量上展现了竞争力的生成能力，同时在速度上快了数个数量级。\n"
  },
  {
    "path": "abs/2401.05345.md",
    "content": "### DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines\n\nDifferentiable rendering is a technique used in an important emerging class of visual computing applications that involves representing a 3D scene as a model that is trained from 2D images using gradient descent. Recent works (e.g. 3D Gaussian Splatting) use a rasterization pipeline to enable rendering high quality photo-realistic imagery at high speeds from these learned 3D models. These methods have been demonstrated to be very promising, providing state-of-art quality for many important tasks. However, training a model to represent a scene is still a time-consuming task even when using powerful GPUs. In this work, we observe that the gradient computation phase during training is a significant bottleneck on GPUs due to the large number of atomic operations that need to be processed. These atomic operations overwhelm atomic units in the L2 partitions causing stalls. To address this challenge, we leverage the observations that during the gradient computation: (1) for most warps, all threads atomically update the same memory locations; and (2) warps generate varying amounts of atomic traffic (since some threads may be inactive). We propose DISTWAR, a software-approach to accelerate atomic operations based on two key ideas: First, we enable warp-level reduction of threads at the SM sub-cores using registers to leverage the locality in intra-warp atomic updates. Second, we distribute the atomic computation between the warp-level reduction at the SM and the L2 atomic units to increase the throughput of atomic computation. Warps with many threads performing atomic updates to the same memory locations are scheduled at the SM, and the rest using L2 atomic units. We implement DISTWAR using existing warp-level primitives. We evaluate DISTWAR on widely used raster-based differentiable rendering workloads. We demonstrate significant speedups of 2.44x on average (up to 5.7x).\n\n可微渲染是一种在视觉计算应用中日益重要的技术，它通过使用梯度下降算法从2D图像中训练得到的模型来表示3D场景。最近的研究（例如3D高斯喷溅）使用光栅化管道来实现从这些学习到的3D模型中以高速渲染高质量的照片级真实图像。这些方法已经被证明非常有前景，为许多重要任务提供了最先进的质量。然而，即使使用强大的GPU，训练一个模型来表示一个场景仍然是一个耗时的任务。在这项工作中，我们观察到在训练过程中的梯度计算阶段是GPU上的一个重要瓶颈，因为需要处理大量的原子操作。这些原子操作压倒了L2分区中的原子单位，导致停滞。为了应对这一挑战，我们利用以下观察结果：（1）对于大多数warp，所有线程都会原子性地更新相同的内存位置；（2）warp生成不同数量的原子流量（因为有些线程可能不活跃）。我们提出了DISTWAR，一种基于两个关键思想的软件方法来加速原子操作：首先，我们利用寄存器在SM子核心内启用warp级别的线程缩减，以利用内部warp原子更新的局部性。其次，我们将原子计算在SM的warp级别缩减和L2原子单位之间分配，以提高原子计算的吞吐量。在SM上调度对相同内存位置进行大量原子更新的warp，而其他则使用L2原子单位。我们使用现有的warp级别原语实现了DISTWAR。我们在广泛使用的基于光栅的可微渲染工作负载上评估了DISTWAR。我们展示了平均2.44倍的显著加速（高达5.7倍）。\n"
  },
  {
    "path": "abs/2401.05925.md",
    "content": "### CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians\n\nWe propose Compact and Swift Segmenting 3D Gaussians(CoSSegGaussians), a method for compact 3D-consistent scene segmentation at fast rendering speed with only RGB images input. Previous NeRF-based 3D segmentation methods have relied on implicit or voxel neural scene representation and ray-marching volume rendering which are time consuming. Recent 3D Gaussian Splatting significantly improves the rendering speed, however, existing Gaussians-based segmentation methods(eg: Gaussian Grouping) fail to provide compact segmentation masks especially in zero-shot segmentation, which is mainly caused by the lack of robustness and compactness for straightforwardly assigning learnable parameters to each Gaussian when encountering inconsistent 2D machine-generated labels. Our method aims to achieve compact and reliable zero-shot scene segmentation swiftly by mapping fused spatial and semantically meaningful features for each Gaussian point with a shallow decoding network. Specifically, our method firstly optimizes Gaussian points' position, convariance and color attributes under the supervision of RGB images. After Gaussian Locating, we distill multi-scale DINO features extracted from images through unprojection to each Gaussian, which is then incorporated with spatial features from the fast point features processing network, i.e. RandLA-Net. Then the shallow decoding MLP is applied to the multi-scale fused features to obtain compact segmentation. Experimental results show that our model can perform high-quality zero-shot scene segmentation, as our model outperforms other segmentation methods on both semantic and panoptic segmentation task, meanwhile consumes approximately only 10% segmenting time compared to NeRF-based segmentation.\n\n我们提出了一种紧凑而快速的分割3D高斯方法（CoSSegGaussians），用于在仅有RGB图像输入的情况下实现紧凑且一致的3D场景分割和快速渲染。之前基于NeRF的3D分割方法依赖于隐式或体素神经场景表示和射线行走体积渲染，这些方法耗时较长。最近的3D高斯喷溅显著提高了渲染速度，但现有基于高斯的分割方法（例如：高斯分组）尤其在零样本分割中未能提供紧凑的分割掩码，这主要是由于在遇到不一致的2D机器生成标签时，直接为每个高斯分配可学习参数缺乏鲁棒性和紧凑性。我们的方法旨在通过映射每个高斯点的融合空间和语义意义特征，并使用浅层解码网络，迅速实现紧凑且可靠的零样本场景分割。具体来说，我们的方法首先在RGB图像的监督下优化高斯点的位置、协方差和颜色属性。在定位高斯后，我们通过投影将从图像中提取的多尺度DINO特征提炼到每个高斯点上，然后与来自快速点特征处理网络（即RandLA-Net）的空间特征结合。接着，对融合的多尺度特征应用浅层解码MLP，以获得紧凑的分割。实验结果表明，我们的模型能够执行高质量的零样本场景分割，我们的模型在语义和全景分割任务上均优于其他分割方法，同时大约只需NeRF基分割的10%分割时间。\n"
  },
  {
    "path": "abs/2401.06003.md",
    "content": "### TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering\n\nPoint-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, also latest approaches in this domain are not without their shortcomings. 3D Gaussian Splatting [Kerbl and Kopanas et al. 2023] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [Rückert et al. 2022] can accommodate crisper images, but the neural reconstruction network decreases performance, it grapples with temporal instability and it is unable to effectively address large gaps in the point cloud.\n\nIn this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size. This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions.\n\nOur evaluation demonstrate that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage.\n\n基于点的辐射场渲染已经在新视角合成方面展示出令人印象深刻的结果，提供了渲染质量和计算效率的引人注目的结合。然而，即使是该领域的最新方法也不是没有缺点。3D高斯喷溅[Kerbl和Kopanas等，2023]在渲染高细节场景时会遇到困难，因为它会导致模糊和云状伪影。另一方面，ADOP [Rückert等，2022]能够呈现更清晰的图像，但是神经重建网络降低了性能，它还难以处理时间上的不稳定性，并且无法有效应对点云中的大型间隙。\n\n在本文中，我们介绍了TRIPS（三线性点喷溅），这是一种结合了高斯喷溅和ADOP思想的方法。我们新技术背后的基本概念涉及将点光栅化成屏幕空间图像金字塔，金字塔层的选择由投影点的大小决定。这种方法允许使用单次三线性写入来渲染任意大的点。然后使用一个轻量级神经网络来重建包括超出喷溅分辨率细节的无孔图像。重要的是，我们的渲染管线是完全可微分的，允许对点的大小和位置进行自动优化。\n\n我们的评估表明，TRIPS在渲染质量方面超越了现有的最先进方法，同时在现成硬件上保持每秒60帧的实时帧率。这种性能扩展到具有挑战性的场景，如具有复杂几何形状、广阔景观和自动曝光镜头的场景。\n"
  },
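The layer-selection rule above amounts to a log2 of the projected point radius. Below is a minimal sketch, assuming a pinhole projection of the point's world-space radius and clamping to the available levels; the fractional level is what makes the write trilinear (bilinear within a layer, linear across two adjacent layers).

```python
import math

def pyramid_level(world_radius, depth, focal, num_levels):
    """Pick the image-pyramid layer for a point: project its size to
    pixels, then choose the level whose pixel footprint (2^level)
    covers it, so any point fits a single 2x2x2 trilinear splat."""
    pixel_radius = focal * world_radius / max(depth, 1e-6)
    level = max(0.0, math.log2(max(pixel_radius, 1e-6)))
    return min(level, num_levels - 1)          # fractional -> trilinear blend

# a nearby large point lands on a coarse layer, a small distant one on layer 0
print(pyramid_level(0.5, depth=2.0, focal=800, num_levels=8))    # ~7.0 (clamped)
print(pyramid_level(0.01, depth=10.0, focal=800, num_levels=8))  # 0.0
```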
  {
    "path": "abs/2401.06116.md",
    "content": "### Gaussian Shadow Casting for Neural Characters\n\nNeural character models can now reconstruct detailed geometry and texture from video, but they lack explicit shadows and shading, leading to artifacts when generating novel views and poses or during relighting. It is particularly difficult to include shadows as they are a global effect and the required casting of secondary rays is costly. We propose a new shadow model using a Gaussian density proxy that replaces sampling with a simple analytic formula. It supports dynamic motion and is tailored for shadow computation, thereby avoiding the affine projection approximation and sorting required by the closely related Gaussian splatting. Combined with a deferred neural rendering model, our Gaussian shadows enable Lambertian shading and shadow casting with minimal overhead. We demonstrate improved reconstructions, with better separation of albedo, shading, and shadows in challenging outdoor scenes with direct sun light and hard shadows. Our method is able to optimize the light direction without any input from the user. As a result, novel poses have fewer shadow artifacts and relighting in novel scenes is more realistic compared to the state-of-the-art methods, providing new ways to pose neural characters in novel environments, increasing their applicability.\n\n神经角色模型现在可以从视频中重建详细的几何形状和纹理，但它们缺乏明确的阴影和着色，导致在生成新视角和姿态或在重新照明时出现失真。特别是包含阴影非常困难，因为阴影是一种全局效应，且所需的次级射线投射成本很高。我们提出了一种新的阴影模型，使用高斯密度代理替代采样，采用简单的分析公式。它支持动态运动，专为阴影计算量身定制，从而避免了与密切相关的高斯喷溅所需的仿射投影近似和排序。结合使用了延迟神经渲染模型，我们的高斯阴影支持兰伯特着色和阴影投射，且额外开销最小。我们展示了改进的重建结果，在具有直接阳光和硬阴影的挑战性户外场景中，反照率、着色和阴影的分离更好。我们的方法能够在没有任何用户输入的情况下优化光线方向。因此，新的姿态有更少的阴影失真，而且在新场景中的重新照明比现有最先进方法更加逼真，为在新环境中摆放神经角色提供了新的方式，增加了它们的适用性。\n"
  },
  {
    "path": "abs/2401.08742.md",
    "content": "### Fast Dynamic 3D Object Generation from a Single-view Video\n\nGenerating dynamic three-dimensional (3D) object from a single-view video is challenging due to the lack of 4D labeled data. Existing methods extend text-to-3D pipelines by transferring off-the-shelf image generation models such as score distillation sampling, but they are slow and expensive to scale (e.g., 150 minutes per object) due to the need for back-propagating the information-limited supervision signals through a large pretrained model. To address this limitation, we propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data to directly train a novel 4D Gaussian splatting model with explicit point cloud geometry, enabling real-time rendering under continuous camera trajectories. Extensive experiments on synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed when compared to prior art alternatives while preserving the same level of innovative view synthesis quality. For example, Efficient4D takes only 14 minutes to model a dynamic object.\n\n从单视角视频生成动态三维（3D）对象是具有挑战性的，因为缺乏四维（4D）标注数据。现有方法通过转移现成的图像生成模型（如分数蒸馏采样）来扩展文本至3D的流程，但由于需要通过大型预训练模型反向传播信息有限的监督信号，这些方法速度慢且难以扩展（例如，每个对象需要150分钟）。为了解决这一限制，我们提出了一种高效的视频至4D对象生成框架，名为Efficient4D。它生成高质量、时空一致的图像，并在不同的相机视角下使用这些图像作为标注数据，直接训练一个具有明确点云几何形状的新颖4D高斯喷溅模型，从而实现实时渲染连续的相机轨迹。在合成和真实视频上的大量实验表明，与先前的技术相比，Efficient4D在速度上实现了显著的10倍提升，同时保持了相同水平的创新视图合成质量。例如，Efficient4D仅需14分钟就能对一个动态对象进行建模。\n"
  },
  {
    "path": "abs/2401.09720.md",
    "content": "### GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting\n\nIn this work, we propose a novel clothed human reconstruction method called GaussianBody, based on 3D Gaussian Splatting. Compared with the costly neural radiance based models, 3D Gaussian Splatting has recently demonstrated great performance in terms of training time and rendering quality. However, applying the static 3D Gaussian Splatting model to the dynamic human reconstruction problem is non-trivial due to complicated non-rigid deformations and rich cloth details. To address these challenges, our method considers explicit pose-guided deformation to associate dynamic Gaussians across the canonical space and the observation space, introducing a physically-based prior with regularized transformations helps mitigate ambiguity between the two spaces. During the training process, we further propose a pose refinement strategy to update the pose regression for compensating the inaccurate initial estimation and a split-with-scale mechanism to enhance the density of regressed point clouds. The experiments validate that our method can achieve state-of-the-art photorealistic novel-view rendering results with high-quality details for dynamic clothed human bodies, along with explicit geometry reconstruction.\n\n在这项工作中，我们提出了一种基于3D高斯喷溅的新型穿着人体重建方法，名为GaussianBody。与成本高昂的神经辐射基模型相比，3D高斯喷溅最近在训练时间和渲染质量方面展现出了优异的性能。然而，由于复杂的非刚性变形和丰富的衣物细节，将静态的3D高斯喷溅模型应用于动态人体重建问题并非易事。为了解决这些挑战，我们的方法考虑了显式的姿态引导变形，以关联规范空间和观察空间中的动态高斯，引入基于物理的先验并配合规则化变换有助于减少两个空间之间的歧义。在训练过程中，我们进一步提出了一种姿态细化策略，用于更新姿态回归以补偿初始估计的不准确性，以及一种分割与缩放机制，用以增强回归点云的密度。实验验证了我们的方法能够为动态穿着的人体实现具有高质量细节的最先进的逼真新视角渲染结果，同时还实现了显式几何重建。\n"
  },
  {
    "path": "abs/2401.11535.md",
    "content": "### Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting\n\nSurgical 3D reconstruction is a critical area of research in robotic surgery, with recent works adopting variants of dynamic radiance fields to achieve success in 3D reconstruction of deformable tissues from single-viewpoint videos. However, these methods often suffer from time-consuming optimization or inferior quality, limiting their adoption in downstream tasks. Inspired by 3D Gaussian Splatting, a recent trending 3D representation, we present EndoGS, applying Gaussian Splatting for deformable endoscopic tissue reconstruction. Specifically, our approach incorporates deformation fields to handle dynamic scenes, depth-guided supervision to optimize 3D targets with a single viewpoint, and a spatial-temporal weight mask to mitigate tool occlusion. As a result, EndoGS reconstructs and renders high-quality deformable endoscopic tissues from a single-viewpoint video, estimated depth maps, and labeled tool masks. Experiments on DaVinci robotic surgery videos demonstrate that EndoGS achieves superior rendering quality.\n\n外科3D重建是机器人外科手术研究中的一个关键领域。近期的研究采用动态辐射场的变体，成功实现了从单视点视频中对可变形组织进行3D重建。然而，这些方法通常存在耗时优化或质量较低的问题，限制了它们在下游任务中的应用。受到近期流行的3D表示方法3D高斯溅射的启发，我们提出了EndoGS，将高斯溅射应用于可变形内窥镜组织重建。具体来说，我们的方法结合了变形场以处理动态场景，深度引导的监督来优化单视点的3D目标，以及空间-时间权重掩码来减轻工具遮挡。结果表明，EndoGS能够从单视点视频、估计的深度图和标记的工具掩码中重建和渲染高质量的可变形内窥镜组织。在DaVinci机器人外科手术视频上的实验表明，EndoGS实现了卓越的渲染质量。\n"
  },
  {
    "path": "abs/2401.12561.md",
    "content": "### EndoGaussian: Gaussian Splatting for Deformable Surgical Scene Reconstruction\n\nReconstructing deformable tissues from endoscopic stereo videos is essential in many downstream surgical applications. However, existing methods suffer from slow inference speed, which greatly limits their practical use. In this paper, we introduce EndoGaussian, a real-time surgical scene reconstruction framework that builds on 3D Gaussian Splatting. Our framework represents dynamic surgical scenes as canonical Gaussians and a time-dependent deformation field, which predicts Gaussian deformations at novel timestamps. Due to the efficient Gaussian representation and parallel rendering pipeline, our framework significantly accelerates the rendering speed compared to previous methods. In addition, we design the deformation field as the combination of a lightweight encoding voxel and an extremely tiny MLP, allowing for efficient Gaussian tracking with a minor rendering burden. Furthermore, we design a holistic Gaussian initialization method to fully leverage the surface distribution prior, achieved by searching informative points from across the input image sequence. Experiments on public endoscope datasets demonstrate that our method can achieve real-time rendering speed (195 FPS real-time, 100× gain) while maintaining the state-of-the-art reconstruction quality (35.925 PSNR) and the fastest training speed (within 2 min/scene), showing significant promise for intraoperative surgery applications.\n\n从内窥镜立体视频中重建可变形组织，在许多下游外科应用中至关重要。然而，现有方法受到推理速度缓慢的限制，这极大地限制了它们的实际应用。在本文中，我们介绍了EndoGaussian，这是一个基于3D高斯溅射的实时外科场景重建框架。我们的框架将动态外科场景表示为规范高斯和一个时变的变形场，该变形场预测在新时间戳上的高斯变形。由于高效的高斯表示和并行渲染流水线，我们的框架显著加快了与先前方法相比的渲染速度。此外，我们将变形场设计为轻量级编码体素和极小的多层感知器的组合，允许高效的高斯追踪，同时带来较小的渲染负担。我们还设计了一种全面的高斯初始化方法，充分利用了通过从输入图像序列中搜索信息点获得的表面分布先验。在公共内窥镜数据集上的实验表明，我们的方法可以实现实时渲染速度（实时195 FPS，100倍提速），同时保持最先进的重建质量（35.925 PSNR）和最快的训练速度（在2分钟/场景内），为术中外科应用展示了重大的潜力。\n"
  },
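A small sketch of the "encoding voxel + extremely tiny MLP" deformation field described above: sample a feature from a learnable 3D grid at each canonical Gaussian center, concatenate the timestamp, and decode an offset. Grid resolution, feature width, and the offset-only output are assumptions of the sketch, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelDeformationField(nn.Module):
    """Lightweight voxel encoding + tiny MLP: predicts a per-Gaussian
    deformation offset at time t from features interpolated off a grid."""
    def __init__(self, res=32, feat=8):
        super().__init__()
        self.grid = nn.Parameter(torch.zeros(1, feat, res, res, res))
        self.mlp = nn.Sequential(nn.Linear(feat + 1, 32), nn.ReLU(),
                                 nn.Linear(32, 3))        # xyz offset

    def forward(self, xyz, t):
        # xyz normalized to [-1, 1]; grid_sample wants (1, N, 1, 1, 3) coords
        coords = xyz.view(1, -1, 1, 1, 3)
        f = F.grid_sample(self.grid, coords, align_corners=True)
        f = f.view(self.grid.shape[1], -1).t()            # (N, feat)
        t_col = torch.full((f.shape[0], 1), float(t))
        return self.mlp(torch.cat([f, t_col], dim=-1))    # (N, 3) offsets

field = VoxelDeformationField()
canonical = torch.rand(1000, 3) * 2 - 1
deformed = canonical + field(canonical, t=0.3)
print(deformed.shape)                                     # (1000, 3)
```

Because the grid does most of the representational work, the MLP can stay tiny, which is what keeps per-frame Gaussian tracking cheap.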
  {
    "path": "abs/2401.12900.md",
    "content": "### PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Creation with 3D Gaussian Splatting\n\nDespite much progress, creating real-time high-fidelity head avatar is still difficult and existing methods have to trade-off between speed and quality. 3DMM based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency.\nAlthough 3D Gaussian has been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussian in head avatar creation remains a major challenge since it is difficult for 3D Gaussian to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitive to create a parametric morphable shape model and employs 3D Gaussian for fine detail representation and high fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM) which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes to enable the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles. By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussian for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and the avatars can be animated in real-time (≥ 25 fps at a resolution of 512 x 512 )\n\n尽管取得了很大进展，但创建实时高保真头部化身仍然是一项挑战，现有方法必须在速度和质量之间进行权衡。基于3DMM的方法通常无法模拟非面部结构，如眼镜和发型，而神经隐式模型则受困于变形的不灵活性和渲染效率低下。\n尽管3D高斯在几何表示和辐射场重建方面已显示出有希望的能力，但在头部化身创建中应用3D高斯仍然是一个主要挑战，因为3D高斯难以模拟由变化的姿势和表情引起的头部形状变化。在这篇文章中，我们介绍了PSAvatar，这是一个用于可动画头部化身创建的新颖框架，它利用离散几何原语来创建参数化可变形状模型，并采用3D高斯进行细节表示和高保真渲染。这个参数化可变形状模型是一个基于点的可变形状模型（PMSM），它使用点而不是网格进行3D表示，以实现更强的表示灵活性。PMSM首先通过在表面上以及网格外采样将FLAME网格转换为点，以实现不仅能重建表面类结构，而且能重建如眼镜和发型等复杂几何结构。通过以分析综合的方式将这些点与头部形状对齐，PMSM使得可以利用3D高斯进行细节表示和外观建模，从而实现高保真化身的创建。我们展示了PSAvatar可以重建各种对象的高保真头部化身，这些化身可以实时动画化（≥25 fps，分辨率为512 x 512）。\n"
  },
  {
    "path": "abs/2401.14857.md",
    "content": "### LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map Rendering\n\nWe introduce an integrated precise LiDAR, Inertial, and Visual (LIV) multi-modal sensor fused mapping system that builds on the differentiable surface splatting to improve the mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion.\n\nThis system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initial poses for surface Gaussian scenes are obtained using a LiDAR-inertial system with size-adaptive voxels. Then, we optimized and refined the Gaussians by visual-derived photometric gradients to optimize the quality and density of LiDAR measurements.\n\nOur method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes. bolstering structure construction through LiDAR and facilitating real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes potentially for digital twins and virtual reality while also holding potential applicability in real-time SLAM and robotics domains.\n\n我们介绍了一种集成的精确激光雷达、惯性和视觉（LIV）多模态传感器融合映射系统，该系统基于可微分面溅射技术，以提高映射的保真度、质量和结构精度。值得注意的是，这也是一种用于激光雷达-视觉-惯性传感器融合的紧密耦合地图的新形式。\n\n该系统利用激光雷达和视觉数据的互补特性，捕捉大规模3D场景的几何结构，并以高保真度恢复其视觉表面信息。我们使用具有大小自适应体素的激光雷达-惯性系统获取表面高斯场景的初始姿态。然后，我们利用视觉衍生的光度梯度优化和细化高斯，以优化激光雷达测量的质量和密度。\n\n我们的方法兼容各种类型的激光雷达，包括固态和机械激光雷达，支持重复和非重复扫描模式。它通过激光雷达加强结构构建，并有助于在不同的LIV数据集中实时生成逼真的渲染。该系统在生成实时逼真场景方面表现出显著的韧性和多功能性，这些场景可能用于数字双胞胎和虚拟现实，同时也具有在实时SLAM和机器人技术领域应用的潜力。\n"
  },
  {
    "path": "abs/2401.15318.md",
    "content": "### Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting\n\nWe demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian splatting and position-based dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to Gaussian shader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views.\n\n我们展示了将基于物理的固体和流体动画与3D高斯溅射（3DGS）结合的可行性，用以在使用3DGS重建的虚拟场景中创造新颖效果。利用高斯溅射和基于位置的动力学（PBD）在底层表示中的一致性，我们以一种连贯的方式管理固体和流体的渲染、视图合成和动态。类似于高斯着色器，我们通过增加一个法线来增强每个高斯核，使核的方向与表面法线对齐，以细化PBD模拟。这种方法有效地消除了由固体中的旋转变形引起的尖锐噪声。它还允许我们集成基于物理的渲染来增强流体上的动态表面反射。因此，我们的框架能够真实地再现动态流体上的表面高光，并促进场景物体和流体之间从新视角的交互。\n"
  },
  {
    "path": "abs/2401.16416.md",
    "content": "### Endo-4DGS: Distilling Depth Ranking for Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting\n\nIn the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes. Nonetheless, these methods are hampered by slow inference, prolonged training, and substantial computational demands. Additionally, some rely on stereo depth estimation, which is often infeasible due to the high costs and logistical challenges associated with stereo cameras. Moreover, the monocular reconstruction quality for deformable scenes is currently inadequate. To overcome these obstacles, we present Endo-4DGS, an innovative, real-time endoscopic dynamic reconstruction approach that utilizes 4D Gaussian Splatting (GS) and requires no ground truth depth data. This method extends 3D GS by incorporating a temporal component and leverages a lightweight MLP to capture temporal Gaussian deformations. This effectively facilitates the reconstruction of dynamic surgical scenes with variable conditions. We also integrate Depth-Anything to generate pseudo-depth maps from monocular views, enhancing the depth-guided reconstruction process. Our approach has been validated on two surgical datasets, where it has proven to render in real-time, compute efficiently, and reconstruct with remarkable accuracy. These results underline the vast potential of Endo-4DGS to improve surgical assistance.\n\n在机器人辅助的微创手术领域中，动态场景重建可以显著增强下游任务并改善手术结果。基于神经辐射场（NeRF）的方法因其卓越的场景重建能力而近来备受关注。然而，这些方法受到推理速度慢、训练时间长和计算需求大的限制。此外，一些方法依赖于立体深度估计，由于立体相机的高成本和物流挑战，这通常是不可行的。此外，目前单目重建对于可变形场景的质量还不够充分。为了克服这些障碍，我们提出了Endo-4DGS，这是一种创新的、实时的内窥镜动态重建方法，它利用4D高斯溅射（GS）并且不需要真实深度数据。该方法通过加入时间成分来扩展3D GS，并利用轻量级的多层感知器捕捉时间高斯变形。这有效地促进了在不同条件下动态外科场景的重建。我们还集成了Depth-Anything来从单目视图生成伪深度图，增强了深度引导的重建过程。我们的方法已在两个外科数据集上进行了验证，结果证明它能够实时渲染、高效计算，并以卓越的准确度进行重建。这些结果突显了Endo-4DGS在改善外科辅助方面的巨大潜力。\n"
  },
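As a concrete picture of the "lightweight MLP to capture temporal Gaussian deformations" mentioned above, here is a minimal PyTorch sketch mapping a canonical Gaussian center plus a timestamp to position/scale/rotation offsets. The layer widths and the 10-dimensional output split are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DeformMLP(nn.Module):
    """Minimal deformation field: (center, t) -> (d_center, d_log_scale, d_quat)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 3 + 4),   # translation, log-scale, quaternion offsets
        )

    def forward(self, centers, t):
        x = torch.cat([centers, t.expand(centers.shape[0], 1)], dim=-1)
        out = self.net(x)
        return out[:, :3], out[:, 3:6], out[:, 6:]

mlp = DeformMLP()
centers = torch.randn(1024, 3)                     # canonical Gaussian centers
d_xyz, d_scale, d_rot = mlp(centers, torch.tensor([[0.5]]))
print(d_xyz.shape, d_scale.shape, d_rot.shape)
```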
  {
    "path": "abs/2401.16663.md",
    "content": "### VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality\n\nAs consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there's a growing focus on the development of engagements with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties. They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency in virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience.\n\n随着消费级虚拟现实（VR）和混合现实（MR）技术的日益普及，对3D虚拟内容的开发与互动越来越受到关注。不幸的是，传统的内容创建、编辑和虚拟空间内的互动技术充满困难。这些技术不仅需要大量工程技术，还需要广泛的专业知识，这增加了虚拟对象操纵的挫败感和低效率。我们提出的VR-GS系统代表了人本中心的3D内容互动的一大飞跃，提供了无缝且直观的用户体验。通过在虚拟现实环境中开发一个意识到物理动力学的交互式高斯溅射，并构建一个高效的双层嵌入策略以及可变形体仿真，VR-GS确保了实时执行和高度逼真的动态响应。我们的虚拟现实系统组件旨在实现高效和有效，从详细的场景重建和对象分割开始，通过多视图图像修复，到交互式基于物理的编辑。该系统还包括实时形变嵌入和动态阴影投射，确保全面而引人入胜的虚拟体验。\n"
  },
  {
    "path": "abs/2401.17857.md",
    "content": "### Segment Anything in 3D Gaussians\n\n3D Gaussian Splatting has emerged as an alternative 3D representation of Neural Radiance Fields (NeRFs), benefiting from its high-quality rendering results and real-time rendering speed. Considering the 3D Gaussian representation remains unparsed, it is necessary first to execute object segmentation within this domain. Subsequently, scene editing and collision detection can be performed, proving vital to a multitude of applications, such as virtual reality (VR), augmented reality (AR), game/movie production, etc. In this paper, we propose a novel approach to achieve object segmentation in 3D Gaussian via an interactive procedure without any training process and learned parameters. We refer to the proposed method as SA-GS, for Segment Anything in 3D Gaussians. Given a set of clicked points in a single input view, SA-GS can generalize SAM to achieve 3D consistent segmentation via the proposed multi-view mask generation and view-wise label assignment methods. We also propose a cross-view label-voting approach to assign labels from different views. In addition, in order to address the boundary roughness issue of segmented objects resulting from the non-negligible spatial sizes of 3D Gaussian located at the boundary, SA-GS incorporates the simple but effective Gaussian Decomposition scheme. Extensive experiments demonstrate that SA-GS achieves high-quality 3D segmentation results, which can also be easily applied for scene editing and collision detection tasks.\n\n3D 高斯散射已经成为神经辐射场（NeRFs）的一种替代3D表示方法，它因高质量的渲染结果和实时渲染速度而受益。考虑到3D高斯表示仍未被解析，首先需要在此域内执行对象分割。随后，可以进行场景编辑和碰撞检测，这对于许多应用至关重要，如虚拟现实（VR）、增强现实（AR）、游戏/电影制作等。在本文中，我们提出了一种在3D高斯中实现对象分割的新方法，该方法通过一个无需任何训练过程和学习参数的交互式程序来实现。我们将这种方法称为SA-GS，即在3D高斯中分割任何东西。通过在单个输入视图中点击一组点，SA-GS可以利用所提出的多视图掩码生成和逐视图标签分配方法，实现3D一致性分割的SAM推广。我们还提出了一种跨视图标签投票方法，用于从不同视图分配标签。此外，为了解决由于位于边界的3D高斯的非微不足道的空间大小导致的分割对象边界粗糙问题，SA-GS采用了简单但有效的高斯分解方案。广泛的实验表明，SA-GS实现了高质量的3D分割结果，这也可以轻松应用于场景编辑和碰撞检测任务。\n"
  },
  {
    "path": "abs/2402.00525.md",
    "content": "### StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering\n\nGaussian Splatting has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3D Gaussian Splatting rendering pipeline relies on several simplifications. Notably, reducing Gaussian to 2D splats with a single view-space depth introduces popping and blending artifacts during view rotation. Addressing this issue requires accurate per-pixel depth computation, yet a full per-pixel sort proves excessively costly compared to a global sort operation. In this paper, we present a novel hierarchical rasterization approach that systematically resorts and culls splats with minimal processing overhead. Our software rasterizer effectively eliminates popping artifacts and view inconsistencies, as demonstrated through both quantitative and qualitative measurements. Simultaneously, our method mitigates the potential for cheating view-dependent effects with popping, ensuring a more authentic representation. Despite the elimination of cheating, our approach achieves comparable quantitative results for test images, while increasing the consistency for novel view synthesis in motion. Due to its design, our hierarchical approach is only 4% slower on average than the original Gaussian Splatting. Notably, enforcing consistency enables a reduction in the number of Gaussians by approximately half with nearly identical quality and view-consistency. Consequently, rendering performance is nearly doubled, making our approach 1.6x faster than the original Gaussian Splatting, with a 50% reduction in memory requirements.\n\n高斯散射已作为一种突出的模型出现，用于从跨多个领域的图像中构建3D表示。然而，3D高斯散射渲染管线的效率依赖于几种简化。尤其是，将高斯简化为具有单一视图空间深度的2D散射体，会在视图旋转过程中引入弹出和混合伪像。解决这一问题需要准确的每像素深度计算，然而，与全局排序操作相比，完整的每像素排序证明过于昂贵。在本文中，我们提出了一种新颖的分层光栅化方法，该方法系统地重新排序和剔除散射体，同时最小化处理开销。我们的软件光栅化器有效地消除了弹出伪像和视图不一致性，通过定量和定性测量都得到了证明。同时，我们的方法减少了利用弹出现象作弊的视图依赖效果的可能性，确保了更真实的表示。尽管消除了作弊，我们的方法在测试图像的定量结果上与原始高斯散射相当，同时在运动中的新视图合成的一致性上有所增加。由于其设计，我们的分层方法平均仅比原始高斯散射慢4%。值得注意的是，强制一致性使得高斯的数量大约减半，几乎不影响质量和视图一致性。因此，渲染性能几乎翻倍，使我们的方法比原始高斯散射快1.6倍，内存需求减少50%。\n"
  },
  {
    "path": "abs/2402.00752.md",
    "content": "### Optimal Projection for 3D Gaussian Splatting\n\n3D Gaussian Splatting has garnered extensive attention and application in real-time neural rendering. Concurrently, concerns have been raised about the limitations of this technology in aspects such as point cloud storage, performance , and robustness in sparse viewpoints , leading to various improvements. However, there has been a notable lack of attention to the projection errors introduced by the local affine approximation inherent in the splatting itself, and the consequential impact of these errors on the quality of photo-realistic rendering. This paper addresses the projection error function of 3D Gaussian Splatting, commencing with the residual error from the first-order Taylor expansion of the projection function ϕ. The analysis establishes a correlation between the error and the Gaussian mean position. Subsequently, leveraging function optimization theory, this paper analyzes the function's minima to provide an optimal projection strategy for Gaussian Splatting referred to Optimal Gaussian Splatting. Experimental validation further confirms that this projection methodology reduces artifacts, resulting in a more convincingly realistic rendering.\n\n3D高斯散射在实时神经渲染中获得了广泛的关注和应用。同时，也有人对这项技术在点云存储、性能以及在稀疏视点下的鲁棒性等方面的局限性提出了担忧，这导致了各种改进。然而，对于散射本身固有的局部仿射近似引入的投影错误及这些错误对于照片级真实渲染质量的影响，缺乏足够的关注。本文讨论了3D高斯散射的投影误差函数，从投影函数ϕ的一阶泰勒展开的残差误差开始。分析建立了误差与高斯平均位置之间的相关性。随后，利用函数优化理论，本文分析了函数的最小值，以提供一个称为最优高斯散射的高斯散射的最优投影策略。实验验证进一步确认了这种投影方法减少了伪影，结果是更加令人信服的真实渲染。\n"
  },
  {
    "path": "abs/2402.00763.md",
    "content": "### 360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming\n\n3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto the spherical surface of 360∘ images using 2D Gaussians. In practical applications, input panoramas are often sparse, leading to unreliable initialization of 3D Gaussians and subsequent degradation of 3D-GS quality. In addition, due to the under-constrained geometry of texture-less planes (e.g., walls and floors), 3D-GS struggles to model these flat regions with elliptical Gaussians, resulting in significant floaters in novel views. To address these issues, we propose 360-GS, a novel 360∘ Gaussian splatting for a limited set of panoramic inputs. Instead of splatting 3D Gaussians directly onto the spherical surface, 360-GS projects them onto the tangent plane of the unit sphere and then maps them to the spherical projections. This adaptation enables the representation of the projection using Gaussians. We guide the optimization of 360-GS by exploiting layout priors within panoramas, which are simple to obtain and contain strong structural information about the indoor scene. Our experimental results demonstrate that 360-GS allows panoramic rendering and outperforms state-of-the-art methods with fewer artifacts in novel view synthesis, thus providing immersive roaming in indoor scenarios.\n\n3D高斯散射（3D-GS）最近因其实时和照片级真实渲染而受到极大关注。这项技术通常以透视图像作为输入，并通过将一组3D椭圆高斯散射到图像平面上，优化这些高斯，从而产生2D高斯。然而，将3D-GS应用于全景输入在有效地使用2D高斯对360度图像的球面进行投影建模方面提出了挑战。在实际应用中，输入的全景图往往是稀疏的，导致3D高斯的不可靠初始化以及随后3D-GS质量的降低。此外，由于缺乏约束的几何体（如墙面和地板）的纹理，3D-GS在用椭圆高斯建模这些平坦区域时遇到困难，导致在新视角中出现显著的浮动物。为了解决这些问题，我们提出了360-GS，一种新颖的360度高斯散射，专为有限的全景输入设计。360-GS不是直接将3D高斯散射到球面上，而是先将它们投影到单位球的切平面上，然后再将它们映射到球面投影上。这种调整使得使用高斯表示投影成为可能。我们通过利用全景图内的布局先验来指导360-GS的优化，这些布局先验简单易得，并包含关于室内场景的强大结构信息。我们的实验结果表明，360-GS允许全景渲染，并且在新视角合成中比现有最先进方法产生更少的伪影，从而提供室内场景中的沉浸式漫游体验。\n"
  },
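The core geometric trick in 360-GS is splatting on the tangent plane of the unit sphere rather than on the sphere itself. The NumPy sketch below builds a tangent frame at a Gaussian's direction and maps that direction to equirectangular pixel coordinates; the full method additionally warps the 2D covariance through the projection's Jacobian, which this sketch omits.

```python
import numpy as np

def tangent_frame(mu):
    """Local tangent-plane basis at the unit direction of a Gaussian center."""
    d = mu / np.linalg.norm(mu)
    up = np.array([0.0, 0.0, 1.0])
    if abs(d @ up) > 0.99:                      # avoid a degenerate basis at the poles
        up = np.array([1.0, 0.0, 0.0])
    e1 = np.cross(up, d); e1 /= np.linalg.norm(e1)
    e2 = np.cross(d, e1)
    return d, e1, e2                            # lobe direction + two tangent axes

def equirect_coords(d, width, height):
    """Map a unit direction to (u, v) pixels in a width x height panorama."""
    lon = np.arctan2(d[1], d[0])                # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[2], -1.0, 1.0))   # latitude in [-pi/2, pi/2]
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v

d, e1, e2 = tangent_frame(np.array([1.0, 2.0, 0.5]))
print(equirect_coords(d, 2048, 1024))
```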
  {
    "path": "abs/2402.01459.md",
    "content": "### GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting\n\nIn recent years, a range of neural network-based methods for image rendering have been introduced. For instance, widely-researched neural radiance fields (NeRF) rely on a neural network to represent 3D scenes, allowing for realistic view synthesis from a small number of 2D images. However, most NeRF models are constrained by long training and inference times. In comparison, Gaussian Splatting (GS) is a novel, state-of-theart technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, warranting fast training and swift, real-time rendering. A drawback of GS is the absence of a well-defined approach for its conditioning due to the necessity to condition several hundred thousand Gaussian components. To solve this, we introduce Gaussian Mesh Splatting (GaMeS) model, a hybrid of mesh and a Gaussian distribution, that pin all Gaussians splats on the object surface (mesh). The unique contribution of our methods is defining Gaussian splats solely based on their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain high-quality renders in the real-time generation of high-quality views. Furthermore, we demonstrate that in the absence of a predefined mesh, it is possible to fine-tune the initial mesh during the learning process.\n\n近年来，引入了一系列基于神经网络的图像渲染方法。例如，广泛研究的神经辐射场（NeRF）依赖于神经网络来表示3D场景，允许从少量2D图像合成真实视图。然而，大多数NeRF模型都受到长时间训练和推理时间的限制。相比之下，高斯散射（GS）是一种新颖的、最先进的技术，通过高斯分布近似它们对图像像素的贡献来渲染3D场景中的点，保证了快速训练和快速、实时渲染。GS的一个缺点是缺乏一个明确定义的方法来进行其条件化，因为需要对几十万个高斯组件进行条件化。为了解决这个问题，我们引入了高斯网格散射（GaMeS）模型，这是一种网格和高斯分布的混合体，将所有高斯散射固定在对象表面（网格）上。我们方法的独特贡献是仅根据它们在网格上的位置定义高斯散射，允许在动画过程中自动调整位置、规模和旋转。结果是，我们获得了高质量的实时生成高质量视图的渲染。此外，我们证明，在没有预定义网格的情况下，有可能在学习过程中对初始网格进行微调。\n"
  },
  {
    "path": "abs/2402.03246.md",
    "content": "### SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM\n\nSemantic understanding plays a crucial role in Dense Simultaneous Localization and Mapping (SLAM), facilitating comprehensive scene interpretation. Recent advancements that integrate Gaussian Splatting into SLAM systems have demonstrated its effectiveness in generating high-quality renderings through the use of explicit 3D Gaussian representations. Building on this progress, we propose SGS-SLAM, the first semantic dense visual SLAM system grounded in 3D Gaussians, which provides precise 3D semantic segmentation alongside high-fidelity reconstructions. Specifically, we propose to employ multi-channel optimization during the mapping process, integrating appearance, geometric, and semantic constraints with key-frame optimization to enhance reconstruction quality. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, and semantic segmentation, outperforming existing methods meanwhile preserving real-time rendering ability.\n\n语义理解在密集型同时定位与地图构建（SLAM）中扮演着至关重要的角色，它促进了对场景的全面解释。近期将高斯喷溅技术整合到SLAM系统中的进展证明了其在通过使用显式的3D高斯表示生成高质量渲染图像方面的有效性。基于这一进展，我们提出了SGS-SLAM，这是第一个基于3D高斯的语义密集视觉SLAM系统，它提供精确的3D语义分割与高保真重建。具体来说，我们提议在映射过程中采用多通道优化，整合外观、几何和语义约束与关键帧优化来提升重建质量。广泛的实验表明，SGS-SLAM在相机位姿估计、地图重建和语义分割方面提供了最先进的性能，同时保持了实时渲染能力，超越了现有方法。\n"
  },
  {
    "path": "abs/2402.03307.md",
    "content": "### 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes\n\nWe consider the problem of novel view synthesis (NVS) for dynamic scenes. Recent neural approaches have accomplished exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios like sudden movements or capturing high-fidelity renderings. In this paper, we introduce 4D Gaussian Splatting (4DGS), a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians, inspired by the success of 3D Gaussian Splatting in static scenes. We model dynamics at each timestamp by temporally slicing the 4D Gaussians, which naturally compose dynamic 3D Gaussians and can be seamlessly projected into images. As an explicit spatial-temporal representation, 4DGS demonstrates powerful capabilities for modeling complicated dynamics and fine details, especially for scenes with abrupt motions. We further implement our temporal slicing and splatting techniques in a highly optimized CUDA acceleration framework, achieving real-time inference rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Rigorous evaluations on scenes with diverse motions showcase the superior efficiency and effectiveness of 4DGS, which consistently outperforms existing methods both quantitatively and qualitatively.\n\n我们考虑动态场景下的新视角合成（NVS）问题。近期的神经方法已经在静态3D场景的NVS问题上取得了卓越的成果，但将这些方法扩展到4D时变场景仍然非常具有挑战性。先前的努力通常通过学习一个规范空间加上隐式或显式的形变场来编码动态，这在面对突然运动或捕获高保真渲染的困难场景时往往会遇到挑战。在这篇论文中，我们引入了4D高斯喷溅（4DGS），一种新的方法，通过各向异性的4D XYZT高斯函数来表示动态场景，这一方法的灵感来源于静态场景中3D高斯喷溅的成功。我们通过时间切片4D高斯函数来模拟每个时间戳的动态，这自然而然地组成了动态的3D高斯函数，并且可以无缝地投影到图像中。作为一个显式的时空表示方法，4DGS展示了在模拟复杂动态和细节方面的强大能力，特别是对于有突然运动的场景。我们进一步在一个高度优化的CUDA加速框架中实现了我们的时间切片和喷溅技术，达到了在RTX 3090 GPU上每秒最高277帧，在RTX 4090 GPU上每秒最高583帧的实时推理渲染速度。对于具有多样运动的场景的严格评估展示了4DGS的卓越效率和有效性，它在定量和定性上均一致超越了现有方法。\n"
  },
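Temporally slicing an anisotropic 4D XYZT Gaussian is standard Gaussian conditioning, so the closed form can be shown directly. In the NumPy sketch below, conditioning on time t yields a 3D Gaussian, and the 1D marginal in t gives a weight that can modulate opacity so primitives fade in and out over time; treating the marginal exactly this way is our reading of the abstract rather than a detail it states.

```python
import numpy as np

def slice_4d_gaussian(mu, cov, t):
    """Condition a 4D (x, y, z, t) Gaussian on time t.

    Returns the 3D mean and covariance of the conditional spatial Gaussian,
    plus the (unnormalized) 1D marginal density in t, which can scale the
    splat's opacity at this timestamp.
    """
    mu_x, mu_t = mu[:3], mu[3]
    S_xx = cov[:3, :3]                              # spatial block
    S_xt = cov[:3, 3]                               # space-time coupling
    S_tt = cov[3, 3]                                # temporal variance
    mean_3d = mu_x + S_xt * (t - mu_t) / S_tt
    cov_3d = S_xx - np.outer(S_xt, S_xt) / S_tt
    weight = np.exp(-0.5 * (t - mu_t) ** 2 / S_tt)
    return mean_3d, cov_3d, weight

A = np.random.randn(4, 4)
cov4 = A @ A.T + 1e-3 * np.eye(4)                   # a valid 4D covariance
print(slice_4d_gaussian(np.zeros(4), cov4, t=0.3))
```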
  {
    "path": "abs/2402.03723.md",
    "content": "### Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos\n\nCreating controllable 3D human portraits from casual smartphone videos is highly desirable due to their immense value in AR/VR applications. The recent development of 3D Gaussian Splatting (3DGS) has shown improvements in rendering quality and training efficiency. However, it still remains a challenge to accurately model and disentangle head movements and facial expressions from a single-view capture to achieve high-quality renderings. In this paper, we introduce Rig3DGS to address this challenge. We represent the entire scene, including the dynamic subject, using a set of 3D Gaussians in a canonical space. Using a set of control signals, such as head pose and expressions, we transform them to the 3D space with learned deformations to generate the desired rendering. Our key innovation is a carefully designed deformation method which is guided by a learnable prior derived from a 3D morphable model. This approach is highly efficient in training and effective in controlling facial expressions, head positions, and view synthesis across various captures. We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments.\n\n由于在增强现实/虚拟现实应用中的巨大价值，从普通智能手机视频中创建可控的3D人像变得非常受欢迎。最近3D高斯喷溅（3DGS）的发展在渲染质量和训练效率上显示出了改进。然而，从单一视角捕捉准确地建模和分离头部移动和面部表情以实现高质量渲染仍然是一个挑战。在这篇论文中，我们介绍了Rig3DGS来解决这个挑战。我们使用一组3D高斯在规范空间表示整个场景，包括动态主题。使用一组控制信号，如头部姿势和表情，我们通过学习到的形变将它们转换到3D空间中，以生成所需的渲染。我们的关键创新是一个精心设计的形变方法，该方法由来自3D可塑模型的可学习先验指导。这种方法在训练中高效，并且在控制面部表情、头部位置和跨各种捕捉的视图合成上非常有效。我们通过广泛的定量和定性实验演示了我们学习到的形变的有效性。\n"
  },
  {
    "path": "abs/2402.04796.md",
    "content": "### Mesh-based Gaussian Splatting for Real-time Large-scale Deformation\n\nNeural implicit representations, including Neural Distance Fields and Neural Radiance Fields, have demonstrated significant capabilities for reconstructing surfaces with complicated geometry and topology, and generating novel views of a scene. Nevertheless, it is challenging for users to directly deform or manipulate these implicit representations with large deformations in the real-time fashion. Gaussian Splatting(GS) has recently become a promising method with explicit geometry for representing static scenes and facilitating high-quality and real-time synthesis of novel views. However,it cannot be easily deformed due to the use of discrete Gaussians and lack of explicit topology. To address this, we develop a novel GS-based method that enables interactive deformation. Our key idea is to design an innovative mesh-based GS representation, which is integrated into Gaussian learning and manipulation. 3D Gaussians are defined over an explicit mesh, and they are bound with each other: the rendering of 3D Gaussians guides the mesh face split for adaptive refinement, and the mesh face split directs the splitting of 3D Gaussians. Moreover, the explicit mesh constraints help regularize the Gaussian distribution, suppressing poor-quality Gaussians(e.g. misaligned Gaussians,long-narrow shaped Gaussians), thus enhancing visual quality and avoiding artifacts during deformation. Based on this representation, we further introduce a large-scale Gaussian deformation technique to enable deformable GS, which alters the parameters of 3D Gaussians according to the manipulation of the associated mesh. Our method benefits from existing mesh deformation datasets for more realistic data-driven Gaussian deformation. Extensive experiments show that our approach achieves high-quality reconstruction and effective deformation, while maintaining the promising rendering results at a high frame rate(65 FPS on average).\n\n神经隐式表示，包括神经距离场和神经辐射场，已经显示出在重建具有复杂几何和拓扑结构的表面以及生成场景的新视图方面的显著能力。然而，对于用户来说，直接以实时方式对这些隐式表示进行大变形或操作仍然具有挑战性。高斯喷溅（GS）最近成为一种有前景的方法，它具有明确的几何表示，用于表示静态场景并促进高质量且实时的新视图合成。然而，由于使用了离散的高斯函数和缺乏明确的拓扑结构，它不易于形变。为了解决这个问题，我们开发了一种新的基于GS的方法，该方法能够实现交互式形变。我们的关键思想是设计一种创新的基于网格的GS表示，它被整合进高斯学习和操作中。3D高斯被定义在一个明确的网格上，并且彼此绑定：3D高斯的渲染指导网格面的分裂以进行自适应细化，网格面的分裂指导3D高斯的分裂。此外，明确的网格约束有助于规范高斯分布，抑制质量差的高斯（例如，未对齐的高斯、长窄形状的高斯），从而提高视觉质量并在形变过程中避免伪影。基于这种表示，我们进一步引入了一种大规模高斯形变技术来实现可形变的GS，它根据与之关联的网格的操作改变3D高斯的参数。我们的方法受益于现有的网格形变数据集，以实现更现实的数据驱动的高斯形变。广泛的实验表明，我们的方法在保持高帧率（平均每秒65帧）的同时，实现了高质量的重建和有效的形变，并维持了有前景的渲染结果。\n"
  },
  {
    "path": "abs/2402.05054.md",
    "content": "### LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation\n\n3D content creation has achieved significant progress in terms of both quality and speed. Although current feed-forward models can produce 3D objects in seconds, their resolution is constrained by the intensive computation required during training. In this paper, we introduce Large Multi-View Gaussian Model (LGM), a novel framework designed to generate high-resolution 3D models from text prompts or single-view images. Our key insights are two-fold: 1) 3D Representation: We propose multi-view Gaussian features as an efficient yet powerful representation, which can then be fused together for differentiable rendering. 2) 3D Backbone: We present an asymmetric U-Net as a high-throughput backbone operating on multi-view images, which can be produced from text or single-view image input by leveraging multi-view diffusion models. Extensive experiments demonstrate the high fidelity and efficiency of our approach. Notably, we maintain the fast speed to generate 3D objects within 5 seconds while boosting the training resolution to 512, thereby achieving high-resolution 3D content generation.\n\n3D内容创作在质量和速度方面都取得了显著进展。尽管当前的前馈模型可以在几秒钟内产生3D对象，但它们的分辨率受到训练期间所需密集计算的限制。在这篇论文中，我们介绍了大型多视图高斯模型（LGM），这是一个旨在从文本提示或单视图图像生成高分辨率3D模型的新颖框架。我们的关键洞察有两点：1) 3D表示：我们提出多视图高斯特征作为一种高效且强大的表示，然后可以将其融合用于可微渲染。2) 3D骨干网络：我们展示了一个不对称的U-Net作为高通量骨干网络，操作在多视图图像上，这些多视图图像可以通过利用多视图扩散模型从文本或单视图图像输入产生。广泛的实验展示了我们方法的高保真度和效率。值得注意的是，我们保持了在5秒内生成3D对象的快速速度，同时将训练分辨率提高到512，从而实现了高分辨率3D内容的生成。\n"
  },
  {
    "path": "abs/2402.06149.md",
    "content": "### HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting\n\nCreating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising outcomes obtained through 2D diffusion priors in recent works, current methods face challenges in achieving high-quality and animated avatars effectively. In this paper, we present HeadStudio, a novel framework that utilizes 3D Gaussian splatting to generate realistic and animated avatars from text prompts. Our method drives 3D Gaussians semantically to create a flexible and achievable appearance through the intermediate FLAME representation. Specifically, we incorporate the FLAME into both 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh. 2) FLAME-based score distillation sampling, utilizing FLAME-based fine-grained control signal to guide score distillation from the text prompt. Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting visually appealing appearances. The avatars are capable of rendering high-quality real-time (≥40 fps) novel views at a resolution of 1024. They can be smoothly controlled by real-world speech and video. We hope that HeadStudio can advance digital avatar creation and that the present method can widely be applied across various domains.\n\n从文本提示创建数字化头像一直是一个令人期待但又充满挑战的任务。尽管通过在最近的研究中使用2D扩散先验获得了有希望的结果，当前方法在有效地实现高质量和动画化头像方面面临挑战。在本文中，我们介绍了HeadStudio，一个新颖的框架，它利用3D高斯喷溅技术从文本提示生成逼真和动画化的头像。我们的方法通过中间的FLAME表示，语义驱动3D高斯体，以创建灵活且可实现的外观。具体来说，我们将FLAME融入到3D表示和分数蒸馏中：1）基于FLAME的3D高斯喷溅，通过将每个点绑定到FLAME网格来驱动3D高斯点。2）基于FLAME的分数蒸馏采样，利用基于FLAME的细粒度控制信号来指导从文本提示中的分数蒸馏。广泛的实验展示了HeadStudio在从文本提示生成可动画化头像方面的有效性，展示了视觉上吸引人的外观。这些头像能够以1024的分辨率渲染高质量实时（≥40 fps）新视图。它们可以被真实世界的语音和视频平滑控制。我们希望HeadStudio能推进数字头像创建，而且当前方法能广泛应用于各个领域。\n"
  },
  {
    "path": "abs/2402.06198.md",
    "content": "### GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data\n\n3D Shape represented as point cloud has achieve advancements in multimodal pre-training to align image and language descriptions, which is curial to object identification, classification, and retrieval. However, the discrete representations of point cloud lost the object's surface shape information and creates a gap between rendering results and 2D correspondences. To address this problem, we propose GS-CLIP for the first attempt to introduce 3DGS (3D Gaussian Splatting) into multimodal pre-training to enhance 3D representation. GS-CLIP leverages a pre-trained vision-language model for a learned common visual and textual space on massive real world image-text pairs and then learns a 3D Encoder for aligning 3DGS optimized per object. Additionally, a novel Gaussian-Aware Fusion is proposed to extract and fuse global explicit feature. As a general framework for language-image-3D pre-training, GS-CLIP is agnostic to 3D backbone networks. Experiments on challenging shows that GS-CLIP significantly improves the state-of-the-art, outperforming the previously best results.\n\n将3D形状表示为点云，在将图像和语言描述对齐的多模态预训练方面取得了进展，这对于对象识别、分类和检索至关重要。然而，点云的离散表示丢失了对象表面形状信息，并在渲染结果与2D对应项之间创建了差距。为了解决这个问题，我们提出了GS-CLIP，这是首次尝试将3DGS（3D高斯喷溅）引入多模态预训练以增强3D表示。GS-CLIP利用预训练的视觉-语言模型，为大量真实世界的图像-文本对学习一个共有的视觉和文本空间，然后学习一个3D编码器以针对每个对象优化3DGS对齐。此外，提出了一种新颖的高斯感知融合方法，以提取和融合全局显式特征。作为一个针对语言-图像-3D预训练的通用框架，GS-CLIP对3D基础网络是不可知的。在具有挑战性的实验中，GS-CLIP显著提高了最先进的水平，超越了以前最好的结果。\n"
  },
  {
    "path": "abs/2402.07181.md",
    "content": "### 3D Gaussian as a New Vision Era: A Survey\n\n3D Gaussian Splatting (3D-GS) has emerged as a significant advancement in the field of Computer Graphics, offering explicit scene representation and novel view synthesis without the reliance on neural networks, such as Neural Radiance Fields (NeRF). This technique has found diverse applications in areas such as robotics, urban mapping, autonomous navigation, and virtual reality/augmented reality, just name a few. Given the growing popularity and expanding research in 3D Gaussian Splatting, this paper presents a comprehensive survey of relevant papers from the past year. We organize the survey into taxonomies based on characteristics and applications, providing an introduction to the theoretical underpinnings of 3D Gaussian Splatting. Our goal through this survey is to acquaint new researchers with 3D Gaussian Splatting, serve as a valuable reference for seminal works in the field, and inspire future research directions, as discussed in our concluding section.\n\n3D高斯喷溅（3D-GS）已成为计算机图形学领域的一个重要进步，提供了明确的场景表示和新视角合成，而不依赖于神经网络，如神经辐射场（NeRF）。这项技术在机器人学、城市绘图、自主导航、虚拟现实/增强现实等多个领域找到了广泛应用。鉴于3D高斯喷溅的日益流行和研究的不断扩展，本文提出了对过去一年相关论文的全面调查。我们根据特征和应用组织了调查分类，为3D高斯喷溅的理论基础提供了介绍。通过这项调查，我们的目标是让新研究人员熟悉3D高斯喷溅，为该领域的重要工作提供宝贵的参考，并启发未来的研究方向，如我们在结论部分所讨论的。\n"
  },
  {
    "path": "abs/2402.07207.md",
    "content": "### GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting\n\nWe present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an object-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene.\n\n我们介绍了GALA3D，一种带有布局引导控制的生成式3D高斯方法，用于有效的组合式文本到3D生成。我们首先利用大型语言模型（LLMs）生成初始布局，并引入了一个布局引导的3D高斯表示，以适应几何约束条件下的3D内容生成。然后，我们提出了一个以条件扩散为基础的对象-场景组合优化机制，以协作生成具有一致几何、纹理、规模和多个对象之间准确互动的逼真3D场景，同时调整从LLMs提取的粗略布局先验，以使其与生成的场景对齐。实验表明，GALA3D是一个用户友好的、端到端的框架，用于最先进的场景级3D内容生成和可控编辑，同时确保场景内对象级实体的高保真度。\n"
  },
  {
    "path": "abs/2402.10128.md",
    "content": "### GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering\n\nAdvancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represent a scene and thus significantly outperforming Gaussian Splatting methods in efficiency with a plug-and-play replacement ability for Gaussian-based utilities. GES is validated theoretically and empirically in both principled 1D setup and realistic 3D scenes.\nIt is shown to represent signals with sharp edges more accurately, which are typically challenging for Gaussians due to their inherent low-pass characteristics. Our empirical analysis demonstrates that GEF outperforms Gaussians in fitting natural-occurring signals (e.g. squares, triangles, and parabolic signals), thereby reducing the need for extensive splitting operations that increase the memory footprint of Gaussian Splatting. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%.\n\n3D高斯喷溅的进步显著加速了3D重建和生成。然而，它可能需要大量的高斯函数，这将产生大量的内存占用。本文介绍了GES（广义指数喷溅），一种新颖的表示方法，采用广义指数函数（GEF）来模拟3D场景，需要远少于高斯喷溅方法的粒子来表示场景，因此在效率上显著超越高斯喷溅方法，并且具有替代基于高斯的工具的即插即用能力。GES在原理上和实证上都在一维设置和现实3D场景中得到了验证。它被证明能更准确地表示具有尖锐边缘的信号，这对于高斯函数来说通常是个挑战，因为它们固有的低通特性。我们的实证分析表明，GEF在拟合自然出现的信号（例如，正方形、三角形和抛物线信号）方面优于高斯函数，从而减少了需要增加高斯喷溅内存占用的广泛分割操作的需求。借助频率调制损失，GES在新视图合成基准测试中实现了竞争性能，同时所需的内存存储量不到高斯喷溅的一半，并将渲染速度提高了多达39%。\n"
  },
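The Generalized Exponential Function at the heart of GES has a simple 1D form, shown below. With shape parameter beta = 2 it reduces to a Gaussian; larger beta gives a flatter top and sharper falloff, which is why fewer primitives suffice for hard edges. Parameter names follow common GEF notation and are not taken verbatim from the paper.

```python
import numpy as np

def generalized_exponential(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Exponential Function: exp(-(|x - mu| / alpha) ** beta).

    beta = 2 recovers a Gaussian profile; beta -> infinity approaches a box,
    so sharp edges need far fewer primitives than with Gaussians alone.
    """
    return np.exp(-((np.abs(x - mu) / alpha) ** beta))

x = np.linspace(-3, 3, 7)
print(generalized_exponential(x, beta=2.0))     # Gaussian-like profile
print(generalized_exponential(x, beta=8.0))     # near-box profile, sharp falloff
```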
  {
    "path": "abs/2402.10259.md",
    "content": "### GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting\n\nReconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting, that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination which explicitly inject structure priors into the initial optimization process for helping build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. Our GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, and OpenIllumination, achieving strong reconstruction results from only 4 views and significantly outperforming previous state-of-the-art methods.\n\n重建和渲染来自高度稀疏视角的三维对象对于推动三维视觉技术的应用和改善用户体验至关重要。然而，稀疏视角的图像只包含非常有限的三维信息，导致两个显著挑战：1）由于匹配图像太少，建立多视角一致性困难；2）部分省略或高度压缩的对象信息，因为视角覆盖不足。为了解决这些挑战，我们提出了一个框架 GaussianObject，通过高斯喷溅来表示和渲染三维对象，仅使用4张输入图像即可实现高质量渲染。我们首先介绍了视觉外壳和浮点消除技术，这些技术明确地将结构先验注入初始优化过程中，以帮助建立多视角一致性，产生一个粗糙的三维高斯表示。然后我们基于扩散模型构建了一个高斯修复模型来补充省略的对象信息，其中高斯被进一步细化。我们设计了一种自生成策略来获得用于训练修复模型的图像对。我们的 GaussianObject 在几个具有挑战性的数据集上进行了评估，包括 MipNeRF360、OmniObject3D 和 OpenIllumination，仅从4个视角就实现了强大的重建结果，显著超越了之前的最先进方法。\n"
  },
  {
    "path": "abs/2402.10483.md",
    "content": "### GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians\n\nHairstyle reflects culture and ethnicity at first glance. In the digital era, various realistic human hairstyles are also critical to high-fidelity digital human assets for beauty and inclusivity. Yet, realistic hair modeling and real-time rendering for animation is a formidable challenge due to its sheer number of strands, complicated structures of geometry, and sophisticated interaction with light. This paper presents GaussianHair, a novel explicit hair representation. It enables comprehensive modeling of hair geometry and appearance from images, fostering innovative illumination effects and dynamic animation capabilities. At the heart of GaussianHair is the novel concept of representing each hair strand as a sequence of connected cylindrical 3D Gaussian primitives. This approach not only retains the hair's geometric structure and appearance but also allows for efficient rasterization onto a 2D image plane, facilitating differentiable volumetric rendering. We further enhance this model with the \"GaussianHair Scattering Model\", adept at recreating the slender structure of hair strands and accurately capturing their local diffuse color in uniform lighting. Through extensive experiments, we substantiate that GaussianHair achieves breakthroughs in both geometric and appearance fidelity, transcending the limitations encountered in state-of-the-art methods for hair reconstruction. Beyond representation, GaussianHair extends to support editing, relighting, and dynamic rendering of hair, offering seamless integration with conventional CG pipeline workflows. Complementing these advancements, we have compiled an extensive dataset of real human hair, each with meticulously detailed strand geometry, to propel further research in this field.\n\n发型首先反映了文化和种族。在数字时代，各种逼真的人类发型对于高保真度的数字人类资产也至关重要，这关乎美感和包容性。然而，由于发丝数量庞大、几何结构复杂以及与光的复杂相互作用，逼真的头发建模和动画的实时渲染是一个巨大的挑战。本文介绍了一种新颖的显式头发表示方法 GaussianHair。它能够从图像中全面建模头发的几何形状和外观，促进创新的照明效果和动态动画能力的发展。GaussianHair的核心是一个新颖的概念，即将每一根头发丝表示为一系列连接的圆柱形3D高斯原始体。这种方法不仅保留了头发的几何结构和外观，还允许高效地光栅化到2D图像平面上，便于不同的体积渲染。我们进一步增强了这个模型，通过“GaussianHair散射模型”，能够熟练地重现头发丝的细长结构，并在均匀光照下准确捕捉它们的局部漫反射颜色。通过广泛的实验，我们证明了GaussianHair在几何和外观保真度方面都取得了突破性进展，超越了当前最先进方法在头发重建方面遇到的限制。超越表示之外，GaussianHair还支持编辑、重新照明和动态渲染头发，提供了与传统CG流程工作流的无缝集成。为了进一步推动这一领域的研究，我们还编纂了一个包含真实人类头发的广泛数据集，每个数据集都有精心详细的头发丝几何结构。\n"
  },
  {
    "path": "abs/2402.13827.md",
    "content": "### Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting\n\n3D Gaussian splatting (3D-GS) is a new rendering approach that outperforms the neural radiance field (NeRF) in terms of both speed and image quality. 3D-GS represents 3D scenes by utilizing millions of 3D Gaussians and projects these Gaussians onto the 2D image plane for rendering. However, during the rendering process, a substantial number of unnecessary 3D Gaussians exist for the current view direction, resulting in significant computation costs associated with their identification. In this paper, we propose a computational reduction technique that quickly identifies unnecessary 3D Gaussians in real-time for rendering the current view without compromising image quality. This is accomplished through the offline clustering of 3D Gaussians that are close in distance, followed by the projection of these clusters onto a 2D image plane during runtime. Additionally, we analyze the bottleneck associated with the proposed technique when executed on GPUs and propose an efficient hardware architecture that seamlessly supports the proposed scheme. For the Mip-NeRF360 dataset, the proposed technique excludes 63% of 3D Gaussians on average before the 2D image projection, which reduces the overall rendering computation by almost 38.3% without sacrificing peak-signal-to-noise-ratio (PSNR). The proposed accelerator also achieves a speedup of 10.7x compared to a GPU.\n\n三维高斯喷溅（3D-GS）是一种新的渲染方法，无论是速度还是图像质量，都优于神经辐射场（NeRF）。3D-GS通过利用数以百万计的三维高斯来表示三维场景，并将这些高斯投影到二维图像平面上进行渲染。然而，在渲染过程中，对于当前视图方向而言，存在大量不必要的三维高斯，这导致了与它们的识别相关的显著计算成本。在本文中，我们提出了一种计算降低技术，该技术能够在不损害图像质量的情况下，实时快速识别出渲染当前视图时不必要的三维高斯。这通过离线聚类那些距离接近的三维高斯，然后在运行时将这些簇投影到二维图像平面上来实现。此外，我们分析了在GPU上执行时与所提技术相关的瓶颈，并提出了一种高效的硬件架构，无缝支持所提方案。对于Mip-NeRF360数据集，所提技术平均排除了63%的三维高斯在二维图像投影之前，这减少了将近38.3%的总体渲染计算量，同时不牺牲峰值信噪比（PSNR）。所提出的加速器还实现了相比GPU 10.7倍的加速。\n"
  },
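A minimal sketch of the offline-cluster/runtime-cull idea described above: group Gaussian centers with a few plain-NumPy k-means iterations offline, then at render time keep only clusters whose bounding spheres can intersect a view cone. The cone test and all parameters are illustrative stand-ins; the paper's actual pipeline and its hardware accelerator are not reproduced here.

```python
import numpy as np

def cluster_gaussians(centers, n_clusters=32, iters=10, seed=0):
    """Offline step: plain-NumPy Lloyd (k-means) over Gaussian centers."""
    rng = np.random.default_rng(seed)
    C = centers[rng.choice(len(centers), n_clusters, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(centers[:, None, :] - C[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = centers[labels == k]
            if len(members):
                C[k] = members.mean(axis=0)
    radii = np.array([np.max(np.linalg.norm(centers[labels == k] - C[k], axis=1),
                             initial=0.0) for k in range(n_clusters)])
    return C, radii

def visible_clusters(C, radii, cam_pos, cam_dir, half_fov=np.radians(35)):
    """Runtime step: keep clusters whose bounding sphere may touch the view cone.

    cam_dir must be a unit vector; the test pads the view half-angle by each
    sphere's angular radius, so it is conservative (never culls a visible cluster).
    """
    v = C - cam_pos
    dist = np.maximum(np.linalg.norm(v, axis=1), 1e-8)
    ang = np.arccos(np.clip(v @ cam_dir / dist, -1.0, 1.0))
    pad = np.arcsin(np.clip(radii / dist, 0.0, 1.0))
    return ang <= half_fov + pad

centers = np.random.rand(2000, 3)
C, radii = cluster_gaussians(centers)
keep = visible_clusters(C, radii, np.array([0.5, 0.5, -2.0]), np.array([0.0, 0.0, 1.0]))
print(int(keep.sum()), "of", len(C), "clusters survive culling")
```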
  {
    "path": "abs/2402.14650.md",
    "content": "### GaussianPro: 3D Gaussian Splatting with Progressive Propagation\n\nThe advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling with large-scale scenes that unavoidably contain texture-less surfaces, the SfM techniques always fail to produce enough points in these surfaces and cannot provide good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR.\n\n三维高斯喷溅（3DGS）的出现最近在神经渲染领域引发了一场革命，促进了实时速度下的高质量渲染。然而，3DGS严重依赖于由结构从运动（SfM）技术产生的初始化点云。在处理不可避免包含无纹理表面的大规模场景时，SfM技术总是无法在这些表面上产生足够的点，并且不能为3DGS提供好的初始化。结果，3DGS遭受了优化困难和低质量渲染的问题。在本文中，受到经典多视图立体（MVS）技术的启发，我们提出了一种新颖的方法GaussianPro，该方法应用了一个渐进的传播策略来指导三维高斯的密集化。与3DGS中使用的简单分裂和克隆策略相比，我们的方法利用了场景已重建几何体的先验和块匹配技术，产生了具有准确位置和方向的新高斯。在大规模和小规模场景上的实验验证了我们方法的有效性，其中我们的方法在Waymo数据集上显著超越了3DGS，展示了在峰值信噪比（PSNR）方面1.15dB的改进。\n"
  },
  {
    "path": "abs/2402.15870.md",
    "content": "### Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting\n\nThe recent advancements in 3D Gaussian splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its exceptional rendering quality and performance on standard datasets, 3D-GS frequently encounters difficulties in accurately modeling specular and anisotropic components. This issue stems from the limited ability of spherical harmonics (SH) to represent high-frequency information. To overcome this challenge, we introduce Spec-Gaussian, an approach that utilizes an anisotropic spherical Gaussian (ASG) appearance field instead of SH for modeling the view-dependent appearance of each 3D Gaussian. Additionally, we have developed a coarse-to-fine training strategy to improve learning efficiency and eliminate floaters caused by overfitting in real-world scenes. Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality. Thanks to ASG, we have significantly improved the ability of 3D-GS to model scenes with specular and anisotropic components without increasing the number of 3D Gaussians. This improvement extends the applicability of 3D GS to handle intricate scenarios with specular and anisotropic surfaces.\n\n最近在3D高斯喷溅（3D-GS）方面的进展，不仅通过现代GPU光栅化管道实现了实时渲染，而且还达到了最先进的渲染质量。然而，尽管在标准数据集上具有卓越的渲染质量和性能，3D-GS在准确建模镜面和各向异性成分方面经常遇到困难。这个问题源于球形谐波（SH）表示高频信息能力的限制。为了克服这个挑战，我们引入了Spec-Gaussian方法，该方法使用各向异性球形高斯（ASG）外观场而不是SH来建模每个3D高斯的视觉依赖外观。此外，我们还开发了一种从粗到细的训练策略，以提高学习效率并消除实际场景中过拟合造成的漂浮物。我们的实验结果表明，我们的方法在渲染质量方面超越了现有方法。多亏了ASG，我们显著提高了3D-GS建模具有镜面和各向异性成分场景的能力，而没有增加3D高斯的数量。这一改进扩展了3D GS处理具有镜面和各向异性表面的复杂场景的适用性。\n"
  },
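For reference, an anisotropic spherical Gaussian lobe (in the common formulation of Xu et al.) evaluates as below: a smooth clamp along the lobe axis times an anisotropic falloff in the tangent plane. Whether Spec-Gaussian uses exactly this parameterization per Gaussian is not stated in the abstract, so treat this as a generic ASG evaluator.

```python
import numpy as np

def asg(view, frame, lam, mu):
    """Anisotropic Spherical Gaussian: max(v.z, 0) * exp(-lam*(v.x)^2 - mu*(v.y)^2).

    `frame` rows are the orthonormal tangent (x), bi-tangent (y), and lobe (z)
    axes; lam and mu set the sharpness along x and y independently, which is
    what lets the lobe capture anisotropic, view-dependent highlights.
    """
    x, y, z = frame @ view                        # view direction in the lobe frame
    return max(z, 0.0) * np.exp(-lam * x**2 - mu * y**2)

v = np.array([0.1, 0.2, 0.97])
v /= np.linalg.norm(v)
print(asg(v, np.eye(3), lam=20.0, mu=5.0))        # narrow in x, wider in y
```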
  {
    "path": "abs/2402.16607.md",
    "content": "### GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video\n\nThis paper presents GEA, a novel method for creating expressive 3D avatars with high-fidelity reconstructions of body and hands based on 3D Gaussians. The key contributions are twofold. First, we design a two-stage pose estimation method to obtain an accurate SMPL-X pose from input images, providing a correct mapping between the pixels of a training image and the SMPL-X model. It uses an attention-aware network and an optimization scheme to align the normal and silhouette between the estimated SMPL-X body and the real body in the image. Second, we propose an iterative re-initialization strategy to handle unbalanced aggregation and initialization bias faced by Gaussian representation. This strategy iteratively redistributes the avatar's Gaussian points, making it evenly distributed near the human body surface by applying meshing, resampling and re-Gaussian operations. As a result, higher-quality rendering can be achieved. Extensive experimental analyses validate the effectiveness of the proposed model, demonstrating that it achieves state-of-the-art performance in photorealistic novel view synthesis while offering fine-grained control over the human body and hand pose.\n\n本文提出了GEA，一种创新的方法，用于基于3D高斯创建表情丰富的3D化身，具有高保真度的身体和手部重建。主要贡献有两方面。首先，我们设计了一种两阶段姿态估计方法，从输入图像中获得准确的SMPL-X姿态，提供了训练图像的像素与SMPL-X模型之间的正确映射。它使用一个注意力感知网络和一个优化方案，以对齐估计的SMPL-X身体与图像中真实身体的法线和轮廓。其次，我们提出了一种迭代重初始化策略，以处理高斯表示面临的不平衡聚合和初始化偏差。这一策略通过应用网格化、重采样和重新高斯化操作，迭代地重新分配化身的高斯点，使其在人体表面附近均匀分布。结果是，可以实现更高质量的渲染。广泛的实验分析验证了所提模型的有效性，证明了它在真实感新视角合成中达到了最先进的性能，同时提供了对人体和手部姿态的细粒度控制。\n"
  },
  {
    "path": "abs/2402.17427.md",
    "content": "### VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction\n\nExisting NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed. While the recent 3D Gaussian Splatting works well on small-scale and object-centric scenes, scaling it up to large scenes poses challenges due to limited video memory, long optimization time, and noticeable appearance variations. To address these challenges, we present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. We propose a progressive partitioning strategy to divide a large scene into multiple cells, where the training cameras and point cloud are properly distributed with an airspace-aware visibility criterion. These cells are merged into a complete scene after parallel optimization. We also introduce decoupled appearance modeling into the optimization process to reduce appearance variations in the rendered images. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets, enabling fast optimization and high-fidelity real-time rendering.\n\n现有基于NeRF的大场景重建方法在视觉质量和渲染速度上往往存在限制。虽然最近的3D高斯喷溅在小规模和以对象为中心的场景上表现良好，但将其扩展到大场景时面临着有限的视频内存、长时间的优化以及明显的外观变化等挑战。为了解决这些挑战，我们提出了VastGaussian，这是第一个基于3D高斯喷溅，针对大场景进行高质量重建和实时渲染的方法。我们提出了一种渐进式分割策略，将一个大场景分成多个单元，其中训练相机和点云以一个空域感知的可见性标准适当分布。这些单元在并行优化后合并成一个完整的场景。我们还将解耦的外观建模引入优化过程，以减少渲染图像中的外观变化。我们的方法超越了现有的基于NeRF的方法，并在多个大场景数据集上实现了最先进的结果，实现了快速优化和高保真实时渲染。\n"
  },
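A toy version of the partitioning step: split the scene's ground-plane bounding box into a grid of cells, expand each cell by a margin so neighbors overlap, and assign cameras and points to every cell they fall in. VastGaussian's airspace-aware visibility criterion for camera selection is more involved and is not reproduced; `grid` and `margin` are hypothetical knobs.

```python
import numpy as np

def partition_scene(cam_xy, pts_xy, grid=(4, 4), margin=0.2):
    """Split a scene's ground plane into overlapping cells.

    Each cell is widened by `margin` (a fraction of the cell size) so adjacent
    cells share cameras and points; cells optimized in parallel can then be
    merged without visible seams at the borders.
    """
    lo, hi = pts_xy.min(axis=0), pts_xy.max(axis=0)
    size = (hi - lo) / np.array(grid, dtype=float)
    cells = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            c_lo = lo + size * [i, j] - margin * size
            c_hi = lo + size * [i + 1, j + 1] + margin * size
            cams = np.where(np.all((cam_xy >= c_lo) & (cam_xy <= c_hi), axis=1))[0]
            pts = np.where(np.all((pts_xy >= c_lo) & (pts_xy <= c_hi), axis=1))[0]
            cells.append({"cams": cams, "pts": pts})
    return cells

cams = np.random.rand(100, 2) * 50
pts = np.random.rand(5000, 2) * 50
cells = partition_scene(cams, pts)
print(len(cells), "cells; first holds", len(cells[0]["pts"]), "points")
```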
  {
    "path": "abs/2402.19441.md",
    "content": "### 3D Gaussian Model for Animation and Texturing\n\n3D Gaussian Splatting has made a marked impact on neural rendering by achieving impressive fidelity and performance. Despite this achievement, however, it is not readily applicable to developing interactive applications. Real-time applications like XR apps and games require functions such as animation, UV-mapping, and model editing simultaneously manipulated through the usage of a 3D model. We propose a modeling that is analogous to typical 3D models, which we call 3D Gaussian Model (3DGM); it provides a manipulatable proxy for novel animation and texture transfer. By binding the 3D Gaussians in texture space and re-projecting them back to world space through implicit shell mapping, we show how our 3D modeling can serve as a valid rendering methodology for interactive applications. It is further noted that recently, 3D mesh reconstruction works have been able to produce high-quality mesh for rendering. Our work, on the other hand, only requires an approximated geometry for rendering an object in high fidelity. Applicationwise, we will show that our proxy-based 3DGM is capable of driving novel animation without animated training data and texture transferring via UV mapping of the 3D Gaussians. We believe the result indicates the potential of our work for enabling interactive applications for 3D Gaussian Splatting.\n\n3D高斯喷溅在神经渲染领域取得了显著影响，实现了令人印象深刻的保真度和性能。尽管取得了这一成就，但它并不容易直接应用于开发交互式应用程序。像XR应用和游戏这样的实时应用需要同时通过使用3D模型操作动画、UV映射和模型编辑等功能。我们提出了一种类似于典型3D模型的建模方法，我们称之为3D高斯模型（3DGM）；它提供了一个可操作的代理，用于新颖的动画和纹理转移。通过将3D高斯绑定在纹理空间，并通过隐式壳层映射重新投影回世界空间，我们展示了我们的3D建模如何作为交互式应用的有效渲染方法论。还应注意的是，最近的3D网格重建工作已经能够生成高质量的网格用于渲染。另一方面，我们的工作只需要一个近似的几何形状就可以高保真度地渲染对象。应用方面，我们将展示我们基于代理的3DGM能够驱动新颖的动画，无需动画训练数据，以及通过3D高斯的UV映射实现纹理转移。我们相信这一结果表明了我们的工作为3D高斯喷溅启用交互式应用的潜力。\n"
  },
  {
    "path": "abs/2403.01444.md",
    "content": "### 3DGStream: On-the-fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos\n\nConstructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. Instead of the naïve approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.\n\n构建动态场景的逼真自由视点视频（FVVs）仍然是一项挑战性的任务。尽管当前神经渲染技术取得了显著进步，但这些方法通常需要完整的视频序列进行离线训练，并且无法实现实时渲染。为了解决这些限制，我们引入了3DGStream，一种为真实世界动态场景高效FVV流媒体设计的方法。我们的方法实现了快速的即时逐帧重建，在12秒内完成，并能以200 FPS的速度实时渲染。具体来说，我们使用3D高斯（3DGs）来表示场景。我们没有采用直接优化每帧3DGs的简单方法，而是采用了一个紧凑的神经转换缓存（NTC）来模拟3DGs的平移和旋转，显著减少了每个FVV帧所需的训练时间和存储空间。此外，我们提出了一种自适应3DG添加策略来处理动态场景中出现的新对象。实验表明，与最先进的方法相比，3DGStream在渲染速度、图像质量、训练时间和模型存储方面都达到了有竞争力的性能。\n"
  },
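A sketch of a Neural Transformation Cache in PyTorch: a small network mapping a canonical Gaussian center to a per-frame translation and rotation quaternion, trained anew (warm-started) each frame. The real NTC uses a multi-resolution hash encoding; this sketch substitutes a plain frequency encoding to stay dependency-free, and the sizes are assumptions.

```python
import torch
import torch.nn as nn

class NeuralTransformationCache(nn.Module):
    """Per-frame motion field: canonical center -> (translation, unit quaternion)."""
    def __init__(self, n_freq=6, hidden=64):
        super().__init__()
        self.n_freq = n_freq
        self.net = nn.Sequential(
            nn.Linear(3 * 2 * n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, 7),                 # 3 translation + 4 quaternion
        )

    def forward(self, xyz):
        freqs = 2.0 ** torch.arange(self.n_freq) * torch.pi
        enc = torch.cat([f(xyz[..., None] * freqs) for f in (torch.sin, torch.cos)], -1)
        out = self.net(enc.flatten(-2))           # (N, 3*2*n_freq) -> (N, 7)
        dq = out[:, 3:]
        return out[:, :3], dq / dq.norm(dim=-1, keepdim=True).clamp_min(1e-8)

ntc = NeuralTransformationCache()
d_xyz, q = ntc(torch.randn(2048, 3))
print(d_xyz.shape, q.shape)                       # fit per frame, then applied to the 3DGs
```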
  {
    "path": "abs/2403.02751.md",
    "content": "### Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps\n\nWe present Splat-Nav, a navigation pipeline that consists of a real-time safe planning module and a robust state estimation module designed to operate in the Gaussian Splatting (GSplat) environment representation, a popular emerging 3D scene representation from computer vision. We formulate rigorous collision constraints that can be computed quickly to build a guaranteed-safe polytope corridor through the map. We then optimize a B-spline trajectory through this corridor. We also develop a real-time, robust state estimation module by interpreting the GSplat representation as a point cloud. The module enables the robot to localize its global pose with zero prior knowledge from RGB-D images using point cloud alignment, and then track its own pose as it moves through the scene from RGB images using image-to-point cloud localization. We also incorporate semantics into the GSplat in order to obtain better images for localization. All of these modules operate mainly on CPU, freeing up GPU resources for tasks like real-time scene reconstruction. We demonstrate the safety and robustness of our pipeline in both simulation and hardware, where we show re-planning at 5 Hz and pose estimation at 20 Hz, an order of magnitude faster than Neural Radiance Field (NeRF)-based navigation methods, thereby enabling real-time navigation.\n\n我们介绍了Splat-Nav，一种导航流程，它包括一个实时安全规划模块和一个鲁棒状态估计模块，这两个模块设计用于在高斯Splatting（GSplat）环境表示中运行，GSplat是计算机视觉中流行的新兴3D场景表示方法。我们制定了可以快速计算的严格碰撞约束，以通过地图构建一个保证安全的多面体走廊。然后，我们通过这个走廊优化一个B样条轨迹。我们还通过将GSplat表示解释为点云，开发了一个实时、鲁棒的状态估计模块。该模块使机器人能够利用点云对齐从RGB-D图像中，无需任何先验知识，定位其全局姿态，然后利用图像到点云的定位，跟踪它在场景中移动的姿态。我们还将语义信息整合到GSplat中，以获得更好的定位图像。所有这些模块主要在CPU上运行，为像实时场景重建这样的任务释放GPU资源。我们在模拟和硬件中展示了我们流程的安全性和鲁棒性，在这里我们展示了以5 Hz的速度重新规划和以20 Hz的速度估计姿态，比基于神经辐射场（NeRF）的导航方法快一个数量级，从而实现了实时导航。\n"
  },
  {
    "path": "abs/2403.04116.md",
    "content": "### Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis\n\nX-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, we redesign a radiative Gaussian point cloud model inspired by the isotropic nature of X-ray imaging. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Secondly, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while enjoying less than 15% training time and over 73x inference speed. The application on sparse-view CT reconstruction also reveals the practical values of our method.\n\nX射线由于其比自然光更强的穿透能力，被广泛应用于透射成像。在渲染新视角X射线投影时，现有方法主要基于NeRF，遭受长时间训练和慢速推理的问题。在本文中，我们提出了一个基于3D高斯Splatting的框架，命名为X-Gaussian，用于X射线新视角合成。首先，我们重新设计了一个辐射高斯点云模型，灵感来自X射线成像的各向同性特性。我们的模型在学习预测3D点的辐射强度时排除了视角方向的影响。基于此模型，我们开发了一个具有CUDA实现的可微分辐射栅格化（DRR）。其次，我们定制了一个角度-姿态立方体均匀初始化（ACUI）策略，直接使用X射线扫描器的参数计算相机信息，然后在包围扫描对象的立方体内均匀采样点位置。实验表明，我们的X-Gaussian在性能上超越了最先进的方法6.5 dB，同时享受不到15%的训练时间和超过73倍的推理速度。在稀疏视图CT重建上的应用也揭示了我们方法的实际价值。\n"
  },
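The "isotropic" point of X-Gaussian can be seen in a toy 1D renderer: because X-ray formation is, to first order, additive attenuation, each Gaussian contributes an intensity independent of view direction and without ordered alpha compositing. This illustrates the modeling assumption only, not the paper's CUDA rasterizer.

```python
import numpy as np

def render_xray_row(mu_u, sigma_u, rho, width=64):
    """Additive 1D 'X-ray' splat along one image row.

    Pixel intensity is the plain sum of all Gaussian contributions: no
    view-dependent color and no depth-ordered blending, mirroring the
    isotropy and additivity of X-ray attenuation.
    """
    u = np.arange(width)[:, None]                          # pixel centers
    g = rho * np.exp(-0.5 * ((u - mu_u) / sigma_u) ** 2)   # (width, n_gaussians)
    return g.sum(axis=1)

mu = np.array([10.0, 30.0, 31.0, 50.0])
print(render_xray_row(mu, sigma_u=2.0, rho=np.array([1.0, 0.5, 0.5, 2.0])).round(2))
```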
  {
    "path": "abs/2403.04926.md",
    "content": "### BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling\n\nRecent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various image blur, such as motion blur, defocus blur, downscaling blur, \\etc. Under these degradations, Gaussian-Splatting-based methods tend to overfit and produce worse results than Neural-Radiance-Field-based methods. To address this issue, we propose Blur Agnostic Gaussian Splatting (BAGS). BAGS introduces additional 2D modeling capacities such that a 3D-consistent and high quality scene can be reconstructed despite image-wise blur. Specifically, we model blur by estimating per-pixel convolution kernels from a Blur Proposal Network (BPN). BPN is designed to consider spatial, color, and depth variations of the scene to maximize modeling capacity. Additionally, BPN also proposes a quality-assessing mask, which indicates regions where blur occur. Finally, we introduce a coarse-to-fine kernel optimization scheme; this optimization scheme is fast and avoids sub-optimal solutions due to a sparse point cloud initialization, which often occurs when we apply Structure-from-Motion on blurry images. We demonstrate that BAGS achieves photorealistic renderings under various challenging blur conditions and imaging geometry, while significantly improving upon existing approaches.\n\n近期在使用3D高斯进行场景重建和新视角合成的努力在精心策划的基准测试上可以取得令人印象深刻的结果；然而，实际生活中捕获的图像往往是模糊的。在这项工作中，我们分析了基于高斯Splatting方法对各种图像模糊（如运动模糊、散焦模糊、缩小模糊等）的鲁棒性。在这些退化条件下，基于高斯Splatting的方法往往会过拟合并产生比基于神经辐射场的方法更糟的结果。为了解决这个问题，我们提出了对模糊不敏感的高斯Splatting（BAGS）。BAGS引入了额外的2D建模能力，使得尽管存在图像级的模糊，也能重建出3D一致且高质量的场景。具体来说，我们通过估计每个像素的卷积核从模糊提议网络（BPN）来模拟模糊。BPN被设计为考虑场景的空间、颜色和深度变化，以最大化建模能力。此外，BPN还提出了一个质量评估掩码，指示出模糊发生的区域。最后，我们引入了一个从粗到细的核优化方案；这个优化方案快速且避免了由于在模糊图像上应用运动结构时经常发生的稀疏点云初始化而导致的次优解。我们证明BAGS在各种具有挑战性的模糊条件和成像几何下实现了逼真的渲染效果，同时显著改善了现有方法。\n\n"
  },
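The key operation implied by "estimating per-pixel convolution kernels" is spatially varying convolution. A compact PyTorch version using `F.unfold` is below; in BAGS the kernels would come from the Blur Proposal Network, whereas here they are random softmax-normalized stand-ins.

```python
import torch
import torch.nn.functional as F

def apply_per_pixel_kernels(img, kernels):
    """Blur an image with a different KxK kernel at every pixel.

    img:     (1, C, H, W) sharp rendering
    kernels: (1, K*K, H, W) per-pixel kernels (e.g. softmax-normalized)
    """
    _, c, h, w = img.shape
    k = int(kernels.shape[1] ** 0.5)
    patches = F.unfold(img, k, padding=k // 2)             # (1, C*K*K, H*W)
    patches = patches.view(1, c, k * k, h * w)
    out = (patches * kernels.view(1, 1, k * k, h * w)).sum(dim=2)
    return out.view(1, c, h, w)

img = torch.rand(1, 3, 32, 32)
kernels = torch.softmax(torch.rand(1, 9, 32, 32), dim=1)   # a 3x3 kernel per pixel
print(apply_per_pixel_kernels(img, kernels).shape)
```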
  {
    "path": "abs/2403.05087.md",
    "content": "### SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting\n\nWe present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans where the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by mesh, which empowers its compatibility with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets.\n\n我们介绍了SplattingAvatar，一种混合3D表示的真实感人类化身，将高斯Splatting嵌入到三角形网格中，它在现代GPU上的渲染速度超过300 FPS，在移动设备上为30 FPS。我们解开了虚拟人类的运动和外观，使用显式的网格几何和隐式的外观建模与高斯Splatting。高斯定义为三角网格上的重心坐标和位移，作为Phong表面。我们扩展了提升优化，以同时优化高斯参数，同时在三角形网格上行走。SplattingAvatar是虚拟人类的混合表示，其中网格代表低频运动和表面形变，而高斯接管高频几何和详细外观。与依赖于MLP基线性混合蒙皮（LBS）场的现有变形方法不同，我们通过网格直接控制高斯的旋转和平移，这增强了其与各种动画技术的兼容性，例如，骨骼动画、混合形状和网格编辑。SplattingAvatar能够从单目视频中训练，适用于全身和头部化身，显示出跨多个数据集的最先进渲染质量。\n"
  },
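The mesh embedding described above reduces to a simple formula per Gaussian: a barycentric point on a triangle plus a displacement along the face normal, so recomputing it from deformed vertices moves the Gaussian with any animation of the mesh. This sketch shows position only; orientation and scale would be carried in the local face frame the same way.

```python
import numpy as np

def gaussian_world_position(verts, face, bary, offset):
    """Place a Gaussian on a triangle via barycentric coords + normal offset.

    verts:  (V, 3) mesh vertices (possibly deformed by animation)
    face:   indices of the triangle's three vertices
    bary:   barycentric coordinates (summing to 1)
    offset: signed displacement along the face normal
    """
    a, b, c = verts[face]
    p = bary[0] * a + bary[1] * b + bary[2] * c      # point on the face
    n = np.cross(b - a, c - a)
    n /= np.linalg.norm(n)
    return p + offset * n                            # displaced along the normal

verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
print(gaussian_world_position(verts, [0, 1, 2], bary=[0.3, 0.3, 0.4], offset=0.05))
```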
  {
    "path": "abs/2403.05154.md",
    "content": "### GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting\n\nWe present GSEdit, a pipeline for text-guided 3D object editing based on Gaussian Splatting models. Our method enables the editing of the style and appearance of 3D objects without altering their main details, all in a matter of minutes on consumer hardware. We tackle the problem by leveraging Gaussian splatting to represent 3D scenes, and we optimize the model while progressively varying the image supervision by means of a pretrained image-based diffusion model. The input object may be given as a 3D triangular mesh, or directly provided as Gaussians from a generative model such as DreamGaussian. GSEdit ensures consistency across different viewpoints, maintaining the integrity of the original object's information. Compared to previously proposed methods relying on NeRF-like MLP models, GSEdit stands out for its efficiency, making 3D editing tasks much faster. Our editing process is refined via the application of the SDS loss, ensuring that our edits are both precise and accurate. Our comprehensive evaluation demonstrates that GSEdit effectively alters object shape and appearance following the given textual instructions while preserving their coherence and detail.\n\n我们介绍了GSEdit，一种基于高斯Splatting模型的文本引导3D对象编辑流程。我们的方法能够在几分钟内，在消费级硬件上编辑3D对象的风格和外观，而不改变它们的主要细节。我们通过利用高斯Splatting来表示3D场景，并通过预训练的基于图像的扩散模型逐渐变化图像监督来优化模型，来解决这个问题。输入对象可以作为3D三角形网格给出，或者直接作为来自如DreamGaussian这样的生成模型的高斯提供。GSEdit确保了不同视点之间的一致性，保持了原始对象信息的完整性。与之前依赖于类NeRF的MLP模型的方法相比，GSEdit以其效率脱颖而出，使3D编辑任务更快完成。我们的编辑过程通过应用SDS损失来细化，确保我们的编辑既精确又准确。我们的全面评估表明，GSEdit有效地根据给定的文本指令改变对象的形状和外观，同时保持它们的连贯性和细节。\n"
  },
  {
    "path": "abs/2403.06908.md",
    "content": "### FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization\n\n3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently.\n\n3D 高斯喷溅技术在实时新视角合成中取得了非常令人印象深刻的性能。然而，它在高斯密集化过程中经常遭受过度重建的问题，其中高方差图像区域只被少数几个大高斯覆盖，导致渲染图像中出现模糊和伪影。我们设计了一种渐进式频率正则化（FreGS）技术来解决频率空间内的过度重建问题。具体而言，FreGS通过利用低通和高通滤波器在傅里叶空间轻松提取的低频至高频成分，执行由粗到细的高斯密集化。通过最小化渲染图像的频率谱与相应真实图像之间的差异，它实现了高质量的高斯密集化，并有效缓解了高斯喷溅的过度重建问题。在多个广泛采用的基准测试（例如 Mip-NeRF360、Tanks-and-Temples 和 Deep Blending）上的实验表明，FreGS实现了卓越的新视角合成，并一致性地超越了最先进的技术。\n"
  },
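FreGS's frequency-space regularization is easy to picture with a short sketch: compare the spectra of the rendered and ground-truth images inside a low-pass band whose radius grows during training. This is a simplified illustration under assumptions, not the paper's implementation: only the amplitude spectrum is compared here, and the annealing schedule for `cutoff` is left to the caller.

```python
import torch

def freq_loss(rendered, gt, cutoff):
    """Penalize the amplitude-spectrum discrepancy between a rendered
    image and ground truth inside a low-pass band of radius `cutoff`.
    Raising `cutoff` over training gives the coarse-to-fine behaviour."""
    Fr = torch.fft.fftshift(torch.fft.fft2(rendered))
    Fg = torch.fft.fftshift(torch.fft.fft2(gt))
    h, w = rendered.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    radius = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2).sqrt()
    band = (radius <= cutoff).to(rendered.dtype)
    return ((Fr - Fg).abs() * band).mean()

img_a, img_b = torch.rand(64, 64), torch.rand(64, 64)
print(freq_loss(img_a, img_b, cutoff=8.0))   # early training: low band only
print(freq_loss(img_a, img_b, cutoff=32.0))  # later: include high frequencies
```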
  {
    "path": "abs/2403.06912.md",
    "content": "### DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization\n\nRadiance fields have demonstrated impressive performance in synthesizing novel views from sparse input views, yet prevailing methods suffer from high training costs and slow inference speed. This paper introduces DNGaussian, a depth-regularized framework based on 3D Gaussian radiance fields, offering real-time and high-quality few-shot novel view synthesis at low costs. Our motivation stems from the highly efficient representation and surprising quality of the recent 3D Gaussian Splatting, despite it will encounter a geometry degradation when input views decrease. In the Gaussian radiance fields, we find this degradation in scene geometry primarily lined to the positioning of Gaussian primitives and can be mitigated by depth constraint. Consequently, we propose a Hard and Soft Depth Regularization to restore accurate scene geometry under coarse monocular depth supervision while maintaining a fine-grained color appearance. To further refine detailed geometry reshaping, we introduce Global-Local Depth Normalization, enhancing the focus on small local depth changes. Extensive experiments on LLFF, DTU, and Blender datasets demonstrate that DNGaussian outperforms state-of-the-art methods, achieving comparable or better results with significantly reduced memory cost, a 25× reduction in training time, and over 3000× faster rendering speed.\n\n辐射场在从稀疏输入视图合成新视图方面展示了令人印象深刻的性能，然而，现有方法受到高训练成本和慢推理速度的困扰。本文介绍了 DNGaussian，一个基于 3D 高斯辐射场的深度正则化框架，提供实时、高质量的少样本新视角合成，且成本低。我们的动机源自最近 3D 高斯喷溅的高效表示和令人惊讶的质量，尽管在输入视图减少时会遇到几何退化。在高斯辐射场中，我们发现场景几何的这种退化主要与高斯原语的定位有关，可以通过深度约束来缓解。因此，我们提出了硬性和软性深度正则化，以在粗糙的单目深度监督下恢复精确的场景几何，同时保持细腻的颜色外观。为了进一步细化几何形状的重塑，我们引入了全局-局部深度归一化，增强了对小的局部深度变化的关注。在 LLFF、DTU 和 Blender 数据集上的广泛实验表明，DNGaussian 超越了最先进的方法，以显著降低的内存成本、训练时间的 25 倍减少和渲染速度的 3000 倍以上提升，实现了可比或更好的结果。\n"
  },
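A minimal sketch of the Global-Local Depth Normalization idea, assuming PyTorch and square patches: depth maps are normalized once globally and once per patch, so that small local depth changes are not drowned out by the scene's overall depth range. The paper's separate Hard and Soft regularizations (which freeze different Gaussian parameters) are not modeled here.

```python
import torch
import torch.nn.functional as F

def global_norm(d, eps=1e-6):
    # Remove the scale/shift ambiguity of monocular depth globally.
    return (d - d.mean()) / (d.std() + eps)

def local_norm(d, patch=16, eps=1e-6):
    # Per-patch normalization: small local depth changes dominate the
    # signal instead of being drowned out by the global depth range.
    p = F.unfold(d[None, None], kernel_size=patch, stride=patch)
    return (p - p.mean(1, keepdim=True)) / (p.std(1, keepdim=True) + eps)

def depth_regularization(rendered, mono):
    g = (global_norm(rendered) - global_norm(mono)).abs().mean()
    l = (local_norm(rendered) - local_norm(mono)).abs().mean()
    return g + l

print(depth_regularization(torch.rand(64, 64), torch.rand(64, 64)))
```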
  {
    "path": "abs/2403.07494.md",
    "content": "### SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM\n\nWe propose SemGauss-SLAM, the first semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering in real-time. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift and improve reconstruction accuracy, we introduce semantic-informed bundle adjustment leveraging semantic associations for joint optimization of 3D Gaussian representation and camera poses, leading to more robust tracking and consistent mapping. Our SemGauss-SLAM method demonstrates superior performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in novel-view semantic synthesis and 3D semantic mapping.\n\n我们提出了 SemGauss-SLAM，这是第一个利用 3D 高斯表示的语义 SLAM 系统，它能够在实时内实现精确的 3D 语义映射、稳健的相机跟踪和高质量的渲染。在这个系统中，我们将语义特征嵌入到 3D 高斯表示中，有效地在环境的空间布局中编码语义信息，以实现精确的语义场景表示。此外，我们提出了用于更新 3D 高斯表示的特征级损失，为 3D 高斯优化提供更高级别的指导。另外，为了减少累积漂移和提高重建精度，我们引入了利用语义关联进行 3D 高斯表示和相机姿态联合优化的语义信息捆绑调整，从而实现更稳健的跟踪和一致的映射。我们的 SemGauss-SLAM 方法在 Replica 和 ScanNet 数据集上的映射和跟踪精度方面，展示了优于现有密集语义 SLAM 方法的卓越性能，同时也展现了在新视角语义合成和 3D 语义映射方面的出色能力。\n"
  },
  {
    "path": "abs/2403.07807.md",
    "content": "### StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting\n\nWe introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency.\n\n我们介绍了 StyleGaussian，一种新颖的 3D 风格转换技术，它能够以每秒 10 帧（fps）的速度将任何图像的风格瞬间转移到 3D 场景中。利用 3D 高斯喷溅（3DGS），StyleGaussian 在不影响其实时渲染能力和多视图一致性的前提下实现了风格转换。它通过三个步骤实现瞬间风格转换：嵌入、转换和解码。最初，2D VGG 场景特征被嵌入到重构的 3D 高斯中。接下来，嵌入的特征根据参考风格图像进行变换。最后，变换后的特征被解码成风格化的 RGB。StyleGaussian 有两个新颖的设计。第一个是一个高效的特征渲染策略，它首先渲染低维特征，然后在嵌入 VGG 特征时将它们映射到高维特征。这显著减少了内存消耗，并使 3DGS 能够渲染高维的、内存密集型的特征。第二个是基于 K 最近邻的 3D CNN。作为风格化特征的解码器，它消除了破坏严格多视图一致性的 2D CNN 操作。广泛的实验表明，StyleGaussian 在保持实时渲染和严格多视图一致性的同时，实现了瞬间 3D 风格化，并具有优越的风格化质量。\n"
  },
  {
    "path": "abs/2403.08321.md",
    "content": "### ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation\n\nPerforming language-conditioned robotic manipulation tasks in unstructured environments is highly demanded for general intelligent robots. Conventional robotic manipulation methods usually learn semantic representation of the observation for action prediction, which ignores the scene-level spatiotemporal dynamics for human goal completion. In this paper, we propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation, which mines scene dynamics via future scene reconstruction. Specifically, we first formulate the dynamic Gaussian Splatting framework that infers the semantics propagation in the Gaussian embedding space, where the semantic representation is leveraged to predict the optimal robot action. Then, we build a Gaussian world model to parameterize the distribution in our dynamic Gaussian Splatting framework, which provides informative supervision in the interactive environment via future scene reconstruction. We evaluate our ManiGaussian on 10 RLBench tasks with 166 variations, and the results demonstrate our framework can outperform the state-of-the-art methods by 13.1\\% in average success rate.\n\n在非结构化环境中执行语言条件下的机器人操纵任务对于通用智能机器人来说需求极高。传统的机器人操纵方法通常学习观察的语义表示以预测动作，这忽略了完成人类目标的场景级时空动态。在本文中，我们提出了一种名为ManiGaussian的动态高斯溅射方法，用于多任务机器人操纵，该方法通过未来场景重建来挖掘场景动态。具体来说，我们首先构建了动态高斯溅射框架，该框架推断高斯嵌入空间中的语义传播，其中语义表示被利用来预测最优的机器人动作。然后，我们构建了一个高斯世界模型来参数化我们的动态高斯溅射框架中的分布，该模型通过未来场景重建在交互环境中提供信息丰富的监督。我们在10个RLBench任务上评估了我们的ManiGaussian，包含166种变化，结果表明我们的框架可以平均成功率比最先进方法高出13.1\\%。\n"
  },
  {
    "path": "abs/2403.08498.md",
    "content": "### Gaussian Splatting in Style\n\nScene stylization extends the work of neural style transfer to three spatial dimensions. A vital challenge in this problem is to maintain the uniformity of the stylized appearance across a multi-view setting. A vast majority of the previous works achieve this by optimizing the scene with a specific style image. In contrast, we propose a novel architecture trained on a collection of style images, that at test time produces high quality stylized novel views. Our work builds up on the framework of 3D Gaussian splatting. For a given scene, we take the pretrained Gaussians and process them using a multi resolution hash grid and a tiny MLP to obtain the conditional stylised views. The explicit nature of 3D Gaussians give us inherent advantages over NeRF-based methods including geometric consistency, along with having a fast training and rendering regime. This enables our method to be useful for vast practical use cases such as in augmented or virtual reality applications. Through our experiments, we show our methods achieve state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data.\n\n场景风格化将神经风格迁移的工作扩展到三维空间。在这个问题中，一个重要的挑战是在多视图设置中保持风格化外观的一致性。大多数以前的工作通过优化具有特定风格图片的场景来实现这一点。相比之下，我们提出了一种在风格图片集合上训练的新颖架构，这种架构在测试时可以产生高质量的风格化新视图。我们的工作是在3D高斯溅射框架上构建的。对于给定的场景，我们使用预训练的高斯体并通过多分辨率哈希网格和一个小型MLP（多层感知机）处理它们，以获得条件风格化视图。3D高斯的明确性质赋予我们相比基于NeRF（神经辐射场）的方法包括几何一致性在内的内在优势，同时拥有快速的训练和渲染体制。这使得我们的方法对于诸如增强现实或虚拟现实应用等广泛的实际用例非常有用。通过我们的实验，我们展示了我们的方法在各种室内外真实世界数据上达到了最先进的性能，并具有优越的视觉质量。\n"
  },
  {
    "path": "abs/2403.08733.md",
    "content": "### GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing\n\nWe propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS).\nOur method first renders a collection of images by using the 3DGS and edits them by using a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model.\nOur key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image while updating the 3D model as in previous works.\nIt leads to faster editing as well as higher visual quality.\nThis is achieved by the two terms:\n(a) depth-conditioned editing that enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps.\n(b) attention-based latent code alignment that unifies the appearance of edited images by conditioning their editing to several reference views through self and cross-view attention between images' latent representations.\nExperiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods.\n\n我们提出了GaussCtrl，这是一种基于文本的方法，用于编辑由3D高斯溅射（3DGS）重建的3D场景。我们的方法首先使用3DGS渲染一系列图像，并根据输入提示，使用预训练的2D扩散模型（ControlNet）编辑它们，然后用于优化3D模型。我们的关键贡献是多视图一致性编辑，它允许同时编辑所有图像，而不是像以前的工作那样，迭代编辑一个图像同时更新3D模型。这导致编辑速度更快以及更高的视觉质量。这通过两个条款实现：\n(a)深度条件编辑，通过利用自然一致的深度图来强制多视图图像间的几何一致性。\n(b)基于注意力的潜码对齐，通过将编辑的图像条件化到几个参考视图上，并通过图像潜在表示之间的自注意力和交叉视图注意力，统一编辑图像的外观。实验表明，我们的方法比以往的最先进方法实现了更快的编辑和更好的视觉结果。\n"
  },
  {
    "path": "abs/2403.09143.md",
    "content": "### A New Split Algorithm for 3D Gaussian Splatting\n\n3D Gaussian splatting models, as a novel explicit 3D representation, have been applied in many domains recently, such as explicit geometric editing and geometry generation. Progress has been rapid. However, due to their mixed scales and cluttered shapes, 3D Gaussian splatting models can produce a blurred or needle-like effect near the surface. At the same time, 3D Gaussian splatting models tend to flatten large untextured regions, yielding a very sparse point cloud. These problems are caused by the non-uniform nature of 3D Gaussian splatting models, so in this paper, we propose a new 3D Gaussian splitting algorithm, which can produce a more uniform and surface-bounded 3D Gaussian splatting model. Our algorithm splits an N-dimensional Gaussian into two N-dimensional Gaussians. It ensures consistency of mathematical characteristics and similarity of appearance, allowing resulting 3D Gaussian splatting models to be more uniform and a better fit to the underlying surface, and thus more suitable for explicit editing, point cloud extraction and other tasks. Meanwhile, our 3D Gaussian splitting approach has a very simple closed-form solution, making it readily applicable to any 3D Gaussian model.\n\n3D 高斯喷溅模型作为一种新型的显式3D表示，近来已被应用于多个领域，例如显式几何编辑和几何生成。进展迅速。然而，由于它们混合的尺度和杂乱的形状，3D 高斯喷溅模型在表面附近可能产生模糊或针状效果。同时，3D 高斯喷溅模型倾向于平坦化大型无纹理区域，产生非常稀疏的点云。这些问题是由3D 高斯喷溅模型的非均匀性质引起的，因此在本文中，我们提出了一种新的3D高斯分裂算法，可以产生更均匀且表面受限的3D高斯喷溅模型。我们的算法将一个N维高斯分裂为两个N维高斯。它确保了数学特性的一致性和外观的相似性，允许结果的3D高斯喷溅模型更加均匀，更适合于底层表面，因此更适合于显式编辑、点云提取和其他任务。同时，我们的3D高斯分裂方法有一个非常简单的封闭形式解，使其可以轻松应用于任何3D高斯模型。\n"
  },
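The closed-form split mentioned in this abstract can be illustrated with a standard moment-matched construction: replace one Gaussian by two equally weighted Gaussians offset along the principal axis, with the covariance shrunk along that axis so the mixture preserves the original mean and covariance exactly. This generic construction is an assumption for illustration; the paper's own closed form, which also enforces appearance similarity, may differ in its details.

```python
import numpy as np

def split_gaussian(mu, cov, a=0.8):
    """Moment-matched split: two equally weighted Gaussians offset along
    the principal axis of `cov`, shrunk along that axis so the mixture
    keeps the original mean and covariance exactly (requires 0 < a < 1)."""
    lam, vecs = np.linalg.eigh(cov)
    offset = a * np.sqrt(lam[-1]) * vecs[:, -1]   # along the largest axis
    new_cov = cov - np.outer(offset, offset)      # shrink to compensate
    return (mu + offset, new_cov), (mu - offset, new_cov)

mu = np.zeros(3)
cov = np.diag([4.0, 1.0, 0.25])
(mu1, c1), (mu2, c2) = split_gaussian(mu, cov)
# Check: 0.5*mu1 + 0.5*mu2 == mu, and
# 0.5*(c1 + np.outer(mu1, mu1)) + 0.5*(c2 + np.outer(mu2, mu2)) == cov.
```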
  {
    "path": "abs/2403.09236.md",
    "content": "### Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph\n\nText-to-3D generation represents an exciting field that has seen rapid advancements, facilitating the transformation of textual descriptions into detailed 3D models. However, current progress often neglects the intricate high-order correlation of geometry and texture within 3D objects, leading to challenges such as over-smoothness, over-saturation and the Janus problem. In this work, we propose a method named ``3D Gaussian Generation via Hypergraph (Hyper-3DG)'', designed to capture the sophisticated high-order correlations present within 3D objects. Our framework is anchored by a well-established mainflow and an essential module, named ``Geometry and Texture Hypergraph Refiner (HGRefiner)''. This module not only refines the representation of 3D Gaussians but also accelerates the update process of these 3D Gaussians by conducting the Patch-3DGS Hypergraph Learning on both explicit attributes and latent visual features. Our framework allows for the production of finely generated 3D objects within a cohesive optimization, effectively circumventing degradation. Extensive experimentation has shown that our proposed method significantly enhances the quality of 3D generation while incurring no additional computational overhead for the underlying framework.\n\n文本到3D生成是一个令人兴奋的领域，已经看到了快速的进展，促进了将文本描述转化为详细3D模型的转变。然而，当前的进展常常忽视了3D对象内部几何与纹理之间错综复杂的高阶相关性，导致了如过度平滑、过度饱和和Janus问题等挑战。在这项工作中，我们提出了一种名为“通过超图的3D高斯生成（Hyper-3DG）”的方法，旨在捕捉3D对象内存在的复杂高阶相关性。我们的框架由一个建立良好的主流程和一个关键模块支撑，名为“几何与纹理超图细化器（HGRefiner）”。该模块不仅细化了3D高斯的表示，还通过对显式属性和潜在视觉特征进行Patch-3DGS超图学习，加速了这些3D高斯的更新过程。我们的框架允许在一个连贯的优化中生产精细生成的3D对象，有效地规避了退化。广泛的实验表明，我们提出的方法显著提高了3D生成的质量，同时不增加底层框架的额外计算开销。\n"
  },
  {
    "path": "abs/2403.09413.md",
    "content": "### Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting\n\n3D Gaussian splatting (3DGS) has recently demonstrated impressive capabilities in real-time novel view synthesis and 3D reconstruction. However, 3DGS heavily depends on the accurate initialization derived from Structure-from-Motion (SfM) methods. When trained with randomly initialized point clouds, 3DGS fails to maintain its ability to produce high-quality images, undergoing large performance drops of 4-5 dB in PSNR. Through extensive analysis of SfM initialization in the frequency domain and analysis of a 1D regression task with multiple 1D Gaussians, we propose a novel optimization strategy dubbed RAIN-GS (Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting), that successfully trains 3D Gaussians from random point clouds. We show the effectiveness of our strategy through quantitative and qualitative comparisons on multiple datasets, largely improving the performance in all settings.\n\n3D高斯喷溅（3DGS）最近在实时新视角合成和3D重建方面展示了令人印象深刻的能力。然而，3DGS严重依赖于从运动结构（SfM）方法得出的准确初始化。当用随机初始化的点云进行训练时，3DGS未能维持其产生高质量图像的能力，PSNR性能下降了4-5 dB。通过对SfM初始化在频域的广泛分析和对使用多个一维高斯的一维回归任务的分析，我们提出了一种名为RAIN-GS（放宽3D高斯喷溅的准确初始化约束）的新优化策略，成功地从随机点云训练3D高斯。我们通过在多个数据集上的定量和定性比较展示了我们策略的有效性，大大改善了所有设置中的性能。\n"
  },
  {
    "path": "abs/2403.09434.md",
    "content": "### Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians\n\nReconstructing and simulating elastic objects from visual observations is crucial for applications in computer vision and robotics. Existing methods, such as 3D Gaussians, provide modeling for 3D appearance and geometry but lack the ability to simulate physical properties or optimize parameters for heterogeneous objects. We propose Spring-Gaus, a novel framework that integrates 3D Gaussians with physics-based simulation for reconstructing and simulating elastic objects from multi-view videos. Our method utilizes a 3D Spring-Mass model, enabling the optimization of physical parameters at the individual point level while decoupling the learning of physics and appearance. This approach achieves great sample efficiency, enhances generalization, and reduces sensitivity to the distribution of simulation particles. We evaluate Spring-Gaus on both synthetic and real-world datasets, demonstrating accurate reconstruction and simulation of elastic objects. This includes future prediction and simulation under varying initial states and environmental parameters.\n\n从视觉观察重建和模拟弹性对象对于计算机视觉和机器人学的应用至关重要。现有方法，如3D高斯，为3D外观和几何提供了建模，但缺乏模拟物理属性或为异质对象优化参数的能力。我们提出了一种名为Spring-Gaus的新型框架，将3D高斯与基于物理的模拟整合起来，用于从多视角视频重建和模拟弹性对象。我们的方法利用了一个3D弹簧质量模型，使得在个别点水平上优化物理参数成为可能，同时解耦了物理和外观的学习。这种方法实现了极高的样本效率，增强了泛化能力，并减少了对模拟粒子分布的敏感性。我们在合成和现实世界数据集上评估了Spring-Gaus，展示了准确重建和模拟弹性对象的能力。这包括在不同初始状态和环境参数下的未来预测和模拟。\n"
  },
  {
    "path": "abs/2403.09637.md",
    "content": "### GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping\n\nConstructing a 3D scene capable of accommodating open-ended language queries, is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g. NeRF) encounter limitations due to the necessity of processing a large number of input views for reconstruction, coupled with their inherent inefficiencies in inference. Thus, we present the GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks.\n\n构建一个能够适应开放式语言查询的3D场景，在机器人学领域尤其重要。这项技术便利了机器人基于人类语言指令执行对象操纵任务。为了应对这一挑战，一些研究工作致力于开发嵌入语言的隐式场。然而，隐式场（例如NeRF）由于重建需要处理大量输入视图，以及其在推断中固有的低效率而遇到限制。因此，我们提出了GaussianGrasper，它利用3D高斯喷溅明确表示场景为一系列高斯原始体的集合。我们的方法采用了有限的RGB-D视图集合，并采用基于瓦片的喷溅技术来创建特征场。特别地，我们提出了一个高效特征蒸馏（EFD）模块，该模块采用对比学习高效且准确地蒸馏出自基础模型派生的语言嵌入。通过高斯场的重建几何形状，我们的方法使得预训练的抓取模型能够生成无碰撞的抓取姿态候选。此外，我们提出了一个以法线为指导的抓取模块来选择最佳抓取姿态。通过全面的现实世界实验，我们证明了GaussianGrasper使得机器人能够准确地通过语言指令查询和抓取对象，为语言引导的操纵任务提供了新的解决方案。\n"
  },
  {
    "path": "abs/2403.09875.md",
    "content": "### Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting\n\nIn this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in a few-view scene syntheses on opaque as well as on reflective and transparent objects.\n\n在这项工作中，我们提出了一种利用光学触觉传感器监督3D高斯溅射（3DGS）场景的新方法。光学触觉传感器在机器人操控和物体表征方面的使用已经变得广泛；然而，原始的光学触觉传感器数据不适合直接用于监督3DGS场景。我们的表示利用高斯过程隐式曲面来隐式表示对象，将许多触摸合并成一个具有不确定性的统一表示。我们将这个模型与单眼深度估计网络合并，通过一个两阶段过程进行对齐，首先粗略地与深度摄像机对齐，然后精细调整以匹配我们的触觉数据。对于每一张训练图像，我们的方法产生一个相应的融合深度和不确定性地图。利用这些额外信息，我们提出了一个新的损失函数，方差加权深度监督损失，用于训练3DGS场景模型。我们利用DenseTact光学触觉传感器和RealSense RGB-D摄像机展示，以这种方式结合触觉和视觉，相比单独使用视觉或触觉，在少视角场景合成上，无论是对于不透明物体还是反射和透明物体，都能得到定量和定性更好的结果。\n"
  },
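In its simplest form, the variance-weighted depth supervised loss named in the abstract is an inverse-variance weighting of a per-pixel depth error, so that pixels where the fused touch-and-vision depth is uncertain contribute little gradient. A minimal sketch with hypothetical names:

```python
import torch

def var_weighted_depth_loss(pred, fused, var, eps=1e-6):
    """Depth supervision weighted by inverse variance: pixels where the
    fused touch+vision depth estimate is uncertain barely constrain
    the 3DGS scene, while confident pixels dominate the loss."""
    return (((pred - fused) ** 2) / (var + eps)).mean()

pred = torch.rand(32, 32)
fused = torch.rand(32, 32)
var = torch.full((32, 32), 0.1)
var[:, :16] = 5.0  # e.g. a transparent region the touches barely constrain
print(var_weighted_depth_loss(pred, fused, var))
```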
  {
    "path": "abs/2403.09981.md",
    "content": "### Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting\n\nWhile text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.\n\n虽然文本到3D和图像到3D的生成任务已经得到了相当多的关注，但在它们之间的一个重要但未被充分探索的领域是可控文本到3D生成，这正是我们在这项工作中主要关注的。为了解决这个任务，1) 我们引入了多视图控制网（MVControl），这是一个新颖的神经网络架构，旨在通过整合额外的输入条件，如边缘、深度、法线和涂鸦图，来增强现有的预训练多视图扩散模型。我们的创新在于引入了一个条件模块，该模块使用从输入条件图像和相机姿态计算得到的局部和全局嵌入来控制基础扩散模型。一旦训练完成，MVControl能够为基于优化的3D生成提供3D扩散引导。并且，2) 我们提出了一个高效的多阶段3D生成流程，该流程利用了最近大型重建模型和分数蒸馏算法的优势。基于我们的MVControl架构，我们采用了一种独特的混合扩散引导方法来指导优化过程。为了追求效率，我们采用3D高斯作为我们的表示，而不是常用的隐式表示。我们还率先使用了SuGaR，一种将高斯绑定到网格三角形面的混合表示。这种方法缓解了3D高斯中几何质量差的问题，并使得直接在网格上雕刻细粒度几何成为可能。广泛的实验表明，我们的方法实现了强大的泛化能力，并使得可控生成高质量3D内容成为可能。\n"
  },
  {
    "path": "abs/2403.10050.md",
    "content": "### Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing\n\n3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namely Texture-GS, to disentangle the appearance from the geometry by representing it as a 2D texture mapped onto the 3D surface, thereby facilitating appearance editing. Technically, the disentanglement is achieved by our proposed texture mapping module, which consists of a UV mapping MLP to learn the UV coordinates for the 3D Gaussian centers, a local Taylor expansion of the MLP to efficiently approximate the UV coordinates for the ray-Gaussian intersections, and a learnable texture to capture the fine-grained appearance. Extensive experiments on the DTU dataset demonstrate that our method not only facilitates high-fidelity appearance editing but also achieves real-time rendering on consumer-level devices, e.g. a single RTX 2080 Ti GPU.\n\n3D高斯溅射作为一种开创性的方法，因其高保真重建和实时渲染的能力而受到越来越多的关注。然而，它将场景的外观和几何属性耦合在高斯属性中，这限制了编辑操作的灵活性，比如纹理交换。为了解决这个问题，我们提出了一种新的方法，即Texture-GS，通过将外观表示为映射到3D表面上的2D纹理，从而实现外观和几何的分离，进而便于外观编辑。技术上，分离是通过我们提出的纹理映射模块实现的，该模块包括一个UV映射MLP来学习3D高斯中心的UV坐标，MLP的局部泰勒展开来高效近似射线-高斯交点的UV坐标，以及一个可学习的纹理来捕捉细粒度外观。在DTU数据集上的广泛实验表明，我们的方法不仅便于高保真外观编辑，而且还实现了在消费级设备上的实时渲染，例如单个RTX 2080 Ti GPU。\n"
  },
  {
    "path": "abs/2403.10147.md",
    "content": "### GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time\n\nThis paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, complexity in processing high-resolution images, and lengthy optimization processes, thus facilitating stronger applicability of 3D Gaussian Splatting (3D-GS) in real-world scenarios. Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model. With the joint learning mechanism, the proposed framework can inherently estimate robust relative pose information from the image observations and thus primarily alleviate the requirement of real camera poses. Moreover, we implement a deferred back-propagation mechanism that enables high-resolution training and inference, overcoming the resolution constraints of previous methods. To enhance the speed and efficiency, we further introduce a progressive Gaussian cache module that dynamically adjusts during training and inference. As the first pose-free generalizable 3D-GS framework, GGRt achieves inference at ≥ 5 FPS and real-time rendering at ≥ 100 FPS. Through extensive experimentation, we demonstrate that our method outperforms existing NeRF-based pose-free techniques in terms of inference speed and effectiveness. It can also approach the real pose-based 3D-GS methods. Our contributions provide a significant leap forward for the integration of computer vision and computer graphics into practical applications, offering state-of-the-art results on LLFF, KITTI, and Waymo Open datasets and enabling real-time rendering for immersive experiences.\n\n本文提出了GGRt，一种新颖的可泛化新视角合成方法，该方法减轻了对真实相机姿态的需求、处理高分辨率图像的复杂性以及漫长的优化过程，从而加强了3D高斯溅射（3D-GS）在现实世界场景中的应用性。具体来说，我们设计了一个新颖的联合学习框架，该框架由迭代姿态优化网络（IPO-Net）和可泛化3D高斯模型（G-3DG）组成。借助联合学习机制，所提出的框架可以从图像观测中固有地估计出稳健的相对姿态信息，从而主要减轻了对真实相机姿态的需求。此外，我们实现了一种延迟反向传播机制，使得高分辨率训练和推断成为可能，克服了先前方法的分辨率限制。为了提高速度和效率，我们进一步引入了一个渐进式高斯缓存模块，该模块在训练和推断过程中动态调整。作为首个无姿态可泛化3D-GS框架，GGRt实现了≥5 FPS的推断速度和≥100 FPS的实时渲染速度。通过广泛的实验，我们证明了我们的方法在推断速度和有效性方面超越了现有的基于NeRF的无姿态技术。它还可以接近真实姿态基的3D-GS方法。我们的贡献为计算机视觉与计算机图形学融入实际应用提供了重大进步，在LLFF、KITTI和Waymo Open数据集上提供了最先进的结果，并实现了沉浸式体验的实时渲染。\n"
  },
  {
    "path": "abs/2403.10242.md",
    "content": "### FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model\n\nReconstructing detailed 3D objects from single-view images remains a challenging task due to the limited information available. In this paper, we introduce FDGaussian, a novel two-stage framework for single-image 3D reconstruction. Recent methods typically utilize pre-trained 2D diffusion models to generate plausible novel views from the input image, yet they encounter issues with either multi-view inconsistency or lack of geometric fidelity. To overcome these challenges, we propose an orthogonal plane decomposition mechanism to extract 3D geometric features from the 2D input, enabling the generation of consistent multi-view images. Moreover, we further accelerate the state-of-the-art Gaussian Splatting incorporating epipolar attention to fuse images from different viewpoints. We demonstrate that FDGaussian generates images with high consistency across different views and reconstructs high-quality 3D objects, both qualitatively and quantitatively.\n\n从单视图图像重建详细的3D对象仍然是一个具有挑战性的任务，因为可用的信息有限。在本文中，我们引入了FDGaussian，一个用于单图像3D重建的新颖的两阶段框架。最近的方法通常使用预训练的2D扩散模型从输入图像生成可信的新视角图像，但它们遇到了多视图不一致或缺乏几何保真度的问题。为了克服这些挑战，我们提出了一个正交平面分解机制，从2D输入中提取3D几何特征，使得生成一致的多视角图像成为可能。此外，我们进一步通过整合视差注意力来加速最先进的高斯溅射，以融合来自不同视点的图像。我们展示了FDGaussian生成了在不同视角间高度一致的图像，并且定性和定量地重建了高质量的3D对象。\n"
  },
  {
    "path": "abs/2403.10427.md",
    "content": "### SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians\n\nImplicit neural representation methods have shown impressive advancements in learning 3D scenes from unstructured in-the-wild photo collections but are still limited by the large computational cost of volumetric rendering. More recently, 3D Gaussian Splatting emerged as a much faster alternative with superior rendering quality and training efficiency, especially for small-scale and object-centric scenarios. Nevertheless, this technique suffers from poor performance on unstructured in-the-wild data. To tackle this, we extend over 3D Gaussian Splatting to handle unstructured image collections. We achieve this by modeling appearance to seize photometric variations in the rendered images. Additionally, we introduce a new mechanism to train transient Gaussians to handle the presence of scene occluders in an unsupervised manner. Experiments on diverse photo collection scenes and multi-pass acquisition of outdoor landmarks show the effectiveness of our method over prior works achieving state-of-the-art results with improved efficiency.\n\n隐式神经表示方法在从非结构化野外照片集学习3D场景方面取得了令人印象深刻的进步，但仍受到体积渲染大量计算成本的限制。最近，3D高斯溅射作为一个更快的替代方法出现，具有更优越的渲染质量和训练效率，特别是对于小规模和以物体为中心的场景。然而，这项技术在处理非结构化野外数据时性能不佳。为了解决这个问题，我们扩展了3D高斯溅射以处理非结构化图像集合。我们通过建模外观来捕获渲染图像中的光度变化来实现这一点。此外，我们引入了一种新的机制，以无监督的方式训练临时高斯，以处理场景遮挡物的存在。在多样化的照片集场景和室外地标的多次采集上的实验显示，我们的方法相比之前的作品更有效，实现了最先进的结果，同时提高了效率。\n"
  },
  {
    "path": "abs/2403.10683.md",
    "content": "### GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation\n\nThis paper introduces GS-Pose, an end-to-end framework for locating and estimating the 6D pose of objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in a database. At inference, GS-Pose operates sequentially by locating the object in the input image, estimating its initial 6D pose using a retrieval approach, and refining the pose with a render-and-compare method. The key insight is the application of the appropriate object representation at each stage of the process. In particular, for the refinement step, we utilize 3D Gaussian splatting, a novel differentiable rendering technique that offers high rendering speed and relatively low optimization time. Off-the-shelf toolchains and commodity hardware, such as mobile phones, can be used to capture new objects to be added to the database. Extensive evaluations on the LINEMOD and OnePose-LowTexture datasets demonstrate excellent performance, establishing the new state-of-the-art.\n\n本文介绍了GS-Pose，一种端到端的框架，用于定位和估计物体的6D姿态。GS-Pose从一组先前未见过的物体的带姿态RGB图像开始，构建三种不同的表示，并存储在数据库中。在推理时，GS-Pose按顺序操作，首先定位输入图像中的物体，使用检索方法估计其初始6D姿态，然后使用渲染和比较方法进行姿态精炼。关键洞察是在过程的每个阶段应用适当的物体表示。特别是对于精炼步骤，我们利用3D高斯平滑，一种新颖的可微分渲染技术，提供高渲染速度和相对低的优化时间。可以使用现成的工具链和普通硬件，如手机，来捕捉要添加到数据库中的新物体。在LINEMOD和OnePose-LowTexture数据集上的广泛评估展示了卓越的性能，树立了新的行业标杆。\n"
  },
  {
    "path": "abs/2403.10814.md",
    "content": "### DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark\n\nHumans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments.\n\n人类具有在有限或变化的照明条件下构建环境一致心理模型的非凡能力。我们希望赋予机器人同样的能力。在本文中，我们应对在光照不足条件下以及有移动光源时构建真实感场景表示的挑战。我们将照明建模任务视为一个学习问题，并利用开发的照明模型来帮助场景重建。我们引入了一个创新框架，使用数据驱动方法，即神经光模拟器（NeLiS），来模型化和校准相机-光系统。此外，我们提出了DarkGS方法，该方法应用NeLiS创建一个可重新照明的3D高斯场景模型，能够从新的视角进行实时、真实感渲染。我们展示了我们提出的模拟器和系统在各种真实世界环境中的适用性和稳健性。\n"
  },
  {
    "path": "abs/2403.11056.md",
    "content": "### Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration\n\nThe 3D Gaussian Splatting (3DGS) gained its popularity recently by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single point rather than as an area, causing insensitivity to changes in the footprints of pixels. Consequently, this discrete sampling scheme inevitably results in aliasing, owing to the restricted sampling bandwidth. In this paper, we derive an analytical solution to address this issue. More specifically, we use a conditioned logistic function as the analytic approximation of the cumulative distribution function (CDF) in a one-dimensional Gaussian signal and calculate the Gaussian integral by subtracting the CDFs. We then introduce this approximation in the two-dimensional pixel shading, and present Analytic-Splatting, which analytically approximates the Gaussian integral within the 2D-pixel window area to better capture the intensity response of each pixel. Moreover, we use the approximated response of the pixel window integral area to participate in the transmittance calculation of volume rendering, making Analytic-Splatting sensitive to the changes in pixel footprint at different resolutions. Experiments on various datasets validate that our approach has better anti-aliasing capability that gives more details and better fidelity.\n\n3D高斯平滑（3DGS）最近因结合了基于原始和体积3D表示的优势，从而提高了3D场景渲染的质量和效率而受到欢迎。然而，3DGS并非无别名，其在不同分辨率下的渲染可能会产生严重的模糊或锯齿。这是因为3DGS将每个像素视为一个孤立的单点而不是一个区域，导致对像素足迹变化的不敏感。因此，这种离散的采样方案不可避免地导致了别名问题，这是由于受限的采样带宽。在本文中，我们推导出一种解决这一问题的分析解。更具体地说，我们使用条件逻辑函数作为一维高斯信号的累积分布函数（CDF）的解析近似，并通过减去CDF来计算高斯积分。然后，我们在二维像素着色中引入这种近似，并提出分析平滑，它在2D像素窗口区域内解析近似高斯积分，以更好地捕捉每个像素的强度响应。此外，我们使用像素窗口积分区域的近似响应参与体渲染的透射计算，使分析平滑对不同分辨率下像素足迹的变化敏感。在各种数据集上的实验验证了我们的方法具有更好的抗锯齿能力，提供了更多细节和更好的保真度。\n"
  },
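The pixel-window integration can be sketched in 1D with the classic logistic approximation to the Gaussian CDF (scale constant 1.702). The paper's conditioned logistic function is a refined variant, so treat the constant and the per-axis use below as assumptions for illustration:

```python
import math

def logistic_cdf(x, sigma):
    # Classic logistic approximation of the Gaussian CDF:
    # Phi(x / sigma) ~= 1 / (1 + exp(-1.702 * x / sigma)).
    return 1.0 / (1.0 + math.exp(-1.702 * x / sigma))

def pixel_response_1d(mu, sigma, pixel_center):
    """Integral of a 1D Gaussian over the unit pixel window
    [pixel_center - 0.5, pixel_center + 0.5] as a difference of CDFs,
    instead of a single point sample at pixel_center."""
    hi = logistic_cdf(pixel_center + 0.5 - mu, sigma)
    lo = logistic_cdf(pixel_center - 0.5 - mu, sigma)
    return hi - lo

# A narrow Gaussian (sigma = 0.2) centered on the pixel: the window
# integral stays bounded regardless of resolution, unlike point sampling.
print(pixel_response_1d(mu=0.0, sigma=0.2, pixel_center=0.0))  # ~0.97
```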
  {
    "path": "abs/2403.11134.md",
    "content": "### Recent Advances in 3D Gaussian Splatting\n\nThe emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations like Neural Radiance Fields (NeRF) that represent a 3D scene with position and viewpoint-conditioned neural networks, 3D Gaussian Splatting utilizes a set of Gaussian ellipsoids to model the scene so that efficient rendering can be accomplished by rasterizing Gaussian ellipsoids into images. Apart from the fast rendering speed, the explicit representation of 3D Gaussian Splatting facilitates editing tasks like dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid change and growing number of works in this field, we present a literature review of recent 3D Gaussian Splatting methods, which can be roughly classified into 3D reconstruction, 3D editing, and other downstream applications by functionality. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian Splatting are also illustrated for a better understanding of this technique. This survey aims to help beginners get into this field quickly and provide experienced researchers with a comprehensive overview, which can stimulate the future development of the 3D Gaussian Splatting representation.\n\n3D高斯平滑（3DGS）的出现大大加快了新视角合成的渲染速度。与通过位置和视点条件神经网络表示3D场景的神经隐式表示，如Neural Radiance Fields（NeRF）不同，3D高斯平滑利用一组高斯椭球来模拟场景，从而可以通过将高斯椭球光栅化成图像来实现高效渲染。除了快速渲染速度之外，3D高斯平滑的显式表示还便于执行编辑任务，如动态重建、几何编辑和物理模拟。考虑到这一领域的迅速变化和不断增长的工作数量，我们提出了一篇关于最近3D高斯平滑方法的文献综述，这些方法大致可以按功能分为3D重建、3D编辑和其他下游应用。还阐述了传统的基于点的渲染方法和3D高斯平滑的渲染公式，以更好地理解这项技术。这项调查旨在帮助初学者快速进入这一领域，并为有经验的研究人员提供一个全面的概述，这可以刺激3D高斯平滑表示的未来发展。\n"
  },
  {
    "path": "abs/2403.11247.md",
    "content": "### Compact 3D Gaussian Splatting For Dense Visual SLAM\n\nRecent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs, and slow training speed. To address the limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. Then we observe that the covariance matrix (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress 3D Gaussian geometric attributes, i.e., the parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining the state-of-the-art (SOTA) quality of the scene representation.\n\n最近的工作表明，基于3D高斯的SLAM（同时定位与地图构建）能够实现高质量的重建、精确的姿态估计和场景的实时渲染。然而，这些方法建立在大量冗余的3D高斯椭球上，导致高内存和存储成本，以及训练速度缓慢。为了解决这一限制，我们提出了一个紧凑的3D高斯平滑SLAM系统，减少了高斯椭球的数量和参数大小。首先提出了一种基于滑动窗口的掩蔽策略来减少冗余椭球。然后我们观察到，大多数3D高斯椭球的协方差矩阵（几何属性）极其相似，这激发了一个新颖的几何码本来压缩3D高斯几何属性，即参数。通过带有重投影损失的全局捆绑调整方法实现了稳健和精确的姿态估计。广泛的实验表明，我们的方法在保持场景表示的最新艺术（SOTA）质量的同时，实现了更快的训练和渲染速度。\n"
  },
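The geometry codebook rests on the observation quoted above: most Gaussians have nearly identical covariances, so their geometric attributes can be vector-quantized and each Gaussian stores only an index. A tiny k-means sketch, where the names and the choice of k are illustrative and the paper's masking strategy and codebook details are not reproduced:

```python
import numpy as np

def build_codebook(geom, k=64, iters=10, seed=0):
    """Tiny k-means vector quantization: cluster per-Gaussian geometric
    attributes (e.g. concatenated scale + rotation parameters) so each
    Gaussian stores a small codebook index instead of the full values."""
    rng = np.random.default_rng(seed)
    codes = geom[rng.choice(len(geom), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.linalg.norm(geom[:, None] - codes[None], axis=-1).argmin(1)
        for j in range(k):
            members = geom[assign == j]
            if len(members):
                codes[j] = members.mean(0)
    return codes, assign

geom = np.random.rand(5000, 7).astype(np.float32)  # 3 scales + 4 quaternion
codes, assign = build_codebook(geom)
compressed = codes[assign]  # decode: 5000 Gaussians from 64 geometry codes
```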
  {
    "path": "abs/2403.11273.md",
    "content": "### BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis\n\nText-to-3D synthesis has recently seen intriguing advances by combining the text-to-image models with 3D representation methods, e.g., Gaussian Splatting (GS), via Score Distillation Sampling (SDS). However, a hurdle of existing methods is the low efficiency, per-prompt optimization for a single 3D object. Therefore, it is imperative for a paradigm shift from per-prompt optimization to one-stage generation for any unseen text prompts, which yet remains challenging. A hurdle is how to directly generate a set of millions of 3D Gaussians to represent a 3D object. This paper presents BrightDreamer, an end-to-end single-stage approach that can achieve generalizable and fast (77 ms) text-to-3D generation. Our key idea is to formulate the generation process as estimating the 3D deformation from an anchor shape with predefined positions. For this, we first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions, used as the centers (one attribute) of 3D Gaussians. To estimate the other four attributes (i.e., scaling, rotation, opacity, and SH coefficient), we then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object. The center of each Gaussian enables us to transform the triplane feature into the four attributes. The generated 3D Gaussians can be finally rendered at 705 frames per second. Extensive experiments demonstrate the superiority of our method over existing methods. Also, BrightDreamer possesses a strong semantic understanding capability even for complex text prompts.\n\n最近，文本到3D合成通过将文本到图像模型与3D表示方法结合起来，例如通过得分蒸馏采样（SDS）的高斯平滑（GS），取得了有趣的进展。然而，现有方法的一个障碍是低效率，即对单个3D对象的每个提示的优化。因此，从每个提示的优化转变为对任何未见文本提示的一阶段生成是迫切需要的，这仍然是一个挑战。一个障碍是如何直接生成数百万个3D高斯来表示一个3D对象。本文提出了BrightDreamer，一种端到端的单阶段方法，可以实现可泛化和快速（77毫秒）的文本到3D生成。我们的关键思想是将生成过程形式化为估计从一个具有预定义位置的锚形状的3D变形。为此，我们首先提出一个文本引导的形状变形（TSD）网络，以预测变形的形状及其新位置，用作3D高斯的中心（一个属性）。为了估计其他四个属性（即缩放、旋转、不透明度和SH系数），我们接着设计了一个新颖的文本引导的三平面生成器（TTG），来为3D对象生成一个三平面表示。每个高斯的中心使我们能够将三平面特征转换为四个属性。生成的3D高斯最终可以以每秒705帧的速度渲染。广泛的实验表明我们方法相比现有方法的优越性。此外，BrightDreamer即使对复杂的文本提示也具有强大的语义理解能力。\n"
  },
  {
    "path": "abs/2403.11324.md",
    "content": "### GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering\n\nDuring the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue, we propose a novel approach called GeoGaussian. Based on the smoothly connected areas observed from point clouds, this method introduces a novel pipeline to initialize thin Gaussians aligned with the surfaces, where the characteristic can be transferred to new generations through a carefully designed densification strategy. Finally, the pipeline ensures that the scene's geometry and texture are maintained through constrained optimization processes with explicit geometry constraints. Benefiting from the proposed architecture, the generative ability of 3D Gaussians is enhanced, especially in structured regions. Our proposed pipeline achieves state-of-the-art performance in novel view synthesis and geometric reconstruction, as evaluated qualitatively and quantitatively on public datasets.\n\n在高斯平滑优化过程中，如果不特意保持场景的结构，场景的几何形态会逐渐恶化，特别是在非纹理区域如墙壁、天花板和家具表面。这种退化显著影响了从训练数据中的视点大幅偏离的新视角的渲染质量。为了缓解这个问题，我们提出了一种名为GeoGaussian的新方法。基于从点云观察到的平滑连接区域，这种方法引入了一种新的管道来初始化与表面对齐的细高斯，其中的特性可以通过精心设计的密集化策略转移到新生成物上。最后，该管道确保通过具有显式几何约束的受限优化过程保持场景的几何形态和纹理。得益于所提出的架构，3D高斯的生成能力得到了增强，特别是在结构化区域。我们提出的管道在新视角合成和几何重建方面达到了最新技术水平，已通过公共数据集上的定性和定量评估证实。\n"
  },
  {
    "path": "abs/2403.11367.md",
    "content": "### 3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization\n\nThis paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset.\n\n本文提出了一种新的系统，旨在使用3D高斯平滑进行3D地图制作和视觉重定位。我们的方法利用激光雷达和相机数据，创建环境的准确且视觉上可信的表示。通过利用激光雷达数据启动3D高斯平滑地图的训练，我们的系统构建了既详细又几何上准确的地图。为了减轻过度的GPU内存使用并促进快速的空间查询，我们采用了2D体素图和KD树的组合。这一准备使我们的方法非常适合视觉定位任务，能够通过归一化互相关（NCC）高效识别查询图像与高斯平滑地图渲染图像之间的对应关系。此外，我们使用基于特征的匹配和透视n点（PnP）技术，对查询图像的相机姿态进行了精细化。我们的系统的有效性、适应性和精度通过在KITTI360数据集上的广泛评估得到了证明。\n\n"
  },
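Relocalization here reduces to scoring the query image against images rendered from the Gaussian map. A minimal global NCC (a real system would more likely score local windows or patches) looks like this:

```python
import numpy as np

def ncc(query, rendered, eps=1e-8):
    """Normalized cross-correlation between the query image and an image
    rendered from the Gaussian map; values near 1 indicate a strong match."""
    q = (query - query.mean()) / (query.std() + eps)
    r = (rendered - rendered.mean()) / (rendered.std() + eps)
    return float((q * r).mean())

# Pick the best candidate rendering among several map viewpoints.
query = np.random.rand(48, 64)
candidates = [np.random.rand(48, 64) for _ in range(4)] + [query * 0.7 + 0.1]
best = max(range(len(candidates)), key=lambda i: ncc(query, candidates[i]))
print(best)  # 4 -- the linearly related image scores highest under NCC
```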
  {
    "path": "abs/2403.11427.md",
    "content": "### BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors\n\nAnimatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcraft creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for training and rendering. This limitation restricts the practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. The 3D Gaussian representations significantly accelerate the training and rendering process, and the diffusion priors allow the method to learn 3D models with limited viewpoints. We also present the rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating its superior performance compared to the current state-of-the-art methods.\n\n在各个领域中，可动画的3D重建具有重要的应用价值，主要依赖于艺术家的手工创建。最近，一些研究已成功地从单目视频中构建可动画的3D模型。然而，这些方法要求输入视频中的物体有足够的视角覆盖，并且通常需要大量的时间和计算成本进行训练和渲染。这一限制限制了它们的实际应用。在这项工作中，我们提出了一种从单目视频中使用扩散先验构建可动画的3D高斯平滑的方法。3D高斯表示显著加快了训练和渲染过程，扩散先验允许该方法学习有限视点的3D模型。我们还提出了刚性正则化来增强对先验的利用。我们对各种真实世界的视频进行了广泛的评估，展示了其相比当前最新技术方法的优越性能。\n"
  },
  {
    "path": "abs/2403.11447.md",
    "content": "### Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction\n\n3D Gaussian Splatting (3DGS) has become an emerging tool for dynamic scene reconstruction. However, existing methods focus mainly on extending static 3DGS into a time-variant representation, while overlooking the rich motion information carried by 2D observations, thus suffering from performance degradation and model redundancy. To address the above problem, we propose a novel motion-aware enhancement framework for dynamic scene reconstruction, which mines useful motion cues from optical flow to improve different paradigms of dynamic 3DGS. Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow. Then a novel flow augmentation method is introduced with additional insights into uncertainty and loss collaboration. Moreover, for the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed. We conduct extensive experiments on both multi-view and monocular scenes to verify the merits of our work. Compared with the baselines, our method shows significant superiority in both rendering quality and efficiency.\n\n3D高斯平滑（3DGS）已成为动态场景重建的新兴工具。然而，现有方法主要集中在将静态3DGS扩展到时间变化表示上，而忽视了2D观察所携带的丰富运动信息，因此遭受性能下降和模型冗余问题。为了解决上述问题，我们提出了一种新颖的感知运动增强框架，用于动态场景重建，该框架从光流中挖掘有用的运动线索以改善动态3DGS的不同范式。具体来说，我们首先建立3D高斯运动与像素级流之间的对应关系。然后引入一种新颖的流增强方法，并对不确定性和损失协作提供额外的见解。此外，对于流行的基于变形的范式，它提出了一个难以优化问题，我们提出了一个瞬态感知的变形辅助模块。我们在多视图和单目场景上进行了广泛的实验以验证我们工作的优点。与基准相比，我们的方法在渲染质量和效率方面显示了显著的优势。\n"
  },
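The correspondence between 3D Gaussian movement and pixel-level flow can be sketched by projecting each Gaussian center in two consecutive frames and matching the induced 2D displacement against optical flow sampled at those pixels. All names are hypothetical, and the uncertainty and loss-collaboration terms mentioned in the abstract are omitted:

```python
import torch

def project(xyz, K, w2c):
    # World points -> camera frame -> pinhole projection to pixels.
    cam = (w2c[:3, :3] @ xyz.T + w2c[:3, 3:]).T       # (N, 3)
    uv = (K @ (cam / cam[:, 2:3]).T).T                # (N, 3), z == 1
    return uv[:, :2]

def gaussian_flow_loss(xyz_t0, xyz_t1, K, w2c, flow_at_centers):
    """Tie 3D Gaussian motion to 2D optical flow: the displacement of each
    projected center between frames should match the flow sampled there."""
    pred = project(xyz_t1, K, w2c) - project(xyz_t0, K, w2c)
    return (pred - flow_at_centers).abs().mean()

xyz_t0 = torch.rand(100, 3) + torch.tensor([0.0, 0.0, 3.0])
xyz_t1 = xyz_t0 + 0.01 * torch.randn(100, 3)
K = torch.tensor([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
w2c = torch.eye(4)
flow = torch.zeros(100, 2)  # would come from a pretrained flow network
print(gaussian_flow_loss(xyz_t0, xyz_t1, K, w2c, flow))
```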
  {
    "path": "abs/2403.11453.md",
    "content": "### Bridging 3D Gaussian and Mesh for Freeview Video Rendering\n\nThis is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the fuzzy geometry. Meanwhile, the point-based splatting (e.g. the 3D Gaussian Splatting) method usually produces artifacts or blurry pixels in the area with smooth geometry and sharp textures. As a result, it is difficult, even not impossible, to represent the complex and dynamic scene with a single type of primitive. To this end, we propose a novel approach, GauMesh, to bridge the 3D Gaussian and Mesh for modeling and rendering the dynamic scenes. Given a sequence of tracked mesh as initialization, our goal is to simultaneously optimize the mesh geometry, color texture, opacity maps, a set of 3D Gaussians, and the deformation field. At a specific time, we perform α-blending on the RGB and opacity values based on the merged and re-ordered z-buffers from mesh and 3D Gaussian rasterizations. This produces the final rendering, which is supervised by the ground-truth image. Experiments demonstrate that our approach adapts the appropriate type of primitives to represent the different parts of the dynamic scene and outperforms all the baseline methods in both quantitative and qualitative comparisons without losing render speed.\n\n这只是GauMesh的预览版本。最近，基于原始图元的渲染在解决从2D图像建模和渲染3D动态场景的问题中已被证明能够取得令人信服的结果。尽管如此，在新视角合成的上下文中，每种类型的图元在表示能力方面都有其固有的缺陷。利用网格描述模糊的几何形状是困难的。同时，基于点的平滑（例如3D高斯平滑）方法通常在具有平滑几何和锐利纹理的区域产生伪影或模糊像素。因此，用单一类型的图元来表示复杂和动态的场景是困难的，甚至是不可能的。为此，我们提出了一种新的方法，GauMesh，以桥接3D高斯和网格，用于建模和渲染动态场景。给定一系列跟踪的网格作为初始化，我们的目标是同时优化网格几何、颜色纹理、不透明度映射、一组3D高斯和变形场。在特定时间，我们根据从网格和3D高斯光栅化合并和重新排序的z缓冲区，对RGB和不透明度值进行α混合。这产生了最终渲染，该渲染由真实图像监督。实验表明，我们的方法适应不同类型的图元来表示动态场景的不同部分，并在定量和定性比较中超越所有基线方法，而不会损失渲染速度。\n"
  },
  {
    "path": "abs/2403.11460.md",
    "content": "### Fed3DGS: Scalable 3D Gaussian Splatting with Federated Learning\n\nIn this work, we present Fed3DGS, a scalable 3D reconstruction framework based on 3D Gaussian splatting (3DGS) with federated learning. Existing city-scale reconstruction methods typically adopt a centralized approach, which gathers all data in a central server and reconstructs scenes. The approach hampers scalability because it places a heavy load on the server and demands extensive data storage when reconstructing scenes on a scale beyond city-scale. In pursuit of a more scalable 3D reconstruction, we propose a federated learning framework with 3DGS, which is a decentralized framework and can potentially use distributed computational resources across millions of clients. We tailor a distillation-based model update scheme for 3DGS and introduce appearance modeling for handling non-IID data in the scenario of 3D reconstruction with federated learning. We simulate our method on several large-scale benchmarks, and our method demonstrates rendered image quality comparable to centralized approaches. In addition, we also simulate our method with data collected in different seasons, demonstrating that our framework can reflect changes in the scenes and our appearance modeling captures changes due to seasonal variations.\n\n在这项工作中，我们介绍了Fed3DGS，一个基于3D高斯平滑（3DGS）与联邦学习的可扩展3D重建框架。现有的城市规模重建方法通常采用中心化方法，将所有数据集中在一个中央服务器上并重建场景。这种方法阻碍了可扩展性，因为它给服务器带来了重负载，并且在超过城市规模的场景重建时要求大量的数据存储。为了追求更可扩展的3D重建，我们提出了一个带有3DGS的联邦学习框架，这是一个去中心化的框架，可以潜在地使用跨数百万客户端的分布式计算资源。我们为3DGS量身定制了一个基于蒸馏的模型更新方案，并引入了外观建模，以处理在联邦学习下3D重建场景中的非独立同分布（non-IID）数据。我们在几个大规模基准上模拟了我们的方法，我们的方法展示出与中心化方法相当的渲染图像质量。此外，我们还使用在不同季节收集的数据模拟了我们的方法，展示了我们的框架可以反映场景的变化，我们的外观建模捕捉了由季节变化引起的变化。\n"
  },
  {
    "path": "abs/2403.11577.md",
    "content": "### 3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration\n\nReliable multimodal sensor fusion algorithms re- quire accurate spatiotemporal calibration. Recently, targetless calibration techniques based on implicit neural representations have proven to provide precise and robust results. Nevertheless, such methods are inherently slow to train given the high compu- tational overhead caused by the large number of sampled points required for volume rendering. With the recent introduction of 3D Gaussian Splatting as a faster alternative to implicit representation methods, we propose to leverage this new ren- dering approach to achieve faster multi-sensor calibration. We introduce 3DGS-Calib, a new calibration method that relies on the speed and rendering accuracy of 3D Gaussian Splatting to achieve multimodal spatiotemporal calibration that is accurate, robust, and with a substantial speed-up compared to methods relying on implicit neural representations. We demonstrate the superiority of our proposal with experimental results on sequences from KITTI-360, a widely used driving dataset.\n\n可靠的多模态传感器融合算法需要精确的时空校准。最近，基于隐式神经表示的无目标校准技术已被证明能提供精确和鲁棒的结果。尽管如此，鉴于体积渲染所需的大量采样点导致的高计算开销，这类方法本质上训练速度慢。随着最近引入的3D高斯平滑作为隐式表示方法的更快替代方案，我们提议利用这种新的渲染方法来实现更快的多传感器校准。我们介绍3DGS-Calib，一种新的校准方法，依靠3D高斯平滑的速度和渲染精度，实现精确、鲁棒且与依赖隐式神经表示方法相比有显著加速的多模态时空校准。我们通过在KITTI-360这一广泛使用的驾驶数据集上的序列的实验结果，展示了我们提议的优越性。\n"
  },
  {
    "path": "abs/2403.11589.md",
    "content": "### UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling\n\nReconstructing photo-realistic drivable human avatars from multi-view image sequences has been a popular and challenging topic in the field of computer vision and graphics. While existing NeRF-based methods can achieve high-quality novel view rendering of human models, both training and inference processes are time-consuming. Recent approaches have utilized 3D Gaussians to represent the human body, enabling faster training and rendering. However, they undermine the importance of the mesh guidance and directly predict Gaussians in 3D space with coarse mesh guidance. This hinders the learning procedure of the Gaussians and tends to produce blurry textures. Therefore, we propose UV Gaussians, which models the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures. We utilize the embedding of UV map to learn Gaussian textures in 2D space, leveraging the capabilities of powerful 2D networks to extract features. Additionally, through an independent Mesh network, we optimize pose-dependent geometric deformations, thereby guiding Gaussian rendering and significantly enhancing rendering quality. We collect and process a new dataset of human motion, which includes multi-view images, scanned models, parametric model registration, and corresponding texture maps. Experimental results demonstrate that our method achieves state-of-the-art synthesis of novel view and novel pose.\n\n重建逼真可驾驶的人类化身从多视图图像序列一直是计算机视觉和图形领域一个受欢迎且具有挑战性的话题。尽管现有基于NeRF的方法可以实现人类模型的高质量新视角渲染，但训练和推理过程都非常耗时。最近的方法利用3D高斯分布来表示人体，使训练和渲染速度更快。然而，这些方法忽视了网格引导的重要性，并直接在3D空间中预测高斯分布，仅使用粗略的网格引导。这阻碍了高斯分布的学习过程，并倾向于产生模糊的纹理。因此，我们提出UV高斯分布，该方法通过共同学习网格变形和2D UV空间高斯纹理来模拟3D人体。我们利用UV映射的嵌入来在2D空间学习高斯纹理，利用强大的2D网络的能力来提取特征。另外，通过一个独立的网格网络，我们优化了依赖于姿态的几何变形，从而指导高斯渲染并显著提高渲染质量。我们收集并处理了一个新的人体运动数据集，包括多视图图像、扫描模型、参数模型注册和相应的纹理映射。实验结果表明，我们的方法在新视角和新姿态的合成上达到了最先进的水平。\n"
  },
  {
    "path": "abs/2403.11625.md",
    "content": "### GaussNav: Gaussian Splatting for Visual Navigation\n\nIn embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN stems from the necessity of recognizing the target object across varying viewpoints and rejecting potential distractors.\nExisting map-based navigation methods largely adopt the representation form of Bird's Eye View (BEV) maps, which, however, lack the representation of detailed textures in a scene.\nTo address the above issues, we propose a new Gaussian Splatting Navigation (abbreviated as GaussNav) framework for IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS).\nThe proposed framework enables the agent to not only memorize the geometry and semantic information of the scene, but also retain the textural features of objects.\nOur GaussNav framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset.\n\n在具身视觉中，实例图像目标导航（Instance ImageGoal Navigation，IIN）要求一个代理在未探索的环境中定位到一个在目标图像中描绘的特定对象。IIN的主要难点源于需要跨不同视点识别目标对象并排除潜在的干扰物。\n现有的基于地图的导航方法大多采用鸟瞰图（Bird's Eye View，BEV）地图的表示形式，然而，这种方法缺乏场景中详细纹理的表示。\n为了解决上述问题，我们提出了一种新的高斯喷溅导航（Gaussian Splatting Navigation，简称GaussNav）框架用于IIN任务，该框架基于3D高斯喷溅（3D Gaussian Splatting，3DGS）构建了一种新颖的地图表示。\n所提出的框架使代理不仅能记住场景的几何和语义信息，还能保留对象的纹理特征。\n我们的GaussNav框架在性能上展示了显著的飞跃，这一点通过在具有挑战性的Habitat-Matterport 3D（HM3D）数据集上将成功率加权路径长度（Success weighted by Path Length，SPL）从0.252提高到0.578得到了证明。\n\n"
  },
  {
    "path": "abs/2403.11679.md",
    "content": "### NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting\n\nWe propose NEDS-SLAM, an Explicit Dense semantic SLAM system based on 3D Gaussian representation, that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real-time. In the system, we propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from pre-trained segmentation head on semantic reconstruction, achieving robust 3D semantic Gaussian mapping. Additionally, we employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation, mitigating the burden of excessive memory consumption. Furthermore, we leverage the advantage of 3D Gaussian splatting, which enables efficient and differentiable novel view rendering, and propose a Virtual Camera View Pruning method to eliminate outlier GS points, thereby effectively enhancing the quality of scene representations. Our NEDS-SLAM method demonstrates competitive performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in 3D dense semantic mapping.\n\n我们提出了NEDS-SLAM，一个基于3D高斯表示的显式密集语义SLAM系统，该系统能够在实时内实现稳健的3D语义映射、准确的相机跟踪和高质量的渲染。在该系统中，我们提出了一个空间一致性特征融合模型，以减少来自预训练分割头的错误估计对语义重建的影响，实现稳健的3D语义高斯映射。此外，我们采用了一个轻量级的编解码器，将高维语义特征压缩成紧凑的3D高斯表示，从而减轻过多内存消耗的负担。进一步地，我们利用了3D高斯喷溅的优势，该技术能够实现高效且可微的新视角渲染，并提出了一种虚拟相机视图修剪方法，以消除异常的GS点，从而有效提升场景表示的质量。我们的NEDS-SLAM方法在映射和跟踪精度方面，在Replica和ScanNet数据集上展示出与现有密集语义SLAM方法的竞争性能，同时也展现了在3D密集语义映射方面的出色能力。\n"
  },
  {
    "path": "abs/2403.11831.md",
    "content": "### BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting\n\nWhile neural rendering has demonstrated impressive capabilities in 3D scene reconstruction and novel view synthesis, it heavily relies on high-quality sharp images and accurate camera poses. Numerous approaches have been proposed to train Neural Radiance Fields (NeRF) with motion-blurred images, commonly encountered in real-world scenarios such as low-light or long-exposure conditions. However, the implicit representation of NeRF struggles to accurately recover intricate details from severely motion-blurred images and cannot achieve real-time rendering. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction and real-time rendering by explicitly optimizing point clouds as Gaussian spheres.\nIn this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severe motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction. Our method models the physical image formation process of motion-blurred images and jointly learns the parameters of Gaussians while recovering camera motion trajectories during exposure time.\nIn our experiments, we demonstrate that BAD-Gaussians not only achieves superior rendering quality compared to previous state-of-the-art deblur neural rendering methods on both synthetic and real datasets but also enables real-time rendering capabilities.\n\n虽然神经渲染在3D场景重建和新视角合成方面展示了令人印象深刻的能力，但它严重依赖于高质量清晰图像和准确的相机姿态。许多方法已被提出来用运动模糊图像训练神经辐射场（NeRF），这是在实际场景中常遇到的情况，比如低光照或长时间曝光条件。然而，NeRF的隐式表示难以从严重运动模糊的图像中准确恢复出复杂的细节，且无法实现实时渲染。相比之下，最近在3D高斯喷溅方面的进展通过显式优化点云为高斯球，实现了高质量的3D场景重建和实时渲染。\n在本文中，我们介绍了一种新颖的方法，名为BAD-Gaussians（Bundle Adjusted Deblur Gaussian Splatting），它利用显式高斯表示，并能处理严重运动模糊图像及不准确的相机姿态，以实现高质量的场景重建。我们的方法模拟了运动模糊图像的物理成像过程，并在曝光时间内共同学习高斯参数，同时恢复相机运动轨迹。\n在我们的实验中，我们展示了BAD-Gaussians不仅在合成和真实数据集上相比之前的最先进去模糊神经渲染方法实现了更优越的渲染质量，而且还启用了实时渲染能力。\n"
  },
  {
    "path": "abs/2403.11868.md",
    "content": "### View-Consistent 3D Editing with Gaussian Splatting\n\nThe advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes.\n\n3D高斯喷溅（3DGS）的出现彻底革新了3D编辑，提供了高效、高保真的渲染并实现了精确的局部操作。目前，扩散基础的2D编辑模型被用于修改多视图渲染图像，这些图像随后指导3DGS模型的编辑。然而，这种方法面临一个关键问题，即多视图不一致性，其中指导图像在不同视图中展现出显著差异，导致模式崩溃和3DGS的视觉缺陷。为此，我们引入了视图一致性编辑（VcEdit），一个将3DGS无缝整合到图像编辑过程中的新颖框架，确保编辑后的指导图像具有多视图一致性，并有效缓解模式崩溃问题。VcEdit采用了两个创新的一致性模块：交叉注意力一致性模块和编辑一致性模块，都旨在减少编辑图像中的不一致性。通过将这些一致性模块纳入迭代模式，VcEdit熟练地解决了多视图不一致性问题，促进了在多样化场景中进行高质量3DGS编辑。\n"
  },
  {
    "path": "abs/2403.12010.md",
    "content": "### VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model\n\nGenerating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap. To enhance multi-view consistency, we introduce a 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to get an explicit global 3D model, and then adopts a sampling strategy that effectively involves images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousand GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects.\n\n基于文本或单图提示生成多视图图像对于3D内容的创建来说是一个关键能力。围绕这一主题的两个基本问题是我们用于训练的数据是什么，以及如何确保多视图的一致性。本文引入了一个新颖的框架，对这两个问题都做出了基本的贡献。不同于利用2D扩散模型的图像进行训练，我们提出了一个从现成视频生成模型中微调的密集一致多视图生成模型。视频生成模型的图像更适合于多视图生成，因为生成它们的底层网络架构采用了时间模块来强制帧一致性。此外，用于训练这些模型的视频数据集丰富且多样，导致训练-微调领域的差距减小。为了增强多视图一致性，我们引入了一个3D感知去噪采样，首先利用前馈重建模块得到一个显式的全球3D模型，然后采用一种采样策略，有效地将从全球3D模型渲染的图像纳入去噪采样循环中，以提高最终图像的多视图一致性。作为一个附带产物，这个模块还提供了一种在几秒钟内快速创建由3D高斯表示的3D资产的方法。我们的方法可以生成24个密集视图，并且在训练中的收敛速度比现有的最先进方法（4个GPU小时对比数千个GPU小时）要快得多，同时在视觉质量和一致性上也相当。通过进一步的微调，我们的方法在定量指标和视觉效果上都超过了现有的最先进方法。\n"
  },
  {
    "path": "abs/2403.12365.md",
    "content": "### GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation\n\nCreating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard to be handled by existing methods. The common color drifting issue that happens in 4D generation is also resolved with improved Guassian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis.\n\n从图像或视频中创建4D高斯喷溅场是一项具有挑战性的任务，因为其性质不受约束。虽然优化可以从输入视频中提取光度参考或通过生成模型进行调节，但直接监督高斯运动仍然是一个未充分探索的领域。在本文中，我们引入了一个新概念，高斯流，它连接了3D高斯的动态和连续帧之间的像素速度。通过将高斯动态喷溅到图像空间中，可以高效地获得高斯流。这一可微分过程使得直接从光流中进行动态监督成为可能。我们的方法显著促进了4D动态内容生成和4D新视角合成的高斯喷溅，特别是对于那些难以被现有方法处理的富含运动的内容。常见的颜色漂移问题在4D生成中也得到了解决，高斯动态有所改进。大量实验中的优越视觉质量展示了我们方法的有效性。定量和定性评估表明，我们的方法在4D生成和4D新视角合成的两项任务上均达到了最先进的结果。\n"
  },
  {
    "path": "abs/2403.12535.md",
    "content": "### High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization\n\nWe propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way. Compared to recent neural and concurrently developed gaussian splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM.\n\n我们提出了一种基于3D高斯喷溅的密集RGBD SLAM系统，该系统提供了精确的位姿跟踪和视觉上逼真的重建。为此，我们首先提出了一种基于渲染损失的高斯密集化策略，以映射未观察到的区域并精细化重复观察到的区域。其次，我们引入了额外的正则化参数来缓解连续映射问题中的遗忘问题，其中参数倾向于过度拟合最新帧，并导致之前帧的渲染质量下降。映射和跟踪都是通过最小化可微分方式的重渲染损失，并使用高斯参数来执行的。与最近的神经网络和同时开发的高斯喷溅RGBD SLAM基线相比，我们的方法在合成数据集Replica上实现了最先进的结果，在现实世界数据集TUM上也达到了有竞争力的结果。\n"
  },
  {
    "path": "abs/2403.12550.md",
    "content": "### RGBD GS-ICP SLAM\n\nSimultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing an incredibly fast speed up to 107 FPS (for the entire system) and superior quality of the reconstructed map.\n\n在机器人、虚拟现实（VR）和增强现实（AR）应用中，具有密集表示的同时定位与建图（SLAM）起着关键作用。最近在密集表示SLAM的进展突显了利用神经场景表示和3D高斯表示进行高保真空间表示的潜力。在本文中，我们提出了一种新颖的密集表示SLAM方法，该方法融合了广义迭代最近点（G-ICP）和3D高斯喷溅（3DGS）。与现有方法不同，我们利用单一的高斯地图同时进行跟踪和映射，从而获得相互利益。通过在跟踪和映射过程中交换协方差，并使用比例对齐技术，我们最小化了冗余计算并实现了一个高效的系统。此外，我们通过我们的关键帧选择方法提高了跟踪精度和映射质量。实验结果证明了我们方法的有效性，显示出高达107 FPS（整个系统）的令人难以置信的快速速度和重建地图的优越质量。\n"
  },
  {
    "path": "abs/2403.12722.md",
    "content": "### HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting\n\nHolistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that utilizes 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where moving object poses are regularized via physical constraints. Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy, and reconstruct dynamic scenes, even in scenarios where 3D bounding box detection are highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach.\n\n基于RGB图像的城市场景的整体理解是一个具有挑战性但非常重要的问题。它包括了对几何和外观的理解，以实现新视角的合成、解析语义标签和跟踪移动对象。尽管取得了相当大的进步，现有方法通常只关注这一任务的特定方面，并且需要额外的输入，如LiDAR扫描或手动标注的3D边界框。在这篇论文中，我们引入了一个利用3D高斯喷溅进行城市场景整体理解的新流程。我们的主要思想涉及使用静态和动态3D高斯的组合对几何、外观、语义和运动进行联合优化，其中移动对象姿态通过物理约束进行规范化。我们的方法提供了实时渲染新视点的能力，能够高精度地生成2D和3D语义信息，并且即使在3D边界框检测非常噪声的情况下也能重建动态场景。在KITTI、KITTI-360和Virtual KITTI 2上的实验结果展示了我们方法的有效性。\n"
  },
  {
    "path": "abs/2403.12957.md",
    "content": "### GVGEN: Text-to-3D Generation with Volumetric Representation\n\nIn recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques:(1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points as a structured form GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. Simultaneously, it maintains a fast generation speed (∼7 seconds), effectively striking a balance between quality and efficiency.\n\n近年来，3D高斯喷溅作为一种强大的3D重建和生成技术而崭露头角，以其快速和高质量的渲染能力而闻名。为了解决这些不足，本文介绍了一种新颖的基于扩散的框架，GVGEN，旨在高效地从文本输入生成3D高斯表示。我们提出了两种创新技术：（1）结构化体积表示。我们首先将无组织的3D高斯点作为一种结构化形式的GaussianVolume排列。这种转换允许捕捉由固定数量的高斯组成的体积内的复杂纹理细节。为了更好地优化这些细节的表示，我们提出了一种独特的修剪和密集化方法，名为候选池策略，通过选择性优化增强细节保真度。（2）由粗到细的生成管道。为了简化GaussianVolume的生成并使模型能够生成具有详细3D几何形状的实例，我们提出了一种由粗到细的管道。它最初构建一个基本的几何结构，随后预测完整的高斯属性。我们的框架，GVGEN，在定性和定量评估中相比现有的3D生成方法表现出优越的性能。同时，它保持了快速的生成速度（∼7秒），有效地在质量和效率之间找到了平衡。\n"
  },
  {
    "path": "abs/2403.13327.md",
    "content": "### Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion\n\nHigh-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data suffering from motion blur and rolling shutter distortion. Our approach is based on detailed modelling of the physical image formation process and utilizes velocities estimated using visual-inertial odometry (VIO). Camera poses are considered non-static during the exposure time of a single image frame and camera poses are further optimized in the reconstruction process. We formulate a differentiable rendering pipeline that leverages screen space approximation to efficiently incorporate rolling-shutter and motion blur effects into the 3DGS framework. Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods, thereby advancing 3DGS in naturalistic settings.\n\n高质量场景重建和新视角合成基于高斯喷溅（3DGS）通常需要稳定、高质量的照片，这往往难以通过手持相机捕捉实现。我们提出了一种方法，该方法能够适应相机运动，并允许使用受运动模糊和卷帘快门畸变影响的手持视频数据进行高质量场景重建。我们的方法基于对物理图像形成过程的详细建模，并利用视觉-惯性测程（VIO）估计出的速度。考虑到单个图像帧的曝光时间内相机姿态是非静态的，并且在重建过程中进一步优化相机姿态。我们构建了一个可微渲染管线，该管线利用屏幕空间近似高效地将卷帘快门和运动模糊效果纳入到3DGS框架中。我们使用合成数据和真实数据的结果展示了在减轻相机运动方面相较于现有方法的优越性能，从而推进了3DGS在自然场景设置中的应用。\n"
  },
  {
    "path": "abs/2403.13806.md",
    "content": "### RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS\n\nRecent advances in view synthesis and real-time rendering have achieved photorealistic quality at impressive rendering speeds. While Radiance Field-based methods achieve state-of-the-art quality in challenging scenarios such as in-the-wild captures and large-scale scenes, they often suffer from excessively high compute requirements linked to volumetric rendering. Gaussian Splatting-based methods, on the other hand, rely on rasterization and naturally achieve real-time rendering but suffer from brittle optimization heuristics that underperform on more challenging scenes. In this work, we present RadSplat, a lightweight method for robust real-time rendering of complex scenes. Our main contributions are threefold. First, we use radiance fields as a prior and supervision signal for optimizing point-based scene representations, leading to improved quality and more robust optimization. Next, we develop a novel pruning technique reducing the overall point count while maintaining high quality, leading to smaller and more compact scene representations with faster inference speeds. Finally, we propose a novel test-time filtering approach that further accelerates rendering and allows to scale to larger, house-sized scenes. We find that our method enables state-of-the-art synthesis of complex captures at 900+ FPS.\n\n最近在视图合成和实时渲染方面的进步已经在令人印象深刻的渲染速度下实现了逼真的质量。虽然基于辐射场的方法在诸如野外捕捉和大规模场景等挑战性场景中实现了最先进的质量，但它们通常受到与体积渲染相关的过高计算需求的困扰。另一方面，基于高斯喷溅的方法依赖于光栅化，并自然实现实时渲染，但在更具挑战性的场景中表现不佳，因为它们受到脆弱的优化启发式的影响。在这项工作中，我们介绍了RadSplat，一种轻量级方法，用于复杂场景的稳健实时渲染。我们的主要贡献有三个方面。首先，我们使用辐射场作为优化基于点的场景表示的先验和监督信号，从而提高了质量并使优化更加稳健。接下来，我们开发了一种新颖的剪枝技术，减少了整体点数，同时保持高质量，导致更小且更紧凑的场景表示，以及更快的推理速度。最后，我们提出了一种新颖的测试时过滤方法，进一步加速了渲染，并允许扩展到更大的、房屋大小的场景。我们发现，我们的方法能够以900+ FPS的速度实现复杂捕捉的最先进合成。\n"
  },
  {
    "path": "abs/2403.14166.md",
    "content": "### Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians\n\nIn this study, we explore the challenge of efficiently representing scenes with a constrained number of Gaussians. Our analysis shifts from traditional graphics and 2D computer vision to the perspective of point clouds, highlighting the inefficient spatial distribution of Gaussian representation as a key limitation in model performance. To address this, we introduce strategies for densification including blur split and depth reinitialization, and simplification through Gaussian binarization and sampling. These techniques reorganize the spatial positions of the Gaussians, resulting in significant improvements across various datasets and benchmarks in terms of rendering quality, resource consumption, and storage compression. Our proposed Mini-Splatting method integrates seamlessly with the original rasterization pipeline, providing a strong baseline for future research in Gaussian-Splatting-based works.\n\n在这项研究中，我们探讨了如何高效地用有限数量的高斯函数表示场景的挑战。我们的分析从传统图形学和二维计算机视觉转向点云的视角，强调高斯表示的低效空间分布是模型性能的一个关键限制。为了解决这个问题，我们引入了密集化策略，包括模糊分裂和深度重新初始化，以及通过高斯二值化和采样来简化。这些技术重新组织了高斯的空间位置，导致在渲染质量、资源消耗和存储压缩方面在各种数据集和基准测试中的显著改进。我们提出的Mini-Splatting方法与原始光栅化管线无缝集成，为未来基于高斯喷溅的研究提供了一个强大的基线。\n"
  },
  {
    "path": "abs/2403.14244.md",
    "content": "### Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering\n\nThe 3D Gaussian splatting method has drawn a lot of attention, thanks to its high performance in training and high quality of the rendered image. However, it uses anisotropic Gaussian kernels to represent the scene. Although such anisotropic kernels have advantages in representing the geometry, they lead to difficulties in terms of computation, such as splitting or merging two kernels. In this paper, we propose to use isotropic Gaussian kernels to avoid such difficulties in the computation, leading to a higher performance method. The experiments confirm that the proposed method is about {\\bf 100X} faster without losing the geometry representation accuracy. The proposed method can be applied in a large range applications where the radiance field is needed, such as 3D reconstruction, view synthesis, and dynamic object modeling.\n\n3D高斯喷溅方法由于其在训练中的高性能和渲染图像的高质量而受到了广泛关注。然而，它使用各向异性高斯核来表示场景。尽管这样的各向异性核在表示几何形状方面有其优势，但它们在计算方面导致了困难，例如分裂或合并两个核。在本文中，我们提出使用各向同性高斯核以避免计算中的这些困难，从而导致一种性能更高的方法。实验确认，所提出的方法在不失去几何表示精度的情况下，速度约快了 **100倍**。所提出的方法可以应用于大范围需要辐射场的应用中，如3D重建、视图合成和动态物体建模。\n"
  },
  {
    "path": "abs/2403.14530.md",
    "content": "### HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression\n\n3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To address this, we make use of the relations between the unorganized anchors and the structured hash grid, leveraging their mutual information for context modeling, and propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our approach introduces a binary hash grid to establish continuous spatial consistencies, allowing us to unveil the inherent spatial relations of anchors through a carefully designed context model. To facilitate entropy coding, we utilize Gaussian distributions to accurately estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Additionally, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Importantly, our work is the pioneer to explore context-based compression for 3DGS representation, resulting in a remarkable size reduction of over 75× compared to vanilla 3DGS, while simultaneously improving fidelity, and achieving over 11× size reduction over SOTA 3DGS compression approach Scaffold-GS.\n\n3D高斯喷溅（3DGS）已成为新颖视图合成的一个有前途的框架，以其快速渲染速度和高保真度而自豪。然而，大量的高斯及其相关属性需要有效的压缩技术。尽管如此，高斯点云（或在我们的论文中称为锚点）的稀疏和无组织性质给压缩带来了挑战。为了解决这一问题，我们利用了无组织锚点与结构化哈希网格之间的关系，利用它们的互信息进行上下文建模，并提出了一个哈希网格辅助上下文（HAC）框架，用于高度紧凑的3DGS表示。我们的方法引入了二进制哈希网格以建立连续的空间一致性，允许我们通过精心设计的上下文模型揭示锚点的固有空间关系。为了促进熵编码，我们使用高斯分布来准确估计每个量化属性的概率，其中提出了一个自适应量化模块，以实现这些属性的高精度量化，从而改善保真度恢复。此外，我们加入了一个自适应遮罩策略来消除无效的高斯和锚点。重要的是，我们的工作是首次探索基于上下文的压缩，用于3DGS表示，与普通的3DGS相比，实现了超过75倍的显著大小减少，同时提高了保真度，并且与SOTA 3DGS压缩方法Scaffold-GS相比，实现了超过11倍的大小减少。\n"
  },
  {
    "path": "abs/2403.14554.md",
    "content": "### Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering\n\nWe propose Gaussian Frosting, a novel mesh-based representation for high-quality rendering and editing of complex 3D effects in real-time. Our approach builds on the recent 3D Gaussian Splatting framework, which optimizes a set of 3D Gaussians to approximate a radiance field from images. We propose first extracting a base mesh from Gaussians during optimization, then building and refining an adaptive layer of Gaussians with a variable thickness around the mesh to better capture the fine details and volumetric effects near the surface, such as hair or grass. We call this layer Gaussian Frosting, as it resembles a coating of frosting on a cake. The fuzzier the material, the thicker the frosting. We also introduce a parameterization of the Gaussians to enforce them to stay inside the frosting layer and automatically adjust their parameters when deforming, rescaling, editing or animating the mesh. Our representation allows for efficient rendering using Gaussian splatting, as well as editing and animation by modifying the base mesh. We demonstrate the effectiveness of our method on various synthetic and real scenes, and show that it outperforms existing surface-based approaches. We will release our code and a web-based viewer as additional contributions.\n\n我们提出了高斯霜化，这是一种新颖的基于网格的表示法，用于实时高质量渲染和编辑复杂的3D效果。我们的方法基于最近的3D高斯喷溅框架，该框架优化了一组3D高斯以从图像中近似辐射场。我们提出首先在优化过程中从高斯中提取一个基础网格，然后在网格周围构建并细化一个具有可变厚度的自适应高斯层，以更好地捕捉表面附近的细节和体积效果，如头发或草。我们称这层为高斯霜化，因为它类似于蛋糕上的一层霜。材料越模糊，霜化层越厚。我们还引入了高斯的参数化，以强制它们保持在霜化层内，并在变形、重新缩放、编辑或动画化网格时自动调整它们的参数。我们的表示法允许使用高斯喷溅进行高效渲染，以及通过修改基础网格进行编辑和动画制作。我们在各种合成和真实场景上演示了我们方法的有效性，并展示了它超越了现有的基于表面的方法。我们将发布我们的代码和一个基于网络的查看器作为额外的贡献。\n"
  },
  {
    "path": "abs/2403.14621.md",
    "content": "### GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation\n\nWe introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models.\n\n我们介绍了GRM，这是一种大规模重建器，能够在大约0.1秒内从稀疏视图图像中恢复出3D资产。GRM是一个基于前馈变换器的模型，有效地结合多视图信息，将输入像素转换为与像素对齐的高斯，这些高斯被反投影以创建一组密集分布的3D高斯，代表一个场景。我们的变换器架构和3D高斯的使用共同解锁了一个可扩展和高效的重建框架。广泛的实验结果证明了我们的方法在重建质量和效率方面都优于其他选择。我们还展示了GRM在生成任务中的潜力，即通过将其与现有的多视图扩散模型集成，实现文本到3D和图像到3D。\n"
  },
  {
    "path": "abs/2403.14627.md",
    "content": "### MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images\n\nWe propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussian Splatting models via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, our model achieves state-of-the-art performance with the fastest feed-forward inference speed (22 fps). Compared to the latest state-of-the-art method pixelSplat, our model uses 10× fewer parameters and infers more than 2× faster while providing higher appearance and geometry quality as well as better cross-dataset generalization.\n\n我们提出了MVSplat，这是一个从稀疏多视图图像学习得来的高效前馈3D高斯喷溅模型。为了准确地定位高斯中心，我们提出通过在3D空间中进行平面扫描来构建成本体积表示，其中存储在成本体积中的跨视图特征相似性可以为深度估计提供宝贵的几何线索。我们学习高斯原语的不透明度、协方差和球谐函数系数，同时仅依赖于光度监督与高斯中心共同进行。我们通过广泛的实验评估，展示了成本体积表示在学习前馈高斯喷溅模型中的重要性。在大规模的RealEstate10K和ACID基准测试上，我们的模型以最快的前馈推理速度（22fps）实现了最先进的性能。与最新的最先进方法pixelSplat相比，我们的模型使用了10倍更少的参数，并且推理速度快2倍以上，同时提供了更高的外观和几何质量以及更好的跨数据集泛化能力。\n"
  },
  {
    "path": "abs/2403.14939.md",
    "content": "### STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians\n\nRecent progress in pre-trained diffusion models and 3D generation have spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose STAG4D, a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. Drawing inspiration from 3D generation techniques, we utilize a multi-view diffusion model to initialize multi-view images anchoring on the input video frames, where the video can be either real-world captured or generated by a video diffusion model. To ensure the temporal consistency of the multi-view sequence initialization, we introduce a simple yet effective fusion strategy to leverage the first frame as a temporal anchor in the self-attention computation. With the almost consistent multi-view sequences, we then apply the score distillation sampling to optimize the 4D Gaussian point cloud. The 4D Gaussian spatting is specially crafted for the generation task, where an adaptive densification strategy is proposed to mitigate the unstable Gaussian gradient for robust optimization. Notably, the proposed pipeline does not require any pre-training or fine-tuning of diffusion networks, offering a more accessible and practical solution for the 4D generation task. Extensive experiments demonstrate that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness, setting a new state-of-the-art for 4D generation from diverse inputs, including text, image, and video.\n\n近期，预训练的扩散模型和3D生成技术的进步激发了对4D内容创作的兴趣。然而，实现具有空间-时间一致性的高保真4D生成仍然是一个挑战。在这项工作中，我们提出了STAG4D，一个新颖的框架，结合了预训练的扩散模型和动态3D高斯喷溅技术，用于高保真4D生成。借鉴3D生成技术的灵感，我们利用多视图扩散模型来初始化固定在输入视频帧上的多视图图像，其中视频可以是现实世界捕获的，也可以是通过视频扩散模型生成的。为了确保多视图序列初始化的时间一致性，我们引入了一个简单而有效的融合策略，利用第一帧作为自注意力计算中的时间锚。通过几乎一致的多视图序列，我们随后应用得分蒸馏采样来优化4D高斯点云。4D高斯喷溅特别为生成任务设计，其中提出了一种适应性增密策略，以缓解不稳定的高斯梯度，实现稳健的优化。值得注意的是，所提出的流程不需要任何预训练或微调扩散网络，为4D生成任务提供了一个更加可行和实用的解决方案。广泛的实验表明，我们的方法在渲染质量、空间-时间一致性和生成鲁棒性方面超越了以往的4D生成工作，为从多样化输入（包括文本、图像和视频）生成4D内容设定了新的行业标准。\n"
  },
  {
    "path": "abs/2403.15124.md",
    "content": "### EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting\n\nPrecise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots. However, existing SLAM (Simultaneous Localization and Mapping) methods often struggle to achieve both complete high-quality surgical field reconstruction and efficient computation, restricting their intraoperative applications among endoscopic surgeries. In this paper, we introduce EndoGSLAM, an efficient SLAM approach for endoscopic surgeries, which integrates streamlined Gaussian representation and differentiable rasterization to facilitate over 100 fps rendering speed during online camera tracking and tissue reconstructing. Extensive experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches, showing tremendous potential for endoscopic surgeries.\n\n精确的相机跟踪、高保真的3D组织重建以及实时在线可视化对于体内医学成像设备，如内窥镜和胶囊机器人等，至关重要。然而，现有的SLAM（同时定位与地图构建）方法经常难以同时实现完整的高质量外科手术场景重建和高效的计算，限制了它们在内窥镜手术中的术中应用。在这篇论文中，我们介绍了EndoGSLAM，一种针对内窥镜手术的高效SLAM方法，该方法整合了简化的高斯表示和可微光栅化，以便在在线相机跟踪和组织重建过程中实现超过100 fps的渲染速度。广泛的实验表明，EndoGSLAM在术中可用性和重建质量之间实现了比传统或神经SLAM方法更好的权衡，为内窥镜手术展现了巨大的潜力。\n"
  },
  {
    "path": "abs/2403.15530.md",
    "content": "### Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS that only considers the average gradient magnitude of points from observable views, thereby failing to grow for large Gaussians that are observable for many viewpoints while many of them are only covered in the boundaries. To this end, we propose a novel method, named Pixel-GS, to take into account the number of pixels covered by the Gaussian in each view during the computation of the growth condition. We regard the covered pixel numbers as the weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be prompted. As a result, points within the areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy to scale the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive experiments both qualitatively and quantitatively demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed, on the challenging Mip-NeRF 360 and Tanks & Temples datasets.\n\n3D高斯喷溅（3DGS）在推进实时渲染性能的同时，展示了令人印象深刻的新视角合成结果。然而，它严重依赖于初始点云的质量，导致在初始化点不足的区域出现模糊和针状伪影。这主要归因于3DGS中的点云生长条件只考虑了来自可观察视图的点的平均梯度大小，因此对于许多视点可观察但许多仅在边界覆盖的大高斯，它未能进行生长。为此，我们提出了一种新的方法，名为Pixel-GS，考虑在计算生长条件时，每个视图中高斯覆盖的像素数。我们将覆盖的像素数视为权重，以动态平均不同视图的梯度，使得可以促进大高斯的生长。结果，可以更有效地增长初始化点不足区域内的点，导致更准确和详细的重建。此外，我们提出了一种简单而有效的策略，根据到相机的距离缩放梯度场，以抑制靠近相机的浮点生长。大量的实验，无论是定性的还是定量的，都证明了我们的方法在挑战性的Mip-NeRF 360和Tanks & Temples数据集上，实现了最先进的渲染质量，同时保持了实时渲染速度。\n"
  },
  {
    "path": "abs/2403.15624.md",
    "content": "### Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting\n\nOpen-vocabulary 3D scene understanding presents a significant challenge in computer vision, withwide-ranging applications in embodied agents and augmented reality systems. Previous approaches haveadopted Neural Radiance Fields (NeRFs) to analyze 3D scenes. In this paper, we introduce SemanticGaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our keyidea is distilling pre-trained 2D semantics into 3D Gaussians. We design a versatile projection approachthat maps various 2Dsemantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, withoutthe additional training required by NeRFs. We further build a 3D semantic network that directly predictsthe semantic component from raw 3D Gaussians for fast inference. We explore several applications ofSemantic Gaussians: semantic segmentation on ScanNet-20, where our approach attains a 4.2% mIoU and 4.0%mAcc improvement over prior open-vocabulary scene understanding counterparts; object part segmentation,sceneediting, and spatial-temporal segmentation with better qualitative results over 2D and 3D baselines,highlighting its versatility and effectiveness on supporting diverse downstream tasks.\n\n开放词汇的3D场景理解在计算机视觉中提出了重大挑战，它在体现智能体和增强现实系统中有广泛的应用。之前的方法采用了神经辐射场（NeRFs）来分析3D场景。在这篇论文中，我们介绍了SemanticGaussians，一种基于3D高斯喷溅的新颖开放词汇场景理解方法。我们的关键思想是将预训练的2D语义提炼到3D高斯中。我们设计了一种多功能的投影方法，将来自预训练图像编码器的各种2D语义特征映射到3D高斯的一个新颖的语义组成部分，而无需NeRFs所需的额外训练。我们进一步构建了一个3D语义网络，可以直接从原始3D高斯预测语义组成部分，以快速推理。我们探索了SemanticGaussians的几种应用：在ScanNet-20上的语义分割，我们的方法相比之前的开放词汇场景理解方法，在mIoU上获得了4.2%的提升，在mAcc上获得了4.0%的提升；物体部分分割、场景编辑和空间-时间分割，在定性结果上超过了2D和3D基线，突出了其在支持多样化下游任务上的多功能性和有效性。\n"
  },
  {
    "path": "abs/2403.15704.md",
    "content": "### Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections\n\nNovel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. The photometric variation and transient occluders in those unconstrained images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). However, in the real world, the unique appearance of each tiny point in a scene is determined by its independent intrinsic material attributes and the varying environmental impacts it receives. Inspired by this fact, we propose Gaussian in the wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene and introduces separated intrinsic and dynamic appearance feature for each point, capturing the unchanged scene appearance along with dynamic variation like illumination and weather. Additionally, an adaptive sampling strategy is presented to allow each Gaussian point to focus on the local and detailed information more effectively. We also reduce the impact of transient occluders using a 2D visibility map. More experiments have demonstrated better reconstruction quality and details of GS-W compared to previous methods, with a 1000× increase in rendering speed.\n\n从不受约束的野外图像中合成新视角仍是一个有意义但充满挑战的任务。这些不受约束图像中的光度变化和瞬时遮挡物使得准确重建原始场景变得困难。以往的方法通过在神经辐射场（NeRF）中引入全局外观特征来解决这个问题。然而，在现实世界中，场景中每个微小点的独特外观是由其独立的内在材料属性和它接收的不同环境影响决定的。受此启发，我们提出了一种方法，名为野外中的高斯（GS-W），使用3D高斯点来重建场景，并为每个点引入分离的内在和动态外观特征，捕捉不变的场景外观以及光照和天气等动态变化。此外，我们提出了一种自适应采样策略，允许每个高斯点更有效地关注局部和详细信息。我们还使用2D可见性图减少了瞬时遮挡物的影响。更多实验已经证明，与以往方法相比，GS-W在重建质量和细节方面表现更好，渲染速度提高了1000倍。\n"
  },
  {
    "path": "abs/2403.16095.md",
    "content": "### CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field\n\nRecently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz.\n\n最近，神经辐射场（NeRF）作为3D表示，已被广泛用于密集的同时定位与地图构建（SLAM）。尽管在表面建模和新视角合成方面取得了显著成功，但现有基于NeRF的方法受到其计算密集和耗时的体积渲染流程的阻碍。本文提出了一个高效的密集RGB-D SLAM系统，即CG-SLAM，基于一个新颖的、具有高一致性和几何稳定性的不确定性感知3D高斯场。通过对高斯喷溅的深入分析，我们提出了几种技术，以构建一个适用于跟踪和映射的一致且稳定的3D高斯场。此外，我们提出了一个新颖的深度不确定性模型，以确保在优化过程中选择有价值的高斯原语，从而提高跟踪效率和准确性。在各种数据集上的实验表明，CG-SLAM实现了卓越的跟踪和映射性能，具有高达15 Hz的显著跟踪速度。\n"
  },
  {
    "path": "abs/2403.16292.md",
    "content": "### latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction\n\nWe present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D architecture. Existing methods for generalizable 3D reconstruction either do not enable fast inference of high resolution novel views due to slow volume rendering, or are limited to interpolation of close input views, even in simpler settings with a single central object, where 360-degree generalization is possible. In this work, we combine a regression-based approach with a generative model, moving towards both of these capabilities within the same method, trained purely on readily available real video data. The core of our method are variational 3D Gaussians, a representation that efficiently encodes varying uncertainty within a latent space consisting of 3D feature Gaussians. From these Gaussians, specific instances can be sampled and rendered via efficient Gaussian splatting and a fast, generative decoder network. We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data.\n\n我们提出了latentSplat，一种在3D潜空间中预测语义高斯的方法，这些高斯可以被轻量级生成性2D架构喷溅并解码。现有的通用3D重建方法要么由于体积渲染速度慢而无法快速推断高分辨率新视图，要么限于对接近输入视图的插值，即使在具有单一中心对象的更简单设置中，其中360度概括是可能的。在这项工作中，我们结合了基于回归的方法和生成模型，向在同一方法内同时拥有这两种能力迈进，该方法完全基于现成的真实视频数据进行训练。我们方法的核心是变分3D高斯，这是一种有效编码潜空间中不同不确定性的表示，该潜空间由3D特征高斯组成。从这些高斯中，可以采样特定实例并通过高效的高斯喷溅和快速的生成解码器网络渲染。我们展示了latentSplat在重建质量和概括性方面超越了之前的工作，同时在处理高分辨率数据方面快速且可扩展。\n"
  },
  {
    "path": "abs/2403.16964.md",
    "content": "### GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction\n\nPresenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. Two main requirements lie in rendering and reconstruction. Notably, SOTA rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point/primitive-wise color and neglect the underlying scene geometry. Learning of neural implicit surfaces is sparked from the success of neural rendering. Current works either constrain the distribution of density fields or the shape of primitives, resulting in degraded rendering quality and flaws on the learned scene surfaces. The efficacy of such methods is limited by the inherent constraints of the chosen neural representation, which struggles to capture fine surface details, especially for larger, more intricate scenes. To address these issues, we introduce GSDF, a novel dual-branch architecture that combines the benefits of a flexible and efficient 3D Gaussian Splatting (3DGS) representation with neural Signed Distance Fields (SDF). The core idea is to leverage and enhance the strengths of each branch while alleviating their limitation through mutual guidance and joint supervision. We show on diverse scenes that our design unlocks the potential for more accurate and detailed surface reconstructions, and at the meantime benefits 3DGS rendering with structures that are more aligned with the underlying geometry.\n\n呈现来自多视图图像的3D场景仍然是计算机视觉和计算机图形学中的一个核心且长期的挑战。渲染和重建是其中的两个主要需求。值得注意的是，最先进的渲染质量通常通过神经体积渲染技术实现，这些技术依赖于聚合的点/原语的颜色，而忽视了潜在的场景几何结构。神经隐式表面的学习源自神经渲染的成功。当前的工作要么限制密度场的分布，要么限制原语的形状，导致渲染质量下降和学习到的场景表面上的缺陷。这类方法的效果受到所选神经表示固有约束的限制，特别是对于更大、更复杂的场景，难以捕捉细微的表面细节。为了解决这些问题，我们引入了GSDF，一种新颖的双分支架构，结合了灵活高效的3D高斯喷溅（3DGS）表示与神经符号距离场（SDF）的优势。核心思想是利用并增强每个分支的优点，通过相互指导和联合监督来减轻它们的限制。我们在多样化的场景上展示，我们的设计解锁了更准确、更详细的表面重建潜力，并同时让3DGS渲染受益于与潜在几何结构更一致的结构。\n"
  },
  {
    "path": "abs/2403.17237.md",
    "content": "### DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion\n\nWe present DreamPolisher, a novel Gaussian Splatting based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions. While recent progress on text-to-3D generation methods have been promising, prevailing methods often fail to ensure view-consistency and textural richness. This problem becomes particularly noticeable for methods that work with text input alone. To address this, we propose a two-stage Gaussian Splatting based approach that enforces geometric consistency among views. Initially, a coarse 3D generation undergoes refinement via geometric optimization. Subsequently, we use a ControlNet driven refiner coupled with the geometric consistency term to improve both texture fidelity and overall consistency of the generated 3D asset. Empirical evaluations across diverse textual prompts spanning various object categories demonstrate the efficacy of DreamPolisher in generating consistent and realistic 3D objects, aligning closely with the semantics of the textual instructions.\n\n我们介绍了DreamPolisher，一种基于高斯Splatting的新方法，它采用几何引导，专为从文本描述中学习跨视图一致性和复杂细节而设计。虽然最近在文本到三维生成方法上的进展令人鼓舞，但现有方法常常无法确保视图一致性和纹理丰富性。这一问题对仅与文本输入工作的方法尤为明显。为了解决这个问题，我们提出了一种基于两阶段高斯Splatting的方法，该方法强制执行视图间的几何一致性。最初，通过几何优化对粗糙的三维生成进行精炼。随后，我们使用一个由ControlNet驱动的细化器，结合几何一致性项，来改善生成的三维资产的纹理保真度和整体一致性。跨多个对象类别的多样文本提示的经验评估证明了DreamPolisher在生成与文本指令语义紧密对齐的一致和现实的三维对象方面的有效性。\n"
  },
  {
    "path": "abs/2403.17822.md",
    "content": "### DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing\n\n3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times. However, its performance on scenes commonly seen in indoor datasets is poor due to the lack of geometric constraints during optimization. We extend 3D Gaussian splatting with depth and normal cues to tackle challenging indoor datasets and showcase techniques for efficient mesh extraction, an important downstream application. Specifically, we regularize the optimization procedure with depth information, enforce local smoothness of nearby Gaussians, and use the geometry of the 3D Gaussians supervised by normal cues to achieve better alignment with the true scene geometry. We improve depth estimation and novel view synthesis results over baselines and show how this simple yet effective regularization technique can be used to directly extract meshes from the Gaussian representation yielding more physically accurate reconstructions on indoor scenes.\n\n三维高斯Splatting是一种新颖的可微渲染技术，已在新视角合成结果上达到了最先进的水平，具有高渲染速度和相对较低的训练时间。然而，由于在优化过程中缺乏几何约束，其在常见的室内数据集场景中的表现不佳。我们通过深度和法线线索扩展了三维高斯Splatting，以应对具有挑战性的室内数据集，并展示了有效的网格提取技术，这是一个重要的下游应用。具体来说，我们用深度信息规范化优化程序，强制执行附近高斯的局部平滑性，并利用由法线线索监督的三维高斯的几何性质，以更好地与真实场景几何对齐。我们改善了深度估计和新视角合成结果，超越了基准线，并展示了这种简单而有效的规范化技术如何被用于直接从高斯表示中提取网格，从而在室内场景中获得更物理精确的重建。\n"
  },
  {
    "path": "abs/2403.17888.md",
    "content": "### 2D Gaussian Splatting for Geometrically Accurate Radiance Fields\n\n3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. Our key idea is to collapse the 3D volume into a set of 2D oriented planar Gaussian disks. Unlike 3D Gaussians, 2D Gaussians provide view-consistent geometry while modeling surfaces intrinsically. To accurately recover thin surfaces and achieve stable optimization, we introduce a perspective-accurate 2D splatting process utilizing ray-splat intersection and rasterization. Additionally, we incorporate depth distortion and normal consistency terms to further enhance the quality of the reconstructions. We demonstrate that our differentiable renderer allows for noise-free and detailed geometry reconstruction while maintaining competitive appearance quality, fast training speed, and real-time rendering. Our code will be made publicly available.\n\n三维高斯Splatting（3DGS）最近彻底改变了辐射场重建，实现了高质量的新视角合成和快速渲染速度，而无需烘焙。然而，3DGS由于三维高斯的多视图不一致性质，未能准确表示表面。我们提出了二维高斯Splatting（2DGS），一种新颖的方法，用于从多视图图像中建模和重构几何精确的辐射场。我们的关键思想是将三维体积折叠成一组二维定向的平面高斯盘。与三维高斯不同，二维高斯在本质上提供了视图一致的几何结构，同时建模表面。为了准确恢复薄表面并实现稳定的优化，我们引入了一个利用射线-斑点交叉和光栅化的透视精确的二维Splatting过程。此外，我们还结合了深度失真和法线一致性条款，以进一步提高重建的质量。我们证明了我们的可微分渲染器允许进行无噪声和详细的几何重建，同时保持竞争性的外观质量、快速的训练速度和实时渲染。\n"
  },
  {
    "path": "abs/2403.17898.md",
    "content": "### Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians\n\nThe recent 3D Gaussian splatting (3D-GS) has shown remarkable rendering fidelity and efficiency compared to NeRF-based neural scene representations. While demonstrating the potential for real-time rendering, 3D-GS encounters rendering bottlenecks in large scenes with complex details due to an excessive number of Gaussian primitives located within the viewing frustum. This limitation is particularly noticeable in zoom-out views and can lead to inconsistent rendering speeds in scenes with varying details. Moreover, it often struggles to capture the corresponding level of details at different scales with its heuristic density control operation. Inspired by the Level-of-Detail (LOD) techniques, we introduce Octree-GS, featuring an LOD-structured 3D Gaussian approach supporting level-of-detail decomposition for scene representation that contributes to the final rendering results. Our model dynamically selects the appropriate level from the set of multi-resolution anchor points, ensuring consistent rendering performance with adaptive LOD adjustments while maintaining high-fidelity rendering results.\n\n近期的三维高斯Splatting（3D-GS）与基于NeRF的神经场景表示相比，显示出了显著的渲染保真度和效率。虽然展示了实时渲染的潜力，但3D-GS在处理大型场景中的复杂细节时遇到了渲染瓶颈，这是由于观察视锥内存在过多的高斯原语所致。这个限制在缩小视图时特别明显，并且在细节变化的场景中可能导致渲染速度不一致。此外，它在用启发式密度控制操作捕获不同比例下相应细节水平时常常遇到困难。受到细节层次（LOD）技术的启发，我们引入了Octree-GS，它采用了LOD结构化的三维高斯方法，支持场景表示的细节层次分解，有助于最终渲染结果。我们的模型动态地从多分辨率锚点集合中选择合适的层次，确保在保持高保真渲染结果的同时，通过自适应LOD调整来保证一致的渲染性能。\n"
  },
  {
    "path": "abs/2403.18476.md",
    "content": "### Modeling uncertainty for Gaussian Splatting\n\nWe present Stochastic Gaussian Splatting (SGS): the first framework for uncertainty estimation using Gaussian Splatting (GS). GS recently advanced the novel-view synthesis field by achieving impressive reconstruction quality at a fraction of the computational cost of Neural Radiance Fields (NeRF). However, contrary to the latter, it still lacks the ability to provide information about the confidence associated with their outputs. To address this limitation, in this paper, we introduce a Variational Inference-based approach that seamlessly integrates uncertainty prediction into the common rendering pipeline of GS. Additionally, we introduce the Area Under Sparsification Error (AUSE) as a new term in the loss function, enabling optimization of uncertainty estimation alongside image reconstruction. Experimental results on the LLFF dataset demonstrate that our method outperforms existing approaches in terms of both image rendering quality and uncertainty estimation accuracy. Overall, our framework equips practitioners with valuable insights into the reliability of synthesized views, facilitating safer decision-making in real-world applications.\n\n我们提出了随机高斯喷溅（SGS）：这是第一个使用高斯喷溅（GS）进行不确定性估计的框架。GS最近通过以更低的计算成本实现令人印象深刻的重建质量，推进了新视角合成领域的发展，相比之下，神经辐射场（NeRF）的计算成本更高。然而，与后者不同的是，它仍然缺乏提供与其输出相关的置信度信息的能力。为了解决这一局限性，在本文中，我们引入了一种基于变分推理的方法，无缝地将不确定性预测集成到GS的常见渲染流程中。此外，我们引入了稀疏化误差下的面积（AUSE）作为损失函数中的一个新项，使得不确定性估计的优化与图像重建同时进行。在LLFF数据集上的实验结果表明，我们的方法在图像渲染质量和不确定性估计精度方面均优于现有方法。总的来说，我们的框架为从业者提供了有关合成视图可靠性的宝贵见解，有助于在现实世界的应用中做出更安全的决策。\n"
  },
  {
    "path": "abs/2403.18784.md",
    "content": "### SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface\n\nWe present SplatFace, a novel Gaussian splatting framework designed for 3D human face reconstruction without reliance on accurate pre-determined geometry. Our method is designed to simultaneously deliver both high-quality novel view rendering and accurate 3D mesh reconstructions. We incorporate a generic 3D Morphable Model (3DMM) to provide a surface geometric structure, making it possible to reconstruct faces with a limited set of input images. We introduce a joint optimization strategy that refines both the Gaussians and the morphable surface through a synergistic non-rigid alignment process. A novel distance metric, splat-to-surface, is proposed to improve alignment by considering both the Gaussian position and covariance. The surface information is also utilized to incorporate a world-space densification process, resulting in superior reconstruction quality. Our experimental analysis demonstrates that the proposed method is competitive with both other Gaussian splatting techniques in novel view synthesis and other 3D reconstruction methods in producing 3D face meshes with high geometric precision.\n\n我们提出了SplatFace，这是一个新颖的高斯喷溅框架，专为3D人脸重建设计，而不依赖于精确预定的几何形状。我们的方法旨在同时提供高质量的新视角渲染和精确的3D网格重建。我们结合了一个通用的3D可变形模型（3DMM）来提供表面几何结构，使得仅使用有限的输入图像就能重建面孔成为可能。我们引入了一种联合优化策略，通过一种协同的非刚性对齐过程，细化高斯和可变形表面。提出了一种新的距离度量方法，即喷溅到表面，通过考虑高斯位置和协方差来改善对齐。表面信息也被用来结合一个世界空间密集化过程，从而实现更优的重建质量。我们的实验分析表明，所提出的方法在新视角合成方面与其他高斯喷溅技术竞争，在生成高几何精度的3D面部网格方面与其他3D重建方法竞争。\n"
  },
  {
    "path": "abs/2403.18795.md",
    "content": "### Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction\n\nWe tackle the challenge of efficiently reconstructing a 3D asset from a single image with growing demands for automated 3D content creation pipelines. Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches encounter practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model from single-view images, emphasizing two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) Backbone design: introducing a Mamba-based sequential network that facilitates context-dependent reasoning and linear scalability with the sequence (token) length, accommodating a substantial number of Gaussians. Gamba incorporates significant advancements in data preprocessing, regularization design, and training methodologies. We assessed Gamba against existing optimization-based and feed-forward 3D generation approaches using the real-world scanned OmniObject3D dataset. Here, Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed, approximately 0.6 second on a single NVIDIA A100 GPU.\n\n我们应对的挑战是如何从单张图片高效重建3D资产，随着自动化3D内容创建流程的需求不断增长。以往的方法主要依赖于得分蒸馏采样（SDS）和神经辐射场（NeRF）。尽管这些方法取得了显著的成功，但由于优化时间长和内存使用量大，这些方法遇到了实际限制。在这份报告中，我们介绍了Gamba，一种从单视图图像到端到端摊销的3D重建模型，强调两个主要见解：（1）3D表示：利用大量3D高斯进行高效的3D高斯喷溅过程；（2）骨干网络设计：引入基于Mamba的序列网络，该网络促进了依赖于上下文的推理，并随着序列（令牌）长度线性扩展，能够容纳大量的高斯。Gamba在数据预处理、正则化设计和训练方法论方面取得了重大进步。我们使用真实世界扫描的OmniObject3D数据集，将Gamba与现有的基于优化和前馈的3D生成方法进行了评估。这里，Gamba在质量和数量上都展现了竞争性的生成能力，同时在单个NVIDIA A100 GPU上达到了约0.6秒的显著速度。\n"
  },
  {
    "path": "abs/2403.19495.md",
    "content": "### CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians\n\nThe field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constraint the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.\n\n近几年来，从图像到3D重建的领域迅速发展，首先是神经辐射场（NeRF）的引入，最近则是3D高斯喷溅（3DGS）。后者在训练和推理速度以及重建质量方面，相较于NeRF有显著的优势。尽管3DGS在密集输入图像中表现良好，但在极其稀疏输入图像（例如，3张图像）的更具挑战性的设置中，类似于无结构点云的表示很快就会过度拟合，从新的视角看上去像是一团乱麻。为了解决这个问题，我们提出了正则化优化和基于深度的初始化。我们的关键思想是引入一个可以在2D图像空间中控制的结构化高斯表示。然后，我们约束高斯，特别是它们的位置，并防止它们在优化过程中独立移动。具体来说，我们通过一个隐式卷积解码器和总变分损失分别引入单视图和多视图约束。通过对高斯引入连贯性，我们进一步通过基于流的损失函数约束优化。为了支持我们的正则化优化，我们提出了一种使用每个输入视图处的单目深度估计来初始化高斯的方法。我们在多种场景上与最新的稀疏视图基于NeRF的方法相比，展示了显著的改进。\n"
  },
  {
    "path": "abs/2403.19615.md",
    "content": "### SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing\n\nIn this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting needs modifying the training procedure of Gaussian splatting, our method functions at test-time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-alising performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue using 3D smoothing and 2D Mip filters, which are unfortunately not aware of testing frequency. In this work, we show that a 2D scale-adaptive filter that is informed of testing frequency can effectively match the Gaussian scale, thus making the Gaussian primitive distribution remain consistent across different testing frequencies. When scale inconsistency is eliminated, sampling rates smaller than the scene frequency result in conventional jaggedness, and we propose to integrate the projected 2D Gaussian within each pixel during testing. This integration is actually a limiting case of super-sampling, which significantly improves anti-aliasing performance over vanilla Gaussian Splatting. Through extensive experiments using various settings and both bounded and unbounded scenes, we show SA-GS performs comparably with or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated.\n\n在本文中，我们提出了一种适应尺度的抗锯齿高斯喷溅方法（SA-GS）。虽然最先进的方法Mip-Splatting需要修改高斯喷溅的训练程序，但我们的方法在测试时运行且无需训练。具体来说，SA-GS可以作为插件应用于任何预训练的高斯喷溅场，显著提高场的抗锯齿性能。核心技术是在测试时对每个高斯应用2D适应尺度的滤波器。如Mip-Splatting所指出的，观察不同频率的高斯导致训练和测试期间高斯尺度的不匹配。Mip-Splatting通过使用3D平滑和2D Mip滤波器来解决这个问题，不幸的是，这些滤波器不知道测试频率。在这项工作中，我们展示了一个知道测试频率的2D适应尺度滤波器可以有效地匹配高斯尺度，从而使高斯原始分布在不同测试频率下保持一致。当消除了尺度不一致性时，小于场景频率的采样率会导致传统的锯齿状，我们建议在测试中对每个像素内的投影2D高斯进行积分。这种积分实际上是超采样的一种极限情况，显著提高了抗锯齿性能，超过了原始高斯喷溅。通过使用各种设置以及有界和无界场景进行广泛的实验，我们展示了SA-GS的性能与Mip-Splatting相当或更好。注意，超采样和积分仅在我们的适应尺度滤波启用时有效。\n"
  },
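To make the scale-adaptation idea above concrete, here is a small NumPy sketch that clamps a projected Gaussian's 2D screen footprint according to the test-time pixel size. The `min_footprint` floor and the eigendecomposition route are illustrative assumptions, not the paper's exact filter.

```python
import numpy as np

def scale_adaptive_2d_cov(cov2d: np.ndarray, pixel_size: float,
                          min_footprint: float = 0.3) -> np.ndarray:
    """Test-time 2D scale adaptation in the spirit of SA-GS (a sketch).

    cov2d: (2, 2) screen-space covariance of one projected Gaussian.
    pixel_size: size of one pixel at the current test resolution/focal
        length -- this is how "testing frequency" enters the filter in
        this simplified stand-in for the paper's derivation.
    Ensures each Gaussian's footprint never falls below a fraction of a
    pixel, keeping its apparent scale consistent across sampling rates.
    """
    floor = (min_footprint * pixel_size) ** 2
    eigvals, eigvecs = np.linalg.eigh(cov2d)
    eigvals = np.maximum(eigvals, floor)          # clamp the footprint
    return eigvecs @ np.diag(eigvals) @ eigvecs.T
```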
  {
    "path": "abs/2403.19632.md",
    "content": "### GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond\n\nWe present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdoor scenes and improves novel view synthesis. Finally, we propose Gaussian Splatting Surface Reconstruction (GauS), a novel render-then-fuse approach for high-fidelity mesh reconstruction from 3DGS inputs without fine-tuning. Overall, our GauStudio framework, hybrid representation, and GauS approach enhance 3DGS modeling and rendering capabilities, enabling higher-quality novel view synthesis and surface reconstruction.\n\n我们提出了GauStudio，一个用于建模3D高斯喷溅（3DGS）的新颖模块化框架，为用户提供了标准化的、即插即用的组件，以便用户轻松自定义和实现3DGS流程。在我们的框架支持下，我们提出了一种混合高斯表示，包括前景和天空球背景模型。实验表明，这种表示减少了无界室外场景中的伪影，并改善了新视角合成。最后，我们提出了高斯喷溅表面重建（GauS），这是一种新颖的先渲染后融合方法，用于从3DGS输入中重建高保真网格，无需微调。总的来说，我们的GauStudio框架、混合表示和GauS方法增强了3DGS建模和渲染能力，实现了更高质量的新视角合成和表面重建。\n"
  },
  {
    "path": "abs/2403.19655.md",
    "content": "### GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling\n\n3D Gaussian Splatting (GS) have achieved considerable improvement over Neural Radiance Fields in terms of 3D fitting fidelity and rendering speed. However, this unstructured representation with scattered Gaussians poses a significant challenge for generative modeling. To address the problem, we introduce GaussianCube, a structured GS representation that is both powerful and efficient for generative modeling. We achieve this by first proposing a modified densification-constrained GS fitting algorithm which can yield high-quality fitting results using a fixed number of free Gaussians, and then re-arranging the Gaussians into a predefined voxel grid via Optimal Transport. The structured grid representation allows us to use standard 3D U-Net as our backbone in diffusion generative modeling without elaborate designs. Extensive experiments conducted on ShapeNet and OmniObject3D show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a powerful and versatile 3D representation.\n\n3D高斯喷溅（GS）在3D拟合保真度和渲染速度方面相较于神经辐射场取得了显著改进。然而，这种散布的高斯的无结构表示对于生成模型构成了重大挑战。为了解决这个问题，我们介绍了GaussianCube，一种结构化的GS表示，对于生成模型既强大又高效。我们首先提出了一种修改后的密度约束GS拟合算法，该算法使用固定数量的自由高斯可以产生高质量的拟合结果，然后通过最优传输将高斯重新排列到预定义的体素网格中。结构化网格表示允许我们在扩散生成模型中使用标准的3D U-Net作为我们的骨干网络，无需复杂设计。在ShapeNet和OmniObject3D上进行的广泛实验表明，我们的模型在质量和数量上都实现了最先进的生成结果，突显了GaussianCube作为一种强大且多用途的3D表示的潜力。\n"
  },
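The Optimal Transport re-arrangement above can be illustrated with a toy assignment problem: map N fitted Gaussian centers one-to-one onto an N-cell voxel grid while minimizing total squared distance. A minimal SciPy sketch using exact Hungarian assignment (the paper may use a more scalable OT solver):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def arrange_on_grid(centers: np.ndarray, res: int) -> np.ndarray:
    """Assign each Gaussian center to a unique voxel cell by solving an
    assignment problem, as a toy stand-in for the Optimal Transport
    re-arrangement described above.

    centers: (N, 3) Gaussian positions, assumed normalized to [0, 1];
    res**3 must equal N for a one-to-one assignment.
    Returns an (N,) array: the voxel index assigned to each Gaussian.
    """
    g = (np.arange(res) + 0.5) / res
    voxels = np.stack(np.meshgrid(g, g, g, indexing="ij"), -1).reshape(-1, 3)
    # Cost = squared Euclidean distance between every center and cell.
    cost = ((centers[:, None, :] - voxels[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return cols[np.argsort(rows)]

assignment = arrange_on_grid(np.random.rand(4 ** 3, 3), res=4)
```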
  {
    "path": "abs/2403.20032.md",
    "content": "### HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes\n\nThe rapid growth of 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, enabling real-time production of high-quality renderings. However, the previous 3DGS-based methods have limitations in urban scenes due to reliance on initial Structure-from-Motion(SfM) points and difficulties in rendering distant, sky and low-texture areas. To overcome these challenges, we propose a hybrid optimization method named HO-Gaussian, which combines a grid-based volume with the 3DGS pipeline. HO-Gaussian eliminates the dependency on SfM point initialization, allowing for rendering of urban scenes, and incorporates the Point Densitification to enhance rendering quality in problematic regions during training. Furthermore, we introduce Gaussian Direction Encoding as an alternative for spherical harmonics in the rendering pipeline, which enables view-dependent color representation. To account for multi-camera systems, we introduce neural warping to enhance object consistency across different cameras. Experimental results on widely used autonomous driving datasets demonstrate that HO-Gaussian achieves photo-realistic rendering in real-time on multi-camera urban datasets.\n\n三维高斯喷涂（3DGS）的迅速发展革新了神经渲染技术，实现了高质量渲染的实时生产。然而，以往基于3DGS的方法在城市场景中存在局限性，这是因为它们依赖于初始的结构从运动（SfM）点，并且在渲染远处、天空和低纹理区域时面临困难。为了克服这些挑战，我们提出了一种名为HO-Gaussian的混合优化方法，该方法结合了基于网格的体积和3DGS流程。HO-Gaussian消除了对SfM点初始化的依赖，使得能够渲染城市场景，并在训练期间通过引入点密度化来提高问题区域的渲染质量。此外，我们引入了高斯方向编码作为渲染流程中球形谐波的替代品，这使得视点依赖的颜色表示成为可能。为了适应多摄像机系统，我们引入了神经扭曲技术以增强不同摄像机之间物体的一致性。在广泛使用的自动驾驶数据集上的实验结果表明，HO-Gaussian能够在多摄像机城市数据集上实现实时的照片级渲染。\n"
  },
  {
    "path": "abs/2403.20079.md",
    "content": "### SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior\n\nNovel View Synthesis (NVS) for street scenes play a critical role in the autonomous driving simulation. The current mainstream technique to achieve it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although thrilling progress has been made, when handling street scenes, current methods struggle to maintain rendering quality at the viewpoint that deviates significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as condition, meanwhile exploiting depth data from LiDAR point clouds to supply additional spatial information. Then we apply the Diffusion Model to regularize the 3DGS at unseen views during training. Experimental results validate the effectiveness of our method compared with current state-of-the-art models, and demonstrate its advance in rendering images from broader views.\n\n街景的新视角合成（NVS）在自动驾驶模拟中扮演着至关重要的角色。目前实现它的主流技术是神经渲染，例如神经辐射场（NeRF）和三维高斯喷涂（3DGS）。尽管取得了令人振奋的进展，但在处理街景时，当前方法在远离训练视点的视角维持渲染质量方面存在困难。这一问题源于由移动车辆上的固定摄像头捕获的稀疏训练视图。为了解决这个问题，我们提出了一种新颖的方法，通过利用扩散模型的先验以及补充的多模态数据来增强3DGS的能力。具体来说，我们首先通过添加来自相邻帧的图像作为条件来微调扩散模型，同时利用来自激光雷达点云的深度数据提供额外的空间信息。然后我们在训练中将扩散模型应用于未见视图的3DGS正则化。实验结果验证了我们方法与当前最先进模型相比的有效性，并展示了其在从更广视角渲染图像方面的进步。\n"
  },
  {
    "path": "abs/2403.20159.md",
    "content": "### HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes\n\nOnline dense mapping of urban scenes forms a fundamental cornerstone for scene understanding and navigation of autonomous vehicles. Recent advancements in mapping methods are mainly based on NeRF, whose rendering speed is too slow to meet online requirements. 3D Gaussian Splatting (3DGS), with its rendering speed hundreds of times faster than NeRF, holds greater potential in online dense mapping. However, integrating 3DGS into a street-view dense mapping framework still faces two challenges, including incomplete reconstruction due to the absence of geometric information beyond the LiDAR coverage area and extensive computation for reconstruction in large urban scenes. To this end, we propose HGS-Mapping, an online dense mapping framework in unbounded large-scale scenes. To attain complete construction, our framework introduces Hybrid Gaussian Representation, which models different parts of the entire scene using Gaussians with distinct properties. Furthermore, we employ a hybrid Gaussian initialization mechanism and an adaptive update method to achieve high-fidelity and rapid reconstruction. To the best of our knowledge, we are the first to integrate Gaussian representation into online dense mapping of urban scenes. Our approach achieves SOTA reconstruction accuracy while only employing 66% number of Gaussians, leading to 20% faster reconstruction speed.\n\n在线密集映射城市场景构成了自动驾驶车辆场景理解和导航的基本基石。最近在映射方法上的进步主要基于NeRF，其渲染速度太慢，无法满足在线要求。三维高斯喷涂（3DGS），由于其渲染速度比NeRF快数百倍，因此在在线密集映射中拥有更大的潜力。然而，将3DGS整合到街景密集映射框架中仍面临两大挑战，包括由于缺乏LiDAR覆盖区域之外的几何信息而导致的重建不完整，以及在大型城市场景中进行重建的大量计算。为此，我们提出了HGS-Mapping，一个在无边界大规模场景中的在线密集映射框架。为了实现完整构建，我们的框架引入了混合高斯表示法，该方法使用具有不同属性的高斯模型来模拟整个场景的不同部分。此外，我们采用混合高斯初始化机制和自适应更新方法来实现高保真和快速重建。据我们所知，我们是第一个将高斯表示法整合到城市场景在线密集映射中的。我们的方法在仅使用66%的高斯数量的情况下，实现了SOTA的重建精度，重建速度提高了20%。\n"
  },
  {
    "path": "abs/2403.20275.md",
    "content": "### Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces\n\nTouch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view synthesis. Our method optimises 3D Gaussian primitives to accurately model the object's geometry at points of contact. By creating a framework that decreases the transmittance at touch locations, we achieve a refined surface reconstruction, ensuring a uniformly smooth depth map. Touch is particularly useful when considering non-Lambertian objects (e.g. shiny or reflective surfaces) since contemporary methods tend to fail to reconstruct with fidelity specular highlights. By combining vision and tactile sensing, we achieve more accurate geometry reconstructions with fewer images than prior methods. We conduct evaluation on objects with glossy and reflective surfaces and demonstrate the effectiveness of our approach, offering significant improvements in reconstruction quality.\n\n触觉和视觉相辅相成，共同增强了我们理解世界的能力。从研究的角度来看，触觉和视觉的结合是一个探索不足且充满有趣挑战的问题。为此，我们提出了一种新颖的方法——触觉信息三维高斯喷涂（Tactile-Informed 3DGS），该方法将触觉数据（局部深度图）与多视角视觉数据结合起来，以实现表面重建和新视角合成。我们的方法优化了三维高斯原语以精确地模拟接触点处对象的几何形状。通过创建一个在触觉位置降低透射率的框架，我们实现了精细的表面重建，确保了深度图的均匀平滑。当考虑非朗伯体（如光滑或反射表面）的对象时，触觉尤为有用，因为现有方法倾向于无法高保真地重建镜面高光。通过结合视觉和触觉感知，我们使用比先前方法更少的图像实现了更准确的几何重建。我们对具有光滑和反射表面的对象进行了评估，并展示了我们方法的有效性，提供了重建质量的显著改进。\n"
  },
  {
    "path": "abs/2403.20309.md",
    "content": "### InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds\n\nWhile novel view synthesis (NVS) has made substantial progress in 3D computer vision, it typically requires an initial estimation of camera intrinsics and extrinsics from dense viewpoints. This pre-processing is usually conducted via a Structure-from-Motion (SfM) pipeline, a procedure that can be slow and unreliable, particularly in sparse-view scenarios with insufficient matched features for accurate reconstruction. In this work, we integrate the strengths of point-based representations (e.g., 3D Gaussian Splatting, 3D-GS) with end-to-end dense stereo models (DUSt3R) to tackle the complex yet unresolved issues in NVS under unconstrained settings, which encompasses pose-free and sparse view challenges. Our framework, InstantSplat, unifies dense stereo priors with 3D-GS to build 3D Gaussians of large-scale scenes from sparseview & pose-free images in less than 1 minute. Specifically, InstantSplat comprises a Coarse Geometric Initialization (CGI) module that swiftly establishes a preliminary scene structure and camera parameters across all training views, utilizing globally-aligned 3D point maps derived from a pre-trained dense stereo pipeline. This is followed by the Fast 3D-Gaussian Optimization (F-3DGO) module, which jointly optimizes the 3D Gaussian attributes and the initialized poses with pose regularization. Experiments conducted on the large-scale outdoor Tanks & Temples datasets demonstrate that InstantSplat significantly improves SSIM (by 32%) while concurrently reducing Absolute Trajectory Error (ATE) by 80%. These establish InstantSplat as a viable solution for scenarios involving posefree and sparse-view conditions.\n\n虽然新视角合成（NVS）在三维计算机视觉领域取得了实质性进展，但它通常需要从密集视角估计摄像机内部和外部参数。这一预处理通常通过结构从运动（SfM）流程进行，这一程序可能既缓慢又不可靠，尤其是在稀疏视图场景中，由于缺乏足够的匹配特征而无法进行准确重建。在这项工作中，我们将基于点的表示（例如，三维高斯喷涂，3D-GS）的优势与端到端密集立体模型（DUSt3R）结合起来，以解决NVS在无约束设置下的复杂未解决问题，包括无姿态和稀疏视图挑战。我们的框架InstantSplat，将密集立体先验与3D-GS统一起来，从稀疏视图和无姿态图像中构建大规模场景的三维高斯，不到1分钟。具体来说，InstantSplat包括一个粗略几何初始化（CGI）模块，该模块利用来自预训练密集立体流程的全局对齐的三维点图，迅速建立起初步的场景结构和所有训练视图中的摄像机参数。这之后是快速三维高斯优化（F-3DGO）模块，它联合优化三维高斯属性和初始化的姿态，并进行姿态正则化。在大规模室外Tanks & Temples数据集上进行的实验表明，InstantSplat显著提高了SSIM（32%）的同时，将绝对轨迹误差（ATE）减少了80%。这证明了InstantSplat对于无姿态和稀疏视图条件下的场景是一个可行的解决方案。\n"
  },
  {
    "path": "abs/2404.00409.md",
    "content": "### 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting\n\nIn this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. First, we introduce a differentiable SDF-to-opacity transformation function that converts SDF values into corresponding Gaussians' opacities. This function connects the SDF and 3D Gaussians, allowing for unified optimization and enforcing surface constraints on the 3D Gaussians. During learning, optimizing the 3D Gaussians provides supervisory signals for SDF learning, enabling the reconstruction of intricate details. However, this only provides sparse supervisory signals to the SDF at locations occupied by Gaussians, which is insufficient for learning a continuous SDF. Then, to address this limitation, we incorporate volumetric rendering and align the rendered geometric attributes (depth, normal) with those derived from 3D Gaussians. This consistency regularization introduces supervisory signals to locations not covered by discrete 3D Gaussians, effectively eliminating redundant surfaces outside the Gaussian sampling range. Our extensive experimental results demonstrate that our 3DGSR method enables high-quality 3D surface reconstruction while preserving the efficiency and rendering quality of 3DGS. Besides, our method competes favorably with leading surface reconstruction techniques while offering a more efficient learning process and much better rendering qualities.\n\n在本文中，我们提出了一种使用三维高斯喷溅（3DGS）的隐式表面重建方法，即3DGSR，它允许在继承3DGS的高效率和渲染质量的同时，实现具有复杂细节的准确三维重建。关键洞察是在三维高斯中融合隐式符号距离场（SDF），使其能够被对齐并共同优化。首先，我们引入了一个可微的SDF到不透明度转换函数，该函数将SDF值转换为对应高斯的不透明度。这个函数连接了SDF和三维高斯，允许统一优化，并对三维高斯施加表面约束。在学习过程中，优化三维高斯提供了SDF学习的监督信号，使得能够重建复杂的细节。然而，这仅在高斯占据的位置为SDF提供了稀疏的监督信号，这对于学习连续的SDF是不足的。然后，为了解决这一限制，我们融合了体积渲染，并将渲染的几何属性（深度、法线）与由三维高斯派生的属性对齐。这种一致性正则化为未被离散三维高斯覆盖的位置引入了监督信号，有效地消除了高斯采样范围外的冗余表面。我们广泛的实验结果表明，我们的3DGSR方法能够在保持3DGS的效率和渲染质量的同时，实现高质量的三维表面重建。此外，我们的方法与领先的表面重建技术相比具有竞争优势，同时提供了更高效的学习过程和更好的渲染质量。\n"
  },
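The abstract's key device, a differentiable SDF-to-opacity transformation, can be sketched in a few lines. The Gaussian-shaped transform below is one plausible form chosen for illustration; the paper defines its own function and sharpness schedule.

```python
import torch

def sdf_to_opacity(sdf: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """A differentiable SDF-to-opacity transform (one plausible form; the
    paper defines its own function). Opacity peaks at the zero level set
    and decays away from the surface, so optimizing Gaussian opacities
    pushes their SDF values toward zero.
    """
    return torch.exp(-(sdf / beta) ** 2)

sdf = torch.randn(1000, requires_grad=True)   # SDF sampled at Gaussian centers
beta = torch.tensor(0.1)                      # sharpness; learnable in practice
opacity = sdf_to_opacity(sdf, beta)
opacity.sum().backward()                      # gradients flow back into the SDF
```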
  {
    "path": "abs/2404.00923.md",
    "content": "### MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements\n\nSimultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map.\n\n同时定位与地图构建对于位置跟踪和场景理解至关重要。基于三维高斯的地图表示方法能够使用多个摆放的摄像头实现场景的真实感重建和实时渲染。我们首次展示了使用三维高斯进行地图表示，结合未摆放摄像头图像和惯性测量，可以实现准确的SLAM。我们的方法，MM3DGS，通过使渲染更快、具有尺度意识和改进的轨迹跟踪，解决了之前基于神经辐射场表示的限制。我们的框架利用关键帧进行映射和跟踪，使用损失函数整合了从预集成的惯性测量、深度估计和光度渲染质量度量中得到的相对姿态变换。我们还发布了一个多模态数据集，UT-MM，该数据集由装备有摄像头和惯性测量单元的移动机器人收集。在数据集的几个场景上进行的实验评估显示，与当前3DGS SLAM的最新技术相比，MM3DGS在跟踪上实现了3倍的改进，在光度渲染质量上改进了5%，同时允许实时渲染高分辨率的密集三维地图。\n"
  },
  {
    "path": "abs/2404.01133.md",
    "content": "### CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians\n\nThe advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS). However, effectively training large-scale 3DGS and rendering it in real-time across various scales remains challenging. This paper introduces CityGaussian (CityGS), which employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strategy for efficient large-scale 3DGS training and rendering. Specifically, the global scene prior and adaptive training data selection enables efficient training and seamless fusion. Based on fused Gaussian primitives, we generate different detail levels through compression, and realize fast rendering across various scales through the proposed block-wise detail levels selection and aggregation strategy. Extensive experimental results on large-scale scenes demonstrate that our approach attains state-of-theart rendering quality, enabling consistent real-time rendering of largescale scenes across vastly different scales.\n\n三维高斯喷溅（3DGS）的进步显著推动了实时三维场景重建和新颖视角合成的发展。然而，有效地训练大规模3DGS并在各种规模上实时渲染仍然具有挑战性。本文介绍了CityGaussian（CityGS），它采用了一种新颖的分而治之训练方法和细节级别（LoD）策略，以高效训练和渲染大规模3DGS。具体来说，全局场景先验和自适应训练数据选择使得训练高效且能无缝融合。基于融合的高斯原始体，我们通过压缩生成不同的细节级别，并通过提出的块状细节级别选择和聚合策略，实现在各种规模上的快速渲染。广泛的实验结果在大规模场景上展示了我们的方法达到了最先进的渲染质量，使得能够在极其不同的规模上实现大规模场景的一致实时渲染。\n"
  },
  {
    "path": "abs/2404.01168.md",
    "content": "### Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. Our code will be made publicly available for reproducible research.\n\n三维高斯喷溅（3DGS）在三维场景重建和新颖视角合成领域标志着一个重大突破。然而，像其前身神经辐射场（NeRF）一样，3DGS在准确模拟物理反射方面存在困难，特别是在现实世界场景中无处不在的镜子反射。这种疏忽错误地将反射视为物理存在的独立实体，导致重建不准确和反射属性在不同视点间的不一致。为了解决这个关键挑战，我们引入了Mirror-3DGS，一个创新的渲染框架，旨在掌握镜面几何形态和反射的复杂性，为真实地描绘镜面反射铺平了道路。通过巧妙地将镜面属性融入3DGS，并利用平面镜成像原理，Mirror-3DGS创造了一个镜面视点，从镜子后面进行观察，增强了场景渲染的真实感。广泛的评估，涵盖合成和现实世界场景，展示了我们的方法在实时渲染新颖视角时，在具有挑战性的镜面区域内，提高了保真度，超越了最先进的Mirror-NeRF。\n"
  },
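Plane-mirror imaging, as used above to craft the mirrored viewpoint, reduces to reflecting the camera pose across the mirror plane. A geometric sketch (the handedness fix-up at the end is a rendering-convention detail and an assumption on our part):

```python
import numpy as np

def mirror_camera(c2w: np.ndarray, n: np.ndarray, d: float) -> np.ndarray:
    """Reflect a camera-to-world pose across the mirror plane n.x = d
    (plane-mirror imaging; a geometric sketch of the mirrored-viewpoint
    idea described above, not the paper's exact construction).

    c2w: (4, 4) camera-to-world matrix; n: plane normal.
    """
    n = n / np.linalg.norm(n)
    H = np.eye(4)
    H[:3, :3] -= 2.0 * np.outer(n, n)   # Householder reflection of directions
    H[:3, 3] = 2.0 * d * n              # translation part: x' = x - 2(n.x - d)n
    mirrored = H @ c2w
    # A reflection flips handedness; flip one camera axis to restore it.
    mirrored[:3, 0] *= -1.0
    return mirrored
```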
  {
    "path": "abs/2404.01223.md",
    "content": "### Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing\n\nScene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, that enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline, to illustrate the challenge and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties and semantics grounded on natural language.\n\n使用三维高斯原始体的场景表示在建模静态和动态三维场景的外观方面取得了优异的成果。然而，许多图形应用程序要求能够操纵对象的外观和物理属性。我们介绍了特征喷溅（Feature Splatting），这是一种将基于物理的动态场景合成与基于自然语言的视觉语言基础模型中的丰富语义统一起来的方法。我们的第一个贡献是一种方法，能够将高质量的、以对象为中心的视觉-语言特征提炼到三维高斯中，这使得使用文本查询进行半自动场景分解成为可能。我们的第二个贡献是一种方法，能够使用基于粒子的模拟器从本来静态的场景中合成基于物理的动力学，其中材料属性通过文本查询自动分配。我们对在此流程中使用的关键技术进行了剖析，以说明使用携带特征的三维高斯作为外观、几何、材料属性和基于自然语言的语义的统一格式所面临的挑战和机遇。\n"
  },
  {
    "path": "abs/2404.01810.md",
    "content": "### Surface Reconstruction from Gaussian Splatting via Novel Stereo Views\n\nThe Gaussian splatting for radiance field rendering method has recently emerged as an efficient approach for accurate scene representation. It optimizes the location, size, color, and shape of a cloud of 3D Gaussian elements to visually match, after projection, or splatting, a set of given images taken from various viewing directions. And yet, despite the proximity of Gaussian elements to the shape boundaries, direct surface reconstruction of objects in the scene is a challenge.\nWe propose a novel approach for surface reconstruction from Gaussian splatting models. Rather than relying on the Gaussian elements' locations as a prior for surface reconstruction, we leverage the superior novel-view synthesis capabilities of 3DGS. To that end, we use the Gaussian splatting model to render pairs of stereo-calibrated novel views from which we extract depth profiles using a stereo matching method. We then combine the extracted RGB-D images into a geometrically consistent surface. The resulting reconstruction is more accurate and shows finer details when compared to other methods for surface reconstruction from Gaussian splatting models, while requiring significantly less compute time compared to other surface reconstruction methods.\nWe performed extensive testing of the proposed method on in-the-wild scenes, taken by a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the proposed method on the Tanks and Temples benchmark, and it has surpassed the current leading method for surface reconstruction from Gaussian splatting models.\n\n高斯喷溅用于辐射场渲染方法最近已经作为一种高效的准确场景表示方法而出现。它优化了一团三维高斯元素的位置、大小、颜色和形状，以便在投影或喷溅后，从各个观察方向拍摄的一组给定图像视觉上匹配。然而，尽管高斯元素接近形状边界，直接重建场景中对象的表面仍是一项挑战。\n我们提出了一种从高斯喷溅模型重建表面的新方法。我们不是依赖高斯元素的位置作为表面重建的先验，而是利用3DGS卓越的新视角合成能力。为此，我们使用高斯喷溅模型渲染一对经过立体校准的新视角，从中我们使用立体匹配方法提取深度轮廓。然后，我们将提取的RGB-D图像合并成一个几何上一致的表面。与其他从高斯喷溅模型进行表面重建的方法相比，结果重建更加准确，展示了更细致的细节，同时与其他表面重建方法相比，所需的计算时间显著减少。\n我们对提出的方法进行了广泛的测试，这些测试在野外场景中进行，由智能手机拍摄，展示了其卓越的重建能力。此外，我们还在Tanks and Temples基准测试上测试了提出的方法，它已经超过了当前领先的从高斯喷溅模型进行表面重建的方法。\n"
  },
  {
    "path": "abs/2404.02410.md",
    "content": "### TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes\n\nMost 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes LiDAR data capabilities but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data, to enrich the properties of 3D Gaussians for splatting. 3D Gaussian's properties are not only initialized in alignment with the 3D mesh which provides more completed 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by learning of a robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and nuScenes Dataset validate our method's state-of-the-art (SOTA) performance. Utilizing a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS in resolution of 1920x1280 (Waymo), and 120 FPS in resolution of 1600x900 (nuScenes) in urban scenarios.\n\n大多数基于三维高斯喷溅（3D-GS）的城市场景方法直接使用3D激光雷达点初始化三维高斯，这不仅没有充分利用激光雷达数据的能力，也忽视了融合激光雷达与相机数据的潜在优势。在本文中，我们设计了一种新颖的紧密耦合激光雷达-相机高斯喷溅（TCLC-GS），充分利用了激光雷达和相机传感器的综合优势，实现快速、高质量的三维重建和新视角RGB/深度合成。TCLC-GS设计了一种从激光雷达-相机数据派生的混合显式（着色的三维网格）和隐式（层次化八叉树特征）三维表示，以丰富用于喷溅的三维高斯的属性。三维高斯的属性不仅与提供更完整的三维形状和颜色信息的三维网格对齐初始化，而且还通过检索的八叉树隐式特征赋予了更广泛的上下文信息。在高斯喷溅优化过程中，三维网格提供密集的深度信息作为监督，这通过学习鲁棒的几何形状增强了训练过程。在Waymo Open数据集和nuScenes数据集上进行的全面评估验证了我们方法的最先进（SOTA）性能。使用单个NVIDIA RTX 3090 Ti，我们的方法展示了快速训练并在城市场景中实现了实时RGB和深度渲染，分辨率为1920x1280（Waymo）下达到90 FPS，以及分辨率为1600x900（nuScenes）下达到120 FPS。\n"
  },
  {
    "path": "abs/2404.03126.md",
    "content": "### GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis\n\nWe present GaSpCT, a novel view synthesis and 3D scene representation method used to generate novel projection views for Computer Tomography (CT) scans. We adapt the Gaussian Splatting framework to enable novel view synthesis in CT based on limited sets of 2D image projections and without the need for Structure from Motion (SfM) methodologies. Therefore, we reduce the total scanning duration and the amount of radiation dose the patient receives during the scan. We adapted the loss function to our use-case by encouraging a stronger background and foreground distinction using two sparsity promoting regularizers: a beta loss and a total variation (TV) loss. Finally, we initialize the Gaussian locations across the 3D space using a uniform prior distribution of where the brain's positioning would be expected to be within the field of view. We evaluate the performance of our model using brain CT scans from the Parkinson's Progression Markers Initiative (PPMI) dataset and demonstrate that the rendered novel views closely match the original projection views of the simulated scan, and have better performance than other implicit 3D scene representations methodologies. Furthermore, we empirically observe reduced training time compared to neural network based image synthesis for sparse-view CT image reconstruction. Finally, the memory requirements of the Gaussian Splatting representations are reduced by 17% compared to the equivalent voxel grid image representations.\n\n我们提出了GaSpCT，一种用于计算机断层扫描（CT）的新颖视图合成和三维场景表示方法，用于生成CT扫描的新颖投影视图。我们调整了高斯喷溅框架，使其能够基于有限的2D图像投影集合进行CT的新视图合成，而无需使用结构从运动（SfM）方法论。因此，我们减少了总扫描持续时间和患者在扫描过程中接收的辐射剂量。我们通过使用两个促进稀疏性的正则化器：贝塔损失和总变分（TV）损失，调整了损失函数以强化背景和前景之间的区分。最后，我们使用一个均匀的先验分布初始化三维空间中的高斯位置，这个分布预期了大脑在视野中的位置。我们使用帕金森病进展标记物计划（PPMI）数据集中的脑CT扫描评估了我们模型的性能，并证明渲染的新视图与模拟扫描的原始投影视图紧密匹配，并且性能优于其他隐式三维场景表示方法。此外，我们经验性地观察到与基于神经网络的稀疏视图CT图像重建相比，训练时间有所减少。最后，与等效的体素网格图像表示相比，高斯喷溅表示的内存要求减少了17%。\n"
  },
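The two sparsity-promoting regularizers named above can be sketched directly. The beta loss below uses a common form that pushes opacities toward 0 or 1; the loss weights are illustrative assumptions, not the paper's values.

```python
import torch

def beta_loss(alpha: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Beta-prior regularizer pushing rendered opacities toward 0 or 1,
    one common form of the 'beta loss' named above (the paper's exact
    formulation may differ). log(a) + log(1-a) peaks at a = 0.5, so
    minimizing it drives opacities toward the extremes."""
    return (torch.log(alpha + eps) + torch.log(1.0 - alpha + eps)).mean()

def tv_loss(img: torch.Tensor) -> torch.Tensor:
    """Total-variation regularizer over a rendered (H, W) projection."""
    return ((img[1:, :] - img[:-1, :]).abs().mean()
            + (img[:, 1:] - img[:, :-1]).abs().mean())

alpha = torch.empty(128, 128).uniform_(1e-3, 1 - 1e-3).requires_grad_()
loss = 0.01 * beta_loss(alpha) + 0.1 * tv_loss(alpha)
loss.backward()
```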
  {
    "path": "abs/2404.03202.md",
    "content": "### OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images\n\nPhotorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. As a result, we realize differentiable optimization of the radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images.\n\n依赖于三维高斯喷溅的真实感重建在机器人技术中展示了巨大潜力。然而，当前的三维高斯喷溅系统仅支持使用未失真的透视图像进行辐射场重建。在本文中，我们提出了OmniGS，一种新颖的全方位高斯喷溅系统，利用全方位图像快速重建辐射场。具体来说，我们对三维高斯喷溅中球形相机模型导数进行了理论分析。根据这些导数，我们随后实现了一个新的GPU加速的全方位栅格化器，直接将三维高斯喷溅到等距圆柱投影屏幕空间上，进行全方位图像渲染。结果，我们实现了辐射场的可微优化，无需立方图纠正或切线平面近似。在以自我为中心和漫游场景中进行的广泛实验表明，我们的方法使用全方位图像达到了最先进的重建质量和高渲染速度。\n"
  },
  {
    "path": "abs/2404.03613.md",
    "content": "### Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting\n\nAs 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames. However, previous works fail to accurately reconstruct dynamic scenes, especially 1) static parts moving along nearby dynamic parts, and 2) some dynamic areas are blurry. We attribute the failure to the wrong design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce an efficient training strategy for faster convergence and higher quality.\n\n由于三维高斯喷溅（3DGS）提供了快速且高质量的新视角合成，将标准的3DGS变形应用于多帧是一个自然的扩展。然而，以往的工作未能准确重建动态场景，特别是1) 静态部分沿着附近的动态部分移动，以及2) 一些动态区域模糊不清。我们将这一失败归因于变形场设计错误，该设计构建为基于坐标的函数。这种方法存在问题，因为3DGS是多个以高斯为中心的场的混合，而不仅仅是一个基于单一坐标的框架。为了解决这个问题，我们将变形定义为每个高斯嵌入和时间嵌入的函数。此外，我们将变形分解为粗变形和细变形，分别模拟慢速和快速运动。同时，我们引入了一种高效的训练策略，以实现更快的收敛和更高的质量。\n"
  },
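The core idea above, deformation as a function of per-Gaussian and temporal embeddings rather than of coordinates, fits in a small PyTorch module. The dimensions, single offset head, and linear time encoding below are illustrative choices; the paper additionally splits deformation into coarse and fine components.

```python
import torch
import torch.nn as nn

class PerGaussianDeform(nn.Module):
    """Deformation as a function of per-Gaussian and temporal embeddings
    (a minimal sketch of the idea above, not the paper's architecture)."""

    def __init__(self, num_gaussians: int, g_dim: int = 32, t_dim: int = 8):
        super().__init__()
        self.g_embed = nn.Embedding(num_gaussians, g_dim)  # one code per Gaussian
        self.t_embed = nn.Linear(1, t_dim)                 # simple temporal code
        self.mlp = nn.Sequential(
            nn.Linear(g_dim + t_dim, 64), nn.ReLU(),
            nn.Linear(64, 3),                              # per-Gaussian offset
        )

    def forward(self, ids: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.g_embed(ids), self.t_embed(t)], dim=-1)
        return self.mlp(z)   # (N, 3) deltas added to canonical centers

model = PerGaussianDeform(num_gaussians=10000)
ids = torch.arange(10000)
t = torch.full((10000, 1), 0.25)   # normalized timestamp
delta_xyz = model(ids, t)
```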
  {
    "path": "abs/2404.04211.md",
    "content": "### Robust Gaussian Splatting\n\nIn this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we propose mechanisms for defocus blur compensation and for addressing color in-consistencies caused by ambient light, shadows, or due to camera-related factors like varying white balancing settings. Our proposed solutions integrate in a seamless way with the 3DGS formulation while maintaining its benefits in terms of training efficiency and rendering speed. We experimentally validate our contributions on relevant benchmark datasets including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines.\n\n在这篇论文中，我们讨论了3D高斯喷涂（3DGS）常见的错误源，包括模糊、不完美的摄像头姿态和颜色不一致，目的是提高其在实际应用中的鲁棒性，比如从手持手机拍摄的重建中。我们的主要贡献涉及将运动模糊建模为摄像头姿态上的高斯分布，允许我们以统一的方式解决摄像头姿态细化和运动模糊校正。此外，我们提出了补偿散焦模糊的机制，以及解决由环境光、阴影或由于摄像头相关因素（如不同的白平衡设置）引起的颜色不一致的方法。我们提出的解决方案与3DGS的公式无缝集成，同时保持了其在训练效率和渲染速度方面的优势。我们在相关的基准数据集上实验验证了我们的贡献，包括Scannet++和Deblur-NeRF，获得了最先进的结果，从而在相关基线上取得了一致的改进。\n"
  },
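Modeling motion blur as a Gaussian distribution over camera poses, as described above, amounts to averaging renders over sampled pose perturbations. A sketch, where `render` stands in for a hypothetical differentiable rasterizer and the perturbation uses a first-order small-angle rotation rather than a proper SE(3) distribution:

```python
import torch

def blurred_render(render, gaussians, c2w: torch.Tensor,
                   sigma_rot: float = 0.01, sigma_trans: float = 0.005,
                   n_samples: int = 8) -> torch.Tensor:
    """Approximate a motion-blurred image as the expectation of renders
    over a Gaussian distribution of camera poses (a sketch of the idea
    above; `render` is a hypothetical differentiable rasterizer)."""
    acc = 0.0
    for _ in range(n_samples):
        w = sigma_rot * torch.randn(3)        # small random rotation vector
        K = torch.zeros(3, 3)                 # its skew-symmetric matrix
        K[0, 1], K[0, 2] = -w[2], w[1]
        K[1, 0], K[1, 2] = w[2], -w[0]
        K[2, 0], K[2, 1] = -w[1], w[0]
        d = torch.eye(4)
        d[:3, :3] = torch.eye(3) + K          # first-order rotation
        d[:3, 3] = sigma_trans * torch.randn(3)
        acc = acc + render(gaussians, d @ c2w)
    return acc / n_samples   # compare this to the (blurry) observed image
```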
  {
    "path": "abs/2404.04687.md",
    "content": "### Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion\n\nDifferentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view (360∘ viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).\n\n可微分的3D高斯喷涂（GS）技术正在计算机视觉和图形学中作为重建3D场景的突出技术而兴起。GS通过使用不同透明度的一组3D高斯模型来表示场景，并采用计算效率高的喷涂操作以及解析导数来根据从各种视角捕获的场景图像计算3D高斯参数。不幸的是，在许多现实世界的成像场景中，包括水下成像、建筑物内部的房间和自动导航，捕获周围视图（360°视角）图像是不可能或不切实际的。在这些受限基线成像场景中，GS算法遭受众所周知的“缺失锥”问题，这导致深度轴上重建质量差。在这份手稿中，我们展示了使用瞬态数据（来自声纳）允许我们通过沿深度轴采样高频数据来解决缺失锥问题。我们扩展了高斯喷涂算法适用于两种常用的声纳，并提出了融合算法，这些算法同时利用RGB摄像机数据和声纳数据。通过各种成像场景的模拟、仿真和硬件实验，我们展示了所提出的融合算法导致显著更好的新视角合成（PSNR提高了5 dB）和3D几何重建（Chamfer距离降低了60%）。\n"
  },
  {
    "path": "abs/2404.04880.md",
    "content": "### GauU-Scene V2: Expanse Lidar Image Dataset Shows Unreliable Geometric Reconstruction Using Gaussian Splatting and NeRF\n\nWe introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometers in the first version and 3.5 square kilometers in the second version, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. The dataset size is continually growing and now includes 6 scenarios. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. The detailed data acquisition protocol is appended in the supplementary material. It contains drone assembly, controller path planning, controller assembly, safety and protection, RTK Help, Drone Data Post-Processing, and many other details. U-Scene, developed under the auspices of the Chinese University of Hong Kong, Shenzhen MSU-BIT University, SZIIT, and auxiliary residential area, offers a unique blend of urban and academic environments for advanced spatial analysis. Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance and innovation of our work.\n\n我们介绍了一个使用新开发的3D表示方法——高斯喷涂，在我们广泛的U-Scene数据集上的大规模场景重建基准。U-Scene在第一版本中覆盖超过一平方半公里，在第二版本中覆盖3.5平方公里，提供了一个全面的RGB数据集，并配有LiDAR地面真实数据。数据集的大小在持续增长，现在包括6个场景。对于数据采集，我们使用配备高精度Zenmuse L1 LiDAR的Matrix 300无人机，使得精确的屋顶数据收集成为可能。详细的数据采集协议附在补充材料中。它包含无人机组装、控制器路径规划、控制器组装、安全与保护、RTK帮助、无人机数据后处理等许多细节。U-Scene在中国香港中文大学、深圳莫斯科大学-北京理工大学、深圳信息职业技术学院和辅助住宅区的支持下开发，为高级空间分析提供了城市和学术环境的独特融合。我们使用高斯喷涂对U-Scene的评估包括跨各种新视点的详细分析。我们还将这些结果与我们精确的点云数据集得出的结果并列，突出显示出我们工作的重要性和创新之处的显著差异。\n"
  },
  {
    "path": "abs/2404.04908.md",
    "content": "### Dual-Camera Smooth Zoom on Mobile Phones\n\nWhen zooming between dual cameras on a mobile, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, ie, dual-camera smooth zoom (DCSZ) to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address the issue, we suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ, and we utilize it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground-truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on DCSZ task. The datasets, codes, and pre-trained models will be publicly available.\n\n在移动设备上双摄像头之间缩放时，几何内容和图像颜色在预览中会发生明显跳变，不可避免地影响用户的缩放体验。在这项工作中，我们引入了一个新任务，即双摄像头平滑缩放（DCSZ），以实现平滑的缩放预览。帧插值（FI）技术是一个潜在的解决方案，但在收集真实数据方面遇到困难。为了解决这个问题，我们建议一个数据工厂解决方案，其中连续的虚拟摄像头被组装起来，通过渲染场景的重建3D模型来生成DCSZ数据。特别地，我们提出了一个新颖的双摄像头平滑缩放高斯喷涂（ZoomGS），引入了一个特定于摄像头的编码，以构建每个虚拟摄像头的特定3D模型。借助提出的数据工厂，我们构建了一个用于DCSZ的合成数据集，并利用它来微调FI模型。此外，我们收集了没有真实数据的现实世界双重缩放图像进行评估。我们使用多种FI方法进行了广泛的实验。结果显示，微调后的FI模型在DCSZ任务上比原始模型取得了显著的性能提升。数据集、代码和预训练模型将公开可用。\n"
  },
  {
    "path": "abs/2404.05220.md",
    "content": "### StylizedGS: Controllable Stylization for 3D Gaussian Splatting\n\nWith the rapid development of XR, 3D generation and editing are becoming more and more important, among which, stylization is an important tool of 3D appearance editing. It can achieve consistent 3D artistic stylization given a single reference style image and thus is a user-friendly editing way. However, recent NeRF-based 3D stylization methods face efficiency issues that affect the actual user experience and the implicit nature limits its ability to transfer the geometric pattern styles. Additionally, the ability for artists to exert flexible control over stylized scenes is considered highly desirable, fostering an environment conducive to creative exploration. In this paper, we introduce StylizedGS, a 3D neural style transfer framework with adaptable control over perceptual factors based on 3D Gaussian Splatting (3DGS) representation. The 3DGS brings the benefits of high efficiency. We propose a GS filter to eliminate floaters in the reconstruction which affects the stylization effects before stylization. Then the nearest neighbor-based style loss is introduced to achieve stylization by fine-tuning the geometry and color parameters of 3DGS, while a depth preservation loss with other regularizations is proposed to prevent the tampering of geometry content. Moreover, facilitated by specially designed losses, StylizedGS enables users to control color, stylized scale and regions during the stylization to possess customized capabilities. Our method can attain high-quality stylization results characterized by faithful brushstrokes and geometric consistency with flexible controls. Extensive experiments across various scenes and styles demonstrate the effectiveness and efficiency of our method concerning both stylization quality and inference FPS.\n\n随着XR的快速发展，3D生成和编辑变得越来越重要，其中，风格化是3D外观编辑的一个重要工具。它可以基于单一参考风格图像实现一致的3D艺术风格化，因此是一种用户友好的编辑方式。然而，最近基于NeRF的3D风格化方法面临效率问题，这影响了实际的用户体验，其隐含的性质限制了其传递几何图案风格的能力。此外，艺术家能够对风格化场景施加灵活控制被认为是非常理想的，有助于创造一个有利于创意探索的环境。在本文中，我们介绍了StylizedGS，这是一个基于3D高斯喷涂（3DGS）表示的3D神经风格转换框架，它能够对感知因素进行适应性控制。3DGS带来了高效率的好处。我们提出了一个GS滤波器来消除重建中影响风格化效果的漂浮物。然后引入基于最近邻的风格损失来通过微调3DGS的几何和颜色参数来实现风格化，同时提出了一个带有其他规则化的深度保持损失，以防止几何内容被篡改。此外，通过特别设计的损失，StylizedGS使用户能够在风格化过程中控制颜色、风格化尺度和区域，以拥有定制化的能力。我们的方法可以获得高质量的风格化结果，其特点是忠实的笔触和几何一致性，并具有灵活的控制。广泛的实验跨越不同的场景和风格证明了我们方法在风格化质量和推理FPS方面的有效性和效率。\n"
  },
  {
    "path": "abs/2404.06109.md",
    "content": "### Revising Densification in Gaussian Splatting\n\nIn this paper, we address the limitations of Adaptive Density Control (ADC) in 3D Gaussian Splatting (3DGS), a scene representation method achieving high-quality, photorealistic results for novel view synthesis. ADC has been introduced for automatic 3D point primitive management, controlling densification and pruning, however, with certain limitations in the densification logic. Our main contribution is a more principled, pixel-error driven formulation for density control in 3DGS, leveraging an auxiliary, per-pixel error function as the criterion for densification. We further introduce a mechanism to control the total number of primitives generated per scene and correct a bias in the current opacity handling strategy of ADC during cloning operations. Our approach leads to consistent quality improvements across a variety of benchmark scenes, without sacrificing the method's efficiency.\n\n在本文中，我们解决了自适应密度控制（ADC）在3D高斯喷涂（3DGS）中的限制，3DGS是一种场景表示方法，为新颖视图合成实现高质量、逼真的结果。ADC已被引入用于自动3D点原语管理，控制增密和修剪，然而，在增密逻辑上存在一定的限制。我们的主要贡献是一个更有原则的、以像素误差为驱动的3DGS密度控制公式，利用辅助的、每像素误差函数作为增密的标准。我们进一步引入了一种机制来控制每个场景生成的原语总数，并在ADC的克隆操作期间的当前不透明度处理策略中纠正了一个偏差。我们的方法在一系列基准场景中带来了一致的质量改进，而不牺牲方法的效率。\n"
  },
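A pixel-error-driven densification criterion like the one described above can be prototyped by scattering each pixel's error back onto the Gaussians that rendered it. The per-pixel top-K contributor lists below are a hypothetical rasterizer output used only for illustration:

```python
import torch

def densification_scores(per_pixel_err: torch.Tensor,
                         gauss_ids: torch.Tensor,
                         weights: torch.Tensor,
                         num_gaussians: int) -> torch.Tensor:
    """Pixel-error-driven densification criterion (a sketch of the idea
    above). Each pixel's error is distributed to the Gaussians that
    rendered it; primitives accumulating the most error become
    split/clone candidates.

    per_pixel_err: (P,) error of each pixel in a training view.
    gauss_ids:     (P, K) long ids of the K dominant Gaussians per pixel
                   (a hypothetical rasterizer output).
    weights:       (P, K) their blending weights.
    """
    scores = torch.zeros(num_gaussians)
    contrib = (per_pixel_err[:, None] * weights).reshape(-1)
    scores.index_add_(0, gauss_ids.reshape(-1), contrib)
    return scores   # densify, e.g., the top-k or above-threshold Gaussians
```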
  {
    "path": "abs/2404.06128.md",
    "content": "### Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction\n\nWithin colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with a 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100X faster rendering and more than 10X shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer.\n\n在结直肠癌诊断中，传统的结肠镜技术面临关键限制，包括有限的视野和缺乏深度信息，这可能阻碍了癌前病变检测。当前方法难以提供全面准确的结肠表面3D重建，这有助于最小化遗漏区域和对癌前息肉的重新检查。针对这一问题，我们引入了“高斯煎饼”，一种方法，它结合了3D高斯喷涂（3D GS）和基于递归神经网络的同时定位与建图（RNNSLAM）系统。通过在3D GS框架中引入几何和深度正则化，我们的方法确保了高斯与结肠表面的更准确对齐，结果在查看详细纹理和结构时产生了更平滑的3D重建。在三个不同数据集上的评估显示，“高斯煎饼”提高了新视图合成质量，与当前领先方法相比，PSNR提高了18%，SSIM提高了16%。它还实现了超过100倍的渲染速度和超过10倍的训练时间缩短，使其成为实时应用的实用工具。因此，这为实现临床转化以更好地检测和诊断结直肠癌提供了希望。\n"
  },
  {
    "path": "abs/2404.06270.md",
    "content": "### 3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis\n\nIn this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance fields (NeRF) based solutions learn the deformation in an implicit manner, which cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. Recently, 3D Gaussian Splatting provides a new representation of the 3D scene, building upon which the 3D geometry could be exploited in learning the complex 3D deformation. Specifically, the scenes are represented as a collection of 3D Gaussian, where each 3D Gaussian is optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them in learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction. Extensive experimental results on both synthetic and real datasets prove the superiority of our solution, which achieves new state-of-the-art performance.\n\n在这篇论文中，我们提出了一种用于动态视图合成的3D几何感知可变形高斯喷涂方法。现有的基于神经辐射场（NeRF）的解决方案以隐式方式学习变形，无法融入3D场景几何信息。因此，学习到的变形不一定在几何上是连贯的，这导致了动态视图合成和3D动态重建的结果不尽人意。最近，3D高斯喷涂提供了3D场景的新表示方式，基于此可以在学习复杂的3D变形时利用3D几何信息。具体来说，场景被表示为一系列3D高斯体，其中每个3D高斯体随时间移动和旋转以模拟变形。为了在变形过程中强制执行3D场景几何约束，我们显式提取3D几何特征并将其集成到学习3D变形中。通过这种方式，我们的解决方案实现了3D几何感知的变形建模，从而使动态视图合成和3D动态重建得到改进。在合成和真实数据集上的大量实验结果证明了我们解决方案的优越性，实现了新的最先进性能。\n\n"
  },
  {
    "path": "abs/2404.06903.md",
    "content": "### DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting\n\nThe increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360∘ scene generation pipeline that facilitates the creation of comprehensive 360∘ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary \"flat\" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360∘ perspective, providing an enhanced immersive experience over existing techniques.\n\n随着虚拟现实应用需求的增加，制作沉浸式3D资产的重要性日益凸显。我们提出了一种文本到3D全景360度场景生成流程，该流程能在几分钟内为野外环境创建全面的360度场景。我们的方法利用了2D扩散模型的生成能力和自我完善提示来创建高质量且全局一致的全景图像。这个图像作为初步的“平面”（2D）场景表示。随后，它被提升到3D高斯体，使用喷涂技术以实现实时探索。为了产生一致的3D几何结构，我们的流程通过将2D单眼深度对齐到全局优化的点云中来构建空间连贯的结构。这个点云作为3D高斯体的质心的初始状态。为了解决单视角输入固有的不可见问题，我们在合成和输入相机视图上施加语义和几何约束作为规范。这些约束指导高斯体的优化，帮助重建未见区域。总之，我们的方法提供了一个全局一致的3D场景，具有360度的视角，相较现有技术提供了更优的沉浸式体验。\n"
  },
  {
    "path": "abs/2404.06926.md",
    "content": "### Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting\n\nWe present a real-time LiDAR-Inertial-Camera SLAM system with 3D Gaussian Splatting as the mapping backend. Leveraging robust pose estimates from our LiDAR-Inertial-Camera odometry, Coco-LIC, an incremental photo-realistic mapping system is proposed in this paper. We initialize 3D Gaussians from colorized LiDAR points and optimize them using differentiable rendering powered by 3D Gaussian Splatting. Meticulously designed strategies are employed to incrementally expand the Gaussian map and adaptively control its density, ensuring high-quality mapping with real-time capability. Experiments conducted in diverse scenarios demonstrate the superior performance of our method compared to existing radiance-field-based SLAM systems.\n\n我们提出了一个实时的激光雷达-惯性-相机SLAM系统，其映射后端采用3D高斯喷涂技术。利用我们的激光雷达-惯性-相机测距（Coco-LIC）所提供的稳定姿态估计，本文提出了一个增量式的逼真映射系统。我们从彩色化的激光雷达点初始化3D高斯体，并使用由3D高斯喷涂提供动力的可微渲染进行优化。我们精心设计的策略用于逐步扩展高斯图并自适应控制其密度，确保高质量的映射并具备实时处理能力。在多种场景下进行的实验表明，与现有的基于辐射场的SLAM系统相比，我们的方法具有优越的性能。\n"
  },
  {
    "path": "abs/2404.07199.md",
    "content": "### RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion\n\nWe introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing the state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image.\n\n我们介绍了RealmDreamer，这是一种从文本描述生成通用前向3D场景的技术。我们的技术优化了3D高斯喷涂表示，以匹配复杂的文本提示。我们利用最先进的文本到图像生成器初始化这些喷涂，将其样本提升到3D，并计算遮挡体积。然后，我们将这种表示作为一个3D内画任务，在多个视图中进行优化，使用基于图像的扩散模型。为了学习正确的几何结构，我们通过对来自内画模型的样本进行条件化，纳入了一个深度扩散模型，从而提供丰富的几何结构。最后，我们使用来自图像生成器的锐化样本对模型进行微调。值得注意的是，我们的技术不需要视频或多视图数据，可以合成多种不同风格的高质量3D场景，包括多个对象。其通用性还允许从单一图像进行3D合成。\n"
  },
  {
    "path": "abs/2404.07950.md",
    "content": "### Reinforcement Learning with Generalizable Gaussian Splatting\n\nAn excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations contain several drawbacks. They cannot either describe complex local geometries or generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a ``black box\", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.\n\n在强化学习（RL）中，尤其是视觉基础的强化学习任务中，出色的表现对于性能至关重要。环境表征的质量直接影响学习任务的成果。以往基于视觉的RL通常使用显式或隐式的方式来表征环境，如图像、点、体素和神经辐射场。然而，这些表征存在一些缺陷。它们要么无法描述复杂的局部几何形状，要么在未见场景中泛化能力差，或者需要精确的前景遮罩。此外，这些隐式的神经表征类似于“黑盒”，极大地阻碍了可解释性。3D高斯喷涂（3DGS），凭借其显式的场景表现和可微渲染的特性，被认为是重建和表征方法的革命性变革。在本文中，我们提出了一种名为GSRL的新型可泛化高斯喷涂框架，作为RL任务的表征。通过在RoboMimic环境中的验证，我们的方法在多个任务中比其他基线模型取得了更好的结果，在最难的任务上相比基线模型提高了10%、44%和15%的性能。这项工作是首次尝试利用可泛化的3DGS作为RL的表征。\n"
  },
  {
    "path": "abs/2404.07991.md",
    "content": "### GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh\n\nWe introduce GoMAvatar, a novel approach for real-time, memory-efficient, high-quality animatable human modeling. GoMAvatar takes as input a single monocular video to create a digital avatar capable of re-articulation in new poses and real-time rendering from novel viewpoints, while seamlessly integrating with rasterization-based graphics pipelines. Central to our method is the Gaussians-on-Mesh representation, a hybrid 3D model combining rendering quality and speed of Gaussian splatting with geometry modeling and compatibility of deformable meshes. We assess GoMAvatar on ZJU-MoCap data and various YouTube videos. GoMAvatar matches or surpasses current monocular human modeling algorithms in rendering quality and significantly outperforms them in computational efficiency (43 FPS) while being memory-efficient (3.63 MB per subject).\n\n我们介绍了一种名为GoMAvatar的全新方法，用于实时、高效、高质量的可动画人类建模。GoMAvatar的输入是单个单眼视频，可以创建能够在新姿势下重新构造，并从新的视点实时渲染的数字化头像，同时无缝集成到基于光栅化的图形管线中。我们方法的核心是高斯网格表示，这是一种结合了高斯平滑渲染的质量与速度和可变形网格的几何建模及兼容性的混合3D模型。我们在ZJU-MoCap数据和多个YouTube视频上评估了GoMAvatar。GoMAvatar在渲染质量上匹配或超越了当前的单眼人体建模算法，并在计算效率上显著优于它们（43 FPS），同时具有内存效率高（每个对象3.63 MB）。\n"
  },
  {
    "path": "abs/2404.08449.md",
    "content": "### OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering\n\nRendering dynamic 3D human from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the people is in an unobstructed scene, while various objects may cause the occlusion of body parts in real-life scenarios. Previous method utilizing NeRF for surface rendering to recover the occluded areas, but it requiring more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform occlusion feature query at occluded regions, the aggregated pixel-align feature is extracted to compensate for the missing information. Then we use Gaussian Feature MLP to further process the feature along with the occlusion-aware loss functions to better perceive the occluded area. Extensive experiments both in simulated and real-world occlusions, demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method. And we improving training and inference speeds by 250x and 800x, respectively.\n\n在单眼视频中渲染动态3D人体对于虚拟现实和数字娱乐等多种应用至关重要。大多数方法假设人物处于无遮挡的场景中，而在现实生活场景中，各种物体可能会导致身体部分被遮挡。之前的方法使用NeRF进行表面渲染以恢复被遮挡区域，但它需要超过一天的时间来训练并需要几秒钟来渲染，无法满足实时交互应用的要求。为解决这些问题，我们提出了基于3D高斯平滑的OccGaussian，该方法可以在6分钟内训练完成，并能以每秒最高160帧的速度产生高质量的人体渲染，即使输入被遮挡。OccGaussian在典型空间初始化3D高斯分布，并在被遮挡区域进行遮挡特征查询，提取聚合的像素对齐特征以补偿缺失信息。然后，我们使用高斯特征MLP进一步处理这些特征，并结合感知遮挡区域的遮挡感知损失函数。广泛的实验，包括在模拟和现实世界的遮挡中，证明我们的方法与最先进的方法相比，达到了可比甚至更优的性能。并且我们分别将训练和推理速度提高了250倍和800倍。\n"
  },
  {
    "path": "abs/2404.08966.md",
    "content": "### LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field\n\nCinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes,incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussian to project it into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D Cinemagraph that exhibits natural and seamlessly loopable dynamics. Experiment results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation.\n\n电影图（Cinemagraph）是一种独特的视觉媒介形式，它结合了静态摄影的元素和微妙的运动，创造了引人入胜的体验。然而，最近的作品生成的视频多数缺乏深度信息，仅限于二维图像空间的约束。在本文中，受到新视角合成（NVS）领域3D高斯飞溅（3D-GS）取得显著进展的启发，我们提出了LoopGaussian方法，以3D高斯模型将电影图从二维图像空间提升至三维空间。为实现这一目标，我们首先采用3D-GS方法从多视图图像重建静态场景的3D高斯点云，并加入形状规范化项以防止由物体变形引起的模糊或者艺术效果。接着，我们采用为3D高斯量身定制的自编码器，将其投影到特征空间。为了保持场景的局部连续性，我们设计了基于获取特征的SuperGaussian进行聚类。通过计算聚类之间的相似性并采用两阶段估计方法，我们推导出用以描述整个场景中速度的欧拉运动场。3D高斯点随后在估计的欧拉运动场中移动。通过双向动画技术，我们最终生成了一个展示自然且可无缝循环动态的三维电影图。实验结果验证了我们方法的有效性，展示了高质量且视觉上引人入胜的场景生成。\n"
  },
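Once an Eulerian motion field is available, animating the scene is just advecting the Gaussian centers through it. A forward-Euler sketch with a toy velocity field (the paper estimates the field from cluster similarities and uses bidirectional animation for seamless loops):

```python
import numpy as np

def advect(points: np.ndarray, velocity_fn, dt: float, steps: int):
    """Advect 3D Gaussian centers through an Eulerian motion field
    (forward-Euler sketch of the 'points move within the estimated
    field' step above; `velocity_fn` is a stand-in for the field
    estimated from cluster similarities).
    Yields the point positions at each animation frame.
    """
    p = points.copy()
    for _ in range(steps):
        p = p + dt * velocity_fn(p)   # velocity queried at current positions
        yield p

# Toy field: rigid swirl around the z-axis.
swirl = lambda p: np.stack([-p[:, 1], p[:, 0], np.zeros(len(p))], axis=1)
frames = list(advect(np.random.randn(100, 3), swirl, dt=0.05, steps=60))
```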
  {
    "path": "abs/2404.09105.md",
    "content": "### EGGS: Edge Guided Gaussian Splatting for Radiance Fields\n\nThe Gaussian splatting methods are getting popular. However, their loss function only contains the ℓ1 norm and the structural similarity between the rendered and input images, without considering the edges in these images. It is well-known that the edges in an image provide important information. Therefore, in this paper, we propose an Edge Guided Gaussian Splatting (EGGS) method that leverages the edges in the input images. More specifically, we give the edge region a higher weight than the flat region. With such edge guidance, the resulting Gaussian particles focus more on the edges instead of the flat regions. Moreover, such edge guidance does not crease the computation cost during the training and rendering stage. The experiments confirm that such simple edge-weighted loss function indeed improves about 1∼2 dB on several difference data sets. With simply plugging in the edge guidance, the proposed method can improve all Gaussian splatting methods in different scenarios, such as human head modeling, building 3D reconstruction, etc.\n\n高斯飞溅方法正变得越来越流行。然而，它们的损失函数仅包含渲染图像与输入图像之间的 ℓ1 范数和结构相似性度量，未考虑这些图像中的边缘信息。众所周知，图像中的边缘提供了重要信息。因此，在本文中，我们提出了一种边缘引导的高斯飞溅方法（Edge Guided Gaussian Splatting，简称 EGGS），该方法利用输入图像中的边缘。更具体地说，我们给边缘区域比平坦区域更高的权重。通过这种边缘引导，生成的高斯粒子更多地聚焦于边缘而非平坦区域。此外，这种边缘引导在训练和渲染阶段不增加计算成本。实验证实，这种简单的边缘加权损失函数确实在几个不同的数据集上提高了约1到2分贝。通过简单地加入边缘引导，所提出的方法可以改善不同场景下的所有高斯飞溅方法，如人头建模、建筑3D重建等。\n"
  },
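The whole of the EGGS modification fits in the loss function: a per-pixel weight map derived from image edges multiplies the photometric error. Below is a minimal PyTorch sketch of that idea; the Sobel-based weights and the `base`/`boost` constants are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def edge_weight_map(img, base=1.0, boost=2.0):
    """Per-pixel weights that are larger near image edges.

    img: (3, H, W) tensor in [0, 1]; returns (1, H, W) weights.
    base/boost are illustrative constants, not the paper's values.
    """
    gray = img.mean(dim=0, keepdim=True).unsqueeze(0)       # (1, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    sobel = torch.stack([kx, kx.t()]).unsqueeze(1).to(img)  # (2, 1, 3, 3)
    mag = F.conv2d(gray, sobel, padding=1).norm(dim=1)      # (1, H, W)
    mag = mag / (mag.max() + 1e-8)                          # normalize to [0, 1]
    return base + boost * mag

def edge_weighted_l1(render, gt):
    """L1 photometric loss re-weighted toward edge regions of the target."""
    return (edge_weight_map(gt) * (render - gt).abs().mean(dim=0)).mean()
```

Because the weights depend only on the fixed input images, they can be precomputed once per view, which is consistent with the claim that training and rendering cost is unchanged.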
  {
    "path": "abs/2404.09227.md",
    "content": "### DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling\n\nRecent progress in text-to-3D creation has been propelled by integrating the potent prior of Diffusion Models from text-to-image generation into the 3D domain. Nevertheless, generating 3D scenes characterized by multiple instances and intricate arrangements remains challenging. In this study, we present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions, leveraging the strong 3D representation capabilities of Gaussian Splatting and the complex arrangement abilities of large language models (LLMs). Our approach involves a 3D Gaussian Guide (3DG2) for scene representation, consisting of semantic primitives (objects) and their spatial transformations and relationships derived directly from text prompts using LLMs. This compositional representation allows for local-to-global optimization of the entire scene. A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene, which addresses training instability issue arising from simple blending in the subsequent global optimization stage. To mitigate potential biases of LLM priors, we model collision relationships between objects at the global level, enhancing physical correctness and overall realism. Additionally, to generate pervasive objects like rain and snow distributed extensively across the scene, we introduce a sparse initialization and densification strategy. Experiments demonstrate that DreamScape offers high usability and controllability, enabling the generation of high-fidelity 3D scenes from only text prompts and achieving state-of-the-art performance compared to other methods.\n\n近期文本到3D创作的进展得益于将从文本到图像生成中的强大先验扩散模型集成到3D领域中。然而，生成具有多个实例和复杂布局的3D场景仍然具有挑战性。在这项研究中，我们提出了DreamScape方法，仅从文本描述中创建高度一致的3D场景，利用高斯飞溅的强大3D表现能力和大型语言模型（LLMs）的复杂布局能力。我们的方法涉及使用3D高斯指导（3DG2）进行场景表示，包括语义原语（对象）及其空间变换和关系，这些都是直接从文本提示使用LLMs导出的。这种组合表示允许对整个场景进行从局部到全局的优化。在局部对象生成过程中，我们量身定制了逐步规模控制，确保不同大小和密度的对象适应场景，这解决了后续全局优化阶段由简单混合引起的训练不稳定问题。为了减轻LLM先验的潜在偏见，我们在全球层面上模拟物体之间的碰撞关系，增强物理正确性和整体现实感。此外，为了生成如雨和雪这样广泛分布在场景中的普遍对象，我们引入了稀疏初始化和密集化策略。实验表明，DreamScape提供了高度的可用性和可控性，使得仅从文本提示生成高保真度的3D场景成为可能，并与其他方法相比实现了最先进的性能。\n"
  },
  {
    "path": "abs/2404.09412.md",
    "content": "### DeferredGS: Decoupled and Editable Gaussian Splatting with Deferred Shading\n\nReconstructing and editing 3D objects and scenes both play crucial roles in computer graphics and computer vision. Neural radiance fields (NeRFs) can achieve realistic reconstruction and editing results but suffer from inefficiency in rendering. Gaussian splatting significantly accelerates rendering by rasterizing Gaussian ellipsoids. However, Gaussian splatting utilizes a single Spherical Harmonic (SH) function to model both texture and lighting, limiting independent editing capabilities of these components. Recently, attempts have been made to decouple texture and lighting with the Gaussian splatting representation but may fail to produce plausible geometry and decomposition results on reflective scenes. Additionally, the forward shading technique they employ introduces noticeable blending artifacts during relighting, as the geometry attributes of Gaussians are optimized under the original illumination and may not be suitable for novel lighting conditions. To address these issues, we introduce DeferredGS, a method for decoupling and editing the Gaussian splatting representation using deferred shading. To achieve successful decoupling, we model the illumination with a learnable environment map and define additional attributes such as texture parameters and normal direction on Gaussians, where the normal is distilled from a jointly trained signed distance function. More importantly, we apply deferred shading, resulting in more realistic relighting effects compared to previous methods. Both qualitative and quantitative experiments demonstrate the superior performance of DeferredGS in novel view synthesis and editing tasks.\n\n在计算机图形学和计算机视觉领域，重建和编辑3D对象及场景都扮演着关键角色。神经辐射场（NeRFs）能够实现真实的重建和编辑结果，但在渲染效率上存在不足。高斯飞溅通过光栅化高斯椭球体显著加速了渲染。然而，高斯飞溅使用单一的球谐函数（SH）来同时模拟纹理和光照，限制了这些组成部分的独立编辑能力。最近，已经有尝试使用高斯飞溅表现形式解耦纹理和光照，但在反射场景上可能无法生成可信的几何和分解结果。此外，他们采用的正向着色技术在重新照明时引入了明显的混合伪影，因为高斯的几何属性是在原始照明下优化的，可能不适用于新的照明条件。为了解决这些问题，我们引入了DeferredGS方法，使用延迟着色来解耦和编辑高斯飞溅表现形式。为了实现成功的解耦，我们使用可学习的环境映射来模拟照明，并在高斯上定义额外属性，如纹理参数和法线方向，其中法线是从共同训练的有符号距离函数中提取的。更重要的是，我们应用了延迟着色，与以往方法相比，实现了更真实的重新照明效果。定性和定量实验均证明了DeferredGS在新视角合成和编辑任务中的卓越性能。\n"
  },
  {
    "path": "abs/2404.09458.md",
    "content": "### CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting\n\nGaussian splatting, renowned for its exceptional rendering quality and efficiency, has emerged as a prominent technique in 3D scene representation. However, the substantial data volume of Gaussian splatting impedes its practical utility in real-world applications. Herein, we propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS), which harnesses compact Gaussian primitives for faithful 3D scene modeling with a remarkably reduced data size. To ensure the compactness of Gaussian primitives, we devise a hybrid primitive structure that captures predictive relationships between each other. Then, we exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms. Moreover, we develop a rate-constrained optimization scheme to eliminate redundancies within such hybrid primitives, steering our CompGS towards an optimal trade-off between bitrate consumption and representation efficacy. Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality. Our code will be released on GitHub for further research.\n\n高斯飞溅以其卓越的渲染质量和效率而闻名，已成为3D场景表示中的一种突出技术。然而，高斯飞溅的大数据量阻碍了其在实际应用中的实用性。在这里，我们提出了一种高效的3D场景表示方法，名为压缩高斯飞溅（Compressed Gaussian Splatting，简称CompGS），该方法利用紧凑的高斯原语进行忠实的3D场景建模，显著减小了数据大小。为确保高斯原语的紧凑性，我们设计了一种混合原语结构，捕捉彼此之间的预测关系。然后，我们利用一小组锚定原语进行预测，使大多数原语被封装成高度紧凑的残差形式。此外，我们开发了一种受速率约束的优化方案，以消除这种混合原语内的冗余，引导我们的CompGS实现比特率消耗与表示效率之间的最优权衡。实验结果显示，所提出的CompGS显著优于现有方法，在不损害模型精确度和渲染质量的情况下，实现了3D场景表示的超级紧凑性。我们的代码将在GitHub上发布，以供进一步研究。\n"
  },
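A toy illustration of the anchor/residual idea behind CompGS: keep a small set of full-precision anchors and store every other Gaussian center as a nearest-anchor index plus a coarsely quantized offset. This sketch covers positions only, with illustrative constants; the actual method predicts full primitive attributes through learned structures under a rate-constrained objective.

```python
import numpy as np

def anchor_residual_encode(positions, anchor_ratio=0.1, step=1e-3):
    """Keep a few full-precision anchors; store every other center as a
    nearest-anchor index plus a coarsely quantized offset."""
    n = len(positions)
    k = max(1, int(n * anchor_ratio))
    anchors = positions[np.random.permutation(n)[:k]]
    # nearest anchor per point (brute force, for clarity only)
    d = ((positions[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    nearest = d.argmin(1)
    residual_q = np.round((positions - anchors[nearest]) / step).astype(np.int32)
    return anchors, nearest, residual_q

def anchor_residual_decode(anchors, nearest, residual_q, step=1e-3):
    """Invert the encoding: anchor position plus dequantized offset."""
    return anchors[nearest] + residual_q * step
```

The compression win comes from the residuals: small integer offsets entropy-code far more cheaply than full-precision coordinates.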
  {
    "path": "abs/2404.09591.md",
    "content": "### 3D Gaussian Splatting as Markov Chain Monte Carlo\n\nWhile 3D Gaussian Splatting has recently become popular for neural rendering, current methods rely on carefully engineered cloning and splitting strategies for placing Gaussians, which does not always generalize and may lead to poor-quality renderings. In addition, for real-world scenes, they rely on a good initial point cloud to perform well. In this work, we rethink 3D Gaussians as random samples drawn from an underlying probability distribution describing the physical representation of the scene -- in other words, Markov Chain Monte Carlo (MCMC) samples. Under this view, we show that the 3D Gaussian updates are strikingly similar to a Stochastic Langevin Gradient Descent (SGLD) update. As with MCMC, samples are nothing but past visit locations, adding new Gaussians under our framework can simply be realized without heuristics as placing Gaussians at existing Gaussian locations. To encourage using fewer Gaussians for efficiency, we introduce an L1-regularizer on the Gaussians. On various standard evaluation scenes, we show that our method provides improved rendering quality, easy control over the number of Gaussians, and robustness to initialization.\n\n尽管3D高斯飞溅最近在神经渲染领域变得流行，但当前方法依赖于精心设计的克隆和分割策略来放置高斯点，这并不总是泛化的，可能导致渲染质量较差。此外，对于现实世界的场景，它们依赖于一个良好的初始点云以表现良好。在这项工作中，我们重新思考3D高斯点作为从描述场景物理表示的潜在概率分布中抽取的随机样本——换句话说，是马尔可夫链蒙特卡洛（MCMC）样本。在这种观点下，我们展示3D高斯的更新与随机朗之万梯度下降（SGLD）的更新惊人地相似。就像MCMC一样，样本仅是过去的访问位置，根据我们的框架，添加新的高斯点可以简单地实现，无需启发式，只需将高斯点放置在现有高斯位置上。为了鼓励使用更少的高斯点以提高效率，我们引入了一个对高斯点的L1正则化器。在各种标准评估场景上，我们展示了我们的方法提供了改进的渲染质量、对高斯点数量的简易控制以及对初始化的鲁棒性。\n"
  },
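Under the MCMC view, each training step is roughly an SGLD move (gradient step plus injected Gaussian noise), and "densification" becomes resampling: relocating near-transparent Gaussians onto live ones. A schematic sketch; the learning rate, noise scale, and opacity threshold are illustrative, and the paper ties the noise to the learning rate and per-Gaussian parameters rather than a fixed constant.

```python
import torch

def sgld_step(mu, grad, lr=1e-3, noise_scale=1e-4):
    """One SGLD-style update of Gaussian centers: gradient step plus noise.

    mu, grad: (N, 3). lr and noise_scale are illustrative; the paper
    couples the noise to the learning rate and Gaussian parameters.
    """
    return mu - lr * grad + noise_scale * torch.randn_like(mu)

def respawn_dead_gaussians(mu, opacity, thresh=0.005):
    """Heuristic-free 'densification': move near-transparent Gaussians onto
    the locations of live ones, i.e. resample past visit locations."""
    dead = opacity.squeeze(-1) < thresh
    if dead.any():
        live = (~dead).nonzero(as_tuple=True)[0]
        pick = live[torch.randint(len(live), (int(dead.sum()),))]
        mu = mu.clone()
        mu[dead] = mu[pick]
    return mu
```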
  {
    "path": "abs/2404.09748.md",
    "content": "### LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives\n\nLarge garages are ubiquitous yet intricate scenes in our daily lives, posing challenges characterized by monotonous colors, repetitive patterns, reflective surfaces, and transparent vehicle glass. Conventional Structure from Motion (SfM) methods for camera pose estimation and 3D reconstruction fail in these environments due to poor correspondence construction. To address these challenges, this paper introduces LetsGo, a LiDAR-assisted Gaussian splatting approach for large-scale garage modeling and rendering. We develop a handheld scanner, Polar, equipped with IMU, LiDAR, and a fisheye camera, to facilitate accurate LiDAR and image data scanning. With this Polar device, we present a GarageWorld dataset consisting of five expansive garage scenes with diverse geometric structures and will release the dataset to the community for further research. We demonstrate that the collected LiDAR point cloud by the Polar device enhances a suite of 3D Gaussian splatting algorithms for garage scene modeling and rendering. We also propose a novel depth regularizer for 3D Gaussian splatting algorithm training, effectively eliminating floating artifacts in rendered images, and a lightweight Level of Detail (LOD) Gaussian renderer for real-time viewing on web-based devices. Additionally, we explore a hybrid representation that combines the advantages of traditional mesh in depicting simple geometry and colors (e.g., walls and the ground) with modern 3D Gaussian representations capturing complex details and high-frequency textures. This strategy achieves an optimal balance between memory performance and rendering quality. Experimental results on our dataset, along with ScanNet++ and KITTI-360, demonstrate the superiority of our method in rendering quality and resource efficiency.\n\n大型车库是我们日常生活中无处不在但又复杂的场景，由于单调的颜色、重复的图案、反光表面和透明的车辆玻璃，它们带来了特有的挑战。传统的运动恢复结构（Structure from Motion, SfM）方法在这些环境中因对应构建不佳而失败。为了应对这些挑战，本文介绍了LetsGo，一种激光雷达辅助的高斯飞溅方法，用于大规模车库建模和渲染。我们开发了一种手持扫描器Polar，配备了IMU、激光雷达和鱼眼相机，以便进行精确的激光雷达和图像数据扫描。利用这种Polar设备，我们呈现了一个包含五个具有不同几何结构的广阔车库场景的GarageWorld数据集，并将发布该数据集供社区进一步研究。我们展示了由Polar设备收集的激光雷达点云如何增强一套3D高斯飞溅算法，用于车库场景的建模和渲染。我们还提出了一个新颖的深度正则化器，用于3D高斯飞溅算法的训练，有效消除渲染图像中的浮动伪影，并提出了一个轻量级的细节级别（Level of Detail, LOD）高斯渲染器，用于基于Web的设备上的实时查看。此外，我们探索了一种混合表现形式，它结合了传统网格在描述简单几何和颜色（如墙壁和地面）方面的优势与现代3D高斯表现形式捕捉复杂细节和高频纹理的能力。这种策略实现了内存性能和渲染质量之间的最佳平衡。我们在自己的数据集以及ScanNet++和KITTI-360上的实验结果展示了我们方法在渲染质量和资源效率方面的优越性。\n"
  },
  {
    "path": "abs/2404.10318.md",
    "content": "### SRGS: Super-Resolution 3D Gaussian Splatting\n\nRecently, 3D Gaussian Splatting (3DGS) has gained popularity as a novel explicit 3D representation. This approach relies on the representation power of Gaussian primitives to provide a high-quality rendering. However, primitives optimized at low resolution inevitably exhibit sparsity and texture deficiency, posing a challenge for achieving high-resolution novel view synthesis (HRNVS). To address this problem, we propose Super-Resolution 3D Gaussian Splatting (SRGS) to perform the optimization in a high-resolution (HR) space. The sub-pixel constraint is introduced for the increased viewpoints in HR space, exploiting the sub-pixel cross-view information of the multiple low-resolution (LR) views. The gradient accumulated from more viewpoints will facilitate the densification of primitives. Furthermore, a pre-trained 2D super-resolution model is integrated with the sub-pixel constraint, enabling these dense primitives to learn faithful texture features. In general, our method focuses on densification and texture learning to effectively enhance the representation ability of primitives. Experimentally, our method achieves high rendering quality on HRNVS only with LR inputs, outperforming state-of-the-art methods on challenging datasets such as Mip-NeRF 360 and Tanks & Temples. Related codes will be released upon acceptance.\n\n近期，3D高斯飞溅（3DGS）作为一种新颖的显式3D表现形式受到了广泛关注。这种方法依赖于高斯原语的表现力来提供高质量的渲染。然而，低分辨率下优化的原语不可避免地表现出稀疏性和纹理缺失，这给实现高分辨率新视角合成（HRNVS）带来了挑战。为解决这一问题，我们提出了超分辨率3D高斯飞溅（SRGS）在高分辨率（HR）空间进行优化。我们为HR空间中增加的视点引入了子像素约束，利用多个低分辨率（LR）视图的子像素跨视图信息。从更多视点积累的梯度将有助于原语的密集化。此外，一个预训练的2D超分辨率模型与子像素约束相结合，使这些密集的原语能够学习忠实的纹理特征。总的来说，我们的方法专注于原语的密集化和纹理学习，有效地增强了原语的表现能力。实验上，我们的方法仅使用LR输入在HRNVS上实现了高渲染质量，超过了如Mip-NeRF 360和Tanks & Temples等挑战性数据集上的最先进方法。相关代码将在文章接受后发布。\n"
  },
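The sub-pixel constraint can be pictured as rendering at high resolution and requiring each block of HR pixels to average to the observed LR pixel, optionally adding texture guidance from a pretrained 2D super-resolution model. A minimal sketch under those assumptions (the L1 form and the pooling operator are illustrative choices):

```python
import torch
import torch.nn.functional as F

def subpixel_loss(hr_render, lr_gt, scale=4):
    """Each scale x scale block of the HR render must average to the
    matching LR pixel. hr_render: (3, s*H, s*W); lr_gt: (3, H, W)."""
    lr_from_hr = F.avg_pool2d(hr_render.unsqueeze(0), kernel_size=scale).squeeze(0)
    return (lr_from_hr - lr_gt).abs().mean()

def texture_loss(hr_render, sr_reference):
    """Texture guidance: sr_reference is the output of a pretrained 2D
    super-resolution model applied to the LR view (an assumed input here)."""
    return (hr_render - sr_reference).abs().mean()
```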
  {
    "path": "abs/2404.10484.md",
    "content": "### AbsGS: Recovering Fine Details for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3D-GS) technique couples 3D Gaussian primitives with differentiable rasterization to achieve high-quality novel view synthesis results while providing advanced real-time rendering performance. However, due to the flaw of its adaptive density control strategy in 3D-GS, it frequently suffers from over-reconstruction issue in intricate scenes containing high-frequency details, leading to blurry rendered images. The underlying reason for the flaw has still been under-explored. In this work, we present a comprehensive analysis of the cause of aforementioned artifacts, namely gradient collision, which prevents large Gaussians in over-reconstructed regions from splitting. To address this issue, we propose the novel homodirectional view-space positional gradient as the criterion for densification. Our strategy efficiently identifies large Gaussians in over-reconstructed regions, and recovers fine details by splitting. We evaluate our proposed method on various challenging datasets. The experimental results indicate that our approach achieves the best rendering quality with reduced or similar memory consumption. Our method is easy to implement and can be incorporated into a wide variety of most recent Gaussian Splatting-based methods.\n\n\n3D高斯飞溅（3D-GS）技术将3D高斯原语与可微光栅化结合起来，实现了高质量的新视角合成结果，同时提供了先进的实时渲染性能。然而，由于其在3D-GS中的自适应密度控制策略存在缺陷，这种方法在包含高频细节的复杂场景中经常出现过度重建问题，导致渲染图像模糊。这一缺陷的根本原因尚未被深入探索。在这项工作中，我们对上述伪影的原因进行了全面分析，即梯度碰撞，这阻止了过度重建区域中的大高斯原语分裂。为了解决这个问题，我们提出了一种新的同向视空间位置梯度作为密集化的标准。我们的策略有效地识别了过度重建区域中的大高斯原语，并通过分裂恢复细节。我们在各种具有挑战性的数据集上评估了我们提出的方法。实验结果表明，我们的方法在减少或相似的内存消耗下实现了最佳的渲染质量。我们的方法易于实现，可以集成到广泛的最新基于高斯飞溅的方法中。\n"
  },
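The fix is essentially a one-line change to the densification statistic: accumulate the absolute value of the view-space positional gradient per axis, so contributions pulling a Gaussian in opposite directions can no longer cancel. A schematic of the bookkeeping; in the real method the absolute value is taken per pixel inside the CUDA rasterizer, and the threshold below is illustrative.

```python
import torch

def update_densify_stats(viewspace_grad, acc_abs, denom):
    """Accumulate densification statistics for one view.

    viewspace_grad: (N, 2) positional gradients in view space. Vanilla
    3D-GS tracks the signed accumulated gradient, so contributions in
    opposite directions cancel (gradient collision); accumulating
    |grad| per axis cannot cancel.
    """
    acc_abs += viewspace_grad.abs()
    denom += 1
    return acc_abs, denom

def split_candidates(acc_abs, denom, tau=4e-4):
    """Flag Gaussians whose mean homodirectional gradient is large
    (tau is an illustrative threshold)."""
    return (acc_abs / denom).norm(dim=-1) > tau
```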
  {
    "path": "abs/2404.10625.md",
    "content": "### Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks\n\nNeRF-based 3D-aware Generative Adversarial Networks (GANs) like EG3D or GIRAFFE have shown very high rendering quality under large representational variety. However, rendering with Neural Radiance Fields poses challenges for 3D applications: First, the significant computational demands of NeRF rendering preclude its use on low-power devices, such as mobiles and VR/AR headsets. Second, implicit representations based on neural networks are difficult to incorporate into explicit 3D scenes, such as VR environments or video games. 3D Gaussian Splatting (3DGS) overcomes these limitations by providing an explicit 3D representation that can be rendered efficiently at high frame rates. In this work, we present a novel approach that combines the high rendering quality of NeRF-based 3D-aware GANs with the flexibility and computational advantages of 3DGS. By training a decoder that maps implicit NeRF representations to explicit 3D Gaussian Splatting attributes, we can integrate the representational diversity and quality of 3D GANs into the ecosystem of 3D Gaussian Splatting for the first time. Additionally, our approach allows for a high resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes.\n\n基于NeRF的3D感知生成对抗网络（GAN），如EG3D或GIRAFFE，已展示出在大范围表示性方面的非常高的渲染质量。然而，使用神经辐射场（NeRF）进行渲染对3D应用带来了挑战：首先，NeRF渲染的显著计算需求阻止了其在低功率设备上的使用，例如移动设备和VR/AR头显。其次，基于神经网络的隐式表征难以融入到显式3D场景中，如VR环境或视频游戏。3D高斯飞溅（3DGS）通过提供一个可以高帧率高效渲染的显式3D表征，克服了这些限制。在这项工作中，我们提出了一种新颖的方法，将基于NeRF的3D感知GAN的高渲染质量与3DGS的灵活性和计算优势结合起来。通过训练一个将隐式NeRF表征映射到显式3D高斯飞溅属性的解码器，我们首次将3D GAN的表征多样性和质量整合到3D高斯飞溅的生态系统中。此外，我们的方法允许高分辨率的GAN反演以及实时的GAN编辑与3D高斯飞溅场景。\n"
  },
  {
    "path": "abs/2404.10772.md",
    "content": "### Gaussian Opacity Fields: Efficient and Compact Surface Reconstruction in Unbounded Scenes\n\nRecently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, while allowing the rendering of high-resolution images in real-time. However, leveraging 3D Gaussians for surface reconstruction poses significant challenges due to the explicit and disconnected nature of 3D Gaussians. In this work, we present Gaussian Opacity Fields (GOF), a novel approach for efficient, high-quality, and compact surface reconstruction in unbounded scenes. Our GOF is derived from ray-tracing-based volume rendering of 3D Gaussians, enabling direct geometry extraction from 3D Gaussians by identifying its levelset, without resorting to Poisson reconstruction or TSDF fusion as in previous work. We approximate the surface normal of Gaussians as the normal of the ray-Gaussian intersection plane, enabling the application of regularization that significantly enhances geometry. Furthermore, we develop an efficient geometry extraction method utilizing marching tetrahedra, where the tetrahedral grids are induced from 3D Gaussians and thus adapt to the scene's complexity. Our evaluations reveal that GOF surpasses existing 3DGS-based methods in surface reconstruction and novel view synthesis. Further, it compares favorably to, or even outperforms, neural implicit methods in both quality and speed.\n\n最近，3D高斯飞溅（3DGS）在新视角合成方面展示了令人印象深刻的结果，同时允许实时渲染高分辨率图像。然而，利用3D高斯进行表面重建因其显式且断开的性质而面临重大挑战。在这项工作中，我们提出了高斯不透明度场（GOF），一种用于无界场景中高效、高质量和紧凑表面重建的新方法。我们的GOF基于基于光线追踪的3D高斯体积渲染，使直接从3D高斯提取几何体成为可能，通过识别其水平集，无需依赖于之前工作中的泊松重建或TSDF融合。我们将高斯的表面法线近似为光线与高斯交点平面的法线，使得可以应用显著增强几何体的正则化。此外，我们开发了一种高效的几何体提取方法，使用行进四面体，其中四面体网格由3D高斯诱导并因此适应场景的复杂性。我们的评估显示，GOF在表面重建和新视角合成方面超越了现有的基于3DGS的方法。此外，它在质量和速度方面与神经隐式方法相比有优势，甚至表现更好。\n"
  },
  {
    "path": "abs/2404.11358.md",
    "content": "### DeblurGS: Gaussian Splatting for Camera Motion Blur\n\nAlthough significant progress has been made in reconstructing sharp 3D scenes from motion-blurred images, a transition to real-world applications remains challenging. The primary obstacle stems from the severe blur which leads to inaccuracies in the acquisition of initial camera poses through Structure-from-Motion, a critical aspect often overlooked by previous approaches. To address this challenge, we propose DeblurGS, a method to optimize sharp 3D Gaussian Splatting from motion-blurred images, even with the noisy camera pose initialization. We restore a fine-grained sharp scene by leveraging the remarkable reconstruction capability of 3D Gaussian Splatting. Our approach estimates the 6-Degree-of-Freedom camera motion for each blurry observation and synthesizes corresponding blurry renderings for the optimization process. Furthermore, we propose Gaussian Densification Annealing strategy to prevent the generation of inaccurate Gaussians at erroneous locations during the early training stages when camera motion is still imprecise. Comprehensive experiments demonstrate that our DeblurGS achieves state-of-the-art performance in deblurring and novel view synthesis for real-world and synthetic benchmark datasets, as well as field-captured blurry smartphone videos.\n\n尽管在从运动模糊图像重建清晰的三维场景方面取得了显著进展，但向现实世界应用的过渡仍然面临挑战。主要障碍源于严重的模糊，这导致通过结构从运动（Structure-from-Motion）获取初始相机姿态时出现不准确性，这一关键方面常被先前方法忽视。为了应对这一挑战，我们提出了一种方法 DeblurGS，即使在初始相机姿态带有噪声的情况下，也能优化从运动模糊图像恢复清晰的三维高斯打点（Gaussian Splatting）。我们通过利用三维高斯打点的卓越重建能力，恢复出细致的清晰场景。我们的方法估计每个模糊观察中的六自由度相机运动，并为优化过程合成相应的模糊渲染。此外，我们提出了高斯密度退火策略，以防止在相机运动尚不精确的早期训练阶段，在错误位置生成不准确的高斯。全面的实验表明，我们的 DeblurGS 在实际和合成基准数据集以及现场捕获的模糊智能手机视频的去模糊和新视角合成方面达到了业界领先水平。\n"
  },
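The key mechanism of DeblurGS can be sketched simply: estimate an intra-exposure camera trajectory per blurry frame, render sharp images along it, and average them to synthesize the blur that is compared against the observation. A minimal sketch, using naive position/quaternion interpolation as a stand-in for proper SE(3) interpolation (the `render_fn` signature is an assumption of this illustration):

```python
import torch
import torch.nn.functional as F

def interp_pose(p0, q0, p1, q1, t):
    """Naive pose interpolation: linear positions plus normalized
    quaternion lerp (a cheap stand-in for SE(3)/slerp interpolation)."""
    p = (1 - t) * p0 + t * p1
    q = F.normalize((1 - t) * q0 + t * q1, dim=-1)
    return p, q

def render_motion_blurred(render_fn, p0, q0, p1, q1, n_samples=8):
    """Average sharp renders along the estimated intra-exposure camera
    path; render_fn(position, quaternion) -> (3, H, W) image."""
    frames = [render_fn(*interp_pose(p0, q0, p1, q1, float(t)))
              for t in torch.linspace(0.0, 1.0, n_samples)]
    return torch.stack(frames).mean(dim=0)
```

Optimizing the photometric loss between this synthesized blur and the observed frame updates both the Gaussians and the per-frame motion estimate.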
  {
    "path": "abs/2404.11613.md",
    "content": "### InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior\n\n3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from largescale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion.\n\n三维高斯最近作为新视角合成的一种高效表示方式而出现。本研究着重探讨了其可编辑性，特别是针对补画任务，该任务旨在为不完整的三维高斯点集补充额外的点，以实现视觉上的和谐渲染。与二维补画相比，补画三维高斯的关键是确定引入点的渲染相关属性，这些属性的优化在很大程度上受益于它们最初的三维位置。为此，我们提出使用图像条件深度完成模型指导点的初始化，该模型学习基于观察到的图像直接恢复深度图。这种设计允许我们的模型在与原始深度对齐的尺度上填充深度值，并且还能利用大规模扩散先验的强大泛化能力。由于深度完成更加准确，我们的方法，名为 InFusion，在各种复杂场景下以更好的保真度和效率超越现有替代方案。我们进一步通过几个实际应用来展示 InFusion 的有效性，如使用用户特定纹理的补画或插入新的物体。\n"
  },
  {
    "path": "abs/2404.12379.md",
    "content": "### Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos\n\nModern 3D engines and graphics pipelines require mesh as a memory-efficient representation, which allows efficient rendering, geometry processing, texture editing, and many other downstream operations. However, it is still highly difficult to obtain high-quality mesh in terms of structure and detail from monocular visual observations. The problem becomes even more challenging for dynamic scenes and objects. To this end, we introduce Dynamic Gaussians Mesh (DG-Mesh), a framework to reconstruct a high-fidelity and time-consistent mesh given a single monocular video. Our work leverages the recent advancement in 3D Gaussian Splatting to construct the mesh sequence with temporal consistency from a video. Building on top of this representation, DG-Mesh recovers high-quality meshes from the Gaussian points and can track the mesh vertices over time, which enables applications such as texture editing on dynamic objects. We introduce the Gaussian-Mesh Anchoring, which encourages evenly distributed Gaussians, resulting better mesh reconstruction through mesh-guided densification and pruning on the deformed Gaussians. By applying cycle-consistent deformation between the canonical and the deformed space, we can project the anchored Gaussian back to the canonical space and optimize Gaussians across all time frames. During the evaluation on different datasets, DG-Mesh provides significantly better mesh reconstruction and rendering than baselines.\n\n现代三维引擎和图形管线需要使用网格作为一种存储高效的表现形式，这使得渲染、几何处理、纹理编辑以及许多其他下游操作变得更加高效。然而，从单目视觉观察中获得高质量的网格在结构和细节上仍然是非常困难的。这个问题在动态场景和对象上尤其具有挑战性。为此，我们引入了动态高斯网格（Dynamic Gaussians Mesh，简称 DG-Mesh）框架，该框架可以基于单个单目视频重建出高保真且时间连贯的网格。我们的工作利用了最新的三维高斯喷溅技术（3D Gaussian Splatting），从视频中构建具有时间一致性的网格序列。在这种表现形式的基础上，DG-Mesh 从高斯点恢复出高质量的网格，并能够随时间追踪网格顶点，这使得它能够应用于动态对象的纹理编辑。我们引入了高斯-网格锚定技术，该技术鼓励高斯点均匀分布，通过对变形高斯的网格引导的密集化和修剪，实现更好的网格重建。通过在规范空间和变形空间之间应用循环一致的变形，我们可以将锚定的高斯点投影回规范空间，并在所有时间帧上优化高斯点。在不同数据集上的评估中，DG-Mesh 在网格重建和渲染方面比基准模型表现出显著的改进。\n"
  },
  {
    "path": "abs/2404.12547.md",
    "content": "### Does Gaussian Splatting need SFM Initialization?\n\n3D Gaussian Splatting has recently been embraced as a versatile and effective method for scene reconstruction and novel view synthesis, owing to its high-quality results and compatibility with hardware rasterization. Despite its advantages, Gaussian Splatting's reliance on high-quality point cloud initialization by Structure-from-Motion (SFM) algorithms is a significant limitation to be overcome. To this end, we investigate various initialization strategies for Gaussian Splatting and delve into how volumetric reconstructions from Neural Radiance Fields (NeRF) can be utilized to bypass the dependency on SFM data. Our findings demonstrate that random initialization can perform much better if carefully designed and that by employing a combination of improved initialization strategies and structure distillation from low-cost NeRF models, it is possible to achieve equivalent results, or at times even superior, to those obtained from SFM initialization.\n\n3D 高斯喷溅近来被广泛认为是场景重建和新视角合成的一种多功能且有效的方法，这归功于其高质量的结果和与硬件光栅化的兼容性。尽管有其优点，高斯喷溅依赖于由结构运动（SFM）算法进行的高质量点云初始化是一个需要克服的重大限制。为此，我们研究了各种高斯喷溅的初始化策略，并深入探讨了如何利用来自神经辐射场（NeRF）的体积重建来绕过对SFM数据的依赖。我们的发现表明，如果设计得当，随机初始化的性能可以大大提升，而且通过采用改进的初始化策略和从低成本NeRF模型中提取结构，有可能达到与SFM初始化相当，甚至有时超过SFM初始化的结果。\n"
  },
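As a concrete example of what a "carefully designed" random initialization might look like, one simple baseline in this spirit samples points uniformly in a box sized to the camera rig rather than in an arbitrary unit cube. Both the box construction and the scale factor below are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def random_init_points(cam_centers, n_points=100_000, scale=3.0):
    """Sample initial Gaussian centers uniformly in a box around the
    camera rig, enlarged by `scale`. cam_centers: (M, 3) array."""
    lo, hi = cam_centers.min(0), cam_centers.max(0)
    center, half = (lo + hi) / 2.0, (hi - lo) / 2.0 * scale
    return center + (np.random.rand(n_points, 3) * 2.0 - 1.0) * half
```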
  {
    "path": "abs/2404.12777.md",
    "content": "### EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation\n\nIn the domain of 3D scene representation, 3D Gaussian Splatting (3DGS) has emerged as a pivotal technology. However, its application to large-scale, high-resolution scenes (exceeding 4k×4k pixels) is hindered by the excessive computational requirements for managing a large number of Gaussians. Addressing this, we introduce 'EfficientGS', an advanced approach that optimizes 3DGS for high-resolution, large-scale scenes. We analyze the densification process in 3DGS and identify areas of Gaussian over-proliferation. We propose a selective strategy, limiting Gaussian increase to key primitives, thereby enhancing the representational efficiency. Additionally, we develop a pruning mechanism to remove redundant Gaussians, those that are merely auxiliary to adjacent ones. For further enhancement, we integrate a sparse order increment for Spherical Harmonics (SH), designed to alleviate storage constraints and reduce training overhead. Our empirical evaluations, conducted on a range of datasets including extensive 4K+ aerial images, demonstrate that 'EfficientGS' not only expedites training and rendering times but also achieves this with a model size approximately tenfold smaller than conventional 3DGS while maintaining high rendering fidelity.\n\n在3D场景表现领域，3D高斯喷溅（3DGS）已成为一项关键技术。然而，其应用于大规模、高分辨率场景（超过4k×4k像素）时，由于管理大量高斯的计算需求过高而受到限制。针对这一问题，我们推出了“EfficientGS”，一种优化3DGS以适应高分辨率、大规模场景的先进方法。我们分析了3DGS中的密集化过程，并识别了高斯过度增殖的区域。我们提出了一种选择性策略，仅在关键原始体上限制高斯增加，从而提高了表现效率。此外，我们开发了一种修剪机制，用于移除多余的高斯，即那些仅作为邻近高斯的辅助存在的高斯。为了进一步提升，我们整合了一个稀疏阶数增加的球谐（SH），旨在减轻存储限制并降低训练开销。我们在包括大量4K+航拍图像的多个数据集上进行的实证评估表明，“EfficientGS”不仅加速了训练和渲染时间，而且还以大约是传统3DGS十分之一的模型大小实现了高保真度渲染。\n"
  },
  {
    "path": "abs/2404.12784.md",
    "content": "### Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation\n\nWe introduce Contrastive Gaussian Clustering, a novel approach capable of provide segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting on it the Gaussians before α blending their color. Following this example, we train a model to include also a segmentation feature vector for each Gaussian. These can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors; and to generate 2D segmentation masks, by projecting the Gaussians on a plane and α blending over their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks, and still learn to generate segmentation masks consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by +8% over the state of the art. Code and trained models will be released soon.\n\n我们介绍了一种名为对比高斯聚类的新方法，这种方法能够从任何视角提供分割掩码，并实现场景的3D分割。最近在新视角合成的研究中展示了如何通过一云3D高斯模型来表现场景的外观，以及如何通过在给定视点上投影这些高斯并在它们的颜色上进行α混合来生成精确的图像。继承这一方法，我们训练了一个模型，为每个高斯也包括一个分割特征向量。这些可以用于通过根据其特征向量聚类高斯来进行3D场景分割；以及通过将高斯投影到平面上并在其分割特征上进行α混合来生成2D分割掩码。通过结合对比学习和空间规则化，我们的方法可以在不一致的2D分割掩码上进行训练，仍然学会生成在所有视角下一致的分割掩码。此外，所得模型的准确性极高，预测掩码的IoU准确性比现有技术水平提高了+8％。代码和训练模型将很快发布。\n"
  },
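One way to realize the contrastive objective: render the per-Gaussian feature vectors into a 2D feature map, then pull each pixel's feature toward the mean feature of its instance mask and push it away from the other masks, one view at a time (which is why masks that are inconsistent across views can be tolerated). A sketch under the assumption of non-empty, non-overlapping masks; the temperature is illustrative.

```python
import torch
import torch.nn.functional as F

def mask_contrastive_loss(feat_map, masks, temperature=0.1):
    """feat_map: (C, H, W) rendered per-Gaussian features; masks:
    (M, H, W) boolean instance masks of one view, assumed non-empty
    and non-overlapping for this sketch."""
    C = feat_map.shape[0]
    feats = F.normalize(feat_map.reshape(C, -1).t(), dim=-1)      # (HW, C)
    protos = torch.stack([feats[m.reshape(-1)].mean(0) for m in masks])
    protos = F.normalize(protos, dim=-1)                          # (M, C)
    logits = feats @ protos.t() / temperature                     # (HW, M)
    labels = masks.reshape(len(masks), -1).float().argmax(0)      # pixel -> mask id
    covered = masks.any(dim=0).reshape(-1)
    # pull each covered pixel toward its own mask prototype, away from others
    return F.cross_entropy(logits[covered], labels[covered])
```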
  {
    "path": "abs/2404.13679.md",
    "content": "### GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal\n\nThis paper tackles the intricate challenge of object removal to update the radiance field using the 3D Gaussian Splatting. The main challenges of this task lie in the preservation of geometric consistency and the maintenance of texture coherence in the presence of the substantial discrete nature of Gaussian primitives. We introduce a robust framework specifically designed to overcome these obstacles. The key insight of our approach is the enhancement of information exchange among visible and invisible areas, facilitating content restoration in terms of both geometry and texture. Our methodology begins with optimizing the positioning of Gaussian primitives to improve geometric consistency across both removed and visible areas, guided by an online registration process informed by monocular depth estimation. Following this, we employ a novel feature propagation mechanism to bolster texture coherence, leveraging a cross-attention design that bridges sampling Gaussians from both uncertain and certain areas. This innovative approach significantly refines the texture coherence within the final radiance field. Extensive experiments validate that our method not only elevates the quality of novel view synthesis for scenes undergoing object removal but also showcases notable efficiency gains in training and rendering speeds.\n\n本文解决了使用三维高斯涂抹更新辐射场中物体移除的复杂挑战。这项任务的主要难点在于保持几何一致性和在高斯原始图形显著的离散特性存在的情况下维护纹理一致性。我们引入了一个专门设计的强大框架来克服这些障碍。我们方法的核心见解是增强可见区域和不可见区域之间的信息交换，从而在几何和纹理两个方面促进内容恢复。我们的方法首先通过优化高斯原始图形的定位来提高被移除区域和可见区域的几何一致性，这一过程由单目深度估计信息的在线注册过程指导。接下来，我们采用一种新颖的特征传播机制来增强纹理一致性，利用跨注意力设计桥接不确定和确定区域的采样高斯。这种创新方法显著提高了最终辐射场内的纹理一致性。广泛的实验验证了我们的方法不仅提升了经历物体移除的场景新视角合成的质量，而且还在训练和渲染速度上展示了显著的效率提升。\n"
  },
  {
    "path": "abs/2404.14037.md",
    "content": "### GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting\n\nRecent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting. With the explicit representation property of 3D Gaussians, intuitive control of the facial motion is achieved by binding Gaussians to 3D facial models. GaussianTalker consists of two modules, Speaker-specific Motion Translator and Dynamic Gaussian Renderer. Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction and customized lip motion generation. Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance facial detail representation via a latent pose, delivering stable and realistic rendered videos. Extensive experimental results suggest that GaussianTalker outperforms existing state-of-the-art methods in talking head synthesis, delivering precise lip synchronization and exceptional visual quality. Our method achieves rendering speeds of 130 FPS on NVIDIA RTX4090 GPU, significantly exceeding the threshold for real-time rendering performance, and can potentially be deployed on other hardware platforms.\n\n近期关于使用神经辐射场（NeRF）进行音频驱动的仿真人头合成的研究已取得了令人印象深刻的成果。然而，由于NeRF隐式表示导致的姿势和表情控制不足，这些方法仍存在一些限制，如嘴唇动作不同步或不自然，以及视觉抖动和伪影。在本文中，我们提出了GaussianTalker，这是一种基于三维高斯涂抹的新型音频驱动仿真人头合成方法。通过将高斯绑定到三维面部模型，三维高斯的显式表示属性实现了面部动作的直观控制。GaussianTalker由两个模块组成，分别是特定发言者的运动翻译器和动态高斯渲染器。特定发言者的运动翻译器通过普遍化的音频特征提取和定制的嘴唇运动生成，实现了针对目标发言者的准确嘴唇动作。动态高斯渲染器引入了特定发言者的BlendShapes，通过潜在姿势增强面部细节表示，提供稳定和逼真的渲染视频。广泛的实验结果表明，GaussianTalker在仿真人头合成方面超越了现有的最先进方法，提供了精确的嘴唇同步和卓越的视觉质量。我们的方法在NVIDIA RTX4090 GPU上的渲染速度达到了130 FPS，大大超过了实时渲染性能的阈值，且有潜力部署在其他硬件平台上。\n"
  },
  {
    "path": "abs/2404.14249.md",
    "content": "### CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding\n\nThe recent 3D Gaussian Splatting (GS) exhibits high-quality and real-time synthesis of novel views in 3D scenes. Currently, it primarily focuses on geometry and appearance modeling, while lacking the semantic understanding of scenes. To bridge this gap, we present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting to efficiently comprehend 3D environments without annotated semantic data. In specific, rather than straightforwardly learning and rendering high-dimensional semantic features of 3D Gaussians, which significantly diminishes the efficiency, we propose a Semantic Attribute Compactness (SAC) approach. SAC exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians, enabling highly efficient rendering (>100 FPS). Additionally, to address the semantic ambiguity, caused by utilizing view-inconsistent 2D CLIP semantics to supervise Gaussians, we introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originated from the 3D model. 3DCS imposes cross-view semantic consistency constraints by leveraging refined, self-predicted pseudo-labels derived from the trained 3D Gaussian model, thereby enhancing precise and view-consistent segmentation results. Extensive experiments demonstrate that our method remarkably outperforms existing state-of-the-art approaches, achieving improvements of 17.29% and 20.81% in mIoU metric on Replica and ScanNet datasets, respectively, while maintaining real-time rendering speed. Furthermore, our approach exhibits superior performance even with sparse input data, verifying the robustness of our method.\n\n最近的三维高斯涂抹（GS）展示了在三维场景中高质量和实时合成新视角的能力。目前，它主要关注几何和外观建模，而缺乏对场景的语义理解。为了弥补这一差距，我们提出了CLIP-GS，它将对比语言-图像预训练（CLIP）的语义集成到高斯涂抹中，以有效地理解没有标注语义数据的三维环境。具体来说，我们提出了一种语义属性紧凑性（SAC）方法，而不是直接学习和渲染三维高斯的高维语义特征，这大大降低了效率。SAC利用物体内部固有的统一语义，学习紧凑而有效的三维高斯的语义表示，实现高效渲染（>100 FPS）。此外，为了解决利用视图不一致的二维CLIP语义监督高斯所导致的语义歧义问题，我们引入了一种三维连贯自训练（3DCS）策略，依靠三维模型产生的多视图一致性。3DCS通过利用从训练的三维高斯模型派生的精炼的自预测伪标签，施加跨视图语义一致性约束，从而增强精确且视图一致的分割结果。广泛的实验表明，我们的方法显著优于现有的最先进方法，在Replica和ScanNet数据集上的mIoU指标分别提高了17.29%和20.81%，同时保持实时渲染速度。此外，我们的方法即使在输入数据稀疏的情况下也表现出优越的性能，验证了我们方法的鲁棒性。\n"
  },
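The SAC idea can be pictured as storing a very low-dimensional semantic code on each Gaussian and lifting rendered codes into CLIP space with a tiny shared decoder only where supervision is applied, keeping rasterization cheap. A hypothetical sketch; the 8-D code, decoder shape, and 512-D CLIP dimension are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class SemanticAttributeHead(torch.nn.Module):
    """Each Gaussian carries a compact code (8-D here) instead of a full
    CLIP embedding; a tiny shared decoder lifts rendered codes to CLIP
    space only where supervision is applied. All sizes are illustrative."""

    def __init__(self, code_dim=8, clip_dim=512):
        super().__init__()
        self.decode = torch.nn.Sequential(
            torch.nn.Linear(code_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, clip_dim),
        )

    def forward(self, rendered_codes):  # (num_pixels, code_dim)
        return F.normalize(self.decode(rendered_codes), dim=-1)
```

Rendering 8 channels instead of 512 is what keeps the semantic pass above 100 FPS; the decoder only runs on pixels where a CLIP target exists.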
  {
    "path": "abs/2404.14410.md",
    "content": "### Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses\n\nIn this paper, we present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation, enabling to conveniently and efficiently compose and render them together. In particular, we address the scenarios with severely limited and sparse observations in 3D human reconstruction, a common challenge encountered in the real world. To tackle this challenge, we introduce a novel approach to optimize the 3D-GS representation in a canonical space by fusing the sparse cues in the common space, where we leverage a pre-trained 2D diffusion model to synthesize unseen views while keeping the consistency with the observed 2D appearances. We demonstrate our method can reconstruct high-quality animatable 3D humans in various challenging examples, in the presence of occlusion, image crops, few-shot, and extremely sparse observations. After reconstruction, our method is capable of not only rendering the scene in any novel views at arbitrary time instances, but also editing the 3D scene by removing individual humans or applying different motions for each human. Through various experiments, we demonstrate the quality and efficiency of our methods over alternative existing approaches.\n\n在本文中，我们提出了一种从单目视频输入中重建三维世界和多个动态人体的方法。作为一个关键思想，我们通过最新出现的三维高斯涂抹（3D-GS）表示来表现世界和多个人体，这使得将它们一起方便高效地组合和渲染成为可能。特别是，我们解决了在三维人体重建中遇到的一个常见挑战——严重有限和稀疏的观察情况。为了应对这一挑战，我们引入了一种在规范空间中优化3D-GS表示的新方法，通过融合公共空间中的稀疏线索，我们利用预训练的二维扩散模型来合成未见视图，同时保持与观察到的二维外观的一致性。我们展示了我们的方法能够在存在遮挡、图像裁剪、少样本以及极端稀疏观察的各种挑战性示例中重建高质量的可动画三维人体。重建后，我们的方法不仅能在任何新视角和任意时间点渲染场景，还能通过移除个别人体或为每个人体应用不同的动作来编辑三维场景。通过各种实验，我们展示了我们的方法相比现有的替代方法的质量和效率。\n\n"
  },
  {
    "path": "abs/2404.15264.md",
    "content": "### TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting\n\nRadiance fields have demonstrated impressive performance in synthesizing lifelike 3D talking heads. However, due to the difficulty in fitting steep appearance changes, the prevailing paradigm that presents facial motions by directly modifying point appearance may lead to distortions in dynamic regions. To tackle this challenge, we introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis. Leveraging the point-based Gaussian Splatting, facial motions can be represented in our method by applying smooth and continuous deformations to persistent Gaussian primitives, without requiring to learn the difficult appearance change like previous methods. Due to this simplification, precise facial motions can be synthesized while keeping a highly intact facial feature. Under such a deformation paradigm, we further identify a face-mouth motion inconsistency that would affect the learning of detailed speaking motions. To address this conflict, we decompose the model into two branches separately for the face and inside mouth areas, therefore simplifying the learning tasks to help reconstruct more accurate motion and structure of the mouth region. Extensive experiments demonstrate that our method renders high-quality lip-synchronized talking head videos, with better facial fidelity and higher efficiency compared with previous methods.\n\n辐射场在合成逼真的3D说话头部方面表现出色。然而，由于适应剧烈的外观变化较为困难，当前通过直接修改点的外观来呈现面部动作的范式可能导致动态区域的扭曲。为了解决这一挑战，我们引入了TalkingGaussian，这是一个基于形变的辐射场框架，用于高保真的说话头部合成。通过利用基于点的高斯溅射，我们的方法可以通过对持久的高斯原始体应用平滑且连续的形变来表示面部动作，无需学习像以前的方法那样困难的外观变化。由于这种简化，可以合成精确的面部动作，同时保持高度完整的面部特征。在这种形变范式下，我们进一步发现了一个面部-口部动作不一致性，这会影响详细说话动作的学习。为了解决这一冲突，我们将模型分解为面部和口内区域的两个独立分支，从而简化学习任务，帮助重建更精确的口部动作和结构。广泛的实验表明，我们的方法渲染出的高质量唇同步说话头部视频，在面部保真度和效率上比以前的方法有更好的表现。\n"
  },
  {
    "path": "abs/2404.15891.md",
    "content": "### OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation\n\nRecent advancements in 3D reconstruction technologies have paved the way for high-quality and real-time rendering of complex 3D scenes. Despite these achievements, a notable challenge persists: it is difficult to precisely reconstruct specific objects from large scenes. Current scene reconstruction techniques frequently result in the loss of object detail textures and are unable to reconstruct object portions that are occluded or unseen in views. To address this challenge, we delve into the meticulous 3D reconstruction of specific objects within large scenes and propose a framework termed OMEGAS: Object Mesh Extraction from Large Scenes Guided by GAussian Segmentation. OMEGAS employs a multi-step approach, grounded in several excellent off-the-shelf methodologies. Specifically, initially, we utilize the Segment Anything Model (SAM) to guide the segmentation of 3D Gaussian Splatting (3DGS), thereby creating a basic 3DGS model of the target object. Then, we leverage large-scale diffusion priors to further refine the details of the 3DGS model, especially aimed at addressing invisible or occluded object portions from the original scene views. Subsequently, by re-rendering the 3DGS model onto the scene views, we achieve accurate object segmentation and effectively remove the background. Finally, these target-only images are used to improve the 3DGS model further and extract the definitive 3D object mesh by the SuGaR model. In various scenarios, our experiments demonstrate that OMEGAS significantly surpasses existing scene reconstruction methods.\n\n近期在3D重建技术上的进展已经为复杂3D场景的高质量和实时渲染铺平了道路。尽管取得了这些成就，仍存在一个显著的挑战：难以从大型场景中精确重建特定对象。当前的场景重建技术常常导致对象细节纹理的丢失，并且无法重建在视图中被遮挡或未见到的对象部分。为了应对这一挑战，我们深入研究大型场景中特定对象的细致3D重建，并提出了一个名为OMEGAS的框架：由高斯分割引导的大场景中的对象网格提取（Object Mesh Extraction from Large Scenes Guided by GAussian Segmentation）。OMEGAS采用多步骤方法，基于几种优秀的现成技术。具体来说，首先，我们利用Segment Anything Model（SAM）来指导3D高斯溅射（3DGS）的分割，从而创建目标对象的基本3DGS模型。然后，我们利用大规模扩散先验来进一步细化3DGS模型的细节，特别是针对原始场景视图中不可见或被遮挡的对象部分。随后，通过将3DGS模型重新渲染到场景视图上，我们实现了准确的对象分割，并有效地移除了背景。最后，这些仅含目标的图像被用来进一步改进3DGS模型，并通过SuGaR模型提取最终的3D对象网格。在各种场景下，我们的实验表明，OMEGAS显著超越了现有的场景重建方法。\n"
  },
  {
    "path": "abs/2404.16012.md",
    "content": "### GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting\n\nWe propose GaussianTalker, a novel framework for real-time generation of pose-controllable talking heads. It leverages the fast rendering capabilities of 3D Gaussian Splatting (3DGS) while addressing the challenges of directly controlling 3DGS with speech audio. GaussianTalker constructs a canonical 3DGS representation of the head and deforms it in sync with the audio. A key insight is to encode the 3D Gaussian attributes into a shared implicit feature representation, where it is merged with audio features to manipulate each Gaussian attribute. This design exploits the spatial-aware features and enforces interactions between neighboring points. The feature embeddings are then fed to a spatial-audio attention module, which predicts frame-wise offsets for the attributes of each Gaussian. It is more stable than previous concatenation or multiplication approaches for manipulating the numerous Gaussians and their intricate parameters. Experimental results showcase GaussianTalker's superiority in facial fidelity, lip synchronization accuracy, and rendering speed compared to previous methods. Specifically, GaussianTalker achieves a remarkable rendering speed of 120 FPS, surpassing previous benchmarks.\n\n我们提出了一个名为GaussianTalker的新型框架，用于实时生成可控姿态的说话头部。它利用3D高斯溅射（3DGS）的快速渲染能力，同时解决了直接用语音音频控制3DGS的挑战。GaussianTalker构建了一个头部的规范3DGS表示，并且与音频同步进行形变。一个关键的见解是将3D高斯属性编码到一个共享的隐式特征表示中，在此基础上与音频特征合并，以操纵每个高斯属性。这种设计利用了空间感知特征，并强化了相邻点之间的相互作用。然后将特征嵌入送入一个空间-音频注意力模块，该模块预测每个高斯的属性的帧偏移量。与以往的串联或乘法操纵大量高斯及其复杂参数的方法相比，它更为稳定。实验结果展示了GaussianTalker在面部保真度、唇部同步精度和渲染速度方面相比以往方法的优越性。特别是，GaussianTalker实现了120 FPS的显著渲染速度，超越了之前的基准。\n"
  },
  {
    "path": "abs/2404.16323.md",
    "content": "### DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction\n\nIn this paper, we study the problem of 3D reconstruction from a single-view RGB image and propose a novel approach called DIG3D for 3D object reconstruction and novel view synthesis. Our method utilizes an encoder-decoder framework which generates 3D Gaussians in decoder with the guidance of depth-aware image features from encoder. In particular, we introduce the use of deformable transformer, allowing efficient and effective decoding through 3D reference point and multi-layer refinement adaptations. By harnessing the benefits of 3D Gaussians, our approach offers an efficient and accurate solution for 3D reconstruction from single-view images. We evaluate our method on the ShapeNet SRN dataset, getting PSNR of 24.21 and 24.98 in car and chair dataset, respectively. The result outperforming the recent method by around 2.25%, demonstrating the effectiveness of our method in achieving superior results.\n\n在本文中，我们研究了从单视图 RGB 图像进行 3D 重建的问题，并提出了一种名为 DIG3D 的新方法，用于 3D 对象重建和新视角合成。我们的方法利用了一个编解码框架，在解码器中生成 3D 高斯分布，并由编码器提供的具有深度感知的图像特征指导。特别地，我们引入了可变形变换器的使用，通过 3D 参考点和多层次细化调整，实现了高效和有效的解码。通过利用 3D 高斯分布的优势，我们的方法为从单视图图像进行 3D 重建提供了一个高效且精确的解决方案。我们在 ShapeNet SRN 数据集上评估了我们的方法，在汽车和椅子数据集中分别获得了 24.21 和 24.98 的 PSNR 值。这一结果比最近的方法提高了约 2.25%，证明了我们的方法在实现优越结果方面的有效性。\n"
  },
  {
    "path": "abs/2404.16510.md",
    "content": "### Interactive3D: Create What You Want by Interactive 3D Generation\n\n3D object generation has undergone significant advancements, yielding high-quality results. However, fall short of achieving precise user control, often yielding results that do not align with user expectations, thus limiting their applicability. User-envisioning 3D object generation faces significant challenges in realizing its concepts using current generative models due to limited interaction capabilities. Existing methods mainly offer two approaches: (i) interpreting textual instructions with constrained controllability, or (ii) reconstructing 3D objects from 2D images. Both of them limit customization to the confines of the 2D reference and potentially introduce undesirable artifacts during the 3D lifting process, restricting the scope for direct and versatile 3D modifications. In this work, we introduce Interactive3D, an innovative framework for interactive 3D generation that grants users precise control over the generative process through extensive 3D interaction capabilities. Interactive3D is constructed in two cascading stages, utilizing distinct 3D representations. The first stage employs Gaussian Splatting for direct user interaction, allowing modifications and guidance of the generative direction at any intermediate step through (i) Adding and Removing components, (ii) Deformable and Rigid Dragging, (iii) Geometric Transformations, and (iv) Semantic Editing. Subsequently, the Gaussian splats are transformed into InstantNGP. We introduce a novel (v) Interactive Hash Refinement module to further add details and extract the geometry in the second stage. Our experiments demonstrate that Interactive3D markedly improves the controllability and quality of 3D generation.\n\n3D 对象生成已经取得了显著的进展，能够产生高质量的结果。然而，它在实现精确的用户控制方面仍有不足，往往产生的结果不符合用户的期望，从而限制了其应用范围。基于用户设想的 3D 对象生成在使用当前生成模型实现其概念时面临重大挑战，因为交互能力有限。现有方法主要提供两种途径：(i) 解释文本指令，但控制性有限；(ii) 从 2D 图像重建 3D 对象。这两种方法都将定制限制在 2D 参考的范围内，并可能在 3D 提升过程中引入不希望的误差，限制了直接和多功能的 3D 修改的范围。在这项工作中，我们引入了 Interactive3D，一种创新的用于交互式 3D 生成的框架，它通过广泛的 3D 交互能力，为用户提供了对生成过程的精确控制。Interactive3D 由两个串联阶段构建，利用不同的 3D 表示形式。第一阶段使用高斯喷涂进行直接用户交互，允许在任何中间步骤对生成方向进行修改和指导，包括：(i) 添加和移除组件，(ii) 可变形和刚性拖动，(iii) 几何变换，以及 (iv) 语义编辑。随后，将高斯喷涂转换为 InstantNGP。我们引入了一种新颖的 (v) 交互式哈希细化模块，以在第二阶段进一步添加细节和提取几何形状。我们的实验表明，Interactive3D 显著提高了 3D 生成的可控性和质量。\n\n"
  },
  {
    "path": "abs/2404.17215.md",
    "content": "### SLAM for Indoor Mapping of Wide Area Construction Environments\n\nSimultaneous localization and mapping (SLAM), i.e., the reconstruction of the environment represented by a (3D) map and the concurrent pose estimation, has made astonishing progress. Meanwhile, large scale applications aiming at the data collection in complex environments like factory halls or construction sites are becoming feasible. However, in contrast to small scale scenarios with building interiors separated to single rooms, shop floors or construction areas require measures at larger distances in potentially texture less areas under difficult illumination. Pose estimation is further aggravated since no GNSS measures are available as it is usual for such indoor applications. In our work, we realize data collection in a large factory hall by a robot system equipped with four stereo cameras as well as a 3D laser scanner. We apply our state-of-the-art LiDAR and visual SLAM approaches and discuss the respective pros and cons of the different sensor types for trajectory estimation and dense map generation in such an environment. Additionally, dense and accurate depth maps are generated by 3D Gaussian splatting, which we plan to use in the context of our project aiming on the automatic construction and site monitoring.s\n\n同时定位与地图构建（SLAM），即通过（3D）地图重建环境和同时进行姿态估计，已经取得了惊人的进展。与此同时，针对在复杂环境（如工厂车间或建筑工地）进行数据收集的大规模应用变得可行。然而，与小规模场景不同，小规模场景通常涉及将建筑内部分割为单个房间或商店地板，建筑工地或商店地板需要在潜在的无纹理区域进行更大距离的测量，并且照明条件复杂。由于缺少通常用于此类室内应用的全球导航卫星系统（GNSS）测量，姿态估计的难度进一步增加。在我们的工作中，我们通过装备了四个立体摄像机以及一个3D激光扫描器的机器人系统，在大型工厂车间实现数据收集。我们应用了最先进的激光雷达和视觉SLAM方法，并讨论了在这种环境中用于轨迹估计和密集地图生成的不同传感器类型的优缺点。此外，我们还通过3D高斯喷溅技术生成了密集且精确的深度图，计划在我们的项目中使用，该项目旨在自动化建筑和现场监控。\n"
  },
  {
    "path": "abs/2404.18394.md",
    "content": "### Reconstructing Satellites in 3D from Amateur Telescope Images\n\nThis paper proposes a framework for the 3D reconstruction of satellites in low-Earth orbit, utilizing videos captured by small amateur telescopes. The video data obtained from these telescopes differ significantly from data for standard 3D reconstruction tasks, characterized by intense motion blur, atmospheric turbulence, pervasive background light pollution, extended focal length and constrained observational perspectives. To address these challenges, our approach begins with a comprehensive pre-processing workflow that encompasses deep learning-based image restoration, feature point extraction and camera pose initialization. We proceed with the application of an improved 3D Gaussian splatting algorithm for reconstructing the 3D model. Our technique supports simultaneous 3D Gaussian training and pose estimation, enabling the robust generation of intricate 3D point clouds from sparse, noisy data. The procedure is further bolstered by a post-editing phase designed to eliminate noise points inconsistent with our prior knowledge of a satellite's geometric constraints. We validate our approach using both synthetic datasets and actual observations of China's Space Station, showcasing its significant advantages over existing methods in reconstructing 3D space objects from ground-based observations.\n\n本文提出了一个框架，用于利用小型业余望远镜拍摄的视频重建低地球轨道卫星的三维结构。这些望远镜获得的视频数据与标准三维重建任务的数据有显著不同，其特点包括强烈的运动模糊、大气湍流、普遍的背景光污染、扩展的焦距和受限的观测视角。为了应对这些挑战，我们的方法从一个全面的预处理工作流开始，包括基于深度学习的图像恢复、特征点提取和相机姿态初始化。接着，我们应用改进的三维高斯喷溅算法来重建三维模型。我们的技术支持三维高斯训练和姿态估计的同时进行，使得从稀疏、噪声数据中稳健生成复杂的三维点云成为可能。此过程还通过一个旨在消除与我们对卫星几何约束先验知识不一致的噪声点的后编辑阶段得到进一步加强。我们使用合成数据集和对中国空间站的实际观测来验证我们的方法，展示了其在从地面观测重建三维空间对象方面相较现有方法的显著优势。\n"
  },
  {
    "path": "abs/2404.18454.md",
    "content": "### 3D Gaussian Splatting with Deferred Reflection\n\nThe advent of neural and Gaussian-based radiance field methods have achieved great success in the field of novel view synthesis. However, specular reflection remains non-trivial, as the high frequency radiance field is notoriously difficult to fit stably and accurately. We present a deferred shading method to effectively render specular reflection with Gaussian splatting. The key challenge comes from the environment map reflection model, which requires accurate surface normal while simultaneously bottlenecks normal estimation with discontinuous gradients. We leverage the per-pixel reflection gradients generated by deferred shading to bridge the optimization process of neighboring Gaussians, allowing nearly correct normal estimations to gradually propagate and eventually spread over all reflective objects. Our method significantly outperforms state-of-the-art techniques and concurrent work in synthesizing high-quality specular reflection effects, demonstrating a consistent improvement of peak signal-to-noise ratio (PSNR) for both synthetic and real-world scenes, while running at a frame rate almost identical to vanilla Gaussian splatting.\n\n神经网络和基于高斯的辐射场方法的出现在新视角合成领域取得了巨大成功。然而，镜面反射的处理仍然非常棘手，因为高频辐射场的拟合稳定性和准确性众所周知是非常困难的。我们提出了一种延迟着色方法，有效地使用高斯喷溅渲染镜面反射。主要挑战来自环境映射反射模型，这需要精确的表面法线，同时在具有不连续梯度的情况下阻碍法线估计。我们利用延迟着色生成的每像素反射梯度来桥接邻近高斯的优化过程，允许几乎正确的法线估计逐渐传播，并最终扩展到所有反射对象上。我们的方法显著优于最先进技术和同时期的工作，在合成高质量镜面反射效果方面展示了一致的峰值信噪比（PSNR）改进，适用于合成场景和现实世界场景，同时运行帧率几乎与普通高斯喷溅相同。\n"
  },
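Deferred shading here means rasterizing the Gaussians into per-pixel buffers (base color, normal, reflection strength) and adding an environment-map lookup along the mirrored view direction afterward, so reflection gradients arrive per pixel rather than per Gaussian. A sketch assuming an equirectangular environment map and view directions pointing from surface to camera; the buffer layout is an assumption of this illustration.

```python
import torch
import torch.nn.functional as F

def deferred_reflection_pass(base_color, normals, refl_strength, view_dirs, env_map):
    """base_color: (3, H, W); normals, view_dirs: (3, H, W) unit vectors
    (view_dirs pointing from surface to camera); refl_strength: (1, H, W)
    in [0, 1]; env_map: (3, He, We) equirectangular environment map."""
    n = F.normalize(normals, dim=0)
    v = F.normalize(view_dirs, dim=0)
    r = 2.0 * (n * v).sum(0, keepdim=True) * n - v   # mirror v about n
    theta = torch.atan2(r[1], r[0])                  # azimuth, [-pi, pi]
    phi = torch.asin(r[2].clamp(-1.0, 1.0))          # elevation, [-pi/2, pi/2]
    grid = torch.stack([theta / torch.pi, phi / (torch.pi / 2)], dim=-1)
    env = F.grid_sample(env_map.unsqueeze(0), grid.unsqueeze(0),
                        align_corners=False).squeeze(0)
    return base_color + refl_strength * env
```

Since the lookup is differentiable in the normal, every reflective pixel contributes a gradient to the normals of the Gaussians behind it, which is what lets normal estimates propagate across a reflective surface.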
  {
    "path": "abs/2404.18669.md",
    "content": "### Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting\n\nRecent developments in neural rendering techniques have greatly enhanced the rendering of photo-realistic 3D scenes across both academic and commercial fields. The latest method, known as 3D Gaussian Splatting (3D-GS), has set new benchmarks for rendering quality and speed. Nevertheless, the limitations of 3D-GS become pronounced in synthesizing new viewpoints, especially for views that greatly deviate from those seen during training. Additionally, issues such as dilation and aliasing arise when zooming in or out. These challenges can all be traced back to a single underlying issue: insufficient sampling. In our paper, we present a bootstrapping method that significantly addresses this problem. This approach employs a diffusion model to enhance the rendering of novel views using trained 3D-GS, thereby streamlining the training process. Our results indicate that bootstrapping effectively reduces artifacts, as well as clear enhancements on the evaluation metrics. Furthermore, we show that our method is versatile and can be easily integrated, allowing various 3D reconstruction projects to benefit from our approach.\n\n近期在神经渲染技术方面的发展显著提升了学术和商业领域中对于逼真3D场景的渲染效果。最新的方法，称为三维高斯喷溅（3D-GS），已经为渲染质量和速度树立了新的标准。然而，在合成新视点时，特别是对于在训练期间很少见到的视点，3D-GS的限制变得明显。此外，在放大或缩小时还会出现膨胀和混叠的问题。这些挑战都可以追溯到一个基本问题：采样不足。在我们的论文中，我们提出了一种显著解决这个问题的引导方法。这种方法采用扩散模型来增强使用训练过的3D-GS渲染新视角，从而简化了训练过程。我们的结果表明，引导方法有效减少了伪影，并在评估指标上取得了明显的提升。此外，我们还展示了我们的方法具有多功能性，可以轻松集成，使各种3D重建项目都能从我们的方法中受益。\n"
  },
  {
    "path": "abs/2404.18929.md",
    "content": "### DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing\n\nWe consider the problem of editing 3D objects and scenes based on open-ended language instructions. The established paradigm to solve this problem is to use a 2D image generator or editor to guide the 3D editing process. However, this is often slow as it requires do update a computationally expensive 3D representations such as a neural radiance field, and to do so by using contradictory guidance from a 2D model which is inherently not multi-view consistent. We thus introduce the Direct Gaussian Editor (DGE), a method that addresses these issues in two ways. First, we modify a given high-quality image editor like InstructPix2Pix to be multi-view consistent. We do so by utilizing a training-free approach which integrates cues from the underlying 3D geometry of the scene. Second, given a multi-view consistent edited sequence of images of the object, we directly and efficiently optimize the 3D object representation, which is based on 3D Gaussian Splatting. Because it does not require to apply edits incrementally and iteratively, DGE is significantly more efficient than existing approaches, and comes with other perks such as allowing selective editing of parts of the scene.\n\n我们考虑了基于开放式语言指令编辑3D对象和场景的问题。解决这个问题的传统范式是使用2D图像生成器或编辑器来指导3D编辑过程。然而，这通常较慢，因为它需要更新如神经辐射场这样的计算成本高昂的3D表示，并且还需使用本质上不具备多视图一致性的2D模型提供的矛盾指导。因此，我们引入了直接高斯编辑器（DGE），这是一种以两种方式解决这些问题的方法。首先，我们修改了像InstructPix2Pix这样的高质量图像编辑器，使其具有多视图一致性。我们通过使用一种无需训练的方法实现，该方法整合了场景底层3D几何的线索。其次，给定一个多视图一致的编辑过的对象图像序列，我们直接且高效地优化基于3D高斯喷溅的3D对象表示。由于DGE不需要逐步和迭代地应用编辑，它比现有方法更加高效，并且还具有其他优点，如允许选择性编辑场景的部分。\n"
  },
  {
    "path": "abs/2404.19026.md",
    "content": "### MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing\n\nCreating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gaussian Head Avatar (MeGA) that models different head components with more suitable representations. Specifically, we select an enhanced FLAME mesh as our facial representation and predict a UV displacement map to provide per-vertex offsets for improved personalized geometric details. To achieve photorealistic renderings, we obtain facial colors using deferred neural rendering and disentangle neural textures into three meaningful parts. For hair modeling, we first build a static canonical hair using 3D Gaussian Splatting. A rigid transformation and an MLP-based deformation field are further applied to handle complex dynamic expressions. Combined with our occlusion-aware blending, MeGA generates higher-fidelity renderings for the whole head and naturally supports more downstream tasks. Experiments on the NeRSemble dataset demonstrate the effectiveness of our designs, outperforming previous state-of-the-art methods and supporting various editing functionalities, including hairstyle alteration and texture editing.\n\n\n创建多视角视频的高保真头部头像是许多AR/VR应用的核心问题。然而，现有方法通常难以同时获得所有不同头部组件的高质量渲染，因为它们使用单一表示来模拟具有截然不同特征的组件（例如，皮肤与头发）。在本文中，我们提出了一种混合网格-高斯头部头像（MeGA），它使用更合适的表示来模拟不同的头部组件。具体来说，我们选择一个增强的FLAME网格作为我们的面部表示，并预测一个UV位移图来提供每个顶点的偏移量，以改善个性化的几何细节。为了实现真实感渲染，我们使用延迟神经渲染获得面部颜色，并将神经纹理分解为三个有意义的部分。对于头发建模，我们首先使用3D高斯喷溅构建一个静态的规范头发。然后，应用一个刚性变换和一个基于MLP的变形场来处理复杂的动态表情。结合我们的遮挡感知混合，MeGA为整个头部生成了更高保真的渲染，并自然支持更多的下游任务。在NeRSemble数据集上的实验表明了我们设计的有效性，超越了以前的最先进方法，并支持各种编辑功能，包括发型变化和纹理编辑。\n"
  },
  {
    "path": "abs/2404.19040.md",
    "content": "### GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting\n\nWe present GStalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3∼5 minute video for training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Gaussian deformation field to translate and transform 3D Gaussians to synchronize with audio information, in which multi-resolution hashing grid-based tri-plane and temporal smooth module are incorporated to learn accurate deformation for fine-grained facial details. In addition, a pose-conditioned deformation field is designed to model the stabilized torso. To enable efficient optimization of the condition Gaussian deformation field, we initialize 3D Gaussians by learning a coarse static Gaussian representation. Extensive experiments in person-specific videos with audio tracks validate that GSTalker can generate high-fidelity and audio-lips synchronized results with fast training and real-time rendering speed.\n\n我们介绍了GStalker，这是一个3D音频驱动的对话面生成模型，采用高斯喷溅技术，实现快速训练（40分钟）和实时渲染（每秒125帧），只需3至5分钟的视频作为训练材料。相比之下，先前的基于2D和3D NeRF的建模框架需要几小时的训练和每帧几秒钟的渲染时间。具体来说，GSTalker学习一个由音频驱动的高斯变形场，以平移和变换3D高斯，使其与音频信息同步，其中包括多分辨率哈希网格基础的三平面和时间平滑模块，用于学习精确的变形以呈现细致的面部细节。此外，设计了一个姿势条件的变形场来模拟稳定的躯干。为了有效优化条件高斯变形场，我们通过学习粗略的静态高斯表示来初始化3D高斯。在包含音频轨道的特定人物视频中进行的广泛实验验证了GSTalker能够生成高保真度且音频与嘴唇同步的结果，具有快速训练和实时渲染的速度。\n"
  },
  {
    "path": "abs/2404.19149.md",
    "content": "### SAGS: Structure-Aware 3D Gaussian Splatting\n\nFollowing the advent of NeRFs, 3D Gaussian Splatting (3D-GS) has paved the way to real-time neural rendering overcoming the computational burden of volumetric methods. Following the pioneering work of 3D-GS, several methods have attempted to achieve compressible and high-fidelity performance alternatives. However, by employing a geometry-agnostic optimization scheme, these methods neglect the inherent 3D structure of the scene, thereby restricting the expressivity and the quality of the representation, resulting in various floating points and artifacts. In this work, we propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene, which reflects to state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets. SAGS is founded on a local-global graph representation that facilitates the learning of complex scenes and enforces meaningful point displacements that preserve the scene's geometry. Additionally, we introduce a lightweight version of SAGS, using a simple yet effective mid-point interpolation scheme, which showcases a compact representation of the scene with up to 24× size reduction without the reliance on any compression strategies. Extensive experiments across multiple benchmark datasets demonstrate the superiority of SAGS compared to state-of-the-art 3D-GS methods under both rendering quality and model size. Besides, we demonstrate that our structure-aware method can effectively mitigate floating artifacts and irregular distortions of previous methods while obtaining precise depth maps.\n\n自从NeRFs的出现之后，3D高斯喷溅（3D-GS）已经开辟了实时神经渲染的道路，克服了体积方法的计算负担。继3D-GS的开创性工作之后，几种方法试图实现可压缩和高保真度的性能替代方案。然而，这些方法采用了几何无关的优化方案，忽视了场景的固有3D结构，从而限制了表示的表现力和质量，导致了各种浮点和伪影。在这项工作中，我们提出了一种结构感知的高斯喷溅方法（SAGS），该方法隐式编码了场景的几何结构，反映出最先进的渲染性能和在基准新视角合成数据集上减少的存储需求。SAGS基于一个局部-全局图表示，便于学习复杂场景并强制执行有意义的点位移，以保持场景的几何结构。此外，我们引入了SAGS的轻量级版本，使用一种简单而有效的中点插值方案，展示了场景的紧凑表示，无需依赖任何压缩策略，可实现高达24倍的尺寸减少。在多个基准数据集上进行的广泛实验表明，与最先进的3D-GS方法相比，SAGS在渲染质量和模型大小方面具有优越性。此外，我们证明了我们的结构感知方法可以有效地减轻以前方法的浮动伪影和不规则扭曲，同时获得精确的深度图。\n"
  },
  {
    "path": "abs/2404.19398.md",
    "content": "### 3D Gaussian Blendshapes for Head Avatar Animation\n\nWe introduce 3D Gaussian blendshapes for modeling photorealistic head avatars. Taking a monocular video as input, we learn a base head model of neutral expression, along with a group of expression blendshapes, each of which corresponds to a basis expression in classical parametric face models. Both the neutral model and expression blendshapes are represented as 3D Gaussians, which contain a few properties to depict the avatar appearance. The avatar model of an arbitrary expression can be effectively generated by combining the neutral model and expression blendshapes through linear blending of Gaussians with the expression coefficients. High-fidelity head avatar animations can be synthesized in real time using Gaussian splatting. Compared to state-of-the-art methods, our Gaussian blendshape representation better captures high-frequency details exhibited in input video, and achieves superior rendering performance.\n\n我们引入了3D高斯混合形状（blendshapes）来模拟逼真的头部头像。输入单眼视频，我们学习了一个中性表情的基础头部模型，以及一组表情混合形状，每一个都对应于经典参数面部模型中的基础表情。中性模型和表情混合形状均以3D高斯表示，这些高斯包含几个属性以描述头像外观。通过将中性模型和表情混合形状通过高斯线性混合与表情系数结合，可以有效地生成任意表情的头像模型。使用高斯喷溅技术可以实时合成高保真的头部头像动画。与最先进的方法相比，我们的高斯混合形状表示更好地捕捉了输入视频中展示的高频细节，并实现了更优越的渲染性能。\n"
  },
  {
    "path": "abs/2404.19525.md",
    "content": "### MicroDreamer: Zero-shot 3D Generation in ∼20 Seconds by Score-based Iterative Reconstruction\n\nOptimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion model. Given the images produced by the diffusion model, SIR reduces NFEs by repeatedly optimizing 3D parameters, unlike the single optimization in SDS, mimicking the 3D reconstruction process. With other improvements including optimization in the pixel space, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. In particular, retaining a comparable performance, MicroDreamer is 5-20 times faster than SDS in generating neural radiance field and takes about 20 seconds to generate meshes from 3D Gaussian splitting on a single A100 GPU, halving the time of the fastest zero-shot baseline, DreamGaussian.\n\n基于优化的方法，如得分蒸馏抽样（SDS），在零样本3D生成中显示出前景，但由于每个样本所需的高数量的函数评估（NFEs）而效率低下。在本文中，我们引入了基于得分的迭代重建（SIR），这是一个高效且通用的3D生成算法，采用多视图基于得分的扩散模型。给定扩散模型产生的图像，SIR通过反复优化3D参数来减少NFEs，不同于SDS中的单次优化，模仿3D重建过程。通过包括在像素空间中的优化等其他改进，我们提出了一个名为MicroDreamer的高效方法，该方法通常适用于各种3D表示和3D生成任务。特别是，在保持可比性能的同时，MicroDreamer比SDS在生成神经辐射场方面快5-20倍，并且在单个A100 GPU上从3D高斯分割生成网格只需大约20秒，将最快的零样本基线DreamGaussian的时间减半。\n"
  },
  {
    "path": "abs/2404.19702.md",
    "content": "### GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting\n\nWe propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on single A100 GPU. Our model features a very simple transformer-based architecture; we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian parameters directly from these tokens for differentiable rendering. In contrast to previous LRMs that can only reconstruct objects, by predicting per-pixel Gaussians, GS-LRM naturally handles scenes with large variations in scale and complexity. We show that our model can work on both object and scene captures by training it on Objaverse and RealEstate10K respectively. In both scenarios, the models outperform state-of-the-art baselines by a wide margin. We also demonstrate applications of our model in downstream 3D generation tasks.\n\n我们提出了GS-LRM，一种可扩展的大型重建模型，能够从2-4张摆好姿势的稀疏图像中，在单个A100 GPU上仅用0.23秒预测高质量的3D高斯原始体。我们的模型采用了非常简单的基于变压器的架构；我们将输入的摆好姿势的图像进行打块处理，将多视图图像令牌串联起来，通过一系列变压器块传递，并直接从这些令牌解码出最终的每像素高斯参数，用于可微分渲染。与只能重建对象的以往LRM相比，通过预测每像素高斯，GS-LRM自然地处理具有大尺度和复杂性变化的场景。我们展示了我们的模型可以通过分别在Objaverse和RealEstate10K上训练，用于物体和场景捕获。在这两种情况下，模型均大幅超越了最先进的基准。我们还展示了该模型在下游3D生成任务中的应用。\n"
  },
  {
    "path": "abs/2405.00676.md",
    "content": "### Spectrally Pruned Gaussian Fields with Neural Compensation\n\nRecently, 3D Gaussian Splatting, as a novel 3D representation, has garnered attention for its fast rendering speed and high rendering quality. However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory. We credit this high memory footprint to the lack of consideration for the relationship between primitives. In this paper, we propose a memory-efficient Gaussian field named SUNDAE with spectral pruning and neural compensation. On one hand, we construct a graph on the set of Gaussian primitives to model their relationship and design a spectral down-sampling module to prune out primitives while preserving desired signals. On the other hand, to compensate for the quality loss of pruning Gaussians, we exploit a lightweight neural network head to mix splatted features, which effectively compensates for quality losses while capturing the relationship between primitives in its weights. We demonstrate the performance of SUNDAE with extensive results. For example, SUNDAE can achieve 26.80 PSNR at 145 FPS using 104 MB memory while the vanilla Gaussian splatting algorithm achieves 25.60 PSNR at 160 FPS using 523 MB memory, on the Mip-NeRF360 dataset.\n\n近来，3D高斯喷溅作为一种新型3D表示，因其快速的渲染速度和高质量的渲染效果而受到关注。然而，这也伴随着高内存消耗，例如，一个训练良好的高斯场可能会使用三百万个高斯原始体和超过700MB的内存。我们认为这种高内存占用是由于缺乏对原始体之间关系的考虑。在本文中，我们提出了一个名为SUNDAE的高效内存高斯场，采用频谱修剪和神经补偿。一方面，我们在高斯原始体集上构建一个图，以模拟它们之间的关系，并设计一个频谱下采样模块，以在保留所需信号的同时修剪掉原始体。另一方面，为了补偿修剪高斯所造成的质量损失，我们利用一个轻量级的神经网络头部混合喷溅的特征，有效地补偿质量损失，同时在其权重中捕捉原始体之间的关系。我们通过广泛的结果展示了SUNDAE的性能。例如，在Mip-NeRF360数据集上，SUNDAE能够在使用104MB内存的情况下达到26.80的PSNR和145FPS，而标准的高斯喷溅算法在使用523MB内存的情况下达到25.60的PSNR和160FPS。\n"
  },
  {
    "path": "abs/2405.00956.md",
    "content": "### Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians\n\nSurgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches for creating these environments with surgical scene involve a labor-intensive process where designers hand-craft tissues models with textures and geometries for soft body simulations. This manual approach is not only time-consuming but also limited in the scalability and realism. In contrast, data-driven simulation offers a compelling alternative. It has the potential to automatically reconstruct 3D surgical scenes from real-world surgical video data, followed by the application of soft body physics. This area, however, is relatively uncharted. In our research, we introduce 3D Gaussian as a learnable representation for surgical scene, which is learned from stereo endoscopic video. To prevent over-fitting and ensure the geometrical correctness of these scenes, we incorporate depth supervision and anisotropy regularization into the Gaussian learning process. Furthermore, we apply the Material Point Method, which is integrated with physical properties, to the 3D Gaussians to achieve realistic scene deformations. Our method was evaluated on our collected in-house and public surgical videos datasets. Results show that it can reconstruct and simulate surgical scenes from endoscopic videos efficiently-taking only a few minutes to reconstruct the surgical scene-and produce both visually and physically plausible deformations at a speed approaching real-time. The results demonstrate great potential of our proposed method to enhance the efficiency and variety of simulations available for surgical education and robot learning.\n\n手术场景仿真在手术教育和基于模拟器的机器人学习中扮演着至关重要的角色。传统的创建手术环境的方法涉及到一个劳动密集型的过程，设计师需要手工制作具有纹理和几何形状的软体模拟组织模型。这种手动方法不仅耗时，而且在可扩展性和真实性上有限。相比之下，数据驱动的仿真提供了一个引人注目的替代方案。它有潜力从现实世界的手术视频数据中自动重建3D手术场景，随后应用软体物理学。然而，这一领域相对未被探索。在我们的研究中，我们引入了3D高斯作为可学习的手术场景表示，该表示从立体内窥镜视频中学习得到。为了防止过拟合并确保这些场景的几何正确性，我们在高斯学习过程中引入了深度监督和各向异性正则化。此外，我们将物理属性与3D高斯集成的材料点方法应用于，以实现真实的场景变形。我们的方法在我们收集的内部和公共手术视频数据集上进行了评估。结果显示，它能够从内窥镜视频中有效重建和模拟手术场景——重建手术场景只需几分钟，并以接近实时的速度产生视觉上和物理上合理的变形。这些结果展示了我们提出的方法在提高手术教育和机器人学习仿真的效率和多样性方面的巨大潜力。\n"
  },
  {
    "path": "abs/2405.02005.md",
    "content": "### HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2\n\nIn the fields of photogrammetry, computer vision and computer graphics, the task of neural 3D scene reconstruction has led to the exploration of various techniques. Among these, 3D Gaussian Splatting stands out for its explicit representation of scenes using 3D Gaussians, making it appealing for tasks like 3D point cloud extraction and surface reconstruction. Motivated by its potential, we address the domain of 3D scene reconstruction, aiming to leverage the capabilities of the Microsoft HoloLens 2 for instant 3D Gaussian Splatting. We present HoloGS, a novel workflow utilizing HoloLens sensor data, which bypasses the need for pre-processing steps like Structure from Motion by instantly accessing the required input data i.e. the images, camera poses and the point cloud from depth sensing. We provide comprehensive investigations, including the training process and the rendering quality, assessed through the Peak Signal-to-Noise Ratio, and the geometric 3D accuracy of the densified point cloud from Gaussian centers, measured by Chamfer Distance. We evaluate our approach on two self-captured scenes: An outdoor scene of a cultural heritage statue and an indoor scene of a fine-structured plant. Our results show that the HoloLens data, including RGB images, corresponding camera poses, and depth sensing based point clouds to initialize the Gaussians, are suitable as input for 3D Gaussian Splatting.\n\n在摄影测量学、计算机视觉和计算机图形学领域，神经网络3D场景重建的任务促使人们探索了各种技术。其中，3D高斯喷溅技术因其使用3D高斯显式表示场景而脱颖而出，这使其在3D点云提取和表面重建等任务中显得非常吸引人。受到其潜力的激励，我们致力于3D场景重建领域，目标是利用微软HoloLens 2的能力进行即时的3D高斯喷溅。我们提出了一种名为HoloGS的新型工作流程，该流程利用HoloLens传感器数据，无需进行运动恢复结构等预处理步骤，即可直接获取所需的输入数据，即图像、相机位置和来自深度感测的点云。我们进行了全面的调查，包括训练过程和通过峰值信噪比评估的渲染质量，以及通过钱伯斯距离测量的来自高斯中心的密集点云的几何3D精度。我们在两个自采集的场景上评估了我们的方法：一个是户外的文化遗产雕像场景，另一个是室内的细结构植物场景。我们的结果显示，HoloLens的数据，包括RGB图像、相应的相机位置和基于深度感测的点云以初始化高斯分布，适合作为3D高斯喷溅的输入。\n"
  },
  {
    "path": "abs/2405.02280.md",
    "content": "### DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos\n\nExisting VLMs can track in-the-wild 2D video objects while current generative models provide powerful visual priors for synthesizing novel views for the highly under-constrained 2D-to-3D object lifting. Building upon this exciting progress, we present DreamScene4D, the first approach that can generate three-dimensional dynamic scenes of multiple objects from monocular in-the-wild videos with large object motion across occlusions and novel viewpoints. Our key insight is to design a \"decompose-then-recompose\" scheme to factorize both the whole video scene and each object's 3D motion. We first decompose the video scene by using open-vocabulary mask trackers and an adapted image diffusion model to segment, track, and amodally complete the objects and background in the video. Each object track is mapped to a set of 3D Gaussians that deform and move in space and time. We also factorize the observed motion into multiple components to handle fast motion. The camera motion can be inferred by re-rendering the background to match the video frames. For the object motion, we first model the object-centric deformation of the objects by leveraging rendering losses and multi-view generative priors in an object-centric frame, then optimize object-centric to world-frame transformations by comparing the rendered outputs against the perceived pixel and optical flow. Finally, we recompose the background and objects and optimize for relative object scales using monocular depth prediction guidance. We show extensive results on the challenging DAVIS, Kubric, and self-captured videos, detail some limitations, and provide future directions. Besides 4D scene generation, our results show that DreamScene4D enables accurate 2D point motion tracking by projecting the inferred 3D trajectories to 2D, while never explicitly trained to do so.\n\n现有的视觉语言模型（VLMs）能够跟踪野外环境中二维视频中的对象，而当前的生成模型为高度不受约束的二维至三维物体提升提供了强大的视觉先验，以合成新颖视角的视图。在这一令人兴奋的进展基础上，我们介绍了DreamScene4D，这是第一种能从单目野外视频中生成多个对象的三维动态场景的方法，该方法能处理大范围物体运动、遮挡和新颖视点。我们的关键见解是设计一个“分解再重组”的方案，以分解整个视频场景和每个对象的三维运动。我们首先使用开放词汇的遮罩跟踪器和适应性图像扩散模型分解视频场景，以分割、跟踪和完全表示视频中的对象和背景。每个对象的轨迹被映射到一组在空间和时间中变形和移动的3D高斯。我们还将观察到的运动分解为多个组件以处理快速运动。相机运动可以通过重新渲染背景以匹配视频帧来推断。对于对象运动，我们首先利用渲染损失和多视图生成先验，在对象中心框架中建模对象的变形，然后通过比较渲染输出与感知的像素和光流，优化对象中心到世界框架的转换。最后，我们重新组合背景和对象，并利用单目深度预测指导优化相对对象比例。我们在挑战性的DAVIS、Kubric和自采视频上展示了广泛的结果，详细说明了一些限制，并提供了未来的方向。除了4D场景生成，我们的结果还显示，DreamScene4D能通过将推断出的3D轨迹投影到2D来实现精确的2D点运动跟踪，尽管从未明确训练过这一功能。\n"
  },
  {
    "path": "abs/2405.03417.md",
    "content": "### Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review\n\nImage-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting.\n\n基于图像的三维重建是一项挑战性任务，涉及从一组输入图像中推断出对象或场景的三维形状。基于学习的方法因其能够直接估计三维形状而受到关注。本综述论文聚焦于三维重建的最新技术，包括生成新颖、未见过的视图。文章提供了高斯喷溅方法近期发展的概览，涵盖输入类型、模型结构、输出表达和训练策略。同时讨论了尚未解决的挑战和未来的发展方向。鉴于该领域的快速进展和增强三维重建方法的众多机会，对算法进行全面审查显得尤为重要。因此，本研究提供了关于高斯喷溅最新进展的详尽概述。\n"
  },
  {
    "path": "abs/2405.03659.md",
    "content": "### A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose\n\nNovel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation. In this paper, we leverage the recent 3D Gaussian splatting method to develop a novel construct-and-optimize method for sparse view synthesis without camera poses. Specifically, we construct a solution progressively by using monocular depth and projecting pixels back into the 3D world. During construction, we optimize the solution by detecting 2D correspondences between training views and the corresponding rendered images. We develop a unified differentiable pipeline for camera registration and adjustment of both camera poses and depths, followed by back-projection. We also introduce a novel notion of an expected surface in Gaussian splatting, which is critical to our optimization. These steps enable a coarse solution, which can then be low-pass filtered and refined using standard optimization methods. We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views, showing significantly better quality than competing methods, including those with approximate camera pose information. Moreover, our results improve with more views and outperform previous InstantNGP and Gaussian Splatting algorithms even when using half the dataset.\n\n从稀疏的输入图像集合进行新视角合成是一个具有实际意义的挑战性问题，特别是在缺少或不准确的相机位置的情况下。在神经辐射场算法中直接优化相机位置并使用估计的深度通常无法产生良好的结果，这是因为位置和深度之间的耦合以及单目深度估计的不准确性。在这篇论文中，我们利用最近的3D高斯喷溅方法，开发了一种新的构建并优化方法，用于在没有相机位置的情况下进行稀疏视图合成。具体来说，我们通过使用单目深度并将像素投影回三维世界，逐步构建解决方案。在构建过程中，我们通过检测训练视图与相应渲染图像之间的二维对应关系来优化解决方案。我们开发了一个统一的可微分管道，用于相机注册和调整相机位置及深度，随后进行反投影。我们还引入了高斯喷溅中的一个新概念——预期表面，这对我们的优化至关重要。这些步骤使我们能够得到一个粗糙的解决方案，随后可以通过标准优化方法进行低通滤波和细化。我们在Tanks and Temples及Static Hikes数据集上展示了结果，即使仅使用三个相距较远的视图，也显示出比竞争方法包括那些有大致相机位置信息的方法显著更好的质量。此外，我们的结果随着视图数量的增加而改善，并且即使使用一半的数据集也超过了先前的InstantNGP和高斯喷溅算法。\n"
  },
  {
    "path": "abs/2405.04378.md",
    "content": "### Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting\n\nWe present Splat-MOVER, a modular robotics stack for open-vocabulary robotic manipulation, which leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. Splat-MOVER consists of: (i) ASK-Splat, a GSplat representation that distills latent codes for language semantics and grasp affordance into the 3D scene. ASK-Splat enables geometric, semantic, and affordance understanding of 3D scenes, which is critical for many robotics tasks; (ii) SEE-Splat, a real-time scene-editing module using 3D semantic masking and infilling to visualize the motions of objects that result from robot interactions in the real-world. SEE-Splat creates a \"digital twin\" of the evolving environment throughout the manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses ASK-Splat and SEE-Splat to propose candidate grasps for open-world objects. ASK-Splat is trained in real-time from RGB images in a brief scanning phase prior to operation, while SEE-Splat and Grasp-Splat run in real-time during operation. We demonstrate the superior performance of Splat-MOVER in hardware experiments on a Kinova robot compared to two recent baselines in four single-stage, open-vocabulary manipulation tasks, as well as in four multi-stage manipulation tasks using the edited scene to reflect scene changes due to prior manipulation stages, which is not possible with the existing baselines.\n\n我们提出了 Splat-MOVER，这是一个用于开放词汇机器人操控的模块化机器人技术栈，它利用高斯喷溅（GSplat）场景表示的可编辑性，以实现多阶段操控任务。Splat-MOVER 包括：（i）ASK-Splat，一种 GSplat 表示，它提取用于语言语义和抓握适应性的潜在代码到3D场景中。ASK-Splat 能够理解3D场景的几何形状、语义和适应性，这对许多机器人任务至关重要；（ii）SEE-Splat，一个实时场景编辑模块，使用3D语义遮罩和填充来可视化由机器人交互在现实世界中产生的对象运动。SEE-Splat 在操控任务过程中创建了一个“数字孪生”环境；以及（iii）Grasp-Splat，一个抓握生成模块，使用 ASK-Splat 和 SEE-Splat 提出开放世界对象的候选抓握方式。ASK-Splat 在操作前的简短扫描阶段通过 RGB 图像实时训练，而 SEE-Splat 和 Grasp-Splat 在操作期间实时运行。我们通过在 Kinova 机器人上的硬件实验，展示了 Splat-MOVER 在四个单阶段开放词汇操控任务以及使用编辑后场景反映由于先前操控阶段引起的场景变化的四个多阶段操控任务中，相比两个最新基准的优越性能，这在现有基准中是无法实现的。\n\n"
  },
  {
    "path": "abs/2405.05446.md",
    "content": "### GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields\n\nThe 3D Gaussian splatting methods are getting popular. However, they work directly on the signal, leading to a dense representation of the signal. Even with some techniques such as pruning or distillation, the results are still dense. In this paper, we propose to model the gradient of the original signal. The gradients are much sparser than the original signal. Therefore, the gradients use much less Gaussian splats, leading to the more efficient storage and thus higher computational performance during both training and rendering. Thanks to the sparsity, during the view synthesis, only a small mount of pixels are needed, leading to much higher computational performance (100∼1000× faster). And the 2D image can be recovered from the gradients via solving a Poisson equation with linear computation complexity. Several experiments are performed to confirm the sparseness of the gradients and the computation performance of the proposed method. The method can be applied various applications, such as human body modeling and indoor environment modeling.\n\n三维高斯喷溅方法正变得越来越流行。然而，它们直接作用于信号上，导致信号表示较为密集。尽管采用了剪枝或蒸馏等技术，结果仍然是密集的。在本文中，我们提议建模原始信号的梯度。梯度比原始信号稀疏得多。因此，梯度使用的高斯喷溅数量少得多，导致更有效的存储，从而在训练和渲染期间具有更高的计算性能。由于稀疏性，在视图合成期间只需要少量像素，从而大大提高了计算性能（快100～1000倍）。通过求解具有线性计算复杂性的泊松方程，可以从梯度中恢复2D图像。进行了几项实验以确认梯度的稀疏性和所提方法的计算性能。该方法可以应用于各种应用，例如人体建模和室内环境建模。\n"
  },
  {
    "path": "abs/2405.05702.md",
    "content": "### NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap\n\nGaussian Splatting has garnered widespread attention due to its exceptional performance. Consequently, SLAM systems based on Gaussian Splatting have emerged, leveraging its capabilities for rapid real-time rendering and high-fidelity mapping. However, current Gaussian Splatting SLAM systems usually struggle with large scene representation and lack effective loop closure adjustments and scene generalization capabilities. To address these issues, we introduce NGM-SLAM, the first GS-SLAM system that utilizes neural radiance field submaps for progressive scene expression, effectively integrating the strengths of neural radiance fields and 3D Gaussian Splatting. We have developed neural implicit submaps as supervision and achieve high-quality scene expression and online loop closure adjustments through Gaussian rendering of fused submaps. Our results on multiple real-world scenes and large-scale scene datasets demonstrate that our method can achieve accurate gap filling and high-quality scene expression, supporting both monocular, stereo, and RGB-D inputs, and achieving state-of-the-art scene reconstruction and tracking performance.\n\n高斯喷溅由于其卓越的性能而受到广泛关注。因此，基于高斯喷溅的 SLAM（同步定位与地图构建）系统已经出现，利用其快速实时渲染和高保真度映射的能力。然而，当前的高斯喷溅 SLAM 系统通常在大场景表示上存在困难，且缺乏有效的闭环调整和场景泛化能力。为解决这些问题，我们引入了 NGM-SLAM，这是第一个利用神经辐射场子图进行逐步场景表达的 GS-SLAM 系统，有效整合了神经辐射场和三维高斯喷溅的优势。我们开发了作为监督的神经隐式子图，并通过融合子图的高斯渲染实现高质量的场景表达和在线闭环调整。我们在多个真实世界场景和大规模场景数据集上的结果表明，我们的方法能够实现精确的缺口填补和高质量的场景表达，支持单目、立体和 RGB-D 输入，并实现了最先进的场景重建和跟踪性能。\n\n"
  },
  {
    "path": "abs/2405.05768.md",
    "content": "### FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting\n\nText-driven 3D indoor scene generation holds broad applications, ranging from gaming and smart homes to AR/VR applications. Fast and high-fidelity scene generation is paramount for ensuring user-friendly experiences. However, existing methods are characterized by lengthy generation processes or necessitate the intricate manual specification of motion parameters, which introduces inconvenience for users. Furthermore, these methods often rely on narrow-field viewpoint iterative generations, compromising global consistency and overall scene quality. To address these issues, we propose FastScene, a framework for fast and higher-quality 3D scene generation, while maintaining the scene consistency. Specifically, given a text prompt, we generate a panorama and estimate its depth, since the panorama encompasses information about the entire scene and exhibits explicit geometric constraints. To obtain high-quality novel views, we introduce the Coarse View Synthesis (CVS) and Progressive Novel View Inpainting (PNVI) strategies, ensuring both scene consistency and view quality. Subsequently, we utilize Multi-View Projection (MVP) to form perspective views, and apply 3D Gaussian Splatting (3DGS) for scene reconstruction. Comprehensive experiments demonstrate FastScene surpasses other methods in both generation speed and quality with better scene consistency. Notably, guided only by a text prompt, FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods, making it a paradigm for user-friendly scene generation.\n\n文本驱动的三维室内场景生成在从游戏和智能家居到增强现实/虚拟现实应用等广泛领域中具有重要应用。快速和高保真的场景生成对于确保用户友好体验至关重要。然而，现有方法的生成过程耗时或需要复杂的手动指定运动参数，这为用户带来不便。此外，这些方法通常依赖于狭窄视野的迭代生成，从而影响全局一致性和整体场景质量。为了解决这些问题，我们提出了 FastScene，一个用于快速且更高质量的三维场景生成的框架，同时保持场景的一致性。具体来说，给定一个文本提示，我们生成一个全景并估计其深度，因为全景包含了整个场景的信息并展示了明确的几何约束。为了获得高质量的新视角，我们引入了粗视图合成（CVS）和渐进式新视角填充（PNVI）策略，确保场景的一致性和视图质量。随后，我们利用多视角投影（MVP）形成透视视图，并应用三维高斯喷溅（3DGS）进行场景重建。全面的实验表明，FastScene 在生成速度和质量上都超越了其他方法，具有更好的场景一致性。值得注意的是，仅凭文本提示，FastScene 能在短短15分钟内生成一个三维场景，比最先进的方法至少快了一个小时，使其成为用户友好场景生成的典范。\n"
  },
  {
    "path": "abs/2405.05800.md",
    "content": "### DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation\n\nUser-friendly 3D object editing is a challenging task that has attracted significant attention recently. The limitations of direct 3D object editing without 2D prior knowledge have prompted increased attention towards utilizing 2D generative models for 3D editing. While existing methods like Instruct NeRF-to-NeRF offer a solution, they often lack user-friendliness, particularly due to semantic guided editing. In the realm of 3D representation, 3D Gaussian Splatting emerges as a promising approach for its efficiency and natural explicit property, facilitating precise editing tasks. Building upon these insights, we propose DragGaussian, a 3D object drag-editing framework based on 3D Gaussian Splatting, leveraging diffusion models for interactive image editing with open-vocabulary input. This framework enables users to perform drag-based editing on pre-trained 3D Gaussian object models, producing modified 2D images through multi-view consistent editing. Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and comprehensive validation of its effectiveness through qualitative and quantitative experiments.\n\n近期，用户友好的三维对象编辑成为一个具有挑战性的任务，引起了广泛关注。直接三维对象编辑在没有二维先验知识的情况下的局限性，促使人们更多地使用二维生成模型进行三维编辑。尽管现有方法如 Instruct NeRF-to-NeRF 提供了解决方案，但它们通常缺乏用户友好性，特别是在语义引导编辑方面。在三维表示领域中，三维高斯喷溅作为一种有效且自然明确的方法，成为了一个有前景的选择，便于进行精确编辑任务。基于这些洞见，我们提出了 DragGaussian，这是一个基于三维高斯喷溅的三维对象拖拽编辑框架，利用扩散模型进行交互式图像编辑，并支持开放词汇输入。该框架使用户能够对预训练的三维高斯对象模型进行基于拖拽的编辑，通过多视角一致性编辑产生修改后的二维图像。我们的贡献包括引入一个新的任务，开发用于交互式点基三维编辑的 DragGaussian，以及通过定性和定量实验全面验证其有效性。\n"
  },
  {
    "path": "abs/2405.06408.md",
    "content": "### I3DGS: Improve 3D Gaussian Splatting from Multiple Dimensions\n\n3D Gaussian Splatting is a novel method for 3D view synthesis, which can gain an implicit neural learning rendering result than the traditional neural rendering technology but keep the more high-definition fast rendering speed. But it is still difficult to achieve a fast enough efficiency on 3D Gaussian Splatting for the practical applications. To Address this issue, we propose the I3DS, a synthetic model performance improvement evaluation solution and experiments test. From multiple and important levels or dimensions of the original 3D Gaussian Splatting, we made more than two thousand various kinds of experiments to test how the selected different items and components can make an impact on the training efficiency of the 3D Gaussian Splatting model. In this paper, we will share abundant and meaningful experiences and methods about how to improve the training, performance and the impacts caused by different items of the model. A special but normal Integer compression in base 95 and a floating-point compression in base 94 with ASCII encoding and decoding mechanism is presented. Many real and effective experiments and test results or phenomena will be recorded. After a series of reasonable fine-tuning, I3DS can gain excellent performance improvements than the previous one.\n\n3D 高斯喷涂是一种用于3D视图合成的新方法，相较于传统的神经渲染技术，它能实现更隐式的神经学习渲染结果，同时保持更高清晰度的快速渲染速度。然而，在实际应用中实现足够快的效率仍然具有挑战性。为了解决这一问题，我们提出了 I3DS，这是一种合成模型性能改进评估解决方案，包括实验测试。通过对原始3D高斯喷涂的多个重要层面或维度进行分析，我们进行了超过两千种不同类型的实验，以测试选定的不同项目和组件如何影响3D高斯喷涂模型的训练效率。在本文中，我们将分享关于如何改进训练、性能以及模型不同项目带来的影响的丰富而有意义的经验和方法。本文还介绍了一种特殊但常见的基于95的整数压缩和基于94的浮点压缩，以及ASCII编码和解码机制。许多实际有效的实验和测试结果或现象将被记录。经过一系列合理的微调，I3DS能比之前取得显著的性能改进。\n"
  },
  {
    "path": "abs/2405.06547.md",
    "content": "### OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation\n\nOne image to editable dynamic 3D model and video generation is novel direction and change in the research area of single image to 3D representation or 3D reconstruction of image. Gaussian Splatting has demonstrated its advantages in implicit 3D reconstruction, compared with the original Neural Radiance Fields. As the rapid development of technologies and principles, people tried to used the Stable Diffusion models to generate targeted models with text instructions. However, using the normal implicit machine learning methods is hard to gain the precise motions and actions control, further more, it is difficult to generate a long content and semantic continuous 3D video. To address this issue, we propose the OneTo3D, a method and theory to used one single image to generate the editable 3D model and generate the targeted semantic continuous time-unlimited 3D video. We used a normal basic Gaussian Splatting model to generate the 3D model from a single image, which requires less volume of video memory and computer calculation ability. Subsequently, we designed an automatic generation and self-adaptive binding mechanism for the object armature. Combined with the re-editable motions and actions analyzing and controlling algorithm we proposed, we can achieve a better performance than the SOTA projects in the area of building the 3D model precise motions and actions control, and generating a stable semantic continuous time-unlimited 3D video with the input text instructions. Here we will analyze the detailed implementation methods and theories analyses. Relative comparisons and conclusions will be presented.\n\n从单一图像到可编辑的动态3D模型和视频生成是单图像到3D表示或图像3D重建研究领域的新方向和变革。高斯喷涂已经在隐式3D重建中展示了其优势，与原始的神经辐射场相比。随着技术和原理的快速发展，人们尝试使用稳定扩散模型根据文本指令生成目标模型。然而，使用常规的隐式机器学习方法难以精确控制运动和动作，更难以生成长内容和语义连续的3D视频。为了解决这一问题，我们提出了OneTo3D方法和理论，使用单一图像生成可编辑的3D模型，并生成目标语义连续的不限时长的3D视频。我们使用了一个基本的高斯喷涂模型从单一图像生成3D模型，这种方法需要较少的视频内存和计算能力。随后，我们设计了一个自动化生成和自适应绑定机制来处理对象骨架。结合我们提出的可重新编辑的动作分析和控制算法，我们可以在建立精确的3D模型动作控制和生成稳定的语义连续的不限时长3D视频方面，实现比现有最佳技术更好的性能。在这里，我们将分析详细的实施方法和理论分析。相关比较和结论将被提出。\n\n"
  },
  {
    "path": "abs/2405.06945.md",
    "content": "### Direct Learning of Mesh and Appearance via 3D Gaussian Splatting\n\nAccurately reconstructing a 3D scene including explicit geometry information is both attractive and challenging. Geometry reconstruction can benefit from incorporating differentiable appearance models, such as Neural Radiance Fields and 3D Gaussian Splatting (3DGS). In this work, we propose a learnable scene model that incorporates 3DGS with an explicit geometry representation, namely a mesh. Our model learns the mesh and appearance in an end-to-end manner, where we bind 3D Gaussians to the mesh faces and perform differentiable rendering of 3DGS to obtain photometric supervision. The model creates an effective information pathway to supervise the learning of the scene, including the mesh. Experimental results demonstrate that the learned scene model not only achieves state-of-the-art rendering quality but also supports manipulation using the explicit mesh. In addition, our model has a unique advantage in adapting to scene updates, thanks to the end-to-end learning of both mesh and appearance.\n\n精确重建包括显式几何信息的3D场景既具有吸引力又具有挑战性。几何重建可以通过结合可微分的外观模型来受益，例如神经辐射场和3D高斯喷涂（3DGS）。在这项工作中，我们提出了一个可学习的场景模型，该模型将3DGS与显式的几何表示（即网格）结合在一起。我们的模型以端到端的方式学习网格和外观，其中我们将3D高斯绑定到网格面上，并执行3DGS的可微分渲染以获得光度监督。该模型创建了一个有效的信息通道来监督场景的学习，包括网格。实验结果表明，学习到的场景模型不仅实现了最先进的渲染质量，还支持使用显式网格进行操作。此外，我们的模型在适应场景更新方面具有独特的优势，这得益于网格和外观的端到端学习。\n"
  },
  {
    "path": "abs/2405.07319.md",
    "content": "### LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer\n\nAnimatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these limitations, we present Layered Gaussian Avatars (LayGA), a new representation that formulates body and clothing as two separate layers for photorealistic animatable clothing transfer from multi-view videos. Our representation is built upon the Gaussian map-based avatar for its excellent representation power of garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface. The absence of a smooth explicit surface raises challenges in accurate garment tracking and collision handling between body and garments. Therefore, we propose two-stage training involving single-layer reconstruction and multi-layer fitting. In the single-layer reconstruction stage, we propose a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. Next, in the multi-layer fitting stage, we train two separate models to represent body and clothing and utilize the reconstructed clothing geometries as 3D supervision for more accurate garment tracking. Furthermore, we propose geometry and rendering layers for both high-quality geometric reconstruction and high-fidelity rendering. Overall, the proposed LayGA realizes photorealistic animations and virtual try-on, and outperforms other baseline methods.\n\n面对跨角色的服装转移和动画化，可动画化服装转移是一个充满挑战的问题。大多数人体化身作品将人体和服装的表示混合在一起，这导致跨身份的虚拟试穿存在困难。更糟糕的是，这种纠缠的表示通常无法准确跟踪服装的滑动运动。为了克服这些限制，我们提出了Layered Gaussian Avatars（LayGA），这是一种新的表示方法，将身体和服装作为两个独立的层来实现从多视角视频中进行真实感动画化服装转移。我们的表示基于高斯地图化身，因其在服装细节表现上的卓越能力。然而，高斯地图产生的非结构化3D高斯分布在实际表面周围。缺乏光滑的显式表面在精确的服装跟踪和处理身体与服装之间的碰撞时带来挑战。因此，我们提出了两阶段训练，包括单层重建和多层拟合。在单层重建阶段，我们提出一系列几何约束来重建平滑表面，并同时获得身体和服装之间的分割。接下来，在多层拟合阶段，我们训练两个分离的模型来分别表示身体和服装，并利用重建的服装几何体作为3D监督，以实现更精确的服装跟踪。此外，我们还提出了几何和渲染层，以实现高质量的几何重建和高保真渲染。总体而言，所提出的LayGA实现了逼真的动画和虚拟试穿，并且性能超过了其他基线方法。\n"
  },
  {
    "path": "abs/2405.07472.md",
    "content": "### GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting\n\nThe increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline integrating Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, the use of only images as editing prompts for 3D editing. To further address issues, e.g., face blurring, garment inaccuracy, and degraded viewpoint quality during editing, we devise a three-stage refinement strategy to gradually mitigate potential issues. Furthermore, we introduce a new editing strategy termed Edit Recall Reconstruction (ERR) to tackle the limitations of previous editing strategies in leading to complex geometric changes. Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON while also establishing a novel starting point for image-prompting 3D scene editing.\n\n电子商务的日益突出，突显了虚拟试穿（VTON）的重要性。然而，以往的研究主要集中在2D领域，并且严重依赖大量数据进行训练。3D VTON的研究主要集中在服装-身体形状的兼容性上，这一主题在2D VTON中已被广泛覆盖。得益于3D场景编辑技术的进步，现在一个2D扩散模型已经被适应用于通过多视角编辑进行3D编辑。在这项工作中，我们提出了GaussianVTON，这是一个创新的3D VTON流程，将高斯喷涂（GS）编辑与2D VTON集成在一起。为了促进从2D到3D VTON的无缝过渡，我们首次提出使用仅图像作为编辑提示进行3D编辑。为了进一步解决面部模糊、服装不准确和编辑过程中视角质量下降等问题，我们设计了一个三阶段细化策略来逐步缓解潜在问题。此外，我们引入了一种新的编辑策略，称为编辑回调重建（ERR），以解决以往编辑策略在导致复杂几何变化方面的局限性。我们全面的实验展示了GaussianVTON的优越性，为3D VTON提供了一个新的视角，同时也为图像提示的3D场景编辑确立了一个新的起点。\n"
  },
  {
    "path": "abs/2405.10142.md",
    "content": "### GS-Planner: A Gaussian-Splatting-based Planning Framework for Active High-Fidelity Reconstruction\n\nActive reconstruction technique enables robots to autonomously collect scene data for full coverage, relieving users from tedious and time-consuming data capturing process. However, designed based on unsuitable scene representations, existing methods show unrealistic reconstruction results or the inability of online quality evaluation. Due to the recent advancements in explicit radiance field technology, online active high-fidelity reconstruction has become achievable. In this paper, we propose GS-Planner, a planning framework for active high-fidelity reconstruction using 3D Gaussian Splatting. With improvement on 3DGS to recognize unobserved regions, we evaluate the reconstruction quality and completeness of 3DGS map online to guide the robot. Then we design a sampling-based active reconstruction strategy to explore the unobserved areas and improve the reconstruction geometric and textural quality. To establish a complete robot active reconstruction system, we choose quadrotor as the robotic platform for its high agility. Then we devise a safety constraint with 3DGS to generate executable trajectories for quadrotor navigation in the 3DGS map. To validate the effectiveness of our method, we conduct extensive experiments and ablation studies in highly realistic simulation scenes.\n\n主动重建技术使机器人能够自主收集全覆盖的场景数据，免除用户进行繁琐和耗时的数据捕获过程。然而，由于基于不适合的场景表示设计，现有方法显示出不现实的重建结果或无法进行在线质量评估。由于近期在显式辐射场技术的进步，在线高保真重建已成为可能。在本文中，我们提出了GS-Planner，一个使用3D高斯喷涂进行主动高保真重建的规划框架。通过改进3DGS以识别未观察到的区域，我们在线评估3DGS地图的重建质量和完整性以指导机器人。然后，我们设计了一种基于采样的主动重建策略，以探索未观察区域并提高重建的几何和纹理质量。为了建立一个完整的机器人主动重建系统，我们选择四旋翼飞行器作为机器人平台，因其具有高灵活性。接着，我们使用3DGS设计了一个安全约束，为四旋翼飞行器在3DGS地图中的导航生成可执行轨迹。为了验证我们方法的有效性，我们在高度现实的模拟场景中进行了广泛的实验和剔除研究。\n"
  },
  {
    "path": "abs/2405.10508.md",
    "content": "### ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation\n\nIn this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate a point cloud map, addressing domain differences. Additionally, we propose a depth consistency module to enhance 3D scene consistency. Finally, the 3D scene serves as initial points for optimizing Gaussian splats. Experimental results demonstrate ART3D's superior performance in both content and structural consistency metrics when compared to existing methods. ART3D significantly advances the field of AI in art creation by providing an innovative solution for generating high-quality 3D artistic scenes.\n\n在本文中，我们通过引入ART3D这一新型框架，探讨了3D艺术场景生成中存在的挑战，该框架结合了扩散模型和3D高斯喷涂技术。我们的方法通过一个创新的图像语义转移算法，有效地弥合了艺术图像与现实图像之间的差距。通过利用深度信息和初始艺术图像，我们生成了一个点云图，解决了领域差异问题。此外，我们提出了一个深度一致性模块，以增强3D场景的一致性。最后，3D场景作为优化高斯喷点的初始点。实验结果表明，与现有方法相比，ART3D在内容和结构一致性指标上表现出优越的性能。ART3D通过为生成高质量3D艺术场景提供创新解决方案，显著推进了AI在艺术创作领域的发展。\n"
  },
  {
    "path": "abs/2405.11021.md",
    "content": "### Photorealistic 3D Urban Scene Reconstruction and Point Cloud Extraction using Google Earth Imagery and Gaussian Splatting\n\n3D urban scene reconstruction and modelling is a crucial research area in remote sensing with numerous applications in academia, commerce, industry, and administration. Recent advancements in view synthesis models have facilitated photorealistic 3D reconstruction solely from 2D images. Leveraging Google Earth imagery, we construct a 3D Gaussian Splatting model of the Waterloo region centered on the University of Waterloo and are able to achieve view-synthesis results far exceeding previous 3D view-synthesis results based on neural radiance fields which we demonstrate in our benchmark. Additionally, we retrieved the 3D geometry of the scene using the 3D point cloud extracted from the 3D Gaussian Splatting model which we benchmarked against our Multi- View-Stereo dense reconstruction of the scene, thereby reconstructing both the 3D geometry and photorealistic lighting of the large-scale urban scene through 3D Gaussian Splatting.\n\n3D城市场景重建和建模是遥感研究中的一个重要领域，广泛应用于学术界、商业、工业和行政管理。最近在视图合成模型方面的进步已经使得仅通过2D图像实现光度真实的3D重建成为可能。我们利用谷歌地球的影像，构建了以滑铁卢大学为中心的滑铁卢地区的3D高斯喷溅模型，并能够实现远超基于神经辐射场的先前3D视图合成结果的视图合成效果。此外，我们还通过从3D高斯喷溅模型中提取的3D点云检索了场景的3D几何形状，并将其与我们的多视图立体密集重建的场景进行了基准测试，从而重建了大规模城市场景的3D几何形状和光度真实的照明效果。\n"
  },
  {
    "path": "abs/2405.11129.md",
    "content": "### MotionGS : Compact Gaussian Splatting SLAM by Motion Filter\n\nWith their high-fidelity scene representation capability, the attention of SLAM field is deeply attracted by the Neural Radiation Field (NeRF) and 3D Gaussian Splatting (3DGS). Recently, there has been a Surge in NeRF-based SLAM, while 3DGS-based SLAM is sparse. A novel 3DGS-based SLAM approach with a fusion of deep visual feature, dual keyframe selection and 3DGS is presented in this paper. Compared with the existing methods, the proposed selectively tracking is achieved by feature extraction and motion filter on each frame. The joint optimization of pose and 3D Gaussian runs through the entire mapping process. Additionally, the coarse-to-fine pose estimation and compact Gaussian scene representation are implemented by dual keyfeature selection and novel loss functions. Experimental results demonstrate that the proposed algorithm not only outperforms the existing methods in tracking and mapping, but also has less memory usage.\n\n\n凭借其高保真场景表示能力，神经辐射场（NeRF）和3D高斯喷溅（3DGS）深深吸引了SLAM领域的注意。最近，基于NeRF的SLAM出现了激增，而基于3DGS的SLAM则较为稀少。本文提出了一种新颖的基于3DGS的SLAM方法，该方法融合了深度视觉特征、双关键帧选择和3DGS。与现有方法相比，所提出的选择性跟踪是通过对每帧进行特征提取和运动滤波来实现的。姿态和3D高斯的联合优化贯穿整个映射过程。此外，通过双关键特征选择和新颖的损失函数实现了从粗到细的姿态估计和紧凑的高斯场景表示。实验结果表明，所提出的算法不仅在跟踪和映射方面优于现有方法，而且还具有较低的内存使用率。\n"
  },
  {
    "path": "abs/2405.11921.md",
    "content": "### MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections\n\n3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting. The key insight is grounded on the mirror symmetry between the real-world space and the virtual mirror space. We introduce an intuitive dual-rendering strategy that enables differentiable rasterization of both the real-world 3D Gaussians and the mirrored counterpart obtained by reflecting the former about the mirror plane. All 3D Gaussians are jointly optimized with the mirror plane in an end-to-end framework. MirrorGaussian achieves high-quality and real-time rendering in scenes with mirrors, empowering scene editing like adding new mirrors and objects. Comprehensive experiments on multiple datasets demonstrate that our approach significantly outperforms existing methods, achieving state-of-the-art results.\n\n3D高斯喷溅在光度真实和实时新视角合成方面展示了显著的进步。然而，它在模拟镜面反射时面临挑战，这些反射从不同视点显示出显著的外观变化。为了解决这个问题，我们提出了MirrorGaussian，这是第一个基于3D高斯喷溅进行镜面场景重建并实现实时渲染的方法。关键见解基于真实世界空间与虚拟镜面空间之间的镜面对称性。我们引入了一种直观的双重渲染策略，使得真实世界的3D高斯和通过镜面反射得到的镜像部分都能进行可微栅格化。所有3D高斯与镜面一起在端到端框架中进行联合优化。MirrorGaussian在含镜子的场景中实现了高质量和实时渲染，增强了场景编辑能力，如添加新镜子和物体。在多个数据集上的全面实验表明，我们的方法显著超过现有方法，达到了最先进的结果。\n"
  },
  {
    "path": "abs/2405.11993.md",
    "content": "### GGAvatar: Geometric Adjustment of Gaussian Head Avatar\n\nWe propose GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure, featuring two core modules: Neutral Gaussian Initialization Module and Geometry Morph Adjuster. Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes, employing an adaptive density control strategy to model the geometric structure of the target subject with neutral expressions. Geometry Morph Adjuster introduces deformation bases for each Gaussian in global space, creating fine-grained low-dimensional representations of deformation behaviors to address the Linear Blend Skinning formula's limitations effectively. Extensive experiments show that GGAvatar can produce high-fidelity renderings, outperforming state-of-the-art methods in visual quality and quantitative metrics.\n\n我们提出了GGAvatar，这是一种新颖的3D头像表示方法，旨在健壮地模拟具有复杂身份和变形的动态头像。GGAvatar采用从粗到细的结构，包括两个核心模块：中性高斯初始化模块和几何形态调整器。中性高斯初始化模块将高斯原始体与可变形的三角网格配对，采用自适应密度控制策略来模拟目标对象在中性表情下的几何结构。几何形态调整器为全局空间中的每个高斯引入变形基，创建精细的低维变形行为表示，有效解决线性混合蒙皮公式的局限。广泛的实验表明，GGAvatar可以产生高保真的渲染效果，在视觉质量和定量指标上超越了最先进的方法。\n"
  },
  {
    "path": "abs/2405.12069.md",
    "content": "### Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping\n\nBy equipping the most recent 3D Gaussian Splatting representation with head 3D morphable models (3DMM), existing methods manage to create head avatars with high fidelity. However, most existing methods only reconstruct a head without the body, substantially limiting their application scenarios. We found that naively applying Gaussians to model the clothed chest and shoulders tends to result in blurry reconstruction and noisy floaters under novel poses. This is because of the fundamental limitation of Gaussians and point clouds -- each Gaussian or point can only have a single directional radiance without spatial variance, therefore an unnecessarily large number of them is required to represent complicated spatially varying texture, even for simple geometry. In contrast, we propose to model the body part with a neural texture that consists of coarse and pose-dependent fine colors. To properly render the body texture for each view and pose without accurate geometry nor UV mapping, we optimize another sparse set of Gaussians as anchors that constrain the neural warping field that maps image plane coordinates to the texture space. We demonstrate that Gaussian Head & Shoulders can fit the high-frequency details on the clothed upper body with high fidelity and potentially improve the accuracy and fidelity of the head region. We evaluate our method with casual phone-captured and internet videos and show our method archives superior reconstruction quality and robustness in both self and cross reenactment tasks. To fully utilize the efficient rendering speed of Gaussian splatting, we additionally propose an accelerated inference method of our trained model without Multi-Layer Perceptron (MLP) queries and reach a stable rendering speed of around 130 FPS for any subjects.\n\n通过将最新的3D高斯喷溅表示与头部3D可变形模型（3DMM）相结合，现有方法成功地创建了高保真的头像。然而，大多数现有方法只重建了头部而非身体，这在很大程度上限制了它们的应用场景。我们发现，直接应用高斯模型来模拟穿着衣服的胸部和肩膀往往会导致模糊的重建和在新姿势下出现噪声浮点。这是因为高斯和点云的根本限制——每个高斯或点只能有单一方向的辐射而没有空间变异，因此需要大量的点来表示即使是简单几何形状的复杂空间变化纹理。与此相反，我们提出用由粗糙和依赖姿势的细致颜色组成的神经纹理来模拟身体部分。为了在没有精确几何结构或UV映射的情况下正确渲染每个视角和姿势的身体纹理，我们优化了另一组稀疏的高斯锚点，这些锚点限制了将图像平面坐标映射到纹理空间的神经变形场。我们证明了高斯头部与肩膀能够以高保真度适配穿衣上半身的高频细节，并有可能提高头部区域的准确性和保真度。我们使用随意拍摄的手机视频和互联网视频评估我们的方法，并展示了我们的方法在自我和交叉再现任务中具有卓越的重建质量和鲁棒性。为了充分利用3D高斯喷溅的高效渲染速度，我们还提出了一种加速推理方法，无需多层感知器（MLP）查询，就能达到大约130 FPS的稳定渲染速度。\n"
  },
  {
    "path": "abs/2405.12110.md",
    "content": "### CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization\n\nSplatting (3DGS) creates a radiance field consisting of 3D Gaussians to represent a scene. With sparse training views, 3DGS easily suffers from overfitting, negatively impacting the reconstruction quality. This paper introduces a new co-regularization perspective for improving sparse-view 3DGS. When training two 3D Gaussian radiance fields with the same sparse views of a scene, we observe that the two radiance fields exhibit point disagreement and rendering disagreement that can unsupervisedly predict reconstruction quality, stemming from the sampling implementation in densification. We further quantify the point disagreement and rendering disagreement by evaluating the registration between Gaussians' point representations and calculating differences in their rendered pixels. The empirical study demonstrates the negative correlation between the two disagreements and accurate reconstruction, which allows us to identify inaccurate reconstruction without accessing ground-truth information. Based on the study, we propose CoR-GS, which identifies and suppresses inaccurate reconstruction based on the two disagreements: (1) Co-pruning considers Gaussians that exhibit high point disagreement in inaccurate positions and prunes them. (2) Pseudo-view co-regularization considers pixels that exhibit high rendering disagreement are inaccurately rendered and suppress the disagreement. Results on LLFF, Mip-NeRF360, DTU, and Blender demonstrate that CoR-GS effectively regularizes the scene geometry, reconstructs the compact representations, and achieves state-of-the-art novel view synthesis quality under sparse training views.\n\n3D高斯喷溅（3DGS）创建一个由3D高斯组成的辐射场来表示场景。在稀疏训练视图的情况下，3DGS容易过拟合，这对重建质量产生负面影响。本文引入了一种新的共正则化视角，用于改善稀疏视图下的3DGS。当使用同一场景的相同稀疏视图训练两个3D高斯辐射场时，我们观察到两个辐射场表现出\\textit{点不一致}和\\textit{渲染不一致}，这两种不一致可以无监督地预测重建质量，源自在密集化中的采样实现。我们通过评估高斯点表示之间的配准以及计算它们渲染像素的差异来进一步量化点不一致和渲染不一致。实证研究表明两种不一致与精确重建之间的负相关性，这使我们能够在不访问真实信息的情况下识别不准确的重建。基于这项研究，我们提出了CoR-GS，该方法基于两种不一致来识别和抑制不准确的重建：（1）共修剪考虑显示高点不一致的高斯处于不准确的位置并将其修剪。（2）伪视图共正则化考虑显示高渲染不一致的像素是不准确渲染的，并抑制这种不一致。在LLFF、Mip-NeRF360、DTU和Blender数据集上的结果表明，CoR-GS有效地规范了场景几何结构，重建了紧凑的表示，并在稀疏训练视图下实现了最先进的新视角合成质量。\n"
  },
  {
    "path": "abs/2405.12218.md",
    "content": "### Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo\n\nWe present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume rendering design for novel view synthesis. 3) To support fast fine-tuning for specific scenes, we introduce a multi-view geometric consistent aggregation strategy to effectively aggregate the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with the vanilla 3D-GS, MVSGaussian achieves better view synthesis with less training computational cost. Extensive experiments on DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.\n\n我们提出了MVSGaussian，这是一种新的从多视图立体（MVS）衍生的通用3D高斯表示方法，能够高效地重建未见过的场景。具体来说，1）我们利用MVS来编码具有几何意识的高斯表示，并将其解码为高斯参数。2）为了进一步提高性能，我们提出了一种混合高斯渲染技术，该技术整合了一种高效的体积渲染设计，用于新视角合成。3）为了支持特定场景的快速微调，我们引入了一种多视图几何一致性聚合策略，有效地聚合由通用模型生成的点云，作为每个场景优化的初始化。与之前需要几分钟微调时间和每幅图像几秒钟渲染时间的通用NeRF-based方法相比，MVSGaussian实现了每个场景更好的合成质量的实时渲染。与原始的3D-GS相比，MVSGaussian在更低的训练计算成本下实现了更好的视图合成。在DTU、Real Forward-facing、NeRF Synthetic以及Tanks and Temples数据集上的广泛实验验证了MVSGaussian具有卓越的性能、令人信服的通用性、实时渲染速度和快速的场景特定优化能力。\n"
  },
  {
    "path": "abs/2405.12369.md",
    "content": "### AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field\n\n3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts due to prioritizing optimizing large Gaussians at the cost of adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. The Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. The strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we proposed a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over other SDF-based methods.\n\n3D高斯喷溅（3DGS）最近通过提供卓越的新视角合成能力和实时渲染速度，推进了辐射场重建技术。然而，其融合优化和自适应密度控制的策略可能导致次优结果；有时由于优先优化大的高斯而牺牲了适当增密小高斯，可能产生噪声几何和模糊的伪影。为了解决这个问题，我们引入了AtomGS，包括原子化扩增和几何引导优化。原子化扩增将不同大小的椭球高斯限制为更统一大小的原子高斯。这种策略通过在场景细节方面强调密度增加，增强了细微特征区域的表示。此外，我们提出了一种几何引导优化方法，其中包含了边缘感知的法线损失。这种优化方法有效地平滑了平坦表面，同时保留了复杂的细节。我们的评估显示，AtomGS在渲染质量上超过了现有的最先进方法。此外，它在几何重建的准确性上具有竞争力，并且相比其他基于SDF的方法在训练速度上有显著提高。\n"
  },
  {
    "path": "abs/2405.12420.md",
    "content": "### GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details\n\nTraditional 3D garment creation is labor-intensive, involving sketching, modeling, UV mapping, and texturing, which are time-consuming and costly. Recent advances in diffusion-based generative models have enabled new possibilities for 3D garment generation from text prompts, images, and videos. However, existing methods either suffer from inconsistencies among multi-view images or require additional processes to separate cloth from the underlying human model. In this paper, we propose GarmentDreamer, a novel method that leverages 3D Gaussian Splatting (GS) as guidance to generate wearable, simulation-ready 3D garment meshes from text prompts. In contrast to using multi-view images directly predicted by generative models as guidance, our 3DGS guidance ensures consistent optimization in both garment deformation and texture synthesis. Our method introduces a novel garment augmentation module, guided by normal and RGBA information, and employs implicit Neural Texture Fields (NeTF) combined with Score Distillation Sampling (SDS) to generate diverse geometric and texture details. We validate the effectiveness of our approach through comprehensive qualitative and quantitative experiments, showcasing the superior performance of GarmentDreamer over state-of-the-art alternatives.\n\n传统的3D服装创建是劳动密集型的，涉及素描、建模、UV映射和纹理化等多个步骤，这些步骤既耗时又昂贵。最近，基于扩散的生成模型的进展为从文本提示、图像和视频生成3D服装开辟了新的可能性。然而，现有方法要么在多视图图像中存在不一致，要么需要额外的过程来从底层人体模型中分离出衣物。在本文中，我们提出了一种名为GarmentDreamer的新方法，该方法利用3D高斯喷溅（GS）作为指导，从文本提示生成可穿戴、可模拟的3D服装网格。与直接使用生成模型预测的多视图图像作为指导相比，我们的3DGS指导确保了服装变形和纹理合成的一致优化。我们的方法引入了一个由法线和RGBA信息指导的新颖服装增强模块，并结合了隐式神经纹理场（NeTF）和得分蒸馏采样（SDS）来生成多样化的几何和纹理细节。我们通过全面的定性和定量实验验证了我们方法的有效性，展示了GarmentDreamer相比于现有最先进方法的卓越性能。\n"
  },
  {
    "path": "abs/2405.12477.md",
    "content": "### Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery\n\nAlthough 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach involves leveraging explicitly semantic priors of body parts to ensure the consistency of geometric topology, thereby enabling the capture of the complex geometrical and topological associations among body parts. Additionally, we disentangle high-frequency features from global human features to refine surface details in body parts. Extensive experiments demonstrate that our method exhibits superior performance in human body reconstruction, particularly in enhancing surface details and accurately reconstructing body part junctions.\n\n尽管3D高斯喷溅（3DGS）在3D人体重建领域最近取得了进展，但它主要依赖于2D像素级监督，忽视了不同身体部位的几何复杂性和拓扑关系。为了解决这一差距，我们引入了层次化图形人体高斯控制（HUGS）框架，以实现高保真的3D人体重建。我们的方法利用身体部位的显式语义先验，以确保几何拓扑的一致性，从而能够捕捉身体部位之间复杂的几何和拓扑关联。此外，我们从全局人体特征中分离出高频特征，以细化身体部位的表面细节。广泛的实验表明，我们的方法在人体重建方面表现出色，特别是在增强表面细节和准确重建身体部分接合处方面。\n"
  },
  {
    "path": "abs/2405.12663.md",
    "content": "### LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting\n\nCreating and customizing a 3D clothed avatar from textual descriptions is a critical and challenging task. Traditional methods often treat the human body and clothing as inseparable, limiting users' ability to freely mix and match garments. In response to this limitation, we present LAyered Gaussian Avatar (LAGA), a carefully designed framework enabling the creation of high-fidelity decomposable avatars with diverse garments. By decoupling garments from avatar, our framework empowers users to conviniently edit avatars at the garment level. Our approach begins by modeling the avatar using a set of Gaussian points organized in a layered structure, where each layer corresponds to a specific garment or the human body itself. To generate high-quality garments for each layer, we introduce a coarse-to-fine strategy for diverse garment generation and a novel dual-SDS loss function to maintain coherence between the generated garments and avatar components, including the human body and other garments. Moreover, we introduce three regularization losses to guide the movement of Gaussians for garment transfer, allowing garments to be freely transferred to various avatars. Extensive experimentation demonstrates that our approach surpasses existing methods in the generation of 3D clothed humans.\n\n创建和定制基于文本描述的3D穿着服装的虚拟形象是一项关键且具有挑战性的任务。传统方法常常将人体和服装视为不可分割的整体，限制了用户自由混搭服装的能力。为了应对这一限制，我们提出了分层高斯虚拟形象（LAGA），一个精心设计的框架，使得创建高保真、可分解的虚拟形象和多样化的服装成为可能。通过将服装与虚拟形象分离，我们的框架使用户能够方便地在服装层面编辑虚拟形象。我们的方法首先利用一组在分层结构中组织的高斯点来建模虚拟形象，其中每一层对应于特定的服装或人体本身。为了为每一层生成高质量的服装，我们引入了一个从粗到细的多样化服装生成策略和一个新颖的双SDS损失函数，以保持生成的服装与虚拟形象组件（包括人体和其他服装）之间的一致性。此外，我们引入了三个正则化损失来指导高斯点的移动，实现服装转移，允许将服装自由转移到不同的虚拟形象上。广泛的实验表明，我们的方法在生成3D穿着人形方面超越了现有方法。\n"
  },
  {
    "path": "abs/2405.12806.md",
    "content": "### MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video\n\nSingle-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Additionally, to address local occlusions in single-view, based on KGAS, UID identifies significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve the Human NeRF and the Gaussian Splatting by 33.94% and 16.75% in LPIPS* respectively.\n\n单视图穿着人体重建在虚拟现实应用中占据核心地位，尤其是在涉及复杂人体运动的情境中。它在实现真实服装变形方面呈现出显著的挑战。当前的方法常常忽视运动对表面变形的影响，导致重建的表面缺乏全局运动所施加的约束。为了克服这些限制，我们引入了一个创新的框架，基于运动的3D穿着人体合成（MOSS），该框架利用运动学信息实现对人体表面的运动感知高斯分裂。我们的框架包括两个模块：运动学高斯定位喷溅（KGAS）和表面变形探测器（UID）。KGAS采用矩阵-费舍尔分布来传播全身表面的全局运动。这种分布的密度和旋转因素显式控制高斯，从而增强重建表面的真实性。此外，为了应对单视图中的局部遮挡，基于KGAS，UID识别重要表面，并执行几何重建以补偿这些变形。实验结果表明，MOSS在从单眼视频合成3D穿着人形方面实现了最先进的视觉质量。特别地，我们分别在Human NeRF和高斯喷溅的LPIPS*指标上提高了33.94%和16.75%。\n"
  },
  {
    "path": "abs/2405.13694.md",
    "content": "### Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances\n\nRecent advancements in neural rendering techniques have significantly enhanced the fidelity of 3D reconstruction. Notably, the emergence of 3D Gaussian Splatting (3DGS) has marked a significant milestone by adopting a discrete scene representation, facilitating efficient training and real-time rendering. Several studies have successfully extended the real-time rendering capability of 3DGS to dynamic scenes. However, a challenge arises when training images are captured under vastly differing weather and lighting conditions. This scenario poses a challenge for 3DGS and its variants in achieving accurate reconstructions. Although NeRF-based methods (NeRF-W, CLNeRF) have shown promise in handling such challenging conditions, their computational demands hinder real-time rendering capabilities. In this paper, we present Gaussian Time Machine (GTM) which models the time-dependent attributes of Gaussian primitives with discrete time embedding vectors decoded by a lightweight Multi-Layer-Perceptron(MLP). By adjusting the opacity of Gaussian primitives, we can reconstruct visibility changes of objects. We further propose a decomposed color model for improved geometric consistency. GTM achieved state-of-the-art rendering fidelity on 3 datasets and is 100 times faster than NeRF-based counterparts in rendering. Moreover, GTM successfully disentangles the appearance changes and renders smooth appearance interpolation.\n\n近期神经渲染技术的进展显著提高了三维重建的真实感。特别是3D 高斯投影（3D Gaussian Splatting，简称3DGS）的出现，通过采用离散场景表示，促进了高效训练和实时渲染的重大进展。几项研究已成功将3DGS的实时渲染能力扩展到动态场景。然而，当训练图像在极其不同的天气和光照条件下捕获时，会出现挑战。这种情况对3DGS及其变体在实现精确重建方面构成了挑战。尽管基于NeRF的方法（如NeRF-W、CLNeRF）在处理这类困难条件下表现出了希望，但它们的计算需求限制了实时渲染能力。在本文中，我们介绍了高斯时间机器（Gaussian Time Machine，简称GTM），它通过解码轻量级多层感知机（MLP）的离散时间嵌入向量来模拟高斯原语的时间依赖属性。通过调整高斯原语的不透明度，我们可以重建对象的可见性变化。我们进一步提出了一个分解的颜色模型以改善几何一致性。GTM在三个数据集上实现了最先进的渲染真实感，并且在渲染速度上比基于NeRF的对应方法快100倍。此外，GTM成功分离了外观变化，并实现了平滑的外观插值。\n"
  },
  {
    "path": "abs/2405.13748.md",
    "content": "### Monocular Gaussian SLAM with Language Extended Loop Closure\n\nRecently,3DGaussianSplattinghasshowngreatpotentialin visual Simultaneous Localization And Mapping (SLAM). Existing methods have achieved encouraging results on RGB-D SLAM, but studies of the monocular case are still scarce. Moreover, they also fail to correct drift errors due to the lack of loop closure and global optimization. In this paper, we present MG-SLAM, a monocular Gaussian SLAM with a language-extended loop closure module capable of performing drift-corrected tracking and high-fidelity reconstruction while achieving a high-level understanding of the environment. Our key idea is to represent the global map as 3D Gaussian and use it to guide the estimation of the scene geometry, thus mitigating the efforts of missing depth information. Further, an additional language-extended loop closure module which is based on CLIP feature is designed to continually perform global optimization to correct drift errors accumulated as the system runs. Our system shows promising results on multiple challenging datasets in both tracking and mapping and even surpasses some existing RGB-D methods.\n\n最近，3D高斯投影在视觉同时定位与建图（SLAM）中显示出巨大潜力。现有方法在RGB-D SLAM上取得了鼓舞人心的结果，但对单目情况的研究仍然较少。此外，由于缺乏环路闭合和全局优化，它们也未能纠正漂移错误。在本文中，我们介绍了MG-SLAM，一种具有语言扩展环路闭合模块的单目高斯SLAM，能够进行校正漂移的跟踪和高保真重建，同时实现对环境的高级理解。我们的核心思想是将全球地图表示为3D高斯，并使用它来指导场景几何的估计，从而减轻缺失深度信息的努力。此外，我们还设计了一个基于CLIP特征的额外语言扩展环路闭合模块，用于持续进行全球优化以纠正系统运行中累积的漂移错误。我们的系统在多个具有挑战性的数据集上显示出在跟踪和映射方面的有希望的结果，甚至超过了一些现有的RGB-D方法。\n"
  },
  {
    "path": "abs/2405.13943.md",
    "content": "### DoGaussian: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus\n\nThe recent advances in 3D Gaussian Splatting (3DGS) show promising results on the novel view synthesis (NVS) task. With its superior rendering performance and high-fidelity rendering quality, 3DGS is excelling at its previous NeRF counterparts. The most recent 3DGS method focuses either on improving the instability of rendering efficiency or reducing the model size. On the other hand, the training efficiency of 3DGS on large-scale scenes has not gained much attention. In this work, we propose DoGaussian, a method that trains 3DGS distributedly. Our method first decomposes a scene into K blocks and then introduces the Alternating Direction Method of Multipliers (ADMM) into the training procedure of 3DGS. During training, our DoGaussian maintains one global 3DGS model on the master node and K local 3DGS models on the slave nodes. The K local 3DGS models are dropped after training and we only query the global 3DGS model during inference. The training time is reduced by scene decomposition, and the training convergence and stability are guaranteed through the consensus on the shared 3D Gaussians. Our method accelerates the training of 3DGS by 6+ times when evaluated on large-scale scenes while concurrently achieving state-of-the-art rendering quality.\n\n最近在3D高斯投影（3DGS）方面的进展在新视角合成（NVS）任务上显示出有希望的结果。凭借其卓越的渲染性能和高保真渲染质量，3DGS在其先前的NeRF对应物中表现出色。最新的3DGS方法要么专注于改善渲染效率的不稳定性，要么致力于减小模型大小。另一方面，3DGS在大规模场景上的训练效率尚未受到太多关注。在这项工作中，我们提出了DoGaussian，一种分布式训练3DGS的方法。我们的方法首先将场景分解为K个块，然后在3DGS的训练过程中引入交替方向乘子法（ADMM）。在训练期间，我们的DoGaussian在主节点上维护一个全局3DGS模型，在从节点上维护K个本地3DGS模型。训练后会丢弃K个本地3DGS模型，我们只在推理期间查询全球3DGS模型。通过场景分解减少了训练时间，并通过对共享的3D高斯的一致性保证了训练的收敛和稳定性。我们的方法在对大规模场景进行评估时，将3DGS的训练速度提高了6倍以上，同时还实现了最先进的渲染质量。\n"
  },
  {
    "path": "abs/2405.14241.md",
    "content": "### NeuroGauss4D-PCI: 4D Neural Fields and Gaussian Deformation Fields for Point Cloud Interpolation\n\nPoint Cloud Interpolation confronts challenges from point sparsity, complex spatiotemporal dynamics, and the difficulty of deriving complete 3D point clouds from sparse temporal information. This paper presents NeuroGauss4D-PCI, which excels at modeling complex non-rigid deformations across varied dynamic scenes. The method begins with an iterative Gaussian cloud soft clustering module, offering structured temporal point cloud representations. The proposed temporal radial basis function Gaussian residual utilizes Gaussian parameter interpolation over time, enabling smooth parameter transitions and capturing temporal residuals of Gaussian distributions. Additionally, a 4D Gaussian deformation field tracks the evolution of these parameters, creating continuous spatiotemporal deformation fields. A 4D neural field transforms low-dimensional spatiotemporal coordinates (x,y,z,t) into a high-dimensional latent space. Finally, we adaptively and efficiently fuse the latent features from neural fields and the geometric features from Gaussian deformation fields. NeuroGauss4D-PCI outperforms existing methods in point cloud frame interpolation, delivering leading performance on both object-level (DHB) and large-scale autonomous driving datasets (NL-Drive), with scalability to auto-labeling and point cloud densification tasks.\n\n点云插值面临的挑战包括点稀疏性、复杂的时空动态以及从稀疏时间信息中推导出完整3D点云的困难。本文介绍了NeuroGauss4D-PCI，该方法擅长对各种动态场景中的复杂非刚性变形进行建模。该方法从一个迭代的高斯云软聚类模块开始，提供结构化的时序点云表示。所提出的时序径向基函数高斯残差利用时间上的高斯参数插值，实现平滑的参数转换并捕捉高斯分布的时间残差。此外，一个4D高斯变形场跟踪这些参数的演化，创建连续的时空变形场。一个4D神经场将低维时空坐标(x,y,z,t)转换成高维潜在空间。最后，我们适应性地、高效地融合来自神经场的潜在特征和来自高斯变形场的几何特征。NeuroGauss4D-PCI在点云帧插值方面优于现有方法，在对象级（DHB）和大规模自动驾驶数据集（NL-Drive）上均展现出领先性能，并且可扩展到自动标记和点云密集化任务。\n"
  },
  {
    "path": "abs/2405.14276.md",
    "content": "### D-MiSo: Editing Dynamic 3D Scenes using Multi-Gaussians Soup\n\nOver the past years, we have observed an abundance of approaches for modeling dynamic 3D scenes using Gaussian Splatting (GS). Such solutions use GS to represent the scene's structure and the neural network to model dynamics. Such approaches allow fast rendering and extracting each element of such a dynamic scene. However, modifying such objects over time is challenging. SC-GS (Sparse Controlled Gaussian Splatting) enhanced with Deformed Control Points partially solves this issue. However, this approach necessitates selecting elements that need to be kept fixed, as well as centroids that should be adjusted throughout editing. Moreover, this task poses additional difficulties regarding the re-productivity of such editing. To address this, we propose Dynamic Multi-Gaussian Soup (D-MiSo), which allows us to model the mesh-inspired representation of dynamic GS. Additionally, we propose a strategy of linking parameterized Gaussian splats, forming a Triangle Soup with the estimated mesh. Consequently, we can separately construct new trajectories for the 3D objects composing the scene. Thus, we can make the scene's dynamic editable over time or while maintaining partial dynamics.\n\n近年来，我们观察到许多使用高斯投影（GS）来建模动态3D场景的方法。这些解决方案使用GS来表示场景的结构，并使用神经网络来模拟动态。这种方法允许快速渲染并提取这种动态场景的每个元素。然而，随时间修改这些对象是一项挑战。通过变形控制点增强的SC-GS（稀疏控制高斯投影）部分解决了这个问题。然而，这种方法需要选择需要保持固定的元素，以及在编辑过程中应调整的质心。此外，这项任务还带来了关于此类编辑可重复性的额外困难。为了解决这个问题，我们提出了动态多高斯汤（D-MiSo），它允许我们模拟动态GS的网格启发式表示。此外，我们提出了一种策略，将参数化的高斯斑点链接起来，形成带有估计网格的三角汤。因此，我们可以单独构建构成场景的3D对象的新轨迹。这样，我们可以在保持部分动态的同时，随时间编辑场景的动态。\n"
  },
  {
    "path": "abs/2405.14342.md",
    "content": "### RoGS: Large Scale Road Surface Reconstruction based on 2D Gaussian Splatting\n\nRoad surface reconstruction plays a crucial role in autonomous driving, which can be used for road lane perception and autolabeling tasks. Recently, mesh-based road surface reconstruction algorithms show promising reconstruction results. However, these mesh-based methods suffer from slow speed and poor rendering quality. In contrast, the 3D Gaussian Splatting (3DGS) shows superior rendering speed and quality. Although 3DGS employs explicit Gaussian spheres to represent the scene, it lacks the ability to directly represent the geometric information of the scene. To address this limitation, we propose a novel large-scale road surface reconstruction approach based on 2D Gaussian Splatting (2DGS), named RoGS. The geometric shape of the road is explicitly represented using 2D Gaussian surfels, where each surfel stores color, semantics, and geometric information. Compared to Gaussian spheres, the Gaussian surfels aligns more closely with the physical reality of the road. Distinct from previous initialization methods that rely on point clouds for Gaussian spheres, we introduce a trajectory-based initialization for Gaussian surfels. Thanks to the explicit representation of the Gaussian surfels and a good initialization, our method achieves a significant acceleration while improving reconstruction quality. We achieve excellent results in reconstruction of roads surfaces in a variety of challenging real-world scenes.\n\n道路表面重建在自动驾驶中扮演着关键角色，可用于道路车道感知和自动标记任务。近期，基于网格的道路表面重建算法显示出有希望的重建结果。然而，这些基于网格的方法存在速度慢和渲染质量差的问题。相比之下，3D高斯投影（3DGS）显示出更优的渲染速度和质量。尽管3DGS使用显式的高斯球体来表示场景，但它缺乏直接表示场景几何信息的能力。为了解决这个限制，我们提出了一种基于2D高斯投影（2DGS）的新型大规模道路表面重建方法，名为RoGS。道路的几何形状使用2D高斯表面元显式表示，每个表面元存储颜色、语义和几何信息。与高斯球体相比，高斯表面元更紧密地与道路的物理现实相符。与之前依赖点云进行高斯球体初始化的方法不同，我们引入了基于轨迹的高斯表面元初始化。得益于高斯表面元的显式表示和良好的初始化，我们的方法实现了显著的加速，同时提高了重建质量。我们在多种具有挑战性的现实世界场景中实现了道路表面的优秀重建结果。\n"
  },
  {
    "path": "abs/2405.14455.md",
    "content": "### TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing\n\nEditing objects within a scene is a critical functionality required across a broad spectrum of applications in computer vision and graphics. As 3D Gaussian Splatting (3DGS) emerges as a frontier in scene representation, the effective modification of 3D Gaussian scenes has become increasingly vital. This process entails accurately retrieve the target objects and subsequently performing modifications based on instructions. Though available in pieces, existing techniques mainly embed sparse semantics into Gaussians for retrieval, and rely on an iterative dataset update paradigm for editing, leading to over-smoothing or inconsistency issues. To this end, this paper proposes a systematic approach, namely TIGER, for coherent text-instructed 3D Gaussian retrieval and editing. In contrast to the top-down language grounding approach for 3D Gaussians, we adopt a bottom-up language aggregation strategy to generate a denser language embedded 3D Gaussians that supports open-vocabulary retrieval. To overcome the over-smoothing and inconsistency issues in editing, we propose a Coherent Score Distillation (CSD) that aggregates a 2D image editing diffusion model and a multi-view diffusion model for score distillation, producing multi-view consistent editing with much finer details. In various experiments, we demonstrate that our TIGER is able to accomplish more consistent and realistic edits than prior work.\n\n在计算机视觉和图形学的广泛应用中，场景内对象的编辑是一项关键功能。随着3D高斯投影（3DGS）成为场景表示的前沿，有效修改3D高斯场景变得越来越重要。这个过程包括准确地检索目标对象，然后根据指令执行修改。尽管现有技术提供了一些解决方案，这些技术主要是将稀疏语义嵌入到高斯中进行检索，并依赖于迭代数据集更新范式进行编辑，这导致了过度平滑或不一致性问题。为此，本文提出了一种系统方法，即TIGER，用于连贯的文本指导的3D高斯检索和编辑。与为3D高斯采用的自上而下的语言定位方法不同，我们采用自下而上的语言聚合策略，以生成支持开放词汇检索的更密集的语言嵌入3D高斯。为了克服编辑中的过度平滑和不一致性问题，我们提出了一种连贯得分蒸馏（CSD），该策略结合了2D图像编辑扩散模型和多视图扩散模型进行得分蒸馏，产生具有更细致细节的多视图一致性编辑。在各种实验中，我们展示了我们的TIGER能够比以前的工作实现更一致和更现实的编辑。\n"
  },
  {
    "path": "abs/2405.14475.md",
    "content": "### MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes\n\nWhile controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data. This innovative approach enables easily controllable generation and static scene acquisition, resulting in high-quality scene reconstruction. To address the minor errors in generated content, we propose deformable Gaussian splatting with monocular depth initialization and appearance modeling to manage exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks like BEV segmentation. Our results demonstrate the framework's superior performance, showcasing its transformative potential for autonomous driving simulation and beyond.\n\n虽然可控生成模型在图像和视频领域取得了显著的成功，但在无界限场景（如自动驾驶）中，高质量的3D场景模型仍然处于不发达状态，原因是数据获取成本高。在本文中，我们介绍了MagicDrive3D，这是一种新颖的可控3D街景生成流程，支持包括鸟瞰图（BEV）地图、3D对象和文本描述在内的多条件控制。与先前的方法不同，这些方法在训练生成模型之前进行重建，MagicDrive3D首先训练一个视频生成模型，然后从生成的数据进行重建。这种创新方法使得生成控制更加容易，并且能够获取静态场景，从而实现高质量的场景重建。为了解决生成内容中的小错误，我们提出了带有单目深度初始化和外观建模的可变形高斯投影，以管理不同视点之间的曝光差异。在nuScenes数据集上进行验证，MagicDrive3D生成了多样化的高质量3D驾驶场景，支持任意视角渲染，并增强了下游任务，如BEV分割。我们的结果展示了该框架的优越性能，展现了其对自动驾驶仿真及其它领域的变革潜力。\n\n"
  },
  {
    "path": "abs/2405.14959.md",
    "content": "### EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting\n\nEvent cameras offer promising advantages such as high dynamic range and low latency, making them well-suited for challenging lighting conditions and fast-moving scenarios. However, reconstructing 3D scenes from raw event streams is difficult because event data is sparse and does not carry absolute color information. To release its potential in 3D reconstruction, we propose the first event-based generalizable 3D reconstruction framework, called EvGGS, which reconstructs scenes as 3D Gaussians from only event input in a feedforward manner and can generalize to unseen cases without any retraining. This framework includes a depth estimation module, an intensity reconstruction module, and a Gaussian regression module. These submodules connect in a cascading manner, and we collaboratively train them with a designed joint loss to make them mutually promote. To facilitate related studies, we build a novel event-based 3D dataset with various material objects and calibrated labels of grayscale images, depth maps, camera poses, and silhouettes. Experiments show models that have jointly trained significantly outperform those trained individually. Our approach performs better than all baselines in reconstruction quality, and depth/intensity predictions with satisfactory rendering speed.\n\n事件相机以其高动态范围和低延迟的优势备受推崇，非常适合光照条件复杂和快速移动的场景。然而，从原始事件流重建三维场景非常困难，因为事件数据稀疏且不包含绝对颜色信息。为了释放其在三维重建中的潜力，我们提出了第一个基于事件的可泛化三维重建框架，称为 EvGGS，它可以仅从事件输入中以前馈方式重建场景为三维高斯模型，并能在未见过的案例中泛化，无需重新训练。该框架包括深度估计模块、强度重建模块和高斯回归模块。这些子模块以级联方式连接，并通过设计的联合损失函数进行协同训练，以促进它们的相互提升。为了促进相关研究，我们构建了一个新颖的基于事件的三维数据集，包括各种材料的对象和经过校准的灰度图像、深度图、相机姿态和剪影标签。实验显示，联合训练的模型显著优于单独训练的模型。我们的方法在重建质量、深度/强度预测方面均优于所有基线，并且渲染速度令人满意。\n"
  },
  {
    "path": "abs/2405.15118.md",
    "content": "### GS-Hider: Hiding Messages into 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has already become the emerging research focus in the fields of 3D scene reconstruction and novel view synthesis. Given that training a 3DGS requires a significant amount of time and computational cost, it is crucial to protect the copyright, integrity, and privacy of such 3D assets. Steganography, as a crucial technique for encrypted transmission and copyright protection, has been extensively studied. However, it still lacks profound exploration targeted at 3DGS. Unlike its predecessor NeRF, 3DGS possesses two distinct features: 1) explicit 3D representation; and 2) real-time rendering speeds. These characteristics result in the 3DGS point cloud files being public and transparent, with each Gaussian point having a clear physical significance. Therefore, ensuring the security and fidelity of the original 3D scene while embedding information into the 3DGS point cloud files is an extremely challenging task. To solve the above-mentioned issue, we first propose a steganography framework for 3DGS, dubbed GS-Hider, which can embed 3D scenes and images into original GS point clouds in an invisible manner and accurately extract the hidden messages. Specifically, we design a coupled secured feature attribute to replace the original 3DGS's spherical harmonics coefficients and then use a scene decoder and a message decoder to disentangle the original RGB scene and the hidden message. Extensive experiments demonstrated that the proposed GS-Hider can effectively conceal multimodal messages without compromising rendering quality and possesses exceptional security, robustness, capacity, and flexibility.\n\n三维高斯喷溅技术（3DGS）已成为三维场景重建和新视角合成领域的新兴研究焦点。鉴于训练3DGS需要大量时间和计算成本，保护这类三维资产的版权、完整性和隐私至关重要。隐写术作为加密传输和版权保护的关键技术已经被广泛研究。然而，针对3DGS的深入探索仍然不足。与其前身NeRF不同，3DGS具有两个独特特征：1) 明确的三维表示；和2) 实时渲染速度。这些特性导致3DGS点云文件公开且透明，每个高斯点都具有明确的物理意义。因此，在将信息嵌入3DGS点云文件的同时确保原始三维场景的安全性和保真度是一项极其具有挑战性的任务。为了解决上述问题，我们首次提出一个针对3DGS的隐写框架，称为GS-Hider，它可以以隐形方式将三维场景和图像嵌入原始GS点云中，并准确提取隐藏信息。具体来说，我们设计了一个耦合的安全特征属性来替换原始3DGS的球谐系数，然后使用场景解码器和信息解码器来分离原始RGB场景和隐藏信息。广泛的实验表明，所提出的GS-Hider能够有效地隐藏多模态信息，同时不损害渲染质量，并且具有卓越的安全性、鲁棒性、容量和灵活性。\n"
  },
  {
    "path": "abs/2405.15125.md",
    "content": "### HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting\n\nHigh dynamic range (HDR) novel view synthesis (NVS) aims to create photorealistic images from novel viewpoints using HDR imaging techniques. The rendered HDR images capture a wider range of brightness levels containing more details of the scene than normal low dynamic range (LDR) images. Existing HDR NVS methods are mainly based on NeRF. They suffer from long training time and slow inference speed. In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user input exposure time. Specifically, we design a Dual Dynamic Range (DDR) Gaussian point cloud model that uses spherical harmonics to fit HDR color and employs an MLP-based tone-mapper to render LDR color. The HDR and LDR colors are then fed into two Parallel Differentiable Rasterization (PDR) processes to reconstruct HDR and LDR views. To establish the data foundation for the research of 3D Gaussian splatting-based methods in HDR NVS, we recalibrate the camera parameters and compute the initial positions for Gaussian point clouds. Experiments demonstrate that our HDR-GS surpasses the state-of-the-art NeRF-based method by 3.84 and 1.91 dB on LDR and HDR NVS while enjoying 1000x inference speed and only requiring 6.3% training time.\n\n高动态范围（HDR）新视角合成（NVS）旨在使用HDR成像技术从新的视角创建逼真的图像。渲染的HDR图像捕捉更广泛的亮度级别，包含比普通低动态范围（LDR）图像更多的场景细节。现有的HDR NVS方法主要基于NeRF，它们的缺点是训练时间长和推理速度慢。在本文中，我们提出了一个新框架，高动态范围高斯喷溅（HDR-GS），它可以高效地渲染新的HDR视角并根据用户输入的曝光时间重建LDR图像。具体来说，我们设计了一个双动态范围（DDR）高斯点云模型，使用球谐函数拟合HDR颜色，并采用基于MLP的色调映射器来渲染LDR颜色。然后将HDR和LDR颜色输入到两个并行可微光栅化（PDR）过程中，以重建HDR和LDR视图。为了为HDR NVS中基于三维高斯喷溅方法的研究建立数据基础，我们重新校准了相机参数并计算了高斯点云的初始位置。实验表明，我们的HDR-GS在LDR和HDR NVS上分别超过了最新的基于NeRF的方法3.84 dB和1.91 dB，同时享有1000倍的推理速度，仅需6.3%的训练时间。\n"
  },
  {
    "path": "abs/2405.15196.md",
    "content": "### DisC-GS: Discontinuity-aware Gaussian Splatting\n\nRecently, Gaussian Splatting, a method that represents a 3D scene as a collection of Gaussian distributions, has gained significant attention in addressing the task of novel view synthesis. In this paper, we highlight a fundamental limitation of Gaussian Splatting: its inability to accurately render discontinuities and boundaries in images due to the continuous nature of Gaussian distributions. To address this issue, we propose a novel framework enabling Gaussian Splatting to perform discontinuity-aware image rendering. Additionally, we introduce a Bézier-boundary gradient approximation strategy within our framework to keep the  \"differentiability\" of the proposed discontinuity-aware rendering process. Extensive experiments demonstrate the efficacy of our framework.\n\n最近，高斯喷溅法，一种将三维场景表示为高斯分布集合的方法，在解决新视角合成任务中获得了显著关注。在本文中，我们指出了高斯喷溅的一个基本局限性：由于高斯分布的连续性质，其无法准确渲染图像中的不连续性和边界。为了解决这个问题，我们提出了一个新框架，使高斯喷溅能够进行感知不连续性的图像渲染。此外，我们在框架中引入了一个贝塞尔边界梯度近似策略，以保持所提出的感知不连续性渲染过程的“可微分性”。广泛的实验表明我们框架的有效性。\n"
  },
  {
    "path": "abs/2405.15491.md",
    "content": "### GSDeformer: Direct Cage-based Deformation for 3D Gaussian Splatting\n\nWe present GSDeformer, a method that achieves free-form deformation on 3D Gaussian Splatting(3DGS) without requiring any architectural changes. Our method extends cage-based deformation, a traditional mesh deformation method, to 3DGS. This is done by converting 3DGS into a novel proxy point cloud representation, where its deformation can be used to infer the transformations to apply on the 3D gaussians making up 3DGS. We also propose an automatic cage construction algorithm for 3DGS to minimize manual work. Our method does not modify the underlying architecture of 3DGS. Therefore, any existing trained vanilla 3DGS can be easily edited by our method. We compare the deformation capability of our method against other existing methods, demonstrating the ease of use and comparable quality of our method, despite being more direct and thus easier to integrate with other concurrent developments on 3DGS.\n\n我们介绍了GSDeformer，这是一种在三维高斯喷溅（3DGS）上实现自由形变的方法，无需对架构进行任何改变。我们的方法是将基于笼子的形变——一种传统的网格形变方法——扩展到3DGS上。这是通过将3DGS转换成一种新的代理点云表示来实现的，其中的形变可以用来推断应用于构成3DGS的三维高斯的变换。我们还提出了一种自动笼子构建算法，用于3DGS，以尽量减少手工操作。我们的方法不修改3DGS的底层架构。因此，任何现有的训练过的普通3DGS都可以通过我们的方法轻松编辑。我们将我们方法的形变能力与其他现有方法进行了比较，证明了我们方法的易用性和可比性质量，尽管它更直接，因此更容易与3DGS的其他同时发展集成。\n"
  },
  {
    "path": "abs/2405.15518.md",
    "content": "### Feature Splatting for Better Novel View Synthesis with Low Overlap\n\n3D Gaussian Splatting has emerged as a very promising scene representation, achieving state-of-the-art quality in novel view synthesis significantly faster than competing alternatives. However, its use of spherical harmonics to represent scene colors limits the expressivity of 3D Gaussians and, as a consequence, the capability of the representation to generalize as we move away from the training views. In this paper, we propose to encode the color information of 3D Gaussians into per-Gaussian feature vectors, which we denote as Feature Splatting (FeatSplat). To synthesize a novel view, Gaussians are first \"splatted\" into the image plane, then the corresponding feature vectors are alpha-blended, and finally the blended vector is decoded by a small MLP to render the RGB pixel values. To further inform the model, we concatenate a camera embedding to the blended feature vector, to condition the decoding also on the viewpoint information. Our experiments show that these novel model for encoding the radiance considerably improves novel view synthesis for low overlap views that are distant from the training views. Finally, we also show the capacity and convenience of our feature vector representation, demonstrating its capability not only to generate RGB values for novel views, but also their per-pixel semantic labels.\n\n三维高斯喷溅作为一种非常有前景的场景表示方法，已在新视角合成方面实现了前所未有的质量，其速度远快于竞争性替代方法。然而，它使用球谐函数来表示场景颜色限制了三维高斯的表达能力，因此，当我们远离训练视图时，该表示的泛化能力也受到限制。在本文中，我们提出将三维高斯的颜色信息编码到每个高斯的特征向量中，我们将这种方法称为特征喷溅（FeatSplat）。为了合成一个新的视角，首先将高斯“喷溅”到图像平面上，然后将相应的特征向量进行alpha混合，最后通过一个小型的多层感知器（MLP）解码混合向量，渲染出RGB像素值。为了进一步提供模型信息，我们将相机嵌入向量与混合特征向量连接起来，以便也根据视点信息进行解码条件设置。我们的实验表明，这种新的辐射编码模型显著提高了对训练视图远离且重叠度低的新视图的合成质量。最后，我们还展示了我们特征向量表示的容量和便利性，证明了它不仅能为新视图生成RGB值，还能生成每个像素的语义标签。\n"
  },
  {
    "path": "abs/2405.16517.md",
    "content": "### Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors\n\nWe aim to tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM). The sparse-view setting is ill-posed and underconstrained, especially for scenes where the camera rotates 360 degrees around a point, as no visual information is available beyond some frontal views focused on the central object(s) of interest. In this work, we show that pretrained 2D diffusion models can strongly improve the reconstruction of a scene with low-cost fine-tuning. Specifically, we present SparseSplat360 (Sp2360), a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views. Due to superior training and rendering speeds, we use an explicit scene representation in the form of 3D Gaussians over NeRF-based implicit representations. We propose an iterative update strategy to fuse generated pseudo novel views with existing 3D Gaussians fitted to the initial sparse inputs. As a result, we obtain a multi-view consistent scene representation with details coherent with the observed inputs. Our evaluation on the challenging Mip-NeRF360 dataset shows that our proposed 2D to 3D distillation algorithm considerably improves the performance of a regularized version of 3DGS adapted to a sparse-view setting and outperforms existing sparse-view reconstruction methods in 360 scene reconstruction. Qualitatively, our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.\n\n我们的目标是利用潜在扩散模型（LDM）的先验来解决360度三维场景的稀疏视图重建问题。稀疏视图设置本质上是不适定的和受限的，尤其是在相机围绕一个点旋转360度的场景中，因为除了一些集中在中心对象的正面视图之外，没有其他视觉信息可用。在这项工作中，我们展示了预训练的二维扩散模型能够显著改善场景的重建，并且只需低成本的微调。具体来说，我们提出了SparseSplat360（Sp2360），这是一种使用级联的补画和去除伪影模型的方法，用于填补缺失细节并清理新视图。由于训练和渲染速度的优势，我们采用了形式为三维高斯的显式场景表示，而不是基于NeRF的隐式表示。我们提出了一种迭代更新策略，将生成的伪新视图与适配初始稀疏输入的现有三维高斯融合。结果，我们获得了一个多视图一致的场景表示，其细节与观察到的输入一致。我们在具有挑战性的Mip-NeRF360数据集上的评估显示，我们提出的二维到三维蒸馏算法显著提高了适应稀疏视图设置的规则化三维高斯喷溅（3DGS）的性能，并超越了现有的稀疏视图重建方法在360场景重建中的表现。从质量上看，我们的方法可以从仅仅9个输入视图生成整个360度场景，且前景和背景细节的程度很高。\n"
  },
  {
    "path": "abs/2405.16544.md",
    "content": "### Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians\n\n3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous Localization and Mapping (SLAM), as it provides a compact dense map representation while enabling efficient and high-quality map rendering. However, existing methods show significantly worse reconstruction quality than competing methods using other 3D representations, e.g. neural points clouds, since they either do not employ global map and pose optimization or make use of monocular depth. In response, we propose the first RGB-only SLAM system with a dense 3D Gaussian map representation that utilizes all benefits of globally optimized tracking by adapting dynamically to keyframe pose and depth updates by actively deforming the 3D Gaussian map. Moreover, we find that refining the depth updates in inaccurate areas with a monocular depth estimator further improves the accuracy of the 3D reconstruction. Our experiments on the Replica, TUM-RGBD, and ScanNet datasets indicate the effectiveness of globally optimized 3D Gaussians, as the approach achieves superior or on par performance with existing RGB-only SLAM methods methods in tracking, mapping and rendering accuracy while yielding small map sizes and fast runtimes.\n\n三维高斯喷溅已成为仅使用RGB进行密集型同时定位与映射（SLAM）的强大的几何和外观表示方法，因为它提供了紧凑的密集地图表示，同时实现了高效和高质量的地图渲染。然而，现有方法与使用其他三维表示的竞争方法相比，重建质量显著较差，例如神经点云，这是因为它们要么没有采用全局地图和姿态优化，要么仅使用单眼深度。作为回应，我们提出了第一个仅使用RGB的SLAM系统，该系统采用密集的三维高斯地图表示，并利用全局优化跟踪的所有好处，通过动态适应关键帧姿态和深度更新来主动变形三维高斯地图。此外，我们发现，用单眼深度估计器在不准确区域细化深度更新进一步提高了三维重建的准确性。我们在Replica、TUM-RGBD和ScanNet数据集上进行的实验表明，全球优化的三维高斯方法是有效的，因为这种方法在跟踪、映射和渲染精度方面与现有的仅RGB SLAM方法相比具有优越或相当的性能，同时具有较小的地图大小和快速的运行时间。\n"
  },
  {
    "path": "abs/2405.16822.md",
    "content": "### Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels\n\nVideo generative models are receiving particular attention given their ability to generate realistic and imaginative frames. Besides, these models are also observed to exhibit strong 3D consistency, significantly enhancing their potential to act as world simulators. In this work, we present Vidu4D, a novel reconstruction model that excels in accurately reconstructing 4D (i.e., sequential 3D) representations from single generated videos, addressing challenges associated with non-rigidity and frame distortion. This capability is pivotal for creating high-fidelity virtual contents that maintain both spatial and temporal coherence. At the core of Vidu4D is our proposed Dynamic Gaussian Surfels (DGS) technique. DGS optimizes time-varying warping functions to transform Gaussian surfels (surface elements) from a static state to a dynamically warped state. This transformation enables a precise depiction of motion and deformation over time. To preserve the structural integrity of surface-aligned Gaussian surfels, we design the warped-state geometric regularization based on continuous warping fields for estimating normals. Additionally, we learn refinements on rotation and scaling parameters of Gaussian surfels, which greatly alleviates texture flickering during the warping process and enhances the capture of fine-grained appearance details. Vidu4D also contains a novel initialization state that provides a proper start for the warping fields in DGS. Equipping Vidu4D with an existing video generative model, the overall framework demonstrates high-fidelity text-to-4D generation in both appearance and geometry.\n\n视频生成模型因其生成真实且富有想象力的帧的能力而备受关注。此外，这些模型还展示了较强的3D一致性，大大增强了其作为世界模拟器的潜力。在本研究中，我们提出了 Vidu4D，一种新颖的重建模型，能够从单一生成视频中精确重建4D（即顺序3D）表示，解决了非刚性和帧畸变相关的挑战。这一能力对于创建具有空间和时间一致性的高保真虚拟内容至关重要。\nVidu4D 的核心是我们提出的 动态高斯表面元素（Dynamic Gaussian Surfels, DGS） 技术。DGS 通过优化时变变形函数，将高斯表面元素（surfels）从静态状态转换为动态变形状态，从而实现对运动和形变的精确描绘。为保持表面对齐的高斯表面元素的结构完整性，我们设计了基于连续变形场的几何正则化，用于估计法向。此外，我们学习了高斯表面元素在旋转和缩放参数上的精细调整，大大减轻了变形过程中的纹理闪烁问题，并增强了对细致外观细节的捕捉能力。\nVidu4D 还包含一种新颖的初始化状态，为 DGS 的变形场提供了合理的起点。结合现有的视频生成模型，整体框架在外观和几何方面展示了高保真的文本到4D生成能力。这一成果为生成一致且精确的4D内容开辟了新的路径。\n"
  },
  {
    "path": "abs/2405.16829.md",
    "content": "### PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting\n\nNeural Radiance Fields (NeRFs) have demonstrated remarkable proficiency in synthesizing photorealistic images of large-scale scenes. However, they are often plagued by a loss of fine details and long rendering durations. 3D Gaussian Splatting has recently been introduced as a potent alternative, achieving both high-fidelity visual results and accelerated rendering performance. Nonetheless, scaling 3D Gaussian Splatting is fraught with challenges. Specifically, large-scale scenes grapples with the integration of objects across multiple scales and disparate viewpoints, which often leads to compromised efficacy as the Gaussians need to balance between detail levels. Furthermore, the generation of initialization points via COLMAP from large-scale dataset is both computationally demanding and prone to incomplete reconstructions. To address these challenges, we present Pyramidal 3D Gaussian Splatting (PyGS) with NeRF Initialization. Our approach represent the scene with a hierarchical assembly of Gaussians arranged in a pyramidal fashion. The top level of the pyramid is composed of a few large Gaussians, while each subsequent layer accommodates a denser collection of smaller Gaussians. We effectively initialize these pyramidal Gaussians through sampling a rapidly trained grid-based NeRF at various frequencies. We group these pyramidal Gaussians into clusters and use a compact weighting network to dynamically determine the influence of each pyramid level of each cluster considering camera viewpoint during rendering. Our method achieves a significant performance leap across multiple large-scale datasets and attains a rendering time that is over 400 times faster than current state-of-the-art approaches.\n\n神经辐射场（NeRF）在合成大规模场景的逼真图像方面表现出了卓越的能力。然而，它们常常受到细节丢失和长时间渲染的困扰。最近引入的三维高斯喷溅作为一种强大的替代方案，实现了高保真的视觉效果和加速的渲染性能。尽管如此，扩展三维高斯喷溅面临着挑战。特别是，大规模场景需要处理多个尺度和不同视角的对象整合，这常常导致效能受损，因为高斯需要在细节水平之间取得平衡。此外，通过COLMAP从大规模数据集生成初始化点既计算要求高，又容易导致重建不完整。为了解决这些挑战，我们提出了具有NeRF初始化的金字塔式三维高斯喷溅（PyGS）。我们的方法使用金字塔式排列的高斯层次结构来表示场景。金字塔的顶层由少量大高斯组成，而每个后续层包含更密集的小高斯集合。我们通过在不同频率上采样一个快速训练的基于网格的NeRF有效地初始化这些金字塔高斯。我们将这些金字塔高斯聚集成簇，并使用一个紧凑的加权网络在渲染过程中动态确定每个簇的每个金字塔层次的影响，考虑到相机视角。我们的方法在多个大规模数据集上取得了显著的性能飞跃，并实现了比当前最先进方法快400多倍的渲染时间。\n"
  },
  {
    "path": "abs/2405.16849.md",
    "content": "### Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation\n\nIn this work, we introduce a novel approach for creating controllable dynamics in 3D-generated Gaussians using casually captured reference videos. Our method transfers the motion of objects from reference videos to a variety of generated 3D Gaussians across different categories, ensuring precise and customizable motion transfer. We achieve this by employing blend skinning-based non-parametric shape reconstruction to extract the shape and motion of reference objects. This process involves segmenting the reference objects into motion-related parts based on skinning weights and establishing shape correspondences with generated target shapes. To address shape and temporal inconsistencies prevalent in existing methods, we integrate physical simulation, driving the target shapes with matched motion. This integration is optimized through a displacement loss to ensure reliable and genuine dynamics. Our approach supports diverse reference inputs, including humans, quadrupeds, and articulated objects, and can generate dynamics of arbitrary length, providing enhanced fidelity and applicability. Unlike methods heavily reliant on diffusion video generation models, our technique offers specific and high-quality motion transfer, maintaining both shape integrity and temporal consistency.\n\n在这项工作中，我们介绍了一种使用随意捕获的参考视频在三维生成的高斯中创造可控动态的新方法。我们的方法将参考视频中的物体运动转移到不同类别的生成三维高斯上，确保精确和可定制的运动传递。我们通过采用基于混合蒙皮的非参数形状重建来提取参考对象的形状和运动来实现这一点。这个过程涉及根据蒙皮权重将参考对象分割成与运动相关的部分，并与生成的目标形状建立形状对应关系。为了解决现有方法中普遍存在的形状和时间不一致性，我们整合了物理模拟，驱动目标形状与匹配的运动。这种整合通过位移损失优化，以确保可靠和真实的动态。我们的方法支持多样的参考输入，包括人类、四足动物和关节对象，并可以生成任意长度的动态，提供了更高的保真度和适用性。与严重依赖扩散视频生成模型的方法不同，我们的技术提供了特定的、高质量的运动转移，保持了形状完整性和时间连贯性。\n"
  },
  {
    "path": "abs/2405.16923.md",
    "content": "### SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain\n\nWith the emergence of Gaussian Splats, recent efforts have focused on large-scale scene geometric reconstruction. However, most of these efforts either concentrate on memory reduction or spatial space division, neglecting information in the semantic space. In this paper, we propose a novel method, named SA-GS, for fine-grained 3D geometry reconstruction using semantic-aware 3D Gaussian Splats. Specifically, we leverage prior information stored in large vision models such as SAM and DINO to generate semantic masks. We then introduce a geometric complexity measurement function to serve as soft regularization, guiding the shape of each Gaussian Splat within specific semantic areas. Additionally, we present a method that estimates the expected number of Gaussian Splats in different semantic areas, effectively providing a lower bound for Gaussian Splats in these areas. Subsequently, we extract the point cloud using a novel probability density-based extraction method, transforming Gaussian Splats into a point cloud crucial for downstream tasks. Our method also offers the potential for detailed semantic inquiries while maintaining high image-based reconstruction results. We provide extensive experiments on publicly available large-scale scene reconstruction datasets with highly accurate point clouds as ground truth and our novel dataset. Our results demonstrate the superiority of our method over current state-of-the-art Gaussian Splats reconstruction methods by a significant margin in terms of geometric-based measurement metrics. Code and additional results will soon be available on our project page.\n\n随着高斯喷溅的出现，最近的努力集中在大规模场景的几何重建上。然而，这些努力大多只关注内存减少或空间分割，而忽略了语义空间中的信息。在本文中，我们提出了一种名为SA-GS的新方法，使用语义感知的三维高斯喷溅进行细粒度的三维几何重建。具体来说，我们利用存储在大型视觉模型如SAM和DINO中的先验信息来生成语义掩码。然后，我们引入一个几何复杂性测量函数作为软正则化，指导每个高斯喷溅在特定语义区域内的形状。此外，我们提出了一种估计不同语义区域中高斯喷溅预期数量的方法，有效地为这些区域中的高斯喷溅提供了下限。随后，我们使用一种新颖的基于概率密度的提取方法提取点云，将高斯喷溅转化为对下游任务至关重要的点云。我们的方法还提供了进行详细语义查询的潜力，同时保持高质量的基于图像的重建结果。我们在公开的大规模场景重建数据集以及我们的新数据集上进行了广泛的实验，这些数据集具有高度精确的点云作为真实基准。我们的结果显示，就基于几何的测量指标而言，我们的方法在与当前最先进的高斯喷溅重建方法相比具有显著的优势。代码和更多结果将很快在我们的项目页面上提供。\n"
  },
  {
    "path": "abs/2405.17083.md",
    "content": "### F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting\n\nThe neural radiance field (NeRF) has made significant strides in representing 3D scenes and synthesizing novel views. Despite its advancements, the high computational costs of NeRF have posed challenges for its deployment in resource-constrained environments and real-time applications. As an alternative to NeRF-like neural rendering methods, 3D Gaussian Splatting (3DGS) offers rapid rendering speeds while maintaining excellent image quality. However, as it represents objects and scenes using a myriad of Gaussians, it requires substantial storage to achieve high-quality representation. To mitigate the storage overhead, we propose Factorized 3D Gaussian Splatting (F-3DGS), a novel approach that drastically reduces storage requirements while preserving image quality. Inspired by classical matrix and tensor factorization techniques, our method represents and approximates dense clusters of Gaussians with significantly fewer Gaussians through efficient factorization. We aim to efficiently represent dense 3D Gaussians by approximating them with a limited amount of information for each axis and their combinations. This method allows us to encode a substantially large number of Gaussians along with their essential attributes -- such as color, scale, and rotation -- necessary for rendering using a relatively small number of elements. Extensive experimental results demonstrate that F-3DGS achieves a significant reduction in storage costs while maintaining comparable quality in rendered images.\n\n神经辐射场（NeRF）在表示三维场景和合成新视图方面取得了重要进展。尽管有所进步，NeRF的高计算成本对其在资源受限环境和实时应用中的部署构成了挑战。作为一种替代NeRF类神经渲染方法，三维高斯喷溅（3DGS）提供了快速渲染速度，同时保持了优秀的图像质量。然而，由于它使用大量的高斯函数来表示对象和场景，因此需要大量的存储空间来实现高质量的表示。为了减轻存储开销，我们提出了一种新的方法——分解式三维高斯喷溅（F-3DGS），该方法大幅度降低了存储需求，同时保持了图像质量。我们的方法受到传统矩阵和张量分解技术的启发，通过高效的分解，用远少于原本数量的高斯函数来表示和近似密集的高斯团簇。我们的目标是通过对每个轴及其组合的有限信息的近似，有效地表示密集的三维高斯。这种方法使我们能够用相对较少的元素编码大量的高斯及其重要属性（如颜色、规模和旋转），这些都是渲染所必需的。广泛的实验结果表明，F-3DGS在减少存储成本的同时，保持了与渲染图像质量相当的水平。\n"
  },
  {
    "path": "abs/2405.17187.md",
    "content": "### Memorize What Matters: Emergent Scene Decomposition from Multitraverse\n\nHumans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 3D mapping and 2D segmentation without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.\n\n人类天生能够记住永久性元素，而短暂的瞬间往往会从记忆中溜走。这种选择性的记忆保留对于机器人的感知、定位和映射至关重要。为了让机器人具备这种能力，我们引入了三维高斯映射（3DGM），这是一个自监督的、仅使用相机的离线映射框架，基于三维高斯喷溅。3DGM将同一区域的多次穿越RGB视频转换为基于高斯的环境地图，同时执行2D短暂物体分割。我们的关键观察是，环境在多次穿越中保持一致，而物体频繁变化。这使我们能够利用重复穿越的自监督来实现环境与物体的分解。更具体地说，3DGM将多次穿越环境映射问题形式化为一个稳健的可微渲染问题，将环境和物体的像素分别视为内点和外点。通过稳健特征提取、特征残差挖掘和稳健优化，3DGM无需人工干预，即可同时进行3D映射和2D分割。我们构建了Mapverse基准，来源于Ithaca365和nuPlan数据集，以评估我们方法在无监督2D分割、3D重建和神经渲染方面的效果。广泛的结果验证了我们方法在自动驾驶和机器人技术中的有效性和潜力。\n"
  },
  {
    "path": "abs/2405.17351.md",
    "content": "### DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Refocusing,Defocus Rendering and Blur Removal\n\n3D Gaussian Splatting-based techniques have recently advanced 3D scene reconstruction and novel view synthesis, achieving high-quality real-time rendering. However, these approaches are inherently limited by the underlying pinhole camera assumption in modeling the images and hence only work for All-in-Focus (AiF) sharp image inputs. This severely affects their applicability in real-world scenarios where images often exhibit defocus blur due to the limited depth-of-field (DOF) of imaging devices. Additionally, existing 3D Gaussian Splatting (3DGS) methods also do not support rendering of DOF effects.\nTo address these challenges, we introduce DOF-GS that allows for rendering adjustable DOF effects, removing defocus blur as well as refocusing of 3D scenes, all from multi-view images degraded by defocus blur. To this end, we re-imagine the traditional Gaussian Splatting pipeline by employing a finite aperture camera model coupled with explicit, differentiable defocus rendering guided by the Circle-of-Confusion (CoC). The proposed framework provides for dynamic adjustment of DOF effects by changing the aperture and focal distance of the underlying camera model on-demand. It also enables rendering varying DOF effects of 3D scenes post-optimization, and generating AiF images from defocused training images. Furthermore, we devise a joint optimization strategy to further enhance details in the reconstructed scenes by jointly optimizing rendered defocused and AiF images. Our experimental results indicate that DOF-GS produces high-quality sharp all-in-focus renderings conditioned on inputs compromised by defocus blur, with the training process incurring only a modest increase in GPU memory consumption. We further demonstrate the applications of the proposed method for adjustable defocus rendering and refocusing of the 3D scene from input images degraded by defocus blur.\n\n基于三维高斯喷溅的技术最近在三维场景重建和新视角合成方面取得了进展，实现了高质量的实时渲染。然而，这些方法固有地受到基础针孔相机模型的限制，因此只适用于所有焦点（AiF）锐利图像输入。这严重影响了它们在实际应用场景中的适用性，因为成像设备的有限景深（DOF）常常导致图像出现散焦模糊。此外，现有的三维高斯喷溅（3DGS）方法也不支持渲染景深效果。\n为了解决这些挑战，我们引入了DOF-GS，它允许渲染可调节的景深效果，消除散焦模糊以及从多视角图像重聚焦三维场景，这些图像都因散焦模糊而质量下降。为此，我们重新设想了传统的高斯喷溅流程，采用了具有明确的、可微的散焦渲染引导的有限光圈相机模型，该模型以圆锥混淆（CoC）为指导。提出的框架通过按需改变相机模型的光圈和焦距，为景深效果的动态调整提供了支持。它还允许在优化后渲染三维场景的不同景深效果，并从散焦训练图像生成AiF图像。此外，我们设计了一种联合优化策略，通过联合优化渲染的散焦和AiF图像进一步提升重建场景中的细节。我们的实验结果表明，DOF-GS在输入受散焦模糊影响的条件下，能够产生高质量的锐利全焦点渲染，训练过程中GPU内存消耗只增加了一点。我们进一步展示了该方法在可调散焦渲染和从受散焦模糊影响的输入图像中重聚焦三维场景的应用。\n"
  },
  {
    "path": "abs/2405.17429.md",
    "content": "### GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction\n\n3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to unbalanced allocation of resources. To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features. We aggregate information from images through the attention mechanism and iteratively refine the properties of 3D Gaussians including position, covariance, and semantics. We then propose an efficient Gaussian-to-voxel splatting method to generate 3D occupancy predictions, which only aggregates the neighboring Gaussians for a certain position. We conduct extensive experiments on the widely adopted nuScenes and KITTI-360 datasets. Experimental results demonstrate that GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption.\n\n三维语义占用预测旨在获取周围场景的三维细粒度几何形状和语义，这是视觉中心自动驾驶系统稳定性的重要任务。大多数现有方法使用密集网格（如体素）作为场景表示，这忽略了占用的稀疏性和对象尺度的多样性，从而导致资源分配不均衡。为了解决这一问题，我们提出了一种以对象为中心的表示方法来描述三维场景，使用稀疏的三维语义高斯表示，每个高斯代表一个灵活的兴趣区域及其语义特征。我们通过注意力机制从图像中聚合信息，并迭代细化三维高斯的属性，包括位置、协方差和语义。接着，我们提出了一种高效的高斯到体素的喷溅方法来生成三维占用预测，该方法只聚合特定位置附近的高斯。我们在广泛采用的nuScenes和KITTI-360数据集上进行了广泛的实验。实验结果表明，GaussianFormer在只有17.8%至24.8%的内存消耗下，达到了与最先进方法相当的性能。\n"
  },
  {
    "path": "abs/2405.17596.md",
    "content": "### GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane\n\n3D open-vocabulary scene understanding, crucial for advancing augmented reality and robotic applications, involves interpreting and locating specific regions within a 3D space as directed by natural language instructions. To this end, we introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) and identifies 3D Gaussians of Interest using an Optimizable Semantic-space Hyperplane. Our approach includes an efficient compression method that utilizes scene priors to condense noisy high-dimensional semantic features into compact low-dimensional vectors, which are subsequently embedded in 3DGS. During the open-vocabulary querying process, we adopt a distinct approach compared to existing methods, which depend on a manually set fixed empirical threshold to select regions based on their semantic feature distance to the query text embedding. This traditional approach often lacks universal accuracy, leading to challenges in precisely identifying specific target areas. Instead, our method treats the feature selection process as a hyperplane division within the feature space, retaining only those features that are highly relevant to the query. We leverage off-the-shelf 2D Referring Expression Segmentation (RES) models to fine-tune the semantic-space hyperplane, enabling a more precise distinction between target regions and others. This fine-tuning substantially improves the accuracy of open-vocabulary queries, ensuring the precise localization of pertinent 3D Gaussians. Extensive experiments demonstrate GOI's superiority over previous state-of-the-art methods.\n\n三维开放词汇场景理解对于推动增强现实和机器人应用至关重要，它涉及根据自然语言指令解释和定位三维空间内的特定区域。为此，我们引入了 GOI 框架，该框架将二维视觉-语言基础模型的语义特征集成到三维高斯喷溅（3DGS）中，并使用可优化的语义空间超平面识别感兴趣的三维高斯。我们的方法包括一种高效的压缩方法，该方法利用场景先验将嘈杂的高维语义特征压缩成紧凑的低维向量，随后将其嵌入到 3DGS 中。在开放词汇查询过程中，我们采取了与现有方法不同的方法，现有方法依赖于手动设置的固定经验阈值来根据语义特征距离查询文本嵌入选择区域。这种传统方法通常缺乏普遍的准确性，导致在精确识别特定目标区域时面临挑战。相反，我们的方法将特征选择过程视为特征空间内的超平面划分，只保留与查询高度相关的特征。我们利用现成的二维指示表达分割（RES）模型来微调语义空间超平面，从而更精确地区分目标区域和其他区域。这种微调显著提高了开放词汇查询的准确性，确保了相关三维高斯的精确定位。广泛的实验表明，GOI 在先前最先进方法上的优越性。\n"
  },
  {
    "path": "abs/2405.17705.md",
    "content": "### DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos\n\nWe present DC-Gaussian, a new method for generating novel views from in-vehicle dash cam videos. While neural rendering techniques have made significant strides in driving scenarios, existing methods are primarily designed for videos collected by autonomous vehicles. However, these videos are limited in both quantity and diversity compared to dash cam videos, which are more widely used across various types of vehicles and capture a broader range of scenarios. Dash cam videos often suffer from severe obstructions such as reflections and occlusions on the windshields, which significantly impede the application of neural rendering techniques. To address this challenge, we develop DC-Gaussian based on the recent real-time neural rendering technique 3D Gaussian Splatting (3DGS). Our approach includes an adaptive image decomposition module to model reflections and occlusions in a unified manner. Additionally, we introduce illumination-aware obstruction modeling to manage reflections and occlusions under varying lighting conditions. Lastly, we employ a geometry-guided Gaussian enhancement strategy to improve rendering details by incorporating additional geometry priors. Experiments on self-captured and public dash cam videos show that our method not only achieves state-of-the-art performance in novel view synthesis, but also accurately reconstructing captured scenes getting rid of obstructions.\n\n我们提出了 DC-Gaussian，这是一种从车载仪表盘摄像头视频生成新视角的新方法。尽管神经渲染技术在驾驶场景中取得了重大进展，但现有方法主要是为自动驾驶车辆收集的视频设计的。然而，与仪表盘摄像头视频相比，这些视频在数量和多样性上都有限，仪表盘摄像头视频在各种类型的车辆中更为普遍使用，并且能捕捉到更广泛的场景。仪表盘摄像头视频经常受到严重的遮挡，如挡风玻璃上的反射和遮挡，这在很大程度上阻碍了神经渲染技术的应用。为了应对这一挑战，我们基于最近的实时神经渲染技术三维高斯喷溅（3DGS）开发了 DC-Gaussian。我们的方法包括一个自适应图像分解模块，以统一的方式模拟反射和遮挡。此外，我们引入了照明感知的遮挡建模，以在不同的光照条件下管理反射和遮挡。最后，我们采用几何引导的高斯增强策略，通过整合额外的几何先验来改善渲染细节。在自采集和公开的仪表盘摄像头视频上的实验表明，我们的方法不仅在新视角合成中达到了最先进的性能，而且还能准确地重构捕获的场景，消除遮挡。\n"
  },
  {
    "path": "abs/2405.17793.md",
    "content": "### SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction\n\n3D Gaussian Splatting (3DGS) has made a significant stride in novel view synthesis, demonstrating top-notch rendering quality while achieving real-time rendering speed. However, the excessively large number of Gaussian primitives resulting from 3DGS' suboptimal densification process poses a major challenge, slowing down frame-per-second (FPS) and demanding considerable memory cost, making it unfavorable for low-end devices. To cope with this issue, many follow-up studies have suggested various pruning techniques, often in combination with different score functions, to optimize rendering performance. Nonetheless, a comprehensive discussion regarding their effectiveness and implications across all techniques is missing. In this paper, we first categorize 3DGS pruning techniques into two types: Cross-view pruning and pixel-wise pruning, which differ in their approaches to rank primitives. Our subsequent experiments reveal that while cross-view pruning leads to disastrous quality drops under extreme Gaussian primitives decimation, the pixel-wise pruning technique not only sustains relatively high rendering quality with minuscule performance degradation but also provides a reasonable minimum boundary for pruning. Building on this observation, we further propose multiple variations of score functions and empirically discover that the color-weighted score function outperforms others for discriminating insignificant primitives for rendering. We believe our research provides valuable insights for optimizing 3DGS pruning strategies for future works.\n\n三维高斯喷溅（3DGS）在新视角合成方面取得了显著进展，展示了顶级的渲染质量，同时实现了实时渲染速度。然而，由于 3DGS 的次优密集化过程导致的高斯原语数量过多，这构成了一个主要挑战，降低了每秒帧数（FPS）并要求相当的内存成本，使其不适合低端设备。为了解决这个问题，许多后续研究提出了各种修剪技术，这些技术通常与不同的评分函数结合使用，以优化渲染性能。尽管如此，对所有技术的有效性和影响的全面讨论尚未见报道。在本文中，我们首先将 3DGS 修剪技术分类为两种类型：跨视图修剪和像素级修剪，这两种方法在对原语进行排名方面有所不同。我们随后的实验显示，尽管跨视图修剪在极端高斯原语削减下导致了灾难性的质量下降，但像素级修剪技术不仅能够在性能微小下降的情况下保持相对高的渲染质量，而且还提供了一个合理的修剪最小界限。基于这一观察，我们进一步提出了多种评分函数的变体，并通过实证发现，色彩加权评分函数在区分渲染中无关紧要的原语方面优于其他评分函数。我们相信我们的研究为未来优化 3DGS 修剪策略提供了宝贵的见解。\n"
  },
  {
    "path": "abs/2405.17811.md",
    "content": "### Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh\n\nNeural 3D representations such as Neural Radiance Fields (NeRF), excel at producing photo-realistic rendering results but lack the flexibility for manipulation and editing which is crucial for content creation. Previous works have attempted to address this issue by deforming a NeRF in canonical space or manipulating the radiance field based on an explicit mesh. However, manipulating NeRF is not highly controllable and requires a long training and inference time. With the emergence of 3D Gaussian Splatting (3DGS), extremely high-fidelity novel view synthesis can be achieved using an explicit point-based 3D representation with much faster training and rendering speed. However, there is still a lack of effective means to manipulate 3DGS freely while maintaining rendering quality. In this work, we aim to tackle the challenge of achieving manipulable photo-realistic rendering. We propose to utilize a triangular mesh to manipulate 3DGS directly with self-adaptation. This approach reduces the need to design various algorithms for different types of Gaussian manipulation. By utilizing a triangle shape-aware Gaussian binding and adapting method, we can achieve 3DGS manipulation and preserve high-fidelity rendering after manipulation. Our approach is capable of handling large deformations, local manipulations, and soft body simulations while keeping high-quality rendering. Furthermore, we demonstrate that our method is also effective with inaccurate meshes extracted from 3DGS. Experiments conducted demonstrate the effectiveness of our method and its superiority over baseline approaches.\n\n神经3D表征如神经辐射场（NeRF）在产生逼真的渲染结果方面表现出色，但缺乏内容创建中至关重要的操作和编辑的灵活性。以往的工作尝试通过在规范空间中变形NeRF或基于显式网格操作辐射场来解决这个问题。然而，操作NeRF的控制性不高，需要较长的训练和推理时间。随着三维高斯喷溅（3DGS）的出现，可以使用显式基于点的3D表示实现极高保真的新视角合成，同时大大加快训练和渲染速度。然而，仍然缺乏有效的手段自由操作3DGS同时保持渲染质量。在这项工作中，我们旨在解决可操作的逼真渲染的挑战。我们提议直接使用三角网格来操作3DGS，并进行自适应。这种方法减少了为不同类型的高斯操作设计各种算法的需要。通过利用三角形感知的高斯绑定和适应方法，我们可以实现3DGS的操作，并在操作后保持高保真渲染。我们的方法能够处理大的变形、局部操作和软体模拟，同时保持高质量的渲染。此外，我们还展示了我们的方法对于从3DGS提取的不准确网格同样有效。所进行的实验证明了我们方法的有效性及其相较于基线方法的优越性。\n"
  },
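The triangle binding that Mani-GS builds on can be sketched as follows: each Gaussian stores barycentric coordinates on a host triangle plus an offset in the triangle's local frame, so deforming the mesh carries the Gaussian along. This is a simplified stand-in, assuming a single host triangle per Gaussian; the paper's shape-aware binding and self-adaptation rules are richer, and all names here are hypothetical.

```python
import numpy as np

# Minimal sketch of triangle-bound Gaussians: a Gaussian's center is an
# anchor given by barycentric coordinates, plus an offset expressed in the
# triangle's local orthonormal frame.

def triangle_frame(v0, v1, v2):
    """Orthonormal local frame of a triangle: e1 along an edge,
    n the face normal, e2 completing the right-handed basis."""
    e1 = v1 - v0
    e1 = e1 / np.linalg.norm(e1)
    n = np.cross(v1 - v0, v2 - v0)
    n = n / np.linalg.norm(n)
    e2 = np.cross(n, e1)
    return np.stack([e1, e2, n], axis=1)   # 3x3 rotation, columns = axes

def gaussian_position(tri_verts, bary, local_offset):
    """World-space Gaussian center from barycentric coords + local offset."""
    v0, v1, v2 = tri_verts
    anchor = bary[0] * v0 + bary[1] * v1 + bary[2] * v2
    R = triangle_frame(v0, v1, v2)
    return anchor + R @ local_offset

# Toy usage: the Gaussian follows the triangle when a vertex moves.
tri = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
bary = np.array([0.3, 0.3, 0.4])
offset = np.array([0.0, 0.0, 0.05])        # slightly above the surface
print("before:", gaussian_position(tri, bary, offset))
tri[2] += np.array([0.0, 0.0, 0.5])        # deform the mesh
print("after: ", gaussian_position(tri, bary, offset))
```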
  {
    "path": "abs/2405.17835.md",
    "content": "### Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting\n\nTissue deformation poses a key challenge for accurate surgical scene reconstruction. Despite yielding high reconstruction quality, existing methods suffer from slow rendering speeds and long training times, limiting their intraoperative applicability. Motivated by recent progress in 3D Gaussian Splatting, an emerging technology in real-time 3D rendering, this work presents a novel fast reconstruction framework, termed Deform3DGS, for deformable tissues during endoscopic surgery. Specifically, we introduce 3D GS into surgical scenes by integrating a point cloud initialization to improve reconstruction. Furthermore, we propose a novel flexible deformation modeling scheme (FDM) to learn tissue deformation dynamics at the level of individual Gaussians. Our FDM can model the surface deformation with efficient representations, allowing for real-time rendering performance. More importantly, FDM significantly accelerates surgical scene reconstruction, demonstrating considerable clinical values, particularly in intraoperative settings where time efficiency is crucial. Experiments on DaVinci robotic surgery videos indicate the efficacy of our approach, showcasing superior reconstruction fidelity PSNR: (37.90) and rendering speed (338.8 FPS) while substantially reducing training time to only 1 minute/scene.\n\n组织变形对于精确的手术场景重建构成了关键挑战。尽管现有方法能够实现高质量的重建，但它们的渲染速度慢，训练时间长，限制了它们在手术中的适用性。受近期在实时三维渲染技术——三维高斯喷溅（3DGS）进展的启发，本工作提出了一种名为 Deform3DGS 的新型快速重建框架，用于内窥镜手术中的可变形组织。具体来说，我们通过整合点云初始化来将3D GS 引入手术场景中，以改善重建。此外，我们提出了一种新颖的灵活变形建模方案（FDM），用于在个别高斯的水平上学习组织变形动态。我们的 FDM 能够使用高效的表示来模拟表面变形，允许实时渲染性能。更重要的是，FDM 显著加速了手术场景重建，显示出相当的临床价值，特别是在时间效率至关重要的手术中设置中。在 DaVinci 机器人手术视频上的实验表明了我们方法的有效性，展示了优越的重建保真度 PSNR:（37.90）和渲染速度（338.8 FPS），同时将训练时间大幅缩短到只有1分钟/场景。\n"
  },
  {
    "path": "abs/2405.17872.md",
    "content": "### HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction\n\nRobot-assisted minimally invasive surgery benefits from enhancing dynamic scene reconstruction, as it improves surgical outcomes. While Neural Radiance Fields (NeRF) have been effective in scene reconstruction, their slow inference speeds and lengthy training durations limit their applicability. To overcome these limitations, 3D Gaussian Splatting (3D-GS) based methods have emerged as a recent trend, offering rapid inference capabilities and superior 3D quality. However, these methods still struggle with under-reconstruction in both static and dynamic scenes. In this paper, we propose HFGS, a novel approach for deformable endoscopic reconstruction that addresses these challenges from spatial and temporal frequency perspectives. Our approach incorporates deformation fields to better handle dynamic scenes and introduces Spatial High-Frequency Emphasis Reconstruction (SHF) to minimize discrepancies in spatial frequency spectra between the rendered image and its ground truth. Additionally, we introduce Temporal High-Frequency Emphasis Reconstruction (THF) to enhance dynamic awareness in neural rendering by leveraging flow priors, focusing optimization on motion-intensive parts. Extensive experiments on two widely used benchmarks demonstrate that HFGS achieves superior rendering quality.\n\n机器人辅助的微创手术受益于增强动态场景重建，因为这可以改善手术结果。尽管神经辐射场（NeRF）在场景重建方面有效，但其缓慢的推理速度和漫长的训练时间限制了其应用性。为了克服这些限制，基于三维高斯喷溅（3D-GS）的方法已经成为最近的趋势，提供快速的推理能力和优越的3D质量。然而，这些方法在静态和动态场景中仍然面临重建不足的问题。在这篇文章中，我们提出了一种新的方法HFGS，用于可变形的内窥镜重建，从空间和时间频率的角度来解决这些挑战。我们的方法包括变形场以更好地处理动态场景，并引入了空间高频强调重建（SHF），以最小化渲染图像与其真实图像之间的空间频率谱的差异。此外，我们引入了时间高频强调重建（THF），通过利用流先验来增强神经渲染中的动态意识，专注于运动密集部分的优化。在两个广泛使用的基准测试上的广泛实验表明，HFGS实现了优越的渲染质量。\n"
  },
  {
    "path": "abs/2405.17891.md",
    "content": "### A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction\n\nIn recent years, Neural Radiance Fields (NeRF) has revolutionized three-dimensional (3D) reconstruction with its implicit representation. Building upon NeRF, 3D Gaussian Splatting (3D-GS) has departed from the implicit representation of neural networks and instead directly represents scenes as point clouds with Gaussian-shaped distributions. While this shift has notably elevated the rendering quality and speed of radiance fields but inevitably led to a significant increase in memory usage. Additionally, effectively rendering dynamic scenes in 3D-GS has emerged as a pressing challenge. To address these concerns, this paper purposes a refined 3D Gaussian representation for high-quality dynamic scene reconstruction. Firstly, we use a deformable multi-layer perceptron (MLP) network to capture the dynamic offset of Gaussian points and express the color features of points through hash encoding and a tiny MLP to reduce storage requirements. Subsequently, we introduce a learnable denoising mask coupled with denoising loss to eliminate noise points from the scene, thereby further compressing 3D Gaussian model. Finally, motion noise of points is mitigated through static constraints and motion consistency constraints. Experimental results demonstrate that our method surpasses existing approaches in rendering quality and speed, while significantly reducing the memory usage associated with 3D-GS, making it highly suitable for various tasks such as novel view synthesis, and dynamic mapping.\n\n近年来，神经辐射场（NeRF）凭借其隐式表达方式革新了三维（3D）重建技术。在 NeRF 的基础上，三维高斯喷溅（3D-GS）脱离了神经网络的隐式表达，转而直接将场景以高斯形状分布的点云形式表示。虽然这种转变显著提升了辐射场的渲染质量和速度，但不可避免地导致了内存使用的显著增加。此外，有效渲染 3D-GS 中的动态场景已成为一个紧迫的挑战。为解决这些问题，本文提出了一种精细的三维高斯表达方式，用于高质量动态场景重建。首先，我们使用一个可变形的多层感知器（MLP）网络来捕捉高斯点的动态偏移，并通过哈希编码及一个小型 MLP 来表达点的颜色特征，以减少存储需求。接着，我们引入了一个可学习的去噪掩模与去噪损失相结合，以从场景中消除噪点，进一步压缩三维高斯模型。最后，通过静态约束和运动一致性约束来缓解点的运动噪声。实验结果表明，我们的方法在渲染质量和速度上超越了现有方法，同时显著降低了与 3D-GS 相关的内存使用，使其非常适合用于新视角合成、动态映射等多种任务。\n"
  },
  {
    "path": "abs/2405.17958.md",
    "content": "### FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes\n\nEmpowering 3D Gaussian Splatting with generalization ability is appealing. However, existing generalizable 3D Gaussian Splatting methods are largely confined to narrow-range interpolation between stereo images due to their heavy backbones, thus lacking the ability to accurately localize 3D Gaussian and support free-view synthesis across wide view range. In this paper, we present a novel framework FreeSplat that is capable of reconstructing geometrically consistent 3D scenes from long sequence input towards free-view synthesis.Specifically, we firstly introduce Low-cost Cross-View Aggregation achieved by constructing adaptive cost volumes among nearby views and aggregating features using a multi-scale structure. Subsequently, we present the Pixel-wise Triplet Fusion to eliminate redundancy of 3D Gaussians in overlapping view regions and to aggregate features observed across multiple views. Additionally, we propose a simple but effective free-view training strategy that ensures robust view synthesis across broader view range regardless of the number of views. Our empirical results demonstrate state-of-the-art novel view synthesis peformances in both novel view rendered color maps quality and depth maps accuracy across different numbers of input views. We also show that FreeSplat performs inference more efficiently and can effectively reduce redundant Gaussians, offering the possibility of feed-forward large scene reconstruction without depth priors.\n\n赋予三维高斯喷溅泛化能力具有吸引力。然而，现有的泛化三维高斯喷溅方法大多限于在立体图像间的狭窄范围内插值，因此缺乏准确定位三维高斯和支持宽视角范围自由视图合成的能力。在本文中，我们提出了一个名为 FreeSplat 的新框架，该框架能够从长序列输入中重建几何上一致的三维场景，以实现自由视图合成。具体来说，我们首先引入了低成本的跨视图聚合，通过在附近视图之间构建自适应的成本体积并使用多尺度结构聚合特征来实现。接下来，我们提出了像素级三元组融合，以消除重叠视图区域中的三维高斯冗余，并聚合跨多个视图观察到的特征。此外，我们提出了一个简单但有效的自由视图训练策略，确保在更广泛的视角范围内实现稳健的视图合成，无论视图数量如何。我们的实验结果显示，在不同输入视图数量下，无论是新视角渲染的色彩地图质量还是深度地图精度，我们的方法都展示了最先进的新视角合成性能。我们还展示了 FreeSplat 在推理上更为高效，并且能有效减少冗余的高斯，提供了无需深度先验的大场景前馈重建的可能性。\n"
  },
  {
    "path": "abs/2405.18033.md",
    "content": "### RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields\n\nGaussian Splatting has revolutionized the world of novel view synthesis by achieving high rendering performance in real-time. Recently, studies have focused on enriching these 3D representations with semantic information for downstream tasks. In this paper, we introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting. While existing Gaussian Splatting-based approaches rely on scene-specific training, RT-GS2 demonstrates the ability to generalize to unseen scenes. Our method adopts a new approach by first extracting view-independent 3D Gaussian features in a self-supervised manner, followed by a novel View-Dependent / View-Independent (VDVI) feature fusion to enhance semantic consistency over different views. Extensive experimentation on three different datasets showcases RT-GS2's superiority over the state-of-the-art methods in semantic segmentation quality, exemplified by a 8.01% increase in mIoU on the Replica dataset. Moreover, our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches. This work represents a significant advancement in the field by introducing, to the best of our knowledge, the first real-time generalizable semantic segmentation method for 3D Gaussian representations of radiance fields.\n\n高斯喷溅技术通过实现实时的高渲染性能，已经革新了新视角合成的领域。最近的研究集中在为下游任务丰富这些三维表征的语义信息。在本文中，我们介绍了 RT-GS2，这是首个使用高斯喷溅的泛化语义分割方法。尽管现有基于高斯喷溅的方法依赖于特定场景的训练，RT-GS2 展示了对未见场景的泛化能力。我们的方法采用了一种新的方法，首先以自监督的方式提取视图独立的三维高斯特征，然后通过一种新颖的视图依赖/视图独立（VDVI）特征融合来增强不同视图下的语义一致性。在三个不同数据集上的广泛实验展示了 RT-GS2 在语义分割质量上超越了最先进方法，以 Replica 数据集上 8.01% 的 mIoU 增量为例。此外，我们的方法实现了 27.03 FPS 的实时性能，与现有方法相比速度提升了惊人的 901 倍。这项工作代表了该领域的一个重大进步，据我们所知，它是首个为辐射场的三维高斯表征引入实时泛化语义分割方法。\n"
  },
  {
    "path": "abs/2405.18132.md",
    "content": "### EG4D: Explicit Generation of 4D Object without Score Distillation\n\nIn recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and Janus problem. Therefore, inspired by recent progress of video diffusion models, we propose to optimize a 4D representation by explicitly generating multi-view videos from one input image. However, it is far from trivial to handle practical challenges faced by such a pipeline, including dramatic temporal inconsistency, inter-frame geometry and texture diversity, and semantic defects brought by video generation results. To address these issues, we propose DG4D, a novel multi-stage framework that generates high-quality and consistent 4D assets without score distillation. Specifically, collaborative techniques and solutions are developed, including an attention injection strategy to synthesize temporal-consistent multi-view videos, a robust and efficient dynamic reconstruction method based on Gaussian Splatting, and a refinement stage with diffusion prior for semantic restoration. The qualitative results and user preference study demonstrate that our framework outperforms the baselines in generation quality by a considerable margin.\n\n近年来，设计和游戏应用中对动态3D资产的需求日益增加，促使强大的生成管道能够合成高质量的4D对象。以前的方法通常依赖于分数蒸馏采样（SDS）算法来推断4D对象的未见视图和运动，从而导致结果不尽人意，存在过饱和和“Janus问题”等缺陷。因此，受到最近视频扩散模型进展的启发，我们提出通过显式生成多视角视频从一个输入图像优化4D表示。然而，处理这样一个管道所面临的实际挑战，包括显著的时间不一致性、帧间几何和纹理多样性以及视频生成结果带来的语义缺陷，绝非易事。为了解决这些问题，我们提出了 DG4D，一个新的多阶段框架，它在不使用分数蒸馏的情况下生成高质量且一致的4D资产。具体而言，我们开发了包括注意力注入策略以合成时间一致的多视角视频、基于高斯喷溅的稳健且高效的动态重建方法，以及带有扩散先验的精炼阶段用于语义恢复的协作技术和解决方案。定性结果和用户偏好研究表明，我们的框架在生成质量上大幅超过了基准线。\n"
  },
  {
    "path": "abs/2405.18133.md",
    "content": "### A Grid-Free Fluid Solver based on Gaussian Spatial Representation\n\nWe present a grid-free fluid solver featuring a novel Gaussian representation. Drawing inspiration from the expressive capabilities of 3D Gaussian Splatting in multi-view image reconstruction, we model the continuous flow velocity as a weighted sum of multiple Gaussian functions. Leveraging this representation, we derive differential operators for the field and implement a time-dependent PDE solver using the traditional operator splitting method. Compared to implicit neural representations as another continuous spatial representation with increasing attention, our method with flexible 3D Gaussians presents enhanced accuracy on vorticity preservation. Moreover, we apply physics-driven strategies to accelerate the optimization-based time integration of Gaussian functions. This temporal evolution surpasses previous work based on implicit neural representation with reduced computational time and memory. Although not surpassing the quality of state-of-the-art Eulerian methods in fluid simulation, experiments and ablation studies indicate the potential of our memory-efficient representation. With enriched spatial information, our method exhibits a distinctive perspective combining the advantages of Eulerian and Lagrangian approaches.\n\n我们提出了一种新颖的高斯表征的无网格流体求解器。受到三维高斯喷溅（3DGS）在多视图图像重建中的表现能力的启发，我们将连续流速建模为多个高斯函数的加权和。利用这种表征，我们为场派生了微分算子，并使用传统的算子分裂方法实现了一个时变偏微分方程（PDE）求解器。与作为另一种持续受到关注的连续空间表征的隐式神经表征相比，我们使用灵活的三维高斯方法在保持涡度方面展示了提高的精确度。此外，我们应用物理驱动策略来加速基于优化的高斯函数的时间积分。这种时间演化超越了基于隐式神经表征的先前工作，减少了计算时间和内存。虽然没有超越流体模拟中最先进的欧拉方法的质量，但实验和消融研究表明了我们这种节省内存的表征的潜力。通过丰富的空间信息，我们的方法展示了一种独特的视角，结合了欧拉方法和拉格朗日方法的优势。\n"
  },
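Because the velocity field is a weighted sum of analytic Gaussian kernels, differential operators come out in closed form with no grid. A minimal sketch follows, assuming isotropic kernels for simplicity (the paper's representation is more flexible): with u(x) = Σᵢ wᵢ·exp(−‖x−cᵢ‖²/2sᵢ²), the divergence is div u(x) = Σᵢ wᵢ·∇Gᵢ(x), where ∇Gᵢ(x) = −Gᵢ(x)(x−cᵢ)/sᵢ².

```python
import numpy as np

# Sketch of a Gaussian spatial representation of a velocity field and its
# closed-form divergence. Isotropic kernels are a simplifying assumption.

def velocity(x, centers, sigmas, weights):
    d = x - centers                                        # [N, 3]
    g = np.exp(-np.sum(d * d, axis=1) / (2 * sigmas**2))   # [N]
    return g @ weights                                     # [3]

def divergence(x, centers, sigmas, weights):
    """div u(x) = sum_i w_i . grad G_i(x),
    with grad G_i(x) = -G_i(x) * (x - c_i) / s_i^2."""
    d = x - centers
    g = np.exp(-np.sum(d * d, axis=1) / (2 * sigmas**2))
    grads = -(g / sigmas**2)[:, None] * d                  # [N, 3]
    return np.sum(grads * weights)                         # scalar

rng = np.random.default_rng(1)
centers = rng.random((32, 3))
sigmas = np.full(32, 0.2)
weights = rng.standard_normal((32, 3)) * 0.1               # w_i in R^3
x = np.array([0.5, 0.5, 0.5])
print("u(x)     =", velocity(x, centers, sigmas, weights))
print("div u(x) =", divergence(x, centers, sigmas, weights))
```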
  {
    "path": "abs/2405.18163.md",
    "content": "### NegGS: Negative Gaussian Splatting\n\nOne of the key advantages of 3D rendering is its ability to simulate intricate scenes accurately. One of the most widely used methods for this purpose is Gaussian Splatting, a novel approach that is known for its rapid training and inference capabilities. In essence, Gaussian Splatting involves incorporating data about the 3D objects of interest into a series of Gaussian distributions, each of which can then be depicted in 3D in a manner analogous to traditional meshes. It is regrettable that the use of Gaussians in Gaussian Splatting is currently somewhat restrictive due to their perceived linear nature. In practice, 3D objects are often composed of complex curves and highly nonlinear structures. This issue can to some extent be alleviated by employing a multitude of Gaussian components to reflect the complex, nonlinear structures accurately. However, this approach results in a considerable increase in time complexity. This paper introduces the concept of negative Gaussians, which are interpreted as items with negative colors. The rationale behind this approach is based on the density distribution created by dividing the probability density functions (PDFs) of two Gaussians, which we refer to as Diff-Gaussian. Such a distribution can be used to approximate structures such as donut and moon-shaped datasets. Experimental findings indicate that the application of these techniques enhances the modeling of high-frequency elements with rapid color transitions. Additionally, it improves the representation of shadows. To the best of our knowledge, this is the first paper to extend the simple elipsoid shapes of Gaussian Splatting to more complex nonlinear structures.\n\n3D渲染的一个关键优势是其能够精确模拟复杂场景。高斯喷溅是为此目的广泛使用的方法之一，这种新颖的方法以其快速训练和推理能力而闻名。本质上，高斯喷溅涉及将有关感兴趣的3D对象的数据融入到一系列高斯分布中，每个分布随后可以以类似于传统网格的方式在3D中进行描述。遗憾的是，目前由于对其线性特性的认识，使用高斯喷溅中的高斯分布有些受限。实际上，3D对象往往由复杂的曲线和高度非线性的结构组成。通过使用多个高斯组件来准确反映复杂的非线性结构可以在一定程度上缓解这一问题。然而，这种方法导致时间复杂性显著增加。本文介绍了负高斯的概念，这些高斯被解释为具有负色彩的物体。该方法背后的理由基于通过两个高斯的概率密度函数（PDF）相除创建的密度分布，我们称之为 Diff-Gaussian。这样的分布可用于近似甜甜圈形和月牙形数据集等结构。实验结果表明，这些技术的应用增强了对快速色彩过渡的高频元素的建模。此外，它改善了阴影的表现。据我们所知，这是第一篇将高斯喷溅的简单椭球形状扩展到更复杂的非线性结构的论文。\n"
  },
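The intuition behind negative Gaussians can be shown with a small sketch: compositing a broad positive Gaussian with a narrower Gaussian carrying negative weight yields ring- (donut-) shaped density that no single ellipsoidal Gaussian can express. Note this illustrates the concept with a weighted difference, not the paper's exact Diff-Gaussian construction from dividing PDFs; the weights and scales are arbitrary.

```python
import numpy as np

# Concept illustration: one positive + one negative Gaussian = ring density.

def gauss2d(xy, mean, sigma):
    d = xy - mean
    return np.exp(-np.sum(d * d, axis=-1) / (2 * sigma**2))

# Sample a grid and composite a broad positive and a narrow negative component.
xs = np.linspace(-2, 2, 81)
grid = np.stack(np.meshgrid(xs, xs), axis=-1)        # [81, 81, 2]
density = gauss2d(grid, np.zeros(2), 1.0) - 0.9 * gauss2d(grid, np.zeros(2), 0.4)
density = np.clip(density, 0.0, None)                # keep density non-negative

# The maximum now sits on a ring around the origin rather than at the center.
peak = np.unravel_index(np.argmax(density), density.shape)
print("peak offset from center:", xs[peak[1]], xs[peak[0]])
print("center density:", density[40, 40], " peak density:", density[peak])
```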
  {
    "path": "abs/2405.18416.md",
    "content": "### 3D StreetUnveiler with Semantic-Aware 2DGS\n\nUnveiling an empty street from crowded observations captured by in-car cameras is crucial for autonomous driving. However, removing all temporary static objects, such as stopped vehicles and standing pedestrians, presents a significant challenge. Unlike object-centric 3D inpainting, which relies on thorough observation in a small scene, street scenes involve long trajectories that differ from previous 3D inpainting tasks. The camera-centric moving environment of captured videos further complicates the task due to the limited degree and time duration of object observation. To address these obstacles, we introduce StreetUnveiler to reconstruct an empty street. StreetUnveiler learns a 3D representation of the empty street from crowded observations. Our representation is based on the hard-label semantic 2D Gaussian Splatting (2DGS) for its scalability and ability to identify Gaussians to be removed. We inpaint rendered image after removing unwanted Gaussians to provide pseudo-labels and subsequently re-optimize the 2DGS. Given its temporal continuous movement, we divide the empty street scene into observed, partial-observed, and unobserved regions, which we propose to locate through a rendered alpha map. This decomposition helps us to minimize the regions that need to be inpainted. To enhance the temporal consistency of the inpainting, we introduce a novel time-reversal framework to inpaint frames in reverse order and use later frames as references for earlier frames to fully utilize the long-trajectory observations. Our experiments conducted on the street scene dataset successfully reconstructed a 3D representation of the empty street. The mesh representation of the empty street can be extracted for further applications.\n\n为自动驾驶揭示由车载摄像头捕获的拥挤观测中的空荡街道至关重要。然而，移除所有临时静止对象，如停车的车辆和站立的行人，呈现出显著的挑战。与依赖于小场景中彻底观察的以对象为中心的3D补画不同，街景涉及的是长轨迹，这与以往的3D补画任务不同。由于对象观察的限度和时间持续性，摄像头为中心的移动环境进一步复杂化了这一任务。为了解决这些障碍，我们推出了 StreetUnveiler 来重建一个空荡的街道。StreetUnveiler 学习了从拥挤观测中得到的空荡街道的3D表征。我们的表征基于硬标签语义的二维高斯喷溅（2DGS），因其可扩展性和识别需要移除的高斯的能力。我们在移除不需要的高斯后补画渲染图像，提供伪标签并随后重新优化2DGS。鉴于其时间上的连续运动，我们将空荡街道场景划分为已观察区域、部分观察区域和未观察区域，我们建议通过渲染的透明度图来定位这些区域。这种分解帮助我们最小化需要补画的区域。为了增强补画的时间一致性，我们引入了一种新颖的时间反转框架，以逆序补画帧，并使用后续帧作为早期帧的参考，以充分利用长轨迹观测。我们在街道场景数据集上进行的实验成功重建了空荡街道的3D表征。可以提取空荡街道的网格表征以供进一步应用。\n"
  },
  {
    "path": "abs/2405.18424.md",
    "content": "### 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting\n\nScene image editing is crucial for entertainment, photography, and advertising design. Existing methods solely focus on either 2D individual object or 3D global scene editing. This results in a lack of a unified approach to effectively control and manipulate scenes at the 3D level with different levels of granularity. In this work, we propose 3DitScene, a novel and unified scene editing framework leveraging language-guided disentangled Gaussian Splatting that enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects. We first incorporate 3D Gaussians that are refined through generative priors and optimization techniques. Language features from CLIP then introduce semantics into 3D geometry for object disentanglement. With the disentangled Gaussians, 3DitScene allows for manipulation at both the global and individual levels, revolutionizing creative expression and empowering control over scenes and objects. Experimental results demonstrate the effectiveness and versatility of 3DitScene in scene image editing.\n\n场景图像编辑对于娱乐、摄影和广告设计至关重要。现有方法仅专注于2D单个对象或3D全局场景编辑，导致缺乏一个统一的方法来在不同粒度级别上有效控制和操作3D场景。在这项工作中，我们提出了3DitScene，这是一个利用语言引导的解耦高斯喷溅的新颖统一场景编辑框架，它实现了从2D到3D的无缝编辑，允许对场景构成和单个对象进行精确控制。我们首先合并了通过生成先验和优化技术精炼的3D高斯。然后，来自CLIP的语言特征为对象解耦引入了3D几何语义。通过解耦的高斯，3DitScene允许在全局和个别层面上进行操作，彻底改变了创造性表达并增强了对场景和对象的控制力。实验结果显示了3DitScene在场景图像编辑中的有效性和多功能性。\n"
  },
  {
    "path": "abs/2405.18426.md",
    "content": "### GFlow: Recovering 4D World from Monocular Video\n\nReconstructing 4D scenes from video inputs is a crucial yet challenging task. Conventional methods usually rely on the assumptions of multi-view video inputs, known camera parameters, or static scenes, all of which are typically absent under in-the-wild scenarios. In this paper, we relax all these constraints and tackle a highly ambitious but practical task, which we termed as AnyV4D: we assume only one monocular video is available without any camera parameters as input, and we aim to recover the dynamic 4D world alongside the camera poses. To this end, we introduce GFlow, a new framework that utilizes only 2D priors (depth and optical flow) to lift a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time. GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process that optimizes camera poses and the dynamics of 3D Gaussian points based on 2D priors and scene clustering, ensuring fidelity among neighboring points and smooth movement across frames. Since dynamic scenes always introduce new content, we also propose a new pixel-wise densification strategy for Gaussian points to integrate new visual content. Moreover, GFlow transcends the boundaries of mere 4D reconstruction; it also enables tracking of any points across frames without the need for prior training and segments moving objects from the scene in an unsupervised way. Additionally, the camera poses of each frame can be derived from GFlow, allowing for rendering novel views of a video scene through changing camera pose. By employing the explicit representation, we may readily conduct scene-level or object-level editing as desired, underscoring its versatility and power.\n\n从视频输入中重建4D场景是一项至关重要但又极具挑战性的任务。传统方法通常依赖于多视角视频输入、已知的相机参数或静态场景的假设，这些在野外环境中通常是不存在的。在这篇论文中，我们放宽了所有这些限制，并解决了一个非常雄心勃勃但实际的任务，我们称之为 AnyV4D：我们假设只有一个单目视频可用，没有任何相机参数作为输入，我们的目标是恢复动态的4D世界以及相机姿态。为此，我们引入了 GFlow，这是一个新框架，仅使用2D先验（深度和光流）将视频（3D）提升到一个4D显式表征，包括通过时空的高斯喷溅流动。GFlow首先将场景聚类为静止部分和移动部分，然后应用一个序列优化过程，该过程基于2D先验和场景聚类优化相机姿态和3D高斯点的动态，确保相邻点之间的保真度和帧间的平滑运动。由于动态场景总是引入新内容，我们还提出了一种新的高斯点的像素级密集化策略，以整合新的视觉内容。此外，GFlow超越了单纯的4D重建的界限；它还可以在无需事先训练的情况下跟踪任何帧之间的点，并以无监督的方式从场景中分割移动对象。此外，可以从GFlow派生每个帧的相机姿态，允许通过改变相机姿态渲染视频场景的新视角。通过采用显式表征，我们可以根据需要进行场景级或对象级编辑，突显其多功能性和强大性。\n"
  },
  {
    "path": "abs/2405.18784.md",
    "content": "### LP-3DGS: Learning to Prune 3D Gaussian Splatting\n\nRecently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical and preset pruning ratio or importance score threshold to prune the point cloud. Such hyperparamter requires multiple rounds of training to optimize and achieve the maximum pruning ratio, while maintaining the rendering quality for each scene. In this work, we propose learning-to-prune 3DGS (LP-3DGS), where a trainable binary mask is applied to the importance score that can find optimal pruning ratio automatically. Instead of using the traditional straight-through estimator (STE) method to approximate the binary mask gradient, we redesign the masking function to leverage the Gumbel-Sigmoid method, making it differentiable and compatible with the existing training process of 3DGS. Extensive experiments have shown that LP-3DGS consistently produces a good balance that is both efficient and high quality.\n\n最近，由于其高质量和快速的渲染速度，三维高斯喷溅（3DGS）已成为新视角合成（NVS）的主流方法之一。然而，作为一种基于点的场景表征，3DGS可能会生成大量的高斯点以拟合场景，导致高内存使用。已经提出的改进需要一个经验的、预设的修剪比率或重要性得分阈值来修剪点云。这种超参数需要多轮训练来优化，以达到最大的修剪比例，同时保持每个场景的渲染质量。在这项工作中，我们提出了学习修剪三维高斯喷溅（LP-3DGS），其中应用了一个可训练的二进制掩码到重要性得分上，可以自动找到最佳的修剪比例。我们没有使用传统的直通估计器（STE）方法来近似二进制掩码梯度，而是重新设计了掩码函数，利用 Gumbel-Sigmoid 方法使其可微分并与现有的3DGS训练过程兼容。广泛的实验表明，LP-3DGS一致地产生了既高效又高质量的良好平衡。\n"
  },
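The Gumbel-Sigmoid relaxation that LP-3DGS uses in place of a straight-through estimator can be sketched generically: logistic noise is added to the mask logits and squashed with a temperature, so the near-binary mask stays differentiable end-to-end. This is a minimal PyTorch sketch; the toy objective, hyperparameters, and the way the mask couples to the importance score are illustrative assumptions, not the paper's training setup.

```python
import torch

def gumbel_sigmoid(logits, tau=0.5):
    """Differentiable relaxation of Bernoulli(sigmoid(logits)):
    add Logistic(0, 1) noise, then squash with a temperature."""
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log1p(-u)       # Logistic(0, 1) sample
    return torch.sigmoid((logits + noise) / tau)

num_gaussians = 10_000
mask_logits = torch.nn.Parameter(torch.zeros(num_gaussians))
importance = torch.rand(num_gaussians)           # stand-in importance scores
opt = torch.optim.Adam([mask_logits], lr=1e-2)

for step in range(200):
    m = gumbel_sigmoid(mask_logits)              # soft mask in (0, 1)
    # Toy objective: the first term keeps high-importance Gaussians, the
    # second is a sparsity penalty shrinking the expected mask size.
    loss = ((1 - m) * importance).mean() + 0.5 * m.mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

hard_mask = torch.sigmoid(mask_logits) > 0.5     # discretize after training
print(f"pruned {(~hard_mask).sum().item()} of {num_gaussians} Gaussians")
```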
  {
    "path": "abs/2405.19203.md",
    "content": "### E3Gen: Efficient, Expressive and Editable Avatars Generation\n\nThis paper aims to introduce 3D Gaussian for efficient, expressive, and editable digital avatar generation. This task faces two major challenges: (1) The unstructured nature of 3D Gaussian makes it incompatible with current generation pipelines; (2) the expressive animation of 3D Gaussian in a generative setting that involves training with multiple subjects remains unexplored. In this paper, we propose a novel avatar generation method named E3Gen, to effectively address these challenges. First, we propose a novel generative UV features plane representation that encodes unstructured 3D Gaussian onto a structured 2D UV space defined by the SMPL-X parametric model. This novel representation not only preserves the representation ability of the original 3D Gaussian but also introduces a shared structure among subjects to enable generative learning of the diffusion model. To tackle the second challenge, we propose a part-aware deformation module to achieve robust and accurate full-body expressive pose control. Extensive experiments demonstrate that our method achieves superior performance in avatar generation and enables expressive full-body pose control and editing.\n\n本文旨在引入三维高斯技术，以实现高效、富有表现力且可编辑的数字化化身生成。这一任务面临两大挑战：（1）三维高斯的非结构化特性使其与当前的生成流水线不兼容；（2）在涉及多主体训练的生成环境中，三维高斯的表现性动画仍未被探索。为了有效应对这些挑战，我们提出了一种名为 E3Gen 的新颖化身生成方法。首先，我们提出了一种新颖的生成UV特征平面表征，该表征将非结构化的三维高斯编码到由SMPL-X参数模型定义的结构化的二维UV空间中。这种新颖的表征不仅保留了原始三维高斯的表征能力，还引入了在主体之间共享的结构，以支持扩散模型的生成学习。为了应对第二个挑战，我们提出了一个部分感知的变形模块，以实现稳健且准确的全身表情姿态控制。广泛的实验表明，我们的方法在化身生成中实现了卓越的性能，并能够实现表现力强的全身姿态控制和编辑。\n\n\n"
  },
  {
    "path": "abs/2405.19321.md",
    "content": "### DGD: Dynamic 3D Gaussians Distillation\n\nWe tackle the task of learning dynamic 3D semantic radiance fields given a single monocular video as input. Our learned semantic radiance field captures per-point semantics as well as color and geometric properties for a dynamic 3D scene, enabling the generation of novel views and their corresponding semantics. This enables the segmentation and tracking of a diverse set of 3D semantic entities, specified using a simple and intuitive interface that includes a user click or a text prompt. To this end, we present DGD, a unified 3D representation for both the appearance and semantics of a dynamic 3D scene, building upon the recently proposed dynamic 3D Gaussians representation. Our representation is optimized over time with both color and semantic information. Key to our method is the joint optimization of the appearance and semantic attributes, which jointly affect the geometric properties of the scene. We evaluate our approach in its ability to enable dense semantic 3D object tracking and demonstrate high-quality results that are fast to render, for a diverse set of scenes.\n\n我们处理的任务是基于单个单目视频学习动态三维语义辐射场。我们学习到的语义辐射场能够捕捉每个点的语义以及动态三维场景的颜色和几何特性，从而生成新视角及其对应的语义。这使得可以使用简单直观的界面（如用户点击或文本提示）对各种三维语义实体进行分割和跟踪。为此，我们提出了DGD，这是一种用于动态三维场景外观和语义的统一三维表示，基于最近提出的动态三维高斯表示构建。我们的表示随时间进行优化，结合了颜色和语义信息。我们方法的关键是外观和语义属性的联合优化，这共同影响场景的几何特性。我们在密集语义三维对象跟踪能力方面对我们的方法进行了评估，并展示了在各种场景中快速渲染的高质量结果。\n"
  },
  {
    "path": "abs/2405.19331.md",
    "content": "### NPGA: Neural Parametric Gaussian Avatars\n\nThe creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian Splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of our avatars, we augment the canonical Gaussian point cloud using per-primitive latent features which govern its dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.\n\n在进一步将虚拟组件融入我们日常生活的过程中，创建高保真数字化人类头部是一个重要的步骤。构建这样的头像是一个挑战性的研究问题，因为它需要高度的照片真实感和实时渲染性能。在这项工作中，我们提出了神经参数高斯头像（NPGA），一种基于数据驱动的方法，从多视角视频记录中创建高保真、可控的头像。我们围绕3D高斯溅射构建了我们的方法，因为它具有高效的渲染能力，并继承了点云的拓扑灵活性。与以往的研究不同，我们的头像动态依赖于神经参数头模型（NPHM）的丰富表情空间，而不是基于网格的3DMMs。为此，我们将我们的NPHM的后向变形场提炼为与光栅化渲染兼容的前向变形。所有剩余的细致表情依赖的细节都是从多视角视频中学习的。为了增加我们头像的表现力，我们使用每个原始的潜在特征增强规范高斯点云，这些特征控制其动态行为。为了规范这种增加的动态表达能力，我们在潜在特征和预测动态上提出了拉普拉斯项。我们在公共NeRSemble数据集上评估了我们的方法，展示了NPGA在自我再现任务上的表现显著优于之前的最先进头像，提高了2.6 PSNR。此外，我们还展示了从现实世界的单目视频准确的动画能力。\n"
  },
  {
    "path": "abs/2405.19614.md",
    "content": "### TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM\n\nThe limited robustness of 3D Gaussian Splatting (3DGS) to motion blur and camera noise, along with its poor real-time performance, restricts its application in robotic SLAM tasks. Upon analysis, the primary causes of these issues are the density of views with motion blur and the cumulative errors in dense pose estimation from calculating losses based on noisy original images and rendering results, which increase the difficulty of 3DGS rendering convergence. Thus, a cutting-edge 3DGS-based SLAM system is introduced, leveraging the efficiency and flexibility of 3DGS to achieve real-time performance while remaining robust against sensor noise, motion blur, and the challenges posed by long-session SLAM. Central to this approach is the Fusion Bridge module, which seamlessly integrates tracking-centered ORB Visual Odometry with mapping-centered online 3DGS. Precise pose initialization is enabled by this module through joint optimization of re-projection and rendering loss, as well as strategic view selection, enhancing rendering convergence in large-scale scenes. Extensive experiments demonstrate state-of-the-art rendering quality and localization accuracy, positioning this system as a promising solution for real-world robotics applications that require stable, near-real-time performance.\n\n三维高斯溅射（3DGS）对运动模糊和相机噪声的有限鲁棒性，以及其较差的实时性能，限制了其在机器人SLAM任务中的应用。通过分析，这些问题的主要原因是运动模糊视图的密度和从计算基于噪声原始图像和渲染结果的损失的密集姿态估计中累积的错误，这增加了3DGS渲染收敛的难度。因此，引入了一种尖端的基于3DGS的SLAM系统，利用3DGS的效率和灵活性实现实时性能，同时对抗传感器噪声、运动模糊和长时间会话SLAM所带来的挑战。这种方法的核心是Fusion Bridge模块，该模块无缝集成了以跟踪为中心的ORB视觉里程计和以映射为中心的在线3DGS。通过联合优化重投影和渲染损失，以及战略性视图选择，该模块实现了精确的姿态初始化，增强了大规模场景中的渲染收敛。广泛的实验表明，该系统具有最先进的渲染质量和定位精度，是现实世界机器人应用中要求稳定、接近实时性能的有希望的解决方案。\n"
  },
  {
    "path": "abs/2405.19657.md",
    "content": "### Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian\n\n3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis. However, achieving successful reconstruction from RGB images generally requires multiple input views captured under static conditions. To address the challenge of sparse input views, previous approaches have incorporated depth supervision into the training of 3D Gaussians to mitigate overfitting, using dense predictions from pretrained depth networks as pseudo-ground truth. Nevertheless, depth predictions from monocular depth estimation models inherently exhibit significant uncertainty in specific areas. Relying solely on pixel-wise L2 loss may inadvertently incorporate detrimental noise from these uncertain areas. In this work, we introduce a novel method to supervise the depth distribution of 3D Gaussians, utilizing depth priors with integrated uncertainty estimates. To address these localized errors in depth predictions, we integrate a patch-wise optimal transport strategy to complement traditional L2 loss in depth supervision. Extensive experiments conducted on the LLFF, DTU, and Blender datasets demonstrate that our approach, UGOT, achieves superior novel view synthesis and consistently outperforms state-of-the-art methods.\n\n三维高斯溅射在实时新视角合成中展示了令人印象深刻的性能。然而，从RGB图像成功重建通常需要在静态条件下捕获的多个输入视图。为了应对稀疏输入视图的挑战，先前的方法引入了深度监督到3D高斯的训练中，以减轻过拟合，使用预训练深度网络的密集预测作为伪真实值。尽管如此，来自单目深度估计模型的深度预测本质上在特定区域显示出显著的不确定性。仅依赖逐像素L2损失可能会无意中引入这些不确定区域的有害噪声。在这项工作中，我们引入了一种新的方法来监督3D高斯的深度分布，利用具有集成不确定性估计的深度先验。为了解决深度预测中这些局部错误，我们整合了一种基于块的最优传输策略，以补充传统的L2损失进行深度监督。在LLFF、DTU和Blender数据集上进行的广泛实验表明，我们的方法UGOT实现了优越的新视角合成，并一致超越了最先进的方法。\n"
  },
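The two loss ingredients described above can be sketched compactly: an uncertainty-weighted per-pixel depth term, plus a patch-wise optimal-transport term. For two equally sized 1-D samples, the Wasserstein distance reduces to comparing sorted values, which gives a cheap per-patch transport loss. The exponential weighting scheme, patch size, and names here are assumptions, not UGOT's exact formulation.

```python
import torch
import torch.nn.functional as F

def depth_losses(pred, prior, uncertainty, patch=8):
    """pred/prior: [H, W] depth maps; uncertainty: [H, W], higher = less
    trustworthy prior. Returns (weighted L2 loss, patch-wise OT loss)."""
    w = torch.exp(-uncertainty)                  # down-weight unsure pixels
    l2 = (w * (pred - prior) ** 2).mean()

    # Split into non-overlapping patches: [num_patches, patch*patch].
    p_pred = F.unfold(pred[None, None], patch, stride=patch)[0].T
    p_prior = F.unfold(prior[None, None], patch, stride=patch)[0].T
    # 1-D optimal transport per patch: match sorted depth values, ignoring
    # pixel positions, so locally misplaced but distributionally correct
    # depths are not penalized as hard as with pixel-wise L2.
    ot = (p_pred.sort(dim=1).values - p_prior.sort(dim=1).values).abs().mean()
    return l2, ot

H = W = 64
pred = torch.rand(H, W, requires_grad=True)      # rendered depth (toy)
prior = torch.rand(H, W)                         # monocular depth prior (toy)
uncert = torch.rand(H, W)                        # prior uncertainty (toy)
l2, ot = depth_losses(pred, prior, uncert)
(l2 + 0.1 * ot).backward()
print(f"l2={l2.item():.4f}  ot={ot.item():.4f}")
```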
  {
    "path": "abs/2405.19671.md",
    "content": "### GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction\n\nRecently, 3D Gaussian Splatting(3DGS) has revolutionized neural rendering with its high-quality rendering and real-time speed. However, when it comes to indoor scenes with a significant number of textureless areas, 3DGS yields incomplete and noisy reconstruction results due to the poor initialization of the point cloud and under-constrained optimization. Inspired by the continuity of signed distance field (SDF), which naturally has advantages in modeling surfaces, we present a unified optimizing framework integrating neural SDF with 3DGS. This framework incorporates a learnable neural SDF field to guide the densification and pruning of Gaussians, enabling Gaussians to accurately model scenes even with poor initialized point clouds. At the same time, the geometry represented by Gaussians improves the efficiency of the SDF field by piloting its point sampling. Additionally, we regularize the optimization with normal and edge priors to eliminate geometry ambiguity in textureless areas and improve the details. Extensive experiments in ScanNet and ScanNet++ show that our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.\n\n近期，三维高斯溅射（3DGS）以其高质量的渲染和实时速度革新了神经渲染技术。然而，当涉及到内部场景中有大量无纹理区域的情况时，由于点云的初始化不良和优化受限，3DGS会产生不完整和嘈杂的重建结果。受到符号距离场（SDF）连续性的启发，该场自然在建模表面方面具有优势，我们提出了一个将神经SDF与3DGS整合的统一优化框架。该框架整合了一个可学习的神经SDF场，以指导高斯体的密集化和修剪，使高斯体能够准确地建模即使是初始化较差的点云的场景。同时，由高斯体表示的几何形状通过引导其点采样来提高SDF场的效率。此外，我们还用法线和边缘先验规范优化，以消除无纹理区域中的几何模糊并改善细节。在ScanNet和ScanNet++数据集上进行的广泛实验表明，我们的方法在表面重建和新视角合成方面均达到了最先进的性能。\n"
  },
  {
    "path": "abs/2405.19745.md",
    "content": "### GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis\n\nForecasting future scenarios in dynamic environments is essential for intelligent decision-making and navigation, a challenge yet to be fully realized in computer vision and robotics. Traditional approaches like video prediction and novel-view synthesis either lack the ability to forecast from arbitrary viewpoints or to predict temporal dynamics. In this paper, we introduce GaussianPrediction, a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis in dynamic environments. GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes. To this end, we first propose a 3D Gaussian canonical space with deformation modeling to capture the appearance and geometry of dynamic scenes, and integrate the lifecycle property into Gaussians for irreversible deformations. To make the prediction feasible and efficient, a concentric motion distillation approach is developed by distilling the scene motion with key points. Finally, a Graph Convolutional Network is employed to predict the motions of key points, enabling the rendering of photorealistic images of future scenarios. Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.\n\n在动态环境中预测未来场景对于智能决策和导航至关重要，这是计算机视觉和机器人技术尚未完全实现的挑战。传统方法如视频预测和新视角合成要么缺乏从任意视点预测的能力，要么无法预测时间动态。在本文中，我们引入了GaussianPrediction，这是一个新颖的框架，它使3D高斯表示能够对动态环境中的动态场景进行建模和未来场景合成。GaussianPrediction可以使用动态场景的视频观测数据从任何视点预测未来状态。为此，我们首先提出了一个具有形变建模的3D高斯典型空间，以捕捉动态场景的外观和几何，并将生命周期属性整合到高斯体中以处理不可逆形变。为了使预测可行且高效，我们开发了一种同心运动提炼方法，通过关键点提炼场景运动。最后，使用图卷积网络预测关键点的运动，使得能够渲染未来场景的逼真图像。我们的框架在合成和现实世界数据集上表现出色，证明了其在预测和渲染未来环境方面的有效性。\n"
  },
  {
    "path": "abs/2405.19957.md",
    "content": "### PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting\n\nAs text-conditioned diffusion models (DMs) achieve breakthroughs in image, video, and 3D generation, the research community's focus has shifted to the more challenging task of text-to-4D synthesis, which introduces a temporal dimension to generate dynamic 3D objects. In this context, we identify Score Distillation Sampling (SDS), a widely used technique for text-to-3D synthesis, as a significant hindrance to text-to-4D performance due to its Janus-faced and texture-unrealistic problems coupled with high computational costs. In this paper, we propose Pixel-Level Alignments for Text-to-4D Gaussian Splatting (PLA4D), a novel method that utilizes text-to-video frames as explicit pixel alignment targets to generate static 3D objects and inject motion into them. Specifically, we introduce Focal Alignment to calibrate camera poses for rendering and GS-Mesh Contrastive Learning to distill geometry priors from rendered image contrasts at the pixel level. Additionally, we develop Motion Alignment using a deformation network to drive changes in Gaussians and implement Reference Refinement for smooth 4D object surfaces. These techniques enable 4D Gaussian Splatting to align geometry, texture, and motion with generated videos at the pixel level. Compared to previous methods, PLA4D produces synthesized outputs with better texture details in less time and effectively mitigates the Janus-faced problem. PLA4D is fully implemented using open-source models, offering an accessible, user-friendly, and promising direction for 4D digital content creation.\n\n随着文本条件下的扩散模型（DMs）在图像、视频和3D生成领域取得突破，研究社区的焦点已转向更具挑战性的文本到4D合成任务，该任务引入了时间维度，用于生成动态3D对象。在这一背景下，我们发现文本到3D合成中广泛使用的技术——得分蒸馏采样（SDS），由于其具有两面性和纹理不真实的问题以及高计算成本，成为文本到4D性能的重大障碍。在本文中，我们提出了一种新颖方法：文本到4D高斯喷涂的像素级对齐（PLA4D），该方法利用文本到视频帧作为明确的像素对齐目标，以生成静态3D对象并注入动态。具体来说，我们引入了焦点对齐以校准渲染的摄像机姿态，并通过像素级渲染图像对比度来提取几何先验的GS-Mesh对比学习。此外，我们还开发了动作对齐，通过一个变形网络驱动高斯的变化，并实施了参考精细化以平滑4D对象表面。这些技术使得4D高斯喷涂能够在像素级别与生成的视频对齐几何、纹理和动作。与以往方法相比，PLA4D生成的输出在纹理细节上更佳，耗时更少，并有效缓解了两面性问题。PLA4D完全使用开源模型实现，为4D数字内容创作提供了一个易于使用、用户友好且前景广阔的方向。\n"
  },
  {
    "path": "abs/2405.20104.md",
    "content": "### Object-centric Reconstruction and Tracking of Dynamic Unknown Objects using 3D Gaussian Splatting\n\nGeneralizable perception is one of the pillars of high-level autonomy in space robotics. Estimating the structure and motion of unknown objects in dynamic environments is fundamental for such autonomous systems. Traditionally, the solutions have relied on prior knowledge of target objects, multiple disparate representations, or low-fidelity outputs unsuitable for robotic operations. This work proposes a novel approach to incrementally reconstruct and track a dynamic unknown object using a unified representation -- a set of 3D Gaussian blobs that describe its geometry and appearance. The differentiable 3D Gaussian Splatting framework is adapted to a dynamic object-centric setting. The input to the pipeline is a sequential set of RGB-D images. 3D reconstruction and 6-DoF pose tracking tasks are tackled using first-order gradient-based optimization. The formulation is simple, requires no pre-training, assumes no prior knowledge of the object or its motion, and is suitable for online applications. The proposed approach is validated on a dataset of 10 unknown spacecraft of diverse geometry and texture under arbitrary relative motion. The experiments demonstrate successful 3D reconstruction and accurate 6-DoF tracking of the target object in proximity operations over a short to medium duration. The causes of tracking drift are discussed and potential solutions are outlined.\n\n通用感知是空间机器人高级自主性的支柱之一。在动态环境中估计未知物体的结构和运动对于这样的自主系统至关重要。传统上，这些解决方案依赖于目标物体的先验知识、多种不同的表示形式，或者不适合机器人操作的低保真输出。本工作提出了一种新颖的方法，使用统一表示——一组描述其几何形状和外观的三维高斯斑点——来增量重建和跟踪动态未知物体。差分三维高斯溅射框架被适应于动态物体中心的设置。该管线的输入是一组序列化的RGB-D图像。使用一阶梯度基优化来解决三维重建和6自由度姿态跟踪任务。该公式简单，无需预训练，不假设对物体或其运动的先验知识，并且适合在线应用。所提出的方法在一个包含10个具有不同几何和纹理的未知航天器在任意相对运动下的数据集上进行了验证。实验展示了目标物体在短到中期的近距离操作中成功的三维重建和准确的6自由度跟踪。讨论了跟踪漂移的原因，并概述了潜在的解决方案。\n"
  },
  {
    "path": "abs/2405.20224.md",
    "content": "### EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images\n\n3D Gaussian Splatting (3D-GS) has demonstrated exceptional capabilities in 3D scene reconstruction and novel view synthesis. However, its training heavily depends on high-quality, sharp images and accurate camera poses. Fulfilling these requirements can be challenging in non-ideal real-world scenarios, where motion-blurred images are commonly encountered in high-speed moving cameras or low-light environments that require long exposure times. To address these challenges, we introduce Event Stream Assisted Gaussian Splatting (EvaGaussians), a novel approach that integrates event streams captured by an event camera to assist in reconstructing high-quality 3D-GS from blurry images. Capitalizing on the high temporal resolution and dynamic range offered by the event camera, we leverage the event streams to explicitly model the formation process of motion-blurred images and guide the deblurring reconstruction of 3D-GS. By jointly optimizing the 3D-GS parameters and recovering camera motion trajectories during the exposure time, our method can robustly facilitate the acquisition of high-fidelity novel views with intricate texture details. We comprehensively evaluated our method and compared it with previous state-of-the-art deblurring rendering methods. Both qualitative and quantitative comparisons demonstrate that our method surpasses existing techniques in restoring fine details from blurry images and producing high-fidelity novel views.\n\n三维高斯溅射（3D-GS）在三维场景重建和新视角合成方面展示了卓越的能力。然而，它的训练严重依赖于高质量、清晰的图像和准确的相机姿态。在非理想的现实世界场景中满足这些要求可能是具有挑战性的，例如在高速移动的摄像机或需要长时间曝光的低光环境中常常遇到的运动模糊图像。为了应对这些挑战，我们引入了事件流辅助高斯溅射（EvaGaussians），这是一种新颖的方法，它整合了由事件摄像机捕获的事件流来协助从模糊图像中重建高质量的3D-GS。利用事件摄像机提供的高时间分辨率和动态范围，我们利用事件流来明确建模运动模糊图像的形成过程，并指导3D-GS的去模糊重建。通过联合优化3D-GS参数和在曝光时间内恢复相机运动轨迹，我们的方法能够稳健地促进具有复杂纹理细节的高保真新视角的获取。我们全面评估了我们的方法，并与之前的最先进的去模糊渲染方法进行了比较。定性和定量的比较均表明，我们的方法在从模糊图像恢复细节和产生高保真新视角方面超越了现有技术。\n"
  },
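The blur formation model that event streams make supervisable can be sketched as follows: a blurry frame is approximated as the average of latent sharp renders along the camera trajectory during exposure, while accumulated events between two timestamps approximate the log-intensity difference of the corresponding latent images. In this sketch `render` is a hypothetical placeholder for a differentiable 3D-GS rasterizer, and the loss weighting is an arbitrary assumption.

```python
import torch

def render(pose):
    # Hypothetical stand-in: a real implementation would rasterize the
    # Gaussians from `pose`. Here we fake an image that depends on the pose.
    return torch.sigmoid(pose.sum() + torch.linspace(0, 1, 64 * 64)).view(64, 64)

def blur_loss(poses, observed_blurry):
    """Average latent sharp renders over the exposure, compare to the blur."""
    latent = torch.stack([render(p) for p in poses])
    return ((latent.mean(0) - observed_blurry) ** 2).mean()

def event_loss(pose_a, pose_b, accumulated_events, eps=1e-4):
    """Accumulated events ~ log I(t_b) - log I(t_a) (up to the contrast
    threshold), which constrains the latent images at both timestamps."""
    diff = torch.log(render(pose_b) + eps) - torch.log(render(pose_a) + eps)
    return ((diff - accumulated_events) ** 2).mean()

poses = torch.randn(9, 6, requires_grad=True)    # trajectory within exposure
blurry = torch.rand(64, 64)                      # observed blurry frame (toy)
events = torch.zeros(64, 64)                     # accumulated events (toy)
loss = blur_loss(poses, blurry) + 0.1 * event_loss(poses[0], poses[-1], events)
loss.backward()                                  # gradients reach the poses
print("loss:", loss.item())
```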
  {
    "path": "abs/2405.20310.md",
    "content": "### A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction\n\nLearning 3D scene representation from a single-view image is a long-standing fundamental problem in computer vision, with the inherent ambiguity in predicting contents unseen from the input view. Built on the recently proposed 3D Gaussian Splatting (3DGS), the Splatter Image method has made promising progress on fast single-image novel view synthesis via learning a single 3D Gaussian for each pixel based on the U-Net feature map of an input image. However, it has limited expressive power to represent occluded components that are not observable in the input view. To address this problem, this paper presents a Hierarchical Splatter Image method in which a pixel is worth more than one 3D Gaussians. Specifically,\neach pixel is represented by a parent 3D Gaussian and a small number of child 3D Gaussians. Parent 3D Gaussians are learned as done in the vanilla Splatter Image. Child 3D Gaussians are learned via a lightweight Multi-Layer Perceptron (MLP) which takes as input the projected image features of a parent 3D Gaussian and the embedding of a target camera view. Both parent and child 3D Gaussians are learned end-to-end in a stage-wise way. The joint condition of input image features from eyes of the parent Gaussians and the target camera position facilitates learning to allocate child Gaussians to \"see the unseen\", recovering the occluded details that are often missed by parent Gaussians.\nIn experiments, the proposed method is tested on the ShapeNet-SRN and CO3D datasets with state-of-the-art performance obtained, especially showing promising capabilities of reconstructing occluded contents in the input view.\n\n从单视图图像学习三维场景表示是计算机视觉中一个长期存在的基本问题，其固有的挑战在于预测从输入视图中看不见的内容。基于最近提出的三维高斯溅射（3DGS），Splatter Image方法通过基于输入图像的U-Net特征图为每个像素学习一个单独的三维高斯体，对快速单图像新视角合成取得了有希望的进展。然而，它在表示输入视图中不可观测的被遮挡组件方面具有有限的表达能力。为了解决这个问题，本文提出了一个分层Splatter Image方法，其中一个像素由多个三维高斯体表示。具体来说，每个像素由一个父三维高斯体和少量子三维高斯体组成。父三维高斯体的学习与传统的Splatter Image中的方法相同。子三维高斯体通过一个轻量级的多层感知机（MLP）学习，该MLP以一个父三维高斯体的投影图像特征和目标摄像机视图的嵌入为输入。父子三维高斯体均通过分阶段的方式端到端学习。输入图像特征与目标摄像机位置的联合条件有助于学习分配子高斯体以“看见未见”，恢复经常被父高斯体遗漏的被遮挡细节。\n在实验中，所提出的方法在ShapeNet-SRN和CO3D数据集上进行了测试，获得了最先进的性能，特别是在重构输入视图中被遮挡内容的能力方面表现出了有希望的能力。\n"
  },
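The parent/child idea lends itself to a compact sketch: a lightweight MLP that, given a parent Gaussian's projected image feature and an embedding of the target camera, predicts a few child Gaussians as offsets relative to the parent. All dimensions, the 14-parameter child encoding, and activation choices below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ChildGaussianHead(nn.Module):
    def __init__(self, feat_dim=64, cam_dim=16, num_children=3):
        super().__init__()
        self.num_children = num_children
        # Each child: 3 (position offset) + 3 (log scale) + 4 (quaternion)
        # + 3 (color) + 1 (opacity logit) = 14 parameters.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + cam_dim, 128), nn.ReLU(),
            nn.Linear(128, num_children * 14),
        )

    def forward(self, parent_feat, cam_embed):
        x = torch.cat([parent_feat, cam_embed], dim=-1)
        out = self.mlp(x).view(*x.shape[:-1], self.num_children, 14)
        d_pos, log_scale, quat, color, opacity = out.split([3, 3, 4, 3, 1], dim=-1)
        quat = quat / quat.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        return {
            "d_pos": d_pos,                      # child center = parent + d_pos
            "scale": log_scale.exp(),
            "rot": quat,
            "color": color.sigmoid(),
            "opacity": opacity.sigmoid().squeeze(-1),
        }

# Toy usage: one parent Gaussian per pixel of a 64x64 image.
head = ChildGaussianHead()
parent_feat = torch.randn(4096, 64)              # projected parent features
cam_embed = torch.randn(4096, 16)                # target-view embedding
children = head(parent_feat, cam_embed)
print(children["d_pos"].shape)                   # torch.Size([4096, 3, 3])
```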
  {
    "path": "abs/2405.20323.md",
    "content": "### S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving\n\nPhotorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving. Despite the efficacy of Neural Radiance Fields (NeRF) for driving scenes, 3D Gaussian Splatting (3DGS) emerges as a promising direction due to its faster speed and more explicit representation. However, most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements for effective reconstruction, limiting their applications for in-the-wild scenarios. To facilitate efficient 3D scene reconstruction without costly annotations, we propose a self-supervised street Gaussian (S3Gaussian) method to decompose dynamic and static elements from 4D consistency. We represent each scene with 3D Gaussians to preserve the explicitness and further accompany them with a spatial-temporal field network to compactly model the 4D dynamics. We conduct extensive experiments on the challenging Waymo-Open dataset to evaluate the effectiveness of our method. Our S3Gaussian demonstrates the ability to decompose static and dynamic scenes and achieves the best performance without using 3D annotations.\n\n三维高保真重建街景对于开发自动驾驶实时模拟器是一项关键技术。尽管神经辐射场（NeRF）在驾驶场景中效果显著，但三维高斯溅射（3DGS）由于其更快的速度和更明确的表示而成为一个有前景的方向。然而，大多数现有的街道3DGS方法需要跟踪的三维车辆边界框来分解静态和动态元素以有效重建，这限制了它们在野外场景中的应用。为了在不需要昂贵标注的情况下促进高效的三维场景重建，我们提出了一种自监督街道高斯（S3Gaussian）方法，从四维一致性中分解动态和静态元素。我们用三维高斯表示每个场景以保持明确性，并进一步将其与一个空间-时间场网络结合使用，以紧凑地模拟四维动态。我们在具有挑战性的Waymo-Open数据集上进行了广泛的实验以评估我们方法的有效性。我们的S3Gaussian展示了分解静态和动态场景的能力，并在不使用三维注释的情况下达到了最佳性能。\n"
  },
  {
    "path": "abs/2405.20669.md",
    "content": "### Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier Score Distillation\n\nSingle image-to-3D generation is pivotal for crafting controllable 3D assets. Given its underconstrained nature, we leverage geometric priors from a 3D novel view generation diffusion model and appearance priors from a 2D image generation method to guide the optimization process. We note that a disparity exists between the training datasets of 2D and 3D diffusion models, leading to their outputs showing marked differences in appearance. Specifically, 2D models tend to deliver more detailed visuals, whereas 3D models produce consistent yet over-smooth results across different views. Hence, we optimize a set of 3D Gaussians using 3D priors in spatial domain to ensure geometric consistency, while exploiting 2D priors in the frequency domain through Fourier transform for higher visual quality. This 2D-3D hybrid Fourier Score Distillation objective function (dubbed hy-FSD), can be integrated into existing 3D generation methods, yielding significant performance improvements. With this technique, we further develop an image-to-3D generation pipeline to create high-quality 3D objects within one minute, named Fourier123. Extensive experiments demonstrate that Fourier123 excels in efficient generation with rapid convergence speed and visual-friendly generation results.\n\n单图到三维生成对于创建可控的三维资产至关重要。鉴于其不完全约束的性质，我们利用三维新视角生成扩散模型的几何先验和二维图像生成方法的外观先验来指导优化过程。我们注意到二维和三维扩散模型的训练数据集之间存在差异，导致它们的输出在外观上显示出明显的不同。具体来说，二维模型往往提供更详细的视觉效果，而三维模型则在不同视角下产生一致但过于平滑的结果。因此，我们使用三维先验优化一组三维高斯函数，以确保几何一致性，同时通过傅立叶变换利用二维先验在频域中获得更高的视觉质量。这种二维-三维混合傅立叶得分蒸馏目标函数（简称hy-FSD），可以整合到现有的三维生成方法中，显著提升性能。借助这种技术，我们进一步开发了一个名为Fourier123的图像到三维生成流程，能够在一分钟内创建高质量的三维对象。广泛的实验表明，Fourier123在高效生成、快速收敛速度和视觉友好的生成结果方面表现出色。\n"
  },
  {
    "path": "abs/2405.20693.md",
    "content": "### R2-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction\n\n3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R2-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a previously unknown integration bias in the standard 3DGS formulation, which hampers accurate volume retrieval. To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians. Our new method presents three key innovations: (1) introducing tailored Gaussian kernels, (2) extending rasterization to X-ray imaging, and (3) developing a CUDA-based differentiable voxelizer. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches by 0.93 dB in PSNR and 0.014 in SSIM. Crucially, it delivers high-quality results in 3 minutes, which is 12x faster than NeRF-based methods and on par with traditional algorithms. The superior performance and rapid convergence of our method highlight its practical value.\n\n三维高斯绘制（3DGS）在图像渲染和表面重建中显示出有希望的结果。然而，其在体积重建任务中的潜力，例如X射线计算机断层扫描，仍然未被充分探索。本文介绍了R2-Gaussian，这是第一个基于3DGS的稀疏视图断层重建框架。通过仔细推导X射线光栅化函数，我们发现了标准3DGS公式中之前未知的积分偏差，这阻碍了准确的体积检索。为了解决这个问题，我们提出了一种通过重构从三维到二维高斯的投影的新颖矫正技术。我们的新方法呈现三个关键创新：（1）引入定制的高斯核，（2）将光栅化扩展到X射线成像，以及（3）开发基于CUDA的可微体素化器。广泛的实验表明，我们的方法在峰值信噪比（PSNR）上超过了最先进的方法0.93 dB，在结构相似性指数（SSIM）上提高了0.014。关键的是，它在3分钟内提供高质量结果，这比基于NeRF的方法快12倍，与传统算法相当。我们方法的卓越性能和快速收敛突显了其实际价值。\n"
  },
  {
    "path": "abs/2405.20721.md",
    "content": "### ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model\n\nRecently, 3D Gaussian Splatting (3DGS) has become a promising framework for novel view synthesis, offering fast rendering speeds and high fidelity. However, the large number of Gaussians and their associated attributes require effective compression techniques. Existing methods primarily compress neural Gaussians individually and independently, i.e., coding all the neural Gaussians at the same time, with little design for their interactions and spatial dependence. Inspired by the effectiveness of the context model in image compression, we propose the first autoregressive model at the anchor level for 3DGS compression in this work. We divide anchors into different levels and the anchors that are not coded yet can be predicted based on the already coded ones in all the coarser levels, leading to more accurate modeling and higher coding efficiency. To further improve the efficiency of entropy coding, e.g., to code the coarsest level with no already coded anchors, we propose to introduce a low-dimensional quantized feature as the hyperprior for each anchor, which can be effectively compressed. Our work pioneers the context model in the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS and 15 times compared to the most recent state-of-the-art work Scaffold-GS, while achieving comparable or even higher rendering quality.\n\n最近，三维高斯绘制（3DGS）已成为新视图合成的一个有前景的框架，提供了快速的渲染速度和高保真度。然而，大量的高斯及其相关属性需要有效的压缩技术。现有方法主要是单独和独立地压缩神经高斯，即同时编码所有神经高斯，对它们的交互和空间依赖性设计较少。受图像压缩中上下文模型有效性的启发，我们在这项工作中提出了第一个用于3DGS压缩的锚点级自回归模型。我们将锚点划分为不同的层次，尚未编码的锚点可以基于所有较粗层次中已编码的锚点进行预测，从而实现更准确的建模和更高的编码效率。为了进一步提高熵编码的效率，例如，用于编码没有已编码锚点的最粗层次，我们提议引入每个锚点的低维量化特征作为超先验，这可以被有效压缩。我们的工作在锚点级别为3DGS表示引入了上下文模型，与传统3DGS相比，压缩大小超过100倍，与最新的先进工作Scaffold-GS相比，压缩比达到15倍，同时实现了可比较甚至更高的渲染质量。\n"
  },
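The anchor-level context idea can be sketched generically: anchors are split into levels, a not-yet-coded anchor's features are predicted from nearby already-coded coarser anchors, and the resulting Gaussian distribution gives the expected bit cost under entropy coding. The inverse-distance interpolation, the factorized-Gaussian bit estimate, and all names below are illustrative assumptions, not ContextGS's prediction network.

```python
import math
import torch

def predict_from_coarse(fine_pos, coarse_pos, coarse_feat, k=4):
    """Inverse-distance interpolation of coarse anchor features at the
    positions of the finer-level anchors still to be coded."""
    d = torch.cdist(fine_pos, coarse_pos)                    # [F, C]
    dist, idx = d.topk(k, largest=False)
    w = 1.0 / (dist + 1e-6)
    w = w / w.sum(-1, keepdim=True)
    return (coarse_feat[idx] * w[..., None]).sum(1)          # [F, D]

def gaussian_bits(x, mean, log_scale):
    """Expected code length (bits) of x under a factorized Gaussian model,
    the usual proxy for entropy-coded size."""
    var = (2 * log_scale).exp()
    nll = 0.5 * ((x - mean) ** 2 / var + 2 * log_scale + math.log(2 * math.pi))
    return nll.sum() / math.log(2.0)

# Toy usage: a coarse coded level predicts a finer uncoded level.
coarse_pos, fine_pos = torch.rand(128, 3), torch.rand(1024, 3)
coarse_feat = torch.randn(128, 32)
fine_feat = torch.randn(1024, 32)
pred = predict_from_coarse(fine_pos, coarse_pos, coarse_feat)
bits = gaussian_bits(fine_feat, pred, log_scale=torch.zeros(1))
print(f"estimated size: {bits.item() / 8 / 1024:.1f} KiB")
```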
  {
    "path": "abs/2405.20791.md",
    "content": "### GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis\n\nDecoupling the illumination in 3D scenes is crucial for novel view synthesis and relighting. In this paper, we propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points. Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components, enabling the synthesis of realistic lighting effects. To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework. The fundamental idea is to view the rendering tasks under various lighting positions as a multi-task learning problem, which our meta-learning approach effectively addresses by generalizing the learned Gaussian geometries not only across different viewpoints but also across diverse light positions. Experimental results demonstrate the effectiveness of our approach in terms of training efficiency and rendering quality compared to existing methods for free-viewpoint relighting.\n\n在三维场景中分离照明对于新视角合成和重新照明至关重要。在本文中，我们提出了一种新方法，使用一组可重新照明的三维高斯点来表示由点光源照亮的场景。受到Blinn-Phong模型的启发，我们的方法将场景分解为环境光、漫反射和镜面反射成分，从而实现了真实的照明效果的合成。为了便于独立于照明条件分解几何信息，我们引入了一种新的双层优化基元学习框架。基本思想是将在不同照明位置下的渲染任务视为一个多任务学习问题，我们的元学习方法有效地通过泛化学习到的高斯几何形状，不仅跨不同视点，还跨不同的光线位置。实验结果证明了我们方法在训练效率和渲染质量方面与现有的自由视点重新照明方法相比的有效性。\n"
  },
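Since the decomposition follows the classic Blinn-Phong model, the recomposition step is easy to sketch: each Gaussian carries ambient, diffuse, and specular coefficients plus a normal, and its color under a point light is the sum of the three terms. Attenuation is omitted and the per-component parameterization is a simplifying assumption; the meta-learning part of the paper is not shown here.

```python
import torch

def blinn_phong(points, normals, ambient, diffuse, specular, shininess,
                light_pos, view_pos, light_rgb):
    l = light_pos - points
    l = l / l.norm(dim=-1, keepdim=True)                    # light direction
    v = view_pos - points
    v = v / v.norm(dim=-1, keepdim=True)                    # view direction
    h = l + v
    h = h / h.norm(dim=-1, keepdim=True)                    # half vector
    n_dot_l = (normals * l).sum(-1, keepdim=True).clamp_min(0.0)
    n_dot_h = (normals * h).sum(-1, keepdim=True).clamp_min(0.0)
    return (ambient                                         # ambient term
            + diffuse * n_dot_l * light_rgb                 # diffuse term
            + specular * n_dot_h.pow(shininess) * light_rgb)  # specular term

# Toy usage: shade 1024 Gaussians under a single point light.
N = 1024
points = torch.rand(N, 3)
normals = torch.nn.functional.normalize(torch.randn(N, 3), dim=-1)
color = blinn_phong(
    points, normals,
    ambient=torch.full((N, 3), 0.1),
    diffuse=torch.rand(N, 3),
    specular=torch.rand(N, 1),
    shininess=32.0,
    light_pos=torch.tensor([2.0, 2.0, 2.0]),
    view_pos=torch.tensor([0.0, 0.0, 3.0]),
    light_rgb=torch.tensor([1.0, 1.0, 1.0]),
)
print(color.shape, float(color.min()), float(color.max()))
```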
  {
    "path": "abs/2406.00434.md",
    "content": "### MoDGS: Dynamic Gaussian Splatting from Causually-captured Monocular Videos\n\nIn this paper, we propose MoDGS, a new pipeline to render novel-view images in dynamic scenes using only casually captured monocular videos. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but fail to reconstruct dynamic scenes on casually captured input videos whose cameras are static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms baseline methods by a significant margin.\n\n在本文中，我们提出了 MoDGS，这是一种新的管道，仅使用随意捕获的单目视频来渲染动态场景中的新视角图像。之前的单目动态 NeRF 或高斯喷溅方法严重依赖输入相机的快速移动来构建多视图一致性，但无法在相机静止或缓慢移动的随意捕获视频输入上重建动态场景。为了解决这一挑战性任务，MoDGS 采用了最近的单视图深度估计方法来指导动态场景的学习。然后，提出了一种新颖的 3D 感知初始化方法，以学习合理的变形场，以及一个新的鲁棒深度损失，用于指导动态场景几何的学习。全面的实验表明，MoDGS 能够仅从随意捕获的单目视频中渲染出动态场景的高质量新视角图像，其性能显著优于基线方法。\n\n"
  },
  {
    "path": "abs/2406.00440.md",
    "content": "### Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture\n\n4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternative geometry and texture optimization frame-by-frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we can extract dynamic facial meshes in regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves superior results than the current SOTA face reconstruction methods both in the quality of meshes and textures.\n\n4D 头部捕捉旨在从视频中生成动态拓扑网格和相应的纹理贴图，这在电影和游戏中得到了广泛使用，因为它能够模拟面部肌肉运动并恢复毛孔挤压中的动态纹理。该行业通常采用多视图立体和非刚性对齐方法。然而，这种方法容易出错，并且严重依赖于艺术家耗时的手动处理。为了简化这个过程，我们提出了 Topo4D，这是一个用于自动几何和纹理生成的新框架，它可以直接从校准的多视图时间序列图像中优化密集对齐的 4D 头部和 8K 纹理贴图。具体来说，我们首先将时间序列面部表示为一组具有固定拓扑的动态 3D 高斯分布，其中高斯中心绑定到网格顶点。之后，我们逐帧进行几何和纹理的交替优化，以实现高质量的几何和纹理学习，同时保持时间拓扑稳定性。最后，我们可以从学习到的高斯分布中提取具有规则布线排列和具有毛孔级细节的高保真纹理的动态面部网格。广泛的实验表明，我们的方法在网格和纹理的质量上均优于当前最先进的面部重建方法。\n"
  },
  {
    "path": "abs/2406.00609.md",
    "content": "### SuperGaussian: Repurposing Video Models for 3D Super Resolution\n\nWe present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in image and video domains. We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs, which are diverse both in terms of complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models.\n\n我们提出了一种简单、模块化且通用的方法，该方法通过增加几何和外观细节来上采样粗糙的 3D 模型。尽管现在存在生成式 3D 模型，但它们的质量尚未达到图像和视频领域同类产品的质量。我们证明了可以直接将现有的（预训练的）视频模型用于 3D 超分辨率，从而绕过高质量 3D 训练模型库短缺的问题。我们描述了如何重新利用视频上采样模型，这些模型在 3D 上不一致，并将它们与 3D 整合结合起来，以产生 3D 一致的结果。作为输出，我们生成了高质量的高斯喷溅模型，这些模型以对象为中心且效果显著。我们的方法不受类别限制，可以轻松整合到现有的 3D 工作流程中。我们在多种 3D 输入上评估了我们提出的 SuperGaussian，这些输入在复杂性和表现形式（例如，高斯喷溅或 NeRFs）方面都具有多样性，并证明我们的简单方法显著提高了最终 3D 模型的保真度。\n\n"
  },
  {
    "path": "abs/2406.01042.md",
    "content": "### Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting\n\nGaussian Splatting (GS) has significantly elevated scene reconstruction efficiency and novel view synthesis (NVS) accuracy compared to Neural Radiance Fields (NeRF), particularly for dynamic scenes. However, current 4D NVS methods, whether based on GS or NeRF, primarily rely on camera parameters provided by COLMAP and even utilize sparse point clouds generated by COLMAP for initialization, which lack accuracy as well are time-consuming. This sometimes results in poor dynamic scene representation, especially in scenes with large object movements, or extreme camera conditions e.g. small translations combined with large rotations. Some studies simultaneously optimize the estimation of camera parameters and scenes, supervised by additional information like depth, optical flow, etc. obtained from off-the-shelf models. Using this unverified information as ground truth can reduce robustness and accuracy, which does frequently occur for long monocular videos (with e.g. > hundreds of frames). We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. It includes the extraction of 2D point features that robustly represent 3D structure, and their use for subsequent joint optimization of camera parameters and 3D structure towards overall 4D scene optimization. We demonstrate the accuracy and time efficiency of our method through extensive quantitative and qualitative experimental results on several standard benchmarks. The results show significant improvements over state-of-the-art methods for 4D novel view synthesis.\n\n高斯喷溅（GS）在场景重建效率和新视角合成（NVS）的准确性上相较于神经辐射场（NeRF）有了显著提高，尤其是在动态场景中。然而，当前的 4D NVS 方法，无论是基于 GS 还是 NeRF，主要依赖由 COLMAP 提供的相机参数，甚至使用 COLMAP 生成的稀疏点云进行初始化，这些方法不仅缺乏准确性，而且耗时。这有时会导致动态场景表示质量不佳，特别是在物体移动大或相机条件极端（例如，小位移配合大旋转）的场景中。一些研究同时优化相机参数和场景的估计，由额外信息（如深度、光流等）监督，这些信息来自现成模型。使用这些未经验证的信息作为真值可能会降低鲁棒性和准确性，这在长时间的单目视频（例如，超过数百帧）中经常发生。我们提出了一种新颖的方法，通过自我校准相机参数学习高保真度的 4D GS 场景表示。它包括提取稳健代表 3D 结构的 2D 点特征，以及使用这些特征对相机参数和 3D 结构进行后续的联合优化，以实现整体 4D 场景优化。我们通过在几个标准基准上进行广泛的定量和定性实验，展示了我们方法的准确性和时间效率。结果显示，与现有最先进方法相比，我们的方法在 4D 新视角合成方面取得了显著改进。\n\n"
  },
  {
    "path": "abs/2406.01467.md",
    "content": "### RaDe-GS: Rasterizing Depth in Gaussian Splatting\n\nGaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent techniques like 2D GS have attempted to improve shape reconstruction, they often reformulate the Gaussian primitives in ways that reduce both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. Our approach achieves a Chamfer distance error comparable to NeuraLangelo on the DTU dataset and similar training and rendering time as traditional Gaussian Splatting on the Tanks & Temples dataset. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods.\n\n高斯喷溅（GS）在新视角合成中已被证明非常有效，能够实现高质量和实时渲染。然而，其在重建详细的 3D 形状方面的潜力尚未被充分挖掘。由于高斯喷溅的离散和无结构性质，现有方法常常受到形状精度有限的困扰，这使得形状提取变得复杂。尽管最近的技术如 2D GS 试图改进形状重建，但它们常常重新制定高斯原语，以降低渲染质量和计算效率。为了解决这些问题，我们的工作引入了一种光栅化方法来渲染一般 3D 高斯喷溅的深度图和表面法线图。我们的方法不仅显著提高了形状重建的准确性，还保持了高斯喷溅固有的计算效率。我们的方法在 DTU 数据集上达到了与 NeuraLangelo 相当的 Chamfer 距离误差，并且在 Tanks & Temples 数据集上具有与传统高斯喷溅类似的训练和渲染时间。我们的方法是高斯喷溅的一大进步，并且可以直接集成到现有的基于高斯喷溅的方法中。\n\n"
  },
  {
    "path": "abs/2406.01476.md",
    "content": "### DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors\\\n\nDynamic 3D interaction has witnessed great interest in recent works, while creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, and the other is to learn the deformation of static 3D objects with the distillation of video generative models. The former one requires assigning precise physical properties to the target object, otherwise the simulated results would become unnatural. The latter tends to formulate the video with minor motions and discontinuous frames, due to the absence of physical constraints in deformation learning. We think that video generative models are trained with real-world captured data, capable of judging physical phenomenon in simulation environments. To this end, we propose DreamPhysics in this work, which estimates physical properties of 3D Gaussian Splatting with video diffusion priors. DreamPhysics supports both image- and text-conditioned guidance, optimizing physical parameters via score distillation sampling with frame interpolation and log gradient. Based on a material point method simulator with proper physical parameters, our method can generate 4D content with realistic motions. Experimental results demonstrate that, by distilling the prior knowledge of video diffusion models, inaccurate physical properties can be gradually refined for high-quality simulation.\n\n动态 3D 交互在近期的研究中引起了极大的兴趣，而创建此类 4D 内容仍然具有挑战性。一种解决方案是通过物理基础的模拟来动画化 3D 场景，另一种则是通过提炼视频生成模型来学习静态 3D 对象的形变。前者需要为目标对象分配精确的物理属性，否则模拟结果可能会变得不自然。后者由于在形变学习中缺乏物理约束，往往会导致视频中的微小运动和不连续的帧。我们认为视频生成模型是通过现实世界捕获的数据训练而成的，能够在模拟环境中判断物理现象。为此，我们在这项工作中提出了 DreamPhysics，它利用视频扩散先验估计 3D 高斯喷溅的物理属性。DreamPhysics 支持图像和文本条件指导，通过得分提炼采样与帧插值和对数梯度来优化物理参数。基于具有适当物理参数的材料点方法模拟器，我们的方法可以生成具有真实运动的 4D 内容。实验结果表明，通过提炼视频扩散模型的先验知识，可以逐渐完善不准确的物理属性，以实现高质量的模拟。\n"
  },
  {
    "path": "abs/2406.01579.md",
    "content": "### Tetrahedron Splatting for 3D Generation\n\n3D representation is essential to the significant advance of 3D generation with 2D diffusion priors. As a flexible representation, NeRF has been first adopted for 3D representation. With density-based volumetric rendering, it however suffers both intensive computational overhead and inaccurate mesh extraction. Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh extraction and real-time rendering but is limited in handling large topological changes in meshes, leading to optimization challenges. Alternatively, 3D Gaussian Splatting (3DGS) is favored in both training and rendering efficiency while falling short in mesh extraction. In this work, we introduce a novel 3D representation, Tetrahedron Splatting (TeT-Splatting), that supports easy convergence during optimization, precise mesh extraction, and real-time rendering simultaneously. This is achieved by integrating surface-based volumetric rendering within a structured tetrahedral grid while preserving the desired ability of precise mesh extraction, and a tile-based differentiable tetrahedron rasterizer. Furthermore, we incorporate eikonal and normal consistency regularization terms for the signed distance field to improve generation quality and stability. Critically, our representation can be trained without mesh extraction, making the optimization process easier to converge. Our TeT-Splatting can be readily integrated in existing 3D generation pipelines, along with polygonal mesh for texture optimization. Extensive experiments show that our TeT-Splatting strikes a superior tradeoff among convergence speed, render efficiency, and mesh quality as compared to previous alternatives under varying 3D generation settings.\n\n三维表示对于利用二维扩散先验进行三维生成的重大进展至关重要。作为一种灵活的表示方式，NeRF首先被用于三维表示。尽管采用基于密度的体积渲染，但它在计算开销大和网格提取不准确方面都有所不足。使用有符号距离场和行进四面体算法，DMTet能够实现精确的网格提取和实时渲染，但在处理网格的大规模拓扑变化时存在局限，导致优化挑战。另一方面，3D高斯涂抹（3DGS）在训练和渲染效率上受到青睐，但在网格提取方面略显不足。在这项工作中，我们引入了一种新颖的三维表示方式，四面体涂抹（TeT-Splatting），它支持在优化过程中易于收敛、精确的网格提取和实时渲染。这是通过在结构化的四面体网格中整合基于表面的体积渲染，同时保留精确网格提取的所需能力，并采用基于瓦片的可微四面体光栅化器实现的。此外，我们为有符号距离场引入了等值面和法线一致性的正则化项，以提高生成质量和稳定性。关键的是，我们的表示可以在不进行网格提取的情况下进行训练，使优化过程更易于收敛。我们的TeT-Splatting可以容易地集成到现有的三维生成流程中，并与多边形网格一起用于纹理优化。广泛的实验表明，与之前的替代方案相比，我们的TeT-Splatting在收敛速度、渲染效率和网格质量之间取得了更优的平衡，适用于不同的三维生成设置。\n"
  },
  {
    "path": "abs/2406.01593.md",
    "content": "### Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting\n\n3D reconstruction and simulation, while interrelated, have distinct objectives: reconstruction demands a flexible 3D representation adaptable to diverse scenes, whereas simulation requires a structured representation to model motion principles effectively. This paper introduces the Mesh-adsorbed Gaussian Splatting (MaGS) method to resolve such a dilemma. MaGS constrains 3D Gaussians to hover on the mesh surface, creating a mutual-adsorbed mesh-Gaussian 3D representation that combines the rendering flexibility of 3D Gaussians with the spatial coherence of meshes. Leveraging this representation, we introduce a learnable Relative Deformation Field (RDF) to model the relative displacement between the mesh and 3D Gaussians, extending traditional mesh-driven deformation paradigms that only rely on ARAP prior, thus capturing the motion of each 3D Gaussian more precisely. By joint optimizing meshes, 3D Gaussians, and RDF, MaGS achieves both high rendering accuracy and realistic deformation. Extensive experiments on the D-NeRF and NeRF-DS datasets demonstrate that MaGS can generate competitive results in both reconstruction and simulation.\n\n3D 重建和模拟虽然相互关联，但具有不同的目标：重建要求一种灵活的 3D 表示，适应多样化的场景，而模拟则需要一种结构化的表示以有效地模拟运动原理。本文介绍了网格吸附高斯喷溅（MaGS）方法来解决这种困境。MaGS 将 3D 高斯限制在网格表面上漂浮，创建了一个相互吸附的网格-高斯 3D 表示，结合了 3D 高斯的渲染灵活性与网格的空间连贯性。利用这种表示，我们引入了一个可学习的相对形变场（RDF），用于模拟网格和 3D 高斯之间的相对位移，扩展了传统的仅依赖 ARAP 先验的网格驱动形变范式，从而更精确地捕捉每个 3D 高斯的运动。通过联合优化网格、3D 高斯和 RDF，MaGS 实现了高渲染精度和真实的形变。在 D-NeRF 和 NeRF-DS 数据集上的广泛实验表明，MaGS 能在重建和模拟方面生成具有竞争力的结果。\n\n"
  },
  {
    "path": "abs/2406.01597.md",
    "content": "### End-to-End Rate-Distortion Optimized 3D Gaussian Representation\n\n3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible and continuous rate control. RDO-Gaussian addresses two main issues that exist in current schemes: 1) Different from prior endeavors that minimize the rate under the fixed distortion, we introduce dynamic pruning and entropy-constrained vector quantization (ECVQ) that optimize the rate and distortion at the same time. 2) Previous works treat the colors of each Gaussian equally, while we model the colors of different regions and materials with learnable numbers of parameters. We verify our method on both real and synthetic scenes, showcasing that RDO-Gaussian greatly reduces the size of 3D Gaussian over 40x, and surpasses existing methods in rate-distortion performance.\n\n3D 高斯涂抹（3DGS）已成为3D表征和图像渲染中一种具有显著潜力的新兴技术。然而，3DGS的巨大存储开销显著阻碍了其实际应用。在这项工作中，我们将紧凑的3D高斯学习表述为端到端的速率失真优化（RDO）问题，并提出了RDO-Gaussian，能够实现灵活和连续的速率控制。RDO-Gaussian解决了当前方案中存在的两个主要问题：1）与之前只是在固定失真下最小化速率的尝试不同，我们引入了动态剪枝和熵约束向量量化（ECVQ），同时优化速率和失真。2）以往的工作对每个高斯的颜色处理相同，而我们对不同区域和材料的颜色采用可学习的参数数量进行建模。我们在真实和合成场景中验证了我们的方法，显示出RDO-Gaussian在减少3D高斯大小方面超过40倍，并且在速率失真表现上超越了现有方法。\n"
  },
  {
    "path": "abs/2406.01916.md",
    "content": "### FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping\n\nThe semantically interactive radiance field has always been an appealing task for its potential to facilitate user-friendly and automated real-world 3D scene understanding applications. However, it is a challenging task to achieve high quality, efficiency and zero-shot ability at the same time with semantics in radiance fields. In this work, we present FastLGS, an approach that supports real-time open-vocabulary query within 3D Gaussian Splatting (3DGS) under high resolution. We propose the semantic feature grid to save multi-view CLIP features which are extracted based on Segment Anything Model (SAM) masks, and map the grids to low dimensional features for semantic field training through 3DGS. Once trained, we can restore pixel-aligned CLIP embeddings through feature grids from rendered features for open-vocabulary queries. Comparisons with other state-of-the-art methods prove that FastLGS can achieve the first place performance concerning both speed and accuracy, where FastLGS is 98x faster than LERF and 4x faster than LangSplat. Meanwhile, experiments show that FastLGS is adaptive and compatible with many downstream tasks, such as 3D segmentation and 3D object inpainting, which can be easily applied to other 3D manipulation systems.\n\n语义交互式辐射场一直是一个吸引人的任务，因为它有助于促进用户友好和自动化的现实世界3D场景理解应用。然而，在辐射场中同时实现高质量、高效率和零样本能力是一个挑战性任务。在这项工作中，我们提出了FastLGS，这是一种支持实时开放词汇查询的方法，适用于高分辨率下的3D高斯涂抹（3DGS）。我们提出了语义特征网格来保存基于Segment Anything Model（SAM）掩码提取的多视图CLIP特征，并通过3DGS将网格映射到低维特征进行语义场训练。一旦训练完成，我们可以通过从渲染特征中恢复的特征网格恢复像素对齐的CLIP嵌入，以进行开放词汇查询。与其他最先进方法的比较证明，FastLGS在速度和准确性方面均能取得第一名的表现，其中FastLGS比LERF快98倍，比LangSplat快4倍。同时，实验表明FastLGS能够适应并兼容许多下游任务，例如3D分割和3D对象修复，这些任务可以轻松应用于其他3D操作系统。\n"
  },
  {
    "path": "abs/2406.02058.md",
    "content": "### OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding\n\nThis paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature presentation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency. These features exhibit both intra-object consistency and inter-object distinction. Then, we propose a two-stage codebook to discretize these features from coarse to fine levels. At the coarse level, we consider the positional information of 3D points to achieve location-based clustering, which is then refined at the fine level. Finally, we introduce an instance-level 3D-2D feature association method that links 3D points to 2D masks, which are further associated with 2D CLIP features. Extensive experiments, including open vocabulary-based 3D object selection, 3D point cloud understanding, click-based 3D object selection, and ablation studies, demonstrate the effectiveness of our proposed method.\n\n本文介绍了一种基于3D高斯涂抹（3DGS）的方法，名为OpenGaussian，能够实现3D点级开放词汇理解。我们的主要动机来源于观察到现有基于3DGS的开放词汇方法主要关注2D像素级解析。这些方法在3D点级任务上挣扎，原因是特征表达能力弱和2D-3D特征关联不准确。为了确保稳健的特征表现和3D点级理解，我们首先使用不涉及跨帧关联的SAM掩模来训练具有3D一致性的实例特征。这些特征展示了对象内部的一致性和对象间的区别。然后，我们提出了一个两阶段码本，用于将这些特征从粗到细的层次进行离散化。在粗层次上，我们考虑了3D点的位置信息，实现基于位置的聚类，然后在细层次上进行细化。最后，我们引入了一个实例级的3D-2D特征关联方法，将3D点与2D掩模相连，这些掩模进一步与2D CLIP特征相关联。包括基于开放词汇的3D对象选择、3D点云理解、点击式3D对象选择和消融研究在内的广泛实验，证明了我们提出方法的有效性。\n"
  },
  {
    "path": "abs/2406.02370.md",
    "content": "### Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning\n\nLatent scene representation plays a significant role in training reinforcement learning (RL) agents. To obtain good latent vectors describing the scenes, recent works incorporate the 3D-aware latent-conditioned NeRF pipeline into scene representation learning. However, these NeRF-related methods struggle to perceive 3D structural information due to the inefficient dense sampling in volumetric rendering. Moreover, they lack fine-grained semantic information included in their scene representation vectors because they evenly consider free and occupied spaces. Both of them can destroy the performance of downstream RL tasks. To address the above challenges, we propose a novel framework that adopts the efficient 3D Gaussian Splatting (3DGS) to learn 3D scene representation for the first time. In brief, we present the Query-based Generalizable 3DGS to bridge the 3DGS technique and scene representations with more geometrical awareness than those in NeRFs. Moreover, we present the Hierarchical Semantics Encoding to ground the fine-grained semantic features to 3D Gaussians and further distilled to the scene representation vectors. We conduct extensive experiments on two RL platforms including Maniskill2 and Robomimic across 10 different tasks. The results show that our method outperforms the other 5 baselines by a large margin. We achieve the best success rates on 8 tasks and the second-best on the other two tasks.\n\n潜在场景表示在训练强化学习（RL）代理中扮演了重要角色。为了获得描述场景的良好潜在向量，近期的工作将3D意识的潜条件NeRF管道整合到场景表示学习中。然而，这些与NeRF相关的方法因体积渲染中的低效密集采样而难以感知3D结构信息。此外，它们在场景表示向量中缺乏细粒度的语义信息，因为它们均等地考虑了空闲和占用空间。这两者都可能破坏下游RL任务的性能。为了解决上述挑战，我们提出了一个新颖的框架，首次采用高效的3D高斯涂抹（3DGS）学习3D场景表示。简而言之，我们提出了基于查询的泛化3DGS，以比NeRF中的表示具有更多几何意识地桥接3DGS技术和场景表示。此外，我们提出了层次化语义编码，将细粒度的语义特征固化到3D高斯中，并进一步提炼到场景表示向量中。我们在包括Maniskill2和Robomimic的两个RL平台上进行了广泛的实验，涵盖了10个不同的任务。结果显示，我们的方法在5个基准测试中大幅超越其他方法。我们在8个任务上达到了最佳成功率，在另外两个任务上取得了第二好的成绩。\n"
  },
  {
    "path": "abs/2406.02407.md",
    "content": "### WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections\n\nNovel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics. Recently, 3D Gaussian Splatting (3DGS) has shown promise for photorealistic and real-time NVS of static scenes. Building on 3DGS, we propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections. Our key innovation is a residual-based spherical harmonic coefficients transfer module that adapts 3DGS to varying lighting conditions and photometric post-processing. This lightweight module can be pre-computed and ensures efficient gradient propagation from rendered images to 3D Gaussian attributes. Additionally, we observe that the appearance encoder and the transient mask predictor, the two most critical parts of NVS from unconstrained photo collections, can be mutually beneficial. We introduce a plug-and-play lightweight spatial attention module to simultaneously predict transient occluders and latent appearance representation for each image. After training and preprocessing, our method aligns with the standard 3DGS format and rendering pipeline, facilitating seamlessly integration into various 3DGS applications. Extensive experiments on diverse datasets show our approach outperforms existing approaches on the rendering quality of novel view and appearance synthesis with high converge and rendering speed.\n\n从不受限制的照片集合进行新视角合成（NVS）在计算机图形学中具有挑战性。最近，3D高斯涂抹（3DGS）在静态场景的光真实和实时新视角合成中显示出了前景。基于3DGS，我们提出了一个高效的基于点的可微分渲染框架，用于从照片集合重建场景。我们的关键创新是一个基于残差的球面谐波系数转移模块，该模块适应不同的照明条件和光度后处理，以调整3DGS。这个轻量级模块可以预计算，并确保从渲染图像到3D高斯属性的有效梯度传播。此外，我们观察到外观编码器和瞬态遮挡物预测器这两个不受限照片集合NVS中最关键的部分可以相互促进。我们引入了一个即插即用的轻量级空间注意力模块，用于同时预测每幅图像的瞬态遮挡物和潜在的外观表示。经过训练和预处理后，我们的方法与标准的3DGS格式和渲染管道对齐，便于无缝集成到各种3DGS应用中。在多样化数据集上进行的广泛实验表明，我们的方法在新视角和外观合成的渲染质量、收敛速度和渲染速度方面均优于现有方法。\n"
  },
  {
    "path": "abs/2406.02518.md",
    "content": "### DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering\n\nDigitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks, especially for accurate but heavy physics-based Monte Carlo methods. While analytical DRR renderers offer greater efficiency, they overlook anisotropic X-ray image formation phenomena, such as Compton scattering. We present a novel approach that marries realistic physics-inspired X-ray simulation with efficient, differentiable DRR generation using 3D Gaussian splatting (3DGS). Our direction-disentangled 3DGS (DDGS) method separates the radiosity contribution into isotropic and direction-dependent components, approximating complex anisotropic interactions without intricate runtime simulations. Additionally, we adapt the 3DGS initialization to account for tomography data properties, enhancing accuracy and efficiency. Our method outperforms state-of-the-art techniques in image accuracy. Furthermore, our DDGS shows promise for intraoperative applications and inverse problems such as pose registration, delivering superior registration accuracy and runtime performance compared to analytical DRR methods.\n\n数字重建X线图（DRRs）是从3D CT体积生成的模拟2D X线图像，在术前设置中广泛使用，但由于计算瓶颈，尤其是对于精确但计算量大的基于物理的蒙特卡罗方法，在术中应用中受到限制。虽然分析型DRR渲染器提供了更高的效率，但它们忽略了各向异性X线图像形成现象，如康普顿散射。我们提出了一种新方法，将真实的物理启发型X线模拟与高效的可微分DRR生成结合起来，使用3D高斯涂抹（3DGS）。我们的方向解耦3DGS（DDGS）方法将辐射度贡献分为各向同性和方向依赖的组分，无需复杂的运行时模拟即可近似复杂的各向异性相互作用。此外，我们调整了3DGS的初始化，以考虑层析数据属性，提高了精度和效率。我们的方法在图像精度方面超越了最先进的技术。此外，我们的DDGS对于术中应用和逆问题（如姿态注册）显示出前景，与分析型DRR方法相比，提供了更优的注册精度和运行时性能。\n"
  },
  {
    "path": "abs/2406.02533.md",
    "content": "### SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition\n\nOn-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. In this article, we present an approach for mapping geometries and high-confidence detection of components of unknown, non-cooperative satellites on orbit. We implement accelerated 3D Gaussian splatting to learn a 3D representation of the satellite, render virtual views of the target, and ensemble the YOLOv5 object detector over the virtual views, resulting in reliable, accurate, and precise satellite component detections. The full pipeline capable of running on-board and stand to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks.\n\n在轨服务（OOS）、航天器检查以及主动太空碎片清除（ADR）等任务需要在非合作、可能未知的在轨空间对象附近进行精确的交会和近邻操作。有人任务的安全问题以及地面控制的延迟时间要求完全自主性。在本文中，我们介绍了一种方法，用于绘制未知、非合作卫星在轨组件的几何形状和高置信度检测。我们实现了加速的3D高斯涂抹来学习卫星的3D表示，渲染目标的虚拟视图，并在虚拟视图上集成YOLOv5对象检测器，从而实现可靠、准确和精确的卫星组件检测。整个管道能够在机载系统上运行，并支持自主导航、导航和控制任务所必需的下游机器智能任务。\n"
  },
  {
    "path": "abs/2406.02541.md",
    "content": "### Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting\n\nRecent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach utilizes a two-stage 3D Gaussian optimizing process tailored for editing dynamic monocular videos. In the first stage, Video-3DGS employs an improved version of COLMAP, referred to as MC-COLMAP, which processes original videos using a Masked and Clipped approach. For each video clip, MC-COLMAP generates the point clouds for dynamic foreground objects and complex backgrounds. These point clouds are utilized to initialize two sets of 3D Gaussians (Frg-3DGS and Bkg-3DGS) aiming to represent foreground and background views. Both foreground and background views are then merged with a 2D learnable parameter map to reconstruct full views. In the second stage, we leverage the reconstruction ability developed in the first stage to impose the temporal constraints on the video diffusion model. To demonstrate the efficacy of Video-3DGS on both stages, we conduct extensive experiments across two related tasks: Video Reconstruction and Video Editing. Video-3DGS trained with 3k iterations significantly improves video reconstruction quality (+3 PSNR, +7 PSNR increase) and training efficiency (x1.9, x4.5 times faster) over NeRF-based and 3DGS-based state-of-art methods on DAVIS dataset, respectively. Moreover, it enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.\n\n近期在零样本视频扩散模型方面的进展显示出了以文本驱动视频编辑的潜力，但在实现高时间一致性方面仍面临挑战。为了解决这一问题，我们引入了Video-3DGS，这是一种基于3D高斯涂抹（3DGS）的视频细化器，旨在增强零样本视频编辑器的时间一致性。我们的方法采用了为编辑动态单眼视频量身定制的两阶段3D高斯优化过程。在第一阶段，Video-3DGS采用了改进版的COLMAP，称为MC-COLMAP，该技术使用遮罩和剪辑方法处理原始视频。对于每个视频片段，MC-COLMAP生成动态前景对象和复杂背景的点云。这些点云被用来初始化两组3D高斯（Frg-3DGS和Bkg-3DGS），旨在代表前景和背景视图。然后将前景和背景视图与2D可学习参数图合并以重建完整视图。在第二阶段，我们利用第一阶段开发的重建能力来对视频扩散模型施加时间约束。为了证明Video-3DGS在这两个阶段的有效性，我们在两个相关任务上进行了广泛的实验：视频重建和视频编辑。在DAVIS数据集上，与基于NeRF和基于3DGS的最先进方法相比，经过3000次迭代训练的Video-3DGS显著提高了视频重建质量（PSNR提高3，7）和训练效率（分别快1.9倍，4.5倍）。此外，它通过确保58个动态单眼视频中的时间一致性，增强了视频编辑能力。\n"
  },
  {
    "path": "abs/2406.02720.md",
    "content": "### 3D-HGS: 3D Half-Gaussian Splatting\n\nPhoto-realistic 3D Reconstruction is a fundamental problem in 3D computer vision. This domain has seen considerable advancements owing to the advent of recent neural rendering techniques. These techniques predominantly aim to focus on learning volumetric representations of 3D scenes and refining these representations via loss functions derived from rendering. Among these, 3D Gaussian Splatting (3D-GS) has emerged as a significant method, surpassing Neural Radiance Fields (NeRFs). 3D-GS uses parameterized 3D Gaussians for modeling both spatial locations and color information, combined with a tile-based fast rendering technique. Despite its superior rendering performance and speed, the use of 3D Gaussian kernels has inherent limitations in accurately representing discontinuous functions, notably at edges and corners for shape discontinuities, and across varying textures for color discontinuities. To address this problem, we propose to employ 3D Half-Gaussian (3D-HGS) kernels, which can be used as a plug-and-play kernel. Our experiments demonstrate their capability to improve the performance of current 3D-GS related methods and achieve state-of-the-art rendering performance on various datasets without compromising rendering speed.\n\n光真实3D重建是3D计算机视觉中的一个基本问题。由于最近神经渲染技术的出现，这一领域取得了显著进展。这些技术主要旨在学习3D场景的体积表示，并通过从渲染派生的损失函数来细化这些表示。在这些技术中，3D高斯涂抹（3D-GS）已成为一种重要的方法，超越了神经辐射场（NeRFs）。3D-GS使用参数化的3D高斯核对空间位置和颜色信息进行建模，并结合了基于瓦片的快速渲染技术。尽管其渲染性能和速度优越，但使用3D高斯核在准确表示不连续函数方面存在固有限制，特别是在形状不连续的边缘和角落以及颜色不连续的不同纹理之间。为了解决这个问题，我们建议使用3D半高斯（3D-HGS）核，这可以作为即插即用的核心。我们的实验表明，它们能够提高当前3D-GS相关方法的性能，并在不影响渲染速度的情况下在各种数据集上实现最先进的渲染性能。\n"
  },
  {
    "path": "abs/2406.02968.md",
    "content": "### Adversarial Generation of Hierarchical Gaussians for 3D Generative Model\n\nMost advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussian as a 3D representation for 3D GANs by leveraging its efficient and explicit characteristics. However, in an adversarial framework, we observe that a naïve generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts due to the absence of proper guidance for initialized positions of Gaussians and densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians where finer-level Gaussians are parameterized by their coarser-level counterparts; the position of finer-level Gaussians would be located near their coarser-level counterparts, and the scale would monotonically decrease as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that ours achieves a significantly faster rendering speed (x100) compared to state-of-the-art 3D consistent GANs with comparable 3D generation capability.\n\n大多数3D生成对抗网络（3D GANs）的进步主要依赖于基于光线投射的体积渲染，这导致了巨大的渲染成本。一种有前景的替代方法是基于光栅化的3D高斯涂抹（3D-GS），提供了更快的渲染速度和明确的3D表达。在本文中，我们通过利用其高效和明确的特性，探索高斯作为3D GANs的3D表示。然而，在对抗性框架中，我们观察到，一个简单的生成器架构存在训练不稳定性，并且缺乏调整高斯尺度的能力。这导致了模型发散和由于缺乏对高斯初始化位置的适当指导以及适应性地管理其尺度的密集化而产生的视觉伪像。为了解决这些问题，我们引入了一个具有层次化多尺度高斯表示的生成器架构，有效地规范了生成高斯的位置和尺度。具体来说，我们设计了一个高斯层次结构，其中更细层次的高斯由其粗糙层次的对应物参数化；更细层次的高斯位置将位于其粗糙层次对应物附近，且尺度随着层次的细化而单调减小，从而模拟3D场景的粗糙和细致细节。实验结果表明，与具有可比3D生成能力的最先进的3D一致性GANs相比，我们的方法实现了显著更快的渲染速度（提高100倍）。\n"
  },
  {
    "path": "abs/2406.02972.md",
    "content": "### Event3DGS: Event-based 3D Gaussian Splatting for Fast Egomotion\n\nThe recent emergence of 3D Gaussian splatting (3DGS) leverages the advantage of explicit point-based representations, which significantly improves the rendering speed and quality of novel-view synthesis. However, 3D radiance field rendering in environments with high-dynamic motion or challenging illumination condition remains problematic in real-world robotic tasks. The reason is that fast egomotion is prevalent real-world robotic tasks, which induces motion blur, leading to inaccuracies and artifacts in the reconstructed structure. To alleviate this problem, we propose Event3DGS, the first method that learns Gaussian Splatting solely from raw event streams. By exploiting the high temporal resolution of event cameras and explicit point-based representation, Event3DGS can reconstruct high-fidelity 3D structures solely from the event streams under fast egomotion. Our sparsity-aware sampling and progressive training approaches allow for better reconstruction quality and consistency. To further enhance the fidelity of appearance, we explicitly incorporate the motion blur formation process into a differentiable rasterizer, which is used with a limited set of blurred RGB images to refine the appearance. Extensive experiments on multiple datasets validate the superior rendering quality of Event3DGS compared with existing approaches, with over 95% lower training time and faster rendering speed in orders of magnitude.\n\n最近3D高斯涂抹（3DGS）的出现利用了显式点基表示的优势，显著提高了新视角合成的渲染速度和质量。然而，在具有高动态运动或具挑战性照明条件的环境中，3D辐射场渲染在现实世界的机器人任务中仍然存在问题。问题的原因是，在现实世界的机器人任务中，快速自我运动是普遍现象，这会导致运动模糊，从而导致重建结构中的不准确和伪像。为了缓解这个问题，我们提出了Event3DGS，这是第一个仅从原始事件流中学习高斯涂抹的方法。通过利用事件相机的高时间分辨率和显式点基表示，Event3DGS可以仅从快速自我运动下的事件流中重建高保真3D结构。我们的稀疏感知采样和渐进式训练方法允许更好的重建质量和一致性。为了进一步提高外观的保真度，我们将运动模糊形成过程显式地纳入到一个可微光栅化器中，该光栅化器与有限组模糊的RGB图像一起使用，以细化外观。在多个数据集上进行的广泛实验验证了Event3DGS与现有方法相比的优越渲染质量，训练时间减少了95％以上，渲染速度也有数量级的提升。\n"
  },
  {
    "path": "abs/2406.03175.md",
    "content": "### Dynamic 3D Gaussian Fields for Urban Areas\n\nWe present an efficient neural 3D scene representation for novel-view synthesis (NVS) in large-scale, dynamic urban areas. Existing works are not well suited for applications like mixed-reality or closed-loop simulation due to their limited visual quality and non-interactive rendering speeds. Recently, rasterization-based approaches have achieved high-quality NVS at impressive speeds. However, these methods are limited to small-scale, homogeneous data, i.e. they cannot handle severe appearance and geometry variations due to weather, season, and lighting and do not scale to larger, dynamic areas with thousands of images. We propose 4DGF, a neural scene representation that scales to large-scale dynamic urban areas, handles heterogeneous input data, and substantially improves rendering speeds. We use 3D Gaussians as an efficient geometry scaffold while relying on neural fields as a compact and flexible appearance model. We integrate scene dynamics via a scene graph at global scale while modeling articulated motions on a local level via deformations. This decomposed approach enables flexible scene composition suitable for real-world applications. In experiments, we surpass the state-of-the-art by over 3 dB in PSNR and more than 200 times in rendering speed.\n\n我们提出了一种高效的神经3D场景表示方法，用于大规模动态城市区域的新视角合成（NVS）。现有工作由于视觉质量有限和渲染速度不具交互性，不适合混合现实或闭环模拟等应用。最近，基于光栅化的方法在保证高质量NVS的同时实现了令人印象深刻的速度。然而，这些方法仅限于小规模、同质化数据，即它们不能处理由于天气、季节和照明引起的严重外观和几何变化，也不适用于包含数千张图片的更大、动态的区域。我们提出了4DGF，这是一种神经场景表示，可以扩展到大规模动态城市区域，处理异质输入数据，并显著提高渲染速度。我们使用3D高斯作为高效的几何支架，同时依赖神经场作为一种紧凑且灵活的外观模型。我们通过全局尺度的场景图集成场景动态，同时通过变形在局部层面模拟关节运动。这种分解方法使场景组合灵活，适合实际应用。在实验中，我们的方法在峰值信噪比（PSNR）上超过了现有技术3dB以上，并在渲染速度上提高了200倍以上。\n"
  },
  {
    "path": "abs/2406.03394.md",
    "content": "### Gaussian Representation for Deformable Image Registration\n\nDeformable image registration (DIR) is a fundamental task in radiotherapy, with existing methods often struggling to balance computational efficiency, registration accuracy, and speed effectively. We introduce a novel DIR approach employing parametric 3D Gaussian control points achieving a better tradeoff. It provides an explicit and flexible representation for spatial deformation fields between 3D volumetric medical images, producing a displacement vector field (DVF) across all volumetric positions. The movement of individual voxels is derived using linear blend skinning (LBS) through localized interpolation of transformations associated with neighboring Gaussians. This interpolation strategy not only simplifies the determination of voxel motions but also acts as an effective regularization technique. Our approach incorporates a unified optimization process through backpropagation, enabling iterative learning of both the parameters of the 3D Gaussians and their transformations. Additionally, the density of Gaussians is adjusted adaptively during the learning phase to accommodate varying degrees of motion complexity. We validated our approach on the 4D-CT lung DIR-Lab and cardiac ACDC datasets, achieving an average target registration error (TRE) of 1.06 mm within a much-improved processing time of 2.43 seconds for the DIR-Lab dataset over existing methods, demonstrating significant advancements in both accuracy and efficiency.\n\n形变图像配准（DIR）是放射治疗中的一个基本任务，现有方法通常难以有效平衡计算效率、配准精度和速度。我们引入了一种新的DIR方法，该方法采用参数化的3D高斯控制点，实现了更好的权衡。该方法为3D体积医学图像之间的空间变形场提供了一个明确且灵活的表示，生成了覆盖所有体积位置的位移向量场（DVF）。通过局部插值转换，相关邻近高斯的变换来导出单个体素的移动，采用线性混合蒙皮（LBS）技术。这种插值策略不仅简化了体素运动的确定，还充当了有效的正则化技术。我们的方法通过反向传播，包含了一个统一的优化过程，使得能够迭代学习3D高斯的参数及其变换。此外，在学习阶段，高斯的密度根据运动复杂度的不同进行自适应调整。我们在4D-CT肺DIR-Lab和心脏ACDC数据集上验证了我们的方法，与现有方法相比，DIR-Lab数据集的平均目标配准误差（TRE）为1.06毫米，处理时间大幅缩短到2.43秒，显示出在精度和效率上的显著进步。\n"
  },
  {
    "path": "abs/2406.03697.md",
    "content": "### Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction\n\nRendering novel view images in dynamic scenes is a crucial yet challenging task. Current methods mainly utilize NeRF-based methods to represent the static scene and an additional time-variant MLP to model scene deformations, resulting in relatively low rendering quality as well as slow inference speed. To tackle these challenges, we propose a novel framework named Superpoint Gaussian Splatting (SP-GS). Specifically, our framework first employs explicit 3D Gaussians to reconstruct the scene and then clusters Gaussians with similar properties (e.g., rotation, translation, and location) into superpoints. Empowered by these superpoints, our method manages to extend 3D Gaussian splatting to dynamic scenes with only a slight increase in computational expense. Apart from achieving state-of-the-art visual quality and real-time rendering under high resolutions, the superpoint representation provides a stronger manipulation capability. Extensive experiments demonstrate the practicality and effectiveness of our approach on both synthetic and real-world datasets.\n\n在动态场景中渲染新视角图像是一个关键但具有挑战性的任务。当前方法主要利用基于NeRF的方法来表示静态场景，并使用额外的时变多层感知机（MLP）来模拟场景变形，这导致渲染质量相对较低以及推理速度慢。为了解决这些挑战，我们提出了一个名为Superpoint Gaussian Splatting（SP-GS）的新框架。具体来说，我们的框架首先使用明确的3D高斯来重建场景，然后将具有类似属性（例如，旋转、平移和位置）的高斯聚集到超点中。通过这些超点的支持，我们的方法成功地将3D高斯涂抹扩展到动态场景中，仅略微增加了计算开销。除了在高分辨率下实现了业界领先的视觉质量和实时渲染外，超点表示还提供了更强的操纵能力。广泛的实验表明我们的方法在合成和现实世界数据集上的实用性和有效性。\n"
  },
  {
    "path": "abs/2406.04251.md",
    "content": "### Localized Gaussian Point Management\n\nPoint management is a critical component in optimizing 3D Gaussian Splatting (3DGS) models, as the point initiation (e.g., via structure from motion) is distributionally inappropriate. Typically, the Adaptive Density Control (ADC) algorithm is applied, leveraging view-averaged gradient magnitude thresholding for point densification, opacity thresholding for pruning, and regular all-points opacity reset. However, we reveal that this strategy is limited in tackling intricate/special image regions (e.g., transparent) as it is unable to identify all the 3D zones that require point densification, and lacking an appropriate mechanism to handle the ill-conditioned points with negative impacts (occlusion due to false high opacity). To address these limitations, we propose a Localized Point Management (LPM) strategy, capable of identifying those error-contributing zones in the highest demand for both point addition and geometry calibration. Zone identification is achieved by leveraging the underlying multiview geometry constraints, with the guidance of image rendering errors. We apply point densification in the identified zone, whilst resetting the opacity of those points residing in front of these regions so that a new opportunity is created to correct ill-conditioned points. Serving as a versatile plugin, LPM can be seamlessly integrated into existing 3D Gaussian Splatting models. Experimental evaluation across both static 3D and dynamic 4D scenes validate the efficacy of our LPM strategy in boosting a variety of existing 3DGS models both quantitatively and qualitatively. Notably, LPM improves both vanilla 3DGS and SpaceTimeGS to achieve state-of-the-art rendering quality while retaining real-time speeds, outperforming on challenging datasets such as Tanks & Temples and the Neural 3D Video Dataset.\n\n点管理是优化3D高斯涂抹（3DGS）模型的一个关键组成部分，因为点初始化（例如通过运动结构）在分布上是不适当的。通常，会应用自适应密度控制（ADC）算法，利用视图平均梯度幅度阈值进行点密化，透明度阈值进行修剪，以及定期的所有点透明度重置。然而，我们发现这种策略在处理复杂/特殊图像区域（例如透明区域）时存在局限，因为它无法识别所有需要点密化的3D区域，并且缺乏适当的机制来处理带有负面影响的病态点（由于错误高透明度导致的遮挡）。为了解决这些限制，我们提出了一种局部点管理（LPM）策略，能够识别对点增加和几何校正需求最高的那些错误贡献区域。通过利用底层的多视图几何约束和图像渲染错误的指导来实现区域识别。我们在识别的区域内应用点密化，同时重置这些区域前方点的透明度，从而创造了纠正病态点的新机会。作为一个多功能插件，LPM可以无缝集成到现有的3D高斯涂抹模型中。在静态3D和动态4D场景中的实验评估验证了我们的LPM策略在定量和定性上提升各种现有3DGS模型的有效性。值得注意的是，LPM改进了普通3DGS和SpaceTimeGS，实现了业界领先的渲染质量，同时保持了实时速度，在挑战性数据集如Tanks & Temples和Neural 3D Video Dataset上表现优异。\n"
  },
  {
    "path": "abs/2406.04338.md",
    "content": "### Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion\n\nIn recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors. However, current 3D generative models tend to focus only on surface features such as color and shape, neglecting the inherent physical properties that govern the behavior of objects in the real world. To accurately simulate physics-aligned dynamics, it is essential to predict the physical properties of materials and incorporate them into the behavior prediction process. Nonetheless, predicting the diverse materials of real-world objects is still challenging due to the complex nature of their physical attributes. In this paper, we propose \\textbf{Physics3D}, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model, which enables us to simulate a wide range of materials with high-fidelity capabilities. Moreover, we distill the physical priors from a video diffusion model that contains more understanding of realistic object materials. Extensive experiments demonstrate the effectiveness of our method with both elastic and plastic materials. Physics3D shows great potential for bridging the gap between the physical world and virtual neural space, providing a better integration and application of realistic physical principles in virtual environments.\n\n近年来，3D生成模型的发展迅速，为模拟3D对象的动态运动和自定义其行为等应用开辟了新的可能性。然而，当前的3D生成模型倾向于只关注表面特征，如颜色和形状，忽视了控制现实世界中物体行为的固有物理属性。为了准确模拟符合物理的动态，预测材料的物理属性并将其纳入行为预测过程是至关重要的。尽管如此，由于物理属性的复杂性，预测真实世界物体的多样材料仍然具有挑战性。在这篇论文中，我们提出了一种名为 **Physics3D** 的新方法，通过视频扩散模型学习3D对象的各种物理属性。我们的方法涉及设计一个基于粘弹性材料模型的高度泛化的物理仿真系统，该系统能够高保真地模拟各种材料。此外，我们从包含对现实物体材料更深理解的视频扩散模型中提取物理先验。广泛的实验表明，我们的方法在弹性和塑性材料上都有效。Physics3D显示出极大的潜力，能够桥接物理世界和虚拟神经空间之间的差距，为虚拟环境中现实物理原理的更好集成和应用提供支持。\n"
  },
  {
    "path": "abs/2406.04343.md",
    "content": "### Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image\n\nIn this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a \"foundation\" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input.\n\n在这篇论文中，我们提出了一种名为Flash3D的方法，用于从单一图像进行场景重建和新视角合成，该方法具有很高的泛化性和效率。为了提高泛化性，我们从用于单眼深度估计的“基础”模型开始，并将其扩展为一个完整的3D形状和外观重构器。为了提高效率，我们将这一扩展基于前馈高斯涂抹。具体来说，我们在预测的深度上预测一层3D高斯，然后添加在空间中偏移的额外高斯层，允许模型完成遮挡和截断后面的重建。Flash3D非常高效，可以在一天内在单个GPU上进行训练，因此对大多数研究者来说都是可行的。在RealEstate10k上训练和测试时，它达到了业界领先的结果。当转移到未见数据集如NYU时，它的表现大大超过了竞争对手。更令人印象深刻的是，当转移到KITTI时，Flash3D比在该数据集上专门训练的方法获得了更好的PSNR。在某些情况下，它甚至超过了使用多视图作为输入的最新方法。\n"
  },
  {
    "path": "abs/2406.05774.md",
    "content": "### VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction\n\nAlthough 3D Gaussian Splatting has been widely studied because of its realistic and efficient novel-view synthesis, it is still challenging to extract a high-quality surface from the point-based representation. Previous works improve the surface by incorporating geometric priors from the off-the-shelf normal estimator. However, there are two main limitations: 1) Supervising normal rendered from 3D Gaussians updates only the rotation parameter while neglecting other geometric parameters; 2) The inconsistency of predicted normal maps across multiple views may lead to severe reconstruction artifacts. In this paper, we propose a Depth-Normal regularizer that directly couples normal with other geometric parameters, leading to full updates of the geometric parameters from normal regularization. We further propose a confidence term to mitigate inconsistencies of normal predictions across multiple views. Moreover, we also introduce a densification and splitting strategy to regularize the size and distribution of 3D Gaussians for more accurate surface modeling. Compared with Gaussian-based baselines, experiments show that our approach obtains better reconstruction quality and maintains competitive appearance quality at faster training speed and 100+ FPS rendering.\n\n虽然由于其现实且高效的新视角合成，3D高斯涂抹已被广泛研究，但从基于点的表示中提取高质量表面仍然是一个挑战。先前的工作通过整合现成的法线估计器中的几何先验来改善表面质量。然而，存在两个主要的局限性：1) 对从3D高斯渲染的法线进行监督只更新旋转参数，而忽略了其他几何参数；2) 多视图中预测的法线图的不一致可能导致严重的重建缺陷。在本文中，我们提出了一个深度-法线正则化器，该正则化器直接将法线与其他几何参数相耦合，从而实现了从法线正则化中对几何参数的完全更新。我们进一步提出了一个置信度项，以减轻多视图中法线预测的不一致性。此外，我们还引入了一种密化和分裂策略，以规范3D高斯的大小和分布，从而实现更精确的表面建模。与基于高斯的基线相比，实验表明我们的方法获得了更好的重建质量，并且在更快的训练速度和100+ FPS的渲染速度下保持了竞争性的外观质量。\n"
  },
  {
    "path": "abs/2406.05852.md",
    "content": "### RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering\n\n3D Gaussian Splatting (3D-GS) has made a notable advancement in the field of neural rendering, 3D scene reconstruction, and novel view synthesis. Nevertheless, 3D-GS encounters the main challenge when it comes to accurately representing physical reflections, especially in the case of total reflection and semi-reflection that are commonly found in real-world scenes. This limitation causes reflections to be mistakenly treated as independent elements with physical presence, leading to imprecise reconstructions. Herein, to tackle this challenge, we propose RefGaussian to disentangle reflections from 3D-GS for realistically modeling reflections. Specifically, we propose to split a scene into transmitted and reflected components and represent these components using two Spherical Harmonics (SH). Given that this decomposition is not fully determined, we employ local regularization techniques to ensure local smoothness for both the transmitted and reflected components, thereby achieving more plausible decomposition outcomes than 3D-GS. Experimental results demonstrate that our approach achieves superior novel view synthesis and accurate depth estimation outcomes. Furthermore, it enables the utilization of scene editing applications, ensuring both high-quality results and physical coherence.\n\n3D高斯涂抹（3D-GS）在神经渲染、3D场景重建和新视角合成领域取得了显著进展。然而，当涉及到精确表示物理反射时，特别是在现实世界场景中常见的全反射和半反射情况下，3D-GS面临主要挑战。这一限制导致反射被错误地视为具有物理存在的独立元素，从而导致重建不精确。为了解决这一挑战，我们提出了RefGaussian方法，以从3D-GS中分离反射，实现现实地模拟反射。具体来说，我们提议将场景分为传输和反射两个部分，并使用两个球谐（SH）来表示这些部分。鉴于这种分解并不完全确定，我们采用局部正则化技术确保传输和反射部分的局部平滑性，从而实现比3D-GS更可信的分解结果。实验结果表明，我们的方法在新视角合成和精确深度估计方面实现了优越的成果。此外，它还使得场景编辑应用成为可能，确保了高质量的结果和物理一致性。\n"
  },
  {
    "path": "abs/2406.05897.md",
    "content": "### InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping\n\n3D Gaussians, as a low-level scene representation, typically involve thousands to millions of Gaussians. This makes it difficult to control the scene in ways that reflect the underlying dynamic structure, where the number of independent entities is typically much smaller. In particular, it can be challenging to animate and move objects in the scene, which requires coordination among many Gaussians. To address this issue, we develop a mutual information shaping technique that enforces movement resonance between correlated Gaussians in a motion network. Such correlations can be learned from putative 2D object masks in different views. By approximating the mutual information with the Jacobians of the motions, our method ensures consistent movements of the Gaussians composing different objects under various perturbations. In particular, we develop an efficient contrastive training pipeline with lightweight optimization to shape the motion network, avoiding the need for re-shaping throughout the motion sequence. Notably, our training only touches a small fraction of all Gaussians in the scene yet attains the desired compositional behavior according to the underlying dynamic structure. The proposed technique is evaluated on challenging scenes and demonstrates significant performance improvement in promoting consistent movements and 3D object segmentation while inducing low computation and memory requirements.\n\n3D高斯作为一种低层次的场景表示，通常涉及数千至数百万个高斯。这使得难以控制反映底层动态结构的场景，其中独立实体的数量通常要小得多。特别是，场景中的物体动画化和移动可能具有挑战性，这需要多个高斯之间的协调。为了解决这个问题，我们开发了一种互信息塑形技术，该技术在运动网络中强制使相关高斯之间的移动产生共振。这种相关性可以从不同视图中的假定2D对象遮罩中学习得到。通过用运动的雅可比近似互信息，我们的方法确保在各种干扰下，组成不同对象的高斯能够保持一致的移动。特别是，我们开发了一种高效的对比训练管道和轻量级优化，以塑造运动网络，避免在整个运动序列中重新塑形的需求。值得注意的是，我们的训练只触及场景中很小一部分的高斯，但仍能根据底层动态结构达到期望的组合行为。所提出的技术在挑战性场景中进行了评估，并在促进一致的移动和3D对象分割方面显示出显著的性能改进，同时引起的计算和内存需求较低。\n"
  },
  {
    "path": "abs/2406.06050.md",
    "content": "### Generalizable Human Gaussians from Single-View Image\n\nIn this work, we tackle the task of learning generalizable 3D human Gaussians from a single image. The main challenge for this task is to recover detailed geometry and appearance, especially for the unobserved regions. To this end, we propose single-view generalizable Human Gaussian model (HGM), a diffusion-guided framework for 3D human modeling from a single image. We design a diffusion-based coarse-to-fine pipeline, where the diffusion model is adapted to refine novel-view images rendered from a coarse human Gaussian model. The refined images are then used together with the input image to learn a refined human Gaussian model. Although effective in hallucinating the unobserved views, the approach may generate unrealistic human pose and shapes due to the lack of supervision. We circumvent this problem by further encoding the geometric priors from SMPL model. Specifically, we propagate geometric features from SMPL volume to the predicted Gaussians via sparse convolution and attention mechanism. We validate our approach on publicly available datasets and demonstrate that it significantly surpasses state-of-the-art methods in terms of PSNR and SSIM. Additionally, our method exhibits strong generalization for in-the-wild images.\n\n在这项工作中，我们致力于从单张图片学习可泛化的3D人体高斯模型。这项任务的主要挑战是恢复详细的几何形状和外观，特别是对于未观察到的区域。为此，我们提出了单视图可泛化的人体高斯模型（HGM），这是一个由扩散引导的3D人体建模框架。我们设计了一个基于扩散的从粗到细的管道，其中扩散模型被适应用来细化从粗糙人体高斯模型渲染的新视角图像。然后将细化后的图像与输入图像一起使用，以学习一个细化的人体高斯模型。虽然这种方法在幻想未观察视角时有效，但由于缺乏监督，可能会生成不现实的人体姿态和形状。我们通过进一步编码来自SMPL模型的几何先验来规避这个问题。具体来说，我们通过稀疏卷积和注意力机制将几何特征从SMPL体积传播到预测的高斯模型中。我们在公开可用的数据集上验证了我们的方法，并展示了它在PSNR和SSIM方面显著超越了现有的最先进方法。此外，我们的方法对于野外图像也展现了强大的泛化能力。\n"
  },
  {
    "path": "abs/2406.06216.md",
    "content": "### Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis\n\nVolumetric rendering based methods, like NeRF, excel in HDR view synthesis from RAWimages, especially for nighttime scenes. While, they suffer from long training times and cannot perform real-time rendering due to dense sampling requirements. The advent of 3D Gaussian Splatting (3DGS) enables real-time rendering and faster training. However, implementing RAW image-based view synthesis directly using 3DGS is challenging due to its inherent drawbacks: 1) in nighttime scenes, extremely low SNR leads to poor structure-from-motion (SfM) estimation in distant views; 2) the limited representation capacity of spherical harmonics (SH) function is unsuitable for RAW linear color space; and 3) inaccurate scene structure hampers downstream tasks such as refocusing. To address these issues, we propose LE3D (Lighting Every darkness with 3DGS). Our method proposes Cone Scatter Initialization to enrich the estimation of SfM, and replaces SH with a Color MLP to represent the RAW linear color space. Additionally, we introduce depth distortion and near-far regularizations to improve the accuracy of scene structure for downstream tasks. These designs enable LE3D to perform real-time novel view synthesis, HDR rendering, refocusing, and tone-mapping changes. Compared to previous volumetric rendering based methods, LE3D reduces training time to 1% and improves rendering speed by up to 4,000 times for 2K resolution images in terms of FPS.\n\n基于体积渲染的方法，如NeRF，在从RAW图像合成高动态范围（HDR）视图方面表现出色，特别是在夜间场景中。然而，它们因密集采样要求而训练时间长，无法进行实时渲染。3D高斯涂抹（3DGS）的出现使得实时渲染和更快的训练成为可能。然而，直接使用3DGS实现基于RAW图像的视图合成面临挑战，由于其固有的缺点：1) 在夜间场景中，极低的信噪比导致远视图中的运动结构（SfM）估计质量差；2) 球谐函数（SH）的有限表示能力不适合RAW线性颜色空间；以及3) 不准确的场景结构妨碍了如重新聚焦等下游任务。为解决这些问题，我们提出了LE3D（用3DGS照亮每一个黑暗）。我们的方法提出了锥形散射初始化以丰富SfM的估计，并用颜色MLP替代SH来表示RAW线性颜色空间。此外，我们引入了深度畸变和近远规范化以提高场景结构的准确性，以便于下游任务。这些设计使LE3D能够进行实时的新视角合成、HDR渲染、重新聚焦和色调映射变化。与之前基于体积渲染的方法相比，LE3D将训练时间减少到1%，并将渲染速度提高了高达4000倍，以FPS计，适用于2K分辨率图像。\n"
  },
  {
    "path": "abs/2406.06367.md",
    "content": "### MVGamba: Unify 3D Content Generation as State Space Sequence Modeling\n\nRecent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (\\eg, Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only 0.1× of the model size.\n\n最新的大型3D重建模型（LRMs）能够通过整合多视图扩散模型和可扩展的多视图重建器，在不到一秒的时间内生成高质量的3D内容。当前的工作进一步利用3D高斯涂抹作为3D表示，以提高视觉质量和渲染效率。然而，我们观察到现有的高斯重建模型常常受到多视图不一致和纹理模糊的影响。我们将此归因于为采用功能强大但计算密集的架构（例如，变压器）而妥协多视图信息传播。为解决这一问题，我们引入了MVGamba，这是一个通用且轻量级的高斯重建模型，特点是基于类似RNN的状态空间模型（SSM）的多视图高斯重建器。我们的高斯重建器传播包含多视图信息的因果上下文，用于跨视图自我精炼，同时生成长序列的高斯，用于细节建模，其复杂度为线性。通过集成现成的多视图扩散模型，MVGamba统一了来自单张图片、稀疏图片或文本提示的3D生成任务。广泛的实验表明，MVGamba在所有3D内容生成场景中均超越了最先进的基准，模型大小仅为约0.1倍。\n"
  },
  {
    "path": "abs/2406.06521.md",
    "content": "### PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction\n\nRecently, 3D Gaussian Splatting (3DGS) has attracted widespread attention due to its high-quality rendering, and ultra-fast training and rendering speed. However, due to the unstructured and irregular nature of Gaussian point clouds, it is difficult to guarantee geometric reconstruction accuracy and multi-view consistency simply by relying on image reconstruction loss. Although many studies on surface reconstruction based on 3DGS have emerged recently, the quality of their meshes is generally unsatisfactory. To address this problem, we propose a fast planar-based Gaussian splatting reconstruction representation (PGSR) to achieve high-fidelity surface reconstruction while ensuring high-quality rendering. Specifically, we first introduce an unbiased depth rendering method, which directly renders the distance from the camera origin to the Gaussian plane and the corresponding normal map based on the Gaussian distribution of the point cloud, and divides the two to obtain the unbiased depth. We then introduce single-view geometric, multi-view photometric, and geometric regularization to preserve global geometric accuracy. We also propose a camera exposure compensation model to cope with scenes with large illumination variations. Experiments on indoor and outdoor scenes show that our method achieves fast training and rendering while maintaining high-fidelity rendering and geometric reconstruction, outperforming 3DGS-based and NeRF-based methods.\n\n最近，由于其高质量渲染以及超快的训练和渲染速度，3D高斯涂抹（3DGS）引起了广泛关注。然而，由于高斯点云的非结构化和不规则性，仅仅依赖图像重建损失很难保证几何重建的准确性和多视图一致性。尽管最近出现了许多基于3DGS的表面重建研究，但它们的网格质量通常不令人满意。为了解决这个问题，我们提出了一种快速的基于平面的高斯涂抹重建表示（PGSR），以实现高保真的表面重建，同时确保高质量渲染。具体来说，我们首先引入了一种无偏差深度渲染方法，该方法直接渲染相机原点到高斯平面的距离及其相应的法线图，基于点云的高斯分布，并将两者相除以获取无偏差深度。然后，我们引入单视图几何、多视图光度和几何规则化来保持全局几何精度。我们还提出了一个相机曝光补偿模型，以应对光照变化大的场景。在室内和室外场景的实验表明，我们的方法在保持高保真渲染和几何重建的同时，实现了快速的训练和渲染，超越了基于3DGS和NeRF的方法。\n"
  },
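The unbiased depth described above has a closed form: render the camera-to-plane distance d and the normal map n, then divide by the cosine against each pixel's ray, since a ray x = t·r meets the plane n·x = d at t = d / (n·r). A per-pixel sketch under those definitions (the array layout is an assumption):

```python
import numpy as np

def unbiased_depth(plane_dist, normal, ray_dir):
    """Unbiased depth from PGSR-style rendered maps.

    plane_dist: (H, W) rendered distance from the camera origin to the plane
    normal:     (H, W, 3) rendered unit normals
    ray_dir:    (H, W, 3) unit ray directions through each pixel
    """
    cos = np.einsum('hwc,hwc->hw', normal, ray_dir)
    cos = np.clip(np.abs(cos), 1e-6, None)  # guard grazing angles
    return plane_dist / cos
```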
  {
    "path": "abs/2406.06526.md",
    "content": "### GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation\n\n3D city generation with NeRF-based methods shows promising generation results but is computationally inefficient. Recently 3D Gaussian Splatting (3D-GS) has emerged as a highly efficient alternative for object-level 3D generation. However, adapting 3D-GS from finite-scale 3D objects and humans to infinite-scale 3D cities is non-trivial. Unbounded 3D city generation entails significant storage overhead (out-of-memory issues), arising from the need to expand points to billions, often demanding hundreds of Gigabytes of VRAM for a city scene spanning 10km^2. In this paper, we propose GaussianCity, a generative Gaussian Splatting framework dedicated to efficiently synthesizing unbounded 3D cities with a single feed-forward pass. Our key insights are two-fold: 1) Compact 3D Scene Representation: We introduce BEV-Point as a highly compact intermediate representation, ensuring that the growth in VRAM usage for unbounded scenes remains constant, thus enabling unbounded city generation. 2) Spatial-aware Gaussian Attribute Decoder: We present spatial-aware BEV-Point decoder to produce 3D Gaussian attributes, which leverages Point Serializer to integrate the structural and contextual characteristics of BEV points. Extensive experiments demonstrate that GaussianCity achieves state-of-the-art results in both drone-view and street-view 3D city generation. Notably, compared to CityDreamer, GaussianCity exhibits superior performance with a speedup of 60 times (10.72 FPS v.s. 0.18 FPS).\n\n使用基于NeRF的方法进行3D城市生成虽然展示了有前景的生成结果，但计算效率低下。最近，3D高斯涂抹（3D-GS）作为一种高效的对象级3D生成方法浮现出来。然而，将3D-GS从有限规模的3D对象和人类适应到无限规模的3D城市并非易事。无界3D城市生成涉及显著的存储开销（内存溢出问题），因为需要将点扩展到数十亿，通常需要数百吉字节的VRAM以支持覆盖10平方公里的城市场景。在本文中，我们提出了GaussianCity，一个专用于高效合成无界3D城市的生成高斯涂抹框架，它仅通过一次前馈传递即可完成。我们的关键洞察有两点：1) 紧凑的3D场景表示：我们引入了BEV-Point作为一种高度紧凑的中间表示，确保无界场景中VRAM使用的增长保持恒定，从而实现无界城市生成。2) 空间感知的高斯属性解码器：我们展示了空间感知的BEV-Point解码器，以生成3D高斯属性，该解码器利用点序列化器整合了BEV点的结构和上下文特征。广泛的实验表明，GaussianCity在无人机视角和街景视角的3D城市生成中实现了业界领先的结果。值得注意的是，与CityDreamer相比，GaussianCity表现出更优的性能，速度提升了60倍（10.72 FPS vs. 0.18 FPS）。\n"
  },
  {
    "path": "abs/2406.07329.md",
    "content": "### Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth of Field\n\nRadiance field methods represent the state of the art in reconstructing complex scenes from multi-view photos. However, these reconstructions often suffer from one or both of the following limitations: First, they typically represent scenes in low dynamic range (LDR), which restricts their use to evenly lit environments and hinders immersive viewing experiences. Secondly, their reliance on a pinhole camera model, assuming all scene elements are in focus in the input images, presents practical challenges and complicates refocusing during novel-view synthesis. Addressing these limitations, we present a lightweight method based on 3D Gaussian Splatting that utilizes multi-view LDR images of a scene with varying exposure times, apertures, and focus distances as input to reconstruct a high-dynamic-range (HDR) radiance field. By incorporating analytical convolutions of Gaussians based on a thin-lens camera model as well as a tonemapping module, our reconstructions enable the rendering of HDR content with flexible refocusing capabilities. We demonstrate that our combined treatment of HDR and depth of field facilitates real-time cinematic rendering, outperforming the state of the art.\n\n辐射场方法代表了从多视图照片重构复杂场景的最新技术。然而，这些重构通常会遇到以下一种或两种限制：首先，它们通常以低动态范围（LDR）来表示场景，这限制了它们在均匀照明环境中的使用，并妨碍了沉浸式观看体验。其次，它们依赖于针孔相机模型，假设所有场景元素在输入图像中都处于焦点状态，这在实践中带来了挑战，并且在新视角合成期间复焦变得复杂。为了解决这些限制，我们提出了一种基于3D高斯喷溅的轻量级方法，该方法使用具有不同曝光时间、光圈和焦点距离的多视图LDR图像作为输入，以重构高动态范围（HDR）光辉场。通过结合基于薄镜头相机模型的高斯的分析卷积以及色调映射模块，我们的重构使得HDR内容的渲染具有灵活的复焦能力。我们展示了我们对HDR和景深的综合处理，可以实现实时电影渲染，超越了现有技术水平。\n"
  },
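The "analytical convolutions of Gaussians based on a thin-lens camera model" admit a compact reading: a projected 2D Gaussian convolved with an (approximately Gaussian) circle-of-confusion kernel simply gains that kernel's variance. A sketch using the standard thin-lens formula (the Gaussian approximation of the defocus kernel and all parameter names are assumptions, not the paper's exact derivation):

```python
import numpy as np

def coc_radius(z, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion radius (sensor units) at scene depth z."""
    return 0.5 * aperture * focal_len * abs(z - focus_dist) / (
        z * (focus_dist - focal_len))

def defocused_cov2d(cov2d, z, focus_dist, focal_len, aperture, px_per_unit):
    """Depth of field as an analytic Gaussian convolution: add the defocus
    kernel's variance to the splat's projected 2x2 covariance."""
    r_px = coc_radius(z, focus_dist, focal_len, aperture) * px_per_unit
    return cov2d + (r_px ** 2) * np.eye(2)
```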
  {
    "path": "abs/2406.07499.md",
    "content": "### Trim 3D Gaussian Splatting for Accurate Geometry Representation\n\nIn this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while preserving accurate structures. To achieve this, we analyze the contributions of individual 3D Gaussians and propose a contribution-based trimming strategy to remove the redundant or inaccurate Gaussians. Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details. Therefore the proposed TrimGS maintains relatively small Gaussian scales. In addition, TrimGS is also compatible with the effective geometry regularization strategies in previous arts. When combined with the original 3DGS and the state-of-the-art 2DGS, TrimGS consistently yields more accurate geometry and higher perceptual quality. Our project page is this https URL\n\n在本文中，我们介绍了Trim 3D高斯喷溅（TrimGS），用于从图像重构精确的3D几何形状。之前在3D高斯用于几何重构的研究主要集中在探索强几何正则化。相比之下，我们提出了一种新颖的视角：通过高斯修剪来获取场景的精确3D几何形状，该方法选择性地移除不准确的几何形状，同时保留准确的结构。为此，我们分析了单个3D高斯的贡献，并提出了一种基于贡献的修剪策略，以移除多余或不准确的高斯。此外，我们的实验和理论分析表明，相对较小的高斯尺度是表示和优化复杂细节中一个不可忽视的因素。因此，提出的TrimGS保持了相对较小的高斯尺度。此外，TrimGS也与以前技术中有效的几何正则化策略兼容。当与原始的3DGS和最新的2DGS结合使用时，TrimGS始终能提供更精确的几何形状和更高的感知质量。\n"
  },
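Contribution-based trimming as described reduces to two steps: accumulate how much each Gaussian actually blends into training-view pixels, then cut the low end. A sketch (the weight-gathering interface and the quantile cutoff are assumptions, not the paper's exact criterion):

```python
import numpy as np

def accumulate_contributions(blend_weights, gaussian_ids, n_gaussians):
    """Sum each Gaussian's alpha-blending weights over all rendered pixels
    of all training views, as a proxy for its contribution to the images."""
    return np.bincount(gaussian_ids, weights=blend_weights,
                       minlength=n_gaussians)

def contribution_trim(contrib, trim_quantile=0.2):
    """Boolean keep-mask: drop the lowest-contributing Gaussians."""
    return contrib > np.quantile(contrib, trim_quantile)
```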
  {
    "path": "abs/2406.08300.md",
    "content": "### From Chaos to Clarity: 3DGS in the Dark\n\nNovel view synthesis from raw images provides superior high dynamic range (HDR) information compared to reconstructions from low dynamic range RGB images. However, the inherent noise in unprocessed raw images compromises the accuracy of 3D scene representation. Our study reveals that 3D Gaussian Splatting (3DGS) is particularly susceptible to this noise, leading to numerous elongated Gaussian shapes that overfit the noise, thereby significantly degrading reconstruction quality and reducing inference speed, especially in scenarios with limited views. To address these issues, we introduce a novel self-supervised learning framework designed to reconstruct HDR 3DGS from a limited number of noisy raw images. This framework enhances 3DGS by integrating a noise extractor and employing a noise-robust reconstruction loss that leverages a noise distribution prior. Experimental results show that our method outperforms LDR/HDR 3DGS and previous state-of-the-art (SOTA) self-supervised and supervised pre-trained models in both reconstruction quality and inference speed on the RawNeRF dataset across a broad range of training views.\n\n从原始图像合成新视角提供了比从低动态范围RGB图像重构更优越的高动态范围（HDR）信息。然而，未处理原始图像中固有的噪声却影响了3D场景表示的准确性。我们的研究显示，3D高斯喷溅（3DGS）对这种噪声特别敏感，导致生成了许多过度拟合噪声的拉长高斯形状，从而显著降低了重构质量并减缓了推理速度，尤其是在视图有限的情况下。为了解决这些问题，我们引入了一种新的自监督学习框架，旨在从有限数量的噪声原始图像中重构HDR 3DGS。该框架通过整合噪声提取器并采用一种利用噪声分布先验的抗噪声重构损失来增强3DGS。实验结果显示，我们的方法在RawNeRF数据集上，跨广泛的训练视角，无论是重构质量还是推理速度，都超越了LDR/HDR 3DGS和之前的最新自监督及监督预训练模型。\n"
  },
  {
    "path": "abs/2406.08475.md",
    "content": "### Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models\n\nCreating realistic avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful prior from 2D diffusion models pretrained on large datasets. Although 2D diffusion models demonstrate strong generalization capability, they cannot provide multi-view shape priors with guaranteed 3D consistency. We propose Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion. Our key insight is that 2D multi-view diffusion and 3D reconstruction models provide complementary information for each other, and by coupling them in a tight manner, we can fully leverage the potential of both models. We introduce a novel image-conditioned generative 3D Gaussian Splats reconstruction model that leverages the priors from 2D multi-view diffusion models, and provides an explicit 3D representation, which further guides the 2D reverse sampling process to have better 3D consistency. Experiments show that our proposed framework outperforms state-of-the-art methods and enables the creation of realistic avatars from a single RGB image, achieving high-fidelity in both geometry and appearance. Extensive ablations also validate the efficacy of our design, (1) multi-view 2D priors conditioning in generative 3D reconstruction and (2) consistency refinement of sampling trajectory via the explicit 3D representation.\n\n从单张RGB图像创建逼真的虚拟形象是一个具有吸引力但又极具挑战性的问题。由于这一问题的不适定性，近期的研究利用基于大数据集预训练的2D扩散模型的强大先验。尽管2D扩散模型展示了强大的泛化能力，但它们不能提供具有保证的3D一致性的多视图形状先验。我们提出了Human 3Diffusion：通过显式3D一致扩散创建逼真的虚拟形象。我们的关键见解是，2D多视图扩散和3D重建模型为彼此提供了互补的信息，并且通过紧密地结合它们，我们可以充分利用这两个模型的潜力。我们引入了一种新颖的图像条件生成的3D高斯喷溅重建模型，该模型利用来自2D多视图扩散模型的先验，并提供了一个显式的3D表示，进一步指导2D反向采样过程以提高3D一致性。实验表明，我们提出的框架超越了现有最先进的方法，并使从单张RGB图像创建逼真的虚拟形象成为可能，同时在几何和外观上都达到了高保真度。广泛的消融实验也验证了我们设计的有效性：（1）在生成性3D重建中多视图2D先验的调节；（2）通过显式3D表示对采样轨迹的一致性细化。\n"
  },
  {
    "path": "abs/2406.08488.md",
    "content": "### ICE-G: Image Conditional Editing of 3D Gaussian Splats\n\nRecently many techniques have emerged to create high quality 3D assets and scenes. When it comes to editing of these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views using DINO features. A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner. These edited views act as an updated dataset to further train and re-style the 3D scene. The end-result is therefore an edited 3D model. Our framework enables a wide variety of editing tasks such as manual local edits, correspondence based style transfer from any example image, and a combination of different styles from multiple example images. We use Gaussian Splats as our primary 3D representation due to their speed and ease of local editing, but our technique works for other methods such as NeRFs as well. We show through multiple examples that our method produces higher quality results while offering fine-grained control of editing.\n\n近年来，许多技术涌现出来用于创建高质量的3D资产和场景。然而，在编辑这些对象时，现有的方法要么速度慢，要么在质量上妥协，要么无法提供足够的定制化。我们提出了一种从单一参考视图快速编辑3D模型的新方法。我们的技术首先对编辑图像进行分割，然后使用DINO特征匹配选择的分割数据集视图中语义上对应的区域。然后可以以语义上合理的方式自动将编辑图像的特定区域的颜色或纹理变化应用到其他视图。这些编辑后的视图充当更新的数据集，用于进一步训练和重新风格化3D场景。因此最终结果是一个被编辑的3D模型。我们的框架支持各种编辑任务，如手动局部编辑、基于对应关系的样式转移，以及结合多个示例图像的不同风格。我们使用高斯喷溅作为主要的3D表示，因为它们在速度和局部编辑方面的便利性，但我们的技术也适用于其他方法，如NeRFs。我们通过多个示例展示，我们的方法在提供精细控制编辑方面产生了更高质量的结果。\n"
  },
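The cross-view matching step above can be pictured as nearest-neighbor search between region descriptors. A sketch assuming each segmented region is summarized by its mean DINO feature (the pooling choice is an assumption):

```python
import numpy as np

def match_regions(edit_feats, view_feats):
    """For each segmented region of the edit image, find the semantically
    corresponding region in a dataset view via cosine similarity.

    edit_feats: (R_e, D) mean DINO feature per edit-image region
    view_feats: (R_v, D) mean DINO feature per target-view region
    """
    e = edit_feats / np.linalg.norm(edit_feats, axis=1, keepdims=True)
    v = view_feats / np.linalg.norm(view_feats, axis=1, keepdims=True)
    return np.argmax(e @ v.T, axis=1)   # index of best match per edit region
```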
  {
    "path": "abs/2406.08759.md",
    "content": "### Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling\n\nThe field of novel-view synthesis has recently witnessed the emergence of 3D Gaussian Splatting, which represents scenes in a point-based manner and renders through rasterization. This methodology, in contrast to Radiance Fields that rely on ray tracing, demonstrates superior rendering quality and speed. However, the explicit and unstructured nature of 3D Gaussians poses a significant storage challenge, impeding its broader application. To address this challenge, we introduce the Gaussian-Forest modeling framework, which hierarchically represents a scene as a forest of hybrid 3D Gaussians. Each hybrid Gaussian retains its unique explicit attributes while sharing implicit ones with its sibling Gaussians, thus optimizing parameterization with significantly fewer variables. Moreover, adaptive growth and pruning strategies are designed, ensuring detailed representation in complex regions and a notable reduction in the number of required Gaussians. Extensive experiments demonstrate that Gaussian-Forest not only maintains comparable speed and quality but also achieves a compression rate surpassing 10 times, marking a significant advancement in efficient scene modeling.\n\n近期，新视角合成领域见证了3D高斯喷涂技术的崛起，该技术以点为基础来表示场景，并通过光栅化进行渲染。与依赖光线追踪的辐射场不同，这种方法展示了更优的渲染质量和速度。然而，3D高斯的显式和无结构性质带来了显著的存储挑战，阻碍了其更广泛的应用。为了解决这一挑战，我们引入了高斯森林建模框架，该框架以分层的方式将场景表现为一个混合3D高斯的森林。每个混合高斯保留其独特的显式属性，同时与其兄弟高斯共享隐式属性，从而通过显著减少变量来优化参数化。此外，我们设计了适应性增长和修剪策略，确保在复杂区域的详细表达，并显著减少所需高斯的数量。广泛的实验表明，高斯森林不仅保持了可比的速度和质量，还实现了超过10倍的压缩率，标志着在高效场景建模方面的重大进步。\n"
  },
  {
    "path": "abs/2406.09377.md",
    "content": "### GGHead: Fast and Generalizable 3D Gaussian Heads\n\nLearning 3D head priors from large 2D image collections is an important step towards high-quality 3D-aware human modeling. A core requirement is an efficient architecture that scales well to large-scale datasets and large image resolutions. Unfortunately, existing 3D GANs struggle to scale to generate samples at high resolutions due to their relatively slow train and render speeds, and typically have to rely on 2D superresolution networks at the expense of global 3D consistency. To address these challenges, we propose Generative Gaussian Heads (GGHead), which adopts the recent 3D Gaussian Splatting representation within a 3D GAN framework. To generate a 3D representation, we employ a powerful 2D CNN generator to predict Gaussian attributes in the UV space of a template head mesh. This way, GGHead exploits the regularity of the template's UV layout, substantially facilitating the challenging task of predicting an unstructured set of 3D Gaussians. We further improve the geometric fidelity of the generated 3D representations with a novel total variation loss on rendered UV coordinates. Intuitively, this regularization encourages that neighboring rendered pixels should stem from neighboring Gaussians in the template's UV space. Taken together, our pipeline can efficiently generate 3D heads trained only from single-view 2D image observations. Our proposed framework matches the quality of existing 3D head GANs on FFHQ while being both substantially faster and fully 3D consistent. As a result, we demonstrate real-time generation and rendering of high-quality 3D-consistent heads at 10242 resolution for the first time.\n\n从大规模2D图像集合中学习3D头部先验是向高质量3D感知人体建模迈进的重要一步。核心要求是一种能够适应大规模数据集和高分辨率图像的高效架构。遗憾的是，现有的3D生成对抗网络（GAN）在生成高分辨率样本时难以扩展，因为它们的训练和渲染速度相对较慢，并且通常必须依赖2D超分辨率网络，而牺牲全局3D一致性。为了应对这些挑战，我们提出了生成式高斯头部模型（GGHead），该模型在3D GAN框架内采用了最新的3D高斯喷涂表达。为了生成3D表征，我们采用强大的2D CNN生成器在模板头部网格的UV空间中预测高斯属性。通过这种方式，GGHead利用模板的UV布局的规律性，大大简化了预测无结构3D高斯集合的复杂任务。我们进一步通过在渲染的UV坐标上应用一种新颖的总变差损失来提高生成的3D表征的几何保真度。直观地说，这种规范化鼓励邻近的渲染像素应来自模板UV空间中的邻近高斯。综合考虑，我们的管道能够高效地生成仅从单视图2D图像观察训练的3D头部。我们提出的框架在FFHQ上与现有的3D头部GAN的质量相当，同时在速度和3D一致性上都有显著提升。结果表明，我们首次实现了以1024^2分辨率实时生成和渲染高质量3D一致的头部。\n"
  },
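The total-variation regularizer on rendered UV coordinates is simple to state: penalize differences between horizontally and vertically adjacent pixels of the rendered UV image. A sketch (the tensor layout is an assumption):

```python
import torch

def uv_total_variation(uv):
    """TV penalty on a rendered UV-coordinate image of shape (B, 2, H, W).

    A small TV value means neighboring rendered pixels stem from neighboring
    Gaussians in the template's UV space, as the abstract motivates."""
    dx = (uv[..., :, 1:] - uv[..., :, :-1]).abs().mean()
    dy = (uv[..., 1:, :] - uv[..., :-1, :]).abs().mean()
    return dx + dy
```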
  {
    "path": "abs/2406.09394.md",
    "content": "### WonderWorld: Interactive 3D Scene Generation from a Single Image\n\nWe present WonderWorld, a novel framework for interactive 3D scene extrapolation that enables users to explore and shape virtual environments based on a single input image and user-specified text. While significant improvements have been made to the visual quality of scene generation, existing methods are run offline, taking tens of minutes to hours to generate a scene. By leveraging Fast Gaussian Surfels and a guided diffusion-based depth estimation method, WonderWorld generates geometrically consistent extrapolation while significantly reducing computational time. Our framework generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. We demonstrate the potential of WonderWorld for applications in virtual reality, gaming, and creative design, where users can quickly generate and navigate immersive, potentially infinite virtual worlds from a single image. Our approach represents a significant advancement in interactive 3D scene generation, opening up new possibilities for user-driven content creation and exploration in virtual environments.\n\n我们推出了WonderWorld，这是一个用于交互式3D场景外推的新型框架，使用户能够基于单张输入图像和用户指定的文本探索和塑造虚拟环境。尽管在场景生成的视觉质量上取得了显著提升，但现有方法运行在离线状态，生成一个场景需要花费数十分钟到数小时。通过利用快速高斯表面（Fast Gaussian Surfels）和基于引导扩散的深度估计方法，WonderWorld在大幅减少计算时间的同时，生成几何上一致的场景外推。我们的框架在单个A6000 GPU上不到10秒内生成连通且多样的3D场景，使得用户能够实时互动和探索。我们展示了WonderWorld在虚拟现实、游戏和创意设计等应用中的潜力，用户可以快速生成并导航沉浸式、潜在无限的虚拟世界。我们的方法在交互式3D场景生成方面代表了一个重大进步，为用户驱动的内容创建和虚拟环境中的探索开辟了新的可能性。\n"
  },
  {
    "path": "abs/2406.09395.md",
    "content": "### Modeling Ambient Scene Dynamics for Free-view Synthesis\n\nWe introduce a novel method for dynamic free-view synthesis of an ambient scenes from a monocular capture bringing a immersive quality to the viewing experience. Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes. Previous attempts to extend 3DGS to represent dynamics have been confined to bounded scenes or require multi-camera captures, and often fail to generalize to unseen motions, limiting their practical application. Our approach overcomes these constraints by leveraging the periodicity of ambient motions to learn the motion trajectory model, coupled with careful regularization. We also propose important practical strategies to improve the visual quality of the baseline 3DGS static reconstructions and to improve memory efficiency critical for GPU-memory intensive learning. We demonstrate high-quality photorealistic novel view synthesis of several ambient natural scenes with intricate textures and fine structural elements.\n\n我们提出了一种新颖的方法，用于从单目摄像头捕捉环境场景，实现动态自由视角合成，为观看体验带来沉浸式的质量。我们的方法建立在最新的3D高斯喷涂（3DGS）技术上，该技术能够忠实地重建复杂的静态场景。以往尝试将3DGS扩展到动态表示的努力，通常局限于有界场景或需要多摄像头捕捉，并且常常无法泛化到未见过的运动，限制了它们的实际应用。我们的方法通过利用环境运动的周期性来学习运动轨迹模型，并结合仔细的规范化来克服这些限制。我们还提出了一些重要的实用策略，以提高基线3DGS静态重建的视觉质量和改善对GPU内存密集型学习至关重要的内存效率。我们展示了几种环境自然场景的高质量光真实新视角合成，这些场景具有复杂的纹理和精细的结构元素。\n"
  },
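One concrete way to "leverage the periodicity of ambient motions" is a low-order Fourier trajectory per Gaussian over a shared base period. The sketch below is such a model, not necessarily the paper's parameterization:

```python
import torch

class FourierTrajectory(torch.nn.Module):
    """Periodic per-Gaussian displacement: a truncated Fourier series."""

    def __init__(self, n_gaussians, n_harmonics=4, period=2.0):
        super().__init__()
        self.period = period
        # sin/cos coefficients for xyz: (N, H, 2, 3)
        self.coef = torch.nn.Parameter(
            torch.zeros(n_gaussians, n_harmonics, 2, 3))

    def forward(self, t):
        k = torch.arange(1, self.coef.shape[1] + 1, dtype=torch.float32)
        phase = 2 * torch.pi * k * (t / self.period)                   # (H,)
        basis = torch.stack([torch.sin(phase), torch.cos(phase)], -1)  # (H, 2)
        return torch.einsum('nhcd,hc->nd', self.coef, basis)           # (N, 3)
```

By construction the displacement loops with the chosen period, matching the looping character of ambient motion.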
  {
    "path": "abs/2406.09733.md",
    "content": "### Unified Gaussian Primitives for Scene Representation and Rendering\n\nSearching for a unified scene representation remains a research challenge in computer graphics. Traditional mesh-based representations are unsuitable for dense, fuzzy elements, and introduce additional complexity for filtering and differentiable rendering. Conversely, voxel-based representations struggle to model hard surfaces and suffer from intensive memory requirement. We propose a general-purpose rendering primitive based on 3D Gaussian distribution for unified scene representation, featuring versatile appearance ranging from glossy surfaces to fuzzy elements, as well as physically based scattering to enable accurate global illumination. We formulate the rendering theory for the primitive based on non-exponential transport and derive efficient rendering operations to be compatible with Monte Carlo path tracing. The new representation can be converted from different sources, including meshes and 3D Gaussian splatting, and further refined via transmittance optimization thanks to its differentiability. We demonstrate the versatility of our representation in various rendering applications such as global illumination and appearance editing, while supporting arbitrary lighting conditions by nature. Additionally, we compare our representation to existing volumetric representations, highlighting its efficiency to reproduce details.s\n\n在计算机图形学中，寻找统一的场景表示仍是一个研究挑战。传统的基于网格的表示法不适用于密集、模糊的元素，并且在过滤和可微渲染方面引入了额外的复杂性。相反，基于体素的表示法难以模拟硬表面，并且会遭受密集的内存需求。我们提出了一种基于三维高斯分布的通用渲染原语，用于统一场景表示，具有从光滑表面到模糊元素的多样化外观，以及基于物理的散射，以实现准确的全局照明。我们根据非指数传输公式化了该原语的渲染理论，并推导出与蒙特卡罗路径跟踪兼容的高效渲染操作。这种新的表示法可以从不同来源转换，包括网格和三维高斯平涂，且由于其可微性，可以通过透射优化进一步细化。我们展示了我们的表示在各种渲染应用中的多功能性，如全球照明和外观编辑，同时自然地支持任意光照条件。此外，我们将我们的表示与现有的体积表示进行比较，突出其在复制细节方面的效率。\n"
  },
  {
    "path": "abs/2406.09850.md",
    "content": "### GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion\n\nText-to-3D generation has shown promising results, yet common challenges such as the Multi-face Janus problem and extended generation time for high-quality assets. In this paper, we address these issues by introducing a novel three-stage training pipeline called GradeADreamer. This pipeline is capable of producing high-quality assets with a total generation time of under 30 minutes using only a single RTX 3090 GPU. Our proposed method employs a Multi-view Diffusion Model, MVDream, to generate Gaussian Splats as a prior, followed by refining geometry and texture using StableDiffusion. Experimental results demonstrate that our approach significantly mitigates the Multi-face Janus problem and achieves the highest average user preference ranking compared to previous state-of-the-art methods.\n\n文本到三维生成已展示出有希望的结果，但仍存在一些常见挑战，如多面体问题和高质量资产的延长生成时间。在本文中，我们通过引入一个名为GradeADreamer的新型三阶段训练流水线来解决这些问题。这个流水线能够在仅使用一块RTX 3090 GPU的情况下，在30分钟内生成高质量资产。我们提出的方法采用多视图扩散模型（MVDream）生成高斯平涂作为先验，然后使用StableDiffusion细化几何和纹理。实验结果表明，我们的方法显著减轻了多面体问题，并获得了与以往最先进方法相比的最高平均用户偏好排名。\n"
  },
  {
    "path": "abs/2406.10111.md",
    "content": "### GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors\n\nAchieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data. Previous methods optimize high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering speed. In this work, we base our method on 3D Gaussian Splatting (3DGS) due to its capability of producing high-quality images at a faster rendering speed. To alleviate the shortage of data for higher-resolution synthesis, we propose to leverage off-the-shelf 2D diffusion priors by distilling the 2D knowledge into 3D with Score Distillation Sampling (SDS). Nevertheless, applying SDS directly to Gaussian-based 3D super-resolution leads to undesirable and redundant 3D Gaussian primitives, due to the randomness brought by generative priors. To mitigate this issue, we introduce two simple yet effective techniques to reduce stochastic disturbances introduced by SDS. Specifically, we 1) shrink the range of diffusion timestep in SDS with an annealing strategy; 2) randomly discard redundant Gaussian primitives during densification. Extensive experiments have demonstrated that our proposed GaussainSR can attain high-quality results for HRNVS with only low-resolution inputs on both synthetic and real-world datasets.\n\n从低分辨率输入视图实现高分辨率新视图合成（HRNVS）是一个具有挑战性的任务，因为缺乏高分辨率数据。以往的方法是从低分辨率输入视图优化高分辨率的神经辐射场（NeRF），但受制于渲染速度慢。在这项工作中，我们基于三维高斯平涂（3DGS），因为它能够以更快的渲染速度产生高质量图像。为了缓解高分辨率合成数据的短缺，我们提出利用现成的二维扩散先验通过得分蒸馏采样（SDS）将二维知识蒸馏到三维。然而，直接将SDS应用于基于高斯的三维超分辨率会导致不良和冗余的三维高斯原语，这是由生成先验带来的随机性造成的。为了缓解这一问题，我们引入了两种简单而有效的技术来减少SDS引入的随机干扰。具体来说，我们1）通过退火策略缩小SDS中扩散时间步的范围；2）在密集化过程中随机丢弃冗余的高斯原语。广泛的实验表明，我们提出的GaussainSR可以仅使用低分辨率输入在合成和现实世界数据集上获得HRNVS的高质量结果。\n"
  },
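The two stabilization tricks in the abstract are easy to sketch: anneal the SDS timestep range downward over training, and randomly drop a share of newly densified primitives (the exact schedule and keep rate below are assumptions):

```python
import numpy as np

def annealed_timestep(step, max_steps, t_max=980, t_min=20, rng=None):
    """Shrink the SDS diffusion-timestep range as optimization proceeds,
    so late updates draw low-noise scores and inject less randomness."""
    rng = rng or np.random.default_rng()
    hi = int(t_max - (t_max - t_min) * step / max_steps)
    return int(rng.integers(t_min, max(hi, t_min + 1)))

def random_discard(new_ids, keep_prob=0.5, rng=None):
    """Randomly drop newly densified Gaussians to curb redundant primitives
    introduced by the stochastic generative prior."""
    rng = rng or np.random.default_rng()
    keep = rng.random(len(new_ids)) < keep_prob
    return [g for g, m in zip(new_ids, keep) if m]
```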
  {
    "path": "abs/2406.10219.md",
    "content": "### PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting\n\nRecent advancements in novel view synthesis have enabled real-time rendering speeds and high reconstruction accuracy. 3D Gaussian Splatting (3D-GS), a foundational point-based parametric 3D scene representation, models scenes as large sets of 3D Gaussians. Complex scenes can comprise of millions of Gaussians, amounting to large storage and memory requirements that limit the viability of 3D-GS on devices with limited resources. Current techniques for compressing these pretrained models by pruning Gaussians rely on combining heuristics to determine which ones to remove. In this paper, we propose a principled spatial sensitivity pruning score that outperforms these approaches. It is computed as a second-order approximation of the reconstruction error on the training views with respect to the spatial parameters of each Gaussian. Additionally, we propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model without changing the training pipeline. After pruning 88.44% of the Gaussians, we observe that our PUP 3D-GS pipeline increases the average rendering speed of 3D-GS by 2.65× while retaining more salient foreground information and achieving higher image quality metrics than previous pruning techniques on scenes from the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets.\n\n近期在新视图合成领域的进展已经实现了实时渲染速度和高重建精度。三维高斯平涂（3D-GS）是一种基础的基于点的参数化三维场景表示方法，通过大量的三维高斯模型来表示场景。复杂场景可能包括数百万个高斯，这导致大量的存储和内存需求，限制了3D-GS在资源有限的设备上的可行性。目前压缩这些预训练模型的技术通过剪枝高斯，并依赖组合启发式方法来确定去除哪些高斯。在本文中，我们提出了一种基于原理的空间敏感性剪枝得分，其表现超过了这些方法。该得分作为对训练视图中每个高斯的空间参数的重建误差的二阶近似来计算。此外，我们提出了一个可应用于任何预训练3D-GS模型的多轮剪枝-精化流水线，而无需改变训练流程。在剪枝了88.44%的高斯之后，我们观察到我们的PUP 3D-GS流水线将3D-GS的平均渲染速度提高了2.65倍，同时保留了更多显著的前景信息，并在Mip-NeRF 360、坦克与庙宇以及深度混合数据集的场景中，比以前的剪枝技术实现了更高的图像质量指标。\n"
  },
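The pruning score above is a second-order approximation of reconstruction error with respect to each Gaussian's spatial parameters. One common way to realize such a score is a Fisher-style approximation (a sum of gradient outer products over training views); the sketch below takes that route, with `render_loss_fn` a hypothetical stand-in for rendering plus image loss, not the paper's code:

```python
import torch

def spatial_sensitivity_scores(render_loss_fn, means, scales, views):
    """Per-Gaussian second-order sensitivity of the training loss w.r.t.
    spatial parameters (mean + scale), via gradient outer products.

    means, scales: (N, 3) leaf tensors with requires_grad=True.
    Returns (N,) log-determinant scores; lower = safer to prune.
    """
    n = means.shape[0]
    fisher = torch.zeros(n, 6, 6)
    for view in views:
        means.grad = scales.grad = None
        render_loss_fn(means, scales, view).backward()
        g = torch.cat([means.grad, scales.grad], dim=1)   # (N, 6)
        fisher += g.unsqueeze(2) * g.unsqueeze(1)         # per-Gaussian g g^T
    _, logdet = torch.linalg.slogdet(fisher + 1e-8 * torch.eye(6))
    return logdet
```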
  {
    "path": "abs/2406.10324.md",
    "content": "### L4GM: Large 4D Gaussian Reconstruction Model\n\nWe present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep our L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We showcase that L4GM that is only trained on synthetic data generalizes extremely well on in-the-wild videos, producing high quality animated 3D assets.\n\n我们提出了L4GM，这是第一个4D大型重建模型，能够从单视角视频输入中生成动画对象——仅通过一次前馈传递即可完成，耗时仅一秒。我们成功的关键在于一个新颖的数据集，该数据集包含了来自Objaverse的精心策划和渲染的动画对象多视图视频。该数据集展示了44K个不同的对象，具有110K个动画，这些动画在48个视点中被渲染，产生了总共300M帧的12M个视频。我们保持L4GM的简单性以便于扩展，并直接基于预训练的3D大型重建模型LGM构建，LGM可以从多视图图像输入输出3D高斯椭球体。L4GM从以低帧率采样的视频帧中输出每帧的3D高斯平涂表示，然后将表示上采样到更高的帧率以实现时间平滑性。我们在基础LGM中添加了时间自注意层，帮助它学习时间上的一致性，并使用每个时间步的多视图渲染损失来训练模型。通过训练一个插值模型将表示上采样到更高的帧率，该模型产生中间的3D高斯表示。我们展示了仅在合成数据上训练的L4GM在野外视频上具有极好的泛化能力，能够生成高质量的动画3D资产。\n"
  },
  {
    "path": "abs/2406.10373.md",
    "content": "### Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections\n\nPhotographs captured in unstructured tourist environments frequently exhibit variable appearances and transient occlusions, challenging accurate scene reconstruction and inducing artifacts in novel view synthesis. Although prior approaches have integrated the Neural Radiance Field (NeRF) with additional learnable modules to handle the dynamic appearances and eliminate transient objects, their extensive training demands and slow rendering speeds limit practical deployments. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising alternative to NeRF, offering superior training and inference efficiency along with better rendering quality. This paper presents Wild-GS, an innovative adaptation of 3DGS optimized for unconstrained photo collections while preserving its efficiency benefits. Wild-GS determines the appearance of each 3D Gaussian by their inherent material attributes, global illumination and camera properties per image, and point-level local variance of reflectance. Unlike previous methods that model reference features in image space, Wild-GS explicitly aligns the pixel appearance features to the corresponding local Gaussians by sampling the triplane extracted from the reference image. This novel design effectively transfers the high-frequency detailed appearance of the reference view to 3D space and significantly expedites the training process. Furthermore, 2D visibility maps and depth regularization are leveraged to mitigate the transient effects and constrain the geometry, respectively. Extensive experiments demonstrate that Wild-GS achieves state-of-the-art rendering performance and the highest efficiency in both training and inference among all the existing techniques.\n\n在非结构化旅游环境中拍摄的照片常常表现出变化多端的外观和瞬时遮挡，这对准确的场景重建和新视图合成造成挑战，并可能引发伪影。尽管之前的方法已将神经辐射场（NeRF）与额外的可学习模块结合起来以处理动态外观并消除瞬时物体，但它们广泛的训练需求和慢速的渲染速度限制了实际部署。最近，三维高斯平涂（3DGS）作为NeRF的有希望的替代方案出现，提供了更优的训练和推理效率以及更好的渲染质量。本文介绍了Wild-GS，这是对3DGS的创新适应，专为不受约束的照片集合优化，同时保留了其效率优势。Wild-GS通过每张图片的固有材料属性、全局照明和相机属性以及点级局部反射率变化来确定每个3D高斯的外观。与之前在图像空间建模参考特征的方法不同，Wild-GS通过采样从参考图像提取的三平面，显式地将像素外观特征与对应的局部高斯对齐。这种新颖的设计有效地将参考视图的高频详细外观转移到3D空间，并显著加快了训练过程。此外，还利用了2D可见性图和深度正则化来减轻瞬时效应和约束几何形状。广泛的实验表明，Wild-GS在所有现有技术中实现了最佳的渲染性能和最高的训练及推理效率。\n"
  },
  {
    "path": "abs/2406.10788.md",
    "content": "### Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics\n\nFor robots to robustly understand and interact with the physical world, it is highly beneficial to have a comprehensive representation - modelling geometry, physics, and visual observations - that informs perception, planning, and control algorithms. We propose a novel dual Gaussian-Particle representation that models the physical world while (i) enabling predictive simulation of future states and (ii) allowing online correction from visual observations in a dynamic world. Our representation comprises particles that capture the geometrical aspect of objects in the world and can be used alongside a particle-based physics system to anticipate physically plausible future states. Attached to these particles are 3D Gaussians that render images from any viewpoint through a splatting process thus capturing the visual state. By comparing the predicted and observed images, our approach generates visual forces that correct the particle positions while respecting known physical constraints. By integrating predictive physical modelling with continuous visually-derived corrections, our unified representation reasons about the present and future while synchronizing with reality. Our system runs in realtime at 30Hz using only 3 cameras. We validate our approach on 2D and 3D tracking tasks as well as photometric reconstruction quality.\n\n为了使机器人能够稳健地理解和与物理世界互动，拥有一个全面的表示——模拟几何、物理和视觉观察——对于信息感知、规划和控制算法非常有益。我们提出了一种新颖的双高斯-粒子表示法，该表示法能够模拟物理世界，同时（i）实现未来状态的预测模拟和（ii）允许在动态世界中根据视觉观察进行在线校正。我们的表示包括捕捉世界中对象的几何方面的粒子，并可以与基于粒子的物理系统一起使用，以预测物理上可行的未来状态。这些粒子附加有3D高斯，通过平涂过程从任何视角渲染图像，从而捕捉视觉状态。通过比较预测图像和观察图像，我们的方法生成视觉力，这些视觉力在尊重已知物理约束的同时，校正粒子位置。通过将预测物理建模与连续的视觉导出校正相结合，我们的统一表示法在与现实同步的同时，对当前和未来进行推理。我们的系统仅使用3个摄像头即可实时运行，频率为30Hz。我们在2D和3D跟踪任务以及光度重建质量上验证了我们的方法。\n"
  },
  {
    "path": "abs/2406.11570.md",
    "content": "### Projecting Radiance Fields to Mesh Surfaces\n\nRadiance fields produce high fidelity images with high rendering speed, but are difficult to manipulate. We effectively perform avatar texture transfer across different appearances by combining benefits from radiance fields and mesh surfaces. We represent the source as a radiance field using 3D Gaussian Splatter, then project the Gaussians on the target mesh. Our pipeline consists of Source Preconditioning, Target Vectorization and Texture Projection. The projection completes in 1.12s in a pure CPU compute, compared to baselines techniques of Per Face Texture Projection and Ray Casting (31s, 4.1min). This method lowers the computational requirements, which makes it applicable to a broader range of devices from low-end mobiles to high end computers.\n\n辐射场可以生成高保真度图像并具有高渲染速度，但难以操控。我们通过结合光辉场和网格表面的优势，有效地进行了不同外观间的化身纹理转移。我们使用3D高斯喷洒器将源表示为光辉场，然后将高斯投影到目标网格上。我们的流程包括源预处理、目标矢量化和纹理投影。投影在纯CPU计算中完成，耗时1.12秒，与基准技术的逐面纹理投影和光线投射（31秒、4.1分钟）相比。此方法降低了计算需求，使其适用于从低端移动设备到高端计算机的更广泛的设备范围。\n"
  },
  {
    "path": "abs/2406.11672.md",
    "content": "### Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting\n\n3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its potential, 3DGS encounters challenges, including needle-like artifacts, suboptimal geometries, and inaccurate normals, due to the Gaussians converging into anisotropic Gaussians with one dominant variance. We propose using effective rank analysis to examine the shape statistics of 3D Gaussian primitives, and identify the Gaussians indeed converge into needle-like shapes with the effective rank 1. To address this, we introduce effective rank as a regularization, which constrains the structure of the Gaussians. Our new regularization method enhances normal and geometry reconstruction while reducing needle-like artifacts. The approach can be integrated as an add-on module to other 3DGS variants, improving their quality without compromising visual fidelity.\n\n多视图图像的三维重建是计算机视觉和图形学中的一个基本挑战。最近，三维高斯喷洒（3DGS）已成为一种有前景的技术，能够实现实时渲染和高质量的三维重建。这种方法利用三维高斯表示和基于瓦片的喷洒技术，绕过了昂贵的神经场查询。尽管有潜力，但3DGS面临着挑战，包括针状伪影、次优几何形状和不准确的法线，这是由于高斯聚集成具有一个主导方差的各向异性高斯。我们提议使用有效秩分析来检查三维高斯原始体的形状统计，并确定高斯确实聚集成具有有效秩1的针状形状。为了解决这个问题，我们引入了作为正则化的有效秩，它约束了高斯的结构。我们的新正则化方法在减少针状伪影的同时，增强了法线和几何形状的重建。这种方法可以作为附加模块集成到其他3DGS变体中，提高它们的质量而不损害视觉保真度。\n"
  },
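Effective rank has a standard closed form: the exponential of the entropy of the normalized eigenvalue spectrum. For a 3D Gaussian with per-axis scales s, the covariance eigenvalues are s^2, so a needle-like Gaussian (one dominant axis) scores near 1. A sketch (only the erank definition is standard; the penalty form below is an illustrative assumption, not the paper's exact loss):

```python
import numpy as np

def effective_rank(scales):
    """erank = exp(entropy of normalized covariance eigenvalues), per Gaussian.

    scales: (N, 3) per-axis standard deviations; eigenvalues are scales**2.
    """
    lam = scales ** 2
    p = lam / lam.sum(axis=1, keepdims=True)
    return np.exp(-(p * np.log(p + 1e-12)).sum(axis=1))

def erank_penalty(scales, target=2.0):
    """Illustrative regularizer: push Gaussians away from erank ~ 1 needles."""
    return np.maximum(target - effective_rank(scales), 0.0).mean()
```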
  {
    "path": "abs/2406.11836.md",
    "content": "### RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians\n\nIn this work, we explore the possibility of training high-parameter 3D Gaussian splatting (3DGS) models on large-scale, high-resolution datasets. We design a general model parallel training method for 3DGS, named RetinaGS, which uses a proper rendering equation and can be applied to any scene and arbitrary distribution of Gaussian primitives. It enables us to explore the scaling behavior of 3DGS in terms of primitive numbers and training resolutions that were difficult to explore before and surpass previous state-of-the-art reconstruction quality. We observe a clear positive trend of increasing visual quality when increasing primitive numbers with our method. We also demonstrate the first attempt at training a 3DGS model with more than one billion primitives on the full MatrixCity dataset that attains a promising visual quality.\n\n在这项工作中，我们探索了在大规模、高分辨率数据集上训练高参数三维高斯喷洒（3DGS）模型的可能性。我们设计了一种针对3DGS的通用模型并行训练方法，命名为RetinaGS，该方法使用了恰当的渲染方程，可应用于任何场景和任意分布的高斯基元。它使我们能够探索之前难以探索的3DGS在基元数量和训练分辨率方面的扩展行为，并超越了以往的最先进重建质量。我们观察到使用我们的方法增加基元数量时视觉质量明显提升的正向趋势。我们还展示了首次尝试在全MatrixCity数据集上训练拥有超过十亿基元的3DGS模型，并取得了有希望的视觉质量。\n"
  },
  {
    "path": "abs/2406.12080.md",
    "content": "### A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets\n\nNovel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual quality for very large scenes, while offering an efficient Level-of-Detail (LOD) solution for efficient rendering of distant content with effective level selection and smooth transitions between levels.We introduce a divide-and-conquer approach that allows us to train very large scenes in independent chunks. We consolidate the chunks into a hierarchy that can be optimized to further improve visual quality of Gaussians merged into intermediate nodes. Very large captures typically have sparse coverage of the scene, presenting many challenges to the original 3D Gaussian splatting training method; we adapt and regularize training to account for these issues. We present a complete solution, that enables real-time rendering of very large scenes and can adapt to available resources thanks to our LOD method. We show results for captured scenes with up to tens of thousands of images with a simple and affordable rig, covering trajectories of up to several kilometers and lasting up to one hour.\n\n近年来，新视角合成取得了重大进展，其中三维高斯喷洒提供了卓越的视觉质量、快速训练和实时渲染。然而，训练和渲染所需的资源不可避免地限制了能够以良好视觉质量表示的捕获场景的大小。我们引入了一个三维高斯的层次结构，它保持了非常大型场景的视觉质量，同时提供了一个有效的细节级别（LOD）解决方案，用于高效渲染远处内容，并有效选择级别和平滑过渡各级别。我们引入了一种分而治之的方法，允许我们在独立的块中训练非常大的场景。我们将这些块整合成一个层次结构，可以优化以进一步提高合并到中间节点的高斯的视觉质量。非常大的捕获场景通常具有场景的稀疏覆盖，对原始的三维高斯喷洒训练方法提出了许多挑战；我们调整并规范训练以解决这些问题。我们提供了一个完整的解决方案，使得实时渲染非常大的场景成为可能，并能够根据我们的LOD方法适应可用资源。我们展示了使用一个简单且经济的装置捕获的场景的结果，这些场景包含多达数万张图片，覆盖长达数公里的轨迹，持续时间长达一小时。\n"
  },
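Level selection in such a hierarchy typically projects each node's extent to screen space and takes the coarsest node below a pixel threshold, cross-fading near the boundary for smooth transitions. A sketch under those standard LOD assumptions (not the paper's exact criterion):

```python
import numpy as np

def select_lod(node_extents, cam_dists, tau_px, focal_px):
    """First (coarsest) node on a root-to-leaf path small enough on screen.

    node_extents: (L,) world-space extents, ordered coarse to fine
    cam_dists:    (L,) camera-to-node distances along that path
    """
    proj_px = focal_px * np.asarray(node_extents) / np.maximum(cam_dists, 1e-6)
    ok = np.nonzero(proj_px <= tau_px)[0]
    return int(ok[0]) if ok.size else len(node_extents) - 1

def blend_weight(proj_px, tau_px, band=0.2):
    """Linear cross-fade between a node and its children near the threshold,
    giving smooth level transitions instead of popping."""
    return float(np.clip((tau_px * (1 + band) - proj_px)
                         / (2 * band * tau_px), 0.0, 1.0))
```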
  {
    "path": "abs/2406.12459.md",
    "content": "### HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors\n\nDespite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.\n\n尽管近期在高保真人体重建技术方面取得了进展，但密集捕获图像的要求或耗时的单例优化显著限制了它们在更广泛场景中的应用。为了解决这些问题，我们提出了HumanSplat，它可以从单一输入图像以泛化的方式预测任何人的三维高斯喷洒属性。特别是，HumanSplat包括一个2D多视图扩散模型和一个带有人类结构先验的潜在重建变换器，这些先验巧妙地将几何先验和语义特征整合在一个统一框架中。此外，设计了一个包含人类语义信息的层次化损失，以实现高保真纹理建模并更好地约束估计的多视图。在标准基准和实际场景图像上的全面实验表明，HumanSplat在实现逼真的新视角合成方面超越了现有的最先进方法。\n"
  },
  {
    "path": "abs/2406.13099.md",
    "content": "### Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models\n\nWe present a latent diffusion model over 3D scenes, that can be trained using only 2D image data. To achieve this, we first design an autoencoder that maps multi-view images to 3D Gaussian splats, and simultaneously builds a compressed latent representation of these splats. Then, we train a multi-view diffusion model over the latent space to learn an efficient generative model. This pipeline does not require object masks nor depths, and is suitable for complex scenes with arbitrary camera positions. We conduct careful experiments on two large-scale datasets of complex real-world scenes -- MVImgNet and RealEstate10K. We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, from a single input view, or from sparse input views. It produces diverse and high-quality results while running an order of magnitude faster than non-latent diffusion models and earlier NeRF-based generative models.\n\n我们提出了一种在3D场景上的潜在扩散模型，该模型可以仅使用2D图像数据进行训练。为了实现这一点，我们首先设计了一个自动编码器，将多视角图像映射到3D高斯飞溅，并同时构建这些飞溅的压缩潜在表示。然后，我们在潜在空间上训练一个多视图扩散模型，以学习一个高效的生成模型。这一流程不需要物体掩模或深度，并且适用于具有任意相机位置的复杂场景。我们在两个大规模的复杂真实世界场景数据集——MVImgNet和RealEstate10K上进行了仔细的实验。我们展示了我们的方法能够在短至0.2秒内生成3D场景，无论是从零开始，还是从单个输入视图或稀疏输入视图出发。它在运行速度上比非潜在扩散模型和早期基于NeRF的生成模型快一个数量级，同时产生多样化和高质量的结果。\n"
  },
  {
    "path": "abs/2406.13870.md",
    "content": "### Splatter a Video: Video Gaussian Representation for Versatile Processing\n\nVideo representation is a long-standing problem that is crucial for various down-stream tasks, such as tracking,depth prediction,segmentation,view synthesis,and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a novel explicit 3D representation-video Gaussian representation -- that embeds a video into 3D Gaussians. Our proposed representation models video appearance in a 3D canonical space using explicit Gaussians as proxies and associates each Gaussian with 3D motions for video motion. This approach offers a more intrinsic and explicit representation than layered atlas or volumetric pixel matrices. To obtain such a representation, we distill 2D priors, such as optical flow and depth, from foundation models to regularize learning in this ill-posed setting. Extensive applications demonstrate the versatility of our new video representation. It has been proven effective in numerous video processing tasks, including tracking, consistent video depth and feature refinement, motion and appearance editing, and stereoscopic video generation.\n\n视频表征是一个长期存在的问题，对于各种下游任务至关重要，如跟踪、深度预测、分割、视图合成和编辑。然而，当前的方法要么因缺乏3D结构而难以模拟复杂动作，要么依赖于不适合操作任务的隐式3D表征。为了应对这些挑战，我们引入了一种新的显式3D表征——视频高斯表征，该表征将视频嵌入到3D高斯中。我们提出的表征在3D规范空间中使用显式高斯作为代理来模拟视频外观，并将每个高斯与视频运动的3D运动相关联。这种方法提供了比分层图集或体积像素矩阵更本质和显式的表征。为了获得这种表征，我们从基础模型中提取2D先验，如光流和深度，以规范这种病态设置中的学习。广泛的应用证明了我们新视频表征的多功能性。它在众多视频处理任务中被证明是有效的，包括跟踪、一致的视频深度和特征精炼、运动和外观编辑，以及立体视频生成。\n"
  },
  {
    "path": "abs/2406.14927.md",
    "content": "### Gaussian-Informed Continuum for Physical Property Identification and Simulation\n\nThis paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to deduce implicit shapes during training. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets across different time states. Furthermore, we develop a coarse-to-fine filling strategy to generate the density fields of the object from the Gaussian reconstruction, allowing for the extraction of object continuums along with their surfaces and the integration of Gaussian attributes into these continuums. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations, serving as implicit shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. Additionally, we illustrate the effectiveness of the proposed method through real-world demonstrations, showcasing its practical utility.\n\n本文研究了通过视觉观察估计物理属性（系统识别）的问题。为了在物理属性估计中提供几何感知的引导，我们引入了一种新颖的混合框架，该框架利用3D高斯表征不仅捕捉显式形状，还在训练过程中使模拟的连续体推断隐式形状。我们提出了一个基于运动分解的新动态3D高斯框架，以在不同时间状态下恢复对象为3D高斯点集。此外，我们开发了一种由粗到细的填充策略，以从高斯重建生成对象的密度场，从而允许提取对象连续体及其表面，并将高斯属性整合到这些连续体中。除了提取的对象表面外，高斯信息的连续体还能在模拟中实现对象掩模的渲染，作为物理属性估计的隐式形状引导。广泛的实验评估表明，我们的流程在多个基准和指标上实现了最先进的性能。此外，我们通过现实世界的演示说明了所提方法的有效性，展示了其实际应用价值。\n"
  },
  {
    "path": "abs/2406.14978.md",
    "content": "### E2GS: Event Enhanced Gaussian Splatting\n\nEvent cameras, known for their high dynamic range, absence of motion blur, and low energy usage, have recently found a wide range of applications thanks to these attributes. In the past few years, the field of event-based 3D reconstruction saw remarkable progress, with the Neural Radiance Field (NeRF) based approach demonstrating photorealistic view synthesis results. However, the volume rendering paradigm of NeRF necessitates extensive training and rendering times. In this paper, we introduce Event Enhanced Gaussian Splatting (E2GS), a novel method that incorporates event data into Gaussian Splatting, which has recently made significant advances in the field of novel view synthesis. Our E2GS effectively utilizes both blurry images and event data, significantly improving image deblurring and producing high-quality novel view synthesis. Our comprehensive experiments on both synthetic and real-world datasets demonstrate our E2GS can generate visually appealing renderings while offering faster training and rendering speed (140 FPS).\n\n事件相机以其高动态范围、无运动模糊和低能耗而闻名，最近由于这些特性而在广泛的应用领域中找到了用途。在过去几年中，基于事件的3D重建领域取得了显著进展，其中基于神经辐射场（NeRF）的方法展示了逼真的视图合成结果。然而，NeRF的体积渲染范式需要大量的训练和渲染时间。在本文中，我们介绍了事件增强高斯飞溅（E2GS），这是一种将事件数据融入高斯飞溅的新方法，高斯飞溅最近在新视角合成领域取得了重大进展。我们的E2GS有效地利用了模糊图像和事件数据，显著改善了图像去模糊，并产生了高质量的新视角合成。我们在合成和真实世界数据集上的全面实验表明，我们的E2GS可以生成视觉上吸引人的渲染，同时提供更快的训练和渲染速度（140 FPS）。\n"
  },
  {
    "path": "abs/2406.15149.md",
    "content": "### Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks\n\nSimulators are powerful tools for autonomous robot learning as they offer scalable data generation, flexible design, and optimization of trajectories. However, transferring behavior learned from simulation data into the real world proves to be difficult, usually mitigated with compute-heavy domain randomization methods or further model fine-tuning. We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks. To this end, we first build a simulator by integrating Gaussian Splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks. In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, crafty programming of expert demonstration training data, and the task understanding capabilities of Liquid networks. Through a series of quantitative flight tests, we demonstrate the robust transfer of navigation skills learned in a single simulation scene directly to the real world. We further show the ability to maintain performance beyond the training environment under drastic distribution and physical environment changes. Our learned Liquid policies, trained on single target manoeuvres curated from a photorealistic simulated indoor flight only, generalize to multi-step hikes onboard a real hardware platform outdoors.\n\n模拟器是自主机器人学习的强大工具，因为它们提供了可扩展的数据生成、灵活的设计和轨迹优化。然而，将从模拟数据中学到的行为转移到现实世界往往是困难的，通常通过计算密集的领域随机化方法或进一步的模型微调来缓解。我们提出了一种方法，以改善从模拟到真实视觉四旋翼导航任务中的泛化能力和对分布偏移的鲁棒性。为此，我们首先通过将高斯飞溅与四旋翼飞行动力学整合来构建模拟器，然后使用液态神经网络训练鲁棒的导航策略。通过这种方式，我们获得了一个全栈的模仿学习协议，该协议结合了3D高斯飞溅辐射场渲染的进展、精巧的专家演示训练数据编程和液态网络的任务理解能力。通过一系列定量飞行测试，我们展示了在单一模拟场景中学到的导航技能直接鲁棒地转移到真实世界的能力。我们进一步展示了在剧烈分布和物理环境变化下保持超出训练环境性能的能力。我们的液态政策学习，仅在从逼真的模拟室内飞行中策划的单一目标机动上训练，普遍适用于在真实硬件平台上户外的多步远足。\n"
  },
  {
    "path": "abs/2406.15333.md",
    "content": "### GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation\n\nIn this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. This limits these methods to a low-resolution representation and makes it difficult to scale up to the dense views for better quality. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms to effectively integrate image features into 3D representations. We implement this solution through a two-stage pipeline: initially, a lightweight proposal network generates a sparse set of 3D anchor points from the posed image inputs; subsequently, a specialized reconstruction transformer refines the geometry and retrieves textural details. Extensive experimental results demonstrate that GeoLRM significantly outperforms existing models, especially for dense view inputs. We also demonstrate the practical applicability of our model with 3D generation tasks, showcasing its versatility and potential for broader adoption in real-world applications.\n\n在这项工作中，我们引入了几何感知大型重建模型（GeoLRM），这是一种能够仅使用11GB GPU内存从21张输入图像中预测具有512k高斯的高质量资产的方法。以前的工作忽视了3D结构的固有稀疏性，并没有利用3D与2D图像之间的显式几何关系。这限制了这些方法到低分辨率的表示，并使得难以扩展到密集视图以获得更好的质量。GeoLRM通过整合一种新颖的3D感知变换器结构来解决这些问题，该结构直接处理3D点，并使用可变形交叉注意机制有效地将图像特征整合到3D表示中。我们通过两阶段管道实现了这一解决方案：最初，一个轻量级的提议网络从定位图像输入生成一组稀疏的3D锚点；随后，一个专门的重建变换器细化几何并检索纹理细节。广泛的实验结果表明，GeoLRM显著优于现有模型，特别是对于密集视图输入。我们还展示了我们模型在3D生成任务中的实际应用性，展示了其多功能性和在真实世界应用中更广泛采用的潜力。\n"
  },
  {
    "path": "abs/2406.15643.md",
    "content": "### Taming 3DGS: High-Quality Radiance Fields with Limited Resources\n\n3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering. However, its resource requirements limit its usability. Especially on constrained devices, training performance degrades quickly and often cannot complete due to excessive memory consumption of the model. The method converges with an indefinite number of Gaussians -- many of them redundant -- making rendering unnecessarily slow and preventing its usage in downstream tasks that expect fixed-size inputs. To address these issues, we tackle the challenges of training and rendering 3DGS models on a budget. We use a guided, purely constructive densification process that steers densification toward Gaussians that raise the reconstruction quality. Model size continuously increases in a controlled manner towards an exact budget, using score-based densification of Gaussians with training-time priors that measure their contribution. We further address training speed obstacles: following a careful analysis of 3DGS' original pipeline, we derive faster, numerically equivalent solutions for gradient computation and attribute updates, including an alternative parallelization for efficient backpropagation. We also propose quality-preserving approximations where suitable to reduce training time even further. Taken together, these enhancements yield a robust, scalable solution with reduced training times, lower compute and memory requirements, and high quality. Our evaluation shows that in a budgeted setting, we obtain competitive quality metrics with 3DGS while achieving a 4--5x reduction in both model size and training time. With more generous budgets, our measured quality surpasses theirs. These advances open the door for novel-view synthesis in constrained environments, e.g., mobile devices.\n\n3D 高斯散射（3DGS）通过其快速、可解释和高保真渲染，已转变了新视角合成的方式。然而，其资源需求限制了其可用性。特别是在资源受限的设备上，训练性能迅速下降，常常因模型的过度内存消耗而无法完成训练。该方法使用不定数量的高斯核，许多高斯核是多余的，这使得渲染过程不必要地缓慢，并阻碍了其在需要固定大小输入的下游任务中的使用。为了解决这些问题，我们应对了在预算内训练和渲染 3DGS 模型的挑战。我们使用一个引导性的、纯粹建设性的密集化过程，引导密集化过程向提高重建质量的高斯核倾斜。模型大小在控制中不断增加，向精确预算逼近，使用基于分数的高斯核密集化以及训练时先验来衡量它们的贡献。我们还解决了训练速度的障碍：在对 3DGS 原始流程进行仔细分析后，我们得出了用于梯度计算和属性更新的更快、数值等效的解决方案，包括一个用于高效反向传播的替代并行化方案。我们还提出了适当的保质近似方法，以进一步减少训练时间。综合这些增强功能，我们提供了一个健壮、可扩展的解决方案，具有更短的训练时间、更低的计算和内存需求，以及高质量。我们的评估显示，在有预算的情况下，我们与 3DGS 的质量指标具有竞争力，同时实现了模型大小和训练时间的 4-5 倍减少。在更宽松的预算下，我们的测量质量超过了他们。这些进步为在受限环境中（例如移动设备）的新视角合成打开了大门。\n"
  },
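The "guided, purely constructive densification" toward an exact budget can be pictured as a top-k selection per round: rank candidates by their contribution score and grow only as many as the remaining budget allows. A sketch (the score source and growth rate are assumptions):

```python
import numpy as np

def budgeted_densify(scores, n_current, budget, growth_per_round):
    """Indices of Gaussians to clone/split this round, never exceeding budget.

    scores: (N,) per-Gaussian contribution scores from training-time priors.
    """
    n_new = min(growth_per_round, budget - n_current)
    if n_new <= 0:
        return np.array([], dtype=int)
    return np.argsort(scores)[::-1][:n_new]   # highest-scoring first
```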
  {
    "path": "abs/2406.16073.md",
    "content": "### LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction\n\nThe advent of 3D Gaussian Splatting (3D-GS) techniques and their dynamic scene modeling variants, 4D-GS, offers promising prospects for real-time rendering of dynamic surgical scenarios. However, the prerequisite for modeling dynamic scenes by a large number of Gaussian units, the high-dimensional Gaussian attributes and the high-resolution deformation fields, all lead to serve storage issues that hinder real-time rendering in resource-limited surgical equipment. To surmount these limitations, we introduce a Lightweight 4D Gaussian Splatting framework (LGS) that can liberate the efficiency bottlenecks of both rendering and storage for dynamic endoscopic reconstruction. Specifically, to minimize the redundancy of Gaussian quantities, we propose Deformation-Aware Pruning by gauging the impact of each Gaussian on deformation. Concurrently, to reduce the redundancy of Gaussian attributes, we simplify the representation of textures and lighting in non-crucial areas by pruning the dimensions of Gaussian attributes. We further resolve the feature field redundancy caused by the high resolution of 4D neural spatiotemporal encoder for modeling dynamic scenes via a 4D feature field condensation. Experiments on public benchmarks demonstrate efficacy of LGS in terms of a compression rate exceeding 9 times while maintaining the pleasing visual quality and real-time rendering efficiency. LGS confirms a substantial step towards its application in robotic surgical services.\n\n3D 高斯散射（3D-GS）技术及其动态场景建模变体，4D-GS，为动态手术场景的实时渲染提供了有前景的可能性。然而，为了模拟动态场景而需要大量高斯单元、高维高斯属性和高分辨率变形场，这些都导致了严重的存储问题，阻碍了资源有限的手术设备中的实时渲染。为了克服这些限制，我们引入了一个轻量级4D高斯散射框架（LGS），该框架可以解放动态内窥镜重建的渲染和存储效率瓶颈。具体来说，为了最小化高斯数量的冗余，我们提出了基于变形感知的剪枝，通过评估每个高斯对变形的影响来实施。同时，为了减少高斯属性的冗余，我们通过剪减高斯属性的维度，简化了非关键区域的纹理和光照的表达。我们进一步通过4D神经时空编码器的高分辨率特征场来解决特征场冗余问题，采用了4D特征场的压缩。在公共基准测试上的实验表明，LGS在保持令人满意的视觉质量和实时渲染效率的同时，压缩率超过9倍。LGS确认了其在机器人手术服务中应用的重要进步。\n"
  },
  {
    "path": "abs/2406.16695.md",
    "content": "### Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling\n\nScore distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into 3D representation, has recently brought significant advancements in text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may be induced by multiview inconsistencies between 2D scores predicted from various viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency and therefore geometry awareness into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution, geometry-based gradient warping for identifying correspondences between predicted gradients of different viewpoints, and novel gradient consistency loss to optimize the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing the geometric inconsistency problems in text-to-3D generation task with minimal computation cost and being compatible with existing score distillation-based models.\n\n得分提取采样（SDS）方法，即将预训练的2D扩散模型的得分提炼到3D表示中，最近在文本到3D生成任务中带来了显著的进步。然而，这种方法仍面临着严重的几何不一致问题，例如Janus问题。出发点是这种不一致问题可能由从不同视角预测的2D得分之间的多视图不一致引起，我们引入了GSD，一个简单且通用的即插即用框架，用于将3D一致性和几何意识融入SDS过程。我们的方法由三个组成部分构成：设计用于产生完美遵循标准高斯分布的3D一致噪声图的3D一致噪声化，基于几何的梯度变形用于识别不同视角预测梯度之间的对应关系，以及优化场景几何以产生更一致梯度的新颖梯度一致性损失。我们证明了我们的方法显著提高了性能，成功地解决了文本到3D生成任务中的几何不一致问题，计算成本最小，并且与现有的基于得分提取的模型兼容。\n"
  },
  {
    "path": "abs/2406.16815.md",
    "content": "### ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians\n\nHigh-fidelity 3D garment synthesis from text is desirable yet challenging for digital avatar creation. Recent diffusion-based approaches via Score Distillation Sampling (SDS) have enabled new possibilities but either intricately couple with human body or struggle to reuse. We introduce ClotheDreamer, a 3D Gaussian-based method for generating wearable, production-ready 3D garment assets from text prompts. We propose a novel representation Disentangled Clothe Gaussian Splatting (DCGS) to enable separate optimization. DCGS represents clothed avatar as one Gaussian model but freezes body Gaussian splats. To enhance quality and completeness, we incorporate bidirectional SDS to supervise clothed avatar and garment RGBD renderings respectively with pose conditions and propose a new pruning strategy for loose clothing. Our approach can also support custom clothing templates as input. Benefiting from our design, the synthetic 3D garment can be easily applied to virtual try-on and support physically accurate animation. Extensive experiments showcase our method's superior and competitive performance.\n\n高保真的3D服装从文本合成对数字化化身创建而言是理想但具挑战性的。最近基于扩散的方法通过得分提取采样（SDS）开辟了新的可能性，但这些方法要么与人体紧密耦合，要么难以重用。我们引入了ClotheDreamer，一种基于3D高斯的方法，用于从文本提示生成可穿戴、生产就绪的3D服装资产。我们提出了一种新的表示方法——解耦高斯服装散射（DCGS），以实现单独优化。DCGS将穿着服装的化身表示为一个高斯模型，但冻结了身体高斯散射。为了提高质量和完整性，我们结合使用双向SDS分别监督穿着服装的化身和服装的RGBD渲染，带有姿态条件，并提出了一种新的修剪策略用于宽松服装。我们的方法还可以支持自定义服装模板作为输入。得益于我们的设计，合成的3D服装可以轻松应用于虚拟试穿，并支持物理精确的动画。广泛的实验展示了我们方法的优越和竞争性能。\n"
  },
  {
    "path": "abs/2406.17074.md",
    "content": "### Reducing the Memory Footprint of 3D Gaussian Splatting\n\n3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and real-time rendering; unfortunately, the memory requirements of this method for storing and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the number of coefficients for the spherical harmonics used to represent directional radiance, and the precision required to store Gaussian primitive attributes. We present a solution to each of these issues. First, we propose an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half. Second, we introduce an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive, and finally a codebook-based quantization method, together with a half-float representation for further memory reduction. Taken together, these three components result in a 27 reduction in overall size on disk on the standard datasets we tested, along with a 1.7 speedup in rendering speed. We demonstrate our method on standard datasets and show how our solution results in significantly reduced download times when using the method on a mobile device.\n\n3D高斯散射为新视角合成提供了出色的视觉质量，具有快速训练和实时渲染的特点；不幸的是，该方法在存储和传输上的内存需求异常高。我们首先分析了这一问题的原因，确定了可以减少存储的三个主要领域：用于表示场景的3D高斯原始体的数量、用于表示方向性辐射的球谐函数的系数数量，以及存储高斯原始体属性所需的精度。我们针对这些问题提出了解决方案。首先，我们提出了一种高效的、分辨率感知的原始体修剪方法，将原始体数量减少了一半。其次，我们引入了一种自适应调整方法，用于选择表示每个高斯原始体方向性辐射的系数数量，最后是基于码本的量化方法，结合半浮点表示法以进一步减少内存。这三个组件结合在一起，导致我们在测试的标准数据集上的总体磁盘大小减少了27倍，渲染速度提高了1.7倍。我们在标准数据集上演示了我们的方法，并展示了我们的解决方案如何显著减少在移动设备上使用该方法时的下载时间。\n"
  },
  {
    "path": "abs/2406.17601.md",
    "content": "### Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text\n\nRecent advancements in 3D generation have leveraged synthetic datasets with ground truth 3D assets and predefined cameras. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes, remains largely unexplored. In this work, we delve into the key challenge of the complex and scene-specific camera trajectories found in real-world captures. We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories. To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions. (2) Next, a Gaussian-driven Multi-view Latent Diffusion Model serves as the Decorator, modeling the image sequence distribution given the camera trajectories and texts. This model, fine-tuned from a 2D diffusion model, directly generates pixel-aligned 3D Gaussians as an immediate 3D scene representation for consistent denoising. (3) Lastly, the 3D Gaussians are refined by a novel SDS++ loss as the Detailer, which incorporates the prior of the 2D diffusion model. Extensive experiments demonstrate that Director3D outperforms existing methods, offering superior performance in real-world 3D generation.\n\n近期在3D生成领域的进展利用了带有真实3D资产和预定义相机的合成数据集。然而，采用真实世界数据集的潜力，这些数据集可以生成更为现实的3D场景，到目前为止还大部分未被探索。在这项工作中，我们深入探讨了真实世界捕捉中发现的复杂且特定于场景的相机轨迹这一关键挑战。我们引入了Director3D，一个健壮的开放世界文本到3D生成框架，旨在生成真实世界的3D场景和自适应相机轨迹。为了实现这一目标，（1）我们首先利用轨迹扩散变压器，作为摄影师，根据文本描述来模拟相机轨迹的分布。（2）接下来，一个由高斯驱动的多视角潜在扩散模型充当装饰者，根据相机轨迹和文本模拟图像序列分布。该模型从2D扩散模型中微调而来，直接生成与像素对齐的3D高斯作为立即的3D场景表示，以实现一致的去噪。（3）最后，3D高斯由一种新颖的SDS++损失精炼，作为细节师，该损失结合了2D扩散模型的先验知识。广泛的实验表明，Director3D优于现有方法，为真实世界的3D生成提供了卓越的性能。\n"
  },
  {
    "path": "abs/2406.18198.md",
    "content": "### VDG: Vision-Only Dynamic Gaussian for Driving Simulation\n\nDynamic Gaussian splatting has led to impressive scene reconstruction and image synthesis advances in novel views. Existing methods, however, heavily rely on pre-computed poses and Gaussian initialization by Structure from Motion (SfM) algorithms or expensive sensors. For the first time, this paper addresses this issue by integrating self-supervised VO into our pose-free dynamic Gaussian method (VDG) to boost pose and depth initialization and static-dynamic decomposition. Moreover, VDG can work with only RGB image input and construct dynamic scenes at a faster speed and larger scenes compared with the pose-free dynamic view-synthesis method. We demonstrate the robustness of our approach via extensive quantitative and qualitative experiments. Our results show favorable performance over the state-of-the-art dynamic view synthesis methods.\n\n动态高斯散射已经在新视角的场景重建和图像合成中取得了显著的进展。然而，现有方法严重依赖于通过运动结构（SfM）算法或昂贵传感器预先计算的姿态和高斯初始化。本文首次通过将自监督的视觉里程计（VO）集成到我们的无姿态动态高斯方法（VDG）中，来解决这一问题，以提升姿态和深度初始化以及静态-动态分解。此外，VDG只需使用RGB图像输入，就可以比无姿态动态视角合成方法更快地构建动态场景，并处理更大的场景。我们通过广泛的定量和定性实验展示了我们方法的稳健性。我们的结果显示，在动态视角合成方法中，性能优于现有的最先进技术。\n"
  },
  {
    "path": "abs/2406.18199.md",
    "content": "### GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting\n\nThe 3D Gaussian Splatting technique has significantly advanced the construction of radiance fields from multi-view images, enabling real-time rendering. While point-based rasterization effectively reduces computational demands for rendering, it often struggles to accurately reconstruct the geometry of the target object, especially under strong lighting. To address this challenge, we introduce a novel approach that combines octree-based implicit surface representations with Gaussian splatting. Our method consists of four stages. Initially, it reconstructs a signed distance field (SDF) and a radiance field through volume rendering, encoding them in a low-resolution octree. The initial SDF represents the coarse geometry of the target object. Subsequently, it introduces 3D Gaussians as additional degrees of freedom, which are guided by the SDF. In the third stage, the optimized Gaussians further improve the accuracy of the SDF, allowing it to recover finer geometric details compared to the initial SDF obtained in the first stage. Finally, it adopts the refined SDF to further optimize the 3D Gaussians via splatting, eliminating those that contribute little to visual appearance. Experimental results show that our method, which leverages the distribution of 3D Gaussians with SDFs, reconstructs more accurate geometry, particularly in images with specular highlights caused by strong lighting.\n\n3D高斯散射技术在多视角图像构建辐射场方面取得了重大进展，使实时渲染成为可能。尽管基于点的光栅化有效地降低了渲染的计算需求，但它经常难以准确重建目标对象的几何形状，特别是在强照明下。为了应对这一挑战，我们引入了一种将基于八叉树的隐式表面表示与高斯散射结合的新方法。我们的方法包括四个阶段。起初，它通过体积渲染重建了一个符号距离场（SDF）和一个辐射场，并将它们编码在一个低分辨率的八叉树中。初始SDF代表了目标对象的粗糙几何形状。随后，它引入了3D高斯作为额外的自由度，这些高斯由SDF指导。在第三阶段，优化后的高斯进一步提高了SDF的准确性，使其能够恢复比第一阶段获得的初始SDF更细致的几何细节。最后，它采用了精炼的SDF来进一步优化3D高斯散射，淘汰那些对视觉外观贡献甚微的高斯。实验结果表明，我们的方法利用3D高斯与SDF的分布，特别是在有强照明引起的镜面高光的图像中，能够重建更精确的几何形状。\n"
  },
  {
    "path": "abs/2406.18214.md",
    "content": "### Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning\n\nIn recent times, the utilization of 3D models has gained traction, owing to the capacity for end-to-end training initially offered by Neural Radiance Fields and more recently by 3D Gaussian Splatting (3DGS) models. The latter holds a significant advantage by inherently easing rapid convergence during training and offering extensive editability. However, despite rapid advancements, the literature still lives in its infancy regarding the scalability of these models. In this study, we take some initial steps in addressing this gap, showing an approach that enables both the memory and computational scalability of such models. Specifically, we propose \"Trimming the fat\", a post-hoc gradient-informed iterative pruning technique to eliminate redundant information encoded in the model. Our experimental findings on widely acknowledged benchmarks attest to the effectiveness of our approach, revealing that up to 75% of the Gaussians can be removed while maintaining or even improving upon baseline performance. Our approach achieves around 50× compression while preserving performance similar to the baseline model, and is able to speed-up computation up to 600~FPS.\n\n近期，由于神经辐射场（NeRF）和更近期的3D高斯散射（3DGS）模型提供的端到端训练能力，3D模型的使用已经变得越来越受欢迎。后者在训练期间能够快速收敛，并提供广泛的可编辑性，因此具有显著的优势。然而，尽管取得了迅速的进展，现有文献对这些模型的可扩展性研究仍处于初期阶段。在这项研究中，我们采取了一些初步措施来解决这一差距，展示了一种既能扩展内存也能扩展计算能力的模型方法。具体来说，我们提出了“剪除冗余”，这是一种事后基于梯度的迭代修剪技术，用于消除模型中编码的冗余信息。我们在广泛认可的基准测试上的实验结果证明了我们方法的有效性，显示出多达75%的高斯可以被移除，同时保持或甚至提升基线性能。我们的方法实现了大约50倍的压缩，同时保持与基线模型类似的性能，并能将计算速度提高到600 FPS。\n"
  },
  {
    "path": "abs/2406.18462.md",
    "content": "### GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality\n\nRecently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without control as the generation process may cause indeterminacy. Aiming at highly enhancing the generation quality, we propose a novel framework named GaussianDreamerPro. The main idea is to bind Gaussians to reasonable geometry, which evolves over the whole generation process. Along different stages of our framework, both the geometry and appearance can be enriched progressively. The final output asset is constructed with 3D Gaussians bound to mesh, which shows significantly enhanced details and quality compared with previous methods. Notably, the generated asset can also be seamlessly integrated into downstream manipulation pipelines, e.g. animation, composition, and simulation etc., greatly promoting its potential in wide applications.\n\n最近，3D高斯散射（3D-GS）在重建和渲染真实世界场景方面取得了巨大成功。为了将高质量渲染转移到生成任务中，一系列研究工作尝试从文本生成3D高斯资产。然而，生成的资产并未达到重建任务中的相同质量。我们观察到，在生成过程中，由于可能引起的不确定性，高斯的增长往往无法控制。为了极大地提高生成质量，我们提出了一个名为GaussianDreamerPro的新框架。主要思想是将高斯绑定到整个生成过程中逐步演化的合理几何结构上。在我们框架的不同阶段，几何形状和外观都可以逐步丰富。最终输出的资产是与网格绑定的3D高斯构建的，与以前的方法相比，细节和质量显著提高。值得注意的是，生成的资产还可以无缝地集成到下游操作流程中，例如动画、合成和模拟等，极大地推动了其在广泛应用中的潜力。\n"
  },
  {
    "path": "abs/2406.18533.md",
    "content": "### On Scaling Up 3D Gaussian Splatting Training\n\n3D Gaussian Splatting (3DGS) is increasingly popular for 3D reconstruction due to its superior visual quality and rendering speed. However, 3DGS training currently occurs on a single GPU, limiting its ability to handle high-resolution and large-scale 3D reconstruction tasks due to memory constraints. We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. As each Gaussian affects a small, dynamic subset of rendered pixels, Grendel employs sparse all-to-all communication to transfer the necessary Gaussians to pixel partitions and performs dynamic load balancing. Unlike existing 3DGS systems that train using one camera view image at a time, Grendel supports batched training with multiple views. We explore various optimization hyperparameter scaling strategies and find that a simple sqrt(batch size) scaling rule is highly effective. Evaluations using large-scale, high-resolution scenes show that Grendel enhances rendering quality by scaling up 3DGS parameters across multiple GPUs. On the Rubble dataset, we achieve a test PSNR of 27.28 by distributing 40.4 million Gaussians across 16 GPUs, compared to a PSNR of 26.28 using 11.2 million Gaussians on a single GPU.\n\n3D高斯散射（3DGS）由于其卓越的视觉质量和渲染速度，越来越受到3D重建的青睐。然而，目前3DGS的训练仅在单个GPU上进行，由于内存限制，这限制了其处理高分辨率和大规模3D重建任务的能力。我们引入了Grendel，这是一个分布式系统，旨在将3DGS参数进行分区并跨多个GPU并行计算。由于每个高斯只影响一小部分动态变化的渲染像素，Grendel采用稀疏全互联通信来传输必要的高斯到像素分区，并执行动态负载平衡。与现有的3DGS系统不同，这些系统一次只训练一个相机视图图像，Grendel支持使用多个视图的批量训练。我们探索了各种优化超参数缩放策略，并发现简单的sqrt(批量大小)缩放规则非常有效。在使用大规模高分辨率场景的评估中显示，Grendel通过在多个GPU上扩展3DGS参数，提高了渲染质量。在Rubble数据集上，我们通过在16个GPU上分布4040万个高斯达到了27.28的测试PSNR，相比之下，在单个GPU上使用1120万个高斯的PSNR为26.28。\n"
  },
  {
    "path": "abs/2406.18544.md",
    "content": "### GS-ROR: 3D Gaussian Splatting for Reflective Object Relighting via SDF Priors\n\n3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometries. Inspired by previous works, the signed distance field (SDF) can serve as an effective way for geometry regularization. However, a direct incorporation between Gaussians and SDF significantly slows training. To this end, we propose GS-ROR for reflective objects relighting with 3DGS aided by SDF priors. At the core of our method is the mutual supervision of the depth and normal between deferred Gaussians and SDF, which avoids the expensive volume rendering of SDF. Thanks to this mutual supervision, the learned deferred Gaussians are well-constrained with a minimal time cost. As the Gaussians are rendered in a deferred shading mode, while the alpha-blended Gaussians are smooth, individual Gaussians may still be outliers, yielding floater artifacts. Therefore, we further introduce an SDF-aware pruning strategy to remove Gaussian outliers, which are located distant from the surface defined by SDF, avoiding the floater issue. Consequently, our method outperforms the existing Gaussian-based inverse rendering methods in terms of relighting quality. Our method also exhibits competitive relighting quality compared to NeRF-based methods with at most 25% of training time and allows rendering at 200+ frames per second on an RTX4090.\n\n3D高斯散射（3DGS）由于其详细的表达能力和高效的渲染速度，在新视角合成中显示出强大的能力。不幸的是，使用3DGS创建可重新照明的3D资产仍然存在问题，尤其是对于反射性物体，因为其不连续的表示形式在约束几何形状方面带来了困难。受先前工作的启发，符号距离场（SDF）可以作为几何规范化的有效方式。然而，高斯与SDF的直接结合显著降低了训练速度。为此，我们提出了GS-ROR方法，用于借助SDF先验对反射性物体进行重新照明处理，结合3DGS使用。我们方法的核心是延迟高斯与SDF之间的深度和法线的相互监督，这避免了SDF的昂贵体积渲染。由于这种相互监督，学习到的延迟高斯受到良好的约束，同时最小化了时间成本。由于高斯在延迟着色模式下渲染，虽然alpha混合的高斯是平滑的，但个别高斯仍可能是异常值，产生浮动伪影。因此，我们进一步引入了一种意识到SDF的修剪策略，以移除远离由SDF定义的表面的高斯异常值，避免了浮动问题。因此，我们的方法在重新照明质量方面超过了现有的基于高斯的逆渲染方法。我们的方法与基于NeRF的方法相比也展示了有竞争力的重新照明质量，训练时间最多只有25%，并且在RTX4090上可以以200+帧每秒的速度渲染。\n"
  },
  {
    "path": "abs/2406.18717.md",
    "content": "### Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos\n\nGaussian splatting has become a popular representation for novel-view synthesis, exhibiting clear strengths in efficiency, photometric quality, and compositional edibility. Following its success, many works have extended Gaussians to 4D, showing that dynamic Gaussians maintain these benefits while also tracking scene geometry far better than alternative representations. Yet, these methods assume dense multi-view videos as supervision, constraining their use to controlled capture settings. In this work, we extend the capability of Gaussian scene representations to casually captured monocular videos. We show that existing 4D Gaussian methods dramatically fail in this setup because the monocular setting is underconstrained. Building off this finding, we propose Dynamic Gaussian Marbles (DGMarbles), consisting of three core modifications that target the difficulties of the monocular setting. First, DGMarbles uses isotropic Gaussian \"marbles\", reducing the degrees of freedom of each Gaussian, and constraining the optimization to focus on motion and appearance over local shape. Second, DGMarbles employs a hierarchical divide-and-conquer learning strategy to guide the optimization towards solutions with coherent motion. Finally, DGMarbles adds image-level and geometry-level priors into the optimization, including a tracking loss that takes advantage of recent progress in point tracking. By constraining the optimization in these ways, DGMarbles learns Gaussian trajectories that enable novel-view rendering and accurately capture the 3D motion of the scene elements. We evaluate on the (monocular) Nvidia Dynamic Scenes dataset and the Dycheck iPhone dataset, and show that DGMarbles significantly outperforms other Gaussian baselines in quality, and is on-par with non-Gaussian representations, all while maintaining the efficiency, compositionality, editability, and tracking benefits of Gaussians.\n\n高斯散射已成为新视角合成的流行表示方法，展示了其在效率、光度质量和组合可编辑性方面的明显优势。继其成功之后，许多工作将高斯扩展到了4D，表明动态高斯在保持这些优势的同时，还能比其他替代表示方法更好地追踪场景几何。然而，这些方法假设使用密集的多视角视频作为监督，限制了它们在受控捕捉环境中的使用。在本工作中，我们将高斯场景表示的能力扩展到随意捕获的单目视频。我们展示了现有的4D高斯方法在这种设置中戏剧性地失败，因为单目设置是欠约束的。基于这一发现，我们提出了动态高斯弹珠（DGMarbles），包括针对单目设置困难的三个核心修改。首先，DGMarbles使用各向同性的高斯“弹珠”，减少了每个高斯的自由度，并将优化限制在关注运动和外观而非局部形状上。其次，DGMarbles采用了一个层次化的分而治之学习策略，引导优化朝向具有连贯运动的解决方案。最后，DGMarbles在优化中加入了图像级和几何级先验，包括利用最近在点追踪方面的进展的追踪损失。通过这些方式约束优化，DGMarbles学习到的高斯轨迹使得新视角渲染成为可能，并准确捕捉了场景元素的3D运动。我们在（单目的）Nvidia Dynamic Scenes数据集和Dycheck iPhone数据集上进行评估，并显示DGMarbles在质量上显著优于其他高斯基线，并与非高斯表示相当，同时保持了高斯的效率、组合性、可编辑性和追踪优势。\n"
  },
  {
    "path": "abs/2406.19070.md",
    "content": "### FAGhead: Fully Animate Gaussian Head from Monocular Videos\n\nHigh-fidelity reconstruction of 3D human avatars has a wild application in visual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicit the traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct with complex expressions. Furthermore, we employ a novel Point-based Learnable Representation Field (PLRF) with learnable Gaussian point positions to enhance reconstruction performance. Meanwhile, to effectively manage the edges of avatars, we introduced the alpha rendering to supervise the alpha value of each pixel. Extensive experimental results on the open-source datasets and our capturing datasets demonstrate that our approach is able to generate high-fidelity 3D head avatars and fully control the expression and pose of the virtual avatars, which is outperforming than existing works.\n\n高保真重建3D人类化身在虚拟现实中有广泛的应用。在本文中，我们介绍了FAGhead方法，该方法能从单目视频生成完全可控的人类肖像。我们明确了传统的3D可变形网格（3DMM），并优化了中性3D高斯来重建复杂的表情。此外，我们采用了一种新颖的基于点的可学习表示场（PLRF），其中包含可学习的高斯点位置，以增强重建性能。同时，为了有效管理化身的边缘，我们引入了alpha渲染来监督每个像素的alpha值。在开源数据集和我们的捕获数据集上的广泛实验结果表明，我们的方法能够生成高保真的3D头部化身，并完全控制虚拟化身的表情和姿势，性能优于现有的工作。\n"
  },
  {
    "path": "abs/2406.19434.md",
    "content": "### Lightweight Predictive 3D Gaussian Splats\n\nRecent approaches representing 3D objects and scenes using Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. To represent large-scale scenes, one often needs to store millions of 3D Gaussians, occupying gigabytes of disk space. This poses a very practical limitation, prohibiting widespread adoption.Several solutions have been proposed to strike a balance between disk size and rendering quality, noticeably reducing the visual quality. In this work, we propose a new representation that dramatically reduces the hard drive footprint while featuring similar or improved quality when compared to the standard 3D Gaussian splats. When compared to other compact solutions, ours offers higher quality renderings with significantly reduced storage, being able to efficiently run on a mobile device in real-time. Our key observation is that nearby points in the scene can share similar representations. Hence, only a small ratio of 3D points needs to be stored. We introduce an approach to identify such points which are called parent points. The discarded points called children points along with attributes can be efficiently predicted by tiny MLPs.\n\n近年来，使用高斯斑点表示3D对象和场景的方法在各种平台和设备上显示出了更快的渲染速度。尽管渲染这种表示确实非常高效，但存储和传输却往往代价高昂。为了表示大规模场景，通常需要存储数百万个3D高斯函数，占据几十GB的磁盘空间。这对实际应用构成了严重限制，阻碍了广泛采用。\n已经提出了几种解决方案来在磁盘空间和渲染质量之间取得平衡，明显降低了视觉质量。在本研究中，我们提出了一种新的表示方法，显著减少了硬盘占用空间，同时在与标准3D高斯斑点相比的视觉质量方面表现出类似或更好的效果。与其他紧凑方案相比，我们的方法能够以更高质量的渲染效果显著减少存储空间，可以在移动设备上实时高效运行。\n我们的关键观察是场景中附近的点可以共享类似的表示。因此，只需要存储场景中的少量3D点。我们引入了一种方法来识别这些称为父点的点。被丢弃的点，即称为子点的点，以及相关属性可以通过小型MLP（多层感知器）进行高效预测。\n"
  },
  {
    "path": "abs/2406.19811.md",
    "content": "### EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting\n\nHuman activities are inherently complex, and even simple household tasks involve numerous object interactions. To better understand these activities and behaviors, it is crucial to model their dynamic interactions with the environment. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand dynamic human-object interactions in 3D environments. However, most existing methods for human activity modeling either focus on reconstructing 3D models of hand-object or human-scene interactions or on mapping 3D scenes, neglecting dynamic interactions with objects. The few existing solutions often require inputs from multiple sources, including multi-camera setups, depth-sensing cameras, or kinesthetic sensors. To this end, we introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone. We leverage the uniquely discrete nature of Gaussian Splatting and segment dynamic interactions from the background. Our approach employs a clip-level online learning pipeline that leverages the dynamic nature of human activities, allowing us to reconstruct the temporal evolution of the scene in chronological order and track rigid object motion. Additionally, our method automatically segments object and background Gaussians, providing 3D representations for both static scenes and dynamic objects. EgoGaussian outperforms previous NeRF and Dynamic Gaussian methods in challenging in-the-wild videos and we also qualitatively demonstrate the high quality of the reconstructed models.\n\n\n人类活动本质上复杂，即使是简单的家庭任务也涉及大量的物体交互。为了更好地理解这些活动和行为，关键在于模拟它们与环境的动态交互。近年来，廉价的头戴摄像机和自我中心数据的可用性提供了在3D环境中理解动态人-物交互更为可靠和高效的手段。然而，大多数现有的人类活动建模方法要么专注于重建手-物体或人-场景交互的3D模型，要么专注于映射3D场景，忽视了对物体的动态交互。现有的解决方案通常需要来自多个来源的输入，包括多摄像头设置、深度感知摄像头或动态传感器。\n为此，我们介绍了EgoGaussian，这是第一种能够仅通过RGB自我中心输入同时重建3D场景和动态跟踪3D物体运动的方法。我们利用了高斯斑点的独特离散特性，并从背景中分割动态交互。我们的方法采用了一个剪辑级在线学习管道，利用人类活动的动态特性，允许我们按时间顺序重建场景的时间演变并跟踪刚性物体运动。此外，我们的方法自动分割物体和背景高斯函数，为静态场景和动态物体提供3D表示。\nEgoGaussian在野外视频中的表现优于先前的NeRF和动态高斯方法，我们还在质量上展示了重建模型的高质量特性。\n"
  },
  {
    "path": "abs/2406.20055.md",
    "content": "### SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications.However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world captures problematic. We present SpotlessSplats, an approach that leverages pre-trained and general-purpose features coupled with robust optimization to effectively ignore transient distractors. Our method achieves state-of-the-art reconstruction quality both visually and quantitatively, on casual captures.\n\n3D高斯喷溅（3DGS）是一种有前景的三维重建技术，具备高效的训练和渲染速度，适用于实时应用。然而，当前的方法要求环境高度控制（无移动人物或被风吹动的元素，以及一致的照明），以满足3D高斯喷溅的视角一致性假设。这使得在现实世界捕捉的重建变得困难。我们提出了SpotlessSplats，一种利用预训练和通用特征结合强健优化来有效忽略瞬态干扰的方法。我们的方法在非正式捕捉条件下，在视觉和定量上实现了最先进的重建质量。\n"
  },
  {
    "path": "abs/2407.00316.md",
    "content": "### OccFusion: Rendering Occluded Humans with Generative Diffusion Priors\n\nMost existing human rendering methods require every part of the human to be fully visible throughout the input video. However, this assumption does not hold in real-life settings where obstructions are common, resulting in only partial visibility of the human. Considering this, we present OccFusion, an approach that utilizes efficient 3D Gaussian splatting supervised by pretrained 2D diffusion models for efficient and high-fidelity human rendering. We propose a pipeline consisting of three stages. In the Initialization stage, complete human masks are generated from partial visibility masks. In the Optimization stage, 3D human Gaussians are optimized with additional supervision by Score-Distillation Sampling (SDS) to create a complete geometry of the human. Finally, in the Refinement stage, in-context inpainting is designed to further improve rendering quality on the less observed human body parts. We evaluate OccFusion on ZJU-MoCap and challenging OcMotion sequences and find that it achieves state-of-the-art performance in the rendering of occluded humans.\n\n现有的大多数人类渲染方法要求视频中人体的每个部分都必须完全可见。然而，在现实生活中，遮挡是常见的，导致人体只能部分可见。基于此考虑，我们提出了OccFusion方法，它利用高效的三维高斯光滑技术，由预训练的二维扩散模型进行监督，实现高效且高保真度的人体渲染。我们提出了一个包含三个阶段的流程。在初始化阶段，从部分可见掩模生成完整的人体掩模。在优化阶段，通过得分蒸馏抽样（SDS）额外监督，优化三维人体高斯模型，以创建完整的人体几何形状。最后，在细化阶段，设计了上下文修复技术，进一步改善对少见人体部位的渲染质量。我们在ZJU-MoCap和具有挑战性的OcMotion序列上评估了OccFusion，并发现它在渲染遮挡人体方面达到了最先进的性能水平。\n"
  },
  {
    "path": "abs/2407.00435.md",
    "content": "### RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering\n\nPoint-Based Neural Rendering (PBNR), i.e., the 3D Gaussian Splatting-family algorithms, emerges as a promising class of rendering techniques, which are permeating all aspects of society, driven by a growing demand for real-time, photorealistic rendering in AR/VR and digital twins. Achieving real-time PBNR on mobile devices is challenging.\nThis paper proposes RTGS, a PBNR system that for the first time delivers real-time neural rendering on mobile devices while maintaining human visual quality. RTGS combines two techniques. First, we present an efficiency-aware pruning technique to optimize rendering speed. Second, we introduce a Foveated Rendering (FR) method for PBNR, leveraging humans' low visual acuity in peripheral regions to relax rendering quality and improve rendering speed. Our system executes in real-time (above 100 FPS) on Nvidia Jetson Xavier board without sacrificing subjective visual quality, as confirmed by a user study.\n\n本文介绍了一种名为RTGS的点云神经渲染（PBNR）系统，这是首个在移动设备上实现实时神经渲染并保持人类视觉质量的系统。PBNR是一类新兴的渲染技术，特别适用于增强现实/虚拟现实和数字孪生等领域，因其实时性和逼真度需求日益增长而备受关注。在移动设备上实现实时PBNR具有挑战性。\nRTGS系统结合了两种技术。首先，我们提出了一种效率感知的修剪技术，以优化渲染速度。其次，我们引入了一种称为焦点渲染（FR）的方法，用于PBNR，利用人眼在周边区域的低视觉敏感度，以降低渲染质量要求并提高渲染速度。我们的系统在Nvidia Jetson Xavier板上能够实现实时运行（超过100帧每秒），同时未牺牲主观视觉质量，这一点由用户研究得到了确认。\n"
  },
  {
    "path": "abs/2407.01029.md",
    "content": "### EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting\n\n3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually the case in real-world clinical scenarios. To tackle this sparsity challenge, we propose a framework leveraging the prior knowledge from multiple foundation models during the reconstruction process, dubbed as EndoSparse. Experimental results indicate that our proposed strategy significantly improves the geometric and appearance quality under challenging sparse-view conditions, including using only three views. In rigorous benchmarking experiments against state-of-the-art methods, EndoSparse achieves superior results in terms of accurate geometry, realistic appearance, and rendering efficiency, confirming the robustness to sparse-view limitations in endoscopic reconstruction. EndoSparse signifies a steady step towards the practical deployment of neural 3D reconstruction in real-world clinical scenarios.\n\n从一组内窥镜图像中进行生物组织的三维重建是解锁各种重要下游手术应用的关键，这些应用都需要具备三维能力。现有方法利用各种先进的神经渲染技术进行逼真视角合成，但在仅有稀疏观测数据的情况下（这在真实世界的临床场景中往往是常见的），它们往往难以恢复精确的三维表示。\n为了解决这种稀疏性挑战，我们提出了一种框架，利用重建过程中来自多个基础模型的先验知识，称为 EndoSparse。实验结果表明，我们提出的策略显著改善了在挑战性稀疏视角条件下的几何和外观质量，包括仅使用三个视角的情况。在与最先进方法的严格基准测试实验中，EndoSparse在准确的几何重建、逼真的外观和渲染效率方面均取得了优越结果，证实了在内窥镜重建中克服稀疏视角限制的鲁棒性。EndoSparse标志着向实际临床场景中神经三维重建的实际部署迈出了稳定的一步。\n"
  },
  {
    "path": "abs/2407.01090.md",
    "content": "### Learning 3D Gaussians for Extremely Sparse-View Cone-Beam CT Reconstruction\n\nCone-Beam Computed Tomography (CBCT) is an indispensable technique in medical imaging, yet the associated radiation exposure raises concerns in clinical practice. To mitigate these risks, sparse-view reconstruction has emerged as an essential research direction, aiming to reduce the radiation dose by utilizing fewer projections for CT reconstruction. Although implicit neural representations have been introduced for sparse-view CBCT reconstruction, existing methods primarily focus on local 2D features queried from sparse projections, which is insufficient to process the more complicated anatomical structures, such as the chest. To this end, we propose a novel reconstruction framework, namely DIF-Gaussian, which leverages 3D Gaussians to represent the feature distribution in the 3D space, offering additional 3D spatial information to facilitate the estimation of attenuation coefficients. Furthermore, we incorporate test-time optimization during inference to further improve the generalization capability of the model. We evaluate DIF-Gaussian on two public datasets, showing significantly superior reconstruction performance than previous state-of-the-art methods.\n\n锥束计算机断层扫描（CBCT）在医学成像中是一种不可或缺的技术，但相关的辐射暴露引起临床实践中的担忧。为了减少这些风险，稀疏视角重建已经成为一个重要的研究方向，旨在通过利用更少的投影来减少CT重建的辐射剂量。尽管隐式神经表示已经被引入用于稀疏视角CBCT重建，现有方法主要集中在从稀疏投影中查询的局部2D特征上，这对于处理胸部等更复杂的解剖结构是不足够的。\n为此，我们提出了一种新的重建框架，名为DIF-Gaussian，它利用三维高斯函数来表示三维空间中的特征分布，提供额外的三维空间信息以便于估计衰减系数。此外，我们在推理过程中引入了测试时优化，进一步提高模型的泛化能力。我们在两个公开数据集上评估了DIF-Gaussian，在重建性能上显示出显著优于先前最先进方法的结果。\n"
  },
  {
    "path": "abs/2407.01301.md",
    "content": "### GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting\n\nRecent advancements in large generative models and real-time neural rendering using point-based techniques pave the way for a future of widespread visual data distribution through sharing synthesized 3D assets. However, while standardized methods for embedding proprietary or copyright information, either overtly or subtly, exist for conventional visual content such as images and videos, this issue remains unexplored for emerging generative 3D formats like Gaussian Splatting. We present GaussianStego, a method for embedding steganographic information in the rendering of generated 3D assets. Our approach employs an optimization framework that enables the accurate extraction of hidden information from images rendered using Gaussian assets derived from large models, while maintaining their original visual quality. We conduct preliminary evaluations of our method across several potential deployment scenarios and discuss issues identified through analysis. GaussianStego represents an initial exploration into the novel challenge of embedding customizable, imperceptible, and recoverable information within the renders produced by current 3D generative models, while ensuring minimal impact on the rendered content's quality.\n\n最近大型生成模型和使用基于点的实时神经渲染技术的进展为通过共享合成的3D资产进行广泛视觉数据分发铺平了道路。然而，虽然针对传统视觉内容如图像和视频的标准化方法可以嵌入专有或版权信息，无论是明显还是微妙的方式，但对于新兴的生成式3D格式如高斯光滑，这个问题仍然未被探索。\n我们提出了一种名为GaussianStego的方法，用于在生成的3D资产渲染中嵌入隐写信息。我们的方法采用优化框架，能够准确地从使用大型模型生成的高斯资产渲染图像中提取隐藏信息，同时保持其原始的视觉质量。我们在多个潜在部署场景下对我们的方法进行了初步评估，并讨论了通过分析确定的问题。\nGaussianStego代表了对当前3D生成模型产生的渲染中嵌入可定制、难以察觉和可恢复信息的新挑战的初步探索，同时确保对渲染内容质量的最小影响。\n"
  },
  {
    "path": "abs/2407.01761.md",
    "content": "### DRAGON: Drone and Ground Gaussian Splatting for 3D Building Reconstruction\n\n3D building reconstruction from imaging data is an important task for many applications ranging from urban planning to reconnaissance. Modern Novel View synthesis (NVS) methods like NeRF and Gaussian Splatting offer powerful techniques for developing 3D models from natural 2D imagery in an unsupervised fashion. These algorithms generally require input training views surrounding the scene of interest, which, in the case of large buildings, is typically not available across all camera elevations. In particular, the most readily available camera viewpoints at scale across most buildings are at near-ground (e.g., with mobile phones) and aerial (drones) elevations. However, due to the significant difference in viewpoint between drone and ground image sets, camera registration - a necessary step for NVS algorithms - fails. In this work we propose a method, DRAGON, that can take drone and ground building imagery as input and produce a 3D NVS model. The key insight of DRAGON is that intermediate elevation imagery may be extrapolated by an NVS algorithm itself in an iterative procedure with perceptual regularization, thereby bridging the visual feature gap between the two elevations and enabling registration. We compiled a semi-synthetic dataset of 9 large building scenes using Google Earth Studio, and quantitatively and qualitatively demonstrate that DRAGON can generate compelling renderings on this dataset compared to baseline strategies.\n\n3D建筑重建是从成像数据中提取3D模型的重要任务，涵盖了从城市规划到侦察等多种应用。现代的新视角合成（NVS）方法，如NeRF和高斯斑点化，提供了强大的技术，可以无监督地从自然的2D图像中开发3D模型。这些算法通常需要围绕感兴趣场景的输入训练视角，然而对于大型建筑物来说，通常不可能在所有摄像机高度都获得完整的训练视角。特别是，大多数建筑物最容易获得的摄像机视角包括接近地面（例如使用手机）和空中（无人机）高度。然而，由于无人机和地面图像集之间视角显著不同，新视角合成算法所需的摄像机注册步骤通常会失败。\n在这项工作中，我们提出了一种名为DRAGON的方法，它可以接受无人机和地面建筑图像作为输入，并生成3D的新视角合成模型。DRAGON的关键洞察是，中间高度图像可以通过带有感知正则化的迭代过程由NVS算法自身进行外推，从而弥合两种高度之间的视觉特征差距，并实现注册。我们使用Google Earth Studio编制了一个半合成数据集，包括9个大型建筑场景，通过定量和定性方法展示，DRAGON相较于基准策略在该数据集上能够生成引人入胜的渲染效果。\n"
  },
  {
    "path": "abs/2407.02034.md",
    "content": "### TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation\n\nDespite significant strides in the field of 3D scene editing, current methods encounter substantial challenge, particularly in preserving 3D consistency in multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing error accumulation yielded from text-to-image process. Additionally, we explore the relationship between optimization-based methods and reconstruction-based methods, offering a unified perspective for selecting superior design choice, supporting the rationale behind the designed TAS. We further present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric reference from the source branch to yield aligned views from the target branch during the editing of 2D views. To validate the effectiveness of our method, we analyze 2D examples to demonstrate the improved consistency with the VCAC module. Further extensive quantitative and qualitative results in text-guided 3D scene editing indicate that our method achieves superior editing quality compared to state-of-the-art methods. We will make the complete codebase publicly available following the conclusion of the double-blind review process.\n\n尽管在3D场景编辑领域取得了显著进展，但目前的方法在保持多视角编辑过程中的3D一致性方面仍面临重大挑战。为了解决这一问题，我们提出了一种渐进式的3D编辑策略，通过轨迹锚定方案（TAS）和双分支编辑机制确保多视角一致性。具体来说，TAS促进了2D视图编辑和3D更新之间紧密耦合的迭代过程，防止了由文本到图像处理过程中产生的误差累积。此外，我们探索了基于优化的方法和基于重建的方法之间的关系，为选择更优设计提供了统一的视角，支持设计TAS的理论基础。我们进一步提出了一个无需调整的视图一致注意力控制（VCAC）模块，利用源分支的跨视图语义和几何参考，在编辑2D视图时从目标分支生成对齐的视图。为验证我们方法的有效性，我们分析了2D示例，展示了VCAC模块在提升一致性方面的效果。进一步的定量和定性结果在文本引导的3D场景编辑中表明，我们的方法在编辑质量上优于现有方法。\n"
  },
  {
    "path": "abs/2407.02598.md",
    "content": "### AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction\n\nRealistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels in real-time rendering and static scene reconstructions but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatting to achieve highly realistic reconstructions of autonomous driving scenes. By imposing geometric constraints on Gaussians representing the road and sky regions, our method enables multi-view consistent simulation of challenging scenarios including lane changes. Leveraging 3D templates, we introduce a reflected Gaussian consistency constraint to supervise both the visible and unseen side of foreground objects. Moreover, to model the dynamic appearance of foreground objects, we estimate residual spherical harmonics for each foreground Gaussian. Extensive experiments on Pandaset and KITTI demonstrate that AutoSplat outperforms state-of-the-art methods in scene reconstruction and novel view synthesis across diverse driving scenarios.\n\n自动驾驶系统的发展需要逼真的场景重建和视角合成，以模拟安全关键的驾驶场景。3D高斯斑点化在实时渲染和静态场景重建方面表现出色，但在建模驾驶场景时面临复杂背景、动态物体和稀疏视角的挑战。我们提出了AutoSplat，这是一个利用高斯斑点化实现高度逼真的自动驾驶场景重建的框架。通过对代表道路和天空区域的高斯函数施加几何约束，我们的方法能够多视角一致地模拟包括车道变换在内的复杂场景。利用3D模板，我们引入了反射高斯一致性约束，监督前景物体可见和不可见侧的重建。此外，为了模拟前景物体的动态外观，我们为每个前景高斯估计残余球面谐波。\n在Pandaset和KITTI数据集上的广泛实验表明，AutoSplat在各种驾驶场景中的场景重建和新视角合成方面优于现有方法。\n"
  },
  {
    "path": "abs/2407.02918.md",
    "content": "### Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction\n\nReal-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery, holding a promise to enhance surgeons' visibility. Recent advancements in 3D Gaussian Splatting (3DGS) have shown great potential for real-time novel view synthesis of general scenes, which relies on accurate poses and point clouds generated by Structure-from-Motion (SfM) for initialization. However, 3DGS with SfM fails to recover accurate camera poses and geometry in surgical scenes due to the challenges of minimal textures and photometric inconsistencies. To tackle this problem, in this paper, we propose the first SfM-free 3DGS-based method for surgical scene reconstruction by jointly optimizing the camera poses and scene representation. Based on the video continuity, the key of our method is to exploit the immediate optical flow priors to guide the projection flow derived from 3D Gaussians. Unlike most previous methods relying on photometric loss only, we formulate the pose estimation problem as minimizing the flow loss between the projection flow and optical flow. A consistency check is further introduced to filter the flow outliers by detecting the rigid and reliable points that satisfy the epipolar geometry. During 3D Gaussian optimization, we randomly sample frames to optimize the scene representations to grow the 3D Gaussian progressively. Experiments on the SCARED dataset demonstrate our superior performance over existing methods in novel view synthesis and pose estimation with high efficiency.\n\n实时的手术场景三维重建在计算机辅助手术中扮演着重要角色，有望提升外科医生的可视性。最近在三维高斯斑点化（3DGS）方面的进展显示出在一般场景的实时新视角合成中具有巨大潜力，这依赖于通过运动结构（SfM）生成的精确姿态和点云进行初始化。然而，由于手术场景中纹理极少且光度不一致性的挑战，3DGS与SfM在恢复准确的摄像机姿态和几何方面存在困难。\n为了解决这个问题，本文提出了首个基于无SfM的3DGS方法，用于手术场景重建，通过联合优化摄像机姿态和场景表达。根据视频连续性，我们方法的关键在于利用即时光流先验来引导从3D高斯中导出的投影光流。与大多数依赖光度损失的先前方法不同，我们将姿态估计问题形式化为最小化投影光流和光流之间的流动损失。进一步引入一致性检查，通过检测满足对极几何的刚性和可靠点来过滤流异常值。在3D高斯优化过程中，我们随机采样帧来逐步优化场景表达以扩展3D高斯。\n在SCARED数据集上的实验表明，我们的方法在新视角合成和姿态估计方面具有显著的性能优势，并且具有高效率。\n"
  },
  {
    "path": "abs/2407.02945.md",
    "content": "### VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors\n\nNeural rendering-based urban scene reconstruction methods commonly rely on images collected from driving vehicles with cameras facing and moving forward. Although these methods can successfully synthesize from views similar to training camera trajectory, directing the novel view outside the training camera distribution does not guarantee on-par performance. In this paper, we tackle the Extrapolated View Synthesis (EVS) problem by evaluating the reconstructions on views such as looking left, right or downwards with respect to training camera distributions. To improve rendering quality for EVS, we initialize our model by constructing dense LiDAR map, and propose to leverage prior scene knowledge such as surface normal estimator and large-scale diffusion model. Qualitative and quantitative comparisons demonstrate the effectiveness of our methods on EVS. To the best of our knowledge, we are the first to address the EVS problem in urban scene reconstruction.\n\n基于神经渲染的城市场景重建方法通常依赖于从驾驶车辆上采集的图像，摄像头面向前方移动。虽然这些方法能够成功地合成与训练相似视角的图像，但是指向训练摄像头分布之外的新视角，并不能保证同等水平的性能。在本文中，我们解决了“外推视角合成（EVS）”问题，通过评估在不同于训练摄像头分布的视角下的重建效果，例如向左、向右或向下查看。为了改善EVS的渲染质量，我们通过构建密集的激光雷达地图来初始化模型，并提出利用场景先验知识，如表面法线估计器和大规模扩散模型。定性和定量比较显示了我们方法在EVS上的有效性。据我们所知，我们是首个解决城市场景重建中EVS问题的研究工作。\n\n"
  },
  {
    "path": "abs/2407.03204.md",
    "content": "### Expressive Gaussian Human Avatars from Monocular RGB Video\n\nNuanced expressiveness, particularly through fine-grained hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we focus on investigating the expressiveness of human avatars when learned from monocular RGB video; a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduce EVA, a drivable human model that meticulously sculpts fine details based on 3D Gaussians and SMPL-X, an expressive parametric human model. Focused on enhancing expressiveness, our work makes three key contributions. First, we highlight the critical importance of aligning the SMPL-X model with RGB frames for effective avatar learning. Recognizing the limitations of current SMPL-X prediction methods for in-the-wild videos, we introduce a plug-and-play module that significantly ameliorates misalignment issues. Second, we propose a context-aware adaptive density control strategy, which is adaptively adjusting the gradient thresholds to accommodate the varied granularity across body parts. Last but not least, we develop a feedback mechanism that predicts per-pixel confidence to better guide the learning of 3D Gaussians. Extensive experiments on two benchmarks demonstrate the superiority of our framework both quantitatively and qualitatively, especially on the fine-grained hand and facial details.\n\n在数字人类表示中，通过精细的手部和面部表情来表达细微的情感变化对增强逼真度和生动性至关重要。本文着重探讨从单目RGB视频中学习人类化身的表现力，这种设置在捕捉和动画化精细细节方面面临新的挑战。为此，我们引入了EVA，一个可驾驶的人类模型，根据3D高斯模型和SMPL-X（一种富有表现力的参数化人体模型）精心雕刻细节。我们的工作集中于增强表现力，提出了三个关键贡献。首先，我们强调将SMPL-X模型与RGB帧对齐的重要性，以实现有效的化身学习。鉴于当前SMPL-X预测方法在野外视频中的局限性，我们引入了一个即插即用的模块，显著改善了对齐问题。其次，我们提出了一种上下文感知的自适应密度控制策略，根据不同身体部位的细粒度调整梯度阈值。最后，我们开发了一个反馈机制，预测每个像素的置信度，更好地指导3D高斯模型的学习。在两个基准数据集上的广泛实验显示，我们的框架在数量和质量上均表现出优越性，尤其是在处理手部和面部细节方面。\n"
  },
  {
    "path": "abs/2407.03857.md",
    "content": "### PFGS: High Fidelity Point Cloud Rendering via Feature Splatting\n\nRendering high-fidelity images from sparse point clouds is still challenging. Existing learning-based approaches suffer from either hole artifacts, missing details, or expensive computations. In this paper, we propose a novel framework to render high-quality images from sparse points. This method first attempts to bridge the 3D Gaussian Splatting and point cloud rendering, which includes several cascaded modules. We first use a regressor to estimate Gaussian properties in a point-wise manner, the estimated properties are used to rasterize neural feature descriptors into 2D planes which are extracted from a multiscale extractor. The projected feature volume is gradually decoded toward the final prediction via a multiscale and progressive decoder. The whole pipeline experiences a two-stage training and is driven by our well-designed progressive and multiscale reconstruction loss. Experiments on different benchmarks show the superiority of our method in terms of rendering qualities and the necessities of our main components.\n\n目前，从稀疏点云生成高保真度图像仍然具有挑战性。现有的基于学习的方法往往存在孔洞伪影、细节缺失或计算复杂度高的问题。本文提出了一种新的框架，用于从稀疏点生成高质量图像。该方法首先尝试将3D高斯光滑和点云渲染进行桥接，其中包括多个级联模块。我们首先使用回归器以点云方式估计高斯属性，这些估计的属性用于将神经特征描述符栅格化到从多尺度提取的2D平面中。投影的特征体积通过多尺度和渐进解码器逐步解码至最终预测。整个流程经历了两阶段训练，并受我们设计良好的渐进和多尺度重建损失驱动。在不同基准测试中的实验显示了我们方法在渲染质量上的优越性以及我们主要组成部分的必要性。\n"
  },
  {
    "path": "abs/2407.03923.md",
    "content": "### CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images\n\nNeural radiance fields (NeRFs) have received significant attention due to their high-quality novel view rendering ability, prompting research to address various real-world cases. One critical challenge is the camera motion blur caused by camera movement during exposure time, which prevents accurate 3D scene reconstruction. In this study, we propose continuous rigid motion-aware gaussian splatting (CRiM-GS) to reconstruct accurate 3D scene from blurry images with real-time rendering speed. Considering the actual camera motion blurring process, which consists of complex motion patterns, we predict the continuous movement of the camera based on neural ordinary differential equations (ODEs). Specifically, we leverage rigid body transformations to model the camera motion with proper regularization, preserving the shape and size of the object. Furthermore, we introduce a continuous deformable 3D transformation in the SE(3) field to adapt the rigid body transformation to real-world problems by ensuring a higher degree of freedom. By revisiting fundamental camera theory and employing advanced neural network training techniques, we achieve accurate modeling of continuous camera trajectories. We conduct extensive experiments, demonstrating state-of-the-art performance both quantitatively and qualitatively on benchmark datasets.\n\n神经辐射场（NeRFs）因其高质量的新视角渲染能力而受到广泛关注，促使研究致力于解决各种现实世界情况。其中一个关键挑战是由于曝光期间相机移动引起的相机运动模糊，这阻碍了准确的3D场景重建。本研究中，我们提出了连续刚性运动感知高斯光滑（CRiM-GS），以实现从模糊图像中准确重建3D场景，并具备实时渲染速度。考虑到实际相机运动模糊过程，涉及复杂的运动模式，我们基于神经常微分方程（ODEs）预测相机的连续运动。具体而言，我们利用刚体变换来模拟相机运动，采用适当的正则化方法，保持物体的形状和大小。此外，我们引入了连续可变形的3D变换在 SE(3) 域中，通过确保更高的自由度，使刚体变换适应于实际问题。通过重新审视基本的相机理论并采用先进的神经网络训练技术，我们实现了对连续相机轨迹的准确建模。我们进行了广泛的实验，在基准数据集上定量和定性地展示了最先进的性能。\n"
  },
  {
    "path": "abs/2407.04237.md",
    "content": "### GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction\n\nWe present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an unconditional diffusion model. This model learns to generate 3D objects represented by sets of GS ellipsoids. With these strong generative 3D priors, though learning unconditionally, the diffusion model is ready for view-guided reconstruction without further model fine-tuning. This is achieved by propagating fine-grained 2D features through the efficient yet flexible splatting function and the guided denoising sampling process. In addition, a 2D diffusion model is further employed to enhance rendering fidelity, and improve reconstructed GS quality by polishing and re-using the rendered images. The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views. Experiments on the challenging real-world CO3D dataset demonstrate the superiority of our approach.\n\n我们提出了GSD，一种基于高斯喷溅（Gaussian Splatting，GS）表示的扩散模型方法，用于从单个视角进行3D物体重建。先前的工作由于不恰当的表示方法而导致3D几何不一致或渲染质量中等。我们通过利用最近的最先进3D显式表示方法——高斯喷溅，以及无条件的扩散模型，试图解决这些问题。这个模型学习生成由一组GS椭球体表示的3D物体。凭借这些强大的生成3D先验，虽然是无条件学习的，扩散模型已经可以进行视图引导的重建，无需进一步的模型微调。这是通过通过高效而灵活的光滑函数和引导去噪采样过程传播细粒度的2D特征实现的。此外，还进一步使用了2D扩散模型来增强渲染保真度，并通过优化和重复使用渲染图像来改善重建的GS质量。最终重建的物体具有高质量的3D结构和纹理，能够在任意视角高效渲染。在具有挑战性的真实世界CO3D数据集上的实验证明了我们方法的优越性。\n"
  },
  {
    "path": "abs/2407.04504.md",
    "content": "### Segment Any 4D Gaussians\n\nModeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations. In this paper, we propose Segment Any 4D Gaussians (SA4D), one of the first frameworks to segment anything in the 4D digital world based on 4D Gaussians. In SA4D, an efficient temporal identity feature field is introduced to handle Gaussian drifting, with the potential to learn precise identity features from noisy and sparse input. Additionally, a 4D segmentation refinement process is proposed to remove artifacts. Our SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians and shows the ability to remove, recolor, compose, and render high-quality anything masks.\n\n在XR/VR中建模、理解和重建真实世界至关重要。最近，3D高斯光滑（3D-GS）方法在建模和理解3D场景方面取得了显著成功。类似地，各种4D表示法展示了捕捉4D世界动态的能力。然而，目前缺乏专注于4D表示内分割的研究。在本文中，我们提出了Segment Any 4D Gaussians（SA4D），这是首个基于4D高斯函数在4D数字世界中分割任何物体的框架之一。在SA4D中，引入了一个高效的时间标识特征场来处理高斯漂移，有潜力从嘈杂和稀疏的输入中学习精确的身份特征。此外，还提出了一个4D分割细化过程来去除伪影。我们的SA4D能够在几秒钟内精确地实现高质量的4D高斯函数分割，并展示了移除、重新着色、合成和渲染高质量任何物体掩码的能力。\n"
  },
  {
    "path": "abs/2407.04545.md",
    "content": "### Gaussian Eigen Models for Human Heads\n\nWe present personalized Gaussian Eigen Models (GEMs) for human heads, a novel method that compresses dynamic 3D Gaussians into low-dimensional linear spaces. Our approach is inspired by the seminal work of Blanz and Vetter, where a mesh-based 3D morphable model (3DMM) is constructed from registered meshes. Based on dynamic 3D Gaussians, we create a lower-dimensional representation of primitives that applies to most 3DGS head avatars. Specifically, we propose a universal method to distill the appearance of a mesh-controlled UNet Gaussian avatar using an ensemble of linear eigenbasis. We replace heavy CNN-based architectures with a single linear layer improving speed and enabling a range of real-time downstream applications. To create a particular facial expression, one simply needs to perform a dot product between the eigen coefficients and the distilled basis. This efficient method removes the requirement for an input mesh during testing, enhancing simplicity and speed in expression generation. This process is highly efficient and supports real-time rendering on everyday devices, leveraging the effectiveness of standard Gaussian Splatting. In addition, we demonstrate how the GEM can be controlled using a ResNet-based regression architecture. We show and compare self-reenactment and cross-person reenactment to state-of-the-art 3D avatar methods, demonstrating higher quality and better control. A real-time demo showcases the applicability of the GEM representation.\n\n我们提出了用于人类头部的个性化高斯特征模型（Gaussian Eigen Models，GEMs），这是一种将动态3D高斯函数压缩为低维线性空间的新方法。我们的方法受到Blanz和Vetter的开创性工作的启发，他们构建了基于网格的3D可变形模型（3DMM），从注册的网格中得出。基于动态3D高斯函数，我们创建了一种适用于大多数3D头部虚拟形象的低维表示方法。具体而言，我们提出了一种通用方法，通过一组线性特征基，精炼控制网格的UNet高斯化身形象。我们用单一线性层替换了复杂的基于CNN的架构，提高了速度，并使一系列实时下游应用成为可能。要创建特定的面部表情，只需对特征系数和精炼基之间进行点积运算。这种高效的方法在测试期间消除了对输入网格的需求，增强了生成表情的简易性和速度。这一过程非常高效，并支持在日常设备上实时渲染，充分利用了标准高斯光滑的有效性。此外，我们展示了如何使用基于ResNet的回归架构来控制GEM。我们展示并比较了自我重现和跨人重现与最先进的3D虚拟形象方法，显示了更高的质量和更好的控制。一个实时演示展示了GEM表示的适用性。\n"
  },
  {
    "path": "abs/2407.04699.md",
    "content": "### LaRa: Efficient Large-Baseline Radiance Fields\n\nRadiance field methods have achieved photorealistic novel view synthesis and geometry reconstruction. But they are mostly applied in per-scene optimization or small-baseline settings. While several recent works investigate feed-forward reconstruction with large baselines by utilizing transformers, they all operate with a standard global attention mechanism and hence ignore the local nature of 3D reconstruction. We propose a method that unifies local and global reasoning in transformer layers, resulting in improved quality and faster convergence. Our model represents scenes as Gaussian Volumes and combines this with an image encoder and Group Attention Layers for efficient feed-forward reconstruction. Experimental results demonstrate that our model, trained for two days on four GPUs, demonstrates high fidelity in reconstructing 360&deg radiance fields, and robustness to zero-shot and out-of-domain testing.\n\n辐射场方法已经实现了逼真的新视角合成和几何重建。但它们大多应用于每个场景的优化或小基线设置。尽管最近有几项研究探讨了利用变压器进行大基线的前向重建，但它们都使用了标准的全局注意力机制，因此忽略了3D重建的局部特性。我们提出了一种方法，在变压器层中统一了局部和全局推理，从而提高了质量并加快了收敛速度。我们的模型将场景表示为高斯体，并结合图像编码器和群组注意力层进行高效的前向重建。实验结果表明，我们的模型在四个GPU上训练两天后，展示了在重建360度辐射场方面的高保真度，并对零样本和域外测试具有稳健性。\n"
  },
  {
    "path": "abs/2407.05023.md",
    "content": "### SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction\n\nDynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering. In addition, restricted single view perception and occluded instruments also propose special challenges in surgical scene reconstruction. To address these issues, we develop SurgicalGaussian, a deformable 3D Gaussian Splatting method to model dynamic surgical scenes. Our approach models the spatio-temporal features of soft tissues at each time stamp via a forward-mapping deformation MLP and regularization to constrain local 3D Gaussians to comply with consistent movement. With the depth initialization strategy and tool mask-guided training, our method can remove surgical instruments and reconstruct high-fidelity surgical scenes. Through experiments on various surgical videos, our network outperforms existing method on many aspects, including rendering quality, rendering speed and GPU usage. The project page can be found at this https URL.\n\n内窥镜视频中可变形组织的动态重建是机器人辅助手术的一项关键技术。最近基于神经辐射场(NeRF)的重建方法在手术场景重建方面取得了显著成果。然而，由于基于隐式表示，NeRF 难以捕捉场景中物体的精细细节，且无法实现实时渲染。此外，受限的单视图感知和被遮挡的器械也为手术场景重建提出了特殊挑战。为解决这些问题，我们开发了 SurgicalGaussian，一种可变形的 3D 高斯散射方法来建模动态手术场景。我们的方法通过前向映射变形 MLP 和正则化来建模每个时间戳的软组织时空特征，约束局部 3D 高斯符合一致运动。通过深度初始化策略和工具掩码引导训练，我们的方法可以去除手术器械并重建高保真度的手术场景。通过在各种手术视频上的实验，我们的网络在渲染质量、渲染速度和 GPU 使用等多个方面都优于现有方法。\n"
  },
  {
    "path": "abs/2407.05254.md",
    "content": "### GaussReg: Fast 3D Registration with Gaussian Splatting\n\nPoint cloud registration is a fundamental problem for large-scale 3D scene scanning and reconstruction. With the help of deep learning, registration methods have evolved significantly, reaching a nearly-mature stage. As the introduction of Neural Radiance Fields (NeRF), it has become the most popular 3D scene representation as its powerful view synthesis capabilities. Regarding NeRF representation, its registration is also required for large-scale scene reconstruction. However, this topic extremly lacks exploration. This is due to the inherent challenge to model the geometric relationship among two scenes with implicit representations. The existing methods usually convert the implicit representation to explicit representation for further registration. Most recently, Gaussian Splatting (GS) is introduced, employing explicit 3D Gaussian. This method significantly enhances rendering speed while maintaining high rendering quality. Given two scenes with explicit GS representations, in this work, we explore the 3D registration task between them. To this end, we propose GaussReg, a novel coarse-to-fine framework, both fast and accurate. The coarse stage follows existing point cloud registration methods and estimates a rough alignment for point clouds from GS. We further newly present an image-guided fine registration approach, which renders images from GS to provide more detailed geometric information for precise alignment. To support comprehensive evaluation, we carefully build a scene-level dataset called ScanNet-GSReg with 1379 scenes obtained from the ScanNet dataset and collect an in-the-wild dataset called GSReg. Experimental results demonstrate our method achieves state-of-the-art performance on multiple datasets. Our GaussReg is 44 times faster than HLoc (SuperPoint as the feature extractor and SuperGlue as the matcher) with comparable accuracy.\n\n点云配准是大规模 3D 场景扫描和重建的一个基本问题。在深度学习的帮助下，配准方法已经显著发展，达到了接近成熟的阶段。随着神经辐射场(NeRF)的引入，由于其强大的视图合成能力，它已成为最流行的 3D 场景表示方法。对于 NeRF 表示，其配准也是大规模场景重建所需的。然而，这个主题极度缺乏探索。这是由于在具有隐式表示的两个场景之间建模几何关系的固有挑战。现有方法通常将隐式表示转换为显式表示以进行进一步配准。最近，高斯散射(GS)被引入，采用显式 3D 高斯。这种方法显著提高了渲染速度，同时保持了高渲染质量。\n给定两个具有显式 GS 表示的场景，在本工作中，我们探索了它们之间的 3D 配准任务。为此，我们提出了 GaussReg，一种新颖的粗到细框架，既快速又准确。粗配准阶段遵循现有的点云配准方法，并对来自 GS 的点云进行粗略对齐估计。我们进一步提出了一种新的图像引导精细配准方法，该方法从 GS 渲染图像以提供更详细的几何信息，用于精确对齐。\n为支持全面评估，我们精心构建了一个名为 ScanNet-GSReg 的场景级数据集，其中包含从 ScanNet 数据集获得的 1379 个场景，并收集了一个名为 GSReg 的实际应用数据集。实验结果表明，我们的方法在多个数据集上达到了最先进的性能。我们的 GaussReg 比 HLoc（使用 SuperPoint 作为特征提取器和 SuperGlue 作为匹配器）快 44 倍，同时保持了可比的准确性。\n"
  },
  {
    "path": "abs/2407.05324.md",
    "content": "### PICA: Physics-Integrated Clothed Avatar\n\nWe introduce PICA, a novel representation for high-fidelity animatable clothed human avatars with physics-accurate dynamics, even for loose clothing. Previous neural rendering-based representations of animatable clothed humans typically employ a single model to represent both the clothing and the underlying body. While efficient, these approaches often fail to accurately represent complex garment dynamics, leading to incorrect deformations and noticeable rendering artifacts, especially for sliding or loose garments. Furthermore, previous works represent garment dynamics as pose-dependent deformations and facilitate novel pose animations in a data-driven manner. This often results in outcomes that do not faithfully represent the mechanics of motion and are prone to generating artifacts in out-of-distribution poses. To address these issues, we adopt two individual 3D Gaussian Splatting (3DGS) models with different deformation characteristics, modeling the human body and clothing separately. This distinction allows for better handling of their respective motion characteristics. With this representation, we integrate a graph neural network (GNN)-based clothed body physics simulation module to ensure an accurate representation of clothing dynamics. Our method, through its carefully designed features, achieves high-fidelity rendering of clothed human bodies in complex and novel driving poses, significantly outperforming previous methods under the same settings.\n\n我们介绍了 PICA，这是一种新颖的表示方法，用于高保真可动画的穿衣人体头像，具有物理精确的动态效果，即使对于宽松的服装也是如此。先前基于神经渲染的可动画穿衣人体表示方法通常使用单一模型来表示衣服和底层身体。虽然这种方法高效，但往往无法准确表示复杂的服装动态，导致不正确的变形和明显的渲染伪影，特别是对于滑动或宽松的服装。\n此外，先前的工作将服装动态表示为姿势相关的变形，并以数据驱动的方式实现新颖的姿势动画。这常常导致无法忠实表现运动力学，并容易在分布外姿势中产生伪影。\n为解决这些问题，我们采用了两个具有不同变形特征的独立 3D 高斯散射（3DGS）模型，分别对人体和服装进行建模。这种区分允许更好地处理它们各自的运动特征。基于这种表示方法，我们整合了一个基于图神经网络（GNN）的穿衣人体物理模拟模块，以确保准确表示服装动态。\n通过精心设计的特性，我们的方法在复杂和新颖的驱动姿势下实现了穿衣人体的高保真渲染，在相同设置下显著优于先前的方法。\n"
  },
  {
    "path": "abs/2407.07090.md",
    "content": "### 3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes\n\nParticle-based representations of radiance fields such as 3D Gaussian Splatting have found great success for reconstructing and re-rendering of complex scenes. Most existing methods render particles via rasterization, projecting them to screen space tiles for processing in a sorted order. This work instead considers ray tracing the particles, building a bounding volume hierarchy and casting a ray for each pixel using high-performance GPU ray tracing hardware. To efficiently handle large numbers of semi-transparent particles, we describe a specialized rendering algorithm which encapsulates particles with bounding meshes to leverage fast ray-triangle intersections, and shades batches of intersections in depth-order. The benefits of ray tracing are well-known in computer graphics: processing incoherent rays for secondary lighting effects such as shadows and reflections, rendering from highly-distorted cameras common in robotics, stochastically sampling rays, and more. With our renderer, this flexibility comes at little cost compared to rasterization. Experiments demonstrate the speed and accuracy of our approach, as well as several applications in computer graphics and vision. We further propose related improvements to the basic Gaussian representation, including a simple use of generalized kernel functions which significantly reduces particle hit counts.\n\n我们介绍了一种基于射线追踪的粒子辐射场渲染方法，以及对高斯表示的改进。以下是翻译：\n辐射场的基于粒子的表示方法（如 3D 高斯散射）在复杂场景的重建和重新渲染方面取得了巨大成功。大多数现有方法通过光栅化来渲染粒子，将它们投影到屏幕空间的瓦片上，以排序的方式进行处理。本工作考虑对粒子进行射线追踪，构建边界体积层次结构，并使用高性能 GPU 射线追踪硬件为每个像素投射射线。\n为了高效处理大量半透明粒子，我们描述了一种专门的渲染算法，该算法用边界网格封装粒子以利用快速的射线-三角形相交，并按深度顺序对批量交点进行着色。射线追踪的优势在计算机图形学中是众所周知的：处理非相干射线以产生次级光照效果（如阴影和反射），从机器人学中常见的高度扭曲的相机进行渲染，随机采样射线等。使用我们的渲染器，这种灵活性与光栅化相比几乎没有额外成本。\n实验证明了我们方法的速度和准确性，以及在计算机图形学和视觉方面的多种应用。我们进一步提出了对基本高斯表示的相关改进，包括简单使用广义核函数，这显著减少了粒子命中次数。\n"
  },
  {
    "path": "abs/2407.07220.md",
    "content": "### Reference-based Controllable Scene Stylization with Gaussian Splatting\n\nReferenced-based scene stylization that edits the appearance based on a content-aligned reference image is an emerging research area. Starting with a pretrained neural radiance field (NeRF), existing methods typically learn a novel appearance that matches the given style. Despite their effectiveness, they inherently suffer from time-consuming volume rendering, and thus are impractical for many real-time applications. In this work, we propose ReGS, which adapts 3D Gaussian Splatting (3DGS) for reference-based stylization to enable real-time stylized view synthesis. Editing the appearance of a pretrained 3DGS is challenging as it uses discrete Gaussians as 3D representation, which tightly bind appearance with geometry. Simply optimizing the appearance as prior methods do is often insufficient for modeling continuous textures in the given reference image. To address this challenge, we propose a novel texture-guided control mechanism that adaptively adjusts local responsible Gaussians to a new geometric arrangement, serving for desired texture details. The proposed process is guided by texture clues for effective appearance editing, and regularized by scene depth for preserving original geometric structure. With these novel designs, we show ReGs can produce state-of-the-art stylization results that respect the reference texture while embracing real-time rendering speed for free-view navigation.\n\n基于参考的场景风格化是一个新兴的研究领域，它根据内容对齐的参考图像来编辑外观。现有方法通常从预训练的神经辐射场(NeRF)开始，学习与给定风格匹配的新外观。尽管这些方法很有效，但它们固有地受到耗时的体积渲染的限制，因此对许多实时应用来说并不实用。\n在这项工作中，我们提出了 ReGS，它采用 3D 高斯散射(3DGS)进行基于参考的风格化，以实现实时风格化视图合成。编辑预训练 3DGS 的外观具有挑战性，因为它使用离散高斯作为 3D 表示，这将外观与几何紧密绑定。像之前的方法那样简单地优化外观通常不足以模拟给定参考图像中的连续纹理。\n为解决这一挑战，我们提出了一种新颖的纹理引导控制机制，可自适应地调整局部负责的高斯分布到新的几何排列，以服务于所需的纹理细节。这个提出的过程由纹理线索引导，以实现有效的外观编辑，并由场景深度正则化以保持原始几何结构。\n通过这些新颖的设计，我们展示了 ReGS 可以产生尊重参考纹理的最先进的风格化结果，同时实现实时渲染速度，可用于自由视角导航。\n"
  },
  {
    "path": "abs/2407.07284.md",
    "content": "### MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition\n\nWe introduce MIGS (Multi-Identity Gaussian Splatting), a novel method that learns a single neural representation for multiple identities, using only monocular videos. Recent 3D Gaussian Splatting (3DGS) approaches for human avatars require per-identity optimization. However, learning a multi-identity representation presents advantages in robustly animating humans under arbitrary poses. We propose to construct a high-order tensor that combines all the learnable 3DGS parameters for all the training identities. By assuming a low-rank structure and factorizing the tensor, we model the complex rigid and non-rigid deformations of multiple subjects in a unified network, significantly reducing the total number of parameters. Our proposed approach leverages information from all the training identities, enabling robust animation under challenging unseen poses, outperforming existing approaches. We also demonstrate how it can be extended to learn unseen identities.\n\n我们介绍了 MIGS（多身份高斯散射），这是一种新型方法，仅使用单目视频就能学习多个身份的单一神经表示。最近的用于人体头像的 3D 高斯散射（3DGS）方法需要针对每个身份进行优化。然而，学习多身份表示在任意姿势下稳健地为人体制作动画方面具有优势。\n我们提出构建一个高阶张量，该张量结合了所有训练身份的所有可学习 3DGS 参数。通过假设低秩结构并对张量进行因子分解，我们在一个统一的网络中建模多个主体的复杂刚性和非刚性变形，显著减少了总参数数量。\n我们提出的方法利用了所有训练身份的信息，使其能够在具有挑战性的未见姿势下进行稳健的动画制作，优于现有方法。我们还展示了如何将其扩展到学习未见身份。\n"
  },
  {
    "path": "abs/2407.08447.md",
    "content": "### WildGaussians: 3D Gaussian Splatting in the Wild\n\nWhile the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged, offering similar quality with real-time rendering speeds. However, both methods primarily excel with well-controlled 3D scenes, while in-the-wild data - characterized by occlusions, dynamic objects, and varying illumination - remains challenging. NeRFs can adapt to such conditions easily through per-image embedding vectors, but 3DGS struggles due to its explicit representation and lack of shared parameters. To address this, we introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS. By leveraging robust DINO features and integrating an appearance modeling module within 3DGS, our method achieves state-of-the-art results. We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data, all within a simple architectural framework.\n\n尽管3D场景重建领域由于其逼真的质量而主要由NeRFs（神经辐射场）主导，但最近3D高斯喷溅（3DGS）技术已经出现，提供了类似的质量并具备实时渲染速度。然而，这两种方法主要在受控的3D场景中表现出色，而在自然环境中的数据——特点是遮挡、动态对象和变化的光照——依然具有挑战性。NeRFs能通过每张图片的嵌入向量轻松适应这种条件，但由于3DGS的显式表示和缺乏共享参数，它在处理这些问题上遇到困难。为了解决这一问题，我们引入了一种名为WildGaussians的新方法，该方法通过利用强大的DINO特征并在3DGS中整合外观建模模块，有效处理遮挡和外观变化。我们证明了WildGaussians在保持3DGS的实时渲染速度的同时，在处理自然环境数据方面超越了3DGS和NeRF的基线，且这一切都在一个简单的架构框架内实现。\n"
  },
  {
    "path": "abs/2407.09473.md",
    "content": "### StyleSplat: 3D Object Style Transfer with Gaussian Splatting\n\nRecent advancements in radiance fields have opened new avenues for creating high-quality 3D assets and scenes. Style transfer can enhance these 3D assets with diverse artistic styles, transforming creative expression. However, existing techniques are often slow or unable to localize style transfer to specific objects. We introduce StyleSplat, a lightweight method for stylizing 3D objects in scenes represented by 3D Gaussians from reference style images. Our approach first learns a photorealistic representation of the scene using 3D Gaussian splatting while jointly segmenting individual 3D objects. We then use a nearest-neighbor feature matching loss to finetune the Gaussians of the selected objects, aligning their spherical harmonic coefficients with the style image to ensure consistency and visual appeal. StyleSplat allows for quick, customizable style transfer and localized stylization of multiple objects within a scene, each with a different style. We demonstrate its effectiveness across various 3D scenes and styles, showcasing enhanced control and customization in 3D creation.\n\n最近在辐射场的进展为创建高质量的3D资产和场景开辟了新途径。风格迁移可以通过多样化的艺术风格增强这些3D资产，从而转变创造性表达。然而，现有技术通常速度慢或无法将风格迁移定位到特定对象。我们介绍了一种名为StyleSplat的轻量级方法，用于在由参考风格图像的3D高斯表示的场景中对3D对象进行风格化。我们的方法首先使用3D高斯喷溅技术学习场景的真实感表示，同时对各个3D对象进行分割。然后，我们使用最近邻特征匹配损失来微调所选对象的高斯，将它们的球谐系数与风格图像对齐，以确保一致性和视觉吸引力。StyleSplat允许快速、可定制的风格迁移和场景中多个对象的局部风格化，每个对象都具有不同的风格。我们展示了其在各种3D场景和风格中的有效性，展示了在3D创作中对控制和定制的增强。\n"
  },
  {
    "path": "abs/2407.09510.md",
    "content": "### 3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods\n\nWe present a work-in-progress survey on 3D Gaussian Splatting compression methods, focusing on their statistical performance across various benchmarks. This survey aims to facilitate comparability by summarizing key statistics of different compression approaches in a tabulated format. The datasets evaluated include TanksAndTemples, MipNeRF360, DeepBlending, and SyntheticNeRF. For each method, we report the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Learned Perceptual Image Patch Similarity (LPIPS), and the resultant size in megabytes (MB), as provided by the respective authors. This is an ongoing, open project, and we invite contributions from the research community as GitHub issues or pull requests. Please visit this http URL for more information and a sortable version of the table.\n\n我们在这里展示了一个关于3D高斯喷溅压缩方法的正在进行的研究调查，重点关注它们在各种基准测试中的统计性能。此调查旨在通过以表格格式总结不同压缩方法的关键统计数据，以促进可比性。评估的数据集包括TanksAndTemples、MipNeRF360、DeepBlending和SyntheticNeRF。对于每种方法，我们报告了峰值信噪比（PSNR）、结构相似性指数（SSIM）、学习感知图像补丁相似性（LPIPS）和结果大小（以兆字节MB计），这些数据由各自的作者提供。这是一个持续进行的开放项目，我们邀请研究社区通过GitHub问题或拉取请求贡献。\n"
  },
  {
    "path": "abs/2407.10062.md",
    "content": "### SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion\n\nNovel View Synthesis plays a crucial role by generating new 2D renderings from multi-view images of 3D scenes. However, capturing high-speed scenes with conventional cameras often leads to motion blur, hindering the effectiveness of 3D reconstruction. To address this challenge, high-frame-rate dense 3D reconstruction emerges as a vital technique, enabling detailed and accurate modeling of real-world objects or scenes in various fields, including Virtual Reality or embodied AI. Spike cameras, a novel type of neuromorphic sensor, continuously record scenes with an ultra-high temporal resolution, showing potential for accurate 3D reconstruction. Despite their promise, existing approaches, such as applying Neural Radiance Fields (NeRF) to spike cameras, encounter challenges due to the time-consuming rendering process. To address this issue, we make the first attempt to introduce the 3D Gaussian Splatting (3DGS) into spike cameras in high-speed capture, providing 3DGS as dense and continuous clues of views, then constructing SpikeGS. Specifically, to train SpikeGS, we establish computational equations between the rendering process of 3DGS and the processes of instantaneous imaging and exposing-like imaging of the continuous spike stream. Besides, we build a very lightweight but effective mapping process from spikes to instant images to support training. Furthermore, we introduced a new spike-based 3D rendering dataset for validation. Extensive experiments have demonstrated our method possesses the high quality of novel view rendering, proving the tremendous potential of spike cameras in modeling 3D scenes.\n\n新视角合成通过从3D场景的多视图图像生成新的2D渲染图像，发挥着至关重要的作用。然而，使用传统相机捕捉高速场景通常会导致运动模糊，这阻碍了3D重建的有效性。为了解决这个挑战，高帧率密集3D重建作为一种至关重要的技术出现，使得在虚拟现实或具体化人工智能等多个领域中，能够详细且准确地建模现实世界的对象或场景。神经形态传感器中的一种新型相机，即尖峰相机，可以持续以超高时间分辨率记录场景，显示出用于准确3D重建的潜力。尽管有前景，现有的方法，例如将神经辐射场（NeRF）应用于尖峰相机，由于渲染过程耗时而遇到挑战。为了解决这个问题，我们首次尝试将3D高斯喷溅（3DGS）引入高速捕捉的尖峰相机中，提供3DGS作为视角的密集和连续线索，然后构建SpikeGS。具体来说，为了训练SpikeGS，我们在3DGS的渲染过程与尖峰流的瞬时成像和暴露式成像的过程之间建立计算方程。此外，我们构建了一个非常轻量但有效的从尖峰到瞬时图像的映射过程以支持训练。我们还引入了一个新的基于尖峰的3D渲染数据集进行验证。广泛的实验已经证明了我们方法在新视角渲染的高质量，证实了尖峰相机在3D场景建模中的巨大潜力。\n"
  },
  {
    "path": "abs/2407.10102.md",
    "content": "### 3DEgo: 3D Editing on the Go!\n\nWe introduce 3DEgo to address a novel problem of directly synthesizing photorealistic 3D scenes from monocular videos guided by textual prompts. Conventional methods construct a text-conditioned 3D scene through a three-stage process, involving pose estimation using Structure-from-Motion (SfM) libraries like COLMAP, initializing the 3D model with unedited images, and iteratively updating the dataset with edited images to achieve a 3D scene with text fidelity. Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow by overcoming the reliance on COLMAP and eliminating the cost of model initialization. We apply a diffusion model to edit video frames prior to 3D scene creation by incorporating our designed noise blender module for enhancing multi-view editing consistency, a step that does not require additional training or fine-tuning of T2I diffusion models. 3DEgo utilizes 3D Gaussian Splatting to create 3D scenes from the multi-view consistent edited frames, capitalizing on the inherent temporal continuity and explicit point cloud data. 3DEgo demonstrates remarkable editing precision, speed, and adaptability across a variety of video sources, as validated by extensive evaluations on six datasets, including our own prepared GS25 dataset.\n\n我们引入3DEgo来解决一个新问题：直接从单眼视频中，通过文本提示引导合成逼真的3D场景。传统方法通过三阶段过程构建一个文本条件的3D场景，包括使用Structure-from-Motion（SfM）库如COLMAP进行姿态估计，用未编辑的图像初始化3D模型，并通过迭代更新编辑过的图像数据集以实现具有文本保真度的3D场景。我们的框架通过克服对COLMAP的依赖并消除模型初始化的成本，将传统的多阶段3D编辑过程简化为单阶段工作流。我们在3D场景创建之前应用扩散模型编辑视频帧，整合我们设计的噪声混合模块以增强多视图编辑的一致性，这一步骤不需要额外的训练或微调T2I扩散模型。3DEgo利用3D高斯喷溅从多视图一致的编辑帧中创建3D场景，利用固有的时间连续性和显式点云数据。3DEgo在各种视频源上展示了卓越的编辑精度、速度和适应性，通过在六个数据集上的广泛评估进行了验证，包括我们自己准备的GS25数据集。\n"
  },
  {
    "path": "abs/2407.10318.md",
    "content": "### RecGS: Removing Water Caustic with Recurrent Gaussian Splatting\n\nWater caustics are commonly observed in seafloor imaging data from shallow-water areas. Traditional methods that remove caustic patterns from images often rely on 2D filtering or pre-training on an annotated dataset, hindering the performance when generalizing to real-world seafloor data with 3D structures. In this paper, we present a novel method Recurrent Gaussian Splatting, which takes advantage of today's photorealistic 3D reconstruction technology, 3DGS, to separate caustics from seafloor imagery. With a sequence of images taken by an underwater robot, we build 3DGS recursively and decompose the caustic with low-pass filtering in each iteration. In the experiments, we analyze and compare with different methods, including joint optimization, 2D filtering, and deep learning approaches. The results show that our method can effectively separate the caustic from the seafloor, improving the visual appearance.\n\n水面光泽通常在浅水区的海底成像数据中观察到。传统的去除图像中光泽模式的方法通常依赖于2D滤波或在标注数据集上的预训练，这在推广到具有3D结构的实际海底数据时会妨碍性能。在本文中，我们提出了一种新的方法——循环高斯喷溅（Recurrent Gaussian Splatting，简称RecGS），该方法利用当今的逼真3D重建技术3DGS，从海底图像中分离光泽。通过一系列由水下机器人拍摄的图像，我们循环地构建3DGS，并在每次迭代中通过低通滤波分解光泽。在实验中，我们分析并与不同方法进行比较，包括联合优化、2D滤波和深度学习方法。结果表明，我们的方法可以有效地从海底分离光泽，改善视觉外观，并有可能应用于更多具有不一致光照的问题。\n"
  },
  {
    "path": "abs/2407.11174.md",
    "content": "### iHuman: Instant Animatable Digital Humans From Monocular Videos\n\nPersonalized 3D avatars require an animatable representation of digital humans. Doing so instantly from monocular videos offers scalability to broad class of users and wide-scale applications. In this paper, we present a fast, simple, yet effective method for creating animatable 3D digital humans from monocular videos. Our method utilizes the efficiency of Gaussian splatting to model both 3D geometry and appearance. However, we observed that naively optimizing Gaussian splats results in inaccurate geometry, thereby leading to poor animations. This work achieves and illustrates the need of accurate 3D mesh-type modelling of the human body for animatable digitization through Gaussian splats. This is achieved by developing a novel pipeline that benefits from three key aspects: (a) implicit modelling of surface's displacements and the color's spherical harmonics; (b) binding of 3D Gaussians to the respective triangular faces of the body template; (c) a novel technique to render normals followed by their auxiliary supervision. Our exhaustive experiments on three different benchmark datasets demonstrates the state-of-the-art results of our method, in limited time settings. In fact, our method is faster by an order of magnitude (in terms of training time) than its closest competitor. At the same time, we achieve superior rendering and 3D reconstruction performance under the change of poses.\n\n个性化3D头像需要一种可动画化的数字人类表现形式。通过单目视频即时实现这一点，可以扩展到广泛的用户和大规模应用。在本文中，我们介绍了一种快速、简单且有效的方法，用于从单目视频创建可动画化的3D数字人类。我们的方法利用高斯喷溅的效率来同时建模3D几何形状和外观。然而，我们观察到，简单地优化高斯喷溅会导致几何形状不准确，从而导致动画效果差。本工作通过高斯喷溅实现并展示了精确的3D网格类型人体建模对可动画数字化的需求。这是通过开发一个从三个关键方面受益的新颖流程实现的：(a) 表面位移和颜色球谐的隐式建模；(b) 将3D高斯绑定到身体模板的相应三角形面；(c) 一种渲染法线的新技术，随后进行辅助监督。我们在三个不同的基准数据集上进行的详尽实验展示了我们方法在有限时间设置中的最新成果。事实上，就训练时间而言，我们的方法比最接近的竞争者快一个数量级。同时，在姿势变化下，我们实现了更优越的渲染和3D重建性能。\n\n"
  },
  {
    "path": "abs/2407.11309.md",
    "content": "### Gaussian Splatting Lucas-Kanade\n\nReconstructing dynamic 3D scenes from 2D images and generating diverse views over time presents a significant challenge due to the inherent complexity and temporal dynamics involved. While recent advancements in neural implicit models and dynamic Gaussian Splatting have shown promise, limitations persist, particularly in accurately capturing the underlying geometry of highly dynamic scenes. Some approaches address this by incorporating strong semantic and geometric priors through diffusion models. However, we explore a different avenue by investigating the potential of regularizing the native warp field within the dynamic Gaussian Splatting framework. Our method is grounded on the key intuition that an accurate warp field should produce continuous space-time motions. While enforcing the motion constraints on warp fields is non-trivial, we show that we can exploit knowledge innate to the forward warp field network to derive an analytical velocity field, then time integrate for scene flows to effectively constrain both the 2D motion and 3D positions of the Gaussians. This derived Lucas-Kanade style analytical regularization enables our method to achieve superior performance in reconstructing highly dynamic scenes, even under minimal camera movement, extending the boundaries of what existing dynamic Gaussian Splatting frameworks can achieve.\n\n从2D图像重建动态3D场景并随时间生成多样化视图，由于涉及的固有复杂性和时间动态性，这一任务面临着重大挑战。尽管最近在神经隐式模型和动态高斯喷溅技术方面的进展显示出前景，但在准确捕获高动态场景的底层几何结构方面，仍存在局限性。一些方法通过融入强语义和几何先验的扩散模型来解决这一问题。然而，我们探索了一条不同的道路，即通过调查在动态高斯喷溅框架内规范本机扭曲场的潜力。我们的方法基于这样一个核心直觉：一个准确的扭曲场应该产生连续的时空运动。尽管在扭曲场上执行运动约束并非易事，我们展示了如何利用前向扭曲场网络固有的知识来推导出一个解析速度场，然后进行时间积分以有效地约束高斯的2D运动和3D位置。这种派生的Lucas-Kanade风格解析规范使我们的方法在重建高动态场景方面实现了卓越的性能，即使在相机运动最小的情况下，也扩展了现有动态高斯喷溅框架所能达到的边界。\n\n"
  },
  {
    "path": "abs/2407.11343.md",
    "content": "### Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering\n\nComputational neuromorphic imaging (CNI) with event cameras offers advantages such as minimal motion blur and enhanced dynamic range, compared to conventional frame-based methods. Existing event-based radiance field rendering methods are built on neural radiance field, which is computationally heavy and slow in reconstruction speed. Motivated by the two aspects, we introduce Ev-GS, the first CNI-informed scheme to infer 3D Gaussian splatting from a monocular event camera, enabling efficient novel view synthesis. Leveraging 3D Gaussians with pure event-based supervision, Ev-GS overcomes challenges such as the detection of fast-moving objects and insufficient lighting. Experimental results show that Ev-GS outperforms the method that takes frame-based signals as input by rendering realistic views with reduced blurring and improved visual quality. Moreover, it demonstrates competitive reconstruction quality and reduced computing occupancy compared to existing methods, which paves the way to a highly efficient CNI approach for signal processing.\n\n计算神经形态成像（CNI）使用事件相机提供了最小的运动模糊和增强的动态范围等优势，与传统的基于帧的方法相比。现有基于事件的辐射场渲染方法构建在神经辐射场上，这种方法计算量大且重建速度慢。受到这两个方面的启发，我们介绍了Ev-GS，这是第一个基于CNI的方案，从单目事件相机推断3D高斯喷溅，实现高效的新视角合成。通过利用3D高斯和纯事件基的监督，Ev-GS克服了快速移动对象检测和光照不足等挑战。实验结果表明，与采用基于帧的信号输入的方法相比，Ev-GS通过渲染真实视图减少模糊并提高视觉质量而胜出。此外，它显示出与现有方法相比具有竞争性的重建质量和降低的计算占用，为高效的CNI信号处理方法铺平了道路。\n"
  },
  {
    "path": "abs/2407.11793.md",
    "content": "### Click-Gaussian: Interactive Segmentation to Any 3D Gaussians\n\nInteractive segmentation of 3D Gaussians opens a great opportunity for real-time manipulation of 3D scenes thanks to the real-time rendering capability of 3D Gaussian Splatting. However, the current methods suffer from time-consuming post-processing to deal with noisy segmentation output. Also, they struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D scenes. In this study, we propose Click-Gaussian, which learns distinguishable feature fields of two-level granularity, facilitating segmentation without time-consuming post-processing. We delve into challenges stemming from inconsistently learned feature fields resulting from 2D segmentation obtained independently from a 3D scene. 3D segmentation accuracy deteriorates when 2D segmentation results across the views, primary cues for 3D segmentation, are in conflict. To overcome these issues, we propose Global Feature-guided Learning (GFL). GFL constructs the clusters of global feature candidates from noisy 2D segments across the views, which smooths out noises when training the features of 3D Gaussians. Our method runs in 10 ms per click, 15 to 130 times as fast as the previous methods, while also significantly improving segmentation accuracy.\n\n交互式3D高斯分割为实时操作3D场景提供了巨大的机会，这得益于3D高斯喷溅的实时渲染能力。然而，当前方法因为要处理噪声分割输出而遭受耗时的后处理问题。同时，它们难以提供详细的分割，这对于精细操作3D场景非常重要。在本研究中，我们提出了Click-Gaussian，该方法学习两级粒度的可区分特征场，促进了无需耗时后处理的分割。我们深入探讨了由独立于3D场景获得的2D分割产生的特征场学习不一致性带来的挑战。当2D分割结果（3D分割的主要线索）在不同视图中存在冲突时，3D分割精度会恶化。为克服这些问题，我们提出了全局特征引导学习（GFL）。GFL从各视图的噪声2D分割中构建全局特征候选者群，这在训练3D高斯的特征时平滑了噪声。我们的方法每次点击运行时间为10毫秒，比以前的方法快15到130倍，同时显著提高了分割精度。\n\n"
  },
  {
    "path": "abs/2407.11840.md",
    "content": "### MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification\n\nIn the rapidly evolving field of 3D reconstruction, 3D Gaussian Splatting (3DGS) and 2D Gaussian Splatting (2DGS) represent significant advancements. Although 2DGS compresses 3D Gaussian primitives into 2D Gaussian surfels to effectively enhance mesh extraction quality, this compression can potentially lead to a decrease in rendering quality. Additionally, unreliable densification processes and the calculation of depth through the accumulation of opacity can compromise the detail of mesh extraction. To address this issue, we introduce MVG-Splatting, a solution guided by Multi-View considerations. Specifically, we integrate an optimized method for calculating normals, which, combined with image gradients, helps rectify inconsistencies in the original depth computations. Additionally, utilizing projection strategies akin to those in Multi-View Stereo (MVS), we propose an adaptive quantile-based method that dynamically determines the level of additional densification guided by depth maps, from coarse to fine detail. Experimental evidence demonstrates that our method not only resolves the issues of rendering quality degradation caused by depth discrepancies but also facilitates direct mesh extraction from dense Gaussian point clouds using the Marching Cubes algorithm. This approach significantly enhances the overall fidelity and accuracy of the 3D reconstruction process, ensuring that both the geometric details and visual quality.\n\n在快速发展的3D重建领域中，3D高斯喷溅（3DGS）和2D高斯喷溅（2DGS）代表了重大进展。尽管2DGS将3D高斯原始体压缩为2D高斯面元，有效地提高了网格提取质量，但这种压缩可能会导致渲染质量下降。此外，不可靠的增密过程和通过不透明度累积计算深度可能会损害网格提取的细节。为解决这一问题，我们引入了MVG喷溅，这是一种受多视角考虑指导的解决方案。具体来说，我们整合了一种优化的法线计算方法，结合图像梯度，帮助纠正原始深度计算中的不一致。此外，利用类似于多视图立体（MVS）中的投影策略，我们提出了一种自适应的分位数基方法，动态确定从粗糙到细节的额外增密级别，该级别由深度图指导。实验证据表明，我们的方法不仅解决了由深度差异引起的渲染质量降低的问题，而且还通过使用Marching Cubes算法直接从密集的高斯点云中提取网格，显著提高了整个3D重建过程的整体保真度和准确性，确保了几何细节和视觉质量。\n\n"
  },
  {
    "path": "abs/2407.12306.md",
    "content": "### Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections\n\nNovel view synthesis from unconstrained in-the-wild image collections remains a significant yet challenging task due to photometric variations and transient occluders that complicate accurate scene reconstruction. Previous methods have approached these issues by integrating per-image appearance features embeddings in Neural Radiance Fields (NeRFs). Although 3D Gaussian Splatting (3DGS) offers faster training and real-time rendering, adapting it for unconstrained image collections is non-trivial due to the substantially different architecture. In this paper, we introduce Splatfacto-W, an approach that integrates per-Gaussian neural color features and per-image appearance embeddings into the rasterization process, along with a spherical harmonics-based background model to represent varying photometric appearances and better depict backgrounds. Our key contributions include latent appearance modeling, efficient transient object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios. Our method improves the Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB compared to 3DGS, enhances training speed by 150 times compared to NeRF-based methods, and achieves a similar rendering speed to 3DGS.\n\n从无约束的自然图像集合中进行新视角合成一直是一个重大且具有挑战性的任务，因为光度变化和瞬时遮挡物使得准确的场景重建变得复杂。先前的方法通过在神经辐射场（NeRFs）中集成每张图片的外观特征嵌入来处理这些问题。尽管三维高斯投影（3DGS）提供了更快的训练速度和实时渲染，但由于架构差异显著，将其适用于无约束的图像集合并非易事。在本文中，我们介绍了一种名为Splatfacto-W的方法，该方法将每个高斯的神经颜色特征和每张图像的外观嵌入整合到光栅化过程中，并结合球谐函数背景模型来表示变化的光度外观和更好地描述背景。我们的主要贡献包括潜在外观建模、高效的瞬时物体处理和精确的背景建模。Splatfacto-W提供了高质量的实时新视角合成，改善了在自然场景中的场景一致性。我们的方法与3DGS相比，平均提高了5.3 dB的峰值信噪比（PSNR），训练速度比基于NeRF的方法快150倍，并且达到了与3DGS类似的渲染速度。\n"
  },
  {
    "path": "abs/2407.12777.md",
    "content": "### Generalizable Human Gaussians for Sparse View Synthesis\n\nRecent progress in neural rendering has brought forth pioneering methods, such as NeRF and Gaussian Splatting, which revolutionize view rendering across various domains like AR/VR, gaming, and content creation. While these methods excel at interpolating {\\em within the training data}, the challenge of generalizing to new scenes and objects from very sparse views persists. Specifically, modeling 3D humans from sparse views presents formidable hurdles due to the inherent complexity of human geometry, resulting in inaccurate reconstructions of geometry and textures. To tackle this challenge, this paper leverages recent advancements in Gaussian Splatting and introduces a new method to learn generalizable human Gaussians that allows photorealistic and accurate view-rendering of a new human subject from a limited set of sparse views in a feed-forward manner. A pivotal innovation of our approach involves reformulating the learning of 3D Gaussian parameters into a regression process defined on the 2D UV space of a human template, which allows leveraging the strong geometry prior and the advantages of 2D convolutions. In addition, a multi-scaffold is proposed to effectively represent the offset details. Our method outperforms recent methods on both within-dataset generalization as well as cross-dataset generalization settings.\n\n近期在神经渲染领域的进展催生了一些开创性的方法，例如神经辐射场（NeRF）和高斯投影，这些方法在增强现实/虚拟现实、游戏以及内容创作等多个领域革新了视图渲染。虽然这些方法在插值训练数据方面表现出色，但在从非常稀疏的视角推广到新场景和对象的挑战仍然存在。特别是，从稀疏视角对三维人类进行建模面临巨大障碍，因为人类几何形态的固有复杂性，导致几何和纹理重建的不准确。为了应对这一挑战，本文利用高斯投影的最新进展，并引入了一种新方法学习可推广的人类高斯，这种方法能够以前馈方式从有限的稀疏视角实现新人类对象的逼真和精确视图渲染。我们方法的一个关键创新在于将学习三维高斯参数的过程重新定义为在人类模板的二维UV空间上进行的回归过程，这使得我们能够利用强大的几何先验和二维卷积的优势。此外，还提出了一个多脚手架模型来有效地表示偏移细节。我们的方法在数据集内外的泛化能力上均优于最近的方法。\n"
  },
  {
    "path": "abs/2407.13520.md",
    "content": "### EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting\n\n3D deblurring reconstruction techniques have recently seen significant advancements with the development of Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although these techniques can recover relatively clear 3D reconstructions from blurry image inputs, they still face limitations in handling severe blurring and complex camera motion. To address these issues, we propose Event-assisted 3D Deblur Reconstruction with Gaussian Splatting (EaDeblur-GS), which integrates event camera data to enhance the robustness of 3DGS against motion blur. By employing an Adaptive Deviation Estimator (ADE) network to estimate Gaussian center deviations and using novel loss functions, EaDeblur-GS achieves sharp 3D reconstructions in real-time, demonstrating performance comparable to state-of-the-art methods.\n\n最近，随着神经辐射场（NeRF）和三维高斯投影（3DGS）的发展，三维去模糊重建技术取得了显著进展。尽管这些技术可以从模糊图像输入中恢复相对清晰的三维重建，但在处理严重模糊和复杂的相机运动时仍存在局限。为了解决这些问题，我们提出了一种名为事件辅助的三维去模糊重建技术，结合高斯投影（EaDeblur-GS），该技术整合了事件相机数据以增强3DGS对运动模糊的鲁棒性。通过采用自适应偏差估计器（ADE）网络来估计高斯中心的偏差，并使用新颖的损失函数，EaDeblur-GS能够实时实现清晰的三维重建，其性能可与最先进的方法媲美。\n"
  },
  {
    "path": "abs/2407.13584.md",
    "content": "### Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation\n\nAlthough recent advancements in text-to-3D generation have significantly improved generation quality, issues like limited level of detail and low fidelity still persist, which requires further improvement. To understand the essence of those issues, we thoroughly analyze current score distillation methods by connecting theories of consistency distillation to score distillation. Based on the insights acquired through analysis, we propose an optimization framework, Guided Consistency Sampling (GCS), integrated with 3D Gaussian Splatting (3DGS) to alleviate those issues. Additionally, we have observed the persistent oversaturation in the rendered views of generated 3D assets. From experiments, we find that it is caused by unwanted accumulated brightness in 3DGS during optimization. To mitigate this issue, we introduce a Brightness-Equalized Generation (BEG) scheme in 3DGS rendering. Experimental results demonstrate that our approach generates 3D assets with more details and higher fidelity than state-of-the-art methods.\n\n尽管最近在文本到三维生成的技术进展显著提高了生成质量，但诸如细节水平有限和保真度低等问题仍然存在，这需要进一步改进。为了理解这些问题的本质，我们通过将一致性蒸馏理论与得分蒸馏相结合，对当前的得分蒸馏方法进行了深入分析。基于通过分析获得的洞察，我们提出了一个优化框架——引导一致性采样（GCS），并将其与三维高斯投影（3DGS）整合，以缓解这些问题。此外，我们还观察到生成的三维资产的渲染视图中持续存在过饱和现象。通过实验，我们发现这是由于在优化过程中3DGS中不希望的亮度累积造成的。为了缓解这一问题，我们在3DGS渲染中引入了一个亮度均衡生成（BEG）方案。实验结果表明，我们的方法生成的三维资产比现有最先进方法具有更多细节和更高的保真度。\n"
  },
  {
    "path": "abs/2407.13976.md",
    "content": "### PlacidDreamer: Advancing Harmony in Text-to-3D Generation\n\nRecently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations. Firstly, they encounter conflicts in generation directions since different models aim to produce diverse 3D assets. Secondly, the issue of over-saturation in score distillation has not been thoroughly investigated and solved. To address these limitations, we propose PlacidDreamer, a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, while simultaneously employing a novel score distillation algorithm to achieve balanced saturation. To unify the generation direction, we introduce the Latent-Plane module, a training-friendly plug-in extension that enables multi-view diffusion models to provide fast geometry reconstruction for initialization and enhanced multi-view images to personalize the text-to-image diffusion model. To address the over-saturation problem, we propose to view score distillation as a multi-objective optimization problem and introduce the Balanced Score Distillation algorithm, which offers a Pareto Optimal solution that achieves both rich details and balanced saturation. Extensive experiments validate the outstanding capabilities of our PlacidDreamer.\n\n近期,文本到3D生成技术引起了广泛关注,并取得了显著的性能提升。先前的方法使用端到端的3D生成模型来初始化3D高斯基元,利用多视图扩散模型来保持多视图一致性,并应用文本到图像扩散模型配合分数蒸馏算法来细化细节。然而,这些方法存在两个局限性。首先,由于不同模型旨在生成多样化的3D资产,它们在生成方向上存在冲突。其次,分数蒸馏中的过饱和问题尚未得到充分研究和解决。为了解决这些局限性,我们提出了PlacidDreamer,这是一个文本到3D的框架,它通过单一的多视图扩散模型协调初始化、多视图生成和文本条件生成,同时采用新颖的分数蒸馏算法来实现均衡的饱和度。为了统一生成方向,我们引入了Latent-Plane模块,这是一个易于训练的插件式扩展,使多视图扩散模型能够为初始化提供快速的几何重建,并生成增强的多视图图像来个性化文本到图像扩散模型。为了解决过饱和问题,我们将分数蒸馏视为多目标优化问题,并提出了均衡分数蒸馏算法,该算法提供了帕累托最优解,既能实现丰富的细节,又能保持均衡的饱和度。大量实验验证了我们的PlacidDreamer具有出色的能力。\n"
  },
  {
    "path": "abs/2407.14108.md",
    "content": "### GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation\n\nThe Bird's-eye View (BeV) representation is widely used for 3D perception from multi-view camera images. It allows to merge features from different cameras into a common space, providing a unified representation of the 3D scene. The key component is the view transformer, which transforms image views into the BeV. However, actual view transformer methods based on geometry or cross-attention do not provide a sufficiently detailed representation of the scene, as they use a sub-sampling of the 3D space that is non-optimal for modeling the fine structures of the environment. In this paper, we propose GaussianBeV, a novel method for transforming image features to BeV by finely representing the scene using a set of 3D gaussians located and oriented in 3D space. This representation is then splattered to produce the BeV feature map by adapting recent advances in 3D representation rendering based on gaussian splatting. GaussianBeV is the first approach to use this 3D gaussian modeling and 3D scene rendering process online, i.e. without optimizing it on a specific scene and directly integrated into a single stage model for BeV scene understanding. Experiments show that the proposed representation is highly effective and place GaussianBeV as the new state-of-the-art on the BeV semantic segmentation task on the nuScenes dataset.\n\n鸟瞰图(BeV)表示法被广泛用于多视角相机图像的3D感知。它允许将不同相机的特征合并到一个公共空间中,为3D场景提供统一的表示。其关键组件是视图转换器,它将图像视图转换为BeV。然而,基于几何或交叉注意力的现有视图转换器方法无法提供足够详细的场景表示,因为它们使用的3D空间子采样对于建模环境的精细结构来说并不理想。在本文中,我们提出了GaussianBeV,这是一种新颖的方法,通过使用一组位于3D空间中并具有特定朝向的3D高斯分布来精细地表示场景,从而将图像特征转换为BeV。然后,通过改编最近基于高斯喷溅的3D表示渲染进展,将这种表示喷溅生成BeV特征图。GaussianBeV是第一个在线使用这种3D高斯建模和3D场景渲染过程的方法,即无需针对特定场景进行优化,而是直接集成到单阶段模型中用于BeV场景理解。实验表明,所提出的表示方法非常有效,使GaussianBeV成为nuScenes数据集上BeV语义分割任务的新的最先进技术。\n"
  },
  {
    "path": "abs/2407.14846.md",
    "content": "### Realistic Surgical Image Dataset Generation Based On 3D Gaussian Splatting\n\nComputer vision technologies markedly enhance the automation capabilities of robotic-assisted minimally invasive surgery (RAMIS) through advanced tool tracking, detection, and localization. However, the limited availability of comprehensive surgical datasets for training represents a significant challenge in this field. This research introduces a novel method that employs 3D Gaussian Splatting to generate synthetic surgical datasets. We propose a method for extracting and combining 3D Gaussian representations of surgical instruments and background operating environments, transforming and combining them to generate high-fidelity synthetic surgical scenarios. We developed a data recording system capable of acquiring images alongside tool and camera poses in a surgical scene. Using this pose data, we synthetically replicate the scene, thereby enabling direct comparisons of the synthetic image quality (29.592 PSNR). As a further validation, we compared two YOLOv5 models trained on the synthetic and real data, respectively, and assessed their performance in an unseen real-world test dataset. Comparing the performances, we observe an improvement in neural network performance, with the synthetic-trained model outperforming the real-world trained model by 12%, testing both on real-world data.\n\n计算机视觉技术通过先进的工具跟踪、检测和定位显著提高了机器人辅助微创手术(RAMIS)的自动化能力。然而,用于训练的综合外科数据集的有限可用性在该领域代表了一个重大挑战。本研究引入了一种新颖的方法,利用3D高斯喷溅技术生成合成外科数据集。我们提出了一种方法,用于提取和组合外科器械和背景手术环境的3D高斯表示,对它们进行变换和组合以生成高保真度的合成手术场景。我们开发了一个数据记录系统,能够在手术场景中同时获取图像以及工具和相机的姿态。使用这些姿态数据,我们合成地复制了场景,从而能够直接比较合成图像的质量(29.592 PSNR)。作为进一步的验证,我们比较了分别在合成数据和真实数据上训练的两个YOLOv5模型,并评估了它们在未见过的真实世界测试数据集上的表现。通过比较性能,我们观察到神经网络性能有所提升,在真实世界数据的测试中,合成数据训练的模型比真实世界数据训练的模型表现提高了12%。\n"
  },
  {
    "path": "abs/2407.15070.md",
    "content": "### 3D Gaussian Parametric Head Model\n\nCreating high-fidelity 3D human head avatars is crucial for applications in VR/AR, telepresence, digital human interfaces, and film production. Recent advances have leveraged morphable face models to generate animated head avatars from easily accessible data, representing varying identities and expressions within a low-dimensional parametric space. However, existing methods often struggle with modeling complex appearance details, e.g., hairstyles and accessories, and suffer from low rendering quality and efficiency. This paper introduces a novel approach, 3D Gaussian Parametric Head Model, which employs 3D Gaussians to accurately represent the complexities of the human head, allowing precise control over both identity and expression. Additionally, it enables seamless face portrait interpolation and the reconstruction of detailed head avatars from a single image. Unlike previous methods, the Gaussian model can handle intricate details, enabling realistic representations of varying appearances and complex expressions. Furthermore, this paper presents a well-designed training framework to ensure smooth convergence, providing a guarantee for learning the rich content. Our method achieves high-quality, photo-realistic rendering with real-time efficiency, making it a valuable contribution to the field of parametric head models.\n\n创建高保真度的3D人头头像对于虚拟现实/增强现实、远程存在、数字人界面和电影制作等应用至关重要。近期的进展利用了可变形面部模型,从易于获取的数据生成动画头像,在低维参数空间内表示不同的身份和表情。然而,现有方法往往难以模拟复杂的外观细节,如发型和配饰,并且存在渲染质量低和效率低的问题。本文介绍了一种新颖的方法,3D高斯参数化头部模型,该模型采用3D高斯分布精确表示人头的复杂性,允许对身份和表情进行精确控制。此外,它还能实现无缝的面部肖像插值和从单一图像重建详细的头像。与之前的方法不同,高斯模型可以处理复杂的细节,能够逼真地表现各种外观和复杂的表情。此外,本文提出了一个精心设计的训练框架,以确保平稳收敛,为学习丰富内容提供保证。我们的方法实现了高质量、真实感的渲染,同时具有实时效率,为参数化头部模型领域做出了宝贵贡献。\n"
  },
  {
    "path": "abs/2407.15187.md",
    "content": "### HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions\n\n3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry. Owing to the powerful generative capabilities of text-to-image diffusion models that provide reliable priors, the creation of 3D scenes using only text prompts has become viable, thereby significantly advancing researches in text-driven 3D scene generation. In order to obtain multiple-view supervision from 2D diffusion models, prevailing methods typically employ the diffusion model to generate an initial local image, followed by iteratively outpainting the local image using diffusion models to gradually generate scenes. Nevertheless, these outpainting-based approaches prone to produce global inconsistent scene generation results without high degree of completeness, restricting their broader applications. To tackle these problems, we introduce HoloDreamer, a framework that first generates high-definition panorama as a holistic initialization of the full 3D scene, then leverage 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes. Specifically, we propose Stylized Equirectangular Panorama Generation, a pipeline that combines multiple diffusion models to enable stylized and detailed equirectangular panorama generation from complex text prompts. Subsequently, Enhanced Two-Stage Panorama Reconstruction is introduced, conducting a two-stage optimization of 3D-GS to inpaint the missing region and enhance the integrity of the scene. Comprehensive experiments demonstrated that our method outperforms prior works in terms of overall visual consistency and harmony as well as reconstruction quality and rendering robustness when generating fully enclosed scenes.\n\n3D场景生成在虚拟现实、游戏和电影行业等多个领域有着高度需求。由于文本到图像扩散模型强大的生成能力提供了可靠的先验知识,仅使用文本提示创建3D场景已成为可能,从而显著推进了文本驱动的3D场景生成研究。为了从2D扩散模型获得多视角监督,主流方法通常使用扩散模型生成初始局部图像,然后通过迭代使用扩散模型对局部图像进行外扩来逐步生成场景。然而,这些基于外扩的方法容易产生全局不一致的场景生成结果,且完整度不高,限制了它们的广泛应用。为了解决这些问题,我们提出了HoloDreamer,这是一个首先生成高清全景图作为整个3D场景的整体初始化,然后利用3D高斯喷溅(3D-GS)快速重建3D场景的框架,从而促进创建视角一致且完全封闭的3D场景。具体来说,我们提出了风格化等距矩形全景图生成,这是一个结合多个扩散模型的管道,能够从复杂的文本提示生成风格化和详细的等距矩形全景图。随后,我们引入了增强型两阶段全景图重建,对3D-GS进行两阶段优化,以修补缺失区域并提高场景的完整性。全面的实验表明,在生成完全封闭的场景时,我们的方法在整体视觉一致性和和谐性以及重建质量和渲染稳健性方面优于先前的工作。\n"
  },
  {
    "path": "abs/2407.15435.md",
    "content": "### Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures\n\nThe photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. It also plays an important role in urban planning, architectural design, and the city's promotion, especially in protecting historical and cultural relics. The 3D Gaussian Splatting, due to better performance over NeRF, has become a mainstream technology in 3D reconstruction. Its only input is a set of images but it relies heavily on geometric parameters computed by the SfM process. At the same time, there is an existing abundance of raw 3D models, that could inform the structural perception of certain buildings but cannot be applied. In this paper, we propose a straightforward method to harness these raw 3D models to guide 3D Gaussians in capturing the basic shape of the building and improve the visual quality of textures and details when photos are captured non-systematically. This exploration opens up new possibilities for improving the effectiveness of 3D reconstruction techniques in the field of architectural design.\n\n建筑场景的真实感重建和渲染在电影、游戏和交通等行业有广泛应用。它在城市规划、建筑设计和城市宣传中也发挥着重要作用,尤其是在保护历史文化遗迹方面。由于比NeRF表现更好,3D高斯喷溅已成为3D重建的主流技术。它的唯一输入是一组图像,但它严重依赖于SfM过程计算的几何参数。同时,现有大量原始3D模型可以提供某些建筑的结构感知,但无法直接应用。在本文中,我们提出了一种直接的方法来利用这些原始3D模型来指导3D高斯分布捕捉建筑物的基本形状,并在非系统性拍摄照片时改善纹理和细节的视觉质量。这一探索为提高建筑设计领域3D重建技术的有效性开辟了新的可能性。\n"
  },
  {
    "path": "abs/2407.15484.md",
    "content": "### 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model\n\nWe propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g. iNeRF) that also require an initialization of the camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each ellipsoid that parameterize the 3DGS model. Each Ellicell ray is associated with the rendering parameters of each ellipsoid, which in turn is used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best scoring bundle of rays, which their intersection provides the camera center and, in turn, the camera rotation. The proposed solution obviates the necessity of an \"a priori\" pose for initialization, and it solves 6DoF pose estimation in closed form, without the need for iterations. Moreover, compared to the existing Novel View Synthesis (NVS) baselines for pose estimation, 6DGS can improve the overall average rotational accuracy by 12% and translation accuracy by 22% on real scenes, despite not requiring any initialization pose. At the same time, our method operates near real-time, reaching 15fps on consumer hardware.\n\n我们提出了6DGS,用于在给定表示场景的3D高斯喷溅(3DGS)模型的情况下估计目标RGB图像的相机姿态。6DGS避免了分析合成方法(如iNeRF)典型的迭代过程,这些方法还需要对相机姿态进行初始化才能收敛。相反,我们的方法通过反转3DGS渲染过程来估计6自由度姿态。从物体表面开始,我们定义了一个辐射椭胞体(Ellicell),它均匀地生成从参数化3DGS模型的每个椭球体出发的射线。每个椭胞体射线与每个椭球体的渲染参数相关联,这反过来用于获得目标图像像素和投射射线之间的最佳绑定。然后对这些像素-射线绑定进行排序,以选择得分最高的射线束,其交点提供相机中心,进而确定相机旋转。提出的解决方案避免了需要\"先验\"姿态进行初始化,并以闭式形式解决6自由度姿态估计,无需迭代。此外,与用于姿态估计的现有新视角合成(NVS)基准相比,6DGS在真实场景中可以将整体平均旋转精度提高12%,平移精度提高22%,尽管不需要任何初始化姿态。同时,我们的方法接近实时运行,在消费级硬件上达到15fps。\n"
  },
  {
    "path": "abs/2407.16173.md",
    "content": "### Integrating Meshes and 3D Gaussians for Indoor Scene Reconstruction with SAM Mask Guidance\n\nWe present a novel approach for 3D indoor scene reconstruction that combines 3D Gaussian Splatting (3DGS) with mesh representations. We use meshes for the room layout of the indoor scene, such as walls, ceilings, and floors, while employing 3D Gaussians for other objects. This hybrid approach leverages the strengths of both representations, offering enhanced flexibility and ease of editing. However, joint training of meshes and 3D Gaussians is challenging because it is not clear which primitive should affect which part of the rendered image. Objects close to the room layout often struggle during training, particularly when the room layout is textureless, which can lead to incorrect optimizations and unnecessary 3D Gaussians. To overcome these challenges, we employ Segment Anything Model (SAM) to guide the selection of primitives. The SAM mask loss enforces each instance to be represented by either Gaussians or meshes, ensuring clear separation and stable training. Furthermore, we introduce an additional densification stage without resetting the opacity after the standard densification. This stage mitigates the degradation of image quality caused by a limited number of 3D Gaussians after the standard densification.\n\n我们提出了一种新颖的 3D 室内场景重建方法，结合了 3D 高斯溅射(3DGS)和网格表示。我们使用网格来表示室内场景的房间布局，如墙壁、天花板和地板，同时使用 3D 高斯函数来表示其他物体。这种混合方法利用了两种表示方式的优势，提供了更高的灵活性和更容易的编辑能力。然而，网格和 3D 高斯函数的联合训练具有挑战性，因为不清楚哪个基元应该影响渲染图像的哪个部分。靠近房间布局的物体在训练过程中经常遇到困难，特别是当房间布局没有纹理时，这可能导致不正确的优化和不必要的 3D 高斯函数。为了克服这些挑战，我们采用分割任意物体模型(SAM)来指导基元的选择。SAM 掩码损失强制每个实例要么由高斯函数表示，要么由网格表示，确保清晰的分离和稳定的训练。此外，我们引入了一个额外的密集化阶段，在标准密集化之后不重置不透明度。这个阶段缓解了标准密集化后由于 3D 高斯函数数量有限而导致的图像质量下降问题。\n"
  },
  {
    "path": "abs/2407.16503.md",
    "content": "### HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images\n\nThe recent advent of 3D Gaussian Splatting (3DGS) has revolutionized the 3D scene reconstruction space enabling high-fidelity novel view synthesis in real-time. However, with the exception of RawNeRF, all prior 3DGS and NeRF-based methods rely on 8-bit tone-mapped Low Dynamic Range (LDR) images for scene reconstruction. Such methods struggle to achieve accurate reconstructions in scenes that require a higher dynamic range. Examples include scenes captured in nighttime or poorly lit indoor spaces having a low signal-to-noise ratio, as well as daylight scenes with shadow regions exhibiting extreme contrast. Our proposed method HDRSplat tailors 3DGS to train directly on 14-bit linear raw images in near darkness which preserves the scenes' full dynamic range and content. Our key contributions are two-fold: Firstly, we propose a linear HDR space-suited loss that effectively extracts scene information from noisy dark regions and nearly saturated bright regions simultaneously, while also handling view-dependent colors without increasing the degree of spherical harmonics. Secondly, through careful rasterization tuning, we implicitly overcome the heavy reliance and sensitivity of 3DGS on point cloud initialization. This is critical for accurate reconstruction in regions of low texture, high depth of field, and low illumination. HDRSplat is the fastest method to date that does 14-bit (HDR) 3D scene reconstruction in ≤15 minutes/scene (∼30x faster than prior state-of-the-art RawNeRF). It also boasts the fastest inference speed at ≥120fps. We further demonstrate the applicability of our HDR scene reconstruction by showcasing various applications like synthetic defocus, dense depth map extraction, and post-capture control of exposure, tone-mapping and view-point.\n\n最近出现的 3D 高斯溅射(3DGS)技术彻底改变了 3D 场景重建领域，实现了实时高保真新视角合成。然而，除了 RawNeRF 之外，所有先前的基于 3DGS 和 NeRF 的方法都依赖于 8 位色调映射的低动态范围(LDR)图像进行场景重建。这些方法在需要更高动态范围的场景中难以实现准确重建。例如，在夜间或光线不足的室内空间拍摄的场景具有低信噪比，以及具有极端对比度阴影区域的日光场景。\n我们提出的 HDRSplat 方法将 3DGS 定制为直接在接近黑暗的 14 位线性原始图像上进行训练，从而保留场景的全动态范围和内容。我们的主要贡献有两个方面：\n首先，我们提出了一种适用于线性 HDR 空间的损失函数，可以同时有效地从噪声较大的暗区和几乎饱和的亮区提取场景信息，同时在不增加球谐函数阶数的情况下处理视角依赖的颜色。\n其次，通过精心调整光栅化，我们隐式地克服了 3DGS 对点云初始化的严重依赖和敏感性。这对于在低纹理、高景深和低照度区域进行准确重建至关重要。\nHDRSplat 是迄今为止最快的 14 位(HDR)3D 场景重建方法，每个场景仅需 ≤15 分钟（比之前最先进的 RawNeRF 快约 30 倍）。它还拥有最快的推理速度，达到 ≥120fps。我们进一步展示了我们的 HDR 场景重建的应用性，展示了各种应用，如合成散焦、密集深度图提取，以及后期捕捉控制曝光、色调映射和视点。\n"
  },
  {
    "path": "abs/2407.16600.md",
    "content": "### DHGS: Decoupled Hybrid Gaussian Splatting for Driving Scene\n\nExisting Gaussian splatting methods struggle to achieve satisfactory novel view synthesis in driving scenes due to the lack of crafty design and geometric constraints of related elements. This paper introduces a novel method called Decoupled Hybrid Gaussian Splatting (DHGS), which aims at promoting the rendering quality of novel view synthesis for driving scenes. The novelty of this work lies in the decoupled and hybrid pixel-level blender for road and non-road layers, without conventional unified differentiable rendering logic for the entire scene, meanwhile maintaining consistent and continuous superimposition through the proposed depth-ordered rendering strategy. Beyond that, an implicit road representation comprised of Signed Distance Field (SDF) is trained to supervise the road surface with subtle geometric attributes. Accompanied by the use of auxiliary transmittance loss and consistency loss, novel images with imperceptible boundary and elevated fidelity are ultimately obtained. Substantial experiments on Waymo dataset prove that DHGS outperforms the state-of-the-art methods.\n\n现有的高斯溅射方法在驾驶场景中难以实现令人满意的新视角合成，这是由于缺乏相关元素的巧妙设计和几何约束。本文提出了一种新颖的方法，称为解耦混合高斯溅射(Decoupled Hybrid Gaussian Splatting, DHGS)，旨在提高驾驶场景新视角合成的渲染质量。该工作的创新之处在于为道路和非道路层设计了解耦和混合的像素级混合器，而不是采用传统的整个场景统一可微渲染逻辑，同时通过提出的深度排序渲染策略保持一致和连续的叠加。\n此外，我们训练了一个由符号距离场(Signed Distance Field, SDF)组成的隐式道路表示，用于监督具有微妙几何属性的道路表面。在辅助透明度损失和一致性损失的使用下，最终获得了具有难以察觉的边界和提高保真度的新颖图像。在 Waymo 数据集上进行的大量实验证明，DHGS 的性能优于现有最先进的方法。\n"
  },
  {
    "path": "abs/2407.17418.md",
    "content": "### 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities\n\n3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations. It can effectively transform multi-view images into explicit 3D Gaussian representations through efficient training, and achieve real-time rendering of novel views. This survey aims to analyze existing 3DGS-related works from multiple intersecting perspectives, including related tasks, technologies, challenges, and opportunities. The primary objective is to provide newcomers with a rapid understanding of the field and to assist researchers in methodically organizing existing technologies and challenges. Specifically, we delve into the optimization, application, and extension of 3DGS, categorizing them based on their focuses or motivations. Additionally, we summarize and classify nine types of technical modules and corresponding improvements identified in existing works. Based on these analyses, we further examine the common challenges and technologies across various tasks, proposing potential research opportunities.\n\n3D 高斯溅射(3DGS)已经成为一种突出的技术，有潜力成为 3D 表示的主流方法。它可以通过高效的训练将多视角图像有效地转换为显式的 3D 高斯表示，并实现新视角的实时渲染。本综述旨在从多个交叉的角度分析现有的 3DGS 相关工作，包括相关任务、技术、挑战和机遇。主要目标是帮助新人快速了解该领域，并协助研究人员系统地组织现有技术和挑战。\n具体而言，我们深入研究了 3DGS 的优化、应用和扩展，并根据它们的关注点或动机进行分类。此外，我们总结和分类了在现有工作中发现的九种技术模块和相应的改进。基于这些分析，我们进一步研究了各种任务中的共同挑战和技术，提出了潜在的研究机会。\n"
  },
  {
    "path": "abs/2407.19035.md",
    "content": "### ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting\n\nThe creation of high-quality 3D assets is paramount for applications in digital heritage preservation, entertainment, and robotics. Traditionally, this process necessitates skilled professionals and specialized software for the modeling, texturing, and rendering of 3D objects. However, the rising demand for 3D assets in gaming and virtual reality (VR) has led to the creation of accessible image-to-3D technologies, allowing non-professionals to produce 3D content and decreasing dependence on expert input. Existing methods for 3D content generation struggle to simultaneously achieve detailed textures and strong geometric consistency. We introduce a novel 3D content creation framework, ScalingGaussian, which combines 3D and 2D diffusion models to achieve detailed textures and geometric consistency in generated 3D assets. Initially, a 3D diffusion model generates point clouds, which are then densified through a process of selecting local regions, introducing Gaussian noise, followed by using local density-weighted selection. To refine the 3D gaussians, we utilize a 2D diffusion model with Score Distillation Sampling (SDS) loss, guiding the 3D Gaussians to clone and split. Finally, the 3D Gaussians are converted into meshes, and the surface textures are optimized using Mean Square Error(MSE) and Gradient Profile Prior(GPP) losses. Our method addresses the common issue of sparse point clouds in 3D diffusion, resulting in improved geometric structure and detailed textures. Experiments on image-to-3D tasks demonstrate that our approach efficiently generates high-quality 3D assets.\n\n创建高质量的3D资产在数字遗产保护、娱乐和机器人技术应用中至关重要。传统上，这一过程需要专业人士和专门的软件来进行3D对象的建模、纹理贴图和渲染。然而，随着游戏和虚拟现实（VR）对3D资产需求的上升，出现了便捷的图像转3D技术，使非专业人士能够制作3D内容，从而减少对专家的依赖。现有的3D内容生成方法在同时实现详细纹理和强几何一致性方面存在困难。我们提出了一种新型的3D内容创建框架——ScalingGaussian，该框架结合了3D和2D扩散模型，以在生成的3D资产中实现详细纹理和几何一致性。首先，3D扩散模型生成点云，然后通过选择局部区域、引入高斯噪声以及局部密度加权选择的过程来加密这些点云。为了精炼3D高斯分布，我们利用带有评分蒸馏采样（SDS）损失的2D扩散模型，指导3D高斯分布进行克隆和拆分。最后，将3D高斯分布转换为网格，并使用均方误差（MSE）和梯度轮廓先验（GPP）损失优化表面纹理。我们的方法解决了3D扩散中的点云稀疏问题，改善了几何结构和纹理细节。对图像转3D任务的实验表明，我们的方法能够高效生成高质量的3D资产。\n"
  },
  {
    "path": "abs/2407.20213.md",
    "content": "### Registering Neural 4D Gaussians for Endoscopic Surgery\n\nThe recent advance in neural rendering has enabled the ability to reconstruct high-quality 4D scenes using neural networks. Although 4D neural reconstruction is popular, registration for such representations remains a challenging task, especially for dynamic scene registration in surgical planning and simulation. In this paper, we propose a novel strategy for dynamic surgical neural scene registration. We first utilize 4D Gaussian Splatting to represent the surgical scene and capture both static and dynamic scenes effectively. Then, a spatial aware feature aggregation method, Spatially Weight Cluttering (SWC) is proposed to accurately align the feature between surgical scenes, enabling precise and realistic surgical simulations. Lastly, we present a novel strategy of deformable scene registration to register two dynamic scenes. By incorporating both spatial and temporal information for correspondence matching, our approach achieves superior performance compared to existing registration methods for implicit neural representation. The proposed method has the potential to improve surgical planning and training, ultimately leading to better patient outcomes.\n\n最近的神经渲染进展使得利用神经网络重建高质量的 4D 场景成为可能。尽管 4D 神经重建很流行，但这种表示的配准仍然是一项具有挑战性的任务，尤其是在外科规划和模拟中的动态场景配准。本文提出了一种用于动态外科神经场景配准的新策略。我们首先利用 4D 高斯点光源来有效表示外科场景并捕捉静态和动态场景。然后，我们提出了一种空间感知特征聚合方法，称为空间加权杂乱（Spatially Weight Cluttering，SWC），用于准确对齐外科场景中的特征，从而实现精确且逼真的外科模拟。最后，我们提出了一种新颖的可变形场景配准策略，用于配准两个动态场景。通过结合空间和时间信息进行对应匹配，我们的方法在隐式神经表示的现有配准方法中表现优越。该方法有望改善外科规划和培训，最终提高患者的治疗效果。\n\n"
  },
  {
    "path": "abs/2407.21686.md",
    "content": "### Expressive Whole-Body 3D Gaussian Avatar\n\nFacial expression and hand motions are necessary to express our emotions and interact with the world. Nevertheless, most of the 3D human avatars modeled from a casually captured video only support body motions without facial expressions and hand motions.In this work, we present ExAvatar, an expressive whole-body 3D human avatar learned from a short monocular video. We design ExAvatar as a combination of the whole-body parametric mesh model (SMPL-X) and 3D Gaussian Splatting (3DGS). The main challenges are 1) a limited diversity of facial expressions and poses in the video and 2) the absence of 3D observations, such as 3D scans and RGBD images. The limited diversity in the video makes animations with novel facial expressions and poses non-trivial. In addition, the absence of 3D observations could cause significant ambiguity in human parts that are not observed in the video, which can result in noticeable artifacts under novel motions. To address them, we introduce our hybrid representation of the mesh and 3D Gaussians. Our hybrid representation treats each 3D Gaussian as a vertex on the surface with pre-defined connectivity information (i.e., triangle faces) between them following the mesh topology of SMPL-X. It makes our ExAvatar animatable with novel facial expressions by driven by the facial expression space of SMPL-X. In addition, by using connectivity-based regularizers, we significantly reduce artifacts in novel facial expressions and poses.\n\n面部表情和手部动作对表达情感和与世界互动至关重要。然而，大多数从随意捕获的视频中建模的3D人类头像仅支持身体动作而不支持面部表情和手部动作。在这项工作中，我们提出了ExAvatar，这是一种从短单目视频中学习到的具有表现力的全身3D人类头像。我们将ExAvatar设计为全身参数化网格模型（SMPL-X）和3D高斯点云（3DGS）的组合。主要挑战包括：1）视频中的面部表情和姿势多样性有限，2）缺乏3D观测，如3D扫描和RGBD图像。视频中的有限多样性使得带有新面部表情和姿势的动画变得复杂。此外，缺乏3D观测可能导致视频中未观察到的人体部分出现显著模糊，这可能在新动作下产生明显的伪影。为解决这些问题，我们引入了网格和3D高斯点云的混合表示。我们的混合表示将每个3D高斯点视为表面上的一个顶点，并根据SMPL-X的网格拓扑定义其连接信息（即三角形面）。这使得我们的ExAvatar能够通过SMPL-X的面部表情空间驱动，从而实现新面部表情的动画。此外，通过使用基于连接的正则化器，我们显著减少了新面部表情和姿势下的伪影。\n"
  },
  {
    "path": "abs/2408.00083.md",
    "content": "### Localized Gaussian Splatting Editing with Contextual Awareness\n\nRecent text-guided generation of individual 3D object has achieved great success using diffusion priors. However, these methods are not suitable for object insertion and replacement tasks as they do not consider the background, leading to illumination mismatches within the environment. To bridge the gap, we introduce an illumination-aware 3D scene editing pipeline for 3D Gaussian Splatting (3DGS) representation. Our key observation is that inpainting by the state-of-the-art conditional 2D diffusion model is consistent with background in lighting. To leverage the prior knowledge from the well-trained diffusion models for 3D object generation, our approach employs a coarse-to-fine objection optimization pipeline with inpainted views. In the first coarse step, we achieve image-to-3D lifting given an ideal inpainted view. The process employs 3D-aware diffusion prior from a view-conditioned diffusion model, which preserves illumination present in the conditioning image. To acquire an ideal inpainted image, we introduce an Anchor View Proposal (AVP) algorithm to find a single view that best represents the scene illumination in target region. In the second Texture Enhancement step, we introduce a novel Depth-guided Inpainting Score Distillation Sampling (DI-SDS), which enhances geometry and texture details with the inpainting diffusion prior, beyond the scope of the 3D-aware diffusion prior knowledge in the first coarse step. DI-SDS not only provides fine-grained texture enhancement, but also urges optimization to respect scene lighting. Our approach efficiently achieves local editing with global illumination consistency without explicitly modeling light transport. We demonstrate robustness of our method by evaluating editing in real scenes containing explicit highlight and shadows, and compare against the state-of-the-art text-to-3D editing methods.\n\n最近，文本引导的个体3D对象生成在扩散先验的帮助下取得了巨大成功。然而，这些方法不适用于对象插入和替换任务，因为它们没有考虑背景，导致环境中的光照不匹配。为此，我们提出了一种基于光照感知的3D场景编辑流程，适用于3D高斯点云（3DGS）表示。我们的关键观察是，最先进的条件2D扩散模型的图像修复与背景光照一致。为了利用训练良好的扩散模型在3D对象生成中的先验知识，我们的方法采用了一个从粗到细的目标优化流程，结合了修复视图。在第一个粗略步骤中，我们根据理想的修复视图实现图像到3D的提升。该过程使用来自视图条件扩散模型的3D感知扩散先验，保留了条件图像中的光照。为了获得理想的修复图像，我们引入了一种锚视图提议（AVP）算法，用于找到一个最佳视图，以代表目标区域的场景光照。在第二个纹理增强步骤中，我们引入了一种新型的深度引导修复评分蒸馏采样（DI-SDS），它通过修复扩散先验增强几何和纹理细节，超出了第一粗略步骤中3D感知扩散先验的范围。DI-SDS不仅提供了细粒度的纹理增强，还促进了优化以尊重场景光照。我们的方法高效地实现了局部编辑与全球光照一致性，无需明确建模光传输。通过在包含显著高光和阴影的真实场景中评估编辑效果，并与最先进的文本到3D编辑方法进行比较，我们展示了我们方法的鲁棒性。\n"
  },
  {
    "path": "abs/2408.00254.md",
    "content": "### LoopSparseGS: Loop Based Sparse-View Friendly Gaussian Splatting\n\nDespite the photorealistic novel view synthesis (NVS) performance achieved by the original 3D Gaussian splatting (3DGS), its rendering quality significantly degrades with sparse input views. This performance drop is mainly caused by the limited number of initial points generated from the sparse input, insufficient supervision during the training process, and inadequate regularization of the oversized Gaussian ellipsoids. To handle these issues, we propose the LoopSparseGS, a loop-based 3DGS framework for the sparse novel view synthesis task. In specific, we propose a loop-based Progressive Gaussian Initialization (PGI) strategy that could iteratively densify the initialized point cloud using the rendered pseudo images during the training process. Then, the sparse and reliable depth from the Structure from Motion, and the window-based dense monocular depth are leveraged to provide precise geometric supervision via the proposed Depth-alignment Regularization (DAR). Additionally, we introduce a novel Sparse-friendly Sampling (SFS) strategy to handle oversized Gaussian ellipsoids leading to large pixel errors. Comprehensive experiments on four datasets demonstrate that LoopSparseGS outperforms existing state-of-the-art methods for sparse-input novel view synthesis, across indoor, outdoor, and object-level scenes with various image resolutions.\n\n尽管原始的 3D 高斯点云（3DGS）在照片级真实感的新视角合成（NVS）中取得了良好的性能，但其渲染质量在稀疏输入视角下显著下降。这种性能下降主要是由于稀疏输入生成的初始点数量有限、训练过程中的监督不足以及过大的高斯椭球体的正则化不充分。为了解决这些问题，我们提出了 LoopSparseGS，一个基于循环的 3DGS 框架，用于稀疏新视角合成任务。具体来说，我们提出了一种基于循环的渐进高斯初始化（PGI）策略，该策略可以利用训练过程中的渲染伪图像迭代地稠密化初始化点云。然后，利用来自结构光束法（Structure from Motion）的稀疏且可靠的深度信息和基于窗口的密集单目深度，为提出的深度对齐正则化（DAR）提供精确的几何监督。此外，我们引入了一种新颖的稀疏友好采样（SFS）策略，以处理导致大像素误差的过大高斯椭球体。对四个数据集的全面实验表明，LoopSparseGS 在稀疏输入的新视角合成任务中，优于现有的最先进方法，适用于各种图像分辨率的室内、室外和对象级场景。\n"
  },
  {
    "path": "abs/2408.00297.md",
    "content": "### EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head\n\nWe present a novel approach for synthesizing 3D talking heads with controllable emotion, featuring enhanced lip synchronization and rendering quality. Despite significant progress in the field, prior methods still suffer from multi-view consistency and a lack of emotional expressiveness. To address these issues, we collect EmoTalk3D dataset with calibrated multi-view videos, emotional annotations, and per-frame 3D geometry. By training on the EmoTalk3D dataset, we propose a Speech-to-Geometry-to-Appearance mapping framework that first predicts faithful 3D geometry sequence from the audio features, then the appearance of a 3D talking head represented by 4D Gaussians is synthesized from the predicted geometry. The appearance is further disentangled into canonical and dynamic Gaussians, learned from multi-view videos, and fused to render free-view talking head animation. Moreover, our model enables controllable emotion in the generated talking heads and can be rendered in wide-range views. Our method exhibits improved rendering quality and stability in lip motion generation while capturing dynamic facial details such as wrinkles and subtle expressions. Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads.\n\n我们提出了一种新颖的方法，用于合成可控情感的三维说话头部模型，具有增强的唇部同步性和渲染质量。尽管该领域取得了显著进展，但现有方法仍存在多视角一致性差和情感表现不足的问题。为解决这些问题，我们收集了带有校准多视角视频、情感标注和每帧三维几何数据的EmoTalk3D数据集。通过在EmoTalk3D数据集上训练，我们提出了一个从语音到几何再到外观的映射框架，该框架首先根据音频特征预测忠实的三维几何序列，然后从预测的几何中合成由4D高斯表示的三维说话头部的外观。外观进一步被解构为从多视角视频中学习到的标准和动态高斯，并融合以渲染自由视角的说话头部动画。此外，我们的模型能够在生成的说话头部中实现可控的情感，并能在广泛的视角中渲染。我们的方法在唇部运动生成的渲染质量和稳定性方面表现出色，同时捕捉到动态的面部细节，如皱纹和细微表情。实验表明，我们的方法在生成高保真度和可控情感的三维说话头部方面有效。\n"
  },
  {
    "path": "abs/2408.01126.md",
    "content": "### IG-SLAM: Instant Gaussian SLAM\n\n3D Gaussian Splatting has recently shown promising results as an alternative scene representation in SLAM systems to neural implicit representations. However, current methods either lack dense depth maps to supervise the mapping process or detailed training designs that consider the scale of the environment. To address these drawbacks, we present IG-SLAM, a dense RGB-only SLAM system that employs robust Dense-SLAM methods for tracking and combines them with Gaussian Splatting. A 3D map of the environment is constructed using accurate pose and dense depth provided by tracking. Additionally, we utilize depth uncertainty in map optimization to improve 3D reconstruction. Our decay strategy in map optimization enhances convergence and allows the system to run at 10 fps in a single process. We demonstrate competitive performance with state-of-the-art RGB-only SLAM systems while achieving faster operation speeds. We present our experiments on the Replica, TUM-RGBD, ScanNet, and EuRoC datasets. The system achieves photo-realistic 3D reconstruction in large-scale sequences, particularly in the EuRoC dataset.\n\n\n3D Gaussian Splatting 最近作为 SLAM 系统中的一种替代场景表示方法显示了良好的前景，相较于神经隐式表示方法。然而，目前的方法要么缺乏密集深度图来监督映射过程，要么缺乏考虑环境规模的详细训练设计。为了解决这些缺陷，我们提出了 IG-SLAM，这是一个密集 RGB-only SLAM 系统，采用强健的 Dense-SLAM 方法进行跟踪，并将其与 Gaussian Splatting 相结合。通过跟踪提供的准确姿态和密集深度，构建环境的 3D 地图。此外，我们利用深度不确定性进行地图优化，以改进 3D 重建。我们在地图优化中的衰减策略提高了收敛速度，使系统能够以每秒 10 帧的速度运行。我们展示了与最先进的 RGB-only SLAM 系统相竞争的性能，同时实现了更快的操作速度。我们在 Replica、TUM-RGBD、ScanNet 和 EuRoC 数据集上进行了实验。该系统在大规模序列中实现了逼真的 3D 重建，特别是在 EuRoC 数据集上。\n"
  },
  {
    "path": "abs/2408.01225.md",
    "content": "### Reality Fusion: Robust Real-time Immersive Mobile Robot Teleoperation with Volumetric Visual Data Fusion\n\nWe introduce Reality Fusion, a novel robot teleoperation system that localizes, streams, projects, and merges a typical onboard depth sensor with a photorealistic, high resolution, high framerate, and wide field of view (FoV) rendering of the complex remote environment represented as 3D Gaussian splats (3DGS). Our framework enables robust egocentric and exocentric robot teleoperation in immersive VR, with the 3DGS effectively extending spatial information of a depth sensor with limited FoV and balancing the trade-off between data streaming costs and data visual quality. We evaluated our framework through a user study with 24 participants, which revealed that Reality Fusion leads to significantly better user performance, situation awareness, and user preferences. To support further research and development, we provide an open-source implementation with an easy-to-replicate custom-made telepresence robot, a high-performance virtual reality 3DGS renderer, and an immersive robot control package.\n\n我们介绍了 Reality Fusion，这是一种新型机器人远程操作系统，它通过本地化、流式传输、投影和融合典型的板载深度传感器数据与复杂远程环境的光逼真、高分辨率、高帧率和宽视场（FoV）的 3D Gaussian splats（3DGS）渲染。我们的框架支持沉浸式 VR 中的强大自我中心和外部中心机器人远程操作，3DGS 有效地扩展了视场有限的深度传感器的空间信息，并平衡了数据流成本和视觉质量之间的权衡。我们通过一项涉及 24 位参与者的用户研究评估了我们的框架，结果显示 Reality Fusion 显著提升了用户表现、情况意识和用户偏好。为了支持进一步的研究和开发，我们提供了一个开源实现，包括易于复制的定制远程存在机器人、高性能虚拟现实 3DGS 渲染器和沉浸式机器人控制包。\n"
  },
  {
    "path": "abs/2408.01269.md",
    "content": "### A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness\n\nText-to-3D content creation has recently received much attention, especially with the prevalence of 3D Gaussians Splatting. In general, GS-based methods comprise two key stages: initialization and rendering optimization. To achieve initialization, existing works directly apply random sphere initialization or 3D diffusion models, e.g., Point-E, to derive the initial shapes. However, such strategies suffer from two critical yet challenging problems: 1) the final shapes are still similar to the initial ones even after training; 2) shapes can be produced only from simple texts, e.g., \"a dog\", not for lexically richer texts, e.g., \"a dog is sitting on the top of the airplane\". To address these problems, this paper proposes a novel general framework to boost the 3D GS Initialization for text-to-3D generation upon the lexical richness. Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes while enabling the spatial interaction among the 3D Gaussians and semantic interaction between Gaussians and texts. Specifically, we first construct a voxelized representation, where each voxel holds a 3D Gaussian with its position, scale, and rotation fixed while setting opacity as the sole factor to determine a position's occupancy. We then design an initialization network mainly consisting of two novel components: 1) Global Information Perception (GIP) block and 2) Gaussians-Text Fusion (GTF) block. Such a design enables each 3D Gaussian to assimilate the spatial information from other areas and semantic information from texts. Extensive experiments show the superiority of our framework of high-quality 3D GS initialization against the existing methods, e.g., Shap-E, by taking lexically simple, medium, and hard texts. Also, our framework can be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.\n\n文本到 3D 内容生成最近受到了广泛关注，特别是在 3D Gaussian Splatting 的普及背景下。一般来说，基于 GS 的方法包括两个关键阶段：初始化和渲染优化。为了实现初始化，现有工作通常直接应用随机球体初始化或 3D 扩散模型（如 Point-E）来推导初始形状。然而，这些策略存在两个关键且具有挑战性的问题：1）即使在训练后，最终形状仍然类似于初始形状；2）形状只能从简单文本中生成，例如“狗”，而不能从语义更丰富的文本中生成，例如“一只狗坐在飞机上”。为了解决这些问题，本文提出了一个新颖的通用框架，以提高基于文本到 3D 生成的 3D GS 初始化性能。我们的核心思想是将 3D Gaussian 聚合到空间均匀的体素中，以表示复杂的形状，同时实现 3D Gaussian 之间的空间交互和 Gaussian 与文本之间的语义交互。具体而言，我们首先构建一个体素化表示，每个体素包含一个固定位置、尺度和旋转的 3D Gaussian，同时将透明度设置为确定位置占用的唯一因素。然后，我们设计了一个初始化网络，主要包括两个新颖的组件：1）全局信息感知（GIP）块和 2）Gaussian-Text 融合（GTF）块。这样的设计使每个 3D Gaussian 能够吸收来自其他区域的空间信息以及来自文本的语义信息。大量实验表明，我们的高质量 3D GS 初始化框架在处理简单、中等和困难文本时相较于现有方法（如 Shap-E）具有显著优势。此外，我们的框架可以无缝地集成到最先进的训练框架（如 LucidDreamer）中，以实现语义一致的文本到 3D 生成。\n"
  },
  {
    "path": "abs/2408.03060.md",
    "content": "### MGFs: Masked Gaussian Fields for Meshing Building based on Multi-View Images\n\nOver the last few decades, image-based building surface reconstruction has garnered substantial research interest and has been applied across various fields, such as heritage preservation, architectural planning, etc. Compared to the traditional photogrammetric and NeRF-based solutions, recently, Gaussian fields-based methods have exhibited significant potential in generating surface meshes due to their time-efficient training and detailed 3D information preservation. However, most gaussian fields-based methods are trained with all image pixels, encompassing building and nonbuilding areas, which results in a significant noise for building meshes and degeneration in time efficiency. This paper proposes a novel framework, Masked Gaussian Fields (MGFs), designed to generate accurate surface reconstruction for building in a time-efficient way. The framework first applies EfficientSAM and COLMAP to generate multi-level masks of building and the corresponding masked point clouds. Subsequently, the masked gaussian fields are trained by integrating two innovative losses: a multi-level perceptual masked loss focused on constructing building regions and a boundary loss aimed at enhancing the details of the boundaries between different masks. Finally, we improve the tetrahedral surface mesh extraction method based on the masked gaussian spheres. Comprehensive experiments on UAV images demonstrate that, compared to the traditional method and several NeRF-based and Gaussian-based SOTA solutions, our approach significantly improves both the accuracy and efficiency of building surface reconstruction. Notably, as a byproduct, there is an additional gain in the novel view synthesis of building.\n\n在过去几十年中，基于图像的建筑表面重建吸引了大量研究兴趣，并广泛应用于遗产保护、建筑规划等领域。与传统的摄影测量和 NeRF 基于的解决方案相比，最近基于 Gaussian 场的方法在生成表面网格方面表现出了显著的潜力，因为它们在训练时间上更高效且能够保留详细的 3D 信息。然而，大多数基于 Gaussian 场的方法在训练时会使用所有图像像素，包括建筑物和非建筑物区域，这会导致建筑网格的显著噪声并降低时间效率。为了解决这些问题，本文提出了一个新颖的框架——遮罩 Gaussian 场（MGFs），旨在以时间高效的方式生成准确的建筑表面重建。该框架首先应用 EfficientSAM 和 COLMAP 生成建筑物的多层次遮罩及相应的遮罩点云。随后，通过整合两个创新损失函数来训练遮罩 Gaussian 场：一个关注建筑区域构建的多层次感知遮罩损失，以及一个旨在增强不同遮罩之间边界细节的边界损失。最后，我们改进了基于遮罩 Gaussian 球体的四面体表面网格提取方法。通过对无人机图像的全面实验表明，与传统方法及若干 NeRF 基于和 Gaussian 基于的最新解决方案相比，我们的方法显著提高了建筑表面重建的准确性和效率。值得注意的是，作为副产品，我们还在建筑的新视角合成中获得了额外的提升。\n"
  },
  {
    "path": "abs/2408.03538.md",
    "content": "### PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting\n\nWe proposed Precomputed RadianceTransfer of GaussianSplats (PRTGS), a real-time high-quality relighting method for Gaussian splats in low-frequency lighting environments that captures soft shadows and interreflections by precomputing 3D Gaussian splats' radiance transfer. Existing studies have demonstrated that 3D Gaussian splatting (3DGS) outperforms neural fields' efficiency for dynamic lighting scenarios. However, the current relighting method based on 3DGS still struggles to compute high-quality shadow and indirect illumination in real time for dynamic light, leading to unrealistic rendering results. We solve this problem by precomputing the expensive transport simulations required for complex transfer functions like shadowing, the resulting transfer functions are represented as dense sets of vectors or matrices for every Gaussian splat. We introduce distinct precomputing methods tailored for training and rendering stages, along with unique ray tracing and indirect lighting precomputation techniques for 3D Gaussian splats to accelerate training speed and compute accurate indirect lighting related to environment light. Experimental analyses demonstrate that our approach achieves state-of-the-art visual quality while maintaining competitive training times and allows high-quality real-time (30+ fps) relighting for dynamic light and relatively complex scenes at 1080p resolution.\n\n我们提出了预计算高斯点光源辐射传输（PRTGS），这是一种针对低频照明环境中高斯点光源的实时高质量重新照明方法，能够通过预计算 3D 高斯点光源的辐射传输来捕捉柔和阴影和间接反射。现有研究已证明 3D 高斯点光源（3DGS）在动态照明场景中比神经场更高效。然而，当前基于 3DGS 的重新照明方法在实时计算高质量阴影和间接光照时仍然存在困难，导致不真实的渲染结果。我们通过预计算复杂传输函数（如阴影）所需的昂贵传输模拟来解决这个问题，得到的传输函数以密集的向量或矩阵集表示每个高斯点光源。我们引入了针对训练和渲染阶段的不同预计算方法，以及独特的光线追踪和间接光照预计算技术，以加速训练速度并计算与环境光相关的准确间接光照。实验分析表明，我们的方法在保持竞争的训练时间的同时实现了最先进的视觉质量，并允许在 1080p 分辨率下对动态光源和相对复杂场景进行高质量的实时（30+ fps）重新照明。\n"
  },
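  {
    "path": "examples/prtgs-prt-sketch.md",
    "content": "### Code sketch: precomputed radiance transfer for Gaussian splats\n\nA minimal sketch of the core PRT idea used by methods like PRTGS (abs/2408.03538): fold the expensive light-transport simulation (visibility, soft shadowing, interreflection) into a per-splat transfer vector offline, so that relighting under a new low-frequency environment light reduces to one dot product per splat. Shapes, names, and the order-2 SH setting are assumptions for illustration, not the authors' code.\n\n```python\nimport torch\n\ndef relight_splats(transfer, env_sh):\n    # transfer: (N, K) precomputed transfer vector per splat; K is the number\n    #   of spherical-harmonic coefficients (K = 9 for SH order 2). Visibility\n    #   and cosine terms from the offline transport simulation are folded in.\n    # env_sh: (K, 3) SH coefficients of the low-frequency environment light,\n    #   one set per RGB channel.\n    return transfer @ env_sh  # (N, 3) outgoing radiance per splat\n\nN, K = 100_000, 9\ntransfer = torch.rand(N, K)       # would come from offline precomputation\nenv_sh = torch.randn(K, 3) * 0.1  # swap this to relight; no re-simulation\nrgb = relight_splats(transfer, env_sh)\nprint(rgb.shape)                  # torch.Size([100000, 3])\n```\n\nBecause the transfer vectors are fixed after precomputation, the per-frame relighting cost no longer depends on the transport simulation, which is what makes 30+ fps relighting under dynamic light plausible.\n"
  },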
  {
    "path": "abs/2408.03753.md",
    "content": "### 3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting\n\nThe use of 3D Gaussians as representation of radiance fields has enabled high quality novel view synthesis at real-time rendering speed. However, the choice of optimising the outgoing radiance of each Gaussian independently as spherical harmonics results in unsatisfactory view dependent effects. In response to these limitations, our work, Factorised Tensorial Illumination for 3D Gaussian Splatting, or 3iGS, improves upon 3D Gaussian Splatting (3DGS) rendering quality. Instead of optimising a single outgoing radiance parameter, 3iGS enhances 3DGS view-dependent effects by expressing the outgoing radiance as a function of a local illumination field and Bidirectional Reflectance Distribution Function (BRDF) features. We optimise a continuous incident illumination field through a Tensorial Factorisation representation, while separately fine-tuning the BRDF features of each 3D Gaussian relative to this illumination field. Our methodology significantly enhances the rendering quality of specular view-dependent effects of 3DGS, while maintaining rapid training and rendering speeds.\n\n将 3D 高斯点光源用于辐射场表示已实现了高质量的新视角合成，并具备实时渲染速度。然而，选择独立优化每个高斯点光源的输出辐射作为球面谐波的方法，导致了不令人满意的视角依赖效果。为了应对这些局限性，我们的研究提出了分解张量光照用于 3D 高斯点光源（3iGS），旨在提升 3D 高斯点光源（3DGS）的渲染质量。与优化单一的输出辐射参数不同，3iGS 通过将输出辐射表示为局部光照场和双向反射分布函数（BRDF）特征的函数，增强了 3DGS 的视角依赖效果。我们通过张量分解表示优化连续的入射光照场，同时相对于该光照场单独微调每个 3D 高斯点光源的 BRDF 特征。我们的方法显著提高了 3DGS 的高光视角依赖效果，同时保持了快速的训练和渲染速度。\n"
  },
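  {
    "path": "examples/3dgs-sh-color-sketch.md",
    "content": "### Code sketch: the per-Gaussian SH radiance that 3iGS replaces\n\nVanilla 3DGS stores spherical-harmonic coefficients on every Gaussian and evaluates them along the view direction; 3iGS (abs/2408.03753) argues this independent parameterisation limits view-dependent effects and factors radiance into an illumination field plus BRDF features instead. Below is a degree-0/1 sketch of the baseline evaluation; the constants match the real SH basis, while the tensor layout is an assumption.\n\n```python\nimport torch\n\nSH_C0 = 0.28209479177387814  # Y_0^0\nSH_C1 = 0.4886025119029199   # magnitude of the degree-1 basis terms\n\ndef sh_to_rgb(sh, view_dir):\n    # sh: (N, 4, 3) degree-0/1 SH coefficients per Gaussian and RGB channel\n    # view_dir: (N, 3) unit vectors from the camera to each Gaussian center\n    x, y, z = view_dir[:, 0:1], view_dir[:, 1:2], view_dir[:, 2:3]\n    rgb = SH_C0 * sh[:, 0]\n    rgb = rgb - SH_C1 * y * sh[:, 1] + SH_C1 * z * sh[:, 2] - SH_C1 * x * sh[:, 3]\n    return (rgb + 0.5).clamp(min=0.0)  # 0.5 offset as in the reference 3DGS code\n\ndirs = torch.nn.functional.normalize(torch.randn(8, 3), dim=-1)\ncolors = sh_to_rgb(torch.randn(8, 4, 3) * 0.1, dirs)\n```\n\nEach Gaussian optimises its own coefficients with no shared notion of incident light; that missing coupling is what 3iGS restores through its factorised tensorial illumination field.\n"
  },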
  {
    "path": "abs/2408.03822.md",
    "content": "### Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields\n\n3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and introduces an approximated volumetric rendering, achieving very fast rendering speed and promising image quality. Furthermore, subsequent studies have successfully extended 3DGS to dynamic 3D scenes, demonstrating its wide range of applications. However, a significant drawback arises as 3DGS and its following methods entail a substantial number of Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric and temporal attributes by residual vector quantization. With model compression techniques such as quantization and entropy coding, we consistently show over 25x reduced storage and enhanced rendering speed compared to 3DGS for static scenes, while maintaining the quality of the scene representation. For dynamic scenes, our approach achieves more than 12x storage efficiency and retains a high-quality reconstruction compared to the existing state-of-the-art methods. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering.\n\n3D 高斯点光源（3DGS）最近作为一种替代表示方法出现，利用基于 3D 高斯点的表示并引入了近似的体积渲染，实现了非常快的渲染速度和有希望的图像质量。此外，后续研究成功地将 3DGS 扩展到了动态 3D 场景，展示了其广泛的应用。然而，一个显著的缺点是 3DGS 及其后续方法需要大量的高斯点来保持渲染图像的高保真度，这需要大量的内存和存储。为了解决这个关键问题，我们特别关注两个主要目标：在不牺牲性能的情况下减少高斯点的数量，以及压缩高斯属性（如视角依赖的颜色和协方差）。为此，我们提出了一种可学习的掩码策略，显著减少了高斯点的数量，同时保持高性能。此外，我们通过使用基于网格的神经场而不是依赖球面谐波，提出了一种紧凑但有效的视角依赖颜色表示方法。最后，我们通过残差向量量化学习代码簿，以紧凑地表示几何和时间属性。通过量化和熵编码等模型压缩技术，我们在静态场景中相比于 3DGS 实现了超过 25 倍的存储减少和增强的渲染速度，同时保持了场景表示的质量。在动态场景中，我们的方法相比于现有的最先进方法实现了超过 12 倍的存储效率，并保持了高质量的重建。我们的工作提供了一个全面的 3D 场景表示框架，实现了高性能、快速训练、紧凑性和实时渲染。\n"
  },
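  {
    "path": "examples/rvq-attribute-compression-sketch.md",
    "content": "### Code sketch: residual vector quantization of Gaussian attributes\n\nThe compact-3DGS abstract above (abs/2408.03822) compresses geometric and temporal attributes with codebooks learned by residual vector quantization: each stage quantizes the residual left by the previous stage, so a vector is stored as a handful of small indices. A minimal sketch with an assumed stage count and codebook size; real codebooks are trained jointly with the scene, not random.\n\n```python\nimport torch\n\ndef rvq_encode(x, codebooks):\n    # x: (N, D) attribute vectors; codebooks: list of (C, D) tensors\n    residual, indices = x.clone(), []\n    for cb in codebooks:\n        d = torch.cdist(residual, cb)  # (N, C) distances to every code\n        idx = d.argmin(dim=1)          # nearest code for each vector\n        residual = residual - cb[idx]  # quantize and keep the remainder\n        indices.append(idx)\n    return indices\n\ndef rvq_decode(indices, codebooks):\n    # reconstruction is the sum of the selected codes across stages\n    return sum(cb[idx] for idx, cb in zip(indices, codebooks))\n\nx = torch.randn(1000, 8)                             # e.g. packed attributes\ncodebooks = [torch.randn(256, 8) for _ in range(3)]  # 3 stages x 256 codes\nx_hat = rvq_decode(rvq_encode(x, codebooks), codebooks)\nprint(x_hat.shape)  # (1000, 8); with trained codebooks x_hat approximates x\n```\n\nStorage per vector drops to three 8-bit indices here, which entropy coding can shrink further, in the spirit of the 25x storage reduction the abstract reports.\n"
  },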
  {
    "path": "abs/2408.03825.md",
    "content": "### Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM\n\nInitial applications of 3D Gaussian Splatting (3DGS) in Visual Simultaneous Localization and Mapping (VSLAM) demonstrate the generation of high-quality volumetric reconstructions from monocular video streams. However, despite these promising advancements, current 3DGS integrations have reduced tracking performance and lower operating speeds compared to traditional VSLAM. To address these issues, we propose integrating 3DGS with Direct Sparse Odometry, a monocular photometric SLAM system. We have done preliminary experiments showing that using Direct Sparse Odometry point cloud outputs, as opposed to standard structure-from-motion methods, significantly shortens the training time needed to achieve high-quality renders. Reducing 3DGS training time enables the development of 3DGS-integrated SLAM systems that operate in real-time on mobile hardware. These promising initial findings suggest further exploration is warranted in combining traditional VSLAM systems with 3DGS.\n\n3D 高斯点光源（3DGS）在视觉同时定位与地图构建（VSLAM）中的初步应用展示了从单目视频流中生成高质量体积重建的能力。然而，尽管这些进展令人鼓舞，目前的 3DGS 集成在跟踪性能和操作速度上仍低于传统的 VSLAM。为了解决这些问题，我们提出将 3DGS 与直接稀疏视觉测程（Direct Sparse Odometry，DSO）这一单目光度 SLAM 系统进行集成。我们的初步实验表明，使用 DSO 点云输出相比于标准的结构光法（structure-from-motion）方法，显著缩短了实现高质量渲染所需的训练时间。缩短 3DGS 的训练时间使得开发能够在移动硬件上实时运行的 3DGS 集成 SLAM 系统成为可能。这些有前景的初步发现表明，结合传统的 VSLAM 系统与 3DGS 仍需进一步探索。\n"
  },
  {
    "path": "abs/2408.04249.md",
    "content": "### InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting\n\nWe present InstantStyleGaussian, an innovative 3D style transfer method based on the 3D Gaussian Splatting (3DGS) scene representation. By inputting a target style image, it quickly generates new 3D GS scenes. Our approach operates on pre-reconstructed GS scenes, combining diffusion models with an improved iterative dataset update strategy. It utilizes diffusion models to generate target style images, adds these new images to the training dataset, and uses this dataset to iteratively update and optimize the GS scenes. Extensive experimental results demonstrate that our method ensures high-quality stylized scenes while offering significant advantages in style transfer speed and consistency.\n\n我们提出了InstantStyleGaussian，这是一种基于3D高斯点绘（3D Gaussian Splatting，3DGS）场景表示的创新型3D风格迁移方法。通过输入目标风格图像，该方法能够快速生成新的3D GS场景。我们的方法在预先重建的GS场景上操作，结合了扩散模型和改进的迭代数据集更新策略。它利用扩散模型生成目标风格图像，将这些新图像添加到训练数据集中，并使用该数据集迭代更新和优化GS场景。广泛的实验结果表明，我们的方法在保证高质量风格化场景的同时，在风格迁移速度和一致性方面具有显著优势。\n"
  },
  {
    "path": "abs/2408.04474.md",
    "content": "### LumiGauss: High-Fidelity Outdoor Relighting with 2D Gaussian Splatting\n\nDecoupling lighting from geometry using unconstrained photo collections is notoriously challenging. Solving it would benefit many users, as creating complex 3D assets takes days of manual labor. Many previous works have attempted to address this issue, often at the expense of output fidelity, which questions the practicality of such methods. We introduce LumiGauss, a technique that tackles 3D reconstruction of scenes and environmental lighting through 2D Gaussian Splatting. Our approach yields high-quality scene reconstructions and enables realistic lighting synthesis under novel environment maps. We also propose a method for enhancing the quality of shadows, common in outdoor scenes, by exploiting spherical harmonics properties. Our approach facilitates seamless integration with game engines and enables the use of fast precomputed radiance transfer. We validate our method on the NeRF-OSR dataset, demonstrating superior performance over baseline methods. Moreover, LumiGauss can synthesize realistic images when applying novel environment maps.\n\n从不受约束的照片集群中分离光照和几何信息是一个众所周知的挑战。解决这一问题将为许多用户带来巨大益处，因为创建复杂的3D资产通常需要耗费数天的手工劳动。许多之前的研究试图解决这个问题，但往往以牺牲输出保真度为代价，进而质疑这些方法的实用性。\n我们引入了LumiGauss，这是一种通过2D高斯点绘技术处理场景和环境光照3D重建的技术。我们的方法能够产生高质量的场景重建，并在新的环境贴图下实现逼真的光照合成。此外，我们提出了一种通过利用球谐函数特性来增强户外场景中常见阴影质量的方法。我们的方案不仅能与游戏引擎无缝集成，还支持使用快速预计算辐射传输。\n我们在NeRF-OSR数据集上验证了我们的方法，展示了其相对于基线方法的优越性能。此外，LumiGauss在应用新的环境贴图时能够合成逼真的图像。\n"
  },
  {
    "path": "abs/2408.04831.md",
    "content": "### Self-augmented Gaussian Splatting with Structure-aware Masks for Sparse-view 3D Reconstruction\n\nSparse-view 3D reconstruction stands as a formidable challenge in computer vision, aiming to build complete three-dimensional models from a limited array of viewing perspectives. This task confronts several difficulties: 1) the limited number of input images that lack consistent information; 2) dependence on the quality of input images; and 3) the substantial size of model parameters. To address these challenges, we propose a self-augmented coarse-to-fine Gaussian splatting paradigm, enhanced with a structure-aware mask, for sparse-view 3D reconstruction. In particular, our method initially employs a coarse Gaussian model to obtain a basic 3D representation from sparse-view inputs. Subsequently, we develop a fine Gaussian network to enhance consistent and detailed representation of the output with both 3D geometry augmentation and perceptual view augmentation. During training, we design a structure-aware masking strategy to further improve the model's robustness against sparse inputs and noise.Experimental results on the MipNeRF360 and OmniObject3D datasets demonstrate that the proposed method achieves state-of-the-art performances for sparse input views in both perceptual quality and efficiency.\n\n稀疏视角的3D重建是计算机视觉领域的一项艰巨挑战，其目标是在有限的观察角度下构建完整的三维模型。此任务面临多重困难：1）输入图像数量有限，信息不一致；2）依赖输入图像的质量；3）模型参数规模庞大。为了解决这些问题，我们提出了一种自增强的粗到细高斯点绘范式，并结合了结构感知掩码，用于稀疏视角的3D重建。具体而言，我们的方法首先采用粗略的高斯模型，从稀疏视角输入中获得基础的3D表示。随后，我们开发了一个精细的高斯网络，通过3D几何增强和感知视角增强来提高输出的一致性和细节表示。在训练过程中，我们设计了一种结构感知掩码策略，以进一步提高模型对稀疏输入和噪声的鲁棒性。实验结果表明，在MipNeRF360和OmniObject3D数据集上，所提出的方法在稀疏输入视角下的感知质量和效率方面均达到了当前最先进的水平。\n"
  },
  {
    "path": "abs/2408.05631.md",
    "content": "### PRTGaussian: Efficient Relighting Using 3D Gaussians with Precomputed Radiance Transfer\n\nWe present PRTGaussian, a realtime relightable novel-view synthesis method made possible by combining 3D Gaussians and Precomputed Radiance Transfer (PRT). By fitting relightable Gaussians to multi-view OLAT data, our method enables real-time, free-viewpoint relighting. By estimating the radiance transfer based on high-order spherical harmonics, we achieve a balance between capturing detailed relighting effects and maintaining computational efficiency. We utilize a two-stage process: in the first stage, we reconstruct a coarse geometry of the object from multi-view images. In the second stage, we initialize 3D Gaussians with the obtained point cloud, then simultaneously refine the coarse geometry and learn the light transport for each Gaussian. Extensive experiments on synthetic datasets show that our approach can achieve fast and high-quality relighting for general objects.\n\n我们提出了PRTGaussian，这是一种结合了3D高斯点绘和预计算辐射传输（Precomputed Radiance Transfer，PRT）的实时可重光照新视角合成方法。通过将可重光照的高斯模型拟合到多视角的单光源照明测试（OLAT）数据中，我们的方法实现了实时、自由视角的重光照。通过基于高阶球谐函数估计辐射传输，我们在捕捉详细的重光照效果和保持计算效率之间取得了平衡。我们的方法采用了两阶段的处理过程：在第一阶段，我们从多视角图像中重建物体的粗略几何结构。在第二阶段，我们利用获得的点云初始化3D高斯模型，然后同时优化粗略几何结构并学习每个高斯点的光传输。大量的合成数据集实验表明，我们的方法能够快速、高质量地对一般物体进行重光照处理。\n"
  },
  {
    "path": "abs/2408.05635.md",
    "content": "### Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis\n\nConventional geometry-based SLAM systems lack dense 3D reconstruction capabilities since their data association usually relies on feature correspondences. Additionally, learning-based SLAM systems often fall short in terms of real-time performance and accuracy. Balancing real-time performance with dense 3D reconstruction capabilities is a challenging problem. In this paper, we propose a real-time RGB-D SLAM system that incorporates a novel view synthesis technique, 3D Gaussian Splatting, for 3D scene representation and pose estimation. This technique leverages the real-time rendering performance of 3D Gaussian Splatting with rasterization and allows for differentiable optimization in real time through CUDA implementation. We also enable mesh reconstruction from 3D Gaussians for explicit dense 3D reconstruction. To estimate accurate camera poses, we utilize a rotation-translation decoupled strategy with inverse optimization. This involves iteratively updating both in several iterations through gradient-based optimization. This process includes differentiably rendering RGB, depth, and silhouette maps and updating the camera parameters to minimize a combined loss of photometric loss, depth geometry loss, and visibility loss, given the existing 3D Gaussian map. However, 3D Gaussian Splatting (3DGS) struggles to accurately represent surfaces due to the multi-view inconsistency of 3D Gaussians, which can lead to reduced accuracy in both camera pose estimation and scene reconstruction. To address this, we utilize depth priors as additional regularization to enforce geometric constraints, thereby improving the accuracy of both pose estimation and 3D reconstruction. We also provide extensive experimental results on public benchmark datasets to demonstrate the effectiveness of our proposed methods in terms of pose accuracy, geometric accuracy, and rendering performance.\n\n传统的基于几何的SLAM系统由于其数据关联通常依赖于特征匹配，因此缺乏密集的3D重建能力。此外，基于学习的SLAM系统在实时性能和准确性方面往往表现不足。在保持实时性能的同时实现密集的3D重建能力是一个具有挑战性的问题。本文提出了一种实时RGB-D SLAM系统，该系统结合了一种新颖的视角合成技术——3D高斯点绘（3D Gaussian Splatting），用于3D场景表示和姿态估计。该技术利用3D高斯点绘结合光栅化的实时渲染性能，并通过CUDA实现实现实时的可微优化。我们还从3D高斯点绘中提取网格进行显式的密集3D重建。\n为了估计精确的相机姿态，我们采用了一种旋转-平移解耦策略，并通过反向优化进行迭代更新。这包括通过梯度优化迭代更新多个循环中的RGB、深度和轮廓图像的可微渲染，并更新相机参数以最小化光度损失、深度几何损失和可见性损失的组合损失，基于现有的3D高斯图。然而，由于3D高斯点绘的多视角不一致性，3D高斯点绘（3DGS）在准确表示表面时存在困难，这可能导致相机姿态估计和场景重建的准确性降低。为了解决这一问题，我们利用深度先验作为额外的正则化以加强几何约束，从而提高姿态估计和3D重建的准确性。我们还在公共基准数据集上提供了广泛的实验结果，以展示我们提出的方法在姿态准确性、几何准确性和渲染性能方面的有效性。\n\n"
  },
  {
    "path": "abs/2408.06019.md",
    "content": "### HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors\n\nIn this paper, we present a novel 3D head avatar creation approach capable of generalizing from few-shot in-the-wild data with high-fidelity and animatable robustness. Given the underconstrained nature of this problem, incorporating prior knowledge is essential. Therefore, we propose a framework comprising prior learning and avatar creation phases. The prior learning phase leverages 3D head priors derived from a large-scale multi-view dynamic dataset, and the avatar creation phase applies these priors for few-shot personalization. Our approach effectively captures these priors by utilizing a Gaussian Splatting-based auto-decoder network with part-based dynamic modeling. Our method employs identity-shared encoding with personalized latent codes for individual identities to learn the attributes of Gaussian primitives. During the avatar creation phase, we achieve fast head avatar personalization by leveraging inversion and fine-tuning strategies. Extensive experiments demonstrate that our model effectively exploits head priors and successfully generalizes them to few-shot personalization, achieving photo-realistic rendering quality, multi-view consistency, and stable animation.\n\n在本文中，我们提出了一种新颖的3D头像创建方法，该方法能够从少量的野外数据中实现高保真度和可动画的鲁棒性。鉴于此问题的欠约束性，融入先验知识至关重要。因此，我们提出了一个包含先验学习和头像创建阶段的框架。在先验学习阶段，我们利用从大规模多视角动态数据集中获取的3D头像先验信息；在头像创建阶段，我们将这些先验应用于少样本个性化。我们的方法通过使用基于高斯点绘的自动解码网络和基于部分的动态建模来有效捕捉这些先验。我们的方法采用身份共享编码，并结合个性化潜在代码来学习高斯基元的属性。在头像创建阶段，我们通过反向优化和微调策略，实现了快速的头像个性化。大量实验表明，我们的模型能够有效利用头像先验信息，并成功将其泛化到少样本个性化中，达到了照片级逼真的渲染质量、多视角一致性和稳定的动画效果。\n\n"
  },
  {
    "path": "abs/2408.06286.md",
    "content": "### Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering\n\n3D Gaussian Splatting (3DGS) has attracted great attention in novel view synthesis because of its superior rendering efficiency and high fidelity. However, the trained Gaussians suffer from severe zooming degradation due to non-adjustable representation derived from single-scale training. Though some methods attempt to tackle this problem via post-processing techniques such as selective rendering or filtering techniques towards primitives, the scale-specific information is not involved in Gaussians. In this paper, we propose a unified optimization method to make Gaussians adaptive for arbitrary scales by self-adjusting the primitive properties (e.g., color, shape and size) and distribution (e.g., position). Inspired by the mipmap technique, we design pseudo ground-truth for the target scale and propose a scale-consistency guidance loss to inject scale information into 3D Gaussians. Our method is a plug-in module, applicable for any 3DGS models to solve the zoom-in and zoom-out aliasing. Extensive experiments demonstrate the effectiveness of our method. Notably, our method outperforms 3DGS in PSNR by an average of 9.25 dB for zoom-in and 10.40 dB for zoom-out on the NeRF Synthetic dataset.\n\n3D高斯点绘（3DGS）由于其出色的渲染效率和高保真度，在新视角合成领域引起了广泛关注。然而，由于训练过程中采用单尺度表示，训练后的高斯模型在缩放时会出现严重的降质问题。虽然一些方法尝试通过后处理技术（如选择性渲染或针对基元的过滤技术）来解决这一问题，但这些方法并未将尺度特定的信息融入到高斯模型中。在本文中，我们提出了一种统一的优化方法，使高斯模型能够自适应任意尺度，通过自我调整基元属性（如颜色、形状和大小）和分布（如位置）来解决这一问题。\n受mipmap技术的启发，我们设计了目标尺度的伪真实值，并提出了尺度一致性引导损失，将尺度信息注入到3D高斯模型中。我们的方法是一种插件模块，适用于任何3DGS模型，以解决放大和缩小时的锯齿问题。大量实验验证了我们方法的有效性。值得注意的是，在NeRF Synthetic数据集上，我们的方法在放大和缩小时相较于3DGS，分别在PSNR上平均提高了9.25 dB和10.40 dB。\n"
  },
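  {
    "path": "examples/mipmap-pseudo-gt-sketch.md",
    "content": "### Code sketch: scale-consistency supervision with mipmap-style pseudo ground truth\n\nOne plausible reading of the Mipmap-GS recipe above (abs/2408.06286): synthesize a pseudo ground truth for the target scale by area-downsampling the training-scale image (the way a mipmap level is built), then penalize the rendering at that scale against it so scale information flows into the Gaussians. The loss choice and shapes here are assumptions.\n\n```python\nimport torch\nimport torch.nn.functional as F\n\ndef scale_consistency_loss(render_at_scale, full_res_gt):\n    # render_at_scale: (1, 3, H/s, W/s) image rendered at the target scale\n    # full_res_gt:     (1, 3, H, W) ground truth at the training scale\n    h, w = render_at_scale.shape[-2:]\n    pseudo_gt = F.interpolate(full_res_gt, size=(h, w), mode='area')\n    return F.l1_loss(render_at_scale, pseudo_gt)\n\ngt = torch.rand(1, 3, 512, 512)\nlow = torch.rand(1, 3, 128, 128, requires_grad=True)  # stand-in for a render\nloss = scale_consistency_loss(low, gt)\nloss.backward()  # gradients adapt Gaussian color, shape, size and position\n```\n\nBecause the supervision is just an extra loss on renders, it can be bolted onto any trained 3DGS model, matching the paper's claim of being a plug-in module.\n"
  },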
  {
    "path": "abs/2408.06543.md",
    "content": "### HDRGS: High Dynamic Range Gaussian Splatting\n\nRecent years have witnessed substantial advancements in the field of 3D reconstruction from 2D images, particularly following the introduction of the neural radiance field (NeRF) technique. However, reconstructing a 3D high dynamic range (HDR) radiance field, which aligns more closely with real-world conditions, from 2D multi-exposure low dynamic range (LDR) images continues to pose significant challenges. Approaches to this issue fall into two categories: grid-based and implicit-based. Implicit methods, using multi-layer perceptrons (MLP), face inefficiencies, limited solvability, and overfitting risks. Conversely, grid-based methods require significant memory and struggle with image quality and long training times. In this paper, we introduce Gaussian Splatting-a recent, high-quality, real-time 3D reconstruction technique-into this domain. We further develop the High Dynamic Range Gaussian Splatting (HDR-GS) method, designed to address the aforementioned challenges. This method enhances color dimensionality by including luminance and uses an asymmetric grid for tone-mapping, swiftly and precisely converting pixel irradiance to color. Our approach improves HDR scene recovery accuracy and integrates a novel coarse-to-fine strategy to speed up model convergence, enhancing robustness against sparse viewpoints and exposure extremes, and preventing local optima. Extensive testing confirms that our method surpasses current state-of-the-art techniques in both synthetic and real-world scenarios.\n\n近年来，随着神经辐射场（NeRF）技术的引入，基于2D图像的3D重建领域取得了显著进展。然而，从2D多曝光低动态范围（LDR）图像重建与现实条件更为接近的3D高动态范围（HDR）辐射场仍然面临重大挑战。针对这一问题的方法通常分为两类：基于网格的方法和基于隐式的方法。隐式方法通常使用多层感知机（MLP），但存在效率低下、可解性有限以及过拟合的风险。相比之下，基于网格的方法虽然内存需求巨大，但在图像质量和训练时间方面仍存在困难。\n在本文中，我们将高质量、实时的3D重建技术——高斯点绘（Gaussian Splatting）引入到这一领域，并进一步开发了高动态范围高斯点绘（HDR-GS）方法，旨在解决上述挑战。该方法通过引入亮度来增强颜色维度，并使用非对称网格进行色调映射，从而快速且准确地将像素辐照度转换为颜色。我们的方法不仅提高了HDR场景恢复的准确性，还集成了一种粗到细的策略，加速了模型的收敛，增强了在稀疏视角和极端曝光下的鲁棒性，并避免了局部最优解。广泛的测试结果表明，我们的方法在合成和现实场景中均超越了当前的最先进技术。\n"
  },
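  {
    "path": "examples/hdr-tone-mapping-sketch.md",
    "content": "### Code sketch: a learnable tone-mapping curve from irradiance to LDR color\n\nThe HDRGS abstract above (abs/2408.06543) maps rendered pixel irradiance to LDR color through a learned tone mapper conditioned on exposure. The monotone piecewise-linear curve below is an illustrative stand-in for the paper's asymmetric-grid tone mapper, not the authors' design.\n\n```python\nimport torch\n\nclass ToneMapper(torch.nn.Module):\n    def __init__(self, bins=32):\n        super().__init__()\n        # positive increments keep the learned curve monotonically increasing\n        self.deltas = torch.nn.Parameter(torch.full((bins,), 1.0 / bins))\n\n    def forward(self, irradiance, log_exposure):\n        x = (irradiance * torch.exp(log_exposure)).clamp(0.0, 1.0 - 1e-6)\n        knots = torch.cumsum(torch.nn.functional.softplus(self.deltas), 0)\n        knots = knots / knots[-1]                  # curve ends at exactly 1\n        knots = torch.cat([knots.new_zeros(1), knots])\n        nbins = knots.numel() - 1\n        t = x * nbins\n        i = t.long().clamp(max=nbins - 1)\n        frac = t - i.float()\n        return knots[i] * (1 - frac) + knots[i + 1] * frac  # linear interp\n\ntm = ToneMapper()\nhdr = torch.rand(4, 3, 8, 8) * 5.0              # rendered HDR irradiance\nldr = tm(hdr, log_exposure=torch.tensor(-1.0))  # one exposure per LDR view\n```\n\nTraining against multi-exposure LDR captures fits both the HDR radiance field and the curve; at test time the curve can be driven with any exposure to re-expose the scene.\n"
  },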
  {
    "path": "abs/2408.06975.md",
    "content": "### SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis\n\nWe propose a novel cross-spectral rendering framework based on 3D Gaussian Splatting (3DGS) that generates realistic and semantically meaningful splats from registered multi-view spectrum and segmentation maps. This extension enhances the representation of scenes with multiple spectra, providing insights into the underlying materials and segmentation. We introduce an improved physically-based rendering approach for Gaussian splats, estimating reflectance and lights per spectra, thereby enhancing accuracy and realism. In a comprehensive quantitative and qualitative evaluation, we demonstrate the superior performance of our approach with respect to other recent learning-based spectral scene representation approaches (i.e., XNeRF and SpectralNeRF) as well as other non-spectral state-of-the-art learning-based approaches. Our work also demonstrates the potential of spectral scene understanding for precise scene editing techniques like style transfer, inpainting, and removal. Thereby, our contributions address challenges in multi-spectral scene representation, rendering, and editing, offering new possibilities for diverse applications.\n\n我们提出了一种基于3D高斯点绘（3DGS）的新型跨光谱渲染框架，该框架能够从配准的多视角光谱和分割图中生成逼真且语义丰富的点绘。这一扩展增强了对多光谱场景的表示，提供了关于底层材料和分割的深入见解。我们引入了一种改进的基于物理的高斯点绘渲染方法，通过估计每个光谱的反射率和光照，提升了渲染的准确性和真实感。在全面的定量和定性评估中，我们展示了我们的方法在性能上优于其他最新的基于学习的光谱场景表示方法（如XNeRF和SpectralNeRF），以及其他非光谱的最先进的基于学习的方法。我们的研究还展示了光谱场景理解在精确场景编辑技术（如风格迁移、修复和移除）中的潜力。因此，我们的贡献解决了多光谱场景表示、渲染和编辑中的挑战，为多种应用提供了新的可能性。\n"
  },
  {
    "path": "abs/2408.07540.md",
    "content": "### 3D Gaussian Editing with A Single Image\n\nThe modeling and manipulation of 3D scenes captured from the real world are pivotal in various applications, attracting growing research interest. While previous works on editing have achieved interesting results through manipulating 3D meshes, they often require accurately reconstructed meshes to perform editing, which limits their application in 3D content generation. To address this gap, we introduce a novel single-image-driven 3D scene editing approach based on 3D Gaussian Splatting, enabling intuitive manipulation via directly editing the content on a 2D image plane. Our method learns to optimize the 3D Gaussians to align with an edited version of the image rendered from a user-specified viewpoint of the original scene. To capture long-range object deformation, we introduce positional loss into the optimization process of 3D Gaussian Splatting and enable gradient propagation through reparameterization. To handle occluded 3D Gaussians when rendering from the specified viewpoint, we build an anchor-based structure and employ a coarse-to-fine optimization strategy capable of handling long-range deformation while maintaining structural stability. Furthermore, we design a novel masking strategy to adaptively identify non-rigid deformation regions for fine-scale modeling. Extensive experiments show the effectiveness of our method in handling geometric details, long-range, and non-rigid deformation, demonstrating superior editing flexibility and quality compared to previous approaches.\n\n在众多应用中，真实世界捕获的3D场景的建模和操控至关重要，吸引了越来越多的研究兴趣。尽管之前的研究通过操控3D网格实现了有趣的编辑效果，但它们通常需要精确重建的网格来进行编辑，这限制了其在3D内容生成中的应用。为了解决这一问题，我们引入了一种基于3D高斯点绘（3D Gaussian Splatting）的新型单图像驱动的3D场景编辑方法，使用户能够通过直接在2D图像平面上编辑内容来实现直观的操作。我们的方法通过学习优化3D高斯点绘，使其与用户指定视角下渲染的图像的编辑版本对齐。\n为了捕捉远程物体的变形，我们在3D高斯点绘的优化过程中引入了位置损失，并通过重新参数化实现梯度传播。为了解决从指定视角渲染时被遮挡的3D高斯点绘问题，我们构建了一个基于锚点的结构，并采用了粗到细的优化策略，能够处理远程变形，同时保持结构的稳定性。此外，我们设计了一种新颖的掩码策略，自适应地识别非刚性变形区域，以进行细致的建模。广泛的实验表明，我们的方法在处理几何细节、远程变形和非刚性变形方面具有显著的效果，相较于以往的方法展现出更强的编辑灵活性和质量。\n"
  },
  {
    "path": "abs/2408.07967.md",
    "content": "### FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering\n\nThis work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper includes a suite of optimization strategies, encompassing redundancy elimination, efficient pipelining, refined control and scheduling mechanisms, and memory access optimizations, all of which are meticulously integrated to amplify the performance of the rasterization process. An extensive evaluation of FlashGS' performance has been conducted across a diverse spectrum of synthetic and real-world large-scale scenes, encompassing a variety of image resolutions. The empirical findings demonstrate that FlashGS consistently achieves an average 4x acceleration over mobile consumer GPUs, coupled with reduced memory consumption. These results underscore the superior performance and resource optimization capabilities of FlashGS, positioning it as a formidable tool in the domain of 3D rendering.\n\n本研究介绍了FlashGS，一款开源的CUDA Python库，旨在通过算法和内核级优化，促进3D高斯点绘的高效可微光栅化。FlashGS的开发基于对渲染过程的全面分析，旨在提高计算效率并推动这一技术的广泛应用。本文详细描述了一系列优化策略，包括冗余消除、有效的流水线处理、精细化的控制与调度机制，以及内存访问优化，这些策略经过精心整合，极大地提升了光栅化过程的性能。\n我们对FlashGS的性能进行了广泛评估，涵盖了多种分辨率的合成和现实世界的大规模场景。实证结果显示，FlashGS在移动消费级GPU上实现了平均4倍的加速，并且显著减少了内存消耗。这些结果凸显了FlashGS在性能提升和资源优化方面的卓越能力，使其成为3D渲染领域中一款强大的工具。\n"
  },
  {
    "path": "abs/2408.08206.md",
    "content": "### WaterSplatting: Fast Underwater 3D Scene Reconstruction Using Gaussian Splatting\n\nThe underwater 3D scene reconstruction is a challenging, yet interesting problem with applications ranging from naval robots to VR experiences. The problem was successfully tackled by fully volumetric NeRF-based methods which can model both the geometry and the medium (water). Unfortunately, these methods are slow to train and do not offer real-time rendering. More recently, 3D Gaussian Splatting (3DGS) method offered a fast alternative to NeRFs. However, because it is an explicit method that renders only the geometry, it cannot render the medium and is therefore unsuited for underwater reconstruction. Therefore, we propose a novel approach that fuses volumetric rendering with 3DGS to handle underwater data effectively. Our method employs 3DGS for explicit geometry representation and a separate volumetric field (queried once per pixel) for capturing the scattering medium. This dual representation further allows the restoration of the scenes by removing the scattering medium. Our method outperforms state-of-the-art NeRF-based methods in rendering quality on the underwater SeaThru-NeRF dataset. Furthermore, it does so while offering real-time rendering performance, addressing the efficiency limitations of existing methods.\n\n水下3D场景重建是一个充满挑战但极具趣味的问题，应用领域广泛，包括海军机器人和虚拟现实体验。之前的研究成功地利用基于全体积NeRF的方法解决了这个问题，这些方法能够同时建模几何结构和介质（水）。然而，这些方法的训练速度较慢，无法实现实时渲染。近期，3D高斯点绘（3DGS）方法为NeRF提供了一种快速替代方案。然而，由于3DGS是一种显式方法，仅渲染几何结构，无法渲染介质，因此不适用于水下重建。\n为此，我们提出了一种新颖的方法，将体积渲染与3DGS融合，以有效处理水下数据。我们的方法利用3DGS进行显式几何表示，同时使用一个单独的体积场（每像素查询一次）来捕捉散射介质。这种双重表示还允许通过去除散射介质来恢复场景。我们的方法在渲染质量上优于最先进的基于NeRF的方法，特别是在水下SeaThru-NeRF数据集上表现突出。此外，该方法提供了实时渲染性能，有效解决了现有方法在效率上的局限性。\n"
  },
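  {
    "path": "examples/underwater-image-formation-sketch.md",
    "content": "### Code sketch: the underwater image-formation model behind the hybrid\n\nA sketch of the standard scattering-medium model (in the spirit of SeaThru) that WaterSplatting's per-pixel volumetric field has to account for: object radiance attenuated with distance, plus depth-dependent backscatter. Coefficient values and shapes are assumptions for illustration.\n\n```python\nimport torch\n\ndef composite_underwater(object_rgb, depth, beta_attn, beta_bs, veiling_rgb):\n    # object_rgb: (..., 3) clean color from the 3DGS geometry\n    # depth:      (..., 1) distance traveled through the water per pixel\n    # beta_attn, beta_bs: (3,) per-channel attenuation/backscatter coefficients\n    # veiling_rgb: (3,) water color reached at infinite depth\n    direct = object_rgb * torch.exp(-beta_attn * depth)\n    backscatter = veiling_rgb * (1.0 - torch.exp(-beta_bs * depth))\n    return direct + backscatter\n\nrgb = torch.rand(64, 64, 3)\nz = torch.rand(64, 64, 1) * 5.0\nout = composite_underwater(\n    rgb, z,\n    beta_attn=torch.tensor([0.20, 0.10, 0.05]),  # red attenuates fastest\n    beta_bs=torch.tensor([0.15, 0.10, 0.08]),\n    veiling_rgb=torch.tensor([0.10, 0.30, 0.40]))\n```\n\nRestoration then amounts to subtracting the backscatter term and dividing out the attenuation, which is why the dual representation lets the medium be removed from the scene.\n"
  },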
  {
    "path": "abs/2408.08524.md",
    "content": "### GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization\n\nWe present GS-ID, a novel framework for illumination decomposition on Gaussian Splatting, achieving photorealistic novel view synthesis and intuitive light editing. Illumination decomposition is an ill-posed problem facing three main challenges: 1) priors for geometry and material are often lacking; 2) complex illumination conditions involve multiple unknown light sources; and 3) calculating surface shading with numerous light sources is computationally expensive. To address these challenges, we first introduce intrinsic diffusion priors to estimate the attributes for physically based rendering. Then we divide the illumination into environmental and direct components for joint optimization. Last, we employ deferred rendering to reduce the computational load. Our framework uses a learnable environment map and Spherical Gaussians (SGs) to represent light sources parametrically, therefore enabling controllable and photorealistic relighting on Gaussian Splatting. Extensive experiments and applications demonstrate that GS-ID produces state-of-the-art illumination decomposition results while achieving better geometry reconstruction and rendering performance."
  },
  {
    "path": "abs/2408.08723.md",
    "content": "### Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS\n\nNovel View Synthesis (NVS) without Structure-from-Motion (SfM) pre-processed camera poses--referred to as SfM-free methods--is crucial for promoting rapid response capabilities and enhancing robustness against variable operating conditions. Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS. However, most existing works rely on per-pixel image loss functions, such as L2 loss. In SfM-free methods, inaccurate initial poses lead to misalignment issue, which, under the constraints of per-pixel image loss functions, results in excessive gradients, causing unstable optimization and poor convergence for NVS. In this study, we propose a correspondence-guided SfM-free 3D Gaussian splatting for NVS. We use correspondences between the target and the rendered result to achieve better pixel alignment, facilitating the optimization of relative poses between frames. We then apply the learned poses to optimize the entire scene. Each 2D screen-space pixel is associated with its corresponding 3D Gaussians through approximated surface rendering to facilitate gradient back propagation. Experimental results underline the superior performance and time efficiency of the proposed approach compared to the state-of-the-art baselines.\n\n无需结构化运动（Structure-from-Motion, SfM）预处理的相机姿态的新视图合成（Novel View Synthesis, NVS）方法，即无SfM方法，对于促进快速响应能力和增强在可变操作条件下的鲁棒性至关重要。最近的无SfM方法已集成了姿态优化，设计了用于联合相机姿态估计和NVS的端到端框架。然而，大多数现有工作依赖于每像素图像损失函数，如L2损失。在无SfM方法中，不准确的初始姿态会导致对齐问题，在每像素图像损失函数的约束下，导致过大的梯度，从而引起不稳定的优化和NVS的差收敛性。在本研究中，我们提出了一种基于对应关系引导的无SfM 3D高斯投影用于NVS。我们利用目标图像和渲染结果之间的对应关系来实现更好的像素对齐，促进帧间相对姿态的优化。然后，我们应用学习到的姿态来优化整个场景。通过近似表面渲染，每个2D屏幕空间像素与其对应的3D高斯相关联，以便于梯度反向传播。实验结果表明，与现有最先进的基线方法相比，该方法在性能和时间效率上具有显著优势。\n"
  },
  {
    "path": "abs/2408.09130.md",
    "content": "### Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting\n\n3D Gaussian Splatting has recently emerged as a powerful representation that can synthesize remarkable novel views using consistent multi-view images as input. However, we notice that images captured in dark environments where the scenes are not fully illuminated can exhibit considerable brightness variations and multi-view inconsistency, which poses great challenges to 3D Gaussian Splatting and severely degrades its performance. To tackle this problem, we propose Gaussian-DK. Observing that inconsistencies are mainly caused by camera imaging, we represent a consistent radiance field of the physical world using a set of anisotropic 3D Gaussians, and design a camera response module to compensate for multi-view inconsistencies. We also introduce a step-based gradient scaling strategy to constrain Gaussians near the camera, which turn out to be floaters, from splitting and cloning. Experiments on our proposed benchmark dataset demonstrate that Gaussian-DK produces high-quality renderings without ghosting and floater artifacts and significantly outperforms existing methods. Furthermore, we can also synthesize light-up images by controlling exposure levels that clearly show details in shadow areas.\n\n3D高斯投影最近作为一种强大的表示方法出现，能够使用一致的多视角图像作为输入合成出色的新视图。然而，我们注意到，在未完全照明的黑暗环境中拍摄的图像可能会出现显著的亮度变化和多视角不一致性，这对3D高斯投影提出了巨大挑战，并严重降低其性能。为了解决这个问题，我们提出了Gaussian-DK。通过观察，我们发现不一致性主要由相机成像引起，因此我们使用一组各向异性的3D高斯来表示物理世界的一致辐射场，并设计了一个相机响应模块以补偿多视角不一致性。我们还引入了一种基于步进的梯度缩放策略，以限制靠近相机的高斯（这些往往是浮动物）发生分裂和克隆。在我们提出的基准数据集上的实验表明，Gaussian-DK能够生成没有重影和浮动物伪影的高质量渲染效果，并且显著优于现有方法。此外，通过控制曝光水平，我们还可以合成光亮图像，从而清晰地显示阴影区域的细节。\n"
  },
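  {
    "path": "examples/camera-response-sketch.md",
    "content": "### Code sketch: a camera response module over a consistent radiance field\n\nGaussian-DK (abs/2408.09130) attributes the multi-view inconsistency of dark captures to camera imaging, so the Gaussians model one consistent radiance field and a response module reproduces each capture. The per-image exposure plus shared gamma-style curve below is an illustrative guess at such a module, not the paper's exact design.\n\n```python\nimport torch\n\nclass CameraResponse(torch.nn.Module):\n    def __init__(self, num_images):\n        super().__init__()\n        self.log_exposure = torch.nn.Parameter(torch.zeros(num_images))\n        self.log_gamma = torch.nn.Parameter(torch.zeros(1))  # shared curve shape\n\n    def forward(self, radiance, image_idx):\n        # radiance: (..., 3) consistent scene radiance rendered by the Gaussians\n        exposed = radiance * torch.exp(self.log_exposure[image_idx])\n        return exposed.clamp(min=1e-6) ** torch.exp(self.log_gamma)\n\ncrf = CameraResponse(num_images=100)\nrender = torch.rand(32, 32, 3)\npred = crf(render, image_idx=7)  # compare against dark training image 7\n# Raising log_exposure at test time synthesizes the 'light-up' renderings.\n```\n\nThe photometric losses then supervise pred against the inconsistent captures while the underlying radiance field stays exposure-free.\n"
  },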
  {
    "path": "abs/2408.09663.md",
    "content": "### CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning\n\nRecent advancements in human avatar synthesis have utilized radiance fields to reconstruct photo-realistic animatable human avatars. However, both NeRFs-based and 3DGS-based methods struggle with maintaining 3D consistency and exhibit suboptimal detail reconstruction, especially with sparse inputs. To address this challenge, we propose CHASE, which introduces supervision from intrinsic 3D consistency across poses and 3D geometry contrastive learning, achieving performance comparable with sparse inputs to that with full inputs. Following previous work, we first integrate a skeleton-driven rigid deformation and a non-rigid cloth dynamics deformation to coordinate the movements of individual Gaussians during animation, reconstructing basic avatar with coarse 3D consistency. To improve 3D consistency under sparse inputs, we design Dynamic Avatar Adjustment(DAA) to adjust deformed Gaussians based on a selected similar pose/image from the dataset. Minimizing the difference between the image rendered by adjusted Gaussians and the image with the similar pose serves as an additional form of supervision for avatar. Furthermore, we propose a 3D geometry contrastive learning strategy to maintain the 3D global consistency of generated avatars. Though CHASE is designed for sparse inputs, it surprisingly outperforms current SOTA methods in both full and sparse settings on the ZJU-MoCap and H36M datasets, demonstrating that our CHASE successfully maintains avatar's 3D consistency, hence improving rendering quality.\n\n最近在人类化身合成方面的进展利用辐射场来重建可动画的逼真化身。然而，基于NeRFs和3DGS的方法在保持3D一致性方面存在困难，并且在稀疏输入的情况下，细节重建效果欠佳。为了解决这一挑战，我们提出了CHASE，它引入了来自跨姿态的内在3D一致性监督和3D几何对比学习，从而在稀疏输入情况下实现了与完整输入相媲美的性能。借鉴以往的研究，我们首先结合了骨架驱动的刚性变形和非刚性布料动态变形，以协调动画中各个高斯的运动，从而重建具有粗略3D一致性的基础化身。为了在稀疏输入下提高3D一致性，我们设计了动态化身调整（DAA），根据数据集中选择的相似姿态/图像来调整变形后的高斯。通过最小化由调整后的高斯渲染的图像与相似姿态的图像之间的差异，作为对化身的额外监督。此外，我们提出了一种3D几何对比学习策略，以保持生成化身的3D全局一致性。尽管CHASE是为稀疏输入设计的，但它在ZJU-MoCap和H36M数据集的完整和稀疏设置中均优于当前的最先进方法，这表明我们的CHASE成功保持了化身的3D一致性，从而提高了渲染质量。\n"
  },
  {
    "path": "abs/2408.09665.md",
    "content": "### SG-GS: Photo-realistic Animatable Human Avatars with Semantically-Guided Gaussian Splatting\n\nReconstructing photo-realistic animatable human avatars from monocular videos remains challenging in computer vision and graphics. Recently, methods using 3D Gaussians to represent the human body have emerged, offering faster optimization and real-time rendering. However, due to ignoring the crucial role of human body semantic information which represents the intrinsic structure and connections within the human body, they fail to achieve fine-detail reconstruction of dynamic human avatars. To address this issue, we propose SG-GS, which uses semantics-embedded 3D Gaussians, skeleton-driven rigid deformation, and non-rigid cloth dynamics deformation to create photo-realistic animatable human avatars from monocular videos. We then design a Semantic Human-Body Annotator (SHA) which utilizes SMPL's semantic prior for efficient body part semantic labeling. The generated labels are used to guide the optimization of Gaussian semantic attributes. To address the limited receptive field of point-level MLPs for local features, we also propose a 3D network that integrates geometric and semantic associations for human avatar deformation. We further implement three key strategies to enhance the semantic accuracy of 3D Gaussians and rendering quality: semantic projection with 2D regularization, semantic-guided density regularization and semantic-aware regularization with neighborhood consistency. Extensive experiments demonstrate that SG-GS achieves state-of-the-art geometry and appearance reconstruction performance.\n\n从单目视频重建逼真的可动画人类化身在计算机视觉和图形学领域仍然具有挑战性。最近，使用3D高斯来表示人体的方法出现了，提供了更快的优化和实时渲染。然而，由于忽略了人体语义信息的关键作用——这代表了人体内部的固有结构和连接性，这些方法未能实现动态人类化身的精细细节重建。为了解决这一问题，我们提出了SG-GS方法，使用嵌入语义的3D高斯、骨架驱动的刚性变形以及非刚性的布料动态变形，从单目视频中创建逼真的可动画人类化身。我们设计了一个语义人体标注器（SHA），利用SMPL的语义先验进行高效的人体部位语义标注。生成的标签用于指导高斯语义属性的优化。为了解决点级MLP在局部特征方面接收域有限的问题，我们还提出了一个3D网络，将几何和语义关联整合用于人类化身的变形。我们进一步实施了三种关键策略来增强3D高斯的语义准确性和渲染质量：2D正则化的语义投影、语义引导的密度正则化和具有邻域一致性的语义感知正则化。大量实验表明，SG-GS在几何和外观重建性能上达到了当前最先进的水平。\n"
  },
  {
    "path": "abs/2408.10041.md",
    "content": "### Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation\n\nRecent advancements in photo-realistic novel view synthesis have been significantly driven by Gaussian Splatting (3DGS). Nevertheless, the explicit nature of 3DGS data entails considerable storage requirements, highlighting a pressing need for more efficient data representations. To address this, we present Implicit Gaussian Splatting (IGS), an innovative hybrid model that integrates explicit point clouds with implicit feature embeddings through a multi-level tri-plane architecture. This architecture features 2D feature grids at various resolutions across different levels, facilitating continuous spatial domain representation and enhancing spatial correlations among Gaussian primitives. Building upon this foundation, we introduce a level-based progressive training scheme, which incorporates explicit spatial regularization. This method capitalizes on spatial correlations to enhance both the rendering quality and the compactness of the IGS representation. Furthermore, we propose a novel compression pipeline tailored for both point clouds and 2D feature grids, considering the entropy variations across different levels. Extensive experimental evaluations demonstrate that our algorithm can deliver high-quality rendering using only a few MBs, effectively balancing storage efficiency and rendering fidelity, and yielding results that are competitive with the state-of-the-art."
  },
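  {
    "path": "examples/triplane-query-sketch.md",
    "content": "### Code sketch: a multi-level tri-plane feature lookup\n\nA minimal sketch of the hybrid queried by IGS (abs/2408.10041): a Gaussian's position gathers features from three axis-aligned 2D grids per level, and levels at different resolutions are combined, giving implicit, spatially correlated attributes for explicit points. Resolutions, channel counts, and the sum-combine are assumptions.\n\n```python\nimport torch\nimport torch.nn.functional as F\n\ndef triplane_query(levels, xyz):\n    # levels: list of (3, C, R, R) tensors, one tri-plane set per resolution\n    # xyz: (N, 3) query positions normalized to [-1, 1]^3\n    feats = 0.0\n    for planes in levels:\n        coords = torch.stack([xyz[:, [0, 1]],   # XY plane\n                              xyz[:, [0, 2]],   # XZ plane\n                              xyz[:, [1, 2]]])  # YZ plane -> (3, N, 2)\n        grid = coords.unsqueeze(2)               # (3, N, 1, 2) sample grid\n        f = F.grid_sample(planes, grid, mode='bilinear', align_corners=True)\n        feats = feats + f.squeeze(-1).sum(dim=0).T  # (N, C), planes summed\n    return feats  # decoded downstream into per-Gaussian attributes\n\nlevels = [torch.randn(3, 16, r, r) for r in (32, 64, 128)]\nxyz = torch.rand(1024, 3) * 2 - 1\nfeatures = triplane_query(levels, xyz)  # (1024, 16)\n```\n\nStoring a few 2D grids instead of per-Gaussian attributes is where the MB-scale footprint comes from, and neighboring Gaussians sharing grid cells induces the spatial correlation the progressive training scheme exploits.\n"
  },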
  {
    "path": "abs/2408.10154.md",
    "content": "### LoopSplat: Loop Closure by Registering 3D Gaussian Splats\n\nSimultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS submaps and frame-to-model tracking. LoopSplat triggers loop closure online and computes relative loop edge constraints between submaps directly via 3DGS registration, leading to improvements in efficiency and accuracy over traditional global-to-local point cloud registration. It uses a robust pose graph optimization formulation and rigidly aligns the submaps to achieve global consistency. Evaluation on the synthetic Replica and real-world TUM-RGBD, ScanNet, and ScanNet++ datasets demonstrates competitive or superior tracking, mapping, and rendering compared to existing methods for dense RGB-D SLAM.\n\n基于3D高斯投影（3DGS）的同步定位与地图构建（SLAM）最近显示出实现更精确、密集的3D场景地图的潜力。然而，现有的基于3DGS的方法未能通过回环闭合和/或全局束调整来解决场景的全局一致性问题。为此，我们提出了LoopSplat，它以RGB-D图像作为输入，利用3DGS子地图和帧对模型跟踪进行密集映射。LoopSplat在线触发回环闭合，并直接通过3DGS配准计算子地图之间的相对回环边约束，相较于传统的全局到局部点云配准，提高了效率和准确性。它采用了一种稳健的位姿图优化方法，通过刚性对齐子地图来实现全局一致性。在合成的Replica数据集和真实世界的TUM-RGBD、ScanNet以及ScanNet++数据集上的评估显示，相较于现有的密集RGB-D SLAM方法，LoopSplat在跟踪、映射和渲染方面表现出竞争力或更优的性能。\n"
  },
  {
    "path": "abs/2408.10588.md",
    "content": "### DEGAS: Detailed Expressions on Full-Body Gaussian Avatars\n\nAlthough neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method learns a conditional variational autoencoder that takes both the body motion and facial expression as driving signals to generate Gaussian maps in the UV layout. To drive the facial expressions, instead of the commonly used 3D Morphable Models (3DMMs) in 3D head avatars, we propose to adopt the expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars. Leveraging the rendering capability of 3DGS and the rich expressiveness of the expression latent space, the learned avatars can be reenacted to reproduce photorealistic rendering images with subtle and accurate facial expressions. Experiments on an existing dataset and our newly proposed dataset of full-body talking avatars demonstrate the efficacy of our method. We also propose an audio-driven extension of our method with the help of 2D talking faces, opening new possibilities to interactive AI agents.\n\n尽管神经渲染在创建逼真的、可动画的全身和头部化身方面取得了显著进展，但将详细的面部表情融入全身化身的研究仍然相对较少。我们提出了DEGAS，这是首个基于3D高斯投影（3DGS）的全身化身建模方法，可以表现丰富的面部表情。我们的方法在给定主体的多视角视频上进行训练，学习一个条件变分自编码器，它同时以身体动作和面部表情作为驱动信号来生成UV布局中的高斯图。为了驱动面部表情，我们提出使用仅在2D肖像图像上训练的表情潜在空间，而不是常用的3D头部化身中的3D可变形模型（3DMMs），从而弥合2D说话脸和3D化身之间的差距。利用3DGS的渲染能力和表情潜在空间的丰富表现力，所学习的化身可以重新演绎，以再现具有细腻且准确面部表情的逼真渲染图像。在现有数据集和我们新提出的全身说话化身数据集上的实验表明，我们的方法是有效的。我们还提出了一个基于音频驱动的方法扩展，通过2D说话脸的帮助，为交互式AI代理开辟了新的可能性。\n"
  },
  {
    "path": "abs/2408.10906.md",
    "content": "### ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining\n\n3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. The creation of this dataset utilized the compute equivalent of 2 GPU years on a TITAN XP GPU.\nWe utilize our dataset for unsupervised pretraining and supervised finetuning for classification and segmentation tasks. To this end, we introduce Gaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks.\n\n3D Gaussian Splatting (3DGS) 已成为许多视觉任务中三维表示的实际标准。这促使我们直接在这种表示空间中进行三维理解的研究。为推动这一方向的研究，我们首先使用常用的 ShapeNet 和 ModelNet 数据集构建了一个大规模的 3DGS 数据集。我们的数据集 ShapeSplat 包含来自 87 个独特类别的 65K 个对象，其标签与各自的数据集一致。创建此数据集使用了相当于在 TITAN XP GPU 上两年 GPU 计算时间的资源。\n我们将此数据集用于无监督预训练和有监督微调，以进行分类和分割任务。为此，我们引入了 Gaussian-MAE 方法，突出了从高斯参数进行表示学习的独特优势。通过详尽的实验，我们提供了若干有价值的见解。特别是，我们表明：(1) 优化后的 GS 质心的分布与均匀采样点云（用于初始化）的分布显著不同；(2) 当仅使用质心时，这种分布变化会导致分类任务的性能下降，但分割任务的性能提升；(3) 为了利用额外的高斯参数，我们提出了在归一化特征空间中进行高斯特征分组，以及 splats pooling 层，提供了一种定制的解决方案来有效地分组和嵌入相似的高斯分布，从而在微调任务中带来显著改进。\n"
  },
  {
    "path": "abs/2408.10935.md",
    "content": "### Large Point-to-Gaussian Model for Image-to-3D Generation\n\nRecently, image-to-3D approaches have significantly advanced the generation quality and speed of 3D assets based on large reconstruction models, particularly 3D Gaussian reconstruction models. Existing large 3D Gaussian models directly map 2D image to 3D Gaussian parameters, while regressing 2D image to 3D Gaussian representations is challenging without 3D priors. In this paper, we propose a large Point-to-Gaussian model, that inputs the initial point cloud produced from large 3D diffusion model conditional on 2D image to generate the Gaussian parameters, for image-to-3D generation. The point cloud provides initial 3D geometry prior for Gaussian generation, thus significantly facilitating image-to-3D Generation. Moreover, we present the Attention mechanism, Projection mechanism, and Point feature extractor, dubbed as APP block, for fusing the image features with point cloud features. The qualitative and quantitative experiments extensively demonstrate the effectiveness of the proposed approach on GSO and Objaverse datasets, and show the proposed method achieves state-of-the-art performance.\n\n最近，基于大型重建模型的图像到三维（image-to-3D）方法在生成质量和速度方面取得了显著进展，尤其是在三维高斯重建模型方面。现有的大型三维高斯模型直接将二维图像映射到三维高斯参数上，而在没有三维先验的情况下，将二维图像回归到三维高斯表示是具有挑战性的。在本文中，我们提出了一种大型 Point-to-Gaussian 模型，该模型输入由大型三维扩散模型在二维图像条件下生成的初始点云，以生成高斯参数，用于图像到三维的生成。点云提供了高斯生成的初始三维几何先验，因此显著促进了图像到三维的生成。此外，我们提出了注意力机制、投影机制和点特征提取器，统称为 APP 块，用于融合图像特征和点云特征。定性和定量实验广泛证明了该方法在 GSO 和 Objaverse 数据集上的有效性，并显示该方法达到了当前最先进的性能。\n"
  },
  {
    "path": "abs/2408.11085.md",
    "content": "### GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting\n\nWe leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement framework, GSLoc. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences. GSLoc obviates the need for training feature extractors or descriptors by operating directly on RGB images, utilizing the 3D vision foundation model, MASt3R, for precise 2D matching. To improve the robustness of our model in challenging outdoor environments, we incorporate an exposure-adaptive module within the 3DGS framework. Consequently, GSLoc enables efficient pose refinement given a single RGB query and a coarse initial pose estimation. Our proposed approach surpasses leading NeRF-based optimization methods in both accuracy and runtime across indoor and outdoor visual localization benchmarks, achieving state-of-the-art accuracy on two indoor datasets.\n\n我们利用三维高斯喷涂（3DGS）作为场景表示，提出了一种新颖的测试时相机姿态优化框架，称为 GSLoc。该框架提升了最先进的绝对姿态回归和场景坐标回归方法的定位精度。3DGS 模型通过渲染高质量的合成图像和深度图，促进了二维与三维之间的对应关系建立。GSLoc 无需训练特征提取器或描述符，直接在 RGB 图像上操作，并利用三维视觉基础模型 MASt3R 进行精确的二维匹配。为了提高模型在复杂室外环境中的鲁棒性，我们在 3DGS 框架中引入了一个曝光自适应模块。因此，GSLoc 能够在给定单个 RGB 查询图像和粗略初始姿态估计的情况下高效地进行姿态优化。我们的方法在室内和室外视觉定位基准上，在精度和运行时间上均优于领先的基于 NeRF 的优化方法，并在两个室内数据集上实现了最先进的精度。\n"
  },
  {
    "path": "abs/2408.11413.md",
    "content": "### Pano2Room: Novel View Synthesis from a Single Indoor Panorama\n\nRecent single-view 3D generative methods have made significant advancements by leveraging knowledge distilled from extensive 3D object datasets. However, challenges persist in the synthesis of 3D scenes from a single view, primarily due to the complexity of real-world environments and the limited availability of high-quality prior resources. In this paper, we introduce a novel approach called Pano2Room, designed to automatically reconstruct high-quality 3D indoor scenes from a single panoramic image. These panoramic images can be easily generated using a panoramic RGBD inpainter from captures at a single location with any camera. The key idea is to initially construct a preliminary mesh from the input panorama, and iteratively refine this mesh using a panoramic RGBD inpainter while collecting photo-realistic 3D-consistent pseudo novel views. Finally, the refined mesh is converted into a 3D Gaussian Splatting field and trained with the collected pseudo novel views. This pipeline enables the reconstruction of real-world 3D scenes, even in the presence of large occlusions, and facilitates the synthesis of photo-realistic novel views with detailed geometry. Extensive qualitative and quantitative experiments have been conducted to validate the superiority of our method in single-panorama indoor novel synthesis compared to the state-of-the-art.\n\n近年来，单视图三维生成方法通过利用从广泛的三维对象数据集中提取的知识取得了显著进展。然而，由于真实世界环境的复杂性以及高质量先验资源的有限性，从单个视图合成三维场景仍面临挑战。在本文中，我们提出了一种新颖的方法，称为 Pano2Room，旨在从单个全景图像自动重建高质量的室内三维场景。这些全景图像可以通过全景 RGBD 图像修补器从任何相机在单个位置捕获的图像轻松生成。我们的核心思想是首先从输入的全景图像构建一个初步网格，并使用全景 RGBD 图像修补器迭代地优化该网格，同时收集逼真的三维一致的伪新视图。最终，优化后的网格被转换为三维高斯喷涂场，并使用收集到的伪新视图进行训练。该流程即使在存在大量遮挡的情况下，也能重建真实的三维场景，并能合成具有详细几何信息的逼真新视图。通过大量的定性和定量实验，我们验证了我们的方法在单全景室内新视图合成方面相较于当前最先进的方法的优越性。\n"
  },
  {
    "path": "abs/2408.11447.md",
    "content": "### GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting\n\nWe introduce GaussianOcc, a systematic method that investigates the two usages of Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views. First, traditional methods for self-supervised 3D occupancy estimation still require ground truth 6D poses from sensors during training. To address this limitation, we propose Gaussian Splatting for Projection (GSP) module to provide accurate scale information for fully self-supervised training from adjacent view projection. Additionally, existing methods rely on volume rendering for final 3D voxel representation learning using 2D signals (depth maps, semantic maps), which is both time-consuming and less effective. We propose Gaussian Splatting from Voxel space (GSV) to leverage the fast rendering properties of Gaussian splatting. As a result, the proposed GaussianOcc method enables fully self-supervised (no ground truth pose) 3D occupancy estimation in competitive performance with low computational cost (2.7 times faster in training and 5 times faster in rendering).\n\n我们介绍了一种系统化方法——GaussianOcc，该方法研究了高斯喷涂在完全自监督和高效的环绕视角三维占用估计中的两种用法。首先，传统的自监督三维占用估计方法在训练期间仍需要传感器提供的真实6D位姿数据。为了解决这一局限性，我们提出了用于投影的高斯喷涂（GSP）模块，通过相邻视角投影提供准确的尺度信息，从而实现完全自监督的训练。此外，现有方法依赖于体渲染来使用二维信号（深度图、语义图）进行最终三维体素表示学习，这不仅耗时而且效果较差。我们提出了从体素空间进行高斯喷涂（GSV），以利用高斯喷涂的快速渲染特性。结果表明，提出的 GaussianOcc 方法在没有真实位姿的情况下实现了完全自监督的三维占用估计，并且在性能具有竞争力的同时，计算成本较低（训练速度提高了2.7倍，渲染速度提高了5倍）。\n\n"
  },
  {
    "path": "abs/2408.11540.md",
    "content": "### DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments\n\nReconstruction under adverse rainy conditions poses significant challenges due to reduced visibility and the distortion of visual perception. These conditions can severely impair the quality of geometric maps, which is essential for applications ranging from autonomous planning to environmental monitoring. In response to these challenges, this study introduces the novel task of 3D Reconstruction in Rainy Environments (3DRRE), specifically designed to address the complexities of reconstructing 3D scenes under rainy conditions. To benchmark this task, we construct the HydroViews dataset that comprises a diverse collection of both synthesized and real-world scene images characterized by various intensities of rain streaks and raindrops. Furthermore, we propose DeRainGS, the first 3DGS method tailored for reconstruction in adverse rainy environments. Extensive experiments across a wide range of rain scenarios demonstrate that our method delivers state-of-the-art performance, remarkably outperforming existing occlusion-free methods.\n\n在恶劣的雨天条件下进行重建由于能见度降低和视觉感知的扭曲而面临重大挑战。这些条件可能严重影响几何地图的质量，而几何地图对于从自动规划到环境监测的各种应用至关重要。针对这些挑战，本研究引入了一个新颖的任务——雨天环境下的三维重建（3DRRE），专门用于解决雨天条件下三维场景重建的复杂性。为评估这一任务，我们构建了 HydroViews 数据集，该数据集包含了多种强度的雨痕和雨滴特征的合成和真实世界场景图像。此外，我们提出了 DeRainGS，这是首个专为恶劣雨天环境中的重建设计的 3DGS 方法。在各种雨天场景下的大量实验表明，我们的方法提供了最先进的性能，显著优于现有的无遮挡方法。\n"
  },
  {
    "path": "abs/2408.11697.md",
    "content": "### Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors\n\n3D Gaussian Splatting has shown impressive novel view synthesis results; nonetheless, it is vulnerable to dynamic objects polluting the input data of an otherwise static scene, so called distractors. Distractors have severe impact on the rendering quality as they get represented as view-dependent effects or result in floating artifacts. Our goal is to identify and ignore such distractors during the 3D Gaussian optimization to obtain a clean reconstruction. To this end, we take a self-supervised approach that looks at the image residuals during the optimization to determine areas that have likely been falsified by a distractor. In addition, we leverage a pretrained segmentation network to provide object awareness, enabling more accurate exclusion of distractors. This way, we obtain segmentation masks of distractors to effectively ignore them in the loss formulation. We demonstrate that our approach is robust to various distractors and strongly improves rendering quality on distractor-polluted scenes, improving PSNR by 1.86dB compared to 3D Gaussian Splatting.\n\n3D 高斯喷涂在新视角合成中展示了令人印象深刻的效果，然而，它容易受到动态物体干扰的影响，这些干扰物会污染原本静态场景的输入数据，被称为干扰物。干扰物会严重影响渲染质量，因为它们会表现为视角依赖的效果或导致浮动的伪影。我们的目标是在 3D 高斯优化过程中识别并忽略这些干扰物，以获得干净的重建效果。为此，我们采用了一种自监督的方法，通过在优化过程中观察图像残差来确定哪些区域可能被干扰物伪造。此外，我们利用预训练的分割网络提供物体感知能力，从而更准确地排除干扰物。通过这种方式，我们获得了干扰物的分割掩码，在损失函数中有效地忽略它们。实验表明，我们的方法对各种干扰物具有鲁棒性，并在干扰物污染的场景中显著提高了渲染质量，与传统 3D 高斯喷涂方法相比，峰值信噪比（PSNR）提高了 1.86dB。\n"
  },
  {
    "path": "abs/2408.12282.md",
    "content": "### Subsurface Scattering for 3D Gaussian Splatting\n\n3D reconstruction and relighting of objects made from scattering materials present a significant challenge due to the complex light transport beneath the surface. 3D Gaussian Splatting introduced high-quality novel view synthesis at real-time speeds. While 3D Gaussians efficiently approximate an object's surface, they fail to capture the volumetric properties of subsurface scattering. We propose a framework for optimizing an object's shape together with the radiance transfer field given multi-view OLAT (one light at a time) data. Our method decomposes the scene into an explicit surface represented as 3D Gaussians, with a spatially varying BRDF, and an implicit volumetric representation of the scattering component. A learned incident light field accounts for shadowing. We optimize all parameters jointly via ray-traced differentiable rendering. Our approach enables material editing, relighting and novel view synthesis at interactive rates. We show successful application on synthetic data and introduce a newly acquired multi-view multi-light dataset of objects in a light-stage setup. Compared to previous work we achieve comparable or better results at a fraction of optimization and rendering time while enabling detailed control over material attributes.\n\n3D重建和重新照明由散射材料制成的物体具有显著的挑战性，因为表面下复杂的光传输。3D高斯投影技术引入了实时速度下的高质量新视图合成。虽然3D高斯能够高效地近似物体的表面，但它们无法捕捉次表面散射的体积特性。我们提出了一个框架，通过多视角OLAT（每次一个光源）数据来优化物体的形状以及辐射传输场。我们的方法将场景分解为一个由3D高斯表示的显式表面，带有空间变化的BRDF，以及一个隐式的散射组件体积表示。一个学习的入射光场负责阴影处理。我们通过光线追踪可微分渲染共同优化所有参数。我们的方法实现了材质编辑、重新照明和交互速率下的新视图合成。我们展示了在合成数据上的成功应用，并引入了一个新获得的多视角多光源数据集，该数据集在光照舞台环境下采集。与之前的工作相比，我们在优化和渲染时间的一小部分内实现了可比或更好的结果，同时能够详细控制材质属性。\n"
  },
  {
    "path": "abs/2408.12677.md",
    "content": "### GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion\n\nTraditional volumetric fusion algorithms preserve the spatial structure of 3D scenes, which is beneficial for many tasks in computer vision and robotics. However, they often lack realism in terms of visualization. Emerging 3D Gaussian splatting bridges this gap, but existing Gaussian-based reconstruction methods often suffer from artifacts and inconsistencies with the underlying 3D structure, and struggle with real-time optimization, unable to provide users with immediate feedback in high quality. One of the bottlenecks arises from the massive amount of Gaussian parameters that need to be updated during optimization. Instead of using 3D Gaussian as a standalone map representation, we incorporate it into a volumetric mapping system to take advantage of geometric information and propose to use a quadtree data structure on images to drastically reduce the number of splats initialized. In this way, we simultaneously generate a compact 3D Gaussian map with fewer artifacts and a volumetric map on the fly. Our method, GSFusion, significantly enhances computational efficiency without sacrificing rendering quality, as demonstrated on both synthetic and real datasets.\n\n传统的体积融合算法能够保持 3D 场景的空间结构，这对于计算机视觉和机器人技术中的许多任务是有益的。然而，这些算法在可视化方面往往缺乏现实感。新兴的 3D 高斯点云技术弥补了这一差距，但现有的基于高斯的重建方法常常存在伪影和与基础 3D 结构的不一致，并且在实时优化方面表现不佳，无法为用户提供高质量的即时反馈。瓶颈之一来自于在优化过程中需要更新的大量高斯参数。我们不再将 3D 高斯作为单独的地图表示，而是将其整合到体积映射系统中，利用几何信息，并提出使用图像上的四叉树数据结构，以大幅减少初始化时的点云数量。通过这种方式，我们同时生成了一个更紧凑的 3D 高斯地图，减少了伪影，并实时生成体积地图。我们的方法 GSFusion 显著提高了计算效率，同时不牺牲渲染质量，这在合成数据集和真实数据集上都得到了验证。\n"
  },
  {
    "path": "abs/2408.12894.md",
    "content": "### FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering\n\n3D Gaussian Splatting (3DGS) achieves fast and high-quality renderings by using numerous small Gaussians, which leads to significant memory consumption. This reliance on a large number of Gaussians restricts the application of 3DGS-based models on low-cost devices due to memory limitations. However, simply reducing the number of Gaussians to accommodate devices with less memory capacity leads to inferior quality compared to the quality that can be achieved on high-end hardware. To address this lack of scalability, we propose integrating a Flexible Level of Detail (FLoD) to 3DGS, to allow a scene to be rendered at varying levels of detail according to hardware capabilities. While existing 3DGSs with LoD focus on detailed reconstruction, our method provides reconstructions using a small number of Gaussians for reduced memory requirements, and a larger number of Gaussians for greater detail. Experiments demonstrate our various rendering options with tradeoffs between rendering quality and memory usage, thereby allowing real-time rendering across different memory constraints. Furthermore, we show that our method generalizes to different 3DGS frameworks, indicating its potential for integration into future state-of-the-art developments.\n\n3D 高斯点云（3DGS）通过使用大量小型高斯实现了快速和高质量的渲染，但这也导致了显著的内存消耗。这种对大量高斯的依赖限制了 3DGS 基础模型在低成本设备上的应用，因为这些设备的内存有限。然而，简单地减少高斯数量以适应内存较小的设备，会导致比高端硬件上实现的质量差。为了解决这种可扩展性不足的问题，我们提出将灵活细节级别（FLoD）集成到 3DGS 中，以根据硬件能力以不同细节级别渲染场景。虽然现有的带有细节级别的 3DGS 侧重于详细重建，我们的方法使用少量高斯进行重建以减少内存需求，并使用更多高斯以获得更高细节。实验展示了我们的各种渲染选项在渲染质量和内存使用之间的权衡，从而允许在不同内存限制下进行实时渲染。此外，我们还表明我们的方法可以推广到不同的 3DGS 框架，显示出其在未来前沿发展中的整合潜力。\n"
  },
  {
    "path": "abs/2408.13036.md",
    "content": "### S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points\n\nRecently, the dynamic scene reconstruction using Gaussians has garnered increased interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene in the canonical space. However, the inherently low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scenes with varying resolutions and durations. To overcome these challenges, we introduce a novel approach utilizing discrete 3D control points. This method models local rays physically and establishes a motion-decoupling coordinate system, which effectively merges traditional graphics with learnable pipelines for a robust and efficient local 6-degrees-of-freedom (6-DoF) motion representation. Additionally, we have developed a generalized framework that incorporates our control points with Gaussians. Starting from an initial 3D reconstruction, our workflow decomposes the streaming 4D real-world reconstruction into four independent submodules: 3D segmentation, 3D control points generation, object-wise motion manipulation, and residual compensation. Our experiments demonstrate that this method outperforms existing state-of-the-art 4D Gaussian Splatting techniques on both the Neu3DV and CMU-Panoptic datasets. Our approach also significantly accelerates training, with the optimization of our 3D control points achievable within just 2 seconds per frame on a single NVIDIA 4070 GPU.\n\n最近，使用高斯进行动态场景重建引起了越来越多的关注。主流方法通常采用全局变形场在典范空间中扭曲 3D 场景。然而，隐式神经场固有的低频特性常常导致对复杂运动的表示效果不佳。此外，它们的结构刚性可能妨碍对分辨率和持续时间变化的场景的适应。为了解决这些问题，我们提出了一种利用离散 3D 控制点的新方法。该方法物理建模局部光线，并建立了一个运动解耦坐标系统，能够有效地将传统图形学与可学习管道结合起来，实现强大而高效的局部 6 自由度（6-DoF）运动表示。此外，我们开发了一个通用框架，将我们的控制点与高斯相结合。从初始的 3D 重建开始，我们的工作流程将流式 4D 现实世界重建分解为四个独立的子模块：3D 分割、3D 控制点生成、对象级运动操控和残差补偿。实验表明，该方法在 Neu3DV 和 CMU-Panoptic 数据集上优于现有的最先进 4D 高斯点云技术。我们的方案还显著加快了训练速度，在单个 NVIDIA 4070 GPU 上每帧的 3D 控制点优化时间可缩短至仅 2 秒。\n"
  },
  {
    "path": "abs/2408.13055.md",
    "content": "### Atlas Gaussians Diffusion for 3D Generation with Infinite Number of Points\n\nUsing the latent diffusion model has proven effective in developing novel 3D generation techniques. To harness the latent diffusion model, a key challenge is designing a high-fidelity and efficient representation that links the latent space and the 3D space. In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. Atlas Gaussians represent a shape as the union of local patches, and each patch can decode 3D Gaussians. We parameterize a patch as a sequence of feature vectors and design a learnable function to decode 3D Gaussians from the feature vectors. In this process, we incorporate UV-based sampling, enabling the generation of a sufficiently large, and theoretically infinite, number of 3D Gaussian points. The large amount of 3D Gaussians enables high-quality details of generation results. Moreover, due to local awareness of the representation, the transformer-based decoding procedure operates on a patch level, ensuring efficiency. We train a variational autoencoder to learn the Atlas Gaussians representation, and then apply a latent diffusion model on its latent space for learning 3D Generation. Experiments show that our approach outperforms the prior arts of feed-forward native 3D generation.\n\n使用潜在扩散模型在开发新型 3D 生成技术方面已被证明是有效的。要利用潜在扩散模型，一个关键挑战是设计一种高保真且高效的表示方式，将潜在空间和 3D 空间连接起来。本文介绍了 Atlas Gaussians，一种用于前馈本地 3D 生成的新型表示。Atlas Gaussians 将形状表示为局部补丁的并集，每个补丁可以解码 3D 高斯。我们将补丁参数化为一系列特征向量，并设计了一个可学习的函数，从特征向量中解码 3D 高斯。在此过程中，我们结合了基于 UV 的采样，允许生成足够大且理论上无限的 3D 高斯点。大量的 3D 高斯点能够生成高质量的细节。此外，由于表示的局部特性，基于变压器的解码过程在补丁级别上操作，确保了效率。我们训练了一个变分自编码器来学习 Atlas Gaussians 表示，然后在其潜在空间上应用潜在扩散模型进行 3D 生成。实验表明，我们的方法优于以前的前馈本地 3D 生成技术。\n"
  },
  {
    "path": "abs/2408.13252.md",
    "content": "### LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation\n\n3D immersive scene generation is a challenging yet critical task in computer vision and graphics. A desired virtual 3D scene should 1) exhibit omnidirectional view consistency, and 2) allow for free exploration in complex scene hierarchies. Existing methods either rely on successive scene expansion via inpainting or employ panorama representation to represent large FOV scene environments. However, the generated scene suffers from semantic drift during expansion and is unable to handle occlusion among scene hierarchies. To tackle these challenges, we introduce LayerPano3D, a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt. Our key insight is to decompose a reference 2D panorama into multiple layers at different depth levels, where each layer reveals the unseen space from the reference views via diffusion prior. LayerPano3D comprises multiple dedicated designs: 1) we introduce a novel text-guided anchor view synthesis pipeline for high-quality, consistent panorama generation. 2) We pioneer the Layered 3D Panorama as underlying representation to manage complex scene hierarchies and lift it into 3D Gaussians to splat detailed 360-degree omnidirectional scenes with unconstrained viewing paths. Extensive experiments demonstrate that our framework generates state-of-the-art 3D panoramic scene in both full view consistency and immersive exploratory experience. We believe that LayerPano3D holds promise for advancing 3D panoramic scene creation with numerous applications.\n\n3D 沉浸式场景生成是计算机视觉和图形学中的一个具有挑战性但至关重要的任务。理想的虚拟 3D 场景应具备 1) 全方位视图一致性，以及 2) 允许在复杂场景层级中自由探索。现有方法要么依赖于通过修补逐步扩展场景，要么使用全景表示来展现大视场的场景环境。然而，这些方法在扩展过程中常常出现语义漂移，并且无法处理场景层级间的遮挡。为了解决这些问题，我们提出了 LayerPano3D，一种从单一文本提示生成全视角、可探索全景 3D 场景的新框架。我们的关键见解是将参考的 2D 全景分解为不同深度层次的多个层，其中每一层通过扩散先验揭示参考视角下未见的空间。LayerPano3D 包括多个专用设计：1) 我们引入了一种新型的文本引导锚点视图合成管道，实现高质量、一致的全景生成。2) 我们开创了层次化 3D 全景作为基础表示，以管理复杂的场景层级，并将其提升为 3D 高斯，生成详细的 360 度全方位场景，支持无约束的视角路径。广泛的实验表明，我们的框架在全视角一致性和沉浸式探索体验上生成了最先进的 3D 全景场景。我们相信，LayerPano3D 对于推动 3D 全景场景创作具有广泛的应用前景。\n"
  },
  {
    "path": "abs/2408.13370.md",
    "content": "### BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting\n\nWe present Bidirectional Gaussian Primitives, an image-based novel view synthesis technique designed to represent and render 3D objects with surface and volumetric materials under dynamic illumination. Our approach integrates light intrinsic decomposition into the Gaussian splatting framework, enabling real-time relighting of 3D objects. To unify surface and volumetric material within a cohesive appearance model, we adopt a light- and view-dependent scattering representation via bidirectional spherical harmonics. Our model does not use a specific surface normal-related reflectance function, making it more compatible with volumetric representations like Gaussian splatting, where the normals are undefined. We demonstrate our method by reconstructing and rendering objects with complex materials. Using One-Light-At-a-Time (OLAT) data as input, we can reproduce photorealistic appearances under novel lighting conditions in real time.\n\n我们提出了双向高斯原语（Bidirectional Gaussian Primitives），这是一种基于图像的新视图合成技术，旨在在动态光照下表示和渲染具有表面和体积材料的 3D 对象。我们的方法将光的内在分解集成到高斯点云框架中，实现了 3D 对象的实时重光照。为了将表面和体积材料统一到一个一致的外观模型中，我们采用了基于双向球面谐波的光照和视角依赖散射表示。我们的模型不使用特定的表面法线相关反射函数，因此更适合与高斯点云等体积表示配合使用，因为这些表示中法线是未定义的。我们通过重建和渲染具有复杂材料的对象来展示我们的方法。利用逐光源（One-Light-At-a-Time, OLAT）数据作为输入，我们能够在新光照条件下实时重现逼真的外观。\n"
  },
  {
    "path": "abs/2408.13711.md",
    "content": "### SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting\n\nText-driven 3D scene generation has seen significant advancements recently. However, most existing methods generate single-view images using generative models and then stitch them together in 3D space. This independent generation for each view often results in spatial inconsistency and implausibility in the 3D scenes. To address this challenge, we proposed a novel text-driven 3D-consistent scene generation model: SceneDreamer360. Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation and employs 3D Gaussian Splatting (3DGS) to ensure consistency across multi-view panoramic images. Specifically, SceneDreamer360 enhances the fine-tuned Panfusion generator with a three-stage panoramic enhancement, enabling the generation of high-resolution, detail-rich panoramic images. During the 3D scene construction, a novel point cloud fusion initialization method is used, producing higher quality and spatially consistent point clouds. Our extensive experiments demonstrate that compared to other methods, SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt.\n\n基于文本的 3D 场景生成最近取得了显著进展。然而，大多数现有方法生成单视图图像并将其拼接到 3D 空间中，这种对每个视图的独立生成常常导致 3D 场景中的空间不一致和不现实。为了解决这个问题，我们提出了一种新型的基于文本的 3D 一致场景生成模型：SceneDreamer360。我们的方法利用基于文本的全景图像生成模型作为 3D 场景生成的先验，并采用 3D 高斯点云（3DGS）确保多视图全景图像的一致性。具体来说，SceneDreamer360 对细化的 Panfusion 生成器进行了三阶段的全景增强，实现了高分辨率、细节丰富的全景图像生成。在 3D 场景构建过程中，我们使用了一种新型的点云融合初始化方法，生成了更高质量和空间一致的点云。我们的大量实验表明，与其他方法相比，SceneDreamer360 通过全景图像生成和 3DGS 能够从任何文本提示生成更高质量、空间一致且视觉上更吸引人的 3D 场景。\n\n"
  },
  {
    "path": "abs/2408.13770.md",
    "content": "### TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers\n\nCompared with previous 3D reconstruction methods like Nerf, recent Generalizable 3D Gaussian Splatting (G-3DGS) methods demonstrate impressive efficiency even in the sparse-view setting. However, the promising reconstruction performance of existing G-3DGS methods relies heavily on accurate multi-view feature matching, which is quite challenging. Especially for the scenes that have many non-overlapping areas between various views and contain numerous similar regions, the matching performance of existing methods is poor and the reconstruction precision is limited. To address this problem, we develop a strategy that utilizes a predicted depth confidence map to guide accurate local feature matching. In addition, we propose to utilize the knowledge of existing monocular depth estimation models as prior to boost the depth estimation precision in non-overlapping areas between views. Combining the proposed strategies, we present a novel G-3DGS method named TranSplat, which obtains the best performance on both the RealEstate10K and ACID benchmarks while maintaining competitive speed and presenting strong cross-dataset generalization ability.\n\n与之前的 3D 重建方法如 Nerf 相比，最近的通用 3D 高斯点云（G-3DGS）方法在稀疏视图设置下表现出令人印象深刻的效率。然而，现有 G-3DGS 方法的良好重建性能在很大程度上依赖于准确的多视图特征匹配，这非常具有挑战性。特别是对于那些各视图之间有许多非重叠区域并且包含大量相似区域的场景，现有方法的匹配性能较差，重建精度有限。为了解决这个问题，我们开发了一种利用预测深度置信度图来引导准确局部特征匹配的策略。此外，我们建议利用现有单目深度估计模型的知识作为先验，以提升视图间非重叠区域的深度估计精度。结合这些策略，我们提出了一种新型 G-3DGS 方法，命名为 TranSplat，它在 RealEstate10K 和 ACID 基准测试中均获得了最佳性能，同时保持了竞争的速度，并展示了强大的跨数据集泛化能力。\n\n"
  },
  {
    "path": "abs/2408.13912.md",
    "content": "### Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs\n\nIn this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a foundation 3D geometry reconstruction method, MASt3R, by extending it to deal with both 3D structure and appearance. Specifically, unlike the original MASt3R which reconstructs only 3D point clouds, we predict the additional Gaussian attributes required to construct a Gaussian primitive for each point. Hence, unlike other novel view synthesis methods, Splatt3R is first trained by optimizing the 3D point cloud's geometry loss, and then a novel view synthesis objective. By doing this, we avoid the local minima present in training 3D Gaussian Splats from stereo views. We also propose a novel loss masking strategy that we empirically find is critical for strong performance on extrapolated viewpoints. We train Splatt3R on the ScanNet++ dataset and demonstrate excellent generalisation to uncalibrated, in-the-wild images. Splatt3R can reconstruct scenes at 4FPS at 512 x 512 resolution, and the resultant splats can be rendered in real-time.\n\n在本文中，我们介绍了Splatt3R，这是一种无姿态、前馈方法，用于从立体图像对中进行野外场景的3D重建和新视角合成。在给定未校准的自然图像的情况下，Splatt3R可以预测3D高斯斑点，而不需要任何相机参数或深度信息。为了提高泛化能力，我们在一个“基础”3D几何重建方法MASt3R的基础上构建了Splatt3R，通过扩展它来处理3D结构和外观。具体来说，与原始MASt3R仅重建3D点云不同，我们预测了构造每个点所需的额外高斯属性。因此，与其他新视角合成方法不同，Splatt3R首先通过优化3D点云的几何损失进行训练，然后再进行新视角合成目标的训练。通过这种方式，我们避免了训练3D高斯斑点时存在的局部最小值问题。我们还提出了一种新颖的损失掩蔽策略，我们通过实验证明，这对在外推视点上获得强性能至关重要。我们在ScanNet++数据集上训练了Splatt3R，并展示了它在未校准的野外图像中的优异泛化能力。Splatt3R可以以512 x 512分辨率下每秒4帧的速度重建场景，并且生成的斑点可以实时渲染。\n\n"
  },
  {
    "path": "abs/2408.13972.md",
    "content": "### DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting\n\nDynamic scene reconstruction has garnered significant attention in recent years due to its capabilities in high-quality and real-time rendering. Among various methodologies, constructing a 4D spatial-temporal representation, such as 4D-GS, has gained popularity for its high-quality rendered images. However, these methods often produce suboptimal surfaces, as the discrete 3D Gaussian point clouds fail to align with the object's surface precisely. To address this problem, we propose DynaSurfGS to achieve both photorealistic rendering and high-fidelity surface reconstruction of dynamic scenarios. Specifically, the DynaSurfGS framework first incorporates Gaussian features from 4D neural voxels with the planar-based Gaussian Splatting to facilitate precise surface reconstruction. It leverages normal regularization to enforce the smoothness of the surface of dynamic objects. It also incorporates the as-rigid-as-possible (ARAP) constraint to maintain the approximate rigidity of local neighborhoods of 3D Gaussians between timesteps and ensure that adjacent 3D Gaussians remain closely aligned throughout. Extensive experiments demonstrate that DynaSurfGS surpasses state-of-the-art methods in both high-fidelity surface reconstruction and photorealistic rendering.\n\n动态场景重建近年来受到了广泛关注，因为它在高质量和实时渲染方面具有显著能力。在各种方法中，构建4D时空表示（如4D-GS）因其高质量的渲染图像而受到欢迎。然而，这些方法往往会产生次优的表面，因为离散的3D高斯点云无法精确对齐物体表面。为了解决这一问题，我们提出了DynaSurfGS，以实现动态场景的光学真实感渲染和高保真表面重建。具体而言，DynaSurfGS框架首先将来自4D神经体素的高斯特征与基于平面的高斯斑点结合，以促进精确的表面重建。它利用法线正则化来强制动态物体表面的光滑性。它还引入了尽可能刚性的（ARAP）约束，以保持时间步之间3D高斯的局部邻域的近似刚性，并确保相邻的3D高斯在整个过程中保持紧密对齐。大量实验表明，DynaSurfGS在高保真表面重建和光学真实感渲染方面超越了现有的最先进方法。\n"
  },
  {
    "path": "abs/2408.13995.md",
    "content": "### Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control\n\nLanguage based editing of 3D human avatars to precisely match user requirements is challenging due to the inherent ambiguity and limited expressiveness of natural language. To overcome this, we propose the Avatar Concept Slider (ACS), a 3D avatar editing method that allows precise manipulation of semantic concepts in human avatars towards a specified intermediate point between two extremes of concepts, akin to moving a knob along a slider track. To achieve this, our ACS has three designs. 1) A Concept Sliding Loss based on Linear Discriminant Analysis to pinpoint the concept-specific axis for precise editing. 2) An Attribute Preserving Loss based on Principal Component Analysis for improved preservation of avatar identity during editing. 3) A 3D Gaussian Splatting primitive selection mechanism based on concept-sensitivity, which updates only the primitives that are the most sensitive to our target concept, to improve efficiency. Results demonstrate that our ACS enables fine-grained 3D avatar editing with efficient feedback, without harming the avatar quality or compromising the avatar's identifying attributes.\n\n基于语言对3D人类头像进行编辑以精确匹配用户需求是一项具有挑战性的任务，因为自然语言固有的模糊性和表达限制。为了解决这个问题，我们提出了Avatar Concept Slider (ACS)，这是一种3D头像编辑方法，允许在两个概念极端之间向指定的中间点精确操作语义概念，类似于在滑块轨道上移动旋钮。为实现这一目标，我们的ACS有三个设计要点：1) 基于线性判别分析的概念滑动损失，用于精确定位概念特定轴线以进行编辑。2) 基于主成分分析的属性保留损失，以在编辑过程中提高头像身份的保留。3) 基于概念敏感性的3D高斯斑点原语选择机制，仅更新对目标概念最敏感的原语，以提高效率。结果表明，我们的ACS实现了细粒度的3D头像编辑，并且能够高效地提供反馈，而不会损害头像质量或妨碍头像的识别属性。\n"
  },
  {
    "path": "abs/2408.14823.md",
    "content": "### LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming\n\nThe rise of Extended Reality (XR) requires efficient streaming of 3D online worlds, challenging current 3DGS representations to adapt to bandwidth-constrained environments. This paper proposes LapisGS, a layered 3DGS that supports adaptive streaming and progressive rendering. Our method constructs a layered structure for cumulative representation, incorporates dynamic opacity optimization to maintain visual fidelity, and utilizes occupancy maps to efficiently manage Gaussian splats. This proposed model offers a progressive representation supporting a continuous rendering quality adapted for bandwidth-aware streaming. Extensive experiments validate the effectiveness of our approach in balancing visual fidelity with the compactness of the model, with up to 50.71% improvement in SSIM, 286.53% improvement in LPIPS, and 318.41% reduction in model size, and shows its potential for bandwidth-adapted 3D streaming and rendering applications.\n\n随着扩展现实（XR）的兴起，需要高效地流式传输3D在线世界，这对当前的3D高斯斑点（3DGS）表示在带宽受限环境下的适应性提出了挑战。本文提出了LapisGS，这是一种支持自适应流式传输和渐进渲染的分层3DGS方法。我们的方法构建了一个累积表示的分层结构，结合了动态不透明度优化以保持视觉保真度，并利用占用图来高效管理高斯斑点。该模型提供了一种渐进式表示，支持根据带宽需求自适应的连续渲染质量。大量实验验证了我们的方法在平衡视觉保真度与模型紧凑性方面的有效性，SSIM提高了最高50.71%，LPIPS提高了286.53%，模型大小减少了318.41%，并展示了其在带宽适配的3D流媒体和渲染应用中的潜力。\n\n"
  },
  {
    "path": "abs/2408.14873.md",
    "content": "### Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation\n\nReal2Sim2Real plays a critical role in robotic arm control and reinforcement learning, yet bridging this gap remains a significant challenge due to the complex physical properties of robots and the objects they manipulate. Existing methods lack a comprehensive solution to accurately reconstruct real-world objects with spatial representations and their associated physics attributes. We propose a Real2Sim pipeline with a hybrid representation model that integrates mesh geometry, 3D Gaussian kernels, and physics attributes to enhance the digital asset representation of robotic arms. This hybrid representation is implemented through a Gaussian-Mesh-Pixel binding technique, which establishes an isomorphic mapping between mesh vertices and Gaussian models. This enables a fully differentiable rendering pipeline that can be optimized through numerical solvers, achieves high-fidelity rendering via Gaussian Splatting, and facilitates physically plausible simulation of the robotic arm's interaction with its environment using mesh-based methods.\n\nReal2Sim2Real在机器人臂控制和强化学习中扮演着关键角色，但由于机器人及其操作的物体的复杂物理属性，弥合这一差距仍然是一个重大挑战。现有方法缺乏一种全面的解决方案来准确重建具有空间表示和相关物理属性的真实世界物体。\n我们提出了一种Real2Sim管道，采用混合表示模型，将网格几何、3D高斯核和物理属性结合起来，以增强机器人臂的数字资产表示。这种混合表示通过高斯-网格-像素绑定技术实现，该技术在网格顶点和高斯模型之间建立了同构映射。这使得通过数值求解器进行优化的完全可微分渲染管道成为可能，通过高斯斑点实现高保真渲染，并使用基于网格的方法促进机器人臂与环境交互的物理上合理的模拟。\n"
  },
  {
    "path": "abs/2408.15242.md",
    "content": "### Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty\n\nRobust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuitively, the data from the drone's perspective can provide a complementary viewpoint for the data from the ground vehicle's perspective, enhancing the completeness of scene reconstruction and rendering. However, training naively with aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS, and does not demonstrate remarkable improvements in performance on road views. In order to enhance the novel view synthesis of road views and to effectively use the aerial information, we design an uncertainty-aware training method that allows aerial images to assist in the synthesis of areas where ground images have poor learning outcomes instead of weighting all pixels equally in 3D-GS training like prior work did. We are the first to introduce the cross-view uncertainty to 3D-GS by matching the car-view ensemble-based rendering uncertainty to aerial images, weighting the contribution of each pixel to the training process. Additionally, to systematically quantify evaluation metrics, we assemble a high-quality synthesized dataset comprising both aerial and ground images for road scenes.\n\n在自动驾驶仿真中，对大规模道路场景进行稳健且真实的渲染至关重要。最近，3D高斯斑点（3D-GS）在神经渲染方面取得了突破性进展，但大规模道路场景渲染的总体保真度通常受到输入图像的限制，这些图像通常具有狭窄的视野，并主要集中在街道级别的局部区域。直观上，从无人机视角获取的数据可以为地面车辆视角的数据提供补充视角，从而增强场景重建和渲染的完整性。然而，直接使用具有较大视角差异的空中图像和地面图像进行训练，对3D-GS来说会导致显著的收敛挑战，并且在道路视图上的性能提升不明显。\n为了增强道路视图的新视角合成并有效利用空中信息，我们设计了一种不确定性感知训练方法，使空中图像能够帮助合成地面图像学习效果较差的区域，而不是像之前的工作那样在3D-GS训练中对所有像素进行均等加权。我们首次将交叉视图不确定性引入3D-GS，通过将汽车视角的集合基础渲染不确定性与空中图像匹配，权衡每个像素对训练过程的贡献。此外，为了系统地量化评估指标，我们组装了一个高质量合成数据集，包括道路场景的空中和地面图像。\n"
  },
  {
    "path": "abs/2408.15695.md",
    "content": "### G-Style: Stylized Gaussian Splatting\n\nWe introduce G-Style, a novel algorithm designed to transfer the style of an image onto a 3D scene represented using Gaussian Splatting. Gaussian Splatting is a powerful 3D representation for novel view synthesis, as -- compared to other approaches based on Neural Radiance Fields -- it provides fast scene renderings and user control over the scene. Recent pre-prints have demonstrated that the style of Gaussian Splatting scenes can be modified using an image exemplar. However, since the scene geometry remains fixed during the stylization process, current solutions fall short of producing satisfactory results. Our algorithm aims to address these limitations by following a three-step process: In a pre-processing step, we remove undesirable Gaussians with large projection areas or highly elongated shapes. Subsequently, we combine several losses carefully designed to preserve different scales of the style in the image, while maintaining as much as possible the integrity of the original scene content. During the stylization process and following the original design of Gaussian Splatting, we split Gaussians where additional detail is necessary within our scene by tracking the gradient of the stylized color. Our experiments demonstrate that G-Style generates high-quality stylizations within just a few minutes, outperforming existing methods both qualitatively and quantitatively.\n\n我们介绍了G-Style，这是一种新颖的算法，用于将图像的风格转移到使用高斯斑点表示的3D场景中。高斯斑点是一种强大的3D表示方法，特别适用于新视角合成，因为与基于神经辐射场的其他方法相比，它提供了快速的场景渲染和对场景的用户控制。近期的预印本已展示了如何使用图像样本修改高斯斑点场景的风格。然而，由于场景几何在风格化过程中保持不变，现有解决方案未能产生令人满意的结果。\n我们的算法旨在通过以下三步过程来解决这些局限性：在预处理步骤中，我们去除投影面积较大或形状高度延伸的非理想高斯。随后，我们结合了几种精心设计的损失函数，以保留图像中不同尺度的风格，同时尽可能保持原始场景内容的完整性。在风格化过程中，按照高斯斑点的原始设计，我们根据风格化颜色的梯度跟踪，在场景中拆分需要额外细节的高斯。我们的实验表明，G-Style能够在短短几分钟内生成高质量的风格化效果，并在定性和定量方面超越了现有方法。\n"
  },
  {
    "path": "abs/2408.15708.md",
    "content": "### Towards Realistic Example-based Modeling via 3D Gaussian Stitching\n\nUsing parts of existing models to rebuild new models, commonly termed as example-based modeling, is a classical methodology in the realm of computer graphics. Previous works mostly focus on shape composition, making them very hard to use for realistic composition of 3D objects captured from real-world scenes. This leads to combining multiple NeRFs into a single 3D scene to achieve seamless appearance blending. However, the current SeamlessNeRF method struggles to achieve interactive editing and harmonious stitching for real-world scenes due to its gradient-based strategy and grid-based representation. To this end, we present an example-based modeling method that combines multiple Gaussian fields in a point-based representation using sample-guided synthesis. Specifically, as for composition, we create a GUI to segment and transform multiple fields in real time, easily obtaining a semantically meaningful composition of models represented by 3D Gaussian Splatting (3DGS). For texture blending, due to the discrete and irregular nature of 3DGS, straightforwardly applying gradient propagation as SeamlssNeRF is not supported. Thus, a novel sampling-based cloning method is proposed to harmonize the blending while preserving the original rich texture and content. Our workflow consists of three steps: 1) real-time segmentation and transformation of a Gaussian model using a well-tailored GUI, 2) KNN analysis to identify boundary points in the intersecting area between the source and target models, and 3) two-phase optimization of the target model using sampling-based cloning and gradient constraints. Extensive experimental results validate that our approach significantly outperforms previous works in terms of realistic synthesis, demonstrating its practicality.\n\n使用现有模型的部分重建新模型，通常称为基于示例的建模，是计算机图形学领域的一种经典方法。之前的工作大多集中于形状合成，这使得它们在现实世界场景中对3D物体的逼真组合应用上非常困难。这导致了将多个NeRF结合成一个单一3D场景以实现无缝外观混合。然而，当前的SeamlessNeRF方法由于其基于梯度的策略和网格表示，难以实现交互式编辑和现实世界场景的和谐拼接。\n为此，我们提出了一种基于示例的建模方法，结合了多个高斯场，通过点基表示使用样本指导合成。具体来说，在合成方面，我们创建了一个GUI来实时分割和变换多个高斯场，轻松获得由3D高斯斑点（3DGS）表示的语义上有意义的模型组合。对于纹理混合，由于3DGS的离散和不规则性质，像SeamlessNeRF那样直接应用梯度传播是不支持的。因此，提出了一种新的基于采样的克隆方法来协调混合，同时保留原始丰富的纹理和内容。我们的工作流程包括三个步骤：1) 使用精心设计的GUI对高斯模型进行实时分割和变换，2) KNN分析以识别源模型与目标模型交叉区域的边界点，3) 使用基于采样的克隆和梯度约束对目标模型进行两阶段优化。大量实验结果验证了我们的方法在现实合成方面显著优于之前的工作，展示了其实用性。\n"
  },
  {
    "path": "abs/2408.16760.md",
    "content": "### OmniRe: Omni Urban Scene Reconstruction\n\nWe introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. Recent methods for modeling driving sequences using neural radiance fields or Gaussian Splatting have demonstrated the potential of reconstructing challenging dynamic scenes, but often overlook pedestrians and other non-vehicle dynamic actors, hindering a complete pipeline for dynamic urban scene reconstruction. To that end, we propose a comprehensive 3DGS framework for driving scenes, named OmniRe, that allows for accurate, full-length reconstruction of diverse dynamic objects in a driving log. OmniRe builds dynamic neural scene graphs based on Gaussian representations and constructs multiple local canonical spaces that model various dynamic actors, including vehicles, pedestrians, and cyclists, among many others. This capability is unmatched by existing methods. OmniRe allows us to holistically reconstruct different objects present in the scene, subsequently enabling the simulation of reconstructed scenarios with all actors participating in real-time (~60Hz). Extensive evaluations on the Waymo dataset show that our approach outperforms prior state-of-the-art methods quantitatively and qualitatively by a large margin. We believe our work fills a critical gap in driving reconstruction.\n\n我们介绍了 OmniRe，这是一种全面的方法，用于高效重建高保真动态城市场景。近期使用神经辐射场或高斯点云进行驾驶序列建模的方法展示了重建复杂动态场景的潜力，但常常忽视行人和其他非车辆动态角色，阻碍了动态城市场景重建的完整流程。为此，我们提出了一个全面的 3DGS 框架，名为 OmniRe，它允许在驾驶日志中准确、完整地重建各种动态物体。OmniRe 基于高斯表示构建动态神经场景图，并构建多个本地典型空间，以建模包括车辆、行人和骑自行车者在内的各种动态角色。这一能力在现有方法中无可比拟。OmniRe 使我们能够全面重建场景中存在的不同物体，进而实现所有参与者实时（~60Hz）模拟重建场景。在 Waymo 数据集上的广泛评估表明，我们的方法在定量和定性方面均大幅超越了现有的最先进方法。我们相信我们的工作填补了驾驶重建中的一个关键空白。\n"
  },
  {
    "path": "abs/2408.16767.md",
    "content": "### ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model\n\nAdvancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction. However, 3D view consistency struggles to be accurately preserved in directly generated video frames from pre-trained models. To address this, given limited input views, the proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency, ensuring the coherence of the scene from various perspectives. Finally, we recover the 3D scene from the generated video through a confidence-aware 3D Gaussian Splatting optimization scheme. Extensive experiments on various real-world datasets show the superiority of our ReconX over state-of-the-art methods in terms of quality and generalizability.\n\n3D 场景重建的进展将现实世界的 2D 图像转化为 3D 模型，通过数百张输入照片生成逼真的 3D 结果。尽管在密集视图重建场景中取得了巨大成功，但从不足的捕获视图中渲染详细场景仍然是一个不适定的优化问题，通常会导致未见区域的伪影和扭曲。在本文中，我们提出了 ReconX，这是一种新颖的 3D 场景重建范式，将模糊的重建挑战重新定义为时间生成任务。关键的见解是释放大型预训练视频扩散模型在稀疏视图重建中的强生成先验。然而，在直接从预训练模型生成的视频帧中，3D 视图一致性难以准确保持。为解决这一问题，在有限输入视图的情况下，ReconX 首先构建全球点云并将其编码为上下文空间作为 3D 结构条件。视频扩散模型在该条件指导下合成既保留细节又具有高度 3D 一致性的视频帧，确保从各个角度的场景连贯性。最后，我们通过一种基于置信度的 3D 高斯点云优化方案从生成的视频中恢复 3D 场景。在各种现实世界数据集上的广泛实验表明，我们的 ReconX 在质量和泛化能力方面优于现有最先进的方法。\n"
  },
  {
    "path": "abs/2408.16982.md",
    "content": "### 2DGH: 2D Gaussian-Hermite Splatting for High-quality Rendering and Better Geometry Reconstruction\n\n2D Gaussian Splatting has recently emerged as a significant method in 3D reconstruction, enabling novel view synthesis and geometry reconstruction simultaneously. While the well-known Gaussian kernel is broadly used, its lack of anisotropy and deformation ability leads to dim and vague edges at object silhouettes, limiting the reconstruction quality of current Gaussian splatting methods. To enhance the representation power, we draw inspiration from quantum physics and propose to use the Gaussian-Hermite kernel as the new primitive in Gaussian splatting. The new kernel takes a unified mathematical form and extends the Gaussian function, which serves as the zero-rank term in the updated formulation. Our experiments demonstrate the extraordinary performance of Gaussian-Hermite kernel in both geometry reconstruction and novel-view synthesis tasks. The proposed kernel outperforms traditional Gaussian Splatting kernels, showcasing its potential for high-quality 3D reconstruction and rendering.\n\n2D 高斯点云最近成为 3D 重建中的一种重要方法，能够同时实现新视图合成和几何重建。尽管广泛使用著名的高斯核，但其缺乏各向异性和变形能力，导致物体轮廓处边缘模糊，限制了当前高斯点云方法的重建质量。为了提升表示能力，我们从量子物理学中获得灵感，提出使用高斯-厄米特核作为高斯点云中的新原语。新核采用统一的数学形式，扩展了高斯函数，作为更新公式中的零阶项。我们的实验表明，高斯-厄米特核在几何重建和新视图合成任务中表现卓越。与传统高斯点云核相比，所提出的核展现了其在高质量 3D 重建和渲染中的潜力。\n"
  },
  {
    "path": "abs/2408.17223.md",
    "content": "### OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping\n\n3D Gaussian splatting (3DGS) has recently demonstrated promising advancements in RGB-D online dense mapping. Nevertheless, existing methods excessively rely on per-pixel depth cues to perform map densification, which leads to significant redundancy and increased sensitivity to depth noise. Additionally, explicitly storing 3D Gaussian parameters of room-scale scene poses a significant storage challenge. In this paper, we introduce OG-Mapping, which leverages the robust scene structural representation capability of sparse octrees, combined with structured 3D Gaussian representations, to achieve efficient and robust online dense mapping. Moreover, OG-Mapping employs an anchor-based progressive map refinement strategy to recover the scene structures at multiple levels of detail. Instead of maintaining a small number of active keyframes with a fixed keyframe window as previous approaches do, a dynamic keyframe window is employed to allow OG-Mapping to better tackle false local minima and forgetting issues. Experimental results demonstrate that OG-Mapping delivers more robust and superior realism mapping results than existing Gaussian-based RGB-D online mapping methods with a compact model, and no additional post-processing is required.\n\n3D 高斯点云（3DGS）最近在 RGB-D 在线密集映射中展示了有前景的进展。然而，现有方法过度依赖每像素深度线索进行地图稠密化，这导致了显著的冗余和对深度噪声的敏感性。此外，显式存储房间尺度场景的 3D 高斯参数面临显著的存储挑战。在本文中，我们介绍了 OG-Mapping，它利用稀疏八叉树的强大场景结构表示能力，结合结构化的 3D 高斯表示，实现高效且鲁棒的在线密集映射。此外，OG-Mapping 采用基于锚点的渐进式地图优化策略，以在多个细节层次恢复场景结构。与以前方法保持固定关键帧窗口的小数量活跃关键帧不同，OG-Mapping 采用动态关键帧窗口，使其能够更好地应对局部最小值和遗忘问题。实验结果表明，OG-Mapping 在紧凑模型下提供了比现有基于高斯的 RGB-D 在线映射方法更鲁棒且更优质的现实映射结果，无需额外的后处理。\n"
  },
  {
    "path": "abs/2409.00362.md",
    "content": "### UDGS-SLAM : UniDepth Assisted Gaussian Splatting for Monocular SLAM\n\nRecent advancements in monocular neural depth estimation, particularly those achieved by the UniDepth network, have prompted the investigation of integrating UniDepth within a Gaussian splatting framework for monocular SLAM.This study presents UDGS-SLAM, a novel approach that eliminates the necessity of RGB-D sensors for depth estimation within Gaussian splatting framework. UDGS-SLAM employs statistical filtering to ensure local consistency of the estimated depth and jointly optimizes camera trajectory and Gaussian scene representation parameters. The proposed method achieves high-fidelity rendered images and low ATERMSE of the camera trajectory. The performance of UDGS-SLAM is rigorously evaluated using the TUM RGB-D dataset and benchmarked against several baseline methods, demonstrating superior performance across various scenarios. Additionally, an ablation study is conducted to validate design choices and investigate the impact of different network backbone encoders on system performance.\n\n最近，单目神经深度估计方面的进展，特别是 UniDepth 网络的成果，促使了在单目 SLAM 中集成 UniDepth 的研究。本研究提出了 UDGS-SLAM，这是一种新颖的方法，消除了在高斯点喷射框架中进行深度估计时对 RGB-D 传感器的需求。UDGS-SLAM 使用统计滤波来确保估计深度的局部一致性，并联合优化相机轨迹和高斯场景表示参数。该方法实现了高保真度的渲染图像和较低的相机轨迹 ATERMSE。UDGS-SLAM 的性能通过 TUM RGB-D 数据集进行了严格评估，并与多个基准方法进行了比较，展示了在各种场景下的优越性能。此外，还进行了消融研究，以验证设计选择并探讨不同网络主干编码器对系统性能的影响。\n"
  },
  {
    "path": "abs/2409.00381.md",
    "content": "### 3D Gaussian Splatting for Large-scale 3D Surface Reconstruction from Aerial Images\n\nRecently, 3D Gaussian Splatting (3DGS) has garnered significant attention. However, the unstructured nature of 3DGS poses challenges for large-scale surface reconstruction from aerial images. To address this gap, we propose the first large-scale surface reconstruction method for multi-view stereo (MVS) aerial images based on 3DGS, named Aerial Gaussian Splatting (AGS). Initially, we introduce a data chunking method tailored for large-scale aerial imagery, making the modern 3DGS technology feasible for surface reconstruction over extensive scenes. Additionally, we integrate the Ray-Gaussian Intersection method to obtain normal and depth information, facilitating geometric constraints. Finally, we introduce a multi-view geometric consistency constraint to enhance global geometric consistency and improve reconstruction accuracy. Our experiments on multiple datasets demonstrate for the first time that the GS-based technique can match traditional aerial MVS methods on geometric accuracy, and beat state-of-the-art GS-based methods on geometry and rendering quality.\n\n最近，3D 高斯点喷射（3DGS）引起了广泛关注。然而，3DGS 的非结构化特性给大规模表面重建带来了挑战。为了填补这一空白，我们提出了首个基于 3DGS 的大规模多视角立体（MVS）航拍图像表面重建方法，命名为 Aerial Gaussian Splatting（AGS）。首先，我们引入了一种针对大规模航拍图像的数据分块方法，使现代 3DGS 技术能够在广阔场景中进行表面重建。此外，我们集成了 Ray-Gaussian Intersection 方法，以获取法线和深度信息，便于几何约束。最后，我们引入了多视角几何一致性约束，以提高全局几何一致性和重建精度。我们的实验表明，GS 基础技术首次在几何精度上与传统航拍 MVS 方法相匹配，并在几何和渲染质量上超越了最先进的 GS 基础方法。\n"
  },
  {
    "path": "abs/2409.01003.md",
    "content": "### Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos\n\nReconstructing endoscopic videos is crucial for high-fidelity visualization and the efficiency of surgical operations. Despite the importance, existing 3D reconstruction methods encounter several challenges, including stringent demands for accuracy, imprecise camera positioning, intricate dynamic scenes, and the necessity for rapid reconstruction. Addressing these issues, this paper presents the first camera-pose-free scene reconstruction framework, Free-DyGS, tailored for dynamic surgical videos, leveraging 3D Gaussian splatting technology. Our approach employs a frame-by-frame reconstruction strategy and is delineated into four distinct phases: Scene Initialization, Joint Learning, Scene Expansion, and Retrospective Learning. We introduce a Generalizable Gaussians Parameterization module within the Scene Initialization and Expansion phases to proficiently generate Gaussian attributes for each pixel from the RGBD frames. The Joint Learning phase is crafted to concurrently deduce scene deformation and camera pose, facilitated by an innovative flexible deformation module. In the scene expansion stage, the Gaussian points gradually grow as the camera moves. The Retrospective Learning phase is dedicated to enhancing the precision of scene deformation through the reassessment of prior frames. The efficacy of the proposed Free-DyGS is substantiated through experiments on two datasets: the StereoMIS and Hamlyn datasets. The experimental outcomes underscore that Free-DyGS surpasses conventional baseline models in both rendering fidelity and computational efficiency.\n\n重建内窥镜视频对于高保真可视化和手术操作的效率至关重要。尽管如此，现有的 3D 重建方法面临诸多挑战，包括对准确性的严格要求、相机定位不精确、复杂的动态场景以及快速重建的必要性。针对这些问题，本文提出了首个无相机姿态的场景重建框架 Free-DyGS，专为动态手术视频设计，利用 3D 高斯点喷射技术。我们的方法采用逐帧重建策略，并分为四个不同的阶段：场景初始化、联合学习、场景扩展和回顾学习。在场景初始化和扩展阶段，我们引入了通用高斯参数化模块，以高效地从 RGBD 帧生成每个像素的高斯属性。联合学习阶段旨在通过创新的灵活变形模块同时推导场景变形和相机姿态。在场景扩展阶段，随着相机移动，高斯点逐渐增长。回顾学习阶段则专注于通过重新评估先前的帧来提高场景变形的精度。通过对 StereoMIS 和 Hamlyn 数据集的实验验证，证明了 Free-DyGS 在渲染保真度和计算效率上超越了传统基线模型。\n"
  },
  {
    "path": "abs/2409.01581.md",
    "content": "### GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting\n\nDense colored point clouds enhance visual perception and are of significant value in various robotic applications. However, existing learning-based point cloud upsampling methods are constrained by computational resources and batch processing strategies, which often require subdividing point clouds into smaller patches, leading to distortions that degrade perceptual quality. To address this challenge, we propose a novel 2D-3D hybrid colored point cloud upsampling framework (GaussianPU) based on 3D Gaussian Splatting (3DGS) for robotic perception. This approach leverages 3DGS to bridge 3D point clouds with their 2D rendered images in robot vision systems. A dual scale rendered image restoration network transforms sparse point cloud renderings into dense representations, which are then input into 3DGS along with precise robot camera poses and interpolated sparse point clouds to reconstruct dense 3D point clouds. We have made a series of enhancements to the vanilla 3DGS, enabling precise control over the number of points and significantly boosting the quality of the upsampled point cloud for robotic scene understanding. Our framework supports processing entire point clouds on a single consumer-grade GPU, such as the NVIDIA GeForce RTX 3090, eliminating the need for segmentation and thus producing high-quality, dense colored point clouds with millions of points for robot navigation and manipulation tasks. Extensive experimental results on generating million-level point cloud data validate the effectiveness of our method, substantially improving the quality of colored point clouds and demonstrating significant potential for applications involving large-scale point clouds in autonomous robotics and human-robot interaction scenarios.\n\n密集的彩色点云能够提升视觉感知，并在各种机器人应用中具有重要价值。然而，现有的基于学习的点云上采样方法受到计算资源和批处理策略的限制，这些方法通常需要将点云划分为较小的块，导致失真，从而降低感知质量。为了解决这个问题，我们提出了一种新颖的基于 3D 高斯点喷射（3DGS）的 2D-3D 混合彩色点云上采样框架（GaussianPU），用于机器人感知。该方法利用 3DGS 将 3D 点云与机器人视觉系统中的 2D 渲染图像进行桥接。双尺度渲染图像恢复网络将稀疏点云渲染转换为密集表示，这些表示与精确的机器人相机姿态和插值的稀疏点云一起输入到 3DGS 中，以重建密集的 3D 点云。我们对原始 3DGS 进行了系列增强，实现了对点数的精确控制，并显著提升了点云上采样的质量，以便更好地理解机器人场景。我们的框架支持在单个消费级 GPU（如 NVIDIA GeForce RTX 3090）上处理整个点云，消除了对分割的需求，从而为机器人导航和操作任务生成高质量、密集的彩色点云，点数达到数百万级。大量生成百万级点云数据的实验结果验证了我们方法的有效性，显著提高了彩色点云的质量，并展示了在自主机器人和人机交互场景中应用大规模点云的巨大潜力。\n"
  },
  {
    "path": "abs/2409.01761.md",
    "content": "### PRoGS: Progressive Rendering of Gaussian Splats\n\nOver the past year, 3D Gaussian Splatting (3DGS) has received significant attention for its ability to represent 3D scenes in a perceptually accurate manner. However, it can require a substantial amount of storage since each splat's individual data must be stored. While compression techniques offer a potential solution by reducing the memory footprint, they still necessitate retrieving the entire scene before any part of it can be rendered. In this work, we introduce a novel approach for progressively rendering such scenes, aiming to display visible content that closely approximates the final scene as early as possible without loading the entire scene into memory. This approach benefits both on-device rendering applications limited by memory constraints and streaming applications where minimal bandwidth usage is preferred. To achieve this, we approximate the contribution of each Gaussian to the final scene and construct an order of prioritization on their inclusion in the rendering process. Additionally, we demonstrate that our approach can be combined with existing compression methods to progressively render (and stream) 3DGS scenes, optimizing bandwidth usage by focusing on the most important splats within a scene. Overall, our work establishes a foundation for making remotely hosted 3DGS content more quickly accessible to end-users in over-the-top consumption scenarios, with our results showing significant improvements in quality across all metrics compared to existing methods.\n\n在过去一年中，3D 高斯点喷射（3DGS）因其以感知准确的方式表示 3D 场景而受到了广泛关注。然而，由于每个点喷射的单独数据必须被存储，这可能需要大量的存储空间。虽然压缩技术提供了通过减少内存占用的潜在解决方案，但它们仍然需要在渲染任何部分之前检索整个场景。在这项工作中，我们提出了一种新颖的方法来逐步渲染这些场景，旨在尽早显示接近最终场景的可见内容，而无需将整个场景加载到内存中。这种方法对受内存限制的设备渲染应用程序和对带宽使用有最低要求的流式应用程序都具有优势。为此，我们近似每个高斯点对最终场景的贡献，并建立了一个优先级排序，以确定它们在渲染过程中的包含顺序。此外，我们展示了我们的方法可以与现有的压缩方法结合使用，以逐步渲染（和流式传输）3DGS 场景，通过关注场景中最重要的点喷射来优化带宽使用。总体而言，我们的工作为在超高清视频消费场景中更快地访问远程托管的 3DGS 内容奠定了基础，我们的结果在所有指标上相比现有方法显示了显著的质量提升。\n"
  },
  {
    "path": "abs/2409.02104.md",
    "content": "### DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction\n\nReconstructing scenes and tracking motion are two sides of the same coin. Tracking points allow for geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes allows for 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking to overcome occlusion ambiguities by lifting tracking directly into 3D [38]. However, above approaches either require offline processing or multi-view camera setups both unrealistic for real-world applications like robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input introducing Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends 3D Gaussians to capture new content and object motions while estimating camera movements from a single RGB frame. DynOMo stands out by enabling emergence of point trajectories through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It sets the first baseline for online point tracking with monocular unposed cameras, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.\n\n场景重建和运动跟踪是密切相关的两个方面。跟踪点可以实现几何重建 [14]，而对（动态）场景的几何重建则允许在时间上进行 3D 点跟踪 [24, 39]。最近，这种方法也被用于 2D 点跟踪，通过将跟踪直接提升到 3D 来克服遮挡歧义 [38]。然而，这些方法要么需要离线处理，要么依赖于多视角相机设置，这些在机器人导航或混合现实等实际应用中并不切实际。我们针对从无姿态单目相机输入中进行在线 2D 和 3D 点跟踪的挑战，提出了动态在线单目重建（DynOMo）。我们利用 3D 高斯点喷射技术以在线方式重建动态场景。我们的方法扩展了 3D 高斯点，以捕捉新内容和物体运动，同时从单帧 RGB 图像中估计相机运动。DynOMo 的独特之处在于通过稳健的图像特征重建和新颖的相似性增强正则化项实现点轨迹的生成，而无需任何对应级别的监督。它为单目无姿态相机的在线点跟踪设立了首个基准，并在性能上与现有方法相当。我们的目标是激励社区推进在线点跟踪和重建技术，将其应用扩展到各种实际场景中。\n"
  },
  {
    "path": "abs/2409.02382.md",
    "content": "### GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving\n\nWe propose GGS, a Generalizable Gaussian Splatting method for Autonomous Driving which can achieve realistic rendering under large viewpoint changes. Previous generalizable 3D gaussian splatting methods are limited to rendering novel views that are very close to the original pair of images, which cannot handle large differences in viewpoint. Especially in autonomous driving scenarios, images are typically collected from a single lane. The limited training perspective makes rendering images of a different lane very challenging. To further improve the rendering capability of GGS under large viewpoint changes, we introduces a novel virtual lane generation module into GSS method to enables high-quality lane switching even without a multi-lane dataset. Besides, we design a diffusion loss to supervise the generation of virtual lane image to further address the problem of lack of data in the virtual lanes. Finally, we also propose a depth refinement module to optimize depth estimation in the GSS model. Extensive validation of our method, compared to existing approaches, demonstrates state-of-the-art performance.\n\n我们提出了 GGS（通用高斯点云渲染）方法，用于自动驾驶，能够在大视角变化下实现真实感渲染。之前的通用 3D 高斯点云渲染方法仅限于渲染与原始图像对非常接近的新视角，无法处理较大的视角差异。特别是在自动驾驶场景中，图像通常从单车道收集。有限的训练视角使得渲染不同车道的图像变得非常具有挑战性。为进一步提升 GGS 在大视角变化下的渲染能力，我们在 GGS 方法中引入了一种新颖的虚拟车道生成模块，即使在没有多车道数据集的情况下，也能实现高质量的车道切换。此外，我们设计了一种扩散损失来监督虚拟车道图像的生成，以进一步解决虚拟车道数据不足的问题。最后，我们还提出了一种深度优化模块，以优化 GGS 模型中的深度估计。与现有方法相比，我们的方法经过广泛验证，展示了最先进的性能。\n"
  },
  {
    "path": "abs/2409.02581.md",
    "content": "### Object Gaussian for Monocular 6D Pose Estimation from Sparse Views\n\nMonocular object pose estimation, as a pivotal task in computer vision and robotics, heavily depends on accurate 2D-3D correspondences, which often demand costly CAD models that may not be readily available. Object 3D reconstruction methods offer an alternative, among which recent advancements in 3D Gaussian Splatting (3DGS) afford a compelling potential. Yet its performance still suffers and tends to overfit with fewer input views. Embracing this challenge, we introduce SGPose, a novel framework for sparse view object pose estimation using Gaussian-based methods. Given as few as ten views, SGPose generates a geometric-aware representation by starting with a random cuboid initialization, eschewing reliance on Structure-from-Motion (SfM) pipeline-derived geometry as required by traditional 3DGS methods. SGPose removes the dependence on CAD models by regressing dense 2D-3D correspondences between images and the reconstructed model from sparse input and random initialization, while the geometric-consistent depth supervision and online synthetic view warping are key to the success. Experiments on typical benchmarks, especially on the Occlusion LM-O dataset, demonstrate that SGPose outperforms existing methods even under sparse view constraints, under-scoring its potential in real-world applications.\n\n单目物体姿态估计作为计算机视觉和机器人学中的关键任务，严重依赖准确的 2D-3D 对应关系，这通常需要昂贵的 CAD 模型，而这些模型可能并不随时可得。物体 3D 重建方法提供了一种替代方案，其中近期在 3D 高斯点云（3DGS）领域的进展展现了极大的潜力。然而，它的性能仍受限，且在输入视角较少的情况下容易过拟合。面对这一挑战，我们引入了 SGPose，这是一个基于高斯方法的稀疏视角物体姿态估计新框架。给定少至十个视角，SGPose 通过从随机立方体初始化开始生成几何感知表示，避免了传统 3DGS 方法依赖于从运动结构（SfM）管道获得的几何结构。SGPose 通过回归图像与从稀疏输入和随机初始化中重建的模型之间的密集 2D-3D 对应关系，消除了对 CAD 模型的依赖，其中几何一致的深度监督和在线合成视图变形是其成功的关键。实验结果，特别是在 Occlusion LM-O 数据集上，表明 SGPose 在稀疏视角限制下优于现有方法，突显了其在实际应用中的潜力。\n"
  },
  {
    "path": "abs/2409.02851.md",
    "content": "### Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models\n\nGenerating lifelike 3D humans from a single RGB image remains a challenging task in computer vision, as it requires accurate modeling of geometry, high-quality texture, and plausible unseen parts. Existing methods typically use multi-view diffusion models for 3D generation, but they often face inconsistent view issues, which hinder high-quality 3D human generation. To address this, we propose Human-VDM, a novel method for generating 3D human from a single RGB image using Video Diffusion Models. Human-VDM provides temporally consistent views for 3D human generation using Gaussian Splatting. It consists of three modules: a view-consistent human video diffusion module, a video augmentation module, and a Gaussian Splatting module. First, a single image is fed into a human video diffusion module to generate a coherent human video. Next, the video augmentation module applies super-resolution and video interpolation to enhance the textures and geometric smoothness of the generated video. Finally, the 3D Human Gaussian Splatting module learns lifelike humans under the guidance of these high-resolution and view-consistent images. Experiments demonstrate that Human-VDM achieves high-quality 3D human from a single image, outperforming state-of-the-art methods in both generation quality and quantity.\n\n从单张 RGB 图像生成逼真的 3D 人物仍然是计算机视觉中的一项挑战任务，因为这需要准确建模几何形状、高质量的纹理和可信的未见部分。现有方法通常使用多视角扩散模型进行 3D 生成，但这些方法常常面临视角不一致的问题，这妨碍了高质量 3D 人物的生成。为了解决这一问题，我们提出了 Human-VDM，一种利用视频扩散模型从单张 RGB 图像生成 3D 人物的新方法。Human-VDM 通过高斯点云渲染提供时间一致的视角用于 3D 人物生成。它包括三个模块：视角一致的人物视频扩散模块、视频增强模块和高斯点云渲染模块。首先，将单张图像输入到人物视频扩散模块中，以生成连贯的人物视频。接着，视频增强模块应用超分辨率和视频插值技术，以增强生成视频的纹理和几何平滑度。最后，3D 人物高斯点云渲染模块在这些高分辨率且视角一致的图像指导下学习生成逼真的 3D 人物。实验结果表明，Human-VDM 能够从单张图像生成高质量的 3D 人物，在生成质量和数量上均优于现有最先进的方法。\n"
  },
  {
    "path": "abs/2409.03213.md",
    "content": "### Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction\n\n3D Gaussian Splatting (3DGS) has emerged as a promising approach for 3D scene representation, offering a reduction in computational overhead compared to Neural Radiance Fields (NeRF). However, 3DGS is susceptible to high-frequency artifacts and demonstrates suboptimal performance under sparse viewpoint conditions, thereby limiting its applicability in robotics and computer vision. To address these limitations, we introduce SVS-GS, a novel framework for Sparse Viewpoint Scene reconstruction that integrates a 3D Gaussian smoothing filter to suppress artifacts. Furthermore, our approach incorporates a Depth Gradient Profile Prior (DGPP) loss with a dynamic depth mask to sharpen edges and 2D diffusion with Score Distillation Sampling (SDS) loss to enhance geometric consistency in novel view synthesis. Experimental evaluations on the MipNeRF-360 and SeaThru-NeRF datasets demonstrate that SVS-GS markedly improves 3D reconstruction from sparse viewpoints, offering a robust and efficient solution for scene understanding in robotics and computer vision applications.\n\n3D 高斯点云（3DGS）作为 3D 场景表示的一种有前景的方法，相较于神经辐射场（NeRF）能减少计算开销。然而，3DGS 易受到高频伪影的影响，在稀疏视角条件下表现不佳，从而限制了其在机器人技术和计算机视觉中的应用。为了解决这些问题，我们引入了 SVS-GS，一个新颖的稀疏视角场景重建框架，它集成了 3D 高斯平滑滤波器以抑制伪影。此外，我们的方法还结合了深度梯度轮廓先验（DGPP）损失和动态深度掩码来锐化边缘，并使用 2D 扩散和评分蒸馏采样（SDS）损失来增强新视角合成中的几何一致性。在 MipNeRF-360 和 SeaThru-NeRF 数据集上的实验评估表明，SVS-GS 显著改善了从稀疏视角进行的 3D 重建，为机器人技术和计算机视觉应用中的场景理解提供了稳健高效的解决方案。\n"
  },
  {
    "path": "abs/2409.03456.md",
    "content": "### LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors\n\nWe aim to address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advancements such as 3D Gaussian Splatting (3DGS) have demonstrated remarkable successes in 3D reconstruction, these methods typically necessitate hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. However, sparse-view reconstruction is inherently ill-posed and under-constrained, often resulting in inferior and incomplete outcomes. This is due to issues such as failed initialization, overfitting on input images, and a lack of details. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. Specifically, we propose a robust initialization module that leverages stereo priors to aid in the recovery of camera poses and the reliable point clouds. Additionally, a diffusion-based refinement is iteratively applied to incorporate image diffusion priors into the Gaussian optimization process to preserve intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data acquisition requirements compared to previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction. Visual results are on our website.\n\n我们旨在通过利用大规模视觉模型的先验来解决稀疏视角下的 3D 场景重建问题。尽管最近的进展如 3D 高斯点云（3DGS）在 3D 重建中取得了显著成功，但这些方法通常需要数百张密集捕捉底层场景的输入图像，这使得它们在实际应用中既耗时又不切实际。然而，稀疏视角重建本质上是一个病态且约束不足的问题，往往导致结果较差且不完整。这是由于初始化失败、对输入图像的过拟合和细节缺乏等问题。为缓解这些挑战，我们引入了 LM-Gaussian，这是一种能够从有限数量图像生成高质量重建的方法。具体而言，我们提出了一个强健的初始化模块，通过利用立体视觉先验来帮助恢复相机姿态和可靠的点云。此外，采用基于扩散的迭代优化来将图像扩散先验融入高斯优化过程，以保留复杂的场景细节。最后，我们利用视频扩散先验进一步增强渲染图像的真实视觉效果。总体而言，我们的方法显著减少了与之前的 3DGS 方法相比的数据获取需求。我们通过在各种公共数据集上的实验验证了框架的有效性，展示了其在高质量 360 度场景重建中的潜力。\n"
  },
  {
    "path": "abs/2409.04013.md",
    "content": "### 3D-GP-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors\n\nMulti-view image compression is vital for 3D-related applications. To effectively model correlations between views, existing methods typically predict disparity between two views on a 2D plane, which works well for small disparities, such as in stereo images, but struggles with larger disparities caused by significant view changes. To address this, we propose a novel approach: learning-based multi-view image coding with 3D Gaussian geometric priors (3D-GP-LMVIC). Our method leverages 3D Gaussian Splatting to derive geometric priors of the 3D scene, enabling more accurate disparity estimation across views within the compression model. Additionally, we introduce a depth map compression model to reduce redundancy in geometric information between views. A multi-view sequence ordering method is also proposed to enhance correlations between adjacent views. Experimental results demonstrate that 3D-GP-LMVIC surpasses both traditional and learning-based methods in performance, while maintaining fast encoding and decoding speed.\n\n多视图图像压缩在与3D相关的应用中至关重要。为了有效建模视图之间的相关性，现有方法通常在二维平面上预测两个视图之间的视差，这在小视差场景（如立体图像）中效果较好，但在由于视图大幅变化导致的较大视差情况下表现不佳。为了解决这个问题，我们提出了一种新方法：基于学习的带有3D高斯几何先验的多视图图像编码（3D-GP-LMVIC）。该方法利用3D高斯分裂技术获取3D场景的几何先验，从而在压缩模型中实现更准确的视差估计。此外，我们引入了深度图压缩模型，减少视图间几何信息的冗余。我们还提出了一种多视图序列排序方法，以增强相邻视图之间的相关性。实验结果表明，3D-GP-LMVIC在性能上超越了传统和基于学习的方法，同时保持了快速的编码和解码速度。\n"
  },
  {
    "path": "abs/2409.04196.md",
    "content": "### GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers\n\nReconstructing realistic 3D human models from monocular images has significant applications in creative industries, human-computer interfaces, and healthcare. We base our work on 3D Gaussian Splatting (3DGS), a scene representation composed of a mixture of Gaussians. Predicting such mixtures for a human from a single input image is challenging, as it is a non-uniform density (with a many-to-one relationship with input pixels) with strict physical constraints. At the same time, it needs to be flexible to accommodate a variety of clothes and poses. Our key observation is that the vertices of standardized human meshes (such as SMPL) can provide an adequate density and approximate initial position for Gaussians. We can then train a transformer model to jointly predict comparatively small adjustments to these positions, as well as the other Gaussians' attributes and the SMPL parameters. We show empirically that this combination (using only multi-view supervision) can achieve fast inference of 3D human models from a single image without test-time optimization, expensive diffusion models, or 3D points supervision. We also show that it can improve 3D pose estimation by better fitting human models that account for clothes and other variations.\n\n从单目图像重建逼真的3D人体模型在创意产业、人机交互和医疗保健等领域具有重要应用。我们的工作基于3D高斯分裂（3DGS），这是一种由高斯混合体构成的场景表示。对于单张输入图像预测这样的混合体是具有挑战性的，因为它是非均匀密度（与输入像素之间存在多对一的关系）并且受到严格的物理约束。同时，该方法需要灵活以适应多样的衣物和姿态。我们的关键观察是，标准化人体网格（如SMPL）的顶点能够为高斯体提供足够的密度并近似初始位置。接着，我们可以训练一个Transformer模型，联合预测这些位置的较小调整，以及其他高斯体属性和SMPL参数。我们通过实验证明，这种组合（仅使用多视图监督）能够在不需要测试时优化、昂贵的扩散模型或3D点监督的情况下，实现单张图像的快速3D人体模型推理。我们还展示了，该方法通过更好地拟合考虑衣物和其他变化的人体模型，可以提高3D姿态估计的准确性。\n"
  },
  {
    "path": "abs/2409.04751.md",
    "content": "### Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras\n\nRecently, 3D Gaussian Splatting (3DGS) has garnered attention for its high fidelity and real-time rendering. However, adapting 3DGS to different camera models, particularly fisheye lenses, poses challenges due to the unique 3D to 2D projection calculation. Additionally, there are inefficiencies in the tile-based splatting, especially for the extreme curvature and wide field of view of fisheye lenses, which are crucial for its broader real-life applications. To tackle these challenges, we introduce Fisheye-GS.This innovative method recalculates the projection transformation and its gradients for fisheye cameras. Our approach can be seamlessly integrated as a module into other efficient 3D rendering methods, emphasizing its extensibility, lightweight nature, and modular design. Since we only modified the projection component, it can also be easily adapted for use with different camera models. Compared to methods that train after undistortion, our approach demonstrates a clear improvement in visual quality.\n\n近期，3D高斯分裂（3DGS）因其高保真度和实时渲染性能备受关注。然而，将3DGS应用于不同的相机模型，尤其是鱼眼镜头，面临挑战，主要是由于独特的3D到2D投影计算。此外，基于平铺的高斯分裂在鱼眼镜头的极端曲率和广视角条件下效率不高，这对于其更广泛的实际应用至关重要。为了解决这些问题，我们提出了Fisheye-GS⋆。该创新方法重新计算了鱼眼相机的投影变换及其梯度。我们的方法可以作为一个模块无缝集成到其他高效的3D渲染方法中，强调其可扩展性、轻量化和模块化设计。由于我们只修改了投影部分，该方法也可以轻松适配不同的相机模型。与通过图像去畸变后进行训练的方法相比，我们的方法在视觉质量上有明显提升。\n"
  },
  {
    "path": "abs/2409.04963.md",
    "content": "### GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning\n\nSelf-supervised learning of point cloud aims to leverage unlabeled 3D data to learn meaningful representations without reliance on manual annotations. However, current approaches face challenges such as limited data diversity and inadequate augmentation for effective feature learning. To address these challenges, we propose GS-PT, which integrates 3D Gaussian Splatting (3DGS) into point cloud self-supervised learning for the first time. Our pipeline utilizes transformers as the backbone for self-supervised pre-training and introduces novel contrastive learning tasks through 3DGS. Specifically, the transformers aim to reconstruct the masked point cloud. 3DGS utilizes multi-view rendered images as input to generate enhanced point cloud distributions and novel view images, facilitating data augmentation and cross-modal contrastive learning. Additionally, we incorporate features from depth maps. By optimizing these tasks collectively, our method enriches the tri-modal self-supervised learning process, enabling the model to leverage the correlation across 3D point clouds and 2D images from various modalities. We freeze the encoder after pre-training and test the model's performance on multiple downstream tasks. Experimental results indicate that GS-PT outperforms the off-the-shelf self-supervised learning methods on various downstream tasks including 3D object classification, real-world classifications, and few-shot learning and segmentation.\n\n点云的自监督学习旨在利用未标注的3D数据学习有意义的表示，而无需依赖人工标注。然而，当前的方法面临数据多样性有限和增强策略不足等挑战，导致特征学习效果不佳。为了解决这些问题，我们首次将3D高斯分裂（3DGS）引入到点云的自监督学习中，提出了GS-PT方法。我们的管道采用Transformer作为自监督预训练的骨干网络，并通过3DGS引入了新的对比学习任务。具体来说，Transformer旨在重建被遮掩的点云，而3DGS则利用多视图渲染图像作为输入，生成增强的点云分布和新视角图像，促进数据增强和跨模态对比学习。此外，我们还结合了来自深度图的特征。通过共同优化这些任务，我们的方法丰富了三模态的自监督学习过程，使模型能够利用不同模态下3D点云和2D图像之间的相关性。我们在预训练后冻结编码器，并测试模型在多个下游任务上的表现。实验结果表明，GS-PT在多个下游任务上（包括3D物体分类、真实世界分类、少样本学习和分割）均优于现有的自监督学习方法。\n\n"
  },
  {
    "path": "abs/2409.05334.md",
    "content": "### Lagrangian Hashing for Compressed Neural Field Representations\n\nWe present Lagrangian Hashing, a representation for neural fields combining the characteristics of fast training NeRF methods that rely on Eulerian grids (i.e.~InstantNGP), with those that employ points equipped with features as a way to represent information (e.g. 3D Gaussian Splatting or PointNeRF). We achieve this by incorporating a point-based representation into the high-resolution layers of the hierarchical hash tables of an InstantNGP representation. As our points are equipped with a field of influence, our representation can be interpreted as a mixture of Gaussians stored within the hash table. We propose a loss that encourages the movement of our Gaussians towards regions that require more representation budget to be sufficiently well represented. Our main finding is that our representation allows the reconstruction of signals using a more compact representation without compromising quality.\n\n我们提出了Lagrangian Hashing，这是一种神经场的表示方法，结合了基于欧拉网格的快速训练NeRF方法（如InstantNGP）与那些使用带有特征点来表示信息的方法（如3D高斯分裂或PointNeRF）的特点。我们通过将基于点的表示引入InstantNGP层次化哈希表的高分辨率层中实现这一目标。由于我们的点带有影响域，这种表示可以解释为存储在哈希表中的高斯混合体。我们提出了一种损失函数，鼓励这些高斯向需要更多表示预算的区域移动，以确保这些区域能够得到充分表示。我们的主要发现是，这种表示方法能够在不损失质量的情况下，使用更加紧凑的表示来重建信号。\n"
  },
  {
    "path": "abs/2409.05819.md",
    "content": "### GASP: Gaussian Splatting for Physic-Based Simulations\n\nPhysics simulation is paramount for modeling and utilization of 3D scenes in various real-world applications. However, its integration with state-of-the-art 3D scene rendering techniques such as Gaussian Splatting (GS) remains challenging. Existing models use additional meshing mechanisms, including triangle or tetrahedron meshing, marching cubes, or cage meshes. As an alternative, we can modify the physics grounded Newtonian dynamics to align with 3D Gaussian components. Current models take the first-order approximation of a deformation map, which locally approximates the dynamics by linear transformations. In contrast, our Gaussian Splatting for Physics-Based Simulations (GASP) model uses such a map (without any modifications) and flat Gaussian distributions, which are parameterized by three points (mesh faces). Subsequently, each 3D point (mesh face node) is treated as a discrete entity within a 3D space. Consequently, the problem of modeling Gaussian components is reduced to working with 3D points. Additionally, the information on mesh faces can be used to incorporate further properties into the physics model, facilitating the use of triangles. Resulting solution can be integrated into any physics engine that can be treated as a black box. As demonstrated in our studies, the proposed model exhibits superior performance on a diverse range of benchmark datasets designed for 3D object rendering.\n\n物理仿真在各种实际应用中对3D场景的建模和利用至关重要。然而，将其与最新的3D场景渲染技术（如高斯分裂，GS）结合仍然充满挑战。现有模型通常使用额外的网格化机制，包括三角形或四面体网格、Marching Cubes算法或笼型网格。作为替代方案，我们可以修改基于物理的牛顿力学，使其与3D高斯组件对齐。目前的模型采用变形映射的一阶近似，通过线性变换局部逼近动力学。相比之下，我们的基于物理仿真的高斯分裂模型（GASP）使用这样的映射（无需任何修改）和由三点（网格面）参数化的平面高斯分布。随后，每个3D点（网格面节点）被视为3D空间中的一个离散实体。这样，建模高斯组件的问题简化为处理3D点。此外，网格面上的信息可以用于将更多属性整合到物理模型中，从而促进三角形的使用。最终的解决方案可以集成到任何可作为黑箱处理的物理引擎中。正如我们的研究所示，所提出的模型在设计用于3D对象渲染的多种基准数据集上表现出卓越的性能。\n\n"
  },
  {
    "path": "abs/2409.05868.md",
    "content": "### SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting\n\nRecently, the 3D Gaussian Splatting (3D-GS) method has achieved great success in novel view synthesis, providing real-time rendering while ensuring high-quality rendering results. However, this method faces challenges in modeling specular reflections and handling anisotropic appearance components, especially in dealing with view-dependent color under complex lighting conditions. Additionally, 3D-GS uses spherical harmonic to learn the color representation, which has limited ability to represent complex scenes. To overcome these challenges, we introduce Lantent-SpecGS, an approach that utilizes a universal latent neural descriptor within each 3D Gaussian. This enables a more effective representation of 3D feature fields, including appearance and geometry. Moreover, two parallel CNNs are designed to decoder the splatting feature maps into diffuse color and specular color separately. A mask that depends on the viewpoint is learned to merge these two colors, resulting in the final rendered image. Experimental results demonstrate that our method obtains competitive performance in novel view synthesis and extends the ability of 3D-GS to handle intricate scenarios with specular reflections.\n\n最近，3D Gaussian Splatting (3D-GS) 方法在新视图合成领域取得了巨大的成功，能够在保证高质量渲染结果的同时实现实时渲染。然而，该方法在建模镜面反射和处理各向异性外观组件时面临挑战，尤其是在复杂光照条件下处理与视角相关的颜色问题。此外，3D-GS 使用球谐函数来学习颜色表示，但在表示复杂场景时能力有限。为了解决这些问题，我们提出了 Latent-SpecGS 方法，该方法在每个 3D 高斯中引入了一个通用的潜在神经描述符，使其能够更有效地表示 3D 特征场，包括外观和几何信息。此外，我们设计了两个并行的卷积神经网络（CNN）分别解码散点特征图，输出漫反射颜色和镜面反射颜色。一个依赖于视角的遮罩被学习用于合并这两种颜色，从而生成最终渲染图像。实验结果表明，我们的方法在新视图合成方面取得了具有竞争力的性能，并扩展了 3D-GS 在处理具有镜面反射的复杂场景中的能力。\n"
  },
  {
    "path": "abs/2409.06685.md",
    "content": "### GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction\n\n3D Gaussian Splatting (3DGS) has shown promising performance in novel view synthesis. Previous methods adapt it to obtaining surfaces of either individual 3D objects or within limited scenes. In this paper, we make the first attempt to tackle the challenging task of large-scale scene surface reconstruction. This task is particularly difficult due to the high GPU memory consumption, different levels of details for geometric representation, and noticeable inconsistencies in appearance. To this end, we propose GigaGS, the first work for high-quality surface reconstruction for large-scale scenes using 3DGS. GigaGS first applies a partitioning strategy based on the mutual visibility of spatial regions, which effectively grouping cameras for parallel processing. To enhance the quality of the surface, we also propose novel multi-view photometric and geometric consistency constraints based on Level-of-Detail representation. In doing so, our method can reconstruct detailed surface structures. Comprehensive experiments are conducted on various datasets. The consistent improvement demonstrates the superiority of GigaGS.\n\n3D Gaussian Splatting (3DGS) 在新视图合成中展现了出色的性能。之前的方法主要应用于单个 3D 物体或有限场景的表面获取。在本文中，我们首次尝试解决大规模场景表面重建这一具有挑战性的任务。由于高GPU内存消耗、几何表示的不同细节层次以及外观上的显著不一致性，这一任务尤为困难。为此，我们提出了 GigaGS，这是第一个基于 3DGS 的高质量大规模场景表面重建方法。GigaGS 首先应用了一种基于空间区域相互可见性的分割策略，有效地将摄像机进行分组，以便并行处理。为了提升表面质量，我们还提出了基于多层细节（Level-of-Detail）表示的多视图光度和几何一致性约束。通过这样做，我们的方法能够重建出精细的表面结构。我们在多个数据集上进行了全面的实验，实验结果显示了 GigaGS 的显著优势。\n"
  },
  {
    "path": "abs/2409.06765.md",
    "content": "### gsplat: An Open-Source Library for Gaussian Splatting\n\ngsplat is an open-source library designed for training and developing Gaussian Splatting methods. It features a front-end with Python bindings compatible with the PyTorch library and a back-end with highly optimized CUDA kernels. gsplat offers numerous features that enhance the optimization of Gaussian Splatting models, which include optimization improvements for speed, memory, and convergence times. Experimental results demonstrate that gsplat achieves up to 10% less training time and 4x less memory than the original implementation. Utilized in several research projects, gsplat is actively maintained on GitHub.\n\ngsplat 是一个开源库，专为训练和开发 Gaussian Splatting 方法设计。它包含与 PyTorch 库兼容的 Python 绑定前端和高度优化的 CUDA 内核后端。gsplat 提供了多种功能，提升了 Gaussian Splatting 模型的优化效果，包括在速度、内存和收敛时间方面的优化改进。实验结果表明，gsplat 的训练时间比原始实现减少了多达10%，内存消耗减少了4倍。gsplat 已被多个研究项目使用，并在 GitHub 上积极维护。\n"
  },
  {
    "path": "abs/2409.07200.md",
    "content": "### ThermalGaussian: Thermal 3D Gaussian Splatting\n\nThermography is especially valuable for the military and other users of surveillance cameras. Some recent methods based on Neural Radiance Fields (NeRF) are proposed to reconstruct the thermal scenes in 3D from a set of thermal and RGB images. However, unlike NeRF, 3D Gaussian splatting (3DGS) prevails due to its rapid training and real-time rendering. In this work, we propose ThermalGaussian, the first thermal 3DGS approach capable of rendering high-quality images in RGB and thermal modalities. We first calibrate the RGB camera and the thermal camera to ensure that both modalities are accurately aligned. Subsequently, we use the registered images to learn the multimodal 3D Gaussians. To prevent the overfitting of any single modality, we introduce several multimodal regularization constraints. We also develop smoothing constraints tailored to the physical characteristics of the thermal modality. Besides, we contribute a real-world dataset named RGBT-Scenes, captured by a hand-hold thermal-infrared camera, facilitating future research on thermal scene reconstruction. We conduct comprehensive experiments to show that ThermalGaussian achieves photorealistic rendering of thermal images and improves the rendering quality of RGB images. With the proposed multimodal regularization constraints, we also reduced the model's storage cost by 90%.\n\n热成像在军事及其他监控摄像机用户中具有重要价值。最近，一些基于神经辐射场（NeRF）的方法被提出用于从一组热成像和RGB图像中重建3D热场景。然而，与NeRF不同，3D Gaussian Splatting (3DGS) 因其快速训练和实时渲染的优势而更具优势。在这项工作中，我们提出了ThermalGaussian，这是首个能够渲染高质量RGB和热成像图像的3DGS方法。我们首先校准RGB相机和热成像相机，以确保这两种模态准确对齐。随后，我们使用配准后的图像来学习多模态3D高斯。为防止单一模态的过拟合，我们引入了多模态正则化约束。此外，我们还开发了针对热成像物理特性定制的平滑约束。此外，我们贡献了一个名为RGBT-Scenes的真实世界数据集，该数据集由手持热红外相机采集，旨在推动未来热场景重建的研究。我们进行了全面的实验，表明ThermalGaussian在热成像图像的渲染上实现了逼真的效果，并提高了RGB图像的渲染质量。通过所提出的多模态正则化约束，我们还将模型的存储成本减少了90%。\n"
  },
  {
    "path": "abs/2409.07245.md",
    "content": "### Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks\n\nThis paper introduces SO(2)-Equivariant Gaussian Sculpting Networks (GSNs) as an approach for SO(2)-Equivariant 3D object reconstruction from single-view image observations. GSNs take a single observation as input to generate a Gaussian splat representation describing the observed object's geometry and texture. By using a shared feature extractor before decoding Gaussian colors, covariances, positions, and opacities, GSNs achieve extremely high throughput (>150FPS). Experiments demonstrate that GSNs can be trained efficiently using a multi-view rendering loss and are competitive, in quality, with expensive diffusion-based reconstruction algorithms. The GSN model is validated on multiple benchmark experiments. Moreover, we demonstrate the potential for GSNs to be used within a robotic manipulation pipeline for object-centric grasping.\n\n本文提出了SO(2)-等变高斯雕刻网络（SO(2)-Equivariant Gaussian Sculpting Networks, GSNs），该方法用于从单视角图像观测中进行SO(2)-等变的3D物体重建。GSNs以单个观测图像为输入，生成一个描述物体几何和纹理的高斯散点表示。通过在解码高斯颜色、协方差、位置和不透明度之前使用共享特征提取器，GSNs 实现了极高的处理速度（超过150FPS）。实验表明，GSNs 能够通过多视图渲染损失高效训练，且在质量上与昂贵的基于扩散的重建算法相媲美。GSN模型在多个基准实验中得到了验证。此外，我们展示了GSNs在机器人操作流程中应用的潜力，特别是在以物体为中心的抓取任务中。\n"
  },
  {
    "path": "abs/2409.07441.md",
    "content": "### Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering\n\nWe propose GauFace, a novel Gaussian Splatting representation, tailored for efficient animation and rendering of physically-based facial assets. Leveraging strong geometric priors and constrained optimization, GauFace ensures a neat and structured Gaussian representation, delivering high fidelity and real-time facial interaction of 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. Then, we introduce TransGS, a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme with UV positional encoding to ensure the throughput and rendering quality of GauFace assets generated by our TransGS. Once trained, TransGS can instantly translate facial assets with lighting conditions to GauFace representation, With the rich conditioning modalities, it also enables editing and animation capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies, compared to traditional offline and online renderers, as well as recent neural rendering methods, which demonstrate the superior performance of our approach for facial asset rendering. We also showcase diverse immersive applications of facial assets using our TransGS approach and GauFace representation, across various platforms like PCs, phones and even VR headsets.\n\n我们提出了GauFace，一种全新的高斯散点表示法，专为高效动画和物理基础的面部资产渲染设计。通过利用强几何先验和约束优化，GauFace确保了干净且结构化的高斯表示，能够在Snapdragon 8 Gen 2移动平台上实现1440p@30fps的高保真实时面部交互。\n接着，我们介绍了TransGS，一种扩散式Transformer，用于将物理基础的面部资产快速转化为相应的GauFace表示。具体来说，我们采用了基于patch的管线来有效处理大量的高斯点。此外，我们引入了一种新颖的像素对齐采样方案，并结合UV位置编码，确保由TransGS生成的GauFace资产的吞吐量和渲染质量。一旦训练完成，TransGS能够即时将带有光照条件的面部资产转化为GauFace表示。凭借丰富的条件控制模式，它还能够实现类似传统CG管线的编辑和动画功能。\n我们进行了广泛的评估和用户研究，与传统的离线和在线渲染器以及最近的神经渲染方法相比，我们的方法在面部资产渲染上表现出显著的优越性。此外，我们展示了使用TransGS方法和GauFace表示的面部资产在多个平台（如PC、手机甚至VR头戴设备）上的多样化沉浸式应用。\n"
  },
  {
    "path": "abs/2409.07452.md",
    "content": "### Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models\n\nDespite having tremendous progress in image-to-3D generation, existing methods still struggle to produce multi-view consistent images with high-resolution textures in detail, especially in the paradigm of 2D diffusion that lacks 3D awareness. In this work, we present High-resolution Image-to-3D model (Hi3D), a new video diffusion based paradigm that redefines a single image to multi-view images as 3D-aware sequential image generation (i.e., orbital video generation). This methodology delves into the underlying temporal consistency knowledge in video diffusion model that generalizes well to geometry consistency across multiple views in 3D generation. Technically, Hi3D first empowers the pre-trained video diffusion model with 3D-aware prior (camera pose condition), yielding multi-view images with low-resolution texture details. A 3D-aware video-to-video refiner is learnt to further scale up the multi-view images with high-resolution texture details. Such high-resolution multi-view images are further augmented with novel views through 3D Gaussian Splatting, which are finally leveraged to obtain high-fidelity meshes via 3D reconstruction. Extensive experiments on both novel view synthesis and single view reconstruction demonstrate that our Hi3D manages to produce superior multi-view consistency images with highly-detailed textures.\n\n尽管图像到3D生成领域取得了巨大进展，现有方法在生成具有高分辨率细节纹理的多视角一致图像时仍然面临挑战，尤其是在缺乏3D感知的2D扩散范式下。为了解决这一问题，我们提出了高分辨率图像到3D模型（Hi3D），这是一种基于视频扩散的新范式，将单张图像生成多视角图像重新定义为3D感知的序列图像生成（即轨道视频生成）。这种方法深入探讨了视频扩散模型中的时间一致性知识，这种知识可以很好地推广到3D生成中的几何一致性。\n在技术上，Hi3D 首先为预训练的视频扩散模型赋予了3D感知先验（相机位姿条件），从而生成具有低分辨率纹理细节的多视角图像。接着，学习一个3D感知的视频到视频的精化器，用于进一步提升多视角图像的分辨率和纹理细节。这些高分辨率的多视角图像通过3D高斯散点进行视图扩展，最终用于通过3D重建获取高保真的网格。我们在新视图合成和单视图重建上的广泛实验表明，Hi3D能够生成具有高度细节纹理的多视角一致图像。\n\n"
  },
  {
    "path": "abs/2409.07456.md",
    "content": "### Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs\n\n3D Gaussian Splatting (GS) significantly struggles to accurately represent the underlying 3D scene geometry, resulting in inaccuracies and floating artifacts when rendering depth maps. In this paper, we address this limitation, undertaking a comprehensive analysis of the integration of depth priors throughout the optimization process of Gaussian primitives, and present a novel strategy for this purpose. This latter dynamically exploits depth cues from a readily available stereo network, processing virtual stereo pairs rendered by the GS model itself during training and achieving consistent self-improvement of the scene representation. Experimental results on three popular datasets, breaking ground as the first to assess depth accuracy for these models, validate our findings.\n\n3D Gaussian Splatting (GS) 在准确表示底层3D场景几何方面面临显著挑战，导致深度图渲染时出现不准确和浮动伪影。为了解决这一局限性，本文进行了全面分析，研究了在高斯基元优化过程中整合深度先验的方法，并提出了一种新策略。该策略动态利用来自现成的立体网络的深度线索，在训练过程中处理由GS模型本身渲染的虚拟立体对，从而实现场景表示的一致自我改进。我们在三个流行数据集上的实验结果验证了这一发现，这是首次对这些模型的深度准确性进行评估的研究。\n\n"
  },
  {
    "path": "abs/2409.07759.md",
    "content": "### SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length\n\nRecent advances in 3D Gaussian Splatting (3DGS) have garnered significant attention in computer vision and computer graphics due to its high rendering speed and remarkable quality. While extant research has endeavored to extend the application of 3DGS from static to dynamic scenes, such efforts have been consistently impeded by excessive model sizes, constraints on video duration, and content deviation. These limitations significantly compromise the streamability of dynamic 3D Gaussian models, thereby restricting their utility in downstream applications, including volumetric video, autonomous vehicle, and immersive technologies such as virtual, augmented, and mixed reality. This paper introduces SwinGS, a novel framework for training, delivering, and rendering volumetric video in a real-time streaming fashion. To address the aforementioned challenges and enhance streamability, SwinGS integrates spacetime Gaussian with Markov Chain Monte Carlo (MCMC) to adapt the model to fit various 3D scenes across frames, in the meantime employing a sliding window captures Gaussian snapshots for each frame in an accumulative way. We implement a prototype of SwinGS and demonstrate its streamability across various datasets and scenes. Additionally, we develop an interactive WebGL viewer enabling real-time volumetric video playback on most devices with modern browsers, including smartphones and tablets. Experimental results show that SwinGS reduces transmission costs by 83.6% compared to previous work with ignorable compromise in PSNR. Moreover, SwinGS easily scales to long video sequences without compromising quality.\n\n近年来，3D Gaussian Splatting (3DGS) 因其高渲染速度和卓越的质量在计算机视觉和计算机图形学领域引起了广泛关注。尽管现有研究已经努力将3DGS的应用从静态场景扩展到动态场景，但这些尝试一直受到模型规模过大、视频时长限制以及内容偏差的阻碍。这些局限性大大削弱了动态3D高斯模型的流媒体能力，从而限制了其在体积视频、自动驾驶车辆以及虚拟现实、增强现实和混合现实等沉浸式技术中的应用。\n本文提出了SwinGS，这是一种用于实时流媒体方式训练、传输和渲染体积视频的新框架。为了解决上述挑战并增强流媒体能力，SwinGS 将时空高斯与马尔可夫链蒙特卡洛（MCMC）相结合，使模型能够在不同帧之间适应各种3D场景。同时，采用滑动窗口方法，以累积方式为每一帧捕捉高斯快照。我们实现了SwinGS的原型，并展示了其在多个数据集和场景中的流媒体能力。此外，我们开发了一个交互式WebGL查看器，能够在包括智能手机和平板电脑在内的大多数设备上的现代浏览器中实现实时体积视频播放。实验结果表明，与之前的工作相比，SwinGS 在传输成本上减少了83.6%，且几乎不影响PSNR质量。此外，SwinGS 可以轻松扩展到长视频序列而不影响质量。\n"
  },
  {
    "path": "abs/2409.08042.md",
    "content": "### Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis\n\nNovel-view synthesis based on visible light has been extensively studied. In comparison to visible light imaging, thermal infrared imaging offers the advantage of all-weather imaging and strong penetration, providing increased possibilities for reconstruction in nighttime and adverse weather scenarios. However, thermal infrared imaging is influenced by physical characteristics such as atmospheric transmission effects and thermal conduction, hindering the precise reconstruction of intricate details in thermal infrared scenes, manifesting as issues of floaters and indistinct edge features in synthesized images. To address these limitations, this paper introduces a physics-induced 3D Gaussian splatting method named Thermal3D-GS. Thermal3D-GS begins by modeling atmospheric transmission effects and thermal conduction in three-dimensional media using neural networks. Additionally, a temperature consistency constraint is incorporated into the optimization objective to enhance the reconstruction accuracy of thermal infrared images. Furthermore, to validate the effectiveness of our method, the first large-scale benchmark dataset for this field named Thermal Infrared Novel-view Synthesis Dataset (TI-NSD) is created. This dataset comprises 20 authentic thermal infrared video scenes, covering indoor, outdoor, and UAV(Unmanned Aerial Vehicle) scenarios, totaling 6,664 frames of thermal infrared image data. Based on this dataset, this paper experimentally verifies the effectiveness of Thermal3D-GS. The results indicate that our method outperforms the baseline method with a 3.03 dB improvement in PSNR and significantly addresses the issues of floaters and indistinct edge features present in the baseline method.\n\n基于可见光的新视图合成已被广泛研究。相比于可见光成像，热红外成像具备全天候成像和强穿透力的优势，能够在夜间和恶劣天气条件下提供更多的重建可能性。然而，热红外成像受到大气传输效应和热传导等物理特性的影响，难以精准重建热红外场景中的细节，表现为合成图像中的浮动伪影和边缘特征模糊等问题。为了解决这些局限性，本文提出了一种名为 Thermal3D-GS 的物理驱动3D高斯散点方法。Thermal3D-GS 首先通过神经网络对三维介质中的大气传输效应和热传导进行建模。此外，还将温度一致性约束引入到优化目标中，以提高热红外图像的重建精度。\n为了验证该方法的有效性，本文创建了该领域首个大规模基准数据集，名为 Thermal Infrared Novel-view Synthesis Dataset (TI-NSD)。该数据集包含20个真实的热红外视频场景，涵盖室内、室外以及无人机（UAV）场景，总计包含6,664帧热红外图像数据。基于该数据集，本文通过实验验证了 Thermal3D-GS 的有效性。结果表明，我们的方法在PSNR上较基线方法提升了3.03 dB，并显著解决了基线方法中存在的浮动伪影和边缘模糊问题。\n"
  },
  {
    "path": "abs/2409.08270.md",
    "content": "### FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally\n\nThis study addresses the challenge of accurately segmenting 3D Gaussian Splatting from 2D masks. Conventional methods often rely on iterative gradient descent to assign each Gaussian a unique label, leading to lengthy optimization and sub-optimal solutions. Instead, we propose a straightforward yet globally optimal solver for 3D-GS segmentation. The core insight of our method is that, with a reconstructed 3D-GS scene, the rendering of the 2D masks is essentially a linear function with respect to the labels of each Gaussian. As such, the optimal label assignment can be solved via linear programming in closed form. This solution capitalizes on the alpha blending characteristic of the splatting process for single step optimization. By incorporating the background bias in our objective function, our method shows superior robustness in 3D segmentation against noises. Remarkably, our optimization completes within 30 seconds, about 50× faster than the best existing methods. Extensive experiments demonstrate the efficiency and robustness of our method in segmenting various scenes, and its superior performance in downstream tasks such as object removal and inpainting.\n\n本研究解决了从2D掩码中准确分割3D Gaussian Splatting (3D-GS) 的挑战。传统方法通常依赖迭代的梯度下降算法为每个高斯分配唯一的标签，这导致了冗长的优化过程和次优解。相较之下，我们提出了一种简单且全局最优的3D-GS分割求解器。我们方法的核心洞见在于，对于已重建的3D-GS场景，2D掩码的渲染本质上是与每个高斯的标签相关的线性函数。因此，最优的标签分配可以通过线性规划以闭式形式解决。该方案利用了散点渲染过程中alpha混合的特性，实现了单步优化。通过在目标函数中引入背景偏置，我们的方法在面对噪声时展现出更强的鲁棒性。值得注意的是，我们的优化在30秒内完成，比现有最佳方法快约50倍。广泛的实验表明，我们的方法在分割各种场景中的高效性和鲁棒性，并且在物体移除和修补等下游任务中表现出优越的性能。\n"
  },
  {
    "path": "abs/2409.08353.md",
    "content": "### Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos\n\nVolumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impedes broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed DualGS, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.\n\n体积视频代表了视觉媒体的革命性进步，使用户能够自由探索沉浸式虚拟体验，缩小了数字世界与现实世界之间的差距。然而，现有工作流程中大量手动干预以稳定网格序列以及生成过于庞大的资产，阻碍了其广泛应用。在本文中，我们提出了一种基于高斯的新方法，称为DualGS，用于复杂人类表演的实时高保真播放，并具有出色的压缩比。DualGS 的核心思想是使用对应的皮肤和关节高斯分别表示运动和外观。这样的显式解耦可以显著减少运动冗余并增强时间一致性。我们首先初始化 DualGS，并在第一帧将皮肤高斯锚定在关节高斯上。随后，我们采用由粗到细的训练策略对逐帧人类表演进行建模。该策略包括整体运动预测的粗略对齐阶段，以及用于鲁棒跟踪和高保真渲染的精细优化阶段。\n为了将体积视频无缝集成到 VR 环境中，我们通过熵编码高效压缩运动，并结合持久码本使用编解码器压缩外观。我们的方法实现了高达 120 倍的压缩比，每帧仅需大约 350KB 的存储空间。我们通过在 VR 头显上提供逼真的自由视角体验，展示了我们的表示法的有效性，让用户能够沉浸式观看音乐家的表演，感受到演奏者指尖音符的节奏。\n"
  },
  {
    "path": "abs/2409.08562.md",
    "content": "### CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting\n\nWe introduce Crowd-Sourced Splatting (CSS), a novel 3D Gaussian Splatting (3DGS) pipeline designed to overcome the challenges of pose-free scene reconstruction using crowd-sourced imagery. The dream of reconstructing historically significant but inaccessible scenes from collections of photographs has long captivated researchers. However, traditional 3D techniques struggle with missing camera poses, limited viewpoints, and inconsistent lighting. CSS addresses these challenges through robust geometric priors and advanced illumination modeling, enabling high-quality novel view synthesis under complex, real-world conditions. Our method demonstrates clear improvements over existing approaches, paving the way for more accurate and flexible applications in AR, VR, and large-scale 3D reconstruction.\n\n我们提出了群体众包点云投影（Crowd-Sourced Splatting, CSS），这是一种新颖的3D高斯投影（3D Gaussian Splatting, 3DGS）管线，旨在解决使用群体众包图像进行无姿态场景重建的挑战。重建历史上重要但无法直接接触的场景一直是研究人员的梦想。然而，传统的3D技术在缺失相机姿态、视角受限和光照不一致的情况下表现不佳。CSS通过稳健的几何先验和先进的光照建模，成功应对这些挑战，从而在复杂的真实世界条件下实现高质量的新视角合成。我们的方法相比现有方法表现出明显的改进，为增强现实、虚拟现实以及大规模3D重建中的更准确和灵活的应用铺平了道路。\n"
  },
  {
    "path": "abs/2409.08613.md",
    "content": "### Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints\n\n3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in scene synthesis and novel view synthesis tasks. Typically, the initialization of 3D Gaussian primitives relies on point clouds derived from Structure-from-Motion (SfM) methods. However, in scenarios requiring scene reconstruction from sparse viewpoints, the effectiveness of 3DGS is significantly constrained by the quality of these initial point clouds and the limited number of input images. In this study, we present Dust-GS, a novel framework specifically designed to overcome the limitations of 3DGS in sparse viewpoint conditions. Instead of relying solely on SfM, Dust-GS introduces an innovative point cloud initialization technique that remains effective even with sparse input data. Our approach leverages a hybrid strategy that integrates an adaptive depth-based masking technique, thereby enhancing the accuracy and detail of reconstructed scenes. Extensive experiments conducted on several benchmark datasets demonstrate that Dust-GS surpasses traditional 3DGS methods in scenarios with sparse viewpoints, achieving superior scene reconstruction quality with a reduced number of input images.\n\n3D 高斯投影（3D Gaussian Splatting, 3DGS）在场景合成和新视角合成任务中表现出了卓越的性能。通常，3D 高斯基元的初始化依赖于通过结构光（Structure-from-Motion, SfM）方法生成的点云。然而，在需要从稀疏视角进行场景重建的场景中，3DGS 的有效性受限于初始点云的质量以及输入图像数量的有限性。在本研究中，我们提出了 Dust-GS，这是一个专门设计用于克服 3DGS 在稀疏视角条件下局限性的全新框架。Dust-GS 并不单纯依赖于 SfM，而是引入了一种创新的点云初始化技术，即使在稀疏的输入数据情况下也能保持有效。我们的方法采用了一种结合自适应深度掩码的混合策略，从而提高了重建场景的准确性和细节。在多个基准数据集上进行的大量实验表明，Dust-GS 在稀疏视角场景中超越了传统的 3DGS 方法，以更少的输入图像实现了更高质量的场景重建。\n"
  },
  {
    "path": "abs/2409.08669.md",
    "content": "### AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius\n\n3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that has achieved high-quality reconstruction and real-time rendering of complex scenes. However, the rasterization pipeline still suffers from unnecessary overhead resulting from avoidable serial Gaussian culling, and uneven load due to the distinct number of Gaussian to be rendered across pixels, which hinders wider promotion and application of 3DGS. In order to accelerate Gaussian splatting, we propose AdR-Gaussian, which moves part of serial culling in Render stage into the earlier Preprocess stage to enable parallel culling, employing adaptive radius to narrow the rendering pixel range for each Gaussian, and introduces a load balancing method to minimize thread waiting time during the pixel-parallel rendering. Our contributions are threefold, achieving a rendering speed of 310% while maintaining equivalent or even better quality than the state-of-the-art. Firstly, we propose to early cull Gaussian-Tile pairs of low splatting opacity based on an adaptive radius in the Gaussian-parallel Preprocess stage, which reduces the number of affected tile through the Gaussian bounding circle, thus reducing unnecessary overhead and achieving faster rendering speed. Secondly, we further propose early culling based on axis-aligned bounding box for Gaussian splatting, which achieves a more significant reduction in ineffective expenses by accurately calculating the Gaussian size in the 2D directions. Thirdly, we propose a balancing algorithm for pixel thread load, which compresses the information of heavy-load pixels to reduce thread waiting time, and enhance information of light-load pixels to hedge against rendering quality loss. Experiments on three datasets demonstrate that our algorithm can significantly improve the Gaussian Splatting rendering speed.\n\n3D 高斯投影（3D Gaussian Splatting, 3DGS）是一种近期提出的显式 3D 表示方法，已在复杂场景的高质量重建和实时渲染中取得了显著成果。然而，栅格化管线仍然存在由于可避免的串行高斯剔除导致的不必要开销，以及每个像素渲染的高斯数量不同带来的负载不均，这阻碍了 3DGS 的更广泛推广与应用。为了加速高斯投影，我们提出了 AdR-Gaussian 方法，将渲染阶段的部分串行剔除提前至预处理阶段，实现并行剔除，并通过自适应半径缩小每个高斯的渲染像素范围，同时引入负载平衡方法，以最小化像素并行渲染时线程的等待时间。我们的贡献主要有三点，使渲染速度提升了 310%，同时保持了与现有最先进方法相当甚至更高的质量。\n首先，我们在高斯并行预处理阶段中，基于自适应半径对低投影不透明度的高斯-瓦片对进行早期剔除，减少通过高斯包围圆影响的瓦片数量，从而减少不必要的开销并加快渲染速度。其次，我们进一步提出基于轴对齐包围盒的高斯投影早期剔除，通过精确计算高斯在二维方向的大小，实现了更大幅度的无效开销减少。第三，我们提出了一个像素线程负载平衡算法，通过压缩高负载像素的信息来减少线程等待时间，并增强轻负载像素的信息，以对冲渲染质量的损失。三个数据集上的实验表明，我们的算法可以显著提升高斯投影的渲染速度。\n"
  },
  {
    "path": "abs/2409.08947.md",
    "content": "### A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis\n\nRelighting radiance fields is severely underconstrained for multi-view data, which is most often captured under a single illumination condition; It is especially hard for full scenes containing multiple objects. We introduce a method to create relightable radiance fields using such single-illumination data by exploiting priors extracted from 2D image diffusion models. We first fine-tune a 2D diffusion model on a multi-illumination dataset conditioned by light direction, allowing us to augment a single-illumination capture into a realistic -- but possibly inconsistent -- multi-illumination dataset from directly defined light directions. We use this augmented data to create a relightable radiance field represented by 3D Gaussian splats. To allow direct control of light direction for low-frequency lighting, we represent appearance with a multi-layer perceptron parameterized on light direction. To enforce multi-view consistency and overcome inaccuracies we optimize a per-image auxiliary feature vector. We show results on synthetic and real multi-view data under single illumination, demonstrating that our method successfully exploits 2D diffusion model priors to allow realistic 3D relighting for complete scenes.\n\n对多视角数据进行光照重建是一个严重欠约束的问题，因为这些数据通常是在单一光照条件下捕获的；对于包含多个物体的完整场景尤其困难。我们提出了一种方法，利用从2D图像扩散模型中提取的先验信息，通过单一光照数据创建可重光照的辐射场。我们首先在一个多光照数据集上微调一个2D扩散模型，并以光照方向为条件，允许我们将单一光照捕获的数据扩充为一个基于直接定义的光照方向生成的真实但可能不一致的多光照数据集。我们使用这些扩充的数据创建了一个用3D高斯投影表示的可重光照辐射场。\n为了实现对低频光照下光照方向的直接控制，我们使用一个以光照方向为参数的多层感知机来表示外观。为确保多视角一致性并克服不准确性，我们优化了每张图像的辅助特征向量。我们展示了在单一光照条件下的合成和真实多视角数据的实验结果，证明了我们的方法能够成功利用2D扩散模型的先验，实现对完整场景的逼真3D重光照效果。\n"
  },
  {
    "path": "abs/2409.09295.md",
    "content": "### GEVO: Memory-Efficient Monocular Visual Odometry Using Gaussians\n\nConstructing a high-fidelity representation of the 3D scene using a monocular camera can enable a wide range of applications on mobile devices, such as micro-robots, smartphones, and AR/VR headsets. On these devices, memory is often limited in capacity and its access often dominates the consumption of compute energy. Although Gaussian Splatting (GS) allows for high-fidelity reconstruction of 3D scenes, current GS-based SLAM is not memory efficient as a large number of past images is stored to retrain Gaussians for reducing catastrophic forgetting. These images often require two-orders-of-magnitude higher memory than the map itself and thus dominate the total memory usage. In this work, we present GEVO, a GS-based monocular SLAM framework that achieves comparable fidelity as prior methods by rendering (instead of storing) them from the existing map. Novel Gaussian initialization and optimization techniques are proposed to remove artifacts from the map and delay the degradation of the rendered images over time. Across a variety of environments, GEVO achieves comparable map fidelity while reducing the memory overhead to around 58 MBs, which is up to 94x lower than prior works.\n\n使用单目相机构建高保真3D场景表示，可以在移动设备上实现广泛的应用，如微型机器人、智能手机和AR/VR头显。然而，这些设备的内存容量通常有限，且内存访问往往会消耗大量计算能量。虽然高斯投影（Gaussian Splatting, GS）允许高保真重建3D场景，但现有的基于GS的SLAM在内存使用上效率不高，因为需要存储大量过去的图像来重新训练高斯，以减少灾难性遗忘。这些图像通常需要的内存比地图本身高出两个数量级，因此占据了大部分内存使用。在本研究中，我们提出了GEVO，这是一种基于GS的单目SLAM框架，通过从现有地图渲染（而非存储）图像，达到了与先前方法相当的重建保真度。我们还提出了新的高斯初始化和优化技术，以去除地图中的伪影，并延缓渲染图像随时间退化的问题。在各种环境下，GEVO 实现了与之前方法相当的地图保真度，同时将内存开销减少至约58 MB，最高比现有方法低94倍。\n"
  },
  {
    "path": "abs/2409.09756.md",
    "content": "### MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation\n\n3D Gaussian Splatting demonstrates excellent quality and speed in novel view synthesis. Nevertheless, the huge file size of the 3D Gaussians presents challenges for transmission and storage. Current works design compact models to replace the substantial volume and attributes of 3D Gaussians, along with intensive training to distill information. These endeavors demand considerable training time, presenting formidable hurdles for practical deployment. To this end, we propose MesonGS, a codec for post-training compression of 3D Gaussians. Initially, we introduce a measurement criterion that considers both view-dependent and view-independent factors to assess the impact of each Gaussian point on the rendering output, enabling the removal of insignificant points. Subsequently, we decrease the entropy of attributes through two transformations that complement subsequent entropy coding techniques to enhance the file compression rate. More specifically, we first replace rotation quaternions with Euler angles; then, we apply region adaptive hierarchical transform to key attributes to reduce entropy. Lastly, we adopt finer-grained quantization to avoid excessive information loss. Moreover, a well-crafted finetune scheme is devised to restore quality. Extensive experiments demonstrate that MesonGS significantly reduces the size of 3D Gaussians while preserving competitive quality.\n\n3D 高斯投影（3D Gaussian Splatting）在新视角合成中展现了出色的质量和速度。然而，3D 高斯文件的巨大体积在传输和存储方面带来了挑战。当前的工作通过设计紧凑的模型来取代3D高斯的庞大体积和属性，并通过密集的训练来提取信息。这些工作需要大量的训练时间，给实际部署带来了巨大的困难。为此，我们提出了 MesonGS，这是一种用于 3D 高斯后训练压缩的编解码器。首先，我们引入了一种度量标准，考虑视角相关和视角无关因素，以评估每个高斯点对渲染输出的影响，从而删除不重要的点。接下来，我们通过两种变换减少属性的熵，以配合后续的熵编码技术，从而提高文件的压缩率。具体来说，我们首先用欧拉角替代旋转四元数；然后，我们对关键属性应用区域自适应分层变换以减少熵。最后，我们采用更精细的量化方法，避免过度的信息丢失。此外，我们设计了一种精心构造的微调方案，以恢复质量。大量实验表明，MesonGS 在显著减少 3D 高斯文件体积的同时，保持了有竞争力的质量。\n"
  },
  {
    "path": "abs/2409.09868.md",
    "content": "### SAFER-Splat: A Control Barrier Function for Safe Navigation with Online Gaussian Splatting Maps\n\nSAFER-Splat (Simultaneous Action Filtering and Environment Reconstruction) is a real-time, scalable, and minimally invasive action filter, based on control barrier functions, for safe robotic navigation in a detailed map constructed at runtime using Gaussian Splatting (GSplat). We propose a novel Control Barrier Function (CBF) that not only induces safety with respect to all Gaussian primitives in the scene, but when synthesized into a controller, is capable of processing hundreds of thousands of Gaussians while maintaining a minimal memory footprint and operating at 15 Hz during online Splat training. Of the total compute time, a small fraction of it consumes GPU resources, enabling uninterrupted training. The safety layer is minimally invasive, correcting robot actions only when they are unsafe. To showcase the safety filter, we also introduce SplatBridge, an open-source software package built with ROS for real-time GSplat mapping for robots. We demonstrate the safety and robustness of our pipeline first in simulation, where our method is 20-50x faster, safer, and less conservative than competing methods based on neural radiance fields. Further, we demonstrate simultaneous GSplat mapping and safety filtering on a drone hardware platform using only on-board perception. We verify that under teleoperation a human pilot cannot invoke a collision. Our videos and codebase can be found at this https URL.\n\nSAFER-Splat（Simultaneous Action Filtering and Environment Reconstruction）是一种基于控制屏障函数的实时、可扩展且最小干预的动作过滤器，用于在运行时使用高斯投影（GSplat）构建的详细地图中实现安全的机器人导航。我们提出了一种新颖的控制屏障函数（Control Barrier Function, CBF），该函数不仅在场景中的所有高斯基元上引入了安全性，还能够合成到控制器中，以处理数十万个高斯点，同时保持最小的内存占用，并在在线 Splat 训练期间以 15 Hz 的频率运行。在总计算时间中，只有一小部分消耗了 GPU 资源，从而实现了不中断的训练。安全层的干预极少，仅在机器人动作不安全时进行修正。\n为了展示这一安全过滤器，我们还推出了 SplatBridge，一个基于 ROS 的开源软件包，专为机器人实时 GSplat 映射而设计。我们首先在仿真中展示了管道的安全性和鲁棒性，结果表明，我们的方法比基于神经辐射场的竞争方法快 20-50 倍，且更安全、更不保守。进一步地，我们在无人机硬件平台上展示了同时进行 GSplat 映射和安全过滤，使用的仅是机载感知系统。我们验证了在遥控操作下，人工飞行员无法引发碰撞。\n"
  },
  {
    "path": "abs/2409.10041.md",
    "content": "### DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments\n\nThis paper presents DENSER, an efficient and effective approach leveraging 3D Gaussian splatting (3DGS) for the reconstruction of dynamic urban environments. While several methods for photorealistic scene representations, both implicitly using neural radiance fields (NeRF) and explicitly using 3DGS have shown promising results in scene reconstruction of relatively complex dynamic scenes, modeling the dynamic appearance of foreground objects tend to be challenging, limiting the applicability of these methods to capture subtleties and details of the scenes, especially far dynamic objects. To this end, we propose DENSER, a framework that significantly enhances the representation of dynamic objects and accurately models the appearance of dynamic objects in the driving scene. Instead of directly using Spherical Harmonics (SH) to model the appearance of dynamic objects, we introduce and integrate a new method aiming at dynamically estimating SH bases using wavelets, resulting in better representation of dynamic objects appearance in both space and time. Besides object appearance, DENSER enhances object shape representation through densification of its point cloud across multiple scene frames, resulting in faster convergence of model training. Extensive evaluations on KITTI dataset show that the proposed approach significantly outperforms state-of-the-art methods by a wide margin.\n\n本文提出了DENSER，这是一种高效且有效的3D高斯投影（3DGS）方法，用于动态城市环境的重建。尽管隐式的神经辐射场（NeRF）和显式的3DGS在复杂动态场景的重建中都取得了有前景的成果，但对前景动态物体的外观建模仍然具有挑战性，限制了这些方法在捕捉场景细节和微妙之处，尤其是远处动态物体时的适用性。为此，我们提出了DENSER框架，显著增强了动态物体的表示能力，并准确建模了驾驶场景中动态物体的外观。\n与直接使用球谐函数（Spherical Harmonics, SH）来建模动态物体外观不同，我们引入并整合了一种新的方法，利用小波动态估计SH基底，从而在空间和时间上更好地表示动态物体的外观。除了物体外观，DENSER还通过在多个场景帧中对点云进行致密化来提升物体形状的表示，从而加速模型训练的收敛速度。在KITTI数据集上的广泛评估表明，该方法相比于最先进的方法在性能上有显著提升。\n"
  },
  {
    "path": "abs/2409.10161.md",
    "content": "### SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting\n\nSim2Real transfer, particularly for manipulation policies relying on RGB images, remains a critical challenge in robotics due to the significant domain shift between synthetic and real-world visual data. In this paper, we propose SplatSim, a novel framework that leverages Gaussian Splatting as the primary rendering primitive to reduce the Sim2Real gap for RGB-based manipulation policies. By replacing traditional mesh representations with Gaussian Splats in simulators, SplatSim produces highly photorealistic synthetic data while maintaining the scalability and cost-efficiency of simulation. We demonstrate the effectiveness of our framework by training manipulation policies within SplatSim}and deploying them in the real world in a zero-shot manner, achieving an average success rate of 86.25%, compared to 97.5% for policies trained on real-world data.\n\nSim2Real 转移，尤其是对于依赖 RGB 图像的操作策略，在机器人学中仍然是一个关键挑战，因为合成与真实世界视觉数据之间存在显著的领域差异。在本文中，我们提出了 SplatSim，一种利用高斯投影作为主要渲染基元的全新框架，以减少 RGB 基操作策略的 Sim2Real 差距。通过在模拟器中用高斯投影替代传统的网格表示，SplatSim 能生成高度逼真的合成数据，同时保持模拟的可扩展性和成本效益。我们展示了该框架的有效性，操控策略在 SplatSim 中训练并直接部署到现实世界中进行零样本测试，取得了平均 86.25% 的成功率，相比于在真实数据上训练的策略成功率为 97.5%。\n"
  },
  {
    "path": "abs/2409.10216.md",
    "content": "### BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting\n\nImage-goal navigation enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely heavily on data and computationally expensive learning-based approaches or lack efficiency in complex environments due to insufficient exploration strategies. To address these limitations, we propose Bayesian Embodied Image-goal Navigation Using Gaussian Splatting, a novel method that formulates ImageNav as an optimal control problem within a model predictive control framework. BEINGS leverages 3D Gaussian Splatting as a scene prior to predict future observations, enabling efficient, real-time navigation decisions grounded in the robot's sensory experiences. By integrating Bayesian updates, our method dynamically refines the robot's strategy without requiring extensive prior experience or data. Our algorithm is validated through extensive simulations and physical experiments, showcasing its potential for embodied robot systems in visually complex scenarios.\n\n图像目标导航（Image-goal navigation）使机器人能够使用视觉线索引导，前往捕捉目标图像的位置。然而，当前的方法要么过度依赖数据和计算开销较大的基于学习的方式，要么由于探索策略不足，在复杂环境中缺乏效率。为了解决这些限制，我们提出了一种基于高斯投影的贝叶斯具身图像目标导航方法，称为 BEINGS。这种新方法将图像导航（ImageNav）作为模型预测控制框架中的最优控制问题来进行求解。BEINGS 利用 3D 高斯投影作为场景先验，预测未来的观测结果，从而基于机器人的感知经验做出高效的实时导航决策。通过整合贝叶斯更新，我们的方法能够动态优化机器人的策略，而无需广泛的先验经验或数据。我们通过大量仿真和物理实验验证了该算法，展示了其在视觉复杂场景中对具身机器人系统的潜力。\n"
  },
  {
    "path": "abs/2409.10335.md",
    "content": "### Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering\n\nWe propose two novel ideas (adoption of deferred rendering and mesh-based representation) to improve the quality of 3D Gaussian splatting (3DGS) based inverse rendering. We first report a problem incurred by hidden Gaussians, where Gaussians beneath the surface adversely affect the pixel color in the volume rendering adopted by the existing methods. In order to resolve the problem, we propose applying deferred rendering and report new problems incurred in a naive application of deferred rendering to the existing 3DGS-based inverse rendering. In an effort to improve the quality of 3DGS-based inverse rendering under deferred rendering, we propose a novel two-step training approach which (1) exploits mesh extraction and utilizes a hybrid mesh-3DGS representation and (2) applies novel regularization methods to better exploit the mesh. Our experiments show that, under relighting, the proposed method offers significantly better rendering quality than the existing 3DGS-based inverse rendering methods. Compared with the SOTA voxel grid-based inverse rendering method, it gives better rendering quality while offering real-time rendering.\n\n我们提出了两个创新的想法（延迟渲染的采用和基于网格的表示）以提高基于3D高斯投影（3D Gaussian Splatting, 3DGS）的逆向渲染质量。首先，我们发现了隐藏高斯导致的问题，即表面下的高斯在现有方法采用的体渲染中对像素颜色产生了不利影响。为了解决这一问题，我们提出应用延迟渲染，并报告了在现有 3DGS 逆向渲染中简单应用延迟渲染所引发的新问题。\n为了在延迟渲染下提高 3DGS 逆向渲染的质量，我们提出了一种创新的两步训练方法：（1）利用网格提取，并采用混合网格-3DGS 表示；（2）应用新的正则化方法，更好地利用网格。在重光照条件下，我们的实验表明，与现有的基于 3DGS 的逆向渲染方法相比，该方法提供了显著更好的渲染质量。与最先进的基于体素网格的逆向渲染方法相比，它在提供更高渲染质量的同时，还实现了实时渲染。\n"
  },
  {
    "path": "abs/2409.10982.md",
    "content": "### GLC-SLAM: Gaussian Splatting SLAM with Efficient Loop Closure\n\n3D Gaussian Splatting (3DGS) has gained significant attention for its application in dense Simultaneous Localization and Mapping (SLAM), enabling real-time rendering and high-fidelity mapping. However, existing 3DGS-based SLAM methods often suffer from accumulated tracking errors and map drift, particularly in large-scale environments. To address these issues, we introduce GLC-SLAM, a Gaussian Splatting SLAM system that integrates global optimization of camera poses and scene models. Our approach employs frame-to-model tracking and triggers hierarchical loop closure using a global-to-local strategy to minimize drift accumulation. By dividing the scene into 3D Gaussian submaps, we facilitate efficient map updates following loop corrections in large scenes. Additionally, our uncertainty-minimized keyframe selection strategy prioritizes keyframes observing more valuable 3D Gaussians to enhance submap optimization. Experimental results on various datasets demonstrate that GLC-SLAM achieves superior or competitive tracking and mapping performance compared to state-of-the-art dense RGB-D SLAM systems.\n\n3D 高斯投影（3D Gaussian Splatting, 3DGS）因其在密集同步定位与地图构建（SLAM）中的应用，能够实现实时渲染和高保真地图构建，受到了广泛关注。然而，现有基于 3DGS 的 SLAM 方法通常会因累积的跟踪误差和地图漂移问题而受限，特别是在大规模环境中。为了解决这些问题，我们提出了 GLC-SLAM，这是一种整合了相机姿态和场景模型全局优化的高斯投影 SLAM 系统。我们的方法采用帧对模型跟踪，并通过全局到局部策略触发分层回环闭合，以最小化漂移累积。通过将场景划分为 3D 高斯子图，我们在大场景中实现了回环校正后的高效地图更新。此外，我们的最小化不确定性的关键帧选择策略优先选择观测到更多重要 3D 高斯的关键帧，以增强子图优化。多个数据集上的实验结果表明，GLC-SLAM 在跟踪和地图构建性能上优于或与最先进的密集 RGB-D SLAM 系统具有竞争力。\n"
  },
  {
    "path": "abs/2409.11211.md",
    "content": "### SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction\n\nDigitizing 3D static scenes and 4D dynamic events from multi-view images has long been a challenge in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a practical and scalable reconstruction method, gaining popularity due to its impressive reconstruction quality, real-time rendering capabilities, and compatibility with widely used visualization tools. However, the method requires a substantial number of input views to achieve high-quality scene reconstruction, introducing a significant practical bottleneck. This challenge is especially severe in capturing dynamic scenes, where deploying an extensive camera array can be prohibitively costly. In this work, we identify the lack of spatial autocorrelation of splat features as one of the factors contributing to the suboptimal performance of the 3DGS technique in sparse reconstruction settings. To address the issue, we propose an optimization strategy that effectively regularizes splat features by modeling them as the outputs of a corresponding implicit neural field. This results in a consistent enhancement of reconstruction quality across various scenarios. Our approach effectively handles static and dynamic cases, as demonstrated by extensive testing across different setups and scene complexities.\n\n从多视角图像中数字化3D静态场景和4D动态事件一直是计算机视觉和图形学中的一大挑战。近年来，3D高斯投影（3D Gaussian Splatting, 3DGS）作为一种实用且可扩展的重建方法，凭借其卓越的重建质量、实时渲染能力以及与广泛使用的可视化工具兼容性，逐渐受到关注。然而，该方法需要大量的输入视角才能实现高质量的场景重建，这成为一个显著的实际瓶颈。这个挑战在捕捉动态场景时尤为严重，因为部署大规模的摄像机阵列成本高昂。\n在本研究中，我们确定了高斯点特征缺乏空间自相关性是3DGS技术在稀疏重建场景中表现不佳的原因之一。为了解决这一问题，我们提出了一种优化策略，通过将高斯点特征建模为相应隐式神经场的输出，有效地对其进行正则化处理。这种方法在各种场景下持续提升了重建质量。通过广泛的测试，我们的方法在处理静态和动态场景方面都表现出色，并在不同的设置和场景复杂度下取得了显著的效果。\n"
  },
  {
    "path": "abs/2409.11307.md",
    "content": "### GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module\n\n3D Gaussian Splatting (3DGS) integrates the strengths of primitive-based representations and volumetric rendering techniques, enabling real-time, high-quality rendering. However, 3DGS models typically overfit to single-scene training and are highly sensitive to the initialization of Gaussian ellipsoids, heuristically derived from Structure from Motion (SfM) point clouds, which limits both generalization and practicality. To address these limitations, we propose GS-Net, a generalizable, plug-and-play 3DGS module that densifies Gaussian ellipsoids from sparse SfM point clouds, enhancing geometric structure representation. To the best of our knowledge, GS-Net is the first plug-and-play 3DGS module with cross-scene generalization capabilities. Additionally, we introduce the CARLA-NVS dataset, which incorporates additional camera viewpoints to thoroughly evaluate reconstruction and rendering quality. Extensive experiments demonstrate that applying GS-Net to 3DGS yields a PSNR improvement of 2.08 dB for conventional viewpoints and 1.86 dB for novel viewpoints, confirming the method's effectiveness and robustness.\n\n3D 高斯投影（3D Gaussian Splatting, 3DGS）结合了基于原语的表示方法与体积渲染技术的优势，实现了实时高质量渲染。然而，3DGS 模型通常对单场景训练过拟合，并且对从结构光（Structure from Motion, SfM）点云中启发式推导的高斯椭球初始化高度敏感，限制了其泛化能力和实用性。为了解决这些限制，我们提出了 GS-Net，一种可扩展的、即插即用的 3DGS 模块，用于从稀疏的 SfM 点云中致密化高斯椭球，增强几何结构的表示能力。据我们所知，GS-Net 是首个具备跨场景泛化能力的即插即用 3DGS 模块。此外，我们引入了 CARLA-NVS 数据集，该数据集包含额外的相机视角，以全面评估重建和渲染质量。大量实验表明，将 GS-Net 应用于 3DGS 可使传统视角下的 PSNR 提高 2.08 dB，新的视角下提高 1.86 dB，验证了该方法的有效性和鲁棒性。\n"
  },
  {
    "path": "abs/2409.11356.md",
    "content": "### RenderWorld: World Model with Self-Supervised 3D Label\n\nEnd-to-end autonomous driving with vision-only is not only more cost-effective compared to LiDAR-vision fusion but also more reliable than traditional methods. To achieve a economical and robust purely visual autonomous driving system, we propose RenderWorld, a vision-only end-to-end autonomous driving framework, which generates 3D occupancy labels using a self-supervised gaussian-based Img2Occ Module, then encodes the labels by AM-VAE, and uses world model for forecasting and planning. RenderWorld employs Gaussian Splatting to represent 3D scenes and render 2D images greatly improves segmentation accuracy and reduces GPU memory consumption compared with NeRF-based methods. By applying AM-VAE to encode air and non-air separately, RenderWorld achieves more fine-grained scene element representation, leading to state-of-the-art performance in both 4D occupancy forecasting and motion planning from autoregressive world model.\n\n基于视觉的端到端自动驾驶不仅比 LiDAR-视觉融合更具成本效益，而且比传统方法更可靠。为了实现经济且稳健的纯视觉自动驾驶系统，我们提出了 RenderWorld，这是一种基于视觉的端到端自动驾驶框架。该框架通过自监督的基于高斯的 Img2Occ 模块生成 3D 占据标签，然后由 AM-VAE 编码这些标签，并使用世界模型进行预测和规划。RenderWorld 采用高斯投影来表示 3D 场景并渲染 2D 图像，与基于 NeRF 的方法相比，大大提高了分割精度并减少了 GPU 内存消耗。通过应用 AM-VAE 分别编码空气和非空气元素，RenderWorld 实现了更精细的场景元素表示，在 4D 占据预测和基于自回归世界模型的运动规划中达到了最先进的性能。\n"
  },
  {
    "path": "abs/2409.11681.md",
    "content": "### Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks\n\n3D Gaussian Splatting has emerged as a powerful 3D scene representation technique, capturing fine details with high efficiency. In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentation. As a byproduct, we discovered that inference-time gradients can also be used to prune Gaussians, resulting in up to 21% compression. Additionally, we explore few-shot affordance transfer, allowing annotations from 2D images to be effectively transferred onto 3D Gaussian splats. The robust yet straightforward mathematical formulation underlying this approach makes it a highly effective tool for numerous downstream applications, such as augmented reality (AR), object editing, and robotics. The project code and additional resources are available at this https URL.\n\n3D Gaussian Splatting 已成为一种强大的 3D 场景表示技术，能够高效捕捉精细细节。在本文中，我们提出了一种基于投票的创新方法，将 2D 分割模型扩展到 3D 高斯散点。我们的方法利用了掩码梯度，通过输入的 2D 掩码过滤梯度，并将这些梯度作为投票，以实现精确的分割。作为副产品，我们发现推理时的梯度还可以用于剪枝高斯分布，压缩率可达 21%。此外，我们还探索了少样本可供性迁移，使 2D 图像的标注能够有效迁移到 3D 高斯散点上。该方法背后坚实且简洁的数学公式，使其成为增强现实 (AR)、物体编辑、机器人等多个下游应用的有效工具。\n"
  },
  {
    "path": "abs/2409.11682.md",
    "content": "### SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation\n\nIn this paper, we propose SRIF, a novel Semantic shape Registration framework based on diffusion-based Image morphing and Flow estimation. More concretely, given a pair of extrinsically aligned shapes, we first render them from multi-views, and then utilize an image interpolation framework based on diffusion models to generate sequences of intermediate images between them. The images are later fed into a dynamic 3D Gaussian splatting framework, with which we reconstruct and post-process for intermediate point clouds respecting the image morphing processing. In the end, tailored for the above, we propose a novel registration module to estimate continuous normalizing flow, which deforms source shape consistently towards the target, with intermediate point clouds as weak guidance. Our key insight is to leverage large vision models (LVMs) to associate shapes and therefore obtain much richer semantic information on the relationship between shapes than the ad-hoc feature extraction and alignment. As a consequence, SRIF achieves high-quality dense correspondences on challenging shape pairs, but also delivers smooth, semantically meaningful interpolation in between. Empirical evidence justifies the effectiveness and superiority of our method as well as specific design choices. The code is released at this https URL.\n\n在本文中，我们提出了SRIF，一种基于扩散式图像变形和流估计的全新语义形状配准框架。具体而言，给定一对外部对齐的形状，我们首先从多个视角渲染它们，然后利用基于扩散模型的图像插值框架生成它们之间的中间图像序列。随后，这些图像会被输入到动态 3D 高斯散点框架中，我们通过该框架重建并后处理中间点云，以尊重图像变形过程。最终，我们为此设计了一种全新的配准模块，用于估计连续的归一化流，从而使源形状一致地变形为目标形状，并以中间点云作为弱引导。我们的关键见解是利用大规模视觉模型（LVMs）来关联形状，从而比依赖特定特征提取和对齐的方法获取更丰富的语义信息。因此，SRIF在具有挑战性的形状对上实现了高质量的密集对应，同时在形状之间提供了平滑且具有语义意义的插值。实验结果证明了我们方法的有效性和优越性，以及特定设计选择的合理性。\n"
  },
  {
    "path": "abs/2409.12193.md",
    "content": "### Vista3D: Unravel the 3D Darkside of a Single Image\n\nWe embark on the age-old quest: unveiling the hidden dimensions of objects from mere glimpses of their visible parts. To address this, we present Vista3D, a framework that realizes swift and consistent 3D generation within a mere 5 minutes. At the heart of Vista3D lies a two-phase approach: the coarse phase and the fine phase. In the coarse phase, we rapidly generate initial geometry with Gaussian Splatting from a single image. In the fine phase, we extract a Signed Distance Function (SDF) directly from learned Gaussian Splatting, optimizing it with a differentiable isosurface representation. Furthermore, it elevates the quality of generation by using a disentangled representation with two independent implicit functions to capture both visible and obscured aspects of objects. Additionally, it harmonizes gradients from 2D diffusion prior with 3D-aware diffusion priors by angular diffusion prior composition. Through extensive evaluation, we demonstrate that Vista3D effectively sustains a balance between the consistency and diversity of the generated 3D objects.\n\n我们着手解决一个古老的难题：通过仅能看到物体的一部分来揭示其隐藏的维度。为此，我们提出了Vista3D，一个能够在短短5分钟内实现快速且一致的3D生成框架。Vista3D的核心采用了两阶段方法：粗略阶段和精细阶段。在粗略阶段，我们通过单张图像快速生成初步几何形态，使用高斯散点技术。在精细阶段，我们从学习到的高斯散点中直接提取符号距离函数（SDF），并通过可微等值面表示进行优化。此外，Vista3D通过使用两组独立的隐式函数对可见和隐藏部分进行解耦表示，进一步提升了生成质量。同时，它通过角度扩散先验组合，将来自2D扩散模型的梯度与3D感知的扩散先验相结合。通过广泛的评估，我们证明了Vista3D能够在生成的3D物体的一致性和多样性之间有效地保持平衡。\n"
  },
  {
    "path": "abs/2409.12323.md",
    "content": "### Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus\n\nDepth estimation is a fundamental task in 3D geometry. While stereo depth estimation can be achieved through triangulation methods, it is not as straightforward for monocular methods, which require the integration of global and local information. The Depth from Defocus (DFD) method utilizes camera lens models and parameters to recover depth information from blurred images and has been proven to perform well. However, these methods rely on All-In-Focus (AIF) images for depth estimation, which is nearly impossible to obtain in real-world applications. To address this issue, we propose a self-supervised framework based on 3D Gaussian splatting and Siamese networks. By learning the blur levels at different focal distances of the same scene in the focal stack, the framework predicts the defocus map and Circle of Confusion (CoC) from a single defocused image, using the defocus map as input to DepthNet for monocular depth estimation. The 3D Gaussian splatting model renders defocused images using the predicted CoC, and the differences between these and the real defocused images provide additional supervision signals for the Siamese Defocus self-supervised network. This framework has been validated on both artificially synthesized and real blurred datasets. Subsequent quantitative and visualization experiments demonstrate that our proposed framework is highly effective as a DFD method.\n\n深度估计是3D几何中的一项基础任务。虽然立体深度估计可以通过三角测量方法实现，但对于单目方法来说，整合全局和局部信息则没有那么直接。深度由散焦（DFD）方法利用相机镜头模型和参数从模糊图像中恢复深度信息，并已被证明表现良好。然而，这些方法依赖全聚焦（AIF）图像进行深度估计，而在实际应用中几乎无法获取这种图像。为了解决这一问题，我们提出了一个基于3D高斯散点和Siamese网络的自监督框架。通过学习同一场景在不同焦距下的模糊程度，该框架能够从单张散焦图像中预测散焦图和模糊圆（CoC），并将散焦图作为输入传递给DepthNet进行单目深度估计。3D高斯散点模型使用预测的CoC渲染散焦图像，真实散焦图像与渲染图像的差异为Siamese Defocus自监督网络提供了额外的监督信号。该框架已经在人工合成和真实模糊数据集上进行了验证，后续的定量和可视化实验表明，我们提出的框架作为DFD方法具有很高的有效性。\n"
  },
  {
    "path": "abs/2409.12518.md",
    "content": "### Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting\n\nWe propose Hi-SLAM, a semantic 3D Gaussian Splatting SLAM method featuring a novel hierarchical categorical representation, which enables accurate global 3D semantic mapping, scaling-up capability, and explicit semantic label prediction in the 3D world. The parameter usage in semantic SLAM systems increases significantly with the growing complexity of the environment, making it particularly challenging and costly for scene understanding. To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs). We further introduce a novel semantic loss designed to optimize hierarchical semantic information through both inter-level and cross-level optimization. Furthermore, we enhance the whole SLAM system, resulting in improved tracking and mapping performance. Our Hi-SLAM outperforms existing dense SLAM methods in both mapping and tracking accuracy, while achieving a 2x operation speed-up. Additionally, it exhibits competitive performance in rendering semantic segmentation in small synthetic scenes, with significantly reduced storage and training time requirements. Rendering FPS impressively reaches 2,000 with semantic information and 3,000 without it. Most notably, it showcases the capability of handling the complex real-world scene with more than 500 semantic classes, highlighting its valuable scaling-up capability.\n\n我们提出了Hi-SLAM，一种语义3D高斯散点SLAM方法，具有新颖的分层类别表示，使得能够实现精确的全局3D语义映射、可扩展能力以及在3D世界中的显式语义标签预测。随着环境复杂度的增加，语义SLAM系统中的参数使用显著增长，使得场景理解变得尤为具有挑战性和高成本。为了解决这一问题，我们引入了一种新颖的分层表示，将语义信息以紧凑的形式编码到3D高斯散点中，同时利用了大规模语言模型（LLMs）的能力。我们进一步提出了一种新的语义损失，通过层内和跨层优化来优化分层语义信息。此外，我们增强了整个SLAM系统，提升了跟踪和映射性能。我们的Hi-SLAM在映射和跟踪精度上优于现有的密集SLAM方法，同时实现了2倍的操作速度提升。此外，它在小型合成场景中的语义分割渲染表现出色，显著减少了存储和训练时间需求。渲染帧率在包含语义信息的情况下达到令人印象深刻的2000 FPS，而不包含语义信息时则达到3000 FPS。最值得注意的是，它展示了处理超过500种语义类别的复杂真实场景的能力，突显了其强大的扩展能力。\n"
  },
  {
    "path": "abs/2409.12617.md",
    "content": "### CrossRT: A cross platform programming technology for hardware-accelerated ray tracing in CG and CV applications\n\nWe propose a programming technology that bridges cross-platform compatibility and hardware acceleration in ray tracing applications. Our methodology enables developers to define algorithms while our translator manages implementation specifics for different hardware or APIs. Features include: generating hardware-accelerated code from hardware-agnostic, object-oriented C++ algorithm descriptions; enabling users to define software fallbacks for non-hardware-accelerated CPUs and GPUs; producing GPU programming API-based algorithm implementations resembling manually ported C++ versions. The generated code is editable and readable, allowing for additional hardware acceleration. Our translator supports single megakernel and multiple kernel path tracing implementations without altering the programming model or input source code. Wavefront mode is crucial for NeRF and SDF, ensuring efficient evaluation with multiple kernels. Validation on tasks such as BVH tree build/traversal, ray-surface intersection for SDF, ray-volume intersection for 3D Gaussian Splatting, and complex Path Tracing models showed comparable performance levels to expert-written implementations for GPUs. Our technology outperformed existing Path Tracing implementations.\n\n我们提出了一种编程技术，在光线追踪应用中实现跨平台兼容性与硬件加速的桥梁。我们的方法允许开发者定义算法，而我们的翻译器负责为不同的硬件或API管理具体实现。其特点包括：从与硬件无关的面向对象C++算法描述生成硬件加速代码；允许用户为非硬件加速的CPU和GPU定义软件备选方案；生成基于GPU编程API的算法实现，其表现类似于手动移植的C++版本。生成的代码是可编辑和可读的，支持进一步的硬件加速。我们的翻译器支持单一大核和多核路径追踪实现，而无需更改编程模型或输入源代码。Wavefront模式对于NeRF和SDF至关重要，确保使用多核进行高效评估。在诸如BVH树构建/遍历、SDF的光线-表面相交、3D高斯散点的光线-体积相交、以及复杂的路径追踪模型等任务上的验证表明，其性能与专家编写的GPU实现相当。我们的技术在性能上超越了现有的路径追踪实现。\n"
  },
  {
    "path": "abs/2409.12753.md",
    "content": "### DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input\n\nWe propose DrivingForward, a feed-forward Gaussian Splatting model that reconstructs driving scenes from flexible surround-view input. Driving scene images from vehicle-mounted cameras are typically sparse, with limited overlap, and the movement of the vehicle further complicates the acquisition of camera extrinsics. To tackle these challenges and achieve real-time reconstruction, we jointly train a pose network, a depth network, and a Gaussian network to predict the Gaussian primitives that represent the driving scenes. The pose network and depth network determine the position of the Gaussian primitives in a self-supervised manner, without using depth ground truth and camera extrinsics during training. The Gaussian network independently predicts primitive parameters from each input image, including covariance, opacity, and spherical harmonics coefficients. At the inference stage, our model can achieve feed-forward reconstruction from flexible multi-frame surround-view input. Experiments on the nuScenes dataset show that our model outperforms existing state-of-the-art feed-forward and scene-optimized reconstruction methods in terms of reconstruction.\n\n我们提出了DrivingForward，一种基于前馈高斯散点模型的驾驶场景重建方法，能够从灵活的环视输入中进行重建。来自车辆安装摄像头的驾驶场景图像通常较为稀疏，且重叠区域有限，同时车辆的移动进一步加大了摄像机外参获取的难度。为了解决这些挑战并实现实时重建，我们联合训练了姿态网络、深度网络和高斯网络，以预测代表驾驶场景的高斯基元。姿态网络和深度网络以自监督的方式确定高斯基元的位置，在训练过程中无需深度真值和摄像机外参。高斯网络则独立预测每个输入图像的基元参数，包括协方差、透明度和球谐系数。在推理阶段，我们的模型能够从灵活的多帧环视输入中实现前馈重建。基于nuScenes数据集的实验表明，我们的模型在重建性能上优于现有的前馈和场景优化重建方法。\n"
  },
  {
    "path": "abs/2409.12771.md",
    "content": "### Spectral-GS: Taming 3D Gaussian Splatting with Spectral Entropy\n\nRecently, 3D Gaussian Splatting (3D-GS) has achieved impressive results in novel view synthesis, demonstrating high fidelity and efficiency. However, it easily exhibits needle-like artifacts, especially when increasing the sampling rate. Mip-Splatting tries to remove these artifacts with a 3D smoothing filter for frequency constraints and a 2D Mip filter for approximated supersampling. Unfortunately, it tends to produce over-blurred results, and sometimes needle-like Gaussians still persist. Our spectral analysis of the covariance matrix during optimization and densification reveals that current 3D-GS lacks shape awareness, relying instead on spectral radius and view positional gradients to determine splitting. As a result, needle-like Gaussians with small positional gradients and low spectral entropy fail to split and overfit high-frequency details. Furthermore, both the filters used in 3D-GS and Mip-Splatting reduce the spectral entropy and increase the condition number during zooming in to synthesize novel view, causing view inconsistencies and more pronounced artifacts. Our Spectral-GS, based on spectral analysis, introduces 3D shape-aware splitting and 2D view-consistent filtering strategies, effectively addressing these issues, enhancing 3D-GS's capability to represent high-frequency details without noticeable artifacts, and achieving high-quality photorealistic rendering.\n\n最近，3D Gaussian Splatting (3D-GS) 在新视角合成方面取得了令人印象深刻的成果，展示了高保真度和高效率。然而，当采样率增加时，3D-GS 容易出现针状伪影。Mip-Splatting 试图通过3D平滑滤波器来进行频率约束，并使用2D Mip滤波器实现近似的超采样，从而消除这些伪影。不幸的是，它往往会产生过度模糊的结果，有时针状高斯分布仍然存在。我们对协方差矩阵在优化和致密化过程中的频谱分析表明，当前的3D-GS缺乏对形状的感知，而是依赖频谱半径和视角位置梯度来决定分裂。因此，位置梯度较小且频谱熵低的针状高斯分布无法分裂，过拟合高频细节。此外，3D-GS 和 Mip-Splatting 中使用的滤波器在缩放以合成新视角时，降低了频谱熵并增加了条件数，导致视角不一致以及更明显的伪影。我们的Spectral-GS基于频谱分析，引入了3D形状感知分裂和2D视角一致的滤波策略，有效解决了这些问题，增强了3D-GS在表示高频细节时的能力，避免了显著的伪影，从而实现高质量的真实感渲染。\n"
  },
  {
    "path": "abs/2409.12774.md",
    "content": "### GaRField++: Reinforced Gaussian Radiance Fields for Large-Scale 3D Scene Reconstruction\n\nThis paper proposes a novel framework for large-scale scene reconstruction based on 3D Gaussian splatting (3DGS) and aims to address the scalability and accuracy challenges faced by existing methods. For tackling the scalability issue, we split the large scene into multiple cells, and the candidate point-cloud and camera views of each cell are correlated through a visibility-based camera selection and a progressive point-cloud extension. To reinforce the rendering quality, three highlighted improvements are made in comparison with vanilla 3DGS, which are a strategy of the ray-Gaussian intersection and the novel Gaussians density control for learning efficiency, an appearance decoupling module based on ConvKAN network to solve uneven lighting conditions in large-scale scenes, and a refined final loss with the color loss, the depth distortion loss, and the normal consistency loss. Finally, the seamless stitching procedure is executed to merge the individual Gaussian radiance field for novel view synthesis across different cells. Evaluation of Mill19, Urban3D, and MatrixCity datasets shows that our method consistently generates more high-fidelity rendering results than state-of-the-art methods of large-scale scene reconstruction. We further validate the generalizability of the proposed approach by rendering on self-collected video clips recorded by a commercial drone.\n\n本文提出了一种基于3D Gaussian Splatting（3DGS）的全新框架，用于大规模场景重建，旨在解决现有方法面临的可扩展性和精度挑战。为了解决可扩展性问题，我们将大场景划分为多个单元，通过基于可见性的相机选择和渐进式点云扩展来关联每个单元的候选点云和相机视图。为了提升渲染质量，相比于基础版的3DGS，我们做了三项改进：一种射线与高斯交叉的新策略和高斯密度控制方法，以提高学习效率；基于ConvKAN网络的外观解耦模块，用于解决大规模场景中的不均匀光照问题；以及经过改进的最终损失函数，结合了颜色损失、深度畸变损失和法线一致性损失。最后，执行无缝拼接程序，将各个单元的高斯辐射场合并，实现跨单元的新视角合成。在Mill19、Urban3D和MatrixCity数据集上的评估显示，我们的方法在大规模场景重建中，生成的高保真渲染结果始终优于现有的最先进方法。我们还通过在商业无人机录制的自采视频片段上渲染，验证了所提方法的广泛适应性。\n"
  },
  {
    "path": "abs/2409.12886.md",
    "content": "### EdgeGaussians -- 3D Edge Mapping via Gaussian Splatting\n\nWith their meaningful geometry and their omnipresence in the 3D world, edges are extremely useful primitives in computer vision. 3D edges comprise of lines and curves, and methods to reconstruct them use either multi-view images or point clouds as input. State-of-the-art image-based methods first learn a 3D edge point cloud then fit 3D edges to it. The edge point cloud is obtained by learning a 3D neural implicit edge field from which the 3D edge points are sampled on a specific level set (0 or 1). However, such methods present two important drawbacks: i) it is not realistic to sample points on exact level sets due to float imprecision and training inaccuracies. Instead, they are sampled within a range of levels so the points do not lie accurately on the 3D edges and require further processing. ii) Such implicit representations are computationally expensive and require long training times. In this paper, we address these two limitations and propose a 3D edge mapping that is simpler, more efficient, and preserves accuracy. Our method learns explicitly the 3D edge points and their edge direction hence bypassing the need for point sampling. It casts a 3D edge point as the center of a 3D Gaussian and the edge direction as the principal axis of the Gaussian. Such a representation has the advantage of being not only geometrically meaningful but also compatible with the efficient training optimization defined in Gaussian Splatting. Results show that the proposed method produces edges as accurate and complete as the state-of-the-art while being an order of magnitude faster. Code is released at this https URL.\n\n边缘作为计算机视觉中有意义的几何元素，广泛存在于3D世界中，因而是极为有用的基础元素。3D边缘由直线和曲线构成，重建它们的方法通常使用多视角图像或点云作为输入。当前最先进的基于图像的方法首先学习3D边缘点云，然后拟合3D边缘。边缘点云是通过学习一个3D神经隐式边缘场获得的，从中在特定的水平集（0或1）上采样出3D边缘点。然而，这类方法存在两个重要缺陷：i) 由于浮点不精确和训练误差，精确在水平集上采样点是不现实的。相反，点是在一定的水平范围内采样，导致这些点不能准确位于3D边缘上，且需要进一步处理。ii) 这类隐式表示计算量大，训练时间长。在本文中，我们解决了这两个问题，提出了一种更简单、更高效且能保持准确性的3D边缘映射方法。我们的方法显式地学习3D边缘点及其方向，从而避免了点采样的需求。该方法将3D边缘点表示为3D高斯的中心，并将边缘方向表示为该高斯的主轴。这种表示不仅在几何上有意义，还与高斯散点中的高效训练优化相兼容。结果表明，该方法在边缘的准确性和完整性上与最先进方法相当，但速度快了一个数量级。\n"
  },
  {
    "path": "abs/2409.12892.md",
    "content": "### 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt\n\nWe present 3DGS-LM, a new method that accelerates the reconstruction of 3D Gaussian Splatting (3DGS) by replacing its ADAM optimizer with a tailored Levenberg-Marquardt (LM). Existing methods reduce the optimization time by decreasing the number of Gaussians or by improving the implementation of the differentiable rasterizer. However, they still rely on the ADAM optimizer to fit Gaussian parameters of a scene in thousands of iterations, which can take up to an hour. To this end, we change the optimizer to LM that runs in conjunction with the 3DGS differentiable rasterizer. For efficient GPU parallization, we propose a caching data structure for intermediate gradients that allows us to efficiently calculate Jacobian-vector products in custom CUDA kernels. In every LM iteration, we calculate update directions from multiple image subsets using these kernels and combine them in a weighted mean. Overall, our method is 30% faster than the original 3DGS while obtaining the same reconstruction quality. Our optimization is also agnostic to other methods that acclerate 3DGS, thus enabling even faster speedups compared to vanilla 3DGS.\n\n我们提出了3DGS-LM，一种通过替换ADAM优化器为定制的Levenberg-Marquardt（LM）方法来加速3D Gaussian Splatting（3DGS）重建的新方法。现有方法通过减少高斯数量或改进可微光栅器的实现来缩短优化时间。然而，这些方法仍然依赖于ADAM优化器来调整场景的高斯参数，需要数千次迭代，可能耗时长达一个小时。为此，我们将优化器更换为与3DGS可微光栅器结合运行的LM。为了实现高效的GPU并行化，我们提出了一种用于缓存中间梯度的数据结构，能够通过自定义的CUDA内核高效计算雅可比矩阵与向量的乘积。在每次LM迭代中，我们使用这些内核从多个图像子集计算更新方向，并通过加权均值将它们结合起来。总体而言，我们的方法比原始3DGS快30%，同时保持相同的重建质量。此外，我们的优化对其他加速3DGS的方法保持兼容，因此相较于基础版3DGS能够实现更快的速度提升。\n"
  },
  {
    "path": "abs/2409.12899.md",
    "content": "### LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction\n\nLarge-scale 3D reconstruction is critical in the field of robotics, and the potential of 3D Gaussian Splatting (3DGS) for achieving accurate object-level reconstruction has been demonstrated. However, ensuring geometric accuracy in outdoor and unbounded scenes remains a significant challenge. This study introduces LI-GS, a reconstruction system that incorporates LiDAR and Gaussian Splatting to enhance geometric accuracy in large-scale scenes. 2D Gaussain surfels are employed as the map representation to enhance surface alignment. Additionally, a novel modeling method is proposed to convert LiDAR point clouds to plane-constrained multimodal Gaussian Mixture Models (GMMs). The GMMs are utilized during both initialization and optimization stages to ensure sufficient and continuous supervision over the entire scene while mitigating the risk of over-fitting. Furthermore, GMMs are employed in mesh extraction to eliminate artifacts and improve the overall geometric quality. Experiments demonstrate that our method outperforms state-of-the-art methods in large-scale 3D reconstruction, achieving higher accuracy compared to both LiDAR-based methods and Gaussian-based methods with improvements of 52.6% and 68.7%, respectively.\n\n大规模3D重建在机器人领域至关重要，3D Gaussian Splatting（3DGS）在实现精确的物体级别重建方面展示了其潜力。然而，在室外和无界场景中确保几何精度仍然是一个重要挑战。本研究引入了LI-GS，这是一种结合LiDAR和高斯散点的重建系统，用于提升大规模场景的几何精度。我们采用2D高斯表面单元作为地图表示，以增强表面对齐效果。此外，提出了一种新颖的建模方法，将LiDAR点云转换为平面约束的多模态高斯混合模型（GMM）。在初始化和优化阶段均使用GMM，以确保对整个场景的充分且连续的监督，同时减少过拟合的风险。GMM还用于网格提取，以消除伪影并提高整体几何质量。实验结果表明，我们的方法在大规模3D重建中优于现有最先进的方法，分别比基于LiDAR的方法和基于高斯的方法在精度上提升了52.6%和68.7%。\n"
  },
  {
    "path": "abs/2409.12954.md",
    "content": "### GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling\n\nGaussian splatting has demonstrated excellent performance for view synthesis and scene reconstruction. The representation achieves photorealistic quality by optimizing the position, scale, color, and opacity of thousands to millions of 2D or 3D Gaussian primitives within a scene. However, since each Gaussian primitive encodes both appearance and geometry, these attributes are strongly coupled--thus, high-fidelity appearance modeling requires a large number of Gaussian primitives, even when the scene geometry is simple (e.g., for a textured planar surface). We propose to texture each 2D Gaussian primitive so that even a single Gaussian can be used to capture appearance details. By employing per-primitive texturing, our appearance representation is agnostic to the topology and complexity of the scene's geometry. We show that our approach, GStex, yields improved visual quality over prior work in texturing Gaussian splats. Furthermore, we demonstrate that our decoupling enables improved novel view synthesis performance compared to 2D Gaussian splatting when reducing the number of Gaussian primitives, and that GStex can be used for scene appearance editing and re-texturing.\n\n高斯散点在视角合成和场景重建中展示了出色的性能。这种表示通过优化场景中成千上万甚至数百万的2D或3D高斯基元的位置、尺度、颜色和透明度，达到了逼真的视觉效果。然而，由于每个高斯基元同时编码了外观和几何信息，这些属性之间紧密耦合——因此，即使场景几何简单（例如纹理化的平面表面），高保真的外观建模仍然需要大量的高斯基元。为了解决这一问题，我们提出为每个2D高斯基元添加纹理，这样即便单个高斯也能捕捉到外观细节。通过每个基元的纹理化，我们的外观表示对场景几何的拓扑和复杂度保持无关性。我们的研究表明，提出的方法GStex在高斯散点的纹理化方面相比于之前的工作能提高视觉质量。此外，我们证明了这种解耦使得在减少高斯基元数量的情况下，GStex在新视角合成的性能优于2D高斯散点，并且GStex可以用于场景外观编辑和重新纹理化。\n"
  },
  {
    "path": "abs/2409.13055.md",
    "content": "### MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting\n\nReal-time SLAM with dense 3D mapping is computationally challenging, especially on resource-limited devices. The recent development of 3D Gaussian Splatting (3DGS) offers a promising approach for real-time dense 3D reconstruction. However, existing 3DGS-based SLAM systems struggle to balance hardware simplicity, speed, and map quality. Most systems excel in one or two of the aforementioned aspects but rarely achieve all. A key issue is the difficulty of initializing 3D Gaussians while concurrently conducting SLAM. To address these challenges, we present Monocular GSO (MGSO), a novel real-time SLAM system that integrates photometric SLAM with 3DGS. Photometric SLAM provides dense structured point clouds for 3DGS initialization, accelerating optimization and producing more efficient maps with fewer Gaussians. As a result, experiments show that our system generates reconstructions with a balance of quality, memory efficiency, and speed that outperforms the state-of-the-art. Furthermore, our system achieves all results using RGB inputs. We evaluate the Replica, TUM-RGBD, and EuRoC datasets against current live dense reconstruction systems. Not only do we surpass contemporary systems, but experiments also show that we maintain our performance on laptop hardware, making it a practical solution for robotics, A/R, and other real-time applications.\n\n实时SLAM与密集3D映射在计算上非常具有挑战性，尤其是在资源有限的设备上。近年来，3D Gaussian Splatting (3DGS) 的发展为实时密集3D重建提供了一种有前景的解决方案。然而，现有基于3DGS的SLAM系统在硬件简化、速度和地图质量之间难以平衡。大多数系统在上述某一或两方面表现优异，但很少能全面实现这一平衡。一个关键问题是在同时进行SLAM的过程中初始化3D高斯非常困难。为了解决这些挑战，我们提出了Monocular GSO (MGSO)，这是一种结合光度SLAM与3DGS的全新实时SLAM系统。光度SLAM为3DGS初始化提供了稠密的结构化点云，加速了优化过程，并以更少的高斯生成了更高效的地图。实验结果表明，我们的系统在质量、内存效率和速度之间实现了平衡，优于现有的最先进方法。此外，我们的系统仅使用RGB输入即可实现所有结果。我们在Replica、TUM-RGBD和EuRoC数据集上对比了当前的实时密集重建系统，实验不仅表明我们超越了现有系统，还展示了在笔记本硬件上保持性能的能力，使其成为机器人、增强现实（A/R）以及其他实时应用的实际解决方案。\n"
  },
  {
    "path": "abs/2409.13222.md",
    "content": "### 3D-GSW: 3D Gaussian Splatting Watermark for Protecting Copyrights in Radiance Fields\n\nRecently, 3D Gaussian splatting has been getting a lot of attention as an innovative method for representing 3D space due to rapid rendering and image quality. However, copyright protection for the 3D Gaussian splatting has not yet been introduced. In this paper, we present a novel watermarking method for 3D Gaussian splatting. The proposed method embeds a binary message into 3D Gaussians by fine-tuning the pre-trained 3D Gaussian splatting model. To achieve this, we present Frequency-Guided Densification (FGD) that utilizes Discrete Fourier Transform to find patches with high-frequencies and split 3D Gaussians based on 3D Gaussian Contribution Vector. It is each 3D Gaussian contribution to rendered pixel colors, improving both rendering quality and bit accuracy. Furthermore, we modify an adaptive gradient mask to enhance rendering quality. Our experiments show that our method can embed a watermark in 3D Gaussians imperceptibly with increased capacity and robustness against attacks. Our method reduces optimization cost and achieves state-of-the-art performance compared to other methods.\n\n最近，3D Gaussian Splatting 因其快速渲染和图像质量而备受关注，成为表示3D空间的创新方法。然而，关于3D Gaussian Splatting 的版权保护尚未被引入。在本文中，我们提出了一种用于3D Gaussian Splatting的全新水印方法。该方法通过微调预训练的3D Gaussian Splatting模型，将二进制消息嵌入到3D高斯中。为此，我们提出了频率引导致密化（Frequency-Guided Densification，FGD），利用离散傅里叶变换（DFT）来寻找高频补丁，并根据3D高斯贡献向量进行3D高斯分裂。此向量表示每个3D高斯对渲染像素颜色的贡献，从而提升了渲染质量和比特精度。此外，我们修改了自适应梯度掩码以进一步增强渲染质量。实验结果表明，我们的方法能够在不显著影响3D高斯的前提下嵌入水印，同时增加了水印的容量和对攻击的鲁棒性。相比其他方法，我们的方法不仅降低了优化成本，还达到了最先进的性能。\n"
  },
  {
    "path": "abs/2409.13392.md",
    "content": "### Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors\n\nEvent cameras are bio-inspired sensors that output asynchronous and sparse event streams, instead of fixed frames. Benefiting from their distinct advantages, such as high dynamic range and high temporal resolution, event cameras have been applied to address 3D reconstruction, important for robotic mapping. Recently, neural rendering techniques, such as 3D Gaussian splatting (3DGS), have been shown successful in 3D reconstruction. However, it still remains under-explored how to develop an effective event-based 3DGS pipeline. In particular, as 3DGS typically depends on high-quality initialization and dense multiview constraints, a potential problem appears for the 3DGS optimization with events given its inherent sparse property. To this end, we propose a novel event-based 3DGS framework, named Elite-EvGS. Our key idea is to distill the prior knowledge from the off-the-shelf event-to-video (E2V) models to effectively reconstruct 3D scenes from events in a coarse-to-fine optimization manner. Specifically, to address the complexity of 3DGS initialization from events, we introduce a novel warm-up initialization strategy that optimizes a coarse 3DGS from the frames generated by E2V models and then incorporates events to refine the details. Then, we propose a progressive event supervision strategy that employs the window-slicing operation to progressively reduce the number of events used for supervision. This subtly relives the temporal randomness of the event frames, benefiting the optimization of local textural and global structural details. Experiments on the benchmark datasets demonstrate that Elite-EvGS can reconstruct 3D scenes with better textural and structural details. Meanwhile, our method yields plausible performance on the captured real-world data, including diverse challenging conditions, such as fast motion and low light scenes.\n\n事件相机是一种受生物启发的传感器，它输出异步和稀疏的事件流，而不是固定帧。得益于其高动态范围和高时间分辨率等显著优势，事件相机已被应用于解决3D重建问题，这对于机器人地图构建至关重要。最近，神经渲染技术如3D Gaussian Splatting（3DGS）在3D重建中取得了成功。然而，如何开发有效的基于事件的3DGS流程仍然未被充分探索。特别是，由于3DGS通常依赖于高质量的初始化和密集的多视图约束，事件数据的固有稀疏性可能会对3DGS优化带来潜在问题。为此，我们提出了一种新颖的基于事件的3DGS框架，命名为Elite-EvGS。我们的核心思想是利用现成的事件转视频（E2V）模型中的先验知识，以粗到细的优化方式从事件中有效地重建3D场景。具体来说，为了解决基于事件的3DGS初始化复杂性，我们引入了一种新的预热初始化策略，先通过E2V模型生成的帧来优化粗略的3DGS，然后结合事件进一步细化细节。接着，我们提出了一种渐进的事件监督策略，采用窗口切片操作逐步减少用于监督的事件数量，这巧妙地缓解了事件帧的时间随机性，有利于优化局部纹理和全局结构细节。基准数据集上的实验表明，Elite-EvGS能够以更好的纹理和结构细节重建3D场景。同时，我们的方法在真实世界数据中表现出色，特别是在多种具有挑战性的条件下，如快速运动和低光场景。\n"
  },
  {
    "path": "abs/2409.14067.md",
    "content": "### SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality\n\n Visual localization plays an important role in the applications of Augmented Reality (AR), which enable AR devices to obtain their 6-DoF pose in the pre-build map in order to render virtual content in real scenes. However, most existing approaches can not perform novel view rendering and require large storage capacities for maps. To overcome these limitations, we propose an efficient visual localization method capable of high-quality rendering with fewer parameters. Specifically, our approach leverages 3D Gaussian primitives as the scene representation. To ensure precise 2D-3D correspondences for pose estimation, we develop an unbiased 3D scene-specific descriptor decoder for Gaussian primitives, distilled from a constructed feature volume. Additionally, we introduce a salient 3D landmark selection algorithm that selects a suitable primitive subset based on the saliency score for localization. We further regularize key Gaussian primitives to prevent anisotropic effects, which also improves localization performance. Extensive experiments on two widely used datasets demonstrate that our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches.\n\n 视觉定位在增强现实（AR）应用中起着重要作用，它使得AR设备能够在预构建的地图中获取其6自由度（6-DoF）的位姿，以便在真实场景中渲染虚拟内容。然而，大多数现有的方法无法进行新视角渲染，并且需要大量存储空间用于存储地图。为克服这些限制，我们提出了一种高效的视觉定位方法，该方法能够以更少的参数实现高质量渲染。具体而言，我们的方法利用3D高斯基元作为场景表示。为了确保用于位姿估计的精确2D-3D对应关系，我们开发了一种无偏的3D场景特定描述符解码器，用于从构建的特征体积中提取高斯基元。此外，我们引入了一种显著的3D地标选择算法，通过显著性评分选择适合的基元子集用于定位。我们还对关键高斯基元进行了正则化处理，以防止各向异性效应，同时提高定位性能。基于两个广泛使用的数据集的实验结果表明，我们的方法在渲染和定位性能上优于或可与最先进的基于隐式表示的视觉定位方法相媲美。\n"
  },
  {
    "path": "abs/2409.14316.md",
    "content": "### MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views\n\nRecently, the Neural Radiance Field (NeRF) advancement has facilitated few-shot Novel View Synthesis (NVS), which is a significant challenge in 3D vision applications. Despite numerous attempts to reduce the dense input requirement in NeRF, it still suffers from time-consumed training and rendering processes. More recently, 3D Gaussian Splatting (3DGS) achieves real-time high-quality rendering with an explicit point-based representation. However, similar to NeRF, it tends to overfit the train views for lack of constraints. In this paper, we propose MVPGS, a few-shot NVS method that excavates the multi-view priors based on 3D Gaussian Splatting. We leverage the recent learning-based Multi-view Stereo (MVS) to enhance the quality of geometric initialization for 3DGS. To mitigate overfitting, we propose a forward-warping method for additional appearance constraints conforming to scenes based on the computed geometry. Furthermore, we introduce a view-consistent geometry constraint for Gaussian parameters to facilitate proper optimization convergence and utilize a monocular depth regularization as compensation. Experiments show that the proposed method achieves state-of-the-art performance with real-time rendering speed.\n\n最近，神经辐射场（NeRF）的进步推动了少样本新视角合成（NVS）的发展，这是3D视觉应用中的一个重要挑战。尽管有许多尝试减少NeRF对密集输入的需求，它仍然面临耗时的训练和渲染过程。近期，3D Gaussian Splatting（3DGS）通过显式的点基表示实现了实时高质量渲染。然而，和NeRF类似，它也容易由于缺乏约束而过拟合训练视图。在本文中，我们提出了MVPGS，一种基于3D Gaussian Splatting的少样本NVS方法，挖掘多视角先验。我们利用最近的基于学习的多视角立体（MVS）方法，提升3DGS几何初始化的质量。为减轻过拟合，我们提出了一种前向变形方法，基于计算出的几何为场景提供额外的外观约束。此外，我们引入了一个视角一致的几何约束，以促进高斯参数的优化收敛，并使用单目深度正则化作为补充。实验表明，所提出的方法在实时渲染速度下实现了最先进的性能。\n"
  },
  {
    "path": "abs/2409.14778.md",
    "content": "### Human Hair Reconstruction with Strand-Aligned 3D Gaussians\n\nWe introduce a new hair modeling method that uses a dual representation of classical hair strands and 3D Gaussians to produce accurate and realistic strand-based reconstructions from multi-view data. In contrast to recent approaches that leverage unstructured Gaussians to model human avatars, our method reconstructs the hair using 3D polylines, or strands. This fundamental difference allows the use of the resulting hairstyles out-of-the-box in modern computer graphics engines for editing, rendering, and simulation. Our 3D lifting method relies on unstructured Gaussians to generate multi-view ground truth data to supervise the fitting of hair strands. The hairstyle itself is represented in the form of the so-called strand-aligned 3D Gaussians. This representation allows us to combine strand-based hair priors, which are essential for realistic modeling of the inner structure of hairstyles, with the differentiable rendering capabilities of 3D Gaussian Splatting. Our method, named Gaussian Haircut, is evaluated on synthetic and real scenes and demonstrates state-of-the-art performance in the task of strand-based hair reconstruction.\n\n我们介绍了一种新的头发建模方法，该方法采用经典的发束和3D高斯的双重表示，从多视角数据中生成精确且真实的基于发束的重建。与最近使用非结构化高斯模型重建人类头像的方式不同，我们的方法通过3D折线（即发束）来重建头发。这一根本性差异使得生成的发型能够直接在现代计算机图形引擎中用于编辑、渲染和模拟。我们的3D提升方法依赖于非结构化高斯生成多视角的真实数据，以监督发束的拟合。发型本身以所谓的“发束对齐的3D高斯”形式表示。这种表示使我们能够结合基于发束的头发先验——这是对发型内在结构进行真实建模所必需的，同时还能利用3D Gaussian Splatting的可微分渲染能力。我们的方法命名为Gaussian Haircut，在合成和真实场景中进行了评估，并在基于发束的头发重建任务中展示了最先进的性能。\n\n"
  },
  {
    "path": "abs/2409.15689.md",
    "content": "### Plenoptic PNG: Real-Time Neural Radiance Fields in 150 KB\n\nThe goal of this paper is to encode a 3D scene into an extremely compact representation from 2D images and to enable its transmittance, decoding and rendering in real-time across various platforms. Despite the progress in NeRFs and Gaussian Splats, their large model size and specialized renderers make it challenging to distribute free-viewpoint 3D content as easily as images. To address this, we have designed a novel 3D representation that encodes the plenoptic function into sinusoidal function indexed dense volumes. This approach facilitates feature sharing across different locations, improving compactness over traditional spatial voxels. The memory footprint of the dense 3D feature grid can be further reduced using spatial decomposition techniques. This design combines the strengths of spatial hashing functions and voxel decomposition, resulting in a model size as small as 150 KB for each 3D scene. Moreover, PPNG features a lightweight rendering pipeline with only 300 lines of code that decodes its representation into standard GL textures and fragment shaders. This enables real-time rendering using the traditional GL pipeline, ensuring universal compatibility and efficiency across various platforms without additional dependencies.\n\n本文的目标是从二维图像中对三维场景进行极为紧凑的表示编码，并使其能够在多个平台上实时传输、解码和渲染。尽管NeRF和高斯分布在这一领域取得了进展，但其较大的模型尺寸和专门的渲染器使得像图像一样轻松分发自由视角3D内容变得具有挑战性。为了解决这一问题，我们设计了一种新颖的三维表示方法，将全光函数编码为以正弦函数索引的密集体素。该方法有助于在不同位置共享特征，相较于传统的空间体素提高了紧凑性。通过使用空间分解技术，可以进一步减少密集三维特征网格的内存占用。此设计结合了空间哈希函数和体素分解的优势，使每个三维场景的模型大小可缩小至150 KB。此外，PPNG 具有一个轻量级的渲染管道，仅需300行代码即可将其表示解码为标准GL纹理和片段着色器。这使得使用传统GL管道进行实时渲染成为可能，确保了各种平台上的通用兼容性和高效性，无需额外的依赖。\n"
  },
  {
    "path": "abs/2409.15959.md",
    "content": "### Semantics-Controlled Gaussian Splatting for Outdoor Scene Reconstruction and Rendering in Virtual Reality\n\nAdvancements in 3D rendering like Gaussian Splatting (GS) allow novel view synthesis and real-time rendering in virtual reality (VR). However, GS-created 3D environments are often difficult to edit. For scene enhancement or to incorporate 3D assets, segmenting Gaussians by class is essential. Existing segmentation approaches are typically limited to certain types of scenes, e.g., ''circular'' scenes, to determine clear object boundaries. However, this method is ineffective when removing large objects in non-''circling'' scenes such as large outdoor scenes. We propose Semantics-Controlled GS (SCGS), a segmentation-driven GS approach, enabling the separation of large scene parts in uncontrolled, natural environments. SCGS allows scene editing and the extraction of scene parts for VR. Additionally, we introduce a challenging outdoor dataset, overcoming the ''circling'' setup. We outperform the state-of-the-art in visual quality on our dataset and in segmentation quality on the 3D-OVS dataset. We conducted an exploratory user study, comparing a 360-video, plain GS, and SCGS in VR with a fixed viewpoint. In our subsequent main study, users were allowed to move freely, evaluating plain GS and SCGS. Our main study results show that participants clearly prefer SCGS over plain GS. We overall present an innovative approach that surpasses the state-of-the-art both technically and in user experience.\n\n3D 渲染技术的进步，如高斯分布（Gaussian Splatting，GS），使得虚拟现实（VR）中的新视角合成和实时渲染成为可能。然而，GS 创建的三维环境通常难以编辑。为了增强场景或融入三维资产，对高斯点进行按类别分割是至关重要的。现有的分割方法通常仅限于某些类型的场景，例如“环绕”场景，用于确定明确的物体边界。然而，当处理诸如大型户外场景等非“环绕”场景时，这种方法在移除大型物体时效果不佳。我们提出了语义控制的高斯分布（Semantics-Controlled GS，SCGS），这是一种基于分割驱动的 GS 方法，能够在不受控的自然环境中分离出大型场景部分。SCGS 允许对场景进行编辑，并可提取场景部分用于 VR。此外，我们引入了一个具有挑战性的户外数据集，克服了“环绕”设置的局限性。在我们的数据集上，我们在视觉质量方面超越了现有技术，并在 3D-OVS 数据集上的分割质量上取得了优异表现。我们还进行了一项探索性用户研究，比较了 360 视频、纯 GS 和 SCGS 在 VR 中固定视角下的表现。在随后的主要研究中，用户可以自由移动，评估纯 GS 和 SCGS。我们的主要研究结果表明，参与者明显更偏爱 SCGS 而非纯 GS。总体而言，我们提出了一种创新方法，在技术和用户体验上都超越了现有的技术水平。\n"
  },
  {
    "path": "abs/2409.16147.md",
    "content": "### Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities\n\nRecent advancements in 3D Gaussian Splatting (3DGS) have unlocked significant potential for modeling 3D head avatars, providing greater flexibility than mesh-based methods and more efficient rendering compared to NeRF-based approaches. Despite these advancements, the creation of controllable 3DGS-based head avatars remains time-intensive, often requiring tens of minutes to hours. To expedite this process, we here introduce the \"Gaussian Déjà-vu\" framework, which first obtains a generalized model of the head avatar and then personalizes the result. The generalized model is trained on large 2D (synthetic and real) image datasets. This model provides a well-initialized 3D Gaussian head that is further refined using a monocular video to achieve the personalized head avatar. For personalizing, we propose learnable expression-aware rectification blendmaps to correct the initial 3D Gaussians, ensuring rapid convergence without the reliance on neural networks. Experiments demonstrate that the proposed method meets its objectives. It outperforms state-of-the-art 3D Gaussian head avatars in terms of photorealistic quality as well as reduces training time consumption to at least a quarter of the existing methods, producing the avatar in minutes.\n\n3D 高斯分布（3D Gaussian Splatting, 3DGS）的最新进展在建模3D头部头像方面展现了巨大的潜力，提供了比基于网格的方法更大的灵活性，并且相比基于NeRF的方法具有更高的渲染效率。尽管取得了这些进展，基于3DGS的可控头部头像的创建仍然耗时，通常需要几十分钟甚至数小时。为了加速这一过程，我们提出了“Gaussian Déjà-vu”框架，该框架首先获得头部头像的通用模型，然后进行个性化调整。通用模型在大规模二维图像（包括合成和真实图像）数据集上进行训练，该模型提供了一个良好初始化的三维高斯头部，随后通过单目视频进一步优化，以实现个性化头像。在个性化过程中，我们提出了可学习的表情感知校正混合图（expression-aware rectification blendmaps）来修正初始的三维高斯，确保快速收敛，且无需依赖神经网络。实验结果表明，所提出的方法达到了其目标，在写实质量上优于当前最先进的3D高斯头部头像方法，并且将训练时间至少减少到现有方法的四分之一，能够在几分钟内生成头像。\n"
  },
  {
    "path": "abs/2409.16296.md",
    "content": "### LiDAR-3DGS: LiDAR Reinforced 3D Gaussian Splatting for Multimodal Radiance Field Rendering\n\nIn this paper, we explore the capabilities of multimodal inputs to 3D Gaussian Splatting (3DGS) based Radiance Field Rendering. We present LiDAR-3DGS, a novel method of reinforcing 3DGS inputs with LiDAR generated point clouds to significantly improve the accuracy and detail of 3D models. We demonstrate a systematic approach of LiDAR reinforcement to 3DGS to enable capturing of important features such as bolts, apertures, and other details that are often missed by image-based features alone. These details are crucial for engineering applications such as remote monitoring and maintenance. Without modifying the underlying 3DGS algorithm, we demonstrate that even a modest addition of LiDAR generated point cloud significantly enhances the perceptual quality of the models. At 30k iterations, the model generated by our method resulted in an increase of 7.064% in PSNR and 0.565% in SSIM, respectively. Since the LiDAR used in this research was a commonly used commercial-grade device, the improvements observed were modest and can be further enhanced with higher-grade LiDAR systems. Additionally, these improvements can be supplementary to other derivative works of Radiance Field Rendering and also provide a new insight for future LiDAR and computer vision integrated modeling.\n\n在本文中，我们探讨了多模态输入对基于3D高斯分布（3D Gaussian Splatting, 3DGS）的辐射场渲染的能力。我们提出了一种新的方法——**LiDAR-3DGS**，通过将LiDAR生成的点云与3DGS输入相结合，显著提升了三维模型的准确性和细节表现。我们展示了一种系统性的LiDAR增强3DGS的方法，使其能够捕捉重要特征，例如螺栓、开口以及其他图像特征往往遗漏的细节。这些细节对于远程监控和维护等工程应用至关重要。我们在不修改3DGS底层算法的前提下，证明了即使适量加入LiDAR生成的点云，也能显著提升模型的感知质量。在30k次迭代后，使用我们方法生成的模型分别在峰值信噪比（PSNR）和结构相似性（SSIM）上提升了7.064%和0.565%。由于本研究中使用的LiDAR设备是常用的商用级设备，因此提升效果相对温和，若使用更高端的LiDAR系统，这些改进可进一步增强。此外，这些改进可以作为辐射场渲染其他衍生工作的补充，并为未来LiDAR与计算机视觉集成建模提供了新的见解。\n"
  },
  {
    "path": "abs/2409.16470.md",
    "content": "### Frequency-based View Selection in Gaussian Splatting Reconstruction\n\nThree-dimensional reconstruction is a fundamental problem in robotics perception. We examine the problem of active view selection to perform 3D Gaussian Splatting reconstructions with as few input images as possible. Although 3D Gaussian Splatting has made significant progress in image rendering and 3D reconstruction, the quality of the reconstruction is strongly impacted by the selection of 2D images and the estimation of camera poses through Structure-from-Motion (SfM) algorithms. Current methods to select views that rely on uncertainties from occlusions, depth ambiguities, or neural network predictions directly are insufficient to handle the issue and struggle to generalize to new scenes. By ranking the potential views in the frequency domain, we are able to effectively estimate the potential information gain of new viewpoints without ground truth data. By overcoming current constraints on model architecture and efficacy, our method achieves state-of-the-art results in view selection, demonstrating its potential for efficient image-based 3D reconstruction.\n\n三维重建是机器人感知中的一个基础问题。我们研究了主动视角选择问题，旨在使用尽可能少的输入图像进行3D高斯分布（3D Gaussian Splatting）重建。尽管3D高斯分布在图像渲染和三维重建方面取得了显著进展，但重建质量在很大程度上受到2D图像选择以及通过结构光（Structure-from-Motion, SfM）算法估计相机姿态的影响。现有依赖遮挡、不确定性、深度模糊或神经网络预测的视角选择方法难以解决这一问题，且在泛化到新场景时表现不佳。通过在频域中对潜在视角进行排序，我们能够在没有真实数据的情况下有效估计新视角的潜在信息增益。通过克服当前模型架构和效果的限制，我们的方法在视角选择上实现了最新的成果，展现了其在高效基于图像的三维重建中的潜力。\n"
  },
  {
    "path": "abs/2409.16502.md",
    "content": "### GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization\n\nAlthough various visual localization approaches exist, such as scene coordinate and pose regression, these methods often struggle with high memory consumption or extensive optimization requirements. To address these challenges, we utilize recent advancements in novel view synthesis, particularly 3D Gaussian Splatting (3DGS), to enhance localization. 3DGS allows for the compact encoding of both 3D geometry and scene appearance with its spatial features. Our method leverages the dense description maps produced by XFeat's lightweight keypoint detection and description model. We propose distilling these dense keypoint descriptors into 3DGS to improve the model's spatial understanding, leading to more accurate camera pose predictions through 2D-3D correspondences. After estimating an initial pose, we refine it using a photometric warping loss. Benchmarking on popular indoor and outdoor datasets shows that our approach surpasses state-of-the-art Neural Render Pose (NRP) methods, including NeRFMatch and PNeRFLoc.s\n\n尽管已有多种视觉定位方法，如场景坐标和姿态回归，这些方法通常面临高内存消耗或大量优化需求的问题。为了解决这些挑战，我们利用了新视角合成领域的最新进展，特别是3D高斯分布（3D Gaussian Splatting, 3DGS），来增强定位能力。3DGS 通过其空间特征实现了对三维几何和场景外观的紧凑编码。我们的方法利用了XFeat轻量级关键点检测和描述模型生成的密集描述图，并将这些密集关键点描述符蒸馏到3DGS中，从而提升模型的空间理解能力，通过2D-3D对应关系实现更准确的相机姿态预测。在估计初始姿态后，我们使用光度变形损失进行精细优化。在热门的室内和室外数据集上进行基准测试显示，我们的方法在性能上超越了最新的神经渲染姿态（Neural Render Pose, NRP）方法，包括NeRFMatch和PNeRFLoc。\n"
  },
  {
    "path": "abs/2409.16504.md",
    "content": "### Low Latency Point Cloud Rendering with Learned Splatting\n\nPoint cloud is a critical 3D representation with many emerging applications. Because of the point sparsity and irregularity, high-quality rendering of point clouds is challenging and often requires complex computations to recover the continuous surface representation. On the other hand, to avoid visual discomfort, the motion-to-photon latency has to be very short, under 10 ms. Existing rendering solutions lack in either quality or speed. To tackle these challenges, we present a framework that unlocks interactive, free-viewing and high-fidelity point cloud rendering. We train a generic neural network to estimate 3D elliptical Gaussians from arbitrary point clouds and use differentiable surface splatting to render smooth texture and surface normal for arbitrary views. Our approach does not require per-scene optimization, and enable real-time rendering of dynamic point cloud. Experimental results demonstrate the proposed solution enjoys superior visual quality and speed, as well as generalizability to different scene content and robustness to compression artifacts.\n\n点云是一种关键的三维表示形式，具有众多新兴应用。然而，由于点的稀疏性和不规则性，高质量的点云渲染具有挑战性，通常需要复杂的计算来恢复连续的表面表示。另一方面，为了避免视觉不适，运动到图像呈现的延迟必须非常短，低于10毫秒。现有的渲染解决方案要么缺乏质量，要么缺乏速度。为应对这些挑战，我们提出了一个框架，实现了交互式、自由视角的高保真点云渲染。我们训练了一个通用的神经网络，能够从任意点云中估计三维椭圆高斯，并使用可微分的表面溅射技术渲染平滑的纹理和表面法线，以适应任意视角。我们的方法无需每个场景单独优化，能够实时渲染动态点云。实验结果表明，所提出的解决方案在视觉质量和速度上具有显著优势，同时在处理不同场景内容和应对压缩伪影方面展现了良好的泛化能力和鲁棒性。\n"
  },
  {
    "path": "abs/2409.16915.md",
    "content": "### Let's Make a Splan: Risk-Aware Trajectory Optimization in a Normalized Gaussian Splat\n\nNeural Radiance Fields and Gaussian Splatting have transformed the field of computer vision by enabling photo-realistic representation of complex scenes. Despite this success, they have seen only limited use in real-world robotics tasks such as trajectory optimization. Two key factors have contributed to this limited success. First, it is challenging to reason about collisions in radiance models. Second, it is difficult to perform inference of radiance models fast enough for real-time trajectory synthesis. This paper addresses these challenges by proposing SPLANNING, a risk-aware trajectory optimizer that operates in a Gaussian Splatting model. This paper first derives a method for rigorously upper-bounding the probability of collision between a robot and a radiance field. Second, this paper introduces a normalized reformulation of Gaussian Splatting that enables the efficient computation of the collision bound in a Gaussian Splat. Third, a method is presented to optimize trajectories while avoiding collisions with a scene represented by a Gaussian Splat. Experiments demonstrate that SPLANNING outperforms state-of-the-art methods in generating collision-free trajectories in highly cluttered environments. The proposed system is also tested on a real-world robot manipulator.\n\n神经辐射场（Neural Radiance Fields）和高斯分布（Gaussian Splatting）通过实现复杂场景的逼真表示，已经改变了计算机视觉领域的面貌。然而，尽管取得了显著成功，它们在诸如轨迹优化等实际机器人任务中的应用仍然有限。这种局限性主要归因于两个关键因素。首先，在辐射场模型中推理碰撞具有挑战性。其次，难以在辐射模型中以足够快的速度进行推理，以实现实时轨迹合成。本文通过提出**SPLANNING**（一种在高斯分布模型下运行的风险感知轨迹优化器）来应对这些挑战。本文首先推导了一种严格上界机器人与辐射场碰撞概率的方法。其次，提出了一种标准化的高斯分布重新表述，使得在高斯分布中高效计算碰撞界成为可能。第三，本文提出了一种优化轨迹的方法，能够在避免与由高斯分布表示的场景发生碰撞的同时进行优化。实验表明，SPLANNING 在生成无碰撞轨迹的高密度环境中表现优于现有的最先进方法。该系统还在实际的机器人机械臂上进行了测试。\n\n"
  },
  {
    "path": "abs/2409.16938.md",
    "content": "### Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model\n\nGenerating and inserting new objects into 3D content is a compelling approach for achieving versatile scene recreation. Existing methods, which rely on SDS optimization or single-view inpainting, often struggle to produce high-quality results. To address this, we propose a novel method for object insertion in 3D content represented by Gaussian Splatting. Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained stable video diffusion model to facilitate view-consistent object inpainting. Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled and more predictable multi-view generation. After generating the multi-view inpainted results, we further propose a mask-aware 3D reconstruction technique to refine Gaussian Splatting reconstruction from these sparse inpainted views. By leveraging these fabricate techniques, our approach yields diverse results, ensures view-consistent and harmonious insertions, and produces better object quality. Extensive experiments demonstrate that our approach outperforms existing methods.\n\n在三维内容中生成并插入新物体是实现多样化场景重建的一个引人注目的方法。现有依赖SDS优化或单视图修复的方法，往往难以生成高质量的结果。为了解决这一问题，我们提出了一种基于高斯分布（Gaussian Splatting）的新颖物体插入方法。该方法引入了一个多视角扩散模型，称为MVInpainter，基于预训练的稳定视频扩散模型构建，以实现视图一致的物体修复。在MVInpainter中，我们集成了一个基于ControlNet的条件注入模块，从而实现可控且更可预测的多视角生成。在生成多视角修复结果后，我们进一步提出了一种基于掩码感知的三维重建技术，以从这些稀疏修复视图中优化高斯分布重建。通过利用这些精细化的技术，我们的方法能够生成多样化的结果，确保视图一致且和谐的插入，并提升物体的质量。大量实验表明，我们的方法优于现有方法。\n"
  },
  {
    "path": "abs/2409.16944.md",
    "content": "### Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM\n\nWe introduce Go-SLAM, a novel framework that utilizes 3D Gaussian Splatting SLAM to reconstruct dynamic environments while embedding object-level information within the scene representations. This framework employs advanced object segmentation techniques, assigning a unique identifier to each Gaussian splat that corresponds to the object it represents. Consequently, our system facilitates open-vocabulary querying, allowing users to locate objects using natural language descriptions. Furthermore, the framework features an optimal path generation module that calculates efficient navigation paths for robots toward queried objects, considering obstacles and environmental uncertainties. Comprehensive evaluations in various scene settings demonstrate the effectiveness of our approach in delivering high-fidelity scene reconstructions, precise object segmentation, flexible object querying, and efficient robot path planning. This work represents an additional step forward in bridging the gap between 3D scene reconstruction, semantic object understanding, and real-time environment interactions.\n\n我们提出了Go-SLAM，一个利用3D高斯分布SLAM的新框架，用于在重建动态环境的同时将物体级别的信息嵌入场景表示中。该框架采用先进的物体分割技术，为每个与物体对应的高斯点赋予唯一标识符。由此，我们的系统支持开放词汇查询，允许用户通过自然语言描述定位物体。此外，该框架包含一个最优路径生成模块，能够为机器人计算前往查询物体的高效导航路径，同时考虑障碍物和环境不确定性。在多种场景设置中的全面评估表明，我们的方法在高保真场景重建、精确物体分割、灵活物体查询以及高效机器人路径规划方面表现出色。此项工作进一步推动了3D场景重建、语义物体理解与实时环境交互之间的融合。\n"
  },
  {
    "path": "abs/2409.17280.md",
    "content": "### Disco4D: Disentangled 4D Human Generation and Animation from a Single Image\n\nWe present Disco4D, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Different from existing methods, Disco4D distinctively disentangles clothings (with Gaussian models) from the human body (with SMPL-X model), significantly enhancing the generation details and flexibility. It has the following technical innovations. 1) Disco4D learns to efficiently fit the clothing Gaussians over the SMPL-X Gaussians. 2) It adopts diffusion models to enhance the 3D generation process, \\textit{e.g.}, modeling occluded parts not visible in the input image. 3) It learns an identity encoding for each clothing Gaussian to facilitate the separation and extraction of clothing assets. Furthermore, Disco4D naturally supports 4D human animation with vivid dynamics. Extensive experiments demonstrate the superiority of Disco4D on 4D human generation and animation tasks. Our visualizations can be found in \\url{this https URL}.\n\n我们提出了Disco4D，一个用于从单张图像生成和动画化4D人类的全新高斯分布框架。与现有方法不同，Disco4D 通过将服装（使用高斯模型）与人体（使用SMPL-X模型）明确解耦，大大提升了生成细节和灵活性。它具有以下技术创新：1) Disco4D 学会高效地将服装高斯拟合到 SMPL-X 高斯上。 2) 它采用扩散模型增强了三维生成过程，例如对输入图像中不可见的遮挡部分进行建模。 3) 它为每个服装高斯学习一个身份编码，以便于服装资产的分离和提取。此外，Disco4D 自然支持带有生动动态的 4D 人体动画。大量实验表明，Disco4D 在 4D 人类生成和动画任务上表现出卓越的优势。\n"
  },
  {
    "path": "abs/2409.17345.md",
    "content": "### SeaSplat: Representing Underwater Scenes with 3D Gaussian Splatting and a Physically Grounded Image Formation Model\n\nWe introduce SeaSplat, a method to enable real-time rendering of underwater scenes leveraging recent advances in 3D radiance fields. Underwater scenes are challenging visual environments, as rendering through a medium such as water introduces both range and color dependent effects on image capture. We constrain 3D Gaussian Splatting (3DGS), a recent advance in radiance fields enabling rapid training and real-time rendering of full 3D scenes, with a physically grounded underwater image formation model. Applying SeaSplat to the real-world scenes from SeaThru-NeRF dataset, a scene collected by an underwater vehicle in the US Virgin Islands, and simulation-degraded real-world scenes, not only do we see increased quantitative performance on rendering novel viewpoints from the scene with the medium present, but are also able to recover the underlying true color of the scene and restore renders to be without the presence of the intervening medium. We show that the underwater image formation helps learn scene structure, with better depth maps, as well as show that our improvements maintain the significant computational improvements afforded by leveraging a 3D Gaussian representation.\n\n我们提出了SeaSplat，一种利用最新的三维辐射场进展实现水下场景实时渲染的方法。水下场景是具有挑战性的视觉环境，因为通过如水这样的介质进行渲染时，图像捕获会受到距离和颜色的依赖性影响。我们对3D高斯分布（3DGS）进行约束，3DGS是辐射场中的一项新进展，能够实现完整三维场景的快速训练和实时渲染，并结合了一个基于物理的水下图像生成模型。将SeaSplat应用于SeaThru-NeRF数据集中的真实世界场景（该数据集由美国维尔京群岛的一辆水下车辆采集），以及通过模拟降质处理的真实世界场景，我们不仅在含有介质的情况下实现了更高的定量渲染性能，还能够恢复场景的真实颜色，并将渲染结果还原为没有介质影响的状态。我们展示了水下图像生成模型有助于学习场景结构，生成更好的深度图，同时我们的改进也保持了通过3D高斯表示带来的显著计算优势。\n"
  },
  {
    "path": "abs/2409.17624.md",
    "content": "### HGS-Planner: Hierarchical Planning Framework for Active Scene Reconstruction Using 3D Gaussian Splatting\n\nIn complex missions such as search and rescue,robots must make intelligent decisions in unknown environments, relying on their ability to perceive and understand their surroundings. High-quality and real-time reconstruction enhances situational awareness and is crucial for intelligent robotics. Traditional methods often struggle with poor scene representation or are too slow for real-time use. Inspired by the efficacy of 3D Gaussian Splatting (3DGS), we propose a hierarchical planning framework for fast and high-fidelity active reconstruction. Our method evaluates completion and quality gain to adaptively guide reconstruction, integrating global and local planning for efficiency. Experiments in simulated and real-world environments show our approach outperforms existing real-time methods.\n\n在复杂任务如搜救中，机器人必须在未知环境中做出智能决策，这依赖于其感知和理解周围环境的能力。高质量和实时的重建提升了情境感知能力，对智能机器人至关重要。传统方法通常难以提供良好的场景表示，或者速度过慢，无法用于实时操作。受3D高斯分布（3D Gaussian Splatting, 3DGS）高效性的启发，我们提出了一种用于快速且高保真主动重建的分层规划框架。我们的方法通过评估完成度和质量增益，自适应地引导重建过程，结合全局和局部规划以提高效率。在模拟和真实环境中的实验表明，我们的方法在性能上优于现有的实时方法。\n"
  },
  {
    "path": "abs/2409.17917.md",
    "content": "### WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians\n\nWhile style transfer techniques have been well-developed for 2D image stylization, the extension of these methods to 3D scenes remains relatively unexplored. Existing approaches demonstrate proficiency in transferring colors and textures but often struggle with replicating the geometry of the scenes. In our work, we leverage an explicit Gaussian Splatting (GS) representation and directly match the distributions of Gaussians between style and content scenes using the Earth Mover's Distance (EMD). By employing the entropy-regularized Wasserstein-2 distance, we ensure that the transformation maintains spatial smoothness. Additionally, we decompose the scene stylization problem into smaller chunks to enhance efficiency. This paradigm shift reframes stylization from a pure generative process driven by latent space losses to an explicit matching of distributions between two Gaussian representations. Our method achieves high-resolution 3D stylization by faithfully transferring details from 3D style scenes onto the content scene. Furthermore, WaSt-3D consistently delivers results across diverse content and style scenes without necessitating any training, as it relies solely on optimization-based techniques. See our project page for additional results and source code: $\\href{this https URL}{this https URL}$.\n\n尽管风格迁移技术在二维图像风格化方面已得到了充分发展，但这些方法在三维场景中的扩展仍相对较少被探索。现有方法在色彩和纹理迁移方面表现出色，但通常难以复制场景的几何结构。在我们的工作中，我们利用显式的高斯分布（Gaussian Splatting, GS）表示，通过使用地球移动者距离（Earth Mover's Distance, EMD）直接匹配风格和内容场景之间的高斯分布。通过采用熵正则化的Wasserstein-2距离，我们确保了变换过程中的空间平滑性。此外，我们将场景风格化问题分解为更小的部分，以提高效率。这一范式的转变将风格化从依赖潜在空间损失的生成过程重新定义为两个高斯表示之间的显式分布匹配。我们的方法通过将三维风格场景中的细节真实地迁移到内容场景中，实现了高分辨率的三维风格化。此外，WaSt-3D 在多种内容和风格场景中始终如一地交付结果，无需任何训练，因为它完全依赖于基于优化的技术。\n"
  },
  {
    "path": "abs/2409.18108.md",
    "content": "### Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot\n\nBuilding semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy.\n\n构建语义三维地图在办公室、仓库、商店和家庭中寻找目标物体时非常有价值。我们提出了一种映射系统，能够逐步构建**语言嵌入高斯分布（Language-Embedded Gaussian Splat, LEGS）**：一种详细的三维场景表示，将外观和语义编码为统一表示。LEGS 在机器人行进过程中在线训练，使其能够定位开放词汇的物体查询。我们在四个房间规模的场景中评估了 LEGS，通过对场景中的物体进行查询，评估 LEGS 捕捉语义含义的能力。我们将 LEGS 与 LERF 进行了比较，发现虽然两个系统在物体查询成功率上相当，但 LEGS 的训练速度比 LERF 快3.5倍以上。结果表明，多摄像头设置和增量捆绑调整可以在受限的机器人轨迹中提高视觉重建质量，且 LEGS 能够在开放词汇和长尾物体查询中达到高达66%的准确率。\n"
  },
  {
    "path": "abs/2409.18122.md",
    "content": "### RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration\n\nWe propose a framework for active mapping and exploration that leverages Gaussian splatting for constructing information-rich maps. Further, we develop a parallelized motion planning algorithm that can exploit the Gaussian map for real-time navigation. The Gaussian map constructed onboard the robot is optimized for both photometric and geometric quality while enabling real-time situational awareness for autonomy. We show through simulation experiments that our method is competitive with approaches that use alternate information gain metrics, while being orders of magnitude faster to compute. In real-world experiments, our algorithm achieves better map quality (10% higher Peak Signal-to-Noise Ratio (PSNR) and 30% higher geometric reconstruction accuracy) than Gaussian maps constructed by traditional exploration baselines. Experiment videos and more details can be found on our project page: this https URL\n\n我们提出了一个主动映射与探索框架，利用高斯分布（Gaussian Splatting）构建信息丰富的地图。此外，我们开发了一个并行化的运动规划算法，能够利用该高斯地图实现实时导航。该高斯地图在机器人上进行构建，并针对光度和几何质量进行优化，同时为自主导航提供实时的情境感知。通过模拟实验，我们证明了该方法与使用其他信息增益度量的方法相比具有竞争力，并且计算速度快了几个数量级。在真实环境实验中，我们的算法在地图质量上表现优异，比传统探索基线构建的高斯地图提高了10%的峰值信噪比（PSNR）和30%的几何重建精度。\n"
  },
  {
    "path": "abs/2409.18852.md",
    "content": "### Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes\n\nPrevious surface reconstruction methods either suffer from low geometric accuracy or lengthy training times when dealing with real-world complex dynamic scenes involving multi-person activities, and human-object interactions. To tackle the dynamic contents and the occlusions in complex scenes, we present a space-time 2D Gaussian Splatting approach. Specifically, to improve geometric quality in dynamic scenes, we learn canonical 2D Gaussian splats and deform these 2D Gaussian splats while enforcing the disks of the Gaussian located on the surface of the objects by introducing depth and normal regularizers. Further, to tackle the occlusion issues in complex scenes, we introduce a compositional opacity deformation strategy, which further reduces the surface recovery of those occluded areas. Experiments on real-world sparse-view video datasets and monocular dynamic datasets demonstrate that our reconstructions outperform state-of-the-art methods, especially for the surface of the details. The project page and more visualizations can be found at: this https URL.\n\n以往的表面重建方法在处理涉及多人活动和人-物交互的复杂动态场景时，要么几何精度较低，要么训练时间较长。为了解决复杂场景中的动态内容和遮挡问题，我们提出了一种时空二维高斯分布（2D Gaussian Splatting）方法。具体而言，为了提升动态场景中的几何质量，我们学习了标准的二维高斯点，并对这些二维高斯点进行变形，同时通过引入深度和法线正则化，使高斯的圆盘位于物体表面上。此外，为了应对复杂场景中的遮挡问题，我们提出了一种组合的不透明度变形策略，进一步减少了遮挡区域的表面恢复问题。在稀疏视角视频数据集和单目动态数据集上的实验表明，我们的重建结果在表面细节方面优于现有最先进的方法。\n"
  },
  {
    "path": "abs/2409.19039.md",
    "content": "### Gaussian Heritage: 3D Digitization of Cultural Heritage with Integrated Object Segmentation\n\nThe creation of digital replicas of physical objects has valuable applications for the preservation and dissemination of tangible cultural heritage. However, existing methods are often slow, expensive, and require expert knowledge. We propose a pipeline to generate a 3D replica of a scene using only RGB images (e.g. photos of a museum) and then extract a model for each item of interest (e.g. pieces in the exhibit). We do this by leveraging the advancements in novel view synthesis and Gaussian Splatting, modified to enable efficient 3D segmentation. This approach does not need manual annotation, and the visual inputs can be captured using a standard smartphone, making it both affordable and easy to deploy. We provide an overview of the method and baseline evaluation of the accuracy of object segmentation.\n\n创建物理对象的数字复制品在保护和传播有形文化遗产方面具有重要的应用。然而，现有的方法通常速度慢、成本高，并且需要专业知识。我们提出了一种管道，仅使用RGB图像（如博物馆的照片）生成场景的三维复制品，并随后提取每个感兴趣物体（如展览中的展品）的模型。我们通过利用新视角合成和高斯分布（Gaussian Splatting）的进展，修改这些技术以实现高效的三维分割。此方法无需手动注释，视觉输入可以通过普通智能手机捕捉，使其既经济又易于部署。我们提供了该方法的概述以及物体分割精度的基准评估。\n"
  },
  {
    "path": "abs/2409.19215.md",
    "content": "### 1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction\n\nThis report describes our 1st place solution to the 8th HANDS workshop challenge (ARCTIC track) in conjunction with ECCV 2024. In this challenge, we address the task of bimanual category-agnostic hand-object interaction reconstruction, which aims to generate 3D reconstructions of both hands and the object from a monocular video, without relying on predefined templates. This task is particularly challenging due to the significant occlusion and dynamic contact between the hands and the object during bimanual manipulation. We worked to resolve these issues by introducing a mask loss and a 3D contact loss, respectively. Moreover, we applied 3D Gaussian Splatting (3DGS) to this task. As a result, our method achieved a value of 38.69 in the main metric, CD, on the ARCTIC test set.\n\n本报告描述了我们在ECCV 2024第八届HANDS研讨会挑战赛（ARCTIC赛道）中获得第一名的解决方案。在此次挑战赛中，我们应对了双手类无关手物交互重建任务，旨在从单目视频生成双手和物体的三维重建，而无需依赖预定义模板。由于双手操作过程中双手与物体的显著遮挡和动态接触，这一任务尤其具有挑战性。为了解决这些问题，我们分别引入了掩码损失和三维接触损失。此外，我们将三维高斯分布（3D Gaussian Splatting, 3DGS）应用于该任务。最终，我们的方法在ARCTIC测试集上的主要指标CDh中取得了38.69的成绩。\n"
  },
  {
    "path": "abs/2409.19228.md",
    "content": "### GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting\n\nReliable self-localization is a foundational skill for many intelligent mobile platforms. This paper explores the use of event cameras for motion tracking thereby providing a solution with inherent robustness under difficult dynamics and illumination. In order to circumvent the challenge of event camera-based mapping, the solution is framed in a cross-modal way. It tracks a map representation that comes directly from frame-based cameras. Specifically, the proposed method operates on top of gaussian splatting, a state-of-the-art representation that permits highly efficient and realistic novel view synthesis. The key of our approach consists of a novel pose parametrization that uses a reference pose plus first order dynamics for local differential image rendering. The latter is then compared against images of integrated events in a staggered coarse-to-fine optimization scheme. As demonstrated by our results, the realistic view rendering ability of gaussian splatting leads to stable and accurate tracking across a variety of both publicly available and newly recorded data sequences.\n\n可靠的自我定位是众多智能移动平台的基础能力。本文探讨了使用事件相机进行运动跟踪，提供了一种在复杂动态和照明条件下具有固有鲁棒性的解决方案。为了解决基于事件相机的映射挑战，该解决方案采用跨模态的方式，跟踪直接由基于帧的相机生成的地图表示。具体来说，所提出的方法基于高斯分布（Gaussian Splatting），这是一种最先进的表示方法，能够实现高效且逼真的新视角合成。我们方法的关键在于一种新颖的姿态参数化方案，利用参考姿态加上一阶动态来进行局部差分图像渲染。然后，将渲染的结果与通过事件积分得到的图像进行比较，采用分级的粗到细优化策略。正如我们的实验结果所示，高斯分布的逼真视图渲染能力在多个公开数据集和新录制的数据序列中，实现了稳定且精确的跟踪。\n"
  },
  {
    "path": "abs/2409.19702.md",
    "content": "### RNG: Relightable Neural Gaussians\n\n3D Gaussian Splatting (3DGS) has shown its impressive power in novel view synthesis. However, creating relightable 3D assets, especially for objects with ill-defined shapes (e.g., fur), is still a challenging task. For these scenes, the decomposition between the light, geometry, and material is more ambiguous, as neither the surface constraints nor the analytical shading model hold. To address this issue, we propose RNG, a novel representation of relightable neural Gaussians, enabling the relighting of objects with both hard surfaces or fluffy boundaries. We avoid any assumptions in the shading model but maintain feature vectors, which can be further decoded by an MLP into colors, in each Gaussian point. Following prior work, we utilize a point light to reduce the ambiguity and introduce a shadow-aware condition to the network. We additionally propose a depth refinement network to help the shadow computation under the 3DGS framework, leading to better shadow effects under point lights. Furthermore, to avoid the blurriness brought by the alpha-blending in 3DGS, we design a hybrid forward-deferred optimization strategy. As a result, we achieve about 20× faster in training and about 600× faster in rendering than prior work based on neural radiance fields, with 60 frames per second on an RTX4090.\n\n3D 高斯分布（3D Gaussian Splatting, 3DGS）在新视角合成中展示了其强大的能力。然而，创建可重光照的三维资产，尤其是针对形状不明确的物体（如毛发），仍然是一项具有挑战性的任务。在这些场景中，光照、几何和材质之间的分解更加模糊，因为无论是表面约束还是解析光照模型都难以适用。为了解决这个问题，我们提出了RNG，一种新颖的可重光照神经高斯表示方法，能够对具有硬表面或柔软边界的物体进行重光照处理。我们避免了对光照模型的假设，但在每个高斯点上保留了可以通过MLP解码为颜色的特征向量。遵循以往的工作，我们使用点光源来减少模糊性，并引入了一个对阴影敏感的条件到网络中。我们还提出了一个深度优化网络，以帮助在3DGS框架下计算阴影，从而在点光源下实现更好的阴影效果。此外，为了避免3DGS中的alpha混合带来的模糊问题，我们设计了一种混合前向-延迟优化策略。最终，我们实现了比基于神经辐射场的先前工作快约20倍的训练速度和快约600倍的渲染速度，在RTX4090上可实现60帧每秒的渲染速度。\n"
  },
  {
    "path": "abs/2409.20111.md",
    "content": "### Robust Gaussian Splatting SLAM by Leveraging Loop Closure\n\n3D Gaussian Splatting algorithms excel in novel view rendering applications and have been adapted to extend the capabilities of traditional SLAM systems. However, current Gaussian Splatting SLAM methods, designed mainly for hand-held RGB or RGB-D sensors, struggle with tracking drifts when used with rotating RGB-D camera setups. In this paper, we propose a robust Gaussian Splatting SLAM architecture that utilizes inputs from rotating multiple RGB-D cameras to achieve accurate localization and photorealistic rendering performance. The carefully designed Gaussian Splatting Loop Closure module effectively addresses the issue of accumulated tracking and mapping errors found in conventional Gaussian Splatting SLAM systems. First, each Gaussian is associated with an anchor frame and categorized as historical or novel based on its timestamp. By rendering different types of Gaussians at the same viewpoint, the proposed loop detection strategy considers both co-visibility relationships and distinct rendering outcomes. Furthermore, a loop closure optimization approach is proposed to remove camera pose drift and maintain the high quality of 3D Gaussian models. The approach uses a lightweight pose graph optimization algorithm to correct pose drift and updates Gaussians based on the optimized poses. Additionally, a bundle adjustment scheme further refines camera poses using photometric and geometric constraints, ultimately enhancing the global consistency of scenarios. Quantitative and qualitative evaluations on both synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art methods in camera pose estimation and novel view rendering tasks. The code will be open-sourced for the community.\n\n3D 高斯分布（3D Gaussian Splatting）算法在新视角渲染应用中表现出色，并且已被扩展应用于传统 SLAM 系统。然而，当前的高斯分布 SLAM 方法主要针对手持 RGB 或 RGB-D 传感器设计，在旋转 RGB-D 摄像机设置下容易出现跟踪漂移问题。为了解决这一问题，我们提出了一种鲁棒的高斯分布 SLAM 架构，利用多个旋转 RGB-D 摄像机的输入，实现准确的定位和逼真的渲染性能。我们精心设计的高斯分布闭环检测模块有效解决了常规高斯分布 SLAM 系统中累积的跟踪和建图误差问题。首先，我们为每个高斯关联一个锚定帧，并根据其时间戳将其分类为历史高斯或新高斯。通过在同一视点渲染不同类型的高斯，所提出的闭环检测策略同时考虑了可视性关系和不同的渲染效果。此外，我们提出了一种闭环优化方法，用于消除相机姿态漂移并保持三维高斯模型的高质量。该方法使用轻量级的姿态图优化算法来校正姿态漂移，并根据优化后的姿态更新高斯。此外，捆绑调整方案进一步通过光度和几何约束优化相机姿态，最终增强场景的全局一致性。在合成数据集和真实数据集上的定量和定性评估表明，我们的方法在相机姿态估计和新视角渲染任务中优于最新的现有方法。\n"
  },
  {
    "path": "abs/2409.20291.md",
    "content": "### RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning\n\nSim-to-Real refers to the process of transferring policies learned in simulation to the real world, which is crucial for achieving practical robotics applications. However, recent Sim2real methods either rely on a large amount of augmented data or large learning models, which is inefficient for specific tasks. In recent years, radiance field-based reconstruction methods, especially the emergence of 3D Gaussian Splatting, making it possible to reproduce realistic real-world scenarios. To this end, we propose a novel real-to-sim-to-real reinforcement learning framework, RL-GSBridge, which introduces a mesh-based 3D Gaussian Splatting method to realize zero-shot sim-to-real transfer for vision-based deep reinforcement learning. We improve the mesh-based 3D GS modeling method by using soft binding constraints, enhancing the rendering quality of mesh models. We then employ a GS editing approach to synchronize rendering with the physics simulator, reflecting the interactions of the physical robot more accurately. Through a series of sim-to-real robotic arm experiments, including grasping and pick-and-place tasks, we demonstrate that RL-GSBridge maintains a satisfactory success rate in real-world task completion during sim-to-real transfer. Furthermore, a series of rendering metrics and visualization results indicate that our proposed mesh-based 3D Gaussian reduces artifacts in unstructured objects, demonstrating more realistic rendering performance.\n\nSim-to-Real 是指将模拟中学习的策略转移到现实世界，这对于实现实际的机器人应用至关重要。然而，现有的 Sim2Real 方法往往依赖于大量增强数据或庞大的学习模型，这对于某些特定任务而言效率较低。近年来，基于辐射场的重建方法，尤其是3D高斯分布（3D Gaussian Splatting）的出现，使得再现逼真的现实场景成为可能。为此，我们提出了一种新颖的“真实-模拟-真实”强化学习框架，称为RL-GSBridge，利用基于网格的3D高斯分布方法实现视觉深度强化学习的零样本Sim-to-Real迁移。我们通过软绑定约束改进了基于网格的3D GS建模方法，提升了网格模型的渲染质量。随后，我们采用高斯分布编辑方法，将渲染与物理模拟器同步，更准确地反映了物理机器人交互。通过一系列Sim-to-Real的机械臂实验，包括抓取和挑拣任务，我们展示了RL-GSBridge在Sim-to-Real迁移过程中保持了令人满意的任务完成成功率。此外，一系列渲染指标和可视化结果表明，我们提出的基于网格的3D高斯方法减少了非结构化物体中的伪影，展现出更逼真的渲染效果。\n"
  },
  {
    "path": "abs/2410.00299.md",
    "content": "### GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving\n\nPlace recognition is a crucial module to ensure autonomous vehicles obtain usable localization information in GPS-denied environments. In recent years, multimodal place recognition methods have gained increasing attention due to their ability to overcome the weaknesses of unimodal sensor systems by leveraging complementary information from different modalities. However, challenges arise from the necessity of harmonizing data across modalities and exploiting the spatio-temporal correlations between them sufficiently. In this paper, we propose a 3D Gaussian Splatting-based multimodal place recognition neural network dubbed GSPR. It explicitly combines multi-view RGB images and LiDAR point clouds into a spatio-temporally unified scene representation with the proposed Multimodal Gaussian Splatting. A network composed of 3D graph convolution and transformer is designed to extract high-level spatio-temporal features and global descriptors from the Gaussian scenes for place recognition. We evaluate our method on the nuScenes dataset, and the experimental results demonstrate that our method can effectively leverage complementary strengths of both multi-view cameras and LiDAR, achieving SOTA place recognition performance while maintaining solid generalization ability.\n\n地点识别是确保自动驾驶车辆在无GPS环境中获取可用定位信息的关键模块。近年来，多模态地点识别方法由于能够利用不同模态的互补信息，克服单一传感器系统的弱点，得到了越来越多的关注。然而，挑战在于如何有效协调跨模态的数据，并充分利用它们之间的时空相关性。在本文中，我们提出了一种基于3D高斯分布（3D Gaussian Splatting）的多模态地点识别神经网络，称为GSPR。该方法通过提出的多模态高斯分布，明确地将多视角RGB图像和LiDAR点云结合为一个时空统一的场景表示。我们设计了一个由3D图卷积和Transformer组成的网络，用于从高斯场景中提取高级时空特征和全局描述符，以实现地点识别。我们在nuScenes数据集上对该方法进行了评估，实验结果表明，我们的方法能够有效利用多视角相机和LiDAR的互补优势，达到了当前最先进（SOTA）的地点识别性能，同时保持了良好的泛化能力。\n"
  },
  {
    "path": "abs/2410.00386.md",
    "content": "### Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance\n\nArthroscopy is a minimally invasive surgical procedure used to diagnose and treat joint problems. The clinical workflow of arthroscopy typically involves inserting an arthroscope into the joint through a small incision, during which surgeons navigate and operate largely by relying on their visual assessment through the arthroscope. However, the arthroscope's restricted field of view and lack of depth perception pose challenges in navigating complex articular structures and achieving surgical precision during procedures. Aiming at enhancing intraoperative awareness, we present a robust pipeline that incorporates simultaneous localization and mapping, depth estimation, and 3D Gaussian splatting to realistically reconstruct intra-articular structures solely based on monocular arthroscope video. Extending 3D reconstruction to Augmented Reality (AR) applications, our solution offers AR assistance for articular notch measurement and annotation anchoring in a human-in-the-loop manner. Compared to traditional Structure-from-Motion and Neural Radiance Field-based methods, our pipeline achieves dense 3D reconstruction and competitive rendering fidelity with explicit 3D representation in 7 minutes on average. When evaluated on four phantom datasets, our method achieves RMSE = 2.21mm reconstruction error, PSNR = 32.86 and SSIM = 0.89 on average. Because our pipeline enables AR reconstruction and guidance directly from monocular arthroscopy without any additional data and/or hardware, our solution may hold the potential for enhancing intraoperative awareness and facilitating surgical precision in arthroscopy. Our AR measurement tool achieves accuracy within 1.59 +/- 1.81mm and the AR annotation tool achieves a mIoU of 0.721.\n\n关节镜是一种用于诊断和治疗关节问题的微创手术。关节镜的临床流程通常涉及通过小切口将关节镜插入关节，手术过程中外科医生主要依赖关节镜提供的视觉评估来进行导航和操作。然而，关节镜的视野受限及缺乏深度感知，在处理复杂的关节结构时增加了导航难度，并影响手术精度。为了增强术中认知，我们提出了一个鲁棒的工作流程，结合了同时定位与建图（SLAM）、深度估计以及3D高斯分布（3D Gaussian Splatting），通过单目关节镜视频真实地重建关节内结构。将3D重建扩展到增强现实（AR）应用中，我们的解决方案提供了AR辅助功能，用于关节切迹测量和标注锚定，并支持人机协作的操作模式。与传统的基于运动结构（Structure-from-Motion）和神经辐射场（Neural Radiance Field）方法相比，我们的流程在平均7分钟内实现了密集的3D重建和具有竞争力的渲染保真度，提供明确的3D表示。在四个仿真数据集上进行评估时，我们的方法平均重建误差为RMSE = 2.21mm，峰值信噪比（PSNR）为32.86，结构相似性（SSIM）为0.89。由于我们的流程能够直接从单目关节镜视频实现AR重建和指导，无需额外的数据或硬件，因此我们的解决方案可能有助于增强术中认知并提高关节镜手术的精度。我们的AR测量工具的精度为1.59 +/- 1.81mm，AR标注工具的平均交并比（mIoU）为0.721。\n"
  },
  {
    "path": "abs/2410.00486.md",
    "content": "### CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM\n\nSimultaneous Localization and Mapping (SLAM) is pivotal in robotics, with photorealistic scene reconstruction emerging as a key challenge. To address this, we introduce Computational Alignment for Real-Time Gaussian Splatting SLAM (CaRtGS), a novel method enhancing the efficiency and quality of photorealistic scene reconstruction in real-time environments. Leveraging 3D Gaussian Splatting (3DGS), CaRtGS achieves superior rendering quality and processing speed, which is crucial for scene photorealistic reconstruction. Our approach tackles computational misalignment in Gaussian Splatting SLAM (GS-SLAM) through an adaptive strategy that optimizes training, addresses long-tail optimization, and refines densification. Experiments on Replica and TUM-RGBD datasets demonstrate CaRtGS's effectiveness in achieving high-fidelity rendering with fewer Gaussian primitives. This work propels SLAM towards real-time, photorealistic dense rendering, significantly advancing photorealistic scene representation. For the benefit of the research community, we release the code on our project website: this https URL.\n\n同时定位与建图（SLAM）是机器人技术中的关键环节，而逼真的场景重建则是面临的主要挑战之一。为了解决这一问题，我们提出了用于实时高斯分布SLAM的计算对齐方法（Computational Alignment for Real-Time Gaussian Splatting SLAM，CaRtGS），这是一种新颖的方法，旨在提高实时环境中逼真场景重建的效率和质量。通过利用三维高斯分布（3D Gaussian Splatting, 3DGS），CaRtGS 实现了卓越的渲染质量和处理速度，对于逼真的场景重建至关重要。我们的方法通过自适应策略解决了高斯分布 SLAM（GS-SLAM）中的计算错位问题，优化了训练过程，处理了长尾优化问题，并改进了稠密化。我们在 Replica 和 TUM-RGBD 数据集上的实验表明，CaRtGS 在使用更少的高斯基元的情况下实现了高保真的渲染效果。本研究显著推动了 SLAM 向实时、逼真密集渲染的发展，极大地提升了逼真场景表示的水平。\n"
  },
  {
    "path": "abs/2410.01404.md",
    "content": "### Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection\n\nSkins wrapping around our bodies, leathers covering over the sofa, sheet metal coating the car - it suggests that objects are enclosed by a series of continuous surfaces, which provides us with informative geometry prior for objectness deduction. In this paper, we propose Gaussian-Det which leverages Gaussian Splatting as surface representation for multi-view based 3D object detection. Unlike existing monocular or NeRF-based methods which depict the objects via discrete positional data, Gaussian-Det models the objects in a continuous manner by formulating the input Gaussians as feature descriptors on a mass of partial surfaces. Furthermore, to address the numerous outliers inherently introduced by Gaussian splatting, we accordingly devise a Closure Inferring Module (CIM) for the comprehensive surface-based objectness deduction. CIM firstly estimates the probabilistic feature residuals for partial surfaces given the underdetermined nature of Gaussian Splatting, which are then coalesced into a holistic representation on the overall surface closure of the object proposal. In this way, the surface information Gaussian-Det exploits serves as the prior on the quality and reliability of objectness and the information basis of proposal refinement. Experiments on both synthetic and real-world datasets demonstrate that Gaussian-Det outperforms various existing approaches, in terms of both average precision and recall.\n\n我们的皮肤包裹着身体，皮革覆盖着沙发，金属板覆盖着汽车——这些现象表明物体被一系列连续的表面所包围，这为物体性推断提供了有用的几何先验信息。在本文中，我们提出了Gaussian-Det，它利用高斯散射作为基于多视图的3D目标检测的表面表示。与现有的基于单目或NeRF的方法使用离散位置数据来描述物体不同，Gaussian-Det通过将输入的高斯函数表述为部分表面的特征描述符，以连续的方式建模物体。此外，为了解决高斯散射本质上引入的大量离群点问题，我们相应地设计了一个封闭推理模块（Closure Inferring Module, CIM），用于全面的基于表面的物体性推断。CIM首先根据高斯散射的欠确定性估计部分表面的概率特征残差，随后将其整合成物体提案整体表面的封闭性表示。通过这种方式，Gaussian-Det利用的表面信息作为物体性质量和可靠性的先验，同时也是提案精炼的信息基础。在合成数据集和真实世界数据集上的实验表明，Gaussian-Det在平均精度和召回率方面均优于多种现有方法。\n"
  },
  {
    "path": "abs/2410.01425.md",
    "content": "### EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings\n\nThe feed-forward based 3D Gaussian Splatting method has demonstrated exceptional capability in real-time human novel view synthesis. However, existing approaches are restricted to dense viewpoint settings, which limits their flexibility in free-viewpoint rendering across a wide range of camera view angle discrepancies. To address this limitation, we propose a real-time pipeline named EVA-Gaussian for 3D human novel view synthesis across diverse camera settings. Specifically, we first introduce an Efficient cross-View Attention (EVA) module to accurately estimate the position of each 3D Gaussian from the source images. Then, we integrate the source images with the estimated Gaussian position map to predict the attributes and feature embeddings of the 3D Gaussians. Moreover, we employ a recurrent feature refiner to correct artifacts caused by geometric errors in position estimation and enhance visual further improve synthesis quality, we incorporate a powerful anchor loss function for both 3D Gaussian attributes and human face landmarks. Experimental results on the THuman2.0 and THumansit datasets showcase the superiority of our EVA-Gaussian approach in rendering quality across diverse camera settings.\n\n基于前馈的3D高斯散射方法在实时人类新视图合成中展现了卓越的能力。然而，现有方法局限于稠密的视点设置，限制了其在跨越大范围相机视角差异的自由视点渲染中的灵活性。为了解决这一限制，我们提出了一种名为EVA-Gaussian的实时管线，用于在不同相机设置下实现3D人类新视图合成。具体而言，我们首先引入了高效跨视角注意力（Efficient cross-View Attention, EVA）模块，以精确估计每个3D高斯的位置信息。然后，我们将源图像与估计的高斯位置图整合，以预测3D高斯的属性和特征嵌入。此外，我们采用了一个递归特征优化器，纠正由位置估计中的几何误差引起的伪影，并增强视觉逼真度。为了进一步提升合成质量，我们为3D高斯属性和人脸标志点引入了一个强大的锚点损失函数。在THuman2.0和THumansit数据集上的实验结果展示了我们EVA-Gaussian方法在不同相机设置下的渲染质量优势。\n"
  },
  {
    "path": "abs/2410.01517.md",
    "content": "### UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction\n\n3D Gaussian splatting (3DGS) offers the capability to achieve real-time high quality 3D scene rendering. However, 3DGS assumes that the scene is in a clear medium environment and struggles to generate satisfactory representations in underwater scenes, where light absorption and scattering are prevalent and moving objects are involved. To overcome these, we introduce a novel Gaussian Splatting-based method, UW-GS, designed specifically for underwater applications. It introduces a color appearance that models distance-dependent color variation, employs a new physics-based density control strategy to enhance clarity for distant objects, and uses a binary motion mask to handle dynamic content. Optimized with a well-designed loss function supporting for scattering media and strengthened by pseudo-depth maps, UW-GS outperforms existing methods with PSNR gains up to 1.26dB. To fully verify the effectiveness of the model, we also developed a new underwater dataset, S-UW, with dynamic object masks.\n\n3D高斯散射（3DGS）具有实现实时高质量3D场景渲染的能力。然而，3DGS假设场景处于清晰的介质环境中，因此在处理光吸收和散射普遍存在的水下场景，尤其是涉及移动物体时，难以生成令人满意的表示。为了解决这一问题，我们提出了一种基于高斯散射的全新方法，UW-GS，专为水下应用设计。UW-GS引入了一种颜色外观模型，用于模拟与距离相关的颜色变化，采用了新的基于物理的密度控制策略，以增强远处物体的清晰度，并使用二值运动掩码来处理动态内容。通过为散射介质优化设计的损失函数以及伪深度图的强化，UW-GS相较于现有方法实现了最高达1.26dB的PSNR提升。为了全面验证该模型的有效性，我们还开发了一个全新的水下数据集S-UW，包含动态物体掩码。\n"
  },
  {
    "path": "abs/2410.01521.md",
    "content": "### MiraGe: Editable 2D Images using Gaussian Splatting\n\nImplicit Neural Representations (INRs) approximate discrete data through continuous functions and are commonly used for encoding 2D images. Traditional image-based INRs employ neural networks to map pixel coordinates to RGB values, capturing shapes, colors, and textures within the network's weights. Recently, GaussianImage has been proposed as an alternative, using Gaussian functions instead of neural networks to achieve comparable quality and compression. Such a solution obtains a quality and compression ratio similar to classical INR models but does not allow image modification. In contrast, our work introduces a novel method, MiraGe, which uses mirror reflections to perceive 2D images in 3D space and employs flat-controlled Gaussians for precise 2D image editing. Our approach improves the rendering quality and allows realistic image modifications, including human-inspired perception of photos in the 3D world. Thanks to modeling images in 3D space, we obtain the illusion of 3D-based modification in 2D images. We also show that our Gaussian representation can be easily combined with a physics engine to produce physics-based modification of 2D images. Consequently, MiraGe allows for better quality than the standard approach and natural modification of 2D images.\n\n隐式神经表示（INRs）通过连续函数近似离散数据，常用于编码2D图像。传统的基于图像的INRs使用神经网络将像素坐标映射到RGB值，捕捉形状、颜色和纹理，这些信息存储在网络的权重中。最近，提出了GaussianImage作为替代方案，使用高斯函数代替神经网络来实现相似的质量和压缩效果。虽然这种方法在质量和压缩比上与经典INR模型相当，但不允许图像修改。相反，我们的工作提出了一种新方法，MiraGe，它通过镜面反射在3D空间中感知2D图像，并使用平面控制的高斯函数进行精确的2D图像编辑。我们的方法不仅提高了渲染质量，还允许进行逼真的图像修改，包括模拟人在3D世界中对照片的感知。通过在3D空间中对图像建模，我们实现了在2D图像中进行3D效果修改的错觉。我们还展示了我们的高斯表示可以轻松与物理引擎结合，实现基于物理的2D图像修改。因此，MiraGe相比标准方法提供了更高的质量，并能够自然地修改2D图像。\n"
  },
  {
    "path": "abs/2410.01535.md",
    "content": "### GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians\n\nRecently, with the development of Neural Radiance Fields and Gaussian Splatting, 3D reconstruction techniques have achieved remarkably high fidelity. However, the latent representations learnt by these methods are highly entangled and lack interpretability. In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity. Our GaussianBlock introduces a hybrid representation that leverages the advantages of both primitives, known for their flexible actionability and editability, and 3D Gaussians, which excel in reconstruction quality. Specifically, we achieve semantically coherent primitives through a novel attention-guided centering loss derived from 2D semantic priors, complemented by a dynamic splitting and fusion strategy. Furthermore, we utilize 3D Gaussians that hybridize with primitives to refine structural details and enhance fidelity. Additionally, a binding inheritance strategy is employed to strengthen and maintain the connection between the two. Our reconstructed scenes are evidenced to be disentangled, compositional, and compact across diverse benchmarks, enabling seamless, direct and precise editing while maintaining high quality.\n\n随着神经辐射场（Neural Radiance Fields）和高斯散射（Gaussian Splatting）技术的发展，3D重建技术已实现了显著的高保真度。然而，这些方法学习到的潜在表示高度纠缠，缺乏可解释性。在本文中，我们提出了一种新颖的部件感知组合重建方法，称为GaussianBlock，它能够实现语义一致且解耦的表示，允许进行精确且物理的编辑，类似于构建积木，同时保持高保真度。GaussianBlock引入了一种混合表示，结合了基元（primitives）和3D高斯的优点，前者以其灵活的可操作性和可编辑性著称，而后者在重建质量上表现出色。具体而言，我们通过一种基于2D语义先验的注意力引导居中损失（attention-guided centering loss）实现了语义一致的基元，并辅以动态分裂与融合策略。此外，我们利用与基元混合的3D高斯细化结构细节，提升保真度。为加强和保持两者之间的联系，我们还采用了绑定继承策略。实验结果表明，我们重建的场景在多样基准测试中实现了解耦、组合性和紧凑性，能够在保持高质量的同时，实现无缝、直接且精确的编辑。\n"
  },
  {
    "path": "abs/2410.01614.md",
    "content": "### Gaussian Splatting in Mirrors: Reflection-Aware Rendering via Virtual Camera Optimization\n\nRecent advancements in 3D Gaussian Splatting (3D-GS) have revolutionized novel view synthesis, facilitating real-time, high-quality image rendering. However, in scenarios involving reflective surfaces, particularly mirrors, 3D-GS often misinterprets reflections as virtual spaces, resulting in blurred and inconsistent multi-view rendering within mirrors. Our paper presents a novel method aimed at obtaining high-quality multi-view consistent reflection rendering by modelling reflections as physically-based virtual cameras. We estimate mirror planes with depth and normal estimates from 3D-GS and define virtual cameras that are placed symmetrically about the mirror plane. These virtual cameras are then used to explain mirror reflections in the scene. To address imperfections in mirror plane estimates, we propose a straightforward yet effective virtual camera optimization method to enhance reflection quality. We collect a new mirror dataset including three real-world scenarios for more diverse evaluation. Experimental validation on both Mirror-Nerf and our real-world dataset demonstrate the efficacy of our approach. We achieve comparable or superior results while significantly reducing training time compared to previous state-of-the-art.\n\n3D高斯散射（3D-GS）的最新进展革新了新视图合成技术，实现了实时高质量图像渲染。然而，在包含反射表面的场景中，特别是镜面，3D-GS往往将反射误解为虚拟空间，导致镜面内的多视图渲染模糊且不一致。本文提出了一种新颖的方法，通过将反射建模为基于物理的虚拟相机，旨在获得高质量且多视图一致的反射渲染。我们利用3D-GS的深度和法线估计来确定镜面平面，并定义对称于镜面平面的虚拟相机。这些虚拟相机用于解释场景中的镜面反射。为了解决镜面平面估计中的不完美之处，我们提出了一种简单而有效的虚拟相机优化方法，以提升反射质量。我们还收集了一个包含三个真实世界场景的新镜面数据集，用于更为多样化的评估。实验验证表明，无论是在Mirror-Nerf还是我们的真实世界数据集上，我们的方法均表现出色，不仅取得了相当或更优的结果，还显著减少了训练时间，相较于之前的最先进方法更具优势。\n"
  },
  {
    "path": "abs/2410.01647.md",
    "content": "### 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection\n\nNeural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation. However, NeRF faces inherent limitations: (i) limited representational capacity for 3DOD due to its implicit nature, and (ii) slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time, identifying two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs: 3DGS primarily relies on 2D pixel-level supervision, resulting in unclear 3D spatial distribution of Gaussian blobs and poor differentiation between objects and background, which hinders 3DOD; (ii) Excessive background blobs: 2D images often include numerous background pixels, leading to densely reconstructed 3DGS with many noisy Gaussian blobs representing the background, negatively affecting detection. To tackle the challenge (i), we leverage the fact that 3DGS reconstruction is derived from 2D images, and propose an elegant and efficient solution by incorporating 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background. To address the challenge (ii), we propose a Box-Focused Sampling strategy using 2D boxes to generate object probability distribution in 3D spaces, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from our designs, our 3DGS-DET significantly outperforms the SOTA NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 for the ScanNet dataset, and impressive +31.5 on mAP@0.25 for the ARKITScenes dataset.\n\n神经辐射场（NeRF）广泛应用于新视图合成，并已被改编用于3D目标检测（3DOD），通过视图合成表示提供了一种有前途的3DOD方法。然而，NeRF存在一些固有的限制：（i）由于其隐式特性，3DOD的表示能力有限；（ii）渲染速度较慢。最近，3D高斯散射（3DGS）作为一种显式的3D表示出现，解决了这些限制。受其优势的启发，本文首次将3DGS引入3DOD领域，识别出两个主要挑战：（i）高斯斑点的空间分布不明确：3DGS主要依赖2D像素级监督，导致高斯斑点的3D空间分布不清晰，难以区分物体和背景，阻碍了3DOD的效果；（ii）背景斑点过多：2D图像通常包含大量背景像素，导致3DGS重建的背景中充斥着大量噪声高斯斑点，影响检测表现。为应对挑战（i），我们利用3DGS重建来自2D图像的事实，提出了一种优雅且高效的解决方案，通过引入2D边界引导显著增强高斯斑点的空间分布，使物体与背景之间的区分更加清晰。针对挑战（ii），我们提出了一种基于2D框的聚焦采样策略，通过在3D空间生成物体概率分布，进行有效的概率采样，保留更多物体斑点并减少噪声背景斑点。得益于我们的设计，3DGS-DET显著超越了基于NeRF的SOTA方法NeRF-Det，在ScanNet数据集上mAP@0.25提升了6.6，mAP@0.5提升了8.1，而在ARKITScenes数据集上mAP@0.25更是提升了31.5。\n"
  },
  {
    "path": "abs/2410.01804.md",
    "content": "### EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis\n\nWe present Exact Volumetric Ellipsoid Rendering (EVER), a method for real-time differentiable emission-only volume rendering. Unlike recent rasterization based approach by 3D Gaussian Splatting (3DGS), our primitive based representation allows for exact volume rendering, rather than alpha compositing 3D Gaussian billboards. As such, unlike 3DGS our formulation does not suffer from popping artifacts and view dependent density, but still achieves frame rates of ∼30 FPS at 720p on an NVIDIA RTX4090. Since our approach is built upon ray tracing it enables effects such as defocus blur and camera distortion (e.g. such as from fisheye cameras), which are difficult to achieve by rasterization. We show that our method is more accurate with fewer blending issues than 3DGS and follow-up work on view-consistent rendering, especially on the challenging large-scale scenes from the Zip-NeRF dataset where it achieves sharpest results among real-time techniques.\n\n我们提出了精确体积椭球体渲染（EVER），这是一种用于实时可微分的仅发射体积渲染的方法。与最近基于3D高斯点（3DGS）进行光栅化的渲染方法不同，我们的基元表示允许进行精确的体积渲染，而不是对3D高斯广告牌进行alpha合成。因此，与3DGS不同，我们的方法不会出现“跳动”伪影和视角依赖的密度问题，同时在NVIDIA RTX4090上仍能以720p的分辨率实现大约30帧每秒的速度。由于我们的方法基于光线追踪，它能够实现难以通过光栅化达到的效果，例如散焦模糊和相机畸变（如鱼眼相机产生的畸变）。我们展示了该方法在融合问题上比3DGS和后续的视角一致性渲染工作更加精确，尤其是在来自Zip-NeRF数据集的大规模复杂场景中，它在实时技术中实现了最为清晰的渲染效果。\n"
  },
  {
    "path": "abs/2410.02103.md",
    "content": "### MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis\n\nRecent works in volume rendering, \\textit{e.g.} NeRF and 3D Gaussian Splatting (3DGS), significantly advance the rendering quality and efficiency with the help of the learned implicit neural radiance field or 3D Gaussians. Rendering on top of an explicit representation, the vanilla 3DGS and its variants deliver real-time efficiency by optimizing the parametric model with single-view supervision per iteration during training which is adopted from NeRF. Consequently, certain views are overfitted, leading to unsatisfying appearance in novel-view synthesis and imprecise 3D geometries. To solve aforementioned problems, we propose a new 3DGS optimization method embodying four key novel contributions: 1) We transform the conventional single-view training paradigm into a multi-view training strategy. With our proposed multi-view regulation, 3D Gaussian attributes are further optimized without overfitting certain training views. As a general solution, we improve the overall accuracy in a variety of scenarios and different Gaussian variants. 2) Inspired by the benefit introduced by additional views, we further propose a cross-intrinsic guidance scheme, leading to a coarse-to-fine training procedure concerning different resolutions. 3) Built on top of our multi-view regulated training, we further propose a cross-ray densification strategy, densifying more Gaussian kernels in the ray-intersect regions from a selection of views. 4) By further investigating the densification strategy, we found that the effect of densification should be enhanced when certain views are distinct dramatically. As a solution, we propose a novel multi-view augmented densification strategy, where 3D Gaussians are encouraged to get densified to a sufficient number accordingly, resulting in improved reconstruction accuracy.\n\n近年来，体积渲染的工作，如NeRF和3D高斯散射（3DGS），在学习到的隐式神经辐射场或3D高斯的帮助下，显著提升了渲染质量和效率。基于显式表示进行渲染，原始3DGS及其变体通过在训练期间每次迭代中采用单视图监督优化参数模型，实现了实时效率，这一策略源自NeRF。然而，某些视图容易出现过拟合，导致新视图合成时表现不佳，并且3D几何形状不够精确。为解决上述问题，我们提出了一种新的3DGS优化方法，包含四个关键创新贡献：1) 我们将传统的单视图训练范式转变为多视图训练策略。通过我们提出的多视图调节机制，3D高斯属性得到了进一步优化，避免了对特定训练视图的过拟合。作为通用解决方案，我们在多种场景和不同的高斯变体中提升了整体精度。2) 受到额外视图带来的好处的启发，我们进一步提出了一种跨内在指导方案，推动了针对不同分辨率的粗到细的训练过程。3) 基于我们多视图调节的训练方法，我们提出了一种跨光线加密策略，在从选定视图的光线相交区域中增加更多的高斯核。4) 通过进一步研究加密策略，我们发现，当某些视图差异显著时，加密的效果应得到增强。为此，我们提出了一种新颖的多视图增强加密策略，鼓励3D高斯根据需要加密到足够的数量，从而提高重建精度。\n"
  },
  {
    "path": "abs/2410.02571.md",
    "content": "### SuperGS: Super-Resolution 3D Gaussian Splatting via Latent Feature Field and Gradient-guided Splitting\n\nRecently, 3D Gaussian Splatting (3DGS) has exceled in novel view synthesis with its real-time rendering capabilities and superior quality. However, it faces challenges for high-resolution novel view synthesis (HRNVS) due to the coarse nature of primitives derived from low-resolution input views. To address this issue, we propose Super-Resolution 3DGS (SuperGS), which is an expansion of 3DGS designed with a two-stage coarse-to-fine training framework, utilizing pretrained low-resolution scene representation as an initialization for super-resolution optimization. Moreover, we introduce Multi-resolution Feature Gaussian Splatting (MFGS) to incorporates a latent feature field for flexible feature sampling and Gradient-guided Selective Splitting (GSS) for effective Gaussian upsampling. By integrating these strategies within the coarse-to-fine framework ensure both high fidelity and memory efficiency. Extensive experiments demonstrate that SuperGS surpasses state-of-the-art HRNVS methods on challenging real-world datasets using only low-resolution inputs.\n\n近期，3D高斯散射（3DGS）凭借其实时渲染能力和卓越的质量在新视图合成领域表现出色。然而，由于从低分辨率输入视图推导出的基元较为粗糙，3DGS在高分辨率新视图合成（HRNVS）中面临挑战。为解决这一问题，我们提出了超分辨率3DGS（SuperGS），这是3DGS的扩展，采用了一个两阶段的粗到细训练框架，利用预训练的低分辨率场景表示作为超分辨率优化的初始化。此外，我们引入了多分辨率特征高斯散射（MFGS），该方法结合了一个潜在特征场，实现了灵活的特征采样，并通过梯度引导的选择性分裂（GSS）实现高效的高斯上采样。通过将这些策略整合到粗到细的框架中，确保了高保真度和内存效率。大量实验表明，SuperGS在仅使用低分辨率输入的情况下，超越了在复杂现实世界数据集上的最先进HRNVS方法。\n"
  },
  {
    "path": "abs/2410.02619.md",
    "content": "### GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering\n\nWe present GI-GS, a novel inverse rendering framework that leverages 3D Gaussian Splatting (3DGS) and deferred shading to achieve photo-realistic novel view synthesis and relighting. In inverse rendering, accurately modeling the shading processes of objects is essential for achieving high-fidelity results. Therefore, it is critical to incorporate global illumination to account for indirect lighting that reaches an object after multiple bounces across the scene. Previous 3DGS-based methods have attempted to model indirect lighting by characterizing indirect illumination as learnable lighting volumes or additional attributes of each Gaussian, while using baked occlusion to represent shadow effects. These methods, however, fail to accurately model the complex physical interactions between light and objects, making it impossible to construct realistic indirect illumination during relighting. To address this limitation, we propose to calculate indirect lighting using efficient path tracing with deferred shading. In our framework, we first render a G-buffer to capture the detailed geometry and material properties of the scene. Then, we perform physically-based rendering (PBR) only for direct lighting. With the G-buffer and previous rendering results, the indirect lighting can be calculated through a lightweight path tracing. Our method effectively models indirect lighting under any given lighting conditions, thereby achieving better novel view synthesis and relighting. Quantitative and qualitative results show that our GI-GS outperforms existing baselines in both rendering quality and efficiency.\n\n我们提出了GI-GS，这是一种新颖的逆向渲染框架，结合了3D高斯散射（3DGS）和延迟着色，实现了照片级逼真的新视图合成和重光照。在逆向渲染中，准确建模物体的光照过程对于实现高保真结果至关重要。因此，整合全局光照以考虑经过场景多次反射后到达物体的间接光照尤为重要。之前基于3DGS的方法尝试通过将间接光照表示为可学习的光照体积或每个高斯的附加属性来建模间接光照，同时使用烘焙的遮挡来表示阴影效果。然而，这些方法未能准确建模光与物体之间复杂的物理交互，使得在重光照过程中无法构建逼真的间接光照。\n为了解决这一局限性，我们提出使用高效的路径追踪与延迟着色来计算间接光照。在我们的框架中，首先渲染G-buffer以捕捉场景的详细几何和材质属性。接着，仅对直接光照执行基于物理的渲染（PBR）。然后，利用G-buffer和之前的渲染结果，通过轻量级路径追踪计算间接光照。我们的方法能够在任意给定的光照条件下有效建模间接光照，从而实现更好的新视图合成和重光照。定量和定性结果表明，GI-GS在渲染质量和效率上均优于现有的基线方法。\n"
  },
  {
    "path": "abs/2410.02764.md",
    "content": "### Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats\n\nWe introduce a simple yet effective approach for separating transmitted and reflected light. Our key insight is that the powerful novel view synthesis capabilities provided by modern inverse rendering methods (e.g.,~3D Gaussian splatting) allow one to perform flash/no-flash reflection separation using unpaired measurements -- this relaxation dramatically simplifies image acquisition over conventional paired flash/no-flash reflection separation methods. Through extensive real-world experiments, we demonstrate our method, Flash-Splat, accurately reconstructs both transmitted and reflected scenes in 3D. Our method outperforms existing 3D reflection separation methods, which do not leverage illumination control, by a large margin.\n\n我们提出了一种简单且有效的传输光与反射光分离方法。我们的关键见解在于，现代逆向渲染方法（如3D高斯散射）所提供的强大新视图合成功能，使得可以利用非成对的测量进行闪光灯/无闪光灯反射分离——这一放宽条件极大简化了图像获取过程，优于传统的成对闪光灯/无闪光灯反射分离方法。通过大量真实场景实验，我们展示了我们的方法——Flash-Splat，能够准确重建传输场景和反射场景的3D结构。我们的方法大幅超越了现有不依赖光照控制的3D反射分离方法。\n"
  },
  {
    "path": "abs/2410.03592.md",
    "content": "### Variational Bayes Gaussian Splatting\n\nRecently, 3D Gaussian Splatting has emerged as a promising approach for modeling 3D scenes using mixtures of Gaussians. The predominant optimization method for these models relies on backpropagating gradients through a differentiable rendering pipeline, which struggles with catastrophic forgetting when dealing with continuous streams of data. To address this limitation, we propose Variational Bayes Gaussian Splatting (VBGS), a novel approach that frames training a Gaussian splat as variational inference over model parameters. By leveraging the conjugacy properties of multivariate Gaussians, we derive a closed-form variational update rule, allowing efficient updates from partial, sequential observations without the need for replay buffers. Our experiments show that VBGS not only matches state-of-the-art performance on static datasets, but also enables continual learning from sequentially streamed 2D and 3D data, drastically improving performance in this setting.\n\n最近，3D高斯散射作为一种使用高斯混合物来建模3D场景的有前途方法得到了广泛关注。这些模型的主要优化方法依赖于通过可微渲染管道进行梯度反向传播，但在处理连续数据流时容易出现灾难性遗忘问题。为了解决这一局限性，我们提出了变分贝叶斯高斯散射（Variational Bayes Gaussian Splatting，VBGS），这是一种将高斯散射训练框架化为模型参数上的变分推断的新方法。通过利用多元高斯的共轭性，我们推导出封闭形式的变分更新规则，从而能够在没有重放缓冲区的情况下，高效地从部分、连续的观测中进行更新。我们的实验表明，VBGS不仅在静态数据集上达到了最先进的性能，还能够从连续流动的2D和3D数据中进行持续学习，在这一场景下显著提升了性能。\n"
  },
  {
    "path": "abs/2410.04354.md",
    "content": "### StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting\n\nReconstructing urban street scenes is crucial due to its vital role in applications such as autonomous driving and urban planning. These scenes are characterized by long and narrow camera trajectories, occlusion, complex object relationships, and data sparsity across multiple scales. Despite recent advancements, existing surface reconstruction methods, which are primarily designed for object-centric scenarios, struggle to adapt effectively to the unique characteristics of street scenes. To address this challenge, we introduce StreetSurfGS, the first method to employ Gaussian Splatting specifically tailored for scalable urban street scene surface reconstruction. StreetSurfGS utilizes a planar-based octree representation and segmented training to reduce memory costs, accommodate unique camera characteristics, and ensure scalability. Additionally, to mitigate depth inaccuracies caused by object overlap, we propose a guided smoothing strategy within regularization to eliminate inaccurate boundary points and outliers. Furthermore, to address sparse views and multi-scale challenges, we use a dual-step matching strategy that leverages adjacent and long-term information. Extensive experiments validate the efficacy of StreetSurfGS in both novel view synthesis and surface reconstruction.\n\n重建城市街景对于自动驾驶和城市规划等应用至关重要。这类场景通常具有长且狭窄的摄像机轨迹、遮挡、复杂的物体关系以及跨多尺度的数据稀疏性。尽管近年来取得了一些进展，现有的主要为物体中心场景设计的表面重建方法，难以有效适应街景的独特特征。为了解决这一挑战，我们提出了StreetSurfGS，这是第一个专门为可扩展的城市街景表面重建设计的高斯散射方法。StreetSurfGS采用了基于平面的八叉树表示和分段训练，旨在降低内存成本，适应独特的摄像机特性，并确保可扩展性。此外，为了减轻物体重叠引起的深度误差，我们在正则化中提出了一种引导平滑策略，用于消除不准确的边界点和离群值。针对稀疏视图和多尺度问题，我们使用了双步骤匹配策略，结合了相邻信息和长期信息。大量实验验证了StreetSurfGS在新视图合成和表面重建中的有效性。\n"
  },
  {
    "path": "abs/2410.04646.md",
    "content": "### Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering\n\nWe present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets. Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms. Prior neural rendering methods suffer from severe splat drift due to scene complexity and insufficient multi-view observation, and can fail to fix splats on the true geometry in ground-robot datasets. Our method integrates pixel-aligned anchors from monocular depths and generates Gaussian splats around these anchors using residual-form Gaussian decoders. To address the inherent scale ambiguity of monocular depth, we parameterize anchors with per-view depth-scales and employ scale-consistent depth loss for online scale calibration. Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns, and achieves state-of-the-art rendering performance on the R3LIVE odometry dataset and the Tanks and Temples dataset.\n\n我们提出了一种用于地面机器人轨迹数据集的新颖视图渲染算法——Mode-GS。我们的方法基于锚定的高斯点，旨在克服现有3D高斯点算法的局限性。之前的神经渲染方法由于场景复杂性和多视角观测不足，往往会出现严重的点漂移问题，且在地面机器人数据集中无法将点固定在真实几何上。我们的方法结合了来自单目深度的像素对齐锚点，并通过残差形式的高斯解码器在这些锚点周围生成高斯点。为了解决单目深度固有的尺度模糊性，我们通过每视角深度尺度参数化锚点，并采用尺度一致的深度损失进行在线尺度校准。基于PSNR、SSIM和LPIPS指标，我们的方法在具有自由轨迹模式的地面场景中表现出了更好的渲染性能，并在R3LIVE里程计数据集和Tanks and Temples数据集上实现了最先进的渲染效果。\n"
  },
  {
    "path": "abs/2410.04680.md",
    "content": "### Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting\n\nWe propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in a both photorealistic and geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, where we demonstrate both qualitative and quantitative improvements on challenging robot scenes.\n\n我们提出了一个框架，基于3D高斯点（3DGS）为机器人操作臂选择下一个最佳视图和触摸位置。3DGS作为一种有前景的显式3D场景表示方式，正在机器人领域中崭露头角，因为它能够以照片级真实感和几何精度表示场景。然而，在实际的在线机器人场景中，由于效率要求，视角数量有限，随机视角选择对于3DGS变得不切实际，因为这些视角通常会重叠且冗余。我们通过提出一个端到端的在线训练和主动视角选择流程解决了这一问题，提升了3DGS在少视角机器人环境下的表现。我们首先通过使用Segment Anything Model 2（SAM2）的新颖语义深度对齐方法，结合皮尔逊深度和表面法线损失，提升了真实世界场景中色彩和深度的重建效果，从而提高了少样本3DGS的表现。接着，我们扩展了FisherRF这一用于3DGS的下一步最佳视角选择方法，基于深度不确定性选择视角和触摸姿态。在实际机器人系统中，我们在3DGS在线训练期间执行了在线视角选择。我们为少样本高斯场景的改进提供了动机，并将基于深度的FisherRF扩展应用于这些场景，展示了在具有挑战性的机器人场景中定性和定量的改进效果。\n"
  },
  {
    "path": "abs/2410.04974.md",
    "content": "### 6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering\n\nNovel view synthesis has advanced significantly with the development of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS). However, achieving high quality without compromising real-time rendering remains challenging, particularly for physically-based ray tracing with view-dependent effects. Recently, N-dimensional Gaussians (N-DG) introduced a 6D spatial-angular representation to better incorporate view-dependent effects, but the Gaussian representation and control scheme are sub-optimal. In this paper, we revisit 6D Gaussians and introduce 6D Gaussian Splatting (6DGS), which enhances color and opacity representations and leverages the additional directional information in the 6D space for optimized Gaussian control. Our approach is fully compatible with the 3DGS framework and significantly improves real-time radiance field rendering by better modeling view-dependent effects and fine details. Experiments demonstrate that 6DGS significantly outperforms 3DGS and N-DG, achieving up to a 15.73 dB improvement in PSNR with a reduction of 66.5% Gaussian points compared to 3DGS.\n\n随着神经辐射场（NeRF）和3D高斯点（3DGS）的发展，新颖视图合成取得了显著进展。然而，在不影响实时渲染的情况下实现高质量仍然是一个挑战，尤其是在具有视角依赖效果的基于物理的光线追踪中。最近，N维高斯（N-DG）引入了6维空间-角度表示，以更好地结合视角依赖效果，但高斯的表示和控制方案仍不理想。本文中，我们重新审视了6维高斯，并引入了6维高斯点（6DGS），通过利用6维空间中的额外方向信息优化高斯控制，增强了颜色和不透明度的表示。我们的方法与3DGS框架完全兼容，并通过更好地建模视角依赖效果和细节显著改善了实时辐射场渲染。实验表明，6DGS在性能上大幅超越了3DGS和N-DG，与3DGS相比，PSNR提升了多达15.73 dB，同时减少了66.5%的高斯点数量。\n"
  },
  {
    "path": "abs/2410.05044.md",
    "content": "### PhotoReg: Photometrically Registering 3D Gaussian Splatting Models\n\nBuilding accurate representations of the environment is critical for intelligent robots to make decisions during deployment. Advances in photorealistic environment models have enabled robots to develop hyper-realistic reconstructions, which can be used to generate images that are intuitive for human inspection. In particular, the recently introduced 3DGS, which describes the scene with up to millions of primitive ellipsoids, can be rendered in real time. 3DGS has rapidly gained prominence. However, a critical unsolved problem persists: how can we fuse multiple 3DGS into a single coherent model? Solving this problem will enable robot teams to jointly build 3DGS models of their surroundings. A key insight of this work is to leverage the duality between photorealistic reconstructions, which render realistic 2D images from 3D structure, and 3D foundation models, which predict 3D structure from image pairs. To this end, we develop PhotoReg, a framework to register multiple photorealistic 3DGS models with 3D foundation models. As 3DGS models are generally built from monocular camera images, they have arbitrary scale. To resolve this, PhotoReg actively enforces scale consistency among the different 3DGS models by considering depth estimates within these models. Then, the alignment is iteratively refined with fine-grained photometric losses to produce high-quality fused 3DGS models. We rigorously evaluate PhotoReg on both standard benchmark datasets and our custom-collected datasets, including with two quadruped robots.\n\n\n构建精确的环境表示对于智能机器人在部署期间做出决策至关重要。近年来，照片级真实感环境模型的进步使机器人能够生成超现实的重建，这些重建可以用于生成直观便于人类检查的图像。特别是最近引入的3DGS，通过多达数百万的原始椭球体来描述场景，并能够实时渲染。3DGS迅速获得了广泛关注。然而，一个关键的未解决问题仍然存在：如何将多个3DGS融合为一个连贯的模型？解决这一问题将使机器人团队能够共同构建其周围环境的3DGS模型。本工作的一个关键见解是利用照片级重建（从3D结构渲染逼真的2D图像）和3D基础模型（从图像对中预测3D结构）之间的{对偶性}。为此，我们开发了PhotoReg框架，将多个照片级真实感3DGS模型与3D基础模型进行注册。由于3DGS模型通常由单目相机图像构建，因此它们具有任意尺度。为了解决这一问题，PhotoReg通过考虑这些模型中的深度估计，主动强制不同3DGS模型之间的尺度一致性。然后，使用精细的光度损失迭代优化对齐，以生成高质量的融合3DGS模型。我们在标准基准数据集和我们自定义收集的数据集（包括两台四足机器人）上对PhotoReg进行了严格评估。\n"
  },
  {
    "path": "abs/2410.05111.md",
    "content": "### LiDAR-GS:Real-time LiDAR Re-Simulation using Gaussian Splatting\n\nLiDAR simulation plays a crucial role in closed-loop simulation for autonomous driving. Although recent advancements, such as the use of reconstructed mesh and Neural Radiance Fields (NeRF), have made progress in simulating the physical properties of LiDAR, these methods have struggled to achieve satisfactory frame rates and rendering quality. To address these limitations, we present LiDAR-GS, the first LiDAR Gaussian Splatting method, for real-time high-fidelity re-simulation of LiDAR sensor scans in public urban road scenes. The vanilla Gaussian Splatting, designed for camera models, cannot be directly applied to LiDAR re-simulation. To bridge the gap between passive camera and active LiDAR, our LiDAR-GS designs a differentiable laser beam splatting, grounded in the LiDAR range view model. This innovation allows for precise surface splatting by projecting lasers onto micro cross-sections, effectively eliminating artifacts associated with local affine approximations. Additionally, LiDAR-GS leverages Neural Gaussian Fields, which further integrate view-dependent clues, to represent key LiDAR properties that are influenced by the incident angle and external factors. Combining these practices with some essential adaptations, e.g., dynamic instances decomposition, our approach succeeds in simultaneously re-simulating depth, intensity, and ray-drop channels, achieving state-of-the-art results in both rendering frame rate and quality on publically available large scene datasets.\n\nLiDAR模拟在自动驾驶的闭环仿真中起着至关重要的作用。尽管近年来的技术进步（如使用重建网格和神经辐射场NeRF）在模拟LiDAR的物理特性方面取得了一定进展，但这些方法在帧率和渲染质量方面仍未达到令人满意的水平。为了解决这些局限，我们提出了LiDAR-GS，这是一种基于高斯点的新方法，用于在公共城市道路场景中实现实时高保真LiDAR传感器扫描的重模拟。传统的高斯点方法是为相机模型设计的，无法直接应用于LiDAR重模拟。为了弥合被动相机和主动LiDAR之间的差距，LiDAR-GS设计了一种基于LiDAR视距模型的可微激光束点处理方法。该创新通过将激光投射到微小截面上实现了精确的表面点投影，有效消除了与局部仿射近似相关的伪影。此外，LiDAR-GS利用神经高斯场，将视角依赖线索进一步集成，以表示受入射角和外部因素影响的关键LiDAR属性。结合一些必要的适应措施，例如动态实例分解，我们的方法能够同时重模拟深度、强度和光线丢失通道，并在公共大规模场景数据集上实现了在渲染帧率和质量方面的最新成果。\n"
  },
  {
    "path": "abs/2410.05259.md",
    "content": "### GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting\n\nDiffusion-based 2D virtual try-on (VTON) techniques have recently demonstrated strong performance, while the development of 3D VTON has largely lagged behind. Despite recent advances in text-guided 3D scene editing, integrating 2D VTON into these pipelines to achieve vivid 3D VTON remains challenging. The reasons are twofold. First, text prompts cannot provide sufficient details in describing clothing. Second, 2D VTON results generated from different viewpoints of the same 3D scene lack coherence and spatial relationships, hence frequently leading to appearance inconsistencies and geometric distortions. To resolve these problems, we introduce an image-prompted 3D VTON method (dubbed GS-VTON) which, by leveraging 3D Gaussian Splatting (3DGS) as the 3D representation, enables the transfer of pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. (1) Specifically, we propose a personalized diffusion model that utilizes low-rank adaptation (LoRA) fine-tuning to incorporate personalized information into pre-trained 2D VTON models. To achieve effective LoRA training, we introduce a reference-driven image editing approach that enables the simultaneous editing of multi-view images while ensuring consistency. (2) Furthermore, we propose a persona-aware 3DGS editing framework to facilitate effective editing while maintaining consistent cross-view appearance and high-quality 3D geometry. (3) Additionally, we have established a new 3D VTON benchmark, 3D-VTONBench, which facilitates comprehensive qualitative and quantitative 3D VTON evaluations. Through extensive experiments and comparative analyses with existing methods, the proposed GS-VTON has demonstrated superior fidelity and advanced editing capabilities, affirming its effectiveness for 3D VTON.\n\n基于扩散的2D虚拟试穿（VTON）技术近年来展现出了强大的性能，而3D VTON的发展却相对滞后。尽管文本引导的3D场景编辑技术有所进步，但将2D VTON集成到这些流程中以实现生动的3D VTON仍然充满挑战。原因有两个：首先，文本提示无法提供足够的细节来描述服装；其次，从同一3D场景不同视角生成的2D VTON结果缺乏连贯性和空间关系，因此经常导致外观不一致和几何变形。为了解决这些问题，我们引入了一种图像提示的3D VTON方法（称为GS-VTON），通过使用3D高斯点（3DGS）作为3D表示，能够将2D VTON模型中的预训练知识迁移到3D，并提升跨视角的一致性。具体来说：(1) 我们提出了一个个性化扩散模型，利用低秩适配（LoRA）微调，将个性化信息融入预训练的2D VTON模型中。为实现有效的LoRA训练，我们引入了一种基于参考的图像编辑方法，能够同时编辑多视角图像并确保一致性。(2) 此外，我们提出了一个面向个体的3DGS编辑框架，促进有效编辑的同时保持跨视角一致性和高质量的3D几何。(3) 我们还建立了一个新的3D VTON基准，称为3D-VTONBench，以促进全面的定性和定量3D VTON评估。通过大量实验和与现有方法的对比分析，提出的GS-VTON在保真度和高级编辑能力方面表现出色，证实了其在3D VTON中的有效性。\n"
  },
  {
    "path": "abs/2410.06014.md",
    "content": "### SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting\n\nMany recent developments for robots to represent environments have focused on photorealistic reconstructions. This paper particularly focuses on generating sequences of images from the photorealistic Gaussian Splatting models, that match instructions that are given by user-inputted language. We contribute a novel framework, SplaTraj, which formulates the generation of images within photorealistic environment representations as a continuous-time trajectory optimization problem. Costs are designed so that a camera following the trajectory poses will smoothly traverse through the environment and render the specified spatial information in a photogenic manner. This is achieved by querying a photorealistic representation with language embedding to isolate regions that correspond to the user-specified inputs. These regions are then projected to the camera's view as it moves over time and a cost is constructed. We can then apply gradient-based optimization and differentiate through the rendering to optimize the trajectory for the defined cost. The resulting trajectory moves to photogenically view each of the specified objects. We empirically evaluate our approach on a suite of environments and instructions, and demonstrate the quality of generated image sequences.\n\n许多最近的发展集中在为机器人构建环境的照片级真实感重建上。本文特别关注从照片级高斯点模型生成图像序列，这些图像与用户输入的语言指令相匹配。我们提出了一个新框架，SplaTraj，将在照片级环境表示中生成图像的问题形式化为一个连续时间轨迹优化问题。通过设计成本函数，使得沿着轨迹移动的相机能够平滑地穿过环境，并以一种美观的方式渲染指定的空间信息。该方法通过查询照片级表示与语言嵌入，将对应用户指定输入的区域隔离出来。这些区域随后在相机随时间移动时投射到相机视角中，并构建成本函数。我们随后可以应用基于梯度的优化方法，通过渲染的可微分性优化轨迹以满足定义的成本。最终的轨迹能够以美观的方式查看每个指定的对象。我们在一组环境和指令上对该方法进行了实证评估，并展示了生成图像序列的质量。\n"
  },
  {
    "path": "abs/2410.06165.md",
    "content": "### GSLoc: Visual Localization with 3D Gaussian Splatting\n\nWe present GSLoc: a new visual localization method that performs dense camera alignment using 3D Gaussian Splatting as a map representation of the scene. GSLoc backpropagates pose gradients over the rendering pipeline to align the rendered and target images, while it adopts a coarse-to-fine strategy by utilizing blurring kernels to mitigate the non-convexity of the problem and improve the convergence. The results show that our approach succeeds at visual localization in challenging conditions of relatively small overlap between initial and target frames inside textureless environments when state-of-the-art neural sparse methods provide inferior results. Using the byproduct of realistic rendering from the 3DGS map representation, we show how to enhance localization results by mixing a set of observed and virtual reference keyframes when solving the image retrieval problem. We evaluate our method both on synthetic and real-world data, discussing its advantages and application potential.\n\n我们提出了GSLoc，一种新的视觉定位方法，该方法使用3D高斯点作为场景的地图表示进行稠密相机对齐。GSLoc通过渲染管道反向传播位姿梯度，来对齐渲染图像与目标图像，同时采用粗到细的策略，利用模糊核来缓解问题的非凸性并提高收敛性。结果表明，在初始帧和目标帧之间重叠较小且环境纹理较少的具有挑战性的条件下，我们的方法在视觉定位上取得了成功，而最新的神经稀疏方法表现较差。利用3DGS地图表示中逼真渲染的副产品，我们展示了如何通过在解决图像检索问题时混合一组观测到的和虚拟的参考关键帧来增强定位结果。我们在合成和真实世界的数据上对该方法进行了评估，并讨论了其优势和应用潜力。\n"
  },
  {
    "path": "abs/2410.06231.md",
    "content": "### RelitLRM: Generative Relightable Radiance for Large Reconstruction Models\n\nWe propose RelitLRM, a Large Reconstruction Model (LRM) for generating high-quality Gaussian splatting representations of 3D objects under novel illuminations from sparse (4-8) posed images captured under unknown static lighting. Unlike prior inverse rendering methods requiring dense captures and slow optimization, often causing artifacts like incorrect highlights or shadow baking, RelitLRM adopts a feed-forward transformer-based model with a novel combination of a geometry reconstructor and a relightable appearance generator based on diffusion. The model is trained end-to-end on synthetic multi-view renderings of objects under varying known illuminations. This architecture design enables to effectively decompose geometry and appearance, resolve the ambiguity between material and lighting, and capture the multi-modal distribution of shadows and specularity in the relit appearance. We show our sparse-view feed-forward RelitLRM offers competitive relighting results to state-of-the-art dense-view optimization-based baselines while being significantly faster.\n\n我们提出了RelitLRM，一种大型重建模型（LRM），用于在新的照明条件下从稀疏（4-8张）姿态图像生成高质量的3D物体高斯点表示，这些图像是在未知的静态照明下捕获的。与先前的需要密集采集和缓慢优化的逆渲染方法不同，RelitLRM采用了一种前馈的基于Transformer的模型，结合了几何重建器和基于扩散的可重光照外观生成器的创新架构。该模型在具有已知不同照明条件的合成多视角渲染图像上进行端到端训练。此架构设计能够有效分解几何和外观，解决材料与光照之间的模糊性问题，并捕捉重光照外观中的阴影和高光的多模态分布。我们展示了稀疏视角的前馈式RelitLRM在重光照效果上与最先进的基于密集视角优化的基准方法具有竞争力，同时显著加快了处理速度。\n"
  },
  {
    "path": "abs/2410.06245.md",
    "content": "### HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction\n\nReconstructing 3D scenes from multiple viewpoints is a fundamental task in stereo vision. Recently, advances in generalizable 3D Gaussian Splatting have enabled high-quality novel view synthesis for unseen scenes from sparse input views by feed-forward predicting per-pixel Gaussian parameters without extra optimization. However, existing methods typically generate single-scale 3D Gaussians, which lack representation of both large-scale structure and texture details, resulting in mislocation and artefacts. In this paper, we propose a novel framework, HiSplat, which introduces a hierarchical manner in generalizable 3D Gaussian Splatting to construct hierarchical 3D Gaussians via a coarse-to-fine strategy. Specifically, HiSplat generates large coarse-grained Gaussians to capture large-scale structures, followed by fine-grained Gaussians to enhance delicate texture details. To promote inter-scale interactions, we propose an Error Aware Module for Gaussian compensation and a Modulating Fusion Module for Gaussian repair. Our method achieves joint optimization of hierarchical representations, allowing for novel view synthesis using only two-view reference images. Comprehensive experiments on various datasets demonstrate that HiSplat significantly enhances reconstruction quality and cross-dataset generalization compared to prior single-scale methods. The corresponding ablation study and analysis of different-scale 3D Gaussians reveal the mechanism behind the effectiveness.\n\n从多个视角重建3D场景是立体视觉中的一项基础任务。近年来，可泛化的3D高斯点技术的进展使得从稀疏的输入视角生成高质量的新视图合成成为可能，通过前馈预测每个像素的高斯参数而无需额外的优化。然而，现有方法通常生成单一尺度的3D高斯，这无法同时表征大尺度结构和纹理细节，导致位置错误和伪影。在本文中，我们提出了一个新的框架，HiSplat，它在可泛化的3D高斯点中引入了分层策略，通过粗到细的方式构建分层的3D高斯。具体来说，HiSplat首先生成较大、粗粒度的高斯以捕捉大尺度结构，随后生成细粒度的高斯以增强精细的纹理细节。为了促进不同尺度间的交互，我们提出了误差感知模块用于高斯补偿，并引入了调制融合模块用于高斯修复。我们的方法实现了分层表示的联合优化，仅使用两视图参考图像即可进行新视图合成。基于多个数据集的综合实验表明，HiSplat在重建质量和跨数据集的泛化能力方面显著优于之前的单一尺度方法。对应的消融研究和对不同尺度3D高斯的分析揭示了其有效性背后的机制。\n"
  },
  {
    "path": "abs/2410.06613.md",
    "content": "### ES-Gaussian: Gaussian Splatting Mapping via Error Space-Based Gaussian Completion\n\nAccurate and affordable indoor 3D reconstruction is critical for effective robot navigation and interaction. Traditional LiDAR-based mapping provides high precision but is costly, heavy, and power-intensive, with limited ability for novel view rendering. Vision-based mapping, while cost-effective and capable of capturing visual data, often struggles with high-quality 3D reconstruction due to sparse point clouds. We propose ES-Gaussian, an end-to-end system using a low-altitude camera and single-line LiDAR for high-quality 3D indoor reconstruction. Our system features Visual Error Construction (VEC) to enhance sparse point clouds by identifying and correcting areas with insufficient geometric detail from 2D error maps. Additionally, we introduce a novel 3DGS initialization method guided by single-line LiDAR, overcoming the limitations of traditional multi-view setups and enabling effective reconstruction in resource-constrained environments. Extensive experimental results on our new Dreame-SR dataset and a publicly available dataset demonstrate that ES-Gaussian outperforms existing methods, particularly in challenging scenarios.\n\n精确且经济的室内3D重建对于机器人导航和交互至关重要。传统的基于LiDAR的地图构建虽然具有高精度，但成本高、重量大且功耗高，并且在新视角渲染方面能力有限。基于视觉的地图构建成本较低，能够捕捉视觉数据，但由于点云稀疏，往往难以实现高质量的3D重建。我们提出了ES-Gaussian，这是一个端到端系统，使用低空摄像头和单线LiDAR实现高质量的室内3D重建。该系统的特点是引入视觉误差构建（VEC）模块，通过2D误差图识别并修正几何细节不足的区域，从而增强稀疏点云。此外，我们提出了一种基于单线LiDAR引导的3DGS初始化方法，克服了传统多视角设置的局限性，使其能够在资源受限的环境中进行有效重建。我们在新的Dreame-SR数据集和一个公开数据集上进行了广泛的实验，结果表明，ES-Gaussian在特别具有挑战性的场景中优于现有方法。\n"
  },
  {
    "path": "abs/2410.06756.md",
    "content": "### DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation\n\nRecent advancements in 2D/3D generative techniques have facilitated the generation of dynamic 3D objects from monocular videos. Previous methods mainly rely on the implicit neural radiance fields (NeRF) or explicit Gaussian Splatting as the underlying representation, and struggle to achieve satisfactory spatial-temporal consistency and surface appearance. Drawing inspiration from modern 3D animation pipelines, we introduce DreamMesh4D, a novel framework combining mesh representation with geometric skinning technique to generate high-quality 4D object from a monocular video. Instead of utilizing classical texture map for appearance, we bind Gaussian splats to triangle face of mesh for differentiable optimization of both the texture and mesh vertices. In particular, DreamMesh4D begins with a coarse mesh obtained through an image-to-3D generation procedure. Sparse points are then uniformly sampled across the mesh surface, and are used to build a deformation graph to drive the motion of the 3D object for the sake of computational efficiency and providing additional constraint. For each step, transformations of sparse control points are predicted using a deformation network, and the mesh vertices as well as the surface Gaussians are deformed via a novel geometric skinning algorithm, which is a hybrid approach combining LBS (linear blending skinning) and DQS (dual-quaternion skinning), mitigating drawbacks associated with both approaches. The static surface Gaussians and mesh vertices as well as the deformation network are learned via reference view photometric loss, score distillation loss as well as other regularizers in a two-stage manner. Extensive experiments demonstrate superior performance of our method. Furthermore, our method is compatible with modern graphic pipelines, showcasing its potential in the 3D gaming and film industry.\n\n最近2D/3D生成技术的进展大大促进了从单目视频生成动态3D对象。以往的方法主要依赖隐式神经辐射场（NeRF）或显式高斯点作为底层表示，但在实现时空一致性和表面外观上仍存在困难。借鉴现代3D动画管线的灵感，我们提出了DreamMesh4D，一个结合网格表示与几何蒙皮技术的框架，用于从单目视频生成高质量的4D对象。与传统的纹理映射不同，我们将高斯点绑定到网格的三角面上，以便对纹理和网格顶点进行可微分优化。\nDreamMesh4D从通过图像到3D生成过程得到的粗糙网格开始。然后在网格表面均匀采样稀疏点，用于构建形变图，以驱动3D对象的运动，既提高了计算效率，也提供了额外的约束。在每一步中，通过形变网络预测稀疏控制点的变换，并使用结合LBS（线性混合蒙皮）和DQS（双四元数蒙皮）的新型几何蒙皮算法，对网格顶点和表面高斯点进行形变，从而减轻两种方法的缺陷。\n静态表面高斯点和网格顶点，以及形变网络的学习，通过参考视图的光度损失、得分蒸馏损失以及其他正则项以两阶段的方式进行。大量实验表明我们的方法具有优越的性能。此外，我们的方法与现代图形管线兼容，展现了其在3D游戏和电影产业中的潜力。\n"
  },
  {
    "path": "abs/2410.07266.md",
    "content": "### Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting\n\n3D Gaussian Splatting is capable of reconstructing 3D scenes in minutes. Despite recent advances in improving surface reconstruction accuracy, the reconstructed results still exhibit bias and suffer from inefficiency in storage and training. This paper provides a different observation on the cause of the inefficiency and the reconstruction bias, which is attributed to the integration of the low-opacity parts (LOPs) of the generated Gaussians. We show that LOPs consist of Gaussians with overall low-opacity (LOGs) and the low-opacity tails (LOTs) of Gaussians. We propose Spiking GS to reduce such two types of LOPs by integrating spiking neurons into the Gaussian Splatting pipeline. Specifically, we introduce global and local full-precision integrate-and-fire spiking neurons to the opacity and representation function of flattened 3D Gaussians, respectively. Furthermore, we enhance the density control strategy with spiking neurons' thresholds and an new criterion on the scale of Gaussians. Our method can represent more accurate reconstructed surfaces at a lower cost.\n\n3D高斯点技术能够在几分钟内重建3D场景。尽管在提高表面重建精度方面取得了进展，但重建结果仍然存在偏差，并且在存储和训练效率方面表现不佳。本文提供了对这些效率问题和重建偏差的不同观察，指出问题源于生成的高斯点中的低不透明部分（LOPs）的整合。我们展示了LOPs包括整体低不透明高斯点（LOGs）和高斯点的低不透明尾部（LOTs）。为此，我们提出了Spiking GS，通过将尖峰神经元集成到高斯点管道中来减少这两类LOPs。具体而言，我们在展平的3D高斯点的不透明度和表示函数中，分别引入了全精度整合-触发（integrate-and-fire）尖峰神经元，用于全局和局部控制。此外，我们通过尖峰神经元的阈值和一种新的高斯点尺度标准，增强了密度控制策略。我们的方法可以在更低的成本下表示更精确的重建表面。\n\n"
  },
  {
    "path": "abs/2410.07577.md",
    "content": "### 3D Vision-Language Gaussian Splatting\n\nRecent advancements in 3D reconstruction methods and vision-language models have propelled the development of multi-modal 3D scene understanding, which has vital applications in robotics, autonomous driving, and virtual/augmented reality. However, current multi-modal scene understanding approaches have naively embedded semantic representations into 3D reconstruction methods without striking a balance between visual and language modalities, which leads to unsatisfying semantic rasterization of translucent or reflective objects, as well as over-fitting on color modality. To alleviate these limitations, we propose a solution that adequately handles the distinct visual and semantic modalities, i.e., a 3D vision-language Gaussian splatting model for scene understanding, to put emphasis on the representation learning of language modality. We propose a novel cross-modal rasterizer, using modality fusion along with a smoothed semantic indicator for enhancing semantic rasterization. We also employ a camera-view blending technique to improve semantic consistency between existing and synthesized views, thereby effectively mitigating over-fitting. Extensive experiments demonstrate that our method achieves state-of-the-art performance in open-vocabulary semantic segmentation, surpassing existing methods by a significant margin.\n\n近年来，3D重建方法和视觉语言模型的进展推动了多模态3D场景理解的发展，这在机器人、自动驾驶以及虚拟/增强现实等领域具有重要应用。然而，当前的多模态场景理解方法往往简单地将语义表示嵌入到3D重建方法中，未能在视觉和语言模态之间取得平衡，导致半透明或反射物体的语义光栅化效果不佳，并且过度依赖颜色模态。为了解决这些问题，我们提出了一种能够充分处理视觉和语义模态差异的解决方案，即一个用于场景理解的3D视觉语言高斯点模型，强调语言模态的表示学习。我们提出了一种新颖的跨模态光栅器，通过模态融合以及平滑语义指示器来增强语义光栅化效果。此外，我们采用了相机视角融合技术，以提高现有视图和合成视图之间的语义一致性，从而有效减轻过拟合问题。大量实验表明，我们的方法在开放词汇的语义分割任务中达到了最新的性能，显著超越了现有方法。\n"
  },
  {
    "path": "abs/2410.07707.md",
    "content": "### MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting\n\nDynamic scene reconstruction is a long-term challenge in the field of 3D vision. Recently, the emergence of 3D Gaussian Splatting has provided new insights into this problem. Although subsequent efforts rapidly extend static 3D Gaussian to dynamic scenes, they often lack explicit constraints on object motion, leading to optimization difficulties and performance degradation. To address the above issues, we propose a novel deformable 3D Gaussian splatting framework called MotionGS, which explores explicit motion priors to guide the deformation of 3D Gaussians. Specifically, we first introduce an optical flow decoupling module that decouples optical flow into camera flow and motion flow, corresponding to camera movement and object motion respectively. Then the motion flow can effectively constrain the deformation of 3D Gaussians, thus simulating the motion of dynamic objects. Additionally, a camera pose refinement module is proposed to alternately optimize 3D Gaussians and camera poses, mitigating the impact of inaccurate camera poses. Extensive experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods and exhibits significant superiority in both qualitative and quantitative results.\n\n动态场景重建一直是3D视觉领域的长期挑战。近年来，3D高斯点技术的出现为解决该问题提供了新的思路。尽管后续工作迅速将静态的3D高斯扩展到动态场景中，但它们通常缺乏对物体运动的显式约束，导致优化困难和性能下降。为了解决上述问题，我们提出了一种新颖的可变形3D高斯点框架——MotionGS，该框架通过探索显式运动先验来引导3D高斯的变形。具体而言，我们首先引入了一个光流解耦模块，将光流解耦为相机流和运动流，分别对应相机运动和物体运动。运动流可以有效地约束3D高斯的变形，从而模拟动态物体的运动。此外，我们还提出了一个相机姿态优化模块，交替优化3D高斯和相机姿态，减轻不准确的相机姿态对重建的影响。大量实验验证了MotionGS在单目动态场景中的优越性，在定性和定量结果上均显著超越了最新的方法。\n"
  },
  {
    "path": "abs/2410.07971.md",
    "content": "### Generalizable and Animatable Gaussian Head Avatar\n\nIn this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction. Existing methods rely on neural radiance fields, leading to heavy rendering consumption and low reenactment speeds. To address these limitations, we generate the parameters of 3D Gaussians from a single image in a single forward pass. The key innovation of our work is the proposed dual-lifting method, which produces high-fidelity 3D Gaussians that capture identity and facial details. Additionally, we leverage global image features and the 3D morphable model to construct 3D Gaussians for controlling expressions. After training, our model can reconstruct unseen identities without specific optimizations and perform reenactment rendering at real-time speeds. Experiments show that our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy. We believe our method can establish new benchmarks for future research and advance applications of digital avatars.\n\n在本文中，我们提出了一种用于单次可动画头像重建的可泛化和可动画的高斯头像模型（GAGAvatar）。现有方法依赖于神经辐射场，导致渲染开销大且重演速度慢。为了解决这些问题，我们通过单次前向传递从单张图像生成3D高斯参数。我们工作的关键创新在于提出了双提升方法，该方法生成了高保真3D高斯，能够捕捉身份和面部细节。此外，我们利用全局图像特征和3D可变形模型来构建3D高斯以控制表情。在训练完成后，我们的模型可以在不进行特定优化的情况下重建未知身份，并以实时速度进行重演渲染。实验表明，我们的方法在重建质量和表情准确性方面相较于之前的方法表现更优。我们相信，所提出的方法可以为未来的研究树立新的基准，并推动数字化身的应用进展。\n"
  },
  {
    "path": "abs/2410.08017.md",
    "content": "### Fast Feedforward 3D Gaussian Splatting Compression\n\nWith 3D Gaussian Splatting (3DGS) advancing real-time and high-fidelity rendering for novel view synthesis, storage requirements pose challenges for their widespread adoption. Although various compression techniques have been proposed, previous art suffers from a common limitation: for any existing 3DGS, per-scene optimization is needed to achieve compression, making the compression sluggish and slow. To address this issue, we introduce Fast Compression of 3D Gaussian Splatting (FCGS), an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass, which significantly reduces compression time from minutes to seconds. To enhance compression efficiency, we propose a multi-path entropy module that assigns Gaussian attributes to different entropy constraint paths for balance between size and fidelity. We also carefully design both inter- and intra-Gaussian context models to remove redundancies among the unstructured Gaussian blobs. Overall, FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.\n\n随着3D高斯点（3DGS）在新视图合成的实时高保真渲染中取得进展，存储需求成为其广泛应用的挑战。尽管已经提出了多种压缩技术，但现有方法存在一个共同的局限性：任何现有的3DGS都需要针对每个场景进行优化才能实现压缩，这使得压缩过程缓慢而低效。为了解决这一问题，我们提出了Fast Compression of 3D Gaussian Splatting（FCGS），这是一种无需优化的模型，可以在单次前馈过程中快速压缩3DGS表示，将压缩时间从几分钟大幅缩短到几秒钟。为了提高压缩效率，我们提出了一个多路径熵模块，该模块将高斯属性分配到不同的熵约束路径，以在压缩大小和保真度之间取得平衡。我们还精心设计了高斯间和高斯内的上下文模型，以去除非结构化高斯点之间的冗余。总体而言，FCGS在保持保真度的同时实现了超过20倍的压缩比，超越了大多数基于场景优化的最新方法。\n"
  },
  {
    "path": "abs/2410.08107.md",
    "content": "### IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera\n\nImplicit neural representation and explicit 3D Gaussian Splatting (3D-GS) for novel view synthesis have achieved remarkable progress with frame-based camera (e.g. RGB and RGB-D cameras) recently. Compared to frame-based camera, a novel type of bio-inspired visual sensor, i.e. event camera, has demonstrated advantages in high temporal resolution, high dynamic range, low power consumption and low latency. Due to its unique asynchronous and irregular data capturing process, limited work has been proposed to apply neural representation or 3D Gaussian splatting for an event camera. In this work, we present IncEventGS, an incremental 3D Gaussian Splatting reconstruction algorithm with a single event camera. To recover the 3D scene representation incrementally, we exploit the tracking and mapping paradigm of conventional SLAM pipelines for IncEventGS. Given the incoming event stream, the tracker firstly estimates an initial camera motion based on prior reconstructed 3D-GS scene representation. The mapper then jointly refines both the 3D scene representation and camera motion based on the previously estimated motion trajectory from the tracker. The experimental results demonstrate that IncEventGS delivers superior performance compared to prior NeRF-based methods and other related baselines, even we do not have the ground-truth camera poses. Furthermore, our method can also deliver better performance compared to state-of-the-art event visual odometry methods in terms of camera motion estimation.\n\n隐式神经表示和显式3D高斯点（3D-GS）技术在基于帧的相机（如RGB和RGB-D相机）的新视图合成方面取得了显著进展。相比于基于帧的相机，一种新型仿生视觉传感器——事件相机，展现了在高时间分辨率、高动态范围、低功耗和低延迟方面的优势。由于其独特的异步和不规则数据捕捉过程，现有应用于事件相机的神经表示或3D高斯点技术的工作较为有限。在本研究中，我们提出了IncEventGS，这是一种利用单个事件相机的增量式3D高斯点重建算法。为了逐步恢复3D场景表示，我们在IncEventGS中采用了传统SLAM管线中的跟踪与建图范式。在接收到事件流后，跟踪器首先基于之前重建的3D-GS场景表示估计初始相机运动。然后，建图器根据跟踪器之前估计的运动轨迹，联合优化3D场景表示和相机运动。实验结果表明，即使没有真实的相机位姿数据，IncEventGS在性能上优于先前基于NeRF的方法和其他相关基准。此外，我们的方法在相机运动估计方面也优于当前最先进的事件视觉里程计方法。\n"
  },
  {
    "path": "abs/2410.08129.md",
    "content": "### Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency\n\n3D Gaussian Splats (3DGS) have proven a versatile rendering primitive, both for inverse rendering as well as real-time exploration of scenes. In these applications, coherence across camera frames and multiple views is crucial, be it for robust convergence of a scene reconstruction or for artifact-free fly-throughs. Recent work started mitigating artifacts that break multi-view coherence, including popping artifacts due to inconsistent transparency sorting and perspective-correct outlines of (2D) splats. At the same time, real-time requirements forced such implementations to accept compromises in how transparency of large assemblies of 3D Gaussians is resolved, in turn breaking coherence in other ways. In our work, we aim at achieving maximum coherence, by rendering fully perspective-correct 3D Gaussians while using a high-quality approximation of accurate blending, hybrid transparency, on a per-pixel level, in order to retain real-time frame rates. Our fast and perspectively accurate approach for evaluation of 3D Gaussians does not require matrix inversions, thereby ensuring numerical stability and eliminating the need for special handling of degenerate splats, and the hybrid transparency formulation for blending maintains similar quality as fully resolved per-pixel transparencies at a fraction of the rendering costs. We further show that each of these two components can be independently integrated into Gaussian splatting systems. In combination, they achieve up to 2× higher frame rates, 2× faster optimization, and equal or better image quality with fewer rendering artifacts compared to traditional 3DGS on common benchmarks.\n\n"
  },
  {
    "path": "abs/2410.08181.md",
    "content": "### RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image\n\nThe generation of high-quality 3D car assets is essential for various applications, including video games, autonomous driving, and virtual reality. Current 3D generation methods utilizing NeRF or 3D-GS as representations for 3D objects, generate a Lambertian object under fixed lighting and lack separated modelings for material and global illumination. As a result, the generated assets are unsuitable for relighting under varying lighting conditions, limiting their applicability in downstream tasks. To address this challenge, we propose a novel relightable 3D object generative framework that automates the creation of 3D car assets, enabling the swift and accurate reconstruction of a vehicle's geometry, texture, and material properties from a single input image. Our approach begins with introducing a large-scale synthetic car dataset comprising over 1,000 high-precision 3D vehicle models. We represent 3D objects using global illumination and relightable 3D Gaussian primitives integrating with BRDF parameters. Building on this representation, we introduce a feed-forward model that takes images as input and outputs both relightable 3D Gaussians and global illumination parameters. Experimental results demonstrate that our method produces photorealistic 3D car assets that can be seamlessly integrated into road scenes with different illuminations, which offers substantial practical benefits for industrial applications.\n\n高质量3D汽车资产的生成对于包括视频游戏、自动驾驶和虚拟现实在内的多种应用至关重要。目前使用NeRF或3D-GS作为3D物体表示的生成方法，通常在固定光照下生成朗伯体对象，并且缺乏对材质和全局光照的独立建模。因此，生成的资产在不同光照条件下无法重新照明，限制了其在下游任务中的适用性。为了解决这一挑战，我们提出了一种全新的可重新照明3D物体生成框架，能够自动创建3D汽车资产，实现从单张输入图像快速且精确地重建车辆的几何结构、纹理和材质属性。我们的方法首先引入了一个大规模合成汽车数据集，包含超过1000个高精度3D车辆模型。我们使用结合BRDF参数的全局光照和可重新照明的3D高斯基元来表示3D对象。在此表示基础上，我们引入了一个前馈模型，能够将图像作为输入并输出可重新照明的3D高斯和全局光照参数。实验结果表明，我们的方法能够生成逼真的3D汽车资产，并可无缝集成到不同光照条件的道路场景中，这为工业应用提供了巨大的实用价值。\n"
  },
  {
    "path": "abs/2410.08190.md",
    "content": "### Poison-splat: Computation Cost Attack on 3D Gaussian Splatting\n\n3D Gaussian splatting (3DGS), known for its groundbreaking performance and efficiency, has become a dominant 3D representation and brought progress to many 3D vision tasks. However, in this work, we reveal a significant security vulnerability that has been largely overlooked in 3DGS: the computation cost of training 3DGS could be maliciously tampered by poisoning the input data. By developing an attack named Poison-splat, we reveal a novel attack surface where the adversary can poison the input images to drastically increase the computation memory and time needed for 3DGS training, pushing the algorithm towards its worst computation complexity. In extreme cases, the attack can even consume all allocable memory, leading to a Denial-of-Service (DoS) that disrupts servers, resulting in practical damages to real-world 3DGS service vendors. Such a computation cost attack is achieved by addressing a bi-level optimization problem through three tailored strategies: attack objective approximation, proxy model rendering, and optional constrained optimization. These strategies not only ensure the effectiveness of our attack but also make it difficult to defend with simple defensive measures. We hope the revelation of this novel attack surface can spark attention to this crucial yet overlooked vulnerability of 3DGS systems.\n\n3D高斯点（3DGS）因其突破性的性能和效率，已成为主流的3D表示方式，并推动了许多3D视觉任务的发展。然而，在本研究中，我们揭示了3DGS中一个被广泛忽视的重要安全漏洞：训练3DGS的计算成本可能被通过输入数据投毒进行恶意篡改。通过开发一种名为Poison-splat的攻击，我们揭示了一个新的攻击面，攻击者可以投毒输入图像，从而大幅增加3DGS训练所需的计算内存和时间，将算法推向其最糟糕的计算复杂度。在极端情况下，攻击甚至可以消耗所有可分配的内存，导致拒绝服务（DoS），中断服务器，给实际的3DGS服务提供商带来损害。此类计算成本攻击是通过解决一个双层优化问题并结合三种定制策略实现的：攻击目标近似、代理模型渲染和可选的约束优化。这些策略不仅确保了攻击的有效性，还使得简单的防御措施难以抵御此类攻击。我们希望这一新型攻击面的揭示能够引起人们对3DGS系统这一重要但被忽视的漏洞的关注。\n"
  },
  {
    "path": "abs/2410.08257.md",
    "content": "### Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics\n\nWhile humans effortlessly discern intrinsic dynamics and adapt to new scenarios, modern AI systems often struggle. Current methods for visual grounding of dynamics either use pure neural-network-based simulators (black box), which may violate physical laws, or traditional physical simulators (white box), which rely on expert-defined equations that may not fully capture actual dynamics. We propose the Neural Material Adaptor (NeuMA), which integrates existing physical laws with learned corrections, facilitating accurate learning of actual dynamics while maintaining the generalizability and interpretability of physical priors. Additionally, we propose Particle-GS, a particle-driven 3D Gaussian Splatting variant that bridges simulation and observed images, allowing back-propagate image gradients to optimize the simulator. Comprehensive experiments on various dynamics in terms of grounded particle accuracy, dynamic rendering quality, and generalization ability demonstrate that NeuMA can accurately capture intrinsic dynamics.\n\n尽管人类能够轻松识别内在动态并适应新场景，现代AI系统却常常面临挑战。当前的视觉动态定锚方法要么使用纯神经网络模拟器（黑箱），这可能违反物理定律，要么依赖传统的物理模拟器（白箱），这些方法依赖专家定义的方程，可能无法完全捕捉真实的动态。我们提出了一种名为Neural Material Adaptor (NeuMA) 的方法，它将现有的物理定律与学习到的修正相结合，能够在保持物理先验的普适性和可解释性的同时，精确学习实际的动态。此外，我们还提出了Particle-GS，这是一种基于粒子的3D高斯点云变体，用于连接模拟和观测到的图像，允许通过反向传播图像梯度来优化模拟器。在粒子精度、动态渲染质量和泛化能力等多方面的动态实验中，NeuMA展现了其对内在动态的准确捕捉能力。\n"
  },
  {
    "path": "abs/2410.08282.md",
    "content": "### FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction\n\nHumans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings. Emulating this capability, we introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. FusionSense addresses three key challenges: (i) How can robots efficiently acquire robust global shape information about the surrounding scene and objects? (ii) How can robots strategically select touch points on the object using geometric and common-sense priors? (iii) How can partial observations such as tactile signals improve the overall representation of the object? Our framework employs 3D Gaussian Splatting as a core representation and incorporates a hierarchical optimization strategy involving global structure construction, object visual hull pruning and local geometric constraints. This advancement results in fast and robust perception in environments with traditionally challenging objects that are transparent, reflective, or dark, enabling more downstream manipulation or navigation tasks. Experiments on real-world data suggest that our framework outperforms previously state-of-the-art sparse-view methods. All code and data are open-sourced on the project website.\n\n人类能够轻松地将常识知识与视觉和触觉的感官输入相结合，以理解周围环境。为了模拟这一能力，我们提出了FusionSense，这是一种新颖的3D重建框架，使机器人能够将基础模型中的先验知识与来自视觉和触觉传感器的高度稀疏观测数据相融合。FusionSense解决了三个关键挑战：(i) 机器人如何高效获取关于周围场景和物体的全局形状信息？(ii) 机器人如何利用几何和常识先验策略性地选择物体上的触点？(iii) 像触觉信号这样的部分观测如何改善物体的整体表示？我们的框架采用3D高斯点云作为核心表示，结合了一种分层优化策略，涉及全局结构构建、物体可视外壳剪裁以及局部几何约束。该进展在传统上具有挑战性的环境中（例如透明、反光或暗色的物体）实现了快速且鲁棒的感知，促进了更多后续操作或导航任务。在真实世界数据上的实验表明，我们的框架在性能上优于之前的稀疏视角方法。所有代码和数据均已在项目网站上开源。\n"
  },
  {
    "path": "abs/2410.08743.md",
    "content": "### Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization\n\n3D Gaussian Splatting has recently emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. However, like most novel-view synthesis approaches, it relies on accurate camera pose information, limiting its applicability in real-world scenarios where acquiring accurate camera poses can be challenging or even impossible. We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals. We derive the analytical gradients and integrate their computation with the existing high-performance CUDA implementation. This enables downstream tasks such as 6-DoF camera pose estimation as well as joint reconstruction and camera refinement. In particular, we achieve rapid convergence and high accuracy for pose estimation on real-world scenes. Our method enables fast reconstruction of 3D scenes without requiring accurate pose information by jointly optimizing geometry and camera poses, while achieving state-of-the-art results in novel-view synthesis. Our approach is considerably faster to optimize than most competing methods, and several times faster in rendering. We show results on real-world scenes and complex trajectories through simulated environments, achieving state-of-the-art results on LLFF while reducing runtime by two to four times compared to the most efficient competing method.\n\n3D Gaussian Splatting 最近作为一种快速且精确的新视角合成工具崭露头角，利用一组具有姿态的输入图像生成新视角。然而，与大多数新视角合成方法类似，它依赖于精确的相机姿态信息，这在现实场景中可能难以获取，甚至不可能实现。为此，我们提出了对3D Gaussian Splatting框架的扩展，通过优化与光度残差相关的外部相机参数来克服这一限制。我们推导了解析梯度，并将其计算与现有的高性能CUDA实现集成。这使得后续任务如六自由度（6-DoF）相机姿态估计以及联合重建和相机优化成为可能。特别是，我们在真实场景中实现了快速收敛和高精度的姿态估计。该方法能够在无需精确姿态信息的情况下，通过联合优化几何和相机姿态，快速重建3D场景，同时在新视角合成中达到最新的技术水平。与大多数竞争方法相比，我们的方法优化速度显著加快，渲染速度也提升了数倍。我们在真实场景和模拟环境中的复杂轨迹上展示了实验结果，在LLFF数据集上实现了最新的技术水平，并将运行时间减少了两到四倍，优于当前最有效的竞争方法。\n"
  },
  {
    "path": "abs/2410.08840.md",
    "content": "### Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars\n\nIn this paper, we propose to create animatable avatars for interacting hands with 3D Gaussian Splatting (GS) and single-image inputs. Existing GS-based methods designed for single subjects often yield unsatisfactory results due to limited input views, various hand poses, and occlusions. To address these challenges, we introduce a novel two-stage interaction-aware GS framework that exploits cross-subject hand priors and refines 3D Gaussians in interacting areas. Particularly, to handle hand variations, we disentangle the 3D presentation of hands into optimization-based identity maps and learning-based latent geometric features and neural texture maps. Learning-based features are captured by trained networks to provide reliable priors for poses, shapes, and textures, while optimization-based identity maps enable efficient one-shot fitting of out-of-distribution hands. Furthermore, we devise an interaction-aware attention module and a self-adaptive Gaussian refinement module. These modules enhance image rendering quality in areas with intra- and inter-hand interactions, overcoming the limitations of existing GS-based methods. Our proposed method is validated via extensive experiments on the large-scale InterHand2.6M dataset, and it significantly improves the state-of-the-art performance in image quality.\n\n在本文中，我们提出了一种基于3D高斯散射（GS）和单张图像输入的可动画交互手部头像生成方法。现有的基于GS的单主体方法由于输入视角有限、手部姿势多样以及遮挡问题，常常导致效果不佳。为了解决这些挑战，我们引入了一种新颖的两阶段交互感知GS框架，该框架利用跨主体的手部先验知识，并对交互区域的3D高斯进行精细化处理。特别地，为了处理手部变化，我们将手部的3D表示解耦为基于优化的身份映射和基于学习的潜在几何特征以及神经纹理图。基于学习的特征通过训练的网络捕捉，用于提供姿势、形状和纹理的可靠先验，而基于优化的身份映射则能够高效地进行超出分布手部的一次性拟合。此外，我们设计了一个交互感知注意力模块和一个自适应高斯细化模块。这些模块增强了手部内部及手部之间交互区域的图像渲染质量，克服了现有基于GS方法的局限性。通过在大规模InterHand2.6M数据集上的广泛实验验证，我们的方法在图像质量方面显著提升了当前最先进方法的性能。\n"
  },
  {
    "path": "abs/2410.08941.md",
    "content": "### MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering\n\nRecently, 3D Gaussian splatting has gained attention for its capability to generate high-fidelity rendering results. At the same time, most applications such as games, animation, and AR/VR use mesh-based representations to represent and render 3D scenes. We propose a novel approach that integrates mesh representation with 3D Gaussian splats to perform high-quality rendering of reconstructed real-world scenes. In particular, we introduce a distance-based Gaussian splatting technique to align the Gaussian splats with the mesh surface and remove redundant Gaussian splats that do not contribute to the rendering. We consider the distance between each Gaussian splat and the mesh surface to distinguish between tightly-bound and loosely-bound Gaussian splats. The tightly-bound splats are flattened and aligned well with the mesh geometry. The loosely-bound Gaussian splats are used to account for the artifacts in reconstructed 3D meshes in terms of rendering. We present a training strategy of binding Gaussian splats to the mesh geometry, and take into account both types of splats. In this context, we introduce several regularization techniques aimed at precisely aligning tightly-bound Gaussian splats with the mesh surface during the training process. We validate the effectiveness of our method on large and unbounded scene from mip-NeRF 360 and Deep Blending datasets. Our method surpasses recent mesh-based neural rendering techniques by achieving a 2dB higher PSNR, and outperforms mesh-based Gaussian splatting methods by 1.3 dB PSNR, particularly on the outdoor mip-NeRF 360 dataset, demonstrating better rendering quality. We provide analyses for each type of Gaussian splat and achieve a reduction in the number of Gaussian splats by 30% compared to the original 3D Gaussian splatting.\n\n最近，3D高斯散射因其生成高保真渲染结果的能力而受到关注。同时，大多数应用（如游戏、动画和AR/VR）使用基于网格的表示来表达和渲染3D场景。我们提出了一种新颖的方法，将网格表示与3D高斯散射相结合，以对重建的真实世界场景进行高质量渲染。具体来说，我们引入了一种基于距离的高斯散射技术，将高斯点与网格表面对齐，并移除对渲染无贡献的冗余高斯点。我们考虑每个高斯点与网格表面之间的距离，以区分紧密绑定的高斯点和松散绑定的高斯点。紧密绑定的高斯点被压平并与网格几何形状对齐，而松散绑定的高斯点则用于修正重建3D网格中的渲染瑕疵。我们提出了一种将高斯点绑定到网格几何形状的训练策略，并同时考虑两种类型的高斯点。在此背景下，我们引入了几种正则化技术，旨在训练过程中精确对齐紧密绑定的高斯点与网格表面。我们在mip-NeRF 360和Deep Blending数据集上的大规模和无界场景中验证了我们方法的有效性。我们的方法相较于最近的基于网格的神经渲染技术，在PSNR上提升了2dB，并且在mip-NeRF 360的户外数据集上，较基于网格的高斯散射方法提升了1.3dB PSNR，展现了更好的渲染质量。我们对每种类型的高斯点进行了分析，并将高斯点的数量相比原始3D高斯散射减少了30%。\n"
  },
  {
    "path": "abs/2410.09292.md",
    "content": "### SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction\n\nAccurate 3D reconstruction of dynamic surgical scenes from endoscopic video is essential for robotic-assisted surgery. While recent 3D Gaussian Splatting methods have shown promise in achieving high-quality reconstructions with fast rendering speeds, their use of inverse depth loss functions compresses depth variations. This can lead to a loss of fine geometric details, limiting their ability to capture precise 3D geometry and effectiveness in intraoperative application. To address these challenges, we present SurgicalGS, a dynamic 3D Gaussian Splatting framework specifically designed for surgical scene reconstruction with improved geometric accuracy. Our approach first initialises a Gaussian point cloud using depth priors, employing binary motion masks to identify pixels with significant depth variations and fusing point clouds from depth maps across frames for initialisation. We use the Flexible Deformation Model to represent dynamic scene and introduce a normalised depth regularisation loss along with an unsupervised depth smoothness constraint to ensure more accurate geometric reconstruction. Extensive experiments on two real surgical datasets demonstrate that SurgicalGS achieves state-of-the-art reconstruction quality, especially in terms of accurate geometry, advancing the usability of 3D Gaussian Splatting in robotic-assisted surgery.\n\n从内窥镜视频中精确重建动态手术场景对于机器人辅助手术至关重要。尽管近期的3D高斯散射方法在实现高质量重建和快速渲染方面展现了潜力，但其使用的反深度损失函数压缩了深度变化，导致细微几何细节的丢失，限制了捕捉精确3D几何形状的能力，进而影响其在术中应用的效果。为了解决这些问题，我们提出了SurgicalGS，这是一个专为手术场景重建设计的动态3D高斯散射框架，能够提升几何精度。我们的方法首先使用深度先验初始化高斯点云，利用二值运动掩码识别具有显著深度变化的像素，并通过融合多个帧的深度图点云进行初始点云的生成。我们采用灵活的变形模型来表示动态场景，并引入归一化深度正则化损失和无监督深度平滑约束，以确保更加精确的几何重建。在两个真实手术数据集上的大量实验表明，SurgicalGS在几何精度方面达到了当前最先进的重建质量，推动了3D高斯散射在机器人辅助手术中的实用性。\n"
  },
  {
    "path": "abs/2410.09467.md",
    "content": "### Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors\n\n3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild. Accurately reconstructing an object's complete 3D structure and texture has numerous applications in real-world scenarios, including robotic manipulation, grasping, 3D scene understanding, and AR/VR. Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture by optimizing the efficient representation of Gaussian Splatting, guided by pre-trained 2D or 3D diffusion models. However, a notable disparity exists between the training datasets of these models, leading to distinct differences in their outputs. While 2D models generate highly detailed visuals, they lack cross-view consistency in geometry and texture. In contrast, 3D models ensure consistency across different views but often result in overly smooth textures. We propose bridging the gap between 2D and 3D diffusion models to address this limitation by integrating a two-stage frequency-based distillation loss with Gaussian Splatting. Specifically, we leverage geometric priors in the low-frequency spectrum from a 3D diffusion model to maintain consistent geometry and use a 2D diffusion model to refine the fidelity and texture in the high-frequency spectrum of the generated 3D structure, resulting in more detailed and fine-grained outcomes. Our approach enhances geometric consistency and visual quality, outperforming the current SOTA. Additionally, we demonstrate the easy adaptability of our method for efficient object pose estimation and tracking.\n\n从单张图像生成3D物体需要估计出在自然环境中拍摄的无姿态RGB图像的完整3D几何形状和纹理。准确重建物体的完整3D结构和纹理在现实世界中有广泛的应用，例如机器人操作、抓取、3D场景理解以及AR/VR。最近在3D物体生成方面的进展引入了通过优化高效的高斯散射表示，结合预训练的2D或3D扩散模型来重建物体的3D形状和纹理。然而，这些模型的训练数据集存在显著差异，导致输出结果的不同。2D模型尽管生成了高度细致的视觉效果，但在几何形状和纹理的一致性上表现欠佳。相反，3D模型能够确保跨视图的一致性，但通常会导致纹理过于平滑。为了解决这一局限性，我们提出了通过结合2D和3D扩散模型来弥合这一差距的方法，并将其与高斯散射相集成，采用两阶段的基于频率的蒸馏损失。具体来说，我们利用3D扩散模型在低频谱中的几何先验来保持几何一致性，同时使用2D扩散模型在高频谱中精细化生成的3D结构的保真度和纹理，从而生成更加详细和精细的结果。我们的方法提升了几何一致性和视觉质量，超越了当前的最先进技术。此外，我们展示了该方法在高效物体姿态估计和跟踪方面的易于适应性。\n"
  },
  {
    "path": "abs/2410.09740.md",
    "content": "### Gaussian Splatting Visual MPC for Granular Media Manipulation\n\nRecent advancements in learned 3D representations have enabled significant progress in solving complex robotic manipulation tasks, particularly for rigid-body objects. However, manipulating granular materials such as beans, nuts, and rice, remains challenging due to the intricate physics of particle interactions, high-dimensional and partially observable state, inability to visually track individual particles in a pile, and the computational demands of accurate dynamics prediction. Current deep latent dynamics models often struggle to generalize in granular material manipulation due to a lack of inductive biases. In this work, we propose a novel approach that learns a visual dynamics model over Gaussian splatting representations of scenes and leverages this model for manipulating granular media via Model-Predictive Control. Our method enables efficient optimization for complex manipulation tasks on piles of granular media. We evaluate our approach in both simulated and real-world settings, demonstrating its ability to solve unseen planning tasks and generalize to new environments in a zero-shot transfer. We also show significant prediction and manipulation performance improvements compared to existing granular media manipulation methods.\n\n最近在学习型3D表示方面的进展极大推动了复杂机器人操作任务的解决，特别是对刚体物体的操作。然而，操控颗粒状材料（如豆类、坚果和大米）仍然具有挑战性，这是由于颗粒相互作用的复杂物理特性、高维且部分可观测的状态、难以在一堆颗粒中可视化跟踪单个颗粒，以及对准确动态预测的高计算要求。目前的深度潜在动态模型在颗粒材料操作中常因缺乏归纳偏差而难以泛化。在这项工作中，我们提出了一种新颖的方法，通过对场景的高斯散射表示学习视觉动态模型，并利用该模型通过模型预测控制（Model-Predictive Control）来操控颗粒介质。我们的方法能够有效优化复杂的颗粒介质堆操作任务。我们在仿真和真实环境中评估了该方法，展示了其在解决未见过的规划任务和零样本迁移至新环境中的泛化能力。与现有的颗粒介质操作方法相比，我们的方法在预测和操作性能上也有显著提升。\n"
  },
  {
    "path": "abs/2410.10412.md",
    "content": "### 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting\n\n3D neural style transfer has gained significant attention for its potential to provide user-friendly stylization with spatial consistency. However, existing 3D style transfer methods often fall short in terms of inference efficiency, generalization ability, and struggle to handle dynamic scenes with temporal consistency. In this paper, we introduce 4DStyleGaussian, a novel 4D style transfer framework designed to achieve real-time stylization of arbitrary style references while maintaining reasonable content affinity, multi-view consistency, and temporal coherence. Our approach leverages an embedded 4D Gaussian Splatting technique, which is trained using a reversible neural network for reducing content loss in the feature distillation process. Utilizing the 4D embedded Gaussians, we predict a 4D style transformation matrix that facilitates spatially and temporally consistent style transfer with Gaussian Splatting. Experiments demonstrate that our method can achieve high-quality and zero-shot stylization for 4D scenarios with enhanced efficiency and spatial-temporal consistency.\n\n3D神经风格迁移因其能够提供用户友好的风格化并保持空间一致性而备受关注。然而，现有的3D风格迁移方法在推理效率、泛化能力方面往往不足，且在处理具有时间一致性的动态场景时存在挑战。在本文中，我们提出了一种新颖的4D风格迁移框架——4DStyleGaussian，旨在实现任意风格参考的实时风格化，同时保持合理的内容关联性、多视角一致性以及时间连贯性。我们的方法利用了嵌入式4D高斯散射技术，该技术通过可逆神经网络进行训练，从而在特征蒸馏过程中减少内容损失。借助嵌入的4D高斯，我们预测了一个4D风格变换矩阵，以实现具有高斯散射的空间和时间一致的风格迁移。实验表明，我们的方法在4D场景中能够实现高质量的零样本风格迁移，并在效率和时空一致性方面得到了显著提升。\n"
  },
  {
    "path": "abs/2410.10719.md",
    "content": "### 4-LEGS: 4D Language Embedded Gaussian Splatting\n\nThe emergence of neural representations has revolutionized our means for digitally viewing a wide range of 3D scenes, enabling the synthesis of photorealistic images rendered from novel views. Recently, several techniques have been proposed for connecting these low-level representations with the high-level semantics understanding embodied within the scene. These methods elevate the rich semantic understanding from 2D imagery to 3D representations, distilling high-dimensional spatial features onto 3D space. In our work, we are interested in connecting language with a dynamic modeling of the world. We show how to lift spatio-temporal features to a 4D representation based on 3D Gaussian Splatting. This enables an interactive interface where the user can spatiotemporally localize events in the video from text prompts. We demonstrate our system on public 3D video datasets of people and animals performing various actions.\n\n神经表示的出现彻底改变了我们数字化观看各种3D场景的方式，使得从新视角合成照片级真实感的图像成为可能。近期，一些技术被提出，用于将这些低级表示与场景中的高级语义理解相连接。这些方法将来自2D图像的丰富语义理解提升至3D表示，通过在3D空间上蒸馏高维空间特征。在我们的工作中，我们关注如何将语言与动态世界的建模相结合。我们展示了如何基于3D高斯散射将时空特征提升至4D表示，这使得用户可以通过文本提示在视频中时空定位事件。我们在公共的3D视频数据集上展示了系统的效果，这些数据集包含了人类和动物执行各种动作的场景。\n"
  },
  {
    "path": "abs/2410.11080.md",
    "content": "### Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting\n\n3D Gaussian splatting has surpassed neural radiance field methods in novel view synthesis by achieving lower computational costs and real-time high-quality rendering. Although it produces a high-quality rendering with a lot of input views, its performance drops significantly when only a few views are available. In this work, we address this by proposing a depth-aware Gaussian splatting method for few-shot novel view synthesis. We use monocular depth prediction as a prior, along with a scale-invariant depth loss, to constrain the 3D shape under just a few input views. We also model color using lower-order spherical harmonics to avoid overfitting. Further, we observe that removing splats with lower opacity periodically, as performed in the original work, leads to a very sparse point cloud and, hence, a lower-quality rendering. To mitigate this, we retain all the splats, leading to a better reconstruction in a few view settings. Experimental results show that our method outperforms the traditional 3D Gaussian splatting methods by achieving improvements of 10.5% in peak signal-to-noise ratio, 6% in structural similarity index, and 14.1% in perceptual similarity, thereby validating the effectiveness of our approach.\n\n3D高斯散射在新视角合成中已超越神经辐射场方法，实现了更低的计算成本和实时高质量渲染。尽管在大量输入视角下能生成高质量的渲染，但当仅有少量视角时，其性能会显著下降。在本工作中，我们提出了一种深度感知的高斯散射方法，专门用于少样本的新视角合成。我们使用单目深度预测作为先验，并结合尺度不变的深度损失来约束在少量输入视角下的3D形状。此外，我们采用低阶球谐函数来建模颜色，以避免过拟合。此外，我们观察到在原始方法中定期移除低不透明度的散点会导致点云过于稀疏，从而降低渲染质量。为了解决这一问题，我们保留了所有的散点，从而在少视角设置下实现了更好的重建。实验结果表明，我们的方法在峰值信噪比（PSNR）上提高了10.5%，结构相似性指数（SSIM）上提高了6%，感知相似性上提高了14.1%，验证了我们方法的有效性。\n"
  },
  {
    "path": "abs/2410.11285.md",
    "content": "### Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting\n\nScene reconstruction and novel-view synthesis for large, complex, multi-story, indoor scenes is a challenging and time-consuming task. Prior methods have utilized drones for data capture and radiance fields for scene reconstruction, both of which present certain challenges. First, in order to capture diverse viewpoints with the drone's front-facing camera, some approaches fly the drone in an unstable zig-zag fashion, which hinders drone-piloting and generates motion blur in the captured data. Secondly, most radiance field methods do not easily scale to arbitrarily large number of images. This paper proposes an efficient and scalable pipeline for indoor novel-view synthesis from drone-captured 360 videos using 3D Gaussian Splatting. 360 cameras capture a wide set of viewpoints, allowing for comprehensive scene capture under a simple straightforward drone trajectory. To scale our method to large scenes, we devise a divide-and-conquer strategy to automatically split the scene into smaller blocks that can be reconstructed individually and in parallel. We also propose a coarse-to-fine alignment strategy to seamlessly match these blocks together to compose the entire scene. Our experiments demonstrate marked improvement in both reconstruction quality, i.e. PSNR and SSIM, and computation time compared to prior approaches.\n\n对于大规模、复杂的多层室内场景，场景重建和新视角合成是一项充满挑战且耗时的任务。以往的方法使用无人机进行数据捕捉和辐射场进行场景重建，但面临一些挑战。首先，为了使用无人机的前置摄像头捕捉多样化的视角，一些方法采用不稳定的Z字形飞行模式，这不仅影响无人机的操作，还会导致捕捉数据时出现运动模糊。其次，大多数辐射场方法难以轻松扩展至任意大量的图像。本论文提出了一种高效且可扩展的管线，利用3D高斯散射从无人机捕捉的360度视频中进行室内新视角合成。360度相机捕捉到了广泛的视角范围，允许在简单直线飞行轨迹下全面捕捉场景。为了使我们的方法能够扩展到大场景，我们设计了一种分而治之的策略，自动将场景划分为可独立并行重建的小块。我们还提出了一种由粗到精的对齐策略，能够无缝匹配这些小块，从而构建整个场景。实验表明，与以往方法相比，我们的方法在重建质量（如PSNR和SSIM）和计算时间上都有显著提升。\n"
  },
  {
    "path": "abs/2410.11356.md",
    "content": "### GSORB-SLAM: Gaussian Splatting SLAM benefits from ORB features and Transmittance\n\nThe emergence of 3D Gaussian Splatting (3DGS) has recently sparked a renewed wave of dense visual SLAM research. However, current methods face challenges such as sensitivity to artifacts and noise, sub-optimal selection of training viewpoints, and a lack of light global optimization. In this paper, we propose a dense SLAM system that tightly couples 3DGS with ORB features. We design a joint optimization approach for robust tracking and effectively reducing the impact of noise and artifacts. This involves combining novel geometric observations, derived from accumulated transmittance, with ORB features extracted from pixel data. Furthermore, to improve mapping quality, we propose an adaptive Gaussian expansion and regularization method that enables Gaussian primitives to represent the scene compactly. This is coupled with a viewpoint selection strategy based on the hybrid graph to mitigate over-fitting effects and enhance convergence quality. Finally, our approach achieves compact and high-quality scene representations and accurate localization. GSORB-SLAM has been evaluated on different datasets, demonstrating outstanding performance.\n\n随着3D高斯散射（3DGS）的出现，密集视觉SLAM研究再次掀起了一股新的热潮。然而，当前的方法面临诸多挑战，例如对伪影和噪声的敏感性、训练视点选择不佳以及全局优化不足等问题。在本文中，我们提出了一种将3DGS与ORB特征紧密结合的密集SLAM系统。我们设计了一种联合优化方法，以实现鲁棒的跟踪，并有效减少噪声和伪影的影响。这种方法结合了基于累积透射率的几何观测与从像素数据中提取的ORB特征。此外，为了提升地图构建的质量，我们提出了一种自适应高斯扩展和正则化方法，使得高斯基元能够紧凑地表示场景。我们还引入了一种基于混合图的视点选择策略，以减少过拟合现象并增强收敛质量。最终，我们的方法实现了紧凑且高质量的场景表示和精确的定位。GSORB-SLAM在不同数据集上进行了评估，表现出了卓越的性能。\n\n"
  },
  {
    "path": "abs/2410.11394.md",
    "content": "### MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields\n\nRadiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering. However, with sparse input views, the lack of multi-view consistency constraints results in poorly initialized point clouds and unreliable heuristics for optimization and densification, leading to suboptimal performance. Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images. Additionally, they rely on multi-view stereo (MVS)-based initialization, which limits the efficiency of scene representation. To overcome these challenges, we propose a view synthesis framework based on 3D Gaussian Splatting, named MCGS, enabling photorealistic scene reconstruction from sparse input views. The key innovations of MCGS in enhancing multi-view consistency are as follows: i) We introduce an initialization method by leveraging a sparse matcher combined with a random filling strategy, yielding a compact yet sufficient set of initial points. This approach enhances the initial geometry prior, promoting efficient scene representation. ii) We develop a multi-view consistency-guided progressive pruning strategy to refine the Gaussian field by strengthening consistency and eliminating low-contribution Gaussians. These modular, plug-and-play strategies enhance robustness to sparse input views, accelerate rendering, and reduce memory consumption, making MCGS a practical and efficient framework for 3D Gaussian Splatting.\n\n以3D高斯表示的辐射场在新视角合成中表现出色，既具备高效的训练能力，又能实现快速渲染。然而，在稀疏输入视角的情况下，缺乏多视角一致性约束，导致点云初始化较差，优化和密集化过程中依赖的不可靠启发式方法，进而导致性能不佳。现有方法通常借助于密集估计网络中的深度先验，但忽视了输入图像中固有的多视角一致性。此外，它们依赖于基于多视角立体（MVS）的初始化，这限制了场景表示的效率。为了解决这些问题，我们提出了基于3D高斯散射的视角合成框架——MCGS，能够从稀疏输入视角进行照片级真实感的场景重建。MCGS在增强多视角一致性方面的关键创新如下：\ni) 我们引入了一种初始化方法，结合稀疏匹配器和随机填充策略，生成紧凑但足够的初始点集。这种方法增强了几何先验，促进了高效的场景表示。\nii) 我们开发了一种基于多视角一致性的渐进修剪策略，通过加强一致性并消除低贡献的高斯点来优化高斯场。这些模块化的即插即用策略提升了稀疏输入视角下的鲁棒性，加速了渲染并减少了内存消耗，使MCGS成为一个实用且高效的3D高斯散射框架。\n"
  },
  {
    "path": "abs/2410.11419.md",
    "content": "### GS^3: Efficient Relighting with Triple Gaussian Splatting\n\nWe present a spatial and angular Gaussian based representation and a triple splatting process, for real-time, high-quality novel lighting-and-view synthesis from multi-view point-lit input images. To describe complex appearance, we employ a Lambertian plus a mixture of angular Gaussians as an effective reflectance function for each spatial Gaussian. To generate self-shadow, we splat all spatial Gaussians towards the light source to obtain shadow values, which are further refined by a small multi-layer perceptron. To compensate for other effects like global illumination, another network is trained to compute and add a per-spatial-Gaussian RGB tuple. The effectiveness of our representation is demonstrated on 30 samples with a wide variation in geometry (from solid to fluffy) and appearance (from translucent to anisotropic), as well as using different forms of input data, including rendered images of synthetic/reconstructed objects, photographs captured with a handheld camera and a flash, or from a professional lightstage. We achieve a training time of 40-70 minutes and a rendering speed of 90 fps on a single commodity GPU. Our results compare favorably with state-of-the-art techniques in terms of quality/performance.\n\n我们提出了一种基于空间和角度高斯表示的三重散射过程，用于从多视角点光源输入图像中进行实时、高质量的新光照和视角合成。为了描述复杂的外观，我们为每个空间高斯采用了朗伯反射与混合角度高斯的有效反射函数。为了生成自阴影，我们将所有空间高斯投影到光源方向以获取阴影值，这些值随后通过一个小型多层感知机（MLP）进行进一步优化。为了补偿全局光照等其他效果，我们训练了另一个网络来计算并为每个空间高斯添加RGB三元组。我们的表示在几何形状（从实心到松软）和外观（从半透明到各向异性）变化广泛的30个样本上得到了验证，并使用了不同形式的输入数据，包括合成/重建对象的渲染图像、手持相机加闪光灯拍摄的照片，以及专业光照台的图像。我们在一块普通GPU上实现了40到70分钟的训练时间和每秒90帧的渲染速度。我们的结果在质量和性能方面与最先进的技术相比表现优越。\n"
  },
  {
    "path": "abs/2410.11505.md",
    "content": "### LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images\n\nVisual localization involves estimating a query image's 6-DoF (degrees of freedom) camera pose, which is a fundamental component in various computer vision and robotic tasks. This paper presents LoGS, a vision-based localization pipeline utilizing the 3D Gaussian Splatting (GS) technique as scene representation. This novel representation allows high-quality novel view synthesis. During the mapping phase, structure-from-motion (SfM) is applied first, followed by the generation of a GS map. During localization, the initial position is obtained through image retrieval, local feature matching coupled with a PnP solver, and then a high-precision pose is achieved through the analysis-by-synthesis manner on the GS map. Experimental results on four large-scale datasets demonstrate the proposed approach's SoTA accuracy in estimating camera poses and robustness under challenging few-shot conditions.\n\n视觉定位涉及估计查询图像的6自由度（6-DoF）相机姿态，这是各种计算机视觉和机器人任务中的核心组成部分。本文提出了LoGS，一个基于视觉的定位管线，利用3D高斯散射（GS）技术作为场景表示。该新颖的表示方法支持高质量的新视角合成。在建图阶段，首先应用结构从运动（SfM）方法，然后生成GS地图。在定位过程中，初始位置通过图像检索获得，并结合局部特征匹配和PnP求解器，接着通过在GS地图上的基于合成分析的方式获得高精度姿态。四个大规模数据集上的实验结果表明，该方法在相机姿态估计方面达到了当前最先进（SoTA）的精度，并在少样本挑战条件下表现出较强的鲁棒性。\n"
  },
  {
    "path": "abs/2410.11682.md",
    "content": "### SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars\n\nRecent advancements in head avatar rendering using Gaussian primitives have achieved significantly high-fidelity results. Although precise head geometry is crucial for applications like mesh reconstruction and relighting, current methods struggle to capture intricate geometric details and render unseen poses due to their reliance on similarity transformations, which cannot handle stretch and shear transforms essential for detailed deformations of geometry. To address this, we propose SurFhead, a novel method that reconstructs riggable head geometry from RGB videos using 2D Gaussian surfels, which offer well-defined geometric properties, such as precise depth from fixed ray intersections and normals derived from their surface orientation, making them advantageous over 3D counterparts. SurFhead ensures high-fidelity rendering of both normals and images, even in extreme poses, by leveraging classical mesh-based deformation transfer and affine transformation interpolation. SurFhead introduces precise geometric deformation and blends surfels through polar decomposition of transformations, including those affecting normals. Our key contribution lies in bridging classical graphics techniques, such as mesh-based deformation, with modern Gaussian primitives, achieving state-of-the-art geometry reconstruction and rendering quality. Unlike previous avatar rendering approaches, SurFhead enables efficient reconstruction driven by Gaussian primitives while preserving high-fidelity geometry.\n\n在头部头像渲染领域，基于高斯基元的最新进展已经实现了显著的高保真效果。尽管精确的头部几何形状对于网格重建和光照再现等应用至关重要，但现有方法由于依赖相似变换，难以捕捉复杂的几何细节并渲染未见过的姿态。相似变换无法处理几何变形中所需的拉伸和剪切变换，从而限制了细节的表现。为了解决这一问题，我们提出了SurFhead，这是一种新方法，使用2D高斯表面体（surfels）从RGB视频中重建可操控的头部几何形状。2D高斯表面体具有明确的几何特性，例如通过固定射线交点精确获取深度，并且可根据表面方向推导法线，相比3D高斯基元具有优势。SurFhead通过结合经典的基于网格的形变传递和仿射变换插值，实现了在极端姿态下的高保真法线和图像渲染。SurFhead通过对影响法线的变换进行极分解，引入了精确的几何变形和表面体的混合。我们的核心贡献在于将经典图形技术（如基于网格的形变）与现代高斯基元结合，实现了最先进的几何重建和渲染质量。与以往的头像渲染方法不同，SurFhead在通过高斯基元驱动的高效重建的同时，保持了高保真的几何形态。\n"
  },
  {
    "path": "abs/2410.12080.md",
    "content": "### SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection\n\nImage-based Pose-Agnostic 3D Anomaly Detection is an important task that has emerged in industrial quality control. This task seeks to find anomalies from query images of a tested object given a set of reference images of an anomaly-free object. The challenge is that the query views (a.k.a poses) are unknown and can be different from the reference views. Currently, new methods such as OmniposeAD and SplatPose have emerged to bridge the gap by synthesizing pseudo reference images at the query views for pixel-to-pixel comparison. However, none of these methods can infer in real-time, which is critical in industrial quality control for massive production. For this reason, we propose SplatPose+, which employs a hybrid representation consisting of a Structure from Motion (SfM) model for localization and a 3D Gaussian Splatting (3DGS) model for Novel View Synthesis. Although our proposed pipeline requires the computation of an additional SfM model, it offers real-time inference speeds and faster training compared to SplatPose. Quality-wise, we achieved a new SOTA on the Pose-agnostic Anomaly Detection benchmark with the Multi-Pose Anomaly Detection (MAD-SIM) dataset.\n\n基于图像的姿态无关3D异常检测是工业质量控制中出现的一项重要任务。该任务旨在通过一组无异常物体的参考图像，从待测物体的查询图像中发现异常。其挑战在于查询视角（即姿态）未知，且可能与参考视角不同。目前，诸如OmniposeAD和SplatPose等新方法通过在查询视角下合成伪参考图像进行像素级对比，试图缩小差距。然而，这些方法均无法实现实时推断，而实时性在大规模生产的工业质量控制中至关重要。为此，我们提出了SplatPose+，该方法采用了混合表示形式，结合运动结构（SfM）模型进行定位，并利用3D高斯点云（3DGS）模型进行新视角合成。尽管我们的方法需要额外计算SfM模型，但相比SplatPose，它在推断速度和训练速度上更具实时性。质量方面，我们在姿态无关异常检测基准测试中，利用Multi-Pose Anomaly Detection (MAD-SIM) 数据集，达到了新的SOTA（最先进技术）水平。\n"
  },
  {
    "path": "abs/2410.12262.md",
    "content": "### 3D Gaussian Splatting in Robotics: A Survey\n\nDense 3D representations of the environment have been a long-term goal in the robotics field. While previous Neural Radiance Fields (NeRF) representation have been prevalent for its implicit, coordinate-based model, the recent emergence of 3D Gaussian Splatting (3DGS) has demonstrated remarkable potential in its explicit radiance field representation. By leveraging 3D Gaussian primitives for explicit scene representation and enabling differentiable rendering, 3DGS has shown significant advantages over other radiance fields in real-time rendering and photo-realistic performance, which is beneficial for robotic applications. In this survey, we provide a comprehensive understanding of 3DGS in the field of robotics. We divide our discussion of the related works into two main categories: the application of 3DGS and the advancements in 3DGS techniques. In the application section, we explore how 3DGS has been utilized in various robotics tasks from scene understanding and interaction perspectives. The advance of 3DGS section focuses on the improvements of 3DGS own properties in its adaptability and efficiency, aiming to enhance its performance in robotics. We then summarize the most commonly used datasets and evaluation metrics in robotics. Finally, we identify the challenges and limitations of current 3DGS methods and discuss the future development of 3DGS in robotics.\n\n密集的3D环境表示一直是机器人领域的长期目标。尽管先前的神经辐射场（NeRF）因其基于坐标的隐式模型而广受欢迎，但近期3D高斯散射（3DGS）的出现展示了其显式辐射场表示的巨大潜力。通过利用3D高斯基元进行显式场景表示并实现可微渲染，3DGS在实时渲染和照片级真实感性能方面表现出显著优势，这对机器人应用非常有利。在本综述中，我们对3DGS在机器人领域的应用进行了全面的探讨。我们将相关工作分为两个主要类别：3DGS的应用和3DGS技术的进展。在应用部分，我们探讨了3DGS如何从场景理解和交互的角度应用于各种机器人任务。在3DGS技术进展部分，我们聚焦于其自适应性和效率的提升，旨在增强其在机器人领域的性能。随后，我们总结了机器人领域中最常用的数据集和评估指标。最后，我们指出了当前3DGS方法的挑战和局限性，并讨论了3DGS在机器人领域未来的发展方向。\n"
  },
  {
    "path": "abs/2410.12781.md",
    "content": "### Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats\n\nWe propose Long-LRM, a generalizable 3D Gaussian reconstruction model that is capable of reconstructing a large scene from a long sequence of input images. Specifically, our model can process 32 source images at 960x540 resolution within only 1.3 seconds on a single A100 80G GPU. Our architecture features a mixture of the recent Mamba2 blocks and the classical transformer blocks which allowed many more tokens to be processed than prior work, enhanced by efficient token merging and Gaussian pruning steps that balance between quality and efficiency. Unlike previous feed-forward models that are limited to processing 1~4 input images and can only reconstruct a small portion of a large scene, Long-LRM reconstructs the entire scene in a single feed-forward step. On large-scale scene datasets such as DL3DV-140 and Tanks and Temples, our method achieves performance comparable to optimization-based approaches while being two orders of magnitude more efficient.\n\n我们提出了Long-LRM，这是一个可扩展的3D高斯重建模型，能够从长序列的输入图像中重建大规模场景。具体来说，我们的模型能够在一块A100 80G GPU上仅用1.3秒处理32张分辨率为960x540的源图像。我们的架构结合了近期的Mamba2模块和经典的Transformer模块，能够处理比以往工作更多的tokens，并通过高效的token合并和高斯修剪步骤在质量与效率之间取得平衡。与之前受限于处理1至4张输入图像、只能重建场景一小部分的前馈模型不同，Long-LRM能够在单次前馈步骤中重建整个场景。在像DL3DV-140和Tanks and Temples这样的大规模场景数据集上，我们的方法在性能上与基于优化的方法相当，但效率却高出两个数量级。\n"
  },
  {
    "path": "abs/2410.13195.md",
    "content": "### UniG: Modelling Unitary 3D Gaussians for View-consistent 3D Reconstruction\n\nIn this work, we present UniG, a view-consistent 3D reconstruction and novel view synthesis model that generates a high-fidelity representation of 3D Gaussians from sparse images. Existing 3D Gaussians-based methods usually regress Gaussians per-pixel of each view, create 3D Gaussians per view separately, and merge them through point concatenation. Such a view-independent reconstruction approach often results in a view inconsistency issue, where the predicted positions of the same 3D point from different views may have discrepancies. To address this problem, we develop a DETR (DEtection TRansformer)-like framework, which treats 3D Gaussians as decoder queries and updates their parameters layer by layer by performing multi-view cross-attention (MVDFA) over multiple input images. In this way, multiple views naturally contribute to modeling a unitary representation of 3D Gaussians, thereby making 3D reconstruction more view-consistent. Moreover, as the number of 3D Gaussians used as decoder queries is irrespective of the number of input views, allow an arbitrary number of input images without causing memory explosion. Extensive experiments validate the advantages of our approach, showcasing superior performance over existing methods quantitatively (improving PSNR by 4.2 dB when trained on Objaverse and tested on the GSO benchmark) and qualitatively.\n\n在这项工作中，我们提出了UniG，这是一种视角一致的3D重建和新视角合成模型，可以从稀疏图像中生成高保真的3D高斯表示。现有基于3D高斯的方法通常针对每个视图的每个像素回归高斯，在每个视图上分别创建3D高斯，并通过点拼接进行合并。这种视图独立的重建方法常常导致视角不一致的问题，即不同视图下预测的同一个3D点的位置可能存在偏差。为了解决这个问题，我们开发了一个类似DETR（DEtection TRansformer）的框架，将3D高斯视为解码器查询，通过在多个输入图像上执行多视图交叉注意力（MVDFA），逐层更新其参数。通过这种方式，多个视图自然共同作用于3D高斯的单一表示，从而使3D重建更加视角一致。此外，作为解码器查询使用的3D高斯数量与输入视图数量无关，这允许任意数量的输入图像而不会导致内存爆炸。大量实验验证了我们方法的优势，展示了在现有方法上定量（在Objaverse上训练并在GSO基准上测试时，PSNR提高了4.2 dB）和定性方面的优越表现。\n"
  },
  {
    "path": "abs/2410.13280.md",
    "content": "### Hybrid bundle-adjusting 3D Gaussians for view consistent rendering with pose optimization\n\nNovel view synthesis has made significant progress in the field of 3D computer vision. However, the rendering of view-consistent novel views from imperfect camera poses remains challenging. In this paper, we introduce a hybrid bundle-adjusting 3D Gaussians model that enables view-consistent rendering with pose optimization. This model jointly extract image-based and neural 3D representations to simultaneously generate view-consistent images and camera poses within forward-facing scenes. The effective of our model is demonstrated through extensive experiments conducted on both real and synthetic datasets. These experiments clearly illustrate that our model can effectively optimize neural scene representations while simultaneously resolving significant camera pose misalignments.\n\n新视角合成在3D计算机视觉领域取得了显著进展，然而，从不完美的相机位姿中渲染视角一致的新视图仍然充满挑战。在本文中，我们引入了一种混合捆绑调整的3D高斯模型，该模型通过位姿优化实现视角一致的渲染。该模型联合提取基于图像和神经3D表示，在前向场景中同时生成视角一致的图像和相机位姿。通过在真实和合成数据集上进行的大量实验，我们证明了该模型的有效性。这些实验清晰地展示了该模型能够有效优化神经场景表示，同时解决显著的相机位姿错位问题。\n"
  },
  {
    "path": "abs/2410.13349.md",
    "content": "### GlossyGS: Inverse Rendering of Glossy Objects with 3D Gaussian Splatting\n\nReconstructing objects from posed images is a crucial and complex task in computer graphics and computer vision. While NeRF-based neural reconstruction methods have exhibited impressive reconstruction ability, they tend to be time-comsuming. Recent strategies have adopted 3D Gaussian Splatting (3D-GS) for inverse rendering, which have led to quick and effective outcomes. However, these techniques generally have difficulty in producing believable geometries and materials for glossy objects, a challenge that stems from the inherent ambiguities of inverse rendering. To address this, we introduce GlossyGS, an innovative 3D-GS-based inverse rendering framework that aims to precisely reconstruct the geometry and materials of glossy objects by integrating material priors. The key idea is the use of micro-facet geometry segmentation prior, which helps to reduce the intrinsic ambiguities and improve the decomposition of geometries and materials. Additionally, we introduce a normal map prefiltering strategy to more accurately simulate the normal distribution of reflective surfaces. These strategies are integrated into a hybrid geometry and material representation that employs both explicit and implicit methods to depict glossy objects. We demonstrate through quantitative analysis and qualitative visualization that the proposed method is effective to reconstruct high-fidelity geometries and materials of glossy objects, and performs favorably against state-of-the-arts.\n\n从有姿态的图像重建物体是计算机图形学和计算机视觉中的一项关键且复杂的任务。虽然基于NeRF的神经重建方法展示了令人印象深刻的重建能力，但这些方法往往耗时较长。最近的策略采用了3D高斯散射（3D-GS）进行逆向渲染，取得了快速且有效的成果。然而，这些技术通常难以为光滑物体生成逼真的几何形状和材质，主要是由于逆向渲染中的固有模糊性。为了解决这一问题，我们提出了GlossyGS，这是一个创新的基于3D-GS的逆向渲染框架，旨在通过整合材质先验精确重建光滑物体的几何形状和材质。其核心思想是使用微面几何分割先验，帮助减少固有模糊性，并改进几何形状和材质的分解。此外，我们引入了法线图预过滤策略，以更准确地模拟反射表面的法线分布。这些策略集成到一个混合几何和材质表示中，结合了显式和隐式方法来描绘光滑物体。通过定量分析和定性可视化，我们证明了所提出的方法在重建光滑物体的高保真几何形状和材质方面是有效的，并且相较于现有的最先进方法表现优越。\n"
  },
  {
    "path": "abs/2410.13530.md",
    "content": "### L3DG: Latent 3D Gaussian Diffusion\n\nWe propose L3DG, the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation. This enables effective generative 3D modeling, scaling to generation of entire room-scale scenes which can be very efficiently rendered. To enable effective synthesis of 3D Gaussians, we propose a latent diffusion formulation, operating in a compressed latent space of 3D Gaussians. This compressed latent space is learned by a vector-quantized variational autoencoder (VQ-VAE), for which we employ a sparse convolutional architecture to efficiently operate on room-scale scenes. This way, the complexity of the costly generation process via diffusion is substantially reduced, allowing higher detail on object-level generation, as well as scalability to large scenes. By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time. We demonstrate that our approach significantly improves visual quality over prior work on unconditional object-level radiance field synthesis and showcase its applicability to room-scale scene generation.\n\n我们提出了L3DG，这是首个通过潜在3D高斯扩散公式进行3D高斯生成建模的方法。这使得高效的3D生成建模成为可能，扩展至生成整个房间规模的场景，并且这些场景可以非常高效地渲染。为实现3D高斯的有效合成，我们提出了一种在压缩的3D高斯潜在空间中运行的扩散公式。该压缩的潜在空间由向量量化变分自编码器（VQ-VAE）学习，我们采用稀疏卷积架构，以便高效处理房间规模的场景。通过这种方式，基于扩散的高成本生成过程的复杂性得到了大幅降低，使得在对象级生成中获得更高细节，同时能够扩展至大规模场景。通过利用3D高斯表示，生成的场景可以从任意视角实时渲染。我们展示了该方法在无条件对象级辐射场合成上的视觉质量相较于之前的工作有显著提升，并展示了其在房间规模场景生成中的适用性。\n"
  },
  {
    "path": "abs/2410.13607.md",
    "content": "### DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering\n\nDynamic scenes rendering is an intriguing yet challenging problem. Although current methods based on NeRF have achieved satisfactory performance, they still can not reach real-time levels. Recently, 3D Gaussian Splatting (3DGS) has gar?nered researchers attention due to their outstanding rendering quality and real?time speed. Therefore, a new paradigm has been proposed: defining a canonical 3D gaussians and deforming it to individual frames in deformable fields. How?ever, since the coordinates of canonical 3D gaussians are filled with noise, which can transfer noise into the deformable fields, and there is currently no method that adequately considers the aggregation of 4D information. Therefore, we pro?pose Denoised Deformable Network with Temporal-Spatial Aggregation for Dy?namic Scene Rendering (DN-4DGS). Specifically, a Noise Suppression Strategy is introduced to change the distribution of the coordinates of the canonical 3D gaussians and suppress noise. Additionally, a Decoupled Temporal-Spatial Ag?gregation Module is designed to aggregate information from adjacent points and frames. Extensive experiments on various real-world datasets demonstrate that our method achieves state-of-the-art rendering quality under a real-time level.\n\n动态场景渲染是一个引人入胜但具有挑战性的问题。尽管基于NeRF的当前方法已取得了令人满意的表现，但它们仍无法达到实时渲染的水平。最近，3D高斯散射（3DGS）因其卓越的渲染质量和实时速度而引起了研究者的关注。因此，一种新的范式被提出：定义标准的3D高斯并将其变形应用到可变形场中的各个帧。然而，由于标准3D高斯的坐标充满噪声，这些噪声可能会传递到可变形场中，目前还没有方法能够充分考虑4D信息的聚合。为了解决这个问题，我们提出了带有时空聚合的去噪可变形网络用于动态场景渲染（DN-4DGS）。具体来说，我们引入了一种噪声抑制策略，以改变标准3D高斯坐标的分布并抑制噪声。此外，我们设计了一个解耦时空聚合模块，以从相邻点和帧中聚合信息。通过在多个真实世界数据集上的大量实验，我们的方法在实时水平下实现了最先进的渲染质量。\n\n"
  },
  {
    "path": "abs/2410.13613.md",
    "content": "### MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes\n\n4D Gaussian Splatting (4DGS) has recently emerged as a promising technique for capturing complex dynamic 3D scenes with high fidelity. It utilizes a 4D Gaussian representation and a GPU-friendly rasterizer, enabling rapid rendering speeds. Despite its advantages, 4DGS faces significant challenges, notably the requirement of millions of 4D Gaussians, each with extensive associated attributes, leading to substantial memory and storage cost. This paper introduces a memory-efficient framework for 4DGS. We streamline the color attribute by decomposing it into a per-Gaussian direct color component with only 3 parameters and a shared lightweight alternating current color predictor. This approach eliminates the need for spherical harmonics coefficients, which typically involve up to 144 parameters in classic 4DGS, thereby creating a memory-efficient 4D Gaussian representation. Furthermore, we introduce an entropy-constrained Gaussian deformation technique that uses a deformation field to expand the action range of each Gaussian and integrates an opacity-based entropy loss to limit the number of Gaussians, thus forcing our model to use as few Gaussians as possible to fit a dynamic scene well. With simple half-precision storage and zip compression, our framework achieves a storage reduction by approximately 190× and 125× on the Technicolor and Neural 3D Video datasets, respectively, compared to the original 4DGS. Meanwhile, it maintains comparable rendering speeds and scene representation quality, setting a new standard in the field.\n\n4D高斯散射（4DGS）作为捕捉复杂动态3D场景的高保真技术，最近获得了广泛关注。它利用4D高斯表示和GPU友好的光栅化器，实现了快速渲染速度。尽管具有诸多优势，4DGS仍面临显著挑战，尤其是需要数百万个4D高斯，每个高斯都附带大量属性，导致巨大的内存和存储成本。本文提出了一种内存高效的4DGS框架。我们通过将颜色属性分解为每个高斯的直接颜色分量（仅需3个参数）和一个共享的轻量级交流色彩预测器，从而简化了颜色表示。这一方法消除了传统4DGS中常见的球谐函数系数，后者通常需要多达144个参数，创建了一种内存高效的4D高斯表示。此外，我们引入了一种受限熵的高斯变形技术，该技术使用变形场来扩展每个高斯的作用范围，并结合基于不透明度的熵损失，限制高斯数量，从而迫使模型使用尽可能少的高斯点来很好地拟合动态场景。通过简单的半精度存储和zip压缩，我们的框架在Technicolor和Neural 3D Video数据集上分别实现了约190倍和125倍的存储压缩，相比原始4DGS，在保持相似的渲染速度和场景表示质量的同时，设立了该领域的新标准。\n"
  },
  {
    "path": "abs/2410.13862.md",
    "content": "### DepthSplat: Connecting Gaussian Splatting and Depth\n\nGaussian splatting and single/multi-view depth estimation are typically studied in isolation. In this paper, we present DepthSplat to connect Gaussian splatting and depth estimation and study their interactions. More specifically, we first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features, leading to high-quality feed-forward 3D Gaussian splatting reconstructions. We also show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models from large-scale unlabelled datasets. We validate the synergy between Gaussian splatting and depth estimation through extensive ablation and cross-task transfer experiments. Our DepthSplat achieves state-of-the-art performance on ScanNet, RealEstate10K and DL3DV datasets in terms of both depth estimation and novel view synthesis, demonstrating the mutual benefits of connecting both tasks.\n\n高斯散射和单视角/多视角深度估计通常是独立研究的。在本文中，我们提出了DepthSplat，旨在连接高斯散射和深度估计，并研究它们之间的相互作用。具体而言，我们首先通过利用预训练的单目深度特征，贡献了一个鲁棒的多视角深度模型，从而实现了高质量的前馈式3D高斯散射重建。我们还展示了高斯散射可以作为一种无监督的预训练目标，从大规模未标注数据集中学习强大的深度模型。通过广泛的消融实验和跨任务转移实验，我们验证了高斯散射与深度估计之间的协同作用。我们的DepthSplat在ScanNet、RealEstate10K和DL3DV数据集上，在深度估计和新视角合成方面均达到了最先进的性能，展示了连接这两项任务的互惠优势。\n"
  },
  {
    "path": "abs/2410.14189.md",
    "content": "### Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set\n\n It is vital to infer a signed distance function (SDF) in multi-view based surface reconstruction. 3D Gaussian splatting (3DGS) provides a novel perspective for volume rendering, and shows advantages in rendering efficiency and quality. Although 3DGS provides a promising neural rendering option, it is still hard to infer SDFs for surface reconstruction with 3DGS due to the discreteness, the sparseness, and the off-surface drift of 3D Gaussians. To resolve these issues, we propose a method that seamlessly merge 3DGS with the learning of neural SDFs. Our key idea is to more effectively constrain the SDF inference with the multi-view consistency. To this end, we dynamically align 3D Gaussians on the zero-level set of the neural SDF using neural pulling, and then render the aligned 3D Gaussians through the differentiable rasterization. Meanwhile, we update the neural SDF by pulling neighboring space to the pulled 3D Gaussians, which progressively refine the signed distance field near the surface. With both differentiable pulling and splatting, we jointly optimize 3D Gaussians and the neural SDF with both RGB and geometry constraints, which recovers more accurate, smooth, and complete surfaces with more geometry details. Our numerical and visual comparisons show our superiority over the state-of-the-art results on the widely used benchmarks.\n\n 在基于多视角的表面重建中，推断有符号距离函数（SDF）是至关重要的。3D高斯散射（3DGS）为体积渲染提供了一种新颖的视角，并在渲染效率和质量方面展现了优势。尽管3DGS是一种有前景的神经渲染选择，但由于3D高斯的离散性、稀疏性以及偏离表面的漂移问题，使用3DGS推断SDF进行表面重建仍然具有困难。为了解决这些问题，我们提出了一种将3DGS与神经SDF学习无缝融合的方法。我们的关键思想是通过多视角一致性更有效地约束SDF的推断。为此，我们通过神经拉动动态地将3D高斯对齐到神经SDF的零级集上，然后通过可微光栅化渲染对齐后的3D高斯。同时，我们通过拉动邻近空间到被拉动的3D高斯上，逐步优化靠近表面的有符号距离场。通过可微分的拉动与散射，我们结合RGB和几何约束，共同优化3D高斯和神经SDF，从而恢复出更加精确、平滑且完整的表面，并展现更多的几何细节。我们的数值和视觉比较结果表明，在广泛使用的基准上，我们的方法优于当前最先进的成果。\n"
  },
  {
    "path": "abs/2410.14462.md",
    "content": "### LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes\n\nWe address the task of uplifting visual features or semantic masks from 2D vision models to 3D scenes represented by Gaussian Splatting. Whereas common approaches rely on iterative optimization-based procedures, we show that a simple yet effective aggregation technique yields excellent results. Applied to semantic masks from Segment Anything (SAM), our uplifting approach leads to segmentation quality comparable to the state of the art. We then extend this method to generic DINOv2 features, integrating 3D scene geometry through graph diffusion, and achieve competitive segmentation results despite DINOv2 not being trained on millions of annotated masks like SAM.\n\n我们研究了将2D视觉模型的视觉特征或语义掩码提升到由高斯散射表示的3D场景中的任务。与常见的基于迭代优化的方法不同，我们展示了一种简单但有效的聚合技术能够产生出色的结果。应用于来自Segment Anything（SAM）的语义掩码时，我们的提升方法在分割质量上可与当前最先进的方法媲美。随后，我们将该方法扩展到通用的DINOv2特征，通过图扩散集成3D场景几何信息，尽管DINOv2没有像SAM那样在数百万标注掩码上进行训练，但仍然取得了有竞争力的分割结果。\n"
  },
  {
    "path": "abs/2410.15392.md",
    "content": "### EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting\n\nScene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently low-frame-rate) scenarios. Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution, providing valuable scene and motion information in blind inter-frame intervals. In this paper, we introduce the event camera to aid scene construction from a casually captured video for the first time, and propose Event-Aided Free-Trajectory 3DGS, called EF-3DGS, which seamlessly integrates the advantages of event cameras into 3DGS through three key components. First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream. Second, we adopt the Contrast Maximization (CMax) framework in a piece-wise manner to extract motion information by maximizing the contrast of the Image of Warped Events (IWE), thereby calibrating the estimated poses. Besides, based on the Linear Event Generation Model (LEGM), the brightness information encoded in the IWE is also utilized to constrain the 3DGS in the gradient domain. Third, to mitigate the absence of color information of events, we introduce photometric bundle adjustment (PBA) to ensure view consistency across events and frames.We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS.\n\n从随意拍摄的视频中重建场景在实际应用中具有广泛用途。随着可微渲染技术的最新进展，一些方法尝试同时优化场景表示（如NeRF或3DGS）和相机位姿。然而，现有依赖传统相机输入的方法在高速（或等效的低帧率）场景中往往表现不佳。事件相机受生物视觉启发，能够以高时间分辨率异步记录像素级强度变化，为帧间的盲区提供宝贵的场景和运动信息。本文首次引入事件相机来辅助随意拍摄视频的场景重建，并提出了Event-Aided Free-Trajectory 3DGS（EF-3DGS），通过三个关键组件将事件相机的优势无缝集成到3DGS中。首先，我们利用事件生成模型（EGM）融合事件和帧，以事件流监督观察到的渲染视图。其次，我们在分段方式中采用对比度最大化（CMax）框架，通过最大化事件扭曲图像（IWE）的对比度来提取运动信息，从而校准估计的相机位姿。此外，基于线性事件生成模型（LEGM），IWE编码的亮度信息也被用于在梯度域约束3DGS。第三，为缓解事件缺少颜色信息的问题，我们引入光度束调整（PBA），以确保事件与帧之间的视角一致性。我们在公共的Tanks and Temples基准和一个新收集的真实数据集RealEv-DAVIS上对该方法进行了评估。\n\n"
  },
  {
    "path": "abs/2410.15629.md",
    "content": "### Fully Explicit Dynamic Gaussian Splatting\n\n3D Gaussian Splatting has shown fast and high-quality rendering results in static scenes by leveraging dense 3D prior and explicit representations. Unfortunately, the benefits of the prior and representation do not involve novel view synthesis for dynamic motions. Ironically, this is because the main barrier is the reliance on them, which requires increasing training and rendering times to account for dynamic motions. In this paper, we design a Explicit 4D Gaussian Splatting(Ex4DGS). Our key idea is to firstly separate static and dynamic Gaussians during training, and to explicitly sample positions and rotations of the dynamic Gaussians at sparse timestamps. The sampled positions and rotations are then interpolated to represent both spatially and temporally continuous motions of objects in dynamic scenes as well as reducing computational cost. Additionally, we introduce a progressive training scheme and a point-backtracking technique that improves Ex4DGS's convergence. We initially train Ex4DGS using short timestamps and progressively extend timestamps, which makes it work well with a few point clouds. The point-backtracking is used to quantify the cumulative error of each Gaussian over time, enabling the detection and removal of erroneous Gaussians in dynamic scenes. Comprehensive experiments on various scenes demonstrate the state-of-the-art rendering quality from our method, achieving fast rendering of 62 fps on a single 2080Ti GPU.\n\n3D高斯点云已在静态场景中展示了快速且高质量的渲染效果，这得益于其密集的3D先验和显式表示。然而，这些优势并未延伸至动态场景的新视角合成。讽刺的是，主要障碍正是对这些先验和表示的依赖，导致在处理动态场景时训练和渲染时间大幅增加。本文提出了一种显式4D高斯点云方法（Ex4DGS）。我们的核心思路是首先在训练过程中分离静态和动态高斯，并在稀疏时间戳上显式采样动态高斯的位置和旋转。采样的位置和旋转随后通过插值来表示动态场景中物体的空间和时间连续运动，同时降低了计算成本。此外，我们引入了一种渐进式训练方案和点回溯技术，以提高Ex4DGS的收敛性。我们初始使用较短的时间戳训练Ex4DGS，并逐步延长时间戳，从而使其在少量点云上表现良好。点回溯用于量化每个高斯随时间的累积误差，从而在动态场景中检测并移除误差较大的高斯。多场景的全面实验验证了我们方法的渲染质量，单块2080Ti GPU上实现了62 fps的快速渲染。\n"
  },
  {
    "path": "abs/2410.15636.md",
    "content": "### LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images\n\nRecent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, these methods often struggle with controllability, as they lack information from multiple views, leading to incomplete or inconsistent 3D reconstructions. To address this limitation, we introduce LucidFusion, a flexible end-to-end feed-forward framework that leverages the Relative Coordinate Map (RCM). Unlike traditional methods linking images to 3D world thorough pose, LucidFusion utilizes RCM to align geometric features coherently across different views, making it highly adaptable for 3D generation from arbitrary, unposed images. Furthermore, LucidFusion seamlessly integrates with the original single-image-to-3D pipeline, producing detailed 3D Gaussians at a resolution of 512×512, making it well-suited for a wide range of applications.\n\n近年来，大型重建模型在从单张图像生成高质量3D对象方面取得了显著进展。然而，这些方法通常缺乏多视角信息，导致3D重建的可控性差、结构不完整或不一致。为了解决这一局限，我们提出了LucidFusion，这是一种灵活的端到端前馈框架，利用了相对坐标图（RCM）。不同于通过姿态将图像与3D世界关联的传统方法，LucidFusion利用RCM在不同视图间一致地对齐几何特征，使其在任意、无姿态的图像下生成3D对象时具备高度适应性。此外，LucidFusion无缝集成到原有的单图像转3D管道中，能够在512×512分辨率下生成精细的3D高斯表示，适用于多种应用场景。\n"
  },
  {
    "path": "abs/2410.16266.md",
    "content": "### 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors\n\nNovel-view synthesis aims to generate novel views of a scene from multiple input images or videos, and recent advancements like 3D Gaussian splatting (3DGS) have achieved notable success in producing photorealistic renderings with efficient pipelines. However, generating high-quality novel views under challenging settings, such as sparse input views, remains difficult due to insufficient information in under-sampled areas, often resulting in noticeable artifacts. This paper presents 3DGS-Enhancer, a novel pipeline for enhancing the representation quality of 3DGS representations. We leverage 2D video diffusion priors to address the challenging 3D view consistency problem, reformulating it as achieving temporal consistency within a video generation process. 3DGS-Enhancer restores view-consistent latent features of rendered novel views and integrates them with the input views through a spatial-temporal decoder. The enhanced views are then used to fine-tune the initial 3DGS model, significantly improving its rendering performance. Extensive experiments on large-scale datasets of unbounded scenes demonstrate that 3DGS-Enhancer yields superior reconstruction performance and high-fidelity rendering results compared to state-of-the-art methods.\n\n新视角合成旨在从多个输入图像或视频中生成场景的新视角。近年来，3D高斯点云（3DGS）等方法在生成高效的写实渲染方面取得了显著进展。然而，在稀疏输入视角等具有挑战性的场景下生成高质量的新视角仍然困难，因欠采样区域信息不足，常导致明显的伪影问题。本文提出了一种新颖的增强3DGS表示质量的流程，称为3DGS-Enhancer。我们利用2D视频扩散先验来解决具有挑战性的3D视角一致性问题，将其重新表述为视频生成过程中的时间一致性问题。3DGS-Enhancer恢复了渲染的新视角的视角一致性潜在特征，并通过时空解码器将其与输入视角整合。增强后的视角用于微调初始的3DGS模型，从而显著提高其渲染性能。在大规模无边界场景数据集上的大量实验表明，3DGS-Enhancer相较于最先进的方法，能提供更优越的重建性能和高保真的渲染结果。\n"
  },
  {
    "path": "abs/2410.16272.md",
    "content": "### MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors\n\nDrag-based editing has become popular in 2D content creation, driven by the capabilities of image generative models. However, extending this technique to 3D remains a challenge. Existing 3D drag-based editing methods, whether employing explicit spatial transformations or relying on implicit latent optimization within limited-capacity 3D generative models, fall short in handling significant topology changes or generating new textures across diverse object categories. To overcome these limitations, we introduce MVDrag3D, a novel framework for more flexible and creative drag-based 3D editing that leverages multi-view generation and reconstruction priors. At the core of our approach is the usage of a multi-view diffusion model as a strong generative prior to perform consistent drag editing over multiple rendered views, which is followed by a reconstruction model that reconstructs 3D Gaussians of the edited object. While the initial 3D Gaussians may suffer from misalignment between different views, we address this via view-specific deformation networks that adjust the position of Gaussians to be well aligned. In addition, we propose a multi-view score function that distills generative priors from multiple views to further enhance the view consistency and visual quality. Extensive experiments demonstrate that MVDrag3D provides a precise, generative, and flexible solution for 3D drag-based editing, supporting more versatile editing effects across various object categories and 3D representations.\n\n拖拽编辑在2D内容创作中已受到广泛欢迎，这得益于图像生成模型的强大能力。然而，将该技术扩展到3D仍然面临挑战。现有的3D拖拽编辑方法，无论是采用显式空间变换，还是在有限容量的3D生成模型中依赖隐式潜在优化，都难以应对显著的拓扑变化或在多种对象类别中生成新纹理。为克服这些限制，我们提出了MVDrag3D，这是一种更加灵活和创意的拖拽式3D编辑框架，利用多视角生成和重建先验。我们的方法核心在于使用多视角扩散模型作为强大的生成先验，在多个渲染视图上执行一致的拖拽编辑，随后通过重建模型重建被编辑对象的3D高斯表示。尽管初始的3D高斯可能存在视图之间的错位，我们通过视图特定的变形网络来调整高斯的位置以实现良好的对齐。此外，我们提出了一个多视角得分函数，从多个视角中提取生成先验，进一步增强视图一致性和视觉质量。大量实验表明，MVDrag3D提供了一种精确、生成式且灵活的3D拖拽编辑解决方案，支持多种对象类别和3D表示的多样化编辑效果。\n"
  },
  {
    "path": "abs/2410.16978.md",
    "content": "### Multi-Layer Gaussian Splatting for Immersive Anatomy Visualization\n\nIn medical image visualization, path tracing of volumetric medical data like CT scans produces lifelike three-dimensional visualizations. Immersive VR displays can further enhance the understanding of complex anatomies. Going beyond the diagnostic quality of traditional 2D slices, they enable interactive 3D evaluation of anatomies, supporting medical education and planning. Rendering high-quality visualizations in real-time, however, is computationally intensive and impractical for compute-constrained devices like mobile headsets. We propose a novel approach utilizing GS to create an efficient but static intermediate representation of CT scans. We introduce a layered GS representation, incrementally including different anatomical structures while minimizing overlap and extending the GS training to remove inactive Gaussians. We further compress the created model with clustering across layers. Our approach achieves interactive frame rates while preserving anatomical structures, with quality adjustable to the target hardware. Compared to standard GS, our representation retains some of the explorative qualities initially enabled by immersive path tracing. Selective activation and clipping of layers are possible at rendering time, adding a degree of interactivity to otherwise static GS models. This could enable scenarios where high computational demands would otherwise prohibit using path-traced medical volumes.\n\n在医学图像可视化中，对CT扫描等体积医学数据进行路径追踪可以生成逼真的三维可视化。沉浸式VR显示进一步增强了对复杂解剖结构的理解，超越了传统二维切片的诊断质量，使得交互式三维解剖评估成为可能，支持医学教育和规划。然而，在实时渲染高质量的可视化效果时，计算量需求极高，对于移动头显等计算受限设备来说不切实际。\n我们提出了一种利用高斯点云（GS）的新方法，用于创建高效但静态的CT扫描中间表示。我们引入了分层的GS表示，逐层增量地包含不同的解剖结构，同时最小化重叠，并通过扩展GS训练来移除不活跃的高斯。我们进一步通过层间聚类对模型进行压缩，以提升效率。\n我们的方法在保留解剖结构的同时实现了交互帧率，并可根据目标硬件调整质量。与标准GS相比，我们的表示保留了沉浸式路径追踪初始提供的一些探索特性。渲染时可选择性地激活和裁剪各层，为静态GS模型增加了一定的交互性。这使得在高计算需求的场景下，原本无法使用路径追踪的医学体数据成为可能。\n"
  },
  {
    "path": "abs/2410.16995.md",
    "content": "### E-3DGS: Gaussian Splatting with Exposure and Motion Events\n\nEstimating Neural Radiance Fields (NeRFs) from images captured under optimal conditions has been extensively explored in the vision community. However, robotic applications often face challenges such as motion blur, insufficient illumination, and high computational overhead, which adversely affect downstream tasks like navigation, inspection, and scene visualization. To address these challenges, we propose E-3DGS, a novel event-based approach that partitions events into motion (from camera or object movement) and exposure (from camera exposure), using the former to handle fast-motion scenes and using the latter to reconstruct grayscale images for high-quality training and optimization of event-based 3D Gaussian Splatting (3DGS). We introduce a novel integration of 3DGS with exposure events for high-quality reconstruction of explicit scene representations. Our versatile framework can operate on motion events alone for 3D reconstruction, enhance quality using exposure events, or adopt a hybrid mode that balances quality and effectiveness by optimizing with initial exposure events followed by high-speed motion events. We also introduce EME-3D, a real-world 3D dataset with exposure events, motion events, camera calibration parameters, and sparse point clouds. Our method is faster and delivers better reconstruction quality than event-based NeRF while being more cost-effective than NeRF methods that combine event and RGB data by using a single event sensor. By combining motion and exposure events, E-3DGS sets a new benchmark for event-based 3D reconstruction with robust performance in challenging conditions and lower hardware demands.\n\n在视觉领域中，已广泛研究了从理想条件下拍摄的图像中估计神经辐射场（NeRFs）。然而，机器人应用通常面临运动模糊、光照不足和高计算开销等挑战，这些因素不利于导航、检测和场景可视化等下游任务。为应对这些挑战，我们提出了E-3DGS，这是一种基于事件的创新方法，将事件划分为运动（由相机或物体运动引起）和曝光（由相机曝光引起），前者用于处理快速运动场景，后者用于重建灰度图像，以便高质量训练和优化基于事件的3D高斯点云（3DGS）。我们首次将3DGS与曝光事件相结合，实现高质量的显式场景表示重建。\n我们的多功能框架可以仅依靠运动事件进行3D重建，通过曝光事件提升质量，或采用混合模式：首先使用初始曝光事件优化，再利用高速运动事件来平衡质量和效率。此外，我们引入了EME-3D，这是一种包含曝光事件、运动事件、相机校准参数和稀疏点云的真实3D数据集。相比事件驱动的NeRF，我们的方法更快且重建质量更高，同时相比那些结合事件和RGB数据的NeRF方法，由于仅使用单个事件传感器，具备更高的成本效益。通过结合运动和曝光事件，E-3DGS在具有挑战性的条件下实现了强大的3D重建表现，降低了硬件需求，树立了基于事件的3D重建新基准。\n"
  },
  {
    "path": "abs/2410.17084.md",
    "content": "### GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting\n\nIn this paper, we introduce GS-LIVM, a real-time photo-realistic LiDAR-Inertial-Visual mapping framework with Gaussian Splatting tailored for outdoor scenes. Compared to existing methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), our approach enables real-time photo-realistic mapping while ensuring high-quality image rendering in large-scale unbounded outdoor environments. In this work, Gaussian Process Regression (GPR) is employed to mitigate the issues resulting from sparse and unevenly distributed LiDAR observations. The voxel-based 3D Gaussians map representation facilitates real-time dense mapping in large outdoor environments with acceleration governed by custom CUDA kernels. Moreover, the overall framework is designed in a covariance-centered manner, where the estimated covariance is used to initialize the scale and rotation of 3D Gaussians, as well as update the parameters of the GPR. We evaluate our algorithm on several outdoor datasets, and the results demonstrate that our method achieves state-of-the-art performance in terms of mapping efficiency and rendering quality. The source code is available on GitHub.\n\n本文提出了GS-LIVM，这是一种基于高斯点云的实时写实LiDAR-惯性-视觉（LIV）映射框架，专为户外场景设计。与基于神经辐射场（NeRF）和3D高斯点云（3DGS）的现有方法相比，我们的方法能够在大规模无限制的户外环境中实现实时的写实映射，并确保高质量的图像渲染。为解决稀疏且不均匀分布的LiDAR观测带来的问题，我们采用高斯过程回归（GPR）。基于体素的3D高斯表示支持在大型户外环境中进行实时密集映射，并通过自定义CUDA内核进行加速。此外，整个框架以协方差为核心，估计的协方差用于初始化3D高斯的尺度和旋转，并更新GPR的参数。我们在多个户外数据集上对该算法进行了评估，结果显示该方法在映射效率和渲染质量方面达到了最先进的性能。\n"
  },
  {
    "path": "abs/2410.17249.md",
    "content": "### SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes\n\nWe present SpectroMotion, a novel approach that combines 3D Gaussian Splatting (3DGS) with physically-based rendering (PBR) and deformation fields to reconstruct dynamic specular scenes. Previous methods extending 3DGS to model dynamic scenes have struggled to accurately represent specular surfaces. Our method addresses this limitation by introducing a residual correction technique for accurate surface normal computation during deformation, complemented by a deformable environment map that adapts to time-varying lighting conditions. We implement a coarse-to-fine training strategy that significantly enhances both scene geometry and specular color prediction. We demonstrate that our model outperforms prior methods for view synthesis of scenes containing dynamic specular objects and that it is the only existing 3DGS method capable of synthesizing photorealistic real-world dynamic specular scenes, outperforming state-of-the-art methods in rendering complex, dynamic, and specular scenes.\n\n我们提出了 SpectroMotion，一种将三维高斯喷涂 (3D Gaussian Splatting, 3DGS) 与基于物理的渲染 (PBR) 和变形场相结合，用于重建动态的高光场景。先前将 3DGS 拓展到动态场景建模的方法在准确表现高光表面方面存在困难。我们的方法通过引入残差校正技术来准确计算变形过程中的表面法向量，并配合可变形的环境贴图以适应随时间变化的光照条件，从而解决了这一局限。我们采用由粗到细的训练策略，大幅提升了场景几何结构和高光颜色的预测效果。实验表明，我们的模型在包含动态高光物体的场景视图合成方面优于现有方法，并且是唯一能够合成逼真动态高光真实场景的 3DGS 方法，在渲染复杂的动态高光场景方面超越了最新技术。\n"
  },
  {
    "path": "abs/2410.17422.md",
    "content": "### AG-SLAM: Active Gaussian Splatting SLAM\n\nWe present AG-SLAM, the first active SLAM system utilizing 3D Gaussian Splatting (3DGS) for online scene reconstruction. In recent years, radiance field scene representations, including 3DGS have been widely used in SLAM and exploration, but actively planning trajectories for robotic exploration is still unvisited. In particular, many exploration methods assume precise localization and thus do not mitigate the significant risk of constructing a trajectory, which is difficult for a SLAM system to operate on. This can cause camera tracking failure and lead to failures in real-world robotic applications. Our method leverages Fisher Information to balance the dual objectives of maximizing the information gain for the environment while minimizing the cost of localization errors. Experiments conducted on the Gibson and Habitat-Matterport 3D datasets demonstrate state-of-the-art results of the proposed method.\n\n我们提出了 AG-SLAM，这是首个利用三维高斯喷涂 (3D Gaussian Splatting, 3DGS) 进行在线场景重建的主动 SLAM 系统。近年来，辐射场场景表示（包括 3DGS）在 SLAM 和环境探索中得到了广泛应用，但主动规划机器人探索的轨迹仍未被深入研究。尤其是，许多探索方法假设精确定位，从而未能解决构建难以用于 SLAM 系统的轨迹的显著风险。这可能导致摄像机跟踪失败，从而影响现实中的机器人应用。我们的方法利用费舍尔信息，在最大化环境信息增益和最小化定位误差成本的双重目标间实现平衡。基于 Gibson 和 Habitat-Matterport 3D 数据集的实验结果表明，所提出的方法达到了最新的技术水平。\n"
  },
  {
    "path": "abs/2410.17505.md",
    "content": "### PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting\n\nPrevious methods utilize the Neural Radiance Field (NeRF) for panoptic lifting, while their training and rendering speed are unsatisfactory. In contrast, 3D Gaussian Splatting (3DGS) has emerged as a prominent technique due to its rapid training and rendering speed. However, unlike NeRF, the conventional 3DGS may not satisfy the basic smoothness assumption as it does not rely on any parameterized structures to render (e.g., MLPs). Consequently, the conventional 3DGS is, in nature, more susceptible to noisy 2D mask supervision. In this paper, we propose a new method called PLGS that enables 3DGS to generate consistent panoptic segmentation masks from noisy 2D segmentation masks while maintaining superior efficiency compared to NeRF-based methods. Specifically, we build a panoptic-aware structured 3D Gaussian model to introduce smoothness and design effective noise reduction strategies. For the semantic field, instead of initialization with structure from motion, we construct reliable semantic anchor points to initialize the 3D Gaussians. We then use these anchor points as smooth regularization during training. Additionally, we present a self-training approach using pseudo labels generated by merging the rendered masks with the noisy masks to enhance the robustness of PLGS. For the instance field, we project the 2D instance masks into 3D space and match them with oriented bounding boxes to generate cross-view consistent instance masks for supervision. Experiments on various benchmarks demonstrate that our method outperforms previous state-of-the-art methods in terms of both segmentation quality and speed.\n\n以往的方法使用神经辐射场 (NeRF) 进行全景提升，但其训练和渲染速度不尽人意。相比之下，三维高斯喷涂 (3D Gaussian Splatting, 3DGS) 凭借快速的训练和渲染速度成为了一种显著技术。然而，与 NeRF 不同，传统的 3DGS 因不依赖任何参数化结构（如 MLPs）进行渲染，可能无法满足基本的平滑性假设。因此，传统的 3DGS 更容易受到噪声二维掩码监督的影响。本文提出了一种新的方法，称为 PLGS，该方法使 3DGS 能够在保持相较于基于 NeRF 方法的高效性的同时，从噪声二维分割掩码生成一致的全景分割掩码。具体来说，我们构建了一个全景感知的结构化 3D 高斯模型以引入平滑性，并设计了有效的噪声消减策略。在语义场景构建中，我们不采用基于运动结构初始化，而是构建可靠的语义锚点来初始化 3D 高斯，并在训练过程中将这些锚点作为平滑正则化。此外，我们提出了一种自训练方法，通过合并渲染掩码和噪声掩码生成伪标签，增强 PLGS 的鲁棒性。在实例场景构建中，我们将二维实例掩码投影到三维空间，并与定向包围盒匹配，以生成跨视角一致的实例掩码用于监督。各种基准测试实验表明，我们的方法在分割质量和速度方面均优于现有的最新方法。\n"
  },
  {
    "path": "abs/2410.17932.md",
    "content": "### VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points\n\nRecent advances in novel view synthesis (NVS), particularly neural radiance fields (NeRF) and Gaussian splatting (3DGS), have demonstrated impressive results in photorealistic scene rendering. These techniques hold great potential for applications in virtual tourism and teleportation, where immersive realism is crucial. However, the high-performance demands of virtual reality (VR) systems present challenges in directly utilizing even such fast-to-render scene representations like 3DGS due to latency and computational constraints.\nIn this paper, we propose foveated rendering as a promising solution to these obstacles. We analyze state-of-the-art NVS methods with respect to their rendering performance and compatibility with the human visual system. Our approach introduces a novel foveated rendering approach for Virtual Reality, that leverages the sharp, detailed output of neural point rendering for the foveal region, fused with a smooth rendering of 3DGS for the peripheral vision.\nOur evaluation confirms that perceived sharpness and detail-richness are increased by our approach compared to a standard VR-ready 3DGS configuration. Our system meets the necessary performance requirements for real-time VR interactions, ultimately enhancing the user's immersive experience.\n\n最新的视图合成 (NVS) 技术，尤其是神经辐射场 (NeRF) 和高斯喷涂 (3D Gaussian Splatting, 3DGS)，在真实感场景渲染方面展示了令人印象深刻的效果。这些技术在虚拟旅游和远程呈现等需要高度沉浸感的应用中具有巨大潜力。然而，虚拟现实 (VR) 系统对高性能的需求使得直接使用即便是快速渲染的场景表示（如 3DGS）也面临延迟和计算限制方面的挑战。\n本文提出了注视点渲染作为应对这些挑战的有效解决方案。我们分析了最新 NVS 方法在渲染性能及其与人类视觉系统兼容性方面的表现。我们的方法为 VR 引入了一种创新的注视点渲染技术，利用神经点渲染的清晰细节输出处理注视区域，并结合 3DGS 平滑渲染处理外周视觉区域。评估结果表明，与标准 VR 适配的 3DGS 配置相比，我们的方法显著提升了感知清晰度和细节丰富度。\n我们的系统满足实时 VR 交互所需的性能要求，从而进一步增强了用户的沉浸式体验。\n"
  },
  {
    "path": "abs/2410.18822.md",
    "content": "### Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis\n\nNovel view synthesis from sparse inputs is a vital yet challenging task in 3D computer vision. Previous methods explore 3D Gaussian Splatting with neural priors (e.g. depth priors) as an additional supervision, demonstrating promising quality and efficiency compared to the NeRF based methods. However, the neural priors from 2D pretrained models are often noisy and blurry, which struggle to precisely guide the learning of radiance fields. In this paper, We propose a novel method for synthesizing novel views from sparse views with Gaussian Splatting that does not require external prior as supervision. Our key idea lies in exploring the self-supervisions inherent in the binocular stereo consistency between each pair of binocular images constructed with disparity-guided image warping. To this end, we additionally introduce a Gaussian opacity constraint which regularizes the Gaussian locations and avoids Gaussian redundancy for improving the robustness and efficiency of inferring 3D Gaussians from sparse views. Extensive experiments on the LLFF, DTU, and Blender datasets demonstrate that our method significantly outperforms the state-of-the-art methods.\n\n稀疏输入下的视图合成是三维计算机视觉中一个重要但具有挑战性的任务。先前的方法探索了结合神经先验（如深度先验）作为额外监督的三维高斯喷涂（3D Gaussian Splatting），在质量和效率方面相比 NeRF 方法显示出可喜的进展。然而，由二维预训练模型生成的神经先验通常存在噪声和模糊性，难以有效引导辐射场的学习。本文提出了一种新的方法，通过高斯喷涂从稀疏视图生成新视图，而不需要外部先验作为监督。\n我们的核心思想在于利用双目立体一致性所固有的自监督，通过视差引导的图像变换构建的双目图像对之间的自监督来学习。为此，我们引入了一种高斯不透明度约束，以正则化高斯的位置，避免高斯冗余，从而提升从稀疏视图推理三维高斯的鲁棒性和效率。在 LLFF、DTU 和 Blender 数据集上的大量实验表明，我们的方法在效果上显著优于最新的方法。\n"
  },
  {
    "path": "abs/2410.18912.md",
    "content": "### Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling\n\nVideos of robots interacting with objects encode rich information about the objects' dynamics. However, existing video prediction approaches typically do not explicitly account for the 3D information from videos, such as robot actions and objects' 3D states, limiting their use in real-world robotic applications. In this work, we introduce a framework to learn object dynamics directly from multi-view RGB videos by explicitly considering the robot's action trajectories and their effects on scene dynamics. We utilize the 3D Gaussian representation of 3D Gaussian Splatting (3DGS) to train a particle-based dynamics model using Graph Neural Networks. This model operates on sparse control particles downsampled from the densely tracked 3D Gaussian reconstructions. By learning the neural dynamics model on offline robot interaction data, our method can predict object motions under varying initial configurations and unseen robot actions. The 3D transformations of Gaussians can be interpolated from the motions of control particles, enabling the rendering of predicted future object states and achieving action-conditioned video prediction. The dynamics model can also be applied to model-based planning frameworks for object manipulation tasks. We conduct experiments on various kinds of deformable materials, including ropes, clothes, and stuffed animals, demonstrating our framework's ability to model complex shapes and dynamics.\n\n机器人与物体交互的视频包含丰富的物体动态信息。然而，现有的视频预测方法通常未能明确利用视频中的三维信息（如机器人动作和物体的三维状态），从而限制了其在真实机器人应用中的使用。本文提出了一种框架，通过明确考虑机器人的动作轨迹及其对场景动态的影响，从多视角 RGB 视频中直接学习物体的动态。我们利用三维高斯喷涂 (3D Gaussian Splatting, 3DGS) 的三维高斯表示，通过图神经网络训练基于粒子的动态模型。该模型在从密集追踪的三维高斯重建中下采样的稀疏控制粒子上运行。\n通过对离线机器人交互数据学习神经动态模型，我们的方法能够预测物体在不同初始配置和未见过的机器人动作下的运动。高斯的三维变换可以通过控制粒子的运动进行插值，从而实现物体未来状态的预测渲染，达成基于动作的条件视频预测。该动态模型还可用于基于模型的规划框架，以完成物体操作任务。我们在包括绳索、衣物和填充玩具等多种可变形材料上进行了实验，验证了该框架在建模复杂形状和动态方面的能力。\n"
  },
  {
    "path": "abs/2410.18931.md",
    "content": "### Sort-free Gaussian Splatting via Weighted Sum Rendering\n\nRecently, 3D Gaussian Splatting (3DGS) has emerged as a significant advancement in 3D scene reconstruction, attracting considerable attention due to its ability to recover high-fidelity details while maintaining low complexity. Despite the promising results achieved by 3DGS, its rendering performance is constrained by its dependence on costly non-commutative alpha-blending operations. These operations mandate complex view dependent sorting operations that introduce computational overhead, especially on the resource-constrained platforms such as mobile phones. In this paper, we propose Weighted Sum Rendering, which approximates alpha blending with weighted sums, thereby removing the need for sorting. This simplifies implementation, delivers superior performance, and eliminates the \"popping\" artifacts caused by sorting. Experimental results show that optimizing a generalized Gaussian splatting formulation to the new differentiable rendering yields competitive image quality. The method was implemented and tested in a mobile device GPU, achieving on average 1.23× faster rendering.\n\n最近，三维高斯喷涂 (3D Gaussian Splatting, 3DGS) 在三维场景重建方面取得了显著进展，因其能够在保持低复杂度的同时恢复高保真细节而备受关注。尽管 3DGS 取得了有前景的结果，其渲染性能受到依赖于昂贵的非交换性 alpha 混合操作的限制。这些操作要求复杂的视角相关排序操作，尤其在移动设备等资源受限的平台上引入了计算开销。本文提出了加权求和渲染，通过加权求和来近似 alpha 混合，从而消除排序需求。这一方法简化了实现过程，提升了性能，并消除了因排序引起的“跳跃”伪影。实验结果表明，将通用的高斯喷涂公式优化至新的可微分渲染形式能够实现具有竞争力的图像质量。该方法在移动设备 GPU 上实现并测试，平均渲染速度提升了 1.23 倍。\n"
  },
  {
    "path": "abs/2410.18979.md",
    "content": "### PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views\n\nWe propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians for each view and cannot generalize well to more input views. Differently, our PixelGaussian dynamically adapts both the Gaussian distribution and quantity based on geometric complexity, leading to more efficient representations and significant improvements in reconstruction quality. Specifically, we introduce a Cascade Gaussian Adapter to adjust Gaussian distribution according to local geometry complexity identified by a keypoint scorer. CGA leverages deformable attention in context-aware hypernetworks to guide Gaussian pruning and splitting, ensuring accurate representation in complex regions while reducing redundancy. Furthermore, we design a transformer-based Iterative Gaussian Refiner module that refines Gaussian representations through direct image-Gaussian interactions. Our PixelGaussian can effectively reduce Gaussian redundancy as input views increase. We conduct extensive experiments on the large-scale ACID and RealEstate10K datasets, where our method achieves state-of-the-art performance with good generalization to various numbers of views.\n\n我们提出了 PixelGaussian，一种高效的前馈框架，用于从任意视角学习具有良好泛化性的三维高斯重建。大多数现有方法依赖于均匀的像素级高斯表示，每个视角学习固定数量的三维高斯，因此无法在输入视角增多时很好地泛化。与之不同的是，PixelGaussian 根据几何复杂度动态调整高斯分布和数量，从而实现更高效的表示，并显著提升重建质量。\n具体来说，我们引入了 级联高斯适配器 (Cascade Gaussian Adapter, CGA)，通过关键点评分器识别的局部几何复杂度来调整高斯分布。CGA 在具有上下文感知的超网络中利用可变形注意力来指导高斯剪枝和分裂，从而在复杂区域中保证精确表示，同时减少冗余。此外，我们设计了基于 Transformer 的迭代高斯细化器 (Iterative Gaussian Refiner) 模块，通过图像与高斯的直接交互来细化高斯表示。\nPixelGaussian 可以随着输入视角的增加有效减少高斯冗余。我们在大规模的 ACID 和 RealEstate10K 数据集上进行了大量实验，结果表明我们的方法在各种视角数量下表现出色，达到了最新的技术水平，并具有良好的泛化能力。\n"
  },
  {
    "path": "abs/2410.19657.md",
    "content": "### DiffGS: Functional Gaussian Splatting Diffusion\n\n3D Gaussian Splatting (3DGS) has shown convincing performance in rendering speed and fidelity, yet the generation of Gaussian Splatting remains a challenge due to its discreteness and unstructured nature. In this work, we propose DiffGS, a general Gaussian generator based on latent diffusion models. DiffGS is a powerful and efficient 3D generative model which is capable of generating Gaussian primitives at arbitrary numbers for high-fidelity rendering with rasterization. The key insight is to represent Gaussian Splatting in a disentangled manner via three novel functions to model Gaussian probabilities, colors and transforms. Through the novel disentanglement of 3DGS, we represent the discrete and unstructured 3DGS with continuous Gaussian Splatting functions, where we then train a latent diffusion model with the target of generating these Gaussian Splatting functions both unconditionally and conditionally. Meanwhile, we introduce a discretization algorithm to extract Gaussians at arbitrary numbers from the generated functions via octree-guided sampling and optimization. We explore DiffGS for various tasks, including unconditional generation, conditional generation from text, image, and partial 3DGS, as well as Point-to-Gaussian generation. We believe that DiffGS provides a new direction for flexibly modeling and generating Gaussian Splatting.\n\n三维高斯喷涂 (3D Gaussian Splatting, 3DGS) 在渲染速度和保真度方面表现出色，但由于其离散性和非结构化特性，高斯喷涂的生成仍然是一个挑战。本文提出了一种基于潜在扩散模型的通用高斯生成器，称为 DiffGS。DiffGS 是一种功能强大且高效的三维生成模型，能够生成任意数量的高斯基元，以实现基于光栅化的高保真渲染。\n核心思想在于通过三个新颖的函数以解耦的方式表示高斯喷涂，分别建模高斯概率、颜色和变换。通过对 3DGS 的这种新颖解耦，我们使用连续高斯喷涂函数来表示离散和非结构化的 3DGS，并训练潜在扩散模型，以生成无条件和有条件的高斯喷涂函数。此外，我们引入了一种离散化算法，通过八叉树引导的采样和优化，从生成的函数中提取任意数量的高斯基元。\n我们在多个任务中探索了 DiffGS，包括无条件生成、基于文本、图像和部分 3DGS 的条件生成，以及从点到高斯的生成。我们相信，DiffGS 为灵活建模和生成高斯喷涂开辟了新的方向。\n"
  },
  {
    "path": "abs/2410.20030.md",
    "content": "### SCube: Instant Large-Scale Scene Reconstruction using VoxSplats\n\nWe present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images. Our method encodes reconstructed scenes using a novel representation VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold. To reconstruct a VoxSplat from images, we employ a hierarchical voxel latent diffusion model conditioned on the input images followed by a feedforward appearance prediction model. The diffusion model generates high-resolution grids progressively in a coarse-to-fine manner, and the appearance network predicts a set of Gaussians within each voxel. From as few as 3 non-overlapping input images, SCube can generate millions of Gaussians with a 1024^3 voxel grid spanning hundreds of meters in 20 seconds. Past works tackling scene reconstruction from images either rely on per-scene optimization and fail to reconstruct the scene away from input views (thus requiring dense view coverage as input) or leverage geometric priors based on low-resolution models, which produce blurry results. In contrast, SCube leverages high-resolution sparse networks and produces sharp outputs from few views. We show the superiority of SCube compared to prior art using the Waymo self-driving dataset on 3D reconstruction and demonstrate its applications, such as LiDAR simulation and text-to-scene generation.\n\n我们提出了 SCube，一种从稀疏的姿态图像集合中重建大规模三维场景（几何、外观和语义）的新方法。我们的方法通过一种新的表示方法 VoxSplat 对重建的场景进行编码，该表示是一组在高分辨率稀疏体素框架上支持的三维高斯。为了从图像重建 VoxSplat，我们采用了条件在输入图像上的分层体素潜在扩散模型，随后是一个前馈外观预测模型。扩散模型以粗到细的方式逐步生成高分辨率网格，而外观网络在每个体素内预测一组高斯。\nSCube 能够仅从 3 张不重叠的输入图像中生成包含数百万个高斯的 1024³ 体素网格，覆盖数百米范围，并在 20 秒内完成生成。过去基于图像的场景重建工作要么依赖于每个场景的优化，无法重建远离输入视角的区域（因此需要密集的视角覆盖），要么依赖基于低分辨率模型的几何先验，导致结果模糊。相比之下，SCube 利用高分辨率稀疏网络，从少量视角生成清晰的输出。\n我们在 Waymo 自动驾驶数据集上展示了 SCube 在三维重建方面相较于现有技术的优越性，并展示了其应用场景，如 LiDAR 模拟和文本生成场景。\n"
  },
  {
    "path": "abs/2410.20593.md",
    "content": "### Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering\n\nRendering and reconstruction are long-standing topics in computer vision and graphics. Achieving both high rendering quality and accurate geometry is a challenge. Recent advancements in 3D Gaussian Splatting (3DGS) have enabled high-fidelity novel view synthesis at real-time speeds. However, the noisy and discrete nature of 3D Gaussian primitives hinders accurate surface estimation. Previous attempts to regularize 3D Gaussian normals often degrade rendering quality due to the fundamental disconnect between normal vectors and the rendering pipeline in 3DGS-based methods. Therefore, we introduce Normal-GS, a novel approach that integrates normal vectors into the 3DGS rendering pipeline. The core idea is to model the interaction between normals and incident lighting using the physically-based rendering equation. Our approach re-parameterizes surface colors as the product of normals and a designed Integrated Directional Illumination Vector (IDIV). To optimize memory usage and simplify optimization, we employ an anchor-based 3DGS to implicitly encode locally-shared IDIVs. Additionally, Normal-GS leverages optimized normals and Integrated Directional Encoding (IDE) to accurately model specular effects, enhancing both rendering quality and surface normal precision. Extensive experiments demonstrate that Normal-GS achieves near state-of-the-art visual quality while obtaining accurate surface normals and preserving real-time rendering performance.\n\n渲染和重建是计算机视觉和图形学中的长期研究课题，实现高渲染质量和准确几何形状的结合是一大挑战。最近，三维高斯喷涂 (3D Gaussian Splatting, 3DGS) 的进展使得在实时速度下实现高保真新视图合成成为可能。然而，3D 高斯基元的噪声和离散特性阻碍了精确的表面估计。先前试图对 3D 高斯法向量进行正则化的方法通常因法向量与 3DGS 渲染管道之间的根本脱节而导致渲染质量下降。因此，我们提出了 Normal-GS，一种将法向量集成到 3DGS 渲染管道中的新方法。\n核心思想是利用基于物理的渲染方程来建模法向量与入射光之间的交互。我们通过重新参数化表面颜色，将其表示为法向量与设计的集成方向光矢量 (Integrated Directional Illumination Vector, IDIV) 的乘积。为了优化内存使用并简化优化过程，我们采用基于锚点的 3DGS 以隐式编码局部共享的 IDIV。此外，Normal-GS 利用优化后的法向量和集成方向编码 (Integrated Directional Encoding, IDE) 精确建模高光效果，从而增强了渲染质量和表面法向精度。\n大量实验表明，Normal-GS 实现了接近最新技术的视觉质量，同时获得了精确的表面法向量并保持了实时渲染性能。\n"
  },
  {
    "path": "abs/2410.20686.md",
    "content": "### ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings\n\nOmnidirectional (or 360-degree) images are increasingly being used for 3D applications since they allow the rendering of an entire scene with a single image. Existing works based on neural radiance fields demonstrate successful 3D reconstruction quality on egocentric videos, yet they suffer from long training and rendering times. Recently, 3D Gaussian splatting has gained attention for its fast optimization and real-time rendering. However, directly using a perspective rasterizer to omnidirectional images results in severe distortion due to the different optical properties between two image domains. In this work, we present ODGS, a novel rasterization pipeline for omnidirectional images, with geometric interpretation. For each Gaussian, we define a tangent plane that touches the unit sphere and is perpendicular to the ray headed toward the Gaussian center. We then leverage a perspective camera rasterizer to project the Gaussian onto the corresponding tangent plane. The projected Gaussians are transformed and combined into the omnidirectional image, finalizing the omnidirectional rasterization process. This interpretation reveals the implicit assumptions within the proposed pipeline, which we verify through mathematical proofs. The entire rasterization process is parallelized using CUDA, achieving optimization and rendering speeds 100 times faster than NeRF-based methods. Our comprehensive experiments highlight the superiority of ODGS by delivering the best reconstruction and perceptual quality across various datasets. Additionally, results on roaming datasets demonstrate that ODGS restores fine details effectively, even when reconstructing large 3D scenes.\n\n全景（或 360 度）图像因其能够使用单张图像渲染整个场景，逐渐在三维应用中得到广泛应用。基于神经辐射场的现有方法在第一人称视角视频中展示了成功的三维重建质量，但在训练和渲染时间上较长。最近，三维高斯喷涂 (3D Gaussian Splatting) 因其快速优化和实时渲染性能而备受关注。然而，直接使用透视光栅化处理全景图像会因两种图像域的光学属性差异而产生严重的失真。\n为了解决这一问题，我们提出了一种具有几何解释的全景光栅化流水线 ODGS。对于每个高斯基元，我们定义一个与单位球相切且垂直于指向高斯中心的射线的切平面。然后，我们利用透视相机光栅化器将高斯投影到对应的切平面。投影后的高斯基元经过变换并组合到全景图像中，完成全景光栅化过程。该几何解释揭示了提出的流水线中的隐式假设，我们通过数学证明对此进行验证。\n整个光栅化过程使用 CUDA 并行化，实现了比基于 NeRF 的方法快 100 倍的优化和渲染速度。我们在多个数据集上进行了全面实验，结果表明 ODGS 在重建质量和感知质量方面表现优越。此外，在移动数据集上的结果表明，即使在重建大型三维场景时，ODGS 也能有效恢复细节。\n"
  },
  {
    "path": "abs/2410.20723.md",
    "content": "### CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians\n\nRecent breakthroughs in text-guided image generation have significantly advanced the field of 3D generation. While generating a single high-quality 3D object is now feasible, generating multiple objects with reasonable interactions within a 3D space, a.k.a. compositional 3D generation, presents substantial challenges. This paper introduces CompGS, a novel generative framework that employs 3D Gaussian Splatting (GS) for efficient, compositional text-to-3D content generation. To achieve this goal, two core designs are proposed: (1) 3D Gaussians Initialization with 2D compositionality: We transfer the well-established 2D compositionality to initialize the Gaussian parameters on an entity-by-entity basis, ensuring both consistent 3D priors for each entity and reasonable interactions among multiple entities; (2) Dynamic Optimization: We propose a dynamic strategy to optimize 3D Gaussians using Score Distillation Sampling (SDS) loss. CompGS first automatically decomposes 3D Gaussians into distinct entity parts, enabling optimization at both the entity and composition levels. Additionally, CompGS optimizes across objects of varying scales by dynamically adjusting the spatial parameters of each entity, enhancing the generation of fine-grained details, particularly in smaller entities. Qualitative comparisons and quantitative evaluations on T3Bench demonstrate the effectiveness of CompGS in generating compositional 3D objects with superior image quality and semantic alignment over existing methods. CompGS can also be easily extended to controllable 3D editing, facilitating scene generation. We hope CompGS will provide new insights to the compositional 3D generation.\n\n近年来，文本引导的图像生成取得了显著突破，大幅推进了三维生成领域的发展。尽管生成单个高质量的三维物体已变得可行，但在三维空间中生成具有合理交互的多个物体（即组合性三维生成）仍面临巨大挑战。本文提出了 CompGS，一种新颖的生成框架，通过三维高斯喷涂 (3D Gaussian Splatting, GS) 实现高效的组合性文本到三维内容生成。\n为实现这一目标，我们提出了两个核心设计：(1) 具有二维组合性的三维高斯初始化：我们将成熟的二维组合性转移到高斯参数的实体级别初始化上，确保每个实体具有一致的三维先验，并实现多实体间合理的交互；(2) 动态优化：我们提出了一种基于得分蒸馏采样 (SDS) 损失的动态策略，用于优化三维高斯。CompGS 首先自动将三维高斯分解为不同的实体部分，从而在实体和组合层面上进行优化。此外，CompGS 通过动态调整每个实体的空间参数来优化不同尺度的物体，增强了细粒度细节的生成，尤其是在较小实体的生成上。\n在 T3Bench 数据集上的定性比较和定量评估表明，CompGS 在生成组合性三维物体方面优于现有方法，具有更高的图像质量和语义对齐度。此外，CompGS 也可轻松扩展到可控的三维编辑，促进场景生成。我们希望 CompGS 能为组合性三维生成提供新的见解。\n"
  },
  {
    "path": "abs/2410.20789.md",
    "content": "### LoDAvatar: Hierarchical Embedding and Adaptive Levels of Detail with Gaussian Splatting for Enhanced Human Avatars\n\nWith the advancement of virtual reality, the demand for 3D human avatars is increasing. The emergence of Gaussian Splatting technology has enabled the rendering of Gaussian avatars with superior visual quality and reduced computational costs. Despite numerous methods researchers propose for implementing drivable Gaussian avatars, limited attention has been given to balancing visual quality and computational costs. In this paper, we introduce LoDAvatar, a method that introduces levels of detail into Gaussian avatars through hierarchical embedding and selective detail enhancement methods. The key steps of LoDAvatar encompass data preparation, Gaussian embedding, Gaussian optimization, and selective detail enhancement. We conducted experiments involving Gaussian avatars at various levels of detail, employing both objective assessments and subjective evaluations. The outcomes indicate that incorporating levels of detail into Gaussian avatars can decrease computational costs during rendering while upholding commendable visual quality, thereby enhancing runtime frame rates. We advocate adopting LoDAvatar to render multiple dynamic Gaussian avatars or extensive Gaussian scenes to balance visual quality and computational costs.\n\n随着虚拟现实的进步，对三维人类头像的需求日益增长。高斯喷涂技术的出现使得渲染高斯头像具有出色的视觉质量和较低的计算成本。尽管研究者提出了多种可驱动高斯头像的方法，但在平衡视觉质量和计算成本方面的研究较为有限。本文提出了 LoDAvatar，一种通过分层嵌入和选择性细节增强方法在高斯头像中引入细节层次的方法。\nLoDAvatar 的关键步骤包括数据准备、高斯嵌入、高斯优化以及选择性细节增强。我们对不同细节层次的高斯头像进行了实验，采用了客观评估和主观评价。结果表明，在高斯头像中引入细节层次可以在渲染过程中降低计算成本，同时保持出色的视觉质量，从而提升运行时的帧率。我们倡导采用 LoDAvatar 进行多个动态高斯头像或大规模高斯场景的渲染，以平衡视觉质量和计算成本。\n"
  },
  {
    "path": "abs/2410.20815.md",
    "content": "### Grid4D: 4D Decomposed Hash Encoding for High-fidelity Dynamic Gaussian Splatting\n\nRecently, Gaussian splatting has received more and more attention in the field of static scene rendering. Due to the low computational overhead and inherent flexibility of explicit representations, plane-based explicit methods are popular ways to predict deformations for Gaussian-based dynamic scene rendering models. However, plane-based methods rely on the inappropriate low-rank assumption and excessively decompose the space-time 4D encoding, resulting in overmuch feature overlap and unsatisfactory rendering quality. To tackle these problems, we propose Grid4D, a dynamic scene rendering model based on Gaussian splatting and employing a novel explicit encoding method for the 4D input through the hash encoding. Different from plane-based explicit representations, we decompose the 4D encoding into one spatial and three temporal 3D hash encodings without the low-rank assumption. Additionally, we design a novel attention module that generates the attention scores in a directional range to aggregate the spatial and temporal features. The directional attention enables Grid4D to more accurately fit the diverse deformations across distinct scene components based on the spatial encoded features. Moreover, to mitigate the inherent lack of smoothness in explicit representation methods, we introduce a smooth regularization term that keeps our model from the chaos of deformation prediction. Our experiments demonstrate that Grid4D significantly outperforms the state-of-the-art models in visual quality and rendering speed.\n\n近期，高斯喷涂在静态场景渲染领域受到越来越多关注。由于显式表示的低计算开销和灵活性，基于平面的显式方法在高斯动态场景渲染模型中成为预测形变的流行选择。然而，平面方法依赖于不恰当的低秩假设，并过度分解时空四维编码，导致特征重叠过多，渲染质量不理想。\n为了解决这些问题，我们提出了 Grid4D，一种基于高斯喷涂的动态场景渲染模型，采用了一种新颖的哈希编码显式方法处理四维输入。与基于平面的显式表示不同，我们将四维编码分解为一个空间和三个时间的三维哈希编码，避免了低秩假设。此外，我们设计了一种新的注意力模块，在特定方向范围内生成注意力得分，以聚合空间和时间特征。该方向性注意力使得 Grid4D 能够更准确地根据空间编码特征拟合不同场景组件的多样化形变。\n此外，为了缓解显式表示方法固有的平滑性不足问题，我们引入了一个平滑正则项，防止模型在形变预测中出现混乱。实验结果表明，Grid4D 在视觉质量和渲染速度上显著优于现有最新模型。\n"
  },
  {
    "path": "abs/2410.21310.md",
    "content": "### ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting\n\nScanning Electron Microscopes (SEMs) are widely renowned for their ability to analyze the surface structures of microscopic objects, offering the capability to capture highly detailed, yet only grayscale, images. To create more expressive and realistic illustrations, these images are typically manually colorized by an artist with the support of image editing software. This task becomes highly laborious when multiple images of a scanned object require colorization. We propose facilitating this process by using the underlying 3D structure of the microscopic scene to propagate the color information to all the captured images, from as little as one colorized view. We explore several scene representation techniques and achieve high-quality colorized novel view synthesis of a SEM scene. In contrast to prior work, there is no manual intervention or labelling involved in obtaining the 3D representation. This enables an artist to color a single or few views of a sequence and automatically retrieve a fully colored scene or video.\n\n扫描电子显微镜（SEM）因其分析微观物体表面结构的能力而广受认可，能够捕捉高度精细的图像，但仅限于灰度显示。为了创造更具表现力和真实感的图像，这些图像通常由艺术家借助图像编辑软件手动上色。当需要对同一扫描物体的多张图像进行上色时，此任务显得尤为繁重。我们提出利用微观场景的三维结构，将色彩信息传播到所有捕获的图像中，从而简化这一过程，仅需一个上色视图即可实现。我们探讨了几种场景表示技术，并实现了SEM场景的高质量彩色新视角合成。与以往的工作不同，我们无需手动干预或标签来获得三维表示，从而使艺术家只需对序列中的一个或少数视角进行上色，即可自动获得完整上色的场景或视频。\n"
  },
  {
    "path": "abs/2410.21566.md",
    "content": "### MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps\n\nThe key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection. Previous method relies on NeRF for geometry reasoning. However, the geometry extracted from NeRF is generally inaccurate, which leads to sub-optimal detection performance. In this paper, we propose MVSDet which utilizes plane sweep for geometry-aware 3D object detection. To circumvent the requirement for a large number of depth planes for accurate depth prediction, we design a probabilistic sampling and soft weighting mechanism to decide the placement of pixel features on the 3D volume. We select multiple locations that score top in the probability volume for each pixel and use their probability score to indicate the confidence. We further apply recent pixel-aligned Gaussian Splatting to regularize depth prediction and improve detection performance with little computation overhead. Extensive experiments on ScanNet and ARKitScenes datasets are conducted to show the superiority of our model. Our code is available at this https URL.\n\n多视角室内三维物体检测的关键挑战在于从图像中推断出准确的几何信息，以实现精确的三维检测。以往的方法依赖于NeRF进行几何推理。然而，从NeRF中提取的几何信息通常不够准确，导致检测性能不理想。本文提出了一种利用平面扫描的几何感知三维物体检测方法——MVSDet。为避免准确深度预测所需的大量深度平面，我们设计了一种概率采样与软加权机制，以确定像素特征在三维体积中的放置位置。我们为每个像素选择在概率体积中得分最高的多个位置，并使用它们的概率分数来指示置信度。此外，我们进一步应用最新的像素对齐高斯点散射技术来正则化深度预测，以较低的计算开销提升检测性能。我们在ScanNet和ARKitScenes数据集上进行了大量实验，展示了我们模型的优越性。\n"
  },
  {
    "path": "abs/2410.21955.md",
    "content": "### ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting\n\nWe propose ActiveSplat, an autonomous high-fidelity reconstruction system leveraging Gaussian splatting. Taking advantage of efficient and realistic rendering, the system establishes a unified framework for online mapping, viewpoint selection, and path planning. The key to ActiveSplat is a hybrid map representation that integrates both dense information about the environment and a sparse abstraction of the workspace. Therefore, the system leverages sparse topology for efficient viewpoint sampling and path planning, while exploiting view-dependent dense prediction for viewpoint selection, facilitating efficient decision-making with promising accuracy and completeness. A hierarchical planning strategy based on the topological map is adopted to mitigate repetitive trajectories and improve local granularity given limited budgets, ensuring high-fidelity reconstruction with photorealistic view synthesis. Extensive experiments and ablation studies validate the efficacy of the proposed method in terms of reconstruction accuracy, data coverage, and exploration efficiency.\n\n我们提出了ActiveSplat，一种利用高斯分裂进行自主高保真重建的系统。该系统利用高效且逼真的渲染功能，构建了在线建图、视角选择和路径规划的统一框架。ActiveSplat的关键在于一种混合地图表示，将环境的密集信息与工作空间的稀疏抽象相结合。因此，该系统利用稀疏拓扑结构进行高效的视角采样和路径规划，同时通过视角依赖的密集预测进行视角选择，从而实现准确且完整的高效决策。基于拓扑地图的分层规划策略能够在预算有限的情况下减轻重复轨迹并提升局部精细度，从而确保高保真重建和逼真视图合成。大量实验和消融研究验证了所提方法在重建精度、数据覆盖率和探索效率方面的有效性。\n"
  },
  {
    "path": "abs/2410.22070.md",
    "content": "### FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives\n\nReconstructing controllable Gaussian splats from monocular video is a challenging task due to its inherently insufficient constraints. Widely adopted approaches supervise complex interactions with additional masks and control signal annotations, limiting their real-world applications. In this paper, we propose an annotation guidance-free method, dubbed FreeGaussian, that mathematically derives dynamic Gaussian motion from optical flow and camera motion using novel dynamic Gaussian constraints. By establishing a connection between 2D flows and 3D Gaussian dynamic control, our method enables self-supervised optimization and continuity of dynamic Gaussian motions from flow priors. Furthermore, we introduce a 3D spherical vector controlling scheme, which represents the state with a 3D Gaussian trajectory, thereby eliminating the need for complex 1D control signal calculations and simplifying controllable Gaussian modeling. Quantitative and qualitative evaluations on extensive experiments demonstrate the state-of-the-art visual performance and control capability of our method.\n\n从单目视频中重建可控的高斯点是一项具有挑战性的任务，因为其内在的约束不足。广泛采用的方法通过额外的掩码和控制信号注释来监督复杂的交互，这在很大程度上限制了其在真实场景中的应用。本文提出了一种无需注释指导的方法，称为FreeGaussian。该方法通过创新的动态高斯约束，从光流和相机运动中数学推导出动态高斯运动。通过建立二维光流和三维高斯动态控制之间的联系，我们的方法实现了从流先验中自监督优化和动态高斯运动的连续性。此外，我们引入了一种三维球面向量控制方案，通过三维高斯轨迹表示状态，从而避免了复杂的一维控制信号计算，简化了可控高斯建模。大量实验中的定量和定性评估结果表明，我们的方法在视觉效果和控制能力方面达到了最新的性能水准。\n"
  },
  {
    "path": "abs/2410.22128.md",
    "content": "### PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting\n\nWe consider the problem of novel view synthesis from unposed images in a single feed-forward. Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS, where we further extend it to offer a practical solution that relaxes common assumptions such as dense image views, accurate camera poses, and substantial image overlaps. We achieve this through identifying and addressing unique challenges arising from the use of pixel-aligned 3DGS: misaligned 3D Gaussians across different views induce noisy or sparse gradients that destabilize training and hinder convergence, especially when above assumptions are not met. To mitigate this, we employ pre-trained monocular depth estimation and visual correspondence models to achieve coarse alignments of 3D Gaussians. We then introduce lightweight, learnable modules to refine depth and pose estimates from the coarse alignments, improving the quality of 3D reconstruction and novel view synthesis. Furthermore, the refined estimates are leveraged to estimate geometry confidence scores, which assess the reliability of 3D Gaussian centers and condition the prediction of Gaussian parameters accordingly. Extensive evaluations on large-scale real-world datasets demonstrate that PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.\n\n我们研究单次前向传递中从未定姿图像生成新视角的问题。我们的框架利用了3DGS在速度、可扩展性以及高质量三维重建和视角合成方面的优势，进一步扩展其应用，以提供一种放宽常见假设（如密集的图像视角、准确的相机姿态以及显著的图像重叠）的实用解决方案。我们通过识别和解决因使用像素对齐的3DGS而引发的独特挑战来实现这一目标：不同视角间的三维高斯对齐不良会导致噪声或稀疏的梯度，破坏训练稳定性并阻碍收敛，尤其在上述假设不成立的情况下。为此，我们采用预训练的单目深度估计和视觉对应模型，实现三维高斯的粗对齐。随后，我们引入轻量且可学习的模块，以从粗对齐中精化深度和姿态估计，从而提升三维重建和新视角合成的质量。此外，精化后的估计被用于计算几何置信分数，以评估三维高斯中心的可靠性，并相应地调节高斯参数的预测。对大规模真实数据集的广泛评估表明，PF3plat在所有基准测试中均设立了新的性能标杆，并通过全面的消融研究验证了我们的设计选择。\n"
  },
  {
    "path": "abs/2410.22705.md",
    "content": "### Geometry Cloak: Preventing TGS-based 3D Reconstruction from Copyrighted Images\n\nSingle-view 3D reconstruction methods like Triplane Gaussian Splatting (TGS) have enabled high-quality 3D model generation from just a single image input within seconds. However, this capability raises concerns about potential misuse, where malicious users could exploit TGS to create unauthorized 3D models from copyrighted images. To prevent such infringement, we propose a novel image protection approach that embeds invisible geometry perturbations, termed \"geometry cloaks\", into images before supplying them to TGS. These carefully crafted perturbations encode a customized message that is revealed when TGS attempts 3D reconstructions of the cloaked image. Unlike conventional adversarial attacks that simply degrade output quality, our method forces TGS to fail the 3D reconstruction in a specific way - by generating an identifiable customized pattern that acts as a watermark. This watermark allows copyright holders to assert ownership over any attempted 3D reconstructions made from their protected images. Extensive experiments have verified the effectiveness of our geometry cloak.\n\n单视角三维重建方法（如Triplane Gaussian Splatting，TGS）已使得从单张图像输入中在数秒内生成高质量的三维模型成为可能。然而，这一能力也引发了潜在的滥用问题，恶意用户可能利用TGS从受版权保护的图像中创建未经授权的三维模型。为防止此类侵权行为，我们提出了一种创新的图像保护方法，将不可见的几何扰动（称为“几何伪装”）嵌入图像中，在将其输入到TGS之前进行处理。这些精心设计的扰动编码了一条定制信息，当TGS尝试对伪装后的图像进行三维重建时，这条信息便会显现。与传统的对抗性攻击仅简单地降低输出质量不同，我们的方法使TGS在特定的方式下无法完成三维重建——生成一个可识别的定制图案，充当水印。该水印使版权持有人能够对使用其受保护图像进行的任何三维重建主张所有权。大量实验验证了我们几何伪装的有效性。\n"
  },
  {
    "path": "abs/2410.22817.md",
    "content": "### Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis\n\nGeneralizable 3D Gaussian splitting (3DGS) can reconstruct new scenes from sparse-view observations in a feed-forward inference manner, eliminating the need for scene-specific retraining required in conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex realworld scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multiview feature extraction with 3D perception, we employ a selfsupervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Our eFreeSplat represents an innovative approach for generalizable novel view synthesis. Different from the existing pure geometry-free methods, eFreeSplat focuses more on achieving epipolar-free feature matching and encoding by providing 3D priors through cross-view pretraining. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality.\n\n可泛化的三维高斯分裂（3DGS）可以在前向推理中从稀疏视角观察中重建新场景，无需像传统3DGS那样进行场景特定的重新训练。然而，现有方法严重依赖极线先验，这在复杂的真实场景中可能不可靠，尤其在非重叠和遮挡区域中。本文提出了一种高效的前向推理3DGS模型——eFreeSplat，用于泛化的新视角合成，能够独立于极线约束进行操作。为提升多视角特征提取的三维感知能力，我们在大规模数据集上使用自监督的Vision Transformer (ViT)进行跨视角补全预训练。此外，我们引入了一种迭代的跨视角高斯对齐方法，以确保不同视角间的深度尺度一致性。eFreeSplat代表了一种创新的泛化新视角合成方法。与现有的纯无几何方法不同，eFreeSplat更加注重实现无极线约束的特征匹配和编码，通过跨视角预训练提供三维先验。我们在RealEstate10K和ACID数据集上进行了宽基线新视角合成任务的评估。大量实验表明，eFreeSplat在几何重建和新视角合成质量上均超越了依赖极线先验的最新基准方法。\n"
  },
  {
    "path": "abs/2410.23213.md",
    "content": "### ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting\n\n3D models have recently been popularized by the potentiality of end-to-end training offered first by Neural Radiance Fields and most recently by 3D Gaussian Splatting models. The latter has the big advantage of naturally providing fast training convergence and high editability. However, as the research around these is still in its infancy, there is still a gap in the literature regarding the model's scalability. In this work, we propose an approach enabling both memory and computation scalability of such models. More specifically, we propose an iterative pruning strategy that removes redundant information encoded in the model. We also enhance compressibility for the model by including in the optimization strategy a differentiable quantization and entropy coding estimator. Our results on popular benchmarks showcase the effectiveness of the proposed approach and open the road to the broad deployability of such a solution even on resource-constrained devices.\n\n由于神经辐射场（Neural Radiance Fields）首次引入的端到端训练潜力，以及最近3D高斯分裂模型的推动，三维模型逐渐流行起来。后者具有快速训练收敛和高度可编辑性的显著优势。然而，由于该研究领域仍处于初期阶段，关于模型的可扩展性方面的文献仍存在空白。在本研究中，我们提出了一种方法，以实现此类模型的内存和计算扩展性。具体而言，我们提出了一种迭代剪枝策略，用于移除模型中编码的冗余信息。同时，我们通过在优化策略中引入可微量化和熵编码估计器，进一步提高模型的可压缩性。我们在常用基准测试上的结果展示了所提方法的有效性，为这种解决方案在资源受限设备上的广泛部署铺平了道路。\n"
  },
  {
    "path": "abs/2410.23658.md",
    "content": "### GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring\n\nTo train a deblurring network, an appropriate dataset with paired blurry and sharp images is essential. Existing datasets collect blurry images either synthetically by aggregating consecutive sharp frames or using sophisticated camera systems to capture real blur. However, these methods offer limited diversity in blur types (blur trajectories) or require extensive human effort to reconstruct large-scale datasets, failing to fully reflect real-world blur scenarios. To address this, we propose GS-Blur, a dataset of synthesized realistic blurry images created using a novel approach. To this end, we first reconstruct 3D scenes from multi-view images using 3D Gaussian Splatting (3DGS), then render blurry images by moving the camera view along the randomly generated motion trajectories. By adopting various camera trajectories in reconstructing our GS-Blur, our dataset contains realistic and diverse types of blur, offering a large-scale dataset that generalizes well to real-world blur. Using GS-Blur with various deblurring methods, we demonstrate its ability to generalize effectively compared to previous synthetic or real blur datasets, showing significant improvements in deblurring performance.\n\n在训练去模糊网络时，拥有成对的模糊和清晰图像的合适数据集至关重要。现有数据集通常通过合成方法（叠加连续的清晰帧）或使用复杂的摄像系统捕捉真实模糊图像。然而，这些方法在模糊类型（如模糊轨迹）的多样性上有限，或需要大量人力来构建大规模数据集，难以充分反映真实世界中的模糊场景。为此，我们提出了GS-Blur，这是一个利用新方法合成的逼真模糊图像数据集。具体而言，我们首先利用多视角图像通过三维高斯分裂（3DGS）重建3D场景，然后通过沿随机生成的运动轨迹移动摄像机视角来渲染模糊图像。通过在GS-Blur的构建中采用不同的摄像机轨迹，我们的数据集包含了逼真且多样化的模糊类型，提供了一个适用于真实世界模糊的高泛化性大规模数据集。使用GS-Blur进行去模糊实验，与之前的合成或真实模糊数据集相比，我们展示了其在去模糊性能上的显著提升。\n"
  },
  {
    "path": "abs/2410.23718.md",
    "content": "### GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has become a crucial method for acquiring 3D assets. To protect the copyright of these assets, digital watermarking techniques can be applied to embed ownership information discreetly within 3DGS models. However, existing watermarking methods for meshes, point clouds, and implicit radiance fields cannot be directly applied to 3DGS models, as 3DGS models use explicit 3D Gaussians with distinct structures and do not rely on neural networks. Naively embedding the watermark on a pre-trained 3DGS can cause obvious distortion in rendered images. In our work, we propose an uncertainty-based method that constrains the perturbation of model parameters to achieve invisible watermarking for 3DGS. At the message decoding stage, the copyright messages can be reliably extracted from both 3D Gaussians and 2D rendered images even under various forms of 3D and 2D distortions. We conduct extensive experiments on the Blender, LLFF and MipNeRF-360 datasets to validate the effectiveness of our proposed method, demonstrating state-of-the-art performance on both message decoding accuracy and view synthesis quality.\n\n三维高斯点散射（3DGS）已成为获取三维资产的重要方法。为了保护这些资产的版权，可将数字水印技术应用于3DGS模型，在其中隐秘地嵌入所有权信息。然而，现有的针对网格、点云和隐式辐射场的水印方法无法直接应用于3DGS模型，因为3DGS模型使用显式的三维高斯，具有独特的结构，并且不依赖神经网络。将水印直接嵌入预训练的3DGS模型中会导致渲染图像出现明显的失真。在我们的研究中，我们提出了一种基于不确定性的方案，通过限制模型参数的扰动，实现对3DGS的隐形水印嵌入。在信息解码阶段，即使在各种三维和二维失真情况下，仍能可靠地从3D高斯和2D渲染图像中提取出版权信息。我们在Blender、LLFF和MipNeRF-360数据集上进行了大量实验，以验证所提方法的有效性，显示了在信息解码准确性和视图合成质量上的领先表现。\n"
  },
  {
    "path": "abs/2410.24204.md",
    "content": "### GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering\n\nWe consider the problem of physically-based inverse rendering using 3D Gaussian Splatting (3DGS) representations. While recent 3DGS methods have achieved remarkable results in novel view synthesis (NVS), accurately capturing high-fidelity geometry, physically interpretable materials and lighting remains challenging, as it requires precise geometry modeling to provide accurate surface normals, along with physically-based rendering (PBR) techniques to ensure correct material and lighting disentanglement. Previous 3DGS methods resort to approximating surface normals, but often struggle with noisy local geometry, leading to inaccurate normal estimation and suboptimal material-lighting decomposition. In this paper, we introduce GeoSplatting, a novel hybrid representation that augments 3DGS with explicit geometric guidance and differentiable PBR equations. Specifically, we bridge isosurface and 3DGS together, where we first extract isosurface mesh from a scalar field, then convert it into 3DGS points and formulate PBR equations for them in a fully differentiable manner. In GeoSplatting, 3DGS is grounded on the mesh geometry, enabling precise surface normal modeling, which facilitates the use of PBR frameworks for material decomposition. This approach further maintains the efficiency and quality of NVS from 3DGS while ensuring accurate geometry from the isosurface. Comprehensive evaluations across diverse datasets demonstrate the superiority of GeoSplatting, consistently outperforming existing methods both quantitatively and qualitatively.\n\n我们研究基于物理的逆向渲染问题，使用三维高斯分裂（3DGS）表示。尽管最新的3DGS方法在新视角合成（NVS）中取得了显著成果，但要精确捕捉高保真几何、物理可解释的材质和光照仍然具有挑战性，因为这需要精确的几何建模以提供准确的表面法线，并且需要基于物理的渲染（PBR）技术来确保材质和光照的正确解耦。现有的3DGS方法通常通过近似表面法线来解决该问题，但在处理噪声较大的局部几何时往往会遇到困难，导致法线估计不准和次优的材质光照分解。在本文中，我们提出了一种名为GeoSplatting的新型混合表示方法，通过显式几何引导和可微PBR方程扩展3DGS。具体而言，我们将等值面与3DGS相结合，首先从标量场中提取等值面网格，然后将其转换为3DGS点，并为其构建全可微的PBR方程。在GeoSplatting中，3DGS基于网格几何，使得表面法线建模更加精确，从而支持PBR框架用于材质分解。此方法在保持3DGS的NVS效率和质量的同时，确保了来自等值面的精确几何表现。在多样化数据集上的全面评估表明，GeoSplatting在定量和定性上均显著优于现有方法。\n"
  },
  {
    "path": "abs/2410.24207.md",
    "content": "### No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images\n\nWe introduce NoPoSplat, a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from unposed sparse multi-view images. Our model, trained exclusively with photometric loss, achieves real-time 3D Gaussian reconstruction during inference. To eliminate the need for accurate pose input during reconstruction, we anchor one input view's local camera coordinates as the canonical space and train the network to predict Gaussian primitives for all views within this space. This approach obviates the need to transform Gaussian primitives from local coordinates into a global coordinate system, thus avoiding errors associated with per-frame Gaussians and pose estimation. To resolve scale ambiguity, we design and compare various intrinsic embedding methods, ultimately opting to convert camera intrinsics into a token embedding and concatenate it with image tokens as input to the model, enabling accurate scene scale prediction. We utilize the reconstructed 3D Gaussians for novel view synthesis and pose estimation tasks and propose a two-stage coarse-to-fine pipeline for accurate pose estimation. Experimental results demonstrate that our pose-free approach can achieve superior novel view synthesis quality compared to pose-required methods, particularly in scenarios with limited input image overlap. For pose estimation, our method, trained without ground truth depth or explicit matching loss, significantly outperforms the state-of-the-art methods with substantial improvements. This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.\n\n我们提出了NoPoSplat，一种能够从无位姿的稀疏多视角图像中重建由三维高斯参数化的三维场景的前向模型。我们的模型仅使用光度损失进行训练，实现了推理过程中实时的三维高斯重建。为消除重建过程中对准确位姿输入的需求，我们将一个输入视角的局部相机坐标系作为规范空间，并训练网络在该空间内为所有视角预测高斯基元。这种方法避免了将高斯基元从局部坐标系转换到全局坐标系的过程，从而避免了与每帧高斯和位姿估计相关的误差。为解决尺度模糊问题，我们设计并比较了多种内参嵌入方法，最终选择将相机内参转换为token嵌入，并将其与图像tokens作为模型输入进行拼接，从而实现准确的场景尺度预测。我们利用重建的三维高斯用于新视角合成和位姿估计任务，并提出了一个两阶段的粗到精位姿估计流程。实验结果表明，与需要位姿的模型相比，我们的无位姿方法在输入图像重叠有限的情况下可以实现更高质量的新视角合成。对于位姿估计任务，我们的方法在未使用真实深度数据或显式匹配损失的情况下，显著超越了最新的先进方法，取得了显著提升。本研究在无位姿泛化三维重建方面取得了重要进展，并展示了其在真实场景中的适用性。\n"
  },
  {
    "path": "abs/2410.24223.md",
    "content": "### URAvatar: Universal Relightable Gaussian Codec Avatars\n\nWe present a new approach to creating photorealistic and relightable head avatars from a phone scan with unknown illumination. The reconstructed avatars can be animated and relit in real time with the global illumination of diverse environments. Unlike existing approaches that estimate parametric reflectance parameters via inverse rendering, our approach directly models learnable radiance transfer that incorporates global light transport in an efficient manner for real-time rendering. However, learning such a complex light transport that can generalize across identities is non-trivial. A phone scan in a single environment lacks sufficient information to infer how the head would appear in general environments. To address this, we build a universal relightable avatar model represented by 3D Gaussians. We train on hundreds of high-quality multi-view human scans with controllable point lights. High-resolution geometric guidance further enhances the reconstruction accuracy and generalization. Once trained, we finetune the pretrained model on a phone scan using inverse rendering to obtain a personalized relightable avatar. Our experiments establish the efficacy of our design, outperforming existing approaches while retaining real-time rendering capability.\n\n我们提出了一种新方法，可通过手机扫描在未知光照条件下创建具有真实感和可重光照的头像模型。重建的头像能够在多种环境的全局光照下实现实时动画和重光照。与通过逆向渲染估计参数化反射参数的现有方法不同，我们的方法直接建模了可学习的辐射传输，在实时渲染中高效地结合了全局光传输。然而，学习这种能够跨身份泛化的复杂光传输并非易事。单一环境中的手机扫描缺乏推断头像在通用环境中外观的足够信息。为了解决这个问题，我们构建了一个由3D高斯表示的通用可重光照头像模型。我们在数百个使用可控点光源的高质量多视角人像扫描数据上进行训练。高分辨率的几何指导进一步提升了重建的准确性和泛化能力。训练完成后，我们通过逆向渲染对预训练模型进行微调，使其适应手机扫描数据，从而获得个性化的可重光照头像。我们的实验验证了该设计的有效性，优于现有方法，同时保持了实时渲染能力。\n"
  },
  {
    "path": "abs/2411.00144.md",
    "content": "### Self-Ensembling Gaussian Splatting for Few-shot Novel View Synthesis\n\n3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness for novel view synthesis (NVS). However, the 3DGS model tends to overfit when trained with sparse posed views, limiting its generalization capacity for broader pose variations. In this paper, we alleviate the overfitting problem by introducing a self-ensembling Gaussian Splatting (SE-GS) approach. We present two Gaussian Splatting models named the Σ-model and the Δ-model. The Σ-model serves as the primary model that generates novel-view images during inference. At the training stage, the Σ-model is guided away from specific local optima by an uncertainty-aware perturbing strategy. We dynamically perturb the Δ-model based on the uncertainties of novel-view renderings across different training steps, resulting in diverse temporal models sampled from the Gaussian parameter space without additional training costs. The geometry of the Σ-model is regularized by penalizing discrepancies between the Σ-model and the temporal samples. Therefore, our SE-GS conducts an effective and efficient regularization across a large number of Gaussian Splatting models, resulting in a robust ensemble, the Σ-model. Experimental results on the LLFF, Mip-NeRF360, DTU, and MVImgNet datasets show that our approach improves NVS quality with few-shot training views, outperforming existing state-of-the-art methods.\n\n3D Gaussian Splatting（3DGS）在新视图合成（NVS）中表现出了显著的效果。然而，3DGS模型在使用稀疏姿态视图训练时容易出现过拟合，限制了其对更广泛姿态变化的泛化能力。本文通过引入一种自集成的高斯散点方法（Self-Ensembling Gaussian Splatting，SE-GS）来缓解过拟合问题。我们提出了两个高斯散点模型，分别命名为Σ-模型和Δ-模型。Σ-模型作为主要模型，用于推理阶段生成新视图图像。在训练阶段，通过一种不确定性感知扰动策略将Σ-模型引导离开特定的局部最优解。我们基于不同训练步骤中新视图渲染的不确定性对Δ-模型进行动态扰动，从而在无需额外训练成本的情况下，从高斯参数空间中采样出多样的时间模型。通过惩罚Σ-模型与这些时间样本之间的几何差异，对Σ-模型进行正则化。因此，SE-GS在大量高斯散点模型上实现了高效而有效的正则化，最终形成一个稳健的集成模型，即Σ-模型。实验结果表明，在LLFF、Mip-NeRF360、DTU和MVImgNet数据集上，我们的方法在少样本训练视图下提升了NVS质量，超越了现有的最先进方法。\n"
  },
  {
    "path": "abs/2411.00239.md",
    "content": "### Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes\n\nRepresenting underwater 3D scenes is a valuable yet complex task, as attenuation and scattering effects during underwater imaging significantly couple the information of the objects and the water. This coupling presents a significant challenge for existing methods in effectively representing both the objects and the water medium simultaneously. To address this challenge, we propose Aquatic-GS, a hybrid 3D representation approach for underwater scenes that effectively represents both the objects and the water medium. Specifically, we construct a Neural Water Field (NWF) to implicitly model the water parameters, while extending the latest 3D Gaussian Splatting (3DGS) to model the objects explicitly. Both components are integrated through a physics-based underwater image formation model to represent complex underwater scenes. Moreover, to construct more precise scene geometry and details, we design a Depth-Guided Optimization (DGO) mechanism that uses a pseudo-depth map as auxiliary guidance. After optimization, Aquatic-GS enables the rendering of novel underwater viewpoints and supports restoring the true appearance of underwater scenes, as if the water medium were absent. Extensive experiments on both simulated and real-world datasets demonstrate that Aquatic-GS surpasses state-of-the-art underwater 3D representation methods, achieving better rendering quality and real-time rendering performance with a 410x increase in speed. Furthermore, regarding underwater image restoration, Aquatic-GS outperforms representative dewatering methods in color correction, detail recovery, and stability.\n\n表示水下3D场景是一项既有价值又复杂的任务，因为在水下成像过程中，衰减和散射效应显著耦合了物体和水体的信息。这种耦合为现有方法带来了巨大挑战，难以同时有效表示物体和水介质。为了解决这一问题，我们提出了Aquatic-GS，一种用于水下场景的混合3D表示方法，可以有效地同时表示物体和水体介质。具体而言，我们构建了一个神经水域场（Neural Water Field, NWF），用于隐式建模水体参数，同时扩展了最新的3D高斯散点（3DGS）方法来显式建模物体。这两个组件通过基于物理的水下图像生成模型相结合，以表现复杂的水下场景。此外，为了构建更精确的场景几何和细节，我们设计了一个深度引导优化（Depth-Guided Optimization, DGO）机制，使用伪深度图作为辅助指导。在优化后，Aquatic-GS能够渲染新的水下视点，并支持还原水下场景的真实外观，就像水体介质不存在一样。大量的模拟和真实数据集实验表明，Aquatic-GS优于最先进的水下3D表示方法，提供了更好的渲染质量和实时渲染性能，速度提升了410倍。此外，在水下图像还原方面，Aquatic-GS在色彩校正、细节恢复和稳定性上也优于代表性去水方法。\n"
  },
  {
    "path": "abs/2411.00771.md",
    "content": "### CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes\n\nRecently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis. However, accurately representing surfaces, especially in large and complex scenarios, remains a significant challenge due to the unstructured nature of 3DGS. In this paper, we present CityGaussianV2, a novel approach for large-scale scene reconstruction that addresses critical challenges related to geometric accuracy and efficiency. Building on the favorable generalization capabilities of 2D Gaussian Splatting (2DGS), we address its convergence and scalability issues. Specifically, we implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. To scale up, we introduce an elongation filter that mitigates Gaussian count explosion caused by 2DGS degeneration. Furthermore, we optimize the CityGaussian pipeline for parallel training, achieving up to 10× compression, at least 25% savings in training time, and a 50% decrease in memory usage. We also established standard geometry benchmarks under large-scale scenes. Experimental results demonstrate that our method strikes a promising balance between visual quality, geometric accuracy, as well as storage and training costs.\n\n近年来，3D Gaussian Splatting（3DGS）革新了辐射场重建，展现了高效且高保真的新视图合成能力。然而，由于3DGS的非结构化特性，准确表示表面，特别是在大规模和复杂场景中，仍然是一个显著的挑战。本文提出了一种新的大规模场景重建方法——CityGaussianV2，专注于解决几何精度和效率相关的关键问题。在2D Gaussian Splatting（2DGS）良好泛化能力的基础上，我们解决了其收敛和可扩展性问题。具体而言，我们实现了基于分解梯度的密化和深度回归技术，以消除模糊伪影并加速收敛。为实现扩展，我们引入了一种拉伸滤波器，以缓解由2DGS退化引起的高斯数量膨胀。此外，我们优化了CityGaussian的训练管道以支持并行训练，达到了最高10倍的压缩效果，至少节省25%的训练时间，并减少50%的内存使用。我们还在大规模场景下建立了标准几何基准测试。实验结果表明，我们的方法在视觉质量、几何精度、存储和训练成本之间实现了良好的平衡。\n"
  },
  {
    "path": "abs/2411.01218.md",
    "content": "### Real-Time Spatio-Temporal Reconstruction of Dynamic Endoscopic Scenes with 4D Gaussian Splatting\n\nDynamic scene reconstruction is essential in robotic minimally invasive surgery, providing crucial spatial information that enhances surgical precision and outcomes. However, existing methods struggle to address the complex, temporally dynamic nature of endoscopic scenes. This paper presents ST-Endo4DGS, a novel framework that models the spatio-temporal volume of dynamic endoscopic scenes using unbiased 4D Gaussian Splatting (4DGS) primitives, parameterized by anisotropic ellipses with flexible 4D rotations. This approach enables precise representation of deformable tissue dynamics, capturing intricate spatial and temporal correlations in real time. Additionally, we extend spherindrical harmonics to represent time-evolving appearance, achieving realistic adaptations to lighting and view changes. A new endoscopic normal alignment constraint (ENAC) further enhances geometric fidelity by aligning rendered normals with depth-derived geometry. Extensive evaluations show that ST-Endo4DGS outperforms existing methods in both visual quality and real-time performance, establishing a new state-of-the-art in dynamic scene reconstruction for endoscopic surgery.\n\n动态场景重建在机器人微创手术中至关重要，为提升手术精度和效果提供了关键的空间信息。然而，现有方法难以应对内窥镜场景复杂且时间动态的特性。本文提出了一种新框架——ST-Endo4DGS，使用无偏的4D高斯散点（4DGS）基元建模动态内窥镜场景的时空体积，基元通过各向异性椭球体表示，并支持灵活的4D旋转。该方法能够精确表示可变形组织的动态，实时捕捉复杂的空间和时间相关性。此外，我们扩展了球面谐波以表示随时间变化的外观，实现了对光照和视角变化的真实适应。新引入的内窥镜法线对齐约束（ENAC）通过将渲染法线与深度派生几何对齐，进一步增强了几何精度。大量评估结果表明，ST-Endo4DGS在视觉质量和实时性能方面优于现有方法，确立了内窥镜手术动态场景重建的新先进标准。\n"
  },
  {
    "path": "abs/2411.01853.md",
    "content": "### GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes\n\nIn this paper we present a novel method for efficient and effective 3D surface reconstruction in open scenes. Existing Neural Radiance Fields (NeRF) based works typically require extensive training and rendering time due to the adopted implicit representations. In contrast, 3D Gaussian splatting (3DGS) uses an explicit and discrete representation, hence the reconstructed surface is built by the huge number of Gaussian primitives, which leads to excessive memory consumption and rough surface details in sparse Gaussian areas. To address these issues, we propose Gaussian Voxel Kernel Functions (GVKF), which establish a continuous scene representation based on discrete 3DGS through kernel regression. The GVKF integrates fast 3DGS rasterization and highly effective scene implicit representations, achieving high-fidelity open scene surface reconstruction. Experiments on challenging scene datasets demonstrate the efficiency and effectiveness of our proposed GVKF, featuring with high reconstruction quality, real-time rendering speed, significant savings in storage and training memory consumption.\n\n本文提出了一种用于高效3D表面重建的新方法，适用于开放场景。现有基于神经辐射场（NeRF）的工作通常由于采用隐式表示，需较长的训练和渲染时间。相比之下，3D Gaussian Splatting（3DGS）使用显式且离散的表示，因此重建的表面依赖于大量的高斯基元，导致内存消耗过大，并在高斯稀疏区域产生粗糙的表面细节。为了解决这些问题，我们提出了高斯体素核函数（Gaussian Voxel Kernel Functions, GVKF），通过核回归在离散的3DGS基础上建立连续的场景表示。GVKF结合了快速的3DGS光栅化和高效的场景隐式表示，实现了高保真的开放场景表面重建。在具有挑战性的数据集上的实验表明，我们提出的GVKF在重建质量、实时渲染速度、存储及训练内存消耗方面均表现出色。\n"
  },
  {
    "path": "abs/2411.02229.md",
    "content": "### FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training\n\nThe field of novel view synthesis from images has seen rapid advancements with the introduction of Neural Radiance Fields (NeRF) and more recently with 3D Gaussian Splatting. Gaussian Splatting became widely adopted due to its efficiency and ability to render novel views accurately. While Gaussian Splatting performs well when a sufficient amount of training images are available, its unstructured explicit representation tends to overfit in scenarios with sparse input images, resulting in poor rendering performance. To address this, we present a 3D Gaussian-based novel view synthesis method using sparse input images that can accurately render the scene from the viewpoints not covered by the training images. We propose a multi-stage training scheme with matching-based consistency constraints imposed on the novel views without relying on pre-trained depth estimation or diffusion models. This is achieved by using the matches of the available training images to supervise the generation of the novel views sampled between the training frames with color, geometry, and semantic losses. In addition, we introduce a locality preserving regularization for 3D Gaussians which removes rendering artifacts by preserving the local color structure of the scene. Evaluation on synthetic and real-world datasets demonstrates competitive or superior performance of our method in few-shot novel view synthesis compared to existing state-of-the-art methods.\n\n从图像生成新视图的领域随着神经辐射场（NeRF）以及最近的3D Gaussian Splatting技术的引入得到了快速发展。由于高效且能够准确渲染新视图，高斯散点已被广泛采用。然而，当训练图像数量不足时，高斯散点的非结构化显式表示容易在稀疏输入图像场景中过拟合，导致渲染效果不佳。为此，我们提出了一种基于3D高斯的少样本新视图合成方法，使用稀疏输入图像即可准确渲染训练图像未覆盖视角下的场景。我们提出了一种多阶段训练方案，在新视图上施加基于匹配的一致性约束，而无需依赖预训练的深度估计或扩散模型。通过利用现有训练图像的匹配信息，对在训练帧之间采样的颜色、几何和语义损失进行监督生成。此外，我们引入了一种局部性保持正则化，用于3D高斯，保留场景的局部颜色结构，从而消除渲染伪影。合成和真实数据集上的评估结果表明，我们的方法在少样本新视图合成方面相比现有最先进方法具有竞争力甚至优越的性能。\n\n"
  },
  {
    "path": "abs/2411.02547.md",
    "content": "### Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting\n\nIn this paper, we present a novel algorithm for probabilistically updating and rasterizing semantic maps within 3D Gaussian Splatting (3D-GS). Although previous methods have introduced algorithms which learn to rasterize features in 3D-GS for enhanced scene understanding, 3D-GS can fail without warning which presents a challenge for safety-critical robotic applications. To address this gap, we propose a method which advances the literature of continuous semantic mapping from voxels to ellipsoids, combining the precise structure of 3D-GS with the ability to quantify uncertainty of probabilistic robotic maps. Given a set of images, our algorithm performs a probabilistic semantic update directly on the 3D ellipsoids to obtain an expectation and variance through the use of conjugate priors. We also propose a probabilistic rasterization which returns per-pixel segmentation predictions with quantifiable uncertainty. We compare our method with similar probabilistic voxel-based methods to verify our extension to 3D ellipsoids, and perform ablation studies on uncertainty quantification and temporal smoothing.\n\n本文提出了一种新算法，用于在3D Gaussian Splatting（3D-GS）中概率更新和光栅化语义地图。尽管已有方法通过学习在3D-GS中光栅化特征来增强场景理解，但3D-GS在某些情况下可能突然失效，这对安全关键的机器人应用构成了挑战。为弥补这一不足，我们提出了一种方法，将连续语义映射从体素扩展到椭球体，结合了3D-GS的精确结构与量化概率机器人地图不确定性的能力。给定一组图像，我们的算法直接在3D椭球体上执行概率语义更新，通过共轭先验获得期望和方差。同时，我们还提出了一种概率光栅化方法，能够返回具有可量化不确定性的逐像素分割预测。我们将该方法与类似的概率体素方法进行了比较，以验证其对3D椭球体的扩展，并进行了不确定性量化和时间平滑的消融研究。\n"
  },
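  {
    "path": "abs/2411.02547.sketch.md",
    "content": "### Editor's sketch: conjugate semantic update (illustrative)\n\nThis note is an editorial addition, not part of the paper above: a minimal sketch of the Dirichlet-Categorical conjugate update the abstract describes, applied to a single ellipsoid's per-class concentration vector to obtain a posterior expectation and a quantifiable variance. All names and values are hypothetical.\n\n```python\nimport numpy as np\n\ndef update_semantics(alpha, obs_probs):\n    # Conjugate Dirichlet-Categorical update: soft per-class evidence from\n    # one view is added to the ellipsoid's concentration parameters.\n    alpha = alpha + obs_probs\n    total = alpha.sum()\n    mean = alpha / total                       # posterior expectation E[p_k]\n    var = mean * (1.0 - mean) / (total + 1.0)  # marginal variance Var[p_k]\n    return alpha, mean, var\n\nalpha = np.ones(5)                             # uniform prior over 5 classes\nobs = np.array([0.1, 0.7, 0.1, 0.05, 0.05])    # one soft segmentation observation\nalpha, mean, var = update_semantics(alpha, obs)\nprint(mean.round(3), var.round(4))\n```\n"
  },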
  {
    "path": "abs/2411.02703.md",
    "content": "### LVI-GS: Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has shown its ability in rapid rendering and high-fidelity mapping. In this paper, we introduce LVI-GS, a tightly-coupled LiDAR-Visual-Inertial mapping framework with 3DGS, which leverages the complementary characteristics of LiDAR and image sensors to capture both geometric structures and visual details of 3D scenes. To this end, the 3D Gaussians are initialized from colourized LiDAR points and optimized using differentiable rendering. In order to achieve high-fidelity mapping, we introduce a pyramid-based training approach to effectively learn multi-level features and incorporate depth loss derived from LiDAR measurements to improve geometric feature perception. Through well-designed strategies for Gaussian-Map expansion, keyframe selection, thread management, and custom CUDA acceleration, our framework achieves real-time photo-realistic mapping. Numerical experiments are performed to evaluate the superior performance of our method compared to state-of-the-art 3D reconstruction systems.\n\n3D Gaussian Splatting（3DGS）在快速渲染和高保真映射方面展现了其能力。本文提出了LVI-GS，一种紧耦合的LiDAR-视觉-惯性（LiDAR-Visual-Inertial）映射框架，结合3DGS的优势，利用LiDAR和图像传感器的互补特性来捕捉3D场景的几何结构和视觉细节。为此，我们从彩色化的LiDAR点初始化3D高斯，并通过可微分渲染进行优化。为实现高保真映射，我们引入了一种基于金字塔的训练方法，有效学习多层次特征，并结合从LiDAR测量获得的深度损失，以提升几何特征感知能力。通过精心设计的高斯图扩展、关键帧选择、线程管理和自定义CUDA加速策略，我们的框架实现了实时逼真的映射效果。数值实验验证了我们的方法相较于最先进的3D重建系统的卓越性能。\n"
  },
  {
    "path": "abs/2411.03086.md",
    "content": "### HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features\n\nRecent advancements in radiance field rendering show promising results in 3D scene representation, where Gaussian splatting-based techniques emerge as state-of-the-art due to their quality and efficiency. Gaussian splatting is widely used for various applications, including 3D human representation. However, previous 3D Gaussian splatting methods either use parametric body models as additional information or fail to provide any underlying structure, like human biomechanical features, which are essential for different applications. In this paper, we present a novel approach called HFGaussian that can estimate novel views and human features, such as the 3D skeleton, 3D key points, and dense pose, from sparse input images in real time at 25 FPS. The proposed method leverages generalizable Gaussian splatting technique to represent the human subject and its associated features, enabling efficient and generalizable reconstruction. By incorporating a pose regression network and the feature splatting technique with Gaussian splatting, HFGaussian demonstrates improved capabilities over existing 3D human methods, showcasing the potential of 3D human representations with integrated biomechanics. We thoroughly evaluate our HFGaussian method against the latest state-of-the-art techniques in human Gaussian splatting and pose estimation, demonstrating its real-time, state-of-the-art performance.\n\n在辐射场渲染的最新进展中，3D场景表示取得了令人瞩目的成果，其中基于高斯散点的技术凭借其质量和效率成为先进方法。这种技术已广泛应用于包括3D人体表示在内的多种应用。然而，现有的3D高斯散点方法要么依赖于参数化人体模型作为额外信息，要么未能提供底层结构（如人体生物力学特征），这些特征对于不同应用至关重要。本文提出了一种新方法，称为HFGaussian，能够从稀疏输入图像实时估计新视图和人体特征（如3D骨架、3D关键点和密集姿态），帧率达25 FPS。该方法利用通用高斯散点技术来表示人体对象及其相关特征，实现高效且具有泛化能力的重建。通过结合姿态回归网络和特征散点技术与高斯散点，HFGaussian在现有的3D人体方法之上展示了增强的能力，体现了与生物力学集成的人体3D表示的潜力。我们对HFGaussian方法与最新的先进人体高斯散点和姿态估计技术进行了全面评估，证明其在实时性能和先进性方面的表现。\n"
  },
  {
    "path": "abs/2411.03555.md",
    "content": "### Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting\n\nThis paper introduces a method to enhance Interactive Imitation Learning (IIL) by extracting touch interaction points and tracking object movement from video demonstrations. The approach extends current IIL systems by providing robots with detailed knowledge of both where and how to interact with objects, particularly complex articulated ones like doors and drawers. By leveraging cutting-edge techniques such as 3D Gaussian Splatting and FoundationPose for tracking, this method allows robots to better understand and manipulate objects in dynamic environments. The research lays the foundation for more effective task learning and execution in autonomous robotic systems.\n\n本文提出了一种方法，通过从视频演示中提取触摸交互点并跟踪物体运动，来增强交互式模仿学习（Interactive Imitation Learning, IIL）。该方法扩展了现有的 IIL 系统，使机器人能够详细了解如何以及在何处与物体交互，尤其是复杂的关节类物体，如门和抽屉。通过利用先进技术，例如用于跟踪的 3D Gaussian Splatting 和 FoundationPose，该方法使机器人能够更好地理解和操作动态环境中的物体。这项研究为自主机器人系统中更高效的任务学习与执行奠定了基础。\n"
  },
  {
    "path": "abs/2411.03637.md",
    "content": "### Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis\n\nDespite the substantial progress of novel view synthesis, existing methods, either based on the Neural Radiance Fields (NeRF) or more recently 3D Gaussian Splatting (3DGS), suffer significant degradation when the input becomes sparse. Numerous efforts have been introduced to alleviate this problem, but they still struggle to synthesize satisfactory results efficiently, especially in the large scene. In this paper, we propose SCGaussian, a Structure Consistent Gaussian Splatting method using matching priors to learn 3D consistent scene structure. Considering the high interdependence of Gaussian attributes, we optimize the scene structure in two folds: rendering geometry and, more importantly, the position of Gaussian primitives, which is hard to be directly constrained in the vanilla 3DGS due to the non-structure property. To achieve this, we present a hybrid Gaussian representation. Besides the ordinary non-structure Gaussian primitives, our model also consists of ray-based Gaussian primitives that are bound to matching rays and whose optimization of their positions is restricted along the ray. Thus, we can utilize the matching correspondence to directly enforce the position of these Gaussian primitives to converge to the surface points where rays intersect. Extensive experiments on forward-facing, surrounding, and complex large scenes show the effectiveness of our approach with state-of-the-art performance and high efficiency.\n\n尽管新颖视图合成取得了显著进展，现有方法（无论是基于神经辐射场（NeRF）还是最近的3D Gaussian Splatting（3DGS））在输入稀疏时仍会出现显著性能退化。尽管已经提出了许多改进措施来缓解这一问题，但在大场景中高效生成令人满意的结果依然具有挑战性。在本文中，我们提出了一种名为 SCGaussian 的结构一致性高斯分布（Structure Consistent Gaussian Splatting）方法，通过匹配先验学习3D一致的场景结构。考虑到高斯属性之间的高度相关性，我们从两方面优化场景结构：渲染几何以及更重要的高斯基元的位置。由于标准3DGS方法的非结构特性，高斯基元位置难以直接约束。为此，我们提出了一种混合高斯表示。除了普通的非结构高斯基元外，我们的模型还包括基于射线的高斯基元，这些基元绑定于匹配的射线，其位置优化限制在射线方向上。因此，我们能够利用匹配对应关系，直接强制这些高斯基元的位置收敛到射线与表面交点处。广泛的实验表明，在前向视图、环绕场景以及复杂大场景中，我们的方法表现出了高效的性能和领先的效果。\n"
  },
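  {
    "path": "abs/2411.03637.sketch.md",
    "content": "### Editor's sketch: ray-bound Gaussian primitives (illustrative)\n\nThis note is an editorial addition, not the authors' code: it illustrates the ray-based primitives described above by parameterizing a Gaussian center as x = o + t * d, so that only the scalar depth t is optimized and a matching correspondence pulls the primitive onto the surface point along its ray. The toy target and learning rate are hypothetical.\n\n```python\nimport torch\nimport torch.nn.functional as F\n\no = torch.zeros(3)                            # origin of a matched ray\nd = F.normalize(torch.tensor([0.1, 0.2, 1.0]), dim=0)\nt = torch.nn.Parameter(torch.tensor(2.0))     # the only positional DoF\n\ntarget = o + 3.5 * d                          # surface point implied by the match\nopt = torch.optim.Adam([t], lr=0.1)\nfor _ in range(200):\n    opt.zero_grad()\n    x = o + t * d                             # center constrained to the ray\n    loss = ((x - target) ** 2).sum()          # matching-correspondence loss\n    loss.backward()\n    opt.step()\nprint(float(t))                               # converges towards 3.5\n```\n"
  },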
  {
    "path": "abs/2411.03706.md",
    "content": "### 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement\n\nWe present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D changes. Our method can detect changes in cluttered environments using sparse post-change images within as little as 18s, using as few as a single new image. It does not rely on depth input, user instructions, object classes, or object models -- An object is recognized simply if it has been re-arranged. Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update.\n\n我们提出了3DGS-CD，这是首个基于3D Gaussian Splatting (3DGS) 的方法，用于检测3D场景中物体的物理重排。该方法通过比较在不同时间拍摄的两组未对齐图像，估计3D物体级别的变化。利用3DGS的新颖视图渲染和EfficientSAM的零样本分割能力，我们检测2D物体级别的变化，并在不同视图间关联和融合，最终估计出3D变化。我们的方法能够在杂乱环境中通过稀疏的变化后图像检测重排，检测时间仅需18秒，甚至只需一张新图像。该方法不依赖深度输入、用户指令、物体类别或物体模型——物体是否被重排仅通过位置变化来识别。我们在公共和自采集的真实世界数据集上进行了评估，与最先进的基于辐射场的变化检测方法相比，精度提高了多达14%，性能提升了三个数量级。这种显著的性能提升使得广泛的下游应用成为可能，其中我们重点介绍了三个关键用例：物体重建、机器人工作空间复位以及3DGS模型更新。\n"
  },
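  {
    "path": "abs/2411.03706.sketch.md",
    "content": "### Editor's sketch: 2D change-mask extraction (illustrative)\n\nThis note is an editorial addition, not the authors' pipeline: a minimal sketch of the 2D stage described above, comparing a pre-change 3DGS render against a post-change photo taken from the same pose to get a candidate change mask; the paper then refines such masks with EfficientSAM and fuses them across views. The threshold and names are hypothetical.\n\n```python\nimport numpy as np\n\ndef candidate_change_mask(render_pre, photo_post, thresh=0.15):\n    # Per-pixel photometric difference between the pre-change render and a\n    # post-change photo (both float RGB in [0, 1], same camera pose).\n    diff = np.abs(render_pre - photo_post).mean(axis=-1)\n    return diff > thresh                      # True where an object likely moved\n\nrender_pre = np.zeros((480, 640, 3))\nphoto_post = render_pre.copy()\nphoto_post[100:200, 300:400] = 1.0            # a toy rearranged-object region\nmask = candidate_change_mask(render_pre, photo_post)\nprint(int(mask.sum()))                        # 10000 changed pixels\n```\n"
  },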
  {
    "path": "abs/2411.03807.md",
    "content": "### GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting\n\nThis paper proposes a new method for accurate and robust 6D pose estimation of novel objects, named GS2Pose. By introducing 3D Gaussian splatting, GS2Pose can utilize the reconstruction results without requiring a high-quality CAD model, which means it only requires segmented RGBD images as input. Specifically, GS2Pose employs a two-stage structure consisting of coarse estimation followed by refined estimation. In the coarse stage, a lightweight U-Net network with a polarization attention mechanism, called Pose-Net, is designed. By using the 3DGS model for supervised training, Pose-Net can generate NOCS images to compute a coarse pose. In the refinement stage, GS2Pose formulates a pose regression algorithm following the idea of reprojection or Bundle Adjustment (BA), referred to as GS-Refiner. By leveraging Lie algebra to extend 3DGS, GS-Refiner obtains a pose-differentiable rendering pipeline that refines the coarse pose by comparing the input images with the rendered images. GS-Refiner also selectively updates parameters in the 3DGS model to achieve environmental adaptation, thereby enhancing the algorithm's robustness and flexibility to illuminative variation, occlusion, and other challenging disruptive factors. GS2Pose was evaluated through experiments conducted on the LineMod dataset, where it was compared with similar algorithms, yielding highly competitive results.\n\n本文提出了一种名为 GS2Pose 的新方法，用于对新物体进行准确且鲁棒的 6D 姿态估计。通过引入 3D Gaussian Splatting，GS2Pose 可以利用重建结果而无需高质量的 CAD 模型，仅需分割后的 RGBD 图像作为输入。具体而言，GS2Pose 采用了由粗估计和精估计组成的两阶段结构。在粗估计阶段，设计了一个轻量化的 U-Net 网络 Pose-Net，该网络结合极化注意力机制，并通过 3DGS 模型进行监督训练，以生成 NOCS 图像用于计算粗略姿态。在精估计阶段，GS2Pose 基于重投影或捆绑调整（Bundle Adjustment, BA）的思想，设计了一种姿态回归算法，称为 GS-Refiner。通过利用 Lie 代数扩展 3DGS，GS-Refiner 实现了一个姿态可微的渲染管线，通过将输入图像与渲染图像进行比较来优化粗略姿态。此外，GS-Refiner 有选择性地更新 3DGS 模型中的参数，以适应环境变化，从而增强算法对光照变化、遮挡以及其他挑战性干扰因素的鲁棒性和灵活性。通过在 LineMod 数据集上的实验评估，GS2Pose 与类似算法进行了对比，展现了极具竞争力的结果。\n"
  },
  {
    "path": "abs/2411.04924.md",
    "content": "### MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views\n\nWe introduce MVSplat360, a feed-forward approach for 360° novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to minimal overlap among input views and insufficient visual information provided, making it challenging for conventional methods to achieve high-quality results. Our MVSplat360 addresses this by effectively combining geometry-aware 3D reconstruction with temporally consistent video generation. Specifically, it refactors a feed-forward 3D Gaussian Splatting (3DGS) model to render features directly into the latent space of a pre-trained Stable Video Diffusion (SVD) model, where these features then act as pose and visual cues to guide the denoising process and produce photorealistic 3D-consistent views. Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views. To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360° NVS tasks. Experiments on the existing benchmark RealEstate10K also confirm the effectiveness of our model.\n\n我们提出了 MVSplat360，一种用于真实世界场景 360° 新视图合成（NVS）的前馈式方法，仅需稀疏观测输入。在这一设置下，由于输入视图之间的重叠极少以及提供的视觉信息不足，使得问题本质上具有不适定性，传统方法难以生成高质量结果。MVSplat360 通过有效结合几何感知的 3D 重建和时间一致的视频生成，克服了这一挑战。具体来说，我们对前馈式 3D Gaussian Splatting (3DGS) 模型进行重构，将特征直接渲染到预训练的 Stable Video Diffusion (SVD) 模型的潜在空间中。这些特征作为位姿和视觉提示，引导去噪过程，从而生成具有真实感且 3D 一致的视图。我们的模型支持端到端训练，能够以少至 5 个稀疏输入视图渲染任意视图。为评估 MVSplat360 的性能，我们在具有挑战性的 DL3DV-10K 数据集上引入了一个新的基准测试，MVSplat360 在广角甚至 360° NVS 任务中相比最先进的方法实现了更高的视觉质量。此外，在现有的 RealEstate10K 基准上进行的实验同样验证了我们模型的有效性。\n"
  },
  {
    "path": "abs/2411.05006.md",
    "content": "### ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing\n\nThis paper proposes ProEdit - a simple yet effective framework for high-quality 3D scene editing guided by diffusion distillation in a novel progressive manner. Inspired by the crucial observation that multi-view inconsistency in scene editing is rooted in the diffusion model's large feasible output space (FOS), our framework controls the size of FOS and reduces inconsistency by decomposing the overall editing task into several subtasks, which are then executed progressively on the scene. Within this framework, we design a difficulty-aware subtask decomposition scheduler and an adaptive 3D Gaussian splatting (3DGS) training strategy, ensuring high quality and efficiency in performing each subtask. Extensive evaluation shows that our ProEdit achieves state-of-the-art results in various scenes and challenging editing tasks, all through a simple framework without any expensive or sophisticated add-ons like distillation losses, components, or training procedures. Notably, ProEdit also provides a new way to control, preview, and select the \"aggressivity\" of editing operation during the editing process.\n\n本文提出了 ProEdit——一种简单但高效的框架，用于通过新颖的渐进式扩散蒸馏方式实现高质量的 3D 场景编辑。受到以下关键观察的启发：场景编辑中的多视图不一致性源于扩散模型的巨大可行输出空间（FOS），我们的框架通过将整体编辑任务分解为若干子任务并在场景上逐步执行，来控制 FOS 的规模并减少不一致性。在该框架内，我们设计了一个难度感知的子任务分解调度器和一种自适应的 3D Gaussian Splatting (3DGS) 训练策略，从而保证每个子任务的高质量和高效率。广泛的评估表明，ProEdit 在各种场景和复杂编辑任务中达到了最先进的效果，同时框架简单，不需要昂贵或复杂的附加组件，例如蒸馏损失、额外模块或复杂训练流程。值得注意的是，ProEdit 还提供了一种新的方式来在编辑过程中控制、预览和选择编辑操作的“激进性”。\n"
  },
  {
    "path": "abs/2411.05731.md",
    "content": "### PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering\n\nRecent advances in structured 3D Gaussians for view-adaptive rendering, particularly through methods like Scaffold-GS, have demonstrated promising results in neural scene representation. However, existing approaches still face challenges in perceptual consistency and precise view-dependent effects. We present PEP-GS, a novel framework that enhances structured 3D Gaussians through three key innovations: (1) a Local-Enhanced Multi-head Self-Attention (LEMSA) mechanism that replaces spherical harmonics for more accurate view-dependent color decoding, and (2) Kolmogorov-Arnold Networks (KAN) that optimize Gaussian opacity and covariance functions for enhanced interpretability and splatting precision. (3) a Neural Laplacian Pyramid Decomposition (NLPD) that improves perceptual similarity across views. Our comprehensive evaluation across multiple datasets indicates that, compared to the current state-of-the-art methods, these improvements are particularly evident in challenging scenarios such as view-dependent effects, specular reflections, fine-scale details and false geometry generation.\n\n在视图自适应渲染中，基于结构化 3D 高斯的方法（尤其是 Scaffold-GS）近年来在神经场景表示方面取得了令人瞩目的成果。然而，现有方法仍在感知一致性和精确的视图依赖效果方面面临挑战。我们提出了 PEP-GS，这是一种通过以下三个关键创新增强结构化 3D 高斯的框架：(1) 引入局部增强多头自注意力机制（Local-Enhanced Multi-head Self-Attention, LEMSA），取代球谐函数，实现更精确的视图依赖颜色解码；(2) 采用 Kolmogorov-Arnold 网络（Kolmogorov-Arnold Networks, KAN），优化高斯的不透明度和协方差函数，提升可解释性和投影精度；(3) 提出神经拉普拉斯金字塔分解（Neural Laplacian Pyramid Decomposition, NLPD），以提高跨视图的感知相似性。我们在多个数据集上的综合评估表明，与当前最先进的方法相比，这些改进在视图依赖效果、镜面反射、细节刻画以及虚假几何生成等具有挑战性的场景中表现尤为显著。\n"
  },
  {
    "path": "abs/2411.06019.md",
    "content": "### GaussianSpa: An \"Optimizing-Sparsifying\" Simplification Framework for Compact and High-Quality 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a mainstream for novel view synthesis, leveraging continuous aggregations of Gaussian functions to model scene geometry. However, 3DGS suffers from substantial memory requirements to store the multitude of Gaussians, hindering its practicality. To address this challenge, we introduce GaussianSpa, an optimization-based simplification framework for compact and high-quality 3DGS. Specifically, we formulate the simplification as an optimization problem associated with the 3DGS training. Correspondingly, we propose an efficient \"optimizing-sparsifying\" solution that alternately solves two independent sub-problems, gradually imposing strong sparsity onto the Gaussians in the training process. Our comprehensive evaluations on various datasets show the superiority of GaussianSpa over existing state-of-the-art approaches. Notably, GaussianSpa achieves an average PSNR improvement of 0.9 dB on the real-world Deep Blending dataset with 10× fewer Gaussians compared to the vanilla 3DGS.\n\n3D Gaussian Splatting (3DGS) 已成为新视图合成的主流方法，通过连续聚合高斯函数来建模场景几何。然而，3DGS 需要大量内存存储大量的高斯基元，限制了其实用性。为了解决这一问题，我们提出了 GaussianSpa，一种基于优化的简化框架，用于实现紧凑且高质量的 3DGS。具体而言，我们将简化问题表述为与 3DGS 训练相关的优化问题。为此，我们提出了一种高效的“优化-稀疏化”解决方案，通过交替解决两个独立的子问题，在训练过程中逐步对高斯基元施加强稀疏性。我们在多个数据集上的综合评估表明，GaussianSpa 相较于现有最先进方法表现出显著优势。尤其是在真实世界的 Deep Blending 数据集上，GaussianSpa 在使用 10 倍更少的高斯基元的情况下，平均 PSNR 提升了 0.9 dB，相较于标准 3DGS 展现了卓越的效果。\n"
  },
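  {
    "path": "abs/2411.06019.sketch.md",
    "content": "### Editor's sketch: the optimizing-sparsifying alternation (illustrative)\n\nThis note is an editorial addition, not the authors' implementation: a minimal sketch of the alternation described above, where an ordinary gradient step on a toy objective alternates with a projection of the opacity vector onto its k largest entries, gradually imposing strong sparsity during training. The toy objective, schedule, and k are hypothetical.\n\n```python\nimport torch\n\ntarget = torch.zeros(1000)\ntarget[torch.randperm(1000)[:100]] = 1.0      # toy scene explained by 100 Gaussians\nopacity = torch.nn.Parameter(torch.rand(1000))\nopt = torch.optim.Adam([opacity], lr=5e-2)\n\nfor step in range(300):\n    opt.zero_grad()\n    loss = ((opacity - target) ** 2).sum()    # 'optimizing' sub-problem\n    loss.backward()\n    opt.step()\n    if step % 10 == 9:                        # 'sparsifying' sub-problem:\n        with torch.no_grad():                 # keep only the k largest opacities\n            keep = torch.topk(opacity.abs(), k=100).indices\n            mask = torch.zeros_like(opacity)\n            mask[keep] = 1.0\n            opacity.mul_(mask)\nprint(int((opacity.abs() > 1e-6).sum()))      # roughly 100 surviving Gaussians\n```\n"
  },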
  {
    "path": "abs/2411.06390.md",
    "content": "### SplatFormer: Point Transformer for Robust 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has recently transformed photorealistic reconstruction, achieving high visual fidelity and real-time performance. However, rendering quality significantly deteriorates when test views deviate from the camera angles used during training, posing a major challenge for applications in immersive free-viewpoint rendering and navigation. In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. By creating diverse test cases with synthetic and real-world datasets, we demonstrate that most existing methods, including those incorporating various regularization techniques and data-driven priors, struggle to generalize effectively to OOD views. To address this limitation, we introduce SplatFormer, the first point transformer model specifically designed to operate on Gaussian splats. SplatFormer takes as input an initial 3DGS set optimized under limited training views and refines it in a single forward pass, effectively removing potential artifacts in OOD test views. To our knowledge, this is the first successful application of point transformers directly on 3DGS sets, surpassing the limitations of previous multi-scene training methods, which could handle only a restricted number of input views during inference. Our model significantly improves rendering quality under extreme novel views, achieving state-of-the-art performance in these challenging scenarios and outperforming various 3DGS regularization techniques, multi-scene models tailored for sparse view synthesis, and diffusion-based frameworks.\n\n3D Gaussian Splatting (3DGS) 最近在高真实感重建领域取得了突破，兼具高视觉保真度和实时性能。然而，当测试视图偏离训练时使用的摄像机角度时，渲染质量会显著下降，这对沉浸式自由视点渲染和导航等应用构成了重大挑战。在本研究中，我们对 3DGS 及相关的新视图合成方法在分布外（Out-of-Distribution, OOD）测试摄像机场景下进行了全面评估。通过在合成和真实数据集上创建多样化的测试案例，我们发现，包括采用各种正则化技术和数据驱动先验的现有方法在内，大多数方法在应对 OOD 视图时仍然难以实现有效泛化。\n为了解决这一局限性，我们提出了 SplatFormer，这是首个专为高斯投影点设计的点变换器模型。SplatFormer 以有限训练视图优化的初始 3DGS 集作为输入，并在单次前向传递中对其进行精化，有效消除了 OOD 测试视图中的潜在伪影。据我们所知，这也是点变换器首次成功应用于 3DGS 集，突破了先前多场景训练方法的局限性，这些方法在推理期间只能处理有限数量的输入视图。\n我们的模型显著提升了极端新视图下的渲染质量，在这些具有挑战性的场景中实现了最先进的性能，并超越了各种 3DGS 正则化技术、多场景稀疏视图合成模型以及基于扩散框架的方法。\n"
  },
  {
    "path": "abs/2411.06602.md",
    "content": "### Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction\n\n3D Gaussian Splatting has recently achieved notable success in novel view synthesis for dynamic scenes and geometry reconstruction in static scenes. Building on these advancements, early methods have been developed for dynamic surface reconstruction by globally optimizing entire sequences. However, reconstructing dynamic scenes with significant topology changes, emerging or disappearing objects, and rapid movements remains a substantial challenge, particularly for long sequences. To address these issues, we propose AT-GS, a novel method for reconstructing high-quality dynamic surfaces from multi-view videos through per-frame incremental optimization. To avoid local minima across frames, we introduce a unified and adaptive gradient-aware densification strategy that integrates the strengths of conventional cloning and splitting techniques. Additionally, we reduce temporal jittering in dynamic surfaces by ensuring consistency in curvature maps across consecutive frames. Our method achieves superior accuracy and temporal coherence in dynamic surface reconstruction, delivering high-fidelity space-time novel view synthesis, even in complex and challenging scenes. Extensive experiments on diverse multi-view video datasets demonstrate the effectiveness of our approach, showing clear advantages over baseline methods.\n\n3D Gaussian Splatting (3DGS) 近年来在动态场景的新视图合成和静态场景的几何重建方面取得了显著成功。在此基础上，早期方法通过对整个序列进行全局优化，实现了动态表面的重建。然而，对于具有显著拓扑变化、新物体出现或消失以及快速运动的动态场景的重建，特别是在长序列中，仍然存在巨大挑战。\n为了解决这些问题，我们提出了 AT-GS，一种通过逐帧增量优化从多视图视频中重建高质量动态表面的新方法。为避免帧间的局部极小值，我们引入了一种统一且自适应的梯度感知致密化策略，结合了传统克隆与分裂技术的优势。此外，通过确保连续帧间曲率图的一致性，我们有效减少了动态表面中的时间抖动。\n我们的方法在动态表面重建中实现了更高的准确性和时间一致性，即使在复杂且具有挑战性的场景中，也能提供高保真的时空新视图合成。在广泛的多视图视频数据集上进行的实验表明，我们的方法相较基线方法展现了显著优势，有效验证了其性能和适用性。\n"
  },
  {
    "path": "abs/2411.06976.md",
    "content": "### A Hierarchical Compression Technique for 3D Gaussian Splatting Compression\n\n3D Gaussian Splatting (GS) demonstrates excellent rendering quality and generation speed in novel view synthesis. However, substantial data size poses challenges for storage and transmission, making 3D GS compression an essential technology. Current 3D GS compression research primarily focuses on developing more compact scene representations, such as converting explicit 3D GS data into implicit forms. In contrast, compression of the GS data itself has hardly been explored. To address this gap, we propose a Hierarchical GS Compression (HGSC) technique. Initially, we prune unimportant Gaussians based on importance scores derived from both global and local significance, effectively reducing redundancy while maintaining visual quality. An Octree structure is used to compress 3D positions. Based on the 3D GS Octree, we implement a hierarchical attribute compression strategy by employing a KD-tree to partition the 3D GS into multiple blocks. We apply farthest point sampling to select anchor primitives within each block and others as non-anchor primitives with varying Levels of Details (LoDs). Anchor primitives serve as reference points for predicting non-anchor primitives across different LoDs to reduce spatial redundancy. For anchor primitives, we use the region adaptive hierarchical transform to achieve near-lossless compression of various attributes. For non-anchor primitives, each is predicted based on the k-nearest anchor primitives. To further minimize prediction errors, the reconstructed LoD and anchor primitives are combined to form new anchor primitives to predict the next LoD. Our method notably achieves superior compression quality and a significant data size reduction of over 4.5 times compared to the state-of-the-art compression method on small scenes datasets\n\n3D Gaussian Splatting (GS) 在新视图合成中表现出卓越的渲染质量和生成速度。然而，其巨大的数据规模对存储和传输提出了挑战，使得 3D GS 数据压缩成为一项关键技术。目前的 3D GS 压缩研究主要集中于开发更紧凑的场景表示形式，例如将显式的 3D GS 数据转化为隐式形式。而对 GS 数据本身的压缩尚未得到充分探索。\n为填补这一空白，我们提出了一种 分层高斯分布压缩技术 (Hierarchical GS Compression, HGSC)。首先，我们通过基于全局和局部重要性得分的筛选机制裁剪不重要的高斯基元，减少冗余同时保持视觉质量。随后，我们采用八叉树 (Octree) 结构对 3D 位置进行压缩。在基于 3D GS 八叉树的基础上，我们通过使用 KD 树将 3D GS 数据分块，实施分层属性压缩策略。在每个块内，我们通过最远点采样选取锚点基元 (anchor primitives)，其余为不同细节级别 (Levels of Details, LoDs) 的非锚点基元。\n锚点基元作为参考点，用于预测不同 LoD 的非锚点基元，以减少空间冗余。对于锚点基元，我们采用区域自适应分层变换技术，实现对多种属性的近无损压缩。而对于非锚点基元，我们基于 k 个最近锚点基元进行预测。为进一步减少预测误差，结合重建的 LoD 和锚点基元形成新的锚点基元，进一步预测下一级 LoD。\n实验表明，我们的方法在小规模场景数据集上实现了显著的数据压缩效果，与最先进的压缩方法相比，数据规模减少了超过 4.5 倍，同时保持了优越的压缩质量。\n"
  },
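  {
    "path": "abs/2411.06976.sketch.md",
    "content": "### Editor's sketch: farthest point sampling for anchors (illustrative)\n\nThis note is an editorial addition: the anchor-selection step above uses farthest point sampling within each KD-tree block, and the following is a minimal sketch of the standard greedy FPS algorithm (not the authors' code). Block contents and the anchor count are hypothetical.\n\n```python\nimport numpy as np\n\ndef farthest_point_sampling(points, m):\n    # Greedy FPS: repeatedly pick the point farthest from the set chosen so\n    # far; the m selected points serve as a block's anchor primitives.\n    chosen = [0]\n    dist = np.linalg.norm(points - points[0], axis=1)\n    for _ in range(m - 1):\n        nxt = int(dist.argmax())\n        chosen.append(nxt)\n        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))\n    return np.asarray(chosen)\n\nblock = np.random.rand(5000, 3)               # Gaussian centers in one block\nanchors = farthest_point_sampling(block, 64)  # indices of 64 anchor primitives\nprint(anchors.shape)\n```\n"
  },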
  {
    "path": "abs/2411.07478.md",
    "content": "### GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering\n\nRecovering the intrinsic physical attributes of a scene from images, generally termed as the inverse rendering problem, has been a central and challenging task in computer vision and computer graphics. In this paper, we present GUS-IR, a novel framework designed to address the inverse rendering problem for complicated scenes featuring rough and glossy surfaces. This paper starts by analyzing and comparing two prominent shading techniques popularly used for inverse rendering, forward shading and deferred shading, effectiveness in handling complex materials. More importantly, we propose a unified shading solution that combines the advantages of both techniques for better decomposition. In addition, we analyze the normal modeling in 3D Gaussian Splatting (3DGS) and utilize the shortest axis as normal for each particle in GUS-IR, along with a depth-related regularization, resulting in improved geometric representation and better shape reconstruction. Furthermore, we enhance the probe-based baking scheme proposed by GS-IR to achieve more accurate ambient occlusion modeling to better handle indirect illumination. Extensive experiments have demonstrated the superior performance of GUS-IR in achieving precise intrinsic decomposition and geometric representation, supporting many downstream tasks (such as relighting, retouching) in computer vision, graphics, and extended reality.\n\n从图像中恢复场景的内在物理属性（通常称为反向渲染问题）一直是计算机视觉和计算机图形学中的核心挑战性任务。在本文中，我们提出了 GUS-IR，一个用于解决复杂场景（包括粗糙和光滑表面）反向渲染问题的新框架。本文首先分析并比较了反向渲染中常用的两种主要着色技术——正向着色 (forward shading) 和延迟着色 (deferred shading)——在处理复杂材质时的效果。更重要的是，我们提出了一种结合两种技术优势的统一着色解决方案，从而实现更优的分解。\n此外，我们分析了 3D Gaussian Splatting (3DGS) 的法线建模，并在 GUS-IR 中利用每个粒子的最短轴作为法线，同时引入深度相关的正则化，提升了几何表示能力并改善了形状重建效果。此外，我们改进了 GS-IR 提出的基于探针的烘焙方案，以更精确地模拟环境光遮蔽（ambient occlusion），从而更好地处理间接光照。\n大量实验表明，GUS-IR 在实现精确的内在分解和几何表示方面表现优越，支持包括重光照（relighting）和图像修饰（retouching）在内的众多下游任务，对计算机视觉、图形学和扩展现实领域具有广泛的应用价值。\n"
  },
  {
    "path": "abs/2411.07541.md",
    "content": "### HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting\n\nThe online reconstruction of dynamic scenes from multi-view streaming videos faces significant challenges in training, rendering and storage efficiency. Harnessing superior learning speed and real-time rendering capabilities, 3D Gaussian Splatting (3DGS) has recently demonstrated considerable potential in this field. However, 3DGS can be inefficient in terms of storage and prone to overfitting by excessively growing Gaussians, particularly with limited views. This paper proposes an efficient framework, dubbed HiCoM, with three key components. First, we construct a compact and robust initial 3DGS representation using a perturbation smoothing strategy. Next, we introduce a Hierarchical Coherent Motion mechanism that leverages the inherent non-uniform distribution and local consistency of 3D Gaussians to swiftly and accurately learn motions across frames. Finally, we continually refine the 3DGS with additional Gaussians, which are later merged into the initial 3DGS to maintain consistency with the evolving scene. To preserve a compact representation, an equivalent number of low-opacity Gaussians that minimally impact the representation are removed before processing subsequent frames. Extensive experiments conducted on two widely used datasets show that our framework improves learning efficiency of the state-of-the-art methods by about 20% and reduces the data storage by 85%, achieving competitive free-viewpoint video synthesis quality but with higher robustness and stability. Moreover, by parallel learning multiple frames simultaneously, our HiCoM decreases the average training wall time to <2 seconds per frame with negligible performance degradation, substantially boosting real-world applicability and responsiveness.\n\n从多视图流媒体视频中在线重建动态场景面临着训练、渲染和存储效率方面的显著挑战。3D Gaussian Splatting (3DGS) 以其卓越的学习速度和实时渲染能力，近年来在该领域展现了巨大潜力。然而，3DGS 在存储效率方面存在不足，且在视图有限的情况下容易通过过多增长的高斯基元导致过拟合。为了解决这些问题，本文提出了一个高效框架 HiCoM，包含三个关键组件。\n首先，我们采用扰动平滑策略构建紧凑且鲁棒的初始 3DGS 表示。接着，我们引入了一种 分层一致运动机制 (Hierarchical Coherent Motion)，利用 3D 高斯的非均匀分布和局部一致性，快速准确地学习帧间运动。最后，我们通过额外高斯对 3DGS 进行持续优化，并将这些高斯合并到初始 3DGS 中，以保持与动态变化场景的一致性。为维持紧凑表示，在处理后续帧前，移除等量的低不透明度高斯基元，这些基元对表示的影响最小。\n在两个广泛使用的数据集上进行的大量实验表明，该框架在学习效率上比现有最先进方法提升约 20%，存储需求减少 85%，在自由视点视频合成质量上达到了具有竞争力的效果，同时表现出更高的鲁棒性和稳定性。此外，通过并行学习多个帧，HiCoM 将平均训练时间减少至每帧小于 2 秒，性能几乎没有下降，显著提升了其在实际应用中的响应速度和适用性。\n"
  },
  {
    "path": "abs/2411.07555.md",
    "content": "### GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting\n\nWe introduce GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians. Our approach allows for selecting the objects to be segmented by interacting with a single view. It accepts intuitive user input, such as point clicks, coarse scribbles, or text. Using 3D Gaussian Splatting (3DGS) as the underlying scene representation simplifies the extraction of objects of interest which are considered to be a subset of the scene's Gaussians. Our key idea is to represent the scene as a graph and use the graph-cut algorithm to minimize an energy function to effectively partition the Gaussians into foreground and background. To achieve this, we construct a graph based on scene Gaussians and devise a segmentation-aligned energy function on the graph to combine user inputs with scene properties. To obtain an initial coarse segmentation, we leverage 2D image/video segmentation models and further refine these coarse estimates using our graph construction. Our empirical evaluations show the adaptability of GaussianCut across a diverse set of scenes. GaussianCut achieves competitive performance with state-of-the-art approaches for 3D segmentation without requiring any additional segmentation-aware training.\n\n我们提出了 GaussianCut，一种针对以 3D 高斯为表示的场景的交互式多视图分割新方法。该方法支持通过单视图交互选择目标对象，接受直观的用户输入形式，例如点选、粗略涂抹或文本描述。利用 3D Gaussian Splatting (3DGS) 作为底层场景表示，简化了感兴趣对象的提取，这些对象被视为场景高斯基元的子集。\n我们的核心思想是将场景表示为图，并通过图切割算法最小化能量函数，从而有效地将高斯基元划分为前景和背景。为此，我们基于场景中的高斯基元构建图，并设计了一个分割对齐的能量函数，将用户输入与场景属性相结合。为获得初步的粗分割结果，我们利用 2D 图像/视频分割模型，并通过我们的图构建方法进一步优化这些粗分割结果。\n实验证明，GaussianCut 在各种场景中具有很强的适应性。在无需额外针对分割任务训练的情况下，GaussianCut 实现了与当前最先进的 3D 分割方法相媲美的性能。\n"
  },
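  {
    "path": "abs/2411.07555.sketch.md",
    "content": "### Editor's sketch: s-t min-cut over Gaussians (illustrative)\n\nThis note is an editorial addition, not the authors' energy function: a minimal sketch of the graph-cut idea above, where Gaussians become nodes, terminal edges encode user evidence (here a single click), pairwise edges encode spatial affinity, and an s-t minimum cut yields the foreground/background partition. The weights, click model, and sparsification threshold are hypothetical.\n\n```python\nimport numpy as np\nimport networkx as nx\n\ncenters = np.random.rand(60, 3)               # toy Gaussian centers\nclick = centers[0]                            # user clicked near this Gaussian\nG = nx.DiGraph()\nfor i, c in enumerate(centers):\n    fg = float(np.exp(-4.0 * np.linalg.norm(c - click)))  # unary foreground term\n    G.add_edge('src', i, capacity=fg)\n    G.add_edge(i, 'sink', capacity=1.0 - fg)\nfor i in range(len(centers)):                 # pairwise smoothness on near pairs\n    for j in range(i + 1, len(centers)):\n        w = float(np.exp(-8.0 * np.linalg.norm(centers[i] - centers[j])))\n        if w > 0.2:\n            G.add_edge(i, j, capacity=w)\n            G.add_edge(j, i, capacity=w)\ncut_value, (src_side, sink_side) = nx.minimum_cut(G, 'src', 'sink')\nforeground = sorted(n for n in src_side if n != 'src')\nprint(len(foreground), 'Gaussians labeled foreground')\n```\n"
  },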
  {
    "path": "abs/2411.07579.md",
    "content": "### Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation\n\nRecently, 3D Gaussian Splatting has dominated novel-view synthesis with its real-time rendering speed and state-of-the-art rendering quality. However, during the rendering process, the use of the Jacobian of the affine approximation of the projection transformation leads to inevitable errors, resulting in blurriness, artifacts and a lack of scene consistency in the final rendered images. To address this issue, we introduce an ellipsoid-based projection method to calculate the projection of Gaussian ellipsoid on the image plane, witch is the primitive of 3D Gaussian Splatting. As our proposed ellipsoid-based projection method cannot handle Gaussian ellipsoids with camera origins inside them or parts lying below z=0 plane in the camera space, we designed a pre-filtering strategy. Experiments over multiple widely adopted benchmark datasets show that using our ellipsoid-based projection method can enhance the rendering quality of 3D Gaussian Splatting and its extensions.\n\n近年来，3D Gaussian Splatting 凭借其实时渲染速度和最先进的渲染质量在新视图合成领域占据了主导地位。然而，在渲染过程中，使用投影变换的仿射近似雅可比矩阵不可避免地引入误差，导致最终渲染图像出现模糊、伪影以及场景一致性不足等问题。\n为了解决这一问题，我们提出了一种基于椭球的投影方法，用于计算 3D Gaussian Splatting 中高斯椭球在图像平面上的投影。相比于传统方法，该投影更接近真实投影，减少了误差。然而，由于该方法无法处理相机原点位于高斯椭球内部或高斯椭球部分位于相机空间 ￼ 平面以下的情况，我们设计了一种预过滤策略以应对这些特殊情况。\n在多个广泛采用的基准数据集上的实验表明，基于椭球的投影方法能够显著提升 3D Gaussian Splatting 及其扩展方法的渲染质量。\n"
  },
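  {
    "path": "abs/2411.07579.sketch.md",
    "content": "### Editor's sketch: exact ellipsoid-to-conic projection (illustrative)\n\nThis note is an editorial addition, not necessarily the paper's exact formulation: it contrasts the criticized affine (Jacobian) approximation with the classical dual-quadric identity C* = P Q* P^T from projective geometry, under which the silhouette of an ellipsoid maps to an exact conic on the image plane. The camera intrinsics and ellipsoid values are hypothetical.\n\n```python\nimport numpy as np\n\ndef ellipsoid_image_conic(A, c, P):\n    # Ellipsoid (x - c)^T A (x - c) = 1 as a 4x4 quadric Q; its dual Q^-1\n    # projects exactly to the dual image conic C* = P Q^-1 P^T, whose inverse\n    # C satisfies u^T C u = 0 on the silhouette (no affine approximation).\n    Q = np.zeros((4, 4))\n    Q[:3, :3] = A\n    Q[:3, 3] = -A @ c\n    Q[3, :3] = -A @ c\n    Q[3, 3] = float(c @ A @ c) - 1.0\n    C_dual = P @ np.linalg.inv(Q) @ P.T\n    return np.linalg.inv(C_dual)\n\nK = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])\nP = K @ np.hstack([np.eye(3), np.zeros((3, 1))])         # camera at the origin\nA = np.diag([1 / 0.2 ** 2, 1 / 0.3 ** 2, 1 / 0.1 ** 2])  # semi-axes 0.2, 0.3, 0.1\nconic = ellipsoid_image_conic(A, np.array([0.0, 0.0, 4.0]), P)\n```\n"
  },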
  {
    "path": "abs/2411.08279.md",
    "content": "### MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation\n\nEmerging 3D scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated their effectiveness in Simultaneous Localization and Mapping (SLAM) for photo-realistic rendering, particularly when using high-quality video sequences as input. However, existing methods struggle with motion-blurred frames, which are common in real-world scenarios like low-light or long-exposure conditions. This often results in a significant reduction in both camera localization accuracy and map reconstruction quality. To address this challenge, we propose a dense visual SLAM pipeline (i.e. MBA-SLAM) to handle severe motion-blurred inputs. Our approach integrates an efficient motion blur-aware tracker with either neural radiance fields or Gaussian Splatting based mapper. By accurately modeling the physical image formation process of motion-blurred images, our method simultaneously learns 3D scene representation and estimates the cameras' local trajectory during exposure time, enabling proactive compensation for motion blur caused by camera movement. In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach.\n\n新兴的 3D 场景表示方法，如神经辐射场（Neural Radiance Fields, NeRF）和 3D Gaussian Splatting (3DGS)，在基于高质量视频序列的同时定位与建图（Simultaneous Localization and Mapping, SLAM）任务中表现出了高效的真实感渲染能力。然而，现有方法在处理运动模糊帧时表现欠佳，这在现实场景中（如低光照或长曝光条件下）十分常见，往往导致相机定位精度和地图重建质量显著下降。\n为应对这一挑战，我们提出了一种处理严重运动模糊输入的密集视觉 SLAM 管道——MBA-SLAM。该方法结合了一种高效的运动模糊感知追踪器（motion blur-aware tracker）和基于神经辐射场或高斯分布的建图器（mapper）。通过准确建模运动模糊图像的物理生成过程，该方法能够同时学习 3D 场景表示并估计曝光时间内相机的局部轨迹，从而主动补偿因相机运动导致的运动模糊。\n在实验中，我们验证了 MBA-SLAM 在相机定位和地图重建方面优于当前最先进的方法。无论是在包含清晰图像的数据集还是存在运动模糊的数据集中（包括合成和真实数据集），我们的方法均表现出卓越性能，展示了其卓越的通用性和鲁棒性。\n"
  },
  {
    "path": "abs/2411.08373.md",
    "content": "### DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization\n\nAchieving robust and precise pose estimation in dynamic scenes is a significant research challenge in Visual Simultaneous Localization and Mapping (SLAM). Recent advancements integrating Gaussian Splatting into SLAM systems have proven effective in creating high-quality renderings using explicit 3D Gaussian models, significantly improving environmental reconstruction fidelity. However, these approaches depend on a static environment assumption and face challenges in dynamic environments due to inconsistent observations of geometry and photometry. To address this problem, we propose DG-SLAM, the first robust dynamic visual SLAM system grounded in 3D Gaussians, which provides precise camera pose estimation alongside high-fidelity reconstructions. Specifically, we propose effective strategies, including motion mask generation, adaptive Gaussian point management, and a hybrid camera tracking algorithm to improve the accuracy and robustness of pose estimation. Extensive experiments demonstrate that DG-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, and novel-view synthesis in dynamic scenes, outperforming existing methods meanwhile preserving real-time rendering ability.\n\n在动态场景中实现鲁棒且精确的姿态估计是视觉同时定位与建图（Visual SLAM）领域的一个重要研究难题。最近，结合 3D Gaussian Splatting 的 SLAM 系统在通过显式 3D 高斯模型生成高质量渲染方面表现出色，显著提高了环境重建的保真度。然而，这些方法依赖静态环境假设，在动态场景中，由于几何和光度观测的不一致性而面临挑战。\n为了解决这一问题，我们提出了 DG-SLAM，这是首个基于 3D 高斯的鲁棒动态视觉 SLAM 系统，同时提供精确的相机姿态估计和高保真的重建。具体而言，我们设计了多项有效策略，包括运动掩膜生成、适应性高斯点管理以及混合相机跟踪算法，从而提升姿态估计的准确性和鲁棒性。\n大量实验表明，DG-SLAM 在动态场景中的相机姿态估计、地图重建以及新视图合成方面实现了最先进的性能，同时保留了实时渲染能力，显著优于现有方法。\n"
  },
  {
    "path": "abs/2411.08508.md",
    "content": "### BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis\n\nWe present billboard Splatting (BBSplat) - a novel approach for 3D scene representation based on textured geometric primitives. BBSplat represents the scene as a set of optimizable textured planar primitives with learnable RGB textures and alpha-maps to control their shape. BBSplat primitives can be used in any Gaussian Splatting pipeline as drop-in replacements for Gaussians. Our method's qualitative and quantitative improvements over 3D and 2D Gaussians are most noticeable when fewer primitives are used, when BBSplat achieves over 1200 FPS. Our novel regularization term encourages textures to have a sparser structure, unlocking an efficient compression that leads to a reduction in storage space of the model. Our experiments show the efficiency of BBSplat on standard datasets of real indoor and outdoor scenes such as Tanks&Temples, DTU, and Mip-NeRF-360. We demonstrate improvements on PSNR, SSIM, and LPIPS metrics compared to the state-of-the-art, especially for the case when fewer primitives are used, which, on the other hand, leads to up to 2 times inference speed improvement for the same rendering quality.\n\n我们提出了 Billboard Splatting (BBSplat)，一种基于纹理几何基元的创新 3D 场景表示方法。BBSplat 将场景表示为一组可优化的纹理平面基元，这些基元具有可学习的 RGB 纹理和 alpha 映射，用于控制其形状。BBSplat 基元可以作为高斯基元的直接替代品，无缝集成到任何 3D Gaussian Splatting (3DGS) 管道中。\n与 3D 和 2D 高斯相比，当使用较少的基元时，BBSplat 在定性和定量上的改进最为显著，同时实现了超过 1200 FPS 的渲染速度。我们设计了一种新颖的正则化项，鼓励纹理具有稀疏结构，从而实现高效压缩，大幅减少模型存储空间需求。\n在 Tanks&Temples、DTU 和 Mip-NeRF-360 等标准室内外场景数据集上的实验表明，BBSplat 在 PSNR、SSIM 和 LPIPS 等指标上相较当前最先进方法取得了显著改进。尤其是在使用较少基元的情况下，BBSplat 在保持相同渲染质量的同时，实现了高达 2 倍的推理速度提升，展示了其卓越的效率与性能。\n"
  },
  {
    "path": "abs/2411.08879.md",
    "content": "### 4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization\n\nNovel view synthesis of dynamic scenes is becoming important in various applications, including augmented and virtual reality. We propose a novel 4D Gaussian Splatting (4DGS) algorithm for dynamic scenes from casually recorded monocular videos. To overcome the overfitting problem of existing work for these real-world videos, we introduce an uncertainty-aware regularization that identifies uncertain regions with few observations and selectively imposes additional priors based on diffusion models and depth smoothness on such regions. This approach improves both the performance of novel view synthesis and the quality of training image reconstruction. We also identify the initialization problem of 4DGS in fast-moving dynamic regions, where the Structure from Motion (SfM) algorithm fails to provide reliable 3D landmarks. To initialize Gaussian primitives in such regions, we present a dynamic region densification method using the estimated depth maps and scene flow. Our experiments show that the proposed method improves the performance of 4DGS reconstruction from a video captured by a handheld monocular camera and also exhibits promising results in few-shot static scene reconstruction.\n\n动态场景的新视图合成在增强现实和虚拟现实等应用中变得越来越重要。我们提出了一种新颖的 4D Gaussian Splatting (4DGS) 算法，用于从随意录制的单目视频中生成动态场景。为克服现有方法在处理真实世界视频时的过拟合问题，我们引入了一种不确定性感知正则化，该方法识别观测较少的高不确定性区域，并在这些区域选择性地施加基于扩散模型和深度平滑性的附加先验。此方法提升了新视图合成的性能，同时改善了训练图像重建的质量。\n此外，我们还识别了 4DGS 在快速移动的动态区域中初始化的难点。在这些区域，结构化运动（Structure from Motion, SfM）算法无法提供可靠的 3D 特征点。为解决这一问题，我们提出了一种基于估计深度图和场景流的动态区域致密化方法，用于初始化这些区域中的高斯基元。\n实验结果表明，该方法显著提升了从手持单目相机录制视频中进行 4DGS 重建的性能，同时在少样本静态场景重建中也表现出了令人期待的效果。\n"
  },
  {
    "path": "abs/2411.09156.md",
    "content": "### DyGASR: Dynamic Generalized Exponential Splatting with Surface Alignment for Accelerated 3D Mesh Reconstruction\n\nRecent advancements in 3D Gaussian Splatting (3DGS), which lead to high-quality novel view synthesis and accelerated rendering, have remarkably improved the quality of radiance field reconstruction. However, the extraction of mesh from a massive number of minute 3D Gaussian points remains great challenge due to the large volume of Gaussians and difficulty of representation of sharp signals caused by their inherent low-pass characteristics. To address this issue, we propose DyGASR, which utilizes generalized exponential function instead of traditional 3D Gaussian to decrease the number of particles and dynamically optimize the representation of the captured signal. In addition, it is observed that reconstructing mesh with Generalized Exponential Splatting(GES) without modifications frequently leads to failures since the generalized exponential distribution centroids may not precisely align with the scene surface. To overcome this, we adopt Sugar's approach and introduce Generalized Surface Regularization (GSR), which reduces the smallest scaling vector of each point cloud to zero and ensures normal alignment perpendicular to the surface, facilitating subsequent Poisson surface mesh reconstruction. Additionally, we propose a dynamic resolution adjustment strategy that utilizes a cosine schedule to gradually increase image resolution from low to high during the training stage, thus avoiding constant full resolution, which significantly boosts the reconstruction speed. Our approach surpasses existing 3DGS-based mesh reconstruction methods, as evidenced by extensive evaluations on various scene datasets, demonstrating a 25% increase in speed, and a 30% reduction in memory usage.\n\n最近在3D Gaussian Splatting (3DGS) 方面的进展显著提升了新视角合成的质量和渲染速度，加速了辐射场重建。然而，从大量微小的3D高斯点中提取网格仍然是一大挑战，主要原因在于高斯点数量庞大且其固有的低通特性难以表现出锐利信号。为了解决这个问题，我们提出了 DyGASR 方法，该方法采用广义指数函数代替传统的3D高斯分布，从而减少粒子数量，并动态优化捕获信号的表示能力。\n此外，我们观察到直接使用 广义指数分布点渲染（Generalized Exponential Splatting, GES） 进行网格重建通常会失败，这是因为广义指数分布的质心可能无法准确对齐场景表面。为了解决这一问题，我们借鉴了 Sugar 方法，引入了 广义表面正则化（Generalized Surface Regularization, GSR）。该方法将每个点云的最小缩放向量减少到零，并确保法线垂直于表面对齐，从而促进后续的 Poisson 表面网格重建。\n此外，我们提出了一种动态分辨率调整策略，在训练过程中通过余弦调度从低分辨率逐步提高至高分辨率，避免始终使用全分辨率，从而显著加快重建速度。\n通过在多个场景数据集上的广泛评估，我们的方法在网格重建的速度上比现有的基于3DGS的方法提升了 25%，内存使用减少了 30%，展现了显著的优势。\n"
  },
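  {
    "path": "abs/2411.09156.sketch.md",
    "content": "### Editor's sketch: cosine resolution schedule (illustrative)\n\nThis note is an editorial addition: the dynamic resolution strategy above is a plain cosine ramp over training steps, sketched below with hypothetical endpoint values (not the paper's exact schedule).\n\n```python\nimport math\n\ndef resolution_at(step, total_steps, low=128, high=1024):\n    # Cosine schedule: training image resolution ramps smoothly from low to\n    # high, so early iterations avoid costly full-resolution rendering.\n    t = min(step / total_steps, 1.0)\n    return int(low + (high - low) * (1.0 - math.cos(math.pi * t)) / 2.0)\n\nprint([resolution_at(s, 10000) for s in range(0, 10001, 2500)])\n# -> [128, 259, 576, 892, 1024]\n```\n"
  },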
  {
    "path": "abs/2411.09952.md",
    "content": "### GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video\n\nAvatar modelling has broad applications in human animation and virtual try-ons. Recent advancements in this field have focused on high-quality and comprehensive human reconstruction but often overlook the separation of clothing from the body. To bridge this gap, this paper introduces GGAvatar (Garment-separated 3D Gaussian Splatting Avatar), which relies on monocular videos. Through advanced parameterized templates and unique phased training, this model effectively achieves decoupled, editable, and realistic reconstruction of clothed humans. Comparative evaluations with other costly models confirm GGAvatar's superior quality and efficiency in modelling both clothed humans and separable garments. The paper also showcases applications in clothing editing, as illustrated in Figure 1, highlighting the model's benefits and the advantages of effective disentanglement.\n\n头像建模在人体动画和虚拟试穿中具有广泛的应用。尽管该领域的最新进展集中于高质量且全面的人体重建，但往往忽略了服装与身体的分离。为填补这一空白，本文提出了 GGAvatar（Garment-separated 3D Gaussian Splatting Avatar），其基于单目视频，通过先进的参数化模板和独特的分阶段训练方法，有效实现了穿衣人体的分离式、可编辑且逼真的重建。\n与其他高成本模型的对比评估证实了 GGAvatar 在建模穿衣人体和可分离服装方面的优越质量和效率。此外，本文展示了该模型在服装编辑中的应用（如图1所示），突显了模型的优势以及高效解耦带来的价值。\n"
  },
  {
    "path": "abs/2411.10033.md",
    "content": "### GSEditPro: 3D Gaussian Splatting Editing with Attention-based Progressive Localization\n\nWith the emergence of large-scale Text-to-Image(T2I) models and implicit 3D representations like Neural Radiance Fields (NeRF), many text-driven generative editing methods based on NeRF have appeared. However, the implicit encoding of geometric and textural information poses challenges in accurately locating and controlling objects during editing. Recently, significant advancements have been made in the editing methods of 3D Gaussian Splatting, a real-time rendering technology that relies on explicit representation. However, these methods still suffer from issues including inaccurate localization and limited manipulation over editing. To tackle these challenges, we propose GSEditPro, a novel 3D scene editing framework which allows users to perform various creative and precise editing using text prompts only. Leveraging the explicit nature of the 3D Gaussian distribution, we introduce an attention-based progressive localization module to add semantic labels to each Gaussian during rendering. This enables precise localization on editing areas by classifying Gaussians based on their relevance to the editing prompts derived from cross-attention layers of the T2I model. Furthermore, we present an innovative editing optimization method based on 3D Gaussian Splatting, obtaining stable and refined editing results through the guidance of Score Distillation Sampling and pseudo ground truth. We prove the efficacy of our method through extensive experiments.\n\n随着大规模文本生成图像（Text-to-Image, T2I）模型和神经辐射场（Neural Radiance Fields, NeRF）等隐式3D表示的兴起，许多基于NeRF的文本驱动生成编辑方法相继出现。然而，隐式编码几何和纹理信息的方式在编辑中准确定位和控制对象方面仍面临挑战。最近，基于3D Gaussian Splatting 的实时渲染技术在编辑方法上取得了显著进展，该技术依赖显式表示。然而，这些方法仍然存在定位不准确和编辑操控性有限的问题。\n为了解决这些挑战，我们提出了 GSEditPro，一种全新的3D场景编辑框架，允许用户仅使用文本提示进行多种创造性且精确的编辑。通过利用3D高斯分布的显式特性，我们引入了一种基于注意力的渐进定位模块，在渲染过程中为每个高斯添加语义标签。该模块通过T2I模型交叉注意力层生成的编辑提示，基于高斯的相关性对其进行分类，从而实现编辑区域的精准定位。\n此外，我们提出了一种基于3D Gaussian Splatting 的创新编辑优化方法，结合得分蒸馏采样（Score Distillation Sampling）和伪真实值（pseudo ground truth）的引导，获得稳定且精细的编辑结果。通过广泛的实验验证，我们证明了该方法的有效性。\n"
  },
  {
    "path": "abs/2411.10133.md",
    "content": "### Efficient Density Control for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) excels in novel view synthesis, balancing advanced rendering quality with real-time performance. However, in trained scenes, a large number of Gaussians with low opacity significantly increase rendering costs. This issue arises due to flaws in the split and clone operations during the densification process, which lead to extensive Gaussian overlap and subsequent opacity reduction. To enhance the efficiency of Gaussian utilization, we improve the adaptive density control of 3DGS. First, we introduce a more efficient long-axis split operation to replace the original clone and split, which mitigates Gaussian overlap and improves densification efficiency. Second, we propose a simple adaptive pruning technique to reduce the number of low-opacity Gaussians. Finally, by dynamically lowering the splitting threshold and applying importance weighting, the efficiency of Gaussian utilization is further improved. We evaluate our proposed method on various challenging real-world datasets. Experimental results show that our Efficient Density Control (EDC) can enhance both the rendering speed and quality.\n\n3D Gaussian Splatting (3DGS) 在新视角合成中表现出色，兼顾了先进的渲染质量和实时性能。然而，在已训练场景中，大量低不透明度的高斯点显著增加了渲染成本。此问题主要源于致密化过程中分裂与克隆操作的缺陷，这导致高斯点的严重重叠及随之而来的不透明度降低。\n为提高高斯点的利用效率，我们改进了3DGS的自适应密度控制方法。首先，我们引入了一种更高效的长轴分裂操作，替代原有的克隆与分裂机制，从而减少高斯重叠并提升致密化效率。其次，我们提出了一种简单的自适应剪枝技术，用以减少低不透明度高斯点的数量。最后，通过动态降低分裂阈值并引入重要性加权策略，进一步优化了高斯点的利用效率。\n我们在多个具有挑战性的真实场景数据集上对该方法进行了评估。实验结果表明，所提出的高效密度控制方法（Efficient Density Control, EDC）能够同时提升渲染速度和质量。\n"
  },
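  {
    "path": "abs/2411.10133.sketch.md",
    "content": "### Editor's sketch: long-axis split (illustrative)\n\nThis note is an editorial addition in the spirit of the long-axis split described above; the exact offsets and shrink ratios are hypothetical, not the paper's. The parent covariance is eigendecomposed, two children are placed symmetrically along the dominant axis, and the variance along that axis is reduced so the pair overlaps less than naive clone-and-split.\n\n```python\nimport numpy as np\n\ndef long_axis_split(mean, cov):\n    # Split one Gaussian into two children along its longest principal axis.\n    w, V = np.linalg.eigh(cov)                # ascending eigenvalues\n    axis = V[:, -1]                           # direction of largest variance\n    offset = 0.5 * np.sqrt(w[-1]) * axis      # children displaced by 0.5 sigma\n    child_cov = cov - 0.75 * w[-1] * np.outer(axis, axis)  # halve sigma on axis\n    return (mean + offset, child_cov), (mean - offset, child_cov)\n\nparent_cov = np.diag([0.01, 0.01, 0.25])      # Gaussian elongated along z\n(m1, c1), (m2, c2) = long_axis_split(np.zeros(3), parent_cov)\nprint(m1, np.diag(c1))                        # z-variance drops to 0.0625\n```\n"
  },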
  {
    "path": "abs/2411.10504.md",
    "content": "### USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting\n\nSpike cameras, as an innovative neuromorphic camera that captures scenes with the 0-1 bit stream at 40 kHz, are increasingly employed for the 3D reconstruction task via Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS). Previous spike-based 3D reconstruction approaches often employ a casecased pipeline: starting with high-quality image reconstruction from spike streams based on established spike-to-image reconstruction algorithms, then progressing to camera pose estimation and 3D reconstruction. However, this cascaded approach suffers from substantial cumulative errors, where quality limitations of initial image reconstructions negatively impact pose estimation, ultimately degrading the fidelity of the 3D reconstruction. To address these issues, we propose a synergistic optimization framework, \\textbf{USP-Gaussian}, that unifies spike-based image reconstruction, pose correction, and Gaussian splatting into an end-to-end framework. Leveraging the multi-view consistency afforded by 3DGS and the motion capture capability of the spike camera, our framework enables a joint iterative optimization that seamlessly integrates information between the spike-to-image network and 3DGS. Experiments on synthetic datasets with accurate poses demonstrate that our method surpasses previous approaches by effectively eliminating cascading errors. Moreover, we integrate pose optimization to achieve robust 3D reconstruction in real-world scenarios with inaccurate initial poses, outperforming alternative methods by effectively reducing noise and preserving fine texture details.\n\n尖峰相机作为一种创新的类脑神经形态相机，以 40 kHz 的速率捕获场景并生成 0-1 比特流，正逐步应用于通过神经辐射场（NeRF）或三维高斯点（3DGS）进行三维重建任务。以往基于尖峰相机的三维重建方法通常采用一个级联的处理流程：首先通过现有的尖峰流到图像的重建算法生成高质量的图像，然后进行相机位姿估计和三维重建。然而，这种级联方法存在显著的累积误差问题，初始图像重建质量的限制会对位姿估计产生负面影响，从而最终降低三维重建的精度。\n为了解决这些问题，我们提出了一种协同优化框架，称为 USP-Gaussian，将基于尖峰的图像重建、位姿校正和高斯点绘制统一到一个端到端的框架中。通过利用 3DGS 提供的多视图一致性和尖峰相机的运动捕获能力，该框架实现了尖峰流到图像网络和 3DGS 之间信息的联合迭代优化。在合成数据集上的实验表明，即使初始位姿非常准确，我们的方法仍能通过有效消除级联误差而优于现有方法。此外，在真实场景中面对不准确的初始位姿时，我们集成了位姿优化，能够实现稳健的三维重建，相较于其他方法，我们的方法能够有效降低噪声并保留细腻的纹理细节。\n"
  },
  {
    "path": "abs/2411.10722.md",
    "content": "### DGS-SLAM: Gaussian Splatting SLAM in Dynamic Environment\n\nWe introduce Dynamic Gaussian Splatting SLAM (DGS-SLAM), the first dynamic SLAM framework built on the foundation of Gaussian Splatting. While recent advancements in dense SLAM have leveraged Gaussian Splatting to enhance scene representation, most approaches assume a static environment, making them vulnerable to photometric and geometric inconsistencies caused by dynamic objects. To address these challenges, we integrate Gaussian Splatting SLAM with a robust filtering process to handle dynamic objects throughout the entire pipeline, including Gaussian insertion and keyframe selection. Within this framework, to further improve the accuracy of dynamic object removal, we introduce a robust mask generation method that enforces photometric consistency across keyframes, reducing noise from inaccurate segmentation and artifacts such as shadows. Additionally, we propose the loop-aware window selection mechanism, which utilizes unique keyframe IDs of 3D Gaussians to detect loops between the current and past frames, facilitating joint optimization of the current camera poses and the Gaussian map. DGS-SLAM achieves state-of-the-art performance in both camera tracking and novel view synthesis on various dynamic SLAM benchmarks, proving its effectiveness in handling real-world dynamic scenes.\n\n我们提出了动态高斯点 SLAM（DGS-SLAM），这是第一个基于高斯点绘制构建的动态 SLAM 框架。尽管最近稠密 SLAM 的进展已经利用高斯点绘制来增强场景表示，但大多数方法假设环境是静态的，这使其容易受到动态物体引起的光度和几何不一致的影响。为了解决这些问题，我们将高斯点 SLAM 与鲁棒过滤流程相结合，以处理整个管道中的动态物体，包括高斯点的插入和关键帧选择。\n在该框架中，为了进一步提高动态物体移除的精度，我们引入了一种鲁棒的遮罩生成方法，通过在关键帧之间强制光度一致性，减少了由于不准确分割和诸如阴影等伪影引起的噪声。此外，我们提出了一种回环感知的窗口选择机制，利用 3D 高斯点的唯一关键帧 ID 检测当前帧与历史帧之间的回环，从而实现当前相机位姿和高斯地图的联合优化。\nDGS-SLAM 在多个动态 SLAM 基准上，在相机追踪和新视图合成任务中都达到了最先进的性能，证明了其在处理真实动态场景方面的有效性。\n"
  },
  {
    "path": "abs/2411.10947.md",
    "content": "### Direct and Explicit 3D Generation from a Single Image\n\nCurrent image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images along with 3D Gaussian features using a repurposed Stable Diffusion model. We introduce a depth branch into U-Net for efficient and high quality multi-view, cross-domain generation and incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency. By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation that can be either rendered via Gaussian splatting or extracted to high-quality meshes, thereby leveraging additional novel view synthesis loss to further improve our performance. Extensive experiments demonstrate that our method surpasses existing baselines in geometry and texture quality while achieving significantly faster generation time.\n\n现有的图像到三维方法面临高计算成本的问题，且在高分辨率输出中缺乏可扩展性。对此，我们提出了一种新颖的框架，通过多视角二维深度图和 RGB 图像以及三维高斯特征，结合改造的 Stable Diffusion 模型，直接生成显式表面几何和纹理。\n我们在 U-Net 中引入了深度分支，实现高效且高质量的多视角跨域生成，同时在像素解码器中融入了极线注意力机制，确保像素级的多视角一致性。通过将生成的深度像素反投影到三维空间，我们构建了一种结构化的三维表示，该表示既可以通过高斯点绘制进行渲染，也可以提取为高质量网格。此外，我们利用额外的新视图合成损失，进一步提升了性能。\n大量实验表明，我们的方法在几何和纹理质量方面均优于现有基线，同时显著减少了生成时间，实现了更高的效率和表现力。\n"
  },
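  {
    "path": "abs/2411.10947.sketch.md",
    "content": "### Editor's sketch: depth back-projection (illustrative)\n\nThis note is an editorial addition: the back-projection step above is the standard pinhole unprojection X = z * K^-1 [u, v, 1]^T, sketched below with hypothetical intrinsics and a constant toy depth map.\n\n```python\nimport numpy as np\n\ndef backproject(depth, K):\n    # Lift a depth map to camera-space 3D points, one per pixel (u, v).\n    h, w = depth.shape\n    u, v = np.meshgrid(np.arange(w), np.arange(h))\n    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)\n    rays = pix @ np.linalg.inv(K).T           # K^-1 applied row-wise\n    return rays * depth.reshape(-1, 1)        # (h*w, 3) point cloud\n\nK = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])\npoints = backproject(np.full((480, 640), 2.0), K)  # a plane 2 m from the camera\nprint(points.shape)\n```\n"
  },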
  {
    "path": "abs/2411.11024.md",
    "content": "### VeGaS: Video Gaussian Splatting\n\nImplicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models can be utilized to transform the coordinates of pixel locations along with frame occurrence times (or indices) into RGB color values. Although INRs facilitate effective compression, they are unsuitable for editing purposes. One potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which is capable of encoding video as a multitude of 3D Gaussians and is applicable for numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream and model consecutive frames by 2D Gaussians obtained as respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data.\n\n隐式神经表示（INRs）通过神经网络将离散数据近似为连续函数。在视频数据的背景下，此类模型可以将像素位置的坐标和帧的时间（或索引）转换为 RGB 颜色值。尽管 INRs 在压缩方面表现出色，但它们并不适合用于编辑操作。一种潜在的解决方案是使用基于三维高斯点绘制（3DGS）的模型，例如视频高斯表示（VGR），它能够将视频编码为多个三维高斯点，并支持包括编辑在内的多种视频处理操作。然而，这种情况下的修改能力仅限于少量基本变换。\n为了解决这一问题，我们提出了 视频高斯点绘制（VeGaS） 模型，能够实现对视频数据的真实感修改。为了构建 VeGaS，我们设计了一种新的折叠高斯分布（Folded-Gaussian distributions）家族，用于捕捉视频流中的非线性动态，并通过条件分布将连续帧建模为相应的二维高斯分布。\n实验表明，VeGaS 在帧重建任务中优于现有的最先进方法，同时能够对视频数据进行真实感的修改，显著拓展了视频编辑的可能性。\n"
  },
  {
    "path": "abs/2411.11363.md",
    "content": "### GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views\n\nDifferentiable rendering techniques have recently shown promising results for free-viewpoint video synthesis of characters. However, such methods, either Gaussian Splatting or neural implicit rendering, typically necessitate per-subject optimization which does not meet the requirement of real-time rendering in an interactive application. We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. To this end, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian properties for instant novel view synthesis without any fine-tuning or optimization. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable with both depth and rendering supervision or with only rendering supervision. We further introduce a regularization term and an epipolar attention mechanism to preserve geometry consistency between two source views, especially when neglecting depth supervision. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.\n\n可微分渲染技术近年来在角色的自由视角视频合成中展现了令人瞩目的成果。然而，无论是高斯点绘制还是神经隐式渲染，这些方法通常需要针对每个目标进行优化，难以满足交互式应用中实时渲染的需求。为此，我们提出了一种可泛化的高斯点绘制方法，能够在稀疏视角相机设置下实现高分辨率图像渲染。\n我们引入了一种基于源视图定义的高斯参数映射，通过直接回归高斯属性，无需任何微调或优化即可实现即时的新视角合成。我们的高斯参数回归模块可以在仅包含人体的数据或包含人体与场景的数据上进行训练，并与深度估计模块联合，借助二维参数映射提升至三维空间。该框架完全可微分，可以利用深度和渲染监督，或仅通过渲染监督进行训练。\n此外，我们提出了一种正则化项以及极线注意力机制，以在忽略深度监督时仍然保持两视图之间的几何一致性。在多个数据集上的实验表明，我们的方法在渲染质量上优于当前最先进的方法，同时实现了显著的渲染速度提升。\n"
  },
  {
    "path": "abs/2411.11839.md",
    "content": "### RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator\n\nEfficient acquisition of real-world embodied data has been increasingly critical. However, large-scale demonstrations captured by remote operation tend to take extremely high costs and fail to scale up the data size in an efficient manner. Sampling the episodes under a simulated environment is a promising way for large-scale collection while existing simulators fail to high-fidelity modeling on texture and physics. To address these limitations, we introduce the RoboGSim, a real2sim2real robotic simulator, powered by 3D Gaussian Splatting and the physics engine. RoboGSim mainly includes four parts: Gaussian Reconstructor, Digital Twins Builder, Scene Composer, and Interactive Engine. It can synthesize the simulated data with novel views, objects, trajectories, and scenes. RoboGSim also provides an online, reproducible, and safe evaluation for different manipulation policies. The real2sim and sim2real transfer experiments show a high consistency in the texture and physics. Moreover, the effectiveness of synthetic data is validated under the real-world manipulated tasks. We hope RoboGSim serves as a closed-loop simulator for fair comparison on policy learning.\n\n高效获取真实世界的具身数据变得越来越重要。然而，通过远程操作捕获的大规模示范数据成本极高，难以有效扩大数据规模。在模拟环境下采样任务片段是一种有前景的大规模数据收集方式，但现有模拟器在纹理和物理建模的高保真性方面存在不足。为了解决这些限制，我们提出了 RoboGSim，一个基于真实到模拟再回归真实（real2sim2real）流程的机器人模拟器，由 3D 高斯点绘制和物理引擎驱动。\nRoboGSim 主要包括四个模块：高斯重构器（Gaussian Reconstructor）、数字孪生构建器（Digital Twins Builder）、场景编辑器（Scene Composer）以及交互引擎（Interactive Engine）。它可以合成具有新视角、对象、轨迹和场景的模拟数据，并提供在线、可复现且安全的环境，用于评估不同的操作策略。通过真实到模拟和模拟到真实的实验验证，RoboGSim 在纹理和物理表现上展现了高度一致性。此外，实验还表明，合成数据在真实世界中的操作任务中具有显著的有效性。\n我们希望 RoboGSim 能作为一个闭环模拟器，为策略学习的公平比较提供支持。\n"
  },
  {
    "path": "abs/2411.11921.md",
    "content": "### DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes\n\nWe present DeSiRe-GS, a self-supervised gaussian splatting representation, enabling effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline of dynamic street Gaussians. In the first stage, we extract 2D motion masks based on the observation that 3D Gaussian Splatting inherently can reconstruct only the static regions in dynamic environments. These extracted 2D motion priors are then mapped into the Gaussian space in a differentiable manner, leveraging an efficient formulation of dynamic Gaussians in the second stage. Combined with the introduced geometric regularizations, our method are able to address the over-fitting issues caused by data sparsity in autonomous driving, reconstructing physically plausible Gaussians that align with object surfaces rather than floating in air. Furthermore, we introduce temporal cross-view consistency to ensure coherence across time and viewpoints, resulting in high-quality surface reconstruction. Comprehensive experiments demonstrate the efficiency and effectiveness of DeSiRe-GS, surpassing prior self-supervised arts and achieving accuracy comparable to methods relying on external 3D bounding box annotations.\n\n我们提出了 DeSiRe-GS，一种自监督的高斯点绘制表示方法，能够在复杂驾驶场景中实现有效的静态-动态分解和高保真表面重建。我们的方法采用两阶段的优化管道，用于处理动态街景中的高斯点。\n在第一阶段，我们基于一个关键观察——三维高斯点绘制本质上只能重建动态环境中的静态区域——提取二维运动掩膜。这些提取的二维运动先验随后被以可微分的方式映射到高斯空间。在第二阶段，我们利用动态高斯的高效表达式进行优化。结合我们提出的几何正则化策略，该方法能够解决自动驾驶数据稀疏性导致的过拟合问题，从而重建与物体表面对齐的物理合理高斯点，而不是漂浮在空中。\n此外，我们引入了时间上的跨视角一致性，确保时间和视点上的连贯性，从而实现高质量的表面重建。全面的实验表明，DeSiRe-GS 在效率和效果上均优于现有的自监督方法，并在准确性上接近依赖外部 3D 边界框标注的方法。\n"
  },
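The stage-one observation above — vanilla 3DGS fitted to a dynamic scene reconstructs only the static parts well — suggests a simple way to extract 2D motion masks from photometric residuals. A minimal sketch with an illustrative threshold; the paper's actual extraction may differ.

```python
import torch

def extract_motion_mask(rendered: torch.Tensor, observed: torch.Tensor,
                        thresh: float = 0.1) -> torch.Tensor:
    """A static-only 3DGS render matches the static background, so pixels with a
    large photometric residual are likely dynamic.
    rendered, observed: (3, H, W) images in [0, 1]; returns a (H, W) boolean mask."""
    residual = (rendered - observed).abs().mean(dim=0)   # per-pixel L1 error
    return residual > thresh                             # True where motion is suspected

# toy usage: a "moving" patch shows up in the mask
obs = torch.rand(3, 64, 64)
ren = obs.clone()
obs[:, 20:30, 20:30] = 1.0            # pretend a dynamic object changed these pixels
mask = extract_motion_mask(ren, obs)
print(mask.float().mean())            # fraction of pixels flagged as dynamic
```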
  {
    "path": "abs/2411.11941.md",
    "content": "### TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction\n\nDynamic scene reconstruction is a long-term challenge in 3D vision. Recent methods extend 3D Gaussian Splatting to dynamic scenes via additional deformation fields and apply explicit constraints like motion flow to guide the deformation. However, they learn motion changes from individual timestamps independently, making it challenging to reconstruct complex scenes, particularly when dealing with violent movement, extreme-shaped geometries, or reflective surfaces. To address the above issue, we design a plug-and-play module called TimeFormer to enable existing deformable 3D Gaussians reconstruction methods with the ability to implicitly model motion patterns from a learning perspective. Specifically, TimeFormer includes a Cross-Temporal Transformer Encoder, which adaptively learns the temporal relationships of deformable 3D Gaussians. Furthermore, we propose a two-stream optimization strategy that transfers the motion knowledge learned from TimeFormer to the base stream during the training phase. This allows us to remove TimeFormer during inference, thereby preserving the original rendering speed. Extensive experiments in the multi-view and monocular dynamic scenes validate qualitative and quantitative improvement brought by TimeFormer.\n\n动态场景重建一直是3D视觉领域的长期挑战。近期的方法通过附加的变形场将3D高斯点扩展到动态场景，并应用显式约束（如运动流）来引导变形。然而，这些方法从单独的时间戳独立学习运动变化，这使得在重建复杂场景时面临挑战，尤其是在处理剧烈运动、极端几何形状或反射表面时。为了解决上述问题，我们设计了一个即插即用模块，称为 TimeFormer，使现有的可变形3D高斯重建方法能够从学习的角度隐式建模运动模式。\n具体而言，TimeFormer 包括一个 跨时间 Transformer 编码器（Cross-Temporal Transformer Encoder），能够自适应地学习可变形3D高斯的时间关系。此外，我们提出了一种 双流优化策略，在训练阶段将 TimeFormer 学到的运动知识传递到基础流（base stream）。这样，在推理阶段可以移除 TimeFormer，从而保留原始的渲染速度。\n在多视角和单目动态场景中的大量实验表明，TimeFormer 带来了定性和定量的显著改进。\n"
  },
  {
    "path": "abs/2411.12089.md",
    "content": "### FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting\n\nIn the real world, objects reveal internal textures when sliced or cut, yet this behavior is not well-studied in 3D generation tasks today. For example, slicing a virtual 3D watermelon should reveal flesh and seeds. Given that no available dataset captures an object's full internal structure and collecting data from all slices is impractical, generative methods become the obvious approach. However, current 3D generation and inpainting methods often focus on visible appearance and overlook internal textures. To bridge this gap, we introduce FruitNinja, the first method to generate internal textures for 3D objects undergoing geometric and topological changes. Our approach produces objects via 3D Gaussian Splatting (3DGS) with both surface and interior textures synthesized, enabling real-time slicing and rendering without additional optimization. FruitNinja leverages a pre-trained diffusion model to progressively inpaint cross-sectional views and applies voxel-grid-based smoothing to achieve cohesive textures throughout the object. Our OpaqueAtom GS strategy overcomes 3DGS limitations by employing densely distributed opaque Gaussians, avoiding biases toward larger particles that destabilize training and sharp color transitions for fine-grained textures. Experimental results show that FruitNinja substantially outperforms existing approaches, showcasing unmatched visual quality in real-time rendered internal views across arbitrary geometry manipulations.\n\n在现实世界中，物体被切开或分割时会显露其内部纹理，但这一行为在当前的3D生成任务中并未得到充分研究。例如，切开一个虚拟的3D西瓜应显示其果肉和种子。然而，目前没有可用的数据集能够捕获物体的完整内部结构，同时从所有切片收集数据也不现实，因此生成式方法成为显而易见的解决方案。然而，当前的3D生成与修补方法通常关注物体的可见外观，而忽略了内部纹理。\n为弥补这一空白，我们提出 FruitNinja，这是首个针对几何和拓扑变化生成3D物体内部纹理的方法。我们的方法通过 3D Gaussian Splatting (3DGS) 生成物体，合成表面与内部纹理，实现实时切割和渲染，无需额外的优化过程。FruitNinja 利用预训练的扩散模型逐步修补横截面视图，并通过基于体素网格的平滑方法生成物体内部一致的纹理。\n此外，我们提出了 OpaqueAtom GS 策略，克服了 3DGS 的局限性。该策略采用密集分布的不透明高斯点，避免了对较大粒子的偏向，这些偏向通常会导致训练不稳定及颜色过渡不够精细的问题，从而实现了细腻的纹理效果。实验结果表明，FruitNinja 在实时渲染的内部视图质量上远超现有方法，在任意几何操作下展现了无与伦比的视觉效果。\n"
  },
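The voxel-grid-based smoothing step can be illustrated compactly: bucket Gaussians into voxels and pull each Gaussian's color toward its voxel mean so inpainted cross-sections stay cohesive. A sketch; the voxel size and blend factor below are assumptions, not the paper's settings.

```python
import torch

def voxel_smooth_colors(means: torch.Tensor, colors: torch.Tensor,
                        voxel_size: float = 0.05, blend: float = 0.5) -> torch.Tensor:
    """Blend each Gaussian's color toward the mean color of its voxel so interior
    textures stay cohesive across inpainted cross-sections.
    means, colors: (N, 3); returns smoothed colors of shape (N, 3)."""
    keys = torch.floor(means / voxel_size).long()                 # (N, 3) voxel indices
    uniq, inv = torch.unique(keys, dim=0, return_inverse=True)    # voxel id per Gaussian
    sums = torch.zeros(len(uniq), 3).index_add_(0, inv, colors)   # per-voxel color sum
    counts = torch.zeros(len(uniq)).index_add_(0, inv, torch.ones(len(colors)))
    voxel_mean = sums / counts.unsqueeze(1)
    return (1 - blend) * colors + blend * voxel_mean[inv]

means = torch.rand(1000, 3)
colors = torch.rand(1000, 3)
print(voxel_smooth_colors(means, colors).shape)  # torch.Size([1000, 3])
```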
  {
    "path": "abs/2411.12168.md",
    "content": "### Sketch-guided Cage-based 3D Gaussian Splatting Deformation\n\n3D Gaussian Splatting (GS) is one of the most promising novel 3D representations that has received great interest in computer graphics and computer vision. While various systems have introduced editing capabilities for 3D GS, such as those guided by text prompts, fine-grained control over deformation remains an open challenge. In this work, we present a novel sketch-guided 3D GS deformation system that allows users to intuitively modify the geometry of a 3D GS model by drawing a silhouette sketch from a single viewpoint. Our approach introduces a new deformation method that combines cage-based deformations with a variant of Neural Jacobian Fields, enabling precise, fine-grained control. Additionally, it leverages large-scale 2D diffusion priors and ControlNet to ensure the generated deformations are semantically plausible. Through a series of experiments, we demonstrate the effectiveness of our method and showcase its ability to animate static 3D GS models as one of its key applications.\n\n3D Gaussian Splatting (GS) 是一种备受关注的新型3D表示方法，在计算机图形学和计算机视觉领域具有广阔的前景。尽管已有各种系统为3D GS引入了编辑功能，例如基于文本提示的引导，但对变形的精细控制仍然是一个未解决的挑战。在本研究中，我们提出了一种新颖的 草图引导3D GS变形系统，允许用户通过从单一视角绘制轮廓草图直观地修改3D GS模型的几何形状。\n我们的方法引入了一种全新的变形方法，结合了基于笼形变形（cage-based deformation）与神经雅可比场（Neural Jacobian Fields）的变体，实现了精确的细粒度控制。此外，系统还利用大规模2D扩散先验和 ControlNet，确保生成的变形在语义上具有合理性。通过一系列实验，我们验证了该方法的有效性，并展示了其将静态3D GS模型动画化的关键应用之一。\n"
  },
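Cage-based deformation itself is a classical operation: displace a coarse control cage and interpolate each Gaussian center from the cage-vertex displacements. Production systems use proper cage coordinates (e.g., mean-value coordinates); the inverse-distance weighting below is a simplified stand-in for illustration only.

```python
import torch

def cage_deform(points: torch.Tensor, cage: torch.Tensor,
                cage_new: torch.Tensor, power: float = 2.0) -> torch.Tensor:
    """Move Gaussian centers by interpolating cage-vertex displacements.
    points: (N, 3) Gaussian centers; cage, cage_new: (V, 3) cage vertices
    before and after the user's edit."""
    d = torch.cdist(points, cage).clamp_min(1e-8)     # (N, V) point-to-vertex distances
    w = (1.0 / d) ** power
    w = w / w.sum(dim=1, keepdim=True)                # normalized, barycentric-like weights
    return points + w @ (cage_new - cage)             # weighted vertex displacements

pts = torch.rand(500, 3)
cage = torch.tensor([[0., 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
                     [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
cage_new = cage.clone()
cage_new[7] += torch.tensor([0.3, 0.3, 0.3])          # drag one cage corner outward
print(cage_deform(pts, cage, cage_new).shape)         # torch.Size([500, 3])
```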
  {
    "path": "abs/2411.12185.md",
    "content": "### LiV-GS: LiDAR-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments\n\nWe present LiV-GS, a LiDAR-visual SLAM system in outdoor environments that leverages 3D Gaussian as a differentiable spatial representation. Notably, LiV-GS is the first method that directly aligns discrete and sparse LiDAR data with continuous differentiable Gaussian maps in large-scale outdoor scenes, overcoming the limitation of fixed resolution in traditional LiDAR mapping. The system aligns point clouds with Gaussian maps using shared covariance attributes for front-end tracking and integrates the normal orientation into the loss function to refines the Gaussian map. To reliably and stably update Gaussians outside the LiDAR field of view, we introduce a novel conditional Gaussian constraint that aligns these Gaussians closely with the nearest reliable ones. The targeted adjustment enables LiV-GS to achieve fast and accurate mapping with novel view synthesis at a rate of 7.98 FPS. Extensive comparative experiments demonstrate LiV-GS's superior performance in SLAM, image rendering and mapping. The successful cross-modal radar-LiDAR localization highlights the potential of LiV-GS for applications in cross-modal semantic positioning and object segmentation with Gaussian maps.\n\n我们提出了 LiV-GS，一种用于户外环境的 LiDAR-视觉 SLAM 系统，该系统利用 3D 高斯作为可微分的空间表示。值得注意的是，LiV-GS 是首个能够在大规模户外场景中直接对齐离散稀疏的 LiDAR 数据与连续可微的高斯地图的方法，克服了传统 LiDAR 映射中固定分辨率的局限性。\n该系统通过共享的协方差属性将点云与高斯地图对齐，用于前端跟踪，并将法向量方向引入损失函数以优化高斯地图。此外，为了可靠且稳定地更新 LiDAR 视场之外的高斯，我们引入了一种新颖的 条件高斯约束，使这些高斯能够与最近的可靠高斯紧密对齐。这种有针对性的调整使 LiV-GS 能够以 7.98 FPS 的速率实现快速且准确的映射和新视图合成。\n大量对比实验表明，LiV-GS 在 SLAM、图像渲染和映射方面表现优异。尤其是跨模态雷达-LiDAR 定位的成功，凸显了 LiV-GS 在基于高斯地图的跨模态语义定位和物体分割应用中的潜力。\n"
  },
  {
    "path": "abs/2411.12309.md",
    "content": "### DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes\n\nNovel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction. However, these methods rely heavily on dense image inputs and prolonged training times, making them unsuitable where computational resources are limited. Additionally, few-shot methods often struggle with poor reconstruction quality in vast environments. This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes. Our approach divides the scene into regions, processed independently by drones with sparse image inputs. Using a feed-forward Gaussian model, we predict high-quality Gaussian primitives, followed by a global alignment algorithm to ensure geometric consistency. Synthetic views and depth priors are incorporated to further enhance training, while a distillation-based model aggregation mechanism enables efficient reconstruction. Our method achieves high-quality large-scale scene reconstruction and novel-view synthesis in significantly reduced training times, outperforming existing approaches in both speed and scalability. We demonstrate the effectiveness of our framework on vast aerial scenes, achieving high-quality results within minutes.\n\n新视图合成（NVS）方法在大场景重建中扮演了关键角色。然而，这些方法严重依赖密集的图像输入和长时间的训练，限制了其在计算资源有限的环境中的应用。此外，少样本方法在大场景中往往面临重建质量较差的问题。\n本文提出了一种新颖的分布式框架 DGTR，用于稀疏视图大场景的高效高斯重建。我们的方法将场景划分为多个区域，由携带稀疏图像输入的无人机独立处理。通过一个前馈高斯模型，我们预测高质量的高斯基元，随后使用全局对齐算法确保几何一致性。合成视图和深度先验被引入以进一步提升训练效果，而基于蒸馏的模型聚合机制则实现了高效的重建。\n我们的方法在显著减少训练时间的同时，实现了高质量的大规模场景重建和新视图合成。在速度和可扩展性上均超越现有方法。我们在大规模航拍场景上验证了该框架的有效性，展示了其在数分钟内生成高质量结果的能力。\n"
  },
  {
    "path": "abs/2411.12440.md",
    "content": "### Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels\n\nRecent advancements in 3D Gaussian Splatting (3DGS) have substantially improved novel view synthesis, enabling high-quality reconstruction and real-time rendering. However, blurring artifacts, such as floating primitives and over-reconstruction, remain challenging. Current methods address these issues by refining scene structure, enhancing geometric representations, addressing blur in training images, improving rendering consistency, and optimizing density control, yet the role of kernel design remains underexplored. We identify the soft boundaries of Gaussian ellipsoids as one of the causes of these artifacts, limiting detail capture in high-frequency regions. To bridge this gap, we introduce 3D Linear Splatting (3DLS), which replaces Gaussian kernels with linear kernels to achieve sharper and more precise results, particularly in high-frequency regions. Through evaluations on three datasets, 3DLS demonstrates state-of-the-art fidelity and accuracy, along with a 30% FPS improvement over baseline 3DGS. The implementation will be made publicly available upon acceptance.\n\n近期在 3D Gaussian Splatting (3DGS) 上的进展显著提升了新视图合成的质量，实现了高质量重建和实时渲染。然而，模糊伪影（如漂浮基元和过度重建）依然是一个难题。目前的方法通过优化场景结构、增强几何表示、处理训练图像中的模糊、改进渲染一致性以及优化密度控制来解决这些问题，但内核设计的重要性却未被充分探索。\n我们发现，高斯椭球的软边界是导致这些伪影的原因之一，限制了高频区域细节的捕获。为解决这一问题，我们提出 3D Linear Splatting (3DLS)，用线性内核替代高斯内核，在高频区域实现了更清晰、更精确的结果。\n在三个数据集上的评估中，3DLS 展现了当前最先进的保真度和准确性，并在帧率上相比基准的 3DGS 提升了 30%。代码将在论文接收后公开。\n"
  },
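The kernel swap at the heart of 3DLS is easy to visualize: a Gaussian kernel decays softly with unbounded support, while a linear (tent) kernel falls to zero at a hard boundary, which is what sharpens high-frequency regions. The exact kernel parameterization in the paper may differ; this only contrasts the two 1D falloff profiles.

```python
import torch

def gaussian_kernel(d: torch.Tensor) -> torch.Tensor:
    """Classic 3DGS falloff: soft, unbounded support (d = normalized distance)."""
    return torch.exp(-0.5 * d ** 2)

def linear_kernel(d: torch.Tensor) -> torch.Tensor:
    """Linear (tent) falloff with a hard boundary at d = 1, giving sharper edges.
    The kernel actually used by 3DLS may differ; this shows the qualitative contrast."""
    return (1.0 - d).clamp(min=0.0)

d = torch.linspace(0, 2, 5)           # distances 0, 0.5, 1.0, 1.5, 2.0
print(gaussian_kernel(d))             # tensor([1.0000, 0.8825, 0.6065, 0.3247, 0.1353])
print(linear_kernel(d))               # tensor([1.0000, 0.5000, 0.0000, 0.0000, 0.0000])
```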
  {
    "path": "abs/2411.12452.md",
    "content": "### GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving\n\nSelf-supervised learning has made substantial strides in image processing, while visual pre-training for autonomous driving is still in its infancy. Existing methods often focus on learning geometric scene information while neglecting texture or treating both aspects separately, hindering comprehensive scene understanding. In this context, we are excited to introduce GaussianPretrain, a novel pre-training paradigm that achieves a holistic understanding of the scene by uniformly integrating geometric and texture representations. Conceptualizing 3D Gaussian anchors as volumetric LiDAR points, our method learns a deepened understanding of scenes to enhance pre-training performance with detailed spatial structure and texture, achieving that 40.6% faster than NeRF-based method UniPAD with 70% GPU memory only. We demonstrate the effectiveness of GaussianPretrain across multiple 3D perception tasks, showing significant performance improvements, such as a 7.05% increase in NDS for 3D object detection, boosts mAP by 1.9% in HD map construction and 0.8% improvement on Occupancy prediction. These significant gains highlight GaussianPretrain's theoretical innovation and strong practical potential, promoting visual pre-training development for autonomous driving.\n\n自监督学习在图像处理领域取得了显著进展，而用于自动驾驶的视觉预训练仍处于起步阶段。现有方法通常专注于学习场景的几何信息，却忽视了纹理，或将两者分开处理，从而阻碍了对场景的全面理解。在此背景下，我们提出 GaussianPretrain，一种新颖的预训练范式，通过统一整合几何和纹理表示，实现对场景的整体理解。\n我们将 3D 高斯锚点概念化为体积化的 LiDAR 点，通过这种方法学习场景的深度理解，从而增强预训练性能，并在捕获详细的空间结构和纹理的同时，比基于 NeRF 的方法 UniPAD 快 40.6%，且仅消耗 70% 的 GPU 内存。\n在多个 3D 感知任务中的实验表明，GaussianPretrain 带来了显著的性能提升。例如，在 3D 目标检测中 NDS 提升 7.05%，在高清地图构建中 mAP 提升 1.9%，以及在占用预测中 提升 0.8%。这些显著的性能增益突显了 GaussianPretrain 的理论创新与强大的实用潜力，为自动驾驶的视觉预训练发展提供了新的推动力。\n"
  },
  {
    "path": "abs/2411.12471.md",
    "content": "### SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image\n\nSnapshot Compressive Imaging (SCI) offers a possibility for capturing information in high-speed dynamic scenes, requiring efficient reconstruction method to recover scene information. Despite promising results, current deep learning-based and NeRF-based reconstruction methods face challenges: 1) deep learning-based reconstruction methods struggle to maintain 3D structural consistency within scenes, and 2) NeRF-based reconstruction methods still face limitations in handling dynamic scenes. To address these challenges, we propose SCIGS, a variant of 3DGS, and develop a primitive-level transformation network that utilizes camera pose stamps and Gaussian primitive coordinates as embedding vectors. This approach resolves the necessity of camera pose in vanilla 3DGS and enhances multi-view 3D structural consistency in dynamic scenes by utilizing transformed primitives. Additionally, a high-frequency filter is introduced to eliminate the artifacts generated during the transformation. The proposed SCIGS is the first to reconstruct a 3D explicit scene from a single compressed image, extending its application to dynamic 3D scenes. Experiments on both static and dynamic scenes demonstrate that SCIGS not only enhances SCI decoding but also outperforms current state-of-the-art methods in reconstructing dynamic 3D scenes from a single compressed image.\n\nSnapshot Compressive Imaging (SCI) 为捕捉高速动态场景信息提供了一种可能性，但需要高效的重建方法来恢复场景信息。尽管当前基于深度学习和 NeRF 的重建方法已取得一定进展，但仍存在以下挑战：1）基于深度学习的重建方法难以在场景中保持 3D 结构的一致性；2）基于 NeRF 的重建方法在处理动态场景时仍存在局限性。\n为解决这些问题，我们提出了 SCIGS，这是 3D Gaussian Splatting (3DGS) 的一种变体，并开发了一个基元级变换网络。该网络利用相机姿态标记和高斯基元坐标作为嵌入向量，解决了传统 3DGS 中对相机姿态的依赖，同时通过变换后的基元增强动态场景中多视角 3D 结构的一致性。此外，我们引入了一个高频滤波器，消除变换过程中产生的伪影。\nSCIGS 是首个能够从单张压缩图像重建 3D 显式场景的方法，并将其应用扩展至动态 3D 场景。对静态和动态场景的实验表明，SCIGS 不仅提升了 SCI 的解码能力，还在从单张压缩图像重建动态 3D 场景方面超越了当前最先进的方法。\n"
  },
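For readers unfamiliar with SCI, the standard forward model compresses T high-speed frames into one snapshot through per-frame coding masks; SCIGS inverts this single measurement into an explicit 3D Gaussian scene. A sketch of the (standard) measurement model, with toy sizes:

```python
import torch

def sci_measurement(frames: torch.Tensor, masks: torch.Tensor) -> torch.Tensor:
    """Standard SCI forward model: a single snapshot y = sum_t (mask_t * frame_t).
    frames, masks: (T, H, W); returns the (H, W) compressed measurement that a
    reconstruction method like SCIGS must invert."""
    return (masks * frames).sum(dim=0)

T, H, W = 8, 32, 32
frames = torch.rand(T, H, W)                    # high-speed dynamic scene (toy)
masks = (torch.rand(T, H, W) > 0.5).float()     # per-frame binary coding masks
y = sci_measurement(frames, masks)
print(y.shape)  # torch.Size([32, 32]) -- one snapshot encoding 8 frames
```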
  {
    "path": "abs/2411.12510.md",
    "content": "### PR-ENDO: Physically Based Relightable Gaussian Splatting for Endoscopy\n\nEndoscopic procedures are crucial for colorectal cancer diagnosis, and three-dimensional reconstruction of the environment for real-time novel-view synthesis can significantly enhance diagnosis. We present PR-ENDO, a framework that leverages 3D Gaussian Splatting within a physically based, relightable model tailored for the complex acquisition conditions in endoscopy, such as restricted camera rotations and strong view-dependent illumination. By exploiting the connection between the camera and light source, our approach introduces a relighting model to capture the intricate interactions between light and tissue using physically based rendering and MLP. Existing methods often produce artifacts and inconsistencies under these conditions, which PR-ENDO overcomes by incorporating a specialized diffuse MLP that utilizes light angles and normal vectors, achieving stable reconstructions even with limited training camera rotations. We benchmarked our framework using a publicly available dataset and a newly introduced dataset with wider camera rotations. Our methods demonstrated superior image quality compared to baseline approaches.\n\n内窥镜手术在结直肠癌诊断中至关重要，而通过实时新视图合成进行环境的三维重建，可以显著提升诊断效果。我们提出了 PR-ENDO，一个框架结合了 3D Gaussian Splatting (3DGS) 和基于物理的可重光照模型，专为内窥镜复杂采集条件（如有限的相机旋转和强视角相关光照）而设计。\n通过利用相机和光源之间的关联，PR-ENDO 引入了一种重光照模型，利用基于物理的渲染和多层感知机（MLP）捕获光与组织之间复杂的交互。在这些条件下，现有方法常常生成伪影和不一致的结果，而 PR-ENDO 通过加入一个专门设计的漫反射 MLP，结合光角度和法向量，有效实现了在有限训练相机旋转下的稳定重建。\n我们使用一个公开数据集和一个包含更广相机旋转的新引入数据集对框架进行了基准测试。实验结果表明，PR-ENDO 在图像质量上相比基准方法表现出显著优势。\n"
  },
  {
    "path": "abs/2411.12788.md",
    "content": "### Mini-Splatting2: Building 360 Scenes within Minutes via Aggressive Gaussian Densification\n\nIn this study, we explore the essential challenge of fast scene optimization for Gaussian Splatting. Through a thorough analysis of the geometry modeling process, we reveal that dense point clouds can be effectively reconstructed early in optimization through Gaussian representations. This insight leads to our approach of aggressive Gaussian densification, which provides a more efficient alternative to conventional progressive densification methods. By significantly increasing the number of critical Gaussians, we enhance the model capacity to capture dense scene geometry at the early stage of optimization. This strategy is seamlessly integrated into the Mini-Splatting densification and simplification framework, enabling rapid convergence without compromising quality. Additionally, we introduce visibility culling within Gaussian Splatting, leveraging per-view Gaussian importance as precomputed visibility to accelerate the optimization process. Our Mini-Splatting2 achieves a balanced trade-off among optimization time, the number of Gaussians, and rendering quality, establishing a strong baseline for future Gaussian-Splatting-based works. Our work sets the stage for more efficient, high-quality 3D scene modeling in real-world applications.\n\n在本研究中，我们探讨了 Gaussian Splatting 快速场景优化的核心挑战。通过对几何建模过程的深入分析，我们发现可以通过高斯表示在优化的早期阶段有效地重建稠密点云。基于这一洞察，我们提出了 激进的高斯密化策略，作为传统渐进密化方法的一种更高效替代方案。通过显著增加关键高斯的数量，我们增强了模型在优化初期捕获稠密场景几何的能力。\n该策略无缝集成到 Mini-Splatting 的密化与简化框架中，实现了快速收敛且不牺牲质量。此外，我们在高斯分布中引入了 可见性剔除，利用每视角的高斯重要性作为预计算的可见性指标，加速优化过程。\n我们的 Mini-Splatting2 在优化时间、高斯数量和渲染质量之间达成了良好的平衡，为未来基于高斯分布的研究奠定了强大的基线。我们的工作为现实应用中的高效、高质量 3D 场景建模铺平了道路。\n"
  },
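The visibility-culling idea in Mini-Splatting2 — precompute per-view Gaussian importance and let each training view touch only its important Gaussians — can be sketched as a top-k mask. The importance source (accumulated alpha-blending weights) and the keep fraction below are illustrative assumptions, not the paper's settings.

```python
import torch

def build_view_culling(importance_per_view: torch.Tensor, keep_frac: float = 0.6):
    """Precompute a visibility mask per training view from per-view Gaussian
    importance (assumed here: accumulated blending weights, shape (V, N)).
    During optimization, view v would then only rasterize Gaussians with
    mask[v] == True, skipping the rest."""
    k = int(importance_per_view.shape[1] * keep_frac)
    idx = importance_per_view.topk(k, dim=1).indices        # top-k important per view
    mask = torch.zeros_like(importance_per_view, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

importance = torch.rand(4, 10_000)          # 4 views, 10k Gaussians (toy numbers)
mask = build_view_culling(importance)
print(mask.sum(dim=1))                      # 6000 Gaussians kept per view
```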
  {
    "path": "abs/2411.12789.md",
    "content": "### Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting\n\nRecent advancements in 3D generation models have opened new possibilities for simulating dynamic 3D object movements and customizing behaviors, yet creating this content remains challenging. Current methods often require manual assignment of precise physical properties for simulations or rely on video generation models to predict them, which is computationally intensive. In this paper, we rethink the usage of multi-modal large language model (MLLM) in physics-based simulation, and present Sim Anything, a physics-based approach that endows static 3D objects with interactive dynamics. We begin with detailed scene reconstruction and object-level 3D open-vocabulary segmentation, progressing to multi-view image in-painting. Inspired by human visual reasoning, we propose MLLM-based Physical Property Perception (MLLM-P3) to predict mean physical properties of objects in a zero-shot manner. Based on the mean values and the object's geometry, the Material Property Distribution Prediction model (MPDP) model then estimates the full distribution, reformulating the problem as probability distribution estimation to reduce computational costs. Finally, we simulate objects in an open-world scene with particles sampled via the Physical-Geometric Adaptive Sampling (PGAS) strategy, efficiently capturing complex deformations and significantly reducing computational costs. Extensive experiments and user studies demonstrate our Sim Anything achieves more realistic motion than state-of-the-art methods within 2 minutes on a single GPU.\n\n近期3D生成模型的进展为模拟动态3D对象运动和定制行为提供了新的可能性，但生成此类内容依然具有挑战性。现有方法通常需要手动指定精确的物理属性进行模拟，或者依赖视频生成模型进行预测，这对计算资源要求较高。\n本文重新思考了多模态大语言模型（MLLM）在基于物理模拟中的应用，提出了 Sim Anything，一种赋予静态3D对象交互动态的物理模拟方法。我们从详细的场景重建和对象级 3D 开放词汇分割开始，逐步实现多视角图像修补。受人类视觉推理的启发，我们设计了 MLLM-based Physical Property Perception (MLLM-P3)，以零样本方式预测对象的平均物理属性。基于平均值和对象几何信息，Material Property Distribution Prediction (MPDP) 模型进一步估计完整分布，将问题重构为概率分布估计，从而显著降低计算成本。\n最后，我们利用 Physical-Geometric Adaptive Sampling (PGAS) 策略在开放世界场景中对对象进行模拟，通过采样粒子高效捕捉复杂变形，并显著减少计算成本。大量实验和用户研究表明，Sim Anything 能够在单张 GPU 上于 2 分钟内 生成比现有最先进方法更真实的运动效果。\n"
  },
  {
    "path": "abs/2411.12981.md",
    "content": "### GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting\n\nGaze estimation encounters generalization challenges when dealing with out-of-distribution data. To address this problem, recent methods use neural radiance fields (NeRF) to generate augmented data. However, existing methods based on NeRF are computationally expensive and lack facial details. 3D Gaussian Splatting (3DGS) has become the prevailing representation of neural fields. While 3DGS has been extensively examined in head avatars, it faces challenges with accurate gaze control and generalization across different subjects. In this work, we propose GazeGaussian, a high-fidelity gaze redirection method that uses a two-stream 3DGS model to represent the face and eye regions separately. By leveraging the unstructured nature of 3DGS, we develop a novel eye representation for rigid eye rotation based on the target gaze direction. To enhance synthesis generalization across various subjects, we integrate an expression-conditional module to guide the neural renderer. Comprehensive experiments show that GazeGaussian outperforms existing methods in rendering speed, gaze redirection accuracy, and facial synthesis across multiple datasets. We also demonstrate that existing gaze estimation methods can leverage GazeGaussian to improve their generalization performance.\n\n凝视估计在处理分布外数据时面临泛化挑战。为解决这一问题，近期方法尝试使用 NeRF（神经辐射场）生成增强数据。然而，基于 NeRF 的现有方法计算代价高昂且缺乏面部细节。随着 3D Gaussian Splatting (3DGS) 成为神经场的主流表示，其在头部头像建模中已有广泛应用，但在准确的凝视控制和跨主体的泛化方面仍存在挑战。\n为此，我们提出 GazeGaussian，一种高保真凝视重定向方法，使用双流 3DGS 模型分别表示面部和眼睛区域。通过利用 3DGS 的非结构化特性，我们设计了一种基于目标凝视方向的刚性眼球旋转新颖表示方法。为增强在不同主体间的合成泛化能力，我们引入了一个 表情条件模块，以引导神经渲染器。\n全面实验表明，GazeGaussian 在渲染速度、凝视重定向精度以及面部合成质量上均优于现有方法，并在多个数据集上表现出卓越的性能。此外，我们进一步证明，现有的凝视估计方法可以利用 GazeGaussian 提升其泛化能力，从而改进对分布外数据的适应性。\n"
  },
  {
    "path": "abs/2411.13753.md",
    "content": "### FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting\n\nWe present FAST-Splat for fast, ambiguity-free semantic Gaussian Splatting, which seeks to address the main limitations of existing semantic Gaussian Splatting methods, namely: slow training and rendering speeds; high memory usage; and ambiguous semantic object localization. In deriving FAST-Splat , we formulate open-vocabulary semantic Gaussian Splatting as the problem of extending closed-set semantic distillation to the open-set (open-vocabulary) setting, enabling FAST-Splat to provide precise semantic object localization results, even when prompted with ambiguous user-provided natural-language queries. Further, by exploiting the explicit form of the Gaussian Splatting scene representation to the fullest extent, FAST-Splat retains the remarkable training and rendering speeds of Gaussian Splatting. Specifically, while existing semantic Gaussian Splatting methods distill semantics into a separate neural field or utilize neural models for dimensionality reduction, FAST-Splat directly augments each Gaussian with specific semantic codes, preserving the training, rendering, and memory-usage advantages of Gaussian Splatting over neural field methods. These Gaussian-specific semantic codes, together with a hash-table, enable semantic similarity to be measured with open-vocabulary user prompts and further enable FAST-Splat to respond with unambiguous semantic object labels and 3D masks, unlike prior methods. In experiments, we demonstrate that FAST-Splat is 4x to 6x faster to train with a 13x faster data pre-processing step, achieves between 18x to 75x faster rendering speeds, and requires about 3x smaller GPU memory, compared to the best-competing semantic Gaussian Splatting methods. Further, FAST-Splat achieves relatively similar or better semantic segmentation performance compared to existing methods.\n\n我们提出了 FAST-Splat，一种用于快速、无歧义语义高斯分布（Semantic Gaussian Splatting）的新方法，旨在解决现有语义高斯分布方法的主要局限性，包括：训练和渲染速度慢、内存占用高以及语义对象定位的歧义性。\n在 FAST-Splat 的推导中，我们将开放词汇（open-vocabulary）语义高斯分布形式化为将封闭集语义蒸馏扩展到开放集（开放词汇）环境的问题，从而使其能够在用户提供的模糊自然语言查询下，提供精准的语义对象定位结果。此外，通过充分利用高斯分布场景表示的显式特性，FAST-Splat 保留了高斯分布在训练和渲染速度上的显著优势。\n具体而言，与现有方法通过单独的神经场或神经网络进行维度降维的方式不同，FAST-Splat 直接将特定的语义编码附加到每个高斯上，从而在不牺牲训练速度、渲染速度和内存占用的情况下，将语义集成到高斯分布中。这些高斯特定的语义编码结合哈希表，使系统能够通过开放词汇提示测量语义相似性，并以无歧义的语义对象标签和三维掩码作出响应，这点显著优于现有方法。\n在实验中，FAST-Splat 的性能优势明显：相比最佳竞品语义高斯分布方法，训练速度提升 4x 到 6x，数据预处理速度提升 13x，渲染速度提升 18x 到 75x，GPU 内存需求减少约 3x。此外，FAST-Splat 在语义分割性能上表现出与现有方法相当甚至更优的效果。\n"
  },
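The mechanism of per-Gaussian semantic codes plus a hash table can be sketched as follows. Random vectors stand in for a real text encoder (e.g., CLIP), and the table layout is a hypothetical simplification, not FAST-Splat's actual data structure.

```python
import torch

# Hypothetical setup: each Gaussian stores an integer semantic code; a hash table
# maps codes to (label, embedding). A real system would embed labels with a text
# encoder such as CLIP; normalized random vectors stand in for embeddings here.
torch.manual_seed(0)
emb = lambda: torch.nn.functional.normalize(torch.randn(16), dim=0)
code_table = {0: ("mug", emb()), 1: ("table", emb()), 2: ("chair", emb())}
gaussian_codes = torch.randint(0, 3, (5000,))          # one code per Gaussian

def query(text_embedding: torch.Tensor):
    """Resolve an open-vocabulary query to an unambiguous label plus a 3D mask
    over Gaussians, by comparing the query against the table's embeddings."""
    sims = {c: float(text_embedding @ e) for c, (_, e) in code_table.items()}
    best = max(sims, key=sims.get)                     # most similar semantic code
    label = code_table[best][0]
    mask = gaussian_codes == best                      # 3D mask over Gaussians
    return label, mask

label, mask = query(code_table[0][1])                  # query with the "mug" vector
print(label, int(mask.sum()))                          # "mug" and its Gaussian count
```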
  {
    "path": "abs/2411.14384.md",
    "content": "### Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation\n\nExisting feed-forward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when changing the prompt view direction and mainly handle object-centric prompt images. In this paper, we propose a novel single-stage 3D diffusion model, DiffusionGS, for object and scene generation from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency and allow the model to generate robustly given prompt views of any directions, beyond object-centric inputs. Plus, to improve the capability and generalization ability of DiffusionGS, we scale up 3D training data by developing a scene-object mixed training strategy. Experiments show that our method enjoys better generation quality (2.20 dB higher in PSNR and 23.25 lower in FID) and over 5x faster speed (~6s on an A100 GPU) than SOTA methods. The user study and text-to-3D applications also reveals the practical values of our method.\n\n现有的前馈式图像到3D方法主要依赖于二维多视角扩散模型，但这些模型难以保证三维一致性。在视角变化时，这些方法容易崩溃，并且主要适用于以物体为中心的提示图像。为解决这些问题，本文提出了一种新颖的单阶段三维扩散模型 DiffusionGS，用于从单视图生成物体和场景。\nDiffusionGS 在每个时间步直接输出三维高斯点云，从而强化视角一致性，使模型能够稳健地生成来自任意方向的提示视图，而不仅限于以物体为中心的输入。此外，为了提高 DiffusionGS 的生成能力和泛化能力，我们开发了一种 场景-物体混合训练策略，大规模扩展了三维训练数据。\n实验表明，与现有最先进方法相比，DiffusionGS 在生成质量上表现更佳（PSNR 提高 2.20 dB，FID 降低 23.25），并且速度提高超过 5 倍（在 A100 GPU 上约为 6 秒）。用户研究和文本到3D应用进一步展示了该方法的实用价值。\n"
  },
  {
    "path": "abs/2411.14514.md",
    "content": "### NexusSplats: Efficient 3D Gaussian Splatting in the Wild\n\nWhile 3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering quality and efficiency in 3D scene reconstruction, it struggles with varying lighting conditions and incidental occlusions in real-world scenarios. To accommodate varying lighting conditions, existing 3DGS extensions apply color mapping to the massive Gaussian primitives with individually optimized appearance embeddings. To handle occlusions, they predict pixel-wise uncertainties via 2D image features for occlusion capture. Nevertheless, such massive color mapping and pixel-wise uncertainty prediction strategies suffer from not only additional computational costs but also coarse-grained lighting and occlusion handling. In this work, we propose a nexus kernel-driven approach, termed NexusSplats, for efficient and finer 3D scene reconstruction under complex lighting and occlusion conditions. In particular, NexusSplats leverages a novel light decoupling strategy where appearance embeddings are optimized based on nexus kernels instead of massive Gaussian primitives, thus accelerating reconstruction speeds while ensuring local color consistency for finer textures. Additionally, a Gaussian-wise uncertainty mechanism is developed, aligning 3D structures with 2D image features for fine-grained occlusion handling. Experimental results demonstrate that NexusSplats achieves state-of-the-art rendering quality while reducing reconstruction time by up to 70.4% compared to the current best in quality.\n\n虽然 3D Gaussian Splatting (3DGS) 在3D场景重建中展现了卓越的渲染质量和效率，但在现实场景中面对光照变化和偶然遮挡时表现不足。为适应光照变化，现有的 3DGS 扩展方法通过单独优化的外观嵌入对大量高斯基元进行颜色映射。为处理遮挡，它们基于 2D 图像特征预测像素级的不确定性。然而，这些方法的海量颜色映射和像素级不确定性预测策略不仅增加了计算成本，还在光照和遮挡处理上存在粗粒度问题。\n为此，我们提出了一种基于 核驱动策略 的方法，称为 NexusSplats，以高效且精细地重建复杂光照和遮挡条件下的 3D 场景。具体而言，NexusSplats 引入了一种新颖的 光照解耦策略，通过优化基于核的外观嵌入而非大量高斯基元，显著加速重建速度，同时确保局部颜色一致性以获得更精细的纹理。此外，我们开发了一种 高斯级不确定性机制，将 3D 结构与 2D 图像特征对齐，实现精细的遮挡处理。\n实验结果表明，NexusSplats 在渲染质量上达到当前最先进水平，同时相比质量最佳的现有方法，重建时间减少了 70.4%。\n"
  },
  {
    "path": "abs/2411.14716.md",
    "content": "### VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving\n\nThis paper introduces VisionPAD, a novel self-supervised pre-training paradigm designed for vision-centric algorithms in autonomous driving. In contrast to previous approaches that employ neural rendering with explicit depth supervision, VisionPAD utilizes more efficient 3D Gaussian Splatting to reconstruct multi-view representations using only images as supervision. Specifically, we introduce a self-supervised method for voxel velocity estimation. By warping voxels to adjacent frames and supervising the rendered outputs, the model effectively learns motion cues in the sequential data. Furthermore, we adopt a multi-frame photometric consistency approach to enhance geometric perception. It projects adjacent frames to the current frame based on rendered depths and relative poses, boosting the 3D geometric representation through pure image supervision. Extensive experiments on autonomous driving datasets demonstrate that VisionPAD significantly improves performance in 3D object detection, occupancy prediction and map segmentation, surpassing state-of-the-art pre-training strategies by a considerable margin.\n\n本文提出了 VisionPAD，一种专为自动驾驶视觉算法设计的新型自监督预训练范式。与以往依赖显式深度监督的神经渲染方法不同，VisionPAD 通过更高效的 3D Gaussian Splatting (3DGS)，仅使用图像作为监督信号即可重建多视图表示。\n具体而言，我们提出了一种自监督的体素速度估计方法。通过将体素变换到相邻帧并监督其渲染输出，模型能够有效地从序列数据中学习运动线索。此外，我们采用了 多帧光度一致性 方法来增强几何感知能力。该方法基于渲染深度和相对位姿将相邻帧投影到当前帧，从纯图像监督中提升 3D 几何表示。\n在自动驾驶数据集上的广泛实验表明，VisionPAD 在 3D目标检测、占用预测 和 地图分割 等任务中显著提升了性能，并在多个基准上超越了现有最先进的预训练策略。\n"
  },
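The multi-frame photometric-consistency step described for VisionPAD is a standard reprojection warp: lift current-frame pixels with the rendered depth, transform them by the relative pose, and resample the adjacent image. A minimal sketch, assuming pinhole intrinsics `K` and a 4x4 relative pose; names are illustrative.

```python
import torch
import torch.nn.functional as F

def warp_adjacent(adj_img, depth, K, K_inv, T_cur_to_adj):
    """Resample the adjacent frame into the current view using rendered depth and
    the relative pose, for a photometric-consistency loss against the current image.
    adj_img: (1,3,H,W); depth: (1,1,H,W); K, K_inv: (3,3); T_cur_to_adj: (4,4)."""
    _, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(3, -1).float()
    pts = (K_inv @ pix) * depth.reshape(1, -1)                 # 3D points, current cam
    pts_h = torch.cat([pts, torch.ones(1, H * W)], dim=0)      # homogeneous coords
    pts_adj = (T_cur_to_adj @ pts_h)[:3]                       # into adjacent camera
    proj = K @ pts_adj
    uv = proj[:2] / proj[2:].clamp_min(1e-6)                   # perspective divide
    # normalize pixel coordinates to [-1, 1] for grid_sample
    grid = torch.stack([uv[0] / (W - 1) * 2 - 1, uv[1] / (H - 1) * 2 - 1], -1)
    return F.grid_sample(adj_img, grid.reshape(1, H, W, 2), align_corners=True)

H = W = 16
img = torch.rand(1, 3, H, W)
depth = torch.full((1, 1, H, W), 2.0)
K = torch.tensor([[20., 0, 8], [0, 20., 8], [0, 0, 1]])
warped = warp_adjacent(img, depth, K, torch.linalg.inv(K), torch.eye(4))
print(warped.shape)  # torch.Size([1, 3, 16, 16]); identity pose returns img itself
```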
  {
    "path": "abs/2411.14847.md",
    "content": "### Dynamics-Aware Gaussian Splatting Streaming Towards Fast On-the-Fly Training for 4D Reconstruction\n\nThe recent development of 3D Gaussian Splatting (3DGS) has led to great interest in 4D dynamic spatial reconstruction from multi-view visual inputs. While existing approaches mainly rely on processing full-length multi-view videos for 4D reconstruction, there has been limited exploration of iterative online reconstruction methods that enable on-the-fly training and per-frame streaming. Current 3DGS-based streaming methods treat the Gaussian primitives uniformly and constantly renew the densified Gaussians, thereby overlooking the difference between dynamic and static features and also neglecting the temporal continuity in the scene. To address these limitations, we propose a novel three-stage pipeline for iterative streamable 4D dynamic spatial reconstruction. Our pipeline comprises a selective inheritance stage to preserve temporal continuity, a dynamics-aware shift stage for distinguishing dynamic and static primitives and optimizing their movements, and an error-guided densification stage to accommodate emerging objects. Our method achieves state-of-the-art performance in online 4D reconstruction, demonstrating a 20% improvement in on-the-fly training speed, superior representation quality, and real-time rendering capability.\n\n最近，3D Gaussian Splatting（3DGS）的发展引发了对从多视角视觉输入中进行4D动态空间重建的广泛兴趣。现有方法主要依赖处理完整长度的多视角视频来实现4D重建，而对能够实现实时训练和逐帧流式处理的迭代在线重建方法探索较少。目前基于3DGS的流式方法通常将高斯原语视为统一对象，并不断更新密集化的高斯体，忽视了动态和静态特征之间的差异，同时也未充分利用场景中的时间连续性。为了解决这些问题，我们提出了一种新颖的三阶段流程，用于迭代的可流式4D动态空间重建。该流程包括选择性继承阶段以保持时间连续性、动态感知偏移阶段用于区分动态和静态原语并优化其移动，以及误差引导的密集化阶段以适应新出现的对象。我们的方法在在线4D重建中实现了最先进的性能，表现出训练速度提升20%、更高的表示质量以及实时渲染能力。\n"
  },
  {
    "path": "abs/2411.14974.md",
    "content": "### 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes\n\nRecent advances in radiance field reconstruction, such as 3D Gaussian Splatting (3DGS), have achieved high-quality novel view synthesis and fast rendering by representing scenes with compositions of Gaussian primitives. However, 3D Gaussians present several limitations for scene reconstruction. Accurately capturing hard edges is challenging without significantly increasing the number of Gaussians, creating a large memory footprint. Moreover, they struggle to represent flat surfaces, as they are diffused in space. Without hand-crafted regularizers, they tend to disperse irregularly around the actual surface. To circumvent these issues, we introduce a novel method, named 3D Convex Splatting (3DCS), which leverages 3D smooth convexes as primitives for modeling geometrically-meaningful radiance fields from multi-view images. Smooth convex shapes offer greater flexibility than Gaussians, allowing for a better representation of 3D scenes with hard edges and dense volumes using fewer primitives. Powered by our efficient CUDA-based rasterizer, 3DCS achieves superior performance over 3DGS on benchmarks such as Mip-NeRF360, Tanks and Temples, and Deep Blending. Specifically, our method attains an improvement of up to 0.81 in PSNR and 0.026 in LPIPS compared to 3DGS while maintaining high rendering speeds and reducing the number of required primitives. Our results highlight the potential of 3D Convex Splatting to become the new standard for high-quality scene reconstruction and novel view synthesis.\n\n近年来，辐射场重建技术取得了显著进展，例如3D Gaussian Splatting（3DGS），通过使用高斯原语的组合来表示场景，成功实现了高质量的新视图合成和快速渲染。然而，3D高斯在场景重建中存在一些局限性。在不显著增加高斯数量的情况下，很难准确捕捉场景中的硬边，从而导致较大的内存占用。此外，它们难以表示平坦表面，因为高斯分布在空间中较为弥散。如果没有精心设计的正则化器，高斯原语通常会在实际表面周围不规则地分散。为了解决这些问题，我们提出了一种新方法，称为3D Convex Splatting（3DCS），利用3D平滑凸体作为原语，从多视角图像中建模具有几何意义的辐射场。相比高斯原语，平滑凸体具有更大的灵活性，能够以更少的原语更好地表示具有硬边和高密度区域的3D场景。在我们高效的基于CUDA的光栅化器支持下，3DCS在多个基准测试中（如Mip-NeRF360、Tanks and Temples和Deep Blending）表现优于3DGS。具体而言，与3DGS相比，我们的方法在PSNR上提高了多达0.81，在LPIPS上提升了0.026，同时保持了高渲染速度并减少了所需原语的数量。我们的结果凸显了3D Convex Splatting在高质量场景重建和新视图合成领域成为新标准的潜力。\n"
  },
  {
    "path": "abs/2411.15018.md",
    "content": "### Neural 4D Evolution under Large Topological Changes from 2D Images\n\nIn the literature, it has been shown that the evolution of the known explicit 3D surface to the target one can be learned from 2D images using the instantaneous flow field, where the known and target 3D surfaces may largely differ in topology. We are interested in capturing 4D shapes whose topology changes largely over time. We encounter that the straightforward extension of the existing 3D-based method to the desired 4D case performs poorly.\nIn this work, we address the challenges in extending 3D neural evolution to 4D under large topological changes by proposing two novel modifications. More precisely, we introduce (i) a new architecture to discretize and encode the deformation and learn the SDF and (ii) a technique to impose the temporal consistency. (iii) Also, we propose a rendering scheme for color prediction based on Gaussian splatting. Furthermore, to facilitate learning directly from 2D images, we propose a learning framework that can disentangle the geometry and appearance from RGB images. This method of disentanglement, while also useful for the 4D evolution problem that we are concentrating on, is also novel and valid for static scenes. Our extensive experiments on various data provide awesome results and, most importantly, open a new approach toward reconstructing challenging scenes with significant topological changes and deformations.\n\n在文献中已有研究表明，可以通过瞬时流场从二维图像中学习已知显式三维曲面向目标曲面的演变过程，其中已知曲面和目标曲面在拓扑结构上可能存在较大差异。我们感兴趣的是捕捉拓扑结构随时间发生显著变化的四维形状。然而，现有三维方法的直接扩展在处理这种四维情况时表现较差。\n在本研究中，我们针对在拓扑结构发生显著变化的情况下，将三维神经演化扩展到四维所面临的挑战，提出了两项新的改进。具体来说，我们引入了以下关键创新：(i) 一种新架构，用于离散化和编码形变，同时学习符号距离函数（SDF）；(ii) 一种用于实现时间一致性的技术；以及 (iii) 一种基于高斯点云的颜色预测渲染方案。此外，为了直接从二维图像中学习，我们提出了一种框架，可以从 RGB 图像中解耦几何信息和外观信息。这种解耦方法不仅对我们关注的四维演化问题有用，而且对于静态场景也是新颖且有效的。\n通过在多种数据集上的广泛实验，我们的研究获得了卓越的结果，更重要的是，为重建具有显著拓扑变化和形变的复杂场景开辟了一种全新的方法。\n"
  },
  {
    "path": "abs/2411.15193.md",
    "content": "### Gradient-Weighted Feature Back-Projection: A Fast Alternative to Feature Distillation in 3D Gaussian Splatting\n\nWe introduce a training-free method for feature field rendering in Gaussian splatting. Our approach back-projects 2D features into pre-trained 3D Gaussians, using a weighted sum based on each Gaussian's influence in the final rendering. While most training-based feature field rendering methods excel at 2D segmentation but perform poorly at 3D segmentation without post-processing, our method achieves high-quality results in both 2D and 3D segmentation. Experimental results demonstrate that our approach is fast, scalable, and offers performance comparable to training-based methods.\n\n我们提出了一种无需训练的高斯点云特征场渲染方法。该方法通过基于每个高斯点在最终渲染中的影响力进行加权求和，将二维特征反投影到预训练的三维高斯点中。与大多数基于训练的特征场渲染方法相比，这些方法在二维分割上表现出色，但在没有后处理的情况下，三维分割表现较差。我们的方法在二维和三维分割中均能实现高质量的结果。实验结果表明，该方法不仅快速且具有良好的可扩展性，其性能可与基于训练的方法相媲美。\n"
  },
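The back-projection itself reduces to a weighted scatter: every pixel distributes its 2D feature to the Gaussians that rendered it, weighted by their blending contribution, and the accumulated sums are normalized per Gaussian. A sketch assuming the rasterizer exposes per-pixel contributor lists; the array names below are hypothetical.

```python
import torch

def backproject_features(feat2d, contrib_ids, contrib_w):
    """Training-free feature back-projection, sketched: per-pixel 2D features are
    scattered to contributing Gaussians with their blending weights, then normalized.
    feat2d: (P, C) per-pixel features; contrib_ids: (P, K) Gaussian indices per pixel;
    contrib_w: (P, K) alpha-blending weights of those Gaussians (rasterizer output)."""
    N = int(contrib_ids.max()) + 1
    C = feat2d.shape[1]
    feats = torch.zeros(N, C)
    wsum = torch.zeros(N, 1)
    flat_ids = contrib_ids.reshape(-1)                       # (P*K,)
    w = contrib_w.reshape(-1, 1)                             # (P*K, 1)
    px = feat2d.repeat_interleave(contrib_ids.shape[1], 0)   # pixel feature per entry
    feats.index_add_(0, flat_ids, w * px)                    # weighted sum per Gaussian
    wsum.index_add_(0, flat_ids, w)
    return feats / wsum.clamp_min(1e-8)                      # normalized per-Gaussian features

feat2d = torch.rand(1024, 32)                 # e.g. 2D foundation-model features (toy)
ids = torch.randint(0, 500, (1024, 8))        # 8 contributing Gaussians per pixel
w = torch.rand(1024, 8)
print(backproject_features(feat2d, ids, w).shape)  # torch.Size([500, 32])
```

Accumulating over all training views before the final normalization gives the multi-view-consistent per-Gaussian features the abstract describes.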
  {
    "path": "abs/2411.15355.md",
    "content": "### UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations\n\nUrban scene reconstruction is crucial for real-world autonomous driving simulators. Although existing methods have achieved photorealistic reconstruction, they mostly focus on pinhole cameras and neglect fisheye cameras. In fact, how to effectively simulate fisheye cameras in driving scene remains an unsolved problem. In this work, we propose UniGaussian, a novel approach that learns a unified 3D Gaussian representation from multiple camera models for urban scene reconstruction in autonomous driving. Our contributions are two-fold. First, we propose a new differentiable rendering method that distorts 3D Gaussians using a series of affine transformations tailored to fisheye camera models. This addresses the compatibility issue of 3D Gaussian splatting with fisheye cameras, which is hindered by light ray distortion caused by lenses or mirrors. Besides, our method maintains real-time rendering while ensuring differentiability. Second, built on the differentiable rendering method, we design a new framework that learns a unified Gaussian representation from multiple camera models. By applying affine transformations to adapt different camera models and regularizing the shared Gaussians with supervision from different modalities, our framework learns a unified 3D Gaussian representation with input data from multiple sources and achieves holistic driving scene understanding. As a result, our approach models multiple sensors (pinhole and fisheye cameras) and modalities (depth, semantic, normal and LiDAR point clouds). Our experiments show that our method achieves superior rendering quality and fast rendering speed for driving scene simulation.\n\n城市场景重建对现实世界自动驾驶模拟器至关重要。尽管现有方法已实现照片级真实的重建，但它们大多专注于针孔相机而忽略了鱼眼相机。事实上，在驾驶场景中如何有效模拟鱼眼相机仍是一个未解决的问题。在这项工作中，我们提出了UniGaussian，一种新颖的方法，通过多种相机模型学习统一的三维高斯表示，用于自动驾驶中的城市场景重建。\n我们的贡献有两方面。首先，我们提出了一种新的可微分渲染方法，通过一系列针对鱼眼相机模型设计的仿射变换来扭曲三维高斯。这解决了由于镜头或反射镜导致的光线扭曲，使得三维高斯点渲染无法兼容鱼眼相机的问题。此外，我们的方法在保持实时渲染的同时保证了可微分性。其次，基于这一可微分渲染方法，我们设计了一个新的框架，用于从多种相机模型中学习统一的高斯表示。通过对不同相机模型应用仿射变换，并结合来自不同模态的监督对共享的高斯进行正则化，我们的框架能够从多源输入数据中学习统一的三维高斯表示，实现对驾驶场景的整体理解。\n因此，我们的方法能够建模多种传感器（针孔相机和鱼眼相机）和模态（深度、语义、法线以及激光雷达点云）。实验结果表明，我们的方法在驾驶场景模拟中实现了更高的渲染质量和快速的渲染速度。\n"
  },
  {
    "path": "abs/2411.15468.md",
    "content": "### SplatSDF: Boosting Neural Implicit SDF via Gaussian Splatting Fusion\n\nA signed distance function (SDF) is a useful representation for continuous-space geometry and many related operations, including rendering, collision checking, and mesh generation. Hence, reconstructing SDF from image observations accurately and efficiently is a fundamental problem. Recently, neural implicit SDF (SDF-NeRF) techniques, trained using volumetric rendering, have gained a lot of attention. Compared to earlier truncated SDF (TSDF) fusion algorithms that rely on depth maps and voxelize continuous space, SDF-NeRF enables continuous-space SDF reconstruction with better geometric and photometric accuracy. However, the accuracy and convergence speed of scene-level SDF reconstruction require further improvements for many applications. With the advent of 3D Gaussian Splatting (3DGS) as an explicit representation with excellent rendering quality and speed, several works have focused on improving SDF-NeRF by introducing consistency losses on depth and surface normals between 3DGS and SDF-NeRF. However, loss-level connections alone lead to incremental improvements. We propose a novel neural implicit SDF called \"SplatSDF\" to fuse 3DGSandSDF-NeRF at an architecture level with significant boosts to geometric and photometric accuracy and convergence speed. Our SplatSDF relies on 3DGS as input only during training, and keeps the same complexity and efficiency as the original SDF-NeRF during inference. Our method outperforms state-of-the-art SDF-NeRF models on geometric and photometric evaluation by the time of submission.\n\n有符号距离函数（Signed Distance Function, SDF）是一种有效的连续空间几何表示方式，广泛应用于渲染、碰撞检测和网格生成等相关操作。因此，如何从图像观测中准确且高效地重建 SDF 是一个基础性问题。最近，基于体渲染训练的神经隐式 SDF 技术（SDF-NeRF）引起了广泛关注。相比于依赖深度图并将连续空间体素化的早期截断 SDF（TSDF）融合算法，SDF-NeRF 实现了更高几何和光度精度的连续空间 SDF 重建。然而，在场景级 SDF 重建的精度和收敛速度方面，仍有许多应用需要进一步改进。\n随着 3D 高斯投影（3D Gaussian Splatting, 3DGS）作为一种具有出色渲染质量和速度的显式表示的出现，一些研究致力于通过在深度和表面法线之间引入一致性损失来改进 SDF-NeRF。然而，单纯基于损失级别的连接只能带来渐进式的改进。我们提出了一种新颖的神经隐式 SDF 方法，称为“SplatSDF”，在架构层面融合了 3DGS 和 SDF-NeRF，显著提升了几何和光度精度以及收敛速度。SplatSDF 在训练过程中仅依赖 3DGS 输入，而在推理时保持与原始 SDF-NeRF 相同的复杂度和效率。我们的方法在几何和光度评估方面优于截至投稿时的最先进 SDF-NeRF 模型。\n"
  },
  {
    "path": "abs/2411.15476.md",
    "content": "### Gassidy: Gaussian Splatting SLAM in Dynamic Environments\n\n3D Gaussian Splatting (3DGS) allows flexible adjustments to scene representation, enabling continuous optimization of scene quality during dense visual simultaneous localization and mapping (SLAM) in static environments. However, 3DGS faces challenges in handling environmental disturbances from dynamic objects with irregular movement, leading to degradation in both camera tracking accuracy and map reconstruction quality. To address this challenge, we develop an RGB-D dense SLAM which is called Gaussian Splatting SLAM in Dynamic Environments (Gassidy). This approach calculates Gaussians to generate rendering loss flows for each environmental component based on a designed photometric-geometric loss function. To distinguish and filter environmental disturbances, we iteratively analyze rendering loss flows to detect features characterized by changes in loss values between dynamic objects and static components. This process ensures a clean environment for accurate scene reconstruction. Compared to state-of-the-art SLAM methods, experimental results on open datasets show that Gassidy improves camera tracking precision by up to 97.9% and enhances map quality by up to 6%.\n\n3D 高斯投影（3D Gaussian Splatting, 3DGS）能够灵活调整场景表示，使其在静态环境下进行稠密视觉同时定位与建图（SLAM）时，可持续优化场景质量。然而，3DGS 在处理动态物体的不规则运动引起的环境干扰时面临挑战，这会导致摄像机跟踪精度和地图重建质量的下降。为应对这一问题，我们开发了一种基于 RGB-D 的稠密 SLAM 方法，称为动态环境中的高斯投影 SLAM（Gassidy）。\n该方法通过设计的光度-几何损失函数，为每个环境组件计算高斯分布以生成渲染损失流。为了区分并过滤环境干扰，我们迭代分析渲染损失流，以检测由动态物体和静态组件之间损失值变化所表征的特征。这一过程确保了一个干净的环境，从而实现准确的场景重建。与最先进的 SLAM 方法相比，在公开数据集上的实验结果表明，Gassidy 将摄像机跟踪精度提高了最多 97.9%，并将地图质量提升了最多 6%。\n"
  },
  {
    "path": "abs/2411.15482.md",
    "content": "### SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving\n\nMost existing Dynamic Gaussian Splatting methods for complex dynamic urban scenarios rely on accurate object-level supervision from expensive manual labeling, limiting their scalability in real-world applications. In this paper, we introduce SplatFlow, a Self-Supervised Dynamic Gaussian Splatting within Neural Motion Flow Fields (NMFF) to learn 4D space-time representations without requiring tracked 3D bounding boxes, enabling accurate dynamic scene reconstruction and novel view RGB, depth and flow synthesis. SplatFlow designs a unified framework to seamlessly integrate time-dependent 4D Gaussian representation within NMFF, where NMFF is a set of implicit functions to model temporal motions of both LiDAR points and Gaussians as continuous motion flow fields. Leveraging NMFF, SplatFlow effectively decomposes static background and dynamic objects, representing them with 3D and 4D Gaussian primitives, respectively. NMFF also models the status correspondences of each 4D Gaussian across time, which aggregates temporal features to enhance cross-view consistency of dynamic components. SplatFlow further improves dynamic scene identification by distilling features from 2D foundational models into 4D space-time representation. Comprehensive evaluations conducted on the Waymo Open Dataset and KITTI Dataset validate SplatFlow's state-of-the-art (SOTA) performance for both image reconstruction and novel view synthesis in dynamic urban scenarios.\n\n目前大多数针对复杂动态城市场景的动态高斯投影（Dynamic Gaussian Splatting）方法依赖于昂贵的手动标注提供的精确目标级监督，这限制了其在实际应用中的可扩展性。在本文中，我们提出了一种名为 SplatFlow 的自监督动态高斯投影方法，通过神经运动流场（Neural Motion Flow Fields, NMFF）学习 4D 时空表示，无需依赖跟踪的 3D 边界框，从而实现了准确的动态场景重建以及新视角的 RGB、深度和流的生成。\nSplatFlow 设计了一个统一框架，将时间相关的 4D 高斯表示无缝集成到 NMFF 中。其中，NMFF 是一组隐式函数，用于将 LiDAR 点和高斯表示的时间运动建模为连续的运动流场。借助 NMFF，SplatFlow 有效地分解了静态背景和动态物体，分别使用 3D 和 4D 高斯基元进行表示。NMFF 还建模了每个 4D 高斯随时间变化的状态对应关系，从而聚合时间特征，增强动态组件的跨视角一致性。\n此外，SplatFlow 通过将 2D 基础模型的特征提炼到 4D 时空表示中，进一步提升了动态场景的识别能力。在 Waymo Open Dataset 和 KITTI Dataset 上进行的综合评估表明，SplatFlow 在动态城市场景中的图像重建和新视角生成任务上均达到了最先进水平（SOTA）。\n"
  },
  {
    "path": "abs/2411.15582.md",
    "content": "### EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting\n\nPhotorealistic reconstruction of street scenes is essential for developing real-world simulators in autonomous driving. While recent methods based on 3D/4D Gaussian Splatting (GS) have demonstrated promising results, they still encounter challenges in complex street scenes due to the unpredictable motion of dynamic objects. Current methods typically decompose street scenes into static and dynamic objects, learning the Gaussians in either a supervised manner (e.g., w/ 3D bounding-box) or a self-supervised manner (e.g., w/o 3D bounding-box). However, these approaches do not effectively model the motions of dynamic objects (e.g., the motion speed of pedestrians is clearly different from that of vehicles), resulting in suboptimal scene decomposition. To address this, we propose Explicit Motion Decomposition (EMD), which models the motions of dynamic objects by introducing learnable motion embeddings to the Gaussians, enhancing the decomposition in street scenes. The proposed EMD is a plug-and-play approach applicable to various baseline methods. We also propose tailored training strategies to apply EMD to both supervised and self-supervised baselines. Through comprehensive experimentation, we illustrate the effectiveness of our approach with various established baselines. The code will be released at: this https URL.\n\n街景的逼真重建对开发自动驾驶的真实场景模拟器至关重要。尽管基于 3D/4D 高斯投影（Gaussian Splatting, GS）的最新方法在该领域展现了良好前景，但由于动态物体不可预测的运动，这些方法在复杂街景中仍面临挑战。当前的方法通常将街景分解为静态和动态物体，并通过有监督（例如基于 3D 边界框）或自监督（例如无需 3D 边界框）的方式学习高斯分布。然而，这些方法未能有效建模动态物体的运动特性（例如，行人的运动速度明显不同于车辆），导致场景分解效果不够理想。\n为了解决这一问题，我们提出了显式运动分解（Explicit Motion Decomposition, EMD）方法，通过向高斯分布中引入可学习的运动嵌入（motion embeddings），对动态物体的运动进行建模，从而增强街景的分解效果。所提出的 EMD 方法是一种可即插即用的方案，适用于多种基线方法。此外，我们还设计了针对性的训练策略，使 EMD 能够应用于有监督和自监督的基线方法。\n通过全面的实验，我们验证了在多种基线方法中应用 EMD 的有效性，并表明其显著改善了街景动态物体的分解与建模。\n"
  },
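The plug-and-play nature of EMD comes from attaching a learnable motion embedding to each Gaussian and decoding (embedding, time) into a position offset, so differently moving objects (pedestrians vs. vehicles) can learn different motion scales. A minimal sketch; the embedding size and MLP below are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MotionEmbeddingOffset(nn.Module):
    """Plug-and-play sketch of explicit motion modeling: each Gaussian owns a
    learnable motion embedding, and a small MLP maps (embedding, time) to a
    per-Gaussian position offset added on top of any baseline decomposition."""
    def __init__(self, num_gaussians: int, dim: int = 16):
        super().__init__()
        self.embed = nn.Embedding(num_gaussians, dim)
        self.mlp = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, means: torch.Tensor, t: float) -> torch.Tensor:
        # means: (N, 3) canonical Gaussian centers; returns centers at time t
        e = self.embed.weight                                  # (N, dim)
        tt = torch.full((e.shape[0], 1), t)
        return means + self.mlp(torch.cat([e, tt], dim=1))     # per-Gaussian offset

model = MotionEmbeddingOffset(num_gaussians=1000)
means = torch.rand(1000, 3)
print(model(means, t=0.25).shape)  # torch.Size([1000, 3])
```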
  {
    "path": "abs/2411.15723.md",
    "content": "### GSurf: 3D Reconstruction via Signed Distance Fields with Direct Gaussian Supervision\n\nSurface reconstruction from multi-view images is a core challenge in 3D vision. Recent studies have explored signed distance fields (SDF) within Neural Radiance Fields (NeRF) to achieve high-fidelity surface reconstructions. However, these approaches often suffer from slow training and rendering speeds compared to 3D Gaussian splatting (3DGS). Current state-of-the-art techniques attempt to fuse depth information to extract geometry from 3DGS, but frequently result in incomplete reconstructions and fragmented surfaces. In this paper, we introduce GSurf, a novel end-to-end method for learning a signed distance field directly from Gaussian primitives. The continuous and smooth nature of SDF addresses common issues in the 3DGS family, such as holes resulting from noisy or missing depth data. By using Gaussian splatting for rendering, GSurf avoids the redundant volume rendering typically required in other GS and SDF integrations. Consequently, GSurf achieves faster training and rendering speeds while delivering 3D reconstruction quality comparable to neural implicit surface methods, such as VolSDF and NeuS. Experimental results across various benchmark datasets demonstrate the effectiveness of our method in producing high-fidelity 3D reconstructions.\n\n从多视角图像中进行表面重建是 3D 视觉领域的核心挑战。近年来的研究通过在神经辐射场（Neural Radiance Fields, NeRF）中利用有符号距离场（Signed Distance Fields, SDF），实现了高保真的表面重建。然而，这些方法的训练和渲染速度通常比 3D 高斯投影（3D Gaussian Splatting, 3DGS）慢得多。当前最先进的技术尝试将深度信息融合到 3DGS 中以提取几何，但经常导致重建不完整或表面破碎的问题。\n本文提出了一种名为 GSurf 的新颖端到端方法，直接从高斯基元学习有符号距离场（SDF）。SDF 的连续和平滑特性有效解决了 3DGS 方法中常见的问题，例如由于噪声或深度数据缺失导致的孔洞。通过使用高斯投影进行渲染，GSurf 避免了其他 GS 和 SDF 集成方法中常见的冗余体积渲染。由此，GSurf 实现了更快的训练和渲染速度，同时在 3D 重建质量上可与神经隐式表面方法（如 VolSDF 和 NeuS）相媲美。\n在多个基准数据集上的实验结果表明，GSurf 能够高效生成高保真的 3D 重建，验证了其方法的有效性。\n"
  },
  {
    "path": "abs/2411.15732.md",
    "content": "### DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models\n\nGenerating and editing dynamic 3D head avatars are crucial tasks in virtual reality and film production. However, existing methods often suffer from facial distortions, inaccurate head movements, and limited fine-grained editing capabilities. To address these challenges, we present DynamicAvatars, a dynamic model that generates photorealistic, moving 3D head avatars from video clips and parameters associated with facial positions and expressions. Our approach enables precise editing through a novel prompt-based editing model, which integrates user-provided prompts with guiding parameters derived from large language models (LLMs). To achieve this, we propose a dual-tracking framework based on Gaussian Splatting and introduce a prompt preprocessing module to enhance editing stability. By incorporating a specialized GAN algorithm and connecting it to our control module, which generates precise guiding parameters from LLMs, we successfully address the limitations of existing methods. Additionally, we develop a dynamic editing strategy that selectively utilizes specific training datasets to improve the efficiency and adaptability of the model for dynamic editing tasks.\n\n生成和编辑动态 3D 头像是虚拟现实和电影制作中的关键任务。然而，现有方法通常存在面部失真、头部运动不准确以及精细编辑能力有限等问题。为了解决这些挑战，我们提出了 DynamicAvatars，一种动态模型，可根据视频片段和与面部位置及表情相关的参数生成逼真、动态的 3D 头像。\n我们的方法通过一种新颖的基于提示的编辑模型实现精确编辑，该模型将用户提供的提示与由大型语言模型（LLMs）生成的指导参数相结合。为此，我们提出了基于高斯投影的双重跟踪框架，并引入提示预处理模块以增强编辑稳定性。通过结合一种专门设计的生成对抗网络（GAN）算法，并将其连接到我们的控制模块中（该模块利用 LLM 生成精确的指导参数），我们成功解决了现有方法的局限性。\n此外，我们开发了一种动态编辑策略，该策略选择性地利用特定的训练数据集，提高了模型在动态编辑任务中的效率和适应性。实验结果表明，DynamicAvatars 在生成和编辑动态 3D 头像方面实现了高精度和灵活性，为虚拟现实和影视制作提供了强大的工具支持。\n"
  },
  {
    "path": "abs/2411.15779.md",
    "content": "### ZeroGS: Training 3D Gaussian Splatting from Unposed Images\n\nNeural radiance fields (NeRF) and 3D Gaussian Splatting (3DGS) are popular techniques to reconstruct and render photo-realistic images. However, the pre-requisite of running Structure-from-Motion (SfM) to get camera poses limits their completeness. While previous methods can reconstruct from a few unposed images, they are not applicable when images are unordered or densely captured. In this work, we propose ZeroGS to train 3DGS from hundreds of unposed and unordered images. Our method leverages a pretrained foundation model as the neural scene representation. Since the accuracy of the predicted pointmaps does not suffice for accurate image registration and high-fidelity image rendering, we propose to mitigate the issue by initializing and finetuning the pretrained model from a seed image. Images are then progressively registered and added to the training buffer, which is further used to train the model. We also propose to refine the camera poses and pointmaps by minimizing a point-to-camera ray consistency loss across multiple views. Experiments on the LLFF dataset, the MipNeRF360 dataset, and the Tanks-and-Temples dataset show that our method recovers more accurate camera poses than state-of-the-art pose-free NeRF/3DGS methods, and even renders higher quality images than 3DGS with COLMAP poses. Our project page is available at this https URL.\n\n神经辐射场（Neural Radiance Fields, NeRF）和 3D 高斯投影（3D Gaussian Splatting, 3DGS）是重建和渲染逼真图像的热门技术。然而，这些方法通常需要先运行结构化运动恢复（Structure-from-Motion, SfM）以获取相机位姿，这限制了其应用的完整性。尽管现有方法能够从少量未配准的图像中进行重建，但当图像是无序的或密集捕获时，它们无法适用。\n为了解决这一问题，我们提出了 ZeroGS，一种能够从数百张未配准且无序图像中训练 3DGS 的方法。我们的方法利用预训练的基础模型作为神经场景表示。由于预测点图的精度不足以进行准确的图像配准和高保真的图像渲染，我们通过从种子图像初始化并微调预训练模型来缓解这一问题。图像随后被逐步配准并添加到训练缓冲区中，从而进一步用于模型训练。\n此外，我们提出通过最小化多视角下的点到相机射线一致性损失来优化相机位姿和点图。实验结果表明，在 LLFF 数据集、MipNeRF360 数据集和 Tanks-and-Temples 数据集上，我们的方法在相机位姿估计方面优于最先进的无位姿 NeRF/3DGS 方法，并且在图像渲染质量上甚至超越了使用 COLMAP 位姿的 3DGS 方法。\n"
  },
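A point-to-camera-ray consistency loss of the kind ZeroGS describes can be written directly: each predicted 3D point should lie on the ray cast through the pixel that observed it, so its perpendicular distance to that ray is penalized (summed over views in practice). A sketch for a single view; the function name is illustrative.

```python
import torch

def ray_consistency_loss(points: torch.Tensor, cam_center: torch.Tensor,
                         ray_dirs: torch.Tensor) -> torch.Tensor:
    """Penalize each point's perpendicular distance to the camera ray through
    the pixel that observed it.
    points: (N, 3); cam_center: (3,); ray_dirs: (N, 3) unit ray directions."""
    v = points - cam_center                                  # camera-to-point vectors
    t = (v * ray_dirs).sum(dim=1, keepdim=True)              # projection onto each ray
    closest = cam_center + t * ray_dirs                      # nearest point on the ray
    return (points - closest).norm(dim=1).mean()

pts = torch.rand(100, 3) + torch.tensor([0., 0., 2.])
c = torch.zeros(3)
dirs = torch.nn.functional.normalize(pts - c, dim=1)         # perfectly consistent rays
print(ray_consistency_loss(pts, c, dirs))                    # tensor close to 0
```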
  {
    "path": "abs/2411.15800.md",
    "content": "### PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments\n\nSimultaneous localization and mapping (SLAM) has achieved impressive performance in static environments. However, SLAM in dynamic environments remains an open question. Many methods directly filter out dynamic objects, resulting in incomplete scene reconstruction and limited accuracy of camera localization. The other works express dynamic objects by point clouds, sparse joints, or coarse meshes, which fails to provide a photo-realistic representation. To overcome the above limitations, we propose a photo-realistic and geometry-aware RGB-D SLAM method by extending Gaussian splatting. Our method is composed of three main modules to 1) map the dynamic foreground including non-rigid humans and rigid items, 2) reconstruct the static background, and 3) localize the camera. To map the foreground, we focus on modeling the deformations and/or motions. We consider the shape priors of humans and exploit geometric and appearance constraints of humans and items. For background mapping, we design an optimization strategy between neighboring local maps by integrating appearance constraint into geometric alignment. As to camera localization, we leverage both static background and dynamic foreground to increase the observations for noise compensation. We explore the geometric and appearance constraints by associating 3D Gaussians with 2D optical flows and pixel patches. Experiments on various real-world datasets demonstrate that our method outperforms state-of-the-art approaches in terms of camera localization and scene representation.\n\n同时定位与建图（SLAM）在静态环境中已经取得了显著的性能。然而，SLAM 在动态环境中的应用仍然是一个未解决的问题。许多方法直接过滤掉动态物体，导致场景重建不完整以及摄像机定位精度受限。其他工作通过点云、稀疏关节或粗糙网格表示动态物体，但无法提供照片级真实感的表现。\n为了解决上述限制，我们提出了一种通过扩展高斯投影实现照片级真实感和几何感知的 RGB-D SLAM 方法。我们的方法由三个主要模块组成：1）映射包括非刚体人类和刚体物品在内的动态前景，2）重建静态背景，3）定位摄像机。为了映射前景，我们专注于建模形变和/或运动，并结合人类的形状先验以及几何和外观约束来表示人类和物品。对于背景重建，我们设计了一种通过将外观约束整合到几何对齐中的邻域局部地图优化策略。至于摄像机定位，我们利用静态背景和动态前景来增加观测量以补偿噪声。\n我们通过将 3D 高斯与 2D 光流和像素块关联，探索几何和外观约束。在多个真实世界数据集上的实验表明，我们的方法在摄像机定位和场景表示方面优于最先进的方法。\n"
  },
  {
    "path": "abs/2411.15966.md",
    "content": "### Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors\n\nIn this work, we introduce a generative approach for pose-free reconstruction of 360∘ scenes from a limited number of uncalibrated 2D images. Pose-free scene reconstruction from incomplete, unposed observations is usually regularized with depth estimation or 3D foundational priors. While recent advances have enabled sparse-view reconstruction of unbounded scenes with known camera poses using diffusion priors, these methods rely on explicit camera embeddings for extrapolating unobserved regions. This reliance limits their application in pose-free settings, where view-specific data is only implicitly available. To address this, we propose an instruction-following RGBD diffusion model designed to inpaint missing details and remove artifacts in novel view renders and depth maps of a 3D scene. We also propose a novel confidence measure for Gaussian representations to allow for better detection of these artifacts. By progressively integrating these novel views in a Gaussian-SLAM-inspired process, we achieve a multi-view-consistent Gaussian representation. Evaluations on the MipNeRF360 dataset demonstrate that our method surpasses existing pose-free techniques and performs competitively with state-of-the-art posed reconstruction methods in complex 360∘scenes.\n\n在本研究中，我们提出了一种生成式方法，用于从有限数量的未校准 2D 图像中进行无位姿 360∘ 场景重建。无位姿场景重建在面对不完整、未配准的观测时，通常通过深度估计或 3D 基础先验进行正则化。尽管最近的研究已经实现了在已知相机位姿条件下使用扩散先验进行稀疏视角的无界场景重建，这些方法依赖显式相机嵌入来外推未观测区域。这种依赖性限制了它们在无位姿环境中的应用，因为视角特定的数据仅以隐式方式可用。\n为了解决这一问题，我们提出了一种指令跟随的 RGBD 扩散模型，用于对 3D 场景的新视角渲染和深度图中的缺失细节进行修复，并去除伪影。此外，我们设计了一种针对高斯表示的新颖置信度度量方法，用于更好地检测这些伪影。通过将这些新视角渐进式地集成到一个受 Gaussian-SLAM 启发的流程中，我们实现了多视角一致的高斯表示。\n在 MipNeRF360 数据集上的评估结果表明，我们的方法在复杂的 360∘ 场景中超越了现有的无位姿技术，并且在性能上与最先进的有位姿重建方法相媲美。\n"
  },
  {
    "path": "abs/2411.16053.md",
    "content": "### UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation\n\nVision-and-Language Navigation (VLN), where an agent follows instructions to reach a target destination, has recently seen significant advancements. In contrast to navigation in discrete environments with predefined trajectories, VLN in Continuous Environments (VLN-CE) presents greater challenges, as the agent is free to navigate any unobstructed location and is more vulnerable to visual occlusions or blind spots. Recent approaches have attempted to address this by imagining future environments, either through predicted future visual images or semantic features, rather than relying solely on current observations. However, these RGB-based and feature-based methods lack intuitive appearance-level information or high-level semantic complexity crucial for effective navigation. To overcome these limitations, we introduce a novel, generalizable 3DGS-based pre-training paradigm, called UnitedVLN, which enables agents to better explore future environments by unitedly rendering high-fidelity 360 visual images and semantic features. UnitedVLN employs two key schemes: search-then-query sampling and separate-then-united rendering, which facilitate efficient exploitation of neural primitives, helping to integrate both appearance and semantic information for more robust navigation. Extensive experiments demonstrate that UnitedVLN outperforms state-of-the-art methods on existing VLN-CE benchmarks.\n\n视觉与语言导航（Vision-and-Language Navigation, VLN）任务中，智能体需要根据指令导航到目标地点。与具有预定义轨迹的离散环境中的导航不同，连续环境中的 VLN（VLN-CE）面临更大的挑战，因为智能体可以自由导航到任何未受阻的地点，同时更容易受到视觉遮挡或盲区的影响。近期方法尝试通过预测未来视觉图像或语义特征来“想象”未来环境，而不仅依赖当前观测。然而，这些基于 RGB 和特征的方法缺乏直观的外观级信息或高层次的语义复杂性，这对于有效导航至关重要。\n为克服这些局限性，我们提出了一种新颖的、具有泛化能力的基于 3DGS 的预训练范式，称为 UnitedVLN。该方法通过联合渲染高保真的 360 度视觉图像和语义特征，使智能体能够更好地探索未来环境。UnitedVLN 包括两个关键机制：搜索-查询采样（search-then-query sampling）和分离-联合渲染（separate-then-united rendering），以高效利用神经基元，帮助集成外观和语义信息，从而实现更稳健的导航。\n大量实验结果表明，UnitedVLN 在现有 VLN-CE 基准测试中表现优于最先进的方法，显著提高了导航的准确性和鲁棒性。\n"
  },
  {
    "path": "abs/2411.16180.md",
    "content": "### Event-boosted Deformable 3D Gaussians for Fast Dynamic Scene Reconstruction\n\n3D Gaussian Splatting (3D-GS) enables real-time rendering but struggles with fast motion due to low temporal resolution of RGB cameras. To address this, we introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for fast dynamic scene reconstruction. We observe that threshold modeling for events plays a crucial role in achieving high-quality reconstruction. Therefore, we propose a GS-Threshold Joint Modeling (GTJM) strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling. Moreover, we introduce a Dynamic-Static Decomposition (DSD) strategy that first identifies dynamic areas by exploiting the inability of static Gaussians to represent motions, then applies a buffer-based soft decomposition to separate dynamic and static areas. This strategy accelerates rendering by avoiding unnecessary deformation in static areas, and focuses on dynamic areas to enhance fidelity. Our approach achieves high-fidelity dynamic reconstruction at 156 FPS with a 400×400 resolution on an RTX 3090 GPU.\n\n3D 高斯投影（3D Gaussian Splatting, 3D-GS）支持实时渲染，但由于 RGB 相机的时间分辨率较低，在处理快速运动时表现不足。为了解决这一问题，我们首次将事件相机（Event Cameras）与可变形 3D-GS 相结合，利用事件相机捕获的高时间分辨率连续运动数据，实现快速动态场景重建。\n我们观察到，事件的阈值建模对于实现高质量重建至关重要。因此，我们提出了一种 GS-Threshold Joint Modeling (GTJM) 策略，通过创建一个相互增强的过程，大幅提升了 3D 重建质量和阈值建模的准确性。此外，我们引入了一种 Dynamic-Static Decomposition (DSD) 策略，首先通过静态高斯无法表示运动的特性识别动态区域，然后采用基于缓冲的软分解方法将动态区域和静态区域分离。该策略通过避免静态区域中的不必要形变加速了渲染，同时聚焦于动态区域以提高细节保真度。\n我们的方法在 RTX 3090 GPU 上以 400×400 分辨率实现了 156 FPS 的高保真动态重建。\n"
  },
  {
    "path": "abs/2411.16392.md",
    "content": "### Quadratic Gaussian Splatting for Efficient and Detailed Surface Reconstruction\n\nRecently, 3D Gaussian Splatting (3DGS) has attracted attention for its superior rendering quality and speed over Neural Radiance Fields (NeRF). To address 3DGS's limitations in surface representation, 2D Gaussian Splatting (2DGS) introduced disks as scene primitives to model and reconstruct geometries from multi-view images, offering view-consistent geometry. However, the disk's first-order linear approximation often leads to over-smoothed results. We propose Quadratic Gaussian Splatting (QGS), a novel method that replaces disks with quadric surfaces, enhancing geometric fitting, whose code will be open-sourced. QGS defines Gaussian distributions in non-Euclidean space, allowing primitives to capture more complex textures. As a second-order surface approximation, QGS also renders spatial curvature to guide the normal consistency term, to effectively reduce over-smoothing. Moreover, QGS is a generalized version of 2DGS that achieves more accurate and detailed reconstructions, as verified by experiments on DTU and TNT, demonstrating its effectiveness in surpassing current state-of-the-art methods in geometry reconstruction.\n\n最近，3D 高斯投影（3D Gaussian Splatting, 3DGS）因其在渲染质量和速度上优于神经辐射场（Neural Radiance Fields, NeRF）而受到关注。为了解决 3DGS 在表面表示上的局限性，2D 高斯投影（2D Gaussian Splatting, 2DGS）引入了以圆盘为场景基元的方法，从多视角图像中建模和重建几何结构，提供视图一致的几何表示。然而，圆盘的线性一阶近似往往导致结果过于平滑。\n我们提出了一种新方法 Quadratic Gaussian Splatting (QGS)，将圆盘替换为二次曲面，从而提升几何拟合能力，相关代码将开源。QGS 在非欧几里得空间中定义高斯分布，使得基元能够捕捉更复杂的纹理特征。作为二阶曲面近似方法，QGS 还能渲染空间曲率，引导法线一致性项，从而有效减少过度平滑。\n此外，QGS 是 2DGS 的广义版本，能够实现更精确、更细致的重建。通过在 DTU 和 TNT 数据集上的实验验证，我们的方法在几何重建方面超越了当前最先进的方法，展现了其卓越的效果。\n"
  },
  {
    "path": "abs/2411.16443.md",
    "content": "### SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis\n\nText-based generation and editing of 3D scenes hold significant potential for streamlining content creation through intuitive user interactions. While recent advances leverage 3D Gaussian Splatting (3DGS) for high-fidelity and real-time rendering, existing methods are often specialized and task-focused, lacking a unified framework for both generation and editing. In this paper, we introduce SplatFlow, a comprehensive framework that addresses this gap by enabling direct 3DGS generation and editing. SplatFlow comprises two main components: a multi-view rectified flow (RF) model and a Gaussian Splatting Decoder (GSDecoder). The multi-view RF model operates in latent space, generating multi-view images, depths, and camera poses simultaneously, conditioned on text prompts, thus addressing challenges like diverse scene scales and complex camera trajectories in real-world settings. Then, the GSDecoder efficiently translates these latent outputs into 3DGS representations through a feed-forward 3DGS method. Leveraging training-free inversion and inpainting techniques, SplatFlow enables seamless 3DGS editing and supports a broad range of 3D tasks-including object editing, novel view synthesis, and camera pose estimation-within a unified framework without requiring additional complex pipelines. We validate SplatFlow's capabilities on the MVImgNet and DL3DV-7K datasets, demonstrating its versatility and effectiveness in various 3D generation, editing, and inpainting-based tasks.\n\n基于文本的 3D 场景生成和编辑在通过直观的用户交互简化内容创作方面具有巨大的潜力。尽管最近的进展利用了 3D 高斯投影（3D Gaussian Splatting, 3DGS）实现高保真和实时渲染，但现有方法往往专注于特定任务，缺乏一个同时支持生成和编辑的统一框架。\n本文提出了 SplatFlow，一个综合框架，填补了这一空白，实现了直接的 3DGS 生成和编辑。SplatFlow 包含两个主要组件：多视角校正流（Multi-view Rectified Flow, RF）模型和高斯投影解码器（Gaussian Splatting Decoder, GSDecoder）。多视角 RF 模型在潜在空间中操作，基于文本提示同时生成多视角图像、深度图和相机位姿，从而解决了现实场景中多样化场景尺度和复杂相机轨迹等挑战。随后，GSDecoder 通过前馈 3DGS 方法高效地将这些潜在输出转换为 3DGS 表示。\n通过无训练反演和修复技术，SplatFlow 实现了无缝的 3DGS 编辑，并在一个统一框架下支持广泛的 3D 任务，包括对象编辑、新视角合成和相机位姿估计，无需额外复杂的管道。我们在 MVImgNet 和 DL3DV-7K 数据集上验证了 SplatFlow 的能力，展示了其在各种 3D 生成、编辑和基于修复任务中的多功能性和有效性。\n"
  },
  {
    "path": "abs/2411.16758.md",
    "content": "### Bundle Adjusted Gaussian Avatars Deblurring\n\nThe development of 3D human avatars from multi-view videos represents a significant yet challenging task in the field. Recent advancements, including 3D Gaussian Splattings (3DGS), have markedly progressed this domain. Nonetheless, existing techniques necessitate the use of high-quality sharp images, which are often impractical to obtain in real-world settings due to variations in human motion speed and intensity. In this study, we attempt to explore deriving sharp intrinsic 3D human Gaussian avatars from blurry video footage in an end-to-end manner. Our approach encompasses a 3D-aware, physics-oriented model of blur formation attributable to human movement, coupled with a 3D human motion model to clarify ambiguities found in motion-induced blurry images. This methodology facilitates the concurrent learning of avatar model parameters and the refinement of sub-frame motion parameters from a coarse initialization. We have established benchmarks for this task through a synthetic dataset derived from existing multi-view captures, alongside a real-captured dataset acquired through a 360-degree synchronous hybrid-exposure camera system. Comprehensive evaluations demonstrate that our model surpasses existing baselines.\n\n从多视角视频生成 3D 人体化身是一个重要但具有挑战性的任务。近期的进展，包括 3D 高斯投影（3D Gaussian Splattings, 3DGS），显著推动了这一领域的发展。然而，现有技术通常要求高质量、清晰的图像，而在现实场景中，由于人体运动速度和强度的变化，这种要求往往难以满足。\n本研究尝试以端到端的方式，从模糊视频中推导出清晰的内在 3D 人体高斯化身。我们的方法包括一个 3D 感知且基于物理的模糊形成模型，用于描述人体运动引起的模糊，以及一个 3D 人体运动模型，用于澄清运动引起的模糊图像中的歧义。该方法能够同时学习化身模型参数，并从粗略的初始化中优化子帧运动参数。\n我们通过基于现有多视角捕获数据生成的合成数据集，以及通过 360 度同步混合曝光相机系统采集的真实数据集，建立了该任务的基准数据集。全面评估结果表明，我们的模型在性能上显著超越了现有基线方法。\n"
  },
  {
    "path": "abs/2411.16768.md",
    "content": "### GAST: Sequential Gaussian Avatars with Hierarchical Spatio-temporal Context\n\n3D human avatars, through the use of canonical radiance fields and per-frame observed warping, enable high-fidelity rendering and animating. However, existing methods, which rely on either spatial SMPL(-X) poses or temporal embeddings, respectively suffer from coarse rendering quality or limited animation flexibility. To address these challenges, we propose GAST, a framework that unifies 3D human modeling with 3DGS by hierarchically integrating both spatial and temporal information. Specifically, we design a sequential conditioning framework for the non-rigid warping of the human body, under whose guidance more accurate 3D Gaussians can be obtained in the observation space. Moreover, the explicit properties of Gaussians allow us to embed richer sequential information, encompassing both the coarse sequence of human poses and finer per-vertex motion details. These sequence conditions are further sampled across different temporal scales, in a coarse-to-fine manner, ensuring unbiased inputs for non-rigid warping. Experimental results demonstrate that our method combined with hierarchical spatio-temporal modeling surpasses concurrent baselines, delivering both high-quality rendering and flexible animating capabilities.\n\n通过使用规范辐射场和逐帧观察到的形变，3D 人体化身能够实现高保真的渲染和动画。然而，现有方法依赖空间上的 SMPL(-X) 姿态或时间嵌入，分别面临渲染质量粗糙或动画灵活性受限的问题。\n为了解决这些挑战，我们提出了 GAST，一个将 3D 人体建模与 3D 高斯投影（3DGS）相统一的框架，通过层次化整合空间和时间信息实现高效建模。具体来说，我们设计了一种顺序条件框架，用于非刚体的人体形变，在其引导下，可以在观测空间中获得更精确的 3D 高斯。此外，高斯的显式属性允许我们嵌入更丰富的序列信息，涵盖人体姿态的粗略序列以及更细粒度的逐顶点运动细节。\n这些序列条件以粗到细的方式在不同时间尺度上进行采样，从而确保非刚体形变的输入不带偏差。实验结果表明，我们结合层次化时空建模的方法，超越了现有的同期基线，实现了高质量渲染和灵活的动画能力，显著提升了 3D 人体建模的表现力和实用性。\n"
  },
  {
    "path": "abs/2411.16779.md",
    "content": "### NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model\n\nWe introduce NovelGS, a diffusion model for Gaussian Splatting (GS) given sparse-view images. Recent works leverage feed-forward networks to generate pixel-aligned Gaussians, which could be fast rendered. Unfortunately, the method was unable to produce satisfactory results for areas not covered by the input images due to the formulation of these methods. In contrast, we leverage the novel view denoising through a transformer-based network to generate 3D Gaussians. Specifically, by incorporating both conditional views and noisy target views, the network predicts pixel-aligned Gaussians for each view. During training, the rendered target and some additional views of the Gaussians are supervised. During inference, the target views are iteratively rendered and denoised from pure noise. Our approach demonstrates state-of-the-art performance in addressing the multi-view image reconstruction challenge. Due to generative modeling of unseen regions, NovelGS effectively reconstructs 3D objects with consistent and sharp textures. Experimental results on publicly available datasets indicate that NovelGS substantially surpasses existing image-to-3D frameworks, both qualitatively and quantitatively. We also demonstrate the potential of NovelGS in generative tasks, such as text-to-3D and image-to-3D, by integrating it with existing multiview diffusion models. We will make the code publicly accessible.\n\n我们提出了 NovelGS，一种基于稀疏视角图像进行高斯投影（Gaussian Splatting, GS）的扩散模型。近期研究利用前馈网络生成像素对齐的高斯基元，能够实现快速渲染。然而，这些方法由于其公式化限制，对于输入图像未覆盖的区域往往无法生成令人满意的结果。\n与之相比，我们通过基于 Transformer 的新视角去噪网络生成 3D 高斯分布。具体而言，网络结合条件视角和噪声目标视角，为每个视角预测像素对齐的高斯。在训练过程中，对目标视角的渲染结果以及一些额外视角的高斯进行监督。在推理过程中，目标视角从纯噪声中迭代渲染并去噪。\n我们的方法在多视角图像重建挑战中表现出最先进的性能。由于对未见区域的生成建模，NovelGS 能够有效重建具有一致且清晰纹理的 3D 对象。在公开数据集上的实验结果表明，无论从定性还是定量角度，NovelGS 均显著超越现有的图像到 3D 框架。此外，我们通过将 NovelGS 与现有多视角扩散模型集成，展示了其在生成任务（如文本到 3D 和图像到 3D）中的潜力。\n"
  },
  {
    "path": "abs/2411.16785.md",
    "content": "### MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM\n\nSimultaneous localization and mapping (SLAM) systems with novel view synthesis capabilities are widely used in computer vision, with applications in augmented reality, robotics, and autonomous driving. However, existing approaches are limited to single-agent operation. Recent work has addressed this problem using a distributed neural scene representation. Unfortunately, existing methods are slow, cannot accurately render real-world data, are restricted to two agents, and have limited tracking accuracy. In contrast, we propose a rigidly deformable 3D Gaussian-based scene representation that dramatically speeds up the system. However, improving tracking accuracy and reconstructing a globally consistent map from multiple agents remains challenging due to trajectory drift and discrepancies across agents' observations. Therefore, we propose new tracking and map-merging mechanisms and integrate loop closure in the Gaussian-based SLAM pipeline. We evaluate MAGiC-SLAM on synthetic and real-world datasets and find it more accurate and faster than the state of the art.\n\n同时定位与建图（Simultaneous Localization and Mapping, SLAM）系统结合新视角合成功能广泛应用于计算机视觉领域，如增强现实、机器人技术和自动驾驶。然而，现有方法局限于单代理操作。最近的研究通过分布式神经场景表示解决了这一问题，但现有方法存在运行速度慢、无法准确渲染真实数据、仅支持两个代理以及跟踪精度有限等问题。\n针对这些限制，我们提出了一种基于刚性可变形 3D 高斯的场景表示，大幅提升了系统的运行速度。然而，由于轨迹漂移和代理观测之间的不一致性，提高跟踪精度并从多代理观测中重建全局一致的地图仍然是一个挑战。为此，我们引入了新的跟踪和地图合并机制，并在基于高斯的 SLAM 流水线中集成了闭环检测（loop closure）。\n我们在合成和真实数据集上对 MAGiC-SLAM 进行了评估，结果表明，该方法在精度和速度方面均优于现有的最先进方法，展示了在多代理 SLAM 系统中的显著优势。\n"
  },
  {
    "path": "abs/2411.16816.md",
    "content": "### SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving\n\nEnsuring the safety of autonomous robots, such as self-driving vehicles, requires extensive testing across diverse driving scenarios. Simulation is a key ingredient for conducting such testing in a cost-effective and scalable way. Neural rendering methods have gained popularity, as they can build simulation environments from collected logs in a data-driven manner. However, existing neural radiance field (NeRF) methods for sensor-realistic rendering of camera and lidar data suffer from low rendering speeds, limiting their applicability for large-scale testing. While 3D Gaussian Splatting (3DGS) enables real-time rendering, current methods are limited to camera data and are unable to render lidar data essential for autonomous driving. To address these limitations, we propose SplatAD, the first 3DGS-based method for realistic, real-time rendering of dynamic scenes for both camera and lidar data. SplatAD accurately models key sensor-specific phenomena such as rolling shutter effects, lidar intensity, and lidar ray dropouts, using purpose-built algorithms to optimize rendering efficiency. Evaluation across three autonomous driving datasets demonstrates that SplatAD achieves state-of-the-art rendering quality with up to +2 PSNR for NVS and +3 PSNR for reconstruction while increasing rendering speed over NeRF-based methods by an order of magnitude. See this https URL for our project page.\n\n确保自主机器人（如自动驾驶车辆）的安全性需要在多样化的驾驶场景中进行广泛测试。仿真是以成本有效且可扩展的方式开展此类测试的关键工具。神经渲染方法因其能够以数据驱动的方式从收集的日志中构建仿真环境而日益受到关注。然而，现有的基于神经辐射场（Neural Radiance Field, NeRF）的摄像头和激光雷达数据传感器真实感渲染方法，由于渲染速度较慢，限制了其在大规模测试中的应用。\n尽管 3D 高斯投影（3D Gaussian Splatting, 3DGS）支持实时渲染，但现有方法仅限于摄像头数据，无法渲染对自动驾驶至关重要的激光雷达数据。为解决这些限制，我们提出了 SplatAD，这是第一个基于 3DGS 的方法，能够对动态场景的摄像头和激光雷达数据进行真实感的实时渲染。SplatAD 通过专门设计的算法优化了渲染效率，精确建模了关键的传感器特定现象，例如滚动快门效应、激光雷达强度和激光雷达射线丢失。\n在三个自动驾驶数据集上的评估表明，SplatAD 在渲染质量上达到了最先进水平，对于新视角合成（NVS）提升了 +2 PSNR，对于重建任务提升了 +3 PSNR，同时渲染速度比基于 NeRF 的方法提高了一个数量级。\n"
  },
  {
    "path": "abs/2411.16877.md",
    "content": "### PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence\n\nWe present PreF3R, Pose-Free Feed-forward 3D Reconstruction from an image sequence of variable length. Unlike previous approaches, PreF3R removes the need for camera calibration and reconstructs the 3D Gaussian field within a canonical coordinate frame directly from a sequence of unposed images, enabling efficient novel-view rendering. We leverage DUSt3R's ability for pair-wise 3D structure reconstruction, and extend it to sequential multi-view input via a spatial memory network, eliminating the need for optimization-based global alignment. Additionally, PreF3R incorporates a dense Gaussian parameter prediction head, which enables subsequent novel-view synthesis with differentiable rasterization. This allows supervising our model with the combination of photometric loss and pointmap regression loss, enhancing both photorealism and structural accuracy. Given a sequence of ordered images, PreF3R incrementally reconstructs the 3D Gaussian field at 20 FPS, therefore enabling real-time novel-view rendering. Empirical experiments demonstrate that PreF3R is an effective solution for the challenging task of pose-free feed-forward novel-view synthesis, while also exhibiting robust generalization to unseen scenes.\n\n我们提出了 PreF3R，一种从可变长度图像序列中进行无位姿前馈式 3D 重建的方法。与以往的方法不同，PreF3R 不需要相机校准，能够直接从未配准的图像序列中在规范坐标系内重建 3D 高斯场，从而实现高效的新视角渲染。\n我们利用 DUSt3R 在成对 3D 结构重建中的能力，并通过空间记忆网络扩展到序列多视角输入，从而消除了基于优化的全局对齐需求。此外，PreF3R 集成了一个密集高斯参数预测模块，支持后续的新视角合成，结合可微光栅化进行渲染。该机制使得我们能够通过光度损失和点图回归损失的组合来监督模型，从而增强图像真实感和结构准确性。\n对于有序的图像序列，PreF3R 能以每秒 20 帧的速度增量式地重建 3D 高斯场，从而实现实时的新视角渲染。实验表明，PreF3R 是解决无位姿前馈式新视角合成这一挑战性任务的有效方案，同时在未知场景中表现出稳健的泛化能力。\n"
  },
  {
    "path": "abs/2411.16898.md",
    "content": "### G2SDF: Surface Reconstruction from Explicit Gaussians with Implicit SDFs\n\nState-of-the-art novel view synthesis methods such as 3D Gaussian Splatting (3DGS) achieve remarkable visual quality. While 3DGS and its variants can be rendered efficiently using rasterization, many tasks require access to the underlying 3D surface, which remains challenging to extract due to the sparse and explicit nature of this representation. In this paper, we introduce G2SDF, a novel approach that addresses this limitation by integrating a neural implicit Signed Distance Field (SDF) into the Gaussian Splatting framework. Our method links the opacity values of Gaussians with their distances to the surface, ensuring a closer alignment of Gaussians with the scene surface. To extend this approach to unbounded scenes at varying scales, we propose a normalization function that maps any range to a fixed interval. To further enhance reconstruction quality, we leverage an off-the-shelf depth estimator as pseudo ground truth during Gaussian Splatting optimization. By establishing a differentiable connection between the explicit Gaussians and the implicit SDF, our approach enables high-quality surface reconstruction and rendering. Experimental results on several real-world datasets demonstrate that G2SDF achieves superior reconstruction quality than prior works while maintaining the efficiency of 3DGS.\n\n最先进的新视角合成方法，如 3D 高斯投影（3D Gaussian Splatting, 3DGS），在视觉质量上表现出色。虽然 3DGS 及其变体能够通过光栅化高效渲染，但许多任务需要访问底层 3D 表面，而由于其稀疏且显式的表示形式，这一问题仍然具有挑战性。\n本文提出了一种新方法 G2SDF，通过将神经隐式有符号距离场（Signed Distance Field, SDF）集成到高斯投影框架中，解决了这一限制。我们的方法将高斯的不透明度值与其到表面的距离关联起来，确保高斯与场景表面更加紧密地对齐。为了扩展到不同尺度的无限场景，我们提出了一种归一化函数，将任意范围映射到固定区间。此外，为了进一步提高重建质量，我们在高斯投影优化过程中利用现成的深度估计器作为伪地面实况。\n通过在显式高斯与隐式 SDF 之间建立可微分连接，G2SDF 实现了高质量的表面重建和渲染。在多个真实世界数据集上的实验结果表明，G2SDF 在保持 3DGS 高效性的同时，比现有方法实现了更优越的重建质量。\n"
  },
  {
    "path": "abs/2411.17044.md",
    "content": "### 4D Scaffold Gaussian Splatting for Memory Efficient Dynamic Scene Reconstruction\n\nExisting 4D Gaussian methods for dynamic scene reconstruction offer high visual fidelity and fast rendering. However, these methods suffer from excessive memory and storage demands, which limits their practical deployment. This paper proposes a 4D anchor-based framework that retains visual quality and rendering speed of 4D Gaussians while significantly reducing storage costs. Our method extends 3D scaffolding to 4D space, and leverages sparse 4D grid-aligned anchors with compressed feature vectors. Each anchor models a set of neural 4D Gaussians, each of which represent a local spatiotemporal region. In addition, we introduce a temporal coverage-aware anchor growing strategy to effectively assign additional anchors to under-reconstructed dynamic regions. Our method adjusts the accumulated gradients based on Gaussians' temporal coverage, improving reconstruction quality in dynamic regions. To reduce the number of anchors, we further present enhanced formulations of neural 4D Gaussians. These include the neural velocity, and the temporal opacity derived from a generalized Gaussian distribution. Experimental results demonstrate that our method achieves state-of-the-art visual quality and 97.8% storage reduction over 4DGS.\n\n现有的 4D 高斯方法在动态场景重建中能够提供高视觉保真度和快速渲染，但存在内存和存储需求过高的问题，限制了其实际部署。本文提出了一种基于 4D 锚点的框架，在保留 4D 高斯方法视觉质量和渲染速度的同时，显著降低存储成本。\n我们的方法将 3D 框架扩展到 4D 空间，并利用稀疏的 4D 网格对齐锚点和压缩特征向量。每个锚点建模了一组神经 4D 高斯，这些高斯分别表示局部时空区域。此外，我们引入了一个 时间覆盖感知的锚点增长策略，通过为未充分重建的动态区域分配额外的锚点来提高重建质量。我们的方法基于高斯的时间覆盖调整累积梯度，从而提升动态区域的重建效果。\n为减少锚点数量，我们进一步提出了增强版神经 4D 高斯的公式，包括神经速度和从广义高斯分布中导出的时间不透明度。实验结果表明，我们的方法在保持最先进视觉质量的同时，将存储需求降低了 97.8%，相较于 4DGS 实现了显著改进。\n"
  },
  {
    "path": "abs/2411.17067.md",
    "content": "### Geometry Field Splatting with Gaussian Surfels\n\nGeometric reconstruction of opaque surfaces from images is a longstanding challenge in computer vision, with renewed interest from volumetric view synthesis algorithms using radiance fields. We leverage the geometry field proposed in recent work for stochastic opaque surfaces, which can then be converted to volume densities. We adapt Gaussian kernels or surfels to splat the geometry field rather than the volume, enabling precise reconstruction of opaque solids. Our first contribution is to derive an efficient and almost exact differentiable rendering algorithm for geometry fields parameterized by Gaussian surfels, while removing current approximations involving Taylor series and no self-attenuation. Next, we address the discontinuous loss landscape when surfels cluster near geometry, showing how to guarantee that the rendered color is a continuous function of the colors of the kernels, irrespective of ordering. Finally, we use latent representations with spherical harmonics encoded reflection vectors rather than spherical harmonics encoded colors to better address specular surfaces. We demonstrate significant improvement in the quality of reconstructed 3D surfaces on widely-used datasets.\n\n从图像中进行不透明表面的几何重建是计算机视觉中的一个长期挑战，近年来，基于辐射场的体视角合成算法重新激发了对这一问题的兴趣。我们利用最近研究中提出的用于随机不透明表面的几何场（geometry field），将其转换为体密度，并适配高斯核或表面元（surfels）对几何场进行投影（splatting），从而实现对不透明固体的精确重建。我们的主要贡献包括以下三点：1. 高效且近乎精确的可微渲染算法：针对使用高斯表面元参数化的几何场，我们推导出一种高效且近乎精确的可微渲染算法，避免了当前方法中涉及的泰勒级数近似以及自衰减问题。2. 解决不连续的损失梯度问题：在表面元聚集于几何附近时，损失函数可能表现出不连续性。我们提出了一种方法，保证渲染颜色是内核颜色的连续函数，无论其排序如何，从而确保优化的稳定性。3. 改进镜面反射的建模：我们使用包含球谐反射向量的潜在表示替代传统的球谐颜色编码，以更好地处理镜面表面。在广泛使用的数据集上的实验表明，我们的方法显著提高了重建 3D 表面的质量，为几何重建任务提供了重要进展。\n"
  },
  {
    "path": "abs/2411.17190.md",
    "content": "### SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting\n\nWe propose SelfSplat, a novel 3D Gaussian Splatting model designed to perform pose-free and 3D prior-free generalizable 3D reconstruction from unposed multi-view images. These settings are inherently ill-posed due to the lack of ground-truth data, learned geometric information, and the need to achieve accurate 3D reconstruction without finetuning, making it difficult for conventional methods to achieve high-quality results. Our model addresses these challenges by effectively integrating explicit 3D representations with self-supervised depth and pose estimation techniques, resulting in reciprocal improvements in both pose accuracy and 3D reconstruction quality. Furthermore, we incorporate a matching-aware pose estimation network and a depth refinement module to enhance geometry consistency across views, ensuring more accurate and stable 3D reconstructions. To present the performance of our method, we evaluated it on large-scale real-world datasets, including RealEstate10K, ACID, and DL3DV. SelfSplat achieves superior results over previous state-of-the-art methods in both appearance and geometry quality, also demonstrates strong cross-dataset generalization capabilities. Extensive ablation studies and analysis also validate the effectiveness of our proposed methods. Code and pretrained models are available at this https URL\n\n我们提出了 SelfSplat，一种新颖的 3D 高斯投影模型，旨在从未配准的多视角图像中进行无位姿和无 3D 先验的可泛化 3D 重建。这种设置由于缺乏真实数据、已学习的几何信息以及无需微调情况下实现准确 3D 重建的需求，天生具有病态性，使得传统方法难以获得高质量的结果。\n我们的模型通过有效整合显式 3D 表示与自监督的深度和位姿估计技术，解决了这些挑战，从而在位姿精度和 3D 重建质量上实现了相互促进的改进。此外，我们引入了匹配感知的位姿估计网络和深度优化模块，以增强跨视角的几何一致性，从而确保更准确且更稳定的 3D 重建。\n为了展示我们方法的性能，我们在大规模真实数据集（包括 RealEstate10K、ACID 和 DL3DV）上进行了评估。实验结果表明，SelfSplat 在外观和几何质量方面均优于之前的最先进方法，同时展现了强大的跨数据集泛化能力。广泛的消融研究和分析进一步验证了我们方法的有效性。\n"
  },
  {
    "path": "abs/2411.17605.md",
    "content": "### Distractor-free Generalizable 3D Gaussian Splatting\n\nWe present DGGS, a novel framework addressing the previously unexplored challenge of Distractor-free Generalizable 3D Gaussian Splatting (3DGS). It accomplishes two key objectives: fortifying generalizable 3DGS against distractor-laden data during both training and inference phases, while successfully extending cross-scene adaptation capabilities to conventional distractor-free approaches. To achieve these objectives, DGGS introduces a scene-agnostic reference-based mask prediction and refinement methodology during training phase, coupled with a training view selection strategy, effectively improving distractor prediction accuracy and training stability. Moreover, to address distractor-induced voids and artifacts during inference stage, we propose a two-stage inference framework for better reference selection based on the predicted distractor masks, complemented by a distractor pruning module to eliminate residual distractor effects. Extensive generalization experiments demonstrate DGGS's advantages under distractor-laden conditions. Additionally, experimental results show that our scene-agnostic mask inference achieves accuracy comparable to scene-specific trained methods. Homepage is \\url{this https URL}.\n\n我们提出了 DGGS，一个针对以往未探索的无干扰可泛化 3D 高斯投影（3D Gaussian Splatting, 3DGS）挑战的新框架。DGGS 实现了两个主要目标：在训练和推理阶段增强可泛化 3DGS 对含干扰数据的鲁棒性，同时将跨场景适应能力扩展到传统无干扰方法。\n为实现这些目标，DGGS 在训练阶段引入了与场景无关的参考掩码预测与优化方法，结合训练视角选择策略，有效提高了干扰预测的准确性并增强了训练稳定性。此外，为解决推理阶段干扰引起的空洞和伪影问题，我们提出了一种两阶段推理框架，通过基于预测干扰掩码的参考选择改进推理效果，同时结合干扰修剪模块消除残余干扰影响。\n广泛的泛化实验表明，DGGS 在含干扰条件下展现了显著优势。此外，实验结果显示，我们的场景无关掩码推理方法在准确性上可媲美场景特定训练方法。\n"
  },
  {
    "path": "abs/2411.17660.md",
    "content": "### DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting\n\nRecent progress in scene synthesis makes standalone SLAM systems purely based on optimizing hyperprimitives with a Rendering objective possible.\nHowever, the tracking performance still lacks behind traditional and end-to-end SLAM systems.\nAn optimal trade-off between robustness, speed and accuracy has not yet been reached, especially for monocular video.\nIn this paper, we introduce a SLAM system based on an end-to-end Tracker and extend it with a Renderer based on recent 3D Gaussian Splatting techniques.\nOur framework DroidSplat achieves both SotA tracking and rendering results on common SLAM benchmarks.\nWe implemented multiple building blocks of modern SLAM systems to run in parallel, allowing for fast inference on common consumer GPU's.\nRecent progress in monocular depth prediction and camera calibration allows our system to achieve strong results even on in-the-wild data without known camera intrinsics.\n\n近期在场景合成方面的进展使得基于渲染目标优化超基元的独立 SLAM 系统成为可能。然而，其跟踪性能仍落后于传统和端到端的 SLAM 系统。尤其是在单目视频中，鲁棒性、速度和精度之间的最佳平衡尚未实现。\n本文提出了一种基于端到端跟踪器的 SLAM 系统，并结合了基于最新 3D 高斯投影（3D Gaussian Splatting）技术的渲染器。我们的框架 DroidSplat 在常见 SLAM 基准测试中同时实现了最先进（SotA）的跟踪和渲染性能。\n我们实现了多个现代 SLAM 系统的关键模块，使其能够并行运行，从而在常见消费级 GPU 上实现快速推理。得益于单目深度预测和相机校准领域的最新进展，我们的系统即使在未知相机内参的自然场景数据上也能获得优异的结果。\n"
  },
  {
    "path": "abs/2411.17982.md",
    "content": "### HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction\n\nWe present HI-SLAM2, a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. Existing Neural SLAM or 3DGS-based SLAM methods often trade off between rendering quality and geometry accuracy, our research demonstrates that both can be achieved simultaneously with RGB input alone. The key idea of our approach is to enhance the ability for geometry estimation by combining easy-to-obtain monocular priors with learning-based dense SLAM, and then using 3D Gaussian splatting as our core map representation to efficiently model the scene. Upon loop closure, our method ensures on-the-fly global consistency through efficient pose graph bundle adjustment and instant map updates by explicitly deforming the 3D Gaussian units based on anchored keyframe updates. Furthermore, we introduce a grid-based scale alignment strategy to maintain improved scale consistency in prior depths for finer depth details. Through extensive experiments on Replica, ScanNet, and ScanNet++, we demonstrate significant improvements over existing Neural SLAM methods and even surpass RGB-D-based methods in both reconstruction and rendering quality.\n\n我们提出了 HI-SLAM2，一种几何感知的高斯 SLAM 系统，仅使用 RGB 输入即可实现快速且精确的单目场景重建。现有的神经 SLAM 或基于 3D 高斯投影（3DGS）的 SLAM 方法通常在渲染质量和几何精度之间存在权衡。我们的研究表明，仅依靠 RGB 输入即可同时实现高质量的渲染和精确的几何重建。\n我们方法的核心思想是通过结合易获取的单目先验和基于学习的稠密 SLAM，提高几何估计能力，并使用 3D 高斯投影作为核心地图表示，以高效建模场景。在闭环（loop closure）阶段，我们通过高效的位姿图捆绑调整（pose graph bundle adjustment）和基于锚定关键帧更新的 3D 高斯单元变形，确保实时全局一致性和即时地图更新。此外，我们引入了一种基于网格的尺度对齐策略，以改进先验深度的尺度一致性，从而获取更精细的深度细节。\n通过在 Replica、ScanNet 和 ScanNet++ 数据集上的广泛实验，我们的方法在重建和渲染质量上显著优于现有的神经 SLAM 方法，甚至在某些指标上超越了基于 RGB-D 的方法。1\n\n\n"
  },
  {
    "path": "abs/2411.18066.md",
    "content": "### GLS: Geometry-aware 3D Language Gaussian Splatting\n\nRecently, 3D Gaussian Splatting (3DGS) has achieved significant performance on indoor surface reconstruction and open-vocabulary segmentation. This paper presents GLS, a unified framework of surface reconstruction and open-vocabulary segmentation based on 3DGS. GLS extends two fields by exploring the correlation between them. For indoor surface reconstruction, we introduce surface normal prior as a geometric cue to guide the rendered normal, and use the normal error to optimize the rendered depth. For open-vocabulary segmentation, we employ 2D CLIP features to guide instance features and utilize DEVA masks to enhance their view consistency. Extensive experiments demonstrate the effectiveness of jointly optimizing surface reconstruction and open-vocabulary segmentation, where GLS surpasses state-of-the-art approaches of each task on MuSHRoom, ScanNet++, and LERF-OVS datasets. Code will be available at this https URL.\n\n近年来，3D 高斯投影（3D Gaussian Splatting, 3DGS）在室内表面重建和开放词汇分割任务中取得了显著的性能进展。本文提出了 GLS，一个基于 3DGS 的统一框架，用于表面重建和开放词汇分割。GLS 通过探索两者之间的关联，扩展了这两个领域。\n在室内表面重建方面，我们引入了表面法线先验作为几何线索来引导渲染的法线，并通过法线误差优化渲染的深度。在开放词汇分割方面，我们采用 2D CLIP 特征来引导实例特征，并利用 DEVA 掩码增强视图一致性。\n大量实验表明，同时优化表面重建和开放词汇分割的有效性。在 MuSHRoom、ScanNet++ 和 LERF-OVS 数据集上，GLS 在每项任务中均超越了当前最先进的方法，展示了卓越的性能和通用性。\n"
  },
  {
    "path": "abs/2411.18072.md",
    "content": "### SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images\n\nSparse Multi-view Images can be Learned to predict explicit radiance fields via Generalizable Gaussian Splatting approaches, which can achieve wider application prospects in real-life when ground-truth camera parameters are not required as inputs. In this paper, a novel generalizable Gaussian Splatting method, SmileSplat, is proposed to reconstruct pixel-aligned Gaussian surfels for diverse scenarios only requiring unconstrained sparse multi-view images. First, Gaussian surfels are predicted based on the multi-head Gaussian regression decoder, which can are represented with less degree-of-freedom but have better multi-view consistency. Furthermore, the normal vectors of Gaussian surfel are enhanced based on high-quality of normal priors. Second, the Gaussians and camera parameters (both extrinsic and intrinsic) are optimized to obtain high-quality Gaussian radiance fields for novel view synthesis tasks based on the proposed Bundle-Adjusting Gaussian Splatting module. Extensive experiments on novel view rendering and depth map prediction tasks are conducted on public datasets, demonstrating that the proposed method achieves state-of-the-art performance in various 3D vision tasks.\n\n稀疏多视角图像可以通过可泛化的高斯投影方法预测显式辐射场，从而在不需要真实相机参数作为输入的情况下实现更广泛的现实应用。本文提出了一种新颖的可泛化高斯投影方法 SmileSplat，仅依赖无约束的稀疏多视角图像即可在多样化场景中重建像素对齐的高斯表面元（surfels）。\n首先，SmileSplat 基于多头高斯回归解码器预测高斯表面元，具有较少的自由度表示，同时实现更好的多视角一致性。此外，通过高质量的法线先验增强了高斯表面元的法线向量精度。其次，我们设计了一个 Bundle-Adjusting Gaussian Splatting 模块，优化高斯和相机参数（包括外参和内参），以生成高质量的高斯辐射场用于新视角合成任务。\n在公共数据集上的新视角渲染和深度图预测任务的广泛实验表明，所提出的方法在多种 3D 视觉任务中实现了最先进的性能，展示了其在真实应用中的潜力和高效性。\n\n\n"
  },
  {
    "path": "abs/2411.18197.md",
    "content": "### Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters\n\n3D characters are essential to modern creative industries, but making them animatable often demands extensive manual work in tasks like rigging and skinning. Existing automatic rigging tools face several limitations, including the necessity for manual annotations, rigid skeleton topologies, and limited generalization across diverse shapes and poses. An alternative approach is to generate animatable avatars pre-bound to a rigged template mesh. However, this method often lacks flexibility and is typically limited to realistic human shapes. To address these issues, we present Make-It-Animatable, a novel data-driven method to make any 3D humanoid model ready for character animation in less than one second, regardless of its shapes and poses. Our unified framework generates high-quality blend weights, bones, and pose transformations. By incorporating a particle-based shape autoencoder, our approach supports various 3D representations, including meshes and 3D Gaussian splats. Additionally, we employ a coarse-to-fine representation and a structure-aware modeling strategy to ensure both accuracy and robustness, even for characters with non-standard skeleton structures. We conducted extensive experiments to validate our framework's effectiveness. Compared to existing methods, our approach demonstrates significant improvements in both quality and speed.\n\n3D 角色是现代创意产业的重要组成部分，但使其具有可动画性通常需要大量的手动工作，例如绑定骨架（rigging）和蒙皮（skinning）。现有的自动绑定工具存在多种局限性，包括需要手动标注、骨架拓扑结构固定，以及在多样化形状和姿势上的泛化能力有限。另一种替代方法是生成预绑定到骨架模板网格的可动画化身，但这种方法通常缺乏灵活性，并且通常仅限于逼真的人类形状。\n为了解决这些问题，我们提出了 Make-It-Animatable，一种新颖的数据驱动方法，能够在不到一秒的时间内使任何 3D 人形模型准备好用于角色动画，而不受其形状和姿势的限制。我们的统一框架能够生成高质量的混合权重（blend weights）、骨骼以及姿势变换。通过结合基于粒子的形状自动编码器（particle-based shape autoencoder），该方法支持多种 3D 表示形式，包括网格和 3D 高斯投影（Gaussian splats）。\n此外，我们采用了粗到细的表示方法和结构感知建模策略，确保即使对于具有非标准骨架结构的角色，也能实现准确性和鲁棒性。我们进行了广泛的实验验证了该框架的有效性。与现有方法相比，我们的方法在质量和速度上均表现出显著提升，为多样化的 3D 角色动画制作提供了高效的解决方案。\n"
  },
  {
    "path": "abs/2411.18311.md",
    "content": "### Neural Surface Priors for Editable Gaussian Splatting\n\nIn computer graphics, there is a need to recover easily modifiable representations of 3D geometry and appearance from image data. We introduce a novel method for this task using 3D Gaussian Splatting, which enables intuitive scene editing through mesh adjustments. Starting with input images and camera poses, we reconstruct the underlying geometry using a neural Signed Distance Field and extract a high-quality mesh. Our model then estimates a set of Gaussians, where each component is flat, and the opacity is conditioned on the recovered neural surface. To facilitate editing, we produce a proxy representation that encodes information about the Gaussians' shape and position. Unlike other methods, our pipeline allows modifications applied to the extracted mesh to be propagated to the proxy representation, from which we recover the updated parameters of the Gaussians. This effectively transfers the mesh edits back to the recovered appearance representation. By leveraging mesh-guided transformations, our approach simplifies 3D scene editing and offers improvements over existing methods in terms of usability and visual fidelity of edits. The complete source code for this project can be accessed at \\url{this https URL}\n\n在计算机图形学中，从图像数据中恢复易于修改的 3D 几何和外观表示是一项重要需求。我们提出了一种新方法，利用 3D 高斯投影（3D Gaussian Splatting） 实现这一任务，从而通过网格调整实现直观的场景编辑。以输入图像和相机位姿为起点，我们通过神经有符号距离场（Signed Distance Field, SDF）重建底层几何并提取高质量网格。\n我们的模型随后估计一组高斯分布，其中每个高斯组件是平坦的，其不透明度由恢复的神经表面决定。为简化编辑，我们生成了一种代理表示，编码了高斯的形状和位置信息。与其他方法不同，我们的流程允许对提取网格的修改自动传播到代理表示中，并由此恢复更新后的高斯参数。这一机制将网格编辑有效地传递回恢复的外观表示。\n通过利用基于网格的变换，我们的方法简化了 3D 场景编辑，并在编辑的可用性和视觉保真度方面优于现有方法。这种方式为用户提供了更直观和灵活的 3D 编辑体验，同时保证了高质量的渲染效果。\n"
  },
  {
    "path": "abs/2411.18473.md",
    "content": "### HEMGS: A Hybrid Entropy Model for 3D Gaussian Splatting Data Compression\nFast progress in 3D Gaussian Splatting (3DGS) has made 3D Gaussians popular for 3D modeling and image rendering, but this creates big challenges in data storage and transmission. To obtain a highly compact 3DGS representation, we propose a hybrid entropy model for Gaussian Splatting (HEMGS) data compression, which comprises two primary components, a hyperprior network and an autoregressive network. To effectively reduce structural redundancy across attributes, we apply a progressive coding algorithm to generate hyperprior features, in which we use previously compressed attributes and location as prior information. In particular, to better extract the location features from these compressed attributes, we adopt a domain-aware and instance-aware architecture to respectively capture domain-aware structural relations without additional storage costs and reveal scene-specific features through MLPs. Additionally, to reduce redundancy within each attribute, we leverage relationships between neighboring compressed elements within the attributes through an autoregressive network. Given its unique structure, we propose an adaptive context coding algorithm with flexible receptive fields to effectively capture adjacent compressed elements. Overall, we integrate our HEMGS into an end-to-end optimized 3DGS compression framework and the extensive experimental results on four benchmarks indicate that our method achieves about 40\\% average reduction in size while maintaining the rendering quality over our baseline method and achieving state-of-the-art compression results.\n\n3D 高斯投影（3D Gaussian Splatting, 3DGS）的快速进展使 3D 高斯在 3D 建模和图像渲染中备受欢迎，但也带来了数据存储和传输方面的巨大挑战。为实现高度紧凑的 3DGS 表示，我们提出了一种用于高斯投影数据压缩的混合熵模型（Hybrid Entropy Model for Gaussian Splatting, HEMGS），该模型由两个主要组件组成：一个超先验网络（hyperprior network）和一个自回归网络（autoregressive network）。\n为了有效减少属性间的结构冗余，我们采用了一种渐进编码算法生成超先验特征，利用先前压缩的属性和位置作为先验信息。特别是，为更好地从这些压缩属性中提取位置特征，我们采用了一种域感知和实例感知的架构，通过域感知结构关系捕捉实现无额外存储开销的结构化关系，同时通过多层感知机（MLPs）揭示场景特定特征。\n此外，为减少单个属性内的冗余，我们通过自回归网络利用相邻压缩元素之间的关系。基于这一独特结构，我们提出了一种具有灵活感受野的自适应上下文编码算法，有效捕捉相邻压缩元素。\n总体而言，我们将 HEMGS 集成到端到端优化的 3DGS 压缩框架中。通过在四个基准数据集上的广泛实验结果表明，与基线方法相比，我们的方法在保持渲染质量的同时，平均将数据大小减少了约 40%，并实现了最先进的压缩效果。\n"
  },
  {
    "path": "abs/2411.18548.md",
    "content": "### PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image\n\nWe present PhyCAGE, the first approach for physically plausible compositional 3D asset generation from a single image. Given an input image, we first generate consistent multi-view images for components of the assets. These images are then fitted with 3D Gaussian Splatting representations. To ensure that the Gaussians representing objects are physically compatible with each other, we introduce a Physical Simulation-Enhanced Score Distillation Sampling (PSE-SDS) technique to further optimize the positions of the Gaussians. It is achieved by setting the gradient of the SDS loss as the initial velocity of the physical simulation, allowing the simulator to act as a physics-guided optimizer that progressively corrects the Gaussians' positions to a physically compatible state. Experimental results demonstrate that the proposed method can generate physically plausible compositional 3D assets given a single image.\n\n我们提出了 PhyCAGE，这是第一个从单张图像生成物理合理的可组合 3D 资产的方法。对于输入图像，我们首先为资产的各个组件生成一致的多视角图像。这些图像随后通过 3D 高斯投影（Gaussian Splatting）表示进行拟合。\n为了确保表示对象的高斯与彼此之间在物理上相兼容，我们引入了一种 物理模拟增强的得分蒸馏采样（Physical Simulation-Enhanced Score Distillation Sampling, PSE-SDS） 技术，用于进一步优化高斯的位置。具体而言，我们将 SDS 损失的梯度设置为物理模拟的初始速度，使模拟器能够作为一个物理引导的优化器，逐步校正高斯的位置以达到物理兼容状态。\n实验结果表明，所提出的方法能够基于单张图像生成物理合理的可组合 3D 资产，为单视角下的 3D 场景生成提供了一种新颖且有效的解决方案。\n"
  },
  {
    "path": "abs/2411.18625.md",
    "content": "### Textured Gaussians for Enhanced 3D Scene Appearance Modeling\n\n3D Gaussian Splatting (3DGS) has recently emerged as a state-of-the-art 3D reconstruction and rendering technique due to its high-quality results and fast training and rendering time. However, pixels covered by the same Gaussian are always shaded in the same color up to a Gaussian falloff scaling factor. Furthermore, the finest geometric detail any individual Gaussian can represent is a simple ellipsoid. These properties of 3DGS greatly limit the expressivity of individual Gaussian primitives. To address these issues, we draw inspiration from texture and alpha mapping in traditional graphics and integrate it with 3DGS. Specifically, we propose a new generalized Gaussian appearance representation that augments each Gaussian with alpha~(A), RGB, or RGBA texture maps to model spatially varying color and opacity across the extent of each Gaussian. As such, each Gaussian can represent a richer set of texture patterns and geometric structures, instead of just a single color and ellipsoid as in naive Gaussian Splatting. Surprisingly, we found that the expressivity of Gaussians can be greatly improved by using alpha-only texture maps, and further augmenting Gaussians with RGB texture maps achieves the highest expressivity. We validate our method on a wide variety of standard benchmark datasets and our own custom captures at both the object and scene levels. We demonstrate image quality improvements over existing methods while using a similar or lower number of Gaussians.\n\n3D 高斯投影（3D Gaussian Splatting, 3DGS）因其高质量的结果以及快速的训练和渲染时间，近年来成为最先进的 3D 重建和渲染技术。然而，每个高斯覆盖的像素始终以相同的颜色着色，仅受到高斯衰减缩放因子的影响。此外，每个高斯能够表示的最精细几何细节仅限于简单的椭球体。这些特性极大地限制了单个高斯基元的表现能力。\n为了解决这些问题，我们从传统图形学中的纹理和透明度映射中汲取灵感，将其与 3DGS 相结合。具体而言，我们提出了一种新的广义高斯外观表示方法，为每个高斯添加透明度（Alpha, A）、RGB 或 RGBA 纹理映射，从而能够在每个高斯范围内建模空间变化的颜色和不透明度。这使得每个高斯不仅可以表示单一颜色和椭球体，还能够表现更加丰富的纹理模式和几何结构。\n令人惊讶的是，我们发现仅使用透明度纹理映射（alpha-only texture maps）即可显著提升高斯的表现力，而进一步为高斯增加 RGB 纹理映射可实现最高的表现力。我们在多种标准基准数据集以及自定义捕获的数据上对方法进行了验证，涵盖对象和场景级别的测试。实验结果表明，与现有方法相比，我们的方法在使用相似或更少数量高斯的情况下，显著提升了图像质量。\n"
  },
  {
    "path": "abs/2411.18667.md",
    "content": "### Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting\n\nPre-training on large-scale unlabeled datasets contribute to the model achieving powerful performance on 3D vision tasks, especially when annotations are limited. However, existing rendering-based self-supervised frameworks are computationally demanding and memory-intensive during pre-training due to the inherent nature of volume rendering. In this paper, we propose an efficient framework named GS3 to learn point cloud representation, which seamlessly integrates fast 3D Gaussian Splatting into the rendering-based framework. The core idea behind our framework is to pre-train the point cloud encoder by comparing rendered RGB images with real RGB images, as only Gaussian points enriched with learned rich geometric and appearance information can produce high-quality renderings. Specifically, we back-project the input RGB-D images into 3D space and use a point cloud encoder to extract point-wise features. Then, we predict 3D Gaussian points of the scene from the learned point cloud features and uses a tile-based rasterizer for image rendering. Finally, the pre-trained point cloud encoder can be fine-tuned to adapt to various downstream 3D tasks, including high-level perception tasks such as 3D segmentation and detection, as well as low-level tasks such as 3D scene reconstruction. Extensive experiments on downstream tasks demonstrate the strong transferability of the pre-trained point cloud encoder and the effectiveness of our self-supervised learning framework. In addition, our GS3 framework is highly efficient, achieving approximately 9× pre-training speedup and less than 0.25× memory cost compared to the previous rendering-based framework Ponder.\n\n大规模无标注数据集的预训练有助于模型在 3D 视觉任务中实现强大的性能，尤其是在标注有限的情况下。然而，现有基于渲染的自监督框架由于体渲染的固有特性，在预训练过程中通常计算开销大且内存占用高。本文提出了一种高效框架，名为 GS3，用于学习点云表示，该框架将快速的 3D 高斯点绘制（Gaussian Splatting）无缝集成到基于渲染的框架中。\n该框架的核心思想是通过比较渲染的 RGB 图像和真实 RGB 图像，来预训练点云编码器，因为只有富含几何和外观信息的高斯点才能生成高质量的渲染结果。具体来说，我们将输入的 RGB-D 图像反投影到 3D 空间中，并使用点云编码器提取逐点特征。随后，从学习到的点云特征中预测场景的 3D 高斯点，并使用基于网格的光栅化器进行图像渲染。最后，预训练的点云编码器可以被微调，用于适配各种下游 3D 任务，包括高层次感知任务（如 3D 分割和检测）以及低层次任务（如 3D 场景重建）。\n在下游任务上的大量实验表明，预训练点云编码器具有很强的迁移能力，而我们的自监督学习框架也非常高效。此外，GS3 框架在预训练速度上实现了约 9 倍加速，内存成本仅为之前基于渲染的框架 Ponder 的 0.25 倍以下。\n"
  },
  {
    "path": "abs/2411.18675.md",
    "content": "### GaussianSpeech: Audio-Driven Gaussian Avatars\n\nWe introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple speech signal with 3D Gaussian splatting to create realistic, temporally coherent motion sequences. We propose a compact and efficient 3DGS-based avatar representation that generates expression-dependent color and leverages wrinkle- and perceptually-based losses to synthesize facial details, including wrinkles that occur with different expressions. To enable sequence modeling of 3D Gaussian splats with audio, we devise an audio-conditioned transformer model capable of extracting lip and expression features directly from audio input. Due to the absence of high-quality datasets of talking humans in correspondence with audio, we captured a new large-scale multi-view dataset of audio-visual sequences of talking humans with native English accents and diverse facial geometry. GaussianSpeech consistently achieves state-of-the-art performance with visually natural motion at real time rendering rates, while encompassing diverse facial expressions and styles.\n\n我们提出了一种名为 GaussianSpeech 的新方法，可根据语音音频生成高保真动画序列，生成的 3D 逼真个性化人头头像具备高度写实效果。为了捕捉人头的丰富细节和表现力，包括皮肤皱褶和细微的面部动作，我们将语音信号与 3D 高斯点绘制（Gaussian Splatting）相结合，以生成逼真且时间上连贯的动态序列。\n我们设计了一种紧凑且高效的基于 3DGS（3D Gaussian Splatting）的头像表示方法，该方法生成与表情相关的颜色，并通过基于皱纹和感知的损失函数来合成面部细节，包括随不同表情变化的皱纹。为实现音频驱动的 3D 高斯点序列建模，我们开发了一种音频条件变换器模型（audio-conditioned transformer），能够直接从音频输入中提取唇部和表情特征。\n由于缺乏与语音对应的高质量人脸说话数据集，我们采集了一个全新的大规模多视角音视频序列数据集，该数据集包含以英语为母语的人物说话场景，具有多样的面部几何特征。GaussianSpeech 在真实时间渲染速率下，一贯实现了视觉自然的运动效果，支持多样的面部表情和风格，并在性能上达到了当前最先进水平。\n"
  },
  {
    "path": "abs/2411.18866.md",
    "content": "### RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning\n\nGiven a single image of a target object, image-to-3D generation aims to reconstruct its texture and geometric shape. Recent methods often utilize intermediate media, such as multi-view images or videos, to bridge the gap between input image and the 3D target, thereby guiding the generation of both shape and texture. However, inconsistencies in the generated multi-view snapshots frequently introduce noise and artifacts along object boundaries, undermining the 3D reconstruction process. To address this challenge, we leverage 3D Gaussian Splatting (3DGS) for 3D reconstruction, and explicitly integrate uncertainty-aware learning into the reconstruction process. By capturing the stochasticity between two Gaussian models, we estimate an uncertainty map, which is subsequently used for uncertainty-aware regularization to rectify the impact of inconsistencies. Specifically, we optimize both Gaussian models simultaneously, calculating the uncertainty map by evaluating the discrepancies between rendered images from identical viewpoints. Based on the uncertainty map, we apply adaptive pixel-wise loss weighting to regularize the models, reducing reconstruction intensity in high-uncertainty regions. This approach dynamically detects and mitigates conflicts in multi-view labels, leading to smoother results and effectively reducing artifacts. Extensive experiments show the effectiveness of our method in improving 3D generation quality by reducing inconsistencies and artifacts.\n\n针对目标物体的单张图像，图像到 3D 的生成旨在重建其纹理和几何形状。近年来的方法通常利用多视图图像或视频等中间媒介，以弥合输入图像与 3D 目标之间的差距，从而指导形状和纹理的生成。然而，生成的多视图快照中的不一致性往往会在物体边界处引入噪声和伪影，削弱 3D 重建的效果。\n为了解决这一问题，我们在 3D 重建中引入了 3D Gaussian Splatting (3DGS)，并明确地将不确定性感知学习（uncertainty-aware learning）集成到重建过程中。通过捕捉两个高斯模型之间的随机性，我们估计出一个不确定性映射（uncertainty map），并基于此进行不确定性感知的正则化，以校正不一致性的影响。具体而言，我们同时优化两个高斯模型，通过评估从相同视点渲染的图像之间的差异来计算不确定性映射。基于该映射，我们采用自适应逐像素损失加权来正则化模型，在高不确定性区域减少重建强度。\n这种方法能够动态检测并缓解多视图标注中的冲突，从而实现更平滑的结果，并有效减少伪影。大量实验表明，我们的方法在减少不一致性和伪影的同时，显著提升了 3D 生成的质量。\n"
  },
  {
    "path": "abs/2411.18966.md",
    "content": "### SuperGaussians: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors\n\nGaussian Splattings demonstrate impressive results in multi-view reconstruction based on Gaussian explicit representations. However, the current Gaussian primitives only have a single view-dependent color and an opacity to represent the appearance and geometry of the scene, resulting in a non-compact representation. In this paper, we introduce a new method called SuperGaussians that utilizes spatially varying colors and opacity in a single Gaussian primitive to improve its representation ability. We have implemented bilinear interpolation, movable kernels, and even tiny neural networks as spatially varying functions. Quantitative and qualitative experimental results demonstrate that all three functions outperform the baseline, with the best movable kernels achieving superior novel view synthesis performance on multiple datasets, highlighting the strong potential of spatially varying functions.\n\n高斯点云表示（Gaussian Splattings） 在基于高斯显式表示的多视图重建中表现出色。然而，目前的高斯原语仅通过单一的视图依赖颜色和不透明度表示场景的外观和几何结构，导致表示能力不足且不紧凑。\n本文提出了一种新方法 SuperGaussians，通过在单个高斯原语中引入空间变化颜色和不透明度，提升其表示能力。我们实现了三种空间变化函数：双线性插值、可移动核和微型神经网络。实验表明，这些方法均优于传统基线，尤其是可移动核在多个数据集的新视图合成任务中表现最优，展现了卓越性能。\n实验验证了空间变化函数的潜力，不仅提升了表示效率，还增强了新视图合成的效果，为高斯点云的多视图重建提供了更紧凑高效的解决方案。\n"
  },
  {
    "path": "abs/2411.19233.md",
    "content": "### Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes\n\nState-of-the-art novel view synthesis methods achieve impressive results for multi-view captures of static 3D scenes. However, the reconstructed scenes still lack \"liveliness,\" a key component for creating engaging 3D experiences. Recently, novel video diffusion models generate realistic videos with complex motion and enable animations of 2D images, however they cannot naively be used to animate 3D scenes as they lack multi-view consistency. To breathe life into the static world, we propose Gaussians2Life, a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation. Our key idea is to leverage powerful video diffusion models as the generative component of our model and to combine these with a robust technique to lift 2D videos into meaningful 3D motion. We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes and further enables the animation of a large variety of object classes, while related work is mostly focused on prior-based character animation, or single 3D objects. Our model enables the creation of consistent, immersive 3D experiences for arbitrary scenes.\n\n当前最先进的新视角合成方法在静态 3D 场景的多视图捕获方面取得了令人印象深刻的成果。然而，重建的场景仍然缺乏“生动性”，而这恰恰是创造引人入胜的 3D 体验的关键要素。最近，新的视频扩散模型可以生成具有复杂运动的逼真视频，并使 2D 图像的动画化成为可能，但由于缺乏多视图一致性，它们无法直接用于动画化 3D 场景。\n为了为静态世界注入生机，我们提出了 Gaussians2Life，一种基于高质量 3D 场景的高斯点绘制（Gaussian Splatting）表示实现局部动画化的方法。我们的方法核心在于利用强大的视频扩散模型作为生成组件，并结合一种稳健的技术，将 2D 视频提升为有意义的 3D 动态。与现有方法相比，我们发现这种方法能够对复杂的预存 3D 场景进行真实动画化，并支持多种对象类别的动画化，而相关工作主要集中于基于先验的角色动画或单一 3D 对象。\n我们的模型能够为任意场景创建一致且沉浸式的 3D 体验，从而为 3D 动画化和多视图一致性提供了新的可能性。\n"
  },
  {
    "path": "abs/2411.19235.md",
    "content": "### InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception\n\n3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality. Recently, 3D Gaussian Splatting (3DGS) has emerged as a powerful approach, combining explicit modeling with neural adaptability to provide efficient and detailed scene representations. However, three major challenges remain in leveraging 3DGS for scene understanding: 1) an imbalance between appearance and semantics, where dense Gaussian usage for fine-grained texture modeling does not align with the minimal requirements for semantic attributes; 2) inconsistencies between appearance and semantics, as purely appearance-based Gaussians often misrepresent object boundaries; and 3) reliance on top-down instance segmentation methods, which struggle with uneven category distributions, leading to over- or under-segmentation. In this work, we propose InstanceGaussian, a method that jointly learns appearance and semantic features while adaptively aggregating instances. Our contributions include: i) a novel Semantic-Scaffold-GS representation balancing appearance and semantics to improve feature representations and boundary delineation; ii) a progressive appearance-semantic joint training strategy to enhance stability and segmentation accuracy; and iii) a bottom-up, category-agnostic instance aggregation approach that addresses segmentation challenges through farthest point sampling and connected component analysis. Our approach achieves state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation, highlighting the effectiveness of the proposed representation and training strategies.\n\n3D 场景理解已成为一个重要的研究领域，广泛应用于自动驾驶、机器人和增强现实等领域。近年来，3D 高斯点云表示（3D Gaussian Splatting, 3DGS）作为一种强大的方法脱颖而出，它将显式建模与神经网络的适应性相结合，提供高效且细致的场景表示。然而，在利用 3DGS 进行场景理解时，仍然存在三大挑战：1）外观与语义之间的不平衡，细粒度纹理建模所需的高斯点云密度与语义属性的最低需求之间存在差异；2）外观与语义之间的不一致，单纯基于外观的高斯点云通常会错误地表示物体边界；以及 3）对自上而下实例分割方法的依赖，这种方法在类别分布不均时表现不佳，导致过分割或不足分割。\n为了解决这些问题，我们提出了 InstanceGaussian 方法，该方法能够联合学习外观和语义特征，同时自适应地聚合实例。我们的贡献包括：\ni）提出一种新颖的语义支架高斯点云表示（Semantic-Scaffold-GS），在外观和语义之间取得平衡，以改善特征表示和边界刻画；\nii）设计了一种渐进式外观-语义联合训练策略，以增强稳定性和分割准确性；\niii）提出一种自下而上、类别无关的实例聚合方法，利用最远点采样和连通分量分析解决分割挑战。\n我们的方法在类别无关的开放词汇 3D 点级分割任务中达到了最新的性能水平，验证了所提出表示方法和训练策略的有效性。\n"
  },
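Where the InstanceGaussian abstract names farthest point sampling and connected-component analysis as the ingredients of its bottom-up, category-agnostic instance aggregation, the following minimal sketch shows how those two pieces can combine. It is illustrative only: the radius, similarity threshold, and feature layout are assumptions, not the paper's values.

```python
# Illustrative sketch (not the authors' code): bottom-up instance aggregation
# via farthest point sampling (seed selection) and connected-component
# analysis (instance labeling), the two ingredients named in the abstract.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def farthest_point_sampling(xyz, k):
    """Greedily pick k mutually distant seed indices (candidate instance anchors)."""
    seeds = [0]
    dist = np.linalg.norm(xyz - xyz[0], axis=1)
    for _ in range(k - 1):
        seeds.append(int(dist.argmax()))
        dist = np.minimum(dist, np.linalg.norm(xyz - xyz[seeds[-1]], axis=1))
    return np.asarray(seeds)

def aggregate_instances(xyz, feat, radius=0.1, sim_thresh=0.9):
    """Connect Gaussians that are close in space AND similar in semantics,
    then read instances off as connected components of that graph."""
    feat = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    rows, cols = [], []
    for i in range(len(xyz)):
        near = np.flatnonzero(np.linalg.norm(xyz - xyz[i], axis=1) < radius)
        keep = near[feat[near] @ feat[i] > sim_thresh]
        rows.extend([i] * len(keep))
        cols.extend(keep.tolist())
    graph = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(len(xyz),) * 2)
    return connected_components(graph, directed=False)  # (num_instances, labels)

xyz = np.random.rand(500, 3)    # Gaussian centers
feat = np.random.rand(500, 16)  # per-Gaussian semantic features
anchors = farthest_point_sampling(xyz, 16)
num_instances, instance_id = aggregate_instances(xyz, feat)
print(num_instances, instance_id[anchors])
```

Connected components make the grouping category-agnostic: no classifier or object ID is involved, only spatial and semantic affinity, which is what lets the approach sidestep uneven category distributions.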
  {
    "path": "abs/2411.19271.md",
    "content": "### AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones\n\nGeometric priors are often used to enhance 3D reconstruction. With many smartphones featuring low-resolution depth sensors and the prevalence of off-the-shelf monocular geometry estimators, incorporating geometric priors as regularization signals has become common in 3D vision tasks. However, the accuracy of depth estimates from mobile devices is typically poor for highly detailed geometry, and monocular estimators often suffer from poor multi-view consistency and precision. In this work, we propose an approach for joint surface depth and normal refinement of Gaussian Splatting methods for accurate 3D reconstruction of indoor scenes. We develop supervision strategies that adaptively filters low-quality depth and normal estimates by comparing the consistency of the priors during optimization. We mitigate regularization in regions where prior estimates have high uncertainty or ambiguities. Our filtering strategy and optimization design demonstrate significant improvements in both mesh estimation and novel-view synthesis for both 3D and 2D Gaussian Splatting-based methods on challenging indoor room datasets. Furthermore, we explore the use of alternative meshing strategies for finer geometry extraction. We develop a scale-aware meshing strategy inspired by TSDF and octree-based isosurface extraction, which recovers finer details from Gaussian models compared to other commonly used open-source meshing tools.\n\n几何先验通常用于增强 3D 重建。随着许多智能手机配备低分辨率深度传感器，以及现成单目几何估计器的普及，在 3D 视觉任务中将几何先验作为正则化信号进行整合已变得十分常见。然而，移动设备生成的深度估计在高细节几何方面通常精度较低，而单目估计器则往往缺乏多视图一致性和精确度。\n在本研究中，我们提出了一种 联合表面深度和法向量细化的高斯点云方法，用于精确重建室内场景的 3D 几何。我们开发了监督策略，通过在优化过程中比较先验的一致性，自适应地过滤低质量的深度和法向量估计。在先验估计具有较高不确定性或存在歧义的区域，我们减少正则化的影响。我们的过滤策略和优化设计在 3D 和 2D 高斯点云方法基础上，对挑战性室内场景数据集的网格估计和新视图合成表现出了显著改进。\n此外，我们还探索了用于提取更精细几何的替代网格生成策略。我们开发了一种基于 TSDF（截断符号距离函数）和八叉树等值面提取 的尺度感知网格生成策略，相比其他常用的开源网格生成工具，该策略能够从高斯模型中恢复更精细的几何细节。\n"
  },
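The adaptive filtering idea in the AGS-Mesh abstract can be pictured with a small sketch: keep a depth prior only where sensor and monocular depth agree after scale alignment, and keep a normal prior only where it agrees with the normal implied by the depth itself. This is a proxy under assumed tolerances; the paper's actual consistency tests differ.

```python
# Minimal sketch (assumed tolerances, not the AGS-Mesh implementation) of
# adaptively filtering geometric priors by checking their mutual consistency.
import numpy as np

def filter_priors(sensor_depth, mono_depth, mono_normals,
                  depth_tol=0.05, normal_tol_deg=30.0):
    # Align the scale-ambiguous monocular depth to the sensor depth (median scale).
    valid = sensor_depth > 0
    scale = np.median(sensor_depth[valid] / np.maximum(mono_depth[valid], 1e-6))
    mono_aligned = mono_depth * scale

    # Depth mask: keep pixels where the two sources agree within a relative
    # tolerance; elsewhere the prior is treated as unreliable and dropped.
    rel_err = np.abs(sensor_depth - mono_aligned) / np.maximum(sensor_depth, 1e-6)
    depth_mask = valid & (rel_err < depth_tol)

    # Normal mask: keep pixels where the monocular normal agrees with the
    # normal implied by the depth gradient (a cheap consistency proxy).
    gy, gx = np.gradient(mono_aligned)
    depth_normals = np.stack([-gx, -gy, np.ones_like(mono_aligned)], axis=-1)
    depth_normals /= np.linalg.norm(depth_normals, axis=-1, keepdims=True)
    cos = np.clip((depth_normals * mono_normals).sum(-1), -1.0, 1.0)
    normal_mask = depth_mask & (np.degrees(np.arccos(cos)) < normal_tol_deg)
    return depth_mask, normal_mask

H, W = 64, 64
sensor = np.random.uniform(1.0, 3.0, (H, W))
mono = sensor / 2.0 + np.random.normal(0, 0.01, (H, W))  # scale-ambiguous + noise
normals = np.dstack([np.zeros((H, W)), np.zeros((H, W)), np.ones((H, W))])
dm, nm = filter_priors(sensor, mono, normals)
print(dm.mean(), nm.mean())  # fraction of pixels whose priors survive filtering
```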
  {
    "path": "abs/2411.19290.md",
    "content": "### SADG: Segment Any Dynamic Gaussian Without Object Trackers\n\nUnderstanding dynamic 3D scenes is fundamental for various applications, including extended reality (XR) and autonomous driving. Effectively integrating semantic information into 3D reconstruction enables holistic representation that opens opportunities for immersive and interactive applications. We introduce SADG, Segment Any Dynamic Gaussian Without Object Trackers, a novel approach that combines dynamic Gaussian Splatting representation and semantic information without reliance on object IDs. In contrast to existing works, we do not rely on supervision based on object identities to enable consistent segmentation of dynamic 3D objects. To this end, we propose to learn semantically-aware features by leveraging masks generated from the Segment Anything Model (SAM) and utilizing our novel contrastive learning objective based on hard pixel mining. The learned Gaussian features can be effectively clustered without further post-processing. This enables fast computation for further object-level editing, such as object removal, composition, and style transfer by manipulating the Gaussians in the scene. We further extend several dynamic novel-view datasets with segmentation benchmarks to enable testing of learned feature fields from unseen viewpoints. We evaluate SADG on proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes along with its effectiveness for further downstream editing tasks.\n\n理解动态 3D 场景是扩展现实（XR）和自动驾驶等应用的基础。将语义信息有效集成到 3D 重建中，可实现整体化表示，为沉浸式和交互式应用提供了可能性。我们提出了 SADG（Segment Any Dynamic Gaussian Without Object Trackers），一种全新方法，结合了动态高斯点云表示与语义信息，而无需依赖对象 ID。\n与现有方法相比，我们不依赖基于对象身份的监督，便能够对动态 3D 对象进行一致的分割。为实现这一点，我们提出通过利用 Segment Anything Model (SAM) 生成的掩膜，以及基于困难像素挖掘的全新对比学习目标，学习语义感知特征。学习到的高斯特征可以在无需进一步后处理的情况下有效聚类，从而支持快速计算，实现对象级编辑，例如对象移除、组合和风格迁移，这些操作均通过对场景中高斯点进行操作完成。\n此外，我们扩展了多个动态新视图数据集，添加了分割基准，用于测试从未见过的视点中学习的特征场。我们在这些基准上评估了 SADG，结果表明，该方法在动态场景中对象分割方面具有卓越性能，同时在后续编辑任务中也展现了其高效性和有效性。\n"
  },
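SADG's contrastive objective with hard pixel mining can be sketched as follows: pixels sampled from renders carry features and SAM mask ids, and only the hardest positive pairs (same mask, low similarity) and hardest negative pairs (different mask, high similarity) contribute to the loss. The margin-free softplus form, temperature, and mining count here are assumptions for illustration, not the paper's exact objective.

```python
# Hedged sketch of a mask-supervised contrastive loss with hard pixel mining,
# in the spirit of the SADG abstract.
import torch
import torch.nn.functional as F

def contrastive_loss(feats, mask_ids, temperature=0.1, num_hard=128):
    """feats: (N, D) rendered per-pixel features; mask_ids: (N,) SAM mask id."""
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t() / temperature             # (N, N) pairwise similarity
    same = mask_ids[:, None] == mask_ids[None, :]     # positive-pair indicator
    eye = torch.eye(len(feats), dtype=torch.bool, device=feats.device)

    # Hard positives: same mask but LOW similarity; hard negatives: different
    # mask but HIGH similarity. Only the hardest pairs contribute.
    pos = sim.masked_fill(~same | eye, float("inf")).reshape(-1)
    neg = sim.masked_fill(same, float("-inf")).reshape(-1)
    hard_pos = pos.topk(num_hard, largest=False).values
    hard_neg = neg.topk(num_hard, largest=True).values
    # Pull hard positives together, push hard negatives apart.
    return F.softplus(hard_neg).mean() + F.softplus(-hard_pos).mean()

pixels = torch.randn(512, 32)        # sampled per-pixel Gaussian features
ids = torch.randint(0, 8, (512,))    # SAM mask ids for those pixels
print(contrastive_loss(pixels, ids))
```

Because the learned per-Gaussian features end up well separated, a plain clustering pass over them suffices at inference time, which is what the abstract means by "no further post-processing."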
  {
    "path": "abs/2411.19454.md",
    "content": "### GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction\n\n3D Gaussian Splatting has achieved impressive performance in novel view synthesis with real-time rendering capabilities. However, reconstructing high-quality surfaces with fine details using 3D Gaussians remains a challenging task. In this work, we introduce GausSurf, a novel approach to high-quality surface reconstruction by employing geometry guidance from multi-view consistency in texture-rich areas and normal priors in texture-less areas of a scene. We observe that a scene can be mainly divided into two primary regions: 1) texture-rich and 2) texture-less areas. To enforce multi-view consistency at texture-rich areas, we enhance the reconstruction quality by incorporating a traditional patch-match based Multi-View Stereo (MVS) approach to guide the geometry optimization in an iterative scheme. This scheme allows for mutual reinforcement between the optimization of Gaussians and patch-match refinement, which significantly improves the reconstruction results and accelerates the training process. Meanwhile, for the texture-less areas, we leverage normal priors from a pre-trained normal estimation model to guide optimization. Extensive experiments on the DTU and Tanks and Temples datasets demonstrate that our method surpasses state-of-the-art methods in terms of reconstruction quality and computation time.\n\n3D 高斯点云表示（3D Gaussian Splatting）在新视图合成和实时渲染中表现出色，但在高质量、细节丰富的表面重建方面仍然具有挑战性。为此，我们提出了一种新方法 GausSurf，通过在纹理丰富区域利用多视图一致性几何指导，以及在纹理缺乏区域利用法向量先验，实现高质量的表面重建。\n我们观察到一个场景主要可以划分为两个区域：1）纹理丰富区域和 2）纹理缺乏区域。针对纹理丰富区域，我们通过结合传统的基于补丁匹配的多视图立体（Multi-View Stereo, MVS）方法，在迭代优化方案中引导几何优化，增强重建质量。该方案允许高斯点优化与补丁匹配细化之间的相互增强，从而显著提升重建结果并加速训练过程。\n而针对纹理缺乏区域，我们从预训练的法向量估计模型中提取法向量先验，用于指导优化。通过这种结合纹理区域特性的双重优化策略，GausSurf 能够更准确地恢复复杂场景中的几何细节。\n在 DTU 和 Tanks and Temples 数据集上的广泛实验表明，GausSurf 在重建质量和计算时间上均优于当前最先进的方法，展现了其卓越的性能和高效性。\n"
  },
  {
    "path": "abs/2411.19551.md",
    "content": "### Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding\n\nInjecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundational models (e.g., CLIP and SAM) to facilitate novel view segmentation and semantic understanding, their heavy reliance on 2D supervision can undermine cross-view semantic consistency and necessitate complex data preparation processes, therefore hindering view-consistent scene understanding. In this work, we present FreeGS, an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. Instead of directly learning semantic features, we introduce the IDentity-coupled Semantic Field (IDSF) into 3DGS, which captures both semantic representations and view-consistent instance indices for each Gaussian. We optimize IDSF with a two-step alternating strategy: semantics help to extract coherent instances in 3D space, while the resulting instances regularize the injection of stable semantics from 2D space. Additionally, we adopt a 2D-3D joint contrastive loss to enhance the complementarity between view-consistent 3D geometry and rich semantics during the bootstrapping process, enabling FreeGS to uniformly perform tasks such as novel-view semantic segmentation, object selection, and 3D object detection. Extensive experiments on LERF-Mask, 3D-OVS, and ScanNet datasets demonstrate that FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.\n\n在 3D 高斯点云表示（3D Gaussian Splatting, 3DGS）中注入语义信息，近年来引起了广泛关注。尽管当前的方法通常依赖从 2D 基础模型（如 CLIP 和 SAM）提取 3D 语义特征，以促进新视图分割和语义理解，但其对 2D 监督的高度依赖可能会削弱跨视图语义一致性，同时需要复杂的数据准备过程，从而阻碍了视图一致的场景理解。\n为此，我们提出了 FreeGS，一种无监督的语义嵌入 3DGS 框架，可在无需 2D 标签的情况下实现视图一致的 3D 场景理解。与直接学习语义特征不同，我们引入了 身份耦合语义场（IDentity-coupled Semantic Field, IDSF） 到 3DGS 中，该方法为每个高斯点捕获语义表示和视图一致的实例索引。\n我们通过两步交替策略优化 IDSF：语义用于在 3D 空间中提取一致的实例，而提取出的实例则对从 2D 空间注入稳定语义起到正则化作用。此外，我们采用 2D-3D 联合对比损失，增强视图一致的 3D 几何与丰富语义之间的互补性，在引导过程中支持 FreeGS 统一执行多种任务，例如新视图语义分割、对象选择和 3D 对象检测。\n在 LERF-Mask、3D-OVS 和 ScanNet 数据集上的广泛实验表明，FreeGS 的性能与当前最先进的方法相当，同时避免了复杂的数据预处理工作量。这验证了 FreeGS 的高效性和实用性，为语义注入的 3D 场景理解提供了一种新方向。\n"
  },
  {
    "path": "abs/2411.19588.md",
    "content": "### Gaussian Splashing: Direct Volumetric Rendering Underwater\n\nIn underwater images, most useful features are occluded by water. The extent of the occlusion depends on imaging geometry and can vary even across a sequence of burst images. As a result, 3D reconstruction methods robust on in-air scenes, like Neural Radiance Field methods (NeRFs) or 3D Gaussian Splatting (3DGS), fail on underwater scenes. While a recent underwater adaptation of NeRFs achieved state-of-the-art results, it is impractically slow: reconstruction takes hours and its rendering rate, in frames per second (FPS), is less than 1. Here, we present a new method that takes only a few minutes for reconstruction and renders novel underwater scenes at 140 FPS. Named Gaussian Splashing, our method unifies the strengths and speed of 3DGS with an image formation model for capturing scattering, introducing innovations in the rendering and depth estimation procedures and in the 3DGS loss function. Despite the complexities of underwater adaptation, our method produces images at unparalleled speeds with superior details. Moreover, it reveals distant scene details with far greater clarity than other methods, dramatically improving reconstructed and rendered images. We demonstrate results on existing datasets and a new dataset we have collected.\n\n在水下图像中，大多数有用的特征会被水体遮挡，遮挡程度取决于成像几何结构，并且在一系列连拍图像中可能存在变化。因此，针对空气场景表现稳健的 3D 重建方法（如 Neural Radiance Fields (NeRFs) 或 3D Gaussian Splatting (3DGS)）在水下场景中往往失效。尽管最近的水下 NeRFs 改进方法达到了最先进的结果，但其效率极低：重建耗时数小时，渲染速率（每秒帧数，FPS）不足 1。\n为解决上述问题，我们提出了一种新方法 Gaussian Splashing，只需几分钟即可完成重建，并以 140 FPS 的速度渲染新的水下场景。该方法结合了 3DGS 的高效性与图像形成模型对散射现象的建模，针对渲染和深度估计过程以及 3DGS 损失函数进行了创新设计。\n尽管水下场景适应存在复杂性，Gaussian Splashing 方法能够以无与伦比的速度生成图像，并提供更优的细节表现。此外，该方法显著增强了对远距离场景细节的还原能力，与其他方法相比，大幅提升了重建和渲染图像的质量。\n我们在现有数据集和一个新采集的数据集上验证了该方法的效果，结果显示 Gaussian Splashing 不仅在速度上遥遥领先，还在水下图像重建的清晰度和细节表现方面展现了显著优势。\n"
  },
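The "image formation model for capturing scattering" that the abstract pairs with 3DGS is, in the underwater literature, the standard two-term model: an attenuated direct signal plus depth-dependent backscatter. A worked sketch of that equation, with illustrative coefficients (the paper's exact parameterization may differ):

```python
# Standard underwater image-formation model, the kind a Gaussian Splashing-
# style renderer attaches to its splats:
#   I(z) = J * exp(-beta_d * z) + B_inf * (1 - exp(-beta_b * z))
import numpy as np

def underwater_image(J, z, beta_d, beta_b, B_inf):
    """J: (H, W, 3) clear scene radiance; z: (H, W) range in meters;
    beta_d/beta_b: per-channel attenuation/backscatter coefficients;
    B_inf: (3,) veiling light (water color at infinite range)."""
    direct = J * np.exp(-beta_d[None, None, :] * z[..., None])
    backscatter = B_inf[None, None, :] * (1.0 - np.exp(-beta_b[None, None, :] * z[..., None]))
    return direct + backscatter

J = np.random.rand(4, 4, 3)             # radiance the splats would render in air
z = np.full((4, 4), 3.0)                # 3 m to every surface point
beta_d = np.array([0.40, 0.15, 0.10])   # red attenuates fastest underwater
beta_b = np.array([0.30, 0.20, 0.15])
B_inf = np.array([0.05, 0.25, 0.35])    # blue-green veiling light
print(underwater_image(J, z, beta_d, beta_b, B_inf).shape)  # (4, 4, 3)
```

Inverting this model during optimization is what lets the method recover distant details: the backscatter term, which grows with range and washes out far geometry, is estimated and removed rather than baked into the splats.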
  {
    "path": "abs/2411.19594.md",
    "content": "### Tortho-Gaussian: Splatting True Digital Orthophoto Maps\n\nTrue Digital Orthophoto Maps (TDOMs) are essential products for digital twins and Geographic Information Systems (GIS). Traditionally, TDOM generation involves a complex set of traditional photogrammetric process, which may deteriorate due to various challenges, including inaccurate Digital Surface Model (DSM), degenerated occlusion detections, and visual artifacts in weak texture regions and reflective surfaces, etc. To address these challenges, we introduce TOrtho-Gaussian, a novel method inspired by 3D Gaussian Splatting (3DGS) that generates TDOMs through orthogonal splatting of optimized anisotropic Gaussian kernel. More specifically, we first simplify the orthophoto generation by orthographically splatting the Gaussian kernels onto 2D image planes, formulating a geometrically elegant solution that avoids the need for explicit DSM and occlusion detection. Second, to produce TDOM of large-scale area, a divide-and-conquer strategy is adopted to optimize memory usage and time efficiency of training and rendering for 3DGS. Lastly, we design a fully anisotropic Gaussian kernel that adapts to the varying characteristics of different regions, particularly improving the rendering quality of reflective surfaces and slender structures. Extensive experimental evaluations demonstrate that our method outperforms existing commercial software in several aspects, including the accuracy of building boundaries, the visual quality of low-texture regions and building facades. These results underscore the potential of our approach for large-scale urban scene reconstruction, offering a robust alternative for enhancing TDOM quality and scalability.\n\n真实数字正射影像图 (TDOMs) 是数字孪生和地理信息系统 (GIS) 的重要产品。然而，传统 TDOM 生成方法依赖复杂的摄影测量流程，容易受到各种挑战的影响，例如数字表面模型 (DSM) 的不准确性、遮挡检测的退化，以及弱纹理区域和反射表面中的视觉伪影等问题。\n为了解决这些问题，我们提出了 TOrtho-Gaussian，一种基于 3D Gaussian Splatting (3DGS) 的新方法，通过正交投影优化的各向异性高斯核生成 TDOM。具体而言：\n\t1.\t简化正射影像生成：我们通过将高斯核正交投影到 2D 图像平面，提供了一种几何上优雅的解决方案，避免了对显式 DSM 和遮挡检测的需求。\n\t2.\t处理大规模区域的 TDOM：我们采用分治策略，优化 3DGS 的内存使用和训练与渲染效率，从而支持大规模区域的正射影像生成。\n\t3.\t全各向异性高斯核设计：我们设计了一个完全各向异性的高斯核，根据不同区域的特性进行适配，特别是在提高反射表面和细长结构的渲染质量方面表现出色。\n通过大量实验评估，我们的方法在多个方面优于现有的商业软件，包括建筑边界的精确度、低纹理区域和建筑立面的视觉质量。这些结果表明，TOrtho-Gaussian 为大规模城市场景重建提供了一种强大的替代方法，显著提升了 TDOM 的质量和可扩展性。\n"
  },
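Orthographic splatting, the geometric trick at the heart of TOrtho-Gaussian, amounts to projecting each Gaussian straight down: the mean keeps its xy coordinates and the 3D covariance keeps its 2x2 xy block. A toy sketch under assumed grid parameters (a real TDOM renderer would composite in height order; this sketch skips occlusion for brevity):

```python
# Toy sketch of orthographic Gaussian splatting onto a ground-aligned raster.
import numpy as np

def ortho_splat(means, covs, colors, res=256, extent=10.0):
    img = np.zeros((res, res, 3))
    weight = np.zeros((res, res))
    xs = np.linspace(-extent, extent, res)
    gx, gy = np.meshgrid(xs, xs)
    for mu, cov, c in zip(means, covs, colors):
        inv = np.linalg.inv(cov[:2, :2] + 1e-6 * np.eye(2))  # nadir-projected 2D kernel
        d = np.stack([gx - mu[0], gy - mu[1]], axis=-1)
        m = np.einsum('...i,ij,...j->...', d, inv, d)        # squared Mahalanobis distance
        w = np.exp(-0.5 * m)
        img += w[..., None] * c
        weight += w
    return img / np.maximum(weight[..., None], 1e-8)         # normalized accumulation

N = 50
means = np.random.uniform(-8, 8, (N, 3))
A = np.random.rand(N, 3, 3) * 0.3
covs = A @ A.transpose(0, 2, 1)        # make them valid (PSD) covariances
colors = np.random.rand(N, 3)
print(ortho_splat(means, covs, colors).shape)  # (256, 256, 3) orthophoto-like raster
```

Because every kernel is projected along the same vertical direction, no DSM is needed to decide where a pixel's content comes from, which is exactly the simplification the abstract claims.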
  {
    "path": "abs/2411.19654.md",
    "content": "### TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting\n\nPhysically Based Rendering (PBR) materials play a crucial role in modern graphics, enabling photorealistic rendering across diverse environment maps. Developing an effective and efficient algorithm that is capable of automatically generating high-quality PBR materials rather than RGB texture for 3D meshes can significantly streamline the 3D content creation. Most existing methods leverage pre-trained 2D diffusion models for multi-view image synthesis, which often leads to severe inconsistency between the generated textures and input 3D meshes. This paper presents TexGaussian, a novel method that uses octant-aligned 3D Gaussian Splatting for rapid PBR material generation. Specifically, we place each 3D Gaussian on the finest leaf node of the octree built from the input 3D mesh to render the multiview images not only for the albedo map but also for roughness and metallic. Moreover, our model is trained in a regression manner instead of diffusion denoising, capable of generating the PBR material for a 3D mesh in a single feed-forward process. Extensive experiments on publicly available benchmarks demonstrate that our method synthesizes more visually pleasing PBR materials and runs faster than previous methods in both unconditional and text-conditional scenarios, which exhibit better consistency with the given geometry. Our code and trained models are available at this https URL.\n\n基于物理渲染（Physically Based Rendering, PBR） 材质在现代图形学中具有关键作用，使得在多种环境光照条件下实现真实感渲染成为可能。开发一种能够自动为 3D 网格生成高质量 PBR 材质（而非仅 RGB 纹理）的高效算法，可以显著简化 3D 内容创作流程。然而，现有大多数方法利用预训练的 2D 扩散模型进行多视图图像合成，常导致生成的纹理与输入 3D 网格之间严重不一致。\n为此，我们提出了 TexGaussian，一种利用八分体对齐的 3D Gaussian Splatting 快速生成 PBR 材质的新方法。具体来说，我们将每个 3D 高斯点置于从输入 3D 网格构建的八叉树的最细叶节点上，以渲染多视图图像，不仅生成反照率（albedo）图，还包括粗糙度（roughness）和金属性（metallic）图。此外，我们的模型通过回归方式训练，而非扩散去噪，这使得在一次前向传播过程中即可完成 3D 网格的 PBR 材质生成。\n在公开基准数据集上的广泛实验表明，TexGaussian 在无条件和文本条件场景下均能合成更具视觉吸引力的 PBR 材质，同时运行速度显著快于现有方法，并与给定几何保持更好的一致性。这使得 TexGaussian 成为高效生成高质量 PBR 材质的有力工具，为 3D 内容创作带来新突破。\n"
  },
  {
    "path": "abs/2411.19756.md",
    "content": "### DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering\n\nGaussian splatting enables fast novel view synthesis in static 3D environments. However, reconstructing real-world environments remains challenging as distractors or occluders break the multi-view consistency assumption required for accurate 3D reconstruction. Most existing methods rely on external semantic information from pre-trained models, introducing additional computational overhead as pre-processing steps or during optimization. In this work, we propose a novel method, DeSplat, that directly separates distractors and static scene elements purely based on volume rendering of Gaussian primitives. We initialize Gaussians within each camera view for reconstructing the view-specific distractors to separately model the static 3D scene and distractors in the alpha compositing stages. DeSplat yields an explicit scene separation of static elements and distractors, achieving comparable results to prior distractor-free approaches without sacrificing rendering speed. We demonstrate DeSplat's effectiveness on three benchmark data sets for distractor-free novel view synthesis.\n\n高斯点云表示（Gaussian Splatting） 在静态 3D 环境中的快速新视图合成中表现出色。然而，在现实世界环境中进行高质量重建仍然具有挑战性，因为干扰物或遮挡物会破坏多视图一致性假设，从而影响精确的 3D 重建。大多数现有方法依赖于预训练模型提供的外部语义信息，这引入了额外的计算开销，无论是在预处理阶段还是优化过程中。\n为了解决这些问题，我们提出了一种新方法 DeSplat，能够仅基于高斯原语的体渲染直接分离干扰物和静态场景元素。我们的方法通过在每个相机视图中初始化高斯点，用于重建特定视图的干扰物，从而在alpha 合成阶段分别建模静态 3D 场景和干扰物。\nDeSplat 实现了静态元素和干扰物的显式场景分离，在不牺牲渲染速度的情况下，取得了与现有无干扰方法相当的效果。我们在三个基准数据集上验证了 DeSplat 的有效性，用于无干扰的新视图合成，结果表明该方法在效率和准确性方面均具有显著优势。\n"
  },
  {
    "path": "abs/2411.19895.md",
    "content": "### GuardSplat: Robust and Efficient Watermarking for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has recently created impressive assets for various applications. However, the copyright of these assets is not well protected as existing watermarking methods are not suited for 3DGS considering security, capacity, and invisibility. Besides, these methods often require hours or even days for optimization, limiting the application scenarios. In this paper, we propose GuardSplat, an innovative and efficient framework that effectively protects the copyright of 3DGS assets. Specifically, 1) We first propose a CLIP-guided Message Decoupling Optimization module for training the message decoder, leveraging CLIP's aligning capability and rich representations to achieve a high extraction accuracy with minimal optimization costs, presenting exceptional capability and efficiency. 2) Then, we propose a Spherical-harmonic-aware (SH-aware) Message Embedding module tailored for 3DGS, which employs a set of SH offsets to seamlessly embed the message into the SH features of each 3D Gaussian while maintaining the original 3D structure. It enables the 3DGS assets to be watermarked with minimal fidelity trade-offs and prevents malicious users from removing the messages from the model files, meeting the demands for invisibility and security. 3) We further propose an Anti-distortion Message Extraction module to improve robustness against various visual distortions. Extensive experiments demonstrate that GuardSplat outperforms the state-of-the-art methods and achieves fast optimization speed.\n\n3D Gaussian Splatting (3DGS) 最近在多种应用中展现了强大的能力。然而，这些资产的版权保护尚未得到充分解决，因为现有的水印方法在安全性、容量和隐蔽性方面不适合 3DGS。此外，这些方法通常需要数小时甚至数天进行优化，限制了实际应用场景。\n为此，我们提出了 GuardSplat，一个创新且高效的框架，用于有效保护 3DGS 资产的版权。具体而言：\n\t1.\tCLIP 引导的消息解耦优化模块\n我们提出了一个 CLIP 引导的消息解耦优化模块，用于训练消息解码器。利用 CLIP 的对齐能力和丰富表示，该模块能够以最小的优化成本实现高精度的消息提取，展现了出色的效率和性能。\n\t2.\t球谐感知（SH-aware）的消息嵌入模块\n我们设计了一种专为 3DGS 定制的 球谐感知消息嵌入模块，通过一组球谐偏移量（SH offsets）将消息无缝嵌入每个 3D 高斯的球谐特征中，同时保持原始 3D 结构。这种方法使 3DGS 资产能够在几乎不牺牲保真度的情况下嵌入水印，同时防止恶意用户从模型文件中移除消息，满足隐蔽性和安全性要求。\n\t3.\t抗失真消息提取模块\n我们进一步提出了一个 抗失真消息提取模块，增强了水印在面对各种视觉失真的鲁棒性。\n大量实验表明，GuardSplat 优于现有最先进的方法，在快速优化速度的同时，实现了卓越的水印嵌入和提取性能，为 3DGS 资产的版权保护提供了强有力的支持。\n"
  },
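The SH-offset embedding idea can be made concrete with a toy sketch: small learnable offsets are added to each Gaussian's spherical-harmonic color features, and a decoder is trained to recover the message from a render while a fidelity penalty keeps the offsets invisible. Everything below is a deliberately simplified stand-in (the linear "render proxy" replaces the real differentiable renderer, and the CLIP guidance is omitted); it only illustrates the optimization structure, not GuardSplat itself.

```python
# Toy, hedged sketch of SH-offset watermark embedding with a fidelity penalty.
import torch
import torch.nn as nn

num_gauss, sh_dim, msg_bits = 1024, 48, 32
sh = torch.randn(num_gauss, sh_dim)                      # frozen SH features of the asset
offsets = nn.Parameter(torch.zeros(num_gauss, sh_dim))   # watermark payload carrier
decoder = nn.Linear(sh_dim, msg_bits)                    # message decoder
render_proxy = lambda f: f.mean(dim=0)                   # stand-in for render + pooling

message = torch.randint(0, 2, (msg_bits,)).float()
opt = torch.optim.Adam([offsets, *decoder.parameters()], lr=1e-2)
for step in range(200):
    watermarked = sh + offsets                           # SH-aware embedding
    logits = decoder(render_proxy(watermarked))
    bce = nn.functional.binary_cross_entropy_with_logits(logits, message)
    fidelity = offsets.pow(2).mean()                     # small offsets -> invisible
    loss = bce + 10.0 * fidelity
    opt.zero_grad(); loss.backward(); opt.step()

bits = (decoder(render_proxy(sh + offsets)) > 0).float()
print("bit accuracy:", (bits == message).float().mean().item())
```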
  {
    "path": "abs/2412.00155.md",
    "content": "### T-3DGS: Removing Transient Objects for 3D Scene Reconstruction\n\nWe propose a novel framework to remove transient objects from input videos for 3D scene reconstruction using Gaussian Splatting. Our framework consists of the following steps. In the first step, we propose an unsupervised training strategy for a classification network to distinguish between transient objects and static scene parts based on their different training behavior inside the 3D Gaussian Splatting reconstruction. In the second step, we improve the boundary quality and stability of the detected transients by combining our results from the first step with an off-the-shelf segmentation method. We also propose a simple and effective strategy to track objects in the input video forward and backward in time. Our results show an improvement over the current state of the art in existing sparsely captured datasets and significant improvements in a newly proposed densely captured (video) dataset.\n\n我们提出了一种新颖的框架，用于通过高斯散射方法去除输入视频中的瞬态物体，以实现3D场景重建。我们的框架包含以下步骤。\n在第一步中，我们提出了一种无监督训练策略，通过分类网络基于瞬态物体和静态场景在3D高斯散射重建中的不同训练行为来区分它们。\n在第二步中，我们将第一步的结果与现成的分割方法结合，提升了检测出的瞬态物体的边界质量和稳定性。\n我们还提出了一种简单有效的策略，用于在输入视频中对物体进行前后时间的跟踪。\n我们的结果表明，在现有稀疏捕获数据集上，我们的方法优于当前最先进的方法，并且在新提出的密集捕获（视频）数据集中表现出显著改进。\n"
  },
  {
    "path": "abs/2412.00333.md",
    "content": "### Gaussians on their Way: Wasserstein-Constrained 4D Gaussian Splatting with State-Space Modeling\n\nDynamic scene rendering has taken a leap forward with the rise of 4D Gaussian Splatting, but there's still one elusive challenge: how to make 3D Gaussians move through time as naturally as they would in the real world, all while keeping the motion smooth and consistent. In this paper, we unveil a fresh approach that blends state-space modeling with Wasserstein geometry, paving the way for a more fluid and coherent representation of dynamic scenes. We introduce a State Consistency Filter that merges prior predictions with the current observations, enabling Gaussians to stay true to their way over time. We also employ Wasserstein distance regularization to ensure smooth, consistent updates of Gaussian parameters, reducing motion artifacts. Lastly, we leverage Wasserstein geometry to capture both translational motion and shape deformations, creating a more physically plausible model for dynamic scenes. Our approach guides Gaussians along their natural way in the Wasserstein space, achieving smoother, more realistic motion and stronger temporal coherence. Experimental results show significant improvements in rendering quality and efficiency, outperforming current state-of-the-art techniques.\n\n动态场景渲染随着4D高斯散射的兴起取得了显著进展，但仍有一个难以攻克的挑战：如何让3D高斯在时间中自然移动，就像真实世界中的表现一样，同时保持运动的平滑性和一致性。\n在本文中，我们提出了一种全新的方法，将状态空间建模与Wasserstein几何相结合，为动态场景提供更流畅且连贯的表示。我们引入了一种状态一致性滤波器（State Consistency Filter），将先验预测与当前观测融合，使高斯能够在时间维度上保持其自然轨迹。\n我们还采用Wasserstein距离正则化，确保高斯参数的更新平滑且一致，从而减少运动伪影。此外，通过Wasserstein几何捕捉平移运动和形状变形，构建了一个更符合物理规律的动态场景模型。\n我们的方法引导高斯在Wasserstein空间中沿其自然路径移动，实现更平滑、更真实的运动以及更强的时间连贯性。实验结果表明，该方法在渲染质量和效率方面均取得了显著改进，优于当前最先进的技术。\n"
  },
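The Wasserstein regularization the abstract relies on has a convenient closed form for Gaussians: the 2-Wasserstein distance between two Gaussian distributions combines the mean displacement with a Bures term over the covariances. A minimal NumPy/SciPy version, useful for seeing why it penalizes both translation and shape change between frames:

```python
# Closed-form 2-Wasserstein distance between two Gaussians:
#   W2^2(N1, N2) = |mu1 - mu2|^2 + Tr(C1 + C2 - 2 (C2^1/2 C1 C2^1/2)^1/2)
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussians(mu1, cov1, mu2, cov2):
    s2 = sqrtm(cov2)
    cross = sqrtm(s2 @ cov1 @ s2)
    bures = np.trace(cov1 + cov2 - 2.0 * np.real(cross))  # sqrtm may return complex
    return float(np.sum((mu1 - mu2) ** 2) + max(bures, 0.0))

# Example: one Gaussian at frame t and its slightly moved, slightly reshaped
# state at frame t+1 -- a small W2 value means a smooth, consistent update.
mu_t,  cov_t  = np.zeros(3), np.diag([0.04, 0.01, 0.01])
mu_t1, cov_t1 = np.array([0.05, 0.0, 0.0]), np.diag([0.05, 0.01, 0.01])
print(w2_gaussians(mu_t, cov_t, mu_t1, cov_t1))
```

Minimizing this quantity between consecutive time steps is how a regularizer of this kind captures "both translational motion and shape deformations" in one scalar, as the abstract describes.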
  {
    "path": "abs/2412.00392.md",
    "content": "### GradiSeg: Gradient-Guided Gaussian Segmentation with Enhanced 3D Boundary Precision\n\nWhile 3D Gaussian Splatting enables high-quality real-time rendering, existing Gaussian-based frameworks for 3D semantic segmentation still face significant challenges in boundary recognition accuracy. To address this, we propose a novel 3DGS-based framework named GradiSeg, incorporating Identity Encoding to construct a deeper semantic understanding of scenes. Our approach introduces two key modules: Identity Gradient Guided Densification (IGD) and Local Adaptive K-Nearest Neighbors (LA-KNN). The IGD module supervises gradients of Identity Encoding to refine Gaussian distributions along object boundaries, aligning them closely with boundary contours. Meanwhile, the LA-KNN module employs position gradients to adaptively establish locality-aware propagation of Identity Encodings, preventing irregular Gaussian spreads near boundaries. We validate the effectiveness of our method through comprehensive experiments. Results show that GradiSeg effectively addresses boundary-related issues, significantly improving segmentation accuracy without compromising scene reconstruction quality. Furthermore, our method's robust segmentation capability and decoupled Identity Encoding representation make it highly suitable for various downstream scene editing tasks, including 3D object removal, swapping and so on.\n\n虽然3D高斯散射（3D Gaussian Splatting）能够实现高质量的实时渲染，但基于高斯的3D语义分割框架在边界识别准确性方面仍面临显著挑战。为此，我们提出了一种新颖的基于3DGS的框架，名为GradiSeg，通过引入身份编码（Identity Encoding）来构建对场景的更深层次语义理解。\n我们的方法包含两个关键模块：身份梯度引导密化模块（Identity Gradient Guided Densification, IGD）和局部自适应K近邻模块（Local Adaptive K-Nearest Neighbors, LA-KNN）。IGD模块利用身份编码的梯度信息对高斯分布进行监督，使其在物体边界处更加精确，与边界轮廓对齐。与此同时，LA-KNN模块通过位置梯度自适应地建立身份编码的局部传播，避免边界附近出现不规则的高斯扩散。\n我们通过全面实验验证了方法的有效性。结果表明，GradiSeg在解决边界相关问题方面表现卓越，大幅提升了分割准确性，同时不影响场景重建质量。此外，我们方法的强大分割能力及其解耦的身份编码表示，使其在各种下游场景编辑任务中具有很高的适用性，包括3D物体移除、交换等操作。\n"
  },
  {
    "path": "abs/2412.00477.md",
    "content": "### LineGS : 3D Line Segment Representation on 3D Gaussian Splatting\n\nAbstract representations of 3D scenes are essential in computer vision, supporting tasks like mapping, localization, and surface reconstruction. Line segments are commonly used to capture scene structure, but existing 3D reconstruction methods often face limitations, either from instability in 2D projections or noise in direct 3D data. This paper introduces LineGS, a method that integrates geometry-guided 3D line reconstruction with a 3D Gaussian splatting model to improve accuracy. By leveraging Gaussian point densities along scene edges, LineGS refines initial line segments, aligning them more closely with the scene's geometric features. Experiments confirm that this approach enhances the fit to 3D structures, providing an efficient and reliable abstract representation of 3D scenes.\n\n3D场景的抽象表示在计算机视觉中至关重要，支持诸如地图构建、定位和表面重建等任务。线段通常用于捕捉场景结构，但现有的3D重建方法常因2D投影的不稳定性或直接3D数据中的噪声而面临限制。\n本文提出了一种名为LineGS的方法，将几何引导的3D线重建与3D高斯散射模型相结合，以提高精度。通过利用场景边缘附近的高斯点密度，LineGS能够优化初始线段，使其更贴合场景的几何特征。\n实验结果表明，该方法能够增强3D结构的拟合效果，提供了一种高效且可靠的3D场景抽象表示方式。\n"
  },
  {
    "path": "abs/2412.00578.md",
    "content": "### Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives\n\n3D Gaussian Splatting (3D-GS) is a recent 3D scene reconstruction technique that enables real-time rendering of novel views by modeling scenes as parametric point clouds of differentiable 3D Gaussians. However, its rendering speed and model size still present bottlenecks, especially in resource-constrained settings. In this paper, we identify and address two key inefficiencies in 3D-GS, achieving substantial improvements in rendering speed, model size, and training time. First, we optimize the rendering pipeline to precisely localize Gaussians in the scene, boosting rendering speed without altering visual fidelity. Second, we introduce a novel pruning technique and integrate it into the training pipeline, significantly reducing model size and training time while further raising rendering speed. Our Speedy-Splat approach combines these techniques to accelerate average rendering speed by a drastic 6.71× across scenes from the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets with 10.6× fewer primitives than 3D-GS.\n\n3D高斯散射（3D Gaussian Splatting, 3D-GS）是一种新兴的3D场景重建技术，通过将场景建模为可微分3D高斯的参数点云，实现了新视角的实时渲染。然而，其渲染速度和模型大小在资源受限的环境中仍然是瓶颈问题。\n在本文中，我们识别并解决了3D-GS中的两个关键低效点，从而在渲染速度、模型大小和训练时间方面实现了显著改进。首先，我们优化了渲染管道，精准定位场景中的高斯，提高了渲染速度，同时保持视觉保真度不变。其次，我们引入了一种新颖的剪枝技术，并将其整合到训练管道中，大幅减少了模型大小和训练时间，同时进一步提升了渲染速度。\n我们的方法Speedy-Splat结合了上述技术，将平均渲染速度提升了6.71倍，同时所需的高斯基元数量比3D-GS减少了10.6倍。实验在Mip-NeRF 360、Tanks & Temples和Deep Blending数据集上验证了这一性能。\n"
  },
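Speedy-Splat's first inefficiency fix, precise Gaussian localization, boils down to bounding each projected Gaussian's footprint tightly instead of with a loose square. A hedged sketch of the underlying geometry (tile size and sigma cutoff are assumptions; the paper's exact bounding computation may differ): the axis-aligned bounds of the n-sigma ellipse of a 2x2 covariance S have half-extents n * sqrt(diag(S)).

```python
# Sketch: tight axis-aligned tile bounds for a projected 2D Gaussian.
import numpy as np

def gaussian_tile_range(mean2d, cov2d, tile=16, n_sigma=3.0):
    # For the ellipse x^T S^-1 x = n^2, the AABB half-extents are
    # n * sqrt(diag(S)) -- a tight axis-aligned bound on the footprint.
    half = n_sigma * np.sqrt(np.diag(cov2d))
    lo = np.floor((mean2d - half) / tile).astype(int)
    hi = np.floor((mean2d + half) / tile).astype(int)
    return lo, hi  # inclusive tile index range in (x, y)

mean = np.array([100.0, 60.0])
cov = np.array([[25.0, 12.0], [12.0, 16.0]])  # anisotropic screen-space footprint
lo, hi = gaussian_tile_range(mean, cov)
print(lo, hi)  # [5 3] [7 4]: only a 3x2 block of 16-px tiles is rasterized
```

Skipping the tiles outside this range is where rendering-speed gains of this kind come from: fewer Gaussian/tile pairs enter the per-tile sorting and blending stages.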
  {
    "path": "abs/2412.00623.md",
    "content": "### A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision\n\nWe introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images, addressing the ill-posed nature of lifting 2D inputs to 3D. Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data. Diffusion models have recently shown promise as powerful generative models for 3D data, including Gaussian splats; however, standard diffusion frameworks typically require the target signal and denoised signal to be in the same modality, which is challenging given the scarcity of 3D data. To overcome this, we propose a novel training strategy that decouples the denoised modality from the supervision modality. By using a deterministic model as a noisy teacher to create the noised signal and transitioning from single-step to multi-step denoising supervised by an image rendering loss, our approach significantly enhances performance compared to the deterministic teacher. Additionally, our method is flexible, as it can learn from various 3D Gaussian Splat (3DGS) teachers with minimal adaptation; we demonstrate this by surpassing the performance of two different deterministic models as teachers, highlighting the potential generalizability of our framework. Our approach further incorporates a guidance mechanism to aggregate information from multiple views, enhancing reconstruction quality when more than one view is available. Experimental results on object-level and scene-level datasets demonstrate the effectiveness of our framework.\n\n我们提出了一种针对高斯散射（Gaussian Splats）的扩散模型，称为SplatDiffusion，以从单张图像生成三维结构，解决将二维输入提升为三维的病态问题。现有方法依赖于确定性、前馈式预测，这限制了它们处理从二维数据推断三维固有模糊性的能力。\n扩散模型最近被证明是三维数据（包括高斯散射）的强大生成模型。然而，标准的扩散框架通常要求目标信号和去噪信号处于相同模态中，这在三维数据稀缺的情况下具有挑战性。为了解决这一问题，我们提出了一种新的训练策略，将去噪模态与监督模态解耦。具体来说，我们利用一个确定性模型作为噪声教师，生成带噪信号，并从单步去噪过渡到通过图像渲染损失监督的多步去噪，大幅提升了相较于确定性教师的性能。\n此外，我们的方法具有灵活性，可通过最小适配从不同的三维高斯散射（3DGS）教师中学习；实验表明，我们的方法优于两种不同的确定性教师模型，展现了框架的潜在泛化能力。我们的方法还结合了一种指导机制，以聚合来自多视角的信息，在可用多个视角时进一步提高重建质量。\n在物体级和场景级数据集上的实验结果证明了我们框架的有效性。\n"
  },
  {
    "path": "abs/2412.00682.md",
    "content": "### FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting\n\nWe present FlashSLAM, a novel SLAM approach that leverages 3D Gaussian Splatting for efficient and robust 3D scene reconstruction. Existing 3DGS-based SLAM methods often fall short in sparse view settings and during large camera movements due to their reliance on gradient descent-based optimization, which is both slow and inaccurate. FlashSLAM addresses these limitations by combining 3DGS with a fast vision-based camera tracking technique, utilizing a pretrained feature matching model and point cloud registration for precise pose estimation in under 80 ms - a 90% reduction in tracking time compared to SplaTAM - without costly iterative rendering. In sparse settings, our method achieves up to a 92% improvement in average tracking accuracy over previous methods. Additionally, it accounts for noise in depth sensors, enhancing robustness when using unspecialized devices such as smartphones. Extensive experiments show that FlashSLAM performs reliably across both sparse and dense settings, in synthetic and real-world environments. Evaluations on benchmark datasets highlight its superior accuracy and efficiency, establishing FlashSLAM as a versatile and high-performance solution for SLAM, advancing the state-of-the-art in 3D reconstruction across diverse applications.\n\n我们提出了一种新颖的SLAM方法——FlashSLAM，该方法利用3D高斯散点（3D Gaussian Splatting，3DGS）实现高效且鲁棒的三维场景重建。现有基于3DGS的SLAM方法在稀疏视角设置和大范围相机运动情况下表现较差，主要原因是其依赖于梯度下降的优化过程，速度缓慢且精度不足。FlashSLAM通过将3DGS与一种快速的基于视觉的相机跟踪技术相结合，克服了这些局限。该方法采用预训练的特征匹配模型和点云配准技术，在不依赖代价高昂的迭代渲染的情况下，实现了精准的位姿估计，耗时不足80毫秒——与SplaTAM相比，跟踪时间减少了90%。在稀疏场景中，我们的方法在平均跟踪精度上相较于现有方法提升了高达92%。此外，它能够有效处理深度传感器中的噪声，从而增强了使用智能手机等非专业设备时的鲁棒性。\n大量实验表明，FlashSLAM在稀疏和密集环境中均表现可靠，适用于合成和真实场景。基准数据集上的评估结果进一步突出了其卓越的精度和效率，使FlashSLAM成为一个多功能、高性能的SLAM解决方案，在3D重建领域的多个应用中推动了技术的发展。\n"
  },
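The pose step the FlashSLAM abstract describes, feature matching followed by point cloud registration instead of iterative rendering, typically bottoms out in a closed-form rigid alignment. A compact sketch of the classic Kabsch/Umeyama SVD solution, assuming matched 3D correspondences are already available from the feature matcher:

```python
# Closed-form rigid registration of matched 3D points (Kabsch via SVD).
import numpy as np

def kabsch(src, dst):
    """Rigid (R, t) minimizing ||R @ src_i + t - dst_i||^2 over matched points."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Synthetic check: recover a known pose from matched points.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
angle = np.pi / 6
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([0.3, -0.2, 0.5])
R, t = kabsch(src, dst)
print(np.allclose(R, R_true, atol=1e-6), t)
```

Because the solution is closed-form, its cost is a single SVD rather than dozens of render-and-backprop iterations, which is consistent with the sub-80 ms tracking budget the abstract reports.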
  {
    "path": "abs/2412.00734.md",
    "content": "### ChatSplat: 3D Conversational Gaussian Splatting\n\nHumans naturally interact with their 3D surroundings using language, and modeling 3D language fields for scene understanding and interaction has gained growing interest. This paper introduces ChatSplat, a system that constructs a 3D language field, enabling rich chat-based interaction within 3D space. Unlike existing methods that primarily use CLIP-derived language features focused solely on segmentation, ChatSplat facilitates interaction on three levels: objects, views, and the entire 3D scene. For view-level interaction, we designed an encoder that encodes the rendered feature map of each view into tokens, which are then processed by a large language model (LLM) for conversation. At the scene level, ChatSplat combines multi-view tokens, enabling interactions that consider the entire scene. For object-level interaction, ChatSplat uses a patch-wise language embedding, unlike LangSplat's pixel-wise language embedding that implicitly includes mask and embedding. Here, we explicitly decouple the language embedding into separate mask and feature map representations, allowing more flexible object-level interaction. To address the challenge of learning 3D Gaussians posed by the complex and diverse distribution of language embeddings used in the LLM, we introduce a learnable normalization technique to standardize these embeddings, facilitating effective learning. Extensive experimental results demonstrate that ChatSplat supports multi-level interactions -- object, view, and scene -- within 3D space, enhancing both understanding and engagement.\n\n人类自然地通过语言与三维环境交互，而针对场景理解和交互的三维语言场建模正引起越来越多的关注。本文介绍了ChatSplat，这是一种构建三维语言场的系统，能够在三维空间中实现丰富的基于对话的交互。与现有主要使用基于CLIP的语言特征并仅专注于分割的方式不同，ChatSplat在三个层次上实现交互：对象、视角和整个三维场景。\n在视角层次，ChatSplat设计了一种编码器，用于将每个视角的渲染特征图编码为令牌，这些令牌随后由大型语言模型（LLM）处理以支持对话。在场景层次，ChatSplat结合了多视角令牌，实现了考虑整个场景的交互。在对象层次，ChatSplat采用了基于patch的语言嵌入，与LangSplat的基于像素的语言嵌入（隐式包含掩码和嵌入）不同，这里明确地将语言嵌入解耦为单独的掩码和特征图表示，从而实现更灵活的对象级交互。\n针对LLM中语言嵌入复杂多样分布对三维高斯学习带来的挑战，我们引入了一种可学习的归一化技术，用于标准化这些嵌入，从而促进高效学习。大量实验结果表明，ChatSplat支持三维空间中的多层次交互（对象、视角和场景），显著增强了场景理解和交互体验。\n"
  },
  {
    "path": "abs/2412.00851.md",
    "content": "### DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair\n\nRecent advances in 3D Gaussian Splatting have shown promising results. Existing methods typically assume static scenes and/or multiple images with prior poses. Dynamics, sparse views, and unknown poses significantly increase the problem complexity due to insufficient geometric constraints. To overcome this challenge, we propose a method that can use only two images without prior poses to fit Gaussians in dynamic environments. To achieve this, we introduce two technical contributions. First, we propose an object-level two-view bundle adjustment. This strategy decomposes dynamic scenes into piece-wise rigid components, and jointly estimates the camera pose and motions of dynamic objects. Second, we design an SE(3) field-driven Gaussian training method. It enables fine-grained motion modeling through learnable per-Gaussian transformations. Our method leads to high-fidelity novel view synthesis of dynamic scenes while accurately preserving temporal consistency and object motion. Experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art approaches designed for the cases of static environments, multiple images, and/or known poses.\n\n近年来，3D高斯喷溅技术取得了令人瞩目的进展。现有方法通常假设场景是静态的和/或具有已知位姿的多张图像。在动态环境、稀疏视角以及未知位姿的情况下，由于几何约束不足，问题复杂性显著增加。为了解决这一挑战，我们提出了一种仅需两张无先验位姿图像即可在动态环境中拟合高斯的方法。\n为实现这一目标，我们引入了两项关键技术创新：\n首先，我们提出了一种基于对象级的双视图捆绑调整策略。该方法将动态场景分解为逐片刚性组件，并联合估计相机位姿和动态对象的运动。\n其次，我们设计了一种基于SE(3)场的高斯训练方法。这种方法通过每个高斯的可学习变换实现了细粒度的运动建模。\n我们的方法能够在动态场景中实现高保真度的新视角合成，同时精确保持时间一致性和对象运动的准确性。在合成和真实世界数据集上的实验表明，与针对静态环境、多图像和/或已知位姿设计的最新方法相比，我们的方法显著优于其性能。\n"
  },
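The "SE(3) field-driven" training in the DynSUP abstract means each Gaussian carries its own learnable rigid transform. A minimal, hedged PyTorch sketch of that representation (parameterization and shapes are assumptions; the renderer and photometric loss are left out):

```python
# Per-Gaussian SE(3) transforms: a learnable quaternion + translation per primitive.
import torch
import torch.nn.functional as F

def quat_to_rotmat(q):                          # q: (N, 4) in (w, x, y, z) order
    w, x, y, z = F.normalize(q, dim=1).unbind(1)
    return torch.stack([
        1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y),
        2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x),
        2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y),
    ], dim=1).reshape(-1, 3, 3)

N = 1000
means = torch.randn(N, 3)
quats = torch.nn.Parameter(torch.cat([torch.ones(N, 1), torch.zeros(N, 3)], 1))
trans = torch.nn.Parameter(torch.zeros(N, 3))   # learnable per-Gaussian SE(3)

R = quat_to_rotmat(quats)                       # (N, 3, 3) rotations
moved = torch.einsum('nij,nj->ni', R, means) + trans
# `moved` would feed the splat renderer; a photometric loss then backprops
# into quats/trans, giving the fine-grained per-primitive motion model.
print(moved.shape)
```

Grouping these per-Aussian transforms by the piece-wise rigid components found in the bundle-adjustment step is what keeps the motion physically coherent rather than letting every Gaussian drift independently.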
  {
    "path": "abs/2412.00905.md",
    "content": "### Ref-GS: Directional Factorization for 2D Gaussian Splatting\n\nIn this paper, we introduce Ref-GS, a novel approach for directional light factorization in 2D Gaussian splatting, which enables photorealistic view-dependent appearance rendering and precise geometry recovery. Ref-GS builds upon the deferred rendering of Gaussian splatting and applies directional encoding to the deferred-rendered surface, effectively reducing the ambiguity between orientation and viewing angle. Next, we introduce a spherical Mip-grid to capture varying levels of surface roughness, enabling roughness-aware Gaussian shading. Additionally, we propose a simple yet efficient geometry-lighting factorization that connects geometry and lighting via the vector outer product, significantly reducing renderer overhead when integrating volumetric attributes. Our method achieves superior photorealistic rendering for a range of open-world scenes while also accurately recovering geometry.\n\n本文提出了Ref-GS，一种新颖的用于二维高斯散点方向光分解的方法，能够实现基于视角的真实感外观渲染和精确的几何恢复。Ref-GS建立在高斯散点的延迟渲染基础上，通过在延迟渲染表面上应用方向编码，有效降低了方向与视角之间的模糊性。\n此外，我们引入了一种球形Mip-grid，用于捕获表面粗糙度的不同级别，从而实现支持粗糙度感知的高斯着色。与此同时，我们提出了一种简单而高效的几何-光照分解方法，通过向量外积将几何与光照连接，在集成体积属性时显著降低了渲染器的计算开销。\n实验结果表明，Ref-GS在多个开放世界场景中实现了卓越的真实感渲染，同时能够准确地恢复场景几何，展现了强大的性能和适用性。\n"
  },
  {
    "path": "abs/2412.01217.md",
    "content": "### RGBDS-SLAM: A RGB-D Semantic Dense SLAM Based on 3D Multi Level Pyramid Gaussian Splatting\n\nHigh-fidelity reconstruction is crucial for dense SLAM. Recent popular methods utilize 3D gaussian splatting (3D GS) techniques for RGB, depth, and semantic reconstruction of scenes. However, these methods ignore issues of detail and consistency in different parts of the scene. To address this, we propose RGBDS-SLAM, a RGB-D semantic dense SLAM system based on 3D multi-level pyramid gaussian splatting, which enables high-fidelity dense reconstruction of scene RGB, depth, and semantics. In this system, we introduce a 3D multi-level pyramid gaussian splatting method that restores scene details by extracting multi-level image pyramids for gaussian splatting training, ensuring consistency in RGB, depth, and semantic reconstructions. Additionally, we design a tightly-coupled multifeatures reconstruction optimization mechanism, allowing the reconstruction accuracy of RGB, depth, and semantic features to mutually enhance each other during the rendering optimization process. Extensive quantitative, qualitative, and ablation experiments on the Replica and ScanNet public datasets demonstrate that our proposed method outperforms current state-of-the-art methods, which achieves great improvement by 11.13% in PSNR and 68.57% in LPIPS.\n\n\n高保真重建对于密集SLAM至关重要。近年来流行的方法利用3D高斯散点（3D Gaussian Splatting, 3D GS）技术对场景的RGB、深度和语义进行重建。然而，这些方法在场景不同部分的细节和一致性问题上存在忽视。为了解决这些问题，我们提出了RGBDS-SLAM，这是一种基于3D多级金字塔高斯散点的RGB-D语义密集SLAM系统，实现了场景RGB、深度和语义的高保真密集重建。\n在该系统中，我们引入了一种3D多级金字塔高斯散点方法，通过提取多级图像金字塔进行高斯散点训练，恢复场景细节，并确保RGB、深度和语义重建的一致性。此外，我们设计了一种紧耦合的多特征重建优化机制，使RGB、深度和语义特征在渲染优化过程中能够相互增强重建精度。\n在Replica和ScanNet公共数据集上的大量定量、定性和消融实验表明，我们提出的方法优于现有最先进方法，在PSNR指标上提升了11.13%，在LPIPS指标上提升了68.57%。\n"
  },
  {
    "path": "abs/2412.01402.md",
    "content": "### ULSR-GS: Ultra Large-scale Surface Reconstruction Gaussian Splatting with Multi-View Geometric Consistency\n\nWhile Gaussian Splatting (GS) demonstrates efficient and high-quality scene rendering and small area surface extraction ability, it falls short in handling large-scale aerial image surface extraction tasks. To overcome this, we present ULSR-GS, a framework dedicated to high-fidelity surface extraction in ultra-large-scale scenes, addressing the limitations of existing GS-based mesh extraction methods. Specifically, we propose a point-to-photo partitioning approach combined with a multi-view optimal view matching principle to select the best training images for each sub-region. Additionally, during training, ULSR-GS employs a densification strategy based on multi-view geometric consistency to enhance surface extraction details. Experimental results demonstrate that ULSR-GS outperforms other state-of-the-art GS-based works on large-scale aerial photogrammetry benchmark datasets, significantly improving surface extraction accuracy in complex urban environments.\n\n尽管高斯喷溅（Gaussian Splatting, GS）在高效高质量场景渲染和小面积表面提取方面表现出色，但在处理大规模航拍图像的表面提取任务时存在不足。为了解决这一问题，我们提出了ULSR-GS，一个专注于超大规模场景高保真表面提取的框架，用于克服现有基于GS的网格提取方法的局限性。\n具体而言，我们提出了一种点到照片的分区方法（point-to-photo partitioning），结合多视图最佳视角匹配原则，选择每个子区域的最佳训练图像。此外，在训练过程中，ULSR-GS采用了一种基于多视图几何一致性的稠密化策略，以增强表面提取细节。\n实验结果表明，ULSR-GS在大规模航拍摄影测量基准数据集上的表现显著优于其他最先进的基于GS的方法，在复杂城市环境中大幅提高了表面提取的准确性。\n"
  },
  {
    "path": "abs/2412.01543.md",
    "content": "### 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting\n\nEfficient and accurate object pose estimation is an essential component for modern vision systems in many applications such as Augmented Reality, autonomous driving, and robotics. While research in model-based 6D object pose estimation has delivered promising results, model-free methods are hindered by the high computational load in rendering and inferring consistent poses of arbitrary objects in a live RGB-D video stream. To address this issue, we present 6DOPE-GS, a novel method for online 6D object pose estimation \\& tracking with a single RGB-D camera by effectively leveraging advances in Gaussian Splatting. Thanks to the fast differentiable rendering capabilities of Gaussian Splatting, 6DOPE-GS can simultaneously optimize for 6D object poses and 3D object reconstruction. To achieve the necessary efficiency and accuracy for live tracking, our method uses incremental 2D Gaussian Splatting with an intelligent dynamic keyframe selection procedure to achieve high spatial object coverage and prevent erroneous pose updates. We also propose an opacity statistic-based pruning mechanism for adaptive Gaussian density control, to ensure training stability and efficiency. We evaluate our method on the HO3D and YCBInEOAT datasets and show that 6DOPE-GS matches the performance of state-of-the-art baselines for model-free simultaneous 6D pose tracking and reconstruction while providing a 5× speedup. We also demonstrate the method's suitability for live, dynamic object tracking and reconstruction in a real-world setting.\n\n高效且精准的物体位姿估计是增强现实（AR）、自动驾驶和机器人等许多应用中现代视觉系统的核心组件。尽管基于模型的6D物体位姿估计研究取得了令人鼓舞的成果，但无模型方法由于需要在实时RGB-D视频流中渲染和推断任意物体的一致位姿，通常面临较高的计算负担。\n为解决这一问题，我们提出了6DOPE-GS，一种利用单个RGB-D相机进行在线6D物体位姿估计与跟踪的新方法，充分利用了高斯散点（Gaussian Splatting）的技术进步。借助高斯散点的快速可微渲染能力，6DOPE-GS能够同时优化6D物体位姿和3D物体重建。\n为了实现实时跟踪所需的效率和精度，我们的方法采用增量式2D高斯散点结合智能动态关键帧选择策略，以实现高空间覆盖率并避免错误的位姿更新。此外，我们提出了一种基于不透明度统计的修剪机制，用于自适应高斯密度控制，以确保训练的稳定性与效率。\n我们在HO3D和YCBInEOAT数据集上对该方法进行了评估，结果表明，6DOPE-GS在无模型的6D位姿跟踪和重建任务中表现与最先进的基线方法相当，同时提供了5倍的速度提升。此外，我们还展示了该方法在实时动态物体跟踪和重建的真实场景中的适用性。\n"
  },
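The opacity-statistic pruning named in the abstract can be pictured as a running statistic per Gaussian with a cull threshold. A hedged sketch under assumed momentum and threshold values (the paper's exact statistic may differ):

```python
# Sketch: prune Gaussians whose opacity stays near zero over training.
import torch

class OpacityPruner:
    def __init__(self, num_gaussians, momentum=0.9, thresh=0.02):
        self.stat = torch.zeros(num_gaussians)   # running opacity estimate
        self.momentum, self.thresh = momentum, thresh

    def update(self, opacity):                   # opacity: (N,) after sigmoid
        self.stat = self.momentum * self.stat + (1 - self.momentum) * opacity

    def keep_mask(self):
        return self.stat > self.thresh           # False -> prune this Gaussian

opacity = torch.sigmoid(torch.randn(10000))
pruner = OpacityPruner(10000)
for _ in range(50):                              # e.g. once per training step
    pruner.update(opacity)
mask = pruner.keep_mask()
print(f"kept {mask.sum().item()} / {mask.numel()} Gaussians")
```

Dropping persistently transparent primitives keeps the density of the incremental 2DGS model bounded, which is what the abstract credits for training stability and efficiency during live tracking.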
  {
    "path": "abs/2412.01553.md",
    "content": "### SfM-Free 3D Gaussian Splatting via Hierarchical Training\n\nStandard 3D Gaussian Splatting (3DGS) relies on known or pre-computed camera poses and a sparse point cloud, obtained from structure-from-motion (SfM) preprocessing, to initialize and grow 3D Gaussians. We propose a novel SfM-Free 3DGS (SFGS) method for video input, eliminating the need for known camera poses and SfM preprocessing. Our approach introduces a hierarchical training strategy that trains and merges multiple 3D Gaussian representations -- each optimized for specific scene regions -- into a single, unified 3DGS model representing the entire scene. To compensate for large camera motions, we leverage video frame interpolation models. Additionally, we incorporate multi-source supervision to reduce overfitting and enhance representation. Experimental results reveal that our approach significantly surpasses state-of-the-art SfM-free novel view synthesis methods. On the Tanks and Temples dataset, we improve PSNR by an average of 2.25dB, with a maximum gain of 3.72dB in the best scene. On the CO3D-V2 dataset, we achieve an average PSNR boost of 1.74dB, with a top gain of 3.90dB.\n\n标准的三维高斯散点（3D Gaussian Splatting, 3DGS）依赖于已知或预计算的相机位姿以及通过结构化运动（SfM）预处理获得的稀疏点云，用于初始化和扩展3D高斯。我们提出了一种面向视频输入的全新SfM-Free 3DGS（SFGS）方法，消除了对已知相机位姿和SfM预处理的依赖。\n我们的方法引入了一种分层训练策略，通过训练和合并多个针对特定场景区域优化的3D高斯表示，生成一个统一的3DGS模型来表示整个场景。为应对大范围相机运动，我们利用了视频帧插值模型。此外，我们结合多源监督，降低过拟合风险并增强场景表示能力。\n实验结果表明，我们的方法显著优于当前最先进的无SfM新视角合成方法。在Tanks and Temples数据集上，我们的PSNR平均提升了2.25dB，单场景最高提升达3.72dB。在CO3D-V2数据集上，我们的平均PSNR提升了1.74dB，最大增幅达3.90dB。\n\n"
  },
  {
    "path": "abs/2412.01583.md",
    "content": "### 3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting\n\nThe creation of 3D scenes has traditionally been both labor-intensive and costly, requiring designers to meticulously configure 3D assets and environments. Recent advancements in generative AI, including text-to-3D and image-to-3D methods, have dramatically reduced the complexity and cost of this process. However, current techniques for editing complex 3D scenes continue to rely on generally interactive multi-step, 2D-to-3D projection methods and diffusion-based techniques, which often lack precision in control and hamper real-time performance. In this work, we propose 3DSceneEditor, a fully 3D-based paradigm for real-time, precise editing of intricate 3D scenes using Gaussian Splatting. Unlike conventional methods, 3DSceneEditor operates through a streamlined 3D pipeline, enabling direct manipulation of Gaussians for efficient, high-quality edits based on input prompts.The proposed framework (i) integrates a pre-trained instance segmentation model for semantic labeling; (ii) employs a zero-shot grounding approach with CLIP to align target objects with user prompts; and (iii) applies scene modifications, such as object addition, repositioning, recoloring, replacing, and deletion directly on Gaussians. Extensive experimental results show that 3DSceneEditor achieves superior editing precision and speed with respect to current SOTA 3D scene editing approaches, establishing a new benchmark for efficient and interactive 3D scene customization.\n\n\n3D场景的创建传统上既耗时又昂贵，需要设计者精心配置3D资产和环境。近年来，生成式人工智能的进步（如文本到3D和图像到3D方法）显著降低了这一过程的复杂性和成本。然而，当前用于编辑复杂3D场景的技术依然依赖于交互性强、多步骤的2D到3D投影方法和扩散模型技术，这些方法往往缺乏精确控制并且阻碍了实时性能的实现。\n在本研究中，我们提出了3DSceneEditor，一种基于完全3D的实时精确编辑复杂3D场景的范式，利用高斯散点（Gaussian Splatting）实现。与传统方法不同，3DSceneEditor通过精简的3D管道运行，允许直接操控高斯，以高效、高质量地根据输入提示进行编辑。\n所提出的框架具有以下特点：(i) 集成了预训练的实例分割模型以进行语义标注；(ii) 使用基于CLIP的零样本（zero-shot）定位方法，将目标对象与用户提示对齐；(iii) 支持直接在高斯上进行场景修改，如对象添加、重新定位、重新着色、替换和删除。\n大量实验结果表明，3DSceneEditor在编辑精度和速度方面显著优于当前最先进的3D场景编辑方法，为高效、交互式3D场景定制设立了新的标杆。\n"
  },
  {
    "path": "abs/2412.01745.md",
    "content": "### Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes\n\nSeamless integration of both aerial and street view images remains a significant challenge in neural scene reconstruction and rendering. Existing methods predominantly focus on single domain, limiting their applications in immersive environments, which demand extensive free view exploration with large view changes both horizontally and vertically. We introduce Horizon-GS, a novel approach built upon Gaussian Splatting techniques, tackles the unified reconstruction and rendering for aerial and street views. Our method addresses the key challenges of combining these perspectives with a new training strategy, overcoming viewpoint discrepancies to generate high-fidelity scenes. We also curate a high-quality aerial-to-ground views dataset encompassing both synthetic and real-world scene to advance further research. Experiments across diverse urban scene datasets confirm the effectiveness of our method.\n\n在神经场景重建与渲染中，实现航拍视角与街景视角的无缝融合仍然是一项重大挑战。现有方法大多专注于单一视角领域，限制了其在需要大范围自由视角探索（包括水平和垂直大视角变化）的沉浸式环境中的应用。我们提出了Horizon-GS，一种基于高斯散点（Gaussian Splatting）技术的新方法，旨在实现航拍与街景视角的统一重建与渲染。\n该方法针对将这两种视角融合的核心挑战，引入了一种全新的训练策略，克服了视角差异问题，从而生成高保真的场景。此外，我们精心构建了一个高质量的“航拍到地面视角”数据集，涵盖合成和真实场景，以推动相关研究的进一步发展。\n在多个城市场景数据集上的实验结果验证了我们方法的有效性，展现了其在高质量视角融合与场景重建上的强大性能。\n\n"
  },
  {
    "path": "abs/2412.01807.md",
    "content": "### Occam's LGS: A Simple Approach for Language Gaussian Splatting\n\nGaussian Splatting is a widely adopted approach for 3D scene representation that offers efficient, high-quality 3D reconstruction and rendering. A major reason for the success of 3DGS is its simplicity of representing a scene with a set of Gaussians, which makes it easy to interpret and adapt. To enhance scene understanding beyond the visual representation, approaches have been developed that extend 3D Gaussian Splatting with semantic vision-language features, especially allowing for open-set tasks. In this setting, the language features of 3D Gaussian Splatting are often aggregated from multiple 2D views. Existing works address this aggregation problem using cumbersome techniques that lead to high computational cost and training time. In this work, we show that the sophisticated techniques for language-grounded 3D Gaussian Splatting are simply unnecessary. Instead, we apply Occam's razor to the task at hand and perform weighted multi-view feature aggregation using the weights derived from the standard rendering process, followed by a simple heuristic-based noisy Gaussian filtration. Doing so offers us state-of-the-art results with a speed-up of two orders of magnitude. We showcase our results in two commonly used benchmark datasets: LERF and 3D-OVS. Our simple approach allows us to perform reasoning directly in the language features, without any compression whatsoever. Such modeling in turn offers easy scene manipulation, unlike the existing methods -- which we illustrate using an application of object insertion in the scene. Furthermore, we provide a thorough discussion regarding the significance of our contributions within the context of the current literature.\n\n高斯喷溅（Gaussian Splatting）是一种广泛应用于三维场景表示的方法，以其高效性和高质量的3D重建与渲染能力而备受认可。3DGS取得成功的主要原因在于其简单性——通过一组高斯来表示场景，使其易于理解和适配。为了在视觉表现之外增强场景理解，一些方法扩展了3DGS，引入了语义视觉-语言特征，特别是支持开放集任务的能力。在这一背景下，3DGS的语言特征通常通过多个2D视角进行聚合。然而，现有方法在处理特征聚合问题时依赖复杂的技术，导致计算成本和训练时间居高不下。\n在本研究中，我们表明，对于语言驱动的3DGS而言，这些复杂技术是完全不必要的。相反，我们借助奥卡姆剃刀原则，采用基于标准渲染过程权重的加权多视角特征聚合方法，并结合简单的启发式噪声高斯过滤。此方法不仅实现了最先进的性能，还实现了两个数量级的加速。\n我们在两个常用的基准数据集（LERF和3D-OVS）上展示了我们的成果。我们的简单方法允许直接在语言特征中进行推理，无需任何压缩操作。这种建模方法还支持轻松的场景操作，与现有方法不同——我们通过场景中对象插入的应用示例说明了这一点。此外，我们深入讨论了我们的贡献在当前文献中的意义，为3DGS在语义和语言扩展中的进一步发展提供了启示。\n"
  },
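Occam's LGS's central simplification, aggregating 2D language features onto Gaussians with the blending weights the renderer already computes, is easy to state in code. A minimal sketch with an assumed tensor layout (one record per Gaussian/pixel contribution, collected across training views; this is not the released implementation):

```python
# Sketch: lift 2D language features onto Gaussians as a weighted mean, where
# the weights are the per-pixel alpha-blending weights from standard rendering.
import numpy as np

def aggregate_language_features(weights, pix_ids, gauss_ids, feats_2d, num_gaussians):
    """weights[k]: blending weight of Gaussian gauss_ids[k] at pixel pix_ids[k];
    feats_2d[p]: (D,) language feature of pixel p. Returns (G, D) features."""
    D = feats_2d.shape[1]
    acc = np.zeros((num_gaussians, D))
    norm = np.zeros(num_gaussians)
    np.add.at(acc, gauss_ids, weights[:, None] * feats_2d[pix_ids])
    np.add.at(norm, gauss_ids, weights)
    return acc / np.maximum(norm[:, None], 1e-8)  # weighted mean -- no learning at all

# Toy example: 3 Gaussians, 4 pixels with 8-dim CLIP-like features.
w   = np.array([0.7, 0.3, 0.9, 0.5, 0.5])
pix = np.array([0, 0, 1, 2, 3])
gs  = np.array([0, 1, 0, 2, 2])
f2d = np.random.rand(4, 8)
print(aggregate_language_features(w, pix, gs, f2d, 3).shape)  # (3, 8)
```

Because the aggregation is a single weighted average rather than an optimization, the two-orders-of-magnitude speed-up the abstract reports is plausible on its face, and keeping features uncompressed is what allows reasoning directly in the language space.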
  {
    "path": "abs/2412.01823.md",
    "content": "### HDGS: Textured 2D Gaussian Splatting for Enhanced Scene Rendering\n\nRecent advancements in neural rendering, particularly 2D Gaussian Splatting (2DGS), have shown promising results for jointly reconstructing fine appearance and geometry by leveraging 2D Gaussian surfels. However, current methods face significant challenges when rendering at arbitrary viewpoints, such as anti-aliasing for down-sampled rendering, and texture detail preservation for high-resolution rendering. We proposed a novel method to align the 2D surfels with texture maps and augment it with per-ray depth sorting and fisher-based pruning for rendering consistency and efficiency. With correct order, per-surfel texture maps significantly improve the capabilities to capture fine details. Additionally, to render high-fidelity details in varying viewpoints, we designed a frustum-based sampling method to mitigate the aliasing artifacts. Experimental results on benchmarks and our custom texture-rich dataset demonstrate that our method surpasses existing techniques, particularly in detail preservation and anti-aliasing.\n\n近年来，神经渲染技术，特别是二维高斯散点（2D Gaussian Splatting, 2DGS），通过利用二维高斯表面元素（surfels），在细节外观和几何的联合重建方面展现了令人瞩目的成果。然而，当前方法在任意视角渲染时仍面临诸多挑战，例如在下采样渲染时的抗锯齿处理以及高分辨率渲染时的纹理细节保留。\n为解决这些问题，我们提出了一种新颖的方法，将二维表面元素与纹理贴图对齐，并通过每光线深度排序和基于Fisher准则的修剪增强渲染一致性和效率。在正确的顺序下，基于每表面元素的纹理贴图显著提升了捕获细节的能力。此外，为了在不同视角下渲染高保真细节，我们设计了一种基于视锥的采样方法，有效减轻了锯齿伪影。\n在基准测试数据集以及我们自定义的高纹理细节数据集上的实验结果表明，我们的方法在细节保留和抗锯齿方面显著优于现有技术，为高保真渲染设立了新的标杆。\n"
  },
  {
    "path": "abs/2412.01931.md",
    "content": "### Planar Gaussian Splatting\n\nThis paper presents Planar Gaussian Splatting (PGS), a novel neural rendering approach to learn the 3D geometry and parse the 3D planes of a scene, directly from multiple RGB images. The PGS leverages Gaussian primitives to model the scene and employ a hierarchical Gaussian mixture approach to group them. Similar Gaussians are progressively merged probabilistically in the tree-structured Gaussian mixtures to identify distinct 3D plane instances and form the overall 3D scene geometry. In order to enable the grouping, the Gaussian primitives contain additional parameters, such as plane descriptors derived by lifting 2D masks from a general 2D segmentation model and surface normals. Experiments show that the proposed PGS achieves state-of-the-art performance in 3D planar reconstruction without requiring either 3D plane labels or depth supervision. In contrast to existing supervised methods that have limited generalizability and struggle under domain shift, PGS maintains its performance across datasets thanks to its neural rendering and scene-specific optimization mechanism, while also being significantly faster than existing optimization-based approaches.\n\n本文提出了平面高斯喷溅（Planar Gaussian Splatting, PGS），一种新颖的神经渲染方法，用于从多张RGB图像直接学习场景的三维几何和解析三维平面结构。PGS利用高斯原语对场景进行建模，并采用分层高斯混合方法对这些高斯进行分组。通过树状高斯混合模型的概率性逐步合并，相似的高斯被识别为不同的三维平面实例，从而形成整体的三维场景几何。\n为了实现分组，每个高斯原语包含额外的参数，例如从通用二维分割模型中提升的二维掩码平面描述符和表面法线。实验结果表明，PGS在三维平面重建中实现了最先进的性能，无需三维平面标签或深度监督。与现有的监督方法相比，这些方法通常在领域迁移下表现不佳且泛化能力有限，而PGS得益于其神经渲染和场景特定优化机制，能够在跨数据集的情况下保持优异表现，同时显著快于现有的基于优化的方法。\nPGS为三维平面重建提供了一种无需显式监督的新路径，在保持高效性的同时，拓展了在未标注数据和多场景条件下的应用潜力。\n"
  },
  {
    "path": "abs/2412.02075.md",
    "content": "### Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion\n\n3D scene reconstruction is a foundational problem in computer vision. Despite recent advancements in Neural Implicit Representations (NIR), existing methods often lack editability and compositional flexibility, limiting their use in scenarios requiring high interactivity and object-level manipulation. In this paper, we introduce the Gaussian Object Carver (GOC), a novel, efficient, and scalable framework for object-compositional 3D scene reconstruction. GOC leverages 3D Gaussian Splatting (GS), enriched with monocular geometry priors and multi-view geometry regularization, to achieve high-quality and flexible reconstruction. Furthermore, we propose a zero-shot Object Surface Completion (OSC) model, which uses 3D priors from 3d object data to reconstruct unobserved surfaces, ensuring object completeness even in occluded areas. Experimental results demonstrate that GOC improves reconstruction efficiency and geometric fidelity. It holds promise for advancing the practical application of digital twins in embodied AI, AR/VR, and interactive simulation environments.\n\n三维场景重建是计算机视觉中的一个基础问题。尽管神经隐式表示（Neural Implicit Representations, NIR）取得了显著进展，但现有方法通常缺乏可编辑性和组合灵活性，限制了其在需要高交互性和对象级操作的场景中的应用。\n本文提出了高斯对象雕刻器（Gaussian Object Carver, GOC），这是一种新颖、高效且可扩展的框架，用于对象组成式三维场景重建。GOC结合了三维高斯喷溅（3D Gaussian Splatting, GS）技术，并通过单目几何先验和多视图几何正则化，提供高质量且灵活的重建能力。此外，我们提出了一种零样本对象表面补全（Object Surface Completion, OSC）模型，利用来自三维对象数据的几何先验重建未观测到的表面，从而确保在被遮挡区域中对象的完整性。\n实验结果表明，GOC在重建效率和几何保真度上显著提升，为数字孪生技术在具身人工智能（Embodied AI）、增强现实/虚拟现实（AR/VR）和交互式模拟环境中的实际应用开辟了新的可能性。\n"
  },
  {
    "path": "abs/2412.02140.md",
    "content": "### SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images\n\nLanguage-guided robotic grasping is a rapidly advancing field where robots are instructed using human language to grasp specific objects. However, existing methods often depend on dense camera views and struggle to quickly update scenes, limiting their effectiveness in changeable environments. In contrast, we propose SparseGrasp, a novel open-vocabulary robotic grasping system that operates efficiently with sparse-view RGB images and handles scene updates fastly. Our system builds upon and significantly enhances existing computer vision modules in robotic learning. Specifically, SparseGrasp utilizes DUSt3R to generate a dense point cloud as the initialization for 3D Gaussian Splatting (3DGS), maintaining high fidelity even under sparse supervision. Importantly, SparseGrasp incorporates semantic awareness from recent vision foundation models. To further improve processing efficiency, we repurpose Principal Component Analysis (PCA) to compress features from 2D models. Additionally, we introduce a novel render-and-compare strategy that ensures rapid scene updates, enabling multi-turn grasping in changeable environments. Experimental results show that SparseGrasp significantly outperforms state-of-the-art methods in terms of both speed and adaptability, providing a robust solution for multi-turn grasping in changeable environment.\n\n语言引导的机器人抓取是一个快速发展的领域，通过人类语言指令让机器人抓取特定物体。然而，现有方法通常依赖于密集相机视图，并在快速更新场景时表现不佳，限制了其在变化环境中的有效性。\n与之相比，我们提出了SparseGrasp，一种新颖的开放词汇机器人抓取系统，能够高效处理稀疏视图RGB图像，并快速应对场景更新。SparseGrasp在现有计算机视觉模块的基础上显著增强了机器人学习能力。具体而言，SparseGrasp利用DUSt3R生成稠密点云作为三维高斯散点（3D Gaussian Splatting, 3DGS）的初始化，即使在稀疏监督下仍能保持高保真度。此外，SparseGrasp结合了最新视觉基础模型的语义感知能力。为进一步提高处理效率，我们重新利用主成分分析（PCA）对二维模型特征进行压缩。同时，我们引入了一种新颖的渲染与比较策略（render-and-compare strategy），确保场景快速更新，从而支持多轮抓取任务。\n实验结果表明，SparseGrasp在速度和适应性方面显著优于最先进的方法，为变化环境中的多轮抓取任务提供了鲁棒的解决方案。\n"
  },
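The PCA compression step mentioned in the SparseGrasp abstract is standard and worth seeing concretely: high-dimensional foundation-model features are projected onto their top principal directions before being distilled into the 3DGS field. A short NumPy sketch; the 512-to-16 sizes are assumptions for the example, not the paper's settings.

```python
# Sketch: PCA compression of per-pixel foundation-model features via SVD.
import numpy as np

def pca_compress(feats, k=16):
    """feats: (N, D). Returns (N, k) codes plus (mean, basis) to invert."""
    mean = feats.mean(0)
    centered = feats - mean
    # Right singular vectors = principal directions of the feature cloud.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    basis = Vt[:k]                                       # (k, D)
    return centered @ basis.T, mean, basis

feats = np.random.rand(5000, 512).astype(np.float32)     # e.g. CLIP-like pixel features
codes, mean, basis = pca_compress(feats, k=16)
recon = codes @ basis + mean                             # approximate inverse mapping
print(codes.shape, float(np.abs(recon - feats).mean()))  # (5000, 16), small error
```

Storing 16 floats per Gaussian instead of 512 is what makes the semantic field cheap enough to optimize and query at interactive rates; the saved `(mean, basis)` pair maps compressed codes back to the original feature space when open-vocabulary queries need it.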
  {
    "path": "abs/2412.02245.md",
    "content": "### SparseLGS: Sparse View Language Embedded Gaussian Splatting\n\nRecently, several studies have combined Gaussian Splatting to obtain scene representations with language embeddings for open-vocabulary 3D scene understanding. While these methods perform well, they essentially require very dense multi-view inputs, limiting their applicability in real-world scenarios. In this work, we propose SparseLGS to address the challenge of 3D scene understanding with pose-free and sparse view input images. Our method leverages a learning-based dense stereo model to handle pose-free and sparse inputs, and a three-step region matching approach to address the multi-view semantic inconsistency problem, which is especially important for sparse inputs. Different from directly learning high-dimensional CLIP features, we extract low-dimensional information and build bijections to avoid excessive learning and storage costs. We introduce a reconstruction loss during semantic training to improve Gaussian positions and shapes. To the best of our knowledge, we are the first to address the 3D semantic field problem with sparse pose-free inputs. Experimental results show that SparseLGS achieves comparable quality when reconstructing semantic fields with fewer inputs (3-4 views) compared to previous SOTA methods with dense input. Besides, when using the same sparse input, SparseLGS leads significantly in quality and heavily improves the computation speed (5×speedup).\n\n近年来，一些研究将高斯散点（Gaussian Splatting）与语言嵌入相结合，用于开放词汇的三维场景理解。这些方法尽管表现良好，但通常需要非常密集的多视角输入，从而限制了其在真实世界场景中的适用性。为解决这一问题，我们提出了SparseLGS，一种针对无位姿稀疏视图输入的三维场景理解方法。\nSparseLGS采用基于学习的稠密立体模型来处理无位姿和稀疏输入，并引入三步区域匹配方法以解决多视图语义不一致性问题，这对于稀疏输入尤为重要。不同于直接学习高维的CLIP特征，我们提取低维信息并构建双射关系，以避免过度的学习和存储成本。在语义训练中，我们引入重建损失以优化高斯的位置和形状。\n据我们所知，SparseLGS是首个针对无位姿稀疏输入的三维语义场问题的研究方法。实验结果表明，与现有最先进方法相比，SparseLGS在仅使用3-4个视图的稀疏输入情况下，能够以较少的输入重建出质量可比的语义场。此外，在相同稀疏输入条件下，SparseLGS的质量显著领先，并大幅提升了计算速度（5倍加速）。\n"
  },
  {
    "path": "abs/2412.02249.md",
    "content": "### Multi-robot autonomous 3D reconstruction using Gaussian splatting with Semantic guidance\n\nImplicit neural representations and 3D Gaussian splatting (3DGS) have shown great potential for scene reconstruction. Recent studies have expanded their applications in autonomous reconstruction through task assignment methods. However, these methods are mainly limited to single robot, and rapid reconstruction of large-scale scenes remains challenging. Additionally, task-driven planning based on surface uncertainty is prone to being trapped in local optima. To this end, we propose the first 3DGS-based centralized multi-robot autonomous 3D reconstruction framework. To further reduce time cost of task generation and improve reconstruction quality, we integrate online open-vocabulary semantic segmentation with surface uncertainty of 3DGS, focusing view sampling on regions with high instance uncertainty. Finally, we develop a multi-robot collaboration strategy with mode and task assignments improving reconstruction quality while ensuring planning efficiency. Our method demonstrates the highest reconstruction quality among all planning methods and superior planning efficiency compared to existing multi-robot methods. We deploy our method on multiple robots, and results show that it can effectively plan view paths and reconstruct scenes with high quality.\n\n隐式神经表示（Implicit Neural Representations）和三维高斯散点（3D Gaussian Splatting, 3DGS）在场景重建中展现了巨大潜力。近期研究将其应用扩展至自主重建中的任务分配方法。然而，这些方法主要局限于单机器人场景，对大规模场景的快速重建仍具挑战性。此外，基于表面不确定性的任务驱动规划容易陷入局部最优。\n为此，我们提出了首个基于3DGS的集中式多机器人自主三维重建框架。为进一步减少任务生成的时间成本并提高重建质量，我们将在线开放词汇语义分割与3DGS表面不确定性相结合，将视角采样聚焦于实例不确定性高的区域。最后，我们开发了一种多机器人协作策略，结合模式和任务分配，在保证规划效率的同时提升重建质量。\n实验结果表明，我们的方法在所有规划方法中实现了最高的重建质量，并在规划效率方面显著优于现有多机器人方法。在多机器人实际部署中，我们的方法能够高效规划视角路径并以高质量重建场景，为大规模多机器人三维场景重建提供了强有力的解决方案。\n"
  },
  {
    "path": "abs/2412.02267.md",
    "content": "### GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos\n\nTracking the 6DoF pose of unknown objects in monocular RGB video sequences is crucial for robotic manipulation. However, existing approaches typically rely on accurate depth information, which is non-trivial to obtain in real-world scenarios. Although depth estimation algorithms can be employed, geometric inaccuracy can lead to failures in RGBD-based pose tracking methods. To address this challenge, we introduce GSGTrack, a novel RGB-based pose tracking framework that jointly optimizes geometry and pose. Specifically, we adopt 3D Gaussian Splatting to create an optimizable 3D representation, which is learned simultaneously with a graph-based geometry optimization to capture the object's appearance features and refine its geometry. However, the joint optimization process is susceptible to perturbations from noisy pose and geometry data. Thus, we propose an object silhouette loss to address the issue of pixel-wise loss being overly sensitive to pose noise during tracking. To mitigate the geometric ambiguities caused by inaccurate depth information, we propose a geometry-consistent image pair selection strategy, which filters out low-confidence pairs and ensures robust geometric optimization. Extensive experiments on the OnePose and HO3D datasets demonstrate the effectiveness of GSGTrack in both 6DoF pose tracking and object reconstruction.\n\n在单目RGB视频序列中跟踪未知物体的6自由度（6DoF）位姿对于机器人操作至关重要。然而，现有方法通常依赖于准确的深度信息，而在真实世界场景中获取深度数据并非易事。虽然可以使用深度估计算法，但几何不准确性可能导致基于RGBD的位姿跟踪方法失败。\n为了解决这一问题，我们提出了GSGTrack，一种新颖的基于RGB的位姿跟踪框架，可联合优化几何和位姿。具体来说，我们采用三维高斯散点（3D Gaussian Splatting）来创建可优化的三维表示，该表示在捕获物体外观特征和优化几何的同时，通过基于图的几何优化进行学习。然而，联合优化过程容易受到噪声位姿和几何数据的干扰。为此，我们提出了一种物体轮廓损失（silhouette loss），以解决像素级损失对位姿噪声过于敏感的问题。\n此外，为减轻由深度信息不准确引起的几何模糊性，我们设计了一种几何一致的图像对选择策略（geometry-consistent image pair selection strategy），过滤掉低置信度的图像对，以确保稳健的几何优化。\n在OnePose和HO3D数据集上的大量实验表明，GSGTrack在6DoF位姿跟踪和物体重建任务中均表现出色，显著提升了鲁棒性和精度，为基于RGB的物体位姿跟踪提供了一个有效的解决方案。\n"
  },
  {
    "path": "abs/2412.02493.md",
    "content": "### RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians\n\nReconstructing dynamic scenes with large-scale and complex motions remains a significant challenge. Recent techniques like Neural Radiance Fields and 3D Gaussian Splatting (3DGS) have shown promise but still struggle with scenes involving substantial movement. This paper proposes RelayGS, a novel method based on 3DGS, specifically designed to represent and reconstruct highly dynamic scenes. Our RelayGS learns a complete 4D representation with canonical 3D Gaussians and a compact motion field, consisting of three stages. First, we learn a fundamental 3DGS from all frames, ignoring temporal scene variations, and use a learnable mask to separate the highly dynamic foreground from the minimally moving background. Second, we replicate multiple copies of the decoupled foreground Gaussians from the first stage, each corresponding to a temporal segment, and optimize them using pseudo-views constructed from multiple frames within each segment. These Gaussians, termed Relay Gaussians, act as explicit relay nodes, simplifying and breaking down large-scale motion trajectories into smaller, manageable segments. Finally, we jointly learn the scene's temporal motion and refine the canonical Gaussians learned from the first two stages. We conduct thorough experiments on two dynamic scene datasets featuring large and complex motions, where our RelayGS outperforms state-of-the-arts by more than 1 dB in PSNR, and successfully reconstructs real-world basketball game scenes in a much more complete and coherent manner, whereas previous methods usually struggle to capture the complex motion of players.\n\n重建具有大规模复杂运动的动态场景仍是一个显著的挑战。尽管神经辐射场（NeRF）和三维高斯散点（3D Gaussian Splatting, 3DGS）等技术在该领域表现出潜力，但在处理具有显著运动的场景时仍显不足。\n本文提出了RelayGS，一种基于3DGS的新方法，专为表示和重建高度动态场景而设计。RelayGS通过三阶段学习，构建了完整的四维表示，其中包含规范的三维高斯和紧凑的运动场。第一阶段，我们从所有帧中学习基础的3DGS模型，忽略时间上的场景变化，并利用可学习掩码将剧烈运动的前景与微动的背景分离。第二阶段，我们从第一阶段分离的前景高斯生成多个副本，每个副本对应一个时间段，并通过利用每段内多个帧构建的伪视角进行优化。这些高斯被称为Relay Gaussians，作为显式的中继节点，将大规模运动轨迹分解为更小且可控的片段。第三阶段，我们联合学习场景的时间运动，并对前两阶段学习的规范高斯进行细化优化。\n在包含大规模复杂运动的两个动态场景数据集上的实验表明，RelayGS在PSNR上比现有最先进方法提高了1 dB以上，并成功重建了真实世界中的篮球比赛场景。相比之下，现有方法通常难以捕捉球员复杂的运动，而RelayGS能够以更完整和连贯的方式进行重建，展现了卓越的性能和适应性。\n"
  },
  {
    "path": "abs/2412.02684.md",
    "content": "### AniGS: Animatable Gaussian Avatar from a Single Image with Inconsistent Gaussian Reconstruction\n\nGenerating animatable human avatars from a single image is essential for various digital human modeling applications. Existing 3D reconstruction methods often struggle to capture fine details in animatable models, while generative approaches for controllable animation, though avoiding explicit 3D modeling, suffer from viewpoint inconsistencies in extreme poses and computational inefficiencies. In this paper, we address these challenges by leveraging the power of generative models to produce detailed multi-view canonical pose images, which help resolve ambiguities in animatable human reconstruction. We then propose a robust method for 3D reconstruction of inconsistent images, enabling real-time rendering during inference. Specifically, we adapt a transformer-based video generation model to generate multi-view canonical pose images and normal maps, pretraining on a large-scale video dataset to improve generalization. To handle view inconsistencies, we recast the reconstruction problem as a 4D task and introduce an efficient 3D modeling approach using 4D Gaussian Splatting. Experiments demonstrate that our method achieves photorealistic, real-time animation of 3D human avatars from in-the-wild images, showcasing its effectiveness and generalization capability.\n\n从单张图像生成可动画的人类头像对数字人建模的各类应用至关重要。然而，现有的三维重建方法往往难以捕捉可动画模型中的细节，而基于生成方法的可控动画虽然避免了显式的三维建模，但在极端姿态下容易出现视角不一致性，并且计算效率较低。\n为解决这些问题，本文利用生成模型的强大能力生成细致的多视角规范姿态图像，从而缓解可动画人类重建中的模糊性。接着，我们提出了一种针对不一致图像的鲁棒三维重建方法，在推理过程中实现实时渲染。具体而言，我们调整了一个基于Transformer的视频生成模型，用于生成多视角规范姿态图像和法线贴图，并在大规模视频数据集上进行预训练以提高模型的泛化能力。为解决视角不一致性问题，我们将重建问题重新表述为一个四维任务，并引入了基于**四维高斯散点（4D Gaussian Splatting）**的高效三维建模方法。\n实验结果表明，本文方法能够从现实世界图像中生成真实感的三维人类头像动画，并支持实时渲染，展示了其卓越的效果和泛化能力，为单图像驱动的三维人类建模提供了一种创新且高效的解决方案。\n"
  },
  {
    "path": "abs/2412.02803.md",
    "content": "### Gaussian Splatting Under Attack: Investigating Adversarial Noise in 3D Objects\n\n3D Gaussian Splatting has advanced radiance field reconstruction, enabling high-quality view synthesis and fast rendering in 3D modeling. While adversarial attacks on object detection models are well-studied for 2D images, their impact on 3D models remains underexplored. This work introduces the Masked Iterative Fast Gradient Sign Method (M-IFGSM), designed to generate adversarial noise targeting the CLIP vision-language model. M-IFGSM specifically alters the object of interest by focusing perturbations on masked regions, degrading the performance of CLIP's zero-shot object detection capability when applied to 3D models. Using eight objects from the Common Objects 3D (CO3D) dataset, we demonstrate that our method effectively reduces the accuracy and confidence of the model, with adversarial noise being nearly imperceptible to human observers. The top-1 accuracy in original model renders drops from 95.4% to 12.5% for train images and from 91.2% to 35.4% for test images, with confidence levels reflecting this shift from true classification to misclassification, underscoring the risks of adversarial attacks on 3D models in applications such as autonomous driving, robotics, and surveillance. The significance of this research lies in its potential to expose vulnerabilities in modern 3D vision models, including radiance fields, prompting the development of more robust defenses and security measures in critical real-world applications.\n\n三维高斯散点（3D Gaussian Splatting）在辐射场重建方面取得了显著进展，实现了高质量的视图合成和快速渲染，广泛应用于三维建模领域。然而，尽管对二维图像目标检测模型的对抗攻击已被深入研究，其对三维模型的影响仍未被充分探索。\n本文提出了掩膜迭代快速梯度符号方法（Masked Iterative Fast Gradient Sign Method, M-IFGSM），专为生成针对CLIP视觉-语言模型的对抗性噪声而设计。M-IFGSM通过将扰动聚焦于目标对象的掩膜区域，有效削弱了CLIP在三维模型上的零样本目标检测能力。我们使用了来自**Common Objects 3D (CO3D)**数据集的八个对象，实验表明该方法能够显著降低模型的准确性和置信度，同时对抗性噪声对人类观察者几乎不可见。\n在实验中，原始模型渲染的top-1准确率从训练图像的95.4%降至12.5%，测试图像从91.2%降至35.4%，置信水平从正确分类显著偏向错误分类。这些结果突显了对三维模型（包括辐射场模型）实施对抗性攻击的潜在风险，对自动驾驶、机器人技术和监控等应用具有重要影响。\n本研究的意义在于揭示了现代三维视觉模型的潜在脆弱性，强调了在关键真实世界应用中开发更稳健防御措施和安全机制的必要性。\n"
  },
  {
    "path": "abs/2412.03077.md",
    "content": "### RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos\n\nDynamic view synthesis (DVS) has advanced remarkably in recent years, achieving high-fidelity rendering while reducing computational costs. Despite the progress, optimizing dynamic neural fields from casual videos remains challenging, as these videos do not provide direct 3D information, such as camera trajectories or the underlying scene geometry. In this work, we present RoDyGS, an optimization pipeline for dynamic Gaussian Splatting from casual videos. It effectively learns motion and underlying geometry of scenes by separating dynamic and static primitives, and ensures that the learned motion and geometry are physically plausible by incorporating motion and geometric regularization terms. We also introduce a comprehensive benchmark, Kubric-MRig, that provides extensive camera and object motion along with simultaneous multi-view captures, features that are absent in previous benchmarks. Experimental results demonstrate that the proposed method significantly outperforms previous pose-free dynamic neural fields and achieves competitive rendering quality compared to existing pose-free static neural fields.\n\n动态视图合成（Dynamic View Synthesis, DVS）近年来取得了显著进展，不仅实现了高保真渲染，还降低了计算成本。然而，从普通视频中优化动态神经场仍然具有挑战性，因为这类视频通常缺乏直接的三维信息，例如相机轨迹或场景几何。\n在本文中，我们提出了RoDyGS，一种从普通视频中优化动态高斯散点（Dynamic Gaussian Splatting）的管道。该方法通过分离动态和静态原语，学习场景的运动和底层几何，同时通过引入运动和几何正则化项，确保学习到的运动和几何符合物理可行性。此外，我们构建了一个全面的基准数据集Kubric-MRig，该数据集提供了丰富的相机和物体运动，以及同步的多视角捕捉，这些特性在现有基准中尚未实现。\n实验结果表明，所提出的方法在无位姿动态神经场任务中显著优于现有方法，并在渲染质量上与现有的无位姿静态神经场方法具有竞争力，为动态场景的高效建模提供了新思路。\n"
  },
  {
    "path": "abs/2412.03121.md",
    "content": "### Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting\n\n3D Gaussian splatting (3DGS) has demonstrated impressive 3D reconstruction performance with explicit scene representations. Given the widespread application of 3DGS in 3D reconstruction and generation tasks, there is an urgent need to protect the copyright of 3DGS assets. However, existing copyright protection techniques for 3DGS overlook the usability of 3D assets, posing challenges for practical deployment. Here we describe WaterGS, the first 3DGS watermarking framework that embeds 3D content in 3DGS itself without modifying any attributes of the vanilla 3DGS. To achieve this, we take a deep insight into spherical harmonics (SH) and devise an importance-graded SH coefficient encryption strategy to embed the hidden SH coefficients. Furthermore, we employ a convolutional autoencoder to establish a mapping between the original Gaussian primitives' opacity and the hidden Gaussian primitives' opacity. Extensive experiments indicate that WaterGS significantly outperforms existing 3D steganography techniques, with 5.31% higher scene fidelity and 3X faster rendering speed, while ensuring security, robustness, and user experience.\n\n三维高斯散点（3D Gaussian Splatting, 3DGS）凭借其显式场景表示，在三维重建和生成任务中表现出色。随着3DGS在各类三维重建和生成任务中的广泛应用，保护3DGS资产版权的需求日益紧迫。然而，现有针对3DGS的版权保护技术往往忽略了3D资产的可用性，给实际部署带来了挑战。\n本文提出了WaterGS，这是首个针对3DGS的水印框架，能够在3DGS中嵌入三维内容，而无需修改原始3DGS的任何属性。为实现这一目标，我们深入研究了球谐函数（Spherical Harmonics, SH），设计了一种重要性分级的SH系数加密策略，用于嵌入隐藏的SH系数。此外，我们采用了卷积自编码器（Convolutional Autoencoder），在原始高斯原语的不透明度与隐藏高斯原语的不透明度之间建立了映射关系。\n大量实验表明，WaterGS显著优于现有的三维隐写技术，其场景保真度提高了5.31%，渲染速度提升了3倍，同时确保了水印的安全性、鲁棒性以及用户体验。这一框架为保护3DGS资产版权提供了一种高效、实用的解决方案。\n"
  },
  {
    "path": "abs/2412.03371.md",
    "content": "### SGSST: Scaling Gaussian Splatting StyleTransfer\n\nApplying style transfer to a full 3D environment is a challenging task that has seen many developments since the advent of neural rendering. 3D Gaussian splatting (3DGS) has recently pushed further many limits of neural rendering in terms of training speed and reconstruction quality. This work introduces SGSST: Scaling Gaussian Splatting Style Transfer, an optimization-based method to apply style transfer to pretrained 3DGS scenes. We demonstrate that a new multiscale loss based on global neural statistics, that we name SOS for Simultaneously Optimized Scales, enables style transfer to ultra-high resolution 3D scenes. Not only SGSST pioneers 3D scene style transfer at such high image resolutions, it also produces superior visual quality as assessed by thorough qualitative, quantitative and perceptual comparisons.\n\n将风格迁移应用于完整的三维环境是一项具有挑战性的任务，自神经渲染兴起以来，这一领域取得了许多进展。近年来，三维高斯喷溅（3D Gaussian Splatting, 3DGS）在训练速度和重建质量方面进一步突破了神经渲染的许多限制。\n本文提出了SGSST（Scaling Gaussian Splatting Style Transfer），一种基于优化的方法，用于将风格迁移应用于预训练的3DGS场景。我们设计了一种新的多尺度损失函数，基于全局神经统计信息，将其命名为SOS（Simultaneously Optimized Scales），使得风格迁移能够扩展到超高分辨率的三维场景。该方法不仅在高分辨率3D场景风格迁移上实现了突破，还在视觉质量方面表现卓越。\n通过全面的定性、定量和感知比较，我们证明了SGSST在高分辨率三维场景风格迁移中表现出色，为实现更高质量和更逼真的3D环境风格化开辟了新方向。\n"
  },
  {
    "path": "abs/2412.03378.md",
    "content": "### Volumetrically Consistent 3D Gaussian Rasterization\n\nRecently, 3D Gaussian Splatting (3DGS) has enabled photorealistic view synthesis at high inference speeds. However, its splatting-based rendering model makes several approximations to the rendering equation, reducing physical accuracy. We show that splatting and its approximations are unnecessary, even within a rasterizer; we instead volumetrically integrate 3D Gaussians directly to compute the transmittance across them analytically. We use this analytic transmittance to derive more physically-accurate alpha values than 3DGS, which can directly be used within their framework. The result is a method that more closely follows the volume rendering equation (similar to ray-tracing) while enjoying the speed benefits of rasterization. Our method represents opaque surfaces with higher accuracy and fewer points than 3DGS. This enables it to outperform 3DGS for view synthesis (measured in SSIM and LPIPS). Being volumetrically consistent also enables our method to work out of the box for tomography. We match the state-of-the-art 3DGS-based tomography method with fewer points. Being volumetrically consistent also enables our method to work out of the box for tomography. We match the state-of-the-art 3DGS-based tomography method with fewer points.\n\n近年来，三维高斯喷溅（3D Gaussian Splatting, 3DGS）在高推理速度下实现了逼真的视图合成。然而，其基于散点的渲染模型对渲染方程作出了一些近似，从而降低了物理精确性。本文表明，即使在光栅化框架中，这种散点及其近似也是不必要的；我们通过直接对三维高斯进行体积积分，解析地计算穿透率（transmittance），以实现更精确的渲染。\n我们利用这一解析穿透率推导出比3DGS更物理精确的alpha值，这些值可以直接在其框架中使用。结果是一种更接近体积渲染方程（类似于光线追踪）的方法，同时享有光栅化的速度优势。我们的方法能够以更少的点表示不透明表面，并具有更高的精确度。这使得我们在视图合成（以SSIM和LPIPS测量）方面超越了3DGS。\n此外，由于具有体积一致性，我们的方法可以直接应用于断层成像（tomography），并以更少的点匹配最先进的基于3DGS的断层成像方法。这种体积一致性展示了该方法在逼真渲染和科学应用中的潜力，为物理精确的高效三维渲染和建模提供了新路径。\n"
  },
  {
    "path": "abs/2412.03428.md",
    "content": "### 2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction\n\nThe reconstruction of indoor scenes remains challenging due to the inherent complexity of spatial structures and the prevalence of textureless regions. Recent advancements in 3D Gaussian Splatting have improved novel view synthesis with accelerated processing but have yet to deliver comparable performance in surface reconstruction. In this paper, we introduce 2DGS-Room, a novel method leveraging 2D Gaussian Splatting for high-fidelity indoor scene reconstruction. Specifically, we employ a seed-guided mechanism to control the distribution of 2D Gaussians, with the density of seed points dynamically optimized through adaptive growth and pruning mechanisms. To further improve geometric accuracy, we incorporate monocular depth and normal priors to provide constraints for details and textureless regions respectively. Additionally, multi-view consistency constraints are employed to mitigate artifacts and further enhance reconstruction quality. Extensive experiments on ScanNet and ScanNet++ datasets demonstrate that our method achieves state-of-the-art performance in indoor scene reconstruction.\n\n室内场景的重建因空间结构的复杂性和无纹理区域的普遍存在而充满挑战。近年来，三维高斯喷溅（3D Gaussian Splatting, 3DGS）在加速处理的同时改进了新视角合成，但在表面重建性能上仍未达到同等水平。\n本文提出了2DGS-Room，一种利用二维高斯喷溅（2D Gaussian Splatting）实现高保真室内场景重建的新方法。具体而言，我们采用种子引导机制控制二维高斯的分布，通过自适应增长和修剪机制动态优化种子点的密度。为进一步提升几何精度，我们引入单目深度和法线先验，分别为细节和无纹理区域提供约束。此外，利用多视图一致性约束减少伪影并进一步增强重建质量。\n在ScanNet和ScanNet++数据集上的大量实验表明，2DGS-Room在室内场景重建中达到了当前最先进的性能，显著提高了几何和纹理细节的保真度，为室内场景的高精度重建提供了一种有效解决方案。\n"
  },
  {
    "path": "abs/2412.03473.md",
    "content": "### Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction\n\nReconstructing dynamic urban scenes presents significant challenges due to their intrinsic geometric structures and spatiotemporal dynamics. Existing methods that attempt to model dynamic urban scenes without leveraging priors on potentially moving regions often produce suboptimal results. Meanwhile, approaches based on manual 3D annotations yield improved reconstruction quality but are impractical due to labor-intensive labeling. In this paper, we revisit the potential of 2D semantic maps for classifying dynamic and static Gaussians and integrating spatial and temporal dimensions for urban scene representation. We introduce Urban4D, a novel framework that employs a semantic-guided decomposition strategy inspired by advances in deep 2D semantic map generation. Our approach distinguishes potentially dynamic objects through reliable semantic Gaussians. To explicitly model dynamic objects, we propose an intuitive and effective 4D Gaussian splatting (4DGS) representation that aggregates temporal information through learnable time embeddings for each Gaussian, predicting their deformations at desired timestamps using a multilayer perceptron (MLP). For more accurate static reconstruction, we also design a k-nearest neighbor (KNN)-based consistency regularization to handle the ground surface due to its low-texture characteristic. Extensive experiments on real-world datasets demonstrate that Urban4D not only achieves comparable or better quality than previous state-of-the-art methods but also effectively captures dynamic objects while maintaining high visual fidelity for static elements.\n\n重建动态城市场景因其固有的几何结构和时空动态性而充满挑战。现有不利用潜在动态区域先验的建模方法往往产生次优结果，而基于手动3D标注的方法尽管提高了重建质量，但由于标注过程劳动强度大而不实用。\n本文重新探索了利用二维语义图对动态与静态高斯进行分类，并将空间和时间维度集成以表示城市场景的潜力。我们提出了Urban4D，一种基于语义引导分解策略的创新框架，受到深度二维语义图生成技术进展的启发。该方法通过语义高斯可靠地区分潜在的动态物体。为显式建模动态物体，我们提出了一种直观且高效的**四维高斯散点（4D Gaussian Splatting, 4DGS）**表示方法，通过可学习的时间嵌入为每个高斯聚合时间信息，使用多层感知机（MLP）预测其在目标时间戳下的变形。\n针对静态场景的更精确重建，我们设计了一种基于k近邻（KNN）一致性正则化的策略，以处理地面等低纹理特性区域。大量真实数据集实验表明，Urban4D不仅在质量上与现有最先进方法相当或更优，还能够有效捕捉动态物体，同时保持静态元素的高视觉保真度。\nUrban4D为动态城市场景的高效、精确重建提供了新思路，特别在处理复杂时空变化场景中展现了卓越的表现力。\n"
  },
  {
    "path": "abs/2412.03526.md",
    "content": "### Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos\n\nRecent advancements in static feed-forward scene reconstruction have demonstrated significant progress in high-quality novel view synthesis. However, these models often struggle with generalizability across diverse environments and fail to effectively handle dynamic content. We present BTimer (short for BulletTimer), the first motion-aware feed-forward model for real-time reconstruction and novel view synthesis of dynamic scenes. Our approach reconstructs the full scene in a 3D Gaussian Splatting representation at a given target ('bullet') timestamp by aggregating information from all the context frames. Such a formulation allows BTimer to gain scalability and generalization by leveraging both static and dynamic scene datasets. Given a casual monocular dynamic video, BTimer reconstructs a bullet-time scene within 150ms while reaching state-of-the-art performance on both static and dynamic scene datasets, even compared with optimization-based approaches.\n\n近期在静态前馈场景重建方面的进展显著提升了高质量新视图合成的效果。然而，这些模型通常在适应多样化环境方面表现欠佳，并且难以有效处理动态内容。我们提出了 BTimer（全称 BulletTimer），这是首个面向动态场景实时重建与新视图合成的运动感知前馈模型。\n我们的方法通过聚合所有上下文帧的信息，在指定的目标时间点（即“子弹时间”）以三维高斯点绘（3D Gaussian Splatting）表示形式重建完整场景。这样的设计使 BTimer 能够利用静态和动态场景数据集，获得良好的扩展性和泛化能力。\n面对一个普通的单目动态视频，BTimer 能在 150 毫秒内重建子弹时间场景，并在静态和动态场景数据集上均实现了当前最先进的性能，甚至超越了一些基于优化的方法。\n"
  },
  {
    "path": "abs/2412.03844.md",
    "content": "### HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting\n\nGenerating high-quality novel view renderings of 3D Gaussian Splatting (3DGS) in scenes featuring transient objects is challenging. We propose a novel hybrid representation, termed as HybridGS, using 2D Gaussians for transient objects per image and maintaining traditional 3D Gaussians for the whole static scenes. Note that, the 3DGS itself is better suited for modeling static scenes that assume multi-view consistency, but the transient objects appear occasionally and do not adhere to the assumption, thus we model them as planar objects from a single view, represented with 2D Gaussians. Our novel representation decomposes the scene from the perspective of fundamental viewpoint consistency, making it more reasonable. Additionally, we present a novel multi-view regulated supervision method for 3DGS that leverages information from co-visible regions, further enhancing the distinctions between the transients and statics. Then, we propose a straightforward yet effective multi-stage training strategy to ensure robust training and high-quality view synthesis across various settings. Experiments on benchmark datasets show our state-of-the-art performance of novel view synthesis in both indoor and outdoor scenes, even in the presence of distracting elements.\n\n在包含瞬态物体的场景中生成高质量的新视角渲染是三维高斯散点（3D Gaussian Splatting, 3DGS）的一个挑战。本文提出了一种新颖的混合表示方法，称为HybridGS，利用二维高斯表示每幅图像中的瞬态物体，同时保持传统的三维高斯表示整个静态场景。\n需要注意的是，3DGS更适合建模具有多视图一致性的静态场景，而瞬态物体偶尔出现且不符合多视图一致性的假设。因此，我们将它们建模为单视图平面物体，用二维高斯表示。我们的新表示方法从基本视角一致性的角度对场景进行分解，使其更加合理。\n此外，我们提出了一种新颖的多视图调控监督方法，用于3DGS，通过利用共视区域的信息进一步增强瞬态物体和静态场景之间的区分。随后，我们设计了一种简单但有效的多阶段训练策略，以确保在各种设置下实现稳健的训练和高质量的视角合成。\n在基准数据集上的实验表明，HybridGS在室内和室外场景的新视角合成中表现出色，即使在存在干扰元素的情况下，仍能实现最先进的性能。这表明该方法在同时处理动态和静态场景元素方面具有显著优势。\n"
  },
  {
    "path": "abs/2412.03910.md",
    "content": "### DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D Reconstruction\n\nDynamic scene reconstruction from monocular video is critical for real-world applications. This paper tackles the dual challenges of dynamic novel-view synthesis and 3D geometry reconstruction by introducing a hybrid framework: Deformable Gaussian Splatting and Dynamic Neural Surfaces (DGNS), in which both modules can leverage each other for both tasks. During training, depth maps generated by the deformable Gaussian splatting module guide the ray sampling for faster processing and provide depth supervision within the dynamic neural surface module to improve geometry reconstruction. Simultaneously, the dynamic neural surface directs the distribution of Gaussian primitives around the surface, enhancing rendering quality. To further refine depth supervision, we introduce a depth-filtering process on depth maps derived from Gaussian rasterization. Extensive experiments on public datasets demonstrate that DGNS achieves state-of-the-art performance in both novel-view synthesis and 3D reconstruction.\n\n从单目视频中进行动态场景重建对于实际应用至关重要。本文提出了一种混合框架：可变形高斯点绘与动态神经表面（DGNS），以解决动态新视图合成和三维几何重建的双重挑战。该框架的两个模块能够相互利用，以同时优化这两项任务。\n在训练过程中，由可变形高斯点绘模块生成的深度图引导光线采样，从而加速处理，并为动态神经表面模块提供深度监督以改进几何重建。同时，动态神经表面模块调整高斯原语在表面周围的分布，从而提升渲染质量。\n为了进一步优化深度监督，我们对通过高斯光栅化生成的深度图引入了深度过滤过程。大量在公开数据集上的实验表明，DGNS在新视图合成和三维重建两个任务中均达到了当前最先进的性能。\n"
  },
  {
    "path": "abs/2412.03911.md",
    "content": "### Multi-View Pose-Agnostic Change Localization with Zero Labels\n\nAutonomous agents often require accurate methods for detecting and localizing changes in their environment, particularly when observations are captured from unconstrained and inconsistent viewpoints. We propose a novel label-free, pose-agnostic change detection method that integrates information from multiple viewpoints to construct a change-aware 3D Gaussian Splatting (3DGS) representation of the scene. With as few as 5 images of the post-change scene, our approach can learn additional change channels in a 3DGS and produce change masks that outperform single-view techniques. Our change-aware 3D scene representation additionally enables the generation of accurate change masks for unseen viewpoints. Experimental results demonstrate state-of-the-art performance in complex multi-object scenes, achieving a 1.7× and 1.6× improvement in Mean Intersection Over Union and F1 score respectively over other baselines. We also contribute a new real-world dataset to benchmark change detection in diverse challenging scenes in the presence of lighting variations.\n\n\n自主智能体通常需要准确的方法来检测和定位环境中的变化，尤其是在观察视点不受限制且不一致的情况下。我们提出了一种新颖的、无需标注且与姿态无关的变化检测方法，该方法整合来自多个视点的信息，以构建场景的**变化感知三维高斯点绘（3DGS）**表示。即使仅使用后变化场景的 5 张图像，我们的方法也能在 3DGS 中学习附加的变化通道，并生成优于单视图技术的变化掩码。\n我们的变化感知三维场景表示还能够为未见过的视点生成准确的变化掩码。实验结果表明，该方法在复杂多物体场景中达到了当前最先进的性能，IoU和F1分数分别相比其他基线提高了 1.7 倍和 1.6 倍。此外，我们还贡献了一个新的真实场景数据集，用于在存在光照变化的多样化复杂场景中对变化检测进行基准测试。\n"
  },
  {
    "path": "abs/2412.03934.md",
    "content": "### InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models\n\nWe present InfiniCube, a scalable method for generating unbounded dynamic 3D driving scenes with high fidelity and controllability. Previous methods for scene generation either suffer from limited scales or lack geometric and appearance consistency along generated sequences. In contrast, we leverage the recent advancements in scalable 3D representation and video models to achieve large dynamic scene generation that allows flexible controls through HD maps, vehicle bounding boxes, and text descriptions. First, we construct a map-conditioned sparse-voxel-based 3D generative model to unleash its power for unbounded voxel world generation. Then, we re-purpose a video model and ground it on the voxel world through a set of carefully designed pixel-aligned guidance buffers, synthesizing a consistent appearance. Finally, we propose a fast feed-forward approach that employs both voxel and pixel branches to lift the dynamic videos to dynamic 3D Gaussians with controllable objects. Our method can generate controllable and realistic 3D driving scenes, and extensive experiments validate the effectiveness and superiority of our model.\n\n我们提出了 InfiniCube，一种可扩展的方法，用于生成高保真且可控的无限动态三维驾驶场景。以往的场景生成方法要么受限于生成规模，要么在生成序列中缺乏几何和外观的一致性。相比之下，我们利用了近期在可扩展三维表示和视频模型方面的进展，实现了大型动态场景生成，并通过高清地图（HD maps）、车辆边界框和文本描述实现灵活控制。\n首先，我们构建了一个基于地图约束的稀疏体素三维生成模型，释放其在生成无限体素世界中的潜力。接着，我们重新设计了一个视频模型，并通过一组精心设计的像素对齐引导缓冲器将其锚定在体素世界中，以合成一致的外观。最后，我们提出了一种快速前馈方法，结合体素分支和像素分支，将动态视频提升为包含可控对象的动态三维高斯表示。\n我们的方法能够生成可控且逼真的三维驾驶场景，并通过大量实验验证了模型的有效性和优越性。\n"
  },
  {
    "path": "abs/2412.04380.md",
    "content": "### EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding\n\n3D occupancy prediction provides a comprehensive description of the surrounding scenes and has become an essential task for 3D perception. Most existing methods focus on offline perception from one or a few views and cannot be applied to embodied agents which demands to gradually perceive the scene through progressive embodied exploration. In this paper, we formulate an embodied 3D occupancy prediction task to target this practical scenario and propose a Gaussian-based EmbodiedOcc framework to accomplish it. We initialize the global scene with uniform 3D semantic Gaussians and progressively update local regions observed by the embodied agent. For each update, we extract semantic and structural features from the observed image and efficiently incorporate them via deformable cross-attention to refine the regional Gaussians. Finally, we employ Gaussian-to-voxel splatting to obtain the global 3D occupancy from the updated 3D Gaussians. Our EmbodiedOcc assumes an unknown (i.e., uniformly distributed) environment and maintains an explicit global memory of it with 3D Gaussians. It gradually gains knowledge through local refinement of regional Gaussians, which is consistent with how humans understand new scenes through embodied exploration. We reorganize an EmbodiedOcc-ScanNet benchmark based on local annotations to facilitate the evaluation of the embodied 3D occupancy prediction task. Experiments demonstrate that our EmbodiedOcc outperforms existing local prediction methods and accomplishes the embodied occupancy prediction with high accuracy and strong expandability.\n\n三维占据预测能够全面描述周围场景，是三维感知领域的一项核心任务。目前大多数方法专注于基于单视图或少量视图的离线感知，无法满足具身智能体（embodied agents）逐步通过探索感知场景的需求。本文针对这一实际场景，提出了具身三维占据预测任务（embodied 3D occupancy prediction），并设计了基于高斯的 EmbodiedOcc 框架来实现。\n我们以均匀分布的三维语义高斯初始化全局场景，并逐步更新具身智能体观测到的局部区域。对于每次更新，我们从观测图像中提取语义和结构特征，并通过高效的可变形跨注意力机制（deformable cross-attention）整合这些特征，以优化区域高斯表示。最终，通过高斯到体素的点绘（Gaussian-to-voxel splatting）将更新后的三维高斯转化为全局三维占据表示。\nEmbodiedOcc 假设环境未知（即初始为均匀分布），并通过三维高斯显式维护全局记忆。它通过对局部区域的逐步优化来逐渐获取知识，这种方式与人类通过具身探索理解新场景的过程一致。我们基于局部标注重组了 EmbodiedOcc-ScanNet 基准，用于评估具身三维占据预测任务。\n实验表明，EmbodiedOcc 超越了现有局部预测方法，在高精度和强扩展性方面表现出色，成功实现了具身占据预测任务。\n"
  },
  {
    "path": "abs/2412.04433.md",
    "content": "### PBDyG: Position Based Dynamic Gaussians for Motion-Aware Clothed Human Avatars\n\nThis paper introduces a novel clothed human model that can be learned from multiview RGB videos, with a particular emphasis on recovering physically accurate body and cloth movements. Our method, Position Based Dynamic Gaussians (PBDyG), realizes ''movement-dependent'' cloth deformation via physical simulation, rather than merely relying on ''pose-dependent'' rigid transformations. We model the clothed human holistically but with two distinct physical entities in contact: clothing modeled as 3D Gaussians, which are attached to a skinned SMPL body that follows the movement of the person in the input videos. The articulation of the SMPL body also drives physically-based simulation of the clothes' Gaussians to transform the avatar to novel poses. In order to run position based dynamics simulation, physical properties including mass and material stiffness are estimated from the RGB videos through Dynamic 3D Gaussian Splatting. Experiments demonstrate that our method not only accurately reproduces appearance but also enables the reconstruction of avatars wearing highly deformable garments, such as skirts or coats, which have been challenging to reconstruct using existing methods.\n\n本文提出了一种新颖的着装人体模型，可从多视角 RGB 视频中学习，特别强调恢复物理精确的身体与服装运动。我们的方法 基于位置的动态高斯（Position Based Dynamic Gaussians, PBDyG），通过物理模拟实现了“运动相关”的服装变形，而不仅仅依赖于“姿态相关”的刚性变换。\n我们整体建模着装人体，但将其分为两个接触的物理实体：服装被建模为三维高斯，与一个经过蒙皮的 SMPL 人体相连接，SMPL 人体根据输入视频中的人物动作进行移动。同时，SMPL 人体的关节驱动服装高斯的基于物理的模拟，从而将虚拟人转换到新的姿态。\n为了进行基于位置的动力学模拟，物理属性（包括质量和材料刚度）通过动态三维高斯点绘从 RGB 视频中估计而得。实验表明，我们的方法不仅能准确再现外观，还能够重建穿着高度可变形服装（如裙子或外套）的虚拟人，这对于现有方法而言一直是一个挑战。\n"
  },
  {
    "path": "abs/2412.04457.md",
    "content": "### Monocular Dynamic Gaussian Splatting is Fast and Brittle but Smooth Motion Helps\n\nGaussian splatting methods are emerging as a popular approach for converting multi-view image data into scene representations that allow view synthesis. In particular, there is interest in enabling view synthesis for dynamic scenes using only monocular input data -- an ill-posed and challenging problem. The fast pace of work in this area has produced multiple simultaneous papers that claim to work best, which cannot all be true. In this work, we organize, benchmark, and analyze many Gaussian-splatting-based methods, providing apples-to-apples comparisons that prior works have lacked. We use multiple existing datasets and a new instructive synthetic dataset designed to isolate factors that affect reconstruction quality. We systematically categorize Gaussian splatting methods into specific motion representation types and quantify how their differences impact performance. Empirically, we find that their rank order is well-defined in synthetic data, but the complexity of real-world data currently overwhelms the differences. Furthermore, the fast rendering speed of all Gaussian-based methods comes at the cost of brittleness in optimization. We summarize our experiments into a list of findings that can help to further progress in this lively problem setting.\n\n高斯点绘方法正在成为一种流行的技术，用于将多视角图像数据转换为可实现视图合成的场景表示。尤其是在仅使用单目输入数据的情况下实现动态场景的视图合成，这一问题因其病态性和挑战性而备受关注。近期，该领域的快速发展带来了多个几乎同时发表的论文，每篇都声称其方法是最优的，但这些说法显然不可能全部成立。\n在本文中，我们对许多基于高斯点绘的方法进行了组织、基准测试和分析，提供了前人研究中缺乏的“等量齐观”比较。我们使用多个现有数据集，以及一个新设计的教学性合成数据集，该数据集旨在分离影响重建质量的关键因素。我们系统地将高斯点绘方法按运动表示类型进行分类，并量化它们在性能上的差异。\n实验结果表明，在合成数据中，这些方法的优劣顺序非常清晰，但在真实世界的数据中，数据复杂性会掩盖这些差异。此外，尽管所有基于高斯的方法在渲染速度上具有显著优势，但这种速度以优化过程的脆弱性为代价。\n我们将实验总结为一系列发现，为这一活跃的研究领域提供了指导，旨在推动进一步的进展。\n\n"
  },
  {
    "path": "abs/2412.04469.md",
    "content": "### QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos\n\nOnline free-viewpoint video (FVV) streaming is a challenging problem, which is relatively under-explored. It requires incremental on-the-fly updates to a volumetric representation, fast training and rendering to satisfy real-time constraints and a small memory footprint for efficient transmission. If achieved, it can enhance user experience by enabling novel applications, e.g., 3D video conferencing and live volumetric video broadcast, among others. In this work, we propose a novel framework for QUantized and Efficient ENcoding (QUEEN) for streaming FVV using 3D Gaussian Splatting (3D-GS). QUEEN directly learns Gaussian attribute residuals between consecutive frames at each time-step without imposing any structural constraints on them, allowing for high quality reconstruction and generalizability. To efficiently store the residuals, we further propose a quantization-sparsity framework, which contains a learned latent-decoder for effectively quantizing attribute residuals other than Gaussian positions and a learned gating module to sparsify position residuals. We propose to use the Gaussian viewspace gradient difference vector as a signal to separate the static and dynamic content of the scene. It acts as a guide for effective sparsity learning and speeds up training. On diverse FVV benchmarks, QUEEN outperforms the state-of-the-art online FVV methods on all metrics. Notably, for several highly dynamic scenes, it reduces the model size to just 0.7 MB per frame while training in under 5 sec and rendering at 350 FPS.\n\n在线自由视角视频（FVV）流媒体是一项具有挑战性但研究较少的问题。该任务要求对体素表示进行实时增量更新，快速训练和渲染以满足实时约束，同时具有较小的内存占用以实现高效传输。如果实现，该技术能够显著提升用户体验，支持如3D视频会议和实时体积视频直播等新型应用。\n在本文中，我们提出了一种用于流式 FVV 的新框架：基于三维高斯点绘（3D-GS）的量化高效编码框架（QUEEN）。QUEEN 直接在每个时间步学习连续帧之间的高斯属性残差，无需对其施加结构性约束，从而实现高质量的重建和良好的泛化能力。\n为了高效存储残差，我们进一步提出了一种量化-稀疏性框架，其中包含一个学习的潜在解码器，用于有效量化除高斯位置以外的属性残差，以及一个学习的门控模块，用于稀疏化位置残差。此外，我们利用高斯视空间梯度差向量作为信号，将场景的静态和动态内容分离。此信号可以引导有效的稀疏学习并加速训练。\n在多个 FVV 基准数据集上，QUEEN 在所有指标上均超越现有最先进的在线 FVV 方法。尤其是在一些高度动态的场景中，QUEEN 将模型大小降低到每帧仅 0.7 MB，训练时间少于 5 秒，并以 350 FPS 的速度渲染。\n"
  },
  {
    "path": "abs/2412.04470.md",
    "content": "### Turbo3D: Ultra-fast Text-to-3D Generation\n\nWe present Turbo3D, an ultra-fast text-to-3D system capable of generating high-quality Gaussian splatting assets in under one second. Turbo3D employs a rapid 4-step, 4-view diffusion generator and an efficient feed-forward Gaussian reconstructor, both operating in latent space. The 4-step, 4-view generator is a student model distilled through a novel Dual-Teacher approach, which encourages the student to learn view consistency from a multi-view teacher and photo-realism from a single-view teacher. By shifting the Gaussian reconstructor's inputs from pixel space to latent space, we eliminate the extra image decoding time and halve the transformer sequence length for maximum efficiency. Our method demonstrates superior 3D generation results compared to previous baselines, while operating in a fraction of their runtime.\n\n我们介绍了 Turbo3D，这是一种超高速文本到3D生成系统，能够在不到一秒的时间内生成高质量的高斯点云资产。Turbo3D采用快速的4步4视角扩散生成器和高效的前馈式高斯重构器，两者均在潜空间中运行。4步4视角生成器是通过一种新颖的双教师（Dual-Teacher）方法蒸馏的学生模型，该方法鼓励学生从多视角教师中学习视角一致性，并从单视角教师中学习照片真实感。通过将高斯重构器的输入从像素空间转移到潜空间，我们消除了额外的图像解码时间，并将变换器序列长度减半，从而实现了最高效率。与先前的基准方法相比，我们的方法在运行时间大幅缩短的同时，生成了更优质的3D结果。\n"
  },
  {
    "path": "abs/2412.04826.md",
    "content": "### Pushing Rendering Boundaries: Hard Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has demonstrated impressive Novel View Synthesis (NVS) results in a real-time rendering manner. During training, it relies heavily on the average magnitude of view-space positional gradients to grow Gaussians to reduce rendering loss. However, this average operation smooths the positional gradients from different viewpoints and rendering errors from different pixels, hindering the growth and optimization of many defective Gaussians. This leads to strong spurious artifacts in some areas. To address this problem, we propose Hard Gaussian Splatting, dubbed HGS, which considers multi-view significant positional gradients and rendering errors to grow hard Gaussians that fill the gaps of classical Gaussian Splatting on 3D scenes, thus achieving superior NVS results. In detail, we present positional gradient driven HGS, which leverages multi-view significant positional gradients to uncover hard Gaussians. Moreover, we propose rendering error guided HGS, which identifies noticeable pixel rendering errors and potentially over-large Gaussians to jointly mine hard Gaussians. By growing and optimizing these hard Gaussians, our method helps to resolve blurring and needle-like artifacts. Experiments on various datasets demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time efficiency.\n\n3D高斯点云（3D Gaussian Splatting, 3DGS）在实时渲染环境下展示了令人印象深刻的新视角合成（Novel View Synthesis, NVS）效果。在训练过程中，它严重依赖视角空间位置梯度平均幅值来扩展高斯点云以减少渲染损失。然而，这种平均操作会平滑来自不同视点的位置梯度和不同像素的渲染误差，从而阻碍许多有缺陷高斯点云的生长和优化。这导致某些区域出现明显的伪影。为了解决这个问题，我们提出了一种称为硬高斯点云（Hard Gaussian Splatting, HGS）的方法，该方法通过考虑多视角显著位置梯度和渲染误差，生成硬高斯点云，填补经典高斯点云在3D场景中的空白，从而实现更优的新视角合成效果。\n具体来说，我们提出了基于位置梯度驱动的HGS方法，该方法利用多视角显著位置梯度来发现硬高斯点云。此外，我们还提出了基于渲染误差引导的HGS方法，通过识别显著像素渲染误差以及潜在的过大的高斯点云，联合挖掘硬高斯点云。通过扩展和优化这些硬高斯点云，我们的方法有效解决了模糊和针状伪影问题。多种数据集上的实验表明，该方法在保持实时效率的同时，实现了最新的渲染质量。\n"
  },
  {
    "path": "abs/2412.04887.md",
    "content": "### Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction\n\n3D Gaussian Splatting has demonstrated notable success in large-scale scene reconstruction, but challenges persist due to high training memory consumption and storage overhead. Hybrid representations that integrate implicit and explicit features offer a way to mitigate these limitations. However, when applied in parallelized block-wise training, two critical issues arise since reconstruction accuracy deteriorates due to reduced data diversity when training each block independently, and parallel training restricts the number of divided blocks to the available number of GPUs. To address these issues, we propose Momentum-GS, a novel approach that leverages momentum-based self-distillation to promote consistency and accuracy across the blocks while decoupling the number of blocks from the physical GPU count. Our method maintains a teacher Gaussian decoder updated with momentum, ensuring a stable reference during training. This teacher provides each block with global guidance in a self-distillation manner, promoting spatial consistency in reconstruction. To further ensure consistency across the blocks, we incorporate block weighting, dynamically adjusting each block's weight according to its reconstruction accuracy. Extensive experiments on large-scale scenes show that our method consistently outperforms existing techniques, achieving a 12.8% improvement in LPIPS over CityGaussian with much fewer divided blocks and establishing a new state of the art. Project page: this https URL\n\n3D高斯点云（3D Gaussian Splatting）在大规模场景重建中取得了显著成功，但由于训练过程中高内存消耗和存储开销，仍面临诸多挑战。结合隐式与显式特征的混合表示为缓解这些限制提供了可能。然而，在并行分块训练中应用时会出现两个关键问题：由于每个块独立训练导致数据多样性下降，重建精度随之恶化；并且并行训练限制了划分块的数量，受制于可用GPU的数量。\n为了解决这些问题，我们提出了Momentum-GS，这是一种利用基于动量的自蒸馏方法的新颖框架，旨在提升各块之间的一致性和精度，同时将划分块的数量从物理GPU数量中解耦。我们的方法保持一个通过动量更新的教师高斯解码器，在训练过程中提供稳定的参考。该教师模型以自蒸馏的方式为每个块提供全局指导，促进空间一致性的重建。此外，为了进一步保证块间的一致性，我们引入了块权重机制，动态调整每个块的权重以匹配其重建精度。\n在大规模场景上的广泛实验表明，我们的方法在保持较少划分块的同时，性能显著优于现有技术。在LPIPS指标上，相比CityGaussian提升了12.8%，并在领域内树立了新的技术标杆。\n"
  },
  {
    "path": "abs/2412.04955.md",
    "content": "### MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting\n\nReconstructing high-fidelity 3D head avatars is crucial in various applications such as virtual reality. The pioneering methods reconstruct realistic head avatars with Neural Radiance Fields (NeRF), which have been limited by training and rendering speed. Recent methods based on 3D Gaussian Splatting (3DGS) significantly improve the efficiency of training and rendering. However, the surface inconsistency of 3DGS results in subpar geometric accuracy; later, 2DGS uses 2D surfels to enhance geometric accuracy at the expense of rendering fidelity. To leverage the benefits of both 2DGS and 3DGS, we propose a novel method named MixedGaussianAvatar for realistically and geometrically accurate head avatar reconstruction. Our main idea is to utilize 2D Gaussians to reconstruct the surface of the 3D head, ensuring geometric accuracy. We attach the 2D Gaussians to the triangular mesh of the FLAME model and connect additional 3D Gaussians to those 2D Gaussians where the rendering quality of 2DGS is inadequate, creating a mixed 2D-3D Gaussian representation. These 2D-3D Gaussians can then be animated using FLAME parameters. We further introduce a progressive training strategy that first trains the 2D Gaussians and then fine-tunes the mixed 2D-3D Gaussians. We demonstrate the superiority of MixedGaussianAvatar through comprehensive experiments.\n\n重建高保真3D头部头像在虚拟现实等多种应用中至关重要。先前的方法使用神经辐射场（Neural Radiance Fields, NeRF）重建逼真的头部头像，但受限于训练和渲染速度。基于3D高斯点云（3D Gaussian Splatting, 3DGS）的最新方法显著提升了训练和渲染效率。然而，3DGS的表面不一致性导致几何精度不足；后续的2DGS通过2D点云提高了几何精度，但以渲染质量为代价。为同时利用2DGS和3DGS的优势，我们提出了一种名为 MixedGaussianAvatar 的新方法，用于实现逼真且几何准确的头部头像重建。\n我们的核心思路是利用2D高斯点云重建3D头部表面，以确保几何精度。具体而言，我们将2D高斯点云附着在FLAME模型的三角网格上，并在2DGS渲染质量不足的地方附加额外的3D高斯点云，形成混合的2D-3D高斯表示。这些2D-3D高斯点云可以通过FLAME参数进行动画化。\n此外，我们引入了一种渐进式训练策略：首先训练2D高斯点云，然后对混合的2D-3D高斯点云进行微调。综合实验结果表明，MixedGaussianAvatar 在真实感和几何精度上均表现出色，优于现有方法。\n"
  },
  {
    "path": "abs/2412.05256.md",
    "content": "### Extrapolated Urban View Synthesis Benchmark\n\nPhotorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. However, their performance is commonly evaluated using an interpolated setup with highly correlated training and test views. In contrast, extrapolation, where test views largely deviate from training views, remains underexplored, limiting progress in generalizable simulation technology. To address this gap, we leverage publicly available AV datasets with multiple traversals, multiple vehicles, and multiple cameras to build the first Extrapolated Urban View Synthesis (EUVS) benchmark. Meanwhile, we conduct quantitative and qualitative evaluations of state-of-the-art Gaussian Splatting methods across different difficulty levels. Our results show that Gaussian Splatting is prone to overfitting to training views. Besides, incorporating diffusion priors and improving geometry cannot fundamentally improve NVS under large view changes, highlighting the need for more robust approaches and large-scale training. We have released our data to help advance self-driving and urban robotics simulation technology.\n\n逼真的模拟器在基于视觉的自动驾驶车辆（AVs）的训练和评估中至关重要，其核心能力是新视角合成（Novel View Synthesis, NVS）。NVS通过生成多样的未见视角，适应自动驾驶车辆的广泛且连续的姿态分布。近年来，诸如3D高斯点云（3D Gaussian Splatting）等辐射场技术在实时速度下实现了逼真渲染，并广泛用于大规模驾驶场景建模。然而，目前的性能评估通常基于插值设置，训练和测试视角高度相关。相比之下，外推评估（extrapolation），即测试视角与训练视角大幅偏离的情况，尚未得到充分探索，这限制了通用模拟技术的发展。\n为填补这一空白，我们利用公开的自动驾驶数据集，这些数据集包含多次遍历、多辆车和多相机设置，构建了首个 外推城市视角合成基准（Extrapolated Urban View Synthesis, EUVS）。同时，我们对不同难度级别下的最新高斯点云方法进行了定量和定性评估。结果表明，高斯点云方法容易过拟合到训练视角。此外，即使结合扩散先验或改进几何处理，在大幅视角变化下也无法从根本上提升NVS性能，这凸显了对更鲁棒方法和大规模训练的需求。\n我们已公开了相关数据，以推动自动驾驶与城市机器人模拟技术的进一步发展。\n"
  },
  {
    "path": "abs/2412.05546.md",
    "content": "### Radiant: Large-scale 3D Gaussian Rendering based on Hierarchical Framework\n\nWith the advancement of computer vision, the recently emerged 3D Gaussian Splatting (3DGS) has increasingly become a popular scene reconstruction algorithm due to its outstanding performance. Distributed 3DGS can efficiently utilize edge devices to directly train on the collected images, thereby offloading computational demands and enhancing efficiency. However, traditional distributed frameworks often overlook computational and communication challenges in real-world environments, hindering large-scale deployment and potentially posing privacy risks. In this paper, we propose Radiant, a hierarchical 3DGS algorithm designed for large-scale scene reconstruction that considers system heterogeneity, enhancing the model performance and training efficiency. Via extensive empirical study, we find that it is crucial to partition the regions for each edge appropriately and allocate varying camera positions to each device for image collection and training. The core of Radiant is partitioning regions based on heterogeneous environment information and allocating workloads to each device accordingly. Furthermore, we provide a 3DGS model aggregation algorithm that enhances the quality and ensures the continuity of models' boundaries. Finally, we develop a testbed, and experiments demonstrate that Radiant improved reconstruction quality by up to 25.7% and reduced up to 79.6% end-to-end latency.\n\n随着计算机视觉的进步，新兴的3D高斯点云（3D Gaussian Splatting, 3DGS）因其卓越的性能，日益成为受欢迎的场景重建算法。分布式3DGS可以高效利用边缘设备直接对采集的图像进行训练，从而减少计算需求并提高效率。然而，传统的分布式框架通常忽视了现实环境中的计算和通信挑战，阻碍了大规模部署，并可能带来隐私风险。\n本文提出了一种层次化的3DGS算法——Radiant，专为大规模场景重建设计，充分考虑系统异构性，从而提升模型性能和训练效率。通过广泛的实证研究，我们发现，合理划分边缘设备的区域并为每个设备分配不同的摄像机位置进行图像采集和训练至关重要。Radiant的核心在于基于异构环境信息划分区域，并据此分配工作负载到每个设备。此外，我们还提供了一种3DGS模型聚合算法，用于提高重建质量并确保模型边界的连续性。\n最后，我们开发了一个测试平台。实验结果表明，Radiant将重建质量提高了多达25.7%，并将端到端时延减少了多达79.6%，展示了其在大规模场景重建中的显著优势。\n"
  },
  {
    "path": "abs/2412.05548.md",
    "content": "### Street Gaussians without 3D Object Tracker\n\nRealistic scene reconstruction in driving scenarios poses significant challenges due to fast-moving objects. Most existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space and move them based on these poses during rendering. While some approaches attempt to use 3D object trackers to replace manual annotations, the limited generalization of 3D trackers -- caused by the scarcity of large-scale 3D datasets -- results in inferior reconstructions in real-world settings. In contrast, 2D foundation models demonstrate strong generalization capabilities. To eliminate the reliance on 3D trackers and enhance robustness across diverse environments, we propose a stable object tracking module by leveraging associations from 2D deep trackers within a 3D object fusion strategy. We address inevitable tracking errors by further introducing a motion learning strategy in an implicit feature space that autonomously corrects trajectory errors and recovers missed detections. Experimental results on Waymo-NOTR datasets show we achieve state-of-the-art performance.\n\n在驾驶场景中的真实感场景重建中，由于快速移动的物体带来了显著挑战。大多数现有方法依赖于劳动密集的手动标注对象姿态，以在标准空间中重建动态对象，并在渲染过程中根据这些姿态移动对象。一些方法尝试使用3D对象追踪器替代手动标注，但由于缺乏大规模3D数据集，3D追踪器的泛化能力有限，导致其在真实环境中的重建效果不佳。\n相比之下，2D基础模型展示了强大的泛化能力。为消除对3D追踪器的依赖并增强在多样化环境中的鲁棒性，我们提出了一种稳定的对象追踪模块，利用2D深度追踪器的关联性结合3D对象融合策略。针对不可避免的追踪误差，我们进一步引入了一种隐式特征空间中的运动学习策略，该策略能够自主修正轨迹误差并恢复漏检的目标。\n在Waymo-NOTR数据集上的实验结果表明，我们的方法达到了最新的性能水平，显著优于现有方法，展示了在动态对象重建中的卓越效果。\n"
  },
  {
    "path": "abs/2412.05560.md",
    "content": "### Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation\n\nText-to-3D generation is a valuable technology in virtual reality and digital content creation. While recent works have pushed the boundaries of text-to-3D generation, producing high-fidelity 3D objects with inefficient prompts and simulating their physics-grounded motion accurately still remain unsolved challenges. To address these challenges, we present an innovative framework that utilizes the Large Language Model (LLM)-refined prompts and diffusion priors-guided Gaussian Splatting (GS) for generating 3D models with accurate appearances and geometric structures. We also incorporate a continuum mechanics-based deformation map and color regularization to synthesize vivid physics-grounded motion for the generated 3D Gaussians, adhering to the conservation of mass and momentum. By integrating text-to-3D generation with physics-grounded motion synthesis, our framework renders photo-realistic 3D objects that exhibit physics-aware motion, accurately reflecting the behaviors of the objects under various forces and constraints across different materials. Extensive experiments demonstrate that our approach achieves high-quality 3D generations with realistic physics-grounded motion.\n\n文本到3D生成技术在虚拟现实和数字内容创作中具有重要价值。尽管最近的研究推动了文本到3D生成的边界，但高保真3D对象的生成仍然面临低效提示的挑战，同时精确模拟基于物理的运动也尚未完全解决。\n为应对这些挑战，我们提出了一种创新框架，结合了大语言模型（LLM）优化的提示和扩散先验引导的高斯点云（Gaussian Splatting, GS），用于生成具有准确外观和几何结构的3D模型。同时，我们引入了基于连续介质力学的变形映射和颜色正则化方法，为生成的3D高斯点云合成生动的物理运动，遵循质量和动量守恒原则。\n通过将文本到3D生成与基于物理的运动合成相结合，我们的框架能够渲染出逼真的3D对象，这些对象表现出物理感知的运动，准确反映在不同材料下对象在各种力和约束条件下的行为。广泛的实验表明，我们的方法不仅生成了高质量的3D模型，还实现了逼真、基于物理的运动合成效果。\n"
  },
  {
    "path": "abs/2412.05570.md",
    "content": "### Template-free Articulated Gaussian Splatting for Real-time Reposable Dynamic View Synthesis\n\nWhile novel view synthesis for dynamic scenes has made significant progress, capturing skeleton models of objects and re-posing them remains a challenging task. To tackle this problem, in this paper, we propose a novel approach to automatically discover the associated skeleton model for dynamic objects from videos without the need for object-specific templates. Our approach utilizes 3D Gaussian Splatting and superpoints to reconstruct dynamic objects. Treating superpoints as rigid parts, we can discover the underlying skeleton model through intuitive cues and optimize it using the kinematic model. Besides, an adaptive control strategy is applied to avoid the emergence of redundant superpoints. Extensive experiments demonstrate the effectiveness and efficiency of our method in obtaining re-posable 3D objects. Not only can our approach achieve excellent visual fidelity, but it also allows for the real-time rendering of high-resolution images.\n\n尽管动态场景的新视角合成取得了显著进展，但捕获对象的骨架模型并对其重新定位仍然是一项具有挑战性的任务。为解决这一问题，本文提出了一种新颖的方法，能够无需对象特定模板，自动从视频中发现动态对象的相关骨架模型。\n我们的方法利用3D高斯点云（3D Gaussian Splatting）和超点（superpoints）重建动态对象。将超点视为刚性部件，我们通过直观的线索发现底层骨架模型，并通过运动学模型进行优化。此外，我们应用了一种自适应控制策略，以避免冗余超点的出现。\n大量实验表明，我们的方法在获取可重新定位的3D对象方面表现出色，不仅能够实现优异的视觉保真度，还支持高分辨率图像的实时渲染。这表明该方法在动态对象的骨架建模和重定位任务中具有高效性和有效性。\n"
  },
  {
    "path": "abs/2412.05695.md",
    "content": "### WATER-GS: Toward Copyright Protection for 3D Gaussian Splatting via Universal Watermarking\n\n3D Gaussian Splatting (3DGS) has emerged as a pivotal technique for 3D scene representation, providing rapid rendering speeds and high fidelity. As 3DGS gains prominence, safeguarding its intellectual property becomes increasingly crucial since 3DGS could be used to imitate unauthorized scene creations and raise copyright issues. Existing watermarking methods for implicit NeRFs cannot be directly applied to 3DGS due to its explicit representation and real-time rendering process, leaving watermarking for 3DGS largely unexplored. In response, we propose WATER-GS, a novel method designed to protect 3DGS copyrights through a universal watermarking strategy. First, we introduce a pre-trained watermark decoder, treating raw 3DGS generative modules as potential watermark encoders to ensure imperceptibility. Additionally, we implement novel 3D distortion layers to enhance the robustness of the embedded watermark against common real-world distortions of point cloud data. Comprehensive experiments and ablation studies demonstrate that WATER-GS effectively embeds imperceptible and robust watermarks into 3DGS without compromising rendering efficiency and quality. Our experiments indicate that the 3D distortion layers can yield up to a 20% improvement in accuracy rate. Notably, our method is adaptable to different 3DGS variants, including 3DGS compression frameworks and 2D Gaussian splatting.\n\n3D高斯点云（3D Gaussian Splatting, 3DGS）已成为3D场景表示的关键技术，具有快速渲染速度和高保真度。随着3DGS的重要性日益提升，保护其知识产权变得愈加重要，因为3DGS可能被用于模仿未经授权的场景创作，从而引发版权问题。由于3DGS采用显式表示并支持实时渲染，现有针对隐式NeRF的水印方法无法直接应用于3DGS，这使得3DGS的水印保护仍是一个尚未充分探索的领域。\n对此，我们提出了 WATER-GS，一种通过通用水印策略保护3DGS版权的新方法。首先，我们引入了一个预训练的水印解码器，将原始的3DGS生成模块视为潜在的水印编码器，从而确保水印的不可感知性。此外，我们设计了新颖的3D失真层，以增强嵌入水印在点云数据常见真实世界失真下的鲁棒性。\n全面的实验和消融研究表明，WATER-GS 能够在不影响渲染效率和质量的情况下，将不可感知且鲁棒的水印嵌入到3DGS中。实验结果显示，3D失真层可以将水印解码的准确率提高多达20%。值得注意的是，我们的方法适用于不同的3DGS变体，包括3DGS压缩框架和2D高斯点云分布。这为3DGS的版权保护提供了一种有效且通用的解决方案。\n"
  },
  {
    "path": "abs/2412.05700.md",
    "content": "### Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes\n\nRecent advancements in high-fidelity dynamic scene reconstruction have leveraged dynamic 3D Gaussians and 4D Gaussian Splatting for realistic scene representation. However, to make these methods viable for real-time applications such as AR/VR, gaming, and rendering on low-power devices, substantial reductions in memory usage and improvements in rendering efficiency are required. While many state-of-the-art methods prioritize lightweight implementations, they struggle in handling scenes with complex motions or long sequences. In this work, we introduce Temporally Compressed 3D Gaussian Splatting (TC3DGS), a novel technique designed specifically to effectively compress dynamic 3D Gaussian representations. TC3DGS selectively prunes Gaussians based on their temporal relevance and employs gradient-aware mixed-precision quantization to dynamically compress Gaussian parameters. It additionally relies on a variation of the Ramer-Douglas-Peucker algorithm in a post-processing step to further reduce storage by interpolating Gaussian trajectories across frames. Our experiments across multiple datasets demonstrate that TC3DGS achieves up to 67× compression with minimal or no degradation in visual quality.\n\n在高保真动态场景重建领域，动态3D高斯点云（Dynamic 3D Gaussians）和4D高斯点云（4D Gaussian Splatting）技术取得了显著进展，为真实场景表示提供了强大支持。然而，为使这些方法在增强现实（AR）、虚拟现实（VR）、游戏以及低功耗设备上的实时应用中具备可行性，需显著降低内存使用并提高渲染效率。目前许多先进方法优先考虑轻量化实现，但在处理复杂运动场景或长时间序列时往往表现欠佳。\n为解决这一问题，我们提出了 TC3DGS（Temporally Compressed 3D Gaussian Splatting），这是一种专为高效压缩动态3D高斯表示而设计的新技术。TC3DGS通过基于时间相关性的选择性剪枝策略对高斯点进行优化，并采用梯度感知的混合精度量化动态压缩高斯参数。此外，在后处理步骤中，我们引入了一种改进的Ramer-Douglas-Peucker算法，通过跨帧插值高斯轨迹进一步减少存储需求。\n在多个数据集上的实验表明，TC3DGS 在保持视觉质量几乎无损的前提下，实现了高达67倍的压缩效果。这种方法为动态场景的高效表示提供了新的可能性，同时满足实时渲染和资源受限设备的需求。\n"
  },
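The trajectory post-processing step is easiest to see with the standard Ramer-Douglas-Peucker algorithm (TC3DGS uses a variation of it). A self-contained sketch on one Gaussian's per-frame position track; `eps` and the toy track are illustrative:

```python
import numpy as np

def rdp(points, eps):
    """Ramer-Douglas-Peucker simplification of a polyline (e.g. one Gaussian's
    per-frame position track, shape (T, 3)). Returns indices of kept frames;
    dropped frames are later recovered by interpolation."""
    if len(points) < 3:
        return list(range(len(points)))
    start, end = points[0], points[-1]
    chord = end - start
    denom = np.linalg.norm(chord)
    if denom < 1e-12:                       # degenerate chord: use point distance
        dists = np.linalg.norm(points - start, axis=1)
    else:
        # perpendicular distance of every point to the start-end line
        dists = np.linalg.norm(np.cross(points - start, chord), axis=1) / denom
    i = int(np.argmax(dists))
    if dists[i] <= eps:
        return [0, len(points) - 1]         # the chord approximates the segment
    left = rdp(points[: i + 1], eps)        # recurse on both halves
    right = rdp(points[i:], eps)
    return left[:-1] + [i + r for r in right]

# Example: a noisy 60-frame track is reduced to a few keyframes.
t = np.linspace(0, 1, 60)
track = np.stack([t, np.sin(2 * np.pi * t), np.zeros_like(t)], axis=1)
keep = rdp(track, eps=0.05)
print(len(track), "->", len(keep), "control frames")
```

Keeping only the returned control frames and interpolating the rest is what trades a small positional error for large storage savings.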
  {
    "path": "abs/2412.05808.md",
    "content": "### SizeGS: Size-aware Compression of 3D Gaussians with Hierarchical Mixed Precision Quantization\n\nEffective compression technology is crucial for 3DGS to adapt to varying storage and transmission conditions. However, existing methods fail to address size constraints while maintaining optimal quality. In this paper, we introduce SizeGS, a framework that compresses 3DGS within a specified size budget while optimizing visual quality. We start with a size estimator to establish a clear relationship between file size and hyperparameters. Leveraging this estimator, we incorporate mixed precision quantization (MPQ) into 3DGS attributes, structuring MPQ in two hierarchical level -- inter-attribute and intra-attribute -- to optimize visual quality under the size constraint. At the inter-attribute level, we assign bit-widths to each attribute channel by formulating the combinatorial optimization as a 0-1 integer linear program, which can be efficiently solved. At the intra-attribute level, we divide each attribute channel into blocks of vectors, quantizing each vector based on the optimal bit-width derived at the inter-attribute level. Dynamic programming determines block lengths. Using the size estimator and MPQ, we develop a calibrated algorithm to identify optimal hyperparameters in just 10 minutes, achieving a 1.69× efficiency increase with quality comparable to state-of-the-art methods.\n\n高效的压缩技术对于3D高斯点云（3D Gaussian Splatting, 3DGS）适应多变的存储和传输条件至关重要。然而，现有方法在满足存储大小限制的同时保持最佳质量方面表现不足。为此，本文提出 SizeGS，一种框架化方法，能够在指定的大小预算内压缩3DGS，同时优化视觉质量。\n我们首先设计了一个 大小估计器，用于建立文件大小与超参数之间的明确关系。在此基础上，我们将混合精度量化（Mixed Precision Quantization, MPQ）引入3DGS属性，并在两个层级中结构化MPQ：跨属性层级（inter-attribute）和属性内层级（intra-attribute），以在大小限制下优化视觉质量。跨属性层级：通过将组合优化问题表述为0-1整数线性规划，为每个属性通道分配位宽，该问题能够高效求解。属性内层级：将每个属性通道划分为向量块，并基于跨属性层级确定的最佳位宽对每个向量进行量化。块长度则通过动态规划方法确定。\n利用大小估计器和MPQ，我们开发了一种校准算法，仅需10分钟即可识别最佳超参数。在实验中，SizeGS实现了1.69倍的效率提升，并且在视觉质量上可媲美最新的先进方法。这一方法为3DGS在存储受限环境中的高效应用提供了新工具，同时保持了优异的视觉效果。\n"
  },
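The inter-attribute step reads as a multiple-choice 0-1 integer linear program: each channel picks exactly one bit-width, the total size stays under budget, and total distortion is minimized. A toy sketch with SciPy's `milp`; the size/distortion tables below are made up (in SizeGS they would come from the size estimator):

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

def assign_bitwidths(distortion, size, budget):
    """Binary variable x[c,b] = 1 iff channel c uses bit-width choice b."""
    n_ch, n_bw = distortion.shape
    c = distortion.ravel()                           # minimize total distortion
    pick_one = np.kron(np.eye(n_ch), np.ones(n_bw))  # one choice per channel
    cons = [LinearConstraint(pick_one, 1, 1),
            LinearConstraint(size.ravel()[None, :], 0, budget)]
    res = milp(c=c, constraints=cons, bounds=Bounds(0, 1),
               integrality=np.ones_like(c))
    assert res.success
    return res.x.reshape(n_ch, n_bw).argmax(axis=1)  # chosen column per channel

bits = np.array([2, 4, 8])                           # candidate bit-widths
size = np.array([[2., 4., 8.]] * 3)                  # cost per channel/choice
distortion = np.array([[9., 3., 1.], [6., 2., .5], [4., 1.5, .4]])
print(bits[assign_bitwidths(distortion, size, budget=14.0)])
```

The intra-attribute block lengths are then chosen by dynamic programming, which this sketch does not cover.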
  {
    "path": "abs/2412.05908.md",
    "content": "### GBR: Generative Bundle Refinement for High-fidelity Gaussian Splatting and Meshing\n\nGaussian splatting has gained attention for its efficient representation and rendering of 3D scenes using continuous Gaussian primitives. However, it struggles with sparse-view inputs due to limited geometric and photometric information, causing ambiguities in depth, shape, and texture.\nwe propose GBR: Generative Bundle Refinement, a method for high-fidelity Gaussian splatting and meshing using only 4-6 input views. GBR integrates a neural bundle adjustment module to enhance geometry accuracy and a generative depth refinement module to improve geometry fidelity. More specifically, the neural bundle adjustment module integrates a foundation network to produce initial 3D point maps and point matches from unposed images, followed by bundle adjustment optimization to improve multiview consistency and point cloud accuracy. The generative depth refinement module employs a diffusion-based strategy to enhance geometric details and fidelity while preserving the scale. Finally, for Gaussian splatting optimization, we propose a multimodal loss function incorporating depth and normal consistency, geometric regularization, and pseudo-view supervision, providing robust guidance under sparse-view conditions. Experiments on widely used datasets show that GBR significantly outperforms existing methods under sparse-view inputs. Additionally, GBR demonstrates the ability to reconstruct and render large-scale real-world scenes, such as the Pavilion of Prince Teng and the Great Wall, with remarkable details using only 6 views.\n\n高斯点云（Gaussian Splatting）因其利用连续高斯基元进行高效的3D场景表示和渲染而备受关注。然而，在稀疏视角输入的情况下，由于几何和光度信息有限，深度、形状和纹理存在歧义性，导致效果较差。\n\n为了解决这一问题，我们提出了 GBR（Generative Bundle Refinement），一种仅使用4-6个输入视角即可实现高保真高斯点云和网格化重建的方法。GBR集成了一个神经束调整模块和一个生成式深度细化模块，分别用于提高几何精度和细节保真度。\n神经束调整模块：结合基础网络，从未定位的图像中生成初始3D点图和点匹配关系，并通过束调整优化（bundle adjustment）改进多视角一致性和点云精度。生成式深度细化模块：采用基于扩散的策略，增强几何细节和保真度，同时保持尺度一致性。\n此外，为优化高斯点云，我们提出了一种多模态损失函数，结合深度和法线一致性、几何正则化以及伪视角监督，提供在稀疏视角条件下的稳健指导。\n在广泛使用的数据集上的实验表明，GBR在稀疏视角输入情况下显著优于现有方法。此外，GBR展示了在大规模真实场景（如滕王阁和长城）中进行高细节重建和渲染的能力，仅使用6个视角即可实现令人惊叹的细节还原。这表明GBR在稀疏数据条件下具有强大的重建和渲染潜力。\n"
  },
  {
    "path": "abs/2412.05969.md",
    "content": "### Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation\n\nIn this paper, we propose a novel semantic splatting approach based on Gaussian Splatting to achieve efficient and low-latency. Our method projects the RGB attributes and semantic features of point clouds onto the image plane, simultaneously rendering RGB images and semantic segmentation results. Leveraging the explicit structure of point clouds and a one-time rendering strategy, our approach significantly enhances efficiency during optimization and rendering. Additionally, we employ SAM2 to generate pseudo-labels for boundary regions, which often lack sufficient supervision, and introduce two-level aggregation losses at the 2D feature map and 3D spatial levels to improve the view-consistent and spatial continuity.\n\n本文提出了一种基于高斯点云（Gaussian Splatting）的新型语义投影方法，以实现高效、低延迟的渲染与优化。我们的方法将点云的RGB属性和语义特征投影到图像平面，同时生成RGB图像和语义分割结果。通过利用点云的显式结构和一次性渲染策略，我们的方法显著提升了优化和渲染过程的效率。\n此外，为了解决边界区域监督不足的问题，我们引入了 SAM2 生成伪标签，用于增强边界区域的语义信息。与此同时，我们在二维特征图和三维空间两个层次上引入了双重聚合损失，以改善视角一致性和空间连续性。\n这一方法不仅在效率上有显著提升，还在语义分割的准确性和连续性上取得了优异表现，为实时场景渲染和语义理解提供了一种有效的解决方案。\n"
  },
  {
    "path": "abs/2412.06234.md",
    "content": "### Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction\n\nGeneralized feed-forward Gaussian models have achieved significant progress in sparse-view 3D reconstruction by leveraging prior knowledge from large multi-view datasets. However, these models often struggle to represent high-frequency details due to the limited number of Gaussians. While the densification strategy used in per-scene 3D Gaussian splatting (3D-GS) optimization can be adapted to the feed-forward models, it may not be ideally suited for generalized scenarios. In this paper, we propose Generative Densification, an efficient and generalizable method to densify Gaussians generated by feed-forward models. Unlike the 3D-GS densification strategy, which iteratively splits and clones raw Gaussian parameters, our method up-samples feature representations from the feed-forward models and generates their corresponding fine Gaussians in a single forward pass, leveraging the embedded prior knowledge for enhanced generalization. Experimental results on both object-level and scene-level reconstruction tasks demonstrate that our method outperforms state-of-the-art approaches with comparable or smaller model sizes, achieving notable improvements in representing fine details.\n\n基于广义前馈高斯模型的稀疏视角3D重建利用大规模多视角数据集的先验知识取得了显著进展。然而，由于高斯数量有限，这些模型在表示高频细节方面往往表现不足。尽管每场景优化的3D高斯点云（3D-GS）中采用的密化策略可以适配于前馈模型，但在广义场景中可能并不理想。\n本文提出了 生成式密化（Generative Densification），这是一种高效且具备良好泛化能力的方法，用于密化由前馈模型生成的高斯点云。不同于3D-GS密化策略通过迭代地分裂和复制原始高斯参数来实现密化，我们的方法通过一次前向传播对前馈模型的特征表示进行上采样，并生成相应的细化高斯点云，从而利用嵌入的先验知识提升泛化能力。\n在对象级和场景级重建任务上的实验表明，我们的方法在模型大小相当或更小的情况下，性能优于最新方法，在细节表示上取得了显著改进。这种方法不仅提升了前馈模型在稀疏视角条件下的表现，还为高效、细节丰富的3D重建提供了一种通用解决方案。\n"
  },
  {
    "path": "abs/2412.06250.md",
    "content": "### Splatter-360: Generalizable 360∘ Gaussian Splatting for Wide-baseline Panoramic Images\n\nWide-baseline panoramic images are frequently used in applications like VR and simulations to minimize capturing labor costs and storage needs. However, synthesizing novel views from these panoramic images in real time remains a significant challenge, especially due to panoramic imagery's high resolution and inherent distortions. Although existing 3D Gaussian splatting (3DGS) methods can produce photo-realistic views under narrow baselines, they often overfit the training views when dealing with wide-baseline panoramic images due to the difficulty in learning precise geometry from sparse 360∘ views. This paper presents Splatter-360, a novel end-to-end generalizable 3DGS framework designed to handle wide-baseline panoramic images. Unlike previous approaches, Splatter-360 performs multi-view matching directly in the spherical domain by constructing a spherical cost volume through a spherical sweep algorithm, enhancing the network's depth perception and geometry estimation. Additionally, we introduce a 3D-aware bi-projection encoder to mitigate the distortions inherent in panoramic images and integrate cross-view attention to improve feature interactions across multiple viewpoints. This enables robust 3D-aware feature representations and real-time rendering capabilities. Experimental results on the HM3Dhm3d and Replicareplica demonstrate that Splatter-360 significantly outperforms state-of-the-art NeRF and 3DGS methods (e.g., PanoGRF, MVSplat, DepthSplat, and HiSplat) in both synthesis quality and generalization performance for wide-baseline panoramic images.\n\n宽基线全景图像常用于虚拟现实（VR）和模拟等应用场景，以减少采集劳动成本和存储需求。然而，从这些全景图像中实时生成新视角仍然是一项重大挑战，尤其是由于全景图像的高分辨率和固有畸变问题。尽管现有的3D高斯点云（3D Gaussian Splatting, 3DGS）方法能够在窄基线条件下生成逼真的视图，但在处理稀疏360°宽基线全景图像时，由于难以从稀疏视角中学习精确的几何结构，这些方法通常会过拟合训练视图。\n为解决这一问题，本文提出了 Splatter-360，一种面向宽基线全景图像的端到端可泛化3DGS框架。与以往方法不同，Splatter-360 直接在球面域中进行多视图匹配，通过球面扫描算法构建球面代价体，从而增强网络的深度感知和几何估计能力。此外，我们引入了一个3D感知双投影编码器来缓解全景图像的畸变问题，并集成了跨视角注意力机制以改善多视点之间的特征交互。这种设计能够生成稳健的3D感知特征表示，并支持实时渲染。\n在 HM3D 和 Replica 数据集上的实验结果表明，Splatter-360 在宽基线全景图像的新视角合成质量和泛化性能方面，显著优于现有的最新方法（如 PanoGRF、MVSplat、DepthSplat 和 HiSplat）。这一框架不仅提升了合成精度，还为宽基线全景图像的实时处理提供了高效解决方案。\n"
  },
  {
    "path": "abs/2412.06257.md",
    "content": "### Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects\n\n3D Gaussian Splatting (3DGS) has attracted significant attention for its potential to revolutionize 3D representation, rendering, and interaction. Despite the rapid growth of 3DGS research, its direct application to Extended Reality (XR) remains underexplored. Although many studies recognize the potential of 3DGS for XR, few have explicitly focused on or demonstrated its effectiveness within XR environments. In this paper, we aim to synthesize innovations in 3DGS that show specific potential for advancing XR research and development. We conduct a comprehensive review of publicly available 3DGS papers, with a focus on those referencing XR-related concepts. Additionally, we perform an in-depth analysis of innovations explicitly relevant to XR and propose a taxonomy to highlight their significance. Building on these insights, we propose several prospective XR research areas where 3DGS can make promising contributions, yet remain rarely touched. By investigating the intersection of 3DGS and XR, this paper provides a roadmap to push the boundaries of XR using cutting-edge 3DGS techniques.\n\n3D高斯投影（3D Gaussian Splatting, 3DGS）因其在革新3D表示、渲染和交互方面的潜力而备受关注。尽管3DGS研究发展迅速，其在扩展现实（Extended Reality, XR）中的直接应用仍然鲜有探讨。虽然许多研究认识到3DGS在XR领域的潜力，但明确专注于XR环境中验证其有效性的研究却较少。在本文中，我们旨在整合3DGS领域内具有推动XR研究与开发潜力的创新点。我们对公开发表的3DGS论文进行了全面回顾，特别关注那些提及XR相关概念的研究。此外，我们深入分析了与XR明确相关的创新，并提出一个分类法以突出其重要性。在此基础上，我们提出了一些3DGS在XR中可发挥重要作用但鲜有涉足的潜在研究方向。通过探讨3DGS与XR的交集，本文为利用尖端3DGS技术推动XR发展提供了一条清晰的路线图。\n"
  },
  {
    "path": "abs/2412.06273.md",
    "content": "### Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction\n\nPrior works employing pixel-based Gaussian representation have demonstrated efficacy in feed-forward sparse-view reconstruction. However, such representation necessitates cross-view overlap for accurate depth estimation, and is challenged by object occlusions and frustum truncations. As a result, these methods require scene-centric data acquisition to maintain cross-view overlap and complete scene visibility to circumvent occlusions and truncations, which limits their applicability to scene-centric reconstruction. In contrast, in autonomous driving scenarios, a more practical paradigm is ego-centric reconstruction, which is characterized by minimal cross-view overlap and frequent occlusions and truncations. The limitations of pixel-based representation thus hinder the utility of prior works in this task. In light of this, this paper conducts an in-depth analysis of different representations, and introduces Omni-Gaussian representation with tailored network design to complement their strengths and mitigate their drawbacks. Experiments show that our method significantly surpasses state-of-the-art methods, pixelSplat and MVSplat, in ego-centric reconstruction, and achieves comparable performance to prior works in scene-centric reconstruction. Furthermore, we extend our method with diffusion models, pioneering feed-forward multi-modal generation of 3D driving scenes.\n\n以像素为基础的高斯表示法在前人研究中已被证明在前馈稀疏视图重建任务中具有较高的有效性。然而，这种表示需要跨视角重叠以确保深度估计的准确性，并且在处理物体遮挡和视锥体截断问题时面临挑战。因此，这些方法通常需要以场景为中心的数据采集方式，以维持视角重叠和场景的完整可见性，从而绕过遮挡和截断的问题，但这也限制了其在场景中心重建任务中的应用。相比之下，在自动驾驶场景中，更实用的范式是以自我为中心的重建（ego-centric reconstruction），其特点是视角重叠最小化，同时伴随频繁的遮挡和截断现象。像素为基础的表示法的局限性因此制约了前人方法在此任务中的应用。\n针对这一问题，本文深入分析了不同的表示方法，并提出了一种称为全方位高斯表示（Omni-Gaussian representation）的新方法，结合定制化的网络设计，以补充这些方法的优点并减轻其缺点。实验结果表明，我们的方法在以自我为中心的重建任务中显著超越了最先进的方法，如 pixelSplat 和 MVSplat，同时在以场景为中心的重建任务中取得了与前人方法相当的性能。此外，我们将该方法扩展至扩散模型，率先实现了自动驾驶场景中3D的前馈多模态生成。\n"
  },
  {
    "path": "abs/2412.06299.md",
    "content": "### 4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes\n\nReconstructing dynamic scenes from video sequences is a highly promising task in the multimedia domain. While previous methods have made progress, they often struggle with slow rendering and managing temporal complexities such as significant motion and object appearance/disappearance. In this paper, we propose SaRO-GS as a novel dynamic scene representation capable of achieving real-time rendering while effectively handling temporal complexities in dynamic scenes. To address the issue of slow rendering speed, we adopt a Gaussian primitive-based representation and optimize the Gaussians in 4D space, which facilitates real-time rendering with the assistance of 3D Gaussian Splatting. Additionally, to handle temporally complex dynamic scenes, we introduce a Scale-aware Residual Field. This field considers the size information of each Gaussian primitive while encoding its residual feature and aligns with the self-splitting behavior of Gaussian primitives. Furthermore, we propose an Adaptive Optimization Schedule, which assigns different optimization strategies to Gaussian primitives based on their distinct temporal properties, thereby expediting the reconstruction of dynamic regions. Through evaluations on monocular and multi-view datasets, our method has demonstrated state-of-the-art performance.\n\n从视频序列中重建动态场景是多媒体领域中一个非常有前景的任务。尽管现有方法已经取得了一定进展，但它们通常在渲染速度较慢以及应对时间复杂性（如显著的运动以及物体的出现/消失）方面存在挑战。为此，本文提出了一种新的动态场景表示方法——SaRO-GS，该方法能够实现实时渲染，同时有效处理动态场景中的时间复杂性。\n为解决渲染速度慢的问题，我们采用基于高斯原语（Gaussian primitive）的表示方法，并在四维空间中优化高斯分布，通过结合3D高斯投影技术实现实时渲染。此外，为处理时间复杂的动态场景，我们引入了尺度感知残差场（Scale-aware Residual Field）。该场在编码残差特征时考虑了每个高斯原语的尺寸信息，并与高斯原语的自分裂行为相一致。\n此外，我们提出了一种自适应优化调度（Adaptive Optimization Schedule），根据高斯原语的不同时间特性分配优化策略，从而加速动态区域的重建过程。通过对单目和多视角数据集的评估，我们的方法展现出了当前最先进的性能水平。\n"
  },
  {
    "path": "abs/2412.06424.md",
    "content": "### Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video\n\nRecent 4D reconstruction methods have yielded impressive results but rely on sharp videos as supervision. However, motion blur often occurs in videos due to camera shake and object movement, while existing methods render blurry results when using such videos for reconstructing 4D models. Although a few NeRF-based approaches attempted to address the problem, they struggled to produce high-quality results, due to the inaccuracy in estimating continuous dynamic representations within the exposure time. Encouraged by recent works in 3D motion trajectory modeling using 3D Gaussian Splatting (3DGS), we suggest taking 3DGS as the scene representation manner, and propose the first 4D Gaussian Splatting framework to reconstruct a high-quality 4D model from blurry monocular video, named Deblur4DGS. Specifically, we transform continuous dynamic representations estimation within an exposure time into the exposure time estimation. Moreover, we introduce exposure regularization to avoid trivial solutions, as well as multi-frame and multi-resolution consistency ones to alleviate artifacts. Furthermore, to better represent objects with large motion, we suggest blur-aware variable canonical Gaussians. Beyond novel-view synthesis, Deblur4DGS can be applied to improve blurry video from multiple perspectives, including deblurring, frame interpolation, and video stabilization. Extensive experiments on the above four tasks show that Deblur4DGS outperforms state-of-the-art 4D reconstruction methods.\n\nRecent 4D reconstruction methods have yielded impressive results but rely on sharp videos as supervision. However, motion blur often occurs in videos due to camera shake and object movement, while existing methods render blurry results when using such videos for reconstructing 4D models. Although a few NeRF-based approaches attempted to address the problem, they struggled to produce high-quality results, due to the inaccuracy in estimating continuous dynamic representations within the exposure time. Encouraged by recent works in 3D motion trajectory modeling using 3D Gaussian Splatting (3DGS), we suggest taking 3DGS as the scene representation manner, and propose the first 4D Gaussian Splatting framework to reconstruct a high-quality 4D model from blurry monocular video, named Deblur4DGS. Specifically, we transform continuous dynamic representations estimation within an exposure time into the exposure time estimation. Moreover, we introduce exposure regularization to avoid trivial solutions, as well as multi-frame and multi-resolution consistency ones to alleviate artifacts. Furthermore, to better represent objects with large motion, we suggest blur-aware variable canonical Gaussians. Beyond novel-view synthesis, Deblur4DGS can be applied to improve blurry video from multiple perspectives, including deblurring, frame interpolation, and video stabilization. Extensive experiments on the above four tasks show that Deblur4DGS outperforms state-of-the-art 4D reconstruction methods.\n"
  },
  {
    "path": "abs/2412.06767.md",
    "content": "### MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views\n\nWe present a novel appearance model that simultaneously realizes explicit high-quality 3D surface mesh recovery and photorealistic novel view synthesis from sparse view samples. Our key idea is to model the underlying scene geometry Mesh as an Atlas of Charts which we render with 2D Gaussian surfels (MAtCha Gaussians). MAtCha distills high-frequency scene surface details from an off-the-shelf monocular depth estimator and refines it through Gaussian surfel rendering. The Gaussian surfels are attached to the charts on the fly, satisfying photorealism of neural volumetric rendering and crisp geometry of a mesh model, i.e., two seemingly contradicting goals in a single model. At the core of MAtCha lies a novel neural deformation model and a structure loss that preserve the fine surface details distilled from learned monocular depths while addressing their fundamental scale ambiguities. Results of extensive experimental validation demonstrate MAtCha's state-of-the-art quality of surface reconstruction and photorealism on-par with top contenders but with dramatic reduction in the number of input views and computational time. We believe MAtCha will serve as a foundational tool for any visual application in vision, graphics, and robotics that require explicit geometry in addition to photorealism.\n\n我们提出了一种新颖的外观模型，可以同时实现高质量的3D表面网格重建和基于稀疏视图样本的真实感新视角合成。我们的核心思想是将底层场景几何网格（Mesh）建模为一组二维图表构成的图集（Atlas of Charts），并通过二维高斯面元（Gaussian surfels）进行渲染，称为 MAtCha Gaussians。MAtCha 利用现成的单目深度估计器提取场景表面的高频细节，并通过高斯面元渲染进一步优化这些细节。\n高斯面元动态附加到图表上，从而在一个模型中同时实现神经体积渲染的真实感和网格模型的清晰几何结构，即解决了两个看似矛盾的目标。MAtCha 的核心是一种新颖的神经变形模型和结构损失函数，这些创新既保留了从学习的单目深度中提取的细致表面细节，又解决了深度估计固有的尺度歧义问题。\n广泛的实验验证表明，MAtCha 在表面重建质量和真实感方面达到了当前最先进的水平，与顶尖方法相当，同时显著减少了所需的输入视图数量和计算时间。我们相信，MAtCha 将成为视觉、图形和机器人领域中任何需要几何显式表示和真实感的视觉应用的基础工具。\n"
  },
  {
    "path": "abs/2412.06974.md",
    "content": "### MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds\n\nRecent sparse multi-view scene reconstruction advances like DUSt3R and MASt3R no longer require camera calibration and camera pose estimation. However, they only process a pair of views at a time to infer pixel-aligned pointmaps. When dealing with more than two views, a combinatorial number of error prone pairwise reconstructions are usually followed by an expensive global optimization, which often fails to rectify the pairwise reconstruction errors. To handle more views, reduce errors, and improve inference time, we propose the fast single-stage feed-forward network MV-DUSt3R. At its core are multi-view decoder blocks which exchange information across any number of views while considering one reference view. To make our method robust to reference view selection, we further propose MV-DUSt3R+, which employs cross-reference-view blocks to fuse information across different reference view choices. To further enable novel view synthesis, we extend both by adding and jointly training Gaussian splatting heads. Experiments on multi-view stereo reconstruction, multi-view pose estimation, and novel view synthesis confirm that our methods improve significantly upon prior art.\n\n近年来，稀疏多视图场景重建技术（如 DUSt3R 和 MASt3R）已经不再依赖相机校准和相机位姿估计。然而，这些方法每次仅处理一对视图，通过推断像素对齐的点图来进行重建。当处理多个视图时，通常需要对成对视图进行组合数级别的重建，这种方法容易产生误差，并且需要昂贵的全局优化来校正这些成对重建的误差。然而，全局优化往往难以有效解决这些误差问题。\n为处理更多视图、减少误差并提高推理速度，我们提出了单阶段前馈网络 MV-DUSt3R。其核心是多视图解码块（multi-view decoder blocks），能够在参考视图的基础上与任意数量的视图交换信息。为增强对参考视图选择的鲁棒性，我们进一步提出 MV-DUSt3R+，通过跨参考视图块（cross-reference-view blocks）在不同参考视图之间融合信息。\n此外，为实现新视角合成（novel view synthesis），我们扩展了这两种模型，加入并联合训练高斯投影（Gaussian splatting）模块。实验表明，在多视图立体重建、多视图位姿估计和新视角合成任务中，MV-DUSt3R 和 MV-DUSt3R+ 显著优于现有方法，推动了多视图场景重建的性能与效率的新高度。\n"
  },
  {
    "path": "abs/2412.07293.md",
    "content": "### EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering\n\nWe introduce a method for using event camera data in novel view synthesis via Gaussian Splatting. Event cameras offer exceptional temporal resolution and a high dynamic range. Leveraging these capabilities allows us to effectively address the novel view synthesis challenge in the presence of fast camera motion. For initialization of the optimization process, our approach uses prior knowledge encoded in an event-to-video model. We also use spline interpolation for obtaining high quality poses along the event camera trajectory. This enhances the reconstruction quality from fast-moving cameras while overcoming the computational limitations traditionally associated with event-based Neural Radiance Field (NeRF) methods. Our experimental evaluation demonstrates that our results achieve higher visual fidelity and better performance than existing event-based NeRF approaches while being an order of magnitude faster to render.\n\n我们提出了一种基于高斯投影（Gaussian Splatting）的方法，将事件相机数据应用于新视角合成。事件相机具有卓越的时间分辨率和高动态范围，利用这些特性，我们能够有效应对快速相机运动下的新视角合成挑战。\n在优化过程的初始化阶段，我们采用事件到视频的模型（event-to-video model）所编码的先验知识。此外，我们使用样条插值来生成事件相机轨迹中的高质量姿态，从而提升快速运动相机的重建质量，同时克服传统事件驱动的神经辐射场（NeRF）方法所面临的计算限制。\n实验评估表明，与现有的基于事件的 NeRF 方法相比，我们的结果在视觉保真度和性能上表现更佳，同时渲染速度提高了一个数量级。\n"
  },
  {
    "path": "abs/2412.07494.md",
    "content": "### ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery\n\nRecently, 3D Gaussian Splatting (3D-GS) has prevailed in novel view synthesis, achieving high fidelity and efficiency. However, it often struggles to capture rich details and complete geometry. Our analysis highlights a key limitation of 3D-GS caused by the fixed threshold in densification, which balances geometry coverage against detail recovery as the threshold varies. To address this, we introduce a novel densification method, residual split, which adds a downscaled Gaussian as a residual. Our approach is capable of adaptively retrieving details and complementing missing geometry while enabling progressive refinement. To further support this method, we propose a pipeline named ResGS. Specifically, we integrate a Gaussian image pyramid for progressive supervision and implement a selection scheme that prioritizes the densification of coarse Gaussians over time. Extensive experiments demonstrate that our method achieves SOTA rendering quality. Consistent performance improvements can be achieved by applying our residual split on various 3D-GS variants, underscoring its versatility and potential for broader application in 3D-GS-based applications.\n\n近年来，3D高斯投影（3D Gaussian Splatting, 3D-GS）在新视角合成任务中表现突出，兼具高保真度和高效率。然而，其在捕捉丰富细节和完整几何结构方面常显不足。我们的分析表明，3D-GS 的主要限制在于固定的稠密化阈值。该阈值的变化在几何覆盖和细节恢复之间形成权衡，从而影响整体表现。\n为解决这一问题，我们提出了一种新颖的稠密化方法——残差分裂（Residual Split）。该方法通过添加一个缩小尺度的高斯作为残差，能够自适应地恢复细节并补充缺失的几何，同时支持渐进式优化。此外，为配合该方法，我们设计了一条名为 ResGS 的流水线。具体而言，我们集成了高斯图像金字塔，用于渐进式监督，并实现了一种选择机制，优先对粗略高斯进行稠密化处理。\n广泛的实验表明，我们的方法在渲染质量上达到了当前最先进（SOTA）的水平。通过在多种 3D-GS 变体中应用残差分裂，均可实现一致的性能提升，这突显了其广泛适用性及在 3D-GS 应用中的潜力。\n"
  },
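A rough reading of "residual split" in array form; the shrink factor, opacity initialization, and gradient-based selection are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def residual_split(xyz, scale, opacity, mask, shrink=0.5):
    """For each selected Gaussian, keep the original and append a downscaled
    copy that acts as a residual for fine detail (illustrative sketch)."""
    res_xyz = xyz[mask]                       # residual starts at the same center
    res_scale = scale[mask] * shrink          # downscaled extent
    res_opacity = opacity[mask] * 0.5         # start faint; optimization refines it
    return (np.concatenate([xyz, res_xyz]),
            np.concatenate([scale, res_scale]),
            np.concatenate([opacity, res_opacity]))

# Example: densify the two Gaussians with the largest accumulated gradient.
xyz = np.random.randn(100, 3).astype(np.float32)
scale = np.full((100, 3), 0.05, dtype=np.float32)
opacity = np.full((100, 1), 0.8, dtype=np.float32)
grad = np.random.rand(100)                    # stand-in for view-space gradients
mask = grad >= np.sort(grad)[-2]
xyz, scale, opacity = residual_split(xyz, scale, opacity, mask)
print(xyz.shape)                              # (102, 3)
```

Because the original Gaussian is kept untouched, the new small Gaussian only has to explain what the coarse one misses, which is what enables the progressive refinement the abstract describes.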
  {
    "path": "abs/2412.07534.md",
    "content": "### ReCap: Better Gaussian Relighting with Cross-Environment Captures\n\nAccurate 3D objects relighting in diverse unseen environments is crucial for realistic virtual object placement. Due to the albedo-lighting ambiguity, existing methods often fall short in producing faithful relights. Without proper constraints, observed training views can be explained by numerous combinations of lighting and material attributes, lacking physical correspondence with the actual environment maps used for relighting. In this work, we present ReCap, treating cross-environment captures as multi-task target to provide the missing supervision that cuts through the entanglement. Specifically, ReCap jointly optimizes multiple lighting representations that share a common set of material attributes. This naturally harmonizes a coherent set of lighting representations around the mutual material attributes, exploiting commonalities and differences across varied object appearances. Such coherence enables physically sound lighting reconstruction and robust material estimation - both essential for accurate relighting. Together with a streamlined shading function and effective post-processing, ReCap outperforms the leading competitor by 3.4 dB in PSNR on an expanded relighting benchmark.\n\n在多样且未知的环境中实现准确的3D对象重光照（relighting）对于虚拟对象的真实感放置至关重要。然而，由于反照率（albedo）与光照的模糊性，现有方法往往无法生成真实可信的重光照效果。缺乏适当约束时，训练视图可能被解释为多种光照与材质属性的组合，这些组合与实际用于重光照的环境光图缺乏物理对应关系。\n在本研究中，我们提出了 ReCap 方法，通过将跨环境捕捉任务视为多任务目标，提供缺失的监督信号，解开光照与材质之间的纠缠。具体而言，ReCap 同时优化多个光照表示，并共享一组共同的材质属性。这样的设计自然协调了一组围绕共同材质属性的连贯光照表示，充分利用对象在不同外观下的共性和差异。\n这种连贯性支持物理合理的光照重建和稳健的材质估计，而这两者对于实现精确的重光照至关重要。结合精简的着色函数与高效的后处理，ReCap 在扩展的重光照基准上，较领先方法的峰值信噪比（PSNR）提升了3.4 dB，显著优于现有技术。\n"
  },
  {
    "path": "abs/2412.07608.md",
    "content": "### Faster and Better 3D Splatting via Group Training\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis, demonstrating remarkable capability in high-fidelity scene reconstruction through its Gaussian primitive representations. However, the computational overhead induced by the massive number of primitives poses a significant bottleneck to training efficiency. To overcome this challenge, we propose Group Training, a simple yet effective strategy that organizes Gaussian primitives into manageable groups, optimizing training efficiency and improving rendering quality. This approach shows universal compatibility with existing 3DGS frameworks, including vanilla 3DGS and Mip-Splatting, consistently achieving accelerated training while maintaining superior synthesis quality. Extensive experiments reveal that our straightforward Group Training strategy achieves up to 30% faster convergence and improved rendering quality across diverse scenarios.\n\n3D高斯投影（3D Gaussian Splatting, 3DGS）作为一种强大的新视角合成技术，通过高斯原语的表示展现了在高保真场景重建中的卓越能力。然而，由于高斯原语数量庞大，其计算开销显著，成为训练效率的主要瓶颈。\n为解决这一问题，我们提出了一种简单而高效的策略——分组训练（Group Training），通过将高斯原语组织成可管理的分组来优化训练效率，并提升渲染质量。该方法具有与现有3DGS框架（包括基础3DGS和Mip-Splatting）的广泛兼容性，能够在加速训练的同时保持卓越的合成质量。\n广泛的实验表明，我们的分组训练策略在多种场景下实现了高达30%的训练收敛加速，并在渲染质量上取得了进一步提升。这一方法不仅简便易用，还为3DGS技术在高效训练和优质合成之间提供了新的平衡点。\n"
  },
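The core mechanic, updating only one group of primitives per step, can be sketched in a few lines of PyTorch on a toy problem; the round-robin schedule and equal-size grouping here are placeholders, not the paper's strategy:

```python
import torch

def group_training_step(params, opt, loss_fn, groups, step):
    """One step that updates only the current group's rows (sketch only)."""
    group = groups[step % len(groups)]        # round-robin over groups
    opt.zero_grad()
    loss_fn(params).backward()
    keep = torch.zeros(params.shape[0], dtype=torch.bool)
    keep[group] = True
    params.grad[~keep] = 0.0                  # freeze rows outside the group
    opt.step()

# Toy stand-in for a scene: fit 1000 "Gaussians'" colors in 4 groups.
n = 1000
params = torch.nn.Parameter(torch.zeros(n, 3))
target = torch.rand(n, 3)
opt = torch.optim.SGD([params], lr=0.1)
groups = torch.arange(n).chunk(4)
for step in range(400):
    group_training_step(params, opt, lambda p: ((p - target) ** 2).sum(), groups, step)
print(((params - target) ** 2).mean().item())  # ~0 after round-robin updates
```

In a real 3DGS pipeline the per-step render cost also drops, since only the active group's primitives need gradients; how groups are formed is where the paper's contribution lies.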
  {
    "path": "abs/2412.07660.md",
    "content": "### Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians\n\nBuildings are primary components of cities, often featuring repeated elements such as windows and doors. Traditional 3D building asset creation is labor-intensive and requires specialized skills to develop design rules. Recent generative models for building creation often overlook these patterns, leading to low visual fidelity and limited scalability. Drawing inspiration from procedural modeling techniques used in the gaming and visual effects industry, our method, Proc-GS, integrates procedural code into the 3D Gaussian Splatting (3D-GS) framework, leveraging their advantages in high-fidelity rendering and efficient asset management from both worlds. By manipulating procedural code, we can streamline this process and generate an infinite variety of buildings. This integration significantly reduces model size by utilizing shared foundational assets, enabling scalable generation with precise control over building assembly. We showcase the potential for expansive cityscape generation while maintaining high rendering fidelity and precise control on both real and synthetic cases.\n\n建筑物是城市的主要组成部分，通常包含重复的元素，例如窗户和门。传统的3D建筑资产创建过程劳动密集，且需要专业技能来制定设计规则。近年来的生成模型在建筑生成方面往往忽略了这些重复模式，导致视觉保真度较低且扩展性有限。\n受游戏和视觉特效行业中程序化建模技术的启发，我们提出了 Proc-GS 方法，将程序化代码集成到 3D 高斯投影（3D Gaussian Splatting, 3D-GS）框架中，结合两者在高保真渲染和高效资产管理上的优势。通过操作程序化代码，我们能够简化建筑生成过程，同时生成无限多样化的建筑。\n这一集成通过利用共享的基础资产显著减少了模型大小，从而实现可扩展的生成，并对建筑装配提供精确控制。我们展示了在真实与合成场景中扩展城市景观生成的潜力，同时保持高渲染保真度与精确控制。\n"
  },
  {
    "path": "abs/2412.07739.md",
    "content": "### GASP: Gaussian Avatars with Synthetic Priors\n\nGaussian Splatting has changed the game for real-time photo-realistic rendering. One of the most popular applications of Gaussian Splatting is to create animatable avatars, known as Gaussian Avatars. Recent works have pushed the boundaries of quality and rendering efficiency but suffer from two main limitations. Either they require expensive multi-camera rigs to produce avatars with free-view rendering, or they can be trained with a single camera but only rendered at high quality from this fixed viewpoint. An ideal model would be trained using a short monocular video or image from available hardware, such as a webcam, and rendered from any view. To this end, we propose GASP: Gaussian Avatars with Synthetic Priors. To overcome the limitations of existing datasets, we exploit the pixel-perfect nature of synthetic data to train a Gaussian Avatar prior. By fitting this prior model to a single photo or video and fine-tuning it, we get a high-quality Gaussian Avatar, which supports 360∘ rendering. Our prior is only required for fitting, not inference, enabling real-time application. Through our method, we obtain high-quality, animatable Avatars from limited data which can be animated and rendered at 70fps on commercial hardware. See our project page (this https URL) for results."
  },
  {
    "path": "abs/2412.07984.md",
    "content": "### Diffusion-Based Attention Warping for Consistent 3D Scene Editing\n\nWe present a novel method for 3D scene editing using diffusion models, designed to ensure view consistency and realism across perspectives. Our approach leverages attention features extracted from a single reference image to define the intended edits. These features are warped across multiple views by aligning them with scene geometry derived from Gaussian splatting depth estimates. Injecting these warped features into other viewpoints enables coherent propagation of edits, achieving high fidelity and spatial alignment in 3D space. Extensive evaluations demonstrate the effectiveness of our method in generating versatile edits of 3D scenes, significantly advancing the capabilities of scene manipulation compared to the existing methods.\n\n我们提出了一种用于3D场景编辑的新方法，基于扩散模型，旨在确保视角一致性和真实感。该方法利用从单个参考图像中提取的注意力特征来定义目标编辑，并通过与由高斯投影深度估计得出的场景几何对齐，将这些特征在多个视角中进行变换。将变换后的特征注入其他视角，使得编辑能够在3D空间中以高保真和空间对齐的方式传播。\n广泛的评估表明，我们的方法在生成多样化的3D场景编辑方面表现出色，与现有方法相比显著提升了场景操作的能力，为3D场景编辑技术带来了新的突破。\n"
  },
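The geometric half of this method, carrying per-pixel features from the reference view into other views using rendered depth, is ordinary depth-based reprojection. A NumPy sketch using nearest-pixel splatting with no occlusion handling; the diffusion-side feature injection is not reproduced here:

```python
import numpy as np

def warp_features(feat, depth, K, T_src_to_tgt):
    """Warp per-pixel features from a source view into a target view given
    source depth and a 4x4 relative pose (generic reprojection sketch;
    duplicates overwrite each other, so occlusions are ignored)."""
    H, W, C = feat.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], -1).reshape(-1, 3).astype(float)
    pts = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)   # backproject
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], 1)
    cam = (T_src_to_tgt @ pts_h.T).T[:, :3]                     # to target frame
    proj = (K @ cam.T).T
    uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-6, None)
    ui, vi = np.round(uv[:, 0]).astype(int), np.round(uv[:, 1]).astype(int)
    ok = (cam[:, 2] > 0) & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    out = np.zeros_like(feat)
    out[vi[ok], ui[ok]] = feat.reshape(-1, C)[ok]               # nearest splat
    return out

# Sanity check: an identity pose maps every pixel back onto itself.
H, W, C = 4, 5, 2
feat = np.random.rand(H, W, C)
K = np.array([[2., 0., 2.], [0., 2., 1.5], [0., 0., 1.]])
print(np.allclose(warp_features(feat, np.ones((H, W)), K, np.eye(4)), feat))
```

In the paper the depth comes from the Gaussian splatting reconstruction, which is what keeps the warped attention features aligned with the scene geometry across views.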
  {
    "path": "abs/2412.08152.md",
    "content": "### ProGDF: Progressive Gaussian Differential Field for Controllable and Flexible 3D Editing\n\n3D editing plays a crucial role in editing and reusing existing 3D assets, thereby enhancing productivity. Recently, 3DGS-based methods have gained increasing attention due to their efficient rendering and flexibility. However, achieving desired 3D editing results often requires multiple adjustments in an iterative loop, resulting in tens of minutes of training time cost for each attempt and a cumbersome trial-and-error cycle for users. This in-the-loop training paradigm results in a poor user experience. To address this issue, we introduce the concept of process-oriented modelling for 3D editing and propose the Progressive Gaussian Differential Field (ProGDF), an out-of-loop training approach that requires only a single training session to provide users with controllable editing capability and variable editing results through a user-friendly interface in real-time. ProGDF consists of two key components: Progressive Gaussian Splatting (PGS) and Gaussian Differential Field (GDF). PGS introduces the progressive constraint to extract the diverse intermediate results of the editing process and employs rendering quality regularization to improve the quality of these results. Based on these intermediate results, GDF leverages a lightweight neural network to model the editing process. Extensive results on two novel applications, namely controllable 3D editing and flexible fine-grained 3D manipulation, demonstrate the effectiveness, practicality and flexibility of the proposed ProGDF.\n\n3D编辑在修改和重用现有3D资产方面起着关键作用，从而显著提升生产效率。近年来，基于3D高斯投影（3D Gaussian Splatting, 3DGS）的方法因其高效渲染和灵活性而备受关注。然而，要实现理想的3D编辑效果，通常需要经过多次调整的迭代循环，每次尝试都可能耗费数十分钟的训练时间，导致用户体验受到冗长试错过程的限制。这种“循环内训练”范式严重降低了用户体验。\n为解决这一问题，我们提出了一种面向过程的3D编辑建模新概念，并设计了渐进式高斯差分场（Progressive Gaussian Differential Field, ProGDF），一种“循环外训练”方法。ProGDF仅需一次训练即可通过实时、用户友好的界面为用户提供可控的编辑能力和多样化的编辑结果。\nProGDF由两个关键组件组成：渐进式高斯投影（Progressive Gaussian Splatting, PGS）和高斯差分场（Gaussian Differential Field, GDF）。PGS 引入渐进约束，以提取编辑过程中的多样化中间结果，并通过渲染质量正则化提高这些结果的质量。在此基础上，GDF 利用轻量级神经网络对编辑过程进行建模。\n在两种新应用场景——可控3D编辑和灵活的细粒度3D操作——中的广泛实验结果表明，ProGDF 在效果、实用性和灵活性方面表现出显著优势，为3D编辑提供了高效且易用的解决方案。\n"
  },
  {
    "path": "abs/2412.08331.md",
    "content": "### SLGaussian: Fast Language Gaussian Splatting in Sparse Views\n\n3D semantic field learning is crucial for applications like autonomous navigation, AR/VR, and robotics, where accurate comprehension of 3D scenes from limited viewpoints is essential. Existing methods struggle under sparse view conditions, relying on inefficient per-scene multi-view optimizations, which are impractical for many real-world tasks. To address this, we propose SLGaussian, a feed-forward method for constructing 3D semantic fields from sparse viewpoints, allowing direct inference of 3DGS-based scenes. By ensuring consistent SAM segmentations through video tracking and using low-dimensional indexing for high-dimensional CLIP features, SLGaussian efficiently embeds language information in 3D space, offering a robust solution for accurate 3D scene understanding under sparse view conditions. In experiments on two-view sparse 3D object querying and segmentation in the LERF and 3D-OVS datasets, SLGaussian outperforms existing methods in chosen IoU, Localization Accuracy, and mIoU. Moreover, our model achieves scene inference in under 30 seconds and open-vocabulary querying in just 0.011 seconds per query.\n\n3D语义场学习在自动驾驶、增强/虚拟现实（AR/VR）和机器人等领域至关重要，因为这些应用需要从有限视角中准确理解3D场景。然而，现有方法在稀疏视图条件下表现不佳，依赖于效率低下的逐场景多视图优化，这在许多实际任务中并不实用。\n为了解决这一问题，我们提出了 SLGaussian，一种用于从稀疏视角构建3D语义场的前馈方法，实现对基于3D高斯投影（3DGS）场景的直接推理。通过视频跟踪确保一致的 SAM（Segment Anything Model）分割，以及使用低维索引高维 CLIP 特征，SLGaussian 能高效地在3D空间中嵌入语言信息，从而在稀疏视图条件下提供稳健的3D场景理解解决方案。\n在 LERF 和 3D-OVS 数据集上的双视图稀疏3D对象查询与分割实验中，SLGaussian 在选择的 IoU、定位准确率（Localization Accuracy）和 mIoU 指标上均优于现有方法。此外，我们的模型在场景推理中实现了小于30秒的推理时间，并能以每次查询仅 0.011 秒的速度完成开放词汇查询，展现了高效性和实用性。\n"
  },
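One way to picture "low-dimensional indexing for high-dimensional CLIP features": store a small codebook and give each Gaussian just an integer code, so open-vocabulary queries only compare against the codebook. A toy k-means sketch; the codebook size and feature dimensions are arbitrary, and SLGaussian's actual indexing scheme may differ:

```python
import numpy as np

def build_codebook(feats, k=64, iters=10, seed=0):
    """Toy k-means: each Gaussian then stores one integer index into the
    codebook instead of a full high-dimensional CLIP vector (sketch only)."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), k, replace=False)]
    for _ in range(iters):
        d = ((feats ** 2).sum(1, keepdims=True)      # squared distances,
             - 2.0 * feats @ centers.T               # computed without an
             + (centers ** 2).sum(1)[None, :])       # N x k x D blow-up
        idx = d.argmin(1)
        for j in range(k):                           # recompute cluster means
            if np.any(idx == j):
                centers[j] = feats[idx == j].mean(0)
    return centers, idx

feats = np.random.randn(5000, 512).astype(np.float32)  # per-Gaussian CLIP features
codebook, index = build_codebook(feats)
query = np.random.randn(512).astype(np.float32)         # e.g. a CLIP text embedding
scores = codebook @ query                               # compare 64 codes, not 5000
matched = index == scores.argmax()
print(matched.sum(), "Gaussians carry the query's best-matching code")
```

The payoff is the query cost the abstract reports: a text query touches only the codebook, which is how millisecond-scale open-vocabulary lookups become possible.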
  {
    "path": "abs/2412.08504.md",
    "content": "### PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis\n\nTalking head synthesis with arbitrary speech audio is a crucial challenge in the field of digital humans. Recently, methods based on radiance fields have received increasing attention due to their ability to synthesize high-fidelity and identity-consistent talking heads from just a few minutes of training video. However, due to the limited scale of the training data, these methods often exhibit poor performance in audio-lip synchronization and visual quality. In this paper, we propose a novel 3D Gaussian-based method called PointTalk, which constructs a static 3D Gaussian field of the head and deforms it in sync with the audio. It also incorporates an audio-driven dynamic lip point cloud as a critical component of the conditional information, thereby facilitating the effective synthesis of talking heads. Specifically, the initial step involves generating the corresponding lip point cloud from the audio signal and capturing its topological structure. The design of the dynamic difference encoder aims to capture the subtle nuances inherent in dynamic lip movements more effectively. Furthermore, we integrate the audio-point enhancement module, which not only ensures the synchronization of the audio signal with the corresponding lip point cloud within the feature space, but also facilitates a deeper understanding of the interrelations among cross-modal conditional features. Extensive experiments demonstrate that our method achieves superior high-fidelity and audio-lip synchronization in talking head synthesis compared to previous methods.\n\n使用任意语音音频生成数字人头部的说话动画是数字人领域的关键挑战。近年来，基于辐射场的方法因其能够从仅几分钟的训练视频中生成高保真且身份一致的说话头像而受到越来越多的关注。然而，由于训练数据规模有限，这些方法通常在音频与唇形的同步性及视觉质量方面表现欠佳。\n为了解决这些问题，我们提出了一种新颖的基于3D高斯的生成方法，称为 PointTalk。该方法通过构建一个静态的头部3D高斯场，并根据音频进行同步变形。同时，PointTalk 引入了一个由音频驱动的动态唇部点云作为条件信息的关键组成部分，从而有效促进了说话头像的合成。\n具体而言，PointTalk 的初始步骤是从音频信号生成对应的唇部点云并捕捉其拓扑结构。设计的动态差分编码器（Dynamic Difference Encoder）旨在更有效地捕捉唇部动态运动中细微的变化。此外，我们集成了音频-点云增强模块（Audio-Point Enhancement Module），不仅在特征空间中确保音频信号与对应唇部点云的同步性，还促进了跨模态条件特征之间关系的深入理解。\n大量实验表明，与现有方法相比，我们的方法在说话头像合成的高保真度和音频-唇形同步性方面均表现出色，为该领域的进一步发展提供了新的技术基础。\n"
  },
  {
    "path": "abs/2412.09176.md",
    "content": "### LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting\n\nRecently, radiance field rendering, such as 3D Gaussian Splatting (3DGS), has shown immense potential in VR content creation due to its high-quality rendering and efficient production process. However, existing physics-based interaction systems for 3DGS can only perform simple and non-realistic simulations or demand extensive user input for complex scenes, primarily due to the absence of scene understanding. In this paper, we propose LIVE-GS, a highly realistic interactive VR system powered by LLM. After object-aware GS reconstruction, we prompt GPT-4o to analyze the physical properties of objects in the scene, which are used to guide physical simulations consistent with real phenomena. We also design a GPT-assisted GS inpainting module to fill the unseen area covered by manipulative objects. To perform a precise segmentation of Gaussian kernels, we propose a feature-mask segmentation strategy. To enable rich interaction, we further propose a computationally efficient physical simulation framework through an PBD-based unified interpolation method, supporting various physical forms such as rigid body, soft body, and granular materials. Our experimental results show that with the help of LLM's understanding and enhancement of scenes, our VR system can support complex and realistic interactions without additional manual design and annotation.\n\n近年来，辐射场渲染技术（如3D高斯投影，3D Gaussian Splatting, 3DGS）因其高质量渲染和高效生产流程，在VR内容创作中展现了巨大潜力。然而，现有基于物理的3DGS交互系统要么只能进行简单且不真实的模拟，要么在复杂场景中需要大量用户输入，主要原因在于缺乏场景理解能力。\n为解决这一问题，本文提出了 LIVE-GS，一种基于大型语言模型（LLM）的高真实感交互式VR系统。通过对象感知的高斯投影重建后，我们利用 GPT-4o 对场景中物体的物理属性进行分析，这些属性用于指导与真实现象一致的物理模拟。同时，我们设计了一个由 GPT 辅助的高斯场景修补模块（GS Inpainting Module），以填补被交互物体遮挡的未见区域。为精确分割高斯核，我们提出了一种特征掩膜分割策略（Feature-Mask Segmentation Strategy）。\n为了实现丰富的交互，我们进一步提出了一种基于 PBD（Position-Based Dynamics）的高效物理模拟框架，通过统一插值方法支持刚体、软体和颗粒材料等多种物理形式。实验结果表明，在 LLM 的场景理解和增强能力的帮助下，我们的VR系统无需额外的手工设计和标注，就能支持复杂且逼真的交互，大幅提升了系统的实用性和表现力。\n"
  },
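For readers unfamiliar with PBD, the simulation style this abstract builds on, here is a generic position-based-dynamics step with distance constraints; it is a textbook sketch, not LIVE-GS's implementation:

```python
import numpy as np

def pbd_step(x, v, edges, rest, inv_mass, dt=1/60, iters=8, gravity=(0, -9.8, 0)):
    """One PBD step: predict positions, project distance constraints,
    then recover velocities from the corrected positions."""
    g = np.asarray(gravity)
    p = x + dt * (v + dt * g)                 # predict positions
    for _ in range(iters):                    # Gauss-Seidel constraint projection
        for (i, j), r in zip(edges, rest):
            d = p[i] - p[j]
            dist = np.linalg.norm(d)
            w = inv_mass[i] + inv_mass[j]
            if dist < 1e-9 or w == 0:
                continue
            corr = (dist - r) / (dist * w) * d
            p[i] -= inv_mass[i] * corr        # move endpoints to restore rest length
            p[j] += inv_mass[j] * corr
    v_new = (p - x) / dt                      # velocity update from positions
    return p, v_new

# Example: a 3-particle chain hanging from a fixed anchor.
x = np.array([[0., 0., 0.], [0., -1., 0.], [0., -2., 0.]])
v = np.zeros_like(x)
inv_mass = np.array([0., 1., 1.])             # zero inverse mass pins the anchor
edges, rest = [(0, 1), (1, 2)], [1.0, 1.0]
for _ in range(120):
    x, v = pbd_step(x, v, edges, rest, inv_mass)
print(np.linalg.norm(x[0] - x[1]))            # stays near the rest length 1.0
```

The "unified interpolation" the abstract mentions would then transfer these simulated particle motions onto the Gaussian kernels; that coupling is the system's own contribution and is not sketched here.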
  {
    "path": "abs/2412.09511.md",
    "content": "### GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency\n\nIdentifying affordance regions on 3D objects from semantic cues is essential for robotics and human-machine interaction. However, existing 3D affordance learning methods struggle with generalization and robustness due to limited annotated data and a reliance on 3D backbones focused on geometric encoding, which often lack resilience to real-world noise and data corruption. We propose GEAL, a novel framework designed to enhance the generalization and robustness of 3D affordance learning by leveraging large-scale pre-trained 2D models. We employ a dual-branch architecture with Gaussian splatting to establish consistent mappings between 3D point clouds and 2D representations, enabling realistic 2D renderings from sparse point clouds. A granularity-adaptive fusion module and a 2D-3D consistency alignment module further strengthen cross-modal alignment and knowledge transfer, allowing the 3D branch to benefit from the rich semantics and generalization capacity of 2D models. To holistically assess the robustness, we introduce two new corruption-based benchmarks: PIAD-C and LASO-C. Extensive experiments on public datasets and our benchmarks show that GEAL consistently outperforms existing methods across seen and novel object categories, as well as corrupted data, demonstrating robust and adaptable affordance prediction under diverse conditions.\n\n从语义线索中识别3D对象的可供性区域对机器人学和人机交互至关重要。然而，现有的3D可供性学习方法由于标注数据的有限性以及过度依赖几何编码的3D网络结构，往往在泛化性和鲁棒性上表现欠佳，特别是在面对现实世界中的噪声和数据损坏时。\n我们提出了 GEAL，一种旨在通过利用大规模预训练的2D模型来增强3D可供性学习的泛化性和鲁棒性的新框架。GEAL 采用双分支架构，并结合高斯投影（Gaussian Splatting）技术，在3D点云与2D表示之间建立一致的映射，从稀疏点云生成真实感的2D渲染图。框架中设计了粒度自适应融合模块（Granularity-Adaptive Fusion Module）和2D-3D一致性对齐模块（2D-3D Consistency Alignment Module），进一步加强了跨模态对齐与知识迁移，使得3D分支能够充分利用2D模型的丰富语义信息和强泛化能力。\n为全面评估鲁棒性，我们引入了两个新的基于损坏的基准测试集：PIAD-C 和 LASO-C。在公开数据集及新基准上的广泛实验表明，GEAL 在已知和新类别对象以及损坏数据场景下的表现显著优于现有方法，展现出强大的鲁棒性和适应性，能够在多样化条件下实现准确的可供性预测。\n"
  },
  {
    "path": "abs/2412.09545.md",
    "content": "### SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing\n\nWe introduce SimAvatar, a framework designed to generate simulation-ready clothed 3D human avatars from a text prompt. Current text-driven human avatar generation methods either model hair, clothing, and the human body using a unified geometry or produce hair and garments that are not easily adaptable for simulation within existing simulation pipelines. The primary challenge lies in representing the hair and garment geometry in a way that allows leveraging established prior knowledge from foundational image diffusion models (e.g., Stable Diffusion) while being simulation-ready using either physics or neural simulators. To address this task, we propose a two-stage framework that combines the flexibility of 3D Gaussians with simulation-ready hair strands and garment meshes. Specifically, we first employ three text-conditioned 3D generative models to generate garment mesh, body shape and hair strands from the given text prompt. To leverage prior knowledge from foundational diffusion models, we attach 3D Gaussians to the body mesh, garment mesh, as well as hair strands and learn the avatar appearance through optimization. To drive the avatar given a pose sequence, we first apply physics simulators onto the garment meshes and hair strands. We then transfer the motion onto 3D Gaussians through carefully designed mechanisms for each body part. As a result, our synthesized avatars have vivid texture and realistic dynamic motion. To the best of our knowledge, our method is the first to produce highly realistic, fully simulation-ready 3D avatars, surpassing the capabilities of current approaches.\n\n我们提出了 SimAvatar，一个从文本提示生成可用于仿真的穿衣3D人类头像的框架。目前的基于文本的人类头像生成方法要么通过统一几何建模头发、衣物和人体，要么生成的头发和衣物难以适配现有仿真管线中的物理或神经模拟器。核心挑战在于以适合仿真的方式表示头发和衣物几何，同时利用基础图像扩散模型（如 Stable Diffusion）的先验知识。\n为解决这一任务，我们提出了一个两阶段框架，将3D高斯投影的灵活性与可仿真的头发丝和衣物网格相结合。具体而言，第一阶段利用三个基于文本条件的3D生成模型，从文本提示生成衣物网格、身体形状和头发丝。为利用扩散模型的先验知识，我们将3D高斯投影附加到身体网格、衣物网格以及头发丝上，并通过优化学习头像的外观。\n在驱动头像进行姿态序列动作时，我们首先对衣物网格和头发丝应用物理模拟器，然后通过为各身体部位精心设计的机制将动作转移到3D高斯上。最终，我们生成的头像具有生动的纹理和逼真的动态动作。\n据我们所知，SimAvatar 是首个能够生成高度真实且完全可仿真的3D头像的框架，其能力超越了现有方法，为仿真和动画领域带来了显著进步。\n"
  },
  {
    "path": "abs/2412.09573.md",
    "content": "### FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction\n\nExisting sparse-view reconstruction models heavily rely on accurate known camera poses. However, deriving camera extrinsics and intrinsics from sparse-view images presents significant challenges. In this work, we present FreeSplatter, a highly scalable, feed-forward reconstruction framework capable of generating high-quality 3D Gaussians from uncalibrated sparse-view images and recovering their camera parameters in mere seconds. FreeSplatter is built upon a streamlined transformer architecture, comprising sequential self-attention blocks that facilitate information exchange among multi-view image tokens and decode them into pixel-wise 3D Gaussian primitives. The predicted Gaussian primitives are situated in a unified reference frame, allowing for high-fidelity 3D modeling and instant camera parameter estimation using off-the-shelf solvers. To cater to both object-centric and scene-level reconstruction, we train two model variants of FreeSplatter on extensive datasets. In both scenarios, FreeSplatter outperforms state-of-the-art baselines in terms of reconstruction quality and pose estimation accuracy. Furthermore, we showcase FreeSplatter's potential in enhancing the productivity of downstream applications, such as text/image-to-3D content creation.\n\n现有的稀疏视图重建模型在很大程度上依赖于准确的已知相机位姿。然而，从稀疏视图图像中提取相机的外参和内参面临重大挑战。为此，我们提出 FreeSplatter，一个高度可扩展的前馈重建框架，能够从未校准的稀疏视图图像中快速生成高质量的3D高斯表示，并在短短几秒内恢复相机参数。\nFreeSplatter 基于精简的 Transformer 架构，由一系列顺序的自注意力模块组成，促进多视图图像令牌之间的信息交换，并将其解码为像素级的3D高斯原语。这些预测的高斯原语位于统一的参考坐标系中，从而实现高保真度的3D建模，并使用现成的解算器即时估算相机参数。\n针对对象级和场景级重建需求，我们在大规模数据集上训练了 FreeSplatter 的两种模型变体。在这两种场景下，FreeSplatter 在重建质量和位姿估计精度方面均优于当前最先进的基线方法。此外，我们展示了 FreeSplatter 在下游应用中的潜力，例如文本/图像到3D内容创建，大幅提升了相关任务的生产效率。\n"
  },
  {
    "path": "abs/2412.09597.md",
    "content": "### LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors\n\nSingle-image 3D reconstruction remains a fundamental challenge in computer vision due to inherent geometric ambiguities and limited viewpoint information. Recent advances in Latent Video Diffusion Models (LVDMs) offer promising 3D priors learned from large-scale video data. However, leveraging these priors effectively faces three key challenges: (1) degradation in quality across large camera motions, (2) difficulties in achieving precise camera control, and (3) geometric distortions inherent to the diffusion process that damage 3D consistency. We address these challenges by proposing LiftImage3D, a framework that effectively releases LVDMs' generative priors while ensuring 3D consistency. Specifically, we design an articulated trajectory strategy to generate video frames, which decomposes video sequences with large camera motions into ones with controllable small motions. Then we use robust neural matching models, i.e. MASt3R, to calibrate the camera poses of generated frames and produce corresponding point clouds. Finally, we propose a distortion-aware 3D Gaussian splatting representation, which can learn independent distortions between frames and output undistorted canonical Gaussians. Extensive experiments demonstrate that LiftImage3D achieves state-of-the-art performance on two challenging datasets, i.e. LLFF, DL3DV, and Tanks and Temples, and generalizes well to diverse in-the-wild images, from cartoon illustrations to complex real-world scenes.\n\n单张图像的3D重建是计算机视觉领域的一个基础挑战，由于固有的几何模糊性和有限的视角信息，使得这一任务难以解决。最近，潜在视频扩散模型（Latent Video Diffusion Models, LVDMs）在从大规模视频数据中学习3D先验方面展现出巨大潜力。然而，要有效利用这些先验，仍面临三个关键挑战：(1) 大范围相机运动导致的质量下降，(2) 难以实现精确的相机控制，(3) 扩散过程中固有的几何失真破坏了3D一致性。\n为了解决这些问题，我们提出了 LiftImage3D 框架，能够有效释放 LVDMs 的生成先验，同时确保3D一致性。具体而言，我们设计了一种关节轨迹策略（articulated trajectory strategy），用于生成视频帧，将大范围的相机运动分解为可控的小范围运动序列。接着，我们采用强大的神经匹配模型（如 MASt3R）对生成的帧进行相机位姿校准，并生成相应的点云。\n最后，我们提出了一种失真感知的3D高斯投影表示（distortion-aware 3D Gaussian splatting representation），能够在帧之间学习独立的失真，并输出无失真的规范高斯表示。广泛的实验表明，LiftImage3D 在两个具有挑战性的数据集（LLFF、DL3DV 和 Tanks and Temples）上实现了当前最先进的性能，并且能够很好地泛化到多样的真实场景和野外图像，从卡通插图到复杂的现实场景均表现优异。\n"
  },
  {
    "path": "abs/2412.09606.md",
    "content": "### Feat2GS: Probing Visual Foundation Models with Gaussian Splatting\n\nGiven that visual foundation models (VFMs) are trained on extensive datasets but often limited to 2D images, a natural question arises: how well do they understand the 3D world? With the differences in architecture and training protocols (i.e., objectives, proxy tasks), a unified framework to fairly and comprehensively probe their 3D awareness is urgently needed. Existing works on 3D probing suggest single-view 2.5D estimation (e.g., depth and normal) or two-view sparse 2D correspondence (e.g., matching and tracking). Unfortunately, these tasks ignore texture awareness, and require 3D data as ground-truth, which limits the scale and diversity of their evaluation set. To address these issues, we introduce Feat2GS, which readout 3D Gaussians attributes from VFM features extracted from unposed images. This allows us to probe 3D awareness for geometry and texture via novel view synthesis, without requiring 3D data. Additionally, the disentanglement of 3DGS parameters - geometry (x,α,Σ) and texture (c) - enables separate analysis of texture and geometry awareness. Under Feat2GS, we conduct extensive experiments to probe the 3D awareness of several VFMs, and investigate the ingredients that lead to a 3D aware VFM. Building on these findings, we develop several variants that achieve state-of-the-art across diverse datasets. This makes Feat2GS useful for probing VFMs, and as a simple-yet-effective baseline for novel-view synthesis. Code and data will be made available at this https URL.\n\n视觉基础模型（Visual Foundation Models, VFMs）虽然在大规模数据集上训练，但通常局限于2D图像处理。那么，这些模型对3D世界的理解能力到底如何？由于架构和训练协议（如目标和代理任务）的差异，迫切需要一个统一的框架来公平且全面地探测其3D认知能力。\n现有的3D探测方法主要集中于单视图的2.5D估计（如深度和法线）或双视图的稀疏2D对应（如匹配和跟踪）。然而，这些任务忽略了纹理感知，并且依赖于3D数据作为真实标签（ground-truth），从而限制了其评估数据集的规模和多样性。\n为解决这些问题，我们提出了 Feat2GS，通过从未标定的图像中提取的 VFM 特征读取3D高斯属性。这使我们能够通过新视角合成来探测几何和纹理的3D认知能力，而无需依赖3D数据。此外，3D高斯投影（3DGS）参数的解耦——几何属性（￼）和纹理属性（￼）——使得可以分别分析模型的几何和纹理认知能力。\n基于 Feat2GS，我们进行了大量实验，探测了多个 VFMs 的3D认知能力，并研究了哪些因素有助于构建具备3D认知能力的 VFM。基于这些发现，我们开发了多个变体，在多个数据集上实现了当前最先进的性能。这不仅使 Feat2GS 成为探测 VFM 的有效工具，还作为一种简单但高效的新视角合成基线方法，为3D认知研究提供了新的方向。\n"
  },
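A minimal readout head in the spirit of Feat2GS, with geometry and texture decoded by separate branches so each can be probed on its own; the layer sizes and the exact attribute parameterization below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GaussianReadout(nn.Module):
    """Map a frozen VFM's per-pixel features to 3DGS attributes, keeping
    geometry (position offset, opacity, covariance) and texture (color)
    in separate heads (sketch; dimensions are illustrative)."""
    def __init__(self, feat_dim=768):
        super().__init__()
        self.geom = nn.Linear(feat_dim, 3 + 1 + 7)   # xyz offset, alpha, scale+quat
        self.tex = nn.Linear(feat_dim, 3)            # RGB (order-0 SH)

    def forward(self, feats):                        # feats: (N_pixels, feat_dim)
        g = self.geom(feats)
        xyz, alpha, scale, quat = g[:, :3], g[:, 3:4], g[:, 4:7], g[:, 7:11]
        return {
            "xyz": xyz,                              # offset from unprojected point
            "alpha": torch.sigmoid(alpha),           # opacity in (0, 1)
            "scale": torch.exp(scale),               # positive axis lengths
            "quat": torch.nn.functional.normalize(quat, dim=-1),
            "rgb": torch.sigmoid(self.tex(feats)),   # texture branch
        }

feats = torch.randn(4096, 768)                       # e.g. DINO/CLIP patch features
attrs = GaussianReadout()(feats)
print({k: tuple(v.shape) for k, v in attrs.items()})
```

Training such a head on novel-view photometric loss, then comparing results across frozen backbones, is the probing recipe the abstract describes: a backbone is "3D aware" to the extent this readout succeeds.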
  {
    "path": "abs/2412.09648.md",
    "content": "### DSplats: 3D Generation by Denoising Splats-Based Multiview Diffusion Models\n\nGenerating high-quality 3D content requires models capable of learning robust distributions of complex scenes and the real-world objects within them. Recent Gaussian-based 3D reconstruction techniques have achieved impressive results in recovering high-fidelity 3D assets from sparse input images by predicting 3D Gaussians in a feed-forward manner. However, these techniques often lack the extensive priors and expressiveness offered by Diffusion Models. On the other hand, 2D Diffusion Models, which have been successfully applied to denoise multiview images, show potential for generating a wide range of photorealistic 3D outputs but still fall short on explicit 3D priors and consistency. In this work, we aim to bridge these two approaches by introducing DSplats, a novel method that directly denoises multiview images using Gaussian Splat-based Reconstructors to produce a diverse array of realistic 3D assets. To harness the extensive priors of 2D Diffusion Models, we incorporate a pretrained Latent Diffusion Model into the reconstructor backbone to predict a set of 3D Gaussians. Additionally, the explicit 3D representation embedded in the denoising network provides a strong inductive bias, ensuring geometrically consistent novel view generation. Our qualitative and quantitative experiments demonstrate that DSplats not only produces high-quality, spatially consistent outputs, but also sets a new standard in single-image to 3D reconstruction. When evaluated on the Google Scanned Objects dataset, DSplats achieves a PSNR of 20.38, an SSIM of 0.842, and an LPIPS of 0.109.\n\n生成高质量的三维内容需要能够学习复杂场景和其中真实世界对象分布的模型。最近基于高斯的三维重建技术通过以前馈方式预测三维高斯，实现了从稀疏输入图像中恢复高保真三维资产的出色成果。然而，这些技术通常缺乏扩展的先验知识和扩展性，而这些是扩散模型所能提供的。另一方面，尽管二维扩散模型已成功应用于对多视图图像去噪，并展现出生成多种真实感三维输出的潜力，但它们在明确三维先验和一致性方面仍然存在不足。\n在本研究中，我们提出了一种名为 DSplats 的新方法，通过使用基于高斯点云重建器直接对多视图图像进行去噪，以生成多样化的逼真三维资产。为了利用二维扩散模型的丰富先验知识，我们在重建器框架中引入了预训练的潜在扩散模型（Latent Diffusion Model），用于预测一组三维高斯。此外，嵌入到去噪网络中的显式三维表示提供了强大的归纳偏置，从而确保了几何一致的全新视图生成。\n我们的定性和定量实验表明，DSplats 不仅能够生成高质量、空间一致的输出，还为单张图像到三维重建设立了新标杆。在 Google Scanned Objects 数据集上的评估结果显示，DSplats 实现了 PSNR 20.38，SSIM 0.842，以及 LPIPS 0.109 的优秀表现。\n"
  },
  {
    "path": "abs/2412.09723.md",
    "content": "### MAC-Ego3D: Multi-Agent Gaussian Consensus for Real-Time Collaborative Ego-Motion and Photorealistic 3D Reconstruction\n\nReal-time multi-agent collaboration for ego-motion estimation and high-fidelity 3D reconstruction is vital for scalable spatial intelligence. However, traditional methods produce sparse, low-detail maps, while recent dense mapping approaches struggle with high latency. To overcome these challenges, we present MAC-Ego3D, a novel framework for real-time collaborative photorealistic 3D reconstruction via Multi-Agent Gaussian Consensus. MAC-Ego3D enables agents to independently construct, align, and iteratively refine local maps using a unified Gaussian splat representation. Through Intra-Agent Gaussian Consensus, it enforces spatial coherence among neighboring Gaussian splats within an agent. For global alignment, parallelized Inter-Agent Gaussian Consensus, which asynchronously aligns and optimizes local maps by regularizing multi-agent Gaussian splats, seamlessly integrates them into a high-fidelity 3D model. Leveraging Gaussian primitives, MAC-Ego3D supports efficient RGB-D rendering, enabling rapid inter-agent Gaussian association and alignment. MAC-Ego3D bridges local precision and global coherence, delivering higher efficiency, largely reducing localization error, and improving mapping fidelity. It establishes a new SOTA on synthetic and real-world benchmarks, achieving a 15x increase in inference speed, order-of-magnitude reductions in ego-motion estimation error for partial cases, and RGB PSNR gains of 4 to 10 dB.\n\n实时多智能体协作进行自运动估计和高保真三维重建是实现可扩展空间智能的关键。然而，传统方法通常生成稀疏、低细节的地图，而最近的密集映射方法则面临高延迟问题。为了解决这些挑战，我们提出了 MAC-Ego3D，一种通过多智能体高斯共识实现实时协作光真实感三维重建的新框架。\nMAC-Ego3D 使智能体能够独立构建、对齐并通过统一的高斯点云表示迭代优化本地地图。通过智能体内高斯共识（Intra-Agent Gaussian Consensus），框架在单个智能体内的邻近高斯点云之间强制保持空间一致性。对于全局对齐，框架采用并行的智能体间高斯共识（Inter-Agent Gaussian Consensus），异步对齐并优化本地地图，通过对多智能体高斯点云的正则化，流畅地将其整合为高保真的三维模型。\n借助高斯基元，MAC-Ego3D 支持高效的 RGB-D 渲染，实现快速的智能体间高斯关联和对齐。框架在局部精度与全局一致性之间架起了桥梁，不仅显著提升效率，极大地降低了定位误差，还提高了映射的保真度。\nMAC-Ego3D 在合成和真实世界基准测试中设立了新的性能标杆，实现了 15 倍的推理速度提升，自运动估计误差在部分场景中降低了一个数量级，并且 RGB 的 PSNR 提升了 4 至 10 dB。\n"
  },
  {
    "path": "abs/2412.09868.md",
    "content": "### RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting\n\n3D Gaussian Splatting has emerged as a promising technique for high-quality 3D rendering, leading to increasing interest in integrating 3DGS into realism SLAM systems. However, existing methods face challenges such as Gaussian primitives redundancy, forgetting problem during continuous optimization, and difficulty in initializing primitives in monocular case due to lack of depth information. In order to achieve efficient and photorealistic mapping, we propose RP-SLAM, a 3D Gaussian splatting-based vision SLAM method for monocular and RGB-D cameras. RP-SLAM decouples camera poses estimation from Gaussian primitives optimization and consists of three key components. Firstly, we propose an efficient incremental mapping approach to achieve a compact and accurate representation of the scene through adaptive sampling and Gaussian primitives filtering. Secondly, a dynamic window optimization method is proposed to mitigate the forgetting problem and improve map consistency. Finally, for the monocular case, a monocular keyframe initialization method based on sparse point cloud is proposed to improve the initialization accuracy of Gaussian primitives, which provides a geometric basis for subsequent optimization. The results of numerous experiments demonstrate that RP-SLAM achieves state-of-the-art map rendering accuracy while ensuring real-time performance and model compactness.\n\n三维高斯点云技术（3D Gaussian Splatting）已经成为实现高质量三维渲染的一种有前景的方法，这使得将3DGS集成到真实感SLAM系统中的研究兴趣日益增长。然而，现有方法面临着高斯基元冗余、连续优化过程中遗忘问题以及在单目场景中由于缺乏深度信息而难以初始化基元等挑战。为实现高效且光真实感的映射，我们提出了RP-SLAM，这是一种基于三维高斯点云的视觉SLAM方法，适用于单目和RGB-D相机。RP-SLAM通过将相机位姿估计与高斯基元优化解耦，提出了一种高效的增量映射方法，通过自适应采样和高斯基元过滤实现对场景的紧凑且准确表示；引入了一种动态窗口优化方法，以缓解遗忘问题并提高地图一致性；针对单目场景，设计了一种基于稀疏点云的单目关键帧初始化方法，以提高高斯基元初始化的精度，为后续优化提供几何基础。大量实验结果表明，RP-SLAM在确保实时性能和模型紧凑性的同时，实现了业界领先的地图渲染精度。\n"
  },
  {
    "path": "abs/2412.09982.md",
    "content": "### SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video\n\nSynthesizing novel views from in-the-wild monocular videos is challenging due to scene dynamics and the lack of multi-view cues. To address this, we propose SplineGS, a COLMAP-free dynamic 3D Gaussian Splatting (3DGS) framework for high-quality reconstruction and fast rendering from monocular videos. At its core is a novel Motion-Adaptive Spline (MAS) method, which represents continuous dynamic 3D Gaussian trajectories using cubic Hermite splines with a small number of control points. For MAS, we introduce a Motion-Adaptive Control points Pruning (MACP) method to model the deformation of each dynamic 3D Gaussian across varying motions, progressively pruning control points while maintaining dynamic modeling integrity. Additionally, we present a joint optimization strategy for camera parameter estimation and 3D Gaussian attributes, leveraging photometric and geometric consistency. This eliminates the need for Structure-from-Motion preprocessing and enhances SplineGS's robustness in real-world conditions. Experiments show that SplineGS significantly outperforms state-of-the-art methods in novel view synthesis quality for dynamic scenes from monocular videos, achieving thousands times faster rendering speed.\n\n从自然场景的单目视频合成新视图是一个具有挑战性的问题，主要原因在于场景动态性和缺乏多视角信息。为了解决这一问题，我们提出了 SplineGS，一种无需 COLMAP 的动态三维高斯点云（3DGS）框架，能够从单目视频中实现高质量重建和快速渲染。该框架的核心是一个新颖的 运动自适应样条（Motion-Adaptive Spline, MAS） 方法，通过使用带少量控制点的三次 Hermite 样条来表示连续的动态三维高斯轨迹。\n针对 MAS，我们设计了一种 运动自适应控制点修剪（Motion-Adaptive Control points Pruning, MACP） 方法，用于在不同运动情况下建模动态三维高斯的形变，同时逐步修剪控制点以保持动态建模的完整性。此外，我们提出了一种联合优化策略，通过光度一致性和几何一致性对相机参数和三维高斯属性进行联合优化。这种策略避免了对基于 Structure-from-Motion 的预处理需求，并增强了 SplineGS 在真实场景条件下的鲁棒性。\n实验结果表明，SplineGS 在动态场景的单目视频新视图合成质量上显著优于现有方法，同时实现了数千倍的渲染速度提升。\n"
  },
  {
    "path": "abs/2412.10051.md",
    "content": "### TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views\n\nRecent advances in Gaussian Splatting have significantly advanced the field, achieving both panoptic and interactive segmentation of 3D scenes. However, existing methodologies often overlook the critical need for reconstructing specified targets with complex structures from sparse views. To address this issue, we introduce TSGaussian, a novel framework that combines semantic constraints with depth priors to avoid geometry degradation in challenging novel view synthesis tasks. Our approach prioritizes computational resources on designated targets while minimizing background allocation. Bounding boxes from YOLOv9 serve as prompts for Segment Anything Model to generate 2D mask predictions, ensuring semantic accuracy and cost efficiency. TSGaussian effectively clusters 3D gaussians by introducing a compact identity encoding for each Gaussian ellipsoid and incorporating 3D spatial consistency regularization. Leveraging these modules, we propose a pruning strategy to effectively reduce redundancy in 3D gaussians. Extensive experiments demonstrate that TSGaussian outperforms state-of-the-art methods on three standard datasets and a new challenging dataset we collected, achieving superior results in novel view synthesis of specific objects.\n\n高斯点云技术的最新进展显著推动了三维场景的全景和交互式分割。然而，现有方法往往忽视了从稀疏视角重建复杂结构指定目标的关键需求。为了解决这一问题，我们提出了 TSGaussian，一种结合语义约束和深度先验的新框架，用于在具有挑战性的视图合成任务中避免几何退化。我们的方法优先将计算资源分配到指定目标上，同时最小化对背景的资源分配。\nTSGaussian 使用来自 YOLOv9 的边界框作为提示，通过 Segment Anything Model 生成 2D 掩码预测，从而在保证语义准确性的同时提高成本效率。通过引入每个高斯椭球的紧凑身份编码和 3D 空间一致性正则化，TSGaussian 实现了对三维高斯点云的有效聚类。基于这些模块，我们设计了一种修剪策略，有效减少三维高斯的冗余。\n大量实验表明，TSGaussian 在三个标准数据集以及我们新收集的一个具有挑战性的数据集上均优于现有最先进方法，在特定目标的新视图合成中取得了卓越的效果。\n"
  },
  {
    "path": "abs/2412.10078.md",
    "content": "### Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories\n\nCurrently, 3D rendering for large-scale free camera trajectories, namely, arbitrary input camera trajectories, poses significant challenges: 1) The distribution and observation angles of the cameras are irregular, and various types of scenes are included in the free trajectories; 2) Processing the entire point cloud and all images at once for large-scale scenes requires a substantial amount of GPU memory. This paper presents a Toy-GS method for accurately rendering large-scale free camera trajectories. Specifically, we propose an adaptive spatial division approach for free trajectories to divide cameras and the sparse point cloud of the entire scene into various regions according to camera poses. Training each local Gaussian in parallel for each area enables us to concentrate on texture details and minimize GPU memory usage. Next, we use the multi-view constraint and position-aware point adaptive control (PPAC) to improve the rendering quality of texture details. In addition, our regional fusion approach combines local and global Gaussians to enhance rendering quality with an increasing number of divided areas. Extensive experiments have been carried out to confirm the effectiveness and efficiency of Toy-GS, leading to state-of-the-art results on two public large-scale datasets as well as our SCUTic dataset. Our proposal demonstrates an enhancement of 1.19 dB in PSNR and conserves 7 G of GPU memory when compared to various benchmarks.\n\n目前，对于大规模自由相机轨迹的3D渲染（即任意输入相机轨迹）存在显著挑战：1）相机的分布和观测角度不规则，自由轨迹中包含多种类型的场景；2）处理大规模场景的整个点云和所有图像需要大量的GPU内存。本文提出了一种名为 Toy-GS 的方法，用于准确渲染大规模自由相机轨迹。具体而言，我们提出了一种针对自由轨迹的自适应空间划分方法，根据相机位姿将整个场景的相机和稀疏点云划分为不同区域。通过对每个区域的局部高斯进行并行训练，我们能够专注于纹理细节，并最小化GPU内存使用。\n接下来，我们利用多视图约束和位置感知点自适应控制（PPAC）来提高纹理细节的渲染质量。此外，我们的区域融合方法结合了局部和全局高斯，随着划分区域数量的增加进一步增强渲染质量。广泛的实验验证了 Toy-GS 的有效性和效率，在两个公共的大规模数据集以及我们的 SCUTic 数据集上实现了最先进的性能。与各种基准方法相比，我们的方法在 PSNR 上提升了1.19 dB，同时节省了7 GB的GPU内存。\n"
  },
  {
    "path": "abs/2412.10209.md",
    "content": "### GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion\n\nWe propose a novel approach for reconstructing animatable 3D Gaussian avatars from monocular videos captured by commodity devices like smartphones. Photorealistic 3D head avatar reconstruction from such recordings is challenging due to limited observations, which leaves unobserved regions under-constrained and can lead to artifacts in novel views. To address this problem, we introduce a multi-view head diffusion model, leveraging its priors to fill in missing regions and ensure view consistency in Gaussian splatting renderings. To enable precise viewpoint control, we use normal maps rendered from FLAME-based head reconstruction, which provides pixel-aligned inductive biases. We also condition the diffusion model on VAE features extracted from the input image to preserve details of facial identity and appearance. For Gaussian avatar reconstruction, we distill multi-view diffusion priors by using iteratively denoised images as pseudo-ground truths, effectively mitigating over-saturation issues. To further improve photorealism, we apply latent upsampling to refine the denoised latent before decoding it into an image. We evaluate our method on the NeRSemble dataset, showing that GAF outperforms the previous state-of-the-art methods in novel view synthesis by a 5.34% higher SSIM score. Furthermore, we demonstrate higher-fidelity avatar reconstructions from monocular videos captured on commodity devices.\n\n我们提出了一种从由智能手机等常见设备拍摄的单目视频中重建可动画化三维高斯头像的新方法。从此类视频中进行光真实感三维头像重建具有挑战性，因受限的观察视角会使未观察区域欠约束，从而在新视图中引发伪影问题。为解决这一问题，我们引入了一种多视角头部扩散模型，利用其先验知识填补缺失区域，并在高斯点云渲染中确保视图一致性。\n为了实现精确的视角控制，我们使用基于 FLAME 的头部重建生成的法线图，提供像素对齐的归纳偏置。同时，我们通过对扩散模型输入条件化的方式，将从输入图像提取的 VAE 特征作为条件，保留面部身份和外观的细节。在高斯头像重建中，我们通过使用迭代去噪图像作为伪真值，蒸馏多视角扩散先验，有效缓解了过度饱和的问题。为了进一步提高光真实感，我们采用潜变量上采样技术，在解码图像之前对去噪潜变量进行精细化处理。\n在 NeRSemble 数据集上的评估结果表明，GAF 在新视图合成中比现有最先进方法提高了 5.34% 的 SSIM 得分。此外，我们证明了从由常见设备拍摄的单目视频中实现了更高保真度的头像重建。\n"
  },
  {
    "path": "abs/2412.10231.md",
    "content": "### SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians\n\n3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While the vanilla Gaussian Splatting representation is mainly designed for view synthesis, more recent works investigated how to extend it with scene understanding and language features. However, existing methods lack a detailed comprehension of scenes, limiting their ability to segment and interpret complex structures. To this end, We introduce SuperGSeg, a novel approach that fosters cohesive, context-aware scene representation by disentangling segmentation and language field distillation. SuperGSeg first employs neural Gaussians to learn instance and hierarchical segmentation features from multi-view images with the aid of off-the-shelf 2D masks. These features are then leveraged to create a sparse set of what we call Super-Gaussians. Super-Gaussians facilitate the distillation of 2D language features into 3D space. Through Super-Gaussians, our method enables high-dimensional language feature rendering without extreme increases in GPU memory. Extensive experiments demonstrate that SuperGSeg outperforms prior works on both open-vocabulary object localization and semantic segmentation tasks.\n\n三维高斯点云技术（3D Gaussian Splatting）因其高效的训练和实时渲染能力，近期受到了广泛关注。尽管基础的高斯点云表示主要用于视图合成，但近年来的研究尝试将其扩展到场景理解和语言特征融合。然而，现有方法在场景的细粒度理解方面存在不足，限制了其对复杂结构进行分割和解释的能力。\n为此，我们提出了 SuperGSeg，一种通过解耦分割和语言场蒸馏来促进连贯、上下文感知场景表示的新方法。SuperGSeg 首先利用神经高斯（Neural Gaussians）结合现成的二维掩码，从多视角图像中学习实例和层次分割特征。这些特征随后被用于创建一个稀疏集合，我们称之为 Super-Gaussians。Super-Gaussians 用于将二维语言特征蒸馏到三维空间，从而支持高维语言特征渲染，而无需极大地增加 GPU 内存需求。\n广泛的实验结果表明，SuperGSeg 在开放词汇对象定位和语义分割任务上均优于现有方法，显著提升了性能和场景理解的能力。\n"
  },
  {
    "path": "abs/2412.10371.md",
    "content": "### GaussianAD: Gaussian-Centric End-to-End Autonomous Driving\n\nVision-based autonomous driving shows great potential due to its satisfactory performance and low costs. Most existing methods adopt dense representations (e.g., bird's eye view) or sparse representations (e.g., instance boxes) for decision-making, which suffer from the trade-off between comprehensiveness and efficiency. This paper explores a Gaussian-centric end-to-end autonomous driving (GaussianAD) framework and exploits 3D semantic Gaussians to extensively yet sparsely describe the scene. We initialize the scene with uniform 3D Gaussians and use surrounding-view images to progressively refine them to obtain the 3D Gaussian scene representation. We then use sparse convolutions to efficiently perform 3D perception (e.g., 3D detection, semantic map construction). We predict 3D flows for the Gaussians with dynamic semantics and plan the ego trajectory accordingly with an objective of future scene forecasting. Our GaussianAD can be trained in an end-to-end manner with optional perception labels when available. Extensive experiments on the widely used nuScenes dataset verify the effectiveness of our end-to-end GaussianAD on various tasks including motion planning, 3D occupancy prediction, and 4D occupancy forecasting.\n\n基于视觉的自动驾驶因其出色的性能和低成本展现了巨大潜力。目前大多数方法采用密集表示（如鸟瞰视图）或稀疏表示（如实例框）进行决策，这在全面性和效率之间存在权衡。本文提出了一种以高斯为中心的端到端自动驾驶框架（GaussianAD），利用3D语义高斯实现对场景的广泛且稀疏的描述。我们使用均匀分布的3D高斯初始化场景，并通过周围视角的图像逐步细化，生成3D高斯场景表示。随后，我们利用稀疏卷积高效地执行3D感知任务（如3D检测和语义地图构建）。\n我们针对具有动态语义的高斯预测3D流动，并以未来场景预测为目标规划自车轨迹。GaussianAD 可以采用端到端的方式进行训练，并在可用时利用可选的感知标签。在广泛使用的 nuScenes 数据集上的实验表明，GaussianAD 在运动规划、3D占用预测以及4D占用预测等多项任务中表现出色，验证了其端到端方法的有效性。\n"
  },
  {
    "path": "abs/2412.10373.md",
    "content": "### GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction\n\n3D occupancy prediction is important for autonomous driving due to its comprehensive perception of the surroundings. To incorporate sequential inputs, most existing methods fuse representations from previous frames to infer the current 3D occupancy. However, they fail to consider the continuity of driving scenarios and ignore the strong prior provided by the evolution of 3D scenes (e.g., only dynamic objects move). In this paper, we propose a world-model-based framework to exploit the scene evolution for perception. We reformulate 3D occupancy prediction as a 4D occupancy forecasting problem conditioned on the current sensor input. We decompose the scene evolution into three factors: 1) ego motion alignment of static scenes; 2) local movements of dynamic objects; and 3) completion of newly-observed scenes. We then employ a Gaussian world model (GaussianWorld) to explicitly exploit these priors and infer the scene evolution in the 3D Gaussian space considering the current RGB observation. We evaluate the effectiveness of our framework on the widely used nuScenes dataset. Our GaussianWorld improves the performance of the single-frame counterpart by over 2% in mIoU without introducing additional computations.\n\n3D占用预测对于自动驾驶至关重要，因为它能够全面感知周围环境。为结合序列输入，目前的大多数方法通过融合先前帧的表示来推断当前的3D占用。然而，这些方法未能考虑驾驶场景的连续性，并忽略了由3D场景演化（例如，仅动态物体会移动）提供的强先验。在本文中，我们提出了一种基于世界模型的框架，用于利用场景演化进行感知。我们将3D占用预测重新表述为一种以当前传感器输入为条件的4D占用预测问题。\n我们将场景演化分解为三个因素：1）静态场景的自车运动对齐；2）动态物体的局部运动；3）新观察场景的补全。随后，我们采用高斯世界模型（GaussianWorld），在考虑当前RGB观测的情况下，显式地利用这些先验来推断3D高斯空间中的场景演化。\n在广泛使用的 nuScenes 数据集上的实验表明，我们的 GaussianWorld 在不增加额外计算的情况下，将单帧方法的性能（mIoU）提高了2%以上，验证了框架的有效性。\n"
  },
  {
    "path": "abs/2412.10972.md",
    "content": "### DCSEG: Decoupled 3D Open-Set Segmentation using Gaussian Splatting\n\nOpen-set 3D segmentation represents a major point of interest for multiple downstream robotics and augmented/virtual reality applications. Recent advances introduce 3D Gaussian Splatting as a computationally efficient representation of the underlying scene. They enable the rendering of novel views while achieving real-time display rates and matching the quality of computationally far more expensive methods. We present a decoupled 3D segmentation pipeline to ensure modularity and adaptability to novel 3D representations and semantic segmentation foundation models. The pipeline proposes class-agnostic masks based on a 3D reconstruction of the scene. Given the resulting class-agnostic masks, we use a class-aware 2D foundation model to add class annotations to the 3D masks. We test this pipeline with 3D Gaussian Splatting and different 2D segmentation models and achieve better performance than more tailored approaches while also significantly increasing the modularity.\n\n开放集三维分割是机器人和增强/虚拟现实等多个下游应用中的重要研究方向。最近的研究引入了三维高斯点云（3D Gaussian Splatting）作为一种计算高效的场景表示方法，不仅能够渲染新视图，还能实现实时显示速率，同时在质量上可与计算成本更高的方法相媲美。为提升适应性和模块化，我们提出了一种解耦的三维分割流程，以适配新型三维表示和语义分割基础模型。\n该流程基于场景的三维重建生成与类别无关的掩码，然后利用一个类别感知的二维基础模型为三维掩码添加类别注释。我们在三维高斯点云以及不同的二维分割模型上测试了这一流程，与更为定制化的方法相比，不仅取得了更优的性能，还显著提升了流程的模块化程度。\n\n"
  },
  {
    "path": "abs/2412.11258.md",
    "content": "### GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs\n\nEstimating physical properties for visual data is a crucial task in computer vision, graphics, and robotics, underpinning applications such as augmented reality, physical simulation, and robotic grasping. However, this area remains under-explored due to the inherent ambiguities in physical property estimation. To address these challenges, we introduce GaussianProperty, a training-free framework that assigns physical properties of materials to 3D Gaussians. Specifically, we integrate the segmentation capability of SAM with the recognition capability of GPT-4V(ision) to formulate a global-local physical property reasoning module for 2D images. Then we project the physical properties from multi-view 2D images to 3D Gaussians using a voting strategy. We demonstrate that 3D Gaussians with physical property annotations enable applications in physics-based dynamic simulation and robotic grasping. For physics-based dynamic simulation, we leverage the Material Point Method (MPM) for realistic dynamic simulation. For robot grasping, we develop a grasping force prediction strategy that estimates a safe force range required for object grasping based on the estimated physical properties. Extensive experiments on material segmentation, physics-based dynamic simulation, and robotic grasping validate the effectiveness of our proposed method, highlighting its crucial role in understanding physical properties from visual data.\n\n对视觉数据进行物理属性估计是计算机视觉、图形学和机器人领域的一项关键任务，支撑着增强现实、物理模拟和机器人抓取等应用。然而，由于物理属性估计的内在模糊性，这一领域仍然未被充分探索。为应对这些挑战，我们提出了一种GaussianProperty框架，这是一种无需训练的方案，能够为3D高斯分配材料的物理属性。\n具体来说，我们结合了 SAM 的分割能力和 GPT-4V(ision) 的识别能力，设计了一个用于2D图像的全局-局部物理属性推理模块。随后，我们通过投票策略将多视图2D图像的物理属性投射到3D高斯上。我们证明了带有物理属性标注的3D高斯能够支持基于物理的动态模拟和机器人抓取等应用。在基于物理的动态模拟中，我们利用材料点方法（Material Point Method, MPM）进行逼真的动态模拟。在机器人抓取方面，我们开发了一种抓取力预测策略，基于估计的物理属性，预测安全的抓取力范围。\n在材料分割、基于物理的动态模拟以及机器人抓取方面的广泛实验验证了我们方法的有效性，凸显了其在通过视觉数据理解物理属性中的重要作用。\n"
  },
  {
    "path": "abs/2412.11520.md",
    "content": "### EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting\n\nRecent advancements in 3D editing have highlighted the potential of text-driven methods in real-time, user-friendly AR/VR applications. However, current methods rely on 2D diffusion models without adequately considering multi-view information, resulting in multi-view inconsistency. While 3D Gaussian Splatting (3DGS) significantly improves rendering quality and speed, its 3D editing process encounters difficulties with inefficient optimization, as pre-trained Gaussians retain excessive source information, hindering optimization. To address these limitations, we propose \\textbf{EditSplat}, a novel 3D editing framework that integrates Multi-view Fusion Guidance (MFG) and Attention-Guided Trimming (AGT). Our MFG ensures multi-view consistency by incorporating essential multi-view information into the diffusion process, leveraging classifier-free guidance from the text-to-image diffusion model and the geometric properties of 3DGS. Additionally, our AGT leverages the explicit representation of 3DGS to selectively prune and optimize 3D Gaussians, enhancing optimization efficiency and enabling precise, semantically rich local edits. Through extensive qualitative and quantitative evaluations, EditSplat achieves superior multi-view consistency and editing quality over existing methods, significantly enhancing overall efficiency.\n\n三维编辑的最新进展凸显了文本驱动方法在实时、用户友好的 AR/VR 应用中的潜力。然而，现有方法依赖于二维扩散模型，未能充分考虑多视角信息，从而导致多视角不一致的问题。尽管三维高斯点云技术（3D Gaussian Splatting, 3DGS）在渲染质量和速度上取得了显著提升，其三维编辑过程却因优化效率低下而面临挑战，主要原因在于预训练的高斯点云保留了过多的原始信息，阻碍了优化过程。\n为了解决这些限制，我们提出了 EditSplat，一种融合了多视角融合引导（Multi-view Fusion Guidance, MFG）和注意力引导修剪（Attention-Guided Trimming, AGT）的新型三维编辑框架。MFG 通过在扩散过程中整合关键多视角信息，结合文本到图像扩散模型的无分类器引导和 3DGS 的几何特性，确保了多视角一致性。同时，AGT 利用 3DGS 的显式表示，对三维高斯进行选择性修剪和优化，从而提高优化效率并实现精准且语义丰富的局部编辑。\n通过广泛的定性和定量评估，EditSplat 在多视角一致性和编辑质量方面显著优于现有方法，同时显著提升了整体效率。\n"
  },
  {
    "path": "abs/2412.11579.md",
    "content": "### SweepEvGS: Event-Based 3D Gaussian Splatting for Macro and Micro Radiance Field Rendering from a Single Sweep\n\nRecent advancements in 3D Gaussian Splatting (3D-GS) have demonstrated the potential of using 3D Gaussian primitives for high-speed, high-fidelity, and cost-efficient novel view synthesis from continuously calibrated input views. However, conventional methods require high-frame-rate dense and high-quality sharp images, which are time-consuming and inefficient to capture, especially in dynamic environments. Event cameras, with their high temporal resolution and ability to capture asynchronous brightness changes, offer a promising alternative for more reliable scene reconstruction without motion blur. In this paper, we propose SweepEvGS, a novel hardware-integrated method that leverages event cameras for robust and accurate novel view synthesis across various imaging settings from a single sweep. SweepEvGS utilizes the initial static frame with dense event streams captured during a single camera sweep to effectively reconstruct detailed scene views. We also introduce different real-world hardware imaging systems for real-world data collection and evaluation for future research. We validate the robustness and efficiency of SweepEvGS through experiments in three different imaging settings: synthetic objects, real-world macro-level, and real-world micro-level view synthesis. Our results demonstrate that SweepEvGS surpasses existing methods in visual rendering quality, rendering speed, and computational efficiency, highlighting its potential for dynamic practical applications.\n\n三维高斯点云技术（3D Gaussian Splatting, 3D-GS）的最新进展显示了利用三维高斯基元在高速、高保真和成本高效的新视图合成中的潜力，这基于连续校准的输入视角。然而，传统方法依赖高帧率的稠密、高质量的清晰图像，这种图像的采集在动态环境中既耗时又低效。事件相机以其高时间分辨率和捕捉异步亮度变化的能力，为无运动模糊的更可靠场景重建提供了一种有前途的替代方案。\n在本文中，我们提出了 SweepEvGS，一种新型硬件集成方法，利用事件相机在各种成像条件下实现鲁棒且精确的新视图合成。SweepEvGS 使用单次相机扫描期间捕获的初始静态帧和稠密事件流，有效地重建了细节丰富的场景视图。同时，我们还介绍了不同的真实世界硬件成像系统，用于数据采集和未来研究的评估。\n通过在三种不同成像条件下的实验验证了 SweepEvGS 的鲁棒性和高效性，这些条件包括合成对象、真实世界宏观视图以及真实世界微观视图合成。结果表明，SweepEvGS 在视觉渲染质量、渲染速度和计算效率上均优于现有方法，凸显了其在动态实际应用中的潜力。\n\n"
  },
  {
    "path": "abs/2412.11599.md",
    "content": "### 3D2-Actor: Learning Pose-Conditioned 3D-Aware Denoiser for Realistic Gaussian Avatar Modeling\n\nAdvancements in neural implicit representations and differentiable rendering have markedly improved the ability to learn animatable 3D avatars from sparse multi-view RGB videos. However, current methods that map observation space to canonical space often face challenges in capturing pose-dependent details and generalizing to novel poses. While diffusion models have demonstrated remarkable zero-shot capabilities in 2D image generation, their potential for creating animatable 3D avatars from 2D inputs remains underexplored. In this work, we introduce 3D2-Actor, a novel approach featuring a pose-conditioned 3D-aware human modeling pipeline that integrates iterative 2D denoising and 3D rectifying steps. The 2D denoiser, guided by pose cues, generates detailed multi-view images that provide the rich feature set necessary for high-fidelity 3D reconstruction and pose rendering. Complementing this, our Gaussian-based 3D rectifier renders images with enhanced 3D consistency through a two-stage projection strategy and a novel local coordinate representation. Additionally, we propose an innovative sampling strategy to ensure smooth temporal continuity across frames in video synthesis. Our method effectively addresses the limitations of traditional numerical solutions in handling ill-posed mappings, producing realistic and animatable 3D human avatars. Experimental results demonstrate that 3D2-Actor excels in high-fidelity avatar modeling and robustly generalizes to novel poses.\n\n神经隐式表示和可微渲染的进步显著提升了从稀疏多视角RGB视频中学习可动画3D角色的能力。然而，目前将观察空间映射到规范空间的方法在捕捉依赖姿态的细节以及推广到新姿态时常面临挑战。尽管扩散模型在2D图像生成中展现了卓越的零样本能力，但其在从2D输入中创建可动画3D角色的潜力仍未被充分挖掘。\n在这项工作中，我们提出了 3D2-Actor，一种新颖的姿态条件3D感知人类建模管线，集成了迭代的2D去噪和3D校正步骤。2D去噪模块通过姿态线索引导，生成详细的多视角图像，为高保真3D重建和姿态渲染提供了丰富的特征集合。与此相辅相成，我们的基于高斯的3D校正模块通过两阶段投影策略和一种新颖的局部坐标表示，渲染出具有增强3D一致性的图像。此外，我们提出了一种创新的采样策略，确保视频合成中跨帧的平滑时间连续性。\n我们的方法有效解决了传统数值解法在处理病态映射时的局限性，生成真实且可动画的3D人类角色。实验结果表明，3D2-Actor 在高保真角色建模方面表现卓越，并能够稳健地推广到新姿态。\n"
  },
  {
    "path": "abs/2412.11752.md",
    "content": "### Deformable Radial Kernel Splatting\n\nRecently, Gaussian splatting has emerged as a robust technique for representing 3D scenes, enabling real-time rasterization and high-fidelity rendering. However, Gaussians' inherent radial symmetry and smoothness constraints limit their ability to represent complex shapes, often requiring thousands of primitives to approximate detailed geometry. We introduce Deformable Radial Kernel (DRK), which extends Gaussian splatting into a more general and flexible framework. Through learnable radial bases with adjustable angles and scales, DRK efficiently models diverse shape primitives while enabling precise control over edge sharpness and boundary curvature. iven DRK's planar nature, we further develop accurate ray-primitive intersection computation for depth sorting and introduce efficient kernel culling strategies for improved rasterization efficiency. Extensive experiments demonstrate that DRK outperforms existing methods in both representation efficiency and rendering quality, achieving state-of-the-art performance while dramatically reducing primitive count.\n\n近年来，高斯点云技术（Gaussian Splatting）作为一种鲁棒的三维场景表示方法迅速发展，实现了实时光栅化和高保真渲染。然而，高斯本身的径向对称性和光滑性限制了其对复杂形状的表示能力，通常需要成千上万个基元才能逼近细节几何。\n为解决这一问题，我们提出了 可变形径向核（Deformable Radial Kernel, DRK），将高斯点云扩展为更通用且灵活的框架。通过可学习的径向基函数，DRK 支持角度和尺度的可调节性，能够高效地建模多样的形状基元，同时实现对边缘锐度和边界曲率的精确控制。针对 DRK 的平面特性，我们进一步开发了准确的光线与基元相交计算方法，用于深度排序，并引入了高效的核剔除策略以提高光栅化效率。\n大量实验表明，DRK 在表示效率和渲染质量方面均优于现有方法，不仅达到了当前最先进的性能，还显著减少了基元数量，展现了卓越的表示能力和实用价值。\n"
  },
  {
    "path": "abs/2412.11762.md",
    "content": "### GS-ProCams: Gaussian Splatting-based Projector-Camera Systems\n\nWe present GS-ProCams, the first Gaussian Splatting-based framework for projector-camera systems (ProCams). GS-ProCams significantly enhances the efficiency of projection mapping (PM) that requires establishing geometric and radiometric mappings between the projector and the camera. Previous CNN-based ProCams are constrained to a specific viewpoint, limiting their applicability to novel perspectives. In contrast, NeRF-based ProCams support view-agnostic projection mapping, however, they require an additional colocated light source and demand significant computational and memory resources. To address this issue, we propose GS-ProCams that employs 2D Gaussian for scene representations, and enables efficient view-agnostic ProCams applications. In particular, we explicitly model the complex geometric and photometric mappings of ProCams using projector responses, the target surface's geometry and materials represented by Gaussians, and global illumination component. Then, we employ differentiable physically-based rendering to jointly estimate them from captured multi-view projections. Compared to state-of-the-art NeRF-based methods, our GS-ProCams eliminates the need for additional devices, achieving superior ProCams simulation quality. It is also 600 times faster and uses only 1/10 of the GPU memory.\n\n我们提出了 GS-ProCams，这是第一个基于高斯点云（Gaussian Splatting）的投影仪-摄像头系统（ProCams）框架，显著提升了投影映射（Projection Mapping, PM）的效率。投影映射需要建立投影仪与摄像头之间的几何和辐射映射，而以往基于 CNN 的 ProCams 方法仅适用于特定视角，限制了其在新视角下的应用能力。相比之下，基于 NeRF 的 ProCams 支持与视角无关的投影映射，但需要额外的同位光源，并且对计算和内存资源有较高需求。\n为了解决上述问题，我们提出的 GS-ProCams 使用二维高斯点云进行场景表示，实现了高效的视角无关 ProCams 应用。具体而言，我们显式建模了 ProCams 的复杂几何和光度映射，涵盖投影仪响应、由高斯表示的目标表面几何和材质，以及全局光照组件。随后，我们通过可微分的基于物理的渲染（PBR），从多视角投影捕获中联合估计这些组件。\n与最新的基于 NeRF 的方法相比，GS-ProCams 不再需要额外的设备，达到了更高质量的 ProCams 模拟效果，同时在速度上提高了 600 倍，GPU 内存使用量减少至原来的 1/10，展现了卓越的性能和效率。\n"
  },
  {
    "path": "abs/2412.12091.md",
    "content": "### Wonderland: Navigating 3D Scenes from a Single Image\n\nThis paper addresses a challenging question: How can we efficiently create high-quality, wide-scope 3D scenes from a single arbitrary image? Existing methods face several constraints, such as requiring multi-view data, time-consuming per-scene optimization, low visual quality in backgrounds, and distorted reconstructions in unseen areas. We propose a novel pipeline to overcome these limitations. Specifically, we introduce a large-scale reconstruction model that uses latents from a video diffusion model to predict 3D Gaussian Splattings for the scenes in a feed-forward manner. The video diffusion model is designed to create videos precisely following specified camera trajectories, allowing it to generate compressed video latents that contain multi-view information while maintaining 3D consistency. We train the 3D reconstruction model to operate on the video latent space with a progressive training strategy, enabling the efficient generation of high-quality, wide-scope, and generic 3D scenes. Extensive evaluations across various datasets demonstrate that our model significantly outperforms existing methods for single-view 3D scene generation, particularly with out-of-domain images. For the first time, we demonstrate that a 3D reconstruction model can be effectively built upon the latent space of a diffusion model to realize efficient 3D scene generation.\n\n本文探讨了一个具有挑战性的问题：如何从单张任意图像高效生成高质量、广范围的三维场景。现有方法存在多种限制，例如需要多视角数据、场景优化耗时、背景视觉质量较低，以及未观察区域的重建失真等。为克服这些问题，我们提出了一种新型流程，设计了一个基于视频扩散模型的三维高斯点云（3D Gaussian Splattings）预测模型，以前馈方式生成场景。\n该视频扩散模型专为精确生成遵循指定相机轨迹的视频而设计，能够生成包含多视角信息的压缩视频潜变量，同时保持三维一致性。我们通过渐进式训练策略训练三维重建模型，使其能够在视频潜变量空间中操作，从而高效生成高质量、广范围的通用三维场景。\n在多个数据集上的广泛评估表明，我们的模型在单视角三维场景生成方面显著优于现有方法，尤其在处理域外图像时表现尤为突出。我们首次展示了一种基于扩散模型潜变量空间的三维重建模型能够有效实现高效的三维场景生成，开创了新的研究方向。\n"
  },
  {
    "path": "abs/2412.12096.md",
    "content": "### PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting\n\nWith the advent of portable 360° cameras, panorama has gained significant attention in applications like virtual reality (VR), virtual tours, robotics, and autonomous driving. As a result, wide-baseline panorama view synthesis has emerged as a vital task, where high resolution, fast inference, and memory efficiency are essential. Nevertheless, existing methods are typically constrained to lower resolutions (512 × 1024) due to demanding memory and computational requirements. In this paper, we present PanSplat, a generalizable, feed-forward approach that efficiently supports resolution up to 4K (2048 × 4096). Our approach features a tailored spherical 3D Gaussian pyramid with a Fibonacci lattice arrangement, enhancing image quality while reducing information redundancy. To accommodate the demands of high resolution, we propose a pipeline that integrates a hierarchical spherical cost volume and Gaussian heads with local operations, enabling two-step deferred backpropagation for memory-efficient training on a single A100 GPU. Experiments demonstrate that PanSplat achieves state-of-the-art results with superior efficiency and image quality across both synthetic and real-world datasets.\n\n随着便携式 360° 相机的普及，全景图在虚拟现实（VR）、虚拟旅游、机器人和自动驾驶等应用中引起了广泛关注。因此，宽基线全景视图合成成为了一项重要任务，其中高分辨率、快速推理和内存效率至关重要。然而，现有方法通常受限于较低分辨率（512 × 1024），原因在于高昂的内存和计算需求。\n本文提出了 PanSplat，一种通用的前馈式方法，可高效支持高达 4K（2048 × 4096）分辨率。我们的方法采用了专门设计的球面三维高斯金字塔，并基于 Fibonacci 格点排列，以提升图像质量同时减少信息冗余。为满足高分辨率的需求，我们设计了一种集成分层球面代价体积和局部操作高斯头的流程，通过两步延迟反向传播实现单张 A100 GPU 上的内存高效训练。\n实验表明，PanSplat 在合成和真实世界数据集上均取得了当前最先进的结果，不仅具备优越的效率，还显著提高了图像质量。\n"
  },
  {
    "path": "abs/2412.12507.md",
    "content": "### 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has shown great potential for efficient reconstruction and high-fidelity real-time rendering of complex scenes on consumer hardware. However, due to its rasterization-based formulation, 3DGS is constrained to ideal pinhole cameras and lacks support for secondary lighting effects. Recent methods address these limitations by tracing volumetric particles instead, however, this comes at the cost of significantly slower rendering speeds. In this work, we propose 3D Gaussian Unscented Transform (3DGUT), replacing the EWA splatting formulation in 3DGS with the Unscented Transform that approximates the particles through sigma points, which can be projected exactly under any nonlinear projection function. This modification enables trivial support of distorted cameras with time dependent effects such as rolling shutter, while retaining the efficiency of rasterization. Additionally, we align our rendering formulation with that of tracing-based methods, enabling secondary ray tracing required to represent phenomena such as reflections and refraction within the same 3D representation.\n\n三维高斯点云（3D Gaussian Splatting, 3DGS）在消费者级硬件上实现复杂场景的高效重建和高保真实时渲染方面展现了巨大潜力。然而，由于其基于光栅化的框架，3DGS 受限于理想针孔相机模型，且不支持次级光照效果。尽管近期方法通过跟踪体积粒子解决了这些局限性，但代价是渲染速度显著降低。\n在本研究中，我们提出了 3D Gaussian Unscented Transform (3DGUT)，将 3DGS 中的 EWA 点云投影公式替换为无迹变换（Unscented Transform），通过使用 sigma 点来逼近粒子，并允许粒子在任意非线性投影函数下精确投影。此改进不仅保留了光栅化的效率，还能够轻松支持带有时间依赖效应（如滚动快门）的畸变相机。\n此外，我们将渲染公式与基于追踪的方法对齐，从而在同一三维表示框架中支持次级光线追踪，以呈现诸如反射和折射等现象。3DGUT 通过结合光栅化的高效性与追踪方法的灵活性，为复杂场景的逼真渲染提供了一种创新的解决方案。\n"
  },
  {
    "path": "abs/2412.12734.md",
    "content": "### Gaussian Billboards: Expressive 2D Gaussian Splatting with Textures\n\nGaussian Splatting has recently emerged as the go-to representation for reconstructing and rendering 3D scenes. The transition from 3D to 2D Gaussian primitives has further improved multi-view consistency and surface reconstruction accuracy. In this work we highlight the similarity between 2D Gaussian Splatting (2DGS) and billboards from traditional computer graphics. Both use flat semi-transparent 2D geometry that is positioned, oriented and scaled in 3D space. However 2DGS uses a solid color per splat and an opacity modulated by a Gaussian distribution, where billboards are more expressive, modulating the color with a uv-parameterized texture. We propose to unify these concepts by presenting Gaussian Billboards, a modification of 2DGS to add spatially-varying color achieved using per-splat texture interpolation. The result is a mixture of the two representations, which benefits from both the robust scene optimization power of 2DGS and the expressiveness of texture mapping. We show that our method can improve the sharpness and quality of the scene representation in a wide range of qualitative and quantitative evaluations compared to the original 2DGS implementation.\n\n高斯点云技术（Gaussian Splatting）近年来成为三维场景重建与渲染的首选表示方法。通过从三维高斯基元过渡到二维高斯基元（2DGS），多视图一致性和表面重建精度得到了进一步提升。在本研究中，我们指出二维高斯点云（2DGS）与传统计算机图形学中的广告牌（billboards）具有相似性。两者都使用平坦的半透明二维几何体，定位、定向并在三维空间中缩放。然而，2DGS 为每个点云分配了一个固定颜色，并通过高斯分布调节不透明度，而广告牌则更具表现力，利用 uv 参数化纹理调节颜色。\n我们提出了 Gaussian Billboards，一种对 2DGS 的改进方法，通过每个点云的纹理插值实现空间变化的颜色。该方法将两种表示形式统一起来，结合了 2DGS 在场景优化中的鲁棒性和纹理映射的表现力。相比于原始 2DGS 实现，我们的方法在广泛的定性和定量评估中表现出更高的清晰度和场景表示质量，为场景渲染提供了更精细的解决方案。\n"
  },
  {
    "path": "abs/2412.12849.md",
    "content": "### HyperGS: Hyperspectral 3D Gaussian Splatting\n\nWe introduce HyperGS, a novel framework for Hyperspectral Novel View Synthesis (HNVS), based on a new latent 3D Gaussian Splatting (3DGS) technique. Our approach enables simultaneous spatial and spectral renderings by encoding material properties from multi-view 3D hyperspectral datasets. HyperGS reconstructs high-fidelity views from arbitrary perspectives with improved accuracy and speed, outperforming currently existing methods. To address the challenges of high-dimensional data, we perform view synthesis in a learned latent space, incorporating a pixel-wise adaptive density function and a pruning technique for increased training stability and efficiency. Additionally, we introduce the first HNVS benchmark, implementing a number of new baselines based on recent SOTA RGB-NVS techniques, alongside the small number of prior works on HNVS. We demonstrate HyperGS's robustness through extensive evaluation of real and simulated hyperspectral scenes with a 14db accuracy improvement upon previously published models.\n\n我们提出了 HyperGS，一种基于新型潜在三维高斯点云技术（3D Gaussian Splatting, 3DGS）的高光谱新视图合成（Hyperspectral Novel View Synthesis, HNVS）框架。HyperGS 通过对多视角三维高光谱数据集的材质属性编码，实现了空间与光谱的同步渲染，能够从任意视角高保真地重建场景，同时在精度和速度上优于现有方法。\n为应对高维数据的挑战，我们在一个学习的潜在空间中进行视图合成，并引入了一种像素级自适应密度函数和修剪技术，以提高训练的稳定性和效率。此外，我们提出了首个 HNVS 基准测试，结合最新的 SOTA RGB 新视图合成技术和少量现有的 HNVS 方法，构建了一系列新基准。\n通过对真实和模拟高光谱场景的广泛评估，HyperGS 展现了出色的鲁棒性，相较于已有模型实现了高达 14 dB 的精度提升，为高光谱场景的高效、高精度合成设立了新标杆。\n"
  },
  {
    "path": "abs/2412.12906.md",
    "content": "### CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image\n\nRecently, generalizable feed-forward methods based on 3D Gaussian Splatting have gained significant attention for their potential to reconstruct 3D scenes using finite resources. These approaches create a 3D radiance field, parameterized by per-pixel 3D Gaussian primitives, from just a few images in a single forward pass. However, unlike multi-view methods that benefit from cross-view correspondences, 3D scene reconstruction with a single-view image remains an underexplored area. In this work, we introduce CATSplat, a novel generalizable transformer-based framework designed to break through the inherent constraints in monocular settings. First, we propose leveraging textual guidance from a visual-language model to complement insufficient information from a single image. By incorporating scene-specific contextual details from text embeddings through cross-attention, we pave the way for context-aware 3D scene reconstruction beyond relying solely on visual cues. Moreover, we advocate utilizing spatial guidance from 3D point features toward comprehensive geometric understanding under single-view settings. With 3D priors, image features can capture rich structural insights for predicting 3D Gaussians without multi-view techniques. Extensive experiments on large-scale datasets demonstrate the state-of-the-art performance of CATSplat in single-view 3D scene reconstruction with high-quality novel view synthesis.\n\n近年来，基于三维高斯点云（3D Gaussian Splatting）的通用前馈方法因其在有限资源下重建三维场景的潜力而备受关注。这些方法通过单次前向传播，从少量图像中生成由每像素三维高斯基元参数化的三维辐射场。然而，与利用多视角对应性的多视图方法相比，单视图图像的三维场景重建仍然是一个尚未深入探索的领域。\n在本研究中，我们提出了 CATSplat，一种创新的基于 Transformer 的框架，旨在突破单目设置中的固有限制。首先，我们通过视觉-语言模型的文本引导来补充单张图像中不足的信息。通过交叉注意力机制，将文本嵌入中的场景特定上下文细节引入重建过程，超越了单纯依赖视觉线索的限制，为上下文感知的三维场景重建提供了新思路。此外，我们还利用来自三维点特征的空间引导，以在单视图条件下实现全面的几何理解。通过三维先验，图像特征能够捕获丰富的结构信息，从而在无需多视图技术的情况下预测三维高斯点云。\n在大规模数据集上的广泛实验表明，CATSplat 在单视图三维场景重建和高质量新视图合成方面达到了当前最先进的性能，显著提升了单视图条件下的三维重建能力。\n"
  },
  {
    "path": "abs/2412.12919.md",
    "content": "### 4DRGS: 4D Radiative Gaussian Splatting for Efficient 3D Vessel Reconstruction from Sparse-View Dynamic DSA Images\n\nReconstructing 3D vessel structures from sparse-view dynamic digital subtraction angiography (DSA) images enables accurate medical assessment while reducing radiation exposure. Existing methods often produce suboptimal results or require excessive computation time. In this work, we propose 4D radiative Gaussian splatting (4DRGS) to achieve high-quality reconstruction efficiently. In detail, we represent the vessels with 4D radiative Gaussian kernels. Each kernel has time-invariant geometry parameters, including position, rotation, and scale, to model static vessel structures. The time-dependent central attenuation of each kernel is predicted from a compact neural network to capture the temporal varying response of contrast agent flow. We splat these Gaussian kernels to synthesize DSA images via X-ray rasterization and optimize the model with real captured ones. The final 3D vessel volume is voxelized from the well-trained kernels. Moreover, we introduce accumulated attenuation pruning and bounded scaling activation to improve reconstruction quality. Extensive experiments on real-world patient data demonstrate that 4DRGS achieves impressive results in 5 minutes training, which is 32x faster than the state-of-the-art method. This underscores the potential of 4DRGS for real-world clinics.\n\n从稀疏视角的动态数字减影血管造影（DSA）图像中重建三维血管结构，可以在降低辐射剂量的同时实现准确的医学评估。然而，现有方法常常结果欠佳或计算耗时过长。为此，我们提出了 四维辐射高斯点云（4D Radiative Gaussian Splatting, 4DRGS），以高效实现高质量重建。\n具体而言，我们使用四维辐射高斯核来表示血管结构。每个高斯核具有时间不变的几何参数，包括位置、旋转和尺度，用于建模静态的血管结构。为了捕捉对比剂流动的时间变化响应，我们通过一个紧凑的神经网络预测每个高斯核的时间相关中心衰减。我们通过 X 射线光栅化对这些高斯核进行点云渲染，合成 DSA 图像，并利用真实采集的 DSA 图像优化模型。最终的三维血管体积从经过充分训练的高斯核体素化生成。\n此外，我们引入了 累积衰减修剪 和 有界缩放激活 策略，以进一步提升重建质量。基于真实患者数据的大量实验表明，4DRGS 在仅 5 分钟的训练时间内即可实现卓越的重建效果，其速度比当前最先进方法快 32 倍。这表明 4DRGS 在实际临床应用中具有巨大潜力。\n"
  },
  {
    "path": "abs/2412.13047.md",
    "content": "### EOGS: Gaussian Splatting for Earth Observation\n\nRecently, Gaussian splatting has emerged as a strong alternative to NeRF, demonstrating impressive 3D modeling capabilities while requiring only a fraction of the training and rendering time. In this paper, we show how the standard Gaussian splatting framework can be adapted for remote sensing, retaining its high efficiency. This enables us to achieve state-of-the-art performance in just a few minutes, compared to the day-long optimization required by the best-performing NeRF-based Earth observation methods. The proposed framework incorporates remote-sensing improvements from EO-NeRF, such as radiometric correction and shadow modeling, while introducing novel components, including sparsity, view consistency, and opacity regularizations.\n\n近年来，高斯点云技术（Gaussian Splatting）作为一种强有力的 NeRF 替代方法，以显著降低训练和渲染时间的优势，展现了卓越的三维建模能力。在本文中，我们展示了如何将标准的高斯点云框架适配于遥感领域，同时保持其高效性。这一改进使我们能够在短短几分钟内实现当前最先进的性能，相比之下，性能最佳的基于 NeRF 的地球观测方法通常需要耗时一天的优化过程。\n该框架结合了来自 EO-NeRF 的遥感优化方法，包括辐射校正和阴影建模，同时引入了新的组件，如稀疏性约束、多视图一致性，以及不透明度正则化。这些改进使得我们的方法在保持高效率的同时，显著提升了遥感三维建模的精度和鲁棒性，为遥感数据的高效处理提供了新方案。\n"
  },
  {
    "path": "abs/2412.13176.md",
    "content": "### NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment\n\nSimultaneous Localization And Mapping (SLAM) from a monocular endoscopy video can enable autonomous navigation, guidance to unsurveyed regions, and 3D visualizations, which can significantly improve endoscopy experience for surgeons and patient outcomes. Existing dense SLAM algorithms often assume distant and static lighting and textured surfaces, and alternate between optimizing scene geometry and camera parameters by minimizing a photometric rendering loss, often called Photometric Bundle Adjustment. However, endoscopic environments exhibit dynamic near-field lighting due to the co-located light and camera moving extremely close to the surface, textureless surfaces, and strong specular reflections due to mucus layers. When not considered, these near-field lighting effects can cause significant performance reductions for existing SLAM algorithms from indoor/outdoor scenes when applied to endoscopy videos. To mitigate this problem, we introduce a new Near-Field Lighting Bundle Adjustment Loss (LNFL−BA) that can also be alternatingly optimized, along with the Photometric Bundle Adjustment loss, such that the captured images' intensity variations match the relative distance and orientation between the surface and the co-located light and camera. We derive a general NFL-BA loss function for 3D Gaussian surface representations and demonstrate that adding LNFL−BA can significantly improve the tracking and mapping performance of two state-of-the-art 3DGS-SLAM systems, MonoGS (35% improvement in tracking, 48% improvement in mapping with predicted depth maps) and EndoGSLAM (22% improvement in tracking, marginal improvement in mapping with predicted depths), on the C3VD endoscopy dataset for colons.\n\n从单目内窥镜视频实现同步定位与建图（SLAM）可以实现自主导航、引导未探测区域以及3D可视化，这将显著改善外科医生的内窥镜操作体验并提高患者的治疗效果。现有的密集SLAM算法通常假设远距离的静态光照和纹理化表面，并通过最小化一种称为光度渲染损失（Photometric Bundle Adjustment）的光度误差，在场景几何和相机参数的优化之间交替进行。然而，在内窥镜环境中，由于光源和相机位置靠近且随运动而动态变化，表面通常缺乏纹理，同时由于粘液层的存在会产生强烈的镜面反射，这些近场光照效应会对传统针对室内/室外场景的SLAM算法在内窥镜视频中的性能造成显著影响。为了解决这一问题，我们提出了一种新的近场光照联合调整损失（LNFL−BA），可与光度联合调整损失（Photometric Bundle Adjustment）交替优化，使捕获图像的强度变化能够匹配表面与近场光源及相机之间的相对距离和方向关系。我们推导了一个通用的适用于3D高斯表面表示的NFL-BA损失函数，并证明在两种先进的3DGS-SLAM系统（MonoGS和EndoGSLAM）中添加LNFL−BA可以显著提升追踪和建图性能，分别在C3VD内窥镜结肠数据集上的追踪性能提升35%和22%，建图性能在基于预测深度图的情况下分别提升48%和略有提升。这些结果表明，考虑近场光照效应的损失函数能够有效增强SLAM系统在内窥镜复杂环境中的鲁棒性和准确性。\n"
  },
  {
    "path": "abs/2412.13193.md",
    "content": "### GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding\n\n3D Semantic Occupancy Prediction is fundamental for spatial understanding as it provides a comprehensive semantic cognition of surrounding environments. However, prevalent approaches primarily rely on extensive labeled data and computationally intensive voxel-based modeling, restricting the scalability and generalizability of 3D representation learning. In this paper, we introduce GaussTR, a novel Gaussian Transformer that leverages alignment with foundation models to advance self-supervised 3D spatial understanding. GaussTR adopts a Transformer architecture to predict sparse sets of 3D Gaussians that represent scenes in a feed-forward manner. Through aligning rendered Gaussian features with diverse knowledge from pre-trained foundation models, GaussTR facilitates the learning of versatile 3D representations and enables open-vocabulary occupancy prediction without explicit annotations. Empirical evaluations on the Occ3D-nuScenes dataset showcase GaussTR's state-of-the-art zero-shot performance, achieving 11.70 mIoU while reducing training duration by approximately 50%. These experimental results highlight the significant potential of GaussTR for scalable and holistic 3D spatial understanding, with promising implications for autonomous driving and embodied agents.\n\n3D语义占用预测是空间理解的基础，因为它能够提供对周围环境的全面语义认知。然而，目前流行的方法主要依赖大量标注数据和计算密集的基于体素的建模，这限制了3D表示学习的可扩展性和通用性。在本文中，我们提出了 GaussTR，一种新颖的高斯Transformer，通过与基础模型的对齐推进自监督的3D空间理解。GaussTR 采用 Transformer 架构，以前馈方式预测表示场景的稀疏3D高斯集合。通过将渲染的高斯特征与预训练基础模型的多样化知识对齐，GaussTR 促进了多功能3D表示的学习，并在没有显式标注的情况下实现了开放词汇的占用预测。\n在 Occ3D-nuScenes 数据集上的实证评估表明，GaussTR 实现了最先进的零样本性能，以 11.70 mIoU 的结果领先，同时训练时间减少了约50%。这些实验结果展示了 GaussTR 在可扩展和整体性3D空间理解方面的显著潜力，并在自动驾驶和智能体领域具有重要的应用前景。\n"
  },
  {
    "path": "abs/2412.13547.md",
    "content": "### Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields\n\nNovel-view synthesis is an important problem in computer vision with applications in 3D reconstruction, mixed reality, and robotics. Recent methods like 3D Gaussian Splatting (3DGS) have become the preferred method for this task, providing high-quality novel views in real time. However, the training time of a 3DGS model is slow, often taking 30 minutes for a scene with 200 views. In contrast, our goal is to reduce the optimization time by training for fewer steps while maintaining high rendering quality. Specifically, we combine the guidance from both the position error and the appearance error to achieve a more effective densification. To balance the rate between adding new Gaussians and fitting old Gaussians, we develop a convergence-aware budget control mechanism. Moreover, to make the densification process more reliable, we selectively add new Gaussians from mostly visited regions. With these designs, we reduce the Gaussian optimization steps to one-third of the previous approach while achieving a comparable or even better novel view rendering quality. To further facilitate the rapid fitting of 4K resolution images, we introduce a dilation-based rendering technique. Our method, Turbo-GS, speeds up optimization for typical scenes and scales well to high-resolution (4K) scenarios on standard datasets. Through extensive experiments, we show that our method is significantly faster in optimization than other methods while retaining quality.\n\n新视图合成是计算机视觉中的一个重要问题，广泛应用于三维重建、混合现实和机器人领域。近年来，三维高斯点云（3D Gaussian Splatting, 3DGS）成为该任务的首选方法，能够实时提供高质量的新视图。然而，3DGS 模型的训练时间较长，对于包含 200 个视图的场景，通常需要 30 分钟的优化时间。\n针对这一问题，我们的目标是在减少训练步骤的同时保持高渲染质量。具体来说，我们结合了位置误差和外观误差的引导，来实现更高效的高斯密化过程。为平衡添加新高斯和优化旧高斯的速率，我们设计了一种 收敛感知预算控制机制。此外，为了提高密化过程的可靠性，我们优先从访问频率较高的区域选择添加新高斯。\n通过这些设计，我们将高斯优化步骤减少到原方法的三分之一，同时在新视图渲染质量上保持相当甚至更好的表现。为了进一步加速 4K 分辨率图像的拟合，我们引入了一种基于膨胀的渲染技术。我们的方法 Turbo-GS 不仅加速了典型场景的优化，还能够良好扩展至高分辨率（4K）场景。\n大量实验表明，Turbo-GS 相较于其他方法在优化速度上显著更快，同时保留了高质量的渲染效果。\n\n"
  },
  {
    "path": "abs/2412.13639.md",
    "content": "### 4D Radar-Inertial Odometry based on Gaussian Modeling and Multi-Hypothesis Scan Matching\n\n4D millimeter-wave (mmWave) radars are sensors that provide robustness against adverse weather conditions (rain, snow, fog, etc.), and as such they are increasingly being used for odometry and SLAM applications. However, the noisy and sparse nature of the returned scan data proves to be a challenging obstacle for existing point cloud matching based solutions, especially those originally intended for more accurate sensors such as LiDAR. Inspired by visual odometry research around 3D Gaussian Splatting, in this paper we propose using freely positioned 3D Gaussians to create a summarized representation of a radar point cloud tolerant to sensor noise, and subsequently leverage its inherent probability distribution function for registration (similar to NDT). Moreover, we propose simultaneously optimizing multiple scan matching hypotheses in order to further increase the robustness of the system against local optima of the function. Finally, we fuse our Gaussian modeling and scan matching algorithms into an EKF radar-inertial odometry system designed after current best practices. Experiments show that our Gaussian-based odometry is able to outperform current baselines on a well-known 4D radar dataset used for evaluation.\n\n4D 毫米波 (mmWave) 雷达因其在恶劣天气条件（如雨、雪、雾等）下的鲁棒性，正越来越多地被应用于里程计和 SLAM 系统。然而，由于雷达返回的扫描数据通常具有噪声和稀疏的特性，现有基于点云匹配的解决方案（尤其是那些针对更高精度传感器如 LiDAR 设计的方法）面临较大挑战。\n受基于三维高斯点云（3D Gaussian Splatting）的视觉里程计研究的启发，本文提出了一种利用自由定位的三维高斯来生成雷达点云的摘要表示的方法。该表示对传感器噪声具有较高容忍度，并利用其固有的概率分布函数进行配准（类似于 NDT 方法）。此外，我们提出了多配准假设的同时优化，以进一步提高系统在面对函数局部最优时的鲁棒性。\n最终，我们将高斯建模和扫描匹配算法整合到一个基于扩展卡尔曼滤波（EKF）的雷达-惯性里程计系统中，设计遵循当前最佳实践。实验表明，我们基于高斯的里程计在一个知名的 4D 雷达数据集上的性能优于现有基线方法，展现了其在雷达点云处理中的强大潜力。\n\n"
  },
  {
    "path": "abs/2412.13654.md",
    "content": "### GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting\n\n3D open-vocabulary scene understanding, which accurately perceives complex semantic properties of objects in space, has gained significant attention in recent years. In this paper, we propose GAGS, a framework that distills 2D CLIP features into 3D Gaussian splatting, enabling open-vocabulary queries for renderings on arbitrary viewpoints. The main challenge of distilling 2D features for 3D fields lies in the multiview inconsistency of extracted 2D features, which provides unstable supervision for the 3D feature field. GAGS addresses this challenge with two novel strategies. First, GAGS associates the prompt point density of SAM with the camera distances, which significantly improves the multiview consistency of segmentation results. Second, GAGS further decodes a granularity factor to guide the distillation process and this granularity factor can be learned in a unsupervised manner to only select the multiview consistent 2D features in the distillation process. Experimental results on two datasets demonstrate significant performance and stability improvements of GAGS in visual grounding and semantic segmentation, with an inference speed 2× faster than baseline methods.\n\n三维开放词汇场景理解（3D Open-Vocabulary Scene Understanding）近年来受到广泛关注，其目标是准确感知空间中对象的复杂语义属性。在本文中，我们提出了 GAGS，一种将二维 CLIP 特征蒸馏到三维高斯点云（3D Gaussian Splatting）中的框架，从而支持在任意视角下进行开放词汇查询和渲染。\n二维特征蒸馏到三维场景的主要挑战在于提取的二维特征在多视角间的一致性不足，这为三维特征场的监督带来了不稳定性。GAGS 通过两种创新策略解决了这一问题。首先，GAGS 将 SAM（Segment Anything Model）中的提示点密度与相机距离相关联，从而显著提升了分割结果的多视角一致性。其次，GAGS 解码了一个粒度因子来引导蒸馏过程，该粒度因子通过无监督方式学习，仅选择多视角一致的二维特征参与蒸馏。\n在两个数据集上的实验结果表明，GAGS 在视觉定位和语义分割任务中取得了显著的性能和稳定性提升，同时推理速度比基线方法快 2 倍，展现了其在三维开放词汇场景理解中的卓越表现。\n"
  },
  {
    "path": "abs/2412.13983.md",
    "content": "### GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians\n\nRendering photorealistic head avatars from arbitrary viewpoints is crucial for various applications like virtual reality. Although previous methods based on Neural Radiance Fields (NeRF) can achieve impressive results, they lack fidelity and efficiency. Recent methods using 3D Gaussian Splatting (3DGS) have improved rendering quality and real-time performance but still require significant storage overhead. In this paper, we introduce a method called GraphAvatar that utilizes Graph Neural Networks (GNN) to generate 3D Gaussians for the head avatar. Specifically, GraphAvatar trains a geometric GNN and an appearance GNN to generate the attributes of the 3D Gaussians from the tracked mesh. Therefore, our method can store the GNN models instead of the 3D Gaussians, significantly reducing the storage overhead to just 10MB. To reduce the impact of face-tracking errors, we also present a novel graph-guided optimization module to refine face-tracking parameters during training. Finally, we introduce a 3D-aware enhancer for post-processing to enhance the rendering quality. We conduct comprehensive experiments to demonstrate the advantages of GraphAvatar, surpassing existing methods in visual fidelity and storage consumption. The ablation study sheds light on the trade-offs between rendering quality and model size.\n\n从任意视角渲染光真实感的头像是虚拟现实等多种应用中的关键任务。尽管基于神经辐射场（NeRF）的现有方法能够取得令人印象深刻的效果，但在保真度和效率方面仍存在不足。近期基于三维高斯点云（3D Gaussian Splatting, 3DGS）的方法改善了渲染质量并实现了实时性能，但其存储开销仍然较高。\n为了解决这一问题，我们提出了 GraphAvatar，一种利用图神经网络（Graph Neural Networks, GNN）生成头像三维高斯的方法。具体而言，GraphAvatar 通过训练几何 GNN 和外观 GNN，从追踪的网格中生成三维高斯的属性。因此，我们的方法仅需存储 GNN 模型，而不需要存储三维高斯本身，将存储开销显著降低至仅 10MB。\n为减轻面部追踪误差的影响，我们引入了一种基于图引导的优化模块，用于在训练过程中优化面部追踪参数。此外，我们提出了一个三维感知增强模块，用于后处理以提升渲染质量。\n通过全面实验，我们验证了 GraphAvatar 在视觉保真度和存储消耗方面的显著优势。消融研究进一步探讨了渲染质量与模型大小之间的权衡，表明 GraphAvatar 在性能与存储效率上的平衡具有重要意义。\n\n"
  },
  {
    "path": "abs/2412.14568.md",
    "content": "### Improving Geometry in Sparse-View 3DGS via Reprojection-based DoF Separation\n\nRecent learning-based Multi-View Stereo models have demonstrated state-of-the-art performance in sparse-view 3D reconstruction. However, directly applying 3D Gaussian Splatting (3DGS) as a refinement step following these models presents challenges. We hypothesize that the excessive positional degrees of freedom (DoFs) in Gaussians induce geometry distortion, fitting color patterns at the cost of structural fidelity. To address this, we propose reprojection-based DoF separation, a method distinguishing positional DoFs in terms of uncertainty: image-plane-parallel DoFs and ray-aligned DoF. To independently manage each DoF, we introduce a reprojection process along with tailored constraints for each DoF. Through experiments across various datasets, we confirm that separating the positional DoFs of Gaussians and applying targeted constraints effectively suppresses geometric artifacts, producing reconstruction results that are both visually and geometrically plausible.\n\n近年来，基于学习的多视图立体（Multi-View Stereo, MVS）模型在稀疏视角的三维重建中表现出色。然而，将三维高斯点云（3D Gaussian Splatting, 3DGS）直接作为这些模型的后续优化步骤会面临一些挑战。我们假设，高斯基元中过多的位姿自由度（Degrees of Freedom, DoFs）会引发几何失真，导致为了匹配颜色模式而牺牲结构的准确性。\n为了解决这一问题，我们提出了基于重投影的自由度分离（Reprojection-based DoF Separation）方法，通过不确定性将位姿自由度区分为图像平面平行自由度和沿射线方向的自由度。为了独立管理每种自由度，我们引入了一个重投影过程，并针对每种自由度设计了专门的约束。\n在多个数据集上的实验结果表明，对高斯基元的位姿自由度进行分离并施加针对性的约束，能够有效抑制几何伪影，从而生成在视觉上和几何上均可信的重建结果。\n"
  },
  {
    "path": "abs/2412.14579.md",
    "content": "### GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting\n\n3D occupancy perception is gaining increasing attention due to its capability to offer detailed and precise environment representations. Previous weakly-supervised NeRF methods balance efficiency and accuracy, with mIoU varying by 5-10 points due to sampling count along camera rays. Recently, real-time Gaussian splatting has gained widespread popularity in 3D reconstruction, and the occupancy prediction task can also be viewed as a reconstruction task. Consequently, we propose GSRender, which naturally employs 3D Gaussian Splatting for occupancy prediction, simplifying the sampling process. In addition, the limitations of 2D supervision result in duplicate predictions along the same camera ray. We implemented the Ray Compensation (RC) module, which mitigates this issue by compensating for features from adjacent frames. Finally, we redesigned the loss to eliminate the impact of dynamic objects from adjacent frames. Extensive experiments demonstrate that our approach achieves SOTA (state-of-the-art) results in RayIoU (+6.0), while narrowing the gap with 3D supervision methods.\n\n三维占用感知（3D Occupancy Perception）因其能够提供细致且精确的环境表示而备受关注。以往的弱监督 NeRF 方法在效率与精度之间取得了一定平衡，但沿相机射线的采样次数会导致 mIoU 存在 5-10 个点的波动。近年来，实时高斯点云技术（3D Gaussian Splatting）因其在三维重建中的出色表现广受欢迎，而占用预测任务同样可以视为一种重建任务。\n基于此，我们提出了 GSRender，将三维高斯点云自然应用于占用预测任务，从而简化了采样过程。此外，由于二维监督的局限性，沿同一相机射线可能会出现重复预测。为解决这一问题，我们实现了 射线补偿模块（Ray Compensation, RC），通过补偿来自相邻帧的特征，缓解了这一问题。最后，我们重新设计了损失函数，消除了相邻帧中动态物体的影响。\n大量实验结果表明，我们的方法在 RayIoU 上实现了 +6.0 的提升，同时缩小了与三维监督方法的差距，达到了当前最先进（SOTA）的结果。我们的代码将很快开源。\n"
  },
  {
    "path": "abs/2412.14963.md",
    "content": "### IDOL: Instant Photorealistic 3D Human Creation from a Single Image\n\nCreating a high-fidelity, animatable 3D full-body avatar from a single image is a challenging task due to the diverse appearance and poses of humans and the limited availability of high-quality training data. To achieve fast and high-quality human reconstruction, this work rethinks the task from the perspectives of dataset, model, and representation. First, we introduce a large-scale HUman-centric GEnerated dataset, HuGe100K, consisting of 100K diverse, photorealistic sets of human images. Each set contains 24-view frames in specific human poses, generated using a pose-controllable image-to-multi-view model. Next, leveraging the diversity in views, poses, and appearances within HuGe100K, we develop a scalable feed-forward transformer model to predict a 3D human Gaussian representation in a uniform space from a given human image. This model is trained to disentangle human pose, body shape, clothing geometry, and texture. The estimated Gaussians can be animated without post-processing. We conduct comprehensive experiments to validate the effectiveness of the proposed dataset and method. Our model demonstrates the ability to efficiently reconstruct photorealistic humans at 1K resolution from a single input image using a single GPU instantly. Additionally, it seamlessly supports various applications, as well as shape and texture editing tasks.\n\n创建一个高保真、可动画的3D全身头像，仅从单张图像生成，是一个具有挑战性的任务，因为人类的外观和姿态多样，以及高质量训练数据的有限性。为了实现快速且高质量的人体重建，本研究从数据集、模型和表示方法的角度重新思考了这一任务。\n首先，我们引入了一个大规模以人为中心的生成数据集 HuGe100K，该数据集由100K组多样化、逼真的人像图像组成。每组包含24视角帧，显示特定的人体姿态，利用一个可控制姿态的图像到多视图模型生成。\n接着，利用 HuGe100K 数据集中丰富的视角、姿态和外观多样性，我们开发了一个可扩展的前馈Transformer模型，该模型能够从单个人像图像中预测一个统一空间内的3D人体高斯表示。模型经过训练，可以解耦人体的姿态、体型、服装几何形状和纹理。预测的高斯表示可直接用于动画生成，无需后处理。\n我们进行了全面的实验，验证了所提出数据集和方法的有效性。实验表明，我们的模型能够高效地从单张输入图像重建分辨率达1K的逼真人体，并可在单张GPU上即时完成。此外，该方法还无缝支持多种应用，包括形状和纹理编辑任务。\n"
  },
  {
    "path": "abs/2412.15171.md",
    "content": "### SqueezeMe: Efficient Gaussian Avatars for VR\n\nGaussian Splatting has enabled real-time 3D human avatars with unprecedented levels of visual quality. While previous methods require a desktop GPU for real-time inference of a single avatar, we aim to squeeze multiple Gaussian avatars onto a portable virtual reality headset with real-time drivable inference. We begin by training a previous work, Animatable Gaussians, on a high quality dataset captured with 512 cameras. The Gaussians are animated by controlling base set of Gaussians with linear blend skinning (LBS) motion and then further adjusting the Gaussians with a neural network decoder to correct their appearance. When deploying the model on a Meta Quest 3 VR headset, we find two major computational bottlenecks: the decoder and the rendering. To accelerate the decoder, we train the Gaussians in UV-space instead of pixel-space, and we distill the decoder to a single neural network layer. Further, we discover that neighborhoods of Gaussians can share a single corrective from the decoder, which provides an additional speedup. To accelerate the rendering, we develop a custom pipeline in Vulkan that runs on the mobile GPU. Putting it all together, we run 3 Gaussian avatars concurrently at 72 FPS on a VR headset.\n\n高斯点云技术（Gaussian Splatting）已使实时三维人像生成达到了前所未有的视觉质量水平。虽然现有方法需要桌面级 GPU 才能实现单个虚拟人像的实时推理，我们的目标是将多个高斯虚拟人像压缩至便携式虚拟现实（VR）头显上，并实现实时驱动推理。\n我们首先在一个由 512 台相机捕获的高质量数据集上训练现有的 可动画高斯（Animatable Gaussians） 模型。通过线性混合蒙皮（LBS）运动控制一组基础高斯，再通过神经网络解码器调整外观实现高斯动画化。在将模型部署到 Meta Quest 3 VR 头显 时，我们发现解码器和渲染是两个主要的计算瓶颈。\n为加速解码器，我们在 UV 空间而非像素空间中训练高斯，并将解码器蒸馏为一个单层神经网络。此外，我们发现，高斯点云的邻域可以共享一个修正值，从而进一步提升速度。在渲染方面，我们开发了一条基于 Vulkan 的自定义管线，以在移动 GPU 上高效运行。\n综合以上改进，我们成功在 VR 头显上以 72 FPS 同时运行 3 个高斯虚拟人像，显著提高了性能，为多虚拟人像实时驱动的便携式 VR 应用提供了可行的解决方案。\n"
  },
  {
    "path": "abs/2412.15215.md",
    "content": "### EnvGS: Modeling View-Dependent Appearance with Environment Gaussian\n\nReconstructing complex reflections in real-world scenes from 2D images is essential for achieving photorealistic novel view synthesis. Existing methods that utilize environment maps to model reflections from distant lighting often struggle with high-frequency reflection details and fail to account for near-field reflections. In this work, we introduce EnvGS, a novel approach that employs a set of Gaussian primitives as an explicit 3D representation for capturing reflections of environments. These environment Gaussian primitives are incorporated with base Gaussian primitives to model the appearance of the whole scene. To efficiently render these environment Gaussian primitives, we developed a ray-tracing-based renderer that leverages the GPU's RT core for fast rendering. This allows us to jointly optimize our model for high-quality reconstruction while maintaining real-time rendering speeds. Results from multiple real-world and synthetic datasets demonstrate that our method produces significantly more detailed reflections, achieving the best rendering quality in real-time novel view synthesis.\n\n从2D图像中重建真实场景中的复杂反射对于实现逼真的新视角合成至关重要。现有利用环境贴图来模拟远距离光照反射的方法通常难以捕捉高频反射细节，并且无法有效处理近场反射问题。在本文中，我们提出了一种新方法 EnvGS，通过一组高斯原语作为显式3D表示来捕捉环境的反射。这些环境高斯原语与基础高斯原语相结合，用于建模整个场景的外观。\n为了高效渲染这些环境高斯原语，我们开发了一种基于光线追踪的渲染器，利用GPU的RT核心实现快速渲染。这使得我们能够在保持实时渲染速度的同时，对模型进行高质量重建的联合优化。来自多个真实场景和合成数据集的实验结果表明，我们的方法能够显著生成更加细致的反射效果，在实时新视角合成任务中实现了最佳渲染质量。\n"
  },
  {
    "path": "abs/2412.15400.md",
    "content": "### SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction\n\nGaussian splatting has achieved impressive improvements for both novel-view synthesis and surface reconstruction from multi-view images. However, current methods still struggle to reconstruct high-quality surfaces from only sparse view input images using Gaussian splatting. In this paper, we propose a novel method called SolidGS to address this problem. We observed that the reconstructed geometry can be severely inconsistent across multi-views, due to the property of Gaussian function in geometry rendering. This motivates us to consolidate all Gaussians by adopting a more solid kernel function, which effectively improves the surface reconstruction quality. With the additional help of geometrical regularization and monocular normal estimation, our method achieves superior performance on the sparse view surface reconstruction than all the Gaussian splatting methods and neural field methods on the widely used DTU, Tanks-and-Temples, and LLFF datasets.\n\n高斯点云在多视图图像的新颖视图合成和表面重建方面取得了显著的进展。然而，当前的方法在仅使用稀疏视图输入图像进行高斯点云时，仍难以重建高质量的表面。在本文中，我们提出了一种名为SolidGS的新方法来解决这一问题。我们观察到，由于高斯函数在几何渲染中的特性，重建的几何体在多视图之间可能存在严重的不一致性。这促使我们通过采用更稳固的核函数来整合所有高斯点，从而有效提升表面重建质量。在几何正则化和单目法线估计的额外辅助下，我们的方法在稀疏视图表面重建方面，在广泛使用的DTU、Tanks-and-Temples和LLFF数据集上，表现出优于所有高斯点云方法和神经场方法的卓越性能。\n"
  },
  {
    "path": "abs/2412.15447.md",
    "content": "### LiHi-GS: LiDAR-Supervised Gaussian Splatting for Highway Driving Scene Reconstruction\n\nPhotorealistic 3D scene reconstruction plays an important role in autonomous driving, enabling the generation of novel data from existing datasets to simulate safety-critical scenarios and expand training data without additional acquisition costs. Gaussian Splatting (GS) facilitates real-time, photorealistic rendering with an explicit 3D Gaussian representation of the scene, providing faster processing and more intuitive scene editing than the implicit Neural Radiance Fields (NeRFs). While extensive GS research has yielded promising advancements in autonomous driving applications, they overlook two critical aspects: First, existing methods mainly focus on low-speed and feature-rich urban scenes and ignore the fact that highway scenarios play a significant role in autonomous driving. Second, while LiDARs are commonplace in autonomous driving platforms, existing methods learn primarily from images and use LiDAR only for initial estimates or without precise sensor modeling, thus missing out on leveraging the rich depth information LiDAR offers and limiting the ability to synthesize LiDAR data. In this paper, we propose a novel GS method for dynamic scene synthesis and editing with improved scene reconstruction through LiDAR supervision and support for LiDAR rendering. Unlike prior works that are tested mostly on urban datasets, to the best of our knowledge, we are the first to focus on the more challenging and highly relevant highway scenes for autonomous driving, with sparse sensor views and monotone backgrounds.\n\n逼真的3D场景重建在自动驾驶中具有重要作用，它能够从现有数据集中生成新的数据，用于模拟安全关键场景，扩展训练数据，而无需额外的采集成本。Gaussian Splatting (GS) 提供了一种显式3D高斯表示的场景重建方法，实现了实时的逼真渲染，与隐式的神经辐射场（NeRFs）相比，具有更快的处理速度和更直观的场景编辑能力。\n尽管现有的GS研究在自动驾驶应用中取得了显著进展，但它们忽视了两个关键问题：首先，现有方法主要关注低速和特征丰富的城市场景，而忽略了高速公路场景在自动驾驶中的重要性；其次，尽管激光雷达在自动驾驶平台中十分普遍，但现有方法主要依赖图像进行学习，仅将激光雷达数据用于初始估计或未进行精确的传感器建模，未能充分利用激光雷达丰富的深度信息，这限制了生成激光雷达数据的能力。\n为解决这些问题，本文提出了一种新颖的GS方法，用于动态场景的合成和编辑。通过激光雷达监督改进场景重建，并支持激光雷达渲染。与以往主要在城市数据集上测试的工作不同，据我们所知，这是首个关注高速公路场景的研究，这类场景对自动驾驶尤为重要，但具有稀疏的传感器视角和单调的背景，挑战性更大。\n"
  },
  {
    "path": "abs/2412.15550.md",
    "content": "### EGSRAL: An Enhanced 3D Gaussian Splatting based Renderer with Automated Labeling for Large-Scale Driving Scene\n\n3D Gaussian Splatting (3D GS) has gained popularity due to its faster rendering speed and high-quality novel view synthesis. Some researchers have explored using 3D GS for reconstructing driving scenes. However, these methods often rely on various data types, such as depth maps, 3D boxes, and trajectories of moving objects. Additionally, the lack of annotations for synthesized images limits their direct application in downstream tasks. To address these issues, we propose EGSRAL, a 3D GS-based method that relies solely on training images without extra annotations. EGSRAL enhances 3D GS's capability to model both dynamic objects and static backgrounds and introduces a novel adaptor for auto labeling, generating corresponding annotations based on existing annotations. We also propose a grouping strategy for vanilla 3D GS to address perspective issues in rendering large-scale, complex scenes. Our method achieves state-of-the-art performance on multiple datasets without any extra annotation. For example, the PSNR metric reaches 29.04 on the nuScenes dataset. Moreover, our automated labeling can significantly improve the performance of 2D/3D detection tasks.\n\n3D高斯点绘（3D Gaussian Splatting, 3D GS）因其快速渲染速度和高质量的新视角合成能力而受到广泛关注。一些研究者已探索将3D GS应用于驾驶场景重建。然而，这些方法通常依赖多种数据类型，例如深度图、3D框以及动态物体的轨迹。此外，合成图像缺乏标注，限制了其在下游任务中的直接应用。\n为了解决这些问题，我们提出了一种基于3D GS的新方法 EGSRAL，该方法完全依赖训练图像而无需额外标注。EGSRAL增强了3D GS在建模动态物体和静态背景方面的能力，并引入了一种新颖的自动标注适配器，能够基于已有标注生成相应的注释。此外，我们提出了一种针对基础3D GS的分组策略，用以解决在渲染大规模复杂场景时的透视问题。\n我们的方法在多个数据集上实现了最先进的性能，无需额外标注。例如，在 nuScenes 数据集上，PSNR 指标达到了29.04。此外，我们的自动化标注功能显著提高了2D/3D检测任务的性能。\n"
  },
  {
    "path": "abs/2412.15609.md",
    "content": "### AvatarPerfect: User-Assisted 3D Gaussian Splatting Avatar Refinement with Automatic Pose Suggestion\n\nCreating high-quality 3D avatars using 3D Gaussian Splatting (3DGS) from a monocular video benefits virtual reality and telecommunication applications. However, existing automatic methods exhibit artifacts under novel poses due to limited information in the input video. We propose AvatarPerfect, a novel system that allows users to iteratively refine 3DGS avatars by manually editing the rendered avatar images. In each iteration, our system suggests a new body and camera pose to help users identify and correct artifacts. The edited images are then used to update the current avatar, and our system suggests the next body and camera pose for further refinement. To investigate the effectiveness of AvatarPerfect, we conducted a user study comparing our method to an existing 3DGS editor SuperSplat, which allows direct manipulation of Gaussians without automatic pose suggestions. The results indicate that our system enables users to obtain higher quality refined 3DGS avatars than the existing 3DGS editor.\n\n利用3D高斯点绘（3D Gaussian Splatting, 3DGS）从单目视频创建高质量3D角色，为虚拟现实和远程通信应用提供了重要支持。然而，由于输入视频信息有限，现有自动化方法在生成新姿态时往往会出现伪影问题。我们提出了一种新系统 AvatarPerfect，允许用户通过手动编辑渲染的角色图像，迭代地优化3DGS角色。\n在每次迭代中，系统会建议新的身体和相机姿态，帮助用户识别并修正伪影。用户编辑后的图像用于更新当前角色模型，随后系统继续推荐下一个身体和相机姿态，进行进一步优化。为评估 AvatarPerfect 的有效性，我们与现有的3DGS编辑器 SuperSplat 进行了对比研究。后者允许直接操控高斯点，但缺乏自动姿态推荐功能。研究结果表明，与现有编辑器相比，我们的系统能够帮助用户生成质量更高的3DGS角色模型。\n"
  },
  {
    "path": "abs/2412.15867.md",
    "content": "### IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing\n\nIn inverse rendering, accurately modeling visibility and indirect radiance for incident light is essential for capturing secondary effects. Due to the absence of a powerful Gaussian ray tracer, previous 3DGS-based methods have either adopted a simplified rendering equation or used learnable parameters to approximate incident light, resulting in inaccurate material and lighting estimations. To this end, we introduce inter-reflective Gaussian splatting (IRGS) for inverse rendering. To capture inter-reflection, we apply the full rendering equation without simplification and compute incident radiance on the fly using the proposed differentiable 2D Gaussian ray tracing. Additionally, we present an efficient optimization scheme to handle the computational demands of Monte Carlo sampling for rendering equation evaluation. Furthermore, we introduce a novel strategy for querying the indirect radiance of incident light when relighting the optimized scenes. Extensive experiments on multiple standard benchmarks validate the effectiveness of IRGS, demonstrating its capability to accurately model complex inter-reflection effects.\n\n在逆向渲染中，准确建模可见性和入射光的间接辐射对于捕捉次级效应至关重要。由于缺乏强大的高斯光线追踪器，之前基于3D Gaussian Splatting（3DGS）的方法要么采用了简化的渲染方程，要么使用可学习参数来近似入射光，导致材料和光照估计不准确。为此，我们引入了用于逆向渲染的互反射高斯点云（Inter-Reflective Gaussian Splatting，IRGS）。为了捕捉互反射，我们采用了完整的渲染方程而不进行简化，并使用所提出的可微分二维高斯光线追踪实时计算入射辐射。此外，我们提出了一种高效的优化方案，以应对蒙特卡洛采样在渲染方程评估中的计算需求。此外，我们还引入了一种新颖的策略，用于在重新照明优化后的场景时查询入射光的间接辐射。在多个标准基准上的广泛实验验证了IRGS的有效性，展示了其准确建模复杂互反射效应的能力。\n"
  },
  {
    "path": "abs/2412.16028.md",
    "content": "### CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images\n\n3D Gaussian Splatting (3DGS) has attracted significant attention for its high-quality novel view rendering, inspiring research to address real-world challenges. While conventional methods depend on sharp images for accurate scene reconstruction, real-world scenarios are often affected by defocus blur due to finite depth of field, making it essential to account for realistic 3D scene representation. In this study, we propose CoCoGaussian, a Circle of Confusion-aware Gaussian Splatting that enables precise 3D scene representation using only defocused images. CoCoGaussian addresses the challenge of defocus blur by modeling the Circle of Confusion (CoC) through a physically grounded approach based on the principles of photographic defocus. Exploiting 3D Gaussians, we compute the CoC diameter from depth and learnable aperture information, generating multiple Gaussians to precisely capture the CoC shape. Furthermore, we introduce a learnable scaling factor to enhance robustness and provide more flexibility in handling unreliable depth in scenes with reflective or refractive surfaces. Experiments on both synthetic and real-world datasets demonstrate that CoCoGaussian achieves state-of-the-art performance across multiple benchmarks.\n\n3D高斯点绘（3D Gaussian Splatting, 3DGS）因其高质量的新视角渲染能力而受到广泛关注，激发了应对现实场景挑战的研究。传统方法依赖清晰图像进行准确的场景重建，而现实场景由于有限景深常受到散焦模糊的影响，这使得考虑逼真的3D场景表示变得至关重要。\n在本研究中，我们提出了 CoCoGaussian，一种基于散焦模糊感知的高斯点绘方法，能够仅利用散焦图像实现精确的3D场景表示。CoCoGaussian 通过基于摄影散焦原理的物理方法建模模糊圈（Circle of Confusion, CoC），解决了散焦模糊带来的挑战。利用3D高斯点，我们从深度信息和可学习的光圈参数中计算CoC直径，并生成多个高斯点以精确捕捉CoC形状。此外，我们引入了一种可学习的缩放因子，以增强在处理反射或折射表面等不可靠深度场景中的鲁棒性和灵活性。\n在合成和真实数据集上的实验表明，CoCoGaussian 在多个基准测试中实现了最先进的性能，验证了其在散焦模糊场景下的高效性和准确性。\n"
  },
  {
    "path": "abs/2412.16253.md",
    "content": "### Interactive Scene Authoring with Specialized Generative Primitives\n\nGenerating high-quality 3D digital assets often requires expert knowledge of complex design tools. We introduce Specialized Generative Primitives, a generative framework that allows non-expert users to author high-quality 3D scenes in a seamless, lightweight, and controllable manner. Each primitive is an efficient generative model that captures the distribution of a single exemplar from the real world. With our framework, users capture a video of an environment, which we turn into a high-quality and explicit appearance model thanks to 3D Gaussian Splatting. Users then select regions of interest guided by semantically-aware features. To create a generative primitive, we adapt Generative Cellular Automata to single-exemplar training and controllable generation. We decouple the generative task from the appearance model by operating on sparse voxels and we recover a high-quality output with a subsequent sparse patch consistency step. Each primitive can be trained within 10 minutes and used to author new scenes interactively in a fully compositional manner. We showcase interactive sessions where various primitives are extracted from real-world scenes and controlled to create 3D assets and scenes in a few minutes. We also demonstrate additional capabilities of our primitives: handling various 3D representations to control generation, transferring appearances, and editing geometries.\n\n生成高质量的3D数字资产通常需要掌握复杂设计工具的专业知识。我们提出了Specialized Generative Primitives，一种生成框架，能够让非专业用户以无缝、轻量化且可控的方式创建高质量3D场景。每个生成原语是一个高效的生成模型，能够捕捉单个真实世界样本的分布。\n在我们的框架中，用户通过拍摄环境视频，我们利用3D高斯点绘（3D Gaussian Splatting）将其转化为高质量且显式的外观模型。用户可以基于语义感知特征选择感兴趣区域。为了创建生成原语，我们将**生成元胞自动机（Generative Cellular Automata）**适配于单样本训练和可控生成。我们通过在稀疏体素上操作，将生成任务与外观模型分离，并通过后续的稀疏补丁一致性步骤恢复高质量输出。\n每个生成原语的训练时间不到10分钟，用户可在完全可组合的环境中交互式地创作新场景。我们展示了交互式操作会话，其中从真实场景中提取的不同原语被控制用于快速创建3D资产和场景。同时，我们还展示了原语的附加功能：支持多种3D表示控制生成、外观迁移以及几何编辑。\n"
  },
  {
    "path": "abs/2412.16346.md",
    "content": "### SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum\n\nWe propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only on-board perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k observation-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow and IMU data streams into low-level body rate and thrust commands at 20Hz onboard a drone. Crucially, SV-Net includes a Rapid Motor Adaptation (RMA) module that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: this https URL.\n\n我们提出了一种新的模拟器、训练方法和策略架构，统称为SOUS VIDE，用于端到端的视觉无人机导航。我们的训练策略展现了零样本仿真到现实的迁移能力，并在仅依赖车载感知和计算的情况下，实现了在真实世界中的稳健性能。我们的模拟器FiGS结合了计算简单的无人机动力学模型和高视觉保真度的高斯点云场景重建。FiGS能够快速模拟无人机飞行，生成高达130帧每秒的逼真图像。我们使用FiGS从具有特权状态和动力学信息的专家MPC中收集了10万至30万对观测-动作数据，这些数据在动力学参数和空间干扰方面进行了随机化。然后，我们将这些专家MPC蒸馏成一个端到端的视觉运动策略，采用一种轻量级的神经架构，称为SV-Net。SV-Net处理彩色图像、光流和IMU数据流，并以20Hz的频率在无人机上生成低级的体速率和推力指令。关键地，SV-Net包括一个快速电机适应（RMA）模块，能够在运行时适应无人机动力学的变化。在105次硬件实验中，我们展示了SOUS VIDE策略在面对30%的质量变化、40 m/s的风速突变、60%的环境亮度变化、场景中物体的移动或移除以及人员在无人机视野中积极移动等情况下，依然保持了高度的鲁棒性。\n"
  },
  {
    "path": "abs/2412.16604.md",
    "content": "### OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities\n\nFeed-forward 3D Gaussian Splatting (3DGS) models have gained significant popularity due to their ability to generate scenes immediately without needing per-scene optimization. Although omnidirectional images are getting more popular since they reduce the computation for image stitching to composite a holistic scene, existing feed-forward models are only designed for perspective images. The unique optical properties of omnidirectional images make it difficult for feature encoders to correctly understand the context of the image and make the Gaussian non-uniform in space, which hinders the image quality synthesized from novel views. We propose OmniSplat, a pioneering work for fast feed-forward 3DGS generation from a few omnidirectional images. We introduce Yin-Yang grid and decompose images based on it to reduce the domain gap between omnidirectional and perspective images. The Yin-Yang grid can use the existing CNN structure as it is, but its quasi-uniform characteristic allows the decomposed image to be similar to a perspective image, so it can exploit the strong prior knowledge of the learned feed-forward network. OmniSplat demonstrates higher reconstruction accuracy than existing feed-forward networks trained on perspective images. Furthermore, we enhance the segmentation consistency between omnidirectional images by leveraging attention from the encoder of OmniSplat, providing fast and clean 3DGS editing results.\n\n前馈式3D高斯点绘（3D Gaussian Splatting, 3DGS）模型因其无需对每个场景进行优化即可直接生成场景而备受关注。随着全景图像因其减少图像拼接计算量而逐渐流行，现有的前馈模型却仍然仅针对透视图像设计。全景图像独特的光学特性使得特征编码器难以正确理解图像上下文，从而导致高斯点在空间上的非均匀分布，进而影响从新视角生成图像的质量。\n我们提出了OmniSplat，这是从少量全景图像中快速生成3DGS的开创性方法。我们引入了阴阳网格（Yin-Yang grid），并基于此对图像进行分解，以缩小全景图像与透视图像之间的领域差距。阴阳网格能够直接使用现有的卷积神经网络（CNN）结构，同时其准均匀特性使得分解后的图像类似于透视图像，从而能够利用已有前馈网络中的强先验知识。\n实验表明，OmniSplat 在全景图像上的重建精度优于现有基于透视图像训练的前馈网络。此外，我们通过利用 OmniSplat 编码器的注意力机制增强了全景图像之间的分割一致性，从而提供快速且整洁的3DGS编辑效果。\n"
  },
  {
    "path": "abs/2412.16619.md",
    "content": "### Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity\n\nGaussian Splatting (GS) has emerged as a crucial technique for representing discrete volumetric radiance fields. It leverages unique parametrization to mitigate computational demands in scene optimization. This work introduces Topology-Aware 3D Gaussian Splatting (Topology-GS), which addresses two key limitations in current approaches: compromised pixel-level structural integrity due to incomplete initial geometric coverage, and inadequate feature-level integrity from insufficient topological constraints during optimization. To overcome these limitations, Topology-GS incorporates a novel interpolation strategy, Local Persistent Voronoi Interpolation (LPVI), and a topology-focused regularization term based on persistent barcodes, named PersLoss. LPVI utilizes persistent homology to guide adaptive interpolation, enhancing point coverage in low-curvature areas while preserving topological structure. PersLoss aligns the visual perceptual similarity of rendered images with ground truth by constraining distances between their topological features. Comprehensive experiments on three novel-view synthesis benchmarks demonstrate that Topology-GS outperforms existing methods in terms of PSNR, SSIM, and LPIPS metrics, while maintaining efficient memory usage. This study pioneers the integration of topology with 3D-GS, laying the groundwork for future research in this area.\n\n高斯点绘（Gaussian Splatting, GS）已成为表示离散体积辐射场的重要技术，它通过独特的参数化方法降低场景优化中的计算需求。本文提出了拓扑感知3D高斯点绘（Topology-GS），以解决当前方法的两个关键限制：由于初始几何覆盖不完全导致的像素级结构完整性受损，以及由于优化过程中缺乏足够的拓扑约束导致的特征级完整性不足。\n为克服这些限制，Topology-GS 引入了一种新颖的插值策略 局部持久Voronoi插值（Local Persistent Voronoi Interpolation, LPVI），以及基于持久条形码的拓扑聚焦正则项 PersLoss。LPVI 利用持久同调（persistent homology）引导自适应插值，在低曲率区域增强点覆盖，同时保留拓扑结构。PersLoss 通过约束渲染图像与真实图像的拓扑特征之间的距离，将视觉感知相似性与真实场景对齐。\n在三个新视角合成基准数据集上的全面实验表明，Topology-GS 在 PSNR、SSIM 和 LPIPS 指标上均优于现有方法，同时保持高效的内存使用。该研究开创性地将拓扑与3D-GS相结合，为该领域未来的研究奠定了基础。\n"
  },
  {
    "path": "abs/2412.16809.md",
    "content": "### GeoTexDensifier: Geometry-Texture-Aware Densification for High-Quality Photorealistic 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has recently attracted wide attentions in various areas such as 3D navigation, Virtual Reality (VR) and 3D simulation, due to its photorealistic and efficient rendering performance. High-quality reconstrution of 3DGS relies on sufficient splats and a reasonable distribution of these splats to fit real geometric surface and texture details, which turns out to be a challenging problem. We present GeoTexDensifier, a novel geometry-texture-aware densification strategy to reconstruct high-quality Gaussian splats which better comply with the geometric structure and texture richness of the scene. Specifically, our GeoTexDensifier framework carries out an auxiliary texture-aware densification method to produce a denser distribution of splats in fully textured areas, while keeping sparsity in low-texture regions to maintain the quality of Gaussian point cloud. Meanwhile, a geometry-aware splitting strategy takes depth and normal priors to guide the splitting sampling and filter out the noisy splats whose initial positions are far from the actual geometric surfaces they aim to fit, under a Validation of Depth Ratio Change checking. With the help of relative monocular depth prior, such geometry-aware validation can effectively reduce the influence of scattered Gaussians to the final rendering quality, especially in regions with weak textures or without sufficient training views. The texture-aware densification and geometry-aware splitting strategies are fully combined to obtain a set of high-quality Gaussian splats. We experiment our GeoTexDensifier framework on various datasets and compare our Novel View Synthesis results to other state-of-the-art 3DGS approaches, with detailed quantitative and qualitative evaluations to demonstrate the effectiveness of our method in producing more photorealistic 3DGS models.\n\n3D高斯点绘（3D Gaussian Splatting, 3DGS）因其逼真且高效的渲染性能，近年来在3D导航、虚拟现实（VR）和3D模拟等领域广受关注。高质量的3DGS重建依赖于足够的点分布以及合理的点密度，以匹配真实的几何表面和纹理细节，这一过程往往面临诸多挑战。\n我们提出了GeoTexDensifier，一种新颖的几何-纹理感知的密集化策略，用于重建高质量的高斯点分布，更好地符合场景的几何结构和纹理丰富性。具体来说，GeoTexDensifier 框架采用辅助的纹理感知密集化方法，在纹理丰富的区域生成更高密度的点分布，同时在低纹理区域保持稀疏分布，从而保证高斯点云的整体质量。同时，我们设计了一种几何感知分裂策略，通过深度和法线的先验信息指导分裂采样，并利用深度比率变化验证（Validation of Depth Ratio Change）筛除初始位置远离实际几何表面的噪声点。\n借助相对单目深度先验，这种几何感知验证方法能有效减少离散高斯点对最终渲染质量的影响，特别是在纹理较弱或缺乏充分训练视角的区域。纹理感知密集化与几何感知分裂策略的结合，最终生成了一组高质量的高斯点。\n我们在多个数据集上对 GeoTexDensifier 框架进行了实验，并将其新视角合成结果与其他最先进的3DGS方法进行比较，通过详尽的定量和定性评估展示了我们方法在生成更逼真3DGS模型方面的有效性。\n"
  },
  {
    "path": "abs/2412.16932.md",
    "content": "### GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs\n\nModeling and understanding the 3D world is crucial for various applications, from augmented reality to robotic navigation. Recent advancements based on 3D Gaussian Splatting have integrated semantic information from multi-view images into Gaussian primitives. However, these methods typically require costly per-scene optimization from dense calibrated images, limiting their practicality. In this paper, we consider the new task of generalizable 3D semantic field modeling from sparse, uncalibrated image pairs. Building upon the Splatt3R architecture, we introduce GSemSplat, a framework that learns open-vocabulary semantic representations linked to 3D Gaussians without the need for per-scene optimization, dense image collections or calibration. To ensure effective and reliable learning of semantic features in 3D space, we employ a dual-feature approach that leverages both region-specific and context-aware semantic features as supervision in the 2D space. This allows us to capitalize on their complementary strengths. Experimental results on the ScanNet++ dataset demonstrate the effectiveness and superiority of our approach compared to the traditional scene-specific method. We hope our work will inspire more research into generalizable 3D understanding.\n\n建模和理解三维世界对于各种应用至关重要，从增强现实到机器人导航。最近基于三维高斯点云的进展已经将来自多视图图像的语义信息整合到高斯原语中。然而，这些方法通常需要从密集校准图像中进行昂贵的每场景优化，限制了其实用性。在本文中，我们考虑了从稀疏、未校准的图像对中进行可泛化的三维语义场建模的新任务。基于Splatt3R架构，我们引入了GSemSplat，这是一个无需每场景优化、密集图像集合或校准的框架，能够学习与三维高斯关联的开放词汇语义表示。为了确保在三维空间中有效且可靠地学习语义特征，我们采用了双特征方法，利用区域特定和上下文感知的语义特征作为二维空间中的监督。这使我们能够利用它们的互补优势。在ScanNet++数据集上的实验结果表明，与传统的场景特定方法相比，我们的方法在效果和优越性方面具有显著优势。我们希望我们的工作能激发更多关于可泛化三维理解的研究。\n"
  },
  {
    "path": "abs/2412.17378.md",
    "content": "### Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling\n\n3D Gaussian Splatting (3DGS) is increasingly attracting attention in both academia and industry owing to its superior visual quality and rendering speed. However, training a 3DGS model remains a time-intensive task, especially in load imbalance scenarios where workload diversity among pixels and Gaussian spheres causes poor renderCUDA kernel performance. We introduce Balanced 3DGS, a Gaussian-wise parallelism rendering with fine-grained tiling approach in 3DGS training process, perfectly solving load-imbalance issues. First, we innovatively introduce the inter-block dynamic workload distribution technique to map workloads to Streaming Multiprocessor(SM) resources within a single GPU dynamically, which constitutes the foundation of load balancing. Second, we are the first to propose the Gaussian-wise parallel rendering technique to significantly reduce workload divergence inside a warp, which serves as a critical component in addressing load imbalance. Based on the above two methods, we further creatively put forward the fine-grained combined load balancing technique to uniformly distribute workload across all SMs, which boosts the forward renderCUDA kernel performance by up to 7.52x. Besides, we present a self-adaptive render kernel selection strategy during the 3DGS training process based on different load-balance situations, which effectively improves training efficiency.\n\n三维高斯点云（3D Gaussian Splatting，3DGS）因其卓越的视觉质量和渲染速度，在学术界和工业界日益受到关注。然而，训练3DGS模型仍然是一项耗时的任务，尤其是在负载不平衡的场景中，像素和高斯球体之间的工作负载多样性导致renderCUDA内核性能不佳。我们提出了Balanced 3DGS，这是一种在3DGS训练过程中采用细粒度切片的高斯级并行渲染方法，完美解决了负载不平衡的问题。首先，我们创新性地引入了块间动态工作负载分配技术，动态地将工作负载映射到单个GPU内的流多处理器（Streaming Multiprocessor，SM）资源上，这构成了负载平衡的基础。其次，我们首次提出了高斯级并行渲染技术，显著减少了一个warp内的工作负载分歧，这是解决负载不平衡的关键组件。在上述两种方法的基础上，我们进一步创造性地提出了细粒度组合负载平衡技术，以均匀分配所有SM的工作负载，从而将前向renderCUDA内核性能提升高达7.52倍。此外，我们在3DGS训练过程中基于不同的负载平衡情况提出了一种自适应渲染内核选择策略，有效提高了训练效率。\n"
  },
  {
    "path": "abs/2412.17532.md",
    "content": "### Exploring Dynamic Novel View Synthesis Technologies for Cinematography\n\nNovel view synthesis (NVS) has shown significant promise for applications in cinematographic production, particularly through the exploitation of Neural Radiance Fields (NeRF) and Gaussian Splatting (GS). These methods model real 3D scenes, enabling the creation of new shots that are challenging to capture in the real world due to set topology or expensive equipment requirement. This innovation also offers cinematographic advantages such as smooth camera movements, virtual re-shoots, slow-motion effects, etc. This paper explores dynamic NVS with the aim of facilitating the model selection process. We showcase its potential through a short montage filmed using various NVS models.\n\n新颖视图合成（Novel View Synthesis，NVS）在电影制作等应用中展现了显著的前景，尤其是通过利用神经辐射场（Neural Radiance Fields，NeRF）和高斯点云（Gaussian Splatting，GS）。这些方法对真实的三维场景进行建模，使得创建在现实世界中由于场景拓扑或昂贵设备需求而难以捕捉的新镜头成为可能。这一创新还提供了诸如平滑的相机运动、虚拟重拍、慢动作效果等电影制作优势。本文探讨了动态NVS，旨在简化模型选择过程。我们通过使用各种NVS模型拍摄的短片展示了其潜力。\n"
  },
  {
    "path": "abs/2412.17612.md",
    "content": "### CoSurfGS:Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction\n\n3D Gaussian Splatting (3DGS) has demonstrated impressive performance in scene reconstruction. However, most existing GS-based surface reconstruction methods focus on 3D objects or limited scenes. Directly applying these methods to large-scale scene reconstruction will pose challenges such as high memory costs, excessive time consumption, and lack of geometric detail, which makes it difficult to implement in practical applications. To address these issues, we propose a multi-agent collaborative fast 3DGS surface reconstruction framework based on distributed learning for large-scale surface reconstruction. Specifically, we develop local model compression (LMC) and model aggregation schemes (MAS) to achieve high-quality surface representation of large scenes while reducing GPU memory consumption. Extensive experiments on Urban3d, MegaNeRF, and BlendedMVS demonstrate that our proposed method can achieve fast and scalable high-fidelity surface reconstruction and photorealistic rendering.\n\n三维高斯点云（3D Gaussian Splatting，3DGS）在场景重建中表现出令人印象深刻的性能。然而，大多数现有的基于GS的表面重建方法主要集中在三维对象或有限的场景上。将这些方法直接应用于大规模场景重建将面临诸如高内存成本、过多的时间消耗以及缺乏几何细节等挑战，这使得在实际应用中难以实现。为了解决这些问题，我们提出了一种基于分布式学习的大规模表面重建的多代理协作快速3DGS表面重建框架。具体来说，我们开发了本地模型压缩（Local Model Compression，LMC）和模型聚合方案（Model Aggregation Schemes，MAS），以在减少GPU内存消耗的同时，实现大场景的高质量表面表示。在Urban3d、MegaNeRF和BlendedMVS上的广泛实验表明，我们提出的方法能够实现快速、可扩展的高保真表面重建和逼真渲染。\n"
  },
  {
    "path": "abs/2412.17635.md",
    "content": "### LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding\n\nApplying Gaussian Splatting to perception tasks for 3D scene understanding is becoming increasingly popular. Most existing works primarily focus on rendering 2D feature maps from novel viewpoints, which leads to an imprecise 3D language field with outlier languages, ultimately failing to align objects in 3D space. By utilizing masked images for feature extraction, these approaches also lack essential contextual information, leading to inaccurate feature representation. To this end, we propose a Language-Embedded Surface Field (LangSurf), which accurately aligns the 3D language fields with the surface of objects, facilitating precise 2D and 3D segmentation with text query, widely expanding the downstream tasks such as removal and editing. The core of LangSurf is a joint training strategy that flattens the language Gaussian on the object surfaces using geometry supervision and contrastive losses to assign accurate language features to the Gaussians of objects. In addition, we also introduce the Hierarchical-Context Awareness Module to extract features at the image level for contextual information then perform hierarchical mask pooling using masks segmented by SAM to obtain fine-grained language features in different hierarchies. Extensive experiments on open-vocabulary 2D and 3D semantic segmentation demonstrate that LangSurf outperforms the previous state-of-the-art method LangSplat by a large margin. As shown in Fig. 1, our method is capable of segmenting objects in 3D space, thus boosting the effectiveness of our approach in instance recognition, removal, and editing, which is also supported by comprehensive experiments.\n\n将高斯点云应用于3D场景理解的感知任务正变得越来越受欢迎。大多数现有工作主要集中在从新视点渲染二维特征图，这导致3D语言场存在异常语言，从而无法在三维空间中精确对齐对象。通过利用遮罩图像进行特征提取，这些方法也缺乏必要的上下文信息，导致特征表示不准确。为此，我们提出了语言嵌入表面场（Language-Embedded Surface Field，LangSurf），该方法能够准确地将3D语言场与对象表面对齐，促进了基于文本查询的精确二维和三维分割，广泛扩展了下游任务，如移除和编辑。LangSurf的核心是联合训练策略，利用几何监督和对比损失将语言高斯在对象表面上展开，从而为对象的高斯分配准确的语言特征。此外，我们还引入了层级上下文感知模块，以在图像级别提取上下文信息，然后使用由SAM分割的遮罩进行层级掩模池化，以获得不同层级的细粒度语言特征。在开放词汇的二维和三维语义分割上的广泛实验表明，LangSurf在很大程度上优于之前的最先进方法LangSplat。如图1所示，我们的方法能够在三维空间中分割对象，从而提升了我们方法在实例识别、移除和编辑方面的有效性，这也得到了全面实验的支持。\n"
  },
  {
    "path": "abs/2412.17715.md",
    "content": "### GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance\n\nIn this paper, we present GaussianPainter, the first method to paint a point cloud into 3D Gaussians given a reference image. GaussianPainter introduces an innovative feed-forward approach to overcome the limitations of time-consuming test-time optimization in 3D Gaussian splatting. Our method addresses a critical challenge in the field: the non-uniqueness problem inherent in the large parameter space of 3D Gaussian splatting. This space, encompassing rotation, anisotropic scales, and spherical harmonic coefficients, introduces the challenge of rendering similar images from substantially different Gaussian fields. As a result, feed-forward networks face instability when attempting to directly predict high-quality Gaussian fields, struggling to converge on consistent parameters for a given output. To address this issue, we propose to estimate a surface normal for each point to determine its Gaussian rotation. This strategy enables the network to effectively predict the remaining Gaussian parameters in the constrained space. We further enhance our approach with an appearance injection module, incorporating reference image appearance into Gaussian fields via a multiscale triplane representation. Our method successfully balances efficiency and fidelity in 3D Gaussian generation, achieving high-quality, diverse, and robust 3D content creation from point clouds in a single forward pass.\n\n在本文中，我们提出了GaussianPainter，这是第一个在给定参考图像的情况下将点云绘制为三维高斯点云的方法。GaussianPainter引入了一种创新的前馈方法，克服了三维高斯点云中耗时的测试时优化的限制。我们的方法解决了该领域的一个关键挑战：三维高斯点云的大参数空间固有的非唯一性问题。这个参数空间包括旋转、各向异性缩放和球面谐系数，这导致从截然不同的高斯场渲染出相似图像成为一大挑战。因此，前馈网络在尝试直接预测高质量的高斯场时会面临不稳定性，难以在给定输出上收敛到一致的参数。为了解决这个问题，我们提出为每个点估计一个表面法线，以确定其高斯旋转。该策略使网络能够在受限的空间内有效地预测其余的高斯参数。我们进一步通过外观注入模块增强了我们的方法，通过多尺度三平面表示将参考图像的外观融入高斯场。我们的方法在三维高斯生成中成功地平衡了效率和保真度，实现了从点云中在单次前馈过程中生成高质量、多样化且鲁棒的三维内容。\n"
  },
  {
    "path": "abs/2412.17769.md",
    "content": "### ActiveGS: Active Scene Reconstruction using Gaussian Splatting\n\nRobotics applications often rely on scene reconstructions to enable downstream tasks. In this work, we tackle the challenge of actively building an accurate map of an unknown scene using an on-board RGB-D camera. We propose a hybrid map representation that combines a Gaussian splatting map with a coarse voxel map, leveraging the strengths of both representations: the high-fidelity scene reconstruction capabilities of Gaussian splatting and the spatial modelling strengths of the voxel map. The core of our framework is an effective confidence modelling technique for the Gaussian splatting map to identify under-reconstructed areas, while utilising spatial information from the voxel map to target unexplored areas and assist in collision-free path planning. By actively collecting scene information in under-reconstructed and unexplored areas for map updates, our approach achieves superior Gaussian splatting reconstruction results compared to state-of-the-art approaches. Additionally, we demonstrate the applicability of our active scene reconstruction framework in the real world using an unmanned aerial vehicle.\n\n机器人应用通常依赖场景重建来实现下游任务。在这项工作中，我们应对了使用车载RGB-D摄像头主动构建未知场景精确地图的挑战。我们提出了一种混合地图表示方法，将高斯点云图与粗略体素图相结合，利用两种表示方式的优势：高斯点云在场景重建方面的高保真能力和体素图在空间建模方面的优势。我们框架的核心是一种有效的高斯点云置信度建模技术，用于识别重建不足的区域，同时利用体素图的空间信息来定位未探索区域并辅助无碰撞路径规划。通过在重建不足和未探索区域主动收集场景信息以更新地图，我们的方法在高斯点云重建结果上优于最先进的方法。此外，我们还通过使用无人机展示了我们主动场景重建框架在现实世界中的适用性。\n"
  },
  {
    "path": "abs/2412.17812.md",
    "content": "### FaceLift: Single Image to 3D Head with View Generation and GS-LRM\n\nWe present FaceLift, a feed-forward approach for rapid, high-quality, 360-degree head reconstruction from a single image. Our pipeline begins by employing a multi-view latent diffusion model that generates consistent side and back views of the head from a single facial input. These generated views then serve as input to a GS-LRM reconstructor, which produces a comprehensive 3D representation using Gaussian splats. To train our system, we develop a dataset of multi-view renderings using synthetic 3D human head as-sets. The diffusion-based multi-view generator is trained exclusively on synthetic head images, while the GS-LRM reconstructor undergoes initial training on Objaverse followed by fine-tuning on synthetic head data. FaceLift excels at preserving identity and maintaining view consistency across views. Despite being trained solely on synthetic data, FaceLift demonstrates remarkable generalization to real-world images. Through extensive qualitative and quantitative evaluations, we show that FaceLift outperforms state-of-the-art methods in 3D head reconstruction, highlighting its practical applicability and robust performance on real-world images. In addition to single image reconstruction, FaceLift supports video inputs for 4D novel view synthesis and seamlessly integrates with 2D reanimation techniques to enable 3D facial animation.\n\n我们提出了FaceLift，这是一种前馈方法，用于从单张图像快速、高质量地进行360度头部重建。我们的流程首先使用多视图潜在扩散模型，从单一面部输入生成一致的头部侧面和背面视图。这些生成的视图随后作为GS-LRM重建器的输入，后者使用高斯点云生成全面的三维表示。为了训练我们的系统，我们开发了一个使用合成三维人头资产的多视图渲染数据集。基于扩散的多视图生成器仅在合成头部图像上进行训练，而GS-LRM重建器则先在Objaverse上进行初步训练，然后在合成头部数据上进行微调。FaceLift在保持身份特征和各视图之间的一致性方面表现出色。尽管仅在合成数据上进行训练，FaceLift在真实世界图像上的泛化能力表现出色。通过广泛的定性和定量评估，我们展示了FaceLift在三维头部重建方面优于最先进的方法，突显了其在实际应用中的可行性和在真实世界图像上的稳健表现。除了单张图像重建，FaceLift还支持视频输入用于4D新颖视图合成，并与二维再动画技术无缝集成，实现三维面部动画。\n"
  },
  {
    "path": "abs/2412.18380.md",
    "content": "### RSGaussian:3D Gaussian Splatting with LiDAR for Aerial Remote Sensing Novel View Synthesis\n\nThis study presents RSGaussian, an innovative novel view synthesis (NVS) method for aerial remote sensing scenes that incorporate LiDAR point cloud as constraints into the 3D Gaussian Splatting method, which ensures that Gaussians grow and split along geometric benchmarks, addressing the overgrowth and floaters issues occurs. Additionally, the approach introduces coordinate transformations with distortion parameters for camera models to achieve pixel-level alignment between LiDAR point clouds and 2D images, facilitating heterogeneous data fusion and achieving the high-precision geo-alignment required in aerial remote sensing. Depth and plane consistency losses are incorporated into the loss function to guide Gaussians towards real depth and plane representations, significantly improving depth estimation accuracy. Experimental results indicate that our approach has achieved novel view synthesis that balances photo-realistic visual quality and high-precision geometric estimation under aerial remote sensing datasets. Finally, we have also established and open-sourced a dense LiDAR point cloud dataset along with its corresponding aerial multi-view images, AIR-LONGYAN.\n\n本研究提出了RSGaussian，这是一种创新的新颖视图合成（Novel View Synthesis，NVS）方法，适用于航空遥感场景。该方法将LiDAR点云作为约束引入三维高斯点云（3D Gaussian Splatting）方法，确保高斯点云沿几何基准增长和分裂，从而解决了过度增长和漂浮点的问题。此外，该方法引入了带有畸变参数的坐标变换，用于相机模型，以实现LiDAR点云与二维图像之间的像素级对齐，促进异构数据融合，并实现航空遥感中所需的高精度地理对齐。深度和平面一致性损失被纳入损失函数中，以引导高斯点云朝向真实的深度和平面表示，显著提高了深度估计的准确性。实验结果表明，我们的方法在航空遥感数据集下实现了在照片真实视觉质量和高精度几何估计之间的平衡的新颖视图合成。最后，我们还建立并开源了一个密集的LiDAR点云数据集及其对应的航空多视图图像，命名为AIR-LONGYAN。\n"
  },
  {
    "path": "abs/2412.18783.md",
    "content": "### ArtNVG: Content-Style Separated Artistic Neighboring-View Gaussian Stylization\n\nAs demand from the film and gaming industries for 3D scenes with target styles grows, the importance of advanced 3D stylization techniques increases. However, recent methods often struggle to maintain local consistency in color and texture throughout stylized scenes, which is essential for maintaining aesthetic coherence. To solve this problem, this paper introduces ArtNVG, an innovative 3D stylization framework that efficiently generates stylized 3D scenes by leveraging reference style images. Built on 3D Gaussian Splatting (3DGS), ArtNVG achieves rapid optimization and rendering while upholding high reconstruction quality. Our framework realizes high-quality 3D stylization by incorporating two pivotal techniques: Content-Style Separated Control and Attention-based Neighboring-View Alignment. Content-Style Separated Control uses the CSGO model and the Tile ControlNet to decouple the content and style control, reducing risks of information leakage. Concurrently, Attention-based Neighboring-View Alignment ensures consistency of local colors and textures across neighboring views, significantly improving visual quality. Extensive experiments validate that ArtNVG surpasses existing methods, delivering superior results in content preservation, style alignment, and local consistency.\n\n随着电影和游戏产业对具有目标风格的3D场景需求的增长，先进的3D风格化技术的重要性也在提升。然而，近期的方法常常难以在整个风格化场景中保持颜色和纹理的局部一致性，而这对于维持美学连贯性至关重要。为了解决这一问题，本文介绍了ArtNVG，这是一种创新的3D风格化框架，通过利用参考风格图像高效地生成风格化的3D场景。基于3D高斯散射（3DGS），ArtNVG在保持高重建质量的同时，实现了快速优化和渲染。我们的框架通过结合两项关键技术实现了高质量的3D风格化：内容-风格分离控制和基于注意力的邻近视图对齐。内容-风格分离控制使用CSGO模型和Tile ControlNet来分离内容和风格控制，降低信息泄露的风险。同时，基于注意力的邻近视图对齐确保了邻近视图之间局部颜色和纹理的一致性，显著提升了视觉质量。大量实验验证了ArtNVG优于现有方法，在内容保留、风格对齐和局部一致性方面提供了更为出色的结果。\n"
  },
  {
    "path": "abs/2412.18862.md",
    "content": "### WeatherGS: 3D Scene Reconstruction in Adverse Weather Conditions via Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has gained significant attention for 3D scene reconstruction, but still suffers from complex outdoor environments, especially under adverse weather. This is because 3DGS treats the artifacts caused by adverse weather as part of the scene and will directly reconstruct them, largely reducing the clarity of the reconstructed scene. To address this challenge, we propose WeatherGS, a 3DGS-based framework for reconstructing clear scenes from multi-view images under different weather conditions. Specifically, we explicitly categorize the multi-weather artifacts into the dense particles and lens occlusions that have very different characters, in which the former are caused by snowflakes and raindrops in the air, and the latter are raised by the precipitation on the camera lens. In light of this, we propose a dense-to-sparse preprocess strategy, which sequentially removes the dense particles by an Atmospheric Effect Filter (AEF) and then extracts the relatively sparse occlusion masks with a Lens Effect Detector (LED). Finally, we train a set of 3D Gaussians by the processed images and generated masks for excluding occluded areas, and accurately recover the underlying clear scene by Gaussian splatting. We conduct a diverse and challenging benchmark to facilitate the evaluation of 3D reconstruction under complex weather scenarios. Extensive experiments on this benchmark demonstrate that our WeatherGS consistently produces high-quality, clean scenes across various weather scenarios, outperforming existing state-of-the-art methods.\n\n三维高斯散射（3D Gaussian Splatting，3DGS）在三维场景重建领域备受关注，但在复杂的户外环境中，尤其是在恶劣天气条件下仍存在挑战。这是因为3DGS将恶劣天气引起的伪影视为场景的一部分，直接对其进行重建，显著降低了重建场景的清晰度。为了解决这一问题，我们提出了WeatherGS，一种基于3DGS的框架，用于在不同天气条件下从多视角图像中重建清晰的场景。\n具体而言，我们明确将多种天气伪影分为两类：密集颗粒和镜头遮挡，这两者具有截然不同的特性。其中，密集颗粒由空气中的雪花和雨滴引起，而镜头遮挡则是由于降水附着在相机镜头上造成的。基于这一分类，我们提出了一种“密到疏”的预处理策略，依次使用大气效应滤波器（Atmospheric Effect Filter, AEF）去除密集颗粒，然后利用镜头效应检测器（Lens Effect Detector, LED）提取相对稀疏的遮挡掩膜。最后，我们通过处理后的图像和生成的遮挡掩膜训练一组三维高斯，并排除遮挡区域，通过高斯散射精确恢复场景的清晰内容。\n我们构建了一个多样化且具有挑战性的基准，用于评估复杂天气场景下的三维重建性能。大量实验表明，在该基准上，WeatherGS在各种天气条件下始终生成高质量、干净的场景，其性能显著优于现有的最先进方法。\n"
  },
  {
    "path": "abs/2412.19130.md",
    "content": "### MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo\n\nThis study addresses the challenge of online 3D model generation for neural rendering using an RGB image stream. Previous research has tackled this issue by incorporating Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS) as scene representations within dense SLAM methods. However, most studies focus primarily on estimating coarse 3D scenes rather than achieving detailed reconstructions. Moreover, depth estimation based solely on images is often ambiguous, resulting in low-quality 3D models that lead to inaccurate renderings. To overcome these limitations, we propose a novel framework for high-quality 3DGS modeling that leverages an online multi-view stereo (MVS) approach. Our method estimates MVS depth using sequential frames from a local time window and applies comprehensive depth refinement techniques to filter out outliers, enabling accurate initialization of Gaussians in 3DGS. Furthermore, we introduce a parallelized backend module that optimizes the 3DGS model efficiently, ensuring timely updates with each new keyframe. Experimental results demonstrate that our method outperforms state-of-the-art dense SLAM methods, particularly excelling in challenging outdoor environments.\n\n本研究针对使用RGB图像流进行神经渲染的在线3D模型生成这一挑战展开。以往研究通过在密集SLAM方法中引入神经辐射场（NeRF）或三维高斯散射（3DGS）作为场景表示来解决这一问题。然而，大多数研究主要关注粗略3D场景的估计，而非实现细致的重建。此外，仅基于图像的深度估计通常存在歧义，导致3D模型质量较低，从而影响渲染的准确性。\n为克服这些局限性，我们提出了一种新的高质量3DGS建模框架，该框架结合了在线多视图立体（MVS）方法。我们的方法使用局部时间窗口中的序列帧估计MVS深度，并应用全面的深度优化技术过滤离群值，从而实现对3DGS中高斯点的精确初始化。此外，我们引入了一个并行化的后端模块，能够高效优化3DGS模型，确保在每个新关键帧加入时及时更新。\n实验结果表明，我们的方法在性能上优于现有的最先进密集SLAM方法，特别是在复杂的户外环境中表现突出。\n"
  },
  {
    "path": "abs/2412.19142.md",
    "content": "### CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting\n\nRecent works in 3D multimodal learning have made remarkable progress. However, typically 3D multimodal models are only capable of handling point clouds. Compared to the emerging 3D representation technique, 3D Gaussian Splatting (3DGS), the spatially sparse point cloud cannot depict the texture information of 3D objects, resulting in inferior reconstruction capabilities. This limitation constrains the potential of point cloud-based 3D multimodal representation learning. In this paper, we present CLIP-GS, a novel multimodal representation learning framework grounded in 3DGS. We introduce the GS Tokenizer to generate serialized gaussian tokens, which are then processed through transformer layers pre-initialized with weights from point cloud models, resulting in the 3DGS embeddings. CLIP-GS leverages contrastive loss between 3DGS and the visual-text embeddings of CLIP, and we introduce an image voting loss to guide the directionality and convergence of gradient optimization. Furthermore, we develop an efficient way to generate triplets of 3DGS, images, and text, facilitating CLIP-GS in learning unified multimodal representations. Leveraging the well-aligned multimodal representations, CLIP-GS demonstrates versatility and outperforms point cloud-based models on various 3D tasks, including multimodal retrieval, zero-shot, and few-shot classification.\n\n近年来，三维多模态学习取得了显著进展。然而，典型的三维多模态模型通常仅能处理点云。与新兴的三维表示技术——三维高斯散射（3D Gaussian Splatting, 3DGS）相比，空间稀疏的点云无法准确描述三维物体的纹理信息，从而导致较差的重建能力。这一局限性制约了基于点云的三维多模态表示学习的潜力。\n在本文中，我们提出了CLIP-GS，一种基于3DGS的创新多模态表示学习框架。我们引入了GS分词器（GS Tokenizer），用于生成序列化的高斯标记（Gaussian Tokens），这些标记随后通过预初始化了点云模型权重的Transformer层进行处理，从而生成3DGS嵌入。CLIP-GS利用3DGS与CLIP的视觉-文本嵌入之间的对比损失（Contrastive Loss），并提出了一种图像投票损失（Image Voting Loss）以指导梯度优化的方向性和收敛性。此外，我们开发了一种高效的方法，用于生成3DGS、图像和文本的三元组，从而促进CLIP-GS学习统一的多模态表示。\n通过对齐良好的多模态表示，CLIP-GS展现了高度的通用性，并在多种三维任务中优于基于点云的模型，包括多模态检索、零样本和小样本分类等任务。\n"
  },
  {
    "path": "abs/2412.19149.md",
    "content": "### Generating Editable Head Avatars with 3D Gaussian GANs\n\nGenerating animatable and editable 3D head avatars is essential for various applications in computer vision and graphics. Traditional 3D-aware generative adversarial networks (GANs), often using implicit fields like Neural Radiance Fields (NeRF), achieve photorealistic and view-consistent 3D head synthesis. However, these methods face limitations in deformation flexibility and editability, hindering the creation of lifelike and easily modifiable 3D heads. We propose a novel approach that enhances the editability and animation control of 3D head avatars by incorporating 3D Gaussian Splatting (3DGS) as an explicit 3D representation. This method enables easier illumination control and improved editability. Central to our approach is the Editable Gaussian Head (EG-Head) model, which combines a 3D Morphable Model (3DMM) with texture maps, allowing precise expression control and flexible texture editing for accurate animation while preserving identity. To capture complex non-facial geometries like hair, we use an auxiliary set of 3DGS and tri-plane features. Extensive experiments demonstrate that our approach delivers high-quality 3D-aware synthesis with state-of-the-art controllability.\n\n生成可动画和可编辑的三维头部头像在计算机视觉和图形学的各种应用中至关重要。传统的三维感知生成对抗网络（GAN），通常使用神经辐射场（NeRF）等隐式场景表示，实现了照片级逼真和视角一致的三维头部合成。然而，这些方法在变形灵活性和可编辑性方面存在局限，难以创建逼真且易于修改的三维头部模型。\n我们提出了一种新方法，通过将三维高斯散射（3D Gaussian Splatting, 3DGS）作为显式三维表示，增强三维头部头像的可编辑性和动画控制能力。该方法支持更简单的光照控制和改进的可编辑性。我们方法的核心是可编辑高斯头部模型（Editable Gaussian Head, EG-Head），它将三维可变形模型（3D Morphable Model, 3DMM）与纹理贴图相结合，允许精确的表情控制和灵活的纹理编辑，从而在保持身份特征的同时实现准确的动画效果。\n为了捕获复杂的非面部几何结构（如头发），我们引入了一组辅助的3DGS和三平面特征（tri-plane features）。大量实验表明，我们的方法在高质量的三维感知合成方面表现出色，同时具备当前最先进的可控性。\n"
  },
  {
    "path": "abs/2412.19282.md",
    "content": "### Reflective Gaussian Splatting\n\nNovel view synthesis has experienced significant advancements owing to increasingly capable NeRF- and 3DGS-based methods. However, reflective object reconstruction remains challenging, lacking a proper solution to achieve real-time, high-quality rendering while accommodating inter-reflection. To fill this gap, we introduce a Reflective Gaussian splatting (\\textbf{Ref-Gaussian}) framework characterized with two components: (I) {\\em Physically based deferred rendering} that empowers the rendering equation with pixel-level material properties via formulating split-sum approximation; (II) {\\em Gaussian-grounded inter-reflection} that realizes the desired inter-reflection function within a Gaussian splatting paradigm for the first time. To enhance geometry modeling, we further introduce material-aware normal propagation and an initial per-Gaussian shading stage, along with 2D Gaussian primitives. Extensive experiments on standard datasets demonstrate that Ref-Gaussian surpasses existing approaches in terms of quantitative metrics, visual quality, and compute efficiency. Further, we show that our method serves as a unified solution for both reflective and non-reflective scenes, going beyond the previous alternatives focusing on only reflective scenes. Also, we illustrate that Ref-Gaussian supports more applications such as relighting and editing.\n\n新视图合成技术因日益强大的 NeRF 和 3DGS 方法而取得了显著进展。然而，对于反射性物体的重建仍然面临挑战，缺乏能够在实现实时高质量渲染的同时处理互反射的解决方案。为填补这一空白，我们提出了一种 Reflective Gaussian Splatting（Ref-Gaussian）框架，该框架由以下两个关键组件构成：1) 基于物理的延迟渲染（Physically based deferred rendering）：通过公式化分割求和近似（split-sum approximation），使渲染方程具备像素级材质属性支持。2)\t基于高斯的互反射（Gaussian-grounded inter-reflection）：首次在高斯喷射框架内实现了期望的互反射功能。\n为增强几何建模，我们进一步引入了 材质感知法线传播 和 初始高斯着色阶段，并结合 二维高斯基元。\n在标准数据集上的大量实验表明，Ref-Gaussian 在定量指标、视觉质量和计算效率方面均优于现有方法。此外，我们证明了该方法不仅能够处理反射场景，还可以统一应用于反射和非反射场景，超越了以往仅聚焦于反射场景的替代方案。同时，我们还展示了 Ref-Gaussian 支持更多应用，如重新光照和编辑。\n"
  },
  {
    "path": "abs/2412.19370.md",
    "content": "### BeSplat -- Gaussian Splatting from a Single Blurry Image and Event Stream\n\nNovel view synthesis has been greatly enhanced by the development of radiance field methods. The introduction of 3D Gaussian Splatting (3DGS) has effectively addressed key challenges, such as long training times and slow rendering speeds, typically associated with Neural Radiance Fields (NeRF), while maintaining high-quality reconstructions. In this work (BeSplat), we demonstrate the recovery of sharp radiance field (Gaussian splats) from a single motion-blurred image and its corresponding event stream. Our method jointly learns the scene representation via Gaussian Splatting and recovers the camera motion through Bezier SE(3) formulation effectively, minimizing discrepancies between synthesized and real-world measurements of both blurry image and corresponding event stream. We evaluate our approach on both synthetic and real datasets, showcasing its ability to render view-consistent, sharp images from the learned radiance field and the estimated camera trajectory. To the best of our knowledge, ours is the first work to address this highly challenging ill-posed problem in a Gaussian Splatting framework with the effective incorporation of temporal information captured using the event stream.\n\n新视角合成技术因辐射场方法的发展而得到了极大提升。三维高斯散射（3D Gaussian Splatting, 3DGS）的引入有效解决了神经辐射场（NeRF）通常面临的关键挑战，例如训练时间长和渲染速度慢，同时仍能保持高质量的重建。在本研究（BeSplat）中，我们展示了如何从单张运动模糊图像及其对应的事件流中恢复清晰的辐射场（高斯散点）。\n我们的方法通过联合学习，利用高斯散射进行场景表示，并通过Bezier SE(3)形式有效恢复相机运动，最小化合成图像与真实世界数据（模糊图像和对应事件流）之间的差异。我们在合成数据集和真实数据集上评估了该方法，展示了其从学习的辐射场和估计的相机轨迹中渲染视角一致、清晰图像的能力。据我们所知，这是首次在高斯散射框架中结合事件流捕获的时间信息，有效解决这一高度挑战的病态问题。\n"
  },
  {
    "path": "abs/2412.19483.md",
    "content": "### Learning Radiance Fields from a Single Snapshot Compressive Image\n\nIn this paper, we explore the potential of Snapshot Compressive Imaging (SCI) technique for recovering the underlying 3D scene structure from a single temporal compressed image. SCI is a cost-effective method that enables the recording of high-dimensional data, such as hyperspectral or temporal information, into a single image using low-cost 2D imaging sensors. To achieve this, a series of specially designed 2D masks are usually employed, reducing storage and transmission requirements and offering potential privacy protection. Inspired by this, we take one step further to recover the encoded 3D scene information leveraging powerful 3D scene representation capabilities of neural radiance fields (NeRF). Specifically, we propose SCINeRF, in which we formulate the physical imaging process of SCI as part of the training of NeRF, allowing us to exploit its impressive performance in capturing complex scene structures. In addition, we further integrate the popular 3D Gaussian Splatting (3DGS) framework and propose SCISplat to improve 3D scene reconstruction quality and training/rendering speed by explicitly optimizing point clouds into 3D Gaussian representations. To assess the effectiveness of our method, we conduct extensive evaluations using both synthetic data and real data captured by our SCI system. Experimental results demonstrate that our proposed approach surpasses the state-of-the-art methods in terms of image reconstruction and novel view synthesis. Moreover, our method also exhibits the ability to render high frame-rate multi-view consistent images in real time by leveraging SCI and the rendering capabilities of 3DGS.\n\n在本文中，我们探索了**快照压缩成像（Snapshot Compressive Imaging, SCI）**技术在从单张时间压缩图像中恢复三维场景结构的潜力。SCI是一种经济高效的方法，通过使用低成本的二维成像传感器将高维数据（如高光谱或时间信息）记录到单张图像中。为实现这一目标，通常使用一系列专门设计的二维掩模，从而减少存储和传输需求，并提供潜在的隐私保护。\n受此启发，我们进一步利用神经辐射场（NeRF）的强大三维场景表示能力来恢复编码的三维场景信息。具体而言，我们提出了SCINeRF，将SCI的物理成像过程融入NeRF的训练中，从而利用其在捕获复杂场景结构方面的卓越表现。此外，我们结合了流行的三维高斯散射（3D Gaussian Splatting, 3DGS）框架，提出了SCISplat，通过显式优化点云为三维高斯表示，提高了三维场景重建的质量以及训练和渲染速度。\n为评估我们方法的有效性，我们在使用SCI系统采集的合成数据和真实数据上进行了广泛的实验。实验结果表明，我们的方法在图像重建和新视角合成方面超越了现有的最先进方法。此外，通过结合SCI和3DGS的渲染能力，我们的方法还表现出实时渲染高帧率、多视图一致图像的能力。\n"
  },
  {
    "path": "abs/2412.19518.md",
    "content": "### Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images\n\nPhoto-realistic scene reconstruction from sparse-view, uncalibrated images is highly required in practice. Although some successes have been made, existing methods are either Sparse-View but require accurate camera parameters (i.e., intrinsic and extrinsic), or SfM-free but need densely captured images. To combine the advantages of both methods while addressing their respective weaknesses, we propose Dust to Tower (D2T), an accurate and efficient coarse-to-fine framework to optimize 3DGS and image poses simultaneously from sparse and uncalibrated images. Our key idea is to first construct a coarse model efficiently and subsequently refine it using warped and inpainted images at novel viewpoints. To do this, we first introduce a Coarse Construction Module (CCM) which exploits a fast Multi-View Stereo model to initialize a 3D Gaussian Splatting (3DGS) and recover initial camera poses. To refine the 3D model at novel viewpoints, we propose a Confidence Aware Depth Alignment (CADA) module to refine the coarse depth maps by aligning their confident parts with estimated depths by a Mono-depth model. Then, a Warped Image-Guided Inpainting (WIGI) module is proposed to warp the training images to novel viewpoints by the refined depth maps, and inpainting is applied to fulfill the \"holes\" in the warped images caused by view-direction changes, providing high-quality supervision to further optimize the 3D model and the camera poses. Extensive experiments and ablation studies demonstrate the validity of D2T and its design choices, achieving state-of-the-art performance in both tasks of novel view synthesis and pose estimation while keeping high efficiency.\n\n从稀疏视角、未校准的图像中进行照片级真实感的场景重建在实践中具有重要需求。尽管已有方法取得了一定成功，但现有方法要么是针对稀疏视角但需要精确的相机参数（即内参和外参），要么是无需结构化运动（SfM）但需要密集采集的图像。为结合两种方法的优势并克服各自的局限性，我们提出了Dust to Tower (D2T)，一种准确高效的从粗到精框架，用于从稀疏未校准图像中同时优化3DGS（3D Gaussian Splatting）和图像的相机位姿。\n我们的核心思路是，先高效地构建粗略模型，然后利用在新视角下生成和修复的图像对其进行细化。具体来说，我们首先引入粗略构建模块（Coarse Construction Module, CCM），利用快速的多视图立体模型（MVS）初始化3D高斯散射（3DGS），并恢复初始的相机位姿。为了在新视角下优化3D模型，我们提出了置信度感知深度对齐模块（Confidence Aware Depth Alignment, CADA），通过将粗略深度图的高置信度部分与单目深度模型估计的深度对齐来细化深度图。随后，我们设计了基于图像变换的修复模块（Warped Image-Guided Inpainting, WIGI），通过细化后的深度图将训练图像变换到新视角，并通过修复技术填补因视角变化导致的图像“空洞”，提供高质量的监督信息以进一步优化3D模型和相机位姿。\n大量实验和消融研究验证了D2T及其设计选择的有效性，在新视角合成和位姿估计任务中均实现了最先进的性能，同时保持了较高的效率。\n"
  },
  {
    "path": "abs/2412.19584.md",
    "content": "### DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction\n\nWe propose a novel framework for scene decomposition and static background reconstruction from everyday videos. By integrating the trained motion masks and modeling the static scene as Gaussian splats with dynamics-aware optimization, our method achieves more accurate background reconstruction results than previous works. Our proposed method is termed DAS3R, an abbreviation for Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction. Compared to existing methods, DAS3R is more robust in complex motion scenarios, capable of handling videos where dynamic objects occupy a significant portion of the scene, and does not require camera pose inputs or point cloud data from SLAM-based methods. We compared DAS3R against recent distractor-free approaches on the DAVIS and Sintel datasets; DAS3R demonstrates enhanced performance and robustness with a margin of more than 2 dB in PSNR.\n\n我们提出了一种新颖的框架，用于从日常视频中进行场景分解和静态背景重建。通过集成训练好的运动掩码，并将静态场景建模为具有动态感知优化的高斯喷射，我们的方法比现有工作实现了更准确的背景重建结果。我们将该方法命名为 DAS3R，即 Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction 的缩写。\n与现有方法相比，DAS3R 在复杂运动场景中更加鲁棒，能够处理动态物体占据场景大部分的视频，并且无需相机位姿输入或基于 SLAM 方法的点云数据。在 DAVIS 和 Sintel 数据集上，我们将 DAS3R 与最近的无干扰方法进行了比较；结果表明，DAS3R 在性能和鲁棒性上均有显著提升，PSNR 指标提高了 2 dB 以上。\n"
  },
  {
    "path": "abs/2412.19841.md",
    "content": "### FlameGS: Reconstruct flame light field via Gaussian Splatting\n\nTo address the time-consuming and computationally intensive issues of traditional ART algorithms for flame combustion diagnosis, inspired by flame simulation technology, we propose a novel representation method for flames. By modeling the luminous process of flames and utilizing 2D projection images for supervision, our experimental validation shows that this model achieves an average structural similarity index of 0.96 between actual images and predicted 2D projections, along with a Peak Signal-to-Noise Ratio of 39.05. Additionally, it saves approximately 34 times the computation time and about 10 times the memory compared to traditional algorithms.\n\n为解决传统 ART 算法在火焰燃烧诊断中耗时且计算量大的问题，受火焰仿真技术的启发，我们提出了一种新颖的火焰表示方法。通过对火焰发光过程建模并利用二维投影图像进行监督，我们的实验验证表明，该模型在实际图像与预测二维投影之间实现了平均结构相似性指数（SSIM）为 0.96，以及峰值信噪比（PSNR）为 39.05。此外，与传统算法相比，该方法计算时间减少约 34 倍，内存需求降低约 10 倍。\n\n"
  },
  {
    "path": "abs/2412.20056.md",
    "content": "### GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting\n\nWe present GSplatLoc, a camera localization method that leverages the differentiable rendering capabilities of 3D Gaussian splatting for ultra-precise pose estimation. By formulating pose estimation as a gradient-based optimization problem that minimizes discrepancies between rendered depth maps from a pre-existing 3D Gaussian scene and observed depth images, GSplatLoc achieves translational errors within 0.01 cm and near-zero rotational errors on the Replica dataset - significantly outperforming existing methods. Evaluations on the Replica and TUM RGB-D datasets demonstrate the method's robustness in challenging indoor environments with complex camera motions. GSplatLoc sets a new benchmark for localization in dense mapping, with important implications for applications requiring accurate real-time localization, such as robotics and augmented reality.\n\n我们提出了GSplatLoc，一种利用三维高斯散射（3D Gaussian Splatting）可微渲染能力的相机定位方法，实现了超高精度的位姿估计。通过将位姿估计表述为一个基于梯度优化的问题，GSplatLoc通过最小化预先构建的三维高斯场景渲染深度图与观测深度图之间的差异，达到了前所未有的定位精度。在Replica数据集上，GSplatLoc的平移误差低至0.01厘米，旋转误差接近零，显著优于现有方法。\n在Replica和TUM RGB-D数据集上的评估表明，该方法在复杂相机运动的室内环境中表现出极高的鲁棒性。GSplatLoc为密集映射中的定位任务设立了新的基准，对需要高精度实时定位的应用（如机器人技术和增强现实）具有重要意义。\n"
  },
  {
    "path": "abs/2412.20148.md",
    "content": "### DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis\n\nAccurately synthesizing talking face videos and capturing fine facial features for individuals with long hair presents a significant challenge. To tackle these challenges in existing methods, we propose a decomposed per-embedding Gaussian fields (DEGSTalk), a 3D Gaussian Splatting (3DGS)-based talking face synthesis method for generating realistic talking faces with long hairs. Our DEGSTalk employs Deformable Pre-Embedding Gaussian Fields, which dynamically adjust pre-embedding Gaussian primitives using implicit expression coefficients. This enables precise capture of dynamic facial regions and subtle expressions. Additionally, we propose a Dynamic Hair-Preserving Portrait Rendering technique to enhance the realism of long hair motions in the synthesized videos. Results show that DEGSTalk achieves improved realism and synthesis quality compared to existing approaches, particularly in handling complex facial dynamics and hair preservation.\n\n针对长发个体生成逼真的说话人脸视频，并捕捉精细面部特征是现有方法中的一大挑战。为了解决这些问题，我们提出了一种基于三维高斯散射（3D Gaussian Splatting, 3DGS）的说话人脸合成方法，称为分解式每嵌入高斯场（DEGSTalk），用于生成逼真的长发说话人脸。\n我们的DEGSTalk采用可变形预嵌入高斯场（Deformable Pre-Embedding Gaussian Fields），通过隐式表情系数动态调整预嵌入的高斯原语，从而实现对动态面部区域和细微表情的精准捕捉。此外，我们提出了一种动态头发保留人像渲染技术（Dynamic Hair-Preserving Portrait Rendering），以增强合成视频中长发运动的真实感。\n实验结果表明，DEGSTalk在合成质量和真实感方面相比现有方法取得了显著提升，尤其是在处理复杂的面部动态和头发保留方面表现优异。\n"
  },
  {
    "path": "abs/2412.20522.md",
    "content": "### MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks\n\nWhile 3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in novel view synthesis and real-time rendering, the high memory consumption due to the use of millions of Gaussians limits its practicality. To mitigate this issue, improvements have been made by pruning unnecessary Gaussians, either through a hand-crafted criterion or by using learned masks. However, these methods deterministically remove Gaussians based on a snapshot of the pruning moment, leading to sub-optimized reconstruction performance from a long-term perspective. To address this issue, we introduce MaskGaussian, which models Gaussians as probabilistic entities rather than permanently removing them, and utilize them according to their probability of existence. To achieve this, we propose a masked-rasterization technique that enables unused yet probabilistically existing Gaussians to receive gradients, allowing for dynamic assessment of their contribution to the evolving scene and adjustment of their probability of existence. Hence, the importance of Gaussians iteratively changes and the pruned Gaussians are selected diversely. Extensive experiments demonstrate the superiority of the proposed method in achieving better rendering quality with fewer Gaussians than previous pruning methods, pruning over 60% of Gaussians on average with only a 0.02 PSNR decline.\n\n尽管三维高斯散射（3D Gaussian Splatting, 3DGS）在新视角合成和实时渲染方面表现出色，但由于使用了数百万个高斯点，其高内存消耗限制了实际应用的可行性。为缓解这一问题，一些改进方法通过手工设计的标准或学习生成的掩码来修剪不必要的高斯点。然而，这些方法在修剪时基于某一时刻的快照确定性地移除高斯点，从长远来看可能导致次优的重建性能。\n为了解决这一问题，我们提出了MaskGaussian，将高斯点建模为概率性实体，而非永久移除，并根据其存在的概率来利用它们。为实现这一目标，我们设计了一种掩码光栅化技术（masked-rasterization technique），使得那些未被使用但概率上仍存在的高斯点能够接收梯度，从而动态评估它们对场景演化的贡献，并调整其存在的概率。因此，高斯点的重要性能够迭代地发生变化，修剪过程中的选择也更加多样化。\n大量实验表明，与以往的修剪方法相比，MaskGaussian在使用更少高斯点的情况下实现了更好的渲染质量。平均而言，该方法能够修剪超过60%的高斯点，仅带来0.02 PSNR的微小下降。\n"
  },
  {
    "path": "abs/2412.20720.md",
    "content": "### 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives\n\nDynamic 3D scene representation and novel view synthesis from captured videos are crucial for enabling immersive experiences required by AR/VR and metaverse applications. However, this task is challenging due to the complexity of unconstrained real-world scenes and their temporal dynamics. In this paper, we frame dynamic scenes as a spatio-temporal 4D volume learning problem, offering a native explicit reformulation with minimal assumptions about motion, which serves as a versatile dynamic scene learning framework. Specifically, we represent a target dynamic scene using a collection of 4D Gaussian primitives with explicit geometry and appearance features, dubbed as 4D Gaussian splatting (4DGS). This approach can capture relevant information in space and time by fitting the underlying spatio-temporal volume. Modeling the spacetime as a whole with 4D Gaussians parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, our model can naturally learn view-dependent and time-evolved appearance with 4D spherindrical harmonics. Notably, our 4DGS model is the first solution that supports real-time rendering of high-resolution, photorealistic novel views for complex dynamic scenes. To enhance efficiency, we derive several compact variants that effectively reduce memory footprint and mitigate the risk of overfitting. Extensive experiments validate the superiority of 4DGS in terms of visual quality and efficiency across a range of dynamic scene-related tasks (e.g., novel view synthesis, 4D generation, scene understanding) and scenarios (e.g., single object, indoor scenes, driving environments, synthetic and real data).\n\n动态三维场景表示与新视角合成对于增强AR/VR和元宇宙应用中的沉浸式体验至关重要。然而，由于不受约束的真实场景的复杂性及其时间动态特性，这一任务面临巨大挑战。本文将动态场景表述为一个时空4D体积学习问题，提供了一种原生的显式重构方式，几乎不对运动做任何假设，从而构建了一个通用的动态场景学习框架。\n具体而言，我们使用具有显式几何和外观特征的4D高斯原语集合表示目标动态场景，这种方法称为四维高斯散射（4D Gaussian Splatting, 4DGS）。通过拟合场景的底层时空体积，该方法能够捕获空间和时间中的相关信息。我们通过各向异性椭圆参数化的4D高斯，将时空整体建模，使其可以在空间和时间中任意旋转，从而自然地学习视角依赖和随时间变化的外观，并结合**四维球柱谐波（4D Spherindrical Harmonics）**实现高级特性建模。\n值得注意的是，4DGS模型是首个支持实时渲染高分辨率、照片级真实感动态场景新视角的解决方案。为提升效率，我们还提出了多个紧凑变体，有效减少内存占用并降低过拟合风险。\n大量实验验证了4DGS在视觉质量和效率上的卓越性能，适用于多种动态场景相关任务（如新视角合成、4D生成、场景理解）和场景类型（如单一对象、室内场景、驾驶环境、合成与真实数据）。\n"
  },
  {
    "path": "abs/2412.20767.md",
    "content": "### KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences\n\nReconstructing high-quality 3D models from sparse 2D images has garnered significant attention in computer vision. Recently, 3D Gaussian Splatting (3DGS) has gained prominence due to its explicit representation with efficient training speed and real-time rendering capabilities. However, existing methods still heavily depend on accurate camera poses for reconstruction. Although some recent approaches attempt to train 3DGS models without the Structure-from-Motion (SfM) preprocessing from monocular video datasets, these methods suffer from prolonged training times, making them impractical for many applications. In this paper, we present an efficient framework that operates without any depth or matching model. Our approach initially uses SfM to quickly obtain rough camera poses within seconds, and then refines these poses by leveraging the dense representation in 3DGS. This framework effectively addresses the issue of long training times. Additionally, we integrate the densification process with joint refinement and propose a coarse-to-fine frequency-aware densification to reconstruct different levels of details. This approach prevents camera pose estimation from being trapped in local minima or drifting due to high-frequency signals. Our method significantly reduces training time from hours to minutes while achieving more accurate novel view synthesis and camera pose estimation compared to previous methods.\n\n从稀疏二维图像重建高质量三维模型是计算机视觉中的一个重要研究方向。近年来，三维高斯散射（3D Gaussian Splatting, 3DGS）因其显式表示、高效的训练速度和实时渲染能力而受到广泛关注。然而，现有方法在重建中仍然严重依赖于精确的相机位姿。尽管一些最新方法尝试在单目视频数据集上训练3DGS模型而无需依赖结构化运动（Structure-from-Motion, SfM）预处理，这些方法却由于训练时间过长而在许多应用场景中变得不切实际。\n在本文中，我们提出了一个无需深度或匹配模型的高效框架。我们的方法首先通过SfM在几秒内快速获得粗略的相机位姿，然后利用3DGS的密集表示对这些位姿进行优化，从而有效解决了长时间训练的问题。此外，我们将密集化过程与联合优化相结合，提出了一种从粗到细的频率感知密集化方法，用于重建不同层次的细节。该方法能够避免相机位姿估计因高频信号陷入局部最小值或发生漂移。\n与现有方法相比，我们的方法显著减少了训练时间，从数小时缩短至数分钟，同时在新视角合成和相机位姿估计的准确性上实现了更优的性能。\n"
  },
  {
    "path": "abs/2412.21206.md",
    "content": "### PERSE: Personalized 3D Generative Avatars from A Single Portrait\n\nWe present PERSE, a method for building an animatable personalized generative avatar from a reference portrait. Our avatar model enables facial attribute editing in a continuous and disentangled latent space to control each facial attribute, while preserving the individual's identity. To achieve this, our method begins by synthesizing large-scale synthetic 2D video datasets, where each video contains consistent changes in the facial expression and viewpoint, combined with a variation in a specific facial attribute from the original input. We propose a novel pipeline to produce high-quality, photorealistic 2D videos with facial attribute editing. Leveraging this synthetic attribute dataset, we present a personalized avatar creation method based on the 3D Gaussian Splatting, learning a continuous and disentangled latent space for intuitive facial attribute manipulation. To enforce smooth transitions in this latent space, we introduce a latent space regularization technique by using interpolated 2D faces as supervision. Compared to previous approaches, we demonstrate that PERSE generates high-quality avatars with interpolated attributes while preserving identity of reference person.\n\n我们提出了 PERSE，一种从参考肖像生成可动画化个性化生成头像的方法。该头像模型允许在连续且解耦的潜在空间中编辑面部属性，从而精确控制各个面部属性，同时保持个体身份的一致性。\n为实现这一目标，我们的方法首先通过生成大规模的合成二维视频数据集入手，其中每个视频包含面部表情和视角的连续变化，并结合特定面部属性的变化，这些变化基于原始输入。我们提出了一种新颖的管道，用于生成高质量、真实感强的二维视频，同时实现面部属性编辑。\n利用这一合成属性数据集，我们基于 3D 高斯喷射（Gaussian Splatting）提出了一种个性化头像创建方法，通过学习连续且解耦的潜在空间，实现直观的面部属性操控。为了在潜在空间中实现平滑过渡，我们引入了一种潜在空间正则化技术，通过插值的二维面部图像进行监督。\n与现有方法相比，我们证明了 PERSE 能够生成具有高质量属性插值的头像，同时保持参考人物的身份一致性。\n"
  },
  {
    "path": "abs/2501.00326.md",
    "content": "### OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies\n\nOpen-vocabulary scene understanding using 3D Gaussian (3DGS) representations has garnered considerable attention. However, existing methods mostly lift knowledge from large 2D vision models into 3DGS on a scene-by-scene basis, restricting the capabilities of open-vocabulary querying within their training scenes so that lacking the generalizability to novel scenes. In this work, we propose OVGaussian, a generalizable Open-Vocabulary 3D semantic segmentation framework based on the 3D Gaussian representation. We first construct a large-scale 3D scene dataset based on 3DGS, dubbed SegGaussian, which provides detailed semantic and instance annotations for both Gaussian points and multi-view images. To promote semantic generalization across scenes, we introduce Generalizable Semantic Rasterization (GSR), which leverages a 3D neural network to learn and predict the semantic property for each 3D Gaussian point, where the semantic property can be rendered as multi-view consistent 2D semantic maps. In the next, we propose a Cross-modal Consistency Learning (CCL) framework that utilizes open-vocabulary annotations of 2D images and 3D Gaussians within SegGaussian to train the 3D neural network capable of open-vocabulary semantic segmentation across Gaussian-based 3D scenes. Experimental results demonstrate that OVGaussian significantly outperforms baseline methods, exhibiting robust cross-scene, cross-domain, and novel-view generalization capabilities.\n\n基于 3D 高斯表示（3DGS）的开放词汇场景理解引起了广泛关注。然而，现有方法主要依赖将大规模二维视觉模型的知识逐场景地迁移到 3DGS 中，这限制了开放词汇查询的能力，仅能在其训练场景内操作，缺乏对新场景的泛化能力。为了解决这一问题，我们提出了 OVGaussian，一种基于 3D 高斯表示的通用开放词汇三维语义分割框架。\n我们首先基于 3DGS 构建了一个大规模 3D 场景数据集，称为 SegGaussian，该数据集为高斯点和多视角图像提供了详细的语义和实例标注。为促进语义在不同场景间的泛化，我们提出了 通用语义栅格化（Generalizable Semantic Rasterization, GSR），利用 3D 神经网络学习并预测每个 3D 高斯点的语义属性，将语义属性渲染为多视角一致的二维语义图。\n接下来，我们提出了一种 跨模态一致性学习（Cross-modal Consistency Learning, CCL）框架，通过利用 SegGaussian 数据集中二维图像和 3D 高斯的开放词汇标注，训练能够在基于高斯的三维场景中进行开放词汇语义分割的 3D 神经网络。\n实验结果表明，OVGaussian 在跨场景、跨领域以及新视图泛化能力方面显著优于基线方法，展现出强大的鲁棒性和泛化能力。\n"
  },
  {
    "path": "abs/2501.00342.md",
    "content": "### SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians\n\n3D Gaussian Splatting is emerging as a state-of-the-art technique in novel view synthesis, recognized for its impressive balance between visual quality, speed, and rendering efficiency. However, reliance on third-degree spherical harmonics for color representation introduces significant storage demands and computational overhead, resulting in a large memory footprint and slower rendering speed. We introduce SG-Splatting with Spherical Gaussians based color representation, a novel approach to enhance rendering speed and quality in novel view synthesis. Our method first represents view-dependent color using Spherical Gaussians, instead of three degree spherical harmonics, which largely reduces the number of parameters used for color representation, and significantly accelerates the rendering process. We then develop an efficient strategy for organizing multiple Spherical Gaussians, optimizing their arrangement to achieve a balanced and accurate scene representation. To further improve rendering quality, we propose a mixed representation that combines Spherical Gaussians with low-degree spherical harmonics, capturing both high- and low-frequency color information effectively. SG-Splatting also has plug-and-play capability, allowing it to be easily integrated into existing systems. This approach improves computational efficiency and overall visual fidelity, making it a practical solution for real-time applications.\n\n三维高斯散射（3D Gaussian Splatting）作为新视角合成中的前沿技术，因其在视觉质量、速度和渲染效率之间的出色平衡而备受认可。然而，依赖三阶球谐函数（Spherical Harmonics）进行颜色表示会带来显著的存储需求和计算开销，导致较大的内存占用和较慢的渲染速度。\n为解决这一问题，我们提出了SG-Splatting，一种基于球面高斯（Spherical Gaussians）的颜色表示方法，用于提升新视角合成的渲染速度和质量。我们的方法首先用球面高斯代替三阶球谐函数进行视角依赖的颜色表示，大幅减少了颜色表示所需的参数数量，从而显著加速了渲染过程。随后，我们开发了一种高效的策略来组织多个球面高斯，优化其排列以实现平衡且精确的场景表示。\n为进一步提升渲染质量，我们提出了一种混合表示方法，将球面高斯与低阶球谐函数结合，能够有效捕捉高频与低频的颜色信息。此外，SG-Splatting具备即插即用的能力，能够轻松集成到现有系统中。\n实验表明，SG-Splatting在提升计算效率的同时显著提高了视觉保真度，成为适用于实时应用的实用解决方案。\n"
  },
  {
    "path": "abs/2501.00352.md",
    "content": "### PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM\n\nUnderstanding geometric, semantic, and instance information in 3D scenes from sequential video data is essential for applications in robotics and augmented reality. However, existing Simultaneous Localization and Mapping (SLAM) methods generally focus on either geometric or semantic reconstruction. In this paper, we introduce PanoSLAM, the first SLAM system to integrate geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation within a unified framework. Our approach builds upon 3D Gaussian Splatting, modified with several critical components to enable efficient rendering of depth, color, semantic, and instance information from arbitrary viewpoints. To achieve panoptic 3D scene reconstruction from sequential RGB-D videos, we propose an online Spatial-Temporal Lifting (STL) module that transfers 2D panoptic predictions from vision models into 3D Gaussian representations. This STL module addresses the challenges of label noise and inconsistencies in 2D predictions by refining the pseudo labels across multi-view inputs, creating a coherent 3D representation that enhances segmentation accuracy. Our experiments show that PanoSLAM outperforms recent semantic SLAM methods in both mapping and tracking accuracy. For the first time, it achieves panoptic 3D reconstruction of open-world environments directly from the RGB-D video.\n\n理解视频序列中三维场景的几何、语义和实例信息对机器人和增强现实等应用至关重要。然而，现有的同步定位与建图（SLAM）方法通常仅关注几何重建或语义重建的一方面。为此，我们提出了PanoSLAM，这是首个将几何重建、三维语义分割和三维实例分割整合在统一框架中的SLAM系统。\n我们的方法基于三维高斯散射（3D Gaussian Splatting），并通过多项关键改进，使其能够从任意视角高效渲染深度、颜色、语义和实例信息。为实现从RGB-D视频到全景三维场景重建，我们提出了一个在线时空提升模块（Spatial-Temporal Lifting, STL），将二维视觉模型生成的全景预测转换为三维高斯表示。该模块通过多视角输入对伪标签进行优化，解决了二维预测中的标签噪声和不一致性问题，从而生成一致的三维表示，显著提高了分割准确性。\n实验结果表明，PanoSLAM在建图和跟踪精度上优于近期的语义SLAM方法。更重要的是，该系统首次实现了从RGB-D视频直接生成开放世界环境的全景三维重建，为多模态场景理解开辟了新的可能性。\n"
  },
  {
    "path": "abs/2501.00601.md",
    "content": "### DreamDrive: Generative 4D Scene Modeling from Street View Images\n\nSynthesizing photo-realistic visual observations from an ego vehicle's driving trajectory is a critical step towards scalable training of self-driving models. Reconstruction-based methods create 3D scenes from driving logs and synthesize geometry-consistent driving videos through neural rendering, but their dependence on costly object annotations limits their ability to generalize to in-the-wild driving scenarios. On the other hand, generative models can synthesize action-conditioned driving videos in a more generalizable way but often struggle with maintaining 3D visual consistency. In this paper, we present DreamDrive, a 4D spatial-temporal scene generation approach that combines the merits of generation and reconstruction, to synthesize generalizable 4D driving scenes and dynamic driving videos with 3D consistency. Specifically, we leverage the generative power of video diffusion models to synthesize a sequence of visual references and further elevate them to 4D with a novel hybrid Gaussian representation. Given a driving trajectory, we then render 3D-consistent driving videos via Gaussian splatting. The use of generative priors allows our method to produce high-quality 4D scenes from in-the-wild driving data, while neural rendering ensures 3D-consistent video generation from the 4D scenes. Extensive experiments on nuScenes and street view images demonstrate that DreamDrive can generate controllable and generalizable 4D driving scenes, synthesize novel views of driving videos with high fidelity and 3D consistency, decompose static and dynamic elements in a self-supervised manner, and enhance perception and planning tasks for autonomous driving.\n\n生成从自车驾驶轨迹中获取的真实感视觉观测是实现自动驾驶模型可扩展训练的关键一步。基于重建的方法通过驾驶日志创建三维场景，并利用神经渲染生成几何一致的驾驶视频，但其对高成本目标标注的依赖限制了其在真实场景驾驶环境中的泛化能力。另一方面，生成模型能够以更具泛化性的方式生成动作条件下的驾驶视频，但常常难以保持三维视觉一致性。\n在本文中，我们提出了 DreamDrive，一种结合生成和重建优势的四维时空场景生成方法，用于生成具有三维一致性的通用四维驾驶场景和动态驾驶视频。具体而言，我们利用视频扩散模型的生成能力，生成一系列视觉参考，并通过一种新颖的混合高斯表示将其提升到四维场景。在给定驾驶轨迹的情况下，我们通过高斯喷射（Gaussian Splatting）渲染出三维一致的驾驶视频。\n生成先验的使用使我们的方法能够从真实驾驶数据中生成高质量的四维场景，而神经渲染则确保了从四维场景生成的驾驶视频具备三维一致性。我们在 nuScenes 和街景图像上的广泛实验表明，DreamDrive 可以生成可控且具有泛化能力的四维驾驶场景，生成高保真且三维一致的新视图驾驶视频，能够以自监督方式分解静态和动态元素，并增强自动驾驶的感知与规划任务。\n"
  },
  {
    "path": "abs/2501.00602.md",
    "content": "### STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes\n\nWe present STORM, a spatio-temporal reconstruction model designed for reconstructing dynamic outdoor scenes from sparse observations. Existing dynamic reconstruction methods often rely on per-scene optimization, dense observations across space and time, and strong motion supervision, resulting in lengthy optimization times, limited generalization to novel views or scenes, and degenerated quality caused by noisy pseudo-labels for dynamics. To address these challenges, STORM leverages a data-driven Transformer architecture that directly infers dynamic 3D scene representations--parameterized by 3D Gaussians and their velocities--in a single forward pass. Our key design is to aggregate 3D Gaussians from all frames using self-supervised scene flows, transforming them to the target timestep to enable complete (i.e., \"amodal\") reconstructions from arbitrary viewpoints at any moment in time. As an emergent property, STORM automatically captures dynamic instances and generates high-quality masks using only reconstruction losses. Extensive experiments on public datasets show that STORM achieves precise dynamic scene reconstruction, surpassing state-of-the-art per-scene optimization methods (+4.3 to 6.6 PSNR) and existing feed-forward approaches (+2.1 to 4.7 PSNR) in dynamic regions. STORM reconstructs large-scale outdoor scenes in 200ms, supports real-time rendering, and outperforms competitors in scene flow estimation, improving 3D EPE by 0.422m and Acc5 by 28.02%. Beyond reconstruction, we showcase four additional applications of our model, illustrating the potential of self-supervised learning for broader dynamic scene understanding.\n\n我们提出了 STORM，一种用于从稀疏观测中重建动态户外场景的时空重建模型。现有的动态重建方法通常依赖于逐场景优化、空间和时间上的密集观测以及强监督的运动信息，导致优化时间较长，对新视图或新场景的泛化能力有限，并且在动态伪标签噪声影响下质量下降。\n为解决这些问题，STORM 采用了一种数据驱动的 Transformer 架构，能够通过单次前向传递直接推断动态三维场景表示，参数化为 3D 高斯及其速度。其核心设计是通过自监督的场景流（scene flow）将所有帧的 3D 高斯聚合，并将其变换到目标时间步，支持从任意视点和时间点进行完整（即“全模态”）重建。\n作为一种自然涌现的特性，STORM 仅通过重建损失即可自动捕捉动态实例并生成高质量的遮罩。我们在公共数据集上的大量实验表明，STORM 在动态场景重建中表现出色，在动态区域的重建精度上显著超越了现有的逐场景优化方法（PSNR 提高 4.3 至 6.6）和前馈方法（PSNR 提高 2.1 至 4.7）。STORM 在 200ms 内即可完成大规模户外场景重建，支持实时渲染，并在场景流估计中表现优异，3D EPE 提高 0.422m，Acc5 提高 28.02%。\n除了重建功能外，我们展示了该模型的四种额外应用，进一步证明了自监督学习在更广泛的动态场景理解中的潜力。\n"
  },
  {
    "path": "abs/2501.00625.md",
    "content": "### Gaussian Building Mesh (GBM): Extract a Building's 3D Mesh with Google Earth and Gaussian Splatting\n\nRecently released open-source pre-trained foundational image segmentation and object detection models (SAM2+GroundingDINO) allow for geometrically consistent segmentation of objects of interest in multi-view 2D images. Users can use text-based or click-based prompts to segment objects of interest without requiring labeled training datasets. Gaussian Splatting allows for the learning of the 3D representation of a scene's geometry and radiance based on 2D images. Combining Google Earth Studio, SAM2+GroundingDINO, 2D Gaussian Splatting, and our improvements in mask refinement based on morphological operations and contour simplification, we created a pipeline to extract the 3D mesh of any building based on its name, address, or geographic coordinates.\n\n最近发布的开源预训练基础图像分割和目标检测模型（SAM2+GroundingDINO）可以在多视角二维图像中实现几何一致的目标分割。用户可以使用基于文本或点击的提示，分割感兴趣的目标，而无需依赖标注的训练数据集。高斯喷射（Gaussian Splatting）则可以基于二维图像学习场景几何和辐射的三维表示。结合 Google Earth Studio、SAM2+GroundingDINO、二维高斯喷射以及我们在基于形态学操作和轮廓简化的遮罩优化方面的改进，我们创建了一套流程，可以基于建筑的名称、地址或地理坐标提取其三维网格模型。\n"
  },
  {
    "path": "abs/2501.01003.md",
    "content": "### EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy\n\n3D Gaussian Splatting (3DGS) techniques have achieved satisfactory 3D scene representation. Despite their impressive performance, they confront challenges due to the limitation of structure-from-motion (SfM) methods on acquiring accurate scene initialization, or the inefficiency of densification strategy. In this paper, we introduce a novel framework EasySplat to achieve high-quality 3DGS modeling. Instead of using SfM for scene initialization, we employ a novel method to release the power of large-scale pointmap approaches. Specifically, we propose an efficient grouping strategy based on view similarity, and use robust pointmap priors to obtain high-quality point clouds and camera poses for 3D scene initialization. After obtaining a reliable scene structure, we propose a novel densification approach that adaptively splits Gaussian primitives based on the average shape of neighboring Gaussian ellipsoids, utilizing KNN scheme. In this way, the proposed method tackles the limitation on initialization and optimization, leading to an efficient and accurate 3DGS modeling. Extensive experiments demonstrate that EasySplat outperforms the current state-of-the-art (SOTA) in handling novel view synthesis.\n\n3D 高斯喷射（3DGS）技术在三维场景表示方面取得了令人满意的成果。尽管其表现出色，但仍面临结构光恢复（SfM）方法在获取准确场景初始化中的局限性或密集化策略低效的挑战。在本文中，我们提出了一种新颖的框架 EasySplat，以实现高质量的 3DGS 建模。与传统基于 SfM 的场景初始化方法不同，我们采用了一种新方法来释放大规模点云映射方法的潜力。具体而言，我们提出了一种基于视图相似性的高效分组策略，并利用鲁棒的点云先验获得高质量的点云和相机位姿，用于三维场景初始化。\n在获得可靠的场景结构后，我们提出了一种新颖的密集化方法，基于邻近高斯椭球的平均形状，利用 KNN 方案自适应地分裂高斯基元。通过这种方式，所提出的方法解决了初始化和优化的局限性，从而实现了高效且准确的 3DGS 建模。大量实验表明，EasySplat 在处理新视图合成方面优于当前最先进技术（SOTA）。\n\n"
  },
  {
    "path": "abs/2501.01101.md",
    "content": "### Deformable Gaussian Splatting for Efficient and High-Fidelity Reconstruction of Surgical Scenes\n\nEfficient and high-fidelity reconstruction of deformable surgical scenes is a critical yet challenging task. Building on recent advancements in 3D Gaussian splatting, current methods have seen significant improvements in both reconstruction quality and rendering speed. However, two major limitations remain: (1) difficulty in handling irreversible dynamic changes, such as tissue shearing, which are common in surgical scenes; and (2) the lack of hierarchical modeling for surgical scene deformation, which reduces rendering speed. To address these challenges, we introduce EH-SurGS, an efficient and high-fidelity reconstruction algorithm for deformable surgical scenes. We propose a deformation modeling approach that incorporates the life cycle of 3D Gaussians, effectively capturing both regular and irreversible deformations, thus enhancing reconstruction quality. Additionally, we present an adaptive motion hierarchy strategy that distinguishes between static and deformable regions within the surgical scene. This strategy reduces the number of 3D Gaussians passing through the deformation field, thereby improving rendering speed. Extensive experiments demonstrate that our method surpasses existing state-of-the-art approaches in both reconstruction quality and rendering speed. Ablation studies further validate the effectiveness and necessity of our proposed components.\n\n对可变形手术场景的高效且高保真重建是一个关键而具有挑战性的任务。基于最近 3D 高斯喷射技术的进展，目前的方法在重建质量和渲染速度上取得了显著提升。然而，仍存在两大主要限制：(1) 难以处理不可逆的动态变化，例如手术场景中常见的组织剪切；(2) 缺乏对手术场景变形的分层建模，导致渲染速度降低。\n为了解决这些问题，我们提出了 EH-SurGS，一种针对可变形手术场景的高效高保真重建算法。我们设计了一种变形建模方法，融入了 3D 高斯的生命周期管理，有效捕捉规则和不可逆变形，从而提升重建质量。此外，我们提出了一种自适应运动层次策略，能够区分手术场景中的静态区域和可变形区域。该策略减少了穿过变形场的 3D 高斯数量，从而提高渲染速度。\n大量实验表明，我们的方法在重建质量和渲染速度上均优于现有最先进方法（SOTA）。消融实验进一步验证了所提出组件的有效性和必要性。\n\n"
  },
  {
    "path": "abs/2501.01677.md",
    "content": "### PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping\n\n3D Gaussian Splatting (3DGS) has emerged as a transformative method in the field of real-time novel synthesis. Based on 3DGS, recent advancements cope with large-scale scenes via spatial-based partition strategy to reduce video memory and optimization time costs. In this work, we introduce a parallel Gaussian splatting method, termed PG-SAG, which fully exploits semantic cues for both partitioning and Gaussian kernel optimization, enabling fine-grained building surface reconstruction of large-scale urban areas without downsampling the original image resolution. First, the Cross-modal model - Language Segment Anything is leveraged to segment building masks. Then, the segmented building regions is grouped into sub-regions according to the visibility check across registered images. The Gaussian kernels for these sub-regions are optimized in parallel with masked pixels. In addition, the normal loss is re-formulated for the detected edges of masks to alleviate the ambiguities in normal vectors on edges. Finally, to improve the optimization of 3D Gaussians, we introduce a gradient-constrained balance-load loss that accounts for the complexity of the corresponding scenes, effectively minimizing the thread waiting time in the pixel-parallel rendering stage as well as the reconstruction lost. Extensive experiments are tested on various urban datasets, the results demonstrated the superior performance of our PG-SAG on building surface reconstruction, compared to several state-of-the-art 3DGS-based methods.\n\n3D 高斯喷射 (3DGS) 已成为实时新颖视图合成领域的一种变革性方法。基于 3DGS，近期的研究通过基于空间的分区策略应对大规模场景，以降低视频内存和优化时间成本。在本文中，我们提出了一种并行高斯喷射方法，称为 PG-SAG，它充分利用语义信息进行分区和高斯核优化，实现了对大规模城市区域的细粒度建筑表面重建，而无需对原始图像分辨率进行下采样。\n首先，我们利用跨模态模型 Language Segment Anything 提取建筑区域的分割掩码。然后，根据已注册图像中的可见性检查，将分割的建筑区域划分为子区域。这些子区域的高斯核通过掩码像素进行并行优化。此外，我们重新设计了法向损失函数，专注于掩码检测边缘，缓解边缘处法向量模糊的问题。\n最后，为了改进 3D 高斯的优化，我们引入了一种 梯度约束的负载平衡损失，该损失考虑了相应场景的复杂性，有效减少了像素并行渲染阶段的线程等待时间，同时降低了重建损失。\n我们在多个城市数据集上进行了广泛实验，结果表明，与几种最先进的基于 3DGS 的方法相比，PG-SAG 在建筑表面重建性能上表现更为优越。\n"
  },
  {
    "path": "abs/2501.01695.md",
    "content": "### CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction\n\n3D Gaussian Splatting (3DGS) has emerged as a prominent method for scene representation and reconstruction, leveraging densely distributed Gaussian primitives to enable real-time rendering of high-resolution images. While existing 3DGS methods perform well in scenes with minor view variation, large view changes in cross-view scenes pose optimization challenges for these methods. To address these issues, we propose a novel cross-view Gaussian Splatting method for large-scale scene reconstruction, based on dual-branch fusion. Our method independently reconstructs models from aerial and ground views as two independent branches to establish the baselines of Gaussian distribution, providing reliable priors for cross-view reconstruction during both initialization and densification. Specifically, a gradient-aware regularization strategy is introduced to mitigate smoothing issues caused by significant view disparities. Additionally, a unique Gaussian supplementation strategy is utilized to incorporate complementary information of dual-branch into the cross-view model. Extensive experiments on benchmark datasets demonstrate that our method achieves superior performance in novel view synthesis compared to state-of-the-art methods.\n\n3D 高斯喷射 (3DGS) 作为一种场景表示与重建方法，通过密集分布的高斯基元实现了高分辨率图像的实时渲染。然而，现有的 3DGS 方法在视角变化较小的场景中表现良好，但在视角差异较大的跨视角场景中，其优化能力面临挑战。\n为解决这些问题，我们提出了一种基于 双分支融合 的新颖跨视角高斯喷射方法，用于大规模场景重建。我们的方法分别从航拍视角和地面视角独立重建模型，作为两个独立的分支，以建立高斯分布的基线。这些基线在初始化和密集化过程中为跨视角重建提供可靠的先验信息。\n具体而言，我们引入了一种 梯度感知正则化策略，用于缓解因显著视角差异引起的平滑问题。此外，我们设计了一种独特的 高斯补充策略，将双分支中的互补信息整合到跨视角模型中。\n在基准数据集上的广泛实验表明，与现有最先进方法相比，我们的方法在新视图合成任务中表现出更优异的性能，尤其是在处理大视角变化的场景时。\n"
  },
  {
    "path": "abs/2501.01715.md",
    "content": "### Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision\n\nWe introduce Cloth-Splatting, a method for estimating 3D states of cloth from RGB images through a prediction-update framework. Cloth-Splatting leverages an action-conditioned dynamics model for predicting future states and uses 3D Gaussian Splatting to update the predicted states. Our key insight is that coupling a 3D mesh-based representation with Gaussian Splatting allows us to define a differentiable map between the cloth state space and the image space. This enables the use of gradient-based optimization techniques to refine inaccurate state estimates using only RGB supervision. Our experiments demonstrate that Cloth-Splatting not only improves state estimation accuracy over current baselines but also reduces convergence time.\n\n我们提出了 Cloth-Splatting，一种通过预测-更新框架从 RGB 图像中估计布料三维状态的方法。Cloth-Splatting 利用动作条件下的动力学模型预测布料的未来状态，并通过 3D 高斯喷射 (Gaussian Splatting) 来更新预测状态。\n我们的关键洞察是，将基于三维网格的表示与高斯喷射相结合，能够在布料状态空间与图像空间之间定义一个可微分的映射。这使得我们可以利用基于梯度的优化技术，仅通过 RGB 监督来优化和修正不准确的状态估计。\n实验结果表明，Cloth-Splatting 不仅在状态估计精度上优于现有基线方法，还显著缩短了收敛时间。\n"
  },
  {
    "path": "abs/2501.01895.md",
    "content": "### EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation\n\nWe introduce EnerVerse, a comprehensive framework for embodied future space generation specifically designed for robotic manipulation tasks. EnerVerse seamlessly integrates convolutional and bidirectional attention mechanisms for inner-chunk space modeling, ensuring low-level consistency and continuity. Recognizing the inherent redundancy in video data, we propose a sparse memory context combined with a chunkwise unidirectional generative paradigm to enable the generation of infinitely long sequences. To further augment robotic capabilities, we introduce the Free Anchor View (FAV) space, which provides flexible perspectives to enhance observation and analysis. The FAV space mitigates motion modeling ambiguity, removes physical constraints in confined environments, and significantly improves the robot's generalization and adaptability across various tasks and settings. To address the prohibitive costs and labor intensity of acquiring multi-camera observations, we present a data engine pipeline that integrates a generative model with 4D Gaussian Splatting (4DGS). This pipeline leverages the generative model's robust generalization capabilities and the spatial constraints provided by 4DGS, enabling an iterative enhancement of data quality and diversity, thus creating a data flywheel effect that effectively narrows the sim-to-real gap. Finally, our experiments demonstrate that the embodied future space generation prior substantially enhances policy predictive capabilities, resulting in improved overall performance, particularly in long-range robotic manipulation tasks.\n\n我们提出了 EnerVerse，一种专为机器人操作任务设计的综合性化身未来空间生成框架。EnerVerse 将卷积和双向注意力机制无缝整合用于块内空间建模，确保低层次的一致性和连续性。鉴于视频数据中固有的冗余性，我们提出了一种稀疏记忆上下文，结合基于块的单向生成范式，实现了无限长序列的生成。\n为了进一步增强机器人能力，我们引入了自由锚视角空间（Free Anchor View, FAV space），提供灵活的观察视角以改善对场景的观察和分析。FAV 空间有效减轻了运动建模中的歧义问题，消除了受限环境中的物理约束，并显著提升了机器人在各种任务和环境中的泛化性与适应性。\n针对获取多摄像机观测的高昂成本和劳动强度，我们提出了一种结合生成模型与 4D 高斯点绘制（4D Gaussian Splatting, 4DGS）的数据引擎管线。该管线利用生成模型的强泛化能力以及 4DGS 提供的空间约束，迭代提升数据质量和多样性，形成了一个数据飞轮效应，有效缩小了模拟到现实的差距（sim-to-real gap）。\n实验结果表明，化身未来空间生成先验显著增强了策略预测能力，从而提升了整体性能，特别是在长距离机器人操作任务中表现尤为突出。\n"
  },
  {
    "path": "abs/2501.02690.md",
    "content": "### GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking\n\n4D video control is essential in video generation as it enables the use of sophisticated lens techniques, such as multi-camera shooting and dolly zoom, which are currently unsupported by existing methods. Training a video Diffusion Transformer (DiT) directly to control 4D content requires expensive multi-view videos. Inspired by Monocular Dynamic novel View Synthesis (MDVS) that optimizes a 4D representation and renders videos according to different 4D elements, such as camera pose and object motion editing, we bring pseudo 4D Gaussian fields to video generation. Specifically, we propose a novel framework that constructs a pseudo 4D Gaussian field with dense 3D point tracking and renders the Gaussian field for all video frames. Then we finetune a pretrained DiT to generate videos following the guidance of the rendered video, dubbed as GS-DiT. To boost the training of the GS-DiT, we also propose an efficient Dense 3D Point Tracking (D3D-PT) method for the pseudo 4D Gaussian field construction. Our D3D-PT outperforms SpatialTracker, the state-of-the-art sparse 3D point tracking method, in accuracy and accelerates the inference speed by two orders of magnitude. During the inference stage, GS-DiT can generate videos with the same dynamic content while adhering to different camera parameters, addressing a significant limitation of current video generation models. GS-DiT demonstrates strong generalization capabilities and extends the 4D controllability of Gaussian splatting to video generation beyond just camera poses. It supports advanced cinematic effects through the manipulation of the Gaussian field and camera intrinsics, making it a powerful tool for creative video production.\n\n4D 视频控制在视频生成中至关重要，因为它能够实现复杂的镜头技术，例如多机位拍摄和缩放镜头（dolly zoom），而现有方法尚不支持这些功能。直接训练一个视频扩散变换器（Video Diffusion Transformer, DiT）来控制 4D 内容需要耗费大量多视角视频资源。受到单目动态新颖视图合成（Monocular Dynamic novel View Synthesis, MDVS）的启发，我们将伪 4D 高斯场引入到视频生成中。MDVS 优化了 4D 表示并根据不同的 4D 元素（如相机姿态和对象运动编辑）渲染视频。\n具体来说，我们提出了一种新的框架，通过稠密的 3D 点跟踪构建伪 4D 高斯场，并渲染高斯场以生成所有视频帧。然后，我们微调一个预训练的 DiT，以生成遵循渲染视频引导的视频，称为 GS-DiT。为提升 GS-DiT 的训练效果，我们还提出了一种高效的稠密 3D 点跟踪方法（Dense 3D Point Tracking, D3D-PT），用于伪 4D 高斯场的构建。我们的 D3D-PT 方法在准确性上优于现有最先进的稀疏 3D 点跟踪方法 SpatialTracker，并将推理速度提升了两个数量级。\n在推理阶段，GS-DiT 能够生成动态内容相同但相机参数不同的视频，从而解决了当前视频生成模型的一大局限性。GS-DiT 展现出了强大的泛化能力，并将高斯点的 4D 可控性扩展到视频生成领域，不再局限于相机姿态的调整。通过操控高斯场和相机内参数，GS-DiT 支持高级的电影效果，是一款功能强大的创意视频制作工具。\n"
  },
  {
    "path": "abs/2501.02845.md",
    "content": "### HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation\n\nUnderstanding of bimanual hand-object interaction plays an important role in robotics and virtual reality. However, due to significant occlusions between hands and object as well as the high degree-of-freedom motions, it is challenging to collect and annotate a high-quality, large-scale dataset, which prevents further improvement of bimanual hand-object interaction-related baselines. In this work, we propose a new 3D Gaussian Splatting based data augmentation framework for bimanual hand-object interaction, which is capable of augmenting existing dataset to large-scale photorealistic data with various hand-object pose and viewpoints. First, we use mesh-based 3DGS to model objects and hands, and to deal with the rendering blur problem due to multi-resolution input images used, we design a super-resolution module. Second, we extend the single hand grasping pose optimization module for the bimanual hand object to generate various poses of bimanual hand-object interaction, which can significantly expand the pose distribution of the dataset. Third, we conduct an analysis for the impact of different aspects of the proposed data augmentation on the understanding of the bimanual hand-object interaction. We perform our data augmentation on two benchmarks, H2O and Arctic, and verify that our method can improve the performance of the baselines.\n\n对双手与物体交互的理解在机器人和虚拟现实领域具有重要作用。然而，由于双手和物体之间的显著遮挡以及高自由度运动的复杂性，收集和标注高质量、大规模数据集具有极大挑战性，从而限制了与双手物体交互相关的基线模型的进一步改进。在本研究中，我们提出了一种基于 3D 高斯点（3D Gaussian Splatting, 3DGS）的数据增强框架，用于双手物体交互。该框架能够将现有数据集扩展为具有多种手-物体姿态和视角的大规模真实感数据。\n首先，我们使用基于网格的 3DGS 模型化物体和手，并针对多分辨率输入图像导致的渲染模糊问题设计了一个超分辨率模块。其次，我们扩展了单手抓取姿态优化模块以适应双手物体交互，从而生成多样化的双手物体交互姿态，这显著扩展了数据集的姿态分布。最后，我们对所提出数据增强方法的不同方面对双手物体交互理解的影响进行了分析。\n我们在两个基准数据集 H2O 和 Arctic 上进行了数据增强实验，并验证了我们的方法可以提升基线模型的性能。\n"
  },
  {
    "path": "abs/2501.03229.md",
    "content": "### Gaussian Masked Autoencoders\n\nThis paper explores Masked Autoencoders (MAE) with Gaussian Splatting. While reconstructive self-supervised learning frameworks such as MAE learns good semantic abstractions, it is not trained for explicit spatial awareness. Our approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions and spatial understanding jointly. Like MAE, it reconstructs the image end-to-end in the pixel space, but beyond MAE, it also introduces an intermediate, 3D Gaussian-based representation and renders images via splatting. We show that GMAE can enable various zero-shot learning capabilities of spatial understanding (e.g., figure-ground segmentation, image layering, edge detection, etc.) while preserving the high-level semantics of self-supervised representation quality from MAE. To our knowledge, we are the first to employ Gaussian primitives in an image representation learning framework beyond optimization-based single-scene reconstructions. We believe GMAE will inspire further research in this direction and contribute to developing next-generation techniques for modeling high-fidelity visual data.\n\n本文探讨了结合高斯点绘制（Gaussian Splatting）的掩码自动编码器（Masked Autoencoders, MAE）。尽管重构型自监督学习框架（如 MAE）能够学习良好的语义抽象，它们却未针对显式的空间感知进行训练。我们的方法被称为 高斯掩码自动编码器（Gaussian Masked Autoencoder, GMAE），旨在联合学习语义抽象和空间理解。\n与 MAE 类似，GMAE 在像素空间中端到端地重构图像，但不同于 MAE，GMAE 引入了一个基于 3D 高斯的中间表示，并通过高斯点绘制渲染图像。实验表明，GMAE 在保留 MAE 自监督表示高层语义质量的同时，还能实现多种空间理解的零样本学习能力（如前景-背景分割、图像分层、边缘检测等）。\n据我们所知，GMAE 是首个在图像表示学习框架中使用高斯原语（Gaussian Primitives）超越基于优化的单场景重建的研究工作。我们相信，GMAE 将激发更多在这一方向上的研究，并为开发高保真视觉数据建模的下一代技术提供新的思路。\n"
  },
  {
    "path": "abs/2501.03399.md",
    "content": "### Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs\n\n3D Gaussian Splatting is a recognized method for 3D scene representation, known for its high rendering quality and speed. However, its substantial data requirements present challenges for practical applications. In this paper, we introduce an efficient compression technique that significantly reduces storage overhead by using compact representation. We propose a unified architecture that combines point cloud data and feature planes through a progressive tri-plane structure. Our method utilizes 2D feature planes, enabling continuous spatial representation. To further optimize these representations, we incorporate entropy modeling in the frequency domain, specifically designed for standard video codecs. We also propose channel-wise bit allocation to achieve a better trade-off between bitrate consumption and feature plane representation. Consequently, our model effectively leverages spatial correlations within the feature planes to enhance rate-distortion performance using standard, non-differentiable video codecs. Experimental results demonstrate that our method outperforms existing methods in data compactness while maintaining high rendering quality.\n\n3D 高斯点绘制（3D Gaussian Splatting）是一种备受认可的 3D 场景表示方法，以高渲染质量和速度见长。然而，其庞大的数据需求为实际应用带来了挑战。在本文中，我们提出了一种高效的压缩技术，通过紧凑表示显著减少存储开销。\n我们设计了一种结合点云数据和特征平面的统一架构，采用渐进式三平面结构（progressive tri-plane structure）。该方法利用 2D 特征平面实现连续的空间表示。为了进一步优化这些表示，我们在频域中引入了针对标准视频编解码器设计的熵建模，并提出了通道级比特分配方法，以在比特率消耗和特征平面表示之间取得更好的平衡。\n因此，我们的模型能够有效利用特征平面内的空间相关性，通过标准的非可微视频编解码器提升率失真性能。实验结果表明，该方法在数据紧凑性上优于现有方法，同时保持了高渲染质量。\n"
  },
  {
    "path": "abs/2501.03605.md",
    "content": "### ConcealGS: Concealing Invisible Copyright Information in 3D Gaussian Splatting\n\nWith the rapid development of 3D reconstruction technology, the widespread distribution of 3D data has become a future trend. While traditional visual data (such as images and videos) and NeRF-based formats already have mature techniques for copyright protection, steganographic techniques for the emerging 3D Gaussian Splatting (3D-GS) format have yet to be fully explored. To address this, we propose ConcealGS, an innovative method for embedding implicit information into 3D-GS. By introducing the knowledge distillation and gradient optimization strategy based on 3D-GS, ConcealGS overcomes the limitations of NeRF-based models and enhances the robustness of implicit information and the quality of 3D reconstruction. We evaluate ConcealGS in various potential application scenarios, and experimental results have demonstrated that ConcealGS not only successfully recovers implicit information but also has almost no impact on rendering quality, providing a new approach for embedding invisible and recoverable information into 3D models in the future.\n\n随着 3D 重建技术的快速发展，3D 数据的广泛分发正成为未来的趋势。尽管传统视觉数据（如图像和视频）以及基于 NeRF 的格式在版权保护方面已有成熟技术，但针对新兴的 3D 高斯点绘制（3D-GS）格式的隐写技术尚未得到充分研究。为此，我们提出了一种创新方法 ConcealGS，用于将隐式信息嵌入到 3D-GS 中。\nConcealGS 通过引入基于 3D-GS 的知识蒸馏和梯度优化策略，克服了基于 NeRF 的模型的局限性，并提升了隐式信息的鲁棒性以及 3D 重建的质量。我们在多种潜在应用场景中评估了 ConcealGS，实验结果表明，ConcealGS 不仅能够成功恢复嵌入的隐式信息，而且几乎对渲染质量没有影响，为未来在 3D 模型中嵌入不可见且可恢复的信息提供了一种全新的方法。\n"
  },
  {
    "path": "abs/2501.03659.md",
    "content": "### DehazeGS: Seeing Through Fog with 3D Gaussian Splatting\n\nCurrent novel view synthesis tasks primarily rely on high-quality and clear images. However, in foggy scenes, scattering and attenuation can significantly degrade the reconstruction and rendering quality. Although NeRF-based dehazing reconstruction algorithms have been developed, their use of deep fully connected neural networks and per-ray sampling strategies leads to high computational costs. Moreover, NeRF's implicit representation struggles to recover fine details from hazy scenes. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction by explicitly modeling point clouds into 3D Gaussians. In this paper, we propose leveraging the explicit Gaussian representation to explain the foggy image formation process through a physically accurate forward rendering process. We introduce DehazeGS, a method capable of decomposing and rendering a fog-free background from participating media using only muti-view foggy images as input. We model the transmission within each Gaussian distribution to simulate the formation of fog. During this process, we jointly learn the atmospheric light and scattering coefficient while optimizing the Gaussian representation of the hazy scene. In the inference stage, we eliminate the effects of scattering and attenuation on the Gaussians and directly project them onto a 2D plane to obtain a clear view. Experiments on both synthetic and real-world foggy datasets demonstrate that DehazeGS achieves state-of-the-art performance in terms of both rendering quality and computational efficiency.\n\n当前的新视图合成任务主要依赖高质量和清晰的图像。然而，在雾霾场景中，散射和衰减会显著降低重建和渲染质量。尽管基于 NeRF 的去雾重建算法已经被开发出来，但其深度全连接神经网络和逐射线采样策略导致了较高的计算成本。此外，NeRF 的隐式表示在恢复雾霾场景的细节方面存在困难。相比之下，近期 3D 高斯点绘制（3D Gaussian Splatting）的进展通过将点云显式建模为 3D 高斯分布，实现了高质量的 3D 场景重建。\n本文提出了 DehazeGS，一种基于显式高斯表示的方法，通过物理准确的前向渲染过程解释雾霾图像的形成过程。DehazeGS 能够利用多视角雾霾图像作为输入，分解并渲染出无雾的背景。我们在每个高斯分布内建模光的传输过程，以模拟雾霾的形成。在这一过程中，我们联合学习大气光和散射系数，同时优化雾霾场景的高斯表示。\n在推理阶段，我们消除散射和衰减对高斯分布的影响，并将其直接投影到二维平面，获得清晰的视图。基于合成和真实雾霾数据集的实验表明，DehazeGS 在渲染质量和计算效率方面均达到了当前最先进的水平。\n"
  },
  {
    "path": "abs/2501.03714.md",
    "content": "### MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has made significant strides in scene representation and neural rendering, with intense efforts focused on adapting it for dynamic scenes. Despite delivering remarkable rendering quality and speed, existing methods struggle with storage demands and representing complex real-world motions. To tackle these issues, we propose MoDecGS, a memory-efficient Gaussian splatting framework designed for reconstructing novel views in challenging scenarios with complex motions. We introduce GlobaltoLocal Motion Decomposition (GLMD) to effectively capture dynamic motions in a coarsetofine manner. This approach leverages Global Canonical Scaffolds (Global CS) and Local Canonical Scaffolds (Local CS), extending static Scaffold representation to dynamic video reconstruction. For Global CS, we propose Global Anchor Deformation (GAD) to efficiently represent global dynamics along complex motions, by directly deforming the implicit Scaffold attributes which are anchor position, offset, and local context features. Next, we finely adjust local motions via the Local Gaussian Deformation (LGD) of Local CS explicitly. Additionally, we introduce Temporal Interval Adjustment (TIA) to automatically control the temporal coverage of each Local CS during training, allowing MoDecGS to find optimal interval assignments based on the specified number of temporal segments. Extensive evaluations demonstrate that MoDecGS achieves an average 70% reduction in model size over stateoftheart methods for dynamic 3D Gaussians from realworld dynamic videos while maintaining or even improving rendering quality.\n\n3D 高斯点绘制（3D Gaussian Splatting, 3DGS）在场景表示和神经渲染领域取得了显著进展，尤其在动态场景中的适配上备受关注。尽管现有方法在渲染质量和速度上表现出色，但它们在存储需求和复杂真实场景的动态运动表示方面仍存在挑战。为了解决这些问题，我们提出了 MoDecGS，一种内存高效的高斯点绘制框架，旨在应对复杂运动场景中的新视角重建。\n我们引入了 全局到局部运动分解（Global-to-Local Motion Decomposition, GLMD），以粗到细的方式高效捕捉动态运动。该方法利用 全局规范支架（Global Canonical Scaffold, Global CS） 和 局部规范支架（Local Canonical Scaffold, Local CS），将静态支架表示扩展到动态视频重建。对于 Global CS，我们提出了 全局锚点变形（Global Anchor Deformation, GAD），通过直接变形锚点位置、偏移量和局部上下文特征等隐式支架属性，高效表示复杂运动中的全局动态。随后，通过对 Local CS 的 局部高斯变形（Local Gaussian Deformation, LGD），显式调整局部运动。\n此外，我们引入了 时间间隔调整（Temporal Interval Adjustment, TIA），在训练过程中自动控制每个 Local CS 的时间覆盖范围，使 MoDecGS 能够基于指定的时间段数找到最优的时间间隔分配。\n大量实验表明，MoDecGS 在处理真实动态视频的动态 3D 高斯场景时，相较于最先进方法，模型尺寸平均减少了 70%，同时在渲染质量上保持甚至有所提升。\n"
  },
  {
    "path": "abs/2501.03875.md",
    "content": "### ZDySS -- Zero-Shot Dynamic Scene Stylization using Gaussian Splatting\n\nStylizing a dynamic scene based on an exemplar image is critical for various real-world applications, including gaming, filmmaking, and augmented and virtual reality. However, achieving consistent stylization across both spatial and temporal dimensions remains a significant challenge. Most existing methods are designed for static scenes and often require an optimization process for each style image, limiting their adaptability. We introduce ZDySS, a zero-shot stylization framework for dynamic scenes, allowing our model to generalize to previously unseen style images at inference. Our approach employs Gaussian splatting for scene representation, linking each Gaussian to a learned feature vector that renders a feature map for any given view and timestamp. By applying style transfer on the learned feature vectors instead of the rendered feature map, we enhance spatio-temporal consistency across frames. Our method demonstrates superior performance and coherence over state-of-the-art baselines in tests on real-world dynamic scenes, making it a robust solution for practical applications.\n\n基于示例图像对动态场景进行风格化在游戏、电影制作以及增强与虚拟现实等多种实际应用中至关重要。然而，要在空间和时间维度上实现一致的风格化仍是一个重大挑战。现有的大多数方法主要针对静态场景设计，且通常需要为每张风格图像单独进行优化，这限制了其适应性。\n我们提出了 ZDySS，一种针对动态场景的零样本风格化框架，使模型在推理时能够泛化到未见过的风格图像。我们的方法采用高斯点绘制（Gaussian Splatting）作为场景表示，将每个高斯与一个可学习的特征向量相连接，以渲染任意视角和时间戳的特征图。通过对学习到的特征向量进行风格迁移，而非直接操作渲染后的特征图，我们显著提升了跨帧的时空一致性。\n实验结果表明，在真实动态场景测试中，ZDySS 相较于当前最先进的基线方法表现出更优异的性能和一致性，成为一项适用于实际应用的强大解决方案。\n"
  },
  {
    "path": "abs/2501.04140.md",
    "content": "### Spatiotemporal Gaussian Optimization for 4D Cone Beam CT Reconstruction from Sparse Projections\n\nIn image-guided radiotherapy (IGRT), four-dimensional cone-beam computed tomography (4D-CBCT) is critical for assessing tumor motion during a patients breathing cycle prior to beam delivery. However, generating 4D-CBCT images with sufficient quality requires significantly more projection images than a standard 3D-CBCT scan, leading to extended scanning times and increased imaging dose to the patient. To address these limitations, there is a strong demand for methods capable of reconstructing high-quality 4D-CBCT images from a 1-minute 3D-CBCT acquisition. The challenge lies in the sparse sampling of projections, which introduces severe streaking artifacts and compromises image quality. This paper introduces a novel framework leveraging spatiotemporal Gaussian representation for 4D-CBCT reconstruction from sparse projections, achieving a balance between streak artifact reduction, dynamic motion preservation, and fine detail restoration. Each Gaussian is characterized by its 3D position, covariance, rotation, and density. Two-dimensional X-ray projection images can be rendered from the Gaussian point cloud representation via X-ray rasterization. The properties of each Gaussian were optimized by minimizing the discrepancy between the measured projections and the rendered X-ray projections. A Gaussian deformation network is jointly optimized to deform these Gaussian properties to obtain a 4D Gaussian representation for dynamic CBCT scene modeling. The final 4D-CBCT images are reconstructed by voxelizing the 4D Gaussians, achieving a high-quality representation that preserves both motion dynamics and spatial detail.\n\n在图像引导放射治疗（Image-Guided Radiotherapy, IGRT）中，四维锥束计算机断层扫描（4D-CBCT）对于在放射束投放前评估患者呼吸周期中的肿瘤运动至关重要。然而，生成具有足够质量的 4D-CBCT 图像需要显著多于标准 3D-CBCT 扫描的投影图像，导致扫描时间延长和患者受照剂量增加。为解决这些限制，迫切需要能够从 1 分钟的 3D-CBCT 采集中重建高质量 4D-CBCT 图像的方法。挑战在于投影采样稀疏性，这会引入严重的条纹伪影并降低图像质量。\n本文提出了一种新颖框架，利用时空高斯表示（spatiotemporal Gaussian representation）从稀疏投影中重建 4D-CBCT 图像，在条纹伪影减少、动态运动保留和细节恢复之间实现平衡。每个高斯由其三维位置、协方差、旋转和密度表征。通过 X 射线光栅化，可以从高斯点云表示渲染二维 X 射线投影图像。通过最小化测量投影与渲染投影之间的差异，优化每个高斯的属性。\n框架中还联合优化了一个高斯变形网络，用于变形这些高斯属性，从而获得动态 CBCT 场景建模的 4D 高斯表示。最终的 4D-CBCT 图像通过对 4D 高斯进行体素化重建，生成了高质量的图像表示，既保留了运动动态，又保持了空间细节。这种方法为 4D-CBCT 重建提供了一种高效且精准的解决方案。\n"
  },
  {
    "path": "abs/2501.04628.md",
    "content": "### FatesGS: Fast and Accurate Sparse-View Surface Reconstruction using Gaussian Splatting with Depth-Feature Consistency\n\nRecently, Gaussian Splatting has sparked a new trend in the field of computer vision. Apart from novel view synthesis, it has also been extended to the area of multi-view reconstruction. The latest methods facilitate complete, detailed surface reconstruction while ensuring fast training speed. However, these methods still require dense input views, and their output quality significantly degrades with sparse views. We observed that the Gaussian primitives tend to overfit the few training views, leading to noisy floaters and incomplete reconstruction surfaces. In this paper, we present an innovative sparse-view reconstruction framework that leverages intra-view depth and multi-view feature consistency to achieve remarkably accurate surface reconstruction. Specifically, we utilize monocular depth ranking information to supervise the consistency of depth distribution within patches and employ a smoothness loss to enhance the continuity of the distribution. To achieve finer surface reconstruction, we optimize the absolute position of depth through multi-view projection features. Extensive experiments on DTU and BlendedMVS demonstrate that our method outperforms state-of-the-art methods with a speedup of 60x to 200x, achieving swift and fine-grained mesh reconstruction without the need for costly pre-training.\n\n最近，高斯点绘制（Gaussian Splatting）在计算机视觉领域掀起了一股新潮流。除了用于新视图合成外，它还被扩展到多视角重建领域。最新的方法能够实现完整且细致的表面重建，同时保证较快的训练速度。然而，这些方法仍然需要密集的输入视图，当视图稀疏时，输出质量会显著下降。我们观察到，高斯原语在稀疏视图训练时容易过拟合，导致噪声浮点和不完整的重建表面。\n本文提出了一种创新的稀疏视图重建框架，通过利用视内深度和多视角特征一致性，实现了高度精确的表面重建。具体而言，我们利用单目深度排序信息监督补丁内的深度分布一致性，并引入平滑损失以增强分布的连续性。为了实现更精细的表面重建，我们通过多视角投影特征优化深度的绝对位置。\n在 DTU 和 BlendedMVS 数据集上的大量实验表明，所提出的方法在速度上比当前最先进的方法快 60 倍至 200 倍，同时无需昂贵的预训练即可实现快速且细粒度的网格重建。这一方法显著提升了稀疏视图重建的精度和效率。\n"
  },
  {
    "path": "abs/2501.04782.md",
    "content": "### GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting\n\nEfficient neural representations for dynamic video scenes are critical for applications ranging from video compression to interactive simulations. Yet, existing methods often face challenges related to high memory usage, lengthy training times, and temporal consistency. To address these issues, we introduce a novel neural video representation that combines 3D Gaussian splatting with continuous camera motion modeling. By leveraging Neural ODEs, our approach learns smooth camera trajectories while maintaining an explicit 3D scene representation through Gaussians. Additionally, we introduce a spatiotemporal hierarchical learning strategy, progressively refining spatial and temporal features to enhance reconstruction quality and accelerate convergence. This memory-efficient approach achieves high-quality rendering at impressive speeds. Experimental results show that our hierarchical learning, combined with robust camera motion modeling, captures complex dynamic scenes with strong temporal consistency, achieving state-of-the-art performance across diverse video datasets in both high- and low-motion scenarios.\n\n高效的神经表示对于动态视频场景在视频压缩和交互式仿真等应用中至关重要。然而，现有方法常面临高内存占用、训练时间过长以及时间一致性不足等问题。为了解决这些问题，我们提出了一种新的神经视频表示方法，将 3D 高斯点绘制（3D Gaussian Splatting） 与连续相机运动建模相结合。\n通过引入 神经微分方程（Neural ODEs），我们的方法在保持通过高斯进行显式 3D 场景表示的同时，学习平滑的相机轨迹。此外，我们提出了一种 时空分层学习策略（spatiotemporal hierarchical learning strategy），逐步优化空间和时间特征，以提升重建质量并加速收敛。该方法在内存高效的基础上，实现了高质量渲染，速度表现尤为出色。\n实验结果表明，时空分层学习与稳健的相机运动建模相结合，使得该方法能够捕捉复杂的动态场景，同时保持强大的时间一致性。在高运动和低运动场景下的多样化视频数据集上，该方法均达到了当前最先进的性能表现。\n"
  },
  {
    "path": "abs/2501.05242.md",
    "content": "### Scaffold-SLAM: Structured 3D Gaussians for Simultaneous Localization and Photorealistic Mapping\n\n3D Gaussian Splatting (3DGS) has recently revolutionized novel view synthesis in the Simultaneous Localization and Mapping (SLAM). However, existing SLAM methods utilizing 3DGS have failed to provide high-quality novel view rendering for monocular, stereo, and RGB-D cameras simultaneously. Notably, some methods perform well for RGB-D cameras but suffer significant degradation in rendering quality for monocular cameras. In this paper, we present Scaffold-SLAM, which delivers simultaneous localization and high-quality photorealistic mapping across monocular, stereo, and RGB-D cameras. We introduce two key innovations to achieve this state-of-the-art visual quality. First, we propose Appearance-from-Motion embedding, enabling 3D Gaussians to better model image appearance variations across different camera poses. Second, we introduce a frequency regularization pyramid to guide the distribution of Gaussians, allowing the model to effectively capture finer details in the scene. Extensive experiments on monocular, stereo, and RGB-D datasets demonstrate that Scaffold-SLAM significantly outperforms state-of-the-art methods in photorealistic mapping quality, e.g., PSNR is 16.76% higher in the TUM RGB-D datasets for monocular cameras.\n\n3D 高斯点绘制（3D Gaussian Splatting, 3DGS）近年来在同步定位与建图（Simultaneous Localization and Mapping, SLAM）中的新视图合成任务中取得了革命性进展。然而，现有利用 3DGS 的 SLAM 方法尚未能够同时为单目、立体视觉和 RGB-D 摄像机提供高质量的新视图渲染。其中，一些方法在 RGB-D 摄像机中表现较好，但在单目摄像机中渲染质量显著下降。\n本文提出了 Scaffold-SLAM，一种能够在单目、立体视觉和 RGB-D 摄像机中同时实现高质量光真实感建图和定位的系统。为实现这一最先进的视觉质量，我们提出了基于运动的外观嵌入（Appearance-from-Motion embedding），使得 3D 高斯能够更好地建模不同相机姿态下图像的外观变化。此外，我们引入了频率正则化金字塔（Frequency Regularization Pyramid），用于引导高斯分布，从而有效捕捉场景中的细节信息。\n在单目、立体视觉和 RGB-D 数据集上的大量实验表明，Scaffold-SLAM 在光真实感建图质量上显著优于当前最先进的方法。例如，在 TUM RGB-D 数据集的单目摄像机实验中，PSNR 提高了 16.76%。\n"
  },
  {
    "path": "abs/2501.05379.md",
    "content": "### Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance\n\nInspired by the effectiveness of 3D Gaussian Splatting (3DGS) in reconstructing detailed 3D scenes within multi-view setups and the emergence of large 2D human foundation models, we introduce Arc2Avatar, the first SDS-based method utilizing a human face foundation model as guidance with just a single image as input. To achieve that, we extend such a model for diverse-view human head generation by fine-tuning on synthetic data and modifying its conditioning. Our avatars maintain a dense correspondence with a human face mesh template, allowing blendshape-based expression generation. This is achieved through a modified 3DGS approach, connectivity regularizers, and a strategic initialization tailored for our task. Additionally, we propose an optional efficient SDS-based correction step to refine the blendshape expressions, enhancing realism and diversity. Experiments demonstrate that Arc2Avatar achieves state-of-the-art realism and identity preservation, effectively addressing color issues by allowing the use of very low guidance, enabled by our strong identity prior and initialization strategy, without compromising detail.\n\n受益于 3D 高斯点绘制（3D Gaussian Splatting, 3DGS）在多视角设置中重建精细 3D 场景的能力，以及大规模 2D 人体基础模型的兴起，我们提出了 Arc2Avatar，这是首个利用基于 SDS 的方法并以人脸基础模型为引导的技术，仅需单张输入图像即可生成结果。为实现这一目标，我们通过在合成数据上进行微调和修改条件输入，将人脸基础模型扩展用于多视角人头生成。\n生成的虚拟人头与一个人脸网格模板保持密集对应关系，从而支持基于混合变形（blendshape）的表情生成。这一过程通过改进的 3DGS 方法、连通性正则器以及为任务量身定制的初始化策略得以实现。此外，我们提出了一种可选的高效 SDS 校正步骤，用于细化混合变形表情，从而进一步提升现实感和多样性。\n实验结果表明，Arc2Avatar 在现实感和身份保留方面达到了最先进水平，通过我们的强身份先验和初始化策略，能够在保持细节的同时有效解决颜色问题，仅需使用非常低的引导强度即可实现。这项技术显著提升了生成结果的真实性和多样性，推动了单张图像驱动的虚拟人头生成的技术前沿。\n"
  },
  {
    "path": "abs/2501.05427.md",
    "content": "### Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation\n\nRecent advances in 2D image generation have achieved remarkable quality,largely driven by the capacity of diffusion models and the availability of large-scale datasets. However, direct 3D generation is still constrained by the scarcity and lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel approach that addresses this problem by enabling direct single-view generation on Gaussian splats using pretrained 2D diffusion models. Our key insight is that Gaussian splats, a 3D representation, can be decomposed into multi-view images encoding different attributes. This reframes the challenging task of direct 3D generation within a 2D diffusion framework, allowing us to leverage the rich priors of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats. This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects. Extensive experiments on both synthetic and in-the-wild datasets demonstrate superior performance in 3D object generation, offering a new approach to high-quality 3D generation.\n\n近年来，2D 图像生成技术取得了显著进展，这主要得益于扩散模型的强大能力和大规模数据集的可用性。然而，直接生成 3D 数据仍然受到 3D 数据集稀缺性和较低保真度的限制。本文提出了一种新方法 Zero-1-to-G，通过利用预训练的 2D 扩散模型，实现了在高斯点（Gaussian splats）上的直接单视角生成，从而应对这一问题。\n我们的核心见解是，高斯点作为一种 3D 表示形式，可以分解为包含不同属性的多视角图像。这将直接 3D 生成这一具有挑战性的任务重新框定为 2D 扩散框架内的问题，使我们能够充分利用预训练 2D 扩散模型的丰富先验知识。为了引入 3D 感知，我们提出了跨视角和跨属性注意力层，这些注意力层能够捕捉复杂的相关性并在生成的高斯点之间强制实现 3D 一致性。这使得 Zero-1-to-G 成为首个有效利用预训练 2D 扩散先验的直接图像到 3D 生成模型，实现了高效训练并提升了对未见物体的泛化能力。\n在合成数据集和真实场景数据集上的大量实验表明，Zero-1-to-G 在 3D 对象生成方面表现出色，提供了一种生成高质量 3D 数据的新方法，为 3D 生成领域开辟了新的可能性。\n"
  },
  {
    "path": "abs/2501.05757.md",
    "content": "### Locality-aware Gaussian Compression for Fast and High-quality Rendering\n\nWe present LocoGS, a locality-aware 3D Gaussian Splatting (3DGS) framework that exploits the spatial coherence of 3D Gaussians for compact modeling of volumetric scenes. To this end, we first analyze the local coherence of 3D Gaussian attributes, and propose a novel locality-aware 3D Gaussian representation that effectively encodes locally-coherent Gaussian attributes using a neural field representation with a minimal storage requirement. On top of the novel representation, LocoGS is carefully designed with additional components such as dense initialization, an adaptive spherical harmonics bandwidth scheme and different encoding schemes for different Gaussian attributes to maximize compression performance. Experimental results demonstrate that our approach outperforms the rendering quality of existing compact Gaussian representations for representative real-world 3D datasets while achieving from 54.6× to 96.6× compressed storage size and from 2.1× to 2.4× rendering speed than 3DGS. Even our approach also demonstrates an averaged 2.4× higher rendering speed than the state-of-the-art compression method with comparable compression performance.\n\n我们提出了 LocoGS，一种面向局部性的 3D 高斯点绘制（3D Gaussian Splatting, 3DGS）框架，通过利用 3D 高斯的空间一致性，实现体积场景的紧凑建模。为此，我们首先分析了 3D 高斯属性的局部一致性，并提出了一种新颖的面向局部性的 3D 高斯表示法。该方法利用神经场表示高效编码局部一致的高斯属性，同时最大限度地减少存储需求。\n基于这一新颖表示，LocoGS 经过精心设计，加入了若干附加组件，例如稠密初始化、自适应球谐带宽方案，以及针对不同高斯属性的差异化编码策略，以最大化压缩性能。实验结果表明，我们的方法在代表性的真实 3D 数据集上，不仅在渲染质量上优于现有的紧凑高斯表示，还实现了 54.6 倍至 96.6 倍 的存储压缩比，并在渲染速度上比传统 3DGS 快 2.1 倍至 2.4 倍。此外，与当前最先进的压缩方法相比，LocoGS 的渲染速度平均提高 2.4 倍，同时保持了可比的压缩性能。\n这一结果表明，LocoGS 在存储效率和渲染性能方面达到了新的高度，是一种高效且适应性强的 3D 场景建模方法。\n"
  },
  {
    "path": "abs/2501.06714.md",
    "content": "### F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting\n\nThis paper tackles the problem of generalizable 3D-aware generation from monocular datasets, e.g., ImageNet. The key challenge of this task is learning a robust 3D-aware representation without multi-view or dynamic data, while ensuring consistent texture and geometry across different viewpoints. Although some baseline methods are capable of 3D-aware generation, the quality of the generated images still lags behind state-of-the-art 2D generation approaches, which excel in producing high-quality, detailed images. To address this severe limitation, we propose a novel feed-forward pipeline based on pixel-aligned Gaussian Splatting, coined as F3D-Gaus, which can produce more realistic and reliable 3D renderings from monocular inputs. In addition, we introduce a self-supervised cycle-consistent constraint to enforce cross-view consistency in the learned 3D representation. This training strategy naturally allows aggregation of multiple aligned Gaussian primitives and significantly alleviates the interpolation limitations inherent in single-view pixel-aligned Gaussian Splatting. Furthermore, we incorporate video model priors to perform geometry-aware refinement, enhancing the generation of fine details in wide-viewpoint scenarios and improving the model's capability to capture intricate 3D textures. Extensive experiments demonstrate that our approach not only achieves high-quality, multi-view consistent 3D-aware generation from monocular datasets, but also significantly improves training and inference efficiency.\n\n本文研究了一种从单目数据集（如 ImageNet）中实现具有泛化能力的三维感知生成问题。该任务的关键挑战在于，在没有多视角或动态数据的情况下，学习一种稳健的三维感知表示，同时确保不同视角间纹理和几何的一致性。尽管一些基线方法能够实现三维感知生成，但生成图像的质量仍然落后于目前最先进的二维生成方法，而后者在生成高质量、细节丰富的图像方面表现卓越。\n为了解决这一严重限制，我们提出了一种基于像素对齐高斯散点的新型前馈流程，称为 F3D-Gaus，可以从单目输入中生成更加真实可靠的三维渲染。此外，我们引入了一种自监督的循环一致性约束，以在学习到的三维表示中强制实现跨视角一致性。这种训练策略自然地支持对多个对齐的高斯原语的聚合，并显著缓解了单视角像素对齐高斯散点内插的局限性。\n进一步地，我们结合视频模型先验进行几何感知的优化，在宽视角场景下增强了细节生成能力，并提升了模型捕捉复杂三维纹理的能力。大量实验表明，我们的方法不仅能够实现从单目数据集中生成高质量、多视角一致的三维感知结果，还显著提高了训练和推理的效率。\n"
  },
  {
    "path": "abs/2501.06838.md",
    "content": "### Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution\n\nEquipped with the continuous representation capability of Multi-Layer Perceptron (MLP), Implicit Neural Representation (INR) has been successfully employed for Arbitrary-scale Super-Resolution (ASR). However, the limited receptive field of the linear layers in MLP restricts the representation capability of INR, while it is computationally expensive to query the MLP numerous times to render each pixel. Recently, Gaussian Splatting (GS) has shown its advantages over INR in both visual quality and rendering speed in 3D tasks, which motivates us to explore whether GS can be employed for the ASR task. However, directly applying GS to ASR is exceptionally challenging because the original GS is an optimization-based method through overfitting each single scene, while in ASR we aim to learn a single model that can generalize to different images and scaling factors. We overcome these challenges by developing two novel techniques. Firstly, to generalize GS for ASR, we elaborately design an architecture to predict the corresponding image-conditioned Gaussians of the input low-resolution image in a feed-forward manner. Secondly, we implement an efficient differentiable 2D GPU/CUDA-based scale-aware rasterization to render super-resolved images by sampling discrete RGB values from the predicted contiguous Gaussians. Via end-to-end training, our optimized network, namely GSASR, can perform ASR for any image and unseen scaling factors. Extensive experiments validate the effectiveness of our proposed method.\n\n基于多层感知机（MLP）的连续表示能力，隐式神经表示（INR）已被成功应用于任意比例超分辨率（ASR）。然而，MLP 中线性层的有限感受野限制了 INR 的表示能力，同时多次查询 MLP 来渲染每个像素的计算开销较高。最近，高斯散点（Gaussian Splatting, GS）在 3D 任务中展现了其在视觉质量和渲染速度上的优势，这促使我们探索 GS 是否可以被用于 ASR 任务。然而，直接将 GS 应用于 ASR 面临极大的挑战，因为原始 GS 是一种通过对每个单一场景进行过拟合的优化方法，而在 ASR 中，我们的目标是学习一个可以泛化到不同图像和缩放因子的单一模型。\n我们通过开发两项新技术克服了这些挑战。首先，为了将 GS 泛化到 ASR，我们精心设计了一种架构，以前馈的方式预测与输入低分辨率图像相关的图像条件高斯分布。其次，我们实现了一种高效的基于 GPU/CUDA 的可微分二维缩放感知光栅化，通过从预测的连续高斯分布中采样离散的 RGB 值来渲染超分辨率图像。通过端到端的训练，我们优化的网络，即 GSASR，可以对任意图像和未见过的缩放因子执行 ASR。大量实验验证了我们提出方法的有效性。\n"
  },
  {
    "path": "abs/2501.06897.md",
    "content": "### ActiveGAMER: Active GAussian Mapping through Efficient Rendering\n\nWe introduce ActiveGAMER, an active mapping system that utilizes 3D Gaussian Splatting (3DGS) to achieve high-quality, real-time scene mapping and exploration. Unlike traditional NeRF-based methods, which are computationally demanding and restrict active mapping performance, our approach leverages the efficient rendering capabilities of 3DGS, allowing effective and efficient exploration in complex environments. The core of our system is a rendering-based information gain module that dynamically identifies the most informative viewpoints for next-best-view planning, enhancing both geometric and photometric reconstruction accuracy. ActiveGAMER also integrates a carefully balanced framework, combining coarse-to-fine exploration, post-refinement, and a global-local keyframe selection strategy to maximize reconstruction completeness and fidelity. Our system autonomously explores and reconstructs environments with state-of-the-art geometric and photometric accuracy and completeness, significantly surpassing existing approaches in both aspects. Extensive evaluations on benchmark datasets such as Replica and MP3D highlight ActiveGAMER's effectiveness in active mapping tasks.\n\n我们介绍了 ActiveGAMER，一种利用三维高斯散点（3DGS）实现高质量实时场景映射与探索的主动映射系统。与传统基于 NeRF 的方法不同，这些方法计算量大，限制了主动映射性能，而我们的方法利用 3DGS 的高效渲染能力，使得在复杂环境中的探索更加高效。系统的核心是一种基于渲染的信息增益模块，该模块能够动态识别最具信息价值的视点用于最佳下一视角规划，从而提高几何和光度重建的准确性。\nActiveGAMER 还整合了精心平衡的框架，包括粗到细的探索、后期优化以及一种全局-局部关键帧选择策略，以最大化重建的完整性和保真度。该系统能够自主探索和重建环境，在几何和光度精度及完整性方面达到最新的研究水平，显著超越现有方法。\n在 Replica 和 MP3D 等基准数据集上的大量评估验证了 ActiveGAMER 在主动映射任务中的有效性。\n\n"
  },
  {
    "path": "abs/2501.07015.md",
    "content": "### SplatMAP: Online Dense Monocular SLAM with 3D Gaussian Splatting\n\nAchieving high-fidelity 3D reconstruction from monocular video remains challenging due to the inherent limitations of traditional methods like Structure-from-Motion (SfM) and monocular SLAM in accurately capturing scene details. While differentiable rendering techniques such as Neural Radiance Fields (NeRF) address some of these challenges, their high computational costs make them unsuitable for real-time applications. Additionally, existing 3D Gaussian Splatting (3DGS) methods often focus on photometric consistency, neglecting geometric accuracy and failing to exploit SLAM's dynamic depth and pose updates for scene refinement. We propose a framework integrating dense SLAM with 3DGS for real-time, high-fidelity dense reconstruction. Our approach introduces SLAM-Informed Adaptive Densification, which dynamically updates and densifies the Gaussian model by leveraging dense point clouds from SLAM. Additionally, we incorporate Geometry-Guided Optimization, which combines edge-aware geometric constraints and photometric consistency to jointly optimize the appearance and geometry of the 3DGS scene representation, enabling detailed and accurate SLAM mapping reconstruction. Experiments on the Replica and TUM-RGBD datasets demonstrate the effectiveness of our approach, achieving state-of-the-art results among monocular systems. Specifically, our method achieves a PSNR of 36.864, SSIM of 0.985, and LPIPS of 0.040 on Replica, representing improvements of 10.7%, 6.4%, and 49.4%, respectively, over the previous SOTA. On TUM-RGBD, our method outperforms the closest baseline by 10.2%, 6.6%, and 34.7% in the same metrics. These results highlight the potential of our framework in bridging the gap between photometric and geometric dense 3D scene representations, paving the way for practical and efficient monocular dense reconstruction.\n\n从单目视频实现高保真三维重建仍然充满挑战，原因在于传统方法如结构光（Structure-from-Motion, SfM）和单目 SLAM 在准确捕捉场景细节方面的内在局限性。尽管可微渲染技术（如神经辐射场，NeRF）可以解决部分问题，但其高计算成本使其不适用于实时应用。此外，现有的三维高斯散点（3DGS）方法通常关注光度一致性，而忽略了几何准确性，并未利用 SLAM 的动态深度和位姿更新进行场景优化。\n我们提出了一种将密集 SLAM 与 3DGS 集成的框架，用于实时高保真密集重建。该方法引入了 SLAM 引导的自适应加密（SLAM-Informed Adaptive Densification），通过利用 SLAM 的密集点云动态更新和加密高斯模型。此外，我们结合了 几何引导优化（Geometry-Guided Optimization），融合基于边缘的几何约束与光度一致性，共同优化 3DGS 场景表示的外观和几何，使得 SLAM 映射的重建更加细致和准确。\n在 Replica 和 TUM-RGBD 数据集上的实验表明，我们的方法实现了单目系统的最新成果。在 Replica 数据集上，我们的方法取得了 PSNR 为 36.864、SSIM 为 0.985 和 LPIPS 为 0.040 的性能，分别比之前的最新方法提高了 10.7%、6.4% 和 49.4%。在 TUM-RGBD 数据集上，我们的方法在相同指标上比最接近的基线方法分别高出 10.2%、6.6% 和 34.7%。这些结果表明，我们的框架在弥合光度和几何密集三维场景表示的差距方面的潜力，为实用高效的单目密集重建铺平了道路。\n"
  },
  {
    "path": "abs/2501.07104.md",
    "content": "### RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video Based on Rectified Mesh-embedded Gaussians\n\nWe introduce RMAvatar, a novel human avatar representation with Gaussian splatting embedded on mesh to learn clothed avatar from a monocular video. We utilize the explicit mesh geometry to represent motion and shape of a virtual human and implicit appearance rendering with Gaussian Splatting. Our method consists of two main modules: Gaussian initialization module and Gaussian rectification module. We embed Gaussians into triangular faces and control their motion through the mesh, which ensures low-frequency motion and surface deformation of the avatar. Due to the limitations of LBS formula, the human skeleton is hard to control complex non-rigid transformations. We then design a pose-related Gaussian rectification module to learn fine-detailed non-rigid deformations, further improving the realism and expressiveness of the avatar. We conduct extensive experiments on public datasets, RMAvatar shows state-of-the-art performance on both rendering quality and quantitative evaluations.\n\n我们提出了 RMAvatar，一种嵌入高斯散点到网格上的新型人体化身表示方法，可通过单目视频学习穿衣化身。我们利用显式网格几何表示虚拟人类的运动和形状，并通过高斯散点实现隐式外观渲染。我们的方法由两个主要模块组成：高斯初始化模块和高斯校正模块。\n我们将高斯嵌入到三角面片中，并通过网格控制其运动，从而确保化身的低频运动和表面变形。由于线性混合骨骼（LBS）公式的限制，人体骨架难以控制复杂的非刚性变形。为此，我们设计了一个与姿态相关的高斯校正模块，用于学习精细的非刚性变形，进一步提升化身的真实感和表现力。\n我们在公共数据集上进行了大量实验，结果表明 RMAvatar 在渲染质量和定量评估方面均达到了最新水平的性能。\n"
  },
  {
    "path": "abs/2501.07478.md",
    "content": "### 3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud or Mesh\n\n3D Gaussian Splatting (3DGS) excels at producing highly detailed 3D reconstructions, but these scenes often require specialised renderers for effective visualisation. In contrast, point clouds are a widely used 3D representation and are compatible with most popular 3D processing software, yet converting 3DGS scenes into point clouds is a complex challenge. In this work we introduce 3DGS-to-PC, a flexible and highly customisable framework that is capable of transforming 3DGS scenes into dense, high-accuracy point clouds. We sample points probabilistically from each Gaussian as a 3D density function. We additionally threshold new points using the Mahalanobis distance to the Gaussian centre, preventing extreme outliers. The result is a point cloud that closely represents the shape encoded into the 3D Gaussian scene. Individual Gaussians use spherical harmonics to adapt colours depending on view, and each point may contribute only subtle colour hints to the resulting rendered scene. To avoid spurious or incorrect colours that do not fit with the final point cloud, we recalculate Gaussian colours via a customised image rendering approach, assigning each Gaussian the colour of the pixel to which it contributes most across all views. 3DGS-to-PC also supports mesh generation through Poisson Surface Reconstruction, applied to points sampled from predicted surface Gaussians. This allows coloured meshes to be generated from 3DGS scenes without the need for re-training. This package is highly customisable and capability of simple integration into existing 3DGS pipelines. 3DGS-to-PC provides a powerful tool for converting 3DGS data into point cloud and surface-based formats.\n\n三维高斯散点（3D Gaussian Splatting, 3DGS）在生成高度细节化的三维重建方面表现出色，但这些场景通常需要专用渲染器进行有效的可视化。相比之下，点云是一种广泛使用的三维表示形式，与大多数流行的三维处理软件兼容，但将 3DGS 场景转换为点云是一个复杂的挑战。\n在本研究中，我们提出了 3DGS-to-PC，一个灵活且高度可定制的框架，能够将 3DGS 场景转换为密集且高精度的点云。我们将每个高斯看作三维密度函数，以概率方式采样点。此外，我们通过马氏距离（Mahalanobis Distance）对新生成的点进行阈值筛选，从而避免极端离群值的影响。最终生成的点云能够紧密表示 3DGS 场景中编码的形状。\n在颜色处理上，单个高斯使用球谐函数根据视角自适应颜色，每个点仅对渲染场景提供细微的颜色提示。为避免与最终点云不匹配的虚假或错误颜色，我们通过定制的图像渲染方法重新计算高斯颜色，为每个高斯分配其在所有视角中对某像素贡献最大的颜色。\n3DGS-to-PC 还支持通过泊松表面重建（Poisson Surface Reconstruction）生成网格，应用于从预测的表面高斯采样的点。这使得可以从 3DGS 场景生成彩色网格，而无需重新训练。\n此工具包高度可定制，并可轻松集成到现有的 3DGS 流程中。3DGS-to-PC 提供了一种强大的工具，用于将 3DGS 数据转换为基于点云和表面的格式。\n"
  },
  {
    "path": "abs/2501.08072.md",
    "content": "### Evaluating Human Perception of Novel View Synthesis: Subjective Quality Assessment of Gaussian Splatting and NeRF in Dynamic Scenes\n\nGaussian Splatting (GS) and Neural Radiance Fields (NeRF) are two groundbreaking technologies that have revolutionized the field of Novel View Synthesis (NVS), enabling immersive photorealistic rendering and user experiences by synthesizing multiple viewpoints from a set of images of sparse views. The potential applications of NVS, such as high-quality virtual and augmented reality, detailed 3D modeling, and realistic medical organ imaging, underscore the importance of quality assessment of NVS methods from the perspective of human perception. Although some previous studies have explored subjective quality assessments for NVS technology, they still face several challenges, especially in NVS methods selection, scenario coverage, and evaluation methodology. To address these challenges, we conducted two subjective experiments for the quality assessment of NVS technologies containing both GS-based and NeRF-based methods, focusing on dynamic and real-world scenes. This study covers 360°, front-facing, and single-viewpoint videos while providing a richer and greater number of real scenes. Meanwhile, it's the first time to explore the impact of NVS methods in dynamic scenes with moving objects. The two types of subjective experiments help to fully comprehend the influences of different viewing paths from a human perception perspective and pave the way for future development of full-reference and no-reference quality metrics. In addition, we established a comprehensive benchmark of various state-of-the-art objective metrics on the proposed database, highlighting that existing methods still struggle to accurately capture subjective quality. The results give us some insights into the limitations of existing NVS methods and may promote the development of new NVS methods.\n\n高斯散点（Gaussian Splatting, GS）和神经辐射场（Neural Radiance Fields, NeRF）是变革性的技术，革新了新视角合成（Novel View Synthesis, NVS）领域，通过从稀疏视角图像集生成多个视点的逼真渲染，为用户提供沉浸式的真实感体验。NVS 的潜在应用包括高质量的虚拟与增强现实、精细的三维建模以及逼真的医学器官成像，这些都凸显了从人类感知角度对 NVS 方法进行质量评估的重要性。\n尽管已有研究探索了 NVS 技术的主观质量评估，但在方法选择、场景覆盖和评估方法论上仍面临诸多挑战。为了解决这些问题，我们针对 NVS 技术（涵盖 GS 和 NeRF 方法）开展了两项主观实验，重点关注动态和真实场景。本研究涵盖了 360°、前视和单视点视频，并提供了更丰富且数量更多的真实场景数据。同时，我们首次探索了 NVS 方法在包含移动物体的动态场景中的影响。通过这两类主观实验，我们能够从人类感知的角度全面理解不同观看路径的影响，为未来全参考（full-reference）和无参考（no-reference）质量评估指标的发展铺平道路。\n此外，我们在提出的数据库上对多种最新的客观指标进行了全面基准测试，结果显示现有方法仍难以准确捕捉主观质量。这些结果为我们揭示了现有 NVS 方法的局限性，并可能推动新型 NVS 方法的开发。\n"
  },
  {
    "path": "abs/2501.08174.md",
    "content": "### Object-Centric 2D Gaussian Splatting: Background Removal and Occlusion-Aware Pruning for Compact Object Models\n\nCurrent Gaussian Splatting approaches are effective for reconstructing entire scenes but lack the option to target specific objects, making them computationally expensive and unsuitable for object-specific applications. We propose a novel approach that leverages object masks to enable targeted reconstruction, resulting in object-centric models. Additionally, we introduce an occlusion-aware pruning strategy to minimize the number of Gaussians without compromising quality. Our method reconstructs compact object models, yielding object-centric Gaussian and mesh representations that are up to 96\\% smaller and up to 71% faster to train compared to the baseline while retaining competitive quality. These representations are immediately usable for downstream applications such as appearance editing and physics simulation without additional processing.\n\n当前的高斯散点方法在重建整个场景方面表现出色，但缺乏针对特定对象的选项，导致计算成本高昂，不适用于以对象为中心的应用。我们提出了一种新方法，利用对象掩码实现目标重建，从而生成以对象为中心的模型。\n此外，我们引入了一种感知遮挡的剪枝策略，以在不降低质量的情况下最小化高斯数量。我们的方法能够重建紧凑的对象模型，生成的以对象为中心的高斯和网格表示相比基线模型减少了高达 96% 的存储需求，并且训练速度提升高达 71%，同时保留了具有竞争力的质量。\n这些表示可以直接应用于下游任务，例如外观编辑和物理模拟，而无需额外处理。\n"
  },
  {
    "path": "abs/2501.08286.md",
    "content": "### VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes\n\nVINGS-Mono is a monocular (inertial) Gaussian Splatting (GS) SLAM framework designed for large scenes. The framework comprises four main components: VIO Front End, 2D Gaussian Map, NVS Loop Closure, and Dynamic Eraser. In the VIO Front End, RGB frames are processed through dense bundle adjustment and uncertainty estimation to extract scene geometry and poses. Based on this output, the mapping module incrementally constructs and maintains a 2D Gaussian map. Key components of the 2D Gaussian Map include a Sample-based Rasterizer, Score Manager, and Pose Refinement, which collectively improve mapping speed and localization accuracy. This enables the SLAM system to handle large-scale urban environments with up to 50 million Gaussian ellipsoids. To ensure global consistency in large-scale scenes, we design a Loop Closure module, which innovatively leverages the Novel View Synthesis (NVS) capabilities of Gaussian Splatting for loop closure detection and correction of the Gaussian map. Additionally, we propose a Dynamic Eraser to address the inevitable presence of dynamic objects in real-world outdoor scenes. Extensive evaluations in indoor and outdoor environments demonstrate that our approach achieves localization performance on par with Visual-Inertial Odometry while surpassing recent GS/NeRF SLAM methods. It also significantly outperforms all existing methods in terms of mapping and rendering quality. Furthermore, we developed a mobile app and verified that our framework can generate high-quality Gaussian maps in real time using only a smartphone camera and a low-frequency IMU sensor. To the best of our knowledge, VINGS-Mono is the first monocular Gaussian SLAM method capable of operating in outdoor environments and supporting kilometer-scale large scenes.\n\nVINGS-Mono 是一款专为大场景设计的单目（惯性）高斯散点（Gaussian Splatting, GS）SLAM 框架。该框架由四个主要组件组成：VIO 前端、二维高斯地图、NVS 闭环检测 和 动态擦除器。\n在 VIO 前端 中，RGB 帧通过稠密捆绑调整和不确定性估计处理，用于提取场景几何和相机位姿。基于这些输出，映射模块逐步构建并维护一个二维高斯地图。二维高斯地图的核心组件包括样本化光栅化器、分数管理器和位姿优化器，这些模块协同工作以提升映射速度和定位精度，使得 SLAM 系统能够处理包含多达 5000 万个高斯椭球的大规模城市环境。\n为确保大场景的全局一致性，我们设计了一个 闭环检测模块，创新地利用高斯散点的新视角合成（Novel View Synthesis, NVS）能力进行闭环检测和高斯地图的修正。此外，我们提出了 动态擦除器，以应对真实世界户外场景中动态物体的干扰。\n在室内和室外环境中的大量实验表明，我们的方法在定位性能上可与视觉惯性里程计（Visual-Inertial Odometry, VIO）媲美，同时在映射和渲染质量方面显著优于最新的 GS/NeRF SLAM 方法，并在所有现有方法中表现最佳。此外，我们开发了一款移动应用，验证了该框架仅使用智能手机摄像头和低频 IMU 传感器即可实时生成高质量的高斯地图。\n据我们所知，VINGS-Mono 是首个能够在户外环境中运行并支持公里级大规模场景的单目高斯 SLAM 方法。\n"
  },
  {
    "path": "abs/2501.08370.md",
    "content": "### 3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering\n\nDifferentiable 3D Gaussian splatting has emerged as an efficient and flexible rendering technique for representing complex scenes from a collection of 2D views and enabling high-quality real-time novel-view synthesis. However, its reliance on photometric losses can lead to imprecisely reconstructed geometry and extracted meshes, especially in regions with high curvature or fine detail. We propose a novel regularization method using the gradients of a signed distance function estimated from the Gaussians, to improve the quality of rendering while also extracting a surface mesh. The regularizing normal supervision facilitates better rendering and mesh reconstruction, which is crucial for downstream applications in video generation, animation, AR-VR and gaming. We demonstrate the effectiveness of our approach on datasets such as Mip-NeRF360, Tanks and Temples, and Deep-Blending. Our method scores higher on photorealism metrics compared to other mesh extracting rendering methods without compromising mesh quality.\n\n可微分三维高斯散点（Differentiable 3D Gaussian Splatting）已成为一种高效且灵活的渲染技术，能够从二维视图集合中表示复杂场景，并实现高质量的实时新视角合成。然而，由于其依赖于光度损失，在高曲率区域或细节丰富区域，重建的几何和提取的网格可能不够精确。\n我们提出了一种基于从高斯估算的符号距离函数梯度的新正则化方法，以提高渲染质量并实现表面网格的提取。该正则化的法线监督能够改善渲染效果和网格重建质量，这对于视频生成、动画、增强现实（AR）-虚拟现实（VR）以及游戏等下游应用至关重要。\n我们在 Mip-NeRF360、Tanks and Temples 和 Deep-Blending 等数据集上验证了该方法的有效性。相比其他网格提取渲染方法，我们的方法在不降低网格质量的前提下，在写实度指标上表现更优。\n"
  },
  {
    "path": "abs/2501.08672.md",
    "content": "### GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping\n\nIn recent years, 3D Gaussian splatting (3D-GS) has emerged as a novel scene representation approach. However, existing vision-only 3D-GS methods often rely on hand-crafted heuristics for point-cloud densification and face challenges in handling occlusions and high GPU memory and computation consumption. LiDAR-Inertial-Visual (LIV) sensor configuration has demonstrated superior performance in localization and dense mapping by leveraging complementary sensing characteristics: rich texture information from cameras, precise geometric measurements from LiDAR, and high-frequency motion data from IMU. Inspired by this, we propose a novel real-time Gaussian-based simultaneous localization and mapping (SLAM) system. Our map system comprises a global Gaussian map and a sliding window of Gaussians, along with an IESKF-based odometry. The global Gaussian map consists of hash-indexed voxels organized in a recursive octree, effectively covering sparse spatial volumes while adapting to different levels of detail and scales. The Gaussian map is initialized through multi-sensor fusion and optimized with photometric gradients. Our system incrementally maintains a sliding window of Gaussians, significantly reducing GPU computation and memory consumption by only optimizing the map within the sliding window. Moreover, we implement a tightly coupled multi-sensor fusion odometry with an iterative error state Kalman filter (IESKF), leveraging real-time updating and rendering of the Gaussian map. Our system represents the first real-time Gaussian-based SLAM framework deployable on resource-constrained embedded systems, demonstrated on the NVIDIA Jetson Orin NX platform. The framework achieves real-time performance while maintaining robust multi-sensor fusion capabilities.\n\n近年来，三维高斯散点（3D Gaussian Splatting, 3D-GS）作为一种新颖的场景表示方法迅速兴起。然而，现有基于视觉的 3D-GS 方法通常依赖人工设计的启发式规则来实现点云加密，并且在处理遮挡、高 GPU 内存和计算资源消耗方面面临挑战。LiDAR-惯性-视觉（LIV）传感器配置通过结合相机的丰富纹理信息、LiDAR 的精确几何测量以及 IMU 的高频运动数据，已在定位和密集建图方面展现出卓越性能。\n受此启发，我们提出了一种基于高斯的实时同步定位与建图（SLAM）系统。我们的地图系统由一个全局高斯地图和一个高斯滑动窗口组成，并结合了基于迭代误差状态卡尔曼滤波器（IESKF）的里程计模块。全局高斯地图通过递归八叉树结构将哈希索引的体素组织起来，有效覆盖稀疏空间体积，同时适应不同的细节层次和尺度。高斯地图通过多传感器融合初始化，并通过光度梯度进行优化。\n我们的系统增量维护一个高斯滑动窗口，通过仅优化滑动窗口内的地图，显著降低了 GPU 的计算和内存消耗。此外，我们实现了一种紧耦合的多传感器融合里程计，结合 IESKF，实现了高斯地图的实时更新与渲染。\n该系统是第一个可在资源受限嵌入式系统上部署的基于高斯的实时 SLAM 框架，并在 NVIDIA Jetson Orin NX 平台上得以验证。该框架在保持鲁棒多传感器融合能力的同时，实现了实时性能。\n"
  },
  {
    "path": "abs/2501.08982.md",
    "content": "### CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with Gaussian Representation\n\nLocalizing text descriptions in large-scale 3D scenes is inherently an ambiguous task. This nonetheless arises while describing general concepts, e.g. all traffic lights in a city. To facilitate reasoning based on such concepts, text localization in the form of distribution is required. In this paper, we generate the distribution of the camera poses conditioned upon the textual description.\nTo facilitate such generation, we propose a diffusion-based architecture that conditionally diffuses the noisy 6DoF camera poses to their plausible locations. The conditional signals are derived from the text descriptions, using the pre-trained text encoders. The connection between text descriptions and pose distribution is established through pretrained Vision-Language-Model, i.e. CLIP. Furthermore, we demonstrate that the candidate poses for the distribution can be further refined by rendering potential poses using 3D Gaussian splatting, guiding incorrectly posed samples towards locations that better align with the textual description, through visual reasoning. We demonstrate the effectiveness of our method by comparing it with both standard retrieval methods and learning-based approaches. Our proposed method consistently outperforms these baselines across all five large-scale datasets.\n\n在大规模三维场景中定位文本描述本质上是一个具有模糊性的任务，尤其是在描述诸如“城市中所有交通信号灯”这样的通用概念时。为便于基于这些概念进行推理，需要以分布形式表达文本定位。在本文中，我们生成了基于文本描述条件的相机位姿分布。\n为了实现这种生成，我们提出了一种基于扩散的架构，该架构在文本描述条件下将噪声化的 6 自由度（6DoF）相机位姿扩散到其可能的位置。条件信号通过预训练的文本编码器从文本描述中提取。文本描述与位姿分布之间的连接通过预训练的视觉-语言模型（Vision-Language-Model），即 CLIP 建立。\n此外，我们证明了可以通过使用三维高斯散点（3D Gaussian Splatting）渲染潜在位姿，对分布中的候选位姿进行进一步优化。通过视觉推理，引导不正确的样本向更符合文本描述的位置移动。\n我们通过与标准检索方法和基于学习的方法进行比较，验证了该方法的有效性。在五个大规模数据集上的实验表明，我们的方法在所有基准上均优于这些基线方法，表现出了一致的优势。\n"
  },
  {
    "path": "abs/2501.09302.md",
    "content": "### Creating Virtual Environments with 3D Gaussian Splatting: A Comparative Study\n\n3D Gaussian Splatting (3DGS) has recently emerged as an innovative and efficient 3D representation technique. While its potential for extended reality (XR) applications is frequently highlighted, its practical effectiveness remains underexplored. In this work, we examine three distinct 3DGS-based approaches for virtual environment (VE) creation, leveraging their unique strengths for efficient and visually compelling scene representation. By conducting a comparable study, we evaluate the feasibility of 3DGS in creating immersive VEs, identify its limitations in XR applications, and discuss future research and development opportunities.\n\n3D 高斯点渲染（3D Gaussian Splatting，简称 3DGS）近年来成为一种创新且高效的 3D 表示技术。尽管其在扩展现实（XR）应用中的潜力常被强调，但其实际效果尚未得到充分探索。在本工作中，我们研究了三种基于 3DGS 的虚拟环境（VE）创建方法，利用其各自的优势来实现高效且视觉吸引力的场景表示。通过对比研究，我们评估了 3DGS 在构建沉浸式虚拟环境中的可行性，明确了其在 XR 应用中的局限性，并探讨了未来研究与开发的机遇。\n"
  },
  {
    "path": "abs/2501.09978.md",
    "content": "### GaussianAvatar-Editor: Photorealistic Animatable Gaussian Head Avatar Editor\n\nWe introduce GaussianAvatar-Editor, an innovative framework for text-driven editing of animatable Gaussian head avatars that can be fully controlled in expression, pose, and viewpoint. Unlike static 3D Gaussian editing, editing animatable 4D Gaussian avatars presents challenges related to motion occlusion and spatial-temporal inconsistency. To address these issues, we propose the Weighted Alpha Blending Equation (WABE). This function enhances the blending weight of visible Gaussians while suppressing the influence on non-visible Gaussians, effectively handling motion occlusion during editing. Furthermore, to improve editing quality and ensure 4D consistency, we incorporate conditional adversarial learning into the editing process. This strategy helps to refine the edited results and maintain consistency throughout the animation. By integrating these methods, our GaussianAvatar-Editor achieves photorealistic and consistent results in animatable 4D Gaussian editing. We conduct comprehensive experiments across various subjects to validate the effectiveness of our proposed techniques, which demonstrates the superiority of our approach over existing methods.\n\n我们提出了 GaussianAvatar-Editor，这是一个用于文本驱动可动画高斯头部头像编辑的创新框架，可实现对表情、姿态和视角的全面控制。与静态 3D 高斯编辑不同，编辑可动画的 4D 高斯头像面临运动遮挡和时空一致性的问题。\n为了解决这些挑战，我们提出了加权阿尔法混合方程（Weighted Alpha Blending Equation, WABE）。该函数通过增强可见高斯的混合权重，同时抑制对不可见高斯的影响，能够有效处理编辑过程中出现的运动遮挡。此外，为提高编辑质量并确保 4D 一致性，我们将条件对抗学习融入编辑过程。此策略有助于优化编辑结果，并在动画的整个过程中保持一致性。\n通过整合这些方法，我们的 GaussianAvatar-Editor 实现了在可动画 4D 高斯编辑中的逼真效果和一致性。我们对多个主体进行了全面实验，以验证所提技术的有效性，结果表明我们的方法在现有技术中具有明显优势。\n"
  },
  {
    "path": "abs/2501.10283.md",
    "content": "### GSTAR: Gaussian Surface Tracking and Reconstruction\n\n3D Gaussian Splatting techniques have enabled efficient photo-realistic rendering of static scenes. Recent works have extended these approaches to support surface reconstruction and tracking. However, tracking dynamic surfaces with 3D Gaussians remains challenging due to complex topology changes, such as surfaces appearing, disappearing, or splitting. To address these challenges, we propose GSTAR, a novel method that achieves photo-realistic rendering, accurate surface reconstruction, and reliable 3D tracking for general dynamic scenes with changing topology. Given multi-view captures as input, GSTAR binds Gaussians to mesh faces to represent dynamic objects. For surfaces with consistent topology, GSTAR maintains the mesh topology and tracks the meshes using Gaussians. In regions where topology changes, GSTAR adaptively unbinds Gaussians from the mesh, enabling accurate registration and the generation of new surfaces based on these optimized Gaussians. Additionally, we introduce a surface-based scene flow method that provides robust initialization for tracking between frames. Experiments demonstrate that our method effectively tracks and reconstructs dynamic surfaces, enabling a range of applications.\n\n3D 高斯点渲染技术已经实现了高效的静态场景逼真渲染。近期的研究将这些方法扩展至支持表面重建和跟踪。然而，由于动态表面的复杂拓扑变化（例如表面出现、消失或分裂），使用 3D 高斯跟踪动态表面仍然具有挑战性。\n为了解决这些问题，我们提出了 GSTAR，一种新方法，可针对拓扑变化的通用动态场景实现逼真渲染、精确表面重建和可靠的 3D 跟踪。在多视图捕获作为输入的情况下，GSTAR 将高斯绑定到网格面，用于表示动态对象。对于拓扑一致的表面，GSTAR 保持网格拓扑不变，并使用高斯跟踪网格。而在拓扑发生变化的区域，GSTAR 自适应地将高斯从网格解绑，从而通过优化后的高斯实现准确的配准并生成新表面。\n此外，我们提出了一种基于表面的场景流方法，为帧间跟踪提供了稳健的初始化。实验表明，我们的方法能够有效地跟踪和重建动态表面，从而支持多种应用场景。\n"
  },
  {
    "path": "abs/2501.10462.md",
    "content": "### BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation\n\nWith the widespread use of virtual reality applications, 3D scene generation has become a new challenging research frontier. 3D scenes have highly complex structures and need to ensure that the output is dense, coherent, and contains all necessary structures. Many current 3D scene generation methods rely on pre-trained text-to-image diffusion models and monocular depth estimators. However, the generated scenes occupy large amounts of storage space and often lack effective regularisation methods, leading to geometric distortions. To this end, we propose BloomScene, a lightweight structured 3D Gaussian splatting for crossmodal scene generation, which creates diverse and high-quality 3D scenes from text or image inputs. Specifically, a crossmodal progressive scene generation framework is proposed to generate coherent scenes utilizing incremental point cloud reconstruction and 3D Gaussian splatting. Additionally, we propose a hierarchical depth prior-based regularization mechanism that utilizes multi-level constraints on depth accuracy and smoothness to enhance the realism and continuity of the generated scenes. Ultimately, we propose a structured context-guided compression mechanism that exploits structured hash grids to model the context of unorganized anchor attributes, which significantly eliminates structural redundancy and reduces storage overhead. Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework compared with several baselines.\n\n随着虚拟现实应用的广泛普及，3D 场景生成已成为一个具有挑战性的研究前沿。3D 场景具有高度复杂的结构，要求输出密集、连贯，并包含所有必要的结构。许多当前的 3D 场景生成方法依赖预训练的文本到图像扩散模型和单目深度估计器。然而，生成的场景通常占用大量存储空间，且缺乏有效的正则化方法，导致几何失真问题。\n为此，我们提出了 BloomScene，一种基于轻量化结构化 3D 高斯点渲染的跨模态场景生成方法，可从文本或图像输入生成多样化且高质量的 3D 场景。具体而言，我们设计了一种跨模态渐进式场景生成框架，通过增量点云重建和 3D 高斯点渲染生成连贯的场景。此外，我们提出了一种基于分层深度先验的正则化机制，通过多层次的深度精度和光滑性约束，提高生成场景的真实感和连贯性。\n最终，我们提出了一种结构化上下文引导的压缩机制，利用结构化哈希网格建模非组织锚点属性的上下文，有效消除结构冗余并减少存储开销。多场景的综合实验表明，与多个基线方法相比，我们的框架展现出了显著的潜力和优势。\n"
  },
  {
    "path": "abs/2501.10788.md",
    "content": "### Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting\n\nGaussian Splatting has emerged as a prominent 3D representation in novel view synthesis, but it still suffers from appearance variations, which are caused by various factors, such as modern camera ISPs, different time of day, weather conditions, and local light changes. These variations can lead to floaters and color distortions in the rendered images/videos. Recent appearance modeling approaches in Gaussian Splatting are either tightly coupled with the rendering process, hindering real-time rendering, or they only account for mild global variations, performing poorly in scenes with local light changes. In this paper, we propose DAVIGS, a method that decouples appearance variations in a plug-and-play and efficient manner. By transforming the rendering results at the image level instead of the Gaussian level, our approach can model appearance variations with minimal optimization time and memory overhead. Furthermore, our method gathers appearance-related information in 3D space to transform the rendered images, thus building 3D consistency across views implicitly. We validate our method on several appearance-variant scenes, and demonstrate that it achieves state-of-the-art rendering quality with minimal training time and memory usage, without compromising rendering speeds. Additionally, it provides performance improvements for different Gaussian Splatting baselines in a plug-and-play manner.\n\n高斯点渲染已成为新视角合成中的一种重要 3D 表示方法，但仍然面临外观变化问题。这些变化由多种因素引起，例如现代相机 ISP、不同的时间、天气条件和局部光照变化。这些因素可能导致渲染图像或视频中出现浮点伪影和颜色失真问题。目前高斯点渲染中的外观建模方法通常与渲染过程紧密耦合，限制了实时渲染的实现；或者仅能处理轻微的全局变化，在存在局部光照变化的场景中表现较差。\n为此，我们提出了 DAVIGS，一种解耦外观变化的高效、模块化方法。通过在图像级别（而非高斯级别）对渲染结果进行转换，我们的方法能够以最小的优化时间和内存开销建模外观变化。此外，我们的方法在 3D 空间中收集与外观相关的信息，用于对渲染图像进行转换，从而隐式地在视角之间建立 3D 一致性。\n我们在多个外观变化场景中验证了该方法，结果表明 DAVIGS 在实现最先进渲染质量的同时，显著减少了训练时间和内存使用，并且不影响渲染速度。此外，它还能以模块化的方式为不同的高斯点渲染基线方法带来性能改进。\n"
  },
  {
    "path": "abs/2501.11020.md",
    "content": "### Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car Reconstruction\n\n3D car modeling is crucial for applications in autonomous driving systems, virtual and augmented reality, and gaming. However, due to the distinctive properties of cars, such as highly reflective and transparent surface materials, existing methods often struggle to achieve accurate 3D car reconstruction. To address these limitations, we propose Car-GS, a novel approach designed to mitigate the effects of specular highlights and the coupling of RGB and geometry in 3D geometric and shading reconstruction (3DGS). Our method incorporates three key innovations: First, we introduce view-dependent Gaussian primitives to effectively model surface reflections. Second, we identify the limitations of using a shared opacity parameter for both image rendering and geometric attributes when modeling transparent objects. To overcome this, we assign a learnable geometry-specific opacity to each 2D Gaussian primitive, dedicated solely to rendering depth and normals. Third, we observe that reconstruction errors are most prominent when the camera view is nearly orthogonal to glass surfaces. To address this issue, we develop a qualityaware supervision module that adaptively leverages normal priors from a pre-trained large-scale normal model. Experimental results demonstrate that Car-GS achieves precise reconstruction of car surfaces and significantly outperforms prior methods.\n\n3D 汽车建模对于自动驾驶系统、虚拟与增强现实以及游戏等应用至关重要。然而，由于汽车的独特特性，例如高度反射和透明的表面材料，现有方法往往难以实现准确的 3D 汽车重建。为了解决这些局限性，我们提出了 Car-GS，这是一种旨在缓解 3D 几何和光照重建 (3DGS) 中镜面高光以及 RGB 与几何耦合影响的新方法。我们的方法包含以下三个关键创新：\n首先，我们引入视角依赖的高斯基元，有效地建模了表面反射。\n其次，我们识别到在建模透明物体时，共享的不透明参数同时用于图像渲染和几何属性会导致局限性。为此，我们为每个 2D 高斯基元分配了一个专用于渲染深度和法线的可学习几何特定透明度。\n第三，我们观察到，当相机视角几乎垂直于玻璃表面时，重建误差最为显著。为解决这一问题，我们开发了一种基于质量感知的监督模块，该模块自适应地利用来自预训练的大规模法线模型的法线先验。\n实验结果表明，Car-GS 在汽车表面重建方面表现出色，显著优于现有方法。\n"
  },
  {
    "path": "abs/2501.11102.md",
    "content": "### RDG-GS: Relative Depth Guidance with Gaussian Splatting for Real-time Sparse-View 3D Rendering\n\nEfficiently synthesizing novel views from sparse inputs while maintaining accuracy remains a critical challenge in 3D reconstruction. While advanced techniques like radiance fields and 3D Gaussian Splatting achieve rendering quality and impressive efficiency with dense view inputs, they suffer from significant geometric reconstruction errors when applied to sparse input views. Moreover, although recent methods leverage monocular depth estimation to enhance geometric learning, their dependence on single-view estimated depth often leads to view inconsistency issues across different viewpoints. Consequently, this reliance on absolute depth can introduce inaccuracies in geometric information, ultimately compromising the quality of scene reconstruction with Gaussian splats. In this paper, we present RDG-GS, a novel sparse-view 3D rendering framework with Relative Depth Guidance based on 3D Gaussian Splatting. The core innovation lies in utilizing relative depth guidance to refine the Gaussian field, steering it towards view-consistent spatial geometric representations, thereby enabling the reconstruction of accurate geometric structures and capturing intricate textures. First, we devise refined depth priors to rectify the coarse estimated depth and insert global and fine-grained scene information to regular Gaussians. Building on this, to address spatial geometric inaccuracies from absolute depth, we propose relative depth guidance by optimizing the similarity between spatially correlated patches of depth and images. Additionally, we also directly deal with the sparse areas challenging to converge by the adaptive sampling for quick densification. Across extensive experiments on Mip-NeRF360, LLFF, DTU, and Blender, RDG-GS demonstrates state-of-the-art rendering quality and efficiency, making a significant advancement for real-world application.\n\n高效地从稀疏输入中合成新视角，同时保持准确性，仍然是 3D 重建中的关键挑战。尽管辐射场和 3D 高斯点渲染等先进技术在密集视角输入下能够实现高质量渲染和出色效率，但在应用于稀疏输入视角时，它们会出现显著的几何重建误差。此外，尽管近期方法利用单目深度估计来增强几何学习，但对单视角估计深度的依赖常导致不同视点间的视图不一致性。这种对绝对深度的依赖可能引入几何信息的不准确性，从而削弱高斯点场景重建的质量。\n在本文中，我们提出了 RDG-GS，一种基于相对深度引导和 3D 高斯点渲染的稀疏视角 3D 渲染框架。其核心创新在于利用相对深度引导优化高斯场，推动其生成视图一致的空间几何表示，从而实现精准的几何结构重建并捕捉复杂的纹理。\n首先，我们设计了精细化的深度先验，用以修正粗略估计的深度，并将全局和细粒度的场景信息融入常规高斯点中。在此基础上，为解决绝对深度引发的空间几何误差，我们提出了相对深度引导，通过优化深度与图像中空间相关的区域块之间的相似性，提升几何一致性。此外，对于稀疏区域的收敛问题，我们通过自适应采样快速实现点的密集化。\n在 Mip-NeRF360、LLFF、DTU 和 Blender 数据集上的大量实验表明，RDG-GS 实现了最先进的渲染质量和效率，为真实场景应用带来了重要进展。\n"
  },
  {
    "path": "abs/2501.11508.md",
    "content": "### See In Detail: Enhancing Sparse-view 3D Gaussian Splatting with Local Depth and Semantic Regularization\n\n3D Gaussian Splatting (3DGS) has shown remarkable performance in novel view synthesis. However, its rendering quality deteriorates with sparse inphut views, leading to distorted content and reduced details. This limitation hinders its practical application. To address this issue, we propose a sparse-view 3DGS method. Given the inherently ill-posed nature of sparse-view rendering, incorporating prior information is crucial. We propose a semantic regularization technique, using features extracted from the pretrained DINO-ViT model, to ensure multi-view semantic consistency. Additionally, we propose local depth regularization, which constrains depth values to improve generalization on unseen views. Our method outperforms state-of-the-art novel view synthesis approaches, achieving up to 0.4dB improvement in terms of PSNR on the LLFF dataset, with reduced distortion and enhanced visual quality.\n\n3D 高斯点渲染 (3DGS) 在新视角合成中表现出色，但在稀疏输入视角情况下，其渲染质量显著下降，导致内容失真和细节丢失。这一局限性阻碍了其实际应用。为了解决此问题，我们提出了一种稀疏视角的 3DGS 方法。由于稀疏视角渲染本质上是不适定问题，引入先验信息至关重要。\n我们提出了一种语义正则化技术，利用预训练的 DINO-ViT 模型提取的特征，确保多视角的语义一致性。此外，我们引入了局部深度正则化，通过约束深度值来提高对未见视角的泛化能力。\n实验表明，我们的方法优于当前最先进的新视角合成方法，在 LLFF 数据集上的 PSNR 提升高达 0.4dB，同时显著降低失真并增强视觉质量。\n"
  },
  {
    "path": "abs/2501.12060.md",
    "content": "### GaussianVideo: Efficient Video Representation Through 2D Gaussian Splatting\n\n3D Gaussian splats have emerged as a revolutionary, effective, learned representation for static 3D scenes. In this work, we explore using 2D Gaussian splats as a new primitive for representing videos. We propose GaussianVideo, an approach to learning a set of 2D Gaussian splats that can effectively represent video frames. GaussianVideo incorporates the following techniques: (i) To exploit temporal redundancy among adjacent frames, which can speed up training and improve the compression efficiency, we predict the Gaussian splats of a frame based on its previous frame; (ii) To control the trade-offs between file size and quality, we remove Gaussian splats with low contribution to the video quality; (iii) To capture dynamics in videos, we randomly add Gaussian splats to fit content with large motion or newly-appeared objects; (iv) To handle significant changes in the scene, we detect key frames based on loss differences during the learning process. Experiment results show that GaussianVideo achieves good rate-distortion trade-offs, comparable to state-of-the-art video codecs such as AV1 and VVC, and a rendering speed of 1500 fps for a 1920x1080 video.\n\n3D 高斯点已成为静态 3D 场景的一种革命性且高效的学习表示方法。在本工作中，我们探索使用 2D 高斯点作为表示视频的新基本单元，并提出了一种名为 GaussianVideo 的方法，用于学习一组能够有效表示视频帧的 2D 高斯点。GaussianVideo 包括以下技术：（i）通过基于前一帧预测当前帧的高斯点来利用相邻帧之间的时间冗余，这可以加速训练并提高压缩效率；（ii）通过移除对视频质量贡献较低的高斯点来控制文件大小与质量的权衡；（iii）通过随机添加高斯点来适应具有大运动量或新出现物体的内容，以捕捉视频中的动态变化；（iv）在学习过程中根据损失差异检测关键帧，以处理场景的显著变化。实验结果表明，GaussianVideo 在率失真（rate-distortion）权衡方面达到了与最先进的视频编解码器（如 AV1 和 VVC）相当的水平，并且在 1920x1080 视频上的渲染速度达到了 1500 帧每秒（fps）。\n"
  },
  {
    "path": "abs/2501.12255.md",
    "content": "### HAC++: Towards 100X Compression of 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To achieve a compact size, we propose HAC++, which leverages the relationships between unorganized anchors and a structured hash grid, utilizing their mutual information for context modeling. Additionally, HAC++ captures intra-anchor contextual relationships to further enhance compression performance. To facilitate entropy coding, we utilize Gaussian distributions to precisely estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Moreover, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Overall, HAC++ achieves a remarkable size reduction of over 100X compared to vanilla 3DGS when averaged on all datasets, while simultaneously improving fidelity. It also delivers more than 20X size reduction compared to Scaffold-GS.\n\n3D Gaussian Splatting (3DGS) 已成为一种极具前景的新视角合成框架，兼具快速渲染速度和高保真度。然而，大量高斯点及其相关属性对高效压缩技术提出了严峻挑战。此外，高斯点云（或本文中的锚点）稀疏且无序的特性进一步增加了压缩的难度。为实现紧凑的存储尺寸，我们提出了 HAC++，该方法利用无序锚点与结构化哈希网格之间的关系，通过其相互信息进行上下文建模。此外，HAC++ 还捕捉锚点内部的上下文关系，从而进一步提升压缩性能。为支持熵编码，我们利用高斯分布精确估计每个量化属性的概率，并设计了自适应量化模块，以实现这些属性的高精度量化，从而提高保真度的恢复效果。与此同时，我们引入了一种自适应掩蔽策略，用于剔除无效的高斯点和锚点。总体而言，HAC++ 在所有数据集上的平均压缩率较原始 3DGS 实现了超过 100 倍的尺寸缩减，同时提升了保真度。相比 Scaffold-GS，HAC++ 还实现了超过 20 倍的尺寸压缩。\n"
  },
  {
    "path": "abs/2501.12369.md",
    "content": "### DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions\n\nSplatting-based 3D reconstruction methods have gained popularity with the advent of 3D Gaussian Splatting, efficiently synthesizing high-quality novel views. These methods commonly resort to using exponential family functions, such as the Gaussian function, as reconstruction kernels due to their anisotropic nature, ease of projection, and differentiability in rasterization. However, the field remains restricted to variations within the exponential family, leaving generalized reconstruction kernels largely underexplored, partly due to the lack of easy integrability in 3D to 2D projections. In this light, we show that a class of decaying anisotropic radial basis functions (DARBFs), which are non-negative functions of the Mahalanobis distance, supports splatting by approximating the Gaussian function's closed-form integration advantage. With this fresh perspective, we demonstrate up to 34% faster convergence during training and a 15% reduction in memory consumption across various DARB reconstruction kernels, while maintaining comparable PSNR, SSIM, and LPIPS results.\n\n基于 Splatting 的 3D 重建方法随着 3D Gaussian Splatting 的出现而广受欢迎，能够高效地合成高质量的新视角。这些方法通常使用指数族函数（如高斯函数）作为重建核，因其各向异性、易于投影以及在光栅化中的可微性。然而，该领域的研究主要局限于指数族函数的变体，对通用重建核的探索相对较少，部分原因是缺乏简单的 3D 到 2D 投影积分方法。在此背景下，我们表明，一类衰减的各向异性径向基函数（DARBFs），作为 Mahalanobis 距离的非负函数，通过近似高斯函数的闭式积分优势，能够支持 Splatting 操作。通过这种全新的视角，我们展示了在不同的 DARBF 重建核中，训练收敛速度提高了多达 34%，内存消耗减少了 15%，同时在 PSNR、SSIM 和 LPIPS 结果上保持了可比的表现。\n"
  },
  {
    "path": "abs/2501.13045.md",
    "content": "### Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes\n\n3D Gaussian Splatting (3DGS) has emerged as a promising representation for photorealistic rendering of 3D scenes. However, its high storage requirements pose significant challenges for practical applications. We observe that Gaussians exhibit distinct roles and characteristics that are analogous to traditional artistic techniques -- Like how artists first sketch outlines before filling in broader areas with color, some Gaussians capture high-frequency features like edges and contours; While other Gaussians represent broader, smoother regions, that are analogous to broader brush strokes that add volume and depth to a painting. Based on this observation, we propose a novel hybrid representation that categorizes Gaussians into (i) Sketch Gaussians, which define scene boundaries, and (ii) Patch Gaussians, which cover smooth regions. Sketch Gaussians are efficiently encoded using parametric models, leveraging their geometric coherence, while Patch Gaussians undergo optimized pruning, retraining, and vector quantization to maintain volumetric consistency and storage efficiency. Our comprehensive evaluation across diverse indoor and outdoor scenes demonstrates that this structure-aware approach achieves up to 32.62% improvement in PSNR, 19.12% in SSIM, and 45.41% in LPIPS at equivalent model sizes, and correspondingly, for an indoor scene, our model maintains the visual quality with 2.3% of the original model size.\n\n3D高斯点云（3DGS）作为一种用于3D场景写实渲染的有前景的表示方法，近年来受到关注。然而，它的高存储需求在实际应用中带来了显著挑战。我们观察到，高斯点云表现出不同的角色和特征，这些特征与传统艺术技法类似——就像艺术家在填充较大区域的颜色之前首先勾画出轮廓一样，部分高斯点云捕捉了高频特征，如边缘和轮廓；而其他高斯点云则代表了更广泛、更平滑的区域，类似于更大幅的笔触，给画作添加体积感和深度。基于这一观察，我们提出了一种新颖的混合表示方法，将高斯点云分为：（i）草图高斯点云，用于定义场景边界；（ii）贴片高斯点云，用于覆盖平滑区域。草图高斯点云通过利用其几何一致性，使用参数化模型进行高效编码；而贴片高斯点云则通过优化剪枝、再训练和向量量化等技术，以保持体积一致性和存储效率。我们在多种室内和室外场景中的综合评估表明，这种结构感知的方法在相同模型大小下，实现了高达32.62%的PSNR提升，19.12%的SSIM提升和45.41%的LPIPS提升；对于室内场景，我们的模型在原始模型大小的2.3%下，保持了视觉质量。\n"
  },
  {
    "path": "abs/2501.13335.md",
    "content": "### Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos\n\nWe introduce Deblur-Avatar, a novel framework for modeling high-fidelity, animatable 3D human avatars from motion-blurred monocular video inputs. Motion blur is prevalent in real-world dynamic video capture, especially due to human movements in 3D human avatar modeling. Existing methods either (1) assume sharp image inputs, failing to address the detail loss introduced by motion blur, or (2) mainly consider blur by camera movements, neglecting the human motion blur which is more common in animatable avatars. Our proposed approach integrates a human movement-based motion blur model into 3D Gaussian Splatting (3DGS). By explicitly modeling human motion trajectories during exposure time, we jointly optimize the trajectories and 3D Gaussians to reconstruct sharp, high-quality human avatars. We employ a pose-dependent fusion mechanism to distinguish moving body regions, optimizing both blurred and sharp areas effectively. Extensive experiments on synthetic and real-world datasets demonstrate that Deblur-Avatar significantly outperforms existing methods in rendering quality and quantitative metrics, producing sharp avatar reconstructions and enabling real-time rendering under challenging motion blur conditions.\n\n我们介绍了Deblur-Avatar，一个新颖的框架，用于从运动模糊的单目视频输入中建模高保真、可动画化的3D人类虚拟形象。运动模糊在现实世界的动态视频捕捉中很常见，尤其是在3D人类虚拟形象建模中，由于人体运动的原因。现有方法要么（1）假设图像输入是清晰的，未能解决运动模糊带来的细节丢失，要么（2）主要考虑相机运动产生的模糊，忽略了在人类虚拟形象的动画中更常见的人的运动模糊。我们提出的方法将基于人体运动的运动模糊模型集成到3D高斯溅射（3DGS）中。通过在曝光时间内显式建模人体运动轨迹，我们共同优化这些轨迹和3D高斯，从而重建清晰、高质量的人物虚拟形象。我们采用了一个依赖于姿势的融合机制来区分运动中的身体区域，有效地优化模糊区域和清晰区域。大量合成数据和真实数据集上的实验表明，Deblur-Avatar在渲染质量和定量指标方面显著超越了现有方法，生成清晰的虚拟形象重建，并在具有挑战性的运动模糊条件下实现实时渲染。\n"
  },
  {
    "path": "abs/2501.13402.md",
    "content": "### VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM\n\nRecently, map representations based on radiance fields such as 3D Gaussian Splatting and NeRF, which excellent for realistic depiction, have attracted considerable attention, leading to attempts to combine them with SLAM. While these approaches can build highly realistic maps, large-scale SLAM still remains a challenge because they require a large number of Gaussian images for mapping and adjacent images as keyframes for tracking. We propose a novel 3D Gaussian Splatting SLAM method, VIGS SLAM, that utilizes sensor fusion of RGB-D and IMU sensors for large-scale indoor environments. To reduce the computational load of 3DGS-based tracking, we adopt an ICP-based tracking framework that combines IMU preintegration to provide a good initial guess for accurate pose estimation. Our proposed method is the first to propose that Gaussian Splatting-based SLAM can be effectively performed in large-scale environments by integrating IMU sensor measurements. This proposal not only enhances the performance of Gaussian Splatting SLAM beyond room-scale scenarios but also achieves SLAM performance comparable to state-of-the-art methods in large-scale indoor environments.\n\n最近，基于辐射场的地图表示方法，如3D高斯点云（3D Gaussian Splatting）和NeRF，以其出色的写实表现吸引了大量关注，并引发了将它们与SLAM（同步定位与地图构建）结合的尝试。虽然这些方法能够构建高写实度的地图，但由于它们需要大量的高斯图像进行建图，并且需要相邻图像作为关键帧进行跟踪，因此大规模SLAM仍然面临挑战。我们提出了一种新颖的3D高斯点云SLAM方法——VIGS SLAM，利用RGB-D和IMU传感器的传感器融合，应用于大规模室内环境。为了减少基于3DGS的跟踪计算负担，我们采用了一种基于ICP的跟踪框架，并结合IMU预积分，为准确的姿态估计提供一个良好的初始猜测。我们提出的方法首次提出通过集成IMU传感器测量，基于高斯点云的SLAM可以在大规模环境中有效执行。该提案不仅增强了基于高斯点云的SLAM在超越房间规模场景中的性能，而且在大规模室内环境中实现了与最先进方法相当的SLAM性能。\n"
  },
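A minimal sketch of the initialization idea named in the VIGS SLAM abstract: integrating gyroscope and accelerometer samples between frames into a relative pose that seeds ICP tracking. This is textbook first-order IMU integration under simplifying assumptions (gravity, bias, and covariance handling omitted); the function names are illustrative, not the paper's code.

```python
import numpy as np

def skew(w):
    """Cross-product matrix of a 3-vector."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def preintegrate_imu(gyro, accel, dt):
    """Integrate IMU samples between two frames into a relative pose.

    gyro, accel: (K, 3) angular-rate and linear-acceleration samples.
    Returns (dR, dp): rotation and translation increments expressed in
    the frame of the previous pose. First-order integration only;
    proper preintegration also tracks velocity, biases, gravity, and
    uncertainty.
    """
    dR = np.eye(3)
    v = np.zeros(3)
    dp = np.zeros(3)
    for w, a in zip(gyro, accel):
        dp += v * dt + 0.5 * (dR @ a) * dt ** 2
        v += (dR @ a) * dt
        dR = dR @ (np.eye(3) + skew(w * dt))   # small-angle update
    return dR, dp

# The predicted pose T_prev composed with (dR, dp) would then seed ICP
# tracking, replacing a constant-velocity or identity initialization.
```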
  {
    "path": "abs/2501.13417.md",
    "content": "### GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization\n\nMapping and localization are crucial problems in robotics and autonomous driving. Recent advances in 3D Gaussian Splatting (3DGS) have enabled precise 3D mapping and scene understanding by rendering photo-realistic images. However, existing 3DGS methods often struggle to accurately reconstruct a 3D map that reflects the actual scale and geometry of the real world, which degrades localization performance. To address these limitations, we propose a novel 3DGS method called Geometry-Aware Gaussian Splatting (GeomGS). This method fully integrates LiDAR data into 3D Gaussian primitives via a probabilistic approach, as opposed to approaches that only use LiDAR as initial points or introduce simple constraints for Gaussian points. To this end, we introduce a Geometric Confidence Score (GCS), which identifies the structural reliability of each Gaussian point. The GCS is optimized simultaneously with Gaussians under probabilistic distance constraints to construct a precise structure. Furthermore, we propose a novel localization method that fully utilizes both the geometric and photometric properties of GeomGS. Our GeomGS demonstrates state-of-the-art geometric and localization performance across several benchmarks, while also improving photometric performance.\n\n地图构建和定位是机器人技术和自动驾驶中的关键问题。最近，3D高斯点云（3DGS）的进展使得通过渲染照片级真实感图像，能够实现精确的3D地图构建和场景理解。然而，现有的3DGS方法通常难以准确重建反映真实世界实际尺度和几何的3D地图，从而影响定位性能。为了解决这些问题，我们提出了一种新颖的3DGS方法——几何感知高斯点云（Geometry-Aware Gaussian Splatting，GeomGS）。该方法通过概率方法将LiDAR数据完全整合到3D高斯原语中，而不是仅将LiDAR作为初始点或为高斯点引入简单约束。为此，我们引入了几何置信度评分（Geometric Confidence Score，GCS），用于识别每个高斯点的结构可靠性。GCS与高斯点一起，在概率距离约束下进行优化，从而构建出精确的结构。此外，我们还提出了一种新型的定位方法，充分利用GeomGS的几何和光度特性。我们的GeomGS方法在多个基准测试中展示了最先进的几何和定位性能，同时也提高了光度性能。\n"
  },
  {
    "path": "abs/2501.13449.md",
    "content": "### MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance\n\nWhile single-concept customization has been studied in 3D, multi-concept customization remains largely unexplored. To address this, we propose MultiDreamer3D that can generate coherent multi-concept 3D content in a divide-and-conquer manner. First, we generate 3D bounding boxes using an LLM-based layout controller. Next, a selective point cloud generator creates coarse point clouds for each concept. These point clouds are placed in the 3D bounding boxes and initialized into 3D Gaussian Splatting with concept labels, enabling precise identification of concept attributions in 2D projections. Finally, we refine 3D Gaussians via concept-aware interval score matching, guided by concept-aware diffusion. Our experimental results show that MultiDreamer3D not only ensures object presence and preserves the distinct identities of each concept but also successfully handles complex cases such as property change or interaction. To the best of our knowledge, we are the first to address the multi-concept customization in 3D.\n\n尽管单一概念的定制在3D领域已被研究，但多概念定制仍然 largely 未被探索。为了解决这一问题，我们提出了MultiDreamer3D，它可以以分而治之的方式生成一致的多概念3D内容。首先，我们使用基于大型语言模型（LLM）的布局控制器生成3D边界框。接着，选择性点云生成器为每个概念创建粗略的点云。这些点云被放置在3D边界框内，并初始化为带有概念标签的3D高斯溅射，从而在2D投影中实现精确的概念归属识别。最后，我们通过概念感知的区间评分匹配和概念感知扩散来精炼3D高斯。实验结果表明，MultiDreamer3D不仅确保了物体的存在并保持每个概念的独特身份，而且成功处理了如属性变化或交互等复杂情况。根据我们所知，我们是首个解决3D中多概念定制问题的方法。\n"
  },
  {
    "path": "abs/2501.13558.md",
    "content": "### GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression\n\n3D Gaussian Splatting enhances real-time performance in novel view synthesis by representing scenes with mixtures of Gaussians and utilizing differentiable rasterization. However, it typically requires large storage capacity and high VRAM, demanding the design of effective pruning and compression techniques. Existing methods, while effective in some scenarios, struggle with scalability and fail to adapt models based on critical factors such as computing capabilities or bandwidth, requiring to re-train the model under different configurations. In this work, we propose a novel, model-agnostic technique that organizes Gaussians into several hierarchical layers, enabling progressive Level of Detail (LoD) strategy. This method, combined with recent approach of compression of 3DGS, allows a single model to instantly scale across several compression ratios, with minimal to none impact to quality compared to a single non-scalable model and without requiring re-training. We validate our approach on typical datasets and benchmarks, showcasing low distortion and substantial gains in terms of scalability and adaptability.\n\n3D高斯点云（3DGS）通过使用高斯混合表示场景并利用可微光栅化技术，在新视角合成中提升了实时性能。然而，它通常需要较大的存储容量和高显存，要求设计有效的剪枝和压缩技术。现有的方法虽然在某些场景下有效，但在可扩展性方面存在困难，且无法根据计算能力或带宽等关键因素调整模型，通常需要在不同配置下重新训练模型。在本工作中，我们提出了一种新颖的、与模型无关的技术，将高斯点云组织为多个层级，从而实现渐进的细节层次（Level of Detail, LoD）策略。该方法结合了最近的3DGS压缩方法，使得单个模型能够在多个压缩比下即时扩展，且与单个不可扩展模型相比，几乎不影响质量，且无需重新训练。我们在典型数据集和基准测试中验证了该方法，展示了低失真和显著的可扩展性与适应性提升。\n"
  },
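To illustrate how a single GoDe-style model can serve several compression ratios, here is a minimal sketch, under an assumed data layout, of selecting a prefix of hierarchical layers to meet a Gaussian budget. How the layers themselves are built is the paper's contribution and is not reproduced here.

```python
import numpy as np

def select_lod(layers, budget):
    """Pick the largest prefix of hierarchical layers within a budget.

    layers: list of index arrays, each holding the Gaussians introduced
    at one level of detail (coarsest first). Rendering with levels 0..k
    yields a valid, progressively refined model, so one trained model
    can serve many compression ratios without re-training.
    """
    chosen, used = [], 0
    for layer in layers:
        if used + len(layer) > budget:
            break
        chosen.append(layer)
        used += len(layer)
    return np.concatenate(chosen) if chosen else np.array([], dtype=int)

layers = [np.arange(0, 1000), np.arange(1000, 5000), np.arange(5000, 20000)]
active = select_lod(layers, budget=6000)   # keeps levels 0 and 1 only
print(len(active), "Gaussians rendered")   # -> 5000
```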
  {
    "path": "abs/2501.13971.md",
    "content": "### GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting\n\nLiDAR novel view synthesis (NVS) has emerged as a novel task within LiDAR simulation, offering valuable simulated point cloud data from novel viewpoints to aid in autonomous driving systems. However, existing LiDAR NVS methods typically rely on neural radiance fields (NeRF) as their 3D representation, which incurs significant computational costs in both training and rendering. Moreover, NeRF and its variants are designed for symmetrical scenes, making them ill-suited for driving scenarios. To address these challenges, we propose GS-LiDAR, a novel framework for generating realistic LiDAR point clouds with panoramic Gaussian splatting. Our approach employs 2D Gaussian primitives with periodic vibration properties, allowing for precise geometric reconstruction of both static and dynamic elements in driving scenarios. We further introduce a novel panoramic rendering technique with explicit ray-splat intersection, guided by panoramic LiDAR supervision. By incorporating intensity and ray-drop spherical harmonic (SH) coefficients into the Gaussian primitives, we enhance the realism of the rendered point clouds. Extensive experiments on KITTI-360 and nuScenes demonstrate the superiority of our method in terms of quantitative metrics, visual quality, as well as training and rendering efficiency.\n\nLiDAR新视角合成（NVS）作为LiDAR仿真中的一个新兴任务，提供了来自新视角的宝贵模拟点云数据，有助于自动驾驶系统的开发。然而，现有的LiDAR NVS方法通常依赖于神经辐射场（NeRF）作为其3D表示，这在训练和渲染过程中都需要消耗大量计算资源。此外，NeRF及其变体通常是为对称场景设计的，因此不太适合用于驾驶场景。为了解决这些问题，我们提出了GS-LiDAR，一种基于全景高斯点云溅射生成逼真LiDAR点云的新框架。我们的方法采用具有周期性振动特性的2D高斯原语，能够精确地重建驾驶场景中的静态和动态元素的几何形状。我们进一步提出了一种新的全景渲染技术，通过显式的光线-溅射交点，并结合全景LiDAR监督来引导渲染过程。通过将强度和光线丢失的球谐（SH）系数融入高斯原语中，我们增强了渲染点云的真实感。通过在KITTI-360和nuScenes上的大量实验，验证了我们方法在定量指标、视觉质量以及训练和渲染效率方面的优越性。\n"
  },
  {
    "path": "abs/2501.13975.md",
    "content": "### 3DGS2: Near Second-order Converging 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a mainstream solution for novel view synthesis and 3D reconstruction. By explicitly encoding a 3D scene using a collection of Gaussian kernels, 3DGS achieves high-quality rendering with superior efficiency. As a learning-based approach, 3DGS training has been dealt with the standard stochastic gradient descent (SGD) method, which offers at most linear convergence. Consequently, training often requires tens of minutes, even with GPU acceleration. This paper introduces a (near) second-order convergent training algorithm for 3DGS, leveraging its unique properties. Our approach is inspired by two key observations. First, the attributes of a Gaussian kernel contribute independently to the image-space loss, which endorses isolated and local optimization algorithms. We exploit this by splitting the optimization at the level of individual kernel attributes, analytically constructing small-size Newton systems for each parameter group, and efficiently solving these systems on GPU threads. This achieves Newton-like convergence per training image without relying on the global Hessian. Second, kernels exhibit sparse and structured coupling across input images. This property allows us to effectively utilize spatial information to mitigate overshoot during stochastic training. Our method converges an order faster than standard GPU-based 3DGS training, requiring over 10× fewer iterations while maintaining or surpassing the quality of the compared with the SGD-based 3DGS reconstructions.\n\n3D高斯点云（3DGS）已成为新视角合成和3D重建的主流解决方案。通过显式地使用一组高斯核编码3D场景，3DGS实现了高质量的渲染并具有卓越的效率。作为一种基于学习的方法，3DGS的训练通常使用标准的随机梯度下降（SGD）方法，这种方法最多只能实现线性收敛。因此，即使在GPU加速的情况下，训练过程通常也需要数十分钟。本文提出了一种（接近）二阶收敛的训练算法，针对3DGS的独特性质进行优化。我们的方法受到两个关键观察的启发。首先，高斯核的属性对图像空间损失的贡献是独立的，这支持局部优化算法。我们通过在单个核属性级别拆分优化，分析性地构建每个参数组的小型牛顿系统，并在GPU线程上高效求解这些系统，从而实现每个训练图像的类似牛顿收敛，而不依赖于全局Hessian矩阵。其次，高斯核在输入图像之间表现出稀疏且结构化的耦合特性。这一特性使我们能够有效利用空间信息，以减轻在随机训练过程中的过冲问题。与标准的GPU基础3DGS训练方法相比，我们的方法收敛速度快了一倍以上，迭代次数减少了超过10倍，同时在质量上与基于SGD的3DGS重建相当或更优。\n"
  },
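A sketch of the per-attribute Newton idea described in the 3DGS2 abstract: because each attribute group contributes (near-)independently to the image-space loss, one small damped Newton system can be solved per Gaussian and per group, in parallel, instead of forming a global Hessian. The Hessian blocks below are random stand-ins; the paper constructs them analytically.

```python
import numpy as np

def newton_step_per_group(grads, hessians, damping=1e-3):
    """Solve one small Newton system per Gaussian attribute group.

    grads:    (N, d) gradient of the image loss w.r.t. each group
              (e.g. d=3 for position, d=4 for a rotation quaternion).
    hessians: (N, d, d) local Hessian blocks, one per Gaussian.
    Each d x d system is independent, so on a GPU one thread could
    solve one system; here numpy batches the solves.
    """
    n, d = grads.shape
    h = hessians + damping * np.eye(d)[None]       # Levenberg damping
    return -np.linalg.solve(h, grads[..., None])[..., 0]

rng = np.random.default_rng(0)
g = rng.normal(size=(10000, 3))
a = rng.normal(size=(10000, 3, 3))
h = a @ a.transpose(0, 2, 1) + np.eye(3)           # SPD local Hessians
steps = newton_step_per_group(g, h)
print(steps.shape)   # (10000, 3): one Newton update per Gaussian
```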
  {
    "path": "abs/2501.14147.md",
    "content": "### HAMMER: Heterogeneous, Multi-Robot Semantic Gaussian Splatting\n\n3D Gaussian Splatting offers expressive scene reconstruction, modeling a broad range of visual, geometric, and semantic information. However, efficient real-time map reconstruction with data streamed from multiple robots and devices remains a challenge. To that end, we propose HAMMER, a server-based collaborative Gaussian Splatting method that leverages widely available ROS communication infrastructure to generate 3D, metric-semantic maps from asynchronous robot data-streams with no prior knowledge of initial robot positions and varying on-device pose estimators. HAMMER consists of (i) a frame alignment module that transforms local SLAM poses and image data into a global frame and requires no prior relative pose knowledge, and (ii) an online module for training semantic 3DGS maps from streaming data. HAMMER handles mixed perception modes, adjusts automatically for variations in image pre-processing among different devices, and distills CLIP semantic codes into the 3D scene for open-vocabulary language queries. In our real-world experiments, HAMMER creates higher-fidelity maps (2x) compared to competing baselines and is useful for downstream tasks, such as semantic goal-conditioned navigation (e.g., \"go to the couch\").\n\n3D高斯点云（3DGS）提供了表达丰富的场景重建，能够建模广泛的视觉、几何和语义信息。然而，利用来自多个机器人和设备的数据流进行高效的实时地图重建仍然是一个挑战。为此，我们提出了HAMMER，一种基于服务器的协作高斯点云方法，利用广泛可用的ROS通信基础设施，从异步机器人数据流中生成3D度量语义地图，而无需先验的机器人初始位置知识以及设备上变化的姿态估计器。HAMMER由以下两个模块组成：（i）一个帧对齐模块，将局部SLAM姿态和图像数据转换为全局框架，且不需要先验的相对姿态知识；（ii）一个在线模块，用于从流式数据中训练语义3D高斯点云地图。HAMMER能够处理混合感知模式，自动调整不同设备之间图像预处理的差异，并将CLIP语义编码注入到3D场景中，以便进行开放词汇的语言查询。在我们的现实世界实验中，HAMMER相比于竞争基线，生成了更高保真度的地图（提高2倍），并且对于下游任务（如语义目标条件导航，举例来说：“去沙发”）非常有用。\n"
  },
  {
    "path": "abs/2501.14231.md",
    "content": "### Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images\n\n3D reconstruction from unconstrained image collections presents substantial challenges due to varying appearances and transient occlusions. In this paper, we introduce Micro-macro Wavelet-based Gaussian Splatting (MW-GS), a novel approach designed to enhance 3D reconstruction by disentangling scene representations into global, refined, and intrinsic components. The proposed method features two key innovations: Micro-macro Projection, which allows Gaussian points to capture details from feature maps across multiple scales with enhanced diversity; and Wavelet-based Sampling, which leverages frequency domain information to refine feature representations and significantly improve the modeling of scene appearances. Additionally, we incorporate a Hierarchical Residual Fusion Network to seamlessly integrate these features. Extensive experiments demonstrate that MW-GS delivers state-of-the-art rendering performance, surpassing existing methods.\n\n从非受约束的图像集合进行3D重建面临诸多挑战，包括外观变化和瞬时遮挡等问题。本文提出了一种新颖的方法——微宏小波基高斯点云（Micro-macro Wavelet-based Gaussian Splatting, MW-GS），通过将场景表示解耦为全局、精细和内在组件，提升3D重建质量。该方法的核心创新包括：（i）微宏投影（Micro-macro Projection），使高斯点能够从不同尺度的特征图中提取信息，增强细节捕捉的多样性；（ii）基于小波的采样（Wavelet-based Sampling），利用频域信息优化特征表示，显著提升场景外观建模能力。此外，我们引入了分层残差融合网络（Hierarchical Residual Fusion Network），用于无缝集成这些特征。大量实验表明，MW-GS在渲染性能上达到了最先进水平，超越了现有方法。\n\n"
  },
  {
    "path": "abs/2501.14277.md",
    "content": "### Dense-SfM: Structure from Motion with Dense Consistent Matching\n\nWe present Dense-SfM, a novel Structure from Motion (SfM) framework designed for dense and accurate 3D reconstruction from multi-view images. Sparse keypoint matching, which traditional SfM methods often rely on, limits both accuracy and point density, especially in texture-less areas. Dense-SfM addresses this limitation by integrating dense matching with a Gaussian Splatting (GS) based track extension which gives more consistent, longer feature tracks. To further improve reconstruction accuracy, Dense-SfM is equipped with a multi-view kernelized matching module leveraging transformer and Gaussian Process architectures, for robust track refinement across multi-views. Evaluations on the ETH3D and Texture-Poor SfM datasets show that Dense-SfM offers significant improvements in accuracy and density over state-of-the-art methods.\n\n我们提出了Dense-SfM，这是一种新颖的运动结构从图像（SfM）框架，旨在从多视图图像中进行密集且精确的3D重建。传统SfM方法常依赖的稀疏关键点匹配，限制了精度和点的密度，尤其是在无纹理区域。Dense-SfM通过结合密集匹配和基于高斯溅射（GS）的轨迹扩展，克服了这一限制，提供了更加一致和更长的特征轨迹。为了进一步提高重建精度，Dense-SfM配备了一个多视图核化匹配模块，利用变换器和高斯过程架构，在多视图之间进行稳健的轨迹优化。对ETH3D和Texture-Poor SfM数据集的评估表明，Dense-SfM在精度和密度方面相较于现有最先进的方法有了显著提升。\n"
  },
  {
    "path": "abs/2501.14319.md",
    "content": "### Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video\n\nWe aim to redefine robust ego-motion estimation and photorealistic 3D reconstruction by addressing a critical limitation: the reliance on noise-free data in existing models. While such sanitized conditions simplify evaluation, they fail to capture the unpredictable, noisy complexities of real-world environments. Dynamic motion, sensor imperfections, and synchronization perturbations lead to sharp performance declines when these models are deployed in practice, revealing an urgent need for frameworks that embrace and excel under real-world noise. To bridge this gap, we tackle three core challenges: scalable data generation, comprehensive benchmarking, and model robustness enhancement. First, we introduce a scalable noisy data synthesis pipeline that generates diverse datasets simulating complex motion, sensor imperfections, and synchronization errors. Second, we leverage this pipeline to create Robust-Ego3D, a benchmark rigorously designed to expose noise-induced performance degradation, highlighting the limitations of current learning-based methods in ego-motion accuracy and 3D reconstruction quality. Third, we propose Correspondence-guided Gaussian Splatting (CorrGS), a novel test-time adaptation method that progressively refines an internal clean 3D representation by aligning noisy observations with rendered RGB-D frames from clean 3D map, enhancing geometric alignment and appearance restoration through visual correspondence. Extensive experiments on synthetic and real-world data demonstrate that CorrGS consistently outperforms prior state-of-the-art methods, particularly in scenarios involving rapid motion and dynamic illumination.\n\n我们的目标是通过解决一个关键限制来重新定义稳健的自我运动估计和逼真3D重建：现有模型依赖于无噪声数据。尽管这种清理过的数据条件简化了评估，但它们未能捕捉到现实环境中不可预测和噪声复杂性。动态运动、传感器缺陷和同步扰动在这些模型实际应用时会导致性能急剧下降，这暴露出急需能够在现实世界噪声下表现出色的框架。为了解决这一问题，我们着手解决三个核心挑战：可扩展数据生成、全面基准测试和模型稳健性增强。首先，我们提出了一种可扩展的噪声数据合成管道，能够生成多样的数据集，模拟复杂的运动、传感器缺陷和同步错误。其次，我们利用这个管道创建了Robust-Ego3D，一个基准测试，经过严格设计以暴露噪声引起的性能下降，突出当前基于学习的方法在自我运动精度和3D重建质量方面的局限性。第三，我们提出了一种新颖的测试时自适应方法——基于对应关系的高斯溅射（CorrGS），通过将噪声观测与来自干净3D地图的渲染RGB-D帧进行对齐，逐步精炼内部干净的3D表示，从而通过视觉对应关系增强几何对齐和外观恢复。在合成数据和真实世界数据上的大量实验表明，CorrGS在快速运动和动态光照等场景下，持续超越了先前的最先进方法。\n"
  },
  {
    "path": "abs/2501.14534.md",
    "content": "### Trick-GS: A Balanced Bag of Tricks for Efficient Gaussian Splatting\n\nGaussian splatting (GS) for 3D reconstruction has become quite popular due to their fast training, inference speeds and high quality reconstruction. However, GS-based reconstructions generally consist of millions of Gaussians, which makes them hard to use on computationally constrained devices such as smartphones. In this paper, we first propose a principled analysis of advances in efficient GS methods. Then, we propose Trick-GS, which is a careful combination of several strategies including (1) progressive training with resolution, noise and Gaussian scales, (2) learning to prune and mask primitives and SH bands by their significance, and (3) accelerated GS training framework. Trick-GS takes a large step towards resource-constrained GS, where faster run-time, smaller and faster-convergence of models is of paramount concern. Our results on three datasets show that Trick-GS achieves up to 2x faster training, 40x smaller disk size and 2x faster rendering speed compared to vanilla GS, while having comparable accuracy.\n\n3D重建中的高斯点云（Gaussian Splatting, GS）因其快速训练、高效推理和高质量重建而受到广泛关注。然而，GS 生成的重建通常包含数百万个高斯点，使其难以在智能手机等计算资源受限的设备上运行。本文首先对高效 GS 方法的最新进展进行了系统分析。随后，我们提出 Trick-GS，一种精心设计的优化策略组合，包括：（1）渐进式训练，通过逐步调整分辨率、噪声和高斯尺度，提高收敛效率；（2）基于重要性的剪枝与掩码学习，优化高斯原语和球谐（SH）带的存储与计算；（3）加速 GS 训练框架，提升整体训练与推理效率。Trick-GS 迈出了面向资源受限环境（如移动设备）优化 GS 运行的重要一步，使得 GS 具备更快的运行速度、更小的模型尺寸和更快的收敛能力。我们在三个数据集上的实验结果表明，与标准 GS 相比，Trick-GS 训练速度提高最多 2 倍，磁盘存储需求减少 40 倍，渲染速度提升 2 倍，同时保持了相当的重建精度。\n"
  },
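One of the strategies listed in the Trick-GS abstract, progressive resolution training, can be sketched as a simple coarse-to-fine schedule. The abstract does not give the exact schedule; the start scale and ramp length below are assumptions for illustration.

```python
def resolution_at(step, total_steps, full_hw=(1080, 1920),
                  start_scale=0.25, ramp_frac=0.5):
    """Coarse-to-fine image-resolution schedule for GS training.

    Trains on down-scaled renders first (cheaper iterations, smoother
    gradients) and linearly reaches full resolution after `ramp_frac`
    of training. The schedule actually used by Trick-GS may differ.
    """
    t = min(step / (ramp_frac * total_steps), 1.0)
    scale = start_scale + (1.0 - start_scale) * t
    return tuple(max(1, round(s * scale)) for s in full_hw)

for step in (0, 7500, 15000, 30000):
    print(step, resolution_at(step, total_steps=30000))
# 0 -> (270, 480), 7500 -> (675, 1200), 15000 onward -> (1080, 1920)
```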
  {
    "path": "abs/2501.15008.md",
    "content": "### HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion\n\nWe present HuGDiffusion, a generalizable 3D Gaussian splatting (3DGS) learning pipeline to achieve novel view synthesis (NVS) of human characters from single-view input images. Existing approaches typically require monocular videos or calibrated multi-view images as inputs, whose applicability could be weakened in real-world scenarios with arbitrary and/or unknown camera poses. In this paper, we aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image. Specifically, we begin with carefully integrated human-centric feature extraction procedures to deduce informative conditioning signals. Based on our empirical observations that jointly learning the whole 3DGS attributes is challenging to optimize, we design a multi-stage generation strategy to obtain different types of 3DGS attributes. To facilitate the training process, we investigate constructing proxy ground-truth 3D Gaussian attributes as high-quality attribute-level supervision signals. Through extensive experiments, our HuGDiffusion shows significant performance improvements over the state-of-the-art methods.\n\n我们提出了 HuGDiffusion，一种通用的3D高斯点云（3DGS）学习管道，用于从单视图输入图像实现人类角色的新视角合成（NVS）。现有的方法通常需要单目视频或标定好的多视角图像作为输入，这在现实场景中由于摄像机姿态的任意性和/或未知性，其适用性可能会受到限制。本文的目标是通过基于扩散的框架，生成一组由单张图像提取的人类先验条件的3DGS属性。具体来说，我们首先整合了以人为中心的特征提取过程，以推导出有用的条件信号。基于我们在实验中的观察，联合学习整个3DGS属性是一个具有挑战性的优化任务，因此我们设计了一种多阶段生成策略，以获取不同类型的3DGS属性。为了促进训练过程，我们探讨了构建代理真实标签的3D高斯属性作为高质量的属性级监督信号。通过大量实验，HuGDiffusion 在性能上显著超过了现有的最先进方法。\n"
  },
  {
    "path": "abs/2501.15096.md",
    "content": "### Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful representation due to its efficiency and high-fidelity rendering. However, 3DGS training requires a known camera pose for each input view, typically obtained by Structure-from-Motion (SfM) pipelines. Pioneering works have attempted to relax this restriction but still face difficulties when handling long sequences with complex camera trajectories. In this work, we propose Rob-GS, a robust framework to progressively estimate camera poses and optimize 3DGS for arbitrarily long video sequences. Leveraging the inherent continuity of videos, we design an adjacent pose tracking method to ensure stable pose estimation between consecutive frames. To handle arbitrarily long inputs, we adopt a \"divide and conquer\" scheme that adaptively splits the video sequence into several segments and optimizes them separately. Extensive experiments on the Tanks and Temples dataset and our collected real-world dataset show that our Rob-GS outperforms the state-of-the-arts.\n\n3D高斯点云（3DGS）因其高效性和高保真度渲染而成为一种强大的表示方法。然而，3DGS的训练需要每个输入视图的已知相机姿态，这通常通过结构光束法（SfM）管道获得。先驱性的研究尝试放宽这一限制，但在处理具有复杂相机轨迹的长序列时仍然面临困难。在本研究中，我们提出了 Rob-GS，一个稳健的框架，用于逐步估计相机姿态并优化3DGS，以处理任意长度的视频序列。利用视频的固有连续性，我们设计了一种相邻姿态跟踪方法，确保连续帧之间姿态估计的稳定性。为了处理任意长度的输入，我们采用了一种“分而治之”的策略，动态地将视频序列划分为多个段，并分别对其进行优化。在Tanks and Temples数据集和我们收集的真实世界数据集上的大量实验表明，Rob-GS 超越了现有的最先进方法。\n\n"
  },
  {
    "path": "abs/2501.15619.md",
    "content": "### GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting\n\nEffective image tokenization is crucial for both multi-modal understanding and generation tasks due to the necessity of the alignment with discrete text data. To this end, existing approaches utilize vector quantization (VQ) to project pixels onto a discrete codebook and reconstruct images from the discrete representation. However, compared with the continuous latent space, the limited discrete codebook space significantly restrict the representational ability of these image tokenizers. In this paper, we propose GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting as a solution. We first represent the encoded samples as multiple flexible featured 2D Gaussians characterized by positions, rotation angles, scaling factors, and feature coefficients. We adopt the standard quantization for the Gaussian features and then concatenate the quantization results with the other intrinsic Gaussian parameters before the corresponding splatting operation and the subsequent decoding module. In general, GaussianToken integrates the local influence of 2D Gaussian distribution into the discrete space and thus enhances the representation capability of the image tokenizer. Competitive reconstruction performances on CIFAR, Mini-ImageNet, and ImageNet-1K demonstrate the effectiveness of our framework.\n\n高效的图像标记化（tokenization）对于多模态理解和生成任务至关重要，因为它需要与离散文本数据对齐。为此，现有方法通常采用向量量化（VQ）技术，将像素投影到离散码本，并从离散表示中重建图像。然而，与连续潜空间相比，受限的离散码本空间极大地限制了这些图像标记器的表示能力。\n在本文中，我们提出 GaussianToken——一种基于 2D 高斯点渲染（Gaussian Splatting） 的高效图像标记化方法。我们首先使用多个灵活的 2D 高斯来表示编码样本，并通过位置、旋转角、缩放因子和特征系数来描述其特性。随后，我们对高斯特征进行标准量化，并在相应的点渲染（splatting）操作和后续解码模块之前，将量化结果与其他内在高斯参数进行拼接。总体而言，GaussianToken 将 2D 高斯分布的局部影响引入离散空间，从而增强了图像标记器的表示能力。\n在 CIFAR、Mini-ImageNet 和 ImageNet-1K 数据集上的竞争性重建性能验证了我们框架的有效性。\n\n"
  },
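The quantization step in the GaussianToken abstract is standard vector quantization; a minimal numpy sketch of nearest-codebook assignment follows. The straight-through gradient used when training such quantizers is noted in a comment rather than implemented.

```python
import numpy as np

def quantize(features, codebook):
    """Nearest-neighbour vector quantization of Gaussian features.

    features: (N, D) feature coefficients of the 2D Gaussians.
    codebook: (K, D) learned discrete codes.
    Returns quantized features and code indices. During training, a
    straight-through estimator (copying gradients through the
    non-differentiable argmin) would be used, as in standard VQ models.
    """
    d2 = ((features ** 2).sum(1, keepdims=True)
          - 2.0 * features @ codebook.T
          + (codebook ** 2).sum(1))          # squared distances, (N, K)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

rng = np.random.default_rng(0)
feats = rng.normal(size=(1024, 16))          # per-Gaussian features
book = rng.normal(size=(512, 16))            # K=512 codebook entries
q, idx = quantize(feats, book)
# q is then concatenated with position/rotation/scale before splatting.
```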
  {
    "path": "abs/2501.16764.md",
    "content": "### DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation\n\nRecent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap the training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. The compatibility with image diffusion models enables seamless adaptions of numerous techniques for image generation to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications. Thorough ablation studies validate the efficacy of each critical design choice and provide insights into the underlying mechanism.\n\n近年来，从文本或单张图像生成3D内容的研究取得了进展，但仍受限于高质量3D数据集的匮乏，以及2D多视角生成带来的不一致性问题。为此，我们提出 DiffSplat，一种新颖的3D生成框架，通过调整大规模文本到图像的扩散模型，直接生成3D高斯点云（Gaussian splats）。与现有的3D生成模型不同，DiffSplat 有效利用了网络规模的2D先验信息，同时在一个统一的模型中保持3D一致性。\n为了优化训练过程，我们提出了一种轻量级重建模型，可即时生成多视角高斯点云网格，用于大规模数据集的构建。此外，在常规扩散损失的基础上，我们引入了 3D渲染损失，以增强跨任意视角的3D一致性。由于DiffSplat与图像扩散模型兼容，因此可以无缝适配各种图像生成技术到3D领域。\n大量实验表明，DiffSplat 在文本和图像条件生成任务及下游应用中均表现出色。此外，详尽的消融研究验证了每个关键设计选择的有效性，并提供了对其内在机制的深入理解。\n"
  },
  {
    "path": "abs/2501.17085.md",
    "content": "### Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds\n\nEfficient and realistic crowd rendering is an important element of many real-time graphics applications such as Virtual Reality (VR) and games. To this end, Levels of Detail (LOD) avatar representations such as polygonal meshes, image-based impostors, and point clouds have been proposed and evaluated. More recently, 3D Gaussian Splatting has been explored as a potential method for real-time crowd rendering. In this paper, we present a two-alternative forced choice (2AFC) experiment that aims to determine the perceived quality of 3D Gaussian avatars. Three factors were explored: Motion, LOD (i.e., #Gaussians), and the avatar height in Pixels (corresponding to the viewing distance). Participants viewed pairs of animated 3D Gaussian avatars and were tasked with choosing the most detailed one. Our findings can inform the optimization of LOD strategies in Gaussian-based crowd rendering, thereby helping to achieve efficient rendering while maintaining visual quality in real-time applications.\n\n高效且真实的 crowd 渲染是许多实时图形应用（如虚拟现实（VR）和游戏）中的重要元素。为此，提出并评估了不同的细节层次（LOD）头像表示方法，如多边形网格、基于图像的替代物和点云。最近，3D高斯点云（Gaussian Splatting）被探索作为实时 crowd 渲染的潜在方法。本文介绍了一项二选一强迫选择实验（2AFC），旨在确定3D高斯头像的感知质量。我们探讨了三个因素：运动、LOD（即高斯点数量）以及头像在像素中的高度（对应于视距）。参与者观看了成对的动画3D高斯头像，并选择了细节更丰富的一个。我们的研究结果有助于优化基于高斯的 crowd 渲染中的LOD策略，从而帮助在实时应用中实现高效渲染同时保持视觉质量。\n"
  },
  {
    "path": "abs/2501.17655.md",
    "content": "### FeatureGS: Eigenvalue-Feature Optimization in 3D Gaussian Splatting for Geometrically Accurate and Artifact-Reduced Reconstruction\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful approach for 3D scene reconstruction using 3D Gaussians. However, neither the centers nor surfaces of the Gaussians are accurately aligned to the object surface, complicating their direct use in point cloud and mesh reconstruction. Additionally, 3DGS typically produces floater artifacts, increasing the number of Gaussians and storage requirements. To address these issues, we present FeatureGS, which incorporates an additional geometric loss term based on an eigenvalue-derived 3D shape feature into the optimization process of 3DGS. The goal is to improve geometric accuracy and enhance properties of planar surfaces with reduced structural entropy in local 3D neighborhoods. We present four alternative formulations for the geometric loss term based on ’planarity’ of Gaussians, as well as ’planarity’, ’omnivariance’, and ’eigenentropy’ of Gaussian neighborhoods. We provide quantitative and qualitative evaluations on 15 scenes of the DTU benchmark dataset focusing on following key aspects: Geometric accuracy and artifact-reduction, measured by the Chamfer distance, and memory efficiency, evaluated by the total number of Gaussians. Additionally, rendering quality is monitored by Peak-Signal-to-Noise Ratio. FeatureGS achieves a 30% improvement in geometric accuracy, reduces the number of Gaussians by 90%, and suppresses floater artifacts, while maintaining comparable photometric rendering quality. The geometric loss with ’planarity’ from Gaussians provides the highest geometric accuracy, while ’omnivariance’ in Gaussian neighborhoods reduces floater artifacts and number of Gaussians the most. This makes FeatureGS a strong method for geometrically accurate, artifact-reduced and memoryefficient 3D scene reconstruction, enabling the direct use of Gaussian centers for geometric representation\n\n3D高斯点云（3DGS）因其使用3D高斯点进行场景重建而成为一种强大的方法。然而，3DGS中的高斯中心和表面通常与物体表面没有准确对齐，这使得其在点云和网格重建中的直接应用变得复杂。此外，3DGS通常会产生浮动伪影，增加高斯点的数量和存储需求。为了解决这些问题，我们提出了 FeatureGS，它在3DGS的优化过程中引入了一个基于特征值派生的3D形状特征的几何损失项。其目标是提高几何精度，增强平面表面的特性，并减少局部3D邻域的结构熵。\n我们为几何损失项提供了四种不同的公式，基于高斯的**“平面性”、高斯邻域的“平面性”、“全方差”和“特征熵”**。我们在DTU基准数据集的15个场景上进行定量和定性评估，重点关注以下几个关键方面：通过Chamfer距离测量的几何精度和伪影减少，以及通过高斯总数评估的内存效率。此外，通过峰值信噪比（PSNR）监控渲染质量。\n实验结果表明，FeatureGS在几何精度上提高了30%，减少了90%的高斯点数，抑制了浮动伪影，同时保持了可比的光度渲染质量。使用高斯的“平面性”进行的几何损失提供了最高的几何精度，而高斯邻域的“全方差”则最大程度地减少了浮动伪影和高斯点数量。这使得FeatureGS成为一种强有力的几何准确、伪影减少且内存高效的3D场景重建方法，能够直接使用高斯中心进行几何表示。\n"
  },
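The eigenvalue-derived features named in the FeatureGS abstract ('planarity', 'omnivariance', 'eigenentropy') have standard definitions in terms of the sorted eigenvalues of a local covariance matrix. A sketch computing them over k-nearest-neighbour neighbourhoods of the Gaussian centers; how FeatureGS weights them into loss terms is not reproduced here.

```python
import numpy as np

def eigen_features(points, k=16):
    """Eigenvalue-based 3D shape features of local neighbourhoods.

    For each point, the covariance of its k nearest neighbours gives
    sorted eigenvalues l1 >= l2 >= l3 (normalized to sum to 1), and:
      planarity    = (l2 - l3) / l1
      omnivariance = (l1 * l2 * l3) ** (1/3)
      eigenentropy = -sum(li * ln(li))
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    feats = []
    for idx in nn:
        cov = np.cov(points[idx].T)                 # 3x3 covariance
        l = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending
        l = np.clip(l, 1e-12, None)
        l = l / l.sum()
        feats.append([(l[1] - l[2]) / l[0],
                      np.prod(l) ** (1.0 / 3.0),
                      -(l * np.log(l)).sum()])
    return np.asarray(feats)  # (N, 3): planarity, omnivariance, eigenentropy

pts = np.random.rand(200, 3)
print(eigen_features(pts).mean(axis=0))
```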
  {
    "path": "abs/2501.17792.md",
    "content": "### CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering\n\nWe present CrowdSplat, a novel approach that leverages 3D Gaussian Splatting for real-time, high-quality crowd rendering. Our method utilizes 3D Gaussian functions to represent animated human characters in diverse poses and outfits, which are extracted from monocular videos. We integrate Level of Detail (LoD) rendering to optimize computational efficiency and quality. The CrowdSplat framework consists of two stages: (1) avatar reconstruction and (2) crowd synthesis. The framework is also optimized for GPU memory usage to enhance scalability. Quantitative and qualitative evaluations show that CrowdSplat achieves good levels of rendering quality, memory efficiency, and computational performance. Through these experiments, we demonstrate that CrowdSplat is a viable solution for dynamic, realistic crowd simulation in real-time applications.\n\n我们提出了CrowdSplat，一种利用3D高斯点渲染进行实时高质量人群渲染的新方法。我们的方法利用3D高斯函数表示从单目视频中提取的各种姿势和服装的动画人物。我们集成了细节层次（LoD）渲染，以优化计算效率和质量。CrowdSplat框架分为两个阶段：（1）虚拟形象重建，（2）人群合成。该框架还针对GPU内存使用进行了优化，以增强可扩展性。定量和定性评估表明，CrowdSplat在渲染质量、内存效率和计算性能方面都取得了良好的水平。通过这些实验，我们展示了CrowdSplat在实时应用中实现动态、逼真的人群仿真的可行性。\n"
  },
  {
    "path": "abs/2501.17978.md",
    "content": "### VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting\n\nReconstructing a 3D scene from images is challenging due to the different ways light interacts with surfaces depending on the viewer's position and the surface's material. In classical computer graphics, materials can be classified as diffuse or specular, interacting with light differently. The standard 3D Gaussian Splatting model struggles to represent view-dependent content, since it cannot differentiate an object within the scene from the light interacting with its specular surfaces, which produce highlights or reflections. In this paper, we propose to extend the 3D Gaussian Splatting model by introducing an additional symmetric matrix to enhance the opacity representation of each 3D Gaussian. This improvement allows certain Gaussians to be suppressed based on the viewer's perspective, resulting in a more accurate representation of view-dependent reflections and specular highlights without compromising the scene's integrity. By allowing the opacity to be view dependent, our enhanced model achieves state-of-the-art performance on Mip-Nerf, Tanks\\&Temples, Deep Blending, and Nerf-Synthetic datasets without a significant loss in rendering speed, achieving >60FPS, and only incurring a minimal increase in memory used.\n\n从图像重建三维场景具有挑战性，因为光与表面的相互作用方式取决于观察者的位置和表面材质。在经典计算机图形学中，材质通常分为漫反射和镜面反射，它们与光的交互方式不同。标准的 3D 高斯点渲染模型难以表示与视角相关的内容，因为它无法区分场景中的物体与光在其镜面反射表面上的交互，从而导致高光或反射效果的缺失。\n在本文中，我们提出了一种改进的 3D 高斯点渲染模型，通过引入一个额外的对称矩阵来增强每个 3D 高斯的不透明度表示。此改进使得某些高斯可以根据观察者的视角被抑制，从而更准确地表示视角相关的反射和镜面高光，同时保持场景的完整性。通过使不透明度随视角变化，我们的增强模型在 Mip-Nerf、Tanks&Temples、Deep Blending 和 Nerf-Synthetic 数据集上达到了最先进的性能，同时渲染速度未显著下降（>60FPS），且仅带来了极小的内存开销增加。\n"
  },
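A hedged sketch of the core VoD-3DGS mechanism described above: a per-Gaussian symmetric matrix that modulates opacity as a function of view direction, so a Gaussian modeling a highlight can be suppressed from viewpoints where the highlight is absent. The quadratic-form-plus-exponential below is one plausible parameterization for illustration, not necessarily the paper's exact formula.

```python
import numpy as np

def view_dependent_opacity(base_opacity, M, view_dir):
    """Opacity modulated by a per-Gaussian symmetric matrix.

    base_opacity: (N,) scalar opacities.
    M:            (N, 3, 3) learned symmetric matrices.
    view_dir:     (3,) direction from the Gaussian towards the camera.
    A quadratic form in the view direction suppresses a Gaussian from
    some viewpoints, letting it represent specular highlights or
    reflections without corrupting the underlying surface.
    """
    v = view_dir / np.linalg.norm(view_dir)
    q = np.einsum('i,nij,j->n', v, M, v)       # v^T M v per Gaussian
    return base_opacity * np.exp(-np.maximum(q, 0.0))

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3, 3))
M = 0.5 * (A + A.transpose(0, 2, 1))           # symmetrize
alpha = view_dependent_opacity(np.full(100, 0.8), M, np.array([0, 0, 1.0]))
print(alpha.min(), alpha.max())
```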
  {
    "path": "abs/2501.18152.md",
    "content": "### StructuredField: Unifying Structured Geometry and Radiance Field\n\nRecent point-based differentiable rendering techniques have achieved significant success in high-fidelity reconstruction and fast rendering. However, due to the unstructured nature of point-based representations, they are difficult to apply to modern graphics pipelines designed for structured meshes, as well as to a variety of simulation and editing algorithms that work well with structured mesh representations. To this end, we propose StructuredField, a novel representation that achieves both a structured geometric representation of the reconstructed object and a high-fidelity rendering reconstruction of the object. We employ structured tetrahedral meshes to represent the reconstructed object. We reparameterize the geometric parameters of the tetrahedral mesh into the geometric shape parameters of a 3D Gaussians, thereby achieving differentiable high-fidelity rendering of the tetrahedral mesh. We propose a novel inversion-free homeomorphism to constrain the optimization of the tetrahedral mesh, which strictly guarantees that the tetrahedral mesh is remains both inversion-free and self-intersection-free during the optimization process and the final result. Based on our proposed StructuredField, we achieve high-quality structured meshes and high-fidelity reconstruction. We also demonstrate the applicability of our representation to various applications such as physical simulation and deformation.\n\n近年来，基于点的可微渲染技术在高保真重建和快速渲染方面取得了显著成功。然而，由于点基表示的无结构特性，它们难以应用于为结构化网格设计的现代图形管线，以及许多与结构化网格表示兼容的模拟和编辑算法。为此，我们提出了 StructuredField，一种新颖的表示方法，能够同时实现重建物体的结构化几何表示和高保真度的渲染重建。我们采用结构化四面体网格来表示重建物体，并将四面体网格的几何参数重新参数化为 3D 高斯的几何形状参数，从而实现四面体网格的可微高保真渲染。我们提出了一种新颖的无逆同胚映射，用于约束四面体网格的优化，这确保了在优化过程和最终结果中，四面体网格始终保持无逆且无自交。基于我们提出的 StructuredField，我们实现了高质量的结构化网格和高保真度的重建。我们还展示了该表示方法在物理模拟和变形等多个应用中的可行性。\n"
  },
  {
    "path": "abs/2501.18630.md",
    "content": "### Deformable Beta Splatting\n\n3D Gaussian Splatting (3DGS) has advanced radiance field reconstruction by enabling real-time rendering. However, its reliance on Gaussian kernels for geometry and low-order Spherical Harmonics (SH) for color encoding limits its ability to capture complex geometries and diverse colors. We introduce Deformable Beta Splatting (DBS), a deformable and compact approach that enhances both geometry and color representation. DBS replaces Gaussian kernels with deformable Beta Kernels, which offer bounded support and adaptive frequency control to capture fine geometric details with higher fidelity while achieving better memory efficiency. In addition, we extended the Beta Kernel to color encoding, which facilitates improved representation of diffuse and specular components, yielding superior results compared to SH-based methods. Furthermore, Unlike prior densification techniques that depend on Gaussian properties, we mathematically prove that adjusting regularized opacity alone ensures distribution-preserved Markov chain Monte Carlo (MCMC), independent of the splatting kernel type. Experimental results demonstrate that DBS achieves state-of-the-art visual quality while utilizing only 45% of the parameters and rendering 1.5x faster than 3DGS-based methods. Notably, for the first time, splatting-based methods outperform state-of-the-art Neural Radiance Fields, highlighting the superior performance and efficiency of DBS for real-time radiance field rendering.\n\n3D Gaussian Splatting (3DGS) 通过实现实时渲染推进了辐射场重建的研究。然而，它依赖于高斯核来表示几何结构和低阶球面调和函数（SH）进行颜色编码，这限制了其捕捉复杂几何和多样化颜色的能力。为此，我们引入了 Deformable Beta Splatting (DBS)，一种可变形且紧凑的方法，增强了几何和颜色表示。DBS 用可变形 Beta 核替代了高斯核，Beta 核具有有界支持和自适应频率控制，能够以更高的保真度捕捉精细的几何细节，同时实现更好的内存效率。此外，我们将 Beta 核扩展到颜色编码，改善了漫反射和镜面反射成分的表示，与基于 SH 的方法相比，获得了更优的结果。进一步地，与依赖高斯性质的先前密集化技术不同，我们通过数学证明，单独调整正则化的透明度即可确保分布保持的马尔科夫链蒙特卡洛（MCMC），而与 Splatting 核类型无关。实验结果表明，DBS 实现了最先进的视觉质量，同时仅使用 3DGS 方法 45% 的参数，并且渲染速度比 3DGS 快 1.5 倍。值得注意的是，首次出现了 Splatting 方法超越最先进的神经辐射场（NeRF）技术，突显了 DBS 在实时辐射场渲染中的优越性能和效率。\n"
  },
  {
    "path": "abs/2501.18672.md",
    "content": "### Drag Your Gaussian: Effective Drag-Based Editing with Score Distillation for 3D Gaussian Splatting\n\nRecent advancements in 3D scene editing have been propelled by the rapid development of generative models. Existing methods typically utilize generative models to perform text-guided editing on 3D representations, such as 3D Gaussian Splatting (3DGS). However, these methods are often limited to texture modifications and fail when addressing geometric changes, such as editing a character's head to turn around. Moreover, such methods lack accurate control over the spatial position of editing results, as language struggles to precisely describe the extent of edits. To overcome these limitations, we introduce DYG, an effective 3D drag-based editing method for 3D Gaussian Splatting. It enables users to conveniently specify the desired editing region and the desired dragging direction through the input of 3D masks and pairs of control points, thereby enabling precise control over the extent of editing. DYG integrates the strengths of the implicit triplane representation to establish the geometric scaffold of the editing results, effectively overcoming suboptimal editing outcomes caused by the sparsity of 3DGS in the desired editing regions. Additionally, we incorporate a drag-based Latent Diffusion Model into our method through the proposed Drag-SDS loss function, enabling flexible, multi-view consistent, and fine-grained editing. Extensive experiments demonstrate that DYG conducts effective drag-based editing guided by control point prompts, surpassing other baselines in terms of editing effect and quality, both qualitatively and quantitatively.\n\n近期，3D 场景编辑的进展得益于生成模型的快速发展。现有方法通常利用生成模型对 3D 表示（如 3D Gaussian Splatting，3DGS）进行文本引导编辑。然而，这些方法通常仅限于纹理修改，在处理几何变化时（例如编辑角色的头部使其转向）会失败。此外，这些方法缺乏对编辑结果空间位置的精确控制，因为语言难以准确描述编辑的程度。为克服这些限制，我们提出了 DYG，一种有效的基于拖拽的 3D 编辑方法，适用于 3D Gaussian Splatting。通过输入 3D 掩模和控制点对，DYG 使用户能够方便地指定期望的编辑区域和拖拽方向，从而实现对编辑范围的精确控制。DYG 将隐式三平面表示的优势与编辑结果的几何支架结合起来，有效克服了由于 3DGS 在所需编辑区域的稀疏性所造成的次优编辑效果。此外，我们通过提出的 Drag-SDS 损失函数将基于拖拽的潜在扩散模型集成到方法中，实现灵活的、多视角一致的细粒度编辑。大量实验表明，DYG 通过控制点提示引导的拖拽编辑在编辑效果和质量上均超越了其他基线方法，表现出显著的定性和定量优势。\n"
  },
  {
    "path": "abs/2501.18982.md",
    "content": "### OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation\n\nRecently, significant advancements have been made in the reconstruction and generation of 3D assets, including static cases and those with physical interactions. To recover the physical properties of 3D assets, existing methods typically assume that all materials belong to a specific predefined category (e.g., elasticity). However, such assumptions ignore the complex composition of multiple heterogeneous objects in real scenarios and tend to render less physically plausible animation given a wider range of objects. We propose OmniPhysGS for synthesizing a physics-based 3D dynamic scene composed of more general objects. A key design of OmniPhysGS is treating each 3D asset as a collection of constitutive 3D Gaussians. For each Gaussian, its physical material is represented by an ensemble of 12 physical domain-expert sub-models (rubber, metal, honey, water, etc.), which greatly enhances the flexibility of the proposed model. In the implementation, we define a scene by user-specified prompts and supervise the estimation of material weighting factors via a pretrained video diffusion model. Comprehensive experiments demonstrate that OmniPhysGS achieves more general and realistic physical dynamics across a broader spectrum of materials, including elastic, viscoelastic, plastic, and fluid substances, as well as interactions between different materials. Our method surpasses existing methods by approximately 3% to 16% in metrics of visual quality and text alignment.\n\n近期，在 3D 资产的重建和生成方面取得了显著进展，包括静态场景和具有物理交互的动态场景。为了恢复 3D 资产的物理属性，现有方法通常假设所有材料属于某个特定的预定义类别（如弹性）。然而，这种假设忽略了现实场景中多种异质物体的复杂组成，且在处理更广泛物体时，可能导致动画效果在物理上不够逼真。为此，我们提出了 OmniPhysGS，用于合成一个基于物理的 3D 动态场景，能够包含更多一般性的物体。OmniPhysGS 的一个关键设计是将每个 3D 资产视为一组构成的 3D 高斯点。对于每个高斯点，其物理材料通过 12 个物理领域专家子模型（如橡胶、金属、蜂蜜、水等）的集合来表示，这大大增强了模型的灵活性。在实现过程中，我们通过用户指定的提示定义场景，并通过预训练的视频扩散模型来监督材料加权因子的估计。综合实验表明，OmniPhysGS 在包括弹性、粘弹性、塑性、流体等物质的多种材料及其交互作用下，能够实现更广泛且更逼真的物理动态。与现有方法相比，我们的方法在视觉质量和文本对齐度的指标上提升了约 3% 至 16%。\n"
  },
  {
    "path": "abs/2501.19088.md",
    "content": "### JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting\n\nSince hands are the primary interface in daily interactions, modeling high-quality digital human hands and rendering realistic images is a critical research problem. Furthermore, considering the requirements of interactive and rendering applications, it is essential to achieve real-time rendering and driveability of the digital model without compromising rendering quality. Thus, we propose Jointly 3D Gaussian Hand (JGHand), a novel joint-driven 3D Gaussian Splatting (3DGS)-based hand representation that renders high-fidelity hand images in real-time for various poses and characters. Distinct from existing articulated neural rendering techniques, we introduce a differentiable process for spatial transformations based on 3D key points. This process supports deformations from the canonical template to a mesh with arbitrary bone lengths and poses. Additionally, we propose a real-time shadow simulation method based on per-pixel depth to simulate self-occlusion shadows caused by finger movements. Finally, we embed the hand prior and propose an animatable 3DGS representation of the hand driven solely by 3D key points. We validate the effectiveness of each component of our approach through comprehensive ablation studies. Experimental results on public datasets demonstrate that JGHand achieves real-time rendering speeds with enhanced quality, surpassing state-of-the-art methods.\n\n由于手是日常互动中的主要接口，建模高质量的数字人类手部并渲染逼真的图像是一个关键的研究问题。此外，考虑到交互和渲染应用的需求，实现实时渲染和数字模型的可驱动性而不降低渲染质量至关重要。因此，我们提出了 Jointly 3D Gaussian Hand (JGHand)，一种基于联合驱动的 3D 高斯 Splatting（3DGS）手部表示方法，能够实时渲染高保真的手部图像，适用于各种姿势和角色。与现有的关节化神经渲染技术不同，我们引入了一种基于 3D 关键点的可微分空间变换过程。该过程支持从标准模板到具有任意骨骼长度和姿势的网格的形变。此外，我们提出了一种基于每像素深度的实时阴影模拟方法，用于模拟由于手指运动而产生的自遮挡阴影。最后，我们嵌入了手部先验，并提出了一种仅通过 3D 关键点驱动的可动画 3DGS 手部表示方法。通过全面的消融研究验证了我们方法中每个组件的有效性。基于公共数据集的实验结果表明，JGHand 实现了实时渲染速度，并在质量上有所提升，超越了现有的最先进方法。\n"
  },
  {
    "path": "abs/2501.19196.md",
    "content": "### RaySplats: Ray Tracing based Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) is a process that enables the direct creation of 3D objects from 2D images. This representation offers numerous advantages, including rapid training and rendering. However, a significant limitation of 3DGS is the challenge of incorporating light and shadow reflections, primarily due to the utilization of rasterization rather than ray tracing for rendering. This paper introduces RaySplats, a model that employs ray-tracing based Gaussian Splatting. Rather than utilizing the projection of Gaussians, our method employs a ray-tracing mechanism, operating directly on Gaussian primitives represented by confidence ellipses with RGB colors. In practice, we compute the intersection between ellipses and rays to construct ray-tracing algorithms, facilitating the incorporation of meshes with Gaussian Splatting models and the addition of lights, shadows, and other related effects.\n\n3D Gaussian Splatting (3DGS) 是一种通过 2D 图像直接创建 3D 物体的过程。这种表示方法具有许多优势，包括快速训练和渲染。然而，3DGS 的一个显著局限性是难以处理光照和阴影的反射，主要因为其采用光栅化而非光线追踪进行渲染。本文介绍了 RaySplats，一种基于光线追踪的高斯 Splatting 模型。与传统的利用高斯投影的方法不同，我们的方法采用光线追踪机制，直接作用于由信心椭圆和 RGB 颜色表示的高斯原语。在实际应用中，我们计算椭圆与光线的交点，以构建光线追踪算法，从而使高斯 Splatting 模型能够与网格结合，并加入光照、阴影等相关效果。\n"
  },
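The ray-ellipsoid intersection mentioned in the RaySplats abstract reduces to a ray-sphere test after whitening by the Gaussian's covariance. A minimal sketch; the k-sigma "confidence ellipsoid" threshold is an assumed convention, not taken from the paper.

```python
import numpy as np

def ray_ellipsoid_hit(o, d, mu, cov, k=2.0):
    """Intersect ray o + t*d with the k-sigma ellipsoid of a Gaussian.

    The ellipsoid {x : (x-mu)^T cov^{-1} (x-mu) = k^2} becomes a
    radius-k sphere after whitening with L^{-1}, where cov = L L^T
    (Cholesky). Returns the nearest positive hit distance t, or None.
    """
    L = np.linalg.cholesky(cov)
    o_w = np.linalg.solve(L, o - mu)     # whitened ray origin
    d_w = np.linalg.solve(L, d)          # whitened ray direction
    a = d_w @ d_w
    b = 2.0 * (o_w @ d_w)
    c = o_w @ o_w - k * k
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / (2.0 * a)
    return t if t > 0 else None

cov = np.diag([0.04, 0.01, 0.01])        # axis-aligned Gaussian
t = ray_ellipsoid_hit(np.array([0, 0, -1.0]), np.array([0, 0, 1.0]),
                      np.zeros(3), cov, k=2.0)
print(t)   # 0.8: the ray enters the 2-sigma ellipsoid 0.2 before center
```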
  {
    "path": "abs/2501.19319.md",
    "content": "### Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping\n\nSimultaneous Localization and Mapping (SLAM) is essential for precise surgical interventions and robotic tasks in minimally invasive procedures. While recent advancements in 3D Gaussian Splatting (3DGS) have improved SLAM with high-quality novel view synthesis and fast rendering, these systems struggle with accurate depth and surface reconstruction due to multi-view inconsistencies. Simply incorporating SLAM and 3DGS leads to mismatches between the reconstructed frames. In this work, we present Endo-2DTAM, a real-time endoscopic SLAM system with 2D Gaussian Splatting (2DGS) to address these challenges. Endo-2DTAM incorporates a surface normal-aware pipeline, which consists of tracking, mapping, and bundle adjustment modules for geometrically accurate reconstruction. Our robust tracking module combines point-to-point and point-to-plane distance metrics, while the mapping module utilizes normal consistency and depth distortion to enhance surface reconstruction quality. We also introduce a pose-consistent strategy for efficient and geometrically coherent keyframe sampling. Extensive experiments on public endoscopic datasets demonstrate that Endo-2DTAM achieves an RMSE of 1.87±0.63 mm for depth reconstruction of surgical scenes while maintaining computationally efficient tracking, high-quality visual appearance, and real-time rendering.\n\n同时定位与建图（SLAM）对于精确的外科手术干预和微创手术中的机器人任务至关重要。尽管 3D 高斯 Splatting（3DGS）最近的进展已通过高质量的新视角合成和快速渲染提升了 SLAM 的性能，但这些系统在深度和表面重建上仍面临挑战，主要由于多视角之间的不一致性。简单地将 SLAM 和 3DGS 结合会导致重建帧之间的不匹配。为了解决这些问题，我们提出了 Endo-2DTAM，一种基于 2D 高斯 Splatting（2DGS）的实时内窥镜 SLAM 系统。Endo-2DTAM 引入了一个表面法线感知的流程，包括跟踪、建图和束调整模块，以实现几何精确的重建。我们的稳健跟踪模块结合了点对点和点对平面距离度量，而建图模块则利用法线一致性和深度失真来提高表面重建质量。我们还提出了一种姿态一致性策略，用于高效且几何一致的关键帧采样。基于公共内窥镜数据集的大量实验表明，Endo-2DTAM 在外科场景的深度重建中实现了 1.87±0.63 毫米的 RMSE，同时保持了计算高效的跟踪、高质量的视觉效果和实时渲染能力。\n"
  },
  {
    "path": "abs/2502.00173.md",
    "content": "### Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation\n\nWe introduce Lifting By Gaussians (LBG), a novel approach for open-world instance segmentation of 3D Gaussian Splatted Radiance Fields (3DGS). Recently, 3DGS Fields have emerged as a highly efficient and explicit alternative to Neural Field-based methods for high-quality Novel View Synthesis. Our 3D instance segmentation method directly lifts 2D segmentation masks from SAM (alternately FastSAM, etc.), together with features from CLIP and DINOv2, directly fusing them onto 3DGS (or similar Gaussian radiance fields such as 2DGS). Unlike previous approaches, LBG requires no per-scene training, allowing it to operate seamlessly on any existing 3DGS reconstruction. Our approach is not only an order of magnitude faster and simpler than existing approaches; it is also highly modular, enabling 3D semantic segmentation of existing 3DGS fields without requiring a specific parametrization of the 3D Gaussians. Furthermore, our technique achieves superior semantic segmentation for 2D semantic novel view synthesis and 3D asset extraction results while maintaining flexibility and efficiency. We further introduce a novel approach to evaluate individually segmented 3D assets from 3D radiance field segmentation methods.\n\n我们提出了 Lifting By Gaussians (LBG)，一种用于 3D 高斯 Splatting 辐射场（3DGS）开放世界实例分割的新方法。近年来，3DGS 辐射场作为一种高效且显式的替代方案，已成为基于神经场的方法进行高质量新视角合成的有力竞争者。我们的 3D 实例分割方法直接从 SAM（或其他如 FastSAM 等）中提取 2D 分割掩模，并结合来自 CLIP 和 DINOv2 的特征，直接将它们融合到 3DGS（或类似的高斯辐射场如 2DGS）中。与之前的方法不同，LBG 不需要针对每个场景进行训练，从而能够无缝地在任何现有的 3DGS 重建上运行。我们的方法不仅比现有方法速度快、实现更简单，而且具有高度模块化的特点，使得在无需对 3D 高斯进行特定参数化的情况下，能够实现现有 3DGS 辐射场的 3D 语义分割。此外，我们的技术在 2D 语义新视角合成和 3D 资产提取结果中实现了优越的语义分割效果，同时保持了灵活性和高效性。我们还提出了一种新的方法，用于评估从 3D 辐射场分割方法中单独分割出的 3D 资产。\n"
  },
  {
    "path": "abs/2502.00654.md",
    "content": "### EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis\n\n3D Gaussian splatting-based talking head synthesis has recently gained attention for its ability to render high-fidelity images with real-time inference speed. However, since it is typically trained on only a short video that lacks the diversity in facial emotions, the resultant talking heads struggle to represent a wide range of emotions. To address this issue, we propose a lip-aligned emotional face generator and leverage it to train our EmoTalkingGaussian model. It is able to manipulate facial emotions conditioned on continuous emotion values (i.e., valence and arousal); while retaining synchronization of lip movements with input audio. Additionally, to achieve the accurate lip synchronization for in-the-wild audio, we introduce a self-supervised learning method that leverages a text-to-speech network and a visual-audio synchronization network. We experiment our EmoTalkingGaussian on publicly available videos and have obtained better results than state-of-the-arts in terms of image quality (measured in PSNR, SSIM, LPIPS), emotion expression (measured in V-RMSE, A-RMSE, V-SA, A-SA, Emotion Accuracy), and lip synchronization (measured in LMD, Sync-E, Sync-C), respectively.\n\n基于 3D 高斯 Splatting 的说话人合成因其能够以实时推理速度渲染高保真图像而受到关注。然而，由于通常仅在缺乏面部情感多样性、且仅包含短视频数据上进行训练，生成的说话人面部难以表现广泛的情感。为了解决这一问题，我们提出了一种唇部对齐的情感面部生成器，并利用该生成器来训练我们的 EmoTalkingGaussian 模型。该模型能够根据连续的情感值（即情感价值和唤醒度）调节面部表情，同时保持与输入音频的唇部动作同步。此外，为了实现野外音频的准确唇部同步，我们引入了一种自监督学习方法，该方法利用文本转语音网络和视觉音频同步网络。我们在公开视频数据集上对 EmoTalkingGaussian 进行了实验，并在图像质量（以 PSNR、SSIM、LPIPS 测量）、情感表达（以 V-RMSE、A-RMSE、V-SA、A-SA、情感准确度测量）以及唇部同步（以 LMD、Sync-E、Sync-C 测量）方面获得了优于现有方法的结果。\n"
  },
  {
    "path": "abs/2502.00708.md",
    "content": "### PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation\n\nText-to-3D asset generation has achieved significant optimization under the supervision of 2D diffusion priors. However, when dealing with compositional scenes, existing methods encounter several challenges: 1). failure to ensure that composite scene layouts comply with physical laws; 2). difficulty in accurately capturing the assets and relationships described in complex scene descriptions; 3). limited autonomous asset generation capabilities among layout approaches leveraging large language models (LLMs). To avoid these compromises, we propose a novel framework for compositional scene generation, PhiP-G, which seamlessly integrates generation techniques with layout guidance based on a world model. Leveraging LLM-based agents, PhiP-G analyzes the complex scene description to generate a scene graph, and integrating a multimodal 2D generation agent and a 3D Gaussian generation method for targeted assets creation. For the stage of layout, PhiP-G employs a physical pool with adhesion capabilities and a visual supervision agent, forming a world model for layout prediction and planning. Extensive experiments demonstrate that PhiP-G significantly enhances the generation quality and physical rationality of the compositional scenes. Notably, PhiP-G attains state-of-the-art (SOTA) performance in CLIP scores, achieves parity with the leading methods in generation quality as measured by the T3Bench, and improves efficiency by 24x.\n\n文本到 3D 资产生成在 2D 扩散先验的监督下取得了显著优化。然而，在处理复合场景时，现有方法面临多个挑战：1）无法确保复合场景布局符合物理定律；2）难以准确捕捉复杂场景描述中提到的资产和它们之间的关系；3）利用大语言模型（LLMs）的布局方法在自主资产生成能力方面有限。为避免这些妥协，我们提出了一种新的复合场景生成框架 PhiP-G，该框架无缝地将生成技术与基于世界模型的布局引导相结合。通过利用基于 LLM 的代理，PhiP-G 分析复杂的场景描述并生成场景图，同时集成多模态的 2D 生成代理和 3D 高斯生成方法用于有针对性的资产创建。在布局阶段，PhiP-G 使用具有附着能力的物理池和视觉监督代理，形成一个用于布局预测和规划的世界模型。大量实验表明，PhiP-G 显著提升了复合场景的生成质量和物理合理性。值得注意的是，PhiP-G 在 CLIP 分数上达到了最先进的（SOTA）性能，在 T3Bench 测量的生成质量上与领先方法持平，并提高了 24 倍的效率。\n"
  },
  {
    "path": "abs/2502.01157.md",
    "content": "### Radiant Foam: Real-Time Differentiable Ray Tracing\n\nResearch on differentiable scene representations is consistently moving towards more efficient, real-time models. Recently, this has led to the popularization of splatting methods, which eschew the traditional ray-based rendering of radiance fields in favor of rasterization. This has yielded a significant improvement in rendering speeds due to the efficiency of rasterization algorithms and hardware, but has come at a cost: the approximations that make rasterization efficient also make implementation of light transport phenomena like reflection and refraction much more difficult. We propose a novel scene representation which avoids these approximations, but keeps the efficiency and reconstruction quality of splatting by leveraging a decades-old efficient volumetric mesh ray tracing algorithm which has been largely overlooked in recent computer vision research. The resulting model, which we name Radiant Foam, achieves rendering speed and quality comparable to Gaussian Splatting, without the constraints of rasterization. Unlike ray traced Gaussian models that use hardware ray tracing acceleration, our method requires no special hardware or APIs beyond the standard features of a programmable GPU.\n\n可微场景表示的研究始终朝着更高效、实时的模型发展。最近，这推动了点云方法的普及，这些方法摒弃了传统的基于光线的辐射场渲染，而采用光栅化渲染。由于光栅化算法和硬件的高效性，这在渲染速度上取得了显著提高，但也带来了代价：使光栅化高效的近似方法也使得光传输现象（如反射和折射）的实现变得更加困难。我们提出了一种新型的场景表示方法，避免了这些近似，但通过利用一种已经被计算机视觉研究大多忽视的几十年历史的高效体积网格光线追踪算法，保留了点云方法的效率和重建质量。我们命名这种模型为Radiant Foam，它在渲染速度和质量上与高斯点云（Gaussian Splatting）相当，但没有光栅化的约束。与使用硬件光线追踪加速的光线追踪高斯模型不同，我们的方法不需要特殊硬件或API，仅依赖于可编程GPU的标准功能。\n"
  },
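The classic volumetric-mesh ray tracing referenced in the Radiant Foam abstract marches a ray from cell to cell by locating the exit face of the current tetrahedron, visiting cells in depth order. A minimal exit-face test, assuming the ray origin lies inside the cell; the cell-to-cell marching loop and any Delaunay construction are omitted.

```python
import numpy as np

# Face i of a tetrahedron is the triangle opposite vertex i.
FACES = [(1, 2, 3), (0, 3, 2), (0, 1, 3), (0, 2, 1)]

def exit_face(verts, o, d, eps=1e-9):
    """Find the face through which a ray leaves a tetrahedron.

    verts: (4, 3) tetrahedron vertices; o, d: ray origin (inside the
    cell) and direction. For each face plane with outward normal, the
    smallest positive ray-plane hit along an exiting direction is the
    exit face; marching through that face's neighbor continues the
    traversal.
    """
    best_t, best_f = np.inf, -1
    for f, (i, j, k) in enumerate(FACES):
        n = np.cross(verts[j] - verts[i], verts[k] - verts[i])
        if n @ (verts[f] - verts[i]) > 0:   # orient away from opposite vertex
            n = -n
        denom = n @ d
        if denom <= eps:                    # parallel or entering: skip
            continue
        t = (n @ (verts[i] - o)) / denom
        if 0 < t < best_t:
            best_t, best_f = t, f
    return best_f, best_t

verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1.0]])
f, t = exit_face(verts, o=np.array([0.1, 0.1, 0.1]), d=np.array([1.0, 0, 0]))
print(f, t)   # face 0 (the x+y+z=1 plane), t = 0.7
```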
  {
    "path": "abs/2502.01536.md",
    "content": "### VR-Robo: A Real-to-Sim-to-Real Framework for Visual Robot Navigation and Locomotion\n\nRecent success in legged robot locomotion is attributed to the integration of reinforcement learning and physical simulators. However, these policies often encounter challenges when deployed in real-world environments due to sim-to-real gaps, as simulators typically fail to replicate visual realism and complex real-world geometry. Moreover, the lack of realistic visual rendering limits the ability of these policies to support high-level tasks requiring RGB-based perception like ego-centric navigation. This paper presents a Real-to-Sim-to-Real framework that generates photorealistic and physically interactive \"digital twin\" simulation environments for visual navigation and locomotion learning. Our approach leverages 3D Gaussian Splatting (3DGS) based scene reconstruction from multi-view images and integrates these environments into simulations that support ego-centric visual perception and mesh-based physical interactions. To demonstrate its effectiveness, we train a reinforcement learning policy within the simulator to perform a visual goal-tracking task. Extensive experiments show that our framework achieves RGB-only sim-to-real policy transfer. Additionally, our framework facilitates the rapid adaptation of robot policies with effective exploration capability in complex new environments, highlighting its potential for applications in households and factories.\n\n最近，四足机器人运动的成功归功于强化学习与物理仿真器的结合。然而，这些策略在部署到实际环境时常常面临挑战，因为仿真器通常无法真实再现视觉效果和复杂的现实世界几何形状，这导致了模拟与现实之间的差距。此外，缺乏逼真的视觉渲染限制了这些策略支持高层任务（如基于RGB的自我导航）的能力。本文提出了一种“现实-仿真-现实”框架，生成逼真且物理互动的“数字孪生”仿真环境，用于视觉导航和运动学习。我们的方法利用基于3D高斯溅射（3DGS）的多视角图像场景重建，并将这些环境集成到支持自我中心视觉感知和基于网格的物理交互的仿真中。为了验证其有效性，我们在仿真器中训练了一个强化学习策略，执行视觉目标跟踪任务。大量实验表明，我们的框架实现了仅基于RGB的模拟到现实策略转移。此外，我们的框架还促进了机器人策略在复杂新环境中的快速适应，并具有有效的探索能力，突显了其在家庭和工厂应用中的潜力。\n"
  },
  {
    "path": "abs/2502.01826.md",
    "content": "### Scalable 3D Gaussian Splatting-Based RF Signal Spatial Propagation Modeling\n\nEffective network planning and sensing in wireless networks require resource-intensive site surveys for data collection. An alternative is Radio-Frequency (RF) signal spatial propagation modeling, which computes received signals given transceiver positions in a scene (e.g.s a conference room). We identify a fundamental trade-off between scalability and fidelity in the state-of-the-art method. To address this issue, we explore leveraging 3D Gaussian Splatting (3DGS), an advanced technique for the image synthesis of 3D scenes in real-time from arbitrary camera poses. By integrating domain-specific insights, we design three components for adapting 3DGS to the RF domain, including Gaussian-based RF scene representation, gradient-guided RF attribute learning, and RF-customized CUDA for ray tracing. Building on them, we develop RFSPM, an end-to-end framework for scalable RF signal Spatial Propagation Modeling. We evaluate RFSPM in four field studies and two applications across RFID, BLE, LoRa, and 5G, covering diverse frequencies, antennas, signals, and scenes. The results show that RFSPM matches the fidelity of the state-of-the-art method while reducing data requirements, training GPU-hours, and inference latency by up to 9.8×, 18.6×, and 84.4×, respectively.\n\n有效的无线网络规划和感知需要资源密集型的现场调查来收集数据。另一种替代方法是射频（RF）信号空间传播建模，该方法根据场景中发射机和接收机的位置计算接收到的信号（例如会议室）。我们发现现有方法在可扩展性和保真度之间存在一个基本的权衡。为了解决这个问题，我们探索了利用3D高斯溅射（3DGS），一种用于实时合成3D场景图像的先进技术，可从任意相机位置生成图像。通过结合领域特定的见解，我们设计了三个组件，将3DGS适配到射频领域，包括基于高斯的射频场景表示、梯度引导的射频属性学习和射频定制CUDA用于光线追踪。在此基础上，我们开发了RFSPM，一个端到端的可扩展射频信号空间传播建模框架。我们在四个实地研究和两个应用中评估了RFSPM，涵盖了RFID、BLE、LoRa和5G，涉及不同的频率、天线、信号和场景。结果表明，RFSPM在保真度上与现有方法相当，同时将数据需求、训练GPU小时数和推理延迟分别减少了多达9.8倍、18.6倍和84.4倍。\n"
  },
  {
    "path": "abs/2502.01846.md",
    "content": "### UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping\n\n3D Gaussian Splatting (3DGS) has demonstrated superior quality in modeling 3D objects and scenes. However, generating 3DGS remains challenging due to their discrete, unstructured, and permutation-invariant nature. In this work, we present a simple yet effective method to overcome these challenges. We utilize spherical mapping to transform 3DGS into a structured 2D representation, termed UVGS. UVGS can be viewed as multi-channel images, with feature dimensions as a concatenation of Gaussian attributes such as position, scale, color, opacity, and rotation. We further find that these heterogeneous features can be compressed into a lower-dimensional (e.g., 3-channel) shared feature space using a carefully designed multi-branch network. The compressed UVGS can be treated as typical RGB images. Remarkably, we discover that typical VAEs trained with latent diffusion models can directly generalize to this new representation without additional training. Our novel representation makes it effortless to leverage foundational 2D models, such as diffusion models, to directly model 3DGS. Additionally, one can simply increase the 2D UV resolution to accommodate more Gaussians, making UVGS a scalable solution compared to typical 3D backbones. This approach immediately unlocks various novel generation applications of 3DGS by inherently utilizing the already developed superior 2D generation capabilities. In our experiments, we demonstrate various unconditional, conditional generation, and inpainting applications of 3DGS based on diffusion models, which were previously non-trivial.\n\n3D高斯溅射（3DGS）在建模3D物体和场景方面展示了卓越的质量。然而，由于其离散、无结构且不变的排列特性，生成3DGS仍然具有挑战性。在本研究中，我们提出了一种简单而有效的方法来克服这些挑战。我们利用球面映射将3DGS转化为结构化的2D表示，称为UVGS。UVGS可以被视为多通道图像，其特征维度是多个高斯属性的拼接，如位置、尺度、颜色、不透明度和旋转。我们进一步发现，这些异质特征可以通过精心设计的多分支网络压缩到一个低维（例如3通道）共享特征空间。压缩后的UVGS可以被视为典型的RGB图像。值得注意的是，我们发现，使用潜在扩散模型训练的典型变分自编码器（VAE）可以直接泛化到这种新表示，而无需额外训练。我们创新的表示方法使得利用基础的2D模型（如扩散模型）直接建模3DGS变得轻而易举。此外，通过简单地增加2D UV分辨率以适应更多的高斯，UVGS相较于典型的3D骨干网络，提供了一个可扩展的解决方案。这一方法通过本质上利用已开发的优越2D生成能力，立即开启了3DGS的各种新型生成应用。在我们的实验中，我们展示了基于扩散模型的多种无条件、条件生成和图像修复应用，之前这些任务并非易事。\n"
  },
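A minimal sketch of the spherical-mapping idea in UVGS, under simplifying assumptions: each Gaussian is projected along its direction from the centroid onto an equirectangular (u, v) grid, and its concatenated attributes become the channels of a multi-channel image. Collision handling, the exact mapping, and the multi-branch compression network are omitted; `uvgs_map` is a hypothetical helper, not the paper's code.

```python
import numpy as np

def uvgs_map(positions, attributes, H=256, W=512):
    """positions: (N, 3); attributes: (N, C) -> (H, W, C) UV image."""
    d = positions - positions.mean(0)
    d /= np.linalg.norm(d, axis=1, keepdims=True) + 1e-8
    theta = np.arccos(np.clip(d[:, 2], -1, 1))        # polar angle in [0, pi]
    phi = np.arctan2(d[:, 1], d[:, 0])                # azimuth in [-pi, pi]
    v = np.clip((theta / np.pi * H).astype(int), 0, H - 1)
    u = np.clip(((phi + np.pi) / (2 * np.pi) * W).astype(int), 0, W - 1)
    img = np.zeros((H, W, attributes.shape[1]), dtype=attributes.dtype)
    img[v, u] = attributes   # one write wins on collisions (simplification)
    return img

N = 10_000
pos = np.random.randn(N, 3)
# e.g. concatenate [xyz(3), scale(3), rgb(3), opacity(1), quaternion(4)] -> 14 channels
attrs = np.random.rand(N, 14).astype(np.float32)
uv_image = uvgs_map(pos, attrs)
print(uv_image.shape)  # (256, 512, 14)
```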
  {
    "path": "abs/2502.01949.md",
    "content": "### LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation\n\nRecently, the field of text-guided 3D scene generation has garnered significant attention. High-quality generation that aligns with physical realism and high controllability is crucial for practical 3D scene applications. However, existing methods face fundamental limitations: (i) difficulty capturing complex relationships between multiple objects described in the text, (ii) inability to generate physically plausible scene layouts, and (iii) lack of controllability and extensibility in compositional scenes. In this paper, we introduce LayoutDreamer, a framework that leverages 3D Gaussian Splatting (3DGS) to facilitate high-quality, physically consistent compositional scene generation guided by text. Specifically, given a text prompt, we convert it into a directed scene graph and adaptively adjust the density and layout of the initial compositional 3D Gaussians. Subsequently, dynamic camera adjustments are made based on the training focal point to ensure entity-level generation quality. Finally, by extracting directed dependencies from the scene graph, we tailor physical and layout energy to ensure both realism and flexibility. Comprehensive experiments demonstrate that LayoutDreamer outperforms other compositional scene generation quality and semantic alignment methods. Specifically, it achieves state-of-the-art (SOTA) performance in the multiple objects generation metric of T3Bench.\n\n最近，文本引导的3D场景生成领域引起了广泛关注。高质量的生成不仅要与物理现实对齐，还要具备高度可控性，这对于实际的3D场景应用至关重要。然而，现有方法面临着根本性的限制：（i）难以捕捉文本中描述的多个物体之间的复杂关系；（ii）无法生成物理上合理的场景布局；（iii）在组合场景中缺乏可控性和可扩展性。本文介绍了LayoutDreamer，一个利用3D高斯溅射（3DGS）促进高质量、物理一致的组合场景生成框架，该框架由文本引导。具体而言，给定一个文本提示，我们将其转换为一个定向场景图，并自适应调整初始组合3D高斯的密度和布局。随后，基于训练焦点进行动态相机调整，以确保实体级别的生成质量。最后，通过从场景图中提取定向依赖关系，我们量身定制物理和布局能量，以确保现实性和灵活性。全面的实验表明，LayoutDreamer在组合场景生成质量和语义对齐方面超越了其他方法。具体而言，在T3Bench的多物体生成指标中，LayoutDreamer达到了最先进的（SOTA）性能。\n"
  },
  {
    "path": "abs/2502.02091.md",
    "content": "### Instruct-4DGS: Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation\n\nRecent 4D dynamic scene editing methods require editing thousands of 2D images used for dynamic scene synthesis and updating the entire scene with additional training loops, resulting in several hours of processing to edit a single dynamic scene. Therefore, these methods are not scalable with respect to the temporal dimension of the dynamic scene (i.e., the number of timesteps). In this work, we propose Instruct-4DGS, an efficient dynamic scene editing method that is more scalable in terms of temporal dimension. To achieve computational efficiency, we leverage a 4D Gaussian representation that models a 4D dynamic scene by combining static 3D Gaussians with a Hexplane-based deformation field, which captures dynamic information. We then perform editing solely on the static 3D Gaussians, which is the minimal but sufficient component required for visual editing. To resolve the misalignment between the edited 3D Gaussians and the deformation field, which may arise from the editing process, we introduce a refinement stage using a score distillation mechanism. Extensive editing results demonstrate that Instruct-4DGS is efficient, reducing editing time by more than half compared to existing methods while achieving high-quality edits that better follow user instructions.\n\n现有的4D动态场景编辑方法需要对用于动态场景合成的数千张2D图像进行修改，并通过额外的训练迭代更新整个场景，因此编辑单个动态场景往往需要数小时处理时间。因此，这些方法在动态场景的时间维度（即时间步数）上缺乏可扩展性。本文提出了 Instruct-4DGS，这是一种在时间维度上具有更高可扩展性的高效动态场景编辑方法。为实现计算效率，我们采用了一种4D高斯表示，通过结合静态3D高斯与基于Hexplane的形变场来建模4D动态场景，其中形变场用于捕捉动态信息。随后，我们仅在静态3D高斯上进行编辑，这是实现视觉编辑所需的最小但充分的部分。为了解决在编辑过程中可能出现的已编辑3D高斯与形变场之间的不对齐问题，我们引入了一个基于得分蒸馏机制的精炼阶段。大量的编辑结果表明，Instruct-4DGS高效，将编辑时间相比现有方法缩短了一半以上，同时能够实现更高质量的编辑，并更好地遵循用户指令。\n"
  },
  {
    "path": "abs/2502.02283.md",
    "content": "### GP-GS: Gaussian Processes for Enhanced Gaussian Splatting\n\n3D Gaussian Splatting has emerged as an efficient photorealistic novel view synthesis method. However, its reliance on sparse Structure-from-Motion (SfM) point clouds consistently compromises the scene reconstruction quality. To address these limitations, this paper proposes a novel 3D reconstruction framework Gaussian Processes Gaussian Splatting (GP-GS), where a multi-output Gaussian Process model is developed to achieve adaptive and uncertainty-guided densification of sparse SfM point clouds. Specifically, we propose a dynamic sampling and filtering pipeline that adaptively expands the SfM point clouds by leveraging GP-based predictions to infer new candidate points from the input 2D pixels and depth maps. The pipeline utilizes uncertainty estimates to guide the pruning of high-variance predictions, ensuring geometric consistency and enabling the generation of dense point clouds. The densified point clouds provide high-quality initial 3D Gaussians to enhance reconstruction performance. Extensive experiments conducted on synthetic and real-world datasets across various scales validate the effectiveness and practicality of the proposed framework.\n\n3D高斯溅射（3DGS）已成为一种高效的逼真新视角合成方法。然而，它对稀疏的结构光从运动（SfM）点云的依赖始终会影响场景重建的质量。为了解决这些局限性，本文提出了一种新颖的3D重建框架——高斯过程高斯溅射（GP-GS），在该框架中，开发了一个多输出高斯过程模型，以实现稀疏SfM点云的自适应和不确定性引导的密集化。具体而言，我们提出了一种动态采样和过滤管道，利用基于高斯过程的预测，通过从输入的2D像素和深度图推断新的候选点，自适应地扩展SfM点云。该管道利用不确定性估计来引导高方差预测的修剪，从而确保几何一致性并生成稠密的点云。这些密集化的点云提供了高质量的初始3D高斯，用于增强重建性能。在合成和真实世界数据集上进行的大量实验验证了所提出框架的有效性和实用性。\n"
  },
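The uncertainty-guided densification in GP-GS can be illustrated with an ordinary Gaussian Process regressor: fit depth over pixel coordinates from sparse SfM points, sample a dense grid, and keep only low-variance predictions as new candidates. This is a simplified stand-in; the paper's multi-output GP over further attributes and its dynamic sampling pipeline are not reproduced, and all names and thresholds here are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
uv_sparse = rng.uniform(0, 1, size=(120, 2))          # normalized pixel coords
depth_sparse = 2.0 + np.sin(3 * uv_sparse[:, 0]) + 0.05 * rng.standard_normal(120)

gp = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(1e-3), normalize_y=True)
gp.fit(uv_sparse, depth_sparse)

# Dense candidate grid, then uncertainty-based pruning.
g = np.linspace(0, 1, 64)
uv_dense = np.stack(np.meshgrid(g, g), -1).reshape(-1, 2)
mu, std = gp.predict(uv_dense, return_std=True)
keep = std < np.quantile(std, 0.5)                    # prune high-variance points
print(f"kept {keep.sum()} / {len(uv_dense)} candidates")
# The kept (u, v, depth) triplets would then be back-projected with the
# camera intrinsics to initialize new 3D Gaussians.
```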
  {
    "path": "abs/2502.03228.md",
    "content": "### GARAD-SLAM: 3D GAussian splatting for Real-time Anti Dynamic SLAM\n\nThe 3D Gaussian Splatting (3DGS)-based SLAM system has garnered widespread attention due to its excellent performance in real-time high-fidelity rendering. However, in real-world environments with dynamic objects, existing 3DGS-based SLAM systems often face mapping errors and tracking drift issues. To address these problems, we propose GARAD-SLAM, a real-time 3DGS-based SLAM system tailored for dynamic scenes. In terms of tracking, unlike traditional methods, we directly perform dynamic segmentation on Gaussians and map them back to the front-end to obtain dynamic point labels through a Gaussian pyramid network, achieving precise dynamic removal and robust tracking. For mapping, we impose rendering penalties on dynamically labeled Gaussians, which are updated through the network, to avoid irreversible erroneous removal caused by simple pruning. Our results on real-world datasets demonstrate that our method is competitive in tracking compared to baseline methods, generating fewer artifacts and higher-quality reconstructions in rendering.\n\n基于3D高斯溅射（3DGS）的SLAM系统因其在实时高保真渲染中的优异表现而广受关注。然而，在包含动态物体的真实环境中，现有的基于3DGS的SLAM系统常常面临映射错误和跟踪漂移问题。为了解决这些问题，我们提出了GARAD-SLAM，一个专为动态场景量身定制的实时3DGS基SLAM系统。在跟踪方面，与传统方法不同，我们直接对高斯进行动态分割，并通过高斯金字塔网络将其映射回前端，通过高斯的动态标签实现精确的动态物体去除和稳健的跟踪。在映射方面，我们对动态标记的高斯施加渲染惩罚，这些高斯通过网络进行更新，从而避免了简单修剪导致的不可逆错误去除。我们在真实世界数据集上的结果表明，与基线方法相比，我们的方法在跟踪方面具有竞争力，渲染中生成的伪影较少，并且重建质量更高。\n"
  },
  {
    "path": "abs/2502.04630.md",
    "content": "### High-Speed Dynamic 3D Imaging with Sensor Fusion Splatting\n\nCapturing and reconstructing high-speed dynamic 3D scenes has numerous applications in computer graphics, vision, and interdisciplinary fields such as robotics, aerodynamics, and evolutionary biology. However, achieving this using a single imaging modality remains challenging. For instance, traditional RGB cameras suffer from low frame rates, limited exposure times, and narrow baselines. To address this, we propose a novel sensor fusion approach using Gaussian splatting, which combines RGB, depth, and event cameras to capture and reconstruct deforming scenes at high speeds. The key insight of our method lies in leveraging the complementary strengths of these imaging modalities: RGB cameras capture detailed color information, event cameras record rapid scene changes with microsecond resolution, and depth cameras provide 3D scene geometry. To unify the underlying scene representation across these modalities, we represent the scene using deformable 3D Gaussians. To handle rapid scene movements, we jointly optimize the 3D Gaussian parameters and their temporal deformation fields by integrating data from all three sensor modalities. This fusion enables efficient, high-quality imaging of fast and complex scenes, even under challenging conditions such as low light, narrow baselines, or rapid motion. Experiments on synthetic and real datasets captured with our prototype sensor fusion setup demonstrate that our method significantly outperforms state-of-the-art techniques, achieving noticeable improvements in both rendering fidelity and structural accuracy.\n\n捕捉和重建高速动态3D场景在计算机图形学、视觉以及机器人学、空气动力学和进化生物学等跨学科领域中具有广泛的应用。然而，使用单一成像方式实现这一目标仍然具有挑战性。例如，传统RGB相机受限于低帧率、有限的曝光时间和狭窄的基线。为了解决这个问题，我们提出了一种新颖的传感器融合方法，利用高斯溅射技术，将RGB、深度和事件相机结合起来，用于捕捉和重建高速变形场景。我们方法的关键在于利用这些成像方式的互补优势：RGB相机捕捉详细的颜色信息，事件相机以微秒级分辨率记录快速场景变化，深度相机提供3D场景几何信息。为了统一这些模式下的场景表示，我们使用可变形的3D高斯表示场景。为了处理快速场景运动，我们通过整合来自三种传感器模式的数据，联合优化3D高斯参数及其时间变形场。这种融合方法使得在低光、狭窄基线或快速运动等挑战性条件下，仍能高效且高质量地成像快速复杂的场景。我们在使用原型传感器融合系统捕获的合成和真实数据集上的实验结果表明，我们的方法显著超越了现有的最先进技术，在渲染保真度和结构准确性上都取得了显著提升。\n"
  },
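The sensor-fusion paper above jointly optimizes deformable Gaussians against three modalities. A hedged sketch of how the three supervision terms could be combined on top of a differentiable renderer follows; `render_rgb`, `render_depth`, the event integral, and the loss weights are placeholders, not the paper's actual formulation. The event term uses the standard approximation that accumulated events measure the log-intensity change between two instants.

```python
import torch

def event_loss(render_rgb, params, t0, t1, event_map, eps=1e-4):
    """Events approximate the log-intensity change between two instants."""
    logI0 = torch.log(render_rgb(params, t0).mean(-1) + eps)
    logI1 = torch.log(render_rgb(params, t1).mean(-1) + eps)
    return torch.mean((logI1 - logI0 - event_map) ** 2)

def fused_loss(render_rgb, render_depth, params, t, gt_rgb, gt_depth,
               t0, t1, event_map, w_rgb=1.0, w_depth=0.1, w_event=0.5):
    l_rgb = torch.mean(torch.abs(render_rgb(params, t) - gt_rgb))
    l_depth = torch.mean(torch.abs(render_depth(params, t) - gt_depth))
    l_event = event_loss(render_rgb, params, t0, t1, event_map)
    return w_rgb * l_rgb + w_depth * l_depth + w_event * l_event

# Dummy usage with stand-in renderers to show gradients flow to the
# (deformable Gaussian) parameters:
H, W = 8, 8
dummy = lambda params, t: torch.sigmoid(params).expand(H, W, 3)
params = torch.zeros(1, requires_grad=True)
loss = fused_loss(dummy, lambda p, t: dummy(p, t)[..., 0], params, 0.5,
                  torch.rand(H, W, 3), torch.rand(H, W),
                  0.4, 0.6, torch.zeros(H, W))
loss.backward()
```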
  {
    "path": "abs/2502.04734.md",
    "content": "### SC-OmniGS: Self-Calibrating Omnidirectional Gaussian Splatting\n\n360-degree cameras streamline data collection for radiance field 3D reconstruction by capturing comprehensive scene data. However, traditional radiance field methods do not address the specific challenges inherent to 360-degree images. We present SC-OmniGS, a novel self-calibrating omnidirectional Gaussian splatting system for fast and accurate omnidirectional radiance field reconstruction using 360-degree images. Rather than converting 360-degree images to cube maps and performing perspective image calibration, we treat 360-degree images as a whole sphere and derive a mathematical framework that enables direct omnidirectional camera pose calibration accompanied by 3D Gaussians optimization. Furthermore, we introduce a differentiable omnidirectional camera model in order to rectify the distortion of real-world data for performance enhancement. Overall, the omnidirectional camera intrinsic model, extrinsic poses, and 3D Gaussians are jointly optimized by minimizing weighted spherical photometric loss. Extensive experiments have demonstrated that our proposed SC-OmniGS is able to recover a high-quality radiance field from noisy camera poses or even no pose prior in challenging scenarios characterized by wide baselines and non-object-centric configurations. The noticeable performance gain in the real-world dataset captured by consumer-grade omnidirectional cameras verifies the effectiveness of our general omnidirectional camera model in reducing the distortion of 360-degree images.\n\n360度相机通过捕捉全面的场景数据，简化了辐射场3D重建的数据收集过程。然而，传统的辐射场方法并未解决360度图像固有的特定挑战。我们提出了SC-OmniGS，一种新颖的自校准全向高斯溅射系统，旨在利用360度图像实现快速且精确的全向辐射场重建。不同于将360度图像转换为立方体图并进行透视图像校准，我们将360度图像视为一个完整的球体，并推导出一个数学框架，使得能够直接进行全向相机位姿校准，同时优化3D高斯。此外，我们引入了一个可微分的全向相机模型，以纠正真实数据的畸变，从而提高性能。总体而言，全向相机内参模型、外参位姿和3D高斯通过最小化加权球面光度损失共同优化。大量实验表明，我们提出的SC-OmniGS能够在具有宽基线和非物体中心配置的挑战性场景中，从噪声相机位姿甚至无位姿先验的情况下恢复高质量的辐射场。由消费级全向相机捕获的真实世界数据集上，性能的显著提升验证了我们通用全向相机模型在减少360度图像畸变方面的有效性。\n"
  },
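One natural reading of the "weighted spherical photometric loss" named above is a solid-angle weighting: on an equirectangular image, rows near the poles cover less solid angle and should count less, by a factor of sin(theta). The sketch below shows only this weighting; SC-OmniGS additionally optimizes the omnidirectional intrinsics and poses jointly, and its exact weights are not given in the abstract.

```python
import torch

def weighted_spherical_l1(render, target):
    """render/target: (H, W, 3) equirectangular images."""
    H = render.shape[0]
    theta = (torch.arange(H, dtype=render.dtype) + 0.5) / H * torch.pi  # (H,)
    w = torch.sin(theta)[:, None, None]            # per-row solid-angle weight
    return (w * (render - target).abs()).sum() / w.expand_as(render).sum()

img_a, img_b = torch.rand(64, 128, 3), torch.rand(64, 128, 3)
print(weighted_spherical_l1(img_a, img_b).item())
```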
  {
    "path": "abs/2502.04981.md",
    "content": "### OccGS: Zero-shot 3D Occupancy Reconstruction with Semantic and Geometric-Aware Gaussian Splatting\n\nObtaining semantic 3D occupancy from raw sensor data without manual annotations remains an essential yet challenging task. While prior works have approached this as a perception prediction problem, we formulate it as scene-aware 3D occupancy reconstruction with geometry and semantics. In this work, we propose OccGS, a novel 3D Occupancy reconstruction framework utilizing Semantic and Geometric-Aware Gaussian Splatting in a zero-shot manner. Leveraging semantics extracted from vision-language models and geometry guided by LiDAR points, OccGS constructs Semantic and Geometric-Aware Gaussians from raw multisensor data. We also develop a cumulative Gaussian-to-3D voxel splatting method for reconstructing occupancy from the Gaussians. OccGS performs favorably against self-supervised methods in occupancy prediction, achieving comparable performance to fully supervised approaches and achieving state-of-the-art performance on zero-shot semantic 3D occupancy estimation.\n\n从原始传感器数据中获取语义3D占据信息而无需人工标注，依然是一个至关重要且具有挑战性的任务。尽管先前的研究将其视为感知预测问题，但我们将其表述为具有几何和语义的场景感知3D占据重建。在本研究中，我们提出了OccGS，一种利用语义和几何感知高斯溅射的零样本3D占据重建框架。OccGS通过利用从视觉-语言模型提取的语义信息和通过LiDAR点引导的几何信息，从原始多传感器数据中构建语义和几何感知高斯。我们还开发了一种累积的高斯到3D体素溅射方法，用于从高斯中重建占据信息。与自监督方法相比，OccGS在占据预测方面表现优异，达到了与完全监督方法相当的性能，并在零样本语义3D占据估计任务中实现了最先进的性能。\n"
  },
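A simplified version of the "cumulative Gaussian-to-3D voxel splatting" that OccGS describes: accumulate each Gaussian's opacity-weighted semantic scores into the voxel containing its center, then threshold for occupancy and take the per-voxel argmax label. The paper's scheme is likely finer-grained (e.g., spreading mass over the covariance support); this conveys only the basic accumulation idea, and all names are illustrative.

```python
import numpy as np

def splat_to_voxels(centers, opacities, sem_scores, grid_min, voxel_size, dims):
    n_cls = sem_scores.shape[1]
    vox = np.zeros((*dims, n_cls), dtype=np.float32)
    idx = np.floor((centers - grid_min) / voxel_size).astype(int)
    ok = np.all((idx >= 0) & (idx < np.array(dims)), axis=1)
    for (i, j, k), a, s in zip(idx[ok], opacities[ok], sem_scores[ok]):
        vox[i, j, k] += a * s                      # cumulative accumulation
    occ = vox.sum(-1) > 0.5                        # occupancy threshold
    labels = vox.argmax(-1)                        # per-voxel semantic label
    return occ, labels

rng = np.random.default_rng(1)
occ, labels = splat_to_voxels(rng.uniform(0, 10, (500, 3)),
                              rng.uniform(0.2, 1.0, 500),
                              rng.random((500, 8)),
                              grid_min=np.zeros(3), voxel_size=0.5, dims=(20, 20, 20))
print(occ.sum(), labels.shape)
```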
  {
    "path": "abs/2502.05040.md",
    "content": "### GaussRender: Learning 3D Occupancy with Gaussian Rendering\n\nUnderstanding the 3D geometry and semantics of driving scenes is critical for developing of safe autonomous vehicles. While 3D occupancy models are typically trained using voxel-based supervision with standard losses (e.g., cross-entropy, Lovasz, dice), these approaches treat voxel predictions independently, neglecting their spatial relationships. In this paper, we propose GaussRender, a plug-and-play 3D-to-2D reprojection loss that enhances voxel-based supervision. Our method projects 3D voxel representations into arbitrary 2D perspectives and leverages Gaussian splatting as an efficient, differentiable rendering proxy of voxels, introducing spatial dependencies across projected elements. This approach improves semantic and geometric consistency, handles occlusions more efficiently, and requires no architectural modifications. Extensive experiments on multiple benchmarks (SurroundOcc-nuScenes, Occ3D-nuScenes, SSCBench-KITTI360) demonstrate consistent performance gains across various 3D occupancy models (TPVFormer, SurroundOcc, Symphonies), highlighting the robustness and versatility of our framework.\n\n理解驾驶场景的3D几何和语义对于开发安全的自动驾驶车辆至关重要。虽然3D占据模型通常使用基于体素的监督与标准损失（例如交叉熵、Lovasz、dice）进行训练，但这些方法将体素预测视为独立的，忽视了它们之间的空间关系。本文提出了GaussRender，一种即插即用的3D到2D重投影损失，旨在增强基于体素的监督。我们的方法将3D体素表示投影到任意2D视角，并利用高斯溅射作为体素的高效、可微分渲染代理，引入了投影元素之间的空间依赖关系。这种方法改善了语义和几何一致性，更高效地处理了遮挡问题，并且不需要对架构进行修改。在多个基准测试（SurroundOcc-nuScenes、Occ3D-nuScenes、SSCBench-KITTI360）上的大量实验表明，我们的方法在各种3D占据模型（TPVFormer、SurroundOcc、Symphonies）中都表现出了稳定的性能提升，凸显了我们框架的鲁棒性和多样性。\n"
  },
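A toy version of the 3D-to-2D reprojection loss idea: splat voxel class scores into a 2D view for both prediction and ground truth, and compare the two renders in image space so that neighbouring voxels interact through the projection. The real GaussRender uses differentiable Gaussian splatting as the rendering proxy; the nearest-pixel scatter, the 1/depth weighting, and the intrinsics `K` here are illustrative stand-ins.

```python
import torch

def project_scores(voxel_xyz, scores, K, H, W):
    """Scatter per-voxel class scores into a (H, W, C) view, weighted by 1/depth."""
    z = voxel_xyz[:, 2].clamp(min=1e-3)
    uv = (voxel_xyz @ K.T)[:, :2] / z[:, None]
    u = uv[:, 0].long().clamp(0, W - 1)
    v = uv[:, 1].long().clamp(0, H - 1)
    img = torch.zeros(H, W, scores.shape[1])
    img.index_put_((v, u), scores / z[:, None], accumulate=True)
    return img

K = torch.tensor([[100., 0., 32.], [0., 100., 32.], [0., 0., 1.]])
xyz = torch.rand(1000, 3) * torch.tensor([4., 4., 8.]) + torch.tensor([-2., -2., 1.])
pred = torch.rand(1000, 5, requires_grad=True)    # predicted voxel class scores
gt = torch.rand(1000, 5)                          # ground-truth voxel labels/scores
loss = torch.nn.functional.l1_loss(project_scores(xyz, pred, K, 64, 64),
                                   project_scores(xyz, gt, K, 64, 64))
loss.backward()                                   # gradients reach pred
print(loss.item())
```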
  {
    "path": "abs/2502.05176.md",
    "content": "### AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting\n\nThree-dimensional scene inpainting is crucial for applications from virtual reality to architectural visualization, yet existing methods struggle with view consistency and geometric accuracy in 360° unbounded scenes. We present AuraFusion360, a novel reference-based method that enables high-quality object removal and hole filling in 3D scenes represented by Gaussian Splatting. Our approach introduces (1) depth-aware unseen mask generation for accurate occlusion identification, (2) Adaptive Guided Depth Diffusion, a zero-shot method for accurate initial point placement without requiring additional training, and (3) SDEdit-based detail enhancement for multi-view coherence. We also introduce 360-USID, the first comprehensive dataset for 360° unbounded scene inpainting with ground truth. Extensive experiments demonstrate that AuraFusion360 significantly outperforms existing methods, achieving superior perceptual quality while maintaining geometric accuracy across dramatic viewpoint changes.\n\n三维场景修复对于虚拟现实、建筑可视化等应用至关重要，但现有方法在360°无限场景中往往难以保持视角一致性和几何准确性。我们提出了AuraFusion360，一种新颖的基于参考的方法，可以在通过高斯溅射表示的3D场景中实现高质量的物体移除和孔洞填充。我们的方法引入了以下创新：（1）基于深度的未见区域掩膜生成，用于准确识别遮挡，（2）自适应引导深度扩散，一种零样本方法，无需额外训练即可实现精确的初始点放置，（3）基于SDEdit的细节增强，确保多视角一致性。我们还提出了360-USID，这是第一个具有真实标签的360°无限场景修复综合数据集。大量实验表明，AuraFusion360显著超越了现有方法，在极端视角变化下实现了卓越的感知质量，同时保持几何准确性。\n"
  },
  {
    "path": "abs/2502.05752.md",
    "content": "### PINGS: Gaussian Splatting Meets Distance Fields within a Point-Based Implicit Neural Map\n\nRobots require high-fidelity reconstructions of their environment for effective operation. Such scene representations should be both, geometrically accurate and photorealistic to support downstream tasks. While this can be achieved by building distance fields from range sensors and radiance fields from cameras, the scalable incremental mapping of both fields consistently and at the same time with high quality remains challenging. In this paper, we propose a novel map representation that unifies a continuous signed distance field and a Gaussian splatting radiance field within an elastic and compact point-based implicit neural map. By enforcing geometric consistency between these fields, we achieve mutual improvements by exploiting both modalities. We devise a LiDAR-visual SLAM system called PINGS using the proposed map representation and evaluate it on several challenging large-scale datasets. Experimental results demonstrate that PINGS can incrementally build globally consistent distance and radiance fields encoded with a compact set of neural points. Compared to the state-of-the-art methods, PINGS achieves superior photometric and geometric rendering at novel views by leveraging the constraints from the distance field. Furthermore, by utilizing dense photometric cues and multi-view consistency from the radiance field, PINGS produces more accurate distance fields, leading to improved odometry estimation and mesh reconstruction.\n\n机器人需要对其环境进行高保真重建，以实现有效的操作。这样的场景表示应该在几何上准确，并且在视觉上逼真，以支持后续任务。虽然通过从激光雷达传感器构建距离场和通过摄像头构建辐射场可以实现这一目标，但同时高质量且可扩展的增量化映射这两种场景表示仍然是一个挑战。本文提出了一种新型的地图表示方法，它将连续的符号距离场和高斯点云辐射场统一到一个弹性且紧凑的基于点的隐式神经地图中。通过强制这两种场景表示之间的几何一致性，我们通过利用这两种模式相互促进，取得了共同的改进。我们设计了一种名为PINGS的激光雷达-视觉SLAM系统，使用提出的地图表示，并在多个具有挑战性的、大规模数据集上进行了评估。实验结果表明，PINGS能够增量地构建全局一致的距离场和辐射场，并用一组紧凑的神经点进行编码。与最先进的方法相比，PINGS通过利用距离场的约束，在新视角下实现了更优的光度和几何渲染。此外，通过利用来自辐射场的密集光度线索和多视角一致性，PINGS能够生成更准确的距离场，从而改善了里程计估计和网格重建。\n"
  },
  {
    "path": "abs/2502.05769.md",
    "content": "### Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform\n\nUrban digital twins are virtual replicas of cities that use multi-source data and data analytics to optimize urban planning, infrastructure management, and decision-making. Towards this, we propose a framework focused on the single-building scale. By connecting to cloud mapping platforms such as Google Map Platforms APIs, by leveraging state-of-the-art multi-agent Large Language Models data analysis using ChatGPT(4o) and Deepseek-V3/R1, and by using our Gaussian Splatting-based mesh extraction pipeline, our Digital Twin Buildings framework can retrieve a building's 3D model, visual descriptions, and achieve cloud-based mapping integration with large language model-based data analytics using a building's address, postal code, or geographic coordinates.\n\n城市数字孪生是城市的虚拟复制品，通过多源数据和数据分析来优化城市规划、基础设施管理和决策过程。为此，我们提出了一个聚焦于单一建筑尺度的框架。通过连接到云地图平台，如Google Map Platforms API，利用最先进的多智能体大语言模型（如ChatGPT(4o)和Deepseek-V3/R1）进行数据分析，并使用基于高斯溅射的网格提取管道，我们的数字孪生建筑框架可以检索建筑的3D模型、视觉描述，并实现基于建筑地址、邮政编码或地理坐标的云端地图集成与大语言模型驱动的数据分析。\n"
  },
  {
    "path": "abs/2502.06510.md",
    "content": "### Three-Dimensional MRI Reconstruction with Gaussian Representations: Tackling the Undersampling Problem\n\nThree-Dimensional Gaussian Splatting (3DGS) has shown substantial promise in the field of computer vision, but remains unexplored in the field of magnetic resonance imaging (MRI). This study explores its potential for the reconstruction of isotropic resolution 3D MRI from undersampled k-space data. We introduce a novel framework termed 3D Gaussian MRI (3DGSMR), which employs 3D Gaussian distributions as an explicit representation for MR volumes. Experimental evaluations indicate that this method can effectively reconstruct voxelized MR images, achieving a quality on par with that of well-established 3D MRI reconstruction techniques found in the literature. Notably, the 3DGSMR scheme operates under a self-supervised framework, obviating the need for extensive training datasets or prior model training. This approach introduces significant innovations to the domain, notably the adaptation of 3DGS to MRI reconstruction and the novel application of the existing 3DGS methodology to decompose MR signals, which are presented in a complex-valued format.\n\n三维高斯溅射（3DGS）在计算机视觉领域展示了巨大的潜力，但在磁共振成像（MRI）领域仍未得到探索。本研究探讨了其在从欠采样的k空间数据重建各向同性分辨率3D MRI中的潜力。我们提出了一种新颖的框架，称为3D高斯MRI（3DGSMR），该框架使用3D高斯分布作为MRI体积的显式表示。实验评估表明，该方法能够有效重建体素化的MRI图像，质量与文献中已建立的3D MRI重建技术相当。值得注意的是，3DGSMR方案在自监督框架下运行，无需大量训练数据集或先验模型训练。这一方法为该领域带来了重要创新，特别是将3DGS适应于MRI重建，并创新性地应用现有的3DGS方法来分解以复数格式呈现的MRI信号。\n"
  },
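The self-supervised objective implied by the 3DGSMR abstract is a k-space data-consistency term: the complex-valued volume produced from the 3D Gaussians should, after an FFT and the undersampling mask, match the measured k-space samples. The Gaussian rasterizer is abstracted away below; only the data-consistency loss is shown, and its exact form in the paper may differ.

```python
import torch

def kspace_loss(volume, kspace_meas, mask):
    """volume: complex (D, H, W); kspace_meas, mask: same shape."""
    k_pred = torch.fft.fftn(volume, dim=(-3, -2, -1))
    diff = (k_pred - kspace_meas) * mask          # only measured samples count
    return (diff.abs() ** 2).mean()

D = H = W = 16
# Stand-in for the volume rendered from 3D Gaussians:
vol = (torch.randn(D, H, W) + 1j * torch.randn(D, H, W)).requires_grad_(True)
mask = (torch.rand(D, H, W) < 0.3).to(torch.complex64)   # ~30% sampled k-space
meas = torch.fft.fftn(torch.randn(D, H, W) + 0j) * mask
loss = kspace_loss(vol, meas, mask)
loss.backward()                                   # complex autograd
print(loss.item())
```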
  {
    "path": "abs/2502.06519.md",
    "content": "### SIREN: Semantic, Initialization-Free Registration of Multi-Robot Gaussian Splatting Maps\n\nWe present SIREN for registration of multi-robot Gaussian Splatting (GSplat) maps, with zero access to camera poses, images, and inter-map transforms for initialization or fusion of local submaps. To realize these capabilities, SIREN harnesses the versatility and robustness of semantics in three critical ways to derive a rigorous registration pipeline for multi-robot GSplat maps. First, SIREN utilizes semantics to identify feature-rich regions of the local maps where the registration problem is better posed, eliminating the need for any initialization which is generally required in prior work. Second, SIREN identifies candidate correspondences between Gaussians in the local maps using robust semantic features, constituting the foundation for robust geometric optimization, coarsely aligning 3D Gaussian primitives extracted from the local maps. Third, this key step enables subsequent photometric refinement of the transformation between the submaps, where SIREN leverages novel-view synthesis in GSplat maps along with a semantics-based image filter to compute a high-accuracy non-rigid transformation for the generation of a high-fidelity fused map. We demonstrate the superior performance of SIREN compared to competing baselines across a range of real-world datasets, and in particular, across the most widely-used robot hardware platforms, including a manipulator, drone, and quadruped. In our experiments, SIREN achieves about 90x smaller rotation errors, 300x smaller translation errors, and 44x smaller scale errors in the most challenging scenes, where competing methods struggle. We will release the code and provide a link to the project page after the review process.\n\n我们提出了SIREN，用于多机器人高斯溅射（GSplat）地图的配准，且无需访问相机位姿、图像或本地子图的地图间转换进行初始化或融合。为实现这些功能，SIREN利用语义的多样性和鲁棒性，通过三种关键方式推导出多机器人GSplat地图的严格配准流程。首先，SIREN利用语义识别本地地图中的特征丰富区域，这些区域使得配准问题更易解决，从而消除了先前工作中通常需要的初始化步骤。其次，SIREN通过鲁棒的语义特征识别本地地图中高斯之间的候选对应关系，为稳健的几何优化奠定了基础，粗略地对从本地地图提取的3D高斯原语进行对齐。第三，这一步骤使得后续的子图之间的光度细化成为可能，SIREN在GSplat地图中利用新视角合成和基于语义的图像滤波器来计算高精度的非刚性变换，以生成高保真的融合地图。我们在多个真实世界数据集上展示了SIREN相较于竞争基线的优越性能，特别是在最广泛使用的机器人硬件平台上，包括机械臂、无人机和四足机器人。在我们的实验中，SIREN在最具挑战性的场景中实现了约90倍更小的旋转误差、300倍更小的平移误差和44倍更小的尺度误差，而竞争方法则难以应对这些场景。我们将在审稿过程后发布代码并提供项目页面链接。\n"
  },
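A rough sketch of SIREN's coarse alignment stage, using standard stand-ins: match Gaussians across two submaps by mutual nearest semantic features, then solve a closed-form similarity transform on the matched centers (Umeyama's method, our substitution for the paper's geometric optimization). Robust filtering and the photometric/non-rigid refinement are not shown.

```python
import numpy as np

def match_by_semantics(feat_a, feat_b):
    """Mutual nearest neighbours in semantic feature space."""
    d = ((feat_a[:, None, :] - feat_b[None, :, :]) ** 2).sum(-1)
    ab, ba = d.argmin(1), d.argmin(0)
    ia = np.where(ba[ab] == np.arange(len(feat_a)))[0]
    return ia, ab[ia]

def umeyama(src, dst):
    """Least-squares similarity transform: dst ~ s * R @ src + t."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    S, D = src - mu_s, dst - mu_d
    cov = D.T @ S / len(src)
    U, sig, Vt = np.linalg.svd(cov)
    E = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:
        E[2, 2] = -1                      # keep R a proper rotation
    R = U @ E @ Vt
    var_src = (S ** 2).sum() / len(src)
    s = (sig * np.diag(E)).sum() / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic check: map A by a known similarity and recover it.
rng = np.random.default_rng(2)
A = rng.standard_normal((200, 3))
feats = rng.standard_normal((200, 16))
R_gt, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R_gt) < 0:
    R_gt[:, 0] *= -1                      # ensure a rotation, not a reflection
B = 1.5 * A @ R_gt.T + np.array([0.2, -0.1, 0.3])
ia, ib = match_by_semantics(feats, feats)  # identical features -> identity matches
s, R, t = umeyama(A[ia], B[ib])
print(round(s, 3))  # ~1.5
```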
  {
    "path": "abs/2502.07615.md",
    "content": "### Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors\n\n3D Gaussian Splatting (3DGS) has achieved excellent rendering quality with fast training and rendering speed. However, its optimization process lacks explicit geometric constraints, leading to suboptimal geometric reconstruction in regions with sparse or no observational input views. In this work, we try to mitigate the issue by incorporating a pre-trained matching prior to the 3DGS optimization process. We introduce Flow Distillation Sampling (FDS), a technique that leverages pre-trained geometric knowledge to bolster the accuracy of the Gaussian radiance field. Our method employs a strategic sampling technique to target unobserved views adjacent to the input views, utilizing the optical flow calculated from the matching model (Prior Flow) to guide the flow analytically calculated from the 3DGS geometry (Radiance Flow). Comprehensive experiments in depth rendering, mesh reconstruction, and novel view synthesis showcase the significant advantages of FDS over state-of-the-art methods. Additionally, our interpretive experiments and analysis aim to shed light on the effects of FDS on geometric accuracy and rendering quality, potentially providing readers with insights into its performance.\n\n三维高斯溅射（3DGS）在渲染质量和训练与渲染速度上取得了优异的表现。然而，其优化过程缺乏明确的几何约束，导致在观测输入视角稀疏或没有观测输入的区域几何重建不理想。在这项工作中，我们尝试通过在3DGS优化过程中引入预训练匹配先验来缓解这一问题。我们提出了流蒸馏采样（Flow Distillation Sampling，FDS）技术，该技术利用预训练的几何知识来增强高斯辐射场的准确性。我们的方法采用一种战略性采样技术，针对输入视角相邻的未观察视角进行采样，利用从匹配模型（先验流，Prior Flow）计算的光流，指导3DGS几何（辐射流，Radiance Flow）中分析计算得到的光流。深度渲染、网格重建和新视角合成的综合实验展示了FDS相较于最先进方法的显著优势。此外，我们的解释性实验和分析旨在阐明FDS对几何准确性和渲染质量的影响，可能为读者提供关于其性能的深入见解。\n"
  },
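The "radiance flow" named above is the flow induced between two views by the rendered depth and the relative pose, and FDS supervises it with the matcher's "prior flow". A minimal sketch of that computation follows; the intrinsics, pose conventions, and the zero prior-flow stand-in are assumptions, not the paper's setup.

```python
import torch

def radiance_flow(depth, K, R, t):
    """depth: (H, W) render; returns flow (H, W, 2) from view 1 to view 2."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], -1)           # (H, W, 3)
    rays = pix @ torch.linalg.inv(K).T                          # back-project
    pts = rays * depth[..., None]                               # 3D in view 1
    pts2 = pts @ R.T + t                                        # into view 2
    uv2 = (pts2 @ K.T)[..., :2] / pts2[..., 2:].clamp(min=1e-6)
    return uv2 - pix[..., :2]

K = torch.tensor([[100., 0., 32.], [0., 100., 32.], [0., 0., 1.]])
depth = torch.full((64, 64), 2.0, requires_grad=True)   # rendered 3DGS depth
flow = radiance_flow(depth, K, torch.eye(3), torch.tensor([0.1, 0.0, 0.0]))
prior_flow = torch.zeros(64, 64, 2)                     # stand-in matcher output
loss = (flow - prior_flow).abs().mean()                 # flow distillation term
loss.backward()                                         # gradients reach depth
```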
  {
    "path": "abs/2502.07754.md",
    "content": "### MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization\n\nGaussian Splatting (GS) is a recent and pivotal technique in 3D computer graphics. GS-based algorithms almost always bypass classical methods such as ray tracing, which offers numerous inherent advantages for rendering. For example, ray tracing is able to handle incoherent rays for advanced lighting effects, including shadows and reflections. To address this limitation, we introduce MeshSplats, a method which converts GS to a mesh-like format. Following the completion of training, MeshSplats transforms Gaussian elements into mesh faces, enabling rendering using ray tracing methods with all their associated benefits. Our model can be utilized immediately following transformation, yielding a mesh of slightly reduced quality without additional training. Furthermore, we can enhance the reconstruction quality through the application of a dedicated optimization algorithm that operates on mesh faces rather than Gaussian components. The efficacy of our method is substantiated by experimental results, underscoring its extensive applications in computer graphics and image processing.\n\n高斯溅射（GS）是3D计算机图形学中的一种新兴且关键的技术。基于GS的算法几乎总是绕过传统方法，如光线追踪，而光线追踪在渲染中提供了许多固有的优势。例如，光线追踪能够处理不连续的光线，从而实现高级光照效果，包括阴影和反射。为了解决这一限制，我们提出了MeshSplats，一种将GS转换为类似网格格式的方法。在训练完成后，MeshSplats将高斯元素转换为网格面，从而实现使用光线追踪方法进行渲染，并获得所有相关的优势。我们的模型在转换后即可直接使用，生成略微降低质量的网格，无需额外训练。此外，我们还可以通过应用专门的优化算法，针对网格面而非高斯组件进行优化，从而提升重建质量。实验结果验证了我们方法的有效性，强调了其在计算机图形学和图像处理中的广泛应用。\n\n"
  },
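An illustrative take on converting one Gaussian into mesh faces: build a quad spanning a few standard deviations along the two dominant principal axes of the covariance, split into two triangles. MeshSplats' actual transformation and its mesh-space optimization are more elaborate; this only conveys the idea, and `gaussian_to_quad` and the 2-sigma extent are assumptions.

```python
import numpy as np

def gaussian_to_quad(center, cov, k=2.0):
    """Returns (4, 3) quad corners and (2, 3) triangle index list."""
    w, V = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    a1 = V[:, 2] * k * np.sqrt(w[2])           # major in-plane axis
    a2 = V[:, 1] * k * np.sqrt(w[1])           # minor in-plane axis
    corners = np.array([center + a1 + a2, center - a1 + a2,
                        center - a1 - a2, center + a1 - a2])
    faces = np.array([[0, 1, 2], [0, 2, 3]])   # two triangles per Gaussian
    return corners, faces

cov = np.diag([0.04, 0.01, 1e-4])              # a flat, anisotropic Gaussian
verts, faces = gaussian_to_quad(np.zeros(3), cov)
print(verts.round(2))
```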
  {
    "path": "abs/2502.07840.md",
    "content": "### TranSplat: Surface Embedding-guided 3D Gaussian Splatting for Transparent Object Manipulation\n\nTransparent object manipulation remains a sig- nificant challenge in robotics due to the difficulty of acquiring accurate and dense depth measurements. Conventional depth sensors often fail with transparent objects, resulting in in- complete or erroneous depth data. Existing depth completion methods struggle with interframe consistency and incorrectly model transparent objects as Lambertian surfaces, leading to poor depth reconstruction. To address these challenges, we propose TranSplat, a surface embedding-guided 3D Gaussian Splatting method tailored for transparent objects. TranSplat uses a latent diffusion model to generate surface embeddings that provide consistent and continuous representations, making it robust to changes in viewpoint and lighting. By integrating these surface embeddings with input RGB images, TranSplat effectively captures the complexities of transparent surfaces, enhancing the splatting of 3D Gaussians and improving depth completion. Evaluations on synthetic and real-world transpar- ent object benchmarks, as well as robot grasping tasks, show that TranSplat achieves accurate and dense depth completion, demonstrating its effectiveness in practical applications.\n\n透明物体的操作在机器人学中仍然是一个重大挑战，因为很难获得准确且密集的深度测量。传统的深度传感器在透明物体上常常无法正常工作，导致深度数据不完整或出现错误。现有的深度补全方法在帧间一致性方面存在困难，并且错误地将透明物体建模为朗伯（Lambertian）表面，导致深度重建效果较差。为了解决这些问题，我们提出了TranSplat，一种基于表面嵌入引导的3D高斯溅射方法，专门针对透明物体。TranSplat使用潜在扩散模型生成表面嵌入，这些嵌入提供了一致且连续的表示，使其对视角和光照变化具有鲁棒性。通过将这些表面嵌入与输入的RGB图像结合，TranSplat有效捕捉透明表面的复杂性，增强了3D高斯溅射，从而改善深度补全。我们在合成和真实世界的透明物体基准测试以及机器人抓取任务上的评估表明，TranSplat能够实现准确且密集的深度补全，证明了其在实际应用中的有效性。\n"
  },
  {
    "path": "abs/2502.09039.md",
    "content": "### Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting\n\nWhile Implicit Neural Representations (INRs) have demonstrated significant success in image representation, they are often hindered by large training memory and slow decoding speed. Recently, Gaussian Splatting (GS) has emerged as a promising solution in 3D reconstruction due to its high-quality novel view synthesis and rapid rendering capabilities, positioning it as a valuable tool for a broad spectrum of applications. In particular, a GS-based representation, 2DGS, has shown potential for image fitting. In our work, we present Large Images are Gaussians (LIG), which delves deeper into the application of 2DGS for image representations, addressing the challenge of fitting large images with 2DGS in the situation of numerous Gaussian points, through two distinct modifications: 1) we adopt a variant of representation and optimization strategy, facilitating the fitting of a large number of Gaussian points; 2) we propose a Level-of-Gaussian approach for reconstructing both coarse low-frequency initialization and fine high-frequency details. Consequently, we successfully represent large images as Gaussian points and achieve high-quality large image representation, demonstrating its efficacy across various types of large images.\n\n尽管隐式神经表示（INRs）在图像表示方面取得了显著成功，但它们通常受到大规模训练内存和较慢解码速度的限制。最近，高斯溅射（GS）由于其高质量的新视角合成和快速渲染能力，成为3D重建领域中的一种有前景的解决方案，定位为广泛应用的有价值工具。特别是，基于GS的表示方法——2DGS，在图像拟合方面显示了潜力。在我们的研究中，我们提出了Large Images are Gaussians (LIG)，进一步探讨了2DGS在图像表示中的应用，解决了在大量高斯点情况下，如何使用2DGS拟合大图像这一挑战，具体通过两项独特的改进：1）我们采用了一种变体的表示和优化策略，便于拟合大量高斯点；2）我们提出了一种高斯层级方法，用于重建粗略的低频初始化和细致的高频细节。因此，我们成功地将大图像表示为高斯点，并实现了高质量的大图像表示，展示了其在多种类型的大图像中的有效性。\n"
  },
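A toy two-level version of the "Level-of-Gaussian" idea described above: a coarse set of wide 2D Gaussians captures low frequencies, then a finer set fits the residual. LIG's actual representation and optimization strategy for very large images and Gaussian counts are far more involved; the isotropic Gaussians, level sizes, and step counts here are illustrative assumptions.

```python
import torch

def render(mu, sigma, color, H, W):
    """Render N isotropic 2D Gaussians with per-Gaussian colors to (H, W, 3)."""
    y, x = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W),
                          indexing="ij")
    grid = torch.stack([x, y], -1).reshape(-1, 1, 2)               # (HW, 1, 2)
    w = torch.exp(-((grid - mu) ** 2).sum(-1) / (2 * sigma ** 2))  # (HW, N)
    return (w @ color).reshape(H, W, 3)

def fit_level(target, n, sigma, steps=200):
    mu = torch.rand(n, 2, requires_grad=True)
    color = torch.zeros(n, 3, requires_grad=True)
    opt = torch.optim.Adam([mu, color], lr=5e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((render(mu, sigma, color, *target.shape[:2]) - target) ** 2).mean()
        loss.backward()
        opt.step()
    return render(mu, sigma, color, *target.shape[:2]).detach()

target = torch.rand(32, 32, 3)
coarse = fit_level(target, n=16, sigma=0.15)           # low-frequency level
fine = fit_level(target - coarse, n=128, sigma=0.03)   # high-frequency residual
print(((coarse + fine - target) ** 2).mean().item())
```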
  {
    "path": "abs/2502.09111.md",
    "content": "### DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior\n\nGaussian SLAM systems excel in real-time rendering and fine-grained reconstruction compared to NeRF-based systems. However, their reliance on extensive keyframes is impractical for deployment in real-world robotic systems, which typically operate under sparse-view conditions that can result in substantial holes in the map. To address these challenges, we introduce DenseSplat, the first SLAM system that effectively combines the advantages of NeRF and 3DGS. DenseSplat utilizes sparse keyframes and NeRF priors for initializing primitives that densely populate maps and seamlessly fill gaps. It also implements geometry-aware primitive sampling and pruning strategies to manage granularity and enhance rendering efficiency. Moreover, DenseSplat integrates loop closure and bundle adjustment, significantly enhancing frame-to-frame tracking accuracy. Extensive experiments on multiple large-scale datasets demonstrate that DenseSplat achieves superior performance in tracking and mapping compared to current state-of-the-art methods.\n\n高斯SLAM系统在实时渲染和精细重建方面相较于NeRF-based系统具有优势。然而，它们对大量关键帧的依赖在实际的机器人系统中难以部署，因为这些系统通常在稀疏视角条件下操作，这可能导致地图中出现较大的空洞。为了解决这些挑战，我们提出了DenseSplat，这是第一个有效结合NeRF和3DGS优势的SLAM系统。DenseSplat利用稀疏的关键帧和NeRF先验来初始化原始元素，密集填充地图并无缝填补空洞。它还实现了几何感知的原始元素采样和修剪策略，以管理粒度并提高渲染效率。此外，DenseSplat集成了回环闭合和束束调整，显著提高了帧间跟踪精度。对多个大规模数据集的广泛实验表明，DenseSplat在跟踪和地图构建方面相较于现有的最先进方法表现出色。\n"
  },
  {
    "path": "abs/2502.09563.md",
    "content": "### Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction\n\nIn this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. In particular, our technique enables high-quality scene reconstruction from Large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of images. Our approach introduces a novel method for modeling complex lens distortions using a hybrid network that combines invertible residual networks with explicit grids. This design effectively regularizes the optimization process, achieving greater accuracy than conventional camera models. Additionally, we propose a cubemap-based resampling strategy to support large FOV images without sacrificing resolution or introducing distortion artifacts. Our method is compatible with the fast rasterization of Gaussian Splatting, adaptable to a wide variety of camera lens distortion, and demonstrates state-of-the-art performance on both synthetic and real-world datasets.\n\n在本文中，我们提出了一个自校准框架，该框架联合优化相机参数、镜头畸变和3D高斯表示，从而实现精确且高效的场景重建。特别地，我们的技术能够从大视场（FOV）图像中进行高质量场景重建，这些图像使用广角镜头拍摄，使得能够从较少的图像中建模场景。我们的方法引入了一种新颖的镜头畸变建模方法，采用混合网络，结合了可逆残差网络和显式网格。这一设计有效地规范化了优化过程，相比传统相机模型达到了更高的准确性。此外，我们提出了一种基于立方体贴图的重采样策略，支持大视场图像，同时不牺牲分辨率或引入畸变伪影。我们的方法与高斯溅射的快速光栅化兼容，可适应各种相机镜头畸变，并在合成和真实世界数据集上展示了最先进的性能。\n"
  },
  {
    "path": "abs/2502.10475.md",
    "content": "### X-SG2S: Safe and Generalizable Gaussian Splatting with X-dimensional Watermarks\n\n3D Gaussian Splatting (3DGS) has been widely used in 3D reconstruction and 3D generation. Training to get a 3DGS scene often takes a lot of time and resources and even valuable inspiration. The increasing amount of 3DGS digital asset have brought great challenges to the copyright protection. However, it still lacks profound exploration targeted at 3DGS. In this paper, we propose a new framework X-SG2S which can simultaneously watermark 1 to 3D messages while keeping the original 3DGS scene almost unchanged. Generally, we have a X-SG2S injector for adding multi-modal messages simultaneously and an extractor for extract them. Specifically, we first split the watermarks into message patches in a fixed manner and sort the 3DGS points. A self-adaption gate is used to pick out suitable location for watermarking. Then use a XD(multi-dimension)-injection heads to add multi-modal messages into sorted 3DGS points. A learnable gate can recognize the location with extra messages and XD-extraction heads can restore hidden messages from the location recommended by the learnable gate. Extensive experiments demonstrated that the proposed X-SG2S can effectively conceal multi modal messages without changing pretrained 3DGS pipeline or the original form of 3DGS parameters. Meanwhile, with simple and efficient model structure and high practicality, X-SG2S still shows good performance in hiding and extracting multi-modal inner structured or unstructured messages. X-SG2S is the first to unify 1 to 3D watermarking model for 3DGS and the first framework to add multi-modal watermarks simultaneous in one 3DGS which pave the wave for later researches.\n\n3D高斯溅射（3DGS）已广泛应用于3D重建和3D生成领域。然而，训练以获得3DGS场景通常需要大量时间和资源，甚至可能影响创作灵感。随着3DGS数字资产的数量不断增加，版权保护面临巨大的挑战。然而，目前针对3DGS的深入探索仍显不足。本文提出了一种新框架X-SG2S，该框架能够同时在3DGS场景中水印1到3D信息，同时保持原始3DGS场景几乎不变。通常，我们设计了一个X-SG2S注入器，用于同时添加多模态信息，以及一个提取器，用于提取这些信息。具体来说，我们首先以固定的方式将水印分割为消息块，并对3DGS点进行排序。使用自适应门控机制来选择合适的位置进行水印嵌入。然后，使用XD（多维）注入头将多模态消息添加到排序后的3DGS点中。一个可学习的门控机制能够识别具有额外信息的位置，而XD提取头则从由可学习门控机制推荐的位置恢复隐藏的消息。大量实验表明，所提出的X-SG2S能够有效地隐藏多模态消息，而不会改变预训练的3DGS流程或原始3DGS参数的形式。同时，X-SG2S凭借其简单高效的模型结构和高实用性，在隐藏和提取多模态内部结构或非结构化消息方面表现出色。X-SG2S是首个为3DGS统一1到3D水印模型的框架，也是首个在一个3DGS中同时添加多模态水印的框架，为后续研究铺平了道路。\n"
  },
  {
    "path": "abs/2502.10827.md",
    "content": "### E-3DGS: Event-Based Novel View Rendering of Large-Scale Scenes Using 3D Gaussian Splatting\n\nNovel view synthesis techniques predominantly utilize RGB cameras, inheriting their limitations such as the need for sufficient lighting, susceptibility to motion blur, and restricted dynamic range. In contrast, event cameras are significantly more resilient to these limitations but have been less explored in this domain, particularly in large-scale settings. Current methodologies primarily focus on front-facing or object-oriented (360-degree view) scenarios. For the first time, we introduce 3D Gaussians for event-based novel view synthesis. Our method reconstructs large and unbounded scenes with high visual quality. We contribute the first real and synthetic event datasets tailored for this setting. Our method demonstrates superior novel view synthesis and consistently outperforms the baseline EventNeRF by a margin of 11-25% in PSNR (dB) while being orders of magnitude faster in reconstruction and rendering.\n\n新型视图合成技术主要依赖于RGB相机，继承了其诸如需要充足光照、易受运动模糊影响以及动态范围受限等局限性。相比之下，事件相机在这些限制下表现得更加稳健，但在这一领域，尤其是在大规模设置中，仍然探索较少。目前的方法主要集中在前向或面向物体（360度视图）场景。首次，我们将3D高斯溅射应用于基于事件的视图合成。我们的方法能够重建大型且无限制的场景，并提供高视觉质量。我们贡献了首个专为此设置设计的真实和合成事件数据集。我们的算法在新视图合成方面表现优异，在PSNR（分贝）上比基线方法EventNeRF提高了11-25%，同时在重建和渲染速度上快了几个数量级。\n"
  },
  {
    "path": "abs/2502.10975.md",
    "content": "### GS-GVINS: A Tightly-integrated GNSS-Visual-Inertial Navigation System Augmented by 3D Gaussian Splatting\n\nRecently, the emergence of 3D Gaussian Splatting (3DGS) has drawn significant attention in the area of 3D map reconstruction and visual SLAM. While extensive research has explored 3DGS for indoor trajectory tracking using visual sensor alone or in combination with Light Detection and Ranging (LiDAR) and Inertial Measurement Unit (IMU), its integration with GNSS for large-scale outdoor navigation remains underexplored. To address these concerns, we proposed GS-GVINS: a tightly-integrated GNSS-Visual-Inertial Navigation System augmented by 3DGS. This system leverages 3D Gaussian as a continuous differentiable scene representation in largescale outdoor environments, enhancing navigation performance through the constructed 3D Gaussian map. Notably, GS-GVINS is the first GNSS-Visual-Inertial navigation application that directly utilizes the analytical jacobians of SE3 camera pose with respect to 3D Gaussians. To maintain the quality of 3DGS rendering in extreme dynamic states, we introduce a motionaware 3D Gaussian pruning mechanism, updating the map based on relative pose translation and the accumulated opacity along the camera ray. For validation, we test our system under different driving environments: open-sky, sub-urban, and urban. Both self-collected and public datasets are used for evaluation. The results demonstrate the effectiveness of GS-GVINS in enhancing navigation accuracy across diverse driving environments.\n\n最近，3D高斯溅射（3DGS）的出现引起了在3D地图重建和视觉SLAM领域的广泛关注。尽管已有大量研究探讨了单独使用视觉传感器或与激光雷达（LiDAR）和惯性测量单元（IMU）结合使用的3DGS在室内轨迹跟踪中的应用，但其与全球导航卫星系统（GNSS）结合用于大规模户外导航的研究仍然较少。为了解决这一问题，我们提出了GS-GVINS：一种由3DGS增强的紧密集成GNSS-视觉-惯性导航系统。该系统利用3D高斯作为大规模户外环境中连续可微的场景表示，通过构建的3D高斯地图提高导航性能。值得注意的是，GS-GVINS是首个直接利用SE3相机位姿相对于3D高斯的解析雅可比矩阵的GNSS-视觉-惯性导航应用。为了保持在极端动态状态下3DGS渲染的质量，我们引入了一种运动感知的3D高斯修剪机制，根据相对位姿平移和沿相机光线的累积不透明度更新地图。为了验证我们的系统，我们在不同的驾驶环境下进行了测试：开阔天空、郊区和城市环境。我们使用了自收集的数据集和公开数据集进行评估。结果表明，GS-GVINS在提高不同驾驶环境下的导航精度方面非常有效。\n"
  },
  {
    "path": "abs/2502.10988.md",
    "content": "### OMG: Opacity Matters in Material Modeling with Gaussian Splatting\n\nDecomposing geometry, materials and lighting from a set of images, namely inverse rendering, has been a long-standing problem in computer vision and graphics. Recent advances in neural rendering enable photo-realistic and plausible inverse rendering results. The emergence of 3D Gaussian Splatting has boosted it to the next level by showing real-time rendering potentials. An intuitive finding is that the models used for inverse rendering do not take into account the dependency of opacity w.r.t. material properties, namely cross section, as suggested by optics. Therefore, we develop a novel approach that adds this dependency to the modeling itself. Inspired by radiative transfer, we augment the opacity term by introducing a neural network that takes as input material properties to provide modeling of cross section and a physically correct activation function. The gradients for material properties are therefore not only from color but also from opacity, facilitating a constraint for their optimization. Therefore, the proposed method incorporates more accurate physical properties compared to previous works. We implement our method into 3 different baselines that use Gaussian Splatting for inverse rendering and achieve significant improvements universally in terms of novel view synthesis and material modeling.\n\n从一组图像中分解几何形状、材料和光照的过程，即逆向渲染，一直是计算机视觉和图形学中的一个长期难题。最近，神经渲染技术的进展使得能够生成逼真的逆向渲染结果，而3D高斯溅射（3DGS）的出现将其推向了一个新的水平，展示了实时渲染的潜力。一个直观的发现是，现有的逆向渲染模型并未考虑不透明度与材料属性之间的依赖关系，即横截面，这一点在光学中有所提及。为了解决这个问题，我们开发了一种新的方法，将这种依赖关系添加到建模中。\n受到辐射传输启发，我们通过引入一个神经网络来增强不透明度项，利用材料属性作为输入，提供横截面的建模，并采用物理正确的激活函数。因此，材料属性的梯度不仅来自颜色信息，还来自不透明度，从而为优化过程提供了约束。通过这种方法，与以前的工作相比，我们能够更加准确地结合物理属性。\n我们将这种方法实现到三种不同的基线模型中，这些模型都使用高斯溅射进行逆向渲染，并在新视角合成和材料建模方面普遍取得了显著的改进。\n"
  },
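The core idea of OMG, as stated, is to make opacity a function of material properties via a predicted cross section, with a physically correct activation; the Beer-Lambert form alpha = 1 - exp(-sigma) is the standard radiative-transfer choice and is assumed below, as are the layer sizes and the material parameterization. The point of the sketch is that opacity gradients then also flow into the material parameters.

```python
import torch
import torch.nn as nn

class CrossSectionOpacity(nn.Module):
    """Opacity derived from a material-dependent cross section (sketch)."""
    def __init__(self, n_material_dims=6):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_material_dims, 32), nn.ReLU(),
                                 nn.Linear(32, 1), nn.Softplus())  # sigma >= 0

    def forward(self, material):
        sigma = self.net(material).squeeze(-1)
        return 1.0 - torch.exp(-sigma)          # Beer-Lambert: opacity in [0, 1)

model = CrossSectionOpacity()
material = torch.rand(1000, 6, requires_grad=True)  # e.g. albedo, roughness, ...
alpha = model(material)
alpha.sum().backward()                   # opacity gradient reaches the materials
print(material.grad.abs().mean().item())
```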
  {
    "path": "abs/2502.11642.md",
    "content": "### GaussianMotion: End-to-End Learning of Animatable Gaussian Avatars with Pose Guidance from Text\n\nIn this paper, we introduce GaussianMotion, a novel human rendering model that generates fully animatable scenes aligned with textual descriptions using Gaussian Splatting. Although existing methods achieve reasonable text-to-3D generation of human bodies using various 3D representations, they often face limitations in fidelity and efficiency, or primarily focus on static models with limited pose control. In contrast, our method generates fully animatable 3D avatars by combining deformable 3D Gaussian Splatting with text-to-3D score distillation, achieving high fidelity and efficient rendering for arbitrary poses. By densely generating diverse random poses during optimization, our deformable 3D human model learns to capture a wide range of natural motions distilled from a pose-conditioned diffusion model in an end-to-end manner. Furthermore, we propose Adaptive Score Distillation that effectively balances realistic detail and smoothness to achieve optimal 3D results. Experimental results demonstrate that our approach outperforms existing baselines by producing high-quality textures in both static and animated results, and by generating diverse 3D human models from various textual inputs.\n\n在本文中，我们提出了一种新型的渲染模型——GaussianMotion，该模型能够生成完全可动画化的场景，并与文本描述对齐，采用高斯溅射（Gaussian Splatting）技术。尽管现有方法能够实现合理的文本到3D的人体生成，利用多种3D表示方式，但这些方法通常在保真度和效率上存在限制，或者主要集中于静态模型，并且姿势控制有限。与之不同，我们的方法通过结合可变形的3D高斯溅射和文本到3D得分蒸馏（score distillation），生成完全可动画化的3D头像，实现了对任意姿势的高保真和高效渲染。\n在优化过程中，我们通过密集地生成多种随机姿势，使得我们的可变形3D人体模型能够捕捉来自姿势条件扩散模型（pose-conditioned diffusion model）的广泛自然运动，并以端到端的方式进行蒸馏。此外，我们还提出了自适应得分蒸馏，该方法有效地平衡了真实细节和流畅度，从而实现了最佳的3D效果。\n实验结果表明，我们的方法在静态和动态结果上都优于现有基线，通过生成高质量的纹理以及从不同文本输入中生成多样化的3D人体模型，展示了显著的改进。\n"
  },
  {
    "path": "abs/2502.11782.md",
    "content": "### Exploring the Versal AI Engine for 3D Gaussian Splatting\n\nDataflow-oriented spatial architectures are the emerging paradigm for higher computation performance and efficiency.\nAMD Versal AI Engine is a commercial spatial architecture consisting of tiles of VLIW processors supporting SIMD operations arranged in a two-dimensional mesh.\nThe architecture requires the explicit design of task assignments and dataflow configurations for each tile to maximize performance, demanding advanced techniques and meticulous design.\nHowever, a few works revealed the performance characteristics of the Versal AI Engine through practical workloads.\nIn this work, we provide the comprehensive performance evaluation of the Versal AI Engine using Gaussian feature computation in 3D Gaussian splatting as a practical workload, and we then propose a novel dedicated algorithm to fully exploit the hardware architecture.\nThe computations of 3D Gaussian splatting include matrix multiplications and color computations utilizing high-dimensional spherical harmonic coefficients.\nThese tasks are processed efficiently by leveraging the SIMD capabilities and their instruction-level parallelism.\nAdditionally, pipelined processing is achieved by assigning different tasks to individual cores, thereby fully exploiting the spatial parallelism of AI Engines.\nThe proposed method demonstrated a 226-fold throughput increase in simulation-based evaluation, outperforming a naive approach.\nThese findings provide valuable insights for application development that effectively harnesses the spatial and architectural advantages of AI Engines.\n\n在这项工作中，我们提供了对Versal AI Engine的全面性能评估，并采用3D高斯溅射中的高斯特征计算作为实际工作负载进行分析。Versal AI Engine是一种商业化的空间架构，由支持SIMD操作的VLIW处理器单元组成，这些处理器单元在一个二维网格中排列。为了最大化性能，该架构需要显式地设计每个处理单元的任务分配和数据流配置，这要求采用先进的技术和精细的设计。\n然而，关于Versal AI Engine的实际工作负载性能特性，现有研究较少。我们的研究通过高斯特征计算作为工作负载，对Versal AI Engine进行了性能评估，并提出了一种新颖的专用算法，以充分利用该硬件架构。\n3D高斯溅射的计算包括矩阵乘法和颜色计算，涉及到高维球面调和系数。通过利用SIMD能力和指令级并行性，这些任务可以高效处理。此外，通过将不同任务分配到各个核心，我们实现了流水线处理，从而充分发挥AI Engine的空间并行性。\n在基于仿真的评估中，所提出的方法展示了226倍的吞吐量提升，明显优于简单的方法。这些发现为应用开发提供了宝贵的洞察，能够有效利用AI Engine的空间和架构优势。\n"
  },
  {
    "path": "abs/2502.11801.md",
    "content": "### 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency\n\nWhen performing 3D inpainting using novel-view rendering methods like Neural Radiance Field (NeRF) or 3D Gaussian Splatting (3DGS), how to achieve texture and geometry consistency across camera views has been a challenge. In this paper, we propose a framework of 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency (3DGIC) for cross-view consistent 3D inpainting. Guided by the rendered depth information from each training view, our 3DGIC exploits background pixels visible across different views for updating the inpainting mask, allowing us to refine the 3DGS for inpainting purposes. Through extensive experiments on benchmark datasets, we confirm that our 3DGIC outperforms current state-of-the-art 3D inpainting methods quantitatively and qualitatively.\n\n在使用新型视图渲染方法进行3D修复（如神经辐射场（NeRF）或3D高斯点云（3DGS））时，如何在不同相机视角之间实现纹理和几何一致性一直是一个挑战。本文提出了一种具有深度引导跨视角一致性的3D高斯修复框架（3DGIC）用于跨视角一致性的3D修复。在每个训练视角渲染的深度信息的引导下，我们的3DGIC利用不同视角中可见的背景像素来更新修复掩码，从而使得3DGS能够更好地进行修复。通过在基准数据集上进行大量实验，我们验证了3DGIC在定量和定性上都超越了现有的最先进的3D修复方法。\n"
  },
  {
    "path": "abs/2502.12231.md",
    "content": "### PUGS: Zero-shot Physical Understanding with Gaussian Splatting\n\nCurrent robotic systems can understand the categories and poses of objects well. But understanding physical properties like mass, friction, and hardness, in the wild, remains challenging. We propose a new method that reconstructs 3D objects using the Gaussian splatting representation and predicts various physical properties in a zero-shot manner. We propose two techniques during the reconstruction phase: a geometry-aware regularization loss function to improve the shape quality and a region-aware feature contrastive loss function to promote region affinity. Two other new techniques are designed during inference: a feature-based property propagation module and a volume integration module tailored for the Gaussian representation. Our framework is named as zero-shot physical understanding with Gaussian splatting, or PUGS. PUGS achieves new state-of-the-art results on the standard benchmark of ABO-500 mass prediction. We provide extensive quantitative ablations and qualitative visualization to demonstrate the mechanism of our designs. We show the proposed methodology can help address challenging real-world grasping tasks.\n\n当前的机器人系统可以很好地理解物体的类别和姿势，但在实际环境中，理解物理属性如质量、摩擦力和硬度仍然具有挑战性。我们提出了一种新方法，利用高斯溅射表示重建三维物体，并以零样本方式预测各种物理属性。在重建阶段，我们提出了两种技术：一种几何感知正则化损失函数，用于提高形状质量；一种区域感知特征对比损失函数，用于促进区域亲和性。在推理阶段，我们设计了另外两种新技术：一种基于特征的属性传播模块和一种针对高斯表示定制的体积积分模块。我们的框架被命名为零样本物理理解与高斯溅射（PUGS）。PUGS在标准基准ABO-500质量预测上取得了新的最先进成果。我们提供了广泛的定量消融实验和定性可视化，以展示我们设计机制的效果。我们展示了所提出的方法能够帮助解决具有挑战性的真实世界抓取任务。\n"
  },
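One plausible reading of a "volume integration module tailored for the Gaussian representation" is that each Gaussian contributes its opacity-weighted analytic integral, since the integral of exp(-d^T Sigma^{-1} d / 2) over R^3 is (2*pi)^{3/2} |Sigma|^{1/2}, and a propagated per-Gaussian density turns that volume into mass. This closed form is our assumption for illustration, not necessarily the paper's exact module.

```python
import numpy as np

def estimate_mass(scales, opacities, densities):
    """scales: (N, 3) per-axis std devs; opacities, densities: (N,)."""
    # For diagonal Sigma = diag(sx^2, sy^2, sz^2): |Sigma|^(1/2) = sx*sy*sz.
    vol = (2 * np.pi) ** 1.5 * scales.prod(axis=1)
    return float((opacities * densities * vol).sum())

rng = np.random.default_rng(3)
mass = estimate_mass(rng.uniform(0.005, 0.02, (5000, 3)),   # metres
                     rng.uniform(0.3, 1.0, 5000),           # opacities
                     np.full(5000, 800.0))                  # e.g. kg/m^3
print(f"{mass:.3f} kg")
```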
  {
    "path": "abs/2502.12686.md",
    "content": "### RadSplatter: Extending 3D Gaussian Splatting to Radio Frequencies for Wireless Radiomap Extrapolation\n\nA radiomap represents the spatial distribution of wireless signal strength, critical for applications like network optimization and autonomous driving. However, constructing radiomap relies on measuring radio signal power across the entire system, which is costly in outdoor environments due to large network scales. We present RadSplatter, a framework that extends 3D Gaussian Splatting (3DGS) to radio frequencies for efficient and accurate radiomap extrapolation from sparse measurements. RadSplatter models environmental scatterers and radio paths using 3D Gaussians, capturing key factors of radio wave propagation. It employs a relaxed-mean (RM) scheme to reparameterize the positions of 3D Gaussians from noisy and dense 3D point clouds. A camera-free 3DGS-based projection is proposed to map 3D Gaussians onto 2D radio beam patterns. Furthermore, a regularized loss function and recursive fine-tuning using highly structured sparse measurements in real-world settings are applied to ensure robust generalization. Experiments on synthetic and real-world data show state-of-the-art extrapolation accuracy and execution speed.\n\n无线电图（radiomap）表示无线信号强度的空间分布，对于网络优化和自动驾驶等应用至关重要。然而，构建无线电图需要测量整个系统中的无线电信号功率，在室外环境中，由于网络规模庞大，成本较高。我们提出 RadSplatter，一个将 3D 高斯溅射（3D Gaussian Splatting, 3DGS） 扩展到无线电频率的框架，以高效且精准地从稀疏测量数据外推无线电图。RadSplatter 通过 3D 高斯 建模环境中的散射体和无线电路径，捕捉无线电波传播的关键因素。它采用 松弛均值（RM） 方案，对噪声较大且密集的 3D 点云 进行 3D 高斯位置重参数化。此外，我们提出了一种 无相机的 3DGS 投影方法，用于将 3D 高斯映射到 2D 无线电波束模式。为了确保鲁棒性和泛化能力，该框架引入了 正则化损失函数，并利用真实世界中高度结构化的稀疏测量数据进行 递归微调。在合成数据和真实世界数据上的实验表明，RadSplatter 在外推精度和执行速度方面均达到了最先进水平。\n"
  },
  {
    "path": "abs/2502.13196.md",
    "content": "### GS-QA: Comprehensive Quality Assessment Benchmark for Gaussian Splatting View Synthesis\n\nGaussian Splatting (GS) offers a promising alternative to Neural Radiance Fields (NeRF) for real-time 3D scene rendering. Using a set of 3D Gaussians to represent complex geometry and appearance, GS achieves faster rendering times and reduced memory consumption compared to the neural network approach used in NeRF. However, quality assessment of GS-generated static content is not yet explored in-depth. This paper describes a subjective quality assessment study that aims to evaluate synthesized videos obtained with several static GS state-of-the-art methods. The methods were applied to diverse visual scenes, covering both 360-degree and forward-facing (FF) camera trajectories. Moreover, the performance of 18 objective quality metrics was analyzed using the scores resulting from the subjective study, providing insights into their strengths, limitations, and alignment with human perception. All videos and scores are made available providing a comprehensive database that can be used as benchmark on GS view synthesis and objective quality metrics.\n\n高斯溅射（Gaussian Splatting, GS） 为实时三维场景渲染提供了一种有前景的替代方案，相较于 神经辐射场（Neural Radiance Fields, NeRF），GS 采用 3D 高斯 集合来表示复杂的几何结构和外观，从而实现比 NeRF 依赖神经网络的方法更快的渲染速度和更低的内存消耗。然而，对 GS 生成的静态内容的质量评估尚未得到深入研究。本文描述了一项 主观质量评估研究，旨在评估使用多种最先进静态 GS 方法合成的视频。研究涵盖了多种视觉场景，包括 360 度全景 和 前向视角（FF） 的摄像机轨迹。此外，我们分析了 18 种客观质量评估指标 在主观研究得分下的表现，以深入探讨它们的优势、局限性及其与人类感知的一致性。所有视频及评分均已公开，提供了一个全面的数据库，可作为 GS 视图合成和客观质量评估指标 的基准测试数据集。\n"
  },
  {
    "path": "abs/2502.13803.md",
    "content": "### 3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments\n\nThe field of visual localization has been researched for several decades and has meanwhile found many practical applications. Despite the strong progress in this field, there are still challenging situations in which established methods fail. We present an approach to significantly improve the accuracy and reliability of established visual localization methods by adding rendered images. In detail, we first use a modern visual SLAM approach that provides a 3D Gaussian Splatting (3DGS) based map to create reference data. We demonstrate that enriching reference data with images rendered from 3DGS at randomly sampled poses significantly improves the performance of both geometry-based visual localization and Scene Coordinate Regression (SCR) methods. Through comprehensive evaluation in a large industrial environment, we analyze the performance impact of incorporating these additional rendered views.\n\n视觉定位（Visual Localization）领域已被研究数十年，并已应用于众多实际场景。尽管该领域取得了显著进展，但仍存在某些具有挑战性的情况，使得现有方法可能失效。我们提出了一种方法，通过引入渲染图像，显著提升现有视觉定位方法的准确性和可靠性。具体而言，我们首先使用基于 3D 高斯溅射（3D Gaussian Splatting, 3DGS） 的现代视觉 SLAM 方法构建三维地图，以生成参考数据。我们证明，将 3DGS 渲染的图像（在随机采样的位姿处生成）用于丰富参考数据，可以显著提升基于几何的视觉定位方法和 场景坐标回归（Scene Coordinate Regression, SCR） 方法的性能。通过在 大型工业环境 中进行全面评估，我们分析了这些额外渲染视图对定位性能的影响。\n"
  },
  {
    "path": "abs/2502.14004.md",
    "content": "### Inter3D: A Benchmark and Strong Baseline for Human-Interactive 3D Object Reconstruction\n\nRecent advancements in implicit 3D reconstruction methods, e.g., neural rendering fields and Gaussian splatting, have primarily focused on novel view synthesis of static or dynamic objects with continuous motion states. However, these approaches struggle to efficiently model a human-interactive object with n movable parts, requiring 2^n separate models to represent all discrete states. To overcome this limitation, we propose Inter3D, a new benchmark and approach for novel state synthesis of human-interactive objects. We introduce a self-collected dataset featuring commonly encountered interactive objects and a new evaluation pipeline, where only individual part states are observed during training, while part combination states remain unseen. We also propose a strong baseline approach that leverages Space Discrepancy Tensors to efficiently modelling all states of an object. To alleviate the impractical constraints on camera trajectories across training states, we propose a Mutual State Regularization mechanism to enhance the spatial density consistency of movable parts. In addition, we explore two occupancy grid sampling strategies to facilitate training efficiency. We conduct extensive experiments on the proposed benchmark, showcasing the challenges of the task and the superiority of our approach.\n\n近年来，隐式三维重建（Implicit 3D Reconstruction） 方法（如 神经渲染场（Neural Rendering Fields） 和 高斯溅射（Gaussian Splatting））的进展主要集中在 静态或动态对象的 新视角合成，其中对象的运动状态是 连续的。然而，这些方法在建模 具有 n 个可移动部件的人机交互对象 时存在效率问题，需要 2^n 个独立模型 才能表示所有离散状态，从而导致计算和存储成本急剧上升。\n为了解决这一局限性，我们提出 Inter3D，一个针对 人机交互对象的新状态合成（Novel State Synthesis） 的 基准（Benchmark） 和 新方法。我们构建了一个 自采集数据集，涵盖常见的交互式对象，并引入了 新的评估流程，其中 训练阶段仅观测单个部件的状态，而部件组合状态在训练中是不可见的。此外，我们提出了一种 强基线方法，利用 空间差异张量（Space Discrepancy Tensors） 高效建模对象的所有状态。\n为缓解 训练过程中不同状态下相机轨迹的一致性约束，我们设计了一种 互相状态正则化（Mutual State Regularization） 机制，以增强可移动部件的空间密度一致性。此外，我们探索了 两种占据网格（Occupancy Grid）采样策略 以提升训练效率。我们在 Inter3D 基准 上进行了广泛实验，展示了该任务的挑战性以及我们方法的优越性。\n"
  },
  {
    "path": "abs/2502.14129.md",
    "content": "### GlossGau: Efficient Inverse Rendering for Glossy Surface with Anisotropic Spherical Gaussian\n\nThe reconstruction of 3D objects from calibrated photographs represents a fundamental yet intricate challenge in the domains of computer graphics and vision. Although neural reconstruction approaches based on Neural Radiance Fields (NeRF) have shown remarkable capabilities, their processing costs remain substantial. Recently, the advent of 3D Gaussian Splatting (3D-GS) largely improves the training efficiency and facilitates to generate realistic rendering in real-time. However, due to the limited ability of Spherical Harmonics (SH) to represent high-frequency information, 3D-GS falls short in reconstructing glossy objects. Researchers have turned to enhance the specular expressiveness of 3D-GS through inverse rendering. Yet these methods often struggle to maintain the training and rendering efficiency, undermining the benefits of Gaussian Splatting techniques. In this paper, we introduce GlossGau, an efficient inverse rendering framework that reconstructs scenes with glossy surfaces while maintaining training and rendering speeds comparable to vanilla 3D-GS. Specifically, we explicitly model the surface normals, Bidirectional Reflectance Distribution Function (BRDF) parameters, as well as incident lights and use Anisotropic Spherical Gaussian (ASG) to approximate the per-Gaussian Normal Distribution Function under the microfacet model. We utilize 2D Gaussian Splatting (2D-GS) as foundational primitives and apply regularization to significantly alleviate the normal estimation challenge encountered in related works. Experiments demonstrate that GlossGau achieves competitive or superior reconstruction on datasets with glossy surfaces. Compared with previous GS-based works that address the specular surface, our optimization time is considerably less.\n\n从校准照片重建三维物体是计算机图形学和计算机视觉领域中的基础但复杂的挑战。尽管基于神经辐射场（Neural Radiance Fields, NeRF）的神经重建方法表现出卓越的能力，但其计算开销仍然十分巨大。近期，3D 高斯溅射（3D Gaussian Splatting, 3D-GS）的出现极大地提升了训练效率，并使得实时生成高逼真渲染成为可能。\n然而，由于球谐函数（Spherical Harmonics, SH）在表示高频信息方面的能力有限，3D-GS 在重建光滑（Glossy）物体时表现欠佳。研究者尝试通过逆渲染（Inverse Rendering）增强3D-GS对镜面反射（Specular Reflectance）的表达能力，但这些方法往往难以在训练和渲染效率之间取得平衡，从而削弱了高斯溅射技术的优势。\n在本文中，我们提出GlossGau，一个高效的逆渲染框架，能够重建具有光滑表面的场景，同时保持与标准3D-GS相当的训练和渲染速度。具体而言，我们显式建模了表面法向量、双向反射分布函数（Bidirectional Reflectance Distribution Function, BRDF）参数，以及入射光，并利用各向异性球面高斯（Anisotropic Spherical Gaussian, ASG）来近似微表面模型（Microfacet Model）中的每个高斯法线分布函数。此外，我们采用2D高斯溅射（2D-GS）作为基础单元，并通过正则化策略显著缓解了相关工作中法向估计面临的挑战。\n实验结果表明，GlossGau 在具有光滑表面的数据集上实现了有竞争力甚至优于现有方法的重建效果。与现有基于GS处理镜面反射表面的工作相比，GlossGau 的优化时间显著缩短，进一步证明了其高效性。\n"
  },
  {
    "path": "abs/2502.14235.md",
    "content": "### OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving\n\nAccurate and realistic 3D scene reconstruction enables the lifelike creation of autonomous driving simulation environments. With advancements in 3D Gaussian Splatting (3DGS), previous studies have applied it to reconstruct complex dynamic driving scenes. These methods typically require expensive LiDAR sensors and pre-annotated datasets of dynamic objects. To address these challenges, we propose OG-Gaussian, a novel approach that replaces LiDAR point clouds with Occupancy Grids (OGs) generated from surround-view camera images using Occupancy Prediction Network (ONet). Our method leverages the semantic information in OGs to separate dynamic vehicles from static street background, converting these grids into two distinct sets of initial point clouds for reconstructing both static and dynamic objects. Additionally, we estimate the trajectories and poses of dynamic objects through a learning-based approach, eliminating the need for complex manual annotations. Experiments on Waymo Open dataset demonstrate that OG-Gaussian is on par with the current state-of-the-art in terms of reconstruction quality and rendering speed, achieving an average PSNR of 35.13 and a rendering speed of 143 FPS, while significantly reducing computational costs and economic overhead.\n\n精准且真实的三维场景重建 对于自动驾驶仿真环境的逼真构建至关重要。随着 3D 高斯溅射（3D Gaussian Splatting, 3DGS） 技术的进步，先前研究已将其应用于复杂动态驾驶场景的重建。然而，这些方法通常需要昂贵的 LiDAR 传感器 以及预标注的动态物体数据集，导致成本较高且难以大规模推广。\n为了解决这些问题，我们提出 OG-Gaussian，一种创新方法，用由环视相机图像通过占据预测网络（Occupancy Prediction Network, ONet）生成的占据网格（Occupancy Grids, OGs）替代 LiDAR 点云。我们的方法利用 OGs 中的语义信息，将动态车辆与静态街道背景分离，并将这些网格转换为 两组独立的初始点云，分别用于重建 静态物体 和 动态物体。此外，我们通过 基于学习的方法 估计 动态物体的轨迹和姿态，无需复杂的人工标注。\n在 Waymo Open 数据集 上的实验表明，OG-Gaussian 在重建质量和渲染速度方面可媲美当前最先进的方法，平均 PSNR 达 35.13，渲染速度 达到 143 FPS，同时大幅降低了计算成本和经济开销，为高效的自动驾驶场景重建提供了新方向。\n"
  },
  {
    "path": "abs/2502.14684.md",
    "content": "### CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has shown significant advantages in novel view synthesis (NVS), particularly in achieving high rendering speeds and high-quality results. However, its geometric accuracy in 3D reconstruction remains limited due to the lack of explicit geometric constraints during optimization. This paper introduces CDGS, a confidence-aware depth regularization approach developed to enhance 3DGS. We leverage multi-cue confidence maps of monocular depth estimation and sparse Structure-from-Motion depth to adaptively adjust depth supervision during the optimization process. Our method demonstrates improved geometric detail preservation in early training stages and achieves competitive performance in both NVS quality and geometric accuracy. Experiments on the publicly available Tanks and Temples benchmark dataset show that our method achieves more stable convergence behavior and more accurate geometric reconstruction results, with improvements of up to 2.31 dB in PSNR for NVS and consistently lower geometric errors in M3C2 distance metrics. Notably, our method reaches comparable F-scores to the original 3DGS with only 50% of the training iterations. We expect this work will facilitate the development of efficient and accurate 3D reconstruction systems for real-world applications such as digital twin creation, heritage preservation, or forestry applications.\n\n3D 高斯溅射（3D Gaussian Splatting, 3DGS） 在新视角合成（Novel View Synthesis, NVS） 方面展现了显著优势，特别是在实现高渲染速度和高质量合成方面。然而，由于优化过程中缺乏显式的几何约束，3DGS 在 三维重建 任务中的几何精度仍然有限。\n本文提出 CDGS，一种基于置信度的深度正则化（Confidence-aware Depth Regularization） 方法，以增强 3DGS 的几何准确性。我们利用单目深度估计的多线索置信图（multi-cue confidence maps） 以及 稀疏结构光测深（Sparse Structure-from-Motion Depth），在优化过程中自适应调整深度监督。该方法在早期训练阶段有效提升了几何细节保留能力，并在NVS 质量和几何准确性方面均表现出色。\n在 Tanks and Temples 公共基准数据集 上的实验表明，CDGS 收敛过程更加稳定，并在几何重建精度上取得了显著提升，在 NVS 任务中 PSNR 提升最高可达 2.31 dB，且在 M3C2 距离度量 上始终保持更低的几何误差。值得注意的是，我们的方法仅需原始 3DGS 50% 的训练迭代次数，即可达到相当的 F-score。我们期待该研究能够促进高效、精准的 3D 重建系统 发展，为 数字孪生（Digital Twin）、文化遗产保护和林业应用 等现实场景提供更优解决方案。\n"
  },
  {
    "path": "abs/2502.14895.md",
    "content": "### High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation\n\nWeather nowcasting is an essential task that involves predicting future radar echo sequences based on current observations, offering significant benefits for disaster management, transportation, and urban planning. Current prediction methods are limited by training and storage efficiency, mainly focusing on 2D spatial predictions at specific altitudes. Meanwhile, 3D volumetric predictions at each timestamp remain largely unexplored. To address such a challenge, we introduce a comprehensive framework for 3D radar sequence prediction in weather nowcasting, using the newly proposed SpatioTemporal Coherent Gaussian Splatting (STC-GS) for dynamic radar representation and GauMamba for efficient and accurate forecasting. Specifically, rather than relying on a 4D Gaussian for dynamic scene reconstruction, STC-GS optimizes 3D scenes at each frame by employing a group of Gaussians while effectively capturing their movements across consecutive frames. It ensures consistent tracking of each Gaussian over time, making it particularly effective for prediction tasks. With the temporally correlated Gaussian groups established, we utilize them to train GauMamba, which integrates a memory mechanism into the Mamba framework. This allows the model to learn the temporal evolution of Gaussian groups while efficiently handling a large volume of Gaussian tokens. As a result, it achieves both efficiency and accuracy in forecasting a wide range of dynamic meteorological radar signals. The experimental results demonstrate that our STC-GS can efficiently represent 3D radar sequences with over 16× higher spatial resolution compared with the existing 3D representation methods, while GauMamba outperforms state-of-the-art methods in forecasting a broad spectrum of high-dynamic weather conditions.\n\n天气临近预报（Weather Nowcasting）是一项关键任务，旨在基于当前观测数据预测未来的雷达回波序列，为 灾害管理、交通调度和城市规划 提供重要支持。现有预测方法主要专注于 特定高度的 2D 空间预测，但受到 训练效率和存储需求 的限制，而对 每个时间步的 3D 体积预测 仍然鲜有探索。\n为应对这一挑战，我们提出了一个 完整的 3D 雷达序列预测框架，基于新提出的 时空一致高斯溅射（SpatioTemporal Coherent Gaussian Splatting, STC-GS） 进行 动态雷达表示，并结合 GauMamba 以实现高效、精准的天气预报。具体而言，与依赖 4D 高斯 进行动态场景重建的方法不同，STC-GS 通过 一组高斯（Gaussian Groups） 在每一帧中优化 3D 场景，并有效地捕捉 连续帧之间的运动，确保对每个高斯点的一致跟踪，从而使其在 预测任务 中表现优异。\n在构建了时间相关的高斯组后，我们利用这些数据训练 GauMamba，该模型在 Mamba 框架 中集成了一种 记忆机制，能够学习高斯组的时间演化规律，并高效处理 大规模高斯数据流。最终，该方法能够在各种高动态气象雷达信号的预测任务中兼顾高效性与准确性。\n实验结果表明，STC-GS 能够以 16× 以上的空间分辨率 高效表示 3D 雷达序列，远超现有 3D 表示方法。同时，GauMamba 在预测各种复杂天气条件时，优于当前最先进的方法，为 高动态天气预报 提供了一种高效且精准的新方案。\n"
  },
  {
    "path": "abs/2502.14931.md",
    "content": "### Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting\n\nWe propose Hier-SLAM++, a comprehensive Neuro-Symbolic semantic 3D Gaussian Splatting SLAM method with both RGB-D and monocular input featuring an advanced hierarchical categorical representation, which enables accurate pose estimation as well as global 3D semantic mapping. The parameter usage in semantic SLAM systems increases significantly with the growing complexity of the environment, making scene understanding particularly challenging and costly. To address this problem, we introduce a novel and general hierarchical representation that encodes both semantic and geometric information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs) as well as the 3D generative model. By utilizing the proposed hierarchical tree structure, semantic information is symbolically represented and learned in an end-to-end manner. We further introduce a novel semantic loss designed to optimize hierarchical semantic information through both inter-level and cross-level optimization. Additionally, we propose an improved SLAM system to support both RGB-D and monocular inputs using a feed-forward model. To the best of our knowledge, this is the first semantic monocular Gaussian Splatting SLAM system, significantly reducing sensor requirements for 3D semantic understanding and broadening the applicability of semantic Gaussian SLAM system. We conduct experiments on both synthetic and real-world datasets, demonstrating superior or on-par performance with state-of-the-art NeRF-based and Gaussian-based SLAM systems, while significantly reducing storage and training time requirements.\n\n我们提出了 Hier-SLAM++，一种全面的 神经-符号语义 3D 高斯溅射 SLAM 方法，支持 RGB-D 和单目输入，并引入了 先进的分层类别表示，实现了精确的位姿估计和全局 3D 语义映射。\n在语义 SLAM 系统中，随着环境复杂度的增加，参数使用量显著增长，使得场景理解变得尤为困难且成本高昂。为了解决这一问题，我们提出了一种新颖且通用的 分层表示方法，将 语义与几何信息 以紧凑的形式编码到 3D 高斯溅射 中，同时利用 大语言模型（LLMs） 以及 3D 生成模型 的能力。通过所提出的 分层树结构，语义信息以符号化方式进行表示，并通过端到端方式进行学习。\n此外，我们进一步引入了一种新颖的 语义损失函数，通过 层内（inter-level）和跨层（cross-level）优化，提升分层语义信息的优化效果。同时，我们提出了一种改进的 SLAM 系统，该系统通过 前馈（feed-forward）模型 支持 RGB-D 和单目输入。据我们所知，这是首个支持单目输入的语义高斯溅射 SLAM 系统，大幅降低了 3D 语义理解的传感器需求，拓展了语义高斯 SLAM 系统的应用范围。\n我们在合成数据集和真实世界数据集上进行了实验，结果表明，该方法在性能上优于或可媲美最先进的基于 NeRF 和高斯表示的 SLAM 系统，同时显著降低了存储需求和训练时间。\n"
  },
  {
    "path": "abs/2502.14938.md",
    "content": "### GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models\n\nRendering large-scale 3D Gaussian Splatting (3DGS) model faces significant challenges in achieving real-time, high-fidelity performance on consumer-grade devices. Fully realizing the potential of 3DGS in applications such as virtual reality (VR) requires addressing critical system-level challenges to support real-time, immersive experiences. We propose GS-Cache, an end-to-end framework that seamlessly integrates 3DGS's advanced representation with a highly optimized rendering system. GS-Cache introduces a cache-centric pipeline to eliminate redundant computations, an efficiency-aware scheduler for elastic multi-GPU rendering, and optimized CUDA kernels to overcome computational bottlenecks. This synergy between 3DGS and system design enables GS-Cache to achieve up to 5.35x performance improvement, 35% latency reduction, and 42% lower GPU memory usage, supporting 2K binocular rendering at over 120 FPS with high visual quality. By bridging the gap between 3DGS's representation power and the demands of VR systems, GS-Cache establishes a scalable and efficient framework for real-time neural rendering in immersive environments.\n\n在消费级设备上渲染大规模 3D 高斯溅射（3DGS） 模型面临重大挑战，难以实现 实时、高保真 的性能。要充分发挥 3DGS 在 虚拟现实（VR） 等应用中的潜力，需要解决关键的系统级挑战，以支持 实时沉浸式体验。\n为此，我们提出 GS-Cache，一个端到端框架，将 3DGS 的高级表示 与 高度优化的渲染系统 无缝集成。GS-Cache 引入了 缓存中心化（cache-centric）管线 以消除冗余计算，设计了 基于效率感知（efficiency-aware）的调度器，支持弹性多 GPU 渲染，并优化了 CUDA 内核 以突破计算瓶颈。\n通过 3DGS 与系统设计的协同优化，GS-Cache 实现了 最高 5.35 倍的性能提升，35% 的延迟降低，以及 42% 的 GPU 内存占用减少，能够支持 2K 双目渲染，在 120+ FPS 下提供 高视觉质量。GS-Cache 弥合了 3DGS 的表示能力与 VR 系统需求之间的差距，为 沉浸式环境中的实时神经渲染 建立了 可扩展且高效的框架。\n"
  },
  {
    "path": "abs/2502.15309.md",
    "content": "### DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation\n\nIn real-world scenarios, environment changes caused by human or agent activities make it extremely challenging for robots to perform various long-term tasks. Recent works typically struggle to effectively understand and adapt to dynamic environments due to the inability to update their environment representations in memory according to environment changes and lack of fine-grained reconstruction of the environments. To address these challenges, we propose DynamicGSG, a dynamic, high-fidelity, open-vocabulary scene graph construction system leveraging Gaussian splatting. DynamicGSG builds hierarchical scene graphs using advanced vision language models to represent the spatial and semantic relationships between objects in the environments, utilizes a joint feature loss we designed to supervise Gaussian instance grouping while optimizing the Gaussian maps, and locally updates the Gaussian scene graphs according to real environment changes for long-term environment adaptation. Experiments and ablation studies demonstrate the performance and efficacy of our proposed method in terms of semantic segmentation, language-guided object retrieval, and reconstruction quality. Furthermore, we validate the dynamic updating capabilities of our system in real laboratory environments.\n\n在现实世界场景中，由人类或智能体活动引起的环境变化，使得机器人执行各种长期任务变得极具挑战性。现有研究通常难以有效理解和适应动态环境，其主要原因在于：无法根据环境变化更新内存中的环境表示，以及缺乏对环境的精细重建。\n为了解决这些问题，我们提出 DynamicGSG，一个基于高斯溅射（Gaussian Splatting）的动态、高保真、开放词汇场景图构建系统。DynamicGSG 通过先进的视觉-语言模型构建分层场景图，以表示环境中物体的空间与语义关系，并采用我们设计的联合特征损失（joint feature loss）对高斯实例分组进行监督，同时优化高斯映射。此外，系统可局部更新高斯场景图，使其能够根据真实环境变化进行长期适应。\n实验和消融研究表明，我们的方法在语义分割、语言引导的目标检索、重建质量等方面具有优异的性能和有效性。此外，我们在真实实验室环境中验证了系统的动态更新能力。\n"
  },
  {
    "path": "abs/2502.15633.md",
    "content": "### RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes\n\n3D Gaussian Splatting (3DGS) has become a popular solution in SLAM, as it can produce high-fidelity novel views. However, previous GS-based methods primarily target indoor scenes and rely on RGB-D sensors or pre-trained depth estimation models, hence underperforming in outdoor scenarios. To address this issue, we propose a RGB-only gaussian splatting SLAM method for unbounded outdoor scenes--OpenGS-SLAM. Technically, we first employ a pointmap regression network to generate consistent pointmaps between frames for pose estimation. Compared to commonly used depth maps, pointmaps include spatial relationships and scene geometry across multiple views, enabling robust camera pose estimation. Then, we propose integrating the estimated camera poses with 3DGS rendering as an end-to-end differentiable pipeline. Our method achieves simultaneous optimization of camera poses and 3DGS scene parameters, significantly enhancing system tracking accuracy. Specifically, we also design an adaptive scale mapper for the pointmap regression network, which provides more accurate pointmap mapping to the 3DGS map representation. Our experiments on the Waymo dataset demonstrate that OpenGS-SLAM reduces tracking error to 9.8\\% of previous 3DGS methods, and achieves state-of-the-art results in novel view synthesis.\n\n3D 高斯溅射（3DGS） 由于能够生成高保真的新视角，已成为 SLAM 领域的热门解决方案。然而，现有基于 3DGS 的方法主要针对室内场景，依赖 RGB-D 传感器 或 预训练的深度估计模型，因此在室外场景中的表现较差。\n为了解决这一问题，我们提出 OpenGS-SLAM，一种针对无界室外场景的纯 RGB 高斯溅射 SLAM 方法。\n在技术上，我们首先引入 点图（pointmap）回归网络，用于在不同帧之间生成一致的点图，以进行位姿估计。与常用的深度图相比，点图能够编码跨多个视角的空间关系和场景几何信息，从而实现更稳健的相机位姿估计。接着，我们提出将估计的相机位姿与 3DGS 渲染相结合，构建端到端可微分管线，实现相机位姿与 3DGS 场景参数的联合优化，显著提升系统的跟踪精度。此外，我们特别设计了一种自适应尺度映射器，用于点图回归网络，以提供更精确的点图到 3DGS 地图的映射。\n在 Waymo 数据集上的实验表明，OpenGS-SLAM 将跟踪误差降低到现有 3DGS 方法的 9.8%，并在新视角合成任务上达到了最先进的性能。\n\n"
  },
  {
    "path": "abs/2502.16475.md",
    "content": "### Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control\n\nSingle-image 3D generation has emerged as a prominent research topic, playing a vital role in virtual reality, 3D modeling, and digital content creation. However, existing methods face challenges such as a lack of multi-view geometric consistency and limited controllability during the generation process, which significantly restrict their usability. % To tackle these challenges, we introduce Dragen3D, a novel approach that achieves geometrically consistent and controllable 3D generation leveraging 3D Gaussian Splatting (3DGS). We introduce the Anchor-Gaussian Variational Autoencoder (Anchor-GS VAE), which encodes a point cloud and a single image into anchor latents and decode these latents into 3DGS, enabling efficient latent-space generation. To enable multi-view geometry consistent and controllable generation, we propose a Seed-Point-Driven strategy: first generate sparse seed points as a coarse geometry representation, then map them to anchor latents via the Seed-Anchor Mapping Module. Geometric consistency is ensured by the easily learned sparse seed points, and users can intuitively drag the seed points to deform the final 3DGS geometry, with changes propagated through the anchor latents. To the best of our knowledge, we are the first to achieve geometrically controllable 3D Gaussian generation and editing without relying on 2D diffusion priors, delivering comparable 3D generation quality to state-of-the-art methods.\n\n单图 3D 生成已成为一个重要的研究课题，在虚拟现实、3D 建模和数字内容创作中发挥着关键作用。然而，现有方法面临多视角几何一致性不足和生成过程中可控性有限的挑战，这极大地限制了其实用性。\n为了解决这些挑战，我们提出 Dragen3D，这是一种利用 3D Gaussian Splatting (3DGS) 实现几何一致且可控的 3D 生成的新方法。我们引入了 Anchor-Gaussian 变分自编码器（Anchor-GS VAE），该模型能够将点云和单张图像编码为锚定潜变量（anchor latents），并解码为 3DGS，从而实现高效的潜空间生成。\n为了实现多视角几何一致且可控的生成，我们提出了一种 种子点驱动（Seed-Point-Driven） 策略：首先生成稀疏种子点作为粗略的几何表示，然后通过 种子-锚定映射模块（Seed-Anchor Mapping Module） 将其映射到锚定潜变量。几何一致性由易于学习的稀疏种子点确保，同时用户可以直观地拖动种子点来变形最终的 3DGS 几何结构，所有变形都会通过锚定潜变量传播到最终生成结果。\n据我们所知，我们是 首个在无需依赖 2D 扩散先验的情况下，实现几何可控的 3D Gaussian 生成与编辑 的方法，并在 3D 生成质量上达到与当前最先进方法相媲美的水平。\n"
  },
  {
    "path": "abs/2502.16652.md",
    "content": "### Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration\n\nWe introduce Dr. Splat, a novel approach for open-vocabulary 3D scene understanding leveraging 3D Gaussian Splatting. Unlike existing language-embedded 3DGS methods, which rely on a rendering process, our method directly associates language-aligned CLIP embeddings with 3D Gaussians for holistic 3D scene understanding. The key of our method is a language feature registration technique where CLIP embeddings are assigned to the dominant Gaussians intersected by each pixel-ray. Moreover, we integrate Product Quantization (PQ) trained on general large-scale image data to compactly represent embeddings without per-scene optimization. Experiments demonstrate that our approach significantly outperforms existing approaches in 3D perception benchmarks, such as open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection tasks.\n\n我们提出 Dr. Splat，这是一种利用 3D Gaussian Splatting (3DGS) 进行 开放词汇 3D 场景理解 的新方法。与现有的基于语言嵌入的 3DGS 方法不同，它们依赖于渲染过程，而我们的方法直接将 与语言对齐的 CLIP 嵌入（embeddings） 关联到 3D Gaussians，以实现整体的 3D 场景理解。\n我们方法的关键是 语言特征注册技术（Language Feature Registration），其中 CLIP 嵌入被分配到每条像素射线（pixel-ray）所交叉的 主要 Gaussians 上。此外，我们结合了 基于大规模通用图像数据训练的乘积量化（Product Quantization, PQ），以紧凑地表示嵌入，而无需针对每个场景进行优化。\n实验表明，我们的方法在 3D 认知基准任务上 显著优于现有方法，包括 开放词汇 3D 语义分割、3D 目标定位和 3D 目标选择 等任务。\n\n"
  },
  {
    "path": "abs/2502.16748.md",
    "content": "### GS-TransUNet: Integrated 2D Gaussian Splatting and Transformer UNet for Accurate Skin Lesion Analysis\n\nWe can achieve fast and consistent early skin cancer detection with recent developments in computer vision and deep learning techniques. However, the existing skin lesion segmentation and classification prediction models run independently, thus missing potential efficiencies from their integrated execution. To unify skin lesion analysis, our paper presents the Gaussian Splatting - Transformer UNet (GS-TransUNet), a novel approach that synergistically combines 2D Gaussian splatting with the Transformer UNet architecture for automated skin cancer diagnosis. Our unified deep learning model efficiently delivers dual-function skin lesion classification and segmentation for clinical diagnosis. Evaluated on ISIC-2017 and PH2 datasets, our network demonstrates superior performance compared to existing state-of-the-art models across multiple metrics through 5-fold cross-validation. Our findings illustrate significant advancements in the precision of segmentation and classification. This integration sets new benchmarks in the field and highlights the potential for further research into multi-task medical image analysis methodologies, promising enhancements in automated diagnostic systems.\n\n近年来，计算机视觉和深度学习技术的发展使皮肤癌的快速且一致的早期检测成为可能。然而，现有的皮肤病变分割与分类模型通常独立运行，未能充分利用两者的协同执行带来的潜在效率提升。\n为此，我们提出Gaussian Splatting - Transformer UNet (GS-TransUNet)，这是一种结合 2D 高斯散点 (2D Gaussian Splatting) 与 Transformer UNet 体系结构的新型方法，用于自动化皮肤癌诊断。该统一的深度学习模型能够高效地同时执行皮肤病变的分类与分割，从而辅助临床诊断。\n在 ISIC-2017 和 PH2 数据集上的实验表明，我们的网络在5 折交叉验证 (5-fold cross-validation) 评测中，在多个指标上均优于当前最先进模型。实验结果表明，我们的方法在分割与分类精度方面取得了显著提升，不仅刷新了该领域的基准，还展示了多任务医学图像分析方法的研究潜力，为自动化诊断系统的进一步发展提供了新方向。\n"
  },
  {
    "path": "abs/2502.17288.md",
    "content": "### GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow\n\nOccupancy estimation has become a prominent task in 3D computer vision, particularly within the autonomous driving community. In this paper, we present a novel approach to occupancy estimation, termed GaussianFlowOcc, which is inspired by Gaussian Splatting and replaces traditional dense voxel grids with a sparse 3D Gaussian representation. Our efficient model architecture based on a Gaussian Transformer significantly reduces computational and memory requirements by eliminating the need for expensive 3D convolutions used with inefficient voxel-based representations that predominantly represent empty 3D spaces. GaussianFlowOcc effectively captures scene dynamics by estimating temporal flow for each Gaussian during the overall network training process, offering a straightforward solution to a complex problem that is often neglected by existing methods. Moreover, GaussianFlowOcc is designed for scalability, as it employs weak supervision and does not require costly dense 3D voxel annotations based on additional data (e.g., LiDAR). Through extensive experimentation, we demonstrate that GaussianFlowOcc significantly outperforms all previous methods for weakly supervised occupancy estimation on the nuScenes dataset while featuring an inference speed that is 50 times faster than current SOTA.\n\n占用估计（Occupancy Estimation） 已成为 3D 计算机视觉 领域的重要任务，尤其在 自动驾驶 领域受到了广泛关注。在本文中，我们提出了一种新颖的占用估计算法 GaussianFlowOcc，该方法受 Gaussian Splatting 启发，用 稀疏 3D 高斯表示 替代了传统的 稠密体素网格。\n我们基于 Gaussian Transformer 设计了一种高效的模型架构，大幅降低了计算和内存开销。相比传统 基于体素的表示方法 主要用于表示 空旷的 3D 空间，但需要高昂的 3D 卷积计算，我们的方法无需这些低效操作。GaussianFlowOcc 通过在整个网络训练过程中，为每个 3D Gaussian 估计 时序流（temporal flow），从而高效捕捉场景动态，为这一复杂问题提供了一种直观的解决方案，而该问题往往被现有方法所忽略。\n此外，GaussianFlowOcc 具有良好的可扩展性，采用 弱监督 训练 （Weak Supervision），无需依赖额外数据（如 LiDAR）生成的 高成本稠密 3D 体素标注。\n通过广泛的实验，我们表明 GaussianFlowOcc 在 nuScenes 数据集上的弱监督占用估计任务中，显著超越了所有已有方法，并且 推理速度比当前 SOTA 方法快 50 倍。\n"
  },
  {
    "path": "abs/2502.17377.md",
    "content": "### Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting\n\nThis paper investigates an open research challenge of reconstructing high-quality, large 3D open scenes from images. It is observed existing methods have various limitations, such as requiring precise camera poses for input and dense viewpoints for supervision. To perform effective and efficient 3D scene reconstruction, we propose a novel graph-guided 3D scene reconstruction framework, GraphGS. Specifically, given a set of images captured by RGB cameras on a scene, we first design a spatial prior-based scene structure estimation method. This is then used to create a camera graph that includes information about the camera topology. Further, we propose to apply the graph-guided multi-view consistency constraint and adaptive sampling strategy to the 3D Gaussian Splatting optimization process. This greatly alleviates the issue of Gaussian points overfitting to specific sparse viewpoints and expedites the 3D reconstruction process. We demonstrate GraphGS achieves high-fidelity 3D reconstruction from images, which presents state-of-the-art performance through quantitative and qualitative evaluation across multiple datasets.\n\n本文研究了从图像重建 高质量、大规模 3D 开放场景 这一开放性研究挑战。现有方法存在诸多局限性，例如 需要精确的相机姿态输入，以及 依赖稠密视角进行监督。\n为了实现高效且高质量的 3D 场景重建，我们提出了一种 基于图引导的 3D 场景重建框架——GraphGS。具体而言，给定由 RGB 相机 拍摄的一组场景图像，我们首先设计了一种 基于空间先验的场景结构估计算法，并利用该估计结果 构建相机图（Camera Graph），以描述相机的拓扑关系。此外，我们在 3D Gaussian Splatting 优化过程中 引入 图引导的多视角一致性约束（Graph-Guided Multi-View Consistency Constraint） 和 自适应采样策略（Adaptive Sampling Strategy），有效缓解 3D 高斯点（Gaussian Points）过拟合于特定稀疏视角 的问题，并加速 3D 重建过程。\n实验结果表明，GraphGS 能够从图像高保真地重建 3D 场景，在多个数据集上的定量和定性评估均达到 当前最先进（SOTA） 的性能。\n"
  },
  {
    "path": "abs/2502.17531.md",
    "content": "### Laplace-Beltrami Operator for Gaussian Splatting\n\nWith the rising popularity of 3D Gaussian splatting and the expanse of applications from rendering to 3D reconstruction, there comes also a need for geometry processing applications directly on this new representation. While considering the centers of Gaussians as a point cloud or meshing them is an option that allows to apply existing algorithms, this might ignore information present in the data or be unnecessarily expensive. Additionally, Gaussian splatting tends to contain a large number of outliers which do not affect the rendering quality but need to be handled correctly in order not to produce noisy results in geometry processing applications. In this work, we propose a formulation to compute the Laplace-Beltrami operator, a widely used tool in geometry processing, directly on Gaussian splatting using the Mahalanobis distance. While conceptually similar to a point cloud Laplacian, our experiments show superior accuracy on the point clouds encoded in the Gaussian splatting centers and, additionally, the operator can be used to evaluate the quality of the output during optimization.\n\n随着 3D Gaussian Splatting 的日益流行及其在 渲染、3D 重建 等应用中的广泛扩展，对基于这种新表示形式的 几何处理（Geometry Processing） 需求也随之增长。虽然可以将 高斯中心视为点云 或进行 网格化（Meshing） 以应用现有算法，但这种方法可能忽略数据中的关键信息，或导致不必要的计算开销。此外，3D Gaussian Splatting 通常包含大量离群点（Outliers），这些点虽然不影响渲染质量，但若在几何处理中未能正确处理，可能会引入严重的噪声。\n为此，我们提出了一种 直接在 Gaussian Splatting 上计算拉普拉斯-贝尔特拉米算子（Laplace-Beltrami Operator） 的新方法，该方法利用 马哈拉诺比斯距离（Mahalanobis Distance） 进行计算。虽然在概念上类似于 点云拉普拉斯（Point Cloud Laplacian），但实验表明，我们的方法在 高斯中心编码的点云 上具有更高的准确性。此外，该算子还可用于优化过程中评估输出质量，从而提升整体几何处理的稳定性和可靠性。\n\n"
  },
  {
    "path": "abs/2502.17860.md",
    "content": "### UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting\n\nRecent advancements in multi-modal 3D pre-training methods have shown promising efficacy in learning joint representations of text, images, and point clouds. However, adopting point clouds as 3D representation fails to fully capture the intricacies of the 3D world and exhibits a noticeable gap between the discrete points and the dense 2D pixels of images. To tackle this issue, we propose UniGS, integrating 3D Gaussian Splatting (3DGS) into multi-modal pre-training to enhance the 3D representation. We first rely on the 3DGS representation to model the 3D world as a collection of 3D Gaussians with color and opacity, incorporating all the information of the 3D scene while establishing a strong connection with 2D images. Then, to achieve Language-Image-3D pertaining, UniGS starts with a pre-trained vision-language model to establish a shared visual and textual space through extensive real-world image-text pairs. Subsequently, UniGS employs a 3D encoder to align the optimized 3DGS with the Language-Image representations to learn unified multi-modal representations. To facilitate the extraction of global explicit 3D features by the 3D encoder and achieve better cross-modal alignment, we additionally introduce a novel Gaussian-Aware Guidance module that guides the learning of fine-grained representations of the 3D domain. Through extensive experiments across the Objaverse, ABO, MVImgNet and SUN RGBD datasets with zero-shot classification, text-driven retrieval and open-world understanding tasks, we demonstrate the effectiveness of UniGS in learning a more general and stronger aligned multi-modal representation. Specifically, UniGS achieves leading results across different 3D tasks with remarkable improvements over previous SOTA, Uni3D, including on zero-shot classification (+9.36%), text-driven retrieval (+4.3%) and open-world understanding (+7.92%).\n\n近年来，多模态 3D 预训练方法在学习文本、图像和点云的联合表示方面取得了显著进展。然而，将点云作为 3D 表示方式难以充分捕捉 3D 世界的复杂细节，并且在离散点云与图像中密集 2D 像素之间存在明显的鸿沟。为了解决这一问题，我们提出了 UniGS，将 3D Gaussian Splatting (3DGS) 引入多模态预训练，以增强 3D 表示能力。\n首先，UniGS 依赖 3DGS 表示，将 3D 世界建模为具有颜色和不透明度的 3D 高斯分布集合，从而完整地保留 3D 场景的所有信息，并与 2D 图像建立紧密联系。然后，为了实现 语言-图像-3D 预训练，UniGS 以一个 预训练的视觉-语言模型 作为起点，通过大规模的真实世界图文对构建共享的视觉和文本空间。随后，UniGS 采用 3D 编码器，将优化后的 3DGS 与 语言-图像表示 进行对齐，从而学习统一的多模态表示。\n此外，为了促进 3D 编码器 提取全局显式 3D 特征并实现更优的跨模态对齐，我们进一步引入了一种新颖的 高斯感知引导（Gaussian-Aware Guidance）模块，用于引导 3D 领域的细粒度表示学习。\n在 Objaverse、ABO、MVImgNet 和 SUN RGBD 数据集上，我们进行了 零样本分类、文本驱动检索和开放世界理解 等广泛实验，以验证 UniGS 在学习更通用且更强对齐的多模态表示方面的有效性。实验结果表明，与现有 SOTA 方法 Uni3D 相比，UniGS 在不同的 3D 任务中均取得领先表现，并在多个指标上带来了显著提升，包括 零样本分类提升 +9.36%，文本驱动检索提升 +4.3%，开放世界理解提升 +7.92%。\n"
  },
  {
    "path": "abs/2502.18041.md",
    "content": "### OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation\n\nVision-Language Navigation (VLN) aims to guide agents through an environment by leveraging both language instructions and visual cues, playing a pivotal role in embodied AI. Indoor VLN has been extensively studied, whereas outdoor aerial VLN remains underexplored. The potential reason is that outdoor aerial view encompasses vast areas, making data collection more challenging, which results in a lack of benchmarks. To address this problem, we propose OpenFly, a platform comprising a versatile toolchain and large-scale benchmark for aerial VLN. Firstly, we develop a highly automated toolchain for data collection, enabling automatic point cloud acquisition, scene semantic segmentation, flight trajectory creation, and instruction generation. Secondly, based on the toolchain, we construct a large-scale aerial VLN dataset with 100k trajectories, covering diverse heights and lengths across 18 scenes. The corresponding visual data are generated using various rendering engines and advanced techniques, including Unreal Engine, GTA V, Google Earth, and 3D Gaussian Splatting (3D GS). All data exhibit high visual quality. Particularly, 3D GS supports real-to-sim rendering, further enhancing the realism of the dataset. Thirdly, we propose OpenFly-Agent, a keyframe-aware VLN model, which takes language instructions, current observations, and historical keyframes as input, and outputs flight actions directly. Extensive analyses and experiments are conducted, showcasing the superiority of our OpenFly platform and OpenFly-Agent. The toolchain, dataset, and codes will be open-sourced.\n\n视觉-语言导航（Vision-Language Navigation, VLN） 旨在利用语言指令和视觉线索引导智能体穿越环境，在具身智能（Embodied AI）中扮演着关键角色。尽管 室内 VLN 已被广泛研究，但 室外航空 VLN 仍然较少被探索。其潜在原因在于，室外航空视图覆盖范围广阔，数据采集更具挑战性，导致缺乏相关基准数据集。\n为了解决这一问题，我们提出 OpenFly，一个集 多功能工具链 和 大规模基准数据集 于一体的 航空 VLN 研究平台。\n首先，我们开发了一套高度自动化的数据采集工具链，实现 点云自动采集、场景语义分割、飞行轨迹生成和导航指令生成，极大地提高了数据构建效率。\n其次，基于该工具链，我们构建了一个 大规模航空 VLN 数据集，包含 10 万条导航轨迹，覆盖 18 个场景，涉及不同的 飞行高度和航线长度。对应的视觉数据采用多种渲染引擎和先进技术生成，包括 Unreal Engine、GTA V、Google Earth 以及 3D Gaussian Splatting（3D GS），确保了高质量的视觉效果。其中，3D GS 支持真实到模拟（Real-to-Sim）渲染，进一步增强了数据集的逼真性。\n第三，我们提出了一种 关键帧感知 VLN 模型——OpenFly-Agent。该模型以 语言指令、当前观察视图和历史关键帧 作为输入，直接输出飞行动作，以此提升导航决策能力。\n我们进行了广泛的分析和实验，验证了 OpenFly 平台和 OpenFly-Agent 的优越性。工具链、数据集和代码将全部开源，以促进航空 VLN 研究的发展。\n"
  },
  {
    "path": "abs/2502.19318.md",
    "content": "### Does 3D Gaussian Splatting Need Accurate Volumetric Rendering?\n\nSince its introduction, 3D Gaussian Splatting (3DGS) has become an important reference method for learning 3D representations of a captured scene, allowing real-time novel-view synthesis with high visual quality and fast training times. Neural Radiance Fields (NeRFs), which preceded 3DGS, are based on a principled ray-marching approach for volumetric rendering. In contrast, while sharing a similar image formation model with NeRF, 3DGS uses a hybrid rendering solution that builds on the strengths of volume rendering and primitive rasterization. A crucial benefit of 3DGS is its performance, achieved through a set of approximations, in many cases with respect to volumetric rendering theory. A naturally arising question is whether replacing these approximations with more principled volumetric rendering solutions can improve the quality of 3DGS. In this paper, we present an in-depth analysis of the various approximations and assumptions used by the original 3DGS solution. We demonstrate that, while more accurate volumetric rendering can help for low numbers of primitives, the power of efficient optimization and the large number of Gaussians allows 3DGS to outperform volumetric rendering despite its approximations.\n\n自 3D Gaussian Splatting (3DGS) 被提出以来，它已成为学习 捕获场景的 3D 表示 的重要基准方法，能够以 高视觉质量 和 快速训练时间 实现 实时新视角合成。相比之下，其前身 Neural Radiance Fields (NeRFs) 基于严格的 射线行进（ray-marching）体渲染 方法，而 3DGS 虽然采用了与 NeRF 相似的成像模型，但其渲染方案是 体渲染与基本光栅化（primitive rasterization）相结合的混合方法。\n3DGS 的一个关键优势在于其 高效的性能，这得益于一系列 针对体渲染理论的近似处理。这自然引出了一个问题：如果用更严谨的体渲染方法取代这些近似，是否能提升 3DGS 的渲染质量？\n在本文中，我们对 原始 3DGS 方案中涉及的各种近似和假设 进行了深入分析。我们发现，尽管在 少量 3D 基元（primitives） 的情况下，更精确的体渲染方法 确实能带来一定的质量提升，但 高效的优化策略 以及 庞大的高斯分布数量 使得 3DGS 在总体性能上依然优于传统体渲染方法，即便它依赖近似计算。\n"
  },
  {
    "path": "abs/2502.19457.md",
    "content": "### Compression in 3D Gaussian Splatting: A Survey of Methods, Trends, and Future Directions\n\n3D Gaussian Splatting (3DGS) has recently emerged as a pioneering approach in explicit scene rendering and computer graphics. Unlike traditional neural radiance field (NeRF) methods, which typically rely on implicit, coordinate-based models to map spatial coordinates to pixel values, 3DGS utilizes millions of learnable 3D Gaussians. Its differentiable rendering technique and inherent capability for explicit scene representation and manipulation positions 3DGS as a potential game-changer for the next generation of 3D reconstruction and representation technologies. This enables 3DGS to deliver real-time rendering speeds while offering unparalleled editability levels. However, despite its advantages, 3DGS suffers from substantial memory and storage requirements, posing challenges for deployment on resource-constrained devices. In this survey, we provide a comprehensive overview focusing on the scalability and compression of 3DGS. We begin with a detailed background overview of 3DGS, followed by a structured taxonomy of existing compression methods. Additionally, we analyze and compare current methods from the topological perspective, evaluating their strengths and limitations in terms of fidelity, compression ratios, and computational efficiency. Furthermore, we explore how advancements in efficient NeRF representations can inspire future developments in 3DGS optimization. Finally, we conclude with current research challenges and highlight key directions for future exploration.\n\n3D Gaussian Splatting (3DGS) 近年来作为 显式场景渲染和计算机图形学 领域的一项创新方法崭露头角。与传统的 神经辐射场（NeRF） 方法不同，NeRF 通常依赖 隐式、基于坐标的模型 来将空间坐标映射到像素值，而 3DGS 采用了数百万个可学习的 3D 高斯分布。其 可微渲染技术 以及 显式场景表示和编辑能力 使其成为 下一代 3D 重建与表示技术的潜在变革者。3DGS 不仅能够实现 实时渲染，还提供了 前所未有的可编辑性。\n然而，尽管 3DGS 具有诸多优势，但它仍面临 高内存占用和存储需求 的挑战，在资源受限设备上的部署难度较大。\n在本综述中，我们对 3DGS 的可扩展性与压缩技术 进行了全面分析。首先，我们提供 3DGS 的详细背景概述，然后构建了一套 现有压缩方法的系统分类。此外，我们从 拓扑结构的角度 对当前方法进行分析和比较，评估其在 重建质量、压缩率和计算效率 方面的优劣势。进一步地，我们探讨了 高效 NeRF 表示方法的最新进展，并分析其如何为 3DGS 优化 提供启发。\n最后，我们总结当前研究挑战，并提出 未来研究的关键方向，以推动 3DGS 在 高效性和可扩展性 方面的进一步发展。\n"
  },
  {
    "path": "abs/2502.19459.md",
    "content": "### Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting\n\nBuilding articulated objects is a key challenge in computer vision. Existing methods often fail to effectively integrate information across different object states, limiting the accuracy of part-mesh reconstruction and part dynamics modeling, particularly for complex multi-part articulated objects. We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation to address these issues. Our method incorporates canonical Gaussians with coarse-to-fine initialization and updates for aligning articulated part information across different object states, and employs a skinning-inspired part dynamics modeling module to improve both part-mesh reconstruction and articulation learning. Extensive experiments on both synthetic and real-world datasets, including a new benchmark for complex multi-part objects, demonstrate that ArtGS achieves state-of-the-art performance in joint parameter estimation and part mesh reconstruction. Our approach significantly improves reconstruction quality and efficiency, especially for multi-part articulated objects. Additionally, we provide comprehensive analyses of our design choices, validating the effectiveness of each component to highlight potential areas for future improvement.\n\n构建关节化对象（articulated objects）是计算机视觉中的一大挑战。现有方法往往难以有效整合不同状态下的对象信息，从而限制了部件网格重建和部件动态建模的准确性，特别是在处理复杂的多部件关节化对象时。\n为了解决这一问题，我们提出 ArtGS，一种新颖的方法，利用 3D 高斯分布（3D Gaussians） 作为灵活高效的表示方式，从而优化关节化对象的建模。我们的方法采用规范高斯（canonical Gaussians），结合由粗到细的初始化和更新策略，使得不同状态下的关节化部件信息能够精确对齐。此外，我们引入了受蒙皮技术（skinning）启发的部件动态建模模块，以提升部件网格重建和关节学习的效果。\n我们在多个合成数据集和真实世界数据集上进行了广泛实验，包括一个新的复杂多部件对象基准数据集。实验结果表明，ArtGS 在关节参数估计和部件网格重建方面均达到了最新的最优性能（SOTA）。特别是，在多部件关节化对象的重建质量和计算效率方面，我们的方法实现了显著提升。\n此外，我们对 ArtGS 的关键设计选择进行了全面分析，验证了各组件的有效性，并探讨了未来潜在的优化方向。\n"
  },
  {
    "path": "abs/2502.19739.md",
    "content": "### LUCAS: Layered Universal Codec Avatars\n\nPhotorealistic 3D head avatar reconstruction faces critical challenges in modeling dynamic face-hair interactions and achieving cross-identity generalization, particularly during expressions and head movements. We present LUCAS, a novel Universal Prior Model (UPM) for codec avatar modeling that disentangles face and hair through a layered representation. Unlike previous UPMs that treat hair as an integral part of the head, our approach separates the modeling of the hairless head and hair into distinct branches. LUCAS is the first to introduce a mesh-based UPM, facilitating real-time rendering on devices. Our layered representation also improves the anchor geometry for precise and visually appealing Gaussian renderings. Experimental results indicate that LUCAS outperforms existing single-mesh and Gaussian-based avatar models in both quantitative and qualitative assessments, including evaluations on held-out subjects in zero-shot driving scenarios. LUCAS demonstrates superior dynamic performance in managing head pose changes, expression transfer, and hairstyle variations, thereby advancing the state-of-the-art in 3D head avatar reconstruction.\n\n逼真的 3D 头像重建在建模动态面部-头发交互以及实现跨身份泛化方面面临关键挑战，特别是在表情变化和头部运动时。为此，我们提出 LUCAS，一种用于编解码头像建模（codec avatar modeling）的通用先验模型（Universal Prior Model, UPM），其核心创新在于通过**分层表示（layered representation）**实现面部与头发的解耦。\n与以往将头发视为头部整体一部分的 UPM 方法不同，我们的方法将无发头部与头发建模为独立分支，分别处理。这一方法首次引入基于网格（mesh-based）的 UPM，从而实现设备端的实时渲染。此外，我们的分层表示还优化了锚点几何（anchor geometry），提升了**高斯渲染（Gaussian renderings）**的精度和视觉效果。\n实验结果表明，LUCAS 在单网格（single-mesh）和基于高斯的头像模型中均表现优异，在零样本（zero-shot）驱动场景下，对未见过的测试对象进行评估，展现了卓越的定量和定性优势。LUCAS 在头部姿态变化、表情迁移和发型变化等动态场景中的表现显著优于现有方法，推动了 3D 头像重建的最前沿技术发展。\n"
  },
  {
    "path": "abs/2502.19782.md",
    "content": "### Open-Vocabulary Semantic Part Segmentation of 3D Human\n\n3D part segmentation is still an open problem in the field of 3D vision and AR/VR. Due to limited 3D labeled data, traditional supervised segmentation methods fall short in generalizing to unseen shapes and categories. Recently, the advancement in vision-language models' zero-shot abilities has brought a surge in open-world 3D segmentation methods. While these methods show promising results for 3D scenes or objects, they do not generalize well to 3D humans. In this paper, we present the first open-vocabulary segmentation method capable of handling 3D human. Our framework can segment the human category into desired fine-grained parts based on the textual prompt. We design a simple segmentation pipeline, leveraging SAM to generate multi-view proposals in 2D and proposing a novel HumanCLIP model to create unified embeddings for visual and textual inputs. Compared with existing pre-trained CLIP models, the HumanCLIP model yields more accurate embeddings for human-centric contents. We also design a simple-yet-effective MaskFusion module, which classifies and fuses multi-view features into 3D semantic masks without complex voting and grouping mechanisms. The design of decoupling mask proposals and text input also significantly boosts the efficiency of per-prompt inference. Experimental results on various 3D human datasets show that our method outperforms current state-of-the-art open-vocabulary 3D segmentation methods by a large margin. In addition, we show that our method can be directly applied to various 3D representations including meshes, point clouds, and 3D Gaussian Splatting.\n\n3D 部件分割仍然是3D 视觉和 AR/VR 领域中的一个开放性问题。由于3D 标注数据有限，传统的监督分割方法难以泛化到未见过的形状和类别。近年来，视觉-语言模型 在零样本能力上的突破推动了开放世界 3D 分割方法的发展。然而，尽管这些方法在3D 场景或物体上取得了良好效果，但它们在3D 人体上的泛化能力仍然较差。\n在本文中，我们提出了首个能够处理 3D 人体的开放词汇分割方法。我们的框架能够根据文本提示，将人体类别划分为细粒度的部件。我们设计了一条简洁的分割流水线，利用 SAM 在2D 视图上生成多视角分割候选区域，并提出 HumanCLIP 模型，以创建统一的视觉和文本嵌入。相比现有的预训练 CLIP 模型，HumanCLIP 在人体相关内容上生成的嵌入更加准确。我们还提出 MaskFusion 模块，以分类和融合多视角特征，从而生成3D 语义掩码，无需复杂的投票和分组机制。此外，掩码候选区域与文本输入的解耦设计显著提升了按需推理（per-prompt inference）的效率。\n在多个3D 人体数据集上的实验表明，我们的方法相比现有最先进（SOTA）的开放词汇 3D 分割方法取得了大幅度提升。此外，我们的方法可直接适用于多种 3D 表示，包括网格（meshes）、点云（point clouds）和 3D Gaussian Splatting（3DGS），展现出卓越的通用性。\n"
  },
  {
    "path": "abs/2502.19800.md",
    "content": "### No Parameters, No Problem: 3D Gaussian Splatting without Camera Intrinsics and Extrinsics\n\nWhile 3D Gaussian Splatting (3DGS) has made significant progress in scene reconstruction and novel view synthesis, it still heavily relies on accurately pre-computed camera intrinsics and extrinsics, such as focal length and camera poses. In order to mitigate this dependency, the previous efforts have focused on optimizing 3DGS without the need for camera poses, yet camera intrinsics remain necessary. To further loose the requirement, we propose a joint optimization method to train 3DGS from an image collection without requiring either camera intrinsics or extrinsics. To achieve this goal, we introduce several key improvements during the joint training of 3DGS. We theoretically derive the gradient of the camera intrinsics, allowing the camera intrinsics to be optimized simultaneously during training. Moreover, we integrate global track information and select the Gaussian kernels associated with each track, which will be trained and automatically rescaled to an infinitesimally small size, closely approximating surface points, and focusing on enforcing multi-view consistency and minimizing reprojection errors, while the remaining kernels continue to serve their original roles. This hybrid training strategy nicely unifies the camera parameters estimation and 3DGS training. Extensive evaluations demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on both public and synthetic datasets.\n\n尽管 3D Gaussian Splatting (3DGS) 在新视角合成方面表现出色，但它仍然依赖于精确的预计算相机参数，而这些参数往往难以获取且容易受到噪声影响。此前的无 COLMAP（COLMAP-Free） 方法通过局部约束优化相机位姿，但在复杂场景中常常表现不佳。\n为了解决这一问题，我们提出 TrackGS，通过特征轨迹（feature tracks） 对多视角几何关系施加全局约束。我们选择与每条轨迹关联的高斯分布，并在训练过程中缩放至无限小尺寸，以确保空间精度。此外，我们同时最小化重投影误差（reprojection error）和反投影误差（backprojection error），以增强几何一致性。\n此外，我们推导了相机内参的梯度计算，从而在 3DGS 训练过程中对相机参数进行联合优化，构建了一个端到端的优化框架。在包含剧烈相机运动的挑战性数据集上，我们的方法达到了**最先进（SOTA）**的性能。\n"
  },
  {
    "path": "abs/2502.20220.md",
    "content": "### Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars\n\nTraditionally, creating photo-realistic 3D head avatars requires a studio-level multi-view capture setup and expensive optimization during test-time, limiting the use of digital human doubles to the VFX industry or offline renderings.\nTo address this shortcoming, we present Avat3r, which regresses a high-quality and animatable 3D head avatar from just a few input images, vastly reducing compute requirements during inference. More specifically, we make Large Reconstruction Models animatable and learn a powerful prior over 3D human heads from a large multi-view video dataset. For better 3D head reconstructions, we employ position maps from DUSt3R and generalized feature maps from the human foundation model Sapiens. To animate the 3D head, our key discovery is that simple cross-attention to an expression code is already sufficient. Finally, we increase robustness by feeding input images with different expressions to our model during training, enabling the reconstruction of 3D head avatars from inconsistent inputs, e.g., an imperfect phone capture with accidental movement, or frames from a monocular video.\nWe compare Avat3r with current state-of-the-art methods for few-input and single-input scenarios, and find that our method has a competitive advantage in both tasks. Finally, we demonstrate the wide applicability of our proposed model, creating 3D head avatars from images of different sources, smartphone captures, single images, and even out-of-domain inputs like antique busts.\n\n传统上，创建逼真的 3D 头像模型需要使用专业级的多视角捕捉设备，并在测试阶段进行昂贵的优化，这限制了数字人双胞胎技术的应用范围，仅能用于视觉特效（VFX）行业或离线渲染。\n为了解决这一局限性，我们提出了 Avat3r，该方法能够仅凭少量输入图像回归出高质量且可动画化的 3D 头像模型，大幅降低推理时的计算成本。具体而言，我们使大规模重建模型（Large Reconstruction Models）具备动画能力，并从大规模多视角视频数据集中学习了强大的 3D 人头先验。为了实现更高质量的 3D 头部重建，我们结合了 DUSt3R 提供的位置映射（position maps）以及人类基础模型 Sapiens 的通用特征映射（generalized feature maps）。\n在 3D 头部动画化方面，我们的关键发现是：简单的跨注意力（cross-attention）机制应用于表情编码（expression code）即可实现高效的动画驱动。此外，为了增强鲁棒性，我们在训练过程中输入了具有不同表情的图像，使模型能够从不一致的输入数据中重建 3D 头像，例如因意外运动导致的手机拍摄误差，或是单目视频中的不同帧图像。\n我们将 Avat3r 与当前最先进的少量输入和单输入 3D 头像重建方法进行了比较，结果表明，我们的方法在这两种任务上均具有竞争优势。最后，我们展示了 Avat3r 的广泛适用性，它能够从不同来源的图像（如智能手机拍摄、单张图片，甚至是古代雕像）生成高质量的 3D 头像模型。\n"
  },
  {
    "path": "abs/2502.20378.md",
    "content": "### Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling\n\nRendering dynamic scenes from monocular videos is a crucial yet challenging task. The recent deformable Gaussian Splatting has emerged as a robust solution to represent real-world dynamic scenes. However, it often leads to heavily redundant Gaussians, attempting to fit every training view at various time steps, leading to slower rendering speeds. Additionally, the attributes of Gaussians in static areas are time-invariant, making it unnecessary to model every Gaussian, which can cause jittering in static regions. In practice, the primary bottleneck in rendering speed for dynamic scenes is the number of Gaussians. In response, we introduce Efficient Dynamic Gaussian Splatting (EDGS), which represents dynamic scenes via sparse time-variant attribute modeling. Our approach formulates dynamic scenes using a sparse anchor-grid representation, with the motion flow of dense Gaussians calculated via a classical kernel representation. Furthermore, we propose an unsupervised strategy to efficiently filter out anchors corresponding to static areas. Only anchors associated with deformable objects are input into MLPs to query time-variant attributes. Experiments on two real-world datasets demonstrate that our EDGS significantly improves the rendering speed with superior rendering quality compared to previous state-of-the-art methods.\n\n从单目视频渲染动态场景是一个关键但具有挑战性的任务。近年来，可变形高斯散点 (Deformable Gaussian Splatting) 已成为表示真实世界动态场景的强大解决方案。然而，该方法往往会引入大量冗余高斯基元，试图在不同时间步拟合所有训练视图，导致渲染速度变慢。此外，静态区域的高斯属性本质上是时间不变的，但现有方法仍对其进行建模，容易导致静态区域的抖动 (jittering)。在实际应用中，动态场景渲染速度的主要瓶颈是高斯基元的数量。\n为此，我们提出高效动态高斯散点 (Efficient Dynamic Gaussian Splatting, EDGS)，通过稀疏的时间变化属性建模来表示动态场景。我们的方法采用稀疏锚点网格 (sparse anchor-grid) 表示动态场景，并利用经典核表示 (classical kernel representation) 计算稠密高斯的运动流 (motion flow)。此外，我们提出了一种无监督策略 (unsupervised strategy) 来高效过滤静态区域的锚点，仅对可变形物体关联的锚点输入 MLP 以查询时间变化属性。\n在两个真实世界数据集上的实验表明，EDGS 在显著提升渲染速度的同时，渲染质量优于当前最先进方法。\n"
  },
  {
    "path": "abs/2502.20386.md",
    "content": "### ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting\n\nWe address the challenge of task-oriented navigation in unstructured and unknown environments, where robots must incrementally build and reason on rich, metric-semantic maps in real time. Since tasks may require clarification or re-specification, it is necessary for the information in the map to be rich enough to enable generalization across a wide range of tasks. To effectively execute tasks specified in natural language, we propose a hierarchical representation built on language-embedded Gaussian splatting that enables both sparse semantic planning that lends itself to online operation and dense geometric representation for collision-free navigation. We validate the effectiveness of our method through real-world robot experiments conducted in both cluttered indoor and kilometer-scale outdoor environments, with a competitive ratio of about 60% against privileged baselines.\n\n我们研究面向任务的导航 (task-oriented navigation) 在非结构化 (unstructured) 和未知环境中的挑战，其中机器人需要实时构建和推理丰富的度量-语义地图 (metric-semantic maps)。由于任务可能需要澄清或重新指定，地图中的信息必须足够丰富，以便能够泛化到各种任务。\n为了使机器人能够有效执行自然语言指定的任务，我们提出了一种基于语言嵌入的高斯散点 (language-embedded Gaussian Splatting) 的分层表示。这一方法同时具备稀疏语义规划 (sparse semantic planning)，以适应在线任务执行，以及稠密几何表示 (dense geometric representation)，以确保无碰撞导航。\n我们在真实世界机器人实验中验证了该方法的有效性，实验涵盖了复杂的室内环境和公里级的室外环境。在对比特权基线 (privileged baselines) 的评测中，我们的方法实现了约 60% 的竞争性比率 (competitive ratio)，展现出卓越的性能。\n"
  },
  {
    "path": "abs/2502.20669.md",
    "content": "### EndoPBR: Material and Lighting Estimation for Photorealistic Surgical Simulations via Physically-based Rendering\n\nThe lack of labeled datasets in 3D vision for surgical scenes inhibits the development of robust 3D reconstruction algorithms in the medical domain. Despite the popularity of Neural Radiance Fields and 3D Gaussian Splatting in the general computer vision community, these systems have yet to find consistent success in surgical scenes due to challenges such as non-stationary lighting and non-Lambertian surfaces. As a result, the need for labeled surgical datasets continues to grow. In this work, we introduce a differentiable rendering framework for material and lighting estimation from endoscopic images and known geometry. Compared to previous approaches that model lighting and material jointly as radiance, we explicitly disentangle these scene properties for robust and photorealistic novel view synthesis. To disambiguate the training process, we formulate domain-specific properties inherent in surgical scenes. Specifically, we model the scene lighting as a simple spotlight and material properties as a bidirectional reflectance distribution function, parameterized by a neural network. By grounding color predictions in the rendering equation, we can generate photorealistic images at arbitrary camera poses. We evaluate our method with various sequences from the Colonoscopy 3D Video Dataset and show that our method produces competitive novel view synthesis results compared with other approaches. Furthermore, we demonstrate that synthetic data can be used to develop 3D vision algorithms by finetuning a depth estimation model with our rendered outputs. Overall, we see that the depth estimation performance is on par with fine-tuning with the original real images.\n\n3D 视觉在手术场景中的标注数据集缺乏，严重限制了医学领域中鲁棒 3D 重建算法的发展。尽管 Neural Radiance Fields (NeRF) 和 3D Gaussian Splatting (3DGS) 在计算机视觉社区中广受关注，但由于非固定光照和非朗伯（non-Lambertian）表面等挑战，这些方法在手术场景中尚未取得稳定成功。因此，对标注的手术数据集的需求仍在增长。\n在本研究中，我们提出了一个可微分渲染框架，用于从内窥镜图像和已知几何信息中估计材质和光照。与以往将光照和材质联合建模为辐射度（radiance）的方法不同，我们显式地解耦这些场景属性，以实现更鲁棒且具备高真实感的新视角合成。\n为了消除训练过程中的歧义，我们针对手术场景固有的特性进行了建模。具体而言，我们将场景光照建模为单一聚光灯（spotlight），并使用双向反射分布函数（BRDF） 来表征材质属性，该函数由神经网络参数化。通过将颜色预测严格约束在渲染方程内，我们能够在任意相机位姿下生成高逼真的图像。\n我们在 Colonoscopy 3D Video Dataset（结肠镜 3D 视频数据集）的多个序列上进行了评估，结果表明，我们的方法在新视角合成任务上与现有方法相比具有竞争力。此外，我们进一步证明，合成数据可用于 3D 视觉算法的开发——具体而言，我们使用渲染输出对深度估计模型进行微调，最终的深度估计性能与使用原始真实图像微调的结果相当。\n"
  },
  {
    "path": "abs/2502.21093.md",
    "content": "### FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering\n\nDriving scene reconstruction and rendering have advanced significantly using the 3D Gaussian Splatting. However, most prior research has focused on the rendering quality along a pre-recorded vehicle path and struggles to generalize to out-of-path viewpoints, which is caused by the lack of high-quality supervision in those out-of-path views. To address this issue, we introduce an Inverse View Warping technique to create compact and high-quality images as supervision for the reconstruction of the out-of-path views, enabling high-quality rendering results for those views. For accurate and robust inverse view warping, a depth bootstrap strategy is proposed to obtain on-the-fly dense depth maps during the optimization process, overcoming the sparsity and incompleteness of LiDAR depth data. Our method achieves superior in-path and out-of-path reconstruction and rendering performance on the widely used Waymo Open dataset. In addition, a simulator-based benchmark is proposed to obtain the out-of-path ground truth and quantitatively evaluate the performance of out-of-path rendering, where our method outperforms previous methods by a significant margin.\n\n驾驶场景的 3D 重建与渲染在 3D Gaussian Splatting (3DGS) 的推动下取得了显著进展。然而，现有研究主要关注沿预录车辆轨迹的渲染质量，在轨迹外（out-of-path）视角上泛化能力较差。这一问题主要源于缺乏高质量的监督信号，导致轨迹外视角的重建效果受限。\n为了解决这一问题，我们提出 逆向视图变换（Inverse View Warping） 技术，以紧凑且高质量的图像作为监督信号，用于优化轨迹外视角的重建，从而实现高质量的渲染效果。为了保证逆向视图变换的准确性和鲁棒性，我们进一步引入 深度自举（Depth Bootstrap）策略，在优化过程中动态获取高密度深度图，克服 LiDAR 深度数据的稀疏性和不完整性。\n在 Waymo Open Dataset 上，我们的方法在轨迹内（in-path）和轨迹外（out-of-path）的重建与渲染性能均显著优于现有方法。此外，我们构建了一个基于模拟器的基准测试，用于获取轨迹外的真实参考（ground truth），并定量评估轨迹外视角的渲染性能。实验结果表明，我们的方法在轨迹外渲染质量上较现有方法取得了显著提升。\n"
  },
  {
    "path": "abs/2503.00260.md",
    "content": "### Seeing A 3D World in A Grain of Sand\n\nWe present a snapshot imaging technique for recovering 3D surrounding views of miniature scenes. Due to their intricacy, miniature scenes with objects sized in millimeters are difficult to reconstruct, yet miniatures are common in life and their 3D digitalization is desirable. We design a catadioptric imaging system with a single camera and eight pairs of planar mirrors for snapshot 3D reconstruction from a dollhouse perspective. We place paired mirrors on nested pyramid surfaces for capturing surrounding multi-view images in a single shot. Our mirror design is customizable based on the size of the scene for optimized view coverage. We use the 3D Gaussian Splatting (3DGS) representation for scene reconstruction and novel view synthesis. We overcome the challenge posed by our sparse view input by integrating visual hull-derived depth constraint. Our method demonstrates state-of-the-art performance on a variety of synthetic and real miniature scenes.\n\n我们提出了一种快照成像技术，用于重建微缩场景的 3D 全方位视图。由于微缩场景通常包含尺寸仅为毫米级的复杂物体，其 3D 重建极具挑战性。然而，微缩模型在现实生活中十分常见，对其进行3D 数字化具有重要价值。\n我们设计了一种折反射（catadioptric）成像系统，采用单个相机和八对平面镜，以玩具屋视角（dollhouse perspective）实现快照 3D 重建。具体而言，我们将成对的镜子布置在嵌套金字塔表面，从而在单次曝光中捕获环绕式多视角图像。该镜面设计可根据场景大小进行定制，以优化视角覆盖范围。\n在场景重建和新视角合成方面，我们采用 3D Gaussian Splatting (3DGS) 作为表示方式。由于输入视角较为稀疏，我们结合视觉轮廓（visual hull）推导的深度约束，以克服重建挑战。实验结果表明，我们的方法在合成和真实微缩场景上均实现了**最先进（SOTA）**的性能。\n"
  },
  {
    "path": "abs/2503.00308.md",
    "content": "### Abstract Rendering: Computing All that is Seen in Gaussian Splat Scenes\n\nWe introduce abstract rendering, a method for computing a set of images by rendering a scene from a continuously varying range of camera positions. The resulting abstract image-which encodes an infinite collection of possible renderings-is represented using constraints on the image matrix, enabling rigorous uncertainty propagation through the rendering process. This capability is particularly valuable for the formal verification of vision-based autonomous systems and other safety-critical applications. Our approach operates on Gaussian splat scenes, an emerging representation in computer vision and robotics. We leverage efficient piecewise linear bound propagation to abstract fundamental rendering operations, while addressing key challenges that arise in matrix inversion and depth sorting-two operations not directly amenable to standard approximations. To handle these, we develop novel linear relational abstractions that maintain precision while ensuring computational efficiency. These abstractions not only power our abstract rendering algorithm but also provide broadly applicable tools for other rendering problems. Our implementation, AbstractSplat, is optimized for scalability, handling up to 750k Gaussians while allowing users to balance memory and runtime through tile and batch-based computation. Compared to the only existing abstract image method for mesh-based scenes, AbstractSplat achieves 2-14x speedups while preserving precision. Our results demonstrate that continuous camera motion, rotations, and scene variations can be rigorously analyzed at scale, making abstract rendering a powerful tool for uncertainty-aware vision applications.\n\n我们提出抽象渲染 (Abstract Rendering)，一种通过在连续变化的相机位置范围内渲染场景来计算一组图像的方法。所得的抽象图像编码了无限多种可能的渲染结果，并通过约束图像矩阵的方式进行表示，从而在渲染过程中实现严格的不确定性传播。这一能力在基于视觉的自主系统的形式化验证及其他安全关键 (safety-critical) 应用中尤为重要。\n我们的方法适用于高斯散点 (Gaussian Splatting) 场景，这一新兴的计算机视觉与机器人学表示方式。我们利用高效的分段线性界限传播 (piecewise linear bound propagation) 来对基本渲染操作进行抽象，同时解决了矩阵求逆 (matrix inversion) 和深度排序 (depth sorting) 中的关键挑战——这两个操作通常不适用于标准近似方法。为此，我们开发了一种新的线性关系抽象 (linear relational abstraction)，既保持了计算精度，又确保了计算效率。\n这些抽象不仅支撑了我们的抽象渲染算法，还提供了可广泛应用于其他渲染问题的工具。我们的实现AbstractSplat 经过可扩展性优化，可处理多达 75 万个高斯散点，同时支持基于块 (tile) 和批量 (batch) 计算，让用户灵活平衡内存占用与计算时间。与当前唯一适用于基于网格 (mesh-based) 场景的抽象图像方法相比，AbstractSplat 在保持精度的同时实现了 2-14 倍的速度提升。实验结果表明，我们的方法可对连续相机运动、旋转和场景变化进行大规模严格分析，使抽象渲染成为不确定性感知视觉应用的强大工具。\n"
  },
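  {
    "path": "notes/sketch_interval_bound_propagation.md",
    "content": "### Illustrative sketch: interval bounds through an affine map\n\nA toy, hypothetical illustration of the basic building block behind the piecewise linear bound propagation described in the Abstract Rendering entry above; this is not AbstractSplat. For y = W x + b with the input confined to a box, splitting W by sign yields elementwise-tight output bounds, which is how uncertainty over a camera-parameter range can be pushed through one linear stage of a rendering pipeline. Names and the example values are assumptions.\n\n```python\nimport numpy as np\n\ndef affine_interval_bounds(W, b, x_lo, x_hi):\n    # Elementwise-tight bounds on y = W @ x + b when x_lo <= x <= x_hi.\n    # Positive weights pull from the matching end of each input interval,\n    # negative weights from the opposite end.\n    W_pos = np.clip(W, 0.0, None)\n    W_neg = np.clip(W, None, 0.0)\n    y_lo = W_pos @ x_lo + W_neg @ x_hi + b\n    y_hi = W_pos @ x_hi + W_neg @ x_lo + b\n    return y_lo, y_hi\n\n# Tiny usage example: a 2D camera offset known only up to +/- 0.1.\nW = np.array([[1.0, -2.0], [0.5, 0.0]])\nb = np.zeros(2)\nlo, hi = affine_interval_bounds(W, b, np.array([-0.1, -0.1]), np.array([0.1, 0.1]))\n```\n\nNonlinear stages (e.g. matrix inversion, depth sorting) need the relational abstractions the paper introduces; this sketch covers only the linear case.\n"
  },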
  {
    "path": "abs/2503.00357.md",
    "content": "### CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression\n\n3D Gaussian Splatting (3DGS) has recently emerged as a promising 3D representation. Much research has been focused on reducing its storage requirements and memory footprint. However, the needs to compress and transmit the 3DGS representation to the remote side are overlooked. This new application calls for rate-distortion-optimized 3DGS compression. How to quantize and entropy encode sparse Gaussian primitives in the 3D space remains largely unexplored. Few early attempts resort to the hyperprior framework from learned image compression. But, they fail to utilize fully the inter and intra correlation inherent in Gaussian primitives. Built on ScaffoldGS, this work, termed CAT-3DGS, introduces a context-adaptive triplane approach to their rate-distortion-optimized coding. It features multi-scale triplanes, oriented according to the principal axes of Gaussian primitives in the 3D space, to capture their inter correlation (i.e. spatial correlation) for spatial autoregressive coding in the projected 2D planes. With these triplanes serving as the hyperprior, we further perform channel-wise autoregressive coding to leverage the intra correlation within each individual Gaussian primitive. Our CAT-3DGS incorporates a view frequency-aware masking mechanism. It actively skips from coding those Gaussian primitives that potentially have little impact on the rendering quality. When trained end-to-end to strike a good rate-distortion trade-off, our CAT-3DGS achieves the state-of-the-art compression performance on the commonly used real-world datasets.\n\n3D Gaussian Splatting (3DGS) 近期作为一种前景广阔的 3D 表示方式受到广泛关注。现有研究主要集中在降低存储需求和内存占用，然而，对 3DGS 进行压缩和远程传输 的需求却被忽视。这一新应用场景迫切需要优化率失真（rate-distortion-optimized）的 3DGS 压缩方法。目前，如何对 3D 空间中的稀疏高斯基元进行量化和熵编码 仍然是一个尚未深入探索的问题。\n早期的尝试借鉴了基于超先验（hyperprior）框架的学习型图像压缩方法，但未能充分利用高斯基元内部及相互之间的相关性。在此基础上，我们提出 CAT-3DGS，一种基于上下文自适应三平面（context-adaptive triplane） 方法的 3DGS 率失真优化编码。\nCAT-3DGS 采用多尺度三平面（multi-scale triplanes），其方向与3D 空间中高斯基元的主轴对齐，以捕获其空间相关性（inter correlation），从而在投影的 2D 平面上实现空间自回归编码（spatial autoregressive coding）。此外，三平面还作为超先验（hyperprior），进一步进行通道级自回归编码（channel-wise autoregressive coding），以利用每个高斯基元内部的相关性（intra correlation）。\n此外，CAT-3DGS 结合视图频率感知（view frequency-aware）掩码机制，主动跳过对渲染质量影响较小的高斯基元，从而减少冗余编码。通过端到端训练，实现**率失真权衡（rate-distortion trade-off）的最优效果，CAT-3DGS 在常用的真实世界数据集上达到了最先进（SOTA）**的压缩性能。\n"
  },
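  {
    "path": "notes/sketch_channelwise_ar_bits.md",
    "content": "### Illustrative sketch: channel-wise autoregressive bit estimation\n\nA hypothetical toy version of the channel-wise autoregressive coding idea mentioned in the CAT-3DGS entry above, not the paper's model: each quantized channel of a primitive is coded under a Gaussian whose mean and scale are predicted from the channels already decoded, and the bit cost is the negative log probability of the integer bin. The predictor callables here are stand-ins for the learned, hyperprior-conditioned networks.\n\n```python\nimport math\n\ndef gaussian_bin_prob(x, mu, sigma):\n    # Probability mass a Gaussian assigns to the integer bin [x - 0.5, x + 0.5].\n    cdf = lambda t: 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))\n    return cdf(x + 0.5) - cdf(x - 0.5)\n\ndef channelwise_bits(symbols, predictors):\n    # symbols: quantized channel values of one primitive; predictors[c] maps the\n    # already-coded prefix symbols[:c] to a (mu, sigma) pair for channel c.\n    total = 0.0\n    for c, predict in enumerate(predictors):\n        mu, sigma = predict(symbols[:c])\n        p = max(gaussian_bin_prob(symbols[c], mu, sigma), 1e-12)\n        total += -math.log2(p)\n    return total\n\n# Tiny usage example: channel 0 has a fixed prior, later channels condition on the last symbol.\npredictors = [lambda prev: (0.0, 2.0)] + [(lambda prev: (prev[-1], 1.0)) for _ in range(3)]\nbits = channelwise_bits([1, 2, 2, 3], predictors)\n```\n\nThe better the conditional prediction, the more probability mass lands on the true bin and the fewer bits the entropy coder spends, which is the lever rate-distortion training optimizes.\n"
  },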
  {
    "path": "abs/2503.00370.md",
    "content": "### Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups\n\nSimulating object dynamics from real-world perception shows great promise for digital twins and robotic manipulation but often demands labor-intensive measurements and expertise. We present a fully automated Real2Sim pipeline that generates simulation-ready assets for real-world objects through robotic interaction. Using only a robot's joint torque sensors and an external camera, the pipeline identifies visual geometry, collision geometry, and physical properties such as inertial parameters. Our approach introduces a general method for extracting high-quality, object-centric meshes from photometric reconstruction techniques (e.g., NeRF, Gaussian Splatting) by employing alpha-transparent training while explicitly distinguishing foreground occlusions from background subtraction. We validate the full pipeline through extensive experiments, demonstrating its effectiveness across diverse objects. By eliminating the need for manual intervention or environment modifications, our pipeline can be integrated directly into existing pick-and-place setups, enabling scalable and efficient dataset creation.\n\n从真实世界感知中模拟物体动态对于数字孪生 (Digital Twins) 和机器人操控具有巨大潜力，但通常需要繁琐的测量和专业知识。为此，我们提出了一种全自动的 Real2Sim 流水线，通过机器人交互自动生成可用于仿真的真实世界物体资产。\n该流水线仅依赖机器人关节力矩传感器和外部摄像头，即可识别视觉几何、碰撞几何及惯性参数等物理属性。我们的方法提出了一种通用的高质量物体中心网格提取方案，适用于基于光度重建 (Photometric Reconstruction) 的技术（如 NeRF 和 高斯散点 (Gaussian Splatting)）。其中，我们通过透明 Alpha 训练 (Alpha-transparent training) 方案，在训练过程中显式区分前景遮挡和背景信息，提高重建质量。\n我们通过大量实验验证了完整流水线的有效性，适用于多种不同物体。由于无需人工干预或环境修改，该流水线可直接集成到现有的抓取与放置 (Pick-and-Place) 任务中，实现可扩展且高效的数据集创建。\n"
  },
  {
    "path": "abs/2503.00531.md",
    "content": "### GaussianSeal: Rooting Adaptive Watermarks for 3D Gaussian Generation Model\n\nWith the advancement of AIGC technologies, the modalities generated by models have expanded from images and videos to 3D objects, leading to an increasing number of works focused on 3D Gaussian Splatting (3DGS) generative models. Existing research on copyright protection for generative models has primarily concentrated on watermarking in image and text modalities, with little exploration into the copyright protection of 3D object generative models. In this paper, we propose the first bit watermarking framework for 3DGS generative models, named GaussianSeal, to enable the decoding of bits as copyright identifiers from the rendered outputs of generated 3DGS. By incorporating adaptive bit modulation modules into the generative model and embedding them into the network blocks in an adaptive way, we achieve high-precision bit decoding with minimal training overhead while maintaining the fidelity of the model's outputs. Experiments demonstrate that our method outperforms post-processing watermarking approaches for 3DGS objects, achieving superior performance of watermark decoding accuracy and preserving the quality of the generated results.\n\n随着 AIGC（AI-Generated Content） 技术的发展，生成模型的模态已从图像和视频扩展到 3D 物体，引发了越来越多关于 3D Gaussian Splatting (3DGS) 生成模型 的研究。然而，现有的生成模型版权保护研究主要集中在图像和文本的水印嵌入，而对 3D 物体生成模型的版权保护 仍然缺乏探索。\n在本文中，我们提出了首个面向 3DGS 生成模型的比特水印（bit watermarking）框架，命名为 GaussianSeal，该方法能够从生成的 3DGS 渲染输出中解码比特作为版权标识。\n我们在生成模型中引入自适应比特调制模块（adaptive bit modulation modules），并以自适应方式嵌入网络结构，实现高精度的比特解码，同时保持生成结果的高保真度，且仅需极小的训练开销。实验表明，相较于后处理水印方法，GaussianSeal 在3DGS 物体上的水印解码准确率更高，并能更好地保持生成结果的质量。\n"
  },
  {
    "path": "abs/2503.00726.md",
    "content": "### Enhancing Monocular 3D Scene Completion with Diffusion Model\n\n3D scene reconstruction is essential for applications in virtual reality, robotics, and autonomous driving, enabling machines to understand and interact with complex environments. Traditional 3D Gaussian Splatting techniques rely on images captured from multiple viewpoints to achieve optimal performance, but this dependence limits their use in scenarios where only a single image is available. In this work, we introduce FlashDreamer, a novel approach for reconstructing a complete 3D scene from a single image, significantly reducing the need for multi-view inputs. Our approach leverages a pre-trained vision-language model to generate descriptive prompts for the scene, guiding a diffusion model to produce images from various perspectives, which are then fused to form a cohesive 3D reconstruction. Extensive experiments show that our method effectively and robustly expands single-image inputs into a comprehensive 3D scene, extending monocular 3D reconstruction capabilities without further training.\n\n\n3D 场景重建在虚拟现实（VR）、机器人技术和自动驾驶等应用中至关重要，使机器能够理解并与复杂环境交互。传统的 3D Gaussian Splatting (3DGS) 技术依赖于多视角图像来实现最佳性能，但这种依赖限制了其在单视角图像场景中的应用。\n在本研究中，我们提出 FlashDreamer，一种从单张图像重建完整 3D 场景的新方法，大幅减少了对多视角输入的需求。我们的方法利用预训练的视觉-语言模型为场景生成描述性提示（prompts），并以此引导扩散模型（diffusion model）从多个视角生成图像，随后将这些视角图像融合，形成连贯的 3D 重建。\n广泛实验表明，我们的方法能够有效且稳健地扩展单视角输入，生成完整的 3D 场景，从而在无需额外训练的情况下提升单目 3D 重建能力。\n"
  },
  {
    "path": "abs/2503.00746.md",
    "content": "### DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting\n\nRecent advances in 3D Gaussian Splatting (3D-GS) have shown remarkable success in representing 3D scenes and generating high-quality, novel views in real-time. However, 3D-GS and its variants assume that input images are captured based on pinhole imaging and are fully in focus. This assumption limits their applicability, as real-world images often feature shallow depth-of-field (DoF). In this paper, we introduce DoF-Gaussian, a controllable depth-of-field method for 3D-GS. We develop a lens-based imaging model based on geometric optics principles to control DoF effects. To ensure accurate scene geometry, we incorporate depth priors adjusted per scene, and we apply defocus-to-focus adaptation to minimize the gap in the circle of confusion. We also introduce a synthetic dataset to assess refocusing capabilities and the model's ability to learn precise lens parameters. Our framework is customizable and supports various interactive applications. Extensive experiments confirm the effectiveness of our method. Our project is available at this https URL.\n\n近年来，3D Gaussian Splatting (3D-GS) 在三维场景表示和实时高质量新视角生成方面取得了显著成功。然而，现有的 3D-GS 及其变体 通常假设输入图像基于针孔成像模型，且完全处于焦点内。这一假设限制了其在真实场景中的应用，因为现实世界的图像往往具有浅景深（Depth-of-Field, DoF）效应。\n为了解决这一问题，我们提出 DoF-Gaussian，一种可控景深的 3D-GS 方法。我们基于几何光学原理构建了基于透镜的成像模型，从而能够控制景深效应。为了确保准确的场景几何，我们在每个场景中引入深度先验（depth priors）并进行散焦-聚焦（defocus-to-focus）自适应优化，以最小化弥散圆（circle of confusion）带来的误差。\n此外，我们构建了一个合成数据集，用于评估重聚焦能力及模型对镜头参数的学习能力。我们的框架高度可定制，支持多种交互式应用。大量实验结果验证了我们方法的有效性。\n"
  },
  {
    "path": "abs/2503.00848.md",
    "content": "### PSRGS:Progressive Spectral Residual of 3D Gaussian for High-Frequency Recovery\n\n3D Gaussian Splatting (3D GS) achieves impressive results in novel view synthesis for small, single-object scenes through Gaussian ellipsoid initialization and adaptive density control. However, when applied to large-scale remote sensing scenes, 3D GS faces challenges: the point clouds generated by Structure-from-Motion (SfM) are often sparse, and the inherent smoothing behavior of 3D GS leads to over-reconstruction in high-frequency regions, where have detailed textures and color variations. This results in the generation of large, opaque Gaussian ellipsoids that cause gradient artifacts. Moreover, the simultaneous optimization of both geometry and texture may lead to densification of Gaussian ellipsoids at incorrect geometric locations, resulting in artifacts in other views. To address these issues, we propose PSRGS, a progressive optimization scheme based on spectral residual maps. Specifically, we create a spectral residual significance map to separate low-frequency and high-frequency regions. In the low-frequency region, we apply depth-aware and depth-smooth losses to initialize the scene geometry with low threshold. For the high-frequency region, we use gradient features with higher threshold to split and clone ellipsoids, refining the scene. The sampling rate is determined by feature responses and gradient loss. Finally, we introduce a pre-trained network that jointly computes perceptual loss from multiple views, ensuring accurate restoration of high-frequency details in both Gaussian ellipsoids geometry and color. We conduct experiments on multiple datasets to assess the effectiveness of our method, which demonstrates competitive rendering quality, especially in recovering texture details in high-frequency regions.\n\n3D Gaussian Splatting (3DGS) 在小规模、单物体场景的新视角合成任务中表现出色，依赖于高斯椭球初始化和自适应密度控制。然而，在大规模遥感场景中，3DGS 仍面临诸多挑战：由于结构光恢复（Structure-from-Motion, SfM） 生成的点云往往稀疏，而 3DGS 的固有平滑性 会导致高频区域（具有丰富纹理和颜色变化的区域）发生过度重建，生成大尺度不透明高斯椭球，从而引发梯度伪影。此外，同时优化几何和纹理可能会导致高斯椭球在错误几何位置过度密集化，在其他视角下产生伪影。\n为了解决这些问题，我们提出 PSRGS，一种基于谱残差图（spectral residual maps）的渐进式优化方案。具体而言，我们构建谱残差显著性图，用于区分低频区域和高频区域。在低频区域，我们引入深度感知（depth-aware）和深度平滑（depth-smooth）损失，以较低的阈值初始化场景几何。在高频区域，我们利用梯度特征，并采用更高的阈值进行高斯椭球的分裂与克隆，以优化细节。采样率由特征响应和梯度损失决定。\n此外，我们引入预训练网络，在多个视角上联合计算感知损失（perceptual loss），确保高斯椭球的几何结构和颜色信息能够精准还原高频细节。\n我们在多个数据集上进行实验，结果表明 PSRGS 在渲染质量方面具备竞争力，尤其在高频区域的纹理细节恢复方面表现突出。\n"
  },
  {
    "path": "abs/2503.00868.md",
    "content": "### Vid2Fluid: 3D Dynamic Fluid Assets from Single-View Videos with Generative Gaussian Splatting\n\nThe generation of 3D content from single-view images has been extensively studied, but 3D dynamic scene generation with physical consistency from videos remains in its early stages. We propose a novel framework leveraging generative 3D Gaussian Splatting (3DGS) models to extract 3D dynamic fluid objects from single-view videos. The fluid geometry represented by 3DGS is initially generated from single-frame images, then denoised, densified, and aligned across frames. We estimate the fluid surface velocity using optical flow and compute the mainstream of the fluid to refine it. The 3D volumetric velocity field is then derived from the enclosed surface. The velocity field is then converted into a divergence-free, grid-based representation, enabling the optimization of simulation parameters through its differentiability across frames. This process results in simulation-ready fluid assets with physical dynamics closely matching those observed in the source video. Our approach is applicable to various fluid types, including gas, liquid, and viscous fluids, and allows users to edit the output geometry or extend movement durations seamlessly. Our automatic method for creating 3D dynamic fluid assets from single-view videos, easily obtainable from the internet, shows great potential for generating large-scale 3D fluid assets at a low cost.\n\n从单视角图像生成 3D 内容已被广泛研究，但具有物理一致性的 3D 动态场景生成仍处于早期阶段。在本文中，我们提出了一种新颖的框架，利用生成式 3D Gaussian Splatting (3DGS) 模型，从单视角视频中提取3D 动态流体对象。\n首先，我们通过 3DGS 生成流体几何表示，初始形态由单帧图像生成，随后进行去噪、密集化和跨帧对齐。接着，我们利用光流（optical flow） 估算流体表面速度，并计算流体的主流方向（mainstream） 以优化流体形态。随后，我们基于封闭的流体表面推导3D 体速度场（3D volumetric velocity field），并将其转换为无散度的网格化表示（divergence-free, grid-based representation），从而在跨帧优化过程中保持可微分性，实现仿真参数优化。\n这一过程生成的流体资产具备可模拟的物理动态，并与源视频中的流体运动高度匹配。我们的方法适用于气体、液体和粘性流体等多种流体类型，同时支持用户编辑几何形态或无缝扩展运动持续时间。该方法可自动从互联网获取单视角视频，生成大规模 3D 动态流体资产，极大降低了成本，展现出广阔的应用前景。\n"
  },
  {
    "path": "abs/2503.00881.md",
    "content": "### Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization\n\nRepresenting 3D scenes from multiview images is a core challenge in computer vision and graphics, which requires both precise rendering and accurate reconstruction. Recently, 3D Gaussian Splatting (3DGS) has garnered significant attention for its high-quality rendering and fast inference speed. Yet, due to the unstructured and irregular nature of Gaussian point clouds, ensuring accurate geometry reconstruction remains difficult. Existing methods primarily focus on geometry regularization, with common approaches including primitive-based and dual-model frameworks. However, the former suffers from inherent conflicts between rendering and reconstruction, while the latter is computationally and storage-intensive. To address these challenges, we propose CarGS, a unified model leveraging Contribution-adaptive regularization to achieve simultaneous, high-quality rendering and surface reconstruction. The essence of our framework is learning adaptive contribution for Gaussian primitives by squeezing the knowledge from geometry regularization into a compact MLP. Additionally, we introduce a geometry-guided densification strategy with clues from both normals and Signed Distance Fields (SDF) to improve the capability of capturing high-frequency details. Our design improves the mutual learning of the two tasks, meanwhile its unified structure does not require separate models as in dual-model based approaches, guaranteeing efficiency. Extensive experiments demonstrate the ability to achieve state-of-the-art (SOTA) results in both rendering fidelity and reconstruction accuracy while maintaining real-time speed and minimal storage size.\n\n从多视角图像表示 3D 场景是计算机视觉和计算机图形学中的核心挑战，既要求精准渲染，又需要准确重建。近年来，3D Gaussian Splatting (3DGS) 由于其高质量渲染和快速推理能力受到广泛关注。然而，由于高斯点云的非结构化和不规则性，确保准确的几何重建仍然十分困难。\n现有方法主要关注几何正则化，常见方案包括基元（primitive-based）方法和双模型（dual-model）框架。然而，基元方法在渲染与重建之间存在固有冲突，而双模型方法则计算和存储成本较高。\n为了解决这些问题，我们提出 CarGS，一种基于自适应贡献正则化（Contribution-adaptive regularization）的统一模型，能够同时实现高质量渲染和高精度表面重建。\nCarGS 的核心思想是学习高斯基元的自适应贡献，通过将几何正则化的信息压缩进紧凑的 MLP，实现高效建模。此外，我们引入基于几何引导的密集化策略（geometry-guided densification strategy），利用**法向（normals）和符号距离场（SDF）**信息，以增强高频细节的捕捉能力。\n我们的设计不仅改善了渲染与重建的相互学习，同时由于采用统一结构，避免了双模型方法需要单独训练多个模型的高开销，确保计算效率。\n大量实验表明，CarGS 在渲染保真度和重建精度方面均达到了最新的最优水平（SOTA），同时保持了实时速度和最小存储需求。\n"
  },
  {
    "path": "abs/2503.01109.md",
    "content": "### FGS-SLAM: Fourier-based Gaussian Splatting for Real-time SLAM with Sparse and Dense Map Fusion\n\n3D gaussian splatting has advanced simultaneous localization and mapping (SLAM) technology by enabling real-time positioning and the construction of high-fidelity maps. However, the uncertainty in gaussian position and initialization parameters introduces challenges, often requiring extensive iterative convergence and resulting in redundant or insufficient gaussian representations. To address this, we introduce a novel adaptive densification method based on Fourier frequency domain analysis to establish gaussian priors for rapid convergence. Additionally, we propose constructing independent and unified sparse and dense maps, where a sparse map supports efficient tracking via Generalized Iterative Closest Point (GICP) and a dense map creates high-fidelity visual representations. This is the first SLAM system leveraging frequency domain analysis to achieve high-quality gaussian mapping in real-time. Experimental results demonstrate an average frame rate of 36 FPS on Replica and TUM RGB-D datasets, achieving competitive accuracy in both localization and mapping.\n\n3D Gaussian Splatting 推动了同步定位与建图（SLAM）技术的发展，实现了实时定位与高保真地图构建。然而，由于高斯位置和初始化参数的不确定性，现有方法往往需要大量迭代收敛，导致高斯表示的冗余或不足，从而影响建图效率和质量。\n为了解决这一问题，我们提出了一种基于傅里叶频域分析（Fourier frequency domain analysis）的自适应密集化方法（adaptive densification method），用于建立高斯先验，从而加速收敛。此外，我们构建了一种独立且统一的稀疏-稠密地图：稀疏地图（sparse map） 利用广义迭代最近点（Generalized Iterative Closest Point, GICP） 进行高效跟踪，而稠密地图（dense map） 用于生成高保真的视觉表示。\n本研究提出了首个利用频域分析进行实时高斯建图的 SLAM 系统。实验结果表明，在 Replica 和 TUM RGB-D 数据集上，我们的方法达到了36 FPS 的平均帧率，在定位和建图精度方面均表现出强竞争力。\n"
  },
  {
    "path": "abs/2503.01199.md",
    "content": "### LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training\n\nGaussian splatting has emerged as a powerful technique for reconstruction of 3D scenes in computer graphics and vision. However, conventional implementations often suffer from inefficiencies, limited flexibility, and high computational overhead, which constrain their adaptability to diverse applications. In this paper, we present LiteGS,a high-performance and modular framework that enhances both the efficiency and usability of Gaussian splatting. LiteGS achieves a 3.4x speedup over the original 3DGS implementation while reducing GPU memory usage by approximately 30%. Its modular design decomposes the splatting process into multiple highly optimized operators, and it provides dual API support via a script-based interface and a CUDA-based interface. The script-based interface, in combination with autograd, enables rapid prototyping and straightforward customization of new ideas, while the CUDA-based interface delivers optimal training speeds for performance-critical applications. LiteGS retains the core algorithm of 3DGS, ensuring compatibility. Comprehensive experiments on the Mip-NeRF 360 dataset demonstrate that LiteGS accelerates training without compromising accuracy, making it an ideal solution for both rapid prototyping and production environments.\n\nGaussian Splatting 已成为计算机图形学和计算机视觉领域3D 场景重建的强大技术。然而，传统实现方式往往存在效率低、灵活性受限以及计算开销高等问题，限制了其在不同应用场景中的适应性。\n在本文中，我们提出 LiteGS，一种高性能、模块化的 Gaussian Splatting 框架，旨在提高计算效率和可用性。LiteGS 相较于原始 3DGS 实现，训练速度提升 3.4 倍，同时 GPU 内存占用减少约 30%。其模块化设计将splatting 过程拆解为多个高度优化的算子，并提供基于脚本和 CUDA 的双接口支持。其中，脚本接口结合 autograd，支持快速原型开发和自定义新方法，而 CUDA 接口提供最优训练速度，满足高性能应用需求。LiteGS 仍然保持 3DGS 的核心算法，确保兼容性。\n在 Mip-NeRF 360 数据集上的综合实验表明，LiteGS 显著加速训练，同时保持渲染精度不变，使其成为快速原型设计和生产环境的理想解决方案。\n"
  },
  {
    "path": "abs/2503.01610.md",
    "content": "### Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior\n\nWe present Vid2Avatar-Pro, a method to create photorealistic and animatable 3D human avatars from monocular in-the-wild videos. Building a high-quality avatar that supports animation with diverse poses from a monocular video is challenging because the observation of pose diversity and view points is inherently limited. The lack of pose variations typically leads to poor generalization to novel poses, and avatars can easily overfit to limited input view points, producing artifacts and distortions from other views. In this work, we address these limitations by leveraging a universal prior model (UPM) learned from a large corpus of multi-view clothed human performance capture data. We build our representation on top of expressive 3D Gaussians with canonical front and back maps shared across identities. Once the UPM is learned to accurately reproduce the large-scale multi-view human images, we fine-tune the model with an in-the-wild video via inverse rendering to obtain a personalized photorealistic human avatar that can be faithfully animated to novel human motions and rendered from novel views. The experiments show that our approach based on the learned universal prior sets a new state-of-the-art in monocular avatar reconstruction by substantially outperforming existing approaches relying only on heuristic regularization or a shape prior of minimally clothed bodies (e.g., SMPL) on publicly available datasets.\n\n我们提出 Vid2Avatar-Pro，一种从单目自然视频（monocular in-the-wild videos）构建逼真且可动画化的 3D 人体头像的方法。从单目视频中创建支持多种姿势动画的高质量头像具有挑战性，因为其姿势多样性和视角观察 inherently 受限。这种姿势变化的缺乏通常会导致对新姿势的泛化能力较差，而头像模型也容易过拟合于有限的输入视角，从其他视角观察时可能出现伪影和失真。\n在本研究中，我们利用从大规模多视角着衣人体表演捕捉数据（multi-view clothed human performance capture data）中学习的通用先验模型（Universal Prior Model, UPM）来克服这些限制。我们的表示构建在具有表达力的 3D 高斯（3D Gaussians）之上，并共享标准化前后视图（canonical front and back maps），以增强跨身份的适用性。\n一旦 UPM 学习到准确重现大规模多视角人体图像，我们便通过逆渲染（inverse rendering）对模型进行微调，使其能够从自然视频中构建个性化的逼真 3D 头像，该头像不仅可以逼真地动画化以匹配新的人体动作，还可以从新视角渲染。\n实验表明，我们基于学习到的通用先验的单目 3D 头像重建方法，在公开数据集上显著优于仅依赖启发式正则化（heuristic regularization）或最小着衣人体形状先验（如 SMPL）的现有方法，达到了新的最先进水平（state-of-the-art）。\n"
  },
  {
    "path": "abs/2503.01646.md",
    "content": "### OpenGS-SLAM: Open-Set Dense Semantic SLAM with 3D Gaussian Splatting for Object-Level Scene Understanding\n\nRecent advancements in 3D Gaussian Splatting have significantly improved the efficiency and quality of dense semantic SLAM. However, previous methods are generally constrained by limited-category pre-trained classifiers and implicit semantic representation, which hinder their performance in open-set scenarios and restrict 3D object-level scene understanding. To address these issues, we propose OpenGS-SLAM, an innovative framework that utilizes 3D Gaussian representation to perform dense semantic SLAM in open-set environments. Our system integrates explicit semantic labels derived from 2D foundational models into the 3D Gaussian framework, facilitating robust 3D object-level scene understanding. We introduce Gaussian Voting Splatting to enable fast 2D label map rendering and scene updating. Additionally, we propose a Confidence-based 2D Label Consensus method to ensure consistent labeling across multiple views. Furthermore, we employ a Segmentation Counter Pruning strategy to improve the accuracy of semantic scene representation. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our method in scene understanding, tracking, and mapping, achieving 10 times faster semantic rendering and 2 times lower storage costs compared to existing methods.\n\n3D Gaussian Splatting 的最新进展显著提升了稠密语义 SLAM 的效率和质量。然而，现有方法通常受限于类别有限的预训练分类器和隐式语义表示，这不仅影响其在开放集（open-set）场景中的表现，也限制了3D 物体级场景理解的能力。\n为了解决这些问题，我们提出 OpenGS-SLAM，一种创新性框架，利用3D Gaussian 表示在开放集环境中执行稠密语义 SLAM。我们的系统将2D 基础模型提取的显式语义标签集成到 3D Gaussian 框架中，从而增强3D 物体级场景理解的能力。\n我们引入 Gaussian Voting Splatting，用于快速 2D 语义标签渲染和场景更新，并提出 基于置信度的 2D 标签一致性（Confidence-based 2D Label Consensus） 方法，以确保跨视角的一致标注。此外，我们设计 分割计数裁剪（Segmentation Counter Pruning） 策略，提高语义场景表示的准确性。\n在合成数据集和真实数据集上的广泛实验表明，我们的方法在场景理解、跟踪和建图方面均表现出色，相较于现有方法，语义渲染速度提升 10 倍，存储成本降低 2 倍。\n"
  },
  {
    "path": "abs/2503.01774.md",
    "content": "### Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models\n\nNeural Radiance Fields and 3D Gaussian Splatting have revolutionized 3D reconstruction and novel-view synthesis task. However, achieving photorealistic rendering from extreme novel viewpoints remains challenging, as artifacts persist across representations. In this work, we introduce Difix3D+, a novel pipeline designed to enhance 3D reconstruction and novel-view synthesis through single-step diffusion models. At the core of our approach is Difix, a single-step image diffusion model trained to enhance and remove artifacts in rendered novel views caused by underconstrained regions of the 3D representation. Difix serves two critical roles in our pipeline. First, it is used during the reconstruction phase to clean up pseudo-training views that are rendered from the reconstruction and then distilled back into 3D. This greatly enhances underconstrained regions and improves the overall 3D representation quality. More importantly, Difix also acts as a neural enhancer during inference, effectively removing residual artifacts arising from imperfect 3D supervision and the limited capacity of current reconstruction models. Difix3D+ is a general solution, a single model compatible with both NeRF and 3DGS representations, and it achieves an average 2× improvement in FID score over baselines while maintaining 3D consistency.\n\nNeural Radiance Fields (NeRF) 和 3D Gaussian Splatting (3DGS) 彻底变革了3D 重建和新视角合成任务。然而，在极端新视角下实现照片级真实感渲染仍然充满挑战，因为伪影问题在不同表示方式中依然存在。\n在本文中，我们提出 Difix3D+，一种基于单步扩散模型（single-step diffusion models）的3D 重建与新视角合成增强管线。我们的核心方法 Difix 是一个单步图像扩散模型，专门用于增强渲染的新视角图像并去除因3D 表示受限区域而产生的伪影。\nDifix 在整个管线中发挥两个关键作用。首先，在重建阶段，我们利用 Difix 清理伪训练视图，即从重建的 3D 表示中渲染训练视图，并将其优化后的结果蒸馏回 3D 结构，从而有效提升受限区域的质量，增强整体 3D 表示能力。更重要的是，在推理阶段，Difix 作为神经增强器（neural enhancer），有效消除由不完善的 3D 监督和现有重建模型的容量限制所导致的残余伪影。\nDifix3D+ 是一种通用方案，单一模型即可兼容 NeRF 和 3DGS，并在多个基准测试中FID 分数平均提升 2 倍，同时保持3D 一致性。\n"
  },
  {
    "path": "abs/2503.02009.md",
    "content": "### Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization\n\nExploring real-world spaces using novel-view synthesis is fun, and reimagining those worlds in a different style adds another layer of excitement. Stylized worlds can also be used for downstream tasks where there is limited training data and a need to expand a model's training distribution. Most current novel-view synthesis stylization techniques lack the ability to convincingly change geometry. This is because any geometry change requires increased style strength which is often capped for stylization stability and consistency. In this work, we propose a new autoregressive 3D Gaussian Splatting stylization method. As part of this method, we contribute a new RGBD diffusion model that allows for strength control over appearance and shape stylization. To ensure consistency across stylized frames, we use a combination of novel depth-guided cross attention, feature injection, and a Warp ControlNet conditioned on composite frames for guiding the stylization of new frames. We validate our method via extensive qualitative results, quantitative experiments, and a user study.\n\n探索真实世界空间并进行新视角合成是一项有趣的任务，而将这些世界重新塑造成不同的风格更是增添了一层激动人心的可能性。风格化的世界不仅可以用于增强视觉体验，还可以扩展模型的训练分布，从而在训练数据有限的下游任务中发挥作用。然而，大多数现有的新视角合成风格化技术难以真实地改变几何结构。这是因为任何几何变形都需要更强的风格化强度，但为了保证风格化的稳定性和一致性，这种强度通常受到限制。\n在本研究中，我们提出了一种新的自回归 3D Gaussian Splatting 风格化方法。作为该方法的一部分，我们提出了一种RGBD 扩散模型（RGBD diffusion model），支持对外观和形状风格化强度的精确控制。为了确保风格化帧之间的一致性，我们结合深度引导的交叉注意力（depth-guided cross attention）、特征注入（feature injection），并利用Warp ControlNet，以复合帧作为条件来引导新帧的风格化过程。\n我们通过广泛的定性实验、定量分析和用户研究对该方法进行了验证，结果表明，该方法能够在风格化外观的同时，实现对 3D 几何形状的有效修改，从而突破现有风格化技术的局限性。\n"
  },
  {
    "path": "abs/2503.02223.md",
    "content": "### DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting\n\nAccurate object perception is essential for robotic applications such as object navigation. In this paper, we propose DQO-MAP, a novel object-SLAM system that seamlessly integrates object pose estimation and reconstruction. We employ 3D Gaussian Splatting for high-fidelity object reconstruction and leverage quadrics for precise object pose estimation. Both of them management is handled on the CPU, while optimization is performed on the GPU, significantly improving system efficiency. By associating objects with unique IDs, our system enables rapid object extraction from the scene. Extensive experimental results on object reconstruction and pose estimation demonstrate that DQO-MAP achieves outstanding performance in terms of precision, reconstruction quality, and computational efficiency.\n\n精确的物体感知在机器人应用（如目标导航）中至关重要。在本文中，我们提出 DQO-MAP，一种新型的对象级 SLAM（Object-SLAM）系统，能够无缝集成物体位姿估计和重建。\n我们的系统利用 3D Gaussian Splatting（3DGS） 进行高保真物体重建，并结合二次曲面（quadrics） 进行精确的物体位姿估计。物体管理在CPU 端处理，而优化在 GPU 上执行，大幅提升系统效率。此外，通过为物体分配唯一 ID，DQO-MAP 实现了快速的物体提取。\n在物体重建和位姿估计任务上的广泛实验表明，DQO-MAP 在精度、重建质量和计算效率方面均表现出色。\n"
  },
  {
    "path": "abs/2503.02452.md",
    "content": "### 2DGS-Avatar: Animatable High-fidelity Clothed Avatar via 2D Gaussian Splatting\n\nReal-time rendering of high-fidelity and animatable avatars from monocular videos remains a challenging problem in computer vision and graphics. Over the past few years, the Neural Radiance Field (NeRF) has made significant progress in rendering quality but behaves poorly in run-time performance due to the low efficiency of volumetric rendering. Recently, methods based on 3D Gaussian Splatting (3DGS) have shown great potential in fast training and real-time rendering. However, they still suffer from artifacts caused by inaccurate geometry. To address these problems, we propose 2DGS-Avatar, a novel approach based on 2D Gaussian Splatting (2DGS) for modeling animatable clothed avatars with high-fidelity and fast training performance. Given monocular RGB videos as input, our method generates an avatar that can be driven by poses and rendered in real-time. Compared to 3DGS-based methods, our 2DGS-Avatar retains the advantages of fast training and rendering while also capturing detailed, dynamic, and photo-realistic appearances. We conduct abundant experiments on popular datasets such as AvatarRex and THuman4.0, demonstrating impressive performance in both qualitative and quantitative metrics.\n\n从单目视频实时渲染高保真且可动画的虚拟人（avatar）仍然是计算机视觉和计算机图形学中的一项挑战性任务。近年来，Neural Radiance Field (NeRF) 在渲染质量方面取得了显著进展，但由于体渲染效率低，其实时性能较差。近期，基于3D Gaussian Splatting (3DGS) 的方法在快速训练和实时渲染方面展现出巨大潜力，但仍然受到几何不准确导致的伪影问题影响。\n为了解决这些问题，我们提出 2DGS-Avatar，一种基于2D Gaussian Splatting (2DGS) 的新方法，可用于建模高保真、可动画的穿着服饰虚拟人，同时具备高效训练性能。在输入单目 RGB 视频的情况下，我们的方法能够生成可由姿态驱动、实时渲染的虚拟人。\n与基于 3DGS 的方法相比，2DGS-Avatar 保持了快速训练和实时渲染的优势，同时能够捕捉更精细、动态且具备真实感的外观。我们在 AvatarRex 和 THuman4.0 等主流数据集上进行了大量实验，结果表明，在定性和定量指标上均取得了卓越性能。\n"
  },
  {
    "path": "abs/2503.03115.md",
    "content": "### NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics\n\nThermal infrared imaging offers the advantage of all-weather capability, enabling non-intrusive measurement of an object's surface temperature. Consequently, thermal infrared images are employed to reconstruct 3D models that accurately reflect the temperature distribution of a scene, aiding in applications such as building monitoring and energy management. However, existing approaches predominantly focus on static 3D reconstruction for a single time period, overlooking the impact of environmental factors on thermal radiation and failing to predict or analyze temperature variations over time. To address these challenges, we propose the NTR-Gaussian method, which treats temperature as a form of thermal radiation, incorporating elements like convective heat transfer and radiative heat dissipation. Our approach utilizes neural networks to predict thermodynamic parameters such as emissivity, convective heat transfer coefficient, and heat capacity. By integrating these predictions, we can accurately forecast thermal temperatures at various times throughout a nighttime scene. Furthermore, we introduce a dynamic dataset specifically for nighttime thermal imagery. Extensive experiments and evaluations demonstrate that NTR-Gaussian significantly outperforms comparison methods in thermal reconstruction, achieving a predicted temperature error within 1 degree Celsius.\n\n热红外成像具有全天候工作能力，可用于非接触式测量物体表面温度。因此，热红外图像被用于3D 重建，以准确反映场景的温度分布，助力建筑监测和能源管理等应用。然而，现有方法主要关注单一时间段的静态 3D 重建，忽略了环境因素对热辐射的影响，无法预测或分析温度随时间的变化。\n为了解决这些问题，我们提出 NTR-Gaussian，将温度视为热辐射的一种形式，结合对流换热（convective heat transfer）和辐射散热（radiative heat dissipation）等物理因素。我们的方法利用神经网络预测热力学参数，包括发射率（emissivity）、对流换热系数和热容量（heat capacity）。通过融合这些预测信息，我们能够精确预测夜间场景中不同时间点的温度分布。\n此外，我们引入了专门针对夜间热成像的动态数据集。广泛实验和评估表明，NTR-Gaussian 在热重建方面显著优于对比方法，并在温度预测误差控制在 1°C 以内。\n"
  },
  {
    "path": "abs/2503.03890.md",
    "content": "### LensDFF: Language-enhanced Sparse Feature Distillation for Efficient Few-Shot Dexterous Manipulation\n\nLearning dexterous manipulation from few-shot demonstrations is a significant yet challenging problem for advanced, human-like robotic systems. Dense distilled feature fields have addressed this challenge by distilling rich semantic features from 2D visual foundation models into the 3D domain. However, their reliance on neural rendering models such as Neural Radiance Fields (NeRF) or Gaussian Splatting results in high computational costs. In contrast, previous approaches based on sparse feature fields either suffer from inefficiencies due to multi-view dependencies and extensive training or lack sufficient grasp dexterity. To overcome these limitations, we propose Language-ENhanced Sparse Distilled Feature Field (LensDFF), which efficiently distills view-consistent 2D features onto 3D points using our novel language-enhanced feature fusion strategy, thereby enabling single-view few-shot generalization. Based on LensDFF, we further introduce a few-shot dexterous manipulation framework that integrates grasp primitives into the demonstrations to generate stable and highly dexterous grasps. Moreover, we present a real2sim grasp evaluation pipeline for efficient grasp assessment and hyperparameter tuning. Through extensive simulation experiments based on the real2sim pipeline and real-world experiments, our approach achieves competitive grasping performance, outperforming state-of-the-art approaches.\n\n从少量示范中学习灵巧操作是先进仿人机器人系统的一项重要且具有挑战性的问题。密集蒸馏特征场（Dense Distilled Feature Fields） 通过将2D 视觉基础模型的丰富语义特征蒸馏到 3D 空间，在一定程度上解决了该问题。然而，这些方法依赖于Neural Radiance Fields (NeRF) 或 Gaussian Splatting 等神经渲染模型，导致计算开销较高。相比之下，基于稀疏特征场（Sparse Feature Fields）的方法要么因多视角依赖和冗长的训练过程而效率低下，要么缺乏足够的抓取灵巧度。\n为了解决这些问题，我们提出 Language-ENhanced Sparse Distilled Feature Field (LensDFF)，该方法通过语言增强特征融合策略（language-enhanced feature fusion strategy），高效地将视角一致的 2D 特征蒸馏到3D 点云，从而实现单视角的少样本泛化。基于 LensDFF，我们进一步提出了一种少样本灵巧操作框架，将抓取基元（grasp primitives） 融入示范过程，以生成稳定且高度灵巧的抓取动作。\n此外，我们引入了 real2sim 抓取评估管线，用于高效的抓取性能评估和超参数调整。通过在 real2sim 评估管线 和真实世界实验中的广泛模拟实验，我们的方法在抓取性能上优于现有最先进（SOTA）方法，展现出卓越的灵巧抓取能力。\n"
  },
  {
    "path": "abs/2503.03984.md",
    "content": "### GRaD-Nav: Efficiently Learning Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics\n\nAutonomous visual navigation is an essential element in robot autonomy. Reinforcement learning (RL) offers a promising policy training paradigm. However existing RL methods suffer from high sample complexity, poor sim-to-real transfer, and limited runtime adaptability to navigation scenarios not seen during training. These problems are particularly challenging for drones, with complex nonlinear and unstable dynamics, and strong dynamic coupling between control and perception. In this paper, we propose a novel framework that integrates 3D Gaussian Splatting (3DGS) with differentiable deep reinforcement learning (DDRL) to train vision-based drone navigation policies. By leveraging high-fidelity 3D scene representations and differentiable simulation, our method improves sample efficiency and sim-to-real transfer. Additionally, we incorporate a Context-aided Estimator Network (CENet) to adapt to environmental variations at runtime. Moreover, by curriculum training in a mixture of different surrounding environments, we achieve in-task generalization, the ability to solve new instances of a task not seen during training. Drone hardware experiments demonstrate our method's high training efficiency compared to state-of-the-art RL methods, zero shot sim-to-real transfer for real robot deployment without fine tuning, and ability to adapt to new instances within the same task class (e.g. to fly through a gate at different locations with different distractors in the environment).\n\n自主视觉导航是机器人自主性的核心要素，而强化学习（Reinforcement Learning, RL） 提供了一种有前景的策略训练范式。然而，现有的 RL 方法 存在高样本复杂度、弱 sim-to-real 迁移能力，以及对训练未见导航场景的适应性较差等问题。这些挑战在无人机任务中尤为突出，因其具有复杂的非线性、不稳定动力学，以及控制与感知之间的强动态耦合。\n在本文中，我们提出了一种新颖的3D Gaussian Splatting (3DGS) 与可微分深度强化学习（Differentiable Deep Reinforcement Learning, DDRL）相结合的框架，用于训练基于视觉的无人机导航策略。通过高保真 3D 场景表示和可微分仿真，我们的框架提高了样本效率和sim-to-real 迁移能力。此外，我们引入 Context-aided Estimator Network (CENet)，用于适应运行时（runtime）环境变化。\n同时，我们采用课程学习（curriculum training），在多种不同环境组合中训练策略，实现任务内泛化（in-task generalization），即在未见过的同类任务实例中仍能保持良好性能（例如，在不同位置的门洞中飞行，或应对不同的环境干扰）。\n无人机硬件实验表明，与最先进（SOTA）RL 方法相比，我们的方法训练效率更高，并且无需微调即可实现零样本 sim-to-real 迁移，同时具备在相同任务类别内适应新实例的能力。\n"
  },
  {
    "path": "abs/2503.04034.md",
    "content": "### GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding\n\nRecent advancements in 3D Gaussian Splatting(3DGS) have significantly improved semantic scene understanding, enabling natural language queries to localize objects within a scene. However, existing methods primarily focus on embedding compressed CLIP features to 3D Gaussians, suffering from low object segmentation accuracy and lack spatial reasoning capabilities. To address these limitations, we propose GaussianGraph, a novel framework that enhances 3DGS-based scene understanding by integrating adaptive semantic clustering and scene graph generation. We introduce a \"Control-Follow\" clustering strategy, which dynamically adapts to scene scale and feature distribution, avoiding feature compression and significantly improving segmentation accuracy. Additionally, we enrich scene representation by integrating object attributes and spatial relations extracted from 2D foundation models. To address inaccuracies in spatial relationships, we propose 3D correction modules that filter implausible relations through spatial consistency verification, ensuring reliable scene graph construction. Extensive experiments on three datasets demonstrate that GaussianGraph outperforms state-of-the-art methods in both semantic segmentation and object grounding tasks, providing a robust solution for complex scene understanding and interaction.\n\n3D Gaussian Splatting (3DGS) 的最新进展大幅提升了语义场景理解能力，使得自然语言查询能够在场景中准确定位物体。然而，现有方法主要依赖于将压缩的 CLIP 特征嵌入 3D 高斯，导致物体分割精度较低，并且缺乏空间推理能力。\n为了解决这些问题，我们提出 GaussianGraph，一种增强型 3DGS 场景理解框架，通过自适应语义聚类和场景图生成提升 3D 语义理解能力。我们引入 “控制-跟随”（Control-Follow）聚类策略，该策略可根据场景尺度和特征分布动态调整，避免特征压缩，并显著提高语义分割精度。此外，我们在场景表示中集成2D 基础模型提取的物体属性和空间关系，增强语义信息的表达能力。\n针对空间关系不准确的问题，我们设计了 3D 关系校正模块，通过空间一致性验证（spatial consistency verification）过滤不合理的关系，确保场景图构建的可靠性。\n在三个数据集上的实验表明，GaussianGraph 在语义分割和物体定位任务上均超越现有最先进（SOTA）方法，为复杂场景理解和交互提供了一种鲁棒的解决方案。\n"
  },
  {
    "path": "abs/2503.04037.md",
    "content": "### Beyond Existance: Fulfill 3D Reconstructed Scenes with Pseudo Details\n\nThe emergence of 3D Gaussian Splatting (3D-GS) has significantly advanced 3D reconstruction by providing high fidelity and fast training speeds across various scenarios. While recent efforts have mainly focused on improving model structures to compress data volume or reduce artifacts during zoom-in and zoom-out operations, they often overlook an underlying issue: training sampling deficiency. In zoomed-in views, Gaussian primitives can appear unregulated and distorted due to their dilation limitations and the insufficient availability of scale-specific training samples. Consequently, incorporating pseudo-details that ensure the completeness and alignment of the scene becomes essential. In this paper, we introduce a new training method that integrates diffusion models and multi-scale training using pseudo-ground-truth data. This approach not only notably mitigates the dilation and zoomed-in artifacts but also enriches reconstructed scenes with precise details out of existing scenarios. Our method achieves state-of-the-art performance across various benchmarks and extends the capabilities of 3D reconstruction beyond training datasets.\n\n3D Gaussian Splatting (3D-GS) 的出现极大推动了3D 重建的发展，在不同场景中提供高保真度和快速训练速度。尽管近期研究主要致力于优化模型结构以压缩数据体积或减少缩放操作中的伪影，但一个核心问题往往被忽视：训练采样不足（training sampling deficiency）。\n在放大视角（zoomed-in views）下，由于高斯基元的膨胀（dilation）限制及缺乏尺度特定的训练样本，高斯分布可能会表现出不稳定和变形。因此，引入伪细节（pseudo-details） 以确保场景的完整性和对齐性变得尤为重要。\n在本文中，我们提出了一种新型训练方法，结合扩散模型（diffusion models） 与 多尺度训练（multi-scale training），并利用伪真值数据（pseudo-ground-truth data） 进行优化。这种方法不仅能够显著缓解高斯基元的膨胀问题和缩放伪影，还可以丰富重建场景的精细细节，超越训练数据集中的原始信息。\n我们的实验表明，该方法在多个基准测试上达到了最先进（SOTA）性能，并进一步拓展了3D 重建的能力边界，使其能够在训练数据集之外的场景中表现卓越。\n"
  },
  {
    "path": "abs/2503.04079.md",
    "content": "### Surgical Gaussian Surfels: Highly Accurate Real-time Surgical Scene Rendering\n\nAccurate geometric reconstruction of deformable tissues in monocular endoscopic video remains a fundamental challenge in robot-assisted minimally invasive surgery. Although recent volumetric and point primitive methods based on neural radiance fields (NeRF) and 3D Gaussian primitives have efficiently rendered surgical scenes, they still struggle with handling artifact-free tool occlusions and preserving fine anatomical details. These limitations stem from unrestricted Gaussian scaling and insufficient surface alignment constraints during reconstruction. To address these issues, we introduce Surgical Gaussian Surfels (SGS), which transforms anisotropic point primitives into surface-aligned elliptical splats by constraining the scale component of the Gaussian covariance matrix along the view-aligned axis. We predict accurate surfel motion fields using a lightweight Multi-Layer Perceptron (MLP) coupled with locality constraints to handle complex tissue deformations. We use homodirectional view-space positional gradients to capture fine image details by splitting Gaussian Surfels in over-reconstructed regions. In addition, we define surface normals as the direction of the steepest density change within each Gaussian surfel primitive, enabling accurate normal estimation without requiring monocular normal priors. We evaluate our method on two in-vivo surgical datasets, where it outperforms current state-of-the-art methods in surface geometry, normal map quality, and rendering efficiency, while remaining competitive in real-time rendering performance.\n\n在单目内窥镜视频中对可变形组织（deformable tissues）进行精确几何重建仍然是机器人辅助微创手术中的核心挑战。尽管基于神经辐射场（NeRF）和 3D 高斯基元（Gaussian Primitives） 的体渲染和点基元方法在手术场景渲染方面取得了进展，但它们在去伪影工具遮挡和保留精细解剖细节方面仍存在困难。这些限制主要源于高斯缩放（Gaussian scaling）不受约束以及重建过程中表面对齐约束不足。\n为了解决这些问题，我们提出 Surgical Gaussian Surfels (SGS)，该方法将各向异性点基元转换为表面对齐的椭圆 splats，通过限制高斯协方差矩阵的视图对齐轴尺度分量，确保重建的表面结构更加精确。\n我们使用轻量级多层感知机（MLP） 结合局部约束（locality constraints），预测精确的 surfel 运动场，以处理复杂的组织变形。此外，我们利用同向（homodirectional）视图空间位置梯度，在过度重建区域中分裂 Gaussian Surfels，以捕获更精细的图像细节。\n同时，我们定义表面法向量为每个 Gaussian Surfel 基元内部密度变化最陡的方向，从而在无需单目法向先验的情况下实现精确的法向估计。\n我们在两个体内手术数据集上对方法进行了评估，结果表明，相较于现有最先进（SOTA）方法，SGS 在表面几何重建、法向质量和渲染效率方面均表现优越，同时保持了实时渲染的竞争力。\n"
  },
  {
    "path": "abs/2503.04082.md",
    "content": "### Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting\n\nReal2Sim is becoming increasingly important with the rapid development of surgical artificial intelligence (AI) and autonomy. In this work, we propose a novel Real2Sim methodology, Instrument-Splatting, that leverages 3D Gaussian Splatting to provide fully controllable 3D reconstruction of surgical instruments from monocular surgical videos. To maintain both high visual fidelity and manipulability, we introduce a geometry pre-training to bind Gaussian point clouds on part mesh with accurate geometric priors and define a forward kinematics to control the Gaussians as flexible as real instruments. Afterward, to handle unposed videos, we design a novel instrument pose tracking method leveraging semantics-embedded Gaussians to robustly refine per-frame instrument poses and joint states in a render-and-compare manner, which allows our instrument Gaussian to accurately learn textures and reach photorealistic rendering. We validated our method on 2 publicly released surgical videos and 4 videos collected on ex vivo tissues and green screens. Quantitative and qualitative evaluations demonstrate the effectiveness and superiority of the proposed method.\n\nReal2Sim（从真实到仿真） 随着手术人工智能（AI）和自主技术的快速发展变得越来越重要。在本文中，我们提出了一种新颖的 Real2Sim 方法论——Instrument-Splatting，利用 3D Gaussian Splatting（3DGS） 实现完全可控的手术器械 3D 重建，可从单目手术视频中提取高质量的 3D 仿真模型。\n为同时保持高视觉保真度和可操控性，我们引入几何预训练（geometry pre-training），将高斯点云绑定到部件网格（part mesh），以结合精准的几何先验。此外，我们定义了一种**前向运动学（forward kinematics）**模型，使高斯点云能够如同真实手术器械一样灵活运动。\n针对无位姿（unposed）视频，我们设计了一种器械位姿跟踪方法（instrument pose tracking method），利用嵌入语义信息的高斯（semantics-embedded Gaussians），通过渲染-对比（render-and-compare）机制对每帧的器械位姿和关节状态进行鲁棒优化。这使得我们的高斯器械能够准确学习纹理，并实现逼真的光照渲染。\n我们在两个公开手术视频和四个离体组织及绿幕视频上验证了该方法。定量和定性评估均表明，我们的方法在重建质量、位姿估计和视觉真实感方面均优于现有方法。\n"
  },
  {
    "path": "abs/2503.04314.md",
    "content": "### S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting\n\nIn this paper, we aim ambitiously for a realistic yet challenging problem, namely, how to reconstruct high-quality 3D scenes from sparse low-resolution views that simultaneously suffer from deficient perspectives and clarity. Whereas existing methods only deal with either sparse views or low-resolution observations, they fail to handle such hybrid and complicated scenarios. To this end, we propose a novel Sparse-view Super-resolution 3D Gaussian Splatting framework, dubbed S2Gaussian, that can reconstruct structure-accurate and detail-faithful 3D scenes with only sparse and low-resolution views. The S2Gaussian operates in a two-stage fashion. In the first stage, we initially optimize a low-resolution Gaussian representation with depth regularization and densify it to initialize the high-resolution Gaussians through a tailored Gaussian Shuffle Split operation. In the second stage, we refine the high-resolution Gaussians with the super-resolved images generated from both original sparse views and pseudo-views rendered by the low-resolution Gaussians. In which a customized blur-free inconsistency modeling scheme and a 3D robust optimization strategy are elaborately designed to mitigate multi-view inconsistency and eliminate erroneous updates caused by imperfect supervision. Extensive experiments demonstrate superior results and in particular establishing new state-of-the-art performances with more consistent geometry and finer details.\n\n在本文中，我们针对一个现实且极具挑战性的问题：如何从稀疏低分辨率视角重建高质量 3D 场景，同时克服视角不足和清晰度受限的双重挑战。现有方法通常仅处理稀疏视角或低分辨率观测中的单一问题，难以应对这种混合复杂场景。\n为此，我们提出了一种全新的 稀疏视角超分辨 3D Gaussian Splatting 框架，命名为 S2Gaussian，该方法能够在仅有稀疏、低分辨率输入的情况下，重建结构准确、细节丰富的 3D 场景。\nS2Gaussian 采用两阶段优化策略。在第一阶段，我们首先利用深度正则化优化一个低分辨率高斯表示，并通过定制的 Gaussian Shuffle Split 操作对其进行密集化，以初始化高分辨率高斯分布。在第二阶段，我们结合原始稀疏视角与由低分辨率高斯渲染的伪视角，利用超分辨图像进一步优化高分辨率高斯分布。其中，我们设计了一种定制的无模糊不一致建模方案（blur-free inconsistency modeling scheme）和三维鲁棒优化策略（3D robust optimization strategy），有效缓解多视角不一致问题，并消除由于不完美监督导致的错误更新。\n大量实验表明，S2Gaussian 在 3D 重建质量上超越了现有最先进（SOTA）方法，在几何一致性和细节保真度方面均取得新的最佳表现。\n"
  },
  {
    "path": "abs/2503.04333.md",
    "content": "### GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting\n\nImplicit Neural Representation for Videos (NeRV) has introduced a novel paradigm for video representation and compression, outperforming traditional codecs. As model size grows, however, slow encoding and decoding speed and high memory consumption hinder its application in practice. To address these limitations, we propose a new video representation and compression method based on 2D Gaussian Splatting to efficiently handle video data. Our proposed deformable 2D Gaussian Splatting dynamically adapts the transformation of 2D Gaussians at each frame, significantly reducing memory cost. Equipped with a multi-plane-based spatiotemporal encoder and a lightweight decoder, it predicts changes in color, coordinates, and shape of initialized Gaussians, given the time step. By leveraging temporal gradients, our model effectively captures temporal redundancy at negligible cost, significantly enhancing video representation efficiency. Our method reduces GPU memory usage by up to 78.4%, and significantly expedites video processing, achieving 5.5x faster training and 12.5x faster decoding compared to the state-of-the-art NeRV methods.\n\n隐式神经表示 (Implicit Neural Representation) 在视频领域的应用 (NeRV) 提出了一种新颖的视频表示与压缩范式，其性能优于传统编解码器。然而，随着模型规模的增长，编码和解码速度较慢以及高内存消耗限制了其实际应用。\n为解决这些问题，我们提出了一种基于二维高斯散点 (2D Gaussian Splatting) 的视频表示与压缩方法，以更高效地处理视频数据。我们的方法引入可变形二维高斯散点 (Deformable 2D Gaussian Splatting)，在每一帧动态调整 2D 高斯变换，从而显著降低内存占用。此外，我们采用基于多平面的时空编码器 (multi-plane-based spatiotemporal encoder) 和轻量级解码器 (lightweight decoder)，在给定时间步的情况下，预测初始化高斯的颜色、坐标和形状变化。\n通过利用时间梯度 (temporal gradients)，我们的模型能以极低的计算开销有效捕捉时序冗余，大幅提升视频表示效率。实验表明，我们的方法将 GPU 内存使用量降低最高达 78.4%，训练速度提升 5.5 倍，解码速度比最先进的 NeRV 方法快 12.5 倍，极大加速了视频处理。\n"
  },
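  {
    "path": "notes/sketch_2d_gaussian_render.md",
    "content": "### Illustrative sketch: compositing 2D Gaussians into a frame\n\nA hypothetical toy renderer for the 2D-Gaussian video idea in the GaussianVideo entry above, not the paper's implementation: a frame is formed by alpha-compositing isotropic 2D Gaussians front to back with a running per-pixel transmittance, the splatting-style analogue of the representation the paper deforms over time. Shapes, names, and the isotropic simplification are assumptions.\n\n```python\nimport numpy as np\n\ndef render_2d_gaussians(H, W, centers, sigmas, colors, opacities):\n    # centers: (N, 2) pixel positions; sigmas: (N,) isotropic scales;\n    # colors: (N, 3) RGB in [0, 1]; opacities: (N,) in [0, 1], sorted front to back.\n    ys, xs = np.mgrid[0:H, 0:W].astype(np.float32)\n    img = np.zeros((H, W, 3), dtype=np.float32)\n    T = np.ones((H, W), dtype=np.float32)  # per-pixel transmittance\n    for (cx, cy), s, rgb, o in zip(centers, sigmas, colors, opacities):\n        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * s * s))\n        alpha = np.clip(o * g, 0.0, 0.999)\n        img += (T * alpha)[..., None] * np.asarray(rgb, dtype=np.float32)\n        T *= 1.0 - alpha\n    return img\n\n# Tiny usage example: one orange blob on a black background.\nframe = render_2d_gaussians(64, 64, [(20.0, 30.0)], [6.0], [(1.0, 0.5, 0.2)], [0.8])\n```\n\nA deformable variant would make centers, sigmas, and colors functions of the time step, which is where the temporal redundancy exploited for compression comes from.\n"
  },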
  {
    "path": "abs/2503.05082.md",
    "content": "### Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs\n\nDespite recent successes in novel view synthesis using 3D Gaussian Splatting (3DGS), modeling scenes with sparse inputs remains a challenge. In this work, we address two critical yet overlooked issues in real-world sparse-input modeling: extrapolation and occlusion. To tackle these issues, we propose to use a reconstruction by generation pipeline that leverages learned priors from video diffusion models to provide plausible interpretations for regions outside the field of view or occluded. However, the generated sequences exhibit inconsistencies that do not fully benefit subsequent 3DGS modeling. To address the challenge of inconsistencies, we introduce a novel scene-grounding guidance based on rendered sequences from an optimized 3DGS, which tames the diffusion model to generate consistent sequences. This guidance is training-free and does not require any fine-tuning of the diffusion model. To facilitate holistic scene modeling, we also propose a trajectory initialization method. It effectively identifies regions that are outside the field of view and occluded. We further design a scheme tailored for 3DGS optimization with generated sequences. Experiments demonstrate that our method significantly improves upon the baseline and achieves state-of-the-art performance on challenging benchmarks.\n\n尽管 3D Gaussian Splatting (3DGS) 在新视角合成方面取得了显著成功，但在稀疏输入的场景建模中仍然面临挑战。在本文中，我们聚焦于真实世界稀疏输入建模中两个关键但被忽视的问题：外推（extrapolation）和遮挡（occlusion）。\n为了解决这些问题，我们提出了一种基于生成的重建（reconstruction by generation）方法，利用视频扩散模型（video diffusion models） 的学习先验，为视野之外或被遮挡区域提供合理的解释。然而，直接生成的序列往往存在不一致性，难以充分促进后续 3DGS 建模。\n针对这一问题，我们提出了一种基于优化 3DGS 渲染序列的场景引导（scene-grounding guidance）方法，以约束扩散模型生成一致的序列。该引导方法无需额外训练，且不需要对扩散模型进行微调。\n此外，为了实现整体场景建模，我们提出了一种轨迹初始化（trajectory initialization）方法，用于有效识别视野之外和被遮挡区域。我们还设计了一种适用于 3DGS 优化的生成序列策略，以进一步提高建模质量。\n实验表明，我们的方法在多个具有挑战性的基准测试上显著超越基线方法，并在稀疏输入场景建模中达到了最先进（SOTA）性能。\n"
  },
  {
    "path": "abs/2503.05152.md",
    "content": "### GSplatVNM: Point-of-View Synthesis for Visual Navigation Models Using Gaussian Splatting\n\nThis paper presents a novel approach to image-goal navigation by integrating 3D Gaussian Splatting (3DGS) with Visual Navigation Models (VNMs), a method we refer to as GSplatVNM. VNMs offer a promising paradigm for image-goal navigation by guiding a robot through a sequence of point-of-view images without requiring metrical localization or environment-specific training. However, constructing a dense and traversable sequence of target viewpoints from start to goal remains a central challenge, particularly when the available image database is sparse. To address these challenges, we propose a 3DGS-based viewpoint synthesis framework for VNMs that synthesizes intermediate viewpoints to seamlessly bridge gaps in sparse data while significantly reducing storage overhead. Experimental results in a photorealistic simulator demonstrate that our approach not only enhances navigation efficiency but also exhibits robustness under varying levels of image database sparsity.\n\n本文提出了一种新颖的图像目标导航 (image-goal navigation) 方法，将 3D 高斯散点 (3D Gaussian Splatting, 3DGS) 与视觉导航模型 (Visual Navigation Models, VNMs) 相结合，称为 GSplatVNM。\nVNMs 通过一系列视角图像引导机器人导航，无需度量级定位 (metrical localization) 或特定环境的训练，为图像目标导航提供了一个有前景的范式。然而，在起点到目标点的导航过程中，如何构建稠密且可遍历的目标视点序列仍然是核心挑战，特别是在可用图像数据库稀疏的情况下。\n为此，我们提出了一种基于 3DGS 的 VNMs 视点合成框架，用于合成中间视点，从而无缝填补数据稀疏带来的空缺，同时大幅减少存储开销。在高真实感模拟环境中的实验表明，我们的方法不仅提升了导航效率，还能在不同程度的图像数据库稀疏性下保持稳健性。\n"
  },
  {
    "path": "abs/2503.05161.md",
    "content": "### GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting\n\nThe automatic reconstruction of 3D computer-aided design (CAD) models from CAD sketches has recently gained significant attention in the computer vision community. Most existing methods, however, rely on vector CAD sketches and 3D ground truth for supervision, which are often difficult to be obtained in industrial applications and are sensitive to noise inputs. We propose viewing CAD reconstruction as a specific instance of sparse-view 3D reconstruction to overcome these limitations. While this reformulation offers a promising perspective, existing 3D reconstruction methods typically require natural images and corresponding camera poses as inputs, which introduces two major significant challenges: (1) modality discrepancy between CAD sketches and natural images, and (2) difficulty of accurate camera pose estimation for CAD sketches. To solve these issues, we first transform the CAD sketches into representations resembling natural images and extract corresponding masks. Next, we manually calculate the camera poses for the orthographic views to ensure accurate alignment within the 3D coordinate system. Finally, we employ a customized sparse-view 3D reconstruction method to achieve high-quality reconstructions from aligned orthographic views. By leveraging raster CAD sketches for self-supervision, our approach eliminates the reliance on vector CAD sketches and 3D ground truth. Experiments on the Sub-Fusion360 dataset demonstrate that our proposed method significantly outperforms previous approaches in CAD reconstruction performance and exhibits strong robustness to noisy inputs.\n\n从 CAD 草图自动重建 3D 计算机辅助设计（CAD）模型 近年来在计算机视觉领域受到广泛关注。然而，现有方法主要依赖矢量 CAD 草图和 3D 真实数据进行监督，而在工业应用中，这类数据通常难以获取，并且对噪声输入极为敏感。\n为克服这些限制，我们提出将 CAD 重建视为稀疏视角 3D 重建的特定实例。尽管这一重新表述提供了新的思路，但现有的3D 重建方法通常依赖自然图像及其相机位姿作为输入，因此带来了两个主要挑战：（1）CAD 草图与自然图像的模态差异，（2）CAD 草图的相机位姿估计困难。\n为了解决这些问题，我们首先将 CAD 草图转换为类似自然图像的表示，并提取相应的掩码。随后，我们手动计算正交视图的相机位姿，确保其在 3D 坐标系中的准确对齐。最后，我们采用定制的稀疏视角 3D 重建方法，基于对齐的正交视图实现高质量的 3D CAD 重建。\n通过利用栅格化 CAD 草图进行自监督学习，我们的方法无需依赖矢量 CAD 草图和 3D 真实数据。在 Sub-Fusion360 数据集上的实验表明，我们的方法在 CAD 重建性能 上显著优于现有方法，同时对噪声输入表现出较强的鲁棒性。\n"
  },
  {
    "path": "abs/2503.05162.md",
    "content": "### EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation\n\nWe have recently seen great progress in 3D scene reconstruction through explicit point-based 3D Gaussian Splatting (3DGS), notable for its high quality and fast rendering speed. However, reconstructing dynamic scenes such as complex human performances with long durations remains challenging. Prior efforts fall short of modeling a long-term sequence with drastic motions, frequent topology changes or interactions with props, and resort to segmenting the whole sequence into groups of frames that are processed independently, which undermines temporal stability and thereby leads to an unpleasant viewing experience and inefficient storage footprint. In view of this, we introduce EvolvingGS, a two-stage strategy that first deforms the Gaussian model to coarsely align with the target frame, and then refines it with minimal point addition/subtraction, particularly in fast-changing areas. Owing to the flexibility of the incrementally evolving representation, our method outperforms existing approaches in terms of both per-frame and temporal quality metrics while maintaining fast rendering through its purely explicit representation. Moreover, by exploiting temporal coherence between successive frames, we propose a simple yet effective compression algorithm that achieves over 50x compression rate. Extensive experiments on both public benchmarks and challenging custom datasets demonstrate that our method significantly advances the state-of-the-art in dynamic scene reconstruction, particularly for extended sequences with complex human performances.\n\n3D Gaussian Splatting (3DGS) 作为一种基于显式点的 3D 场景重建方法，因其高质量渲染和快速推理能力，在 3D 场景重建领域取得了巨大进展。然而，针对**长时动态场景（如复杂的人体表演）**的重建仍然充满挑战。\n现有方法在处理剧烈运动、频繁拓扑变化或与道具交互的长序列时表现不佳，通常需要将整个序列拆分为多个独立帧组，并分别进行处理。这种做法会破坏时序稳定性，导致不连贯的观感，同时造成存储开销的浪费。\n为了解决这些问题，我们提出 EvolvingGS，一种两阶段策略：首先，我们对高斯模型进行变形，使其与目标帧进行粗对齐，然后，在快速变化区域进行最小化点增/删优化，以精细调整模型。这种增量式演化表示（incrementally evolving representation） 使我们的模型在单帧质量和时序一致性上均优于现有方法，同时由于采用完全显式的表示方式，仍能保持高效的实时渲染。\n此外，我们利用连续帧之间的时序一致性（temporal coherence），提出了一种简单但有效的压缩算法，实现了超过 50 倍的压缩率。\n在公共基准测试和复杂自定义数据集上的广泛实验表明，EvolvingGS 在动态场景重建方面显著提升了最先进（SOTA）水平，尤其适用于长时复杂人体表演的重建任务。\n"
  },
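The >50x compression reported for EvolvingGS above rests on temporal coherence between successive frames. A hedged sketch of that general idea, using uniform quantization of per-frame parameter deltas (the paper's actual codec is not reproduced here; `step` and the toy data are assumptions):

```python
import numpy as np

def encode_delta(prev, curr, step=1e-3):
    """Quantize the change in Gaussian parameters between successive frames.
    Unchanged points produce runs of zeros, which entropy-code very well."""
    return np.round((curr - prev) / step).astype(np.int16)

def decode_delta(prev, q, step=1e-3):
    return prev + q.astype(np.float32) * step

rng = np.random.default_rng(0)
frame0 = rng.standard_normal((100_000, 3)).astype(np.float32)  # e.g. Gaussian means
frame1 = frame0.copy()
# Only a small fast-changing subset moves between frames (toy stand-in).
frame1[:2_000] += 0.01 * rng.standard_normal((2_000, 3)).astype(np.float32)

q = encode_delta(frame0, frame1)
recon = decode_delta(frame0, q)
print("max reconstruction error:", np.abs(recon - frame1).max())  # bounded by step/2
print("nonzero residuals:", np.count_nonzero(q), "of", q.size)
```

The sparser the residual (i.e., the stronger the temporal coherence the evolving representation preserves), the higher the achievable compression rate.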
  {
    "path": "abs/2503.05168.md",
    "content": "### SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has become a crucial rendering technique for many real-time applications. However, the limited hardware resources on today's mobile platforms hinder these applications, as they struggle to achieve real-time performance. In this paper, we propose SeeLe, a general framework designed to accelerate the 3DGS pipeline for resource-constrained mobile devices.\nSpecifically, we propose two GPU-oriented techniques: hybrid preprocessing and contribution-aware rasterization. Hybrid preprocessing alleviates the GPU compute and memory pressure by reducing the number of irrelevant Gaussians during rendering. The key is to combine our view-dependent scene representation with online filtering. Meanwhile, contribution-aware rasterization improves the GPU utilization at the rasterization stage by prioritizing Gaussians with high contributions while reducing computations for those with low contributions. Both techniques can be seamlessly integrated into existing 3DGS pipelines with minimal fine-tuning. Collectively, our framework achieves 2.6× speedup and 32.3% model reduction while achieving superior rendering quality compared to existing methods.\n\n3D 高斯散点 (3D Gaussian Splatting, 3DGS) 已成为众多实时应用中的关键渲染技术。然而，当前移动平台受限的硬件资源阻碍了这些应用的发展，使其难以实现实时性能。在本文中，我们提出 SeeLe，一个旨在加速资源受限的移动设备上 3DGS 渲染流程的通用框架。\n具体而言，我们提出了两种面向 GPU 的优化技术：混合预处理 (Hybrid Preprocessing) 和 贡献感知光栅化 (Contribution-Aware Rasterization)。混合预处理通过减少渲染过程中无关高斯点的数量，降低 GPU 计算和内存压力。其核心思想是结合基于视角的场景表示与在线过滤机制。与此同时，贡献感知光栅化通过优先处理贡献度高的高斯点，并减少低贡献度高斯点的计算量，从而提高光栅化阶段的 GPU 利用率。\n这两种技术可以无缝集成到现有的 3DGS 渲染管线中，且仅需最小的参数微调。综合来看，我们的框架在保证优越渲染质量的同时，相较于现有方法实现了 2.6× 的加速，并减少 32.3% 的模型大小。\n"
  },
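SeeLe's contribution-aware rasterization prioritizes high-contribution Gaussians. One plausible, cheap contribution proxy is opacity times projected screen-space area; the sketch below is an assumption-laden illustration (the focal length `f`, the isotropic-scale approximation, and the 30% split are not the paper's values):

```python
import numpy as np

def contribution_proxy(opacity, scale, depth, f=500.0):
    """Cheap per-Gaussian contribution estimate: opacity times projected
    screen-space area (isotropic approximation under a pinhole camera)."""
    radius_px = f * scale / np.maximum(depth, 1e-6)
    return opacity * np.pi * radius_px ** 2

rng = np.random.default_rng(1)
n = 200_000
opacity = rng.uniform(0.0, 1.0, n)
scale = rng.uniform(0.001, 0.05, n)
depth = rng.uniform(0.5, 20.0, n)

c = contribution_proxy(opacity, scale, depth)
order = np.argsort(-c)                 # high contributors first
full = order[: int(0.3 * n)]           # full-quality rasterization pass
cheap = order[int(0.3 * n):]           # reduced work (or skipped) for the tail
print(f"top 30% of Gaussians carry {c[full].sum() / c.sum():.1%} of total contribution")
```

The heavy concentration of contribution in a small fraction of Gaussians is what makes this kind of prioritization pay off on constrained mobile GPUs.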
  {
    "path": "abs/2503.05174.md",
    "content": "### SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting\n\n6-DoF pose estimation is a fundamental task in computer vision with wide-ranging applications in augmented reality and robotics. Existing single RGB-based methods often compromise accuracy due to their reliance on initial pose estimates and susceptibility to rotational ambiguity, while approaches requiring depth sensors or multi-view setups incur significant deployment costs. To address these limitations, we introduce SplatPose, a novel framework that synergizes 3D Gaussian Splatting (3DGS) with a dual-branch neural architecture to achieve high-precision pose estimation using only a single RGB image. Central to our approach is the Dual-Attention Ray Scoring Network (DARS-Net), which innovatively decouples positional and angular alignment through geometry-domain attention mechanisms, explicitly modeling directional dependencies to mitigate rotational ambiguity. Additionally, a coarse-to-fine optimization pipeline progressively refines pose estimates by aligning dense 2D features between query images and 3DGS-synthesized views, effectively correcting feature misalignment and depth errors from sparse ray sampling. Experiments on three benchmark datasets demonstrate that SplatPose achieves state-of-the-art 6-DoF pose estimation accuracy in single RGB settings, rivaling approaches that depend on depth or multi-view images.\n\n6-DoF 位姿估计是计算机视觉中的一项基础任务，在增强现实和机器人等领域具有广泛应用。然而，现有基于单张 RGB 图像的方法往往因依赖初始位姿估计且易受旋转歧义影响而牺牲精度，而依赖深度传感器或多视角设置的方法则会带来较高的部署成本。\n为了解决这些局限性，我们提出 SplatPose，一个结合 3D 高斯散点 (3D Gaussian Splatting, 3DGS) 与双分支神经网络架构的全新框架，仅利用单张 RGB 图像即可实现高精度的位姿估计。\n我们方法的核心是 双注意力射线评分网络 (Dual-Attention Ray Scoring Network, DARS-Net)，其创新之处在于通过几何域注意力机制解耦位置和角度对齐，显式建模方向依赖关系，以缓解旋转歧义问题。此外，我们设计了一种由粗到细的优化管线 (coarse-to-fine optimization pipeline)，通过在查询图像与 3DGS 合成视图之间对齐稠密的 2D 特征，逐步优化位姿估计，从而有效纠正因稀疏射线采样导致的特征错位和深度误差。\n在三个基准数据集上的实验表明，SplatPose 在单 RGB 设置下实现了当前最先进的 6-DoF 位姿估计精度，其效果可与依赖深度或多视角图像的方法相媲美。\n"
  },
  {
    "path": "abs/2503.05182.md",
    "content": "### MGSR: 2D/3D Mutual-boosted Gaussian Splatting for High-fidelity Surface Reconstruction under Various Light Conditions\n\nNovel view synthesis (NVS) and surface reconstruction (SR) are essential tasks in 3D Gaussian Splatting (3D-GS). Despite recent progress, these tasks are often addressed independently, with GS-based rendering methods struggling under diverse light conditions and failing to produce accurate surfaces, while GS-based reconstruction methods frequently compromise rendering quality. This raises a central question: must rendering and reconstruction always involve a trade-off? To address this, we propose MGSR, a 2D/3D Mutual-boosted Gaussian splatting for Surface Reconstruction that enhances both rendering quality and 3D reconstruction accuracy. MGSR introduces two branches--one based on 2D-GS and the other on 3D-GS. The 2D-GS branch excels in surface reconstruction, providing precise geometry information to the 3D-GS branch. Leveraging this geometry, the 3D-GS branch employs a geometry-guided illumination decomposition module that captures reflected and transmitted components, enabling realistic rendering under varied light conditions. Using the transmitted component as supervision, the 2D-GS branch also achieves high-fidelity surface reconstruction. Throughout the optimization process, the 2D-GS and 3D-GS branches undergo alternating optimization, providing mutual supervision. Prior to this, each branch completes an independent warm-up phase, with an early stopping strategy implemented to reduce computational costs. We evaluate MGSR on a diverse set of synthetic and real-world datasets, at both object and scene levels, demonstrating strong performance in rendering and surface reconstruction.\n\n新视角合成（Novel View Synthesis, NVS）和表面重建（Surface Reconstruction, SR）是 3D 高斯散点 (3D Gaussian Splatting, 3D-GS) 中的两项核心任务。尽管近年来取得了显著进展，这两项任务通常是独立处理的：基于 GS 的渲染方法在不同光照条件下表现不稳定，难以生成精确的表面，而基于 GS 的重建方法往往会牺牲渲染质量。这引发了一个核心问题：渲染与重建是否必须相互妥协？\n为了解决这一问题，我们提出 MGSR，即一种 2D/3D 互增强的高斯散点表面重建方法 (Mutual-boosted Gaussian Splatting for Surface Reconstruction)，旨在同时提升渲染质量和 3D 重建精度。\nMGSR 采用双分支架构：一个基于 2D-GS，另一个基于 3D-GS。其中，2D-GS 分支擅长表面重建，提供精确的几何信息以增强 3D-GS 分支。利用这一几何信息，3D-GS 分支引入几何引导的光照分解模块 (Geometry-Guided Illumination Decomposition Module)，能够分离反射与透射成分，从而在不同光照条件下实现逼真的渲染。同时，以透射成分作为监督信号，2D-GS 分支也能实现高保真的表面重建。\n在优化过程中，2D-GS 和 3D-GS 分支采用交替优化机制，互相提供监督信息。在此之前，每个分支会独立完成预训练阶段 (warm-up phase)，并采用提前停止策略 (early stopping strategy) 以降低计算成本。\n我们在多个合成和真实数据集上进行了实验，包括物体级和场景级评估，结果表明 MGSR 在渲染与表面重建任务上均表现出色。\n"
  },
  {
    "path": "abs/2503.05189.md",
    "content": "### Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects\n\nTracking and manipulating irregularly-shaped, previously unseen objects in dynamic environments is important for robotic applications in manufacturing, assembly, and logistics. Recently introduced Gaussian Splats efficiently model object geometry, but lack persistent state estimation for task-oriented manipulation. We present Persistent Object Gaussian Splat (POGS), a system that embeds semantics, self-supervised visual features, and object grouping features into a compact representation that can be continuously updated to estimate the pose of scanned objects. POGS updates object states without requiring expensive rescanning or prior CAD models of objects. After an initial multi-view scene capture and training phase, POGS uses a single stereo camera to integrate depth estimates along with self-supervised vision encoder features for object pose estimation. POGS supports grasping, reorientation, and natural language-driven manipulation by refining object pose estimates, facilitating sequential object reset operations with human-induced object perturbations and tool servoing, where robots recover tool pose despite tool perturbations of up to 30°. POGS achieves up to 12 consecutive successful object resets and recovers from 80% of in-grasp tool perturbations.\n\n在制造、装配和物流等机器人应用中，跟踪和操控不规则形状的未知物体 在动态环境下具有重要意义。近期提出的高斯散点 (Gaussian Splats) 能够高效建模物体几何形状，但缺乏持久的状态估计，难以用于面向任务的操控。\n我们提出 Persistent Object Gaussian Splat (POGS)，该系统将语义信息、自监督视觉特征以及物体分组特征嵌入到紧凑的表示中，并能持续更新以估计已扫描物体的位姿。POGS 无需昂贵的重新扫描或物体 CAD 先验模型即可更新物体状态。在初始多视角场景捕获和训练阶段后，POGS 仅依赖单个双目相机，融合深度估计和自监督视觉编码特征进行物体位姿估计。\nPOGS 支持抓取、重定位和自然语言驱动的操控，通过优化物体位姿估计来执行顺序物体复位，可应对人为干扰和工具伺服控制。在机器人工具受到最大 30° 的干扰后，POGS 仍能恢复工具位姿，并可实现连续 12 次成功的物体复位，在抓取过程中成功恢复 80% 的工具干扰。\n"
  },
  {
    "path": "abs/2503.05196.md",
    "content": "### STGA: Selective-Training Gaussian Head Avatars\n\nWe propose selective-training Gaussian head avatars (STGA) to enhance the details of dynamic head Gaussian. The dynamic head Gaussian model is trained based on the FLAME parameterized model. Each Gaussian splat is embedded within the FLAME mesh to achieve mesh-based animation of the Gaussian model. Before training, our selection strategy calculates the 3D Gaussian splat to be optimized in each frame. The parameters of these 3D Gaussian splats are optimized in the training of each frame, while those of the other splats are frozen. This means that the splats participating in the optimization process differ in each frame, to improve the realism of fine details. Compared with network-based methods, our method achieves better results with shorter training time. Compared with mesh-based methods, our method produces more realistic details within the same training time. Additionally, the ablation experiment confirms that our method effectively enhances the quality of details.\n\n我们提出 选择性训练高斯头像 (Selective-Training Gaussian Head Avatars, STGA)，以增强动态头像高斯模型 (Dynamic Head Gaussian Model) 的细节表现。该动态高斯头像模型基于 FLAME 参数化模型 进行训练，每个高斯散点 (Gaussian Splat) 均嵌入 FLAME 网格 (FLAME Mesh)，从而实现基于网格的高斯模型动画。\n在训练前，我们的选择策略 (Selection Strategy) 计算每一帧中需要优化的 3D 高斯散点。在每一帧的训练过程中，仅优化选定的 3D 高斯散点的参数，而其他高斯散点的参数保持冻结。这意味着每一帧参与优化的高斯散点都会动态变化，以提升精细细节的真实感。\n与基于神经网络的方法相比，我们的方法在训练时间更短的情况下取得了更好的效果；与基于网格的方法相比，我们在相同训练时间内能够生成更逼真的细节。此外，消融实验 (Ablation Study) 进一步验证了该方法在提升细节质量方面的有效性。\n"
  },
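STGA's core trick, optimizing only the per-frame-selected splats while freezing the rest, can be illustrated with plain gradient masking. In this sketch the `select` mask is random and the loss is a stand-in; the paper derives the selection from the FLAME-driven deformation before training and optimizes a rendering loss:

```python
import torch

n_splats, frames = 10_000, 4
positions = torch.nn.Parameter(torch.randn(n_splats, 3))
# Plain SGD so that a zeroed gradient really means "no update"
# (a stateful optimizer like Adam would still nudge frozen splats).
opt = torch.optim.SGD([positions], lr=1e-2)

# Hypothetical per-frame selection mask, random here just to be runnable.
select = torch.rand(frames, n_splats) < 0.2

for f in range(frames):
    opt.zero_grad()
    loss = (positions ** 2).sum()        # stand-in for the per-frame rendering loss
    loss.backward()
    positions.grad[~select[f]] = 0.0     # freeze all unselected splats this frame
    opt.step()
    print(f"frame {f}: optimized {int(select[f].sum())} / {n_splats} splats")
```

Masking gradients rather than rebuilding the parameter set each frame keeps the per-frame bookkeeping cheap, which matches the entry's claim of shorter training time.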
  {
    "path": "abs/2503.05332.md",
    "content": "### CoMoGaussian: Continuous Motion-Aware Gaussian Splatting from Motion-Blurred Images\n\n3D Gaussian Splatting (3DGS) has gained significant attention for their high-quality novel view rendering, motivating research to address real-world challenges. A critical issue is the camera motion blur caused by movement during exposure, which hinders accurate 3D scene reconstruction. In this study, we propose CoMoGaussian, a Continuous Motion-Aware Gaussian Splatting that reconstructs precise 3D scenes from motion-blurred images while maintaining real-time rendering speed. Considering the complex motion patterns inherent in real-world camera movements, we predict continuous camera trajectories using neural ordinary differential equations (ODEs). To ensure accurate modeling, we employ rigid body transformations, preserving the shape and size of the object but rely on the discrete integration of sampled frames. To better approximate the continuous nature of motion blur, we introduce a continuous motion refinement (CMR) transformation that refines rigid transformations by incorporating additional learnable parameters. By revisiting fundamental camera theory and leveraging advanced neural ODE techniques, we achieve precise modeling of continuous camera trajectories, leading to improved reconstruction accuracy. Extensive experiments demonstrate state-of-the-art performance both quantitatively and qualitatively on benchmark datasets, which include a wide range of motion blur scenarios, from moderate to extreme blur.\n\n3D 高斯散点 (3D Gaussian Splatting, 3DGS) 由于其高质量的新视角渲染能力，近年来受到广泛关注，推动了针对真实世界挑战的研究。其中，一个关键问题是相机运动模糊 (Camera Motion Blur)，即由于曝光过程中相机的运动导致的模糊现象，这一问题严重影响了准确的 3D 场景重建。\n在本研究中，我们提出 CoMoGaussian，即一种连续运动感知的高斯散点 (Continuous Motion-Aware Gaussian Splatting)，能够从运动模糊图像中精确重建 3D 场景，同时保持实时渲染的性能。\n考虑到真实世界相机运动的复杂性，我们采用神经常微分方程 (Neural Ordinary Differential Equations, ODEs) 来预测连续相机轨迹。为了确保建模的准确性，我们引入刚体变换 (Rigid Body Transformations)，在保留物体形状和大小的前提下，通过离散积分 (Discrete Integration) 处理采样帧。然而，为了更精确地逼近连续运动模糊 (Continuous Motion Blur)，我们进一步提出连续运动优化 (Continuous Motion Refinement, CMR) 变换，该方法在刚体变换的基础上引入额外的可学习参数，以优化相机运动建模。\n通过重新审视基础相机理论 (Fundamental Camera Theory) 并结合先进的神经 ODE 技术，我们的模型能够对连续相机轨迹进行精确建模，从而提升 3D 重建的准确性。大量实验表明，在多个基准数据集上，我们的方法在定量指标 (Quantitative Metrics) 和定性效果 (Qualitative Results) 方面均达到了当前最先进的性能 (State-of-the-Art Performance)，能够处理从中等模糊 (Moderate Blur) 到极端模糊 (Extreme Blur) 的多种运动模糊场景。\n"
  },
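CoMoGaussian predicts continuous camera trajectories with neural ODEs. The sketch below swaps the ODE solver for fixed-step Euler integration on SO(3) x R^3, with a toy MLP `velocity` standing in for the learned dynamics; it illustrates only the trajectory-integration idea, not the paper's model:

```python
import torch

def hat(w):
    """so(3) hat map: 3-vector -> skew-symmetric matrix (differentiable)."""
    z = torch.zeros(())
    return torch.stack([
        torch.stack([z, -w[2], w[1]]),
        torch.stack([w[2], z, -w[0]]),
        torch.stack([-w[1], w[0], z]),
    ])

# Stand-in for the neural-ODE dynamics: a tiny MLP mapping time to a twist.
velocity = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 6))

def integrate_pose(R0, t0, steps=16):
    """Fixed-step Euler integration of the camera pose over the exposure
    window [0, 1]; each intermediate pose renders one sharp frame, and
    averaging those frames models the blur."""
    R, t, dt = R0.clone(), t0.clone(), 1.0 / steps
    poses = [(R, t)]
    for k in range(steps):
        xi = velocity(torch.tensor([[k * dt]])).squeeze(0)  # twist [w | v]
        R = R @ torch.matrix_exp(hat(xi[:3]) * dt)          # stays on SO(3)
        t = t + xi[3:] * dt
        poses.append((R, t))
    return poses

poses = integrate_pose(torch.eye(3), torch.zeros(3))
print(len(poses), "camera poses sampled along the exposure")
```

Using the matrix exponential of a skew-symmetric increment keeps every intermediate pose a valid rigid transform, which is the property the entry's rigid-body formulation relies on.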
  {
    "path": "abs/2503.05398.md",
    "content": "### Self-Modeling Robots by Photographing\n\nSelf-modeling enables robots to build task-agnostic models of their morphology and kinematics based on data that can be automatically collected, with minimal human intervention and prior information, thereby enhancing machine intelligence. Recent research has highlighted the potential of data-driven technology in modeling the morphology and kinematics of robots. However, existing self-modeling methods suffer from either low modeling quality or excessive data acquisition costs. Beyond morphology and kinematics, texture is also a crucial component of robots, which is challenging to model and remains unexplored. In this work, a high-quality, texture-aware, and link-level method is proposed for robot self-modeling. We utilize three-dimensional (3D) Gaussians to represent the static morphology and texture of robots, and cluster the 3D Gaussians to construct neural ellipsoid bones, whose deformations are controlled by the transformation matrices generated by a kinematic neural network. The 3D Gaussians and kinematic neural network are trained using data pairs composed of joint angles, camera parameters and multi-view images without depth information. By feeding the kinematic neural network with joint angles, we can utilize the well-trained model to describe the corresponding morphology, kinematics and texture of robots at the link level, and render robot images from different perspectives with the aid of 3D Gaussian splatting. Furthermore, we demonstrate that the established model can be exploited to perform downstream tasks such as motion planning and inverse kinematics.\n\n自建模 (Self-modeling) 使机器人能够在最小化人工干预和先验信息的情况下，基于自动采集的数据构建与任务无关的形态与运动学模型，从而提升机器智能。近年来，数据驱动技术在机器人形态 (Morphology) 和运动学 (Kinematics) 建模中的潜力已被广泛研究。然而，现有的自建模方法通常在建模质量较低或数据采集成本过高之间存在权衡。此外，除了形态和运动学，纹理 (Texture) 也是机器人建模中的关键组成部分，但由于建模难度较大，该方面仍然未被充分探索。\n在本研究中，我们提出了一种高质量、具备纹理感知能力 (Texture-aware) 且基于链节级 (Link-level) 的机器人自建模方法。我们采用三维高斯散点 (3D Gaussians) 来表示机器人的静态形态和纹理，并通过对 3D 高斯散点进行聚类来构建神经椭球骨骼 (Neural Ellipsoid Bones)，其变形由运动学神经网络 (Kinematic Neural Network) 生成的变换矩阵控制。\n在训练过程中，我们的 3D 高斯散点和运动学神经网络利用仅包含关节角度、相机参数和多视角图像（不含深度信息）的数据对进行优化。通过向运动学神经网络输入关节角度，训练好的模型能够在链节级准确描述机器人形态、运动学和纹理，并借助3D 高斯散点渲染 (3D Gaussian Splatting) 生成不同视角下的机器人图像。\n此外，我们进一步验证了所建立模型在下游任务中的应用价值，例如运动规划 (Motion Planning) 和逆运动学 (Inverse Kinematics)，展现了该方法的广泛适用性和潜力。\n"
  },
  {
    "path": "abs/2503.05425.md",
    "content": "### LiDAR-enhanced 3D Gaussian Splatting Mapping\n\nThis paper introduces LiGSM, a novel LiDAR-enhanced 3D Gaussian Splatting (3DGS) mapping framework that improves the accuracy and robustness of 3D scene mapping by integrating LiDAR data. LiGSM constructs joint loss from images and LiDAR point clouds to estimate the poses and optimize their extrinsic parameters, enabling dynamic adaptation to variations in sensor alignment. Furthermore, it leverages LiDAR point clouds to initialize 3DGS, providing a denser and more reliable starting points compared to sparse SfM points. In scene rendering, the framework augments standard image-based supervision with depth maps generated from LiDAR projections, ensuring an accurate scene representation in both geometry and photometry. Experiments on public and self-collected datasets demonstrate that LiGSM outperforms comparative methods in pose tracking and scene rendering.\n\n本文提出 LiGSM，一种基于 LiDAR 增强的 3D 高斯散点 (LiDAR-enhanced 3D Gaussian Splatting, 3DGS) 映射框架，通过融合 LiDAR 数据来提升三维场景映射的精度和鲁棒性。\nLiGSM 通过图像和 LiDAR 点云构建联合损失 (Joint Loss)，用于位姿估计 (Pose Estimation) 并优化其外参 (Extrinsic Parameters)，从而能够动态适应传感器对齐的变化。此外，该方法利用LiDAR 点云初始化 3DGS，相比稀疏的结构光测绘 (Structure-from-Motion, SfM) 点云，提供更密集、更可靠的初始点。\n在场景渲染方面，该框架结合标准的基于图像监督与由 LiDAR 投影生成的深度图 (Depth Maps)，确保几何和光度上的精确场景表示。\n在多个公开数据集和自采集数据集上的实验表明，LiGSM 在位姿跟踪 (Pose Tracking) 和场景渲染 (Scene Rendering) 方面均优于现有方法，展现了其优越的性能和应用潜力。\n"
  },
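LiGSM supervises rendering with depth maps generated from LiDAR projections. A sketch of that projection under standard pinhole assumptions (the intrinsics `K` and the random stand-in point cloud are hypothetical, not from the paper):

```python
import numpy as np

def lidar_to_depth(points_w, R, t, K, hw):
    """Project world-frame LiDAR points into a camera to get a sparse,
    z-buffered depth map usable as supervision for rendered depth."""
    h, w = hw
    p_cam = points_w @ R.T + t                  # world -> camera frame
    z = p_cam[:, 2]
    valid = z > 0.1                             # drop points behind / too close
    uv = (p_cam[valid] / z[valid, None]) @ K.T  # perspective divide + intrinsics
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    inb = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.full((h, w), np.inf, np.float32)
    np.minimum.at(depth, (v[inb], u[inb]), z[valid][inb])  # keep nearest return
    depth[np.isinf(depth)] = 0.0                # 0 marks "no LiDAR return"
    return depth

rng = np.random.default_rng(0)
K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
pts = rng.random((50_000, 3)) * [10, 10, 20] - [5, 5, 0]   # toy point cloud
d = lidar_to_depth(pts, np.eye(3), np.zeros(3), K, (480, 640))
print("pixels with depth supervision:", int((d > 0).sum()))
```

A photometric loss on images plus an L1 loss on these sparse depth pixels is the usual way such projections enter the 3DGS objective.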
  {
    "path": "abs/2503.05484.md",
    "content": "### DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction\n\nWe present DecoupledGaussian, a novel system that decouples static objects from their contacted surfaces captured in-the-wild videos, a key prerequisite for realistic Newtonian-based physical simulations. Unlike prior methods focused on synthetic data or elastic jittering along the contact surface, which prevent objects from fully detaching or moving independently, DecoupledGaussian allows for significant positional changes without being constrained by the initial contacted surface. Recognizing the limitations of current 2D inpainting tools for restoring 3D locations, our approach proposes joint Poisson fields to repair and expand the Gaussians of both objects and contacted scenes after separation. This is complemented by a multi-carve strategy to refine the object's geometry. Our system enables realistic simulations of decoupling motions, collisions, and fractures driven by user-specified impulses, supporting complex interactions within and across multiple scenes. We validate DecoupledGaussian through a comprehensive user study and quantitative benchmarks. This system enhances digital interaction with objects and scenes in real-world environments, benefiting industries such as VR, robotics, and autonomous driving.\n\n我们提出 DecoupledGaussian，一个创新系统，可在真实视频中解耦静态物体与其接触表面，这一能力是实现基于牛顿物理模拟的逼真交互的关键前提。与以往仅限于合成数据或沿接触面进行弹性抖动的方法不同，这些方法通常无法使物体完全分离或独立移动，而 DecoupledGaussian 允许物体在不受初始接触表面限制的情况下进行大幅度位置变化。\n针对现有 2D 修复工具 在恢复 3D 位置 时的局限性，我们提出了一种 联合泊松场（Joint Poisson Fields） 方法，在物体与接触场景分离后，对 高斯分布（Gaussians） 进行修复和扩展。此外，我们引入 多重雕刻（Multi-Carve）策略 进一步优化物体几何结构。\n我们的系统支持现实感强的物体解耦运动、碰撞和断裂模拟，可根据用户设定的外力驱动复杂交互，适用于多场景环境。我们通过用户研究与定量基准测试对 DecoupledGaussian 进行了全面验证，证明其在数字交互、VR、机器人技术及自动驾驶等行业中的广泛应用潜力。\n"
  },
  {
    "path": "abs/2503.05511.md",
    "content": "### Free Your Hands: Lightweight Relightable Turntable Capture Pipeline\n\nNovel view synthesis (NVS) from multiple captured photos of an object is a widely studied problem. Achieving high quality typically requires dense sampling of input views, which can lead to frustrating and tedious manual labor. Manually positioning cameras to maintain an optimal desired distribution can be difficult for humans, and if a good distribution is found, it is not easy to replicate. Additionally, the captured data can suffer from motion blur and defocus due to human error. In this paper, we present a lightweight object capture pipeline to reduce the manual workload and standardize the acquisition setup. We use a consumer turntable to carry the object and a tripod to hold the camera. As the turntable rotates, we automatically capture dense samples from various views and lighting conditions; we can repeat this for several camera positions. This way, we can easily capture hundreds of valid images in several minutes without hands-on effort. However, in the object reference frame, the light conditions vary; this is harmful to a standard NVS method like 3D Gaussian splatting (3DGS) which assumes fixed lighting. We design a neural radiance representation conditioned on light rotations, which addresses this issue and allows relightability as an additional benefit. We demonstrate our pipeline using 3DGS as the underlying framework, achieving competitive quality compared to previous methods with exhaustive acquisition and showcasing its potential for relighting and harmonization tasks.\n\n从多个拍摄角度生成新视角合成 (Novel View Synthesis, NVS) 是计算机视觉中的一个广泛研究的问题。通常，实现高质量的 NVS 需要密集采样输入视图，但这往往需要繁琐且费时的手动操作。\n手动调整相机 以保持理想的分布对人类而言较为困难，即使找到了一个合适的分布，也很难精确复现。此外，由于人为操作，采集的数据可能会受到运动模糊 (Motion Blur) 和焦点偏移 (Defocus) 的影响，进一步降低 NVS 质量。\n在本文中，我们提出了一种轻量级的物体采集流程 (Lightweight Object Capture Pipeline)，以减少手动工作量并标准化数据采集。我们使用消费级转盘 (Consumer Turntable) 来放置物体，并用三脚架 (Tripod) 固定相机。在转盘旋转过程中，系统会自动捕捉来自多个角度和不同光照条件的密集图像，并可在多个相机位置重复此过程。这样，我们可以在几分钟内 无人工干预地捕获数百张有效图像，极大提高采集效率。\n然而，在物体的参考坐标系中，光照条件会随着转盘旋转而变化，这对 3D 高斯散点 (3D Gaussian Splatting, 3DGS) 等标准 NVS 方法不利，因为这些方法通常假设光照恒定。为了解决这一问题，我们设计了一种基于神经辐射场 (Neural Radiance Representation) 的光照旋转条件模型，能够适应光照变化，同时提供重光照 (Relightability) 的额外能力。\n我们基于 3DGS 作为底层框架验证了该采集流程，在数据采集极大简化的情况下，实现了与以往高密度采集方法相媲美的渲染质量，同时展现了其在重光照 (Relighting) 和光照一致化 (Harmonization) 任务中的潜力。\n"
  },
  {
    "path": "abs/2503.05600.md",
    "content": "### D2GV: Deformable 2D Gaussian Splatting for Video Representation in 400FPS\n\nImplicit Neural Representations (INRs) have emerged as a powerful approach for video representation, offering versatility across tasks such as compression and inpainting. However, their implicit formulation limits both interpretability and efficacy, undermining their practicality as a comprehensive solution. We propose a novel video representation based on deformable 2D Gaussian splatting, dubbed D2GV, which aims to achieve three key objectives: 1) improved efficiency while delivering superior quality; 2) enhanced scalability and interpretability; and 3) increased friendliness for downstream tasks. Specifically, we initially divide the video sequence into fixed-length Groups of Pictures (GoP) to allow parallel training and linear scalability with video length. For each GoP, D2GV represents video frames by applying differentiable rasterization to 2D Gaussians, which are deformed from a canonical space into their corresponding timestamps. Notably, leveraging efficient CUDA-based rasterization, D2GV converges fast and decodes at speeds exceeding 400 FPS, while delivering quality that matches or surpasses state-of-the-art INRs. Moreover, we incorporate a learnable pruning and quantization strategy to streamline D2GV into a more compact representation. We demonstrate D2GV's versatility in tasks including video interpolation, inpainting and denoising, underscoring its potential as a promising solution for video representation.\n\n隐式神经表示 (Implicit Neural Representations, INRs) 已成为视频表示的一种强大方法，在压缩、修复等任务上展现出极大的灵活性。然而，其隐式公式限制了可解释性和有效性，削弱了其作为通用解决方案的实用性。为此，我们提出了一种基于可变形二维高斯散点 (Deformable 2D Gaussian Splatting) 的新型视频表示方法，称为 D2GV，旨在实现以下三个核心目标：1) 在提升质量的同时提高效率；2) 增强可扩展性和可解释性；3) 提高对下游任务的友好性。\n具体而言，我们首先将视频序列划分为固定长度的图像组 (GoP, Groups of Pictures)，以支持并行训练，并确保随视频长度线性扩展。对于每个 GoP，D2GV 通过可微光栅化 (Differentiable Rasterization) 处理 2D 高斯基元，并根据时间戳从一个标准空间进行变形表示。值得注意的是，D2GV 依托高效的 CUDA 加速光栅化，实现了快速收敛，并以超过 400 FPS 的解码速度提供与最先进 INR 方法相匹配甚至更优的质量。此外，我们引入了可学习的剪枝与量化策略，以优化 D2GV，使其更加紧凑高效。\n我们在视频插值、修复和去噪等任务上验证了 D2GV 的多功能性，进一步表明其作为新一代视频表示方案的潜力。\n"
  },
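D2GV represents each Group of Pictures by canonical 2D Gaussians deformed to each timestamp. A minimal sketch of one plausible parameterization (the MLP width and the time-conditioning scheme are assumptions, not the paper's architecture):

```python
import torch

class DeformableGaussians2D(torch.nn.Module):
    """Canonical 2D Gaussians shared by one GoP, deformed per timestamp.
    Illustrative parameterization only -- not the authors' exact model."""
    def __init__(self, n=4096):
        super().__init__()
        self.xy = torch.nn.Parameter(torch.rand(n, 2))     # canonical centers
        self.color = torch.nn.Parameter(torch.rand(n, 3))  # per-Gaussian color
        self.deform = torch.nn.Sequential(                 # time-conditioned offsets
            torch.nn.Linear(3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))

    def positions_at(self, t):
        """Deform the canonical centers to normalized GoP time t in [0, 1]."""
        t_col = torch.full((self.xy.shape[0], 1), float(t))
        return self.xy + self.deform(torch.cat([self.xy, t_col], dim=1))

gop = DeformableGaussians2D()
for t in (0.0, 0.5, 1.0):
    p = gop.positions_at(t)
    print(f"t={t}: mean |offset| = {(p - gop.xy).abs().mean().item():.4f}")
```

Because each GoP owns an independent model like this, training parallelizes across GoPs and total cost scales linearly with video length, as the entry describes.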
  {
    "path": "abs/2503.05949.md",
    "content": "### Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting\n\nOpen-set semantic mapping requires (i) determining the correct granularity to represent the scene (e.g., how should objects be defined), and (ii) fusing semantic knowledge across multiple 2D observations into an overall 3D reconstruction -ideally with a high-fidelity yet low-memory footprint. While most related works bypass the first issue by grouping together primitives with similar semantics (according to some manually tuned threshold), we recognize that the object granularity is task-dependent, and develop a task-driven semantic mapping approach. To address the second issue, current practice is to average visual embedding vectors over multiple views. Instead, we show the benefits of using a probabilistic approach based on the properties of the underlying visual-language foundation model, and leveraging Bayesian updating to aggregate multiple observations of the scene. The result is Bayesian Fields, a task-driven and probabilistic approach for open-set semantic mapping. To enable high-fidelity objects and a dense scene representation, Bayesian Fields uses 3D Gaussians which we cluster into task-relevant objects, allowing for both easy 3D object extraction and reduced memory usage.\n\n开放集语义映射 (Open-set Semantic Mapping) 需要解决两个关键问题：(i) 确定适当的粒度来表示场景（例如，如何定义对象）；(ii) 融合来自多个 2D 视角的语义信息，以构建完整的 3D 重建——理想情况下，该重建应具备高保真度 (High-fidelity) 且内存占用低 (Low-memory Footprint)。\n大多数相关研究回避了第一个问题，通常通过手动设定的阈值将具有相似语义的元素聚类为同一类别。然而，我们意识到对象的粒度依赖于具体任务，因此提出了一种任务驱动的语义映射 (Task-driven Semantic Mapping) 方法。\n针对第二个问题，目前主流做法是对多个视角的视觉嵌入向量 (Visual Embedding Vectors) 进行均值计算。不同于这种方法，我们提出了一种基于视觉-语言基础模型 (Visual-Language Foundation Model) 的概率方法，利用贝叶斯更新 (Bayesian Updating) 以更有效地融合场景中的多次观测信息。\n最终，我们提出了 Bayesian Fields，这是一种任务驱动 (Task-driven) 且概率化 (Probabilistic) 的开放集语义映射方法。为了实现高保真对象与高密度场景表示，Bayesian Fields 采用 3D 高斯散点 (3D Gaussians)，并将其聚类为与任务相关的对象，从而兼顾易于提取的 3D 物体表示与更低的内存占用。\n"
  },
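Bayesian Fields replaces the usual averaging of per-view embeddings with Bayesian updating. A sketch assuming a conjugate Gaussian model with a scalar precision per observation (the per-view confidence source is hypothetical; the paper derives its probabilistic model from the foundation model's properties):

```python
import numpy as np

def bayesian_update(mu, prec, obs, obs_prec):
    """Gaussian conjugate update of a per-primitive semantic embedding:
    fuse a new view's embedding `obs` with confidence `obs_prec` into the
    running posterior (mean `mu`, precision `prec`)."""
    new_prec = prec + obs_prec
    new_mu = (prec * mu + obs_prec * obs) / new_prec
    return new_mu, new_prec

rng = np.random.default_rng(2)
true_emb = rng.standard_normal(512)          # latent "true" semantic embedding
mu, prec = np.zeros(512), 1e-6               # near-flat prior
for _ in range(20):                          # twenty posed observations
    conf = rng.uniform(0.2, 2.0)             # hypothetical per-view confidence
    obs = true_emb + rng.standard_normal(512) / np.sqrt(conf)
    mu, prec = bayesian_update(mu, prec, obs, conf)
print("relative posterior error:", np.linalg.norm(mu - true_emb) / np.linalg.norm(true_emb))
```

Unlike a plain mean, the precision-weighted update automatically down-weights unreliable views, which is the advantage the entry claims over embedding averaging.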
  {
    "path": "abs/2503.06118.md",
    "content": "### SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography\n\n3D Gaussian Splatting (3DGS) has emerged as a premier method for 3D representation due to its real-time rendering and high-quality outputs, underscoring the critical need to protect the privacy of 3D assets. Traditional NeRF steganography methods fail to address the explicit nature of 3DGS since its point cloud files are publicly accessible. Existing GS steganography solutions mitigate some issues but still struggle with reduced rendering fidelity, increased computational demands, and security flaws, especially in the security of the geometric structure of the visualized point cloud. To address these demands, we propose a SecureGS, a secure and efficient 3DGS steganography framework inspired by Scaffold-GS's anchor point design and neural decoding. SecureGS uses a hybrid decoupled Gaussian encryption mechanism to embed offsets, scales, rotations, and RGB attributes of the hidden 3D Gaussian points in anchor point features, retrievable only by authorized users through privacy-preserving neural networks. To further enhance security, we propose a density region-aware anchor growing and pruning strategy that adaptively locates optimal hiding regions without exposing hidden information. Extensive experiments show that SecureGS significantly surpasses existing GS steganography methods in rendering fidelity, speed, and security.\n\n3D 高斯散点 (3D Gaussian Splatting, 3DGS) 由于其实时渲染能力和高质量输出，已成为 3D 表示的领先方法，这凸显了保护 3D 资产隐私的重要性。传统的 NeRF 隐写方法无法应对 3DGS 的显式特性，因为其点云文件是公开可访问的。现有的 GS 隐写解决方案在一定程度上缓解了这一问题，但仍然面临渲染质量下降、计算需求增加以及安全漏洞等挑战，特别是在可视化点云几何结构的安全性方面。\n为了解决这些需求，我们提出 SecureGS，一个受 Scaffold-GS 的锚点设计和神经解码启发的安全高效 3DGS 隐写框架。SecureGS 采用 混合解耦高斯加密机制，将隐藏 3D 高斯点的偏移、尺度、旋转和 RGB 属性嵌入锚点特征中，仅授权用户可通过隐私保护的神经网络进行恢复。为了进一步增强安全性，我们提出了一种 密度区域感知的锚点生长与剪枝策略，能够自适应地定位最优隐藏区域，同时避免暴露隐藏信息。\n大量实验表明，SecureGS 在渲染质量、速度和安全性方面均显著优于现有的 GS 隐写方法。\n"
  },
  {
    "path": "abs/2503.06136.md",
    "content": "### GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation\n\nImage-based 3D generation has vast applications in robotics and gaming, where high-quality, diverse outputs and consistent 3D representations are crucial. However, existing methods have limitations: 3D diffusion models are limited by dataset scarcity and the absence of strong pre-trained priors, while 2D diffusion-based approaches struggle with geometric consistency. We propose a method that leverages 2D diffusion models' implicit 3D reasoning ability while ensuring 3D consistency via Gaussian-splatting-based geometric distillation. Specifically, the proposed Gaussian Splatting Decoder enforces 3D consistency by transforming SV3D latent outputs into an explicit 3D representation. Unlike SV3D, which only relies on implicit 2D representations for video generation, Gaussian Splatting explicitly encodes spatial and appearance attributes, enabling multi-view consistency through geometric constraints. These constraints correct view inconsistencies, ensuring robust geometric consistency. As a result, our approach simultaneously generates high-quality, multi-view-consistent images and accurate 3D models, providing a scalable solution for single-image-based 3D generation and bridging the gap between 2D Diffusion diversity and 3D structural coherence. Experimental results demonstrate state-of-the-art multi-view consistency and strong generalization across diverse datasets.\n\n基于图像的 3D 生成在机器人技术和游戏领域具有广泛的应用，其中高质量、多样化的输出以及一致的 3D 表示至关重要。然而，现有方法存在一定局限性：3D 扩散模型 受限于数据集的稀缺性和缺乏强大的预训练先验，而 2D 扩散方法 则难以保证几何一致性。\n为此，我们提出了一种方法，该方法利用 2D 扩散模型的隐式 3D 推理能力，同时通过基于高斯散点的几何蒸馏（Gaussian-Splatting-based Geometric Distillation）确保 3D 一致性。具体而言，所提出的 高斯散点解码器（Gaussian Splatting Decoder） 通过将 SV3D 潜变量输出转换为显式 3D 表示，强制实现 3D 一致性。与仅依赖隐式 2D 表示进行视频生成的 SV3D 不同，高斯散点能够显式编码空间和外观属性，并通过几何约束实现多视角一致性。这些几何约束可校正视角不一致问题，确保稳健的几何一致性。\n最终，我们的方法能够同时生成高质量、多视角一致的图像和精确的 3D 模型，为单图 3D 生成提供了一种可扩展的解决方案，并有效弥合了 2D 扩散模型的多样性 与 3D 结构一致性 之间的鸿沟。实验结果表明，该方法在多视角一致性方面达到了当前最先进水平，并在多种数据集上展现了强大的泛化能力。\n\n"
  },
  {
    "path": "abs/2503.06161.md",
    "content": "### Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction\n\nMinimally invasive surgery (MIS) has transformed clinical practice by reducing recovery times, minimizing complications, and enhancing precision. Nonetheless, MIS inherently relies on indirect visualization and precise instrument control, posing unique challenges. Recent advances in artificial intelligence have enabled real-time surgical scene understanding through techniques such as image classification, object detection, and segmentation, with scene reconstruction emerging as a key element for enhanced intraoperative guidance. Although neural radiance fields (NeRFs) have been explored for this purpose, their substantial data requirements and slow rendering inhibit real-time performance. In contrast, 3D Gaussian Splatting (3DGS) offers a more efficient alternative, achieving state-of-the-art performance in dynamic surgical scene reconstruction. In this work, we introduce Feature-EndoGaussian (FEG), an extension of 3DGS that integrates 2D segmentation cues into 3D rendering to enable real-time semantic and scene reconstruction. By leveraging pretrained segmentation foundation models, FEG incorporates semantic feature distillation within the Gaussian deformation framework, thereby enhancing both reconstruction fidelity and segmentation accuracy. On the EndoNeRF dataset, FEG achieves superior performance (SSIM of 0.97, PSNR of 39.08, and LPIPS of 0.03) compared to leading methods. Additionally, on the EndoVis18 dataset, FEG demonstrates competitive class-wise segmentation metrics while balancing model size and real-time performance.\n\n微创手术（Minimally Invasive Surgery, MIS） 通过缩短恢复时间、减少并发症并提高精确度，已彻底改变了临床实践。然而，MIS 本质上依赖于间接可视化和精确的器械控制，因此面临独特的挑战。近年来，人工智能的进步使得实时手术场景理解成为可能，包括图像分类、目标检测和分割等技术，而场景重建已成为增强术中引导的关键要素。\n尽管 Neural Radiance Fields（NeRFs） 已被用于手术场景重建，但其对大量数据的需求和缓慢的渲染速度阻碍了实时应用。相比之下，3D 高斯散点（3D Gaussian Splatting, 3DGS） 提供了一种更高效的替代方案，在动态手术场景重建方面达到了当前最先进的性能。\n在本研究中，我们提出 Feature-EndoGaussian（FEG），这是一种 3DGS 的扩展方法，将2D 分割信息融入 3D 渲染，从而实现实时语义和场景重建。FEG 通过利用预训练的分割基础模型，在高斯变形框架中引入语义特征蒸馏，从而同时提升重建保真度和分割精度。\n在 EndoNeRF 数据集上，FEG 取得了优越的性能（SSIM 0.97，PSNR 39.08，LPIPS 0.03），超越现有主流方法。此外，在 EndoVis18 数据集上，FEG 在类别级分割指标上表现出竞争力，并在模型规模与实时性能之间实现了良好平衡。\n"
  },
  {
    "path": "abs/2503.06179.md",
    "content": "### ForestSplats: Deformable transient field for Gaussian Splatting in the Wild\n\nRecently, 3D Gaussian Splatting (3D-GS) has emerged, showing real-time rendering speeds and high-quality results in static scenes. Although 3D-GS shows effectiveness in static scenes, their performance significantly degrades in real-world environments due to transient objects, lighting variations, and diverse levels of occlusion. To tackle this, existing methods estimate occluders or transient elements by leveraging pre-trained models or integrating additional transient field pipelines. However, these methods still suffer from two defects: 1) Using semantic features from the Vision Foundation model (VFM) causes additional computational costs. 2) The transient field requires significant memory to handle transient elements with per-view Gaussians and struggles to define clear boundaries for occluders, solely relying on photometric errors. To address these problems, we propose ForestSplats, a novel approach that leverages the deformable transient field and a superpixel-aware mask to efficiently represent transient elements in the 2D scene across unconstrained image collections and effectively decompose static scenes from transient distractors without VFM. We designed the transient field to be deformable, capturing per-view transient elements. Furthermore, we introduce a superpixel-aware mask that clearly defines the boundaries of occluders by considering photometric errors and superpixels. Additionally, we propose uncertainty-aware densification to avoid generating Gaussians within the boundaries of occluders during densification. Through extensive experiments across several benchmark datasets, we demonstrate that ForestSplats outperforms existing methods without VFM and shows significant memory efficiency in representing transient elements.\n\n最近，三维高斯散点（3D Gaussian Splatting，3D-GS）技术兴起，在静态场景中展现了实时渲染速度和高质量的效果。尽管 3D-GS 在静态场景中表现出色，但在真实世界环境下，其性能会因瞬态物体、光照变化和不同程度的遮挡而显著下降。为了解决这一问题，现有方法通常利用预训练模型或集成额外的瞬态场景建模管线来估计遮挡物或瞬态元素。然而，这些方法仍然存在两个主要缺陷：1）依赖视觉基础模型（Vision Foundation Model，VFM）提取语义特征会导致额外的计算开销；2）瞬态场景建模需要大量内存来处理基于视角的高斯表示，并且仅依赖光度误差难以明确划定遮挡物的边界。\n针对这些问题，我们提出 ForestSplats，一种新颖的方法，它结合可变形瞬态场景建模和 超像素感知掩码（superpixel-aware mask），能够高效地在无约束图像集合中表示 2D 场景中的瞬态元素，并在不依赖 VFM 的情况下有效地将静态场景从瞬态干扰中分离。我们设计的瞬态场景建模是可变形的，可以捕捉基于视角的瞬态元素。此外，我们引入了超像素感知掩码，综合考虑光度误差和超像素信息，从而清晰地定义遮挡物的边界。此外，我们提出了不确定性感知加密策略（uncertainty-aware densification），在加密过程中避免在遮挡物边界内生成高斯点。\n通过在多个基准数据集上的广泛实验，我们证明 ForestSplats 在不依赖 VFM 的情况下优于现有方法，并在瞬态元素的表示上展现出显著的内存效率。\n\n"
  },
  {
    "path": "abs/2503.06235.md",
    "content": "### StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams\n\nThe advent of 3D Gaussian Splatting (3DGS) has advanced 3D scene reconstruction and novel view synthesis. With the growing interest of interactive applications that need immediate feedback, online 3DGS reconstruction in real-time is in high demand. However, none of existing methods yet meet the demand due to three main challenges: the absence of predetermined camera parameters, the need for generalizable 3DGS optimization, and the necessity of reducing redundancy. We propose StreamGS, an online generalizable 3DGS reconstruction method for unposed image streams, which progressively transform image streams to 3D Gaussian streams by predicting and aggregating per-frame Gaussians. Our method overcomes the limitation of the initial point reconstruction \\cite{dust3r} in tackling out-of-domain (OOD) issues by introducing a content adaptive refinement. The refinement enhances cross-frame consistency by establishing reliable pixel correspondences between adjacent frames. Such correspondences further aid in merging redundant Gaussians through cross-frame feature aggregation. The density of Gaussians is thereby reduced, empowering online reconstruction by significantly lowering computational and memory costs. Extensive experiments on diverse datasets have demonstrated that StreamGS achieves quality on par with optimization-based approaches but does so 150 times faster, and exhibits superior generalizability in handling OOD scenes.\n\n三维高斯散点（3D Gaussian Splatting，3DGS）的出现推动了三维场景重建和新视角合成的发展。随着对需要即时反馈的交互式应用的关注不断增长，实时在线 3DGS 重建的需求也日益增加。然而，现有方法尚无法满足这一需求，主要受到以下三大挑战的限制：缺乏预设相机参数、需要可泛化的 3DGS 优化，以及减少冗余的必要性。\n为此，我们提出 StreamGS，一种用于无位姿图像流的在线可泛化 3DGS 重建方法，该方法通过预测和聚合逐帧高斯点，逐步将图像流转换为 3D 高斯流。我们的方法克服了初始点重建方法 \\cite{dust3r} 在处理域外（OOD）场景时的局限性，引入了一种内容自适应优化（content adaptive refinement）。该优化方法通过在相邻帧之间建立可靠的像素对应关系来增强跨帧一致性。这种对应关系进一步帮助通过跨帧特征聚合合并冗余高斯点，从而减少高斯点的密度，大幅降低计算和内存开销，使在线重建成为可能。\n在多个不同数据集上的广泛实验表明，StreamGS 在重建质量上可与基于优化的方法相媲美，但速度提高 150 倍，同时在处理 OOD 场景时展现出更强的泛化能力。\n"
  },
  {
    "path": "abs/2503.06271.md",
    "content": "### SplatTalk: 3D VQA with Gaussian Splatting\n\nLanguage-guided 3D scene understanding is important for advancing applications in robotics, AR/VR, and human-computer interaction, enabling models to comprehend and interact with 3D environments through natural language. While 2D vision-language models (VLMs) have achieved remarkable success in 2D VQA tasks, progress in the 3D domain has been significantly slower due to the complexity of 3D data and the high cost of manual annotations. In this work, we introduce SplatTalk, a novel method that uses a generalizable 3D Gaussian Splatting (3DGS) framework to produce 3D tokens suitable for direct input into a pretrained LLM, enabling effective zero-shot 3D visual question answering (3D VQA) for scenes with only posed images. During experiments on multiple benchmarks, our approach outperforms both 3D models trained specifically for the task and previous 2D-LMM-based models utilizing only images (our setting), while achieving competitive performance with state-of-the-art 3D LMMs that additionally utilize 3D inputs.\n\n三维高斯散点（3D Gaussian Splatting，3DGS）的出现推动了三维场景重建和新视角合成的发展。随着对需要即时反馈的交互式应用的关注不断增长，实时在线 3DGS 重建的需求也日益增加。然而，现有方法尚无法满足这一需求，主要受到以下三大挑战的限制：缺乏预设相机参数、需要可泛化的 3DGS 优化，以及减少冗余的必要性。\n为此，我们提出 StreamGS，一种用于无位姿图像流的在线可泛化 3DGS 重建方法，该方法通过预测和聚合逐帧高斯点，逐步将图像流转换为 3D 高斯流。我们的方法克服了初始点重建方法 \\cite{dust3r} 在处理域外（OOD）场景时的局限性，引入了一种内容自适应优化（content adaptive refinement）。该优化方法通过在相邻帧之间建立可靠的像素对应关系来增强跨帧一致性。这种对应关系进一步帮助通过跨帧特征聚合合并冗余高斯点，从而减少高斯点的密度，大幅降低计算和内存开销，使在线重建成为可能。\n在多个不同数据集上的广泛实验表明，StreamGS 在重建质量上可与基于优化的方法相媲美，但速度提高 150 倍，同时在处理 OOD 场景时展现出更强的泛化能力。\n"
  },
  {
    "path": "abs/2503.06462.md",
    "content": "### StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting\n\nRecent advancements in 3D reconstruction coupled with neural rendering techniques have greatly improved the creation of photo-realistic 3D scenes, influencing both academic research and industry applications. The technique of 3D Gaussian Splatting and its variants incorporate the strengths of both primitive-based and volumetric representations, achieving superior rendering quality. While 3D Geometric Scattering (3DGS) and its variants have advanced the field of 3D representation, they fall short in capturing the stochastic properties of non-local structural information during the training process. Additionally, the initialisation of spherical functions in 3DGS-based methods often fails to engage higher-order terms in early training rounds, leading to unnecessary computational overhead as training progresses. Furthermore, current 3DGS-based approaches require training on higher resolution images to render higher resolution outputs, significantly increasing memory demands and prolonging training durations. We introduce StructGS, a framework that enhances 3D Gaussian Splatting (3DGS) for improved novel-view synthesis in 3D reconstruction. StructGS innovatively incorporates a patch-based SSIM loss, dynamic spherical harmonics initialisation and a Multi-scale Residual Network (MSRN) to address the above-mentioned limitations, respectively. Our framework significantly reduces computational redundancy, enhances detail capture and supports high-resolution rendering from low-resolution inputs. Experimentally, StructGS demonstrates superior performance over state-of-the-art (SOTA) models, achieving higher quality and more detailed renderings with fewer artifacts.\n\n近年来，三维重建技术结合神经渲染方法，极大地提升了照片级真实感三维场景的构建能力，对学术研究和工业应用均产生了深远影响。**三维高斯散点（3D Gaussian Splatting, 3DGS）**及其变体结合了基元（primitive-based）表示和体素（volumetric）表示的优势，实现了卓越的渲染质量。\n尽管 3DGS 及其变体 在 三维表示 领域取得了重要进展，但在训练过程中，它们未能充分捕捉非局部结构信息的随机特性。此外，3DGS 方法中球函数（spherical functions）的初始化通常在训练初期无法有效激活高阶项，导致训练过程中不必要的计算开销。此外，当前基于 3DGS 的方法需要在高分辨率图像上训练才能生成高分辨率的输出，这显著增加了内存需求并延长了训练时间。\n为了解决这些问题，我们提出 StructGS，一个针对 三维重建中新视角合成（novel-view synthesis） 进行增强的 3DGS 框架。StructGS 通过块级 SSIM 损失（patch-based SSIM loss）、动态球谐函数初始化（dynamic spherical harmonics initialization）和多尺度残差网络（Multi-scale Residual Network, MSRN），分别解决了上述限制。我们的框架有效降低计算冗余，增强细节捕捉，并支持从低分辨率输入渲染高分辨率输出。\n实验结果表明，StructGS 在多个基准测试中超越**当前最先进（SOTA）**方法，在渲染质量和细节保留方面均表现出色，同时减少了伪影。\n"
  },
  {
    "path": "abs/2503.06587.md",
    "content": "### Introducing Unbiased Depth into 2D Gaussian Splatting for High-accuracy Surface Reconstruction\n\nRecently, 2D Gaussian Splatting (2DGS) has demonstrated superior geometry reconstruction quality than the popular 3DGS by using 2D surfels to approximate thin surfaces. However, it falls short when dealing with glossy surfaces, resulting in visible holes in these areas. We found the reflection discontinuity causes the issue. To fit the jump from diffuse to specular reflection at different viewing angles, depth bias is introduced in the optimized Gaussian primitives. To address that, we first replace the depth distortion loss in 2DGS with a novel depth convergence loss, which imposes a strong constraint on depth continuity. Then, we rectified the depth criterion in determining the actual surface, which fully accounts for all the intersecting Gaussians along the ray. Qualitative and quantitative evaluations across various datasets reveal that our method significantly improves reconstruction quality, with more complete and accurate surfaces than 2DGS.\n\n最近，二维高斯散点 (2DGS) 在几何重建质量上展现出了比流行的 三维高斯散点 (3DGS) 更优越的性能，它通过 二维曲面元素 (2D surfels) 逼近薄表面。然而，在处理光滑表面 (glossy surfaces) 时，2DGS 表现不佳，导致这些区域出现明显的空洞。我们发现，该问题源于反射不连续性 (reflection discontinuity)。为了适应从漫反射 (diffuse) 到镜面反射 (specular) 在不同视角下的突变，2DGS 在优化的高斯基元中引入了深度偏移 (depth bias)，但这并未能彻底解决问题。\n为了解决这一问题，我们提出了两个改进措施。首先，我们用一种 新的深度收敛损失 (depth convergence loss) 替换了 2DGS 现有的 深度畸变损失 (depth distortion loss)，该损失对深度连续性施加了更强的约束。其次，我们修正了用于确定实际表面的深度判定准则，使其充分考虑沿射线方向上所有相交的高斯基元。\n在多个数据集上的定性和定量评估表明，我们的方法显著提升了重建质量，生成的表面比 2DGS 更完整且更准确。\n"
  },
  {
    "path": "abs/2503.06617.md",
    "content": "### Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling\n\nArbitrary-scale super-resolution (ASSR) aims to reconstruct high-resolution (HR) images from low-resolution (LR) inputs with arbitrary upsampling factors using a single model, addressing the limitations of traditional SR methods constrained to fixed-scale factors (\\textit{e.g.}, × 2). Recent advances leveraging implicit neural representation (INR) have achieved great progress by modeling coordinate-to-pixel mappings. However, the efficiency of these methods may suffer from repeated upsampling and decoding, while their reconstruction fidelity and quality are constrained by the intrinsic representational limitations of coordinate-based functions. To address these challenges, we propose a novel ContinuousSR framework with a Pixel-to-Gaussian paradigm, which explicitly reconstructs 2D continuous HR signals from LR images using Gaussian Splatting. This approach eliminates the need for time-consuming upsampling and decoding, enabling extremely fast arbitrary-scale super-resolution. Once the Gaussian field is built in a single pass, ContinuousSR can perform arbitrary-scale rendering in just 1ms per scale. Our method introduces several key innovations. Through statistical analysis, we uncover the Deep Gaussian Prior (DGP) and propose DGP-Driven Covariance Weighting, which dynamically optimizes covariance via adaptive weighting. Additionally, we present Adaptive Position Drifting, which refines the positional distribution of the Gaussian space based on image content, further enhancing reconstruction quality. Extensive experiments on seven benchmarks demonstrate that our ContinuousSR delivers significant improvements in SR quality across all scales, with an impressive 19.5× speedup when continuously upsampling an image across forty scales.\n\n任意尺度超分辨率 (Arbitrary-scale Super-Resolution, ASSR) 旨在通过单一模型从低分辨率 (LR) 输入重建高分辨率 (HR) 图像，并支持任意放大倍数，从而突破传统超分辨率 (SR) 方法受固定缩放因子（\\textit{e.g.}, ×2）限制的问题。近年来，利用隐式神经表示 (Implicit Neural Representation, INR) 的方法通过建模坐标到像素的映射取得了显著进展。然而，这类方法的效率可能受到重复放大和解码过程的影响，同时，其重建保真度和质量也受限于基于坐标函数的固有表示能力。\n为解决这些挑战，我们提出了一种新的 ContinuousSR 框架，基于像素到高斯 (Pixel-to-Gaussian) 方案，通过高斯散点 (Gaussian Splatting) 从 LR 图像显式重建二维连续 HR 信号。这种方法无需耗时的放大和解码过程，实现了极高速的任意尺度超分辨率。一旦建立高斯场（Gaussian field），ContinuousSR 仅需 1ms 即可完成任意尺度渲染。\n我们的方法引入了多个关键创新点。首先，通过统计分析，我们揭示了深度高斯先验 (Deep Gaussian Prior, DGP)，并提出 DGP 驱动的协方差加权 (DGP-Driven Covariance Weighting)，利用自适应加权动态优化协方差。此外，我们提出 自适应位置漂移 (Adaptive Position Drifting)，根据图像内容优化高斯空间中的位置分布，进一步提升重建质量。 在七个基准数据集上的广泛实验表明，ContinuousSR 在所有尺度下均显著提升 SR 质量，并在跨四十个尺度连续放大时实现了 19.5× 的加速。\n"
  },
  {
    "path": "abs/2503.06677.md",
    "content": "### REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints\n\nArticulated objects, as prevalent entities in human life, their 3D representations play crucial roles across various applications. However, achieving both high-fidelity textured surface reconstruction and dynamic generation for articulated objects remains challenging for existing methods. In this paper, we present REArtGS, a novel framework that introduces additional geometric and motion constraints to 3D Gaussian primitives, enabling high-quality textured surface reconstruction and generation for articulated objects. Specifically, given multi-view RGB images of arbitrary two states of articulated objects, we first introduce an unbiased Signed Distance Field (SDF) guidance to regularize Gaussian opacity fields, enhancing geometry constraints and improving surface reconstruction quality. Then we establish deformable fields for 3D Gaussians constrained by the kinematic structures of articulated objects, achieving unsupervised generation of surface meshes in unseen states. Extensive experiments on both synthetic and real datasets demonstrate our approach achieves high-quality textured surface reconstruction for given states, and enables high-fidelity surface generation for unseen states.\n\n关节物体（articulated objects）是人类生活中常见的实体，其三维表示在多个应用领域中起着至关重要的作用。然而，现有方法仍难以同时实现高保真纹理化表面重建和关节物体的动态生成。\n在本文中，我们提出 REArtGS，一个新颖的框架，该框架在三维高斯基元（3D Gaussian primitives）中引入额外的几何和运动约束，从而实现高质量的关节物体纹理化表面重建和生成。具体而言，给定关节物体在两个不同状态下的多视角 RGB 图像，我们首先引入无偏 Signed Distance Field（SDF）引导来正则化高斯的不透明度场，从而增强几何约束并提升表面重建质量。随后，我们基于关节物体的运动学结构，为 3D 高斯建立可变形场（deformable fields），从而在无监督条件下生成未见状态下的表面网格。\n在合成数据集和真实数据集上的广泛实验表明，我们的方法不仅能在给定状态下实现高质量的纹理化表面重建，还能在未见状态下生成高保真的表面。\n"
  },
  {
    "path": "abs/2503.06740.md",
    "content": "### D3DR: Lighting-Aware Object Insertion in Gaussian Splatting\n\nGaussian Splatting has become a popular technique for various 3D Computer Vision tasks, including novel view synthesis, scene reconstruction, and dynamic scene rendering. However, the challenge of natural-looking object insertion, where the object's appearance seamlessly matches the scene, remains unsolved. In this work, we propose a method, dubbed D3DR, for inserting a 3DGS-parametrized object into 3DGS scenes while correcting its lighting, shadows, and other visual artifacts to ensure consistency, a problem that has not been successfully addressed before. We leverage advances in diffusion models, which, trained on real-world data, implicitly understand correct scene lighting. After inserting the object, we optimize a diffusion-based Delta Denoising Score (DDS)-inspired objective to adjust its 3D Gaussian parameters for proper lighting correction. Utilizing diffusion model personalization techniques to improve optimization quality, our approach ensures seamless object insertion and natural appearance. Finally, we demonstrate the method's effectiveness by comparing it to existing approaches, achieving 0.5 PSNR and 0.15 SSIM improvements in relighting quality.\n\n高斯散点（Gaussian Splatting, 3DGS） 已成为众多三维计算机视觉任务中的主流技术，包括新视角合成（novel view synthesis）、场景重建和动态场景渲染。然而，在自然逼真的对象插入任务中，使得插入对象的外观无缝匹配场景的挑战仍未得到解决。\n在本工作中，我们提出 D3DR，一种在 3DGS 场景中插入 3DGS 参数化对象的方法，并校正其光照、阴影及其他视觉伪影，确保视觉一致性——这一问题此前尚未得到成功解决。我们利用扩散模型（diffusion models）的最新进展，这些模型在真实世界数据上训练，能够隐式理解正确的场景光照。在对象插入后，我们优化基于扩散的 Delta Denoising Score（DDS）损失，以调整其 3D Gaussian 参数 进行光照校正。此外，我们结合扩散模型个性化（personalization）技术以提升优化质量，从而实现无缝的对象插入和自然的外观匹配。\n最终，我们通过与现有方法的比较验证了该方法的有效性，在**重光照质量（relighting quality）**方面实现了 PSNR 提升 0.5，SSIM 提升 0.15，显著提升了插入对象的视觉一致性。\n"
  },
  {
    "path": "abs/2503.06744.md",
    "content": "### CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving\n\nDynamic scene rendering opens new avenues in autonomous driving by enabling closed-loop simulations with photorealistic data, which is crucial for validating end-to-end algorithms. However, the complex and highly dynamic nature of traffic environments presents significant challenges in accurately rendering these scenes. In this paper, we introduce a novel 4D Gaussian Splatting (4DGS) approach, which incorporates context and temporal deformation awareness to improve dynamic scene rendering. Specifically, we employ a 2D semantic segmentation foundation model to self-supervise the 4D semantic features of Gaussians, ensuring meaningful contextual embedding. Simultaneously, we track the temporal deformation of each Gaussian across adjacent frames. By aggregating and encoding both semantic and temporal deformation features, each Gaussian is equipped with cues for potential deformation compensation within 3D space, facilitating a more precise representation of dynamic scenes. Experimental results show that our method improves 4DGS's ability to capture fine details in dynamic scene rendering for autonomous driving and outperforms other self-supervised methods in 4D reconstruction and novel view synthesis. Furthermore, CoDa-4DGS deforms semantic features with each Gaussian, enabling broader applications.\n\n动态场景渲染在自动驾驶领域开辟了新的研究方向，使得基于**光真实数据（photorealistic data）的闭环仿真成为可能，对于端到端算法（end-to-end algorithms）**的验证至关重要。然而，交通环境的高度动态性和复杂性使得准确渲染此类场景极具挑战性。\n在本文中，我们提出了一种新颖的 四维高斯散点（4D Gaussian Splatting, 4DGS） 方法，该方法结合上下文感知和时序变形感知来提升动态场景渲染的质量。具体而言，我们采用二维语义分割基础模型（2D semantic segmentation foundation model）对高斯点的 4D 语义特征进行自监督学习，从而确保语义特征的有效嵌入。同时，我们跟踪每个高斯点在相邻帧间的时序变形。通过聚合并编码语义信息与时序变形特征，每个高斯点都具备时序变形补偿能力，从而在三维空间中更精准地表示动态场景。\n实验结果表明，我们的方法显著提升了 4DGS 在自动驾驶场景中的动态细节捕捉能力，在4D 重建和新视角合成（novel view synthesis）方面优于其他自监督方法。此外，CoDa-4DGS 使得语义特征能够随高斯点一起变形，从而拓展了更广泛的应用场景。\n"
  },
  {
    "path": "abs/2503.06762.md",
    "content": "### Gaussian RBFNet: Gaussian Radial Basis Functions for Fast and Accurate Representation and Reconstruction of Neural Fields\n\nNeural fields such as DeepSDF and Neural Radiance Fields have recently revolutionized novel-view synthesis and 3D reconstruction from RGB images and videos. However, achieving high-quality representation, reconstruction, and rendering requires deep neural networks, which are slow to train and evaluate. Although several acceleration techniques have been proposed, they often trade off speed for memory. Gaussian splatting-based methods, on the other hand, accelerate the rendering time but remain costly in terms of training speed and memory needed to store the parameters of a large number of Gaussians. In this paper, we introduce a novel neural representation that is fast, both at training and inference times, and lightweight. Our key observation is that the neurons used in traditional MLPs perform simple computations (a dot product followed by ReLU activation) and thus one needs to use either wide and deep MLPs or high-resolution and high-dimensional feature grids to parameterize complex nonlinear functions. We show in this paper that by replacing traditional neurons with Radial Basis Function (RBF) kernels, one can achieve highly accurate representation of 2D (RGB images), 3D (geometry), and 5D (radiance fields) signals with just a single layer of such neurons. The representation is highly parallelizable, operates on low-resolution feature grids, and is compact and memory-efficient. We demonstrate that the proposed novel representation can be trained for 3D geometry representation in less than 15 seconds and for novel view synthesis in less than 15 mins. At runtime, it can synthesize novel views at more than 60 fps without sacrificing quality.\n\n神经场（Neural Fields） 技术，如 DeepSDF 和 神经辐射场（Neural Radiance Fields, NeRF），近年来在新视角合成（novel-view synthesis）和三维重建（3D reconstruction）方面取得了突破性进展。然而，要实现高质量的表示、重建和渲染，通常需要依赖深度神经网络，导致训练和推理速度较慢。尽管已有多个加速方法被提出，但它们往往在速度与内存占用之间做出权衡。\n相比之下，基于高斯散点（Gaussian Splatting）的方法加速了渲染过程，但仍然存在训练速度慢和存储大量高斯参数所需的内存开销高的问题。\n在本文中，我们提出了一种新颖的神经表示方法，在训练和推理阶段均具有高效性，同时占用更少的存储。我们发现，传统的 MLP 神经元执行的计算相对简单（点积 + ReLU 激活），因此通常需要宽且深的 MLP 或 高分辨率高维特征网格 来表示复杂的非线性函数。本文表明，将传统神经元替换为径向基函数（Radial Basis Function, RBF）核，可以在单层神经元的情况下实现高精度的 2D（RGB 图像）、3D（几何）、以及 5D（辐射场）信号表示。\n我们的方法高度并行化，可在低分辨率特征网格上运行，并且紧凑且内存高效。实验表明，该方法可在 15 秒内完成 3D 几何训练，在 15 分钟内完成新视角合成训练，并且在运行时能够以 60+ fps 生成新视角，而不会牺牲渲染质量。\n"
  },
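The single-layer RBF idea in the Gaussian RBFNet entry is easy to make concrete: fit a signal with one layer of Gaussian kernels, f(x) = Σ_k w_k exp(-‖x − c_k‖² / 2s_k²). A runnable toy that fits a synthetic 32×32 RGB image (the kernel count, sizes, and learning rate are arbitrary choices, not the paper's settings):

```python
import torch

# Coordinate grid in [0,1]^2 and a smooth synthetic RGB target.
H = W = 32
yy, xx = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
coords = torch.stack([xx, yy], -1).reshape(-1, 2)            # (HW, 2)
target = torch.stack([xx, yy, xx * yy], -1).reshape(-1, 3)   # (HW, 3)

K = 256
centers = torch.nn.Parameter(torch.rand(K, 2))               # RBF centers c_k
log_s = torch.nn.Parameter(torch.full((K,), -2.0))           # log bandwidths s_k
weights = torch.nn.Parameter(torch.zeros(K, 3))              # RGB weights w_k
opt = torch.optim.Adam([centers, log_s, weights], lr=1e-2)

for it in range(500):
    d2 = ((coords[:, None, :] - centers[None]) ** 2).sum(-1)  # (HW, K) sq. distances
    phi = torch.exp(-d2 / (2 * torch.exp(log_s) ** 2))        # single RBF layer
    pred = phi @ weights                                      # linear readout
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final MSE:", loss.item())
```

Each kernel has spatially local support, so the forward pass is one distance computation plus a linear readout, which is why a single such layer can be both fast and parallelizable, as the entry argues.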
  {
    "path": "abs/2503.06859.md",
    "content": "### ActiveInitSplat: How Active Image Selection Helps Gaussian Splatting\n\nGaussian splatting (GS) along with its extensions and variants provides outstanding performance in real-time scene rendering while meeting reduced storage demands and computational efficiency. While the selection of 2D images capturing the scene of interest is crucial for the proper initialization and training of GS, hence markedly affecting the rendering performance, prior works rely on passively and typically densely selected 2D images. In contrast, this paper proposes ActiveInitSplat, a novel framework for active selection of training images for proper initialization and training of GS. ActiveInitSplat relies on density and occupancy criteria of the resultant 3D scene representation from the selected 2D images, to ensure that the latter are captured from diverse viewpoints leading to better scene coverage and that the initialized Gaussian functions are well aligned with the actual 3D structure. Numerical tests on well-known simulated and real environments demonstrate the merits of ActiveInitSplat resulting in significant GS rendering performance improvement over passive GS baselines, in the widely adopted LPIPS, SSIM, and PSNR metrics.\n\n高斯散点（Gaussian Splatting, GS） 及其扩展和变体在实时场景渲染方面表现卓越，同时兼顾存储效率和计算高效性。然而，在 GS 的初始化和训练过程中，场景相关的 2D 图像选择至关重要，这直接影响最终的渲染性能。以往的方法主要依赖于被动选择，通常会采用密集采样的 2D 图像，但这一策略未能充分优化训练数据的质量和多样性。\n相较之下，本文提出 ActiveInitSplat，一种用于 GS 主动训练图像选择（active selection of training images） 的新框架。ActiveInitSplat 依据 已选 2D 图像所生成的 3D 场景表示的密度和占用率（density and occupancy criteria），确保这些图像来自多样化视角，从而优化场景覆盖范围，并使初始化的高斯函数更精准地对齐真实的 3D 结构。\n在多个模拟和真实环境上的数值测试表明，ActiveInitSplat 相较于传统的被动 GS 方案，在LPIPS、SSIM 和 PSNR 等广泛采用的评测指标上均实现了显著的渲染性能提升。\n"
  },
  {
    "path": "abs/2503.06900.md",
    "content": "### DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation\n\nWe present DirectTriGS, a novel framework designed for 3D object generation with Gaussian Splatting (GS). GS-based rendering for 3D content has gained considerable attention recently. However, there has been limited exploration in directly generating 3D Gaussians compared to traditional generative modeling approaches. The main challenge lies in the complex data structure of GS represented by discrete point clouds with multiple channels. To overcome this challenge, we propose employing the triplane representation, which allows us to represent Gaussian Splatting as an image-like continuous field. This representation effectively encodes both the geometry and texture information, enabling smooth transformation back to Gaussian point clouds and rendering into images by a TriRenderer, with only 2D supervisions. The proposed TriRenderer is fully differentiable, so that the rendering loss can supervise both texture and geometry encoding. Furthermore, the triplane representation can be compressed using a Variational Autoencoder (VAE), which can subsequently be utilized in latent diffusion to generate 3D objects. The experiments demonstrate that the proposed generation framework can produce high-quality 3D object geometry and rendering results in the text-to-3D task.\n\n我们提出 DirectTriGS，一个用于三维对象生成（3D object generation）的高斯散点（Gaussian Splatting, GS）新框架。近年来，基于 GS 的 3D 内容渲染 受到了广泛关注。然而，相较于传统的生成建模（generative modeling）方法，直接生成 3D 高斯点的研究仍然较少。其主要挑战在于 GS 采用的复杂数据结构，即由多个通道组成的离散点云，使得直接生成和优化变得困难。\n为了解决这一问题，我们提出采用 三平面（triplane）表示，将 Gaussian Splatting 转换为类图像的连续场（image-like continuous field）。这一表示方式能够有效编码几何与纹理信息，并且可以平滑地转换回高斯点云，通过 TriRenderer 渲染为图像，仅需 2D 监督。我们设计的 TriRenderer 是完全可微的，使得渲染损失能够同时监督纹理和几何编码，确保生成高质量的 3D 结构。\n此外，我们利用 变分自编码器（VAE） 对三平面表示进行压缩，并进一步在潜在扩散模型（latent diffusion）中进行三维对象生成。实验结果表明，该框架能够在 文本生成 3D（text-to-3D） 任务中生成高质量的 3D 物体几何和渲染结果，展示了 DirectTriGS 在三维生成任务中的卓越表现。\n\n"
  },
  {
    "path": "abs/2503.07000.md",
    "content": "### Frequency-Aware Density Control via Reparameterization for High-Quality Rendering of 3D Gaussian Splatting\n\nBy adaptively controlling the density and generating more Gaussians in regions with high-frequency information, 3D Gaussian Splatting (3DGS) can better represent scene details. From the signal processing perspective, representing details usually needs more Gaussians with relatively smaller scales. However, 3DGS currently lacks an explicit constraint linking the density and scale of 3D Gaussians across the domain, leading to 3DGS using improper-scale Gaussians to express frequency information, resulting in the loss of accuracy. In this paper, we propose to establish a direct relation between density and scale through the reparameterization of the scaling parameters and ensure the consistency between them via explicit constraints (i.e., density responds well to changes in frequency). Furthermore, we develop a frequency-aware density control strategy, consisting of densification and deletion, to improve representation quality with fewer Gaussians. A dynamic threshold encourages densification in high-frequency regions, while a scale-based filter deletes Gaussians with improper scale. Experimental results on various datasets demonstrate that our method outperforms existing state-of-the-art methods quantitatively and qualitatively.\n\n通过自适应控制密度并在高频信息区域生成更多高斯点，三维高斯散点（3D Gaussian Splatting, 3DGS）能够更精细地表示场景细节。从信号处理的角度来看，表示高频细节通常需要更多的高斯点，且其尺度相对较小。然而，当前 3DGS 缺乏一个明确的约束来关联三维高斯点的密度与尺度，导致 3DGS 在整个空间域中采用不适当尺度的高斯点来表达频率信息，从而影响表示精度。\n为了解决这一问题，本文提出通过缩放参数的重参数化（reparameterization of the scaling parameters）建立密度与尺度的直接关系，并引入显式约束以确保它们之间的一致性（即密度能够准确响应频率变化）。此外，我们设计了一种频率感知的密度控制策略（frequency-aware density control strategy），包括加密（densification）和删除（deletion）机制，以在减少高斯点数量的同时提升表示质量。具体而言，我们引入动态阈值以在高频区域促进加密，同时采用**基于尺度的滤波（scale-based filter）**来删除尺度不当的高斯点。\n在多个数据集上的实验结果表明，我们的方法在定量和定性评测上均超越了现有的最先进方法（state-of-the-art, SOTA），有效提升了 3DGS 的表示精度和渲染质量。\n"
  },
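A minimal sketch of the densify/delete logic this abstract describes, assuming a per-Gaussian view-space gradient norm and a local frequency estimate are available. The dynamic threshold and the scale filter are hypothetical rules written to match the description, not the paper's exact formulation.

```python
import torch

def density_control(grads, scales, local_freq, base_tau=2e-4, eps=1e-8):
    """Hypothetical frequency-aware densification/deletion masks.

    grads:      (N,) accumulated view-space positional gradient norms.
    scales:     (N,) per-Gaussian scale (e.g. mean of the 3 axes).
    local_freq: (N,) estimate of local image frequency; higher = more detail.
    """
    # Dynamic threshold: densify more aggressively in high-frequency regions.
    tau = base_tau / (1.0 + local_freq)
    densify_mask = grads > tau

    # Scale-based filter: treat a Gaussian as "improper" if its scale is far
    # from the scale the local frequency calls for (roughly 1 / frequency).
    target_scale = 1.0 / (local_freq + eps)
    delete_mask = (scales > 4.0 * target_scale) | (scales < 0.25 * target_scale)
    return densify_mask, delete_mask
```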
  {
    "path": "abs/2503.07191.md",
    "content": "### All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting\n\nRecent advances in 3D Gaussian Splatting (3DGS) have revolutionized scene reconstruction, opening new possibilities for 3D steganography by hiding 3D secrets within 3D covers. The key challenge in steganography is ensuring imperceptibility while maintaining high-fidelity reconstruction. However, existing methods often suffer from detectability risks and utilize only suboptimal 3DGS features, limiting their full potential. We propose a novel end-to-end key-secured 3D steganography framework (KeySS) that jointly optimizes a 3DGS model and a key-secured decoder for secret reconstruction. Our approach reveals that Gaussian features contribute unequally to secret hiding. The framework incorporates a key-controllable mechanism enabling multi-secret hiding and unauthorized access prevention, while systematically exploring optimal feature update to balance fidelity and security. To rigorously evaluate steganographic imperceptibility beyond conventional 2D metrics, we introduce 3D-Sinkhorn distance analysis, which quantifies distributional differences between original and steganographic Gaussian parameters in the representation space. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both cover and secret reconstruction while maintaining high security levels, advancing the field of 3D steganography.\n\n三维高斯散点（3D Gaussian Splatting, 3DGS）的最新进展革新了场景重建，并为三维信息隐藏（3D 隐写术）开辟了新的可能性，使得在 3D 载体（cover） 中隐藏 3D 机密（secret） 成为现实。隐写术的核心挑战在于确保隐蔽性（imperceptibility），同时保持高保真的重建质量。然而，现有方法通常面临可检测性风险，且仅利用了次优的 3DGS 特征，未能充分发挥其潜力。\n为此，我们提出 KeySS，一种端到端密钥保护的三维隐写框架（Key-Secured 3D Steganography Framework），该框架联合优化 3DGS 模型与密钥保护解码器以进行机密重建。我们的研究揭示了高斯特征在隐写过程中的贡献是不均衡的。为此，我们引入密钥可控机制（key-controllable mechanism），支持多机密嵌入，同时防止未经授权访问。此外，我们系统性地探索最优的特征更新策略，以在**保真度（fidelity）和安全性（security）**之间取得最佳平衡。\n为了超越传统的 2D 评估指标，我们提出 3D-Sinkhorn 距离分析，用于量化原始高斯参数与隐写高斯参数在表示空间中的分布差异，从而更严格地评估隐写的不可察觉性（imperceptibility）。\n大量实验表明，KeySS 在 3D 载体和 3D 机密的重建质量方面达到了最先进（SOTA）水平，同时保持了极高的安全性，推动了3D 隐写技术的发展。\n"
  },
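The 3D-Sinkhorn analysis compares the distributions of original and steganographic Gaussian parameters. A generic entropy-regularized Sinkhorn distance between two parameter sets (the standard algorithm, not the authors' implementation) can be written as follows.

```python
import numpy as np

def sinkhorn_distance(X, Y, reg=0.1, n_iters=200):
    """Entropy-regularized OT distance between two point sets.

    X: (n, d), Y: (m, d) rows of Gaussian parameters
    (e.g. concatenated position/scale/rotation/opacity per Gaussian).
    """
    n, m = len(X), len(Y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared-L2 cost
    K = np.exp(-C / reg)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                 # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]          # transport plan
    return float((P * C).sum())

cover = np.random.randn(500, 11)
stego = cover + 0.01 * np.random.randn(500, 11)
print(sinkhorn_distance(cover, stego))       # near 0 for imperceptible edits
```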
  {
    "path": "abs/2503.07446.md",
    "content": "### EigenGS Representation: From Eigenspace to Gaussian Image Space\n\nPrincipal Component Analysis (PCA), a classical dimensionality reduction technique, and 2D Gaussian representation, an adaptation of 3D Gaussian Splatting for image representation, offer distinct approaches to modeling visual data. We present EigenGS, a novel method that bridges these paradigms through an efficient transformation pipeline connecting eigenspace and image-space Gaussian representations. Our approach enables instant initialization of Gaussian parameters for new images without requiring per-image optimization from scratch, dramatically accelerating convergence. EigenGS introduces a frequency-aware learning mechanism that encourages Gaussians to adapt to different scales, effectively modeling varied spatial frequencies and preventing artifacts in high-resolution reconstruction. Extensive experiments demonstrate that EigenGS not only achieves superior reconstruction quality compared to direct 2D Gaussian fitting but also reduces necessary parameter count and training time. The results highlight EigenGS's effectiveness and generalization ability across images with varying resolutions and diverse categories, making Gaussian-based image representation both high-quality and viable for real-time applications.\n\n主成分分析（Principal Component Analysis, PCA）是一种经典的降维技术，而 二维高斯表示（2D Gaussian representation） 是 三维高斯散点（3D Gaussian Splatting, 3DGS） 在图像表示中的应用，它们在视觉数据建模方面提供了不同的方法。\n在本文中，我们提出 EigenGS，一种连接特征空间（eigenspace）与图像空间高斯表示的新方法。EigenGS 通过高效的变换管道（transformation pipeline），使得 高斯参数可以即时初始化，无需对每张新图像从零开始优化，从而显著加速收敛。\nEigenGS 引入了一种频率感知学习机制（frequency-aware learning mechanism），使得高斯点能够自适应不同尺度，有效建模不同的空间频率（spatial frequencies），并防止高分辨率重建中的伪影。\n广泛的实验表明，EigenGS 在重建质量上优于直接的 2D 高斯拟合，同时减少了所需参数数量并缩短训练时间。实验结果进一步验证了 EigenGS 在不同分辨率和多种类别图像上的泛化能力，使基于高斯的图像表示既能保证高质量，又适用于实时应用（real-time applications）。\n"
  },
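The instant initialization amounts to projecting a new image onto a PCA eigenbasis and reusing the coefficients as a linear combination of per-eigen-image Gaussian parameters. In this sketch the `eigen_colors` transfer is an assumed stand-in for the paper's learned mapping, and the image sizes are toy values.

```python
import numpy as np

# Offline: PCA over a set of training images (flattened to vectors).
imgs = np.random.rand(200, 64 * 64)          # toy stand-in for real images
mean = imgs.mean(0)
U, S, Vt = np.linalg.svd(imgs - mean, full_matrices=False)
k = 16
eigen_images = Vt[:k]                        # (k, P) eigenbasis rows

# Assume each eigen-image was fitted once by 2D Gaussians whose colors
# live in eigen_colors: (k, G). This mapping is hypothetical here.
eigen_colors = np.random.randn(k, 4096)

def init_gaussians(new_img):
    """Project a new image on the eigenbasis, then transfer coefficients."""
    coeffs = eigen_images @ (new_img - mean)  # (k,) eigenspace coordinates
    colors0 = coeffs @ eigen_colors           # instant per-Gaussian colors
    return coeffs, colors0                    # starting point, refined later

coeffs, colors0 = init_gaussians(np.random.rand(64 * 64))
```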
  {
    "path": "abs/2503.07476.md",
    "content": "### SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting\n\nAnchor-based 3D Gaussian splatting (3D-GS) exploits anchor features in 3D Gaussian prediction, which has achieved impressive 3D rendering quality with reduced Gaussian redundancy. On the other hand, it often encounters the dilemma among anchor features, model size, and rendering quality - large anchor features lead to large 3D models and high-quality rendering whereas reducing anchor features degrades Gaussian attribute prediction which leads to clear artifacts in the rendered textures and geometries. We design SOGS, an anchor-based 3D-GS technique that introduces second-order anchors to achieve superior rendering quality and reduced anchor features and model size simultaneously. Specifically, SOGS incorporates covariance-based second-order statistics and correlation across feature dimensions to augment features within each anchor, compensating for the reduced feature size and improving rendering quality effectively. In addition, it introduces a selective gradient loss to enhance the optimization of scene textures and scene geometries, leading to high-quality rendering with small anchor features. Extensive experiments over multiple widely adopted benchmarks show that SOGS achieves superior rendering quality in novel view synthesis with clearly reduced model size.\n\n基于锚点（anchor-based）的三维高斯散点（3D Gaussian Splatting, 3D-GS） 通过在 3D 高斯预测 中利用锚点特征，在减少高斯冗余的同时，实现了高质量的 3D 渲染。然而，该方法常面临锚点特征、模型大小和渲染质量之间的权衡问题——更大的锚点特征可以提升渲染质量，但会导致模型膨胀；而减少锚点特征则会削弱高斯属性预测能力，导致纹理和几何渲染伪影的产生。\n为此，我们设计 SOGS，一种基于锚点的 3D-GS 技术，引入二阶锚点（second-order anchors），在减少锚点特征和模型尺寸的同时，实现更高质量的渲染。具体而言，SOGS 结合基于协方差的二阶统计信息以及特征维度间的相关性，增强锚点内部的特征表达能力，从而在减少特征尺寸的同时，有效提升渲染质量。此外，SOGS 还提出了一种选择性梯度损失（selective gradient loss），专门优化场景纹理和几何信息，确保即使在较小锚点特征的情况下，也能保持高质量的渲染效果。\n在多个广泛使用的基准测试上的实验结果表明，SOGS 在新视角合成（novel view synthesis）任务中，实现了更优的渲染质量，同时显著降低了模型尺寸，展现出卓越的性能优势。\n"
  },
  {
    "path": "abs/2503.07819.md",
    "content": "### POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality\n\nIn this paper, we present a novel algorithm for quantifying uncertainty and information gained within 3D Gaussian Splatting (3D-GS) through P-Optimality. While 3D-GS has proven to be a useful world model with high-quality rasterizations, it does not natively quantify uncertainty. Quantifying uncertainty in parameters of 3D-GS is necessary to understand the information gained from acquiring new images as in active perception, or identify redundant images which can be removed from memory due to resource constraints in online 3D-GS SLAM. We propose to quantify uncertainty and information gain in 3D-GS by reformulating the problem through the lens of optimal experimental design, which is a classical solution to measuring information gain. By restructuring information quantification of 3D-GS through optimal experimental design, we arrive at multiple solutions, of which T-Optimality and D-Optimality perform the best quantitatively and qualitatively as measured on two popular datasets. Additionally, we propose a block diagonal approximation of the 3D-GS uncertainty, which provides a measure of correlation for computing more accurate information gain, at the expense of a greater computation cost.\n\n在本文中，我们提出了一种新颖的不确定性量化算法，用于在三维高斯散点（3D Gaussian Splatting, 3D-GS）中评估不确定性和信息增益，基于 P-Optimality 原则。尽管 3D-GS 已被证明是一种高质量的世界建模方法，并可实现精细的光栅化（rasterization），但其本身并不原生支持不确定性量化。\n在 3D-GS 参数中量化不确定性是至关重要的，它不仅有助于在主动感知（active perception）中评估新增图像带来的信息增益，还能用于在线 3D-GS SLAM 中识别冗余图像，以减少内存占用并优化资源管理。\n为此，我们提出通过最优实验设计（optimal experimental design）的视角重新构造 3D-GS 的信息量化问题，这一方法是经典的信息增益测量解决方案。基于此，我们得到了多个可能的最优性准则，其中 T-Optimality 和 D-Optimality 在定量和定性评测上表现最佳（在两个流行数据集上进行了实验验证）。\n此外，我们提出了一种块对角近似（block diagonal approximation）方法，用于估计 3D-GS 不确定性，它可以提供更精确的信息增益计算，同时引入一定计算成本。\n"
  },
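T- and D-optimality are standard functionals of the Fisher information: given a Jacobian J of rendered pixels with respect to (a block of) Gaussian parameters, they reduce to the trace and the log-determinant of J^T J. A generic sketch of scoring candidate views this way, not the paper's code; POp-GS's block-diagonal variant would assemble the matrix per-Gaussian block instead of densely.

```python
import numpy as np

def information_scores(J, sigma=1.0, eps=1e-6):
    """Optimal-experimental-design scores for one candidate view.

    J: (n_pixels, n_params) Jacobian of the rendering wrt 3D-GS parameters.
    Returns (T-optimality, D-optimality) of the Fisher information J^T J.
    """
    fim = J.T @ J / sigma**2
    t_opt = np.trace(fim)                                 # T-optimality
    _, logdet = np.linalg.slogdet(fim + eps * np.eye(fim.shape[0]))
    return t_opt, logdet                                  # D-opt = log det

# Next-best-view selection: pick the view maximizing the chosen criterion.
candidates = [np.random.randn(4096, 64) for _ in range(5)]
best = max(range(5), key=lambda i: information_scores(candidates[i])[1])
```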
  {
    "path": "abs/2503.07946.md",
    "content": "### 7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting\n\nReal-time rendering of dynamic scenes with view-dependent effects remains a fundamental challenge in computer graphics. While recent advances in Gaussian Splatting have shown promising results separately handling dynamic scenes (4DGS) and view-dependent effects (6DGS), no existing method unifies these capabilities while maintaining real-time performance. We present 7D Gaussian Splatting (7DGS), a unified framework representing scene elements as seven-dimensional Gaussians spanning position (3D), time (1D), and viewing direction (3D). Our key contribution is an efficient conditional slicing mechanism that transforms 7D Gaussians into view- and time-conditioned 3D Gaussians, maintaining compatibility with existing 3D Gaussian Splatting pipelines while enabling joint optimization. Experiments demonstrate that 7DGS outperforms prior methods by up to 7.36 dB in PSNR while achieving real-time rendering (401 FPS) on challenging dynamic scenes with complex view-dependent effects.\n\n实时渲染具有视角相关效应的动态场景仍然是计算机图形学中的一项基本挑战。尽管高斯投影（Gaussian Splatting）技术的最新进展分别在处理动态场景（4DGS）和视角相关效应（6DGS）方面取得了显著成果，但目前尚无方法能够在保持实时性能的同时统一这两种能力。我们提出 7D 高斯投影（7D Gaussian Splatting, 7DGS），这是一种统一框架，将场景元素表示为七维高斯，涵盖 位置（3D）、时间（1D）和视角方向（3D）。我们的核心贡献是一种高效的 条件切片机制（conditional slicing mechanism），能够将 7D 高斯转换为 视角-时间条件化的 3D 高斯，从而在兼容现有 3D 高斯投影流水线的同时，实现联合优化。实验结果表明，7DGS 在 PSNR 方面比现有方法最高提升 7.36 dB，同时在具有复杂视角相关效应的动态场景中实现了实时渲染（401 FPS）。\n"
  },
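The conditional slicing mechanism is, at its core, textbook conditioning of a joint Gaussian: fixing the time and viewing-direction coordinates of a 7D Gaussian yields a 3D Gaussian with closed-form mean and covariance, plus a marginal density that can modulate opacity. A sketch of that math; the opacity modulation is an assumption in the spirit of 6DGS-style slicing, not the paper's exact pipeline.

```python
import numpy as np

def slice_7d_gaussian(mu, Sigma, t, d):
    """Condition a 7D Gaussian (position | time, direction) on (t, d).

    mu: (7,) mean; Sigma: (7, 7) covariance. Indices 0..2 = position,
    3 = time, 4..6 = viewing direction. Returns a 3D Gaussian + weight.
    """
    c = np.concatenate([[t], d])                     # (4,) conditioning vars
    mu_p, mu_c = mu[:3], mu[3:]
    S_pp, S_pc, S_cc = Sigma[:3, :3], Sigma[:3, 3:], Sigma[3:, 3:]
    S_cc_inv = np.linalg.inv(S_cc)
    mu_cond = mu_p + S_pc @ S_cc_inv @ (c - mu_c)    # conditional mean
    S_cond = S_pp - S_pc @ S_cc_inv @ S_pc.T         # conditional covariance
    diff = c - mu_c                                  # marginal density at (t, d)
    weight = np.exp(-0.5 * diff @ S_cc_inv @ diff)   # can modulate opacity
    return mu_cond, S_cond, weight

A = np.random.randn(7, 7)
Sigma = A @ A.T + 7 * np.eye(7)                      # valid PSD covariance
mu3, S3, w = slice_7d_gaussian(np.random.randn(7), Sigma,
                               t=0.5, d=np.array([0.0, 0.0, 1.0]))
```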
  {
    "path": "abs/2503.08071.md",
    "content": "### GigaSLAM: Large-Scale Monocular SLAM with Hierachical Gaussian Splats\n\nTracking and mapping in large-scale, unbounded outdoor environments using only monocular RGB input presents substantial challenges for existing SLAM systems. Traditional Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) SLAM methods are typically limited to small, bounded indoor settings. To overcome these challenges, we introduce GigaSLAM, the first NeRF/3DGS-based SLAM framework for kilometer-scale outdoor environments, as demonstrated on the KITTI and KITTI 360 datasets. Our approach employs a hierarchical sparse voxel map representation, where Gaussians are decoded by neural networks at multiple levels of detail. This design enables efficient, scalable mapping and high-fidelity viewpoint rendering across expansive, unbounded scenes. For front-end tracking, GigaSLAM utilizes a metric depth model combined with epipolar geometry and PnP algorithms to accurately estimate poses, while incorporating a Bag-of-Words-based loop closure mechanism to maintain robust alignment over long trajectories. Consequently, GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks, establishing a robust SLAM solution for large-scale, long-term scenarios, and significantly extending the applicability of Gaussian Splatting SLAM systems to unbounded outdoor environments.\n\n在 大规模、无限制的室外环境 中，仅使用单目 RGB 输入进行跟踪与建图对现有 SLAM 系统构成了重大挑战。传统的 神经辐射场（NeRF） 和 三维高斯投影（3D Gaussian Splatting, 3DGS） SLAM 方法通常受限于 小规模、有界的室内环境。为了解决这些问题，我们提出 GigaSLAM，这是首个基于 NeRF/3DGS 的千米级室外 SLAM 框架，并在 KITTI 和 KITTI 360 数据集 上进行了验证。\n我们的方法采用 分层稀疏体素地图表示（hierarchical sparse voxel map representation），其中 高斯（Gaussians） 由神经网络以 多层次细节（multiple levels of detail） 进行解码。这一设计使得在 广阔的无限制场景中 进行 高效、可扩展的建图 和 高保真视角渲染 成为可能。\n在 前端跟踪（front-end tracking） 方面，GigaSLAM 结合 度量深度模型（metric depth model）、极线几何（epipolar geometry） 和 PnP 算法，精确估计相机位姿。此外，我们集成了 基于词袋（Bag-of-Words）的回环检测机制（loop closure mechanism），确保在长轨迹场景中的稳健对齐。\n最终，GigaSLAM 在城市室外基准测试上实现了高精度跟踪和视觉逼真的渲染，构建了一种 适用于大规模、长期场景的稳健 SLAM 解决方案，显著扩展了 高斯投影 SLAM 系统在无限制室外环境中的适用性。\n"
  },
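The front-end recipe of lifting keypoints with a metric depth model and solving PnP maps onto standard OpenCV calls. The matcher and depth network are stand-ins here; only the back-projection and RANSAC-PnP step below mirrors what the abstract describes.

```python
import cv2
import numpy as np

def estimate_pose(kps_a, kps_b, depth_a, K):
    """Pose of frame B from 2D-2D matches and metric depth of frame A.

    kps_a, kps_b: (N, 2) matched pixel coordinates in frames A and B.
    depth_a:      (H, W) metric depth predicted for frame A.
    K:            (3, 3) camera intrinsics.
    """
    z = depth_a[kps_a[:, 1].astype(int), kps_a[:, 0].astype(int)]
    x = (kps_a[:, 0] - K[0, 2]) * z / K[0, 0]        # back-project to 3D
    y = (kps_a[:, 1] - K[1, 2]) * z / K[1, 1]
    pts3d = np.stack([x, y, z], axis=-1).astype(np.float64)

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, kps_b.astype(np.float64), K, None,
        reprojectionError=2.0, iterationsCount=100)
    R, _ = cv2.Rodrigues(rvec)                       # rotation of frame B
    return ok, R, tvec, inliers
```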
  {
    "path": "abs/2503.08093.md",
    "content": "### MVGSR: Multi-View Consistency Gaussian Splatting for Robust Surface Reconstruction\n\n3D Gaussian Splatting (3DGS) has gained significant attention for its high-quality rendering capabilities, ultra-fast training, and inference speeds. However, when we apply 3DGS to surface reconstruction tasks, especially in environments with dynamic objects and distractors, the method suffers from floating artifacts and color errors due to inconsistency from different viewpoints. To address this challenge, we propose Multi-View Consistency Gaussian Splatting for the domain of Robust Surface Reconstruction (\\textbf{MVGSR}), which takes advantage of lightweight Gaussian models and a {heuristics-guided distractor masking} strategy for robust surface reconstruction in non-static environments. Compared to existing methods that rely on MLPs for distractor segmentation strategies, our approach separates distractors from static scene elements by comparing multi-view feature consistency, allowing us to obtain precise distractor masks early in training. Furthermore, we introduce a pruning measure based on multi-view contributions to reset transmittance, effectively reducing floating artifacts. Finally, a multi-view consistency loss is applied to achieve high-quality performance in surface reconstruction tasks. Experimental results demonstrate that MVGSR achieves competitive geometric accuracy and rendering fidelity compared to the state-of-the-art surface reconstruction algorithms.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS） 以其高质量渲染能力、极快的训练和推理速度而备受关注。然而，当 3DGS 应用于 表面重建 任务，特别是在包含动态物体和干扰因素的环境中时，由于不同视角间的不一致性，该方法容易出现 漂浮伪影（floating artifacts） 和 颜色误差（color errors）。\n为了解决这一挑战，我们提出 多视图一致性高斯投影（Multi-View Consistency Gaussian Splatting, MVGSR），面向 鲁棒表面重建（Robust Surface Reconstruction） 领域。MVGSR 结合轻量级高斯模型 和 启发式引导的干扰物遮罩（heuristics-guided distractor masking） 策略，实现了在 非静态环境 下的鲁棒表面重建。\n与现有 依赖 MLP 进行干扰物分割 的方法不同，我们的方法通过 多视图特征一致性 分离干扰物和静态场景元素，使得在 训练早期 就能获得 精确的干扰物遮罩（distractor masks）。此外，我们提出了一种 基于多视图贡献的剪枝（pruning）措施 来 重置透射率（reset transmittance），有效减少漂浮伪影。最后，我们引入 多视图一致性损失（multi-view consistency loss） 以提升表面重建的质量。\n实验结果表明，MVGSR 在 几何精度（geometric accuracy） 和 渲染保真度（rendering fidelity） 方面达到了 与最新表面重建算法相媲美的水平。\n"
  },
  {
    "path": "abs/2503.08135.md",
    "content": "### ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting\n\nWe tackle the challenge of concurrent reconstruction at the part level with the RGB appearance and estimation of motion parameters for building digital twins of articulated objects using the 3D Gaussian Splatting (3D-GS) method. With two distinct sets of multi-view imagery, each depicting an object in separate static articulation configurations, we reconstruct the articulated object in 3D Gaussian representations with both appearance and geometry information at the same time. Our approach decoupled multiple highly interdependent parameters through a multi-step optimization process, thereby achieving a stable optimization procedure and high-quality outcomes. We introduce ArticulatedGS, a self-supervised, comprehensive framework that autonomously learns to model shapes and appearances at the part level and synchronizes the optimization of motion parameters, all without reliance on 3D supervision, motion cues, or semantic labels. Our experimental results demonstrate that, among comparable methodologies, our approach has achieved optimal outcomes in terms of part segmentation accuracy, motion estimation accuracy, and visual quality.\n\n我们针对 构建可动对象（articulated objects）数字孪生 的需求，采用 三维高斯投影（3D Gaussian Splatting, 3D-GS） 方法，解决 基于 RGB 视觉信息的部件级别重建 及 运动参数估计 这一挑战。在我们的设定中，输入为 两组多视角图像，分别捕捉了对象在 不同静态关节配置 下的状态，我们的目标是在 三维高斯表示 中 同时重建对象的外观和几何信息。\n我们的方法通过 多步优化过程（multi-step optimization process） 解耦多个高度相关的参数，从而实现 稳定的优化流程 并获得 高质量的重建结果。为此，我们提出 ArticulatedGS，这是一个 自监督（self-supervised）、全面（comprehensive） 的框架，能够 自主学习建模部件级别的形状和外观，并同步优化运动参数，无需 3D 监督、运动线索或语义标签。\n实验结果表明，在 可比方法 中，我们的方案在 部件分割精度（part segmentation accuracy）、运动估计精度（motion estimation accuracy） 和 视觉质量（visual quality） 方面均实现了 最佳表现。\n"
  },
  {
    "path": "abs/2503.08166.md",
    "content": "### Dynamic Scene Reconstruction: Recent Advance in Real-time Rendering and Streaming\n\nRepresenting and rendering dynamic scenes from 2D images is a fundamental yet challenging problem in computer vision and graphics. This survey provides a comprehensive review of the evolution and advancements in dynamic scene representation and rendering, with a particular emphasis on recent progress in Neural Radiance Fields based and 3D Gaussian Splatting based reconstruction methods. We systematically summarize existing approaches, categorize them according to their core principles, compile relevant datasets, compare the performance of various methods on these benchmarks, and explore the challenges and future research directions in this rapidly evolving field. In total, we review over 170 relevant papers, offering a broad perspective on the state of the art in this domain.\n\n从二维图像中表示和渲染动态场景是计算机视觉和计算机图形学中的一个基础性但具有挑战性的问题。本综述对动态场景表示与渲染的发展和最新进展进行了全面回顾，特别关注基于神经辐射场（Neural Radiance Fields, NeRF）和三维高斯散点（3D Gaussian Splatting, 3DGS）的重建方法。我们系统地总结了现有方法，按照其核心原理进行分类，整理了相关数据集，比较了各种方法在这些基准上的性能，并探讨了该领域的挑战与未来研究方向。在本综述中，我们共回顾了170 余篇相关论文，为该领域的最新研究现状提供了广泛的视角。\n"
  },
  {
    "path": "abs/2503.08217.md",
    "content": "### S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction\n\nRecently, 3D Gaussian Splatting (3DGS) has reshaped the field of photorealistic 3D reconstruction, achieving impressive rendering quality and speed. However, when applied to large-scale street scenes, existing methods suffer from rapidly escalating per-viewpoint reconstruction costs as scene size increases, leading to significant computational overhead. After revisiting the conventional pipeline, we identify three key factors accounting for this issue: unnecessary local-to-global transformations, excessive 3D-to-2D projections, and inefficient rendering of distant content. To address these challenges, we propose S3R-GS, a 3DGS framework that Streamlines the pipeline for large-scale Street Scene Reconstruction, effectively mitigating these limitations. Moreover, most existing street 3DGS methods rely on ground-truth 3D bounding boxes to separate dynamic and static components, but 3D bounding boxes are difficult to obtain, limiting real-world applicability. To address this, we propose an alternative solution with 2D boxes, which are easier to annotate or can be predicted by off-the-shelf vision foundation models. Such designs together make S3R-GS readily adapt to large, in-the-wild scenarios. Extensive experiments demonstrate that S3R-GS enhances rendering quality and significantly accelerates reconstruction. Remarkably, when applied to videos from the challenging Argoverse2 dataset, it achieves state-of-the-art PSNR and SSIM, reducing reconstruction time to below 50%--and even 20%--of competing methods.\n\n近年来，三维高斯散点（3D Gaussian Splatting, 3DGS）在逼真3D重建领域取得了突破性进展，实现了卓越的渲染质量和速度。然而，当应用于大规模街景场景时，现有方法的单视角重建成本会随着场景规模的增加而急剧上升，从而导致巨大的计算开销。通过重新审视传统流水线，我们发现导致该问题的三个关键因素：不必要的局部-全局变换、过多的3D到2D投影以及低效的远处内容渲染。为了解决这些挑战，我们提出 S3R-GS，一个面向大规模街景重建的3DGS框架，有效缓解上述限制。\n此外，大多数现有的街景3DGS方法依赖于真实的3D边界框来分离动态和静态组件，但3D边界框的获取较为困难，限制了其在真实场景中的适用性。为此，我们提出了一种基于2D边界框的替代方案，这些边界框更易于标注，或者可以通过现成的视觉基础模型进行预测。这些设计使得 S3R-GS 能够轻松适应大规模、真实世界的场景。\n大量实验表明，S3R-GS 不仅提升了渲染质量，还显著加速了重建过程。值得注意的是，在 Argoverse2 数据集的挑战性视频上进行测试时，S3R-GS 达到了最先进的 PSNR 和 SSIM 指标，并将重建时间降低至竞争方法的 50% 甚至 20% 以下。\n"
  },
  {
    "path": "abs/2503.08218.md",
    "content": "### MVD-HuGaS: Human Gaussians from a Single Image via 3D Human Multi-view Diffusion Prior\n\n3D human reconstruction from a single image is a challenging problem and has been exclusively studied in the literature. Recently, some methods have resorted to diffusion models for guidance, optimizing a 3D representation via Score Distillation Sampling(SDS) or generating one backview image for facilitating reconstruction. However, these methods tend to produce unsatisfactory artifacts (e.g. flattened human structure or over-smoothing results caused by inconsistent priors from multiple views) and struggle with real-world generalization in the wild. In this work, we present MVD-HuGaS, enabling free-view 3D human rendering from a single image via a multi-view human diffusion model. We first generate multi-view images from the single reference image with an enhanced multi-view diffusion model, which is well fine-tuned on high-quality 3D human datasets to incorporate 3D geometry priors and human structure priors. To infer accurate camera poses from the sparse generated multi-view images for reconstruction, an alignment module is introduced to facilitate joint optimization of 3D Gaussians and camera poses. Furthermore, we propose a depth-based Facial Distortion Mitigation module to refine the generated facial regions, thereby improving the overall fidelity of the reconstruction. Finally, leveraging the refined multi-view images, along with their accurate camera poses, MVD-HuGaS optimizes the 3D Gaussians of the target human for high-fidelity free-view renderings. Extensive experiments on Thuman2.0 and 2K2K datasets show that the proposed MVD-HuGaS achieves state-of-the-art performance on single-view 3D human rendering.\n\n从单张图像重建3D 人体是一个极具挑战的问题，并已在文献中广泛研究。近期，一些方法尝试利用扩散模型（diffusion models）进行引导，例如使用得分蒸馏采样（Score Distillation Sampling, SDS）优化 3D 表示，或生成后视图图像（backview image）以辅助重建。然而，这些方法往往会产生不理想的伪影（例如，扁平化的人体结构或因多视角先验不一致导致的过度平滑），并且在**真实世界场景（in-the-wild）**中的泛化能力较弱。\n为了解决这些问题，我们提出 MVD-HuGaS，一种基于多视角人体扩散模型（multi-view human diffusion model）的自由视角 3D 人体渲染方法，仅需单张输入图像即可生成高质量 3D 头像。\n具体而言，我们首先使用增强型多视角扩散模型（multi-view diffusion model）从单张参考图像生成多个视角的图像。该模型经过高质量3D 人体数据集的精细调优，使其能够有效学习3D 几何先验和人体结构先验。\n为了从这些稀疏的多视角生成图像中推断准确的相机位姿以进行 3D 重建，我们引入对齐模块（alignment module），用于联合优化3D 高斯（3D Gaussians）和相机位姿，提高重建的准确性。此外，我们提出基于深度的面部失真缓解模块（Depth-based Facial Distortion Mitigation module），专门优化生成的面部区域，从而提升整体重建的保真度。\n最终，利用优化后的多视角图像及其精准相机位姿，MVD-HuGaS 进一步优化目标人体的3D 高斯表示，从而实现高保真度的自由视角渲染。在 Thuman2.0 和 2K2K 数据集上的广泛实验表明，MVD-HuGaS 在单视角 3D 人体渲染任务上达到了当前最先进水平（state-of-the-art）。\n"
  },
  {
    "path": "abs/2503.08224.md",
    "content": "### HRAvatar: High-Quality and Relightable Gaussian Head Avatar\n\nReconstructing animatable and high-quality 3D head avatars from monocular videos, especially with realistic relighting, is a valuable task. However, the limited information from single-view input, combined with the complex head poses and facial movements, makes this challenging. Previous methods achieve real-time performance by combining 3D Gaussian Splatting with a parametric head model, but the resulting head quality suffers from inaccurate face tracking and limited expressiveness of the deformation model. These methods also fail to produce realistic effects under novel lighting conditions. To address these issues, we propose HRAvatar, a 3DGS-based method that reconstructs high-fidelity, relightable 3D head avatars. HRAvatar reduces tracking errors through end-to-end optimization and better captures individual facial deformations using learnable blendshapes and learnable linear blend skinning. Additionally, it decomposes head appearance into several physical properties and incorporates physically-based shading to account for environmental lighting. Extensive experiments demonstrate that HRAvatar not only reconstructs superior-quality heads but also achieves realistic visual effects under varying lighting conditions.\n\n从单目视频重建 可动画、高质量的 3D 头像，特别是 具备真实光照效果（realistic relighting），是一项极具价值的任务。然而，由于单视角输入的信息受限，同时 头部姿态（head poses） 和 面部运动（facial movements） 复杂，这一任务充满挑战。\n现有方法通常结合 三维高斯投影（3D Gaussian Splatting, 3DGS） 与 参数化头部模型（parametric head model） 来实现实时性能，但 面部跟踪误差（face tracking errors） 和 变形模型的表达能力有限（limited expressiveness of the deformation model），导致最终的头像质量欠佳。此外，这些方法在 新光照条件（novel lighting conditions） 下难以呈现逼真的效果。\n为了解决这些问题，我们提出 HRAvatar，一种基于 3DGS 的方法，可 重建高保真（high-fidelity）、可重光照（relightable） 的 3D 头像。HRAvatar 通过 端到端优化（end-to-end optimization） 减少跟踪误差，并利用 可学习 Blendshape（learnable blendshapes） 和 可学习线性混合蒙皮（learnable linear blend skinning） 更好地捕捉个体化的面部变形。此外，该方法将 头部外观（head appearance） 分解为多个物理属性，并结合 基于物理的光照渲染（physically-based shading） 以模拟环境光照效果。\n大量实验表明，HRAvatar 不仅能重建高质量 3D 头像，还能在不同光照条件下呈现逼真的视觉效果。\n"
  },
  {
    "path": "abs/2503.08317.md",
    "content": "### Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios\n\nEnsuring the safety of autonomous vehicles necessitates comprehensive simulation of multi-sensor data, encompassing inputs from both cameras and LiDAR sensors, across various dynamic driving scenarios. Neural rendering techniques, which utilize collected raw sensor data to simulate these dynamic environments, have emerged as a leading methodology. While NeRF-based approaches can uniformly represent scenes for rendering data from both camera and LiDAR, they are hindered by slow rendering speeds due to dense sampling. Conversely, Gaussian Splatting-based methods employ Gaussian primitives for scene representation and achieve rapid rendering through rasterization. However, these rasterization-based techniques struggle to accurately model non-linear optical sensors. This limitation restricts their applicability to sensors beyond pinhole cameras. To address these challenges and enable unified representation of dynamic driving scenarios using Gaussian primitives, this study proposes a novel hybrid approach. Our method utilizes rasterization for rendering image data while employing Gaussian ray-tracing for LiDAR data rendering. Experimental results on public datasets demonstrate that our approach outperforms current state-of-the-art methods. This work presents a unified and efficient solution for realistic simulation of camera and LiDAR data in autonomous driving scenarios using Gaussian primitives, offering significant advancements in both rendering quality and computational efficiency.\n\n确保 自动驾驶车辆的安全性 需要对 多传感器数据 进行全面模拟，包括 摄像头（cameras） 和 LiDAR 传感器 在 各种动态驾驶场景 下的输入。近年来，神经渲染（neural rendering） 技术利用采集的原始传感器数据来模拟这些动态环境，已成为主要方法之一。\n基于 NeRF 的方法 可以 统一表示场景，以同时渲染来自摄像头和 LiDAR 的数据，但由于 需要密集采样（dense sampling），其渲染速度较慢。相比之下，基于高斯投影（Gaussian Splatting）的方法 通过 高斯原语（Gaussian primitives） 进行场景表示，并利用 光栅化（rasterization） 实现快速渲染。然而，这些 基于光栅化的技术难以准确建模非线性光学传感器（non-linear optical sensors），从而限制了其在 针孔相机（pinhole cameras）以外的传感器 上的适用性。\n为了解决这些挑战，并实现 使用高斯原语对动态驾驶场景的统一表示，本研究提出了一种 新型混合方法（hybrid approach）。我们的方法采用 光栅化（rasterization） 来渲染 图像数据（image data），同时利用 高斯光线追踪（Gaussian ray-tracing） 进行 LiDAR 数据渲染。\n在 公开数据集 上的实验结果表明，我们的方法 优于当前最先进的（state-of-the-art）方法。本研究提出了一种 统一且高效的解决方案，可用于 自动驾驶场景中的摄像头和 LiDAR 数据的真实感模拟，在 渲染质量 和 计算效率 方面均取得了重要进展。\n"
  },
  {
    "path": "abs/2503.08352.md",
    "content": "### Mitigating Ambiguities in 3D Classification with Gaussian Splatting\n\n3D classification with point cloud input is a fundamental problem in 3D vision. However, due to the discrete nature and the insufficient material description of point cloud representations, there are ambiguities in distinguishing wire-like and flat surfaces, as well as transparent or reflective objects. To address these issues, we propose Gaussian Splatting (GS) point cloud-based 3D classification. We find that the scale and rotation coefficients in the GS point cloud help characterize surface types. Specifically, wire-like surfaces consist of multiple slender Gaussian ellipsoids, while flat surfaces are composed of a few flat Gaussian ellipsoids. Additionally, the opacity in the GS point cloud represents the transparency characteristics of objects. As a result, ambiguities in point cloud-based 3D classification can be mitigated utilizing GS point cloud as input. To verify the effectiveness of GS point cloud input, we construct the first real-world GS point cloud dataset in the community, which includes 20 categories with 200 objects in each category. Experiments not only validate the superiority of GS point cloud input, especially in distinguishing ambiguous objects, but also demonstrate the generalization ability across different classification methods.\n\n3D 点云输入的分类是 3D 视觉中的一个基础问题。然而，由于点云表示的离散性以及对材料描述的不足，在区分线状结构与平面结构，以及透明或反射物体时存在一定的歧义。为了解决这些问题，我们提出了一种基于高斯溅射（Gaussian Splatting, GS）点云的 3D 分类方法。我们发现，GS 点云中的尺度和旋转系数有助于表征表面类型。具体而言，线状结构由多个细长的高斯椭球体组成，而平面结构则由少量扁平的高斯椭球体构成。此外，GS 点云中的不透明度能够反映物体的透明特性。因此，使用 GS 点云作为输入可以有效缓解基于点云的 3D 分类中的歧义性。\n为了验证 GS 点云输入的有效性，我们构建了社区内首个真实世界的 GS 点云数据集，该数据集包含 20 个类别，每个类别包含 200 个物体。实验不仅验证了 GS 点云输入的优越性，特别是在区分易混淆物体方面的能力，同时也展示了其在不同分类方法上的良好泛化能力。\n"
  },
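The input modality this paper argues for is essentially the per-Gaussian parameter vector itself. A sketch of assembling such a feature tensor for an off-the-shelf point-cloud classifier; the channel layout and the extra anisotropy cue are assumptions, not the paper's exact preprocessing.

```python
import torch

def gs_pointcloud_features(xyz, scales, rots, opacity):
    """Stack GS attributes into an (N, C) classification input.

    xyz: (N, 3) centers; scales: (N, 3) per-axis scales; rots: (N, 4)
    unit quaternions; opacity: (N, 1). Slender ellipsoids (wire-like
    surfaces) show up as anisotropic scales; transparency shows up in
    the opacity channel.
    """
    aniso = scales.max(dim=1, keepdim=True).values - \
            scales.min(dim=1, keepdim=True).values     # simple shape cue
    return torch.cat([xyz, scales, rots, opacity, aniso], dim=-1)  # (N, 12)

feats = gs_pointcloud_features(
    torch.randn(2048, 3), torch.rand(2048, 3),
    torch.nn.functional.normalize(torch.randn(2048, 4), dim=-1),
    torch.sigmoid(torch.randn(2048, 1)))
# feats can feed any point-cloud backbone (e.g. a PointNet-style MLP).
```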
  {
    "path": "abs/2503.08485.md",
    "content": "### TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting\n\nSelf-supervised 3D occupancy prediction offers a promising solution for understanding complex driving scenes without requiring costly 3D annotations. However, training dense voxel decoders to capture fine-grained geometry and semantics can demand hundreds of GPU hours, and such models often fail to adapt to varying voxel resolutions or new classes without extensive retraining. To overcome these limitations, we propose a practical and flexible test-time occupancy prediction framework termed TT-GaussOcc. Our approach incrementally optimizes time-aware 3D Gaussians instantiated from raw sensor streams at runtime, enabling voxelization at arbitrary user-specified resolution. Specifically, TT-GaussOcc operates in a \"lift-move-voxel\" symphony: we first \"lift\" surrounding-view semantics obtained from 2D vision foundation models (VLMs) to instantiate Gaussians at non-empty 3D space; Next, we \"move\" dynamic Gaussians from previous frames along estimated Gaussian scene flow to complete appearance and eliminate trailing artifacts of fast-moving objects, while accumulating static Gaussians to enforce temporal consistency; Finally, we mitigate inherent noises in semantic predictions and scene flow vectors by periodically smoothing neighboring Gaussians during optimization, using proposed trilateral RBF kernels that jointly consider color, semantic, and spatial affinities. The historical static and current dynamic Gaussians are then combined and voxelized to generate occupancy prediction. Extensive experiments on Occ3D and nuCraft with varying voxel resolutions demonstrate that TT-GaussOcc surpasses self-supervised baselines by 46% on mIoU without any offline training, and supports finer voxel resolutions at 2.6 FPS inference speed.\n\n自监督 3D 占用预测（occupancy prediction）为理解复杂的驾驶场景提供了一种有前景的解决方案，而无需昂贵的 3D 标注。然而，训练用于捕捉细粒度几何和语义信息的稠密体素解码器可能需要数百 GPU 小时，并且这些模型往往难以适应不同的体素分辨率或新类别，除非进行大规模的重新训练。\n为了解决这些局限性，我们提出了一种实用且灵活的测试时占用预测框架，称为 TT-GaussOcc。该方法在推理过程中增量优化由原始传感器流实时实例化的时序 3D 高斯（time-aware 3D Gaussians），从而实现任意用户指定分辨率的体素化。具体而言，TT-GaussOcc 采用“提升-移动-体素化”（lift-move-voxel）协同机制：首先，我们利用 2D 视觉基础模型（VLMs）提取的周围视图语义信息，将其“提升”到 3D 空间，实例化非空 3D 高斯；然后，我们根据估计的高斯场景流（Gaussian scene flow）对动态高斯进行“移动”，以补全外观信息，并消除快速移动物体的拖影伪影，同时累积静态高斯以增强时序一致性；最后，我们通过提出的三边 RBF 核（trilateral RBF kernels），综合考虑颜色、语义和空间邻域关系，在优化过程中对相邻高斯进行平滑，以缓解语义预测和场景流向量中的固有噪声。最终，历史静态高斯和当前动态高斯被合并并体素化，生成最终的占用预测结果。\n在 Occ3D 和 nuCraft 数据集上的大量实验表明，在不同体素分辨率下，TT-GaussOcc 无需任何离线训练，即可比自监督基线方法在 mIoU 上提升 46%，并在 2.6 FPS 的推理速度下支持更精细的体素分辨率。\n"
  },
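The trilateral RBF kernel weights each neighbor by spatial, color, and semantic affinity jointly, a 3D analogue of bilateral filtering. A sketch with assumed bandwidths and k-NN neighborhoods; the paper's exact kernel parameters are not specified here.

```python
import numpy as np
from scipy.spatial import cKDTree

def trilateral_smooth(xyz, color, sem, values, k=8,
                      s_xyz=0.1, s_col=0.2, s_sem=0.5):
    """Smooth per-Gaussian `values` with joint spatial/color/semantic RBFs."""
    tree = cKDTree(xyz)
    _, idx = tree.query(xyz, k=k)                     # (N, k) neighbor indices
    w = (np.exp(-((xyz[idx] - xyz[:, None]) ** 2).sum(-1) / (2 * s_xyz**2))
       * np.exp(-((color[idx] - color[:, None]) ** 2).sum(-1) / (2 * s_col**2))
       * np.exp(-((sem[idx] - sem[:, None]) ** 2).sum(-1) / (2 * s_sem**2)))
    w /= w.sum(-1, keepdims=True)                     # normalize per Gaussian
    return (w[..., None] * values[idx]).sum(1)        # (N, D) smoothed values

n = 1000
smoothed = trilateral_smooth(np.random.rand(n, 3), np.random.rand(n, 3),
                             np.random.rand(n, 8), np.random.rand(n, 8))
```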
  {
    "path": "abs/2503.08511.md",
    "content": "### PCGS: Progressive Compression of 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) achieves impressive rendering fidelity and speed for novel view synthesis. However, its substantial data size poses a significant challenge for practical applications. While many compression techniques have been proposed, they fail to efficiently utilize existing bitstreams in on-demand applications due to their lack of progressivity, leading to a waste of resource. To address this issue, we propose PCGS (Progressive Compression of 3D Gaussian Splatting), which adaptively controls both the quantity and quality of Gaussians (or anchors) to enable effective progressivity for on-demand applications. Specifically, for quantity, we introduce a progressive masking strategy that incrementally incorporates new anchors while refining existing ones to enhance fidelity. For quality, we propose a progressive quantization approach that gradually reduces quantization step sizes to achieve finer modeling of Gaussian attributes. Furthermore, to compact the incremental bitstreams, we leverage existing quantization results to refine probability prediction, improving entropy coding efficiency across progressive levels. Overall, PCGS achieves progressivity while maintaining compression performance comparable to SoTA non-progressive methods.\n\n3D 高斯溅射（3D Gaussian Splatting, 3DGS）在新视角合成方面表现出卓越的渲染保真度和速度。然而，其庞大的数据规模对实际应用构成了重大挑战。尽管已有许多压缩技术被提出，但由于缺乏渐进性（progressivity），这些方法无法高效地在按需应用场景中利用现有比特流，导致资源浪费。\n为了解决这一问题，我们提出 PCGS（Progressive Compression of 3D Gaussian Splatting），通过自适应控制高斯（或锚点）的数量和质量，实现按需应用场景下的高效渐进性。具体而言，在数量控制方面，我们引入了一种 渐进掩码策略（progressive masking strategy），通过逐步添加新锚点并优化已有锚点，以提高渲染保真度。在质量控制方面，我们提出 渐进量化方法（progressive quantization approach），逐步缩小量化步长，以更精细地建模高斯属性。此外，为了进一步压缩增量比特流，我们利用已有的量化结果优化概率预测，从而提升熵编码的效率，使其在各个渐进级别上更加紧凑。\n总体而言，PCGS 实现了渐进性，同时在压缩性能上保持了与当前最优（SoTA）非渐进方法相当的效果。\n"
  },
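Progressive quantization can be pictured as re-quantizing the residual at each level with a shrinking step size, so each additional chunk of bitstream refines attributes that were already decodable. A generic sketch; the halving schedule and level count are assumptions, and a real codec would entropy-code the integer residuals.

```python
import numpy as np

def progressive_quantize(x, base_step=0.1, levels=4):
    """Encode x as per-level integer residuals with shrinking step sizes."""
    codes, recon, step = [], np.zeros_like(x), base_step
    for _ in range(levels):
        q = np.round((x - recon) / step).astype(np.int32)
        codes.append(q)                  # entropy-coded in a real system
        recon = recon + q * step         # decoder state after this level
        step /= 2.0                      # finer modeling at the next level
    return codes

def progressive_dequantize(codes, base_step=0.1):
    recon, step = 0.0, base_step
    for q in codes:                      # stop early for a coarser preview
        recon = recon + q * step
        step /= 2.0
    return recon

attrs = np.random.randn(1000, 8)
codes = progressive_quantize(attrs)
coarse = progressive_dequantize(codes[:2])   # usable after 2 of 4 levels
```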
  {
    "path": "abs/2503.09040.md",
    "content": "### Motion Blender Gaussian Splatting for Dynamic Reconstruction\n\nGaussian splatting has emerged as a powerful tool for high-fidelity reconstruction of dynamic scenes. However, existing methods primarily rely on implicit motion representations, such as encoding motions into neural networks or per-Gaussian parameters, which makes it difficult to further manipulate the reconstructed motions. This lack of explicit controllability limits existing methods to replaying recorded motions only, which hinders a wider application. To address this, we propose Motion Blender Gaussian Splatting (MB-GS), a novel framework that uses motion graph as an explicit and sparse motion representation. The motion of graph links is propagated to individual Gaussians via dual quaternion skinning, with learnable weight painting functions determining the influence of each link. The motion graphs and 3D Gaussians are jointly optimized from input videos via differentiable rendering. Experiments show that MB-GS achieves state-of-the-art performance on the iPhone dataset while being competitive on HyperNeRF. Additionally, we demonstrate the application potential of our method in generating novel object motions and robot demonstrations through motion editing.\n\n高斯溅射（Gaussian Splatting）已成为高保真动态场景重建的强大工具。然而，现有方法主要依赖于隐式运动表示，例如将运动编码到神经网络或单个高斯参数中，这使得对重建的运动进行进一步操控变得困难。这种缺乏显式可控性的问题，使现有方法仅限于重放已记录的运动，限制了更广泛的应用。\n为了解决这一问题，我们提出 MB-GS（Motion Blender Gaussian Splatting），这是一种利用 运动图（motion graph） 作为显式且稀疏运动表示的新框架。具体而言，运动图的链接运动（motion of graph links） 通过 双四元数蒙皮（dual quaternion skinning） 传播至单个高斯点，而可学习的权重绘制函数（weight painting functions） 负责确定每个链接对高斯点的影响。运动图和 3D 高斯点可通过可微渲染（differentiable rendering），从输入视频中联合优化得到。\n实验表明，MB-GS 在 iPhone 数据集上达到了最先进（state-of-the-art）的重建性能，同时在 HyperNeRF 数据集上也表现出竞争力。此外，我们进一步展示了该方法在运动编辑任务中的应用潜力，包括生成新颖的物体运动和机器人演示（robot demonstrations）。\n\n"
  },
  {
    "path": "abs/2503.09332.md",
    "content": "### SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction\n\nDynamic and static components in scenes often exhibit distinct properties, yet most 4D reconstruction methods treat them indiscriminately, leading to suboptimal performance in both cases. This work introduces SDD-4DGS, the first framework for static-dynamic decoupled 4D scene reconstruction based on Gaussian Splatting. Our approach is built upon a novel probabilistic dynamic perception coefficient that is naturally integrated into the Gaussian reconstruction pipeline, enabling adaptive separation of static and dynamic components. With carefully designed implementation strategies to realize this theoretical framework, our method effectively facilitates explicit learning of motion patterns for dynamic elements while maintaining geometric stability for static structures. Extensive experiments on five benchmark datasets demonstrate that SDD-4DGS consistently outperforms state-of-the-art methods in reconstruction fidelity, with enhanced detail restoration for static structures and precise modeling of dynamic motions.\n\n在 4D 场景中，动态与静态组件通常表现出不同的特性，然而，大多数 4D 重建方法未加区分地处理它们，导致整体性能受限。为了解决这一问题，我们提出 SDD-4DGS，这是首个基于高斯溅射（Gaussian Splatting）的 静态-动态解耦（Static-Dynamic Decoupled）4D 场景重建 框架。\n我们的方法基于一种新颖的 概率动态感知系数（probabilistic dynamic perception coefficient），该系数自然集成到高斯重建管道中，使得静态与动态组件能够自适应分离。通过精心设计的实现策略，我们的框架能够显式学习动态元素的运动模式，同时保持静态结构的几何稳定性。\n在五个基准数据集上的大量实验表明，SDD-4DGS 在重建保真度方面始终优于当前最先进（state-of-the-art）方法，不仅增强了静态结构的细节恢复，还实现了动态运动的精准建模。\n\n"
  },
  {
    "path": "abs/2503.09342.md",
    "content": "### GASPACHO: Gaussian Splatting for Controllable Humans and Objects\n\nWe present GASPACHO: a method for generating photorealistic controllable renderings of human-object interactions. Given a set of multi-view RGB images of human-object interactions, our method reconstructs animatable templates of the human and object as separate sets of Gaussians simultaneously. Different from existing work, which focuses on human reconstruction and ignores objects as background, our method explicitly reconstructs both humans and objects, thereby allowing for controllable renderings of novel human object interactions in different poses from novel-camera viewpoints. During reconstruction, we constrain the Gaussians that generate rendered images to be a linear function of a set of canonical Gaussians. By simply changing the parameters of the linear deformation functions after training, our method can generate renderings of novel human-object interaction in novel poses from novel camera viewpoints. We learn the 3D Gaussian properties of the canonical Gaussians on the underlying 2D manifold of the canonical human and object templates. This in turns requires a canonical object template with a fixed UV unwrapping. To define such an object template, we use a feature based representation to track the object across the multi-view sequence. We further propose an occlusion aware photometric loss that allows for reconstructions under significant occlusions. Several experiments on two human-object datasets - BEHAVE and DNA-Rendering - demonstrate that our method allows for high-quality reconstruction of human and object templates under significant occlusion and the synthesis of controllable renderings of novel human-object interactions in novel human poses from novel camera views.\n\n我们提出 GASPACHO，一种用于生成逼真且可控的人-物交互渲染的方法。给定一组多视角 RGB 图像，我们的方法能够同时重建可动画的人体和物体模板，将它们作为两个独立的高斯集合进行建模。不同于现有方法仅关注人体重建并将物体视为背景，我们的方法显式重建人和物体，从而实现从新视角、不同姿态下的人-物交互可控渲染。\n在重建过程中，我们约束用于渲染图像的高斯点，使其是一组标准高斯点（canonical Gaussians）的线性函数。在训练完成后，仅需调整线性变形函数的参数，我们的方法即可在新视角、不同人体姿态下生成新的人-物交互渲染。我们在标准人体与物体模板的二维流形上学习标准高斯点的 3D 属性。这一过程需要定义一个固定 UV 展开的标准物体模板，为此，我们采用基于特征的表示来跟踪物体在多视角序列中的变化。此外，我们提出了一种遮挡感知的光度损失（occlusion-aware photometric loss），使得方法能够在严重遮挡的情况下进行重建。\n在 BEHAVE 和 DNA-Rendering 两个人-物交互数据集上的实验表明，我们的方法能够在显著遮挡条件下实现高质量的人体与物体模板重建，并能生成可控的人-物交互渲染，支持新的人体姿态和新视角。\n"
  },
  {
    "path": "abs/2503.09396.md",
    "content": "### Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training\n\n3D Gaussian Splatting (3DGS) has demonstrated impressive performance in synthesizing novel views after training on a given set of viewpoints. However, its rendering quality deteriorates when the synthesized view deviates significantly from the training views. This decline occurs due to (1) the model's difficulty in generalizing to out-of-distribution scenarios and (2) challenges in interpolating fine details caused by substantial resolution changes and occlusions. A notable case of this limitation is close-up view generation--producing views that are significantly closer to the object than those in the training set. To tackle this issue, we propose a novel approach for close-up view generation based by progressively training the 3DGS model with self-generated data. Our solution is based on three key ideas. First, we leverage the See3D model, a recently introduced 3D-aware generative model, to enhance the details of rendered views. Second, we propose a strategy to progressively expand the \"trust regions\" of the 3DGS model and update a set of reference views for See3D. Finally, we introduce a fine-tuning strategy to carefully update the 3DGS model with training data generated from the above schemes. We further define metrics for close-up views evaluation to facilitate better research on this problem. By conducting evaluations on specifically selected scenarios for close-up views, our proposed approach demonstrates a clear advantage over competitive solutions.\n\n3D 高斯散点 (3DGS) 在对给定视点集进行训练后，已展现出卓越的新视图合成性能。然而，当合成视图与训练视图存在较大偏差时，其渲染质量会显著下降。这一退化主要归因于 (1) 该模型难以泛化到分布外（out-of-distribution）场景，以及 (2) 由于分辨率变化和遮挡严重，细节插值变得更加困难。其中一个典型的限制案例是近景视图生成——即生成比训练集中视点更接近物体的视图。\n为解决这一问题，我们提出了一种基于逐步训练 3DGS 模型的自生成数据的新方法。我们的方案基于三个关键思想。首先，我们利用 See3D 模型，这是一种新近提出的 3D 感知生成模型，以增强渲染视图的细节。其次，我们提出了一种策略，以逐步扩展 3DGS 模型的“信任区域”，并更新 See3D 的一组参考视图。最后，我们引入了一种精细微调策略，以利用上述方案生成的训练数据谨慎地更新 3DGS 模型。我们进一步定义了近景视图评估的度量标准，以促进对该问题的深入研究。通过在特定的近景视图场景中进行评估，我们的方法在竞争性解决方案中展现出明显的优势。\n"
  },
  {
    "path": "abs/2503.09447.md",
    "content": "### Online Language Splatting\n\nTo enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by integrating language features into geometrically detailed 3D scene representations using 3D Gaussian Splatting (GS), these approaches rely on computationally intensive offline preprocessing of language features for each input image, limiting adaptability to new environments. In this work, we introduce Online Language Splatting, the first framework to achieve online, near real-time, open-vocabulary language mapping within a 3DGS-SLAM system without requiring pre-generated language features. The key challenge lies in efficiently fusing high-dimensional language features into 3D representations while balancing the computation speed, memory usage, rendering quality and open-vocabulary capability. To this end, we innovatively design: (1) a high-resolution CLIP embedding module capable of generating detailed language feature maps in 18ms per frame, (2) a two-stage online auto-encoder that compresses 768-dimensional CLIP features to 15 dimensions while preserving open-vocabulary capabilities, and (3) a color-language disentangled optimization approach to improve rendering quality. Experimental results show that our online method not only surpasses the state-of-the-art offline methods in accuracy but also achieves more than 40x efficiency boost, demonstrating the potential for dynamic and interactive AI applications.\n\n为了使 AI 代理能够与人类和 3D 环境无缝互动，它们不仅需要准确感知 3D 世界，还需要将人类语言与 3D 空间表征对齐。虽然以往的研究通过将语言特征整合到几何细节丰富的 3D 场景表征中，使用 3D 高斯溅射（3DGS）取得了显著进展，但这些方法依赖于对每个输入图像进行计算密集型的离线语言特征预处理，限制了其对新环境的适应性。在本研究中，我们提出了在线语言溅射（Online Language Splatting），这是第一个能够在 3DGS-SLAM 系统中实现在线、接近实时、开放词汇语言映射的框架，无需预生成语言特征。关键挑战在于如何高效地将高维语言特征融入 3D 表征中，同时平衡计算速度、内存使用、渲染质量和开放词汇能力。为此，我们创新性地设计了：(1) 一种高分辨率的 CLIP 嵌入模块，能够在每帧 18 毫秒内生成详细的语言特征图；(2) 一种两阶段的在线自编码器，能够将 768 维的 CLIP 特征压缩至 15 维，同时保持开放词汇能力；(3) 一种颜色-语言解耦优化方法，旨在提升渲染质量。实验结果表明，我们的在线方法不仅在准确性上超越了最先进的离线方法，还实现了超过 40 倍的效率提升，展示了动态和交互式 AI 应用的潜力。\n"
  },
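The 768-to-15 compression is an autoencoder over CLIP embeddings. A single-stage toy version (the paper's design is two-stage and trained online) with a cosine reconstruction loss; the hidden width is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LangAE(nn.Module):
    """Toy autoencoder compressing 768-d CLIP features to 15 dimensions."""
    def __init__(self, d_in=768, d_lat=15):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                 nn.Linear(128, d_lat))
        self.dec = nn.Sequential(nn.Linear(d_lat, 128), nn.ReLU(),
                                 nn.Linear(128, d_in))

    def forward(self, f):
        z = self.enc(f)             # 15-d code stored on the Gaussians
        return z, self.dec(z)       # decoded feature for querying

ae = LangAE()
f = F.normalize(torch.randn(64, 768), dim=-1)   # stand-in CLIP features
z, f_hat = ae(f)
# A cosine loss keeps decoded features usable for open-vocabulary queries.
loss = 1 - F.cosine_similarity(f, f_hat, dim=-1).mean()
loss.backward()
```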
  {
    "path": "abs/2503.09464.md",
    "content": "### Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation\n\nNeural reconstruction models for autonomous driving simulation have made significant strides in recent years, with dynamic models becoming increasingly prevalent. However, these models are typically limited to handling in-domain objects closely following their original trajectories. We introduce a hybrid approach that combines the strengths of neural reconstruction with physics-based rendering. This method enables the virtual placement of traditional mesh-based dynamic agents at arbitrary locations, adjustments to environmental conditions, and rendering from novel camera viewpoints. Our approach significantly enhances novel view synthesis quality -- especially for road surfaces and lane markings -- while maintaining interactive frame rates through our novel training method, NeRF2GS. This technique leverages the superior generalization capabilities of NeRF-based methods and the real-time rendering speed of 3D Gaussian Splatting (3DGS). We achieve this by training a customized NeRF model on the original images with depth regularization derived from a noisy LiDAR point cloud, then using it as a teacher model for 3DGS training. This process ensures accurate depth, surface normals, and camera appearance modeling as supervision. With our block-based training parallelization, the method can handle large-scale reconstructions (greater than or equal to 100,000 square meters) and predict segmentation masks, surface normals, and depth maps. During simulation, it supports a rasterization-based rendering backend with depth-based composition and multiple camera models for real-time camera simulation, as well as a ray-traced backend for precise LiDAR simulation.\n\n近年来，自动驾驶仿真中的神经重建模型取得了显著进展，动态模型越来越普及。然而，这些模型通常仅限于处理跟随原始轨迹的领域内对象。我们提出了一种混合方法，结合了神经重建和基于物理的渲染的优势。这种方法能够在任意位置虚拟放置传统的基于网格的动态代理，调整环境条件，并从新颖的摄像机视角进行渲染。我们的方法显著提高了新视角合成质量，特别是在路面和车道标线的渲染上，同时通过我们新颖的训练方法 NeRF2GS 保持了交互式帧率。该技术利用了基于 NeRF 方法的优越泛化能力和 3D 高斯溅射（3DGS）的实时渲染速度。我们通过在原始图像上训练一个定制的 NeRF 模型，并利用来自噪声 LiDAR 点云的深度正则化，随后将其作为教师模型进行 3DGS 训练。这个过程确保了深度、表面法线和摄像机外观建模的准确性，并作为监督信号。通过我们基于块的训练并行化方法，该方法能够处理大规模重建（大于或等于 100,000 平方米），并预测分割掩码、表面法线和深度图。在仿真过程中，它支持基于光栅化的渲染后端，结合基于深度的合成和多摄像机模型进行实时摄像机仿真，同时还支持基于光线追踪的后端进行精确的 LiDAR 仿真。\n"
  },
  {
    "path": "abs/2503.09635.md",
    "content": "### FPGS: Feed-Forward Semantic-aware Photorealistic Style Transfer of Large-Scale Gaussian Splatting\n\nWe present FPGS, a feed-forward photorealistic style transfer method of large-scale radiance fields represented by Gaussian Splatting. FPGS, stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving multi-view consistency and real-time rendering speed of 3D Gaussians. Prior arts required tedious per-style optimization or time-consuming per-scene training stage and were limited to small-scale 3D scenes. FPGS efficiently stylizes large-scale 3D scenes by introducing a style-decomposed 3D feature field, which inherits AdaIN's feed-forward stylization machinery, supporting arbitrary style reference images. Furthermore, FPGS supports multi-reference stylization with the semantic correspondence matching and local AdaIN, which adds diverse user control for 3D scene styles. FPGS also preserves multi-view consistency by applying semantic matching and style transfer processes directly onto queried features in 3D space. In experiments, we demonstrate that FPGS achieves favorable photorealistic quality scene stylization for large-scale static and dynamic 3D scenes with diverse reference images.\n\n我们提出了 FPGS，一种基于高斯溅射表示的大规模辐射场的前馈式光照风格转移方法。FPGS 在不需要额外优化的情况下，通过任意多个风格参考图像对大规模 3D 场景进行风格化，同时保持 3D 高斯的多视角一致性和实时渲染速度。之前的方法需要繁琐的每个风格优化或耗时的每个场景训练阶段，并且仅限于小规模的 3D 场景。FPGS 通过引入一个风格分解的 3D 特征场来高效风格化大规模 3D 场景，该特征场继承了 AdaIN 的前馈式风格化机制，支持任意风格参考图像。此外，FPGS 支持多参考风格化，通过语义对应匹配和局部 AdaIN 为 3D 场景风格提供了多样的用户控制。FPGS 还通过直接将语义匹配和风格转移过程应用到 3D 空间中的查询特征，保持了多视角一致性。在实验中，我们展示了 FPGS 能够为大规模静态和动态 3D 场景实现优质的光照风格转移，并且支持多样的参考图像。\n"
  },
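AdaIN, the feed-forward machinery FPGS inherits, re-normalizes content features to the per-channel statistics of style features; FPGS applies this to features queried in 3D space, which is what preserves multi-view consistency. The standard formula, shown here over (N, C) feature rows rather than image grids:

```python
import torch

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization over (N, C) feature rows.

    Shifts content features to the per-channel mean/std of the style
    features: AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y).
    """
    mu_c, sd_c = content.mean(0), content.std(0) + eps
    mu_s, sd_s = style.mean(0), style.std(0) + eps
    return sd_s * (content - mu_c) / sd_c + mu_s

# Features queried from a 3D feature field for one scene region, stylized
# by features extracted from a reference image (both tensors are stand-ins).
stylized = adain(torch.randn(4096, 256), torch.randn(1024, 256))
```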
  {
    "path": "abs/2503.09640.md",
    "content": "### Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting\n\nRendering realistic human-object interactions (HOIs) from sparse-view inputs is challenging due to occlusions and incomplete observations, yet crucial for various real-world applications. Existing methods always struggle with either low rendering qualities (\\eg, visual fidelity and physically plausible HOIs) or high computational costs. To address these limitations, we propose HOGS (Human-Object Rendering via 3D Gaussian Splatting), a novel framework for efficient and physically plausible HOI rendering from sparse views. Specifically, HOGS combines 3D Gaussian Splatting with a physics-aware optimization process. It incorporates a Human Pose Refinement module for accurate pose estimation and a Sparse-View Human-Object Contact Prediction module for efficient contact region identification. This combination enables coherent joint rendering of human and object Gaussians while enforcing physically plausible interactions. Extensive experiments on the HODome dataset demonstrate that HOGS achieves superior rendering quality, efficiency, and physical plausibility compared to existing methods. We further show its extensibility to hand-object grasp rendering tasks, presenting its broader applicability to articulated object interactions.\n\n从稀疏视图输入渲染逼真的人类-物体互动（HOIs）是一个具有挑战性的任务，主要由于遮挡和不完全观察的影响，但这对于各种实际应用至关重要。现有方法通常在渲染质量（如视觉真实感和物理上合理的 HOIs）或计算成本上存在问题。为了解决这些局限性，我们提出了 HOGS（通过 3D 高斯溅射进行的人类-物体渲染），这是一个新颖的框架，用于从稀疏视图高效且物理上合理地渲染 HOI。具体来说，HOGS 将 3D 高斯溅射与一个物理感知优化过程相结合。它包含一个人类姿势细化模块，用于精确的姿势估计，以及一个稀疏视图人类-物体接触预测模块，用于高效识别接触区域。这种结合使得在人类和物体高斯的联合渲染中实现一致性，同时强制执行物理上合理的互动。我们在 HODome 数据集上进行了大量实验，结果表明，HOGS 相较于现有方法在渲染质量、效率和物理合理性方面都表现出色。我们进一步展示了它在手-物体抓取渲染任务中的可扩展性，展示了其在关节物体互动中的更广泛适用性。\n"
  },
  {
    "path": "abs/2503.09941.md",
    "content": "### TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness\n\n3D semantic occupancy has rapidly become a research focus in the fields of robotics and autonomous driving environment perception due to its ability to provide more realistic geometric perception and its closer integration with downstream tasks. By performing occupancy prediction of the 3D space in the environment, the ability and robustness of scene understanding can be effectively improved. However, existing occupancy prediction tasks are primarily modeled using voxel or point cloud-based approaches: voxel-based network structures often suffer from the loss of spatial information due to the voxelization process, while point cloud-based methods, although better at retaining spatial location information, face limitations in representing volumetric structural details. To address this issue, we propose a dual-modal prediction method based on 3D Gaussian sets and sparse points, which balances both spatial location and volumetric structural information, achieving higher accuracy in semantic occupancy prediction. Specifically, our method adopts a Transformer-based architecture, taking 3D Gaussian sets, sparse points, and queries as inputs. Through the multi-layer structure of the Transformer, the enhanced queries and 3D Gaussian sets jointly contribute to the semantic occupancy prediction, and an adaptive fusion mechanism integrates the semantic outputs of both modalities to generate the final prediction results. Additionally, to further improve accuracy, we dynamically refine the point cloud at each layer, allowing for more precise location information during occupancy prediction. We conducted experiments on the Occ3DnuScenes dataset, and the experimental results demonstrate superior performance of the proposed method on IoU based metrics.\n\n3D 语义占据已经迅速成为机器人技术和自动驾驶环境感知领域的研究重点，因其能够提供更真实的几何感知，并与下游任务有更紧密的集成。通过对环境中 3D 空间的占据预测，可以有效提高场景理解的能力和鲁棒性。然而，现有的占据预测任务主要通过体素或点云方法进行建模：基于体素的网络结构通常由于体素化过程而导致空间信息的丢失，而基于点云的方法虽然更好地保留了空间位置信息，但在表示体积结构细节时存在局限性。为了解决这一问题，我们提出了一种基于 3D 高斯集合和稀疏点的双模态预测方法，平衡了空间位置信息和体积结构信息，从而在语义占据预测中实现更高的准确性。具体来说，我们的方法采用基于 Transformer 的架构，输入为 3D 高斯集合、稀疏点和查询。通过 Transformer 的多层结构，增强的查询和 3D 高斯集合共同贡献于语义占据预测，并且一个自适应融合机制将两种模态的语义输出整合起来，生成最终的预测结果。此外，为了进一步提高准确性，我们在每一层动态地细化点云，使占据预测过程中能够获得更精确的位置信息。我们在 Occ3DnuScenes 数据集上进行了实验，实验结果表明，所提出的方法在基于 IoU 的指标上表现出色。\n"
  },
  {
    "path": "abs/2503.10143.md",
    "content": "### GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping\n\nHigh dynamic range (HDR) novel view synthesis (NVS) aims to reconstruct HDR scenes by leveraging multi-view low dynamic range (LDR) images captured at different exposure levels. Current training paradigms with 3D tone mapping often result in unstable HDR reconstruction, while training with 2D tone mapping reduces the model's capacity to fit LDR images. Additionally, the global tone mapper used in existing methods can impede the learning of both HDR and LDR representations. To address these challenges, we present GaussHDR, which unifies 3D and 2D local tone mapping through 3D Gaussian splatting. Specifically, we design a residual local tone mapper for both 3D and 2D tone mapping that accepts an additional context feature as input. We then propose combining the dual LDR rendering results from both 3D and 2D local tone mapping at the loss level. Finally, recognizing that different scenes may exhibit varying balances between the dual results, we introduce uncertainty learning and use the uncertainties for adaptive modulation. Extensive experiments demonstrate that GaussHDR significantly outperforms state-of-the-art methods in both synthetic and real-world scenarios.\n\n高动态范围（HDR）新视角合成（NVS）旨在通过利用在不同曝光级别下拍摄的多视角低动态范围（LDR）图像来重建 HDR 场景。目前，采用 3D 调色映射的训练模式往往导致 HDR 重建不稳定，而采用 2D 调色映射的训练则降低了模型对 LDR 图像的拟合能力。此外，现有方法中使用的全局调色映射器可能会阻碍 HDR 和 LDR 表征的学习。为了解决这些挑战，我们提出了 GaussHDR，它通过 3D 高斯溅射统一了 3D 和 2D 局部调色映射。具体来说，我们设计了一种残差局部调色映射器，适用于 3D 和 2D 调色映射，并接受额外的上下文特征作为输入。然后，我们提出在损失层面将来自 3D 和 2D 局部调色映射的双重 LDR 渲染结果结合起来。最后，考虑到不同场景可能在双重结果之间表现出不同的平衡，我们引入了不确定性学习，并利用不确定性进行自适应调节。大量实验表明，GaussHDR 在合成场景和现实场景中均显著优于现有的最先进方法。\n"
  },
  {
    "path": "abs/2503.10148.md",
    "content": "### 3D Student Splatting and Scooping\n\nRecently, 3D Gaussian Splatting (3DGS) provides a new framework for novel view synthesis, and has spiked a new wave of research in neural rendering and related applications. As 3DGS is becoming a foundational component of many models, any improvement on 3DGS itself can bring huge benefits. To this end, we aim to improve the fundamental paradigm and formulation of 3DGS. We argue that as an unnormalized mixture model, it needs to be neither Gaussians nor splatting. We subsequently propose a new mixture model consisting of flexible Student's t distributions, with both positive (splatting) and negative (scooping) densities. We name our model Student Splatting and Scooping, or SSS. When providing better expressivity, SSS also poses new challenges in learning. Therefore, we also propose a new principled sampling approach for optimization. Through exhaustive evaluation and comparison, across multiple datasets, settings, and metrics, we demonstrate that SSS outperforms existing methods in terms of quality and parameter efficiency, e.g. achieving matching or better quality with similar numbers of components, and obtaining comparable results while reducing the component number by as much as 82%.\n\n最近，3D 高斯溅射（3DGS）为新视角合成提供了一个新的框架，并引发了神经渲染及相关应用领域的新一波研究。随着 3DGS 成为许多模型的基础组件，任何对 3DGS 本身的改进都可能带来巨大的好处。为此，我们旨在改进 3DGS 的基本范式和公式。我们认为，作为一个未归一化的混合模型，它不必一定是高斯分布或溅射。随后，我们提出了一种新的混合模型，由灵活的 Student’s t 分布组成，具有正（溅射）和负（铲取）密度。我们将我们的模型命名为 Student Splatting and Scooping，简称 SSS。虽然 SSS 提供了更好的表现能力，但它也带来了新的学习挑战。因此，我们还提出了一种新的原则性采样方法来进行优化。通过在多个数据集、设置和指标上的全面评估和比较，我们证明了 SSS 在质量和参数效率方面超越了现有方法，例如，在相似的组件数量下实现了匹配或更好的质量，并且在将组件数量减少最多 82% 的同时，获得了相当的结果。\n"
  },
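The SSS density is an unnormalized mixture of Student's t components whose weights carry a sign: positive for splatting, negative for scooping. A sketch evaluating such a mixture with isotropic components (the paper uses full covariances, and the final clamp is an assumption, since a raw signed sum can dip below zero):

```python
import numpy as np
from scipy.special import gammaln

def student_t_pdf(x, mu, s, nu):
    """Isotropic multivariate Student-t density at points x.

    x: (N, d); mu: (d,); s: scalar scale; nu: degrees of freedom.
    """
    d = x.shape[1]
    m2 = ((x - mu) ** 2).sum(-1) / s**2          # squared Mahalanobis dist.
    logc = (gammaln((nu + d) / 2) - gammaln(nu / 2)
            - 0.5 * d * np.log(nu * np.pi) - d * np.log(s))
    return np.exp(logc - 0.5 * (nu + d) * np.log1p(m2 / nu))

def sss_density(x, mus, scales, nus, weights, signs):
    """Sum of splatting (+1) and scooping (-1) components at x."""
    out = np.zeros(len(x))
    for mu, s, nu, w, sg in zip(mus, scales, nus, weights, signs):
        out += sg * w * student_t_pdf(x, mu, s, nu)
    return np.maximum(out, 0.0)   # clamp: an assumed guard, not the paper's

x = np.random.randn(100, 3)
dens = sss_density(x, [np.zeros(3), np.ones(3)], [1.0, 0.5],
                   [4.0, 10.0], [1.0, 0.3], [+1, -1])
```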
  {
    "path": "abs/2503.10170.md",
    "content": "### GS-SDF: LiDAR-Augmented Gaussian Splatting and Neural SDF for Geometrically Consistent Rendering and Reconstruction\n\nDigital twins are fundamental to the development of autonomous driving and embodied artificial intelligence. However, achieving high-granularity surface reconstruction and high-fidelity rendering remains a challenge. Gaussian splatting offers efficient photorealistic rendering but struggles with geometric inconsistencies due to fragmented primitives and sparse observational data in robotics applications. Existing regularization methods, which rely on render-derived constraints, often fail in complex environments. Moreover, effectively integrating sparse LiDAR data with Gaussian splatting remains challenging. We propose a unified LiDAR-visual system that synergizes Gaussian splatting with a neural signed distance field. The accurate LiDAR point clouds enable a trained neural signed distance field to offer a manifold geometry field, This motivates us to offer an SDF-based Gaussian initialization for physically grounded primitive placement and a comprehensive geometric regularization for geometrically consistent rendering and reconstruction. Experiments demonstrate superior reconstruction accuracy and rendering quality across diverse trajectories. To benefit the community, the codes will be released at this https URL.\n\n数字双胞胎是自动驾驶和具身人工智能发展的基础。然而，实现高精度表面重建和高保真渲染仍然是一大挑战。高斯点云渲染提供了高效的逼真渲染，但由于机器人应用中原语碎片化和稀疏的观测数据，它在几何一致性方面存在问题。现有的正则化方法通常依赖于渲染派生的约束，但在复杂环境中往往失败。此外，将稀疏的LiDAR数据与高斯点云渲染有效集成仍然是一个挑战。我们提出了一种统一的LiDAR-视觉系统，将高斯点云渲染与神经符号距离场（SDF）相结合。准确的LiDAR点云使得训练好的神经符号距离场能够提供流形几何场，这激励我们提出基于SDF的高斯初始化，用于物理上合理的原语放置，并提供全面的几何正则化，确保几何一致的渲染和重建。实验结果表明，在不同轨迹下，我们的方法在重建精度和渲染质量上表现出色。\n"
  },
  {
    "path": "abs/2503.10256.md",
    "content": "### ROODI: Reconstructing Occluded Objects with Denoising Inpainters\n\nWhile the quality of novel-view images has improved dramatically with 3D Gaussian Splatting, extracting specific objects from scenes remains challenging. Isolating individual 3D Gaussian primitives for each object and handling occlusions in scenes remain far from being solved. We propose a novel object extraction method based on two key principles: (1) being object-centric by pruning irrelevant primitives; and (2) leveraging generative inpainting to compensate for missing observations caused by occlusions. For pruning, we analyze the local structure of primitives using K-nearest neighbors, and retain only relevant ones. For inpainting, we employ an off-the-shelf diffusion-based inpainter combined with occlusion reasoning, utilizing the 3D representation of the entire scene. Our findings highlight the crucial synergy between pruning and inpainting, both of which significantly enhance extraction performance. We evaluate our method on a standard real-world dataset and introduce a synthetic dataset for quantitative analysis. Our approach outperforms the state-of-the-art, demonstrating its effectiveness in object extraction from complex scenes.\n\n尽管使用 3D 高斯溅射（3DGS）新视角图像的质量有了显著提高，但从场景中提取特定对象仍然具有挑战性。为每个对象孤立出单独的 3D 高斯原语并处理场景中的遮挡问题，仍远未解决。我们提出了一种基于两个关键原则的新型对象提取方法：(1) 以对象为中心，通过修剪不相关的原语；(2) 利用生成性修复来弥补由于遮挡导致的缺失观察。对于修剪，我们通过使用 K 最近邻分析原语的局部结构，并仅保留相关的原语。对于修复，我们采用一个现成的基于扩散的修复模型，并结合遮挡推理，利用整个场景的 3D 表征。我们的研究结果强调了修剪与修复之间的关键协同作用，这两者显著提高了提取性能。我们在一个标准的现实世界数据集上评估了我们的方法，并引入了一个合成数据集进行定量分析。实验结果表明，我们的方法优于最先进的技术，展示了它在复杂场景中进行对象提取的有效性。\n"
  },
  {
    "path": "abs/2503.10286.md",
    "content": "### VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames\n\nWe present VicaSplat, a novel framework for joint 3D Gaussians reconstruction and camera pose estimation from a sequence of unposed video frames, which is a critical yet underexplored task in real-world 3D applications. The core of our method lies in a novel transformer-based network architecture. In particular, our model starts with an image encoder that maps each image to a list of visual tokens. All visual tokens are concatenated with additional inserted learnable camera tokens. The obtained tokens then fully communicate with each other within a tailored transformer decoder. The camera tokens causally aggregate features from visual tokens of different views, and further modulate them frame-wisely to inject view-dependent features. 3D Gaussian splats and camera pose parameters can then be estimated via different prediction heads. Experiments show that VicaSplat surpasses baseline methods for multi-view inputs, and achieves comparable performance to prior two-view approaches. Remarkably, VicaSplat also demonstrates exceptional cross-dataset generalization capability on the ScanNet benchmark, achieving superior performance without any fine-tuning.\n\n我们提出了 VicaSplat，这是一个新的框架，用于从一系列无姿态视频帧中联合进行 3D 高斯重建和相机姿态估计，这是一个在现实世界 3D 应用中至关重要但尚未充分探索的任务。我们方法的核心在于一个新型的基于 Transformer 的网络架构。具体来说，我们的模型从一个图像编码器开始，将每个图像映射到一组视觉标记。所有视觉标记与额外插入的可学习相机标记一起进行拼接。然后，获得的标记在一个定制的 Transformer 解码器中完全相互通信。相机标记通过因果聚合来自不同视角的视觉标记特征，并进一步按帧调节这些特征，以注入视角依赖特征。随后，可以通过不同的预测头估计 3D 高斯溅射和相机姿态参数。实验表明，VicaSplat 在多视角输入下超越了基线方法，并且与先前的双视角方法性能相当。值得注意的是，VicaSplat 在 ScanNet 基准测试上还展示了卓越的跨数据集泛化能力，且无需任何微调即可获得优异的表现。\n"
  },
  {
    "path": "abs/2503.10437.md",
    "content": "### 4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models\n\nLearning 4D language fields to enable time-sensitive, open-ended language queries in dynamic scenes is essential for many real-world applications. While LangSplat successfully grounds CLIP features into 3D Gaussian representations, achieving precision and efficiency in 3D static scenes, it lacks the ability to handle dynamic 4D fields as CLIP, designed for static image-text tasks, cannot capture temporal dynamics in videos. Real-world environments are inherently dynamic, with object semantics evolving over time. Building a precise 4D language field necessitates obtaining pixel-aligned, object-wise video features, which current vision models struggle to achieve. To address these challenges, we propose 4D LangSplat, which learns 4D language fields to handle time-agnostic or time-sensitive open-vocabulary queries in dynamic scenes efficiently. 4D LangSplat bypasses learning the language field from vision features and instead learns directly from text generated from object-wise video captions via Multimodal Large Language Models (MLLMs). Specifically, we propose a multimodal object-wise video prompting method, consisting of visual and text prompts that guide MLLMs to generate detailed, temporally consistent, high-quality captions for objects throughout a video. These captions are encoded using a Large Language Model into high-quality sentence embeddings, which then serve as pixel-aligned, object-specific feature supervision, facilitating open-vocabulary text queries through shared embedding spaces. Recognizing that objects in 4D scenes exhibit smooth transitions across states, we further propose a status deformable network to model these continuous changes over time effectively. Our results across multiple benchmarks demonstrate that 4D LangSplat attains precise and efficient results for both time-sensitive and time-agnostic open-vocabulary queries.\n\n学习 4D 语言场以支持动态场景中的时间敏感和开放词汇查询对于许多现实世界的应用至关重要。虽然 LangSplat 成功地将 CLIP 特征与 3D 高斯表征结合，从而在 3D 静态场景中实现了精确性和效率，但它缺乏处理动态 4D 场的能力，因为 CLIP 主要为静态图像-文本任务设计，无法捕捉视频中的时间动态。现实世界中的环境本质上是动态的，物体语义随时间变化。构建精确的 4D 语言场需要获得像素对齐的、面向物体的视频特征，而当前的视觉模型在这方面存在困难。为了解决这些挑战，我们提出了 4D LangSplat，它通过学习 4D 语言场高效处理动态场景中的时间无关或时间敏感的开放词汇查询。4D LangSplat 避免了从视觉特征中学习语言场的过程，而是直接从通过多模态大语言模型（MLLMs）生成的物体视频字幕中的文本学习。具体来说，我们提出了一种多模态物体视频提示方法，包括视觉和文本提示，引导 MLLMs 生成视频中物体的详细、时间一致的高质量字幕。这些字幕通过大语言模型编码成高质量的句子嵌入，然后作为像素对齐的、面向物体的特征监督，为共享嵌入空间中的开放词汇文本查询提供支持。鉴于 4D 场景中的物体状态展现出平滑的状态过渡，我们进一步提出了一种状态可变形网络，有效地建模这些随时间变化的连续变化。我们在多个基准测试上的结果表明，4D LangSplat 在处理时间敏感和时间无关的开放词汇查询时，均能够实现精确且高效的结果。\n"
  },
  {
    "path": "abs/2503.10604.md",
    "content": "### MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction\n\nRecent breakthroughs in radiance fields have significantly advanced 3D scene reconstruction and novel view synthesis (NVS) in autonomous driving. Nevertheless, critical limitations persist: reconstruction-based methods exhibit substantial performance deterioration under significant viewpoint deviations from training trajectories, while generation-based techniques struggle with temporal coherence and precise scene controllability. To overcome these challenges, we present MuDG, an innovative framework that integrates Multi-modal Diffusion model with Gaussian Splatting (GS) for Urban Scene Reconstruction. MuDG leverages aggregated LiDAR point clouds with RGB and geometric priors to condition a multi-modal video diffusion model, synthesizing photorealistic RGB, depth, and semantic outputs for novel viewpoints. This synthesis pipeline enables feed-forward NVS without computationally intensive per-scene optimization, providing comprehensive supervision signals to refine 3DGS representations for rendering robustness enhancement under extreme viewpoint changes. Experiments on the Open Waymo Dataset demonstrate that MuDG outperforms existing methods in both reconstruction and synthesis quality.\n\n最近，辐射场的突破性进展显著推动了自动驾驶中的 3D 场景重建和新视角合成（NVS）。然而，仍然存在一些关键的局限性：基于重建的方法在视角偏离训练轨迹较大时性能显著下降，而基于生成的技术则在时间一致性和精确场景可控性方面存在问题。为了解决这些挑战，我们提出了 MuDG，这是一个创新框架，将多模态扩散模型与高斯溅射（GS）结合用于城市场景重建。MuDG 利用聚合的 LiDAR 点云、RGB 图像和几何先验来调节多模态视频扩散模型，从而合成新视角下的光照逼真的 RGB 图像、深度图和语义输出。该合成流程实现了前馈式的新视角合成，无需对每个场景进行计算密集型的优化，并提供全面的监督信号，优化 3DGS 表征，从而提升在极端视角变化下的渲染鲁棒性。在 Open Waymo 数据集上的实验表明，MuDG 在重建和合成质量方面均优于现有方法。\n"
  },
  {
    "path": "abs/2503.10625.md",
    "content": "### LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds\n\nAnimatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation. Recent advances in 3D human reconstruction mainly focus on static human modeling, and the reliance of using synthetic 3D scans for training limits their generalization ability. Conversely, optimization-based video methods achieve higher fidelity but demand controlled capture conditions and computationally intensive refinement processes. Motivated by the emergence of large reconstruction models for efficient static reconstruction, we propose LHM (Large Animatable Human Reconstruction Model) to infer high-fidelity avatars represented as 3D Gaussian splatting in a feed-forward pass. Our model leverages a multimodal transformer architecture to effectively encode the human body positional features and image features with attention mechanism, enabling detailed preservation of clothing geometry and texture. To further boost the face identity preservation and fine detail recovery, we propose a head feature pyramid encoding scheme to aggregate multi-scale features of the head regions. Extensive experiments demonstrate that our LHM generates plausible animatable human in seconds without post-processing for face and hands, outperforming existing methods in both reconstruction accuracy and generalization ability.\n\n从单张图像重建可动画的 3D 人体是一个具有挑战性的问题，因为在解耦几何形状、外观和变形时存在模糊性。近年来，3D 人体重建的进展主要集中在静态人体建模上，并且依赖使用合成 3D 扫描进行训练，限制了其泛化能力。相反，基于优化的视频方法能够实现更高的逼真度，但需要受控的拍摄条件并且计算上非常密集。受高效静态重建的大型重建模型的启发，我们提出了 LHM（大型可动画人体重建模型），通过前馈传递推断出以 3D 高斯溅射表示的高保真化身。我们的模型利用多模态 Transformer 架构，有效地编码人体的位置信息特征和图像特征，并结合注意力机制，能够细致地保留服装几何形状和纹理。为了进一步增强面部身份的保留和细节恢复，我们提出了一种头部特征金字塔编码方案，聚合头部区域的多尺度特征。大量实验表明，我们的 LHM 在没有后处理的情况下能够在几秒钟内生成可信的可动画人体，包括面部和手部，且在重建精度和泛化能力上均优于现有方法。\n"
  },
  {
    "path": "abs/2503.10860.md",
    "content": "### RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors\n\nIn this paper, we propose RI3D, a novel 3DGS-based approach that harnesses the power of diffusion models to reconstruct high-quality novel views given a sparse set of input images. Our key contribution is separating the view synthesis process into two tasks of reconstructing visible regions and hallucinating missing regions, and introducing two personalized diffusion models, each tailored to one of these tasks. Specifically, one model ('repair') takes a rendered image as input and predicts the corresponding high-quality image, which in turn is used as a pseudo ground truth image to constrain the optimization. The other model ('inpainting') primarily focuses on hallucinating details in unobserved areas. To integrate these models effectively, we introduce a two-stage optimization strategy: the first stage reconstructs visible areas using the repair model, and the second stage reconstructs missing regions with the inpainting model while ensuring coherence through further optimization. Moreover, we augment the optimization with a novel Gaussian initialization method that obtains per-image depth by combining 3D-consistent and smooth depth with highly detailed relative depth. We demonstrate that by separating the process into two tasks and addressing them with the repair and inpainting models, we produce results with detailed textures in both visible and missing regions that outperform state-of-the-art approaches on a diverse set of scenes with extremely sparse inputs.\n\n在本文中，我们提出了 RI3D，这是一种基于 3D 高斯溅射（3DGS）的方法，利用扩散模型的力量，在给定稀疏输入图像的情况下重建高质量的新视角。我们的关键贡献是将视角合成过程分为重建可见区域和虚拟缺失区域两个任务，并引入了两个个性化的扩散模型，每个模型针对其中一个任务进行了优化。具体来说，一个模型（“修复”）以渲染图像为输入，预测相应的高质量图像，该图像进一步作为伪地面真实图像用于约束优化。另一个模型（“修复”）主要集中在虚拟未观察到区域的细节。为了有效地集成这些模型，我们引入了两阶段优化策略：第一阶段使用修复模型重建可见区域，第二阶段使用修复模型重建缺失区域，并确保通过进一步优化保持一致性。此外，我们通过一种新的高斯初始化方法增强了优化过程，该方法通过将 3D 一致性和平滑深度与高度详细的相对深度结合来获取每个图像的深度。我们证明，通过将过程分为两个任务，并通过修复和修复模型解决它们，我们可以在可见和缺失区域中生成细致纹理的结果，在极其稀疏输入的多样场景上超越了最先进的技术。\n"
  },
  {
    "path": "abs/2503.11172.md",
    "content": "### Uncertainty-Aware Normal-Guided Gaussian Splatting for Surface Reconstruction from Sparse Image Sequences\n\n3D Gaussian Splatting (3DGS) has achieved impressive rendering performance in novel view synthesis. However, its efficacy diminishes considerably in sparse image sequences, where inherent data sparsity amplifies geometric uncertainty during optimization. This often leads to convergence at suboptimal local minima, resulting in noticeable structural artifacts in the reconstructed scenes. To mitigate these issues, we propose Uncertainty-aware Normal-Guided Gaussian Splatting (UNG-GS), a novel framework featuring an explicit Spatial Uncertainty Field (SUF) to quantify geometric uncertainty within the 3DGS pipeline. UNG-GS enables high-fidelity rendering and achieves high-precision reconstruction without relying on priors. Specifically, we first integrate Gaussian-based probabilistic modeling into the training of 3DGS to optimize the SUF, providing the model with adaptive error tolerance. An uncertainty-aware depth rendering strategy is then employed to weight depth contributions based on the SUF, effectively reducing noise while preserving fine details. Furthermore, an uncertainty-guided normal refinement method adjusts the influence of neighboring depth values in normal estimation, promoting robust results. Extensive experiments demonstrate that UNG-GS significantly outperforms state-of-the-art methods in both sparse and dense sequences.\n\n3D 高斯溅射（3DGS）在新视角合成中取得了令人印象深刻的渲染性能。然而，在稀疏图像序列中，它的效果显著下降，因为固有的数据稀疏性在优化过程中加剧了几何不确定性。这常常导致模型收敛到次优的局部极小值，从而在重建的场景中产生明显的结构性伪影。为了解决这些问题，我们提出了基于不确定性感知的法线引导高斯溅射（UNG-GS），这是一个新颖的框架，具有一个显式的空间不确定性场（SUF）来量化 3DGS 流程中的几何不确定性。UNG-GS 实现了高保真渲染，并且无需依赖先验即可实现高精度重建。具体来说，我们首先将基于高斯的概率建模集成到 3DGS 的训练过程中，以优化 SUF，从而为模型提供自适应的误差容忍度。然后，采用一种不确定性感知的深度渲染策略，根据 SUF 对深度贡献进行加权，有效地减少噪声，同时保留细节。此外，我们还提出了一种不确定性引导的法线优化方法，调整邻近深度值在法线估计中的影响，促进鲁棒性结果。大量实验表明，UNG-GS 在稀疏和密集序列中的表现均显著优于现有的最先进方法。\n"
  },
  {
    "path": "abs/2503.11345.md",
    "content": "### EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting\n\nEgocentric scenes exhibit frequent occlusions, varied viewpoints, and dynamic interactions compared to typical scene understanding tasks. Occlusions and varied viewpoints can lead to multi-view semantic inconsistencies, while dynamic objects may act as transient distractors, introducing artifacts into semantic feature modeling. To address these challenges, we propose EgoSplat, a language-embedded 3D Gaussian Splatting framework for open-vocabulary egocentric scene understanding. A multi-view consistent instance feature aggregation method is designed to leverage the segmentation and tracking capabilities of SAM2 to selectively aggregate complementary features across views for each instance, ensuring precise semantic representation of scenes. Additionally, an instance-aware spatial-temporal transient prediction module is constructed to improve spatial integrity and temporal continuity in predictions by incorporating spatial-temporal associations across multi-view instances, effectively reducing artifacts in the semantic reconstruction of egocentric scenes. EgoSplat achieves state-of-the-art performance in both localization and segmentation tasks on two datasets, outperforming existing methods with a 8.2% improvement in localization accuracy and a 3.7% improvement in segmentation mIoU on the ADT dataset, and setting a new benchmark in open-vocabulary egocentric scene understanding.\n\n与典型的场景理解任务相比，第一人称视角场景具有频繁的遮挡、变化的视角和动态交互。遮挡和视角变化可能导致多视角语义不一致，而动态物体可能作为瞬时干扰物，引入伪影到语义特征建模中。为了解决这些挑战，我们提出了 EgoSplat，一个语言嵌入的 3D 高斯溅射框架，用于开放词汇的第一人称视角场景理解。我们设计了一种多视角一致的实例特征聚合方法，利用 SAM2 的分割和跟踪能力，选择性地跨视角聚合每个实例的互补特征，从而确保场景的精确语义表示。此外，我们构建了一个实例感知的时空瞬时预测模块，通过结合跨视角实例的时空关联，改善预测中的空间完整性和时间连续性，有效地减少了第一人称视角场景语义重建中的伪影。EgoSplat 在两个数据集上的定位和分割任务中取得了最先进的性能，在 ADT 数据集上，定位精度提高了 8.2%，分割 mIoU 提高了 3.7%，超越了现有方法，设立了开放词汇第一人称视角场景理解的新基准。\n"
  },
  {
    "path": "abs/2503.11601.md",
    "content": "### Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information\n\nWe present a novel framework for enhancing the visual fidelity and consistency of text-guided 3D Gaussian Splatting (3DGS) editing. Existing editing approaches face two critical challenges: inconsistent geometric reconstructions across multiple viewpoints, particularly in challenging camera positions, and ineffective utilization of depth information during image manipulation, resulting in over-texture artifacts and degraded object boundaries. To address these limitations, we introduce: 1) A complementary information mutual learning network that enhances depth map estimation from 3DGS, enabling precise depth-conditioned 3D editing while preserving geometric structures. 2) A wavelet consensus attention mechanism that effectively aligns latent codes during the diffusion denoising process, ensuring multi-view consistency in the edited results. Through extensive experimentation, our method demonstrates superior performance in rendering quality and view consistency compared to state-of-the-art approaches. The results validate our framework as an effective solution for text-guided editing of 3D scenes.\n\n我们提出了一个新的框架，用于增强文本引导的 3D 高斯溅射（3DGS）编辑的视觉真实感和一致性。现有的编辑方法面临两个关键挑战：在多个视角下几何重建的不一致，特别是在具有挑战性的摄像机位置下；以及在图像操作过程中深度信息的无效利用，导致过度纹理伪影和物体边界的退化。为了解决这些局限性，我们引入了：1）一个互补信息互学习网络，增强了从 3DGS 获得的深度图估计，使得在保持几何结构的同时能够进行精确的深度条件 3D 编辑；2）一种小波共识注意机制，在扩散去噪过程中有效地对齐潜在代码，确保编辑结果在多视角下的一致性。通过广泛的实验，我们的方法在渲染质量和视角一致性方面，相较于最先进的方法，展现了更优的性能。实验结果验证了我们的框架作为文本引导 3D 场景编辑的有效解决方案。\n"
  },
  {
    "path": "abs/2503.11731.md",
    "content": "### Industrial-Grade Sensor Simulation via Gaussian Splatting: A Modular Framework for Scalable Editing and Full-Stack Validation\n\nSensor simulation is pivotal for scalable validation of autonomous driving systems, yet existing Neural Radiance Fields (NeRF) based methods face applicability and efficiency challenges in industrial workflows. This paper introduces a Gaussian Splatting (GS) based system to address these challenges: We first break down sensor simulator components and analyze the possible advantages of GS over NeRF. Then in practice, we refactor three crucial components through GS, to leverage its explicit scene representation and real-time rendering: (1) choosing the 2D neural Gaussian representation for physics-compliant scene and sensor modeling, (2) proposing a scene editing pipeline to leverage Gaussian primitives library for data augmentation, and (3) coupling a controllable diffusion model for scene expansion and harmonization. We implement this framework on a proprietary autonomous driving dataset supporting cameras and LiDAR sensors. We demonstrate through ablation studies that our approach reduces frame-wise simulation latency, achieves better geometric and photometric consistency, and enables interpretable explicit scene editing and expansion. Furthermore, we showcase how integrating such a GS-based sensor simulator with traffic and dynamic simulators enables full-stack testing of end-to-end autonomy algorithms. Our work provides both algorithmic insights and practical validation, establishing GS as a cornerstone for industrial-grade sensor simulation.\n\n传感器仿真对于自动驾驶系统的可扩展验证至关重要，但现有基于神经辐射场（NeRF）的方法在工业工作流程中面临适用性和效率的挑战。本文提出了一种基于高斯点云渲染（GS）的系统来解决这些挑战：我们首先分解了传感器仿真器的组成部分，并分析了GS相对于NeRF的潜在优势。然后在实际应用中，我们通过GS重构了三个关键组件，利用其显式的场景表示和实时渲染能力：（1）选择二维神经高斯表示进行符合物理的场景和传感器建模，（2）提出了一个场景编辑管道，利用高斯原语库进行数据增强，（3）将可控的扩散模型与场景扩展和协调结合。我们在一个支持摄像头和LiDAR传感器的专有自动驾驶数据集上实现了这一框架。通过消融实验，我们证明了我们的方法减少了每帧仿真延迟，达到了更好的几何和光度一致性，并且支持可解释的显式场景编辑和扩展。此外，我们展示了如何将这种基于GS的传感器仿真器与交通和动态仿真器集成，实现端到端自动驾驶算法的全栈测试。我们的工作为算法提供了深刻的见解，并提供了实践验证，确立了GS作为工业级传感器仿真基石的地位。\n"
  },
  {
    "path": "abs/2503.11978.md",
    "content": "### Snapmoji: Instant Generation of Animatable Dual-Stylized Avatars\n\nThe increasing popularity of personalized avatar systems, such as Snapchat Bitmojis and Apple Memojis, highlights the growing demand for digital self-representation. Despite their widespread use, existing avatar platforms face significant limitations, including restricted expressivity due to predefined assets, tedious customization processes, or inefficient rendering requirements. Addressing these shortcomings, we introduce Snapmoji, an avatar generation system that instantly creates animatable, dual-stylized avatars from a selfie. We propose Gaussian Domain Adaptation (GDA), which is pre-trained on large-scale Gaussian models using 3D data from sources such as Objaverse and fine-tuned with 2D style transfer tasks, endowing it with a rich 3D prior. This enables Snapmoji to transform a selfie into a primary stylized avatar, like the Bitmoji style, and apply a secondary style, such as Plastic Toy or Alien, all while preserving the user's identity and the primary style's integrity. Our system is capable of producing 3D Gaussian avatars that support dynamic animation, including accurate facial expression transfer. Designed for efficiency, Snapmoji achieves selfie-to-avatar conversion in just 0.9 seconds and supports real-time interactions on mobile devices at 30 to 40 frames per second. Extensive testing confirms that Snapmoji outperforms existing methods in versatility and speed, making it a convenient tool for automatic avatar creation in various styles.\n\n个性化头像系统的日益流行，如Snapchat的Bitmoji和Apple的Memojis，凸显了数字自我表现的需求不断增长。尽管这些平台得到广泛使用，但现有的头像平台面临着诸多限制，包括由于预定义资源导致的表现力受限、繁琐的定制过程和低效的渲染需求。为了解决这些问题，我们提出了Snapmoji，一种能够从自拍照片即时创建可动画化的双重风格头像的生成系统。我们提出了高斯领域自适应（GDA）方法，该方法在大规模高斯模型上进行预训练，利用如Objaverse等来源的三维数据，并通过二维风格迁移任务进行微调，使其具备丰富的三维先验。这使得Snapmoji能够将自拍转化为主要风格化的头像（如Bitmoji风格），并应用次级风格（如塑料玩具或外星人风格），同时保持用户的身份和主要风格的完整性。我们的系统能够生成支持动态动画的三维高斯头像，包括准确的面部表情转移。为了提高效率，Snapmoji实现了自拍到头像的转换仅需0.9秒，并支持在移动设备上以30至40帧每秒的实时交互。大量测试结果表明，Snapmoji在多样性和速度方面超越了现有方法，成为一种便捷的自动头像创建工具，适用于各种风格。\n"
  },
  {
    "path": "abs/2503.11979.md",
    "content": "### DynaGSLAM: Real-Time Gaussian-Splatting SLAM for Online Rendering, Tracking, Motion Predictions of Moving Objects in Dynamic Scenes\n\nSimultaneous Localization and Mapping (SLAM) is one of the most important environment-perception and navigation algorithms for computer vision, robotics, and autonomous cars/drones. Hence, high quality and fast mapping becomes a fundamental problem. With the advent of 3D Gaussian Splatting (3DGS) as an explicit representation with excellent rendering quality and speed, state-of-the-art (SOTA) works introduce GS to SLAM. Compared to classical pointcloud-SLAM, GS-SLAM generates photometric information by learning from input camera views and synthesize unseen views with high-quality textures. However, these GS-SLAM fail when moving objects occupy the scene that violate the static assumption of bundle adjustment. The failed updates of moving GS affects the static GS and contaminates the full map over long frames. Although some efforts have been made by concurrent works to consider moving objects for GS-SLAM, they simply detect and remove the moving regions from GS rendering (\"anti'' dynamic GS-SLAM), where only the static background could benefit from GS. To this end, we propose the first real-time GS-SLAM, \"DynaGSLAM'', that achieves high-quality online GS rendering, tracking, motion predictions of moving objects in dynamic scenes while jointly estimating accurate ego motion. Our DynaGSLAM outperforms SOTA static & \"Anti'' dynamic GS-SLAM on three dynamic real datasets, while keeping speed and memory efficiency in practice.\n\n同时定位与地图构建（SLAM）是计算机视觉、机器人技术以及自动驾驶汽车/无人机中最重要的环境感知与导航算法之一。因此，高质量且快速的地图构建成为一个基本问题。随着三维高斯点云渲染（3DGS）作为一种显式表示方法的出现，具有优异的渲染质量和速度，最先进的研究（SOTA）将高斯点云引入SLAM。与传统的点云SLAM相比，GS-SLAM通过从输入摄像头视角中学习生成光度信息，并利用高质量的纹理合成未见过的视角。然而，当场景中出现运动物体时，这些GS-SLAM会失败，因为运动物体违反了束束调整的静态假设。运动物体的更新失败会影响静态高斯点云，导致长时间的地图污染。尽管有一些并行工作已经开始尝试考虑运动物体对GS-SLAM的影响，它们只是简单地检测并移除GS渲染中的运动区域（即“反”动态GS-SLAM），使得只有静态背景能够从GS中受益。为此，我们提出了第一个实时GS-SLAM算法——“DynaGSLAM”，它能够在动态场景中实现高质量的在线GS渲染、跟踪和运动物体预测，同时联合估计准确的自我运动。我们的DynaGSLAM在三个动态真实数据集上优于SOTA静态和“反”动态GS-SLAM，并在实践中保持速度和内存效率。\n"
  },
  {
    "path": "abs/2503.11981.md",
    "content": "### DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting\n\nText-to-3D generation saw dramatic advances in recent years by leveraging Text-to-Image models. However, most existing techniques struggle with compositional prompts, which describe multiple objects and their spatial relationships. They often fail to capture fine-grained inter-object interactions. We introduce DecompDreamer, a Gaussian splatting-based training routine designed to generate high-quality 3D compositions from such complex prompts. DecompDreamer leverages Vision-Language Models (VLMs) to decompose scenes into structured components and their relationships. We propose a progressive optimization strategy that first prioritizes joint relationship modeling before gradually shifting toward targeted object refinement. Our qualitative and quantitative evaluations against state-of-the-art text-to-3D models demonstrate that DecompDreamer effectively generates intricate 3D compositions with superior object disentanglement, offering enhanced control and flexibility in 3D generation.\n\n近年来，通过利用文本到图像（Text-to-Image）模型，文本到三维（Text-to-3D）生成取得了显著进展。然而，大多数现有技术在处理组合性提示时存在困难，组合性提示描述了多个物体及其空间关系。这些技术通常难以捕捉物体间的细粒度交互。我们提出了DecompDreamer，一种基于高斯点云渲染的训练流程，旨在从这些复杂的提示中生成高质量的三维组合。DecompDreamer利用视觉语言模型（VLMs）将场景分解为结构化的组件及其关系。我们提出了一种渐进式优化策略，首先优先进行联合关系建模，然后逐步转向目标物体的精细化优化。我们与最先进的文本到三维模型进行了定性和定量评估，结果表明，DecompDreamer能够有效地生成复杂的三维组合，具有优越的物体解耦能力，在三维生成中提供了更强的控制力和灵活性。\n"
  },
  {
    "path": "abs/2503.12001.md",
    "content": "### 3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction\n\nThe accurate reconstruction of dynamic street scenes is critical for applications in autonomous driving, augmented reality, and virtual reality. Traditional methods relying on dense point clouds and triangular meshes struggle with moving objects, occlusions, and real-time processing constraints, limiting their effectiveness in complex urban environments. While multi-view stereo and neural radiance fields have advanced 3D reconstruction, they face challenges in computational efficiency and handling scene dynamics. This paper proposes a novel 3D Gaussian point distribution method for dynamic street scene reconstruction. Our approach introduces an adaptive transparency mechanism that eliminates moving objects while preserving high-fidelity static scene details. Additionally, iterative refinement of Gaussian point distribution enhances geometric accuracy and texture representation. We integrate directional encoding with spatial position optimization to optimize storage and rendering efficiency, reducing redundancy while maintaining scene integrity. Experimental results demonstrate that our method achieves high reconstruction quality, improved rendering performance, and adaptability in large-scale dynamic environments. These contributions establish a robust framework for real-time, high-precision 3D reconstruction, advancing the practicality of dynamic scene modeling across multiple applications.\n\n动态街景的准确重建对于自动驾驶、增强现实和虚拟现实等应用至关重要。传统方法依赖于密集点云和三角网格，但在处理运动物体、遮挡和实时处理限制时存在困难，这限制了它们在复杂城市环境中的有效性。尽管多视角立体视觉和神经辐射场（Neural Radiance Fields）在三维重建方面取得了进展，但它们在计算效率和处理场景动态方面仍面临挑战。本文提出了一种用于动态街景重建的全新三维高斯点分布方法。我们的方法引入了一种自适应透明度机制，可以消除运动物体，同时保留高保真的静态场景细节。此外，迭代优化高斯点分布提高了几何精度和纹理表现。我们结合方向编码与空间位置优化，以优化存储和渲染效率，减少冗余的同时保持场景完整性。实验结果表明，我们的方法在大规模动态环境中实现了高质量的重建、更好的渲染性能以及较强的适应性。这些贡献为实时高精度三维重建建立了一个稳健的框架，推动了动态场景建模在多个应用中的实际可行性。\n"
  },
  {
    "path": "abs/2503.12284.md",
    "content": "### REdiSplats: Ray Tracing for Editable Gaussian Splatting\n\nGaussian Splatting (GS) has become one of the most important neural rendering algorithms. GS represents 3D scenes using Gaussian components with trainable color and opacity. This representation achieves high-quality renderings with fast inference. Regrettably, it is challenging to integrate such a solution with varying light conditions, including shadows and light reflections, manual adjustments, and a physical engine. Recently, a few approaches have appeared that incorporate ray-tracing or mesh primitives into GS to address some of these caveats. However, no such solution can simultaneously solve all the existing limitations of the classical GS. Consequently, we introduce REdiSplats, which employs ray tracing and a mesh-based representation of flat 3D Gaussians. In practice, we model the scene using flat Gaussian distributions parameterized by the mesh. We can leverage fast ray tracing and control Gaussian modification by adjusting the mesh vertices. Moreover, REdiSplats allows modeling of light conditions, manual adjustments, and physical simulation. Furthermore, we can render our models using 3D tools such as Blender or Nvdiffrast, which opens the possibility of integrating them with all existing 3D graphics techniques dedicated to mesh representations.\n\n高斯点云渲染（GS）已成为最重要的神经渲染算法之一。GS使用具有可训练颜色和不透明度的高斯组件表示三维场景。这种表示方法能够实现高质量的渲染并具有快速推理能力。遗憾的是，将这种解决方案与变化的光照条件（包括阴影和光线反射）、手动调整和物理引擎结合起来是具有挑战性的。近年来，一些方法尝试将光线追踪或网格原语融入GS，以解决其中的一些问题。然而，现有的解决方案无法同时解决经典GS的所有局限性。因此，我们提出了REdiSplats，它采用光线追踪和基于网格的平面三维高斯表示。在实践中，我们通过网格对平面高斯分布进行参数化来建模场景。我们可以利用快速的光线追踪，并通过调整网格顶点来控制高斯修改。此外，REdiSplats还支持光照条件、手动调整和物理仿真建模。此外，我们可以使用Blender或Nvdiffrast等三维工具渲染我们的模型，这为将其与所有现有的专门针对网格表示的三维图形技术集成提供了可能性。\n"
  },
  {
    "path": "abs/2503.12307.md",
    "content": "### Swift4D:Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene\n\nNovel view synthesis has long been a practical but challenging task, although the introduction of numerous methods to solve this problem, even combining advanced representations like 3D Gaussian Splatting, they still struggle to recover high-quality results and often consume too much storage memory and training time. In this paper we propose Swift4D, a divide-and-conquer 3D Gaussian Splatting method that can handle static and dynamic primitives separately, achieving a good trade-off between rendering quality and efficiency, motivated by the fact that most of the scene is the static primitive and does not require additional dynamic properties. Concretely, we focus on modeling dynamic transformations only for the dynamic primitives which benefits both efficiency and quality. We first employ a learnable decomposition strategy to separate the primitives, which relies on an additional parameter to classify primitives as static or dynamic. For the dynamic primitives, we employ a compact multi-resolution 4D Hash mapper to transform these primitives from canonical space into deformation space at each timestamp, and then mix the static and dynamic primitives to produce the final output. This divide-and-conquer method facilitates efficient training and reduces storage redundancy. Our method not only achieves state-of-the-art rendering quality while being 20X faster in training than previous SOTA methods with a minimum storage requirement of only 30MB on real-world datasets.\n\n新颖视角合成一直是一个实用但具有挑战性的任务，尽管已有许多方法试图解决这一问题，甚至结合了像三维高斯点云渲染（3D Gaussian Splatting）这样的先进表示，但它们仍然难以恢复高质量的结果，且往往消耗过多的存储内存和训练时间。本文提出了Swift4D，一种分治的三维高斯点云渲染方法，可以分别处理静态和动态原语，在渲染质量和效率之间实现良好的平衡，灵感来源于大部分场景是静态原语，不需要额外的动态属性。具体来说，我们仅针对动态原语建模动态变换，这对效率和质量都有好处。我们首先采用一种可学习的分解策略，将原语分离开来，该策略依赖于额外的参数将原语分类为静态或动态。对于动态原语，我们使用紧凑的多分辨率4D哈希映射器，将这些原语从规范空间转换到变形空间，并在每个时间戳下进行处理，之后将静态和动态原语混合，生成最终输出。这种分治方法促进了高效的训练，并减少了存储冗余。我们的方法不仅实现了最先进的渲染质量，而且在训练速度上比以往的SOTA方法快20倍，并且在实际数据集上仅需30MB的最低存储需求。\n"
  },
  {
    "path": "abs/2503.12335.md",
    "content": "### GS-I3: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images\n\nAccurate geometric surface reconstruction, providing essential environmental information for navigation and manipulation tasks, is critical for enabling robotic self-exploration and interaction. Recently, 3D Gaussian Splatting (3DGS) has gained significant attention in the field of surface reconstruction due to its impressive geometric quality and computational efficiency. While recent relevant advancements in novel view synthesis under inconsistent illumination using 3DGS have shown promise, the challenge of robust surface reconstruction under such conditions is still being explored. To address this challenge, we propose a method called GS-3I. Specifically, to mitigate 3D Gaussian optimization bias caused by underexposed regions in single-view images, based on Convolutional Neural Network (CNN), a tone mapping correction framework is introduced. Furthermore, inconsistent lighting across multi-view images, resulting from variations in camera settings and complex scene illumination, often leads to geometric constraint mismatches and deviations in the reconstructed surface. To overcome this, we propose a normal compensation mechanism that integrates reference normals extracted from single-view image with normals computed from multi-view observations to effectively constrain geometric inconsistencies. Extensive experimental evaluations demonstrate that GS-3I can achieve robust and accurate surface reconstruction across complex illumination scenarios, highlighting its effectiveness and versatility in this critical challenge.\n\n准确的几何表面重建为导航和操作任务提供了必要的环境信息，对于实现机器人自我探索和交互至关重要。近年来，三维高斯点云渲染（3DGS）因其出色的几何质量和计算效率，在表面重建领域引起了广泛关注。尽管在不一致光照条件下使用3DGS进行新颖视角合成的相关进展已展现出潜力，但在此类条件下实现鲁棒的表面重建仍是一个正在探索的挑战。为了解决这一挑战，我们提出了一种名为GS-3I的方法。具体而言，为了减轻单视图图像中由于曝光不足区域造成的3D高斯优化偏差，我们引入了基于卷积神经网络（CNN）的色调映射校正框架。此外，由于相机设置的变化和复杂场景的光照条件，多个视图图像中的不一致光照往往导致几何约束不匹配和重建表面的偏差。为了解决这一问题，我们提出了一种法线补偿机制，将从单视图图像提取的参考法线与从多视图观测计算出的法线结合，以有效地约束几何不一致性。广泛的实验评估表明，GS-3I能够在复杂光照场景下实现鲁棒且准确的表面重建，突显了其在这一关键挑战中的有效性和多样性。\n"
  },
  {
    "path": "abs/2503.12343.md",
    "content": "### TopoGaussian: Inferring Internal Topology Structures from Visual Clues\n\nWe present TopoGaussian, a holistic, particle-based pipeline for inferring the interior structure of an opaque object from easily accessible photos and videos as input. Traditional mesh-based approaches require tedious and error-prone mesh filling and fixing process, while typically output rough boundary surface. Our pipeline combines Gaussian Splatting with a novel, versatile particle-based differentiable simulator that simultaneously accommodates constitutive model, actuator, and collision, without interference with mesh. Based on the gradients from this simulator, we provide flexible choice of topology representation for optimization, including particle, neural implicit surface, and quadratic surface. The resultant pipeline takes easily accessible photos and videos as input and outputs the topology that matches the physical characteristics of the input. We demonstrate the efficacy of our pipeline on a synthetic dataset and four real-world tasks with 3D-printed prototypes. Compared with existing mesh-based method, our pipeline is 5.26x faster on average with improved shape quality. These results highlight the potential of our pipeline in 3D vision, soft robotics, and manufacturing applications.\n\n我们提出了TopoGaussian，一种全面的基于粒子的流程，用于从易于获取的照片和视频推断不透明物体的内部结构。传统的基于网格的方法需要繁琐且容易出错的网格填充和修复过程，且通常只能输出粗略的边界表面。我们的流程将高斯点云渲染与一种新颖的、通用的粒子基差分模拟器相结合，该模拟器能够同时处理材料模型、执行器和碰撞，而不与网格发生冲突。基于该模拟器的梯度，我们为优化提供了灵活的拓扑表示选择，包括粒子、神经隐式表面和二次曲面。最终的流程以易于获取的照片和视频为输入，输出与输入物理特性匹配的拓扑结构。我们在一个合成数据集和四个实际任务（使用3D打印原型）上演示了我们流程的有效性。与现有的基于网格的方法相比，我们的流程在平均速度上提高了5.26倍，并且改善了形状质量。这些结果突显了我们流程在三维视觉、软机器人技术和制造应用中的潜力。\n"
  },
  {
    "path": "abs/2503.12383.md",
    "content": "### VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting\n\nWe propose VRSketch2Gaussian, a first VR sketch-guided, multi-modal, native 3D object generation framework that incorporates a 3D Gaussian Splatting representation. As part of our work, we introduce VRSS, the first large-scale paired dataset containing VR sketches, text, images, and 3DGS, bridging the gap in multi-modal VR sketch-based generation. Our approach features the following key innovations: 1) Sketch-CLIP feature alignment. We propose a two-stage alignment strategy that bridges the domain gap between sparse VR sketch embeddings and rich CLIP embeddings, facilitating both VR sketch-based retrieval and generation tasks. 2) Fine-Grained multi-modal conditioning. We disentangle the 3D generation process by using explicit VR sketches for geometric conditioning and text descriptions for appearance control. To facilitate this, we propose a generalizable VR sketch encoder that effectively aligns different modalities. 3) Efficient and high-fidelity 3D native generation. Our method leverages a 3D-native generation approach that enables fast and texture-rich 3D object synthesis. Experiments conducted on our VRSS dataset demonstrate that our method achieves high-quality, multi-modal VR sketch-based 3D generation. We believe our VRSS dataset and VRsketch2Gaussian method will be beneficial for the 3D generation community.\n\n我们提出了VRSketch2Gaussian，这是一个首个基于虚拟现实（VR）草图指导的多模态原生三维物体生成框架，结合了三维高斯点云渲染（3D Gaussian Splatting）表示。作为我们工作的一个部分，我们引入了VRSS，这是首个包含VR草图、文本、图像和3DGS的大规模配对数据集，填补了多模态VR草图基础生成中的空白。我们的方法具有以下关键创新：1）草图-CLIP特征对齐。我们提出了一种两阶段的对齐策略，弥合了稀疏的VR草图嵌入和丰富的CLIP嵌入之间的领域差距，从而促进了VR草图基础的检索和生成任务。2）细粒度的多模态条件控制。我们通过使用显式的VR草图进行几何条件控制，并使用文本描述进行外观控制，将三维生成过程解耦。为此，我们提出了一种通用的VR草图编码器，有效地对齐了不同的模态。3）高效且高保真的原生三维生成。我们的方法利用一种原生三维生成方法，实现了快速且富有纹理的三维物体合成。我们在VRSS数据集上的实验表明，我们的方法实现了高质量的多模态VR草图基础三维生成。我们相信，VRSS数据集和VRSketch2Gaussian方法将对三维生成社区产生积极影响。\n"
  },
  {
    "path": "abs/2503.12535.md",
    "content": "### SPC-GS: Gaussian Splatting with Semantic-Prompt Consistency for Indoor Open-World Free-view Synthesis from Sparse Inputs\n\n3D Gaussian Splatting-based indoor open-world free-view synthesis approaches have shown significant performance with dense input images. However, they exhibit poor performance when confronted with sparse inputs, primarily due to the sparse distribution of Gaussian points and insufficient view supervision. To relieve these challenges, we propose SPC-GS, leveraging Scene-layout-based Gaussian Initialization (SGI) and Semantic-Prompt Consistency (SPC) Regularization for open-world free view synthesis with sparse inputs. Specifically, SGI provides a dense, scene-layout-based Gaussian distribution by utilizing view-changed images generated from the video generation model and view-constraint Gaussian points densification. Additionally, SPC mitigates limited view supervision by employing semantic-prompt-based consistency constraints developed by SAM2. This approach leverages available semantics from training views, serving as instructive prompts, to optimize visually overlapping regions in novel views with 2D and 3D consistency constraints. Extensive experiments demonstrate the superior performance of SPC-GS across Replica and ScanNet benchmarks. Notably, our SPC-GS achieves a 3.06 dB gain in PSNR for reconstruction quality and a 7.3% improvement in mIoU for open-world semantic segmentation.\n\n基于三维高斯点云渲染（3D Gaussian Splatting）的室内开放世界自由视角合成方法在密集输入图像下表现出显著的性能。然而，当面对稀疏输入时，它们的表现较差，主要是由于高斯点的稀疏分布和视角监督不足。为了解决这些挑战，我们提出了SPC-GS，利用基于场景布局的高斯初始化（SGI）和语义提示一致性（SPC）正则化来进行稀疏输入的开放世界自由视角合成。具体而言，SGI通过利用视频生成模型生成的视角变化图像和视角约束的高斯点密集化，提供了一个密集的基于场景布局的高斯分布。此外，SPC通过采用由SAM2开发的语义提示一致性约束，缓解了有限视角监督的问题。该方法利用训练视角中可用的语义信息作为指导提示，优化新视角中视觉重叠区域，结合2D和3D一致性约束。广泛的实验表明，SPC-GS在Replica和ScanNet基准测试中的表现优越。值得注意的是，我们的SPC-GS在重建质量上取得了3.06 dB的PSNR提升，并且在开放世界语义分割中实现了7.3%的mIoU改善。\n"
  },
  {
    "path": "abs/2503.12552.md",
    "content": "### MTGS: Multi-Traversal Gaussian Splatting\n\nMulti-traversal data, commonly collected through daily commutes or by self-driving fleets, provides multiple viewpoints for scene reconstruction within a road block. This data offers significant potential for high-quality novel view synthesis, which is crucial for applications such as autonomous vehicle simulators. However, inherent challenges in multi-traversal data often result in suboptimal reconstruction quality, including variations in appearance and the presence of dynamic objects. To address these issues, we propose Multi-Traversal Gaussian Splatting (MTGS), a novel approach that reconstructs high-quality driving scenes from arbitrarily collected multi-traversal data by modeling a shared static geometry while separately handling dynamic elements and appearance variations. Our method employs a multi-traversal dynamic scene graph with a shared static node and traversal-specific dynamic nodes, complemented by color correction nodes with learnable spherical harmonics coefficient residuals. This approach enables high-fidelity novel view synthesis and provides flexibility to navigate any viewpoint. We conduct extensive experiments on a large-scale driving dataset, nuPlan, with multi-traversal data. Our results demonstrate that MTGS improves LPIPS by 23.5% and geometry accuracy by 46.3% compared to single-traversal baselines.\n\n多次穿越数据通常通过日常通勤或自动驾驶车队收集，提供了用于路段内场景重建的多个视角。这些数据为高质量的新颖视角合成提供了显著潜力，这对于自动驾驶汽车模拟器等应用至关重要。然而，多次穿越数据中固有的挑战常常导致重建质量不理想，包括外观变化和动态物体的存在。为了解决这些问题，我们提出了多次穿越高斯点云渲染（MTGS），这是一种通过建模共享静态几何结构，同时单独处理动态元素和外观变化，从任意收集的多次穿越数据中重建高质量驾驶场景的新方法。我们的方法采用了一个多次穿越动态场景图，包含一个共享的静态节点和特定于每次穿越的动态节点，并辅以带有可学习球谐系数残差的颜色校正节点。该方法能够实现高保真的新颖视角合成，并提供在任意视角下进行导航的灵活性。我们在一个大规模驾驶数据集nuPlan上进行了大量实验，使用了多次穿越数据。实验结果表明，与单次穿越基线相比，MTGS在LPIPS上提高了23.5%，在几何准确性上提高了46.3%。\n"
  },
  {
    "path": "abs/2503.12553.md",
    "content": "### Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View\n\nRecent advances in single-view 3D scene reconstruction have highlighted the challenges in capturing fine geometric details and ensuring structural consistency, particularly in high-fidelity outdoor scene modeling. This paper presents Niagara, a new single-view 3D scene reconstruction framework that can faithfully reconstruct challenging outdoor scenes from a single input image for the first time.\nOur approach integrates monocular depth and normal estimation as input, which substantially improves its ability to capture fine details, mitigating common issues like geometric detail loss and deformation.\nAdditionally, we introduce a geometric affine field (GAF) and 3D self-attention as geometry-constraint, which combines the structural properties of explicit geometry with the adaptability of implicit feature fields, striking a balance between efficient rendering and high-fidelity reconstruction.\nOur framework finally proposes a specialized encoder-decoder architecture, where a depth-based 3D Gaussian decoder is proposed to predict 3D Gaussian parameters, which can be used for novel view synthesis. Extensive results and analyses suggest that our Niagara surpasses prior SoTA approaches such as Flash3D in both single-view and dual-view settings, significantly enhancing the geometric accuracy and visual fidelity, especially in outdoor scenes.\n\n近年来，单视角三维场景重建的进展突显了捕捉细致几何细节和确保结构一致性方面的挑战，尤其是在高保真户外场景建模中。本文提出了Niagara，一种新的单视角三维场景重建框架，首次能够从单张输入图像中真实地重建具有挑战性的户外场景。\n我们的方法将单眼深度和法线估计作为输入，显著提高了捕捉细节的能力，缓解了几何细节丧失和变形等常见问题。此外，我们引入了几何仿射场（GAF）和三维自注意力作为几何约束，这将显式几何的结构特性与隐式特征场的适应性结合，平衡了高效渲染与高保真重建之间的关系。\n我们的框架最终提出了一种专门的编码器-解码器架构，其中提出了基于深度的三维高斯解码器，用于预测三维高斯参数，这些参数可以用于新颖视角合成。广泛的结果和分析表明，我们的Niagara在单视角和双视角设置中均优于先前的最先进方法，如Flash3D，显著提高了几何准确性和视觉保真度，特别是在户外场景中。\n"
  },
  {
    "path": "abs/2503.12572.md",
    "content": "### Deblur Gaussian Splatting SLAM\n\nWe present Deblur-SLAM, a robust RGB SLAM pipeline designed to recover sharp reconstructions from motion-blurred inputs. The proposed method bridges the strengths of both frame-to-frame and frame-to-model approaches to model sub-frame camera trajectories that lead to high-fidelity reconstructions in motion-blurred settings. Moreover, our pipeline incorporates techniques such as online loop closure and global bundle adjustment to achieve a dense and precise global trajectory. We model the physical image formation process of motion-blurred images and minimize the error between the observed blurry images and rendered blurry images obtained by averaging sharp virtual sub-frame images. Additionally, by utilizing a monocular depth estimator alongside the online deformation of Gaussians, we ensure precise mapping and enhanced image deblurring. The proposed SLAM pipeline integrates all these components to improve the results. We achieve state-of-the-art results for sharp map estimation and sub-frame trajectory recovery both on synthetic and real-world blurry input data.\n\n我们提出了Deblur-SLAM，这是一种强大的RGB SLAM管道，旨在从运动模糊的输入中恢复出清晰的重建。所提出的方法结合了帧对帧和帧对模型两种方法的优点，能够建模子帧相机轨迹，从而在运动模糊环境下实现高保真的重建。此外，我们的管道还采用了在线闭环检测和全局束束调整等技术，以实现密集且精确的全局轨迹。我们建模了运动模糊图像的物理图像形成过程，并最小化观测到的模糊图像与通过平均清晰虚拟子帧图像得到的渲染模糊图像之间的误差。此外，通过利用单目深度估计器和在线高斯点形变，我们确保了精确的地图映射和增强的图像去模糊效果。所提出的SLAM管道将所有这些组件集成在一起，提升了结果。在合成数据和真实世界的模糊输入数据上，我们在清晰地图估计和子帧轨迹恢复方面都达到了最先进的结果。\n"
  },
  {
    "path": "abs/2503.12836.md",
    "content": "### CompMarkGS: Robust Watermarking for Compression 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) enables rapid differentiable rendering for 3D reconstruction and novel view synthesis, leading to its widespread commercial use. Consequently, copyright protection via watermarking has become critical. However, because 3DGS relies on millions of Gaussians, which require gigabytes of storage, efficient transfer and storage require compression. Existing 3DGS watermarking methods are vulnerable to quantization-based compression, often resulting in the loss of the embedded watermark. To address this challenge, we propose a novel watermarking method that ensures watermark robustness after model compression while maintaining high rendering quality. In detail, we incorporate a quantization distortion layer that simulates compression during training, preserving the watermark under quantization-based compression. Also, we propose a learnable watermark embedding feature that embeds the watermark into the anchor feature, ensuring structural consistency and seamless integration into the 3D scene. Furthermore, we present a frequency-aware anchor growing mechanism to enhance image quality in high-frequency regions by effectively identifying Guassians within these regions. Experimental results confirm that our method preserves the watermark and maintains superior image quality under high compression, validating it as a promising approach for a secure 3DGS model.\n\n三维高斯点云渲染（3DGS）实现了快速的可微分渲染，用于三维重建和新颖视角合成，推动了其广泛的商业应用。因此，通过水印保护版权变得至关重要。然而，由于3DGS依赖于数百万个高斯点，且需要数GB的存储空间，因此高效的传输和存储需要压缩。现有的3DGS水印方法容易受到基于量化的压缩的影响，通常导致嵌入的水印丢失。为了解决这一挑战，我们提出了一种新颖的水印方法，确保在模型压缩后水印的鲁棒性，同时保持高渲染质量。具体而言，我们引入了一个量化失真层，在训练过程中模拟压缩，从而在基于量化的压缩下保持水印。此外，我们提出了一种可学习的水印嵌入特征，将水印嵌入到锚点特征中，确保结构一致性，并无缝集成到三维场景中。进一步地，我们提出了一种频率感知的锚点扩展机制，通过有效识别高频区域中的高斯点来增强这些区域的图像质量。实验结果验证了我们的方法在高压缩情况下能够保持水印，并维持优越的图像质量，证明其是一个有前景的安全3DGS模型方法。\n"
  },
  {
    "path": "abs/2503.12862.md",
    "content": "### CAT-3DGS Pro: A New Benchmark for Efficient 3DGS Compression\n\n3D Gaussian Splatting (3DGS) has shown immense potential for novel view synthesis. However, achieving rate-distortion-optimized compression of 3DGS representations for transmission and/or storage applications remains a challenge. CAT-3DGS introduces a context-adaptive triplane hyperprior for end-to-end optimized compression, delivering state-of-the-art coding performance. Despite this, it requires prolonged training and decoding time. To address these limitations, we propose CAT-3DGS Pro, an enhanced version of CAT-3DGS that improves both compression performance and computational efficiency. First, we introduce a PCA-guided vector-matrix hyperprior, which replaces the triplane-based hyperprior to reduce redundant parameters. To achieve a more balanced rate-distortion trade-off and faster encoding, we propose an alternate optimization strategy (A-RDO). Additionally, we refine the sampling rate optimization method in CAT-3DGS, leading to significant improvements in rate-distortion performance. These enhancements result in a 46.6% BD-rate reduction and 3x speedup in training time on BungeeNeRF, while achieving 5x acceleration in decoding speed for the Amsterdam scene compared to CAT-3DGS.\n\n三维高斯点云渲染（3DGS）在新颖视角合成方面展现了巨大潜力。然而，实现针对传输和/或存储应用的3DGS表示的率失真优化压缩仍然是一个挑战。CAT-3DGS引入了上下文自适应三平面超先验（triplane hyperprior）用于端到端优化压缩，提供了最先进的编码性能。尽管如此，它仍然需要较长的训练和解码时间。为了解决这些限制，我们提出了CAT-3DGS Pro，这是CAT-3DGS的增强版本，改进了压缩性能和计算效率。首先，我们引入了一种基于主成分分析（PCA）的向量矩阵超先验，替代了基于三平面的超先验，以减少冗余参数。为了实现更平衡的率失真权衡和更快的编码速度，我们提出了一种替代优化策略（A-RDO）。此外，我们优化了CAT-3DGS中的采样率优化方法，从而显著提高了率失真性能。这些改进使得在BungeeNeRF上实现了46.6%的BD-rate降低和3倍的训练时间加速，而在Amsterdam场景的解码速度上相较于CAT-3DGS实现了5倍的加速。\n"
  },
  {
    "path": "abs/2503.12886.md",
    "content": "### RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars\n\nWe present Reduced Gaussian Blendshapes Avatar (RGBAvatar), a method for reconstructing photorealistic, animatable head avatars at speeds sufficient for on-the-fly reconstruction. Unlike prior approaches that utilize linear bases from 3D morphable models (3DMM) to model Gaussian blendshapes, our method maps tracked 3DMM parameters into reduced blendshape weights with an MLP, leading to a compact set of blendshape bases. The learned compact base composition effectively captures essential facial details for specific individuals, and does not rely on the fixed base composition weights of 3DMM, leading to enhanced reconstruction quality and higher efficiency. To further expedite the reconstruction process, we develop a novel color initialization estimation method and a batch-parallel Gaussian rasterization process, achieving state-of-the-art quality with training throughput of about 630 images per second. Moreover, we propose a local-global sampling strategy that enables direct on-the-fly reconstruction, immediately reconstructing the model as video streams in real time while achieving quality comparable to offline settings.\n\n我们提出了减少高斯混合形状头像（RGBAvatar）的方法，用于以足够的速度进行即时重建，重建出逼真的、可动画化的头部头像。与先前的方法不同，先前的方法使用来自三维可变形模型（3DMM）的线性基来建模高斯混合形状，而我们的方法通过使用多层感知器（MLP）将追踪到的3DMM参数映射到减少的混合形状权重，从而得到一组紧凑的混合形状基。学习到的紧凑基组合有效捕捉了特定个体的面部细节，并且不依赖于3DMM固定基组合的权重，从而提高了重建质量和效率。为了进一步加速重建过程，我们开发了一种新颖的颜色初始化估计方法和批量并行高斯光栅化过程，实现了最先进的质量，并且训练吞吐量约为每秒630张图像。此外，我们提出了一种局部-全局采样策略，使得能够直接进行即时重建，在实时视频流中立即重建模型，同时达到与离线设置相当的质量。\n"
  },
  {
    "path": "abs/2503.13086.md",
    "content": "### Gaussian On-the-Fly Splatting: A Progressive Framework for Robust Near Real-Time 3DGS Optimization\n\n3D Gaussian Splatting (3DGS) achieves high-fidelity rendering with fast real-time performance, but existing methods rely on offline training after full Structure-from-Motion (SfM) processing. In contrast, this work introduces On-the-Fly GS, a progressive framework enabling near real-time 3DGS optimization during image capture. As each image arrives, its pose and sparse points are updated via on-the-fly SfM, and newly optimized Gaussians are immediately integrated into the 3DGS field. We propose a progressive local optimization strategy to prioritize new images and their neighbors by their corresponding overlapping relationship, allowing the new image and its overlapping images to get more training. To further stabilize training across old and new images, an adaptive learning rate schedule balances the iterations and the learning rate. Moreover, to maintain overall quality of the 3DGS field, an efficient global optimization scheme prevents overfitting to the newly added images. Experiments on multiple benchmark datasets show that our On-the-Fly GS reduces training time significantly, optimizing each new image in seconds with minimal rendering loss, offering the first practical step toward rapid, progressive 3DGS reconstruction.\n\n三维高斯点云渲染（3DGS）实现了高保真渲染和快速的实时性能，但现有方法依赖于在完整的运动结构（SfM）处理后进行离线训练。与此不同，本文提出了On-the-Fly GS，一种渐进式框架，能够在图像捕捉过程中实现近实时的3DGS优化。随着每张图像的到来，其姿态和稀疏点通过实时SfM进行更新，优化后的高斯点立即集成到3DGS场中。我们提出了一种渐进式局部优化策略，通过对应的重叠关系来优先处理新图像及其邻域，使新图像及其重叠图像能获得更多的训练。为了进一步稳定旧图像和新图像的训练过程，我们采用了自适应学习率调度，平衡迭代次数和学习率。此外，为了保持3DGS场的整体质量，我们设计了一种高效的全局优化方案，防止过拟合到新加入的图像。多个基准数据集上的实验表明，我们的On-the-Fly GS显著减少了训练时间，能够在几秒钟内优化每张新图像，同时保持最小的渲染损失，为快速、渐进式3DGS重建迈出了实际的第一步。\n"
  },
  {
    "path": "abs/2503.13176.md",
    "content": "### DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction\n\nReconstructing clean, distractor-free 3D scenes from real-world captures remains a significant challenge, particularly in highly dynamic and cluttered settings such as egocentric videos. To tackle this problem, we introduce DeGauss, a simple and robust self-supervised framework for dynamic scene reconstruction based on a decoupled dynamic-static Gaussian Splatting design. DeGauss models dynamic elements with foreground Gaussians and static content with background Gaussians, using a probabilistic mask to coordinate their composition and enable independent yet complementary optimization. DeGauss generalizes robustly across a wide range of real-world scenarios, from casual image collections to long, dynamic egocentric videos, without relying on complex heuristics or extensive supervision. Experiments on benchmarks including NeRF-on-the-go, ADT, AEA, Hot3D, and EPIC-Fields demonstrate that DeGauss consistently outperforms existing methods, establishing a strong baseline for generalizable, distractor-free 3D reconstructionin highly dynamic, interaction-rich environments.\n\n从真实世界捕捉中重建干净且无干扰的三维场景仍然是一个重大挑战，特别是在高度动态和杂乱的环境中，如自我中心的视频。为了解决这个问题，我们提出了DeGauss，这是一个简单且稳健的自监督框架，基于解耦的动态-静态高斯点云渲染设计进行动态场景重建。DeGauss使用前景高斯点来建模动态元素，使用背景高斯点来建模静态内容，采用概率掩膜来协调它们的组合，使得动态和静态部分能够独立但互补地优化。DeGauss在广泛的真实场景中表现出稳健的泛化能力，从随意的图像集合到长时间、动态的自我中心视频，均无需依赖复杂的启发式方法或大量的监督。我们在多个基准数据集（包括NeRF-on-the-go、ADT、AEA、Hot3D和EPIC-Fields）上的实验表明，DeGauss始终优于现有方法，为在高度动态且富有互动的环境中进行通用、无干扰的三维重建奠定了坚实的基准。\n"
  },
  {
    "path": "abs/2503.13272.md",
    "content": "### Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors\n\nSynthesizing consistent and photorealistic 3D scenes is an open problem in computer vision. Video diffusion models generate impressive videos but cannot directly synthesize 3D representations, i.e., lack 3D consistency in the generated sequences. In addition, directly training generative 3D models is challenging due to a lack of 3D training data at scale. In this work, we present Generative Gaussian Splatting (GGS) -- a novel approach that integrates a 3D representation with a pre-trained latent video diffusion model. Specifically, our model synthesizes a feature field parameterized via 3D Gaussian primitives. The feature field is then either rendered to feature maps and decoded into multi-view images, or directly upsampled into a 3D radiance field. We evaluate our approach on two common benchmark datasets for scene synthesis, RealEstate10K and ScanNet+, and find that our proposed GGS model significantly improves both the 3D consistency of the generated multi-view images, and the quality of the generated 3D scenes over all relevant baselines. Compared to a similar model without 3D representation, GGS improves FID on the generated 3D scenes by ~20% on both RealEstate10K and ScanNet+.\n\n合成一致且逼真的三维场景是计算机视觉中的一个开放问题。视频扩散模型能够生成令人印象深刻的视频，但不能直接合成三维表示，即在生成的序列中缺乏三维一致性。此外，由于缺乏大规模的三维训练数据，直接训练生成式三维模型也非常具有挑战性。在这项工作中，我们提出了生成式高斯点云渲染（GGS）——一种将三维表示与预训练的潜在视频扩散模型结合的创新方法。具体而言，我们的模型合成一个通过三维高斯原语参数化的特征场。然后，这个特征场可以被渲染为特征图并解码为多视角图像，或者直接上采样为三维辐射场。我们在两个常见的场景合成基准数据集——RealEstate10K和ScanNet+上评估了我们的方法，发现我们提出的GGS模型显著提高了生成的多视角图像的三维一致性以及生成的三维场景的质量，优于所有相关基线。与没有三维表示的类似模型相比，GGS在生成的三维场景上的FID指标在RealEstate10K和ScanNet+上分别提高了约20%。\n"
  },
  {
    "path": "abs/2503.13948.md",
    "content": "### Light4GS: Lightweight Compact 4D Gaussian Splatting Generation via Context Model\n\n3D Gaussian Splatting (3DGS) has emerged as an efficient and high-fidelity paradigm for novel view synthesis. To adapt 3DGS for dynamic content, deformable 3DGS incorporates temporally deformable primitives with learnable latent embeddings to capture complex motions. Despite its impressive performance, the high-dimensional embeddings and vast number of primitives lead to substantial storage requirements. In this paper, we introduce a Lightweight 4DGS framework, called Light4GS, that employs significance pruning with a deep context model to provide a lightweight storage-efficient dynamic 3DGS representation. The proposed Light4GS is based on 4DGS that is a typical representation of deformable 3DGS. Specifically, our framework is built upon two core components: (1) a spatio-temporal significance pruning strategy that eliminates over 64% of the deformable primitives, followed by an entropy-constrained spherical harmonics compression applied to the remainder; and (2) a deep context model that integrates intra- and inter-prediction with hyperprior into a coarse-to-fine context structure to enable efficient multiscale latent embedding compression. Our approach achieves over 120x compression and increases rendering FPS up to 20% compared to the baseline 4DGS, and also superior to frame-wise state-of-the-art 3DGS compression methods, revealing the effectiveness of our Light4GS in terms of both intra- and inter-prediction methods without sacrificing rendering quality.\n\n三维高斯点云渲染（3DGS）已成为一个高效且高保真的新颖视角合成范式。为了使3DGS适应动态内容，变形三维高斯点云渲染（Deformable 3DGS）结合了具有可学习潜在嵌入的时变变形原语，以捕捉复杂的运动。尽管其表现令人印象深刻，但高维嵌入和大量的原语导致了显著的存储需求。本文提出了一种轻量级4D高斯点云渲染框架，称为Light4GS，它采用深度上下文模型结合显著性剪枝技术，提供一种轻量级存储高效的动态3DGS表示。所提出的Light4GS基于4DGS，它是变形3DGS的典型表示。具体而言，我们的框架由两个核心组件构建：（1）一种时空显著性剪枝策略，消除了超过64%的变形原语，剩余部分应用熵约束的球谐压缩；（2）一种深度上下文模型，结合了内部和外部预测与超先验，形成从粗到细的上下文结构，以实现高效的多尺度潜在嵌入压缩。与基准4DGS相比，我们的方法实现了超过120倍的压缩，并且渲染帧率提升了20%，还优于逐帧最先进的3DGS压缩方法，展示了我们Light4GS方法在不牺牲渲染质量的情况下，在内部和外部预测方法上的有效性。\n"
  },
  {
    "path": "abs/2503.13961.md",
    "content": "### BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering\n\nDifferentiable rendering enables efficient optimization by allowing gradients to be computed through the rendering process, facilitating 3D reconstruction, inverse rendering and neural scene representation learning. To ensure differentiability, existing solutions approximate or re-formulate traditional rendering operations using smooth, probabilistic proxies such as volumes or Gaussian primitives. Consequently, they struggle to preserve sharp edges due to the lack of explicit boundary definitions. We present a novel hybrid representation, Bézier Gaussian Triangle (BG-Triangle), that combines Bézier triangle-based vector graphics primitives with Gaussian-based probabilistic models, to maintain accurate shape modeling while conducting resolution-independent differentiable rendering. We present a robust and effective discontinuity-aware rendering technique to reduce uncertainties at object boundaries. We also employ an adaptive densification and pruning scheme for efficient training while reliably handling level-of-detail (LoD) variations. Experiments show that BG-Triangle achieves comparable rendering quality as 3DGS but with superior boundary preservation. More importantly, BG-Triangle uses a much smaller number of primitives than its alternatives, showcasing the benefits of vectorized graphics primitives and the potential to bridge the gap between classic and emerging representations.\n\n可微分渲染通过允许在渲染过程中计算梯度，从而实现高效的优化，促进了三维重建、逆向渲染和神经场景表示学习。为了确保可微分性，现有的解决方案通常通过使用平滑的概率代理，如体积或高斯原语，来近似或重新构造传统渲染操作。因此，它们在保持锐利边缘方面存在困难，因为缺乏明确的边界定义。我们提出了一种新颖的混合表示，称为贝塞尔高斯三角形（BG-Triangle），它将基于贝塞尔三角形的矢量图形原语与基于高斯的概率模型相结合，以保持准确的形状建模，同时进行分辨率无关的可微分渲染。我们提出了一种稳健且有效的断点感知渲染技术，以减少物体边界的偏差。我们还采用了自适应密集化和剪枝方案，以实现高效训练，同时可靠地处理细节层次（LoD）变化。实验表明，BG-Triangle在渲染质量上与3DGS相当，但在边界保持方面表现更优。更重要的是，BG-Triangle使用的原语数量远小于其他方法，展示了矢量图形原语的优势，并有潜力弥合经典和新兴表示之间的差距。\n"
  },
  {
    "path": "abs/2503.14029.md",
    "content": "### Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting\n\nLifting multi-view 2D instance segmentation to a radiance field has proven to be effective to enhance 3D understanding. Existing methods rely on direct matching for end-to-end lifting, yielding inferior results; or employ a two-stage solution constrained by complex pre- or post-processing. In this work, we design a new end-to-end object-aware lifting approach, named Unified-Lift that provides accurate 3D segmentation based on the 3D Gaussian representation. To start, we augment each Gaussian point with an additional Gaussian-level feature learned using a contrastive loss to encode instance information. Importantly, we introduce a learnable object-level codebook to account for individual objects in the scene for an explicit object-level understanding and associate the encoded object-level features with the Gaussian-level point features for segmentation predictions. While promising, achieving effective codebook learning is non-trivial and a naive solution leads to degraded performance. Therefore, we formulate the association learning module and the noisy label filtering module for effective and robust codebook learning. We conduct experiments on three benchmarks: LERF-Masked, Replica, and Messy Rooms datasets. Both qualitative and quantitative results manifest that our Unified-Lift clearly outperforms existing methods in terms of segmentation quality and time efficiency.\n\n将多视角二维实例分割提升到辐射场已被证明是增强三维理解的有效方法。现有方法依赖于直接匹配进行端到端提升，导致效果较差；或者采用两阶段解决方案，受限于复杂的预处理或后处理。本文提出了一种新的端到端面向物体的提升方法，称为Unified-Lift，它基于三维高斯表示提供准确的三维分割。首先，我们通过对每个高斯点进行增强，学习一个额外的高斯级特征，并使用对比损失来编码实例信息。重要的是，我们引入了一个可学习的物体级代码簿，用以考虑场景中的个体物体，进行显式的物体级理解，并将编码的物体级特征与高斯级点特征关联，进行分割预测。尽管前景广阔，但实现有效的代码簿学习并非易事，简单的解决方案会导致性能下降。因此，我们提出了关联学习模块和噪声标签过滤模块，以实现有效且稳健的代码簿学习。我们在三个基准数据集（LERF-Masked、Replica和Messy Rooms）上进行了实验。定性和定量结果表明，我们的Unified-Lift在分割质量和时间效率方面明显优于现有方法。\n"
  },
  {
    "path": "abs/2503.14171.md",
    "content": "### Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images\n\nWe introduce an image upscaling technique tailored for 3D Gaussian Splatting (3DGS) on lightweight GPUs. Compared to 3DGS, it achieves significantly higher rendering speeds and reduces artifacts commonly observed in 3DGS reconstructions. Our technique upscales low-resolution 3DGS renderings with a marginal increase in cost by directly leveraging the analytical image gradients of Gaussians for gradient-based bicubic spline interpolation. The technique is agnostic to the specific 3DGS implementation, achieving novel view synthesis at rates 3x-4x higher than the baseline implementation. Through extensive experiments on multiple datasets, we showcase the performance improvements and high reconstruction fidelity attainable with gradient-aware upscaling of 3DGS images. We further demonstrate the integration of gradient-aware upscaling into the gradient-based optimization of a 3DGS model and analyze its effects on reconstruction quality and performance.\n\n我们提出了一种针对轻量级GPU上的三维高斯点云渲染（3DGS）图像上采样技术。与3DGS相比，该技术实现了显著更高的渲染速度，并减少了在3DGS重建中常见的伪影。我们的技术通过直接利用高斯的解析图像梯度进行基于梯度的三次样条插值，以轻微增加成本的方式对低分辨率的3DGS渲染进行上采样。该技术与具体的3DGS实现无关，在新颖视角合成方面的速度比基准实现提高了3倍至4倍。通过在多个数据集上的广泛实验，我们展示了使用梯度感知上采样技术进行3DGS图像的性能提升和高重建保真度。我们进一步展示了梯度感知上采样与基于梯度的3DGS模型优化的集成，并分析了它对重建质量和性能的影响。\n"
  },
  {
    "path": "abs/2503.14198.md",
    "content": "### RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images\n\nThis paper presents RoGSplat, a novel approach for synthesizing high-fidelity novel views of unseen human from sparse multi-view images, while requiring no cumbersome per-subject optimization. Unlike previous methods that typically struggle with sparse views with few overlappings and are less effective in reconstructing complex human geometry, the proposed method enables robust reconstruction in such challenging conditions. Our key idea is to lift SMPL vertices to dense and reliable 3D prior points representing accurate human body geometry, and then regress human Gaussian parameters based on the points. To account for possible misalignment between SMPL model and images, we propose to predict image-aligned 3D prior points by leveraging both pixel-level features and voxel-level features, from which we regress the coarse Gaussians. To enhance the ability to capture high-frequency details, we further render depth maps from the coarse 3D Gaussians to help regress fine-grained pixel-wise Gaussians. Experiments on several benchmark datasets demonstrate that our method outperforms state-of-the-art methods in novel view synthesis and cross-dataset generalization.\n\n本文提出了RoGSplat，一种新颖的方法，通过稀疏多视角图像合成未见过的人体的新颖视角，同时无需繁琐的每个主体优化。与以往通常在稀疏视角下重建效果差且对复杂人体几何形状表现不佳的方法不同，所提出的方法能够在这种挑战性条件下实现稳健的重建。我们的核心思路是将SMPL模型的顶点提升到表示准确人体几何的密集可靠三维先验点，然后基于这些点回归人体高斯参数。为了应对SMPL模型和图像之间可能的错位，我们提出通过利用像素级特征和体素级特征来预测与图像对齐的三维先验点，从这些点回归粗略的高斯点。为了增强捕捉高频细节的能力，我们进一步从粗略的三维高斯点渲染深度图，帮助回归细粒度的像素级高斯点。在多个基准数据集上的实验表明，我们的方法在新颖视角合成和跨数据集泛化方面优于最先进的方法。\n"
  },
  {
    "path": "abs/2503.14274.md",
    "content": "### Improving Adaptive Density Control for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has become one of the most influential works in the past year. Due to its efficient and high-quality novel view synthesis capabilities, it has been widely adopted in many research fields and applications. Nevertheless, 3DGS still faces challenges to properly manage the number of Gaussian primitives that are used during scene reconstruction. Following the adaptive density control (ADC) mechanism of 3D Gaussian Splatting, new Gaussians in under-reconstructed regions are created, while Gaussians that do not contribute to the rendering quality are pruned. We observe that those criteria for densifying and pruning Gaussians can sometimes lead to worse rendering by introducing artifacts. We especially observe under-reconstructed background or overfitted foreground regions. To encounter both problems, we propose three new improvements to the adaptive density control mechanism. Those include a correction for the scene extent calculation that does not only rely on camera positions, an exponentially ascending gradient threshold to improve training convergence, and significance-aware pruning strategy to avoid background artifacts. With these adaptions, we show that the rendering quality improves while using the same number of Gaussians primitives. Furthermore, with our improvements, the training converges considerably faster, allowing for more than twice as fast training times while yielding better quality than 3DGS. Finally, our contributions are easily compatible with most existing derivative works of 3DGS making them relevant for future works.\n\n三维高斯点云渲染（3DGS）已经成为过去一年中最具影响力的工作之一。由于其高效且高质量的新颖视角合成能力，它在许多研究领域和应用中得到了广泛应用。然而，3DGS在场景重建过程中仍面临着如何有效管理所使用的高斯原语数量的挑战。根据3D高斯点云渲染的自适应密度控制（ADC）机制，在重建不足的区域会生成新的高斯点，而那些对渲染质量没有贡献的高斯点会被剪枝。我们观察到，用于增加密度和剪枝高斯点的标准有时会通过引入伪影导致更差的渲染效果。我们特别注意到，背景区域重建不足或前景区域过拟合的情况。为了解决这两个问题，我们提出了三项新的改进措施来增强自适应密度控制机制。具体包括：一种场景范围计算的修正方法，它不仅依赖于相机位置；一种指数递增的梯度阈值，以提高训练的收敛性；以及一种基于显著性的剪枝策略，以避免背景伪影。通过这些改进，我们展示了在使用相同数量的高斯原语时，渲染质量得到了提升。此外，经过这些改进后，训练收敛速度大大加快，训练时间比3DGS快了两倍以上，并且得到了比3DGS更好的质量。最后，我们的贡献与大多数现有的3DGS衍生工作高度兼容，具有较高的未来应用价值。\n"
  },
  {
    "path": "abs/2503.14475.md",
    "content": "### Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation\n\nThe field of Novel View Synthesis has been revolutionized by 3D Gaussian Splatting (3DGS), which enables high-quality scene reconstruction that can be rendered in real-time. 3DGS-based techniques typically suffer from high GPU memory and disk storage requirements which limits their practical application on consumer-grade devices. We propose Opti3DGS, a novel frequency-modulated coarse-to-fine optimization framework that aims to minimize the number of Gaussian primitives used to represent a scene, thus reducing memory and storage demands. Opti3DGS leverages image frequency modulation, initially enforcing a coarse scene representation and progressively refining it by modulating frequency details in the training images. On the baseline 3DGS, we demonstrate an average reduction of 62% in Gaussians, a 40% reduction in the training GPU memory requirements and a 20% reduction in optimization time without sacrificing the visual quality. Furthermore, we show that our method integrates seamlessly with many 3DGS-based techniques, consistently reducing the number of Gaussian primitives while maintaining, and often improving, visual quality. Additionally, Opti3DGS inherently produces a level-of-detail scene representation at no extra cost, a natural byproduct of the optimization pipeline.\n\n新颖视角合成领域已经被三维高斯点云渲染（3DGS）所革新，它能够实现高质量的场景重建，并且能够实时渲染。然而，基于3DGS的技术通常存在较高的GPU内存和磁盘存储需求，这限制了它们在消费级设备上的实际应用。我们提出了Opti3DGS，一种新颖的频率调制粗到细优化框架，旨在最小化用于表示场景的高斯原语数量，从而减少内存和存储需求。Opti3DGS利用图像频率调制，最初强制执行粗略的场景表示，并通过调节训练图像中的频率细节逐步优化该表示。在基准的3DGS上，我们展示了高斯点数量平均减少了62%，训练GPU内存需求减少了40%，优化时间减少了20%，且视觉质量没有受到牺牲。此外，我们展示了我们的方法可以与许多基于3DGS的技术无缝集成，在保持视觉质量的同时，始终减少高斯原语的数量，且通常还能改善视觉质量。更重要的是，Opti3DGS自然地生成了一个分级细节的场景表示，而无需额外的成本，这是优化流程的自然副产品。\n"
  },
  {
    "path": "abs/2503.14698.md",
    "content": "### SplatVoxel: History-Aware Novel View Streaming without Temporal Training\n\nWe study the problem of novel view streaming from sparse-view videos, which aims to generate a continuous sequence of high-quality, temporally consistent novel views as new input frames arrive. However, existing novel view synthesis methods struggle with temporal coherence and visual fidelity, leading to flickering and inconsistency. To address these challenges, we introduce history-awareness, leveraging previous frames to reconstruct the scene and improve quality and stability. We propose a hybrid splat-voxel feed-forward scene reconstruction approach that combines Gaussian Splatting to propagate information over time, with a hierarchical voxel grid for temporal fusion. Gaussian primitives are efficiently warped over time using a motion graph that extends 2D tracking models to 3D motion, while a sparse voxel transformer integrates new temporal observations in an error-aware manner. Crucially, our method does not require training on multi-view video datasets, which are currently limited in size and diversity, and can be directly applied to sparse-view video streams in a history-aware manner at inference time. Our approach achieves state-of-the-art performance in both static and streaming scene reconstruction, effectively reducing temporal artifacts and visual artifacts while running at interactive rates (15 fps with 350ms delay) on a single H100 GPU. Project Page: this https URL\n\n我们研究了来自稀疏视角视频的新颖视角流媒体生成问题，旨在随着新的输入帧到来，生成连续的高质量、时间一致的新视角。然而，现有的新颖视角合成方法在时间一致性和视觉保真度方面存在困难，导致闪烁和不一致。为了解决这些挑战，我们引入了历史感知，通过利用前面的帧来重建场景，提升质量和稳定性。我们提出了一种混合点云-体素前馈场景重建方法，结合了高斯点云渲染用于传播信息，采用层次化体素网格进行时间融合。通过一个运动图，我们高效地对高斯原语进行时间扭曲，将二维跟踪模型扩展到三维运动，同时一个稀疏体素变换器以错误感知的方式整合新的时间观测数据。关键的是，我们的方法不需要在多视角视频数据集上进行训练（当前这些数据集在规模和多样性上都有限），并且可以直接在推理时以历史感知的方式应用于稀疏视角视频流。我们的 approach 在静态和流媒体场景重建中都实现了最先进的性能，有效减少了时间伪影和视觉伪影，同时在单个H100 GPU上以交互速率（15帧/秒，350毫秒延迟）运行。\n"
  },
  {
    "path": "abs/2503.14736.md",
    "content": "### HandSplat: Embedding-Driven Gaussian Splatting for High-Fidelity Hand Rendering\n\nExisting 3D Gaussian Splatting (3DGS) methods for hand rendering rely on rigid skeletal motion with an oversimplified non-rigid motion model, which fails to capture fine geometric and appearance details. Additionally, they perform densification based solely on per-point gradients and process poses independently, ignoring spatial and temporal correlations. These limitations lead to geometric detail loss, temporal instability, and inefficient point distribution. To address these issues, we propose HandSplat, a novel Gaussian Splatting-based framework that enhances both fidelity and stability for hand rendering. To improve fidelity, we extend standard 3DGS attributes with implicit geometry and appearance embeddings for finer non-rigid motion modeling while preserving the static hand characteristic modeled by original 3DGS attributes. Additionally, we introduce a local gradient-aware densification strategy that dynamically refines Gaussian density in high-variation regions. To improve stability, we incorporate pose-conditioned attribute regularization to encourage attribute consistency across similar poses, mitigating temporal artifacts. Extensive experiments on InterHand2.6M demonstrate that HandSplat surpasses existing methods in fidelity and stability while achieving real-time performance.\n\n现有的基于三维高斯点云渲染（3DGS）的手部渲染方法依赖于刚性骨架运动，并使用过于简化的非刚性运动模型，无法捕捉精细的几何和外观细节。此外，它们仅基于每个点的梯度进行密集化，并且独立处理姿态，忽视了空间和时间的相关性。这些限制导致了几何细节丧失、时间不稳定性和低效的点分布。为了解决这些问题，我们提出了HandSplat，一种基于高斯点云渲染的全新框架，旨在增强手部渲染的保真度和稳定性。为了提高保真度，我们通过引入隐式几何和外观嵌入，扩展了标准3DGS属性，以实现更精细的非刚性运动建模，同时保留了原始3DGS属性所建模的静态手部特征。此外，我们还提出了一种局部梯度感知密集化策略，动态地在高变化区域细化高斯点的密度。为了提高稳定性，我们引入了姿态条件的属性正则化，鼓励相似姿态之间的属性一致性，从而减轻时间伪影。我们在InterHand2.6M数据集上的大量实验表明，HandSplat在保真度和稳定性方面超越了现有方法，并实现了实时性能。我们将在论文接受后公开代码和预训练模型。\n"
  },
  {
    "path": "abs/2503.14786.md",
    "content": "### SketchSplat: 3D Edge Reconstruction via Differentiable Multi-view Sketch Splatting\n\nEdges are one of the most basic parametric primitives to describe structural information in 3D. In this paper, we study parametric 3D edge reconstruction from calibrated multi-view images. Previous methods usually reconstruct a 3D edge point set from multi-view 2D edge images, and then fit 3D edges to the point set. However, noise in the point set may cause gaps among fitted edges, and the recovered edges may not align with input multi-view images since the edge fitting depends only on the reconstructed 3D point set. To mitigate these problems, we propose SketchSplat, a method to reconstruct accurate, complete, and compact 3D edges via differentiable multi-view sketch splatting. We represent 3D edges as sketches, which are parametric lines and curves defined by attributes including control points, scales, and opacity. During edge reconstruction, we iteratively sample Gaussian points from a set of sketches and rasterize the Gaussians onto 2D edge images. Then the gradient of the image error with respect to the input 2D edge images can be back-propagated to optimize the sketch attributes. Our method bridges 2D edge images and 3D edges in a differentiable manner, which ensures that 3D edges align well with 2D images and leads to accurate and complete results. We also propose a series of adaptive topological operations and apply them along with the sketch optimization. The topological operations help reduce the number of sketches required while ensuring high accuracy, yielding a more compact reconstruction. Finally, we contribute an accurate 2D edge detector that improves the performance of both ours and existing methods. Experiments show that our method achieves state-of-the-art accuracy, completeness, and compactness on a benchmark CAD dataset.\n\n边缘是描述三维结构信息的最基本参数化原语之一。本文研究了从标定的多视角图像中进行参数化三维边缘重建的问题。以往的方法通常从多视角二维边缘图像重建三维边缘点集，然后对三维边缘进行拟合。然而，点集中的噪声可能导致拟合的边缘之间出现间隙，且恢复的边缘可能与输入的多视角图像不对齐，因为边缘拟合仅依赖于重建的三维点集。为了解决这些问题，我们提出了SketchSplat，一种通过可微分的多视角草图点云渲染来重建准确、完整且紧凑的三维边缘的方法。我们将三维边缘表示为草图，这些草图是由控制点、尺度和不透明度等属性定义的参数化线条和曲线。在边缘重建过程中，我们从草图集迭代地采样高斯点，并将这些高斯点光栅化到二维边缘图像上。然后，图像误差相对于输入二维边缘图像的梯度可以反向传播，用于优化草图的属性。我们的方法以可微分的方式桥接了二维边缘图像和三维边缘，确保三维边缘与二维图像良好对齐，从而得到准确和完整的结果。我们还提出了一系列自适应拓扑操作，并与草图优化一起应用。拓扑操作有助于减少所需草图的数量，同时确保高精度，从而实现更紧凑的重建。最后，我们贡献了一种准确的二维边缘检测器，能够提高我们方法和现有方法的性能。实验表明，我们的方法在基准CAD数据集上实现了最先进的精度、完整性和紧凑性。\n"
  },
  {
    "path": "abs/2503.14845.md",
    "content": "### ClimateGS: Real-Time Climate Simulation with 3D Gaussian Style Transfer\n\nAdverse climate conditions pose significant challenges for autonomous systems, demanding reliable perception and decision-making across diverse environments. To better simulate these conditions, physically-based NeRF rendering methods have been explored for their ability to generate realistic scene representations. However, these methods suffer from slow rendering speeds and long preprocessing times, making them impractical for real-time testing and user interaction. This paper presents ClimateGS, a novel framework integrating 3D Gaussian representations with physical simulation to enable real-time climate effects rendering. The novelty of this work is threefold: 1) developing a linear transformation for 3D Gaussian photorealistic style transfer, enabling direct modification of spherical harmonics across bands for efficient and consistent style adaptation; 2) developing a joint training strategy for 3D style transfer, combining supervised and self-supervised learning to accelerate convergence while preserving original scene details; 3) developing a real-time rendering method for climate simulation, integrating physics-based effects with 3D Gaussian to achieve efficient and realistic rendering. We evaluate ClimateGS on MipNeRF360 and Tanks and Temples, demonstrating real-time rendering with comparable or superior visual quality to SOTA 2D/3D methods, making it suitable for interactive applications.\n\n恶劣的气候条件对自主系统提出了重大挑战，需要在不同环境中实现可靠的感知和决策。为了更好地模拟这些条件，基于物理的 NeRF 渲染方法 被探索用于生成逼真的场景表示。然而，这些方法通常存在渲染速度慢和长时间预处理的问题，使其不适合用于实时测试和用户交互。\n本文提出了 ClimateGS，一个将 3D 高斯表示 与 物理仿真 相结合的框架，旨在实现实时气候效应渲染。该方法的创新之处在于：开发了3D 高斯逼真风格转移的线性变换，使得可以通过跨频带直接修改球谐函数，从而实现高效且一致的风格适配；提出了3D 风格转移的联合训练策略，结合了监督学习和自监督学习，既加速了收敛速度，又保持了原始场景细节；并且开发了一种实时渲染方法，用于气候仿真，将基于物理的效应与 3D 高斯结合，实现了高效且逼真的渲染。\n我们在 MipNeRF360 和 Tanks and Temples 数据集上评估了 ClimateGS，结果表明，它在实时渲染中具有与最先进的 2D/3D 方法相媲美或更优的视觉质量，适合用于交互式应用。\n"
  },
  {
    "path": "abs/2503.15671.md",
    "content": "### CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image\n\nReconstructing clothed humans from a single image is a fundamental task in computer vision with wide-ranging applications. Although existing monocular clothed human reconstruction solutions have shown promising results, they often rely on the assumption that the human subject is in an occlusion-free environment. Thus, when encountering in-the-wild occluded images, these algorithms produce multiview inconsistent and fragmented reconstructions. Additionally, most algorithms for monocular 3D human reconstruction leverage geometric priors such as SMPL annotations for training and inference, which are extremely challenging to acquire in real-world applications. To address these limitations, we propose CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-ConsistEncy from a Single Image, a novel pipeline designed to reconstruct occlusion-resilient 3D humans with multiview consistency from a single occluded image, without requiring either ground-truth geometric prior annotations or 3D supervision. Specifically, CHROME leverages a multiview diffusion model to first synthesize occlusion-free human images from the occluded input, compatible with off-the-shelf pose control to explicitly enforce cross-view consistency during synthesis. A 3D reconstruction model is then trained to predict a set of 3D Gaussians conditioned on both the occluded input and synthesized views, aligning cross-view details to produce a cohesive and accurate 3D representation. CHROME achieves significant improvements in terms of both novel view synthesis (upto 3 db PSNR) and geometric reconstruction under challenging conditions.\n\n从单张图像重建穿衣人类是计算机视觉中的一个基本任务，具有广泛的应用前景。尽管现有的单目穿衣人类重建解决方案已取得了令人鼓舞的成果，但它们通常依赖于一个假设：即人类对象处于无遮挡环境中。因此，当遇到野外遮挡图像时，这些算法会产生多视角不一致和碎片化的重建结果。此外，大多数单目 3D 人类重建算法利用几何先验（如 SMPL 注释）进行训练和推理，但这些先验在现实世界应用中非常难以获得。\n为了解决这些问题，我们提出了 CHROME：Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image，一个新颖的管道，旨在从单张遮挡图像中重建具有遮挡鲁棒性和多视角一致性的 3D 人体，而不需要任何地面真值几何先验注释或3D 监督。\n具体而言，CHROME 利用多视角扩散模型，首先从遮挡的输入图像中合成无遮挡的穿衣人类图像，且与现有的姿态控制兼容，在合成过程中显式地强制执行视角间的一致性。然后，训练一个3D 重建模型，该模型在遮挡的输入和合成视角的条件下预测一组 3D 高斯点，通过对齐视角间的细节，生成一个连贯且准确的 3D 表示。\nCHROME 在新视角合成（最大提升 3 dB PSNR）和几何重建方面，尤其在挑战性条件下，取得了显著的改进。\n"
  },
  {
    "path": "abs/2503.15742.md",
    "content": "### Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes\n\nReconstructing 3D scenes from a single image is a fundamentally ill-posed task due to the severely under-constrained nature of the problem. Consequently, when the scene is rendered from novel camera views, existing single image to 3D reconstruction methods render incoherent and blurry views. This problem is exacerbated when the unseen regions are far away from the input camera. In this work, we address these inherent limitations in existing single image-to-3D scene feedforward networks. To alleviate the poor performance due to insufficient information beyond the input image's view, we leverage a strong generative prior in the form of a pre-trained latent video diffusion model, for iterative refinement of a coarse scene represented by optimizable Gaussian parameters. To ensure that the style and texture of the generated images align with that of the input image, we incorporate on-the-fly Fourier-style transfer between the generated images and the input image. Additionally, we design a semantic uncertainty quantification module that calculates the per-pixel entropy and yields uncertainty maps used to guide the refinement process from the most confident pixels while discarding the remaining highly uncertain ones. We conduct extensive experiments on real-world scene datasets, including in-domain RealEstate-10K and out-of-domain KITTI-v2, showing that our approach can provide more realistic and high-fidelity novel view synthesis results compared to existing state-of-the-art methods.\n\n从单张图像重建 3D 场景是一个本质上不适定的问题，因为该问题的约束条件非常不足。因此，当从新的相机视角渲染场景时，现有的单图像到 3D 重建方法通常会渲染出不连贯且模糊的视图。当未见区域距离输入相机较远时，这一问题尤为严重。\n在本研究中，我们解决了现有单图像到 3D 场景前馈网络中的这些固有限制。为了缓解由于输入图像视角之外信息不足所导致的较差性能，我们利用了强大的生成先验，即预训练的潜在视频扩散模型，用于对通过可优化高斯参数表示的粗糙场景进行迭代优化。为了确保生成图像的风格和纹理与输入图像一致，我们在生成图像和输入图像之间引入了实时傅里叶风格转换。\n此外，我们设计了一个语义不确定性量化模块，该模块计算每个像素的熵并生成不确定性图，用于指导从最有信心的像素开始优化，同时丢弃剩余的不确定像素。\n我们在多个真实场景数据集上进行了广泛实验，包括领域内的 RealEstate-10K 和领域外的 KITTI-v2，结果表明，我们的方法在新视角合成中提供了比现有最先进方法更为逼真和高保真的结果。\n"
  },
  {
    "path": "abs/2503.15809.md",
    "content": "### Controlling Avatar Diffusion with Learnable Gaussian Embedding\n\nRecent advances in diffusion models have made significant progress in digital human generation. However, most existing models still struggle to maintain 3D consistency, temporal coherence, and motion accuracy. A key reason for these shortcomings is the limited representation ability of commonly used control signals(e.g., landmarks, depth maps, etc.). In addition, the lack of diversity in identity and pose variations in public datasets further hinders progress in this area. In this paper, we analyze the shortcomings of current control signals and introduce a novel control signal representation that is optimizable, dense, expressive, and 3D consistent. Our method embeds a learnable neural Gaussian onto a parametric head surface, which greatly enhances the consistency and expressiveness of diffusion-based head models. Regarding the dataset, we synthesize a large-scale dataset with multiple poses and identities. In addition, we use real/synthetic labels to effectively distinguish real and synthetic data, minimizing the impact of imperfections in synthetic data on the generated head images. Extensive experiments show that our model outperforms existing methods in terms of realism, expressiveness, and 3D consistency.\n\n近年来，扩散模型在数字人类生成方面取得了显著进展。然而，大多数现有模型仍难以维持 3D 一致性、时间一致性 和 运动准确性。这些不足的关键原因之一是常用控制信号（如地标、深度图等）表现能力的有限性。此外，公共数据集中身份和姿势变化的多样性不足，进一步阻碍了该领域的进展。\n在本文中，我们分析了当前控制信号的不足，并引入了一种新的控制信号表示方法，该方法具有可优化、密集、富有表现力和3D 一致性的特点。我们的方法将一个可学习的神经高斯嵌入到参数化的头部表面，从而大大增强了基于扩散的头部模型的一致性和表现力。\n在数据集方面，我们合成了一个包含多个姿势和身份的大规模数据集。此外，我们使用真实/合成标签来有效地区分真实数据和合成数据，从而最小化合成数据中的不完美对生成头部图像的影响。\n大量实验表明，我们的模型在真实性、表现力和3D 一致性方面优于现有方法。\n"
  },
  {
    "path": "abs/2503.15835.md",
    "content": "### BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has shown remarkable potential for static scene reconstruction, and recent advancements have extended its application to dynamic scenes. However, the quality of reconstructions depends heavily on high-quality input images and precise camera poses, which are not that trivial to fulfill in real-world scenarios. Capturing dynamic scenes with handheld monocular cameras, for instance, typically involves simultaneous movement of both the camera and objects within a single exposure. This combined motion frequently results in image blur that existing methods cannot adequately handle. To address these challenges, we introduce BARD-GS, a novel approach for robust dynamic scene reconstruction that effectively handles blurry inputs and imprecise camera poses. Our method comprises two main components: 1) camera motion deblurring and 2) object motion deblurring. By explicitly decomposing motion blur into camera motion blur and object motion blur and modeling them separately, we achieve significantly improved rendering results in dynamic regions. In addition, we collect a real-world motion blur dataset of dynamic scenes to evaluate our approach. Extensive experiments demonstrate that BARD-GS effectively reconstructs high-quality dynamic scenes under realistic conditions, significantly outperforming existing methods.\n\n3D 高斯散点 (3DGS) 在静态场景重建方面展现出卓越的潜力，并且近期的研究已将其应用扩展到动态场景。然而，重建质量高度依赖于高质量的输入图像和精确的相机位姿，而在现实世界场景中，这些要求往往难以满足。例如，使用手持单目相机捕捉动态场景通常会导致相机与场景中的物体在同一曝光时间内同时运动。这种复合运动往往会导致图像模糊，而现有方法无法有效处理这一问题。\n为了解决这些挑战，我们提出了一种用于稳健动态场景重建的新方法——BARD-GS，该方法能够有效处理模糊输入和不精确的相机位姿。我们的方法主要包括两个核心组件：1）相机运动去模糊；2）物体运动去模糊。通过显式地将运动模糊分解为相机运动模糊和物体运动模糊，并分别建模处理，我们在动态区域的渲染质量上实现了显著提升。此外，我们构建了一个包含真实世界动态场景运动模糊的数据集，用于评估我们的方法。大量实验表明，BARD-GS 能够在现实条件下高质量地重建动态场景，显著优于现有方法。\n"
  },
  {
    "path": "abs/2503.15855.md",
    "content": "### VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling\n\nWe propose VideoRFSplat, a direct text-to-3D model leveraging a video generation model to generate realistic 3D Gaussian Splatting (3DGS) for unbounded real-world scenes. To generate diverse camera poses and unbounded spatial extent of real-world scenes, while ensuring generalization to arbitrary text prompts, previous methods fine-tune 2D generative models to jointly model camera poses and multi-view images. However, these methods suffer from instability when extending 2D generative models to joint modeling due to the modality gap, which necessitates additional models to stabilize training and inference. In this work, we propose an architecture and a sampling strategy to jointly model multi-view images and camera poses when fine-tuning a video generation model. Our core idea is a dual-stream architecture that attaches a dedicated pose generation model alongside a pre-trained video generation model via communication blocks, generating multi-view images and camera poses through separate streams. This design reduces interference between the pose and image modalities. Additionally, we propose an asynchronous sampling strategy that denoises camera poses faster than multi-view images, allowing rapidly denoised poses to condition multi-view generation, reducing mutual ambiguity and enhancing cross-modal consistency. Trained on multiple large-scale real-world datasets (RealEstate10K, MVImgNet, DL3DV-10K, ACID), VideoRFSplat outperforms existing text-to-3D direct generation methods that heavily depend on post-hoc refinement via score distillation sampling, achieving superior results without such refinement.\n\n我们提出了 VideoRFSplat，一种直接从文本生成三维模型的方法，利用视频生成模型来生成真实感的 3D 高斯散点 (3DGS)，适用于无界真实世界场景。为了生成具有多样化相机位姿和无界空间范围的真实场景，同时确保对任意文本提示的泛化能力，现有方法通常微调 2D 生成模型，以同时建模相机位姿和多视角图像。然而，由于模态差异，这些方法在扩展 2D 生成模型以联合建模时容易出现不稳定性，进而需要额外的模型来稳定训练和推理。\n在本研究中，我们提出了一种新的 架构 和 采样策略，在微调视频生成模型时能够联合建模多视角图像和相机位姿。我们核心的想法是 双流架构，该架构通过通信模块将 专用的相机位姿生成模型 附加到 预训练视频生成模型 之上，从而在独立的流中生成 多视角图像 和 相机位姿。这一设计有效减少了 位姿与图像模态之间的相互干扰。\n此外，我们提出了一种 异步采样策略，使相机位姿的去噪速度快于多视角图像。这样，快速去噪的相机位姿能够更早地对多视角图像生成进行条件约束，从而减少跨模态的不确定性，并增强一致性。\n我们在多个大规模真实世界数据集（RealEstate10K、MVImgNet、DL3DV-10K、ACID）上进行了训练。实验结果表明，VideoRFSplat 优于现有的 直接文本到 3D 生成方法，后者通常依赖 基于分数蒸馏采样 (score distillation sampling, SDS) 进行后处理优化。而 VideoRFSplat 在无需此类后处理的情况下，即可实现更优的生成效果。\n"
  },
  {
    "path": "abs/2503.15877.md",
    "content": "### Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation\n\nRecent advances in text-to-image diffusion models have been driven by the increasing availability of paired 2D data. However, the development of 3D diffusion models has been hindered by the scarcity of high-quality 3D data, resulting in less competitive performance compared to their 2D counterparts. To address this challenge, we propose repurposing pre-trained 2D diffusion models for 3D object generation. We introduce Gaussian Atlas, a novel representation that utilizes dense 2D grids, enabling the fine-tuning of 2D diffusion models to generate 3D Gaussians. Our approach demonstrates successful transfer learning from a pre-trained 2D diffusion model to a 2D manifold flattened from 3D structures. To support model training, we compile GaussianVerse, a large-scale dataset comprising 205K high-quality 3D Gaussian fittings of various 3D objects. Our experimental results show that text-to-image diffusion models can be effectively adapted for 3D content generation, bridging the gap between 2D and 3D modeling.\n\n近年来，文本到图像的扩散模型（text-to-image diffusion models）取得了显著进展，主要得益于配对 2D 数据 的不断增加。然而，3D 扩散模型的发展受限于高质量 3D 数据 的匮乏，导致其与 2D 模型相比性能较为逊色。为了解决这一挑战，我们提出了将预训练的 2D 扩散模型重新用于 3D 物体生成的方法。\n我们引入了 Gaussian Atlas，一种新的表示方式，利用密集的 2D 网格，使得 2D 扩散模型能够微调以生成 3D 高斯点。我们的方法展示了从预训练的 2D 扩散模型到从 3D 结构展开的 2D 流形的成功迁移学习。为了支持模型训练，我们编制了 GaussianVerse，一个大规模数据集，包含了 205K 个高质量的 3D 高斯拟合数据，涵盖各种 3D 物体。\n实验结果表明，文本到图像的扩散模型可以有效地适应 3D 内容生成，从而弥合了 2D 和 3D 建模之间的差距。\n"
  },
  {
    "path": "abs/2503.15908.md",
    "content": "### Enhancing Close-up Novel View Synthesis via Pseudo-labeling\n\nRecent methods, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated remarkable capabilities in novel view synthesis. However, despite their success in producing high-quality images for viewpoints similar to those seen during training, they struggle when generating detailed images from viewpoints that significantly deviate from the training set, particularly in close-up views. The primary challenge stems from the lack of specific training data for close-up views, leading to the inability of current methods to render these views accurately. To address this issue, we introduce a novel pseudo-label-based learning strategy. This approach leverages pseudo-labels derived from existing training data to provide targeted supervision across a wide range of close-up viewpoints. Recognizing the absence of benchmarks for this specific challenge, we also present a new dataset designed to assess the effectiveness of both current and future methods in this area. Our extensive experiments demonstrate the efficacy of our approach.\n\n近年来，神经辐射场 (NeRF) 和 3D 高斯散点 (3DGS) 等方法在新视角合成方面展现出了卓越的能力。然而，尽管它们在生成与训练视角相似的高质量图像方面取得了成功，但在生成与训练集存在较大偏差的视角，特别是近景视角时，仍然存在明显的挑战。这一问题的核心原因在于缺乏针对近景视角的特定训练数据，导致现有方法难以精确渲染这些视角的细节。\n为了解决这一问题，我们提出了一种基于伪标签的学习策略。该方法利用从已有训练数据中生成的伪标签，为广泛的近景视角提供有针对性的监督，从而提升渲染质量。此外，考虑到当前缺乏专门针对该挑战的基准测试，我们构建了一个新数据集，用于评估现有方法和未来方法在这一问题上的表现。\n大量实验结果表明，我们的方法能够有效提升近景视角的渲染质量，并在该任务上取得了显著的性能提升。\n"
  },
  {
    "path": "abs/2503.16177.md",
    "content": "### OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering\n\nIn large-scale scene reconstruction using 3D Gaussian splatting, it is common to partition the scene into multiple smaller regions and reconstruct them individually. However, existing division methods are occlusion-agnostic, meaning that each region may contain areas with severe occlusions. As a result, the cameras within those regions are less correlated, leading to a low average contribution to the overall reconstruction. In this paper, we propose an occlusion-aware scene division strategy that clusters training cameras based on their positions and co-visibilities to acquire multiple regions. Cameras in such regions exhibit stronger correlations and a higher average contribution, facilitating high-quality scene reconstruction. We further propose a region-based rendering technique to accelerate large scene rendering, which culls Gaussians invisible to the region where the viewpoint is located. Such a technique significantly speeds up the rendering without compromising quality. Extensive experiments on multiple large scenes show that our method achieves superior reconstruction results with faster rendering speed compared to existing state-of-the-art approaches.\n\n在使用 3D 高斯散点 (3DGS) 进行大规模场景重建时，通常会将场景划分为多个较小的区域，并分别进行重建。然而，现有的划分方法忽略了遮挡信息，导致每个区域可能包含大量严重遮挡的区域。由于这些区域内的相机之间相关性较低，使得它们对整体重建的贡献度较低，进而影响最终的重建质量。\n为了解决这一问题，我们提出了一种基于遮挡感知的场景划分策略，该策略基于相机的位置和共视信息 (co-visibility) 进行聚类，从而生成多个子区域。在这些区域内，相机之间的相关性更强，平均贡献度更高，从而促进高质量的场景重建。\n此外，我们进一步提出了一种基于区域的渲染技术，用于加速大规模场景的渲染。该方法能够剔除当前视角所在区域不可见的高斯点，从而在不影响渲染质量的前提下显著提升渲染速度。\n在多个大规模场景上的实验表明，与现有最先进的方法相比，我们的方法能够实现更优的重建质量，同时显著提升渲染效率。\n"
  },
  {
    "path": "abs/2503.16338.md",
    "content": "### Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images\n\n3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis performance. While conventional methods require per-scene optimization, more recently several feed-forward methods have been proposed to generate pixel-aligned Gaussian representations with a learnable network, which are generalizable to different scenes. However, these methods simply combine pixel-aligned Gaussians from multiple views as scene representations, thereby leading to artifacts and extra memory cost without fully capturing the relations of Gaussians from different images. In this paper, we propose Gaussian Graph Network (GGN) to generate efficient and generalizable Gaussian representations. Specifically, we construct Gaussian Graphs to model the relations of Gaussian groups from different views. To support message passing at Gaussian level, we reformulate the basic graph operations over Gaussian representations, enabling each Gaussian to benefit from its connected Gaussian groups with Gaussian feature fusion. Furthermore, we design a Gaussian pooling layer to aggregate various Gaussian groups for efficient representations. We conduct experiments on the large-scale RealEstate10K and ACID datasets to demonstrate the efficiency and generalization of our method. Compared to the state-of-the-art methods, our model uses fewer Gaussians and achieves better image quality with higher rendering speed.\n\n3D 高斯散点 (3DGS) 在新视角合成任务中展现出了卓越的性能。传统方法通常需要针对每个场景进行优化，而最近出现了一些前馈 (feed-forward) 方法，通过可学习的网络直接生成像素对齐的高斯表示，从而具备跨场景的泛化能力。然而，这些方法仅通过简单地将多个视角的像素对齐高斯点组合成场景表示，未能充分捕捉不同图像间的高斯点关系，导致伪影 (artifacts) 增多，同时带来额外的内存开销。\n为了解决这一问题，我们提出了高斯图网络 (Gaussian Graph Network, GGN)，用于生成高效且具备泛化能力的高斯表示。具体而言，我们构建高斯图 (Gaussian Graphs)，用于建模来自不同视角的高斯点组之间的关系。为了支持高斯级别的信息传递 (message passing)，我们重新定义了基本的图操作，使得高斯点能够通过高斯特征融合 (Gaussian feature fusion) 从其关联的高斯组中受益。此外，我们设计了一种高斯池化层 (Gaussian pooling layer)，用于聚合多个高斯点组，从而生成更加紧凑、高效的表示。\n我们在大规模 RealEstate10K 和 ACID 数据集上进行了实验，验证了我们方法的高效性和泛化能力。与现有最先进的方法相比，我们的模型在使用更少高斯点的情况下，实现了更高质量的图像渲染，同时显著提升渲染速度。\n"
  },
  {
    "path": "abs/2503.16413.md",
    "content": "### M3: 3D-Spatial MultiModal Memory\n\nWe present 3D Spatial MultiModal Memory (M3), a multimodal memory system designed to retain information about medium-sized static scenes through video sources for visual perception. By integrating 3D Gaussian Splatting techniques with foundation models, M3 builds a multimodal memory capable of rendering feature representations across granularities, encompassing a wide range of knowledge. In our exploration, we identify two key challenges in previous works on feature splatting: (1) computational constraints in storing high-dimensional features for each Gaussian primitive, and (2) misalignment or information loss between distilled features and foundation model features. To address these challenges, we propose M3 with key components of principal scene components and Gaussian memory attention, enabling efficient training and inference. To validate M3, we conduct comprehensive quantitative evaluations of feature similarity and downstream tasks, as well as qualitative visualizations to highlight the pixel trace of Gaussian memory attention. Our approach encompasses a diverse range of foundation models, including vision-language models (VLMs), perception models, and large multimodal and language models (LMMs/LLMs). Furthermore, to demonstrate real-world applicability, we deploy M3's feature field in indoor scenes on a quadruped robot. Notably, we claim that M3 is the first work to address the core compression challenges in 3D feature distillation.\n\n我们提出了 3D 空间多模态记忆 (M3)，这是一种多模态记忆系统，旨在通过视频源存储中等规模静态场景的视觉感知信息。M3 结合 3D 高斯散点 (3DGS) 技术与基础模型，构建了一种能够在多个粒度上渲染特征表示的多模态记忆系统，以涵盖广泛的知识范围。\n在研究过程中，我们识别出特征散点研究中的两个关键挑战：(1) 计算约束——为每个高斯基元存储高维特征的计算开销较大，(2) 特征错位或信息损失——蒸馏特征与基础模型特征之间可能存在对齐误差或信息损失。为了解决这些问题，我们提出 M3，其核心组件包括主场景组件和高斯记忆注意力，从而实现高效训练与推理。\n为了验证 M3，我们进行了全面的定量评估，涵盖特征相似性分析和下游任务测试，并通过定性可视化展示高斯记忆注意力的像素级追踪表现。我们的方法涵盖多种基础模型，包括视觉-语言模型 (VLMs)、感知模型，以及大规模多模态和语言模型 (LMMs/LLMs)。此外，为了展示其在真实场景中的应用能力，我们将 M3 的特征场部署在四足机器人的室内场景感知任务中。值得注意的是，我们认为 M3 是首个针对 3D 特征蒸馏核心压缩挑战的研究。\n"
  },
  {
    "path": "abs/2503.16422.md",
    "content": "### 1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering\n\n4D Gaussian Splatting (4DGS) has recently gained considerable attention as a method for reconstructing dynamic scenes. Despite achieving superior quality, 4DGS typically requires substantial storage and suffers from slow rendering speed. In this work, we delve into these issues and identify two key sources of temporal redundancy. (Q1) \\textbf{Short-Lifespan Gaussians}: 4DGS uses a large portion of Gaussians with short temporal span to represent scene dynamics, leading to an excessive number of Gaussians. (Q2) \\textbf{Inactive Gaussians}: When rendering, only a small subset of Gaussians contributes to each frame. Despite this, all Gaussians are processed during rasterization, resulting in redundant computation overhead. To address these redundancies, we present \\textbf{4DGS-1K}, which runs at over 1000 FPS on modern GPUs. For Q1, we introduce the Spatial-Temporal Variation Score, a new pruning criterion that effectively removes short-lifespan Gaussians while encouraging 4DGS to capture scene dynamics using Gaussians with longer temporal spans. For Q2, we store a mask for active Gaussians across consecutive frames, significantly reducing redundant computations in rendering. Compared to vanilla 4DGS, our method achieves a 41× reduction in storage and 9× faster rasterization speed on complex dynamic scenes, while maintaining comparable visual quality.\n\n4D 高斯散点 (4DGS) 最近作为重建动态场景的方法受到广泛关注。尽管其能够实现卓越的渲染质量，但 4DGS 通常需要大量存储，并且渲染速度较慢。在本文中，我们深入探讨了这些问题，并识别出两个关键的时间冗余源。\n(Q1) 短生命周期高斯点：4DGS 使用大量短时间跨度的高斯点来表示场景动态，导致高斯点数量过多。\n(Q2) 非活动高斯点：在渲染过程中，只有一小部分高斯点对每帧有贡献。然而，在光栅化过程中，所有高斯点都会被处理，造成冗余的计算开销。\n为了解决这些冗余问题，我们提出了 4DGS-1K，该方法在现代 GPU 上能够以超过 1000 FPS 的速度运行。针对 Q1，我们引入了 时空变化分数 (Spatial-Temporal Variation Score)，一种新的修剪标准，能够有效地去除短生命周期的高斯点，同时鼓励 4DGS 使用较长时间跨度的高斯点来捕捉场景动态。针对 Q2，我们在连续帧之间存储活动高斯点的掩码，显著减少了渲染中的冗余计算。\n与传统的 4DGS 方法相比，我们的方法在复杂动态场景中实现了 41 倍的存储减少 和 9 倍的光栅化加速，同时保持了相似的视觉质量。\n"
  },
  {
    "path": "abs/2503.16681.md",
    "content": "### GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting\n\n3D intelligence leverages rich 3D features and stands as a promising frontier in AI, with 3D rendering fundamental to many downstream applications. 3D Gaussian Splatting (3DGS), an emerging high-quality 3D rendering method, requires significant computation, making real-time execution on existing GPU-equipped edge devices infeasible. Previous efforts to accelerate 3DGS rely on dedicated accelerators that require substantial integration overhead and hardware costs. This work proposes an acceleration strategy that leverages the similarities between the 3DGS pipeline and the highly optimized conventional graphics pipeline in modern GPUs. Instead of developing a dedicated accelerator, we enhance existing GPU rasterizer hardware to efficiently support 3DGS operations. Our results demonstrate a 23× increase in processing speed and a 24× reduction in energy consumption, with improvements yielding 6× faster end-to-end runtime for the original 3DGS algorithm and 4× for the latest efficiency-improved pipeline, achieving 24 FPS and 46 FPS respectively. These enhancements incur only a minimal area overhead of 0.2% relative to the entire SoC chip area, underscoring the practicality and efficiency of our approach for enabling 3DGS rendering on resource-constrained platforms.\n\n3D 智能利用丰富的三维特征，作为人工智能的一个重要前沿方向，而3D 渲染则是众多下游应用的基础。3D 高斯散点 (3DGS) 作为一种新兴的高质量 3D 渲染方法，其计算需求极为庞大，使得在现有配备 GPU 的边缘设备上无法实现实时执行。此前的 3DGS 加速方案主要依赖专用加速器，但这些方案往往需要较高的集成成本和硬件开销。\n本研究提出了一种加速策略，利用 3DGS 渲染管线与现代 GPU 高度优化的传统图形管线之间的相似性。不同于开发专用加速器，我们通过增强 GPU 光栅化硬件来高效支持 3DGS 操作。实验结果表明，该方法实现了 23× 的处理速度提升和 24× 的能耗降低，并将原始 3DGS 算法的端到端运行时间加速 6×，达到 24 FPS，最新优化的高效渲染管线加速 4×，达到 46 FPS。此外，这一改进仅带来了 0.2% 的额外芯片面积开销（相对于整个 SoC 芯片面积），充分证明了该方案在资源受限平台上支持 3DGS 渲染的可行性和高效性。\n"
  },
  {
    "path": "abs/2503.16710.md",
    "content": "### 4D Gaussian Splatting SLAM\n\nSimultaneously localizing camera poses and constructing Gaussian radiance fields in dynamic scenes establish a crucial bridge between 2D images and the 4D real world. Instead of removing dynamic objects as distractors and reconstructing only static environments, this paper proposes an efficient architecture that incrementally tracks camera poses and establishes the 4D Gaussian radiance fields in unknown scenarios by using a sequence of RGB-D images. First, by generating motion masks, we obtain static and dynamic priors for each pixel. To eliminate the influence of static scenes and improve the efficiency on learning the motion of dynamic objects, we classify the Gaussian primitives into static and dynamic Gaussian sets, while the sparse control points along with an MLP is utilized to model the transformation fields of the dynamic Gaussians. To more accurately learn the motion of dynamic Gaussians, a novel 2D optical flow map reconstruction algorithm is designed to render optical flows of dynamic objects between neighbor images, which are further used to supervise the 4D Gaussian radiance fields along with traditional photometric and geometric constraints. In experiments, qualitative and quantitative evaluation results show that the proposed method achieves robust tracking and high-quality view synthesis performance in real-world environments.\n\n同时本地化相机位姿并构建动态场景中的高斯辐射场，在2D 图像与4D 真实世界之间建立了重要的桥梁。与传统方法通过去除动态物体作为干扰源并仅重建静态环境不同，本文提出了一种高效的架构，通过使用一系列 RGB-D 图像，增量跟踪相机位姿 并在未知场景中构建 4D 高斯辐射场。\n首先，通过生成运动掩码，我们为每个像素获取静态和动态的先验信息。为了消除静态场景的影响并提高动态物体运动学习的效率，我们将高斯基元分类为静态高斯集合和动态高斯集合，同时利用稀疏控制点结合 MLP 来建模动态高斯的变换场。为了更准确地学习动态高斯的运动，我们设计了一种新型的 2D 光流图重建算法，用于渲染相邻图像间动态物体的光流，这些光流进一步用于监督 4D 高斯辐射场，同时结合传统的光度和几何约束。\n实验中的定性和定量评估结果表明，所提出的方法能够在真实环境中实现稳健的跟踪和高质量的视角合成性能。\n"
  },
  {
    "path": "abs/2503.16747.md",
    "content": "### SAGE: Semantic-Driven Adaptive Gaussian Splatting in Extended Reality\n\n3D Gaussian Splatting (3DGS) has significantly improved the efficiency and realism of three-dimensional scene visualization in several applications, ranging from robotics to eXtended Reality (XR). This work presents SAGE (Semantic-Driven Adaptive Gaussian Splatting in Extended Reality), a novel framework designed to enhance the user experience by dynamically adapting the Level of Detail (LOD) of different 3DGS objects identified via a semantic segmentation. Experimental results demonstrate how SAGE effectively reduces memory and computational overhead while keeping a desired target visual quality, thus providing a powerful optimization for interactive XR applications.\n\n3D 高斯散点 (3DGS) 在多个应用领域（从机器人技术到扩展现实 (XR)）中显著提升了三维场景可视化的效率和真实感。本研究提出了 SAGE（基于语义驱动的自适应高斯散点渲染框架），该框架通过语义分割识别不同的 3DGS 对象，并动态调整细节层次 (LOD)，以提升用户体验。实验结果表明，SAGE 能够在保持目标视觉质量的同时有效降低内存和计算开销，从而为交互式 XR 应用提供了一种强大的优化方案。\n"
  },
  {
    "path": "abs/2503.16822.md",
    "content": "### RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos\n\nThis paper considers the problem of modeling articulated objects captured in 2D videos to enable novel view synthesis, while also being easily editable, drivable, and re-posable. To tackle this challenging problem, we propose RigGS, a new paradigm that leverages 3D Gaussian representation and skeleton-based motion representation to model dynamic objects without utilizing additional template priors. Specifically, we first propose skeleton-aware node-controlled deformation, which deforms a canonical 3D Gaussian representation over time to initialize the modeling process, producing candidate skeleton nodes that are further simplified into a sparse 3D skeleton according to their motion and semantic information. Subsequently, based on the resulting skeleton, we design learnable skin deformations and pose-dependent detailed deformations, thereby easily deforming the 3D Gaussian representation to generate new actions and render further high-quality images from novel views. Extensive experiments demonstrate that our method can generate realistic new actions easily for objects and achieve high-quality rendering.\n\n本文探讨了如何建模捕捉到的 2D 视频中的关节化物体，以实现新视角合成，同时使其易于编辑、驱动和重新定位。为了解决这一挑战性问题，我们提出了 RigGS，一种新的方法，它利用 3D 高斯表示 和 基于骨架的运动表示 来建模动态物体，而不需要额外的模板先验。\n具体而言，我们首先提出了骨架感知的节点控制形变，该方法随时间对标准 3D 高斯表示进行形变，初始化建模过程，生成候选的骨架节点，这些节点根据其运动和语义信息进一步简化为稀疏的 3D 骨架。接下来，基于得到的骨架，我们设计了可学习的皮肤形变和姿态依赖的详细形变，从而轻松地对 3D 高斯表示进行形变，生成新的动作，并从新的视角渲染出更高质量的图像。\n大量实验表明，我们的方法能够轻松生成物体的新动作，并实现高质量渲染。\n"
  },
  {
    "path": "abs/2503.16924.md",
    "content": "### Optimized Minimal 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful representation for real-time, high-performance rendering, enabling a wide range of applications. However, representing 3D scenes with numerous explicit Gaussian primitives imposes significant storage and memory overhead. Recent studies have shown that high-quality rendering can be achieved with a substantially reduced number of Gaussians when represented with high-precision attributes. Nevertheless, existing 3DGS compression methods still rely on a relatively large number of Gaussians, focusing primarily on attribute compression. This is because a smaller set of Gaussians becomes increasingly sensitive to lossy attribute compression, leading to severe quality degradation. Since the number of Gaussians is directly tied to computational costs, it is essential to reduce the number of Gaussians effectively rather than only optimizing storage. In this paper, we propose Optimized Minimal Gaussians representation (OMG), which significantly reduces storage while using a minimal number of primitives. First, we determine the distinct Gaussian from the near ones, minimizing redundancy without sacrificing quality. Second, we propose a compact and precise attribute representation that efficiently captures both continuity and irregularity among primitives. Additionally, we propose a sub-vector quantization technique for improved irregularity representation, maintaining fast training with a negligible codebook size. Extensive experiments demonstrate that OMG reduces storage requirements by nearly 50% compared to the previous state-of-the-art and enables 600+ FPS rendering while maintaining high rendering quality. Our source code is available at this https URL.\n\n3D 高斯散点 (3DGS) 已成为一种强大的表示方式，支持实时高性能渲染，并广泛应用于多个领域。然而，使用大量显式高斯基元表示三维场景会带来显著的存储和内存开销。近期研究表明，在保持高精度属性的前提下，可以使用显著减少数量的高斯点实现高质量渲染。然而，现有的 3DGS 压缩方法 仍然依赖较多的高斯点，主要关注属性压缩。这是因为，随着高斯点数量减少，有损属性压缩对其影响更加敏感，容易导致严重的渲染质量下降。\n由于高斯点数量直接决定计算成本，仅优化存储并不足够，关键在于有效减少高斯点数量。为此，我们提出了 优化最小高斯表示 (Optimized Minimal Gaussians, OMG)，在使用极少基元的同时显著降低存储需求。首先，我们从相邻高斯点中识别独立的高斯点，在不牺牲质量的情况下最大程度减少冗余。其次，我们提出了一种紧凑且精确的属性表示，能够高效捕捉基元间的连续性与不规则性。此外，我们还提出了一种子向量量化 (sub-vector quantization) 技术，用于更好地表示不规则性，同时保持快速训练，且仅需极小的码本。\n大量实验表明，与当前最先进的方法相比，OMG 在保持高渲染质量的同时，将存储需求降低近 50%，并支持 600+ FPS 的实时渲染。\n"
  },
  {
    "path": "abs/2503.16964.md",
    "content": "### DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery\n\nDrones have become essential tools for reconstructing wild scenes due to their outstanding maneuverability. Recent advances in radiance field methods have achieved remarkable rendering quality, providing a new avenue for 3D reconstruction from drone imagery. However, dynamic distractors in wild environments challenge the static scene assumption in radiance fields, while limited view constraints hinder the accurate capture of underlying scene geometry. To address these challenges, we introduce DroneSplat, a novel framework designed for robust 3D reconstruction from in-the-wild drone imagery. Our method adaptively adjusts masking thresholds by integrating local-global segmentation heuristics with statistical approaches, enabling precise identification and elimination of dynamic distractors in static scenes. We enhance 3D Gaussian Splatting with multi-view stereo predictions and a voxel-guided optimization strategy, supporting high-quality rendering under limited view constraints. For comprehensive evaluation, we provide a drone-captured 3D reconstruction dataset encompassing both dynamic and static scenes. Extensive experiments demonstrate that DroneSplat outperforms both 3DGS and NeRF baselines in handling in-the-wild drone imagery.\n\n无人机凭借其卓越的机动性，已成为重建野外场景的重要工具。近年来，辐射场方法 (Radiance Field) 在渲染质量上取得了显著进展，为基于无人机影像的 3D 重建 提供了新的可能性。然而，野外环境中的动态干扰因素挑战了辐射场的静态场景假设，而视角受限的问题又阻碍了场景几何结构的精确捕捉。\n为了解决这些挑战，我们提出 DroneSplat，一个专为野外无人机影像的稳健 3D 重建设计的框架。我们的方法结合局部-全局分割启发式 (local-global segmentation heuristics) 与统计方法，自适应调整掩码阈值，以精准识别并去除静态场景中的动态干扰因素。此外，我们结合多视角立体 (multi-view stereo) 预测 和 体素引导优化 (voxel-guided optimization) 策略，增强 3D 高斯散点 (3DGS) 以支持在视角受限的情况下实现高质量渲染。\n为了全面评估，我们构建了一个无人机采集的 3D 重建数据集，涵盖动态场景与静态场景。大量实验表明，DroneSplat 在处理野外无人机影像方面优于 3DGS 和 NeRF 基线方法。\n"
  },
  {
    "path": "abs/2503.16979.md",
    "content": "### Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting\n\nBuilding Free-Viewpoint Videos in a streaming manner offers the advantage of rapid responsiveness compared to offline training methods, greatly enhancing user experience. However, current streaming approaches face challenges of high per-frame reconstruction time (10s+) and error accumulation, limiting their broader application. In this paper, we propose Instant Gaussian Stream (IGS), a fast and generalizable streaming framework, to address these issues. First, we introduce a generalized Anchor-driven Gaussian Motion Network, which projects multi-view 2D motion features into 3D space, using anchor points to drive the motion of all Gaussians. This generalized Network generates the motion of Gaussians for each target frame in the time required for a single inference. Second, we propose a Key-frame-guided Streaming Strategy that refines each key frame, enabling accurate reconstruction of temporally complex scenes while mitigating error accumulation. We conducted extensive in-domain and cross-domain evaluations, demonstrating that our approach can achieve streaming with a average per-frame reconstruction time of 2s+, alongside a enhancement in view synthesis quality.\n\n流式生成自由视角视频 相较于离线训练方法具备更快的响应速度，显著提升用户体验。然而，现有流式方法面临 单帧重建时间过长（10 秒以上） 以及 误差累积 的问题，限制了其广泛应用。为了解决这些问题，我们提出 Instant Gaussian Stream (IGS)，一个快速且具备泛化能力的流式框架。\n首先，我们引入了广义的锚点驱动高斯运动网络 (Anchor-driven Gaussian Motion Network)，该网络将多视角 2D 运动特征投影到 3D 空间，并利用锚点 (anchor points) 驱动所有高斯点的运动。这一通用网络能够在单次推理的时间内预测每个目标帧的高斯运动。\n其次，我们提出关键帧引导的流式策略 (Key-frame-guided Streaming Strategy)，通过精细化处理关键帧，精准重建时间复杂场景，同时缓解误差累积问题。\n我们进行了大规模的域内和跨域评估，结果表明 IGS 能够实现流式处理，平均单帧重建时间降低至 2 秒级，同时提升新视角合成质量。\n"
  },
  {
    "path": "abs/2503.17032.md",
    "content": "### TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting\n\nRealistic 3D full-body talking avatars hold great potential in AR, with applications ranging from e-commerce live streaming to holographic communication. Despite advances in 3D Gaussian Splatting (3DGS) for lifelike avatar creation, existing methods struggle with fine-grained control of facial expressions and body movements in full-body talking tasks. Additionally, they often lack sufficient details and cannot run in real-time on mobile devices. We present TaoAvatar, a high-fidelity, lightweight, 3DGS-based full-body talking avatar driven by various signals. Our approach starts by creating a personalized clothed human parametric template that binds Gaussians to represent appearances. We then pre-train a StyleUnet-based network to handle complex pose-dependent non-rigid deformation, which can capture high-frequency appearance details but is too resource-intensive for mobile devices. To overcome this, we \"bake\" the non-rigid deformations into a lightweight MLP-based network using a distillation technique and develop blend shapes to compensate for details. Extensive experiments show that TaoAvatar achieves state-of-the-art rendering quality while running in real-time across various devices, maintaining 90 FPS on high-definition stereo devices such as the Apple Vision Pro.\n\n逼真的 3D 全身语音驱动虚拟人 在增强现实 (AR) 领域具有广阔的应用前景，涵盖 电商直播、全息通信 等场景。尽管 3D 高斯散点 (3DGS) 在逼真化虚拟人生成方面取得了进展，但现有方法在全身语音驱动任务中仍面临 面部表情与身体动作的精细控制难题，并且细节不足，难以在移动设备上实时运行。\n我们提出 TaoAvatar，一种基于 3DGS 的高保真、轻量化全身语音驱动虚拟人，能够由多种信号驱动。我们的方法首先创建一个个性化的着衣人体参数化模板，将高斯点绑定至该模板以表示外观。随后，我们预训练一个基于 StyleUnet 的网络 来处理复杂的依赖姿态的非刚性形变，该方法能够捕捉高频外观细节，但计算资源需求较高，不适用于移动设备。\n为了解决这一问题，我们利用蒸馏技术 将非刚性形变“烘焙”到一个轻量级的 MLP 网络 中，并开发混合变形 (blend shapes) 机制以补偿细节损失。大量实验表明，TaoAvatar 在保持最先进渲染质量的同时，能够在多种设备上实时运行，并在 Apple Vision Pro 等高分辨率双目设备上达到 90 FPS。\n"
  },
  {
    "path": "abs/2503.17486.md",
    "content": "### ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes\n\n3D Gaussian Splatting (3DGS) has made significant strides in novel view synthesis but is limited by the substantial number of Gaussian primitives required, posing challenges for deployment on lightweight devices. Recent methods address this issue by compressing the storage size of densified Gaussians, yet fail to preserve rendering quality and efficiency. To overcome these limitations, we propose ProtoGS to learn Gaussian prototypes to represent Gaussian primitives, significantly reducing the total Gaussian amount without sacrificing visual quality. Our method directly uses Gaussian prototypes to enable efficient rendering and leverage the resulting reconstruction loss to guide prototype learning. To further optimize memory efficiency during training, we incorporate structure-from-motion (SfM) points as anchor points to group Gaussian primitives. Gaussian prototypes are derived within each group by clustering of K-means, and both the anchor points and the prototypes are optimized jointly. Our experiments on real-world and synthetic datasets prove that we outperform existing methods, achieving a substantial reduction in the number of Gaussians, and enabling high rendering speed while maintaining or even enhancing rendering fidelity.\n\n3D Gaussian Splatting（3DGS）在新视角合成任务中取得了显著进展，但其依赖大量高斯基元，限制了在轻量级设备上的部署。近期方法尝试通过压缩稠密高斯的存储大小来缓解该问题，但未能在保持渲染质量和效率方面取得良好平衡。为克服这些局限，我们提出了 ProtoGS，通过学习高斯原型来表征高斯基元，在不牺牲视觉质量的前提下显著减少所需高斯数量。我们的方法直接利用高斯原型进行高效渲染，并通过重建误差引导原型的学习。为了进一步优化训练过程中的内存效率，我们引入结构光束法（SfM）点作为锚点，对高斯基元进行分组，并在每组内通过 K-means 聚类得到高斯原型，同时联合优化锚点位置与原型参数。在真实和合成数据集上的实验结果表明，我们的方法优于现有方法，在显著减少高斯数量的同时，实现了更高的渲染速度，并保持甚至提升了渲染保真度。\n"
  },
  {
    "path": "abs/2503.17491.md",
    "content": "### Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping\n\nLiDARs provide accurate geometric measurements, making them valuable for ego-motion estimation and reconstruction tasks. Although its success, managing an accurate and lightweight representation of the environment still poses challenges. Both classic and NeRF-based solutions have to trade off accuracy over memory and processing times. In this work, we build on recent advancements in Gaussian Splatting methods to develop a novel LiDAR odometry and mapping pipeline that exclusively relies on Gaussian primitives for its scene representation. Leveraging spherical projection, we drive the refinement of the primitives uniquely from LiDAR measurements. Experiments show that our approach matches the current registration performance, while achieving SOTA results for mapping tasks with minimal GPU requirements. This efficiency makes it a strong candidate for further exploration and potential adoption in real-time robotics estimation tasks.\n\nLiDAR 能够提供精确的几何测量，使其在自运动估计和重建任务中极具价值。尽管取得了显著成果，但如何管理一种既精确又轻量的环境表示仍面临挑战。传统方法与基于 NeRF 的方案都需在精度、内存与处理时间之间权衡取舍。\n在本研究中，我们基于近期 Gaussian Splatting 方法的进展，提出了一种全新的 LiDAR 里程计与建图流程，该流程完全依赖高斯基元作为场景表示。通过球面投影机制，我们从 LiDAR 测量中直接驱动高斯基元的优化。\n实验结果表明，我们的方法在配准性能上与现有技术持平，同时在建图任务中以极低的 GPU 资源消耗达到了当前最优（SOTA）水平。这种高效性使得我们的方法成为实时机器人估计任务中值得进一步探索与应用的有力候选。\n"
  },
  {
    "path": "abs/2503.17574.md",
    "content": "### Is there anything left? Measuring semantic residuals of objects removed from 3D Gaussian Splatting\n\nSearching in and editing 3D scenes has become extremely intuitive with trainable scene representations that allow linking human concepts to elements in the scene. These operations are often evaluated on the basis of how accurately the searched element is segmented or extracted from the scene. In this paper, we address the inverse problem, that is, how much of the searched element remains in the scene after it is removed. This question is particularly important in the context of privacy-preserving mapping when a user reconstructs a 3D scene and wants to remove private elements before sharing the map. To the best of our knowledge, this is the first work to address this question. To answer this, we propose a quantitative evaluation that measures whether a removal operation leaves object residuals that can be reasoned over. The scene is not private when such residuals are present. Experiments on state-of-the-art scene representations show that the proposed metrics are meaningful and consistent with the user study that we also present. We also propose a method to refine the removal based on spatial and semantic consistency.\n\n\n得益于可训练的场景表示，3D 场景的搜索与编辑变得极其直观，这使得人类概念可以自然地与场景中的元素建立联系。这类操作通常基于所搜索元素在场景中被准确分割或提取的程度进行评估。而本文关注的是一个逆问题：在某个元素被移除后，该元素在场景中还残留了多少痕迹。这一问题在隐私保护地图构建中尤为关键，例如当用户重建一个 3D 场景并希望在共享地图前移除其中的私密元素时。就我们所知，这是首个针对该问题展开研究的工作。\n为解答这一问题，我们提出了一种定量评估方法，用以测量移除操作是否留下了可被感知或推断的物体残留物。当此类残留存在时，场景即不再具备隐私性。我们在多种最先进的场景表示方法上进行了实验，结果表明我们提出的指标具有显著意义，并与我们开展的用户研究结果高度一致。此外，我们还提出了一种基于空间一致性与语义一致性的残留精修方法，以进一步优化元素移除效果。\n"
  },
  {
    "path": "abs/2503.17733.md",
    "content": "### GS-LTS: 3D Gaussian Splatting-Based Adaptive Modeling for Long-Term Service Robots\n\n3D Gaussian Splatting (3DGS) has garnered significant attention in robotics for its explicit, high fidelity dense scene representation, demonstrating strong potential for robotic applications. However, 3DGS-based methods in robotics primarily focus on static scenes, with limited attention to the dynamic scene changes essential for long-term service robots. These robots demand sustained task execution and efficient scene updates-challenges current approaches fail to meet. To address these limitations, we propose GS-LTS (Gaussian Splatting for Long-Term Service), a 3DGS-based system enabling indoor robots to manage diverse tasks in dynamic environments over time. GS-LTS detects scene changes (e.g., object addition or removal) via single-image change detection, employs a rule-based policy to autonomously collect multi-view observations, and efficiently updates the scene representation through Gaussian editing. Additionally, we propose a simulation-based benchmark that automatically generates scene change data as compact configuration scripts, providing a standardized, user-friendly evaluation benchmark. Experimental results demonstrate GS-LTS's advantages in reconstruction, navigation, and superior scene updates-faster and higher quality than the image training baseline-advancing 3DGS for long-term robotic operations.\n\n3D Gaussian Splatting（3DGS）因其显式的高保真稠密场景表示在机器人领域受到广泛关注，展现出强大的应用潜力。然而，现有基于 3DGS 的机器人方法主要聚焦于静态场景，鲜有研究关注对长期服务机器人至关重要的动态场景变化。这类机器人需具备持续执行任务和高效更新场景的能力，而现有方法难以满足这一需求。\n为应对上述挑战，我们提出了 GS-LTS（Gaussian Splatting for Long-Term Service），这是一个基于 3DGS 的系统，使室内机器人能够在动态环境中长期执行多样化任务。GS-LTS 通过单张图像的变化检测识别场景变化（如物体的添加或移除），并采用基于规则的策略自主采集多视角观测数据，随后通过高斯编辑实现高效的场景更新。\n此外，我们还提出了一个基于仿真的评测基准，该基准可自动生成场景变化数据，以简洁的配置脚本形式呈现，为研究者提供标准化、易于使用的评测平台。实验结果表明，GS-LTS 在重建、导航和场景更新等任务上均优于以图像训练为基础的现有方法，在速度与质量上实现双重提升，推动 3DGS 在长期机器人应用中的发展。\n"
  },
  {
    "path": "abs/2503.17798.md",
    "content": "### GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting\n\nRecent developments in 3D reconstruction and neural rendering have significantly propelled the capabilities of photo-realistic 3D scene rendering across various academic and industrial fields. The 3D Gaussian Splatting technique, alongside its derivatives, integrates the advantages of primitive-based and volumetric representations to deliver top-tier rendering quality and efficiency. Despite these advancements, the method tends to generate excessive redundant noisy Gaussians overfitted to every training view, which degrades the rendering quality. Additionally, while 3D Gaussian Splatting excels in small-scale and object-centric scenes, its application to larger scenes is hindered by constraints such as limited video memory, excessive optimization duration, and variable appearance across views. To address these challenges, we introduce GaussianFocus, an innovative approach that incorporates a patch attention algorithm to refine rendering quality and implements a Gaussian constraints strategy to minimize redundancy. Moreover, we propose a subdivision reconstruction strategy for large-scale scenes, dividing them into smaller, manageable blocks for individual training. Our results indicate that GaussianFocus significantly reduces unnecessary Gaussians and enhances rendering quality, surpassing existing State-of-The-Art (SoTA) methods. Furthermore, we demonstrate the capability of our approach to effectively manage and render large scenes, such as urban environments, whilst maintaining high fidelity in the visual output.\n\n近年来，三维重建与神经渲染的发展显著推动了各类学术与工业领域中对真实感三维场景渲染能力的提升。3D Gaussian Splatting 技术及其衍生方法结合了基元表示与体渲染的优势，在渲染质量与效率方面均达到了领先水平。然而，尽管取得了诸多进展，该方法往往会在每个训练视角上产生大量冗余噪声高斯点，导致过拟合，进而影响最终渲染质量。\n此外，尽管 3D Gaussian Splatting 在小规模、以物体为中心的场景中表现优异，其在大规模场景中的应用仍受到限制，如显存不足、优化时间过长以及视角间外观变化等问题。\n为解决上述挑战，我们提出了 GaussianFocus，该方法引入Patch Attention 算法以精细化渲染质量，并设计了高斯约束策略以有效减少冗余高斯点。同时，我们提出了一种大场景的分块重建策略，将大型场景划分为若干小块，分别进行训练，从而提升训练效率与可扩展性。\n实验结果表明，GaussianFocus 显著减少了冗余高斯点，并提升了渲染质量，超越了现有的最先进方法（State-of-The-Art, SoTA）。此外，我们还展示了该方法在处理和渲染大规模场景（如城市环境）中的能力，在保持高保真视觉输出的同时，实现了有效管理与渲染。\n"
  },
  {
    "path": "abs/2503.17897.md",
    "content": "### Real-time Global Illumination for Dynamic 3D Gaussian Scenes\n\nWe present a real-time global illumination approach along with a pipeline for dynamic 3D Gaussian models and meshes. Building on a formulated surface light transport model for 3D Gaussians, we address key performance challenges with a fast compound stochastic ray-tracing algorithm and an optimized 3D Gaussian rasterizer. Our pipeline integrates multiple real-time techniques to accelerate performance and achieve high-quality lighting effects. Our approach enables real-time rendering of dynamic scenes with interactively editable materials and dynamic lighting of diverse multi-lights settings, capturing mutual multi-bounce light transport (indirect illumination) between 3D Gaussians and mesh. Additionally, we present a real-time renderer with an interactive user interface, validating our approach and demonstrating its practicality and high efficiency with over 40 fps in scenes including both 3D Gaussians and mesh. Furthermore, our work highlights the potential of 3D Gaussians in real-time applications with dynamic lighting, offering insights into performance and optimization.\n\n我们提出了一种面向动态 3D 高斯模型与网格模型的实时全局光照方法及其完整渲染流程。基于我们构建的 3D 高斯表面光传输模型，针对关键性能瓶颈，我们设计了一个高效的复合随机光线追踪算法以及一个优化的 3D 高斯光栅器。该渲染流程集成了多种实时技术，显著提升了渲染性能，并实现了高质量的光照效果。\n我们的方法支持动态场景的实时渲染，包括可交互编辑材质、多光源设置下的动态照明，以及在 3D 高斯与网格模型之间的多次间接光传输（即互相反射的间接照明）。此外，我们还开发了一个配有交互式用户界面的实时渲染器，验证了我们方法的可行性与高效率：在包含高斯点和网格模型的场景中，渲染速度可达 40 帧每秒以上。\n本研究进一步表明，3D 高斯表示在动态照明场景下的实时应用具有巨大潜力，并在性能优化与系统设计方面提供了有价值的参考与启发。\n"
  },
  {
    "path": "abs/2503.18052.md",
    "content": "### SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining\n\nRecognizing arbitrary or previously unseen categories is essential for comprehensive real-world 3D scene understanding. Currently, all existing methods rely on 2D or textual modalities during training, or together at inference. This highlights a clear absence of a model capable of processing 3D data alone for learning semantics end-to-end, along with the necessary data to train such a model. Meanwhile, 3D Gaussian Splatting (3DGS) has emerged as the de facto standard for 3D scene representation across various vision tasks. However, effectively integrating semantic reasoning into 3DGS in a generalizable fashion remains an open challenge. To address these limitations we introduce SceneSplat, to our knowledge the first large-scale 3D indoor scene understanding approach that operates natively on 3DGS. Furthermore, we propose a self-supervised learning scheme that unlocks rich 3D feature learning from unlabeled scenes. In order to power the proposed methods, we introduce SceneSplat-7K, the first large-scale 3DGS dataset for indoor scenes, comprising of 6868 scenes derived from 7 established datasets like ScanNet, Matterport3D, etc. Generating SceneSplat-7K required computational resources equivalent to 119 GPU-days on an L4 GPU, enabling standardized benchmarking for 3DGS-based reasoning for indoor scenes. Our exhaustive experiments on SceneSplat-7K demonstrate the significant benefit of the proposed methods over the established baselines.\n\n识别任意类别或先前未见类别是实现对真实世界 3D 场景全面理解的关键。然而，当前所有已有方法在训练过程中都依赖 2D 模态或文本模态，或在推理阶段结合这两者使用。这凸显出当前缺乏一种能够仅基于 3D 数据进行端到端语义学习的模型，以及支持训练此类模型所需的数据。\n与此同时，3D Gaussian Splatting（3DGS）已成为多种视觉任务中事实上的标准 3D 场景表示方式。然而，如何以具有泛化能力的方式将语义推理有效整合进 3DGS，仍是一个尚未解决的难题。\n为突破上述限制，我们提出了 SceneSplat——据我们所知，这是首个原生运行于 3DGS 表示上的大规模室内场景理解方法。此外，我们还提出了一种自监督学习框架，能够从无标签的场景中挖掘丰富的 3D 特征表示。\n为了支持我们的方法，我们构建了 SceneSplat-7K，这是首个面向室内场景的大规模 3DGS 数据集，包含来自 ScanNet、Matterport3D 等七个已有数据集的 6868 个场景。SceneSplat-7K 的生成耗费了约 119 个 L4 GPU 天的计算资源，为基于 3DGS 的室内场景推理提供了标准化评测基准。\n我们在 SceneSplat-7K 上进行了大量实证实验，结果表明，所提出的方法在多个指标上显著优于现有基线方法。\n"
  },
  {
    "path": "abs/2503.18073.md",
    "content": "### PanopticSplatting: End-to-End Panoptic Gaussian Splatting\n\nOpen-vocabulary panoptic reconstruction is a challenging task for simultaneous scene reconstruction and understanding. Recently, methods have been proposed for 3D scene understanding based on Gaussian splatting. However, these methods are multi-staged, suffering from the accumulated errors and the dependence of hand-designed components. To streamline the pipeline and achieve global optimization, we propose PanopticSplatting, an end-to-end system for open-vocabulary panoptic reconstruction. Our method introduces query-guided Gaussian segmentation with local cross attention, lifting 2D instance masks without cross-frame association in an end-to-end way. The local cross attention within view frustum effectively reduces the training memory, making our model more accessible to large scenes with more Gaussians and objects. In addition, to address the challenge of noisy labels in 2D pseudo masks, we propose label blending to promote consistent 3D segmentation with less noisy floaters, as well as label warping on 2D predictions which enhances multi-view coherence and segmentation accuracy. Our method demonstrates strong performances in 3D scene panoptic reconstruction on the ScanNet-V2 and ScanNet++ datasets, compared with both NeRF-based and Gaussian-based panoptic reconstruction methods. Moreover, PanopticSplatting can be easily generalized to numerous variants of Gaussian splatting, and we demonstrate its robustness on different Gaussian base models.\n\n开放词汇全景重建是一项同时实现场景重建与理解的挑战性任务。近年来，已有方法尝试基于高斯渲染（Gaussian Splatting）进行三维场景理解，但这些方法往往采用多阶段处理流程，容易引入累积误差，并依赖大量手工设计的模块，限制了系统的鲁棒性与扩展性。\n为简化流程并实现全局优化，我们提出了 PanopticSplatting——一个面向开放词汇全景重建的端到端系统。我们的方法引入了基于查询引导的高斯分割机制，结合视锥内的局部交叉注意力，无需跨帧关联即可将二维实例掩码提升为三维表示，从而实现端到端的训练。视锥内的局部交叉注意机制显著降低了训练内存开销，使模型能够适应更多高斯点和物体的大场景重建。\n此外，针对二维伪标签中存在的噪声问题，我们提出了**标签融合（label blending）**策略，以提升三维分割的一致性并减少悬浮噪声点。同时，**标签变换（label warping）**机制能够增强多视角间的语义一致性与分割精度。\n在 ScanNet-V2 和 ScanNet++ 数据集上的实验表明，PanopticSplatting 在三维全景重建任务中相较于基于 NeRF 和基于 Gaussian 的方法均表现出更强的性能表现。此外，PanopticSplatting 具有良好的通用性，可轻松适配多种高斯渲染变体，我们进一步验证了其在不同高斯基模型上的鲁棒性。\n"
  },
  {
    "path": "abs/2503.18107.md",
    "content": "### PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding\n\nRecently, 3D Gaussian Splatting (3DGS) has shown encouraging performance for open vocabulary scene understanding tasks. However, previous methods cannot distinguish 3D instance-level information, which usually predicts a heatmap between the scene feature and text query. In this paper, we propose PanoGS, a novel and effective 3D panoptic open vocabulary scene understanding approach. Technically, to learn accurate 3D language features that can scale to large indoor scenarios, we adopt the pyramid tri-plane to model the latent continuous parametric feature space and use a 3D feature decoder to regress the multi-view fused 2D feature cloud. Besides, we propose language-guided graph cuts that synergistically leverage reconstructed geometry and learned language cues to group 3D Gaussian primitives into a set of super-primitives. To obtain 3D consistent instance, we perform graph clustering based segmentation with SAM-guided edge affinity computation between different super-primitives. Extensive experiments on widely used datasets show better or more competitive performance on 3D panoptic open vocabulary scene understanding.\n\n近年来，3D Gaussian Splatting（3DGS）在开放词汇场景理解任务中展现出令人鼓舞的性能。然而，现有方法通常仅通过场景特征与文本查询之间的热力图来建立关联，难以实现对三维实例级信息的区分。\n为此，本文提出了一种新颖且高效的三维全景式开放词汇场景理解方法——PanoGS。在技术上，为了学习可扩展至大规模室内场景的高质量三维语言特征，我们采用 金字塔三平面（pyramid tri-plane）结构来建模潜在的连续参数特征空间，并通过一个 三维特征解码器回归多视图融合后的二维特征点云。\n此外，我们提出了语言引导的图割算法（language-guided graph cuts），结合重建几何信息与学习到的语言线索，将三维高斯基元划分为一组超基元（super-primitives）。为了实现三维一致性的实例分割，我们在超基元之间计算由 SAM（Segment Anything Model）引导的边界亲和度，并执行图聚类分割。\n我们在多个主流数据集上进行了大量实验，结果表明，PanoGS 在三维全景开放词汇场景理解任务中实现了优于或具有竞争力的性能。\n"
  },
  {
    "path": "abs/2503.18108.md",
    "content": "### Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving\n\nEnd-to-end (E2E) autonomous driving (AD) models require diverse, high-quality data to perform well across various driving scenarios. However, collecting large-scale real-world data is expensive and time-consuming, making high-fidelity synthetic data essential for enhancing data diversity and model robustness. Existing driving simulators for synthetic data generation have significant limitations: game-engine-based simulators struggle to produce realistic sensor data, while NeRF-based and diffusion-based methods face efficiency challenges. Additionally, recent simulators designed for closed-loop evaluation provide limited interaction with other vehicles, failing to simulate complex real-world traffic dynamics. To address these issues, we introduce SceneCrafter, a realistic, interactive, and efficient AD simulator based on 3D Gaussian Splatting (3DGS). SceneCrafter not only efficiently generates realistic driving logs across diverse traffic scenarios but also enables robust closed-loop evaluation of end-to-end models. Experimental results demonstrate that SceneCrafter serves as both a reliable evaluation platform and a efficient data generator that significantly improves end-to-end model generalization.\n\n端到端（End-to-End, E2E）自动驾驶（Autonomous Driving, AD）模型在各种驾驶场景中取得良好性能，依赖于多样且高质量的数据。然而，收集大规模真实世界数据成本高昂且耗时，因此高保真的合成数据对于提升数据多样性与模型鲁棒性至关重要。现有用于合成数据生成的驾驶模拟器存在诸多局限：基于游戏引擎的模拟器难以生成真实的传感器数据，而基于 NeRF 和扩散模型的方法则面临效率瓶颈。此外，近年来面向闭环评估设计的模拟器在与其他车辆的交互方面能力有限，无法有效模拟复杂的现实交通动态。\n为了解决上述问题，我们提出了 SceneCrafter，这是一种基于三维高斯投影（3D Gaussian Splatting, 3DGS）的逼真、可交互且高效的自动驾驶模拟器。SceneCrafter 不仅能够高效生成涵盖多样交通场景的真实驾驶日志，还支持对端到端模型进行稳健的闭环评估。\n实验结果表明，SceneCrafter 既是一个可靠的评估平台，又是一个高效的数据生成器，显著提升了端到端模型的泛化能力。\n"
  },
  {
    "path": "abs/2503.18275.md",
    "content": "### GI-SLAM: Gaussian-Inertial SLAM\n\n3D Gaussian Splatting (3DGS) has recently emerged as a powerful representation of geometry and appearance for dense Simultaneous Localization and Mapping (SLAM). Through rapid, differentiable rasterization of 3D Gaussians, many 3DGS SLAM methods achieve near real-time rendering and accelerated training. However, these methods largely overlook inertial data, witch is a critical piece of information collected from the inertial measurement unit (IMU). In this paper, we present GI-SLAM, a novel gaussian-inertial SLAM system which consists of an IMU-enhanced camera tracking module and a realistic 3D Gaussian-based scene representation for mapping. Our method introduces an IMU loss that seamlessly integrates into the deep learning framework underpinning 3D Gaussian Splatting SLAM, effectively enhancing the accuracy, robustness and efficiency of camera tracking. Moreover, our SLAM system supports a wide range of sensor configurations, including monocular, stereo, and RGBD cameras, both with and without IMU integration. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the EuRoC and TUM-RGBD datasets.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）近年来作为一种强大的几何与外观表示方式，已在稠密同时定位与建图（SLAM）任务中展现出巨大潜力。通过对三维高斯的快速、可微光栅化，许多 3DGS-SLAM 方法实现了近实时渲染与加速训练。然而，这些方法大多忽略了惯性测量单元（IMU）所提供的惯性数据，而这恰恰是 SLAM 系统中一项关键的信息来源。\n在本文中，我们提出了 GI-SLAM，一种新颖的高斯-惯性 SLAM 系统，由一个增强了 IMU 的相机跟踪模块和基于真实三维高斯的场景建图表示组成。我们引入了一种 IMU 损失函数，可无缝集成至 3DGS-SLAM 所依赖的深度学习框架中，从而有效提升相机跟踪的精度、鲁棒性与效率。\n此外，我们的 SLAM 系统支持多种传感器配置，包括单目、双目和 RGBD 相机，均可灵活选择是否搭载 IMU。在 EuRoC 和 TUM-RGBD 数据集上的实验证明，我们的方法在实时性与精度方面均达到了与现有最先进方法相当的水平。\n"
  },
  {
    "path": "abs/2503.18402.md",
    "content": "### DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds\n\n3D Gaussian Splatting (3DGS) renders pixels by rasterizing Gaussian primitives, where the rendering resolution and the primitive number, concluded as the optimization complexity, dominate the time cost in primitive optimization. In this paper, we propose DashGaussian, a scheduling scheme over the optimization complexity of 3DGS that strips redundant complexity to accelerate 3DGS optimization. Specifically, we formulate 3DGS optimization as progressively fitting 3DGS to higher levels of frequency components in the training views, and propose a dynamic rendering resolution scheme that largely reduces the optimization complexity based on this formulation. Besides, we argue that a specific rendering resolution should cooperate with a proper primitive number for a better balance between computing redundancy and fitting quality, where we schedule the growth of the primitives to synchronize with the rendering resolution. Extensive experiments show that our method accelerates the optimization of various 3DGS backbones by 45.7% on average while preserving the rendering quality.\n\n3D Gaussian Splatting（3DGS）通过栅格化高斯基元实现像素渲染，其渲染分辨率与基元数量共同构成优化复杂度，是影响基元优化时间开销的核心因素。本文提出 DashGaussian，一种面向 3DGS 优化复杂度的调度方案，旨在剥离冗余复杂度以加速优化过程。\n具体而言，我们将 3DGS 优化过程建模为对训练视图中逐级更高频率成分的逐步拟合，并基于此提出一种动态渲染分辨率策略，以显著降低优化复杂度。此外，我们指出，特定的渲染分辨率应与合适的基元数量协同配合，以在计算冗余与拟合质量之间取得更优平衡。因此，我们设计了一种与渲染分辨率同步的基元增长调度机制。\n大量实验证明，DashGaussian 能在保持渲染质量的前提下，将多种 3DGS 主干网络的优化速度平均提升 45.7%，显著加速 3DGS 的训练过程。\n\n"
  },
  {
    "path": "abs/2503.18421.md",
    "content": "### 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video\n\n3D Gaussian Splatting (3DGS) has substantial potential for enabling photorealistic Free-Viewpoint Video (FVV) experiences. However, the vast number of Gaussians and their associated attributes poses significant challenges for storage and transmission. Existing methods typically handle dynamic 3DGS representation and compression separately, neglecting motion information and the rate-distortion (RD) trade-off during training, leading to performance degradation and increased model redundancy. To address this gap, we propose 4DGC, a novel rate-aware 4D Gaussian compression framework that significantly reduces storage size while maintaining superior RD performance for FVV. Specifically, 4DGC introduces a motion-aware dynamic Gaussian representation that utilizes a compact motion grid combined with sparse compensated Gaussians to exploit inter-frame similarities. This representation effectively handles large motions, preserving quality and reducing temporal redundancy. Furthermore, we present an end-to-end compression scheme that employs differentiable quantization and a tiny implicit entropy model to compress the motion grid and compensated Gaussians efficiently. The entire framework is jointly optimized using a rate-distortion trade-off. Extensive experiments demonstrate that 4DGC supports variable bitrates and consistently outperforms existing methods in RD performance across multiple datasets.\n\n3D Gaussian Splatting（3DGS）在实现真实感自由视角视频（Free-Viewpoint Video, FVV）体验方面展现出巨大潜力。然而，海量的高斯基元及其附带属性对存储与传输提出了严峻挑战。现有方法通常将动态 3DGS 表示与压缩过程分离处理，忽略了运动信息及训练过程中的码率-失真（Rate-Distortion, RD）权衡，导致性能下降与模型冗余增加。\n为弥补这一空白，我们提出了 4DGC，一个新颖的、具备码率感知能力的四维高斯压缩框架，在大幅压缩存储空间的同时，实现了卓越的 RD 性能，适用于 FVV 场景。\n具体而言，4DGC 引入了一种具备运动感知能力的动态高斯表示，结合紧凑的运动网格（motion grid）与稀疏运动补偿高斯（sparse compensated Gaussians），充分挖掘帧间相似性。该表示方式可有效处理大幅度运动，既能保持画质，又能减少时序冗余。\n此外，我们设计了一种端到端压缩方案，采用可微分量化（differentiable quantization）和轻量级隐式熵模型（tiny implicit entropy model）对运动网格与补偿高斯进行高效压缩。整个框架在训练阶段以 RD 权衡为目标进行联合优化。\n大量实验证明，4DGC 能够支持多种比特率选择，并在多个数据集上持续优于现有方法，在保持高质量渲染的同时显著提高压缩效率。\n\n"
  },
  {
    "path": "abs/2503.18438.md",
    "content": "### ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation\n\nCombining reconstruction models with generative models has emerged as a promising paradigm for closed-loop simulation in autonomous driving. For example, ReconDreamer has demonstrated remarkable success in rendering large-scale maneuvers. However, a significant gap remains between the generated data and real-world sensor observations, particularly in terms of fidelity for structured elements, such as the ground surface. To address these challenges, we propose ReconDreamer++, an enhanced framework that significantly improves the overall rendering quality by mitigating the domain gap and refining the representation of the ground surface. Specifically, ReconDreamer++ introduces the Novel Trajectory Deformable Network (NTDNet), which leverages learnable spatial deformation mechanisms to bridge the domain gap between synthesized novel views and original sensor observations. Moreover, for structured elements such as the ground surface, we preserve geometric prior knowledge in 3D Gaussians, and the optimization process focuses on refining appearance attributes while preserving the underlying geometric structure. Experimental evaluations conducted on multiple datasets (Waymo, nuScenes, PandaSet, and EUVS) confirm the superior performance of ReconDreamer++. Specifically, on Waymo, ReconDreamer++ achieves performance comparable to Street Gaussians for the original trajectory while significantly outperforming ReconDreamer on novel trajectories. In particular, it achieves substantial improvements, including a 6.1% increase in NTA-IoU, a 23. 0% improvement in FID, and a remarkable 4.5% gain in the ground surface metric NTL-IoU, highlighting its effectiveness in accurately reconstructing structured elements such as the road surface.\n\n将重建模型与生成模型相结合，已成为自动驾驶闭环仿真中的一种颇具前景的范式。例如，ReconDreamer 在渲染大规模驾驶操作方面已展现出显著成果。然而，生成数据与真实传感器观测之间仍存在显著差距，特别是在诸如地面表面等结构化元素的保真度方面。\n为了解决这些问题，我们提出了 ReconDreamer++，这一增强框架通过缩小域间差距并精细化地面表示，显著提升了整体渲染质量。具体而言，ReconDreamer++ 引入了新颖的轨迹可变形网络（Novel Trajectory Deformable Network, NTDNet），该网络利用可学习的空间变形机制，弥合合成新视图与原始传感器观测之间的域差异。\n此外，对于如地面等结构化元素，我们在三维高斯中保留几何先验知识，优化过程专注于外观属性的细化，同时保持底层几何结构的稳定性。\n在多个数据集（Waymo、nuScenes、PandaSet 和 EUVS）上的实验评估验证了 ReconDreamer++ 的卓越性能。具体来说，在 Waymo 数据集上，ReconDreamer++ 在原始轨迹下达到了可与 Street Gaussians 相媲美的性能，并在新轨迹场景下显著优于 ReconDreamer。尤其值得关注的是，其在 NTA-IoU 上提升了 6.1%，FID 改进了 23.0%，地面表面评估指标 NTL-IoU 提升了 4.5%，充分体现了 ReconDreamer++ 在准确重建道路等结构化元素方面的有效性。\n"
  },
  {
    "path": "abs/2503.18458.md",
    "content": "### StableGS: A Floater-Free Framework for 3D Gaussian Splatting\n\nRecent years have witnessed remarkable success of 3D Gaussian Splatting (3DGS) in novel view synthesis, surpassing prior differentiable rendering methods in both quality and efficiency. However, its training process suffers from coupled opacity-color optimization that frequently converges to local minima, producing floater artifacts that degrade visual fidelity. We present StableGS, a framework that eliminates floaters through cross-view depth consistency constraints while introducing a dual-opacity GS model to decouple geometry and material properties of translucent objects. To further enhance reconstruction quality in weakly-textured regions, we integrate DUSt3R depth estimation, significantly improving geometric stability. Our method fundamentally addresses 3DGS training instabilities, outperforming existing state-of-the-art methods across open-source datasets.\n\n近年来，3D Gaussian Splatting（3DGS）在新视角合成任务中取得了显著成功，在渲染质量与效率方面均超越了以往的可微分渲染方法。然而，其训练过程中的不透明度与颜色耦合优化常常陷入局部最优，导致出现破坏视觉真实感的漂浮伪影（floater artifacts）。\n为解决这一问题，我们提出了 StableGS 框架，通过引入跨视角深度一致性约束有效消除漂浮伪影，并设计了一种双不透明度高斯模型（dual-opacity GS model），从而解耦半透明物体的几何结构与材质属性。\n此外，为了进一步提升在纹理较弱区域的重建质量，我们集成了 DUSt3R 深度估计方法，显著增强了几何结构的稳定性。\n本方法从根本上解决了 3DGS 在训练阶段的稳定性问题，并在多个开源数据集上超越现有最先进方法（State-of-the-Art），展现出更高的渲染保真度与鲁棒性。\n"
  },
  {
    "path": "abs/2503.18640.md",
    "content": "### LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment\n\n3D Gaussian Splatting has shown remarkable capabilities in novel view rendering tasks and exhibits significant potential for multi-view optimization.However, the original 3D Gaussian Splatting lacks color representation for inputs in low-light environments. Simply using enhanced images as inputs would lead to issues with multi-view consistency, and current single-view enhancement systems rely on pre-trained data, lacking scene generalization. These problems limit the application of 3D Gaussian Splatting in low-light conditions in the field of robotics, including high-fidelity modeling and feature matching. To address these challenges, we propose an unsupervised multi-view stereoscopic system based on Gaussian Splatting, called Low-Light Gaussian Splatting (LLGS). This system aims to enhance images in low-light environments while reconstructing the scene. Our method introduces a decomposable Gaussian representation called M-Color, which separately characterizes color information for targeted enhancement. Furthermore, we propose an unsupervised optimization method with zero-knowledge priors, using direction-based enhancement to ensure multi-view consistency. Experiments conducted on real-world datasets demonstrate that our system outperforms state-of-the-art methods in both low-light enhancement and 3D Gaussian Splatting.\n\n3D Gaussian Splatting 在新视角渲染任务中展现出卓越能力，并在多视角场景建模中具有广泛应用潜力。然而，原始的 3D Gaussian Splatting 方法缺乏对低光照环境输入的颜色表征能力。若直接使用增强后的图像作为输入，往往会导致多视角不一致问题；而现有单视角图像增强方法普遍依赖预训练数据，缺乏对不同场景的泛化能力。这些问题限制了 3DGS 在低光照环境下的应用，特别是在机器人领域中的高保真建模与特征匹配任务。\n为应对上述挑战，我们提出了一种基于 Gaussian Splatting 的无监督多视角立体系统，命名为 Low-Light Gaussian Splatting（LLGS）。该系统旨在实现低光照环境下的图像增强与场景重建的统一建模。\n具体而言，我们引入了一种可分解的高斯表示 M-Color，用于对颜色信息进行独立建模与定向增强，从而实现有针对性的亮度提升。同时，我们提出了一种零先验的无监督优化方法，通过基于方向的一致性增强机制，确保多视角间的颜色一致性。\n在真实世界数据集上的实验证明，LLGS 在低光照增强和 3D Gaussian Splatting 渲染质量方面均优于现有最先进方法，展现出良好的鲁棒性与实用价值。\n\n"
  },
  {
    "path": "abs/2503.18682.md",
    "content": "### Hardware-Rasterized Ray-Based Gaussian Splatting\n\nWe present a novel, hardware rasterized rendering approach for ray-based 3D Gaussian Splatting (RayGS), obtaining both fast and high-quality results for novel view synthesis. Our work contains a mathematically rigorous and geometrically intuitive derivation about how to efficiently estimate all relevant quantities for rendering RayGS models, structured with respect to standard hardware rasterization shaders. Our solution is the first enabling rendering RayGS models at sufficiently high frame rates to support quality-sensitive applications like Virtual and Mixed Reality. Our second contribution enables alias-free rendering for RayGS, by addressing MIP-related issues arising when rendering diverging scales during training and testing. We demonstrate significant performance gains, across different benchmark scenes, while retaining state-of-the-art appearance quality of RayGS.\n\n我们提出了一种新颖的、基于硬件光栅化的光线渲染方法，用于 3D Gaussian Splatting（RayGS）的加速渲染，在实现快速渲染的同时保持高质量的新视角合成效果。我们的方法以数学上严谨、几何上直观的推导为基础，高效估算 RayGS 渲染过程中所需的所有关键量，并以标准的硬件光栅化着色器结构为框架组织整个渲染流程。\n本研究是首个能够以足够高的帧率渲染 RayGS 模型的方案，能够满足对质量敏感的虚拟现实（VR）与混合现实（MR）等应用的实时需求。\n此外，我们还提出了第二项贡献：实现 RayGS 的抗混叠渲染。具体地，我们解决了训练与测试阶段由于尺度差异导致的 MIP（多层次纹理映射）相关问题，从而有效提升了渲染精度与一致性。\n在多个基准场景上的实验结果表明，我们的方法在保持 RayGS 渲染外观质量的同时，显著提升了渲染性能，为高性能实时应用提供了有力支撑。\n\n"
  },
  {
    "path": "abs/2503.18718.md",
    "content": "### GS-Marker: Generalizable and Robust Watermarking for 3D Gaussian Splatting\n\nIn the Generative AI era, safeguarding 3D models has become increasingly urgent. While invisible watermarking is well-established for 2D images with encoder-decoder frameworks, generalizable and robust solutions for 3D remain elusive. The main difficulty arises from the renderer between the 3D encoder and 2D decoder, which disrupts direct gradient flow and complicates training. Existing 3D methods typically rely on per-scene iterative optimization, resulting in time inefficiency and limited generalization. In this work, we propose a single-pass watermarking approach for 3D Gaussian Splatting (3DGS), a well-known yet underexplored representation for watermarking. We identify two major challenges: (1) ensuring effective training generalized across diverse 3D models, and (2) reliably extracting watermarks from free-view renderings, even under distortions. Our framework, named GS-Marker, incorporates a 3D encoder to embed messages, distortion layers to enhance resilience against various distortions, and a 2D decoder to extract watermarks from renderings. A key innovation is the Adaptive Marker Control mechanism that adaptively perturbs the initially optimized 3DGS, escaping local minima and improving both training stability and convergence. Extensive experiments show that GS-Marker outperforms per-scene training approaches in terms of decoding accuracy and model fidelity, while also significantly reducing computation time.\n\n在生成式人工智能时代，保护三维模型已变得日益紧迫。尽管不可见水印技术已在二维图像领域通过编码器-解码器框架得到了充分研究，但在三维场景下仍缺乏通用且鲁棒的解决方案。主要困难在于，三维编码器与二维解码器之间存在渲染器，阻碍了梯度的直接传播，并使训练过程复杂化。现有三维方法通常依赖于逐场景的迭代优化，导致训练效率低下且泛化能力有限。\n在本工作中，我们提出了一种适用于 3D Gaussian Splatting（3DGS）的单次水印嵌入方法，3DGS 是一种知名但在水印任务中尚未被充分探索的表示形式。我们识别出两个主要挑战：（1）确保训练过程在多样化三维模型上的泛化能力；（2）在存在失真的情况下，从自由视角渲染图像中可靠地提取水印。\n我们提出的框架名为 GS-Marker，包含一个三维编码器用于嵌入信息、失真层用于增强对各种干扰的鲁棒性，以及一个二维解码器用于从渲染图像中提取水印。其关键创新在于自适应水印控制机制（Adaptive Marker Control），该机制通过对初始优化的 3DGS 进行自适应扰动，以跳出局部最优、提升训练稳定性与收敛性能。\n在多个实验中，我们的方法在水印解码精度与模型保真度方面优于逐场景训练方法，同时显著减少了计算时间。\n"
  },
  {
    "path": "abs/2503.18794.md",
    "content": "### NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting\n\nNeural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) have noticeably advanced photo-realistic novel view synthesis using images from densely spaced camera viewpoints. However, these methods struggle in few-shot scenarios due to limited supervision. In this paper, we present NexusGS, a 3DGS-based approach that enhances novel view synthesis from sparse-view images by directly embedding depth information into point clouds, without relying on complex manual regularizations. Exploiting the inherent epipolar geometry of 3DGS, our method introduces a novel point cloud densification strategy that initializes 3DGS with a dense point cloud, reducing randomness in point placement while preventing over-smoothing and overfitting. Specifically, NexusGS comprises three key steps: Epipolar Depth Nexus, Flow-Resilient Depth Blending, and Flow-Filtered Depth Pruning. These steps leverage optical flow and camera poses to compute accurate depth maps, while mitigating the inaccuracies often associated with optical flow. By incorporating epipolar depth priors, NexusGS ensures reliable dense point cloud coverage and supports stable 3DGS training under sparse-view conditions. Experiments demonstrate that NexusGS significantly enhances depth accuracy and rendering quality, surpassing state-of-the-art methods by a considerable margin. Furthermore, we validate the superiority of our generated point clouds by substantially boosting the performance of competing methods.\n\n神经辐射场（NeRF）和 3D Gaussian Splatting（3DGS）在利用密集相机视角图像进行真实感新视角合成方面取得了显著进展。然而，这些方法在少视角（few-shot）场景下由于监督信号不足而表现不佳。\n本文提出 NexusGS，一种基于 3DGS 的新方法，可在稀疏视角图像条件下增强新视角合成能力。该方法通过直接将深度信息嵌入点云中，无需依赖复杂的人工正则化策略。借助 3DGS 中固有的极几何结构，NexusGS 引入了一种全新的点云致密化策略，在初始化阶段使用高密度点云对 3DGS 进行建模，从而减少点位分布的随机性，并有效避免过度平滑与过拟合。\n具体而言，NexusGS 包含以下三个核心步骤：极线深度融合（Epipolar Depth Nexus）、光流鲁棒深度融合（Flow-Resilient Depth Blending）以及光流过滤深度剪枝（Flow-Filtered Depth Pruning）。这些步骤结合光流与相机位姿，计算准确的深度图，同时缓解光流常见的不稳定性问题。通过引入极线深度先验，NexusGS 实现了可靠的点云覆盖效果，为 3DGS 在稀疏视角下的稳定训练提供了支撑。\n实验结果表明，NexusGS 在深度精度与渲染质量方面均显著优于现有先进方法。此外，我们进一步验证了所生成点云的优越性，显著提升了多个现有方法在相应任务中的表现。\n"
  },
  {
    "path": "abs/2503.19232.md",
    "content": "### HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting\n\nNovel view synthesis has demonstrated impressive progress recently, with 3D Gaussian splatting (3DGS) offering efficient training time and photorealistic real-time rendering. However, reliance on Cartesian coordinates limits 3DGS's performance on distant objects, which is important for reconstructing unbounded outdoor environments. We found that, despite its ultimate simplicity, using homogeneous coordinates, a concept on the projective geometry, for the 3DGS pipeline remarkably improves the rendering accuracies of distant objects. We therefore propose Homogeneous Gaussian Splatting (HoGS) incorporating homogeneous coordinates into the 3DGS framework, providing a unified representation for enhancing near and distant objects. HoGS effectively manages both expansive spatial positions and scales particularly in outdoor unbounded environments by adopting projective geometry principles. Experiments show that HoGS significantly enhances accuracy in reconstructing distant objects while maintaining high-quality rendering of nearby objects, along with fast training speed and real-time rendering capability.\n\n新视角合成（Novel View Synthesis）近年来取得了令人瞩目的进展，其中三维高斯溅射（3D Gaussian Splatting, 3DGS）在高效训练和真实感实时渲染方面表现突出。然而，3DGS 对笛卡尔坐标的依赖限制了其在远距离物体重建中的性能，而这在重建无边界的户外环境时尤为关键。我们发现，尽管方法极其简洁，将投影几何中的齐次坐标引入 3DGS 流水线，可以显著提升远距离物体的渲染精度。因此，我们提出了 齐次高斯溅射（Homogeneous Gaussian Splatting, HoGS），将齐次坐标融入 3DGS 框架中，提供统一的表示方式以同时增强近距离和远距离物体的渲染质量。HoGS 通过采用投影几何的原理，有效处理了户外无边界场景中广阔的空间位置与尺度变化。实验结果表明，HoGS 在保持近距离物体高质量渲染和快速训练、实时渲染能力的同时，大幅提升了远距离物体的重建精度。\n\n"
  },
  {
    "path": "abs/2503.19330.md",
    "content": "### MATT-GS: Masked Attention-based 3DGS for Robot Perception and Object Detection\n\nThis paper presents a novel masked attention-based 3D Gaussian Splatting (3DGS) approach to enhance robotic perception and object detection in industrial and smart factory environments. U2-Net is employed for background removal to isolate target objects from raw images, thereby minimizing clutter and ensuring that the model processes only relevant data. Additionally, a Sobel filter-based attention mechanism is integrated into the 3DGS framework to enhance fine details - capturing critical features such as screws, wires, and intricate textures essential for high-precision tasks. We validate our approach using quantitative metrics, including L1 loss, SSIM, PSNR, comparing the performance of the background-removed and attention-incorporated 3DGS model against the ground truth images and the original 3DGS training baseline. The results demonstrate significant improves in visual fidelity and detail preservation, highlighting the effectiveness of our method in enhancing robotic vision for object recognition and manipulation in complex industrial settings.\n\n本文提出了一种新颖的基于掩码注意力机制的三维高斯溅射（3D Gaussian Splatting, 3DGS）方法，以提升机器人在工业与智能工厂环境中的感知能力与目标检测性能。我们采用 U2-Net 进行背景移除，从原始图像中分离出目标物体，从而减少视觉杂乱，确保模型仅处理相关信息。此外，我们将基于 Sobel 滤波的注意力机制集成到 3DGS 框架中，以增强图像细节，捕捉诸如螺丝、导线和复杂纹理等对高精度任务至关重要的关键特征。\n我们使用包括 L1 损失、结构相似性（SSIM）和峰值信噪比（PSNR）在内的定量指标，对融合了背景移除与注意力机制的 3DGS 模型与原始 3DGS 基线模型进行了性能对比，并与真实图像进行了评估。结果表明，该方法在视觉保真度和细节保留方面均取得了显著提升，验证了其在复杂工业场景中增强机器人视觉能力、提升目标识别与操作精度的有效性。\n\n"
  },
  {
    "path": "abs/2503.19332.md",
    "content": "### Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussian Spatting\n\nSemantic 4D Gaussians can be used for reconstructing and understanding dynamic scenes, with temporal variations than static scenes. Directly applying static methods to understand dynamic scenes will fail to capture the temporal features. Few works focus on dynamic scene understanding based on Gaussian Splatting, since once the same update strategy is employed for both dynamic and static parts, regardless of the distinction and interaction between Gaussians, significant artifacts and noise appear. We propose Dual-Hierarchical Optimization (DHO), which consists of Hierarchical Gaussian Flow and Hierarchical Gaussian Guidance in a divide-and-conquer manner. The former implements effective division of static and dynamic rendering and features. The latter helps to mitigate the issue of dynamic foreground rendering distortion in textured complex scenes. Extensive experiments show that our method consistently outperforms the baselines on both synthetic and real-world datasets, and supports various downstream tasks.\n\n语义 4D 高斯（Semantic 4D Gaussians）可用于动态场景的重建与理解，能够处理随时间变化的内容，相较于静态场景更具挑战性。若直接将静态方法应用于动态场景，将无法捕捉其中的时间特征。目前基于 Gaussian Splatting 的动态场景理解研究较少，原因在于当对动态与静态部分采用相同的更新策略，而忽略了不同高斯之间的差异与交互时，容易产生显著的伪影和噪声。\n为此，我们提出了一种“双层次优化”（Dual-Hierarchical Optimization，DHO）方法，采用分而治之的策略，由“层次高斯流”（Hierarchical Gaussian Flow）与“层次高斯引导”（Hierarchical Gaussian Guidance）两个模块组成。前者实现了动态与静态渲染及特征的有效划分，后者则缓解了复杂纹理场景中动态前景渲染失真的问题。\n大量实验表明，我们的方法在合成数据与真实世界数据集上均显著优于现有基线方法，并支持多种下游任务。\n"
  },
  {
    "path": "abs/2503.19358.md",
    "content": "### From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting\n\nThis paper presents a novel camera relocalization method, STDLoc, which leverages Feature Gaussian as scene representation. STDLoc is a full relocalization pipeline that can achieve accurate relocalization without relying on any pose prior. Unlike previous coarse-to-fine localization methods that require image retrieval first and then feature matching, we propose a novel sparse-to-dense localization paradigm. Based on this scene representation, we introduce a novel matching-oriented Gaussian sampling strategy and a scene-specific detector to achieve efficient and robust initial pose estimation. Furthermore, based on the initial localization results, we align the query feature map to the Gaussian feature field by dense feature matching to enable accurate localization. The experiments on indoor and outdoor datasets show that STDLoc outperforms current state-of-the-art localization methods in terms of localization accuracy and recall.\n\n本文提出了一种新颖的相机重定位方法——STDLoc，该方法以 Feature Gaussian 作为场景表示。STDLoc 是一个完整的重定位流程，可在无需任何位姿先验的情况下实现高精度的相机重定位。\n与传统的由粗到细的重定位方法不同，后者通常需先进行图像检索再进行特征匹配，我们提出了一种全新的“由稀到密”（sparse-to-dense）重定位范式。基于所采用的场景表示方式，我们设计了一种面向匹配的高斯采样策略以及场景专属的特征检测器，从而实现高效且鲁棒的初始位姿估计。\n在初始重定位结果的基础上，我们通过密集特征匹配将查询图像的特征图与高斯特征场对齐，从而进一步提升定位精度。\n在多个室内与室外数据集上的实验结果表明，STDLoc 在定位精度和召回率方面均优于当前最先进的重定位方法。\n\n"
  },
  {
    "path": "abs/2503.19443.md",
    "content": "### COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting\n\nAccurate object segmentation is crucial for high-quality scene understanding in the 3D vision domain. However, 3D segmentation based on 3D Gaussian Splatting (3DGS) struggles with accurately delineating object boundaries, as Gaussian primitives often span across object edges due to their inherent volume and the lack of semantic guidance during training. In order to tackle these challenges, we introduce Clear Object Boundaries for 3DGS Segmentation (COB-GS), which aims to improve segmentation accuracy by clearly delineating blurry boundaries of interwoven Gaussian primitives within the scene. Unlike existing approaches that remove ambiguous Gaussians and sacrifice visual quality, COB-GS, as a 3DGS refinement method, jointly optimizes semantic and visual information, allowing the two different levels to cooperate with each other effectively. Specifically, for the semantic guidance, we introduce a boundary-adaptive Gaussian splitting technique that leverages semantic gradient statistics to identify and split ambiguous Gaussians, aligning them closely with object boundaries. For the visual optimization, we rectify the degraded suboptimal texture of the 3DGS scene, particularly along the refined boundary structures. Experimental results show that COB-GS substantially improves segmentation accuracy and robustness against inaccurate masks from pre-trained model, yielding clear boundaries while preserving high visual quality.\n\n在三维视觉领域，实现高质量场景理解的关键在于精确的目标分割。然而，基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的方法在准确描绘物体边界方面存在困难，这是因为高斯基元具有一定体积，容易跨越物体边界扩散，并且训练过程中缺乏语义引导。为了解决这一问题，我们提出了 COB-GS（Clear Object Boundaries for 3DGS Segmentation），旨在通过清晰地划分场景中交织在一起的模糊高斯基元边界，提升三维分割的准确性。\n与现有方法依赖剔除模糊高斯、从而牺牲视觉质量不同，COB-GS 作为一种 3DGS 精化方法，同时优化语义信息与视觉表现，使两个层次能够有效协同。具体而言，在语义引导方面，我们引入了一种边界自适应高斯分裂技术，利用语义梯度统计信息识别并分裂模糊的高斯基元，使其更加贴合物体边界。在视觉优化方面，我们对精化边界结构上的次优纹理进行修正，从而提升整体渲染质量。\n实验结果表明，COB-GS 显著提高了分割精度，并增强了对预训练模型生成的不准确掩膜的鲁棒性，在保留高视觉质量的同时，实现了边界的清晰呈现。\n"
  },
  {
    "path": "abs/2503.19452.md",
    "content": "### SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors\n\nSynthesizing novel views of large-scale scenes from unconstrained in-the-wild images is an important but challenging task in computer vision. Existing methods, which optimize per-image appearance and transient occlusion through implicit neural networks from dense training views (approximately 1000 images), struggle to perform effectively under sparse input conditions, resulting in noticeable artifacts. To this end, we propose SparseGS-W, a novel framework based on 3D Gaussian Splatting that enables the reconstruction of complex outdoor scenes and handles occlusions and appearance changes with as few as five training images. We leverage geometric priors and constrained diffusion priors to compensate for the lack of multi-view information from extremely sparse input. Specifically, we propose a plug-and-play Constrained Novel-View Enhancement module to iteratively improve the quality of rendered novel views during the Gaussian optimization process. Furthermore, we propose an Occlusion Handling module, which flexibly removes occlusions utilizing the inherent high-quality inpainting capability of constrained diffusion priors. Both modules are capable of extracting appearance features from any user-provided reference image, enabling flexible modeling of illumination-consistent scenes. Extensive experiments on the PhotoTourism and Tanks and Temples datasets demonstrate that SparseGS-W achieves state-of-the-art performance not only in full-reference metrics, but also in commonly used non-reference metrics such as FID, ClipIQA, and MUSIQ.\n\n从非约束、真实环境中的图像合成大规模场景的新视角，是计算机视觉领域一项重要但具有挑战性的任务。现有方法通常依赖于隐式神经网络，在大约 1000 张密集训练图像上，通过逐图优化外观和瞬时遮挡，但在稀疏输入条件下表现不佳，容易产生明显伪影。\n为此，我们提出了 SparseGS-W，一个基于三维高斯溅射（3D Gaussian Splatting）的新型框架，能够在仅使用五张训练图像的情况下重建复杂的户外场景，并处理遮挡与外观变化问题。我们引入几何先验和受限扩散先验（constrained diffusion priors），以弥补极端稀疏输入下缺乏多视角信息的问题。\n具体而言，我们提出了一个即插即用的受限新视角增强模块（Constrained Novel-View Enhancement），在高斯优化过程中迭代提升新视角渲染质量。此外，我们还设计了一个遮挡处理模块（Occlusion Handling），利用受限扩散先验中固有的高质量图像修复能力，灵活地去除遮挡。\n这两个模块均可从任意用户提供的参考图像中提取外观特征，使得系统能够灵活建模具有一致光照条件的场景。\n在 PhotoTourism 和 Tanks and Temples 数据集上的大量实验表明，SparseGS-W 不仅在完整参考指标上取得了当前最佳性能，还在 FID、ClipIQA 和 MUSIQ 等常用无参考指标上表现出色。\n"
  },
  {
    "path": "abs/2503.19458.md",
    "content": "### GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting\n\nReconstructing open surfaces from multi-view images is vital in digitalizing complex objects in daily life. A widely used strategy is to learn unsigned distance functions (UDFs) by checking if their appearance conforms to the image observations through neural rendering. However, it is still hard to learn continuous and implicit UDF representations through 3D Gaussians splatting (3DGS) due to the discrete and explicit scene representation, i.e., 3D Gaussians. To resolve this issue, we propose a novel approach to bridge the gap between 3D Gaussians and UDFs. Our key idea is to overfit thin and flat 2D Gaussian planes on surfaces, and then, leverage the self-supervision and gradient-based inference to supervise unsigned distances in both near and far area to surfaces. To this end, we introduce novel constraints and strategies to constrain the learning of 2D Gaussians to pursue more stable optimization and more reliable self-supervision, addressing the challenges brought by complicated gradient field on or near the zero level set of UDFs. We report numerical and visual comparisons with the state-of-the-art on widely used benchmarks and real data to show our advantages in terms of accuracy, efficiency, completeness, and sharpness of reconstructed open surfaces with boundaries.\n\n从多视角图像中重建开放表面，对于数字化日常生活中的复杂物体具有重要意义。当前广泛采用的一种策略是通过神经渲染检查外观是否符合图像观测，从而学习无符号距离函数（Unsigned Distance Functions, UDFs）。然而，由于三维高斯溅射（3D Gaussian Splatting, 3DGS）采用的是离散且显式的场景表示（即 3D 高斯），因此很难直接通过其学习连续、隐式的 UDF 表达。\n为了解决这一问题，我们提出了一种新颖的方法，旨在弥合 3D 高斯与 UDF 表达之间的鸿沟。我们的核心思想是：在物体表面过拟合细薄的二维高斯平面，并结合自监督与基于梯度的推理策略，监督表面近邻与远处区域的无符号距离估计。为此，我们引入了一系列新的约束与优化策略，用于限制二维高斯的学习过程，以实现更稳定的优化和更可靠的自监督信号，从而有效应对 UDF 零水平集附近复杂梯度场所带来的挑战。\n我们在多个广泛使用的基准数据集和真实数据上进行了定量与可视化对比实验，结果表明，该方法在重建具有边界的开放表面方面，在准确性、效率、完整性与边缘锐利度上均优于现有最先进方法。\n"
  },
  {
    "path": "abs/2503.19703.md",
    "content": "### High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting\n\nHighly accurate geometric precision and dense image features characterize True Digital Orthophoto Maps (TDOMs), which are in great demand for applications such as urban planning, infrastructure management, and environmental monitoring. Traditional TDOM generation methods need sophisticated processes, such as Digital Surface Models (DSM) and occlusion detection, which are computationally expensive and prone to errors. This work presents an alternative technique rooted in 2D Gaussian Splatting (2DGS), free of explicit DSM and occlusion detection. With depth map generation, spatial information for every pixel within the TDOM is retrieved and can reconstruct the scene with high precision. Divide-and-conquer strategy achieves excellent GS training and rendering with high-resolution TDOMs at a lower resource cost, which preserves higher quality of rendering on complex terrain and thin structure without a decrease in efficiency. Experimental results demonstrate the efficiency of large-scale scene reconstruction and high-precision terrain modeling. This approach provides accurate spatial data, which assists users in better planning and decision-making based on maps.\n\n高精度几何信息与密集图像特征是真正数字正射影像图（True Digital Orthophoto Maps, TDOMs）的核心特征，使其在城市规划、基础设施管理和环境监测等应用中需求强烈。传统的 TDOM 生成方法通常依赖于复杂的流程，如数字表面模型（DSM）构建和遮挡检测，不仅计算开销大，还容易引入误差。\n本研究提出了一种基于二维高斯投影（2D Gaussian Splatting, 2DGS）的替代方法，无需显式生成 DSM 或进行遮挡检测。通过深度图的生成，我们可以为 TDOM 中每个像素恢复空间信息，从而实现高精度的场景重建。\n此外，我们采用分而治之的策略，有效提升高斯训练与渲染效率，在资源消耗更低的条件下，实现高分辨率 TDOM 的生成。该方法在复杂地形与细长结构的渲染中依然保持高质量，同时不牺牲效率。\n实验结果表明，该方法在大规模场景重建与高精度地形建模方面表现出色，能够提供准确的空间数据，辅助用户基于地图做出更科学的规划与决策。\n"
  },
  {
    "path": "abs/2503.19913.md",
    "content": "### PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model\n\nAs interest grows in world models that predict future states from current observations and actions, accurately modeling part-level dynamics has become increasingly relevant for various applications. Existing approaches, such as Puppet-Master, rely on fine-tuning large-scale pre-trained video diffusion models, which are impractical for real-world use due to the limitations of 2D video representation and slow processing times. To overcome these challenges, we present PartRM, a novel 4D reconstruction framework that simultaneously models appearance, geometry, and part-level motion from multi-view images of a static object. PartRM builds upon large 3D Gaussian reconstruction models, leveraging their extensive knowledge of appearance and geometry in static objects. To address data scarcity in 4D, we introduce the PartDrag-4D dataset, providing multi-view observations of part-level dynamics across over 20,000 states. We enhance the model's understanding of interaction conditions with a multi-scale drag embedding module that captures dynamics at varying granularities. To prevent catastrophic forgetting during fine-tuning, we implement a two-stage training process that focuses sequentially on motion and appearance learning. Experimental results show that PartRM establishes a new state-of-the-art in part-level motion learning and can be applied in manipulation tasks in robotics.\n\n随着对能够根据当前观测与动作预测未来状态的世界模型的关注不断增长，准确建模部件级别的动态在多个应用中变得愈发重要。现有方法（如 Puppet-Master）依赖对大规模预训练视频扩散模型的微调，但由于二维视频表示的局限性和处理速度缓慢，在现实世界中难以实际应用。\n为克服这些挑战，我们提出了 PartRM，这是一种新颖的四维重建框架，能够同时建模静态物体的外观、几何结构以及部件级别的运动。PartRM 构建于大型三维高斯重建模型之上，利用其在静态物体外观与几何方面的丰富知识。为缓解 4D 数据稀缺的问题，我们引入了 PartDrag-4D 数据集，提供涵盖两万多个状态的部件级动态多视角观测数据。\n我们还通过多尺度拖拽嵌入模块增强模型对交互条件的理解，该模块可捕捉不同粒度下的动态变化。为防止在微调过程中发生灾难性遗忘，我们设计了一个两阶段训练过程，依次聚焦于运动学习与外观学习。\n实验结果表明，PartRM 在部件级运动学习方面达到了新的最先进水平，并可应用于机器人操作任务中。\n"
  },
  {
    "path": "abs/2503.19976.md",
    "content": "### Thin-Shell-SfT: Fine-Grained Monocular Non-rigid 3D Surface Tracking with Neural Deformation Fields\n\n3D reconstruction of highly deformable surfaces (e.g. cloths) from monocular RGB videos is a challenging problem, and no solution provides a consistent and accurate recovery of fine-grained surface details. To account for the ill-posed nature of the setting, existing methods use deformation models with statistical, neural, or physical priors. They also predominantly rely on nonadaptive discrete surface representations (e.g. polygonal meshes), perform frame-by-frame optimisation leading to error propagation, and suffer from poor gradients of the mesh-based differentiable renderers. Consequently, fine surface details such as cloth wrinkles are often not recovered with the desired accuracy. In response to these limitations, we propose ThinShell-SfT, a new method for non-rigid 3D tracking that represents a surface as an implicit and continuous spatiotemporal neural field. We incorporate continuous thin shell physics prior based on the Kirchhoff-Love model for spatial regularisation, which starkly contrasts the discretised alternatives of earlier works. Lastly, we leverage 3D Gaussian splatting to differentiably render the surface into image space and optimise the deformations based on analysis-bysynthesis principles. Our Thin-Shell-SfT outperforms prior works qualitatively and quantitatively thanks to our continuous surface formulation in conjunction with a specially tailored simulation prior and surface-induced 3D Gaussians.\n\n从单目 RGB 视频中重建高度可变形表面（如布料）的三维形状是一项具有挑战性的任务，目前尚无方法能够一致且精确地恢复细粒度的表面细节。由于该问题本质上是病态的，现有方法通常引入统计、神经或物理先验的变形模型。然而，这些方法大多依赖非自适应的离散表面表示（例如多边形网格），进行逐帧优化，导致误差累积，并且受到基于网格的可微渲染器梯度质量差的限制。因此，诸如布料褶皱等细节通常无法以理想精度恢复。\n针对上述限制，我们提出了 ThinShell-SfT，这是一种用于非刚性三维追踪的新方法，将表面表示为隐式且连续的时空神经场。我们引入了基于 Kirchhoff-Love 模型的连续薄壳物理先验，用于空间正则化，这与以往工作中的离散化替代方案形成鲜明对比。最后，我们利用三维高斯投影将表面可微渲染到图像空间，并基于“分析-合成”原则优化变形。\n得益于我们连续的表面建模方式、专门设计的仿真先验以及由表面诱导的三维高斯表示，ThinShell-SfT 在定性与定量评估中均优于现有方法。\n"
  },
  {
    "path": "abs/2503.20168.md",
    "content": "### EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis\n\nNovel view synthesis of urban scenes is essential for autonomous driving-related applications. Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner. Unlike existing feed-forward, pixelaligned 3DGS methods, which often suffer from issues like multi-view inconsistencies and duplicated content, our approach predicts 3D Gaussians across multiple frames within a unified volume using a 3D convolutional network. This is achieved by initializing 3D Gaussians with noisy depth predictions, and then refining their geometric properties in 3D space and predicting color based on 2D textures. Our model also handles distant views and the sky with a flexible hemisphere background model. This enables us to perform fast, feed-forward reconstruction while achieving real-time rendering. Experimental evaluations on the KITTI-360 and Waymo datasets show that our method achieves state-of-the-art quality compared to existing feedforward 3DGS- and NeRF-based methods.\n\n城市场景的新视角合成对于自动驾驶相关应用至关重要。尽管现有基于 NeRF 和 3D Gaussian Splatting（3DGS）的方法在实现真实感渲染方面表现出色，但它们通常依赖于缓慢的逐场景优化过程。我们提出了 EVolSplat，一种高效的城市场景 3D 高斯溅射模型，能够以前馈方式运行。不同于现有的前馈式、像素对齐的 3DGS 方法常常面临多视图不一致与内容重复等问题，EVolSplat 通过一个三维卷积网络，在统一体积内预测多个帧的 3D 高斯，从而避免这些问题。具体而言，我们首先利用带噪声的深度预测初始化 3D 高斯，随后在三维空间中对其几何属性进行精细调整，并根据二维纹理预测颜色。此外，我们还设计了灵活的半球背景建模机制，用于处理远距离视角和天空区域，使得系统能够在实现实时渲染的同时，完成快速的前馈式重建。\n在 KITTI-360 和 Waymo 数据集上的实验评估表明，与现有前馈式 3DGS 和 NeRF 方法相比，EVolSplat 在图像质量方面达到了当前最优水平。\n"
  },
  {
    "path": "abs/2503.20221.md",
    "content": "### TC-GS: Tri-plane based compression for 3D Gaussian Splatting\n\nRecently, 3D Gaussian Splatting (3DGS) has emerged as a prominent framework for novel view synthesis, providing high fidelity and rapid rendering speed. However, the substantial data volume of 3DGS and its attributes impede its practical utility, requiring compression techniques for reducing memory cost. Nevertheless, the unorganized shape of 3DGS leads to difficulties in compression. To formulate unstructured attributes into normative distribution, we propose a well-structured tri-plane to encode Gaussian attributes, leveraging the distribution of attributes for compression. To exploit the correlations among adjacent Gaussians, K-Nearest Neighbors (KNN) is used when decoding Gaussian distribution from the Tri-plane. We also introduce Gaussian position information as a prior of the position-sensitive decoder. Additionally, we incorporate an adaptive wavelet loss, aiming to focus on the high-frequency details as iterations increase. Our approach has achieved results that are comparable to or surpass that of SOTA 3D Gaussians Splatting compression work in extensive experiments across multiple datasets.\n\n近年来，三维高斯溅射（3D Gaussian Splatting, 3DGS）作为新视角合成的一种重要框架，因其高保真度和快速渲染速度而受到广泛关注。然而，3DGS 模型本身及其属性数据量庞大，限制了其在实际场景中的应用，亟需压缩技术以降低内存开销。然而，由于 3DGS 的结构无序，使得其压缩过程面临诸多挑战。\n为此，我们提出了一种结构良好的**三平面表示（Tri-plane）**来编码高斯属性，将非结构化属性转换为规范化分布，从而有利于压缩操作。为了利用相邻高斯之间的相关性，我们在从三平面中解码高斯分布时引入 **K 近邻（K-Nearest Neighbors, KNN）**方法。同时，我们将高斯位置引入作为位置敏感解码器的先验信息，以增强空间一致性。\n此外，我们还引入了一种自适应小波损失（adaptive wavelet loss），该损失函数随着迭代过程动态聚焦于图像中的高频细节区域，从而提升最终渲染的细节保留能力。\n在多个数据集上的大量实验表明，我们的方法在压缩效果上已达到或超越当前最先进的 3DGS 压缩方法，在保证渲染质量的同时显著降低了内存成本。\n\n"
  },
  {
    "path": "abs/2503.20776.md",
    "content": "### Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields\n\nRecent advancements in 2D and multimodal models have achieved remarkable success by leveraging large-scale training on extensive datasets. However, extending these achievements to enable free-form interactions and high-level semantic operations with complex 3D/4D scenes remains challenging. This difficulty stems from the limited availability of large-scale, annotated 3D/4D or multi-view datasets, which are crucial for generalizable vision and language tasks such as open-vocabulary and prompt-based segmentation, language-guided editing, and visual question answering (VQA). In this paper, we introduce Feature4X, a universal framework designed to extend any functionality from 2D vision foundation model into the 4D realm, using only monocular video input, which is widely available from user-generated content. The \"X\" in Feature4X represents its versatility, enabling any task through adaptable, model-conditioned 4D feature field distillation. At the core of our framework is a dynamic optimization strategy that unifies multiple model capabilities into a single representation. Additionally, to the best of our knowledge, Feature4X is the first method to distill and lift the features of video foundation models (e.g., SAM2, InternVideo2) into an explicit 4D feature field using Gaussian Splatting. Our experiments showcase novel view segment anything, geometric and appearance scene editing, and free-form VQA across all time steps, empowered by LLMs in feedback loops. These advancements broaden the scope of agentic AI applications by providing a foundation for scalable, contextually and spatiotemporally aware systems capable of immersive dynamic 4D scene interaction.\n\n近年来，二维及多模态模型借助大规模数据训练，在多个任务上取得了显著成功。然而，将这些成果拓展到复杂三维/四维场景中的自由交互和高层语义操作仍面临巨大挑战。这主要归因于缺乏大规模带注释的三维/四维或多视图数据集，而这些数据对实现具备泛化能力的视觉-语言任务至关重要，如开放词汇与提示式分割、语言引导编辑、视觉问答（VQA）等。\n为此，本文提出了 Feature4X ——一个通用框架，旨在将任何二维视觉基础模型的能力扩展到四维场景，仅需单目视频输入，这类数据广泛存在于用户生成内容中。框架名称中的 “X” 表示其通用性，通过可适配的、模型条件驱动的四维特征场蒸馏机制，支持任意任务。\nFeature4X 的核心是一种动态优化策略，能够将多个模型能力统一融合进一个共享表示中。此外，据我们所知，Feature4X 是首个方法可将视频基础模型（如 SAM2、InternVideo2）的特征蒸馏并提升为显式的四维特征场，采用高斯溅射（Gaussian Splatting）进行建模。\n实验展示了我们方法在任意视角分割（novel view segment anything）、几何与外观场景编辑以及**跨时间步的自由形式视觉问答（VQA）**中的强大能力，并通过大型语言模型（LLMs）引入反馈闭环。上述成果为具备时空感知和上下文理解能力的可扩展智能体系统奠定了基础，拓展了 Agentic AI 在沉浸式动态四维场景交互中的应用前景。\n"
  },
  {
    "path": "abs/2503.20779.md",
    "content": "### Photorealistic Simulation-Ready Garments from a Single Pose\n\nWe introduce a novel approach to reconstruct simulation-ready garments with intricate appearance. Despite recent advancements, existing methods often struggle to balance the need for accurate garment reconstruction with the ability to generalize to new poses and body shapes or require large amounts of data to achieve this. In contrast, our method only requires a multi-view capture of a single static frame. We represent garments as hybrid mesh-embedded 3D Gaussian splats, where the Gaussians capture near-field shading and high-frequency details, while the mesh encodes far-field albedo and optimized reflectance parameters. We achieve novel pose generalization by exploiting the mesh from our hybrid approach, enabling physics-based simulation and surface rendering techniques, while also capturing fine details with Gaussians that accurately reconstruct garment details. Our optimized garments can be used for simulating garments on novel poses, and garment relighting.\n\n我们提出了一种用于重建具备复杂外观、可用于仿真的服装的新方法。尽管近年来该领域取得了一定进展，但现有方法常常难以在精确还原服装细节与对新姿态及不同体型的泛化能力之间取得平衡，或是依赖大量数据才能实现。相比之下，我们的方法仅需一个静态帧的多视角捕捉即可完成高质量重建。\n我们将服装表示为一种混合网格嵌入的三维高斯溅射结构（hybrid mesh-embedded 3D Gaussian splats）。其中，高斯部分用于捕捉近场阴影与高频细节，而网格部分则编码远场反照率（albedo）以及优化后的反射参数。\n为了实现姿态泛化，我们利用该混合结构中的网格，实现基于物理的仿真与表面渲染。同时，通过高斯组件精确还原服装的微细结构，实现对外观细节的高保真建模。\n我们优化后的服装模型可直接用于新姿态下的物理仿真与服装重光照渲染（relighting），兼具仿真物理合理性与外观真实感。\n\n"
  },
  {
    "path": "abs/2503.20998.md",
    "content": "### CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis\n\nWe propose Covisibility Map-based Gaussian Splatting (CoMapGS), designed to recover underrepresented sparse regions in sparse novel view synthesis. CoMapGS addresses both high- and low-uncertainty regions by constructing covisibility maps, enhancing initial point clouds, and applying uncertainty-aware weighted supervision using a proximity classifier. Our contributions are threefold: (1) CoMapGS reframes novel view synthesis by leveraging covisibility maps as a core component to address region-specific uncertainty; (2) Enhanced initial point clouds for both low- and high-uncertainty regions compensate for sparse COLMAP-derived point clouds, improving reconstruction quality and benefiting few-shot 3DGS methods; (3) Adaptive supervision with covisibility-score-based weighting and proximity classification achieves consistent performance gains across scenes with varying sparsity scores derived from covisibility maps. Experimental results demonstrate that CoMapGS outperforms state-of-the-art methods on datasets including Mip-NeRF 360 and LLFF.\n\n我们提出了基于共视图地图的高斯投影方法（CoMapGS），旨在解决稀疏新视图合成中欠表示区域的恢复问题。CoMapGS 通过构建共视图地图、增强初始点云，并结合基于邻近分类器的不确定性感知加权监督，兼顾高不确定性与低不确定性区域的处理。我们的贡献包括三点：(1) CoMapGS 通过引入共视图地图作为核心组件，重新定义新视图合成过程，有效应对区域特定的不确定性；(2) 对低不确定性与高不确定性区域的初始点云进行增强，以弥补基于 COLMAP 的稀疏点云，提高重建质量，同时惠及小样本 3DGS 方法；(3) 结合共视图得分加权与邻近分类的自适应监督策略，在不同稀疏度评分的场景中实现了一致的性能提升。实验结果表明，CoMapGS 在 Mip-NeRF 360 和 LLFF 等数据集上优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2503.21226.md",
    "content": "### Frequency-Aware Gaussian Splatting Decomposition\n\n3D Gaussian Splatting (3D-GS) has revolutionized novel view synthesis with its efficient, explicit representation. However, it lacks frequency interpretability, making it difficult to separate low-frequency structures from fine details. We introduce a frequency-decomposed 3D-GS framework that groups 3D Gaussians that correspond to subbands in the Laplacian Pyrmaids of the input images. Our approach enforces coherence within each subband (i.e., group of 3D Gaussians) through dedicated regularization, ensuring well-separated frequency components. We extend color values to both positive and negative ranges, allowing higher-frequency layers to add or subtract residual details. To stabilize optimization, we employ a progressive training scheme that refines details in a coarse-to-fine manner. Beyond interpretability, this frequency-aware design unlocks a range of practical benefits. Explicit frequency separation enables advanced 3D editing and stylization, allowing precise manipulation of specific frequency bands. It also supports dynamic level-of-detail control for progressive rendering, streaming, foveated rendering and fast geometry interaction. Through extensive experiments, we demonstrate that our method provides improved control and flexibility for emerging applications in scene editing and interactive rendering.\n\n3D Gaussian Splatting（3D-GS）凭借其高效、显式的表示方式，彻底改变了新视角合成的实现方式。然而，该方法缺乏频率可解释性，使得难以将低频结构与细节信息分离。为此，我们提出了一种基于频率分解的3D-GS框架，将对应于输入图像拉普拉斯金字塔中各子带的3D高斯进行分组。我们的方法通过专门的正则化机制，在每个子带（即一组3D高斯）内强制保持一致性，从而确保频率成分清晰分离。\n我们将颜色值扩展到正负范围，使得高频层可以叠加或抵消残差细节。为了稳定优化过程，我们采用一种由粗到细的渐进式训练方案，以逐步精炼图像细节。\n除了提升可解释性之外，这种具备频率感知能力的设计还带来了诸多实用优势。显式的频率分离支持高级的3D编辑与风格化处理，使得用户能够精确操控特定频带。同时，该方法还支持动态细节层级控制，适用于渐进式渲染、流式传输、注视点渲染和快速几何交互。\n通过大量实验，我们展示了该方法在场景编辑和交互式渲染等新兴应用中提供了更强的控制能力与更高的灵活性。\n"
  },
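  {
    "path": "notes/frequency-gs-laplacian-sketch.md",
    "content": "### Sketch: Laplacian-Pyramid Subband Targets for Frequency-Decomposed 3D-GS (illustrative)\n\nThe abstract above groups 3D Gaussians by subbands of the input images' Laplacian pyramids and supervises each group against its band. Below is a minimal sketch of how such per-subband targets could be built and matched; the blur kernel, level count, resizing, and plain L1 weighting are assumptions rather than the paper's recipe. As in the paper, the higher-frequency renders may be signed, since colors extend to negative values.\n\n```python\nimport torch\nimport torch.nn.functional as F\n\ndef gauss_blur(img, ks=5, sigma=1.0):\n    '''Separable Gaussian blur; img is (B, C, H, W).'''\n    x = torch.arange(ks, dtype=img.dtype, device=img.device) - ks // 2\n    k1d = torch.exp(-0.5 * (x / sigma) ** 2)\n    k1d = (k1d / k1d.sum()).view(1, 1, 1, ks)\n    c = img.shape[1]\n    img = F.conv2d(img, k1d.repeat(c, 1, 1, 1), padding=(0, ks // 2), groups=c)\n    img = F.conv2d(img, k1d.transpose(2, 3).repeat(c, 1, 1, 1), padding=(ks // 2, 0), groups=c)\n    return img\n\ndef laplacian_pyramid(img, levels=3):\n    '''Returns [band_0, ..., band_{n-1}, low-frequency residual].'''\n    bands, cur = [], img\n    for _ in range(levels):\n        down = F.avg_pool2d(gauss_blur(cur), 2)\n        up = F.interpolate(down, size=cur.shape[-2:], mode='bilinear', align_corners=False)\n        bands.append(cur - up)   # band-pass detail at this scale (signed)\n        cur = down\n    bands.append(cur)\n    return bands\n\ndef subband_loss(renders, gt, levels=3):\n    '''Match each Gaussian group's render against its pyramid band of gt.'''\n    loss = 0.0\n    for r, t in zip(renders, laplacian_pyramid(gt, levels)):\n        if r.shape[-2:] != t.shape[-2:]:\n            t = F.interpolate(t, size=r.shape[-2:], mode='bilinear', align_corners=False)\n        loss = loss + F.l1_loss(r, t)\n    return loss\n```\n"
  },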
  {
    "path": "abs/2503.21425.md",
    "content": "### STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM\n\nSimultaneous Localization and Mapping (SLAM) is a critical task in robotics, enabling systems to autonomously navigate and understand complex environments. Current SLAM approaches predominantly rely on geometric cues for mapping and localization, but they often fail to ensure semantic consistency, particularly in dynamic or densely populated scenes. To address this limitation, we introduce STAMICS, a novel method that integrates semantic information with 3D Gaussian representations to enhance both localization and mapping accuracy. STAMICS consists of three key components: a 3D Gaussian-based scene representation for high-fidelity reconstruction, a graph-based clustering technique that enforces temporal semantic consistency, and an open-vocabulary system that allows for the classification of unseen objects. Extensive experiments show that STAMICS significantly improves camera pose estimation and map quality, outperforming state-of-the-art methods while reducing reconstruction errors.\n\n同时定位与建图（Simultaneous Localization and Mapping, SLAM）是机器人领域的一项关键任务，使系统能够自主导航并理解复杂环境。当前的 SLAM 方法主要依赖几何线索进行建图和定位，但在动态或高密度场景中往往难以保证语义一致性。为了解决这一限制，我们提出了 STAMICS，这是一种将语义信息与三维高斯表示相结合的新方法，以提升定位和建图的精度。STAMICS 包含三个核心组件：基于三维高斯的场景表示，用于实现高保真重建；基于图的聚类技术，用于强制执行时间上的语义一致性；以及一个开放词汇系统，能够对未见过的物体进行分类。大量实验表明，STAMICS 显著提升了相机位姿估计与地图质量，在降低重建误差的同时，性能优于当前的先进方法。\n"
  },
  {
    "path": "abs/2503.21442.md",
    "content": "### RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting\n\nWe consider the problem of adding dynamic rain effects to in-the-wild scenes in a physically-correct manner. Recent advances in scene modeling have made significant progress, with NeRF and 3DGS techniques emerging as powerful tools for reconstructing complex scenes. However, while effective for novel view synthesis, these methods typically struggle with challenging scene editing tasks, such as physics-based rain simulation. In contrast, traditional physics-based simulations can generate realistic rain effects, such as raindrops and splashes, but they often rely on skilled artists to carefully set up high-fidelity scenes. This process lacks flexibility and scalability, limiting its applicability to broader, open-world environments. In this work, we introduce RainyGS, a novel approach that leverages the strengths of both physics-based modeling and 3DGS to generate photorealistic, dynamic rain effects in open-world scenes with physical accuracy. At the core of our method is the integration of physically-based raindrop and shallow water simulation techniques within the fast 3DGS rendering framework, enabling realistic and efficient simulations of raindrop behavior, splashes, and reflections. Our method supports synthesizing rain effects at over 30 fps, offering users flexible control over rain intensity -- from light drizzles to heavy downpours. We demonstrate that RainyGS performs effectively for both real-world outdoor scenes and large-scale driving scenarios, delivering more photorealistic and physically-accurate rain effects compared to state-of-the-art methods.\n\n我们关注的问题是如何以物理正确的方式为自然场景添加动态雨效。近年来，场景建模技术取得了显著进展，NeRF 和 3DGS 等方法已成为重建复杂场景的有力工具。然而，尽管这些方法在新视角合成方面表现出色，但通常难以胜任如基于物理的雨景模拟等复杂场景编辑任务。相比之下，传统的基于物理的模拟可以生成逼真的雨滴与水花效果，但往往依赖经验丰富的艺术家精心搭建高保真场景。这种流程缺乏灵活性与可扩展性，难以适用于更广泛的开放世界环境。\n在本工作中，我们提出了一种新方法 RainyGS，融合了基于物理建模与 3DGS 的优势，能够在开放世界场景中以物理精度生成照片级真实的动态雨效。我们方法的核心是将物理驱动的雨滴与浅水模拟技术整合进高效的 3DGS 渲染框架中，从而实现雨滴运动、水花飞溅与镜面反射的真实高效模拟。该方法支持以超过 30 帧每秒的速度合成雨景，并允许用户灵活控制降雨强度——从细雨到暴雨皆可调节。\n我们展示了 RainyGS 在真实户外场景和大规模驾驶场景中均表现出色，生成的雨效在真实感与物理准确性方面均优于当前最先进方法。\n"
  },
  {
    "path": "abs/2503.21767.md",
    "content": "### Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying\n\nOpen-vocabulary querying in 3D Gaussian Splatting aims to identify semantically relevant regions within a 3D Gaussian representation based on a given text query. Prior work, such as LangSplat, addressed this task by retrieving these regions in the form of segmentation masks on 2D renderings. More recently, OpenGaussian introduced point-level querying, which directly selects a subset of 3D Gaussians. In this work, we propose a point-level querying method that builds upon LangSplat's framework. Our approach improves the framework in two key ways: (a) we leverage masklets from the Segment Anything Model 2 (SAM2) to establish semantic consistent ground-truth for distilling the language Gaussians; (b) we introduces a novel two-step querying approach that first retrieves the distilled ground-truth and subsequently uses the ground-truth to query the individual Gaussians. Experimental evaluations on three benchmark datasets demonstrate that the proposed method achieves better performance compared to state-of-the-art approaches. For instance, our method achieves an mIoU improvement of +20.42 on the 3D-OVS dataset.\n\n在 3D Gaussian Splatting 中进行开放词汇查询（Open-vocabulary Querying）旨在根据给定的文本查询，在 3D 高斯表示中识别语义相关区域。已有工作如 LangSplat 通过在二维渲染图上生成分割掩码的方式来完成该任务。近期的 OpenGaussian 则引入了点级查询方法，可直接选取一部分 3D 高斯点。\n本工作提出了一种基于 LangSplat 框架的点级查询方法，并在两个关键方面对该框架进行了改进：（a）我们利用 Segment Anything Model 2（SAM2）生成的 masklet，构建语义一致的蒸馏真值，用于训练语言引导的高斯表示；（b）我们提出了一种新颖的两阶段查询方法：首先检索蒸馏得到的真值区域，然后基于该真值进一步查询具体的 3D 高斯点。\n在三个基准数据集上的实验评估表明，所提出的方法在性能上优于现有的先进方法。例如，在 3D-OVS 数据集上，我们的方法实现了 +20.42 的 mIoU 提升。\n\n"
  },
  {
    "path": "abs/2503.21779.md",
    "content": "### X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction\n\nFour-dimensional computed tomography (4D CT) reconstruction is crucial for capturing dynamic anatomical changes but faces inherent limitations from conventional phase-binning workflows. Current methods discretize temporal resolution into fixed phases with respiratory gating devices, introducing motion misalignment and restricting clinical practicality. In this paper, We propose X2-Gaussian, a novel framework that enables continuous-time 4D-CT reconstruction by integrating dynamic radiative Gaussian splatting with self-supervised respiratory motion learning. Our approach models anatomical dynamics through a spatiotemporal encoder-decoder architecture that predicts time-varying Gaussian deformations, eliminating phase discretization. To remove dependency on external gating devices, we introduce a physiology-driven periodic consistency loss that learns patient-specific breathing cycles directly from projections via differentiable optimization. Extensive experiments demonstrate state-of-the-art performance, achieving a 9.93 dB PSNR gain over traditional methods and 2.25 dB improvement against prior Gaussian splatting techniques. By unifying continuous motion modeling with hardware-free period learning, X2-Gaussian advances high-fidelity 4D CT reconstruction for dynamic clinical imaging.\n\n四维计算机断层扫描（4D CT）重建对于捕捉动态解剖变化至关重要，但传统的相位分箱流程存在固有局限。目前的方法通常将时间分辨率离散为固定相位，并依赖呼吸门控设备，从而引入运动错位，限制了临床实用性。\n本文中，我们提出了一种新框架 X2-Gaussian，通过结合动态辐射高斯投影与自监督呼吸运动学习，实现了连续时间的 4D CT 重建。我们的方法基于时空编码-解码架构建模解剖动态，预测随时间变化的高斯形变，从而消除了相位离散化的需求。\n为摆脱对外部门控设备的依赖，我们引入了一种基于生理节律的周期一致性损失函数，可通过可微优化从投影数据中直接学习患者特异性的呼吸周期。大量实验表明，X2-Gaussian 在性能上达到当前最优，相较于传统方法提升了 9.93 dB 的 PSNR，相较于现有高斯投影方法也提高了 2.25 dB。\n通过将连续运动建模与无硬件周期学习统一起来，X2-Gaussian 推进了高保真 4D CT 重建在动态临床影像中的发展。\n"
  },
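  {
    "path": "notes/x2-gaussian-periodic-loss-sketch.md",
    "content": "### Sketch: A Learnable-Period Consistency Loss (illustrative)\n\nX2-Gaussian learns a patient-specific breathing cycle directly from projections through a physiology-driven periodic consistency loss. One way to make such a loss differentiable in the period itself is sketched below; the log-period parameterization, the toy deformation MLP, and the L1 form are assumptions, not the authors' design.\n\n```python\nimport torch\nimport torch.nn as nn\n\nclass PeriodicConsistency(nn.Module):\n    '''Illustrative loss: deformations should repeat after a learned period T.'''\n    def __init__(self, init_period=4.0):\n        super().__init__()\n        # Patient-specific period, optimized by gradient descent; no gating hardware.\n        self.log_T = nn.Parameter(torch.log(torch.tensor(init_period)))\n\n    def forward(self, deform_fn, xyz, t):\n        T = self.log_T.exp()   # keep the period positive\n        return (deform_fn(xyz, t) - deform_fn(xyz, t + T)).abs().mean()\n\n# Toy deformation field standing in for the spatiotemporal encoder-decoder.\nmlp = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))\ndef deform_fn(xyz, t):\n    return mlp(torch.cat([xyz, t.expand(xyz.shape[0], 1)], dim=-1))\n\ncrit = PeriodicConsistency()\nxyz, t = torch.rand(256, 3), torch.rand(1, 1)\nloss = crit(deform_fn, xyz, t)   # added to the reconstruction loss in practice\nloss.backward()                  # gradients flow into both the field and log_T\n```\n"
  },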
  {
    "path": "abs/2503.21816.md",
    "content": "### EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis\n\nGaussian Splatting (GS)-based methods rely on sufficient training view coverage and perform synthesis on interpolated views. In this work, we tackle the more challenging and underexplored Extrapolated View Synthesis (EVS) task. Here we enable GS-based models trained with limited view coverage to generalize well to extrapolated views. To achieve our goal, we propose a view augmentation framework to guide training through a coarse-to-fine process. At the coarse stage, we reduce rendering artifacts due to insufficient view coverage by introducing a regularization strategy at both appearance and geometry levels. At the fine stage, we generate reliable view priors to provide further training guidance. To this end, we incorporate an occlusion awareness into the view prior generation process, and refine the view priors with the aid of coarse stage output. We call our framework Enhanced View Prior Guidance for Splatting (EVPGS). To comprehensively evaluate EVPGS on the EVS task, we collect a real-world dataset called Merchandise3D dedicated to the EVS scenario. Experiments on three datasets including both real and synthetic demonstrate EVPGS achieves state-of-the-art performance, while improving synthesis quality at extrapolated views for GS-based methods both qualitatively and quantitatively.\n\n\n基于高斯投影（Gaussian Splatting, GS）的方法依赖于充足的训练视角覆盖，并通常在插值视角上进行图像合成。在本研究中，我们聚焦于一个更具挑战性且尚未被充分探索的任务：外推视角合成（Extrapolated View Synthesis, EVS）。我们的目标是使基于 GS 的模型即便在训练视角覆盖受限的情况下，也能很好地泛化到外推视角。\n为此，我们提出了一种视角增强训练框架，通过粗到细的过程引导模型学习。在粗阶段，我们在外观和几何两个层面引入正则化策略，以减少由视角覆盖不足带来的渲染伪影；在精阶段，我们生成可靠的视角先验以进一步提供训练指导。为提升视角先验的可靠性，我们在生成过程中引入了遮挡感知机制，并利用粗阶段的输出对视角先验进行细化。\n我们将该框架命名为 EVPGS（Enhanced View Prior Guidance for Splatting）。为全面评估 EVPGS 在 EVS 任务中的表现，我们构建了一个专用于 EVS 场景的真实世界数据集 Merchandise3D。在三个数据集（包括真实和合成数据）上的实验表明，EVPGS 在外推视角合成任务中达到了当前最先进性能，在定性与定量指标上均显著提升了基于 GS 方法的合成质量。\n"
  },
  {
    "path": "abs/2503.22159.md",
    "content": "### Disentangled 4D Gaussian Splatting: Towards Faster and More Efficient Dynamic Scene Rendering\n\nNovel-view synthesis (NVS) for dynamic scenes from 2D images presents significant challenges due to the spatial complexity and temporal variability of such scenes. Recently, inspired by the remarkable success of NVS using 3D Gaussian Splatting (3DGS), researchers have sought to extend 3D Gaussian models to four dimensions (4D) for dynamic novel-view synthesis. However, methods based on 4D rotation and scaling introduce spatiotemporal deformation into the 4D covariance matrix, necessitating the slicing of 4D Gaussians into 3D Gaussians. This process increases redundant computations as timestamps change-an inherent characteristic of dynamic scene rendering. Additionally, performing calculations on a four-dimensional matrix is computationally intensive. In this paper, we introduce Disentangled 4D Gaussian Splatting (Disentangled4DGS), a novel representation and rendering approach that disentangles temporal and spatial deformations, thereby eliminating the reliance on 4D matrix computations. We extend the 3DGS rendering process to 4D, enabling the projection of temporal and spatial deformations into dynamic 2D Gaussians in ray space. Consequently, our method facilitates faster dynamic scene synthesis. Moreover, it reduces storage requirements by at least 4.5% due to our efficient presentation method. Our approach achieves an unprecedented average rendering speed of 343 FPS at a resolution of 1352×1014 on an RTX 3090 GPU, with experiments across multiple benchmarks demonstrating its competitive performance in both monocular and multi-view scenarios.\n\n动态场景的二维图像新视角合成（Novel-view Synthesis, NVS）由于场景的空间复杂性和时间变化性而面临重大挑战。近年来，受益于基于三维高斯泼洒（3D Gaussian Splatting, 3DGS）技术在新视角合成中取得的卓越成果，研究人员开始探索将三维高斯模型扩展到四维（4D），以实现动态场景的新视角合成。然而，基于四维旋转和缩放的方法会在四维协方差矩阵中引入时空变形，进而需要将四维高斯切片为三维高斯。随着时间戳的变化，这一过程增加了冗余计算，这是动态场景渲染中固有的特性。此外，在四维矩阵上进行计算也极为耗费算力。\n为此，本文提出了一种新的表示与渲染方法——解耦四维高斯泼洒（Disentangled 4D Gaussian Splatting, Disentangled4DGS）。该方法通过解耦时间变形与空间变形，消除了对四维矩阵计算的依赖。我们将3DGS的渲染过程扩展到四维，使时间和空间变形能够投影为射线空间中的动态二维高斯，从而加速了动态场景的合成。同时，由于高效的表示方式，我们的方法至少减少了4.5%的存储开销。\n在分辨率为1352×1014的条件下，我们的方法在RTX 3090 GPU上实现了前所未有的343 FPS平均渲染速度。大量基准测试实验表明，在单目和多视角场景下，我们的方法在性能上均表现出强劲的竞争力。\n"
  },
  {
    "path": "abs/2503.22204.md",
    "content": "### Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting\n\nOpen-vocabulary querying in 3D space is crucial for enabling more intelligent perception in applications such as robotics, autonomous systems, and augmented reality. However, most existing methods rely on 2D pixel-level parsing, leading to multi-view inconsistencies and poor 3D object retrieval. Moreover, they are limited to static scenes and struggle with dynamic scenes due to the complexities of motion modeling. In this paper, we propose Segment then Splat, a 3D-aware open vocabulary segmentation approach for both static and dynamic scenes based on Gaussian Splatting. Segment then Splat reverses the long established approach of \"segmentation after reconstruction\" by dividing Gaussians into distinct object sets before reconstruction. Once the reconstruction is complete, the scene is naturally segmented into individual objects, achieving true 3D segmentation. This approach not only eliminates Gaussian-object misalignment issues in dynamic scenes but also accelerates the optimization process, as it eliminates the need for learning a separate language field. After optimization, a CLIP embedding is assigned to each object to enable open-vocabulary querying. Extensive experiments on various datasets demonstrate the effectiveness of our proposed method in both static and dynamic scenarios.\n\n在三维空间中进行开放词汇查询（Open-vocabulary Querying）对于机器人、自动驾驶系统和增强现实等应用中的智能感知至关重要。然而，现有的大多数方法依赖于二维像素级解析，导致多视角不一致以及较差的三维物体检索效果。此外，由于运动建模的复杂性，它们通常局限于静态场景，在动态场景中表现不佳。\n本文提出了一种基于高斯泼洒（Gaussian Splatting）的三维感知开放词汇分割方法——Segment then Splat，可同时适用于静态和动态场景。Segment then Splat 颠覆了长期以来的“重建后分割”范式，在重建之前即将高斯划分为不同的物体集合。重建完成后，场景自然地被分割为各个独立物体，实现了真正意义上的三维分割。这种方法不仅消除了动态场景中高斯与物体错位的问题，还加速了优化过程，因为无需额外学习语言场（language field）。\n在优化完成后，我们为每个物体分配一个CLIP嵌入向量，以实现开放词汇查询。大量在不同数据集上的实验结果表明，所提方法在静态和动态场景中均表现出了优异的效果。\n"
  },
  {
    "path": "abs/2503.22218.md",
    "content": "### ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting\n\n3D scene stylization approaches based on Neural Radiance Fields (NeRF) achieve promising results by optimizing with Nearest Neighbor Feature Matching (NNFM) loss. However, NNFM loss does not consider global style information. In addition, the implicit representation of NeRF limits their fine-grained control over the resulting scenes. In this paper, we introduce ABC-GS, a novel framework based on 3D Gaussian Splatting to achieve high-quality 3D style transfer. To this end, a controllable matching stage is designed to achieve precise alignment between scene content and style features through segmentation masks. Moreover, a style transfer loss function based on feature alignment is proposed to ensure that the outcomes of style transfer accurately reflect the global style of the reference image. Furthermore, the original geometric information of the scene is preserved with the depth loss and Gaussian regularization terms. Extensive experiments show that our ABC-GS provides controllability of style transfer and achieves stylization results that are more faithfully aligned with the global style of the chosen artistic reference. Our homepage is available at this https URL.\n\n基于神经辐射场（Neural Radiance Fields, NeRF）的三维场景风格化方法通过采用最近邻特征匹配（Nearest Neighbor Feature Matching, NNFM）损失实现了令人瞩目的效果。然而，NNFM损失未能考虑整体风格信息，且NeRF的隐式表示方式限制了对生成场景的精细控制。\n本文提出了一种新的三维高斯泼洒（3D Gaussian Splatting）框架——ABC-GS，以实现高质量的三维风格迁移。为此，我们设计了一个可控的匹配阶段，通过分割掩码实现场景内容与风格特征之间的精确对齐。此外，提出了一种基于特征对齐的风格迁移损失函数，确保迁移结果准确地反映参考图像的整体风格。为了保持场景的原始几何信息，我们引入了深度损失和高斯正则化项。\n大量实验表明，ABC-GS在风格迁移过程中具有良好的可控性，并且能够更忠实地对齐参考艺术作品的整体风格，生成高质量的风格化结果。\n"
  },
  {
    "path": "abs/2503.22225.md",
    "content": "### Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance\n\nPre-trained conditional diffusion models have demonstrated remarkable potential in image editing. However, they often face challenges with temporal consistency, particularly in the talking head domain, where continuous changes in facial expressions intensify the level of difficulty. These issues stem from the independent editing of individual images and the inherent loss of temporal continuity during the editing process. In this paper, we introduce Follow Your Motion (FYM), a generic framework for maintaining temporal consistency in portrait editing. Specifically, given portrait images rendered by a pre-trained 3D Gaussian Splatting model, we first develop a diffusion model that intuitively and inherently learns motion trajectory changes at different scales and pixel coordinates, from the first frame to each subsequent frame. This approach ensures that temporally inconsistent edited avatars inherit the motion information from the rendered avatars. Secondly, to maintain fine-grained expression temporal consistency in talking head editing, we propose a dynamic re-weighted attention mechanism. This mechanism assigns higher weight coefficients to landmark points in space and dynamically updates these weights based on landmark loss, achieving more consistent and refined facial expressions. Extensive experiments demonstrate that our method outperforms existing approaches in terms of temporal consistency and can be used to optimize and compensate for temporally inconsistent outputs in a range of applications, such as text-driven editing, relighting, and various other applications.\n\n经过预训练的条件扩散模型在图像编辑任务中展现了卓越的潜力。然而，在说话人头像（talking head）领域，由于面部表情持续变化带来的高动态性，这些模型在保持时间一致性方面仍面临挑战。这一问题主要源于独立编辑单帧图像过程中导致的时间连续性丧失。\n为了解决这一问题，本文提出了Follow Your Motion (FYM)，一个用于保持肖像编辑中时间一致性的通用框架。具体而言，针对由预训练的三维高斯泼洒（3D Gaussian Splatting）模型渲染得到的肖像图像，我们首先开发了一种扩散模型，能够在不同尺度和像素坐标上，直观且内在地学习从首帧到后续各帧的运动轨迹变化。该方法确保了即使编辑后的头像存在时间不一致，也能继承渲染头像中的运动信息。\n其次，为了在说话人头像编辑中保持细粒度的表情时间一致性，我们提出了动态重加权注意力机制（dynamic re-weighted attention mechanism）。该机制在空间上对关键点（如人脸关键点）赋予更高的权重系数，并基于关键点损失动态更新这些权重，从而实现更加一致且细腻的面部表情变化。\n大量实验表明，FYM在时间一致性方面显著优于现有方法，且能够用于优化和补偿一系列应用中出现的时间不一致输出，如文本驱动编辑、重光照（relighting）以及其他多种应用场景。\n"
  },
  {
    "path": "abs/2503.22324.md",
    "content": "### AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation\n\nThe 3D Gaussian Splatting (3D-GS) is a novel method for scene representation and view synthesis. Although Scaffold-GS achieves higher quality real-time rendering compared to the original 3D-GS, its fine-grained rendering of the scene is extremely dependent on adequate viewing angles. The spectral bias of neural network learning results in Scaffold-GS's poor ability to perceive and learn high-frequency information in the scene. In this work, we propose enhancing the manifold complexity of input features and using network-based feature map loss to improve the image reconstruction quality of 3D-GS models. We introduce AH-GS, which enables 3D Gaussians in structurally complex regions to obtain higher-frequency encodings, allowing the model to more effectively learn the high-frequency information of the scene. Additionally, we incorporate high-frequency reinforce loss to further enhance the model's ability to capture detailed frequency information. Our result demonstrates that our model significantly improves rendering fidelity, and in specific scenarios (e.g., MipNeRf360-garden), our method exceeds the rendering quality of Scaffold-GS in just 15K iterations.\n\n三维高斯泼洒（3D Gaussian Splatting, 3D-GS）是一种新颖的场景表示与视角合成方法。尽管Scaffold-GS相较于原始3D-GS在实时渲染质量上取得了提升，但其对场景的细粒度渲染极度依赖于充足的观测角度。神经网络学习中的频谱偏置（spectral bias）导致了Scaffold-GS在感知和学习场景中的高频信息方面表现不佳。\n针对这一问题，本文提出通过增强输入特征的流形复杂度，并引入基于网络的特征图损失（feature map loss），以提升3D-GS模型的图像重建质量。我们提出了AH-GS方法，使得结构复杂区域内的三维高斯能够获得更高频的编码，从而使模型能够更有效地学习场景中的高频信息。此外，我们引入了高频增强损失（high-frequency reinforce loss），进一步强化模型对细节频率信息的捕获能力。\n实验结果表明，我们的方法在渲染保真度上取得了显著提升，并且在特定场景（如MipNeRF360-garden）中，仅经过15K次迭代，渲染质量便已超越Scaffold-GS。\n"
  },
  {
    "path": "abs/2503.22437.md",
    "content": "### EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and Gaussian Splatting\n\nComplete reconstruction of surgical scenes is crucial for robot-assisted surgery (RAS). Deep depth estimation is promising but existing works struggle with depth discontinuities, resulting in noisy predictions at object boundaries and do not achieve complete reconstruction omitting occluded surfaces. To address these issues we propose EndoLRMGS, that combines Large Reconstruction Modelling (LRM) and Gaussian Splatting (GS), for complete surgical scene reconstruction. GS reconstructs deformable tissues and LRM generates 3D models for surgical tools while position and scale are subsequently optimized by introducing orthogonal perspective joint projection optimization (OPjPO) to enhance accuracy. In experiments on four surgical videos from three public datasets, our method improves the Intersection-over-union (IoU) of tool 3D models in 2D projections by>40%. Additionally, EndoLRMGS improves the PSNR of the tools projection from 3.82% to 11.07%. Tissue rendering quality also improves, with PSNR increasing from 0.46% to 49.87%, and SSIM from 1.53% to 29.21% across all test videos.\n\n对手术场景的完整重建对于机器人辅助手术（Robot-Assisted Surgery, RAS）至关重要。深度学习的深度估计方法具有潜力，但现有工作在处理深度不连续性时存在困难，导致物体边界处预测噪声较大，并且无法完整重建遮挡表面。\n为了解决这些问题，本文提出了EndoLRMGS，结合了大规模重建建模（Large Reconstruction Modeling, LRM）与高斯泼洒（Gaussian Splatting, GS），以实现手术场景的完整重建。其中，GS用于重建可变形的组织，LRM用于生成手术器械的三维模型。同时，我们引入了正交透视联合投影优化（Orthogonal Perspective Joint Projection Optimization, OPjPO），在位置与尺度上进一步优化，以提升重建精度。\n在三个公共数据集的四段手术视频上的实验表明，我们的方法使器械三维模型在二维投影下的交并比（Intersection-over-Union, IoU）提升超过40%。此外，EndoLRMGS使器械投影的峰值信噪比（PSNR）提高了3.82%至11.07%。组织渲染质量也得到了显著提升，PSNR提高了0.46%至49.87%，结构相似性指数（SSIM）在所有测试视频中提升了1.53%至29.21%。\n"
  },
  {
    "path": "abs/2503.22605.md",
    "content": "### Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis\n\nTalking head synthesis has become a key research area in computer graphics and multimedia, yet most existing methods often struggle to balance generation quality with computational efficiency. In this paper, we present a novel approach that leverages an Audio Factorization Plane (Audio-Plane) based Gaussian Splatting for high-quality and real-time talking head generation. For modeling a dynamic talking head, 4D volume representation is needed. However, directly storing a dense 4D grid is impractical due to the high cost and lack of scalability for longer durations. We overcome this challenge with the proposed Audio-Plane, where the 4D volume representation is decomposed into audio-independent space planes and audio-dependent planes. This provides a compact and interpretable feature representation for talking head, facilitating more precise audio-aware spatial encoding and enhanced audio-driven lip dynamic modeling. To further improve speech dynamics, we develop a dynamic splatting method that helps the network more effectively focus on modeling the dynamics of the mouth region. Extensive experiments demonstrate that by integrating these innovations with the powerful Gaussian Splatting, our method is capable of synthesizing highly realistic talking videos in real time while ensuring precise audio-lip synchronization.\n\n说话人头像合成（Talking Head Synthesis）已成为计算机图形学与多媒体领域的关键研究方向，然而，大多数现有方法在生成质量与计算效率之间难以兼顾。\n本文提出了一种新颖的方法，基于**音频因子分解平面（Audio Factorization Plane, Audio-Plane）**的高斯泼洒（Gaussian Splatting），实现高质量且实时的说话人头像生成。为了建模动态说话人头像，需要使用四维体积表示（4D volume representation）。然而，直接存储密集的四维网格不仅代价高昂，而且在处理长时间序列时缺乏可扩展性。\n为克服这一挑战，我们提出了Audio-Plane，将四维体积表示分解为音频无关的空间平面和音频相关的平面。这种分解提供了一种紧凑且可解释的特征表示方式，便于实现更精确的音频感知空间编码与更细致的音频驱动唇部动态建模。\n为了进一步提升语音动态表现，我们还开发了一种动态泼洒方法（dynamic splatting method），引导网络更有效地关注于口部区域的动态建模。\n大量实验表明，通过将上述创新与强大的高斯泼洒方法结合，我们的方法能够实时合成高度真实的说话人视频，并实现精准的音频与唇动同步。\n"
  },
  {
    "path": "abs/2503.22676.md",
    "content": "### TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting\n\nWe present TranSplat, a 3D scene rendering algorithm that enables realistic cross-scene object transfer (from a source to a target scene) based on the Gaussian Splatting framework. Our approach addresses two critical challenges: (1) precise 3D object extraction from the source scene, and (2) faithful relighting of the transferred object in the target scene without explicit material property estimation. TranSplat fits a splatting model to the source scene, using 2D object masks to drive fine-grained 3D segmentation. Following user-guided insertion of the object into the target scene, along with automatic refinement of position and orientation, TranSplat derives per-Gaussian radiance transfer functions via spherical harmonic analysis to adapt the object's appearance to match the target scene's lighting environment. This relighting strategy does not require explicitly estimating physical scene properties such as BRDFs. Evaluated on several synthetic and real-world scenes and objects, TranSplat yields excellent 3D object extractions and relighting performance compared to recent baseline methods and visually convincing cross-scene object transfers. We conclude by discussing the limitations of the approach.\n\n本文提出了TranSplat，一种基于高斯泼洒（Gaussian Splatting）框架的三维场景渲染算法，实现了真实感的跨场景物体迁移（从源场景到目标场景）。我们的方法主要解决了两个关键挑战：(1) 从源场景中精确提取三维物体，以及 (2) 在不显式估计材质属性的情况下，在目标场景中对迁移物体进行真实重光照（relighting）。\nTranSplat首先对源场景拟合一个泼洒模型，并利用二维物体掩码驱动精细的三维分割。随后，在用户引导下将物体插入目标场景，并自动优化位置和朝向。TranSplat通过球谐分析（spherical harmonic analysis）为每个高斯点推导出辐射传递函数（radiance transfer functions），从而自适应调整物体外观以匹配目标场景的光照环境。该重光照策略无需显式估计物理场景属性，如BRDF（双向反射分布函数）。\n在多个合成与真实场景及物体上进行评估，TranSplat在三维物体提取与重光照性能方面相较于现有方法表现出色，实现了视觉上令人信服的跨场景物体迁移。最后，本文也讨论了方法的局限性。\n"
  },
  {
    "path": "abs/2503.22876.md",
    "content": "### VizFlyt: Perception-centric Pedagogical Framework For Autonomous Aerial Robots\n\nAutonomous aerial robots are becoming commonplace in our lives. Hands-on aerial robotics courses are pivotal in training the next-generation workforce to meet the growing market demands. Such an efficient and compelling course depends on a reliable testbed. In this paper, we present VizFlyt, an open-source perception-centric Hardware-In-The-Loop (HITL) photorealistic testing framework for aerial robotics courses. We utilize pose from an external localization system to hallucinate real-time and photorealistic visual sensors using 3D Gaussian Splatting. This enables stress-free testing of autonomy algorithms on aerial robots without the risk of crashing into obstacles. We achieve over 100Hz of system update rate. Lastly, we build upon our past experiences of offering hands-on aerial robotics courses and propose a new open-source and open-hardware curriculum based on VizFlyt for the future. We test our framework on various course projects in real-world HITL experiments and present the results showing the efficacy of such a system and its large potential use cases.\n\n自主飞行机器人正在逐渐融入我们的生活。动手实践的飞行机器人课程对于培养下一代技术人才、满足日益增长的市场需求至关重要。而高效、吸引人的课程依赖于可靠的测试平台。\n本文提出了VizFlyt，一个开源、以感知为中心的硬件在环（Hardware-In-The-Loop, HITL）真实感测试框架，专为飞行机器人课程设计。我们利用外部定位系统提供的位姿信息，通过三维高斯泼洒（3D Gaussian Splatting）实时生成高度真实感的视觉传感器数据，从而实现无风险地测试飞行机器人的自主算法，避免撞击障碍物的风险。系统更新频率超过100Hz。\n基于我们以往开设飞行机器人实践课程的经验，本文进一步提出了一个基于VizFlyt的新型开源、开硬件课程体系，面向未来教学需求。我们在多个真实HITL实验项目中测试了该框架，并展示了其实验结果，验证了系统的有效性及其广泛的应用潜力。\n"
  },
  {
    "path": "abs/2503.22986.md",
    "content": "### FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction\n\nRecently, the integration of the efficient feed-forward scheme into 3D Gaussian Splatting (3DGS) has been actively explored. However, most existing methods focus on sparse view reconstruction of small regions and cannot produce eligible whole-scene reconstruction results in terms of either quality or efficiency. In this paper, we propose FreeSplat++, which focuses on extending the generalizable 3DGS to become an alternative approach to large-scale indoor whole-scene reconstruction, which has the potential of significantly accelerating the reconstruction speed and improving the geometric accuracy. To facilitate whole-scene reconstruction, we initially propose the Low-cost Cross-View Aggregation framework to efficiently process extremely long input sequences. Subsequently, we introduce a carefully designed pixel-wise triplet fusion method to incrementally aggregate the overlapping 3D Gaussian primitives from multiple views, adaptively reducing their redundancy. Furthermore, we propose a weighted floater removal strategy that can effectively reduce floaters, which serves as an explicit depth fusion approach that is crucial in whole-scene reconstruction. After the feed-forward reconstruction of 3DGS primitives, we investigate a depth-regularized per-scene fine-tuning process. Leveraging the dense, multi-view consistent depth maps obtained during the feed-forward prediction phase for an extra constraint, we refine the entire scene's 3DGS primitive to enhance rendering quality while preserving geometric accuracy. Extensive experiments confirm that our FreeSplat++ significantly outperforms existing generalizable 3DGS methods, especially in whole-scene reconstructions. Compared to conventional per-scene optimized 3DGS approaches, our method with depth-regularized per-scene fine-tuning demonstrates substantial improvements in reconstruction accuracy and a notable reduction in training time.\n\n近期，越来越多的研究探索将高效前馈（feed-forward）机制融入三维高斯泼洒（3D Gaussian Splatting, 3DGS）中。然而，大多数现有方法主要针对小区域的稀疏视图重建，难以在质量或效率方面生成符合要求的整场景重建结果。\n本文提出了FreeSplat++，旨在扩展可泛化的3DGS，使其成为大规模室内整场景重建的可行替代方案，具备显著加速重建速度并提升几何精度的潜力。为了促进整场景重建，我们首先提出了低成本跨视角聚合框架（Low-cost Cross-View Aggregation framework），以高效处理极长的输入序列。随后，我们引入了精心设计的逐像素三元融合方法（pixel-wise triplet fusion method），用于增量式地聚合多视角中重叠的三维高斯基元，自适应地减少冗余。\n此外，我们提出了加权浮点噪声移除策略（weighted floater removal strategy），作为一种显式的深度融合方法，有效降低浮点噪声，这对于整场景重建至关重要。在完成3DGS基元的前馈重建后，我们进一步设计了基于深度正则化的逐场景微调过程。通过利用前馈预测阶段获得的稠密、多视角一致的深度图作为额外约束，对整个场景的3DGS基元进行细化，在提升渲染质量的同时保持几何精度。\n大量实验表明，**FreeSplat++**在整场景重建任务中显著优于现有的可泛化3DGS方法。与传统的逐场景优化3DGS方法相比，我们的方法结合深度正则化的逐场景微调，在重建精度上取得了大幅提升，并显著缩短了训练时间。\n"
  },
  {
    "path": "abs/2503.23044.md",
    "content": "### CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction\n\nDespite its significant achievements in large-scale scene reconstruction, 3D Gaussian Splatting still faces substantial challenges, including slow processing, high computational costs, and limited geometric accuracy. These core issues arise from its inherently unstructured design and the absence of efficient parallelization. To overcome these challenges simultaneously, we introduce CityGS-X, a scalable architecture built on a novel parallelized hybrid hierarchical 3D representation (PH^2-3D). As an early attempt, CityGS-X abandons the cumbersome merge-and-partition process and instead adopts a newly-designed batch-level multi-task rendering process. This architecture enables efficient multi-GPU rendering through dynamic Level-of-Detail voxel allocations, significantly improving scalability and performance. Through extensive experiments, CityGS-X consistently outperforms existing methods in terms of faster training times, larger rendering capacities, and more accurate geometric details in large-scale scenes. Notably, CityGS-X can train and render a scene with 5,000+ images in just 5 hours using only 4 * 4090 GPUs, a task that would make other alternative methods encounter Out-Of-Memory (OOM) issues and fail completely. This implies that CityGS-X is far beyond the capacity of other existing methods.\n\n尽管三维高斯泼洒（3D Gaussian Splatting）在大规模场景重建中取得了显著成果，但仍面临诸多挑战，包括处理速度慢、计算开销大和几何精度受限等。这些核心问题源于其本身无结构的设计以及缺乏高效并行化机制。\n为同时克服上述挑战，本文提出了CityGS-X，一种基于新型并行混合分层三维表示（Parallelized Hybrid Hierarchical 3D Representation, PH²-3D）的可扩展架构。作为首次尝试，CityGS-X摒弃了繁琐的“合并与划分”过程，转而采用全新设计的批量级多任务渲染流程（batch-level multi-task rendering process）。该架构通过动态细节层次（Level-of-Detail）体素分配实现高效的多GPU渲染，显著提升了系统的可扩展性与性能。\n通过大量实验验证，CityGS-X在训练速度、更大规模渲染能力以及大场景几何细节精度方面，均持续优于现有方法。值得注意的是，CityGS-X仅使用4块4090 GPU，在5小时内即可完成包含5000张以上图像的场景训练与渲染任务，而其他方法在同等条件下通常会因显存溢出（Out-Of-Memory, OOM）而失败。这表明，CityGS-X在容量与性能上远超现有所有方法。\n"
  },
  {
    "path": "abs/2503.23162.md",
    "content": "### NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations\n\n3D Gaussian Splatting (3DGS) demonstrates superior quality and rendering speed, but with millions of 3D Gaussians and significant storage and transmission costs. Recent 3DGS compression methods mainly concentrate on compressing Scaffold-GS, achieving impressive performance but with an additional voxel structure and a complex encoding and quantization strategy. In this paper, we aim to develop a simple yet effective method called NeuralGS that explores in another way to compress the original 3DGS into a compact representation without the voxel structure and complex quantization strategies. Our observation is that neural fields like NeRF can represent complex 3D scenes with Multi-Layer Perceptron (MLP) neural networks using only a few megabytes. Thus, NeuralGS effectively adopts the neural field representation to encode the attributes of 3D Gaussians with MLPs, only requiring a small storage size even for a large-scale scene. To achieve this, we adopt a clustering strategy and fit the Gaussians with different tiny MLPs for each cluster, based on importance scores of Gaussians as fitting weights. We experiment on multiple datasets, achieving a 45-times average model size reduction without harming the visual quality. The compression performance of our method on original 3DGS is comparable to the dedicated Scaffold-GS-based compression methods, which demonstrate the huge potential of directly compressing original 3DGS with neural fields.\n\n三维高斯泼洒（3D Gaussian Splatting, 3DGS）在渲染质量和速度上表现优异，但由于包含数百万个三维高斯基元，存储与传输开销巨大。近期的3DGS压缩方法主要集中于对Scaffold-GS进行压缩，虽然取得了令人印象深刻的性能，但通常依赖额外的体素结构和复杂的编码与量化策略。\n本文旨在提出一种简单而有效的方法——NeuralGS，探索另一种思路，将原始3DGS压缩为紧凑表示，且无需体素结构和复杂量化策略。我们的观察是，神经场（如NeRF）可以通过多层感知机（Multi-Layer Perceptron, MLP）神经网络，仅用几兆字节的数据就能表示复杂的三维场景。因此，NeuralGS采用神经场表示，用MLP对三维高斯的属性进行编码，即使在大规模场景中也只需极小的存储空间。\n为实现这一目标，我们引入了聚类策略，依据高斯的重要性得分作为拟合权重，将高斯划分为不同的簇，并为每个簇分别拟合小型MLP。在多个数据集上的实验表明，NeuralGS在不损害视觉质量的前提下，平均实现了45倍的模型大小压缩。我们的压缩性能在原始3DGS上可与基于Scaffold-GS的专用压缩方法相媲美，展示了直接利用神经场压缩原始3DGS的巨大潜力。\n"
  },
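  {
    "path": "notes/neuralgs-cluster-mlp-sketch.md",
    "content": "### Sketch: Importance-Weighted Per-Cluster MLP Fitting (illustrative)\n\nNeuralGS clusters the Gaussians and fits a tiny MLP per cluster to regress attributes from position, using importance scores as fitting weights, so that only the small MLPs (plus positions and assignments) need to be stored. Below is a minimal sketch under those assumptions; the plain k-means routine, network width, and optimizer settings are illustrative, not the paper's.\n\n```python\nimport torch\nimport torch.nn as nn\n\ndef kmeans(x, k, iters=10):\n    '''Plain k-means on Gaussian centers (illustrative, brute force).'''\n    c = x[torch.randperm(x.shape[0])[:k]].clone()\n    for _ in range(iters):\n        assign = torch.cdist(x, c).argmin(dim=1)\n        for j in range(k):\n            m = assign == j\n            if m.any():\n                c[j] = x[m].mean(dim=0)\n    return assign\n\ndef fit_cluster_mlps(xyz, attrs, importance, k=8, steps=200):\n    '''One tiny MLP per cluster maps position -> attributes, importance-weighted.'''\n    assign, mlps = kmeans(xyz, k), []\n    for j in range(k):\n        m = assign == j\n        net = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, attrs.shape[1]))\n        opt = torch.optim.Adam(net.parameters(), lr=1e-2)\n        w = importance[m].unsqueeze(1)   # fitting weights from importance scores\n        for _ in range(steps):\n            opt.zero_grad()\n            loss = (w * (net(xyz[m]) - attrs[m]) ** 2).mean()\n            loss.backward()\n            opt.step()\n        mlps.append(net)\n    return assign, mlps   # compact: store MLPs + assignments, not raw attributes\n\nxyz = torch.rand(2048, 3)\nattrs = torch.rand(2048, 56)      # toy per-Gaussian attribute vectors\nimportance = torch.rand(2048)     # e.g., accumulated blending contribution\nassign, mlps = fit_cluster_mlps(xyz, attrs, importance)\n```\n"
  },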
  {
    "path": "abs/2503.23297.md",
    "content": "### ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning\n\nOpen-vocabulary 3D visual grounding and reasoning aim to localize objects in a scene based on implicit language descriptions, even when they are occluded. This ability is crucial for tasks such as vision-language navigation and autonomous robotics. However, current methods struggle because they rely heavily on fine-tuning with 3D annotations and mask proposals, which limits their ability to handle diverse semantics and common knowledge required for effective reasoning. In this work, we propose ReasonGrounder, an LVLM-guided framework that uses hierarchical 3D feature Gaussian fields for adaptive grouping based on physical scale, enabling open-vocabulary 3D grounding and reasoning. ReasonGrounder interprets implicit instructions using large vision-language models (LVLM) and localizes occluded objects through 3D Gaussian splatting. By incorporating 2D segmentation masks from the SAM and multi-view CLIP embeddings, ReasonGrounder selects Gaussian groups based on object scale, enabling accurate localization through both explicit and implicit language understanding, even in novel, occluded views. We also contribute ReasoningGD, a new dataset containing over 10K scenes and 2 million annotations for evaluating open-vocabulary 3D grounding and amodal perception under occlusion. Experiments show that ReasonGrounder significantly improves 3D grounding accuracy in real-world scenarios.\n\n开放词汇的三维视觉指引与推理（Open-vocabulary 3D Visual Grounding and Reasoning）旨在根据隐式语言描述在场景中定位物体，即使物体被遮挡也能准确识别。这种能力对于视觉-语言导航（Vision-Language Navigation）和自主机器人等任务至关重要。然而，当前方法普遍存在困难，因为它们过度依赖于带有三维标注和掩码提议（mask proposals）的微调过程，限制了对丰富语义和常识推理能力的支持。\n为此，本文提出了ReasonGrounder，一种由大规模视觉-语言模型（Large Vision-Language Model, LVLM）引导的框架，基于**分层三维特征高斯场（hierarchical 3D feature Gaussian fields）**按物理尺度进行自适应分组，实现开放词汇的三维指引与推理。ReasonGrounder利用LVLM解析隐式指令，并通过三维高斯泼洒（3D Gaussian Splatting）定位被遮挡的物体。通过结合来自SAM的二维分割掩码和多视角CLIP嵌入，ReasonGrounder能够根据物体尺度选择高斯组，从而在新颖且被遮挡的视角下，依然实现基于显式与隐式语言理解的精准定位。\n此外，本文贡献了ReasoningGD数据集，包含超过1万组场景和200万条标注，用于评估遮挡条件下的开放词汇三维指引与非显式感知（amodal perception）。实验结果表明，ReasonGrounder在真实场景中显著提升了三维指引的准确性。\n"
  },
  {
    "path": "abs/2503.23337.md",
    "content": "### Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction\n\nRecently, 3D Gaussian Spatting (3DGS) has gained widespread attention in Novel View Synthesis (NVS) due to the remarkable real-time rendering performance. However, the substantial cost of storage and transmission of vanilla 3DGS hinders its further application (hundreds of megabytes or even gigabytes for a single scene). Motivated by the achievements of prediction in video compression, we introduce the prediction technique into the anchor-based Gaussian representation to effectively reduce the bit rate. Specifically, we propose a spatial condition-based prediction module to utilize the grid-captured scene information for prediction, with a residual compensation strategy designed to learn the missing fine-grained information. Besides, to further compress the residual, we propose an instance-aware hyper prior, developing a structure-aware and instance-aware entropy model. Extensive experiments demonstrate the effectiveness of our prediction-based compression framework and each technical component. Even compared with SOTA compression method, our framework still achieves a bit rate savings of 24.42 percent.\n\n近年来，三维高斯泼洒（3D Gaussian Splatting, 3DGS）因其出色的实时渲染性能，在新视角合成（Novel View Synthesis, NVS）领域受到了广泛关注。然而，原始3DGS在存储与传输方面的高开销（单个场景通常需要数百兆字节甚至数千兆字节）严重阻碍了其进一步应用。\n受视频压缩中预测技术成功应用的启发，本文将预测技术引入基于锚点（anchor-based）的高斯表示，以有效降低比特率。具体而言，我们提出了一种基于空间条件的预测模块（spatial condition-based prediction module），利用网格采样的场景信息进行预测，并设计了残差补偿策略（residual compensation strategy）以学习缺失的细粒度信息。此外，为了进一步压缩残差信息，我们提出了面向实例的超先验（instance-aware hyper prior），发展了一种结构感知与实例感知的熵模型。\n大量实验验证了我们基于预测的压缩框架及各技术组件的有效性。即使与当前最佳（SOTA）压缩方法相比，我们的框架仍实现了24.42%的比特率节省。\n"
  },
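  {
    "path": "notes/spatial-prediction-residual-sketch.md",
    "content": "### Sketch: Spatial Condition-Based Prediction with Residual Coding (illustrative)\n\nThe paper above carries the prediction idea from video coding over to anchor-based Gaussians: anchor features are predicted from grid-captured scene context, and only the residual is compressed. The sketch below shows that prediction/residual split in PyTorch; the grid resolution, decoder head, and L1 objective are assumptions, and the structure- and instance-aware entropy model (the hyper prior) is omitted entirely.\n\n```python\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nclass SpatialPredictor(nn.Module):\n    '''Predict anchor features from a coarse scene grid; code only residuals.'''\n    def __init__(self, feat_dim=32, grid_res=32):\n        super().__init__()\n        self.grid = nn.Parameter(\n            0.01 * torch.randn(1, feat_dim, grid_res, grid_res, grid_res))\n        self.head = nn.Sequential(\n            nn.Linear(feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, feat_dim))\n\n    def forward(self, xyz):\n        # Trilinear lookup of grid-captured scene context at anchors in [-1, 1]^3.\n        g = xyz.view(1, -1, 1, 1, 3)\n        ctx = F.grid_sample(self.grid, g, align_corners=True)  # (1, C, N, 1, 1)\n        ctx = ctx[0, :, :, 0, 0].t()                           # (N, C)\n        return self.head(torch.cat([ctx, xyz], dim=-1))\n\npred_net = SpatialPredictor()\nxyz = torch.rand(512, 3) * 2 - 1\nanchor_feat = torch.randn(512, 32)        # toy target anchor features\nresidual = anchor_feat - pred_net(xyz)    # only this goes to the entropy coder\nloss = residual.abs().mean()              # train the predictor to shrink residuals\nloss.backward()\n```\n"
  },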
  {
    "path": "abs/2503.23625.md",
    "content": "### Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR\n\nThe rapidly advancing field of Augmented and Virtual Reality (AR/VR) demands real-time, photorealistic rendering on resource-constrained platforms. 3D Gaussian Splatting, delivering state-of-the-art (SOTA) performance in rendering efficiency and quality, has emerged as a promising solution across a broad spectrum of AR/VR applications. However, despite its effectiveness on high-end GPUs, it struggles on edge systems like the Jetson Orin NX Edge GPU, achieving only 7-17 FPS -- well below the over 60 FPS standard required for truly immersive AR/VR experiences. Addressing this challenge, we perform a comprehensive analysis of Gaussian-based AR/VR applications and identify the Gaussian Blending Stage, which intensively calculates each Gaussian's contribution at every pixel, as the primary bottleneck. In response, we propose a Gaussian Blending Unit (GBU), an edge GPU plug-in module for real-time rendering in AR/VR applications. Notably, our GBU can be seamlessly integrated into conventional edge GPUs and collaboratively supports a wide range of AR/VR applications. Specifically, GBU incorporates an intra-row sequential shading (IRSS) dataflow that shades each row of pixels sequentially from left to right, utilizing a two-step coordinate transformation. When directly deployed on a GPU, the proposed dataflow achieved a non-trivial 1.72x speedup on real-world static scenes, though still falls short of real-time rendering performance. Recognizing the limited compute utilization in the GPU-based implementation, GBU enhances rendering speed with a dedicated rendering engine that balances the workload across rows by aggregating computations from multiple Gaussians. Experiments across representative AR/VR applications demonstrate that our GBU provides a unified solution for on-device real-time rendering while maintaining SOTA rendering quality.\n\n快速发展的增强与虚拟现实（Augmented and Virtual Reality, AR/VR）领域对资源受限平台上的实时、真实感渲染提出了更高要求。三维高斯泼洒（3D Gaussian Splatting）凭借卓越的渲染效率与质量，已成为广泛AR/VR应用中极具前景的解决方案。然而，尽管该方法在高端GPU上表现出色，但在Jetson Orin NX等边缘GPU系统上却表现不佳，仅能实现7–17 FPS，远低于实现沉浸式AR/VR体验所需的60 FPS以上标准。\n为了解决这一挑战，本文对基于高斯的AR/VR应用进行了全面分析，并发现高斯混合阶段（Gaussian Blending Stage）——即为每个像素密集计算每个高斯贡献——是主要的性能瓶颈。针对这一问题，我们提出了高斯混合单元（Gaussian Blending Unit, GBU），一种用于AR/VR应用实时渲染的边缘GPU插件模块。值得注意的是，GBU可以无缝集成到常规边缘GPU中，并广泛支持各种AR/VR应用。\n具体而言，GBU引入了**行内顺序着色（Intra-Row Sequential Shading, IRSS）**数据流方式，从左到右依次处理每一行像素，并采用两步坐标变换。直接部署在GPU上时，该数据流在真实静态场景中实现了1.72倍的加速，尽管仍未达到实时渲染性能。针对GPU实现中计算利用率有限的问题，GBU进一步通过专用渲染引擎提升渲染速度，能够跨行平衡工作负载，聚合多个高斯的计算。\n在代表性AR/VR应用中的实验表明，GBU能够在设备端实现统一的实时渲染解决方案，同时保持当前最佳（SOTA）的渲染质量。\n"
  },
  {
    "path": "abs/2503.23881.md",
    "content": "### ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image\n\nThe increasing demand for augmented and virtual reality applications has highlighted the importance of crafting immersive 3D scenes from a simple single-view image. However, due to the partial priors provided by single-view input, existing methods are often limited to reconstruct low-consistency 3D scenes with narrow fields of view from single-view input. These limitations make them less capable of generalizing to reconstruct immersive scenes. To address this problem, we propose ExScene, a two-stage pipeline to reconstruct an immersive 3D scene from any given single-view image. ExScene designs a novel multimodal diffusion model to generate a high-fidelity and globally consistent panoramic image. We then develop a panoramic depth estimation approach to calculate geometric information from panorama, and we combine geometric information with high-fidelity panoramic image to train an initial 3D Gaussian Splatting (3DGS) model. Following this, we introduce a GS refinement technique with 2D stable video diffusion priors. We add camera trajectory consistency and color-geometric priors into the denoising process of diffusion to improve color and spatial consistency across image sequences. These refined sequences are then used to fine-tune the initial 3DGS model, leading to better reconstruction quality. Experimental results demonstrate that our ExScene achieves consistent and immersive scene reconstruction using only single-view input, significantly surpassing state-of-the-art baselines.\n\n随着增强现实与虚拟现实（Augmented and Virtual Reality, AR/VR）应用需求的不断增长，从单幅图像构建沉浸式三维场景的重要性日益凸显。然而，由于单视图输入提供的先验信息有限，现有方法通常只能重建一致性较低、视野较窄的三维场景，难以推广至沉浸式场景的构建。\n为了解决这一问题，本文提出了ExScene，一个从任意单幅图像重建沉浸式三维场景的两阶段流程。ExScene首先设计了一种新颖的多模态扩散模型（multimodal diffusion model），用于生成高保真、全局一致的全景图像。随后，我们开发了一种全景深度估计方法，从全景图像中计算几何信息，并将几何信息与高保真的全景图结合，用于训练初始的三维高斯泼洒（3D Gaussian Splatting, 3DGS）模型。\n在此基础上，我们引入了一种基于**二维稳定视频扩散先验（2D stable video diffusion priors）**的高斯泼洒（GS）细化技术。通过在扩散去噪过程中引入相机轨迹一致性和颜色-几何先验，提升图像序列的颜色与空间一致性。最终，利用这些细化后的序列对初始3DGS模型进行微调，从而获得更高质量的重建结果。\n实验结果表明，ExScene能够仅依赖单幅图像实现一致且沉浸的三维场景重建，在各项指标上显著超越现有的先进基线方法。\n"
  },
  {
    "path": "abs/2503.24009.md",
    "content": "### Learning 3D-Gaussian Simulators from RGB Videos\n\nLearning physics simulations from video data requires maintaining spatial and temporal consistency, a challenge often addressed with strong inductive biases or ground-truth 3D information -- limiting scalability and generalization. We introduce 3DGSim, a 3D physics simulator that learns object dynamics end-to-end from multi-view RGB videos. It encodes images into a 3D Gaussian particle representation, propagates dynamics via a transformer, and renders frames using 3D Gaussian splatting. By jointly training inverse rendering with a dynamics transformer using a temporal encoding and merging layer, 3DGSimembeds physical properties into point-wise latent vectors without enforcing explicit connectivity constraints. This enables the model to capture diverse physical behaviors, from rigid to elastic and cloth-like interactions, along with realistic lighting effects that also generalize to unseen multi-body interactions and novel scene edits.\n\n从视频数据中学习物理仿真需要同时保持空间与时间的一致性，而现有方法通常依赖强归纳偏置或真实三维信息，限制了其可扩展性与泛化能力。\n本文提出了3DGSim，一种能够从多视角RGB视频端到端学习物体动态的三维物理仿真器。3DGSim将图像编码为三维高斯粒子表示（3D Gaussian particle representation），通过变换器（transformer）传播动态信息，并利用三维高斯泼洒（3D Gaussian Splatting）渲染各帧图像。通过联合训练逆向渲染模块与动态变换器，结合时间编码与融合层，3DGSim在无需显式连接约束的情况下，将物理属性嵌入到逐点的潜变量中。\n这种方法使得模型能够捕捉多样的物理行为，从刚体、弹性体到类似布料的交互，并呈现真实的光照效果，同时在面对未见过的多物体交互和新场景编辑时依然具有良好的泛化能力。\n"
  },
  {
    "path": "abs/2503.24210.md",
    "content": "### DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting\n\nReconstructing sharp 3D representations from blurry multi-view images are long-standing problem in computer vision. Recent works attempt to enhance high-quality novel view synthesis from the motion blur by leveraging event-based cameras, benefiting from high dynamic range and microsecond temporal resolution. However, they often reach sub-optimal visual quality in either restoring inaccurate color or losing fine-grained details. In this paper, we present DiET-GS, a diffusion prior and event stream-assisted motion deblurring 3DGS. Our framework effectively leverages both blur-free event streams and diffusion prior in a two-stage training strategy. Specifically, we introduce the novel framework to constraint 3DGS with event double integral, achieving both accurate color and well-defined details. Additionally, we propose a simple technique to leverage diffusion prior to further enhance the edge details. Qualitative and quantitative results on both synthetic and real-world data demonstrate that our DiET-GS is capable of producing significantly better quality of novel views compared to the existing baselines.\n\n从模糊的多视角图像中重建清晰的三维表示一直是计算机视觉领域的长期挑战。近年来，相关研究尝试利用事件相机（event-based cameras）提升运动模糊情况下的新视角合成质量，得益于事件相机的高动态范围和微秒级时间分辨率。然而，这些方法在恢复颜色准确性或保持细粒度细节方面往往表现欠佳，导致视觉质量次优。\n本文提出了DiET-GS，一种结合扩散先验与事件流辅助的运动去模糊三维高斯泼洒（3DGS）方法。我们的框架在两阶段训练策略中有效利用无模糊的事件流和扩散先验。具体来说，我们引入了一种新颖的框架，通过**事件双重积分（event double integral）**约束3DGS，从而同时实现准确的颜色还原和清晰的细节恢复。此外，我们提出了一种简单的技术，利用扩散先验进一步增强边缘细节。\n在合成数据和真实数据上的定性与定量实验结果表明，与现有基线方法相比，DiET-GS能够生成质量显著更高的新视角图像。\n"
  },
  {
    "path": "abs/2503.24270.md",
    "content": "### Visual Acoustic Fields\n\nObjects produce different sounds when hit, and humans can intuitively infer how an object might sound based on its appearance and material properties. Inspired by this intuition, we propose Visual Acoustic Fields, a framework that bridges hitting sounds and visual signals within a 3D space using 3D Gaussian Splatting (3DGS). Our approach features two key modules: sound generation and sound localization. The sound generation module leverages a conditional diffusion model, which takes multiscale features rendered from a feature-augmented 3DGS to generate realistic hitting sounds. Meanwhile, the sound localization module enables querying the 3D scene, represented by the feature-augmented 3DGS, to localize hitting positions based on the sound sources. To support this framework, we introduce a novel pipeline for collecting scene-level visual-sound sample pairs, achieving alignment between captured images, impact locations, and corresponding sounds. To the best of our knowledge, this is the first dataset to connect visual and acoustic signals in a 3D context. Extensive experiments on our dataset demonstrate the effectiveness of Visual Acoustic Fields in generating plausible impact sounds and accurately localizing impact sources.\n\n物体在受到撞击时会产生不同的声音，人类可以根据物体的外观和材质直观地推断其可能发出的声音。受此直觉启发，本文提出了Visual Acoustic Fields，一个在三维空间中利用三维高斯泼洒（3D Gaussian Splatting, 3DGS）将撞击声音与视觉信号关联的框架。\n我们的方法包含两个核心模块：声音生成与声音定位。声音生成模块采用条件扩散模型（conditional diffusion model），输入由特征增强的3DGS渲染出的多尺度特征，生成真实的撞击声音。同时，声音定位模块支持在由特征增强的3DGS表示的三维场景中进行查询，根据声音源定位撞击位置。\n为支撑这一框架，我们提出了一条新颖的数据采集流程，用于收集场景级的视觉-声音样本对，实现图像、撞击位置与对应声音之间的对齐。据我们所知，这是第一个在三维环境中连接视觉与声学信号的数据集。\n在我们构建的数据集上进行的大量实验表明，Visual Acoustic Fields能够生成合理可信的撞击声音，并准确定位撞击源位置，验证了方法的有效性。\n"
  },
  {
    "path": "abs/2503.24366.md",
    "content": "### StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting\n\n3D Gaussian splatting (3DGS) is a popular radiance field method, with many application-specific extensions. Most variants rely on the same core algorithm: depth-sorting of Gaussian splats then rasterizing in primitive order. This ensures correct alpha compositing, but can cause rendering artifacts due to built-in approximations. Moreover, for a fixed representation, sorted rendering offers little control over render cost and visual fidelity. For example, and counter-intuitively, rendering a lower-resolution image is not necessarily faster. In this work, we address the above limitations by combining 3D Gaussian splatting with stochastic rasterization. Concretely, we leverage an unbiased Monte Carlo estimator of the volume rendering equation. This removes the need for sorting, and allows for accurate 3D blending of overlapping Gaussians. The number of Monte Carlo samples further imbues 3DGS with a way to trade off computation time and quality. We implement our method using OpenGL shaders, enabling efficient rendering on modern GPU hardware. At a reasonable visual quality, our method renders more than four times faster than sorted rasterization.\n\n三维高斯泼洒（3D Gaussian Splatting, 3DGS）是一种流行的辐射场（radiance field）方法，并已被广泛扩展应用于各种特定任务。大多数变体仍依赖于相同的核心算法：对高斯基元进行深度排序，并按基元顺序栅格化。这种方法虽然能保证正确的Alpha合成，但由于内置近似，容易引发渲染伪影。此外，对于固定的表示，排序渲染在渲染开销与视觉保真度上几乎无法调节。例如，颇为反直觉的是，渲染低分辨率图像并不一定能加快渲染速度。\n针对上述局限，本文提出了将三维高斯泼洒与**随机栅格化（stochastic rasterization）**结合的方法。具体而言，我们利用体积渲染方程的无偏蒙特卡洛估计（unbiased Monte Carlo estimator），从而无需排序，并能够准确地对重叠的高斯基元进行三维混合。通过控制蒙特卡洛采样数量，3DGS还获得了一种灵活调整计算时间与渲染质量的方法。\n我们基于OpenGL着色器实现了该方法，能够在现代GPU硬件上高效渲染。在达到合理视觉质量的条件下，我们的方法渲染速度比传统的排序栅格化快了四倍以上。\n"
  },
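  {
    "path": "notes/stochastic-compositing-sketch.md",
    "content": "### Sketch: Sorting-Free Monte Carlo Alpha Compositing (illustrative)\n\nStochasticSplats replaces depth-sorted compositing with an unbiased Monte Carlo estimator of the volume rendering equation. The single-pixel toy below demonstrates the underlying identity in the style of stochastic transparency: keep each splat with probability alpha and let the nearest survivor win; the sample mean converges to the sorted front-to-back result (a black background is assumed). It is a didactic stand-in, not the authors' exact estimator or their OpenGL shader implementation.\n\n```python\nimport torch\n\ndef sorted_composite(colors, alphas, depths):\n    '''Reference: front-to-back alpha compositing (requires a depth sort).'''\n    order = depths.argsort()\n    c, a, out, T = colors[order], alphas[order], torch.zeros(3), 1.0\n    for i in range(len(a)):\n        out = out + T * a[i] * c[i]\n        T = T * (1.0 - a[i])\n    return out\n\ndef stochastic_composite(colors, alphas, depths, n_samples=4096):\n    '''Unbiased, sorting-free estimate: Bernoulli(alpha) survival per splat,\n    nearest surviving splat wins; average over samples.'''\n    n = len(alphas)\n    keep = torch.rand(n_samples, n) < alphas      # keep splat with prob alpha\n    z = depths.expand(n_samples, n).clone()\n    z[~keep] = float('inf')                       # dropped splats never win\n    winner = z.argmin(dim=1)                      # nearest survivor per sample\n    out = colors[winner] * keep.any(dim=1).unsqueeze(1)   # all dropped -> black\n    return out.mean(dim=0)\n\ntorch.manual_seed(0)\ncolors, alphas, depths = torch.rand(6, 3), torch.rand(6) * 0.8, torch.rand(6)\nprint(sorted_composite(colors, alphas, depths))\nprint(stochastic_composite(colors, alphas, depths))   # approaches the same value\n```\n\nMore samples trade compute for variance, which mirrors the paper's point that the sample count gives 3DGS an explicit quality/cost knob.\n"
  },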
  {
    "path": "abs/2503.24382.md",
    "content": "### Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views\n\nNeural rendering has demonstrated remarkable success in high-quality 3D neural reconstruction and novel view synthesis with dense input views and accurate poses. However, applying it to extremely sparse, unposed views in unbounded 360° scenes remains a challenging problem. In this paper, we propose a novel neural rendering framework to accomplish the unposed and extremely sparse-view 3D reconstruction in unbounded 360° scenes. To resolve the spatial ambiguity inherent in unbounded scenes with sparse input views, we propose a layered Gaussian-based representation to effectively model the scene with distinct spatial layers. By employing a dense stereo reconstruction model to recover coarse geometry, we introduce a layer-specific bootstrap optimization to refine the noise and fill occluded regions in the reconstruction. Furthermore, we propose an iterative fusion of reconstruction and generation alongside an uncertainty-aware training approach to facilitate mutual conditioning and enhancement between these two processes. Comprehensive experiments show that our approach outperforms existing state-of-the-art methods in terms of rendering quality and surface reconstruction accuracy. Project page: this https URL\n\n神经渲染在高质量三维神经重建和新视角合成任务中已取得显著成果，尤其是在输入视角稠密且相机位姿精确的条件下。然而，将其应用于位姿未知且极度稀疏视角的无界360°场景仍是一项具有挑战性的问题。本文提出了一种新颖的神经渲染框架，旨在实现无界360°场景中无位姿、极稀疏视角下的三维重建。\n为解决稀疏输入视角下无界场景中固有的空间歧义问题，我们提出了一种基于分层高斯的表示方法，能够有效地以不同的空间层次建模场景结构。通过引入稠密立体重建模型以获取粗略几何信息，我们进一步设计了面向层的引导优化策略（bootstrap optimization），用于细化重建中的噪声并补全遮挡区域。\n此外，我们提出了重建与生成过程的迭代融合机制，并结合基于不确定性的训练策略，以实现两者之间的相互引导与增强。\n大量实验结果表明，我们的方法在渲染质量与表面重建精度方面均优于当前最先进的技术。\n"
  },
  {
    "path": "abs/2504.00159.md",
    "content": "### SonarSplat: Novel View Synthesis of Imaging Sonar via Gaussian Splatting\n\nIn this paper, we present SonarSplat, a novel Gaussian splatting framework for imaging sonar that demonstrates realistic novel view synthesis and models acoustic streaking phenomena. Our method represents the scene as a set of 3D Gaussians with acoustic reflectance and saturation properties. We develop a novel method to efficiently rasterize learned Gaussians to produce a range/azimuth image that is faithful to the acoustic image formation model of imaging sonar. In particular, we develop a novel approach to model azimuth streaking in a Gaussian splatting framework. We evaluate SonarSplat using real-world datasets of sonar images collected from an underwater robotic platform in a controlled test tank and in a real-world river environment. Compared to the state-of-the-art, SonarSplat offers improved image synthesis capabilities (+2.5 dB PSNR). We also demonstrate that SonarSplat can be leveraged for azimuth streak removal and 3D scene reconstruction.\n\n本文提出了 SonarSplat，一种新颖的高斯喷洒（Gaussian Splatting）框架，面向成像声呐（imaging sonar），能够实现逼真的新视角合成，并建模声学拖影（acoustic streaking）现象。我们的方法将场景表示为一组具备声学反射率和饱和特性的三维高斯分布。\n我们设计了一种新颖的高效光栅化方法，将学习得到的高斯投影生成距离/方位图像（range/azimuth image），从而忠实模拟成像声呐的声学成像模型。特别地，我们在高斯喷洒框架下创新性地建模了方位拖影现象（azimuth streaking）。\n我们在两个真实场景的数据集上对 SonarSplat 进行了评估，数据分别采集自受控测试水池和真实河流环境中的水下机器人平台。与现有最先进方法相比，SonarSplat 在图像合成能力方面表现更优，PSNR 提升达 +2.5 dB。\n此外，我们还展示了 SonarSplat 在方位拖影去除与三维场景重建任务中的潜力。\n"
  },
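For intuition about the range/azimuth image the abstract describes, here is a hedged NumPy sketch of a polar accumulation step; the bin counts, field of view, and unit-intensity returns are illustrative assumptions, not SonarSplat's learned rasterizer. Because elevation is integrated away, returns smear across azimuth, which is the streaking the paper models.

```python
import numpy as np

def range_azimuth_image(pts, intensity, r_max=10.0, fov=np.deg2rad(60),
                        n_r=256, n_az=128):
    """Accumulate point intensities into a polar (range x azimuth) image."""
    r = np.linalg.norm(pts, axis=1)                       # slant range
    az = np.arctan2(pts[:, 1], pts[:, 0])                 # azimuth angle
    ok = (r < r_max) & (np.abs(az) < fov / 2)
    ri = (r[ok] / r_max * n_r).astype(int)
    ai = ((az[ok] / fov + 0.5) * n_az).astype(int)
    img = np.zeros((n_r, n_az))
    np.add.at(img, (ri, ai), intensity[ok])               # splat the returns
    return img

pts = np.random.default_rng(1).uniform([0, -3, -1], [9, 3, 1], (5000, 3))
img = range_azimuth_image(pts, np.ones(len(pts)))
print(img.shape, img.sum())
```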
  {
    "path": "abs/2504.00219.md",
    "content": "### LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors\n\nDirectly employing 3D Gaussian Splatting (3DGS) on images with adverse illumination conditions exhibits considerable difficulty in achieving high-quality, normally-exposed representations due to: (1) The limited Structure from Motion (SfM) points estimated in adverse illumination scenarios fail to capture sufficient scene details; (2) Without ground-truth references, the intensive information loss, significant noise, and color distortion pose substantial challenges for 3DGS to produce high-quality results; (3) Combining existing exposure correction methods with 3DGS does not achieve satisfactory performance due to their individual enhancement processes, which lead to the illumination inconsistency between enhanced images from different viewpoints. To address these issues, we propose LITA-GS, a novel illumination-agnostic novel view synthesis method via reference-free 3DGS and physical priors. Firstly, we introduce an illumination-invariant physical prior extraction pipeline. Secondly, based on the extracted robust spatial structure prior, we develop the lighting-agnostic structure rendering strategy, which facilitates the optimization of the scene structure and object appearance. Moreover, a progressive denoising module is introduced to effectively mitigate the noise within the light-invariant representation. We adopt the unsupervised strategy for the training of LITA-GS and extensive experiments demonstrate that LITA-GS surpasses the state-of-the-art (SOTA) NeRF-based method while enjoying faster inference speed and costing reduced training time.\n\n直接将三维高斯喷洒（3D Gaussian Splatting，3DGS）应用于光照条件恶劣的图像，难以实现高质量的正常曝光重建，原因在于：(1) 在光照不良的场景中，结构自运动（SfM）估计得到的特征点有限，难以捕捉足够的场景细节；(2) 缺乏真实参考的情况下，信息严重丢失、噪声显著以及颜色失真，使得 3DGS 很难生成高质量结果；(3) 将现有曝光校正方法与 3DGS 结合时，由于它们各自独立的增强过程，不同视角图像之间会出现光照不一致，导致整体表现不佳。\n为了解决上述问题，我们提出 LITA-GS，一种通过无参考 3DGS 与物理先验实现的光照无关新视角合成方法。首先，我们引入一个光照不变的物理先验提取流程。其次，基于提取到的鲁棒空间结构先验，我们设计了光照无关的结构渲染策略，促进场景结构与物体外观的优化。此外，我们还引入了一个渐进式去噪模块，有效缓解光照不变表示中的噪声干扰。\n我们采用无监督的方式对 LITA-GS 进行训练。大量实验证明，LITA-GS 在合成质量上优于现有最先进的基于 NeRF 的方法，同时具备更快的推理速度和更低的训练成本。\n"
  },
  {
    "path": "abs/2504.00387.md",
    "content": "### Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration\n\nThe reconstruction of immersive and realistic 3D scenes holds significant practical importance in various fields of computer vision and computer graphics. Typically, immersive and realistic scenes should be free from obstructions by dynamic objects, maintain global texture consistency, and allow for unrestricted exploration. The current mainstream methods for image-driven scene construction involves iteratively refining the initial image using a moving virtual camera to generate the scene. However, previous methods struggle with visual discontinuities due to global texture inconsistencies under varying camera poses, and they frequently exhibit scene voids caused by foreground-background occlusions. To this end, we propose a novel layered 3D scene reconstruction framework from panoramic image, named Scene4U. Specifically, Scene4U integrates an open-vocabulary segmentation model with a large language model to decompose a real panorama into multiple layers. Then, we employs a layered repair module based on diffusion model to restore occluded regions using visual cues and depth information, generating a hierarchical representation of the scene. The multi-layer panorama is then initialized as a 3D Gaussian Splatting representation, followed by layered optimization, which ultimately produces an immersive 3D scene with semantic and structural consistency that supports free exploration. Scene4U outperforms state-of-the-art method, improving by 24.24% in LPIPS and 24.40% in BRISQUE, while also achieving the fastest training speed. Additionally, to demonstrate the robustness of Scene4U and allow users to experience immersive scenes from various landmarks, we build WorldVista3D dataset for 3D scene reconstruction, which contains panoramic images of globally renowned sites.\n\n沉浸式、真实感三维场景的重建在计算机视觉与计算机图形学的多个应用中具有重要的实际意义。通常，具备沉浸感与真实感的场景应满足以下条件：不被动态物体遮挡、具有全局纹理一致性，并支持自由探索。目前主流的图像驱动场景重建方法，通常采用移动虚拟相机对初始图像进行迭代优化以生成场景。然而，现有方法在不同相机姿态下常出现全局纹理不一致，导致视觉不连续性问题，同时也容易因前景-背景遮挡而产生场景空洞。\n为此，我们提出了一种基于全景图像的新型分层三维场景重建框架，命名为 Scene4U。具体而言，Scene4U 首先结合开放词汇的分割模型与大型语言模型，将真实全景图像分解为多个语义层。随后，利用基于扩散模型的分层修复模块，结合视觉线索与深度信息，恢复被遮挡区域，从而构建场景的分层表达。接着，将多层全景图初始化为三维高斯喷洒（3D Gaussian Splatting）表示，并通过分层优化过程，最终生成具有语义一致性与结构一致性的沉浸式三维场景，支持用户自由探索。\nScene4U 在性能上显著优于现有最先进方法，在 LPIPS 指标上提升 24.24%，在 BRISQUE 指标上提升 24.40%，同时具备最快的训练速度。\n此外，为验证 Scene4U 的鲁棒性，并让用户能够体验来自不同地标的沉浸式场景，我们构建了用于三维场景重建的 WorldVista3D 数据集，该数据集包含全球著名景点的全景图像。\n"
  },
  {
    "path": "abs/2504.00437.md",
    "content": "### ADGaussian: Generalizable Gaussian Splatting for Autonomous Driving with Multi-modal Inputs\n\nWe present a novel approach, termed ADGaussian, for generalizable street scene reconstruction. The proposed method enables high-quality rendering from single-view input. Unlike prior Gaussian Splatting methods that primarily focus on geometry refinement, we emphasize the importance of joint optimization of image and depth features for accurate Gaussian prediction. To this end, we first incorporate sparse LiDAR depth as an additional input modality, formulating the Gaussian prediction process as a joint learning framework of visual information and geometric clue. Furthermore, we propose a multi-modal feature matching strategy coupled with a multi-scale Gaussian decoding model to enhance the joint refinement of multi-modal features, thereby enabling efficient multi-modal Gaussian learning. Extensive experiments on two large-scale autonomous driving datasets, Waymo and KITTI, demonstrate that our ADGaussian achieves state-of-the-art performance and exhibits superior zero-shot generalization capabilities in novel-view shifting.\n\n我们提出了一种新颖的方法——ADGaussian，用于具有泛化能力的街景重建。该方法支持从单视图输入实现高质量渲染。与以往主要侧重于几何细化的Gaussian Splatting方法不同，我们强调图像特征与深度特征的联合优化对于准确高斯预测的重要性。\n为此，我们首先引入稀疏LiDAR深度作为额外的输入模态，将高斯预测过程表述为一个融合视觉信息与几何线索的联合学习框架。此外，我们提出了一种多模态特征匹配策略，结合多尺度高斯解码模型，以增强多模态特征的联合细化能力，从而实现高效的多模态高斯学习。\n在两个大规模自动驾驶数据集——Waymo和KITTI——上的大量实验证明，我们的ADGaussian方法不仅实现了当前最先进的性能，还在新视角切换任务中展现出卓越的零样本泛化能力。\n"
  },
  {
    "path": "abs/2504.00457.md",
    "content": "### Distilling Multi-view Diffusion Models into 3D Generators\n\nWe introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using gaussian splatting. DD3G compresses and integrates extensive visual and spatial geometric knowledge from the MV-DM by simulating its ordinary differential equation (ODE) trajectory, ensuring the distilled generator generalizes better than those trained solely on 3D data. Unlike previous amortized optimization approaches, we align the MV-DM and 3D generator representation spaces to transfer the teacher's probabilistic flow to the student, thus avoiding inconsistencies in optimization objectives caused by probabilistic sampling. The introduction of probabilistic flow and the coupling of various attributes in 3D Gaussians introduce challenges in the generation process. To tackle this, we propose PEPD, a generator consisting of Pattern Extraction and Progressive Decoding phases, which enables efficient fusion of probabilistic flow and converts a single image into 3D Gaussians within 0.06 seconds. Furthermore, to reduce knowledge loss and overcome sparse-view supervision, we design a joint optimization objective that ensures the quality of generated samples through explicit supervision and implicit verification. Leveraging existing 2D generation models, we compile 120k high-quality RGBA images for distillation. Experiments on synthetic and public datasets demonstrate the effectiveness of our method.\n\n我们提出了 DD3G，一种将多视图扩散模型（MV-DM）蒸馏为三维生成器的框架，基于高斯喷洒（Gaussian Splatting）实现。DD3G 通过模拟 MV-DM 的常微分方程（ODE）轨迹，压缩并融合其丰富的视觉与空间几何知识，从而使得蒸馏后的生成器相较于仅在三维数据上训练的模型具备更强的泛化能力。\n与以往的摊销式优化方法不同，DD3G 对齐了 MV-DM 与三维生成器的表示空间，从而将教师模型的**概率流（probabilistic flow）**有效传递给学生模型，避免了由概率采样引起的目标函数不一致问题。由于概率流的引入及三维高斯中多种属性的耦合，使得生成过程更加复杂。\n为应对此挑战，我们提出了 PEPD，一种包含**模式提取（Pattern Extraction）与渐进解码（Progressive Decoding）**两个阶段的生成器架构，能够高效融合概率流，并在 0.06 秒内将单张图像转化为三维高斯表示。\n此外，为减少知识损失并克服稀疏视角监督问题，我们设计了一个联合优化目标，结合显式监督与隐式验证，以确保生成样本的质量。\n在蒸馏过程中，我们利用现有的 2D 生成模型构建了包含 12 万张高质量 RGBA 图像的数据集。实验在合成与公开数据集上验证了我们方法的有效性。\n"
  },
  {
    "path": "abs/2504.00525.md",
    "content": "### Robust LiDAR-Camera Calibration with 2D Gaussian Splatting\n\nLiDAR-camera systems have become increasingly popular in robotics recently. A critical and initial step in integrating the LiDAR and camera data is the calibration of the LiDAR-camera system. Most existing calibration methods rely on auxiliary target objects, which often involve complex manual operations, whereas targetless methods have yet to achieve practical effectiveness. Recognizing that 2D Gaussian Splatting (2DGS) can reconstruct geometric information from camera image sequences, we propose a calibration method that estimates LiDAR-camera extrinsic parameters using geometric constraints. The proposed method begins by reconstructing colorless 2DGS using LiDAR point clouds. Subsequently, we update the colors of the Gaussian splats by minimizing the photometric loss. The extrinsic parameters are optimized during this process. Additionally, we address the limitations of the photometric loss by incorporating the reprojection and triangulation losses, thereby enhancing the calibration robustness and accuracy.\n\n近年来，LiDAR-相机系统在机器人领域日益受到关注。将LiDAR与相机数据融合的关键且首要的步骤是对系统进行外参标定。目前大多数标定方法依赖于辅助的靶标物体，这类方法通常需要复杂的人工操作；而无靶标方法则尚未达到实用效果。\n鉴于二维高斯投影（2D Gaussian Splatting, 2DGS）能够从相机图像序列中重建几何信息，我们提出了一种基于几何约束的LiDAR-相机外参估计方法。该方法首先利用LiDAR点云重建出无颜色的2DGS，随后通过最小化光度损失来更新高斯斑点的颜色，并在此过程中优化外参参数。\n此外，为克服光度损失的局限性，我们进一步引入了重投影损失和三角化损失，从而有效提升标定的鲁棒性与精度。\n"
  },
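As a sketch of the reprojection term mentioned above (not the full 2DGS photometric pipeline), the six extrinsic parameters can be refined by least squares on projected LiDAR points. The intrinsics `K`, the synthetic correspondences, and the rotation-vector parameterization are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])

def project(params, pts):
    """params: rotation vector (3) + translation (3), LiDAR -> camera."""
    rvec, t = params[:3], params[3:]
    p_cam = Rotation.from_rotvec(rvec).apply(pts) + t     # apply extrinsics
    uvw = (K @ p_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]                       # pinhole projection

def residuals(params, pts, uv_obs):
    return (project(params, pts) - uv_obs).ravel()

# Synthetic check: recover known extrinsics from noisy 2D correspondences.
rng = np.random.default_rng(0)
pts = rng.uniform([-2, -2, 4], [2, 2, 8], (200, 3))       # LiDAR points
true = np.array([0.05, -0.02, 0.03, 0.10, -0.05, 0.20])   # hidden extrinsics
uv_obs = project(true, pts) + rng.normal(0.0, 0.5, (200, 2))
fit = least_squares(residuals, np.zeros(6), args=(pts, uv_obs))
print(np.round(fit.x, 3), "vs true", true)
```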
  {
    "path": "abs/2504.00639.md",
    "content": "### Coca-Splat: Collaborative Optimization for Camera Parameters and 3D Gaussians\n\nIn this work, we introduce Coca-Splat, a novel approach to addressing the challenges of sparse view pose-free scene reconstruction and novel view synthesis (NVS) by jointly optimizing camera parameters with 3D Gaussians. Inspired by deformable DEtection TRansformer, we design separate queries for 3D Gaussians and camera parameters and update them layer by layer through deformable Transformer layers, enabling joint optimization in a single network. This design demonstrates better performance because to accurately render views that closely approximate ground-truth images relies on precise estimation of both 3D Gaussians and camera parameters. In such a design, the centers of 3D Gaussians are projected onto each view by camera parameters to get projected points, which are regarded as 2D reference points in deformable cross-attention. With camera-aware multi-view deformable cross-attention (CaMDFA), 3D Gaussians and camera parameters are intrinsically connected by sharing the 2D reference points. Additionally, 2D reference point determined rays (RayRef) defined from camera centers to the reference points assist in modeling relationship between 3D Gaussians and camera parameters through RQ-decomposition on an overdetermined system of equations derived from the rays, enhancing the relationship between 3D Gaussians and camera parameters. Extensive evaluation shows that our approach outperforms previous methods, both pose-required and pose-free, on RealEstate10K and ACID within the same pose-free setting.\n\n在本工作中，我们提出了 Coca-Splat，一种新颖的方法，旨在通过联合优化相机参数与三维高斯，实现稀疏视角、无位姿场景重建与新视角合成（NVS）任务。受可变形目标检测 Transformer（Deformable DEtection TRansformer）的启发，我们为三维高斯与相机参数分别设计了独立的查询（query），并通过可变形 Transformer 层逐层更新，使得二者可以在同一个网络中进行联合优化。\n这种设计展现出更优的性能，因为要精确渲染出高度接近真实图像的视角，既依赖于三维高斯的准确建模，也依赖于相机参数的精准估计。在该框架下，三维高斯的中心通过相机参数投影到各视角，得到的投影点被视为可变形交叉注意力机制中的二维参考点（2D reference points）。\n通过我们提出的相机感知的多视图可变形交叉注意力机制（CaMDFA），三维高斯与相机参数通过共享二维参考点建立起本质联系。此外，我们引入了基于二维参考点定义的参考射线（RayRef），即从相机中心指向参考点的射线，进而构建出一组超定方程。通过对这些方程进行 RQ 分解，可以进一步建模三维高斯与相机参数之间的关系，从而强化它们之间的协同。\n在 RealEstate10K 和 ACID 数据集的无位姿设置下，我们的方法在重建与合成质量方面均显著优于此前的有位姿和无位姿方法。\n"
  },
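The RayRef construction builds an overdetermined linear system from rays through the shared 2D reference points. Below is a hedged sketch of the standard least-squares form of such a system: the 3D point minimizing summed squared distance to rays (o_i, d_i) solves sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) o_i. The paper solves its system via RQ-decomposition; `numpy.linalg.solve` is used here purely for illustration.

```python
import numpy as np

def triangulate(origins, dirs):
    """Least-squares point closest to all rays (origin o, unit direction d)."""
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    A, b = np.zeros((3, 3)), np.zeros(3)
    for o, d in zip(origins, dirs):
        P = np.eye(3) - np.outer(d, d)     # projector orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

x_true = np.array([1.0, 2.0, 5.0])
origins = np.array([[0, 0, 0], [3, 0, 0], [0, 3, 0.5]], float)
dirs = x_true - origins                    # rays passing exactly through it
print(triangulate(origins, dirs))          # -> [1. 2. 5.]
```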
  {
    "path": "abs/2504.00665.md",
    "content": "### Monocular and Generalizable Gaussian Talking Head Animation\n\nIn this work, we introduce Monocular and Generalizable Gaussian Talking Head Animation (MGGTalk), which requires monocular datasets and generalizes to unseen identities without personalized re-training. Compared with previous 3D Gaussian Splatting (3DGS) methods that requires elusive multi-view datasets or tedious personalized learning/inference, MGGtalk enables more practical and broader applications. However, in the absence of multi-view and personalized training data, the incompleteness of geometric and appearance information poses a significant challenge. To address these challenges, MGGTalk explores depth information to enhance geometric and facial symmetry characteristics to supplement both geometric and appearance features. Initially, based on the pixel-wise geometric information obtained from depth estimation, we incorporate symmetry operations and point cloud filtering techniques to ensure a complete and precise position parameter for 3DGS. Subsequently, we adopt a two-stage strategy with symmetric priors for predicting the remaining 3DGS parameters. We begin by predicting Gaussian parameters for the visible facial regions of the source image. These parameters are subsequently utilized to improve the prediction of Gaussian parameters for the non-visible regions. Extensive experiments demonstrate that MGGTalk surpasses previous state-of-the-art methods, achieving superior performance across various metrics.\n\n在本工作中，我们提出了 MGGTalk（Monocular and Generalizable Gaussian Talking Head Animation），一种无需多视图数据、可泛化至未见身份且无需个性化再训练的高斯说话人动画方法。相较于以往依赖难以获取的多视图数据或繁琐个性化训练/推理的三维高斯喷洒（3D Gaussian Splatting, 3DGS）方法，MGGTalk 在实际性与适用范围上具有更大优势。\n然而，在缺乏多视图与个性化训练数据的条件下，几何与外观信息的不完整性成为主要挑战。为应对此问题，MGGTalk 利用深度信息增强几何结构与面部对称性特征，从而在几何与外观层面进行有效补全。\n具体而言，首先我们基于深度估计获得的像素级几何信息，引入对称操作与点云过滤技术，以获得完整、精确的 3DGS 位置参数。随后，我们采用结合对称先验的两阶段策略预测其余 3DGS 参数：第一阶段预测源图像中可见面部区域的高斯参数，第二阶段则基于第一阶段结果进一步完善不可见区域的参数预测。\n大量实验表明，MGGTalk 在多个评价指标上均优于现有最先进方法，展现出更优异的性能。\n"
  },
  {
    "path": "abs/2504.00763.md",
    "content": "### UnIRe: Unsupervised Instance Decomposition for Dynamic Urban Scene Reconstruction\n\nReconstructing and decomposing dynamic urban scenes is crucial for autonomous driving, urban planning, and scene editing. However, existing methods fail to perform instance-aware decomposition without manual annotations, which is crucial for instance-level scene editing. We propose UnIRe, a 3D Gaussian Splatting (3DGS) based approach that decomposes a scene into a static background and individual dynamic instances using only RGB images and LiDAR point clouds. At its core, we introduce 4D superpoints, a novel representation that clusters multi-frame LiDAR points in 4D space, enabling unsupervised instance separation based on spatiotemporal correlations. These 4D superpoints serve as the foundation for our decomposed 4D initialization, i.e., providing spatial and temporal initialization to train a dynamic 3DGS for arbitrary dynamic classes without requiring bounding boxes or object templates. Furthermore, we introduce a smoothness regularization strategy in both 2D and 3D space, further improving the temporal stability. Experiments on benchmark datasets show that our method outperforms existing methods in decomposed dynamic scene reconstruction while enabling accurate and flexible instance-level editing, making it a practical solution for real-world applications.\n\n重建与分解动态城市场景对于自动驾驶、城市规划以及场景编辑等任务至关重要。然而，现有方法在无人工标注的情况下难以实现实例感知的场景分解，这对于实例级场景编辑来说至关重要。\n我们提出了 UnIRe，一种基于三维高斯喷洒（3D Gaussian Splatting, 3DGS）的方法，仅使用 RGB 图像和激光雷达点云，将场景分解为静态背景与各个动态实例。其核心在于引入了 4D 超点（4D superpoints），这是一种新颖的表示方式，通过在 4D 空间中对多帧激光雷达点进行聚类，基于时空相关性实现无监督的实例分离。\n这些 4D 超点构成了我们的 4D 分解初始化的基础，即为动态 3DGS 提供空间和时间上的初始化，使其能够针对任意动态类别进行训练，无需边界框或对象模板。\n此外，我们在二维与三维空间中引入了平滑正则化策略，进一步提升了时间稳定性。\n在多个基准数据集上的实验表明，UnIRe 在动态场景分解重建方面优于现有方法，同时支持精确且灵活的实例级编辑，为真实场景中的应用提供了实用解决方案。\n"
  },
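A hedged sketch of the 4D-superpoint idea: cluster multi-frame LiDAR points in (x, y, z, t) so that points moving coherently across frames fall into one cluster. DBSCAN and the time-axis scaling are stand-ins for whatever clustering UnIRe actually uses.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def superpoints_4d(xyz, frame_ids, time_scale=0.5, eps=0.8):
    t = frame_ids[:, None].astype(float) * time_scale   # weighted time axis
    feats = np.hstack([xyz, t])                         # (x, y, z, t) features
    return DBSCAN(eps=eps, min_samples=5).fit_predict(feats)

# Toy scene: one static blob and one blob translating 0.5 m per frame.
rng = np.random.default_rng(0)
blobs, fids = [], []
for f in range(5):
    blobs += [rng.normal([0.0, 0.0, 0.0], 0.1, (50, 3)),
              rng.normal([0.5 * f, 5.0, 0.0], 0.1, (50, 3))]
    fids += [np.full(100, f)]
labels = superpoints_4d(np.vstack(blobs), np.concatenate(fids))
print(np.unique(labels))   # two instance ids recovered without supervision
```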
  {
    "path": "abs/2504.00773.md",
    "content": "### DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting\n\nRecently, 3D Gaussian splatting (3DGS) has gained considerable attentions in the field of novel view synthesis due to its fast performance while yielding the excellent image quality. However, 3DGS in sparse-view settings (e.g., three-view inputs) often faces with the problem of overfitting to training views, which significantly drops the visual quality of novel view images. Many existing approaches have tackled this issue by using strong priors, such as 2D generative contextual information and external depth signals. In contrast, this paper introduces a prior-free method, so-called DropGaussian, with simple changes in 3D Gaussian splatting. Specifically, we randomly remove Gaussians during the training process in a similar way of dropout, which allows non-excluded Gaussians to have larger gradients while improving their visibility. This makes the remaining Gaussians to contribute more to the optimization process for rendering with sparse input views. Such simple operation effectively alleviates the overfitting problem and enhances the quality of novel view synthesis. By simply applying DropGaussian to the original 3DGS framework, we can achieve the competitive performance with existing prior-based 3DGS methods in sparse-view settings of benchmark datasets without any additional complexity.\n\n近年来，三维高斯喷洒（3D Gaussian Splatting, 3DGS）因其高速性能与优异图像质量，在新视角合成领域受到广泛关注。然而，在稀疏视角设置（例如仅提供三个视角）下，3DGS 常常出现对训练视角过拟合的问题，导致新视角图像的视觉质量显著下降。\n许多现有方法通过引入强先验信息（如二维生成上下文或外部深度信号）来应对这一问题。与此不同，本文提出了一种无需先验的方法，命名为 DropGaussian，通过对 3DGS 进行简单改动来提升泛化能力。\n具体而言，我们在训练过程中随机丢弃部分高斯分布，方式类似于 Dropout。这一策略使得未被丢弃的高斯在训练中获得更大的梯度，并提升其可见性，从而在稀疏输入视角下更有效地参与渲染优化。该操作简单直接，却能有效缓解过拟合问题，并提升新视角合成的图像质量。\n仅需在原始 3DGS 框架中引入 DropGaussian，无需引入额外复杂度，即可在基准数据集的稀疏视角设置下达到与现有基于先验方法相当的性能。\n"
  },
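Since the mechanism is explicitly dropout-like, a minimal PyTorch sketch is easy to state. The 1/(1-p) opacity compensation mirrors standard dropout and is an assumption here, as is the toy objective standing in for the 3DGS rasterizer.

```python
import torch

def drop_gaussians(opacity: torch.Tensor, p: float = 0.1):
    """opacity: (N,) learnable opacities of all Gaussians in the scene."""
    keep = torch.rand_like(opacity) > p    # resampled every training step
    scaled = opacity * keep / (1.0 - p)    # survivors compensate the dropped
    return scaled, keep

opacity = torch.rand(10, requires_grad=True)
scaled, keep = drop_gaussians(opacity, p=0.3)
scaled.sum().backward()                    # toy stand-in for a render loss
print(keep.int().tolist())
print(opacity.grad)                        # ~1/(1-p) on kept, 0 on dropped
```

At evaluation time all Gaussians are used unscaled; the point of the mask is only to spread gradient signal over fewer, more visible Gaussians during sparse-view training.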
  {
    "path": "abs/2504.01358.md",
    "content": "### 3D Gaussian Inverse Rendering with Approximated Global Illumination\n\n3D Gaussian Splatting shows great potential in reconstructing photo-realistic 3D scenes. However, these methods typically bake illumination into their representations, limiting their use for physically-based rendering and scene editing. Although recent inverse rendering approaches aim to decompose scenes into material and lighting components, they often rely on simplifying assumptions that fail when editing. We present a novel approach that enables efficient global illumination for 3D Gaussians Splatting through screen-space ray tracing. Our key insight is that a substantial amount of indirect light can be traced back to surfaces visible within the current view frustum. Leveraging this observation, we augment the direct shading computed by 3D Gaussians with Monte-Carlo screen-space ray-tracing to capture one-bounce indirect illumination. In this way, our method enables realistic global illumination without sacrificing the computational efficiency and editability benefits of 3D Gaussians. Through experiments, we show that the screen-space approximation we utilize allows for indirect illumination and supports real-time rendering and editing.\n\n三维高斯喷洒（3D Gaussian Splatting）在真实感三维场景重建方面展现出巨大潜力。然而，现有方法通常将光照信息“烘焙”进其表示中，限制了其在基于物理渲染与场景编辑中的应用能力。尽管近年来的反向渲染方法尝试将场景分解为材质与光照两个部分，但它们往往依赖简化假设，在编辑时容易失效。\n本文提出了一种新颖方法，能够通过屏幕空间光线追踪（screen-space ray tracing），实现适用于三维高斯喷洒的高效全局光照（global illumination）。我们的核心洞见在于：大量间接光照可追溯至当前视锥内可见的表面。基于这一观察，我们在 3D 高斯计算的直接光照基础上，结合蒙特卡洛屏幕空间光线追踪，捕获一次反射的间接光照。\n该方法在不牺牲 3D 高斯的计算效率与可编辑性的前提下，实现了真实感的全局光照。实验表明，我们所采用的屏幕空间近似方案不仅支持间接光照计算，同时也满足实时渲染与编辑的需求。\n"
  },
  {
    "path": "abs/2504.01503.md",
    "content": "### Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment\n\nCapturing high-quality photographs under diverse real-world lighting conditions is challenging, as both natural lighting (e.g., low-light) and camera exposure settings (e.g., exposure time) significantly impact image quality. This challenge becomes more pronounced in multi-view scenarios, where variations in lighting and image signal processor (ISP) settings across viewpoints introduce photometric inconsistencies. Such lighting degradations and view-dependent variations pose substantial challenges to novel view synthesis (NVS) frameworks based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). To address this, we introduce Luminance-GS, a novel approach to achieving high-quality novel view synthesis results under diverse challenging lighting conditions using 3DGS. By adopting per-view color matrix mapping and view-adaptive curve adjustments, Luminance-GS achieves state-of-the-art (SOTA) results across various lighting conditions -- including low-light, overexposure, and varying exposure -- while not altering the original 3DGS explicit representation. Compared to previous NeRF- and 3DGS-based baselines, Luminance-GS provides real-time rendering speed with improved reconstruction quality.\n\n在多样化的真实光照条件下获取高质量照片是一项具有挑战性的任务，因为自然光照（如低光照）与相机曝光设置（如曝光时间）都会显著影响图像质量。在多视角场景中，这一问题更加复杂——不同视角下的光照变化与图像信号处理器（ISP）设置的差异会引入显著的光度不一致性。这类光照退化与视角依赖性变化给基于神经辐射场（Neural Radiance Fields, NeRF）和三维高斯喷洒（3D Gaussian Splatting, 3DGS）的方法带来了巨大挑战。\n为了解决这一问题，我们提出了 Luminance-GS，一种能够在多种复杂光照条件下实现高质量新视角合成的 3DGS 方法。Luminance-GS 通过引入每视角颜色矩阵映射（per-view color matrix mapping）与视角自适应曲线调整（view-adaptive curve adjustments），在不更改原始 3DGS 显式表示的前提下，有效提升了低光照、过曝、以及曝光不一致等情况下的重建质量。\n与现有 NeRF 和 3DGS 基线方法相比，Luminance-GS 在多种光照环境下均达到了当前最先进（SOTA）的性能表现，同时保持了实时渲染速度与更高的重建质量。\n"
  },
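A hedged PyTorch sketch of the two per-view adaptations named in the abstract, applied after rendering so the underlying 3DGS representation is untouched; the gamma-style monotone curve is an assumption standing in for the paper's learned curve parameterization.

```python
import torch

class ViewAdaptation(torch.nn.Module):
    def __init__(self, n_views: int):
        super().__init__()
        eye = torch.eye(3).expand(n_views, 3, 3).clone()
        self.color_mat = torch.nn.Parameter(eye)          # per-view 3x3 map
        self.log_gamma = torch.nn.Parameter(torch.zeros(n_views))

    def forward(self, img: torch.Tensor, view: int):
        """img: (3, H, W) rendering from the shared 3DGS scene."""
        flat = self.color_mat[view] @ img.flatten(1)      # color matrix map
        flat = flat.clamp(1e-6, 1.0)
        gamma = self.log_gamma[view].exp()                # curve stays monotone
        return flat.pow(gamma).view_as(img)               # view-adaptive curve

adapt = ViewAdaptation(n_views=4)
rendered = torch.rand(3, 64, 64)                          # from the 3DGS scene
target = torch.rand(3, 64, 64)                            # the captured photo
loss = (adapt(rendered, view=2) - target).abs().mean()
loss.backward()                                           # only view 2's entries
```

The design point the abstract makes is that photometric inconsistency is absorbed by these cheap per-view parameters rather than by the shared Gaussians, so the scene itself stays normally exposed.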
  {
    "path": "abs/2504.01512.md",
    "content": "### High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model\n\nRecently single-view 3D generation via Gaussian splatting has emerged and developed quickly. They learn 3D Gaussians from 2D RGB images generated from pre-trained multi-view diffusion (MVD) models, and have shown a promising avenue for 3D generation through a single image. Despite the current progress, these methods still suffer from the inconsistency jointly caused by the geometric ambiguity in the 2D images, and the lack of structure of 3D Gaussians, leading to distorted and blurry 3D object generation. In this paper, we propose to fix these issues by GS-RGBN, a new RGBN-volume Gaussian Reconstruction Model designed to generate high-fidelity 3D objects from single-view images. Our key insight is a structured 3D representation can simultaneously mitigate the afore-mentioned two issues. To this end, we propose a novel hybrid Voxel-Gaussian representation, where a 3D voxel representation contains explicit 3D geometric information, eliminating the geometric ambiguity from 2D images. It also structures Gaussians during learning so that the optimization tends to find better local optima. Our 3D voxel representation is obtained by a fusion module that aligns RGB features and surface normal features, both of which can be estimated from 2D images. Extensive experiments demonstrate the superiority of our methods over prior works in terms of high-quality reconstruction results, robust generalization, and good efficiency.\n\n近年来，基于高斯喷洒的单视图三维生成技术快速发展。这类方法通常利用预训练多视图扩散模型（Multi-View Diffusion, MVD）生成的二维 RGB 图像，学习对应的三维高斯表示，展现出从单张图像生成三维内容的广阔前景。尽管已有一定进展，但当前方法仍存在两个关键问题：一是二维图像中的几何歧义，二是三维高斯缺乏结构约束，这导致生成结果容易出现形变与模糊等问题。\n为解决上述问题，本文提出 GS-RGBN，一种用于单视图高保真三维物体生成的 RGBN 体积高斯重建模型。我们的核心观点是：结构化的三维表示能够同时缓解上述两类问题。\n具体而言，我们引入了一种新颖的体素-高斯混合表示，其中三维体素表示显式编码了几何结构信息，有效消除了来自二维图像的几何歧义。同时，这种结构化约束也有助于三维高斯在优化过程中收敛到更优的局部最优解。\n我们所采用的三维体素表示通过一个融合模块获得，该模块对齐了从二维图像中估计的 RGB 特征与表面法向特征（Normal Features）。\n大量实验表明，GS-RGBN 在高质量重建效果、稳健的泛化能力以及运行效率方面，均显著优于现有方法。\n"
  },
  {
    "path": "abs/2504.01559.md",
    "content": "### RealityAvatar: Towards Realistic Loose Clothing Modeling in Animatable 3D Gaussian Avatars\n\nModeling animatable human avatars from monocular or multi-view videos has been widely studied, with recent approaches leveraging neural radiance fields (NeRFs) or 3D Gaussian Splatting (3DGS) achieving impressive results in novel-view and novel-pose synthesis. However, existing methods often struggle to accurately capture the dynamics of loose clothing, as they primarily rely on global pose conditioning or static per-frame representations, leading to oversmoothing and temporal inconsistencies in non-rigid regions. To address this, We propose RealityAvatar, an efficient framework for high-fidelity digital human modeling, specifically targeting loosely dressed avatars. Our method leverages 3D Gaussian Splatting to capture complex clothing deformations and motion dynamics while ensuring geometric consistency. By incorporating a motion trend module and a latentbone encoder, we explicitly model pose-dependent deformations and temporal variations in clothing behavior. Extensive experiments on benchmark datasets demonstrate the effectiveness of our approach in capturing fine-grained clothing deformations and motion-driven shape variations. Our method significantly enhances structural fidelity and perceptual quality in dynamic human reconstruction, particularly in non-rigid regions, while achieving better consistency across temporal frames.\n\n从单目或多视角视频中建模可动画的人体角色是一个被广泛研究的问题，近期的一些方法借助神经辐射场（Neural Radiance Fields, NeRF）或三维高斯喷洒（3D Gaussian Splatting, 3DGS）在新视角与新姿态合成方面取得了显著成果。然而，现有方法在建模宽松衣物的动态变化方面仍面临挑战：它们主要依赖全局姿态条件或逐帧静态表示，难以准确捕捉非刚性区域的细节，容易导致过度平滑和时间不一致问题。\n为此，我们提出了 RealityAvatar，一种高效、高保真的数字人建模框架，专为穿着宽松衣物的角色设计。该方法基于三维高斯喷洒，有效捕捉复杂的衣物变形与运动动态，同时保持几何一致性。\n我们引入了 **运动趋势模块（motion trend module）**与 隐骨编码器（latentbone encoder），显式建模姿态驱动下的衣物形变与时间变化行为，从而增强对动态特征的建模能力。\n在多个基准数据集上的大量实验证明，我们的方法在捕捉细粒度衣物变形与运动驱动的形态变化方面表现出色，显著提升了动态人体重建中非刚性区域的结构保真度与感知质量，同时具备更好的时间一致性。\n"
  },
  {
    "path": "abs/2504.01619.md",
    "content": "### 3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting\n\nRecent advancements in text-to-3D generation have shown remarkable results by leveraging 3D priors in combination with 2D diffusion. However, previous methods utilize 3D priors that lack detailed and complex structural information, limiting them to generating simple objects and presenting challenges for creating intricate structures such as bonsai. In this paper, we propose 3DBonsai, a novel text-to-3D framework for generating 3D bonsai with complex structures. Technically, we first design a trainable 3D space colonization algorithm to produce bonsai structures, which are then enhanced through random sampling and point cloud augmentation to serve as the 3D Gaussian priors. We introduce two bonsai generation pipelines with distinct structural levels: fine structure conditioned generation, which initializes 3D Gaussians using a 3D structure prior to produce detailed and complex bonsai, and coarse structure conditioned generation, which employs a multi-view structure consistency module to align 2D and 3D structures. Moreover, we have compiled a unified 2D and 3D Chinese-style bonsai dataset. Our experimental results demonstrate that 3DBonsai significantly outperforms existing methods, providing a new benchmark for structure-aware 3D bonsai generation.\n\n近年来，文本生成三维（text-to-3D）方法通过结合三维先验与二维扩散模型取得了显著成果。然而，现有方法所使用的三维先验往往缺乏细节和复杂结构信息，限制了其生成对象的复杂度，使其仅能生成简单物体，对于如盆景这类结构精细的目标存在较大挑战。\n为此，本文提出 3DBonsai，一个用于生成结构复杂盆景的全新文本生成三维框架。我们首先设计了一种可训练的三维空间拓展算法，用于生成盆景结构，并通过随机采样与点云增强对其进行优化，使其作为三维高斯先验参与生成过程。\n我们引入两种具有不同结构层级的盆景生成流程：细结构条件生成，以三维结构先验初始化三维高斯以生成精细复杂的盆景；粗结构条件生成，则通过多视图结构一致性模块对齐二维与三维结构。\n此外，我们构建了一个统一的二维与三维中式盆景数据集。实验结果表明，3DBonsai 显著优于现有方法，为结构感知的三维盆景生成提供了新的基准。\n"
  },
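The bonsai structures build on the classic space colonization algorithm; below is a compact NumPy sketch of that base algorithm, without the paper's trainable components and with all constants illustrative: branch nodes grow toward nearby attraction points, and attractors are consumed once a node reaches them.

```python
import numpy as np

rng = np.random.default_rng(0)
attractors = rng.uniform([-1, -1, 0.5], [1, 1, 2.0], (300, 3))  # crown volume
nodes = [np.zeros(3)]                                  # trunk base
step, influence, kill = 0.08, 3.0, 0.1

for _ in range(200):
    if len(attractors) == 0:
        break
    pts = np.array(nodes)
    d = np.linalg.norm(attractors[:, None] - pts[None], axis=2)
    nearest = d.argmin(axis=1)                         # each attractor pulls
    grew = False                                       # its closest node
    for i in range(len(pts)):
        pull = attractors[(nearest == i) & (d[:, i] < influence)]
        if len(pull):
            dir_ = (pull - pts[i]).mean(axis=0)
            n = np.linalg.norm(dir_)
            if n > 1e-9:
                nodes.append(pts[i] + step * dir_ / n)  # grow one segment
                grew = True
    d = np.linalg.norm(attractors[:, None] - np.array(nodes)[None], axis=2)
    attractors = attractors[d.min(axis=1) > kill]      # reached -> consumed
    if not grew:
        break

print(len(nodes), "branch nodes grown")
```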
  {
    "path": "abs/2504.01647.md",
    "content": "### FlowR: Flowing from Sparse to Dense 3D Reconstructions\n\n3D Gaussian splatting enables high-quality novel view synthesis (NVS) at real-time frame rates. However, its quality drops sharply as we depart from the training views. Thus, dense captures are needed to match the high-quality expectations of some applications, e.g. Virtual Reality (VR). However, such dense captures are very laborious and expensive to obtain. Existing works have explored using 2D generative models to alleviate this requirement by distillation or generating additional training views. These methods are often conditioned only on a handful of reference input views and thus do not fully exploit the available 3D information, leading to inconsistent generation results and reconstruction artifacts. To tackle this problem, we propose a multi-view, flow matching model that learns a flow to connect novel view renderings from possibly sparse reconstructions to renderings that we expect from dense reconstructions. This enables augmenting scene captures with novel, generated views to improve reconstruction quality. Our model is trained on a novel dataset of 3.6M image pairs and can process up to 45 views at 540x960 resolution (91K tokens) on one H100 GPU in a single forward pass. Our pipeline consistently improves NVS in sparse- and dense-view scenarios, leading to higher-quality reconstructions than prior works across multiple, widely-used NVS benchmarks.\n\n三维高斯喷溅（3D Gaussian Splatting）能够以实时帧率实现高质量的新视角合成（Novel View Synthesis, NVS）。然而，当观察角度偏离训练视角时，其渲染质量会显著下降。因此，为满足某些应用（如虚拟现实 VR）对高质量的要求，通常需要密集的数据采集，而这类采集过程往往代价高昂且极为繁琐。\n已有研究尝试借助二维生成模型，通过蒸馏或生成额外训练视角来缓解对密集采集的依赖。然而，这些方法通常仅基于少量参考视图进行条件生成，未能充分利用可用的三维信息，导致生成结果存在不一致与重建伪影等问题。\n为解决这一问题，我们提出了一种多视角流匹配模型，该模型学习一种映射流（flow），将来自稀疏重建的视角渲染结果对齐至期望的密集重建渲染结果，从而支持利用生成的新视角增强场景采集数据，提升重建质量。\n我们的方法基于一个包含 360 万图像对的新数据集进行训练，并能在单张 H100 GPU 上以单次前向过程处理多达 45 个视角、540×960 分辨率（共 91K token）的输入。\n该流程在稀疏视角与密集视角场景下均显著提升新视角合成效果，在多个主流 NVS 基准上实现了超过现有方法的高质量重建表现。\n"
  },
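The "flow" here is flow matching: a velocity field is trained to transport renderings of the sparse reconstruction toward the paired dense-reconstruction renderings. A generic, minimal flow-matching training step in PyTorch (a toy MLP on flattened vectors stands in for the paper's large multi-view model, and the paired data is synthetic):

```python
import torch

net = torch.nn.Sequential(torch.nn.Linear(16 + 1, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 16))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(100):
    x0 = torch.randn(32, 16)            # stand-in: sparse-recon rendering
    x1 = x0 + 1.0                       # stand-in: paired dense rendering
    t = torch.rand(32, 1)               # random time along the path
    xt = (1 - t) * x0 + t * x1          # linear interpolation path
    v_target = x1 - x0                  # its constant velocity
    v = net(torch.cat([xt, t], dim=1))  # predicted velocity field
    loss = ((v - v_target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))                      # -> near 0 on this toy pairing
```

At inference, integrating the learned velocity from a sparse-reconstruction rendering yields a "densified" view, which is what the pipeline feeds back as extra training views.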
  {
    "path": "abs/2504.01732.md",
    "content": "### FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking\n\nThe development of large-scale 3D scene reconstruction and novel view synthesis methods mostly rely on datasets comprising perspective images with narrow fields of view (FoV). While effective for small-scale scenes, these datasets require large image sets and extensive structure-from-motion (SfM) processing, limiting scalability. To address this, we introduce a fisheye image dataset tailored for scene reconstruction tasks. Using dual 200-degree fisheye lenses, our dataset provides full 360-degree coverage of 5 indoor and 5 outdoor scenes. Each scene has sparse SfM point clouds and precise LIDAR-derived dense point clouds that can be used as geometric ground-truth, enabling robust benchmarking under challenging conditions such as occlusions and reflections. While the baseline experiments focus on vanilla Gaussian Splatting and NeRF based Nerfacto methods, the dataset supports diverse approaches for scene reconstruction, novel view synthesis, and image-based rendering.\n\n当前大规模三维场景重建与新视角合成方法的发展，主要依赖于由窄视场（FoV）透视图像构建的数据集。这类数据集虽适用于小规模场景，但通常需要大量图像与复杂的结构自运动（SfM）处理流程，限制了其在更大场景中的可扩展性。\n为了解决这一问题，本文引入了一个专为场景重建任务设计的鱼眼图像数据集。该数据集使用双 200 度鱼眼镜头，提供对 5 个室内场景与 5 个室外场景的完整 360 度覆盖。每个场景均配备了稀疏 SfM 点云与精确的激光雷达（LIDAR）密集点云，后者可作为几何真值，用于在遮挡、反射等复杂条件下进行稳健评估。\n尽管基线实验聚焦于原始高斯喷洒与基于 NeRF 的 Nerfacto 方法，该数据集同时支持多种场景重建、新视角合成与基于图像的渲染方法，为相关研究提供了丰富的测试平台。\n"
  },
  {
    "path": "abs/2504.01844.md",
    "content": "### BOGausS: Better Optimized Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) proposes an efficient solution for novel view synthesis. Its framework provides fast and high-fidelity rendering. Although less complex than other solutions such as Neural Radiance Fields (NeRF), there are still some challenges building smaller models without sacrificing quality. In this study, we perform a careful analysis of 3DGS training process and propose a new optimization methodology. Our Better Optimized Gaussian Splatting (BOGausS) solution is able to generate models up to ten times lighter than the original 3DGS with no quality degradation, thus significantly boosting the performance of Gaussian Splatting compared to the state of the art.\n\n三维高斯喷洒（3D Gaussian Splatting，3DGS）为新视角合成提供了一种高效解决方案，其框架兼具高速渲染与高保真图像质量。尽管相比神经辐射场（Neural Radiance Fields，NeRF）等方法结构更为简洁，3DGS 在构建轻量模型而不损失质量方面仍面临挑战。\n在本研究中，我们对 3DGS 的训练过程进行了细致分析，并提出了一种新的优化方法。我们的方法被称为 BOGausS（Better Optimized Gaussian Splatting），可在不降低渲染质量的前提下，生成比原始 3DGS 小 十倍的模型，从而大幅提升高斯喷洒的整体性能，优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2504.01957.md",
    "content": "### Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting\n\nBird's-eye view (BEV) perception has gained significant attention because it provides a unified representation to fuse multiple view images and enables a wide range of down-stream autonomous driving tasks, such as forecasting and planning. Recent state-of-the-art models utilize projection-based methods which formulate BEV perception as query learning to bypass explicit depth estimation. While we observe promising advancements in this paradigm, they still fall short of real-world applications because of the lack of uncertainty modeling and expensive computational requirement. In this work, we introduce GaussianLSS, a novel uncertainty-aware BEV perception framework that revisits unprojection-based methods, specifically the Lift-Splat-Shoot (LSS) paradigm, and enhances them with depth un-certainty modeling. GaussianLSS represents spatial dispersion by learning a soft depth mean and computing the variance of the depth distribution, which implicitly captures object extents. We then transform the depth distribution into 3D Gaussians and rasterize them to construct uncertainty-aware BEV features. We evaluate GaussianLSS on the nuScenes dataset, achieving state-of-the-art performance compared to unprojection-based methods. In particular, it provides significant advantages in speed, running 2.5x faster, and in memory efficiency, using 0.3x less memory compared to projection-based methods, while achieving competitive performance with only a 0.4% IoU difference.\n\n鸟瞰视角（Bird’s-eye view, BEV）感知因其能够融合多视角图像为统一表示，并支持诸如预测与路径规划等多种自动驾驶下游任务，近年来受到广泛关注。当前最先进的方法多采用基于投影的策略，将 BEV 感知建模为查询学习任务，从而绕过显式的深度估计。尽管该范式在性能上取得了显著进展，但在实际应用中仍存在不足，主要体现在缺乏不确定性建模以及计算资源消耗过高。\n为此，我们提出 GaussianLSS，一种具备不确定性感知能力的 BEV 感知框架，重新审视了基于反投影的方法，特别是 **Lift-Splat-Shoot（LSS）**范式，并在此基础上引入深度不确定性建模进行增强。GaussianLSS 通过学习软深度均值（soft depth mean）并计算深度分布的方差，来表达空间分布的离散程度，从而隐式捕捉目标的空间尺度。\n随后，GaussianLSS 将深度分布转换为三维高斯表示，并进行光栅化以构建具备不确定性表达的 BEV 特征。\n在 nuScenes 数据集上的评估结果表明，GaussianLSS 在基于反投影方法中达到了当前最优性能。特别地，相比基于投影的方法，GaussianLSS 的推理速度提升 2.5 倍，内存占用降低至原来的 0.3 倍，同时在性能上仅有 0.4% IoU 的轻微差距，展现出卓越的效率与精度平衡。\n"
  },
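A hedged sketch of the step where a per-pixel depth distribution becomes a 3D Gaussian: the soft depth mean places the center along the camera ray and the depth variance stretches the covariance along it. The lateral-spread term for the pixel footprint is an added assumption, not taken from the paper.

```python
import numpy as np

def depth_to_gaussian(ray_o, ray_d, depth_probs, depth_bins, lateral=0.05):
    """Turn a discrete depth distribution on one ray into (center, cov)."""
    ray_d = ray_d / np.linalg.norm(ray_d)
    mu = (depth_probs * depth_bins).sum()                 # soft depth mean
    var = (depth_probs * (depth_bins - mu) ** 2).sum()    # depth variance
    center = ray_o + mu * ray_d
    along = var * np.outer(ray_d, ray_d)                  # spread along ray
    across = (lateral * mu) ** 2 * (np.eye(3) - np.outer(ray_d, ray_d))
    return center, along + across

bins = np.linspace(1, 20, 40)
probs = np.exp(-0.5 * ((bins - 8) / 1.5) ** 2)
probs /= probs.sum()
c, cov = depth_to_gaussian(np.zeros(3), np.array([0.1, 0.0, 1.0]), probs, bins)
print(c.round(2))                        # center ~8 units down the ray
print(np.linalg.eigvalsh(cov).round(3))  # one large axis = depth uncertainty
```

Rasterizing many such Gaussians into the ground plane is what yields the uncertainty-aware BEV features the abstract describes.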
  {
    "path": "abs/2504.01960.md",
    "content": "### Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis\n\nRecent advancements in 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have achieved impressive results in real-time 3D reconstruction and novel view synthesis. However, these methods struggle in large-scale, unconstrained environments where sparse and uneven input coverage, transient occlusions, appearance variability, and inconsistent camera settings lead to degraded quality. We propose GS-Diff, a novel 3DGS framework guided by a multi-view diffusion model to address these limitations. By generating pseudo-observations conditioned on multi-view inputs, our method transforms under-constrained 3D reconstruction problems into well-posed ones, enabling robust optimization even with sparse data. GS-Diff further integrates several enhancements, including appearance embedding, monocular depth priors, dynamic object modeling, anisotropy regularization, and advanced rasterization techniques, to tackle geometric and photometric challenges in real-world settings. Experiments on four benchmarks demonstrate that GS-Diff consistently outperforms state-of-the-art baselines by significant margins.\n\n近年来，三维高斯喷洒（3D Gaussian Splatting, 3DGS）与神经辐射场（Neural Radiance Fields, NeRF）在实时三维重建与新视角合成方面取得了令人瞩目的成果。然而，在大规模、非受控环境下，这些方法仍面临诸多挑战，如输入视角稀疏且分布不均、短暂遮挡、外观变化显著，以及相机设置不一致等问题，导致渲染质量严重下降。\n为应对这些限制，本文提出 GS-Diff，一种结合多视图扩散模型的全新 3DGS 框架。GS-Diff 通过在多视图条件下生成伪观测图像（pseudo-observations），将原本欠约束的三维重建问题转化为良设问题，即便在数据稀疏的情况下也能实现稳健优化。\n此外，GS-Diff 融合了多项增强机制，以应对真实场景中的几何与光度挑战，包括：外观嵌入（appearance embedding）、单目深度先验、动态物体建模、各向异性正则化（anisotropy regularization）以及先进的光栅化技术。\n在四个基准数据集上的实验结果表明，GS-Diff 在各项指标上均显著优于当前最先进方法，展现出一致且卓越的性能提升。\n"
  },
  {
    "path": "abs/2504.02045.md",
    "content": "### WorldPrompter: Traversable Text-to-Scene Generation\n\nScene-level 3D generation is a challenging research topic, with most existing methods generating only partial scenes and offering limited navigational freedom. We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. We leverage panoramic videos as an intermediate representation to model the 360° details of a scene. WorldPrompter incorporates a conditional 360° panoramic video generator, capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a true walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model achieves convincing view consistency across frames, enabling high-quality panoramic Gaussian splat reconstruction and facilitating traversal over an area of the scene. Qualitative and quantitative results also show it outperforms the state-of-the-art 360° video generators and 3D scene generation models.\n\n场景级三维生成是一项具有挑战性的研究课题，现有方法大多只能生成局部场景，并在可导航性方面存在较大限制。本文提出 WorldPrompter，一种基于文本提示生成可行走三维场景的新型生成流程。\n我们引入全景视频作为中间表示，用以建模场景的 360° 全方位细节。WorldPrompter 包含一个条件式 360° 全景视频生成器，能够生成一段包含 128 帧的视频，模拟人在虚拟环境中行走并拍摄的过程。该视频随后由一个快速前馈的三维重建器重建为高斯喷洒表示，使得用户可以在场景中实现真实的行走体验。\n实验表明，我们的全景视频生成模型在帧间视角一致性上表现出色，支持高质量的全景高斯喷洒重建，并能够覆盖场景中的多个区域，实现可遍历的三维空间。同时，定性与定量结果也显示 WorldPrompter 相较于当前最先进的 360° 视频生成与三维场景生成方法具有更优性能。\n"
  },
  {
    "path": "abs/2504.02158.md",
    "content": "### UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting\n\nWe present UAVTwin, a method for creating digital twins from real-world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs). Specifically, our approach focuses on synthesizing foreground components, such as various human instances in motion within complex scene backgrounds, from UAV perspectives. This is achieved by integrating 3D Gaussian Splatting (3DGS) for reconstructing backgrounds along with controllable synthetic human models that display diverse appearances and actions in multiple poses. To the best of our knowledge, UAVTwin is the first approach for UAV-based perception that is capable of generating high-fidelity digital twins based on 3DGS. The proposed work significantly enhances downstream models through data augmentation for real-world environments with multiple dynamic objects and significant appearance variations-both of which typically introduce artifacts in 3DGS-based modeling. To tackle these challenges, we propose a novel appearance modeling strategy and a mask refinement module to enhance the training of 3D Gaussian Splatting. We demonstrate the high quality of neural rendering by achieving a 1.23 dB improvement in PSNR compared to recent methods. Furthermore, we validate the effectiveness of data augmentation by showing a 2.5% to 13.7% improvement in mAP for the human detection task.\n\n我们提出 UAVTwin，一种用于从真实环境中构建数字孪生体，并为无人机（UAV）中嵌入的下游模型提供数据增强的方法。该方法特别关注从无人机视角合成前景要素，例如在复杂场景背景中运动的人体实例。\nUAVTwin 通过结合三维高斯喷洒（3D Gaussian Splatting, 3DGS）对背景进行重建，并引入可控的合成人体模型，以多样的外观、动作与姿态呈现，完成对真实环境的仿真。我们的方法是目前首个基于 3DGS 构建高保真数字孪生体并服务于无人机感知任务的方案。\n本研究显著提升了下游模型在包含多个动态物体与显著外观变化的现实场景中的表现，而这类因素通常会对基于 3DGS 的建模造成伪影。为解决这些挑战，我们提出了一种新的外观建模策略以及一个掩码细化模块，用于提升 3DGS 训练过程中的建模质量。\n在神经渲染效果方面，我们在 PSNR 上相比最新方法提升 1.23 dB；在数据增强效能方面，我们在人类检测任务中实现了 2.5% 至 13.7% 的 mAP 提升，验证了该方法在下游任务中的实际价值。\n"
  },
  {
    "path": "abs/2504.02278.md",
    "content": "### Digital-twin imaging based on descattering Gaussian splatting\n\nThree-dimensional imaging through scattering media is important in medical science and astronomy. We propose a digital-twin imaging method based on Gaussian splatting to observe an object behind a scattering medium. A digital twin model built through data assimilation, emulates the behavior of objects and environmental changes in a virtual space. By constructing a digital twin using point clouds composed of Gaussians and simulating the scattering process through the convolution of a point spread function, three-dimensional objects behind a scattering medium can be reproduced as a digital twin. In this study, a high-contrast digital twin reproducing a three-dimensional object was successfully constructed from degraded images, assuming that data were acquired from wavefronts disturbed by a scattering medium. This technique reproduces objects by integrating data processing with image measurements.\n\n通过散射介质进行三维成像在医学和天文学领域具有重要意义。我们提出了一种基于Gaussian Splatting 的数字孪生成像方法，用于观察隐藏在散射介质后的目标物体。\n该方法通过数据同化构建数字孪生模型，在虚拟空间中模拟物体行为及环境变化。具体而言，我们利用由高斯点云构成的三维场景构建数字孪生，并通过点扩散函数（PSF）卷积模拟散射过程，从而再现被散射介质遮挡的三维物体。\n在本研究中，我们在假设波前受散射介质扰动的前提下，从退化图像中成功构建出了高对比度的三维数字孪生体。该技术通过数据处理与图像测量的融合，实现了目标物体的重建。\n"
  },
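A hedged sketch of the forward model described above: splat the point cloud to an image and convolve it with the medium's point spread function, so a digital twin can be fit by matching this simulation against the degraded measurements. The Gaussian PSF and all sizes are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def splat(points_xy, weights, size=128):
    """Accumulate point weights into a size x size image (unit square)."""
    img = np.zeros((size, size))
    ij = np.clip((points_xy * size).astype(int), 0, size - 1)
    np.add.at(img, (ij[:, 1], ij[:, 0]), weights)
    return img

rng = np.random.default_rng(0)
pts = rng.uniform(0.2, 0.8, (400, 2))          # stand-in Gaussian centers
clean = splat(pts, np.ones(400))               # twin's ideal rendering
observed = gaussian_filter(clean, sigma=4.0)   # scattering as PSF blur
print(clean.max(), observed.max())             # contrast lost to scattering
```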
  {
    "path": "abs/2504.02316.md",
    "content": "### ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation\n\nRecent advances in zero-shot text-to-3D generation have revolutionized 3D content creation by enabling direct synthesis from textual descriptions. While state-of-the-art methods leverage 3D Gaussian Splatting with score distillation to enhance multi-view rendering through pre-trained text-to-image (T2I) models, they suffer from inherent view biases in T2I priors. These biases lead to inconsistent 3D generation, particularly manifesting as the multi-face Janus problem, where objects exhibit conflicting features across views. To address this fundamental challenge, we propose ConsDreamer, a novel framework that mitigates view bias by refining both the conditional and unconditional terms in the score distillation process: (1) a View Disentanglement Module (VDM) that eliminates viewpoint biases in conditional prompts by decoupling irrelevant view components and injecting precise camera parameters; and (2) a similarity-based partial order loss that enforces geometric consistency in the unconditional term by aligning cosine similarities with azimuth relationships. Extensive experiments demonstrate that ConsDreamer effectively mitigates the multi-face Janus problem in text-to-3D generation, outperforming existing methods in both visual quality and consistency.\n\n零样本文本生成三维（zero-shot text-to-3D generation）的最新进展正在革新三维内容创作，使得可以直接从文本描述中合成三维内容。尽管当前最先进的方法利用三维高斯喷洒（3D Gaussian Splatting）结合得分蒸馏（score distillation），借助预训练的文本到图像（T2I）模型提升多视角渲染质量，但这类方法仍受到 T2I 先验中固有视角偏差的限制。这些偏差导致三维生成不一致，特别表现为“多面 Janus 问题”，即同一物体在不同视角下呈现出相互矛盾的特征。\n为了解决这一核心挑战，我们提出 ConsDreamer，一个通过同时优化得分蒸馏过程中的条件项与无条件项来缓解视角偏差的全新框架。具体而言，我们引入视角解耦模块（View Disentanglement Module, VDM），通过剥离条件提示中与视角无关的成分并引入精确的相机参数，降低条件项中的视角偏倚。同时，在无条件项中，我们设计了基于相似度的偏序损失函数，通过将余弦相似度与方位角关系对齐，以增强几何一致性。\n大量实验结果表明，ConsDreamer 能有效缓解文本生成三维任务中的多面 Janus 问题，在视觉质量与一致性方面均优于现有方法。\n"
  },
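A hedged sketch of the similarity-based partial-order idea: for an anchor view, a rendering at a smaller azimuth distance should be more cosine-similar to the anchor than one at a larger distance, with violations paying a margin penalty. The feature source and margin value are assumptions, not ConsDreamer's exact formulation.

```python
import torch
import torch.nn.functional as F

def azimuth_order_loss(feats, azimuths, margin=0.05):
    """feats: (V, D) per-view features; azimuths: (V,) in radians."""
    loss, count = feats.new_zeros(()), 0
    for a in range(len(feats)):
        # circular azimuth distance from every view to the anchor a
        diff = torch.remainder(azimuths - azimuths[a] + torch.pi,
                               2 * torch.pi) - torch.pi
        dist = diff.abs()
        sim = F.cosine_similarity(feats[a:a + 1], feats, dim=1)
        for j in range(len(feats)):
            for k in range(len(feats)):
                if j != a and k != a and dist[j] < dist[k]:
                    # closer view j must beat farther view k by the margin
                    loss = loss + F.relu(sim[k] - sim[j] + margin)
                    count += 1
    return loss / max(count, 1)

feats = torch.randn(6, 32, requires_grad=True)
az = torch.linspace(0, 2 * torch.pi, 6)
azimuth_order_loss(feats, az).backward()
print(feats.grad.shape)
```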
  {
    "path": "abs/2504.02437.md",
    "content": "### MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM\n\nWe present MonoGS++, a novel fast and accurate Simultaneous Localization and Mapping (SLAM) method that leverages 3D Gaussian representations and operates solely on RGB inputs. While previous 3D Gaussian Splatting (GS)-based methods largely depended on depth sensors, our approach reduces the hardware dependency and only requires RGB input, leveraging online visual odometry (VO) to generate sparse point clouds in real-time. To reduce redundancy and enhance the quality of 3D scene reconstruction, we implemented a series of methodological enhancements in 3D Gaussian mapping. Firstly, we introduced dynamic 3D Gaussian insertion to avoid adding redundant Gaussians in previously well-reconstructed areas. Secondly, we introduced clarity-enhancing Gaussian densification module and planar regularization to handle texture-less areas and flat surfaces better. We achieved precise camera tracking results both on the synthetic Replica and real-world TUM-RGBD datasets, comparable to those of the state-of-the-art. Additionally, our method realized a significant 5.57x improvement in frames per second (fps) over the previous state-of-the-art, MonoGS.\n\n我们提出了 MonoGS++，一种新颖的、快速且高精度的同步定位与建图（Simultaneous Localization and Mapping, SLAM）方法，该方法基于三维高斯表示，并仅使用 RGB 图像作为输入。与以往依赖深度传感器的三维高斯喷洒（3D Gaussian Splatting, GS）方法不同，MonoGS++ 显著降低了对硬件的依赖，仅通过在线视觉里程计（Visual Odometry, VO）实现实时稀疏点云生成。\n为了降低冗余并提升三维场景重建质量，我们在高斯建图模块中引入了一系列方法改进。首先，提出动态三维高斯插入策略，可避免在已充分重建区域重复添加冗余高斯点。其次，我们引入了增强清晰度的高斯致密化模块与平面正则化机制，更好地应对低纹理区域与大平面区域的建模挑战。\n在 Replica 合成数据集与 TUM-RGBD 真实世界数据集上的实验表明，MonoGS++ 在相机追踪精度方面与当前最先进方法相当。同时，相较于前一代方法 MonoGS，MonoGS++ 在运行速度上提升了 5.57 倍，展现出卓越的效率与精度兼备的性能优势。\n"
  },
  {
    "path": "abs/2504.02764.md",
    "content": "### Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model\n\nIn this paper, we propose Scene Splatter, a momentum-based paradigm for video diffusion to generate generic scenes from single image. Existing methods, which employ video generation models to synthesize novel views, suffer from limited video length and scene inconsistency, leading to artifacts and distortions during further reconstruction. To address this issue, we construct noisy samples from original features as momentum to enhance video details and maintain scene consistency. However, for latent features with the perception field that spans both known and unknown regions, such latent-level momentum restricts the generative ability of video diffusion in unknown regions. Therefore, we further introduce the aforementioned consistent video as a pixel-level momentum to a directly generated video without momentum for better recovery of unseen regions. Our cascaded momentum enables video diffusion models to generate both high-fidelity and consistent novel views. We further finetune the global Gaussian representations with enhanced frames and render new frames for momentum update in the next step. In this manner, we can iteratively recover a 3D scene, avoiding the limitation of video length. Extensive experiments demonstrate the generalization capability and superior performance of our method in high-fidelity and consistent scene generation.\n\n本文提出了 Scene Splatter，一种基于动量的视频扩散生成新范式，用于从单张图像生成通用场景视频。现有方法多采用视频生成模型合成新视角，但普遍存在视频时长受限与场景不一致的问题，进而在后续三维重建中引发伪影与失真。\n为解决这一问题，我们从原始特征中构建带噪样本，作为动量信号，用于增强视频细节并保持场景一致性。然而，当潜在特征的感知范围覆盖已知与未知区域时，此类潜在层级的动量机制会在未知区域限制视频扩散模型的生成能力。\n因此，我们进一步引入上述一致性视频作为像素级动量，辅以一条不含动量的直接生成路径，以更好地还原不可见区域。通过这种方式，我们的级联动量机制使得视频扩散模型能够生成同时具备高保真度与场景一致性的新视角视频。\n在此基础上，我们还对全局三维高斯表示进行微调，并利用增强帧渲染出新的帧图像，用于下一轮的动量更新。通过这种迭代过程，我们可以逐步恢复完整的三维场景，避免传统方法受限于视频长度的瓶颈。\n大量实验证明，我们的方法在高保真、场景一致性和泛化能力方面均显著优于现有方法。\n"
  },
  {
    "path": "abs/2504.03059.md",
    "content": "### Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization\n\n3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in 3D reconstruction, achieving high-quality results with real-time radiance field rendering. However, a key challenge is the substantial storage cost: reconstructing a single scene typically requires millions of Gaussian splats, each represented by 59 floating-point parameters, resulting in approximately 1~GB of memory. To address this challenge, we propose a compression method by building separate attribute codebooks and storing only discrete code indices. Specifically, we employ noise-substituted vector quantization technique to jointly train the codebooks and model features, ensuring consistency between gradient descent optimization and parameter discretization. Our method reduces the memory consumption efficiently (around 45×) while maintaining competitive reconstruction quality on standard 3D benchmark scenes. Experiments on different codebook sizes show the trade-off between compression ratio and image quality. Furthermore, the trained compressed model remains fully compatible with popular 3DGS viewers and enables faster rendering speed, making it well-suited for practical applications.\n\n三维高斯喷洒（3D Gaussian Splatting, 3DGS）在三维重建中展现出卓越的效果，能够实现高质量的结果与实时辐射场渲染。然而，其面临的主要挑战之一是极高的存储成本：重建单个场景通常需要数百万个高斯点，每个点由 59 个浮点参数表示，总体内存占用约为 1 GB。\n为解决这一问题，本文提出了一种压缩方法，通过构建独立的属性码本并仅存储离散的编码索引，实现有效压缩。具体而言，我们采用**带噪向量量化（noise-substituted vector quantization）**策略，在训练过程中联合优化码本与模型特征，从而确保梯度下降优化与参数离散化之间的一致性。\n该方法在保持重建质量的同时，将内存消耗有效压缩至原来的约 1/45，在多个标准三维重建基准场景上验证了其优异性能。不同码本大小的实验也揭示了压缩率与图像质量之间的权衡关系。此外，压缩后的模型可完全兼容现有主流 3DGS 渲染器，并实现更快的渲染速度，适用于多种实际应用场景。\n"
  },
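A hedged sketch of the codebook pipeline: per-Gaussian attribute vectors are quantized to their nearest codes, and only the indices plus the small codebook are stored. A straight-through estimator with a codebook term stands in here for the paper's noise-substituted quantization, which serves the same purpose of keeping gradient descent consistent with the discretization.

```python
import torch

class AttributeCodebook(torch.nn.Module):
    def __init__(self, n_codes=256, dim=48):          # e.g. SH coefficients
        super().__init__()
        self.codes = torch.nn.Parameter(torch.randn(n_codes, dim) * 0.1)

    def forward(self, x):                             # x: (N, dim) attributes
        idx = torch.cdist(x, self.codes).argmin(dim=1)
        q = self.codes[idx]
        st = x + (q - x).detach()                     # straight-through grad
        code_loss = ((q - x.detach()) ** 2).mean()    # pulls codes to features
        return st, idx, code_loss

cb = AttributeCodebook()
attrs = torch.randn(8192, 48, requires_grad=True)     # per-Gaussian features
st, idx, code_loss = cb(attrs)
loss = st.square().mean() + code_loss                 # toy render stand-in
loss.backward()                                       # features and codes train
# Stored model: one small index per Gaussian plus the codebook,
# instead of 48 floats per Gaussian.
print(idx.shape, cb.codes.shape)
```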
  {
    "path": "abs/2504.03536.md",
    "content": "### HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration\n\nSingle-image human reconstruction is vital for digital human modeling applications but remains an extremely challenging task. Current approaches rely on generative models to synthesize multi-view images for subsequent 3D reconstruction and animation. However, directly generating multiple views from a single human image suffers from geometric inconsistencies, resulting in issues like fragmented or blurred limbs in the reconstructed models. To tackle these limitations, we introduce HumanDreamer-X, a novel framework that integrates multi-view human generation and reconstruction into a unified pipeline, which significantly enhances the geometric consistency and visual fidelity of the reconstructed 3D models. In this framework, 3D Gaussian Splatting serves as an explicit 3D representation to provide initial geometry and appearance priority. Building upon this foundation, HumanFixer is trained to restore 3DGS renderings, which guarantee photorealistic results. Furthermore, we delve into the inherent challenges associated with attention mechanisms in multi-view human generation, and propose an attention modulation strategy that effectively enhances geometric details identity consistency across multi-view. Experimental results demonstrate that our approach markedly improves generation and reconstruction PSNR quality metrics by 16.45% and 12.65%, respectively, achieving a PSNR of up to 25.62 dB, while also showing generalization capabilities on in-the-wild data and applicability to various human reconstruction backbone models.\n\n单张图像的人体重建对于数字人建模应用至关重要，但仍是一项极具挑战性的任务。目前的主流方法通常依赖生成模型从单张人像合成多视角图像，继而进行三维重建与动画生成。然而，这种从单图直接生成多视角的方式常常存在几何不一致性，导致重建模型出现四肢破碎或模糊等问题。\n为克服上述局限，我们提出 HumanDreamer-X，一个将多视角人体生成与三维重建整合为统一流程的新型框架，显著提升了重建模型在几何一致性与视觉保真度方面的表现。在该框架中，**三维高斯喷洒（3D Gaussian Splatting）**作为显式三维表示，提供初始几何结构与外观先验。在此基础上，我们引入 HumanFixer 模块，用于修复 3DGS 渲染结果，确保生成图像具有逼真的视觉效果。\n此外，我们深入分析了多视角人体生成任务中注意力机制的固有挑战，并提出了一种注意力调控策略（attention modulation strategy），以增强多视角间几何细节与身份特征的一致性。\n实验结果表明，HumanDreamer-X 在生成与重建图像质量上分别提升了 16.45% 和 12.65% 的 PSNR，重建最高可达 25.62 dB，同时展现出对真实场景数据的良好泛化能力，并适配多种人体重建主干网络，具有广泛的实用潜力。\n"
  },
  {
    "path": "abs/2504.03886.md",
    "content": "### WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments\n\nWe present WildGS-SLAM, a robust and efficient monocular RGB SLAM system designed to handle dynamic environments by leveraging uncertainty-aware geometric mapping. Unlike traditional SLAM systems, which assume static scenes, our approach integrates depth and uncertainty information to enhance tracking, mapping, and rendering performance in the presence of moving objects. We introduce an uncertainty map, predicted by a shallow multi-layer perceptron and DINOv2 features, to guide dynamic object removal during both tracking and mapping. This uncertainty map enhances dense bundle adjustment and Gaussian map optimization, improving reconstruction accuracy. Our system is evaluated on multiple datasets and demonstrates artifact-free view synthesis. Results showcase WildGS-SLAM's superior performance in dynamic environments compared to state-of-the-art methods.\n\n我们提出了 WildGS-SLAM，这是一种鲁棒且高效的单目RGB SLAM系统，通过引入具备不确定性感知的几何建图能力，专为应对动态环境而设计。\n与假设场景静态的传统SLAM方法不同，我们的方法融合了深度信息与不确定性估计，以提升在存在运动物体情况下的跟踪、建图与渲染性能。我们引入了一种由浅层多层感知机（MLP）和 DINOv2 特征预测的不确定性图，用于在跟踪与建图过程中引导动态物体的剔除。\n该不确定性图进一步增强了稠密束调（dense bundle adjustment）和高斯地图优化，从而提升了重建精度。我们在多个数据集上对该系统进行了评估，结果表明 WildGS-SLAM 能实现无伪影的新视角合成，在动态环境下相较于当前最先进的方法表现出更优越的性能。\n"
  },
  {
    "path": "abs/2504.04153.md",
    "content": "### Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization\n\nThe advancement of 4D (i.e., sequential 3D) generation opens up new possibilities for lifelike experiences in various applications, where users can explore dynamic objects or characters from any viewpoint. Meanwhile, video generative models are receiving particular attention given their ability to produce realistic and imaginative frames. These models are also observed to exhibit strong 3D consistency, indicating the potential to act as world simulators. In this work, we present Video4DGen, a novel framework that excels in generating 4D representations from single or multiple generated videos as well as generating 4D-guided videos. This framework is pivotal for creating high-fidelity virtual contents that maintain both spatial and temporal coherence. The 4D outputs generated by Video4DGen are represented using our proposed Dynamic Gaussian Surfels (DGS), which optimizes time-varying warping functions to transform Gaussian surfels (surface elements) from a static state to a dynamically warped state. We design warped-state geometric regularization and refinements on Gaussian surfels, to preserve the structural integrity and fine-grained appearance details. To perform 4D generation from multiple videos and capture representation across spatial, temporal, and pose dimensions, we design multi-video alignment, root pose optimization, and pose-guided frame sampling strategies. The leveraging of continuous warping fields also enables a precise depiction of pose, motion, and deformation over per-video frames. Further, to improve the overall fidelity from the observation of all camera poses, Video4DGen performs novel-view video generation guided by the 4D content, with the proposed confidence-filtered DGS to enhance the quality of generated sequences. With the ability of 4D and video generation, Video4DGen offers a powerful tool for applications in virtual reality, animation, and beyond.\n\n随着 4D（即时序三维）生成技术的发展，人们在各类应用中得以实现更加真实的沉浸式体验，用户可以从任意视角探索动态物体或角色。同时，视频生成模型因其生成真实且富有想象力的画面能力而受到高度关注，这类模型也被观察到展现出良好的三维一致性，具备充当“世界模拟器”的潜力。\n在本研究中，我们提出了 Video4DGen，这是一个全新框架，能够从单个或多个生成视频中构建 4D 表示，也可以用于生成受 4D 内容引导的视频。该框架对于创建在时空维度上均保持高度一致性的高保真虚拟内容具有关键意义。\nVideo4DGen 生成的 4D 输出采用我们提出的 动态高斯面元（Dynamic Gaussian Surfels, DGS） 表示形式。通过优化时变形变函数，DGS 将静态状态下的高斯面元转换为动态变形状态。我们设计了针对变形状态的几何正则化与外观细节优化机制，以保持结构完整性和高质量纹理表现。\n为实现多视频驱动的 4D 生成，并捕捉跨空间、时间与姿态维度的一致表示，我们进一步提出了多视频对齐机制、根姿态优化策略以及基于姿态的帧采样方法。通过连续形变场的引入，系统可对每个视频中的姿态、运动与形变实现精细表达。\n此外，为了提升从各视角观察下的整体真实感，Video4DGen 还支持基于 4D 内容的新视角视频生成，并引入 置信度过滤的 DGS（confidence-filtered DGS） 机制来提升合成序列的质量。\n凭借其在 4D 与视频生成方面的能力，Video4DGen 为虚拟现实、动画等领域提供了一个功能强大的创作工具。\n"
  },
  {
    "path": "abs/2504.04190.md",
    "content": "### Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning\n\nGaussian Splatting (GS) has recently marked a significant advancement in 3D reconstruction, delivering both rapid rendering and high-quality results. However, existing 3DGS methods pose challenges in understanding underlying 3D semantics, which hinders model controllability and interpretability. To address it, we propose an interpretable single-view 3DGS framework, termed 3DisGS, to discover both coarse- and fine-grained 3D semantics via hierarchical disentangled representation learning (DRL). Specifically, the model employs a dual-branch architecture, consisting of a point cloud initialization branch and a triplane-Gaussian generation branch, to achieve coarse-grained disentanglement by separating 3D geometry and visual appearance features. Subsequently, fine-grained semantic representations within each modality are further discovered through DRL-based encoder-adapters. To our knowledge, this is the first work to achieve unsupervised interpretable 3DGS. Evaluations indicate that our model achieves 3D disentanglement while preserving high-quality and rapid reconstruction.\n\n高斯喷洒（Gaussian Splatting, GS）近期在三维重建领域取得了重大突破，兼具快速渲染与高质量表现。然而，现有的三维高斯喷洒（3DGS）方法在三维语义理解方面仍存在困难，限制了模型的可控性与可解释性。\n为解决这一问题，我们提出了一种可解释的单视图 3DGS 框架，命名为 3DisGS，通过分层解耦表示学习（hierarchical disentangled representation learning, DRL），实现对粗粒度与细粒度三维语义的自动发现。\n具体而言，该方法采用双分支架构，包括点云初始化分支与三平面高斯生成分支，以实现粗粒度解耦，将三维几何信息与视觉外观特征进行有效分离。在此基础上，我们进一步通过 DRL 驱动的编码器-适配器模块，挖掘每一模态下的细粒度语义表示。\n据我们所知，这是首个实现无监督可解释三维高斯喷洒的方法。实验结果表明，3DisGS 在保持高质量与快速重建性能的同时，有效实现了三维语义的解耦与可解释建模。\n"
  },
  {
    "path": "abs/2504.04294.md",
    "content": "### 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS\n\n3D Gaussian Splatting (3DGS) has revolutionized neural rendering with its efficiency and quality, but like many novel view synthesis methods, it heavily depends on accurate camera poses from Structure-from-Motion (SfM) systems. Although recent SfM pipelines have made impressive progress, questions remain about how to further improve both their robust performance in challenging conditions (e.g., textureless scenes) and the precision of camera parameter estimation simultaneously. We present 3R-GS, a 3D Gaussian Splatting framework that bridges this gap by jointly optimizing 3D Gaussians and camera parameters from large reconstruction priors MASt3R-SfM. We note that naively performing joint 3D Gaussian and camera optimization faces two challenges: the sensitivity to the quality of SfM initialization, and its limited capacity for global optimization, leading to suboptimal reconstruction results. Our 3R-GS, overcomes these issues by incorporating optimized practices, enabling robust scene reconstruction even with imperfect camera registration. Extensive experiments demonstrate that 3R-GS delivers high-quality novel view synthesis and precise camera pose estimation while remaining computationally efficient.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）以其高效率与高质量彻底革新了神经渲染领域，但与许多新颖的视角合成方法一样，其高度依赖来自结构光恢复（Structure-from-Motion, SfM）系统的精确相机位姿。尽管近期SfM流程已取得显著进展，但在复杂场景下的鲁棒性提升（如无纹理区域）以及相机参数估计精度提升方面，仍存在挑战。\n为此，我们提出了 3R-GS，这是一种基于3DGS的框架，通过引入大型重建先验 MASt3R-SfM，实现对三维高斯与相机参数的联合优化，以填补上述空白。\n我们指出，直接进行3D高斯与相机参数的联合优化存在两个主要难点：其一是对SfM初始化质量极其敏感；其二是缺乏全局优化能力，容易导致次优重建结果。\n3R-GS 通过引入一系列优化实践，有效克服了这些问题，使得即便在相机配准存在误差的条件下，也能实现鲁棒的场景重建。大量实验证明，3R-GS 在保持计算效率的同时，能够实现高质量的新视角合成与高精度的相机位姿估计。\n"
  },
  {
    "path": "abs/2504.04844.md",
    "content": "### Embracing Dynamics: Dynamics-aware 4D Gaussian Splatting SLAM\n\nSimultaneous localization and mapping (SLAM) technology now has photorealistic mapping capabilities thanks to the real-time high-fidelity rendering capability of 3D Gaussian splatting (3DGS). However, due to the static representation of scenes, current 3DGS-based SLAM encounters issues with pose drift and failure to reconstruct accurate maps in dynamic environments. To address this problem, we present D4DGS-SLAM, the first SLAM method based on 4DGS map representation for dynamic environments. By incorporating the temporal dimension into scene representation, D4DGS-SLAM enables high-quality reconstruction of dynamic scenes. Utilizing the dynamics-aware InfoModule, we can obtain the dynamics, visibility, and reliability of scene points, and filter stable static points for tracking accordingly. When optimizing Gaussian points, we apply different isotropic regularization terms to Gaussians with varying dynamic characteristics. Experimental results on real-world dynamic scene datasets demonstrate that our method outperforms state-of-the-art approaches in both camera pose tracking and map quality.\n\n借助于3D Gaussian Splatting（3DGS）在实时高保真渲染方面的能力，同时定位与建图（SLAM）技术如今已具备逼真的地图构建能力。然而，由于现有3DGS采用的是静态场景表示，在面对动态环境时，基于3DGS的SLAM仍存在位姿漂移和地图重建精度不足的问题。\n为了解决这一问题，我们提出了 D4DGS-SLAM，这是首个基于四维高斯表示（4DGS）的动态场景SLAM方法。通过在场景表示中引入时间维度，D4DGS-SLAM 能够实现动态场景的高质量重建。\n借助于具备动态感知能力的 InfoModule，系统可以获取场景点的动态性、可见性和可靠性，并据此筛选出稳定的静态点用于跟踪。在优化高斯点时，我们针对具有不同动态特性的高斯点，施加了不同的各向同性正则化项。\n在真实世界动态场景数据集上的实验结果表明，D4DGS-SLAM 在相机位姿跟踪精度和地图质量方面，均显著优于现有最先进的方法。\n"
  },
  {
    "path": "abs/2504.04857.md",
    "content": "### 3D Gaussian Particle Approximation of VDB Datasets: A Study for Scientific Visualization\n\nThe complexity and scale of Volumetric and Simulation datasets for Scientific Visualization(SciVis) continue to grow. And the approaches and advantages of memory-efficient data formats and storage techniques for such datasets vary. OpenVDB library and its VDB data format excels in memory efficiency through its hierarchical and dynamic tree structure, with active and inactive sub-trees for data storage. It is heavily used in current production renderers for both animation and rendering stages in VFX pipelines and photorealistic rendering of volumes and fluids. However, it still remains to be fully leveraged in SciVis where domains dealing with sparse scalar fields like porous media, time varying volumes such as tornado and weather simulation or high resolution simulation of Computational Fluid Dynamics present ample number of large challenging data this http URL of this paper is not only to explore the use of OpenVDB in SciVis but also to explore a level of detail(LOD) technique using 3D Gaussian particles approximating voxel regions. For rendering, we utilize NVIDIA OptiX library for ray marching through the Gaussians particles. Data modeling using 3D Gaussians has been very popular lately due to success in stereoscopic image to 3D scene conversion using Gaussian Splatting and Gaussian approximation and mixture models aren't entirely new in SciVis as well. Our work explores the integration with rendering software libraries like OpenVDB and OptiX to take advantage of their built-in memory compaction and hardware acceleration features, while also leveraging the performance capabilities of modern GPUs. Thus, we present a SciVis rendering approach that uses 3D Gaussians at varying LOD in a lossy scheme derived from VDB datasets, rather than focusing on photorealistic volume rendering.\n\n随着科学可视化（Scientific Visualization，SciVis）中体数据和模拟数据的复杂性与规模不断增长，对于此类数据集的内存高效数据格式与存储技术也变得愈发重要。不同方法在效率与优势上各不相同。OpenVDB库及其VDB数据格式因其分层动态树结构，具备激活与非激活子树的数据存储方式，在内存效率方面表现出色。它已被广泛应用于VFX管线中的动画和渲染阶段，以及体积和流体的写实渲染。\n然而，OpenVDB在科学可视化领域尚未被充分利用，尽管该领域常处理如多孔介质中的稀疏标量场、风暴或气象模拟等时变体数据，以及高分辨率的计算流体力学（CFD）模拟，这些都包含大量复杂且具有挑战性的数据集。\n因此，本文的目标不仅在于探索OpenVDB在SciVis中的应用，还希望研究一种基于三维高斯粒子的层次细节（Level of Detail, LOD）技术，用于近似表示体素区域。在渲染方面，我们采用NVIDIA OptiX库进行高斯粒子的光线步进（ray marching）。\n近年来，使用3D高斯建模在从立体图像转换为三维场景方面因Gaussian Splatting的成功而广受关注。而高斯近似与高斯混合模型（GMM）在科学可视化中也并非新概念。我们的工作尝试将OpenVDB与OptiX等渲染软件库集成，以利用其内建的内存压缩机制和硬件加速特性，同时发挥现代GPU的计算性能优势。\n综上所述，我们提出了一种基于VDB数据集的有损方案，在不同LOD下使用三维高斯粒子的SciVis渲染方法，重点不在于实现照片级真实感的体渲染，而是提升科学数据在可视化过程中的效率与表现力。\n"
  },
  {
    "path": "abs/2504.05152.md",
    "content": "### PanoDreamer: Consistent Text to 360-Degree Scene Generation\n\nAutomatically generating a complete 3D scene from a text description, a reference image, or both has significant applications in fields like virtual reality and gaming. However, current methods often generate low-quality textures and inconsistent 3D structures. This is especially true when extrapolating significantly beyond the field of view of the reference image. To address these challenges, we propose PanoDreamer, a novel framework for consistent, 3D scene generation with flexible text and image control. Our approach employs a large language model and a warp-refine pipeline, first generating an initial set of images and then compositing them into a 360-degree panorama. This panorama is then lifted into 3D to form an initial point cloud. We then use several approaches to generate additional images, from different viewpoints, that are consistent with the initial point cloud and expand/refine the initial point cloud. Given the resulting set of images, we utilize 3D Gaussian Splatting to create the final 3D scene, which can then be rendered from different viewpoints. Experiments demonstrate the effectiveness of PanoDreamer in generating high-quality, geometrically consistent 3D scenes.\n\n从文本描述、参考图像，或两者结合中自动生成完整的三维场景，在虚拟现实和游戏等领域具有广泛的应用前景。然而，当前的方法往往存在纹理质量差、三维结构不一致等问题，尤其在对参考图像视野范围外区域进行外推时更为明显。\n为应对这些挑战，我们提出了 PanoDreamer——一个支持文本与图像灵活控制的全新框架，旨在实现一致性的三维场景生成。该方法采用大语言模型结合“变形-细化”流程，首先生成初始图像集，并将其合成为360度全景图。该全景图随后被“升维”成三维初始点云。\n接着，系统使用多种策略，从不同视角生成与初始点云一致的附加图像，以进一步扩展并细化点云。最终，我们利用3D Gaussian Splatting将生成图像转换为最终的三维场景，可从多个视角进行渲染。\n实验结果表明，PanoDreamer 能有效生成高质量且几何一致的三维场景，展现出强大的跨模态场景生成能力。\n"
  },
  {
    "path": "abs/2504.05296.md",
    "content": "### Let it Snow! Animating Static Gaussian Scenes With Dynamic Weather Effects\n\n3D Gaussian Splatting has recently enabled fast and photorealistic reconstruction of static 3D scenes. However, introducing dynamic elements that interact naturally with such static scenes remains challenging. Accordingly, we present a novel hybrid framework that combines Gaussian-particle representations for incorporating physically-based global weather effects into static 3D Gaussian Splatting scenes, correctly handling the interactions of dynamic elements with the static scene. We follow a three-stage process: we first map static 3D Gaussians to a particle-based representation. We then introduce dynamic particles and simulate their motion using the Material Point Method (MPM). Finally, we map the simulated particles back to the Gaussian domain while introducing appearance parameters tailored for specific effects. To correctly handle the interactions of dynamic elements with the static scene, we introduce specialized collision handling techniques. Our approach supports a variety of weather effects, including snowfall, rainfall, fog, and sandstorms, and can also support falling objects, all with physically plausible motion and appearance. Experiments demonstrate that our method significantly outperforms existing approaches in both visual quality and physical realism.\n\n3D Gaussian Splatting 近年来实现了对静态三维场景的快速且逼真的重建。然而，在此类静态场景中引入能够自然交互的动态元素仍是一项挑战。\n对此，我们提出了一个新颖的混合框架，将高斯粒子表示与物理基础的全局天气效果相结合，引入到静态 3D Gaussian Splatting 场景中，并能正确处理动态元素与静态场景的交互。\n我们的方法遵循三个阶段的流程：首先将静态的三维高斯映射为基于粒子的表示；随后引入动态粒子，并使用物质点法（Material Point Method, MPM）对其运动进行模拟；最后将模拟后的粒子重新映射回高斯域，同时引入针对特定效果设计的外观参数。\n为了正确处理动态元素与静态场景之间的交互，我们引入了专门的碰撞处理技术。我们的方法支持多种天气效果，包括降雪、降雨、雾和沙尘暴，并可支持下落物体，所有效果均具备物理上可信的运动与外观表现。实验表明，该方法在视觉质量与物理真实感方面均显著优于现有方法。\n"
  },
  {
    "path": "abs/2504.05458.md",
    "content": "### Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images\n\nTo achieve realistic immersion in landscape images, fluids such as water and clouds need to move within the image while revealing new scenes from various camera perspectives. Recently, a field called dynamic scene video has emerged, which combines single image animation with 3D photography. These methods use pseudo 3D space, implicitly represented with Layered Depth Images (LDIs). LDIs separate a single image into depth-based layers, which enables elements like water and clouds to move within the image while revealing new scenes from different camera perspectives. However, as landscapes typically consist of continuous elements, including fluids, the representation of a 3D space separates a landscape image into discrete layers, and it can lead to diminished depth perception and potential distortions depending on camera movement. Furthermore, due to its implicit modeling of 3D space, the output may be limited to videos in the 2D domain, potentially reducing their versatility. In this paper, we propose representing a complete 3D space for dynamic scene video by modeling explicit representations, specifically 4D Gaussians, from a single image. The framework is focused on optimizing 3D Gaussians by generating multi-view images from a single image and creating 3D motion to optimize 4D Gaussians. The most important part of proposed framework is consistent 3D motion estimation, which estimates common motion among multi-view images to bring the motion in 3D space closer to actual motions. As far as we know, this is the first attempt that considers animation while representing a complete 3D space from a single landscape image. Our model demonstrates the ability to provide realistic immersion in various landscape images through diverse experiments and metrics.\n\n为了在风景图像中实现真实的沉浸感，诸如水体与云层等流体元素需要在图像内产生动态变化，同时随着相机视角的移动呈现出新的景观。近期，一种结合单张图像动画与三维摄影的新研究方向——**动态场景视频（dynamic scene video）**逐渐兴起。此类方法通常采用伪三维空间建模，借助 分层深度图像（Layered Depth Images, LDIs） 实现图像的三维表现。LDIs 将单张图像划分为基于深度的多个层，使得水面、云层等元素能够在图像中产生运动，同时随着视角变化揭示新的场景内容。\n然而，由于风景图像通常由连续元素构成，包括大量的流体结构，使用 LDIs 将图像划分为离散层的方式可能导致深度感减弱，并在相机运动时产生几何扭曲。此外，由于该类方法采用的是对三维空间的隐式建模，输出内容常局限于二维视频，从而限制了其应用的多样性。\n为此，本文提出了一种从单张图像出发，通过构建显式表示——4D 高斯（4D Gaussians），来对完整三维空间进行建模的方法，用于生成动态场景视频。该框架的核心是通过从单张图像生成多视角图像并构建三维运动，进而优化三维高斯表示，最终获得 4D 高斯场景。\n框架的关键在于一致性的三维运动估计，该模块用于从多视角图像中估计出共通的运动表达，使得三维空间中的运动更加贴近真实世界的动态行为。据我们所知，这是首次尝试在动画建模的同时，从单张风景图像出发构建完整的三维空间表示。\n通过大量实验与评估指标，我们的方法展示出在不同类型风景图像中提供真实沉浸感的能力。\n"
  },
  {
    "path": "abs/2504.05517.md",
    "content": "### L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery\n\nTraditional 3D content representations include dense point clouds that consume large amounts of data and hence network bandwidth, while newer representations such as neural radiance fields suffer from poor frame rates due to their non-standard volumetric rendering pipeline. 3D Gaussian splats (3DGS) can be seen as a generalization of point clouds that meet the best of both worlds, with high visual quality and efficient rendering for real-time frame rates. However, delivering 3DGS scenes from a hosting server to client devices is still challenging due to high network data consumption (e.g., 1.5 GB for a single scene). The goal of this work is to create an efficient 3D content delivery framework that allows users to view high quality 3D scenes with 3DGS as the underlying data representation. The main contributions of the paper are: (1) Creating new layered 3DGS scenes for efficient delivery, (2) Scheduling algorithms to choose what splats to download at what time, and (3) Trace-driven experiments from users wearing virtual reality headsets to evaluate the visual quality and latency. Our system for Layered 3D Gaussian Splats delivery L3GS demonstrates high visual quality, achieving 16.9% higher average SSIM compared to baselines, and also works with other compressed 3DGS representations.\n\n传统的三维内容表示方式包括稠密点云，其数据量庞大，因此会占用大量网络带宽；而较新的表示方式如神经辐射场（NeRF），则由于采用非常规的体渲染流程，帧率表现较差。三维高斯斑点（3D Gaussian Splatting, 3DGS）可以看作是对点云的泛化，兼具高视觉质量和高效渲染性能，能够实现实时帧率。然而，由主机服务器向客户端设备传输3DGS场景仍具有挑战性，因为其网络数据消耗较高（例如，单个场景可达1.5 GB）。\n本工作的目标是构建一个高效的三维内容传输框架，使得用户能够在以3DGS为底层数据表示的基础上查看高质量的三维场景。本文的主要贡献包括：构建用于高效传输的分层3DGS场景；设计调度算法以决定在何时下载哪些斑点；并基于佩戴虚拟现实头显的用户轨迹进行实验，以评估视觉质量与延迟表现。我们的分层三维高斯斑点传输系统（L3GS）展现出较高的视觉质量，平均 SSIM 相较基线方法提升了16.9%，同时也支持其他压缩形式的3DGS表示。\n"
  },
  {
    "path": "abs/2504.05544.md",
    "content": "### View-Dependent Deformation Fields for 2D Editing of 3D Models\n\nWe propose a method for authoring non-realistic 3D objects (represented as either 3D Gaussian Splats or meshes), that comply with 2D edits from specific viewpoints. Namely, given a 3D object, a user chooses different viewpoints and interactively deforms the object in the 2D image plane of each view. The method then produces a \"deformation field\" - an interpolation between those 2D deformations in a smooth manner as the viewpoint changes. Our core observation is that the 2D deformations do not need to be tied to an underlying object, nor share the same deformation space. We use this observation to devise a method for authoring view-dependent deformations, holding several technical contributions: first, a novel way to compositionality-blend between the 2D deformations after lifting them to 3D - this enables the user to \"stack\" the deformations similarly to layers in an editing software, each deformation operating on the results of the previous; second, a novel method to apply the 3D deformation to 3D Gaussian Splats; third, an approach to author the 2D deformations, by deforming a 2D mesh encapsulating a rendered image of the object. We show the versatility and efficacy of our method by adding cartoonish effects to objects, providing means to modify human characters, fitting 3D models to given 2D sketches and caricatures, resolving occlusions, and recreating classic non-realistic paintings as 3D models.\n\n我们提出了一种用于创作非真实感三维对象的方法，该对象可以以 3D Gaussian Splats 或三维网格的形式表示，并能够根据用户在特定视角下的 二维编辑操作进行一致性变形。具体而言，用户从给定的三维对象出发，选择多个不同的观察视角，并在每个视角对应的二维图像平面中交互式地对对象进行变形。本方法随后会生成一个**“变形场”**，即在视角变化过程中对各视角二维变形结果进行平滑插值的空间变换。\n我们的核心观察是：二维变形本身不需要绑定到底层三维对象，也不需要共享统一的变形空间。基于这一观察，我们提出了一种可用于创作**视角相关变形（view-dependent deformation）**的方法，并在技术上做出以下贡献：\n首先，我们提出了一种新的方式，在将二维变形升维到三维之后进行组合式混合（compositionality-blending），使得用户能够像图像编辑软件中的图层一样“堆叠”多个变形，每个变形作用于上一步结果；\n其次，我们提出了一种新的方法，用于将三维空间中的变形应用于三维高斯投影（3D Gaussian Splats）；\n第三，我们设计了一种用于创作二维变形的方法，通过对包裹对象渲染图像的二维网格进行变形，以生成用户友好的编辑界面。\n我们通过多个示例验证了本方法的通用性与有效性，包括：为对象添加卡通化效果、修改三维人物角色、将三维模型拟合到给定二维草图或漫画、解决遮挡问题，以及将经典的非写实绘画重建为三维模型。\n"
  },
  {
    "path": "abs/2504.05740.md",
    "content": "### Micro-splatting: Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting\n\nRecent advancements in 3D Gaussian Splatting have achieved impressive scalability and real-time rendering for large-scale scenes but often fall short in capturing fine-grained details. Conventional approaches that rely on relatively large covariance parameters tend to produce blurred representations, while directly reducing covariance sizes leads to sparsity. In this work, we introduce Micro-splatting (Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting), a novel framework designed to overcome these limitations. Our approach leverages a covariance regularization term to penalize excessively large Gaussians to ensure each splat remains compact and isotropic. This work implements an adaptive densification strategy that dynamically refines regions with high image gradients by lowering the splitting threshold, followed by loss function enhancement. This strategy results in a denser and more detailed gaussian means where needed, without sacrificing rendering efficiency. Quantitative evaluations using metrics such as L1, L2, PSNR, SSIM, and LPIPS, alongside qualitative comparisons demonstrate that our method significantly enhances fine-details in 3D reconstructions.\n\n近期在 3D Gaussian Splatting 方面的进展已实现了对大规模场景的优异可扩展性与实时渲染性能，但在捕捉细粒度细节方面仍存在不足。传统方法依赖于相对较大的协方差参数，容易导致表示模糊；而直接减小协方差则会导致表示过于稀疏。\n在本研究中，我们提出了 Micro-splatting（Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting），这一新颖框架旨在克服上述限制。我们的方法引入了一个协方差正则项，用于惩罚过大的高斯，以确保每个斑点保持紧致且各向同性。\n本工作还实现了一种自适应加密策略，通过降低拆分阈值，动态细化图像梯度较高的区域，并结合增强的损失函数进行优化。该策略使得在需要的区域中高斯中心分布更密集、更具细节，同时不牺牲渲染效率。\n通过 L1、L2、PSNR、SSIM 和 LPIPS 等指标的定量评估，以及定性对比实验，结果表明我们的方法在三维重建的细节表现上具有显著提升。\n"
  },
  {
    "path": "abs/2504.06003.md",
    "content": "### econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians\n\nThe primary focus of most recent works on open-vocabulary neural fields is extracting precise semantic features from the VLMs and then consolidating them efficiently into a multi-view consistent 3D neural fields representation. However, most existing works over-trusted SAM to regularize image-level CLIP without any further refinement. Moreover, several existing works improved efficiency by dimensionality reduction of semantic features from 2D VLMs before fusing with 3DGS semantic fields, which inevitably leads to multi-view inconsistency. In this work, we propose econSG for open-vocabulary semantic segmentation with 3DGS. Our econSG consists of: 1) A Confidence-region Guided Regularization (CRR) that mutually refines SAM and CLIP to get the best of both worlds for precise semantic features with complete and precise boundaries. 2) A low dimensional contextual space to enforce 3D multi-view consistency while improving computational efficiency by fusing backprojected multi-view 2D features and follow by dimensional reduction directly on the fused 3D features instead of operating on each 2D view separately. Our econSG shows state-of-the-art performance on four benchmark datasets compared to the existing methods. Furthermore, we are also the most efficient training among all the methods.\n\n当前关于开放词汇神经场的大多数研究主要关注于从视觉语言模型（VLM）中提取精确的语义特征，并高效整合为多视角一致的三维神经场表示。然而，现有方法大多过于依赖 SAM 对图像级 CLIP 特征进行正则化，而未进行进一步细化。此外，一些方法在将二维语义特征融合到 3DGS 语义场之前，通过降维以提升效率，但这一过程往往会导致多视角语义不一致。\n在本研究中，我们提出了 econSG，一种用于 3DGS 的开放词汇语义分割方法。econSG 包含一种置信区域引导的正则化机制（Confidence-region Guided Regularization, CRR），通过 SAM 和 CLIP 的相互细化，联合获取更具边界完整性和精确性的语义特征。我们还提出一种低维上下文空间，在融合回投的多视角二维特征后再对三维特征整体进行降维，从而同时提升多视角一致性和计算效率，避免在各个二维视图上分别处理。\n在四个基准数据集上的实验结果表明，econSG 相较现有方法在语义分割任务上表现更优，同时也是目前训练效率最高的方法。\n"
  },
  {
    "path": "abs/2504.06210.md",
    "content": "### HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation\n\nWe present Hierarchical Motion Representation (HiMoR), a novel deformation representation for 3D Gaussian primitives capable of achieving high-quality monocular dynamic 3D reconstruction. The insight behind HiMoR is that motions in everyday scenes can be decomposed into coarser motions that serve as the foundation for finer details. Using a tree structure, HiMoR's nodes represent different levels of motion detail, with shallower nodes modeling coarse motion for temporal smoothness and deeper nodes capturing finer motion. Additionally, our model uses a few shared motion bases to represent motions of different sets of nodes, aligning with the assumption that motion tends to be smooth and simple. This motion representation design provides Gaussians with a more structured deformation, maximizing the use of temporal relationships to tackle the challenging task of monocular dynamic 3D reconstruction. We also propose using a more reliable perceptual metric as an alternative, given that pixel-level metrics for evaluating monocular dynamic 3D reconstruction can sometimes fail to accurately reflect the true quality of reconstruction. Extensive experiments demonstrate our method's efficacy in achieving superior novel view synthesis from challenging monocular videos with complex motions.\n\n我们提出了 Hierarchical Motion Representation（HiMoR），这是一种用于三维高斯基元的全新变形表示方法，能够实现高质量的单目动态三维重建。HiMoR 的核心思想是：日常场景中的运动可以分解为更粗略的运动作为基础，进而构建更精细的细节。\nHiMoR 采用树状结构，其中的节点表示不同层次的运动细节。浅层节点用于建模较粗的运动，以实现时间上的平滑性，而深层节点则用于捕捉更精细的运动变化。此外，模型使用少量共享的运动基来表示不同节点集合的运动，符合“运动通常是平滑且简单”的假设。\n这一运动表示设计为高斯基元提供了更具结构性的变形方式，使得系统能够最大程度地利用时间关系，应对单目动态三维重建这一极具挑战性的任务。\n考虑到像素级指标在评估单目动态三维重建质量时可能难以准确反映真实效果，我们还提出采用一种更可靠的感知指标作为替代。大量实验证明，该方法在处理复杂运动的单目视频中，能够实现更优的新视角合成效果。\n"
  },
  {
    "path": "abs/2504.06598.md",
    "content": "### Stochastic Ray Tracing of 3D Transparent Gaussians\n\n3D Gaussian splatting has recently been widely adopted as a 3D representation for novel-view synthesis, relighting, and text-to-3D generation tasks, offering realistic and detailed results through a collection of explicit 3D Gaussians carrying opacities and view-dependent colors. However, efficient rendering of many transparent primitives remains a significant challenge. Existing approaches either rasterize the 3D Gaussians with approximate sorting per view or rely on high-end RTX GPUs to exhaustively process all ray-Gaussian intersections (bounding Gaussians by meshes). This paper proposes a stochastic ray tracing method to render 3D clouds of transparent primitives. Instead of processing all ray-Gaussian intersections in sequential order, each ray traverses the acceleration structure only once, randomly accepting and shading a single intersection (or N intersections, using a simple extension). This approach minimizes shading time and avoids sorting the Gaussians along the ray while minimizing the register usage and maximizing parallelism even on low-end GPUs. The cost of rays through the Gaussian asset is comparable to that of standard mesh-intersection rays. While our method introduces noise, the shading is unbiased, and the variance is slight, as stochastic acceptance is importance-sampled based on accumulated opacity. The alignment with the Monte Carlo philosophy simplifies implementation and easily integrates our method into a conventional path-tracing framework.\n\n三维高斯投影（3D Gaussian Splatting）近年来被广泛应用于新视角合成、重光照（relighting）以及文本到三维（text-to-3D）生成等任务，凭借一组具有透明度和视角相关颜色的显式三维高斯，实现了真实且细节丰富的结果。然而，在处理大量透明基元时，如何实现高效渲染仍是一项重大挑战。\n现有方法通常采用每个视角近似排序的方式对三维高斯进行栅格化，或依赖高端 RTX GPU 来穷举计算所有射线与高斯的交点（通过网格对高斯进行包围）。本文提出了一种用于透明基元三维点云的随机射线追踪（stochastic ray tracing）方法。\n该方法不再按顺序处理所有射线与高斯的交点，而是让每条射线只遍历一次加速结构，并随机接受并着色一个交点（或扩展为 N 个交点）。这种做法显著减少了着色时间，避免了对射线方向上高斯的排序，同时最小化了寄存器使用，最大化了并行性，即使在低端 GPU 上也能运行。\n在通过高斯资产的射线开销方面，该方法与标准网格交点的射线相当。虽然引入了噪声，但着色是无偏的，并且由于基于累积透明度的重点采样（importance sampling），方差很小。该方法与蒙特卡洛路径追踪的理念一致，便于实现，并可轻松集成到传统的路径追踪框架中。\n"
  },
  {
    "path": "abs/2504.06716.md",
    "content": "### GSta: Efficient Training Scheme with Siestaed Gaussians for Monocular 3D Scene Reconstruction\n\nGaussian Splatting (GS) is a popular approach for 3D reconstruction, mostly due to its ability to converge reasonably fast, faithfully represent the scene and render (novel) views in a fast fashion. However, it suffers from large storage and memory requirements, and its training speed still lags behind the hash-grid based radiance field approaches (e.g. Instant-NGP), which makes it especially difficult to deploy them in robotics scenarios, where 3D reconstruction is crucial for accurate operation. In this paper, we propose GSta that dynamically identifies Gaussians that have converged well during training, based on their positional and color gradient norms. By forcing such Gaussians into a siesta and stopping their updates (freezing) during training, we improve training speed with competitive accuracy compared to state of the art. We also propose an early stopping mechanism based on the PSNR values computed on a subset of training images. Combined with other improvements, such as integrating a learning rate scheduler, GSta achieves an improved Pareto front in convergence speed, memory and storage requirements, while preserving quality. We also show that GSta can improve other methods and complement orthogonal approaches in efficiency improvement; once combined with Trick-GS, GSta achieves up to 5x faster training, 16x smaller disk size compared to vanilla GS, while having comparable accuracy and consuming only half the peak memory.\n\n高斯投影（Gaussian Splatting, GS）因其收敛速度较快、场景表达能力强以及能实现快速（新视角）渲染，已成为三维重建中广泛采用的方法。然而，GS 也存在存储和内存开销大的问题，其训练速度仍落后于基于哈希网格的辐射场方法（如 Instant-NGP），这使得其在对三维重建有高实时性要求的机器人应用场景中部署变得尤为困难。\n为此，本文提出了 GSta，该方法在训练过程中动态识别已经充分收敛的高斯点，依据其位置和颜色梯度范数进行判断。对于这些已收敛的高斯，我们将其置于“休眠状态”，即在训练中停止更新（冻结），从而在保持精度竞争力的同时显著提升训练速度。\n此外，我们还提出了一种基于训练图像子集的 PSNR 值的早停机制。结合其他优化措施（如引入学习率调度器），GSta 在收敛速度、内存与存储开销等方面构建出更优的 Pareto 前沿，同时保持输出质量不变。\n我们进一步展示了 GSta 对其他方法的适配性与可组合性。在与 Trick-GS 结合使用时，GSta 可实现最高 5 倍的训练加速、磁盘占用减少至原始 GS 的 1/16，同时精度相当、峰值内存占用减半。\n"
  },
  {
    "path": "abs/2504.06815.md",
    "content": "### SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering\n\nReconstructing 3D assets from images, known as inverse rendering (IR), remains a challenging task due to its ill-posed nature. 3D Gaussian Splatting (3DGS) has demonstrated impressive capabilities for novel view synthesis (NVS) tasks. Methods apply it to relighting by separating radiance into BRDF parameters and lighting, yet produce inferior relighting quality with artifacts and unnatural indirect illumination due to the limited capability of each Gaussian, which has constant material parameters and normal, alongside the absence of physical constraints for indirect lighting. In this paper, we present a novel framework called Spatially-vayring Gaussian Inverse Rendering (SVG-IR), aimed at enhancing both NVS and relighting quality. To this end, we propose a new representation-Spatially-varying Gaussian (SVG)-that allows per-Gaussian spatially varying parameters. This enhanced representation is complemented by a SVG splatting scheme akin to vertex/fragment shading in traditional graphics pipelines. Furthermore, we integrate a physically-based indirect lighting model, enabling more realistic relighting. The proposed SVG-IR framework significantly improves rendering quality, outperforming state-of-the-art NeRF-based methods by 2.5 dB in peak signal-to-noise ratio (PSNR) and surpassing existing Gaussian-based techniques by 3.5 dB in relighting tasks, all while maintaining a real-time rendering speed.\n\n从图像中重建三维资产（即逆向渲染，Inverse Rendering, IR）因其病态特性，依然是一项具有挑战性的任务。三维高斯投影（3D Gaussian Splatting, 3DGS）在新视角合成（Novel View Synthesis, NVS）任务中表现出色。已有方法尝试通过将辐射拆分为 BRDF 参数与光照项来将其应用于重光照任务，但由于每个高斯仅具备恒定的材质参数与法线，且缺乏对间接光照的物理建模约束，因此在重光照时往往会出现伪影与不自然的间接光照效果。\n为解决上述问题，本文提出了一种新框架：SVG-IR，旨在同时提升 NVS 与重光照的渲染质量。为此，我们提出了一种新的表示方式——空间可变高斯（Spatially-varying Gaussian, SVG），允许每个高斯拥有空间可变的参数。该表示方式通过一种类似传统图形管线中顶点/片元着色的 SVG 投影策略得以实现。\n此外，我们还引入了一个基于物理的间接光照模型，从而实现更逼真的重光照效果。实验结果表明，SVG-IR 框架显著提升了渲染质量，在保持实时渲染速度的同时，在峰值信噪比（PSNR）上相比当前最先进的 NeRF 方法提升了 2.5 dB，在重光照任务中相较现有高斯方法提升了 3.5 dB。\n"
  },
  {
    "path": "abs/2504.06827.md",
    "content": "### IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments\n\nThis work presents IAAO, a novel framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. Unlike prior methods that rely on task-specific networks and assumptions about movable parts, our IAAO leverages large foundation models to estimate interactive affordances and part articulations in three stages. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances. Finally, scenes from different states are merged and refined based on the estimated transformations, enabling robust affordance-based interaction and manipulation of objects. Experimental results demonstrate the effectiveness of our method.\n\n本工作提出了 IAAO，一个新颖的框架，用于为智能体构建显式三维模型，从而通过交互理解其环境中的可动物体。与以往依赖任务特定网络和对可移动部件进行假设的方法不同，IAAO 利用大规模基础模型，在三个阶段中估计交互可供性和部件关节结构。\n首先，我们通过 3D Gaussian Splatting（3DGS） 构建每个物体状态的分层特征与标签场，方法是从多视角图像中提取遮罩特征并蒸馏获得视角一致的标签。接着，在三维高斯基元上进行物体级和部件级的查询，用于识别静态与可动部分，并估计全局变换与局部关节参数，同时获得可供性信息。最后，将来自不同状态的场景根据估计的变换进行融合与优化，从而支持稳健的基于可供性的交互与操控。\n实验结果表明，我们的方法具有良好的有效性。\n"
  },
  {
    "path": "abs/2504.06978.md",
    "content": "### Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting\n\nAutomated extraction of plant morphological traits is crucial for supporting crop breeding and agricultural management through high-throughput field phenotyping (HTFP). Solutions based on multi-view RGB images are attractive due to their scalability and affordability, enabling volumetric measurements that 2D approaches cannot directly capture. While advanced methods like Neural Radiance Fields (NeRFs) have shown promise, their application has been limited to counting or extracting traits from only a few plants or organs. Furthermore, accurately measuring complex structures like individual wheat heads-essential for studying crop yields-remains particularly challenging due to occlusions and the dense arrangement of crop canopies in field conditions. The recent development of 3D Gaussian Splatting (3DGS) offers a promising alternative for HTFP due to its high-quality reconstructions and explicit point-based representation. In this paper, we present Wheat3DGS, a novel approach that leverages 3DGS and the Segment Anything Model (SAM) for precise 3D instance segmentation and morphological measurement of hundreds of wheat heads automatically, representing the first application of 3DGS to HTFP. We validate the accuracy of wheat head extraction against high-resolution laser scan data, obtaining per-instance mean absolute percentage errors of 15.1%, 18.3%, and 40.2% for length, width, and volume. We provide additional comparisons to NeRF-based approaches and traditional Muti-View Stereo (MVS), demonstrating superior results. Our approach enables rapid, non-destructive measurements of key yield-related traits at scale, with significant implications for accelerating crop breeding and improving our understanding of wheat development\n\n植物形态性状的自动提取在高通量田间表型分析（High-Throughput Field Phenotyping, HTFP）中具有关键意义，可为作物育种与农业管理提供支撑。基于多视角 RGB 图像的解决方案因其可扩展性强、成本低廉，在获取二维方法无法直接测量的体积信息方面具有显著优势。\n尽管近年来如**神经辐射场（Neural Radiance Fields, NeRF）**等先进方法已展现出一定潜力，但其应用范围仍局限于对少量植株或器官的性状计数与提取。尤其是在实际田间环境中，由于遮挡严重和作物冠层排列密集，对复杂结构（如单个小麦穗）的精确测量仍是极具挑战性的任务。\n三维高斯投影（3D Gaussian Splatting, 3DGS）的最新发展为 HTFP 提供了一种具有前景的替代方案，得益于其高质量重建能力与显式点表示的特点。\n本文提出了一种新方法 Wheat3DGS，首次将 3DGS 应用于 HTFP，结合 Segment Anything Model（SAM） 实现对数百个小麦穗的三维实例分割与形态学测量，并具备高度自动化。我们通过与高分辨率激光扫描数据对比验证了小麦穗提取的精度，分别在长度、宽度与体积测量上达到了 15.1%、18.3%、40.2% 的平均单体相对误差（Mean Absolute Percentage Error）。\n同时，我们将 Wheat3DGS 与基于 NeRF 的方法和传统的多视角立体重建（Multi-View Stereo, MVS）进行了对比，结果显示其在性能上具有显著优势。\n该方法实现了对关键产量相关性状的快速、非破坏性测量，可大规模部署，具有加速作物育种和加深对小麦生长机制理解的重大意义。\n"
  },
  {
    "path": "abs/2504.06982.md",
    "content": "### SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets\n\n3D human digitization has long been a highly pursued yet challenging task. Existing methods aim to generate high-quality 3D digital humans from single or multiple views, but remain primarily constrained by current paradigms and the scarcity of 3D human assets. Specifically, recent approaches fall into several paradigms: optimization-based and feed-forward (both single-view regression and multi-view generation with reconstruction). However, they are limited by slow speed, low quality, cascade reasoning, and ambiguity in mapping low-dimensional planes to high-dimensional space due to occlusion and invisibility, respectively. Furthermore, existing 3D human assets remain small-scale, insufficient for large-scale training. To address these challenges, we propose a latent space generation paradigm for 3D human digitization, which involves compressing multi-view images into Gaussians via a UV-structured VAE, along with DiT-based conditional generation, we transform the ill-posed low-to-high-dimensional mapping problem into a learnable distribution shift, which also supports end-to-end inference. In addition, we employ the multi-view optimization approach combined with synthetic data to construct the HGS-1M dataset, which contains 1 million 3D Gaussian assets to support the large-scale training. Experimental results demonstrate that our paradigm, powered by large-scale training, produces high-quality 3D human Gaussians with intricate textures, facial details, and loose clothing deformation.\n\n三维人体数字化一直是一个高度追求但极具挑战性的任务。现有方法主要致力于从单视图或多视图生成高质量的三维数字人，但受限于当前范式和三维人体数据资产的稀缺性，始终面临瓶颈。具体而言，近期方法主要可归类为以下几种范式：基于优化的方法，以及前馈式的方法（包括单视图回归和结合重建的多视图生成）。然而，这些方法分别受到速度慢、质量低、推理流程冗长，以及由于遮挡和不可见性导致的低维到高维映射歧义等问题的限制。\n此外，现有三维人体资产规模有限，难以满足大规模训练需求。为了解决上述挑战，我们提出了一种用于三维人体数字化的潜空间生成范式。该方法通过 UV 结构的变分自编码器（VAE） 将多视图图像压缩为高斯表示，并结合 DiT（Diffusion Transformer） 进行条件生成，将原本病态的低维到高维映射问题转化为可学习的分布迁移过程，同时支持端到端推理。\n此外，我们结合多视图优化方法与合成数据构建了 HGS-1M 数据集，包含 一百万个三维高斯人体资产，以支持大规模训练。实验结果表明，得益于大规模数据支撑，我们的方法能够生成具有精细纹理、面部细节以及宽松衣物变形的高质量三维高斯人体表示。\n"
  },
  {
    "path": "abs/2504.07144.md",
    "content": "### GIGA: Generalizable Sparse Image-driven Gaussian Avatars\n\nDriving a high-quality and photorealistic full-body human avatar, from only a few RGB cameras, is a challenging problem that has become increasingly relevant with emerging virtual reality technologies. To democratize such technology, a promising solution may be a generalizable method that takes sparse multi-view images of an unseen person and then generates photoreal free-view renderings of such identity. However, the current state of the art is not scalable to very large datasets and, thus, lacks in diversity and photorealism. To address this problem, we propose a novel, generalizable full-body model for rendering photoreal humans in free viewpoint, as driven by sparse multi-view video. For the first time in literature, our model can scale up training to thousands of subjects while maintaining high photorealism. At the core, we introduce a MultiHeadUNet architecture, which takes sparse multi-view images in texture space as input and predicts Gaussian primitives represented as 2D texels on top of a human body mesh. Importantly, we represent sparse-view image information, body shape, and the Gaussian parameters in 2D so that we can design a deep and scalable architecture entirely based on 2D convolutions and attention mechanisms. At test time, our method synthesizes an articulated 3D Gaussian-based avatar from as few as four input views and a tracked body template for unseen identities. Our method excels over prior works by a significant margin in terms of cross-subject generalization capability as well as photorealism.\n\n仅使用少量 RGB 相机驱动一个高质量、逼真的全身虚拟人头像，是一个极具挑战性的任务，随着虚拟现实技术的发展，其研究与应用价值愈发凸显。为了实现该技术的大众化应用，一个有前景的解决方案是：设计一种具备泛化能力的方法，能够从稀疏多视角图像中重建未见过的个体，并生成真实感强、可自由视角渲染的人体数字化形象。\n然而，当前的最新方法难以扩展至大规模数据集，因此在多样性与真实感方面仍存在不足。为解决这一问题，我们提出了一种新颖的、可泛化的全身模型，能够从稀疏多视角视频中驱动逼真的自由视角人体渲染。该方法首次在文献中实现了对数千个不同主体的训练扩展能力，同时保持极高的真实感。\n我们方法的核心是提出了 MultiHeadUNet 架构，该架构以纹理空间中的稀疏多视角图像为输入，预测基于人体网格的 二维 texel 表示的高斯基元（Gaussian primitives）。关键在于，我们将稀疏图像信息、人体形状与高斯参数均表示在二维空间中，从而能够构建一个完全基于二维卷积与注意力机制的深度且可扩展的网络结构。\n在测试阶段，该方法能够从仅四个输入视角图像和一个已跟踪的身体模板出发，生成带有运动姿态的 基于三维高斯表示的虚拟人头像，适用于未见身份个体。\n在跨主体泛化能力与图像真实感方面，我们的方法相较现有工作均有显著提升。\n"
  },
  {
    "path": "abs/2504.07370.md",
    "content": "### View-Dependent Uncertainty Estimation of 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has become increasingly popular in 3D scene reconstruction for its high visual accuracy. However, uncertainty estimation of 3DGS scenes remains underexplored and is crucial to downstream tasks such as asset extraction and scene completion. Since the appearance of 3D gaussians is view-dependent, the color of a gaussian can thus be certain from an angle and uncertain from another. We thus propose to model uncertainty in 3DGS as an additional view-dependent per-gaussian feature that can be modeled with spherical harmonics. This simple yet effective modeling is easily interpretable and can be integrated into the traditional 3DGS pipeline. It is also significantly faster than ensemble methods while maintaining high accuracy, as demonstrated in our experiments.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）因其出色的视觉精度，近年来在三维场景重建中受到广泛关注。然而，对于 3DGS 场景中的不确定性估计仍缺乏深入研究，而这一能力对于后续任务如资产提取与场景补全至关重要。\n由于三维高斯的外观具有视角依赖性，因此同一个高斯从某个角度观察可能具有确定的颜色表现，而从另一个角度则可能是不确定的。\n基于此，我们提出将不确定性建模为一种每个高斯视角相关的附加特征，并使用**球谐函数（spherical harmonics）**进行建模。这种建模方式简洁有效，具有良好的可解释性，并可直接集成至传统的 3DGS 渲染流程中。\n实验结果表明，该方法在保持较高准确率的同时，显著快于集成方法（ensemble-based methods），展现出良好的实用性与效率。\n"
  },
  {
    "path": "abs/2504.07949.md",
    "content": "### InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians\n\nWith the rising interest from the community in digital avatars coupled with the importance of expressions and gestures in communication, modeling natural avatar behavior remains an important challenge across many industries such as teleconferencing, gaming, and AR/VR. Human hands are the primary tool for interacting with the environment and essential for realistic human behavior modeling, yet existing 3D hand and head avatar models often overlook the crucial aspect of hand-body interactions, such as between hand and face. We present InteracttAvatar, the first model to faithfully capture the photorealistic appearance of dynamic hand and non-rigid hand-face interactions. Our novel Dynamic Gaussian Hand model, combining template model and 3D Gaussian Splatting as well as a dynamic refinement module, captures pose-dependent change, e.g. the fine wrinkles and complex shadows that occur during articulation. Importantly, our hand-face interaction module models the subtle geometry and appearance dynamics that underlie common gestures. Through experiments of novel view synthesis, self reenactment and cross-identity reenactment, we demonstrate that InteracttAvatar can reconstruct hand and hand-face interactions from monocular or multiview videos with high-fidelity details and be animated with novel poses.\n\n随着社区对数字化虚拟人（digital avatars）兴趣的日益增长，以及表情与手势在交流中的重要性，建模自然的虚拟人行为已成为远程会议、游戏、增强/虚拟现实（AR/VR）等多个行业亟待解决的关键挑战。其中，人手是人类与环境交互的主要工具，也是实现真实行为建模的核心，但现有的三维手部与头部虚拟人模型往往忽视了手与身体之间的关键交互，例如手与脸之间的接触行为。\n我们提出了 InteracttAvatar，这是首个能够真实捕捉动态手部以及非刚性手-脸交互的照片级外观的模型。我们设计的全新动态高斯手部模型（Dynamic Gaussian Hand model）结合了模板模型、3D Gaussian Splatting与动态细化模块，能够捕捉姿态相关的变化，例如关节运动中产生的细小皱褶与复杂阴影。\n尤为重要的是，我们提出的手-脸交互模块能够建模在常见手势中出现的精细几何结构与外观动态变化。\n通过新视角合成、自我重演以及跨身份重演等实验，我们展示了 InteracttAvatar 能够从单目或多视角视频中高保真地重建手部及手-脸交互，并支持新姿态驱动的动画生成。\n\n"
  },
  {
    "path": "abs/2504.08100.md",
    "content": "### ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting\n\nCreating 3D content from single-view images is a challenging problem that has attracted considerable attention in recent years. Current approaches typically utilize score distillation sampling (SDS) from pre-trained 2D diffusion models to generate multi-view 3D representations. Although some methods have made notable progress by balancing generation speed and model quality, their performance is often limited by the visual inconsistencies of the diffusion model outputs. In this work, we propose ContrastiveGaussian, which integrates contrastive learning into the generative process. By using a perceptual loss, we effectively differentiate between positive and negative samples, leveraging the visual inconsistencies to improve 3D generation quality. To further enhance sample differentiation and improve contrastive learning, we incorporate a super-resolution model and introduce another Quantity-Aware Triplet Loss to address varying sample distributions during training. Our experiments demonstrate that our approach achieves superior texture fidelity and improved geometric consistency.\n\n从单视图图像生成三维内容是一项具有挑战性的任务，近年来受到了广泛关注。当前的方法通常依赖于来自预训练二维扩散模型的评分蒸馏采样（Score Distillation Sampling, SDS），以生成多视角的三维表示。尽管部分方法在生成速度与模型质量之间取得了一定的平衡，但其性能仍常受限于扩散模型输出的视觉不一致性。\n在本研究中，我们提出了 ContrastiveGaussian，将**对比学习（contrastive learning）**引入三维生成过程。通过引入感知损失（perceptual loss），我们能够有效区分正负样本，利用视觉不一致性反向促进三维生成质量的提升。\n为进一步增强样本区分能力并提升对比学习效果，我们引入了一个超分辨率模型，并提出一种新的 数量感知三元组损失（Quantity-Aware Triplet Loss），用于应对训练过程中样本分布的差异。\n实验结果表明，我们的方法在纹理保真度和几何一致性方面均优于现有方法。\n"
  },
  {
    "path": "abs/2504.08366.md",
    "content": "### In-2-4D: Inbetweening from Two Single-View Images to 4D Generation\n\nWe propose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening from a minimalistic input setting: two single-view images capturing an object in two distinct motion states. Given two images representing the start and end states of an object in motion, our goal is to generate and reconstruct the motion in 4D. We utilize a video interpolation model to predict the motion, but large frame-to-frame motions can lead to ambiguous interpretations. To overcome this, we employ a hierarchical approach to identify keyframes that are visually close to the input states and show significant motion, then generate smooth fragments between them. For each fragment, we construct the 3D representation of the keyframe using Gaussian Splatting. The temporal frames within the fragment guide the motion, enabling their transformation into dynamic Gaussians through a deformation field. To improve temporal consistency and refine 3D motion, we expand the self-attention of multi-view diffusion across timesteps and apply rigid transformation regularization. Finally, we merge the independently generated 3D motion segments by interpolating boundary deformation fields and optimizing them to align with the guiding video, ensuring smooth and flicker-free transitions. Through extensive qualitative and quantitiave experiments as well as a user study, we show the effectiveness of our method and its components.\n\n我们提出了一个新问题 In-2-4D，旨在在极简输入设定下实现4D（即三维 + 运动）生成式补间建模。具体而言，该任务从仅有的两张单视图图像出发，图像分别捕捉了同一物体在两个不同运动状态下的瞬间目标是生成并重建该物体的连续运动过程，形成完整的 4D 表达。\n我们首先利用视频插帧模型预测中间运动过程，但当帧间差异较大时，运动解释容易出现歧义。为克服这一问题，我们采用分层策略，首先识别出与输入状态视觉上接近且具有显著运动的关键帧，然后在它们之间生成平滑的片段。\n对于每个片段，我们使用 Gaussian Splatting 构建其关键帧的三维表示，并利用片段内的时间帧指导运动，通过变形场将其转化为动态高斯。为提升时间一致性并优化三维运动表达，我们将多视角扩散模型的自注意机制拓展至时间维度，同时施加刚体变换正则化以抑制不合理形变。\n最终，我们通过边界变形场插值与优化，将各个独立生成的三维运动片段融合，确保整体运动连续流畅、无闪烁。\n在大量定性、定量实验及用户研究中，我们验证了本方法及其各组成部分的有效性。\n"
  },
  {
    "path": "abs/2504.08473.md",
    "content": "### Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation\n\nGenerating synthetic images is a useful method for cheaply obtaining labeled data for training computer vision models. However, obtaining accurate 3D models of relevant objects is necessary, and the resulting images often have a gap in realism due to challenges in simulating lighting effects and camera artifacts. We propose using the novel view synthesis method called Gaussian Splatting to address these challenges. We have developed a synthetic data pipeline for generating high-quality context-aware instance segmentation training data for specific objects. This process is fully automated, requiring only a video of the target object. We train a Gaussian Splatting model of the target object and automatically extract the object from the video. Leveraging Gaussian Splatting, we then render the object on a random background image, and monocular depth estimation is employed to place the object in a believable pose. We introduce a novel dataset to validate our approach and show superior performance over other data generation approaches, such as Cut-and-Paste and Diffusion model-based generation.\n\n合成图像是一种低成本获取带标签数据、用于训练计算机视觉模型的有效手段。然而，这一过程通常依赖准确的三维模型，且由于难以真实模拟光照效果与相机成像特性，合成图像常存在逼真度不足的鸿沟。\n为解决上述问题，我们提出采用新视角合成方法 Gaussian Splatting。我们构建了一条用于特定物体高质量、具上下文感知能力的实例分割训练数据生成流程，该流程全自动化，仅需输入目标物体的视频。\n具体而言，我们首先训练目标物体的 Gaussian Splatting 模型，并自动从视频中完成物体提取。随后，利用 Gaussian Splatting 将该物体渲染到随机背景图像上，并结合单目深度估计以合理地放置物体，生成具有真实空间结构的合成图像。\n我们还引入了一个新数据集用于验证所提方法的有效性。实验表明，相较于 Cut-and-Paste 方法与基于扩散模型的生成方法，我们的方法在合成质量与下游任务性能上均表现出更优结果。\n"
  },
  {
    "path": "abs/2504.08581.md",
    "content": "### FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents\n\n The semantically interactive radiance field has long been a promising backbone for 3D real-world applications, such as embodied AI to achieve scene understanding and manipulation. However, multi-granularity interaction remains a challenging task due to the ambiguity of language and degraded quality when it comes to queries upon object components. In this work, we present FMLGS, an approach that supports part-level open-vocabulary query within 3D Gaussian Splatting (3DGS). We propose an efficient pipeline for building and querying consistent object- and part-level semantics based on Segment Anything Model 2 (SAM2). We designed a semantic deviation strategy to solve the problem of language ambiguity among object parts, which interpolates the semantic features of fine-grained targets for enriched information. Once trained, we can query both objects and their describable parts using natural language. Comparisons with other state-of-the-art methods prove that our method can not only better locate specified part-level targets, but also achieve first-place performance concerning both speed and accuracy, where FMLGS is 98 x faster than LERF, 4 x faster than LangSplat and 2.5 x faster than LEGaussians. Meanwhile, we further integrate FMLGS as a virtual agent that can interactively navigate through 3D scenes, locate targets, and respond to user demands through a chat interface, which demonstrates the potential of our work to be further expanded and applied in the future.\n\n语义交互式辐射场长期以来被视为实现三维现实世界应用（如具身智能中的场景理解与操控）的潜力基础。然而，由于语言表达的歧义性以及针对物体组件级别查询时的质量退化问题，实现多粒度交互仍面临重大挑战。\n在本工作中，我们提出了 FMLGS，一种支持在 3D Gaussian Splatting（3DGS） 框架下进行部件级开放词汇查询的方法。我们基于 Segment Anything Model 2（SAM2），设计了一条高效的流程，用于构建并查询一致的物体级与部件级语义信息。\n为应对物体部件之间语言歧义带来的问题，我们引入了一种语义偏差策略（semantic deviation strategy），通过对细粒度目标的语义特征进行插值融合，以增强语义信息的完整性与区分性。一旦完成训练，系统即可支持通过自然语言对物体及其可描述部件进行灵活查询。\n与现有先进方法对比实验表明，FMLGS 在部件级目标定位精度上表现更优，并在速度与准确率方面均取得领先表现：相较于 LERF 快 98 倍，较 LangSplat 快 4 倍，较 LEGaussians 快 2.5 倍。\n此外，我们还将 FMLGS 集成为一个虚拟智能体，可在三维场景中交互式导航、定位目标，并通过聊天界面响应用户指令，展示了本方法未来在多模态交互与智能体系统中的广泛应用潜力。\n"
  },
  {
    "path": "abs/2504.09048.md",
    "content": "### BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting\n\nThe recent advancements in 3D Gaussian Splatting (3DGS) have demonstrated remarkable potential in novel view synthesis tasks. The divide-and-conquer paradigm has enabled large-scale scene reconstruction, but significant challenges remain in scene partitioning, optimization, and merging processes. This paper introduces BlockGaussian, a novel framework incorporating a content-aware scene partition strategy and visibility-aware block optimization to achieve efficient and high-quality large-scale scene reconstruction. Specifically, our approach considers the content-complexity variation across different regions and balances computational load during scene partitioning, enabling efficient scene reconstruction. To tackle the supervision mismatch issue during independent block optimization, we introduce auxiliary points during individual block optimization to align the ground-truth supervision, which enhances the reconstruction quality. Furthermore, we propose a pseudo-view geometry constraint that effectively mitigates rendering degradation caused by airspace floaters during block merging. Extensive experiments on large-scale scenes demonstrate that our approach achieves state-of-the-art performance in both reconstruction efficiency and rendering quality, with a 5x speedup in optimization and an average PSNR improvement of 1.21 dB on multiple benchmarks. Notably, BlockGaussian significantly reduces computational requirements, enabling large-scale scene reconstruction on a single 24GB VRAM device.\n\n近年来，三维高斯投影（3D Gaussian Splatting, 3DGS）的进展在新视角合成任务中展现出显著潜力。分而治之的范式推动了大规模场景重建的发展，但在场景划分、优化及合并过程中仍面临诸多挑战。本文提出了一种新颖框架——BlockGaussian，该方法引入了内容感知的场景划分策略和可见性感知的块级优化机制，以实现高效且高质量的大规模场景重建。\n具体而言，我们的方法在场景划分阶段考虑了不同区域间内容复杂度的变化，并平衡了计算负载，从而实现高效的重建流程。为了解决独立块优化过程中的监督不匹配问题，我们在每个块的优化中引入辅助点以对齐真实监督信号，从而提升重建质量。此外，我们提出了伪视图几何约束，有效缓解了块合并过程中由于“漂浮体素”引发的渲染退化问题。\n在多个大规模场景上的大量实验表明，我们的方法在重建效率与渲染质量方面均达到了当前最优性能：优化速度提升了 5 倍，平均 PSNR 提升 1.21 dB。值得一提的是，BlockGaussian 显著降低了计算资源需求，支持在单张 24GB 显存设备上完成大规模场景重建。\n"
  },
  {
    "path": "abs/2504.09062.md",
    "content": "### You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting\n\nRecently, reconstructing scenes from a single panoramic image using advanced 3D Gaussian Splatting (3DGS) techniques has attracted growing interest. Panoramic images offer a 360× 180 field of view (FoV), capturing the entire scene in a single shot. However, panoramic images introduce severe distortion, making it challenging to render 3D Gaussians into 2D distorted equirectangular space directly. Converting equirectangular images to cubemap projections partially alleviates this problem but introduces new challenges, such as projection distortion and discontinuities across cube-face boundaries. To address these limitations, we present a novel framework, named TPGS, to bridge continuous panoramic 3D scene reconstruction with perspective Gaussian splatting. Firstly, we introduce a Transition Plane between adjacent cube faces to enable smoother transitions in splatting directions and mitigate optimization ambiguity in the boundary region. Moreover, an intra-to-inter face optimization strategy is proposed to enhance local details and restore visual consistency across cube-face boundaries. Specifically, we optimize 3D Gaussians within individual cube faces and then fine-tune them in the stitched panoramic space. Additionally, we introduce a spherical sampling technique to eliminate visible stitching seams. Extensive experiments on indoor and outdoor, egocentric, and roaming benchmark datasets demonstrate that our approach outperforms existing state-of-the-art methods.\n\n近年来，利用先进的三维高斯投影（3D Gaussian Splatting, 3DGS）技术从单张全景图像中重建场景，逐渐成为研究热点。全景图像具有 360×180 的视场（Field of View, FoV），能够一次性捕捉整个场景。然而，由于全景图像本身存在严重的畸变，直接将三维高斯投影渲染到二维的等距矩形（equirectangular）空间中存在较大挑战。尽管将等距矩形图像转换为立方体图（cubemap）投影在一定程度上缓解了该问题，但也引入了投影畸变与立方面边界的不连续性等新问题。\n为了解决这些限制，本文提出了一个新颖的框架——TPGS，旨在实现连续全景三维场景重建与透视高斯投影之间的桥接。\n首先，我们在相邻立方面之间引入了一个过渡平面（Transition Plane），以实现更平滑的投影方向过渡，并减轻边界区域中的优化不确定性。进一步地，我们提出了一种立方面内-跨面联合优化策略（intra-to-inter face optimization strategy），以增强局部细节并恢复立方体各面边界间的视觉一致性。具体而言，我们首先在每个立方面内独立优化 3D Gaussians，然后在拼接后的全景空间中进行全局微调。此外，我们还引入了一种球面采样技术，有效消除可见的拼接缝隙。\n在多个室内外、第一人称视角及自由移动视角的基准数据集上进行的大量实验证明，我们的方法在重建质量和视觉一致性方面均优于当前主流方法。\n"
  },
  {
    "path": "abs/2504.09097.md",
    "content": "### BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting\n\nReconstructing 3Ds of hand-object interaction (HOI) is a fundamental problem that can find numerous applications. Despite recent advances, there is no comprehensive pipeline yet for bimanual class-agnostic interaction reconstruction from a monocular RGB video, where two hands and an unknown object are interacting with each other. Previous works tackled the limited hand-object interaction case, where object templates are pre-known or only one hand is involved in the interaction. The bimanual interaction reconstruction exhibits severe occlusions introduced by complex interactions between two hands and an object. To solve this, we first introduce BIGS (Bimanual Interaction 3D Gaussian Splatting), a method that reconstructs 3D Gaussians of hands and an unknown object from a monocular video. To robustly obtain object Gaussians avoiding severe occlusions, we leverage prior knowledge of pre-trained diffusion model with score distillation sampling (SDS) loss, to reconstruct unseen object parts. For hand Gaussians, we exploit the 3D priors of hand model (i.e., MANO) and share a single Gaussian for two hands to effectively accumulate hand 3D information, given limited views. To further consider the 3D alignment between hands and objects, we include the interacting-subjects optimization step during Gaussian optimization. Our method achieves the state-of-the-art accuracy on two challenging datasets, in terms of 3D hand pose estimation (MPJPE), 3D object reconstruction (CDh, CDo, F10), and rendering quality (PSNR, SSIM, LPIPS), respectively.\n\n重建手-物交互（HOI）的三维结构是一个基础性问题，具有广泛的应用前景。尽管近年来已有诸多进展，目前仍缺乏一个完整的端到端流程，能够从单目 RGB 视频中对双手与未知物体的交互进行类别无关的重建。已有研究多处理限定场景中的手-物交互问题，例如已知的物体模板或仅有一只手参与交互。\n双手交互重建面临由双手与物体之间复杂交互带来的严重遮挡问题。为了解决这一问题，我们提出了 BIGS（Bimanual Interaction 3D Gaussian Splatting），一种可从单目视频中重建双手与未知物体的三维高斯表示的方法。\n为了在遮挡严重的情况下稳健地获取物体的高斯表示，我们结合了预训练扩散模型的先验知识与得分蒸馏采样（SDS）损失，以重建不可见的物体部分。针对手部高斯表示，我们利用手部模型（如 MANO）的三维先验，并为两只手共享一组高斯，从而在视角受限的条件下有效累积手部的三维信息。为了进一步考虑手与物体之间的三维对齐关系，我们在高斯优化过程中引入了交互体联合优化步骤。\n我们的方法在两个具有挑战性的数据集上达到了当前最优的性能，分别在三维手部姿态估计（MPJPE）、三维物体重建（CDh、CDo、F10）和渲染质量（PSNR、SSIM、LPIPS）等方面表现优异。\n"
  },
  {
    "path": "abs/2504.09129.md",
    "content": "### A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds\n\n3D Gaussian Splatting (3DGS) is a powerful reconstruction technique, but it needs to be initialized from accurate camera poses and high-fidelity point clouds. Typically, the initialization is taken from Structure-from-Motion (SfM) algorithms; however, SfM is time-consuming and restricts the application of 3DGS in real-world scenarios and large-scale scene reconstruction. We introduce a constrained optimization method for simultaneous camera pose estimation and 3D reconstruction that does not require SfM support. Core to our approach is decomposing a camera pose into a sequence of camera-to-(device-)center and (device-)center-to-world optimizations. To facilitate, we propose two optimization constraints conditioned to the sensitivity of each parameter group and restricts each parameter's search space. In addition, as we learn the scene geometry directly from the noisy point clouds, we propose geometric constraints to improve the reconstruction quality. Experiments demonstrate that the proposed method significantly outperforms the existing (multi-modal) 3DGS baseline and methods supplemented by COLMAP on both our collected dataset and two public benchmarks.\n\n3DGS是一种强大的三维重建技术，但其依赖于准确的相机位姿与高保真的点云进行初始化。通常，这些初始信息由 Structure-from-Motion（SfM）算法提供；然而，SfM 计算代价高，限制了 3DGS 在真实场景和大规模场景重建中的应用。\n我们提出了一种无需 SfM 支持的相机位姿估计与三维重建联合优化方法。该方法的核心思想是将相机位姿分解为相机到设备中心和设备中心到世界坐标系的两个子优化过程。为此，我们设计了两种优化约束机制，分别针对不同参数组的灵敏度，限定各自参数的搜索空间，从而提高优化的稳定性与效率。\n此外，考虑到我们直接从噪声较大的点云中学习场景几何，我们引入了几何约束以提升重建质量。\n实验证明，该方法在我们自采数据集和两个公开基准数据集上均显著优于现有的多模态 3DGS 基线方法及依赖 COLMAP 的替代方案。\n"
  },
  {
    "path": "abs/2504.09491.md",
    "content": "### DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering\n\nAlthough 3D Gaussian Splatting (3DGS) has demonstrated promising results in novel view synthesis, its performance degrades dramatically with sparse inputs and generates undesirable artifacts. As the number of training views decreases, the novel view synthesis task degrades to a highly under-determined problem such that existing methods suffer from the notorious overfitting issue. Interestingly, we observe that models with fewer Gaussian primitives exhibit less overfitting under sparse inputs. Inspired by this observation, we propose a Random Dropout Regularization (RDR) to exploit the advantages of low-complexity models to alleviate overfitting. In addition, to remedy the lack of high-frequency details for these models, an Edge-guided Splitting Strategy (ESS) is developed. With these two techniques, our method (termed DropoutGS) provides a simple yet effective plug-in approach to improve the generalization performance of existing 3DGS methods. Extensive experiments show that our DropoutGS produces state-of-the-art performance under sparse views on benchmark datasets including Blender, LLFF, and DTU.\n\n尽管 3D Gaussian Splatting（3DGS）在新视角合成任务中表现出良好效果，但在输入视角稀疏的情况下，其性能会急剧下降，并产生明显伪影。随着训练视角数量的减少，新视角合成问题变得高度不适定，现有方法普遍面临严重的过拟合问题。\n有趣的是，我们观察到：在稀疏输入下，使用较少高斯原语的模型过拟合现象更轻微。受此启发，我们提出了一种**随机丢弃正则化（Random Dropout Regularization, RDR）**方法，利用低复杂度模型的优势以缓解过拟合问题。\n此外，为补偿低复杂度模型在高频细节方面的不足，我们设计了一种边缘引导的分裂策略（Edge-guided Splitting Strategy, ESS）。结合这两项技术，我们提出的方法——DropoutGS，是一种简单但有效的插件式增强策略，能够提升现有 3DGS 方法在稀疏视角下的泛化能力。\n大量实验结果表明，DropoutGS 在多个基准数据集（包括 Blender、LLFF 和 DTU）上的稀疏视角设置下，均实现了当前最优性能。\n"
  },
  {
    "path": "abs/2504.09540.md",
    "content": "### EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler\n\nOnline 3D occupancy prediction provides a comprehensive spatial understanding of embodied environments. While the innovative EmbodiedOcc framework utilizes 3D semantic Gaussians for progressive indoor occupancy prediction, it overlooks the geometric characteristics of indoor environments, which are primarily characterized by planar structures. This paper introduces EmbodiedOcc++, enhancing the original framework with two key innovations: a Geometry-guided Refinement Module (GRM) that constrains Gaussian updates through plane regularization, along with a Semantic-aware Uncertainty Sampler (SUS) that enables more effective updates in overlapping regions between consecutive frames. GRM regularizes the position update to align with surface normals. It determines the adaptive regularization weight using curvature-based and depth-based constraints, allowing semantic Gaussians to align accurately with planar surfaces while adapting in complex regions. To effectively improve geometric consistency from different views, SUS adaptively selects proper Gaussians to update. Comprehensive experiments on the EmbodiedOcc-ScanNet benchmark demonstrate that EmbodiedOcc++ achieves state-of-the-art performance across different settings. Our method demonstrates improved edge accuracy and retains more geometric details while ensuring computational efficiency, which is essential for online embodied perception.\n\n在线三维占据预测对于具身环境中的空间理解具有重要意义。虽然 EmbodiedOcc 框架通过使用三维语义高斯实现了室内场景的渐进式占据预测，然而它忽略了室内环境中最核心的几何特征——以平面结构为主的空间构型。\n本文提出 EmbodiedOcc++，在原有方法的基础上引入两项关键创新，以增强其几何建模能力与更新效率。首先，**几何引导的细化模块（Geometry-guided Refinement Module, GRM）**通过平面正则化对高斯的位置更新进行约束，使其更贴合场景中的表面结构。该模块通过曲率与深度信息自适应地调节正则权重，使语义高斯既可精准对齐于平面区域，又具备在复杂几何中灵活调整的能力。其次，**语义感知的不确定性采样器（Semantic-aware Uncertainty Sampler, SUS）**用于在帧间重叠区域内自适应选择高斯进行更新，从而有效提升跨视角下的几何一致性。\n在 EmbodiedOcc-ScanNet 基准上的全面实验表明，EmbodiedOcc++ 在多种设置下均取得了当前最优性能。相较于现有方法，该方法在保持计算效率的前提下，显著提升了边缘结构的准确度与几何细节的保留能力，为在线具身感知提供了更精确而高效的解决方案。\n"
  },
  {
    "path": "abs/2504.09588.md",
    "content": "### TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting\n\nRecent advancements in Generalizable Gaussian Splatting have enabled robust 3D reconstruction from sparse input views by utilizing feed-forward Gaussian Splatting models, achieving superior cross-scene generalization. However, while many methods focus on geometric consistency, they often neglect the potential of text-driven guidance to enhance semantic understanding, which is crucial for accurately reconstructing fine-grained details in complex scenes. To address this limitation, we propose TextSplat--the first text-driven Generalizable Gaussian Splatting framework. By employing a text-guided fusion of diverse semantic cues, our framework learns robust cross-modal feature representations that improve the alignment of geometric and semantic information, producing high-fidelity 3D reconstructions. Specifically, our framework employs three parallel modules to obtain complementary representations: the Diffusion Prior Depth Estimator for accurate depth information, the Semantic Aware Segmentation Network for detailed semantic information, and the Multi-View Interaction Network for refined cross-view features. Then, in the Text-Guided Semantic Fusion Module, these representations are integrated via the text-guided and attention-based feature aggregation mechanism, resulting in enhanced 3D Gaussian parameters enriched with detailed semantic cues. Experimental results on various benchmark datasets demonstrate improved performance compared to existing methods across multiple evaluation metrics, validating the effectiveness of our framework.\n\n近年来，通用型高斯投影（Generalizable Gaussian Splatting）取得了显著进展，借助前馈式高斯投影模型，在稀疏视角输入下实现了鲁棒的三维重建，并展现出优越的跨场景泛化能力。然而，尽管许多方法侧重于几何一致性，它们往往忽略了文本引导在增强语义理解方面的潜力，而语义理解对于复杂场景中细粒度细节的准确重建至关重要。\n为了解决这一限制，我们提出了 TextSplat ——首个文本驱动的通用型高斯投影框架。该框架通过融合多种语义线索的文本引导方式，学习鲁棒的跨模态特征表示，从而提升几何信息与语义信息之间的对齐效果，生成高保真的三维重建结果。\n在框架设计上，我们引入了三个并行模块以获取互补特征表示：Diffusion Prior Depth Estimator 用于提供准确的深度信息，Semantic Aware Segmentation Network 捕捉细致的语义信息，Multi-View Interaction Network 则整合多视角特征。随后，在 Text-Guided Semantic Fusion Module 中，这些特征通过文本引导的注意力机制进行融合，从而生成富含语义细节的三维高斯参数。\n在多个基准数据集上的实验结果表明，与现有方法相比，我们的方法在多个评估指标上均取得了更优表现，验证了所提出框架的有效性。\n"
  },
  {
    "path": "abs/2504.09671.md",
    "content": "### LightHeadEd: Relightable & Editable Head Avatars from a Smartphone\n\nCreating photorealistic, animatable, and relightable 3D head avatars traditionally requires expensive Lightstage with multiple calibrated cameras, making it inaccessible for widespread adoption. To bridge this gap, we present a novel, cost-effective approach for creating high-quality relightable head avatars using only a smartphone equipped with polaroid filters. Our approach involves simultaneously capturing cross-polarized and parallel-polarized video streams in a dark room with a single point-light source, separating the skin's diffuse and specular components during dynamic facial performances. We introduce a hybrid representation that embeds 2D Gaussians in the UV space of a parametric head model, facilitating efficient real-time rendering while preserving high-fidelity geometric details. Our learning-based neural analysis-by-synthesis pipeline decouples pose and expression-dependent geometrical offsets from appearance, decomposing the surface into albedo, normal, and specular UV texture maps, along with the environment maps. We collect a unique dataset of various subjects performing diverse facial expressions and head movements.\n\n传统上，构建逼真、可驱动、可重光照的三维头部虚拟化身通常依赖于配备多台标定相机的昂贵 Lightstage 系统，使其难以被广泛应用。为弥合这一技术鸿沟，本文提出一种仅使用搭载偏振滤光片的智能手机即可实现的高质量、低成本重光照头像建模方法。\n该方法在暗室中设置单一光源，同时采集交叉偏振与平行偏振的视频流，从而在动态面部表演过程中分离出皮肤的漫反射与镜面反射成分。在此基础上，我们引入了一种混合表示形式，将二维高斯嵌入到参数化头部模型的 UV 空间中，实现了高保真的几何细节保留与高效实时渲染之间的良好平衡。\n我们还设计了一个基于神经分析-合成的学习框架，将姿态与表情相关的几何偏移与外观特征进行解耦，将表面外观进一步分解为反照率（albedo）、法线（normal）、镜面反射（specular）UV 纹理图以及环境贴图，从而实现更具物理一致性的渲染能力。\n此外，我们构建了一个包含多位参与者的独特数据集，涵盖丰富的面部表情与头部运动，为后续研究与应用提供了坚实的数据基础。\n"
  },
  {
    "path": "abs/2504.09878.md",
    "content": "### MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling\n\nNeural Radiance Field (NeRF) is widely known for high-fidelity novel view synthesis. However, even the state-of-the-art NeRF model, Gaussian Splatting, requires minutes for training, far from the real-time performance required by multimedia scenarios like telemedicine. One of the obstacles is its inefficient sampling, which is only partially addressed by existing works. Existing point-sampling algorithms uniformly sample simple-texture regions (easy to fit) and complex-texture regions (hard to fit), while existing ray-sampling algorithms sample these regions all in the finest granularity (i.e. the pixel level), both wasting GPU training resources. Actually, regions with different texture intensities require different sampling granularities. To this end, we propose a novel dynamic-resolution ray-sampling algorithm, MCBlock, which employs Monte Carlo Tree Search (MCTS) to partition each training image into pixel blocks with different sizes for active block-wise training. Specifically, the trees are initialized according to the texture of training images to boost the initialization speed, and an expansion/pruning module dynamically optimizes the block partition. MCBlock is implemented in Nerfstudio, an open-source toolset, and achieves a training acceleration of up to 2.33x, surpassing other ray-sampling algorithms. We believe MCBlock can apply to any cone-tracing NeRF model and contribute to the multimedia community.\n\n神经辐射场（Neural Radiance Field, NeRF）因其出色的新视角合成能力而广受关注。然而，即便是当前最先进的 NeRF 模型——Gaussian Splatting，其训练仍需数分钟，远未达到远程医疗等多媒体场景对实时性的要求。其中一个主要瓶颈在于其低效的采样策略，尽管已有方法对该问题进行了初步尝试，但仍未根本解决。\n现有点采样算法会在**纹理简单（易拟合）与纹理复杂（难拟合）**区域进行统一采样，而现有的射线采样算法则以最精细的粒度（像素级）对所有区域均匀采样，导致 GPU 训练资源浪费。事实上，不同纹理强度的区域应采用不同的采样粒度以提升效率。\n为此，我们提出了一种新颖的动态分辨率射线采样算法 MCBlock，该方法基于蒙特卡洛树搜索（Monte Carlo Tree Search, MCTS），将每张训练图像划分为不同大小的像素块，以实现主动式、分块级别的高效训练。具体而言，MCBlock 根据图像纹理初始化采样树以加速启动过程，并通过扩展/剪枝模块动态优化像素块划分策略。\n我们已在开源工具链 Nerfstudio 中实现 MCBlock，并在多个实验中达成最高 2.33 倍训练加速，显著优于现有射线采样算法。我们相信，MCBlock 可广泛适用于各类基于 cone-tracing 的 NeRF 模型，并为多媒体场景下的实时三维重建提供有效支撑。\n"
  },
  {
    "path": "abs/2504.10001.md",
    "content": "### GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting\n\nSingle-image 3D scene reconstruction presents significant challenges due to its inherently ill-posed nature and limited input constraints. Recent advances have explored two promising directions: multiview generative models that train on 3D consistent datasets but struggle with out-of-distribution generalization, and 3D scene inpainting and completion frameworks that suffer from cross-view inconsistency and suboptimal error handling, as they depend exclusively on depth data or 3D smoothness, which ultimately degrades output quality and computational performance. Building upon these approaches, we present GaussVideoDreamer, which advances generative multimedia approaches by bridging the gap between image, video, and 3D generation, integrating their strengths through two key innovations: (1) A progressive video inpainting strategy that harnesses temporal coherence for improved multiview consistency and faster convergence. (2) A 3D Gaussian Splatting consistency mask to guide the video diffusion with 3D consistent multiview evidence. Our pipeline combines three core components: a geometry-aware initialization protocol, Inconsistency-Aware Gaussian Splatting, and a progressive video inpainting strategy. Experimental results demonstrate that our approach achieves 32% higher LLaVA-IQA scores and at least 2x speedup compared to existing methods while maintaining robust performance across diverse scenes.\n\n单张图像的三维场景重建由于其本质上的不适定性和输入约束有限，始终面临巨大挑战。近期研究主要沿着两个方向展开探索：一类是基于三维一致性数据集训练的多视角生成模型，但其泛化能力在面对分布外样本时表现不佳；另一类是三维场景的补全与修复方法，但这类方法高度依赖深度信息或三维几何光滑性，易导致视角间不一致、误差处理能力不足，从而影响输出质量和计算性能。\n在此基础上，我们提出 GaussVideoDreamer，通过整合图像生成、视频生成与三维生成三者的优势，推进生成式多媒体技术的发展。该方法包含两项关键创新：（1）一种渐进式的视频修复策略，利用时间一致性提升多视角一致性并加速收敛；（2）一种三维高斯投影一致性掩码，用以为视频扩散过程提供三维一致的多视角证据。\n我们的流程整合了三项核心组件：几何感知的初始化策略、感知不一致性的高斯投影机制（Inconsistency-Aware Gaussian Splatting），以及渐进式的视频修复策略。实验结果表明，我们的方法在多个场景中均表现稳健，在保持生成质量的同时，LLaVA-IQA 分数提升了 32%，并在运行速度上至少实现了 2 倍加速，相较现有方法具有显著优势。\n"
  },
  {
    "path": "abs/2504.10012.md",
    "content": "### EBAD-Gaussian: Event-driven Bundle Adjusted Deblur Gaussian Splatting\n\nWhile 3D Gaussian Splatting (3D-GS) achieves photorealistic novel view synthesis, its performance degrades with motion blur. In scenarios with rapid motion or low-light conditions, existing RGB-based deblurring methods struggle to model camera pose and radiance changes during exposure, reducing reconstruction accuracy. Event cameras, capturing continuous brightness changes during exposure, can effectively assist in modeling motion blur and improving reconstruction quality. Therefore, we propose Event-driven Bundle Adjusted Deblur Gaussian Splatting (EBAD-Gaussian), which reconstructs sharp 3D Gaussians from event streams and severely blurred images. This method jointly learns the parameters of these Gaussians while recovering camera motion trajectories during exposure time. Specifically, we first construct a blur loss function by synthesizing multiple latent sharp images during the exposure time, minimizing the difference between real and synthesized blurred images. Then we use event stream to supervise the light intensity changes between latent sharp images at any time within the exposure period, supplementing the light intensity dynamic changes lost in RGB images. Furthermore, we optimize the latent sharp images at intermediate exposure times based on the event-based double integral (EDI) prior, applying consistency constraints to enhance the details and texture information of the reconstructed images. Extensive experiments on synthetic and real-world datasets show that EBAD-Gaussian can achieve high-quality 3D scene reconstruction under the condition of blurred images and event stream inputs.\n\n尽管 3D Gaussian Splatting（3D-GS）在新视角合成中实现了照片级真实感，但其性能在存在运动模糊的场景下显著下降。在快速运动或低光环境中，现有基于 RGB 的去模糊方法难以准确建模曝光期间的相机位姿和辐射变化，导致重建精度降低。\n事件相机能够在曝光过程中连续捕捉亮度变化，可有效辅助运动模糊建模并提升重建质量。为此，我们提出 EBAD-Gaussian（Event-driven Bundle Adjusted Deblur Gaussian Splatting），可在事件流与严重模糊图像的联合输入下，重建清晰的三维高斯表示。该方法在恢复曝光时间内相机运动轨迹的同时，联合优化三维高斯的各项参数。\n具体而言，我们首先通过在曝光时间段内合成多个潜在清晰图像，构建模糊损失函数，使真实模糊图像与合成模糊图像之间的差异最小化。随后，利用事件流对任意时刻的潜在清晰图像之间的亮度变化进行监督，从而补充 RGB 图像中缺失的动态亮度信息。此外，我们基于事件引导的双重积分（EDI）先验，在曝光过程中的中间时刻对潜在清晰图像进行优化，并引入一致性约束，以增强重建图像的细节与纹理信息。\n在多个合成及真实世界数据集上的大量实验表明，EBAD-Gaussian 能在模糊图像与事件流输入条件下，实现高质量的三维场景重建。\n"
  },
  {
    "path": "abs/2504.10316.md",
    "content": "### ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting\n\nIn recent years, significant advancements have been made in text-driven 3D content generation. However, several challenges remain. In practical applications, users often provide extremely simple text inputs while expecting high-quality 3D content. Generating optimal results from such minimal text is a difficult task due to the strong dependency of text-to-3D models on the quality of input prompts. Moreover, the generation process exhibits high variability, making it difficult to control. Consequently, multiple iterations are typically required to produce content that meets user expectations, reducing generation efficiency. To address this issue, we propose GPT-4V for self-optimization, which significantly enhances the efficiency of generating satisfactory content in a single attempt. Furthermore, the controllability of text-to-3D generation methods has not been fully explored. Our approach enables users to not only provide textual descriptions but also specify additional conditions, such as style, edges, scribbles, poses, or combinations of multiple conditions, allowing for more precise control over the generated 3D content. Additionally, during training, we effectively integrate multi-view information, including multi-view depth, masks, features, and images, to address the common Janus problem in 3D content generation. Extensive experiments demonstrate that our method achieves robust generalization, facilitating the efficient and controllable generation of high-quality 3D content.\n\n近年来，文本驱动的三维内容生成取得了显著进展。然而，该领域仍面临诸多挑战。在实际应用中，用户往往只提供极其简洁的文本输入，却期望获得高质量的三维内容。由于文本到三维模型对输入提示词质量高度依赖，因此在仅有最小文本输入的情况下生成理想结果是一项极具挑战性的任务。此外，生成过程本身具有高度的不确定性，缺乏可控性，常常需要多次迭代才能产出符合预期的内容，从而降低了生成效率。\n为应对上述问题，我们提出了 GPT-4V 自优化机制，显著提升了单次生成即获得满意结果的效率。与此同时，现有方法在三维生成的可控性方面探索不足。我们的方法允许用户除文本描述外，还可进一步指定风格、边缘、草图、姿态等附加条件，或组合多种约束，实现对三维生成内容的更精细控制。\n此外，在训练过程中，我们有效整合了多视角信息，包括深度、掩码、特征以及图像，从而缓解了三维生成中常见的“Janus 问题”（即模型生成结果在不同视角下的不一致性）。\n大量实验证明，我们的方法具备强泛化能力，可实现高效、可控的高质量三维内容生成。\n"
  },
  {
    "path": "abs/2504.10331.md",
    "content": "### LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis\n\nNovel view synthesis (NVS) in low-light scenes remains a significant challenge due to degraded inputs characterized by severe noise, low dynamic range (LDR) and unreliable initialization. While recent NeRF-based approaches have shown promising results, most suffer from high computational costs, and some rely on carefully captured or pre-processed data--such as RAW sensor inputs or multi-exposure sequences--which severely limits their practicality. In contrast, 3D Gaussian Splatting (3DGS) enables real-time rendering with competitive visual fidelity; however, existing 3DGS-based methods struggle with low-light sRGB inputs, resulting in unstable Gaussian initialization and ineffective noise suppression. To address these challenges, we propose LL-Gaussian, a novel framework for 3D reconstruction and enhancement from low-light sRGB images, enabling pseudo normal-light novel view synthesis. Our method introduces three key innovations: 1) an end-to-end Low-Light Gaussian Initialization Module (LLGIM) that leverages dense priors from learning-based MVS approach to generate high-quality initial point clouds; 2) a dual-branch Gaussian decomposition model that disentangles intrinsic scene properties (reflectance and illumination) from transient interference, enabling stable and interpretable optimization; 3) an unsupervised optimization strategy guided by both physical constrains and diffusion prior to jointly steer decomposition and enhancement. Additionally, we contribute a challenging dataset collected in extreme low-light environments and demonstrate the effectiveness of LL-Gaussian. Compared to state-of-the-art NeRF-based methods, LL-Gaussian achieves up to 2,000 times faster inference and reduces training time to just 2%, while delivering superior reconstruction and rendering quality.\n\n低光照场景中的新视角合成（Novel View Synthesis, NVS）由于输入图像存在严重噪声、低动态范围（LDR）以及初始化不可靠等问题，仍是一项极具挑战性的任务。尽管近年来基于 NeRF 的方法取得了一定进展，但大多数方法计算开销巨大，且部分依赖精心采集或预处理的数据（例如 RAW 传感器输入或多曝光序列），这在实际应用中极大限制了其实用性。\n相比之下，3D Gaussian Splatting（3DGS）具备实时渲染能力，且可实现具有竞争力的视觉保真度。然而，现有基于 3DGS 的方法在面对低光照 sRGB 输入时表现不佳，常导致高斯初始化不稳定、噪声抑制无效等问题。\n为应对上述挑战，我们提出了 LL-Gaussian，一种用于从低光照 sRGB 图像中进行三维重建与增强的新型框架，能够实现类正常光照条件下的新视角合成。该方法引入三项核心创新：通过端到端的低光照高斯初始化模块（LLGIM）借助学习驱动的多视角立体方法中的密集先验生成高质量初始点云；采用双分支的高斯分解模型，将场景的固有属性（如反射率与光照）与瞬时干扰进行解耦，优化过程稳定且具可解释性；利用结合物理约束与扩散模型先验的无监督优化策略，协同引导分解与增强过程。\n此外，我们还构建了一个极端低光环境下采集的高挑战性数据集，并验证了 LL-Gaussian 的有效性。与现有最先进的 NeRF 方法相比，LL-Gaussian 在保持优越重建与渲染质量的同时，实现了高达 2000 倍的推理速度提升，训练时间缩短至仅 2%。\n"
  },
  {
    "path": "abs/2504.10486.md",
    "content": "### DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting\n\nCreating relightable and animatable human avatars from monocular videos is a rising research topic with a range of applications, e.g. virtual reality, sports, and video games. Previous works utilize neural fields together with physically based rendering (PBR), to estimate geometry and disentangle appearance properties of human avatars. However, one drawback of these methods is the slow rendering speed due to the expensive Monte Carlo ray tracing. To tackle this problem, we proposed to distill the knowledge from implicit neural fields (teacher) to explicit 2D Gaussian splatting (student) representation to take advantage of the fast rasterization property of Gaussian splatting. To avoid ray-tracing, we employ the split-sum approximation for PBR appearance. We also propose novel part-wise ambient occlusion probes for shadow computation. Shadow prediction is achieved by querying these probes only once per pixel, which paves the way for real-time relighting of avatars. These techniques combined give high-quality relighting results with realistic shadow effects. Our experiments demonstrate that the proposed student model achieves comparable or even better relighting results with our teacher model while being 370 times faster at inference time, achieving a 67 FPS rendering speed.\n\n从单目视频中构建可重光照、可驱动的人体虚拟化身，已成为当前备受关注的研究方向，广泛应用于虚拟现实、体育分析与电子游戏等场景。已有方法通常结合神经场与基于物理的渲染（Physically Based Rendering, PBR）框架，以估计人体几何并解耦外观属性。然而，这些方法普遍面临渲染速度缓慢的问题，原因在于其依赖计算开销极高的蒙特卡洛光线追踪。\n为解决这一问题，我们提出将隐式神经场（teacher）中的知识蒸馏至显式的二维高斯投影（student）表示中，充分利用 Gaussian Splatting 的快速光栅化特性，实现高效渲染。为避免光线追踪，我们采用了 split-sum 近似算法以完成 PBR 外观建模，并提出了新颖的 部位级环境光遮蔽探针用于阴影计算。阴影预测仅需对每个像素查询一次这些探针，极大提升了实时重光照能力。\n上述技术相结合，实现了具有逼真阴影效果的高质量重光照效果。实验表明，我们提出的 student 模型在保有与 teacher 模型相当甚至更佳的重光照表现的同时，推理速度提升达 370 倍，达到 67 FPS 实时渲染帧率，为可重光照虚拟人技术的实用化迈出了关键一步。\n"
  },
  {
    "path": "abs/2504.10809.md",
    "content": "### GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR\n\nWe present GaSLight, a method that generates spatially-varying lighting from regular images. Our method proposes using HDR Gaussian Splats as light source representation, marking the first time regular images can serve as light sources in a 3D renderer. Our two-stage process first enhances the dynamic range of images plausibly and accurately by leveraging the priors embedded in diffusion models. Next, we employ Gaussian Splats to model 3D lighting, achieving spatially variant lighting. Our approach yields state-of-the-art results on HDR estimations and their applications in illuminating virtual objects and scenes. To facilitate the benchmarking of images as light sources, we introduce a novel dataset of calibrated and unsaturated HDR to evaluate images as light sources. We assess our method using a combination of this novel dataset and an existing dataset from the literature.\n\n我们提出了 GaSLight，一种可从普通图像生成空间可变光照的方法。该方法首次将 HDR 高斯投影（HDR Gaussian Splats） 作为光源表示，使得普通图像得以在三维渲染器中直接作为光源使用。\n我们的方法采用两阶段流程：第一阶段通过利用扩散模型中蕴含的先验信息，对图像的动态范围进行可信且准确的增强；第二阶段则使用高斯投影对三维光照进行建模，从而实现空间可变的照明效果。\n在高动态范围估计及其在虚拟物体与场景照明中的应用方面，我们的方法取得了当前最优性能。为推动“图像作为光源”这一研究方向的评估标准建设，我们还构建了一个全新的标定且未过曝的 HDR 数据集，用于评估图像作为光源的表现。我们的方法通过该新数据集与已有文献中的公开数据集相结合进行评估，验证了其有效性。\n"
  },
  {
    "path": "abs/2504.11003.md",
    "content": "### 3D Gabor Splatting: Reconstruction of High-frequency Surface Texture using Gabor Noise\n\n3D Gaussian splatting has experienced explosive popularity in the past few years in the field of novel view synthesis. The lightweight and differentiable representation of the radiance field using the Gaussian enables rapid and high-quality reconstruction and fast rendering. However, reconstructing objects with high-frequency surface textures (e.g., fine stripes) requires many skinny Gaussian kernels because each Gaussian represents only one color if viewed from one direction. Thus, reconstructing the stripes pattern, for example, requires Gaussians for at least the number of stripes. We present 3D Gabor splatting, which augments the Gaussian kernel to represent spatially high-frequency signals using Gabor noise. The Gabor kernel is a combination of a Gaussian term and spatially fluctuating wave functions, making it suitable for representing spatial high-frequency texture. We demonstrate that our 3D Gabor splatting can reconstruct various high-frequency textures on the objects.\n\n近年来，3D Gaussian Splatting 在新视角合成领域迅速走红。其通过高斯对辐射场进行轻量且可微分的表示，使得三维重建快速且高质量，渲染效率也大幅提升。然而，对于具有高频表面纹理（如细条纹）物体的重建仍存在挑战：由于每个高斯在单一视角下只能表示一个颜色，因此要重建条纹图案，至少需要数量与条纹数量相当的高斯核，这在表现细节时极为低效。\n为解决这一问题，我们提出 3D Gabor Splatting，通过引入 Gabor 噪声对高斯核进行增强，使其能够表达空间高频信号。Gabor 核由一个高斯项与一个具有空间波动性的波函数组成，因而特别适合表示空间高频纹理。\n实验证明，3D Gabor Splatting 能够有效重建物体表面的多种高频纹理结构。\n"
  },
  {
    "path": "abs/2504.11024.md",
    "content": "### Easy3D: A Simple Yet Effective Method for 3D Interactive Segmentation\n\nThe increasing availability of digital 3D environments, whether through image-based 3D reconstruction, generation, or scans obtained by robots, is driving innovation across various applications. These come with a significant demand for 3D interaction, such as 3D Interactive Segmentation, which is useful for tasks like object selection and manipulation. Additionally, there is a persistent need for solutions that are efficient, precise, and performing well across diverse settings, particularly in unseen environments and with unfamiliar objects. In this work, we introduce a 3D interactive segmentation method that consistently surpasses previous state-of-the-art techniques on both in-domain and out-of-domain datasets. Our simple approach integrates a voxel-based sparse encoder with a lightweight transformer-based decoder that implements implicit click fusion, achieving superior performance and maximizing efficiency. Our method demonstrates substantial improvements on benchmark datasets, including ScanNet, ScanNet++, S3DIS, and KITTI-360, and also on unseen geometric distributions such as the ones obtained by Gaussian Splatting.\n\n随着图像驱动的三维重建、三维生成以及机器人扫描等技术的发展，数字三维环境的可获取性不断提升，推动了多个应用领域的创新。这类环境伴随着对三维交互的强烈需求，其中**三维交互式分割（3D Interactive Segmentation）**在对象选择与操作等任务中尤为重要。同时，实际应用中对高效、精确、且具备跨场景泛化能力的解决方案始终有着迫切需求，特别是在未见环境与未知物体的场景下。\n在本工作中，我们提出了一种三维交互式分割方法，在同域与异域数据集上均显著优于现有最先进技术。该方法结构简洁，将基于体素的稀疏编码器与轻量级的 Transformer 解码器结合，通过隐式点击融合机制实现高效的信息集成，从而兼顾性能与效率。\n我们的方法在多个基准数据集（包括 ScanNet、ScanNet++、S3DIS 和 KITTI-360）上取得了显著性能提升，同时也能有效处理如 Gaussian Splatting 等带有新型几何分布的数据，展现出强泛化能力。\n"
  },
  {
    "path": "abs/2504.11218.md",
    "content": "### 3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians\n\n3D affordance reasoning is essential in associating human instructions with the functional regions of 3D objects, facilitating precise, task-oriented manipulations in embodied AI. However, current methods, which predominantly depend on sparse 3D point clouds, exhibit limited generalizability and robustness due to their sensitivity to coordinate variations and the inherent sparsity of the data. By contrast, 3D Gaussian Splatting (3DGS) delivers high-fidelity, real-time rendering with minimal computational overhead by representing scenes as dense, continuous distributions. This positions 3DGS as a highly effective approach for capturing fine-grained affordance details and improving recognition accuracy. Nevertheless, its full potential remains largely untapped due to the absence of large-scale, 3DGS-specific affordance datasets. To overcome these limitations, we present 3DAffordSplat, the first large-scale, multi-modal dataset tailored for 3DGS-based affordance reasoning. This dataset includes 23,677 Gaussian instances, 8,354 point cloud instances, and 6,631 manually annotated affordance labels, encompassing 21 object categories and 18 affordance types. Building upon this dataset, we introduce AffordSplatNet, a novel model specifically designed for affordance reasoning using 3DGS representations. AffordSplatNet features an innovative cross-modal structure alignment module that exploits structural consistency priors to align 3D point cloud and 3DGS representations, resulting in enhanced affordance recognition accuracy. Extensive experiments demonstrate that the 3DAffordSplat dataset significantly advances affordance learning within the 3DGS domain, while AffordSplatNet consistently outperforms existing methods across both seen and unseen settings, highlighting its robust generalization capabilities.\n\n三维可供性推理（3D affordance reasoning）在将人类指令与三维物体的功能区域关联方面至关重要，可为具身智能中的任务导向型操控提供精确支持。然而，现有方法主要依赖稀疏的三维点云输入，因其对坐标变化高度敏感且数据本身稀疏，导致泛化能力和鲁棒性较弱。\n相比之下，3D Gaussian Splatting（3DGS）通过将场景表示为稠密的连续分布，能够以极低的计算开销实现高保真、实时渲染。这使得 3DGS 在捕捉细粒度可供性信息与提升识别精度方面具备天然优势。然而，由于缺乏大规模、专用于 3DGS 的可供性数据集，这一潜力尚未被充分挖掘。\n为解决这一问题，我们提出 3DAffordSplat，首个专为基于 3DGS 的可供性推理任务设计的大规模多模态数据集。该数据集包含 23,677 个高斯实例、8,354 个点云实例与 6,631 条人工标注的可供性标签，覆盖 21 个物体类别和 18 种可供性类型。\n基于该数据集，我们进一步提出了 AffordSplatNet，一款专门面向 3DGS 表示的可供性推理模型。AffordSplatNet 设计了创新性的跨模态结构对齐模块，利用结构一致性先验对点云与高斯表示进行对齐，从而提升可供性识别的准确率。\n大量实验表明，3DAffordSplat 数据集显著推动了 3DGS 领域内的可供性学习研究，而 AffordSplatNet 则在已见与未见场景中均优于现有方法，展现出卓越的泛化能力。\n"
  },
  {
    "path": "abs/2504.11893.md",
    "content": "### CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting\n\nOpen-vocabulary 3D scene understanding is crucial for applications requiring natural language-driven spatial interpretation, such as robotics and augmented reality. While 3D Gaussian Splatting (3DGS) offers a powerful representation for scene reconstruction, integrating it with open-vocabulary frameworks reveals a key challenge: cross-view granularity inconsistency. This issue, stemming from 2D segmentation methods like SAM, results in inconsistent object segmentations across views (e.g., a \"coffee set\" segmented as a single entity in one view but as \"cup + coffee + spoon\" in another). Existing 3DGS-based methods often rely on isolated per-Gaussian feature learning, neglecting the spatial context needed for cohesive object reasoning, leading to fragmented representations. We propose Context-Aware Gaussian Splatting (CAGS), a novel framework that addresses this challenge by incorporating spatial context into 3DGS. CAGS constructs local graphs to propagate contextual features across Gaussians, reducing noise from inconsistent granularity, employs mask-centric contrastive learning to smooth SAM-derived features across views, and leverages a precomputation strategy to reduce computational cost by precomputing neighborhood relationships, enabling efficient training in large-scale scenes. By integrating spatial context, CAGS significantly improves 3D instance segmentation and reduces fragmentation errors on datasets like LERF-OVS and ScanNet, enabling robust language-guided 3D scene understanding.\n\n开放词汇的三维场景理解对于自然语言驱动的空间解析任务至关重要，广泛应用于机器人导航、增强现实等场景。虽然 3D Gaussian Splatting（3DGS）作为强大的三维重建表示形式已取得显著成果，但将其与开放词汇框架结合时，面临一个核心挑战：跨视角的粒度不一致性。这一问题源自诸如 SAM 等 2D 分割方法，在不同视角下对同一物体的划分不一致，例如在某一视角中“咖啡具”被分为一个整体，而在另一视角中则被细分为“杯子 + 咖啡 + 勺子”。\n现有基于 3DGS 的方法多采用单独的高斯特征学习，忽略了进行完整物体理解所需的空间上下文，从而导致表示碎片化的问题。\n为应对此挑战，我们提出 Context-Aware Gaussian Splatting（CAGS），一个引入空间上下文的 3DGS 新框架。CAGS 通过构建局部图结构在高斯之间传播上下文特征，从而缓解由分割粒度不一致引入的噪声；同时，引入以掩码为中心的对比学习机制，平滑 SAM 分割在不同视角间的特征表达；此外，CAGS 还采用一种预计算策略提前建立高斯邻接关系，显著降低大规模场景下的训练开销。\n通过引入空间上下文，CAGS 在 LERF-OVS 与 ScanNet 等数据集上显著提升了三维实例分割效果，有效减少碎片化错误，实现了更加稳健的语言引导型三维场景理解。\n"
  },
  {
    "path": "abs/2504.12292.md",
    "content": "### SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians\n\nAccurate, real-time 3D reconstruction of human heads from monocular images and videos underlies numerous visual applications. As 3D ground truth data is hard to come by at scale, previous methods have sought to learn from abundant 2D videos in a self-supervised manner. Typically, this involves the use of differentiable mesh rendering, which is effective but faces limitations. To improve on this, we propose SHeaP (Self-supervised Head Geometry Predictor Learned via 2D Gaussians). Given a source image, we predict a 3DMM mesh and a set of Gaussians that are rigged to this mesh. We then reanimate this rigged head avatar to match a target frame, and backpropagate photometric losses to both the 3DMM and Gaussian prediction networks. We find that using Gaussians for rendering substantially improves the effectiveness of this self-supervised approach. Training solely on 2D data, our method surpasses existing self-supervised approaches in geometric evaluations on the NoW benchmark for neutral faces and a new benchmark for non-neutral expressions. Our method also produces highly expressive meshes, outperforming state-of-the-art in emotion classification.\n\n从单目图像和视频中实现精准、实时的三维人头重建是众多视觉应用的基础。然而，由于大规模三维标注数据难以获取，现有方法通常依赖自监督学习，从大量二维视频中学习三维结构，常用技术包括可微分的网格渲染。尽管这一方案在一定程度上取得了成功，但仍面临表达能力受限等问题。\n为此，我们提出 SHeaP，一种基于二维高斯渲染的自监督人头几何学习方法。在输入一张源图像后，模型会预测一个三维形变模型（3D Morphable Model, 3DMM）网格及其绑定的一组二维高斯表示。随后，我们将该绑定人头化身重新驱动，使其与目标帧中的人脸姿态一致，并通过光度损失对 3DMM 和高斯预测网络进行联合反向传播。\n我们发现，相较于传统网格渲染，使用二维高斯进行渲染可显著提升自监督训练的效果。在仅使用二维数据进行训练的设定下，SHeaP 在中性面孔的 NoW 基准测试和一个涵盖非中性表情的新测试集上，均在几何评估指标上优于现有自监督方法。同时，我们方法生成的人头网格具有更强的表情表现力，在情绪识别任务中也超越当前最先进技术，展现出对细腻面部变化的高保真建模能力。\n"
  },
  {
    "path": "abs/2504.12788.md",
    "content": "### ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior\n\nDrag-driven editing has become popular among designers for its ability to modify complex geometric structures through simple and intuitive manipulation, allowing users to adjust and reshape content with minimal technical skill. This drag operation has been incorporated into numerous methods to facilitate the editing of 2D images and 3D meshes in design. However, few studies have explored drag-driven editing for the widely-used 3D Gaussian Splatting (3DGS) representation, as deforming 3DGS while preserving shape coherence and visual continuity remains challenging. In this paper, we introduce ARAP-GS, a drag-driven 3DGS editing framework based on As-Rigid-As-Possible (ARAP) deformation. Unlike previous 3DGS editing methods, we are the first to apply ARAP deformation directly to 3D Gaussians, enabling flexible, drag-driven geometric transformations. To preserve scene appearance after deformation, we incorporate an advanced diffusion prior for image super-resolution within our iterative optimization process. This approach enhances visual quality while maintaining multi-view consistency in the edited results. Experiments show that ARAP-GS outperforms current methods across diverse 3D scenes, demonstrating its effectiveness and superiority for drag-driven 3DGS editing. Additionally, our method is highly efficient, requiring only 10 to 20 minutes to edit a scene on a single RTX 3090 GPU.\n\n拖拽驱动的编辑因其操作简单、直观，能有效修改复杂几何结构，受到设计师的广泛欢迎，使用户几乎无需专业技能即可调整和重塑内容。这一交互方式已被广泛应用于 2D 图像和 3D 网格的设计编辑中。然而，对于目前广泛应用的 3D Gaussian Splatting（3DGS）表示形式，基于拖拽的编辑尚鲜有探索，因为在编辑过程中同时保持形状一致性与视觉连续性仍具挑战。\n本文提出 ARAP-GS，一种基于**尽可能刚性（As-Rigid-As-Possible, ARAP）**变形的拖拽式 3DGS 编辑框架。与现有 3DGS 编辑方法不同，我们首次将 ARAP 变形直接应用于三维高斯表示，从而实现灵活、精确的拖拽式几何变换。\n为保持变形后场景的外观一致性，我们在迭代优化过程中引入了先进的扩散模型先验用于图像超分辨重建，在提升视觉质量的同时确保编辑结果的多视角一致性。\n实验结果表明，ARAP-GS 在多个多样化三维场景上均优于现有方法，验证了其在拖拽式 3DGS 编辑任务中的有效性与先进性。同时，该方法效率极高，在单张 RTX 3090 GPU 上仅需 10 到 20 分钟即可完成场景编辑。\n"
  },
  {
    "path": "abs/2504.12799.md",
    "content": "### TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors\n\nReconstructing transparent surfaces is essential for tasks such as robotic manipulation in labs, yet it poses a significant challenge for 3D reconstruction techniques like 3D Gaussian Splatting (3DGS). These methods often encounter a transparency-depth dilemma, where the pursuit of photorealistic rendering through standard α-blending undermines geometric precision, resulting in considerable depth estimation errors for transparent materials. To address this issue, we introduce Transparent Surface Gaussian Splatting (TSGS), a new framework that separates geometry learning from appearance refinement. In the geometry learning stage, TSGS focuses on geometry by using specular-suppressed inputs to accurately represent surfaces. In the second stage, TSGS improves visual fidelity through anisotropic specular modeling, crucially maintaining the established opacity to ensure geometric accuracy. To enhance depth inference, TSGS employs a first-surface depth extraction method. This technique uses a sliding window over α-blending weights to pinpoint the most likely surface location and calculates a robust weighted average depth. To evaluate the transparent surface reconstruction task under realistic conditions, we collect a TransLab dataset that includes complex transparent laboratory glassware. Extensive experiments on TransLab show that TSGS achieves accurate geometric reconstruction and realistic rendering of transparent objects simultaneously within the efficient 3DGS framework. Specifically, TSGS significantly surpasses current leading methods, achieving a 37.3% reduction in chamfer distance and an 8.0% improvement in F1 score compared to the top baseline.\n\n重建透明表面对于实验室中的机器人操作等任务至关重要，但这对诸如 3D Gaussian Splatting（3DGS）等三维重建技术而言是一大挑战。此类方法常面临透明度-深度两难问题：通过标准 α 混合追求照片级真实感渲染的同时，会削弱几何精度，导致透明材质的深度估计误差显著。\n为解决这一问题，我们提出了 Transparent Surface Gaussian Splatting（TSGS），一个将几何学习与外观优化解耦的三维重建新框架。在几何学习阶段，TSGS 通过使用抑制高光的输入，聚焦于表面几何的精确表达。在随后的外观优化阶段，TSGS 引入各向异性高光建模以提升视觉真实感，同时保持已确定的不透明度，从而确保几何精度不被破坏。\n为增强深度推理能力，TSGS 设计了一种首层表面深度提取方法：通过在 α 混合权重上滑动窗口，定位最可能的表面位置，并计算稳健的加权平均深度。\n为了在真实条件下评估透明表面重建性能，我们构建了 TransLab 数据集，涵盖复杂的实验室透明玻璃器皿。大量实验证明，TSGS 能在高效的 3DGS 框架下，同时实现透明物体的高精度几何重建与真实感渲染。在 TransLab 上，TSGS 相较当前最优方法实现了 37.3% 的 Chamfer 距离下降与 8.0% 的 F1 分数提升，显著优于主流基线。\n"
  },
  {
    "path": "abs/2504.12800.md",
    "content": "### CAGE-GS: High-fidelity Cage Based 3D Gaussian Splatting Deformation\n\nAs 3D Gaussian Splatting (3DGS) gains popularity as a 3D representation of real scenes, enabling user-friendly deformation to create novel scenes while preserving fine details from the original 3DGS has attracted significant research attention. We introduce CAGE-GS, a cage-based 3DGS deformation method that seamlessly aligns a source 3DGS scene with a user-defined target shape. Our approach learns a deformation cage from the target, which guides the geometric transformation of the source scene. While the cages effectively control structural alignment, preserving the textural appearance of 3DGS remains challenging due to the complexity of covariance parameters. To address this, we employ a Jacobian matrix-based strategy to update the covariance parameters of each Gaussian, ensuring texture fidelity post-deformation. Our method is highly flexible, accommodating various target shape representations, including texts, images, point clouds, meshes and 3DGS models. Extensive experiments and ablation studies on both public datasets and newly proposed scenes demonstrate that our method significantly outperforms existing techniques in both efficiency and deformation quality.\n\n随着 3D Gaussian Splatting（3DGS）日益成为真实场景的主流三维表示，如何在保留原始细节的同时实现用户友好的变形以生成新场景，成为研究热点。我们提出 CAGE-GS，一种基于笼形结构的 3DGS 变形方法，可将源 3DGS 场景无缝对齐到用户定义的目标形状上。\n我们的方法从目标形状中学习一个变形笼（deformation cage），该笼体用于引导源场景的几何变换。虽然笼体在结构对齐方面具有良好的控制能力，但由于高斯协方差参数的复杂性，保持 3DGS 的纹理外观仍是一项挑战。为此，我们提出基于 Jacobian 矩阵 的策略，用于更新每个高斯的协方差参数，确保变形后的纹理保真度。\n该方法具有高度灵活性，支持多种目标形状表示形式，包括文本、图像、点云、网格以及 3DGS 模型。在多个公开数据集及新构建场景上的广泛实验证明，我们的方法在变形效率与变形质量上均显著优于现有技术。\n"
  },
  {
    "path": "abs/2504.12811.md",
    "content": "### AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering\n\nAlthough 3D Gaussian Splatting (3DGS) has revolutionized 3D reconstruction, it still faces challenges such as aliasing, projection artifacts, and view inconsistencies, primarily due to the simplification of treating splats as 2D entities. We argue that incorporating full 3D evaluation of Gaussians throughout the 3DGS pipeline can effectively address these issues while preserving rasterization efficiency. Specifically, we introduce an adaptive 3D smoothing filter to mitigate aliasing and present a stable view-space bounding method that eliminates popping artifacts when Gaussians extend beyond the view frustum. Furthermore, we promote tile-based culling to 3D with screen-space planes, accelerating rendering and reducing sorting costs for hierarchical rasterization. Our method achieves state-of-the-art quality on in-distribution evaluation sets and significantly outperforms other approaches for out-of-distribution views. Our qualitative evaluations further demonstrate the effective removal of aliasing, distortions, and popping artifacts, ensuring real-time, artifact-free rendering.\n\n尽管 3D Gaussian Splatting（3DGS）在三维重建领域引发了重大变革，但其仍面临锯齿伪影、投影失真以及视角不一致等问题，主要源于将投影单元简化为二维实体的处理方式。我们认为，在整个 3DGS 渲染管线中引入完整的三维高斯评估机制，可以在保持光栅化效率的同时，有效缓解上述问题。\n为此，我们提出了一种自适应三维平滑滤波器以降低锯齿现象，并引入了一种稳定的视空间包围方法，用于消除高斯核超出视锥时出现的“跳变伪影”。此外，我们还将基于图块的剔除策略扩展至三维空间，通过结合屏幕空间平面进行裁剪，从而提升渲染速度并降低层级光栅化中的排序开销。\n在同分布评估数据集上，我们的方法取得了当前最优的重建质量，并在非同分布视角条件下显著优于现有方法。定性评估进一步表明，我们的方法能够有效去除锯齿、畸变及跳变伪影，实现真实时间、无伪影的高质量渲染效果。\n"
  },
  {
    "path": "abs/2504.12905.md",
    "content": "### Second-order Optimization of Gaussian Splats with Importance Sampling\n\n3D Gaussian Splatting (3DGS) is widely used for novel view synthesis due to its high rendering quality and fast inference time. However, 3DGS predominantly relies on first-order optimizers such as Adam, which leads to long training times. To address this limitation, we propose a novel second-order optimization strategy based on Levenberg-Marquardt (LM) and Conjugate Gradient (CG), which we specifically tailor towards Gaussian Splatting. Our key insight is that the Jacobian in 3DGS exhibits significant sparsity since each Gaussian affects only a limited number of pixels. We exploit this sparsity by proposing a matrix-free and GPU-parallelized LM optimization. To further improve its efficiency, we propose sampling strategies for both the camera views and loss function and, consequently, the normal equation, significantly reducing the computational complexity. In addition, we increase the convergence rate of the second-order approximation by introducing an effective heuristic to determine the learning rate that avoids the expensive computation cost of line search methods. As a result, our method achieves a 3× speedup over standard LM and outperforms Adam by  6× when the Gaussian count is low while remaining competitive for moderate counts. Project Page: this https URL\n\n3D Gaussian Splatting（3DGS）因其出色的渲染质量与快速的推理速度，被广泛应用于新视角合成任务。然而，现有 3DGS 主要依赖 Adam 等一阶优化器，训练过程耗时较长，限制了其实用性。\n为解决这一问题，我们提出了一种基于 Levenberg-Marquardt（LM）与共轭梯度法（Conjugate Gradient, CG）的新型二阶优化策略，并针对 Gaussian Splatting 的特性进行了专门设计。我们核心的观察是：在 3DGS 中，Jacobian 矩阵呈现出高度稀疏性，因为每个高斯仅影响有限数量的像素。基于这一特点，我们提出了一种无需显式构建矩阵的 GPU 并行 LM 优化方法，以充分利用稀疏结构。\n为进一步提升效率，我们设计了对相机视角和损失函数的采样策略，并据此简化法方程的构建，显著降低了计算复杂度。此外，为加快二阶优化的收敛速度，我们引入了一种高效启发式策略用于学习率估计，从而避免传统线性搜索方法所带来的高计算开销。\n实验结果表明：在高斯数量较低的设置下，我们的方法相较标准 LM 提速达 3 倍，相比 Adam 实现了 6 倍加速；在中等数量的高斯设置下仍保持具有竞争力的性能。\n"
  },
  {
    "path": "abs/2504.12999.md",
    "content": "### GSAC: Leveraging Gaussian Splatting for Photorealistic Avatar Creation with Unity Integration\n\nPhotorealistic avatars have become essential for immersive applications in virtual reality (VR) and augmented reality (AR), enabling lifelike interactions in areas such as training simulations, telemedicine, and virtual collaboration. These avatars bridge the gap between the physical and digital worlds, improving the user experience through realistic human representation. However, existing avatar creation techniques face significant challenges, including high costs, long creation times, and limited utility in virtual applications. Manual methods, such as MetaHuman, require extensive time and expertise, while automatic approaches, such as NeRF-based pipelines often lack efficiency, detailed facial expression fidelity, and are unable to be rendered at a speed sufficent for real-time applications. By involving several cutting-edge modern techniques, we introduce an end-to-end 3D Gaussian Splatting (3DGS) avatar creation pipeline that leverages monocular video input to create a scalable and efficient photorealistic avatar directly compatible with the Unity game engine. Our pipeline incorporates a novel Gaussian splatting technique with customized preprocessing that enables the user of \"in the wild\" monocular video capture, detailed facial expression reconstruction and embedding within a fully rigged avatar model. Additionally, we present a Unity-integrated Gaussian Splatting Avatar Editor, offering a user-friendly environment for VR/AR application development. Experimental results validate the effectiveness of our preprocessing pipeline in standardizing custom data for 3DGS training and demonstrate the versatility of Gaussian avatars in Unity, highlighting the scalability and practicality of our approach.\n\n逼真写实的虚拟化身（Photorealistic avatars）已成为虚拟现实（VR）与增强现实（AR）等沉浸式应用中的关键组成部分，广泛应用于训练模拟、远程医疗、虚拟协作等场景，赋予数字空间中更具真实感的人际互动体验。通过对人类形象的逼真重建，这类虚拟化身有效连接了物理世界与数字世界，显著提升了用户沉浸感与交互质量。\n然而，现有的虚拟化身生成技术仍面临诸多挑战，如成本高昂、制作周期长、以及在虚拟应用中的适用性有限。手工构建方案（如 MetaHuman）依赖专业人员、制作复杂，而自动化方案（如基于 NeRF 的方法）则常常存在效率低下、面部表情精细度不足，以及无法满足实时渲染需求等问题。\n为此，我们提出了一个端到端的 3D Gaussian Splatting（3DGS）化身生成流程，支持从单目视频输入中快速构建可扩展、高保真的写实化身，并可直接部署于 Unity 游戏引擎。该流程融合了多项前沿技术：引入了自定义预处理策略的新型高斯投影方法，支持“野外”条件下的视频采集输入，并可精细还原面部表情，同时将其嵌入到完整骨骼绑定的虚拟角色中。\n此外，我们开发了 Unity 集成版 Gaussian Splatting Avatar Editor，为 VR/AR 应用开发者提供了一个直观友好的编辑环境。实验结果表明，我们的预处理流程能够有效标准化用户自定义数据以适配 3DGS 训练，同时展示了高斯化身在 Unity 中的灵活适应性与广泛实用性，充分体现了本方法的可扩展性与实用价值。\n"
  },
  {
    "path": "abs/2504.13022.md",
    "content": "### CompGS++: Compressed Gaussian Splatting for Static and Dynamic Scene Representation\n\nGaussian splatting demonstrates proficiency for 3D scene modeling but suffers from substantial data volume due to inherent primitive redundancy. To enable future photorealistic 3D immersive visual communication applications, significant compression is essential for transmission over the existing Internet infrastructure. Hence, we propose Compressed Gaussian Splatting (CompGS++), a novel framework that leverages compact Gaussian primitives to achieve accurate 3D modeling with substantial size reduction for both static and dynamic scenes. Our design is based on the principle of eliminating redundancy both between and within primitives. Specifically, we develop a comprehensive prediction paradigm to address inter-primitive redundancy through spatial and temporal primitive prediction modules. The spatial primitive prediction module establishes predictive relationships for scene primitives and enables most primitives to be encoded as compact residuals, substantially reducing the spatial redundancy. We further devise a temporal primitive prediction module to handle dynamic scenes, which exploits primitive correlations across timestamps to effectively reduce temporal redundancy. Moreover, we devise a rate-constrained optimization module that jointly minimizes reconstruction error and rate consumption. This module effectively eliminates parameter redundancy within primitives and enhances the overall compactness of scene representations. Comprehensive evaluations across multiple benchmark datasets demonstrate that CompGS++ significantly outperforms existing methods, achieving superior compression performance while preserving accurate scene modeling.\n\nGaussian Splatting 在三维场景建模方面展现出强大能力，但由于原始表示中存在大量冗余，导致数据体积庞大。为了支持未来面向真实感沉浸式视觉通信的 3D 应用，在现有互联网基础设施下进行高效传输，高效压缩成为关键需求。\n为此，我们提出了 Compressed Gaussian Splatting（CompGS++），一个新型压缩框架，旨在通过紧凑高斯原语实现静态与动态场景的高精度建模与大幅尺寸压缩。该框架基于“消除原语之间与原语内部冗余”的核心原则设计。\n具体而言，我们构建了一个完整的预测范式来处理跨原语冗余，包括空间原语预测模块与时间原语预测模块。其中，空间原语预测模块通过建立原语间的空间预测关系，使大部分原语可作为残差编码，从而显著降低空间冗余。时间原语预测模块则针对动态场景设计，利用不同时间帧中原语间的相关性，有效抑制时间冗余。\n此外，我们引入了一个速率约束优化模块，以联合最小化重建误差与压缩速率，该模块可有效压缩原语内部参数的冗余性，进一步提升整体表示的紧凑性。\n在多个基准数据集上的全面评估表明，CompGS++ 在保持高精度场景建模的同时，显著优于现有方法，在压缩率与重建质量之间实现更优平衡。\n"
  },
  {
    "path": "abs/2504.13153.md",
    "content": "### Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs\n\nBridging natural language and 3D geometry is a crucial step toward flexible, language-driven scene understanding. While recent advances in 3D Gaussian Splatting (3DGS) have enabled fast and high-quality scene reconstruction, research has also explored incorporating open-vocabulary understanding into 3DGS. However, most existing methods require iterative optimization over per-view 2D semantic feature maps, which not only results in inefficiencies but also leads to inconsistent 3D semantics across views. To address these limitations, we introduce a training-free framework that constructs a superpoint graph directly from Gaussian primitives. The superpoint graph partitions the scene into spatially compact and semantically coherent regions, forming view-consistent 3D entities and providing a structured foundation for open-vocabulary understanding. Based on the graph structure, we design an efficient reprojection strategy that lifts 2D semantic features onto the superpoints, avoiding costly multi-view iterative training. The resulting representation ensures strong 3D semantic coherence and naturally supports hierarchical understanding, enabling both coarse- and fine-grained open-vocabulary perception within a unified semantic field. Extensive experiments demonstrate that our method achieves state-of-the-art open-vocabulary segmentation performance, with semantic field reconstruction completed over 30× faster.\n\n将自然语言与三维几何建立桥接是实现灵活、语言驱动场景理解的关键一步。尽管近年来 3D Gaussian Splatting（3DGS）在快速高质量场景重建方面取得了显著进展，研究者也开始尝试将开放词汇语义理解引入 3DGS。然而，现有方法大多依赖于对每视角的二维语义特征图进行迭代优化，这不仅效率低下，还容易导致不同视角下语义不一致的问题。\n为克服这些限制，我们提出了一种无需训练的框架，直接基于高斯原语构建 superpoint 图结构。该图将场景划分为空间紧凑且语义一致的区域，形成跨视角一致的三维实体，为开放词汇理解提供了结构化基础。\n在此图结构之上，我们设计了一种高效的 重投影策略，将二维语义特征映射至 superpoint，避免了传统多视角迭代训练的高昂代价。最终形成的表示不仅具备强一致性的三维语义结构，同时天然支持分层理解，可在统一的语义场中实现粗粒度与细粒度的开放词汇感知。\n大量实验表明，该方法在开放词汇语义分割任务上取得了当前最优性能，且语义场构建速度提升超过 30 倍，在效率与质量之间实现了出色平衡。\n"
  },
  {
    "path": "abs/2504.13167.md",
    "content": "### ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos\n\nCreating a photorealistic scene and human reconstruction from a single monocular in-the-wild video figures prominently in the perception of a human-centric 3D world. Recent neural rendering advances have enabled holistic human-scene reconstruction but require pre-calibrated camera and human poses, and days of training time. In this work, we introduce a novel unified framework that simultaneously performs camera tracking, human pose estimation and human-scene reconstruction in an online fashion. 3D Gaussian Splatting is utilized to learn Gaussian primitives for humans and scenes efficiently, and reconstruction-based camera tracking and human pose estimation modules are designed to enable holistic understanding and effective disentanglement of pose and appearance. Specifically, we design a human deformation module to reconstruct the details and enhance generalizability to out-of-distribution poses faithfully. Aiming to learn the spatial correlation between human and scene accurately, we introduce occlusion-aware human silhouette rendering and monocular geometric priors, which further improve reconstruction quality. Experiments on the EMDB and NeuMan datasets demonstrate superior or on-par performance with existing methods in camera tracking, human pose estimation, novel view synthesis and runtime.\n\n从单目野外视频中重建逼真的场景与人物，是实现以人为中心的三维世界感知的关键步骤。尽管近年来神经渲染的进展已推动整体人-场景重建的发展，但这些方法通常依赖于预标定的相机与人体姿态，且训练周期动辄数天，限制了其实用性与推广性。\n为此，我们提出了一个新颖的统一框架，可在在线方式下同时完成相机追踪、人体姿态估计与人-场景联合重建。我们采用 3D Gaussian Splatting 高效学习人物与场景的高斯原语，并设计了基于重建的相机追踪与人体姿态估计模块，实现姿态与外观的有效解耦与整体理解。\n具体而言，我们引入了一个人体变形模块，可在保持细节重建质量的同时提升对分布外姿态的泛化能力。为准确学习人物与场景之间的空间关系，我们进一步引入了遮挡感知的人体轮廓渲染机制与单目几何先验，显著增强了重建质量。\n在 EMDB 与 NeuMan 数据集上的实验结果表明，我们的方法在相机追踪、人体姿态估计、新视角合成与运行时效率等方面均优于或媲美现有技术，展现出在复杂真实场景中的强大适应性与实用性。\n"
  },
  {
    "path": "abs/2504.13175.md",
    "content": "### Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation\n\nVisuomotor policies learned from teleoperated demonstrations face challenges such as lengthy data collection, high costs, and limited data diversity. Existing approaches address these issues by augmenting image observations in RGB space or employing Real-to-Sim-to-Real pipelines based on physical simulators. However, the former is constrained to 2D data augmentation, while the latter suffers from imprecise physical simulation caused by inaccurate geometric reconstruction. This paper introduces RoboSplat, a novel method that generates diverse, visually realistic demonstrations by directly manipulating 3D Gaussians. Specifically, we reconstruct the scene through 3D Gaussian Splatting (3DGS), directly edit the reconstructed scene, and augment data across six types of generalization with five techniques: 3D Gaussian replacement for varying object types, scene appearance, and robot embodiments; equivariant transformations for different object poses; visual attribute editing for various lighting conditions; novel view synthesis for new camera perspectives; and 3D content generation for diverse object types. Comprehensive real-world experiments demonstrate that RoboSplat significantly enhances the generalization of visuomotor policies under diverse disturbances. Notably, while policies trained on hundreds of real-world demonstrations with additional 2D data augmentation achieve an average success rate of 57.2%, RoboSplat attains 87.8% in one-shot settings across six types of generalization in the real world.\n\n从远程操控演示中学习视觉-运动策略，常常面临数据采集周期长、成本高昂以及数据多样性受限等问题。现有方法试图通过在 RGB 空间中增强图像观测，或借助物理模拟器构建 Real-to-Sim-to-Real 流程来缓解这些挑战。然而，前者仅限于二维增强，后者则受到几何重建精度不足所引发的物理模拟误差的影响，难以实现可靠的策略泛化。\n本文提出 RoboSplat，一种通过直接操控三维高斯表示生成多样、真实感演示的新方法。该方法基于 3D Gaussian Splatting（3DGS）对场景进行重建，并在此基础上直接进行编辑，从而实现在多个泛化维度上的增强。通过高斯层面的表示操控，RoboSplat能够自然地处理不同物体类型、外观、机器人形态、姿态、光照条件与视角变化，统一地提升策略的感知广度与行为泛化能力，同时避免了以往方法对昂贵仿真或逐帧图像操作的依赖。\n在真实环境中的系统性评估表明，RoboSplat 显著提升了视觉-运动策略在多种扰动下的稳健性。在六类真实世界泛化任务的 one-shot 设定中，RoboSplat 达到了 87.8% 的平均成功率，而对比策略在结合二维增强并使用数百条演示训练的情况下，仅达 57.2%。该结果充分验证了 RoboSplat 在数据效率与泛化能力方面的优势。\n"
  },
  {
    "path": "abs/2504.13204.md",
    "content": "### EDGS: Eliminating Densification for Efficient Convergence of 3DGS\n\n3D Gaussian Splatting reconstructs scenes by starting from a sparse Structure-from-Motion initialization and iteratively refining under-reconstructed regions. This process is inherently slow, as it requires multiple densification steps where Gaussians are repeatedly split and adjusted, following a lengthy optimization path. Moreover, this incremental approach often leads to suboptimal renderings, particularly in high-frequency regions where detail is critical.\nWe propose a fundamentally different approach: we eliminate densification process with a one-step approximation of scene geometry using triangulated pixels from dense image correspondences. This dense initialization allows us to estimate rough geometry of the scene while preserving rich details from input RGB images, providing each Gaussian with well-informed colors, scales, and positions. As a result, we dramatically shorten the optimization path and remove the need for densification. Unlike traditional methods that rely on sparse keypoints, our dense initialization ensures uniform detail across the scene, even in high-frequency regions where 3DGS and other methods struggle. Moreover, since all splats are initialized in parallel at the start of optimization, we eliminate the need to wait for densification to adjust new Gaussians.\nOur method not only outperforms speed-optimized models in training efficiency but also achieves higher rendering quality than state-of-the-art approaches, all while using only half the splats of standard 3DGS. It is fully compatible with other 3DGS acceleration techniques, making it a versatile and efficient solution that can be integrated with existing approaches.\n\n3D Gaussian Splatting 通过从稀疏的结构光束法（Structure-from-Motion）初始化开始，并对重建不足的区域进行迭代优化，从而完成场景重建。然而，这一过程本质上较为缓慢，因为它依赖于多次致密化步骤，即不断地对高斯进行拆分与调整，并沿着冗长的优化路径前进。此外，这种增量式方法常常在高频区域表现不佳，导致渲染效果不理想，尤其是在细节至关重要的部分。\n我们提出了一种根本不同的方案：通过基于密集图像对应关系三角化像素来一次性近似场景几何，从而彻底省略了致密化过程。这种密集初始化能够在保持输入 RGB 图像丰富细节的同时估计场景的大致几何，为每个高斯赋予更合理的颜色、尺度和位置。因此，我们显著缩短了优化路径，并完全消除了对致密化的依赖。与传统依赖稀疏关键点的方法不同，我们的密集初始化在整个场景中提供了均匀的细节表现，尤其是在 3DGS 与其他方法难以还原的高频区域表现尤为出色。此外，由于所有 splat（高斯）在优化开始时即可并行初始化，我们无需等待致密化过程来逐步添加新的高斯。\n我们的方法不仅在训练效率上超过了当前的速度优化模型，而且在渲染质量上也优于现有最先进的方法，同时仅使用标准 3DGS 一半数量的 splat。该方法与其他 3DGS 加速技术完全兼容，是一种可灵活集成于现有框架中的高效解决方案。\n"
  },
  {
    "path": "abs/2504.13207.md",
    "content": "### BEV-GS: Feed-forward Gaussian Splatting in Bird's-Eye-View for Road Reconstruction\n\nRoad surface is the sole contact medium for wheels or robot feet. Reconstructing road surface is crucial for unmanned vehicles and mobile robots. Recent studies on Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) have achieved remarkable results in scene reconstruction. However, they typically rely on multi-view image inputs and require prolonged optimization times. In this paper, we propose BEV-GS, a real-time single-frame road surface reconstruction method based on feed-forward Gaussian splatting. BEV-GS consists of a prediction module and a rendering module. The prediction module introduces separate geometry and texture networks following Bird's-Eye-View paradigm. Geometric and texture parameters are directly estimated from a single frame, avoiding per-scene optimization. In the rendering module, we utilize grid Gaussian for road surface representation and novel view synthesis, which better aligns with road surface characteristics. Our method achieves state-of-the-art performance on the real-world dataset RSRD. The road elevation error reduces to 1.73 cm, and the PSNR of novel view synthesis reaches 28.36 dB. The prediction and rendering FPS is 26, and 2061, respectively, enabling high-accuracy and real-time applications.\n\n道路表面是车轮或机器人足部与环境接触的唯一介质，其重建对于无人驾驶车辆和移动机器人至关重要。近年来，Neural Radiance Fields（NeRF）与 Gaussian Splatting（GS）在场景重建方面取得了显著成果，但这些方法通常依赖多视图图像输入，且需要较长的优化时间。\n本文提出 BEV-GS，一种基于前馈式 Gaussian Splatting 的实时单帧道路表面重建方法。BEV-GS 包含预测模块与渲染模块：预测模块采用鸟瞰图（Bird’s-Eye-View）范式，引入独立的几何网络与纹理网络，直接从单帧图像中估计几何与纹理参数，避免了逐场景优化的开销；在渲染模块中，我们使用网格高斯（Grid Gaussian）表示道路表面并进行新视角合成，更契合道路表面的结构特性。\n在真实数据集 RSRD 上的实验表明，BEV-GS 达到了当前最先进的性能表现：道路高程误差降低至 1.73 cm，新视角合成的 PSNR 达到 28.36 dB，预测与渲染速度分别为 26 FPS 与 2061 FPS，展现出高精度与实时性兼具的应用潜力。\n"
  },
  {
    "path": "abs/2504.13339.md",
    "content": "### Volume Encoding Gaussians: Transfer Function-Agnostic 3D Gaussians for Volume Rendering\n\nWhile HPC resources are increasingly being used to produce adaptively refined or unstructured volume datasets, current research in applying machine learning-based representation to visualization has largely ignored this type of data. To address this, we introduce Volume Encoding Gaussians (VEG), a novel 3D Gaussian-based representation for scientific volume visualization focused on unstructured volumes. Unlike prior 3D Gaussian Splatting (3DGS) methods that store view-dependent color and opacity for each Gaussian, VEG decouple the visual appearance from the data representation by encoding only scalar values, enabling transfer-function-agnostic rendering of 3DGS models for interactive scientific visualization. VEG are directly initialized from volume datasets, eliminating the need for structure-from-motion pipelines like COLMAP. To ensure complete scalar field coverage, we introduce an opacity-guided training strategy, using differentiable rendering with multiple transfer functions to optimize our data representation. This allows VEG to preserve fine features across the full scalar range of a dataset while remaining independent of any specific transfer function. Each Gaussian is scaled and rotated to adapt to local geometry, allowing for efficient representation of unstructured meshes without storing mesh connectivity and while using far fewer primitives. Across a diverse set of data, VEG achieve high reconstruction quality, compress large volume datasets by up to 3600x, and support lightning-fast rendering on commodity GPUs, enabling interactive visualization of large-scale structured and unstructured volumes.\n\n随着高性能计算（HPC）资源越来越多地被用于生成自适应细化或非结构化的体数据集，当前基于机器学习的可视化表示研究却在很大程度上忽视了这类数据。为了解决这一问题，我们提出了 Volume Encoding Gaussians（VEG），这是一种面向科学体可视化、专为非结构化体数据设计的全新三维高斯表示方法。与以往的 3D Gaussian Splatting（3DGS）方法不同，后者为每个高斯存储与视角相关的颜色和不透明度，VEG 将视觉外观与数据表示解耦，仅编码标量值，从而实现与传递函数无关的 3DGS 渲染，支持交互式的科学可视化。\nVEG 直接从体数据集初始化，无需依赖如 COLMAP 等结构光束法管线。为了覆盖完整的标量场，我们引入了一种基于不透明度引导的训练策略，利用多个传递函数下的可微渲染对数据表示进行优化。这使 VEG 能够在整个标量范围内保留细粒度特征，同时不依赖于任何特定的传递函数。\n每个高斯在空间中被缩放与旋转，以适应局部几何结构，从而无需存储网格连接信息即可高效表示非结构化网格，并显著减少所需的表示原语数量。在多个不同类型的数据集上，VEG 实现了高质量的重建效果，将大型体数据集压缩至最高 3600 倍，并支持在普通 GPU 上实现极快的渲染速度，使大规模结构化与非结构化体数据的交互式可视化成为可能。\n"
  },
  {
    "path": "abs/2504.13540.md",
    "content": "### EG-Gaussian: Epipolar Geometry and Graph Network Enhanced 3D Gaussian Splatting\n\nIn this paper, we explore an open research problem concerning the reconstruction of 3D scenes from images. Recent methods have adopt 3D Gaussian Splatting (3DGS) to produce 3D scenes due to its efficient training process. However, these methodologies may generate incomplete 3D scenes or blurred multiviews. This is because of (1) inaccurate 3DGS point initialization and (2) the tendency of 3DGS to flatten 3D Gaussians with the sparse-view input. To address these issues, we propose a novel framework EG-Gaussian, which utilizes epipolar geometry and graph networks for 3D scene reconstruction. Initially, we integrate epipolar geometry into the 3DGS initialization phase to enhance initial 3DGS point construction. Then, we specifically design a graph learning module to refine 3DGS spatial features, in which we incorporate both spatial coordinates and angular relationships among neighboring points. Experiments on indoor and outdoor benchmark datasets demonstrate that our approach significantly improves reconstruction accuracy compared to 3DGS-based methods.\n\n本文聚焦于一个尚未解决的研究问题：如何从图像中重建三维场景。近年来，一些方法采用了 3D Gaussian Splatting（3DGS） 以实现高效的三维场景生成。然而，这些方法可能会生成不完整的三维场景或产生模糊的多视角图像。其主要原因包括：（1）3DGS 点的初始化不准确，以及（2）在输入视角稀疏的情况下，3DGS 倾向于将三维高斯展平。\n为了解决上述问题，我们提出了一种全新的重建框架 EG-Gaussian，结合了对极几何与图神经网络，用于三维场景重建。具体而言，我们首先将对极几何引入 3DGS 的初始化阶段，以提升初始高斯点的构建质量。随后，我们设计了一种专门的图学习模块，用于精细化 3DGS 的空间特征表示。该模块不仅考虑了点的空间坐标信息，还融合了相邻点之间的角度关系。\n在多个室内与室外基准数据集上的实验结果表明，与现有基于 3DGS 的方法相比，我们的方法在重建精度上取得了显著提升。\n"
  },
  {
    "path": "abs/2504.13697.md",
    "content": "### Green Robotic Mixed Reality with Gaussian Splatting\n\nRealizing green communication in robotic mixed reality (RoboMR) systems presents a challenge, due to the necessity of uploading high-resolution images at high frequencies through wireless channels. This paper proposes Gaussian splatting (GS) RoboMR (GSRMR), which achieves a lower energy consumption and makes a concrete step towards green RoboMR. The crux to GSRMR is to build a GS model which enables the simulator to opportunistically render a photo-realistic view from the robot's pose, thereby reducing the need for excessive image uploads. Since the GS model may involve discrepancies compared to the actual environments, a GS cross-layer optimization (GSCLO) framework is further proposed, which jointly optimizes content switching (i.e., deciding whether to upload image or not) and power allocation across different frames. The GSCLO problem is solved by an accelerated penalty optimization (APO) algorithm. Experiments demonstrate that the proposed GSRMR reduces the communication energy by over 10x compared with RoboMR. Furthermore, the proposed GSRMR with APO outperforms extensive baseline schemes, in terms of peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM).\n\n在机器人混合现实（RoboMR）系统中实现绿色通信是一项挑战，主要原因在于需要通过无线信道高频率地上传高分辨率图像。本文提出 Gaussian Splatting RoboMR（GSRMR），在降低能耗的同时，向绿色 RoboMR 迈出了实质性一步。\nGSRMR 的核心在于构建一个 Gaussian Splatting（GS）模型，使模拟器能够根据机器人的位姿有选择性地渲染真实感视图，从而减少过多图像上传的需求。由于 GS 模型可能与真实环境存在偏差，本文进一步提出一个 GS 跨层优化框架（GSCLO），联合优化内容切换（即是否上传图像）与跨帧功率分配策略。\n该 GSCLO 问题通过一种加速惩罚优化算法（APO）进行求解。实验结果表明，与传统 RoboMR 方法相比，GSRMR 能将通信能耗降低 超过 10 倍。此外，在峰值信噪比（PSNR）与结构相似性指标（SSIM）方面，结合 APO 的 GSRMR 也显著优于多种现有基线方案。\n"
  },
  {
    "path": "abs/2504.14373.md",
    "content": "### SEGA: Drivable 3D Gaussian Head Avatar from a Single Image\n\nCreating photorealistic 3D head avatars from limited input has become increasingly important for applications in virtual reality, telepresence, and digital entertainment. While recent advances like neural rendering and 3D Gaussian splatting have enabled high-quality digital human avatar creation and animation, most methods rely on multiple images or multi-view inputs, limiting their practicality for real-world use. In this paper, we propose SEGA, a novel approach for Single-imagE-based 3D drivable Gaussian head Avatar creation that combines generalized prior models with a new hierarchical UV-space Gaussian Splatting framework. SEGA seamlessly combines priors derived from large-scale 2D datasets with 3D priors learned from multi-view, multi-expression, and multi-ID data, achieving robust generalization to unseen identities while ensuring 3D consistency across novel viewpoints and expressions. We further present a hierarchical UV-space Gaussian Splatting framework that leverages FLAME-based structural priors and employs a dual-branch architecture to disentangle dynamic and static facial components effectively. The dynamic branch encodes expression-driven fine details, while the static branch focuses on expression-invariant regions, enabling efficient parameter inference and precomputation. This design maximizes the utility of limited 3D data and achieves real-time performance for animation and rendering. Additionally, SEGA performs person-specific fine-tuning to further enhance the fidelity and realism of the generated avatars. Experiments show our method outperforms state-of-the-art approaches in generalization ability, identity preservation, and expression realism, advancing one-shot avatar creation for practical applications.\n\n从有限输入中创建逼真的三维头部头像在虚拟现实、远程呈现和数字娱乐等应用中变得日益重要。尽管神经渲染和 3D Gaussian Splatting 等技术的最新进展已经实现了高质量数字人像的生成与动画，但现有方法大多依赖多张图像或多视角输入，限制了其在真实场景中的应用。\n本文提出了一种新方法 SEGA，用于基于单张图像的三维可驱动高斯头像生成（Single-imagE-based 3D drivable Gaussian head Avatar），结合了通用先验模型和一种新颖的层次化 UV 空间高斯投影框架。SEGA 无缝融合了来源于大规模二维数据集的先验信息与从多视角、多表情和多身份三维数据中学习到的三维先验，能够在保持三维一致性的同时，实现对未见身份的强泛化能力。\n我们进一步提出了一种层次化 UV 空间高斯投影框架，该框架结合了基于 FLAME 的结构先验，并采用双分支结构，有效解耦面部的动态与静态部分。动态分支编码由表情驱动的细节特征，静态分支则关注于表情不变区域，从而支持高效的参数推理与预计算。该设计最大程度地利用了有限的三维数据，实现了动画与渲染的实时性能。\n此外，SEGA 还支持基于个体的精调，以进一步提升生成头像的保真度与真实感。实验表明，该方法在泛化能力、身份保持性和表情真实度方面均优于当前最先进的方法，推动了一次性头像生成技术在实际应用中的发展。\n"
  },
  {
    "path": "abs/2504.14460.md",
    "content": "### Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding\n\nThe introduction of 3D Gaussian Splatting (3DGS) has advanced novel view synthesis by utilizing Gaussians to represent scenes. Encoding Gaussian point features with anchor embeddings has significantly enhanced the performance of newer 3DGS variants. While significant advances have been made, it is still challenging to boost rendering performance. Feature embeddings have difficulty accurately representing colors from different perspectives under varying lighting conditions, which leads to a washed-out appearance. Another reason is the lack of a proper densification strategy that prevents Gaussian point growth in thinly initialized areas, resulting in blurriness and needle-shaped artifacts. To address them, we propose Metamon-GS, from innovative viewpoints of variance-guided densification strategy and multi-level hash grid. The densification strategy guided by variance specifically targets Gaussians with high gradient variance in pixels and compensates for the importance of regions with extra Gaussians to improve reconstruction. The latter studies implicit global lighting conditions and accurately interprets color from different perspectives and feature embeddings. Our thorough experiments on publicly available datasets show that Metamon-GS surpasses its baseline model and previous versions, delivering superior quality in rendering novel views.\n\n3D Gaussian Splatting（3DGS）的引入通过使用高斯表示场景，推动了新视角合成技术的发展。将高斯点特征编码与锚点嵌入相结合，显著提升了新一代 3DGS 方法的表现。尽管已取得显著进展，但提升渲染性能仍然面临挑战。一方面，特征嵌入在不同光照条件下难以准确表达来自不同视角的颜色，导致图像出现泛白现象；另一方面，缺乏有效的致密化策略使得在初始稀疏区域中高斯点无法增长，进而引发模糊和针状伪影等问题。\n为解决上述问题，我们提出 Metamon-GS，从两个创新角度切入：基于方差引导的致密化策略与多层次哈希网格结构。其中，基于方差的致密化策略专门针对在像素中具有高梯度方差的高斯点，通过补充高斯点以提升关键区域的重建效果。多层哈希网格结构则用于建模全局隐式光照条件，从而更准确地解析来自不同视角的颜色与特征嵌入。\n我们在多个公开数据集上的实验结果表明，Metamon-GS 在渲染新视图的质量上显著优于其基础模型及以往版本。\n"
  },
  {
    "path": "abs/2504.14548.md",
    "content": "### VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control\n\nSparse-view 3D reconstruction is a fundamental yet challenging task in practical 3D reconstruction applications. Recently, many methods based on the 3D Gaussian Splatting (3DGS) framework have been proposed to address sparse-view 3D reconstruction. Although these methods have made considerable advancements, they still show significant issues with overfitting. To reduce the overfitting, we introduce VGNC, a novel Validation-guided Gaussian Number Control (VGNC) approach based on generative novel view synthesis (NVS) models. To the best of our knowledge, this is the first attempt to alleviate the overfitting issue of sparse-view 3DGS with generative validation images. Specifically, we first introduce a validation image generation method based on a generative NVS model. We then propose a Gaussian number control strategy that utilizes generated validation images to determine the optimal Gaussian numbers, thereby reducing the issue of overfitting. We conducted detailed experiments on various sparse-view 3DGS baselines and datasets to evaluate the effectiveness of VGNC. Extensive experiments show that our approach not only reduces overfitting but also improves rendering quality on the test set while decreasing the number of Gaussian points. This reduction lowers storage demands and accelerates both training and rendering. The code will be released.\n\n稀疏视角的三维重建是在实际三维重建应用中既基础又具有挑战性的问题。近年来，许多基于 3D Gaussian Splatting（3DGS）框架的方法被提出以应对该任务。尽管这些方法在一定程度上取得了进展，但仍普遍存在过拟合严重的问题。\n为缓解这一问题，我们提出了 VGNC，一种基于生成式新视角合成（NVS）模型的验证引导型高斯数量控制方法（Validation-guided Gaussian Number Control）。据我们所知，这是首次通过生成式验证图像来缓解稀疏视角 3DGS 的过拟合问题。\n具体而言，我们首先引入一种基于生成式 NVS 模型的验证图像生成方法；随后，提出一种利用生成的验证图像来判定高斯数量的控制策略，从而抑制过拟合现象。\n我们在多个稀疏视角 3DGS 基线方法和数据集上进行了系统实验，以验证 VGNC 的有效性。大量实验结果表明，该方法不仅有效降低了过拟合程度，还提升了测试集上的渲染质量，同时减少了高斯点的数量，进而降低了存储需求，加快了训练与渲染过程。\n相关代码将公开发布。\n"
  },
  {
    "path": "abs/2504.14638.md",
    "content": "### NVSMask3D: Hard Visual Prompting with Camera Pose Interpolation for 3D Open Vocabulary Instance Segmentation\n\nVision-language models (VLMs) have demonstrated impressive zero-shot transfer capabilities in image-level visual perception tasks. However, they fall short in 3D instance-level segmentation tasks that require accurate localization and recognition of individual objects. To bridge this gap, we introduce a novel 3D Gaussian Splatting based hard visual prompting approach that leverages camera interpolation to generate diverse viewpoints around target objects without any 2D-3D optimization or fine-tuning. Our method simulates realistic 3D perspectives, effectively augmenting existing hard visual prompts by enforcing geometric consistency across viewpoints. This training-free strategy seamlessly integrates with prior hard visual prompts, enriching object-descriptive features and enabling VLMs to achieve more robust and accurate 3D instance segmentation in diverse 3D scenes.\n\n视觉-语言模型（Vision-Language Models，VLMs）在图像级视觉感知任务中展现出了出色的零样本迁移能力。然而，在需要精确定位与识别个体物体的三维实例级分割任务中，其性能仍然有限。\n为弥补这一差距，我们提出了一种基于 3D Gaussian Splatting 的新颖硬视觉提示（hard visual prompting）方法，通过摄像机插值生成围绕目标物体的多样化视角，而无需任何 2D-3D 优化或微调操作。该方法能够模拟逼真的三维视角，有效增强现有硬视觉提示在几何一致性上的约束，从而扩展其表达能力。\n这一无需训练的策略可与已有的硬视觉提示无缝集成，丰富物体的描述性特征，使 VLMs 在多样化三维场景中实现更稳健且精确的三维实例分割。\n"
  },
  {
    "path": "abs/2504.14699.md",
    "content": "### IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays\n\nSpine surgery is a high-risk intervention demanding precise execution, often supported by image-based navigation systems. Recently, supervised learning approaches have gained attention for reconstructing 3D spinal anatomy from sparse fluoroscopic data, significantly reducing reliance on radiation-intensive 3D imaging systems. However, these methods typically require large amounts of annotated training data and may struggle to generalize across varying patient anatomies or imaging conditions. Instance-learning approaches like Gaussian splatting could offer an alternative by avoiding extensive annotation requirements. While Gaussian splatting has shown promise for novel view synthesis, its application to sparse, arbitrarily posed real intraoperative X-rays has remained largely unexplored. This work addresses this limitation by extending the -Gaussian splatting framework to reconstruct anatomically consistent 3D volumes under these challenging conditions. We introduce an anatomy-guided radiographic standardization step using style transfer, improving visual consistency across views, and enhancing reconstruction quality. Notably, our framework requires no pretraining, making it inherently adaptable to new patients and anatomies. We evaluated our approach using an ex-vivo dataset. Expert surgical evaluation confirmed the clinical utility of the 3D reconstructions for navigation, especially when using 20 to 30 views, and highlighted the standardization's benefit for anatomical clarity. Benchmarking via quantitative 2D metrics (PSNR/SSIM) confirmed performance trade-offs compared to idealized settings, but also validated the improvement gained from standardization over raw inputs. This work demonstrates the feasibility of instance-based volumetric reconstruction from arbitrary sparse-view X-rays, advancing intraoperative 3D imaging for surgical navigation.\n\n脊柱手术是一项高风险操作，对精准执行有极高要求，常依赖于基于图像的导航系统。近年来，监督学习方法因能从稀疏透视图中重建三维脊柱结构而受到关注，显著降低了对高辐射三维成像系统的依赖。然而，此类方法通常需要大量标注训练数据，且在面对不同患者解剖结构或成像条件变化时，其泛化能力可能受限。\n基于实例学习的方法（如 Gaussian Splatting）为替代方案提供了可能，它无需大规模标注数据即可进行重建。尽管 Gaussian Splatting 在新视角合成任务中表现出色，但其在稀疏、姿态任意的真实术中 X 光图像上的应用仍鲜有探索。\n本研究正是为填补这一空白，提出将 R2-Gaussian Splatting 框架扩展至三维体积重建任务，在极端条件下实现解剖一致的三维重建。我们引入了一种 基于解剖结构引导的放射图像标准化流程，通过风格迁移提升多视角之间的视觉一致性，从而增强重建质量。\n值得一提的是，整个框架无需预训练，因此可自然适配于新的患者与不同的解剖结构。我们在一个离体数据集上对方法进行了评估。外科专家验证了该三维重建在导航中的临床实用性，特别是在使用 20 至 30 张视图的情况下，同时指出图像标准化有助于提升解剖结构的清晰度。\n通过 2D 定量指标（PSNR/SSIM）对比实验显示，尽管该方法在非理想条件下存在一定性能折中，但标准化相较于原始输入带来了显著的质量提升。研究结果表明，从任意稀疏视角 X 光图像出发进行基于实例的体积重建是可行的，为术中三维成像与导航带来了新进展。\n"
  },
  {
    "path": "abs/2504.15122.md",
    "content": "### MoBGS: Motion Deblurring Dynamic 3D Gaussian Splatting for Blurry Monocular Video\n\nWe present MoBGS, a novel deblurring dynamic 3D Gaussian Splatting (3DGS) framework capable of reconstructing sharp and high-quality novel spatio-temporal views from blurry monocular videos in an end-to-end manner. Existing dynamic novel view synthesis (NVS) methods are highly sensitive to motion blur in casually captured videos, resulting in significant degradation of rendering quality. While recent approaches address motion-blurred inputs for NVS, they primarily focus on static scene reconstruction and lack dedicated motion modeling for dynamic objects. To overcome these limitations, our MoBGS introduces a novel Blur-adaptive Latent Camera Estimation (BLCE) method for effective latent camera trajectory estimation, improving global camera motion deblurring. In addition, we propose a physically-inspired Latent Camera-induced Exposure Estimation (LCEE) method to ensure consistent deblurring of both global camera and local object motion. Our MoBGS framework ensures the temporal consistency of unseen latent timestamps and robust motion decomposition of static and dynamic regions. Extensive experiments on the Stereo Blur dataset and real-world blurry videos show that our MoBGS significantly outperforms the very recent advanced methods (DyBluRF and Deblur4DGS), achieving state-of-the-art performance for dynamic NVS under motion blur.\n\n我们提出了 MoBGS，一个新颖的去模糊动态 3D Gaussian Splatting（3DGS）框架，能够端到端地从模糊的单目视频中重建清晰且高质量的时空新视角图像。现有的动态新视角合成（Novel View Synthesis, NVS）方法对日常拍摄视频中的运动模糊极为敏感，导致渲染质量显著下降。尽管近期的一些方法开始关注带有运动模糊输入的 NVS 问题，但这些方法大多聚焦于静态场景重建，缺乏对动态物体运动的专门建模。\n为克服上述限制，MoBGS 引入了一种新颖的 模糊自适应潜摄像机估计方法（Blur-adaptive Latent Camera Estimation, BLCE），用于有效估计潜摄像机轨迹，从而改善全局摄像机运动引起的模糊。此外，我们还提出了一种受物理启发的 潜摄像机诱导曝光估计方法（Latent Camera-induced Exposure Estimation, LCEE），以实现全局摄像机与局部物体运动的统一去模糊。\nMoBGS 框架能够确保未见潜时间戳（latent timestamps）的时间一致性，并实现对静态与动态区域的鲁棒运动解耦。我们在 Stereo Blur 数据集及多个真实模糊视频上进行了广泛实验，结果显示 MoBGS 显著优于最新的先进方法 DyBluRF 和 Deblur4DGS，在存在运动模糊的动态 NVS 任务中达到了当前最优性能。\n"
  },
  {
    "path": "abs/2504.15229.md",
    "content": "### Immersive Teleoperation Framework for Locomanipulation Tasks\n\nRecent advancements in robotic loco-manipulation have leveraged Virtual Reality (VR) to enhance the precision and immersiveness of teleoperation systems, significantly outperforming traditional methods reliant on 2D camera feeds and joystick controls. Despite these advancements, challenges remain, particularly concerning user experience across different setups. This paper introduces a novel VR-based teleoperation framework designed for a robotic manipulator integrated onto a mobile platform. Central to our approach is the application of Gaussian splatting, a technique that abstracts the manipulable scene into a VR environment, thereby enabling more intuitive and immersive interactions. Users can navigate and manipulate within the virtual scene as if interacting with a real robot, enhancing both the engagement and efficacy of teleoperation tasks. An extensive user study validates our approach, demonstrating significant usability and efficiency improvements. Two-thirds (66%) of participants completed tasks faster, achieving an average time reduction of 43%. Additionally, 93% preferred the Gaussian Splat interface overall, with unanimous (100%) recommendations for future use, highlighting improvements in precision, responsiveness, and situational awareness. Finally, we demonstrate the effectiveness of our framework through real-world experiments in two distinct application scenarios, showcasing the practical capabilities and versatility of the Splat-based VR interface.\n\n近年来，机器人行走-操作（loco-manipulation）领域取得了显著进展，虚拟现实（VR）被广泛应用于增强遥操作系统的精度与沉浸感，显著优于依赖二维摄像头画面与操纵杆控制的传统方法。尽管如此，不同系统配置下的用户体验仍存在诸多挑战。\n本文提出一种全新的 基于 VR 的遥操作框架，用于控制安装在移动平台上的机器人机械臂。我们方法的核心是引入 Gaussian Splatting 技术，将可操作的真实场景抽象为虚拟环境中的交互空间，从而实现更直观、更沉浸的操作体验。用户可如同与真实机器人交互一般，在虚拟场景中进行导航与操控，有效提升遥操作任务的参与度与效率。\n一项大规模用户研究验证了我们框架的有效性：约 66% 的参与者完成任务更快，平均耗时减少 43%；93% 的用户更偏好 Gaussian Splat 接口，并有 100% 的用户推荐未来继续使用，他们普遍认为该方法在操作精度、响应速度和态势感知方面表现优异。\n此外，我们在两个真实应用场景中展示了该框架的实际效果，进一步证明了基于 Splat 的 VR 接口在机器人遥操作中的实用性与广泛适应性。\n"
  },
  {
    "path": "abs/2504.15281.md",
    "content": "### StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians\n\n3D Gaussian Splatting (3DGS) excels in photorealistic scene reconstruction but struggles with stylized scenarios (e.g., cartoons, games) due to fragmented textures, semantic misalignment, and limited adaptability to abstract aesthetics. We propose StyleMe3D, a holistic framework for 3D GS style transfer that integrates multi-modal style conditioning, multi-level semantic alignment, and perceptual quality enhancement. Our key insights include: (1) optimizing only RGB attributes preserves geometric integrity during stylization; (2) disentangling low-, medium-, and high-level semantics is critical for coherent style transfer; (3) scalability across isolated objects and complex scenes is essential for practical deployment. StyleMe3D introduces four novel components: Dynamic Style Score Distillation (DSSD), leveraging Stable Diffusion's latent space for semantic alignment; Contrastive Style Descriptor (CSD) for localized, content-aware texture transfer; Simultaneously Optimized Scale (SOS) to decouple style details and structural coherence; and 3D Gaussian Quality Assessment (3DG-QA), a differentiable aesthetic prior trained on human-rated data to suppress artifacts and enhance visual harmony. Evaluated on NeRF synthetic dataset (objects) and tandt db (scenes) datasets, StyleMe3D outperforms state-of-the-art methods in preserving geometric details (e.g., carvings on sculptures) and ensuring stylistic consistency across scenes (e.g., coherent lighting in landscapes), while maintaining real-time rendering. This work bridges photorealistic 3D GS and artistic stylization, unlocking applications in gaming, virtual worlds, and digital art.\n\n3D Gaussian Splatting (3DGS) 在真实感场景重建方面表现出色，但在风格化场景（例如卡通、游戏）中表现不佳，原因在于纹理碎片化、语义错配以及对抽象美学的适应能力有限。我们提出 StyleMe3D，一个用于 3DGS 风格迁移的整体框架，融合了多模态风格条件、多层次语义对齐和感知质量增强机制。我们的核心见解包括：（1）仅优化 RGB 属性可以在风格化过程中保留几何完整性；（2）将低层、中层和高层语义进行解耦对于一致性风格迁移至关重要；（3）在孤立物体和复杂场景中的可扩展性对实际部署至关重要。StyleMe3D 引入了四个新组件：Dynamic Style Score Distillation (DSSD)，利用 Stable Diffusion 的潜在空间实现语义对齐；Contrastive Style Descriptor (CSD)，实现局部、内容感知的纹理迁移；Simultaneously Optimized Scale (SOS)，解耦风格细节与结构一致性；以及 3D Gaussian Quality Assessment (3DG-QA)，一个基于人工评分数据训练的可微美学先验，用于抑制伪影并增强视觉协调性。在 NeRF 合成数据集（对象）和 tandt db（场景）数据集上的评估表明，StyleMe3D 在保留几何细节（如雕塑的雕刻）和确保场景风格一致性（如风景中的光照协调）方面优于现有方法，同时保持实时渲染性能。本研究连接了真实感 3DGS 与艺术风格化之间的鸿沟，为游戏、虚拟世界和数字艺术等应用场景开辟了新路径。\n"
  },
  {
    "path": "abs/2504.16545.md",
    "content": "### ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration\n\nTime-of-Flight (ToF) sensors provide efficient active depth sensing at relatively low power budgets; among such designs, only very sparse measurements from low-resolution sensors are considered to meet the increasingly limited power constraints of mobile and AR/VR devices. However, such extreme sparsity levels limit the seamless usage of ToF depth in SLAM. In this work, we propose ToF-Splatting, the first 3D Gaussian Splatting-based SLAM pipeline tailored for using effectively very sparse ToF input data. Our approach improves upon the state of the art by introducing a multi-frame integration module, which produces dense depth maps by merging cues from extremely sparse ToF depth, monocular color, and multi-view geometry. Extensive experiments on both synthetic and real sparse ToF datasets demonstrate the viability of our approach, as it achieves state-of-the-art tracking and mapping performances on reference datasets.\n\nTime-of-Flight（ToF）传感器以相对较低的功耗实现了高效的主动式深度感知；在此类设计中，仅能从低分辨率传感器中获取非常稀疏的测量，以满足移动设备和 AR/VR 设备日益严格的功耗限制。然而，如此极端的稀疏程度限制了 ToF 深度在 SLAM 中的无缝应用。为此，我们提出 ToF-Splatting，这是首个专为极稀疏 ToF 输入设计的、基于三维高斯投影（3D Gaussian Splatting）的 SLAM 管线。我们的方法通过引入多帧融合模块，在现有技术基础上实现了改进。该模块将极其稀疏的 ToF 深度、单目彩色图像以及多视几何信息融合，从而生成稠密深度图。我们在合成与真实稀疏 ToF 数据集上的大量实验表明，该方法在参考数据集上达到了最先进的跟踪与建图性能，验证了其有效性。\n"
  },
  {
    "path": "abs/2504.16606.md",
    "content": "### HUG: Hierarchical Urban Gaussian Splatting with Block-Based Reconstruction\n\nAs urban 3D scenes become increasingly complex and the demand for high-quality rendering grows, efficient scene reconstruction and rendering techniques become crucial. We present HUG, a novel approach to address inefficiencies in handling large-scale urban environments and intricate details based on 3D Gaussian splatting. Our method optimizes data partitioning and the reconstruction pipeline by incorporating a hierarchical neural Gaussian representation. We employ an enhanced block-based reconstruction pipeline focusing on improving reconstruction quality within each block and reducing the need for redundant training regions around block boundaries. By integrating neural Gaussian representation with a hierarchical architecture, we achieve high-quality scene rendering at a low computational cost. This is demonstrated by our state-of-the-art results on public benchmarks, which prove the effectiveness and advantages in large-scale urban scene representation.\n\n随着城市三维场景日益复杂，以及对高质量渲染需求的增长，高效的场景重建与渲染技术变得尤为关键。我们提出 HUG，一种基于三维高斯投影（3D Gaussian Splatting）的方法，旨在解决处理大规模城市环境与复杂细节时的效率问题。该方法通过引入分层神经高斯表示，优化了数据划分与重建流程。我们采用增强的基于块的重建管线，重点提升每个块内的重建质量，并减少块边界周围冗余训练区域的需求。通过将神经高斯表示与分层结构相结合，HUG 实现了低计算成本下的高质量场景渲染。我们在公开基准数据集上的实验结果表明，HUG 在大规模城市场景表达方面达到了当前最先进水平，验证了其有效性与优势。\n"
  },
  {
    "path": "abs/2504.16693.md",
    "content": "### PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation\n\nWhile non-prehensile manipulation (e.g., controlled pushing/poking) constitutes a foundational robotic skill, its learning remains challenging due to the high sensitivity to complex physical interactions involving friction and restitution. To achieve robust policy learning and generalization, we opt to learn a world model of the 3D rigid body dynamics involved in non-prehensile manipulations and use it for model-based reinforcement learning. We propose PIN-WM, a Physics-INformed World Model that enables efficient end-to-end identification of a 3D rigid body dynamical system from visual observations. Adopting differentiable physics simulation, PIN-WM can be learned with only few-shot and task-agnostic physical interaction trajectories. Further, PIN-WM is learned with observational loss induced by Gaussian Splatting without needing state estimation. To bridge Sim2Real gaps, we turn the learned PIN-WM into a group of Digital Cousins via physics-aware randomizations which perturb physics and rendering parameters to generate diverse and meaningful variations of the PIN-WM. Extensive evaluations on both simulation and real-world tests demonstrate that PIN-WM, enhanced with physics-aware digital cousins, facilitates learning robust non-prehensile manipulation skills with Sim2Real transfer, surpassing the Real2Sim2Real state-of-the-arts.\n\n尽管非抓取式操作（如受控推动/戳动）构成了机器人操作的基础技能，但由于其对涉及摩擦与回弹等复杂物理交互高度敏感，相关学习仍然具有挑战性。为实现稳健的策略学习与泛化，我们选择学习非抓取操作中涉及的三维刚体动力学的世界模型，并将其用于基于模型的强化学习。我们提出 PIN-WM，一种物理先验驱动的世界模型（Physics-INformed World Model），能够从视觉观测中高效地端到端识别三维刚体动力学系统。PIN-WM 采用可微分的物理仿真，仅需少量、任务无关的物理交互轨迹即可学习。此外，PIN-WM 使用由 Gaussian Splatting 引导的观测损失进行训练，无需进行状态估计。\n为了弥合模拟到现实（Sim2Real）的差距，我们将训练得到的 PIN-WM 转化为一组“数字近亲”（Digital Cousins），通过引入物理感知的随机扰动机制，在物理与渲染参数上进行扰动，从而生成多样且具有意义的模型变体。我们在仿真与真实场景中的广泛评估表明，结合物理感知数字近亲的 PIN-WM 有助于学习稳健的非抓取操作技能，实现从模拟到现实的有效迁移，性能优于当前最先进的 Real2Sim2Real 方法。\n"
  },
  {
    "path": "abs/2504.16740.md",
    "content": "### Gaussian Splatting is an Effective Data Generator for 3D Object Detection\n\nWe investigate data augmentation for 3D object detection in autonomous driving. We utilize recent advancements in 3D reconstruction based on Gaussian Splatting for 3D object placement in driving scenes. Unlike existing diffusion-based methods that synthesize images conditioned on BEV layouts, our approach places 3D objects directly in the reconstructed 3D space with explicitly imposed geometric transformations. This ensures both the physical plausibility of object placement and highly accurate 3D pose and position annotations.\nOur experiments demonstrate that even by integrating a limited number of external 3D objects into real scenes, the augmented data significantly enhances 3D object detection performance and outperforms existing diffusion-based 3D augmentation for object detection. Extensive testing on the nuScenes dataset reveals that imposing high geometric diversity in object placement has a greater impact compared to the appearance diversity of objects. Additionally, we show that generating hard examples, either by maximizing detection loss or imposing high visual occlusion in camera images, does not lead to more efficient 3D data augmentation for camera-based 3D object detection in autonomous driving.\n\n我们针对自动驾驶中的三维目标检测任务，研究了数据增强方法。我们利用基于高斯投影（Gaussian Splatting）的最新三维重建技术，将三维目标直接置入驾驶场景中。不同于现有基于扩散模型的方法，这些方法依赖于基于 BEV（鸟瞰图）布局生成图像，我们的方法则在重建的三维空间中直接放置三维目标，并显式施加几何变换，从而确保了目标放置的物理合理性，并实现了高精度的三维姿态与位置标注。\n实验结果表明，即便只在真实场景中引入数量有限的外部三维目标，所生成的增强数据也能显著提升三维目标检测性能，并优于现有基于扩散模型的三维数据增强方法。在 nuScenes 数据集上的大量测试进一步表明，在目标放置中引入高度几何多样性，相较于外观多样性，对性能提升影响更大。此外，我们还发现，通过最大化检测损失或在图像中施加高程度遮挡来生成“困难样本”，并不能有效提升基于摄像头的三维目标检测中的三维数据增强效率。\n"
  },
  {
    "path": "abs/2504.17545.md",
    "content": "### When Gaussian Meets Surfel: Ultra-fast High-fidelity Radiance Field Rendering\n\nWe introduce Gaussian-enhanced Surfels (GESs), a bi-scale representation for radiance field rendering, wherein a set of 2D opaque surfels with view-dependent colors represent the coarse-scale geometry and appearance of scenes, and a few 3D Gaussians surrounding the surfels supplement fine-scale appearance details. The rendering with GESs consists of two passes -- surfels are first rasterized through a standard graphics pipeline to produce depth and color maps, and then Gaussians are splatted with depth testing and color accumulation on each pixel order independently. The optimization of GESs from multi-view images is performed through an elaborate coarse-to-fine procedure, faithfully capturing rich scene appearance. The entirely sorting-free rendering of GESs not only achieves very fast rates, but also produces view-consistent images, successfully avoiding popping artifacts under view changes. The basic GES representation can be easily extended to achieve anti-aliasing in rendering (Mip-GES), boosted rendering speeds (Speedy-GES) and compact storage (Compact-GES), and reconstruct better scene geometries by replacing 3D Gaussians with 2D Gaussians (2D-GES). Experimental results show that GESs advance the state-of-the-arts as a compelling representation for ultra-fast high-fidelity radiance field rendering.\n\n我们提出 Gaussian-enhanced Surfels（GESs），这是一种用于辐射场渲染的双尺度表示方法。该表示中，一组具有视角相关颜色的二维不透明 surfel 表示场景的粗尺度几何与外观，而少量围绕 surfel 分布的三维高斯则补充细节层面的外观信息。GES 的渲染过程包括两个阶段：首先通过标准图形渲染管线对 surfel 进行光栅化，生成深度图与颜色图；随后对高斯进行 splatting 渲染，在每个像素顺序上独立执行深度测试与颜色累积。\nGES 的多视图图像优化采用精心设计的由粗到细的优化流程，能够忠实捕捉丰富的场景外观特征。GES 的全程无需排序的渲染机制不仅实现了极高的渲染速度，还能生成视角一致的图像，成功避免了视角变化下常见的“跳动伪影”（popping artifacts）。\n该基础表示形式还可扩展以支持抗锯齿渲染（Mip-GES）、加速渲染（Speedy-GES）、紧凑存储（Compact-GES），以及通过将三维高斯替换为二维高斯（2D-GES）来获得更好的场景几何重建效果。实验结果表明，GES 作为一种适用于超高速、高保真辐射场渲染的表示方法，在多个方面推动了现有技术的进展。\n"
  },
  {
    "path": "abs/2504.17728.md",
    "content": "### CasualHDRSplat: Robust High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos\n\nRecently, photo-realistic novel view synthesis from multi-view images, such as neural radiance field (NeRF) and 3D Gaussian Splatting (3DGS), have garnered widespread attention due to their superior performance. However, most works rely on low dynamic range (LDR) images, which limits the capturing of richer scene details. Some prior works have focused on high dynamic range (HDR) scene reconstruction, typically require capturing of multi-view sharp images with different exposure times at fixed camera positions during exposure times, which is time-consuming and challenging in practice. For a more flexible data acquisition, we propose a one-stage method: \\textbf{CasualHDRSplat} to easily and robustly reconstruct the 3D HDR scene from casually captured videos with auto-exposure enabled, even in the presence of severe motion blur and varying unknown exposure time. \\textbf{CasualHDRSplat} contains a unified differentiable physical imaging model which first applies continuous-time trajectory constraint to imaging process so that we can jointly optimize exposure time, camera response function (CRF), camera poses, and sharp 3D HDR scene. Extensive experiments demonstrate that our approach outperforms existing methods in terms of robustness and rendering quality.\n\n近年来，从多视图图像中合成真实感新视角图像的方法（如 NeRF 和 3D Gaussian Splatting, 3DGS）因其卓越性能而受到广泛关注。然而，大多数方法依赖于低动态范围（LDR）图像，这限制了对更丰富场景细节的捕捉。部分已有工作尝试进行高动态范围（HDR）场景重建，通常需要在曝光期间、固定相机位置下拍摄多视角、不同曝光时间的清晰图像，这在实际中操作繁琐、成本较高。\n为实现更灵活的数据采集，我们提出了一种单阶段方法：CasualHDRSplat，能够从启用自动曝光、随手拍摄的视频中，稳健地重建三维 HDR 场景，即便存在严重的运动模糊与未知变化的曝光时间也能应对自如。CasualHDRSplat 包含一个统一的可微物理成像模型，首先在成像过程中引入连续时间轨迹约束，从而可联合优化曝光时间、相机响应函数（CRF）、相机姿态以及清晰的三维 HDR 场景。\n大量实验证明，我们的方法在鲁棒性与渲染质量方面均优于现有方法。\n"
  },
  {
    "path": "abs/2504.17810.md",
    "content": "### SmallGS: Gaussian Splatting-based Camera Pose Estimation for Small-Baseline Videos\n\nDynamic videos with small baseline motions are ubiquitous in daily life, especially on social media. However, these videos present a challenge to existing pose estimation frameworks due to ambiguous features, drift accumulation, and insufficient triangulation constraints. Gaussian splatting, which maintains an explicit representation for scenes, provides a reliable novel view rasterization when the viewpoint change is small. Inspired by this, we propose SmallGS, a camera pose estimation framework that is specifically designed for small-baseline videos. SmallGS optimizes sequential camera poses using Gaussian splatting, which reconstructs the scene from the first frame in each video segment to provide a stable reference for the rest. The temporal consistency of Gaussian splatting within limited viewpoint differences reduced the requirement of sufficient depth variations in traditional camera pose estimation. We further incorporate pretrained robust visual features, e.g. DINOv2, into Gaussian splatting, where high-dimensional feature map rendering enhances the robustness of camera pose estimation. By freezing the Gaussian splatting and optimizing camera viewpoints based on rasterized features, SmallGS effectively learns camera poses without requiring explicit feature correspondences or strong parallax motion. We verify the effectiveness of SmallGS in small-baseline videos in TUM-Dynamics sequences, which achieves impressive accuracy in camera pose estimation compared to MonST3R and DORID-SLAM for small-baseline videos in dynamic scenes.\n\n日常生活中广泛存在具有小基线运动的动态视频，尤其在社交媒体中尤为常见。然而，由于特征模糊、误差累积以及三角化约束不足，这类视频对现有的位姿估计算法提出了挑战。Gaussian Splatting 通过对场景的显式表示，在视角变化较小时能实现稳定可靠的新视角渲染，激发了我们的方法设计。\n本文提出 SmallGS，一个专为小基线视频设计的相机位姿估计框架。SmallGS 利用 Gaussian Splatting 优化连续帧的相机位姿，并以每个视频片段的首帧重建场景，作为后续帧的稳定参考。由于 Gaussian Splatting 在小视角变化内具有良好的时间一致性，SmallGS 可缓解传统方法对明显深度差异的依赖。\n此外，我们将预训练的强鲁棒性视觉特征（如 DINOv2）融入 Gaussian Splatting，通过渲染高维特征图增强位姿估计的稳定性。在不更新高斯图元的前提下，仅基于特征图对相机位姿进行优化，SmallGS 无需显式特征匹配或强视差运动，即可有效学习相机运动。\n我们在 TUM-Dynamics 数据集的小基线视频上验证了 SmallGS 的有效性，其相机位姿估计精度明显优于 MonST3R 和 DORID-SLAM 等现有方法，展现了在动态场景小基线视频下的强大表现。\n"
  },
  {
    "path": "abs/2504.17815.md",
    "content": "### Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful and efficient 3D representation for novel view synthesis. This paper extends 3DGS capabilities to inpainting, where masked objects in a scene are replaced with new contents that blend seamlessly with the surroundings. Unlike 2D image inpainting, 3D Gaussian inpainting (3DGI) is challenging in effectively leveraging complementary visual and semantic cues from multiple input views, as occluded areas in one view may be visible in others. To address this, we propose a method that measures the visibility uncertainties of 3D points across different input views and uses them to guide 3DGI in utilizing complementary visual cues. We also employ uncertainties to learn a semantic concept of scene without the masked object and use a diffusion model to fill masked objects in input images based on the learned concept. Finally, we build a novel 3DGI framework, VISTA, by integrating VISibility-uncerTainty-guided 3DGI with scene conceptuAl learning. VISTA generates high-quality 3DGS models capable of synthesizing artifact-free and naturally inpainted novel views. Furthermore, our approach extends to handling dynamic distractors arising from temporal object changes, enhancing its versatility in diverse scene reconstruction scenarios. We demonstrate the superior performance of our method over state-of-the-art techniques using two challenging datasets: the SPIn-NeRF dataset, featuring 10 diverse static 3D inpainting scenes, and an underwater 3D inpainting dataset derived from UTB180, including fast-moving fish as inpainting targets.\n\n3D Gaussian Splatting（3DGS）作为一种高效而强大的三维表示形式，在新视角合成任务中表现突出。本文将 3DGS 的能力拓展至补全任务（inpainting），即将场景中被遮挡或移除的目标以与周围环境自然融合的内容进行替代。不同于二维图像补全，**三维高斯补全（3DGI）**面临更大挑战：如何有效利用多视图中互补的视觉和语义线索，尤其当某一视角中遮挡区域在其他视角中可见时。\n为此，我们提出一种方法，通过衡量三维点在不同输入视角下的可见性不确定性，引导 3DGI 更好地利用互补视觉线索。同时，我们利用这些不确定性学习一个去除遮挡物后的场景语义概念，并基于该概念使用扩散模型对输入图像中的遮挡区域进行填补。最终，我们构建了一个新的三维补全框架 VISTA，将可见性不确定性引导的 3DGI与场景语义学习相结合，生成高质量的 3DGS 模型，能够合成无伪影、自然过渡的补全视图。\n此外，我们的方法还能处理由于时间变化引起的动态干扰目标，提升其在多样化场景重建任务中的适应性。我们在两个具有挑战性的数据集上验证了方法的优越性：一是包含 10 个多样静态 3D 补全过程景的 SPIn-NeRF 数据集；二是从 UTB180 构建的水下三维补全数据集，其中以高速游动的鱼类为补全目标。实验结果表明，我们的方法在补全质量与视图一致性方面均优于现有最先进技术。\n"
  },
  {
    "path": "abs/2504.17954.md",
    "content": "### iVR-GS: Inverse Volume Rendering for Explorable Visualization via Editable 3D Gaussian Splatting\n\nIn volume visualization, users can interactively explore the three-dimensional data by specifying color and opacity mappings in the transfer function (TF) or adjusting lighting parameters, facilitating meaningful interpretation of the underlying structure. However, rendering large-scale volumes demands powerful GPUs and high-speed memory access for real-time performance. While existing novel view synthesis (NVS) methods offer faster rendering speeds with lower hardware requirements, the visible parts of a reconstructed scene are fixed and constrained by preset TF settings, significantly limiting user exploration. This paper introduces inverse volume rendering via Gaussian splatting (iVR-GS), an innovative NVS method that reduces the rendering cost while enabling scene editing for interactive volume exploration. Specifically, we compose multiple iVR-GS models associated with basic TFs covering disjoint visible parts to make the entire volumetric scene visible. Each basic model contains a collection of 3D editable Gaussians, where each Gaussian is a 3D spatial point that supports real-time scene rendering and editing. We demonstrate the superior reconstruction quality and composability of iVR-GS against other NVS solutions (Plenoxels, CCNeRF, and base 3DGS) on various volume datasets. The code is available at this https URL.\n\n在体数据可视化中，用户可以通过在传输函数（Transfer Function, TF）中指定颜色与不透明度映射，或调整光照参数，交互式地探索三维数据，从而更有意义地理解其潜在结构。然而，对大规模体数据的实时渲染通常依赖高性能 GPU 和高速内存访问，这对计算资源提出了较高要求。尽管现有的新视角合成（Novel View Synthesis, NVS）方法具有更快的渲染速度和更低的硬件要求，但其重建场景的可见区域往往固定，受限于预设的 TF 配置，严重限制了用户的探索灵活性。\n本文提出了一种创新的 NVS 方法 —— 基于高斯投影的反向体渲染（inverse Volume Rendering via Gaussian Splatting, iVR-GS），在显著降低渲染成本的同时，实现了场景编辑与交互式体数据探索。具体而言，我们将多个与基本 TF 对应的 iVR-GS 模型进行组合，每个基本 TF 覆盖体数据中一个不相交的可见子区域，从而实现整个体数据场景的完整可视化。每个基本模型由一组可编辑的三维高斯组成，每个高斯对应一个三维空间点，支持实时场景渲染与编辑。\n我们在多个体数据集上将 iVR-GS 与现有 NVS 方法（如 Plenoxels、CCNeRF 和基础 3DGS）进行对比，结果表明 iVR-GS 在重建质量和模型可组合性方面均具有显著优势。\n"
  },
  {
    "path": "abs/2504.18165.md",
    "content": "### PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models\n\nWe introduce PerfCam, an open source Proof-of-Concept (PoC) digital twinning framework that combines camera and sensory data with 3D Gaussian Splatting and computer vision models for digital twinning, object tracking, and Key Performance Indicators (KPIs) extraction in industrial production lines. By utilizing 3D reconstruction and Convolutional Neural Networks (CNNs), PerfCam offers a semi-automated approach to object tracking and spatial mapping, enabling digital twins that capture real-time KPIs such as availability, performance, Overall Equipment Effectiveness (OEE), and rate of conveyor belts in the production line. We validate the effectiveness of PerfCam through a practical deployment within realistic test production lines in the pharmaceutical industry and contribute an openly published dataset to support further research and development in the field. The results demonstrate PerfCam's ability to deliver actionable insights through its precise digital twin capabilities, underscoring its value as an effective tool for developing usable digital twins in smart manufacturing environments and extracting operational analytics.\n\n我们提出 PerfCam，一个开源的概念验证（Proof-of-Concept, PoC）型数字孪生框架，结合摄像头与传感器数据、三维高斯投影（3D Gaussian Splatting）以及计算机视觉模型，用于工业产线中的数字孪生、目标跟踪与关键绩效指标（KPI）提取。PerfCam 借助三维重建与卷积神经网络（CNN），提供了一种半自动化的目标跟踪与空间映射方案，从而实现可实时捕捉 KPI（如可用性、性能、设备综合效率 OEE 及传送带速率等）的数字孪生系统。\n我们在制药行业的真实测试产线中部署 PerfCam，并验证其有效性，同时公开发布了相关数据集，支持该领域的进一步研究与开发。实验结果表明，PerfCam 能通过精准的数字孪生能力提供可行的业务洞察，体现出其作为智能制造环境中高效数字孪生构建与运营分析工具的价值。\n"
  },
  {
    "path": "abs/2504.18318.md",
    "content": "### STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting\n\nText-to-4D generation is rapidly developing and widely applied in various scenarios. However, existing methods often fail to incorporate adequate spatio-temporal modeling and prompt alignment within a unified framework, resulting in temporal inconsistencies, geometric distortions, or low-quality 4D content that deviates from the provided texts. Therefore, we propose STP4D, a novel approach that aims to integrate comprehensive spatio-temporal-prompt consistency modeling for high-quality text-to-4D generation. Specifically, STP4D employs three carefully designed modules: Time-varying Prompt Embedding, Geometric Information Enhancement, and Temporal Extension Deformation, which collaborate to accomplish this goal. Furthermore, STP4D is among the first methods to exploit the Diffusion model to generate 4D Gaussians, combining the fine-grained modeling capabilities and the real-time rendering process of 4DGS with the rapid inference speed of the Diffusion model. Extensive experiments demonstrate that STP4D excels in generating high-fidelity 4D content with exceptional efficiency (approximately 4.6s per asset), surpassing existing methods in both quality and speed.\n\n文本生成 4D 内容（Text-to-4D Generation）正在迅速发展，并在多种场景中得到广泛应用。然而，现有方法往往缺乏统一框架下对时空建模与文本对齐的充分整合，导致生成的 4D 内容在时间维度上不连贯、几何结构失真，或与文本描述偏离，质量较差。\n为解决上述问题，我们提出 STP4D，一种全新的高质量文本生成 4D 内容的方法，旨在实现全面的 时空-文本一致性建模。具体而言，STP4D 设计了三个关键模块：时间变化提示嵌入（Time-varying Prompt Embedding）、几何信息增强（Geometric Information Enhancement） 与 时间扩展变形（Temporal Extension Deformation），三者协同工作以保障生成质量。\n此外，STP4D 是首批将扩散模型（Diffusion Model）用于生成 4D 高斯（4D Gaussians） 的方法之一，结合了 4DGS 在细粒度建模与实时渲染方面的能力，以及扩散模型在推理效率上的优势。大量实验表明，STP4D 能以约 4.6 秒/资产 的速度高效生成高保真 4D 内容，在质量与速度上均显著优于现有方法。\n"
  },
  {
    "path": "abs/2504.18468.md",
    "content": "### RGS-DR: Reflective Gaussian Surfels with Deferred Rendering for Shiny Objects\n\nWe introduce RGS-DR, a novel inverse rendering method for reconstructing and rendering glossy and reflective objects with support for flexible relighting and scene editing. Unlike existing methods (e.g., NeRF and 3D Gaussian Splatting), which struggle with view-dependent effects, RGS-DR utilizes a 2D Gaussian surfel representation to accurately estimate geometry and surface normals, an essential property for high-quality inverse rendering. Our approach explicitly models geometric and material properties through learnable primitives rasterized into a deferred shading pipeline, effectively reducing rendering artifacts and preserving sharp reflections. By employing a multi-level cube mipmap, RGS-DR accurately approximates environment lighting integrals, facilitating high-quality reconstruction and relighting. A residual pass with spherical-mipmap-based directional encoding further refines the appearance modeling. Experiments demonstrate that RGS-DR achieves high-quality reconstruction and rendering quality for shiny objects, often outperforming reconstruction-exclusive state-of-the-art methods incapable of relighting.\n\n我们提出 RGS-DR，一种用于重建与渲染具有光泽和反射特性的物体的新型逆向渲染方法，支持灵活的重光照（relighting）与场景编辑。不同于现有方法（如 NeRF 和 3D Gaussian Splatting）在处理视角依赖效果方面存在困难，RGS-DR 引入了 二维高斯 surfel 表示，可准确估计几何与表面法线，这是实现高质量逆向渲染的关键属性。\n我们的方法通过可学习图元对几何与材质属性进行显式建模，并将其光栅化至延迟渲染管线中，有效减少了渲染伪影并保留了清晰的反射效果。通过引入多层级的 立方体 mipmap，RGS-DR 能精确近似环境光照积分，从而实现高质量的重建与重光照。同时，结合基于球面 mipmap 的方向编码，残差通道进一步优化外观建模。\n实验结果表明，RGS-DR 能在处理高光泽物体时实现高质量的重建与渲染效果，且在可重光照任务中超越了那些仅关注重建的先进方法。\n"
  },
  {
    "path": "abs/2504.18768.md",
    "content": "### TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians\n\nThe emergence of neural and Gaussian-based radiance field methods has led to considerable advancements in novel view synthesis and 3D object reconstruction. Nonetheless, specular reflection and refraction continue to pose significant challenges due to the instability and incorrect overfitting of radiance fields to high-frequency light variations. Currently, even 3D Gaussian Splatting (3D-GS), as a powerful and efficient tool, falls short in recovering transparent objects with nearby contents due to the existence of apparent secondary ray effects. To address this issue, we propose TransparentGS, a fast inverse rendering pipeline for transparent objects based on 3D-GS. The main contributions are three-fold. Firstly, an efficient representation of transparent objects, transparent Gaussian primitives, is designed to enable specular refraction through a deferred refraction strategy. Secondly, we leverage Gaussian light field probes (GaussProbe) to encode both ambient light and nearby contents in a unified framework. Thirdly, a depth-based iterative probes query (IterQuery) algorithm is proposed to reduce the parallax errors in our probe-based framework. Experiments demonstrate the speed and accuracy of our approach in recovering transparent objects from complex environments, as well as several applications in computer graphics and vision.\n\n神经辐射场与基于高斯的辐射场方法的出现，推动了新视角合成和三维物体重建的显著进展。然而，由于辐射场在高频光照变化下的不稳定性与错误过拟合，镜面反射与折射仍然构成重大挑战。目前，即使是功能强大且高效的 3D Gaussian Splatting（3D-GS）方法，也难以重建包含邻近内容的透明物体，原因在于明显存在的二次光线效应。\n为解决这一问题，我们提出 TransparentGS，一种基于 3D-GS 的用于透明物体的快速逆渲染管线。我们的主要贡献包括：设计了一种透明物体的高效表示形式 —— 透明高斯图元（transparent Gaussian primitives），通过延迟折射策略实现镜面折射建模；引入高斯光场探针（GaussProbe），在统一框架下同时编码环境光与邻近内容信息；提出基于深度的迭代探针查询算法（IterQuery），用于减少探针框架中的视差误差。\n实验结果表明，TransparentGS 在复杂环境中能够高效且准确地重建透明物体，并展示了其在计算机图形与视觉应用中的潜力。\n"
  },
  {
    "path": "abs/2504.18925.md",
    "content": "### 4DGS-CC: A Contextual Coding Framework for 4D Gaussian Splatting Data Compression\n\nStorage is a significant challenge in reconstructing dynamic scenes with 4D Gaussian Splatting (4DGS) data. In this work, we introduce 4DGS-CC, a contextual coding framework that compresses 4DGS data to meet specific storage constraints. Building upon the established deformable 3D Gaussian Splatting (3DGS) method, our approach decomposes 4DGS data into 4D neural voxels and a canonical 3DGS component, which are then compressed using Neural Voxel Contextual Coding (NVCC) and Vector Quantization Contextual Coding (VQCC), respectively. Specifically, we first decompose the 4D neural voxels into distinct quantized features by separating the temporal and spatial dimensions. To losslessly compress each quantized feature, we leverage the previously compressed features from the temporal and spatial dimensions as priors and apply NVCC to generate the spatiotemporal context for contextual coding. Next, we employ a codebook to store spherical harmonics information from canonical 3DGS as quantized vectors, which are then losslessly compressed by using VQCC with the auxiliary learned hyperpriors for contextual coding, thereby reducing redundancy within the codebook. By integrating NVCC and VQCC, our contextual coding framework, 4DGS-CC, enables multi-rate 4DGS data compression tailored to specific storage requirements. Extensive experiments on three 4DGS data compression benchmarks demonstrate that our method achieves an average storage reduction of approximately 12 times while maintaining rendering fidelity compared to our baseline 4DGS approach.\n\n在利用 4D Gaussian Splatting（4DGS）重建动态场景时，存储需求始终是一项重大挑战。为此，我们提出 4DGS-CC，一种上下文编码框架，用于在满足特定存储约束的前提下对 4DGS 数据进行压缩。\n\n本方法基于已有的可变形三维高斯投影（3DGS）方法，将 4DGS 数据分解为 4D 神经体素与一个标准 3DGS 分量，分别采用神经体素上下文编码（NVCC）与向量量化上下文编码（VQCC）进行压缩。\n具体而言，我们首先将 4D 神经体素沿时间和空间维度进行分解，得到离散的量化特征。为对每个量化特征进行无损压缩，我们利用其在时间与空间维度上的已压缩特征作为先验，通过 NVCC 构建时空上下文以实现上下文编码。\n随后，我们使用一个码本来存储来自标准 3DGS 的球谐信息，将其表示为量化向量，并结合辅助学习得到的超先验，通过 VQCC 对其进行无损压缩，从而减少码本中的冗余信息。\n通过将 NVCC 与 VQCC 有效结合，4DGS-CC 实现了面向不同存储需求的多码率 4DGS 数据压缩。在三个 4DGS 数据压缩基准数据集上的大量实验表明，相较于原始 4DGS 方法，我们的方法在保持渲染保真度的同时，平均可实现约 12 倍的存储压缩率。\n"
  },
  {
    "path": "abs/2504.19261.md",
    "content": "### Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting\n\nScene view synthesis, which generates novel views from limited perspectives, is increasingly vital for applications like virtual reality, augmented reality, and robotics. Unlike object-based tasks, such as generating 360° views of a car, scene view synthesis handles entire environments where non-uniform observations pose unique challenges for stable rendering quality. To address this issue, we propose a novel approach: renderability field-guided gaussian splatting (RF-GS). This method quantifies input inhomogeneity through a renderability field, guiding pseudo-view sampling to enhanced visual consistency. To ensure the quality of wide-baseline pseudo-views, we train an image restoration model to map point projections to visible-light styles. Additionally, our validated hybrid data optimization strategy effectively fuses information of pseudo-view angles and source view textures. Comparative experiments on simulated and real-world data show that our method outperforms existing approaches in rendering stability.\n\n场景视图合成（Scene View Synthesis）旨在从有限视角生成新视图，对于虚拟现实、增强现实和机器人等应用日益重要。与“物体生成”任务（如生成汽车的 360° 视图）不同，场景视图合成处理的是整个环境，其非均匀的观测数据带来了渲染质量稳定性方面的独特挑战。\n为解决这一问题，我们提出一种新方法：可渲染性场引导的高斯投影（Renderability Field-Guided Gaussian Splatting, RF-GS）。该方法通过构建可渲染性场对输入数据的非均匀性进行量化，从而引导伪视图采样，提升视觉一致性。为了保证宽基线伪视图的图像质量，我们训练了一个图像恢复模型，将点投影映射为可见光风格的图像。此外，我们提出并验证了一种混合数据优化策略，能够有效融合伪视角与源视图纹理信息。\n在模拟与真实数据集上的对比实验表明，该方法在渲染稳定性方面优于现有技术。\n"
  },
  {
    "path": "abs/2504.19409.md",
    "content": "### GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field\n\nSemantic-aware 3D scene reconstruction is essential for autonomous robots to perform complex interactions. Semantic SLAM, an online approach, integrates pose tracking, geometric reconstruction, and semantic mapping into a unified framework, shows significant potential. However, existing systems, which rely on 2D ground truth priors for supervision, are often limited by the sparsity and noise of these signals in real-world environments. To address this challenge, we propose GSFF-SLAM, a novel dense semantic SLAM system based on 3D Gaussian Splatting that leverages feature fields to achieve joint rendering of appearance, geometry, and N-dimensional semantic features. By independently optimizing feature gradients, our method supports semantic reconstruction using various forms of 2D priors, particularly sparse and noisy signals. Experimental results demonstrate that our approach outperforms previous methods in both tracking accuracy and photorealistic rendering quality. When utilizing 2D ground truth priors, GSFF-SLAM achieves state-of-the-art semantic segmentation performance with 95.03% mIoU, while achieving up to 2.9 speedup with only marginal performance degradation.\n\n语义感知的三维场景重建对于自主机器人执行复杂交互任务至关重要。作为一种在线方法，语义 SLAM 将位姿跟踪、几何重建与语义映射整合于统一框架中，展现出显著潜力。然而，现有系统普遍依赖二维真值先验进行监督，在真实环境中常受到信号稀疏与噪声干扰的限制。\n为应对这一挑战，我们提出 GSFF-SLAM，一种基于三维高斯投影（3D Gaussian Splatting）的新型稠密语义 SLAM 系统。该方法通过引入特征场（feature fields），实现了外观、几何与 N 维语义特征的联合渲染。通过对特征梯度的独立优化，GSFF-SLAM 能够利用多种形式的二维先验进行语义重建，尤其在处理稀疏或噪声较大的信号时表现出强鲁棒性。\n实验结果表明，我们的方法在跟踪精度与真实感渲染质量上均优于现有方法。在采用二维真值先验的情况下，GSFF-SLAM 达到 95.03% mIoU 的语义分割性能，同时在保持性能仅有轻微下降的前提下，实现最高 2.9 倍 的加速，展现出优越的效率与准确性。\n"
  },
  {
    "path": "abs/2504.20378.md",
    "content": "### Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views\n\nWe present a Gaussian Splatting method for surface reconstruction using sparse input views. Previous methods relying on dense views struggle with extremely sparse Structure-from-Motion points for initialization. While learning-based Multi-view Stereo (MVS) provides dense 3D points, directly combining it with Gaussian Splatting leads to suboptimal results due to the ill-posed nature of sparse-view geometric optimization. We propose Sparse2DGS, an MVS-initialized Gaussian Splatting pipeline for complete and accurate reconstruction. Our key insight is to incorporate the geometric-prioritized enhancement schemes, allowing for direct and robust geometric learning under ill-posed conditions. Sparse2DGS outperforms existing methods by notable margins while being  faster than the NeRF-based fine-tuning approach.\n\n我们提出了一种用于稀疏视角输入下的表面重建的高斯泼溅方法（Gaussian Splatting）。以往依赖密集视角的重建方法，在初始化时面对极度稀疏的结构光束（Structure-from-Motion, SfM）点云时表现不佳。尽管基于学习的多视角立体（Multi-view Stereo, MVS）方法能够提供稠密的三维点云，但将其直接与高斯泼溅结合，因稀疏视角下几何优化问题的病态性质，通常会导致次优的结果。我们提出了 Sparse2DGS —— 一种以 MVS 为初始化的高斯泼溅重建流程，可实现完整且精确的重建。我们工作的核心思想是引入以几何为优先的增强机制，从而能够在病态条件下实现直接且鲁棒的几何学习。Sparse2DGS 在重建质量上显著优于现有方法，同时相比基于 NeRF 的微调方法快 2 倍。\n"
  },
  {
    "path": "abs/2504.20379.md",
    "content": "### GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting\n\nIn this paper, we present a method for localizing a query image with respect to a precomputed 3D Gaussian Splatting (3DGS) scene representation. First, the method uses 3DGS to render a synthetic RGBD image at some initial pose estimate. Second, it establishes 2D-2D correspondences between the query image and this synthetic image. Third, it uses the depth map to lift the 2D-2D correspondences to 2D-3D correspondences and solves a perspective-n-point (PnP) problem to produce a final pose estimate. Results from evaluation across three existing datasets with 38 scenes and over 2,700 test images show that our method significantly reduces both inference time (by over two orders of magnitude, from more than 10 seconds to as fast as 0.1 seconds) and estimation error compared to baseline methods that use photometric loss minimization. Results also show that our method tolerates large errors in the initial pose estimate of up to 55° in rotation and 1.1 units in translation (normalized by scene scale), achieving final pose errors of less than 5° in rotation and 0.05 units in translation on 90% of images from the Synthetic NeRF and Mip-NeRF360 datasets and on 42% of images from the more challenging Tanks and Temples dataset.\n\n在本文中，我们提出了一种方法，用于将查询图像定位到预先计算的三维高斯泼溅（3D Gaussian Splatting, 3DGS）场景表示中。该方法首先使用 3DGS 在某个初始位姿估计下渲染出一张合成的 RGBD 图像。然后，在查询图像与该合成图像之间建立 2D-2D 对应关系。接着，利用深度图将这些 2D-2D 对应关系提升为 2D-3D 对应关系，并通过求解透视-n-点（PnP）问题得到最终的位姿估计。\n在三个已有数据集上对共 38 个场景和超过 2700 张测试图像进行评估的结果表明，与基于光度损失最小化的基线方法相比，我们的方法在推理时间上显著降低（从 10 秒以上缩短至最快 0.1 秒，提升超过两个数量级），同时也大幅减少了估计误差。实验还表明，我们的方法能够容忍初始位姿估计中高达 55° 的旋转误差和 1.1 个单位（按场景尺度归一化）的平移误差，并在 Synthetic NeRF 和 Mip-NeRF360 数据集上的 90% 图像以及更具挑战性的 Tanks and Temples 数据集上的 42% 图像上，将最终位姿误差控制在 5° 的旋转和 0.05 个单位的平移以内。\n"
  },
  {
    "path": "abs/2504.20403.md",
    "content": "### Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting\n\nPersonalized 3D avatar editing holds significant promise due to its user-friendliness and availability to applications such as AR/VR and virtual try-ons. Previous studies have explored the feasibility of 3D editing, but often struggle to generate visually pleasing results, possibly due to the unstable representation learning under mixed optimization of geometry and texture in complicated reconstructed scenarios. In this paper, we aim to provide an accessible solution for ordinary users to create their editable 3D avatars with precise region localization, geometric adaptability, and photorealistic renderings. To tackle this challenge, we introduce a meticulously designed framework that decouples the editing process into local spatial adaptation and realistic appearance learning, utilizing a hybrid Tetrahedron-constrained Gaussian Splatting (TetGS) as the underlying representation. TetGS combines the controllable explicit structure of tetrahedral grids with the high-precision rendering capabilities of 3D Gaussian Splatting and is optimized in a progressive manner comprising three stages: 3D avatar instantiation from real-world monocular videos to provide accurate priors for TetGS initialization; localized spatial adaptation with explicitly partitioned tetrahedrons to guide the redistribution of Gaussian kernels; and geometry-based appearance generation with a coarse-to-fine activation strategy. Both qualitative and quantitative experiments demonstrate the effectiveness and superiority of our approach in generating photorealistic 3D editable avatars.\n\n个性化三维头像编辑因其易用性以及在 AR/VR 和虚拟试穿等应用中的广泛潜力而备受关注。已有研究虽验证了三维编辑的可行性，但往往难以生成视觉效果令人满意的结果，这可能源于在复杂重建场景中，几何与纹理混合优化下的不稳定表示学习。\n本文旨在为普通用户提供一种可访问的解决方案，使其能够创建具有精准区域定位、几何自适应性和真实感渲染效果的可编辑三维头像。为解决这一挑战，我们提出了一个精心设计的框架，将编辑过程解耦为局部空间适配与真实外观学习两个阶段，并采用混合四面体约束高斯泼溅（Tetrahedron-constrained Gaussian Splatting, TetGS）作为底层表示。TetGS 结合了四面体网格的可控显式结构与三维高斯泼溅的高精度渲染能力，并通过三个阶段以渐进式方式进行优化：首先从真实单目视频中实例化三维头像，为 TetGS 初始化提供准确先验；接着通过显式划分的四面体实现局部空间适配，引导高斯核的重新分布；最后采用由粗到细的激活策略，完成基于几何的外观生成。\n定性与定量实验均表明，我们的方法在生成真实感强、可编辑的三维头像方面具备显著的效果与优势。\n"
  },
  {
    "path": "abs/2504.20607.md",
    "content": "### EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian\n\n3D Gaussian Splatting (3DGS) has been recognized as a pioneering technique in scene reconstruction and novel view synthesis. Recent work on reconstructing the 3D human body using 3DGS attempts to leverage prior information on human pose to enhance rendering quality and improve training speed. However, it struggles to effectively fit dynamic surface planes due to multi-view inconsistency and redundant Gaussians. This inconsistency arises because Gaussian ellipsoids cannot accurately represent the surfaces of dynamic objects, which hinders the rapid reconstruction of the dynamic human body. Meanwhile, the prevalence of redundant Gaussians means that the training time of these works is still not ideal for quickly fitting a dynamic human body. To address these, we propose EfficientHuman, a model that quickly accomplishes the dynamic reconstruction of the human body using Articulated 2D Gaussian while ensuring high rendering quality. The key innovation involves encoding Gaussian splats as Articulated 2D Gaussian surfels in canonical space and then transforming them to pose space via Linear Blend Skinning (LBS) to achieve efficient pose transformations. Unlike 3D Gaussians, Articulated 2D Gaussian surfels can quickly conform to the dynamic human body while ensuring view-consistent geometries. Additionally, we introduce a pose calibration module and an LBS optimization module to achieve precise fitting of dynamic human poses, enhancing the model's performance. Extensive experiments on the ZJU-MoCap dataset demonstrate that EfficientHuman achieves rapid 3D dynamic human reconstruction in less than a minute on average, which is 20 seconds faster than the current state-of-the-art method, while also reducing the number of redundant Gaussians.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）已被公认为场景重建与新视角合成中的开创性技术。近期在三维人体重建方向的研究尝试利用人体姿态的先验信息，以提升渲染质量并加快训练速度。然而，受限于多视角不一致性与高斯表示冗余，这些方法在动态表面拟合方面仍面临显著挑战。具体而言，Gaussian 椭球体难以精确表示动态物体表面，导致动态人体的快速重建受阻；同时，高斯核的冗余也使得训练时间仍难以满足高效拟合的需求。\n为应对上述问题，我们提出了 EfficientHuman，一种基于关节驱动二维高斯（Articulated 2D Gaussian）的高效动态人体重建模型，能够在确保高渲染质量的同时实现快速重建。该方法的关键创新在于：将高斯泼溅编码为标准空间中的 Articulated 2D Gaussian surfels，并通过线性混合蒙皮（Linear Blend Skinning, LBS）将其变换到姿态空间，从而实现高效的姿态变换。与传统三维高斯不同，Articulated 2D Gaussian surfels 可快速贴合动态人体，并保持视角一致的几何结构。\n此外，我们还引入了姿态校准模块与 LBS 优化模块，以实现对动态人体姿态的精确拟合，进一步提升模型性能。在 ZJU-MoCap 数据集上的大量实验证明，EfficientHuman 能够平均在一分钟内完成三维动态人体重建，比当前最先进的方法快 20 秒，同时有效减少了冗余高斯数量。\n"
  },
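  {
    "path": "notes/efficienthuman-lbs-sketch.md",
    "content": "### Editor's note: Linear Blend Skinning sketch (companion to abs/2504.20607.md)\n\nEfficientHuman transforms canonical-space Articulated 2D Gaussian surfels into pose space with Linear Blend Skinning (LBS). The snippet below is a minimal NumPy sketch of generic LBS applied to surfel centers, assuming per-bone rigid transforms and row-normalized skinning weights; this file, the function name, and the toy numbers are illustrative, not code from the paper.\n\n```python\nimport numpy as np\n\ndef lbs_transform(points, weights, rotations, translations):\n    # points: (N, 3) canonical positions, e.g. Gaussian surfel centers\n    # weights: (N, B) skinning weights, each row sums to 1\n    # rotations: (B, 3, 3) and translations: (B, 3) per-bone rigid transforms\n    posed = np.einsum('nb,bij,nj->ni', weights, rotations, points)\n    posed += weights @ translations\n    return posed  # a full surfel version would also blend each tangent frame\n\n# toy example: one point skinned equally to two bones\npts = np.array([[0.0, 1.0, 0.0]])\nw = np.array([[0.5, 0.5]])\nR = np.stack([np.eye(3), np.eye(3)])\nt = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0]])\nprint(lbs_transform(pts, w, R, t))  # halfway blend: [[0.1, 1.0, 0.0]]\n```\n"
  },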
  {
    "path": "abs/2504.20829.md",
    "content": "### GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion\n\nAs 3D Gaussian Splatting (3DGS) emerges as a breakthrough in scene representation and novel view synthesis, its rapid adoption in safety-critical domains (e.g., autonomous systems, AR/VR) urgently demands scrutiny of potential security vulnerabilities. This paper presents the first systematic study of backdoor threats in 3DGS pipelines. We identify that adversaries may implant backdoor views to induce malicious scene confusion during inference, potentially leading to environmental misperception in autonomous navigation or spatial distortion in immersive environments. To uncover this risk, we propose GuassTrap, a novel poisoning attack method targeting 3DGS models. GuassTrap injects malicious views at specific attack viewpoints while preserving high-quality rendering in non-target views, ensuring minimal detectability and maximizing potential harm. Specifically, the proposed method consists of a three-stage pipeline (attack, stabilization, and normal training) to implant stealthy, viewpoint-consistent poisoned renderings in 3DGS, jointly optimizing attack efficacy and perceptual realism to expose security risks in 3D rendering. Extensive experiments on both synthetic and real-world datasets demonstrate that GuassTrap can effectively embed imperceptible yet harmful backdoor views while maintaining high-quality rendering in normal views, validating its robustness, adaptability, and practical applicability.\n\n随着三维高斯泼溅（3D Gaussian Splatting, 3DGS）作为场景表示与新视角合成领域的突破性技术迅速兴起，其在自动驾驶系统、增强现实/虚拟现实（AR/VR）等安全关键领域的快速应用也引发了对潜在安全漏洞的紧迫关注。本文首次对 3DGS 流水线中的后门攻击威胁进行了系统性研究。\n我们发现，攻击者可能通过植入后门视角，在推理阶段诱发恶意场景混淆，从而在自动导航中导致环境误感知，或在沉浸式环境中引发空间畸变。为揭示这一风险，本文提出 GuassTrap，一种针对 3DGS 模型的全新投毒攻击方法。GuassTrap 在特定攻击视角注入恶意视图，同时在非目标视角保持高质量渲染，确保攻击隐蔽性最大化并实现潜在危害最大化。\n具体而言，该方法包括三个阶段的流水线：攻击阶段、稳定阶段和正常训练阶段，以在 3DGS 表示中植入隐蔽、视角一致的投毒渲染结果。该方法联合优化攻击效果与感知真实感，揭示了 3D 渲染中的安全风险。\n在合成与真实数据集上的大量实验证明，GuassTrap 能够有效嵌入不可察觉但具破坏性的后门视图，同时维持正常视角下的高质量渲染，验证了其鲁棒性、适应性和实际应用价值。\n"
  },
  {
    "path": "abs/2504.21067.md",
    "content": "### GauSS-MI: Gaussian Splatting Shannon Mutual Information for Active 3D Reconstruction\n\nThis research tackles the challenge of real-time active view selection and uncertainty quantification on visual quality for active 3D reconstruction. Visual quality is a critical aspect of 3D reconstruction. Recent advancements such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have notably enhanced the image rendering quality of reconstruction models. Nonetheless, the efficient and effective acquisition of input images for reconstruction-specifically, the selection of the most informative viewpoint-remains an open challenge, which is crucial for active reconstruction. Existing studies have primarily focused on evaluating geometric completeness and exploring unobserved or unknown regions, without direct evaluation of the visual uncertainty within the reconstruction model. To address this gap, this paper introduces a probabilistic model that quantifies visual uncertainty for each Gaussian. Leveraging Shannon Mutual Information, we formulate a criterion, Gaussian Splatting Shannon Mutual Information (GauSS-MI), for real-time assessment of visual mutual information from novel viewpoints, facilitating the selection of next best view. GauSS-MI is implemented within an active reconstruction system integrated with a view and motion planner. Extensive experiments across various simulated and real-world scenes showcase the superior visual quality and reconstruction efficiency performance of the proposed system.\n\n本研究针对主动三维重建中的实时视角选择与视觉质量不确定性量化这一挑战展开探索。视觉质量是三维重建中的关键因素。近年来，诸如神经辐射场（Neural Radiance Fields, NeRF）和三维高斯泼溅（3D Gaussian Splatting, 3DGS）等技术显著提升了重建模型的图像渲染质量。然而，在重建过程中高效获取输入图像，尤其是选择最具信息量的视角，仍是一个尚未解决的关键问题，对主动重建尤为重要。\n现有研究主要集中于评估几何完整性和探索未观测或未知区域，而未对重建模型中的视觉不确定性进行直接评估。为填补这一空白，本文提出了一种概率模型，用于量化每个高斯的视觉不确定性。我们基于香农互信息（Shannon Mutual Information）构建了一种新视角下的视觉互信息评估准则——Gaussian Splatting Shannon Mutual Information（GauSS-MI），用于实时评估不同视角的互信息，从而辅助选择“下一最佳视角”（Next Best View）。\nGauSS-MI 被集成到一个主动重建系统中，该系统包含视角与运动规划器。在多种仿真与真实场景中的大量实验表明，所提出系统在视觉质量与重建效率方面均表现出优越性能。\n"
  },
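  {
    "path": "notes/gauss-mi-nbv-sketch.md",
    "content": "### Editor's note: next-best-view scoring sketch (companion to abs/2504.21067.md)\n\nGauSS-MI formulates next-best-view selection as a real-time mutual-information criterion over per-Gaussian visual uncertainty. The sketch below only shows the general shape of such a criterion: score each candidate view by the Shannon entropy of the per-Gaussian beliefs it would observe, then take the argmax. It is a simplified stand-in, not the paper's derivation; the Bernoulli belief model and all names are assumptions made for illustration.\n\n```python\nimport numpy as np\n\ndef bernoulli_entropy(p):\n    # Shannon entropy (bits) of a per-Gaussian reliability belief\n    p = np.clip(p, 1e-6, 1 - 1e-6)\n    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))\n\ndef next_best_view(confidence, visibility):\n    # confidence: (N,) current belief per Gaussian in [0, 1]\n    # visibility: (V, N) which Gaussians each candidate view observes\n    # a view is informative when it covers Gaussians that remain uncertain\n    gain = visibility.astype(float) @ bernoulli_entropy(confidence)\n    return int(np.argmax(gain)), gain\n\nconf = np.array([0.95, 0.50, 0.60, 0.99])\nvis = np.array([[1, 0, 0, 1], [0, 1, 1, 0]], dtype=bool)\nbest, gains = next_best_view(conf, vis)\nprint(best, gains)  # view 1 wins: it sees the two uncertain splats\n```\n"
  },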
  {
    "path": "abs/2504.21650.md",
    "content": "### HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation\n\nThe rapid advancement of diffusion models holds the promise of revolutionizing the application of VR and AR technologies, which typically require scene-level 4D assets for user experience. Nonetheless, existing diffusion models predominantly concentrate on modeling static 3D scenes or object-level dynamics, constraining their capacity to provide truly immersive experiences. To address this issue, we propose HoloTime, a framework that integrates video diffusion models to generate panoramic videos from a single prompt or reference image, along with a 360-degree 4D scene reconstruction method that seamlessly transforms the generated panoramic video into 4D assets, enabling a fully immersive 4D experience for users. Specifically, to tame video diffusion models for generating high-fidelity panoramic videos, we introduce the 360World dataset, the first comprehensive collection of panoramic videos suitable for downstream 4D scene reconstruction tasks. With this curated dataset, we propose Panoramic Animator, a two-stage image-to-video diffusion model that can convert panoramic images into high-quality panoramic videos. Following this, we present Panoramic Space-Time Reconstruction, which leverages a space-time depth estimation method to transform the generated panoramic videos into 4D point clouds, enabling the optimization of a holistic 4D Gaussian Splatting representation to reconstruct spatially and temporally consistent 4D scenes. To validate the efficacy of our method, we conducted a comparative analysis with existing approaches, revealing its superiority in both panoramic video generation and 4D scene reconstruction. This demonstrates our method's capability to create more engaging and realistic immersive environments, thereby enhancing user experiences in VR and AR applications.\n\n扩散模型的迅猛发展为虚拟现实（VR）和增强现实（AR）技术的应用带来了变革性潜力，而这类应用通常依赖于场景级的四维（4D）资产以实现沉浸式体验。然而，现有扩散模型大多聚焦于静态三维场景或物体级动态建模，限制了其在构建真正沉浸式体验方面的能力。\n为解决这一问题，我们提出 HoloTime 框架，该框架融合视频扩散模型，从单一提示或参考图像生成全景视频，并配套一套 360 度 4D 场景重建方法，可将生成的全景视频无缝转换为 4D 资产，从而为用户带来完整沉浸式的 4D 体验。\n具体而言，为了使视频扩散模型能够生成高保真全景视频，我们引入了 360World 数据集，这是首个适用于下游 4D 场景重建任务的全景视频综合数据集。在该数据集基础上，我们提出了 Panoramic Animator，一种两阶段的图像到视频扩散模型，能够将全景图像转换为高质量的全景视频。\n随后，我们提出 Panoramic Space-Time Reconstruction，该方法利用时空深度估计技术将生成的全景视频转换为 4D 点云，并进一步优化得到时空一致的整体 4D Gaussian Splatting 表示，从而实现空间与时间维度一致的 4D 场景重建。\n为验证方法有效性，我们与现有方法进行了对比分析，结果表明无论是在全景视频生成还是在 4D 场景重建方面，HoloTime 均展现出显著优势。这一成果证明了我们的方法能够创造更具吸引力和真实感的沉浸式环境，进而提升 VR 与 AR 应用中的用户体验。\n"
  },
  {
    "path": "abs/2505.00421.md",
    "content": "### Real-Time Animatable 2DGS-Avatars with Detail Enhancement from Monocular Videos\n\nHigh-quality, animatable 3D human avatar reconstruction from monocular videos offers significant potential for reducing reliance on complex hardware, making it highly practical for applications in game development, augmented reality, and social media. However, existing methods still face substantial challenges in capturing fine geometric details and maintaining animation stability, particularly under dynamic or complex poses. To address these issues, we propose a novel real-time framework for animatable human avatar reconstruction based on 2D Gaussian Splatting (2DGS). By leveraging 2DGS and global SMPL pose parameters, our framework not only aligns positional and rotational discrepancies but also enables robust and natural pose-driven animation of the reconstructed avatars. Furthermore, we introduce a Rotation Compensation Network (RCN) that learns rotation residuals by integrating local geometric features with global pose parameters. This network significantly improves the handling of non-rigid deformations and ensures smooth, artifact-free pose transitions during animation. Experimental results demonstrate that our method successfully reconstructs realistic and highly animatable human avatars from monocular videos, effectively preserving fine-grained details while ensuring stable and natural pose variation. Our approach surpasses current state-of-the-art methods in both reconstruction quality and animation robustness on public benchmarks.\n\n从单目视频中重建高质量、可动画的三维人体头像，具有显著潜力，有望减少对复杂硬件的依赖，在游戏开发、增强现实和社交媒体等应用中具备极高的实用性。然而，现有方法在捕捉精细几何细节以及在动态或复杂姿态下保持动画稳定性方面仍面临诸多挑战。\n为应对这些问题，我们提出了一种基于二维高斯泼溅（2D Gaussian Splatting, 2DGS）的实时可动画人体头像重建新框架。该框架利用 2DGS 结合全局 SMPL 姿态参数，不仅实现了对位置与旋转误差的有效对齐，还支持对重建头像的鲁棒且自然的姿态驱动动画。\n此外，我们引入了一个旋转补偿网络（Rotation Compensation Network, RCN），通过融合局部几何特征与全局姿态参数，学习旋转残差，从而显著提升对非刚性变形的处理能力，并确保动画过程中的姿态过渡平滑、无伪影。\n实验结果表明，我们的方法能够从单目视频中成功重建出逼真且高度可动画的人体头像，兼顾精细细节的保留与姿态变化的稳定性。在多个公开基准测试中，我们的方法在重建质量与动画鲁棒性方面均优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2505.01235.md",
    "content": "### Compensating Spatiotemporally Inconsistent Observations for Online Dynamic 3D Gaussian Splatting\n\nOnline reconstruction of dynamic scenes is significant as it enables learning scenes from live-streaming video inputs, while existing offline dynamic reconstruction methods rely on recorded video inputs. However, previous online reconstruction approaches have primarily focused on efficiency and rendering quality, overlooking the temporal consistency of their results, which often contain noticeable artifacts in static regions. This paper identifies that errors such as noise in real-world recordings affect temporal inconsistency in online reconstruction. We propose a method that enhances temporal consistency in online reconstruction from observations with temporal inconsistency which is inevitable in cameras. We show that our method restores the ideal observation by subtracting the learned error. We demonstrate that applying our method to various baselines significantly enhances both temporal consistency and rendering quality across datasets. Code, video results, and checkpoints are available at this https URL.\n\n动态场景的在线重建具有重要意义，因为它能够从实时视频流中学习场景，而现有的离线动态重建方法则依赖于预先录制的视频输入。然而，已有的在线重建方法主要关注效率和渲染质量，往往忽视了结果的时间一致性，导致在静态区域中出现明显伪影等问题。\n本文指出，现实世界录制中的噪声等误差是导致在线重建时间不一致性的主要原因。为此，我们提出了一种方法，用于在存在时间不一致性的观测条件下提升在线重建的时间一致性，这种不一致性在实际相机中是难以避免的。我们的方法通过学习误差并将其从观测值中减除，从而恢复理想观测。\n我们在多个基线方法上应用所提出的方法，实验结果表明该方法在多个数据集上显著提升了时间一致性和渲染质量。\n"
  },
  {
    "path": "abs/2505.01322.md",
    "content": "### FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors\n\nText-driven object insertion in 3D scenes is an emerging task that enables intuitive scene editing through natural language. However, existing 2D editing-based methods often rely on spatial priors such as 2D masks or 3D bounding boxes, and they struggle to ensure consistency of the inserted object. These limitations hinder flexibility and scalability in real-world applications. In this paper, we propose FreeInsert, a novel framework that leverages foundation models including MLLMs, LGMs, and diffusion models to disentangle object generation from spatial placement. This enables unsupervised and flexible object insertion in 3D scenes without spatial priors. FreeInsert starts with an MLLM-based parser that extracts structured semantics, including object types, spatial relationships, and attachment regions, from user instructions. These semantics guide both the reconstruction of the inserted object for 3D consistency and the learning of its degrees of freedom. We leverage the spatial reasoning capabilities of MLLMs to initialize object pose and scale. A hierarchical, spatially aware refinement stage further integrates spatial semantics and MLLM-inferred priors to enhance placement. Finally, the appearance of the object is improved using the inserted-object image to enhance visual fidelity. Experimental results demonstrate that FreeInsert achieves semantically coherent, spatially precise, and visually realistic 3D insertions without relying on spatial priors, offering a user-friendly and flexible editing experience.\n\n文本驱动的三维场景目标插入是一项新兴任务，使用户能够通过自然语言实现直观的场景编辑。然而，现有基于二维编辑的方法通常依赖于诸如二维掩码或三维包围盒等空间先验，难以保证插入目标的一致性。这些限制降低了其在实际应用中的灵活性与可扩展性。\n本文提出 FreeInsert，一种全新的三维场景目标插入框架，结合基础模型（包括多模态大模型 MLLMs、大型生成模型 LGMs 和扩散模型）以解耦目标生成与空间放置，从而实现无需空间先验的灵活、无监督三维插入。\nFreeInsert 首先通过基于 MLLM 的解析器，从用户指令中提取结构化语义信息，包括目标类别、空间关系与附着区域等。这些语义信息既用于引导所插入目标的重建以保证三维一致性，也用于学习其自由度（如姿态与尺度）。我们利用 MLLM 的空间推理能力对目标初始的姿态与尺度进行估计。\n随后，设计了一种分层的、具有空间感知能力的细化模块，融合空间语义与 MLLM 推断出的先验信息，进一步优化目标位置与朝向。最后，结合插入目标图像对其外观进行增强，提升视觉真实感。\n实验结果表明，FreeInsert 无需空间先验即可实现语义一致、空间精准、视觉真实的三维目标插入，提供了一种用户友好且高度灵活的编辑体验。\n"
  },
  {
    "path": "abs/2505.01799.md",
    "content": "### AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting\n\nUnderwater scene reconstruction is a critical tech-nology for underwater operations, enabling the generation of 3D models from images captured by underwater platforms. However, the quality of underwater images is often degraded due to medium interference, which limits the effectiveness of Structure-from-Motion (SfM) pose estimation, leading to subsequent reconstruction failures. Additionally, SfM methods typically operate at slower speeds, further hindering their applicability in real-time scenarios. In this paper, we introduce AquaGS, an SfM-free underwater scene reconstruction model based on the SeaThru algorithm, which facilitates rapid and accurate separation of scene details and medium features. Our approach initializes Gaussians by integrating state-of-the-art multi-view stereo (MVS) technology, employs implicit Neural Radiance Fields (NeRF) for rendering translucent media and utilizes the latest explicit 3D Gaussian Splatting (3DGS) technique to render object surfaces, which effectively addresses the limitations of traditional methods and accurately simulates underwater optical phenomena. Experimental results on the data set and the robot platform show that our model can complete high-precision reconstruction in 30 seconds with only 3 image inputs, significantly enhancing the practical application of the algorithm in robotic platforms.\n\n水下场景重建是水下作业中的关键技术，能够利用水下平台采集的图像生成三维模型。然而，由于介质干扰的影响，水下图像质量常常受到严重退化，限制了基于结构光束法（Structure-from-Motion, SfM）的位姿估计效果，进而导致后续重建失败。此外，SfM 方法本身运行速度较慢，也进一步制约了其在实时场景中的适用性。\n本文提出 AquaGS，一种无需 SfM 的水下场景重建模型，基于 SeaThru 算法实现场景细节与介质特征的快速且精确的分离。我们的方法结合了最先进的多视图立体（Multi-view Stereo, MVS）技术初始化高斯分布，利用隐式神经辐射场（NeRF）渲染半透明介质，并采用最新的显式三维高斯泼溅（3D Gaussian Splatting, 3DGS）技术渲染物体表面，有效克服传统方法的局限，真实模拟了水下光学现象。\n在公开数据集和机器人平台上的实验结果表明，我们的模型仅需 3 张图像输入，即可在 30 秒内完成高精度重建，显著提升了算法在机器人平台中的实际应用价值。\n"
  },
  {
    "path": "abs/2505.01928.md",
    "content": "### GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting\n\nWe introduce GenSync, a novel framework for multi-identity lip-synced video synthesis using 3D Gaussian Splatting. Unlike most existing 3D methods that require training a new model for each identity , GenSync learns a unified network that synthesizes lip-synced videos for multiple speakers. By incorporating a Disentanglement Module, our approach separates identity-specific features from audio representations, enabling efficient multi-identity video synthesis. This design reduces computational overhead and achieves 6.8x faster training compared to state-of-the-art models, while maintaining high lip-sync accuracy and visual quality.\n\n我们提出了 GenSync，一个基于三维高斯泼溅（3D Gaussian Splatting）的多身份唇形同步视频生成新框架。与大多数现有三维方法需要为每个身份单独训练模型不同，GenSync 学习一个统一的网络，能够为多个说话人生成唇形同步视频。\n通过引入解耦模块（Disentanglement Module），该方法将身份特征与音频表示有效分离，从而实现高效的多身份视频合成。该设计显著降低了计算开销，使训练速度比当前最先进的方法提升 6.8 倍，同时保持高水平的唇形同步精度与视觉质量。\n"
  },
  {
    "path": "abs/2505.01938.md",
    "content": "### HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder\n\nMost existing 3D Gaussian Splatting (3DGS) compression schemes focus on producing compact 3DGS representation via implicit data embedding. They have long coding times and highly customized data format, making it difficult for widespread deployment. This paper presents a new 3DGS compression framework called HybridGS, which takes advantage of both compact generation and standardized point cloud data encoding. HybridGS first generates compact and explicit 3DGS data. A dual-channel sparse representation is introduced to supervise the primitive position and feature bit depth. It then utilizes a canonical point cloud encoder to perform further data compression and form standard output bitstreams. A simple and effective rate control scheme is proposed to pivot the interpretable data compression scheme. At the current stage, HybridGS does not include any modules aimed at improving 3DGS quality during generation. But experiment results show that it still provides comparable reconstruction performance against state-of-the-art methods, with evidently higher encoding and decoding speed.\n\n现有的大多数三维高斯泼溅（3D Gaussian Splatting, 3DGS）压缩方案主要依赖于隐式数据嵌入，以生成紧凑的 3DGS 表示。然而，这类方法通常存在编码时间长、数据格式高度定制化的问题，限制了其在实际场景中的广泛部署。\n本文提出了一种新的 3DGS 压缩框架 HybridGS，结合了紧凑生成与标准点云编码的优势。HybridGS 首先生成紧凑且显式的 3DGS 数据，并引入双通道稀疏表示机制，以对原始元素的位置与特征比特深度进行联合监督。随后，该方法利用通用的点云编码器进一步压缩数据，生成符合标准的数据比特流输出。\n此外，我们提出了一种简单有效的码率控制机制，支撑该具可解释性的压缩框架。在当前阶段，HybridGS 并未集成任何用于提升生成阶段 3DGS 质量的模块，但实验结果表明，该方法在重建性能方面依然可与现有最先进方法媲美，同时在编码与解码速度上具备明显优势。\n"
  },
  {
    "path": "abs/2505.02108.md",
    "content": "### SignSplat: Rendering Sign Language via Gaussian Splatting\n\nState-of-the-art approaches for conditional human body rendering via Gaussian splatting typically focus on simple body motions captured from many views. This is often in the context of dancing or walking. However, for more complex use cases, such as sign language, we care less about large body motion and more about subtle and complex motions of the hands and face. The problems of building high fidelity models are compounded by the complexity of capturing multi-view data of sign. The solution is to make better use of sequence data, ensuring that we can overcome the limited information from only a few views by exploiting temporal variability. Nevertheless, learning from sequence-level data requires extremely accurate and consistent model fitting to ensure that appearance is consistent across complex motions. We focus on how to achieve this, constraining mesh parameters to build an accurate Gaussian splatting framework from few views capable of modelling subtle human motion. We leverage regularization techniques on the Gaussian parameters to mitigate overfitting and rendering artifacts. Additionally, we propose a new adaptive control method to densify Gaussians and prune splat points on the mesh surface. To demonstrate the accuracy of our approach, we render novel sequences of sign language video, building on neural machine translation approaches to sign stitching. On benchmark datasets, our approach achieves state-of-the-art performance; and on highly articulated and complex sign language motion, we significantly outperform competing approaches."
  },
  {
    "path": "abs/2505.02126.md",
    "content": "### GarmentGS: Point-Cloud Guided Gaussian Splatting for High-Fidelity Non-Watertight 3D Garment Reconstruction\n\nTraditional 3D garment creation requires extensive manual operations, resulting in time and labor costs. Recently, 3D Gaussian Splatting has achieved breakthrough progress in 3D scene reconstruction and rendering, attracting widespread attention and opening new pathways for 3D garment reconstruction. However, due to the unstructured and irregular nature of Gaussian primitives, it is difficult to reconstruct high-fidelity, non-watertight 3D garments. In this paper, we present GarmentGS, a dense point cloud-guided method that can reconstruct high-fidelity garment surfaces with high geometric accuracy and generate non-watertight, single-layer meshes. Our method introduces a fast dense point cloud reconstruction module that can complete garment point cloud reconstruction in 10 minutes, compared to traditional methods that require several hours. Furthermore, we use dense point clouds to guide the movement, flattening, and rotation of Gaussian primitives, enabling better distribution on the garment surface to achieve superior rendering effects and geometric accuracy. Through numerical and visual comparisons, our method achieves fast training and real-time rendering while maintaining competitive quality.\n\n传统的三维服装建模依赖大量人工操作，导致较高的时间与人力成本。近年来，三维高斯泼溅（3D Gaussian Splatting）在三维场景重建与渲染方面取得了突破性进展，受到广泛关注，并为三维服装重建开辟了新的路径。然而，由于高斯图元本身结构不规则、分布无序，准确重建高保真、非封闭式的三维服装表面仍面临挑战。\n本文提出 GarmentGS，一种基于稠密点云引导的三维服装重建方法，能够实现高几何精度的服装表面重建，并生成非封闭、单层的三维网格结构。我们引入了一个高效的稠密点云重建模块，相比传统方法需耗时数小时，该模块可在 10 分钟内完成服装点云重建。\n此外，我们利用稠密点云引导高斯图元的移动、展平与旋转，使其更合理地分布在服装表面，从而提升渲染效果与几何重建精度。通过数值与可视化对比实验证明，我们的方法在保持竞争性质量的同时，具备快速训练与实时渲染的能力。\n"
  },
  {
    "path": "abs/2505.02175.md",
    "content": "### SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting\n\nRecovering 3D information from scenes via multi-view stereo reconstruction (MVS) and novel view synthesis (NVS) is inherently challenging, particularly in scenarios involving sparse-view setups. The advent of 3D Gaussian Splatting (3DGS) enabled real-time, photorealistic NVS. Following this, 2D Gaussian Splatting (2DGS) leveraged perspective accurate 2D Gaussian primitive rasterization to achieve accurate geometry representation during rendering, improving 3D scene reconstruction while maintaining real-time performance. Recent approaches have tackled the problem of sparse real-time NVS using 3DGS within a generalizable, MVS-based learning framework to regress 3D Gaussian parameters. Our work extends this line of research by addressing the challenge of generalizable sparse 3D reconstruction and NVS jointly, and manages to perform successfully at both tasks. We propose an MVS-based learning pipeline that regresses 2DGS surface element parameters in a feed-forward fashion to perform 3D shape reconstruction and NVS from sparse-view images. We further show that our generalizable pipeline can benefit from preexisting foundational multi-view deep visual features. The resulting model attains the state-of-the-art results on the DTU sparse 3D reconstruction benchmark in terms of Chamfer distance to ground-truth, as-well as state-of-the-art NVS. It also demonstrates strong generalization on the BlendedMVS and Tanks and Temples datasets. We note that our model outperforms the prior state-of-the-art in feed-forward sparse view reconstruction based on volume rendering of implicit representations, while offering an almost 2 orders of magnitude higher inference speed.\n\n通过多视图立体重建（Multi-View Stereo, MVS）与新视角合成（Novel View Synthesis, NVS）恢复场景中的三维信息本质上具有挑战性，尤其是在稀疏视角设置下更为困难。三维高斯泼溅（3D Gaussian Splatting, 3DGS）的出现实现了实时、写实的新视角合成。随后，二维高斯泼溅（2D Gaussian Splatting, 2DGS）通过透视精确的二维高斯图元光栅化，在保持实时性能的同时提升了渲染过程中的几何表达能力，从而改善了三维场景重建质量。\n近期的一些方法已在可泛化的、基于 MVS 的学习框架中利用 3DGS 解决稀疏视角下的实时 NVS 问题，通过回归 3D 高斯参数实现重建与渲染。本文在此研究方向基础上进一步拓展，提出了一种同时解决可泛化稀疏视角三维重建与新视角合成的联合方法，并在两项任务中均取得优异表现。\n我们提出了一种基于 MVS 的学习流水线，采用前馈方式回归 2DGS 表面元素参数，从稀疏视角图像中实现三维形状重建与新视角图像合成。此外，我们进一步表明该可泛化流水线能够从已有的多视角基础视觉特征中受益。\n实验表明，我们的模型在 DTU 稀疏三维重建基准上达到了当前最优的 Chamfer 距离指标，同时在新视角合成任务中也实现了最先进性能。在 BlendedMVS 与 Tanks and Temples 数据集上亦展现出良好的泛化能力。值得注意的是，与基于隐式表示体渲染的前馈稀疏视角重建方法相比，我们的模型不仅在精度上超越现有最优方法，且推理速度提升近两个数量级。\n"
  },
  {
    "path": "abs/2505.02178.md",
    "content": "### Sparfels: Fast Reconstruction from Sparse Unposed Imagery\n\nWe present a method for Sparse view reconstruction with surface element splatting that runs within 3 minutes on a consumer grade GPU. While few methods address sparse radiance field learning from noisy or unposed sparse cameras, shape recovery remains relatively underexplored in this setting. Several radiance and shape learning test-time optimization methods address the sparse posed setting by learning data priors or using combinations of external monocular geometry priors. Differently, we propose an efficient and simple pipeline harnessing a single recent 3D foundation model. We leverage its various task heads, notably point maps and camera initializations to instantiate a bundle adjusting 2D Gaussian Splatting (2DGS) model, and image correspondences to guide camera optimization midst 2DGS training. Key to our contribution is a novel formulation of splatted color variance along rays, which can be computed efficiently. Reducing this moment in training leads to more accurate shape reconstructions. We demonstrate state-of-the-art performances in the sparse uncalibrated setting in reconstruction and novel view benchmarks based on established multi-view datasets.\n\n我们提出了一种基于表面元素泼溅的稀疏视角重建方法，可在消费级 GPU 上于 3 分钟内完成运行。尽管已有少数方法尝试从噪声或未标定的稀疏相机中学习稀疏辐射场，但在该设定下的形状恢复问题仍相对较少被研究。现有部分辐射场与形状学习方法主要针对已知位姿的稀疏设定，通过学习数据先验或结合外部单目几何先验来实现。\n与之不同，我们提出了一种高效且简洁的重建流程，仅依赖一个最新的三维基础模型。我们利用该基础模型提供的多任务输出头，特别是点图（point maps）和相机初始化，用于构建并调整一个二维高斯泼溅（2D Gaussian Splatting, 2DGS）模型，同时利用图像间的对应关系来辅助训练过程中的相机优化。\n我们工作的关键贡献之一是提出了一种新的射线上泼溅颜色方差公式化方法，该方法可高效计算并在训练过程中最小化，从而提升形状重建的精度。\n在多个标准多视图数据集上进行的稀疏未标定设定下的重建与新视角合成测试表明，我们的方法在精度上达到了当前最先进水平。\n"
  },
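  {
    "path": "notes/sparfels-ray-variance-sketch.md",
    "content": "### Editor's note: splatted color variance along a ray (companion to abs/2505.02178.md)\n\nSparfels' key contribution is an efficiently computable formulation of splatted color variance along rays, which is minimized during training. The generic identity behind such a moment is var = E[c^2] - E[c]^2 under the compositing weights of the splats a ray intersects. A minimal NumPy sketch of that identity follows; the paper's actual formulation inside the 2DGS rasterizer may differ, and the function name is hypothetical.\n\n```python\nimport numpy as np\n\ndef splat_color_moments(alphas, colors):\n    # alphas: (K,) opacities of the splats hit by one ray, sorted front to back\n    # colors: (K, 3) associated splat colors\n    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))\n    w = alphas * trans                        # front-to-back compositing weights\n    w_sum = w.sum() + 1e-8\n    mean = (w[:, None] * colors).sum(0) / w_sum\n    second = (w[:, None] * colors ** 2).sum(0) / w_sum\n    var = second - mean ** 2                  # per-channel variance along the ray\n    return mean, var\n\nalphas = np.array([0.6, 0.5, 0.9])\ncolors = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.2, 0.2, 1.0]])\nmean, var = splat_color_moments(alphas, colors)\nprint(mean, var)  # low variance means the splats along this ray agree in color\n```\n"
  },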
  {
    "path": "abs/2505.03310.md",
    "content": "### 3D Gaussian Splatting Data Compression with Mixture of Priors\n\n3D Gaussian Splatting (3DGS) data compression is crucial for enabling efficient storage and transmission in 3D scene modeling. However, its development remains limited due to inadequate entropy models and suboptimal quantization strategies for both lossless and lossy compression scenarios, where existing methods have yet to 1) fully leverage hyperprior information to construct robust conditional entropy models, and 2) apply fine-grained, element-wise quantization strategies for improved compression granularity. In this work, we propose a novel Mixture of Priors (MoP) strategy to simultaneously address these two challenges. Specifically, inspired by the Mixture-of-Experts (MoE) paradigm, our MoP approach processes hyperprior information through multiple lightweight MLPs to generate diverse prior features, which are subsequently integrated into the MoP feature via a gating mechanism. To enhance lossless compression, the resulting MoP feature is utilized as a hyperprior to improve conditional entropy modeling. Meanwhile, for lossy compression, we employ the MoP feature as guidance information in an element-wise quantization procedure, leveraging a prior-guided Coarse-to-Fine Quantization (C2FQ) strategy with a predefined quantization step value. Specifically, we expand the quantization step value into a matrix and adaptively refine it from coarse to fine granularity, guided by the MoP feature, thereby obtaining a quantization step matrix that facilitates element-wise quantization. Extensive experiments demonstrate that our proposed 3DGS data compression framework achieves state-of-the-art performance across multiple benchmarks, including Mip-NeRF360, BungeeNeRF, DeepBlending, and Tank&Temples.\n\n3D Gaussian Splatting（3DGS）数据压缩对于实现高效的三维场景建模存储与传输具有重要意义。然而，由于缺乏有效的熵模型以及在无损与有损压缩场景中次优的量化策略，其发展仍较为受限。现有方法尚未解决以下两个关键问题：1）未能充分利用 hyperprior 信息构建稳健的条件熵模型；2）未实现细粒度的逐元素量化策略以提升压缩精度。\n为应对这些挑战，本文提出了一种新颖的 Mixture of Priors（MoP） 策略。具体而言，受 Mixture-of-Experts（MoE）范式启发，我们的 MoP 方法通过多个轻量级 MLP 处理 hyperprior 信息，生成多样化的先验特征，并通过 gating 机制将其整合为 MoP 特征。\n为增强无损压缩效果，生成的 MoP 特征被用作 hyperprior 以提升条件熵建模能力。而在有损压缩中，我们将 MoP 特征用作逐元素量化过程中的引导信息，引入一种基于先验引导的 由粗到细量化策略（Coarse-to-Fine Quantization, C2FQ），该策略从预定义的量化步长出发，将其扩展为矩阵，并在 MoP 引导下自适应地从粗粒度细化为精粒度，最终得到支持逐元素量化的量化步长矩阵。\n大量实验证明，所提出的 3DGS 数据压缩框架在多个基准上实现了当前最优的性能。\n"
  },
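  {
    "path": "notes/mop-gating-sketch.md",
    "content": "### Editor's note: Mixture-of-Priors gating sketch (companion to abs/2505.03310.md)\n\nThe MoP strategy above routes hyperprior information through several lightweight MLPs and fuses their outputs with a gating mechanism, MoE-style. The NumPy sketch below shows only that fusion pattern: per-expert features combined by a softmax gate conditioned on the hyperprior. The shapes, the single-hidden-layer experts, and all names are illustrative assumptions, not the paper's architecture.\n\n```python\nimport numpy as np\n\nrng = np.random.default_rng(0)\n\ndef mlp(x, w1, w2):\n    return np.maximum(x @ w1, 0.0) @ w2       # one hidden layer with ReLU\n\ndef mixture_of_priors(hyper, experts, gate_w):\n    # hyper: (B, D) hyperprior features; experts: list of (w1, w2) weights\n    feats = np.stack([mlp(hyper, w1, w2) for w1, w2 in experts], axis=1)  # (B, E, D)\n    logits = hyper @ gate_w                                               # (B, E)\n    g = np.exp(logits - logits.max(axis=-1, keepdims=True))\n    g = g / g.sum(axis=-1, keepdims=True)     # softmax gate over experts\n    return (g[:, :, None] * feats).sum(axis=1)  # fused MoP feature, (B, D)\n\nD, E = 8, 3\nexperts = [(rng.normal(size=(D, 16)), rng.normal(size=(16, D))) for _ in range(E)]\ngate_w = rng.normal(size=(D, E))\nprint(mixture_of_priors(rng.normal(size=(4, D)), experts, gate_w).shape)  # (4, 8)\n```\n"
  },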
  {
    "path": "abs/2505.03351.md",
    "content": "### GUAVA: Generalizable Upper Body 3D Gaussian Avatar\n\nReconstructing a high-quality, animatable 3D human avatar with expressive facial and hand motions from a single image has gained significant attention due to its broad application potential. 3D human avatar reconstruction typically requires multi-view or monocular videos and training on individual IDs, which is both complex and time-consuming. Furthermore, limited by SMPLX's expressiveness, these methods often focus on body motion but struggle with facial expressions. To address these challenges, we first introduce an expressive human model (EHM) to enhance facial expression capabilities and develop an accurate tracking method. Based on this template model, we propose GUAVA, the first framework for fast animatable upper-body 3D Gaussian avatar reconstruction. We leverage inverse texture mapping and projection sampling techniques to infer Ubody (upper-body) Gaussians from a single image. The rendered images are refined through a neural refiner. Experimental results demonstrate that GUAVA significantly outperforms previous methods in rendering quality and offers significant speed improvements, with reconstruction times in the sub-second range (0.1s), and supports real-time animation and rendering.\n\n从单张图像中重建具有面部与手部表情的高质量可动画三维人体头像，因其广泛的应用潜力而受到广泛关注。传统的三维人体头像重建方法通常依赖多视角或单目视频，并需针对每个个体进行训练，过程复杂且耗时。此外，受限于 SMPLX 模型的表达能力，此类方法虽能处理躯干运动，但在面部表情建模方面表现不足。\n为解决上述问题，我们首先提出了一个增强面部表达能力的 EHM（Expressive Human Model），并在此基础上开发了精确的追踪方法。基于该模板模型，我们进一步提出 GUAVA，首个面向快速可动画上半身三维高斯头像重建的框架。\nGUAVA 利用反向纹理映射与投影采样技术，从单张图像中推理出上半身（Ubody）高斯图元，并通过神经细化器对渲染结果进行优化。实验结果表明，GUAVA 在渲染质量上显著优于现有方法，重建速度达到亚秒级（0.1 秒），支持实时动画与渲染。\n"
  },
  {
    "path": "abs/2505.04262.md",
    "content": "### Bridging Geometry-Coherent Text-to-3D Generation with Multi-View Diffusion Priors and Gaussian Splatting\n\nScore Distillation Sampling (SDS) leverages pretrained 2D diffusion models to advance text-to-3D generation but neglects multi-view correlations, being prone to geometric inconsistencies and multi-face artifacts in the generated 3D content. In this work, we propose Coupled Score Distillation (CSD), a framework that couples multi-view joint distribution priors to ensure geometrically consistent 3D generation while enabling the stable and direct optimization of 3D Gaussian Splatting. Specifically, by reformulating the optimization as a multi-view joint optimization problem, we derive an effective optimization rule that effectively couples multi-view priors to guide optimization across different viewpoints while preserving the diversity of generated 3D assets. Additionally, we propose a framework that directly optimizes 3D Gaussian Splatting (3D-GS) with random initialization to generate geometrically consistent 3D content. We further employ a deformable tetrahedral grid, initialized from 3D-GS and refined through CSD, to produce high-quality, refined meshes. Quantitative and qualitative experimental results demonstrate the efficiency and competitive quality of our approach.\n\nScore Distillation Sampling（SDS）利用预训练的二维扩散模型推动文本生成三维（text-to-3D）技术的发展，但忽视了多视角之间的相关性，容易在生成的三维内容中产生几何不一致和多面伪影等问题。\n为解决这一问题，本文提出 Coupled Score Distillation（CSD） 框架，通过耦合多视角联合分布先验，实现几何一致的三维生成，并支持稳定且直接地优化三维高斯泼溅（3D Gaussian Splatting, 3D-GS）。具体而言，我们将优化过程重构为一个多视角联合优化问题，并由此推导出一条有效的优化规则，该规则在保持生成三维资产多样性的同时，有效耦合多视角先验，引导不同视角间的一致优化。\n此外，我们提出了一个从随机初始化出发、直接优化 3D-GS 的生成框架，用于生成几何一致的三维内容。我们进一步引入一种可变形四面体网格结构，以 3D-GS 初始化，并通过 CSD 进行精细优化，从而生成高质量、细致的三维网格。\n定量与定性实验结果表明，我们的方法在效率和生成质量上均表现出竞争力。\n"
  },
  {
    "path": "abs/2505.04659.md",
    "content": "### GSsplat: Generalizable Semantic Gaussian Splatting for Novel-view Synthesis in 3D Scenes\n\nThe semantic synthesis of unseen scenes from multiple viewpoints is crucial for research in 3D scene understanding. Current methods are capable of rendering novel-view images and semantic maps by reconstructing generalizable Neural Radiance Fields. However, they often suffer from limitations in speed and segmentation performance. We propose a generalizable semantic Gaussian Splatting method (GSsplat) for efficient novel-view synthesis. Our model predicts the positions and attributes of scene-adaptive Gaussian distributions from once input, replacing the densification and pruning processes of traditional scene-specific Gaussian Splatting. In the multi-task framework, a hybrid network is designed to extract color and semantic information and predict Gaussian parameters. To augment the spatial perception of Gaussians for high-quality rendering, we put forward a novel offset learning module through group-based supervision and a point-level interaction module with spatial unit aggregation. When evaluated with varying numbers of multi-view inputs, GSsplat achieves state-of-the-art performance for semantic synthesis at the fastest speed.\n\n从多个视角对未见场景进行语义合成是三维场景理解研究中的关键问题。当前方法通过重建具有泛化能力的神经辐射场（Neural Radiance Fields）可以实现新视角图像与语义图的渲染，但在速度与分割性能方面仍存在局限。\n本文提出了一种用于高效新视角合成的可泛化语义高斯泼溅方法 GSsplat。该方法通过一次性输入预测场景自适应高斯分布的位置与属性，取代了传统场景特定高斯泼溅方法中的密化与剪枝流程。在多任务框架下，我们设计了一种混合网络用于提取颜色与语义信息，并预测高斯参数。\n为增强高斯的空间感知能力以实现高质量渲染，我们提出了一种基于分组监督的偏移学习模块，以及结合空间单元聚合的点级交互模块。在不同数量的多视角输入下进行评估时，GSsplat 在语义合成任务中实现了当前最快的推理速度与最优性能。\n"
  },
  {
    "path": "abs/2505.04668.md",
    "content": "### SGCR: Spherical Gaussians for Efficient 3D Curve Reconstruction\n\nNeural rendering techniques have made substantial progress in generating photo-realistic 3D scenes. The latest 3D Gaussian Splatting technique has achieved high quality novel view synthesis as well as fast rendering speed. However, 3D Gaussians lack proficiency in defining accurate 3D geometric structures despite their explicit primitive representations. This is due to the fact that Gaussian's attributes are primarily tailored and fine-tuned for rendering diverse 2D images by their anisotropic nature. To pave the way for efficient 3D reconstruction, we present Spherical Gaussians, a simple and effective representation for 3D geometric boundaries, from which we can directly reconstruct 3D feature curves from a set of calibrated multi-view images. Spherical Gaussians is optimized from grid initialization with a view-based rendering loss, where a 2D edge map is rendered at a specific view and then compared to the ground-truth edge map extracted from the corresponding image, without the need for any 3D guidance or supervision. Given Spherical Gaussians serve as intermedia for the robust edge representation, we further introduce a novel optimization-based algorithm called SGCR to directly extract accurate parametric curves from aligned Spherical Gaussians. We demonstrate that SGCR outperforms existing state-of-the-art methods in 3D edge reconstruction while enjoying great efficiency.\n\n神经渲染技术在生成真实感三维场景方面取得了显著进展。最新的三维高斯泼溅（3D Gaussian Splatting）方法在实现高质量的新视角合成和快速渲染方面表现优异。然而，尽管 3D 高斯采用了显式图元表示，其在精确建模三维几何结构方面仍存在不足。这主要是因为高斯图元的属性本质上具有各向异性，设计初衷是为了优化多样化二维图像的渲染效果，而非几何结构表达。\n为实现高效三维重建，本文提出了一种简单而有效的三维几何边界表示形式——球面高斯（Spherical Gaussians），可用于从一组已标定的多视角图像中直接重建三维特征曲线。球面高斯从网格初始化出发，通过基于视图的渲染损失进行优化。在特定视角下渲染出二维边缘图，并与对应图像中提取的真实边缘图进行比较，无需任何三维监督或引导。\n由于球面高斯可作为鲁棒边缘表示的中介，我们进一步提出了一种优化算法 SGCR，可从配准后的球面高斯中直接提取高精度的参数化三维曲线。实验表明，SGCR 在三维边缘重建任务中优于现有最先进方法，同时具备极高的效率。\n"
  },
  {
    "path": "abs/2505.04959.md",
    "content": "### MoRe-3DGSMR: Motion-resolved reconstruction framework for free-breathing pulmonary MRI based on 3D Gaussian representation\n\nThis study presents an unsupervised, motion-resolved reconstruction framework for high-resolution, free-breathing pulmonary magnetic resonance imaging (MRI), utilizing a three-dimensional Gaussian representation (3DGS). The proposed method leverages 3DGS to address the challenges of motion-resolved 3D isotropic pulmonary MRI reconstruction by enabling data smoothing between voxels for continuous spatial representation. Pulmonary MRI data acquisition is performed using a golden-angle radial sampling trajectory, with respiratory motion signals extracted from the center of k-space in each radial spoke. Based on the estimated motion signal, the k-space data is sorted into multiple respiratory phases. A 3DGS framework is then applied to reconstruct a reference image volume from the first motion state. Subsequently, a patient-specific convolutional neural network is trained to estimate the deformation vector fields (DVFs), which are used to generate the remaining motion states through spatial transformation of the reference volume. The proposed reconstruction pipeline is evaluated on six datasets from six subjects and bench-marked against three state-of-the-art reconstruction methods. The experimental findings demonstrate that the proposed reconstruction framework effectively reconstructs high-resolution, motion-resolved pulmonary MR images. Compared with existing approaches, it achieves superior image quality, reflected by higher signal-to-noise ratio and contrast-to-noise ratio. The proposed unsupervised 3DGS-based reconstruction method enables accurate motion-resolved pulmonary MRI with isotropic spatial resolution. Its superior performance in image quality metrics over state-of-the-art methods highlights its potential as a robust solution for clinical pulmonary MR imaging.\n\n本研究提出了一种**基于三维高斯表示（3DGS）**的无监督、高分辨率自由呼吸肺部磁共振成像（MRI）运动分辨重建框架。该方法利用 3DGS 在体素间实现数据平滑，从而实现连续的空间表示，有效应对运动分辨三维各向同性肺部 MRI 重建中的挑战。\n肺部 MRI 数据采集采用 golden-angle 径向采样轨迹，通过提取每根径向线中心的 k 空间信息获取呼吸运动信号。根据估计的运动信号，将 k 空间数据划分为多个呼吸相位。随后，采用 3DGS 框架在第一个运动状态下重建参考图像体积。\n在此基础上，训练一个个体特异性的卷积神经网络用于估计形变矢量场（Deformation Vector Fields, DVFs），并通过对参考体积进行空间变换生成其余运动状态。\n该重建流程在来自六名受试者的六个数据集上进行了评估，并与三种最先进的重建方法进行了对比。实验结果表明，所提出的重建框架能够有效重建高分辨率的运动分辨肺部 MRI 图像，在图像质量方面优于现有方法，表现为更高的信噪比（SNR）与对比噪声比（CNR）。\n综上所述，该无监督的基于 3DGS 的重建方法实现了准确的、具备各向同性空间分辨率的运动分辨肺部 MRI，在图像质量评估指标上优于现有方法，显示出其作为临床肺部 MRI 成像鲁棒解决方案的潜力。\n"
  },
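  {
    "path": "notes/more-3dgsmr-phase-binning-sketch.md",
    "content": "### Editor's note: respiratory phase binning sketch (companion to abs/2505.04959.md)\n\nThe pipeline above sorts golden-angle radial k-space spokes into respiratory phases using a motion signal extracted from the center of k-space in each spoke. One common way to realize such binning is amplitude sorting into equally populated bins, sketched below; the paper may use a different binning rule, and the function name is hypothetical.\n\n```python\nimport numpy as np\n\ndef bin_spokes_by_phase(motion_signal, n_phases=4):\n    # motion_signal: (S,) respiratory amplitude per radial spoke\n    # amplitude-sort the spokes, then split into equally populated bins,\n    # giving one set of spoke indices per respiratory motion state\n    order = np.argsort(motion_signal)\n    return [np.sort(b) for b in np.array_split(order, n_phases)]\n\nrng = np.random.default_rng(1)\nsig = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.05 * rng.normal(size=200)\nphases = bin_spokes_by_phase(sig, n_phases=4)\nprint([len(p) for p in phases])  # [50, 50, 50, 50]\n```\n"
  },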
  {
    "path": "abs/2505.05356.md",
    "content": "### Time of the Flight of the Gaussians: Optimizing Depth Indirectly in Dynamic Radiance Fields\n\nWe present a method to reconstruct dynamic scenes from monocular continuous-wave time-of-flight (C-ToF) cameras using raw sensor samples that achieves similar or better accuracy than neural volumetric approaches and is 100x faster. Quickly achieving high-fidelity dynamic 3D reconstruction from a single viewpoint is a significant challenge in computer vision. In C-ToF radiance field reconstruction, the property of interest-depth-is not directly measured, causing an additional challenge. This problem has a large and underappreciated impact upon the optimization when using a fast primitive-based scene representation like 3D Gaussian splatting, which is commonly used with multi-view data to produce satisfactory results and is brittle in its optimization otherwise. We incorporate two heuristics into the optimization to improve the accuracy of scene geometry represented by Gaussians. Experimental results show that our approach produces accurate reconstructions under constrained C-ToF sensing conditions, including for fast motions like swinging baseball bats.\n\n我们提出了一种利用单目连续波飞行时间（Continuous-wave Time-of-Flight, C-ToF）相机的原始传感器采样数据重建动态场景的方法，其重建精度可与神经体积方法相媲美甚至更优，且速度提升达 100 倍。在计算机视觉中，从单一视角快速实现高保真动态三维重建仍是一个重大挑战。\n在 C-ToF 辐射场重建中，目标属性——深度——并非直接观测量，这为重建任务带来了额外困难。当使用如三维高斯泼溅（3D Gaussian Splatting）等快速图元表示方法进行场景建模时，这一问题对优化过程的影响尤为显著。3DGS 通常依赖多视角数据才能获得令人满意的结果，在单视角条件下其优化过程极为脆弱。\n为提升基于高斯表示的场景几何精度，我们在优化过程中引入了两条启发式策略。实验结果表明，在受限的 C-ToF 传感条件下，包括处理如挥棒等高速动态的情形，我们的方法依然能够实现高精度重建。\n"
  },
  {
    "path": "abs/2505.05474.md",
    "content": "### 3D Scene Generation: A Survey\n\n3D scene generation seeks to synthesize spatially structured, semantically meaningful, and photorealistic environments for applications such as immersive media, robotics, autonomous driving, and embodied AI. Early methods based on procedural rules offered scalability but limited diversity. Recent advances in deep generative models (e.g., GANs, diffusion models) and 3D representations (e.g., NeRF, 3D Gaussians) have enabled the learning of real-world scene distributions, improving fidelity, diversity, and view consistency. Recent advances like diffusion models bridge 3D scene synthesis and photorealism by reframing generation as image or video synthesis problems. This survey provides a systematic overview of state-of-the-art approaches, organizing them into four paradigms: procedural generation, neural 3D-based generation, image-based generation, and video-based generation. We analyze their technical foundations, trade-offs, and representative results, and review commonly used datasets, evaluation protocols, and downstream applications. We conclude by discussing key challenges in generation capacity, 3D representation, data and annotations, and evaluation, and outline promising directions including higher fidelity, physics-aware and interactive generation, and unified perception-generation models. This review organizes recent advances in 3D scene generation and highlights promising directions at the intersection of generative AI, 3D vision, and embodied intelligence.\n\n三维场景生成旨在合成具有空间结构、语义丰富且具备真实感的环境，广泛应用于沉浸式媒体、机器人技术、自动驾驶和具身智能等领域。早期基于程序规则的方法具备可扩展性，但在多样性方面受到限制。近年来，随着深度生成模型（如 GAN 和扩散模型）以及三维表示方法（如 NeRF 和 3D Gaussians）的发展，三维场景生成能够学习真实世界的场景分布，显著提升了真实感、多样性与视角一致性。\n特别是扩散模型的最新进展将三维场景合成问题重新表述为图像或视频生成任务，进一步推动了三维场景生成与写实渲染的融合。\n本文对该领域的最新方法进行了系统综述，并将现有技术归纳为四大类范式：程序生成、基于神经三维表示的生成、基于图像的生成以及基于视频的生成。我们分析了各类方法的技术基础、权衡因素与代表性成果，并回顾了常用数据集、评估协议及其下游应用。\n最后，我们讨论了当前三维场景生成面临的核心挑战，包括生成能力、三维表示、数据与标注、以及评估机制。同时，展望了若干关键发展方向，如更高保真度的生成、具备物理感知与交互能力的生成方法、以及感知与生成统一建模框架。\n本综述系统梳理了三维场景生成领域的研究进展，强调了其在生成式人工智能、三维视觉与具身智能交叉点上的前沿潜力。\n"
  },
  {
    "path": "abs/2505.05475.md",
    "content": "### SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation\n\nCreating high-quality animatable 3D human avatars from a single image remains a significant challenge in computer vision due to the inherent difficulty of reconstructing complete 3D information from a single viewpoint. Current approaches face a clear limitation: 3D Gaussian Splatting (3DGS) methods produce high-quality results but require multiple views or video sequences, while video diffusion models can generate animations from single images but struggle with consistency and identity preservation. We present SVAD, a novel approach that addresses these limitations by leveraging complementary strengths of existing techniques. Our method generates synthetic training data through video diffusion, enhances it with identity preservation and image restoration modules, and utilizes this refined data to train 3DGS avatars. Comprehensive evaluations demonstrate that SVAD outperforms state-of-the-art (SOTA) single-image methods in maintaining identity consistency and fine details across novel poses and viewpoints, while enabling real-time rendering capabilities. Through our data augmentation pipeline, we overcome the dependency on dense monocular or multi-view training data typically required by traditional 3DGS approaches. Extensive quantitative, qualitative comparisons show our method achieves superior performance across multiple metrics against baseline models. By effectively combining the generative power of diffusion models with both the high-quality results and rendering efficiency of 3DGS, our work establishes a new approach for high-fidelity avatar generation from a single image input.\n\n从单张图像创建高质量、可动画的三维人体头像在计算机视觉领域仍是一项重大挑战，其核心困难在于难以从单一视角中完整重建三维信息。现有方法存在明显局限：三维高斯泼溅（3D Gaussian Splatting, 3DGS）虽可生成高质量结果，但依赖多视图或视频序列；而视频扩散模型虽可从单张图像生成动画，但在一致性与身份保持方面表现不佳。\n本文提出 SVAD，一种突破现有限制的新方法，通过融合现有技术的互补优势实现高保真头像生成。我们的方法首先通过视频扩散模型生成合成训练数据，并引入身份保持与图像修复模块对其进行增强，随后利用该精炼数据训练 3DGS 头像模型。\n全面评估结果表明，SVAD 在新姿态与新视角下，能够比现有最先进的单图方法更好地保持身份一致性与细节保真，并支持实时渲染。通过我们设计的数据增强流程，SVAD 克服了传统 3DGS 方法对稠密单目或多视图训练数据的依赖。\n大量定量与定性对比实验表明，SVAD 在多个评估指标上均显著优于现有基线模型。通过有效结合扩散模型的生成能力与 3DGS 的高质量渲染与效率，SVAD 为从单张图像输入生成高保真三维头像提供了一种全新方案。\n"
  },
  {
    "path": "abs/2505.05505.md",
    "content": "### Apply Hierarchical-Chain-of-Generation to Complex Attributes Text-to-3D Generation\n\nRecent text-to-3D models can render high-quality assets, yet they still stumble on objects with complex attributes. The key obstacles are: (1) existing text-to-3D approaches typically lift text-to-image models to extract semantics via text encoders, while the text encoder exhibits limited comprehension ability for long descriptions, leading to deviated cross-attention focus, subsequently wrong attribute binding in generated results. (2) Occluded object parts demand a disciplined generation order and explicit part disentanglement. Though some works introduce manual efforts to alleviate the above issues, their quality is unstable and highly reliant on manual information. To tackle above problems, we propose a automated method Hierarchical-Chain-of-Generation (HCoG). It leverages a large language model to decompose the long description into blocks representing different object parts, and orders them from inside out according to occlusions, forming a hierarchical chain. Within each block we first coarsely create components, then precisely bind attributes via target-region localization and corresponding 3D Gaussian kernel optimization. Between blocks, we introduce Gaussian Extension and Label Elimination to seamlessly generate new parts by extending new Gaussian kernels, re-assigning semantic labels, and eliminating unnecessary kernels, ensuring that only relevant parts are added without disrupting previously optimized parts. Experiments confirm that HCoG yields structurally coherent, attribute-faithful 3D objects with complex attributes.\n\n最新的文本生成三维（text-to-3D）模型已能够渲染高质量资产，但在处理具有复杂属性的物体时仍存在困难。主要障碍在于：(1) 现有方法通常将文本到图像的模型扩展到三维，通过文本编码器提取语义，但文本编码器对长文本理解能力有限，导致交叉注意力偏离，从而产生属性绑定错误；(2) 对于被遮挡的物体部分，需要严格的生成顺序与显式的结构解耦。尽管已有部分方法通过人工干预缓解上述问题，但其生成质量不稳定，且严重依赖手工信息。\n为解决这些问题，我们提出一种自动化方法 Hierarchical-Chain-of-Generation（HCoG）。该方法利用大语言模型将长文本描述分解为表示不同物体部件的语义块，并根据遮挡关系从内到外排序，形成层次化生成链。在每个语义块内，首先粗略生成部件形状，然后通过目标区域定位与对应的三维高斯核优化实现属性的精确绑定。\n在语义块之间，我们引入 Gaussian Extension 与 Label Elimination 机制，通过扩展新的高斯核、重新分配语义标签并消除冗余核，实现新部件的无缝生成，确保仅添加相关部分而不破坏已优化部分。\n实验结果表明，HCoG 能够生成结构连贯、属性准确的复杂三维物体，显著提升了对复杂属性的表达能力与建模一致性。\n"
  },
  {
    "path": "abs/2505.05587.md",
    "content": "### Steepest Descent Density Control for Compact 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful technique for real-time, high-resolution novel view synthesis. By representing scenes as a mixture of Gaussian primitives, 3DGS leverages GPU rasterization pipelines for efficient rendering and reconstruction. To optimize scene coverage and capture fine details, 3DGS employs a densification algorithm to generate additional points. However, this process often leads to redundant point clouds, resulting in excessive memory usage, slower performance, and substantial storage demands - posing significant challenges for deployment on resource-constrained devices. To address this limitation, we propose a theoretical framework that demystifies and improves density control in 3DGS. Our analysis reveals that splitting is crucial for escaping saddle points. Through an optimization-theoretic approach, we establish the necessary conditions for densification, determine the minimal number of offspring Gaussians, identify the optimal parameter update direction, and provide an analytical solution for normalizing off-spring opacity. Building on these insights, we introduce SteepGS, incorporating steepest density control, a principled strategy that minimizes loss while maintaining a compact point cloud. SteepGS achieves a ~50% reduction in Gaussian points without compromising rendering quality, significantly enhancing both efficiency and scalability.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）作为一种强大的技术，已在实时高分辨率新视角合成任务中展现出卓越性能。通过将场景表示为高斯图元的混合体，3DGS 能够借助 GPU 光栅化管线实现高效的渲染与重建。为提升场景覆盖度与细节捕捉能力，3DGS 通常采用密化算法生成更多点位。然而，该过程常常导致点云冗余，进而带来高内存占用、渲染性能下降以及显著的存储压力，这对资源受限设备的部署构成了严峻挑战。\n为解决这一问题，本文提出了一个理论框架，用于揭示并改进 3DGS 中的密度控制机制。我们的分析表明，“图元分裂”在跳出鞍点中起着关键作用。基于优化理论，我们推导出密化的必要条件，确定最小子高斯数量，分析最优参数更新方向，并提供了归一化子高斯不透明度的解析解。\n基于上述理论洞察，我们提出了 SteepGS，一种融合最速密度控制的策略，能够在最小化损失的同时保持紧凑的点云分布。SteepGS 实现了在不损失渲染质量的前提下约 50% 的高斯点数减少，显著提升了系统的效率与可扩展性。\n"
  },
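  {
    "path": "notes/steepgs-split-opacity-sketch.md",
    "content": "### Editor's note: offspring opacity normalization sketch (companion to abs/2505.05587.md)\n\nSteepGS derives, among other results, an analytical solution for normalizing offspring opacity when a Gaussian is split. That exact solution is not reproduced here; as a hedged illustration of what such a normalization accomplishes, the sketch below uses a known closed form that preserves composited opacity when n overlapping offspring replace one parent. Treat it as an assumption-labeled stand-in, not the paper's formula.\n\n```python\ndef offspring_opacity(sigma, n_offspring=2):\n    # choose sigma_child so that n stacked children reproduce the parent's\n    # composited opacity: 1 - (1 - sigma_child)**n == sigma\n    return 1.0 - (1.0 - sigma) ** (1.0 / n_offspring)\n\ns = offspring_opacity(0.8, 2)\nprint(s, 1.0 - (1.0 - s) ** 2)  # ~0.5528, composites back to ~0.8\n```\n"
  },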
  {
    "path": "abs/2505.05591.md",
    "content": "### QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization\n\nSurface reconstruction is fundamental to computer vision and graphics, enabling applications in 3D modeling, mixed reality, robotics, and more. Existing approaches based on volumetric rendering obtain promising results, but optimize on a per-scene basis, resulting in a slow optimization that can struggle to model under-observed or textureless regions. We introduce QuickSplat, which learns data-driven priors to generate dense initializations for 2D gaussian splatting optimization of large-scale indoor scenes. This provides a strong starting point for the reconstruction, which accelerates the convergence of the optimization and improves the geometry of flat wall structures. We further learn to jointly estimate the densification and update of the scene parameters during each iteration; our proposed densifier network predicts new Gaussians based on the rendering gradients of existing ones, removing the needs of heuristics for densification. Extensive experiments on large-scale indoor scene reconstruction demonstrate the superiority of our data-driven optimization. Concretely, we accelerate runtime by 8x, while decreasing depth errors by up to 48% in comparison to state of the art methods.\n\n表面重建是计算机视觉与图形学中的基础任务，支撑着三维建模、混合现实、机器人等多种应用。现有基于体渲染的方法已取得了令人瞩目的成果，但往往依赖每个场景独立优化，导致优化过程缓慢，并在观测不足或纹理稀缺区域表现不佳。\n我们提出 QuickSplat，一种利用数据驱动先验为大规模室内场景的二维高斯泼溅（2D Gaussian Splatting）优化生成稠密初始化的新方法。该初始化为重建提供了强有力的起点，加速了优化收敛过程，并显著改善了平坦墙面等结构的几何质量。\n此外，我们进一步设计了一种联合估计密化与场景参数更新的策略。我们提出的 densifier 网络 通过现有高斯的渲染梯度预测新高斯点，从而无需依赖传统的启发式密化策略。\n在大规模室内场景重建上的广泛实验表明，我们的优化框架具有明显优势。具体而言，QuickSplat 在保持或提升重建质量的同时，将运行时间加速了 8 倍，并将深度误差最多降低了 48%，显著优于现有最先进方法。\n"
  },
  {
    "path": "abs/2505.05643.md",
    "content": "### UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes\n\nUltrasound imaging is widely used due to its safety, affordability, and real-time capabilities, but its 2D interpretation is highly operator-dependent, leading to variability and increased cognitive demand. 2D-to-3D reconstruction mitigates these challenges by providing standardized volumetric views, yet existing methods are often computationally expensive, memory-intensive, or incompatible with ultrasound physics. We introduce UltraGauss: the first ultrasound-specific Gaussian Splatting framework, extending view synthesis techniques to ultrasound wave propagation. Unlike conventional perspective-based splatting, UltraGauss models probe-plane intersections in 3D, aligning with acoustic image formation. We derive an efficient rasterization boundary formulation for GPU parallelization and introduce a numerically stable covariance parametrization, improving computational efficiency and reconstruction accuracy. On real clinical ultrasound data, UltraGauss achieves state-of-the-art reconstructions in 5 minutes, and reaching 0.99 SSIM within 20 minutes on a single GPU. A survey of expert clinicians confirms UltraGauss' reconstructions are the most realistic among competing methods. Our CUDA implementation will be released upon publication.\n\n超声成像因其安全性、低成本和实时性被广泛应用，但其二维图像的解读高度依赖操作者，容易导致结果差异大、认知负担重。二维到三维的重建可以通过提供标准化的体积视图来缓解这些问题，然而现有方法往往计算开销大、内存占用高，或与超声物理模型不兼容。\n我们提出 UltraGauss：首个面向超声成像的高斯泼溅框架，将新视角合成技术扩展至超声波传播过程。不同于传统基于透视模型的泼溅方法，UltraGauss 在三维空间中建模探头平面与组织结构的交互，更贴合声学图像的形成机制。\n我们推导出一种适用于 GPU 并行处理的高效光栅边界公式，并引入数值稳定的协方差参数化方式，从而提升计算效率和重建精度。在真实临床超声数据上的实验表明，UltraGauss 可在 5 分钟内实现最先进的三维重建效果，并在单张 GPU 上于 20 分钟内达到 0.99 SSIM 的图像结构相似度评分。\n专家临床医生的评估结果进一步确认，UltraGauss 所生成的重建图像在所有对比方法中最具真实感。\n"
  },
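  {
    "path": "notes/ultragauss-plane-slice-sketch.md",
    "content": "### Editor's note: Gaussian probe-plane intersection sketch (companion to abs/2505.05643.md)\n\nUltraGauss models probe-plane intersections of 3D Gaussians rather than perspective projection. The generic math for restricting a 3D Gaussian to a plane is standard: in a plane-aligned frame, the in-plane density is the conditional Gaussian (a Schur complement) scaled by the marginal density of the normal coordinate. The NumPy sketch below implements that textbook slicing identity; it is not necessarily the paper's exact parametrization, and the function name is hypothetical.\n\n```python\nimport numpy as np\n\ndef slice_gaussian_with_plane(mu, cov, origin, u, v, n):\n    # express the 3D Gaussian in the probe-plane frame with columns [u, v, n]\n    P = np.stack([u, v, n], axis=1)\n    m = P.T @ (mu - origin)\n    S = P.T @ cov @ P\n    # condition on the normal coordinate being zero (points on the plane)\n    gain = S[:2, 2] / S[2, 2]\n    mean2d = m[:2] - gain * m[2]                   # conditional in-plane mean\n    cov2d = S[:2, :2] - np.outer(gain, S[2, :2])   # Schur complement\n    # amplitude: marginal density of the normal coordinate at zero\n    amp = np.exp(-0.5 * m[2] ** 2 / S[2, 2]) / np.sqrt(2 * np.pi * S[2, 2])\n    return mean2d, cov2d, amp\n\ne = np.eye(3)\nmu = np.array([0.0, 0.0, 0.3])\ncov = np.diag([0.04, 0.01, 0.02])\nprint(slice_gaussian_with_plane(mu, cov, np.zeros(3), e[0], e[1], e[2]))\n```\n"
  },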
  {
    "path": "abs/2505.05672.md",
    "content": "### TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling\n\nSparse volumetric reconstruction and rendering via 3D Gaussian splatting have recently enabled animatable 3D head avatars that are rendered under arbitrary viewpoints with impressive photorealism. Today, such photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment. Building a photoreal avatar requires estimating the complex non-rigid motion of different facial components as seen in input video images; due to inaccurate motion estimation, animatable models typically present a loss of fidelity and detail when compared to their non-animatable counterparts, built from an individual facial expression. Also, recent state-of-the-art models are often affected by memory limitations that reduce the number of 3D Gaussians used for modeling, leading to lower detail and quality. To address these problems, we present a new high-detail 3D head avatar model that improves upon the state of the art, largely increasing the number of 3D Gaussians and modeling quality for rendering at 4K resolution. Our high-quality model is reconstructed from multiview input video and builds on top of a mesh-based 3D morphable model, which provides a coarse deformation layer for the head. Photoreal appearance is modelled by 3D Gaussians embedded within the continuous UVD tangent space of this mesh, allowing for more effective densification where most needed. Additionally, these Gaussians are warped by a novel UVD deformation field to capture subtle, localized motion. Our key contribution is the novel deformable Gaussian encoding and overall fitting procedure that allows our head model to preserve appearance detail, while capturing facial motion and other transient high-frequency features such as skin wrinkling.\n\n稀疏体积重建与渲染近年来借助三维高斯泼溅（3D Gaussian Splatting）技术，实现了可动画的三维头像，可在任意视角下呈现出令人印象深刻的真实感。如今，这类写实头像已成为远程交互、扩展现实与娱乐等新兴应用中的关键组成部分。\n构建高保真头像需准确估计输入视频中面部各部位的复杂非刚性运动。然而，由于运动估计的不准确，可动画模型通常在保真度和细节方面逊于基于静态面部表情构建的非动画模型。此外，当前最先进的方法常受限于显存资源，限制了可用于建模的三维高斯数量，从而影响渲染细节与质量。\n为解决上述问题，我们提出了一种新型的高细节三维头像模型，在现有方法基础上实现了显著提升，通过大幅增加三维高斯数量与建模质量，实现了可达 4K 分辨率 的高质量渲染。该模型基于多视角输入视频重建，并建立在三维可变形网格模型（3D Morphable Model）基础上，为头部提供一个粗略的形变层。\n写实外观通过嵌入于网格连续 UVD 切线空间中的三维高斯建模，从而能在需要的区域实现更有效的密化。同时，这些高斯通过一个新颖的 UVD 形变场 进行变形，以捕捉局部微妙运动。\n我们工作的核心贡献是提出了一种可变形高斯编码方法及其整体拟合流程，使得头像模型在捕捉面部运动与诸如皮肤皱纹等瞬时高频细节的同时，依然保持了极高的外观保真度。\n"
  },
  {
    "path": "abs/2505.06523.md",
    "content": "### Virtualized 3D Gaussians: Flexible Cluster-based Level-of-Detail System for Real-Time Rendering of Composed Scenes\n\n3D Gaussian Splatting (3DGS) enables the reconstruction of intricate digital 3D assets from multi-view images by leveraging a set of 3D Gaussian primitives for rendering. Its explicit and discrete representation facilitates the seamless composition of complex digital worlds, offering significant advantages over previous neural implicit methods. However, when applied to large-scale compositions, such as crowd-level scenes, it can encompass numerous 3D Gaussians, posing substantial challenges for real-time rendering. To address this, inspired by Unreal Engine 5's Nanite system, we propose Virtualized 3D Gaussians (V3DG), a cluster-based LOD solution that constructs hierarchical 3D Gaussian clusters and dynamically selects only the necessary ones to accelerate rendering speed. Our approach consists of two stages: (1) Offline Build, where hierarchical clusters are generated using a local splatting method to minimize visual differences across granularities, and (2) Online Selection, where footprint evaluation determines perceptible clusters for efficient rasterization during rendering. We curate a dataset of synthetic and real-world scenes, including objects, trees, people, and buildings, each requiring 0.1 billion 3D Gaussians to capture fine details. Experiments show that our solution balances rendering efficiency and visual quality across user-defined tolerances, facilitating downstream interactive applications that compose extensive 3DGS assets for consistent rendering performance.\n\n3D Gaussian Splatting（3DGS）通过利用一组三维高斯图元进行渲染，使得从多视图图像重建复杂的数字三维资产成为可能。其显式且离散的表示方式，有利于复杂数字世界的自由组合，相较于以往的神经隐式方法具有显著优势。\n然而，当应用于大规模组合场景（如人群级场景）时，往往包含数量庞大的三维高斯图元，给实时渲染带来显著挑战。为解决这一问题，我们受 Unreal Engine 5 中 Nanite 系统的启发，提出 Virtualized 3D Gaussians（V3DG），一种基于簇的 LOD（Level-of-Detail）解决方案。该方法构建分层的三维高斯图元簇，并在渲染过程中动态选择所需部分，以提升渲染速度。\n我们的方法包含两个阶段：（1）离线构建阶段，使用局部泼溅方法生成层次簇，最小化不同粒度之间的视觉差异；（2）在线选择阶段，通过图元投影面积评估决定当前渲染中可感知的簇，以实现高效光栅化。\n我们整理了一组包含合成与真实场景的数据集，涵盖对象、树木、人群与建筑，每个场景约需 0.1 亿个三维高斯图元以捕捉细节。实验表明，该方法在用户定义的容差范围内兼顾渲染效率与视觉质量，支持大规模 3DGS 资产的组合应用，实现一致且高效的渲染表现。\n"
  },
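  {
    "path": "notes/v3dg-lod-selection-sketch.md",
    "content": "### Editor's note: footprint-driven LOD selection sketch (companion to abs/2505.06523.md)\n\nV3DG's online stage evaluates cluster footprints to decide which hierarchy level to rasterize. A common way to express that decision, sketched below, is to project each level's world-space error into screen space and take the coarsest level that stays under a pixel tolerance, in the spirit of Nanite-style LOD tests. The thresholding rule and all names here are illustrative assumptions, not the paper's exact footprint evaluation.\n\n```python\nimport numpy as np\n\ndef select_lod(cluster_center, level_errors, cam_pos, focal_px, tol_px=1.0):\n    # level_errors: world-space simplification error per LOD, coarse -> fine\n    # pick the coarsest level whose projected error is imperceptible\n    dist = np.linalg.norm(cluster_center - cam_pos)\n    for level, err in enumerate(level_errors):\n        if focal_px * err / dist <= tol_px:\n            return level\n    return len(level_errors) - 1   # nothing coarse enough: use finest level\n\nprint(select_lod(np.array([0.0, 0.0, 10.0]), [0.20, 0.05, 0.01], np.zeros(3), 1000.0))  # 2\n```\n"
  },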
  {
    "path": "abs/2505.07396.md",
    "content": "### Monocular Online Reconstruction with Enhanced Detail Preservation\n\nWe propose an online 3D Gaussian-based dense mapping framework for photorealistic details reconstruction from a monocular image stream. Our approach addresses two key challenges in monocular online reconstruction: distributing Gaussians without relying on depth maps and ensuring both local and global consistency in the reconstructed maps. To achieve this, we introduce two key modules: the Hierarchical Gaussian Management Module for effective Gaussian distribution and the Global Consistency Optimization Module for maintaining alignment and coherence at all scales. In addition, we present the Multi-level Occupancy Hash Voxels (MOHV), a structure that regularizes Gaussians for capturing details across multiple levels of granularity. MOHV ensures accurate reconstruction of both fine and coarse geometries and textures, preserving intricate details while maintaining overall structural integrity. Compared to state-of-the-art RGB-only and even RGB-D methods, our framework achieves superior reconstruction quality with high computational efficiency. Moreover, it integrates seamlessly with various tracking systems, ensuring generality and scalability.\n\n我们提出了一种基于三维高斯的在线稠密建图框架，用于从单目图像流中重建具有逼真细节的三维场景。该方法针对单目在线重建中的两个关键挑战进行设计：一是如何在不依赖深度图的情况下合理分布高斯点；二是如何在重建的地图中同时确保局部与全局的一致性。\n为此，我们引入了两个核心模块：分层高斯管理模块（Hierarchical Gaussian Management Module），用于实现高效的高斯分布；全局一致性优化模块（Global Consistency Optimization Module），用于在各个尺度上保持重建结果的对齐与一致性。\n此外，我们提出了一种名为**多层次占据哈希体素（Multi-level Occupancy Hash Voxels, MOHV）**的结构，用于对高斯进行正则化，引导其在多个细节层级上捕捉几何与纹理信息。MOHV 结构可确保精细与粗略结构的准确重建，在保留细节的同时维持整体结构的完整性。\n与现有的仅使用 RGB 信息，甚至结合 RGB-D 信息的最先进方法相比，我们的框架在重建质量和计算效率方面均表现优异。同时，该框架可以无缝集成到多种跟踪系统中，具备良好的通用性与可扩展性。\n"
  },
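MOHV's exact construction is this paper's contribution and is not reproduced here; the sketch below only shows the generic multi-resolution spatial-hash occupancy idea (prime-XOR hashing of integer voxel coordinates, in the style of instant-NGP grids) that such a structure builds on. All names, constants, and the hash scheme are assumptions for illustration.

```python
import numpy as np

# Large primes commonly used for spatial hashing of integer voxel coordinates.
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def occupancy_keys(points, base_cell=0.08, levels=4, table_size=2**20):
    """Hash 3D points into occupancy tables at several resolutions.

    points: (N, 3) float array. Returns one (N,) index array per level;
    marking those slots occupied yields a coarse-to-fine occupancy query
    that can indicate where fine-grained Gaussians are actually needed.
    """
    keys = []
    for level in range(levels):
        cell = base_cell / (2 ** level)              # cells shrink level by level
        ijk = np.floor(points / cell).astype(np.int64).astype(np.uint64)
        h = ijk * PRIMES                             # per-axis scramble (wraps mod 2^64)
        keys.append((h[:, 0] ^ h[:, 1] ^ h[:, 2]) % table_size)
    return keys
```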
  {
    "path": "abs/2505.07887.md",
    "content": "### Monocular Online Reconstruction with Enhanced Detail Preservation\n\nWe propose an online 3D Gaussian-based dense mapping framework for photorealistic details reconstruction from a monocular image stream. Our approach addresses two key challenges in monocular online reconstruction: distributing Gaussians without relying on depth maps and ensuring both local and global consistency in the reconstructed maps. To achieve this, we introduce two key modules: the Hierarchical Gaussian Management Module for effective Gaussian distribution and the Global Consistency Optimization Module for maintaining alignment and coherence at all scales. In addition, we present the Multi-level Occupancy Hash Voxels (MOHV), a structure that regularizes Gaussians for capturing details across multiple levels of granularity. MOHV ensures accurate reconstruction of both fine and coarse geometries and textures, preserving intricate details while maintaining overall structural integrity. Compared to state-of-the-art RGB-only and even RGB-D methods, our framework achieves superior reconstruction quality with high computational efficiency. Moreover, it integrates seamlessly with various tracking systems, ensuring generality and scalability.\n\n我们提出了一种基于三维高斯的在线稠密建图框架，用于从单目图像流中重建具有照片级真实感的细节。该方法针对单目在线重建中的两个关键挑战进行了解决：无需依赖深度图即可分布高斯点，以及在重建地图中同时保证局部与全局的一致性。\n为此，我们引入了两个关键模块：分层高斯管理模块（Hierarchical Gaussian Management Module），用于实现高效的高斯分布；全局一致性优化模块（Global Consistency Optimization Module），用于在各个尺度上保持对齐与连贯性。\n此外，我们还提出了多层次占据哈希体素结构（Multi-level Occupancy Hash Voxels, MOHV），用于对高斯点进行正则化，从而在多个细粒度层级上捕捉几何和纹理细节。MOHV 能够准确重建精细与粗略结构，既保留复杂细节，又维持整体结构的完整性。\n与现有最先进的仅基于 RGB 甚至 RGB-D 的方法相比，我们的框架在保持较高计算效率的同时，获得了更优的重建质量。此外，该框架能够无缝集成到多种跟踪系统中，具备良好的通用性与可扩展性。\n"
  },
  {
    "path": "abs/2505.08124.md",
    "content": "### SLAG: Scalable Language-Augmented Gaussian Splatting\n\nLanguage-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM and CLIP. Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly parallelized scene encoding. Additionally, we introduce a vector database for efficient embedding storage and retrieval. Our experiments show that SLAG achieves an 18 times speedup in embedding computation on a 16-GPU setup compared to OpenGaussian, while preserving embedding quality on the ScanNet and LERF datasets.\n\n语言增强的场景表示在大规模机器人应用中展现出巨大潜力，例如搜救任务、智慧城市和矿业等。这些应用通常具有时间敏感性，要求快速完成场景编码，同时又伴随着大量数据需求，因此亟需具备可扩展性的解决方案。在计算资源有限的机器人上部署此类表示方法，更加剧了实现难度。\n为应对上述挑战，我们提出了 SLAG，一个用于语言增强高斯投影（Gaussian Splatting）的多 GPU 框架，显著提升了大场景嵌入的速度与可扩展性。该方法通过结合 SAM 和 CLIP，将二维视觉-语言模型的特征嵌入到三维场景中。与以往方法不同，SLAG 无需使用损失函数来为每个高斯计算语言嵌入，而是通过对三维高斯场景参数进行归一化加权平均来提取嵌入，从而实现高度并行化的场景编码。\n此外，我们还引入了向量数据库，用于高效地存储和检索嵌入向量。\n实验结果表明，SLAG 在 16 GPU 环境下的嵌入计算速度比 OpenGaussian 快 18 倍，同时在 ScanNet 和 LERF 数据集上保持了嵌入质量。\n"
  },
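SLAG's loss-free aggregation step lends itself to a compact sketch: assuming per-pixel CLIP features (from SAM-segmented regions) and per-Gaussian rendering weights are already available as dense arrays, the per-Gaussian embedding is a normalized weighted average. In SLAG the weights come from 3D Gaussian scene parameters during rasterization; the dense-matrix form below is purely illustrative.

```python
import numpy as np

def per_gaussian_embeddings(pixel_feats, weights, eps=1e-8):
    """pixel_feats: (P, D) CLIP features of pixels inside SAM masks.
    weights:     (G, P) contribution of each Gaussian to each pixel
                 (e.g., its alpha-blending weight from the rasterizer).
    Returns (G, D) unit-norm language embeddings, one per Gaussian."""
    emb = weights @ pixel_feats                       # weighted sum over pixels
    return emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), eps)

# Open-vocabulary query: score Gaussians by cosine similarity against a
# unit-norm CLIP text embedding, e.g. scores = gaussian_emb @ text_emb.
```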
  {
    "path": "abs/2505.08510.md",
    "content": "### FOCI: Trajectory Optimization on Gaussian Splats\n\n3D Gaussian Splatting (3DGS) has recently gained popularity as a faster alternative to Neural Radiance Fields (NeRFs) in 3D reconstruction and view synthesis methods. Leveraging the spatial information encoded in 3DGS, this work proposes FOCI (Field Overlap Collision Integral), an algorithm that is able to optimize trajectories directly on the Gaussians themselves. FOCI leverages a novel and interpretable collision formulation for 3DGS using the notion of the overlap integral between Gaussians. Contrary to other approaches, which represent the robot with conservative bounding boxes that underestimate the traversability of the environment, we propose to represent the environment and the robot as Gaussian Splats. This not only has desirable computational properties, but also allows for orientation-aware planning, allowing the robot to pass through very tight and narrow spaces. We extensively test our algorithm in both synthetic and real Gaussian Splats, showcasing that collision-free trajectories for the ANYmal legged robot that can be computed in a few seconds, even with hundreds of thousands of Gaussians making up the environment.\n\n3D Gaussian Splatting 近年来作为神经辐射场（NeRF）在三维重建与新视角合成中的更高效替代方案而受到广泛关注。借助 3DGS 中编码的空间信息，本文提出了一种名为 FOCI（Field Overlap Collision Integral） 的算法，能够直接在高斯分布上进行轨迹优化。FOCI 引入了一种新颖且具可解释性的碰撞建模方法，通过定义高斯之间的重叠积分（overlap integral）来实现对碰撞的描述。与传统方法使用保守包围盒来表示机器人不同——这类方法往往低估了环境的可通行性——我们提出使用 高斯点云（Gaussian Splats） 来同时建模环境与机器人。这种表示不仅具备优越的计算特性，还支持面向朝向的路径规划（orientation-aware planning），使机器人能够通过极其狭窄的空间。我们在合成环境和真实高斯投影场景中对该算法进行了广泛测试，结果表明：即使在环境中包含数十万个高斯的情况下，FOCI 也能在几秒钟内为 ANYmal 四足机器人 计算出无碰撞轨迹。\n"
  },
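The overlap integral of two Gaussian densities has a classical closed form, ∫ N(x; μ₁, Σ₁) N(x; μ₂, Σ₂) dx = N(μ₁ − μ₂; 0, Σ₁ + Σ₂), which is what makes a splat-versus-splat collision cost cheap to evaluate. A small sketch of that identity follows; how FOCI weights and accumulates these terms along a trajectory is specific to the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_overlap(mu1, cov1, mu2, cov2):
    """Closed-form overlap of two Gaussian densities (numpy array inputs);
    larger values mean the two splats interpenetrate more."""
    return multivariate_normal.pdf(mu1 - mu2, mean=np.zeros(len(mu1)),
                                   cov=cov1 + cov2)

def collision_cost(robot_splats, env_splats):
    """Sum pairwise overlaps: a simple interpenetration penalty that a
    trajectory optimizer can minimize at each pose along a candidate path."""
    return sum(gaussian_overlap(mr, cr, me, ce)
               for mr, cr in robot_splats for me, ce in env_splats)
```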
  {
    "path": "abs/2505.08644.md",
    "content": "### DLO-Splatting: Tracking Deformable Linear Objects Using 3D Gaussian Splatting\n\nThis work presents DLO-Splatting, an algorithm for estimating the 3D shape of Deformable Linear Objects (DLOs) from multi-view RGB images and gripper state information through prediction-update filtering. The DLO-Splatting algorithm uses a position-based dynamics model with shape smoothness and rigidity dampening corrections to predict the object shape. Optimization with a 3D Gaussian Splatting-based rendering loss iteratively renders and refines the prediction to align it with the visual observations in the update step. Initial experiments demonstrate promising results in a knot tying scenario, which is challenging for existing vision-only methods.\n\n本工作提出了 DLO-Splatting，一种通过预测-更新滤波，从多视角 RGB 图像与夹爪状态信息中估计可变形线性物体（Deformable Linear Objects, DLO）三维形状的算法。DLO-Splatting 使用基于位置的动力学模型进行形状预测，并引入形状平滑性与刚性抑制修正项，以更准确地模拟物体的动态行为。在更新阶段，算法通过基于三维高斯投影（3D Gaussian Splatting）的渲染损失进行优化，迭代地渲染并细化预测结果，使其与视觉观测相一致。初步实验在“打结”任务中展示出有希望的结果，该场景对于现有仅依赖视觉的方法具有较大挑战性。\n"
  },
  {
    "path": "abs/2505.08811.md",
    "content": "### TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian\n\nUnderwater 3D scene reconstruction is crucial for undewater robotic perception and navigation. However, the task is significantly challenged by the complex interplay between light propagation, water medium, and object surfaces, with existing methods unable to model their interactions accurately. Additionally, expensive training and rendering costs limit their practical application in underwater robotic systems. Therefore, we propose Tensorized Underwater Gaussian Splatting (TUGS), which can effectively solve the modeling challenges of the complex interactions between object geometries and water media while achieving significant parameter reduction. TUGS employs lightweight tensorized higher-order Gaussians with a physics-based underwater Adaptive Medium Estimation (AME) module, enabling accurate simulation of both light attenuation and backscatter effects in underwater environments. Compared to other NeRF-based and GS-based methods designed for underwater, TUGS is able to render high-quality underwater images with faster rendering speeds and less memory usage. Extensive experiments on real-world underwater datasets have demonstrated that TUGS can efficiently achieve superior reconstruction quality using a limited number of parameters, making it particularly suitable for memory-constrained underwater UAV applications\n\n水下三维场景重建对于水下机器人感知与导航至关重要。然而，由于光传播、水体介质与物体表面之间复杂的相互作用，该任务面临极大挑战，现有方法难以准确建模这些交互过程。此外，高昂的训练与渲染开销也限制了其在水下机器人系统中的实际应用。为此，我们提出了张量化水下高斯投影（Tensorized Underwater Gaussian Splatting, TUGS） 方法，能够有效解决物体几何与水体介质之间复杂交互的建模难题，同时实现显著的参数压缩。TUGS 引入了轻量的张量化高阶高斯表示，并结合基于物理的水下自适应介质估计模块（Adaptive Medium Estimation, AME），可准确模拟水下环境中的光衰减与反向散射效应。与其他基于 NeRF 或高斯投影的水下方法相比，TUGS 可在保持高质量渲染效果的同时，显著提升渲染速度并降低内存占用。我们在多个真实水下数据集上的广泛实验表明，TUGS 在使用极少参数的条件下，仍能实现优越的重建质量，尤其适用于内存受限的水下无人机（UAV）应用场景。\n"
  },
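For context, the physics an Adaptive Medium Estimation module must capture is usually written as the standard underwater image-formation model: an attenuated direct signal plus range-dependent backscatter. A sketch with illustrative parameter names follows; TUGS's actual AME module is learned per scene, and this equation is only the textbook form it approximates.

```python
import numpy as np

def underwater_image(J, depth, beta_direct, beta_back, B_inf):
    """I = J * exp(-beta_direct * z) + B_inf * (1 - exp(-beta_back * z))
    J: (H, W, 3) unattenuated scene radiance, depth: (H, W) range in meters,
    beta_*: (3,) per-channel medium coefficients, B_inf: (3,) veiling light."""
    z = depth[..., None]
    direct = J * np.exp(-beta_direct * z)                 # wavelength-dependent attenuation
    backscatter = B_inf * (1.0 - np.exp(-beta_back * z))  # medium glow grows with range
    return direct + backscatter
```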
  {
    "path": "abs/2505.09324.md",
    "content": "### Neural Video Compression using 2D Gaussian Splatting\n\nThe computer vision and image processing research community has been involved in standardizing video data communications for the past many decades, leading to standards such as AVC, HEVC, VVC, AV1, AV2, etc. However, recent groundbreaking works have focused on employing deep learning-based techniques to replace the traditional video codec pipeline to a greater affect. Neural video codecs (NVC) create an end-to-end ML-based solution that does not rely on any handcrafted features (motion or edge-based) and have the ability to learn content-aware compression strategies, offering better adaptability and higher compression efficiency than traditional methods. This holds a great potential not only for hardware design, but also for various video streaming platforms and applications, especially video conferencing applications such as MS-Teams or Zoom that have found extensive usage in classrooms and workplaces. However, their high computational demands currently limit their use in real-time applications like video conferencing. To address this, we propose a region-of-interest (ROI) based neural video compression model that leverages 2D Gaussian Splatting. Unlike traditional codecs, 2D Gaussian Splatting is capable of real-time decoding and can be optimized using fewer data points, requiring only thousands of Gaussians for decent quality outputs as opposed to millions in 3D scenes. In this work, we designed a video pipeline that speeds up the encoding time of the previous Gaussian splatting-based image codec by 88% by using a content-aware initialization strategy paired with a novel Gaussian inter-frame redundancy-reduction mechanism, enabling Gaussian splatting to be used for a video-codec solution, the first of its kind solution in this neural video codec space.\n\n计算机视觉与图像处理研究社区在过去几十年中持续推动视频数据通信标准的发展，先后制定了 AVC、HEVC、VVC、AV1、AV2 等标准。然而，近年来的突破性研究逐渐将重心转向利用深度学习技术替代传统视频编解码流程。神经视频编解码器（Neural Video Codecs, NVC） 提供了一种端到端的机器学习解决方案，完全摆脱了对传统手工设计特征（如运动或边缘信息）的依赖，具备学习内容自适应压缩策略的能力，在适应性和压缩效率方面均优于传统方法。\n这一发展不仅对硬件设计具有重大意义，也对视频流媒体平台和应用（特别是 MS-Teams、Zoom 等广泛应用于教室和办公场景的视频会议平台）带来巨大潜力。然而，目前神经视频编解码器的高计算开销仍限制了其在实时应用（如视频会议）中的广泛部署。\n为应对这一挑战，我们提出了一种基于**兴趣区域（Region-of-Interest, ROI）**的神经视频压缩模型，利用 二维高斯投影（2D Gaussian Splatting） 实现高效编码。不同于传统编解码器，2D 高斯投影具备实时解码能力，并且仅需数千个高斯点即可生成质量较高的输出，而无需像三维场景那样使用上百万个高斯。\n在本工作中，我们设计了一条完整的视频处理流水线，结合内容感知初始化策略与新颖的高斯帧间冗余消除机制，在此前基于高斯投影的图像编解码器基础上，将编码速度提升了 88%，首次实现在神经视频编解码领域中应用高斯投影的可行视频编码方案。\n"
  },
  {
    "path": "abs/2505.09413.md",
    "content": "### Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians\n\nCurrent learning-based methods predict NeRF or 3D Gaussians from point clouds to achieve photo-realistic rendering but still depend on categorical priors, dense point clouds, or additional refinements. Hence, we introduce a novel point cloud rendering method by predicting 2D Gaussians from point clouds. Our method incorporates two identical modules with an entire-patch architecture enabling the network to be generalized to multiple datasets. The module normalizes and initializes the Gaussians utilizing the point cloud information including normals, colors and distances. Then, splitting decoders are employed to refine the initial Gaussians by duplicating them and predicting more accurate results, making our methodology effectively accommodate sparse point clouds as well. Once trained, our approach exhibits direct generalization to point clouds across different categories. The predicted Gaussians are employed directly for rendering without additional refinement on the rendered images, retaining the benefits of 2D Gaussians. We conduct extensive experiments on various datasets, and the results demonstrate the superiority and generalization of our method, which achieves SOTA performance.\n\n当前基于学习的方法通常从点云中预测 NeRF 或三维高斯，以实现逼真的图像渲染，但这些方法仍依赖于类别先验、稠密点云或额外的后处理步骤。为此，我们提出了一种新颖的点云渲染方法：从点云中预测二维高斯（2D Gaussians）。\n我们的方法采用两个结构相同的模块，并基于整块图像区域（entire-patch）架构设计，使网络具有良好的跨数据集泛化能力。该模块利用点云的法向量、颜色和深度信息对高斯分布进行归一化和初始化。随后，我们引入分裂解码器（splitting decoders），通过复制初始高斯并预测更精确的参数，对其进行细化，从而使该方法同样适用于稀疏点云场景。\n在训练完成后，我们的方法能够直接泛化至不同类别的点云数据，并且可直接使用预测得到的高斯进行渲染，无需对渲染图像进行额外优化，保留了二维高斯的高效特性。\n我们在多个数据集上进行了广泛实验，结果表明该方法在精度和泛化能力方面均优于现有方法，达到了当前最优性能（SOTA）。\n"
  },
  {
    "path": "abs/2505.09601.md",
    "content": "### Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware\n\nScaling robot learning requires vast and diverse datasets. Yet the prevailing data collection paradigm-human teleoperation-remains costly and constrained by manual effort and physical robot access. We introduce Real2Render2Real (R2R2R), a novel approach for generating robot training data without relying on object dynamics simulation or teleoperation of robot hardware. The input is a smartphone-captured scan of one or more objects and a single video of a human demonstration. R2R2R renders thousands of high visual fidelity robot-agnostic demonstrations by reconstructing detailed 3D object geometry and appearance, and tracking 6-DoF object motion. R2R2R uses 3D Gaussian Splatting (3DGS) to enable flexible asset generation and trajectory synthesis for both rigid and articulated objects, converting these representations to meshes to maintain compatibility with scalable rendering engines like IsaacLab but with collision modeling off. Robot demonstration data generated by R2R2R integrates directly with models that operate on robot proprioceptive states and image observations, such as vision-language-action models (VLA) and imitation learning policies. Physical experiments suggest that models trained on R2R2R data from a single human demonstration can match the performance of models trained on 150 human teleoperation demonstrations.\n\n扩展机器人学习的能力需要大量且多样化的数据。然而，当前主流的数据采集方式——人工远程操作——成本高昂，受限于人为操作和对物理机器人设备的依赖。\n我们提出了 Real2Render2Real（R2R2R），一种无需依赖物体动力学模拟或机器人硬件远程操作的全新机器人训练数据生成方法。该方法的输入仅包括：一段由智能手机拍摄的一个或多个物体的扫描，以及一段人类演示的视频。\nR2R2R 能够渲染出成千上万条具有高视觉保真度、与机器人平台无关的演示数据。其关键在于重建高精度的三维物体几何与外观，并对物体的 6 自由度（6-DoF）运动进行跟踪。该方法采用 三维高斯投影（3D Gaussian Splatting, 3DGS） 实现对刚体与关节物体的灵活资产生成与轨迹合成，并将这些表示转换为网格（meshes），以兼容如 IsaacLab 等可扩展渲染引擎（但关闭碰撞建模功能）。\n由 R2R2R 生成的机器人演示数据可直接用于处理机器人本体状态和图像观测的模型，如视觉-语言-动作（Vision-Language-Action, VLA）模型与模仿学习策略。\n实物实验表明，仅基于一段人类演示所生成的 R2R2R 数据即可训练出在表现上媲美于 150 条人工远程操作演示数据 的机器人模型。\n"
  },
  {
    "path": "abs/2505.09915.md",
    "content": "### Large-Scale Gaussian Splatting SLAM\n\nThe recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown encouraging and impressive results for visual SLAM. However, most representative methods require RGBD sensors and are only available for indoor environments. The robustness of reconstruction in large-scale outdoor scenarios remains unexplored. This paper introduces a large-scale 3DGS-based visual SLAM with stereo cameras, termed LSG-SLAM. The proposed LSG-SLAM employs a multi-modality strategy to estimate prior poses under large view changes. In tracking, we introduce feature-alignment warping constraints to alleviate the adverse effects of appearance similarity in rendering losses. For the scalability of large-scale scenarios, we introduce continuous Gaussian Splatting submaps to tackle unbounded scenes with limited memory. Loops are detected between GS submaps by place recognition and the relative pose between looped keyframes is optimized utilizing rendering and feature warping losses. After the global optimization of camera poses and Gaussian points, a structure refinement module enhances the reconstruction quality. With extensive evaluations on the EuRoc and KITTI datasets, LSG-SLAM achieves superior performance over existing Neural, 3DGS-based, and even traditional approaches.\n\n近年来发展起来的神经辐射场（Neural Radiance Fields, NeRF）与三维高斯投影（3D Gaussian Splatting, 3DGS）在视觉 SLAM 任务中展现出令人鼓舞的成果。然而，现有主流方法大多依赖 RGB-D 传感器，且仅适用于室内场景，对于大规模户外环境的重建鲁棒性尚未得到系统探索。\n本文提出了一种基于 3DGS 的大规模立体视觉 SLAM 系统，称为 LSG-SLAM（Large-Scale Gaussian Splatting SLAM）。该方法利用双目相机，并采用多模态策略以应对大视角变化下的先验位姿估计问题。在跟踪阶段，我们引入了特征对齐变形约束（feature-alignment warping constraints），缓解由于渲染损失中图像外观相似性所带来的不利影响。\n为支持大规模场景的可扩展性，LSG-SLAM 设计了连续高斯子图（continuous Gaussian Splatting submaps）结构，有效处理非边界限制场景下的内存约束问题。系统通过地点识别（place recognition）检测高斯子图间的回环，并结合渲染损失与特征变形损失优化回环关键帧之间的相对位姿。在完成相机姿态与高斯点的全局优化后，我们进一步引入结构精化模块以提升重建质量。\n在 EuRoc 与 KITTI 等真实数据集上的大量实验验证表明，LSG-SLAM 的性能显著优于现有神经方法、3DGS 方法，乃至传统 SLAM 方法。\n"
  },
  {
    "path": "abs/2505.10072.md",
    "content": "### ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars\n\nThe introduction of 3D Gaussian blendshapes has enabled the real-time reconstruction of animatable head avatars from monocular video. Toonify, a StyleGAN-based framework, has become widely used for facial image stylization. To extend Toonify for synthesizing diverse stylized 3D head avatars using Gaussian blendshapes, we propose an efficient two-stage framework, ToonifyGB. In Stage 1 (stylized video generation), we employ an improved StyleGAN to generate the stylized video from the input video frames, which addresses the limitation of cropping aligned faces at a fixed resolution as preprocessing for normal StyleGAN. This process provides a more stable video, which enables Gaussian blendshapes to better capture the high-frequency details of the video frames, and efficiently generate high-quality animation in the next stage. In Stage 2 (Gaussian blendshapes synthesis), we learn a stylized neutral head model and a set of expression blendshapes from the generated video. By combining the neutral head model with expression blendshapes, ToonifyGB can efficiently render stylized avatars with arbitrary expressions. We validate the effectiveness of ToonifyGB on the benchmark dataset using two styles: Arcane and Pixar.\n\n3D 高斯形变模型（Gaussian Blendshapes） 的引入，使得从单目视频中实时重建可动画的人头头像成为可能。而基于 StyleGAN 的 Toonify 框架已被广泛应用于人脸图像的风格化处理。为将 Toonify 扩展至支持利用高斯形变模型生成多样化风格的三维人头头像，我们提出了一种高效的两阶段框架——ToonifyGB。\n在**第一阶段（风格化视频生成）**中，我们采用改进版的 StyleGAN 从输入视频帧生成风格化视频。与普通 StyleGAN 需要将人脸对齐后裁剪为固定分辨率不同，我们的方法绕过了这一预处理限制，生成的视频更稳定，从而使得后续的高斯形变能够更好地捕捉视频帧中的高频细节，并高效地支持下一阶段的高质量动画生成。\n在第二阶段（高斯形变合成）中，我们从生成的视频中学习一个风格化的中性头部模型与一组表情形变基元（expression blendshapes）。通过将中性头部与表情形变结合，ToonifyGB 可以高效渲染具有任意表情的风格化三维头像。\n我们在基准数据集上使用两种风格（**《双城之战》（Arcane）**和 皮克斯（Pixar））对 ToonifyGB 进行了验证，实验结果证明了该方法的有效性。\n"
  },
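At its core, ToonifyGB's Stage 2 rendering model is a linear blendshape combination applied to packed Gaussian attributes. A minimal sketch, assuming expression bases are stored as offsets from the neutral model; the attribute packing and the paper's actual deformation model are not reproduced here.

```python
import numpy as np

def blend_avatar(neutral, expr_offsets, weights):
    """neutral:      (N, D) packed Gaussian attributes (positions, scales, ...)
    expr_offsets: (K, N, D) per-expression offsets from the neutral model
    weights:      (K,) expression coefficients driving the current frame
    Returns the blended (N, D) Gaussian attributes to rasterize."""
    return neutral + np.tensordot(weights, expr_offsets, axes=1)
```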
  {
    "path": "abs/2505.10144.md",
    "content": "### VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality\n\n3D Gaussian Splatting (3DGS) has rapidly become a leading technique for novel-view synthesis, providing exceptional performance through efficient software-based GPU rasterization. Its versatility enables real-time applications, including on mobile and lower-powered devices. However, 3DGS faces key challenges in virtual reality (VR): (1) temporal artifacts, such as popping during head movements, (2) projection-based distortions that result in disturbing and view-inconsistent floaters, and (3) reduced framerates when rendering large numbers of Gaussians, falling below the critical threshold for VR. Compared to desktop environments, these issues are drastically amplified by large field-of-view, constant head movements, and high resolution of head-mounted displays (HMDs). In this work, we introduce VRSplat: we combine and extend several recent advancements in 3DGS to address challenges of VR holistically. We show how the ideas of Mini-Splatting, StopThePop, and Optimal Projection can complement each other, by modifying the individual techniques and core 3DGS rasterizer. Additionally, we propose an efficient foveated rasterizer that handles focus and peripheral areas in a single GPU launch, avoiding redundant computations and improving GPU utilization. Our method also incorporates a fine-tuning step that optimizes Gaussian parameters based on StopThePop depth evaluations and Optimal Projection. We validate our method through a controlled user study with 25 participants, showing a strong preference for VRSplat over other configurations of Mini-Splatting. VRSplat is the first, systematically evaluated 3DGS approach capable of supporting modern VR applications, achieving 72+ FPS while eliminating popping and stereo-disrupting floaters.\n\n3D Gaussian Splatting（3DGS）因其高效的软件级 GPU 光栅化能力，在新视角合成任务中迅速崛起为领先技术，展现出卓越的性能表现。其多功能性使其能够应用于实时场景，包括移动端及低功耗设备。然而，在虚拟现实（VR）中，3DGS 面临以下关键挑战：（1）时间伪影，例如头部运动时的闪烁现象；（2）基于投影的失真，会导致令人不适且视角不一致的漂浮伪影；（3）在渲染大量高斯时帧率下降，低于 VR 所需的临界阈值。与桌面环境相比，这些问题在头戴显示器（HMD）中由于其大视场、持续的头部运动以及高分辨率而被显著放大。\n在本研究中，我们提出 VRSplat：一个融合并扩展近期多项 3DGS 进展的系统性方法，以全面应对 VR 场景下的挑战。我们展示了如何将 Mini-Splatting、StopThePop 和 Optimal Projection 的理念进行互补整合，具体方法包括对这些单独技术及核心 3DGS 光栅器的改进。此外，我们提出了一种高效的注视点光栅化器（foveated rasterizer），可在一次 GPU 调用中同时处理注视区域和外围区域，避免冗余计算并提升 GPU 利用率。我们的方法还引入了一个微调步骤，利用 StopThePop 的深度评估和 Optimal Projection 优化高斯参数。\n我们通过一项包含 25 位参与者的对照用户研究验证了该方法，结果表明，用户显著偏好 VRSplat 相较于其他 Mini-Splatting 配置。VRSplat 是首个经过系统评估、能够支持现代 VR 应用的 3DGS 方法，实现了 72+ FPS 的渲染速度，同时消除了闪烁与破坏立体感的漂浮伪影。\n"
  },
  {
    "path": "abs/2505.10473.md",
    "content": "### Consistent Quantity-Quality Control across Scenes for Deployment-Aware Gaussian Splatting\n\nTo reduce storage and computational costs, 3D Gaussian splatting (3DGS) seeks to minimize the number of Gaussians used while preserving high rendering quality, introducing an inherent trade-off between Gaussian quantity and rendering quality. Existing methods strive for better quantity-quality performance, but lack the ability for users to intuitively adjust this trade-off to suit practical needs such as model deployment under diverse hardware and communication constraints. Here, we present ControlGS, a 3DGS optimization method that achieves semantically meaningful and cross-scene consistent quantity-quality control. Through a single training run using a fixed setup and a user-specified hyperparameter reflecting quantity-quality preference, ControlGS can automatically find desirable quantity-quality trade-off points across diverse scenes, from compact objects to large outdoor scenes. It also outperforms baselines by achieving higher rendering quality with fewer Gaussians, and supports a broad adjustment range with stepless control over the trade-off.\n\n为了降低存储和计算成本，3D Gaussian Splatting（3DGS）致力于在保持高渲染质量的同时，最小化所使用的高斯数量，这引入了高斯数量与渲染质量之间的内在权衡。现有方法虽然力求在数量-质量性能上取得更优表现，但缺乏让用户能够直观调节这一权衡以满足实际需求（例如在不同硬件和通信限制下部署模型）的能力。\n为此，我们提出 ControlGS，这是一种实现语义上有意义且跨场景一致的数量-质量控制的 3DGS 优化方法。通过一次训练过程，使用固定的配置和用户指定的反映数量-质量偏好的超参数，ControlGS 能够自动在从紧凑物体到大尺度户外场景的多种场景中，找到理想的数量-质量权衡点。与基线方法相比，ControlGS 在使用更少高斯的同时实现更高的渲染质量，并支持宽范围、无级连续的权衡调节。\n"
  },
  {
    "path": "abs/2505.10578.md",
    "content": "### ExploreGS: a vision-based low overhead framework for 3D scene reconstruction\n\nThis paper proposes a low-overhead, vision-based 3D scene reconstruction framework for drones, named ExploreGS. By using RGB images, ExploreGS replaces traditional lidar-based point cloud acquisition process with a vision model, achieving a high-quality reconstruction at a lower cost. The framework integrates scene exploration and model reconstruction, and leverags a Bag-of-Words(BoW) model to enable real-time processing capabilities, therefore, the 3D Gaussian Splatting (3DGS) training can be executed on-board. Comprehensive experiments in both simulation and real-world environments demonstrate the efficiency and applicability of the ExploreGS framework on resource-constrained devices, while maintaining reconstruction quality comparable to state-of-the-art methods.\n\n本文提出了一种面向无人机的低开销视觉驱动三维场景重建框架，称为 ExploreGS。该方法利用 RGB 图像替代传统基于激光雷达的点云获取流程，通过视觉模型实现低成本下的高质量重建。该框架融合了场景探索与模型重建过程，并引入词袋模型（Bag-of-Words，BoW）以实现实时处理能力，从而使得 3D Gaussian Splatting（3DGS）的训练能够在无人机本地完成。\n在仿真和真实环境中的大量实验表明，ExploreGS 框架在资源受限设备上的效率和实用性兼具，同时重建质量可与当前最先进的方法相媲美。\n"
  },
  {
    "path": "abs/2505.10685.md",
    "content": "### GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention\n\n3D semantic occupancy prediction is critical for achieving safe and reliable autonomous driving. Compared to camera-only perception systems, multi-modal pipelines, especially LiDAR-camera fusion methods, can produce more accurate and detailed predictions. Although most existing works utilize a dense grid-based representation, in which the entire 3D space is uniformly divided into discrete voxels, the emergence of 3D Gaussians provides a compact and continuous object-centric representation. In this work, we propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention, named as GaussianFormer3D. We introduce a voxel-to-Gaussian initialization strategy to provide 3D Gaussians with geometry priors from LiDAR data, and design a LiDAR-guided 3D deformable attention mechanism for refining 3D Gaussians with LiDAR-camera fusion features in a lifted 3D space. We conducted extensive experiments on both on-road and off-road datasets, demonstrating that our GaussianFormer3D achieves high prediction accuracy that is comparable to state-of-the-art multi-modal fusion-based methods with reduced memory consumption and improved efficiency.\n\n三维语义占据预测对于实现安全可靠的自动驾驶至关重要。相比于仅使用摄像头的感知系统，多模态管线，特别是激光雷达-摄像头融合方法，能够生成更准确且更具细节的预测结果。尽管现有大多数方法采用稠密的网格表示方式，将整个三维空间均匀划分为离散体素，但随着三维高斯的出现，提供了一种紧凑、连续、以目标为中心的表示方式。\n在本研究中，我们提出了一种基于多模态高斯表示的语义占据预测框架，命名为 GaussianFormer3D，该框架结合了三维可变形注意力机制。我们引入了一种从体素到高斯的初始化策略，使三维高斯能够从激光雷达数据中获得几何先验信息；并设计了一种激光雷达引导的三维可变形注意力机制，用于在提升至三维空间后融合激光雷达与摄像头的特征，从而精细化高斯表示。\n我们在道路场景和越野场景的多个数据集上进行了大量实验，结果表明 GaussianFormer3D 在保持与现有最先进多模态融合方法相当的预测精度的同时，显著降低了内存占用，并提升了整体效率。\n"
  },
  {
    "path": "abs/2505.10787.md",
    "content": "### EA-3DGS: Efficient and Adaptive 3D Gaussians with Highly Enhanced Quality for outdoor scenes\n\nEfficient scene representations are essential for many real-world applications, especially those involving spatial measurement. Although current NeRF-based methods have achieved impressive results in reconstructing building-scale scenes, they still suffer from slow training and inference speeds due to time-consuming stochastic sampling. Recently, 3D Gaussian Splatting (3DGS) has demonstrated excellent performance with its high-quality rendering and real-time speed, especially for objects and small-scale scenes. However, in outdoor scenes, its point-based explicit representation lacks an effective adjustment mechanism, and the millions of Gaussian points required often lead to memory constraints during training. To address these challenges, we propose EA-3DGS, a high-quality real-time rendering method designed for outdoor scenes. First, we introduce a mesh structure to regulate the initialization of Gaussian components by leveraging an adaptive tetrahedral mesh that partitions the grid and initializes Gaussian components on each face, effectively capturing geometric structures in low-texture regions. Second, we propose an efficient Gaussian pruning strategy that evaluates each 3D Gaussian's contribution to the view and prunes accordingly. To retain geometry-critical Gaussian points, we also present a structure-aware densification strategy that densifies Gaussian points in low-curvature regions. Additionally, we employ vector quantization for parameter quantization of Gaussian components, significantly reducing disk space requirements with only a minimal impact on rendering quality. Extensive experiments on 13 scenes, including eight from four public datasets (MatrixCity-Aerial, Mill-19, Tanks & Temples, WHU) and five self-collected scenes acquired through UAV photogrammetry measurement from SCUT-CA and plateau regions, further demonstrate the superiority of our method.\n\n高效的场景表示对于许多现实世界应用至关重要，尤其是在涉及空间测量的场景中。尽管当前基于 NeRF 的方法在重建建筑尺度的场景方面已取得了显著成果，但由于其依赖耗时的随机采样，在训练和推理速度方面仍然存在瓶颈。近年来，3D Gaussian Splatting（3DGS）凭借高质量渲染和实时性能，在物体及小尺度场景中表现出色。然而，在户外场景中，3DGS 的点云式显式表示缺乏有效的调控机制，且训练过程中所需的数百万高斯点常常造成内存压力。\n为解决上述挑战，我们提出了 EA-3DGS，一种专为户外场景设计的高质量实时渲染方法。首先，我们引入网格结构以引导高斯组件的初始化，采用自适应四面体网格对空间进行划分，并在每个面上初始化高斯组件，从而能够有效捕捉低纹理区域的几何结构。其次，我们提出了一种高效的高斯剪枝策略，通过评估每个三维高斯点对视角的贡献进行裁剪。同时，为保留对几何结构至关重要的高斯点，我们还设计了结构感知的密度增强策略，用于在低曲率区域对高斯点进行增强。此外，我们采用向量量化技术对高斯参数进行压缩，在几乎不影响渲染质量的前提下显著降低磁盘存储需求。\n我们在 13 个场景上进行了大量实验，包括来自四个公开数据集（MatrixCity-Aerial、Mill-19、Tanks & Temples、WHU）的 8 个场景，以及 5 个通过 UAV 航拍测绘获取的自采样场景（来源于 SCUT-CA 与高原地区）。实验结果进一步验证了我们方法在渲染质量、速度和资源占用方面的优越性。\n"
  },
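EA-3DGS's parameter-quantization step can be illustrated with plain k-means vector quantization: store one small shared codebook plus a compact index per Gaussian instead of full float vectors. A sketch under that assumption; the codebook size and which attribute groups get quantized are the paper's design choices and are placeholders here.

```python
import numpy as np
from sklearn.cluster import KMeans

def vector_quantize(params, k=256, seed=0):
    """params: (N, D) per-Gaussian attribute vectors (e.g., SH coefficients).
    Returns a (k, D) codebook and an (N,) index array; decoding is simply
    codebook[indices], trading a little fidelity for much smaller storage."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(params)
    dtype = np.uint8 if k <= 256 else np.uint16
    return km.cluster_centers_, km.labels_.astype(dtype)
```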
  {
    "path": "abs/2505.10923.md",
    "content": "### GrowSplat: Constructing Temporal Digital Twins of Plants with Gaussian Splats\n\nAccurate temporal reconstructions of plant growth are essential for plant phenotyping and breeding, yet remain challenging due to complex geometries, occlusions, and non-rigid deformations of plants. We present a novel framework for building temporal digital twins of plants by combining 3D Gaussian Splatting with a robust sample alignment pipeline. Our method begins by reconstructing Gaussian Splats from multi-view camera data, then leverages a two-stage registration approach: coarse alignment through feature-based matching and Fast Global Registration, followed by fine alignment with Iterative Closest Point. This pipeline yields a consistent 4D model of plant development in discrete time steps. We evaluate the approach on data from the Netherlands Plant Eco-phenotyping Center, demonstrating detailed temporal reconstructions of Sequoia and Quinoa species.\n\n植物生长过程的精确时序重建对于植物表型分析与育种研究至关重要，但由于植物形态结构复杂、存在遮挡以及非刚性形变，使得该任务仍充满挑战。本文提出了一种新颖框架，结合 3D Gaussian Splatting 与稳健的样本对齐流程，用于构建植物的时序数字孪生体。\n该方法首先通过多视角摄像头数据重建高斯点云（Gaussian Splats），随后采用两阶段配准策略进行对齐：第一阶段利用基于特征的匹配与快速全局配准（Fast Global Registration）实现粗配准；第二阶段通过迭代最近点（Iterative Closest Point, ICP）进行精细配准。该流程最终生成一致的、以离散时间步表示的植物 4D 生长模型。\n我们在来自荷兰植物生态表型中心（Netherlands Plant Eco-phenotyping Center）的数据上进行了验证，成功实现了 Sequoia（红杉）和 Quinoa（藜麦）等植物物种的高精度时序重建。\n"
  },
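GrowSplat's two-stage alignment maps directly onto standard Open3D registration calls: FPFH features feed Fast Global Registration for the coarse stage, and point-to-point ICP refines the result. A sketch treating Gaussian centers as an ordinary point cloud; the voxel size and search radii are illustrative tuning values, not the paper's settings.

```python
import open3d as o3d

def align_timesteps(source, target, voxel=0.01):
    """Coarse-to-fine alignment of two plant reconstructions captured at
    different times; returns the 4x4 transform taking source onto target."""
    def preprocess(pcd):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=5 * voxel, max_nn=30))
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(radius=10 * voxel, max_nn=100))
        return down, fpfh

    src, src_f = preprocess(source)
    tgt, tgt_f = preprocess(target)
    coarse = o3d.pipelines.registration.registration_fgr_based_on_feature_matching(
        src, tgt, src_f, tgt_f,
        o3d.pipelines.registration.FastGlobalRegistrationOption(
            maximum_correspondence_distance=3 * voxel))
    fine = o3d.pipelines.registration.registration_icp(
        source, target, voxel, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return fine.transformation
```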
  {
    "path": "abs/2505.11868.md",
    "content": "### MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos\n\nAccurately analyzing the motion parts and their motion attributes in dynamic environments is crucial for advancing key areas such as embodied intelligence. Addressing the limitations of existing methods that rely on dense multi-view images or detailed part-level annotations, we propose an innovative framework that can analyze 3D mobility from monocular videos in a zero-shot manner. This framework can precisely parse motion parts and motion attributes only using a monocular video, completely eliminating the need for annotated training data. Specifically, our method first constructs the scene geometry and roughly analyzes the motion parts and their initial motion attributes combining depth estimation, optical flow analysis and point cloud registration method, then employs 2D Gaussian splatting for scene representation. Building on this, we introduce an end-to-end dynamic scene optimization algorithm specifically designed for articulated objects, refining the initial analysis results to ensure the system can handle 'rotation', 'translation', and even complex movements ('rotation+translation'), demonstrating high flexibility and versatility. To validate the robustness and wide applicability of our method, we created a comprehensive dataset comprising both simulated and real-world scenarios. Experimental results show that our framework can effectively analyze articulated object motions in an annotation-free manner, showcasing its significant potential in future embodied intelligence applications.\n\n在动态环境中对运动部件及其运动属性进行精确分析，对于推动具身智能等关键领域的发展至关重要。针对现有方法依赖密集多视角图像或精细零件级标注的局限性，本文提出了一种创新性框架，能够以零样本的方式从单目视频中解析三维运动性（3D mobility）。该框架仅依赖单目视频即可精准解析运动部件及其运动属性，完全不依赖任何带注释的训练数据。\n具体而言，我们的方法首先通过深度估计、光流分析和点云配准等技术构建场景几何结构，并粗略分析运动部件及其初始运动属性，随后使用二维高斯投影（2D Gaussian Splatting）进行场景表示。在此基础上，我们提出了一种专为关节型物体设计的端到端动态场景优化算法，用于进一步优化初始分析结果，使系统能够处理“旋转”、“平移”乃至“旋转+平移”等复杂运动，展现出高度的灵活性与通用性。\n为验证方法的鲁棒性与广泛适用性，我们构建了一个涵盖模拟与真实场景的综合数据集。实验结果表明，该框架能够在无需人工标注的条件下有效解析关节型物体的运动特性，展现出在未来具身智能应用中的巨大潜力。\n"
  },
  {
    "path": "abs/2505.11905.md",
    "content": "### GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity\n\nWe present a novel method for 6-DoF object tracking and high-quality 3D reconstruction from monocular RGBD video. Existing methods, while achieving impressive results, often struggle with complex objects, particularly those exhibiting symmetry, intricate geometry or complex appearance. To bridge these gaps, we introduce an adaptive method that combines 3D Gaussian Splatting, hybrid geometry/appearance tracking, and key frame selection to achieve robust tracking and accurate reconstructions across a diverse range of objects. Additionally, we present a benchmark covering these challenging object classes, providing high-quality annotations for evaluating both tracking and reconstruction performance. Our approach demonstrates strong capabilities in recovering high-fidelity object meshes, setting a new standard for single-sensor 3D reconstruction in open-world environments.\n\n我们提出了一种新颖的方法，用于从单目 RGBD 视频中实现 6 自由度（6-DoF）物体跟踪与高质量三维重建。尽管现有方法已取得令人瞩目的成果，但在处理具有对称性、几何结构复杂或外观细节丰富的物体时仍面临困难。为弥补这一差距，我们提出了一种自适应方法，结合了 3D Gaussian Splatting、几何与外观混合跟踪机制以及关键帧选择策略，实现了对多种复杂物体的鲁棒跟踪与精确重建。\n此外，我们还构建了一个覆盖上述挑战性物体类别的基准数据集，提供高质量标注，用于评估跟踪与重建性能。实验结果表明，我们的方法在高保真物体网格恢复方面表现出色，为单传感器三维重建在开放世界环境中的应用树立了新标杆。\n"
  },
  {
    "path": "abs/2505.11934.md",
    "content": "### iSegMan: Interactive Segment-and-Manipulate 3D Gaussians\n\nThe efficient rendering and explicit nature of 3DGS promote the advancement of 3D scene manipulation. However, existing methods typically encounter challenges in controlling the manipulation region and are unable to furnish the user with interactive feedback, which inevitably leads to unexpected results. Intuitively, incorporating interactive 3D segmentation tools can compensate for this deficiency. Nevertheless, existing segmentation frameworks impose a pre-processing step of scene-specific parameter training, which limits the efficiency and flexibility of scene manipulation. To deliver a 3D region control module that is well-suited for scene manipulation with reliable efficiency, we propose interactive Segment-and-Manipulate 3D Gaussians (iSegMan), an interactive segmentation and manipulation framework that only requires simple 2D user interactions in any view. To propagate user interactions to other views, we propose Epipolar-guided Interaction Propagation (EIP), which innovatively exploits epipolar constraint for efficient and robust interaction matching. To avoid scene-specific training to maintain efficiency, we further propose the novel Visibility-based Gaussian Voting (VGV), which obtains 2D segmentations from SAM and models the region extraction as a voting game between 2D Pixels and 3D Gaussians based on Gaussian visibility. Taking advantage of the efficient and precise region control of EIP and VGV, we put forth a Manipulation Toolbox to implement various functions on selected regions, enhancing the controllability, flexibility and practicality of scene manipulation. Extensive results on 3D scene manipulation and segmentation tasks fully demonstrate the significant advantages of iSegMan.\n\n3D Gaussian Splatting（3DGS）因其高效的渲染能力和显式表示形式，推动了三维场景操控的发展。然而，现有方法通常难以精确控制操控区域，且无法为用户提供交互式反馈，进而容易产生预期之外的结果。直观地，引入交互式三维分割工具有望弥补这一不足。然而，现有分割框架通常需要针对特定场景进行参数预训练，限制了场景操控的效率与灵活性。\n为此，我们提出了 iSegMan（interactive Segment-and-Manipulate 3D Gaussians），一个交互式分割与操控框架，仅需用户在任意视角进行简单的二维交互即可完成操作。为将用户交互传播至其他视角，我们引入了极线引导交互传播（Epipolar-guided Interaction Propagation, EIP），创新性地利用极线约束实现高效且鲁棒的交互匹配。为避免因场景特定训练导致效率下降，我们进一步提出了基于可见性的高斯投票机制（Visibility-based Gaussian Voting, VGV），该机制基于 SAM 得到的二维分割结果，将区域提取建模为二维像素与三维高斯之间基于可见性的投票博弈过程。\n借助 EIP 与 VGV 所实现的高效精准区域控制能力，我们构建了一个操控工具箱（Manipulation Toolbox），支持在选定区域上执行多种操作，显著提升了三维场景操控的可控性、灵活性与实用性。大量三维场景分割与操控任务的实验结果充分验证了 iSegMan 的显著优势。\n"
  },
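The geometric core of iSegMan's EIP is the epipolar constraint: a click x in one view restricts its correspondence in another view to the line l' = F x. A sketch of just that constraint; how iSegMan scores and matches candidates along the line is its own contribution and is not shown.

```python
import numpy as np

def epipolar_line(F, click_xy):
    """F: (3, 3) fundamental matrix from view A to view B.
    Returns (a, b, c), normalized so that a*u + b*v + c is the signed pixel
    distance of point (u, v) in view B from the click's epipolar line."""
    line = F @ np.array([click_xy[0], click_xy[1], 1.0])
    return line / np.linalg.norm(line[:2])

def on_epipolar_band(line, pts_xy, band_px=2.0):
    """Keep only candidate matches within a few pixels of the epipolar line,
    which is the cheap geometric filter that makes propagation robust."""
    dist = np.abs(pts_xy @ line[:2] + line[2])
    return pts_xy[dist <= band_px]
```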
  {
    "path": "abs/2505.11992.md",
    "content": "### SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations\n\nNovel view synthesis (NVS) boosts immersive experiences in computer vision and graphics. Existing techniques, though progressed, rely on dense multi-view observations, restricting their application. This work takes on the challenge of reconstructing photorealistic 3D scenes from sparse or single-view inputs. We introduce SpatialCrafter, a framework that leverages the rich knowledge in video diffusion models to generate plausible additional observations, thereby alleviating reconstruction ambiguity. Through a trainable camera encoder and an epipolar attention mechanism for explicit geometric constraints, we achieve precise camera control and 3D consistency, further reinforced by a unified scale estimation strategy to handle scale discrepancies across datasets. Furthermore, by integrating monocular depth priors with semantic features in the video latent space, our framework directly regresses 3D Gaussian primitives and efficiently processes long-sequence features using a hybrid network structure. Extensive experiments show our method enhances sparse view reconstruction and restores the realistic appearance of 3D scenes.\n\n新视角合成（Novel View Synthesis, NVS）在计算机视觉与图形学中极大提升了沉浸式体验。尽管现有技术已取得显著进展，但普遍依赖密集多视角观测，限制了其在实际应用中的可用性。本文针对从稀疏甚至单视角输入中重建真实感三维场景的挑战，提出了 SpatialCrafter 框架。\n该框架通过利用视频扩散模型中蕴含的丰富先验知识，生成合理的补充观测，从而缓解重建歧义问题。我们设计了一个可训练的摄像机编码器以及带有显式几何约束的极线注意力机制（epipolar attention mechanism），实现了精确的摄像机控制与三维一致性，并引入统一的尺度估计策略以应对不同数据集间的尺度差异。\n此外，SpatialCrafter 将单目深度先验与语义特征融合到视频潜空间中，直接回归生成三维高斯图元（3D Gaussian primitives），并通过混合神经网络结构高效处理长序列特征。大量实验表明，该方法在稀疏视角重建任务中表现优异，能够有效还原三维场景的真实外观。\n"
  },
  {
    "path": "abs/2505.12693.md",
    "content": "### TACOcc:Target-Adaptive Cross-Modal Fusion with Volume Rendering for 3D Semantic Occupancy\n\nThe performance of multi-modal 3D occupancy prediction is limited by ineffective fusion, mainly due to geometry-semantics mismatch from fixed fusion strategies and surface detail loss caused by sparse, noisy annotations. The mismatch stems from the heterogeneous scale and distribution of point cloud and image features, leading to biased matching under fixed neighborhood fusion. To address this, we propose a target-scale adaptive, bidirectional symmetric retrieval mechanism. It expands the neighborhood for large targets to enhance context awareness and shrinks it for small ones to improve efficiency and suppress noise, enabling accurate cross-modal feature alignment. This mechanism explicitly establishes spatial correspondences and improves fusion accuracy. For surface detail loss, sparse labels provide limited supervision, resulting in poor predictions for small objects. We introduce an improved volume rendering pipeline based on 3D Gaussian Splatting, which takes fused features as input to render images, applies photometric consistency supervision, and jointly optimizes 2D-3D consistency. This enhances surface detail reconstruction while suppressing noise propagation. In summary, we propose TACOcc, an adaptive multi-modal fusion framework for 3D semantic occupancy prediction, enhanced by volume rendering supervision. Experiments on the nuScenes and SemanticKITTI benchmarks validate its effectiveness.\n\n多模态三维占据预测的性能受限于融合效果不佳，主要原因在于固定融合策略所导致的几何与语义不匹配，以及由于标注稀疏和噪声而引起的表面细节丢失。几何与语义的不匹配源于点云与图像特征在尺度与分布上的异构性，导致在固定邻域融合下匹配结果存在偏差。\n为解决上述问题，我们提出了一种目标尺度自适应的双向对称检索机制。该机制对于大目标扩展邻域以增强上下文感知，对于小目标则收缩邻域以提升效率并抑制噪声，从而实现跨模态特征的精准对齐。该机制显式建立空间对应关系，显著提升了融合精度。\n针对表面细节丢失的问题，由于稀疏标签提供的监督有限，导致对小目标的预测效果较差。为此，我们引入了一种改进的体渲染流程，基于 3D Gaussian Splatting，将融合后的特征作为输入进行图像渲染，利用光度一致性监督，实现对二维-三维一致性的联合优化。该机制在增强表面细节重建的同时，有效抑制了噪声传播。\n综上，我们提出了 TACOcc，一个结合体渲染监督的自适应多模态融合框架，用于三维语义占据预测。我们在 nuScenes 和 SemanticKITTI 基准数据集上进行了实验，结果验证了该方法的有效性。\n"
  },
  {
    "path": "abs/2505.12875.md",
    "content": "### 3D Gaussian Adaptive Reconstruction for Fourier Light-Field Microscopy\n\nCompared to light-field microscopy (LFM), which enables high-speed volumetric imaging but suffers from non-uniform spatial sampling, Fourier light-field microscopy (FLFM) introduces sub-aperture division at the pupil plane, thereby ensuring spatially invariant sampling and enhancing spatial resolution. Conventional FLFM reconstruction methods, such as Richardson-Lucy (RL) deconvolution, exhibit poor axial resolution and signal degradation due to the ill-posed nature of the inverse problem. While data-driven approaches enhance spatial resolution by leveraging high-quality paired datasets or imposing structural priors, Neural Radiance Fields (NeRF)-based methods employ physics-informed self-supervised learning to overcome these limitations, yet they are hindered by substantial computational costs and memory demands. Therefore, we propose 3D Gaussian Adaptive Tomography (3DGAT) for FLFM, a 3D gaussian splatting based self-supervised learning framework that significantly improves the volumetric reconstruction quality of FLFM while maintaining computational efficiency. Experimental results indicate that our approach achieves higher resolution and improved reconstruction accuracy, highlighting its potential to advance FLFM imaging and broaden its applications in 3D optical microscopy.\n\n与实现高速体积成像但存在空间采样不均问题的光场显微镜（Light-Field Microscopy, LFM）相比，傅里叶光场显微镜（Fourier Light-Field Microscopy, FLFM）通过在瞳面引入子孔径划分，实现了空间不变采样，从而提升了空间分辨率。然而，传统的 FLFM 重建方法，如 Richardson-Lucy（RL）去卷积，因逆问题的病态性，往往存在较差的轴向分辨率和信号退化问题。尽管数据驱动的方法通过高质量配对数据集或结构先验可提升空间分辨率，Neural Radiance Fields（NeRF）类方法则利用物理约束的自监督学习克服上述限制，但其计算与内存开销极大，制约了实际应用。为此，我们提出了一种适用于 FLFM 的自监督学习框架——3D Gaussian Adaptive Tomography（3DGAT）。该方法基于 3D Gaussian Splatting，在显著提升 FLFM 体积重建质量的同时保持了较高的计算效率。实验结果表明，我们的方法在重建分辨率与准确性方面均优于现有技术，展示出推动 FLFM 成像技术发展并拓展其在三维光学显微领域应用潜力的能力。\n"
  },
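For reference, the Richardson-Lucy baseline that 3DGAT improves upon is a short multiplicative fixed-point iteration. A generic sketch with the forward and adjoint projectors passed in as callables; the FLFM-specific PSF operators are assumed to be supplied by the caller.

```python
import numpy as np

def richardson_lucy(y, forward, adjoint, n_iter=30, eps=1e-8):
    """y: measured sub-aperture image stack; forward/adjoint: callables
    implementing the system matrix H and its transpose H^T.
    Classic RL update: x <- x * H^T(y / (H x)) / H^T(1)."""
    x = np.ones_like(adjoint(y))            # flat nonnegative initialization
    norm = adjoint(np.ones_like(y)) + eps   # H^T 1, a fixed normalizer
    for _ in range(n_iter):
        x = x * adjoint(y / (forward(x) + eps)) / norm
    return x
```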
  {
    "path": "abs/2505.13215.md",
    "content": "### Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation\n\nRecent advancements in dynamic 3D scene reconstruction have shown promising results, enabling high-fidelity 3D novel view synthesis with improved temporal consistency. Among these, 4D Gaussian Splatting (4DGS) has emerged as an appealing approach due to its ability to model high-fidelity spatial and temporal variations. However, existing methods suffer from substantial computational and memory overhead due to the redundant allocation of 4D Gaussians to static regions, which can also degrade image quality. In this work, we introduce hybrid 3D-4D Gaussian Splatting (3D-4DGS), a novel framework that adaptively represents static regions with 3D Gaussians while reserving 4D Gaussians for dynamic elements. Our method begins with a fully 4D Gaussian representation and iteratively converts temporally invariant Gaussians into 3D, significantly reducing the number of parameters and improving computational efficiency. Meanwhile, dynamic Gaussians retain their full 4D representation, capturing complex motions with high fidelity. Our approach achieves significantly faster training times compared to baseline 4D Gaussian Splatting methods while maintaining or improving the visual quality.\n\n近年来，动态三维场景重建取得了显著进展，使得具有更高时间一致性的新视角三维图像合成成为可能。其中，4D Gaussian Splatting（4DGS） 凭借其对时空变化的高保真建模能力，成为一个颇具吸引力的方案。然而，现有方法普遍存在计算与内存开销巨大的问题，主要由于对静态区域不必要地分配了冗余的 4D 高斯，从而也可能导致图像质量下降。\n为此，本文提出了一种新框架：混合 3D-4D Gaussian Splatting（3D-4DGS）。该方法自适应地使用 3D 高斯表示静态区域，仅将 4D 高斯保留给动态部分。我们的方法以全 4D 高斯初始化表示整个场景，并通过迭代过程将时间上不变的高斯转换为 3D 表达，从而显著减少参数数量，提高计算效率。\n同时，动态区域中的高斯保持完整的 4D 表示，用于高保真地捕捉复杂运动。实验表明，相较于传统 4DGS 方法，我们的框架在保持或提升视觉质量的同时，大幅加快了训练速度。\n"
  },
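The 3D-4DGS conversion step can be pictured as a temporal-variance test over Gaussian trajectories: primitives whose parameters barely change across time are demoted to plain 3D Gaussians. The paper's actual criterion may differ; this sketch shows the static/dynamic split idea only, on center positions.

```python
import numpy as np

def static_mask(means_t, thresh=1e-6):
    """means_t: (T, N, 3) Gaussian centers sampled over T timesteps.
    Returns an (N,) bool mask: True = temporally invariant, so the
    primitive can be stored and rendered as an ordinary 3D Gaussian."""
    motion_energy = means_t.var(axis=0).sum(axis=-1)  # per-Gaussian variance
    return motion_energy < thresh
```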
  {
    "path": "abs/2505.13440.md",
    "content": "### Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos\n\nCurrently almost all state-of-the-art novel view synthesis and reconstruction models rely on calibrated cameras or additional geometric priors for training. These prerequisites significantly limit their applicability to massive uncalibrated data. To alleviate this requirement and unlock the potential for self-supervised training on large-scale uncalibrated videos, we propose a novel two-stage strategy to train a view synthesis model from only raw video frames or multi-view images, without providing camera parameters or other priors. In the first stage, we learn to reconstruct the scene implicitly in a latent space without relying on any explicit 3D representation. Specifically, we predict per-frame latent camera and scene context features, and employ a view synthesis model as a proxy for explicit rendering. This pretraining stage substantially reduces the optimization complexity and encourages the network to learn the underlying 3D consistency in a self-supervised manner. The learned latent camera and implicit scene representation have a large gap compared with the real 3D world. To reduce this gap, we introduce the second stage training by explicitly predicting 3D Gaussian primitives. We additionally apply explicit Gaussian Splatting rendering loss and depth projection loss to align the learned latent representations with physically grounded 3D geometry. In this way, Stage 1 provides a strong initialization and Stage 2 enforces 3D consistency - the two stages are complementary and mutually beneficial. Extensive experiments demonstrate the effectiveness of our approach, achieving high-quality novel view synthesis and accurate camera pose estimation, compared to methods that employ supervision with calibration, pose, or depth information.\n\n目前，几乎所有最先进的新视角合成与重建模型在训练过程中都依赖于经过标定的相机参数或额外的几何先验。这些前提条件极大限制了其在大规模未标定数据上的应用潜力。为突破这一限制，释放在大规模未标定视频上进行自监督训练的能力，本文提出了一种仅基于原始视频帧或多视图图像、无需提供相机参数或其他先验信息的全新两阶段训练策略。\n在第一阶段，我们在潜空间中对场景进行隐式重建，完全不依赖任何显式的三维表示。具体而言，我们为每一帧预测潜在的相机特征和场景上下文特征，并使用视图合成模型作为显式渲染的代理。该预训练阶段显著降低了优化复杂度，同时引导网络以自监督方式学习潜在的三维一致性。\n然而，所学习的潜在相机表示与隐式场景表示与真实三维世界之间仍存在显著差距。为缩小这一差距，我们引入第二阶段训练，显式预测三维高斯图元，并进一步引入高斯投影渲染损失与深度投影损失，将第一阶段中学到的潜在表示与真实物理三维几何对齐。第一阶段提供了强有力的初始化，第二阶段则强化了三维一致性——两者相辅相成，互为促进。\n大量实验验证了我们方法的有效性，在无需标定、位姿或深度监督的前提下，依然能够实现高质量的新视角合成与精确的相机姿态估计，优于多种依赖监督的现有方法。\n\n"
  },
  {
    "path": "abs/2505.13839.md",
    "content": "### MGStream: Motion-aware 3D Gaussian for Streamable Dynamic Scene Reconstruction\n\n3D Gaussian Splatting (3DGS) has gained significant attention in streamable dynamic novel view synthesis (DNVS) for its photorealistic rendering capability and computational efficiency. Despite much progress in improving rendering quality and optimization strategies, 3DGS-based streamable dynamic scene reconstruction still suffers from flickering artifacts and storage inefficiency, and struggles to model the emerging objects. To tackle this, we introduce MGStream which employs the motion-related 3D Gaussians (3DGs) to reconstruct the dynamic and the vanilla 3DGs for the static. The motion-related 3DGs are implemented according to the motion mask and the clustering-based convex hull algorithm. The rigid deformation is applied to the motion-related 3DGs for modeling the dynamic, and the attention-based optimization on the motion-related 3DGs enables the reconstruction of the emerging objects. As the deformation and optimization are only conducted on the motion-related 3DGs, MGStream avoids flickering artifacts and improves the storage efficiency. Extensive experiments on real-world datasets N3DV and MeetRoom demonstrate that MGStream surpasses existing streaming 3DGS-based approaches in terms of rendering quality, training/storage efficiency and temporal consistency.\n\n3D Gaussian Splatting（3DGS）因其真实感渲染能力与计算效率，在可流式动态新视角合成（Dynamic Novel View Synthesis, DNVS）领域受到广泛关注。尽管在渲染质量与优化策略方面已有诸多进展，基于 3DGS 的动态场景流式重建仍面临闪烁伪影、存储效率低以及难以建模新出现物体等问题。\n为解决上述挑战，本文提出 MGStream，通过引入运动相关 3D 高斯（motion-related 3DGs）与静态 3D 高斯（vanilla 3DGs）实现对动态与静态区域的差异化建模。运动相关 3DGs 是基于运动掩码与基于聚类的凸包算法构建的。对于这些动态区域，高斯点施加刚性变形以建模运动，同时通过基于注意力机制的优化方法对运动相关 3DGs 进行调优，使系统能够有效重建新出现的物体。\n由于变形与优化仅在运动相关高斯点上进行，MGStream 能够显著减少闪烁伪影，同时提升存储效率。在 N3DV 和 MeetRoom 等真实世界数据集上的大量实验证明，MGStream 在渲染质量、训练与存储效率以及时间一致性方面均优于现有的流式 3DGS 方法。\n"
  },
  {
    "path": "abs/2505.14537.md",
    "content": "### Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image\n\nPersonalizing 3D scenes from a single reference image enables intuitive user-guided editing, which requires achieving both multi-view consistency across perspectives and referential consistency with the input image. However, these goals are particularly challenging due to the viewpoint bias caused by the limited perspective provided in a single image. Lacking the mechanisms to effectively expand reference information beyond the original view, existing methods of image-conditioned 3DGS personalization often suffer from this viewpoint bias and struggle to produce consistent results. Therefore, in this paper, we present Consistent Personalization for 3D Gaussian Splatting (CP-GS), a framework that progressively propagates the single-view reference appearance to novel perspectives. In particular, CP-GS integrates pre-trained image-to-3D generation and iterative LoRA fine-tuning to extract and extend the reference appearance, and finally produces faithful multi-view guidance images and the personalized 3DGS outputs through a view-consistent generation process guided by geometric cues. Extensive experiments on real-world scenes show that our CP-GS effectively mitigates the viewpoint bias, achieving high-quality personalization that significantly outperforms existing methods.\n\n从单张参考图像出发实现个性化的三维场景生成，使用户能够以直观方式进行引导式编辑。这一任务要求同时满足多视角一致性与与输入图像的参考一致性，但由于单张图像所提供视角的局限性，易产生视角偏置，使得上述目标尤具挑战性。现有基于图像条件的 3D Gaussian Splatting（3DGS）个性化方法缺乏有效机制来拓展原始视角以外的参考信息，因而普遍受到视角偏置的影响，难以生成一致性良好的结果。\n为解决该问题，本文提出了 CP-GS（Consistent Personalization for 3D Gaussian Splatting），一个可将单视图参考外观逐步传播到新视角的个性化生成框架。具体而言，CP-GS 融合预训练的图像到三维生成模型与迭代式的 LoRA 微调方法，用于提取并扩展参考外观信息，最终在几何线索引导下，生成视角一致的引导图像与个性化的 3DGS 输出。\n在多个真实场景上的实验结果表明，CP-GS 能够有效缓解视角偏置，生成高质量的个性化结果，且在一致性和外观保真度方面显著优于现有方法。\n"
  },
  {
    "path": "abs/2505.14938.md",
    "content": "### Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning\n\nAutonomous robots must reason about the physical consequences of their actions to operate effectively in unstructured, real-world environments. We present Scan, Materialize, Simulate (SMS), a unified framework that combines 3D Gaussian Splatting for accurate scene reconstruction, visual foundation models for semantic segmentation, vision-language models for material property inference, and physics simulation for reliable prediction of action outcomes. By integrating these components, SMS enables generalizable physical reasoning and object-centric planning without the need to re-learn foundational physical dynamics. We empirically validate SMS in a billiards-inspired manipulation task and a challenging quadrotor landing scenario, demonstrating robust performance on both simulated domain transfer and real-world experiments. Our results highlight the potential of bridging differentiable rendering for scene reconstruction, foundation models for semantic understanding, and physics-based simulation to achieve physically grounded robot planning across diverse settings.\n\n自主机器人要在非结构化的真实环境中高效运行，必须能够推理其动作所带来的物理后果。我们提出了**Scan, Materialize, Simulate（SMS）**这一统一框架，该框架融合了用于精确场景重建的3D高斯投影（3D Gaussian Splatting）、用于语义分割的视觉基础模型、用于材质属性推断的视觉-语言模型，以及用于动作结果可靠预测的物理仿真模块。通过整合这些组件，SMS 实现了具备良好泛化能力的物理推理与面向物体的规划，而无需重新学习基础物理动力学。我们在一个受台球启发的操作任务和一个具有挑战性的四旋翼着陆场景中对 SMS 进行了实证验证，展现了其在模拟领域迁移与真实世界实验中的稳健性能。我们的结果突显了将可微渲染用于场景重建、将基础模型用于语义理解，以及将基于物理的仿真用于实现物理可解释机器人规划之间的协同潜力，从而在多样化环境中达成基于物理的机器人任务规划。\n"
  },
  {
    "path": "abs/2505.15185.md",
    "content": "### MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models\n\nRecent advances in generalizable 3D Gaussian Splatting have demonstrated promising results in real-time high-fidelity rendering without per-scene optimization, yet existing approaches still struggle to handle unfamiliar visual content during inference on novel scenes due to limited generalizability. To address this challenge, we introduce MonoSplat, a novel framework that leverages rich visual priors from pre-trained monocular depth foundation models for robust Gaussian reconstruction. Our approach consists of two key components: a Mono-Multi Feature Adapter that transforms monocular features into multi-view representations, coupled with an Integrated Gaussian Prediction module that effectively fuses both feature types for precise Gaussian generation. Through the Adapter's lightweight attention mechanism, features are seamlessly aligned and aggregated across views while preserving valuable monocular priors, enabling the Prediction module to generate Gaussian primitives with accurate geometry and appearance. Through extensive experiments on diverse real-world datasets, we convincingly demonstrate that MonoSplat achieves superior reconstruction quality and generalization capability compared to existing methods while maintaining computational efficiency with minimal trainable parameters.\n\n近期在具备泛化能力的3D高斯投影（3D Gaussian Splatting）方面的研究取得了显著进展，展示了无需针对单个场景进行优化即可实现实时高保真渲染的潜力。然而，现有方法在面对新颖场景中的陌生视觉内容时，仍因泛化能力有限而表现不佳。为应对这一挑战，我们提出了 MonoSplat，一个新颖的框架，利用预训练单目深度基础模型中丰富的视觉先验，实现鲁棒的高斯重建。\n我们的方法包含两个关键组件：一个Mono-Multi特征适配器，用于将单目特征转换为多视图表示；以及一个集成高斯预测模块，用于高效融合这两类特征，从而精确生成高斯基元。适配器中轻量的注意力机制使得不同视角之间的特征能够无缝对齐与聚合，同时保留关键的单目先验，使预测模块能够生成具备准确几何结构与外观的高斯基元。\n在多个真实世界数据集上的大量实验表明，MonoSplat 在保持计算效率和极少可训练参数的前提下，显著优于现有方法，在重建质量和泛化能力方面均表现出色。\n"
  },
  {
    "path": "abs/2505.15208.md",
    "content": "### GT^2-GS: Geometry-aware Texture Transfer for Gaussian Splatting\n\nTransferring 2D textures to 3D modalities is of great significance for improving the efficiency of multimedia content creation. Existing approaches have rarely focused on transferring image textures onto 3D representations. 3D style transfer methods are capable of transferring abstract artistic styles to 3D scenes. However, these methods often overlook the geometric information of the scene, which makes it challenging to achieve high-quality 3D texture transfer results. In this paper, we present GT^2-GS, a geometry-aware texture transfer framework for gaussian splitting. From the perspective of matching texture features with geometric information in rendered views, we identify the issue of insufficient texture features and propose a geometry-aware texture augmentation module to expand the texture feature set. Moreover, a geometry-consistent texture loss is proposed to optimize texture features into the scene representation. This loss function incorporates both camera pose and 3D geometric information of the scene, enabling controllable texture-oriented appearance editing. Finally, a geometry preservation strategy is introduced. By alternating between the texture transfer and geometry correction stages over multiple iterations, this strategy achieves a balance between learning texture features and preserving geometric integrity. Extensive experiments demonstrate the effectiveness and controllability of our method. Through geometric awareness, our approach achieves texture transfer results that better align with human visual perception.\n\n将二维纹理迁移至三维模态对于提升多媒体内容创作效率具有重要意义。然而，现有方法鲜有关注将图像纹理有效地迁移到三维表示上。虽然已有的三维风格迁移方法能够将抽象的艺术风格迁移至三维场景中，但这些方法往往忽略了场景中的几何信息，从而难以实现高质量的三维纹理迁移效果。本文提出了一种面向高斯投影的几何感知纹理迁移框架 GT²-GS。我们从在渲染视图中匹配纹理特征与几何信息的角度出发，发现现有方法存在纹理特征不足的问题。为此，我们设计了一个几何感知纹理增强模块，用于扩展纹理特征集合。此外，我们提出了一种几何一致性纹理损失函数，将纹理特征有效优化到场景表示中。该损失函数结合了相机位姿和场景的三维几何信息，从而实现可控的、面向纹理的外观编辑。最后，我们引入了一种几何保持策略，在多个迭代中交替执行纹理迁移与几何修正阶段，在学习纹理特征与保持几何结构完整性之间实现平衡。大量实验验证了本方法在纹理迁移效果与可控性方面的有效性。通过引入几何感知机制，我们的方法实现了更符合人类视觉感知的纹理迁移结果。\n"
  },
  {
    "path": "abs/2505.15235.md",
    "content": "### X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography\n\nComputed Tomography serves as an indispensable tool in clinical workflows, providing non-invasive visualization of internal anatomical structures. Existing CT reconstruction works are limited to small-capacity model architecture and inflexible volume representation. In this work, we present X-GRM (X-ray Gaussian Reconstruction Model), a large feedforward model for reconstructing 3D CT volumes from sparse-view 2D X-ray projections. X-GRM employs a scalable transformer-based architecture to encode sparse-view X-ray inputs, where tokens from different views are integrated efficiently. Then, these tokens are decoded into a novel volume representation, named Voxel-based Gaussian Splatting (VoxGS), which enables efficient CT volume extraction and differentiable X-ray rendering. This combination of a high-capacity model and flexible volume representation, empowers our model to produce high-quality reconstructions from various testing inputs, including in-domain and out-domain X-ray projections.\n\n计算机断层扫描（Computed Tomography, CT）是临床工作流程中不可或缺的工具，可用于无创可视化内部解剖结构。然而，现有的CT重建方法通常受限于模型容量较小以及体积表示方式缺乏灵活性。在本研究中，我们提出了 X-GRM（X-ray Gaussian Reconstruction Model），一个用于从稀疏视角二维X射线投影重建三维CT体积的大型前馈模型。X-GRM 采用可扩展的基于 Transformer 的架构对稀疏视角的 X 射线输入进行编码，能够高效整合来自不同视角的 token。随后，这些 token 被解码为一种新颖的体积表示方式，称为基于体素的高斯投影（Voxel-based Gaussian Splatting, VoxGS），该表示支持高效的CT体积提取和可微的X射线渲染。高容量模型与灵活体积表示的结合，使我们的模型能够从多种输入中生成高质量的重建结果，包括域内和域外的X射线投影图像。\n"
  },
  {
    "path": "abs/2505.15287.md",
    "content": "### GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation\n\nWe introduce GS2E (Gaussian Splatting to Event), a large-scale synthetic event dataset for high-fidelity event vision tasks, captured from real-world sparse multi-view RGB images. Existing event datasets are often synthesized from dense RGB videos, which typically lack viewpoint diversity and geometric consistency, or depend on expensive, difficult-to-scale hardware setups. GS2E overcomes these limitations by first reconstructing photorealistic static scenes using 3D Gaussian Splatting, and subsequently employing a novel, physically-informed event simulation pipeline. This pipeline generally integrates adaptive trajectory interpolation with physically-consistent event contrast threshold modeling. Such an approach yields temporally dense and geometrically consistent event streams under diverse motion and lighting conditions, while ensuring strong alignment with underlying scene structures. Experimental results on event-based 3D reconstruction demonstrate GS2E's superior generalization capabilities and its practical value as a benchmark for advancing event vision research.\n\n我们提出了 GS2E（Gaussian Splatting to Event），一个面向高保真事件视觉任务的大规模合成事件数据集，其数据来源于真实世界的稀疏多视角RGB图像。现有的事件数据集通常是从稠密RGB视频中合成的，这类数据往往缺乏视角多样性和几何一致性，或依赖昂贵、难以扩展的硬件设备。GS2E 通过以下方式克服了这些局限：首先利用 3D Gaussian Splatting 重建出写实静态场景；随后设计了一条新颖的、具有物理一致性的事件模拟流程。该流程结合了自适应轨迹插值与物理一致的事件对比度阈值建模，能够在不同运动与光照条件下，生成时间密集、几何一致的事件流，并与底层场景结构高度对齐。在基于事件的三维重建任务中，实验结果表明 GS2E 展现出卓越的泛化能力，并作为一个实用的基准，为推动事件视觉研究提供了重要价值。\n"
  },
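The contrast-threshold model behind GS2E's event simulation can be made concrete with a minimal sketch. The standard event-camera model (the abstract does not give GS2E's exact implementation) fires an event whenever the per-pixel log-intensity change crosses a multiple of a threshold C; the function name, the default threshold, and the uniform in-interval timestamping below are illustrative assumptions.

```python
import numpy as np

def simulate_events(frames, timestamps, C=0.2, eps=1e-6):
    """Naive contrast-threshold event simulation between consecutive frames.

    frames:     (N, H, W) intensities in [0, 1]
    timestamps: (N,) frame times in seconds
    C:          log-intensity contrast threshold (illustrative default)
    Returns a list of (t, y, x, polarity) events.
    """
    log_ref = np.log(frames[0] + eps)            # per-pixel reference level
    events = []
    for k in range(1, len(frames)):
        diff = np.log(frames[k] + eps) - log_ref
        n_cross = np.fix(diff / C).astype(int)   # signed threshold crossings
        ys, xs = np.nonzero(n_cross)
        for y, x in zip(ys, xs):
            n = n_cross[y, x]
            pol = 1 if n > 0 else -1
            # place events uniformly inside the inter-frame interval
            for t in np.linspace(timestamps[k - 1], timestamps[k], abs(n) + 2)[1:-1]:
                events.append((float(t), int(y), int(x), pol))
            log_ref[y, x] += n * C               # advance the reference level
    return events
```

In GS2E the camera trajectory is first interpolated adaptively, so `frames` would come from rendering the 3DGS reconstruction at a much higher frame rate than the original captures.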
  {
    "path": "abs/2505.15385.md",
    "content": "### EVA: Expressive Virtual Avatars from Multi-view Videos\n\nWith recent advancements in neural rendering and motion capture algorithms, remarkable progress has been made in photorealistic human avatar modeling, unlocking immense potential for applications in virtual reality, augmented reality, remote communication, and industries such as gaming, film, and medicine. However, existing methods fail to provide complete, faithful, and expressive control over human avatars due to their entangled representation of facial expressions and body movements. In this work, we introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework that achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures. Specifically, our approach designs the human avatar as a two-layer model: an expressive template geometry layer and a 3D Gaussian appearance layer. First, we present an expressive template tracking algorithm that leverages coarse-to-fine optimization to accurately recover body motions, facial expressions, and non-rigid deformation parameters from multi-view videos. Next, we propose a novel decoupled 3D Gaussian appearance model designed to effectively disentangle body and facial appearance. Unlike unified Gaussian estimation approaches, our method employs two specialized and independent modules to model the body and face separately. Experimental results demonstrate that EVA surpasses state-of-the-art methods in terms of rendering quality and expressiveness, validating its effectiveness in creating full-body avatars. This work represents a significant advancement towards fully drivable digital human models, enabling the creation of lifelike digital avatars that faithfully replicate human geometry and appearance.\n\n随着神经渲染与动作捕捉算法的持续进展，真实感人类虚拟头像建模取得了显著突破，为虚拟现实、增强现实、远程通信，以及游戏、影视、医疗等多个行业带来了巨大的应用潜力。然而，现有方法由于面部表情与身体动作的表示纠缠，尚无法实现对人类头像的完整、真实且富有表现力的控制。为此，我们提出了 EVA（Expressive Virtual Avatars），一个具有人物特异性、可完全控制且富有表现力的人类虚拟头像建模框架，能够在实时渲染下实现高度真实的视觉表现，同时支持对面部表情、身体动作与手势的独立控制。具体而言，我们将虚拟人头像建模为一个双层结构模型：由表情模板几何层与三维高斯外观层组成。首先，我们提出了一种表情模板跟踪算法，采用由粗到细的优化策略，从多视角视频中精准恢复身体动作、面部表情及非刚性变形参数。接着，我们设计了一种解耦的三维高斯外观建模方法，有效实现身体与面部外观特征的分离。区别于统一高斯估计的方法，我们的方法采用两个独立模块，分别对身体与面部进行建模。实验结果表明，EVA 在渲染质量与表现力方面均优于现有先进方法，验证了其在全身虚拟人建模中的有效性。该工作朝着完全可驱动的数字人模型迈出了关键一步，使得构建能够真实还原人体几何与外观的虚拟头像成为可能。\n"
  },
  {
    "path": "abs/2505.15528.md",
    "content": "### PlantDreamer: Achieving Realistic 3D Plant Models with Diffusion-Guided Gaussian Splatting\n\nRecent years have seen substantial improvements in the ability to generate synthetic 3D objects using AI. However, generating complex 3D objects, such as plants, remains a considerable challenge. Current generative 3D models struggle with plant generation compared to general objects, limiting their usability in plant analysis tools, which require fine detail and accurate geometry. We introduce PlantDreamer, a novel approach to 3D synthetic plant generation, which can achieve greater levels of realism for complex plant geometry and textures than available text-to-3D models. To achieve this, our new generation pipeline leverages a depth ControlNet, fine-tuned Low-Rank Adaptation and an adaptable Gaussian culling algorithm, which directly improve textural realism and geometric integrity of generated 3D plant models. Additionally, PlantDreamer enables both purely synthetic plant generation, by leveraging L-System-generated meshes, and the enhancement of real-world plant point clouds by converting them into 3D Gaussian Splats. We evaluate our approach by comparing its outputs with state-of-the-art text-to-3D models, demonstrating that PlantDreamer outperforms existing methods in producing high-fidelity synthetic plants. Our results indicate that our approach not only advances synthetic plant generation, but also facilitates the upgrading of legacy point cloud datasets, making it a valuable tool for 3D phenotyping applications.\n\n近年来，AI 在合成三维物体生成方面取得了显著进展。然而，生成复杂的三维结构（如植物）仍然面临巨大挑战。当前的三维生成模型在植物建模方面明显逊色于对一般物体的建模能力，这限制了它们在需要高精细度与几何准确性的植物分析工具中的应用。为了解决这一问题，我们提出了 PlantDreamer，一种新颖的三维植物合成生成方法，在处理复杂植物的几何结构与纹理方面，比现有的文本到三维（text-to-3D）模型更具真实感。为实现这一目标，我们设计了一条全新的生成流程，融合了深度图控制网络（depth ControlNet）、微调的低秩适配（Low-Rank Adaptation）以及可调式的高斯剔除算法（adaptable Gaussian culling），从而在纹理真实感与几何结构完整性方面显著提升生成效果。此外，PlantDreamer 支持两种植物建模方式：一是基于 L-System 生成的网格进行完全合成的植物生成；二是对真实植物点云进行增强，转换为**三维高斯投影（3D Gaussian Splats）**表示。我们通过与当前最先进的文本到三维模型进行对比评估，结果显示 PlantDreamer 在生成高保真合成植物方面显著优于现有方法。这些结果表明，我们的方法不仅推动了合成植物建模的发展，还能用于提升已有点云数据集的质量，因而在三维植物表型分析等应用中具有重要价值。\n"
  },
  {
    "path": "abs/2505.15737.md",
    "content": "### RUSplatting: Robust 3D Gaussian Splatting for Sparse-View Underwater Scene Reconstruction\n\nReconstructing high-fidelity underwater scenes remains a challenging task due to light absorption, scattering, and limited visibility inherent in aquatic environments. This paper presents an enhanced Gaussian Splatting-based framework that improves both the visual quality and geometric accuracy of deep underwater rendering. We propose decoupled learning for RGB channels, guided by the physics of underwater attenuation, to enable more accurate colour restoration. To address sparse-view limitations and improve view consistency, we introduce a frame interpolation strategy with a novel adaptive weighting scheme. Additionally, we introduce a new loss function aimed at reducing noise while preserving edges, which is essential for deep-sea content. We also release a newly collected dataset, Submerged3D, captured specifically in deep-sea environments. Experimental results demonstrate that our framework consistently outperforms state-of-the-art methods with PSNR gains up to 1.90dB, delivering superior perceptual quality and robustness, and offering promising directions for marine robotics and underwater visual analytics.\n\n由于水下环境中普遍存在的光吸收、散射以及能见度受限等问题，重建高保真的水下场景仍然是一项极具挑战性的任务。本文提出了一种增强的基于高斯投影（Gaussian Splatting）的框架，能够在深海渲染中同时提升视觉质量与几何精度。我们引入了基于水下衰减物理机制的RGB通道解耦学习策略，以实现更准确的颜色还原。为了解决稀疏视角问题并提高视角一致性，我们设计了一种结合自适应加权机制的帧间插值策略。此外，我们还提出了一种新型损失函数，旨在在降噪的同时保留边缘结构，这对于深海内容的还原至关重要。我们同时发布了一个全新采集的数据集 Submerged3D，专门采集自深海环境，填补了现有数据资源的空白。实验结果表明，本文方法在多个指标上均显著优于现有先进方法，PSNR 提升最高达 1.90dB，在感知质量和鲁棒性方面均展现出优势。该方法为海洋机器人与水下视觉分析等应用方向提供了有前景的技术路径。\n"
  },
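RUSplatting's per-channel decoupling is motivated by the fact that water attenuates red, green, and blue light at very different rates. The abstract does not give the exact formulation; a common underwater image-formation model that captures this is I_c = J_c·e^(−β_c·d) + B_c·(1 − e^(−β_c·d)) with channel-specific attenuation β_c and backscatter B_c. A minimal sketch, with all coefficient values purely illustrative:

```python
import numpy as np

# Hypothetical per-channel attenuation coefficients (1/m): red is absorbed
# fastest in water, which is why the channels benefit from decoupled learning.
BETA = np.array([0.60, 0.25, 0.10])          # (R, G, B), illustrative values
BACKSCATTER = np.array([0.05, 0.15, 0.30])   # veiling light per channel

def underwater_forward(J, depth):
    """Apply a simple underwater image-formation model.

    J:     (H, W, 3) clear scene radiance in [0, 1]
    depth: (H, W) distance from the camera in meters
    """
    T = np.exp(-BETA[None, None, :] * depth[..., None])  # per-channel transmission
    return J * T + BACKSCATTER[None, None, :] * (1.0 - T)

def underwater_restore(I, depth):
    """Invert the model to recover J, given known depth and coefficients."""
    T = np.exp(-BETA[None, None, :] * depth[..., None])
    return (I - BACKSCATTER[None, None, :] * (1.0 - T)) / np.clip(T, 1e-4, None)
```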
  {
    "path": "abs/2505.16533.md",
    "content": "### Motion Matters: Compact Gaussian Streaming for Free-Viewpoint Video Reconstruction\n\n3D Gaussian Splatting (3DGS) has emerged as a high-fidelity and efficient paradigm for online free-viewpoint video (FVV) reconstruction, offering viewers rapid responsiveness and immersive experiences. However, existing online methods face challenge in prohibitive storage requirements primarily due to point-wise modeling that fails to exploit the motion properties. To address this limitation, we propose a novel Compact Gaussian Streaming (ComGS) framework, leveraging the locality and consistency of motion in dynamic scene, that models object-consistent Gaussian point motion through keypoint-driven motion representation. By transmitting only the keypoint attributes, this framework provides a more storage-efficient solution. Specifically, we first identify a sparse set of motion-sensitive keypoints localized within motion regions using a viewspace gradient difference strategy. Equipped with these keypoints, we propose an adaptive motion-driven mechanism that predicts a spatial influence field for propagating keypoint motion to neighboring Gaussian points with similar motion. Moreover, ComGS adopts an error-aware correction strategy for key frame reconstruction that selectively refines erroneous regions and mitigates error accumulation without unnecessary overhead. Overall, ComGS achieves a remarkable storage reduction of over 159 X compared to 3DGStream and 14 X compared to the SOTA method QUEEN, while maintaining competitive visual fidelity and rendering speed.\n\n3D Gaussian Splatting（3DGS）已成为在线自由视角视频（Free-Viewpoint Video, FVV）重建中一种高保真且高效的关键范式，为观众带来了极快响应与沉浸式体验。然而，现有的在线方法由于基于逐点建模，未能充分利用运动特性，导致存储需求极其庞大，成为实际应用的瓶颈。为解决上述问题，我们提出了一种新颖的框架——Compact Gaussian Streaming（ComGS）。该框架基于动态场景中运动的局部性与一致性，通过基于关键点驱动的运动表示来建模对象一致的高斯点运动，从而实现更具存储效率的表示方式。在传输过程中，仅需传输关键点属性，大幅压缩了数据量。具体而言，我们首先通过视角空间梯度差策略在运动区域内提取一组稀疏、运动敏感的关键点。借助这些关键点，我们提出了一种自适应的运动传播机制，预测关键点在空间中的影响场，将其运动信息传播到具有相似运动模式的邻近高斯点。此外，ComGS 引入了一种误差感知的关键帧修正策略，用于对误差区域进行选择性精修，从而有效缓解误差累积问题，同时避免不必要的计算开销。\n总体而言，ComGS 在保持渲染质量和速度竞争力的同时，实现了显著的存储压缩：相较于 3DGStream 减少超过 159 倍，相较于当前最先进方法 QUEEN 也减少了 14 倍。\n"
  },
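The core saving in ComGS comes from transmitting only keypoint motion and spreading it to nearby Gaussians through a predicted spatial influence field. The abstract does not specify that field's form; a minimal stand-in (a fixed, normalized RBF kernel rather than the learned field the paper describes) illustrates the propagation step:

```python
import numpy as np

def propagate_motion(gauss_xyz, key_xyz, key_disp, sigma=0.2):
    """Blend keypoint displacements onto Gaussian centers with RBF weights.

    gauss_xyz: (N, 3) Gaussian centers
    key_xyz:   (K, 3) motion-sensitive keypoint positions
    key_disp:  (K, 3) per-keypoint displacement for the next frame
    sigma:     kernel bandwidth (hypothetical; ComGS predicts its influence
               field rather than using a fixed isotropic kernel)
    """
    d2 = ((gauss_xyz[:, None, :] - key_xyz[None, :, :]) ** 2).sum(-1)  # (N, K)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True) + 1e-8                           # normalize
    return gauss_xyz + w @ key_disp                                    # (N, 3)
```

Because only the K keypoint attributes cross the wire per frame, storage scales with K rather than with the millions of Gaussians, which is where the reported 159× reduction comes from.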
  {
    "path": "abs/2505.17338.md",
    "content": "### Render-FM: A Foundation Model for Real-time Photorealistic Volumetric Rendering\n\nVolumetric rendering of Computed Tomography (CT) scans is crucial for visualizing complex 3D anatomical structures in medical imaging. Current high-fidelity approaches, especially neural rendering techniques, require time-consuming per-scene optimization, limiting clinical applicability due to computational demands and poor generalizability. We propose Render-FM, a novel foundation model for direct, real-time volumetric rendering of CT scans. Render-FM employs an encoder-decoder architecture that directly regresses 6D Gaussian Splatting (6DGS) parameters from CT volumes, eliminating per-scan optimization through large-scale pre-training on diverse medical data. By integrating robust feature extraction with the expressive power of 6DGS, our approach efficiently generates high-quality, real-time interactive 3D visualizations across diverse clinical CT data. Experiments demonstrate that Render-FM achieves visual fidelity comparable or superior to specialized per-scan methods while drastically reducing preparation time from nearly an hour to seconds for a single inference step. This advancement enables seamless integration into real-time surgical planning and diagnostic workflows.\n\n计算机断层扫描（CT）体绘制在医学成像中对于复杂三维解剖结构的可视化至关重要。当前高保真度的方法，尤其是神经渲染技术，通常需要对每个场景进行耗时的优化，这在临床应用中受到计算开销大与泛化能力差的限制。\n我们提出了 Render-FM，一种新颖的基础模型，用于直接、实时地对 CT 扫描进行体绘制。Render-FM 采用编码器-解码器架构，能够直接从 CT 体数据中回归 6D 高斯散点（6D Gaussian Splatting, 6DGS）参数，借助在大规模多样医学数据上的预训练，彻底消除对每个扫描样本进行单独优化的需求。\n通过结合强大的特征提取能力与 6DGS 的高表达性，我们的方法能够高效生成高质量、实时交互的三维可视化效果，适用于多样化的临床 CT 数据。实验表明，Render-FM 在视觉保真度方面达到或超过现有的专用单扫描方法，同时将准备时间从近一小时大幅缩短至仅需数秒一次推理。\n这一突破使其能够无缝集成到实时外科手术规划与诊断工作流程中。\n"
  },
  {
    "path": "abs/2505.17402.md",
    "content": "### From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation\n\nHigh-fidelity 3D reconstruction is critical for aerial inspection tasks such as infrastructure monitoring, structural assessment, and environmental surveying. While traditional photogrammetry techniques enable geometric modeling, they lack semantic interpretability, limiting their effectiveness for automated inspection workflows. Recent advances in neural rendering and 3D Gaussian Splatting (3DGS) offer efficient, photorealistic reconstructions but similarly lack scene-level understanding.\nIn this work, we present a UAV-based pipeline that extends Feature-3DGS for language-guided 3D segmentation. We leverage LSeg-based feature fields with CLIP embeddings to generate heatmaps in response to language prompts. These are thresholded to produce rough segmentations, and the highest-scoring point is then used as a prompt to SAM or SAM2 for refined 2D segmentation on novel view renderings. Our results highlight the strengths and limitations of various feature field backbones (CLIP-LSeg, SAM, SAM2) in capturing meaningful structure in large-scale outdoor environments. We demonstrate that this hybrid approach enables flexible, language-driven interaction with photorealistic 3D reconstructions, opening new possibilities for semantic aerial inspection and scene understanding.\n\n高保真三维重建在基础设施监测、结构评估与环境勘测等无人机空中巡检任务中具有关键作用。尽管传统的摄影测量技术能够实现几何建模，但其缺乏语义可解释性，限制了其在自动化巡检流程中的应用效果。近年来，神经渲染与三维高斯散点（3D Gaussian Splatting, 3DGS）技术的进展，使得高效、照片级真实感的重建成为可能，但同样缺乏对场景层级的理解。\n在本研究中，我们提出了一条基于无人机的处理流程，将 Feature-3DGS 扩展用于语言引导的三维语义分割。该方法利用基于 LSeg 的特征场与 CLIP 嵌入，将语言提示词映射为响应热力图。通过阈值处理获得粗略分割区域，并以得分最高的点作为提示，引导 SAM 或 SAM2 在新视角渲染图像上进行精细的二维分割。\n实验结果展示了不同特征场骨干（如 CLIP-LSeg、SAM、SAM2）在捕捉大规模户外场景中语义结构方面的优劣。我们表明，该混合方法支持灵活的、基于语言的人机交互方式，能够在照片级三维重建结果上实现语义引导的巡检与场景理解，为语义化空中巡检带来了新的可能性。\n"
  },
  {
    "path": "abs/2505.17590.md",
    "content": "### CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis\n\nRecently, 3D GANs based on 3D Gaussian splatting have been proposed for high quality synthesis of human heads. However, existing methods stabilize training and enhance rendering quality from steep viewpoints by conditioning the random latent vector on the current camera position. This compromises 3D consistency, as we observe significant identity changes when re-synthesizing the 3D head with each camera shift. Conversely, fixing the camera to a single viewpoint yields high-quality renderings for that perspective but results in poor performance for novel views. Removing view-conditioning typically destabilizes GAN training, often causing the training to collapse. In response to these challenges, we introduce CGS-GAN, a novel 3D Gaussian Splatting GAN framework that enables stable training and high-quality 3D-consistent synthesis of human heads without relying on view-conditioning. To ensure training stability, we introduce a multi-view regularization technique that enhances generator convergence with minimal computational overhead. Additionally, we adapt the conditional loss used in existing 3D Gaussian splatting GANs and propose a generator architecture designed to not only stabilize training but also facilitate efficient rendering and straightforward scaling, enabling output resolutions up to 20482. To evaluate the capabilities of CGS-GAN, we curate a new dataset derived from FFHQ. This dataset enables very high resolutions, focuses on larger portions of the human head, reduces view-dependent artifacts for improved 3D consistency, and excludes images where subjects are obscured by hands or other objects. As a result, our approach achieves very high rendering quality, supported by competitive FID scores, while ensuring consistent 3D scene generation.\n\n近年来，基于三维高斯散点（3D Gaussian Splatting, 3DGS）的三维生成对抗网络（3D GANs）被提出用于高质量的人头合成。然而，现有方法通常通过将随机潜变量与当前相机位置进行条件耦合，以稳定训练过程并提升从极端视角渲染的质量。但这种方式牺牲了三维一致性：我们观察到，当相机视角发生变化并重新合成三维人头时，身份特征会发生显著改变。\n相反，若固定相机位置，仅从单一视角进行训练，虽然可获得该视角下的高质量图像，但在新视角下表现较差。完全去除视角条件通常会导致 GAN 训练不稳定，甚至直接崩溃。\n针对上述挑战，我们提出了 CGS-GAN，一种新颖的三维高斯散点 GAN 框架，在无需视角条件的情况下实现稳定训练与高质量的三维一致性人头合成。\n为保障训练稳定性，我们引入了一种多视角正则化技术，以极低的计算开销提升生成器的收敛效果。同时，我们对现有三维高斯 GAN 中的条件损失函数进行了适配，并设计了一种新的生成器架构，该架构不仅增强训练稳定性，还具备高效渲染与易于扩展的特性，可支持最高达 2048² 分辨率的输出。\n为了评估 CGS-GAN 的性能，我们基于 FFHQ 数据构建了一个新数据集。该数据集支持超高分辨率训练，覆盖更大范围的人头区域，减少视角相关伪影，增强三维一致性，并剔除因手部或其他物体遮挡面部的图像。\n最终，我们的方法在实现卓越渲染质量的同时，也保持了良好的三维场景一致性，FID 等指标表现具有竞争力。\n"
  },
  {
    "path": "abs/2505.18342.md",
    "content": "### Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance\n\nAccurate and scalable quantification of animal pose and appearance is crucial for studying behavior. Current 3D pose estimation techniques, such as keypoint- and mesh-based techniques, often face challenges including limited representational detail, labor-intensive annotation requirements, and expensive per-frame optimization. These limitations hinder the study of subtle movements and can make large-scale analyses impractical. We propose Pose Splatter, a novel framework leveraging shape carving and 3D Gaussian splatting to model the complete pose and appearance of laboratory animals without prior knowledge of animal geometry, per-frame optimization, or manual annotations. We also propose a novel rotation-invariant visual embedding technique for encoding pose and appearance, designed to be a plug-in replacement for 3D keypoint data in downstream behavioral analyses. Experiments on datasets of mice, rats, and zebra finches show Pose Splatter learns accurate 3D animal geometries. Notably, Pose Splatter represents subtle variations in pose, provides better low-dimensional pose embeddings over state-of-the-art as evaluated by humans, and generalizes to unseen data. By eliminating annotation and per-frame optimization bottlenecks, Pose Splatter enables analysis of large-scale, longitudinal behavior needed to map genotype, neural activity, and micro-behavior at unprecedented resolution.\n\n对动物姿态与外观的准确且可扩展的量化，对于行为研究至关重要。当前的三维姿态估计方法，如基于关键点或网格的方法，通常面临表示能力有限、注释成本高以及每帧优化开销大的问题。这些限制阻碍了对细微动作的研究，并使大规模分析变得不切实际。\n我们提出了 Pose Splatter，一种新颖的框架，结合形状雕刻（shape carving）与三维高斯散点（3D Gaussian Splatting）技术，在无需预先了解动物几何结构、每帧优化或人工标注的情况下，实现对实验动物完整姿态与外观的建模。\n此外，我们还提出了一种旋转不变的视觉嵌入技术，用于编码姿态与外观特征，设计上可直接替代三维关键点数据，应用于后续的行为分析任务。\n在小鼠、大鼠与斑胸草雀的数据集上的实验表明，Pose Splatter 能够学习准确的三维动物几何结构。值得注意的是，该方法能够表示细微的姿态变化，在人类评估下提供比现有方法更优的低维姿态嵌入表示，并能推广至未见数据。\n通过消除标注与逐帧优化的瓶颈，Pose Splatter 为实现大规模、纵向行为分析铺平了道路，从而以前所未有的分辨率关联基因型、神经活动与微观行为。\n"
  },
  {
    "path": "abs/2505.18649.md",
    "content": "### SuperGS: Consistent and Detailed 3D Super-Resolution Scene Reconstruction via Gaussian Splatting\n\nRecently, 3D Gaussian Splatting (3DGS) has excelled in novel view synthesis (NVS) with its real-time rendering capabilities and superior quality. However, it encounters challenges for high-resolution novel view synthesis (HRNVS) due to the coarse nature of primitives derived from low-resolution input views. To address this issue, we propose SuperGS, an expansion of Scaffold-GS designed with a two-stage coarse-to-fine training framework. In the low-resolution stage, we introduce a latent feature field to represent the low-resolution scene, which serves as both the initialization and foundational information for super-resolution optimization. In the high-resolution stage, we propose a multi-view consistent densification strategy that backprojects high-resolution depth maps based on error maps and employs a multi-view voting mechanism, mitigating ambiguities caused by multi-view inconsistencies in the pseudo labels provided by 2D prior models while avoiding Gaussian redundancy. Furthermore, we model uncertainty through variational feature learning and use it to guide further scene representation refinement and adjust the supervisory effect of pseudo-labels, ensuring consistent and detailed scene reconstruction. Extensive experiments demonstrate that SuperGS outperforms state-of-the-art HRNVS methods on both forward-facing and 360-degree datasets.\n\n近年来，三维高斯散点（3D Gaussian Splatting, 3DGS）在新视角合成（Novel View Synthesis, NVS）任务中表现出色，凭借其实时渲染能力与卓越的图像质量受到广泛关注。然而，当面对高分辨率新视角合成（High-Resolution Novel View Synthesis, HRNVS）时，3DGS 由于其基元通常源自低分辨率输入视图而显得过于粗糙，难以胜任高保真渲染任务。\n为解决这一问题，我们提出了 SuperGS，这是对 Scaffold-GS 的扩展，采用了一个双阶段的粗到细训练框架。在低分辨率阶段，我们引入了潜在特征场（latent feature field）来表示低分辨率场景，它既作为高分辨率优化的初始化，又提供基础信息支持后续超分辨率建模。\n在高分辨率阶段，我们提出了一种多视角一致性的稠密化策略，通过误差图回投高分辨率深度图，并引入多视角投票机制，以缓解多视角伪标签之间的不一致性带来的歧义，同时避免冗余高斯点的引入。\n此外，我们通过变分特征学习（variational feature learning）对不确定性建模，并据此引导后续的场景表示优化过程，同时动态调整伪标签的监督权重，从而确保最终的场景重建既一致又细致。\n大量实验结果表明，SuperGS 在 forward-facing 和 360 度数据集上均显著优于现有最先进的 HRNVS 方法。\n"
  },
  {
    "path": "abs/2505.18764.md",
    "content": "### Efficient Differentiable Hardware Rasterization for 3D Gaussian Splatting\n\nRecent works demonstrate the advantages of hardware rasterization for 3D Gaussian Splatting (3DGS) in forward-pass rendering through fast GPU-optimized graphics and fixed memory footprint. However, extending these benefits to backward-pass gradient computation remains challenging due to graphics pipeline constraints. We present a differentiable hardware rasterizer for 3DGS that overcomes the memory and performance limitations of tile-based software rasterization. Our solution employs programmable blending for per-pixel gradient computation combined with a hybrid gradient reduction strategy (quad-level + subgroup) in fragment shaders, achieving over 10x faster backward rasterization versus naive atomic operations and 3x speedup over the canonical tile-based rasterizer. Systematic evaluation reveals 16-bit render targets (float16 and unorm16) as the optimal accuracy-efficiency trade-off, achieving higher gradient accuracy among mixed-precision rendering formats with execution speeds second only to unorm8, while float32 texture incurs severe forward pass performance degradation due to suboptimal hardware optimizations. Our method with float16 formats demonstrates 3.07x acceleration in full pipeline execution (forward + backward passes) on RTX4080 GPUs with the MipNeRF dataset, outperforming the baseline tile-based renderer while preserving hardware rasterization's memory efficiency advantages -- incurring merely 2.67% of the memory overhead required for splat sorting operations. This work presents a unified differentiable hardware rasterization method that simultaneously optimizes runtime and memory usage for 3DGS, making it particularly suitable for resource-constrained devices with limited memory capacity.\n\n近期研究表明，在前向渲染中采用硬件光栅化技术对于三维高斯散点（3D Gaussian Splatting, 3DGS）具有显著优势，得益于其快速的 GPU 优化图形处理能力与固定的内存开销。然而，由于图形渲染管线的限制，将这一优势扩展到反向传播的梯度计算仍面临挑战。\n我们提出了一种可微分硬件光栅化器，专为 3DGS 设计，突破了基于 tile 的软件光栅化在内存与性能上的瓶颈。该方案采用**可编程混合（programmable blending）实现逐像素梯度计算，并结合混合式梯度归约策略（四元组级 + 子组级，quad-level + subgroup）**在 fragment shader 中高效处理反向传播。\n与朴素的原子操作方法相比，我们的方案在反向光栅化阶段实现了超过 10 倍的加速，并相较于传统 tile-based 光栅器也取得了 3 倍的性能提升。\n系统评估结果表明，**16 位渲染格式（float16 和 unorm16）**在精度与效率之间达成最佳平衡，在混合精度渲染格式中实现更高梯度精度，且其执行速度仅次于 unorm8；相比之下，float32 纹理因硬件优化不足，在前向传递中引发显著性能下降。\n在 RTX4080 GPU 上使用 MipNeRF 数据集进行测试，我们的方法在 float16 配置下实现了完整训练流程（前向 + 反向）3.07 倍的加速，相较于基线 tile-based 渲染器性能显著提升，并保留了硬件光栅化在内存上的高效性——其所需内存开销仅为传统 splat 排序操作的 2.67%。\n本研究提出了一种统一的可微分硬件光栅化方法，同时优化了运行时性能与内存占用，特别适用于内存受限的计算设备，为 3DGS 在资源有限环境下的训练与应用提供了切实可行的解决方案。\n"
  },
  {
    "path": "abs/2505.18992.md",
    "content": "### VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes\n\n3D Gaussian Splatting has recently shown promising results in dense visual SLAM. However, existing 3DGS-based SLAM methods are all constrained to small-room scenarios and struggle with memory explosion in large-scale scenes and long sequences. To this end, we propose VPGS-SLAM, the first 3DGS-based large-scale RGBD SLAM framework for both indoor and outdoor scenarios. We design a novel voxel-based progressive 3D Gaussian mapping method with multiple submaps for compact and accurate scene representation in large-scale and long-sequence scenes. This allows us to scale up to arbitrary scenes and improves robustness (even under pose drifts). In addition, we propose a 2D-3D fusion camera tracking method to achieve robust and accurate camera tracking in both indoor and outdoor large-scale scenes. Furthermore, we design a 2D-3D Gaussian loop closure method to eliminate pose drift. We further propose a submap fusion method with online distillation to achieve global consistency in large-scale scenes when detecting a loop. Experiments on various indoor and outdoor datasets demonstrate the superiority and generalizability of the proposed framework.\n\n三维高斯散点（3D Gaussian Splatting）近期在稠密视觉 SLAM 中展现出令人瞩目的效果。然而，现有基于 3DGS 的 SLAM 方法均受限于小尺度室内场景，面对大规模场景与长序列时常因内存爆炸而难以扩展。为此，我们提出了 VPGS-SLAM，这是首个支持大规模室内与室外场景的基于 3DGS 的 RGB-D SLAM 框架。我们设计了一种新颖的基于体素的渐进式 3D 高斯建图方法，结合多子图（submap）结构，在长序列和大场景中实现紧凑而准确的场景表示，从而显著提升系统的可扩展性与鲁棒性，即便在存在位姿漂移的情况下亦能稳定运行。\n此外，我们提出了一种2D-3D 融合的相机跟踪方法，在大规模室内与室外场景中均可实现鲁棒且精确的相机跟踪。同时，我们设计了一种基于 2D-3D 高斯匹配的回环检测机制，用于消除位姿漂移。\n在此基础上，我们还提出了一种结合在线蒸馏的子图融合策略，在检测到回环时实现全局一致性优化，确保大规模场景下的整体重建质量。\n在多个室内与室外数据集上的实验验证了我们方法在精度、鲁棒性与泛化能力方面的优势。\n"
  },
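VPGS-SLAM's scalability rests on splitting the map into voxel-indexed submaps that grow progressively. The abstract does not describe the data structures; as a purely illustrative sketch of the bookkeeping, Gaussian centers can be bucketed by integer voxel coordinates so each submap can be optimized, streamed, or fused independently:

```python
import numpy as np
from collections import defaultdict

def bucket_into_submaps(points, voxel_size=2.0):
    """Group Gaussian centers into coarse voxel submaps.

    points:     (N, 3) Gaussian centers
    voxel_size: submap cell size in meters (illustrative value)
    Returns {voxel_index_tuple: array_of_point_ids}.
    """
    keys = np.floor(points / voxel_size).astype(int)   # integer voxel coords
    submaps = defaultdict(list)
    for i, k in enumerate(map(tuple, keys)):
        submaps[k].append(i)
    return {k: np.array(v) for k, v in submaps.items()}
```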
  {
    "path": "abs/2505.19138.md",
    "content": "### Veta-GS: View-dependent deformable 3D Gaussian Splatting for thermal infrared Novel-view Synthesis\n\nRecently, 3D Gaussian Splatting (3D-GS) based on Thermal Infrared (TIR) imaging has gained attention in novel-view synthesis, showing real-time rendering. However, novel-view synthesis with thermal infrared images suffers from transmission effects, emissivity, and low resolution, leading to floaters and blur effects in rendered images. To address these problems, we introduce Veta-GS, which leverages a view-dependent deformation field and a Thermal Feature Extractor (TFE) to precisely capture subtle thermal variations and maintain robustness. Specifically, we design view-dependent deformation field that leverages camera position and viewing direction, which capture thermal variations. Furthermore, we introduce the Thermal Feature Extractor (TFE) and MonoSSIM loss, which consider appearance, edge, and frequency to maintain robustness. Extensive experiments on the TI-NSD benchmark show that our method achieves better performance over existing methods.\n\n近年来，基于热红外（Thermal Infrared, TIR）成像的三维高斯散点（3D Gaussian Splatting, 3D-GS）在新视角合成任务中受到关注，并展现出实时渲染的潜力。然而，TIR 图像在新视角合成中面临诸多挑战，包括辐射传输效应、发射率变化以及分辨率较低，常导致渲染结果中出现漂浮伪影（floaters）和模糊现象。\n为了解决上述问题，我们提出了 Veta-GS，该方法结合视角相关形变场（view-dependent deformation field）与热特征提取器（Thermal Feature Extractor, TFE），以精确捕捉微妙的热变化并增强系统鲁棒性。\n具体而言，我们设计了一个视角相关的形变场，利用相机位置与视角方向建模热辐射随视角变化的细节特征。同时，引入 TFE 模块和一种新的 MonoSSIM 损失函数，该损失综合考虑图像外观、边缘结构与频域信息，从而提升模型在低质量 TIR 数据下的表现稳定性。\n在 TI-NSD 基准数据集上的大量实验表明，Veta-GS 在热红外新视角合成任务中相较于现有方法具有更优性能。\n"
  },
  {
    "path": "abs/2505.19154.md",
    "content": "### FHGS: Feature-Homogenized Gaussian Splatting\n\nScene understanding based on 3D Gaussian Splatting (3DGS) has recently achieved notable advances. Although 3DGS related methods have efficient rendering capabilities, they fail to address the inherent contradiction between the anisotropic color representation of gaussian primitives and the isotropic requirements of semantic features, leading to insufficient cross-view feature consistency. To overcome the limitation, we proposes FHGS (Feature-Homogenized Gaussian Splatting), a novel 3D feature fusion framework inspired by physical models, which can achieve high-precision mapping of arbitrary 2D features from pre-trained models to 3D scenes while preserving the real-time rendering efficiency of 3DGS. Specifically, our FHGS introduces the following innovations: Firstly, a universal feature fusion architecture is proposed, enabling robust embedding of large-scale pre-trained models' semantic features (e.g., SAM, CLIP) into sparse 3D structures. Secondly, a non-differentiable feature fusion mechanism is introduced, which enables semantic features to exhibit viewpoint independent isotropic distributions. This fundamentally balances the anisotropic rendering of gaussian primitives and the isotropic expression of features; Thirdly, a dual-driven optimization strategy inspired by electric potential fields is proposed, which combines external supervision from semantic feature fields with internal primitive clustering guidance. This mechanism enables synergistic optimization of global semantic alignment and local structural consistency.\n\n基于三维高斯投影（3D Gaussian Splatting, 3DGS）的场景理解技术近年来取得了显著进展。尽管3DGS相关方法具备高效的渲染能力，但其高斯基元的各向异性色彩表示方式与语义特征所需的各向同性表达之间存在根本性矛盾，导致跨视角的特征一致性不足。为克服这一限制，我们提出了一种新颖的三维特征融合框架——FHGS（Feature-Homogenized Gaussian Splatting），该方法受物理模型启发，能够在保留3DGS实时渲染效率的同时，实现任意预训练模型生成的二维特征向三维场景的高精度映射。\n具体而言，FHGS引入了以下创新点：\n首先，提出了一种通用的特征融合架构，能够稳健地将大规模预训练模型（如 SAM、CLIP）生成的语义特征嵌入至稀疏的三维结构中；\n其次，引入了一种非可微的特征融合机制，使语义特征能够呈现视角无关的各向同性分布，从根本上协调了高斯基元的各向异性渲染与语义特征的各向同性表达之间的冲突；\n第三，提出了一种受电势场启发的双驱动优化策略，将来自语义特征场的外部监督与高斯基元聚类的内部引导相结合，从而实现全局语义对齐与局部结构一致性的协同优化。\n"
  },
  {
    "path": "abs/2505.19175.md",
    "content": "### Triangle Splatting for Real-Time Radiance Field Rendering\n\nThe field of computer graphics was revolutionized by models such as Neural Radiance Fields and 3D Gaussian Splatting, displacing triangles as the dominant representation for photogrammetry. In this paper, we argue for a triangle comeback. We develop a differentiable renderer that directly optimizes triangles via end-to-end gradients. We achieve this by rendering each triangle as differentiable splats, combining the efficiency of triangles with the adaptive density of representations based on independent primitives. Compared to popular 2D and 3D Gaussian Splatting methods, our approach achieves higher visual fidelity, faster convergence, and increased rendering throughput. On the Mip-NeRF360 dataset, our method outperforms concurrent non-volumetric primitives in visual fidelity and achieves higher perceptual quality than the state-of-the-art Zip-NeRF on indoor scenes. Triangles are simple, compatible with standard graphics stacks and GPU hardware, and highly efficient: for the Garden scene, we achieve over 2,400 FPS at 1280x720 resolution using an off-the-shelf mesh renderer. These results highlight the efficiency and effectiveness of triangle-based representations for high-quality novel view synthesis. Triangles bring us closer to mesh-based optimization by combining classical computer graphics with modern differentiable rendering frameworks.\n\n计算机图形学领域曾因神经辐射场（Neural Radiance Fields）和三维高斯投影（3D Gaussian Splatting）等模型而发生革命性变化，三角形逐渐被这些用于摄影测量的新型主流表示方法所取代。而在本文中，我们主张让三角形重新回归主舞台。我们开发了一种可微分渲染器，能够通过端到端的梯度直接优化三角形。具体地，我们将每个三角形渲染为可微分的“splat”，结合了三角形的高效性与基于独立基元的表示方法中的自适应密度特性。\n与流行的二维和三维高斯投影方法相比，我们的方法在视觉保真度更高、收敛速度更快、渲染吞吐量更大。在 Mip-NeRF360 数据集上，我们的方法在视觉保真度方面优于当前的非体积基元方法，并在室内场景中达到了比最先进的 Zip-NeRF 更高的感知质量。三角形结构简单，与标准图形系统和 GPU 硬件高度兼容，且效率极高：在Garden场景中，使用现成的网格渲染器，在 1280×720 分辨率下可实现超过 2,400 FPS 的速度。\n这些结果凸显了基于三角形的表示在高质量新视角合成任务中的效率与有效性。三角形的回归将经典计算机图形学与现代可微渲染框架结合起来，进一步推动基于网格的优化研究。\n"
  },
  {
    "path": "abs/2505.19264.md",
    "content": "### Improving Novel view synthesis of 360∘ Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images\n\nNovel view synthesis in 360∘ scenes from extremely sparse input views is essential for applications like virtual reality and augmented reality. This paper presents a novel framework for novel view synthesis in extremely sparse-view cases. As typical structure-from-motion methods are unable to estimate camera poses in extremely sparse-view cases, we apply DUSt3R to estimate camera poses and generate a dense point cloud. Using the poses of estimated cameras, we densely sample additional views from the upper hemisphere space of the scenes, from which we render synthetic images together with the point cloud. Training 3D Gaussian Splatting model on a combination of reference images from sparse views and densely sampled synthetic images allows a larger scene coverage in 3D space, addressing the overfitting challenge due to the limited input in sparse-view cases. Retraining a diffusion-based image enhancement model on our created dataset, we further improve the quality of the point-cloud-rendered images by removing artifacts. We compare our framework with benchmark methods in cases of only four input views, demonstrating significant improvement in novel view synthesis under extremely sparse-view conditions for 360∘ scenes.\n\n在极度稀疏视角下进行 360∘ 场景的新视角合成，对于虚拟现实（VR）和增强现实（AR）等应用至关重要。本文提出了一个用于极度稀疏视角条件下的新视角合成的新颖框架。\n由于传统的结构光束法（Structure-from-Motion）方法难以在极度稀疏视角条件下准确估计相机位姿，我们采用 DUSt3R 进行相机位姿估计，并生成稠密点云。基于估计得到的相机位姿，我们从场景上半球空间中稠密采样附加视角，并结合点云渲染合成图像。\n将来自稀疏视角的参考图像与稠密采样生成的合成图像相结合，对 3D Gaussian Splatting 模型进行训练，可实现对三维空间中更大场景范围的覆盖，从而缓解因输入过少导致的过拟合问题。此外，我们在自构建的数据集上对基于扩散模型的图像增强网络进行再训练，有效去除伪影，进一步提升点云渲染图像的质量。\n我们在仅提供四个输入视角的条件下，将所提出的框架与多种基准方法进行对比，实验结果表明在极度稀疏视角下的 360∘ 场景新视角合成任务中，我们的方法显著优于现有方法。\n"
  },
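The hemisphere sampling step above is easy to make concrete. The paper does not fix a sampling scheme; one standard choice is a Fibonacci lattice restricted to the upper hemisphere, with a look-at rotation aiming each synthetic camera at the scene center (radius, center, and the z-up convention below are assumptions):

```python
import numpy as np

def hemisphere_cameras(n, radius=3.0, center=np.zeros(3)):
    """Sample n camera positions on the upper hemisphere, aimed at center."""
    i = np.arange(n)
    phi = i * np.pi * (3.0 - np.sqrt(5.0))      # golden-angle azimuth
    z = 1.0 - (i + 0.5) / n                     # heights in (0, 1): upper half
    r = np.sqrt(np.clip(1.0 - z * z, 0.0, 1.0))
    pos = center + radius * np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

    poses = []
    for p in pos:
        fwd = center - p
        fwd = fwd / np.linalg.norm(fwd)
        right = np.cross(fwd, np.array([0.0, 0.0, 1.0]))
        right /= np.linalg.norm(right) + 1e-8
        up = np.cross(right, fwd)
        R = np.stack([right, up, fwd])          # world-to-camera rotation rows
        poses.append((R, p))
    return poses
```

Each returned pose would then be used to render a synthetic training image from the DUSt3R point cloud.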
  {
    "path": "abs/2505.19420.md",
    "content": "### ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting\n\nRecent advancements in Neural Radiance Fields (NeRF) and 3D Gaussian-based Simultaneous Localization and Mapping (SLAM) methods have demonstrated exceptional localization precision and remarkable dense mapping performance. However, dynamic objects introduce critical challenges by disrupting scene consistency, leading to tracking drift and mapping artifacts. Existing methods that employ semantic segmentation or object detection for dynamic identification and filtering typically rely on predefined categorical priors, while discarding dynamic scene information crucial for robotic applications such as dynamic obstacle avoidance and environmental interaction. To overcome these challenges, we propose ADD-SLAM: an Adaptive Dynamic Dense SLAM framework based on Gaussian splitting. We design an adaptive dynamic identification mechanism grounded in scene consistency analysis, comparing geometric and textural discrepancies between real-time observations and historical maps. Ours requires no predefined semantic category priors and adaptively discovers scene dynamics. Precise dynamic object recognition effectively mitigates interference from moving targets during localization. Furthermore, we propose a dynamic-static separation mapping strategy that constructs a temporal Gaussian model to achieve online incremental dynamic modeling. Experiments conducted on multiple dynamic datasets demonstrate our method's flexible and accurate dynamic segmentation capabilities, along with state-of-the-art performance in both localization and mapping.\n\n神经辐射场（Neural Radiance Fields, NeRF）与基于三维高斯的同时定位与建图（Simultaneous Localization and Mapping, SLAM）方法近年来在定位精度和稠密建图性能方面取得了显著进展。然而，动态物体会破坏场景的一致性，导致跟踪漂移与建图伪影，从而构成关键挑战。现有方法通常依赖语义分割或目标检测来识别和过滤动态区域，但这类方法往往依赖预定义的类别先验，并且会舍弃对机器人应用至关重要的动态场景信息，例如动态障碍规避与环境交互。\n为应对上述挑战，我们提出了 ADD-SLAM：一种基于高斯拆分的自适应动态稠密 SLAM 框架。我们设计了一种基于场景一致性分析的自适应动态识别机制，通过比较实时观测与历史地图之间的几何与纹理差异，进行动态性判定。该机制无需依赖任何预定义的语义类别先验，能够自适应地发现场景中的动态变化。精确的动态目标识别有效减少了移动物体在定位过程中的干扰。\n此外，我们提出了一种动静态分离建图策略，构建时间维度上的高斯模型，以实现在线增量式的动态建模。在多个动态数据集上的实验证明，该方法在动态分割上具有良好的灵活性与准确性，并在定位与建图任务中均实现了当前最优性能。\n"
  },
  {
    "path": "abs/2505.19854.md",
    "content": "### Sparse2DGS: Sparse-View Surface Reconstruction using 2D Gaussian Splatting with Dense Point Cloud\n\nGaussian Splatting (GS) has gained attention as a fast and effective method for novel view synthesis. It has also been applied to 3D reconstruction using multi-view images and can achieve fast and accurate 3D reconstruction. However, GS assumes that the input contains a large number of multi-view images, and therefore, the reconstruction accuracy significantly decreases when only a limited number of input images are available. One of the main reasons is the insufficient number of 3D points in the sparse point cloud obtained through Structure from Motion (SfM), which results in a poor initialization for optimizing the Gaussian primitives. We propose a new 3D reconstruction method, called Sparse2DGS, to enhance 2DGS in reconstructing objects using only three images. Sparse2DGS employs DUSt3R, a fundamental model for stereo images, along with COLMAP MVS to generate highly accurate and dense 3D point clouds, which are then used to initialize 2D Gaussians. Through experiments on the DTU dataset, we show that Sparse2DGS can accurately reconstruct the 3D shapes of objects using just three images.\n\nGaussian Splatting（GS）作为一种高效的新视角合成方法，近年来受到广泛关注。它也被应用于多视图图像的三维重建任务中，并能实现快速且精确的重建效果。然而，GS 方法假设输入包含大量多视图图像，因此在仅提供少量输入图像的情况下，其重建精度会显著下降。造成这一问题的主要原因之一，是通过结构光束法（Structure from Motion, SfM）所获得的稀疏点云中三维点数量有限，从而导致高斯基元优化过程的初始化效果较差。\n为了解决这一问题，我们提出了一种新颖的三维重建方法 Sparse2DGS，旨在提升 2DGS 在仅使用三张图像时的重建能力。Sparse2DGS 结合了 DUSt3R（一种基础立体图像处理模型）与 COLMAP 多视图立体重建（MVS）技术，用于生成高精度、高密度的三维点云，进而用于初始化二维高斯。\n在 DTU 数据集上的实验证明，Sparse2DGS 能够仅使用三张图像，即实现对物体三维形状的高精度重建。\n"
  },
  {
    "path": "abs/2505.19883.md",
    "content": "### ErpGS: Equirectangular Image Rendering enhanced with 3D Gaussian Regularization\n\nThe use of multi-view images acquired by a 360-degree camera can reconstruct a 3D space with a wide area. There are 3D reconstruction methods from equirectangular images based on NeRF and 3DGS, as well as Novel View Synthesis (NVS) methods. On the other hand, it is necessary to overcome the large distortion caused by the projection model of a 360-degree camera when equirectangular images are used. In 3DGS-based methods, the large distortion of the 360-degree camera model generates extremely large 3D Gaussians, resulting in poor rendering accuracy. We propose ErpGS, which is Omnidirectional GS based on 3DGS to realize NVS addressing the problems. ErpGS introduce some rendering accuracy improvement techniques: geometric regularization, scale regularization, and distortion-aware weights and a mask to suppress the effects of obstacles in equirectangular images. Through experiments on public datasets, we demonstrate that ErpGS can render novel view images more accurately than conventional methods.\n\n利用 360 度相机获取的多视角图像可以用于重建大范围的三维空间。目前已有基于 NeRF 和 3DGS 的从等距矩形图像（equirectangular images）进行三维重建的方法，以及用于新视角合成（Novel View Synthesis, NVS）的方法。然而，在使用等距矩形图像时，需要克服由 360 度相机投影模型所引入的大幅畸变问题。\n在基于 3DGS 的方法中，360 度相机模型所造成的强畸变会生成极大的三维高斯，从而导致渲染精度显著下降。为了解决这一问题，我们提出了 ErpGS，一种面向全景图的新型 3DGS 方法，用于实现新视角合成，并有效应对上述挑战。\nErpGS 引入了一系列用于提升渲染精度的技术，包括：几何正则化、尺度正则化，以及畸变感知的权重与遮罩机制，以抑制等距矩形图像中障碍物所带来的影响。我们在多个公开数据集上的实验表明，ErpGS 在新视角图像渲染精度方面优于现有的传统方法。\n"
  },
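Equirectangular projection over-represents the poles, so per-pixel losses should be weighted by the solid angle each pixel subtends. One plausible instance of a distortion-aware weight (ErpGS's exact scheme is not given in the abstract) is cos(latitude) per image row, combined with the obstacle mask; a minimal sketch:

```python
import numpy as np

def erp_solid_angle_weights(H, W):
    """Per-pixel weights proportional to the solid angle of ERP pixels.

    Rows near the equator get weight close to 1; rows near the poles near 0.
    """
    lat = (np.arange(H) + 0.5) / H * np.pi - np.pi / 2.0   # latitude per row
    return np.tile(np.cos(lat)[:, None], (1, W))

def weighted_l1(pred, gt, mask=None):
    """Distortion-aware L1 photometric loss with an optional obstacle mask."""
    w = erp_solid_angle_weights(*pred.shape[:2])
    if mask is not None:                                   # 1 = valid, 0 = obstacle
        w = w * mask
    return (w * np.abs(pred - gt).mean(axis=-1)).sum() / (w.sum() + 1e-8)
```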
  {
    "path": "abs/2505.20267.md",
    "content": "### HaloGS: Loose Coupling of Compact Geometry and Gaussian Splats for 3D Scenes\n\nHigh fidelity 3D reconstruction and rendering hinge on capturing precise geometry while preserving photo realistic detail. Most existing methods either fuse these goals into a single cumbersome model or adopt hybrid schemes whose uniform primitives lead to a trade off between efficiency and fidelity. In this paper, we introduce HaloGS, a dual representation that loosely couples coarse triangles for geometry with Gaussian primitives for appearance, motivated by the lightweight classic geometry representations and their proven efficiency in real world applications. Our design yields a compact yet expressive model capable of photo realistic rendering across both indoor and outdoor environments, seamlessly adapting to varying levels of scene complexity. Experiments on multiple benchmark datasets demonstrate that our method yields both compact, accurate geometry and high fidelity renderings, especially in challenging scenarios where robust geometric structure make a clear difference.\n\n高保真度的三维重建与渲染依赖于精确几何信息的获取与真实感细节的保留。现有的大多数方法要么将这两者融合为一个复杂笨重的模型，要么采用混合方案，但其统一的基元设计常常在效率与保真度之间造成权衡。\n本文提出了一种新颖的双重表示方法——HaloGS，该方法以轻量级的经典几何表示为灵感，将用于几何表达的粗略三角形结构与用于外观表达的高斯基元松耦合结合。该设计既紧凑又具表现力，能够在室内与室外环境中实现高度真实感的渲染，并可自适应不同场景复杂度的变化。\n在多个基准数据集上的实验表明，HaloGS 能够在生成紧凑且准确的几何结构的同时，提供高保真的图像渲染效果，特别是在那些对几何结构鲁棒性要求较高的复杂场景中表现尤为出色。\n"
  },
  {
    "path": "abs/2505.20270.md",
    "content": "### ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation\n\nThis paper aims to model the dynamics of 3D Gaussians from visual observations to support temporal extrapolation. Existing dynamic 3D reconstruction methods often struggle to effectively learn underlying dynamics or rely heavily on manually defined physical priors, which limits their extrapolation capabilities. To address this issue, we propose a novel dynamic 3D Gaussian Splatting prior-free motion extrapolation framework based on particle dynamics systems. The core advantage of our method lies in its ability to learn differential equations that describe the dynamics of 3D Gaussians, and follow them during future frame extrapolation. Instead of simply fitting to the observed visual frame sequence, we aim to more effectively model the gaussian particle dynamics system. To this end, we introduce a dynamics latent state vector into the standard Gaussian kernel and design a dynamics latent space encoder to extract initial state. Subsequently, we introduce a Neural ODEs-based dynamics module that models the temporal evolution of Gaussian in dynamics latent space. Finally, a Gaussian kernel space decoder is used to decode latent state at the specific time step into the deformation. Experimental results demonstrate that the proposed method achieves comparable rendering quality with existing approaches in reconstruction tasks, and significantly outperforms them in future frame extrapolation.\n\n本文旨在从视觉观测中建模三维高斯的动态行为，以支持时间外推任务。现有的动态三维重建方法通常难以有效学习潜在的运动规律，或严重依赖手动定义的物理先验，从而限制了其时间外推能力。\n为解决上述问题，我们提出了一种基于粒子动力系统的、无先验的动态三维高斯投影（3D Gaussian Splatting）运动外推框架。该方法的核心优势在于能够学习描述三维高斯动态行为的微分方程，并在未来帧预测中遵循该动力系统进行外推。我们并非仅对观测到的图像序列进行拟合，而是更深入地建模高斯粒子的动态系统。\n为此，我们在标准高斯核中引入了一个动态潜状态向量，并设计了一个动态潜空间编码器用于提取初始状态。随后，我们引入了基于神经常微分方程（Neural ODE）的动态模块，用于建模高斯在潜在动力空间中的时间演化过程。最后，通过高斯核空间解码器将任意时间步的潜在状态解码为对应的形变结果。\n实验表明，该方法在三维重建任务中能够达到与现有方法相当的渲染质量，并在未来帧的外推任务中显著优于现有方法。\n"
  },
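The extrapolation mechanism in ParticleGS amounts to integrating a learned vector field over each Gaussian's latent state. A toy sketch of that loop (with a fixed linear map standing in for the learned Neural ODE network, and RK4 as the integrator; both are illustrative stand-ins, not the paper's components):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(8, 8))    # stand-in for a learned dynamics net

def f(z, t):
    """dz/dt in latent space; ParticleGS learns this field with a Neural ODE."""
    return z @ A.T

def rk4_step(z, t, dt):
    k1 = f(z, t)
    k2 = f(z + 0.5 * dt * k1, t + 0.5 * dt)
    k3 = f(z + 0.5 * dt * k2, t + 0.5 * dt)
    k4 = f(z + dt * k3, t + dt)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Roll the latent states of N Gaussians forward past the observed window;
# extrapolation means simply continuing to integrate the same learned field.
z = rng.normal(size=(100, 8))             # initial states from the encoder
t, dt = 0.0, 0.05
for _ in range(40):                       # 2 s of extrapolated dynamics
    z = rk4_step(z, t, dt)
    t += dt
# A decoder (omitted here) would map each z to a per-Gaussian deformation.
```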
  {
    "path": "abs/2505.20469.md",
    "content": "### CCL-LGS: Contrastive Codebook Learning for 3D Language Gaussian Splatting\n\nRecent advances in 3D reconstruction techniques and vision-language models have fueled significant progress in 3D semantic understanding, a capability critical to robotics, autonomous driving, and virtual/augmented reality. However, methods that rely on 2D priors are prone to a critical challenge: cross-view semantic inconsistencies induced by occlusion, image blur, and view-dependent variations. These inconsistencies, when propagated via projection supervision, deteriorate the quality of 3D Gaussian semantic fields and introduce artifacts in the rendered outputs. To mitigate this limitation, we propose CCL-LGS, a novel framework that enforces view-consistent semantic supervision by integrating multi-view semantic cues. Specifically, our approach first employs a zero-shot tracker to align a set of SAM-generated 2D masks and reliably identify their corresponding categories. Next, we utilize CLIP to extract robust semantic encodings across views. Finally, our Contrastive Codebook Learning (CCL) module distills discriminative semantic features by enforcing intra-class compactness and inter-class distinctiveness. In contrast to previous methods that directly apply CLIP to imperfect masks, our framework explicitly resolves semantic conflicts while preserving category discriminability. Extensive experiments demonstrate that CCL-LGS outperforms previous state-of-the-art methods.\n\n近年来，三维重建技术与视觉-语言模型的快速发展，极大推动了三维语义理解的进步，这一能力对于机器人、自主驾驶以及虚拟/增强现实等领域至关重要。然而，依赖二维先验的方法普遍面临一个关键挑战：由遮挡、图像模糊及视角依赖变化所导致的跨视角语义不一致性。这些不一致性在投影监督过程中被传递，会严重影响三维高斯语义场的质量，并在渲染结果中引入伪影。\n为缓解这一问题，我们提出了 CCL-LGS，一种通过融合多视角语义信息实现视角一致语义监督的新型框架。具体而言，我们首先使用零样本追踪器对一组由 SAM 生成的二维掩码进行对齐，并可靠地识别其所属类别。随后，利用 CLIP 提取跨视角的稳健语义编码。最后，我们引入对比式码本学习（Contrastive Codebook Learning, CCL）模块，通过增强类内聚合与类间分离，提炼判别性语义特征。\n与以往直接将 CLIP 应用于不完美掩码的方法不同，我们的框架显式地解决了语义冲突问题，同时保留了类别区分性。大量实验证明，CCL-LGS 在各项指标上均优于现有的最新方法。\n"
  },
  {
    "path": "abs/2505.20471.md",
    "content": "### WeatherEdit: Controllable Weather Editing with 4D Gaussian Field\n\nIn this work, we present WeatherEdit, a novel weather editing pipeline for generating realistic weather effects with controllable types and severity in 3D scenes. Our approach is structured into two key components: weather background editing and weather particle construction. For weather background editing, we introduce an all-in-one adapter that integrates multiple weather styles into a single pretrained diffusion model, enabling the generation of diverse weather effects in 2D image backgrounds. During inference, we design a Temporal-View (TV-) attention mechanism that follows a specific order to aggregate temporal and spatial information, ensuring consistent editing across multi-frame and multi-view images. To construct the weather particles, we first reconstruct a 3D scene using the edited images and then introduce a dynamic 4D Gaussian field to generate snowflakes, raindrops and fog in the scene. The attributes and dynamics of these particles are precisely controlled through physical-based modelling and simulation, ensuring realistic weather representation and flexible severity adjustments. Finally, we integrate the 4D Gaussian field with the 3D scene to render consistent and highly realistic weather effects. Experiments on multiple driving datasets demonstrate that WeatherEdit can generate diverse weather effects with controllable condition severity, highlighting its potential for autonomous driving simulation in adverse weather.\n\n本文提出了 WeatherEdit，一个新颖的天气编辑管线，能够在三维场景中生成真实感强、类型与强度均可控制的天气效果。我们的方法主要包括两个关键组成部分：天气背景编辑与天气粒子构建。\n在天气背景编辑阶段，我们引入了一种 多合一适配器，将多种天气风格集成到一个预训练扩散模型中，从而实现对二维图像背景中多样天气效果的生成。在推理过程中，我们设计了 时空视角注意力机制（Temporal-View Attention, TV-Attention），该机制以特定顺序聚合时间与空间信息，确保多帧多视角图像之间编辑结果的一致性。\n在天气粒子构建阶段，我们首先利用编辑后的图像对三维场景进行重建，随后引入 动态四维高斯场（4D Gaussian Field），用于在场景中生成雪花、雨滴与雾等天气粒子。这些粒子的属性与动态行为通过基于物理的建模与仿真进行精确控制，从而实现高度真实的天气表现与灵活可调的强度控制。最终，我们将该 4D 高斯场与三维场景融合，实现一致性强、逼真度高的天气效果渲染。\n在多个自动驾驶数据集上的实验表明，WeatherEdit 能够生成多样化、强度可控的天气效果，展示了其在恶劣天气下自动驾驶仿真中的应用潜力。\n"
  },
  {
    "path": "abs/2505.20610.md",
    "content": "### OmniIndoor3D: Comprehensive Indoor 3D Reconstruction\n\nWe propose a novel framework for comprehensive indoor 3D reconstruction using Gaussian representations, called OmniIndoor3D. This framework enables accurate appearance, geometry, and panoptic reconstruction of diverse indoor scenes captured by a consumer-level RGB-D camera. Since 3DGS is primarily optimized for photorealistic rendering, it lacks the precise geometry critical for high-quality panoptic reconstruction. Therefore, OmniIndoor3D first combines multiple RGB-D images to create a coarse 3D reconstruction, which is then used to initialize the 3D Gaussians and guide the 3DGS training. To decouple the optimization conflict between appearance and geometry, we introduce a lightweight MLP that adjusts the geometric properties of 3D Gaussians. The introduced lightweight MLP serves as a low-pass filter for geometry reconstruction and significantly reduces noise in indoor scenes. To improve the distribution of Gaussian primitives, we propose a densification strategy guided by panoptic priors to encourage smoothness on planar surfaces. Through the joint optimization of appearance, geometry, and panoptic reconstruction, OmniIndoor3D provides comprehensive 3D indoor scene understanding, which facilitates accurate and robust robotic navigation. We perform thorough evaluations across multiple datasets, and OmniIndoor3D achieves state-of-the-art results in appearance, geometry, and panoptic reconstruction. We believe our work bridges a critical gap in indoor 3D reconstruction.\n\n我们提出了一种基于高斯表示的全面室内三维重建新框架，称为 OmniIndoor3D。该框架可借助消费级 RGB-D 相机，实现多样室内场景中外观、几何与全景语义（panoptic）的精确重建。\n由于传统 3D Gaussian Splatting（3DGS）方法主要面向逼真渲染优化，缺乏实现高质量全景语义重建所需的精确几何信息，因此，OmniIndoor3D 首先融合多张 RGB-D 图像生成粗略三维重建结果，再用于初始化三维高斯并引导 3DGS 训练过程。\n为了解耦外观与几何之间的优化冲突，我们引入了一个轻量级的 MLP 网络，用于调整三维高斯的几何属性。该 MLP 相当于几何重建中的低通滤波器，有效降低了室内场景中的噪声。\n此外，为了优化高斯基元的分布，我们提出了一种由全景语义先验引导的密度增强策略，从而在平面区域上实现更平滑的一致性表示。\n通过对外观、几何与全景语义三方面的联合优化，OmniIndoor3D 实现了对室内三维场景的全面理解，并为高精度、强鲁棒性的机器人导航提供了有力支持。\n我们在多个数据集上进行了全面评估，结果表明 OmniIndoor3D 在外观质量、几何精度与全景语义重建方面均达到了当前最先进水平。我们相信，本工作有效填补了室内三维重建领域的关键空白。\n"
  },
  {
    "path": "abs/2505.20714.md",
    "content": "### Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting\n\nThis paper presents an innovative frequency-embedded 3D Gaussian splatting (3DGS) algorithm for wideband radio-frequency (RF) radiance field modeling, offering an advancement over the existing works limited to single-frequency modeling. Grounded in fundamental physics, we uncover the complex relationship between EM wave propagation behaviors and RF frequencies. Inspired by this, we design an EM feature network with attenuation and radiance modules to learn the complex relationships between RF frequencies and the key properties of each 3D Gaussian, specifically the attenuation factor and RF signal intensity. By training the frequency-embedded 3DGS model, we can efficiently reconstruct RF radiance fields at arbitrary unknown frequencies within a given 3D environment. Finally, we propose a large-scale power angular spectrum (PAS) dataset containing 50000 samples ranging from 1 to 100 GHz in 6 indoor environments, and conduct extensive experiments to verify the effectiveness of our method. Our approach achieves an average Structural Similarity Index Measure (SSIM) up to 0.72, and a significant improvement up to 17.8% compared to the current state-of-the-art (SOTA) methods trained on individual test frequencies. Additionally, our method achieves an SSIM of 0.70 without prior training on these frequencies, which represents only a 2.8% performance drop compared to models trained with full PAS data. This demonstrates our model's capability to estimate PAS at unknown frequencies.\n\n本文提出了一种创新的频率嵌入式三维高斯投影（frequency-embedded 3D Gaussian Splatting, 3DGS）算法，用于建模宽带射频（RF）辐射场，突破了现有方法仅限于单一频率建模的局限。该方法基于电磁波传播的基本物理原理，揭示了电磁波传播行为与射频频率之间复杂的关联关系。\n受此启发，我们设计了一个包含衰减模块与辐射模块的电磁特征网络（EM feature network），用于学习射频频率与每个三维高斯关键属性（即衰减因子与射频信号强度）之间的非线性映射关系。通过训练频率嵌入式 3DGS 模型，我们能够在任意未知频率下高效重建三维环境中的射频辐射场。\n此外，我们构建了一个大规模的 功率角谱（Power Angular Spectrum, PAS）数据集，涵盖 6 个室内环境共计 50000 个样本，频率范围为 1 到 100 GHz。大量实验证明我们方法的有效性：模型在重建任务中可达到最高 0.72 的结构相似性指数（SSIM），相较于当前最先进方法，在个别频率训练下提升了高达 17.8%；而在未见频率上进行测试时，仍能获得 0.70 的 SSIM，仅比全频训练模型下降 2.8%。\n这些结果表明，我们提出的模型具备强大的跨频率泛化能力，能够有效预测未知频率下的功率角谱，展示出在宽带射频场建模中的显著潜力。\n"
  },
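The EM feature network above conditions each Gaussian's attenuation factor and signal intensity on the RF frequency. The abstract does not say how the scalar frequency is fed in; a typical design choice for such conditioning is a sinusoidal embedding over the 1-100 GHz band, sketched below (the log-scale normalization and band count are assumptions):

```python
import numpy as np

def frequency_embedding(freq_ghz, n_bands=6, f_min=1.0, f_max=100.0):
    """Sinusoidal embedding of an RF frequency (hypothetical design choice).

    freq_ghz: scalar or (B,) frequencies in GHz within [f_min, f_max]
    Returns (B, 2 * n_bands) features for the attenuation/radiance modules.
    """
    f = np.atleast_1d(np.asarray(freq_ghz, dtype=float))
    # log-scale to [0, 1] since propagation behavior varies over decades
    x = (np.log(f) - np.log(f_min)) / (np.log(f_max) - np.log(f_min))
    scales = 2.0 ** np.arange(n_bands) * np.pi          # (n_bands,)
    ang = x[:, None] * scales[None, :]
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

emb = frequency_embedding(28.0)   # e.g., a 5G mmWave frequency
```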
  {
    "path": "abs/2505.20729.md",
    "content": "### Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting\n\nSparse-view scene reconstruction often faces significant challenges due to the constraints imposed by limited observational data. These limitations result in incomplete information, leading to suboptimal reconstructions using existing methodologies. To address this, we present Intern-GS, a novel approach that effectively leverages rich prior knowledge from vision foundation models to enhance the process of sparse-view Gaussian Splatting, thereby enabling high-quality scene reconstruction. Specifically, Intern-GS utilizes vision foundation models to guide both the initialization and the optimization process of 3D Gaussian splatting, effectively addressing the limitations of sparse inputs. In the initialization process, our method employs DUSt3R to generate a dense and non-redundant gaussian point cloud. This approach significantly alleviates the limitations encountered by traditional structure-from-motion (SfM) methods, which often struggle under sparse-view constraints. During the optimization process, vision foundation models predict depth and appearance for unobserved views, refining the 3D Gaussians to compensate for missing information in unseen regions. Extensive experiments demonstrate that Intern-GS achieves state-of-the-art rendering quality across diverse datasets, including both forward-facing and large-scale scenes, such as LLFF, DTU, and Tanks and Temples.\n\n稀疏视角下的场景重建由于观测数据受限，常常面临显著挑战。这些限制导致信息不完整，使得现有方法难以实现高质量的重建效果。为解决这一问题，我们提出 Intern-GS，一种新颖的方法，通过有效利用视觉基础模型中的丰富先验知识，提升稀疏视角条件下的高斯投影（Gaussian Splatting）重建质量，从而实现高质量的三维场景重建。\n具体而言，Intern-GS 利用视觉基础模型在 初始化阶段与优化阶段对 3D Gaussian Splatting 过程进行引导，从根本上缓解稀疏输入带来的不足。在初始化过程中，方法使用 DUSt3R 生成稠密且无冗余的高斯点云，显著缓解了传统结构光束法（Structure-from-Motion, SfM）在稀疏视角下难以构建有效点云的问题。\n在优化阶段，Intern-GS 借助视觉基础模型预测未观测视角下的深度与外观信息，从而补全不可见区域中的缺失信息，进一步细化三维高斯表示。\n大量实验证明，Intern-GS 在多个数据集上实现了当前最优的渲染质量，包括前视场景（如 LLFF）与大规模场景（如 DTU、Tanks and Temples）。该方法在稀疏视角条件下表现出强大的重建能力与泛化性能。\n"
  },
  {
    "path": "abs/2505.20858.md",
    "content": "### ProBA: Probabilistic Bundle Adjustment with the Bhattacharyya Coefficient\n\nClassical Bundle Adjustment (BA) methods require accurate initial estimates for convergence and typically assume known camera intrinsics, which limits their applicability when such information is uncertain or unavailable. We propose a novel probabilistic formulation of BA (ProBA) that explicitly models and propagates uncertainty in both the 2D observations and the 3D scene structure, enabling optimization without any prior knowledge of camera poses or focal length. Our method uses 3D Gaussians instead of point-like landmarks and we introduce uncertainty-aware reprojection losses by projecting the 3D Gaussians onto the 2D image space, and enforce geometric consistency across multiple 3D Gaussians using the Bhattacharyya coefficient to encourage overlap between their corresponding Gaussian distributions. This probabilistic framework leads to more robust and reliable optimization, even in the presence of outliers in the correspondence set, reducing the likelihood of converging to poor local minima. Experimental results show that \\textit{ProBA} outperforms traditional methods in challenging real-world conditions. By removing the need for strong initialization and known intrinsics, ProBA enhances the practicality of SLAM systems deployed in unstructured environments.\n\n传统的束调优化（Bundle Adjustment, BA）方法通常依赖于准确的初始估计，并假设相机内参已知，这在内参不确定或无法获取的情况下限制了其适用性。为突破这一限制，我们提出了一种全新的概率式束调优化方法 ProBA，该方法显式建模并传播二维观测与三维场景结构中的不确定性，从而在无需任何相机位姿或焦距先验的情况下实现优化。\nProBA 使用 三维高斯分布 替代传统的点状地标，并引入 不确定性感知的重投影损失函数，通过将三维高斯投影到二维图像空间，实现对观测不确定性的建模。同时，我们利用 Bhattacharyya 系数 约束多个三维高斯之间的几何一致性，鼓励它们在分布空间中具有较高的重叠度，从而实现稳定可靠的结构约束。\n该概率框架即使在存在匹配外点的情况下，也能进行更稳健、鲁棒的优化，显著降低陷入次优局部最小值的风险。实验证明，在现实复杂场景中，ProBA 相较传统方法表现出更优的重建精度与鲁棒性。\n通过消除对强初始值与已知相机内参的依赖，ProBA 极大提升了 SLAM 系统在非结构化环境中的实用性与灵活性。\n"
  },
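The Bhattacharyya coefficient ProBA uses has a closed form for Gaussians. For N(μ1, Σ1) and N(μ2, Σ2), with Σ = (Σ1 + Σ2)/2, the Bhattacharyya distance is D_B = (1/8)(μ1 − μ2)ᵀΣ⁻¹(μ1 − μ2) + (1/2)ln(det Σ / √(det Σ1 · det Σ2)), and the coefficient is BC = exp(−D_B). A minimal sketch (the loss wiring around it is not given in the abstract):

```python
import numpy as np

def bhattacharyya_coefficient(mu1, cov1, mu2, cov2):
    """Overlap in [0, 1] between two Gaussians; 1 means identical."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    maha = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    db = maha + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))  # Bhattacharyya distance
    return float(np.exp(-db))

# Two nearby 3D Gaussians: a BC near 1 rewards overlapping landmark estimates.
mu1, mu2 = np.zeros(3), np.array([0.1, 0.0, 0.0])
cov = np.eye(3) * 0.04
print(bhattacharyya_coefficient(mu1, cov, mu2, cov))   # ~0.97
```

A consistency term could then penalize 1 − BC (or, equivalently, minimize D_B directly) between Gaussians that should describe the same structure.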
  {
    "path": "abs/2505.21041.md",
    "content": "### CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians\n\nAccurate and efficient modeling of large-scale urban scenes is critical for applications such as AR navigation, UAV based inspection, and smart city digital twins. While aerial imagery offers broad coverage and complements limitations of ground-based data, reconstructing city-scale environments from such views remains challenging due to occlusions, incomplete geometry, and high memory demands. Recent advances like 3D Gaussian Splatting (3DGS) improve scalability and visual quality but remain limited by dense primitive usage, long training times, and poor suit ability for edge devices. We propose CityGo, a hybrid framework that combines textured proxy geometry with residual and surrounding 3D Gaussians for lightweight, photorealistic rendering of urban scenes from aerial perspectives. Our approach first extracts compact building proxy meshes from MVS point clouds, then uses zero order SH Gaussians to generate occlusion-free textures via image-based rendering and back-projection. To capture high-frequency details, we introduce residual Gaussians placed based on proxy-photo discrepancies and guided by depth priors. Broader urban context is represented by surrounding Gaussians, with importance-aware downsampling applied to non-critical regions to reduce redundancy. A tailored optimization strategy jointly refines proxy textures and Gaussian parameters, enabling real-time rendering of complex urban scenes on mobile GPUs with significantly reduced training and memory requirements. Extensive experiments on real-world aerial datasets demonstrate that our hybrid representation significantly reduces training time, achieving on average 1.4x speedup, while delivering comparable visual fidelity to pure 3D Gaussian Splatting approaches. Furthermore, CityGo enables real-time rendering of large-scale urban scenes on mobile consumer GPUs, with substantially reduced memory usage and energy consumption.\n\n对大规模城市场景进行准确高效的建模，对于增强现实导航、无人机巡检以及智慧城市数字孪生等应用至关重要。尽管航拍图像具有广泛的覆盖范围，并能弥补地面数据的不足，但从此类视角重建城市级环境仍然面临遮挡、几何不完整以及高内存开销等挑战。近年来，诸如三维高斯投影（3D Gaussian Splatting，简称 3DGS）等方法在可扩展性和视觉质量方面取得了进展，但仍受限于稠密原语的使用、训练时间较长以及不适用于边缘设备等问题。\n我们提出了 CityGo——一个混合框架，结合了带纹理的代理几何与残差和周边三维高斯，实现了从航拍视角对城市场景的轻量级、照片级真实感渲染。该方法首先从多视图立体（MVS）点云中提取紧凑的建筑代理网格；随后使用零阶球谐（SH）高斯通过图像渲染与反投影生成无遮挡纹理。为捕捉高频细节，我们引入了残差高斯，其位置由代理模型与照片之间的差异确定，并通过深度先验进行引导。更广阔的城市上下文则由周边高斯建模，非关键区域采用基于重要性的下采样策略以减少冗余。\n我们设计了一种定制的优化策略，联合优化代理纹理与高斯参数，使得复杂城市场景能够在移动端 GPU 上实现实时渲染，同时大幅减少训练时间和内存占用。在真实航拍数据集上的大量实验证明，CityGo 所采用的混合表示在保持与纯 3DGS 方法相当的视觉质量的同时，平均训练速度提升了 1.4 倍。此外，CityGo 还能够在消费级移动 GPU 上实现大规模城市场景的实时渲染，并显著降低内存使用和能耗。\n"
  },
  {
    "path": "abs/2505.21238.md",
    "content": "### 3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics Based Appearance-Medium Decoupling\n\nNovel view synthesis for underwater scene reconstruction presents unique challenges due to complex light-media interactions. Optical scattering and absorption in water body bring inhomogeneous medium attenuation interference that disrupts conventional volume rendering assumptions of uniform propagation medium. While 3D Gaussian Splatting (3DGS) offers real-time rendering capabilities, it struggles with underwater inhomogeneous environments where scattering media introduce artifacts and inconsistent appearance. In this study, we propose a physics-based framework that disentangles object appearance from water medium effects through tailored Gaussian modeling. Our approach introduces appearance embeddings, which are explicit medium representations for backscatter and attenuation, enhancing scene consistency. In addition, we propose a distance-guided optimization strategy that leverages pseudo-depth maps as supervision with depth regularization and scale penalty terms to improve geometric fidelity. By integrating the proposed appearance and medium modeling components via an underwater imaging model, our approach achieves both high-quality novel view synthesis and physically accurate scene restoration. Experiments demonstrate our significant improvements in rendering quality and restoration accuracy over existing methods.\n\n水下场景重建中的新视角合成面临独特挑战，主要源于复杂的光-介质相互作用。水体中的光散射与吸收造成非均质的介质衰减干扰，这破坏了传统体渲染中“传播介质均匀”的假设。尽管三维高斯投影（3D Gaussian Splatting，简称 3DGS）具备实时渲染能力，但在水下非均质环境中，由于散射介质引入伪影和外观不一致的问题，表现不佳。\n为此，我们提出了一种基于物理建模的框架，通过定制化的高斯建模将物体外观与水介质效应解耦。我们的方法引入了外观嵌入（appearance embeddings），作为对背向散射和光衰减的显式介质表示，从而提升场景一致性。此外，我们还提出了一种基于距离引导的优化策略，利用伪深度图作为监督信号，结合深度正则项和尺度惩罚项以增强几何精度。\n通过将所提出的外观与介质建模组件集成进水下成像模型中，我们的方法不仅实现了高质量的新视角合成，同时实现了物理一致性的场景复原。实验结果表明，与现有方法相比，我们在渲染质量与复原精度方面均取得了显著提升。\n"
  },
  {
    "path": "abs/2505.21258.md",
    "content": "### Plenodium: UnderWater 3D Scene Reconstruction with Plenoptic Medium Representation\n\nWe present Plenodium (plenoptic medium), an effective and efficient 3D representation framework capable of jointly modeling both objects and participating media. In contrast to existing medium representations that rely solely on view-dependent modeling, our novel plenoptic medium representation incorporates both directional and positional information through spherical harmonics encoding, enabling highly accurate underwater scene reconstruction. To address the initialization challenge in degraded underwater environments, we propose the pseudo-depth Gaussian complementation to augment COLMAP-derived point clouds with robust depth priors. In addition, a depth ranking regularized loss is developed to optimize the geometry of the scene and improve the ordinal consistency of the depth maps. Extensive experiments on real-world underwater datasets demonstrate that our method achieves significant improvements in 3D reconstruction. Furthermore, we conduct a simulated dataset with ground truth and the controllable scattering medium to demonstrate the restoration capability of our method in underwater scenarios.\n\n我们提出了 Plenodium（全光介质），这是一种高效且有效的三维表示框架，能够联合建模物体与参与介质。与现有仅依赖视角建模的介质表示方法不同，我们新颖的全光介质表示通过球谐编码引入了方向与位置的信息，从而实现了高度精确的水下场景重建。针对退化水下环境中的初始化难题，我们提出了伪深度高斯补全方法，用以增强由 COLMAP 生成的点云，提供鲁棒的深度先验。此外，我们还设计了一种深度排序正则损失，用于优化场景几何结构并提升深度图的序关系一致性。大量在真实水下数据集上的实验表明，我们的方法在三维重建方面取得了显著提升。进一步地，我们在具有真实标注与可控散射介质的模拟数据集上开展实验，验证了该方法在水下场景中的恢复能力。\n"
  },
  {
    "path": "abs/2505.21483.md",
    "content": "### MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation\n\nObject compositing offers significant promise for augmented reality (AR) and embodied intelligence applications. Existing approaches predominantly focus on single-image scenarios or intrinsic decomposition techniques, facing challenges with multi-view consistency, complex scenes, and diverse lighting conditions. Recent inverse rendering advancements, such as 3D Gaussian and diffusion-based methods, have enhanced consistency but are limited by scalability, heavy data requirements, or prolonged reconstruction time per scene. To broaden its applicability, we introduce MV-CoLight, a two-stage framework for illumination-consistent object compositing in both 2D images and 3D scenes. Our novel feed-forward architecture models lighting and shadows directly, avoiding the iterative biases of diffusion-based methods. We employ a Hilbert curve-based mapping to align 2D image inputs with 3D Gaussian scene representations seamlessly. To facilitate training and evaluation, we further introduce a large-scale 3D compositing dataset. Experiments demonstrate state-of-the-art harmonized results across standard benchmarks and our dataset, as well as casually captured real-world scenes demonstrate the framework's robustness and wide generalization.\n\n对象合成在增强现实（AR）和具身智能应用中具有广阔前景。现有方法主要聚焦于单张图像场景或内在分解技术，难以应对多视角一致性、复杂场景以及多样化光照条件等挑战。近年来，逆向渲染技术的发展（如三维高斯和基于扩散的方法）在一致性方面取得了进展，但仍受限于可扩展性差、数据需求大或每个场景重建时间长等问题。为拓宽其实用性，我们提出了 MV-CoLight，一种适用于二维图像和三维场景中实现光照一致性对象合成的两阶段框架。我们提出的新型前馈架构能够直接建模光照与阴影，避免了扩散方法中存在的迭代偏差。我们引入基于希尔伯特曲线的映射策略，实现了二维图像输入与三维高斯场景表示之间的无缝对齐。为支持训练与评估，我们还构建了一个大规模三维合成数据集。实验结果表明，我们的方法在标准基准和自建数据集上均实现了最先进的光照一致性效果，且在随手采集的真实场景中也展现出较强的鲁棒性与广泛的泛化能力。\n"
  },
  {
    "path": "abs/2505.21502.md",
    "content": "### Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis\n\nWe propose GRGS, a generalizable and relightable 3D Gaussian framework for high-fidelity human novel view synthesis under diverse lighting conditions. Unlike existing methods that rely on per-character optimization or ignore physical constraints, GRGS adopts a feed-forward, fully supervised strategy that projects geometry, material, and illumination cues from multi-view 2D observations into 3D Gaussian representations. Specifically, to reconstruct lighting-invariant geometry, we introduce a Lighting-aware Geometry Refinement (LGR) module trained on synthetically relit data to predict accurate depth and surface normals. Based on the high-quality geometry, a Physically Grounded Neural Rendering (PGNR) module is further proposed to integrate neural prediction with physics-based shading, supporting editable relighting with shadows and indirect illumination. Besides, we design a 2D-to-3D projection training scheme that leverages differentiable supervision from ambient occlusion, direct, and indirect lighting maps, which alleviates the computational cost of explicit ray tracing. Extensive experiments demonstrate that GRGS achieves superior visual quality, geometric consistency, and generalization across characters and lighting conditions.\n\n\n我们提出了 GRGS，一种具备泛化性与可重光照能力的三维高斯框架，能够在多样光照条件下实现高保真的人体新视角合成。不同于现有方法依赖于逐人物优化或忽略物理约束，GRGS 采用前馈式、全监督策略，将几何、材质与光照线索从多视角二维观测投影到三维高斯表示中。具体而言，为了重建具有光照不变性的几何结构，我们引入了一个基于合成重光照数据训练的光照感知几何优化模块（Lighting-aware Geometry Refinement, LGR），以预测准确的深度和表面法向量。在高质量几何基础上，我们进一步提出了物理约束神经渲染模块（Physically Grounded Neural Rendering, PGNR），将神经预测与基于物理的着色相结合，支持带阴影与间接光照的可编辑重光照效果。此外，我们设计了一种二维到三维的投影训练方案，结合环境遮蔽、直接光照和间接光照图的可微监督，避免了显式光线追踪所带来的高计算开销。大量实验证明，GRGS 在不同人物与光照条件下均表现出卓越的视觉质量、几何一致性与泛化能力。\n"
  },
  {
    "path": "abs/2505.21890.md",
    "content": "### Hyperspectral Gaussian Splatting\n\nHyperspectral imaging (HSI) has been widely used in agricultural applications for non-destructive estimation of plant nutrient composition and precise determination of nutritional elements in samples. Recently, 3D reconstruction methods have been used to create implicit neural representations of HSI scenes, which can help localize the target object's nutrient composition spatially and spectrally. Neural Radiance Field (NeRF) is a cutting-edge implicit representation that can render hyperspectral channel compositions of each spatial location from any viewing direction. However, it faces limitations in training time and rendering speed. In this paper, we propose Hyperspectral Gaussian Splatting (HS-GS), which combines the state-of-the-art 3D Gaussian Splatting (3DGS) with a diffusion model to enable 3D explicit reconstruction of the hyperspectral scenes and novel view synthesis for the entire spectral range. To enhance the model's ability to capture fine-grained reflectance variations across the light spectrum and leverage correlations between adjacent wavelengths for denoising, we introduce a wavelength encoder to generate wavelength-specific spherical harmonics offsets. We also introduce a novel Kullback--Leibler divergence-based loss to mitigate the spectral distribution gap between the rendered image and the ground truth. A diffusion model is further applied for denoising the rendered images and generating photorealistic hyperspectral images. We present extensive evaluations on five diverse hyperspectral scenes from the Hyper-NeRF dataset to show the effectiveness of our proposed HS-GS framework. The results demonstrate that HS-GS achieves new state-of-the-art performance among all previously published methods.\n\n高光谱成像（Hyperspectral Imaging, HSI）已广泛应用于农业领域，用于非破坏性估算植物营养成分及精准测定样品中的营养元素。近年来，三维重建方法被用于构建高光谱场景的隐式神经表示，从而在空间和光谱维度上实现目标对象营养成分的定位。神经辐射场（Neural Radiance Field, NeRF）是一种先进的隐式表示方法，能够从任意视角渲染出每个空间位置的高光谱通道组成。然而，NeRF 在训练时间和渲染速度方面存在限制。\n为此，本文提出了 高光谱高斯投影（Hyperspectral Gaussian Splatting, HS-GS），结合了最先进的三维高斯投影（3D Gaussian Splatting, 3DGS）与扩散模型，能够实现高光谱场景的三维显式重建和整个光谱范围的新视角合成。为提升模型捕捉光谱中精细反射率变化的能力，并利用相邻波长间的相关性进行去噪，我们引入了一个波长编码器，用于生成特定波长的球谐函数偏移量。同时，我们设计了一种基于 Kullback–Leibler 散度的新型损失函数，用于缓解渲染图像与真实图像之间的光谱分布差异。我们还引入扩散模型对渲染图像进行去噪，生成具有真实感的高光谱图像。\n在 Hyper-NeRF 数据集中选取的五个多样化高光谱场景上，我们进行了大量评估实验。结果表明，HS-GS 在所有已发表的方法中实现了新的最先进性能。\n"
  },
  {
    "path": "abs/2505.22279.md",
    "content": "### Learning Fine-Grained Geometry for Sparse-View Splatting via Cascade Depth Loss\n\nNovel view synthesis is a fundamental task in 3D computer vision that aims to reconstruct realistic images from a set of posed input views. However, reconstruction quality degrades significantly under sparse-view conditions due to limited geometric cues. Existing methods, such as Neural Radiance Fields (NeRF) and the more recent 3D Gaussian Splatting (3DGS), often suffer from blurred details and structural artifacts when trained with insufficient views. Recent works have identified the quality of rendered depth as a key factor in mitigating these artifacts, as it directly affects geometric accuracy and view consistency. In this paper, we address these challenges by introducing Hierarchical Depth-Guided Splatting (HDGS), a depth supervision framework that progressively refines geometry from coarse to fine levels. Central to HDGS is a novel Cascade Pearson Correlation Loss (CPCL), which aligns rendered and estimated monocular depths across multiple spatial scales. By enforcing multi-scale depth consistency, our method substantially improves structural fidelity in sparse-view scenarios. Extensive experiments on the LLFF and DTU benchmarks demonstrate that HDGS achieves state-of-the-art performance under sparse-view settings while maintaining efficient and high-quality rendering\n\n新视角合成是三维计算机视觉中的一项基础任务，旨在根据一组带位姿的输入视图重建逼真的图像。然而，在稀疏视角条件下，由于几何线索有限，重建质量会显著下降。现有方法，如神经辐射场（Neural Radiance Fields, NeRF）以及近期提出的三维高斯投影（3D Gaussian Splatting, 3DGS），在训练视角不足的情况下，往往会出现细节模糊与结构伪影等问题。已有研究指出，渲染深度图的质量是缓解这些问题的关键因素，因为其直接影响几何精度与视角一致性。\n本文针对上述挑战，提出了一种分层深度引导的投影框架——Hierarchical Depth-Guided Splatting（HDGS），通过逐级细化几何结构，从粗到细进行深度监督。HDGS 的核心是我们设计的 级联皮尔逊相关损失（Cascade Pearson Correlation Loss, CPCL），该损失函数在多个空间尺度上对渲染深度与估计的单目深度进行对齐。通过强制实施多尺度深度一致性，我们的方法在稀疏视角场景中显著提升了结构保真度。\n在 LLFF 和 DTU 基准测试上的大量实验表明，HDGS 在稀疏视角条件下实现了当前最优的重建性能，同时兼具高效性与高质量渲染效果。\n"
  },
  {
    "path": "abs/2505.22335.md",
    "content": "### UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments\n\nRecent 3D Gaussian Splatting (3DGS) techniques for Visual Simultaneous Localization and Mapping (SLAM) have significantly progressed in tracking and high-fidelity mapping. However, their sequential optimization framework and sensitivity to dynamic objects limit real-time performance and robustness in realworld scenarios. We present UP-SLAM, a real-time RGB-D SLAM system for dynamic environments that decouples tracking and mapping through a parallelized framework. A probabilistic octree is employed to manage Gaussian primitives adaptively, enabling efficient initialization and pruning without hand-crafted thresholds. To robustly filter dynamic regions during tracking, we propose a training-free uncertainty estimator that fuses multi-modal residuals to estimate per-pixel motion uncertainty, achieving open-set dynamic object handling without reliance on semantic labels. Furthermore, a temporal encoder is designed to enhance rendering quality. Concurrently, low-dimensional features are efficiently transformed via a shallow multilayer perceptron to construct DINO features, which are then employed to enrich the Gaussian field and improve the robustness of uncertainty prediction. Extensive experiments on multiple challenging datasets suggest that UP-SLAM outperforms state-of-the-art methods in both localization accuracy (by 59.8%) and rendering quality (by 4.57 dB PSNR), while maintaining real-time performance and producing reusable, artifact-free static maps in dynamic environments.\n\n近期用于视觉同步定位与建图（Visual SLAM）的三维高斯投影（3D Gaussian Splatting, 3DGS）技术在跟踪与高保真建图方面取得了显著进展。然而，其顺序优化框架以及对动态物体的敏感性限制了在真实环境中实现实时性能与鲁棒性的能力。为此，我们提出了 UP-SLAM，这是一种面向动态环境的实时 RGB-D SLAM 系统，通过并行化框架实现了跟踪与建图的解耦。UP-SLAM 采用概率八叉树对高斯基元进行自适应管理，支持高效的初始化与裁剪操作，无需手工设定阈值。为在跟踪过程中稳健地过滤动态区域，我们提出了一种无需训练的不确定性估计器，通过融合多模态残差来估计逐像素的运动不确定性，从而实现对开放集动态物体的处理，无需依赖语义标签。此外，我们设计了时序编码器以提升渲染质量。同时，系统通过浅层多层感知机高效转换低维特征以构建 DINO 特征，进一步用于丰富高斯场表示并增强不确定性预测的鲁棒性。在多个具有挑战性的数据集上的大量实验表明，UP-SLAM 在定位精度（提升 59.8%）和渲染质量（提升 4.57 dB PSNR）方面均优于现有最先进方法，同时具备实时性能，并能在动态环境中生成可复用、无伪影的静态地图。\n"
  },
  {
    "path": "abs/2505.22400.md",
    "content": "### STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering\n\nAlthough dynamic scene reconstruction has long been a fundamental challenge in 3D vision, the recent emergence of 3D Gaussian Splatting (3DGS) offers a promising direction by enabling high-quality, real-time rendering through explicit Gaussian primitives. However, existing 3DGS-based methods for dynamic reconstruction often suffer from spatio-temporal incoherence during initialization, where canonical Gaussians are constructed by aggregating observations from multiple frames without temporal distinction. This results in spatio-temporally entangled representations, making it difficult to model dynamic motion accurately. To overcome this limitation, we propose STDR (Spatio-Temporal Decoupling for Real-time rendering), a plug-and-play module that learns spatio-temporal probability distributions for each Gaussian. STDR introduces a spatio-temporal mask, a separated deformation field, and a consistency regularization to jointly disentangle spatial and temporal patterns. Extensive experiments demonstrate that incorporating our module into existing 3DGS-based dynamic scene reconstruction frameworks leads to notable improvements in both reconstruction quality and spatio-temporal consistency across synthetic and real-world benchmarks.\n\n尽管动态场景重建长期以来一直是三维视觉中的核心挑战，但近年来三维高斯投影（3D Gaussian Splatting, 3DGS）的兴起为该问题提供了有前景的解决方案，能够通过显式高斯基元实现高质量、实时渲染。然而，现有基于 3DGS 的动态重建方法在初始化阶段常常面临时空不一致的问题，即在构建标准高斯基元时，未区分时间地聚合多帧观测，导致生成的表示在空间与时间维度上交织混乱，从而难以准确建模动态运动。\n为解决这一问题，我们提出了 STDR（Spatio-Temporal Decoupling for Real-time rendering），这是一种可插拔模块，用于为每个高斯基元学习其时空概率分布。STDR 引入了时空掩码、独立的形变场以及一致性正则项，以联合解耦空间与时间模式。大量实验表明，将该模块集成到现有基于 3DGS 的动态场景重建框架中，能够在合成数据与真实数据集上显著提升重建质量和时空一致性。\n\n"
  },
  {
    "path": "abs/2505.22854.md",
    "content": "### CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting\n\nGaussian Splatting (GS) has recently emerged as an efficient representation for rendering 3D scenes from 2D images and has been extended to images, videos, and dynamic 4D content. However, applying style transfer to GS-based representations, especially beyond simple color changes, remains challenging. In this work, we introduce CLIPGaussians, the first unified style transfer framework that supports text- and image-guided stylization across multiple modalities: 2D images, videos, 3D objects, and 4D scenes. Our method operates directly on Gaussian primitives and integrates into existing GS pipelines as a plug-in module, without requiring large generative models or retraining from scratch. CLIPGaussians approach enables joint optimization of color and geometry in 3D and 4D settings, and achieves temporal coherence in videos, while preserving a model size. We demonstrate superior style fidelity and consistency across all tasks, validating CLIPGaussians as a universal and efficient solution for multimodal style transfer.\n\n高斯投影（Gaussian Splatting, GS）近年来作为一种高效的三维场景表示方式崭露头角，可用于从二维图像渲染三维场景，并已被扩展应用于图像、视频和动态四维内容。然而，将风格迁移应用于基于 GS 的表示仍具有挑战性，特别是在超越简单颜色变换的复杂风格表达方面。\n在本工作中，我们提出了 CLIPGaussians，这是首个统一的风格迁移框架，支持在多种模态下（二维图像、视频、三维物体、四维场景）进行文本与图像引导的风格化操作。我们的方法直接作用于高斯基元，可作为插件模块集成至现有的 GS 渲染流程中，无需依赖大型生成模型或从头重新训练。\nCLIPGaussians 支持在三维与四维设置中对颜色与几何进行联合优化，并在视频中实现时间一致性，同时保持紧凑的模型规模。我们在各类任务中展示了优越的风格保真度与一致性，验证了 CLIPGaussians 作为通用高效的多模态风格迁移方案的有效性。\n"
  },
  {
    "path": "abs/2505.22859.md",
    "content": "### 4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians\n\nWe propose the first 4D tracking and mapping method that jointly performs camera localization and non-rigid surface reconstruction via differentiable rendering. Our approach captures 4D scenes from an online stream of color images with depth measurements or predictions by jointly optimizing scene geometry, appearance, dynamics, and camera ego-motion. Although natural environments exhibit complex non-rigid motions, 4D-SLAM remains relatively underexplored due to its inherent challenges; even with 2.5D signals, the problem is ill-posed because of the high dimensionality of the optimization space. To overcome these challenges, we first introduce a SLAM method based on Gaussian surface primitives that leverages depth signals more effectively than 3D Gaussians, thereby achieving accurate surface reconstruction. To further model non-rigid deformations, we employ a warp-field represented by a multi-layer perceptron (MLP) and introduce a novel camera pose estimation technique along with surface regularization terms that facilitate spatio-temporal reconstruction. In addition to these algorithmic challenges, a significant hurdle in 4D SLAM research is the lack of reliable ground truth and evaluation protocols, primarily due to the difficulty of 4D capture using commodity sensors. To address this, we present a novel open synthetic dataset of everyday objects with diverse motions, leveraging large-scale object models and animation modeling. In summary, we open up the modern 4D-SLAM research by introducing a novel method and evaluation protocols grounded in modern vision and rendering techniques.\n\n我们提出了首个联合进行相机定位与非刚性表面重建的 4D 跟踪与建图方法，基于可微渲染框架实现。该方法从带有深度测量或预测的彩色图像在线流中捕捉 4D 场景，通过联合优化场景的几何结构、外观属性、动态变化以及相机自身运动，实现高质量的 4D 重建。尽管自然环境中存在复杂的非刚性运动，但由于其内在挑战，4D-SLAM 仍是一项相对未被充分探索的研究课题；即使拥有 2.5D 信号，受限于高维优化空间，该问题仍为病态问题。为应对这些挑战，我们首先提出了一种基于高斯曲面基元的 SLAM 方法，相较于传统三维高斯表示，该方法能更有效地利用深度信息，从而实现准确的表面重建。为进一步建模非刚性形变，我们采用由多层感知机（MLP）表示的形变场，并引入了新颖的相机位姿估计技术与表面正则化项，以实现时空一致的重建。除了算法挑战外，4D SLAM 研究还面临缺乏可靠的真实标注数据与评估协议的问题，这主要源于使用普通传感器捕捉 4D 数据的困难。为此，我们构建了一个新颖的合成数据集，包含带有多样运动的日常物体，依托大规模三维物体模型与动画建模生成。综上所述，我们通过引入一种新方法和基于现代视觉与渲染技术的评估协议，推动了现代 4D-SLAM 研究的进展。\n"
  },
  {
    "path": "abs/2505.22908.md",
    "content": "### 3DGS Compression with Sparsity-guided Hierarchical Transform Coding\n\n3D Gaussian Splatting (3DGS) has gained popularity for its fast and high-quality rendering, but it has a very large memory footprint incurring high transmission and storage overhead. Recently, some neural compression methods, such as Scaffold-GS, were proposed for 3DGS but they did not adopt the approach of end-to-end optimized analysis-synthesis transforms which has been proven highly effective in neural signal compression. Without an appropriate analysis transform, signal correlations cannot be removed by sparse representation. Without such transforms the only way to remove signal redundancies is through entropy coding driven by a complex and expensive context modeling, which results in slower speed and suboptimal rate-distortion (R-D) performance. To overcome this weakness, we propose Sparsity-guided Hierarchical Transform Coding (SHTC), the first end-to-end optimized transform coding framework for 3DGS compression. SHTC jointly optimizes the 3DGS, transforms and a lightweight context model. This joint optimization enables the transform to produce representations that approach the best R-D performance possible. The SHTC framework consists of a base layer using KLT for data decorrelation, and a sparsity-coded enhancement layer that compresses the KLT residuals to refine the representation. The enhancement encoder learns a linear transform to project high-dimensional inputs into a low-dimensional space, while the decoder unfolds the Iterative Shrinkage-Thresholding Algorithm (ISTA) to reconstruct the residuals. All components are designed to be interpretable, allowing the incorporation of signal priors and fewer parameters than black-box transforms. This novel design significantly improves R-D performance with minimal additional parameters and computational overhead.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）因其高效且高质量的渲染性能而受到广泛关注，但其庞大的内存开销带来了较高的传输与存储负担。近期虽有如 Scaffold-GS 等神经压缩方法被提出用于 3DGS 压缩，但它们并未采用已在神经信号压缩中被广泛验证有效的端到端优化分析-合成变换策略。缺乏合适的分析变换将导致信号间的相关性无法通过稀疏表示去除，此时只能依赖复杂且代价高昂的上下文建模驱动的熵编码来压缩信号冗余，不仅速度较慢，且率失真（Rate-Distortion, R-D）性能不佳。\n为克服上述缺陷，我们提出了 Sparsity-guided Hierarchical Transform Coding（SHTC），这是首个用于 3DGS 压缩的端到端优化变换编码框架。SHTC 联合优化 3DGS 表示、变换模块与轻量级上下文模型，使得编码过程中所学习的表示能够接近最优的 R-D 性能。\nSHTC 框架由两个层级组成：基础层使用 KLT（Karhunen–Loève Transform）进行数据去相关，增强层则对 KLT 残差进行稀疏编码以进一步精细表示。增强编码器学习一种线性变换，将高维输入投影到低维空间；解码器则通过展开的迭代收缩-阈值算法（ISTA）重构残差信号。该框架的所有组件均具可解释性，便于融合先验知识，且参数量远少于黑盒型变换方法。\n这种新颖的设计在几乎不增加计算开销与模型参数的前提下，显著提升了压缩任务中的率失真性能。\n"
  },
  {
    "path": "abs/2505.22978.md",
    "content": "### Pose-free 3D Gaussian splatting via shape-ray estimation\n\nWhile generalizable 3D Gaussian splatting enables efficient, high-quality rendering of unseen scenes, it heavily depends on precise camera poses for accurate geometry. In real-world scenarios, obtaining accurate poses is challenging, leading to noisy pose estimates and geometric misalignments. To address this, we introduce SHARE, a pose-free, feed-forward Gaussian splatting framework that overcomes these ambiguities by joint shape and camera rays estimation. Instead of relying on explicit 3D transformations, SHARE builds a pose-aware canonical volume representation that seamlessly integrates multi-view information, reducing misalignment caused by inaccurate pose estimates. Additionally, anchor-aligned Gaussian prediction enhances scene reconstruction by refining local geometry around coarse anchors, allowing for more precise Gaussian placement. Extensive experiments on diverse real-world datasets show that our method achieves robust performance in pose-free generalizable Gaussian splatting.\n\n尽管具备泛化能力的三维高斯投影（3D Gaussian Splatting）能够高效地对未见场景进行高质量渲染，但其高度依赖于精确的相机位姿以确保几何准确性。在现实应用中，获取精确的相机位姿具有挑战性，往往会出现噪声估计与几何错位的问题。\n为应对这一问题，我们提出了 SHARE，一种无需位姿信息的前馈式高斯投影框架，通过联合估计物体形状与相机光线，有效缓解由位姿不确定性带来的模糊与对齐误差。SHARE 不依赖显式的三维变换，而是构建了一个具备位姿感知能力的标准体积表示，能够自然融合多视角信息，从而减少由不准位姿估计引发的几何错位。\n此外，我们还引入了锚点对齐高斯预测机制，通过围绕粗略锚点优化局部几何，实现更精确的高斯放置，进一步提升场景重建质量。\n在多个真实世界数据集上的广泛实验证明，SHARE 在无需位姿信息的泛化高斯投影任务中展现出强健的性能与出色的渲染效果。\n"
  },
  {
    "path": "abs/2505.23044.md",
    "content": "### SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images\n\nA major breakthrough in 3D reconstruction is the feedforward paradigm to generate pixel-wise 3D points or Gaussian primitives from sparse, unposed images. To further incorporate semantics while avoiding the significant memory and storage costs of high-dimensional semantic features, existing methods extend this paradigm by associating each primitive with a compressed semantic feature vector. However, these methods have two major limitations: (a) the naively compressed feature compromises expressiveness, affecting the model's ability to capture fine-grained semantics, and (b) the pixel-wise primitive prediction introduces redundancy in overlapping areas, causing unnecessary memory overhead. To this end, we introduce SpatialSplat, a feedforward framework that produces redundancy-aware Gaussians and capitalizes on a dual-field semantic representation. Particularly, with the insight that primitives within the same instance exhibit high semantic consistency, we decompose the semantic representation into a coarse feature field that encodes uncompressed semantics with minimal primitives, and a fine-grained yet low-dimensional feature field that captures detailed inter-instance relationships. Moreover, we propose a selective Gaussian mechanism, which retains only essential Gaussians in the scene, effectively eliminating redundant primitives. Our proposed Spatialsplat learns accurate semantic information and detailed instances prior with more compact 3D Gaussians, making semantic 3D reconstruction more applicable. We conduct extensive experiments to evaluate our method, demonstrating a remarkable 60% reduction in scene representation parameters while achieving superior performance over state-of-the-art methods.\n\n三维重建领域的一项重要突破是采用前馈式范式，从稀疏、无位姿图像中直接生成逐像素的三维点或高斯基元。为进一步引入语义信息，同时避免高维语义特征所带来的显著内存与存储开销，现有方法通常为每个基元关联一个压缩的语义特征向量。然而，这些方法存在两个主要局限：（a）特征压缩方式过于简单，导致表达能力受限，影响模型对细粒度语义的捕捉能力；（b）逐像素预测基元会在重叠区域引入冗余，造成不必要的内存负担。\n为解决上述问题，我们提出了 SpatialSplat，一个生成冗余感知高斯基元的前馈式框架，并结合了双场语义表示。基于这样一个观察：同一实例内部的基元往往具有高度语义一致性，我们将语义表示解耦为两个部分：一个粗粒度特征场，使用极少量基元编码未经压缩的全局语义信息；以及一个细粒度但低维的特征场，捕捉实例间的局部语义关系。\n此外，我们提出了选择性高斯机制，保留场景中最具代表性的基元，有效剔除冗余高斯点，从而提升表达效率。\nSpatialSplat 能够以更紧凑的三维高斯形式学习准确的语义信息与丰富的实例先验，使语义三维重建更具实用性。我们在多个数据集上进行了大量实验，结果表明该方法在场景表示参数上减少了约 60%，同时在性能上优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2505.23158.md",
    "content": "### LODGE: Level-of-Detail Large-Scale Gaussian Splatting with Efficient Rendering\n\nIn this work, we present a novel level-of-detail (LOD) method for 3D Gaussian Splatting that enables real-time rendering of large-scale scenes on memory-constrained devices. Our approach introduces a hierarchical LOD representation that iteratively selects optimal subsets of Gaussians based on camera distance, thus largely reducing both rendering time and GPU memory usage. We construct each LOD level by applying a depth-aware 3D smoothing filter, followed by importance-based pruning and fine-tuning to maintain visual fidelity. To further reduce memory overhead, we partition the scene into spatial chunks and dynamically load only relevant Gaussians during rendering, employing an opacity-blending mechanism to avoid visual artifacts at chunk boundaries. Our method achieves state-of-the-art performance on both outdoor (Hierarchical 3DGS) and indoor (Zip-NeRF) datasets, delivering high-quality renderings with reduced latency and memory requirements.\n\n本工作提出了一种新颖的三维高斯投影（3D Gaussian Splatting）细节层次（Level-of-Detail, LOD）方法，可在内存受限设备上实现大规模场景的实时渲染。我们的方法引入了分层 LOD 表示，通过依据相机距离迭代选择最优高斯子集，从而显著减少渲染时间与 GPU 内存占用。\n具体而言，我们为每一级 LOD 构建表示时，首先应用一个具备深度感知的三维平滑滤波器，随后进行基于重要性的剪枝与精调，以保持视觉保真度。为进一步降低内存开销，我们将场景划分为多个空间块，并在渲染过程中仅动态加载相关高斯基元，同时引入透明度混合机制以避免块边界处的视觉伪影。\n在户外（Hierarchical 3DGS）和室内（Zip-NeRF）数据集上的实验结果表明，该方法在降低延迟与内存占用的同时，仍能实现高质量渲染，达到了当前最先进的性能水平。\n"
  },
  {
    "path": "abs/2505.23280.md",
    "content": "### Holistic Large-Scale Scene Reconstruction via Mixed Gaussian Splatting\n\nRecent advances in 3D Gaussian Splatting have shown remarkable potential for novel view synthesis. However, most existing large-scale scene reconstruction methods rely on the divide-and-conquer paradigm, which often leads to the loss of global scene information and requires complex parameter tuning due to scene partitioning and local optimization. To address these limitations, we propose MixGS, a novel holistic optimization framework for large-scale 3D scene reconstruction. MixGS models the entire scene holistically by integrating camera pose and Gaussian attributes into a view-aware representation, which is decoded into fine-detailed Gaussians. Furthermore, a novel mixing operation combines decoded and original Gaussians to jointly preserve global coherence and local fidelity. Extensive experiments on large-scale scenes demonstrate that MixGS achieves state-of-the-art rendering quality and competitive speed, while significantly reducing computational requirements, enabling large-scale scene reconstruction training on a single 24GB VRAM GPU.\n\n近年来，3D Gaussian Splatting 在新视角合成方面展现出显著潜力。然而，大多数现有的大规模场景重建方法仍依赖于“分而治之”的范式，这种方式通常导致全局场景信息的丢失，并因场景划分与局部优化而需要复杂的参数调优。为了解决这些限制，我们提出了一种新颖的大规模三维场景重建整体优化框架——MixGS。MixGS 通过将相机位姿与高斯属性整合为一种面向视图的表示，实现对整个场景的整体建模，该表示随后被解码为具有精细细节的高斯点。此外，我们设计了一种新颖的混合操作，将解码后的高斯与原始高斯结合，有效地同时保持全局一致性与局部细节保真。我们在多个大规模场景上的广泛实验表明，MixGS 在保持先进渲染质量与具有竞争力的速度的同时，显著降低了计算资源需求，使得在单块 24GB 显存的 GPU 上训练大规模场景重建成为可能。\n"
  },
  {
    "path": "abs/2505.23716.md",
    "content": "### AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views\n\nWe introduce AnySplat, a feed-forward network for novel-view synthesis from uncalibrated image collections. In contrast to traditional neural-rendering pipelines that demand known camera poses and per-scene optimization, or recent feed-forward methods that buckle under the computational weight of dense views—our model predicts everything in one shot. A single forward pass yields a set of 3D Gaussian primitives encoding both scene geometry and appearance, and the corresponding camera intrinsics and extrinsics for each input image. This unified design scales effortlessly to casually captured, multi-view datasets without any pose annotations. In extensive zero-shot evaluations, AnySplat matches the quality of pose-aware baselines in both sparse- and dense-view scenarios while surpassing existing pose-free approaches. Moreover, it greatly reduce rendering latency compared to optimization-based neural fields, bringing real-time novel-view synthesis within reach for unconstrained capture settings.\n\n我们提出了 AnySplat，一种用于新视角合成的前馈网络，可从无标定图像集合中直接进行推理。与传统神经渲染方法依赖已知相机位姿及每个场景的优化不同，也区别于近年来在密集视图下计算开销巨大的前馈方法，AnySplat 能够一键完成全部预测。通过一次前向传播，模型即可输出一组编码了场景几何与外观的三维高斯图元（3D Gaussian primitives），以及每张输入图像的相应相机内参与外参。\n这一统一设计可无缝扩展至无姿态标注的随手拍摄的多视角数据集，无需额外预处理。在大量零样本测试中，AnySplat 在稀疏视图和密集视图场景中均能达到与依赖位姿的基线方法相媲美的渲染质量，同时显著超越现有无需位姿的方案。\n此外，与基于优化的神经场方法相比，AnySplat 大幅降低了渲染延迟，使得在非约束采集环境中实现实时新视角合成成为可能。\n"
  },
  {
    "path": "abs/2505.23734.md",
    "content": "### ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS\n\nFeed-forward 3D Gaussian Splatting (3DGS) models have recently emerged as a promising solution for novel view synthesis, enabling one-pass inference without the need for per-scene 3DGS optimization. However, their scalability is fundamentally constrained by the limited capacity of their encoders, leading to degraded performance or excessive memory consumption as the number of input views increases. In this work, we analyze feed-forward 3DGS frameworks through the lens of the Information Bottleneck principle and introduce ZPressor, a lightweight architecture-agnostic module that enables efficient compression of multi-view inputs into a compact latent state Z that retains essential scene information while discarding redundancy. Concretely, ZPressor enables existing feed-forward 3DGS models to scale to over 100 input views at 480P resolution on an 80GB GPU, by partitioning the views into anchor and support sets and using cross attention to compress the information from the support views into anchor views, forming the compressed latent state Z. We show that integrating ZPressor into several state-of-the-art feed-forward 3DGS models consistently improves performance under moderate input views and enhances robustness under dense view settings on two large-scale benchmarks DL3DV-10K and RealEstate10K.\n\n前馈式的 3D 高斯投影（3D Gaussian Splatting，3DGS）模型近年来成为新视角合成的一种有前景的解决方案，能够在无需对每个场景单独优化的情况下实现一次性推理。然而，这类方法的可扩展性受到其编码器容量的根本限制——当输入视图数量增加时，要么性能下降，要么内存消耗剧增。\n在本研究中，我们从信息瓶颈（Information Bottleneck）原理的角度对前馈式 3DGS 框架进行了分析，并提出了 ZPressor：一个轻量级、与架构无关的模块，能够将多视图输入有效压缩为一个紧凑的潜在状态 $Z$，在保留关键信息的同时去除冗余内容。\n具体来说，ZPressor 通过将输入视图划分为锚视图（anchor views）和支撑视图（support views），并利用**跨注意力机制（cross attention）**将支撑视图中的信息压缩到锚视图中，从而形成潜在状态 $Z$。借助 ZPressor，现有前馈式 3DGS 模型可以在单个 80GB 显存的 GPU 上扩展到处理 超过 100 张 480P 输入图像。\n实验表明，在多个先进的前馈式 3DGS 模型中集成 ZPressor，能够在中等视图数量条件下显著提升性能，并在密集视图设置下增强模型鲁棒性。我们在两个大规模基准数据集——DL3DV-10K 和 RealEstate10K 上验证了这一点。\n"
  },
  {
    "path": "abs/2505.24053.md",
    "content": "### 3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians\n\n3D Gaussian Splatting (3DGS) marks a significant milestone in balancing the quality and efficiency of differentiable rendering. However, its high efficiency stems from an approximation of projecting 3D Gaussians onto the image plane as 2D Gaussians, which inherently limits rendering quality--particularly under large Field-of-View (FoV) camera inputs. While several recent works have extended 3DGS to mitigate these approximation errors, none have successfully achieved both exactness and high efficiency simultaneously. In this work, we introduce 3DGEER, an Exact and Efficient Volumetric Gaussian Rendering method. Starting from first principles, we derive a closed-form expression for the density integral along a ray traversing a 3D Gaussian distribution. This formulation enables precise forward rendering with arbitrary camera models and supports gradient-based optimization of 3D Gaussian parameters. To ensure both exactness and real-time performance, we propose an efficient method for computing a tight Particle Bounding Frustum (PBF) for each 3D Gaussian, enabling accurate and efficient ray-Gaussian association. We also introduce a novel Bipolar Equiangular Projection (BEAP) representation to accelerate ray association under generic camera models. BEAP further provides a more uniform ray sampling strategy to apply supervision, which empirically improves reconstruction quality. Experiments on multiple pinhole and fisheye datasets show that our method consistently outperforms prior methods, establishing a new state-of-the-art in real-time neural rendering.\n\n3D 高斯投影（3D Gaussian Splatting，简称 3DGS）在可微渲染领域中实现了质量与效率之间的显著平衡，标志着一个重要的里程碑。然而，其高效性来自于一种近似做法：将三维高斯投影到图像平面时近似为二维高斯。这一近似在本质上限制了渲染质量，尤其是在大视场角（FoV）相机输入的情况下表现不佳。尽管已有多项研究尝试扩展 3DGS 以缓解该近似误差，但目前尚无方法能同时兼顾精确性与高效率。\n在本研究中，我们提出了 3DGEER：一种精确且高效的体积高斯渲染方法（Exact and Efficient Volumetric Gaussian Rendering）。从最基本的原理出发，我们推导出一条光线穿越三维高斯分布时，其密度积分的闭式表达式，从而实现了对任意相机模型的精确前向渲染，并支持对 3D 高斯参数的梯度优化。\n为同时保证精确性和实时性能，我们还提出了一种高效算法，用于为每个 3D 高斯构建紧致的粒子视锥体（Particle Bounding Frustum, PBF），从而实现准确而高效的光线与高斯的关联。此外，我们引入了一种新颖的表示方法——双极等角投影（Bipolar Equiangular Projection, BEAP），用于加速在通用相机模型下的光线关联。BEAP 同时提供了更均匀的光线采样策略，以进行监督，实证上提升了重建质量。\n在多个针孔相机和鱼眼相机数据集上的实验结果表明，我们的方法在实时神经渲染中持续优于现有方法，确立了新的最先进水平。\n"
  },
  {
    "path": "abs/2505.24608.md",
    "content": "### GARLIC: GAussian Representation LearnIng for spaCe partitioning\n\nWe introduce GARLIC (GAussian Representation LearnIng for spaCe partitioning), a novel indexing structure based on N-dimensional Gaussians for efficiently learning high-dimensional vector spaces. Our approach is inspired from Gaussian splatting techniques, typically used in 3D rendering, which we adapt for high-dimensional search and classification. We optimize Gaussian parameters using information-theoretic objectives that balance coverage, assignment confidence, and structural and semantic consistency. A key contribution is to progressively refine the representation through split and clone operations, handling hundreds of dimensions, thus handling varying data densities. GARLIC offers the fast building times of traditional space partitioning methods (e.g., under ∼ 5 min build time for SIFT1M) while achieving ∼ 50% Recall10@10 in low-candidate regimes. Experimental results on standard benchmarks demonstrate our method’s consistency in (a) k-NN retrieval, outperforming methods, such as Faiss-IVF, in fast-recall by using about half their probes for the same Recall10@10 in Fashion-MNIST, and (b) in classification tasks, beating by ∼ 15% accuracy other majority voting methods. Further, we show strong generalization capabilities, maintaining high accuracy even with downsampled training data: using just 1% of the training data returns ∼ 45% Recall@1, thus making GARLIC quite powerful for applications requiring both speed and accuracy.\n\n我们提出了 GARLIC，这是一种基于 N 维高斯分布的新型索引结构，能够高效地学习高维向量空间。我们的方法受到高斯投影（Gaussian Splatting）技术的启发，该技术通常用于 3D 渲染，而我们将其适配用于高维空间中的搜索与分类任务。我们通过信息论目标函数优化高斯参数，在覆盖性、分配置信度、以及结构与语义一致性之间实现平衡。我们的一个关键贡献是通过**分裂（split）和克隆（clone）**操作逐步细化表示，从而能够处理数百维特征并适应不同的数据密度。GARLIC 具有传统空间划分方法的快速构建能力（例如在 SIFT1M 上构建时间小于约 5 分钟），同时在候选样本数量较少的场景中达成约 50% 的 Recall10@10。在标准基准测试中的实验结果表明，我们的方法在以下方面表现出一致性：(a) k 近邻检索（k-NN retrieval）：在快速召回任务中优于 Faiss-IVF 等方法，在 Fashion-MNIST 上以大约一半的查询次数实现相同的 Recall10@10；(b) 分类任务（classification tasks）：在多数投票策略下准确率高出其他方法约 15%。此外，我们展示了其强大的泛化能力：即便在训练数据被下采样的情况下仍能保持较高准确率——使用仅 1% 的训练数据即可达到约 45% 的 Recall@1。因此，GARLIC 在对速度与准确性均有要求的应用中展现出强大潜力。\n"
  },
  {
    "path": "abs/2505.24746.md",
    "content": "### Tackling View-Dependent Semantics in 3D Language Gaussian Splatting\n\nRecent advancements in 3D Gaussian Splatting (3D-GS) enable high-quality 3D scene reconstruction from RGB images. Many studies extend this paradigm for language-driven open-vocabulary scene understanding. However, most of them simply project 2D semantic features onto 3D Gaussians and overlook a fundamental gap between 2D and 3D understanding: a 3D object may exhibit various semantics from different viewpoints--a phenomenon we term view-dependent semantics. To address this challenge, we propose LaGa (Language Gaussians), which establishes cross-view semantic connections by decomposing the 3D scene into objects. Then, it constructs view-aggregated semantic representations by clustering semantic descriptors and reweighting them based on multi-view semantics. Extensive experiments demonstrate that LaGa effectively captures key information from view-dependent semantics, enabling a more comprehensive understanding of 3D scenes. Notably, under the same settings, LaGa achieves a significant improvement of +18.7% mIoU over the previous SOTA on the LERF-OVS dataset.\n\n近年来，3D Gaussian Splatting（3D-GS） 在从 RGB 图像进行高质量三维场景重建方面取得了显著进展。许多研究进一步将该范式扩展至语言驱动的开放词汇三维场景理解。然而，大多数方法仅仅是将二维语义特征投影到三维高斯上，忽略了二维与三维语义理解之间的一个基本鸿沟：一个三维物体在不同视角下可能呈现出不同的语义——我们称之为视角相关语义（view-dependent semantics）。\n为应对这一挑战，我们提出了 LaGa（Language Gaussians）。该方法通过将三维场景分解为不同对象，建立跨视角的语义关联；随后，它对语义描述符进行聚类，并基于多视角语义对其进行重加权，从而构建出视角聚合语义表示（view-aggregated semantic representations）。\n大量实验表明，LaGa 能够有效捕捉视角相关语义中的关键信息，实现对三维场景更全面的理解。值得注意的是，在相同实验设置下，LaGa 在 LERF-OVS 数据集上将此前 SOTA 的 mIoU 提升了 18.7 个百分点，展现出显著优势。\n"
  },
  {
    "path": "abs/2505.24796.md",
    "content": "### TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores\n\n3D Gaussian Splatting (3DGS) renders pixels by rasterizing Gaussian primitives, where conditional alpha-blending dominates the time cost in the rendering pipeline. This paper proposes TC-GS, an algorithm-independent universal module that expands Tensor Core (TCU) applicability for 3DGS, leading to substantial speedups and seamless integration into existing 3DGS optimization frameworks. The key innovation lies in mapping alpha computation to matrix multiplication, fully utilizing otherwise idle TCUs in existing 3DGS implementations. TC-GS provides plug-and-play acceleration for existing top-tier acceleration algorithms tightly coupled with rendering pipeline designs, like Gaussian compression and redundancy elimination algorithms. Additionally, we introduce a global-to-local coordinate transformation to mitigate rounding errors from quadratic terms of pixel coordinates caused by Tensor Core half-precision computation. Extensive experiments demonstrate that our method maintains rendering quality while providing an additional 2.18x speedup over existing Gaussian acceleration algorithms, thus reaching up to a total 5.6x acceleration.\n\n3D Gaussian Splatting（3DGS）通过栅格化高斯图元来渲染像素，其中条件 alpha 混合在渲染管线中占据了主要的时间开销。本文提出了 TC-GS，一个与算法无关的通用模块，能够扩展 Tensor Core（TCU）在 3DGS 中的适用性，从而实现显著加速，并可无缝集成至现有的 3DGS 优化框架中。\n其核心创新在于：将 alpha 计算映射为矩阵乘法，从而充分利用现有 3DGS 实现中原本处于空闲状态的 Tensor Core 单元（TCU）。TC-GS 为当前那些与渲染管线高度耦合的顶级加速算法（如高斯压缩与冗余消除算法）提供了即插即用的加速能力。\n此外，我们引入了一种全局到局部的坐标变换策略，以缓解 Tensor Core 半精度计算中，由像素坐标二次项引起的舍入误差问题。\n大量实验证明，TC-GS 在保持渲染质量不变的前提下，相较现有高斯加速算法带来额外 2.18 倍的加速效果，整体实现最高 5.6 倍的加速比。\n"
  },
  {
    "path": "abs/2505.24877.md",
    "content": "### AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion\n\nExisting methods for image-to-3D avatar generation struggle to produce highly detailed, animation-ready avatars suitable for real-world applications. We introduce AdaHuman, a novel framework that generates high-fidelity animatable 3D avatars from a single in-the-wild image. AdaHuman incorporates two key innovations: (1) A pose-conditioned 3D joint diffusion model that synthesizes consistent multi-view images in arbitrary poses alongside corresponding 3D Gaussian Splats (3DGS) reconstruction at each diffusion step; (2) A compositional 3DGS refinement module that enhances the details of local body parts through image-to-image refinement and seamlessly integrates them using a novel crop-aware camera ray map, producing a cohesive detailed 3D avatar. These components allow AdaHuman to generate highly realistic standardized A-pose avatars with minimal self-occlusion, enabling rigging and animation with any input motion. Extensive evaluation on public benchmarks and in-the-wild images demonstrates that AdaHuman significantly outperforms state-of-the-art methods in both avatar reconstruction and reposing.\n\n现有的图像到三维头像生成方法在生成具有高度细节、可用于动画制作的三维头像方面仍存在明显不足，难以满足现实应用需求。我们提出了 AdaHuman，一个能够从单张自然图像中生成高保真、可动画化三维头像的全新框架。\n\nAdaHuman 包含两个关键创新点：\n(1)\t姿态条件 3D 关节点扩散模型：该模型可在任意姿态下合成一致的多视角图像，并在每一步扩散过程中同步完成对应的 3D Gaussian Splatting（3DGS）重建；\n(2)\t组合式 3DGS 细化模块：该模块通过图像到图像的细化方式增强局部身体部位的细节，并结合一种新颖的**裁剪感知相机光线映射（crop-aware camera ray map）**机制，实现各局部区域的无缝整合，最终生成一个结构完整、细节丰富的三维头像。\n得益于以上设计，AdaHuman 能够生成高度写实的标准 A 姿态头像，自遮挡极小，便于进行骨骼绑定与任意动作的动画驱动。\n在多个公开基准与真实自然图像上的广泛评估表明，AdaHuman 在头像重建与姿态迁移两个方面均显著优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2506.00225.md",
    "content": "### Understanding while Exploring: Semantics-driven Active Mapping\n\nEffective robotic autonomy in unknown environments demands proactive exploration and precise understanding of both geometry and semantics. In this paper, we propose ActiveSGM, an active semantic mapping framework designed to predict the informativeness of potential observations before execution. Built upon a 3D Gaussian Splatting (3DGS) mapping backbone, our approach employs semantic and geometric uncertainty quantification, coupled with a sparse semantic representation, to guide exploration. By enabling robots to strategically select the most beneficial viewpoints, ActiveSGM efficiently enhances mapping completeness, accuracy, and robustness to noisy semantic data, ultimately supporting more adaptive scene exploration. Our experiments on the Replica and Matterport3D datasets highlight the effectiveness of ActiveSGM in active semantic mapping tasks.\n\n在未知环境中实现高效的机器人自主性，需要具备主动探索能力以及对几何结构和语义信息的精确理解。本文提出了 ActiveSGM，一种用于主动语义建图的框架，能够在执行之前预测潜在观测信息的价值。\n该方法基于 3D Gaussian Splatting（3DGS） 映射主干，结合语义与几何不确定性量化机制，并引入稀疏语义表示，以指导机器人的探索行为。通过使机器人有策略地选择信息价值最高的视角，ActiveSGM 能够高效提升建图的完整性、精度，以及对语义噪声的鲁棒性，从而支持更加自适应的场景探索过程。\n在 Replica 和 Matterport3D 数据集上的实验结果表明，ActiveSGM 在主动语义建图任务中表现出色，验证了其方法的有效性。\n"
  },
  {
    "path": "abs/2506.00271.md",
    "content": "### Adaptive Voxelization for Transform coding of 3D Gaussian splatting data\n\nWe present a novel compression framework for 3D Gaussian splatting (3DGS) data that leverages transform coding tools originally developed for point clouds. Contrary to existing 3DGS compression methods, our approach can produce compressed 3DGS models at multiple bitrates in a computationally efficient way. Point cloud voxelization is a discretization technique that point cloud codecs use to improve coding efficiency while enabling the use of fast transform coding algorithms. We propose an adaptive voxelization algorithm tailored to 3DGS data, to avoid the inefficiencies introduced by uniform voxelization used in point cloud codecs. We ensure the positions of larger volume Gaussians are represented at high resolution, as these significantly impact rendering quality. Meanwhile, a low-resolution representation is used for dense regions with smaller Gaussians, which have a relatively lower impact on rendering quality. This adaptive voxelization approach significantly reduces the number of Gaussians and the bitrate required to encode the 3DGS data. After voxelization, many Gaussians are moved or eliminated. Thus, we propose to fine-tune/recolor the remaining 3DGS attributes with an initialization that can reduce the amount of retraining required. Experimental results on pre-trained datasets show that our proposed compression framework outperforms existing methods.\n\n我们提出了一种用于 3D Gaussian Splatting（3DGS） 数据的全新压缩框架，借助最初为点云开发的变换编码技术（transform coding tools）实现高效压缩。与现有 3DGS 压缩方法不同，我们的方法能够在多个比特率下生成压缩的 3DGS 模型，且计算效率高。\n点云体素化（voxelization） 是点云编码器常用的一种离散化技术，可提升编码效率并支持快速的变换编码算法。针对 3DGS 数据的特性，我们提出了一种自适应体素化算法，以避免点云编码器中统一体素化所带来的效率低下问题。\n具体而言，我们对体积较大的高斯分布使用高分辨率体素表示，因为它们对最终渲染质量影响显著；而对于密集区域中体积较小的高斯分布，则采用低分辨率表示，因为它们对渲染质量的影响相对较小。该自适应体素化策略显著减少了高斯数量及编码所需的比特率。\n在体素化之后，许多高斯分布被移动或删除。因此，我们进一步提出对剩余的 3DGS 属性进行微调/重新上色（fine-tune/recolor），并采用一种低重训练量的初始化方式来降低重新训练的代价。\n在多个预训练数据集上的实验结果表明，我们提出的压缩框架在性能上优于现有方法。\n"
  },
  {
    "path": "abs/2506.00280.md",
    "content": "### 3D Gaussian Splat Vulnerabilities\n\nWith 3D Gaussian Splatting (3DGS) being increasingly used in safety-critical applications, how can an adversary manipulate the scene to cause harm? We introduce CLOAK, the first attack that leverages view-dependent Gaussian appearances - colors and textures that change with viewing angle - to embed adversarial content visible only from specific viewpoints. We further demonstrate DAGGER, a targeted adversarial attack directly perturbing 3D Gaussians without access to underlying training data, deceiving multi-stage object detectors e.g., Faster R-CNN, through established methods such as projected gradient descent. These attacks highlight underexplored vulnerabilities in 3DGS, introducing a new potential threat to robotic learning for autonomous navigation and other safety-critical 3DGS applications.\n\n随着 3D Gaussian Splatting（3DGS） 日益应用于安全关键场景，攻击者是否能操纵场景以造成危害成为一个重要问题。我们提出了 CLOAK，这是首个利用 视角相关高斯外观（view-dependent Gaussian appearances） 实现攻击的方法——即颜色与纹理随视角变化，从而将对抗内容嵌入到特定视角下才可见的场景中。\n我们进一步提出了 DAGGER，一种有目标的对抗性攻击方式，直接扰动 3D 高斯，且无需访问底层训练数据。该攻击通过诸如**投影梯度下降（projected gradient descent）**等现有方法，对多阶段目标检测器（如 Faster R-CNN）实施误导。\n这些攻击揭示了目前 3DGS 安全性研究中的重要盲区，为机器人导航等安全关键任务中的学习系统带来了新的潜在威胁。\n"
  },
  {
    "path": "abs/2506.00970.md",
    "content": "### Globally Consistent RGB-D SLAM with 2D Gaussian Splatting\n\nRecently, 3D Gaussian splatting-based RGB-D SLAM displays remarkable performance of high-fidelity 3D reconstruction. However, the lack of depth rendering consistency and efficient loop closure limits the quality of its geometric reconstructions and its ability to perform globally consistent mapping online. In this paper, we present 2DGS-SLAM, an RGB-D SLAM system using 2D Gaussian splatting as the map representation. By leveraging the depth-consistent rendering property of the 2D variant, we propose an accurate camera pose optimization method and achieve geometrically accurate 3D reconstruction. In addition, we implement efficient loop detection and camera relocalization by leveraging MASt3R, a 3D foundation model, and achieve efficient map updates by maintaining a local active map. Experiments show that our 2DGS-SLAM approach achieves superior tracking accuracy, higher surface reconstruction quality, and more consistent global map reconstruction compared to existing rendering-based SLAM methods, while maintaining high-fidelity image rendering and improved computational efficiency.\n\n近年来，基于 3D Gaussian Splatting 的 RGB-D SLAM 在高保真三维重建方面展现出显著性能。然而，深度渲染一致性的缺失以及回环检测效率低下限制了其几何重建质量和在线实现全局一致建图的能力。\n本文提出了 2DGS-SLAM，一种以 2D Gaussian Splatting 作为地图表示的 RGB-D SLAM 系统。借助该 2D 变体在深度渲染上一致性的特性，我们设计了一种精确的相机位姿优化方法，从而实现了更高几何精度的三维重建。\n此外，我们结合 MASt3R（一种 3D 基础模型）实现了高效的回环检测与相机重定位，并通过维护**局部活动地图（local active map）**提升了地图更新效率。\n实验表明，2DGS-SLAM 在与现有基于渲染的 SLAM 方法对比时，展现出更强的跟踪精度、更高的表面重建质量以及更一致的全局地图构建能力，同时保持了高质量的图像渲染效果和更优的计算效率。\n"
  },
  {
    "path": "abs/2506.01091.md",
    "content": "### PromptVFX: Text-Driven Fields for Open-World 3D Gaussian Animation\n\nVisual effects (VFX) are key to immersion in modern films, games, and AR/VR. Creating 3D effects requires specialized expertise and training in 3D animation software and can be time consuming. Generative solutions typically rely on computationally intense methods such as diffusion models which can be slow at 4D inference. We reformulate 3D animation as a field prediction task and introduce a text-driven framework that infers a time-varying 4D flow field acting on 3D Gaussians. By leveraging large language models (LLMs) and vision-language models (VLMs) for function generation, our approach interprets arbitrary prompts (e.g., \"make the vase glow orange, then explode\") and instantly updates color, opacity, and positions of 3D Gaussians in real time. This design avoids overheads such as mesh extraction, manual or physics-based simulations and allows both novice and expert users to animate volumetric scenes with minimal effort on a consumer device even in a web browser. Experimental results show that simple textual instructions suffice to generate compelling time-varying VFX, reducing the manual effort typically required for rigging or advanced modeling. We thus present a fast and accessible pathway to language-driven 3D content creation that can pave the way to democratize VFX further.\n\n视觉特效（VFX）是现代电影、游戏和 AR/VR 中沉浸感的关键。然而，3D 特效的制作通常依赖于专业的 3D 动画软件和复杂的操作流程，需要大量专业知识与训练，耗时且成本高。现有生成式解决方案大多依赖计算密集型方法（如扩散模型），在执行 4D 推理时速度较慢。\n本文将 3D 动画重新表述为一个场预测任务，并提出一个文本驱动的框架，可对作用于 3D 高斯的时变 4D 流场进行推理。该方法利用大语言模型（LLMs）与视觉语言模型（VLMs）进行函数生成，能够理解任意自然语言提示（如：“让花瓶变成橙色发光，然后爆炸”），并实时更新 3D 高斯的颜色、不透明度和位置。\n该设计规避了网格提取、手工建模或基于物理的模拟等传统开销，使得无论是初学者还是专家用户，都能在消费级设备（甚至是网页浏览器中）以最小成本创建体积动画场景。\n实验结果表明，仅凭简单的文本指令即可生成令人信服的时变视觉特效，大幅降低了以往对绑定骨骼（rigging）或高级建模的手动依赖。因此，我们提供了一种快速且易用的语言驱动 3D 内容创作路径，为视觉特效的普及与民主化铺平了道路。\n"
  },
  {
    "path": "abs/2506.01109.md",
    "content": "### CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting\n\nAccurate fruit counting in real-world agricultural environments is a longstanding challenge due to visual occlusions, semantic ambiguity, and the high computational demands of 3D reconstruction. Existing methods based on neural radiance fields suffer from low inference speed, limited generalization, and lack support for open-set semantic control. This paper presents FruitLangGS, a real-time 3D fruit counting framework that addresses these limitations through spatial reconstruction, semantic embedding, and language-guided instance estimation. FruitLangGS first reconstructs orchard-scale scenes using an adaptive Gaussian splatting pipeline with radius-aware pruning and tile-based rasterization for efficient rendering. To enable semantic control, each Gaussian encodes a compressed CLIP-aligned language embedding, forming a compact and queryable 3D representation. At inference time, prompt-based semantic filtering is applied directly in 3D space, without relying on image-space segmentation or view-level fusion. The selected Gaussians are then converted into dense point clouds via distribution-aware sampling and clustered to estimate fruit counts. Experimental results on real orchard data demonstrate that FruitLangGS achieves higher rendering speed, semantic flexibility, and counting accuracy compared to prior approaches, offering a new perspective for language-driven, real-time neural rendering across open-world scenarios.\n\n在真实农业环境中实现精确的果实计数长期以来面临诸多挑战，如视觉遮挡、语义模糊，以及三维重建所需的高计算开销。现有基于神经辐射场（NeRF）的方法存在推理速度慢、泛化能力有限以及缺乏开放集语义控制等问题。\n本文提出 FruitLangGS，一个面向实时 3D 果实计数的框架，结合空间重建、语义嵌入与语言引导的实例估计，系统性地解决了上述瓶颈。\n具体而言，FruitLangGS 首先采用自适应高斯投影（adaptive Gaussian splatting）管线对果园级场景进行重建，并通过基于半径的剪枝与基于切片的光栅化实现高效渲染。为了支持语义控制，每个高斯编码一个经过压缩的、与 CLIP 对齐的语言嵌入，形成紧凑可查询的 3D 表示。\n在推理阶段，系统在三维空间中直接进行基于文本提示的语义筛选，无需依赖图像空间分割或视角级融合。筛选得到的高斯随后通过**分布感知采样（distribution-aware sampling）**转化为稠密点云，并通过聚类完成果实计数估计。\n在真实果园数据上的实验结果表明，FruitLangGS 相较现有方法在渲染速度、语义灵活性和计数精度方面均表现优异，展示了其作为一种语言驱动、实时神经渲染方法在开放世界场景中的广泛潜力。\n"
  },
  {
    "path": "abs/2506.01379.md",
    "content": "### RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes\n\nHigh-Fidelity 3D scene reconstruction plays a crucial role in autonomous driving by enabling novel data generation from existing datasets. This allows simulating safety-critical scenarios and augmenting training datasets without incurring further data collection costs. While recent advances in radiance fields have demonstrated promising results in 3D reconstruction and sensor data synthesis using cameras and LiDAR, their potential for radar remains largely unexplored. Radar is crucial for autonomous driving due to its robustness in adverse weather conditions like rain, fog, and snow, where optical sensors often struggle. Although the state-of-the-art radar-based neural representation shows promise for 3D driving scene reconstruction, it performs poorly in scenarios with significant radar noise, including receiver saturation and multipath reflection. Moreover, it is limited to synthesizing preprocessed, noise-excluded radar images, failing to address realistic radar data synthesis. To address these limitations, this paper proposes RadarSplat, which integrates Gaussian Splatting with novel radar noise modeling to enable realistic radar data synthesis and enhanced 3D reconstruction. Compared to the state-of-the-art, RadarSplat achieves superior radar image synthesis (+3.4 PSNR / 2.6x SSIM) and improved geometric reconstruction (-40% RMSE / 1.5x Accuracy), demonstrating its effectiveness in generating high-fidelity radar data and scene reconstruction.\n\n高保真三维场景重建在自动驾驶中具有关键作用，它能够基于现有数据集生成新数据，从而用于模拟关键安全场景，并在无需额外采集数据的前提下扩充训练数据集。尽管近期基于辐射场的研究在结合摄像头与激光雷达的数据进行三维重建与传感器数据合成方面取得了显著进展，但其在雷达数据上的潜力仍 largely unexplored。雷达在自动驾驶中尤为重要，因为它在雨、雾、雪等恶劣天气条件下表现出极强的鲁棒性，而光学传感器在这些场景下往往难以胜任。\n尽管当前最先进的基于雷达的神经表示方法在三维驾驶场景重建方面展现出一定潜力，但在存在严重雷达噪声的情况下（如接收器饱和、多径反射等）性能显著下降。此外，该方法仅限于合成经过预处理且去噪后的雷达图像，无法模拟真实的雷达数据分布。\n为克服上述限制，本文提出RadarSplat方法，将高斯点渲染（Gaussian Splatting）与创新的雷达噪声建模相结合，支持真实雷达数据合成并提升三维重建质量。与当前最先进方法相比，RadarSplat 在雷达图像合成方面实现了更优的性能（PSNR 提高 3.4，SSIM 提高 2.6 倍），在几何重建方面也显著改善（RMSE 降低 40%，精度提升 1.5 倍），充分验证了其在高保真雷达数据生成与场景重建方面的有效性。\n"
  },
  {
    "path": "abs/2506.01799.md",
    "content": "### WorldExplorer: Towards Generating Fully Navigable 3D Scenes\n\nGenerating 3D worlds from text is a highly anticipated goal in computer vision. Existing works are limited by the degree of exploration they allow inside of a scene, i.e., produce streched-out and noisy artifacts when moving beyond central or panoramic perspectives. To this end, we propose WorldExplorer, a novel method based on autoregressive video trajectory generation, which builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We initialize our scenes by creating multi-view consistent images corresponding to a 360 degree panorama. Then, we expand it by leveraging video diffusion models in an iterative scene generation pipeline. Concretely, we generate multiple videos along short, pre-defined trajectories, that explore the scene in depth, including motion around objects. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results, like moving into objects. Finally, we fuse all generated views into a unified 3D representation via 3D Gaussian Splatting optimization. Compared to prior approaches, WorldExplorer produces high-quality scenes that remain stable under large camera motion, enabling for the first time realistic and unrestricted exploration. We believe this marks a significant step toward generating immersive and truly explorable virtual 3D environments.\n\n从文本生成三维世界是计算机视觉领域备受期待的目标。然而，现有方法在场景可探索性方面存在显著限制 —— 一旦视角超出中心或全景视角范围，生成结果便会出现拉伸或噪声伪影等问题。为此，我们提出WorldExplorer，一种基于自回归视频轨迹生成的新方法，能够构建在广泛视角下视觉一致且可自由导航的三维场景。\n我们的方法首先通过生成多视角一致的图像，构建对应于360度全景的初始场景；随后，借助视频扩散模型，采用迭代式的场景生成流程对其进行扩展。具体而言，我们沿着预定义的短路径生成多个视频片段，以探索场景的深度结构，包括围绕物体的运动视角。我们设计的新型场景记忆机制能够为每段视频提供与之最相关的前视图作为条件，同时配备碰撞检测机制以防止出现如相机穿入物体等失真结果。\n最终，所有生成的视图将通过三维高斯点渲染（3D Gaussian Splatting）优化过程融合为统一的三维表示。与现有方法相比，WorldExplorer 能够生成在大范围相机运动下依然保持稳定高质量的场景，首次实现了逼真且不受限制的三维空间自由探索。\n我们相信，该方法标志着向沉浸式、真实可探索的虚拟三维环境生成迈出了重要一步。\n"
  },
  {
    "path": "abs/2506.01822.md",
    "content": "### GSCodec Studio: A Modular Framework for Gaussian Splat Compression\n\n3D Gaussian Splatting and its extension to 4D dynamic scenes enable photorealistic, real-time rendering from real-world captures, positioning Gaussian Splats (GS) as a promising format for next-generation immersive media. However, their high storage requirements pose significant challenges for practical use in sharing, transmission, and storage. Despite various studies exploring GS compression from different perspectives, these efforts remain scattered across separate repositories, complicating benchmarking and the integration of best practices. To address this gap, we present GSCodec Studio, a unified and modular framework for GS reconstruction, compression, and rendering. The framework incorporates a diverse set of 3D/4D GS reconstruction methods and GS compression techniques as modular components, facilitating flexible combinations and comprehensive comparisons. By integrating best practices from community research and our own explorations, GSCodec Studio supports the development of compact representation and compression solutions for static and dynamic Gaussian Splats, namely our Static and Dynamic GSCodec, achieving competitive rate-distortion performance in static and dynamic GS compression.\n\n3D Gaussian Splatting 及其对 4D 动态场景的扩展，使得从真实世界捕获数据中实现照片级真实感的实时渲染成为可能，因而被认为是下一代沉浸式媒体的重要表示格式之一。然而，其高存储需求在共享、传输与存储等实际应用中带来了严重挑战。尽管已有多项研究从不同角度探讨了 GS 的压缩问题，但这些工作分散在各个独立的代码库中，导致基准评估困难、最佳实践难以整合。\n为填补这一空白，本文提出 GSCodec Studio —— 一个用于 GS 重建、压缩与渲染的统一模块化框架。该框架集成了多种 3D/4D GS 重建方法与 GS 压缩技术，均以模块化形式实现，支持灵活组合与全面对比。通过融合社区已有研究成果与我们自身的探索，GSCodec Studio 支持针对静态与动态高斯点云的紧凑表示与压缩方案开发，分别构建了 Static GSCodec 与 Dynamic GSCodec，在静态与动态 GS 压缩任务中实现了具有竞争力的码率-失真性能（rate-distortion performance）。\n"
  },
  {
    "path": "abs/2506.02380.md",
    "content": "### EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR\n\n3D Gaussian Splatting (3DGS) is an emerging media representation that reconstructs real-world 3D scenes in high fidelity, enabling 6-degrees-of-freedom (6-DoF) navigation in virtual reality (VR). However, developing and evaluating 3DGS-enabled applications and optimizing their rendering performance, require realistic user navigation data. Such data is currently unavailable for photorealistic 3DGS reconstructions of real-world scenes. This paper introduces EyeNavGS (EyeNavGS), the first publicly available 6-DoF navigation dataset featuring traces from 46 participants exploring twelve diverse, real-world 3DGS scenes. The dataset was collected at two sites, using the Meta Quest Pro headsets, recording the head pose and eye gaze data for each rendered frame during free world standing 6-DoF navigation. For each of the twelve scenes, we performed careful scene initialization to correct for scene tilt and scale, ensuring a perceptually-comfortable VR experience. We also release our open-source SIBR viewer software fork with record-and-replay functionalities and a suite of utility tools for data processing, conversion, and visualization. The EyeNavGS dataset and its accompanying software tools provide valuable resources for advancing research in 6-DoF viewport prediction, adaptive streaming, 3D saliency, and foveated rendering for 3DGS scenes.\n\n3D Gaussian Splatting（3DGS）是一种新兴的媒体表示方式，能够以高保真度重建真实世界的三维场景，为虚拟现实（VR）中的六自由度（6-DoF）导航提供支持。然而，开发与评估基于 3DGS 的应用，以及优化其渲染性能，都依赖于真实的用户导航数据——而目前在真实三维场景的照片级 3DGS 重建中，尚未有相关数据集可用。\n为此，本文发布了 EyeNavGS：首个公开可用的六自由度导航数据集，记录了 46 名参与者在 12 个多样化真实 3DGS 场景中的探索轨迹。该数据集在两个不同场地采集，使用 Meta Quest Pro 头显，在用户自由站立式 6-DoF 导航过程中，逐帧记录了头部姿态和眼动视线数据。\n为确保每个场景提供舒适的 VR 体验，我们对十二个场景分别进行了精细初始化，包括倾斜与尺度校正。我们还发布了一个开源的 SIBR viewer 分支版本，支持导航记录与回放功能，并提供一套实用工具，用于数据处理、格式转换与可视化。\nEyeNavGS 数据集及其配套软件工具为 3DGS 场景中的以下研究提供了宝贵资源：六自由度视口预测、自适应流式传输、三维显著性建模以及注视点渲染（foveated rendering）等。\n"
  },
  {
    "path": "abs/2506.02741.md",
    "content": "### VTGaussian-SLAM: RGBD SLAM for Large Scale Scenes with Splatting View-Tied 3D Gaussians\n\nJointly estimating camera poses and mapping scenes from RGBD images is a fundamental task in simultaneous localization and mapping (SLAM). State-of-the-art methods employ 3D Gaussians to represent a scene, and render these Gaussians through splatting for higher efficiency and better rendering. However, these methods cannot scale up to extremely large scenes, due to the inefficient tracking and mapping strategies that need to optimize all 3D Gaussians in the limited GPU memories throughout the training to maintain the geometry and color consistency to previous RGBD observations. To resolve this issue, we propose novel tracking and mapping strategies to work with a novel 3D representation, dubbed view-tied 3D Gaussians, for RGBD SLAM systems. View-tied 3D Gaussians is a kind of simplified Gaussians, which is tied to depth pixels, without needing to learn locations, rotations, and multi-dimensional variances. Tying Gaussians to views not only significantly saves storage but also allows us to employ many more Gaussians to represent local details in the limited GPU memory. Moreover, our strategies remove the need of maintaining all Gaussians learnable throughout the training, while improving rendering quality, and tracking accuracy. We justify the effectiveness of these designs, and report better performance over the latest methods on the widely used benchmarks in terms of rendering and tracking accuracy and scalability.\n\n从 RGBD 图像中联合估计相机位姿与场景建图是同时定位与地图构建（SLAM）中的一项基础任务。当前最先进的方法采用 3D 高斯来表示场景，并通过 splatting 渲染方式提升渲染效率与质量。然而，这些方法在面对超大规模场景时无法扩展，原因在于：为了保持与先前 RGBD 观测之间的几何与颜色一致性，它们需要在整个训练过程中持续优化所有 3D 高斯，而 GPU 显存有限，导致跟踪与建图策略效率低下。\n为解决该问题，本文提出了适用于 RGBD SLAM 系统的新型跟踪与建图策略，并引入一种新的三维表示形式，称为 View-tied 3D Gaussians。这种高斯是一种简化形式，与深度图像像素绑定，不再需要学习其位置、旋转和多维方差。将高斯绑定到视图像素，既大大节省了存储开销，又使我们能够在有限的 GPU 显存中使用更多高斯来表示局部细节。\n此外，我们的方法还消除了在整个训练过程中保持所有高斯可学习的需求，同时提升了渲染质量与跟踪精度。我们通过实验证明了该设计的有效性，并在主流基准测试中，在渲染效果、跟踪精度及系统可扩展性方面，均优于最新方法。\n"
  },
  {
    "path": "abs/2506.02751.md",
    "content": "### RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS\n\n3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle with accurately modeling scenes affected by transient objects, leading to artifacts in the rendered images. We identify that the Gaussian densification process, while enhancing scene detail capture, unintentionally contributes to these artifacts by growing additional Gaussians that model transient disturbances. To address this, we propose RobustSplat, a robust solution based on two critical designs. First, we introduce a delayed Gaussian growth strategy that prioritizes optimizing static scene structure before allowing Gaussian splitting/cloning, mitigating overfitting to transient objects in early optimization. Second, we design a scale-cascaded mask bootstrapping approach that first leverages lower-resolution feature similarity supervision for reliable initial transient mask estimation, taking advantage of its stronger semantic consistency and robustness to noise, and then progresses to high-resolution supervision to achieve more precise mask prediction. Extensive experiments on multiple challenging datasets show that our method outperforms existing methods, clearly demonstrating the robustness and effectiveness of our method.\n\n3D Gaussian Splatting（3DGS）因其在新视角合成和三维建模中的实时、照片级真实感渲染能力而受到广泛关注。然而，现有方法在建模受瞬时物体影响的场景时表现不佳，导致渲染图像中出现伪影。我们发现，尽管高斯密化过程有助于捕捉场景细节，但其也会无意中引入建模瞬时干扰的额外高斯，从而加剧伪影问题。为了解决这一问题，我们提出了 RobustSplat，一个基于两项关键设计的鲁棒解决方案。\n首先，我们引入了一种延迟高斯生长策略，该策略优先优化静态场景结构，随后才允许高斯的拆分/克隆，从而在优化初期减少对瞬时物体的过拟合。其次，我们设计了一种尺度级联的掩码自举方法，该方法首先利用低分辨率特征相似性监督进行初始瞬时掩码的估计，借助其更强的语义一致性与抗噪性；随后再过渡到高分辨率监督，以实现更精确的掩码预测。\n在多个具有挑战性的数据集上的大量实验表明，我们的方法优于现有方法，清晰地展示了其鲁棒性与有效性。\n"
  },
  {
    "path": "abs/2506.02774.md",
    "content": "### Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone\n\n3D Gaussian Splatting (3DGS) is an emerging technique for photorealistic 3D scene rendering. However, rendering city-scale 3DGS scenes on mobile devices, e.g., your smartphones, remains a significant challenge due to the limited resources on mobile devices. A natural solution is to offload computation to the cloud; however, naively streaming rendered frames from the cloud to the client introduces high latency and requires bandwidth far beyond the capacity of current wireless networks.\nIn this paper, we propose an effective solution to enable city-scale 3DGS rendering on mobile devices. Our key insight is that, under normal user motion, the number of newly visible Gaussians per second remains roughly constant. Leveraging this, we stream only the necessary Gaussians to the client. Specifically, on the cloud side, we propose asynchronous level-of-detail search to identify the necessary Gaussians for the client. On the client side, we accelerate rendering via a lookup table-based rasterization. Combined with holistic runtime optimizations, our system can deliver low-latency, city-scale 3DGS rendering on mobile devices. Compared to existing solutions, Voyager achieves over 100× reduction on data transfer and up to 8.9× speedup while retaining comparable rendering quality.\n\n3D Gaussian Splatting（3DGS）是一种新兴的照片级真实感三维场景渲染技术。然而，在移动设备（例如智能手机）上渲染城市级规模的3DGS场景仍面临重大挑战，主要由于移动设备的资源有限。一个自然的解决方案是将计算任务卸载至云端；但若直接从云端向客户端串流渲染帧，会引入较高的延迟，并且对带宽的要求远超当前无线网络的承载能力。\n在本文中，我们提出了一种高效的解决方案，使移动设备能够实现城市级别的3DGS渲染。我们的关键洞察是：在用户正常移动的情况下，每秒新增可见的高斯数量大致保持恒定。基于此，我们仅向客户端传输必要的高斯。具体而言，在云端，我们提出了异步细节层级搜索（asynchronous level-of-detail search），以识别客户端所需的高斯；在客户端，我们通过基于查找表的光栅化方法加速渲染。结合整体运行时优化，我们的系统能够在移动设备上实现低延迟的城市级3DGS渲染。\n与现有解决方案相比，Voyager 在数据传输量上实现了超过 100 倍的减少，渲染速度提升最高可达 8.9 倍，同时保持了可比拟的渲染质量。\n"
  },
  {
    "path": "abs/2506.03073.md",
    "content": "### LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM\n\nModern Gaussian Splatting methods have proven highly effective for real-time photorealistic rendering of 3D scenes. However, integrating semantic information into this representation remains a significant challenge, especially in maintaining real-time performance for SLAM (Simultaneous Localization and Mapping) applications. In this work, we introduce LEG-SLAM -- a novel approach that fuses an optimized Gaussian Splatting implementation with visual-language feature extraction using DINOv2 followed by a learnable feature compressor based on Principal Component Analysis, while enabling an online dense SLAM. Our method simultaneously generates high-quality photorealistic images and semantically labeled scene maps, achieving real-time scene reconstruction with more than 10 fps on the Replica dataset and 18 fps on ScanNet. Experimental results show that our approach significantly outperforms state-of-the-art methods in reconstruction speed while achieving competitive rendering quality. The proposed system eliminates the need for prior data preparation such as camera's ego motion or pre-computed static semantic maps. With its potential applications in autonomous robotics, augmented reality, and other interactive domains, LEG-SLAM represents a significant step forward in real-time semantic 3D Gaussian-based SLAM.\n\n现代 Gaussian Splatting 方法已在三维场景的实时照片级真实感渲染中表现出极高的效果。然而，将语义信息融入此类表示仍面临重大挑战，尤其是在需要保持实时性能的 SLAM（同时定位与建图）应用中。\n在本文中，我们提出了 LEG-SLAM —— 一种融合优化后的 Gaussian Splatting 实现与视觉-语言特征提取的新方法。该方法采用 DINOv2 进行视觉-语言特征提取，并结合基于主成分分析（PCA）的可学习特征压缩器，同时支持在线稠密 SLAM。\n我们的方法可同时生成高质量的照片级图像与带有语义标签的场景地图，在 Replica 数据集上实现了超过 10 fps 的实时场景重建，在 ScanNet 上达到了 18 fps。实验结果表明，LEG-SLAM 在重建速度上显著优于现有最先进方法，同时在渲染质量方面也保持竞争力。\n该系统无需先验数据准备，如相机自运动或预计算的静态语义地图，具有良好的泛化性。LEG-SLAM 在自主机器人、增强现实以及其他交互式应用领域展现出巨大潜力，标志着基于高斯的实时语义 SLAM 迈出了重要一步。\n"
  },
  {
    "path": "abs/2506.03407.md",
    "content": "### Multi-Spectral Gaussian Splatting with Neural Color Representation\n\nWe present MS-Splatting -- a multi-spectral 3D Gaussian Splatting (3DGS) framework that is able to generate multi-view consistent novel views from images of multiple, independent cameras with different spectral domains. In contrast to previous approaches, our method does not require cross-modal camera calibration and is versatile enough to model a variety of different spectra, including thermal and near-infra red, without any algorithmic changes.\nUnlike existing 3DGS-based frameworks that treat each modality separately (by optimizing per-channel spherical harmonics) and therefore fail to exploit the underlying spectral and spatial correlations, our method leverages a novel neural color representation that encodes multi-spectral information into a learned, compact, per-splat feature embedding. A shallow multi-layer perceptron (MLP) then decodes this embedding to obtain spectral color values, enabling joint learning of all bands within a unified representation.\nOur experiments show that this simple yet effective strategy is able to improve multi-spectral rendering quality, while also leading to improved per-spectra rendering quality over state-of-the-art methods. We demonstrate the effectiveness of this new technique in agricultural applications to render vegetation indices, such as normalized difference vegetation index (NDVI).\n\n我们提出了 MS-Splatting —— 一个多光谱的 3D Gaussian Splatting（3DGS）框架，能够从多台独立摄像机拍摄、具有不同光谱域的图像中生成多视角一致的新视图。与以往方法不同，我们的方法无需跨模态相机标定，且具有足够的通用性，可适用于多种光谱类型（包括热红外和近红外）而无需进行算法修改。\n现有基于 3DGS 的方法通常将每种模态独立处理（通过对每个通道优化球谐系数），因此无法有效利用光谱与空间之间的潜在相关性。而我们的方法引入了一种新颖的神经颜色表示方式，将多光谱信息编码为每个高斯点的紧凑型可学习特征嵌入，随后通过一个浅层多层感知机（MLP）对嵌入进行解码，从而获得各光谱通道的颜色值，实现了所有光谱波段在统一表示下的联合学习。\n实验表明，这一简洁而有效的策略不仅提升了多光谱渲染质量，同时在每个单独光谱上的渲染表现也超过了现有最先进的方法。我们进一步展示了该技术在农业应用中的有效性，例如用于渲染植被指数（如归一化植被指数 NDVI）。\n"
  },
  {
    "path": "abs/2506.03538.md",
    "content": "### Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting\n\n3D reconstruction from in-the-wild images remains a challenging task due to inconsistent lighting conditions and transient distractors. Existing methods typically rely on heuristic strategies to handle the low-quality training data, which often struggle to produce stable and consistent reconstructions, frequently resulting in visual artifacts. In this work, we propose Asymmetric Dual 3DGS, a novel framework that leverages the stochastic nature of these artifacts: they tend to vary across different training runs due to minor randomness. Specifically, our method trains two 3D Gaussian Splatting (3DGS) models in parallel, enforcing a consistency constraint that encourages convergence on reliable scene geometry while suppressing inconsistent artifacts. To prevent the two models from collapsing into similar failure modes due to confirmation bias, we introduce a divergent masking strategy that applies two complementary masks: a multi-cue adaptive mask and a self-supervised soft mask, which leads to an asymmetric training process of the two models, reducing shared error modes. In addition, to improve the efficiency of model training, we introduce a lightweight variant called Dynamic EMA Proxy, which replaces one of the two models with a dynamically updated Exponential Moving Average (EMA) proxy, and employs an alternating masking strategy to preserve divergence. Extensive experiments on challenging real-world datasets demonstrate that our method consistently outperforms existing approaches while achieving high efficiency.\n\n在自然环境下的图像中进行三维重建仍是一项具有挑战性的任务，主要由于光照条件不一致和瞬时干扰物的存在。现有方法通常依赖启发式策略来应对低质量的训练数据，但往往难以生成稳定、一致的重建结果，容易产生视觉伪影。\n本文提出了一种新颖的框架 Asymmetric Dual 3DGS，利用这些伪影的随机性特征：由于微小的随机扰动，不同训练轮次中伪影的表现往往各不相同。具体而言，我们的方法并行训练两个 3D Gaussian Splatting（3DGS）模型，并引入一致性约束，以促使模型在可靠的场景几何结构上收敛，同时抑制不一致的伪影。\n为避免两个模型因确认偏差而陷入相似的失败模式，我们提出了分歧掩码策略，即分别施加两个互补的掩码：一个是多线索自适应掩码，另一个是自监督软掩码，从而形成两个模型的不对称训练过程，减少共享误差模式。\n此外，为提升训练效率，我们引入了一种轻量级变体 Dynamic EMA Proxy，通过动态更新的指数移动平均（EMA）代理模型替代其中一个模型，并采用交替掩码策略以保持模型之间的分歧。\n在多个具有挑战性的真实数据集上的大量实验证明，本文方法在保持高效率的同时，性能稳定且显著优于现有方法。\n"
  },
  {
    "path": "abs/2506.03594.md",
    "content": "### SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting\n\nReconstructing articulated objects prevalent in daily environments is crucial for applications in augmented/virtual reality and robotics. However, existing methods face scalability limitations (requiring 3D supervision or costly annotations), robustness issues (being susceptible to local optima), and rendering shortcomings (lacking speed or photorealism). We introduce SplArt, a self-supervised, category-agnostic framework that leverages 3D Gaussian Splatting (3DGS) to reconstruct articulated objects and infer kinematics from two sets of posed RGB images captured at different articulation states, enabling real-time photorealistic rendering for novel viewpoints and articulations. SplArt augments 3DGS with a differentiable mobility parameter per Gaussian, achieving refined part segmentation. A multi-stage optimization strategy is employed to progressively handle reconstruction, part segmentation, and articulation estimation, significantly enhancing robustness and accuracy. SplArt exploits geometric self-supervision, effectively addressing challenging scenarios without requiring 3D annotations or category-specific priors. Evaluations on established and newly proposed benchmarks, along with applications to real-world scenarios using a handheld RGB camera, demonstrate SplArt's state-of-the-art performance and real-world practicality.\n\n在增强/虚拟现实与机器人等应用中，重建日常环境中常见的可动结构物体具有重要意义。然而，现有方法在可扩展性（需要3D监督或高成本标注）、鲁棒性（易陷入局部最优解）和渲染性能（缺乏速度或真实感）方面仍面临诸多限制。为此，我们提出了 SplArt —— 一种自监督、无类别依赖的框架，利用 三维高斯喷洒（3D Gaussian Splatting, 3DGS） 实现可动结构物体的重建与运动学推理，仅需在不同姿态状态下拍摄的两组有姿态标注的 RGB 图像，即可实现新视角和新动作的实时写实渲染。\nSplArt 在 3DGS 的基础上引入每个高斯粒子的可微运动参数，从而实现精细的部件分割。同时，框架采用多阶段优化策略，逐步处理重建、部件分割与姿态估计任务，显著提升鲁棒性与准确性。该方法利用几何自监督信号，有效应对具有挑战性的场景，无需任何 3D 标注或类别特定先验。\n在多个已有及新提出的基准测试上的评估结果，以及基于手持 RGB 相机采集的真实场景应用，均表明 SplArt 在性能与实用性方面均达到当前最优水平。\n"
  },
  {
    "path": "abs/2506.03872.md",
    "content": "### JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting\n\nReconstructing 3D scenes from sparse viewpoints is a long-standing challenge with wide applications. Recent advances in feed-forward 3D Gaussian sparse-view reconstruction methods provide an efficient solution for real-time novel view synthesis by leveraging geometric priors learned from large-scale multi-view datasets and computing 3D Gaussian centers via back-projection. Despite offering strong geometric cues, both feed-forward multi-view depth estimation and flow-depth joint estimation face key limitations: the former suffers from mislocation and artifact issues in low-texture or repetitive regions, while the latter is prone to local noise and global inconsistency due to unreliable matches when ground-truth flow supervision is unavailable. To overcome this, we propose JointSplat, a unified framework that leverages the complementarity between optical flow and depth via a novel probabilistic optimization mechanism. Specifically, this pixel-level mechanism scales the information fusion between depth and flow based on the matching probability of optical flow during training. Building upon the above mechanism, we further propose a novel multi-view depth-consistency loss to leverage the reliability of supervision while suppressing misleading gradients in uncertain areas. Evaluated on RealEstate10K and ACID, JointSplat consistently outperforms state-of-the-art (SOTA) methods, demonstrating the effectiveness and robustness of our proposed probabilistic joint flow-depth optimization approach for high-fidelity sparse-view 3D reconstruction.\n\n从稀疏视角重建三维场景是一项长期存在的挑战，具有广泛的应用前景。近期的前馈式三维高斯稀疏视图重建方法，借助从大规模多视角数据集中学习的几何先验，并通过反投影计算高斯中心，为实时新视角合成提供了高效的解决方案。尽管具备强几何提示，这类方法在深度估计方面仍面临关键瓶颈：传统的前馈多视角深度估计在低纹理或重复区域中容易出现定位错误和伪影问题，而流-深度联合估计在缺乏真实光流监督时，往往受到匹配不可靠带来的局部噪声与全局不一致的影响。\n为此，我们提出了 JointSplat —— 一种统一的框架，通过一种新颖的概率优化机制，挖掘深度与光流之间的互补性。该机制在训练过程中，以光流的匹配概率为依据，动态调节深度与光流间的信息融合，精确到像素级。基于该机制，我们进一步引入了一个新颖的多视角深度一致性损失，用以增强可靠监督的引导作用，同时在不确定区域抑制误导性梯度。\n在 RealEstate10K 与 ACID 数据集上的实验表明，JointSplat 在多个指标上均优于现有最先进方法，验证了我们提出的概率联合光流-深度优化机制在实现高保真稀疏视角三维重建中的有效性与鲁棒性。\n"
  },
  {
    "path": "abs/2506.04120.md",
    "content": "### Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data\n\nCreating accurate, physical simulations directly from real-world robot motion holds great value for safe, scalable, and affordable robot learning, yet remains exceptionally challenging. Real robot data suffers from occlusions, noisy camera poses, dynamic scene elements, which hinder the creation of geometrically accurate and photorealistic digital twins of unseen objects. We introduce a novel real-to-sim framework tackling all these challenges at once. Our key insight is a hybrid scene representation merging the photorealistic rendering of 3D Gaussian Splatting with explicit object meshes suitable for physics simulation within a single representation. We propose an end-to-end optimization pipeline that leverages differentiable rendering and differentiable physics within MuJoCo to jointly refine all scene components - from object geometry and appearance to robot poses and physical parameters - directly from raw and imprecise robot trajectories. This unified optimization allows us to simultaneously achieve high-fidelity object mesh reconstruction, generate photorealistic novel views, and perform annotation-free robot pose calibration. We demonstrate the effectiveness of our approach both in simulation and on challenging real-world sequences using an ALOHA 2 bi-manual manipulator, enabling more practical and robust real-to-simulation pipelines.\n\n从真实机器人运动中直接构建精确的物理仿真，对于实现安全、可扩展且经济高效的机器人学习具有重要价值，但该任务仍极具挑战性。真实机器人采集的数据通常存在遮挡、相机位姿噪声以及动态场景元素，这些问题严重阻碍了对未见物体进行几何精确且具有写实感的数字孪生建模。\n我们提出了一种新颖的 real-to-sim 框架，能够同时应对上述所有挑战。核心思想是构建一种混合场景表示，将 3D Gaussian Splatting 的写实渲染能力 与适用于物理仿真的显式物体网格融合为统一表达。我们设计了一条端到端的优化流程，结合了可微渲染与 MuJoCo 中的可微物理模块，能够直接从原始、带有误差的机器人轨迹出发，对场景中所有要素（包括物体几何与外观、机器人位姿及物理参数）进行联合优化。\n这一统一优化策略使我们能够同时实现高保真网格重建、写实新视角生成和免标注的机器人位姿校准。我们在模拟环境及使用 ALOHA 2 双臂机械臂采集的真实复杂序列上验证了该方法的有效性，展现了其在提升 real-to-simulation 流程实用性与鲁棒性方面的巨大潜力。\n"
  },
  {
    "path": "abs/2506.04174.md",
    "content": "### FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting\n\n3D Gaussian splatting (3DGS) has enabled various applications in 3D scene representation and novel view synthesis due to its efficient rendering capabilities. However, 3DGS demands relatively significant GPU memory, limiting its use on devices with restricted computational resources. Previous approaches have focused on pruning less important Gaussians, effectively compressing 3DGS but often requiring a fine-tuning stage and lacking adaptability for the specific memory needs of different devices. In this work, we present an elastic inference method for 3DGS. Given an input for the desired model size, our method selects and transforms a subset of Gaussians, achieving substantial rendering performance without additional fine-tuning. We introduce a tiny learnable module that controls Gaussian selection based on the input percentage, along with a transformation module that adjusts the selected Gaussians to complement the performance of the reduced model. Comprehensive experiments on ZipNeRF, MipNeRF and Tanks\\&Temples scenes demonstrate the effectiveness of our approach.\n\n三维高斯喷洒（3D Gaussian Splatting, 3DGS）因其高效的渲染能力，在三维场景表示与新视角合成等任务中得到了广泛应用。然而，3DGS 对 GPU 显存的需求较高，限制了其在计算资源受限设备上的使用。已有方法主要通过裁剪不重要的高斯粒子来压缩模型，虽然在一定程度上缓解了内存问题，但通常依赖额外的微调阶段，且缺乏对不同设备特定内存需求的适应性。\n为此，我们提出了一种适用于 3DGS 的弹性推理方法。给定用户输入的目标模型大小，该方法能够选择并变换部分高斯粒子，在无需额外微调的前提下实现高效渲染性能。我们设计了一个轻量可学习模块，用于根据输入比例控制高斯选择；同时引入一个变换模块，对所选高斯进行调整，以补偿模型压缩带来的性能损失。\n我们在 ZipNeRF、MipNeRF 以及 Tanks&Temples 等多个数据集上进行了全面实验，结果表明该方法在保持渲染质量的同时，显著提升了在不同计算资源条件下的适用性与灵活性。\n"
  },
  {
    "path": "abs/2506.04351.md",
    "content": "### HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting\n\n3D human generation is an important problem with a wide range of applications in computer vision and graphics. Despite recent progress in generative AI such as diffusion models or rendering methods like Neural Radiance Fields or Gaussian Splatting, controlling the generation of accurate 3D humans from text prompts remains an open challenge. Current methods struggle with fine detail, accurate rendering of hands and faces, human realism, and controlability over appearance. The lack of diversity, realism, and annotation in human image data also remains a challenge, hindering the development of a foundational 3D human model. We present a weakly supervised pipeline that tries to address these challenges. In the first step, we generate a photorealistic human image dataset with controllable attributes such as appearance, race, gender, etc using a state-of-the-art image diffusion model. Next, we propose an efficient mapping approach from image features to 3D point clouds using a transformer-based architecture. Finally, we close the loop by training a point-cloud diffusion model that is conditioned on the same text prompts used to generate the original samples. We demonstrate orders-of-magnitude speed-ups in 3D human generation compared to the state-of-the-art approaches, along with significantly improved text-prompt alignment, realism, and rendering quality.\n\n三维人体生成是计算机视觉与图形学中的一项重要课题，拥有广泛的应用场景。尽管近年来生成式 AI（如扩散模型）与渲染方法（如神经辐射场 Neural Radiance Fields 和高斯喷洒 Gaussian Splatting）取得了显著进展，但从文本提示精确生成三维人体仍是一项尚未解决的挑战。现有方法在细节刻画、手部与面部渲染、人体真实感以及外观控制能力方面仍存在显著不足。同时，人体图像数据在多样性、真实性及标注方面的缺失，也阻碍了通用三维人体基础模型的发展。\n为应对上述难题，我们提出了一种弱监督生成管线。第一步，我们借助最先进的图像扩散模型，生成具有可控属性（如外观、种族、性别等）的写实人体图像数据集。接着，我们设计了一种基于 Transformer 的高效映射结构，将图像特征转换为三维点云。最后，我们引入一个以文本提示为条件的点云扩散模型，利用与图像生成阶段相同的文本描述进行训练，形成闭环。\n该方法相比当前最先进的方案，在三维人体生成速度上实现了数量级的提升，并在文本一致性、图像真实感以及渲染质量方面取得了显著进展。\n"
  },
  {
    "path": "abs/2506.04789.md",
    "content": "### Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations\n\nLearning effective multi-modal 3D representations of objects is essential for numerous applications, such as augmented reality and robotics. Existing methods often rely on task-specific embeddings that are tailored either for semantic understanding or geometric reconstruction. As a result, these embeddings typically cannot be decoded into explicit geometry and simultaneously reused across tasks. In this paper, we propose Object-X, a versatile multi-modal object representation framework capable of encoding rich object embeddings (e.g. images, point cloud, text) and decoding them back into detailed geometric and visual reconstructions. Object-X operates by geometrically grounding the captured modalities in a 3D voxel grid and learning an unstructured embedding fusing the information from the voxels with the object attributes. The learned embedding enables 3D Gaussian Splatting-based object reconstruction, while also supporting a range of downstream tasks, including scene alignment, single-image 3D object reconstruction, and localization. Evaluations on two challenging real-world datasets demonstrate that Object-X produces high-fidelity novel-view synthesis comparable to standard 3D Gaussian Splatting, while significantly improving geometric accuracy. Moreover, Object-X achieves competitive performance with specialized methods in scene alignment and localization. Critically, our object-centric descriptors require 3-4 orders of magnitude less storage compared to traditional image- or point cloud-based approaches, establishing Object-X as a scalable and highly practical solution for multi-modal 3D scene representation.\n\n学习有效的多模态三维物体表征对于增强现实和机器人等众多应用至关重要。现有方法通常依赖于特定任务的嵌入，这些嵌入要么面向语义理解，要么面向几何重建。因此，这类嵌入通常既无法解码为显式几何结构，也难以跨任务复用。\n本文提出了 Object-X，一个通用的多模态物体表征框架，能够编码丰富的物体嵌入（如图像、点云、文本），并将其解码为细致的几何和视觉重建结果。Object-X 的核心思路是将采集到的多模态信息几何对齐到三维体素网格中，并学习一种非结构化嵌入，将体素中的信息与物体属性进行融合。\n所学习的嵌入不仅支持基于 3D Gaussian Splatting 的物体重建，还可用于多种下游任务，包括场景对齐、单张图像的三维物体重建以及定位任务。我们在两个具有挑战性的真实世界数据集上进行了评估，结果表明：Object-X 在新视角合成方面可达到与标准 3D Gaussian Splatting 相当的高保真度，同时显著提升了几何精度。\n此外，在场景对齐和定位任务中，Object-X 也展现出与专用方法相媲美的性能。更重要的是，与传统基于图像或点云的方法相比，我们提出的以物体为中心的描述子在存储需求上降低了 3~4 个数量级，使得 Object-X 成为一种可扩展且高度实用的多模态三维场景表示方案。\n"
  },
  {
    "path": "abs/2506.04908.md",
    "content": "### Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer\n\nIn this paper, we introduce a 3D Gaussian Splatting (3DGS)-based pipeline for stereo dataset generation, offering an efficient alternative to Neural Radiance Fields (NeRF)-based methods. To obtain useful geometry estimates, we explore utilizing the reconstructed geometry from the explicit 3D representations as well as depth estimates from the FoundationStereo model in an expert knowledge transfer setup. We find that when fine-tuning stereo models on 3DGS-generated datasets, we demonstrate competitive performance in zero-shot generalization benchmarks. When using the reconstructed geometry directly, we observe that it is often noisy and contains artifacts, which propagate noise to the trained model. In contrast, we find that the disparity estimates from FoundationStereo are cleaner and consequently result in a better performance on the zero-shot generalization benchmarks. Our method highlights the potential for low-cost, high-fidelity dataset creation and fast fine-tuning for deep stereo models. Moreover, we also reveal that while the latest Gaussian Splatting based methods have achieved superior performance on established benchmarks, their robustness falls short in challenging in-the-wild settings warranting further exploration.\n\n在本文中，我们提出了一种基于三维高斯泼洒（3D Gaussian Splatting，3DGS）的立体视觉数据集生成流程，为传统基于神经辐射场（Neural Radiance Fields，NeRF）的方法提供了一种高效替代方案。为了获取有用的几何估计，我们探索了两种方式：一是利用显式三维表示所重建的几何结构，二是采用 FoundationStereo 模型的深度估计结果，在专家知识迁移设置下进行利用。\n我们的研究发现，当在 3DGS 生成的数据集上对立体视觉模型进行微调时，在零样本泛化基准测试中能够取得具有竞争力的性能。直接使用重建几何时，我们观察到其通常较为嘈杂，并带有伪影，这些噪声会进一步传递到训练出的模型中。相比之下，FoundationStereo 的视差估计更加干净，从而带来了更优异的零样本泛化表现。\n本方法展示了构建低成本、高保真数据集以及快速微调深度立体视觉模型的潜力。此外，我们还指出，尽管最新的高斯泼洒方法在标准基准上取得了优越性能，但其在具有挑战性的真实场景中仍存在稳健性不足的问题，亟需进一步探索。\n"
  },
  {
    "path": "abs/2506.05009.md",
    "content": "### Point Cloud Segmentation of Agricultural Vehicles using 3D Gaussian Splatting\n\nTraining neural networks for tasks such as 3D point cloud semantic segmentation demands extensive datasets, yet obtaining and annotating real-world point clouds is costly and labor-intensive. This work aims to introduce a novel pipeline for generating realistic synthetic data, by leveraging 3D Gaussian Splatting (3DGS) and Gaussian Opacity Fields (GOF) to generate 3D assets of multiple different agricultural vehicles instead of using generic models. These assets are placed in a simulated environment, where the point clouds are generated using a simulated LiDAR. This is a flexible approach that allows changing the LiDAR specifications without incurring additional costs. We evaluated the impact of synthetic data on segmentation models such as PointNet++, Point Transformer V3, and OACNN, by training and validating the models only on synthetic data. Remarkably, the PTv3 model had an mIoU of 91.35%, a noteworthy result given that the model had neither been trained nor validated on any real data. Further studies even suggested that in certain scenarios the models trained only on synthetically generated data performed better than models trained on real-world data. Finally, experiments demonstrated that the models can generalize across semantic classes, enabling accurate predictions on mesh models they were never trained on.\n\n训练神经网络以执行诸如三维点云语义分割等任务，通常需要大量数据集。然而，获取和标注真实世界的点云数据既昂贵又耗费人力。为此，本文提出了一种全新的合成数据生成流程，利用三维高斯泼洒（3D Gaussian Splatting，3DGS）与高斯不透明度场（Gaussian Opacity Fields，GOF）来生成多种农业机械的高逼真三维资产，取代以往使用通用模型的做法。\n这些资产被置于模拟环境中，并通过模拟 LiDAR 生成点云数据。该方法具有高度灵活性，允许在不产生额外成本的前提下变更 LiDAR 的参数配置。\n我们评估了合成数据对语义分割模型（如 PointNet++、Point Transformer V3 和 OACNN）的影响，训练和验证过程完全基于合成数据。值得注意的是，PTv3 模型在未接触任何真实数据的情况下，仍然达到了 91.35% 的 mIoU，展现出极具意义的性能表现。进一步研究表明，在某些场景下，仅使用合成数据训练的模型甚至优于使用真实数据训练的模型。\n最后，实验还表明这些模型具备语义类别的泛化能力，能够对训练时从未见过的网格模型作出准确预测。\n"
  },
  {
    "path": "abs/2506.05011.md",
    "content": "### UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting\n\nDespite significant advancements in dynamic neural rendering, existing methods fail to address the unique challenges posed by UAV-captured scenarios, particularly those involving monocular camera setups, top-down perspective, and multiple small, moving humans, which are not adequately represented in existing datasets. In this work, we introduce UAV4D, a framework for enabling photorealistic rendering for dynamic real-world scenes captured by UAVs. Specifically, we address the challenge of reconstructing dynamic scenes with multiple moving pedestrians from monocular video data without the need for additional sensors. We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans. We propose a novel approach to resolve the scene scale ambiguity and place both humans and the scene in world coordinates by identifying human-scene contact points. Additionally, we exploit the SMPL model and background mesh to initialize Gaussian splats, enabling holistic scene rendering. We evaluated our method on three complex UAV-captured datasets: VisDrone, Manipal-UAV, and Okutama-Action, each with distinct characteristics and 10~50 humans. Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.\n\n尽管动态神经渲染取得了显著进展，现有方法仍未能有效应对无人机（UAV）拍摄场景所面临的独特挑战，尤其是在单目摄像头设置、俯视视角以及包含多个小型移动人物的场景中，这类情况在现有数据集中尚未得到充分覆盖。为此，本文提出了 UAV4D 框架，用于实现对无人机拍摄的真实动态场景的照片级真实感渲染。\n具体而言，我们致力于从单目视频数据中重建包含多名行人动态活动的场景，无需额外传感器辅助。我们结合使用三维基础模型与人体网格重建模型，分别对场景背景和人物进行重建。为了解决场景尺度歧义问题，我们提出了一种新方法，通过识别人物与场景的接触点，将人物与场景共同定位于世界坐标系中。\n此外，我们利用 SMPL 模型与背景网格初始化高斯泼洒，实现对整体现实场景的渲染。我们在三个具有代表性的复杂无人机拍摄数据集——VisDrone、Manipal-UAV 和 Okutama-Action 上对方法进行了评估，这些数据集中均包含 10 至 50 名不等的人物，且各具特性。\n实验结果表明，与现有方法相比，我们的方案在新视角合成任务中表现出更强优势，PSNR 提升达 1.5 dB，同时在视觉清晰度上也展现出更优质的效果。\n"
  },
  {
    "path": "abs/2506.05092.md",
    "content": "### Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training\n\nAnnotated datasets are critical for training neural networks for object detection, yet their manual creation is time- and labour-intensive, subjective to human error, and often limited in diversity. This challenge is particularly pronounced in the domain of robotics, where diverse and dynamic scenarios further complicate the creation of representative datasets. To address this, we propose a novel method for automatically generating annotated synthetic data in Unreal Engine. Our approach leverages photorealistic 3D Gaussian splats for rapid synthetic data generation. We demonstrate that synthetic datasets can achieve performance comparable to that of real-world datasets while significantly reducing the time required to generate and annotate data. Additionally, combining real-world and synthetic data significantly increases object detection performance by leveraging the quality of real-world images with the easier scalability of synthetic data. To our knowledge, this is the first application of synthetic data for training object detection algorithms in the highly dynamic and varied environment of robot soccer. Validation experiments reveal that a detector trained on synthetic images performs on par with one trained on manually annotated real-world images when tested on robot soccer match scenarios. Our method offers a scalable and comprehensive alternative to traditional dataset creation, eliminating the labour-intensive error-prone manual annotation process. By generating datasets in a simulator where all elements are intrinsically known, we ensure accurate annotations while significantly reducing manual effort, which makes it particularly valuable for robotics applications requiring diverse and scalable training data.\n\n带有标注的数据集对训练目标检测神经网络至关重要，然而其手动构建过程往往耗时、耗力，易受人为误差影响，且在多样性方面受限。该问题在机器人领域尤为突出，由于场景多变且动态性强，构建具有代表性的数据集更具挑战性。\n为此，本文提出了一种在 Unreal Engine 中自动生成带标注合成数据的新方法。我们的方法利用真实感的三维高斯泼洒（3D Gaussian Splats）实现快速的合成数据生成。实验表明，所生成的合成数据集在性能上可与真实数据集媲美，同时显著减少了数据采集与标注所需的时间成本。\n此外，将真实数据与合成数据结合使用，可显著提升目标检测性能，既发挥了真实图像的质量优势，又利用了合成数据的可扩展性。据我们所知，这是首个将合成数据用于高度动态、变化多样的机器人足球环境中目标检测算法训练的研究。\n验证实验表明，在机器人足球比赛场景中，使用合成图像训练的检测器在性能上与使用人工标注的真实图像训练的检测器相当。我们的方法为传统数据集构建流程提供了可扩展、全面的替代方案，避免了繁琐且易出错的人工标注过程。通过在模拟器中生成数据集，其中所有元素均已先验可知，我们不仅确保了标注的准确性，也大幅降低了人工成本，这使得该方法对于需要多样化且可扩展训练数据的机器人应用场景具有特别重要的价值。\n"
  },
  {
    "path": "abs/2506.05204.md",
    "content": "### OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View\n\nReconstructing semantic-aware 3D scenes from sparse views is a challenging yet essential research direction, driven by the demands of emerging applications such as virtual reality and embodied AI. Existing per-scene optimization methods require dense input views and incur high computational costs, while generalizable approaches often struggle to reconstruct regions outside the input view cone. In this paper, we propose OGGSplat, an open Gaussian growing method that expands the field-of-view in generalizable 3D reconstruction. Our key insight is that the semantic attributes of open Gaussians provide strong priors for image extrapolation, enabling both semantic consistency and visual plausibility. Specifically, once open Gaussians are initialized from sparse views, we introduce an RGB-semantic consistent inpainting module applied to selected rendered views. This module enforces bidirectional control between an image diffusion model and a semantic diffusion model. The inpainted regions are then lifted back into 3D space for efficient and progressive Gaussian parameter optimization. To evaluate our method, we establish a Gaussian Outpainting (GO) benchmark that assesses both semantic and generative quality of reconstructed open-vocabulary scenes. OGGSplat also demonstrates promising semantic-aware scene reconstruction capabilities when provided with two view images captured directly from a smartphone camera.\n\n从稀疏视角重建具有语义感知的三维场景是一个具有挑战性却又至关重要的研究方向，其背后受到虚拟现实和具身智能等新兴应用的强烈驱动。现有的按场景优化方法通常依赖于密集输入视角，计算成本较高；而可泛化方法则往往难以重建超出输入视锥范围的区域。\n本文提出了一种名为 OGGSplat 的开放式高斯增长方法（Open Gaussian Growing Splatting），用于在可泛化三维重建中扩展视野范围。我们核心的洞察在于：开放高斯的语义属性可为图像外推提供强先验，从而同时实现语义一致性与视觉合理性。\n具体而言，在从稀疏视角初始化开放高斯后，我们引入了一种 RGB-语义一致的图像补全模块，该模块作用于选定的渲染视图，利用图像扩散模型与语义扩散模型之间的双向控制机制，实现视觉与语义上的协同补全。补全后的图像区域再被“升维”回三维空间，参与后续高斯参数的高效、渐进式优化。\n为评估该方法，我们构建了一个新的评测基准 Gaussian Outpainting (GO)，用于衡量重建场景在语义质量与生成质量两个方面的表现。实验表明，OGGSplat 在语义感知的三维场景重建任务中表现出强大能力，即使输入仅为来自智能手机的两个视角图像。\n"
  },
  {
    "path": "abs/2506.05217.md",
    "content": "### DSG-World: Learning a 3D Gaussian World Model from Dual State Videos\n\nBuilding an efficient and physically consistent world model from limited observations is a long standing challenge in vision and robotics. Many existing world modeling pipelines are based on implicit generative models, which are hard to train and often lack 3D or physical consistency. On the other hand, explicit 3D methods built from a single state often require multi-stage processing-such as segmentation, background completion, and inpainting-due to occlusions. To address this, we leverage two perturbed observations of the same scene under different object configurations. These dual states offer complementary visibility, alleviating occlusion issues during state transitions and enabling more stable and complete reconstruction. In this paper, we present DSG-World, a novel end-to-end framework that explicitly constructs a 3D Gaussian World model from Dual State observations. Our approach builds dual segmentation-aware Gaussian fields and enforces bidirectional photometric and semantic consistency. We further introduce a pseudo intermediate state for symmetric alignment and design collaborative co-pruning trategies to refine geometric completeness. DSG-World enables efficient real-to-simulation transfer purely in the explicit Gaussian representation space, supporting high-fidelity rendering and object-level scene manipulation without relying on dense observations or multi-stage pipelines. Extensive experiments demonstrate strong generalization to novel views and scene states, highlighting the effectiveness of our approach for real-world 3D reconstruction and simulation.\n\n从有限观测中构建高效且物理一致的世界模型，一直是计算机视觉与机器人领域的核心挑战之一。许多现有的世界建模流程依赖于隐式生成模型，这类方法不仅训练困难，还常常缺乏三维或物理一致性。而基于单一状态的显式三维方法则通常需要多阶段处理流程，如分割、背景补全与遮挡区域的图像修复等，以应对遮挡问题。\n为了解决这一问题，本文提出利用同一场景在不同物体配置下的两次扰动观测，即“双状态”（Dual State）输入。这两个状态提供了互补的可见性，缓解了状态转换过程中的遮挡问题，从而实现更加稳定和完整的三维重建。\n我们提出了 DSG-World，一个端到端的新型框架，可从双状态观测中显式构建三维高斯世界模型（3D Gaussian World）。该方法构建了双分割感知的高斯场，并强制施加双向的光度一致性与语义一致性约束。此外，我们引入了一个伪中间状态用于实现对称对齐，并设计了协同共剪枝策略（collaborative co-pruning strategies）以进一步提升几何完整性。\nDSG-World 完全在显式高斯表示空间中完成高保真渲染与对象级场景操作，能够高效支持从真实到仿真的迁移，无需依赖密集观测或多阶段建模流程。大量实验表明，该方法在新视角与新场景状态下具有良好的泛化能力，验证了其在真实世界三维重建与模拟中的有效性。\n"
  },
  {
    "path": "abs/2506.05327.md",
    "content": "### Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting\n\nDepth maps are widely used in feed-forward 3D Gaussian Splatting (3DGS) pipelines by unprojecting them into 3D point clouds for novel view synthesis. This approach offers advantages such as efficient training, the use of known camera poses, and accurate geometry estimation. However, depth discontinuities at object boundaries often lead to fragmented or sparse point clouds, degrading rendering quality -- a well-known limitation of depth-based representations. To tackle this issue, we introduce PM-Loss, a novel regularization loss based on a pointmap predicted by a pre-trained transformer. Although the pointmap itself may be less accurate than the depth map, it effectively enforces geometric smoothness, especially around object boundaries. With the improved depth map, our method significantly improves the feed-forward 3DGS across various architectures and scenes, delivering consistently better rendering results.\n\n深度图在前馈式三维高斯泼洒（3D Gaussian Splatting，3DGS）流程中被广泛使用，通常通过将其反投影为三维点云以实现新视角合成。这种方法具有多个优势，例如训练效率高、可利用已知相机位姿，以及具备较高的几何精度。然而，在物体边界处的深度不连续性常常导致点云破碎或稀疏，从而降低渲染质量——这是深度图表示中广为人知的限制。\n为了解决这一问题，本文提出了一种名为 PM-Loss 的新型正则化损失，其基于由预训练 Transformer 模型预测的点图（pointmap）。尽管点图在精度上可能不如深度图，但它能有效增强几何的平滑性，特别是在物体边界区域。\n通过使用改进后的深度图，我们的方法显著提升了前馈式 3DGS 的渲染效果，适用于多种网络结构和场景，在保持几何准确性的同时，能够持续带来更优质的渲染结果。\n"
  },
  {
    "path": "abs/2506.05348.md",
    "content": "### FreeTimeGS: Free Gaussian Primitives at Anytime and Anywhere for Dynamic Scene Reconstruction\n\nThis paper addresses the challenge of reconstructing dynamic 3D scenes with complex motions. Some recent works define 3D Gaussian primitives in the canonical space and use deformation fields to map canonical primitives to observation spaces, achieving real-time dynamic view synthesis. However, these methods often struggle to handle scenes with complex motions due to the difficulty of optimizing deformation fields. To overcome this problem, we propose FreeTimeGS, a novel 4D representation that allows Gaussian primitives to appear at arbitrary time and locations. In contrast to canonical Gaussian primitives, our representation possesses the strong flexibility, thus improving the ability to model dynamic 3D scenes. In addition, we endow each Gaussian primitive with an motion function, allowing it to move to neighboring regions over time, which reduces the temporal redundancy. Experiments results on several datasets show that the rendering quality of our method outperforms recent methods by a large margin.\n\n本文聚焦于重建具有复杂运动的动态三维场景这一挑战。一些最新工作在规范空间中定义三维高斯基元，并通过形变场将这些规范基元映射到观测空间，从而实现实时动态视角合成。然而，由于形变场优化困难，这类方法在处理复杂运动场景时常常表现不佳。\n为了解决这一问题，本文提出了 FreeTimeGS，一种全新的四维表示方法，允许高斯基元在任意时间和空间位置出现。与传统的规范空间高斯基元相比，该表示具有更强的灵活性，从而显著增强了对动态三维场景的建模能力。\n此外，我们为每个高斯基元引入了运动函数，使其能够随时间移动至相邻区域，从而减少时间维度上的冗余信息。\n在多个数据集上的实验结果表明，FreeTimeGS 的渲染质量远超现有方法，取得了显著的性能提升。\n"
  },
  {
    "path": "abs/2506.05397.md",
    "content": "### Gen4D: Synthesizing Humans and Scenes in the Wild\n\nLack of input data for in-the-wild activities often results in low performance across various computer vision tasks. This challenge is particularly pronounced in uncommon human-centric domains like sports, where real-world data collection is complex and impractical. While synthetic datasets offer a promising alternative, existing approaches typically suffer from limited diversity in human appearance, motion, and scene composition due to their reliance on rigid asset libraries and hand-crafted rendering pipelines. To address this, we introduce Gen4D, a fully automated pipeline for generating diverse and photorealistic 4D human animations. Gen4D integrates expert-driven motion encoding, prompt-guided avatar generation using diffusion-based Gaussian splatting, and human-aware background synthesis to produce highly varied and lifelike human sequences. Based on Gen4D, we present SportPAL, a large-scale synthetic dataset spanning three sports: baseball, icehockey, and soccer. Together, Gen4D and SportPAL provide a scalable foundation for constructing synthetic datasets tailored to in-the-wild human-centric vision tasks, with no need for manual 3D modeling or scene design.o\n\n在自然环境下（in-the-wild）获取输入数据的不足，常导致计算机视觉任务在多种场景下表现不佳。该问题在人类中心的少见领域中尤为突出，例如体育运动场景，其真实数据的采集复杂且难以实现。尽管合成数据集提供了有前景的替代方案，但现有方法往往依赖于刚性资源库和手工构建的渲染流程，导致在人物外观、动作和场景组合上的多样性受限。\n为此，我们提出了 Gen4D，一个全自动的多样化真实感四维人物动画生成流程。Gen4D 融合了专家驱动的动作编码、基于扩散模型的高斯泼洒人物生成（支持文本引导），以及具有人物感知能力的背景合成，从而实现高度多样化且逼真的人物序列生成。\n基于 Gen4D，我们构建了大规模合成数据集 SportPAL，涵盖了三个体育项目：棒球、冰球和足球。Gen4D 与 SportPAL 共同构成了一个可扩展的基础平台，能够自动生成适用于自然场景下人类中心视觉任务的合成数据集，无需人工三维建模或场景设计。\n"
  },
  {
    "path": "abs/2506.05473.md",
    "content": "### S2GO: Streaming Sparse Gaussian Occupancy Prediction\n\nDespite the demonstrated efficiency and performance of sparse query-based representations for perception, state-of-the-art 3D occupancy prediction methods still rely on voxel-based or dense Gaussian-based 3D representations. However, dense representations are slow, and they lack flexibility in capturing the temporal dynamics of driving scenes. Distinct from prior work, we instead summarize the scene into a compact set of 3D queries which are propagated through time in an online, streaming fashion. These queries are then decoded into semantic Gaussians at each timestep. We couple our framework with a denoising rendering objective to guide the queries and their constituent Gaussians in effectively capturing scene geometry. Owing to its efficient, query-based representation, S2GO achieves state-of-the-art performance on the nuScenes and KITTI occupancy benchmarks, outperforming prior art (e.g., GaussianWorld) by 1.5 IoU with 5.9x faster inference.\n\n尽管基于稀疏查询的感知表示在效率和性能方面已展现出巨大潜力，当前最先进的三维占据预测方法仍主要依赖体素或稠密高斯等三维表示方式。然而，这类稠密表示不仅计算缓慢，还缺乏对自动驾驶场景中时序动态的建模灵活性。\n与以往工作不同，本文提出将场景压缩为一组紧凑的三维查询点，并在时间轴上以在线流式方式进行传播。每个时间步，这些查询点会被解码为对应的语义高斯分布。\n我们在框架中引入了一个去噪渲染目标函数，用于引导查询点及其组成的高斯更有效地捕捉场景几何信息。得益于这一高效、基于查询的表示方式，S2GO 在 nuScenes 与 KITTI 占据预测基准上均取得了当前最优性能，相较于先前方法（如 GaussianWorld），IoU 提升 1.5，推理速度提升达 5.9 倍。\n"
  },
  {
    "path": "abs/2506.05480.md",
    "content": "### ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting\n\nWe present ODE-GS, a novel method that unifies 3D Gaussian Splatting with latent neural ordinary differential equations (ODEs) to forecast dynamic 3D scenes far beyond the time span seen during training. Existing neural rendering systems - whether NeRF- or 3DGS-based - embed time directly in a deformation network and therefore excel at interpolation but collapse when asked to predict the future, where timestamps are strictly out-of-distribution. ODE-GS eliminates this dependency: after learning a high-fidelity, time-conditioned deformation model for the training window, we freeze it and train a Transformer encoder that summarizes past Gaussian trajectories into a latent state whose continuous evolution is governed by a neural ODE. Numerical integration of this latent flow yields smooth, physically plausible Gaussian trajectories that can be queried at any future instant and rendered in real time. Coupled with a variational objective and a lightweight second-derivative regularizer, ODE-GS attains state-of-the-art extrapolation on D-NeRF and NVFI benchmarks, improving PSNR by up to 10 dB and halving perceptual error (LPIPS) relative to the strongest baselines. Our results demonstrate that continuous-time latent dynamics are a powerful, practical route to photorealistic prediction of complex 3D scenes.\n\n我们提出了 ODE-GS，一种将三维高斯泼洒（3D Gaussian Splatting）与潜在神经常微分方程（latent neural ODEs）相结合的新方法，用于对动态三维场景进行远超训练时间范围的预测。\n现有的神经渲染系统——无论基于 NeRF 还是 3DGS——通常将时间直接嵌入形变网络中，因此在时间插值任务中表现出色，但在面临时间戳超出训练分布的预测任务时却容易失效。ODE-GS 消除了这种依赖：在学习训练时间窗口内的高保真时间条件形变模型后，我们将其参数冻结，并训练一个 Transformer 编码器，该编码器能够将过去的高斯轨迹摘要为一个潜在状态，其连续演化由一个神经 ODE 所控制。\n通过对该潜在轨迹进行数值积分，可生成平滑、物理合理的高斯轨迹，支持任意未来时间点的查询与实时渲染。结合变分目标函数与轻量的二阶导数正则项，ODE-GS 在 D-NeRF 与 NVFI 等基准上实现了最先进的外推性能，PSNR 提升可达 10 dB，LPIPS 感知误差减少一半，相较于当前最强基线表现出显著优势。\n我们的研究结果表明，基于连续时间的潜在动态建模是实现高质量三维场景预测的强大且实用的路径。\n"
  },
  {
    "path": "abs/2506.05558.md",
    "content": "### On-the-fly Reconstruction for Large-Scale Novel View Synthesis from Unposed Images\n\nRadiance field methods such as 3D Gaussian Splatting (3DGS) allow easy reconstruction from photos, enabling free-viewpoint navigation. Nonetheless, pose estimation using Structure from Motion and 3DGS optimization can still each take between minutes and hours of computation after capture is complete. SLAM methods combined with 3DGS are fast but struggle with wide camera baselines and large scenes. We present an on-the-fly method to produce camera poses and a trained 3DGS immediately after capture. Our method can handle dense and wide-baseline captures of ordered photo sequences and large-scale scenes. To do this, we first introduce fast initial pose estimation, exploiting learned features and a GPU-friendly mini bundle adjustment. We then introduce direct sampling of Gaussian primitive positions and shapes, incrementally spawning primitives where required, significantly accelerating training. These two efficient steps allow fast and robust joint optimization of poses and Gaussian primitives. Our incremental approach handles large-scale scenes by introducing scalable radiance field construction, progressively clustering 3DGS primitives, storing them in anchors, and offloading them from the GPU. Clustered primitives are progressively merged, keeping the required scale of 3DGS at any viewpoint. We evaluate our solution on a variety of datasets and show that our solution can provide on-the-fly processing of all the capture scenarios and scene sizes we target while remaining competitive with other methods that only handle specific capture styles or scene sizes in speed, image quality, or both.\n\n辐射场方法（如三维高斯泼洒，3D Gaussian Splatting，3DGS）支持从照片中轻松重建三维场景，实现自由视角导航。然而，基于结构光恢复（Structure from Motion）的相机位姿估计以及 3DGS 优化，通常仍需在采集完成后花费数分钟至数小时的计算时间。尽管将 SLAM 方法与 3DGS 结合可以加快处理速度，但在处理大视差相机基线或大规模场景时往往表现不佳。\n为此，本文提出了一种即时生成相机位姿与训练完成的 3DGS 场景的方法。该方法可应对有序图像序列的密集、大视差采集场景，并适用于大规模场景重建。\n具体做法包括两大核心改进：首先，我们引入了基于学习特征的快速初始位姿估计方法，并结合 GPU 友好的小规模光束调整（mini bundle adjustment），以实现高效的初始对齐；其次，我们提出了对高斯基元的位置与形状的直接采样策略，在所需区域增量式生成基元，大幅加快训练速度。\n这两个高效步骤使得我们能够快速且稳健地联合优化相机位姿与高斯基元。针对大规模场景，我们进一步提出了可扩展的辐射场构建策略：通过逐步对 3DGS 基元进行聚类，并将其存储为锚点（anchors），从而将其卸载出 GPU；聚类后的基元会被渐进式地融合，在任意视角下都可维持合理的重建规模。\n我们在多个数据集上对该方法进行了评估，结果表明该方案能够在目标采集场景和场景规模下实现即时处理，在处理速度与图像质量上均具有竞争力，优于那些仅能应对特定采集形式或规模的方法。\n"
  },
  {
    "path": "abs/2506.05563.md",
    "content": "### VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction\n\nRecent advancements in camera-based occupancy prediction have focused on the simultaneous prediction of 3D semantics and scene flow, a task that presents significant challenges due to specific difficulties, e.g., occlusions and unbalanced dynamic environments. In this paper, we analyze these challenges and their underlying causes. To address them, we propose a novel regularization framework called VoxelSplat. This framework leverages recent developments in 3D Gaussian Splatting to enhance model performance in two key ways: (i) Enhanced Semantics Supervision through 2D Projection: During training, our method decodes sparse semantic 3D Gaussians from 3D representations and projects them onto the 2D camera view. This provides additional supervision signals in the camera-visible space, allowing 2D labels to improve the learning of 3D semantics. (ii) Scene Flow Learning: Our framework uses the predicted scene flow to model the motion of Gaussians, and is thus able to learn the scene flow of moving objects in a self-supervised manner using the labels of adjacent frames. Our method can be seamlessly integrated into various existing occupancy models, enhancing performance without increasing inference time. Extensive experiments on benchmark datasets demonstrate the effectiveness of VoxelSplat in improving the accuracy of both semantic occupancy and scene flow estimation.\n\n近年来，基于摄像头的占用预测研究逐渐聚焦于同时预测三维语义与场景流，这是一项具有高度挑战性的任务，主要因其面临遮挡、动态环境分布不均等特定难题。本文分析了这些挑战及其根本原因。为应对这些问题，我们提出了一种名为 VoxelSplat 的全新正则化框架。该框架借助最新的 3D 高斯溅射（3D Gaussian Splatting）技术，从两个关键方面提升模型性能：\n(i) 通过二维投影增强语义监督：在训练过程中，我们的方法从三维表示中解码出稀疏的语义三维高斯，并将其投影到二维摄像头视角。这一操作在摄像头可见空间中引入了额外的监督信号，使得二维标签能够有效提升三维语义的学习效果。\n(ii) 场景流学习：该框架利用预测得到的场景流对高斯运动进行建模，从而能够在相邻帧标签的引导下，以自监督的方式学习移动物体的场景流。\n我们的方法可无缝集成至现有多种占用模型中，在不增加推理时间的前提下，显著提升性能。大量基准数据集上的实验证明，VoxelSplat 能够有效提高语义占用预测与场景流估计的准确性。\n"
  },
  {
    "path": "abs/2506.05682.md",
    "content": "### Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy\n\n3D Gaussian Splatting (3DGS) has vastly advanced the pace of neural rendering, but it remains computationally demanding on today's mobile SoCs. To address this challenge, we propose Lumina, a hardware-algorithm co-designed system, which integrates two principal optimizations: a novel algorithm, S^2, and a radiance caching mechanism, RC, to improve the efficiency of neural rendering. S2 algorithm exploits temporal coherence in rendering to reduce the computational overhead, while RC leverages the color integration process of 3DGS to decrease the frequency of intensive rasterization computations. Coupled with these techniques, we propose an accelerator architecture, LuminCore, to further accelerate cache lookup and address the fundamental inefficiencies in Rasterization. We show that Lumina achieves 4.5x speedup and 5.3x energy reduction against a mobile Volta GPU, with a marginal quality loss (`<` 0.2 dB peak signal-to-noise ratio reduction) across synthetic and real-world datasets.\n\n3D 高斯溅射（3D Gaussian Splatting, 3DGS）极大地推动了神经渲染的发展速度，但在当前的移动系统级芯片（SoC）上仍然计算开销巨大。为应对这一挑战，我们提出了 Lumina，一个软硬件协同设计的系统，融合了两项核心优化：新颖算法 S² 和辐射度缓存机制 RC，以提升神经渲染的效率。\nS² 算法利用渲染过程中的时间一致性（temporal coherence）来降低计算开销，而 RC 则通过优化 3DGS 中的颜色积分过程，减少高强度光栅化计算的频率。结合上述技术，我们还设计了专用加速器架构 LuminCore，进一步加速缓存查找过程，并从根本上解决光栅化中的效率瓶颈。\n实验结果表明，Lumina 相较于移动端 Volta GPU 实现可达 4.5 倍速度提升 和 5.3 倍能耗降低，在合成与真实数据集上仅带来 `<` 0.2 dB 的峰值信噪比损失，几乎可忽略不计。\n"
  },
  {
    "path": "abs/2506.05935.md",
    "content": "### SurGSplat: Progressive Geometry-Constrained Gaussian Splatting for Surgical Scene Reconstruction\n\nIntraoperative navigation relies heavily on precise 3D reconstruction to ensure accuracy and safety during surgical procedures. However, endoscopic scenarios present unique challenges, including sparse features and inconsistent lighting, which render many existing Structure-from-Motion (SfM)-based methods inadequate and prone to reconstruction failure. To mitigate these constraints, we propose SurGSplat, a novel paradigm designed to progressively refine 3D Gaussian Splatting (3DGS) through the integration of geometric constraints. By enabling the detailed reconstruction of vascular structures and other critical features, SurGSplat provides surgeons with enhanced visual clarity, facilitating precise intraoperative decision-making. Experimental evaluations demonstrate that SurGSplat achieves superior performance in both novel view synthesis (NVS) and pose estimation accuracy, establishing it as a high-fidelity and efficient solution for surgical scene reconstruction.\n\n术中导航高度依赖于精确的三维重建，以确保手术过程中的准确性与安全性。然而，内窥镜场景具有诸多独特挑战，包括特征稀疏与光照不一致，导致许多基于结构光恢复（Structure-from-Motion, SfM）的方法效果不佳，易发生重建失败。为克服这些限制，我们提出了一种新范式——SurGSplat，该方法通过引入几何约束，逐步优化三维高斯泼溅（3D Gaussian Splatting, 3DGS）。SurGSplat 能够精细重建血管结构及其他关键特征，为外科医生提供更清晰的视觉信息，辅助其做出更为精准的术中决策。实验评估表明，SurGSplat 在新视角合成（Novel View Synthesis, NVS）与位姿估计精度方面均表现出色，是一种高保真且高效的手术场景重建解决方案。\n"
  },
  {
    "path": "abs/2506.05965.md",
    "content": "### Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments\n\nCurrent Simultaneous Localization and Mapping (SLAM) methods based on Neural Radiance Fields (NeRF) or 3D Gaussian Splatting excel in reconstructing static 3D scenes but struggle with tracking and reconstruction in dynamic environments, such as real-world scenes with moving elements. Existing NeRF-based SLAM approaches addressing dynamic challenges typically rely on RGB-D inputs, with few methods accommodating pure RGB input. To overcome these limitations, we propose Dy3DGS-SLAM, the first 3D Gaussian Splatting (3DGS) SLAM method for dynamic scenes using monocular RGB input. To address dynamic interference, we fuse optical flow masks and depth masks through a probabilistic model to obtain a fused dynamic mask. With only a single network iteration, this can constrain tracking scales and refine rendered geometry. Based on the fused dynamic mask, we designed a novel motion loss to constrain the pose estimation network for tracking. In mapping, we use the rendering loss of dynamic pixels, color, and depth to eliminate transient interference and occlusion caused by dynamic objects. Experimental results demonstrate that Dy3DGS-SLAM achieves state-of-the-art tracking and rendering in dynamic environments, outperforming or matching existing RGB-D methods.\n\n当前基于神经辐射场（Neural Radiance Fields, NeRF）或三维高斯泼溅（3D Gaussian Splatting, 3DGS）的同步定位与建图（Simultaneous Localization and Mapping, SLAM）方法在重建静态三维场景方面表现优异，但在包含运动元素的真实动态环境中，仍面临跟踪与重建的困难。现有面向动态场景的 NeRF-based SLAM 方法多数依赖 RGB-D 输入，能够处理纯 RGB 输入的方案仍较为稀缺。\n为突破上述限制，我们提出 Dy3DGS-SLAM，这是首个面向动态场景、仅使用单目 RGB 输入的三维高斯泼溅 SLAM 方法。为应对动态干扰，我们通过概率模型融合光流掩码与深度掩码，获得融合动态掩码。在仅进行一次网络迭代的条件下，该掩码即可用于约束跟踪尺度并优化重建几何。同时，基于该融合动态掩码，我们设计了一种新颖的运动损失函数，用于约束位姿估计网络以实现鲁棒跟踪。\n在建图阶段，我们利用动态像素的渲染损失、颜色和深度信息，有效剔除动态物体带来的瞬时干扰与遮挡。实验结果表明，Dy3DGS-SLAM 在动态环境中实现了领先的跟踪与渲染性能，超越或可比现有的 RGB-D 方法。\n"
  },
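A minimal sketch of the mask-fusion step, assuming the optical-flow and depth cues provide independent per-pixel probabilities of being dynamic; a noisy-OR combination stands in for the paper's probabilistic model.

```python
import numpy as np

def fuse_dynamic_masks(p_flow, p_depth):
    """p_flow, p_depth: HxW arrays of per-pixel probabilities of being dynamic."""
    # Noisy-OR: a pixel is considered static only if both cues call it static.
    return 1.0 - (1.0 - p_flow) * (1.0 - p_depth)

rng = np.random.default_rng(0)
p_flow = rng.random((4, 4))                 # stand-in optical-flow evidence
p_depth = rng.random((4, 4))                # stand-in depth-residual evidence
mask = fuse_dynamic_masks(p_flow, p_depth) > 0.5   # binary fused dynamic mask
```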
  {
    "path": "abs/2506.06271.md",
    "content": "### BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading\n\nWe introduce BecomingLit, a novel method for reconstructing relightable, high-resolution head avatars that can be rendered from novel viewpoints at interactive rates. Therefore, we propose a new low-cost light stage capture setup, tailored specifically towards capturing faces. Using this setup, we collect a novel dataset consisting of diverse multi-view sequences of numerous subjects under varying illumination conditions and facial expressions. By leveraging our new dataset, we introduce a new relightable avatar representation based on 3D Gaussian primitives that we animate with a parametric head model and an expression-dependent dynamics module. We propose a new hybrid neural shading approach, combining a neural diffuse BRDF with an analytical specular term. Our method reconstructs disentangled materials from our dynamic light stage recordings and enables all-frequency relighting of our avatars with both point lights and environment maps. In addition, our avatars can easily be animated and controlled from monocular videos. We validate our approach in extensive experiments on our dataset, where we consistently outperform existing state-of-the-art methods in relighting and reenactment by a significant margin.\n\n我们提出了 BecomingLit，一种用于重建可重光照的高分辨率头像角色的新方法，该方法支持从新视角进行交互式渲染。为此，我们设计了一种面向人脸捕捉的低成本光照舞台采集系统，并基于该系统采集了一个全新数据集，包含多个主体在不同光照条件和面部表情下的多视角序列。\n基于该数据集，我们提出了一种基于三维高斯图元（3D Gaussian primitives）的可重光照头像表示方法，并通过参数化头部模型和表情相关的动态模块进行驱动。我们还提出了一种混合神经着色方法，将神经漫反射 BRDF 与解析高光项相结合，实现更真实的外观建模。\n该方法能够从动态光照舞台采集数据中重建解耦的材质表示，并支持在点光源与环境光图下进行全频率重光照。此外，我们的头像模型也能通过单目视频轻松驱动和控制。\n我们在自建数据集上进行了大量实验，结果表明，在重光照与驱动再现任务中，BecomingLit 在各项指标上均显著优于现有最先进方法。\n"
  },
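A toy sketch of hybrid neural shading under stated assumptions: `diffuse_mlp` is a placeholder for the learned diffuse BRDF, and a Blinn-Phong lobe stands in for the paper's analytical specular term.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def diffuse_mlp(feature, light_dir, normal):
    # Placeholder for a learned diffuse BRDF; here plain Lambert scaled by a feature.
    return feature * np.clip(np.sum(normal * light_dir, -1, keepdims=True), 0, None)

def shade(feature, normal, light_dir, view_dir, spec_albedo=0.04, shininess=64.0):
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    h = normalize(l + v)                                   # half vector
    specular = spec_albedo * np.clip(np.sum(n * h, -1, keepdims=True), 0, None) ** shininess
    return diffuse_mlp(feature, l, n) + specular           # hybrid outgoing radiance

rgb = shade(feature=np.array([0.6, 0.4, 0.3]),
            normal=np.array([0.0, 0.0, 1.0]),
            light_dir=np.array([0.3, 0.2, 1.0]),
            view_dir=np.array([0.0, 0.3, 1.0]))
```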
  {
    "path": "abs/2506.06462.md",
    "content": "### Splat and Replace: 3D Reconstruction with Repetitive Elements\n\nWe leverage repetitive elements in 3D scenes to improve novel view synthesis. Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have greatly improved novel view synthesis but renderings of unseen and occluded parts remain low-quality if the training views are not exhaustive enough. Our key observation is that our environment is often full of repetitive elements. We propose to leverage those repetitions to improve the reconstruction of low-quality parts of the scene due to poor coverage and occlusions. We propose a method that segments each repeated instance in a 3DGS reconstruction, registers them together, and allows information to be shared among instances. Our method improves the geometry while also accounting for appearance variations across instances. We demonstrate our method on a variety of synthetic and real scenes with typical repetitive elements, leading to a substantial improvement in the quality of novel view synthesis.\n\n我们利用三维场景中的重复元素来提升新视角合成效果。尽管神经辐射场（Neural Radiance Fields, NeRF）和三维高斯泼溅（3D Gaussian Splatting, 3DGS）极大提升了新视角合成的质量，但在训练视角覆盖不足的情况下，模型对未见区域和遮挡部分的渲染仍然效果不佳。\n我们的核心观察是：现实环境中往往充满了重复元素。为此，我们提出利用这些重复结构来改善因覆盖不全或遮挡导致的低质量区域重建。具体而言，我们的方法首先对 3DGS 重建中的每个重复实例进行分割与配准，然后在各实例间共享信息。该方法在提升几何结构质量的同时，也考虑了不同实例之间的外观差异。\n我们在多个包含典型重复元素的合成与真实场景中对该方法进行了验证，结果显示该方法能显著提升新视角合成的整体质量。\n"
  },
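A minimal sketch of the registration step, assuming two repeated instances are already segmented as point sets: a Kabsch/Procrustes fit gives the rigid transform that aligns them so information can be shared (the paper additionally handles appearance variation across instances).

```python
import numpy as np

def kabsch(src, dst):
    """Rigid (R, t) minimizing ||src @ R.T + t - dst||; src, dst: Nx3."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # forbid reflections
    R = Vt.T @ S @ U.T
    return R, mu_d - R @ mu_s

rng = np.random.default_rng(0)
src = rng.random((100, 3))                         # instance A point set
R_true = np.linalg.qr(rng.standard_normal((3, 3)))[0]
R_true *= np.sign(np.linalg.det(R_true))           # make it a proper rotation
dst = src @ R_true.T + np.array([0.3, -0.1, 0.2])  # instance B = rigid copy of A
R, t = kabsch(src, dst)
assert np.allclose(src @ R.T + t, dst, atol=1e-6)
```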
  {
    "path": "abs/2506.06517.md",
    "content": "### GS4: Generalizable Sparse Splatting Semantic SLAM\n\nTraditional SLAM algorithms are excellent at camera tracking but might generate lower resolution and incomplete 3D maps. Recently, Gaussian Splatting (GS) approaches have emerged as an option for SLAM with accurate, dense 3D map building. However, existing GS-based SLAM methods rely on per-scene optimization which is time-consuming and does not generalize to diverse scenes well. In this work, we introduce the first generalizable GS-based semantic SLAM algorithm that incrementally builds and updates a 3D scene representation from an RGB-D video stream using a learned generalizable network. Our approach starts from an RGB-D image recognition backbone to predict the Gaussian parameters from every downsampled and backprojected image location. Additionally, we seamlessly integrate 3D semantic segmentation into our GS framework, bridging 3D mapping and recognition through a shared backbone. To correct localization drifting and floaters, we propose to optimize the GS for only 1 iteration following global localization. We demonstrate state-of-the-art semantic SLAM performance on the real-world benchmark ScanNet with an order of magnitude fewer Gaussians compared to other recent GS-based methods, and showcase our model's generalization capability through zero-shot transfer to the NYUv2 and TUM RGB-D datasets.\n\n传统的 SLAM 算法在相机跟踪方面表现出色，但往往生成的三维地图分辨率较低且不完整。近年来，高斯泼溅（Gaussian Splatting, GS）方法逐渐成为构建高精度、稠密三维地图的 SLAM 选项。然而，现有基于 GS 的 SLAM 方法依赖于每个场景的优化过程，既耗时又难以泛化至多样化的场景。\n在本工作中，我们提出了首个可泛化的基于 GS 的语义 SLAM 算法。该方法利用训练好的通用网络，从 RGB-D 视频流中逐步构建并更新三维场景表示。我们的方法以 RGB-D 图像识别骨干网络为起点，从每一个下采样并反投影后的图像位置预测高斯参数。同时，我们将三维语义分割无缝集成进 GS 框架，通过共享骨干网络实现三维建图与语义识别的融合。\n为修正定位漂移与浮点伪影，我们提出在全局定位后，仅进行一次迭代的 GS 优化策略。实验表明，我们的方法在真实场景基准数据集 ScanNet 上实现了最先进的语义 SLAM 性能，所需高斯数量相比其他近期 GS 方法少一个数量级。此外，我们还通过零样本迁移在 NYUv2 和 TUM RGB-D 数据集上展示了模型的优秀泛化能力。\n"
  },
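A minimal sketch of the backprojection step that seeds per-pixel Gaussians: every downsampled RGB-D pixel is lifted to a camera-space 3D point at which a network would then predict Gaussian parameters. The intrinsics and stride values are illustrative.

```python
import numpy as np

def backproject(depth, K, stride=4):
    """depth: HxW metres; K: 3x3 intrinsics. Returns Nx3 camera-space points."""
    H, W = depth.shape
    v, u = np.mgrid[0:H:stride, 0:W:stride]     # downsampled pixel grid
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]             # pinhole inverse projection
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], -1).reshape(-1, 3)

K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
pts = backproject(np.full((480, 640), 2.0), K)  # one Gaussian seed per point
```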
  {
    "path": "abs/2506.06645.md",
    "content": "### Parametric Gaussian Human Model: Generalizable Prior for Efficient and Realistic Human Avatar Modeling\n\nPhotorealistic and animatable human avatars are a key enabler for virtual/augmented reality, telepresence, and digital entertainment. While recent advances in 3D Gaussian Splatting (3DGS) have greatly improved rendering quality and efficiency, existing methods still face fundamental challenges, including time-consuming per-subject optimization and poor generalization under sparse monocular inputs. In this work, we present the Parametric Gaussian Human Model (PGHM), a generalizable and efficient framework that integrates human priors into 3DGS for fast and high-fidelity avatar reconstruction from monocular videos. PGHM introduces two core components: (1) a UV-aligned latent identity map that compactly encodes subject-specific geometry and appearance into a learnable feature tensor; and (2) a disentangled Multi-Head U-Net that predicts Gaussian attributes by decomposing static, pose-dependent, and view-dependent components via conditioned decoders. This design enables robust rendering quality under challenging poses and viewpoints, while allowing efficient subject adaptation without requiring multi-view capture or long optimization time. Experiments show that PGHM is significantly more efficient than optimization-from-scratch methods, requiring only approximately 20 minutes per subject to produce avatars with comparable visual quality, thereby demonstrating its practical applicability for real-world monocular avatar creation.\n\n逼真且可动画的人体数字化身是虚拟/增强现实、远程呈现以及数字娱乐等应用的关键支撑。尽管近年来三维高斯泼溅（3D Gaussian Splatting, 3DGS）在渲染质量和效率方面取得了显著进展，现有方法仍面临诸多核心挑战，例如每位角色需进行耗时的个体优化，以及在稀疏单目输入下泛化能力不足。\n为此，我们提出了参数化高斯人体模型（Parametric Gaussian Human Model, PGHM），这是一种通用高效的框架，通过将人体先验融入 3DGS，实现从单目视频中快速且高保真的数字人重建。PGHM 包含两个核心组件：（1）UV 对齐的潜在身份特征图（latent identity map），将个体的几何与外观紧凑编码为可学习的特征张量；（2）解耦的多头 U-Net 网络（Multi-Head U-Net），通过条件解码器将高斯属性分解为静态、姿态相关与视角相关三个子空间进行预测。\n该设计不仅在复杂姿态与视角条件下实现了稳健的渲染效果，还能高效地适配新主体，无需多视角捕捉或长时间优化。实验结果表明，PGHM 相较于从零优化的方法效率大幅提升，每个主体仅需约 20 分钟即可生成具有可比视觉质量的数字化身，展现出在现实场景中创建单目数字人的实际应用潜力。\n"
  },
  {
    "path": "abs/2506.06822.md",
    "content": "### Hi-LSplat: Hierarchical 3D Language Gaussian Splatting\n\nModeling 3D language fields with Gaussian Splatting for open-ended language queries has recently garnered increasing attention. However, recent 3DGS-based models leverage view-dependent 2D foundation models to refine 3D semantics but lack a unified 3D representation, leading to view inconsistencies. Additionally, inherent open-vocabulary challenges cause inconsistencies in object and relational descriptions, impeding hierarchical semantic understanding. In this paper, we propose Hi-LSplat, a view-consistent Hierarchical Language Gaussian Splatting work for 3D open-vocabulary querying. To achieve view-consistent 3D hierarchical semantics, we first lift 2D features to 3D features by constructing a 3D hierarchical semantic tree with layered instance clustering, which addresses the view inconsistency issue caused by 2D semantic features. Besides, we introduce instance-wise and part-wise contrastive losses to capture all-sided hierarchical semantic representations. Notably, we construct two hierarchical semantic datasets to better assess the model's ability to distinguish different semantic levels. Extensive experiments highlight our method's superiority in 3D open-vocabulary segmentation and localization. Its strong performance on hierarchical semantic datasets underscores its ability to capture complex hierarchical semantics within 3D scenes.\n\n近年来，利用高斯泼溅（Gaussian Splatting）建模三维语言场以支持开放式语言查询逐渐受到关注。然而，现有基于 3DGS 的方法通常依赖于视角相关的二维基础模型来增强三维语义，但缺乏统一的三维表示，导致视角不一致的问题。此外，开放词汇本身的挑战也导致物体与关系描述出现语义不一致，阻碍了语义的层次化理解。\n为解决上述问题，本文提出 Hi-LSplat，一种面向三维开放词汇查询的视角一致的层次化语言高斯泼溅方法。为实现视角一致的三维层次语义表示，我们首先通过构建带有分层实例聚类的三维语义树，将二维特征提升为三维特征，从根本上缓解了由二维语义特征引起的视角不一致问题。此外，我们引入了基于实例与部件的对比损失，以全面建模三维场景中的层次语义表达。\n值得一提的是，我们构建了两个层次语义数据集，用于更好地评估模型在不同语义层级上的区分能力。大量实验表明，我们的方法在三维开放词汇分割与定位任务中表现出显著优势，其在层次语义数据集上的强劲性能进一步证明了其捕捉复杂三维语义结构的能力。\n"
  },
  {
    "path": "abs/2506.06846.md",
    "content": "### Multi-StyleGS: Stylizing Gaussian Splatting with Multiple Styles\n\nIn recent years, there has been a growing demand to stylize a given 3D scene to align with the artistic style of reference images for creative purposes. While 3D Gaussian Splatting(GS) has emerged as a promising and efficient method for realistic 3D scene modeling, there remains a challenge in adapting it to stylize 3D GS to match with multiple styles through automatic local style transfer or manual designation, while maintaining memory efficiency for stylization training. In this paper, we introduce a novel 3D GS stylization solution termed Multi-StyleGS to tackle these challenges. In particular, we employ a bipartite matching mechanism to au tomatically identify correspondences between the style images and the local regions of the rendered images. To facilitate local style transfer, we introduce a novel semantic style loss function that employs a segmentation network to apply distinct styles to various objects of the scene and propose a local-global feature matching to enhance the multi-view consistency. Furthermore, this technique can achieve memory efficient training, more texture details and better color match. To better assign a robust semantic label to each Gaussian, we propose several techniques to regularize the segmentation network. As demonstrated by our comprehensive experiments, our approach outperforms existing ones in producing plausible stylization results and offering flexible editing.\n\n近年来，越来越多的创意需求希望将给定的三维场景风格化，使其符合参考图像中的艺术风格。尽管三维高斯泼溅（3D Gaussian Splatting, GS）作为一种高效逼真的三维场景建模方法已初具规模，但在实现对多个风格的自动局部迁移或手动指定的风格化方面，仍面临诸多挑战，尤其是在保持风格化训练过程中的内存效率方面。\n为应对这些问题，本文提出了一种新颖的三维 GS 风格化方法——Multi-StyleGS。具体而言，我们引入了二分图匹配机制，自动识别风格图像与渲染图像局部区域之间的对应关系。为实现局部风格迁移，我们设计了一种新型的语义风格损失函数，利用分割网络将不同风格分别应用于场景中的不同物体，并提出局部-全局特征匹配机制，以增强多视角一致性。\n此外，该方法在实现内存高效训练的同时，也带来了更丰富的纹理细节和更准确的色彩匹配效果。为了更稳健地为每个高斯分配语义标签，我们还提出了一系列正则化策略以优化分割网络。\n大量实验证明，我们的方法在生成逼真风格化结果和支持灵活编辑方面均优于现有方法，展现出良好的泛化性与实用价值。\n"
  },
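A minimal sketch of the bipartite matching idea, assuming one mean feature per style image and per segmented region; the L2 cost and the Hungarian solver are stand-ins for the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
style_feats = rng.random((3, 128))      # one feature vector per style image
region_feats = rng.random((5, 128))     # one feature vector per segmented region

# Cost matrix: L2 distance between every (style, region) pair.
cost = np.linalg.norm(style_feats[:, None] - region_feats[None], axis=-1)
style_idx, region_idx = linear_sum_assignment(cost)   # optimal bipartite matching
for s, r in zip(style_idx, region_idx):
    print(f"style {s} -> region {r}")
```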
  {
    "path": "abs/2506.06909.md",
    "content": "### Gaussian Mapping for Evolving Scenes\n\nMapping systems with novel view synthesis (NVS) capabilities are widely used in computer vision, with augmented reality, robotics, and autonomous driving applications. Most notably, 3D Gaussian Splatting-based systems show high NVS performance; however, many current approaches are limited to static scenes. While recent works have started addressing short-term dynamics (motion within the view of the camera), long-term dynamics (the scene evolving through changes out of view) remain less explored. To overcome this limitation, we introduce a dynamic scene adaptation mechanism that continuously updates the 3D representation to reflect the latest changes. In addition, since maintaining geometric and semantic consistency remains challenging due to stale observations disrupting the reconstruction process, we propose a novel keyframe management mechanism that discards outdated observations while preserving as much information as possible. We evaluate Gaussian Mapping for Evolving Scenes (GaME) on both synthetic and real-world datasets and find it to be more accurate than the state of the art.\n\n具备新视角合成（Novel View Synthesis, NVS）能力的建图系统在计算机视觉中应用广泛，涵盖增强现实、机器人与自动驾驶等领域。其中，基于三维高斯泼溅（3D Gaussian Splatting, 3DGS）的系统在新视角合成任务中表现尤为出色；然而，目前多数方法仍局限于静态场景。\n尽管近期研究开始关注短期动态（即视野内的物体运动），但对长期动态（即场景因视野外变化而演化）问题的探索仍相对不足。为突破这一限制，我们提出了一种动态场景自适应机制，可持续更新三维表示以反映场景的最新变化。\n此外，由于旧观测可能干扰重建过程，导致几何与语义一致性难以维持，我们还设计了一种关键帧管理机制，用于在尽量保留信息的同时，剔除过时观测。\n我们在合成数据集和真实场景数据集上对所提出的 Gaussian Mapping for Evolving Scenes（GaME） 方法进行了评估，实验结果表明，其精度显著优于现有最先进方法。\n"
  },
  {
    "path": "abs/2506.06988.md",
    "content": "### Hybrid Mesh-Gaussian Representation for Efficient Indoor Scene Reconstruction\n\n3D Gaussian splatting (3DGS) has demonstrated exceptional performance in image-based 3D reconstruction and real-time rendering. However, regions with complex textures require numerous Gaussians to capture significant color variations accurately, leading to inefficiencies in rendering speed. To address this challenge, we introduce a hybrid representation for indoor scenes that combines 3DGS with textured meshes. Our approach uses textured meshes to handle texture-rich flat areas, while retaining Gaussians to model intricate geometries. The proposed method begins by pruning and refining the extracted mesh to eliminate geometrically complex regions. We then employ a joint optimization for 3DGS and mesh, incorporating a warm-up strategy and transmittance-aware supervision to balance their contributions seamlessly.Extensive experiments demonstrate that the hybrid representation maintains comparable rendering quality and achieves superior frames per second FPS with fewer Gaussian primitives.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）在基于图像的三维重建与实时渲染中表现出色。然而，对于纹理复杂的区域，为准确捕捉颜色变化，往往需要大量高斯图元，导致渲染效率降低。\n为解决这一问题，我们提出了一种面向室内场景的混合表示方法，将 3DGS 与带纹理网格（textured meshes）相结合。该方法利用纹理网格处理大面积的纹理丰富平面区域，同时保留高斯图元以建模几何结构复杂的区域。\n具体而言，我们首先对提取的网格进行裁剪与优化，剔除几何复杂的区域。随后，引入联合优化策略，对 3DGS 与网格进行协同训练，其中包含预热策略与透射感知监督机制，以平衡二者在最终表示中的贡献，实现无缝融合。\n大量实验表明，该混合表示在保持可比渲染质量的同时，显著减少了高斯图元数量，提升了渲染帧率（FPS），展现出更优的效率与实用性。\n"
  },
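A minimal sketch of one plausible reading of transmittance-aware composition: Gaussians are alpha-blended front to back, and the leftover transmittance gates the textured-mesh color behind them. The details here are assumptions, not the paper's exact supervision.

```python
import numpy as np

def composite(alphas, colors, mesh_color):
    """alphas: N front-to-back opacities; colors: Nx3; mesh_color: 3."""
    C, T = np.zeros(3), 1.0
    for a, c in zip(alphas, colors):
        C += T * a * c          # standard front-to-back alpha blending
        T *= 1.0 - a            # remaining transmittance
    return C + T * mesh_color   # mesh receives whatever light is left

C = composite(np.array([0.5, 0.3]),
              np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
              np.array([0.0, 0.0, 1.0]))
```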
  {
    "path": "abs/2506.07069.md",
    "content": "### Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization\n\n3D Gaussian Splatting (3DGS) has recently gained significant attention for high-quality and efficient view synthesis, making it widely adopted in fields such as AR/VR, robotics, and autonomous driving. Despite its impressive algorithmic performance, real-time rendering on resource-constrained devices remains a major challenge due to tight power and area budgets. This paper presents an architecture-algorithm co-design to address these inefficiencies. First, we reveal substantial redundancy caused by repeated computation of common terms/expressions during the conventional rasterization. To resolve this, we propose axis-oriented rasterization, which pre-computes and reuses shared terms along both the X and Y axes through a dedicated hardware design, effectively reducing multiply-and-add (MAC) operations by up to 63%. Second, by identifying the resource and performance inefficiency of the sorting process, we introduce a novel neural sorting approach that predicts order-independent blending weights using an efficient neural network, eliminating the need for costly hardware sorters. A dedicated training framework is also proposed to improve its algorithmic stability. Third, to uniformly support rasterization and neural network inference, we design an efficient reconfigurable processing array that maximizes hardware utilization and throughput. Furthermore, we introduce a π-trajectory tile schedule, inspired by Morton encoding and Hilbert curve, to optimize Gaussian reuse and reduce memory access overhead. Comprehensive experiments demonstrate that the proposed design preserves rendering quality while achieving a speedup of 23.4∼27.8× and energy savings of 28.8∼51.4× compared to edge GPUs for real-world scenes. We plan to open-source our design to foster further development in this field.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）因其在高质量、高效率的新视角合成中的优异表现，近年来备受关注，已广泛应用于增强/虚拟现实（AR/VR）、机器人和自动驾驶等领域。然而，受限于功耗与面积预算，在资源受限设备上实现实时渲染仍面临巨大挑战。\n本文提出一种**架构-算法协同设计（architecture-algorithm co-design）方法，以解决上述效率瓶颈。首先，我们揭示了传统光栅化过程中，由于重复计算公共项/表达式所导致的大量冗余。为此，我们提出轴向光栅化（axis-oriented rasterization）**方案，通过专用硬件设计沿 X 和 Y 轴预计算并复用共享项，有效减少了高达 63% 的乘加操作（MAC）。\n其次，针对排序过程中的资源与性能低效问题，我们提出一种新颖的神经排序方法（neural sorting），通过高效神经网络预测与顺序无关的混合权重，替代高开销的硬件排序器。我们还构建了专门的训练框架，以增强该算法的稳定性。\n第三，为统一支持光栅化与神经网络推理，我们设计了一个高效的可重构处理阵列（reconfigurable processing array），以最大化硬件利用率与吞吐率。此外，我们引入一种受莫顿编码（Morton encoding）与希尔伯特曲线（Hilbert curve）启发的π轨迹瓦片调度（π-trajectory tile schedule），进一步提升高斯重用率并减少内存访问开销。\n综合实验表明，在保持渲染质量的前提下，该设计相比边缘端 GPU 可实现 23.4～27.8× 的加速比 和 28.8～51.4× 的能耗节省，在真实场景中表现显著。我们计划开源该设计，以推动该领域的进一步发展。\n"
  },
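A worked sketch of the shared-term observation behind axis-oriented rasterization: the quadratic Gaussian exponent over a tile splits into a per-column term, a per-row term, and a rank-1 cross term, so most multiplies happen once per axis rather than once per pixel. This is a software illustration of the redundancy, not the hardware design.

```python
import numpy as np

def tile_gaussian(mu, A, xs, ys):
    """A: 2x2 inverse covariance; xs, ys: pixel coordinates covering the tile.
    Exponent = -0.5 * (a*dx^2 + 2b*dx*dy + c*dy^2), computed from axis terms."""
    dx, dy = xs - mu[0], ys - mu[1]
    col = -0.5 * A[0, 0] * dx**2          # computed once per column
    row = -0.5 * A[1, 1] * dy**2          # computed once per row
    cross = -A[0, 1] * np.outer(dy, dx)   # rank-1 cross term dy_i * dx_j
    return np.exp(row[:, None] + col[None, :] + cross)

xs, ys = np.arange(16.0), np.arange(16.0)
A = np.array([[0.05, 0.01],
              [0.01, 0.08]])
weights = tile_gaussian(np.array([8.0, 8.0]), A, xs, ys)   # 16x16 tile weights
```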
  {
    "path": "abs/2506.07338.md",
    "content": "### Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation\n\nInstance Image-Goal Navigation (IIN) requires autonomous agents to identify and navigate to a target object or location depicted in a reference image captured from any viewpoint. While recent methods leverage powerful novel view synthesis (NVS) techniques, such as three-dimensional Gaussian splatting (3DGS), they typically rely on randomly sampling multiple viewpoints or trajectories to ensure comprehensive coverage of discriminative visual cues. This approach, however, creates significant redundancy through overlapping image samples and lacks principled view selection, substantially increasing both rendering and comparison overhead. In this paper, we introduce a novel IIN framework with a hierarchical scoring paradigm that estimates optimal viewpoints for target matching. Our approach integrates cross-level semantic scoring, utilizing CLIP-derived relevancy fields to identify regions with high semantic similarity to the target object class, with fine-grained local geometric scoring that performs precise pose estimation within promising regions. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on simulated IIN benchmarks and real-world applicability.\n\n实例图像目标导航（Instance Image-Goal Navigation，IIN）要求自主智能体识别并导航至目标对象或目标位置，该目标由任意视角下拍摄的参考图像所描述。尽管近期方法利用了强大的新颖视角合成（Novel View Synthesis，NVS）技术，如三维高斯泼洒（3D Gaussian Splatting，3DGS），但它们通常依赖于从多个视角或轨迹中随机采样，以覆盖尽可能多的判别性视觉线索。这种做法会带来大量冗余图像样本重叠的问题，且缺乏系统性的视角选择策略，显著增加了渲染和比对的开销。\n本文提出了一种新颖的 IIN 框架，采用分层评分范式以估计用于目标匹配的最优视角。我们的方法融合了跨层语义评分和细粒度的局部几何评分：前者利用基于 CLIP 的相关性场（relevancy fields）识别与目标对象类别在语义上高度相似的区域；后者则在候选区域中执行精确的姿态估计。大量实验评估表明，我们的方法在模拟的 IIN 基准测试中达到了当前最优性能，并具备良好的真实场景适应能力。\n"
  },
  {
    "path": "abs/2506.07657.md",
    "content": "### PIG: Physically-based Multi-Material Interaction with 3D Gaussians\n\n3D Gaussian Splatting has achieved remarkable success in reconstructing both static and dynamic 3D scenes. However, in a scene represented by 3D Gaussian primitives, interactions between objects suffer from inaccurate 3D segmentation, imprecise deformation among different materials, and severe rendering artifacts. To address these challenges, we introduce PIG: Physically-Based Multi-Material Interaction with 3D Gaussians, a novel approach that combines 3D object segmentation with the simulation of interacting objects in high precision. Firstly, our method facilitates fast and accurate mapping from 2D pixels to 3D Gaussians, enabling precise 3D object-level segmentation. Secondly, we assign unique physical properties to correspondingly segmented objects within the scene for multi-material coupled interactions. Finally, we have successfully embedded constraint scales into deformation gradients, specifically clamping the scaling and rotation properties of the Gaussian primitives to eliminate artifacts and achieve geometric fidelity and visual consistency. Experimental results demonstrate that our method not only outperforms the state-of-the-art (SOTA) in terms of visual quality, but also opens up new directions and pipelines for the field of physically realistic scene generation.\n\n三维高斯泼洒（3D Gaussian Splatting）在静态与动态三维场景重建方面取得了显著成功。然而，在由三维高斯基元表示的场景中，物体间的交互面临诸多挑战，如三维分割不准确、不同材质之间的变形不精确，以及严重的渲染伪影等问题。\n为解决上述问题，我们提出了一种新方法 PIG：基于物理的多材质交互与3D高斯建模（Physically-Based Multi-Material Interaction with 3D Gaussians）。该方法融合了三维物体分割与高精度交互仿真，主要包括三个方面：\n首先，我们的方法实现了从二维像素到三维高斯的快速且精确的映射，从而实现了精确的三维物体级分割；\n其次，我们为场景中对应分割出的物体赋予独特的物理属性，以支持多种材质之间的耦合交互；\n最后，我们在变形梯度中引入了约束尺度，通过对高斯基元的缩放与旋转属性进行限制，有效消除了渲染伪影，实现了几何保真与视觉一致性。\n实验结果表明，本文方法在视觉质量方面超越了当前最先进技术（SOTA），并为物理真实场景生成领域开辟了新的方向与流程。\n"
  },
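A minimal sketch of one way to realize the constraint-scale idea: the singular values (stretch factors) of a deformation gradient are clamped to a bounded range before a Gaussian's scaling/rotation is updated, suppressing extreme deformations and the artifacts they cause. The bounds are illustrative.

```python
import numpy as np

def clamp_deformation(F, s_min=0.8, s_max=1.2):
    """Clamp the stretch factors of a 3x3 deformation gradient via SVD."""
    U, s, Vt = np.linalg.svd(F)
    return U @ np.diag(np.clip(s, s_min, s_max)) @ Vt

rng = np.random.default_rng(0)
F = np.eye(3) + 0.5 * rng.standard_normal((3, 3))   # raw deformation gradient
F_safe = clamp_deformation(F)
assert np.all(np.linalg.svd(F_safe, compute_uv=False) <= 1.2 + 1e-9)
```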
  {
    "path": "abs/2506.07670.md",
    "content": "### ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views\n\nFeed-forward 3D Gaussian Splatting (3DGS) has recently demonstrated promising results for novel view synthesis (NVS) from sparse input views, particularly under narrow-baseline conditions. However, its performance significantly degrades in wide-baseline scenarios due to limited texture details and geometric inconsistencies across views. To address these challenges, in this paper, we propose ProSplat, a two-stage feed-forward framework designed for high-fidelity rendering under wide-baseline conditions. The first stage involves generating 3D Gaussian primitives via a 3DGS generator. In the second stage, rendered views from these primitives are enhanced through an improvement model. Specifically, this improvement model is based on a one-step diffusion model, further optimized by our proposed Maximum Overlap Reference view Injection (MORI) and Distance-Weighted Epipolar Attention (DWEA). MORI supplements missing texture and color by strategically selecting a reference view with maximum viewpoint overlap, while DWEA enforces geometric consistency using epipolar constraints. Additionally, we introduce a divide-and-conquer training strategy that aligns data distributions between the two stages through joint optimization. We evaluate ProSplat on the RealEstate10K and DL3DV-10K datasets under wide-baseline settings. Experimental results demonstrate that ProSplat achieves an average improvement of 1 dB in PSNR compared to recent SOTA methods.\n\n前馈式三维高斯泼洒（Feed-forward 3D Gaussian Splatting，3DGS）近年来在稀疏视角条件下的新颖视角合成（Novel View Synthesis，NVS）任务中展现出良好效果，尤其在窄基线（narrow-baseline）条件下表现突出。然而，在宽基线（wide-baseline）场景中，由于视角间纹理细节不足与几何不一致，其性能显著下降。\n为应对上述挑战，本文提出了 ProSplat：一种面向宽基线条件下高保真渲染的两阶段前馈框架。第一阶段通过一个 3DGS 生成器生成三维高斯基元；第二阶段则使用一个增强模型对这些高斯基元渲染得到的视图进行细化提升。\n具体而言，该增强模型基于一步扩散模型（one-step diffusion model），并结合我们提出的**最大重叠参考视图注入（Maximum Overlap Reference view Injection, MORI）与距离加权极线注意力机制（Distance-Weighted Epipolar Attention, DWEA）**进行优化。MORI 通过策略性地选择与当前视角重叠度最大的参考图像，有效补充缺失的纹理与颜色信息；DWEA 则利用极线约束强化几何一致性。此外，我们还引入了一种“分而治之”的训练策略，通过联合优化方式对齐两阶段间的数据分布。\n我们在 RealEstate10K 与 DL3DV-10K 数据集的宽基线设定下对 ProSplat 进行了评估。实验结果表明，ProSplat 在 PSNR 指标上相比最新的 SOTA 方法平均提升了 1 dB。\n"
  },
  {
    "path": "abs/2506.07697.md",
    "content": "### OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful representation for neural scene reconstruction, offering high-quality novel view synthesis while maintaining computational efficiency. In this paper, we extend the capabilities of 3DGS beyond pure scene representation by introducing an approach for open-vocabulary 3D instance segmentation without requiring manual labeling, termed OpenSplat3D. Our method leverages feature-splatting techniques to associate semantic information with individual Gaussians, enabling fine-grained scene understanding. We incorporate Segment Anything Model instance masks with a contrastive loss formulation as guidance for the instance features to achieve accurate instance-level segmentation. Furthermore, we utilize language embeddings of a vision-language model, allowing for flexible, text-driven instance identification. This combination enables our system to identify and segment arbitrary objects in 3D scenes based on natural language descriptions. We show results on LERF-mask and LERF-OVS as well as the full ScanNet++ validation set, demonstrating the effectiveness of our approach.\n\n三维高斯泼洒（3D Gaussian Splatting，3DGS）近年来作为一种强大的神经场景重建表示方式，因其在保持计算效率的同时实现高质量新颖视角合成而备受关注。本文在传统3DGS仅用于场景表示的基础上进一步拓展其能力，提出了一种无需人工标注的开放词汇三维实例分割方法，称为 OpenSplat3D。\n我们的方法利用特征泼洒（feature-splatting）技术将语义信息关联到每一个高斯基元，从而实现对场景的细粒度理解。具体地，我们引入 Segment Anything Model（SAM） 的实例掩码，并结合对比损失（contrastive loss）来引导实例特征学习，从而获得准确的实例级分割。此外，我们还引入了视觉语言模型的语言嵌入，使得系统能够通过自然语言灵活地识别和分割三维场景中的任意对象。\n这一方法实现了基于文本描述的任意对象在三维空间中的识别与分割。我们在 LERF-mask、LERF-OVS 以及完整的 ScanNet++ 验证集上进行了实验，结果表明我们的方法在开放词汇三维实例分割任务中表现出色，验证了其有效性。\n"
  },
  {
    "path": "abs/2506.07826.md",
    "content": "### R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation\n\nValidating autonomous driving (AD) systems requires diverse and safety-critical testing, making photorealistic virtual environments essential. Traditional simulation platforms, while controllable, are resource-intensive to scale and often suffer from a domain gap with real-world data. In contrast, neural reconstruction methods like 3D Gaussian Splatting (3DGS) offer a scalable solution for creating photorealistic digital twins of real-world driving scenes. However, they struggle with dynamic object manipulation and reusability as their per-scene optimization-based methodology tends to result in incomplete object models with integrated illumination effects. This paper introduces R3D2, a lightweight, one-step diffusion model designed to overcome these limitations and enable realistic insertion of complete 3D assets into existing scenes by generating plausible rendering effects-such as shadows and consistent lighting-in real time. This is achieved by training R3D2 on a novel dataset: 3DGS object assets are generated from in-the-wild AD data using an image-conditioned 3D generative model, and then synthetically placed into neural rendering-based virtual environments, allowing R3D2 to learn realistic integration. Quantitative and qualitative evaluations demonstrate that R3D2 significantly enhances the realism of inserted assets, enabling use-cases like text-to-3D asset insertion and cross-scene/dataset object transfer, allowing for true scalability in AD validation.\n\n验证自动驾驶（Autonomous Driving，AD）系统需要覆盖多样化且安全关键的测试情境，因此逼真的虚拟环境变得尤为重要。传统仿真平台虽具可控性，但在扩展上资源消耗大，且与真实世界数据之间存在显著的领域差异。而诸如三维高斯泼洒（3D Gaussian Splatting，3DGS）等神经重建方法，则为构建真实驾驶场景的数字孪生体提供了一种可扩展的解决方案。然而，由于其基于每个场景独立优化的方式，这类方法在动态对象操作与复用方面存在困难，往往导致生成的对象模型不完整，并融合了场景固有的光照信息，难以灵活插入新对象。\n为突破这一局限，本文提出 R3D2：一种轻量化的一步扩散模型，旨在实现完整三维资产在已有场景中的真实感插入，并实时生成合理的渲染效果（如阴影与一致光照）。该方法通过在一个全新构建的数据集上进行训练实现上述目标：我们首先利用图像条件的三维生成模型，从真实自动驾驶数据中生成 3DGS 对象资产；再将这些资产合成地插入基于神经渲染的虚拟环境中，从而使 R3D2 学习对象与场景的真实融合模式。\n定量与定性评估均表明，R3D2 显著提升了插入资产的真实感，支持如文本生成三维资产插入（text-to-3D asset insertion）、跨场景/数据集的对象迁移等应用场景，真正实现了自动驾驶验证任务中的可扩展性。\n"
  },
  {
    "path": "abs/2506.07865.md",
    "content": "### FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity\n\nIn this paper, we aim to model 3D scene geometry, appearance, and the underlying physics purely from multi-view videos. By applying various governing PDEs as PINN losses or incorporating physics simulation into neural networks, existing works often fail to learn complex physical motions at boundaries or require object priors such as masks or types. In this paper, we propose FreeGave to learn the physics of complex dynamic 3D scenes without needing any object priors. The key to our approach is to introduce a physics code followed by a carefully designed divergence-free module for estimating a per-Gaussian velocity field, without relying on the inefficient PINN losses. Extensive experiments on three public datasets and a newly collected challenging real-world dataset demonstrate the superior performance of our method for future frame extrapolation and motion segmentation. Most notably, our investigation into the learned physics codes reveals that they truly learn meaningful 3D physical motion patterns in the absence of any human labels in training.\n\n本文旨在从多视角视频中纯粹地建模三维场景的几何、外观及其底层物理属性。现有方法通常通过将各种控制偏微分方程（PDEs）作为物理引导神经网络（PINN）损失，或将物理仿真融入神经网络来建模物理过程，但这些方法往往无法有效学习物体边界处复杂的物理运动，或依赖于如掩码、物体类别等先验知识。\n为此，本文提出 FreeGave：一种无需任何物体先验即可学习复杂动态三维场景物理的全新方法。该方法的核心在于引入一种物理编码（physics code），并设计了一个无散度（divergence-free）的模块，以估计每个高斯基元的速度场，从而避免了低效的 PINN 损失函数。\n我们在三个公开数据集以及一个新采集的具有挑战性的真实世界数据集上进行了大量实验，结果显示该方法在未来帧预测和运动分割任务中具有显著性能优势。尤其值得注意的是，我们对学习到的物理编码进行了深入分析，发现即便在完全无人工标注的训练条件下，模型仍能够习得有意义的三维物理运动模式。\n"
  },
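A minimal sketch of why a divergence-free module can be built structurally: any field obtained as the curl of a vector potential has zero divergence, so a network can predict the potential and take v = curl(psi) without a PINN penalty. A fixed analytic potential stands in for the learned one below.

```python
import numpy as np

def curl_of_potential(psi, h):
    """psi: 3 x N x N x N vector potential on a grid with spacing h; returns curl."""
    dpz_dy = np.gradient(psi[2], h, axis=1); dpy_dz = np.gradient(psi[1], h, axis=2)
    dpx_dz = np.gradient(psi[0], h, axis=2); dpz_dx = np.gradient(psi[2], h, axis=0)
    dpy_dx = np.gradient(psi[1], h, axis=0); dpx_dy = np.gradient(psi[0], h, axis=1)
    return np.stack([dpz_dy - dpy_dz, dpx_dz - dpz_dx, dpy_dx - dpx_dy])

h, n = 0.1, 32
g = np.arange(n) * h
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
psi = np.stack([np.sin(Y) * Z, np.cos(Z) * X, np.sin(X) * Y])  # stand-in potential
v = curl_of_potential(psi, h)
div = sum(np.gradient(v[i], h, axis=i) for i in range(3))
print(abs(div[2:-2, 2:-2, 2:-2]).max())   # ~0 in the interior, up to rounding
```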
  {
    "path": "abs/2506.07897.md",
    "content": "### GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution\n\nWe present a novel approach for enhancing the resolution and geometric fidelity of 3D Gaussian Splatting (3DGS) beyond native training resolution. Current 3DGS methods are fundamentally limited by their input resolution, producing reconstructions that cannot extrapolate finer details than are present in the training views. Our work breaks this limitation through a lightweight generative model that predicts and refines additional 3D Gaussians where needed most. The key innovation is our Hessian-assisted sampling strategy, which intelligently identifies regions that are likely to benefit from densification, ensuring computational efficiency. Unlike computationally intensive GANs or diffusion approaches, our method operates in real-time (0.015s per inference on a single consumer-grade GPU), making it practical for interactive applications. Comprehensive experiments demonstrate significant improvements in both geometric accuracy and rendering quality compared to state-of-the-art methods, establishing a new paradigm for resolution-free 3D scene enhancement.\n\n我们提出了一种新方法，可在超越原生训练分辨率的情况下提升 3D 高斯喷溅（3D Gaussian Splatting, 3DGS）的分辨率与几何保真度。现有的 3DGS 方法在本质上受限于输入分辨率，其重建结果无法外推超出训练视图所包含的更精细细节。我们的工作通过一种轻量级生成模型，能够在最需要的区域预测并精炼额外的 3D 高斯点，从而突破这一限制。其核心创新在于 Hessian 辅助采样策略，该策略可智能识别可能从加密中获益的区域，从而确保计算效率。与计算开销巨大的 GAN 或扩散方法不同，我们的方法可在实时运行（单张消费级 GPU 上推理仅需 0.015 秒），因此非常适用于交互式应用。大量实验结果表明，与当前最先进的方法相比，我们在几何精度与渲染质量方面均取得了显著提升，为无分辨率限制的 3D 场景增强确立了新的范式。\n"
  },
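A minimal sketch of Hessian-assisted sampling, assuming per-Gaussian diagonal-Hessian magnitudes of the reconstruction loss are available (random stand-ins below): densification candidates are drawn with probability proportional to curvature, so refinement concentrates where it is likely to help.

```python
import numpy as np

rng = np.random.default_rng(0)
hessian_diag = rng.gamma(2.0, 1.0, size=10_000)   # stand-in curvature magnitudes
p = hessian_diag / hessian_diag.sum()             # sampling distribution
candidates = rng.choice(len(p), size=256, replace=False, p=p)
# `candidates` indexes the Gaussians around which the generator adds detail.
```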
  {
    "path": "abs/2506.07917.md",
    "content": "### Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes\n\nRecent extensions of 3D Gaussian Splatting (3DGS) to dynamic scenes achieve high-quality novel view synthesis by using neural networks to predict the time-varying deformation of each Gaussian. However, performing per-Gaussian neural inference at every frame poses a significant bottleneck, limiting rendering speed and increasing memory and compute requirements. In this paper, we present Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS), a general pipeline for accelerating the rendering speed of dynamic 3DGS and 4DGS representations by reducing neural inference through two complementary techniques. First, we propose a temporal sensitivity pruning score that identifies and removes Gaussians with low contribution to the dynamic scene reconstruction. We also introduce an annealing smooth pruning mechanism that improves pruning robustness in real-world scenes with imprecise camera poses. Second, we propose GroupFlow, a motion analysis technique that clusters Gaussians by trajectory similarity and predicts a single rigid transformation per group instead of separate deformations for each Gaussian. Together, our techniques accelerate rendering by 10.37×, reduce model size by 7.71×, and shorten training time by 2.71× on the NeRF-DS dataset. SpeeDe3DGS also improves rendering speed by 4.20× and 58.23× on the D-NeRF and HyperNeRF vrig datasets. Our methods are modular and can be integrated into any deformable 3DGS or 4DGS framework.\n\n近期，将 3D 高斯喷溅（3D Gaussian Splatting, 3DGS）扩展至动态场景的方法，通过神经网络预测每个高斯点的时变形变，实现了高质量的新视角合成。然而，在每一帧对每个高斯点执行神经推理会带来显著瓶颈，限制了渲染速度，并增加了内存与计算需求。本文提出 Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS)，这是一种通用流程，通过减少神经推理来加速动态 3DGS 与 4DGS 表示的渲染速度，具体采用两种互补技术。\n首先，我们提出 时间敏感性剪枝评分（temporal sensitivity pruning score），用于识别并移除对动态场景重建贡献较低的高斯点。同时，我们引入 退火平滑剪枝机制（annealing smooth pruning），在相机位姿不精确的真实场景中提升剪枝的鲁棒性。\n其次，我们提出 GroupFlow 运动分析技术，根据轨迹相似性将高斯点聚类，并为每个群组预测单一刚性变换，而不是为每个高斯点分别预测形变。\n结合上述两种技术，我们在 NeRF-DS 数据集上实现了 10.37× 的渲染加速、7.71× 的模型体积压缩，以及 2.71× 的训练时间缩短；在 D-NeRF 和 HyperNeRF vrig 数据集上，渲染速度分别提升 4.20× 和 58.23×。我们的方案具备模块化特性，可无缝集成至任意可形变 3DGS 或 4DGS 框架中。\n"
  },
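A minimal sketch of a temporal-sensitivity style pruning score, as a simplified proxy rather than the paper's formulation: Gaussians that barely move over time and contribute little opacity score lowest and are pruned first.

```python
import numpy as np

def pruning_scores(trajectories, opacities):
    """trajectories: T x N x 3 Gaussian centers over time; opacities: N."""
    # Peak deviation from each Gaussian's temporal mean position.
    motion = np.linalg.norm(trajectories - trajectories.mean(0), axis=-1).max(0)
    return opacities * motion                     # low score => prune candidate

rng = np.random.default_rng(0)
T, N = 30, 1000
traj = np.cumsum(0.01 * rng.standard_normal((T, N, 3)), axis=0)
scores = pruning_scores(traj, rng.random(N))
keep = scores >= np.quantile(scores, 0.3)         # drop the lowest-scoring 30%
```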
  {
    "path": "abs/2506.08350.md",
    "content": "### Complex-Valued Holographic Radiance Fields\n\nModeling the full properties of light, including both amplitude and phase, in 3D representations is crucial for advancing physically plausible rendering, particularly in holographic displays. To support these features, we propose a novel representation that optimizes 3D scenes without relying on intensity-based intermediaries. We reformulate 3D Gaussian splatting with complex-valued Gaussian primitives, expanding support for rendering with light waves. By leveraging RGBD multi-view images, our method directly optimizes complex-valued Gaussians as a 3D holographic scene representation. This eliminates the need for computationally expensive hologram re-optimization. Compared with state-of-the-art methods, our method achieves 30x-10,000x speed improvements while maintaining on-par image quality, representing a first step towards geometrically aligned, physically plausible holographic scene representations.\n\n在三维表示中同时建模光的振幅与相位等完整特性，对于推进物理一致性的渲染，尤其是在全息显示领域至关重要。为支持这些特性，我们提出了一种全新的表示方法，可在不依赖基于强度的中间结果的情况下优化三维场景。我们将三维高斯喷溅（3D Gaussian Splatting）重新表述为复值高斯基元，从而扩展对光波渲染的支持。利用 RGBD 多视图图像，我们的方法可直接优化复值高斯，将其作为三维全息场景表示，从而避免了计算代价高昂的全息图重新优化过程。与当前最先进的方法相比，我们的方法在保持相当图像质量的同时，实现了 30× 至 10,000× 的速度提升，代表着向几何对齐、物理一致性的全息场景表示迈出的第一步。\n"
  },
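A minimal 1D sketch of rendering with complex-valued primitives: each Gaussian carries an amplitude and a phase, contributions are summed as a complex field, and intensity is the squared magnitude, so interference is captured rather than simply adding intensities.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 512)
centers = np.array([-0.3, 0.2, 0.5])
sigmas = np.array([0.15, 0.10, 0.20])
amps = np.array([1.0, 0.8, 0.6])
phases = np.array([0.0, 1.3, 2.6])

field = np.zeros_like(x, dtype=np.complex128)
for c, s, a, ph in zip(centers, sigmas, amps, phases):
    field += a * np.exp(1j * ph) * np.exp(-0.5 * ((x - c) / s) ** 2)

intensity = np.abs(field) ** 2   # differs from summing each |contribution|^2
```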
  {
    "path": "abs/2506.08704.md",
    "content": "### TraGraph-GS: Trajectory Graph-based Gaussian Splatting for Arbitrary Large-Scale Scene Rendering\n\nHigh-quality novel view synthesis for large-scale scenes presents a challenging dilemma in 3D computer vision. Existing methods typically partition large scenes into multiple regions, reconstruct a 3D representation using Gaussian splatting for each region, and eventually merge them for novel view rendering. They can accurately render specific scenes, yet they do not generalize effectively for two reasons: (1) rigid spatial partition techniques struggle with arbitrary camera trajectories, and (2) the merging of regions results in Gaussian overlap to distort texture details. To address these challenges, we propose TraGraph-GS, leveraging a trajectory graph to enable high-precision rendering for arbitrarily large-scale scenes. We present a spatial partitioning method for large-scale scenes based on graphs, which incorporates a regularization constraint to enhance the rendering of textures and distant objects, as well as a progressive rendering strategy to mitigate artifacts caused by Gaussian overlap. Experimental results demonstrate its superior performance both on four aerial and four ground datasets and highlight its remarkable efficiency: our method achieves an average improvement of 1.86 dB in PSNR on aerial datasets and 1.62 dB on ground datasets compared to state-of-the-art approaches.\n\n在大规模场景中实现高质量的新视角合成，是三维计算机视觉领域面临的重大挑战。现有方法通常将大场景划分为多个子区域，对每个区域分别利用高斯喷溅（Gaussian Splatting）进行三维重建，并最终将其合并以生成新视角渲染。虽然这种方法能够对特定场景进行精确渲染，但在泛化性方面存在两大问题：(1) 刚性空间划分技术难以适应任意相机轨迹；(2) 多区域合并时的高斯重叠会导致纹理细节失真。\n为解决这些问题，我们提出 TraGraph-GS，利用轨迹图（trajectory graph）实现对任意大规模场景的高精度渲染。我们提出了一种基于图的大规模场景空间划分方法，引入正则化约束以增强纹理与远处物体的渲染质量，并采用渐进式渲染策略以缓解高斯重叠带来的伪影。\n实验结果表明，该方法在四个航拍数据集与四个地面数据集上均表现优异，并具有显著的效率优势：与当前最先进的方法相比，我们在航拍数据集上的平均 PSNR 提升 1.86 dB，在地面数据集上的平均 PSNR 提升 1.62 dB。\n"
  },
  {
    "path": "abs/2506.08710.md",
    "content": "### SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) serves as a highly performant and efficient encoding of scene geometry, appearance, and semantics. Moreover, grounding language in 3D scenes has proven to be an effective strategy for 3D scene understanding. Current Language Gaussian Splatting line of work fall into three main groups: (i) per-scene optimization-based, (ii) per-scene optimization-free, and (iii) generalizable approach. However, most of them are evaluated only on rendered 2D views of a handful of scenes and viewpoints close to the training views, limiting ability and insight into holistic 3D understanding. To address this gap, we propose the first large-scale benchmark that systematically assesses these three groups of methods directly in 3D space, evaluating on 1060 scenes across three indoor datasets and one outdoor dataset. Benchmark results demonstrate a clear advantage of the generalizable paradigm, particularly in relaxing the scene-specific limitation, enabling fast feed-forward inference on novel scenes, and achieving superior segmentation performance. We further introduce GaussianWorld-49K a carefully curated 3DGS dataset comprising around 49K diverse indoor and outdoor scenes obtained from multiple sources, with which we demonstrate the generalizable approach could harness strong data priors. Our codes, benchmark, and datasets will be made public to accelerate research in generalizable 3DGS scene understanding.\n\n3D 高斯喷溅（3D Gaussian Splatting, 3DGS）是一种高性能且高效的场景几何、外观与语义编码方式。此外，将语言与三维场景对齐已被证明是实现三维场景理解的有效策略。当前的 Language Gaussian Splatting 研究主要分为三类：(i) 基于每场景优化（per-scene optimization-based），(ii) 无需每场景优化（per-scene optimization-free），以及 (iii) 可泛化方法（generalizable approach）。然而，大多数方法仅在少量场景的渲染二维视图上进行评估，且评估视角接近训练视图，这在整体三维理解能力与洞察力方面存在局限。\n为弥补这一空白，我们提出了首个大规模基准，能够在 三维空间中直接、系统地评估 这三类方法，覆盖来自三个室内数据集和一个室外数据集的 1060 个场景。基准测试结果显示，可泛化范式在多个方面具备明显优势，尤其是在消除场景特定性限制、实现对新场景的快速前向推理，以及获得更优越的分割性能方面。\n此外，我们引入 GaussianWorld-49K 数据集，这是一个经过精心筛选的 3DGS 数据集，包含约 4.9 万 个来自多种来源的多样化室内与室外场景。实验表明，可泛化方法能够充分利用其中蕴含的强大数据先验。我们的代码、基准与数据集将公开发布，以加速可泛化 3DGS 场景理解领域的研究进展。\n"
  },
  {
    "path": "abs/2506.08777.md",
    "content": "### Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting\n\nSelf-supervised learning (SSL) for point cloud pre-training has become a cornerstone for many 3D vision tasks, enabling effective learning from large-scale unannotated data. At the scene level, existing SSL methods often incorporate volume rendering into the pre-training framework, using RGB-D images as reconstruction signals to facilitate cross-modal learning. This strategy promotes alignment between 2D and 3D modalities and enables the model to benefit from rich visual cues in the RGB-D inputs. However, these approaches are limited by their reliance on implicit scene representations and high memory demands. Furthermore, since their reconstruction objectives are applied only in 2D space, they often fail to capture underlying 3D geometric structures. To address these challenges, we propose Gaussian2Scene, a novel scene-level SSL framework that leverages the efficiency and explicit nature of 3D Gaussian Splatting (3DGS) for pre-training. The use of 3DGS not only alleviates the computational burden associated with volume rendering but also supports direct 3D scene reconstruction, thereby enhancing the geometric understanding of the backbone network. Our approach follows a progressive two-stage training strategy. In the first stage, a dual-branch masked autoencoder learns both 2D and 3D scene representations. In the second stage, we initialize training with reconstructed point clouds and further supervise learning using the geometric locations of Gaussian primitives and rendered RGB images. This process reinforces both geometric and cross-modal learning. We demonstrate the effectiveness of Gaussian2Scene across several downstream 3D object detection tasks, showing consistent improvements over existing pre-training methods.\n\n点云自监督学习（Self-supervised learning, SSL）在预训练中的应用已成为众多 3D 视觉任务的基石，使得模型能够高效地从大规模无标注数据中学习。在场景级别，现有的 SSL 方法通常将体渲染（volume rendering）引入预训练框架中，并利用 RGB-D 图像作为重建信号，以促进跨模态学习。这一策略有助于实现 2D 与 3D 模态的对齐，并使模型能够从 RGB-D 输入中丰富的视觉线索中获益。然而，这类方法受限于对隐式场景表示的依赖以及较高的内存需求。此外，由于其重建目标仅在二维空间中施加，往往难以有效捕捉潜在的三维几何结构。\n为解决上述问题，我们提出 Gaussian2Scene，一种新颖的场景级 SSL 框架，在预训练中利用 3D 高斯点渲染（3D Gaussian Splatting, 3DGS） 的高效性与显式特性。3DGS 的引入不仅缓解了体渲染所带来的计算负担，还支持直接的三维场景重建，从而增强主干网络对几何信息的理解。\n我们的方法采用渐进式的两阶段训练策略：\n第一阶段，使用双分支的掩码自编码器（masked autoencoder）同时学习二维与三维场景表示；\n第二阶段，以重建的点云初始化训练，并进一步利用高斯基元（Gaussian primitives）的几何位置以及渲染得到的 RGB 图像进行监督，从而同时强化几何学习与跨模态学习。\n在多个下游的三维目标检测任务中，我们验证了 Gaussian2Scene 的有效性，并在性能上持续优于现有的预训练方法。\n"
  },
  {
    "path": "abs/2506.08862.md",
    "content": "### StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams\n\nReal-time reconstruction of dynamic 3D scenes from uncalibrated video streams is crucial for numerous real-world applications. However, existing methods struggle to jointly address three key challenges: 1) processing uncalibrated inputs in real time, 2) accurately modeling dynamic scene evolution, and 3) maintaining long-term stability and computational efficiency. To this end, we introduce StreamSplat, the first fully feed-forward framework that transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations in an online manner, capable of recovering scene dynamics from temporally local observations. We propose two key technical innovations: a probabilistic sampling mechanism in the static encoder for 3DGS position prediction, and a bidirectional deformation field in the dynamic decoder that enables robust and efficient dynamic modeling. Extensive experiments on static and dynamic benchmarks demonstrate that StreamSplat consistently outperforms prior works in both reconstruction quality and dynamic scene modeling, while uniquely supporting online reconstruction of arbitrarily long video streams.\n\n从未经标定的视频流中实时重建动态三维场景对于众多现实应用至关重要。然而，现有方法难以同时解决三个关键挑战：(1) 实时处理未经标定的输入，(2) 准确建模动态场景的演化过程，以及 (3) 保持长期稳定性与计算效率。为此，我们提出了 **StreamSplat**，这是首个可将任意长度的未经标定视频流在线转换为动态三维高斯溅射（3DGS）表示的全前馈框架，能够仅依赖时间局部观测恢复场景动态。我们提出了两项关键技术创新：(1) 在静态编码器中引入用于 3DGS 位置预测的概率采样机制；(2) 在动态解码器中引入双向形变场，以实现稳健且高效的动态建模。在静态与动态基准测试上的大量实验表明，**StreamSplat** 在重建质量和动态场景建模方面均显著优于现有方法，并且独特地支持任意长度视频流的在线重建。\n"
  },
  {
    "path": "abs/2506.09070.md",
    "content": "### STREAMINGGS: Voxel-Based Streaming 3D Gaussian Splatting with Memory Optimization and Architectural Support\n\n3D Gaussian Splatting (3DGS) has gained popularity for its efficiency and sparse Gaussian-based representation. However, 3DGS struggles to meet the real-time requirement of 90 frames per second (FPS) on resource-constrained mobile devices, achieving only 2 to 9 this http URL accelerators focus on compute efficiency but overlook memory efficiency, leading to redundant DRAM traffic. We introduce STREAMINGGS, a fully streaming 3DGS algorithm-architecture co-design that achieves fine-grained pipelining and reduces DRAM traffic by transforming from a tile-centric rendering to a memory-centric rendering. Results show that our design achieves up to 45.7 × speedup and 62.9 × energy savings over mobile Ampere GPUs.\n\n三维高斯溅射（3DGS）因其高效性和稀疏的基于高斯的表示方式而广受关注。然而，在资源受限的移动设备上，3DGS 难以满足每秒 90 帧（FPS）的实时需求，仅能实现 2 至 9 FPS。现有加速器虽注重计算效率，却忽视了内存效率，导致大量冗余的 DRAM 访问。为此，我们提出 **STREAMINGGS**，一种全流式的 3DGS 算法-架构协同设计，通过将渲染方式从以图块为中心转变为以内存为中心，实现了细粒度流水化并减少 DRAM 访问量。实验结果表明，我们的设计相比移动端 Ampere GPU 可实现最高 45.7 倍的加速和 62.9 倍的能耗节省。\n"
  },
  {
    "path": "abs/2506.09378.md",
    "content": "### UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images\n\nWe propose a feed-forward Gaussian Splatting model that unifies 3D scene and semantic field reconstruction. Combining 3D scenes with semantic fields facilitates the perception and understanding of the surrounding environment. However, key challenges include embedding semantics into 3D representations, achieving generalizable real-time reconstruction, and ensuring practical applicability by using only images as input without camera parameters or ground truth depth. To this end, we propose UniForward, a feed-forward model to predict 3D Gaussians with anisotropic semantic features from only uncalibrated and unposed sparse-view images. To enable the unified representation of the 3D scene and semantic field, we embed semantic features into 3D Gaussians and predict them through a dual-branch decoupled decoder. During training, we propose a loss-guided view sampler to sample views from easy to hard, eliminating the need for ground truth depth or masks required by previous methods and stabilizing the training process. The whole model can be trained end-to-end using a photometric loss and a distillation loss that leverages semantic features from a pre-trained 2D semantic model. At the inference stage, our UniForward can reconstruct 3D scenes and the corresponding semantic fields in real time from only sparse-view images. The reconstructed 3D scenes achieve high-quality rendering, and the reconstructed 3D semantic field enables the rendering of view-consistent semantic features from arbitrary views, which can be further decoded into dense segmentation masks in an open-vocabulary manner. Experiments on novel view synthesis and novel view segmentation demonstrate that our method achieves state-of-the-art performances for unifying 3D scene and semantic field reconstruction.\n\n我们提出了一种前馈式高斯溅射模型，用于统一三维场景与语义场的重建。将三维场景与语义场相结合，有助于感知与理解周围环境。然而，核心挑战在于：(1) 将语义信息嵌入到三维表示中；(2) 实现具有泛化能力的实时重建；(3) 在仅使用图像作为输入的情况下（无需相机参数或真实深度）确保其实用性。为此，我们提出了 **UniForward**，一种前馈式模型，仅基于未经标定且无位姿的稀疏视角图像，即可预测具有各向异性语义特征的三维高斯。为实现三维场景与语义场的统一表示，我们将语义特征嵌入三维高斯中，并通过双分支解耦解码器进行预测。在训练阶段，我们提出了一种基于损失引导的视角采样策略，从简单到困难依次采样视角，从而避免了以往方法所需的真实深度或掩码标注，并能稳定训练过程。整个模型可端到端训练，损失函数包括光度损失和利用预训练二维语义模型特征的蒸馏损失。在推理阶段，**UniForward** 仅依赖稀疏视角图像即可实时重建三维场景及其对应的语义场。重建的三维场景具有高质量渲染效果，而重建的三维语义场则支持从任意视角渲染视角一致的语义特征，并可进一步解码为开放词汇的稠密分割掩码。在新视角合成与新视角分割任务上的实验结果表明，我们的方法在统一三维场景与语义场重建方面达到了当前最优的性能。\n"
  },
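A minimal sketch of a loss-guided easy-to-hard view sampler, as one interpretation of the paper's scheme: early in training, low-loss (easy) views are drawn more often, and a rising temperature anneals the distribution toward uniform so harder views appear later.

```python
import numpy as np

def sample_view(losses, step, total_steps, rng):
    t = step / total_steps
    temp = np.interp(t, [0.0, 1.0], [0.2, 5.0])   # sharp early, near-uniform late
    logits = -np.asarray(losses) / temp            # favor low-loss views first
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(losses), p=p)

rng = np.random.default_rng(0)
view = sample_view([0.1, 0.5, 0.9, 0.3], step=100, total_steps=10_000, rng=rng)
```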
  {
    "path": "abs/2506.09417.md",
    "content": "### ODG: Occupancy Prediction Using Dual Gaussians\n\nOccupancy prediction infers fine-grained 3D geometry and semantics from camera images of the surrounding environment, making it a critical perception task for autonomous driving. Existing methods either adopt dense grids as scene representation, which is difficult to scale to high resolution, or learn the entire scene using a single set of sparse queries, which is insufficient to handle the various object characteristics. In this paper, we present ODG, a hierarchical dual sparse Gaussian representation to effectively capture complex scene dynamics. Building upon the observation that driving scenes can be universally decomposed into static and dynamic counterparts, we define dual Gaussian queries to better model the diverse scene objects. We utilize a hierarchical Gaussian transformer to predict the occupied voxel centers and semantic classes along with the Gaussian parameters. Leveraging the real-time rendering capability of 3D Gaussian Splatting, we also impose rendering supervision with available depth and semantic map annotations injecting pixel-level alignment to boost occupancy learning. Extensive experiments on the Occ3D-nuScenes and Occ3D-Waymo benchmarks demonstrate our proposed method sets new state-of-the-art results while maintaining low inference cost.\n\n占用预测（Occupancy Prediction）通过摄像机图像推断环境的精细三维几何与语义信息，是自动驾驶中的关键感知任务。现有方法要么采用稠密网格作为场景表示，但难以扩展至高分辨率；要么使用单一稀疏查询集来学习整个场景，但不足以应对不同物体的多样特性。本文提出 **ODG**，一种分层双稀疏高斯表示方法，以高效捕获复杂的场景动态。基于驾驶场景可普遍分解为静态部分与动态部分的观察，我们设计了双高斯查询（Dual Gaussian Queries），以更好地建模多样化的场景物体。我们采用分层高斯 Transformer 预测占用体素中心、语义类别及其对应的高斯参数。利用三维高斯溅射（3D Gaussian Splatting）的实时渲染能力，我们结合可用的深度和语义图标注引入渲染监督，在像素级对齐约束下进一步提升占用预测性能。在 Occ3D-nuScenes 与 Occ3D-Waymo 基准测试上的大量实验表明，我们的方法在保持低推理开销的同时，达到了新的最优性能。\n"
  },
  {
    "path": "abs/2506.09479.md",
    "content": "### TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation\n\nThe recent development of feedforward 3D Gaussian Splatting (3DGS) presents a new paradigm to reconstruct 3D scenes. Using neural networks trained on large-scale multi-view datasets, it can directly infer 3DGS representations from sparse input views. Although the feedforward approach achieves high reconstruction speed, it still suffers from the substantial storage cost of 3D Gaussians. Existing 3DGS compression methods relying on scene-wise optimization are not applicable due to architectural incompatibilities. To overcome this limitation, we propose TinySplat, a complete feedforward approach for generating compact 3D scene representations. Built upon standard feedforward 3DGS methods, TinySplat integrates a training-free compression framework that systematically eliminates key sources of redundancy. Specifically, we introduce View-Projection Transformation (VPT) to reduce geometric redundancy by projecting geometric parameters into a more compact space. We further present Visibility-Aware Basis Reduction (VABR), which mitigates perceptual redundancy by aligning feature energy along dominant viewing directions via basis transformation. Lastly, spatial redundancy is addressed through an off-the-shelf video codec. Comprehensive experimental results on multiple benchmark datasets demonstrate that TinySplat achieves over 100x compression for 3D Gaussian data generated by feedforward methods. Compared to the state-of-the-art compression approach, we achieve comparable quality with only 6% of the storage size. Meanwhile, our compression framework requires only 25% of the encoding time and 1% of the decoding time.\n\n前馈式三维高斯溅射（3DGS）的最新发展为三维场景重建提供了一种全新范式。通过在大规模多视图数据集上训练的神经网络，它可以直接从稀疏输入视角推断出 3DGS 表示。尽管前馈方法在重建速度上表现出色，但仍面临三维高斯数据存储成本巨大的问题。现有依赖于逐场景优化的 3DGS 压缩方法由于架构不兼容而无法适用。为解决这一问题，我们提出了 **TinySplat**，一种完整的前馈式紧凑三维场景表示生成方法。在标准前馈 3DGS 方法的基础上，**TinySplat** 集成了一个无需训练的压缩框架，系统性地消除主要的冗余来源。具体而言，我们提出了 **视图投影变换（View-Projection Transformation, VPT）**，通过将几何参数投影到更紧凑的空间来减少几何冗余；进一步提出了 **可见性感知基变换（Visibility-Aware Basis Reduction, VABR）**，通过基变换将特征能量沿主要观察方向对齐，从而缓解感知冗余；最后，利用现成的视频编码器来处理空间冗余。多项基准数据集的综合实验结果表明，**TinySplat** 对由前馈方法生成的三维高斯数据可实现超过 100 倍的压缩率。与当前最先进的压缩方法相比，我们在仅使用其 6% 存储空间的情况下实现了相当的重建质量。同时，我们的压缩框架在编码时间上仅需其 25%，解码时间上仅需其 1%。\n"
  },
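A minimal sketch of the basis-reduction idea: per-Gaussian appearance coefficients are projected onto a shared low-rank basis so only k codes per Gaussian (plus one shared basis) need to be stored; the visibility weighting of VABR is omitted here, so this is plain PCA as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((20_000, 48))     # e.g. SH/appearance coefficients
mean = feats.mean(0)
_, _, Vt = np.linalg.svd(feats - mean, full_matrices=False)

k = 8
basis = Vt[:k]                                # shared k x 48 basis
codes = (feats - mean) @ basis.T              # k coefficients per Gaussian
recon = codes @ basis + mean                  # lossy reconstruction
print(np.linalg.norm(recon - feats) / np.linalg.norm(feats))
```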
  {
    "path": "abs/2506.09518.md",
    "content": "### HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene\n\nReconstructing dynamic 3D scenes from monocular videos remains a fundamental challenge in 3D vision. While 3D Gaussian Splatting (3DGS) achieves real-time rendering in static settings, extending it to dynamic scenes is challenging due to the difficulty of learning structured and temporally consistent motion representations. This challenge often manifests as three limitations in existing methods: redundant Gaussian updates, insufficient motion supervision, and weak modeling of complex non-rigid deformations. These issues collectively hinder coherent and efficient dynamic reconstruction. To address these limitations, we propose HAIF-GS, a unified framework that enables structured and consistent dynamic modeling through sparse anchor-driven deformation. It first identifies motion-relevant regions via an Anchor Filter to suppresses redundant updates in static areas. A self-supervised Induced Flow-Guided Deformation module induces anchor motion using multi-frame feature aggregation, eliminating the need for explicit flow labels. To further handle fine-grained deformations, a Hierarchical Anchor Propagation mechanism increases anchor resolution based on motion complexity and propagates multi-level transformations. Extensive experiments on synthetic and real-world benchmarks validate that HAIF-GS significantly outperforms prior dynamic 3DGS methods in rendering quality, temporal coherence, and reconstruction efficiency.\n\n从单目视频中重建动态三维场景仍然是三维视觉领域的核心挑战。尽管三维高斯溅射（3DGS）在静态场景中能够实现实时渲染，但将其扩展到动态场景依然困难，主要原因在于难以学习结构化且时间一致的运动表示。这一难题在现有方法中通常表现为三大局限：(1) 冗余的高斯更新；(2) 不足的运动监督；(3) 对复杂非刚性形变的建模能力较弱。这些问题共同阻碍了连贯且高效的动态重建。为解决上述问题，我们提出 **HAIF-GS**，一种通过稀疏锚点驱动形变实现结构化与一致性动态建模的统一框架。该方法首先通过 **锚点过滤器（Anchor Filter）** 检测与运动相关的区域，从而抑制静态区域中的冗余更新；随后引入一种 **自监督的流引导形变模块（Induced Flow-Guided Deformation）**，利用多帧特征聚合来驱动锚点运动，无需显式光流标签；最后，为更好地处理细粒度形变，我们提出 **分层锚点传播机制（Hierarchical Anchor Propagation）**，根据运动复杂度提升锚点分辨率，并传播多层级的变换信息。大量在合成和真实数据集上的实验验证表明，**HAIF-GS** 在渲染质量、时间一致性和重建效率方面均显著优于现有动态 3DGS 方法。\n"
  },
  {
    "path": "abs/2506.09534.md",
    "content": "### Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful technique for radiance field rendering, but it typically requires millions of redundant Gaussian primitives, overwhelming memory and rendering budgets. Existing compaction approaches address this by pruning Gaussians based on heuristic importance scores, without global fidelity guarantee. To bridge this gap, we propose a novel optimal transport perspective that casts 3DGS compaction as global Gaussian mixture reduction. Specifically, we first minimize the composite transport divergence over a KD-tree partition to produce a compact geometric representation, and then decouple appearance from geometry by fine-tuning color and opacity attributes with far fewer Gaussian primitives. Experiments on benchmark datasets show that our method (i) yields negligible loss in rendering quality (PSNR, SSIM, LPIPS) compared to vanilla 3DGS with only 10% Gaussians; and (ii) consistently outperforms state-of-the-art 3DGS compaction techniques. Notably, our method is applicable to any stage of vanilla or accelerated 3DGS pipelines, providing an efficient and agnostic pathway to lightweight neural rendering.\n\n三维高斯溅射（3DGS）已成为辐射场渲染的强大技术，但其通常需要数百万个冗余的高斯基元，从而造成巨大的内存与渲染开销。现有压缩方法多依赖基于启发式重要性评分的高斯裁剪，但缺乏全局保真度保证。为此，我们提出了一种全新的**最优传输视角**，将 3DGS 压缩建模为全局高斯混合降维问题。具体而言，我们首先在 KD 树划分上最小化复合传输散度，以获得紧凑的几何表示；随后，通过微调颜色与不透明度属性，将外观与几何解耦，并使用显著更少的高斯基元完成表示。基准数据集上的实验表明，我们的方法 (i) 在仅保留 10% 高斯基元的情况下，与原始 3DGS 相比在渲染质量（PSNR、SSIM、LPIPS）上几乎无损；(ii) 在各项指标上均持续优于当前最先进的 3DGS 压缩方法。值得注意的是，我们的方法可适用于原始或加速版 3DGS 管线的任意阶段，为实现轻量化神经渲染提供了一条高效且通用的路径。\n"
  },
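A minimal sketch of the core mixture-reduction primitive: merging two weighted Gaussians into the single Gaussian that matches their combined first two moments. The paper selects merge partners via an optimal-transport objective over a KD-tree partition; only the moment-matched merge step is shown here.

```python
import numpy as np

def merge(w1, mu1, S1, w2, mu2, S2):
    """Moment-matched merge of two weighted Gaussians (w, mean, covariance)."""
    w = w1 + w2
    a, b = w1 / w, w2 / w
    mu = a * mu1 + b * mu2
    d1, d2 = (mu1 - mu)[:, None], (mu2 - mu)[:, None]
    S = a * (S1 + d1 @ d1.T) + b * (S2 + d2 @ d2.T)   # within + between spread
    return w, mu, S

w, mu, S = merge(0.6, np.zeros(3), np.eye(3),
                 0.4, np.ones(3), 0.5 * np.eye(3))
```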
  {
    "path": "abs/2506.09565.md",
    "content": "### SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields\n\nHolistic 3D scene understanding, which jointly models geometry, appearance, and semantics, is crucial for applications like augmented reality and robotic interaction. Existing feed-forward 3D scene understanding methods (e.g., LSM) are limited to extracting language-based semantics from scenes, failing to achieve holistic scene comprehension. Additionally, they suffer from low-quality geometry reconstruction and noisy artifacts. In contrast, per-scene optimization methods rely on dense input views, which reduces practicality and increases complexity during deployment. In this paper, we propose SemanticSplat, a feed-forward semantic-aware 3D reconstruction method, which unifies 3D Gaussians with latent semantic attributes for joint geometry-appearance-semantics modeling. To predict the semantic anisotropic Gaussians, SemanticSplat fuses diverse feature fields (e.g., LSeg, SAM) with a cost volume representation that stores cross-view feature similarities, enhancing coherent and accurate scene comprehension. Leveraging a two-stage distillation framework, SemanticSplat reconstructs a holistic multi-modal semantic feature field from sparse-view images. Experiments demonstrate the effectiveness of our method for 3D scene understanding tasks like promptable and open-vocabulary segmentation.\n\n整体三维场景理解（Holistic 3D Scene Understanding）需要在统一框架下同时建模几何、外观与语义，对于增强现实、机器人交互等应用至关重要。现有前馈式三维场景理解方法（如 LSM）仅限于从场景中提取基于语言的语义信息，无法实现真正的整体场景理解；同时，其几何重建质量较低且存在噪声伪影。相比之下，逐场景优化方法依赖稠密输入视角，降低了实用性并增加了部署复杂度。为此，本文提出 **SemanticSplat**，一种前馈式语义感知三维重建方法，将三维高斯与潜在语义属性结合，实现几何—外观—语义的联合建模。为了预测语义各向异性高斯，**SemanticSplat** 融合了多种特征场（如 LSeg、SAM）与存储跨视角特征相似性的代价体表示（Cost Volume Representation），从而提升场景理解的一致性与准确性。借助双阶段蒸馏框架，**SemanticSplat** 能够从稀疏视角图像中重建完整的多模态语义特征场。实验结果表明，我们的方法在可提示分割（Promptable Segmentation）与开放词汇分割（Open-vocabulary Segmentation）等三维场景理解任务中表现优异。\n"
  },
  {
    "path": "abs/2506.09663.md",
    "content": "### Self-Supervised Multi-Part Articulated Objects Modeling via Deformable Gaussian Splatting and Progressive Primitive Segmentation\n\nArticulated objects are ubiquitous in everyday life, and accurate 3D representations of their geometry and motion are critical for numerous applications. However, in the absence of human annotation, existing approaches still struggle to build a unified representation for objects that contain multiple movable parts. We introduce DeGSS, a unified framework that encodes articulated objects as deformable 3D Gaussian fields, embedding geometry, appearance, and motion in one compact representation. Each interaction state is modeled as a smooth deformation of a shared field, and the resulting deformation trajectories guide a progressive coarse-to-fine part segmentation that identifies distinct rigid components, all in an unsupervised manner. The refined field provides a spatially continuous, fully decoupled description of every part, supporting part-level reconstruction and precise modeling of their kinematic relationships. To evaluate generalization and realism, we enlarge the synthetic PartNet-Mobility benchmark and release RS-Art, a real-to-sim dataset that pairs RGB captures with accurately reverse-engineered 3D models. Extensive experiments demonstrate that our method outperforms existing methods in both accuracy and stability.\n\n关节物体在日常生活中无处不在，而其几何与运动的精确三维表示对于众多应用至关重要。然而，在缺乏人工标注的情况下，现有方法仍难以为包含多个可动部件的物体构建统一表示。我们提出 **DeGSS**，一种将关节物体编码为可变形三维高斯场的统一框架，在一个紧凑表示中同时嵌入几何、外观与运动信息。每个交互状态被建模为共享场的平滑形变，而由此产生的形变轨迹则引导自粗到细的渐进式部件分割，在完全无监督的条件下识别出不同的刚性部件。经细化的场能够为每个部件提供空间连续、完全解耦的描述，支持部件级重建及其运动学关系的精确建模。为评估泛化性与逼真度，我们扩展了合成数据集 PartNet-Mobility，并发布 **RS-Art**，一个将真实 RGB 采集与高精度逆向工程的三维模型配对的真实到仿真数据集。大量实验表明，我们的方法在精度与稳定性方面均优于现有方法。\n"
  },
  {
    "path": "abs/2506.09952.md",
    "content": "### UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting\n\nThe scale diversity of point cloud data presents significant challenges in developing unified representation learning techniques for 3D vision. Currently, there are few unified 3D models, and no existing pre-training method is equally effective for both object- and scene-level point clouds. In this paper, we introduce UniPre3D, the first unified pre-training method that can be seamlessly applied to point clouds of any scale and 3D models of any architecture. Our approach predicts Gaussian primitives as the pre-training task and employs differentiable Gaussian splatting to render images, enabling precise pixel-level supervision and end-to-end optimization. To further regulate the complexity of the pre-training task and direct the model's focus toward geometric structures, we integrate 2D features from pre-trained image models to incorporate well-established texture knowledge. We validate the universal effectiveness of our proposed method through extensive experiments across a variety of object- and scene-level tasks, using diverse point cloud models as backbones.\n\n点云数据的尺度多样性给三维视觉中统一表征学习技术的发展带来了巨大挑战。目前，统一的三维模型屈指可数，且尚无预训练方法能够在物体级与场景级点云上同时保持同等高效。本文提出 **UniPre3D**，这是首个可无缝应用于任意尺度点云及任意架构三维模型的统一预训练方法。我们将预测高斯基元作为预训练任务，并利用可微分的高斯溅射进行图像渲染，从而实现精确的像素级监督与端到端优化。为进一步调控预训练任务的复杂性并引导模型关注几何结构，我们引入来自预训练图像模型的二维特征，以融入成熟的纹理先验知识。我们在多种物体级与场景级任务上，以及基于不同点云模型的骨干网络上进行了大量实验，验证了所提方法的通用有效性。\n"
  },
  {
    "path": "abs/2506.09997.md",
    "content": "### DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos\n\nWe introduce the Deformable Gaussian Splats Large Reconstruction Model (DGS-LRM), the first feed-forward method predicting deformable 3D Gaussian splats from a monocular posed video of any dynamic scene. Feed-forward scene reconstruction has gained significant attention for its ability to rapidly create digital replicas of real-world environments. However, most existing models are limited to static scenes and fail to reconstruct the motion of moving objects. Developing a feed-forward model for dynamic scene reconstruction poses significant challenges, including the scarcity of training data and the need for appropriate 3D representations and training paradigms. To address these challenges, we introduce several key technical contributions: an enhanced large-scale synthetic dataset with ground-truth multi-view videos and dense 3D scene flow supervision; a per-pixel deformable 3D Gaussian representation that is easy to learn, supports high-quality dynamic view synthesis, and enables long-range 3D tracking; and a large transformer network that achieves real-time, generalizable dynamic scene reconstruction. Extensive qualitative and quantitative experiments demonstrate that DGS-LRM achieves dynamic scene reconstruction quality comparable to optimization-based methods, while significantly outperforming the state-of-the-art predictive dynamic reconstruction method on real-world examples. Its predicted physically grounded 3D deformation is accurate and can readily adapt for long-range 3D tracking tasks, achieving performance on par with state-of-the-art monocular video 3D tracking methods.\n\n我们提出了**可变形高斯溅射大规模重建模型（DGS-LRM）**，这是首个能够从任意动态场景的单目已知位姿视频中预测可变形三维高斯溅射的前馈式方法。前馈式场景重建因其能够快速生成真实环境的数字副本而备受关注。然而，大多数现有模型仅适用于静态场景，无法重建运动物体的动态信息。开发用于动态场景重建的前馈模型面临多重挑战，包括训练数据稀缺以及对合适的三维表示与训练范式的需求。为此，我们提出了若干关键技术创新：(1) 构建了增强型大规模合成数据集，包含真实标注的多视角视频与密集三维场景流监督；(2) 提出了易于学习的逐像素可变形三维高斯表示，既支持高质量的动态视图合成，又可实现长距离三维跟踪；(3) 设计了大规模 Transformer 网络，实现了实时、具有泛化能力的动态场景重建。大量定性与定量实验表明，DGS-LRM 的动态场景重建质量可与基于优化的方法媲美，并在真实场景中显著优于当前最先进的预测式动态重建方法。其预测的物理一致性三维形变精确可靠，可直接应用于长距离三维跟踪任务，并在性能上达到当前最先进的单目视频三维跟踪方法的水准。\n"
  },
  {
    "path": "abs/2506.10335.md",
    "content": "### PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting\n\n3D Gaussian splatting (3DGS) is an innovative rendering technique that surpasses the neural radiance field (NeRF) in both rendering speed and visual quality by leveraging an explicit 3D scene representation. Existing 3DGS approaches require a large number of calibrated views to generate a consistent and complete scene representation. When input views are limited, 3DGS tends to overfit the training views, leading to noticeable degradation in rendering quality. To address this limitation, we propose a Point-wise Feature-Aware Gaussian Splatting framework that enables real-time, high-quality rendering from sparse training views. Specifically, we first employ the latest stereo foundation model to estimate accurate camera poses and reconstruct a dense point cloud for Gaussian initialization. We then encode the colour attributes of each 3D Gaussian by sampling and aggregating multiscale 2D appearance features from sparse inputs. To enhance point-wise appearance representation, we design a point interaction network based on a self-attention mechanism, allowing each Gaussian point to interact with its nearest neighbors. These enriched features are subsequently decoded into Gaussian parameters through two lightweight multi-layer perceptrons (MLPs) for final rendering. Extensive experiments on diverse benchmarks demonstrate that our method significantly outperforms NeRF-based approaches and achieves competitive performance under few-shot settings compared to the state-of-the-art 3DGS methods.\n\n\n三维高斯溅射（3DGS）是一种创新型渲染技术，通过显式三维场景表示在渲染速度与视觉质量上均超越了神经辐射场（NeRF）。现有 3DGS 方法需要大量已标定视角才能生成一致且完整的场景表示。当输入视角有限时，3DGS 往往会对训练视角过拟合，导致渲染质量显著下降。为解决这一问题，我们提出了一种**逐点特征感知的高斯溅射（Point-wise Feature-Aware Gaussian Splatting）**框架，实现了在稀疏训练视角下的实时高质量渲染。具体而言，我们首先利用最新的立体视觉基础模型估计精确的相机位姿，并重建稠密点云以初始化高斯分布；然后，通过从稀疏输入中采样并聚合多尺度二维外观特征，为每个三维高斯编码颜色属性。为增强逐点外观表征，我们设计了一种基于自注意力机制的点交互网络，使每个高斯点能够与其最近邻点进行信息交互。随后，这些丰富的特征通过两个轻量级多层感知机（MLP）解码为高斯参数，用于最终渲染。在多个基准数据集上的大量实验表明，我们的方法在少样本场景下显著优于基于 NeRF 的方法，并在与当前最先进的 3DGS 方法对比中取得了具有竞争力的性能。\n"
  },
  {
    "path": "abs/2506.12400.md",
    "content": "### Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis. However, existing methods struggle to adaptively optimize the distribution of Gaussian primitives based on scene characteristics, making it challenging to balance reconstruction quality and efficiency. Inspired by human perception, we propose scene-adaptive perceptual densification for Gaussian Splatting (Perceptual-GS), a novel framework that integrates perceptual sensitivity into the 3DGS training process to address this challenge. We first introduce a perception-aware representation that models human visual sensitivity while constraining the number of Gaussian primitives. Building on this foundation, we develop a perceptual sensitivity-adaptive distribution to allocate finer Gaussian granularity to visually critical regions, enhancing reconstruction quality and robustness. Extensive evaluations on multiple datasets, including BungeeNeRF for large-scale scenes, demonstrate that Perceptual-GS achieves state-of-the-art performance in reconstruction quality, efficiency, and robustness.\n\n三维高斯溅射（3DGS）已成为新视角合成的强大技术。然而，现有方法难以根据场景特征自适应地优化高斯基元的分布，从而难以在重建质量与效率之间取得平衡。受人类感知机制的启发，我们提出**感知驱动的场景自适应高斯加密（Perceptual-GS）**，一种将感知敏感性融入 3DGS 训练过程的新框架，以应对这一挑战。我们首先引入一种**感知感知表示**（Perception-aware Representation），在约束高斯基元数量的同时建模人类视觉敏感性。在此基础上，我们设计了**感知敏感性自适应分布策略**，将更精细的高斯粒度分配给视觉关键区域，从而提升重建质量与鲁棒性。在多个数据集上的大量评估（包括用于大规模场景的 BungeeNeRF）表明，Perceptual-GS 在重建质量、效率和鲁棒性方面均达到了当前最优性能。\n"
  },
  {
    "path": "abs/2506.12716.md",
    "content": "### Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors\n\nWe tackle the challenge of generating dynamic 4D scenes from monocular, multi-object videos with heavy occlusions, and introduce GenMOJO, a novel approach that integrates rendering-based deformable 3D Gaussian optimization with generative priors for view synthesis. While existing models perform well on novel view synthesis for isolated objects, they struggle to generalize to complex, cluttered scenes. To address this, GenMOJO decomposes the scene into individual objects, optimizing a differentiable set of deformable Gaussians per object. This object-wise decomposition allows leveraging object-centric diffusion models to infer unobserved regions in novel viewpoints. It performs joint Gaussian splatting to render the full scene, capturing cross-object occlusions, and enabling occlusion-aware supervision. To bridge the gap between object-centric priors and the global frame-centric coordinate system of videos, GenMOJO uses differentiable transformations that align generative and rendering constraints within a unified framework. The resulting model generates 4D object reconstructions over space and time, and produces accurate 2D and 3D point tracks from monocular input. Quantitative evaluations and perceptual human studies confirm that GenMOJO generates more realistic novel views of scenes and produces more accurate point tracks compared to existing approaches.\n\n我们针对从存在严重遮挡的单目多目标视频中生成动态四维场景的挑战，提出了 **GenMOJO**，一种将基于渲染的可变形三维高斯优化与生成式先验相结合的新方法，用于新视角合成。现有模型在孤立物体的新视角合成上表现良好，但难以推广到复杂、杂乱的场景。为解决这一问题，**GenMOJO** 将场景分解为独立的物体，并为每个物体优化一组可微的可变形高斯。这种按物体分解的方式能够利用基于物体的扩散模型，在新视角下推断未观测到的区域。该方法执行联合高斯溅射以渲染完整场景，捕捉跨物体遮挡关系，并实现遮挡感知的监督。为弥合物体中心先验与视频全局帧中心坐标系之间的差距，**GenMOJO** 使用可微变换将生成与渲染约束对齐到统一框架中。最终，该模型能够在时空中生成四维物体重建，并从单目输入生成精确的二维和三维点跟踪。定量评估与人类感知研究均表明，**GenMOJO** 在生成场景新视角和生成更精确的点跟踪方面均优于现有方法。\n"
  },
  {
    "path": "abs/2506.12727.md",
    "content": "### Efficient multi-view training for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a preferred choice alongside Neural Radiance Fields (NeRF) in inverse rendering due to its superior rendering speed. Currently, the common approach in 3DGS is to utilize \"single-view\" mini-batch training, where only one image is processed per iteration, in contrast to NeRF's \"multi-view\" mini-batch training, which leverages multiple images. We observe that such single-view training can lead to suboptimal optimization due to increased variance in mini-batch stochastic gradients, highlighting the necessity for multi-view training. However, implementing multi-view training in 3DGS poses challenges. Simply rendering multiple images per iteration incurs considerable overhead and may result in suboptimal Gaussian densification due to its reliance on single-view assumptions. To address these issues, we modify the rasterization process to minimize the overhead associated with multi-view training and propose a 3D distance-aware D-SSIM loss and multi-view adaptive density control that better suits multi-view scenarios. Our experiments demonstrate that the proposed methods significantly enhance the performance of 3DGS and its variants, freeing 3DGS from the constraints of single-view training.\n\n三维高斯溅射（3DGS）凭借其卓越的渲染速度，已与神经辐射场（NeRF）一道成为逆向渲染中的优选方案。目前，3DGS 的常见训练方式是采用“单视角”小批量训练，即每次迭代仅处理一张图像；而 NeRF 则采用“多视角”小批量训练，在每次迭代中利用多张图像。我们观察到，这种单视角训练会因小批量随机梯度方差增加而导致次优优化结果，这凸显了多视角训练的必要性。然而，在 3DGS 中实现多视角训练面临挑战：简单地在每次迭代中渲染多张图像会带来显著的计算开销，并且可能由于依赖单视角假设而导致高斯加密效果欠佳。为解决这些问题，我们修改了光栅化过程，以最大限度减少多视角训练的额外开销，并提出了**三维距离感知的 D-SSIM 损失**以及**多视角自适应密度控制**，更好地适配多视角训练场景。实验结果表明，所提方法显著提升了 3DGS 及其变体的性能，使其摆脱了单视角训练的限制。\n"
  },
  {
    "path": "abs/2506.12793.md",
    "content": "### SMPL Normal Map Is All You Need for Single-view Textured Human Reconstruction\n\nSingle-view textured human reconstruction aims to reconstruct a clothed 3D digital human by inputting a monocular 2D image. Existing approaches include feed-forward methods, limited by scarce 3D human data, and diffusion-based methods, prone to erroneous 2D hallucinations. To address these issues, we propose a novel SMPL normal map Equipped 3D Human Reconstruction (SEHR) framework, integrating a pretrained large 3D reconstruction model with human geometry prior. SEHR performs single-view human reconstruction without using a preset diffusion model in one forward propagation. Concretely, SEHR consists of two key components: SMPL Normal Map Guidance (SNMG) and SMPL Normal Map Constraint (SNMC). SNMG incorporates SMPL normal maps into an auxiliary network to provide improved body shape guidance. SNMC enhances invisible body parts by constraining the model to predict an extra SMPL normal Gaussians. Extensive experiments on two benchmark datasets demonstrate that SEHR outperforms existing state-of-the-art methods.\n\n单视角纹理化人体重建旨在通过输入一张单目二维图像来重建穿着衣物的三维数字人。现有方法包括前馈式方法（受限于稀缺的三维人体数据）和基于扩散的方法（容易产生错误的二维幻觉）。为解决这些问题，我们提出了一种全新的**SMPL 法线图增强三维人体重建（SEHR）**框架，将预训练的大规模三维重建模型与人体几何先验相结合。**SEHR** 能够在一次前向传播中完成单视角人体重建，而无需依赖预设的扩散模型。具体而言，SEHR 包含两个核心组件：**SMPL 法线图引导（SNMG）**与**SMPL 法线图约束（SNMC）**。SNMG 将 SMPL 法线图引入辅助网络，为人体形状提供更精准的几何引导；SNMC 则通过约束模型预测额外的 SMPL 法线高斯来增强不可见的人体部位。在两个基准数据集上的大量实验表明，**SEHR** 的性能优于现有最先进的方法。\n"
  },
  {
    "path": "abs/2506.12945.md",
    "content": "### Metropolis-Hastings Sampling for 3D Gaussian Reconstruction\n\nWe propose an adaptive sampling framework for 3D Gaussian Splatting (3DGS) that leverages comprehensive multi-view photometric error signals within a unified Metropolis-Hastings approach. Traditional 3DGS methods heavily rely on heuristic-based density-control mechanisms (e.g., cloning, splitting, and pruning), which can lead to redundant computations or the premature removal of beneficial Gaussians. Our framework overcomes these limitations by reformulating densification and pruning as a probabilistic sampling process, dynamically inserting and relocating Gaussians based on aggregated multi-view errors and opacity scores. Guided by Bayesian acceptance tests derived from these error-based importance scores, our method substantially reduces reliance on heuristics, offers greater flexibility, and adaptively infers Gaussian distributions without requiring predefined scene complexity. Experiments on benchmark datasets, including Mip-NeRF360, Tanks and Temples, and Deep Blending, show that our approach reduces the number of Gaussians needed, enhancing computational efficiency while matching or modestly surpassing the view-synthesis quality of state-of-the-art models.\n\n我们提出了一种适用于三维高斯溅射（3DGS）的自适应采样框架，在统一的 **Metropolis-Hastings** 方法中利用全面的多视角光度误差信号。传统的 3DGS 方法高度依赖启发式的密度控制机制（如克隆、拆分与裁剪），这可能导致冗余计算，或过早移除有用的高斯。我们的框架通过将加密与裁剪重新表述为一种概率采样过程，基于聚合的多视角误差与不透明度分数动态插入和重新定位高斯。该方法利用由误差驱动的重要性评分推导出的 **贝叶斯接受检验** 进行引导，大幅减少对启发式规则的依赖，提供更高的灵活性，并能够在无需预设场景复杂度的情况下自适应地推断高斯分布。在 **Mip-NeRF360**、**Tanks and Temples** 及 **Deep Blending** 等基准数据集上的实验表明，我们的方法在减少高斯数量、提升计算效率的同时，在视图合成质量上能够达到甚至略微超过当前最先进模型的水平。\n"
  },
  {
    "path": "abs/2506.13110.md",
    "content": "### GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction\n\n3D modeling of highly reflective objects remains challenging due to strong view-dependent appearances. While previous SDF-based methods can recover high-quality meshes, they are often time-consuming and tend to produce over-smoothed surfaces. In contrast, 3D Gaussian Splatting (3DGS) offers the advantage of high speed and detailed real-time rendering, but extracting surfaces from the Gaussians can be noisy due to the lack of geometric constraints. To bridge the gap between these approaches, we propose a novel reconstruction method called GS-2DGS for reflective objects based on 2D Gaussian Splatting (2DGS). Our approach combines the rapid rendering capabilities of Gaussian Splatting with additional geometric information from foundation models. Experimental results on synthetic and real datasets demonstrate that our method significantly outperforms Gaussian-based techniques in terms of reconstruction and relighting and achieves performance comparable to SDF-based methods while being an order of magnitude faster.\n\n高反射物体的三维建模因其强烈的视角依赖性外观而依然充满挑战。尽管以往基于 SDF 的方法能够恢复高质量网格，但它们通常耗时较长且容易生成过于平滑的表面。相比之下，三维高斯溅射（3DGS）具有高速与细节丰富的实时渲染优势，但由于缺乏几何约束，从高斯中提取表面往往会产生噪声。为弥合这两类方法之间的差距，我们提出了一种基于二维高斯溅射（2DGS）的新型高反射物体重建方法 **GS-2DGS**。该方法结合了高斯溅射的快速渲染能力与来自基础模型的额外几何信息。在合成与真实数据集上的实验结果表明，我们的方法在重建与重光照效果方面显著优于基于高斯的技术，并在性能上可与基于 SDF 的方法相媲美，同时速度快了一个数量级。\n"
  },
  {
    "path": "abs/2506.13508.md",
    "content": "### Multiview Geometric Regularization of Gaussian Splatting for Accurate Radiance Fields\n\nRecent methods, such as 2D Gaussian Splatting and Gaussian Opacity Fields, have aimed to address the geometric inaccuracies of 3D Gaussian Splatting while retaining its superior rendering quality. However, these approaches still struggle to reconstruct smooth and reliable geometry, particularly in scenes with significant color variation across viewpoints, due to their per-point appearance modeling and single-view optimization constraints. In this paper, we propose an effective multiview geometric regularization strategy that integrates multiview stereo (MVS) depth, RGB, and normal constraints into Gaussian Splatting initialization and optimization. Our key insight is the complementary relationship between MVS-derived depth points and Gaussian Splatting-optimized positions: MVS robustly estimates geometry in regions of high color variation through local patch-based matching and epipolar constraints, whereas Gaussian Splatting provides more reliable and less noisy depth estimates near object boundaries and regions with lower color variation. To leverage this insight, we introduce a median depth-based multiview relative depth loss with uncertainty estimation, effectively integrating MVS depth information into Gaussian Splatting optimization. We also propose an MVS-guided Gaussian Splatting initialization to avoid Gaussians falling into suboptimal positions. Extensive experiments validate that our approach successfully combines these strengths, enhancing both geometric accuracy and rendering quality across diverse indoor and outdoor scenes.\n\n近期的方法（如二维高斯溅射和高斯不透明场）旨在在保留三维高斯溅射优越渲染质量的同时，解决其几何精度不足的问题。然而，这些方法在重建平滑且可靠的几何结构方面依然存在困难，尤其是在跨视角颜色变化显著的场景中，这主要是由于其逐点外观建模和单视角优化的限制。本文提出了一种有效的**多视几何正则化策略**，将多视立体（MVS）深度、RGB 以及法线约束融合到高斯溅射的初始化与优化过程中。我们的核心洞察在于：**MVS 深度点与高斯溅射优化位置之间存在互补关系**——MVS 通过基于局部补丁的匹配与极线约束，可在高颜色变化区域稳健估计几何，而高斯溅射则在物体边界及低颜色变化区域提供更可靠、噪声更低的深度估计。为利用这一洞察，我们引入了一种基于中值深度的多视相对深度损失及不确定性估计，将 MVS 深度信息有效融入高斯溅射优化中。同时，我们提出了一种**MVS 引导的高斯溅射初始化**，以避免高斯落入次优位置。大量实验验证了我们的方法能够有效结合两者优势，在多种室内外场景中同时提升几何精度与渲染质量。\n"
  },
  {
    "path": "abs/2506.13516.md",
    "content": "### Micro-macro Gaussian Splatting with Enhanced Scalability for Unconstrained Scene Reconstruction\n\nReconstructing 3D scenes from unconstrained image collections poses significant challenges due to variations in appearance. In this paper, we propose Scalable Micro-macro Wavelet-based Gaussian Splatting (SMW-GS), a novel method that enhances 3D reconstruction across diverse scales by decomposing scene representations into global, refined, and intrinsic components. SMW-GS incorporates the following innovations: Micro-macro Projection, which enables Gaussian points to sample multi-scale details with improved diversity; and Wavelet-based Sampling, which refines feature representations using frequency-domain information to better capture complex scene appearances. To achieve scalability, we further propose a large-scale scene promotion strategy, which optimally assigns camera views to scene partitions by maximizing their contributions to Gaussian points, achieving consistent and high-quality reconstructions even in expansive environments. Extensive experiments demonstrate that SMW-GS significantly outperforms existing methods in both reconstruction quality and scalability, particularly excelling in large-scale urban environments with challenging illumination variations.\n\n从非受限图像集合中重建三维场景由于外观变化巨大而面临重大挑战。本文提出了一种可扩展的**微宏小波高斯溅射（SMW-GS）**方法，通过将场景表示分解为全局、精细和内在三个组成部分，从而提升跨尺度的三维重建效果。**SMW-GS** 包含以下创新点：**微宏投影（Micro-macro Projection）**，使高斯点能够以更高的多样性采样多尺度细节；**基于小波的采样（Wavelet-based Sampling）**，利用频域信息优化特征表示，更好地捕捉复杂场景外观。为实现可扩展性，我们进一步提出了一种**大规模场景提升策略**，通过最大化相机视角对高斯点的贡献，将其最优分配至场景分区，从而在广阔环境中依然实现一致且高质量的重建。大量实验表明，**SMW-GS** 在重建质量与可扩展性上均显著优于现有方法，尤其在具有挑战性光照变化的大规模城市环境中表现突出。\n"
  },
  {
    "path": "abs/2506.13766.md",
    "content": "### PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images\n\nReconstructing an animatable 3D human from casually captured images of an articulated subject without camera or human pose information is a practical yet challenging task due to view misalignment, occlusions, and the absence of structural priors. While optimization-based methods can produce high-fidelity results from monocular or multi-view videos, they require accurate pose estimation and slow iterative optimization, limiting scalability in unconstrained scenarios. Recent feed-forward approaches enable efficient single-image reconstruction but struggle to effectively leverage multiple input images to reduce ambiguity and improve reconstruction accuracy. To address these challenges, we propose PF-LHM, a large human reconstruction model that generates high-quality 3D avatars in seconds from one or multiple casually captured pose-free images. Our approach introduces an efficient Encoder-Decoder Point-Image Transformer architecture, which fuses hierarchical geometric point features and multi-view image features through multimodal attention. The fused features are decoded to recover detailed geometry and appearance, represented using 3D Gaussian splats. Extensive experiments on both real and synthetic datasets demonstrate that our method unifies single- and multi-image 3D human reconstruction, achieving high-fidelity and animatable 3D human avatars without requiring camera and human pose annotations.\n\n从随意拍摄的关节化人物图像中，在不具备相机或人体姿态信息的情况下重建可动画的三维人体，是一项具有实用价值但极具挑战性的任务，其难点在于视角不对齐、遮挡以及缺乏结构先验。虽然基于优化的方法可以从单目或多视角视频中生成高保真结果，但它们依赖于精确的姿态估计，并需要缓慢的迭代优化，因此在非受限场景下的可扩展性较差。近期的前馈式方法能够高效完成单图重建，但在充分利用多张输入图像以减少歧义、提升重建精度方面表现不足。为解决这些问题，我们提出 **PF-LHM**，一种大型人体重建模型，可在数秒内从一张或多张随意拍摄的无姿态图像生成高质量的三维虚拟人。我们的方法引入了一种高效的**编码器-解码器点-图像 Transformer 架构**，通过多模态注意力融合分层的几何点特征与多视图图像特征。融合后的特征被解码以恢复细致的几何与外观，并采用三维高斯溅射进行表示。在真实与合成数据集上的大量实验表明，我们的方法统一了单图与多图三维人体重建，实现了无需相机与人体姿态标注的高保真可动画三维虚拟人。\n"
  },
  {
    "path": "abs/2506.14009.md",
    "content": "### GRaD-Nav++: Vision-Language Model Enabled Visual Drone Navigation with Gaussian Radiance Fields and Differentiable Dynamics\n\nAutonomous drones capable of interpreting and executing high-level language instructions in unstructured environments remain a long-standing goal. Yet existing approaches are constrained by their dependence on hand-crafted skills, extensive parameter tuning, or computationally intensive models unsuitable for onboard use. We introduce GRaD-Nav++, a lightweight Vision-Language-Action (VLA) framework that runs fully onboard and follows natural-language commands in real time. Our policy is trained in a photorealistic 3D Gaussian Splatting (3DGS) simulator via Differentiable Reinforcement Learning (DiffRL), enabling efficient learning of low-level control from visual and linguistic inputs. At its core is a Mixture-of-Experts (MoE) action head, which adaptively routes computation to improve generalization while mitigating forgetting. In multi-task generalization experiments, GRaD-Nav++ achieves a success rate of 83% on trained tasks and 75% on unseen tasks in simulation. When deployed on real hardware, it attains 67% success on trained tasks and 50% on unseen ones. In multi-environment adaptation experiments, GRaD-Nav++ achieves an average success rate of 81% across diverse simulated environments and 67% across varied real-world settings. These results establish a new benchmark for fully onboard Vision-Language-Action (VLA) flight and demonstrate that compact, efficient models can enable reliable, language-guided navigation without relying on external infrastructure.\n\n在非结构化环境中能够理解并执行高层次语言指令的自主无人机，一直是长期以来的重要目标。然而，现有方法受限于对人工设计技能的依赖、大量参数调优，或依赖计算量过大的模型，难以在机载环境中运行。我们提出了 **GRaD-Nav++**，一种轻量级视觉-语言-动作（Vision-Language-Action, VLA）框架，可在机载环境中实时运行并执行自然语言指令。我们的策略在高逼真度的三维高斯溅射（3DGS）模拟器中，通过可微分强化学习（Differentiable Reinforcement Learning, DiffRL）训练，从视觉与语言输入中高效学习低层次控制策略。其核心为**专家混合（Mixture-of-Experts, MoE）动作头**，能够自适应地分配计算以提升泛化能力并缓解遗忘问题。在多任务泛化实验中，**GRaD-Nav++** 在模拟环境中训练任务成功率达到 83%，未见任务成功率为 75%；部署到真实硬件时，训练任务成功率为 67%，未见任务成功率为 50%。在多环境适应实验中，**GRaD-Nav++** 在多样化模拟环境中平均成功率为 81%，在多种真实环境中平均成功率为 67%。这些结果为完全机载的视觉-语言-动作飞行设立了新的基准，并证明紧凑高效的模型能够在不依赖外部基础设施的情况下，实现可靠的语言引导导航。\n"
  },
  {
    "path": "abs/2506.14135.md",
    "content": "### GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation\n\nAccurate action inference is critical for vision-based robotic manipulation. Existing approaches typically follow either a Vision-to-Action (V-A) paradigm, predicting actions directly from visual inputs, or a Vision-to-3D-to-Action (V-3D-A) paradigm, leveraging intermediate 3D representations. However, these methods often struggle with action inaccuracies due to the complexity and dynamic nature of manipulation scenes. In this paper, we propose a Vision-to-4D-to-Action (V-4D-A) framework that enables direct action reasoning from motion-aware 4D representations via a Gaussian Action Field (GAF). GAF extends 3D Gaussian Splatting (3DGS) by incorporating learnable motion attributes, allowing simultaneous modeling of dynamic scenes and manipulation actions. To learn time-varying scene geometry and action-aware robot motion, GAF supports three key query types: reconstruction of the current scene, prediction of future frames, and estimation of initial action via robot motion. Furthermore, the high-quality current and future frames generated by GAF facilitate manipulation action refinement through a GAF-guided diffusion model. Extensive experiments demonstrate significant improvements, with GAF achieving +11.5385 dB PSNR and -0.5574 LPIPS improvements in reconstruction quality, while boosting the average success rate in robotic manipulation tasks by 10.33% over state-of-the-art methods.\n\n精确的动作推理对于基于视觉的机器人操作至关重要。现有方法通常遵循“视觉到动作”（V-A）范式，即直接从视觉输入预测动作，或“视觉到三维到动作”（V-3D-A）范式，即利用中间的三维表示。然而，这些方法在复杂且动态的操作场景中往往难以避免动作预测不准确的问题。本文提出了一种**视觉到四维到动作（V-4D-A）**框架，通过**高斯动作场（Gaussian Action Field, GAF）**实现基于具备运动感知能力的四维表示的直接动作推理。GAF 在三维高斯溅射（3DGS）的基础上引入可学习的运动属性，从而能够同时建模动态场景与操作动作。为了学习时变场景几何与动作感知的机器人运动，GAF 支持三类关键查询：当前场景重建、未来帧预测，以及通过机器人运动估计初始动作。此外，GAF 生成的高质量当前帧与未来帧可通过**GAF 引导的扩散模型**进一步优化操作动作。大量实验表明，GAF 在重建质量方面相比当前最先进方法实现了 **+11.5385 dB** 的 PSNR 提升与 **-0.5574** 的 LPIPS 降低，同时在机器人操作任务的平均成功率上提升了 **10.33%**。\n"
  },
  {
    "path": "abs/2506.14229.md",
    "content": "### HRGS: Hierarchical Gaussian Splatting for Memory-Efficient High-Resolution 3D Reconstruction\n\n3D Gaussian Splatting (3DGS) has made significant strides in real-time 3D scene reconstruction, but faces memory scalability issues in high-resolution scenarios. To address this, we propose Hierarchical Gaussian Splatting (HRGS), a memory-efficient framework with hierarchical block-level optimization. First, we generate a global, coarse Gaussian representation from low-resolution data. Then, we partition the scene into multiple blocks, refining each block with high-resolution data. The partitioning involves two steps: Gaussian partitioning, where irregular scenes are normalized into a bounded cubic space with a uniform grid for task distribution, and training data partitioning, where only relevant observations are retained for each block. By guiding block refinement with the coarse Gaussian prior, we ensure seamless Gaussian fusion across adjacent blocks. To reduce computational demands, we introduce Importance-Driven Gaussian Pruning (IDGP), which computes importance scores for each Gaussian and removes those with minimal contribution, speeding up convergence and reducing memory usage. Additionally, we incorporate normal priors from a pretrained model to enhance surface reconstruction quality. Our method enables high-quality, high-resolution 3D scene reconstruction even under memory constraints. Extensive experiments on three benchmarks show that HRGS achieves state-of-the-art performance in high-resolution novel view synthesis (NVS) and surface reconstruction tasks.\n\n三维高斯溅射（3DGS）在实时三维场景重建方面取得了显著进展，但在高分辨率场景下仍面临内存可扩展性问题。为此，我们提出了**分层高斯溅射（Hierarchical Gaussian Splatting, HRGS）**，一种具备分层块级优化的内存高效框架。首先，我们从低分辨率数据生成全局粗略的高斯表示；然后将场景划分为多个块，并使用高分辨率数据对各块进行精细化重建。该分块过程包括两个步骤：**高斯分块**（Gaussian Partitioning），将不规则场景归一化到有界立方空间，并采用均匀网格进行任务分配；**训练数据分块**（Training Data Partitioning），仅保留与该块相关的观测数据。通过利用粗高斯先验引导块级优化，我们能够实现相邻块之间的无缝高斯融合。为降低计算量，我们引入了**基于重要性的高斯剪枝（Importance-Driven Gaussian Pruning, IDGP）**，计算每个高斯的重要性分数，并移除贡献度较低的高斯，从而加快收敛并减少内存占用。此外，我们结合了预训练模型提供的法线先验，以提升表面重建质量。该方法在内存受限条件下依然能够实现高质量、高分辨率的三维场景重建。在三个基准数据集上的大量实验表明，**HRGS** 在高分辨率新视角合成（NVS）与表面重建任务中均达到了当前最优性能。\n"
  },
  {
    "path": "abs/2506.14642.md",
    "content": "### 3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a promising approach for novel view synthesis, offering real-time rendering with high visual fidelity. However, its substantial storage requirements present significant challenges for practical applications. While recent state-of-the-art (SOTA) 3DGS methods increasingly incorporate dedicated compression modules, there is a lack of a comprehensive framework to evaluate their perceptual impact. Therefore we present 3DGS-IEval-15K, the first large-scale image quality assessment (IQA) dataset specifically designed for compressed 3DGS representations. Our dataset encompasses 15,200 images rendered from 10 real-world scenes through 6 representative 3DGS algorithms at 20 strategically selected viewpoints, with different compression levels leading to various distortion effects. Through controlled subjective experiments, we collect human perception data from 60 viewers. We validate dataset quality through scene diversity and MOS distribution analysis, and establish a comprehensive benchmark with 30 representative IQA metrics covering diverse types. As the largest-scale 3DGS quality assessment dataset to date, our work provides a foundation for developing 3DGS specialized IQA metrics, and offers essential data for investigating view-dependent quality distribution patterns unique to 3DGS.\n\n三维高斯溅射（3DGS）作为一种新视角合成的有前景方法，能够在保持高视觉保真度的同时实现实时渲染。然而，其巨大的存储需求对实际应用构成了重大挑战。尽管最新的三维高斯溅射（SOTA）方法越来越多地引入专用压缩模块，但缺乏一个全面的框架来评估这些压缩方法的感知影响。为此，我们提出了 **3DGS-IEval-15K**，这是首个专门针对压缩 3DGS 表示设计的大规模图像质量评价（IQA）数据集。该数据集包含 10 个真实场景、由 6 种具有代表性的 3DGS 算法在 20 个精心挑选的视角下渲染的共 **15,200 张图像**，并在不同压缩水平下产生多种失真效果。通过严格控制的主观实验，我们收集了来自 60 位观察者的人类感知数据。我们通过场景多样性分析和 MOS（平均意见分数）分布分析验证了数据集的质量，并建立了涵盖多种类型的 **30 个代表性 IQA 指标** 的综合基准。作为迄今为止规模最大的 3DGS 质量评估数据集，我们的工作为开发面向 3DGS 的专用 IQA 指标奠定了基础，并为研究 3DGS 独有的依赖视角的质量分布模式提供了关键数据支持。\n"
  },
  {
    "path": "abs/2506.14742.md",
    "content": "### SyncTalk++: High-Fidelity and Efficient Synchronized Talking Heads Synthesis Using Gaussian Splatting\n\nAchieving high synchronization in the synthesis of realistic, speech-driven talking head videos presents a significant challenge. A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses. The absence of these synchronizations is a fundamental flaw, leading to unrealistic results. To address the critical issue of synchronization, identified as the ''devil'' in creating realistic talking heads, we introduce SyncTalk++, which features a Dynamic Portrait Renderer with Gaussian Splatting to ensure consistent subject identity preservation and a Face-Sync Controller that aligns lip movements with speech while innovatively using a 3D facial blendshape model to reconstruct accurate facial expressions. To ensure natural head movements, we propose a Head-Sync Stabilizer, which optimizes head poses for greater stability. Additionally, SyncTalk++ enhances robustness to out-of-distribution (OOD) audio by incorporating an Expression Generator and a Torso Restorer, which generate speech-matched facial expressions and seamless torso regions. Our approach maintains consistency and continuity in visual details across frames and significantly improves rendering speed and quality, achieving up to 101 frames per second. Extensive experiments and user studies demonstrate that SyncTalk++ outperforms state-of-the-art methods in synchronization and realism.\n\n在合成逼真、语音驱动的说话人头像视频中，实现高度同步是一项重大挑战。一个栩栩如生的说话人头像需要在主体身份、唇部动作、面部表情和头部姿态之间实现同步协调。缺乏这些同步会成为根本缺陷，导致不真实的结果。为了解决被称为逼真说话头像生成中“魔鬼”的同步问题，我们提出了 **SyncTalk++**，其特点包括：采用基于高斯溅射的**动态肖像渲染器**，确保主体身份的一致性保留；引入**唇语同步控制器（Face-Sync Controller）**，将唇部动作与语音精确对齐，并创新性地利用三维面部混合形状模型（3D Facial Blendshape Model）重建准确的面部表情。为确保自然的头部运动，我们提出了**头部同步稳定器（Head-Sync Stabilizer）**，对头部姿态进行优化以增强稳定性。此外，**SyncTalk++** 通过引入**表情生成器（Expression Generator）**与**躯干修复器（Torso Restorer）**，提升了对分布外（OOD）音频的鲁棒性，从而生成与语音匹配的面部表情和无缝衔接的躯干部位。我们的方法能够在视频帧间保持视觉细节的一致性与连续性，并显著提升渲染速度与质量，最高可达到每秒 101 帧。在大量实验与用户研究中，**SyncTalk++** 在同步性与真实感方面均优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2506.14825.md",
    "content": "### GraphGSOcc: Semantic-Geometric Graph Transformer with Dynamic-Static Decoupling for 3D Gaussian Splatting-based Occupancy Prediction\n\nAddressing the task of 3D semantic occupancy prediction for autonomous driving, we tackle two key issues in existing 3D Gaussian Splatting (3DGS) methods: (1) unified feature aggregation neglecting semantic correlations among similar categories and across regions, (2) boundary ambiguities caused by the lack of geometric constraints in MLP iterative optimization and (3) biased issues in dynamic-static object coupling optimization. We propose the GraphGSOcc model, a novel framework that combines semantic and geometric graph Transformer and decouples dynamic-static objects optimization for 3D Gaussian Splatting-based Occupancy Prediction. We propose the Dual Gaussians Graph Attenntion, which dynamically constructs dual graph structures: a geometric graph adaptively calculating KNN search radii based on Gaussian poses, enabling large-scale Gaussians to aggregate features from broader neighborhoods while compact Gaussians focus on local geometric consistency; a semantic graph retaining top-M highly correlated nodes via cosine similarity to explicitly encode semantic relationships within and across instances. Coupled with the Multi-scale Graph Attention framework, fine-grained attention at lower layers optimizes boundary details, while coarsegrained attention at higher layers models object-level topology. On the other hand, we decouple dynamic and static objects by leveraging semantic probability distributions and design a Dynamic-Static Decoupled Gaussian Attention mechanism to optimize the prediction performance for both dynamic objects and static scenes. GraphGSOcc achieves state-ofthe-art performance on the SurroundOcc-nuScenes, Occ3D-nuScenes, OpenOcc and KITTI occupancy benchmarks. Experiments on the SurroundOcc dataset achieve an mIoU of 25.20%, reducing GPU memory to 6.8 GB, demonstrating a 1.97% mIoU improvement and 13.7% memory reduction compared to GaussianWorld.\n\n针对自动驾驶中的三维语义占用预测任务，我们着重解决现有三维高斯溅射（3DGS）方法的三个关键问题：(1) 统一特征聚合忽略了相似类别及跨区域之间的语义相关性；(2) 由于 MLP 迭代优化缺乏几何约束而导致的边界模糊问题；(3) 动态-静态物体耦合优化带来的偏差问题。为此，我们提出 **GraphGSOcc** 模型，这是一种结合语义与几何图 Transformer 并实现动态-静态物体优化解耦的三维高斯溅射占用预测新框架。我们提出 **双高斯图注意力机制（Dual Gaussians Graph Attention）**，该机制动态构建双图结构：几何图基于高斯位置自适应计算 KNN 搜索半径，使大尺度高斯能够从更广邻域聚合特征，而紧凑型高斯则专注于局部几何一致性；语义图则通过余弦相似度保留相关性最高的 Top-M 节点，从而显式编码实例内及跨实例的语义关系。结合 **多尺度图注意力框架（Multi-scale Graph Attention）**，在低层通过细粒度注意力优化边界细节，高层通过粗粒度注意力建模目标级拓扑结构。另一方面，我们利用语义概率分布对动态与静态物体进行解耦，并设计了 **动态-静态解耦高斯注意力机制（Dynamic-Static Decoupled Gaussian Attention）**，以同时提升动态目标与静态场景的预测性能。**GraphGSOcc** 在 SurroundOcc-nuScenes、Occ3D-nuScenes、OpenOcc 和 KITTI 占用预测基准上均取得了当前最优性能。在 SurroundOcc 数据集上的实验表明，mIoU 达到 **25.20%**，显存占用降低至 **6.8 GB**，相比 GaussianWorld 提升了 **1.97%** mIoU，并减少了 **13.7%** 的显存消耗。\n"
  },
  {
    "path": "abs/2506.17212.md",
    "content": "### Part2GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting\n\nArticulated objects are common in the real world, yet modeling their structure and motion remains a challenging task for 3D reconstruction methods. In this work, we introduce Part2GS, a novel framework for modeling articulated digital twins of multi-part objects with high-fidelity geometry and physically consistent articulation. Part2GS leverages a part-aware 3D Gaussian representation that encodes articulated components with learnable attributes, enabling structured, disentangled transformations that preserve high-fidelity geometry. To ensure physically consistent motion, we propose a motion-aware canonical representation guided by physics-based constraints, including contact enforcement, velocity consistency, and vector-field alignment. Furthermore, we introduce a field of repel points to prevent part collisions and maintain stable articulation paths, significantly improving motion coherence over baselines. Extensive evaluations on both synthetic and real-world datasets show that Part2GS consistently outperforms state-of-the-art methods by up to 10× in Chamfer Distance for movable parts.\n\n关节化物体在现实世界中十分常见，但对其结构与运动进行建模仍然是三维重建方法的一大挑战。本文提出 **Part2GS**，一种用于建模高保真几何与物理一致性关节运动的多部件物体数字孪生的新框架。**Part2GS** 利用**部件感知的三维高斯表示**，将可动部件编码为具备可学习属性的表示，从而实现结构化、解耦的变换，并在运动过程中保持高保真的几何细节。为保证物理一致的运动，我们提出了**运动感知的规范表示**，并引入物理约束进行引导，包括接触约束、速度一致性以及矢量场对齐。此外，我们引入了**排斥点场**（field of repel points）来防止部件之间的碰撞并保持稳定的关节运动路径，相较基线方法显著提升了运动一致性。在合成与真实数据集上的大量评估表明，**Part2GS** 在可动部件的 Chamfer Distance 上较现有最先进方法提升可达 **10×**。\n"
  },
  {
    "path": "abs/2506.17636.md",
    "content": "### 3D Gaussian Splatting for Fine-Detailed Surface Reconstruction in Large-Scale Scene\n\nRecent developments in 3D Gaussian Splatting have made significant advances in surface reconstruction. However, scaling these methods to large-scale scenes remains challenging due to high computational demands and the complex dynamic appearances typical of outdoor environments. These challenges hinder the application in aerial surveying and autonomous driving. This paper proposes a novel solution to reconstruct large-scale surfaces with fine details, supervised by full-sized images. Firstly, we introduce a coarse-to-fine strategy to reconstruct a coarse model efficiently, followed by adaptive scene partitioning and sub-scene refining from image segments. Additionally, we integrate a decoupling appearance model to capture global appearance variations and a transient mask model to mitigate interference from moving objects. Finally, we expand the multi-view constraint and introduce a single-view regularization for texture-less areas. Our experiments were conducted on the publicly available dataset GauU-Scene V2, which was captured using unmanned aerial vehicles. To the best of our knowledge, our method outperforms existing NeRF-based and Gaussian-based methods, achieving high-fidelity visual results and accurate surface from full-size image optimization.\n\n三维高斯溅射（3D Gaussian Splatting）在表面重建方面的最新发展取得了显著进展。然而，将这些方法扩展到大规模场景仍面临挑战，主要原因在于高计算开销以及户外环境中常见的复杂动态外观。这些问题限制了其在航空测绘和自动驾驶等领域的应用。本文提出了一种新方法，在全尺寸图像监督下重建具备精细细节的大规模表面。首先，我们采用**由粗到细（coarse-to-fine）**的策略，高效重建粗模型；随后，通过自适应场景分割与基于图像分块的子场景精化提升局部细节表现。此外，我们引入**外观解耦模型**以捕捉全局外观变化，并设计**瞬态掩码模型**来减轻移动物体的干扰。最后，我们扩展了多视约束，并针对无纹理区域引入**单视正则化**。我们在公开的无人机采集数据集 **GauU-Scene V2** 上进行了实验。结果表明，据我们所知，该方法在全尺寸图像优化条件下，在视觉保真度与表面精度方面均优于现有的基于 NeRF 和基于高斯的方法。\n"
  },
  {
    "path": "abs/2506.18575.md",
    "content": "### 2D Triangle Splatting for Direct Differentiable Mesh Training\n\nDifferentiable rendering with 3D Gaussian primitives has emerged as a powerful method for reconstructing high-fidelity 3D scenes from multi-view images. While it offers improvements over NeRF-based methods, this representation still encounters challenges with rendering speed and advanced rendering effects, such as relighting and shadow rendering, compared to mesh-based models. In this paper, we propose 2D Triangle Splatting (2DTS), a novel method that replaces 3D Gaussian primitives with 2D triangle facelets. This representation naturally forms a discrete mesh-like structure while retaining the benefits of continuous volumetric modeling. By incorporating a compactness parameter into the triangle primitives, we enable direct training of photorealistic meshes. Our experimental results demonstrate that our triangle-based method, in its vanilla version (without compactness tuning), achieves higher fidelity compared to state-of-the-art Gaussian-based methods. Furthermore, our approach produces reconstructed meshes with superior visual quality compared to existing mesh reconstruction methods.\n\n基于三维高斯基元的可微渲染已成为从多视图图像重建高保真三维场景的有力方法。尽管相比基于 NeRF 的方法有所提升，这种表示在渲染速度以及再光照、阴影渲染等高级渲染效果方面，仍不及基于网格的模型。本文提出了一种全新的方法——**二维三角溅射（2D Triangle Splatting, 2DTS）**，将三维高斯基元替换为二维三角面元。该表示形式天然构成离散的类网格结构，同时保留了连续体积建模的优点。通过在三角基元中引入**紧致度参数**，我们能够直接训练出照片级真实感的网格模型。实验结果表明，在未进行紧致度调优的基础版本中，我们的基于三角形的方法在保真度上优于当前最先进的基于高斯的方法。此外，与现有的网格重建方法相比，我们的方法生成的重建网格在视觉质量上也更为优越。\n"
  },
  {
    "path": "abs/2506.18601.md",
    "content": "### BulletGen: Improving 4D Reconstruction with Bullet-Time Generation\n\nTransforming casually captured, monocular videos into fully immersive dynamic experiences is a highly ill-posed task, and comes with significant challenges, e.g., reconstructing unseen regions, and dealing with the ambiguity in monocular depth estimation. In this work we introduce BulletGen, an approach that takes advantage of generative models to correct errors and complete missing information in a Gaussian-based dynamic scene representation. This is done by aligning the output of a diffusion-based video generation model with the 4D reconstruction at a single frozen \"bullet-time\" step. The generated frames are then used to supervise the optimization of the 4D Gaussian model. Our method seamlessly blends generative content with both static and dynamic scene components, achieving state-of-the-art results on both novel-view synthesis, and 2D/3D tracking tasks.\n\n将随意拍摄的单目视频转换为完全沉浸式的动态体验是一项高度病态的问题，并面临诸多挑战，例如重建不可见区域以及处理单目深度估计中的歧义。本文提出 **BulletGen**，一种利用生成模型来纠正误差并补全基于高斯的动态场景表示中缺失信息的方法。具体而言，我们将扩散式视频生成模型的输出与四维重建在某个冻结的“子弹时间”帧对齐，然后利用生成的帧来监督四维高斯模型的优化。我们的方法能够将生成内容与静态和动态场景组件无缝融合，在新视角合成以及二维/三维跟踪任务中均取得了当前最优的结果。\n"
  },
  {
    "path": "abs/2506.18839.md",
    "content": "### 4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation\n\nWe propose the first framework capable of computing a 4D spatio-temporal grid of video frames and 3D Gaussian particles for each time step using a feed-forward architecture. Our architecture has two main components, a 4D video model and a 4D reconstruction model. In the first part, we analyze current 4D video diffusion architectures that perform spatial and temporal attention either sequentially or in parallel within a two-stream design. We highlight the limitations of existing approaches and introduce a novel fused architecture that performs spatial and temporal attention within a single layer. The key to our method is a sparse attention pattern, where tokens attend to others in the same frame, at the same timestamp, or from the same viewpoint. In the second part, we extend existing 3D reconstruction algorithms by introducing a Gaussian head, a camera token replacement algorithm, and additional dynamic layers and training. Overall, we establish a new state of the art for 4D generation, improving both visual quality and reconstruction capability.\n\n我们提出了首个能够利用前馈架构在每个时间步计算视频帧与三维高斯粒子的四维时空网格的框架。该架构主要由两个核心组件组成：**四维视频模型**与**四维重建模型**。在第一部分中，我们分析了当前的四维视频扩散架构，这些架构通常在双流设计中顺序或并行地执行空间与时间注意力机制。我们指出了现有方法的局限性，并提出了一种全新的融合式架构，在单一层内同时执行空间与时间注意力。我们方法的关键在于一种稀疏注意力模式，其中 token 仅关注同一帧、同一时间戳或来自相同视角的其他 token。在第二部分中，我们扩展了现有的三维重建算法，引入了**高斯头（Gaussian Head）**、**相机 token 替换算法**以及额外的动态层与训练机制。总体而言，我们的方法在四维生成任务中建立了新的最优水平，同时提升了视觉质量与重建能力。\n"
  },
  {
    "path": "abs/2506.18885.md",
    "content": "### GRAND-SLAM: Local Optimization for Globally Consistent Large-Scale Multi-Agent Gaussian SLAM\n\n3D Gaussian splatting has emerged as an expressive scene representation for RGB-D visual SLAM, but its application to large-scale, multi-agent outdoor environments remains unexplored. Multi-agent Gaussian SLAM is a promising approach to rapid exploration and reconstruction of environments, offering scalable environment representations, but existing approaches are limited to small-scale, indoor environments. To that end, we propose Gaussian Reconstruction via Multi-Agent Dense SLAM, or GRAND-SLAM, a collaborative Gaussian splatting SLAM method that integrates i) an implicit tracking module based on local optimization over submaps and ii) an approach to inter- and intra-robot loop closure integrated into a pose-graph optimization framework. Experiments show that GRAND-SLAM provides state-of-the-art tracking performance and 28% higher PSNR than existing methods on the Replica indoor dataset, as well as 91% lower multi-agent tracking error and improved rendering over existing multi-agent methods on the large-scale, outdoor Kimera-Multi dataset.\n\n三维高斯投影（3D Gaussian splatting）已成为 RGB-D 视觉 SLAM 中一种具有表现力的场景表示方式，但其在大规模、多智能体户外环境中的应用尚未得到探索。多智能体高斯 SLAM 是一种有前景的环境快速探索与重建方法，能够实现可扩展的环境表示，但现有方法仍局限于小规模室内环境。为此，我们提出了一种基于多智能体稠密 SLAM 的高斯重建方法，称为 GRAND-SLAM，该方法是一种协同式高斯投影 SLAM 技术，集成了：i）基于子地图局部优化的隐式跟踪模块；ii）融合到位姿图优化框架中的机器人间和机器人内部闭环检测方法。实验结果表明，GRAND-SLAM 在 Replica 室内数据集上实现了业界领先的跟踪性能，PSNR 比现有方法高出 28%；在大规模户外 Kimera-Multi 数据集上，与现有多智能体方法相比，其多智能体跟踪误差降低了 91%，且渲染效果更佳。\n"
  },
  {
    "path": "abs/2506.19139.md",
    "content": "### SOF: Sorted Opacity Fields for Fast Unbounded Surface Reconstruction\n\nRecent advances in 3D Gaussian representations have significantly improved the quality and efficiency of image-based scene reconstruction. Their explicit nature facilitates real-time rendering and fast optimization, yet extracting accurate surfaces - particularly in large-scale, unbounded environments - remains a difficult task. Many existing methods rely on approximate depth estimates and global sorting heuristics, which can introduce artifacts and limit the fidelity of the reconstructed mesh. In this paper, we present Sorted Opacity Fields (SOF), a method designed to recover detailed surfaces from 3D Gaussians with both speed and precision. Our approach improves upon prior work by introducing hierarchical resorting and a robust formulation of Gaussian depth, which better aligns with the level-set. To enhance mesh quality, we incorporate a level-set regularizer operating on the opacity field and introduce losses that encourage geometrically-consistent primitive shapes. In addition, we develop a parallelized Marching Tetrahedra algorithm tailored to our opacity formulation, reducing meshing time by up to an order of magnitude. As demonstrated by our quantitative evaluation, SOF achieves higher reconstruction accuracy while cutting total processing time by more than a factor of three. These results mark a step forward in turning efficient Gaussian-based rendering into equally efficient geometry extraction.\n\n三维高斯表示的最新进展显著提升了基于图像的场景重建的质量与效率。其显式特性有助于实现实时渲染和快速优化，但在大规模、无边界环境中提取精确表面仍是一项挑战。许多现有方法依赖于近似深度估计和全局排序启发式算法，这可能导致伪影的产生，并限制重建网格的保真度。本文提出了**排序不透明度场（Sorted Opacity Fields, SOF）**方法，旨在以高速度和高精度从三维高斯中恢复细致表面。我们的方法通过引入**分层重排序机制**和更稳健的高斯深度建模（更好地对齐 level-set）来改进现有工作。为提升网格质量，我们在不透明度场上引入 level-set 正则项，并设计了促进几何一致性原始形状的损失函数。此外，我们还开发了一种并行化的 Marching Tetrahedra 算法，专为我们的不透明度建模设计，其网格生成时间最多可减少一个数量级。定量评估表明，SOF 实现了更高的重建精度，同时总处理时间减少超过三倍。这些成果推动了从高效的高斯渲染迈向同样高效的几何提取。\n"
  },
  {
    "path": "abs/2506.19415.md",
    "content": "### Virtual Memory for 3D Gaussian Splatting\n\n3D Gaussian Splatting represents a breakthrough in the field of novel view synthesis. It establishes Gaussians as core rendering primitives for highly accurate real-world environment reconstruction. Recent advances have drastically increased the size of scenes that can be created. In this work, we present a method for rendering large and complex 3D Gaussian Splatting scenes using virtual memory. By leveraging well-established virtual memory and virtual texturing techniques, our approach efficiently identifies visible Gaussians and dynamically streams them to the GPU just in time for real-time rendering. Selecting only the necessary Gaussians for both storage and rendering results in reduced memory usage and effectively accelerates rendering, especially for highly complex scenes. Furthermore, we demonstrate how level of detail can be integrated into our proposed method to further enhance rendering speed for large-scale scenes. With an optimized implementation, we highlight key practical considerations and thoroughly evaluate the proposed technique and its impact on desktop and mobile devices.\n\n三维高斯投影（3D Gaussian Splatting）在新视角合成领域代表了一项突破性进展，确立了高斯作为核心渲染基元，在重建真实世界环境中展现出极高的精度。近期的研究显著扩大了可生成场景的规模。在本工作中，我们提出了一种利用虚拟内存渲染大规模复杂 3D 高斯投影场景的方法。通过借助成熟的虚拟内存与虚拟纹理技术，我们的方法能够高效识别可见高斯，并在实时渲染过程中动态地将其按需加载至 GPU。仅选择必要的高斯进行存储与渲染，不仅减少了内存占用，还显著加速了渲染过程，尤其适用于高度复杂的场景。此外，我们进一步展示了如何将细节层次（Level of Detail, LOD）机制整合进该方法，从而提升大规模场景的渲染速度。借助优化实现，我们还重点分析了该方法在桌面端与移动设备上的关键实践要点，并对其性能表现进行了全面评估。\n"
  },
  {
    "path": "abs/2506.20875.md",
    "content": "### 3DGH: 3D Head Generation with Composable Hair and Face\n\nWe present 3DGH, an unconditional generative model for 3D human heads with composable hair and face components. Unlike previous work that entangles the modeling of hair and face, we propose to separate them using a novel data representation with template-based 3D Gaussian Splatting, in which deformable hair geometry is introduced to capture the geometric variations across different hairstyles. Based on this data representation, we design a 3D GAN-based architecture with dual generators and employ a cross-attention mechanism to model the inherent correlation between hair and face. The model is trained on synthetic renderings using carefully designed objectives to stabilize training and facilitate hair-face separation. We conduct extensive experiments to validate the design choice of 3DGH, and evaluate it both qualitatively and quantitatively by comparing with several state-of-the-art 3D GAN methods, demonstrating its effectiveness in unconditional full-head image synthesis and composable 3D hairstyle editing.\n\n我们提出了 3DGH，一种用于三维人头生成的无条件生成模型，支持头发与面部组件的可组合建模。不同于以往将头发与面部混合建模的方式，我们采用基于模板的三维高斯投影（3D Gaussian Splatting）引入了一种全新的数据表示方法，将两者分离，并在其中引入可变形的头发几何结构以捕捉不同发型之间的几何差异。在该数据表示基础上，我们设计了一种基于三维 GAN 的网络结构，采用双生成器架构，并引入交叉注意力机制以建模头发与面部之间的内在关联。该模型基于合成渲染数据进行训练，并结合精心设计的目标函数以稳定训练过程并促进头发与面部的有效分离。我们进行了大量实验来验证 3DGH 的设计选择，并通过与多种先进的三维 GAN 方法进行定性与定量比较，展示了其在无条件全头图像合成与可组合三维发型编辑方面的有效性。\n"
  },
  {
    "path": "abs/2506.20998.md",
    "content": "### DBMovi-GS: Dynamic View Synthesis from Blurry Monocular Video via Sparse-Controlled Gaussian Splatting\n\nNovel view synthesis is a task of generating scenes from unseen perspectives; however, synthesizing dynamic scenes from blurry monocular videos remains an unresolved challenge that has yet to be effectively addressed. Existing novel view synthesis methods are often constrained by their reliance on high-resolution images or strong assumptions about static geometry and rigid scene priors. Consequently, their approaches lack robustness in real-world environments with dynamic object and camera motion, leading to instability and degraded visual fidelity. To address this, we propose Motion-aware Dynamic View Synthesis from Blurry Monocular Video via Sparse-Controlled Gaussian Splatting (DBMovi-GS), a method designed for dynamic view synthesis from blurry monocular videos. Our model generates dense 3D Gaussians, restoring sharpness from blurry videos and reconstructing detailed 3D geometry of the scene affected by dynamic motion variations. Our model achieves robust performance in novel view synthesis under dynamic blurry scenes and sets a new benchmark in realistic novel view synthesis for blurry monocular video inputs.\n\n新视角合成（Novel View Synthesis）旨在从未见过的视角生成场景。然而，从模糊的单目视频中合成动态场景仍是一个尚未被有效解决的难题。现有的新视角合成方法通常依赖高分辨率图像或对静态几何和刚性场景的强假设，这使得它们在存在动态物体与摄像机运动的真实环境中表现出较差的鲁棒性，进而导致渲染不稳定和视觉保真度下降。为了解决这一问题，我们提出了一种基于稀疏控制高斯投影的模糊单目视频动态视图合成方法，称为 DBMovi-GS。该方法专为从模糊单目视频中进行动态新视角合成而设计，能够生成稠密的三维高斯表示，从而在恢复视频清晰度的同时，重建受动态运动变化影响的场景的精细三维几何结构。实验证明，该方法在动态模糊场景下的新视角合成任务中表现出强大的鲁棒性，并为模糊单目视频输入下的真实感视图合成设立了新基准。\n"
  },
  {
    "path": "abs/2506.21117.md",
    "content": "### CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization\n\nIn dynamic 3D environments, accurately updating scene representations over time is crucial for applications in robotics, mixed reality, and embodied AI. As scenes evolve, efficient methods to incorporate changes are needed to maintain up-to-date, high-quality reconstructions without the computational overhead of re-optimizing the entire scene. This paper introduces CL-Splats, which incrementally updates Gaussian splatting-based 3D representations from sparse scene captures. CL-Splats integrates a robust change-detection module that segments updated and static components within the scene, enabling focused, local optimization that avoids unnecessary re-computation. Moreover, CL-Splats supports storing and recovering previous scene states, facilitating temporal segmentation and new scene-analysis applications. Our extensive experiments demonstrate that CL-Splats achieves efficient updates with improved reconstruction quality over the state-of-the-art. This establishes a robust foundation for future real-time adaptation in 3D scene reconstruction tasks.\n\n在动态三维环境中，随着场景的不断演化，如何高效、准确地更新场景表示对于机器人技术、混合现实以及具身智能等应用至关重要。为了保持最新且高质量的重建效果，亟需一种能够高效引入场景变化的方法，同时避免对整个场景进行高开销的重新优化。本文提出了 CL-Splats，一种基于高斯投影的三维表示增量更新方法，可从稀疏的场景捕捉中逐步构建更新内容。CL-Splats 集成了一个强健的变化检测模块，能够将场景中的变动部分与静态部分进行分割，从而实现聚焦式的局部优化，避免不必要的重复计算。此外，CL-Splats 支持历史场景状态的存储与恢复，便于进行时间序列分割与新型场景分析任务。大量实验表明，CL-Splats 在更新效率和重建质量方面均优于现有方法，为未来实时三维场景重建任务中的自适应更新奠定了坚实基础。\n"
  },
  {
    "path": "abs/2506.21152.md",
    "content": "### Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image\n\nGenerating realistic 3D objects from single-view images requires natural appearance, 3D consistency, and the ability to capture multiple plausible interpretations of unseen regions. Existing approaches often rely on fine-tuning pretrained 2D diffusion models or directly generating 3D information through fast network inference or 3D Gaussian Splatting, but their results generally suffer from poor multiview consistency and lack geometric detail. To takle these issues, we present a novel method that seamlessly integrates geometry and perception priors without requiring additional model training to reconstruct detailed 3D objects from a single image. Specifically, we train three different Gaussian branches initialized from the geometry prior, perception prior and Gaussian noise, respectively. The geometry prior captures the rough 3D shapes, while the perception prior utilizes the 2D pretrained diffusion model to enhance multiview information. Subsequently, we refine 3D Gaussian branches through mutual interaction between geometry and perception priors, further enhanced by a reprojection-based strategy that enforces depth consistency. Experiments demonstrate the higher-fidelity reconstruction results of our method, outperforming existing methods on novel view synthesis and 3D reconstruction, demonstrating robust and consistent 3D object generation.\n\n从单视图图像生成逼真的三维物体，需要具备自然的外观表现、三维一致性，以及对不可见区域进行多种合理推断的能力。现有方法通常依赖于对预训练的二维扩散模型进行微调，或通过快速网络推理或三维高斯投影（3D Gaussian Splatting）直接生成三维信息，但这些方法普遍存在多视角一致性差、几何细节不足的问题。为了解决这些问题，我们提出了一种新颖的方法，在无需额外模型训练的前提下，实现几何先验与感知先验的无缝融合，从而从单张图像中重建出细节丰富的三维物体。具体而言，我们设计了三个不同的高斯分支，分别从几何先验、感知先验和高斯噪声初始化。几何先验用于捕捉粗略的三维形状，感知先验则利用预训练的二维扩散模型增强多视角信息。随后，我们通过几何与感知先验之间的相互作用对三维高斯分支进行联合优化，并引入基于重投影的一致性策略以强化深度一致性。实验表明，该方法在新视角合成和三维重建任务中均优于现有方法，能够实现稳健且一致的高保真三维物体生成。\n"
  },
  {
    "path": "abs/2506.21401.md",
    "content": "### Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction\n\nThis paper presents an end-to-end framework for reconstructing 3D parametric curves directly from multi-view edge maps. Contrasting with existing two-stage methods that follow a sequential \"edge point cloud reconstruction and parametric curve fitting\" pipeline, our one-stage approach optimizes 3D parametric curves directly from 2D edge maps, eliminating error accumulation caused by the inherent optimization gap between disconnected stages. However, parametric curves inherently lack suitability for rendering-based multi-view optimization, necessitating a complementary representation that preserves their geometric properties while enabling differentiable rendering. We propose a novel bi-directional coupling mechanism between parametric curves and edge-oriented Gaussian components. This tight correspondence formulates a curve-aware Gaussian representation, \\textbf{CurveGaussian}, that enables differentiable rendering of 3D curves, allowing direct optimization guided by multi-view evidence. Furthermore, we introduce a dynamically adaptive topology optimization framework during training to refine curve structures through linearization, merging, splitting, and pruning operations. Comprehensive evaluations on the ABC dataset and real-world benchmarks demonstrate our one-stage method's superiority over two-stage alternatives, particularly in producing cleaner and more robust reconstructions. Additionally, by directly optimizing parametric curves, our method significantly reduces the parameter count during training, achieving both higher efficiency and superior performance compared to existing approaches.\n\n本文提出了一种端到端框架，可直接从多视角边缘图中重建三维参数曲线。与现有采用“边缘点云重建 + 参数曲线拟合”两阶段顺序流程的方法不同，我们的一阶段方法直接从二维边缘图优化三维参数曲线，避免了由于阶段分离导致的优化间隙所引起的误差累积。然而，参数曲线本身并不适用于基于渲染的多视角优化，因此需要一种既能保留其几何属性，又支持可微渲染的补充表示。为此，我们提出了一种参数曲线与边缘导向高斯组件之间的双向耦合机制。该紧密对应关系构成了一种曲线感知的高斯表示，称为 CurveGaussian，使得三维曲线具备可微渲染能力，能够基于多视角证据进行直接优化。此外，我们还引入了一种训练过程中的动态自适应拓扑优化框架，可通过线性化、合并、分裂和裁剪操作精细调整曲线结构。在 ABC 数据集和真实世界基准上的全面评估表明，该一阶段方法相较于两阶段方案在生成更干净且更鲁棒的重建结果方面具有显著优势。同时，由于直接优化参数曲线，我们的方法在训练中显著减少了参数数量，在保持高效率的同时也实现了优于现有方法的性能。\n"
  },
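As a toy illustration of coupling a parametric curve to edge-oriented Gaussians (the CurveGaussian idea above), the sketch below samples a cubic Bézier curve and spawns anisotropic Gaussians whose major axes follow the curve tangents, so a differentiable rasterizer could pass gradients back to the control points. Sampling density, thickness, and the covariance construction are our assumptions; the paper's bi-directional binding is more elaborate.

```python
import numpy as np

def curve_coupled_gaussians(ctrl, n=32, thickness=0.004):
    """Spawn edge-oriented Gaussians along a cubic Bezier curve.

    ctrl: (4, 3) control points. Each sample becomes an anisotropic Gaussian
    whose major axis follows the curve tangent, so rendering gradients on the
    Gaussians flow back to the curve parameters.
    """
    t = np.linspace(0.0, 1.0, n)[:, None]
    # Cubic Bernstein basis evaluated at the n sample parameters.
    b = np.stack([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                  3 * t ** 2 * (1 - t), t ** 3], axis=1)[:, :, 0]
    means = b @ ctrl                                   # (n, 3) points on the curve
    # Finite-difference tangents set each Gaussian's dominant direction.
    tang = np.gradient(means, axis=0)
    tang /= np.linalg.norm(tang, axis=1, keepdims=True)
    seg = np.linalg.norm(means[1] - means[0])
    covs = np.array([seg ** 2 * np.outer(d, d) + thickness ** 2 * np.eye(3)
                     for d in tang])                   # (n, 3, 3) elongated covariances
    return means, covs
```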
  {
    "path": "abs/2506.21420.md",
    "content": "### EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting\n\nEfficient three-dimensional reconstruction and real-time visualization are critical in surgical scenarios such as endoscopy. In recent years, 3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in efficient 3D reconstruction and rendering. Most 3DGS-based Simultaneous Localization and Mapping (SLAM) methods only rely on the appearance constraints for optimizing both 3DGS and camera poses. However, in endoscopic scenarios, the challenges include photometric inconsistencies caused by non-Lambertian surfaces and dynamic motion from breathing affects the performance of SLAM systems. To address these issues, we additionally introduce optical flow loss as a geometric constraint, which effectively constrains both the 3D structure of the scene and the camera motion. Furthermore, we propose a depth regularisation strategy to mitigate the problem of photometric inconsistencies and ensure the validity of 3DGS depth rendering in endoscopic scenes. In addition, to improve scene representation in the SLAM system, we improve the 3DGS refinement strategy by focusing on viewpoints corresponding to Keyframes with suboptimal rendering quality frames, achieving better rendering results. Extensive experiments on the C3VD static dataset and the StereoMIS dynamic dataset demonstrate that our method outperforms existing state-of-the-art methods in novel view synthesis and pose estimation, exhibiting high performance in both static and dynamic surgical scenes.\n\n高效的三维重建与实时可视化在内窥镜等外科手术场景中至关重要。近年来，三维高斯投影（3D Gaussian Splatting，3DGS）在高效三维重建与渲染方面展现出显著性能。然而，大多数基于 3DGS 的同步定位与建图（SLAM）方法主要依赖外观约束来优化 3DGS 和相机位姿，在内窥镜场景中却面临非朗伯表面引起的光度不一致性以及呼吸运动带来的动态干扰等挑战，导致 SLAM 系统性能下降。为解决这些问题，我们在优化过程中引入了**光流损失**作为几何约束，有效约束了场景的三维结构与相机运动。同时，我们提出了一种**深度正则化策略**，用于缓解光度不一致问题，并保证在内窥镜场景中 3DGS 深度渲染的有效性。此外，为提升 SLAM 系统中的场景表示能力，我们改进了 3DGS 的细化策略，聚焦于关键帧中渲染质量较差的视角，从而实现更优的渲染效果。在 C3VD 静态数据集与 StereoMIS 动态数据集上的大量实验证明，我们的方法在新视角合成与位姿估计方面均优于现有先进方法，在静态与动态手术场景中均展现出卓越性能。\n"
  },
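One plausible form of the optical-flow geometric constraint in EndoFlow-SLAM is to compare the flow induced by rendered depth and the estimated camera motion against an off-the-shelf flow estimate. A hedged PyTorch sketch, with camera conventions and names assumed by us rather than taken from the paper:

```python
import torch

def induced_flow(depth, K, T_rel):
    """Flow from frame t to t+1 implied by rendered depth and relative pose.

    depth: (H, W) rendered depth at frame t; K: (3, 3) intrinsics;
    T_rel: (4, 4) camera motion from frame t to t+1 (assumed convention).
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], -1).reshape(-1, 3).float()
    pts = (torch.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)   # back-project
    pts2 = (T_rel[:3, :3] @ pts.T).T + T_rel[:3, 3]                # move to frame t+1
    proj = (K @ pts2.T).T
    uv2 = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)               # reproject
    return (uv2 - pix[:, :2]).reshape(H, W, 2)

def flow_loss(depth, K, T_rel, flow_measured, valid):
    """L1 between geometry-induced flow and a precomputed flow estimate.

    flow_measured: (H, W, 2) e.g. from an off-the-shelf estimator;
    valid: (H, W) mask of reliable flow pixels.
    """
    f = induced_flow(depth, K, T_rel)
    return (valid.unsqueeze(-1) * (f - flow_measured).abs()).mean()
```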
  {
    "path": "abs/2506.21513.md",
    "content": "### GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation\n\nCreating high-quality, generalizable speech-driven 3D talking heads remains a persistent challenge. Previous methods achieve satisfactory results for fixed viewpoints and small-scale audio variations, but they struggle with large head rotations and out-of-distribution (OOD) audio. Moreover, they are constrained by the need for time-consuming, identity-specific training. We believe the core issue lies in the lack of sufficient 3D priors, which limits the extrapolation capabilities of synthesized talking heads. To address this, we propose GGTalker, which synthesizes talking heads through a combination of generalizable priors and identity-specific adaptation. We introduce a two-stage Prior-Adaptation training strategy to learn Gaussian head priors and adapt to individual characteristics. We train Audio-Expression and Expression-Visual priors to capture the universal patterns of lip movements and the general distribution of head textures. During the Customized Adaptation, individual speaking styles and texture details are precisely modeled. Additionally, we introduce a color MLP to generate fine-grained, motion-aligned textures and a Body Inpainter to blend rendered results with the background, producing indistinguishable, photorealistic video frames. Comprehensive experiments show that GGTalker achieves state-of-the-art performance in rendering quality, 3D consistency, lip-sync accuracy, and training efficiency.\n\n生成高质量、具有泛化能力的语音驱动三维数字人面部动画一直是一个具有挑战性的课题。以往方法在固定视角和小尺度音频变化下可实现令人满意的效果，但在面对大幅度头部旋转和超出训练分布（OOD）的音频时表现不佳。此外，这些方法通常依赖于耗时的身份特定训练过程，限制了其实用性。我们认为，其核心问题在于缺乏充分的三维先验，导致生成数字人面部动画时的外推能力受限。为此，我们提出了 **GGTalker**，通过结合可泛化的先验知识与身份特定的自适应机制，实现数字人口型动画的合成。我们引入了一个两阶段的“先验-适应”训练策略，先学习高斯人头先验，再进行个体化特征适应。我们训练了**音频-表情先验**与**表情-视觉先验**，用于捕捉口型运动的通用模式和人头纹理的整体分布。在个性化适应阶段，我们精确建模个体的说话风格和纹理细节。此外，我们引入了**颜色 MLP** 用于生成细粒度、与动作对齐的纹理图，并设计了**背景修复模块（Body Inpainter）**以将渲染结果自然融合至背景中，生成高度逼真的视频帧。大量实验表明，GGTalker 在渲染质量、三维一致性、唇形同步精度以及训练效率方面均达到了当前最优性能。\n"
  },
  {
    "path": "abs/2506.21520.md",
    "content": "### MADrive: Memory-Augmented Driving Scene Modeling\n\nRecent advances in scene reconstruction have pushed toward highly realistic modeling of autonomous driving (AD) environments using 3D Gaussian splatting. However, the resulting reconstructions remain closely tied to the original observations and struggle to support photorealistic synthesis of significantly altered or novel driving scenarios. This work introduces MADrive, a memory-augmented reconstruction framework designed to extend the capabilities of existing scene reconstruction methods by replacing observed vehicles with visually similar 3D assets retrieved from a large-scale external memory bank. Specifically, we release MAD-Cars, a curated dataset of ∼70K 360° car videos captured in the wild and present a retrieval module that finds the most similar car instances in the memory bank, reconstructs the corresponding 3D assets from video, and integrates them into the target scene through orientation alignment and relighting. The resulting replacements provide complete multi-view representations of vehicles in the scene, enabling photorealistic synthesis of substantially altered configurations, as demonstrated in our experiments.\n\n近年来，随着三维高斯投影（3D Gaussian Splatting）的发展，场景重建技术在自动驾驶（AD）环境中的逼真建模取得了显著进展。然而，现有重建结果仍高度依赖原始观测数据，难以支持对大幅修改或新颖驾驶场景的真实感合成。为了解决这一问题，本文提出了 MADrive —— 一种**记忆增强型重建框架**，旨在通过从大规模外部记忆库中检索视觉相似的三维资产，替换原始观测车辆，从而扩展现有场景重建方法的能力。具体而言，我们发布了 MAD-Cars 数据集，这是一个精心整理的包含约 7 万条真实环境中采集的 360° 汽车视频的数据集。我们设计了一个检索模块，用于在记忆库中找到最相似的车辆实例，从视频中重建相应的三维资产，并通过朝向对齐与重新光照的方式将其集成进目标场景。所替换的车辆提供了完整的多视角表示，能够实现对场景中车辆构型进行大幅修改后的真实感合成，实验结果充分验证了该方法的有效性。\n"
  },
  {
    "path": "abs/2506.21629.md",
    "content": "### ICP-3DGS: SfM-free 3D Gaussian Splatting for Large-scale Unbounded Scenes\n\nIn recent years, neural rendering methods such as NeRFs and 3D Gaussian Splatting (3DGS) have made significant progress in scene reconstruction and novel view synthesis. However, they heavily rely on preprocessed camera poses and 3D structural priors from structure-from-motion (SfM), which are challenging to obtain in outdoor scenarios. To address this challenge, we propose to incorporate Iterative Closest Point (ICP) with optimization-based refinement to achieve accurate camera pose estimation under large camera movements. Additionally, we introduce a voxel-based scene densification approach to guide the reconstruction in large-scale scenes. Experiments demonstrate that our approach ICP-3DGS outperforms existing methods in both camera pose estimation and novel view synthesis across indoor and outdoor scenes of various scales.\n\n近年来，神经渲染方法如 NeRF 和三维高斯投影（3D Gaussian Splatting，3DGS）在场景重建和新视角合成方面取得了显著进展。然而，这些方法严重依赖于预处理的相机位姿和来自 SfM（结构光恢复）算法的三维结构先验，而在户外场景中获取这些信息往往十分困难。为了解决这一问题，我们提出将迭代最近点（ICP）算法与基于优化的精细化策略结合，在大幅相机运动下实现准确的相机位姿估计。此外，我们引入了一种基于体素的场景加密方法，用于在大规模场景中引导三维重建。实验结果表明，我们提出的方法 ICP-3DGS 在相机位姿估计与新视角合成任务中，在不同尺度的室内外场景上均优于现有方法。\n"
  },
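For reference, a single ICP iteration with the closed-form Kabsch/SVD alignment looks like the NumPy sketch below; ICP-3DGS couples such steps with optimization-based refinement, which is not shown here, and a KD-tree would replace the brute-force matching in practice.

```python
import numpy as np

def icp_step(src, dst):
    """One ICP iteration: nearest-neighbour association + closed-form alignment.

    src: (N, 3) and dst: (M, 3) point clouds. Returns (R, t) minimising the
    squared distance between src and its nearest neighbours in dst (Kabsch/SVD).
    """
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)  # brute-force NN
    matches = dst[d2.argmin(axis=1)]
    mu_s, mu_d = src.mean(0), matches.mean(0)
    H = (src - mu_s).T @ (matches - mu_d)       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # reflection guard
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

def icp(src, dst, iters=20):
    """Iterate alignment; the resulting poses would seed 3DGS optimisation."""
    T = np.eye(4)
    cur = src.copy()
    for _ in range(iters):
        R, t = icp_step(cur, dst)
        cur = cur @ R.T + t
        step = np.eye(4)
        step[:3, :3], step[:3, 3] = R, t
        T = step @ T                            # accumulate the rigid transform
    return T
```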
  {
    "path": "abs/2506.21633.md",
    "content": "### SAR-GS: 3D Gaussian Splatting for Synthetic Aperture Radar Target Reconstruction\n\nThree-dimensional target reconstruction from synthetic aperture radar (SAR) imagery is crucial for interpreting complex scattering information in SAR data. However, the intricate electromagnetic scattering mechanisms inherent to SAR imaging pose significant reconstruction challenges. Inspired by the remarkable success of 3D Gaussian Splatting (3D-GS) in optical domain reconstruction, this paper presents a novel SAR Differentiable Gaussian Splatting Rasterizer (SDGR) specifically designed for SAR target reconstruction. Our approach combines Gaussian splatting with the Mapping and Projection Algorithm to compute scattering intensities of Gaussian primitives and generate simulated SAR images through SDGR. Subsequently, the loss function between the rendered image and the ground truth image is computed to optimize the Gaussian primitive parameters representing the scene, while a custom CUDA gradient flow is employed to replace automatic differentiation for accelerated gradient computation. Through experiments involving the rendering of simplified architectural targets and SAR images of multiple vehicle targets, we validate the imaging rationality of SDGR on simulated SAR imagery. Furthermore, the effectiveness of our method for target reconstruction is demonstrated on both simulated and real-world datasets containing multiple vehicle targets, with quantitative evaluations conducted to assess its reconstruction performance. Experimental results indicate that our approach can effectively reconstruct the geometric structures and scattering properties of targets, thereby providing a novel solution for 3D reconstruction in the field of SAR imaging.\n\n基于合成孔径雷达（SAR）图像的三维目标重建对于理解 SAR 数据中复杂的散射信息具有重要意义。然而，SAR 成像中固有的复杂电磁散射机制使得三维重建面临巨大挑战。受三维高斯投影（3D Gaussian Splatting，3D-GS）在光学域重建中取得显著成功的启发，本文提出了一种新颖的 SAR 可微高斯投影光栅化器（SAR Differentiable Gaussian Splatting Rasterizer，SDGR），专为 SAR 目标重建设计。我们的方法将高斯投影与映射投影算法相结合，用于计算高斯基元的散射强度，并通过 SDGR 生成模拟 SAR 图像。随后，计算渲染图与真实图像之间的损失函数，以优化表示场景的高斯基元参数，并通过定制的 CUDA 梯度流替代自动微分，以加速梯度计算。我们通过简化建筑目标的渲染实验以及多个车辆目标的 SAR 图像实验，验证了 SDGR 在模拟 SAR 成像中的成像合理性。此外，我们还在包含多个车辆目标的模拟与真实数据集上进行了目标重建实验，并进行了定量评估。实验结果表明，该方法能够有效重建目标的几何结构与散射特性，为 SAR 成像领域中的三维重建提供了一种全新的解决方案。\n"
  },
  {
    "path": "abs/2506.22044.md",
    "content": "### Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field\n\nReconstruction and rendering-based talking head synthesis methods achieve high-quality results with strong identity preservation but are limited by their dependence on identity-specific models. Each new identity requires training from scratch, incurring high computational costs and reduced scalability compared to generative model-based approaches. To overcome this limitation, we propose FIAG, a novel 3D speaking head synthesis framework that enables efficient identity-specific adaptation using only a few training footage. FIAG incorporates Global Gaussian Field, which supports the representation of multiple identities within a shared field, and Universal Motion Field, which captures the common motion dynamics across diverse identities. Benefiting from the shared facial structure information encoded in the Global Gaussian Field and the general motion priors learned in the motion field, our framework enables rapid adaptation from canonical identity representations to specific ones with minimal data. Extensive comparative and ablation experiments demonstrate that our method outperforms existing state-of-the-art approaches, validating both the effectiveness and generalizability of the proposed framework.\n\n基于重建与渲染的语音驱动人脸合成方法在身份保持方面表现优异，能够生成高质量的说话人头部动画，但受限于对身份特定模型的高度依赖。每新增一个身份都需从头训练，带来较高的计算成本，且相较于生成式方法可扩展性较差。为突破这一限制，我们提出了 FIAG —— 一种新颖的三维说话人脸合成框架，能够通过少量训练数据实现高效的身份特定自适应。FIAG 引入了**全局高斯场（Global Gaussian Field）**，用于在共享空间中表示多个身份，同时引入**通用运动场（Universal Motion Field）**，用于捕捉不同身份间的共通动态变化模式。得益于全局高斯场中编码的共享面部结构信息，以及运动场中学习到的通用运动先验，FIAG 能够以极少数据实现从标准身份表示到特定身份表示的快速适应。大量对比实验与消融分析表明，该方法在效果和泛化性方面均优于现有的主流方法，有效验证了所提出框架的先进性与实用性。\n"
  },
  {
    "path": "abs/2506.22280.md",
    "content": "### DIGS: Dynamic CBCT Reconstruction using Deformation-Informed 4D Gaussian Splatting and a Low-Rank Free-Form Deformation Model\n\n3D Cone-Beam CT (CBCT) is widely used in radiotherapy but suffers from motion artifacts due to breathing. A common clinical approach mitigates this by sorting projections into respiratory phases and reconstructing images per phase, but this does not account for breathing variability. Dynamic CBCT instead reconstructs images at each projection, capturing continuous motion without phase sorting. Recent advancements in 4D Gaussian Splatting (4DGS) offer powerful tools for modeling dynamic scenes, yet their application to dynamic CBCT remains underexplored. Existing 4DGS methods, such as HexPlane, use implicit motion representations, which are computationally expensive. While explicit low-rank motion models have been proposed, they lack spatial regularization, leading to inconsistencies in Gaussian motion. To address these limitations, we introduce a free-form deformation (FFD)-based spatial basis function and a deformation-informed framework that enforces consistency by coupling the temporal evolution of Gaussian's mean position, scale, and rotation under a unified deformation field. We evaluate our approach on six CBCT datasets, demonstrating superior image quality with a 6x speedup over HexPlane. These results highlight the potential of deformation-informed 4DGS for efficient, motion-compensated CBCT reconstruction.\n\n三维锥束计算机断层扫描（3D Cone-Beam CT，CBCT）在放射治疗中被广泛应用，但由于呼吸运动，图像容易出现运动伪影。临床上常见的应对方法是将投影按呼吸周期进行分相，并对每个相位分别重建图像，但该方法无法处理呼吸的可变性。相比之下，动态 CBCT 可在每帧投影下重建图像，从而连续捕捉运动过程，避免了分相步骤。近年来，四维高斯投影（4D Gaussian Splatting，4DGS）在动态场景建模方面表现出强大潜力，但其在动态 CBCT 中的应用仍未被深入探索。现有 4DGS 方法（如 HexPlane）多采用隐式运动表示，计算开销较大；而部分显式低秩运动模型虽提高了效率，却缺乏空间正则性，导致高斯运动表现不一致。为此，我们提出了一种基于自由形变（FFD）的空间基函数，并构建了一个变形感知框架，将高斯分布的平均位置、尺度与旋转的时间演化统一建模在同一个变形场中，以增强运动一致性。在六个 CBCT 数据集上的实验表明，我们的方法在图像质量方面优于现有方法，并在速度上较 HexPlane 提升了 6 倍。结果表明，变形感知的 4DGS 方法在实现高效、运动补偿的 CBCT 重建方面具有巨大潜力。\n"
  },
  {
    "path": "abs/2506.22718.md",
    "content": "### Part Segmentation and Motion Estimation for Articulated Objects with Dynamic 3D Gaussians\n\nPart segmentation and motion estimation are two fundamental problems for articulated object motion analysis. In this paper, we present a method to solve these two problems jointly from a sequence of observed point clouds of a single articulated object. The main challenge in our problem setting is that the point clouds are not assumed to be generated by a fixed set of moving points. Instead, each point cloud in the sequence could be an arbitrary sampling of the object surface at that particular time step. Such scenarios occur when the object undergoes major occlusions, or if the dataset is collected using measurements from multiple sensors asynchronously. In these scenarios, methods that rely on tracking point correspondences are not appropriate. We present an alternative approach based on a compact but effective representation where we represent the object as a collection of simple building blocks modeled as 3D Gaussians. We parameterize the Gaussians with time-dependent rotations, translations, and scales that are shared across all time steps. With our representation, part segmentation can be achieved by building correspondences between the observed points and the Gaussians. Moreover, the transformation of each point across time can be obtained by following the poses of the assigned Gaussian (even when the point is not observed). Experiments show that our method outperforms existing methods that solely rely on finding point correspondences. Additionally, we extend existing datasets to emulate real-world scenarios by considering viewpoint occlusions. We further demonstrate that our method is more robust to missing points as compared to existing approaches on these challenging datasets, even when some parts are completely occluded in some time-steps. Notably, our part segmentation performance outperforms the state-of-the-art method by 13% on point clouds with occlusions.\n\n部件分割和运动估计是解析关节物体运动的两个基本问题。本文提出了一种方法，可从单个关节物体的一系列观测点云中联合解决这两个问题。我们所面临的主要挑战在于：这些点云并不假设来自一组固定移动点的生成。相反，序列中的每一帧点云都可能是该时间点上物体表面的任意采样。这种情况在物体发生大遮挡，或数据由多个异步传感器采集时尤为常见。在这类场景中，依赖点对应关系追踪的方法将难以奏效。为此，我们提出了一种基于紧凑且高效表示的替代方案，即将物体表示为由多个简单构件组成的集合，每个构件建模为一个三维高斯分布。我们使用随时间变化的旋转、平移和缩放参数对这些高斯分布进行建模，这些参数在所有时间步间共享。通过这种表示方式，可通过构建观测点与高斯之间的对应关系实现部件分割。同时，即便某些点在特定时间步中未被观测，也可通过追踪其所归属的高斯的姿态来获得该点的时序变换。实验表明，我们的方法优于仅依赖点对应关系的方法。此外，我们还扩展了现有数据集，通过引入视角遮挡模拟真实场景。进一步实验表明，在这些具有挑战性的数据集中，即便某些部件在某些时间步中完全被遮挡，我们的方法相比现有方法对点缺失更具鲁棒性。值得注意的是，在存在遮挡的点云数据上，我们的部件分割性能相比现有最先进方法提升了 13%。\n"
  },
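The point-to-Gaussian correspondence that drives part segmentation in the paper above can be sketched as a Mahalanobis-distance assignment, as below; the log-determinant term is a standard likelihood correction, and the overall scoring is our assumption rather than the paper's exact objective.

```python
import numpy as np

def assign_points_to_gaussians(points, means, covs):
    """Label each observed point with its most likely Gaussian building block.

    points: (N, 3); means: (K, 3); covs: (K, 3, 3) at the same time step.
    Part segmentation follows by grouping Gaussians into parts; a point that is
    unobserved at a later step can still be tracked by following the pose of
    its assigned Gaussian.
    """
    prec = np.linalg.inv(covs)                       # (K, 3, 3) precision matrices
    diff = points[:, None, :] - means[None, :, :]    # (N, K, 3)
    # Squared Mahalanobis distance from every point to every Gaussian.
    m2 = np.einsum("nki,kij,nkj->nk", diff, prec, diff)
    # Add the log-determinant so larger Gaussians are not unfairly favoured.
    logdet = np.linalg.slogdet(covs)[1]              # (K,)
    return np.argmin(m2 + logdet[None, :], axis=1)   # (N,) Gaussian index per point
```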
  {
    "path": "abs/2506.22756.md",
    "content": "### RoboPearls: Editable Video Simulation for Robot Manipulation\n\nThe development of generalist robot manipulation policies has seen significant progress, driven by large-scale demonstration data across diverse environments. However, the high cost and inefficiency of collecting real-world demonstrations hinder the scalability of data acquisition. While existing simulation platforms enable controlled environments for robotic learning, the challenge of bridging the sim-to-real gap remains. To address these challenges, we propose RoboPearls, an editable video simulation framework for robotic manipulation. Built on 3D Gaussian Splatting (3DGS), RoboPearls enables the construction of photo-realistic, view-consistent simulations from demonstration videos, and supports a wide range of simulation operators, including various object manipulations, powered by advanced modules like Incremental Semantic Distillation (ISD) and 3D regularized NNFM Loss (3D-NNFM). Moreover, by incorporating large language models (LLMs), RoboPearls automates the simulation production process in a user-friendly manner through flexible command interpretation and execution. Furthermore, RoboPearls employs a vision-language model (VLM) to analyze robotic learning issues to close the simulation loop for performance enhancement. To demonstrate the effectiveness of RoboPearls, we conduct extensive experiments on multiple datasets and scenes, including RLBench, COLOSSEUM, Ego4D, Open X-Embodiment, and a real-world robot, which demonstrate our satisfactory simulation performance.\n\n通用型机器人操作策略的发展在近年取得了显著进展，这得益于跨多样环境的大规模示范数据。然而，现实世界中示范数据的采集成本高且效率低，严重制约了数据获取的可扩展性。尽管现有仿真平台可以为机器人学习提供可控环境，但“从仿真到现实”（sim-to-real）的鸿沟仍是关键挑战。为应对这一问题，我们提出了 **RoboPearls** —— 一个用于机器人操作任务的可编辑视频仿真框架。该框架基于三维高斯投影（3D Gaussian Splatting, 3DGS），能够从示范视频中构建具有照片级真实感和视角一致性的仿真场景，并支持广泛的仿真操作，包括多种对象操控行为，这些能力由诸如**增量语义蒸馏（ISD）**和**三维正则化 NNFM 损失（3D-NNFM）**等先进模块提供支持。此外，RoboPearls 集成了大型语言模型（LLMs），可通过灵活的指令理解与执行，自动化生成仿真过程，提升用户交互友好性。同时，RoboPearls 还结合视觉-语言模型（VLM）对机器人学习过程中的问题进行分析，实现仿真闭环优化与性能增强。我们在多个数据集和场景上进行了大量实验，包括 RLBench、COLOSSEUM、Ego4D、Open X-Embodiment 以及真实机器人平台，实验结果表明 RoboPearls 在仿真效果方面表现出色，验证了其有效性。\n"
  },
  {
    "path": "abs/2506.22799.md",
    "content": "### VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding\n\n3D Gaussian Splatting (3DGS) has become horsepower in high-quality, real-time rendering for novel view synthesis of 3D scenes. However, existing methods focus primarily on geometric and appearance modeling, lacking deeper scene understanding while also incurring high training costs that complicate the originally streamlined differentiable rendering pipeline. To this end, we propose VoteSplat, a novel 3D scene understanding framework that integrates Hough voting with 3DGS. Specifically, Segment Anything Model (SAM) is utilized for instance segmentation, extracting objects, and generating 2D vote maps. We then embed spatial offset vectors into Gaussian primitives. These offsets construct 3D spatial votes by associating them with 2D image votes, while depth distortion constraints refine localization along the depth axis. For open-vocabulary object localization, VoteSplat maps 2D image semantics to 3D point clouds via voting points, reducing training costs associated with high-dimensional CLIP features while preserving semantic unambiguity. Extensive experiments demonstrate effectiveness of VoteSplat in open-vocabulary 3D instance localization, 3D point cloud understanding, click-based 3D object localization, hierarchical segmentation, and ablation studies.\n\n三维高斯投影（3D Gaussian Splatting，3DGS）已成为高质量、实时新视角合成渲染的重要基础。然而，现有方法主要聚焦于几何与外观建模，缺乏对场景的深层理解，并且训练成本高昂，违背了其原本简洁的可微渲染管线设计初衷。为此，我们提出了 **VoteSplat** —— 一种融合 Hough 投票机制与 3DGS 的新型三维场景理解框架。具体而言，我们引入 Segment Anything Model（SAM）进行实例分割，提取物体并生成二维投票图。随后，我们将空间偏移向量嵌入至高斯基元中，并通过将这些偏移与二维图像投票关联，构建三维空间投票。同时，引入深度畸变约束以优化沿深度方向的定位精度。在开放词汇的物体定位任务中，VoteSplat 通过投票点将二维图像语义映射到三维点云，有效降低高维 CLIP 特征的训练成本，同时保留语义判别性。大量实验表明，VoteSplat 在开放词汇三维实例定位、三维点云理解、基于点击的三维物体定位、层次化分割及消融分析等任务中均展现出优异表现。\n"
  },
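A minimal stand-in for VoteSplat's 3D voting step: each Gaussian center plus its learned spatial offset casts a vote, and votes are binned on a coarse grid to localize an instance center. Bin size and the refinement rule are illustrative assumptions, not the paper's procedure.

```python
import numpy as np

def hough_votes(centers, offsets, bin_size=0.05):
    """Accumulate 3D votes from Gaussian centres and their learned offsets.

    centers, offsets: (N, 3). Each Gaussian votes for an object centre at
    center + offset; votes are binned on a coarse grid and the densest bin
    is refined into the localised instance centre.
    """
    votes = centers + offsets
    keys = np.floor(votes / bin_size).astype(np.int64)
    uniq, inv, counts = np.unique(keys, axis=0,
                                  return_inverse=True, return_counts=True)
    best = counts.argmax()
    return votes[inv == best].mean(axis=0)   # mean of votes in the winning bin
```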
  {
    "path": "abs/2506.22800.md",
    "content": "### RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors\n\nA single-pass driving clip frequently results in incomplete scanning of the road structure, making reconstructed scene expanding a critical requirement for sensor simulators to effectively regress driving actions. Although contemporary 3D Gaussian Splatting (3DGS) techniques achieve remarkable reconstruction quality, their direct extension through the integration of diffusion priors often introduces cumulative physical inconsistencies and compromises training efficiency. To address these limitations, we present RGE-GS, a novel expansive reconstruction framework that synergizes diffusion-based generation with reward-guided Gaussian integration. The RGE-GS framework incorporates two key innovations: First, we propose a reward network that learns to identify and prioritize consistently generated patterns prior to reconstruction phases, thereby enabling selective retention of diffusion outputs for spatial stability. Second, during the reconstruction process, we devise a differentiated training strategy that automatically adjust Gaussian optimization progress according to scene converge metrics, which achieving better convergence than baseline methods. Extensive evaluations of publicly available datasets demonstrate that RGE-GS achieves state-of-the-art performance in reconstruction quality.\n\n单次驾驶采集常常无法完整扫描道路结构，因此对重建场景进行扩展已成为传感器模拟器有效回归驾驶行为的关键需求。尽管当代三维高斯投影（3D Gaussian Splatting，3DGS）技术在重建质量方面表现出色，但其通过直接集成扩散先验进行扩展，往往会引入累积的物理不一致性，并降低训练效率。为克服上述局限，我们提出了 **RGE-GS** —— 一种融合扩散生成与奖励引导高斯集成的全新扩展式重建框架。RGE-GS 包含两项核心创新：其一，我们设计了一个奖励网络，在进入重建阶段前学习识别并优先保留稳定生成的结构模式，从而实现对扩散结果的选择性保留，提升空间稳定性；其二，在重建过程中，我们提出了一种差异化训练策略，能够根据场景收敛指标自动调节高斯优化进程，从而实现比基线方法更优的收敛效果。我们在多个公开数据集上进行了全面评估，结果表明 RGE-GS 在重建质量方面达到了当前最先进水平。\n"
  },
  {
    "path": "abs/2506.22973.md",
    "content": "### Confident Splatting: Confidence-Based Compression of 3D Gaussian Splatting via Learnable Beta Distributions\n\n3D Gaussian Splatting enables high-quality real-time rendering but often produces millions of splats, resulting in excessive storage and computational overhead. We propose a novel lossy compression method based on learnable confidence scores modeled as Beta distributions. Each splat's confidence is optimized through reconstruction-aware losses, enabling pruning of low-confidence splats while preserving visual fidelity. The proposed approach is architecture-agnostic and can be applied to any Gaussian Splatting variant. In addition, the average confidence values serve as a new metric to assess the quality of the scene. Extensive experiments demonstrate favorable trade-offs between compression and fidelity compared to prior work.\n\n三维高斯投影（3D Gaussian Splatting）能够实现高质量的实时渲染，但通常会生成数量高达数百万的 splat，导致存储和计算开销巨大。为此，我们提出了一种新颖的有损压缩方法，其核心是基于可学习的置信度得分，这些得分通过 Beta 分布建模。每个 splat 的置信度通过与重建质量相关的损失函数进行优化，从而在保证视觉保真的前提下，剔除置信度较低的 splat。该方法具有架构无关性，可适用于任意形式的高斯投影方法。此外，平均置信度得分还可作为评估场景质量的新指标。大量实验证明，本文方法在压缩率与保真度之间达成了比现有方法更优的平衡。\n"
  },
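A hedged PyTorch sketch of the learnable Beta-distributed confidences described above: unconstrained parameters are mapped to positive (alpha, beta), the Beta mean serves as the per-splat confidence, and pruning thresholds it. How the confidence enters the rendering loss is our assumption.

```python
import torch
import torch.nn as nn

class ConfidencePruner(nn.Module):
    """Per-splat confidence modelled as a Beta(alpha, beta) distribution.

    alpha and beta are stored as unconstrained logits and made positive with
    softplus; the Beta mean alpha / (alpha + beta) acts as the confidence
    score, optimised alongside reconstruction losses and later thresholded.
    """
    def __init__(self, num_splats):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_splats))
        self.log_beta = nn.Parameter(torch.zeros(num_splats))

    def confidence(self):
        a = nn.functional.softplus(self.log_alpha) + 1e-4
        b = nn.functional.softplus(self.log_beta) + 1e-4
        return a / (a + b)                        # Beta mean, in (0, 1)

    def prune_mask(self, threshold=0.1):
        return self.confidence() > threshold      # splats to keep

# Assumed usage: multiply each splat's opacity by its confidence so the
# reconstruction loss drives useless splats towards zero confidence, plus a
# sparsity term, e.g. loss = recon_loss + lam * pruner.confidence().mean().
```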
  {
    "path": "abs/2506.23042.md",
    "content": "### From Coarse to Fine: Learnable Discrete Wavelet Transforms for Efficient 3D Gaussian Splatting\n\n3D Gaussian Splatting has emerged as a powerful approach in novel view synthesis, delivering rapid training and rendering but at the cost of an ever-growing set of Gaussian primitives that strains memory and bandwidth. We introduce AutoOpti3DGS, a training-time framework that automatically restrains Gaussian proliferation without sacrificing visual fidelity. The key idea is to feed the input images to a sequence of learnable Forward and Inverse Discrete Wavelet Transforms, where low-pass filters are kept fixed, high-pass filters are learnable and initialized to zero, and an auxiliary orthogonality loss gradually activates fine frequencies. This wavelet-driven, coarse-to-fine process delays the formation of redundant fine Gaussians, allowing 3DGS to capture global structure first and refine detail only when necessary. Through extensive experiments, AutoOpti3DGS requires just a single filter learning-rate hyper-parameter, integrates seamlessly with existing efficient 3DGS frameworks, and consistently produces sparser scene representations more compatible with memory or storage-constrained hardware.\n\n三维高斯投影（3D Gaussian Splatting）作为一种新视角合成的强大方法，能够实现快速训练与渲染，但代价是高斯基元数量不断膨胀，从而对内存与带宽造成压力。为了解决这一问题，我们提出了 **AutoOpti3DGS** —— 一个训练阶段框架，可在不降低视觉保真的前提下自动抑制高斯基元的过度增长。其核心思想是将输入图像输入到一组可学习的正向与逆向离散小波变换（Discrete Wavelet Transforms）中，其中低通滤波器保持固定，高通滤波器则可学习并初始化为零，通过辅助正交损失逐步激活细节频率。该基于小波驱动的由粗到细训练策略，延迟了冗余精细高斯的生成，使 3DGS 能够优先捕捉全局结构，仅在必要时才细化局部细节。大量实验表明，AutoOpti3DGS 仅需一个滤波器学习率的超参数，能够无缝集成到现有高效 3DGS 框架中，始终生成更稀疏的场景表示，更加适用于内存或存储受限的硬件环境。\n"
  },
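The wavelet front-end of AutoOpti3DGS can be pictured as one separable DWT level with a fixed Haar low-pass and a zero-initialized learnable high-pass, plus an orthogonality penalty that gradually admits fine frequencies. Filter length, the exact penalty, and omitting the inverse transform are assumptions in this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableDWT(nn.Module):
    """One separable DWT level: fixed low-pass, learnable high-pass.

    The high-pass taps start at zero, so early training only sees the coarse
    band; an orthogonality penalty steers the learnable taps towards a valid
    (Haar-like) high-pass as fine detail is gradually introduced.
    """
    def __init__(self):
        super().__init__()
        s = 2 ** -0.5
        self.register_buffer("lo", torch.tensor([[s, s]]))  # fixed Haar low-pass
        self.hi = nn.Parameter(torch.zeros(1, 2))           # learnable, init to zero

    def forward(self, img):                                 # img: (B, C, H, W)
        _, C, _, _ = img.shape
        lo = self.lo.view(1, 1, 1, 2).expand(C, 1, 1, 2).contiguous()
        hi = self.hi.view(1, 1, 1, 2).expand(C, 1, 1, 2).contiguous()
        # Filter + downsample along width, then along height (separable DWT).
        rows_lo = F.conv2d(img, lo, stride=(1, 2), groups=C)
        rows_hi = F.conv2d(img, hi, stride=(1, 2), groups=C)
        ll = F.conv2d(rows_lo, lo.transpose(2, 3), stride=(2, 1), groups=C)
        lh = F.conv2d(rows_lo, hi.transpose(2, 3), stride=(2, 1), groups=C)
        hl = F.conv2d(rows_hi, lo.transpose(2, 3), stride=(2, 1), groups=C)
        hh = F.conv2d(rows_hi, hi.transpose(2, 3), stride=(2, 1), groups=C)
        return ll, lh, hl, hh

    def orthogonality_loss(self):
        # Push the learnable taps towards unit norm and orthogonality to `lo`,
        # so fine frequencies activate only gradually during training.
        dot = (self.hi * self.lo).sum()
        norm = self.hi.pow(2).sum() - 1.0
        return dot ** 2 + norm ** 2
```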
  {
    "path": "abs/2506.23207.md",
    "content": "### TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints\n\nRecent advances in 3D Gaussian Splatting (3DGS) have enabled RGB-only SLAM systems to achieve high-fidelity scene representation. However, the heavy reliance of existing systems on photometric rendering loss for camera tracking undermines their robustness, especially in unbounded outdoor environments with severe viewpoint and illumination changes. To address these challenges, we propose TVG-SLAM, a robust RGB-only 3DGS SLAM system that leverages a novel tri-view geometry paradigm to ensure consistent tracking and high-quality mapping. We introduce a dense tri-view matching module that aggregates reliable pairwise correspondences into consistent tri-view matches, forming robust geometric constraints across frames. For tracking, we propose Hybrid Geometric Constraints, which leverage tri-view matches to construct complementary geometric cues alongside photometric loss, ensuring accurate and stable pose estimation even under drastic viewpoint shifts and lighting variations. For mapping, we propose a new probabilistic initialization strategy that encodes geometric uncertainty from tri-view correspondences into newly initialized Gaussians. Additionally, we design a Dynamic Attenuation of Rendering Trust mechanism to mitigate tracking drift caused by mapping latency. Experiments on multiple public outdoor datasets show that our TVG-SLAM outperforms prior RGB-only 3DGS-based SLAM systems. Notably, in the most challenging dataset, our method improves tracking robustness, reducing the average Absolute Trajectory Error (ATE) by 69.0\\% while achieving state-of-the-art rendering quality.\n\n三维高斯投影（3D Gaussian Splatting，3DGS）的最新进展使得纯 RGB 输入的 SLAM 系统也能实现高保真的场景重建。然而，现有系统在相机跟踪过程中严重依赖光度渲染损失，导致其在无边界的户外环境中，面对剧烈的视角与光照变化时鲁棒性不足。为解决上述问题，我们提出了 **TVG-SLAM** —— 一种鲁棒的纯 RGB 3DGS SLAM 系统，引入了全新的**三视图几何范式**（Tri-View Geometry）以确保稳定的跟踪与高质量建图。我们设计了一个**稠密三视图匹配模块**，将可靠的双视图对应关系聚合为一致的三视图匹配，从而在多帧间构建强健的几何约束。在跟踪方面，我们提出了**混合几何约束（Hybrid Geometric Constraints）**，结合三视图匹配生成的几何信息与光度损失，提升在大视角变换和强光照变化下的位姿估计精度与稳定性。在建图方面，我们引入了**概率初始化策略**，将三视图对应中的几何不确定性编码进新初始化的高斯中。此外，我们还设计了**渲染置信动态衰减机制**，用于缓解建图延迟带来的跟踪漂移问题。在多个公开户外数据集上的实验结果表明，TVG-SLAM 在性能上全面优于现有基于 3DGS 的纯 RGB SLAM 系统。特别是在最具挑战性的数据集中，我们的方法显著提升了跟踪鲁棒性，将平均绝对轨迹误差（ATE）降低了 69.0%，同时实现了业界领先的渲染质量。\n"
  },
  {
    "path": "abs/2506.23308.md",
    "content": "### Endo-4DGX: Robust Endoscopic Scene Reconstruction and Illumination Correction with Gaussian Splatting\n\nAccurate reconstruction of soft tissue is crucial for advancing automation in image-guided robotic surgery. The recent 3D Gaussian Splatting (3DGS) techniques and their variants, 4DGS, achieve high-quality renderings of dynamic surgical scenes in real-time. However, 3D-GS-based methods still struggle in scenarios with varying illumination, such as low light and over-exposure. Training 3D-GS in such extreme light conditions leads to severe optimization problems and devastating rendering quality. To address these challenges, we present Endo-4DGX, a novel reconstruction method with illumination-adaptive Gaussian Splatting designed specifically for endoscopic scenes with uneven lighting. By incorporating illumination embeddings, our method effectively models view-dependent brightness variations. We introduce a region-aware enhancement module to model the sub-area lightness at the Gaussian level and a spatial-aware adjustment module to learn the view-consistent brightness adjustment. With the illumination adaptive design, Endo-4DGX achieves superior rendering performance under both low-light and over-exposure conditions while maintaining geometric accuracy. Additionally, we employ an exposure control loss to restore the appearance from adverse exposure to the normal level for illumination-adaptive optimization. Experimental results demonstrate that Endo-4DGX significantly outperforms combinations of state-of-the-art reconstruction and restoration methods in challenging lighting environments, underscoring its potential to advance robot-assisted surgical applications.\n\n软组织的精确重建对于推进图像引导的机器人手术自动化具有关键意义。近年来，三维高斯投影（3D Gaussian Splatting, 3DGS）及其扩展方法 4DGS 已能够实现动态手术场景的实时高质量渲染。然而，3DGS 类方法在光照变化剧烈的场景中仍表现不佳，如低照度和过曝环境。在这些极端光照条件下训练 3DGS 会导致严重的优化难题和极差的渲染质量。为应对这一挑战，我们提出 **Endo-4DGX**，一种专为内窥镜场景中非均匀照明设计的光照自适应高斯投影重建方法。通过引入**光照嵌入向量**，我们的方法能够有效建模视角相关的亮度变化；我们还设计了**区域感知增强模块**，在高斯级别对子区域亮度进行建模；以及**空间感知调整模块**，学习视角一致的亮度调节策略。得益于光照自适应机制，Endo-4DGX 在低照度与过曝条件下均可实现优异的渲染效果，同时保持几何重建的准确性。此外，我们还引入了**曝光控制损失**，用于将不良曝光下的外观恢复至正常状态，从而实现更鲁棒的光照自适应优化。实验结果表明，在光照极端变化的环境下，Endo-4DGX 显著优于当前最先进的重建与图像恢复方法的组合，展示了其在机器人辅助手术应用中的巨大潜力。\n"
  },
  {
    "path": "abs/2506.23309.md",
    "content": "### SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting\n\nIn contemporary surgical research and practice, accurately comprehending 3D surgical scenes with text-promptable capabilities is particularly crucial for surgical planning and real-time intra-operative guidance, where precisely identifying and interacting with surgical tools and anatomical structures is paramount. However, existing works focus on surgical vision-language model (VLM), 3D reconstruction, and segmentation separately, lacking support for real-time text-promptable 3D queries. In this paper, we present SurgTPGS, a novel text-promptable Gaussian Splatting method to fill this gap. We introduce a 3D semantics feature learning strategy incorporating the Segment Anything model and state-of-the-art vision-language models. We extract the segmented language features for 3D surgical scene reconstruction, enabling a more in-depth understanding of the complex surgical environment. We also propose semantic-aware deformation tracking to capture the seamless deformation of semantic features, providing a more precise reconstruction for both texture and semantic features. Furthermore, we present semantic region-aware optimization, which utilizes regional-based semantic information to supervise the training, particularly promoting the reconstruction quality and semantic smoothness. We conduct comprehensive experiments on two real-world surgical datasets to demonstrate the superiority of SurgTPGS over state-of-the-art methods, highlighting its potential to revolutionize surgical practices. SurgTPGS paves the way for developing next-generation intelligent surgical systems by enhancing surgical precision and safety.\n\n在当代外科研究与实践中，实现具备文本可提示能力的三维手术场景理解对于术前规划与术中实时引导至关重要，尤其是在精确识别与操作手术器械及解剖结构方面。然而，现有工作通常将外科视觉-语言模型（VLM）、三维重建与分割任务分开研究，尚不支持实时文本驱动的三维查询。为填补这一空白，本文提出了 **SurgTPGS** —— 一种新型的文本可提示高斯投影方法。我们设计了一种三维语义特征学习策略，融合 Segment Anything 模型与最新的视觉-语言模型，从中提取分割后的语言特征以支持三维手术场景重建，从而更深入理解复杂的手术环境。我们还提出了**语义感知形变追踪**方法，用于捕捉语义特征的连续变形过程，从而实现纹理与语义的更精确重建。此外，我们设计了**语义区域感知优化机制**，利用区域级语义信息监督训练过程，显著提升重建质量与语义平滑性。我们在两个真实手术数据集上进行了全面实证，结果显示 SurgTPGS 显著优于当前最先进方法，展现出其变革外科实践的巨大潜力。SurgTPGS 有望推动新一代智能手术系统的发展，进一步提升手术精度与安全性。\n"
  },
  {
    "path": "abs/2506.23611.md",
    "content": "### AttentionGS: Towards Initialization-Free 3D Gaussian Splatting via Structural Attention\n\n3D Gaussian Splatting (3DGS) is a powerful alternative to Neural Radiance Fields (NeRF), excelling in complex scene reconstruction and efficient rendering. However, it relies on high-quality point clouds from Structure-from-Motion (SfM), limiting its applicability. SfM also fails in texture-deficient or constrained-view scenarios, causing severe degradation in 3DGS reconstruction. To address this limitation, we propose AttentionGS, a novel framework that eliminates the dependency on high-quality initial point clouds by leveraging structural attention for direct 3D reconstruction from randomly initialization. In the early training stage, we introduce geometric attention to rapidly recover the global scene structure. As training progresses, we incorporate texture attention to refine fine-grained details and enhance rendering quality. Furthermore, we employ opacity-weighted gradients to guide Gaussian densification, leading to improved surface reconstruction. Extensive experiments on multiple benchmark datasets demonstrate that AttentionGS significantly outperforms state-of-the-art methods, particularly in scenarios where point cloud initialization is unreliable. Our approach paves the way for more robust and flexible 3D Gaussian Splatting in real-world applications.\n\n三维高斯投影（3D Gaussian Splatting，3DGS）是神经辐射场（Neural Radiance Fields，NeRF）的有力替代方案，在复杂场景重建和高效渲染方面表现出色。然而，它依赖于来自运动恢复结构（Structure-from-Motion，SfM）的高质量点云，这限制了其适用性。SfM 在纹理缺乏或视角受限的场景中也会失效，从而导致 3DGS 重建效果严重退化。为解决这一局限，我们提出了 AttentionGS，这是一种新颖的框架，通过利用结构注意力从随机初始化直接进行三维重建，从而消除了对高质量初始点云的依赖。在训练初期，我们引入几何注意力以快速恢复全局场景结构。随着训练的进行，我们加入纹理注意力以精细化细节并提升渲染质量。此外，我们采用不透明度加权梯度来引导高斯密化，从而改善表面重建效果。在多个基准数据集上的大量实验表明，AttentionGS 在点云初始化不可靠的场景下显著优于当前最先进的方法。我们的方法为在真实应用中实现更稳健、更灵活的三维高斯投影铺平了道路。\n"
  },
  {
    "path": "abs/2506.23957.md",
    "content": "### GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering\n\nVideo stabilization is pivotal for video processing, as it removes unwanted shakiness while preserving the original user motion intent. Existing approaches, depending on the domain they operate, suffer from several issues (e.g. geometric distortions, excessive cropping, poor generalization) that degrade the user experience. To address these issues, we introduce **GaVS**, a novel 3D-grounded approach that reformulates video stabilization as a temporally-consistent \"local reconstruction and rendering\" paradigm. Given 3D camera poses, we augment a reconstruction model to predict Gaussian Splatting primitives, and finetune it at test-time, with multi-view dynamics-aware photometric supervision and cross-frame regularization, to produce temporally-consistent local reconstructions. The model are then used to render each stabilized frame. We utilize a scene extrapolation module to avoid frame cropping. Our method is evaluated on a repurposed dataset, instilled with 3D-grounded information, covering samples with diverse camera motions and scene dynamics. Quantitatively, our method is competitive with or superior to state-of-the-art 2D and 2.5D approaches in terms of conventional task metrics and new geometry consistency. Qualitatively, our method produces noticeably better results compared to alternatives, validated by the user study.\n\n视频稳定化在视频处理领域至关重要，它能够消除不必要的抖动，同时保留用户原本的运动意图。现有方法根据其应用领域的不同，存在多种问题（如几何失真、过度裁剪、泛化能力差），这些问题会降低用户体验。为了解决这些问题，我们提出了 **GaVS**，一种基于三维的全新方法，将视频稳定化重新定义为一种时间一致的“局部重建与渲染”范式。在已知三维相机位姿的前提下，我们增强重建模型以预测高斯投影（Gaussian Splatting）基元，并在测试阶段利用多视角、动态感知的光度监督以及跨帧正则化对模型进行微调，从而生成时间一致的局部重建结果。随后使用该模型渲染每一帧稳定化后的画面。我们还引入了场景外推模块以避免画面裁剪。我们的方法在一个经过重新构建的数据集上进行了评估，该数据集包含了丰富的三维先验信息，覆盖了多种相机运动模式和场景动态。在定量评估中，我们的方法在传统任务指标和新的几何一致性指标上均能与最先进的二维和 2.5D 方法竞争甚至超越它们。在定性评估中，通过用户研究验证，我们的方法相比现有替代方案能够产生显著更优的效果。\n"
  },
  {
    "path": "abs/2506.24096.md",
    "content": "### MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction\n\nWhile recent advances in Gaussian Splatting have enabled fast reconstruction of high-quality 3D scenes from images, extracting accurate surface meshes remains a challenge. Current approaches extract the surface through costly post-processing steps, resulting in the loss of fine geometric details or requiring significant time and leading to very dense meshes with millions of vertices. More fundamentally, the a posteriori conversion from a volumetric to a surface representation limits the ability of the final mesh to preserve all geometric structures captured during training. We present MILo, a novel Gaussian Splatting framework that bridges the gap between volumetric and surface representations by differentiably extracting a mesh from the 3D Gaussians. We design a fully differentiable procedure that constructs the mesh-including both vertex locations and connectivity-at every iteration directly from the parameters of the Gaussians, which are the only quantities optimized during training. Our method introduces three key technical contributions: a bidirectional consistency framework ensuring both representations-Gaussians and the extracted mesh-capture the same underlying geometry during training; an adaptive mesh extraction process performed at each training iteration, which uses Gaussians as differentiable pivots for Delaunay triangulation; a novel method for computing signed distance values from the 3D Gaussians that enables precise surface extraction while avoiding geometric erosion. Our approach can reconstruct complete scenes, including backgrounds, with state-of-the-art quality while requiring an order of magnitude fewer mesh vertices than previous methods. Due to their light weight and empty interior, our meshes are well suited for downstream applications such as physics simulations or animation.\n\n尽管高斯投影（Gaussian Splatting）的最新进展已经实现了从图像快速重建高质量三维场景，但精确提取表面网格仍然是一个挑战。现有方法通常通过代价高昂的后处理步骤来提取表面，这不仅可能导致细微几何细节的丢失，还可能需要大量时间，并生成包含数百万顶点的超密集网格。从更根本的角度来看，从体积表示到表面表示的事后转换，限制了最终网格保留训练过程中捕获的全部几何结构的能力。我们提出了 MILo，这是一种新颖的高斯投影框架，通过可微分的方式直接从三维高斯中提取网格，从而弥合体积表示与表面表示之间的鸿沟。我们设计了一套完全可微的流程，在每次迭代中直接基于高斯的参数（训练中唯一被优化的量）构建网格，包括顶点位置和连接关系。我们的方法包含三个关键技术贡献：（1）一种双向一致性框架，确保高斯表示与提取的网格在训练过程中捕获相同的底层几何结构；（2）一种在每次训练迭代中执行的自适应网格提取过程，使用高斯作为可微的 Delaunay 三角剖分枢点；（3）一种新颖的从三维高斯计算符号距离值的方法，使得能够精确提取表面并避免几何侵蚀。我们的方法能够以最先进的质量重建完整场景（包括背景），同时所需的网格顶点数量比现有方法减少一个数量级。由于其轻量化和内部空洞的特性，我们的网格非常适合用于物理模拟或动画等下游应用。\n"
  },
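A crude stand-in for querying signed-distance-like values from a set of 3D Gaussians, as MILo's third contribution requires, is shown below: the smallest Mahalanobis distance to any sufficiently opaque Gaussian minus an iso-level. MILo's actual definition is designed to be differentiable and erosion-free; this NumPy sketch, with our own iso convention and opacity cutoff, only conveys the idea.

```python
import numpy as np

def gaussian_signed_distance(x, means, covs, opacities, iso=1.0):
    """Crude signed-distance proxy queried from 3D Gaussians.

    x: (N, 3) query points; means: (K, 3); covs: (K, 3, 3); opacities: (K,).
    Points inside the iso-surface of some opaque Gaussian get negative values.
    """
    keep = opacities > 0.5                      # ignore near-transparent Gaussians
    prec = np.linalg.inv(covs[keep])
    diff = x[:, None, :] - means[keep][None, :, :]
    m = np.sqrt(np.einsum("nki,kij,nkj->nk", diff, prec, diff))
    return (m - iso).min(axis=1)                # (N,) signed-distance proxy

# Evaluated at Delaunay vertices built on the Gaussians, such a field lets a
# marching-style extraction place the surface where the values cross zero.
```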
  {
    "path": "abs/2507.00363.md",
    "content": "### GDGS: 3D Gaussian Splatting Via Geometry-Guided Initialization And Dynamic Density Control\n\nWe propose a method to enhance 3D Gaussian Splatting (3DGS), addressing challenges in initialization, optimization, and density control. Gaussian Splatting is an alternative for rendering realistic images while supporting real-time performance, and it has gained popularity due to its explicit 3D Gaussian representation. However, 3DGS heavily depends on accurate initialization and faces difficulties in optimizing unstructured Gaussian distributions into ordered surfaces, with limited adaptive density control mechanism proposed so far. Our first key contribution is a geometry-guided initialization to predict Gaussian parameters, ensuring precise placement and faster convergence. We then introduce a surface-aligned optimization strategy to refine Gaussian placement, improving geometric accuracy and aligning with the surface normals of the scene. Finally, we present a dynamic adaptive density control mechanism that adjusts Gaussian density based on regional complexity, for visual fidelity. These innovations enable our method to achieve high-fidelity real-time rendering and significant improvements in visual quality, even in complex scenes. Our method demonstrates comparable or superior results to state-of-the-art methods, rendering high-fidelity images in real time.\n\n我们提出了一种改进三维高斯投影（3D Gaussian Splatting，3DGS）的方法，针对初始化、优化以及密度控制等挑战进行了改进。高斯投影是一种能够在保持实时性能的同时渲染逼真图像的替代方案，并因其显式的三维高斯表示而广受关注。然而，3DGS 严重依赖于精确的初始化，并且在将非结构化的高斯分布优化为有序表面时存在困难，同时迄今为止提出的自适应密度控制机制也较为有限。我们的第一个关键贡献是引入几何引导的初始化方法来预测高斯参数，从而保证精确放置并加速收敛。随后，我们提出一种与表面对齐的优化策略，以细化高斯的放置位置，提升几何精度，并与场景的表面法线对齐。最后，我们提出了一种动态自适应密度控制机制，可根据区域复杂度调整高斯密度，从而提升视觉逼真度。这些创新使得我们的方法即使在复杂场景中也能实现高保真实时渲染，并显著提升视觉质量。实验结果表明，我们的方法在实时渲染高保真图像方面能够达到或超越当前最先进的方法。\n"
  },
  {
    "path": "abs/2507.00392.md",
    "content": "### Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space\n\nFeature matching plays a fundamental role in many computer vision tasks, yet existing methods heavily rely on scarce and clean multi-view image collections, which constrains their generalization to diverse and challenging scenarios. Moreover, conventional feature encoders are typically trained on single-view 2D images, limiting their capacity to capture 3D-aware correspondences. In this paper, we propose a novel two-stage framework that lifts 2D images to 3D space, named as **Lift to Match (L2M)**, taking full advantage of large-scale and diverse single-view images. To be specific, in the first stage, we learn a 3D-aware feature encoder using a combination of multi-view image synthesis and 3D feature Gaussian representation, which injects 3D geometry knowledge into the encoder. In the second stage, a novel-view rendering strategy, combined with large-scale synthetic data generation from single-view images, is employed to learn a feature decoder for robust feature matching, thus achieving generalization across diverse domains. Extensive experiments demonstrate that our method achieves superior generalization across zero-shot evaluation benchmarks, highlighting the effectiveness of the proposed framework for robust feature matching.\n\n特征匹配在许多计算机视觉任务中发挥着基础性作用，但现有方法严重依赖稀缺且干净的多视图图像集合，这限制了其在多样化和具有挑战性的场景中的泛化能力。此外，传统的特征编码器通常在单视图二维图像上训练，限制了其捕捉具备三维感知的对应关系的能力。本文提出了一种将二维图像提升到三维空间的新型两阶段框架，称为 **Lift to Match (L2M)**，充分利用了大规模、多样化的单视图图像。具体而言，在第一阶段，我们通过结合多视图图像合成与三维特征高斯表示，学习一个具备三维感知的特征编码器，从而将三维几何知识注入编码器。在第二阶段，我们采用新视角渲染策略，并结合从单视图图像生成的大规模合成数据，来训练特征解码器以实现稳健的特征匹配，从而在不同领域中实现泛化。大量实验表明，我们的方法在零样本评估基准上表现出优越的泛化能力，凸显了所提框架在稳健特征匹配中的有效性。\n"
  },
  {
    "path": "abs/2507.00554.md",
    "content": "### LOD-GS: Level-of-Detail-Sensitive 3D Gaussian Splatting for Detail Conserved Anti-Aliasing\n\nDespite the advancements in quality and efficiency achieved by 3D Gaussian Splatting (3DGS) in 3D scene rendering, aliasing artifacts remain a persistent challenge. Existing approaches primarily rely on low-pass filtering to mitigate aliasing. However, these methods are not sensitive to the sampling rate, often resulting in under-filtering and over-smoothing renderings. To address this limitation, we propose LOD-GS, a Level-of-Detail-sensitive filtering framework for Gaussian Splatting, which dynamically predicts the optimal filtering strength for each 3D Gaussian primitive. Specifically, we introduce a set of basis functions to each Gaussian, which take the sampling rate as input to model appearance variations, enabling sampling-rate-sensitive filtering. These basis function parameters are jointly optimized with the 3D Gaussian in an end-to-end manner. The sampling rate is influenced by both focal length and camera distance. However, existing methods and datasets rely solely on down-sampling to simulate focal length changes for anti-aliasing evaluation, overlooking the impact of camera distance. To enable a more comprehensive assessment, we introduce a new synthetic dataset featuring objects rendered at varying camera distances. Extensive experiments on both public datasets and our newly collected dataset demonstrate that our method achieves SOTA rendering quality while effectively eliminating aliasing.\n\n尽管三维高斯投影（3D Gaussian Splatting，3DGS）在三维场景渲染的质量与效率方面取得了显著进展，但混叠伪影依然是一个长期存在的挑战。现有方法主要依赖低通滤波来减轻混叠问题。然而，这些方法对采样率不敏感，常常导致欠滤波或过度平滑的渲染效果。为解决这一局限，我们提出了 LOD-GS，这是一种针对高斯投影的细节层次（Level-of-Detail）敏感滤波框架，能够为每个三维高斯基元动态预测最优滤波强度。具体而言，我们为每个高斯引入一组基函数，以采样率作为输入来建模外观变化，从而实现采样率敏感的滤波。这些基函数参数与三维高斯一起在端到端的方式下联合优化。采样率受到焦距与相机距离的共同影响。然而，现有方法和数据集在抗混叠评估中仅依赖下采样来模拟焦距变化，忽视了相机距离的影响。为实现更全面的评估，我们引入了一个新的合成数据集，其中包含在不同相机距离下渲染的物体。在公共数据集和我们新收集的数据集上的大量实验表明，我们的方法在有效消除混叠的同时，实现了最先进的渲染质量。\n"
  },
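The sampling-rate-sensitive filtering of LOD-GS can be sketched as per-Gaussian coefficients over a small basis in the log sampling rate, decoded into a modulation factor. The polynomial basis choice and applying the factor to opacity are our assumptions; the paper jointly optimizes such basis parameters with the 3D Gaussians end-to-end.

```python
import torch
import torch.nn as nn

class SamplingRateBasis(nn.Module):
    """Per-Gaussian appearance modulation conditioned on the sampling rate.

    Each Gaussian carries learned coefficients over a polynomial basis in the
    log sampling rate (which depends on focal length and camera distance);
    the decoded scalar modulates its opacity so the effective filtering
    strength adapts per primitive.
    """
    def __init__(self, num_gaussians, degree=3):
        super().__init__()
        self.coeff = nn.Parameter(torch.zeros(num_gaussians, degree + 1))

    def forward(self, log_rate):                 # log_rate: (N,) per Gaussian
        powers = torch.stack(
            [log_rate ** d for d in range(self.coeff.shape[1])], -1)
        return torch.sigmoid((self.coeff * powers).sum(-1))   # (N,) in (0, 1)

# Assumed usage: opacity_eff = base_opacity * basis(log_sampling_rate), with
# the coefficients optimised together with the 3D Gaussian parameters.
```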
  {
    "path": "abs/2507.00886.md",
    "content": "### GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond\n\nAs multimodal language models advance, their application to 3D scene understanding is a fast-growing frontier, driving the development of 3D Vision-Language Models (VLMs). Current methods show strong dependence on object detectors, introducing processing bottlenecks and limitations in taxonomic flexibility. To address these limitations, we propose a scene-centric 3D VLM for 3D Gaussian splat scenes that employs language- and task-aware scene representations. Our approach directly embeds rich linguistic features into the 3D scene representation by associating language with each Gaussian primitive, achieving early modality alignment. To process the resulting dense representations, we introduce a dual sparsifier that distills them into compact, task-relevant tokens via task-guided and location-guided pathways, producing sparse, task-aware global and local scene tokens. Notably, we present the first Gaussian splatting-based VLM, leveraging photorealistic 3D representations derived from standard RGB images, demonstrating strong generalization: it improves performance of prior 3D VLM five folds, in out-of-the-domain settings.\n\n随着多模态语言模型的发展，它们在三维场景理解中的应用正迅速成为前沿方向，推动了三维视觉语言模型（3D Vision-Language Models, VLMs）的发展。现有方法对目标检测器的依赖较强，导致处理瓶颈并限制了分类体系的灵活性。为克服这些局限，我们提出了一种面向场景的三维高斯投影（Gaussian Splatting）场景的三维视觉语言模型，该模型采用具备语言感知与任务感知的场景表示。我们的方法通过将语言与每个高斯基元关联，将丰富的语言特征直接嵌入三维场景表示中，实现了模态的早期对齐。为处理由此产生的稠密表示，我们引入了双稀疏化器（dual sparsifier），通过任务引导和位置引导两条路径将其提炼为紧凑的、与任务相关的标记，生成稀疏的、任务感知的全局与局部场景标记。值得注意的是，我们提出了首个基于高斯投影的视觉语言模型，利用由标准 RGB 图像生成的逼真三维表示，展现出强大的泛化能力：在域外设置中，其性能相较此前的三维视觉语言模型提升了五倍。\n"
  },
  {
    "path": "abs/2507.00916.md",
    "content": "### Masks make discriminative models great again!\n\nWe present Image2GS, a novel approach that addresses the challenging problem of reconstructing photorealistic 3D scenes from a single image by focusing specifically on the image-to-3D lifting component of the reconstruction process. By decoupling the lifting problem (converting an image to a 3D model representing what is visible) from the completion problem (hallucinating content not present in the input), we create a more deterministic task suitable for discriminative models. Our method employs visibility masks derived from optimized 3D Gaussian splats to exclude areas not visible from the source view during training. This masked training strategy significantly improves reconstruction quality in visible regions compared to strong baselines. Notably, despite being trained only on masked regions, Image2GS remains competitive with state-of-the-art discriminative models trained on full target images when evaluated on complete scenes. Our findings highlight the fundamental struggle discriminative models face when fitting unseen regions and demonstrate the advantages of addressing image-to-3D lifting as a distinct problem with specialized techniques.\n\n我们提出了 Image2GS，这是一种新方法，针对从单张图像重建照片级逼真三维场景这一具有挑战性的问题，专注于重建过程中的图像到三维提升（image-to-3D lifting）环节。通过将提升问题（将图像转换为表示可见部分的三维模型）与补全问题（臆造输入中不存在的内容）解耦，我们将任务转化为更具确定性、适合判别式模型处理的问题。我们的方法利用从优化后的三维高斯投影中获得的可见性掩码，在训练过程中排除源视角中不可见的区域。这种掩码训练策略相比强基线方法，在可见区域显著提升了重建质量。值得注意的是，即使仅在掩码区域上训练，Image2GS 在完整场景评估中依然能与在完整目标图像上训练的最先进判别式模型保持竞争力。我们的研究结果突出了判别式模型在拟合不可见区域时的根本困境，并展示了将图像到三维提升作为一个独立问题并采用专门技术处理的优势。\n"
  },
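Image2GS's masked training strategy reduces to computing the reconstruction loss only where the source view actually sees the scene. A minimal PyTorch sketch, with the visibility mask assumed to be precomputed from optimized 3D Gaussian splats as the abstract describes:

```python
import torch

def masked_l1(pred, target, visibility):
    """Supervise only pixels visible from the source view.

    pred, target: (B, 3, H, W) rendered and ground-truth novel views;
    visibility: (B, 1, H, W) in {0, 1}, derived offline from optimised
    3D Gaussian splats (1 where the source view sees that surface).
    Normalising by the mask area keeps the loss scale independent of how
    much of the target view is actually supervised.
    """
    err = (pred - target).abs() * visibility
    return err.sum() / (visibility.sum() * pred.shape[1] + 1e-8)
```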
  {
    "path": "abs/2507.01125.md",
    "content": "### VISTA: Open-Vocabulary, Task-Relevant Robot Exploration with Online Semantic Gaussian Splatting\n\nWe present VISTA (Viewpoint-based Image selection with Semantic Task Awareness), an active exploration method for robots to plan informative trajectories that improve 3D map quality in areas most relevant for task completion. Given an open-vocabulary search instruction (e.g., \"find a person\"), VISTA enables a robot to explore its environment to search for the object of interest, while simultaneously building a real-time semantic 3D Gaussian Splatting reconstruction of the scene. The robot navigates its environment by planning receding-horizon trajectories that prioritize semantic similarity to the query and exploration of unseen regions of the environment. To evaluate trajectories, VISTA introduces a novel, efficient viewpoint-semantic coverage metric that quantifies both the geometric view diversity and task relevance in the 3D scene. On static datasets, our coverage metric outperforms state-of-the-art baselines, FisherRF and Bayes' Rays, in computation speed and reconstruction quality. In quadrotor hardware experiments, VISTA achieves 6x higher success rates in challenging maps, compared to baseline methods, while matching baseline performance in less challenging maps. Lastly, we show that VISTA is platform-agnostic by deploying it on a quadrotor drone and a Spot quadruped robot. Open-source code will be released upon acceptance of the paper.\n\n我们提出了 VISTA（Viewpoint-based Image selection with Semantic Task Awareness），这是一种主动探索方法，使机器人能够规划信息量丰富的轨迹，以提升与任务完成最相关区域的三维地图质量。在给定开放词汇的搜索指令（例如“寻找一个人”）的情况下，VISTA 使机器人能够在探索环境、寻找目标物体的同时，实时构建场景的语义三维高斯投影（3D Gaussian Splatting）重建。机器人通过规划递推视界轨迹来导航环境，该轨迹优先考虑与查询的语义相似性以及对环境中未探索区域的覆盖。为评估轨迹，VISTA 提出了新颖且高效的视角-语义覆盖指标，用于同时量化三维场景中的几何视角多样性和任务相关性。在静态数据集上，我们的覆盖指标在计算速度和重建质量方面均优于最先进的基线方法 FisherRF 和 Bayes' Rays。在四旋翼硬件实验中，VISTA 在复杂地图中的任务成功率比基线方法高 6 倍，同时在较简单地图中保持与基线相当的性能。最后，我们通过在四旋翼无人机和 Spot 四足机器人上的部署，证明了 VISTA 与平台无关。论文接收后将开源代码。\n"
  },
  {
    "path": "abs/2507.01367.md",
    "content": "### 3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation\n\nPhysical adversarial attack methods expose the vulnerabilities of deep neural networks and pose a significant threat to safety-critical scenarios such as autonomous driving. Camouflage-based physical attack is a more promising approach compared to the patch-based attack, offering stronger adversarial effectiveness in complex physical environments. However, most prior work relies on mesh priors of the target object and virtual environments constructed by simulators, which are time-consuming to obtain and inevitably differ from the real world. Moreover, due to the limitations of the backgrounds in training images, previous methods often fail to produce multi-view robust adversarial camouflage and tend to fall into sub-optimal solutions. Due to these reasons, prior work lacks adversarial effectiveness and robustness across diverse viewpoints and physical environments. We propose a physical attack framework based on 3D Gaussian Splatting (3DGS), named PGA, which provides rapid and precise reconstruction with few images, along with photo-realistic rendering capabilities. Our framework further enhances cross-view robustness and adversarial effectiveness by preventing mutual and self-occlusion among Gaussians and employing a min-max optimization approach that adjusts the imaging background of each viewpoint, helping the algorithm filter out non-robust adversarial features. Extensive experiments validate the effectiveness and superiority of PGA.\n\n物理对抗攻击方法揭示了深度神经网络的脆弱性，并对自动驾驶等安全关键场景构成了重大威胁。相比基于补丁的攻击，基于伪装的物理攻击在复杂物理环境中具有更强的对抗效果，因此更具潜力。然而，大多数现有工作依赖于目标物体的网格先验以及由模拟器构建的虚拟环境，这些先验的获取过程耗时且不可避免地与真实世界存在差异。此外，由于训练图像背景的局限性，现有方法往往难以生成具备多视角鲁棒性的对抗伪装，并容易陷入次优解。基于上述原因，以往方法在多样化视角和物理环境下的对抗有效性与鲁棒性均不足。为此，我们提出了一种基于三维高斯投影（3D Gaussian Splatting, 3DGS）的物理攻击框架——PGA，该框架能够利用少量图像实现快速且精确的重建，并具备照片级逼真的渲染能力。我们的框架通过防止高斯之间的相互遮挡和自遮挡，并采用极小极大优化方法调整每个视角的成像背景，从而提升跨视角的鲁棒性与对抗效果，帮助算法过滤掉非鲁棒的对抗特征。大量实验验证了 PGA 的有效性与优越性。\n"
  },
  {
    "path": "abs/2507.02257.md",
    "content": "### Gbake: Baking 3D Gaussian Splats into Reflection Probes\n\nThe growing popularity of 3D Gaussian Splatting has created the need to integrate traditional computer graphics techniques and assets in splatted environments. Since 3D Gaussian primitives encode lighting and geometry jointly as appearance, meshes are relit improperly when inserted directly in a mixture of 3D Gaussians and thus appear noticeably out of place. We introduce GBake, a specialized tool for baking reflection probes from Gaussian-splatted scenes that enables realistic reflection mapping of traditional 3D meshes in the Unity game engine.\n\n随着三维高斯投影（3D Gaussian Splatting）的日益流行，人们开始需要在高斯投影环境中集成传统计算机图形学技术和资源。由于三维高斯基元将光照与几何共同编码为外观，当传统网格直接插入到三维高斯混合场景中时，其重新光照效果会不正确，从而显得格格不入。为此，我们提出了 GBake，这是一款专门用于从高斯投影场景中烘焙反射探针的工具，可在 Unity 游戏引擎中实现对传统三维网格的逼真反射映射。\n"
  },
  {
    "path": "abs/2507.02363.md",
    "content": "### LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling\n\nDue to the complex and highly dynamic motions in the real world, synthesizing dynamic videos from multi-view inputs for arbitrary viewpoints is challenging. Previous works based on neural radiance field or 3D Gaussian splatting are limited to modeling fine-scale motion, greatly restricting their application. In this paper, we introduce LocalDyGS, which consists of two parts to adapt our method to both large-scale and fine-scale motion scenes: 1) We decompose a complex dynamic scene into streamlined local spaces defined by seeds, enabling global modeling by capturing motion within each local space. 2) We decouple static and dynamic features for local space motion modeling. A static feature shared across time steps captures static information, while a dynamic residual field provides time-specific features. These are combined and decoded to generate Temporal Gaussians, modeling motion within each local space. As a result, we propose a novel dynamic scene reconstruction framework to model highly dynamic real-world scenes more realistically. Our method not only demonstrates competitive performance on various fine-scale datasets compared to state-of-the-art (SOTA) methods, but also represents the first attempt to model larger and more complex highly dynamic scenes.\n\n由于现实世界中存在复杂且高度动态的运动，从多视图输入合成任意视角的动态视频是一项具有挑战性的任务。基于神经辐射场或三维高斯投影的现有方法在建模细粒度运动方面存在局限性，从而极大地限制了其应用。本文提出了 LocalDyGS，该方法由两个部分组成，以同时适应大规模和细粒度运动场景：1）我们将复杂的动态场景分解为由种子定义的精简局部空间，通过捕捉每个局部空间内的运动，实现全局建模；2）我们将局部空间运动建模中的静态特征与动态特征解耦，时间步共享的静态特征用于捕捉静态信息，而动态残差场则提供特定时间的特征。二者结合并解码生成时序高斯（Temporal Gaussians），用于建模各局部空间内的运动。因此，我们提出了一种新颖的动态场景重建框架，以更真实地建模高度动态的现实场景。我们的方法不仅在多个细粒度数据集上与当前最先进（SOTA）方法相比表现出竞争力，还首次尝试对更大规模、更复杂的高度动态场景进行建模。\n"
  },
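The static/dynamic decoupling per local space in LocalDyGS can be sketched as a shared static feature plus a time-conditioned residual MLP, whose sum would later be decoded into Temporal Gaussians. Feature dimensions and the plain scalar time encoding below are assumptions.

```python
import torch
import torch.nn as nn

class LocalSpaceFeature(nn.Module):
    """Static feature shared across time plus a time-specific residual.

    Each local space (seed) stores one static feature; a small MLP maps the
    static feature and a time encoding to a dynamic residual, and the sum is
    decoded elsewhere into Temporal Gaussians for that local space.
    """
    def __init__(self, num_seeds, dim=32):
        super().__init__()
        self.static = nn.Parameter(torch.randn(num_seeds, dim) * 0.01)
        self.residual = nn.Sequential(
            nn.Linear(dim + 1, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, t):                         # t: scalar in [0, 1]
        time = torch.full((self.static.shape[0], 1), float(t))
        res = self.residual(torch.cat([self.static, time], dim=-1))
        return self.static + res                  # (num_seeds, dim) at time t
```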
  {
    "path": "abs/2507.02419.md",
    "content": "### AvatarMakeup: Realistic Makeup Transfer for 3D Animatable Head Avatars\n\nSimilar to facial beautification in real life, 3D virtual avatars require personalized customization to enhance their visual appeal, yet this area remains insufficiently explored. Although current 3D Gaussian editing methods can be adapted for facial makeup purposes, these methods fail to meet the fundamental requirements for achieving realistic makeup effects: 1) ensuring a consistent appearance during drivable expressions, 2) preserving the identity throughout the makeup process, and 3) enabling precise control over fine details. To address these, we propose a specialized 3D makeup method named AvatarMakeup, leveraging a pretrained diffusion model to transfer makeup patterns from a single reference photo of any individual. We adopt a coarse-to-fine idea to first maintain the consistent appearance and identity, and then to refine the details. In particular, the diffusion model is employed to generate makeup images as supervision. Due to the uncertainties in diffusion process, the generated images are inconsistent across different viewpoints and expressions. Therefore, we propose a Coherent Duplication method to coarsely apply makeup to the target while ensuring consistency across dynamic and multiview effects. Coherent Duplication optimizes a global UV map by recoding the averaged facial attributes among the generated makeup images. By querying the global UV map, it easily synthesizes coherent makeup guidance from arbitrary views and expressions to optimize the target avatar. Given the coarse makeup avatar, we further enhance the makeup by incorporating a Refinement Module into the diffusion model to achieve high makeup quality. Experiments demonstrate that AvatarMakeup achieves state-of-the-art makeup transfer quality and consistency throughout animation.\n\n与现实生活中的面部美化类似，三维虚拟头像同样需要个性化定制来提升其视觉吸引力，但这一领域仍然研究不足。尽管现有的三维高斯编辑方法可以被改造用于面部化妆，但它们无法满足实现真实化妆效果的基本要求：1）在可驱动表情下保持外观一致性；2）在化妆过程中保留身份特征；3）能够对细节进行精确控制。为此，我们提出了一种专门的三维化妆方法——AvatarMakeup，该方法利用预训练的扩散模型，从任意个体的一张参考照片中迁移化妆样式。我们采用由粗到细的思路，先确保外观和身份一致性，再逐步细化细节。具体而言，扩散模型用于生成化妆图像作为监督。然而，由于扩散过程的不确定性，生成的化妆图像在不同视角和表情下存在不一致。为解决这一问题，我们提出了一种一致性复制（Coherent Duplication）方法，对目标进行粗化妆的同时，确保动态与多视角效果的一致性。一致性复制通过记录生成化妆图像的平均面部属性来优化全局 UV 映射。通过查询该全局 UV 映射，可以轻松从任意视角和表情生成一致的化妆引导，从而优化目标头像。在得到粗化妆头像后，我们进一步将精细化模块（Refinement Module）引入扩散模型，以实现高质量的化妆效果。实验结果表明，AvatarMakeup 在动画全程中实现了化妆迁移质量与一致性的最先进水平。\n"
  },
  {
    "path": "abs/2507.02565.md",
    "content": "### Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning\n\nDue to visual ambiguities and inter-person occlusions, existing human pose estimation methods cannot recover plausible close interactions from in-the-wild videos. Even state-of-the-art large foundation models (e.g., SAM) cannot accurately distinguish human semantics in such challenging scenarios. In this work, we find that human appearance can provide a straightforward cue to address these obstacles. Based on this observation, we propose a dual-branch optimization framework to reconstruct accurate interactive motions with plausible body contacts constrained by human appearances, social proxemics, and physical laws. Specifically, we first train a diffusion model to learn the human proxemic behavior and pose prior knowledge. The trained network and two optimizable tensors are then incorporated into a dual-branch optimization framework to reconstruct human motions and appearances. Several constraints based on 3D Gaussians, 2D keypoints, and mesh penetrations are also designed to assist the optimization. With the proxemics prior and diverse constraints, our method is capable of estimating accurate interactions from in-the-wild videos captured in complex environments. We further build a dataset with pseudo ground-truth interaction annotations, which may promote future research on pose estimation and human behavior understanding. Experimental results on several benchmarks demonstrate that our method outperforms existing approaches.\n\n由于视觉歧义和人物间的相互遮挡，现有的人体姿态估计方法无法从自然环境视频中恢复合理的近距离交互。即使是最先进的大型基础模型（如 SAM）在这种具有挑战性的场景中也无法准确区分人体语义。在本研究中，我们发现人体外观可以提供一种直接线索来应对这些障碍。基于这一观察，我们提出了一种双分支优化框架，在人体外观、社会距离学（proxemics）以及物理规律的约束下，重建具有合理身体接触的精确交互动作。具体而言，我们首先训练一个扩散模型来学习人体的社会距离行为与姿态先验知识。随后，将训练好的网络与两个可优化张量结合到双分支优化框架中，以同时重建人体的动作与外观。此外，我们还设计了基于三维高斯、二维关键点和网格穿透的多种约束来辅助优化。在社会距离先验与多样化约束的帮助下，我们的方法能够从复杂环境中拍摄的自然视频中估计出精确的交互动作。我们还构建了一个带有伪真值交互标注的数据集，以促进未来在人体姿态估计与人类行为理解方面的研究。多项基准测试的实验结果表明，我们的方法优于现有方法。\n"
  },
  {
    "path": "abs/2507.02600.md",
    "content": "### ArtGS:3D Gaussian Splatting for Interactive Visual-Physical Modeling and Manipulation of Articulated Objects\n\nArticulated object manipulation remains a critical challenge in robotics due to the complex kinematic constraints and the limited physical reasoning of existing methods. In this work, we introduce ArtGS, a novel framework that extends 3D Gaussian Splatting (3DGS) by integrating visual-physical modeling for articulated object understanding and interaction. ArtGS begins with multi-view RGB-D reconstruction, followed by reasoning with a vision-language model (VLM) to extract semantic and structural information, particularly the articulated bones. Through dynamic, differentiable 3DGS-based rendering, ArtGS optimizes the parameters of the articulated bones, ensuring physically consistent motion constraints and enhancing the manipulation policy. By leveraging dynamic Gaussian splatting, cross-embodiment adaptability, and closed-loop optimization, ArtGS establishes a new framework for efficient, scalable, and generalizable articulated object modeling and manipulation. Experiments conducted in both simulation and real-world environments demonstrate that ArtGS significantly outperforms previous methods in joint estimation accuracy and manipulation success rates across a variety of articulated objects.\n\n关节物体操作由于其复杂的运动学约束以及现有方法在物理推理能力上的不足，仍然是机器人领域的一项关键挑战。在本研究中，我们提出了 ArtGS，这是一种将视觉-物理建模与三维高斯投影（3D Gaussian Splatting, 3DGS）相结合的新型框架，用于关节物体的理解与交互。ArtGS 首先通过多视角 RGB-D 重建获取场景数据，随后利用视觉语言模型（VLM）进行推理，以提取语义与结构信息，尤其是关节骨骼。通过基于动态、可微的 3DGS 渲染，ArtGS 优化关节骨骼的参数，确保物理一致的运动约束，并提升操作策略。借助动态高斯投影、跨机体适应性以及闭环优化，ArtGS 建立了一个高效、可扩展且具有良好泛化能力的关节物体建模与操作新框架。在仿真与真实环境中的实验结果表明，ArtGS 在关节估计精度与操作成功率方面，相较现有方法在多种关节物体上均取得了显著提升。\n"
  },
  {
    "path": "abs/2507.02803.md",
    "content": "### HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars\n\nWe introduce HyperGaussians, a novel extension of 3D Gaussian Splatting for high-quality animatable face avatars. Creating such detailed face avatars from videos is a challenging problem and has numerous applications in augmented and virtual reality. While tremendous successes have been achieved for static faces, animatable avatars from monocular videos still fall in the uncanny valley. The de facto standard, 3D Gaussian Splatting (3DGS), represents a face through a collection of 3D Gaussian primitives. 3DGS excels at rendering static faces, but the state-of-the-art still struggles with nonlinear deformations, complex lighting effects, and fine details. While most related works focus on predicting better Gaussian parameters from expression codes, we rethink the 3D Gaussian representation itself and how to make it more expressive. Our insights lead to a novel extension of 3D Gaussians to high-dimensional multivariate Gaussians, dubbed 'HyperGaussians'. The higher dimensionality increases expressivity through conditioning on a learnable local embedding. However, splatting HyperGaussians is computationally expensive because it requires inverting a high-dimensional covariance matrix. We solve this by reparameterizing the covariance matrix, dubbed the 'inverse covariance trick'. This trick boosts the efficiency so that HyperGaussians can be seamlessly integrated into existing models. To demonstrate this, we plug in HyperGaussians into the state-of-the-art in fast monocular face avatars: FlashAvatar. Our evaluation on 19 subjects from 4 face datasets shows that HyperGaussians outperform 3DGS numerically and visually, particularly for high-frequency details like eyeglass frames, teeth, complex facial movements, and specular reflections.\n\n我们提出了 HyperGaussians，这是一种面向高质量可动画人脸头像的三维高斯投影（3D Gaussian Splatting, 3DGS）新扩展。从视频中创建如此精细的人脸头像是一项具有挑战性的问题，并在增强现实和虚拟现实中有着广泛应用。尽管在静态人脸方面已取得巨大成功，但从单目视频生成的可动画头像仍处于“恐怖谷”效应中。当前事实上的标准——3DGS——通过一组三维高斯基元来表示人脸。3DGS 在渲染静态人脸方面表现出色，但在处理非线性变形、复杂光照效果和细节表现时，即使是最先进的方法仍面临困难。与大多数相关工作专注于从表情编码预测更优高斯参数不同，我们重新思考了三维高斯表示本身及其如何变得更具表现力。我们的洞察促成了一种将三维高斯扩展为高维多元高斯的新方法，称为“HyperGaussians”。更高的维度通过依赖可学习的局部嵌入提高了表达能力。然而，高维多元高斯投影的计算开销很大，因为它需要求逆高维协方差矩阵。为此，我们通过重新参数化协方差矩阵提出了“逆协方差技巧”（inverse covariance trick），显著提升了计算效率，使 HyperGaussians 能够无缝集成到现有模型中。为验证其效果，我们将 HyperGaussians 集成到当前最快的单目人脸头像生成方法 FlashAvatar 中。在来自四个数据集的 19 位受试者上的评估结果表明，HyperGaussians 在数值和视觉效果上均优于 3DGS，尤其是在眼镜框、牙齿、复杂面部运动以及高光反射等高频细节方面表现突出。\n"
  },
  {
    "path": "abs/2507.03737.md",
    "content": "### Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps\n\n3D Gaussian Splatting (3DGS) has become a popular solution in SLAM due to its high-fidelity and real-time novel view synthesis performance. However, some previous 3DGS SLAM methods employ a differentiable rendering pipeline for tracking, lack geometric priors in outdoor scenes. Other approaches introduce separate tracking modules, but they accumulate errors with significant camera movement, leading to scale drift. To address these challenges, we propose a robust RGB-only outdoor 3DGS SLAM method: S3PO-GS. Technically, we establish a self-consistent tracking module anchored in the 3DGS pointmap, which avoids cumulative scale drift and achieves more precise and robust tracking with fewer iterations. Additionally, we design a patch-based pointmap dynamic mapping module, which introduces geometric priors while avoiding scale ambiguity. This significantly enhances tracking accuracy and the quality of scene reconstruction, making it particularly suitable for complex outdoor environments. Our experiments on the Waymo, KITTI, and DL3DV datasets demonstrate that S3PO-GS achieves state-of-the-art results in novel view synthesis and outperforms other 3DGS SLAM methods in tracking accuracy.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）因其高保真度和实时的新视角合成性能，已成为 SLAM 中广受欢迎的解决方案。然而，一些已有的 3DGS SLAM 方法在跟踪中采用可微渲染管线，但在室外场景中缺乏几何先验；另一些方法则引入独立的跟踪模块，但在相机大幅移动时会累积误差，导致尺度漂移。为应对这些挑战，我们提出了一种鲁棒的仅基于 RGB 的室外 3DGS SLAM 方法：S3PO-GS。在技术上，我们构建了一个基于 3DGS 点图（pointmap）锚定的自一致跟踪模块，避免了累积尺度漂移，并以更少迭代实现更精确、更鲁棒的跟踪。此外，我们设计了基于补丁的点图动态建图模块，引入几何先验的同时避免尺度歧义。这显著提升了跟踪精度和场景重建质量，使其特别适用于复杂的室外环境。在 Waymo、KITTI 和 DL3DV 数据集上的实验表明，S3PO-GS 在新视角合成方面实现了当前最先进的结果，并在跟踪精度上优于其他 3DGS SLAM 方法。\n"
  },
  {
    "path": "abs/2507.03886.md",
    "content": "### ArmGS: Composite Gaussian Appearance Refinement for Modeling Dynamic Urban Environments\n\nThis work focuses on modeling dynamic urban environments for autonomous driving simulation. Contemporary data-driven methods using neural radiance fields have achieved photorealistic driving scene modeling, but they suffer from low rendering efficacy. Recently, some approaches have explored 3D Gaussian splatting for modeling dynamic urban scenes, enabling high-fidelity reconstruction and real-time rendering. However, these approaches often neglect to model fine-grained variations between frames and camera viewpoints, leading to suboptimal results. In this work, we propose a new approach named ArmGS that exploits composite driving Gaussian splatting with multi-granularity appearance refinement for autonomous driving scene modeling. The core idea of our approach is devising a multi-level appearance modeling scheme to optimize a set of transformation parameters for composite Gaussian refinement from multiple granularities, ranging from local Gaussian level to global image level and dynamic actor level. This not only models global scene appearance variations between frames and camera viewpoints, but also models local fine-grained changes of background and objects. Extensive experiments on multiple challenging autonomous driving datasets, namely, Waymo, KITTI, NOTR and VKITTI2, demonstrate the superiority of our approach over the state-of-the-art methods.\n\n本研究聚焦于面向自动驾驶仿真的动态城市环境建模。当前基于数据驱动的神经辐射场方法已在驾驶场景的照片级真实感建模中取得了成果，但渲染效率较低。近期，一些方法开始探索利用三维高斯投影（3D Gaussian Splatting）来建模动态城市场景，实现了高保真重建与实时渲染。然而，这些方法往往忽视了对帧间及相机视角间细粒度变化的建模，从而导致结果次优。为此，我们提出了一种新方法——ArmGS，通过复合驾驶高斯投影结合多粒度外观细化来进行自动驾驶场景建模。其核心思想是设计一种多层级外观建模方案，从局部高斯层级到全局图像层级以及动态主体层级，优化一组复合高斯细化的变换参数。这不仅能够建模帧间与视角间的全局场景外观变化，还能够捕捉背景与物体的局部细粒度变化。在 Waymo、KITTI、NOTR 和 VKITTI2 等多个具有挑战性的自动驾驶数据集上的大量实验表明，我们的方法在性能上优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2507.04004.md",
    "content": "### Gaussian-LIC2: LiDAR-Inertial-Camera Gaussian Splatting SLAM\n\nThis paper presents the first photo-realistic LiDAR-Inertial-Camera Gaussian Splatting SLAM system that simultaneously addresses visual quality, geometric accuracy, and real-time performance. The proposed method performs robust and accurate pose estimation within a continuous-time trajectory optimization framework, while incrementally reconstructing a 3D Gaussian map using camera and LiDAR data, all in real time. The resulting map enables high-quality, real-time novel view rendering of both RGB images and depth maps. To effectively address under-reconstruction in regions not covered by the LiDAR, we employ a lightweight zero-shot depth model that synergistically combines RGB appearance cues with sparse LiDAR measurements to generate dense depth maps. The depth completion enables reliable Gaussian initialization in LiDAR-blind areas, significantly improving system applicability for sparse LiDAR sensors. To enhance geometric accuracy, we use sparse but precise LiDAR depths to supervise Gaussian map optimization and accelerate it with carefully designed CUDA-accelerated strategies. Furthermore, we explore how the incrementally reconstructed Gaussian map can improve the robustness of odometry. By tightly incorporating photometric constraints from the Gaussian map into the continuous-time factor graph optimization, we demonstrate improved pose estimation under LiDAR degradation scenarios. We also showcase downstream applications via extending our elaborate system, including video frame interpolation and fast 3D mesh extraction. To support rigorous evaluation, we construct a dedicated LiDAR-Inertial-Camera dataset featuring ground-truth poses, depth maps, and extrapolated trajectories for assessing out-of-sequence novel view synthesis.\n\n本文提出了首个同时兼顾视觉质量、几何精度与实时性能的照片级真实感 LiDAR-惯性-相机高斯投影（Gaussian Splatting）SLAM 系统。该方法在连续时间轨迹优化框架下实现稳健且精确的位姿估计，同时利用相机与激光雷达数据实时增量式重建三维高斯地图。生成的地图可支持高质量、实时的新视角渲染，包括 RGB 图像与深度图。为有效解决激光雷达未覆盖区域的欠重建问题，我们引入了轻量级零样本深度模型，将 RGB 外观线索与稀疏激光雷达测量信息高效融合，生成稠密深度图。该深度补全过程能够在激光雷达盲区实现可靠的高斯初始化，从而显著提升系统在稀疏激光雷达传感器下的适用性。为提升几何精度，我们利用稀疏但精确的激光雷达深度来监督高斯地图优化，并通过精心设计的 CUDA 加速策略加快优化过程。此外，我们探讨了增量式重建的高斯地图如何提升里程计的鲁棒性。通过将来自高斯地图的光度约束紧密融合到连续时间因子图优化中，我们在激光雷达退化场景下显著改善了位姿估计效果。我们还展示了该系统的下游应用扩展，包括视频帧插值与快速三维网格提取。为支持严格评估，我们构建了专用的 LiDAR-惯性-相机数据集，提供真值位姿、深度图以及用于评估乱序新视角合成的外推轨迹。\n"
  },
  {
    "path": "abs/2507.04147.md",
    "content": "### A3FR: Agile 3D Gaussian Splatting with Incremental Gaze Tracked Foveated Rendering in Virtual Reality\n\nVirtual reality (VR) significantly transforms immersive digital interfaces, greatly enhancing education, professional practices, and entertainment by increasing user engagement and opening up new possibilities in various industries. Among its numerous applications, image rendering is crucial. Nevertheless, rendering methodologies like 3D Gaussian Splatting impose high computational demands, driven predominantly by user expectations for superior visual quality. This results in notable processing delays for real-time image rendering, which greatly affects the user experience. Additionally, VR devices such as head-mounted displays (HMDs) are intricately linked to human visual behavior, leveraging knowledge from perception and cognition to improve user experience. These insights have spurred the development of foveated rendering, a technique that dynamically adjusts rendering resolution based on the user's gaze direction. The resultant solution, known as gaze-tracked foveated rendering, significantly reduces the computational burden of the rendering process. Although gaze-tracked foveated rendering can reduce rendering costs, the computational overhead of the gaze tracking process itself can sometimes outweigh the rendering savings, leading to increased processing latency. To address this issue, we propose an efficient rendering framework called A3FR, designed to minimize the latency of gaze-tracked foveated rendering via the parallelization of gaze tracking and foveated rendering processes. For the rendering algorithm, we utilize 3D Gaussian Splatting, a state-of-the-art neural rendering technique. Evaluation results demonstrate that A3FR can reduce end-to-end rendering latency by up to 2× while maintaining visual quality.\n\n虚拟现实（VR）正在深刻改变沉浸式数字交互界面，通过提升用户参与度并为各行业开辟新可能，极大地推动了教育、专业实践和娱乐的发展。在众多应用中，图像渲染尤为关键。然而，诸如三维高斯投影（3D Gaussian Splatting）等渲染方法在满足用户对高视觉质量期望的驱动下，计算需求极高，导致实时图像渲染出现显著延迟，从而严重影响用户体验。此外，诸如头戴式显示器（HMD）等 VR 设备与人类视觉行为密切相关，借助感知与认知科学的知识来改善用户体验。这些洞察推动了注视点渲染（foveated rendering）的发展，该技术根据用户的注视方向动态调整渲染分辨率。其衍生方案——基于注视跟踪的注视点渲染（gaze-tracked foveated rendering）——显著降低了渲染过程的计算负担。\n尽管基于注视跟踪的注视点渲染能够降低渲染成本，但注视跟踪过程本身的计算开销有时可能超过渲染节省的资源，从而导致处理延迟增加。为解决这一问题，我们提出了一种高效渲染框架——A3FR，通过注视跟踪与注视点渲染过程的并行化来最大限度减少基于注视跟踪的注视点渲染延迟。在渲染算法上，我们采用了最先进的神经渲染技术——三维高斯投影。评估结果表明，A3FR 在保持视觉质量的同时，能够将端到端渲染延迟降低最高达 2 倍。\n"
  },
  {
    "path": "abs/2507.04961.md",
    "content": "### InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior\n\n3D Gaussian Splatting based 3D editing has demonstrated impressive performance in recent years. However, the multi-view editing often exhibits significant local inconsistency, especially in areas of non-rigid deformation, which lead to local artifacts, texture blurring, or semantic variations in edited 3D scenes. We also found that the existing editing methods, which rely entirely on text prompts make the editing process a \"one-shot deal\", making it difficult for users to control the editing degree flexibly. In response to these challenges, we present InterGSEdit, a novel framework for high-quality 3DGS editing via interactively selecting key views with users' preferences. We propose a CLIP-based Semantic Consistency Selection (CSCS) strategy to adaptively screen a group of semantically consistent reference views for each user-selected key view. Then, the cross-attention maps derived from the reference views are used in a weighted Gaussian Splatting unprojection to construct the 3D Geometry-Consistent Attention Prior (GAP3D). We project GAP3D to obtain 3D-constrained attention, which are fused with 2D cross-attention via Attention Fusion Network (AFN). AFN employs an adaptive attention strategy that prioritizes 3D-constrained attention for geometric consistency during early inference, and gradually prioritizes 2D cross-attention maps in diffusion for fine-grained features during the later inference. Extensive experiments demonstrate that InterGSEdit achieves state-of-the-art performance, delivering consistent, high-fidelity 3DGS editing with improved user experience.\n\n基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的三维编辑在近年来表现出了令人印象深刻的性能。然而，多视角编辑往往存在显著的局部不一致性，尤其是在非刚性变形区域，这会导致局部伪影、纹理模糊或编辑后三维场景的语义变化。我们还发现，现有完全依赖文本提示的编辑方法，使得编辑过程成为一次性操作（“one-shot deal”），用户难以灵活控制编辑程度。针对这些挑战，我们提出了 **InterGSEdit**——一种可通过交互式选择用户偏好的关键视图实现高质量 3DGS 编辑的新框架。我们提出了一种基于 CLIP 的语义一致性选择（CLIP-based Semantic Consistency Selection, CSCS）策略，用于针对每个用户选定的关键视图，自适应地筛选一组语义一致的参考视图。随后，从这些参考视图中提取的交叉注意力图将被用于加权高斯溅射反投影，以构建三维几何一致性注意力先验（3D Geometry-Consistent Attention Prior, GAP3D）。我们将 GAP3D 投影以获得三维约束注意力，并通过注意力融合网络（Attention Fusion Network, AFN）与二维交叉注意力进行融合。AFN 采用一种自适应注意力策略，在推理早期优先利用三维约束注意力以保持几何一致性，而在扩散过程的后期则逐渐优先二维交叉注意力图，以捕获更精细的特征。大量实验表明，InterGSEdit 在一致性和高保真度的 3DGS 编辑方面达到了当前最优性能，并显著提升了用户体验。\n"
  },
  {
    "path": "abs/2507.05426.md",
    "content": "### Mastering Regional 3DGS: Locating, Initializing, and Editing with Diverse 2D Priors\n\nMany 3D scene editing tasks focus on modifying local regions rather than the entire scene, except for some global applications like style transfer, and in the context of 3D Gaussian Splatting (3DGS), where scenes are represented by a series of Gaussians, this structure allows for precise regional edits, offering enhanced control over specific areas of the scene; however, the challenge lies in the fact that 3D semantic parsing often underperforms compared to its 2D counterpart, making targeted manipulations within 3D spaces more difficult and limiting the fidelity of edits, which we address by leveraging 2D diffusion editing to accurately identify modification regions in each view, followed by inverse rendering for 3D localization, then refining the frontal view and initializing a coarse 3DGS with consistent views and approximate shapes derived from depth maps predicted by a 2D foundation model, thereby supporting an iterative, view-consistent editing process that gradually enhances structural details and textures to ensure coherence across perspectives. Experiments demonstrate that our method achieves state-of-the-art performance while delivering up to a 4× speedup, providing a more efficient and effective approach to 3D scene local editing.\n\n许多三维场景编辑任务更侧重于局部区域的修改，而非整个场景，除了一些如风格迁移等全局应用。在三维高斯溅射（3D Gaussian Splatting, 3DGS）中，场景由一系列高斯表示，这种结构使精确的局部编辑成为可能，从而能够更好地控制场景中的特定区域。然而，挑战在于三维语义解析的表现往往不如二维，这使得在三维空间中进行针对性的操作更加困难，并限制了编辑的保真度。为解决这一问题，我们利用二维扩散编辑在每个视图中精确识别修改区域，随后通过反向渲染实现三维定位，然后对正视图进行细化，并利用二维基础模型预测的深度图生成一致视图和近似形状，以初始化粗略的 3DGS，从而支持一种迭代的、视图一致的编辑过程，在该过程中逐步增强结构细节与纹理，以确保多视角间的一致性。实验表明，我们的方法在性能上达到了当前最优，并实现了最高 4 倍的加速，为三维场景局部编辑提供了一种更高效且更有效的方案。\n"
  },
  {
    "path": "abs/2507.05661.md",
    "content": "### 3DGS_LSR:Large_Scale Relocation for Autonomous Driving Based on 3D Gaussian Splatting\n\nIn autonomous robotic systems, precise localization is a prerequisite for safe navigation. However, in complex urban environments, GNSS positioning often suffers from signal occlusion and multipath effects, leading to unreliable absolute positioning. Traditional mapping approaches are constrained by storage requirements and computational inefficiency, limiting their applicability to resource-constrained robotic platforms. To address these challenges, we propose 3DGS-LSR: a large-scale relocalization framework leveraging 3D Gaussian Splatting (3DGS), enabling centimeter-level positioning using only a single monocular RGB image on the client side. We combine multi-sensor data to construct high-accuracy 3DGS maps in large outdoor scenes, while the robot-side localization requires just a standard camera input. Using SuperPoint and SuperGlue for feature extraction and matching, our core innovation is an iterative optimization strategy that refines localization results through step-by-step rendering, making it suitable for real-time autonomous navigation. Experimental validation on the KITTI dataset demonstrates our 3DGS-LSR achieves average positioning accuracies of 0.026m, 0.029m, and 0.081m in town roads, boulevard roads, and traffic-dense highways respectively, significantly outperforming other representative methods while requiring only monocular RGB input. This approach provides autonomous robots with reliable localization capabilities even in challenging urban environments where GNSS fails.\n\n在自主机器人系统中，精确定位是安全导航的前提。然而，在复杂的城市环境中，GNSS 定位常因信号遮挡和多径效应而受到影响，导致绝对定位结果不可靠。传统建图方法受限于存储需求和计算效率，难以适用于资源受限的机器人平台。为解决这些挑战，我们提出了 **3DGS-LSR**：一种基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的大规模重定位框架，仅依靠客户端单目 RGB 图像即可实现厘米级定位。我们融合多传感器数据，在大规模户外场景中构建高精度 3DGS 地图，而机器人端的定位仅需使用标准摄像头输入。通过 SuperPoint 和 SuperGlue 进行特征提取与匹配，我们的核心创新在于一种迭代优化策略，通过逐步渲染不断优化定位结果，使其适用于实时自主导航。在 KITTI 数据集上的实验验证表明，3DGS-LSR 在城镇道路、大道以及车流密集的高速公路上的平均定位精度分别达到 0.026m、0.029m 和 0.081m，显著优于其他代表性方法，同时仅需单目 RGB 输入。该方法为自主机器人在 GNSS 失效的复杂城市环境中提供了可靠的定位能力。\n"
  },
  {
    "path": "abs/2507.05859.md",
    "content": "### D-FCGS: Feedforward Compression of Dynamic Gaussian Splatting for Free-Viewpoint Videos\n\nFree-viewpoint video (FVV) enables immersive 3D experiences, but efficient compression of dynamic 3D representations remains a major challenge. Recent advances in 3D Gaussian Splatting (3DGS) and its dynamic extensions have enabled high-fidelity scene modeling. However, existing methods often couple scene reconstruction with optimization-dependent coding, which limits generalizability. This paper presents Feedforward Compression of Dynamic Gaussian Splatting (D-FCGS), a novel feedforward framework for compressing temporally correlated Gaussian point cloud sequences. Our approach introduces a Group-of-Frames (GoF) structure with I-P frame coding, where inter-frame motions are extracted via sparse control points. The resulting motion tensors are compressed in a feedforward manner using a dual prior-aware entropy model that combines hyperprior and spatial-temporal priors for accurate rate estimation. For reconstruction, we perform control-point-guided motion compensation and employ a refinement network to enhance view-consistent fidelity. Trained on multi-view video-derived Gaussian frames, D-FCGS generalizes across scenes without per-scene optimization. Experiments show that it matches the rate-distortion performance of optimization-based methods, achieving over 40 times compression in under 2 seconds while preserving visual quality across viewpoints. This work advances feedforward compression for dynamic 3DGS, paving the way for scalable FVV transmission and storage in immersive applications.\n\n自由视角视频（Free-viewpoint Video, FVV）能够带来沉浸式的三维体验，但高效压缩动态三维表示仍然是一大挑战。近年来，三维高斯溅射（3D Gaussian Splatting, 3DGS）及其动态扩展在高保真场景建模方面取得了显著进展。然而，现有方法通常将场景重建与依赖优化的编码过程耦合，限制了通用性。本文提出了 **D-FCGS**（Feedforward Compression of Dynamic Gaussian Splatting），一种用于压缩时间相关的高斯点云序列的新型前馈框架。我们引入了基于帧组（Group-of-Frames, GoF）的 I-P 帧编码结构，其中帧间运动通过稀疏控制点提取。所得的运动张量通过结合超先验（hyperprior）和时空先验（spatial-temporal priors）的双先验感知熵模型，以前馈方式进行压缩，从而实现精确的码率估计。在重建阶段，我们采用基于控制点的运动补偿，并引入细化网络以提升视图一致的保真度。D-FCGS 在多视图视频生成的高斯帧数据上进行训练，无需针对每个场景单独优化即可实现跨场景泛化。实验结果表明，该方法在码率失真性能上可与基于优化的方法相媲美，在不到 2 秒的时间内实现超过 40 倍的压缩率，同时保持跨视角的视觉质量。该研究推动了动态 3DGS 的前馈压缩发展，为沉浸式应用中 FVV 的可扩展传输与存储奠定了基础。\n"
  },
  {
    "path": "abs/2507.06060.md",
    "content": "### VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis\n\nRealistic, high-fidelity 3D facial animations are crucial for expressive avatar systems in human-computer interaction and accessibility. Although prior methods show promising quality, their reliance on the mesh domain limits their ability to fully leverage the rapid visual innovations seen in 2D computer vision and graphics. We propose VisualSpeaker, a novel method that bridges this gap using photorealistic differentiable rendering, supervised by visual speech recognition, for improved 3D facial animation. Our contribution is a perceptual lip-reading loss, derived by passing photorealistic 3D Gaussian Splatting avatar renders through a pre-trained Visual Automatic Speech Recognition model during training. Evaluation on the MEAD dataset demonstrates that VisualSpeaker improves both the standard Lip Vertex Error metric by 56.1% and the perceptual quality of the generated animations, while retaining the controllability of mesh-driven animation. This perceptual focus naturally supports accurate mouthings, essential cues that disambiguate similar manual signs in sign language avatars.\n\n真实且高保真的三维面部动画对于在人机交互和无障碍应用中的富有表现力的虚拟形象系统至关重要。尽管已有方法在质量上取得了可喜的成果，但其对网格域的依赖限制了其充分利用二维计算机视觉与图形学中快速发展的视觉创新能力。我们提出了 **VisualSpeaker**，一种利用光真实可微渲染并结合视觉语音识别监督的新方法，以提升三维面部动画效果。我们的方法核心贡献是一种感知型唇读损失（perceptual lip-reading loss），该损失在训练过程中通过将光真实的三维高斯溅射虚拟形象渲染结果输入至预训练的视觉自动语音识别模型（Visual ASR）获得。在 MEAD 数据集上的评估表明，VisualSpeaker 在标准唇部顶点误差（Lip Vertex Error）指标上提升了 56.1%，并显著改善了生成动画的感知质量，同时保留了网格驱动动画的可控性。这种感知导向的设计天然支持精确的口型表达，这对于在手语虚拟形象中区分相似手势至关重要。\n"
  },
  {
    "path": "abs/2507.06103.md",
    "content": "### Reflections Unlock: Geometry-Aware Reflection Disentanglement in 3D Gaussian Splatting for Photorealistic Scenes Rendering\n\nAccurately rendering scenes with reflective surfaces remains a significant challenge in novel view synthesis, as existing methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) often misinterpret reflections as physical geometry, resulting in degraded reconstructions. Previous methods rely on incomplete and non-generalizable geometric constraints, leading to misalignment between the positions of Gaussian splats and the actual scene geometry. When dealing with real-world scenes containing complex geometry, the accumulation of Gaussians further exacerbates surface artifacts and results in blurred reconstructions. To address these limitations, in this work, we propose Ref-Unlock, a novel geometry-aware reflection modeling framework based on 3D Gaussian Splatting, which explicitly disentangles transmitted and reflected components to better capture complex reflections and enhance geometric consistency in real-world scenes. Our approach employs a dual-branch representation with high-order spherical harmonics to capture high-frequency reflective details, alongside a reflection removal module providing pseudo reflection-free supervision to guide clean decomposition. Additionally, we incorporate pseudo-depth maps and a geometry-aware bilateral smoothness constraint to enhance 3D geometric consistency and stability in decomposition. Extensive experiments demonstrate that Ref-Unlock significantly outperforms classical GS-based reflection methods and achieves competitive results with NeRF-based models, while enabling flexible vision foundation models (VFMs) driven reflection editing. Our method thus offers an efficient and generalizable solution for realistic rendering of reflective scenes.\n\n在新视角合成中，准确渲染具有反射表面的场景依然是一大挑战，因为现有方法（如神经辐射场 Neural Radiance Fields, NeRF 和三维高斯溅射 3D Gaussian Splatting, 3DGS）常常将反射误判为物理几何，从而导致重建质量下降。以往方法依赖不完整且缺乏泛化能力的几何约束，造成高斯溅射点与真实场景几何位置的不对齐。在处理包含复杂几何结构的真实场景时，高斯的累积会进一步加剧表面伪影并导致重建结果模糊。为解决这些问题，我们提出了 **Ref-Unlock**——一种基于 3DGS 的几何感知反射建模新框架，该框架显式地分离透射与反射成分，以更好地捕捉复杂反射并提升真实场景中的几何一致性。我们的方法采用双分支表示，并结合高阶球谐函数以捕捉高频反射细节，同时引入反射去除模块，提供伪无反射监督以指导干净的分解。此外，我们结合伪深度图与几何感知的双边平滑约束，以增强三维几何一致性和分解稳定性。大量实验表明，Ref-Unlock 显著优于经典的基于 GS 的反射方法，并在与 NeRF 系列模型的对比中取得了具有竞争力的结果，同时支持由视觉基础模型（VFM）驱动的灵活反射编辑。因此，我们的方法为真实反射场景的高效、可泛化渲染提供了一种有效的解决方案。\n"
  },
  {
    "path": "abs/2507.06109.md",
    "content": "### LighthouseGS: Indoor Structure-aware 3D Gaussian Splatting for Panorama-Style Mobile Captures\n\nRecent advances in 3D Gaussian Splatting (3DGS) have enabled real-time novel view synthesis (NVS) with impressive quality in indoor scenes. However, achieving high-fidelity rendering requires meticulously captured images covering the entire scene, limiting accessibility for general users. We aim to develop a practical 3DGS-based NVS framework using simple panorama-style motion with a handheld camera (e.g., mobile device). While convenient, this rotation-dominant motion and narrow baseline make accurate camera pose and 3D point estimation challenging, especially in textureless indoor scenes. To address these challenges, we propose LighthouseGS, a novel framework inspired by the lighthouse-like sweeping motion of panoramic views. LighthouseGS leverages rough geometric priors, such as mobile device camera poses and monocular depth estimation, and utilizes the planar structures often found in indoor environments. We present a new initialization method called plane scaffold assembly to generate consistent 3D points on these structures, followed by a stable pruning strategy to enhance geometry and optimization stability. Additionally, we introduce geometric and photometric corrections to resolve inconsistencies from motion drift and auto-exposure in mobile devices. Tested on collected real and synthetic indoor scenes, LighthouseGS delivers photorealistic rendering, surpassing state-of-the-art methods and demonstrating the potential for panoramic view synthesis and object placement.\n\n近年来，三维高斯投影（3D Gaussian Splatting, 3DGS）的研究进展，使得在室内场景中实现高质量的实时新视角合成（NVS）成为可能。然而，要获得高保真渲染，通常需要精心采集覆盖整个场景的图像，这限制了普通用户的使用。我们的目标是开发一种实用的基于 3DGS 的 NVS 框架，仅需手持相机（如移动设备）进行简单的全景式运动即可完成采集。尽管这种方式方便，但由于以旋转为主的运动和较窄的基线，会给相机位姿和三维点的精确估计带来挑战，尤其是在无纹理的室内场景中。为了解决这些问题，我们提出了 LighthouseGS，这一新框架的灵感来自全景视图中类似灯塔扫射的运动方式。LighthouseGS 利用粗略的几何先验信息，例如移动设备的相机位姿和单目深度估计，并结合室内环境中常见的平面结构。我们提出了一种称为“平面支架组装”的初始化方法，在这些平面结构上生成一致的三维点，随后采用稳定的剪枝策略以增强几何精度和优化稳定性。此外，我们引入了几何与光度校正，以消除由移动设备的运动漂移和自动曝光引起的不一致性。在采集的真实与合成室内场景测试中，LighthouseGS 实现了照片级真实感渲染，性能优于当前最先进的方法，并展示了在全景视图合成与物体放置方面的潜力。\n"
  },
  {
    "path": "abs/2507.06671.md",
    "content": "### FlexGaussian: Flexible and Cost-Effective Training-Free Compression for 3D Gaussian Splatting\n\n3D Gaussian splatting has become a prominent technique for representing and rendering complex 3D scenes, due to its high fidelity and speed advantages. However, the growing demand for large-scale models calls for effective compression to reduce memory and computation costs, especially on mobile and edge devices with limited resources. Existing compression methods effectively reduce 3D Gaussian parameters but often require extensive retraining or fine-tuning, lacking flexibility under varying compression constraints.\nIn this paper, we introduce FlexGaussian, a flexible and cost-effective method that combines mixed-precision quantization with attribute-discriminative pruning for training-free 3D Gaussian compression. FlexGaussian eliminates the need for retraining and adapts easily to diverse compression targets. Evaluation results show that FlexGaussian achieves up to 96.4% compression while maintaining high rendering quality (&lt; 1 dB drop in PSNR), and is deployable on mobile devices. FlexGaussian delivers high compression ratios within seconds, being 1.7-2.1x faster than state-of-the-art training-free methods and 10-100x faster than training-involved approaches.\n\n三维高斯投影（3D Gaussian Splatting）凭借其高保真和高速度优势，已成为复杂三维场景表示与渲染的重要技术。然而，随着对大规模模型需求的增长，如何有效压缩以降低内存和计算成本，尤其是在资源有限的移动端和边缘设备上，成为亟需解决的问题。现有压缩方法虽能有效减少三维高斯参数，但往往需要大量的重新训练或微调，在不同压缩约束下缺乏灵活性。\n在本文中，我们提出了 FlexGaussian，这是一种结合混合精度量化与属性判别剪枝的、无需训练的三维高斯压缩方法，具有灵活且高性价比的特点。FlexGaussian 不仅无需重新训练，还能轻松适应多样化的压缩目标。实验结果表明，FlexGaussian 在保持高渲染质量（PSNR 下降不足 1 dB）的情况下，压缩率可达 96.4%，并可部署在移动设备上。FlexGaussian 能在数秒内实现高压缩比，其速度比当前最先进的无需训练方法快 1.7-2.1 倍，比涉及训练的方法快 10-100 倍。\n"
  },
  {
    "path": "abs/2507.07136.md",
    "content": "### LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS\n\nIn this paper, we introduce LangSplatV2, which achieves high-dimensional feature splatting at 476.2 FPS and 3D open-vocabulary text querying at 384.6 FPS for high-resolution images, providing a 42 × speedup and a 47 × boost over LangSplat respectively, along with improved query accuracy. LangSplat employs Gaussian Splatting to embed 2D CLIP language features into 3D, significantly enhancing speed and learning a precise 3D language field with SAM semantics. Such advancements in 3D language fields are crucial for applications that require language interaction within complex scenes. However, LangSplat does not yet achieve real-time inference performance (8.2 FPS), even with advanced A100 GPUs, severely limiting its broader application. In this paper, we first conduct a detailed time analysis of LangSplat, identifying the heavyweight decoder as the primary speed bottleneck. Our solution, LangSplatV2 assumes that each Gaussian acts as a sparse code within a global dictionary, leading to the learning of a 3D sparse coefficient field that entirely eliminates the need for a heavyweight decoder. By leveraging this sparsity, we further propose an efficient sparse coefficient splatting method with CUDA optimization, rendering high-dimensional feature maps at high quality while incurring only the time cost of splatting an ultra-low-dimensional feature. Our experimental results demonstrate that LangSplatV2 not only achieves better or competitive query accuracy but is also significantly faster.\n\n本文提出了 LangSplatV2，在高分辨率图像上实现了 476.2 FPS 的高维特征投影和 384.6 FPS 的三维开放词汇文本查询，分别相比 LangSplat 提升了 42 倍和 47 倍，同时查询精度也得到了提升。LangSplat 采用高斯投影（Gaussian Splatting）将二维 CLIP 语言特征嵌入三维空间，从而显著加快速度，并结合 SAM 语义学习精确的三维语言场。这类三维语言场的进步对于需要在复杂场景中进行语言交互的应用至关重要。然而，即便在先进的 A100 GPU 上，LangSplat 仍未实现实时推理性能（8.2 FPS），严重限制了其更广泛的应用。在本文中，我们首先对 LangSplat 进行了详细的耗时分析，确认重量级解码器是主要的速度瓶颈。我们的解决方案 LangSplatV2 假设每个高斯在全局字典中充当稀疏编码，从而学习出一个三维稀疏系数字段，完全消除了对重量级解码器的需求。基于这一稀疏性，我们进一步提出了一种结合 CUDA 优化的高效稀疏系数投影方法，在保持高质量渲染高维特征图的同时，仅需付出超低维特征投影的时间成本。实验结果表明，LangSplatV2 不仅在查询精度上优于或可与现有方法媲美，而且速度显著更快。\n"
  },
  {
    "path": "abs/2507.07395.md",
    "content": "### Seg-Wild: Interactive Segmentation based on 3D Gaussian Splatting for Unconstrained Image Collections\n\nReconstructing and segmenting scenes from unconstrained photo collections obtained from the Internet is a novel but challenging task. Unconstrained photo collections are easier to get than well-captured photo collections. These unconstrained images suffer from inconsistent lighting and transient occlusions, which makes segmentation challenging. Previous segmentation methods cannot address transient occlusions or accurately restore the scene's lighting conditions. Therefore, we propose Seg-Wild, an interactive segmentation method based on 3D Gaussian Splatting for unconstrained image collections, suitable for in-the-wild scenes. We integrate multi-dimensional feature embeddings for each 3D Gaussian and calculate the feature similarity between the feature embeddings and the segmentation target to achieve interactive segmentation in the 3D scene. Additionally, we introduce the Spiky 3D Gaussian Cutter (SGC) to smooth abnormal 3D Gaussians. We project the 3D Gaussians onto a 2D plane and calculate the ratio of 3D Gaussians that need to be cut using the SAM mask. We also designed a benchmark to evaluate segmentation quality in in-the-wild scenes. Experimental results demonstrate that compared to previous methods, Seg-Wild achieves better segmentation results and reconstruction quality.\n\n从互联网上获取的非受限照片集合中重建和分割场景是一项新颖但具有挑战性的任务。与精心采集的照片集合相比，非受限照片集合更易获取，但这些图像存在光照不一致和瞬态遮挡等问题，使分割任务更为困难。以往的分割方法无法有效应对瞬态遮挡，也不能准确恢复场景的光照条件。为此，我们提出了 Seg-Wild，这是一种基于三维高斯投影（3D Gaussian Splatting）的交互式分割方法，适用于野外环境（in-the-wild）的图像集合。我们为每个三维高斯引入多维特征嵌入，并计算这些特征嵌入与分割目标之间的特征相似度，从而实现三维场景中的交互式分割。此外，我们提出了尖刺三维高斯切割器（Spiky 3D Gaussian Cutter, SGC），用于平滑异常的三维高斯。我们将三维高斯投影到二维平面，并利用 SAM 掩码计算需要切除的三维高斯比例。我们还设计了一个基准，用于评估野外场景的分割质量。实验结果表明，与现有方法相比，Seg-Wild 在分割效果和重建质量上均取得了更优的表现。\n"
  },
  {
    "path": "abs/2507.07465.md",
    "content": "### SD-GS: Structured Deformable 3D Gaussians for Efficient Dynamic Scene Reconstruction\n\nCurrent 4D Gaussian frameworks for dynamic scene reconstruction deliver impressive visual fidelity and rendering speed, however, the inherent trade-off between storage costs and the ability to characterize complex physical motions significantly limits the practical application of these methods. To tackle these problems, we propose SD-GS, a compact and efficient dynamic Gaussian splatting framework for complex dynamic scene reconstruction, featuring two key contributions. First, we introduce a deformable anchor grid, a hierarchical and memory-efficient scene representation where each anchor point derives multiple 3D Gaussians in its local spatiotemporal region and serves as the geometric backbone of the 3D scene. Second, to enhance modeling capability for complex motions, we present a deformation-aware densification strategy that adaptively grows anchors in under-reconstructed high-dynamic regions while reducing redundancy in static areas, achieving superior visual quality with fewer anchors. Experimental results demonstrate that, compared to state-of-the-art methods, SD-GS achieves an average of 60\\% reduction in model size and an average of 100% improvement in FPS, significantly enhancing computational efficiency while maintaining or even surpassing visual quality.\n\n当前的四维高斯框架在动态场景重建中能够提供令人印象深刻的视觉保真度和渲染速度，但存储成本与表征复杂物理运动能力之间的固有权衡显著限制了其实际应用。为解决这一问题，我们提出了 SD-GS，这是一种紧凑高效的动态高斯投影框架，专用于复杂动态场景重建，并具有两大核心贡献。首先，我们引入了可变形锚点网格（deformable anchor grid），这是一种分层且内存高效的场景表示方式，其中每个锚点在其局部时空区域内派生多个三维高斯，并作为三维场景的几何骨架。其次，为增强复杂运动的建模能力，我们提出了一种感知形变的加密策略（deformation-aware densification），在重建不足的高动态区域自适应增加锚点，而在静态区域减少冗余，从而以更少的锚点实现更优的视觉质量。实验结果表明，与最先进的方法相比，SD-GS 在模型体积上平均减少 60%，FPS 平均提升 100%，在显著提高计算效率的同时保持甚至超越了视觉质量。\n"
  },
  {
    "path": "abs/2507.07733.md",
    "content": "### RTR-GS: 3D Gaussian Splatting for Inverse Rendering with Radiance Transfer and Reflection\n\n3D Gaussian Splatting (3DGS) has demonstrated impressive capabilities in novel view synthesis. However, rendering reflective objects remains a significant challenge, particularly in inverse rendering and relighting. We introduce RTR-GS, a novel inverse rendering framework capable of robustly rendering objects with arbitrary reflectance properties, decomposing BRDF and lighting, and delivering credible relighting results. Given a collection of multi-view images, our method effectively recovers geometric structure through a hybrid rendering model that combines forward rendering for radiance transfer with deferred rendering for reflections. This approach successfully separates high-frequency and low-frequency appearances, mitigating floating artifacts caused by spherical harmonic overfitting when handling high-frequency details. We further refine BRDF and lighting decomposition using an additional physically-based deferred rendering branch. Experimental results show that our method enhances novel view synthesis, normal estimation, decomposition, and relighting while maintaining efficient training inference process.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）在新视角合成方面展现了令人印象深刻的能力。然而，渲染反射性物体依然是一个重大挑战，尤其是在逆向渲染与重光照任务中。我们提出了 RTR-GS，这是一种新型逆向渲染框架，能够稳健地渲染具有任意反射特性的物体，实现 BRDF 与光照的分解，并生成可信的重光照结果。给定多视图图像集合，我们的方法通过结合用于辐射传输的前向渲染与用于反射的延迟渲染的混合渲染模型，有效恢复几何结构。该方法能够成功分离高频与低频外观特征，缓解在处理高频细节时由球谐函数过拟合导致的漂浮伪影。我们进一步引入了基于物理的延迟渲染分支，以精细化 BRDF 与光照的分解。实验结果表明，该方法在提升新视角合成、法线估计、分解与重光照效果的同时，仍保持了高效的训练与推理过程。\n"
  },
  {
    "path": "abs/2507.08136.md",
    "content": "### RegGS: Unposed Sparse Views Gaussian Splatting with 3DGS Registration\n\n3D Gaussian Splatting (3DGS) has demonstrated its potential in reconstructing scenes from unposed images. However, optimization-based 3DGS methods struggle with sparse views due to limited prior knowledge. Meanwhile, feed-forward Gaussian approaches are constrained by input formats, making it challenging to incorporate more input views. To address these challenges, we propose RegGS, a 3D Gaussian registration-based framework for reconstructing unposed sparse views. RegGS aligns local 3D Gaussians generated by a feed-forward network into a globally consistent 3D Gaussian representation. Technically, we implement an entropy-regularized Sinkhorn algorithm to efficiently solve the optimal transport Mixture 2-Wasserstein (MW2) distance, which serves as an alignment metric for Gaussian mixture models (GMMs) in Sim(3) space. Furthermore, we design a joint 3DGS registration module that integrates the MW2 distance, photometric consistency, and depth geometry. This enables a coarse-to-fine registration process while accurately estimating camera poses and aligning the scene. Experiments on the RE10K and ACID datasets demonstrate that RegGS effectively registers local Gaussians with high fidelity, achieving precise pose estimation and high-quality novel-view synthesis.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）在无位姿图像的场景重建中展现了潜力。然而，基于优化的 3DGS 方法在稀疏视图下由于缺乏足够的先验知识而表现不佳；与此同时，前馈式高斯方法受输入格式限制，难以灵活引入更多输入视图。为应对这些挑战，我们提出了 RegGS，这是一种基于三维高斯配准的无位姿稀疏视图重建框架。RegGS 将前馈网络生成的局部三维高斯对齐为全局一致的三维高斯表示。在技术上，我们实现了一种熵正则化的 Sinkhorn 算法，以高效求解最优传输的混合二阶 Wasserstein（MW2）距离，该距离被用作 Sim(3) 空间中高斯混合模型（GMM）的对齐度量。此外，我们设计了一个联合 3DGS 配准模块，将 MW2 距离、光度一致性与深度几何相结合，从而实现由粗到细的配准过程，同时精确估计相机位姿并对齐场景。在 RE10K 和 ACID 数据集上的实验表明，RegGS 能够高保真地配准局部高斯，实现精确的位姿估计与高质量的新视角合成。\n"
  },
  {
    "path": "abs/2507.08137.md",
    "content": "### Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction\n\nWe introduce a novel framework for reconstructing dynamic human-object interactions from monocular video that overcomes challenges associated with occlusions and temporal inconsistencies. Traditional 3D reconstruction methods typically assume static objects or full visibility of dynamic subjects, leading to degraded performance when these assumptions are violated-particularly in scenarios where mutual occlusions occur. To address this, our framework leverages amodal completion to infer the complete structure of partially obscured regions. Unlike conventional approaches that operate on individual frames, our method integrates temporal context, enforcing coherence across video sequences to incrementally refine and stabilize reconstructions. This template-free strategy adapts to varying conditions without relying on predefined models, significantly enhancing the recovery of intricate details in dynamic scenes. We validate our approach using 3D Gaussian Splatting on challenging monocular videos, demonstrating superior precision in handling occlusions and maintaining temporal stability compared to existing techniques.\n\n我们提出了一种从单目视频中重建动态人-物交互的新型框架，能够克服遮挡和时间不一致带来的挑战。传统三维重建方法通常假设物体是静止的或动态主体完全可见，当这些假设被打破时（尤其是在发生相互遮挡的场景中），性能会显著下降。为了解决这一问题，我们的框架利用非模态补全（amodal completion）来推断被部分遮挡区域的完整结构。不同于在单帧上独立处理的传统方法，我们的方法融合了时间上下文，在视频序列中强制保持时序一致性，从而逐步优化并稳定重建结果。这种无模板策略无需依赖预定义模型，能够适应多变条件，大幅提升动态场景中细节的恢复效果。我们在具有挑战性的单目视频上结合三维高斯投影（3D Gaussian Splatting）对方法进行了验证，结果表明相比现有技术，我们的方法在处理遮挡与保持时间稳定性方面具有更高的精度。\n"
  },
  {
    "path": "abs/2507.08434.md",
    "content": "### RePaintGS: Reference-Guided Gaussian Splatting for Realistic and View-Consistent 3D Scene Inpainting\n\nRadiance field methods, such as Neural Radiance Field or 3D Gaussian Splatting, have emerged as seminal 3D representations for synthesizing realistic novel views. For practical applications, there is ongoing research on flexible scene editing techniques, among which object removal is a representative task. However, removing objects exposes occluded regions, often leading to unnatural appearances. Thus, studies have employed image inpainting techniques to replace such regions with plausible content - a task referred to as 3D scene inpainting. However, image inpainting methods produce one of many plausible completions for each view, leading to inconsistencies between viewpoints. A widely adopted approach leverages perceptual cues to blend inpainted views smoothly. However, it is prone to detail loss and can fail when there are perceptual inconsistencies across views. In this paper, we propose a novel 3D scene inpainting method that reliably produces realistic and perceptually consistent results even for complex scenes by leveraging a reference view. Given the inpainted reference view, we estimate the inpainting similarity of the other views to adjust their contribution in constructing an accurate geometry tailored to the reference. This geometry is then used to warp the reference inpainting to other views as pseudo-ground truth, guiding the optimization to match the reference appearance. Comparative evaluation studies have shown that our approach improves both the geometric fidelity and appearance consistency of inpainted scenes.\n\n辐射场方法（如神经辐射场 Neural Radiance Field 或三维高斯投影 3D Gaussian Splatting）已成为合成真实感新视角的典型三维表示形式。在实际应用中，灵活的场景编辑技术是研究热点，其中物体移除是具有代表性的一类任务。然而，移除物体会暴露原本被遮挡的区域，往往导致不自然的视觉效果。因此，已有研究采用图像修复技术为这些区域填充合理内容——这一任务被称为三维场景修复（3D scene inpainting）。但图像修复方法在每个视图中生成的只是众多合理补全之一，从而导致不同视角之间的不一致。主流方法通常利用感知线索平滑融合修复后的视图，但这种方法容易造成细节丢失，并在视角间存在感知不一致时失效。本文提出了一种新颖的三维场景修复方法，即使在复杂场景下也能生成真实且感知一致的结果，核心思想是利用参考视图。给定修复完成的参考视图，我们估计其他视图的修复相似度，并据此调整它们在构建针对参考视图的精确几何中的贡献。随后利用该几何将参考修复结果变形到其他视图，作为伪真实值，指导优化过程以匹配参考视图的外观。对比实验结果表明，我们的方法同时提升了修复场景的几何保真度与外观一致性。\n"
  },
  {
    "path": "abs/2507.09993.md",
    "content": "### 3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving\n\nCamera-based object detection systems play a vital role in autonomous driving, yet they remain vulnerable to adversarial threats in real-world environments. Existing 2D and 3D physical attacks, due to their focus on texture optimization, often struggle to balance physical realism and attack robustness. In this work, we propose 3D Gaussian-based Adversarial Attack (3DGAA), a novel adversarial object generation framework that leverages the full 14-dimensional parameterization of 3D Gaussian Splatting (3DGS) to jointly optimize geometry and appearance in physically realizable ways. Unlike prior works that rely on patches or texture optimization, 3DGAA jointly perturbs both geometric attributes (shape, scale, rotation) and appearance attributes (color, opacity) to produce physically realistic and transferable adversarial objects. We further introduce a physical filtering module that filters outliers to preserve geometric fidelity, and a physical augmentation module that simulates complex physical scenarios to enhance attack generalization under real-world conditions. We evaluate 3DGAA on both virtual benchmarks and physical-world setups using miniature vehicle models. Experimental results show that 3DGAA achieves to reduce the detection mAP from 87.21% to 7.38%, significantly outperforming existing 3D physical attacks. Moreover, our method maintains high transferability across different physical conditions, demonstrating a new state-of-the-art in physically realizable adversarial attacks.\n\n基于摄像头的目标检测系统在自动驾驶中发挥着至关重要的作用，但在真实环境中仍易受到对抗性威胁的影响。现有的二维与三维物理攻击由于侧重于纹理优化，往往难以在物理真实感与攻击鲁棒性之间取得平衡。本文提出了一种新的对抗性物体生成框架——基于三维高斯的对抗性攻击（3D Gaussian-based Adversarial Attack, 3DGAA），利用三维高斯投影（3D Gaussian Splatting, 3DGS）的全 14 维参数化方式，在物理可实现的条件下联合优化几何与外观。与依赖贴片或纹理优化的先前方法不同，3DGAA 同时扰动几何属性（形状、尺度、旋转）与外观属性（颜色、不透明度），生成具有物理真实感且可迁移的对抗性物体。我们进一步引入了物理过滤模块（physical filtering module）以剔除异常值、保持几何保真度，并设计了物理增强模块（physical augmentation module）以模拟复杂的物理场景，从而提升攻击在真实世界条件下的泛化能力。我们在虚拟基准测试与使用微缩车辆模型的真实物理环境中对 3DGAA 进行了评估，实验结果表明，该方法可将检测 mAP 从 87.21% 降至 7.38%，显著优于现有三维物理攻击。此外，我们的方法在不同物理条件下保持了较高的可迁移性，在物理可实现的对抗性攻击领域达到了新的最先进水平。\n"
  },
  {
    "path": "abs/2507.10065.md",
    "content": "### MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second\n\nWe present MoVieS, a novel feed-forward model that synthesizes 4D dynamic novel views from monocular videos in one second. MoVieS represents dynamic 3D scenes using pixel-aligned grids of Gaussian primitives, explicitly supervising their time-varying motion. This allows, for the first time, the unified modeling of appearance, geometry and motion, and enables view synthesis, reconstruction and 3D point tracking within a single learning-based framework. By bridging novel view synthesis with dynamic geometry reconstruction, MoVieS enables large-scale training on diverse datasets with minimal dependence on task-specific supervision. As a result, it also naturally supports a wide range of zero-shot applications, such as scene flow estimation and moving object segmentation. Extensive experiments validate the effectiveness and efficiency of MoVieS across multiple tasks, achieving competitive performance while offering several orders of magnitude speedups.\n\n我们提出了 MoVieS，这是一种新型前馈式模型，能够在一秒内从单目视频合成四维动态新视角。MoVieS 使用与像素对齐的高斯基元网格表示动态三维场景，并显式监督其随时间变化的运动。这首次实现了在单一的学习框架中对外观、几何与运动的统一建模，并支持视图合成、重建与三维点跟踪。通过将新视角合成与动态几何重建相结合，MoVieS 能够在多样化数据集上进行大规模训练，并且对特定任务的监督依赖极低。因此，它还能自然地支持多种零样本应用，如场景流估计和运动物体分割。大量实验证明，MoVieS 在多个任务中都具有高效且有效的表现，在保持竞争性性能的同时实现了数量级的速度提升。\n"
  },
  {
    "path": "abs/2507.10542.md",
    "content": "### ScaffoldAvatar: High-Fidelity Gaussian Avatars with Patch Expressions\n\nGenerating high-fidelity real-time animated sequences of photorealistic 3D head avatars is important for many graphics applications, including immersive telepresence and movies. This is a challenging problem particularly when rendering digital avatar close-ups for showing character's facial microfeatures and expressions. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple locally-defined facial expressions with 3D Gaussian splatting to enable creating ultra-high fidelity, expressive and photorealistic 3D head avatars. In contrast to previous works that operate on a global expression space, we condition our avatar's dynamics on patch-based local expression features and synthesize 3D Gaussians at a patch level. In particular, we leverage a patch-based geometric 3D face model to extract patch expressions and learn how to translate these into local dynamic skin appearance and motion by coupling the patches with anchor points of Scaffold-GS, a recent hierarchical scene representation. These anchors are then used to synthesize 3D Gaussians on-the-fly, conditioned by patch-expressions and viewing direction. We employ color-based densification and progressive training to obtain high-quality results and faster convergence for high resolution 3K training images. By leveraging patch-level expressions, ScaffoldAvatar consistently achieves state-of-the-art performance with visually natural motion, while encompassing diverse facial expressions and styles in real time.\n\n生成高保真、实时的照片级三维头部虚拟形象动画序列，对于包括沉浸式远程呈现和电影在内的众多图形应用至关重要。该问题尤其具有挑战性，当渲染数字虚拟形象的特写以展示角色面部微特征和表情时尤为如此。为了捕捉人类头部富有表现力且细节丰富的特征（包括皮肤褶皱和更细微的面部动作），我们提出将局部定义的面部表情与三维高斯投影（3D Gaussian Splatting）相结合，从而实现超高保真、富有表现力且照片级逼真的三维头部虚拟形象。与以往在全局表情空间中操作的方法不同，我们将虚拟形象的动态条件化在基于面部局部区域的表情特征上，并在局部区域级别生成三维高斯。具体而言，我们利用基于面部局部区域的三维几何人脸模型提取局部表情，并通过将这些区域与 Scaffold-GS（近期提出的一种分层场景表示）的锚点相结合，学习如何将其转化为局部动态皮肤外观与运动。这些锚点随后被用于按需生成三维高斯，并由局部表情与视角方向共同决定。我们采用基于颜色的加密策略与渐进式训练，以在 3K 分辨率训练图像下获得高质量结果并加快收敛速度。通过利用局部区域级别的表情特征，ScaffoldAvatar 能够在实时条件下实现多样化面部表情与风格的自然流畅运动，并在视觉效果上持续达到当前最先进水平。\n"
  },
  {
    "path": "abs/2507.11061.md",
    "content": "### Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling\n\nRecent advances in 3D neural representations and instance-level editing models have enabled the efficient creation of high-quality 3D content. However, achieving precise local 3D edits remains challenging, especially for Gaussian Splatting, due to inconsistent multi-view 2D part segmentations and inherently ambiguous nature of Score Distillation Sampling (SDS) loss. To address these limitations, we propose RoMaP, a novel local 3D Gaussian editing framework that enables precise and drastic part-level modifications. First, we introduce a robust 3D mask generation module with our 3D-Geometry Aware Label Prediction (3D-GALP), which uses spherical harmonics (SH) coefficients to model view-dependent label variations and soft-label property, yielding accurate and consistent part segmentations across viewpoints. Second, we propose a regularized SDS loss that combines the standard SDS loss with additional regularizers. In particular, an L1 anchor loss is introduced via our Scheduled Latent Mixing and Part (SLaMP) editing method, which generates high-quality part-edited 2D images and confines modifications only to the target region while preserving contextual coherence. Additional regularizers, such as Gaussian prior removal, further improve flexibility by allowing changes beyond the existing context, and robust 3D masking prevents unintended edits. Experimental results demonstrate that our RoMaP achieves state-of-the-art local 3D editing on both reconstructed and generated Gaussian scenes and objects qualitatively and quantitatively, making it possible for more robust and flexible part-level 3D Gaussian editing.\n\n近年来，三维神经表示与实例级编辑模型的进步使得高质量三维内容的高效创作成为可能。然而，在高斯投影（Gaussian Splatting）中实现精确的局部三维编辑仍然具有挑战性，主要原因在于多视图二维局部分割的不一致性以及得分蒸馏采样（Score Distillation Sampling, SDS）损失本身存在的固有歧义。为克服这些局限，我们提出了 RoMaP，这是一种新颖的局部三维高斯编辑框架，能够实现精确且大幅度的部件级修改。首先，我们引入了鲁棒的三维掩码生成模块——三维几何感知标签预测（3D-Geometry Aware Label Prediction, 3D-GALP），利用球谐函数（Spherical Harmonics, SH）系数建模视角相关的标签变化与软标签特性，从而在多视角下获得准确且一致的部件分割。其次，我们提出了一种正则化 SDS 损失，将标准 SDS 损失与额外的正则项相结合。具体来说，我们通过计划潜变量混合与部件（Scheduled Latent Mixing and Part, SLaMP）编辑方法引入 L1 锚点损失，该方法可生成高质量的局部编辑二维图像，并将修改限制在目标区域，同时保持上下文一致性。其他正则化方法（如高斯先验移除）进一步提升了灵活性，使编辑可突破现有上下文的限制，而鲁棒的三维掩码机制则可防止非预期的编辑。实验结果表明，RoMaP 在重建与生成的高斯场景和物体上都实现了当前最先进的局部三维编辑效果，无论在定性还是定量评估中均表现优异，从而实现了更稳健且灵活的部件级三维高斯编辑。\n"
  },
  {
    "path": "abs/2507.11069.md",
    "content": "### TRAN-D: 2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update\n\nUnderstanding the 3D geometry of transparent objects from RGB images is challenging due to their inherent physical properties, such as reflection and refraction. To address these difficulties, especially in scenarios with sparse views and dynamic environments, we introduce TRAN-D, a novel 2D Gaussian Splatting-based depth reconstruction method for transparent objects. Our key insight lies in separating transparent objects from the background, enabling focused optimization of Gaussians corresponding to the object. We mitigate artifacts with an object-aware loss that places Gaussians in obscured regions, ensuring coverage of invisible surfaces while reducing overfitting. Furthermore, we incorporate a physics-based simulation that refines the reconstruction in just a few seconds, effectively handling object removal and chain-reaction movement of remaining objects without the need for rescanning. TRAN-D is evaluated on both synthetic and real-world sequences, and it consistently demonstrated robust improvements over existing GS-based state-of-the-art methods. In comparison with baselines, TRAN-D reduces the mean absolute error by over 39% for the synthetic TRansPose sequences. Furthermore, despite being updated using only one image, TRAN-D reaches a δ &lt; 2.5 cm accuracy of 48.46%, over 1.5 times that of baselines, which uses six images.\n\n由于反射和折射等固有物理特性，从 RGB 图像中理解透明物体的三维几何形状是一项具有挑战性的任务。为应对这些困难，特别是在稀疏视图和动态环境场景下，我们提出了 TRAN-D，这是一种基于二维高斯投影（2D Gaussian Splatting）的透明物体深度重建新方法。我们的核心思想是将透明物体与背景分离，从而能够针对物体对应的高斯进行集中优化。我们通过引入物体感知损失（object-aware loss）来缓解伪影问题，该损失会在被遮挡区域放置高斯，以确保对不可见表面的覆盖，同时减少过拟合。此外，我们结合了基于物理的模拟，仅需数秒即可优化重建结果，有效处理物体移除及剩余物体的连锁反应式运动，而无需重新扫描。我们在合成和真实数据序列上对 TRAN-D 进行了评估，其在现有基于高斯投影的最先进方法之上表现出持续且稳健的提升。与基线方法相比，TRAN-D 在合成的 TRansPose 序列上将平均绝对误差降低了 39% 以上。此外，尽管仅使用一张图像进行更新，TRAN-D 在 δ &lt; 2.5 cm 的精度下达到了 48.46%，是使用六张图像的基线方法的 1.5 倍以上。\n"
  },
  {
    "path": "abs/2507.11321.md",
    "content": "### A Mixed-Primitive-based Gaussian Splatting Method for Surface Reconstruction\n\nRecently, Gaussian Splatting (GS) has received a lot of attention in surface reconstruction. However, while 3D objects can be of complex and diverse shapes in the real world, existing GS-based methods only limitedly use a single type of splatting primitive (Gaussian ellipse or Gaussian ellipsoid) to represent object surfaces during their reconstruction. In this paper, we highlight that this can be insufficient for object surfaces to be represented in high quality. Thus, we propose a novel framework that, for the first time, enables Gaussian Splatting to incorporate multiple types of (geometrical) primitives during its surface reconstruction process. Specifically, in our framework, we first propose a compositional splatting strategy, enabling the splatting and rendering of different types of primitives in the Gaussian Splatting pipeline. In addition, we also design our framework with a mixed-primitive-based initialization strategy and a vertex pruning mechanism to further promote its surface representation learning process to be well executed leveraging different types of primitives. Extensive experiments show the efficacy of our framework and its accurate surface reconstruction performance.\n\n近年来，高斯投影（Gaussian Splatting, GS）在表面重建领域引起了广泛关注。然而，尽管现实世界中的三维物体形状复杂多样，现有基于 GS 的方法在重建过程中仅有限地使用单一类型的投影基元（高斯椭圆或高斯椭球）来表示物体表面。本文指出，这种方式可能不足以高质量地表达物体表面。为此，我们首次提出了一种新颖的框架，使高斯投影在表面重建过程中能够结合多种类型的几何基元。具体而言，在我们的框架中，首先提出了一种组合投影策略（compositional splatting strategy），使得在高斯投影管线中可以对不同类型的基元进行投影与渲染。此外，我们还设计了基于混合基元的初始化策略与顶点剪枝机制，以进一步促进利用不同类型基元的表面表示学习过程得到高效执行。大量实验结果表明，我们的框架在有效性及高精度表面重建性能方面具有显著优势。\n"
  },
  {
    "path": "abs/2507.11931.md",
    "content": "### Dark-EvGS: Event Camera as an Eye for Radiance Field in the Dark\n\nIn low-light environments, conventional cameras often struggle to capture clear multi-view images of objects due to dynamic range limitations and motion blur caused by long exposure. Event cameras, with their high-dynamic range and high-speed properties, have the potential to mitigate these issues. Additionally, 3D Gaussian Splatting (GS) enables radiance field reconstruction, facilitating bright frame synthesis from multiple viewpoints in low-light conditions. However, naively using an event-assisted 3D GS approach still faced challenges because, in low light, events are noisy, frames lack quality, and the color tone may be inconsistent. To address these issues, we propose Dark-EvGS, the first event-assisted 3D GS framework that enables the reconstruction of bright frames from arbitrary viewpoints along the camera trajectory. Triplet-level supervision is proposed to gain holistic knowledge, granular details, and sharp scene rendering. The color tone matching block is proposed to guarantee the color consistency of the rendered frames. Furthermore, we introduce the first real-captured dataset for the event-guided bright frame synthesis task via 3D GS-based radiance field reconstruction. Experiments demonstrate that our method achieves better results than existing methods, conquering radiance field reconstruction under challenging low-light conditions. The code and sample data are included in the supplementary material.\n\n在低光照环境下，传统相机由于动态范围受限及长曝光引起的运动模糊，往往难以捕获清晰的物体多视图图像。事件相机（event camera）凭借其高动态范围与高速特性，有潜力缓解这些问题。此外，三维高斯投影（3D Gaussian Splatting, GS）能够实现辐射场重建，从而在低光照条件下支持多视角的亮帧合成。然而，直接采用事件辅助的三维高斯投影方法仍面临挑战，因为在低光下，事件数据噪声较大、帧质量不足且色调可能不一致。为解决这些问题，我们提出了 Dark-EvGS——首个事件辅助的三维高斯投影框架，可沿相机轨迹从任意视角重建亮帧。我们提出了三元组级监督（triplet-level supervision），以同时获取整体场景知识、细粒度细节及清晰的场景渲染效果；设计了色调匹配模块（color tone matching block），以保证渲染帧的色彩一致性。此外，我们构建了首个基于三维高斯辐射场重建的事件引导亮帧合成任务的实拍数据集。实验结果表明，我们的方法在应对低光照条件下的辐射场重建时优于现有方法。代码与示例数据包含在补充材料中。\n"
  },
  {
    "path": "abs/2507.12027.md",
    "content": "### SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation\n\nWe propose SGLoc, a novel localization system that directly regresses camera poses from 3D Gaussian Splatting (3DGS) representation by leveraging semantic information. Our method utilizes the semantic relationship between 2D image and 3D scene representation to estimate the 6DoF pose without prior pose information. In this system, we introduce a multi-level pose regression strategy that progressively estimates and refines the pose of query image from the global 3DGS map, without requiring initial pose priors. Moreover, we introduce a semantic-based global retrieval algorithm that establishes correspondences between 2D (image) and 3D (3DGS map). By matching the extracted scene semantic descriptors of 2D query image and 3DGS semantic representation, we align the image with the local region of the global 3DGS map, thereby obtaining a coarse pose estimation. Subsequently, we refine the coarse pose by iteratively optimizing the difference between the query image and the rendered image from 3DGS. Our SGLoc demonstrates superior performance over baselines on 12scenes and 7scenes datasets, showing excellent capabilities in global localization without initial pose prior.\n\n我们提出了 SGLoc，这是一种利用语义信息直接从三维高斯投影（3D Gaussian Splatting, 3DGS）表示回归相机位姿的新型定位系统。该方法利用二维图像与三维场景表示之间的语义关系，在无需先验位姿信息的情况下估计六自由度（6DoF）位姿。在该系统中，我们引入了一种多级位姿回归策略，从全局 3DGS 地图中逐步估计并优化查询图像的位姿，无需初始位姿先验。此外，我们提出了一种基于语义的全局检索算法，用于建立二维（图像）与三维（3DGS 地图）之间的对应关系。通过匹配二维查询图像与 3DGS 语义表示中提取的场景语义描述符，我们将图像与全局 3DGS 地图的局部区域对齐，从而获得粗略位姿估计。随后，我们通过迭代优化查询图像与 3DGS 渲染图像之间的差异来精细化粗略位姿。实验结果表明，SGLoc 在 12scenes 和 7scenes 数据集上相较基线方法表现更优，展现了在无初始位姿先验条件下的卓越全局定位能力。\n"
  },
  {
    "path": "abs/2507.12095.md",
    "content": "### BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images\n\nAccurate 3D reconstruction of vehicles is vital for applications such as vehicle inspection, predictive maintenance, and urban planning. Existing methods like Neural Radiance Fields and Gaussian Splatting have shown impressive results but remain limited by their reliance on dense input views, which hinders real-world applicability. This paper addresses the challenge of reconstructing vehicles from sparse-view inputs, leveraging depth maps and a robust pose estimation architecture to synthesize novel views and augment training data. Specifically, we enhance Gaussian Splatting by integrating a selective photometric loss, applied only to high-confidence pixels, and replacing standard Structure-from-Motion pipelines with the DUSt3R architecture to improve camera pose estimation. Furthermore, we present a novel dataset featuring both synthetic and real-world public transportation vehicles, enabling extensive evaluation of our approach. Experimental results demonstrate state-of-the-art performance across multiple benchmarks, showcasing the method's ability to achieve high-quality reconstructions even under constrained input conditions.\n\n精确的车辆三维重建对于车辆检测、预测性维护和城市规划等应用至关重要。现有方法（如神经辐射场 Neural Radiance Fields 和高斯投影 Gaussian Splatting）虽然在效果上表现出色，但依赖密集输入视图的特性限制了其在真实场景中的适用性。本文针对稀疏视图输入条件下的车辆重建挑战，利用深度图和鲁棒的位姿估计架构来合成新视角并扩充训练数据。具体而言，我们通过引入选择性光度损失（仅作用于高置信度像素）来增强高斯投影，并用 DUSt3R 架构替代标准的结构自运动（SfM）流程以提升相机位姿估计精度。此外，我们构建了一个包含合成与真实公共交通车辆的新数据集，用于全面评估所提方法。实验结果表明，该方法在多个基准测试中均达到了当前最先进水平，即使在输入受限的条件下，也能实现高质量的重建效果。\n"
  },
  {
    "path": "abs/2507.12498.md",
    "content": "### Wavelet-GS: 3D Gaussian Splatting with Wavelet Decomposition\n\n3D Gaussian Splatting (3DGS) has revolutionized 3D scene reconstruction, which effectively balances rendering quality, efficiency, and speed. However, existing 3DGS approaches usually generate plausible outputs and face significant challenges in complex scene reconstruction, manifesting as incomplete holistic structural outlines and unclear local lighting effects. To address these issues simultaneously, we propose a novel decoupled optimization framework, which integrates wavelet decomposition into 3D Gaussian Splatting and 2D sampling. Technically, through 3D wavelet decomposition, our approach divides point clouds into high-frequency and low-frequency components, enabling targeted optimization for each. The low-frequency component captures global structural outlines and manages the distribution of Gaussians through voxelization. In contrast, the high-frequency component restores intricate geometric and textural details while incorporating a relight module to mitigate lighting artifacts and enhance photorealistic rendering. Additionally, a 2D wavelet decomposition is applied to the training images, simulating radiance variations. This provides critical guidance for high-frequency detail reconstruction, ensuring seamless integration of details with the global structure. Extensive experiments on challenging datasets demonstrate our method achieves state-of-the-art performance across various metrics, surpassing existing approaches and advancing the field of 3D scene reconstruction.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）在三维场景重建领域引发了革命性进展，有效平衡了渲染质量、效率与速度。然而，现有的 3DGS 方法虽然能生成较为合理的结果，但在复杂场景重建中仍面临显著挑战，主要表现为整体结构轮廓不完整以及局部光照效果不清晰。为同时解决这些问题，我们提出了一种新颖的解耦优化框架，将小波分解引入三维高斯投影与二维采样中。在技术实现上，我们通过三维小波分解将点云划分为高频与低频两个部分，并针对性地进行优化。低频部分捕捉全局结构轮廓，并通过体素化管理高斯的分布；而高频部分则用于恢复精细的几何与纹理细节，同时结合重光照模块（relight module）以缓解光照伪影并提升照片级真实感渲染效果。此外，我们还对训练图像进行二维小波分解，以模拟辐射变化，从而为高频细节重建提供关键指导，确保细节与全局结构的无缝融合。在具有挑战性的数据集上的大量实验表明，我们的方法在多项指标上均达到了当前最先进水平，超越了现有方法，并推动了三维场景重建领域的发展。\n"
  },
  {
    "path": "abs/2507.12621.md",
    "content": "### NLI4VolVis: Natural Language Interaction for Volume Visualization via LLM Multi-Agents and Editable 3D Gaussian Splatting\n\nTraditional volume visualization (VolVis) methods, like direct volume rendering, suffer from rigid transfer function designs and high computational costs. Although novel view synthesis approaches enhance rendering efficiency, they require additional learning effort for non-experts and lack support for semantic-level interaction. To bridge this gap, we propose NLI4VolVis, an interactive system that enables users to explore, query, and edit volumetric scenes using natural language. NLI4VolVis integrates multi-view semantic segmentation and vision-language models to extract and understand semantic components in a scene. We introduce a multi-agent large language model architecture equipped with extensive function-calling tools to interpret user intents and execute visualization tasks. The agents leverage external tools and declarative VolVis commands to interact with the VolVis engine powered by 3D editable Gaussians, enabling open-vocabulary object querying, real-time scene editing, best-view selection, and 2D stylization. We validate our system through case studies and a user study, highlighting its improved accessibility and usability in volumetric data exploration.\n\n传统的体绘制（VolVis）方法（如直接体绘制）存在传递函数设计僵化和计算成本高的问题。尽管新视角合成方法提升了渲染效率，但对于非专业用户而言需要额外的学习成本，并且缺乏对语义级交互的支持。为弥合这一差距，我们提出了 NLI4VolVis，这是一种交互式系统，允许用户使用自然语言探索、查询和编辑体数据场景。NLI4VolVis 融合了多视图语义分割与视觉-语言模型，以提取并理解场景中的语义组成部分。我们引入了多智能体大语言模型架构，并配备了丰富的函数调用工具，用于解析用户意图并执行可视化任务。这些智能体利用外部工具和声明式 VolVis 命令与基于三维可编辑高斯的 VolVis 引擎交互，从而实现开放词汇的目标查询、实时场景编辑、最佳视角选择和二维风格化。我们通过案例研究和用户研究验证了该系统，结果表明它在体数据探索中的可访问性与可用性均有显著提升。\n"
  },
  {
    "path": "abs/2507.12667.md",
    "content": "### VolSegGS: Segmentation and Tracking in Dynamic Volumetric Scenes via Deformable 3D Gaussians\n\nVisualization of large-scale time-dependent simulation data is crucial for domain scientists to analyze complex phenomena, but it demands significant I/O bandwidth, storage, and computational resources. To enable effective visualization on local, low-end machines, recent advances in view synthesis techniques, such as neural radiance fields, utilize neural networks to generate novel visualizations for volumetric scenes. However, these methods focus on reconstruction quality rather than facilitating interactive visualization exploration, such as feature extraction and tracking. We introduce VolSegGS, a novel Gaussian splatting framework that supports interactive segmentation and tracking in dynamic volumetric scenes for exploratory visualization and analysis. Our approach utilizes deformable 3D Gaussians to represent a dynamic volumetric scene, allowing for real-time novel view synthesis. For accurate segmentation, we leverage the view-independent colors of Gaussians for coarse-level segmentation and refine the results with an affinity field network for fine-level segmentation. Additionally, by embedding segmentation results within the Gaussians, we ensure that their deformation enables continuous tracking of segmented regions over time. We demonstrate the effectiveness of VolSegGS with several time-varying datasets and compare our solutions against state-of-the-art methods. With the ability to interact with a dynamic scene in real time and provide flexible segmentation and tracking capabilities, VolSegGS offers a powerful solution under low computational demands. This framework unlocks exciting new possibilities for time-varying volumetric data analysis and visualization.\n\n大规模时变仿真数据的可视化对于领域科学家分析复杂现象至关重要，但这类任务通常需要大量的 I/O 带宽、存储和计算资源。为了在本地低端设备上实现高效可视化，近期的视图合成技术（如神经辐射场）利用神经网络为体数据场景生成新视角可视化。然而，这些方法主要关注重建质量，而非支持交互式可视化探索（如特征提取与跟踪）。为此，我们提出了 VolSegGS，这是一种新型高斯投影（Gaussian Splatting）框架，可在动态体数据场景中实现交互式分割与跟踪，用于探索性可视化与分析。我们的方法采用可变形三维高斯表示动态体数据场景，从而支持实时新视角合成。为了实现精确分割，我们利用高斯的视角无关颜色进行粗粒度分割，并通过亲和场网络（affinity field network）进行精细化分割。此外，通过将分割结果嵌入到高斯中，我们确保其在随时间变形的过程中能够持续跟踪分割区域。我们在多个时变数据集上验证了 VolSegGS 的有效性，并与当前最先进的方法进行了对比。凭借实时交互动态场景以及灵活的分割和跟踪能力，VolSegGS 在低计算需求下提供了强大的解决方案，为时变体数据的分析与可视化开启了新的可能性。\n"
  },
  {
    "path": "abs/2507.13586.md",
    "content": "### TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting\n\nAdvancements in volume visualization (VolVis) focus on extracting insights from 3D volumetric data by generating visually compelling renderings that reveal complex internal structures. Existing VolVis approaches have explored non-photorealistic rendering techniques to enhance the clarity, expressiveness, and informativeness of visual communication. While effective, these methods often rely on complex predefined rules and are limited to transferring a single style, restricting their flexibility. To overcome these limitations, we advocate the representation of VolVis scenes using differentiable Gaussian primitives combined with pretrained large models to enable arbitrary style transfer and real-time rendering. However, conventional 3D Gaussian primitives tightly couple geometry and appearance, leading to suboptimal stylization results. To address this, we introduce TexGS-VolVis, a textured Gaussian splatting framework for VolVis. TexGS-VolVis employs 2D Gaussian primitives, extending each Gaussian with additional texture and shading attributes, resulting in higher-quality, geometry-consistent stylization and enhanced lighting control during inference. Despite these improvements, achieving flexible and controllable scene editing remains challenging. To further enhance stylization, we develop image- and text-driven non-photorealistic scene editing tailored for TexGS-VolVis and 2D-lift-3D segmentation to enable partial editing with fine-grained control. We evaluate TexGS-VolVis both qualitatively and quantitatively across various volume rendering scenes, demonstrating its superiority over existing methods in terms of efficiency, visual quality, and editing flexibility.\n\n体绘制（VolVis）的进步旨在通过生成具有视觉吸引力的渲染结果，从三维体数据中提取洞察信息，以揭示复杂的内部结构。现有 VolVis 方法已探索了多种非真实感渲染技术，以提升视觉表达的清晰度、表现力和信息量。虽然这些方法在一定程度上有效，但往往依赖复杂的预定义规则，并且仅支持单一风格迁移，从而限制了灵活性。为克服这些局限，我们提出利用可微分高斯基元结合预训练大模型来表示 VolVis 场景，从而实现任意风格迁移与实时渲染。然而，传统的三维高斯基元将几何与外观紧密耦合，导致风格化效果次优。为此，我们提出了 TexGS-VolVis，这是一种面向 VolVis 的纹理化高斯投影框架。TexGS-VolVis 采用二维高斯基元，并为每个高斯扩展了额外的纹理和着色属性，从而在推理过程中实现更高质量、几何一致的风格化效果，并增强光照控制能力。尽管有这些改进，实现灵活且可控的场景编辑仍具挑战性。为进一步提升风格化能力，我们为 TexGS-VolVis 开发了基于图像与文本驱动的非真实感场景编辑，并引入 2D 升维至 3D 分割（2D-lift-3D segmentation）以实现带有细粒度控制的局部编辑。我们在多种体绘制场景中从定性与定量两个维度评估了 TexGS-VolVis，结果表明其在效率、视觉质量和编辑灵活性方面均优于现有方法。\n"
  },
  {
    "path": "abs/2507.13891.md",
    "content": "### PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations\n\nCOLMAP-free 3D Gaussian Splatting (3D-GS) has recently attracted increasing attention due to its remarkable performance in reconstructing high-quality 3D scenes from unposed images or videos. However, it often struggles to handle scenes with complex camera trajectories as featured by drastic rotation and translation across adjacent camera views, leading to degraded estimation of camera poses and further local minima in joint optimization of camera poses and 3D-GS. We propose PCR-GS, an innovative COLMAP-free 3DGS technique that achieves superior 3D scene modeling and camera pose estimation via camera pose co-regularization. PCR-GS achieves regularization from two perspectives. The first is feature reprojection regularization which extracts view-robust DINO features from adjacent camera views and aligns their semantic information for camera pose regularization. The second is wavelet-based frequency regularization which exploits discrepancy in high-frequency details to further optimize the rotation matrix in camera poses. Extensive experiments over multiple real-world scenes show that the proposed PCR-GS achieves superior pose-free 3D-GS scene modeling under dramatic changes of camera trajectories.\n\n无 COLMAP 的三维高斯点渲染（3D-GS）因其在从无位姿图像或视频中重建高质量三维场景方面表现出色，近年来备受关注。然而，当场景包含复杂的相机轨迹时，例如相邻视角间存在剧烈的旋转和平移变化，该方法往往难以处理，导致相机位姿估计退化，并在相机位姿与 3D-GS 的联合优化中陷入局部最优。为此，我们提出了 PCR-GS，这是一种创新的无 COLMAP 3D-GS 技术，通过相机位姿协同正则化实现更优的三维场景建模与相机位姿估计。PCR-GS 从两个方面实现正则化：第一，特征重投影正则化，从相邻相机视角中提取具有视角鲁棒性的 DINO 特征，并对其语义信息进行对齐，从而实现相机位姿的正则化；第二，基于小波的频率正则化，利用高频细节的差异进一步优化相机位姿中的旋转矩阵。大量真实场景实验表明，所提出的 PCR-GS 在相机轨迹剧烈变化的情况下，能够实现更优的无位姿 3D-GS 场景建模效果。\n"
  },
  {
    "path": "abs/2507.13985.md",
    "content": "### DreamScene: 3D Gaussian-based End-to-end Text-to-3D Scene Generation\n\nGenerating 3D scenes from natural language holds great promise for applications in gaming, film, and design. However, existing methods struggle with automation, 3D consistency, and fine-grained control. We present DreamScene, an end-to-end framework for high-quality and editable 3D scene generation from text or dialogue. DreamScene begins with a scene planning module, where a GPT-4 agent infers object semantics and spatial constraints to construct a hybrid graph. A graph-based placement algorithm then produces a structured, collision-free layout. Based on this layout, Formation Pattern Sampling (FPS) generates object geometry using multi-timestep sampling and reconstructive optimization, enabling fast and realistic synthesis. To ensure global consistent, DreamScene employs a progressive camera sampling strategy tailored to both indoor and outdoor settings. Finally, the system supports fine-grained scene editing, including object movement, appearance changes, and 4D dynamic motion. Experiments demonstrate that DreamScene surpasses prior methods in quality, consistency, and flexibility, offering a practical solution for open-domain 3D content creation.o\n\n从自然语言生成三维场景在游戏、电影和设计等领域具有广阔的应用前景。然而，现有方法在自动化、三维一致性以及精细化控制方面仍存在困难。我们提出了 DreamScene，这是一种可从文本或对话生成高质量且可编辑的三维场景的端到端框架。DreamScene 首先通过场景规划模块，由 GPT-4 代理推断物体语义与空间约束，构建混合图；随后利用基于图的放置算法生成结构化且无碰撞的布局。在此布局基础上，采用形成模式采样（Formation Pattern Sampling, FPS）结合多步采样与重构优化生成物体几何，实现快速且逼真的合成。为确保全局一致性，DreamScene 引入了适用于室内和室外场景的渐进式相机采样策略。最后，该系统支持精细化的场景编辑，包括物体移动、外观修改以及四维动态运动。实验表明，DreamScene 在质量、一致性和灵活性方面均优于以往方法，为开放域三维内容创作提供了可行的解决方案。\n"
  },
  {
    "path": "abs/2507.14432.md",
    "content": "### Adaptive 3D Gaussian Splatting Video Streaming\n\nThe advent of 3D Gaussian splatting (3DGS) has significantly enhanced the quality of volumetric video representation. Meanwhile, in contrast to conventional volumetric video, 3DGS video poses significant challenges for streaming due to its substantially larger data volume and the heightened complexity involved in compression and transmission. To address these issues, we introduce an innovative framework for 3DGS volumetric video streaming. Specifically, we design a 3DGS video construction method based on the Gaussian deformation field. By employing hybrid saliency tiling and differentiated quality modeling of 3DGS video, we achieve efficient data compression and adaptation to bandwidth fluctuations while ensuring high transmission quality. Then we build a complete 3DGS video streaming system and validate the transmission performance. Through experimental evaluation, our method demonstrated superiority over existing approaches in various aspects, including video quality, compression effectiveness, and transmission rate.\n\n三维高斯点渲染（3DGS）的出现显著提升了体积视频的表示质量。然而，与传统体积视频相比，3DGS 视频在流媒体传输中面临巨大挑战，主要源于其数据量显著增加，以及压缩与传输过程的复杂性大幅提升。为解决这些问题，我们提出了一种创新性的 3DGS 体积视频流媒体框架。具体而言，我们设计了一种基于高斯形变场的 3DGS 视频构建方法；通过采用混合显著性切片和差异化质量建模，实现了高效的数据压缩与带宽波动自适应，同时保障了传输质量。随后，我们搭建了一个完整的 3DGS 视频流媒体传输系统，并验证了其传输性能。实验评估结果表明，该方法在视频质量、压缩效率及传输速率等多个方面均优于现有方法。\n"
  },
  {
    "path": "abs/2507.14454.md",
    "content": "### Adaptive 3D Gaussian Splatting Video Streaming: Visual Saliency-Aware Tiling and Meta-Learning-Based Bitrate Adaptation\n\n3D Gaussian splatting video (3DGS) streaming has recently emerged as a research hotspot in both academia and industry, owing to its impressive ability to deliver immersive 3D video experiences. However, research in this area is still in its early stages, and several fundamental challenges, such as tiling, quality assessment, and bitrate adaptation, require further investigation. In this paper, we tackle these challenges by proposing a comprehensive set of solutions. Specifically, we propose an adaptive 3DGS tiling technique guided by saliency analysis, which integrates both spatial and temporal features. Each tile is encoded into versions possessing dedicated deformation fields and multiple quality levels for adaptive selection. We also introduce a novel quality assessment framework for 3DGS video that jointly evaluates spatial-domain degradation in 3DGS representations during streaming and the quality of the resulting 2D rendered images. Additionally, we develop a meta-learning-based adaptive bitrate algorithm specifically tailored for 3DGS video streaming, achieving optimal performance across varying network conditions. Extensive experiments demonstrate that our proposed approaches significantly outperform state-of-the-art methods.\n\n三维高斯点渲染视频（3DGS）流媒体因其在呈现沉浸式三维视频体验方面的卓越能力，近年来已成为学术界和工业界的研究热点。然而，该领域研究仍处于早期阶段，切片、质量评估和码率自适应等若干基础性问题亟需深入探索。本文针对这些挑战提出了一整套综合性解决方案。具体而言，我们提出了一种基于显著性分析的自适应 3DGS 切片技术，融合了空间和时间特征；每个切片被编码为具备专用形变场和多种质量等级的版本，以供自适应选择。我们还引入了一种新颖的 3DGS 视频质量评估框架，联合评估流媒体传输过程中 3DGS 表示的空间域退化及其渲染生成的二维图像质量。此外，我们设计了一种基于元学习的 3DGS 视频流媒体码率自适应算法，能够在不同网络条件下实现最优性能。大量实验结果表明，所提出的方法在多个方面均显著优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2507.14505.md",
    "content": "### DCHM: Depth-Consistent Human Modeling for Multiview Detection\n\nMultiview pedestrian detection typically involves two stages: human modeling and pedestrian localization. Human modeling represents pedestrians in 3D space by fusing multiview information, making its quality crucial for detection accuracy. However, existing methods often introduce noise and have low precision. While some approaches reduce noise by fitting on costly multiview 3D annotations, they often struggle to generalize across diverse scenes. To eliminate reliance on human-labeled annotations and accurately model humans, we propose Depth-Consistent Human Modeling (DCHM), a framework designed for consistent depth estimation and multiview fusion in global coordinates. Specifically, our proposed pipeline with superpixel-wise Gaussian Splatting achieves multiview depth consistency in sparse-view, large-scaled, and crowded scenarios, producing precise point clouds for pedestrian localization. Extensive validations demonstrate that our method significantly reduces noise during human modeling, outperforming previous state-of-the-art baselines. Additionally, to our knowledge, DCHM is the first to reconstruct pedestrians and perform multiview segmentation in such a challenging setting.\n\n多视角行人检测通常包括两个阶段：行人建模和行人定位。行人建模通过融合多视角信息在三维空间中表示行人，其质量对检测精度至关重要。然而，现有方法常引入噪声且精度较低。虽然一些方法通过拟合代价高昂的多视角三维标注来降低噪声，但往往难以在多样化场景中具备良好的泛化能力。为消除对人工标注的依赖并精确建模行人，我们提出了深度一致性行人建模（Depth-Consistent Human Modeling, DCHM），该框架旨在实现全局坐标系下的一致深度估计与多视角融合。具体而言，我们提出的基于超像素级高斯点渲染（superpixel-wise Gaussian Splatting）的处理流程，在稀视角、大规模及拥挤场景中实现了多视角深度一致性，生成用于行人定位的精确点云。大量验证结果表明，该方法在行人建模过程中显著降低了噪声，性能优于以往的最新基线方法。此外，据我们所知，DCHM 是首个在如此具有挑战性的环境中同时实现行人重建与多视角分割的方法。\n"
  },
  {
    "path": "abs/2507.14921.md",
    "content": "### Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction\n\nGeneralizable 3D Gaussian Splatting reconstruction showcases advanced Image-to-3D content creation but requires substantial computational resources and large datasets, posing challenges to training models from scratch. Current methods usually entangle the prediction of 3D Gaussian geometry and appearance, which rely heavily on data-driven priors and result in slow regression speeds. To address this, we propose  Stereo-GS, a disentangled framework for efficient 3D Gaussian prediction. Our method extracts features from local image pairs using a stereo vision backbone and fuses them via global attention blocks. Dedicated point and Gaussian prediction heads generate multi-view point-maps for geometry and Gaussian features for appearance, combined as GS-maps to represent the 3DGS object. A refinement network enhances these GS-maps for high-quality reconstruction. Unlike existing methods that depend on camera parameters, our approach achieves pose-free 3D reconstruction, improving robustness and practicality. By reducing resource demands while maintaining high-quality outputs,  Stereo-GS provides an efficient, scalable solution for real-world 3D content generation.\n\n具有泛化能力的三维高斯点渲染（3D Gaussian Splatting, 3DGS）重建展示了先进的图像到三维内容生成能力，但其训练通常需要大量计算资源和海量数据集，这对从零开始训练模型带来了挑战。现有方法通常将三维高斯的几何与外观预测耦合在一起，过度依赖数据驱动的先验，并导致回归速度缓慢。为此，我们提出了 Stereo-GS，这是一种高效的三维高斯解耦预测框架。该方法利用双目视觉骨干网络从局部图像对中提取特征，并通过全局注意力模块进行融合；专用的点预测头与高斯预测头分别生成用于几何的多视角点图（point-maps）和用于外观的高斯特征，这些特征组合成 GS-maps 以表示 3DGS 对象。随后，精化网络对 GS-maps 进行增强，以实现高质量重建。与依赖相机参数的现有方法不同，我们的方法能够实现无位姿的三维重建，从而提升了鲁棒性与实用性。在降低资源需求的同时保持高质量输出，Stereo-GS 为现实世界的三维内容生成提供了一种高效且可扩展的解决方案。\n"
  },
  {
    "path": "abs/2507.15454.md",
    "content": "### ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting\n\n3D Gaussian Splatting is renowned for its high-fidelity reconstructions and real-time novel view synthesis, yet its lack of semantic understanding limits object-level perception. In this work, we propose ObjectGS, an object-aware framework that unifies 3D scene reconstruction with semantic understanding. Instead of treating the scene as a unified whole, ObjectGS models individual objects as local anchors that generate neural Gaussians and share object IDs, enabling precise object-level reconstruction. During training, we dynamically grow or prune these anchors and optimize their features, while a one-hot ID encoding with a classification loss enforces clear semantic constraints. We show through extensive experiments that ObjectGS not only outperforms state-of-the-art methods on open-vocabulary and panoptic segmentation tasks, but also integrates seamlessly with applications like mesh extraction and scene editing. o\n\n三维高斯点渲染（3D Gaussian Splatting）以其高保真重建和实时新视角合成而闻名，但缺乏语义理解能力，限制了其在物体级感知中的应用。在本研究中，我们提出了 ObjectGS，这是一种融合三维场景重建与语义理解的物体感知框架。与将整个场景视为统一整体的方式不同，ObjectGS 将单个物体建模为生成神经高斯并共享物体 ID 的局部锚点，从而实现精确的物体级重建。在训练过程中，我们动态地增加或剪除这些锚点并优化其特征，同时通过带有分类损失的独热编码（one-hot ID encoding）施加明确的语义约束。大量实验表明，ObjectGS 在开放词汇和全景分割任务上均优于当前最先进的方法，并且能够与网格提取、场景编辑等应用无缝结合。\n"
  },
  {
    "path": "abs/2507.15602.md",
    "content": "### SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting\n\nSurface reconstruction and novel view rendering from sparse-view images are challenging. Signed Distance Function (SDF)-based methods struggle with fine details, while 3D Gaussian Splatting (3DGS)-based approaches lack global geometry coherence. We propose a novel hybrid method that combines the strengths of both approaches: SDF captures coarse geometry to enhance 3DGS-based rendering, while newly rendered images from 3DGS refine the details of SDF for accurate surface reconstruction. As a result, our method surpasses state-of-the-art approaches in surface reconstruction and novel view synthesis on the DTU and MobileBrick datasets.\n\n从稀视角图像进行表面重建与新视角渲染是一项具有挑战性的任务。基于有符号距离函数（Signed Distance Function, SDF）的方法在捕捉细节方面存在不足，而基于三维高斯点渲染（3D Gaussian Splatting, 3DGS）的方法则缺乏全局几何一致性。为此，我们提出了一种结合两者优势的新型混合方法：SDF 用于捕捉粗略几何结构以提升基于 3DGS 的渲染效果，而 3DGS 渲染生成的新图像则用于细化 SDF，从而实现精确的表面重建。实验结果表明，该方法在 DTU 和 MobileBrick 数据集上的表面重建与新视角合成任务中均超越了当前最先进的方法。\n"
  },
  {
    "path": "abs/2507.15629.md",
    "content": "### Gaussian Splatting with Discretized SDF for Relightable Assets\n\n3D Gaussian splatting (3DGS) has shown its detailed expressive ability and highly efficient rendering speed in the novel view synthesis (NVS) task. The application to inverse rendering still faces several challenges, as the discrete nature of Gaussian primitives makes it difficult to apply geometry constraints. Recent works introduce the signed distance field (SDF) as an extra continuous representation to regularize the geometry defined by Gaussian primitives. It improves the decomposition quality, at the cost of increasing memory usage and complicating training. Unlike these works, we introduce a discretized SDF to represent the continuous SDF in a discrete manner by encoding it within each Gaussian using a sampled value. This approach allows us to link the SDF with the Gaussian opacity through an SDF-to-opacity transformation, enabling rendering the SDF via splatting and avoiding the computational cost of ray this http URL key challenge is to regularize the discrete samples to be consistent with the underlying SDF, as the discrete representation can hardly apply the gradient-based constraints (e.g., Eikonal loss). For this, we project Gaussians onto the zero-level set of SDF and enforce alignment with the surface from splatting, namely a projection-based consistency loss. Thanks to the discretized SDF, our method achieves higher relighting quality, while requiring no extra memory beyond GS and avoiding complex manually designed optimization. The experiments reveal that our method outperforms existing Gaussian-based inverse rendering methods.\n\n三维高斯点渲染（3D Gaussian Splatting, 3DGS）在新视角合成（NVS）任务中展现了细致的表达能力和高效的渲染速度。然而，其在逆向渲染中的应用仍面临诸多挑战，因为高斯基元的离散性使得几何约束难以施加。现有一些工作引入有符号距离场（Signed Distance Field, SDF）作为额外的连续表示，以对高斯基元定义的几何进行正则化，这虽然提升了解耦质量，但也增加了内存占用并使训练过程更加复杂。不同于这些方法，我们提出了一种离散化的 SDF，将连续的 SDF 以离散方式表示，即在每个高斯中编码一个采样值。这种方法通过 SDF 到不透明度的变换（SDF-to-opacity transformation）将 SDF 与高斯的不透明度关联起来，从而可通过点渲染（splatting）实现 SDF 的渲染，并避免光线追踪的计算开销。关键挑战在于如何使离散采样与底层 SDF 保持一致，因为离散表示难以直接应用基于梯度的约束（例如 Eikonal 损失）。为此，我们将高斯投影到 SDF 的零水平集，并通过投影一致性损失（projection-based consistency loss）强制其与点渲染得到的表面对齐。得益于离散化 SDF，我们的方法在无需额外 GS 之外的内存、且避免复杂人工设计优化的情况下，实现了更高质量的重光照效果。实验结果表明，该方法在性能上优于现有基于高斯的逆向渲染方法。\n"
  },
  {
    "path": "abs/2507.15683.md",
    "content": "### Hi^2-GSLoc: Dual-Hierarchical Gaussian-Specific Visual Relocalization for Remote Sensing\n\nVisual relocalization, which estimates the 6-degree-of-freedom (6-DoF) camera pose from query images, is fundamental to remote sensing and UAV applications. Existing methods face inherent trade-offs: image-based retrieval and pose regression approaches lack precision, while structure-based methods that register queries to Structure-from-Motion (SfM) models suffer from computational complexity and limited scalability. These challenges are particularly pronounced in remote sensing scenarios due to large-scale scenes, high altitude variations, and domain gaps of existing visual priors. To overcome these limitations, we leverage 3D Gaussian Splatting (3DGS) as a novel scene representation that compactly encodes both 3D geometry and appearance. We introduce Hi2-GSLoc, a dual-hierarchical relocalization framework that follows a sparse-to-dense and coarse-to-fine paradigm, fully exploiting the rich semantic information and geometric constraints inherent in Gaussian primitives. To handle large-scale remote sensing scenarios, we incorporate partitioned Gaussian training, GPU-accelerated parallel matching, and dynamic memory management strategies. Our approach consists of two stages: (1) a sparse stage featuring a Gaussian-specific consistent render-aware sampling strategy and landmark-guided detector for robust and accurate initial pose estimation, and (2) a dense stage that iteratively refines poses through coarse-to-fine dense rasterization matching while incorporating reliability verification. Through comprehensive evaluation on simulation data, public datasets, and real flight experiments, we demonstrate that our method delivers competitive localization accuracy, recall rate, and computational efficiency while effectively filtering unreliable pose estimates. The results confirm the effectiveness of our approach for practical remote sensing applications.\n\n视觉重定位（Visual Relocalization）旨在根据查询图像估计六自由度（6-DoF）相机位姿，是遥感和无人机应用中的关键任务。现有方法存在固有的权衡：基于图像的检索与位姿回归方法精度不足，而基于结构的方法（将查询图像注册到结构自运动 Structure-from-Motion, SfM 模型）则存在计算复杂度高、可扩展性有限的问题。在遥感场景中，这些挑战尤为突出，因为场景规模巨大、高度变化显著，且现有视觉先验存在领域差异。为克服这些限制，我们引入三维高斯点渲染（3D Gaussian Splatting, 3DGS）作为一种新型场景表示方式，能够紧凑地同时编码三维几何与外观信息。我们提出了 Hi2-GSLoc，这是一种双层级重定位框架，遵循由稀到密、由粗到精的范式，充分利用高斯基元所蕴含的丰富语义信息与几何约束。针对大规模遥感场景，我们引入了分区高斯训练、GPU 加速的并行匹配以及动态内存管理策略。该方法包含两个阶段：（1）稀疏阶段，采用特定于高斯的一致渲染感知采样策略与地标引导检测器，以实现稳健且精确的初始位姿估计；（2）稠密阶段，通过由粗到精的稠密光栅化匹配迭代优化位姿，并结合可靠性验证机制。通过在模拟数据、公共数据集以及实飞实验中的全面评估，我们的方法在定位精度、召回率和计算效率方面均表现出竞争力，并能有效筛除不可靠的位姿估计。结果验证了该方法在实际遥感应用中的有效性。\n"
  },
  {
    "path": "abs/2507.15690.md",
    "content": "### DWTGS: Rethinking Frequency Regularization for Sparse-view 3D Gaussian Splatting\n\nSparse-view 3D Gaussian Splatting (3DGS) presents significant challenges in reconstructing high-quality novel views, as it often overfits to the widely-varying high-frequency (HF) details of the sparse training views. While frequency regularization can be a promising approach, its typical reliance on Fourier transforms causes difficult parameter tuning and biases towards detrimental HF learning. We propose DWTGS, a framework that rethinks frequency regularization by leveraging wavelet-space losses that provide additional spatial supervision. Specifically, we supervise only the low-frequency (LF) LL subbands at multiple DWT levels, while enforcing sparsity on the HF HH subband in a self-supervised manner. Experiments across benchmarks show that DWTGS consistently outperforms Fourier-based counterparts, as this LF-centric strategy improves generalization and reduces HF hallucinations.\n\n稀视角三维高斯点渲染（Sparse-view 3D Gaussian Splatting, 3DGS）在重建高质量新视角时面临巨大挑战，因为它常常会过拟合于稀疏训练视角中差异较大的高频（HF）细节。尽管频域正则化是一种有前景的解决思路，但其典型做法依赖傅里叶变换，导致参数调节困难，并且易偏向有害的高频学习。为此，我们提出了 DWTGS，这一框架通过引入小波域损失重新思考频域正则化，从而提供额外的空间监督。具体来说，我们在多个离散小波变换（DWT）层级上仅监督低频（LF）的 LL 子带，同时以自监督的方式对高频（HF）的 HH 子带施加稀疏性约束。基准数据集上的实验表明，DWTGS 在性能上始终优于基于傅里叶的方法，因为这种以低频为核心的策略能够提升泛化能力并减少高频幻觉。\n"
  },
  {
    "path": "abs/2507.15748.md",
    "content": "### Appearance Harmonization via Bilateral Grid Prediction with Transformers for 3DGS\n\nModern camera pipelines apply extensive on-device processing, such as exposure adjustment, white balance, and color correction, which, while beneficial individually, often introduce photometric inconsistencies across views. These appearance variations violate multi-view consistency and degrade the quality of novel view synthesis. Joint optimization of scene representations and per-image appearance embeddings has been proposed to address this issue, but at the cost of increased computational complexity and slower training. In this work, we propose a transformer-based method that predicts spatially adaptive bilateral grids to correct photometric variations in a multi-view consistent manner, enabling robust cross-scene generalization without the need for scene-specific retraining. By incorporating the learned grids into the 3D Gaussian Splatting pipeline, we improve reconstruction quality while maintaining high training efficiency. Extensive experiments show that our approach outperforms or matches existing scene-specific optimization methods in reconstruction fidelity and convergence speed.\n\n现代相机处理流程通常会进行大量的设备端处理，例如曝光调整、白平衡和颜色校正，这些处理虽在单独应用时各有益处，但往往会在不同视角间引入光度不一致。这类外观变化破坏了多视图一致性，并降低了新视角合成的质量。为解决这一问题，已有研究提出将场景表示与逐图像的外观嵌入进行联合优化，但这会增加计算复杂度并减慢训练速度。在本研究中，我们提出了一种基于 Transformer 的方法，用于预测空间自适应的双边网格，从而以多视图一致的方式校正光度变化，实现无需针对特定场景重新训练的稳健跨场景泛化。通过将所学习的网格融合到 3D 高斯点渲染（3D Gaussian Splatting）流程中，我们在保持高训练效率的同时提升了重建质量。大量实验表明，我们的方法在重建保真度和收敛速度方面优于或可媲美现有的特定场景优化方法。\n"
  },
  {
    "path": "abs/2507.15979.md",
    "content": "### Dream, Lift, Animate: From Single Images to Animatable Gaussian Avatars\n\nWe introduce Dream, Lift, Animate (DLA), a novel framework that reconstructs animatable 3D human avatars from a single image. This is achieved by leveraging multi-view generation, 3D Gaussian lifting, and pose-aware UV-space mapping of 3D Gaussians. Given an image, we first dream plausible multi-views using a video diffusion model, capturing rich geometric and appearance details. These views are then lifted into unstructured 3D Gaussians. To enable animation, we propose a transformer-based encoder that models global spatial relationships and projects these Gaussians into a structured latent representation aligned with the UV space of a parametric body model. This latent code is decoded into UV-space Gaussians that can be animated via body-driven deformation and rendered conditioned on pose and viewpoint. By anchoring Gaussians to the UV manifold, our method ensures consistency during animation while preserving fine visual details. DLA enables real-time rendering and intuitive editing without requiring post-processing. Our method outperforms state-of-the-art approaches on ActorsHQ and 4D-Dress datasets in both perceptual quality and photometric accuracy. By combining the generative strengths of video diffusion models with a pose-aware UV-space Gaussian mapping, DLA bridges the gap between unstructured 3D representations and high-fidelity, animation-ready avatars.\n\n我们提出了 Dream、Lift、Animate（DLA）这一新型框架，可从单张图像重建可动画的三维人类虚拟形象。这一过程依托多视图生成、三维高斯提升（3D Gaussian lifting）以及基于姿态感知的 UV 空间三维高斯映射。给定一张图像，我们首先利用视频扩散模型生成合理的多视图图像（dream），以捕获丰富的几何和外观细节。随后，这些视图被提升为非结构化的三维高斯表示。为实现动画，我们提出了一种基于 Transformer 的编码器，用于建模全局空间关系，并将这些高斯投射到与参数化人体模型 UV 空间对齐的结构化潜在表示中。该潜在编码可解码为 UV 空间高斯，通过身体驱动的形变实现动画，并根据姿态与视角进行渲染。通过将高斯锚定在 UV 流形上，我们的方法在动画过程中确保一致性，同时保留细腻的视觉细节。DLA 支持实时渲染与直观\n"
  },
  {
    "path": "abs/2507.16144.md",
    "content": "### LongSplat: Online Generalizable 3D Gaussian Splatting from Long Sequence Images\n\n3D Gaussian Splatting achieves high-fidelity novel view synthesis, but its application to online long-sequence scenarios is still limited. Existing methods either rely on slow per-scene optimization or fail to provide efficient incremental updates, hindering continuous performance. In this paper, we propose LongSplat, an online real-time 3D Gaussian reconstruction framework designed for long-sequence image input. The core idea is a streaming update mechanism that incrementally integrates current-view observations while selectively compressing redundant historical Gaussians. Crucial to this mechanism is our Gaussian-Image Representation (GIR), a representation that encodes 3D Gaussian parameters into a structured, image-like 2D format. GIR simultaneously enables efficient fusion of current-view and historical Gaussians and identity-aware redundancy compression. These functions enable online reconstruction and adapt the model to long sequences without overwhelming memory or computational costs. Furthermore, we leverage an existing image compression method to guide the generation of more compact and higher-quality 3D Gaussians. Extensive evaluations demonstrate that LongSplat achieves state-of-the-art efficiency-quality trade-offs in real-time novel view synthesis, delivering real-time reconstruction while reducing Gaussian counts by 44% compared to existing per-pixel Gaussian prediction methods.\n\n三维高斯点渲染（3D Gaussian Splatting）在新视角合成中实现了高保真度，但其在在线长序列场景中的应用仍受限。现有方法要么依赖于缓慢的逐场景优化，要么无法实现高效的增量更新，从而阻碍了持续性能的发挥。本文提出了 LongSplat，这是一种面向长序列图像输入的在线实时三维高斯重建框架。其核心思想是基于流式更新机制，在逐步融合当前视角观测的同时，有选择地压缩冗余的历史高斯。支撑该机制的关键是我们提出的高斯-图像表示（Gaussian-Image Representation, GIR），这种表示方式将三维高斯参数编码为结构化、类图像的二维格式。GIR 同时支持高效融合当前视角与历史高斯，以及基于身份感知的冗余压缩。这些功能使得在线重建成为可能，并使模型能够适应长序列输入，而不会导致内存或计算成本过高。此外，我们利用现有的图像压缩方法，引导生成更加紧凑且质量更高的三维高斯。大量评估结果表明，LongSplat 在实时新视角合成中实现了当前最优的效率与质量平衡，不仅能够实现实时重建，还可在与现有逐像素高斯预测方法相比的情况下，将高斯数量减少 44%。\n"
  },
  {
    "path": "abs/2507.16608.md",
    "content": "### Dyna3DGR: 4D Cardiac Motion Tracking with Dynamic 3D Gaussian Representation\n\nAccurate analysis of cardiac motion is crucial for evaluating cardiac function. While dynamic cardiac magnetic resonance imaging (CMR) can capture detailed tissue motion throughout the cardiac cycle, the fine-grained 4D cardiac motion tracking remains challenging due to the homogeneous nature of myocardial tissue and the lack of distinctive features. Existing approaches can be broadly categorized into image based and representation-based, each with its limitations. Image-based methods, including both raditional and deep learning-based registration approaches, either struggle with topological consistency or rely heavily on extensive training data. Representation-based methods, while promising, often suffer from loss of image-level details. To address these limitations, we propose Dynamic 3D Gaussian Representation (Dyna3DGR), a novel framework that combines explicit 3D Gaussian representation with implicit neural motion field modeling. Our method simultaneously optimizes cardiac structure and motion in a self-supervised manner, eliminating the need for extensive training data or point-to-point correspondences. Through differentiable volumetric rendering, Dyna3DGR efficiently bridges continuous motion representation with image-space alignment while preserving both topological and temporal consistency. Comprehensive evaluations on the ACDC dataset demonstrate that our approach surpasses state-of-the-art deep learning-based diffeomorphic registration methods in tracking accuracy. o\n\n对心脏运动的精准分析对于评估心脏功能至关重要。尽管动态心脏磁共振成像（CMR）能够在整个心动周期中捕捉到详细的组织运动，但由于心肌组织的均质性和缺乏显著特征，精细的四维心脏运动跟踪仍然充满挑战。现有方法大体可分为基于图像的方法与基于表示的方法，两者各有局限。基于图像的方法，包括传统配准和基于深度学习的配准方法，要么在保持拓扑一致性方面存在困难，要么严重依赖大量训练数据。基于表示的方法虽然具有潜力，但往往会丢失图像级细节。为克服这些不足，我们提出了动态三维高斯表示（Dynamic 3D Gaussian Representation, Dyna3DGR），这一新框架将显式的三维高斯表示与隐式的神经运动场建模相结合。我们的方法以自监督的方式同时优化心脏结构与运动，无需大量训练数据或点对点对应关系。通过可微分的体渲染，Dyna3DGR 高效地将连续运动表示与图像空间对齐相结合，同时保持拓扑和时间一致性。在 ACDC 数据集上的综合评估表明，我们的方法在跟踪精度方面优于当前最先进的基于深度学习的可微形变配准方法。\n"
  },
  {
    "path": "abs/2507.17029.md",
    "content": "### StreamME: Simplify 3D Gaussian Avatar within Live Stream\n\nWe propose StreamME, a method focuses on fast 3D avatar reconstruction. The StreamME synchronously records and reconstructs a head avatar from live video streams without any pre-cached data, enabling seamless integration of the reconstructed appearance into downstream applications. This exceptionally fast training strategy, which we refer to as on-the-fly training, is central to our approach. Our method is built upon 3D Gaussian Splatting (3DGS), eliminating the reliance on MLPs in deformable 3DGS and relying solely on geometry, which significantly improves the adaptation speed to facial expression. To further ensure high efficiency in on-the-fly training, we introduced a simplification strategy based on primary points, which distributes the point clouds more sparsely across the facial surface, optimizing points number while maintaining rendering quality. Leveraging the on-the-fly training capabilities, our method protects the facial privacy and reduces communication bandwidth in VR system or online conference. Additionally, it can be directly applied to downstream application such as animation, toonify, and relighting.\n\n我们提出了 StreamME，这是一种专注于快速三维虚拟形象重建的方法。StreamME 可在无任何预缓存数据的情况下，从实时视频流中同步录制并重建头部虚拟形象，使重建后的外观能够无缝集成到下游应用中。这种极快的训练策略（我们称之为“即时训练”）是我们方法的核心。我们的方法基于三维高斯点渲染（3D Gaussian Splatting, 3DGS），去除了可变形 3DGS 中对 MLP 的依赖，仅依靠几何信息，大幅提升了对面部表情的适应速度。为进一步保证即时训练的高效性，我们引入了一种基于主点的简化策略，使点云在面部表面分布得更加稀疏，在优化点数的同时保持渲染质量。借助即时训练能力，我们的方法能够在 VR 系统或在线会议中保护面部隐私并减少通信带宽。此外，它还可直接应用于动画、卡通化（toonify）和重光照等下游任务。\n"
  },
  {
    "path": "abs/2507.17336.md",
    "content": "### Temporal Smoothness-Aware Rate-Distortion Optimized 4D Gaussian Splatting\n\nDynamic 4D Gaussian Splatting (4DGS) effectively extends the high-speed rendering capabilities of 3D Gaussian Splatting (3DGS) to represent volumetric videos. However, the large number of Gaussians, substantial temporal redundancies, and especially the absence of an entropy-aware compression framework result in large storage requirements. Consequently, this poses significant challenges for practical deployment, efficient edge-device processing, and data transmission. In this paper, we introduce a novel end-to-end RD-optimized compression framework tailored for 4DGS, aiming to enable flexible, high-fidelity rendering across varied computational platforms. Leveraging Fully Explicit Dynamic Gaussian Splatting (Ex4DGS), one of the state-of-the-art 4DGS methods, as our baseline, we start from the existing 3DGS compression methods for compatibility while effectively addressing additional challenges introduced by the temporal axis. In particular, instead of storing motion trajectories independently per point, we employ a wavelet transform to reflect the real-world smoothness prior, significantly enhancing storage efficiency. This approach yields significantly improved compression ratios and provides a user-controlled balance between compression efficiency and rendering quality. Extensive experiments demonstrate the effectiveness of our method, achieving up to 91x compression compared to the original Ex4DGS model while maintaining high visual fidelity. These results highlight the applicability of our framework for real-time dynamic scene rendering in diverse scenarios, from resource-constrained edge devices to high-performance environments.\n\n动态四维高斯点渲染（Dynamic 4D Gaussian Splatting, 4DGS）有效地将三维高斯点渲染（3DGS）的高速渲染能力扩展至体积视频的表示。然而，大量的高斯点、显著的时间冗余，尤其是缺乏熵感知压缩框架，导致其对存储的需求非常高。这为实际部署、高效的边缘设备处理以及数据传输带来了巨大挑战。本文提出了一种面向 4DGS 的新型端到端率失真（RD）优化压缩框架，旨在实现跨不同计算平台的灵活、高保真渲染。我们以当前最先进的 4DGS 方法之一——全显式动态高斯点渲染（Fully Explicit Dynamic Gaussian Splatting, Ex4DGS）为基线，在兼容现有 3DGS 压缩方法的基础上，有效解决了时间轴引入的额外挑战。具体而言，我们并非为每个点独立存储运动轨迹，而是采用小波变换以反映真实世界的平滑性先验，从而显著提升存储效率。该方法在压缩比方面取得了显著提升，并为用户提供了压缩效率与渲染质量之间的可控平衡。大量实验表明，我们的方法在保持高视觉保真度的同时，相比原始 Ex4DGS 模型实现了最高可达 91 倍的压缩。这些结果凸显了该框架在从资源受限的边缘设备到高性能环境的多种场景下，进行实时动态场景渲染的适用性。\n"
  },
  {
    "path": "abs/2507.18023.md",
    "content": "### High-fidelity 3D Gaussian Inpainting: preserving multi-view consistency and photorealistic details\n\nRecent advancements in multi-view 3D reconstruction and novel-view synthesis, particularly through Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have greatly enhanced the fidelity and efficiency of 3D content creation. However, inpainting 3D scenes remains a challenging task due to the inherent irregularity of 3D structures and the critical need for maintaining multi-view consistency. In this work, we propose a novel 3D Gaussian inpainting framework that reconstructs complete 3D scenes by leveraging sparse inpainted views. Our framework incorporates an automatic Mask Refinement Process and region-wise Uncertainty-guided Optimization. Specifically, we refine the inpainting mask using a series of operations, including Gaussian scene filtering and back-projection, enabling more accurate localization of occluded regions and realistic boundary restoration. Furthermore, our Uncertainty-guided Fine-grained Optimization strategy, which estimates the importance of each region across multi-view images during training, alleviates multi-view inconsistencies and enhances the fidelity of fine details in the inpainted results. Comprehensive experiments conducted on diverse datasets demonstrate that our approach outperforms existing state-of-the-art methods in both visual quality and view consistency.\n\n多视图三维重建与新视角合成的最新进展，尤其是通过神经辐射场（Neural Radiance Fields, NeRF）和三维高斯点渲染（3D Gaussian Splatting, 3DGS），极大提升了三维内容创作的保真度与效率。然而，由于三维结构固有的不规则性，以及保持多视图一致性的关键需求，三维场景修补（inpainting）依然是一项具有挑战性的任务。本文提出了一种新颖的三维高斯修补框架，通过利用稀疏的修补视图来重建完整的三维场景。该框架引入了自动化掩码优化流程（Mask Refinement Process）与基于区域的不确定性感知优化（Uncertainty-guided Optimization）。具体而言，我们通过一系列操作（包括高斯场景滤波与反向投影）来优化修补掩码，从而更准确地定位被遮挡区域并真实恢复边界。此外，我们提出的不确定性感知细粒度优化策略（Uncertainty-guided Fine-grained Optimization），在训练过程中估计多视图图像中各区域的重要性，从而缓解多视图不一致问题，并提升修补结果在细节上的保真度。在多个数据集上的综合实验表明，我们的方法在视觉质量与视角一致性方面均优于现有的最先进方法。\n"
  },
  {
    "path": "abs/2507.18155.md",
    "content": "### GeoAvatar: Adaptive Geometrical Gaussian Splatting for 3D Head Avatar\n\nDespite recent progress in 3D head avatar generation, balancing identity preservation, i.e., reconstruction, with novel poses and expressions, i.e., animation, remains a challenge. Existing methods struggle to adapt Gaussians to varying geometrical deviations across facial regions, resulting in suboptimal quality. To address this, we propose GeoAvatar, a framework for adaptive geometrical Gaussian Splatting. GeoAvatar leverages Adaptive Pre-allocation Stage (APS), an unsupervised method that segments Gaussians into rigid and flexible sets for adaptive offset regularization. Then, based on mouth anatomy and dynamics, we introduce a novel mouth structure and the part-wise deformation strategy to enhance the animation fidelity of the mouth. Finally, we propose a regularization loss for precise rigging between Gaussians and 3DMM faces. Moreover, we release DynamicFace, a video dataset with highly expressive facial motions. Extensive experiments show the superiority of GeoAvatar compared to state-of-the-art methods in reconstruction and novel animation scenarios.\n\n尽管三维头部虚拟形象生成领域取得了显著进展，但在保持身份特征（即重建）与生成新姿态和表情（即动画）之间的平衡方面仍面临挑战。现有方法在适应面部不同区域几何偏差方面表现不足，导致质量欠佳。为此，我们提出了 GeoAvatar，这是一种自适应几何高斯点渲染框架。GeoAvatar 引入了自适应预分配阶段（Adaptive Pre-allocation Stage, APS），这是一种无监督方法，可将高斯划分为刚性集与柔性集，以实现自适应偏移正则化。随后，我们基于口腔的解剖结构与动态特性，提出了一种新颖的口腔结构及分部形变策略，以提升口部动画的保真度。最后，我们提出了一种正则化损失，用于在高斯与三维形状可变模型（3DMM）面部之间实现精确绑定。此外，我们还发布了 DynamicFace 数据集，该视频数据集包含高度丰富的面部表情动态。大量实验表明，GeoAvatar 在重建和新颖动画场景中均优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2507.18231.md",
    "content": "### PS-GS: Gaussian Splatting for Multi-View Photometric Stereo\n\nIntegrating inverse rendering with multi-view photometric stereo (MVPS) yields more accurate 3D reconstructions than the inverse rendering approaches that rely on fixed environment illumination. However, efficient inverse rendering with MVPS remains challenging. To fill this gap, we introduce the Gaussian Splatting for Multi-view Photometric Stereo (PS-GS), which efficiently and jointly estimates the geometry, materials, and lighting of the object that is illuminated by diverse directional lights (multi-light). Our method first reconstructs a standard 2D Gaussian splatting model as the initial geometry. Based on the initialization model, it then proceeds with the deferred inverse rendering by the full rendering equation containing a lighting-computing multi-layer perceptron. During the whole optimization, we regularize the rendered normal maps by the uncalibrated photometric stereo estimated normals. We also propose the 2D Gaussian ray-tracing for single directional light to refine the incident lighting. The regularizations and the use of multi-view and multi-light images mitigate the ill-posed problem of inverse rendering. After optimization, the reconstructed object can be used for novel-view synthesis, relighting, and material and shape editing. Experiments on both synthetic and real datasets demonstrate that our method outperforms prior works in terms of reconstruction accuracy and computational efficiency.\n\n将逆向渲染与多视图光度立体（Multi-view Photometric Stereo, MVPS）相结合，比依赖固定环境光照的逆向渲染方法能够获得更精确的三维重建。然而，高效地将逆向渲染与 MVPS 融合仍然具有挑战性。为填补这一空白，我们提出了用于多视图光度立体的高斯点渲染方法（Gaussian Splatting for Multi-view Photometric Stereo, PS-GS），该方法能够高效地联合估计由多方向光照（multi-light）照亮的物体的几何、材质与光照信息。我们的方法首先重建标准的二维高斯点渲染模型作为初始几何结构；在此基础上，利用包含光照计算多层感知机的完整渲染方程进行延迟逆向渲染。在整个优化过程中，我们使用由非标定光度立体估计得到的法线对渲染法线图进行正则化。此外，我们提出了针对单一方向光照的二维高斯光线追踪方法，以优化入射光照的精度。这些正则化策略以及多视图、多光源图像的利用，有效缓解了逆向渲染中的病态问题。优化完成后，重建的物体可用于新视角合成、重光照以及材质与形状编辑。在合成与真实数据集上的实验表明，我们的方法在重建精度与计算效率方面均优于现有方法。\n"
  },
  {
    "path": "abs/2507.18344.md",
    "content": "### G2S-ICP SLAM: Geometry-aware Gaussian Splatting ICP SLAM\n\nIn this paper, we present a novel geometry-aware RGB-D Gaussian Splatting SLAM system, named G2S-ICP SLAM. The proposed method performs high-fidelity 3D reconstruction and robust camera pose tracking in real-time by representing each scene element using a Gaussian distribution constrained to the local tangent plane. This effectively models the local surface as a 2D Gaussian disk aligned with the underlying geometry, leading to more consistent depth interpretation across multiple viewpoints compared to conventional 3D ellipsoid-based representations with isotropic uncertainty. To integrate this representation into the SLAM pipeline, we embed the surface-aligned Gaussian disks into a Generalized ICP framework by introducing anisotropic covariance prior without altering the underlying registration formulation. Furthermore we propose a geometry-aware loss that supervises photometric, depth, and normal consistency. Our system achieves real-time operation while preserving both visual and geometric fidelity. Extensive experiments on the Replica and TUM-RGBD datasets demonstrate that G2S-ICP SLAM outperforms prior SLAM systems in terms of localization accuracy, reconstruction completeness, while maintaining the rendering quality.\n\n本文提出了一种新型的几何感知 RGB-D 高斯点渲染 SLAM 系统，称为 G2S-ICP SLAM。该方法通过将每个场景元素表示为约束在局部切平面上的高斯分布，实现了实时的高保真三维重建与鲁棒的相机位姿跟踪。这种表示方式将局部表面有效建模为与底层几何对齐的二维高斯圆盘，与传统的各向同性不确定性的三维椭球表示相比，在多视角下能够提供更一致的深度解释。为将该表示集成到 SLAM 流程中，我们将表面对齐的高斯圆盘嵌入到广义 ICP 框架中，并引入各向异性协方差先验，而无需更改底层配准公式。此外，我们提出了一种几何感知损失，用于监督光度、一致性深度与法线一致性。我们的系统在保持视觉与几何保真度的同时，实现了实时运行。在 Replica 和 TUM-RGBD 数据集上的大量实验表明，G2S-ICP SLAM 在定位精度、重建完整性以及渲染质量方面均优于现有 SLAM 系统。\n"
  },
  {
    "path": "abs/2507.18371.md",
    "content": "### MVG4D: Image Matrix-Based Multi-View and Motion Generation for 4D Content Creation from a Single Image\n\nAdvances in generative modeling have significantly enhanced digital content creation, extending from 2D images to complex 3D and 4D scenes. Despite substantial progress, producing high-fidelity and temporally consistent dynamic 4D content remains a challenge. In this paper, we propose MVG4D, a novel framework that generates dynamic 4D content from a single still image by combining multi-view synthesis with 4D Gaussian Splatting (4D GS). At its core, MVG4D employs an image matrix module that synthesizes temporally coherent and spatially diverse multi-view images, providing rich supervisory signals for downstream 3D and 4D reconstruction. These multi-view images are used to optimize a 3D Gaussian point cloud, which is further extended into the temporal domain via a lightweight deformation network. Our method effectively enhances temporal consistency, geometric fidelity, and visual realism, addressing key challenges in motion discontinuity and background degradation that affect prior 4D GS-based methods. Extensive experiments on the Objaverse dataset demonstrate that MVG4D outperforms state-of-the-art baselines in CLIP-I, PSNR, FVD, and time efficiency. Notably, it reduces flickering artifacts and sharpens structural details across views and time, enabling more immersive AR/VR experiences. MVG4D sets a new direction for efficient and controllable 4D generation from minimal inputs.\n\n生成式建模的进步显著推动了数字内容创作的发展，从二维图像扩展到复杂的三维和四维场景。尽管取得了巨大进展，高保真且时间一致的动态四维内容生成仍然具有挑战性。本文提出了 MVG4D，这是一种结合多视图合成与四维高斯点渲染（4D Gaussian Splatting, 4D GS）的新型框架，可从单张静态图像生成动态四维内容。MVG4D 的核心是一个图像矩阵模块，用于合成时间一致且空间多样的多视图图像，为后续的三维与四维重建提供丰富的监督信号。这些多视图图像首先用于优化三维高斯点云，然后通过轻量级形变网络将其扩展到时间域。我们的方法有效提升了时间一致性、几何保真度与视觉真实感，解决了以往基于 4D GS 方法中运动不连续和背景退化等关键问题。在 Objaverse 数据集上的大量实验表明，MVG4D 在 CLIP-I、PSNR、FVD 和时间效率方面均优于当前最先进的基线方法。值得注意的是，它减少了闪烁伪影，并在多视角与时间维度上增强了结构细节的锐度，从而带来更具沉浸感的 AR/VR 体验。MVG4D 为从最少输入实现高效、可控的四维生成开辟了新方向。\n"
  },
  {
    "path": "abs/2507.18473.md",
    "content": "### CRUISE: Cooperative Reconstruction and Editing in V2X Scenarios using Gaussian Splatting\n\nVehicle-to-everything (V2X) communication plays a crucial role in autonomous driving, enabling cooperation between vehicles and infrastructure. While simulation has significantly contributed to various autonomous driving tasks, its potential for data generation and augmentation in V2X scenarios remains underexplored. In this paper, we introduce CRUISE, a comprehensive reconstruction-and-synthesis framework designed for V2X driving environments. CRUISE employs decomposed Gaussian Splatting to accurately reconstruct real-world scenes while supporting flexible editing. By decomposing dynamic traffic participants into editable Gaussian representations, CRUISE allows for seamless modification and augmentation of driving scenes. Furthermore, the framework renders images from both ego-vehicle and infrastructure views, enabling large-scale V2X dataset augmentation for training and evaluation. Our experimental results demonstrate that: 1) CRUISE reconstructs real-world V2X driving scenes with high fidelity; 2) using CRUISE improves 3D detection across ego-vehicle, infrastructure, and cooperative views, as well as cooperative 3D tracking on the V2X-Seq benchmark; and 3) CRUISE effectively generates challenging corner cases.\n\n车联网（Vehicle-to-everything, V2X）通信在自动驾驶中发挥着至关重要的作用，使车辆与基础设施之间能够实现协同工作。尽管仿真技术已在多种自动驾驶任务中作出了重要贡献，但其在 V2X 场景中的数据生成与增强潜力仍有待深入挖掘。本文提出了 CRUISE，这是一种面向 V2X 驾驶环境的全方位重建与合成框架。CRUISE 采用分解式高斯点渲染（decomposed Gaussian Splatting）精确重建真实场景，并支持灵活编辑。通过将动态交通参与者分解为可编辑的高斯表示，CRUISE 实现了对驾驶场景的无缝修改与增强。此外，该框架可从自车视角与基础设施视角渲染图像，从而实现大规模 V2X 数据集的增强，用于训练与评估。实验结果表明：（1）CRUISE 能够高保真地重建真实的 V2X 驾驶场景；（2）使用 CRUISE 可提升自车视角、基础设施视角与协同视角下的三维检测性能，以及在 V2X-Seq 基准上的协同三维跟踪性能；（3）CRUISE 能够有效生成具有挑战性的极端场景。\n"
  },
  {
    "path": "abs/2507.18522.md",
    "content": "### GaussianFusionOcc: A Seamless Sensor Fusion Approach for 3D Occupancy Prediction Using 3D Gaussians\n\n3D semantic occupancy prediction is one of the crucial tasks of autonomous driving. It enables precise and safe interpretation and navigation in complex environments. Reliable predictions rely on effective sensor fusion, as different modalities can contain complementary information. Unlike conventional methods that depend on dense grid representations, our approach, GaussianFusionOcc, uses semantic 3D Gaussians alongside an innovative sensor fusion mechanism. Seamless integration of data from camera, LiDAR, and radar sensors enables more precise and scalable occupancy prediction, while 3D Gaussian representation significantly improves memory efficiency and inference speed. GaussianFusionOcc employs modality-agnostic deformable attention to extract essential features from each sensor type, which are then used to refine Gaussian properties, resulting in a more accurate representation of the environment. Extensive testing with various sensor combinations demonstrates the versatility of our approach. By leveraging the robustness of multi-modal fusion and the efficiency of Gaussian representation, GaussianFusionOcc outperforms current state-of-the-art models.\n\n三维语义占用预测是自动驾驶中的关键任务之一，它能够在复杂环境中实现精确且安全的理解与导航。可靠的预测依赖于有效的传感器融合，因为不同模态往往包含互补的信息。与依赖稠密网格表示的传统方法不同，我们提出的 GaussianFusionOcc 方法采用语义三维高斯结合创新的传感器融合机制。通过无缝整合来自摄像头、激光雷达和毫米波雷达的数据，实现了更精确且可扩展的占用预测，而三维高斯表示则显著提升了内存效率与推理速度。GaussianFusionOcc 使用与模态无关的可变形注意力机制，从每种传感器类型中提取关键信息，并用于优化高斯属性，从而生成更精确的环境表示。在多种传感器组合下的大量测试表明，我们的方法具有高度的通用性。借助多模态融合的鲁棒性与高斯表示的高效性，GaussianFusionOcc 在性能上优于当前最先进的模型。\n"
  },
  {
    "path": "abs/2507.18541.md",
    "content": "### Unposed 3DGS Reconstruction with Probabilistic Procrustes Mapping\n\n3D Gaussian Splatting (3DGS) has emerged as a core technique for 3D representation. Its effectiveness largely depends on precise camera poses and accurate point cloud initialization, which are often derived from pretrained Multi-View Stereo (MVS) models. However, in unposed reconstruction task from hundreds of outdoor images, existing MVS models may struggle with memory limits and lose accuracy as the number of input images grows. To address this limitation, we propose a novel unposed 3DGS reconstruction framework that integrates pretrained MVS priors with the probabilistic Procrustes mapping strategy. The method partitions input images into subsets, maps submaps into a global space, and jointly optimizes geometry and poses with 3DGS. Technically, we formulate the mapping of tens of millions of point clouds as a probabilistic Procrustes problem and solve a closed-form alignment. By employing probabilistic coupling along with a soft dustbin mechanism to reject uncertain correspondences, our method globally aligns point clouds and poses within minutes across hundreds of images. Moreover, we propose a joint optimization framework for 3DGS and camera poses. It constructs Gaussians from confidence-aware anchor points and integrates 3DGS differentiable rendering with an analytical Jacobian to jointly refine scene and poses, enabling accurate reconstruction and pose estimation. Experiments on Waymo and KITTI datasets show that our method achieves accurate reconstruction from unposed image sequences, setting a new state of the art for unposed 3DGS reconstruction.\n\n三维高斯点渲染（3D Gaussian Splatting, 3DGS）已成为三维表示的核心技术之一，其效果在很大程度上依赖于精确的相机位姿和准确的点云初始化，而这些通常来自预训练的多视图立体（Multi-View Stereo, MVS）模型。然而，在由数百张户外图像进行的无位姿重建任务中，现有 MVS 模型可能会受到内存限制，并且随着输入图像数量的增加而失去精度。为解决这一问题，我们提出了一种结合预训练 MVS 先验与概率 Procrustes 映射策略的新型无位姿 3DGS 重建框架。该方法将输入图像划分为子集，将子图映射到全局空间，并利用 3DGS 联合优化几何与位姿。在技术上，我们将数千万点云的映射问题形式化为概率 Procrustes 问题，并求解其闭式对齐解。通过引入概率耦合以及软垃圾桶（soft dustbin）机制以剔除不确定的对应关系，我们的方法能够在数百张图像的规模下于数分钟内实现点云与位姿的全局对齐。此外，我们提出了 3DGS 与相机位姿的联合优化框架：利用置信度感知的锚点构建高斯，并结合 3DGS 可微分渲染与解析雅可比矩阵，实现对场景与位姿的联合细化，从而获得精确的重建与位姿估计。在 Waymo 与 KITTI 数据集上的实验表明，我们的方法能够从无位姿的图像序列中实现高精度重建，并在无位姿 3DGS 重建任务上刷新了最新的性能纪录。\n"
  },
  {
    "path": "abs/2507.18758.md",
    "content": "### Learning Efficient and Generalizable Human Representation with Human Gaussian Model\n\nModeling animatable human avatars from videos is a long-standing and challenging problem. While conventional methods require per-instance optimization, recent feed-forward methods have been proposed to generate 3D Gaussians with a learnable network. However, these methods predict Gaussians for each frame independently, without fully capturing the relations of Gaussians from different timestamps. To address this, we propose Human Gaussian Graph to model the connection between predicted Gaussians and human SMPL mesh, so that we can leverage information from all frames to recover an animatable human representation. Specifically, the Human Gaussian Graph contains dual layers where Gaussians are the first layer nodes and mesh vertices serve as the second layer nodes. Based on this structure, we further propose the intra-node operation to aggregate various Gaussians connected to one mesh vertex, and inter-node operation to support message passing among mesh node neighbors. Experimental results on novel view synthesis and novel pose animation demonstrate the efficiency and generalization of our method.\n\n从视频中建模可动画的人类虚拟形象是一项长期存在且具有挑战性的问题。传统方法通常需要针对每个实例进行优化，而最新的前馈方法则通过可学习网络直接生成三维高斯。然而，这些方法会对每一帧单独预测高斯，而未能充分建模不同时刻高斯之间的关联。为此，我们提出了人类高斯图（Human Gaussian Graph），用于建立预测高斯与人体 SMPL 网格之间的连接，从而能够利用所有帧的信息恢复可动画的人体表示。具体来说，人类高斯图包含双层结构，其中高斯作为第一层节点，网格顶点作为第二层节点。在此结构基础上，我们进一步提出了节点内操作（intra-node operation），用于聚合连接到同一网格顶点的多个高斯，以及节点间操作（inter-node operation），以支持网格节点邻居之间的消息传递。在新视角合成与新姿态动画的实验中，结果表明我们的方法具有高效性与良好的泛化能力。\n"
  },
  {
    "path": "abs/2507.18923.md",
    "content": "### Gaussian Set Surface Reconstruction through Per-Gaussian Optimization\n\n3D Gaussian Splatting (3DGS) effectively synthesizes novel views through its flexible representation, yet fails to accurately reconstruct scene geometry. While modern variants like PGSR introduce additional losses to ensure proper depth and normal maps through Gaussian fusion, they still neglect individual placement optimization. This results in unevenly distributed Gaussians that deviate from the latent surface, complicating both reconstruction refinement and scene editing. Motivated by pioneering work on Point Set Surfaces, we propose Gaussian Set Surface Reconstruction (GSSR), a method designed to distribute Gaussians evenly along the latent surface while aligning their dominant normals with the surface normal. GSSR enforces fine-grained geometric alignment through a combination of pixel-level and Gaussian-level single-view normal consistency and multi-view photometric consistency, optimizing both local and global perspectives. To further refine the representation, we introduce an opacity regularization loss to eliminate redundant Gaussians and apply periodic depth- and normal-guided Gaussian reinitialization for a cleaner, more uniform spatial distribution. Our reconstruction results demonstrate significantly improved geometric precision in Gaussian placement, enabling intuitive scene editing and efficient generation of novel Gaussian-based 3D environments. Extensive experiments validate GSSR's effectiveness, showing enhanced geometric accuracy while preserving high-quality rendering performance.\n\n三维高斯点渲染（3D Gaussian Splatting, 3DGS）凭借其灵活的表示形式能够有效实现新视角合成，但在场景几何精确重建方面存在不足。尽管现代变体如 PGSR 通过高斯融合引入额外损失以确保合理的深度与法线图，但仍忽略了对单个高斯位置的优化。这会导致高斯在潜在表面上的分布不均，与表面存在偏离，从而增加了重建细化与场景编辑的难度。受点集曲面（Point Set Surfaces）先驱性工作的启发，我们提出了高斯集合表面重建（Gaussian Set Surface Reconstruction, GSSR）方法，旨在将高斯均匀分布于潜在表面，并使其主导法线与表面法线对齐。GSSR 通过像素级与高斯级的单视图法线一致性以及多视图光度一致性的联合约束，从局部与全局两个层面实现精细的几何对齐。为进一步优化表示，我们引入不透明度正则化损失以剔除冗余高斯，并结合基于深度与法线引导的周期性高斯重新初始化，以获得更干净、更均匀的空间分布。重建结果表明，GSSR 在高斯定位的几何精度上有显著提升，从而支持直观的场景编辑与高效生成基于高斯的全新三维环境。大量实验验证了 GSSR 的有效性，显示其在提升几何精度的同时保持了高质量渲染性能。\n"
  },
  {
    "path": "abs/2507.19141.md",
    "content": "### DASH: 4D Hash Encoding with Self-Supervised Decomposition for Real-Time Dynamic Scene Rendering\n\nDynamic scene reconstruction is a long-term challenge in 3D vision. Existing plane-based methods in dynamic Gaussian splatting suffer from an unsuitable low-rank assumption, causing feature overlap and poor rendering quality. Although 4D hash encoding provides an explicit representation without low-rank constraints, directly applying it to the entire dynamic scene leads to substantial hash collisions and redundancy. To address these challenges, we present DASH, a real-time dynamic scene rendering framework that employs 4D hash encoding coupled with self-supervised decomposition. Our approach begins with a self-supervised decomposition mechanism that separates dynamic and static components without manual annotations or precomputed masks. Next, we introduce a multiresolution 4D hash encoder for dynamic elements, providing an explicit representation that avoids the low-rank assumption. Finally, we present a spatio-temporal smoothness regularization strategy to mitigate unstable deformation artifacts. Experiments on real-world datasets demonstrate that DASH achieves state-of-the-art dynamic rendering performance, exhibiting enhanced visual quality at real-time speeds of 264 FPS on a single 4090 GPU.\n\n动态场景重建是三维视觉领域的长期挑战。现有基于平面的动态高斯点渲染方法依赖不合适的低秩假设，导致特征重叠和渲染质量下降。虽然四维哈希编码（4D hash encoding）能够在没有低秩约束的情况下提供显式表示，但直接将其应用于整个动态场景会引发大量哈希冲突与冗余。为解决这些问题，我们提出了 DASH，这是一种结合四维哈希编码与自监督分解的实时动态场景渲染框架。该方法首先采用自监督分解机制，在无需人工标注或预计算掩码的情况下，将动态与静态部分进行分离。随后，我们为动态元素引入多分辨率四维哈希编码，以提供避免低秩假设的显式表示。最后，我们提出时空平滑正则化策略，以缓解不稳定的形变伪影。在真实世界数据集上的实验表明，DASH 在动态渲染性能上达到了当前最优水平，在单张 4090 GPU 上实现了 264 FPS 的实时速度，同时具备更高的视觉质量。\n"
  },
  {
    "path": "abs/2507.19481.md",
    "content": "### HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars\n\nWe present a universal prior model for 3D head avatars with explicit hair compositionality. Existing approaches to build generalizable priors for 3D head avatars often adopt a holistic modeling approach, treating the face and hair as an inseparable entity. This overlooks the inherent compositionality of the human head, making it difficult for the model to naturally disentangle face and hair representations, especially when the dataset is limited. Furthermore, such holistic models struggle to support applications like 3D face and hairstyle swapping in a flexible and controllable manner. To address these challenges, we introduce a prior model that explicitly accounts for the compositionality of face and hair, learning their latent spaces separately. A key enabler of this approach is our synthetic hairless data creation pipeline, which removes hair from studio-captured datasets using estimated hairless geometry and texture derived from a diffusion prior. By leveraging a paired dataset of hair and hairless captures, we train disentangled prior models for face and hair, incorporating compositionality as an inductive bias to facilitate effective separation. Our model's inherent compositionality enables seamless transfer of face and hair components between avatars while preserving identity. Additionally, we demonstrate that our model can be fine-tuned in a few-shot manner using monocular captures to create high-fidelity, hair-compositional 3D head avatars for unseen subjects. These capabilities highlight the practical applicability of our approach in real-world scenarios, paving the way for flexible and expressive 3D avatar generation.\n\n我们提出了一种具有显式头发组合性的三维头部虚拟形象通用先验模型。现有用于构建可泛化三维头部虚拟形象先验的方法通常采用整体建模，将面部与头发视为不可分割的整体。这种方法忽视了人类头部固有的组合性，使得模型难以自然地解耦面部与头发的表示，尤其是在数据集有限的情况下。此外，这类整体模型在支持三维面部与发型交换等需要灵活可控的应用时也存在困难。为了解决这些问题，我们引入了一种显式建模面部与头发组合性的先验模型，分别学习它们的潜在空间。实现这一方法的关键是我们提出的合成无发数据生成流程，该流程利用来自扩散先验估计的无发几何与纹理，从影棚采集的数据集中去除头发。通过利用成对的有发与无发数据集，我们为面部与头发训练了解耦的先验模型，并将组合性作为归纳偏置以促进有效分离。我们模型固有的组合性使得在保持身份一致性的前提下，实现面部与头发组件在虚拟形象之间的无缝迁移。此外，我们还展示了该模型能够通过单目采集的少量样本进行快速微调，为未见过的对象生成高保真、具备头发组合性的三维头部虚拟形象。这些能力突显了我们方法在真实场景中的实用性，为灵活且富有表现力的三维虚拟形象生成铺平了道路。\n"
  },
  {
    "path": "abs/2507.19718.md",
    "content": "### GSCache: Real-Time Radiance Caching for Volume Path Tracing using 3D Gaussian Splatting\n\nReal-time path tracing is rapidly becoming the standard for rendering in entertainment and professional applications. In scientific visualization, volume rendering plays a crucial role in helping researchers analyze and interpret complex 3D data. Recently, photorealistic rendering techniques have gained popularity in scientific visualization, yet they face significant challenges. One of the most prominent issues is slow rendering performance and high pixel variance caused by Monte Carlo integration. In this work, we introduce a novel radiance caching approach for path-traced volume rendering. Our method leverages advances in volumetric scene representation and adapts 3D Gaussian splatting to function as a multi-level, path-space radiance cache. This cache is designed to be trainable on the fly, dynamically adapting to changes in scene parameters such as lighting configurations and transfer functions. By incorporating our cache, we achieve less noisy, higher-quality images without increasing rendering costs. To evaluate our approach, we compare it against a baseline path tracer that supports uniform sampling and next-event estimation and the state-of-the-art for neural radiance caching. Through both quantitative and qualitative analyses, we demonstrate that our path-space radiance cache is a robust solution that is easy to integrate and significantly enhances the rendering quality of volumetric visualization applications while maintaining comparable computational efficiency.\n\n实时路径追踪正迅速成为娱乐和专业应用中的渲染标准。在科学可视化中，体绘制在帮助研究人员分析和解释复杂三维数据方面发挥着关键作用。近年来，写实渲染技术在科学可视化中逐渐受到关注，但仍面临显著挑战，其中最突出的问题是由蒙特卡罗积分引起的渲染速度缓慢与像素方差过高。为解决这一问题，本文提出了一种用于路径追踪体绘制的新型辐射缓存方法。我们的方法利用体场景表示的最新进展，并将三维高斯点渲染（3D Gaussian Splatting）改造为多层次、路径空间的辐射缓存。该缓存支持即时训练，可根据光照配置、传递函数等场景参数的变化进行动态自适应。引入该缓存后，我们在不增加渲染成本的情况下获得了噪声更低、质量更高的图像。为评估我们的方法，我们将其与支持均匀采样和下一事件估计的基准路径追踪器以及最新的神经辐射缓存方法进行了比较。通过定量与定性分析，我们证明了该路径空间辐射缓存是一种易于集成且鲁棒性强的解决方案，能够在保持相当计算效率的同时，显著提升体可视化应用的渲染质量。\n"
  },
  {
    "path": "abs/2507.19830.md",
    "content": "### Taking Language Embedded 3D Gaussian Splatting into the Wild\n\nRecent advances in leveraging large-scale Internet photo collections for 3D reconstruction have enabled immersive virtual exploration of landmarks and historic sites worldwide. However, little attention has been given to the immersive understanding of architectural styles and structural knowledge, which remains largely confined to browsing static text-image pairs. Therefore, can we draw inspiration from 3D in-the-wild reconstruction techniques and use unconstrained photo collections to create an immersive approach for understanding the 3D structure of architectural components? To this end, we extend language embedded 3D Gaussian splatting (3DGS) and propose a novel framework for open-vocabulary scene understanding from unconstrained photo collections. Specifically, we first render multiple appearance images from the same viewpoint as the unconstrained image with the reconstructed radiance field, then extract multi-appearance CLIP features and two types of language feature uncertainty maps-transient and appearance uncertainty-derived from the multi-appearance features to guide the subsequent optimization process. Next, we propose a transient uncertainty-aware autoencoder, a multi-appearance language field 3DGS representation, and a post-ensemble strategy to effectively compress, learn, and fuse language features from multiple appearances. Finally, to quantitatively evaluate our method, we introduce PT-OVS, a new benchmark dataset for assessing open-vocabulary segmentation performance on unconstrained photo collections. Experimental results show that our method outperforms existing methods, delivering accurate open-vocabulary segmentation and enabling applications such as interactive roaming with open-vocabulary queries, architectural style pattern recognition, and 3D scene editing.\n\n近年来，利用大规模互联网照片集进行三维重建的研究取得了显著进展，使得全球范围内的地标与历史遗迹能够以沉浸式的方式进行虚拟探索。然而，对于建筑风格与结构知识的沉浸式理解却鲜有关注，这类内容仍主要局限于浏览静态的图文配对信息。那么，我们是否可以借鉴真实环境下的三维重建技术，利用非受限照片集来构建一种沉浸式方法，从而理解建筑构件的三维结构？为此，我们扩展了语言嵌入的三维高斯点渲染（3D Gaussian Splatting, 3DGS），提出了一种面向非受限照片集的开放词汇场景理解新框架。具体而言，我们首先利用重建的辐射场，从与非受限图像相同的视角渲染多种外观图像，然后提取多外观 CLIP 特征，并基于这些特征计算两类语言特征不确定性图——瞬态不确定性与外观不确定性，用于指导后续优化过程。接着，我们提出了瞬态不确定性感知自编码器、多外观语言场 3DGS 表示以及后融合策略，以高效压缩、学习并融合多外观的语言特征。最后，为了定量评估我们的方法，我们引入了 PT-OVS 数据集，这是一个用于评测非受限照片集上开放词汇分割性能的新基准。实验结果表明，我们的方法优于现有方法，能够实现精确的开放词汇分割，并支持开放词汇查询的交互漫游、建筑风格模式识别以及三维场景编辑等应用。\n"
  },
  {
    "path": "abs/2507.19835.md",
    "content": "### SonicGauss: Position-Aware Physical Sound Synthesis for 3D Gaussian Representations\n\nWhile 3D Gaussian representations (3DGS) have proven effective for modeling the geometry and appearance of objects, their potential for capturing other physical attributes-such as sound-remains largely unexplored. In this paper, we present a novel framework dubbed SonicGauss for synthesizing impact sounds from 3DGS representations by leveraging their inherent geometric and material properties. Specifically, we integrate a diffusion-based sound synthesis model with a PointTransformer-based feature extractor to infer material characteristics and spatial-acoustic correlations directly from Gaussian ellipsoids. Our approach supports spatially varying sound responses conditioned on impact locations and generalizes across a wide range of object categories. Experiments on the ObjectFolder dataset and real-world recordings demonstrate that our method produces realistic, position-aware auditory feedback. The results highlight the framework's robustness and generalization ability, offering a promising step toward bridging 3D visual representations and interactive sound synthesis.\n\n虽然三维高斯表示（3D Gaussian Splatting, 3DGS）已被证明在建模物体几何与外观方面非常有效，但其在捕捉声音等其他物理属性上的潜力仍几乎未被探索。本文提出了一种名为 SonicGauss 的新框架，通过利用 3DGS 固有的几何与材质属性，从三维高斯表示中合成碰撞声音。具体而言，我们将基于扩散模型的声音合成方法与基于 PointTransformer 的特征提取器相结合，直接从高斯椭球中推断材质特性与空间-声学相关性。我们的方法支持基于碰撞位置的空间可变声音响应，并可在多种物体类别中实现泛化。在 ObjectFolder 数据集和真实录音上的实验表明，我们的方法能够生成逼真且具备位置感知能力的听觉反馈。结果展示了该框架的鲁棒性与泛化能力，为连接三维视觉表示与交互式声音合成迈出了有前景的一步。\n"
  },
  {
    "path": "abs/2507.19856.md",
    "content": "### RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection\n\n4D millimeter-wave radar has emerged as a promising sensor for autonomous driving, but effective 3D object detection from both 4D radar and monocular images remains a challenge. Existing fusion approaches typically rely on either instance-based proposals or dense BEV grids, which either lack holistic scene understanding or are limited by rigid grid structures. To address these, we propose RaGS, the first framework to leverage 3D Gaussian Splatting (GS) as representation for fusing 4D radar and monocular cues in 3D object detection. 3D GS naturally suits 3D object detection by modeling the scene as a field of Gaussians, dynamically allocating resources on foreground objects and providing a flexible, resource-efficient solution. RaGS uses a cascaded pipeline to construct and refine the Gaussian field. It starts with the Frustum-based Localization Initiation (FLI), which unprojects foreground pixels to initialize coarse 3D Gaussians positions. Then, the Iterative Multimodal Aggregation (IMA) fuses semantics and geometry, refining the limited Gaussians to the regions of interest. Finally, the Multi-level Gaussian Fusion (MGF) renders the Gaussians into multi-level BEV features for 3D object detection. By dynamically focusing on sparse objects within scenes, RaGS enable object concentrating while offering comprehensive scene perception. Extensive experiments on View-of-Delft, TJ4DRadSet, and OmniHD-Scenes benchmarks demonstrate its state-of-the-art performance. Code will be released.\n\n四维毫米波雷达作为自动驾驶中极具潜力的传感器正受到越来越多的关注，但如何将四维雷达与单目图像结合，实现高效的三维目标检测，仍然是一大挑战。现有融合方法通常依赖于基于实例的候选区域或稠密的 BEV 网格，这些方法要么缺乏整体场景理解能力，要么受制于刚性网格结构的限制。为此，我们提出了 RaGS，这是首个利用三维高斯点渲染（3D Gaussian Splatting, GS）作为表示，将四维雷达与单目信息融合用于三维目标检测的框架。3D GS 天然适用于三维目标检测，因为它将场景建模为高斯场，能够动态将资源分配到前景目标上，提供灵活且资源高效的解决方案。RaGS 采用级联式流程构建并优化高斯场：首先通过基于视锥的定位初始化（Frustum-based Localization Initiation, FLI），将前景像素反投影以初始化粗略的三维高斯位置；随后利用迭代多模态聚合（Iterative Multimodal Aggregation, IMA）融合语义与几何信息，将有限的高斯精炼到感兴趣区域；最后通过多级高斯融合（Multi-level Gaussian Fusion, MGF）将高斯渲染为多级 BEV 特征，用于三维目标检测。通过在场景中动态聚焦稀疏目标，RaGS 在实现目标集中的同时，兼顾了全局场景感知能力。在 View-of-Delft、TJ4DRadSet 和 OmniHD-Scenes 基准上的大量实验表明，RaGS 在性能上达到了当前最先进水平。代码将会开源。\n"
  },
  {
    "path": "abs/2507.20239.md",
    "content": "### Decomposing Densification in Gaussian Splatting for Faster 3D Scene Reconstruction\n\n3D Gaussian Splatting (GS) has emerged as a powerful representation for high-quality scene reconstruction, offering compelling rendering quality. However, the training process of GS often suffers from slow convergence due to inefficient densification and suboptimal spatial distribution of Gaussian primitives. In this work, we present a comprehensive analysis of the split and clone operations during the densification phase, revealing their distinct roles in balancing detail preservation and computational efficiency. Building upon this analysis, we propose a global-to-local densification strategy, which facilitates more efficient growth of Gaussians across the scene space, promoting both global coverage and local refinement. To cooperate with the proposed densification strategy and promote sufficient diffusion of Gaussian primitives in space, we introduce an energy-guided coarse-to-fine multi-resolution training framework, which gradually increases resolution based on energy density in 2D images. Additionally, we dynamically prune unnecessary Gaussian primitives to speed up the training. Extensive experiments on MipNeRF-360, Deep Blending, and Tanks & Temples datasets demonstrate that our approach significantly accelerates training,achieving over 2x speedup with fewer Gaussian primitives and superior reconstruction performance.\n\n三维高斯点渲染（3D Gaussian Splatting, GS）已成为高质量场景重建的一种强大表示方法，能够提供令人信服的渲染质量。然而，GS 的训练过程常因致密化效率低下以及高斯基元空间分布不理想而导致收敛缓慢。本文对致密化阶段的拆分（split）与克隆（clone）操作进行了全面分析，揭示了它们在平衡细节保留与计算效率方面的不同作用。在此分析基础上，我们提出了一种从全局到局部的致密化策略，以更高效地在场景空间中增长高斯基元，实现全局覆盖与局部细化的兼顾。为了配合所提的致密化策略并促进高斯基元在空间中的充分扩散，我们引入了一种基于能量引导的由粗到细多分辨率训练框架，该框架根据二维图像中的能量密度逐步提升分辨率。此外，我们还动态剪除不必要的高斯基元以加快训练速度。在 MipNeRF-360、Deep Blending 和 Tanks & Temples 数据集上的大量实验表明，我们的方法能够显著加速训练，在使用更少高斯基元的情况下实现超过 2 倍的加速，并获得更优的重建性能。\n"
  },
  {
    "path": "abs/2507.20331.md",
    "content": "### From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos\n\nInserting 3D objects into videos is a longstanding challenge in computer graphics with applications in augmented reality, virtual try-on, and video composition. Achieving both temporal consistency, or realistic lighting remains difficult, particularly in dynamic scenarios with complex object motion, perspective changes, and varying illumination. While 2D diffusion models have shown promise for producing photorealistic edits, they often struggle with maintaining temporal coherence across frames. Conversely, traditional 3D rendering methods excel in spatial and temporal consistency but fall short in achieving photorealistic lighting. In this work, we propose a hybrid object insertion pipeline that combines the strengths of both paradigms. Specifically, we focus on inserting bracelets into dynamic wrist scenes, leveraging the high temporal consistency of 3D Gaussian Splatting (3DGS) for initial rendering and refining the results using a 2D diffusion-based enhancement model to ensure realistic lighting interactions. Our method introduces a shading-driven pipeline that separates intrinsic object properties (albedo, shading, reflectance) and refines both shading and sRGB images for photorealism. To maintain temporal coherence, we optimize the 3DGS model with multi-frame weighted adjustments. This is the first approach to synergize 3D rendering and 2D diffusion for video object insertion, offering a robust solution for realistic and consistent video editing.\n\n将三维物体插入视频是计算机图形学中由来已久的挑战，在增强现实、虚拟试穿和视频合成等领域具有广泛应用。要同时实现时间一致性和逼真的光照仍然十分困难，尤其是在具有复杂物体运动、视角变化和光照多变的动态场景中。尽管二维扩散模型在生成照片级编辑方面表现出一定潜力，但它们在保持跨帧时间一致性方面常常表现欠佳。相反，传统的三维渲染方法在空间和时间一致性上表现出色，但难以实现照片级真实的光照效果。在本工作中，我们提出了一种融合两种范式优势的混合物体插入流程。具体而言，我们聚焦于在动态手腕场景中插入手镯，利用三维高斯溅射（3DGS）的高时间一致性进行初步渲染，并借助基于二维扩散的增强模型优化结果，以确保光照交互的真实感。我们的方法引入了一个以着色为驱动的流程，将物体的固有属性（反照率、着色、反射率）分离开来，并分别优化着色图和 sRGB 图像，以实现照片级真实感。为了保持时间一致性，我们对 3DGS 模型进行多帧加权优化。这是首个将三维渲染与二维扩散结合用于视频物体插入的方法，为实现逼真且一致的视频编辑提供了稳健的解决方案。\n"
  },
  {
    "path": "abs/2507.20480.md",
    "content": "## Automated 3D-GS Registration and Fusion via Skeleton Alignment and Gaussian-Adaptive Features\n\nIn recent years, 3D Gaussian Splatting (3D-GS)-based scene representation demonstrates significant potential in real-time rendering and training efficiency. However, most existing methods primarily focus on single-map reconstruction, while the registration and fusion of multiple 3D-GS sub-maps remain underexplored. Existing methods typically rely on manual intervention to select a reference sub-map as a template and use point cloud matching for registration. Moreover, hard-threshold filtering of 3D-GS primitives often degrades rendering quality after fusion. In this paper, we present a novel approach for automated 3D-GS sub-map alignment and fusion, eliminating the need for manual intervention while enhancing registration accuracy and fusion quality. First, we extract geometric skeletons across multiple scenes and leverage ellipsoid-aware convolution to capture 3D-GS attributes, facilitating robust scene registration. Second, we introduce a multi-factor Gaussian fusion strategy to mitigate the scene element loss caused by rigid thresholding. Experiments on the ScanNet-GSReg and our Coord datasets demonstrate the effectiveness of the proposed method in registration and fusion. For registration, it achieves a 41.9\\% reduction in RRE on complex scenes, ensuring more precise pose estimation. For fusion, it improves PSNR by 10.11 dB, highlighting superior structural preservation. These results confirm its ability to enhance scene alignment and reconstruction fidelity, ensuring more consistent and accurate 3D scene representation for robotic perception and autonomous navigation.o\n\n近年来，基于三维高斯溅射（3D-GS）的场景表示在实时渲染与训练效率方面展现出巨大潜力。然而，大多数现有方法主要集中于单一地图的重建，对于多 3D-GS 子地图的配准与融合研究相对不足。现有方法通常依赖人工干预选择一个参考子地图作为模板，并使用点云匹配进行配准。此外，对 3D-GS 元素进行硬阈值滤波往往会在融合后降低渲染质量。本文提出了一种全新的自动化 3D-GS 子地图对齐与融合方法，既消除了人工干预的需求，又提升了配准精度与融合质量。首先，我们在多场景中提取几何骨架，并利用椭球感知卷积（ellipsoid-aware convolution）捕捉 3D-GS 属性，从而实现鲁棒的场景配准。其次，我们引入了一种多因子高斯融合策略，以缓解刚性阈值导致的场景元素损失。在 ScanNet-GSReg 数据集和我们自建的 Coord 数据集上的实验表明，该方法在配准与融合任务中均表现优异。在配准方面，本方法在复杂场景中的相对旋转误差（RRE）降低了 41.9%，实现了更精确的位姿估计；在融合方面，峰值信噪比（PSNR）提升了 10.11 dB，显著保留了结构细节。这些结果验证了本方法在提升场景对齐与重建保真度方面的有效性，可为机器人感知与自动导航提供更一致、更精确的三维场景表示。\n"
  },
  {
    "path": "abs/2507.20512.md",
    "content": "### GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections\n\nWe propose a 3D Gaussian splatting-based framework for outdoor relighting that leverages intrinsic image decomposition to precisely integrate sunlight, sky radiance, and indirect lighting from unconstrained photo collections. Unlike prior methods that compress the per-image global illumination into a single latent vector, our approach enables simultaneously diverse shading manipulation and the generation of dynamic shadow effects. This is achieved through three key innovations: (1) a residual-based sun visibility extraction method to accurately separate direct sunlight effects, (2) a region-based supervision framework with a structural consistency loss for physically interpretable and coherent illumination decomposition, and (3) a ray-tracing-based technique for realistic shadow simulation. Extensive experiments demonstrate that our framework synthesizes novel views with competitive fidelity against state-of-the-art relighting solutions and produces more natural and multifaceted illumination and shadow effects.o\n\n我们提出了一种基于三维高斯溅射的户外重光照框架，利用固有图像分解技术，从非受限的照片集合中精确融合阳光、天空辐射和间接光照。与以往将每张图像的全局光照压缩为单一潜向量的方法不同，我们的方法能够同时实现多样化的着色操作以及动态阴影效果的生成。这一能力得益于三项关键创新：（1）基于残差的太阳可见性提取方法，可精确分离直射阳光的影响；（2）基于区域的监督框架结合结构一致性损失，实现具有物理可解释性且一致的光照分解；（3）基于光线追踪的逼真阴影模拟技术。大量实验表明，我们的框架在新视角合成中能达到与最新重光照方法相当的保真度，并生成更自然、更多样化的光照与阴影效果。\n"
  },
  {
    "path": "abs/2507.20854.md",
    "content": "### S3LAM: Surfel Splatting SLAM for Geometrically Accurate Tracking and Mapping\n\nWe propose S3LAM, a novel RGB-D SLAM system that leverages 2D surfel splatting to achieve highly accurate geometric representations for simultaneous tracking and mapping. Unlike existing 3DGS-based SLAM approaches that rely on 3D Gaussian ellipsoids, we utilize 2D Gaussian surfels as primitives for more efficient scene representation. By focusing on the surfaces of objects in the scene, this design enables S3LAM to reconstruct high-quality geometry, benefiting both mapping and tracking. To address inherent SLAM challenges including real-time optimization under limited viewpoints, we introduce a novel adaptive surface rendering strategy that improves mapping accuracy while maintaining computational efficiency. We further derive camera pose Jacobians directly from 2D surfel splatting formulation, highlighting the importance of our geometrically accurate representation that improves tracking convergence. Extensive experiments on both synthetic and real-world datasets validate that S3LAM achieves state-of-the-art performance.\n\n我们提出了 S3LAM，这是一种新型的 RGB-D SLAM 系统，利用二维曲面元溅射实现高精度的几何表示，从而同时进行跟踪与建图。不同于依赖三维高斯椭球的现有基于 3DGS 的 SLAM 方法，我们采用二维高斯曲面元作为基本单元，以实现更高效的场景表示。通过专注于场景中物体的表面，这一设计使 S3LAM 能够重建高质量的几何结构，从而同时提升建图和跟踪性能。针对视角受限条件下的实时优化等 SLAM 固有挑战，我们提出了一种新型的自适应表面渲染策略，在保持计算效率的同时提高建图精度。我们还直接从二维曲面元溅射公式推导出相机位姿的雅可比矩阵，凸显了几何精确表示在提升跟踪收敛性方面的重要作用。大量在合成数据集和真实数据集上的实验验证了 S3LAM 能够实现当前最先进的性能。\n\n\n"
  },
  {
    "path": "abs/2507.21872.md",
    "content": "### MultiEditor: Controllable Multimodal Object Editing for Driving Scenarios Using 3D Gaussian Splatting Priors\n\nAutonomous driving systems rely heavily on multimodal perception data to understand complex environments. However, the long-tailed distribution of real-world data hinders generalization, especially for rare but safety-critical vehicle categories. To address this challenge, we propose MultiEditor, a dual-branch latent diffusion framework designed to edit images and LiDAR point clouds in driving scenarios jointly. At the core of our approach is introducing 3D Gaussian Splatting (3DGS) as a structural and appearance prior for target objects. Leveraging this prior, we design a multi-level appearance control mechanism--comprising pixel-level pasting, semantic-level guidance, and multi-branch refinement--to achieve high-fidelity reconstruction across modalities. We further propose a depth-guided deformable cross-modality condition module that adaptively enables mutual guidance between modalities using 3DGS-rendered depth, significantly enhancing cross-modality consistency. Extensive experiments demonstrate that MultiEditor achieves superior performance in visual and geometric fidelity, editing controllability, and cross-modality consistency. Furthermore, generating rare-category vehicle data with MultiEditor substantially enhances the detection accuracy of perception models on underrepresented classes.\n\n自动驾驶系统在理解复杂环境时高度依赖多模态感知数据。然而，真实世界数据的长尾分布会阻碍模型的泛化能力，尤其是对于罕见但安全关键的车辆类别。为应对这一挑战，我们提出了 MultiEditor，这是一种双分支潜空间扩散框架，旨在在驾驶场景中联合编辑图像和激光雷达点云。我们方法的核心是引入三维高斯溅射（3DGS）作为目标物体的结构与外观先验。在此先验的基础上，我们设计了一个多级外观控制机制——包括像素级粘贴、语义级引导以及多分支细化——以在多模态间实现高保真重建。此外，我们提出了一种深度引导的可变形跨模态条件模块，该模块利用 3DGS 渲染的深度自适应地实现模态间的相互引导，从而显著提升跨模态一致性。大量实验结果表明，MultiEditor 在视觉与几何保真度、编辑可控性以及跨模态一致性方面均取得了优异表现。此外，利用 MultiEditor 生成罕见类别车辆数据能够显著提升感知模型在少样本类别上的检测精度。\n"
  },
  {
    "path": "abs/2507.22342.md",
    "content": "### UFV-Splatter: Pose-Free Feed-Forward 3D Gaussian Splatting Adapted to Unfavorable Views\n\nThis paper presents a pose-free, feed-forward 3D Gaussian Splatting (3DGS) framework designed to handle unfavorable input views. A common rendering setup for training feed-forward approaches places a 3D object at the world origin and renders it from cameras pointed toward the origin -- i.e., from favorable views, limiting the applicability of these models to real-world scenarios involving varying and unknown camera poses. To overcome this limitation, we introduce a novel adaptation framework that enables pretrained pose-free feed-forward 3DGS models to handle unfavorable views. We leverage priors learned from favorable images by feeding recentered images into a pretrained model augmented with low-rank adaptation (LoRA) layers. We further propose a Gaussian adapter module to enhance the geometric consistency of the Gaussians derived from the recentered inputs, along with a Gaussian alignment method to render accurate target views for training. Additionally, we introduce a new training strategy that utilizes an off-the-shelf dataset composed solely of favorable images. Experimental results on both synthetic images from the Google Scanned Objects dataset and real images from the OmniObject3D dataset validate the effectiveness of our method in handling unfavorable input views.\n\n本文提出了一种无姿态、前馈式的三维高斯溅射（3DGS）框架，旨在处理不利输入视角的问题。前馈方法的常见渲染训练设置是将三维物体放置在世界坐标原点，并从指向原点的相机进行渲染，即使用有利视角，这限制了其在涉及多变且未知相机姿态的真实场景中的适用性。为克服这一限制，我们提出了一种新的适配框架，使预训练的无姿态前馈式 3DGS 模型能够处理不利视角。我们通过将重新居中的图像输入到添加了低秩适配（LoRA）层的预训练模型中，利用从有利图像中学习到的先验。此外，我们提出了高斯适配器模块，以增强由重新居中输入生成的高斯在几何上的一致性，并结合高斯对齐方法生成准确的目标视图用于训练。我们还引入了一种新的训练策略，该策略利用仅包含有利图像的现成数据集。基于 Google Scanned Objects 数据集的合成图像和 OmniObject3D 数据集的真实图像的实验结果验证了本方法在处理不利输入视角方面的有效性。\n"
  },
  {
    "path": "abs/2507.23006.md",
    "content": "### Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction\n\nWe present a framework that enables fast reconstruction and real-time rendering of urban-scale scenes while maintaining robustness against appearance variations across multi-view captures. Our approach begins with scene partitioning for parallel training, employing a visibility-based image selection strategy to optimize training efficiency. A controllable level-of-detail (LOD) strategy explicitly regulates Gaussian density under a user-defined budget, enabling efficient training and rendering while maintaining high visual fidelity. The appearance transformation module mitigates the negative effects of appearance inconsistencies across images while enabling flexible adjustments. Additionally, we utilize enhancement modules, such as depth regularization, scale regularization, and antialiasing, to improve reconstruction fidelity. Experimental results demonstrate that our method effectively reconstructs urban-scale scenes and outperforms previous approaches in both efficiency and quality.\n\n我们提出了一个能够实现城市级场景快速重建与实时渲染的框架，同时在多视图捕获中保持对外观变化的鲁棒性。我们的方法首先通过场景划分进行并行训练，并采用基于可见性的图像选择策略以优化训练效率。我们设计了一种可控的细节层次（LOD）策略，在用户设定的预算范围内显式调节高斯密度，从而在保持高视觉保真度的同时实现高效训练与渲染。外观变换模块用于减轻跨图像外观不一致带来的负面影响，并支持灵活调整。此外，我们还引入了深度正则化、尺度正则化和抗锯齿等增强模块，以提升重建的保真度。实验结果表明，我们的方法能够高效、准确地重建城市级场景，并在效率和质量上均优于以往方法。\n"
  },
  {
    "path": "abs/2507.23273.md",
    "content": "### GSFusion:Globally Optimized LiDAR-Inertial-Visual Mapping for Gaussian Splatting\n\nWhile 3D Gaussian Splatting (3DGS) has revolutionized photorealistic mapping, conventional approaches based on camera sensor, even RGB-D, suffer from fundamental limitations such as high computational load, failure in environments with poor texture or illumination, and short operational ranges. LiDAR emerges as a robust alternative, but its integration with 3DGS introduces new challenges, such as the need for exceptional global alignment for photorealistic quality and prolonged optimization times caused by sparse data. To address these challenges, we propose GSFusion, an online LiDAR-Inertial-Visual mapping system that ensures high-precision map consistency through a surfel-to-surfel constraint in the global pose-graph optimization. To handle sparse data, our system employs a pixel-aware Gaussian initialization strategy for efficient representation and a bounded sigmoid constraint to prevent uncontrolled Gaussian growth. Experiments on public and our datasets demonstrate our system outperforms existing 3DGS SLAM systems in terms of rendering quality and map-building efficiency.\n\n尽管三维高斯溅射（3DGS）在照片级真实感建图方面具有革命性意义，但传统基于相机传感器（即便是 RGB-D）的方案仍存在计算负担重、在纹理或光照较差环境中失效、运行范围短等根本性局限。激光雷达作为一种稳健的替代方案应运而生，但其与 3DGS 的融合也带来了新的挑战，例如为了获得照片级质量所需的极高全局对齐精度，以及由稀疏数据引起的漫长优化时间。为应对这些挑战，我们提出了 GSFusion，这是一种在线激光雷达-惯性-视觉融合建图系统，通过在全局位姿图优化中引入曲面元到曲面元（surfel-to-surfel）约束，确保高精度的地图一致性。为处理稀疏数据，我们采用像素感知的高斯初始化策略以实现高效表示，并引入有界 Sigmoid 约束以防止高斯无限增长。在公开数据集和自建数据集上的实验结果表明，我们的系统在渲染质量和建图效率方面均优于现有的 3DGS SLAM 系统。\n"
  },
  {
    "path": "abs/2507.23277.md",
    "content": "### iLRM: An Iterative Large 3D Reconstruction Model\n\nFeed-forward 3D modeling has emerged as a promising approach for rapid and high-quality 3D reconstruction. In particular, directly generating explicit 3D representations, such as 3D Gaussian splatting, has attracted significant attention due to its fast and high-quality rendering, as well as numerous applications. However, many state-of-the-art methods, primarily based on transformer architectures, suffer from severe scalability issues because they rely on full attention across image tokens from multiple input views, resulting in prohibitive computational costs as the number of views or image resolution increases. Toward a scalable and efficient feed-forward 3D reconstruction, we introduce an iterative Large 3D Reconstruction Model (iLRM) that generates 3D Gaussian representations through an iterative refinement mechanism, guided by three core principles: (1) decoupling the scene representation from input-view images to enable compact 3D representations; (2) decomposing fully-attentional multi-view interactions into a two-stage attention scheme to reduce computational costs; and (3) injecting high-resolution information at every layer to achieve high-fidelity reconstruction. Experimental results on widely used datasets, such as RE10K and DL3DV, demonstrate that iLRM outperforms existing methods in both reconstruction quality and speed. Notably, iLRM exhibits superior scalability, delivering significantly higher reconstruction quality under comparable computational cost by efficiently leveraging a larger number of input views.\n\n前馈式三维建模已成为实现快速、高质量三维重建的有前景方法。尤其是直接生成显式三维表示（如三维高斯溅射）因其快速且高质量的渲染能力以及广泛的应用场景而备受关注。然而，许多主要基于 Transformer 架构的最新方法存在严重的可扩展性问题，因为它们依赖跨多个输入视角图像 token 的全局注意力机制，随着视角数量或图像分辨率的增加，其计算成本会急剧上升。为实现可扩展且高效的前馈式三维重建，我们提出了一种迭代式大型三维重建模型（iLRM），通过迭代细化机制生成三维高斯表示，并遵循三项核心原则：（1）将场景表示与输入视角图像解耦，以实现紧凑的三维表示；（2）将全注意力的多视图交互分解为两阶段注意力方案，以降低计算成本；（3）在每一层引入高分辨率信息，以实现高保真重建。在 RE10K 和 DL3DV 等广泛使用的数据集上的实验结果表明，iLRM 在重建质量和速度上均优于现有方法。值得注意的是，iLRM 具有卓越的可扩展性，在相当的计算成本下，通过高效利用更多输入视角，显著提升了重建质量。\n"
  },
  {
    "path": "abs/2507.23374.md",
    "content": "### NeRF Is a Valuable Assistant for 3D Gaussian Splatting\n\nWe introduce NeRF-GS, a novel framework that jointly optimizes Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). This framework leverages the inherent continuous spatial representation of NeRF to mitigate several limitations of 3DGS, including sensitivity to Gaussian initialization, limited spatial awareness, and weak inter-Gaussian correlations, thereby enhancing its performance. In NeRF-GS, we revisit the design of 3DGS and progressively align its spatial features with NeRF, enabling both representations to be optimized within the same scene through shared 3D spatial information. We further address the formal distinctions between the two approaches by optimizing residual vectors for both implicit features and Gaussian positions to enhance the personalized capabilities of 3DGS. Experimental results on benchmark datasets show that NeRF-GS surpasses existing methods and achieves state-of-the-art performance. This outcome confirms that NeRF and 3DGS are complementary rather than competing, offering new insights into hybrid approaches that combine 3DGS and NeRF for efficient 3D scene representation.\n\n我们提出了 NeRF-GS，这是一种联合优化神经辐射场（NeRF）与三维高斯溅射（3DGS）的新型框架。该框架利用 NeRF 所固有的连续空间表示来缓解 3DGS 的多项局限性，包括对高斯初始化的敏感性、空间感知能力有限以及高斯之间相关性弱，从而提升其整体性能。在 NeRF-GS 中，我们重新审视了 3DGS 的设计，并逐步将其空间特征与 NeRF 对齐，使两种表示能够在同一场景中通过共享的三维空间信息共同优化。我们还针对两种方法在形式上的差异，优化了隐式特征和高斯位置的残差向量，以增强 3DGS 的个性化能力。基准数据集上的实验结果表明，NeRF-GS 超越了现有方法并达到了当前最优性能。这一结果表明，NeRF 与 3DGS 是互补而非竞争的关系，为结合 3DGS 与 NeRF 的混合高效三维场景表示方法提供了新的见解。\n"
  },
  {
    "path": "abs/2507.23569.md",
    "content": "### Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization\n\nVisual localization is the task of estimating a camera pose in a known environment. In this paper, we utilize 3D Gaussian Splatting (3DGS)-based representations for accurate and privacy-preserving visual localization. We propose Gaussian Splatting Feature Fields (GSFFs), a scene representation for visual localization that combines an explicit geometry model (3DGS) with an implicit feature field. We leverage the dense geometric information and differentiable rasterization algorithm from 3DGS to learn robust feature representations grounded in 3D. In particular, we align a 3D scale-aware feature field and a 2D feature encoder in a common embedding space through a contrastive framework. Using a 3D structure-informed clustering procedure, we further regularize the representation learning and seamlessly convert the features to segmentations, which can be used for privacy-preserving visual localization. Pose refinement, which involves aligning either feature maps or segmentations from a query image with those rendered from the GSFFs scene representation, is used to achieve localization. The resulting privacy- and non-privacy-preserving localization pipelines, evaluated on multiple real-world datasets, show state-of-the-art performances.\n\n视觉定位的任务是在已知环境中估计相机姿态。本文利用基于三维高斯溅射（3DGS）的表示来实现精确且隐私保护的视觉定位。我们提出了高斯溅射特征场（GSFFs），这是一种将显式几何模型（3DGS）与隐式特征场相结合的视觉定位场景表示方法。我们利用 3DGS 所提供的稠密几何信息和可微光栅化算法，学习以三维为基础的鲁棒特征表示。具体而言，我们通过对比学习框架，将三维尺度感知特征场与二维特征编码器对齐到一个共同的嵌入空间。利用三维结构感知的聚类过程，我们进一步正则化特征表示学习，并将特征无缝转换为可用于隐私保护视觉定位的分割结果。在定位阶段，我们通过姿态优化，将查询图像的特征图或分割结果与由 GSFFs 场景表示渲染得到的结果进行对齐，从而实现定位。基于多种真实世界数据集的评估结果表明，无论是隐私保护还是非隐私保护的定位流程，我们的方法均达到了当前最优性能。\n"
  },
  {
    "path": "abs/2507.23597.md",
    "content": "### MoGA: 3D Generative Avatar Prior for Monocular Gaussian Avatar Reconstruction\n\nWe present MoGA, a novel method to reconstruct high-fidelity 3D Gaussian avatars from a single-view image. The main challenge lies in inferring unseen appearance and geometric details while ensuring 3D consistency and realism. Most previous methods rely on 2D diffusion models to synthesize unseen views; however, these generated views are sparse and inconsistent, resulting in unrealistic 3D artifacts and blurred appearance. To address these limitations, we leverage a generative avatar model, that can generate diverse 3D avatars by sampling deformed Gaussians from a learned prior distribution. Due to limited 3D training data, such a 3D model alone cannot capture all image details of unseen identities. Consequently, we integrate it as a prior, ensuring 3D consistency by projecting input images into its latent space and enforcing additional 3D appearance and geometric constraints. Our novel approach formulates Gaussian avatar creation as model inversion by fitting the generative avatar to synthetic views from 2D diffusion models. The generative avatar provides an initialization for model fitting, enforces 3D regularization, and helps in refining pose. Experiments show that our method surpasses state-of-the-art techniques and generalizes well to real-world scenarios. Our Gaussian avatars are also inherently animatable.\n\n我们提出了 MoGA，这是一种从单视图图像重建高保真三维高斯头像的新方法。其主要挑战在于在保证三维一致性与真实感的同时，推断未见的外观和几何细节。以往大多数方法依赖二维扩散模型来合成未见视角，但这些生成的视角往往稀疏且不一致，导致三维伪影和外观模糊等不真实现象。为克服这些局限，我们利用生成头像模型，通过从学习到的先验分布中采样变形高斯来生成多样化的三维头像。由于三维训练数据有限，这类三维模型单独使用时无法捕捉所有未见身份的图像细节。因此，我们将其作为先验引入，通过将输入图像投影到其潜空间并施加额外的三维外观与几何约束来保证三维一致性。我们的新方法将高斯头像的生成形式化为模型反演过程，即将生成头像拟合到由二维扩散模型生成的合成视图上。生成头像不仅为模型拟合提供初始化，还能施加三维正则化并辅助姿态优化。实验表明，我们的方法优于现有最先进技术，并在真实场景中具有良好的泛化能力。此外，我们的高斯头像天然具有可动画性。\n"
  },
  {
    "path": "abs/2507.23677.md",
    "content": "### Stereo 3D Gaussian Splatting SLAM for Outdoor Urban Scenes\n\n3D Gaussian Splatting (3DGS) has recently gained popularity in SLAM applications due to its fast rendering and high-fidelity representation. However, existing 3DGS-SLAM systems have predominantly focused on indoor environments and relied on active depth sensors, leaving a gap for large-scale outdoor applications. We present BGS-SLAM, the first binocular 3D Gaussian Splatting SLAM system designed for outdoor scenarios. Our approach uses only RGB stereo pairs without requiring LiDAR or active sensors. BGS-SLAM leverages depth estimates from pre-trained deep stereo networks to guide 3D Gaussian optimization with a multi-loss strategy enhancing both geometric consistency and visual quality. Experiments on multiple datasets demonstrate that BGS-SLAM achieves superior tracking accuracy and mapping performance compared to other 3DGS-based solutions in complex outdoor environments.\n\n3D 高斯泼溅（3DGS）因其高速渲染和高保真表示，近年来在 SLAM 应用中逐渐受到关注。然而，现有的 3DGS-SLAM 系统主要集中于室内环境，并依赖主动深度传感器，这在大规模室外应用中留下了空白。我们提出了 BGS-SLAM，这是首个面向室外场景的双目 3D 高斯泼溅 SLAM 系统。该方法仅使用 RGB 双目图像对，无需 LiDAR 或其他主动传感器。BGS-SLAM 利用经过预训练的深度双目网络生成的深度估计，结合多损失策略，引导 3D 高斯优化，以同时提升几何一致性和视觉质量。在多个数据集上的实验结果表明，BGS-SLAM 在复杂室外环境中，相较于其他基于 3DGS 的方案，能够实现更高的跟踪精度和建图性能。\n"
  },
  {
    "path": "abs/2507.23683.md",
    "content": "### I2V-GS: Infrastructure-to-Vehicle View Transformation with Gaussian Splatting for Autonomous Driving Data Generation\n\nVast and high-quality data are essential for end-to-end autonomous driving systems. However, current driving data is mainly collected by vehicles, which is expensive and inefficient. A potential solution lies in synthesizing data from real-world images. Recent advancements in 3D reconstruction demonstrate photorealistic novel view synthesis, highlighting the potential of generating driving data from images captured on the road. This paper introduces a novel method, I2V-GS, to transfer the Infrastructure view To the Vehicle view with Gaussian Splatting. Reconstruction from sparse infrastructure viewpoints and rendering under large view transformations is a challenging problem. We adopt the adaptive depth warp to generate dense training views. To further expand the range of views, we employ a cascade strategy to inpaint warped images, which also ensures inpainting content is consistent across views. To further ensure the reliability of the diffusion model, we utilize the cross-view information to perform a confidenceguided optimization. Moreover, we introduce RoadSight, a multi-modality, multi-view dataset from real scenarios in infrastructure views. To our knowledge, I2V-GS is the first framework to generate autonomous driving datasets with infrastructure-vehicle view transformation. Experimental results demonstrate that I2V-GS significantly improves synthesis quality under vehicle view, outperforming StreetGaussian in NTA-Iou, NTL-Iou, and FID by 45.7%, 34.2%, and 14.9%, respectively.\n\n大规模且高质量的数据对端到端自动驾驶系统至关重要。然而，目前的驾驶数据主要由车辆采集，这种方式既昂贵又低效。一种潜在的解决方案是从真实世界的图像中合成数据。近年来，3D 重建在逼真新视角合成方面取得了进展，这突显了利用道路上拍摄的图像生成驾驶数据的潜力。本文提出了一种新方法——I2V-GS，通过高斯泼溅实现基础设施视角（Infrastructure view）到车辆视角（Vehicle view）的转换。从稀疏的基础设施视点进行重建并在大视角变化下进行渲染是一个具有挑战性的问题。我们采用自适应深度变形（adaptive depth warp）生成密集的训练视图。为了进一步扩展视角范围，我们引入级联策略对变形后的图像进行修补，同时确保修补内容在不同视图间保持一致。为了进一步保证扩散模型的可靠性，我们利用跨视图信息执行置信度引导优化。此外，我们还引入了 RoadSight 数据集，该数据集包含来自真实场景的多模态、多视角基础设施视图。据我们所知，I2V-GS 是首个实现基础设施到车辆视角转换以生成自动驾驶数据集的框架。实验结果表明，I2V-GS 在车辆视角下的合成质量显著提升，相比 StreetGaussian 在 NTA-Iou、NTL-Iou 和 FID 上分别提高了 45.7%、34.2% 和 14.9%。\n"
  },
  {
    "path": "abs/2507.23704.md",
    "content": "### Enhanced Velocity Field Modeling for Gaussian Video Reconstruction\n\nHigh-fidelity 3D video reconstruction is essential for enabling real-time rendering of dynamic scenes with realistic motion in virtual and augmented reality (VR/AR). The deformation field paradigm of 3D Gaussian splatting has achieved near-photorealistic results in video reconstruction due to the great representation capability of deep deformation networks. However, in videos with complex motion and significant scale variations, deformation networks often overfit to irregular Gaussian trajectories, leading to suboptimal visual quality. Moreover, the gradient-based densification strategy designed for static scene reconstruction proves inadequate to address the absence of dynamic content. In light of these challenges, we propose a flow-empowered velocity field modeling scheme tailored for Gaussian video reconstruction, dubbed FlowGaussian-VR. It consists of two core components: a velocity field rendering (VFR) pipeline which enables optical flow-based optimization, and a flow-assisted adaptive densification (FAD) strategy that adjusts the number and size of Gaussians in dynamic regions. We validate our model's effectiveness on multi-view dynamic reconstruction and novel view synthesis with multiple real-world datasets containing challenging motion scenarios, demonstrating not only notable visual improvements (over 2.5 dB gain in PSNR) and less blurry artifacts in dynamic textures, but also regularized and trackable per-Gaussian trajectories.\n\n高保真 3D 视频重建对于在虚拟现实（VR）和增强现实（AR）中实现具有真实运动的动态场景实时渲染至关重要。3D 高斯泼溅的形变场范式，凭借深度形变网络强大的表征能力，在视频重建中已取得接近照片级的效果。然而，在具有复杂运动和显著尺度变化的视频中，形变网络往往会对不规则的高斯轨迹发生过拟合，从而导致次优的视觉质量。此外，为静态场景重建设计的基于梯度的密集化策略，难以有效解决动态内容缺失的问题。针对这些挑战，我们提出了一种面向高斯视频重建的光流驱动速度场建模方案——FlowGaussian-VR。该方案包含两个核心组件：一是速度场渲染（VFR）流水线，使得基于光流的优化成为可能；二是光流辅助自适应密集化（FAD）策略，用于在动态区域调整高斯的数量与尺寸。我们在包含复杂运动场景的多个真实世界数据集上，验证了该模型在多视角动态重建和新视角合成任务中的有效性，结果显示不仅在视觉质量上有显著提升（PSNR 提高超过 2.5 dB），动态纹理中的模糊伪影减少，而且高斯粒子的轨迹更加规整且可追踪。\n"
  },
  {
    "path": "abs/2507.23772.md",
    "content": "### SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting\n\n3D affordance reasoning, the task of associating human instructions with the functional regions of 3D objects, is a critical capability for embodied agents. Current methods based on 3D Gaussian Splatting (3DGS) are fundamentally limited to single-object, single-step interactions, a paradigm that falls short of addressing the long-horizon, multi-object tasks required for complex real-world applications. To bridge this gap, we introduce the novel task of Sequential 3D Gaussian Affordance Reasoning and establish SeqAffordSplat, a large-scale benchmark featuring 1800+ scenes to support research on long-horizon affordance understanding in complex 3DGS environments. We then propose SeqSplatNet, an end-to-end framework that directly maps an instruction to a sequence of 3D affordance masks. SeqSplatNet employs a large language model that autoregressively generates text interleaved with special segmentation tokens, guiding a conditional decoder to produce the corresponding 3D mask. To handle complex scene geometry, we introduce a pre-training strategy, Conditional Geometric Reconstruction, where the model learns to reconstruct complete affordance region masks from known geometric observations, thereby building a robust geometric prior. Furthermore, to resolve semantic ambiguities, we design a feature injection mechanism that lifts rich semantic features from 2D Vision Foundation Models (VFM) and fuses them into the 3D decoder at multiple scales. Extensive experiments demonstrate that our method sets a new state-of-the-art on our challenging benchmark, effectively advancing affordance reasoning from single-step interactions to complex, sequential tasks at the scene level.\n\n三维可供性推理（3D affordance reasoning）旨在将人类指令与三维物体的功能区域关联起来，是具身智能体的一项关键能力。当前基于 3D 高斯泼溅（3DGS）的方法，基本上局限于单物体、单步骤交互，这种范式难以满足复杂真实场景中长时序、多物体任务的需求。为弥补这一差距，我们提出了一项新的任务——顺序三维高斯可供性推理（Sequential 3D Gaussian Affordance Reasoning），并构建了 SeqAffordSplat，一个包含 1800+ 场景的大规模基准，用于支持在复杂 3DGS 环境中开展长时序可供性理解的研究。随后，我们提出了 SeqSplatNet，这是一个端到端框架，可将指令直接映射为一系列三维可供性掩码。SeqSplatNet 利用大语言模型自回归生成包含特殊分割标记的文本，引导条件解码器生成对应的三维掩码。为应对复杂场景几何，我们引入了条件几何重建（Conditional Geometric Reconstruction）预训练策略，使模型能够根据已知几何观测重建完整的可供性区域掩码，从而建立稳健的几何先验。此外，为解决语义歧义，我们设计了特征注入机制，将来自二维视觉基础模型（VFM）的丰富语义特征提升到三维解码器中，并在多尺度上进行融合。大量实验表明，我们的方法在具有挑战性的基准上刷新了最新性能记录，有效推动了可供性推理从单步交互发展到场景级的复杂顺序任务。\n"
  },
  {
    "path": "abs/2507.23785.md",
    "content": "### Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis\n\nIn this paper, we present a novel framework for video-to-4D generation that creates high-quality dynamic 3D content from single video inputs. Direct 4D diffusion modeling is extremely challenging due to costly data construction and the high-dimensional nature of jointly representing 3D shape, appearance, and motion. We address these challenges by introducing a Direct 4DMesh-to-GS Variation Field VAE that directly encodes canonical Gaussian Splats (GS) and their temporal variations from 3D animation data without per-instance fitting, and compresses high-dimensional animations into a compact latent space. Building upon this efficient representation, we train a Gaussian Variation Field diffusion model with temporal-aware Diffusion Transformer conditioned on input videos and canonical GS. Trained on carefully-curated animatable 3D objects from the Objaverse dataset, our model demonstrates superior generation quality compared to existing methods. It also exhibits remarkable generalization to in-the-wild video inputs despite being trained exclusively on synthetic data, paving the way for generating high-quality animated 3D content.\n\n本文提出了一种新颖的视频到 4D 生成框架，可从单个视频输入生成高质量的动态三维内容。直接进行 4D 扩散建模极具挑战性，因为这不仅需要高成本的数据构建，还要应对同时表示三维形状、外观和运动的高维特性。为解决这些问题，我们引入了一种直接的 4DMesh-to-GS 变动场 VAE，该方法可直接从三维动画数据中编码标准高斯泼溅（GS）及其时间变化，而无需针对每个实例进行拟合，并将高维动画压缩到紧凑的潜在空间。在这一高效表示的基础上，我们训练了一个高斯变动场扩散模型，该模型结合了时间感知的扩散 Transformer，并以输入视频和标准 GS 为条件。该模型在精心筛选的 Objaverse 数据集中可动画的三维物体上进行训练，与现有方法相比展现出更高的生成质量。同时，即便仅在合成数据上训练，它在真实视频输入上的泛化能力也十分出色，为生成高质量的动画三维内容开辟了新途径。\n"
  },
  {
    "path": "abs/2508.00259.md",
    "content": "### PointGauss: Point Cloud-Guided Multi-Object Segmentation for Gaussian Splatting\n\nWe introduce PointGauss, a novel point cloud-guided framework for real-time multi-object segmentation in Gaussian Splatting representations. Unlike existing methods that suffer from prolonged initialization and limited multi-view consistency, our approach achieves efficient 3D segmentation by directly parsing Gaussian primitives through a point cloud segmentation-driven pipeline. The key innovation lies in two aspects: (1) a point cloud-based Gaussian primitive decoder that generates 3D instance masks within 1 minute, and (2) a GPU-accelerated 2D mask rendering system that ensures multi-view consistency. Extensive experiments demonstrate significant improvements over previous state-of-the-art methods, achieving performance gains of 1.89 to 31.78% in multi-view mIoU, while maintaining superior computational efficiency. To address the limitations of current benchmarks (single-object focus, inconsistent 3D evaluation, small scale, and partial coverage), we present DesktopObjects-360, a novel comprehensive dataset for 3D segmentation in radiance fields, featuring: (1) complex multi-object scenes, (2) globally consistent 2D annotations, (3) large-scale training data (over 27 thousand 2D masks), (4) full 360° coverage, and (5) 3D evaluation masks.\n\n我们提出了 PointGauss，这是一种面向高斯泼溅表示的点云引导实时多目标分割新框架。与现有方法存在初始化耗时长、多视图一致性有限等问题不同，我们的方法通过点云分割驱动的流水线直接解析高斯基元，实现了高效的三维分割。其核心创新有两点：（1）基于点云的高斯基元解码器，可在 1 分钟内生成三维实例掩码；（2）GPU 加速的二维掩码渲染系统，确保多视图一致性。大量实验表明，与之前的最新方法相比，该方法在多视图 mIoU 上提升了 1.89% 至 31.78%，同时保持了卓越的计算效率。为解决现有基准存在的局限（如单目标任务为主、三维评估不一致、规模小、覆盖不完整等），我们构建了 DesktopObjects-360 数据集，这是一个用于辐射场三维分割的全新综合性数据集，具备以下特点：（1）复杂多目标场景，（2）全局一致的二维标注，（3）大规模训练数据（超过 2.7 万个二维掩码），（4）完整的 360° 覆盖，（5）三维评估掩码。\n"
  },
  {
    "path": "abs/2508.00354.md",
    "content": "### Omni-Scan: Creating Visually-Accurate Digital Twin Object Models Using a Bimanual Robot with Handover and Gaussian Splat Merging\n\n3D Gaussian Splats (3DGSs) are 3D object models derived from multi-view images. Such \"digital twins\" are useful for simulations, virtual reality, marketing, robot policy fine-tuning, and part inspection. 3D object scanning usually requires multi-camera arrays, precise laser scanners, or robot wrist-mounted cameras, which have restricted workspaces. We propose Omni-Scan, a pipeline for producing high-quality 3D Gaussian Splat models using a bi-manual robot that grasps an object with one gripper and rotates the object with respect to a stationary camera. The object is then re-grasped by a second gripper to expose surfaces that were occluded by the first gripper. We present the Omni-Scan robot pipeline using DepthAny-thing, Segment Anything, as well as RAFT optical flow models to identify and isolate objects held by a robot gripper while removing the gripper and the background. We then modify the 3DGS training pipeline to support concatenated datasets with gripper occlusion, producing an omni-directional (360 degree view) model of the object. We apply Omni-Scan to part defect inspection, finding that it can identify visual or geometric defects in 12 different industrial and household objects with an average accuracy of 83%.\n\n三维高斯泼溅（3DGS）是一种由多视图图像生成的三维物体模型。这类“数字孪生”可广泛应用于仿真、虚拟现实、营销、机器人策略微调以及零件检测等领域。传统的三维物体扫描通常需要多相机阵列、精密激光扫描仪或安装在机器人手腕上的相机，但这些方法受限于工作空间。我们提出了 Omni-Scan，这是一种利用双臂机器人生成高质量 3D 高斯泼溅模型的流程。该流程中，一只机械手夹持物体并相对于固定相机旋转物体，然后由另一只机械手重新夹持物体，以暴露被第一只机械手遮挡的表面。我们在 Omni-Scan 机器人流程中结合了 DepthAnything、Segment Anything 以及 RAFT 光流模型，用于识别并分离由机器人夹持的物体，同时去除机械手和背景。随后，我们对 3DGS 训练流程进行修改，以支持包含机械手遮挡的拼接数据集，从而生成物体的全方位（360 度）模型。我们将 Omni-Scan 应用于零件缺陷检测，结果表明它能够以 83% 的平均准确率识别出 12 种不同工业和家用物体的视觉或几何缺陷。\n"
  },
  {
    "path": "abs/2508.00823.md",
    "content": "### IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation\n\nVisual navigation with an image as goal is a fundamental and challenging problem. Conventional methods either rely on end-to-end RL learning or modular-based policy with topological graph or BEV map as memory, which cannot fully model the geometric relationship between the explored 3D environment and the goal image. In order to efficiently and accurately localize the goal image in 3D space, we build our navigation system upon the renderable 3D gaussian (3DGS) representation. However, due to the computational intensity of 3DGS optimization and the large search space of 6-DoF camera pose, directly leveraging 3DGS for image localization during agent exploration process is prohibitively inefficient. To this end, we propose IGL-Nav, an Incremental 3D Gaussian Localization framework for efficient and 3D-aware image-goal navigation. Specifically, we incrementally update the scene representation as new images arrive with feed-forward monocular prediction. Then we coarsely localize the goal by leveraging the geometric information for discrete space matching, which can be equivalent to efficient 3D convolution. When the agent is close to the goal, we finally solve the fine target pose with optimization via differentiable rendering. The proposed IGL-Nav outperforms existing state-of-the-art methods by a large margin across diverse experimental configurations. It can also handle the more challenging free-view image-goal setting and be deployed on real-world robotic platform using a cellphone to capture goal image at arbitrary pose.\n\n以图像作为目标的视觉导航是一个基础且具有挑战性的问题。传统方法要么依赖端到端的强化学习，要么采用基于模块化的策略，将拓扑图或鸟瞰图（BEV）作为记忆，但这些方法无法充分建模已探索的三维环境与目标图像之间的几何关系。为了在三维空间中高效且准确地定位目标图像，我们将导航系统建立在可渲染的三维高斯（3DGS）表示之上。然而，由于 3DGS 优化的计算开销大以及 6 自由度相机位姿的搜索空间庞大，在智能体探索过程中直接利用 3DGS 进行图像定位效率极低。为此，我们提出了 IGL-Nav，这是一种用于高效且具备三维感知能力的图像目标导航的增量式三维高斯定位框架。具体而言，我们在新图像到来时，利用前向单目预测增量更新场景表示；然后利用几何信息进行离散空间匹配以粗略定位目标，这相当于高效的三维卷积；当智能体接近目标时，我们通过可微渲染优化求解精确的目标位姿。所提出的 IGL-Nav 在多种实验配置下均显著超越现有的最新方法，同时还能处理更具挑战性的自由视角图像目标设定，并可部署在真实的机器人平台上，仅需使用手机在任意姿态下拍摄目标图像即可。\n"
  },
  {
    "path": "abs/2508.01150.md",
    "content": "### OpenGS-Fusion: Open-Vocabulary Dense Mapping with Hybrid 3D Gaussian Splatting for Refined Object-Level Understanding\n\nRecent advancements in 3D scene understanding have made significant strides in enabling interaction with scenes using open-vocabulary queries, particularly for VR/AR and robotic applications. Nevertheless, existing methods are hindered by rigid offline pipelines and the inability to provide precise 3D object-level understanding given open-ended queries. In this paper, we present OpenGS-Fusion, an innovative open-vocabulary dense mapping framework that improves semantic modeling and refines object-level understanding. OpenGS-Fusion combines 3D Gaussian representation with a Truncated Signed Distance Field to facilitate lossless fusion of semantic features on-the-fly. Furthermore, we introduce a novel multimodal language-guided approach named MLLM-Assisted Adaptive Thresholding, which refines the segmentation of 3D objects by adaptively adjusting similarity thresholds, achieving an improvement 17\\% in 3D mIoU compared to the fixed threshold strategy. Extensive experiments demonstrate that our method outperforms existing methods in 3D object understanding and scene reconstruction quality, as well as showcasing its effectiveness in language-guided scene interaction.\n\n近年来，三维场景理解取得了显著进展，使得在 VR/AR 和机器人等应用中能够利用开放词汇查询与场景进行交互。然而，现有方法受制于僵化的离线流程，并且在开放式查询条件下无法提供精确的三维物体级理解。本文提出了 OpenGS-Fusion，这是一种创新的开放词汇稠密建图框架，可提升语义建模能力并优化物体级理解。OpenGS-Fusion 将三维高斯表示与截断符号距离场（TSDF）结合，实现了语义特征的无损在线融合。此外，我们引入了一种新颖的多模态语言引导方法——MLLM 辅助自适应阈值（MLLM-Assisted Adaptive Thresholding），通过自适应调整相似度阈值来优化三维物体的分割效果，与固定阈值策略相比，3D mIoU 提升了 17%。大量实验表明，我们的方法在三维物体理解和场景重建质量上均优于现有方法，同时展现了其在语言引导的场景交互中的有效性。\n"
  },
  {
    "path": "abs/2508.01171.md",
    "content": "### No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views\n\nWe introduce SPFSplat, an efficient framework for 3D Gaussian splatting from sparse multi-view images, requiring no ground-truth poses during training or inference. It employs a shared feature extraction backbone, enabling simultaneous prediction of 3D Gaussian primitives and camera poses in a canonical space from unposed inputs within a single feed-forward step. Alongside the rendering loss based on estimated novel-view poses, a reprojection loss is integrated to enforce the learning of pixel-aligned Gaussian primitives for enhanced geometric constraints. This pose-free training paradigm and efficient one-step feed-forward design make SPFSplat well-suited for practical applications. Remarkably, despite the absence of pose supervision, SPFSplat achieves state-of-the-art performance in novel view synthesis even under significant viewpoint changes and limited image overlap. It also surpasses recent methods trained with geometry priors in relative pose estimation.\n\n我们提出了 SPFSplat，这是一种高效的三维高斯泼溅框架，可从稀疏多视图图像中进行建模，训练和推理过程中均无需真实位姿。该方法采用共享特征提取骨干网络，使得在单次前向推理中即可从无位姿输入同时预测标准空间下的三维高斯基元和相机位姿。除了基于估计的新视角位姿的渲染损失外，还引入了重投影损失，以强化像素对齐的高斯基元学习，从而增强几何约束。这种无位姿监督的训练范式与高效的一步前向设计，使 SPFSplat 非常适用于实际应用。值得注意的是，即使在缺乏位姿监督的情况下，SPFSplat 在新视角合成中依然取得了最新的性能，即便在视角变化较大和图像重叠有限的条件下亦如此。此外，它在相对位姿估计中也优于利用几何先验进行训练的最新方法。\n"
  },
  {
    "path": "abs/2508.01218.md",
    "content": "### MoGaFace: Momentum-Guided and Texture-Aware Gaussian Avatars for Consistent Facial Geometry\n\nExisting 3D head avatar reconstruction methods adopt a two-stage process, relying on tracked FLAME meshes derived from facial landmarks, followed by Gaussian-based rendering. However, misalignment between the estimated mesh and target images often leads to suboptimal rendering quality and loss of fine visual details. In this paper, we present MoGaFace, a novel 3D head avatar modeling framework that continuously refines facial geometry and texture attributes throughout the Gaussian rendering process. To address the misalignment between estimated FLAME meshes and target images, we introduce the Momentum-Guided Consistent Geometry module, which incorporates a momentum-updated expression bank and an expression-aware correction mechanism to ensure temporal and multi-view consistency. Additionally, we propose Latent Texture Attention, which encodes compact multi-view features into head-aware representations, enabling geometry-aware texture refinement via integration into Gaussians. Extensive experiments show that MoGaFace achieves high-fidelity head avatar reconstruction and significantly improves novel-view synthesis quality, even under inaccurate mesh initialization and unconstrained real-world settings.\n\n现有的三维头像重建方法通常采用两阶段流程，首先依赖由面部关键点估计得到的 FLAME 网格进行跟踪，然后再进行基于高斯的渲染。然而，估计网格与目标图像之间的错位常常导致渲染质量不佳，并丢失细节视觉信息。本文提出了 MoGaFace，这是一种新颖的三维头像建模框架，可在高斯渲染过程中持续优化面部几何和纹理属性。为解决估计的 FLAME 网格与目标图像的错位问题，我们引入了动量引导一致几何（Momentum-Guided Consistent Geometry）模块，该模块结合了动量更新的表情库和基于表情感知的校正机制，以确保时间和多视图的一致性。此外，我们提出了潜纹理注意（Latent Texture Attention）机制，将紧凑的多视图特征编码为面部感知表征，并将其融合到高斯中，实现基于几何感知的纹理优化。大量实验表明，MoGaFace 在不准确网格初始化和不受约束的真实场景下，依然能够实现高保真头像重建，并显著提升新视角合成质量。\n"
  },
  {
    "path": "abs/2508.01239.md",
    "content": "### OCSplats: Observation Completeness Quantification and Label Noise Separation in 3DGS\n\n3D Gaussian Splatting (3DGS) has become one of the most promising 3D reconstruction technologies. However, label noise in real-world scenarios-such as moving objects, non-Lambertian surfaces, and shadows-often leads to reconstruction errors. Existing 3DGS-Bsed anti-noise reconstruction methods either fail to separate noise effectively or require scene-specific fine-tuning of hyperparameters, making them difficult to apply in practice. This paper re-examines the problem of anti-noise reconstruction from the perspective of epistemic uncertainty, proposing a novel framework, OCSplats. By combining key technologies such as hybrid noise assessment and observation-based cognitive correction, the accuracy of noise classification in areas with cognitive differences has been significantly improved. Moreover, to address the issue of varying noise proportions in different scenarios, we have designed a label noise classification pipeline based on dynamic anchor points. This pipeline enables OCSplats to be applied simultaneously to scenarios with vastly different noise proportions without adjusting parameters. Extensive experiments demonstrate that OCSplats always achieve leading reconstruction performance and precise label noise classification in scenes of different complexity levels.\n\n三维高斯泼溅（3DGS）已成为最具前景的三维重建技术之一。然而，在真实场景中，诸如运动物体、非朗伯表面以及阴影等因素引入的标签噪声，常常会导致重建误差。现有基于 3DGS 的抗噪重建方法，要么无法有效分离噪声，要么需要针对特定场景对超参数进行精细调整，从而难以在实际中推广。本文从认知不确定性（epistemic uncertainty）的视角重新审视抗噪重建问题，提出了一种新框架——OCSplats。该方法结合了混合噪声评估与基于观测的认知校正等关键技术，大幅提升了在存在认知差异区域的噪声分类精度。此外，为应对不同场景中噪声比例差异较大的问题，我们设计了一种基于动态锚点的标签噪声分类流程，使 OCSplats 能够在无需调整参数的情况下同时适用于噪声比例差异显著的多种场景。大量实验表明，OCSplats 在不同复杂度场景中均能实现领先的重建性能和精确的标签噪声分类。\n"
  },
  {
    "path": "abs/2508.01464.md",
    "content": "### Can3Tok: Canonical 3D Tokenization and Latent Modeling of Scene-Level 3D Gaussians\n\n3D generation has made significant progress, however, it still largely remains at the object-level. Feedforward 3D scene-level generation has been rarely explored due to the lack of models capable of scaling-up latent representation learning on 3D scene-level data. Unlike object-level generative models, which are trained on well-labeled 3D data in a bounded canonical space, scene-level generations with 3D scenes represented by 3D Gaussian Splatting (3DGS) are unbounded and exhibit scale inconsistency across different scenes, making unified latent representation learning for generative purposes extremely challenging. In this paper, we introduce Can3Tok, the first 3D scene-level variational autoencoder (VAE) capable of encoding a large number of Gaussian primitives into a low-dimensional latent embedding, which effectively captures both semantic and spatial information of the inputs. Beyond model design, we propose a general pipeline for 3D scene data processing to address scale inconsistency issue. We validate our method on the recent scene-level 3D dataset DL3DV-10K, where we found that only Can3Tok successfully generalizes to novel 3D scenes, while compared methods fail to converge on even a few hundred scene inputs during training and exhibit zero generalization ability during inference. Finally, we demonstrate image-to-3DGS and text-to-3DGS generation as our applications to demonstrate its ability to facilitate downstream generation tasks.\n\n三维生成技术已取得显著进展，但仍主要停留在物体级别。由于缺乏能够在三维场景级数据上扩展潜在表示学习的模型，前向式的三维场景级生成鲜有探索。与在有界标准空间中利用标注完善的三维数据训练的物体级生成模型不同，基于三维高斯泼溅（3DGS）表示的场景级生成是无界的，并且在不同场景间存在尺度不一致问题，这使得面向生成任务的统一潜在表示学习极具挑战性。本文提出了 Can3Tok，这是首个能够将大量高斯基元编码为低维潜在嵌入的三维场景级变分自编码器（VAE），能够有效捕获输入的语义和空间信息。除了模型设计，我们还提出了一套通用的三维场景数据处理流程，以解决尺度不一致的问题。我们在最新的场景级三维数据集 DL3DV-10K 上验证了该方法，结果发现，只有 Can3Tok 能够成功泛化到新的三维场景，而对比方法在训练中即使面对几百个场景输入也无法收敛，并且在推理时表现出零泛化能力。最后，我们展示了图像到 3DGS 和文本到 3DGS 的生成应用，以证明其在下游生成任务中的促进作用。\n"
  },
  {
    "path": "abs/2508.01704.md",
    "content": "### LT-Gaussian: Long-Term Map Update Using 3D Gaussian Splatting for Autonomous Driving\n\nMaps play an important role in autonomous driving systems. The recently proposed 3D Gaussian Splatting (3D-GS) produces rendering-quality explicit scene reconstruction results, demonstrating the potential for map construction in autonomous driving scenarios. However, because of the time and computational costs involved in generating Gaussian scenes, how to update the map becomes a significant challenge. In this paper, we propose LT-Gaussian, a map update method for 3D-GS-based maps. LT-Gaussian consists of three main components: Multimodal Gaussian Splatting, Structural Change Detection Module, and Gaussian-Map Update Module. Firstly, the Gaussian map of the old scene is generated using our proposed Multimodal Gaussian Splatting. Subsequently, during the map update process, we compare the outdated Gaussian map with the current LiDAR data stream to identify structural changes. Finally, we perform targeted updates to the Gaussian-map to generate an up-to-date map. We establish a benchmark for map updating on the nuScenes dataset to quantitatively evaluate our method. The experimental results show that LT-Gaussian can effectively and efficiently update the Gaussian-map, handling common environmental changes in autonomous driving scenarios. Furthermore, by taking full advantage of information from both new and old scenes, LT-Gaussian is able to produce higher quality reconstruction results compared to map update strategies that reconstruct maps from scratch.\n\n地图在自动驾驶系统中起着重要作用。近期提出的三维高斯泼溅（3D-GS）能够生成具备渲染质量的显式场景重建结果，展示了其在自动驾驶场景中构建地图的潜力。然而，由于生成高斯场景所需的时间和计算成本较高，如何高效更新地图成为一个重要挑战。本文提出了 LT-Gaussian，这是一种面向基于 3D-GS 地图的更新方法。LT-Gaussian 包含三个主要模块：多模态高斯泼溅、结构变化检测模块以及高斯地图更新模块。首先，我们利用所提出的多模态高斯泼溅生成旧场景的高斯地图；随后，在地图更新过程中，将过时的高斯地图与当前的 LiDAR 数据流进行对比，以识别结构变化；最后，对高斯地图进行针对性更新，从而生成最新的地图。我们在 nuScenes 数据集上建立了地图更新基准，以定量评估所提方法。实验结果表明，LT-Gaussian 能够高效且有效地更新高斯地图，能够应对自动驾驶场景中的常见环境变化。此外，通过充分利用新旧场景信息，LT-Gaussian 相较于从零开始重建地图的更新策略，能够生成更高质量的重建结果。\n"
  },
  {
    "path": "abs/2508.01740.md",
    "content": "### AG2aussian: Anchor-Graph Structured Gaussian Splatting for Instance-Level 3D Scene Understanding and Editing\n\n3D Gaussian Splatting (3DGS) has witnessed exponential adoption across diverse applications, driving a critical need for semantic-aware 3D Gaussian representations to enable scene understanding and editing tasks. Existing approaches typically attach semantic features to a collection of free Gaussians and distill the features via differentiable rendering, leading to noisy segmentation and a messy selection of Gaussians. In this paper, we introduce AG2aussian, a novel framework that leverages an anchor-graph structure to organize semantic features and regulate Gaussian primitives. Our anchor-graph structure not only promotes compact and instance-aware Gaussian distributions, but also facilitates graph-based propagation, achieving a clean and accurate instance-level Gaussian selection. Extensive validation across four applications, i.e. interactive click-based query, open-vocabulary text-driven query, object removal editing, and physics simulation, demonstrates the advantages of our approach and its benefits to various applications. The experiments and ablation studies further evaluate the effectiveness of the key designs of our approach.\n\n三维高斯泼溅（3DGS）在多种应用中得到了指数级的普及，这对具备语义感知能力的三维高斯表示提出了迫切需求，以支持场景理解与编辑任务。现有方法通常将语义特征附加到一组自由高斯上，并通过可微渲染进行特征蒸馏，这往往导致分割结果噪声较大，高斯选择杂乱无序。本文提出了 AG2aussian，这是一种利用锚点图结构（anchor-graph）组织语义特征并规范高斯基元的新框架。锚点图结构不仅能够促进紧凑且具备实例感知能力的高斯分布，还能支持基于图的特征传播，从而实现干净且精准的实例级高斯选择。在交互式点击查询、开放词汇文本驱动查询、目标移除编辑以及物理仿真四个应用上的广泛验证表明，我们的方法在多种任务中均具有优势。实验与消融研究进一步验证了该方法关键设计的有效性。\n"
  },
  {
    "path": "abs/2508.02172.md",
    "content": "### GaussianCross: Cross-modal Self-supervised 3D Representation Learning via Gaussian Splatting\n\nThe significance of informative and robust point representations has been widely acknowledged for 3D scene understanding. Despite existing self-supervised pre-training counterparts demonstrating promising performance, the model collapse and structural information deficiency remain prevalent due to insufficient point discrimination difficulty, yielding unreliable expressions and suboptimal performance. In this paper, we present GaussianCross, a novel cross-modal self-supervised 3D representation learning architecture integrating feed-forward 3D Gaussian Splatting (3DGS) techniques to address current challenges. GaussianCross seamlessly converts scale-inconsistent 3D point clouds into a unified cuboid-normalized Gaussian representation without missing details, enabling stable and generalizable pre-training. Subsequently, a tri-attribute adaptive distillation splatting module is incorporated to construct a 3D feature field, facilitating synergetic feature capturing of appearance, geometry, and semantic cues to maintain cross-modal consistency. To validate GaussianCross, we perform extensive evaluations on various benchmarks, including ScanNet, ScanNet200, and S3DIS. In particular, GaussianCross shows a prominent parameter and data efficiency, achieving superior performance through linear probing (&lt; 0.1% parameters) and limited data training (1% of scenes) compared to state-of-the-art methods. Furthermore, GaussianCross demonstrates strong generalization capabilities, improving the full fine-tuning accuracy by 9.3% mIoU and 6.1% AP50 on ScanNet200 semantic and instance segmentation tasks, respectively, supporting the effectiveness of our approach.\n\n信息丰富且稳健的点云表示对于三维场景理解的重要性已被广泛认可。尽管现有的自监督预训练方法在性能上展现出潜力，但由于点云区分难度不足，模型坍塌和结构信息缺失仍普遍存在，从而导致表示不可靠、性能欠佳。本文提出了 GaussianCross，这是一种融合前向三维高斯泼溅（3DGS）技术的跨模态自监督三维表示学习架构，用以应对当前挑战。GaussianCross 可将尺度不一致的三维点云无缝转换为统一的立方体归一化高斯表示，同时保留细节，从而实现稳定且具有泛化能力的预训练。随后，我们引入三属性自适应蒸馏泼溅模块，构建三维特征场，以协同捕获外观、几何与语义信息，从而保持跨模态一致性。为了验证 GaussianCross，我们在 ScanNet、ScanNet200 和 S3DIS 等多个基准上进行了广泛评估。特别地，GaussianCross 在参数与数据效率上表现突出，与最新方法相比，在仅使用线性探测（&lt; 0.1% 参数量）和有限数据训练（仅 1% 场景）的条件下仍能取得更优性能。此外，GaussianCross 展现出较强的泛化能力，在 ScanNet200 的语义分割和实例分割任务上，完整微调的精度分别提升了 9.3% mIoU 和 6.1% AP50，验证了该方法的有效性。\n"
  },
  {
    "path": "abs/2508.02261.md",
    "content": "### SplatSSC: Decoupled Depth-Guided Gaussian Splatting for Semantic Scene Completion\n\nMonocular 3D Semantic Scene Completion (SSC) is a challenging yet promising task that aims to infer dense geometric and semantic descriptions of a scene from a single image. While recent object-centric paradigms significantly improve efficiency by leveraging flexible 3D Gaussian primitives, they still rely heavily on a large number of randomly initialized primitives, which inevitably leads to 1) inefficient primitive initialization and 2) outlier primitives that introduce erroneous artifacts. In this paper, we propose SplatSSC, a novel framework that resolves these limitations with a depth-guided initialization strategy and a principled Gaussian aggregator. Instead of random initialization, SplatSSC utilizes a dedicated depth branch composed of a Group-wise Multi-scale Fusion (GMF) module, which integrates multi-scale image and depth features to generate a sparse yet representative set of initial Gaussian primitives. To mitigate noise from outlier primitives, we develop the Decoupled Gaussian Aggregator (DGA), which enhances robustness by decomposing geometric and semantic predictions during the Gaussian-to-voxel splatting process. Complemented with a specialized Probability Scale Loss, our method achieves state-of-the-art performance on the Occ-ScanNet dataset, outperforming prior approaches by over 6.3% in IoU and 4.1% in mIoU, while reducing both latency and memory consumption by more than 9.3%.\n\n单目三维语义场景补全（SSC）是一项具有挑战性但前景广阔的任务，旨在从单张图像中推断场景的密集几何与语义描述。尽管近期基于物体中心的范式通过利用灵活的三维高斯基元显著提升了效率，但它们仍然在很大程度上依赖于大量随机初始化的基元，这不可避免地导致 1）基元初始化效率低下，以及 2）离群基元引入错误伪影。本文提出了 SplatSSC，这是一种通过深度引导初始化策略和结构化高斯聚合器来解决上述问题的新框架。SplatSSC 不再采用随机初始化，而是利用由分组多尺度融合（GMF）模块构成的专用深度分支，将多尺度图像与深度特征融合，生成稀疏但具有代表性的一组初始高斯基元。为减轻离群基元带来的噪声，我们设计了解耦高斯聚合器（DGA），在高斯到体素的泼溅过程中将几何与语义预测解耦，从而提升鲁棒性。结合专门设计的概率尺度损失（Probability Scale Loss），我们的方法在 Occ-ScanNet 数据集上实现了最新的性能，相比现有方法在 IoU 上提升超过 6.3%，在 mIoU 上提升 4.1%，同时将延迟和内存消耗降低了 9.3% 以上。\n"
  },
  {
    "path": "abs/2508.02408.md",
    "content": "### GR-Gaussian: Graph-Based Radiative Gaussian Splatting for Sparse-View CT Reconstruction\n\n3D Gaussian Splatting (3DGS) has emerged as a promising approach for CT reconstruction. However, existing methods rely on the average gradient magnitude of points within the view, often leading to severe needle-like artifacts under sparse-view conditions. To address this challenge, we propose GR-Gaussian, a graph-based 3D Gaussian Splatting framework that suppresses needle-like artifacts and improves reconstruction accuracy under sparse-view conditions. Our framework introduces two key innovations: (1) a Denoised Point Cloud Initialization Strategy that reduces initialization errors and accelerates convergence; and (2) a Pixel-Graph-Aware Gradient Strategy that refines gradient computation using graph-based density differences, improving splitting accuracy and density representation. Experiments on X-3D and real-world datasets validate the effectiveness of GR-Gaussian, achieving PSNR improvements of 0.67 dB and 0.92 dB, and SSIM gains of 0.011 and 0.021. These results highlight the applicability of GR-Gaussian for accurate CT reconstruction under challenging sparse-view conditions.\n\n三维高斯泼溅（3DGS）已成为一种有潜力的 CT 重建方法。然而，现有方法依赖于视图内点的平均梯度幅值，在稀疏视角条件下往往会导致严重的针状伪影。为应对这一挑战，我们提出了 GR-Gaussian，这是一种基于图的三维高斯泼溅框架，可在稀疏视角条件下抑制针状伪影并提升重建精度。该框架包含两项核心创新：（1）去噪点云初始化策略，可减少初始化误差并加快收敛速度；（2）像素图感知梯度策略，通过利用基于图的密度差异优化梯度计算，从而提升分裂精度与密度表示能力。在 X-3D 和真实数据集上的实验验证了 GR-Gaussian 的有效性，分别实现了 PSNR 提升 0.67 dB 和 0.92 dB，以及 SSIM 提升 0.011 和 0.021。这些结果凸显了 GR-Gaussian 在具有挑战性的稀疏视角条件下实现高精度 CT 重建的适用性。\n"
  },
  {
    "path": "abs/2508.02493.md",
    "content": "### Low-Frequency First: Eliminating Floating Artifacts in 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) is a powerful and computationally efficient representation for 3D reconstruction. Despite its strengths, 3DGS often produces floating artifacts, which are erroneous structures detached from the actual geometry and significantly degrade visual fidelity. The underlying mechanisms causing these artifacts, particularly in low-quality initialization scenarios, have not been fully explored. In this paper, we investigate the origins of floating artifacts from a frequency-domain perspective and identify under-optimized Gaussians as the primary source. Based on our analysis, we propose Eliminating-Floating-Artifacts Gaussian Splatting (EFA-GS), which selectively expands under-optimized Gaussians to prioritize accurate low-frequency learning. Additionally, we introduce complementary depth-based and scale-based strategies to dynamically refine Gaussian expansion, effectively mitigating detail erosion. Extensive experiments on both synthetic and real-world datasets demonstrate that EFA-GS substantially reduces floating artifacts while preserving high-frequency details, achieving an improvement of 1.68 dB in PSNR over baseline method on our RWLQ dataset. Furthermore, we validate the effectiveness of our approach in downstream 3D editing tasks.\n\n三维高斯溅射（3D Gaussian Splatting，3DGS）是一种功能强大且计算效率高的三维重建表示方法。尽管其具有诸多优势，3DGS 经常会产生漂浮伪影，即与真实几何结构脱离的错误结构，这会显著降低视觉保真度。导致这些伪影的潜在机制，尤其是在低质量初始化场景下的机制，尚未得到充分探究。本文从频域的角度分析漂浮伪影的成因，并将优化不足的高斯识别为主要来源。基于这一分析，我们提出了消除漂浮伪影高斯溅射（Eliminating-Floating-Artifacts Gaussian Splatting，EFA-GS）方法，该方法有选择地扩展优化不足的高斯，以优先实现精确的低频学习。此外，我们引入了互补的基于深度和基于尺度的策略，以动态优化高斯扩展，有效缓解细节侵蚀。在合成数据集和真实数据集上的大量实验表明，EFA-GS 在保留高频细节的同时，可显著减少漂浮伪影，在我们的 RWLQ 数据集上相较基线方法的 PSNR 提高了 1.68 dB。此外，我们还验证了该方法在下游三维编辑任务中的有效性。\n"
  },
  {
    "path": "abs/2508.02660.md",
    "content": "### PMGS: Reconstruction of Projectile Motion across Large Spatiotemporal Spans via 3D Gaussian Splatting\n\nModeling complex rigid motion across large spatiotemporal spans remains an unresolved challenge in dynamic reconstruction. Existing paradigms are mainly confined to short-term, small-scale deformation and offer limited consideration for physical consistency. This study proposes PMGS, focusing on reconstructing Projectile Motion via 3D Gaussian Splatting. The workflow comprises two stages: 1) Target Modeling: achieving object-centralized reconstruction through dynamic scene decomposition and an improved point density control; 2) Motion Recovery: restoring full motion sequences by learning per-frame SE(3) poses. We introduce an acceleration consistency constraint to bridge Newtonian mechanics and pose estimation, and design a dynamic simulated annealing strategy that adaptively schedules learning rates based on motion states. Futhermore, we devise a Kalman fusion scheme to optimize error accumulation from multi-source observations to mitigate disturbances. Experiments show PMGS's superior performance in reconstructing high-speed nonlinear rigid motion compared to mainstream dynamic methods.\n\n在动态重建中，跨越大时空范围的复杂刚体运动建模仍然是一个尚未解决的挑战。现有范式主要局限于短期、小尺度形变，对物理一致性的考虑有限。本研究提出了 PMGS 方法，专注于利用三维高斯溅射（3D Gaussian Splatting）重建抛射运动。该方法的工作流程包括两个阶段：1）目标建模：通过动态场景分解和改进的点密度控制，实现以目标为中心的重建；2）运动恢复：通过学习逐帧的 SE(3) 位姿，恢复完整的运动序列。我们引入了加速度一致性约束，将牛顿力学与位姿估计相结合，并设计了一种动态模拟退火策略，根据运动状态自适应地调整学习率。此外，我们提出了卡尔曼融合方案，以优化来自多源观测的误差累积，从而减轻干扰。实验结果表明，PMGS 在重建高速非线性刚体运动方面，相较主流动态方法具有更优表现。\n"
  },
  {
    "path": "abs/2508.02831.md",
    "content": "### GENIE: Gaussian Encoding for Neural Radiance Fields Interactive Editing\n\nNeural Radiance Fields (NeRF) and Gaussian Splatting (GS) have recently transformed 3D scene representation and rendering. NeRF achieves high-fidelity novel view synthesis by learning volumetric representations through neural networks, but its implicit encoding makes editing and physical interaction challenging. In contrast, GS represents scenes as explicit collections of Gaussian primitives, enabling real-time rendering, faster training, and more intuitive manipulation. This explicit structure has made GS particularly well-suited for interactive editing and integration with physics-based simulation. In this paper, we introduce GENIE (Gaussian Encoding for Neural Radiance Fields Interactive Editing), a hybrid model that combines the photorealistic rendering quality of NeRF with the editable and structured representation of GS. Instead of using spherical harmonics for appearance modeling, we assign each Gaussian a trainable feature embedding. These embeddings are used to condition a NeRF network based on the k nearest Gaussians to each query point. To make this conditioning efficient, we introduce Ray-Traced Gaussian Proximity Search (RT-GPS), a fast nearest Gaussian search based on a modified ray-tracing pipeline. We also integrate a multi-resolution hash grid to initialize and update Gaussian features. Together, these components enable real-time, locality-aware editing: as Gaussian primitives are repositioned or modified, their interpolated influence is immediately reflected in the rendered output. By combining the strengths of implicit and explicit representations, GENIE supports intuitive scene manipulation, dynamic interaction, and compatibility with physical simulation, bridging the gap between geometry-based editing and neural rendering.\n\n神经辐射场（Neural Radiance Fields，NeRF）和高斯溅射（Gaussian Splatting，GS）近年来革新了三维场景表示与渲染。NeRF 通过神经网络学习体积表示，实现了高保真新视角合成，但其隐式编码使得编辑和物理交互变得困难。相比之下，GS 将场景表示为由高斯基元显式构成的集合，能够实现实时渲染、更快的训练以及更直观的操作。这种显式结构使得 GS 特别适合交互式编辑以及与基于物理的模拟相结合。本文提出了 GENIE（Gaussian Encoding for Neural Radiance Fields Interactive Editing），一种融合 NeRF 的照片级渲染质量与 GS 的可编辑、结构化表示的混合模型。不同于使用球谐函数进行外观建模，我们为每个高斯分配一个可训练的特征嵌入。这些嵌入用于根据每个查询点的 k 个最近高斯对 NeRF 网络进行条件化。为了高效实现这种条件化，我们提出了基于改进光线追踪管线的快速最近高斯搜索方法——光线追踪高斯邻近搜索（Ray-Traced Gaussian Proximity Search，RT-GPS）。我们还引入了多分辨率哈希网格来初始化和更新高斯特征。上述组件共同实现了实时、局部感知的编辑：当高斯基元被重新定位或修改时，其插值影响会立即体现在渲染结果中。通过结合隐式与显式表示的优势，GENIE 支持直观的场景操控、动态交互以及与物理模拟的兼容性，弥合了基于几何的编辑与神经渲染之间的鸿沟。\n"
  },
  {
    "path": "abs/2508.03017.md",
    "content": "### SA-3DGS: A Self-Adaptive Compression Method for 3D Gaussian Splatting\n\nRecent advancements in 3D Gaussian Splatting have enhanced efficient and high-quality novel view synthesis. However, representing scenes requires a large number of Gaussian points, leading to high storage demands and limiting practical deployment. The latest methods facilitate the compression of Gaussian models but struggle to identify truly insignificant Gaussian points in the scene, leading to a decline in subsequent Gaussian pruning, compression quality, and rendering performance. To address this issue, we propose SA-3DGS, a method that significantly reduces storage costs while maintaining rendering quality. SA-3DGS learns an importance score to automatically identify the least significant Gaussians in scene reconstruction, thereby enabling effective pruning and redundancy reduction. Next, the importance-aware clustering module compresses Gaussians attributes more accurately into the codebook, improving the codebook's expressive capability while reducing model size. Finally, the codebook repair module leverages contextual scene information to repair the codebook, thereby recovering the original Gaussian point attributes and mitigating the degradation in rendering quality caused by information loss. Experimental results on several benchmark datasets show that our method achieves up to 66x compression while maintaining or even improving rendering quality. The proposed Gaussian pruning approach is not only adaptable to but also improves other pruning-based methods (e.g., LightGaussian), showcasing excellent performance and strong generalization ability.\n\n三维高斯溅射（3D Gaussian Splatting）的最新进展提升了高效且高质量的新视角合成能力。然而，场景表示需要大量高斯点，导致存储需求高，从而限制了实际部署。现有最新方法虽然能够压缩高斯模型，但难以准确识别场景中真正无关紧要的高斯点，进而影响后续高斯剪枝、压缩质量和渲染性能。为解决这一问题，我们提出了 SA-3DGS 方法，在显著降低存储成本的同时保持渲染质量。SA-3DGS 通过学习重要性评分，自动识别场景重建中最不重要的高斯点，从而实现有效剪枝与冗余减少。随后，重要性感知聚类模块将高斯属性更准确地压缩到码本中，提高码本的表达能力并减少模型规模。最后，码本修复模块利用场景上下文信息对码本进行修复，从而恢复原始高斯点属性，并缓解因信息丢失造成的渲染质量下降。在多个基准数据集上的实验结果表明，该方法在保持甚至提升渲染质量的同时，可实现高达 66 倍的压缩率。所提出的高斯剪枝方法不仅适用于其他基于剪枝的方法（如 LightGaussian），还可提升其性能，展现出优异的效果与强泛化能力。\n"
  },
  {
    "path": "abs/2508.03077.md",
    "content": "### RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions\n\nFeedforward 3D Gaussian Splatting (3DGS) overcomes the limitations of optimization-based 3DGS by enabling fast and high-quality reconstruction without the need for per-scene optimization. However, existing feedforward approaches typically assume that input multi-view images are clean and high-quality. In real-world scenarios, images are often captured under challenging conditions such as noise, low light, or rain, resulting in inaccurate geometry and degraded 3D reconstruction. To address these challenges, we propose a general and efficient multi-view feature enhancement module, RobustGS, which substantially improves the robustness of feedforward 3DGS methods under various adverse imaging conditions, enabling high-quality 3D reconstruction. The RobustGS module can be seamlessly integrated into existing pretrained pipelines in a plug-and-play manner to enhance reconstruction robustness. Specifically, we introduce a novel component, Generalized Degradation Learner, designed to extract generic representations and distributions of multiple degradations from multi-view inputs, thereby enhancing degradation-awareness and improving the overall quality of 3D reconstruction. In addition, we propose a novel semantic-aware state-space model. It first leverages the extracted degradation representations to enhance corrupted inputs in the feature space. Then, it employs a semantic-aware strategy to aggregate semantically similar information across different views, enabling the extraction of fine-grained cross-view correspondences and further improving the quality of 3D representations. Extensive experiments demonstrate that our approach, when integrated into existing methods in a plug-and-play manner, consistently achieves state-of-the-art reconstruction quality across various types of degradations.\n\n前向三维高斯溅射（Feedforward 3D Gaussian Splatting，3DGS）通过实现无需逐场景优化的快速高质量重建，克服了基于优化的 3DGS 方法的局限性。然而，现有前向方法通常假设输入的多视图图像是干净且高质量的。在实际场景中，图像常常是在噪声、低光照或雨天等具有挑战性的条件下采集的，从而导致几何不准确和三维重建质量下降。为应对这些挑战，我们提出了一个通用且高效的多视图特征增强模块——RobustGS，它能在多种不利成像条件下显著提升前向 3DGS 方法的鲁棒性，从而实现高质量的三维重建。RobustGS 模块可以以即插即用的方式无缝集成到现有的预训练管线中，从而增强重建的鲁棒性。具体而言，我们引入了一个新组件——广义退化学习器（Generalized Degradation Learner），用于从多视图输入中提取多种退化的通用表征与分布，从而提升退化感知能力并改善整体三维重建质量。此外，我们提出了一种新的语义感知状态空间模型：该模型首先利用提取的退化表征在特征空间中增强受损输入；随后，采用语义感知策略聚合跨视图中语义相似的信息，从而提取精细的跨视图对应关系，进一步提升三维表示质量。大量实验表明，当以即插即用的方式集成到现有方法中时，我们的方法在多种退化类型下均能稳定实现最先进的重建质量。\n"
  },
  {
    "path": "abs/2508.03180.md",
    "content": "### Duplex-GS: Proxy-Guided Weighted Blending for Real-Time Order-Independent Gaussian Splatting\n\nRecent advances in 3D Gaussian Splatting (3DGS) have demonstrated remarkable rendering fidelity and efficiency. However, these methods still rely on computationally expensive sequential alpha-blending operations, resulting in significant overhead, particularly on resource-constrained platforms. In this paper, we propose Duplex-GS, a dual-hierarchy framework that integrates proxy Gaussian representations with order-independent rendering techniques to achieve photorealistic results while sustaining real-time performance. To mitigate the overhead caused by view-adaptive radix sort, we introduce cell proxies for local Gaussians management and propose cell search rasterization for further acceleration. By seamlessly combining our framework with Order-Independent Transparency (OIT), we develop a physically inspired weighted sum rendering technique that simultaneously eliminates \"popping\" and \"transparency\" artifacts, yielding substantial improvements in both accuracy and efficiency. Extensive experiments on a variety of real-world datasets demonstrate the robustness of our method across diverse scenarios, including multi-scale training views and large-scale environments. Our results validate the advantages of the OIT rendering paradigm in Gaussian Splatting, achieving high-quality rendering with an impressive 1.5 to 4 speedup over existing OIT based Gaussian Splatting approaches and 52.2% to 86.9% reduction of the radix sort overhead without quality degradation.\n\n三维高斯溅射（3D Gaussian Splatting，3DGS）的最新进展在渲染保真度和效率方面展现了显著成就。然而，这些方法仍依赖计算开销巨大的顺序 Alpha 混合操作，尤其在资源受限的平台上会带来显著的性能负担。本文提出了 Duplex-GS，一种结合代理高斯表示与顺序无关渲染技术的双层级框架，在保持实时性能的同时实现了照片级渲染效果。为缓解视角自适应基数排序带来的开销，我们引入了用于本地高斯管理的单元代理，并提出单元搜索光栅化方法以进一步加速。通过将框架与顺序无关透明度（Order-Independent Transparency, OIT）无缝结合，我们设计了一种物理启发的加权求和渲染技术，同时消除了“跳变”和“透明度”伪影，在精度与效率方面均取得了显著提升。在多种真实数据集上的大量实验表明，该方法在多尺度训练视角和大规模环境等多种场景中均具有出色的鲁棒性。结果验证了 OIT 渲染范式在高斯溅射中的优势，实现了较现有基于 OIT 的高斯溅射方法 1.5 到 4 倍的加速，并在不降低质量的前提下减少了 52.2% 至 86.9% 的基数排序开销。\n"
  },
  {
    "path": "abs/2508.03227.md",
    "content": "### Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing\n\nWe address the challenge of lifting 2D visual segmentation to 3D in Gaussian Splatting. Existing methods often suffer from inconsistent 2D masks across viewpoints and produce noisy segmentation boundaries as they neglect these semantic cues to refine the learned Gaussians. To overcome this, we introduce Gaussian Instance Tracing (GIT), which augments the standard Gaussian representation with an instance weight matrix across input views. Leveraging the inherent consistency of Gaussians in 3D, we use this matrix to identify and correct 2D segmentation inconsistencies. Furthermore, since each Gaussian ideally corresponds to a single object, we propose a GIT-guided adaptive density control mechanism to split and prune ambiguous Gaussians during training, resulting in sharper and more coherent 2D and 3D segmentation boundaries. Experimental results show that our method extracts clean 3D assets and consistently improves 3D segmentation in both online (e.g., self-prompting) and offline (e.g., contrastive lifting) settings, enabling applications such as hierarchical segmentation, object extraction, and scene editing.\n\n我们研究了在高斯溅射（Gaussian Splatting）中将二维视觉分割提升到三维的挑战。现有方法常常存在跨视角二维掩码不一致的问题，并且由于忽视利用这些语义线索来优化已学习的高斯，导致分割边界噪声较大。为解决这一问题，我们提出了高斯实例追踪（Gaussian Instance Tracing，GIT）方法，在标准高斯表示中引入了跨输入视角的实例权重矩阵。利用三维高斯固有的一致性，我们使用该矩阵来识别并纠正二维分割中的不一致。此外，由于每个高斯理想情况下应对应于单一物体，我们提出了基于 GIT 引导的自适应密度控制机制，在训练过程中对存在歧义的高斯进行拆分与剪枝，从而获得更清晰、更一致的二维与三维分割边界。实验结果表明，我们的方法能够提取干净的三维资产，并在在线（如自提示）和离线（如对比提升）设置中持续提升三维分割性能，从而支持分层分割、物体提取和场景编辑等应用。\n"
  },
  {
    "path": "abs/2508.03643.md",
    "content": "### Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images\n\nReconstructing and semantically interpreting 3D scenes from sparse 2D views remains a fundamental challenge in computer vision. Conventional methods often decouple semantic understanding from reconstruction or necessitate costly per-scene optimization, thereby restricting their scalability and generalizability. In this paper, we introduce Uni3R, a novel feed-forward framework that jointly reconstructs a unified 3D scene representation enriched with open-vocabulary semantics, directly from unposed multi-view images. Our approach leverages a Cross-View Transformer to robustly integrate information across arbitrary multi-view inputs, which then regresses a set of 3D Gaussian primitives endowed with semantic feature fields. This unified representation facilitates high-fidelity novel view synthesis, open-vocabulary 3D semantic segmentation, and depth prediction, all within a single, feed-forward pass. Extensive experiments demonstrate that Uni3R establishes a new state-of-the-art across multiple benchmarks, including 25.07 PSNR on RE10K and 55.84 mIoU on ScanNet. Our work signifies a novel paradigm towards generalizable, unified 3D scene reconstruction and understanding.\n\n从稀疏的二维视图中重建并语义解析三维场景仍然是计算机视觉中的一项基本挑战。传统方法往往将语义理解与重建解耦，或需要代价高昂的逐场景优化，从而限制了其可扩展性与泛化能力。本文提出了 Uni3R，一种新颖的前向框架，可直接从无位姿的多视图图像中联合重建具备开放词汇语义的统一三维场景表示。我们的方法利用跨视图 Transformer（Cross-View Transformer）稳健整合任意多视图输入的信息，并回归一组带有语义特征场的三维高斯基元。这一统一表示能够在单次前向推理中同时实现高保真新视角合成、开放词汇三维语义分割以及深度预测。大量实验证明，Uni3R 在多个基准数据集上均建立了新的最先进水平，包括在 RE10K 上达到 25.07 PSNR、在 ScanNet 上达到 55.84 mIoU。本工作为通用化、统一的三维场景重建与理解开辟了新范式。\n"
  },
  {
    "path": "abs/2508.04078.md",
    "content": "### RLGS: Reinforcement Learning-Based Adaptive Hyperparameter Tuning for Gaussian Splatting\n\nHyperparameter tuning in 3D Gaussian Splatting (3DGS) is a labor-intensive and expert-driven process, often resulting in inconsistent reconstructions and suboptimal results. We propose RLGS, a plug-and-play reinforcement learning framework for adaptive hyperparameter tuning in 3DGS through lightweight policy modules, dynamically adjusting critical hyperparameters such as learning rates and densification thresholds. The framework is model-agnostic and seamlessly integrates into existing 3DGS pipelines without architectural modifications. We demonstrate its generalization ability across multiple state-of-the-art 3DGS variants, including Taming-3DGS and 3DGS-MCMC, and validate its robustness across diverse datasets. RLGS consistently enhances rendering quality. For example, it improves Taming-3DGS by 0.7dB PSNR on the Tanks and Temple (TNT) dataset, under a fixed Gaussian budget, and continues to yield gains even when baseline performance saturates. Our results suggest that RLGS provides an effective and general solution for automating hyperparameter tuning in 3DGS training, bridging a gap in applying reinforcement learning to 3DGS.\n\n在三维高斯溅射（3D Gaussian Splatting，3DGS）中，超参数调优是一个耗时且依赖专家经验的过程，往往导致重建结果不一致和效果次优。我们提出了 RLGS，这是一种即插即用的强化学习框架，通过轻量级策略模块在 3DGS 中自适应调整超参数，如学习率和加密阈值。该框架与具体模型无关，可无缝集成到现有的 3DGS 管线中，无需修改架构。我们验证了其在多种最先进的 3DGS 变体（包括 Taming-3DGS 和 3DGS-MCMC）中的泛化能力，并在多种数据集上验证了其鲁棒性。RLGS 能够持续提升渲染质量，例如，在固定高斯预算下，它使 Taming-3DGS 在 Tanks and Temple（TNT）数据集上的 PSNR 提高了 0.7 dB，并且即使在基线性能饱和时仍能带来收益。结果表明，RLGS 为 3DGS 训练中的超参数调优提供了一种高效且通用的自动化解决方案，填补了强化学习在 3DGS 应用中的空白。\n"
  },
  {
    "path": "abs/2508.04090.md",
    "content": "### Bridging Diffusion Models and 3D Representations: A 3D Consistent Super-Resolution Framework\n\nWe propose 3D Super Resolution (3DSR), a novel 3D Gaussian-splatting-based super-resolution framework that leverages off-the-shelf diffusion-based 2D super-resolution models. 3DSR encourages 3D consistency across views via the use of an explicit 3D Gaussian-splatting-based scene representation. This makes the proposed 3DSR different from prior work, such as image upsampling or the use of video super-resolution, which either don't consider 3D consistency or aim to incorporate 3D consistency implicitly. Notably, our method enhances visual quality without additional fine-tuning, ensuring spatial coherence within the reconstructed scene. We evaluate 3DSR on MipNeRF360 and LLFF data, demonstrating that it produces high-resolution results that are visually compelling, while maintaining structural consistency in 3D reconstructions.\n\n在三维高斯溅射（3D Gaussian Splatting，3DGS）中，超参数调优是一个耗时且依赖专家经验的过程，往往导致重建结果不一致和效果次优。我们提出了 RLGS，这是一种即插即用的强化学习框架，通过轻量级策略模块在 3DGS 中自适应调整超参数，如学习率和加密阈值。该框架与具体模型无关，可无缝集成到现有的 3DGS 管线中，无需修改架构。我们验证了其在多种最先进的 3DGS 变体（包括 Taming-3DGS 和 3DGS-MCMC）中的泛化能力，并在多种数据集上验证了其鲁棒性。RLGS 能够持续提升渲染质量，例如，在固定高斯预算下，它使 Taming-3DGS 在 Tanks and Temple（TNT）数据集上的 PSNR 提高了 0.7 dB，并且即使在基线性能饱和时仍能带来收益。结果表明，RLGS 为 3DGS 训练中的超参数调优提供了一种高效且通用的自动化解决方案，填补了强化学习在 3DGS 应用中的空白。\n"
  },
  {
    "path": "abs/2508.04099.md",
    "content": "### DET-GS: Depth- and Edge-Aware Regularization for High-Fidelity 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) represents a significant advancement in the field of efficient and high-fidelity novel view synthesis. Despite recent progress, achieving accurate geometric reconstruction under sparse-view conditions remains a fundamental challenge. Existing methods often rely on non-local depth regularization, which fails to capture fine-grained structures and is highly sensitive to depth estimation noise. Furthermore, traditional smoothing methods neglect semantic boundaries and indiscriminately degrade essential edges and textures, consequently limiting the overall quality of reconstruction. In this work, we propose DET-GS, a unified depth and edge-aware regularization framework for 3D Gaussian Splatting. DET-GS introduces a hierarchical geometric depth supervision framework that adaptively enforces multi-level geometric consistency, significantly enhancing structural fidelity and robustness against depth estimation noise. To preserve scene boundaries, we design an edge-aware depth regularization guided by semantic masks derived from Canny edge detection. Furthermore, we introduce an RGB-guided edge-preserving Total Variation loss that selectively smooths homogeneous regions while rigorously retaining high-frequency details and textures. Extensive experiments demonstrate that DET-GS achieves substantial improvements in both geometric accuracy and visual fidelity, outperforming state-of-the-art (SOTA) methods on sparse-view novel view synthesis benchmarks.\n\n三维高斯溅射（3D Gaussian Splatting，3DGS）在高效、高保真的新视角合成领域取得了重要进展。尽管近年来取得了进步，但在稀疏视角条件下实现精确的几何重建仍是一个核心挑战。现有方法通常依赖非局部深度正则化，这类方法难以捕捉精细结构，并且对深度估计噪声高度敏感。此外，传统的平滑方法忽视了语义边界，不加区分地削弱了关键边缘和纹理，从而限制了重建的整体质量。为此，我们提出了 DET-GS，这是一种面向三维高斯溅射的统一深度与边缘感知正则化框架。DET-GS 引入了分层几何深度监督机制，自适应地在多层级上施加几何一致性约束，从而显著提升结构保真度，并增强对深度估计噪声的鲁棒性。为了保留场景边界，我们设计了基于 Canny 边缘检测生成的语义掩码引导的边缘感知深度正则化。此外，我们提出了一种基于 RGB 引导的边缘保持全变差（Total Variation）损失，该方法在选择性平滑均质区域的同时，严格保留高频细节与纹理。大量实验结果表明，DET-GS 在几何精度和视觉保真度方面均取得了显著提升，并在稀疏视角新视角合成基准上优于当前最先进（SOTA）的方法。\n"
  },
  {
    "path": "abs/2508.04224.md",
    "content": "### SplitGaussian: Reconstructing Dynamic Scenes via Visual Geometry Decomposition\n\nReconstructing dynamic 3D scenes from monocular video remains fundamentally challenging due to the need to jointly infer motion, structure, and appearance from limited observations. Existing dynamic scene reconstruction methods based on Gaussian Splatting often entangle static and dynamic elements in a shared representation, leading to motion leakage, geometric distortions, and temporal flickering. We identify that the root cause lies in the coupled modeling of geometry and appearance across time, which hampers both stability and interpretability. To address this, we propose \\textbf{SplitGaussian}, a novel framework that explicitly decomposes scene representations into static and dynamic components. By decoupling motion modeling from background geometry and allowing only the dynamic branch to deform over time, our method prevents motion artifacts in static regions while supporting view- and time-dependent appearance refinement. This disentangled design not only enhances temporal consistency and reconstruction fidelity but also accelerates convergence. Extensive experiments demonstrate that SplitGaussian outperforms prior state-of-the-art methods in rendering quality, geometric stability, and motion separation.\n\n从单目视频中重建动态三维场景依然是一个根本性的挑战，因为这需要在有限观测条件下联合推断运动、结构和外观。现有基于高斯溅射的动态场景重建方法通常将静态与动态元素混合在同一表示中，导致运动泄漏、几何失真和时间闪烁。我们发现其根本原因在于跨时间耦合建模几何与外观，这阻碍了稳定性与可解释性。为解决这一问题，我们提出了 \\textbf{SplitGaussian}，一种将场景表示显式分解为静态与动态组件的新框架。通过将运动建模与背景几何解耦，并仅允许动态分支随时间发生形变，我们的方法有效防止了静态区域的运动伪影，同时支持视角与时间相关的外观优化。这种解耦设计不仅提升了时间一致性与重建保真度，还加快了收敛速度。大量实验证明，SplitGaussian 在渲染质量、几何稳定性和运动分离方面均优于以往的最先进方法。\n"
  },
  {
    "path": "abs/2508.04297.md",
    "content": "### MuGS: Multi-Baseline Generalizable Gaussian Splatting Reconstruction\n\nWe present Multi-Baseline Gaussian Splatting (MuRF), a generalized feed-forward approach for novel view synthesis that effectively handles diverse baseline settings, including sparse input views with both small and large baselines. Specifically, we integrate features from Multi-View Stereo (MVS) and Monocular Depth Estimation (MDE) to enhance feature representations for generalizable reconstruction. Next, We propose a projection-and-sampling mechanism for deep depth fusion, which constructs a fine probability volume to guide the regression of the feature map. Furthermore, We introduce a reference-view loss to improve geometry and optimization efficiency. We leverage 3D Gaussian representations to accelerate training and inference time while enhancing rendering quality. MuRF achieves state-of-the-art performance across multiple baseline settings and diverse scenarios ranging from simple objects (DTU) to complex indoor and outdoor scenes (RealEstate10K). We also demonstrate promising zero-shot performance on the LLFF and Mip-NeRF 360 datasets.\n\n我们提出了多基线高斯溅射（Multi-Baseline Gaussian Splatting，MuRF），这是一种通用的前向新视角合成方法，能够有效处理包括小基线和大基线稀疏输入视图在内的多种基线设置。具体而言，我们融合了多视图立体（Multi-View Stereo, MVS）和单目深度估计（Monocular Depth Estimation, MDE）的特征，以增强特征表示能力，从而实现更强的重建泛化性。随后，我们提出了一种投影与采样机制进行深度融合，构建精细的概率体以引导特征图的回归。此外，我们引入了参考视图损失，以提升几何质量和优化效率。我们利用三维高斯表示加速训练与推理的同时提升渲染质量。MuRF 在多种基线设置及从简单物体（DTU）到复杂室内外场景（RealEstate10K）的多样化场景中均实现了最先进的性能。我们还在 LLFF 和 Mip-NeRF 360 数据集上展示了具有潜力的零样本表现。\n"
  },
  {
    "path": "abs/2508.04508.md",
    "content": "### Surf3R: Rapid Surface Reconstruction from Sparse RGB Views in Seconds\n\nCurrent multi-view 3D reconstruction methods rely on accurate camera calibration and pose estimation, requiring complex and time-intensive pre-processing that hinders their practical deployment. To address this challenge, we introduce Surf3R, an end-to-end feedforward approach that reconstructs 3D surfaces from sparse views without estimating camera poses and completes an entire scene in under 10 seconds. Our method employs a multi-branch and multi-view decoding architecture in which multiple reference views jointly guide the reconstruction process. Through the proposed branch-wise processing, cross-view attention, and inter-branch fusion, the model effectively captures complementary geometric cues without requiring camera calibration. Moreover, we introduce a D-Normal regularizer based on an explicit 3D Gaussian representation for surface reconstruction. It couples surface normals with other geometric parameters to jointly optimize the 3D geometry, significantly improving 3D consistency and surface detail accuracy. Experimental results demonstrate that Surf3R achieves state-of-the-art performance on multiple surface reconstruction metrics on ScanNet++ and Replica datasets, exhibiting excellent generalization and efficiency.\n\n当前的多视图三维重建方法依赖精确的相机标定和位姿估计，这需要复杂且耗时的预处理过程，从而阻碍了其实际部署。为解决这一问题，我们提出了 Surf3R，这是一种端到端的前向方法，可在无需估计相机位姿的情况下，从稀疏视图重建三维表面，并在 10 秒内完成整个场景重建。我们的方法采用多分支、多视图解码架构，由多个参考视图共同引导重建过程。通过提出的分支级处理、跨视角注意力机制和分支间融合，模型能够在无需相机标定的情况下，有效捕获互补的几何线索。此外，我们引入了一种基于显式三维高斯表示的 D-Normal 正则化方法，用于表面重建。该方法将表面法向与其他几何参数耦合起来，共同优化三维几何结构，从而显著提升三维一致性和表面细节精度。实验结果表明，Surf3R 在 ScanNet++ 和 Replica 数据集上的多项表面重建指标上均达到了当前最先进水平，并展现出卓越的泛化能力和高效性。\n"
  },
  {
    "path": "abs/2508.04597.md",
    "content": "### Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline\n\nIncrementally recovering real-sized 3D geometry from a pose-free RGB stream is a challenging task in 3D reconstruction, requiring minimal assumptions on input data. Existing methods can be broadly categorized into end-to-end and visual SLAM-based approaches, both of which either struggle with long sequences or depend on slow test-time optimization and depth sensors. To address this, we first integrate a depth estimator into an RGB-D SLAM system, but this approach is hindered by inaccurate geometric details in predicted depth. Through further investigation, we find that 3D Gaussian mapping can effectively solve this problem. Building on this, we propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module to directly infer camera pose from optical flow. This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed. Additionally, we introduce a local graph rendering technique to enhance robustness in feed-forward pose prediction. Experimental results on the Replica and TUM-RGBD datasets, along with a real-world deployment demonstration, show that our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.\n\n在无需位姿的 RGB 流中逐步恢复真实尺寸的三维几何是一项具有挑战性的三维重建任务，对输入数据的假设要求极低。现有方法大致可分为端到端方法和基于视觉 SLAM 的方法，这两类方法要么难以处理长序列，要么依赖于缓慢的测试时优化和深度传感器。为了解决这一问题，我们首先将深度估计器集成到 RGB-D SLAM 系统中，但该方法受限于预测深度的几何细节不准确。进一步研究发现，三维高斯映射能够有效解决这一问题。在此基础上，我们提出了一种结合基于三维高斯的 SLAM 与前向递归预测模块的在线三维重建方法，该模块可直接根据光流推断相机位姿。该方法用快速的网络推理替代了缓慢的测试时优化，从而显著提升了跟踪速度。此外，我们引入了一种局部图渲染技术，以增强前向位姿预测的鲁棒性。在 Replica 和 TUM-RGBD 数据集上的实验结果，以及真实环境部署的展示表明，我们的方法在性能上可与当前最先进的 SplaTAM 相媲美，同时将跟踪时间减少了 90% 以上。\n"
  },
  {
    "path": "abs/2508.04929.md",
    "content": "### CryoGS: Gaussian Splatting for Cryo-EM Homogeneous Reconstruction\n\nAs a critical modality for structural biology, cryogenic electron microscopy (cryo-EM) facilitates the determination of macromolecular structures at near-atomic resolution. The core computational task in single-particle cryo-EM is to reconstruct the 3D electrostatic potential of a molecule from a large collection of noisy 2D projections acquired at unknown orientations. Gaussian mixture models (GMMs) provide a continuous, compact, and physically interpretable representation for molecular density and have recently gained interest in cryo-EM reconstruction. However, existing methods rely on external consensus maps or atomic models for initialization, limiting their use in self-contained pipelines. Addressing this issue, we introduce cryoGS, a GMM-based method that integrates Gaussian splatting with the physics of cryo-EM image formation. In particular, we develop an orthogonal projection-aware Gaussian splatting, with adaptations such as a normalization term and FFT-aligned coordinate system tailored for cryo-EM imaging. All these innovations enable stable and efficient homogeneous reconstruction directly from raw cryo-EM particle images using random initialization. Experimental results on real datasets validate the effectiveness and robustness of cryoGS over representative baselines.\n\n作为结构生物学中的关键技术，冷冻电子显微镜（cryo-EM）能够在近原子分辨率下解析大分子结构。单颗粒 cryo-EM 的核心计算任务是根据在未知取向下采集的大量带噪声二维投影图像，重建分子的三维静电势分布。高斯混合模型（GMM）为分子密度提供了一种连续、紧凑且具有物理可解释性的表示形式，近年来在 cryo-EM 重建中引起了关注。然而，现有方法在初始化时依赖外部共识图或原子模型，这限制了其在自包含管线中的应用。为解决这一问题，我们提出了 cryoGS，这是一种基于 GMM 的方法，将高斯溅射与 cryo-EM 成像物理过程相结合。具体而言，我们提出了正交投影感知的高斯溅射方法，并针对 cryo-EM 成像特点进行了适配，包括归一化项和与 FFT 对齐的坐标系统。这些创新使得 cryoGS 能够在随机初始化的情况下，直接从原始 cryo-EM 颗粒图像中实现稳定且高效的同质重建。真实数据集上的实验结果验证了 cryoGS 相较于代表性基线方法的有效性和鲁棒性。\n"
  },
  {
    "path": "abs/2508.04965.md",
    "content": "### Perceive-Sample-Compress: Towards Real-Time 3D Gaussian Splatting\n\nRecent advances in 3D Gaussian Splatting (3DGS) have demonstrated remarkable capabilities in real-time and photorealistic novel view synthesis. However, traditional 3DGS representations often struggle with large-scale scene management and efficient storage, particularly when dealing with complex environments or limited computational resources. To address these limitations, we introduce a novel perceive-sample-compress framework for 3D Gaussian Splatting. Specifically, we propose a scene perception compensation algorithm that intelligently refines Gaussian parameters at each level. This algorithm intelligently prioritizes visual importance for higher fidelity rendering in critical areas, while optimizing resource usage and improving overall visible quality. Furthermore, we propose a pyramid sampling representation to manage Gaussian primitives across hierarchical levels. Finally, to facilitate efficient storage of proposed hierarchical pyramid representations, we develop a Generalized Gaussian Mixed model compression algorithm to achieve significant compression ratios without sacrificing visual fidelity. The extensive experiments demonstrate that our method significantly improves memory efficiency and high visual quality while maintaining real-time rendering speed.\n\n三维高斯溅射（3D Gaussian Splatting，3DGS）的最新进展在实时和照片级新视角合成方面展现了卓越的能力。然而，传统的 3DGS 表示在大规模场景管理和高效存储方面常常面临挑战，尤其是在处理复杂环境或计算资源有限的情况下。为应对这些限制，我们提出了一种新颖的“感知-采样-压缩”框架用于三维高斯溅射。具体而言，我们设计了一种场景感知补偿算法，可在每一层智能优化高斯参数。该算法能够优先关注关键区域的视觉重要性，从而在关键区域实现更高保真度渲染，同时优化资源使用并提升整体可见质量。此外，我们提出了金字塔采样表示，以在分层结构中管理高斯基元。最后，为了高效存储所提出的分层金字塔表示，我们开发了一种广义高斯混合模型压缩算法，在不牺牲视觉保真度的前提下实现了显著的压缩率。大量实验表明，我们的方法在保持实时渲染速度的同时，显著提升了内存效率和视觉质量。\n"
  },
  {
    "path": "abs/2508.04966.md",
    "content": "### Laplacian Analysis Meets Dynamics Modelling: Gaussian Splatting for 4D Reconstruction\n\nWhile 3D Gaussian Splatting (3DGS) excels in static scene modeling, its extension to dynamic scenes introduces significant challenges. Existing dynamic 3DGS methods suffer from either over-smoothing due to low-rank decomposition or feature collision from high-dimensional grid sampling. This is because of the inherent spectral conflicts between preserving motion details and maintaining deformation consistency at different frequency. To address these challenges, we propose a novel dynamic 3DGS framework with hybrid explicit-implicit functions. Our approach contains three key innovations: a spectral-aware Laplacian encoding architecture which merges Hash encoding and Laplacian-based module for flexible frequency motion control, an enhanced Gaussian dynamics attribute that compensates for photometric distortions caused by geometric deformation, and an adaptive Gaussian split strategy guided by KDTree-based primitive control to efficiently query and optimize dynamic areas. Through extensive experiments, our method demonstrates state-of-the-art performance in reconstructing complex dynamic scenes, achieving better reconstruction fidelity.\n\n尽管三维高斯溅射（3D Gaussian Splatting, 3DGS）在静态场景建模中表现出色，但其向动态场景的扩展面临重大挑战。现有的动态 3DGS 方法要么因低秩分解导致过度平滑，要么因高维网格采样引发特征冲突。这是由于在保留运动细节与在不同频率下保持形变一致性之间存在固有的频谱冲突。为应对这些挑战，我们提出了一种结合显式与隐式函数的新型动态 3DGS 框架。我们的方法包含三项关键创新：一种频谱感知的拉普拉斯编码架构，将哈希编码与基于拉普拉斯的模块结合，实现灵活的频率运动控制；一种增强型高斯动态属性，用于补偿几何形变引起的光度失真；以及一种由基于 KDTree 的图元控制引导的自适应高斯分裂策略，用于高效查询和优化动态区域。通过大量实验，我们的方法在复杂动态场景的重建中实现了当前最优性能，并取得了更高的重建保真度。\n"
  },
  {
    "path": "abs/2508.04968.md",
    "content": "### UGOD: Uncertainty-Guided Differentiable Opacity and Soft Dropout for Enhanced Sparse-View 3DGS\n\n3D Gaussian Splatting (3DGS) has become a competitive approach for novel view synthesis (NVS) due to its advanced rendering efficiency through 3D Gaussian projection and blending. However, Gaussians are treated equally weighted for rendering in most 3DGS methods, making them prone to overfitting, which is particularly the case in sparse-view scenarios. To address this, we investigate how adaptive weighting of Gaussians affects rendering quality, which is characterised by learned uncertainties proposed. This learned uncertainty serves two key purposes: first, it guides the differentiable update of Gaussian opacity while preserving the 3DGS pipeline integrity; second, the uncertainty undergoes soft differentiable dropout regularisation, which strategically transforms the original uncertainty into continuous drop probabilities that govern the final Gaussian projection and blending process for rendering. Extensive experimental results over widely adopted datasets demonstrate that our method outperforms rivals in sparse-view 3D synthesis, achieving higher quality reconstruction with fewer Gaussians in most datasets compared to existing sparse-view approaches, e.g., compared to DropGaussian, our method achieves 3.27% PSNR improvements on the MipNeRF 360 dataset.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）凭借其基于三维高斯投影与混合的高效渲染能力，已成为新视图合成（Novel View Synthesis, NVS）中的一种具有竞争力的方法。然而，大多数 3DGS 方法在渲染时对所有高斯赋予相同权重，这使得模型易于过拟合，尤其是在稀疏视角场景中。为此，我们研究了高斯自适应加权对渲染质量的影响，并提出了基于学习不确定性的加权策略。该学习不确定性具有两个核心作用：第一，它在保持 3DGS 渲染管线完整性的同时，引导高斯不透明度的可微分更新；第二，它经过软可微分的 dropout 正则化处理，将原始不确定性平滑地映射为连续的丢弃概率，从而在渲染过程中调控最终的高斯投影与混合。大量在主流数据集上的实验结果表明，我们的方法在稀疏视角三维合成中优于现有方法，在多数数据集上以更少的高斯实现更高质量的重建。例如，相较于 DropGaussian，我们的方法在 MipNeRF 360 数据集上实现了 3.27% 的 PSNR 提升。\n"
  },
  {
    "path": "abs/2508.05187.md",
    "content": "### Refining Gaussian Splatting: A Volumetric Densification Approach\n\nAchieving high-quality novel view synthesis in 3D Gaussian Splatting (3DGS) often depends on effective point primitive management. The underlying Adaptive Density Control (ADC) process addresses this issue by automating densification and pruning. Yet, the vanilla 3DGS densification strategy shows key shortcomings. To address this issue, in this paper we introduce a novel density control method, which exploits the volumes of inertia associated to each Gaussian function to guide the refinement process. Furthermore, we study the effect of both traditional Structure from Motion (SfM) and Deep Image Matching (DIM) methods for point cloud initialization. Extensive experimental evaluations on the Mip-NeRF 360 dataset demonstrate that our approach surpasses 3DGS in reconstruction quality, delivering encouraging performance across diverse scenes.\n\n在三维高斯溅射（3D Gaussian Splatting, 3DGS）中，实现高质量的新视图合成往往依赖于对点图元的有效管理。底层的自适应密度控制（Adaptive Density Control, ADC）过程通过自动执行加密与裁剪来应对这一问题。然而，原始的 3DGS 加密策略存在显著不足。为解决这一问题，本文提出了一种新的密度控制方法，该方法利用与每个高斯函数相关的惯性体积来引导精化过程。此外，我们还研究了传统的运动结构恢复（Structure from Motion, SfM）方法与深度图像匹配（Deep Image Matching, DIM）方法在点云初始化中的作用。基于 Mip-NeRF 360 数据集的大量实验评估表明，我们的方法在重建质量上优于 3DGS，并在多种场景中取得了令人鼓舞的性能表现。\n"
  },
  {
    "path": "abs/2508.05254.md",
    "content": "### CF3: Compact and Fast 3D Feature Fields\n\n3D Gaussian Splatting (3DGS) has begun incorporating rich information from 2D foundation models. However, most approaches rely on a bottom-up optimization process that treats raw 2D features as ground truth, incurring increased computational costs. We propose a top-down pipeline for constructing compact and fast 3D Gaussian feature fields, namely, CF3. We first perform a fast weighted fusion of multi-view 2D features with pre-trained Gaussians. This approach enables training a per-Gaussian autoencoder directly on the lifted features, instead of training autoencoders in the 2D domain. As a result, the autoencoder better aligns with the feature distribution. More importantly, we introduce an adaptive sparsification method that optimizes the Gaussian attributes of the feature field while pruning and merging the redundant Gaussians, constructing an efficient representation with preserved geometric details. Our approach achieves a competitive 3D feature field using as little as 5% of the Gaussians compared to Feature-3DGS.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）已开始引入来自二维基础模型的丰富信息。然而，大多数方法依赖于一种自下而上的优化过程，将原始二维特征视为真实值，这带来了较高的计算成本。我们提出了一种自上而下的管线，用于构建紧凑且高效的三维高斯特征场，称为 CF3。我们首先将多视图二维特征与预训练的高斯进行快速加权融合。这一方法使得可以直接在提升后的特征上训练每个高斯的自编码器，而非在二维域中训练自编码器，从而使自编码器与特征分布更好地对齐。更重要的是，我们引入了一种自适应稀疏化方法，在优化特征场中高斯属性的同时，对冗余高斯进行裁剪与合并，从而在保留几何细节的前提下构建高效表示。与 Feature-3DGS 相比，我们的方法仅使用 5% 的高斯即可实现具有竞争力的三维特征场。\n"
  },
  {
    "path": "abs/2508.05343.md",
    "content": "### 3DGabSplat: 3D Gabor Splatting for Frequency-adaptive Radiance Field Rendering\n\nRecent prominence in 3D Gaussian Splatting (3DGS) has enabled real-time rendering while maintaining high-fidelity novel view synthesis. However, 3DGS resorts to the Gaussian function that is low-pass by nature and is restricted in representing high-frequency details in 3D scenes. Moreover, it causes redundant primitives with degraded training and rendering efficiency and excessive memory overhead. To overcome these limitations, we propose 3D Gabor Splatting (3DGabSplat) that leverages a novel 3D Gabor-based primitive with multiple directional 3D frequency responses for radiance field representation supervised by multi-view images. The proposed 3D Gabor-based primitive forms a filter bank incorporating multiple 3D Gabor kernels at different frequencies to enhance flexibility and efficiency in capturing fine 3D details. Furthermore, to achieve novel view rendering, an efficient CUDA-based rasterizer is developed to project the multiple directional 3D frequency components characterized by 3D Gabor-based primitives onto the 2D image plane, and a frequency-adaptive mechanism is presented for adaptive joint optimization of primitives. 3DGabSplat is scalable to be a plug-and-play kernel for seamless integration into existing 3DGS paradigms to enhance both efficiency and quality of novel view synthesis. Extensive experiments demonstrate that 3DGabSplat outperforms 3DGS and its variants using alternative primitives, and achieves state-of-the-art rendering quality across both real-world and synthetic scenes. Remarkably, we achieve up to 1.35 dB PSNR gain over 3DGS with simultaneously reduced number of primitives and memory consumption.\n\n近期三维高斯溅射（3D Gaussian Splatting, 3DGS）的发展，使得在保持高保真新视图合成的同时实现了实时渲染。然而，3DGS 依赖于本质上具有低通特性的高斯函数，这限制了其在三维场景中表示高频细节的能力。此外，它还会导致图元冗余，从而降低训练与渲染效率，并增加内存开销。为克服这些局限，我们提出了三维 Gabor 溅射（3D Gabor Splatting, 3DGabSplat），该方法利用一种新型的基于三维 Gabor 的图元，通过多方向的三维频率响应来表示辐射场，并由多视图图像进行监督。所提出的三维 Gabor 图元构成了一个滤波器组，结合了不同频率下的多个三维 Gabor 核，从而在捕捉细致三维细节方面具有更高的灵活性与效率。此外，为实现新视图渲染，我们开发了一种高效的基于 CUDA 的光栅化器，将三维 Gabor 图元所表征的多方向三维频率分量投影到二维图像平面上，并引入了一种频率自适应机制，以实现图元的自适应联合优化。3DGabSplat 具有可扩展性，可作为即插即用的核心模块无缝集成到现有的 3DGS 框架中，从而同时提升新视图合成的效率与质量。大量实验表明，3DGabSplat 在使用替代图元的情况下优于 3DGS 及其变体，并在真实场景与合成场景中均实现了当前最优的渲染质量。值得注意的是，我们在减少图元数量与内存占用的同时，相比 3DGS 实现了最高 1.35 dB 的 PSNR 提升。\n"
  },
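The 3D Gabor-based primitive pairs a Gaussian envelope with a cosine carrier at a directional 3D frequency; a filter bank sums several such kernels at different frequencies. A sketch of the standard 3D Gabor form this plausibly takes (names and conventions are assumptions, not the paper's CUDA code):

```python
import numpy as np

def gabor_kernel_3d(x, mu, cov_inv, freq, phase=0.0):
    # x: (M, 3) query points; mu: (3,) center; cov_inv: (3, 3) inverse
    # covariance of the envelope; freq: (3,) directional 3D frequency.
    d = x - mu
    envelope = np.exp(-0.5 * np.einsum('md,de,me->m', d, cov_inv, d))
    carrier = np.cos(2.0 * np.pi * (d @ freq) + phase)
    return envelope * carrier  # (M,) primitive response at each point
```

Summing `gabor_kernel_3d` over several `freq` values per primitive yields the multi-frequency filter bank described above.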
  {
    "path": "abs/2508.05631.md",
    "content": "### GAP: Gaussianize Any Point Clouds with Text Guidance\n\n3D Gaussian Splatting (3DGS) has demonstrated its advantages in achieving fast and high-quality rendering. As point clouds serve as a widely-used and easily accessible form of 3D representation, bridging the gap between point clouds and Gaussians becomes increasingly important. Recent studies have explored how to convert the colored points into Gaussians, but directly generating Gaussians from colorless 3D point clouds remains an unsolved challenge. In this paper, we propose GAP, a novel approach that gaussianizes raw point clouds into high-fidelity 3D Gaussians with text guidance. Our key idea is to design a multi-view optimization framework that leverages a depth-aware image diffusion model to synthesize consistent appearances across different viewpoints. To ensure geometric accuracy, we introduce a surface-anchoring mechanism that effectively constrains Gaussians to lie on the surfaces of 3D shapes during optimization. Furthermore, GAP incorporates a diffuse-based inpainting strategy that specifically targets at completing hard-to-observe regions. We evaluate GAP on the Point-to-Gaussian generation task across varying complexity levels, from synthetic point clouds to challenging real-world scans, and even large-scale scenes.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）已展现出在实现快速且高质量渲染方面的优势。由于点云是一种广泛使用且易于获取的三维表示形式，弥合点云与高斯之间的差距变得愈发重要。近期研究已经探索了如何将带颜色的点转换为高斯，但直接从无颜色的三维点云生成高斯仍是一个未解决的挑战。本文提出了 GAP，这是一种能够在文本引导下将原始点云高斯化为高保真三维高斯的新方法。我们的核心思想是设计一个多视图优化框架，利用深度感知的图像扩散模型，在不同视角下合成一致的外观。为确保几何精度，我们引入了一种表面锚定机制，在优化过程中有效约束高斯位于三维形状的表面。此外，GAP 结合了一种基于漫反射的修补策略，专门用于完成难以观测区域的补全。我们在从合成点云到复杂的真实世界扫描，甚至大规模场景的多种复杂度下，对 GAP 在点云到高斯生成任务中的表现进行了评估。\n"
  },
  {
    "path": "abs/2508.05813.md",
    "content": "### Optimization-Free Style Transfer for 3D Gaussian Splats\n\nThe task of style transfer for 3D Gaussian splats has been explored in many previous works, but these require reconstructing or fine-tuning the splat while incorporating style information or optimizing a feature extraction network on the splat representation. We propose a reconstruction- and optimization-free approach to stylizing 3D Gaussian splats. This is done by generating a graph structure across the implicit surface of the splat representation. A feed-forward, surface-based stylization method is then used and interpolated back to the individual splats in the scene. This allows for any style image and 3D Gaussian splat to be used without any additional training or optimization. This also allows for fast stylization of splats, achieving speeds under 2 minutes even on consumer-grade hardware. We demonstrate the quality results this approach achieves and compare to other 3D Gaussian splat style transfer methods.\n\n三维高斯溅射（3D Gaussian splat）的风格迁移任务已在许多先前工作中得到探索，但这些方法通常需要在融合风格信息的同时重建或微调溅射，或在溅射表示上优化特征提取网络。我们提出了一种无需重建与优化的三维高斯溅射风格化方法。具体而言，该方法通过在溅射表示的隐式表面上生成图结构，并使用基于表面的前馈式风格化方法，再将风格化结果插值回场景中的各个溅射。这使得任何风格图像与三维高斯溅射都可以直接应用，而无需额外训练或优化。同时，该方法实现了溅射的快速风格化，即使在消费级硬件上，速度也可控制在 2 分钟以内。我们展示了该方法所取得的高质量结果，并与其他三维高斯溅射风格迁移方法进行了对比。\n"
  },
  {
    "path": "abs/2508.05950.md",
    "content": "### A 3DGS-Diffusion Self-Supervised Framework for Normal Estimation from a Single Image\n\nThe lack of spatial dimensional information remains a challenge in normal estimation from a single image. Recent diffusion-based methods have demonstrated significant potential in 2D-to-3D implicit mapping, they rely on data-driven statistical priors and miss the explicit modeling of light-surface interaction, leading to multi-view normal direction conflicts. Moreover, the discrete sampling mechanism of diffusion models causes gradient discontinuity in differentiable rendering reconstruction modules, preventing 3D geometric errors from being backpropagated to the normal generation network, thereby forcing existing methods to depend on dense normal annotations. This paper proposes SINGAD, a novel Self-supervised framework from a single Image for Normal estimation via 3D GAussian splatting guided Diffusion. By integrating physics-driven light-interaction modeling and a differentiable rendering-based reprojection strategy, our framework directly converts 3D geometric errors into normal optimization signals, solving the challenges of multi-view geometric inconsistency and data dependency. Specifically, the framework constructs a light-interaction-driven 3DGS reparameterization model to generate multi-scale geometric features consistent with light transport principles, ensuring multi-view normal consistency. A cross-domain feature fusion module is designed within a conditional diffusion model, embedding geometric priors to constrain normal generation while maintaining accurate geometric error propagation. Furthermore, a differentiable 3D reprojection loss strategy is introduced for self-supervised optimization that minimizes geometric error between the reconstructed and input image, eliminating dependence on annotated normal datasets. Quantitative evaluations on the Google Scanned Objects dataset demonstrate that our method outperforms state-of-the-art approaches across multiple metrics.\n\n从单张图像进行法向估计时，缺乏空间维度信息仍是一个挑战。尽管近年来的扩散模型在二维到三维的隐式映射中展现了显著潜力，但它们依赖于数据驱动的统计先验，缺乏对光与表面交互的显式建模，从而导致多视角法向方向冲突。此外，扩散模型的离散采样机制会在可微分渲染重建模块中引起梯度不连续，阻碍三维几何误差向法向生成网络的反向传播，从而迫使现有方法依赖密集的法向标注数据。本文提出了 SINGAD，一种基于单张图像的法向估计自监督框架（Self-supervised framework from a single Image for Normal estimation via 3D GAussian splatting guided Diffusion）。该框架结合了物理驱动的光交互建模与基于可微渲染的重投影策略，能够将三维几何误差直接转化为法向优化信号，从而解决多视角几何不一致和数据依赖问题。具体而言，框架构建了一个光交互驱动的 3DGS 重参数化模型，用于生成符合光传输原理的多尺度几何特征，从而确保多视角法向一致性；在条件扩散模型中设计了跨域特征融合模块，将几何先验嵌入以约束法向生成，同时保持几何误差的准确传播。此外，引入了一种可微的三维重投影损失策略，用于自监督优化，通过最小化重建图像与输入图像的几何误差，消除了对标注法向数据集的依赖。在 Google Scanned Objects 数据集上的定量评估表明，我们的方法在多项指标上均优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2508.06014.md",
    "content": "### ExploreGS: Explorable 3D Scene Reconstruction with Virtual Camera Samplings and Diffusion Priors\n\nRecent advances in novel view synthesis (NVS) have enabled real-time rendering with 3D Gaussian Splatting (3DGS). However, existing methods struggle with artifacts and missing regions when rendering from viewpoints that deviate from the training trajectory, limiting seamless scene exploration. To address this, we propose a 3DGS-based pipeline that generates additional training views to enhance reconstruction. We introduce an information-gain-driven virtual camera placement strategy to maximize scene coverage, followed by video diffusion priors to refine rendered results. Fine-tuning 3D Gaussians with these enhanced views significantly improves reconstruction quality. To evaluate our method, we present Wild-Explore, a benchmark designed for challenging scene exploration. Experiments demonstrate that our approach outperforms existing 3DGS-based methods, enabling high-quality, artifact-free rendering from arbitrary viewpoints.\n\n新视图合成（Novel View Synthesis, NVS）的最新进展使得基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的实时渲染成为可能。然而，当从偏离训练轨迹的视点进行渲染时，现有方法容易出现伪影和区域缺失的问题，从而限制了无缝的场景探索。为此，我们提出了一种基于 3DGS 的管线，通过生成额外的训练视图来增强重建效果。我们引入了一种基于信息增益的虚拟相机布置策略，以最大化场景覆盖率，并结合视频扩散先验来优化渲染结果。利用这些增强视图对三维高斯进行微调，可以显著提升重建质量。为评估我们的方法，我们提出了 Wild-Explore 基准，用于应对具有挑战性的场景探索任务。实验结果表明，我们的方法优于现有的基于 3DGS 的方法，实现了任意视点下的高质量、无伪影渲染。\n"
  },
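The information-gain-driven camera placement above can be read as a greedy set-cover step over scene coverage. A toy sketch under that assumption, with boolean voxel-visibility masks standing in for the paper's actual gain measure:

```python
import numpy as np

def pick_next_camera(candidate_visibility, covered):
    # candidate_visibility: (M, K) boolean, voxels seen by each of M candidate
    # poses; covered: (K,) boolean, voxels already covered by training views.
    gains = (candidate_visibility & ~covered).sum(axis=1)  # newly covered voxels
    return int(np.argmax(gains))  # index of the most informative candidate
```

Selected virtual views would then be rendered, refined by the video diffusion prior, and added to the training set.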
  {
    "path": "abs/2508.06136.md",
    "content": "### Roll Your Eyes: Gaze Redirection via Explicit 3D Eyeball Rotation\n\nWe propose a novel 3D gaze redirection framework that leverages an explicit 3D eyeball structure. Existing gaze redirection methods are typically based on neural radiance fields, which employ implicit neural representations via volume rendering. Unlike these NeRF-based approaches, where the rotation and translation of 3D representations are not explicitly modeled, we introduce a dedicated 3D eyeball structure to represent the eyeballs with 3D Gaussian Splatting (3DGS). Our method generates photorealistic images that faithfully reproduce the desired gaze direction by explicitly rotating and translating the 3D eyeball structure. In addition, we propose an adaptive deformation module that enables the replication of subtle muscle movements around the eyes. Through experiments conducted on the ETH-XGaze dataset, we demonstrate that our framework is capable of generating diverse novel gaze images, achieving superior image quality and gaze estimation accuracy compared to previous state-of-the-art methods.\n\n我们提出了一种利用显式三维眼球结构的新型三维视线重定向框架。现有的视线重定向方法通常基于神经辐射场（Neural Radiance Fields, NeRF），通过体渲染实现隐式神经表示。与这些基于 NeRF 的方法不同，其三维表示的旋转与平移并未得到显式建模，我们引入了专用的三维眼球结构，并采用三维高斯溅射（3D Gaussian Splatting, 3DGS）来表示眼球。通过显式旋转和平移三维眼球结构，我们的方法能够生成高逼真度的图像，精确再现目标视线方向。此外，我们提出了一种自适应形变模块，可以模拟眼部周围细微的肌肉运动。基于 ETH-XGaze 数据集的实验结果表明，该框架能够生成多样化的新视线图像，在图像质量与视线估计精度方面均优于现有最先进方法。\n"
  },
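Explicitly redirecting gaze with a 3DGS eyeball amounts to a rigid transform of the Gaussian means about the eyeball center plus a quaternion update of each Gaussian's orientation. A minimal sketch (the (w, x, y, z) quaternion convention and the helper names are assumptions, not the paper's code):

```python
import numpy as np

def quat_mul(q, r):
    # Hamilton product, (w, x, y, z) convention; q, r: (N, 4).
    w1, x1, y1, z1 = q.T
    w2, x2, y2, z2 = r.T
    return np.stack([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2], axis=-1)

def rotate_eyeball(means, quats, center, R, q_R):
    # Rigidly rotate the eyeball Gaussians about `center` by rotation
    # matrix R (with matching quaternion q_R). means: (N, 3), quats: (N, 4).
    new_means = (means - center) @ R.T + center
    new_quats = quat_mul(np.broadcast_to(q_R, quats.shape), quats)
    return new_means, new_quats
```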
  {
    "path": "abs/2508.06169.md",
    "content": "### UW-3DGS: Underwater 3D Reconstruction with Physics-Aware Gaussian Splatting\n\nUnderwater 3D scene reconstruction faces severe challenges from light absorption, scattering, and turbidity, which degrade geometry and color fidelity in traditional methods like Neural Radiance Fields (NeRF). While NeRF extensions such as SeaThru-NeRF incorporate physics-based models, their MLP reliance limits efficiency and spatial resolution in hazy environments. We introduce UW-3DGS, a novel framework adapting 3D Gaussian Splatting (3DGS) for robust underwater reconstruction. Key innovations include: (1) a plug-and-play learnable underwater image formation module using voxel-based regression for spatially varying attenuation and backscatter; and (2) a Physics-Aware Uncertainty Pruning (PAUP) branch that adaptively removes noisy floating Gaussians via uncertainty scoring, ensuring artifact-free geometry. The pipeline operates in training and rendering stages. During training, noisy Gaussians are optimized end-to-end with underwater parameters, guided by PAUP pruning and scattering modeling. In rendering, refined Gaussians produce clean Unattenuated Radiance Images (URIs) free from media effects, while learned physics enable realistic Underwater Images (UWIs) with accurate light transport. Experiments on SeaThru-NeRF and UWBundle datasets show superior performance, achieving PSNR of 27.604, SSIM of 0.868, and LPIPS of 0.104 on SeaThru-NeRF, with ~65% reduction in floating artifacts.\n\n水下三维场景重建面临光吸收、散射和浑浊等严峻挑战，这些因素会降低几何与颜色保真度，使得传统方法（如神经辐射场 NeRF）表现受限。尽管 SeaThru-NeRF 等 NeRF 扩展方法引入了基于物理的模型，但其对 MLP 的依赖限制了在浑浊环境中的效率与空间分辨率。我们提出了 UW-3DGS，这是一种针对鲁棒水下重建的三维高斯溅射（3DGS）新框架。其核心创新包括：（1）一种可插拔的可学习水下成像模块，基于体素回归建模空间变化的光衰减与反散射；（2）一个基于物理感知的不确定性裁剪（Physics-Aware Uncertainty Pruning, PAUP）分支，通过不确定性评分自适应去除噪声悬浮高斯，确保无伪影的几何重建。该流水线在训练与渲染阶段均可运行：训练阶段，将含噪高斯与水下参数进行端到端优化，由 PAUP 裁剪与散射建模共同引导；渲染阶段，经优化的高斯可生成去除介质效应的干净非衰减辐射图（Unattenuated Radiance Images, URIs），同时结合学习到的物理模型生成具有准确光传输的真实水下图像（Underwater Images, UWIs）。在 SeaThru-NeRF 和 UWBundle 数据集上的实验表明，该方法性能优越，在 SeaThru-NeRF 上实现了 27.604 的 PSNR、0.868 的 SSIM 和 0.104 的 LPIPS，并减少约 65% 的悬浮伪影。\n"
  },
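The learnable underwater image formation module plausibly follows the classic SeaThru-style model: exponentially attenuated direct radiance plus saturating backscatter. A sketch with global per-channel coefficients (the paper regresses spatially varying ones via voxel-based regression):

```python
import numpy as np

def underwater_image(J, depth, beta_att, beta_sc, B_inf):
    # J: (H, W, 3) unattenuated radiance image (URI); depth: (H, W) range
    # along each ray; beta_att, beta_sc: (3,) per-channel attenuation and
    # backscatter coefficients; B_inf: (3,) backscatter light color.
    d = depth[..., None]
    return J * np.exp(-beta_att * d) + B_inf * (1.0 - np.exp(-beta_sc * d))
```

Inverting this model during training is what lets rendering recover clean URIs while still reproducing realistic UWIs.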
  {
    "path": "abs/2508.06968.md",
    "content": "### Evaluating Fisheye-Compatible 3D Gaussian Splatting Methods on Real Images Beyond 180 Degree Field of View\n\nWe present the first evaluation of fisheye-based 3D Gaussian Splatting methods, Fisheye-GS and 3DGUT, on real images with fields of view exceeding 180 degree. Our study covers both indoor and outdoor scenes captured with 200 degree fisheye cameras and analyzes how each method handles extreme distortion in real world settings. We evaluate performance under varying fields of view (200 degree, 160 degree, and 120 degree) to study the tradeoff between peripheral distortion and spatial coverage. Fisheye-GS benefits from field of view (FoV) reduction, particularly at 160 degree, while 3DGUT remains stable across all settings and maintains high perceptual quality at the full 200 degree view. To address the limitations of SfM-based initialization, which often fails under strong distortion, we also propose a depth-based strategy using UniK3D predictions from only 2-3 fisheye images per scene. Although UniK3D is not trained on real fisheye data, it produces dense point clouds that enable reconstruction quality on par with SfM, even in difficult scenes with fog, glare, or sky. Our results highlight the practical viability of fisheye-based 3DGS methods for wide-angle 3D reconstruction from sparse and distortion-heavy image inputs.\n\n我们首次对基于鱼眼的3D高斯溅射方法——Fisheye-GS和3DGUT——在视场超过180度的真实图像上进行了评估。我们的研究涵盖了使用200度鱼眼相机拍摄的室内和室外场景，并分析了每种方法在现实环境中应对极端畸变的表现。我们在不同的视场条件下（200度、160度和120度）进行了性能评估，以研究边缘畸变与空间覆盖之间的权衡。Fisheye-GS在视场缩减时（尤其在160度）表现获益，而3DGUT在所有设置下均保持稳定，并在完整200度视角下维持高水平的感知质量。为解决基于SfM的初始化在强畸变下常常失效的问题，我们提出了一种基于深度的策略，利用每个场景仅2-3张鱼眼图像的UniK3D预测。尽管UniK3D并未在真实鱼眼数据上进行训练，但它能够生成密集点云，使得重建质量即使在雾气、眩光或天空等复杂场景下也能与SfM相媲美。我们的结果突显了基于鱼眼的3DGS方法在稀疏且高畸变图像输入条件下实现广角3D重建的实际可行性。\n"
  },
  {
    "path": "abs/2508.07003.md",
    "content": "### EGS-SLAM: RGB-D Gaussian Splatting SLAM with Events\n\nGaussian Splatting SLAM (GS-SLAM) offers a notable improvement over traditional SLAM methods, enabling photorealistic 3D reconstruction that conventional approaches often struggle to achieve. However, existing GS-SLAM systems perform poorly under persistent and severe motion blur commonly encountered in real-world scenarios, leading to significantly degraded tracking accuracy and compromised 3D reconstruction quality. To address this limitation, we propose EGS-SLAM, a novel GS-SLAM framework that fuses event data with RGB-D inputs to simultaneously reduce motion blur in images and compensate for the sparse and discrete nature of event streams, enabling robust tracking and high-fidelity 3D Gaussian Splatting reconstruction. Specifically, our system explicitly models the camera's continuous trajectory during exposure, supporting event- and blur-aware tracking and mapping on a unified 3D Gaussian Splatting scene. Furthermore, we introduce a learnable camera response function to align the dynamic ranges of events and images, along with a no-event loss to suppress ringing artifacts during reconstruction. We validate our approach on a new dataset comprising synthetic and real-world sequences with significant motion blur. Extensive experimental results demonstrate that EGS-SLAM consistently outperforms existing GS-SLAM systems in both trajectory accuracy and photorealistic 3D Gaussian Splatting reconstruction.\n\n高斯溅射SLAM（GS-SLAM）相较于传统SLAM方法具有显著提升，能够实现传统方法难以达到的逼真三维重建。然而，现有的GS-SLAM系统在真实场景中常见的持续性和严重运动模糊条件下表现不佳，导致跟踪精度大幅下降，三维重建质量受损。为解决这一局限，我们提出了EGS-SLAM，这是一种新颖的GS-SLAM框架，将事件数据与RGB-D输入融合，在减少图像运动模糊的同时，弥补事件流稀疏和离散的特性，从而实现稳健的跟踪和高保真三维高斯溅射重建。具体而言，我们的系统显式建模了曝光期间相机的连续轨迹，支持在统一的三维高斯溅射场景中进行事件感知与模糊感知的跟踪和建图。此外，我们引入了可学习的相机响应函数，用于对齐事件与图像的动态范围，并提出了无事件损失（no-event loss），以抑制重建过程中的振铃伪影。我们在包含显著运动模糊的合成和真实序列的新数据集上验证了该方法。大量实验结果表明，EGS-SLAM在轨迹精度和逼真三维高斯溅射重建方面均稳定优于现有的GS-SLAM系统。\n"
  },
  {
    "path": "abs/2508.07038.md",
    "content": "### 3DGS-VBench: A Comprehensive Video Quality Evaluation Benchmark for 3DGS Compression\n\n3D Gaussian Splatting (3DGS) enables real-time novel view synthesis with high visual fidelity, but its substantial storage requirements hinder practical deployment, prompting state-of-the-art (SOTA) 3DGS methods to incorporate compression modules. However, these 3DGS generative compression techniques introduce unique distortions lacking systematic quality assessment research. To this end, we establish 3DGS-VBench, a large-scale Video Quality Assessment (VQA) Dataset and Benchmark with 660 compressed 3DGS models and video sequences generated from 11 scenes across 6 SOTA 3DGS compression algorithms with systematically designed parameter levels. With annotations from 50 participants, we obtained MOS scores with outlier removal and validated dataset reliability. We benchmark 6 3DGS compression algorithms on storage efficiency and visual quality, and evaluate 15 quality assessment metrics across multiple paradigms. Our work enables specialized VQA model training for 3DGS, serving as a catalyst for compression and quality assessment research.\n\n三维高斯溅射（3DGS）实现了高视觉保真度的实时新视角合成，但其庞大的存储需求阻碍了实际部署，这促使最新的3DGS方法引入压缩模块。然而，这些3DGS生成式压缩技术会带来独特的失真，而缺乏系统性的质量评估研究。为此，我们建立了3DGS-VBench，这是一个大规模视频质量评估（VQA）数据集与基准，包含660个经过压缩的3DGS模型和由6种最新3DGS压缩算法在系统化参数水平下生成的11个场景视频序列。通过50名参与者的标注，我们获得了经过异常值剔除的主观意见分（MOS），并验证了数据集的可靠性。我们对6种3DGS压缩算法的存储效率和视觉质量进行了基准测试，并在多个范式下评估了15种质量评估指标。我们的工作支持针对3DGS的专用VQA模型训练，为压缩与质量评估研究提供了推动力。\n"
  },
  {
    "path": "abs/2508.07182.md",
    "content": "### 3D Gaussian Representations with Motion Trajectory Field for Dynamic Scene Reconstruction\n\nThis paper addresses the challenge of novel-view synthesis and motion reconstruction of dynamic scenes from monocular video, which is critical for many robotic applications. Although Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have demonstrated remarkable success in rendering static scenes, extending them to reconstruct dynamic scenes remains challenging. In this work, we introduce a novel approach that combines 3DGS with a motion trajectory field, enabling precise handling of complex object motions and achieving physically plausible motion trajectories. By decoupling dynamic objects from static background, our method compactly optimizes the motion trajectory field. The approach incorporates time-invariant motion coefficients and shared motion trajectory bases to capture intricate motion patterns while minimizing optimization complexity. Extensive experiments demonstrate that our approach achieves state-of-the-art results in both novel-view synthesis and motion trajectory recovery from monocular video, advancing the capabilities of dynamic scene reconstruction.\n\n本文针对单目视频中动态场景的新视角合成与运动重建问题展开研究，这在众多机器人应用中至关重要。尽管神经辐射场（NeRF）和三维高斯溅射（3DGS）在静态场景渲染方面已取得显著成功，但将其扩展到动态场景的重建仍然具有挑战性。在本研究中，我们提出了一种新方法，将3DGS与运动轨迹场相结合，从而能够精确处理复杂的物体运动，并实现物理合理的运动轨迹。通过将动态物体与静态背景解耦，我们的方法能够紧凑地优化运动轨迹场。该方法引入了时间不变的运动系数和共享的运动轨迹基，以捕捉复杂的运动模式，同时降低优化复杂度。大量实验结果表明，该方法在单目视频的新视角合成与运动轨迹恢复方面均达到了最新的先进水平，推动了动态场景重建能力的发展。\n"
  },
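The combination of time-invariant coefficients and shared trajectory bases amounts to a low-rank motion model for the Gaussian centers. A sketch of that decomposition, with Fourier bases chosen purely for illustration (the paper's bases are learned and shared across Gaussians):

```python
import numpy as np

def fourier_basis(t, K=8):
    # K scalar trajectory bases evaluated at time t (illustrative choice).
    k = np.arange(1, K // 2 + 1)
    return np.concatenate([np.sin(k * t), np.cos(k * t)])  # (K,)

def gaussian_positions(x0, coeffs, t):
    # x0: (N, 3) canonical centers; coeffs: (N, K, 3) time-invariant motion
    # coefficients mixing the shared bases; returns centers at time t.
    return x0 + np.einsum('nkd,k->nd', coeffs, fourier_basis(t))
```

Only `coeffs` varies per Gaussian, which is what keeps the optimization compact while still capturing intricate motion.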
  {
    "path": "abs/2508.07263.md",
    "content": "### Fading the Digital Ink: A Universal Black-Box Attack Framework for 3DGS Watermarking Systems\n\nWith the rise of 3D Gaussian Splatting (3DGS), a variety of digital watermarking techniques, embedding either 1D bitstreams or 2D images, are used for copyright protection. However, the robustness of these watermarking techniques against potential attacks remains underexplored. This paper introduces the first universal black-box attack framework, the Group-based Multi-objective Evolutionary Attack (GMEA), designed to challenge these watermarking systems. We formulate the attack as a large-scale multi-objective optimization problem, balancing watermark removal with visual quality. In a black-box setting, we introduce an indirect objective function that blinds the watermark detector by minimizing the standard deviation of features extracted by a convolutional network, thus rendering the feature maps uninformative. To manage the vast search space of 3DGS models, we employ a group-based optimization strategy to partition the model into multiple, independent sub-optimization problems. Experiments demonstrate that our framework effectively removes both 1D and 2D watermarks from mainstream 3DGS watermarking methods while maintaining high visual fidelity. This work reveals critical vulnerabilities in existing 3DGS copyright protection schemes and calls for the development of more robust watermarking systems.\n\n随着三维高斯溅射（3DGS）的兴起，各类数字水印技术被广泛用于版权保护，这些方法通常嵌入一维比特流或二维图像。然而，这些水印技术在面对潜在攻击时的鲁棒性仍未得到充分研究。本文提出了首个通用黑盒攻击框架——基于分组的多目标进化攻击（GMEA），旨在对这些水印系统进行挑战。我们将该攻击建模为一个大规模多目标优化问题，在水印去除与视觉质量之间进行权衡。在黑盒设置下，我们引入了一种间接目标函数，通过最小化卷积网络提取特征的标准差来“致盲”水印检测器，从而使特征图失去判别性。为应对3DGS模型庞大的搜索空间，我们采用了分组优化策略，将模型划分为多个独立的子优化问题。实验表明，该框架能够有效去除主流3DGS水印方法中的一维与二维水印，同时保持较高的视觉保真度。本研究揭示了现有3DGS版权保护方案中的关键漏洞，并呼吁开发更为鲁棒的水印系统。\n"
  },
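The indirect black-box objective balances detector blinding against visual fidelity. A sketch of such a two-term objective, where `feature_fn` is a stand-in convolutional feature extractor and the weighting is illustrative, not the paper's formulation:

```python
import numpy as np

def blinding_objective(feature_fn, image, ref_image, w_fidelity=10.0):
    # feature_fn: stand-in convolutional feature extractor, image -> features.
    # image, ref_image: (H, W, 3) attacked render and original render.
    feat_std = float(feature_fn(image).std())            # blind the detector
    fidelity = float(np.mean((image - ref_image) ** 2))  # preserve quality
    return feat_std + w_fidelity * fidelity              # minimize both
```

The evolutionary search would evaluate this objective per sub-group of Gaussian attributes, consistent with the group-based partitioning described above.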
  {
    "path": "abs/2508.07355.md",
    "content": "### GS4Buildings: Prior-Guided Gaussian Splatting for 3D Building Reconstruction\n\nRecent advances in Gaussian Splatting (GS) have demonstrated its effectiveness in photo-realistic rendering and 3D reconstruction. Among these, 2D Gaussian Splatting (2DGS) is particularly suitable for surface reconstruction due to its flattened Gaussian representation and integrated normal regularization. However, its performance often degrades in large-scale and complex urban scenes with frequent occlusions, leading to incomplete building reconstructions. We propose GS4Buildings, a novel prior-guided Gaussian Splatting method leveraging the ubiquity of semantic 3D building models for robust and scalable building surface reconstruction. Instead of relying on traditional Structure-from-Motion (SfM) pipelines, GS4Buildings initializes Gaussians directly from low-level Level of Detail (LoD)2 semantic 3D building models. Moreover, we generate prior depth and normal maps from the planar building geometry and incorporate them into the optimization process, providing strong geometric guidance for surface consistency and structural accuracy. We also introduce an optional building-focused mode that limits reconstruction to building regions, achieving a 71.8% reduction in Gaussian primitives and enabling a more efficient and compact representation. Experiments on urban datasets demonstrate that GS4Buildings improves reconstruction completeness by 20.5% and geometric accuracy by 32.8%. These results highlight the potential of semantic building model integration to advance GS-based reconstruction toward real-world urban applications such as smart cities and digital twins.\n\n高斯溅射（GS）的最新进展展示了其在照片级真实感渲染和三维重建中的有效性。其中，二维高斯溅射（2DGS）由于采用扁平化高斯表示并结合法向正则化，特别适合表面重建。然而，在存在频繁遮挡的大规模复杂城市场景中，其性能常常下降，导致建筑物重建不完整。为此，我们提出了 GS4Buildings，这是一种结合先验信息的高斯溅射方法，利用语义三维建筑模型的普遍存在性，实现稳健且可扩展的建筑表面重建。与依赖传统结构自运动（SfM）流程的方法不同，GS4Buildings 直接从低层级的细节等级（LoD）2语义三维建筑模型初始化高斯。此外，我们从平面建筑几何中生成先验深度图和法向图，并将其引入优化过程中，为表面一致性和结构精度提供强有力的几何指导。我们还引入了一种可选的建筑专注模式，将重建限制在建筑区域，从而减少71.8%的高斯基元数量，实现更高效和紧凑的表示。城市数据集上的实验表明，GS4Buildings在重建完整性方面提升了20.5%，在几何精度方面提升了32.8%。这些结果突显了语义建筑模型集成在推动基于GS的重建向智慧城市和数字孪生等真实城市应用发展的潜力。\n"
  },
  {
    "path": "abs/2508.07372.md",
    "content": "### DIP-GS: Deep Image Prior For Gaussian Splatting Sparse View Recovery\n\n3D Gaussian Splatting (3DGS) is a leading 3D scene reconstruction method, obtaining high-quality reconstruction with real-time rendering runtime performance. The main idea behind 3DGS is to represent the scene as a collection of 3D gaussians, while learning their parameters to fit the given views of the scene. While achieving superior performance in the presence of many views, 3DGS struggles with sparse view reconstruction, where the input views are sparse and do not fully cover the scene and have low overlaps. In this paper, we propose DIP-GS, a Deep Image Prior (DIP) 3DGS representation. By using the DIP prior, which utilizes internal structure and patterns, with coarse-to-fine manner, DIP-based 3DGS can operate in scenarios where vanilla 3DGS fails, such as sparse view recovery. Note that our approach does not use any pre-trained models such as generative models and depth estimation, but rather relies only on the input frames. Among such methods, DIP-GS obtains state-of-the-art (SOTA) competitive results on various sparse-view reconstruction tasks, demonstrating its capabilities.\n\n三维高斯溅射（3DGS）是一种领先的三维场景重建方法，能够在保持实时渲染性能的同时获得高质量的重建效果。3DGS 的核心思想是将场景表示为一组三维高斯，并通过学习其参数来拟合给定的场景视图。尽管在拥有大量视图时能实现优越表现，但3DGS在稀疏视图重建方面表现不佳，此时输入视图稀少，无法全面覆盖场景且重叠度低。本文提出了 DIP-GS，一种基于深度图像先验（DIP）的3DGS表示方法。通过利用DIP先验（利用内部结构和模式）并采用由粗到细的方式，基于DIP的3DGS能够在标准3DGS失效的场景下（如稀疏视图恢复）发挥作用。需要注意的是，我们的方法不依赖任何预训练模型（如生成模型或深度估计），而仅依赖输入帧。在同类方法中，DIP-GS 在多项稀疏视图重建任务上取得了最新的先进水平（SOTA）竞争性结果，展现了其能力。\n"
  },
  {
    "path": "abs/2508.07483.md",
    "content": "### Novel View Synthesis with Gaussian Splatting: Impact on Photogrammetry Model Accuracy and Resolution\n\nIn this paper, I present a comprehensive study comparing Photogrammetry and Gaussian Splatting techniques for 3D model reconstruction and view synthesis. I created a dataset of images from a real-world scene and constructed 3D models using both methods. To evaluate the performance, I compared the models using structural similarity index (SSIM), peak signal-to-noise ratio (PSNR), learned perceptual image patch similarity (LPIPS), and lp/mm resolution based on the USAF resolution chart. A significant contribution of this work is the development of a modified Gaussian Splatting repository, which I forked and enhanced to enable rendering images from novel camera poses generated in the Blender environment. This innovation allows for the synthesis of high-quality novel views, showcasing the flexibility and potential of Gaussian Splatting. My investigation extends to an augmented dataset that includes both original ground images and novel views synthesized via Gaussian Splatting. This augmented dataset was employed to generate a new photogrammetry model, which was then compared against the original photogrammetry model created using only the original images. The results demonstrate the efficacy of using Gaussian Splatting to generate novel high-quality views and its potential to improve photogrammetry-based 3D reconstructions. The comparative analysis highlights the strengths and limitations of both approaches, providing valuable information for applications in extended reality (XR), photogrammetry, and autonomous vehicle simulations.\n\n本文对摄影测量与高斯溅射两种技术在三维模型重建与视图合成中的应用进行了全面比较研究。我基于真实场景采集了一组图像数据集，并分别使用这两种方法构建了三维模型。为评估性能，我采用了结构相似性指数（SSIM）、峰值信噪比（PSNR）、感知图像块相似性（LPIPS）以及基于美国空军（USAF）分辨率卡的 lp/mm 分辨率作为评价指标。本研究的一项重要贡献是开发了一个经过修改的高斯溅射代码库，我在此基础上进行了分支与增强，使其能够渲染由 Blender 环境生成的新颖相机位姿下的图像。这一创新使得合成高质量新视角成为可能，展示了高斯溅射的灵活性与潜力。我的研究进一步扩展到一个增强数据集，该数据集包含原始地面图像与由高斯溅射合成的新视角图像。随后，我利用该增强数据集生成了一个新的摄影测量模型，并与仅使用原始图像构建的初始摄影测量模型进行了对比。实验结果表明，利用高斯溅射生成高质量新视角不仅是可行的，还能够提升基于摄影测量的三维重建效果。该对比分析揭示了两种方法各自的优势与局限，为扩展现实（XR）、摄影测量及自动驾驶仿真等应用提供了有价值的参考。\n"
  },
  {
    "path": "abs/2508.07701.md",
    "content": "### Multi-view Normal and Distance Guidance Gaussian Splatting for Surface Reconstruction\n\n3D Gaussian Splatting (3DGS) achieves remarkable results in the field of surface reconstruction. However, when Gaussian normal vectors are aligned within the single-view projection plane, while the geometry appears reasonable in the current view, biases may emerge upon switching to nearby views. To address the distance and global matching challenges in multi-view scenes, we design multi-view normal and distance-guided Gaussian splatting. This method achieves geometric depth unification and high-accuracy reconstruction by constraining nearby depth maps and aligning 3D normals. Specifically, for the reconstruction of small indoor and outdoor scenes, we propose a multi-view distance reprojection regularization module that achieves multi-view Gaussian alignment by computing the distance loss between two nearby views and the same Gaussian surface. Additionally, we develop a multi-view normal enhancement module, which ensures consistency across views by matching the normals of pixel points in nearby views and calculating the loss. Extensive experimental results demonstrate that our method outperforms the baseline in both quantitative and qualitative evaluations, significantly enhancing the surface reconstruction capability of 3DGS.\n\n三维高斯溅射（3DGS）在表面重建领域取得了显著成果。然而，当高斯法向量在单视图投影平面内对齐时，尽管几何在当前视角下看似合理，但在切换到相邻视角时可能会产生偏差。为了解决多视图场景中的距离与全局匹配问题，我们设计了多视图法向与距离引导的高斯溅射方法。该方法通过约束相邻深度图并对齐三维法向，实现几何深度统一与高精度重建。具体而言，对于小规模室内外场景的重建，我们提出了多视图距离重投影正则化模块，通过计算两个相邻视角与同一高斯表面之间的距离损失，实现多视图高斯对齐。此外，我们开发了多视图法向增强模块，通过匹配相邻视角中像素点的法向并计算损失，从而保证跨视角一致性。大量实验结果表明，我们的方法在定量与定性评估中均优于基线方法，显著提升了3DGS的表面重建能力。\n"
  },
  {
    "path": "abs/2508.07717.md",
    "content": "### Touch-Augmented Gaussian Splatting for Enhanced 3D Scene Reconstruction\n\nThis paper presents a multimodal framework that integrates touch signals (contact points and surface normals) into 3D Gaussian Splatting (3DGS). Our approach enhances scene reconstruction, particularly under challenging conditions like low lighting, limited camera viewpoints, and occlusions. Different from the visual-only method, the proposed approach incorporates spatially selective touch measurements to refine both the geometry and appearance of the 3D Gaussian representation. To guide the touch exploration, we introduce a two-stage sampling scheme that initially probes sparse regions and then concentrates on high-uncertainty boundaries identified from the reconstructed mesh. A geometric loss is proposed to ensure surface smoothness, resulting in improved geometry. Experimental results across diverse scenarios show consistent improvements in geometric accuracy. In the most challenging case with severe occlusion, the Chamfer Distance is reduced by over 15x, demonstrating the effectiveness of integrating touch cues into 3D Gaussian Splatting. Furthermore, our approach maintains a fully online pipeline, underscoring its feasibility in visually degraded environments.\n\n本文提出了一种多模态框架，将触觉信号（接触点与表面法向）融入三维高斯溅射（3DGS）。该方法增强了场景重建能力，尤其适用于光照不足、摄像机视角有限以及存在遮挡等复杂环境。不同于纯视觉方法，本文提出的方法引入了空间选择性的触觉测量，用于优化三维高斯表示的几何与外观。为引导触觉探索，我们设计了一个两阶段采样方案，先在稀疏区域进行探测，再聚焦于由重建网格识别出的高不确定性边界。我们提出了一种几何损失，用以保证表面平滑性，从而提升几何质量。在多种场景下的实验结果表明，该方法在几何精度方面均有显著提升。在最具挑战性的严重遮挡场景中，Chamfer 距离降低了超过15倍，证明了触觉信息融入三维高斯溅射的有效性。此外，该方法保持了全在线处理流程，凸显了其在视觉退化环境下的可行性。\n"
  },
  {
    "path": "abs/2508.07897.md",
    "content": "### NeeCo: Image Synthesis of Novel Instrument States Based on Dynamic and Deformable 3D Gaussian Reconstruction\n\nComputer vision-based technologies significantly enhance surgical automation by advancing tool tracking, detection, and localization. However, Current data-driven approaches are data-voracious, requiring large, high-quality labeled image datasets, which limits their application in surgical data science. Our Work introduces a novel dynamic Gaussian Splatting technique to address the data scarcity in surgical image datasets. We propose a dynamic Gaussian model to represent dynamic surgical scenes, enabling the rendering of surgical instruments from unseen viewpoints and deformations with real tissue backgrounds. We utilize a dynamic training adjustment strategy to address challenges posed by poorly calibrated camera poses from real-world scenarios. Additionally, we propose a method based on dynamic Gaussians for automatically generating annotations for our synthetic data. For evaluation, we constructed a new dataset featuring seven scenes with 14,000 frames of tool and camera motion and tool jaw articulation, with a background of an ex-vivo porcine model. Using this dataset, we synthetically replicate the scene deformation from the ground truth data, allowing direct comparisons of synthetic image quality. Experimental results illustrate that our method generates photo-realistic labeled image datasets with the highest values in Peak-Signal-to-Noise Ratio (29.87). We further evaluate the performance of medical-specific neural networks trained on real and synthetic images using an unseen real-world image dataset. Our results show that the performance of models trained on synthetic images generated by the proposed method outperforms those trained with state-of-the-art standard data augmentation by 10%, leading to an overall improvement in model performances by nearly 15%.\n\n计算机视觉技术通过提升手术工具的跟踪、检测与定位能力，显著推动了手术自动化的发展。然而，当前的数据驱动方法对数据依赖极大，需要大规模且高质量的标注图像数据集，这限制了其在手术数据科学中的应用。本文提出了一种新颖的动态高斯溅射技术，以解决手术图像数据集稀缺的问题。我们提出了一种动态高斯模型，用于表示动态手术场景，使得能够在真实组织背景下从未见过的视角和形变中渲染手术器械。我们采用动态训练调整策略，以应对现实场景中相机位姿校准不佳带来的挑战。此外，我们基于动态高斯提出了一种自动生成合成数据标注的方法。为验证该方法，我们构建了一个包含七个场景的新数据集，涵盖14,000帧工具与相机运动及工具钳口开合，并以离体猪模型作为背景。在该数据集上，我们从真实数据中合成复现了场景形变，从而能够直接比较合成图像的质量。实验结果表明，该方法生成的逼真标注图像数据集在峰值信噪比（PSNR）上达到29.87的最高值。进一步地，我们在未见过的真实图像数据集上评估了使用真实与合成图像训练的医学专用神经网络。结果显示，使用本文方法生成的合成图像训练的模型性能比使用最新标准数据增强训练的模型提升了10%，整体性能提高接近15%。\n"
  },
  {
    "path": "abs/2508.08136.md",
    "content": "### FantasyStyle: Controllable Stylized Distillation for 3D Gaussian Splatting\n\nThe success of 3DGS in generative and editing applications has sparked growing interest in 3DGS-based style transfer. However, current methods still face two major challenges: (1) multi-view inconsistency often leads to style conflicts, resulting in appearance smoothing and distortion; and (2) heavy reliance on VGG features, which struggle to disentangle style and content from style images, often causing content leakage and excessive stylization. To tackle these issues, we introduce **FantasyStyle**, a 3DGS-based style transfer framework, and the first to rely entirely on diffusion model distillation. It comprises two key components: (1) **Multi-View Frequency Consistency**. We enhance cross-view consistency by applying a 3D filter to multi-view noisy latent, selectively reducing low-frequency components to mitigate stylized prior conflicts. (2) **Controllable Stylized Distillation**. To suppress content leakage from style images, we introduce negative guidance to exclude undesired content. In addition, we identify the limitations of Score Distillation Sampling and Delta Denoising Score in 3D style transfer and remove the reconstruction term accordingly. Building on these insights, we propose a controllable stylized distillation that leverages negative guidance to more effectively optimize the 3D Gaussians. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art approaches, achieving higher stylization quality and visual realism across various scenes and styles.\n\n三维高斯溅射（3DGS）在生成与编辑应用中的成功，激发了对基于3DGS的风格迁移的广泛兴趣。然而，现有方法仍面临两大挑战：（1）多视图不一致性常常导致风格冲突，造成外观平滑与失真；（2）对VGG特征的高度依赖，使其难以从风格图像中解耦风格与内容，常引发内容泄漏与过度风格化。为解决这些问题，我们提出了 **FantasyStyle**，一个基于3DGS的风格迁移框架，也是首个完全依赖扩散模型蒸馏的方法。其包括两个关键组件：（1）**多视图频率一致性**。我们通过对多视图噪声潜变量应用三维滤波器，选择性地降低低频成分，从而缓解风格先验冲突并增强跨视图一致性。（2）**可控风格化蒸馏**。为抑制风格图像的内容泄漏，我们引入负向引导以排除不需要的内容。此外，我们识别了得分蒸馏采样（Score Distillation Sampling）和增量去噪得分（Delta Denoising Score）在三维风格迁移中的局限性，并据此去除了重建项。在这些洞见基础上，我们提出了一种可控风格化蒸馏方法，利用负向引导更有效地优化三维高斯。大量实验结果表明，该方法在多种场景与风格下均稳定优于现有方法，实现了更高的风格化质量与视觉真实感。\n"
  },
  {
    "path": "abs/2508.08219.md",
    "content": "### SAGOnline: Segment Any Gaussians Online\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for explicit 3D scene representation, yet achieving efficient and consistent 3D segmentation remains challenging. Current methods suffer from prohibitive computational costs, limited 3D spatial reasoning, and an inability to track multiple objects simultaneously. We present Segment Any Gaussians Online (SAGOnline), a lightweight and zero-shot framework for real-time 3D segmentation in Gaussian scenes that addresses these limitations through two key innovations: (1) a decoupled strategy that integrates video foundation models (e.g., SAM2) for view-consistent 2D mask propagation across synthesized views; and (2) a GPU-accelerated 3D mask generation and Gaussian-level instance labeling algorithm that assigns unique identifiers to 3D primitives, enabling lossless multi-object tracking and segmentation across views. SAGOnline achieves state-of-the-art performance on NVOS (92.7% mIoU) and Spin-NeRF (95.2% mIoU) benchmarks, outperforming Feature3DGS, OmniSeg3D-gs, and SA3D by 15--1500 times in inference speed (27 ms/frame). Qualitative results demonstrate robust multi-object segmentation and tracking in complex scenes. Our contributions include: (i) a lightweight and zero-shot framework for 3D segmentation in Gaussian scenes, (ii) explicit labeling of Gaussian primitives enabling simultaneous segmentation and tracking, and (iii) the effective adaptation of 2D video foundation models to the 3D domain. This work allows real-time rendering and 3D scene understanding, paving the way for practical AR/VR and robotic applications.\n\n三维高斯溅射（3DGS）已成为一种强大的显式三维场景表示范式，但实现高效且一致的三维分割仍然面临挑战。现有方法存在计算开销过大、三维空间推理受限以及无法同时跟踪多个物体的问题。我们提出了 **SAGOnline（Segment Any Gaussians Online）**，一个轻量级的零样本实时三维分割框架，通过两项关键创新解决了上述局限：（1）解耦策略：集成视频基础模型（如 SAM2），在合成视角间进行一致的二维掩膜传播；（2）GPU 加速的三维掩膜生成与高斯级别实例标注算法：为三维基元分配唯一标识符，实现跨视角的无损多目标跟踪与分割。SAGOnline 在 NVOS（92.7% mIoU）和 Spin-NeRF（95.2% mIoU）基准上取得了最新最优表现，在推理速度上比 Feature3DGS、OmniSeg3D-gs 和 SA3D 快 15 至 1500 倍（27 ms/帧）。定性结果显示其在复杂场景中具备鲁棒的多目标分割与跟踪能力。我们的贡献包括：（i）提出了一个轻量级零样本三维高斯场景分割框架；（ii）通过显式标注高斯基元，实现分割与跟踪的同时进行；（iii）将二维视频基础模型有效迁移至三维领域。本研究实现了实时渲染与三维场景理解，为 AR/VR 与机器人应用的落地铺平了道路。\n"
  },
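Gaussian-level instance labeling can be implemented by letting each propagated 2D mask vote for the Gaussians that rendered its pixels. A sketch under the assumption that per-pixel contribution weights are available from the rasterizer (shapes and names are illustrative):

```python
import numpy as np

def label_gaussians(weights, masks, num_ids):
    # weights: (V, H, W, N) per-pixel rendering contribution of each Gaussian;
    # masks: (V, H, W) integer instance IDs propagated by the video model.
    votes = np.zeros((num_ids, weights.shape[-1]))
    for i in range(num_ids):
        votes[i] = weights[masks == i].sum(axis=0)  # weight mass for ID i
    return votes.argmax(axis=0)  # (N,) one instance ID per Gaussian
```

Because every primitive carries a persistent identifier, segmenting or tracking an object in a new view reduces to rendering only the Gaussians with that ID.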
  {
    "path": "abs/2508.08252.md",
    "content": "### ReferSplat: Referring Segmentation in 3D Gaussian Splatting\n\nWe introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that aims to segment target objects in a 3D Gaussian scene based on natural language descriptions, which often contain spatial relationships or object attributes. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that 3D multi-modal understanding and spatial relationship modeling are key challenges for R3DGS. To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art performance on both the newly proposed R3DGS task and 3D open-vocabulary segmentation benchmarks.\n\n我们提出了R3DGS（Referring 3D Gaussian Splatting Segmentation）任务，该任务旨在根据自然语言描述，在3D Gaussian场景中分割目标物体。这些语言描述通常包含空间关系或物体属性，因此该任务要求模型识别出新描述的物体，即便这些物体可能被遮挡或在当前视角下不可见，这对3D多模态理解提出了重大挑战。发展这一能力对于推动具身智能（Embodied AI）具有重要意义。为支持该方向的研究，我们构建了首个R3DGS数据集 Ref-LERF。分析表明，3D多模态理解与空间关系建模是R3DGS任务中的核心挑战。为此，我们提出了ReferSplat框架，以具备空间感知能力的方式显式对3D Gaussian点与自然语言表达进行建模。ReferSplat在新提出的R3DGS任务和现有的3D开放词汇分割基准上均取得了当前最优性能。\n"
  },
  {
    "path": "abs/2508.08254.md",
    "content": "### Learning an Implicit Physics Model for Image-based Fluid Simulation\n\nHumans possess an exceptional ability to imagine 4D scenes, encompassing both motion and 3D geometry, from a single still image. This ability is rooted in our accumulated observations of similar scenes and an intuitive understanding of physics. In this paper, we aim to replicate this capacity in neural networks, specifically focusing on natural fluid imagery. Existing methods for this task typically employ simplistic 2D motion estimators to animate the image, leading to motion predictions that often defy physical principles, resulting in unrealistic animations. Our approach introduces a novel method for generating 4D scenes with physics-consistent animation from a single image. We propose the use of a physics-informed neural network that predicts motion for each surface point, guided by a loss term derived from fundamental physical principles, including the Navier-Stokes equations. To capture appearance, we predict feature-based 3D Gaussians from the input image and its estimated depth, which are then animated using the predicted motions and rendered from any desired camera perspective. Experimental results highlight the effectiveness of our method in producing physically plausible animations, showcasing significant performance improvements over existing methods.\n\n人类具备一种非凡的能力，能够从单张静态图像中想象出包含运动与三维几何的四维场景。这种能力源于我们对类似场景的长期观察以及对物理规律的直觉理解。本文旨在在神经网络中复现这一能力，特别聚焦于自然流体图像。现有方法通常采用简化的二维运动估计器来对图像进行动画化，导致的运动预测往往违背物理规律，从而产生不真实的动画。为解决这一问题，我们提出了一种新颖的方法，可从单张图像生成具有物理一致性的四维场景动画。具体而言，我们提出了一种物理约束的神经网络，用于预测每个表面点的运动，其损失函数由包括纳维-斯托克斯方程在内的基本物理原理推导而来。为捕捉外观信息，我们从输入图像及其估计深度中预测基于特征的三维高斯表示，并利用预测的运动进行动画化，从任意相机视角进行渲染。实验结果表明，该方法能够生成物理合理的动画，在性能上显著优于现有方法。\n"
  },
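A Navier-Stokes-derived loss term can be as simple as penalizing the divergence of the predicted velocity field (incompressibility). A grid-based sketch of one such residual; the paper's full physics-informed loss is necessarily richer than this single term:

```python
import numpy as np

def divergence_residual(u, dx=1.0):
    # u: (3, D, H, W) predicted velocity field sampled on a regular grid.
    # Penalize |div u|^2, the incompressibility constraint of Navier-Stokes.
    div = sum(np.gradient(u[i], dx, axis=i) for i in range(3))
    return float(np.mean(div ** 2))
```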
  {
    "path": "abs/2508.08867.md",
    "content": "### GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments\n\nNovel view synthesis with neural models has advanced rapidly in recent years, yet adapting these models to scene changes remains an open problem. Existing methods are either labor-intensive, requiring extensive model retraining, or fail to capture detailed types of changes over time. In this paper, we present GaussianUpdate, a novel approach that combines 3D Gaussian representation with continual learning to address these challenges. Our method effectively updates the Gaussian radiance fields with current data while preserving information from past scenes. Unlike existing methods, GaussianUpdate explicitly models different types of changes through a novel multi-stage update strategy. Additionally, we introduce a visibility-aware continual learning approach with generative replay, enabling self-aware updating without the need to store images. The experiments on the benchmark dataset demonstrate our method achieves superior and real-time rendering with the capability of visualizing changes over different times\n\n近年来，基于神经网络的新视角合成技术发展迅速，但如何使这些模型适应场景变化仍然是一个未解决的问题。现有方法要么劳动强度大，需要大量模型重新训练，要么无法捕捉场景随时间变化的细节。本文提出了一种名为 **GaussianUpdate** 的新方法，将三维高斯表示与持续学习相结合，以应对这些挑战。我们的方法能够在利用当前数据更新高斯辐射场的同时，保留历史场景信息。与现有方法不同，GaussianUpdate 通过一种新颖的多阶段更新策略，显式建模不同类型的场景变化。此外，我们提出了一种结合生成回放的可见性感知持续学习方法，使模型能够在无需存储图像的情况下实现自适应更新。基准数据集上的实验结果表明，我们的方法在实时渲染与跨时间可视化场景变化方面均实现了优越表现。\n"
  },
  {
    "path": "abs/2508.09239.md",
    "content": "### Gradient-Direction-Aware Density Control for 3D Gaussian Splatting\n\nThe emergence of 3D Gaussian Splatting (3DGS) has significantly advanced novel view synthesis through explicit scene representation, enabling real-time photorealistic rendering. However, existing approaches manifest two critical limitations in complex scenarios: (1) Over-reconstruction occurs when persistent large Gaussians cannot meet adaptive splitting thresholds during density control. This is exacerbated by conflicting gradient directions that prevent effective splitting of these Gaussians; (2) Over-densification of Gaussians occurs in regions with aligned gradient aggregation, leading to redundant component proliferation. This redundancy significantly increases memory overhead due to unnecessary data retention. We present Gradient-Direction-Aware Gaussian Splatting (GDAGS), a gradient-direction-aware adaptive density control framework to address these challenges. Our key innovations: the gradient coherence ratio (GCR), computed through normalized gradient vector norms, which explicitly discriminates Gaussians with concordant versus conflicting gradient directions; and a nonlinear dynamic weighting mechanism leverages the GCR to enable gradient-direction-aware density control. Specifically, GDAGS prioritizes conflicting-gradient Gaussians during splitting operations to enhance geometric details while suppressing redundant concordant-direction Gaussians. Conversely, in cloning processes, GDAGS promotes concordant-direction Gaussian densification for structural completion while preventing conflicting-direction Gaussian overpopulation. Comprehensive evaluations across diverse real-world benchmarks demonstrate that GDAGS achieves superior rendering quality while effectively mitigating over-reconstruction, suppressing over-densification, and constructing compact scene representations with 50% reduced memory consumption through optimized Gaussians utilization.\n\n三维高斯溅射（3DGS）的出现极大推动了新视角合成的发展，通过显式场景表示实现了实时的照片级真实感渲染。然而，在复杂场景下，现有方法仍存在两大关键局限：（1）过度重建：当持续存在的大高斯在密度控制过程中无法满足自适应拆分阈值时，就会出现此问题。冲突的梯度方向会加剧这一情况，阻碍高斯的有效拆分；（2）过度密集化：在梯度聚合方向一致的区域，会导致高斯冗余组件的激增。这种冗余显著增加了内存开销，因保留了不必要的数据。为解决这些问题，我们提出了 **梯度方向感知高斯溅射（GDAGS）**，一个基于梯度方向感知的自适应密度控制框架。其核心创新包括：**梯度相干比（GCR）**，通过归一化梯度向量范数计算，用于显式区分梯度方向一致与冲突的高斯；以及一种非线性动态加权机制，利用 GCR 实现梯度方向感知的密度控制。具体而言，GDAGS 在拆分操作中优先处理冲突梯度的高斯，以增强几何细节，同时抑制冗余的一致方向高斯；而在克隆过程中，GDAGS 促进一致方向高斯的密集化以完成结构，同时避免冲突方向高斯的过度增殖。跨越多种真实场景基准的全面评估表明，GDAGS 在提升渲染质量的同时，有效缓解了过度重建与过度密集化问题，并通过优化高斯利用率，实现了场景表示的紧凑化，内存消耗降低达50%。\n"
  },
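One plausible reading of a GCR "computed from normalized gradient vector norms" is the ratio between the norm of the summed gradients and the sum of their norms: it equals 1 when directions agree across iterations and approaches 0 when they conflict. A sketch under that assumption:

```python
import numpy as np

def gradient_coherence_ratio(grads, eps=1e-8):
    # grads: (T, N, D) positional gradients of N Gaussians over T iterations.
    # GCR -> 1 for concordant directions, -> 0 for conflicting directions.
    coherent = np.linalg.norm(grads.sum(axis=0), axis=-1)     # ||sum_t g_t||
    total = np.linalg.norm(grads, axis=-1).sum(axis=0) + eps  # sum_t ||g_t||
    return coherent / total
```

A density controller would then split low-GCR Gaussians preferentially and clone high-GCR ones, matching the behavior described above.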
  {
    "path": "abs/2508.09479.md",
    "content": "### SkySplat: Generalizable 3D Gaussian Splatting from Multi-Temporal Sparse Satellite Images\n\nThree-dimensional scene reconstruction from sparse-view satellite images is a long-standing and challenging task. While 3D Gaussian Splatting (3DGS) and its variants have recently attracted attention for its high efficiency, existing methods remain unsuitable for satellite images due to incompatibility with rational polynomial coefficient (RPC) models and limited generalization capability. Recent advances in generalizable 3DGS approaches show potential, but they perform poorly on multi-temporal sparse satellite images due to limited geometric constraints, transient objects, and radiometric inconsistencies. To address these limitations, we propose SkySplat, a novel self-supervised framework that integrates the RPC model into the generalizable 3DGS pipeline, enabling more effective use of sparse geometric cues for improved reconstruction. SkySplat relies only on RGB images and radiometric-robust relative height supervision, thereby eliminating the need for ground-truth height maps. Key components include a Cross-Self Consistency Module (CSCM), which mitigates transient object interference via consistency-based masking, and a multi-view consistency aggregation strategy that refines reconstruction results. Compared to per-scene optimization methods, SkySplat achieves an 86 times speedup over EOGS with higher accuracy. It also outperforms generalizable 3DGS baselines, reducing MAE from 13.18 m to 1.80 m on the DFC19 dataset significantly, and demonstrates strong cross-dataset generalization on the MVS3D benchmark.\n\n稀疏视角卫星图像的三维场景重建是一项长期存在且具有挑战性的问题。虽然三维高斯溅射（3DGS）及其变体因高效性而受到关注，但现有方法仍不适用于卫星图像，主要原因在于其与有理多项式系数（RPC）模型不兼容，以及泛化能力有限。尽管最新的可泛化3DGS方法展现了潜力，但在多时相稀疏卫星图像上表现不佳，原因在于几何约束不足、瞬态目标干扰以及辐射不一致性。为克服这些局限，我们提出了 **SkySplat**，一个新颖的自监督框架，将RPC模型集成到可泛化3DGS流程中，从而更有效地利用稀疏几何线索以提升重建效果。SkySplat 仅依赖RGB图像和对辐射具有鲁棒性的相对高度监督，无需地面真实高度图。其关键组件包括：**交叉自一致性模块（CSCM）**，通过基于一致性的掩膜缓解瞬态目标干扰；以及一种多视图一致性聚合策略，用于进一步优化重建结果。与逐场景优化方法相比，SkySplat 在精度更高的情况下实现了相较 EOGS **86 倍的速度提升**。同时，它在 DFC19 数据集上的平均绝对误差（MAE）从 13.18 m 显著降低到 1.80 m，并在 MVS3D 基准上展现了强大的跨数据集泛化能力。\n"
  },
  {
    "path": "abs/2508.09597.md",
    "content": "### SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing\n\nCreating high-fidelity and editable head avatars is a pivotal challenge in computer vision and graphics, boosting many AR/VR applications. While recent advancements have achieved photorealistic renderings and plausible animation, head editing, especially real-time appearance editing, remains challenging due to the implicit representation and entangled modeling of the geometry and global appearance. To address this, we propose Surface-Volumetric Gaussian Head Avatar (SVG-Head), a novel hybrid representation that explicitly models the geometry with 3D Gaussians bound on a FLAME mesh and leverages disentangled texture images to capture the global appearance. Technically, it contains two types of Gaussians, in which surface Gaussians explicitly model the appearance of head avatars using learnable texture images, facilitating real-time texture editing, while volumetric Gaussians enhance the reconstruction quality of non-Lambertian regions (e.g., lips and hair). To model the correspondence between 3D world and texture space, we provide a mesh-aware Gaussian UV mapping method, which leverages UV coordinates given by the FLAME mesh to obtain sharp texture images and real-time rendering speed. A hierarchical optimization strategy is further designed to pursue the optimal performance in both reconstruction quality and editing flexibility. Experiments on the NeRSemble dataset show that SVG-Head not only generates high-fidelity rendering results, but also is the first method to obtain explicit texture images for Gaussian head avatars and support real-time appearance editing.\n\n高保真且可编辑的人头头像生成是计算机视觉与图形学中的一个关键挑战，对于增强现实（AR）和虚拟现实（VR）应用具有重要推动作用。尽管近期方法已实现了照片级逼真渲染与合理的动画效果，但头像编辑，尤其是实时外观编辑，仍然面临困难，主要源于隐式表示以及几何与整体外观的耦合建模。为解决这一问题，我们提出了 **表面-体积高斯头像（SVG-Head）**，这是一种新颖的混合表示方法：通过绑定在 FLAME 网格上的三维高斯显式建模几何，并利用解耦的纹理图像捕捉整体外观。在技术实现上，该方法包含两类高斯：表面高斯通过可学习的纹理图像显式建模头像外观，从而支持实时纹理编辑；体积高斯则提升了非朗伯区域（如嘴唇和头发）的重建质量。为建立三维世界与纹理空间的对应关系，我们提出了一种基于网格的高斯 UV 映射方法，利用 FLAME 网格提供的 UV 坐标获得清晰的纹理图像并实现实时渲染速度。我们进一步设计了一种分层优化策略，以同时追求重建质量与编辑灵活性的最优性能。在 NeRSemble 数据集上的实验表明，SVG-Head 不仅生成了高保真的渲染结果，而且是首个能够为高斯头像获得显式纹理图像并支持实时外观编辑的方法。\n"
  },
  {
    "path": "abs/2508.09610.md",
    "content": "### DualPhys-GS: Dual Physically-Guided 3D Gaussian Splatting for Underwater Scene Reconstruction\n\nIn 3D reconstruction of underwater scenes, traditional methods based on atmospheric optical models cannot effectively deal with the selective attenuation of light wavelengths and the effect of suspended particle scattering, which are unique to the water medium, and lead to color distortion, geometric artifacts, and collapsing phenomena at long distances. We propose the DualPhys-GS framework to achieve high-quality underwater reconstruction through a dual-path optimization mechanism. Our approach further develops a dual feature-guided attenuation-scattering modeling mechanism, the RGB-guided attenuation optimization model combines RGB features and depth information and can handle edge and structural details. In contrast, the multi-scale depth-aware scattering model captures scattering effects at different scales using a feature pyramid network and an attention mechanism. Meanwhile, we design several special loss functions. The attenuation scattering consistency loss ensures physical consistency. The water body type adaptive loss dynamically adjusts the weighting coefficients. The edge-aware scattering loss is used to maintain the sharpness of structural edges. The multi-scale feature loss helps to capture global and local structural information. In addition, we design a scene adaptive mechanism that can automatically identify the water-body-type characteristics (e.g., clear coral reef waters or turbid coastal waters) and dynamically adjust the scattering and attenuation parameters and optimization strategies. Experimental results show that our method outperforms existing methods in several metrics, especially in suspended matter-dense regions and long-distance scenes, and the reconstruction quality is significantly improved.\n\n在水下场景的三维重建中，基于大气光学模型的传统方法无法有效处理水体介质特有的光波长选择性衰减和悬浮颗粒散射效应，导致远距离场景中出现颜色失真、几何伪影和塌陷现象。为解决这一问题，我们提出了 **DualPhys-GS** 框架，通过双路径优化机制实现高质量的水下重建。我们进一步设计了一种双特征引导的衰减-散射建模机制，其中 **RGB 引导的衰减优化模型** 结合了 RGB 特征和深度信息，能够更好地处理边缘和结构细节；而 **多尺度深度感知散射模型** 则利用特征金字塔网络与注意力机制捕捉不同尺度下的散射效应。同时，我们设计了多种特殊损失函数：**衰减-散射一致性损失** 用于保证物理一致性；**水体类型自适应损失** 可动态调整权重系数；**边缘感知散射损失** 用于保持结构边缘的清晰度；**多尺度特征损失** 有助于捕捉全局与局部结构信息。此外，我们提出了一种 **场景自适应机制**，能够自动识别水体类型特征（如清澈的珊瑚礁海域或浑浊的近海水域），并动态调整散射与衰减参数及优化策略。实验结果表明，该方法在多个评价指标上均优于现有方法，尤其在悬浮物密集区域与远距离场景中，重建质量显著提升。\n"
  },
  {
    "path": "abs/2508.09626.md",
    "content": "### Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation\n\nIn the task of 3D Aerial-view Scene Semantic Segmentation (3D-AVS-SS), traditional methods struggle to address semantic ambiguity caused by scale variations and structural occlusions in aerial images. This limits their segmentation accuracy and consistency. To tackle these challenges, we propose a novel 3D-AVS-SS approach named SAD-Splat. Our method introduces a Gaussian point drop module, which integrates semantic confidence estimation with a learnable sparsity mechanism based on the Hard Concrete distribution. This module effectively eliminates redundant and semantically ambiguous Gaussian points, enhancing both segmentation performance and representation compactness. Furthermore, SAD-Splat incorporates a high-confidence pseudo-label generation pipeline. It leverages 2D foundation models to enhance supervision when ground-truth labels are limited, thereby further improving segmentation accuracy. To advance research in this domain, we introduce a challenging benchmark dataset: 3D Aerial Semantic (3D-AS), which encompasses diverse real-world aerial scenes with sparse annotations. Experimental results demonstrate that SAD-Splat achieves an excellent balance between segmentation accuracy and representation compactness. It offers an efficient and scalable solution for 3D aerial scene understanding.\n\n在三维航拍场景语义分割（3D-AVS-SS）任务中，传统方法难以应对航拍图像中由尺度变化与结构遮挡引起的语义歧义，从而限制了分割的精度与一致性。为解决这些挑战，我们提出了一种新颖的 3D-AVS-SS 方法 —— **SAD-Splat**。该方法引入了一个高斯点丢弃模块，将语义置信度估计与基于 Hard Concrete 分布的可学习稀疏机制相结合，有效去除了冗余且语义模糊的高斯点，从而提升分割性能与表示紧凑性。此外，SAD-Splat 还引入了高置信度伪标签生成流程。当真实标签有限时，该流程利用二维基础模型增强监督信号，进一步提升了分割精度。为推动该领域研究，我们还提出了一个具有挑战性的基准数据集 —— **3D Aerial Semantic (3D-AS)**，该数据集涵盖了多样化的真实航拍场景并提供稀疏标注。实验结果表明，SAD-Splat 在分割精度与表示紧凑性之间取得了极佳平衡，为三维航拍场景理解提供了一种高效且可扩展的解决方案。\n"
  },
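The learnable sparsity mechanism builds on the Hard Concrete distribution (Louizos et al., 2018), whose standard sampling recipe is shown below; gates driven to exactly zero drop the corresponding Gaussian points. Hyperparameters are the usual defaults from that line of work, not values from the paper:

```python
import numpy as np

def hard_concrete_gate(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1, rng=None):
    # Standard Hard Concrete sampling: a stretched, clipped binary Concrete
    # gate per point; exact zeros prune the corresponding Gaussians.
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(log_alpha))
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)
```

In the full method, `log_alpha` would be learned jointly with semantic confidence so that redundant or ambiguous points receive gates near zero.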
  {
    "path": "abs/2508.09667.md",
    "content": "### GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors\n\nReconstructing 3D scenes using 3D Gaussian Splatting (3DGS) from sparse views is an ill-posed problem due to insufficient information, often resulting in noticeable artifacts. While recent approaches have sought to leverage generative priors to complete information for under-constrained regions, they struggle to generate content that remains consistent with input observations. To address this challenge, we propose GSFixer, a novel framework designed to improve the quality of 3DGS representations reconstructed from sparse inputs. The core of our approach is the reference-guided video restoration model, built upon a DiT-based video diffusion model trained on paired artifact 3DGS renders and clean frames with additional reference-based conditions. Considering the input sparse views as references, our model integrates both 2D semantic features and 3D geometric features of reference views extracted from the visual geometry foundation model, enhancing the semantic coherence and 3D consistency when fixing artifact novel views. Furthermore, considering the lack of suitable benchmarks for 3DGS artifact restoration evaluation, we present DL3DV-Res which contains artifact frames rendered using low-quality 3DGS. Extensive experiments demonstrate our GSFixer outperforms current state-of-the-art methods in 3DGS artifact restoration and sparse-view 3D reconstruction.\n\n利用三维高斯溅射（3DGS）从稀疏视角重建三维场景是一个病态问题，由于信息不足，往往会产生明显的伪影。尽管近期方法尝试利用生成先验来补全欠约束区域的信息，但仍难以生成与输入观测保持一致的内容。为解决这一挑战，我们提出了 **GSFixer**，一个旨在提升稀疏输入下3DGS重建质量的新框架。其核心是一个基于参考引导的视频修复模型，构建于 DiT 视频扩散模型之上，并在包含成对的伪影 3DGS 渲染帧与干净帧的数据上进行训练，同时引入了额外的参考条件。以输入的稀疏视角作为参考，我们的模型结合了来自视觉几何基础模型提取的二维语义特征与三维几何特征，从而在修复伪影新视角时增强语义一致性与三维一致性。此外，鉴于缺乏适合的 3DGS 伪影修复评测基准，我们提出了 **DL3DV-Res** 数据集，其中包含由低质量3DGS渲染得到的伪影帧。大量实验结果表明，GSFixer 在3DGS伪影修复与稀疏视角三维重建方面均优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2508.09811.md",
    "content": "### TRACE: Learning 3D Gaussian Physical Dynamics from Multi-view Videos\n\nIn this paper, we aim to model 3D scene geometry, appearance, and physical information just from dynamic multi-view videos in the absence of any human labels. By leveraging physics-informed losses as soft constraints or integrating simple physics models into neural nets, existing works often fail to learn complex motion physics, or doing so requires additional labels such as object types or masks. We propose a new framework named TRACE to model the motion physics of complex dynamic 3D scenes. The key novelty of our method is that, by formulating each 3D point as a rigid particle with size and orientation in space, we directly learn a translation rotation dynamics system for each particle, explicitly estimating a complete set of physical parameters to govern the particle's motion over time. Extensive experiments on three existing dynamic datasets and one newly created challenging synthetic datasets demonstrate the extraordinary performance of our method over baselines in the task of future frame extrapolation. A nice property of our framework is that multiple objects or parts can be easily segmented just by clustering the learned physical parameters.\n\n本文旨在在没有任何人工标注的情况下，仅通过动态多视角视频来建模三维场景的几何、外观和物理信息。现有方法通常通过将物理约束损失作为软约束，或将简单物理模型集成到神经网络中，但往往无法有效学习复杂的运动物理，或者需要额外的标注（如物体类型或掩码）。我们提出了一个新的框架 TRACE，用于建模复杂动态三维场景的运动物理。该方法的关键创新在于：将每个三维点视为空间中具有大小和方向的刚体粒子，直接为每个粒子学习一个平移-旋转动力学系统，并显式估计一整套物理参数，以控制粒子随时间的运动。我们在三个现有的动态数据集和一个新构建的具有挑战性的合成数据集上进行了大量实验，结果表明该方法在未来帧外推任务中表现远超基线方法。该框架的一个良好特性是：只需对学习到的物理参数进行聚类，就能轻松地实现对多个物体或部分的分割。\n"
  },
  {
    "path": "abs/2508.09912.md",
    "content": "### E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras\n\nNovel view synthesis and 4D reconstruction techniques predominantly rely on RGB cameras, thereby inheriting inherent limitations such as the dependence on adequate lighting, susceptibility to motion blur, and a limited dynamic range. Event cameras, offering advantages of low power, high temporal resolution and high dynamic range, have brought a new perspective to addressing the scene reconstruction challenges in high-speed motion and low-light scenes. To this end, we propose E-4DGS, the first event-driven dynamic Gaussian Splatting approach, for novel view synthesis from multi-view event streams with fast-moving cameras. Specifically, we introduce an event-based initialization scheme to ensure stable training and propose event-adaptive slicing splatting for time-aware reconstruction. Additionally, we employ intensity importance pruning to eliminate floating artifacts and enhance 3D consistency, while incorporating an adaptive contrast threshold for more precise optimization. We design a synthetic multi-view camera setup with six moving event cameras surrounding the object in a 360-degree configuration and provide a benchmark multi-view event stream dataset that captures challenging motion scenarios. Our approach outperforms both event-only and event-RGB fusion baselines and paves the way for the exploration of multi-view event-based reconstruction as a novel approach for rapid scene capture.\n\n新颖视图合成和四维重建技术主要依赖于 RGB 相机，因此不可避免地受到一些固有限制，如对充足光照的依赖、易受运动模糊影响以及有限的动态范围。事件相机具有低功耗、高时间分辨率和高动态范围的优势，为解决高速运动和低光场景下的场景重建挑战带来了新的视角。为此，我们提出了 E-4DGS，这是首个基于事件驱动的动态高斯泼溅方法，用于在快速运动相机的多视角事件流中进行新颖视图合成。具体而言，我们引入了一种基于事件的初始化方案以确保训练稳定，并提出了事件自适应切片泼溅用于时间感知的重建。此外，我们采用强度重要性剪枝来消除漂浮伪影并增强三维一致性，同时结合自适应对比度阈值以实现更精确的优化。我们设计了一个合成的多视角相机系统，由六个运动事件相机以 360 度方式环绕目标，并提供了一个基准多视角事件流数据集，以捕捉具有挑战性的运动场景。实验结果表明，我们的方法在性能上优于事件单独和事件-RGB 融合的基线方法，并为多视角事件驱动的重建作为一种快速场景捕获的新方法开辟了道路。\n"
  },
  {
    "path": "abs/2508.09977.md",
    "content": "### A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation\n\n3D Gaussian Splatting (3DGS) has recently emerged as a powerful alternative to Neural Radiance Fields (NeRF) for 3D scene representation, offering high-fidelity photorealistic rendering with real-time performance. Beyond novel view synthesis, the explicit and compact nature of 3DGS enables a wide range of downstream applications that require geometric and semantic understanding. This survey provides a comprehensive overview of recent progress in 3DGS applications. It first introduces 2D foundation models that support semantic understanding and control in 3DGS applications, followed by a review of NeRF-based methods that inform their 3DGS counterparts. We then categorize 3DGS applications into segmentation, editing, generation, and other functional tasks. For each, we summarize representative methods, supervision strategies, and learning paradigms, highlighting shared design principles and emerging trends. Commonly used datasets and evaluation protocols are also summarized, along with comparative analyses of recent methods across public benchmarks.\n\n3D Gaussian Splatting（3DGS）作为一种新兴的3D场景表示方法，近年来迅速发展，成为神经辐射场（NeRF）的有力替代方案。3DGS不仅具备高保真、逼真的渲染能力，还能实现实时性能。除了新视角合成外，3DGS结构显式且紧凑的特性，使其具备良好的几何与语义理解能力，从而广泛应用于各类下游任务中。本综述系统梳理了近年来3DGS在各类应用中的研究进展。我们首先介绍了支持3DGS语义理解与可控生成的2D基础模型，随后回顾了与3DGS发展密切相关的NeRF方法。接着，我们将3DGS的应用划分为分割、编辑、生成及其他功能性任务，并围绕每类任务，总结了代表性方法、监督策略和学习范式，归纳了其中的通用设计原则与新兴趋势。此外，我们还汇总了常用数据集与评估协议，并对近期方法在公开基准上的表现进行了对比分析。\n"
  },
  {
    "path": "abs/2508.10227.md",
    "content": "### EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting\n\nAs an emerging novel view synthesis approach, 3D Gaussian Splatting (3DGS) demonstrates fast training/rendering with superior visual quality. The two tasks of 3DGS, Gaussian creation and view rendering, are typically separated over time or devices, and thus storage/transmission and finally compression of 3DGS Gaussians become necessary. We begin with a correlation and statistical analysis of 3DGS Gaussian attributes. An inspiring finding in this work reveals that spherical harmonic AC attributes precisely follow Laplace distributions, while mixtures of Gaussian distributions can approximate rotation, scaling, and opacity. Additionally, harmonic AC attributes manifest weak correlations with other attributes except for inherited correlations from a color space. A factorized and parameterized entropy coding method, EntropyGS, is hereinafter proposed. During encoding, distribution parameters of each Gaussian attribute are estimated to assist their entropy coding. The quantization for entropy coding is adaptively performed according to Gaussian attribute types. EntropyGS demonstrates about 30x rate reduction on benchmark datasets while maintaining similar rendering quality compared to input 3DGS data, with a fast encoding and decoding time.\n\n作为一种新兴的新颖视图合成方法，三维高斯泼溅（3DGS）展现出快速训练/渲染和优异视觉质量的优势。3DGS 的两个任务——高斯生成与视图渲染——通常在时间或设备上分离，因此 3DGS 高斯的存储、传输及最终压缩就变得必要。我们首先对 3DGS 高斯属性进行了相关性与统计分析。本研究的一个重要发现是：球谐 AC 属性严格服从拉普拉斯分布，而高斯分布混合可以近似旋转、缩放和不透明度。此外，球谐 AC 属性与其他属性的相关性较弱，除非继承自颜色空间的相关性。基于此，我们提出了一种分解化、参数化的熵编码方法——EntropyGS。在编码过程中，对每个高斯属性的分布参数进行估计，以辅助其熵编码。熵编码的量化过程根据高斯属性类型自适应执行。实验结果表明，EntropyGS 在基准数据集上实现了约 30 倍的码率压缩，同时保持与输入 3DGS 数据相当的渲染质量，并具备快速的编码与解码性能。\n"
  },
  {
    "path": "abs/2508.10507.md",
    "content": "### Multi-Sample Anti-Aliasing and Constrained Optimization for 3D Gaussian Splatting\n\nRecent advances in 3D Gaussian splatting have significantly improved real-time novel view synthesis, yet insufficient geometric constraints during scene optimization often result in blurred reconstructions of fine-grained details, particularly in regions with high-frequency textures and sharp discontinuities. To address this, we propose a comprehensive optimization framework integrating multisample anti-aliasing (MSAA) with dual geometric constraints. Our system computes pixel colors through adaptive blending of quadruple subsamples, effectively reducing aliasing artifacts in high-frequency components. The framework introduces two constraints: (a) an adaptive weighting strategy that prioritizes under-reconstructed regions through dynamic gradient analysis, and (b) gradient differential constraints enforcing geometric regularization at object boundaries. This targeted optimization enables the model to allocate computational resources preferentially to critical regions requiring refinement while maintaining global consistency. Extensive experimental evaluations across multiple benchmarks demonstrate that our method achieves state-of-the-art performance in detail preservation, particularly in preserving high-frequency textures and sharp discontinuities, while maintaining real-time rendering efficiency. Quantitative metrics and perceptual studies confirm statistically significant improvements over baseline approaches in both structural similarity (SSIM) and perceptual quality (LPIPS).\n\n近年来，三维高斯泼溅在实时新颖视图合成方面取得了显著进展，但在场景优化过程中，由于几何约束不足，往往导致细粒度细节的模糊重建，特别是在高频纹理和锐利不连续区域。为解决这一问题，我们提出了一个结合多重采样抗锯齿（MSAA）和双重几何约束的综合优化框架。我们的系统通过自适应融合四重子采样来计算像素颜色，有效减少高频成分中的锯齿伪影。该框架引入了两个约束：（a）基于动态梯度分析的自适应加权策略，用于优先优化欠重建区域；（b）基于梯度差分的约束，在物体边界处强制几何正则化。这种有针对性的优化使模型能够优先将计算资源分配给需要精细化的关键区域，同时保持整体一致性。在多个基准数据集上的广泛实验评估表明，我们的方法在细节保持方面达到了最新水平，尤其是在高频纹理和锐利不连续性的保留上，同时维持了实时渲染效率。定量指标和感知研究均证实，相比基线方法，我们在结构相似性（SSIM）和感知质量（LPIPS）方面均取得了统计显著的提升。\n"
  },
  {
    "path": "abs/2508.10936.md",
    "content": "### Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction\n\nCollaborative perception enables connected vehicles to share information, overcoming occlusions and extending the limited sensing range inherent in single-agent (non-collaborative) systems. Existing vision-only methods for 3D semantic occupancy prediction commonly rely on dense 3D voxels, which incur high communication costs, or 2D planar features, which require accurate depth estimation or additional supervision, limiting their applicability to collaborative scenarios. To address these challenges, we propose the first approach leveraging sparse 3D semantic Gaussian splatting for collaborative 3D semantic occupancy prediction. By sharing and fusing intermediate Gaussian primitives, our method provides three benefits: a neighborhood-based cross-agent fusion that removes duplicates and suppresses noisy or inconsistent Gaussians; a joint encoding of geometry and semantics in each primitive, which reduces reliance on depth supervision and allows simple rigid alignment; and sparse, object-centric messages that preserve structural information while reducing communication volume. Extensive experiments demonstrate that our approach outperforms single-agent perception and baseline collaborative methods by +8.42 and +3.28 points in mIoU, and +5.11 and +22.41 points in IoU, respectively. When further reducing the number of transmitted Gaussians, our method still achieves a +1.9 improvement in mIoU, using only 34.6% communication volume, highlighting robust performance under limited communication budgets.\n\n协同感知使联网车辆能够共享信息，从而克服遮挡并扩展单智能体（非协同）系统固有的有限感知范围。现有的仅基于视觉的三维语义占据预测方法通常依赖稠密的三维体素，带来高昂的通信开销；或者依赖二维平面特征，需要精确的深度估计或额外的监督，从而限制了其在协同场景中的适用性。为应对这些挑战，我们提出了首个利用稀疏三维语义高斯泼溅的协同三维语义占据预测方法。通过共享与融合中间高斯基元，我们的方法带来三方面优势：一是基于邻域的跨智能体融合，去除冗余并抑制噪声或不一致的高斯；二是在每个基元中联合编码几何与语义信息，从而减少对深度监督的依赖，并支持简单的刚性对齐；三是稀疏、面向对象的消息传递，既能保留结构信息，又能降低通信量。大量实验证明，我们的方法在性能上优于单智能体感知和基线协同方法，mIoU 分别提升 +8.42 和 +3.28，IoU 分别提升 +5.11 和 +22.41。当进一步减少传输的高斯数量时，我们的方法在仅使用 34.6% 通信量的情况下，仍能实现 +1.9 的 mIoU 提升，展现出在有限通信预算下的稳健性能。\n"
  },
  {
    "path": "abs/2508.11854.md",
    "content": "### ComplicitSplat: Downstream Models are Vulnerable to Blackbox Attacks by 3D Gaussian Splat Camouflages\n\nAs 3D Gaussian Splatting (3DGS) gains rapid adoption in safety-critical tasks for efficient novel-view synthesis from static images, how might an adversary tamper images to cause harm? We introduce ComplicitSplat, the first attack that exploits standard 3DGS shading methods to create viewpoint-specific camouflage - colors and textures that change with viewing angle - to embed adversarial content in scene objects that are visible only from specific viewpoints and without requiring access to model architecture or weights. Our extensive experiments show that ComplicitSplat generalizes to successfully attack a variety of popular detector - both single-stage, multi-stage, and transformer-based models on both real-world capture of physical objects and synthetic scenes. To our knowledge, this is the first black-box attack on downstream object detectors using 3DGS, exposing a novel safety risk for applications like autonomous navigation and other mission-critical robotic systems.\n\n随着三维高斯泼溅（3DGS）在静态图像高效新视角合成等安全关键任务中的快速应用，一个对手可能如何篡改图像以造成危害？我们提出了 **ComplicitSplat**，这是首个利用标准 3DGS 着色方法的攻击，通过制造视角特定的伪装——随观察角度变化的颜色与纹理——将对抗性内容嵌入场景对象中，使其仅在特定视角下可见，并且无需访问模型架构或权重。我们的大量实验表明，ComplicitSplat 能够泛化到多种主流检测器的攻击，包括单阶段、多阶段以及基于 Transformer 的模型，无论是在真实物体捕获场景还是合成场景中都能成功攻击。据我们所知，这是首个针对 3DGS 的下游目标检测器的黑箱攻击，揭示了自主导航及其他任务关键型机器人系统中的一种新型安全风险。\n"
  },
  {
    "path": "abs/2508.12015.md",
    "content": "### InstDrive: Instance-Aware 3D Gaussian Splatting for Driving Scenes\n\nReconstructing dynamic driving scenes from dashcam videos has attracted increasing attention due to its significance in autonomous driving and scene understanding. While recent advances have made impressive progress, most methods still unify all background elements into a single representation, hindering both instance-level understanding and flexible scene editing. Some approaches attempt to lift 2D segmentation into 3D space, but often rely on pre-processed instance IDs or complex pipelines to map continuous features to discrete identities. Moreover, these methods are typically designed for indoor scenes with rich viewpoints, making them less applicable to outdoor driving scenarios. In this paper, we present InstDrive, an instance-aware 3D Gaussian Splatting framework tailored for the interactive reconstruction of dynamic driving scene. We use masks generated by SAM as pseudo ground-truth to guide 2D feature learning via contrastive loss and pseudo-supervised objectives. At the 3D level, we introduce regularization to implicitly encode instance identities and enforce consistency through a voxel-based loss. A lightweight static codebook further bridges continuous features and discrete identities without requiring data pre-processing or complex optimization. Quantitative and qualitative experiments demonstrate the effectiveness of InstDrive, and to the best of our knowledge, it is the first framework to achieve 3D instance segmentation in dynamic, open-world driving.\n\n从行车记录仪视频中重建动态驾驶场景因其在自动驾驶与场景理解中的重要性而受到越来越多的关注。尽管近年来取得了显著进展，大多数方法仍然将所有背景元素统一为单一表示，这阻碍了实例级理解与灵活的场景编辑。一些方法尝试将二维分割提升到三维空间，但往往依赖预处理的实例 ID 或复杂的流程来将连续特征映射为离散身份。此外，这些方法通常针对具有丰富视角的室内场景设计，因而在室外驾驶场景中的适用性较差。本文提出 **InstDrive**，一种面向动态驾驶场景交互式重建的实例感知三维高斯泼溅框架。我们使用由 SAM 生成的掩码作为伪真值，通过对比损失与伪监督目标引导二维特征学习。在三维层面，我们引入正则化以隐式编码实例身份，并通过基于体素的损失来强化一致性。一个轻量化的静态码本进一步桥接连续特征与离散身份，无需数据预处理或复杂优化。定量与定性实验结果表明了 InstDrive 的有效性，据我们所知，这是首个在动态开放驾驶场景中实现三维实例分割的框架。\n"
  },
  {
    "path": "abs/2508.12313.md",
    "content": "### Improving Densification in 3D Gaussian Splatting for High-Fidelity Rendering\n\nAlthough 3D Gaussian Splatting (3DGS) has achieved impressive performance in real-time rendering, its densification strategy often results in suboptimal reconstruction quality. In this work, we present a comprehensive improvement to the densification pipeline of 3DGS from three perspectives: when to densify, how to densify, and how to mitigate overfitting. Specifically, we propose an Edge-Aware Score to effectively select candidate Gaussians for splitting. We further introduce a Long-Axis Split strategy that reduces geometric distortions introduced by clone and split operations. To address overfitting, we design a set of techniques, including Recovery-Aware Pruning, Multi-step Update, and Growth Control. Our method enhances rendering fidelity without introducing additional training or inference overhead, achieving state-of-the-art performance with fewer Gaussians.\n\n尽管三维高斯泼溅（3DGS）在实时渲染中已取得令人瞩目的表现，其加密策略往往导致次优的重建质量。在本工作中，我们从三个方面对 3DGS 的加密流程进行了全面改进：何时加密、如何加密，以及如何缓解过拟合。具体而言，我们提出了一种 **边缘感知评分（Edge-Aware Score）**，用于高效选择待分裂的候选高斯。我们进一步引入 **长轴分裂（Long-Axis Split）** 策略，以减少克隆与分裂操作带来的几何畸变。为应对过拟合，我们设计了一系列技术，包括 **恢复感知剪枝（Recovery-Aware Pruning）**、**多步更新（Multi-step Update）** 与 **增长控制（Growth Control）**。该方法在不增加额外训练或推理开销的情况下提升了渲染保真度，并以更少的高斯实现了当前最优性能。\n"
  },
  {
    "path": "abs/2508.12415.md",
    "content": "### TiP4GEN: Text to Immersive Panorama 4D Scene Generation\n\nWith the rapid advancement and widespread adoption of VR/AR technologies, there is a growing demand for the creation of high-quality, immersive dynamic scenes. However, existing generation works predominantly concentrate on the creation of static scenes or narrow perspective-view dynamic scenes, falling short of delivering a truly 360-degree immersive experience from any viewpoint. In this paper, we introduce TiP4GEN, an advanced text-to-dynamic panorama scene generation framework that enables fine-grained content control and synthesizes motion-rich, geometry-consistent panoramic 4D scenes. TiP4GEN integrates panorama video generation and dynamic scene reconstruction to create 360-degree immersive virtual environments. For video generation, we introduce a Dual-branch Generation Model consisting of a panorama branch and a perspective branch, responsible for global and local view generation, respectively. A bidirectional cross-attention mechanism facilitates comprehensive information exchange between the branches. For scene reconstruction, we propose a Geometry-aligned Reconstruction Model based on 3D Gaussian Splatting. By aligning spatial-temporal point clouds using metric depth maps and initializing scene cameras with estimated poses, our method ensures geometric consistency and temporal coherence for the reconstructed scenes. Extensive experiments demonstrate the effectiveness of our proposed designs and the superiority of TiP4GEN in generating visually compelling and motion-coherent dynamic panoramic scenes.\n\n随着 VR/AR 技术的快速发展和广泛应用，对高质量、沉浸式动态场景的需求日益增长。然而，现有的生成工作主要集中于静态场景或窄视角的动态场景，难以实现从任意视角提供真正 360 度沉浸式体验。本文提出了 **TiP4GEN**，一种先进的文本驱动动态全景场景生成框架，能够实现细粒度的内容控制，并合成富含运动、几何一致的全景 4D 场景。TiP4GEN 将全景视频生成与动态场景重建相结合，创建 360 度沉浸式虚拟环境。在视频生成部分，我们引入了 **双分支生成模型**，由全景分支和透视分支组成，分别负责全局和局部视图的生成，双向交叉注意力机制促进了分支间的全面信息交互。在场景重建部分，我们提出了基于三维高斯泼溅的 **几何对齐重建模型**。通过使用度量深度图对时空点云进行对齐，并以估计的相机位姿初始化场景相机，我们的方法确保了重建场景的几何一致性和时间连贯性。大量实验证明了所提设计的有效性，并展示了 TiP4GEN 在生成视觉效果出色、运动连贯的动态全景场景方面的优越性。\n"
  },
  {
    "path": "abs/2508.12615.md",
    "content": "### WIPES: Wavelet-based Visual Primitives\n\nPursuing a continuous visual representation that offers flexible frequency modulation and fast rendering speed has recently garnered increasing attention in the fields of 3D vision and graphics. However, existing representations often rely on frequency guidance or complex neural network decoding, leading to spectrum loss or slow rendering. To address these limitations, we propose WIPES, a universal Wavelet-based vIsual PrimitivES for representing multi-dimensional visual signals. Building on the spatial-frequency localization advantages of wavelets, WIPES effectively captures both the low-frequency \"forest\" and the high-frequency \"trees.\" Additionally, we develop a wavelet-based differentiable rasterizer to achieve fast visual rendering. Experimental results on various visual tasks, including 2D image representation, 5D static and 6D dynamic novel view synthesis, demonstrate that WIPES, as a visual primitive, offers higher rendering quality and faster inference than INR-based methods, and outperforms Gaussian-based representations in rendering quality.\n\n在三维视觉与图形学领域，追求既能灵活调控频率又能实现高速渲染的连续视觉表示正引起越来越多的关注。然而，现有表示往往依赖频率引导或复杂的神经网络解码，导致频谱丢失或渲染速度缓慢。为克服这些限制，我们提出了 WIPES，这是一种通用的基于小波的多维视觉信号表示方法。利用小波在空间-频率局部化上的优势，WIPES 能够有效捕捉低频的“整体”（forest）和高频的“细节”（trees）。此外，我们还开发了一个基于小波的可微光栅化器，实现快速的视觉渲染。在多种视觉任务上（包括二维图像表示、五维静态与六维动态新视角合成）的实验结果表明，作为一种视觉基元，WIPES 比基于 INR 的方法具有更高的渲染质量和更快的推理速度，并且在渲染质量上优于基于高斯的表示方法。\n"
  },
  {
    "path": "abs/2508.12720.md",
    "content": "### Quantifying and Alleviating Co-Adaptation in Sparse-View 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has demonstrated impressive performance in novel view synthesis under dense-view settings. However, in sparse-view scenarios, despite the realistic renderings in training views, 3DGS occasionally manifests appearance artifacts in novel views. This paper investigates the appearance artifacts in sparse-view 3DGS and uncovers a core limitation of current approaches: the optimized Gaussians are overly-entangled with one another to aggressively fit the training views, which leads to a neglect of the real appearance distribution of the underlying scene and results in appearance artifacts in novel views. The analysis is based on a proposed metric, termed Co-Adaptation Score (CA), which quantifies the entanglement among Gaussians, i.e., co-adaptation, by computing the pixel-wise variance across multiple renderings of the same viewpoint, with different random subsets of Gaussians. The analysis reveals that the degree of co-adaptation is naturally alleviated as the number of training views increases. Based on the analysis, we propose two lightweight strategies to explicitly mitigate the co-adaptation in sparse-view 3DGS: (1) random gaussian dropout; (2) multiplicative noise injection to the opacity. Both strategies are designed to be plug-and-play, and their effectiveness is validated across various methods and benchmarks. We hope that our insights into the co-adaptation effect will inspire the community to achieve a more comprehensive understanding of sparse-view 3DGS.\n\n三维高斯泼溅（3DGS）在密集视角条件下的新视角合成中展现了令人印象深刻的性能。然而，在稀疏视角场景中，尽管在训练视角渲染效果逼真，3DGS 在新视角中偶尔会出现外观伪影。本文研究了稀疏视角 3DGS 中的外观伪影，并揭示了当前方法的一个核心局限：优化得到的高斯之间过度耦合，过于激进地拟合训练视角，导致忽略了场景真实的外观分布，从而在新视角中产生伪影。我们的分析基于一个提出的指标——**协同适应分数（Co-Adaptation Score, CA）**，该指标通过在相同视角下多次渲染并使用不同随机子集的高斯，计算像素级方差来量化高斯之间的耦合程度。分析结果表明，随着训练视角数量的增加，协同适应程度自然减轻。基于此分析，我们提出了两种轻量化策略来显式缓解稀疏视角 3DGS 中的协同适应：(1) 随机高斯丢弃；(2) 对不透明度注入乘性噪声。这两种策略均为即插即用，并在多种方法与基准上验证了其有效性。我们希望对协同适应效应的洞察能够激发社区对稀疏视角 3DGS 的更全面理解。\n"
  },
  {
    "path": "abs/2508.13043.md",
    "content": "### IntelliCap: Intelligent Guidance for Consistent View Sampling\n\nNovel view synthesis from images, for example, with 3D Gaussian splatting, has made great progress. Rendering fidelity and speed are now ready even for demanding virtual reality applications. However, the problem of assisting humans in collecting the input images for these rendering algorithms has received much less attention. High-quality view synthesis requires uniform and dense view sampling. Unfortunately, these requirements are not easily addressed by human camera operators, who are in a hurry, impatient, or lack understanding of the scene structure and the photographic process. Existing approaches to guide humans during image acquisition concentrate on single objects or neglect view-dependent material characteristics. We propose a novel situated visualization technique for scanning at multiple scales. During the scanning of a scene, our method identifies important objects that need extended image coverage to properly represent view-dependent appearance. To this end, we leverage semantic segmentation and category identification, ranked by a vision-language model. Spherical proxies are generated around highly ranked objects to guide the user during scanning. Our results show superior performance in real scenes compared to conventional view sampling strategies.\n\n基于图像的新视角合成（例如使用三维高斯泼溅）已取得显著进展。其渲染保真度和速度甚至已能满足高要求的虚拟现实应用。然而，如何辅助人类采集用于这些渲染算法的输入图像这一问题却鲜有关注。高质量的新视角合成需要均匀且密集的视角采样。不幸的是，这些要求对匆忙、缺乏耐心或不了解场景结构和拍摄过程的人类摄影者来说难以满足。现有的人类采集引导方法多集中于单个物体，或忽略了与视角相关的材质特性。我们提出了一种新颖的多尺度扫描情境可视化技术。在场景扫描过程中，我们的方法能识别需要扩展图像覆盖的重要物体，以正确表达视角相关的外观特性。为此，我们利用语义分割和类别识别，并通过视觉-语言模型进行排序。在排序靠前的物体周围生成球形代理，以引导用户完成扫描。实验结果表明，与传统视角采样策略相比，我们的方法在真实场景中表现更为优越。\n"
  },
  {
    "path": "abs/2508.13153.md",
    "content": "### IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion\n\nReconstructing complete and interactive 3D scenes remains a fundamental challenge in computer vision and robotics, particularly due to persistent object occlusions and limited sensor coverage. Multiview observations from a single scene scan often fail to capture the full structural details. Existing approaches typically rely on multi stage pipelines, such as segmentation, background completion, and inpainting or require per-object dense scanning, both of which are error-prone, and not easily scalable. We propose IGFuse, a novel framework that reconstructs interactive Gaussian scene by fusing observations from multiple scans, where natural object rearrangement between captures reveal previously occluded regions. Our method constructs segmentation aware Gaussian fields and enforces bi-directional photometric and semantic consistency across scans. To handle spatial misalignments, we introduce a pseudo-intermediate scene state for unified alignment, alongside collaborative co-pruning strategies to refine geometry. IGFuse enables high fidelity rendering and object level scene manipulation without dense observations or complex pipelines. Extensive experiments validate the framework's strong generalization to novel scene configurations, demonstrating its effectiveness for real world 3D reconstruction and real-to-simulation transfer.\n\n重建完整且可交互的三维场景仍然是计算机视觉和机器人学中的一项基础性挑战，主要原因在于持续存在的物体遮挡和有限的传感器覆盖。单次场景扫描的多视角观测往往难以捕获完整的结构细节。现有方法通常依赖多阶段流水线，如分割、背景补全和修复，或要求对每个物体进行密集扫描，这两种方案都容易出错且难以扩展。我们提出了 IGFuse，一种新颖的框架，通过融合多次扫描的观测来重建可交互的高斯场景，其中自然的物体重排揭示了先前被遮挡的区域。我们的方法构建了具备分割感知的高斯场，并在多次扫描之间施加双向光度和语义一致性约束。为处理空间错位，我们引入了伪中间场景状态以实现统一对齐，并结合协同联合裁剪策略来优化几何。IGFuse 在无需密集观测或复杂流水线的情况下，实现了高保真渲染和物体级场景操控。大量实验验证了该框架在新场景配置上的强泛化能力，展示了其在真实三维重建和真实到仿真迁移中的有效性。\n"
  },
  {
    "path": "abs/2508.13287.md",
    "content": "### InnerGS: Internal Scenes Rendering via Factorized 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has recently gained popularity for efficient scene rendering by representing scenes as explicit sets of anisotropic 3D Gaussians. However, most existing work focuses primarily on modeling external surfaces. In this work, we target the reconstruction of internal scenes, which is crucial for applications that require a deep understanding of an object's interior. By directly modeling a continuous volumetric density through the inner 3D Gaussian distribution, our model effectively reconstructs smooth and detailed internal structures from sparse sliced data. Our approach eliminates the need for camera poses, is plug-and-play, and is inherently compatible with any data modalities.\n\n三维高斯喷溅（3DGS）因其通过显式的各向异性三维高斯集合表示场景，从而实现高效场景渲染，近年来广受关注。然而，大多数现有研究主要集中于外部表面的建模。本文针对内部场景的重建，这对于需要深入理解物体内部结构的应用至关重要。我们通过内部三维高斯分布直接建模连续体积密度，从稀疏切片数据中有效重建平滑且细致的内部结构。我们的方法无需相机位姿，具备即插即用特性，并且天然兼容任何数据模态。\n"
  },
  {
    "path": "abs/2508.13537.md",
    "content": "### EAvatar: Expression-Aware Head Avatar Reconstruction with Generative Geometry Priors\n\nHigh-fidelity head avatar reconstruction plays a crucial role in AR/VR, gaming, and multimedia content creation. Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated effectiveness in modeling complex geometry with real-time rendering capability and are now widely used in high-fidelity head avatar reconstruction tasks. However, existing 3DGS-based methods still face significant challenges in capturing fine-grained facial expressions and preserving local texture continuity, especially in highly deformable regions. To mitigate these limitations, we propose a novel 3DGS-based framework termed EAvatar for head reconstruction that is both expression-aware and deformation-aware. Our method introduces a sparse expression control mechanism, where a small number of key Gaussians are used to influence the deformation of their neighboring Gaussians, enabling accurate modeling of local deformations and fine-scale texture transitions. Furthermore, we leverage high-quality 3D priors from pretrained generative models to provide a more reliable facial geometry, offering structural guidance that improves convergence stability and shape accuracy during training. Experimental results demonstrate that our method produces more accurate and visually coherent head reconstructions with improved expression controllability and detail fidelity.\n\n高保真头部头像重建在 AR/VR、游戏和多媒体内容创作中发挥着关键作用。近年来，三维高斯喷溅（3DGS）在建模复杂几何结构并实现实时渲染方面展现了优越性，并被广泛应用于高保真头部头像重建任务。然而，现有基于 3DGS 的方法在捕捉细粒度面部表情和保持局部纹理连续性方面仍面临显著挑战，尤其是在高度可变形区域。为缓解这些问题，我们提出了一种新颖的基于 3DGS 的头部重建框架 EAvatar，该框架同时具备表情感知和形变感知能力。我们的方法引入了稀疏表情控制机制，利用少量关键高斯影响其邻域高斯的形变，从而实现局部形变和细粒度纹理过渡的精确建模。此外，我们利用预训练生成模型提供的高质量三维先验，得到更可靠的面部几何，为训练过程提供结构性指导，从而提高收敛稳定性和形状精度。实验结果表明，我们的方法能够生成更准确且视觉一致的头部重建结果，并显著提升表情可控性和细节保真度。\n"
  },
  {
    "path": "abs/2508.13911.md",
    "content": "### PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis\n\nWhile physics-grounded 3D motion synthesis has seen significant progress, current methods face critical limitations. They typically rely on pre-reconstructed 3D Gaussian Splatting (3DGS) representations, while physics integration depends on either inflexible, manually defined physical attributes or unstable, optimization-heavy guidance from video models. To overcome these challenges, we introduce PhysGM, a feed-forward framework that jointly predicts a 3D Gaussian representation and its physical properties from a single image, enabling immediate, physical simulation and high-fidelity 4D rendering. We first establish a base model by jointly optimizing for Gaussian reconstruction and probabilistic physics prediction. The model is then refined with physically plausible reference videos to enhance both rendering fidelity and physics prediction accuracy. We adopt the Direct Preference Optimization (DPO) to align its simulations with reference videos, circumventing Score Distillation Sampling (SDS) optimization which needs back-propagating gradients through the complex differentiable simulation and rasterization. To facilitate the training, we introduce a new dataset PhysAssets of over 24,000 3D assets, annotated with physical properties and corresponding guiding videos. Experimental results demonstrate that our method effectively generates high-fidelity 4D simulations from a single image in one minute. This represents a significant speedup over prior works while delivering realistic rendering results.\n\n尽管基于物理的三维运动合成已取得显著进展，但现有方法仍存在关键局限。它们通常依赖于预先重建的三维高斯喷溅（3DGS）表示，而物理建模则依靠僵化的手动物理属性定义或依赖视频模型的、不稳定且优化开销巨大的引导。为克服这些挑战，我们提出了 PhysGM，一种前馈式框架，可从单张图像联合预测三维高斯表示及其物理属性，从而实现即时物理仿真和高保真四维渲染。我们首先通过联合优化高斯重建和概率物理预测来建立基础模型，随后利用物理合理的参考视频对模型进行精炼，以提升渲染保真度和物理预测精度。我们采用直接偏好优化（DPO）来将仿真结果与参考视频对齐，从而绕过需要通过复杂可微仿真和光栅化反向传播梯度的得分蒸馏采样（SDS）优化过程。为支持训练，我们引入了一个新的数据集 PhysAssets，包含超过 24,000 个三维资产，并附带物理属性标注和对应的引导视频。实验结果表明，我们的方法能够在一分钟内从单张图像生成高保真的四维仿真，相比以往工作大幅加速，并实现了逼真的渲染效果。\n"
  },
  {
    "path": "abs/2508.14014.md",
    "content": "### Online 3D Gaussian Splatting Modeling with Novel View Selection\n\nThis study addresses the challenge of generating online 3D Gaussian Splatting (3DGS) models from RGB-only frames. Previous studies have employed dense SLAM techniques to estimate 3D scenes from keyframes for 3DGS model construction. However, these methods are limited by their reliance solely on keyframes, which are insufficient to capture an entire scene, resulting in incomplete reconstructions. Moreover, building a generalizable model requires incorporating frames from diverse viewpoints to achieve broader scene coverage. However, online processing restricts the use of many frames or extensive training iterations. Therefore, we propose a novel method for high-quality 3DGS modeling that improves model completeness through adaptive view selection. By analyzing reconstruction quality online, our approach selects optimal non-keyframes for additional training. By integrating both keyframes and selected non-keyframes, the method refines incomplete regions from diverse viewpoints, significantly enhancing completeness. We also present a framework that incorporates an online multi-view stereo approach, ensuring consistency in 3D information throughout the 3DGS modeling process. Experimental results demonstrate that our method outperforms state-of-the-art methods, delivering exceptional performance in complex outdoor scenes.\n\n本研究针对仅使用 RGB 帧在线生成三维高斯喷溅（3DGS）模型的挑战展开探讨。以往研究多采用稠密 SLAM 技术从关键帧估计三维场景，用于 3DGS 模型构建。然而，这些方法仅依赖关键帧，无法完整覆盖整个场景，导致重建结果不完整。此外，为了构建具有泛化能力的模型，需要引入来自多视角的帧以实现更广的场景覆盖，但在线处理受到帧数量和训练迭代次数的限制。为此，我们提出了一种新方法，通过自适应视角选择提升 3DGS 建模的完整性。该方法在在线过程中分析重建质量，选取最优的非关键帧用于额外训练，并结合关键帧与所选非关键帧，从多视角修复不完整区域，从而显著提升重建完整度。我们还提出了一个结合在线多视图立体的框架，以确保 3DGS 建模过程中三维信息的一致性。实验结果表明，我们的方法优于现有的最新方法，在复杂的户外场景中表现出色。\n"
  },
  {
    "path": "abs/2508.14037.md",
    "content": "### Distilled-3DGS:Distilled 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has exhibited remarkable efficacy in novel view synthesis (NVS). However, it suffers from a significant drawback: achieving high-fidelity rendering typically necessitates a large number of 3D Gaussians, resulting in substantial memory consumption and storage requirements. To address this challenge, we propose the first knowledge distillation framework for 3DGS, featuring various teacher models, including vanilla 3DGS, noise-augmented variants, and dropout-regularized versions. The outputs of these teachers are aggregated to guide the optimization of a lightweight student model. To distill the hidden geometric structure, we propose a structural similarity loss to boost the consistency of spatial geometric distributions between the student and teacher model. Through comprehensive quantitative and qualitative evaluations across diverse datasets, the proposed Distilled-3DGS, a simple yet effective framework without bells and whistles, achieves promising rendering results in both rendering quality and storage efficiency compared to state-of-the-art methods.\n\n三维高斯喷溅（3DGS）在新视角合成（NVS）中展现了卓越的效果。然而，它存在一个显著缺点：要实现高保真渲染通常需要大量的三维高斯，从而导致巨大的内存占用和存储需求。为解决这一问题，我们提出了首个用于 3DGS 的知识蒸馏框架，该框架包含多种教师模型，包括原始 3DGS、噪声增强变体和 Dropout 正则化版本。教师模型的输出被聚合，用以引导轻量级学生模型的优化。为了提炼隐藏的几何结构，我们提出了一种结构相似性损失，以增强学生模型与教师模型在空间几何分布上的一致性。通过在多种数据集上的全面定量与定性评估，所提出的 Distilled-3DGS 框架尽管简单但高效，在渲染质量和存储效率方面均优于现有最先进方法，取得了令人满意的渲染结果。\n"
  },
  {
    "path": "abs/2508.14041.md",
    "content": "### LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos\n\nLongSplat addresses critical challenges in novel view synthesis (NVS) from casually captured long videos characterized by irregular camera motion, unknown camera poses, and expansive scenes. Current methods often suffer from pose drift, inaccurate geometry initialization, and severe memory limitations. To address these issues, we introduce LongSplat, a robust unposed 3D Gaussian Splatting framework featuring: (1) Incremental Joint Optimization that concurrently optimizes camera poses and 3D Gaussians to avoid local minima and ensure global consistency; (2) a robust Pose Estimation Module leveraging learned 3D priors; and (3) an efficient Octree Anchor Formation mechanism that converts dense point clouds into anchors based on spatial density. Extensive experiments on challenging benchmarks demonstrate that LongSplat achieves state-of-the-art results, substantially improving rendering quality, pose accuracy, and computational efficiency compared to prior approaches.\n\nLongSplat 针对从随意捕获的长视频中进行新视角合成（NVS）时面临的关键挑战，包括不规则的相机运动、未知的相机位姿以及大范围场景。现有方法常常遭遇位姿漂移、几何初始化不准确以及严重的内存限制等问题。为解决这些问题，我们提出了 LongSplat，一种鲁棒的无位姿三维高斯喷溅框架，主要包括三项创新：(1) 增量式联合优化，同时优化相机位姿和三维高斯，以避免局部最优并保证全局一致性；(2) 利用学习到的三维先验的鲁棒位姿估计模块；(3) 高效的八叉树锚点生成机制，将稠密点云基于空间密度转换为锚点。在具有挑战性的基准上进行的大量实验表明，LongSplat 在渲染质量、位姿精度和计算效率方面均显著优于现有方法，达到了当前最先进水平。\n"
  },
  {
    "path": "abs/2508.14278.md",
    "content": "### GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting\n\n3D scene reconstruction and understanding have gained increasing popularity, yet existing methods still struggle to capture fine-grained, language-aware 3D representations from 2D images. In this paper, we present GALA, a novel framework for open-vocabulary 3D scene understanding with 3D Gaussian Splatting (3DGS). GALA distills a scene-specific 3D instance feature field via self-supervised contrastive learning. To extend to generalized language feature fields, we introduce the core contribution of GALA, a cross-attention module with two learnable codebooks that encode view-independent semantic embeddings. This design not only ensures intra-instance feature similarity but also supports seamless 2D and 3D open-vocabulary queries. It reduces memory consumption by avoiding per-Gaussian high-dimensional feature learning. Extensive experiments on real-world datasets demonstrate GALA's remarkable open-vocabulary performance on both 2D and 3D.\n\n三维场景重建与理解日益受到关注，但现有方法仍难以从二维图像中捕获细粒度、具备语言感知能力的三维表征。本文提出 GALA，一种基于三维高斯喷溅（3DGS）的开放词汇三维场景理解新框架。GALA 通过自监督对比学习蒸馏出场景特定的三维实例特征场。为了扩展到通用语言特征场，我们提出了 GALA 的核心贡献——一个带有两个可学习码本的交叉注意力模块，用于编码视角无关的语义嵌入。该设计不仅保证了实例内特征的相似性，还支持无缝的二维与三维开放词汇查询。同时，通过避免针对每个高斯学习高维特征，有效降低了内存消耗。在真实数据集上的大量实验表明，GALA 在二维和三维上的开放词汇性能表现出色。\n"
  },
  {
    "path": "abs/2508.14443.md",
    "content": "### Reconstruction Using the Invisible: Intuition from NIR and Metadata for Enhanced 3D Gaussian Splatting\n\nWhile 3D Gaussian Splatting (3DGS) has rapidly advanced, its application in agriculture remains underexplored. Agricultural scenes present unique challenges for 3D reconstruction methods, particularly due to uneven illumination, occlusions, and a limited field of view. To address these limitations, we introduce NIRPlant, a novel multimodal dataset encompassing Near-Infrared (NIR) imagery, RGB imagery, textual metadata, Depth, and LiDAR data collected under varied indoor and outdoor lighting conditions. By integrating NIR data, our approach enhances robustness and provides crucial botanical insights that extend beyond the visible spectrum. Additionally, we leverage text-based metadata derived from vegetation indices, such as NDVI, NDWI, and the chlorophyll index, which significantly enriches the contextual understanding of complex agricultural environments. To fully exploit these modalities, we propose NIRSplat, an effective multimodal Gaussian splatting architecture employing a cross-attention mechanism combined with 3D point-based positional encoding, providing robust geometric priors. Comprehensive experiments demonstrate that NIRSplat outperforms existing landmark methods, including 3DGS, CoR-GS, and InstantSplat, highlighting its effectiveness in challenging agricultural scenarios.\n\n尽管三维高斯喷溅（3DGS）近年来发展迅速，其在农业领域的应用仍然研究不足。农业场景对三维重建方法提出了独特挑战，尤其是由于光照不均、物体遮挡以及有限视场的影响。为解决这些问题，我们提出了**NIRPlant**，一个全新的多模态数据集，涵盖近红外（NIR）图像、RGB 图像、文本元数据、深度和激光雷达数据，并在多种室内外光照条件下采集。通过引入 NIR 数据，我们的方法提升了鲁棒性，并提供了超越可见光范围的关键植物学信息。此外，我们利用基于文本的元数据，这些元数据来源于植被指数，如 NDVI、NDWI 和叶绿素指数，从而显著丰富了对复杂农业环境的上下文理解。为了充分利用这些模态信息，我们提出了**NIRSplat**，一种高效的多模态高斯喷溅架构，结合交叉注意力机制与基于三维点的位置信息编码，以提供稳健的几何先验。全面实验结果表明，**NIRSplat** 在包括 3DGS、CoR-GS 和 InstantSplat 在内的现有方法中表现优异，展现出其在复杂农业场景中的有效性。\n"
  },
  {
    "path": "abs/2508.14449.md",
    "content": "### D^3-Talker: Dual-Branch Decoupled Deformation Fields for Few-Shot 3D Talking Head Synthesis\n\nA key challenge in 3D talking head synthesis lies in the reliance on a long-duration talking head video to train a new model for each target identity from scratch. Recent methods have attempted to address this issue by extracting general features from audio through pre-training models. However, since audio contains information irrelevant to lip motion, existing approaches typically struggle to map the given audio to realistic lip behaviors in the target face when trained on only a few frames, causing poor lip synchronization and talking head image quality. This paper proposes D^3-Talker, a novel approach that constructs a static 3D Gaussian attribute field and employs audio and Facial Motion signals to independently control two distinct Gaussian attribute deformation fields, effectively decoupling the predictions of general and personalized deformations. We design a novel similarity contrastive loss function during pre-training to achieve more thorough decoupling. Furthermore, we integrate a Coarse-to-Fine module to refine the rendered images, alleviating blurriness caused by head movements and enhancing overall image quality. Extensive experiments demonstrate that D^3-Talker outperforms state-of-the-art methods in both high-fidelity rendering and accurate audio-lip synchronization with limited training data.\n\n三维说话人头像合成的关键挑战在于需要依赖长时长的说话人视频，为每个目标身份从零开始训练新模型。近年来的一些方法尝试通过预训练模型从音频中提取通用特征来缓解这一问题。然而，由于音频中包含与唇部运动无关的信息，现有方法在仅用少量帧训练时，往往难以将给定音频映射到目标人脸的逼真唇部运动，导致唇形同步和头像图像质量较差。本文提出 D^3-Talker，一种新颖的方法，通过构建静态三维高斯属性场，并分别利用音频和面部运动信号独立控制两个不同的高斯属性形变场，从而有效解耦通用和个性化形变的预测。我们设计了一种新的相似性对比损失函数用于预训练，以实现更彻底的解耦。此外，我们集成了一个由粗到细的模块来优化渲染图像，缓解由头部运动引起的模糊，并提升整体图像质量。大量实验表明，D^3-Talker 在有限训练数据下实现了更高保真的渲染和更准确的音频-唇形同步，显著优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2508.14552.md",
    "content": "### From Slices to Structures: Unsupervised 3D Reconstruction of Female Pelvic Anatomy from Freehand Transvaginal Ultrasound\n\nVolumetric ultrasound has the potential to significantly improve diagnostic accuracy and clinical decision-making, yet its widespread adoption remains limited by dependence on specialized hardware and restrictive acquisition protocols. In this work, we present a novel unsupervised framework for reconstructing 3D anatomical structures from freehand 2D transvaginal ultrasound (TVS) sweeps, without requiring external tracking or learned pose estimators. Our method adapts the principles of Gaussian Splatting to the domain of ultrasound, introducing a slice-aware, differentiable rasterizer tailored to the unique physics and geometry of ultrasound imaging. We model anatomy as a collection of anisotropic 3D Gaussians and optimize their parameters directly from image-level supervision, leveraging sensorless probe motion estimation and domain-specific geometric priors. The result is a compact, flexible, and memory-efficient volumetric representation that captures anatomical detail with high spatial fidelity. This work demonstrates that accurate 3D reconstruction from 2D ultrasound images can be achieved through purely computational means, offering a scalable alternative to conventional 3D systems and enabling new opportunities for AI-assisted analysis and diagnosis.\n\n容积超声有望显著提升诊断准确性和临床决策能力，但其广泛应用仍受限于对专用硬件和严格采集协议的依赖。本文提出了一种全新的无监督框架，可从自由手持的二维经阴道超声（TVS）扫查重建三维解剖结构，无需外部追踪或学习型位姿估计器。我们的方法将高斯喷溅的原理引入超声成像领域，提出了一种切片感知的可微光栅化器，专门针对超声成像的物理特性和几何特征进行设计。我们将解剖结构建模为一组各向异性的三维高斯，并直接通过图像级监督优化其参数，同时利用无传感器探头运动估计和特定领域几何先验。最终得到一种紧凑、灵活且内存高效的体积表示，能够以高空间保真度捕获解剖细节。本研究表明，可以完全通过计算手段实现从二维超声图像到三维的准确重建，为传统三维超声系统提供了一种可扩展的替代方案，并为人工智能辅助分析和诊断创造了新的可能性。\n"
  },
  {
    "path": "abs/2508.14563.md",
    "content": "### GOGS: High-Fidelity Geometry and Relighting for Glossy Objects via Gaussian Surfels\n\nInverse rendering of glossy objects from RGB imagery remains fundamentally limited by inherent ambiguity. Although NeRF-based methods achieve high-fidelity reconstruction via dense-ray sampling, their computational cost is prohibitive. Recent 3D Gaussian Splatting achieves high reconstruction efficiency but exhibits limitations under specular reflections. Multi-view inconsistencies introduce high-frequency surface noise and structural artifacts, while simplified rendering equations obscure material properties, leading to implausible relighting results. To address these issues, we propose GOGS, a novel two-stage framework based on 2D Gaussian surfels. First, we establish robust surface reconstruction through physics-based rendering with split-sum approximation, enhanced by geometric priors from foundation models. Second, we perform material decomposition by leveraging Monte Carlo importance sampling of the full rendering equation, modeling indirect illumination via differentiable 2D Gaussian ray tracing and refining high-frequency specular details through spherical mipmap-based directional encoding that captures anisotropic highlights. Extensive experiments demonstrate state-of-the-art performance in geometry reconstruction, material separation, and photorealistic relighting under novel illuminations, outperforming existing inverse rendering approaches.\n\n基于 RGB 图像的光亮物体逆渲染受限于其固有的歧义性。尽管基于 NeRF 的方法通过密集光线采样实现了高保真重建，但其计算成本高昂。近年来的三维高斯喷溅（3DGS）实现了高效重建，但在镜面反射场景下表现不佳。多视图不一致会引入高频表面噪声和结构伪影，而简化的渲染方程掩盖了材质属性，导致不真实的重光照结果。为解决这些问题，我们提出了 GOGS，一种基于二维高斯表面元的双阶段框架。首先，我们通过基于物理的渲染与分裂和近似实现鲁棒的表面重建，并结合基础模型的几何先验进行增强。其次，我们利用全渲染方程的蒙特卡洛重要性采样进行材质分解，通过可微二维高斯光线追踪建模间接光照，并结合球形 mipmap 定向编码细化高频镜面细节以捕捉各向异性高光。大量实验表明，GOGS 在几何重建、材质分离和新光照条件下的逼真重光照方面实现了当前最先进的性能，优于现有逆渲染方法。\n"
  },
  {
    "path": "abs/2508.14682.md",
    "content": "### GeMS: Efficient Gaussian Splatting for Extreme Motion Blur\n\nWe introduce GeMS, a framework for 3D Gaussian Splatting (3DGS) designed to handle severely motion-blurred images. State-of-the-art deblurring methods for extreme blur, such as ExBluRF, as well as Gaussian Splatting-based approaches like Deblur-GS, typically assume access to sharp images for camera pose estimation and point cloud generation, an unrealistic assumption. Methods relying on COLMAP initialization, such as BAD-Gaussians, also fail due to unreliable feature correspondences under severe blur. To address these challenges, we propose GeMS, a 3DGS framework that reconstructs scenes directly from extremely blurred images. GeMS integrates: (1) VGGSfM, a deep learning-based Structure-from-Motion pipeline that estimates poses and generates point clouds directly from blurred inputs; (2) 3DGS-MCMC, which enables robust scene initialization by treating Gaussians as samples from a probability distribution, eliminating heuristic densification and pruning; and (3) joint optimization of camera trajectories and Gaussian parameters for stable reconstruction. While this pipeline produces strong results, inaccuracies may remain when all inputs are severely blurred. To mitigate this, we propose GeMS-E, which integrates a progressive refinement step using events: (4) Event-based Double Integral (EDI) deblurring restores sharper images that are then fed into GeMS, improving pose estimation, point cloud generation, and overall reconstruction. Both GeMS and GeMS-E achieve state-of-the-art performance on synthetic and real-world datasets. To our knowledge, this is the first framework to address extreme motion blur within 3DGS directly from severely blurred inputs.\n\n我们提出了 GeMS，一种专为处理严重运动模糊图像设计的三维高斯喷溅（3DGS）框架。现有针对极端模糊的去模糊方法，如 ExBluRF，以及基于高斯喷溅的方法，如 Deblur-GS，通常假设能够获取清晰图像以进行相机位姿估计和点云生成，这在实际中往往不切实际。依赖 COLMAP 初始化的方法（如 BAD-Gaussians）也会因严重模糊下特征匹配不可靠而失败。为解决这些问题，我们提出 GeMS，一个可直接从严重模糊图像重建场景的 3DGS 框架。GeMS 包含：(1) VGGSfM：一种基于深度学习的运动恢复结构（SfM）管线，可直接从模糊输入估计相机位姿并生成点云；(2) 3DGS-MCMC：通过将高斯视为来自概率分布的样本进行稳健的场景初始化，避免了启发式的密集化和剪枝操作；(3) 相机轨迹与高斯参数的联合优化，实现稳定重建。尽管该流程表现强劲，但在所有输入均严重模糊时仍可能存在误差。为此，我们提出 GeMS-E，通过事件相机进行渐进式精炼：(4) 事件驱动双积分（EDI）去模糊生成更清晰的图像，再输入 GeMS 以提升位姿估计、点云生成和整体重建质量。实验表明，GeMS 和 GeMS-E 在合成和真实数据集上均实现了当前最先进的性能。据我们所知，这是首个能够直接从严重模糊输入中解决 3DGS 极端运动模糊问题的框架。\n"
  },
  {
    "path": "abs/2508.14717.md",
    "content": "### GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting\n\nRecent developments in 3D Gaussian Splatting have significantly enhanced novel view synthesis, yet generating high-quality renderings from extreme novel viewpoints or partially observed regions remains challenging. Meanwhile, diffusion models exhibit strong generative capabilities, but their reliance on text prompts and lack of awareness of specific scene information hinder accurate 3D reconstruction tasks. To address these limitations, we introduce GSFix3D, a novel framework that improves the visual fidelity in under-constrained regions by distilling prior knowledge from diffusion models into 3D representations, while preserving consistency with observed scene details. At its core is GSFixer, a latent diffusion model obtained via our customized fine-tuning protocol that can leverage both mesh and 3D Gaussians to adapt pretrained generative models to a variety of environments and artifact types from different reconstruction methods, enabling robust novel view repair for unseen camera poses. Moreover, we propose a random mask augmentation strategy that empowers GSFixer to plausibly inpaint missing regions. Experiments on challenging benchmarks demonstrate that our GSFix3D and GSFixer achieve state-of-the-art performance, requiring only minimal scene-specific fine-tuning on captured data. Real-world test further confirms its resilience to potential pose errors.\n\n近年来，三维高斯喷溅（3DGS）在新视角合成方面取得了显著进展，但从极端新视角或部分可见区域生成高质量渲染仍然具有挑战性。与此同时，扩散模型展现了强大的生成能力，但其对文本提示的依赖以及缺乏对具体场景信息的感知限制了其在精确三维重建任务中的应用。为克服这些限制，我们提出了 GSFix3D，一种新颖的框架，通过将扩散模型的先验知识蒸馏到三维表示中，提升欠约束区域的视觉保真度，同时保持与观测场景细节的一致性。其核心组件是 GSFixer，一种通过定制化微调协议获得的潜空间扩散模型，能够同时利用网格和三维高斯，将预训练生成模型适配到多种环境和来自不同重建方法的伪影类型，从而实现对未见相机位姿的鲁棒新视角修复。此外，我们提出了一种随机掩码增强策略，使 GSFixer 能够合理地补全缺失区域。大量具有挑战性的基准实验表明，GSFix3D 和 GSFixer 在仅需极少场景特定微调的情况下，达到了当前最先进的性能。真实场景测试进一步验证了其对潜在位姿误差的鲁棒性。\n"
  },
  {
    "path": "abs/2508.14891.md",
    "content": "### GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects\n\nReconstructing articulated objects is essential for building digital twins of interactive environments. However, prior methods typically decouple geometry and motion by first reconstructing object shape in distinct states and then estimating articulation through post-hoc alignment. This separation complicates the reconstruction pipeline and restricts scalability, especially for objects with complex, multi-part articulation. We introduce a unified representation that jointly models geometry and motion using articulated 3D Gaussians. This formulation improves robustness in motion decomposition and supports articulated objects with up to 20 parts, significantly outperforming prior approaches that often struggle beyond 2--3 parts due to brittle initialization. To systematically assess scalability and generalization, we propose MPArt-90, a new benchmark consisting of 90 articulated objects across 20 categories, each with diverse part counts and motion configurations. Extensive experiments show that our method consistently achieves superior accuracy in part-level geometry reconstruction and motion estimation across a broad range of object types. We further demonstrate applicability to downstream tasks such as robotic simulation and human-scene interaction modeling, highlighting the potential of unified articulated representations in scalable physical modeling.\n\n可动物体的重建对于构建交互环境的数字孪生至关重要。然而，现有方法通常将几何与运动解耦，先分别重建物体在不同状态下的几何形状，再通过后处理对齐估计关节运动。这种分离的流程增加了重建管线的复杂度并限制了可扩展性，尤其在处理具有复杂多部件关节运动的物体时问题尤为突出。我们提出了一种统一表示，通过关节化三维高斯同时建模几何与运动。该方法提升了运动分解的鲁棒性，并支持最多 20 个部件的关节物体，显著优于现有在超过 2-3 个部件时易失败的方法。为系统评估可扩展性与泛化能力，我们提出了 MPArt-90，一个涵盖 20 类、共 90 个可动物体的新基准，每个物体具有多样的部件数量与运动配置。大量实验表明，我们的方法在部件级几何重建和运动估计方面，在多种物体类型上均取得了优异表现。我们还展示了其在机器人仿真和人-场景交互建模等下游任务中的应用潜力，突显了统一关节表示在可扩展物理建模中的巨大价值。\n"
  },
  {
    "path": "abs/2508.14892.md",
    "content": "### Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds\n\nReconstructing 3D human bodies from sparse views has been an appealing topic, which is crucial to broader the related applications. In this paper, we propose a quite challenging but valuable task to reconstruct the human body from only two images, i.e., the front and back view, which can largely lower the barrier for users to create their own 3D digital humans. The main challenges lie in the difficulty of building 3D consistency and recovering missing information from the highly sparse input. We redesign a geometry reconstruction model based on foundation reconstruction models to predict consistent point clouds even input images have scarce overlaps with extensive human data training. Furthermore, an enhancement algorithm is applied to supplement the missing color information, and then the complete human point clouds with colors can be obtained, which are directly transformed into 3D Gaussians for better rendering quality. Experiments show that our method can reconstruct the entire human in 190 ms on a single NVIDIA RTX 4090, with two images at a resolution of 1024x1024, demonstrating state-of-the-art performance on the THuman2.0 and cross-domain datasets. Additionally, our method can complete human reconstruction even with images captured by low-cost mobile devices, reducing the requirements for data collection.\n\n从稀疏视角重建三维人体一直是一个颇具吸引力的话题，对拓展相关应用具有重要意义。本文提出了一项具有相当挑战性但极具价值的任务，即仅利用两张图像（正面和背面视角）重建人体，这大大降低了用户创建自己三维数字人门槛。其主要挑战在于构建三维一致性以及从高度稀疏的输入中恢复缺失信息的困难。我们基于基础重建模型重新设计了几何重建模型，使其即使在输入图像重叠极少的情况下，也能通过大规模人体数据训练预测出一致的点云。此外，我们还引入了一种增强算法来补充缺失的颜色信息，从而获得完整的人体彩色点云，并直接转换为三维高斯表示，以提升渲染质量。实验结果表明，在单张 NVIDIA RTX 4090 上，我们的方法仅需 190 毫秒即可完成两张分辨率为 1024x1024 的图像的人体完整重建，并在 THuman2.0 及跨域数据集上展现了当前最优的性能。此外，即便是由低成本移动设备采集的图像，我们的方法也能完成人体重建，从而降低了数据采集的要求。\n"
  },
  {
    "path": "abs/2508.15151.md",
    "content": "### Zero-shot Volumetric CT Super-Resolution using 3D Gaussian Splatting with Upsampled 2D X-ray Projection Priors\n\nComputed tomography (CT) is widely used in clinical diagnosis, but acquiring high-resolution (HR) CT is limited by radiation exposure risks. Deep learning-based super-resolution (SR) methods have been studied to reconstruct HR from low-resolution (LR) inputs. While supervised SR approaches have shown promising results, they require large-scale paired LR-HR volume datasets that are often unavailable. In contrast, zero-shot methods alleviate the need for paired data by using only a single LR input, but typically struggle to recover fine anatomical details due to limited internal information. To overcome these, we propose a novel zero-shot 3D CT SR framework that leverages upsampled 2D X-ray projection priors generated by a diffusion model. Exploiting the abundance of HR 2D X-ray data, we train a diffusion model on large-scale 2D X-ray projection and introduce a per-projection adaptive sampling strategy. It selects the generative process for each projection, thus providing HR projections as strong external priors for 3D CT reconstruction. These projections serve as inputs to 3D Gaussian splatting for reconstructing a 3D CT volume. Furthermore, we propose negative alpha blending (NAB-GS) that allows negative values in Gaussian density representation. NAB-GS enables residual learning between LR and diffusion-based projections, thereby enhancing high-frequency structure reconstruction. Experiments on two datasets show that our method achieves superior quantitative and qualitative results for 3D CT SR.\n\n计算机断层扫描（CT）在临床诊断中被广泛使用，但获取高分辨率（HR）CT 受限于辐射暴露风险。基于深度学习的超分辨率（SR）方法已被研究用于从低分辨率（LR）输入重建 HR。尽管有监督的 SR 方法取得了有前景的结果，但它们需要大规模成对的 LR-HR 体数据集，而这类数据往往难以获得。相比之下，零样本方法通过仅使用单个 LR 输入来减轻对成对数据的依赖，但由于内部信息有限，通常难以恢复精细的解剖细节。为了解决这些问题，我们提出了一种新颖的零样本三维 CT 超分辨率框架，该框架利用由扩散模型生成的上采样二维 X 射线投影先验。借助丰富的 HR 二维 X 射线数据，我们在大规模二维 X 射线投影上训练扩散模型，并引入逐投影自适应采样策略。该策略为每个投影选择生成过程，从而提供作为三维 CT 重建强外部先验的 HR 投影。这些投影作为输入用于三维高斯溅射，以重建三维 CT 体数据。此外，我们提出了允许高斯密度表示中出现负值的负 alpha 融合（NAB-GS）。NAB-GS 使得 LR 与基于扩散的投影之间能够进行残差学习，从而增强高频结构的重建。基于两个数据集的实验结果表明，我们的方法在三维 CT 超分辨率任务中实现了优越的定量和定性效果。\n"
  },
  {
    "path": "abs/2508.15169.md",
    "content": "### MeSS: City Mesh-Guided Outdoor Scene Generation with Cross-View Consistent Diffusion\n\nMesh models have become increasingly accessible for numerous cities; however, the lack of realistic textures restricts their application in virtual urban navigation and autonomous driving. To address this, this paper proposes MeSS (Meshbased Scene Synthesis) for generating high-quality, styleconsistent outdoor scenes with city mesh models serving as the geometric prior. While image and video diffusion models can leverage spatial layouts (such as depth maps or HD maps) as control conditions to generate street-level perspective views, they are not directly applicable to 3D scene generation. Video diffusion models excel at synthesizing consistent view sequences that depict scenes but often struggle to adhere to predefined camera paths or align accurately with rendered control videos. In contrast, image diffusion models, though unable to guarantee cross-view visual consistency, can produce more geometry-aligned results when combined with ControlNet. Building on this insight, our approach enhances image diffusion models by improving cross-view consistency. The pipeline comprises three key stages: first, we generate geometrically consistent sparse views using Cascaded Outpainting ControlNets; second, we propagate denser intermediate views via a component dubbed AGInpaint; and third, we globally eliminate visual inconsistencies (e.g., varying exposure) using the GCAlign module. Concurrently with generation, a 3D Gaussian Splatting (3DGS) scene is reconstructed by initializing Gaussian balls on the mesh surface. Our method outperforms existing approaches in both geometric alignment and generation quality. Once synthesized, the scene can be rendered in diverse styles through relighting and style transfer techniques.\n\n\n网格模型在众多城市中日益普及，然而缺乏逼真的纹理限制了其在虚拟城市导航和自动驾驶中的应用。为了解决这一问题，本文提出了 MeSS（基于网格的场景合成）方法，用城市网格模型作为几何先验来生成高质量、风格一致的室外场景。尽管图像和视频扩散模型可以利用空间布局（如深度图或高清地图）作为控制条件来生成街景视角，但它们并不能直接用于三维场景生成。视频扩散模型擅长合成一致的视角序列以描绘场景，但往往难以严格遵循预设的相机路径或与渲染的控制视频准确对齐。相比之下，图像扩散模型虽然无法保证跨视角的一致性，但在结合 ControlNet 时能够生成更符合几何约束的结果。基于这一洞察，我们的方法通过改进跨视角一致性来增强图像扩散模型。其流程包括三个关键阶段：首先，利用级联外扩绘制 ControlNet 生成几何一致的稀疏视角；其次，通过名为 AGInpaint 的组件传播更密集的中间视角；第三，借助 GCAlign 模块全局消除视觉不一致性（如曝光差异）。在生成的同时，我们还通过在网格表面初始化高斯球来重建一个三维高斯溅射（3DGS）场景。实验表明，我们的方法在几何对齐和生成质量上均优于现有方法。完成合成后，该场景还可以通过重光照和风格迁移技术呈现多样化的风格。\n"
  },
  {
    "path": "abs/2508.15372.md",
    "content": "### Image-Conditioned 3D Gaussian Splat Quantization\n\n3D Gaussian Splatting (3DGS) has attracted considerable attention for enabling high-quality real-time rendering. Although 3DGS compression methods have been proposed for deployment on storage-constrained devices, two limitations hinder archival use: (1) they compress medium-scale scenes only to the megabyte range, which remains impractical for large-scale scenes or extensive scene collections; and (2) they lack mechanisms to accommodate scene changes after long-term archival. To address these limitations, we propose an Image-Conditioned Gaussian Splat Quantizer (ICGS-Quantizer) that substantially enhances compression efficiency and provides adaptability to scene changes after archiving. ICGS-Quantizer improves quantization efficiency by jointly exploiting inter-Gaussian and inter-attribute correlations and by using shared codebooks across all training scenes, which are then fixed and applied to previously unseen test scenes, eliminating the overhead of per-scene codebooks. This approach effectively reduces the storage requirements for 3DGS to the kilobyte range while preserving visual fidelity. To enable adaptability to post-archival scene changes, ICGS-Quantizer conditions scene decoding on images captured at decoding time. The encoding, quantization, and decoding processes are trained jointly, ensuring that the codes, which are quantized representations of the scene, are effective for conditional decoding. We evaluate ICGS-Quantizer on 3D scene compression and 3D scene updating. Experimental results show that ICGS-Quantizer consistently outperforms state-of-the-art methods in compression efficiency and adaptability to scene changes.\n\n三维高斯溅射（3DGS）因其支持高质量实时渲染而备受关注。尽管已有针对存储受限设备的 3DGS 压缩方法被提出，但在长期归档使用中仍存在两个限制：（1）它们只能将中等规模场景压缩到兆字节级别，对于大规模场景或大量场景集合仍然不切实际；（2）缺乏在长期归档后适应场景变化的机制。为解决这些问题，我们提出了一种图像条件高斯溅射量化器（ICGS-Quantizer），显著提升压缩效率并在归档后具备适应场景变化的能力。ICGS-Quantizer 通过联合利用高斯间和属性间的相关性，并在所有训练场景中使用共享码本（之后固定并应用于未见过的测试场景），从而提高量化效率并消除了逐场景码本的开销。这种方法有效地将 3DGS 的存储需求降低到千字节级别，同时保持视觉保真度。为实现归档后场景变化的适应性，ICGS-Quantizer 在解码时依赖解码时捕获的图像作为条件。编码、量化与解码过程被联合训练，确保作为场景量化表示的代码在条件解码中保持有效。我们在三维场景压缩与三维场景更新上对 ICGS-Quantizer 进行了评估。实验结果表明，ICGS-Quantizer 在压缩效率与适应场景变化能力方面均稳定优于现有最先进方法。\n"
  },
  {
    "path": "abs/2508.15376.md",
    "content": "### DriveSplat: Decoupled Driving Scene Reconstruction with Geometry-enhanced Partitioned Neural Gaussians\n\nIn the realm of driving scenarios, the presence of rapidly moving vehicles, pedestrians in motion, and large-scale static backgrounds poses significant challenges for 3D scene reconstruction. Recent methods based on 3D Gaussian Splatting address the motion blur problem by decoupling dynamic and static components within the scene. However, these decoupling strategies overlook background optimization with adequate geometry relationships and rely solely on fitting each training view by adding Gaussians. Therefore, these models exhibit limited robustness in rendering novel views and lack an accurate geometric representation. To address the above issues, we introduce DriveSplat, a high-quality reconstruction method for driving scenarios based on neural Gaussian representations with dynamic-static decoupling. To better accommodate the predominantly linear motion patterns of driving viewpoints, a region-wise voxel initialization scheme is employed, which partitions the scene into near, middle, and far regions to enhance close-range detail representation. Deformable neural Gaussians are introduced to model non-rigid dynamic actors, whose parameters are temporally adjusted by a learnable deformation network. The entire framework is further supervised by depth and normal priors from pre-trained models, improving the accuracy of geometric structures. Our method has been rigorously evaluated on the Waymo and KITTI datasets, demonstrating state-of-the-art performance in novel-view synthesis for driving scenarios.\n\n在驾驶场景中，快速移动的车辆、行进中的行人以及大规模静态背景给三维场景重建带来了显著挑战。近期基于三维高斯溅射的方法通过对场景中的动态与静态部分进行解耦来应对运动模糊问题。然而，这些解耦策略忽视了具有充分几何关系的背景优化，仅依赖于通过添加高斯来拟合每个训练视角。因此，这类模型在新视角渲染时鲁棒性有限，且缺乏准确的几何表示。为了解决上述问题，我们提出了 DriveSplat，这是一种基于神经高斯表示并结合动静态解耦的高质量驾驶场景重建方法。为了更好地适应驾驶视角中主要呈线性运动的特点，我们采用了区域划分体素初始化方案，将场景划分为近、中、远三个区域，以增强近景细节的表示能力。我们进一步引入可变形神经高斯来建模非刚体动态目标，其参数通过可学习的形变网络在时间维度上进行调整。整个框架还受到来自预训练模型的深度与法线先验的监督，从而提升几何结构的准确性。我们在 Waymo 和 KITTI 数据集上对该方法进行了严格评估，结果表明 DriveSplat 在驾驶场景的新视角合成任务中达到了当前最优性能。\n"
  },
  {
    "path": "abs/2508.15457.md",
    "content": "### Enhancing Novel View Synthesis from extremely sparse views with SfM-free 3D Gaussian Splatting Framework\n\n3D Gaussian Splatting (3DGS) has demonstrated remarkable real-time performance in novel view synthesis, yet its effectiveness relies heavily on dense multi-view inputs with precisely known camera poses, which are rarely available in real-world scenarios. When input views become extremely sparse, the Structure-from-Motion (SfM) method that 3DGS depends on for initialization fails to accurately reconstruct the 3D geometric structures of scenes, resulting in degraded rendering quality. In this paper, we propose a novel SfM-free 3DGS-based method that jointly estimates camera poses and reconstructs 3D scenes from extremely sparse-view inputs. Specifically, instead of SfM, we propose a dense stereo module to progressively estimates camera pose information and reconstructs a global dense point cloud for initialization. To address the inherent problem of information scarcity in extremely sparse-view settings, we propose a coherent view interpolation module that interpolates camera poses based on training view pairs and generates viewpoint-consistent content as additional supervision signals for training. Furthermore, we introduce multi-scale Laplacian consistent regularization and adaptive spatial-aware multi-scale geometry regularization to enhance the quality of geometrical structures and rendered content. Experiments show that our method significantly outperforms other state-of-the-art 3DGS-based approaches, achieving a remarkable 2.75dB improvement in PSNR under extremely sparse-view conditions (using only 2 training views). The images synthesized by our method exhibit minimal distortion while preserving rich high-frequency details, resulting in superior visual quality compared to existing techniques.\n\n三维高斯溅射（3DGS）在新视角合成中展现了卓越的实时性能，但其有效性严重依赖于密集多视角输入及精确已知的相机位姿，而这些在真实场景中往往难以获得。当输入视角极度稀疏时，3DGS 依赖的结构自运动（SfM）初始化方法无法准确重建场景的三维几何结构，导致渲染质量下降。为了解决这一问题，本文提出了一种无需 SfM 的基于 3DGS 的新方法，可在极稀疏视角输入下联合估计相机位姿并重建三维场景。具体而言，我们以稠密立体模块取代 SfM，逐步估计相机位姿信息并重建用于初始化的全局稠密点云。针对极稀疏视角下信息不足的固有问题，我们提出了一种一致视角插值模块，该模块基于训练视角对进行相机位姿插值，并生成视角一致的内容作为额外监督信号。此外，我们引入了多尺度拉普拉斯一致性正则化与自适应空间感知多尺度几何正则化，以提升几何结构与渲染内容的质量。实验结果表明，在极稀疏视角条件下（仅使用 2 个训练视角），我们的方法显著优于其他最先进的基于 3DGS 的方法，在 PSNR 上实现了 2.75dB 的显著提升。合成图像几乎无畸变，同时保留了丰富的高频细节，在视觉质量上明显优于现有技术。\n"
  },
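The multi-scale Laplacian consistency regularization above lends itself to a compact sketch. The following is a minimal PyTorch rendition under assumed conventions (an L1 penalty on Laplacian responses across an image pyramid); the paper's exact weighting and scale schedule may differ.

```python
import torch
import torch.nn.functional as F

# 3x3 discrete Laplacian kernel, applied per channel.
_LAP = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)

def laplacian(img: torch.Tensor) -> torch.Tensor:
    """Per-channel Laplacian of a (B, C, H, W) image batch."""
    c = img.shape[1]
    k = _LAP.to(img).expand(c, 1, 3, 3)
    return F.conv2d(img, k, padding=1, groups=c)

def multiscale_laplacian_loss(render: torch.Tensor, target: torch.Tensor,
                              scales: int = 3) -> torch.Tensor:
    """L1 distance between Laplacian responses at several pyramid levels,
    encouraging the render to match the target's high-frequency structure."""
    loss = 0.0
    for _ in range(scales):
        loss = loss + (laplacian(render) - laplacian(target)).abs().mean()
        render = F.avg_pool2d(render, 2)   # next (coarser) pyramid level
        target = F.avg_pool2d(target, 2)
    return loss / scales
```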
  {
    "path": "abs/2508.15972.md",
    "content": "### UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation\n\nEstimating the 6D pose of novel objects is a fundamental yet challenging problem in robotics, often relying on access to object CAD models. However, acquiring such models can be costly and impractical. Recent approaches aim to bypass this requirement by leveraging strong priors from foundation models to reconstruct objects from single or multi-view images, but typically require additional training or produce hallucinated geometry. To this end, we propose UnPose, a novel framework for zero-shot, model-free 6D object pose estimation and reconstruction that exploits 3D priors and uncertainty estimates from a pre-trained diffusion model. Specifically, starting from a single-view RGB-D frame, UnPose uses a multi-view diffusion model to estimate an initial 3D model using 3D Gaussian Splatting (3DGS) representation, along with pixel-wise epistemic uncertainty estimates. As additional observations become available, we incrementally refine the 3DGS model by fusing new views guided by the diffusion model's uncertainty, thereby continuously improving the pose estimation accuracy and 3D reconstruction quality. To ensure global consistency, the diffusion prior-generated views and subsequent observations are further integrated in a pose graph and jointly optimized into a coherent 3DGS field. Extensive experiments demonstrate that UnPose significantly outperforms existing approaches in both 6D pose estimation accuracy and 3D reconstruction quality. We further showcase its practical applicability in real-world robotic manipulation tasks.\n\n新物体的 6D 位姿估计是机器人领域中的一个基础但极具挑战性的问题，通常依赖于物体 CAD 模型。然而，获取此类模型往往代价高昂且不切实际。近期方法试图通过利用基础模型的强先验，从单视角或多视角图像重建物体，以绕过这一需求，但通常需要额外训练或会产生虚构的几何结构。为此，我们提出了 UnPose，这是一种零样本、无模型的 6D 物体位姿估计与重建新框架，利用预训练扩散模型的三维先验和不确定性估计。具体而言，从单帧 RGB-D 图像出发，UnPose 使用多视角扩散模型基于三维高斯溅射（3DGS）表示估计初始三维模型，并生成逐像素的认知不确定性估计。随着更多观测的引入，我们通过融合由扩散模型不确定性引导的新视角，逐步优化 3DGS 模型，从而不断提升位姿估计精度与三维重建质量。为确保全局一致性，扩散先验生成的视角与后续观测被进一步整合进位姿图，并联合优化为一个一致的 3DGS 场。大量实验表明，UnPose 在 6D 位姿估计精度与三维重建质量上均显著优于现有方法。我们还展示了其在真实机器人操作任务中的实用性。\n"
  },
  {
    "path": "abs/2508.16467.md",
    "content": "### Arbitrary-Scale 3D Gaussian Super-Resolution\n\nExisting 3D Gaussian Splatting (3DGS) super-resolution methods typically perform high-resolution (HR) rendering of fixed scale factors, making them impractical for resource-limited scenarios. Directly rendering arbitrary-scale HR views with vanilla 3DGS introduces aliasing artifacts due to the lack of scale-aware rendering ability, while adding a post-processing upsampler for 3DGS complicates the framework and reduces rendering efficiency. To tackle these issues, we build an integrated framework that incorporates scale-aware rendering, generative prior-guided optimization, and progressive super-resolving to enable 3D Gaussian super-resolution of arbitrary scale factors with a single 3D model. Notably, our approach supports both integer and non-integer scale rendering to provide more flexibility. Extensive experiments demonstrate the effectiveness of our model in rendering high-quality arbitrary-scale HR views (6.59 dB PSNR gain over 3DGS) with a single model. It preserves structural consistency with LR views and across different scales, while maintaining real-time rendering speed (85 FPS at 1080p).\n\n现有的三维高斯溅射（3DGS）超分辨率方法通常只能在固定放大倍数下进行高分辨率（HR）渲染，这使其在资源受限场景中难以实用。直接使用原始 3DGS 渲染任意倍数的 HR 视图会因缺乏尺度感知渲染能力而引入混叠伪影，而为 3DGS 增加后处理上采样器又会使框架复杂化并降低渲染效率。为解决这些问题，我们构建了一个集成框架，结合了尺度感知渲染、生成先验引导优化和渐进式超分辨率，从而能够通过单一三维模型实现任意倍数的 3D 高斯超分辨率。值得注意的是，我们的方法同时支持整数倍与非整数倍渲染，提供了更高的灵活性。大量实验表明，我们的模型能够在单一模型下实现高质量的任意倍数 HR 渲染（相较于 3DGS 提升 6.59 dB 的 PSNR），并保持与 LR 视图及跨尺度的一致性，同时在 1080p 分辨率下仍能维持实时渲染速度（85 FPS）。\n"
  },
  {
    "path": "abs/2508.17012.md",
    "content": "### Fiducial Marker Splatting for High-Fidelity Robotics Simulations\n\nHigh-fidelity 3D simulation is critical for training mobile robots, but its traditional reliance on mesh-based representations often struggle in complex environments, such as densely packed greenhouses featuring occlusions and repetitive structures. Recent neural rendering methods, like Gaussian Splatting (GS), achieve remarkable visual realism but lack flexibility to incorporate fiducial markers, which are essential for robotic localization and control. We propose a hybrid framework that combines the photorealism of GS with structured marker representations. Our core contribution is a novel algorithm for efficiently generating GS-based fiducial markers (e.g., AprilTags) within cluttered scenes. Experiments show that our approach outperforms traditional image-fitting techniques in both efficiency and pose-estimation accuracy. We further demonstrate the framework's potential in a greenhouse simulation. This agricultural setting serves as a challenging testbed, as its combination of dense foliage, similar-looking elements, and occlusions pushes the limits of perception, thereby highlighting the framework's value for real-world applications.\n\n高保真三维模拟对移动机器人训练至关重要，但其传统依赖的基于网格的表示在复杂环境中往往表现不佳，例如在充满遮挡和重复结构的密集温室中。近期的神经渲染方法（如高斯溅射，GS）能够实现令人瞩目的视觉真实感，但缺乏灵活性来融入用于机器人定位与控制的标志物。为此，我们提出了一种混合框架，将 GS 的逼真效果与结构化标志物表示相结合。我们的核心贡献是一种新颖算法，能够在杂乱场景中高效生成基于 GS 的标志物（如 AprilTags）。实验结果表明，该方法在效率和位姿估计精度方面均优于传统的图像拟合技术。我们进一步在温室模拟中展示了该框架的潜力。这一农业场景是一个具有挑战性的测试平台，其密集的植被、相似的元素和遮挡组合极大考验感知能力，从而凸显了该框架在真实应用中的价值。\n"
  },
  {
    "path": "abs/2508.17437.md",
    "content": "### Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels\n\nInferring the physical properties of 3D scenes from visual information is a critical yet challenging task for creating interactive and realistic virtual worlds. While humans intuitively grasp material characteristics such as elasticity or stiffness, existing methods often rely on slow, per-scene optimization, limiting their generalizability and application. To address this problem, we introduce PIXIE, a novel method that trains a generalizable neural network to predict physical properties across multiple scenes from 3D visual features purely using supervised losses. Once trained, our feed-forward network can perform fast inference of plausible material fields, which coupled with a learned static scene representation like Gaussian Splatting enables realistic physics simulation under external forces. To facilitate this research, we also collected PIXIEVERSE, one of the largest known datasets of paired 3D assets and physic material annotations. Extensive evaluations demonstrate that PIXIE is about 1.46-4.39x better and orders of magnitude faster than test-time optimization methods. By leveraging pretrained visual features like CLIP, our method can also zero-shot generalize to real-world scenes despite only ever been trained on synthetic data.\n\n从视觉信息中推断三维场景的物理属性是构建交互式和逼真虚拟世界的关键任务，但同时也极具挑战性。人类能够直观理解材料特性，如弹性或刚度，而现有方法往往依赖于缓慢的逐场景优化，限制了其泛化性与应用性。为解决这一问题，我们提出了 PIXIE，这是一种新颖的方法，通过纯粹的监督损失训练可泛化的神经网络，从三维视觉特征中预测跨场景的物理属性。一旦训练完成，我们的前馈网络即可快速推理出合理的材料场，并结合三维高斯溅射等静态场景表示，在外力作用下实现逼真的物理模拟。为推动相关研究，我们还收集了 PIXIEVERSE，这是迄今已知规模最大的配对三维资产与物理材料标注数据集之一。大量评估结果表明，PIXIE 的性能比测试时优化方法高出约 1.46-4.39 倍，且速度快出若干个数量级。通过利用如 CLIP 等预训练视觉特征，我们的方法还能够在仅使用合成数据训练的情况下实现对真实场景的零样本泛化。\n"
  },
  {
    "path": "abs/2508.17579.md",
    "content": "### IDU: Incremental Dynamic Update of Existing 3D Virtual Environments with New Imagery Data\n\nFor simulation and training purposes, military organizations have made substantial investments in developing high-resolution 3D virtual environments through extensive imaging and 3D scanning. However, the dynamic nature of battlefield conditions-where objects may appear or vanish over time-makes frequent full-scale updates both time-consuming and costly. In response, we introduce the Incremental Dynamic Update (IDU) pipeline, which efficiently updates existing 3D reconstructions, such as 3D Gaussian Splatting (3DGS), with only a small set of newly acquired images. Our approach starts with camera pose estimation to align new images with the existing 3D model, followed by change detection to pinpoint modifications in the scene. A 3D generative AI model is then used to create high-quality 3D assets of the new elements, which are seamlessly integrated into the existing 3D model. The IDU pipeline incorporates human guidance to ensure high accuracy in object identification and placement, with each update focusing on a single new object at a time. Experimental results confirm that our proposed IDU pipeline significantly reduces update time and labor, offering a cost-effective and targeted solution for maintaining up-to-date 3D models in rapidly evolving military scenarios.\n\n为满足模拟和训练的需求，军方组织在高分辨率三维虚拟环境的构建上投入了大量资源，通过广泛的成像与三维扫描来实现。然而，战场环境的动态特性——物体可能随时间出现或消失——使得频繁的大规模更新既耗时又昂贵。为此，我们提出了增量动态更新（IDU）流水线，该方法能够仅依赖少量新获取的图像，高效更新已有的三维重建（如三维高斯溅射，3DGS）。我们的方案首先通过相机位姿估计将新图像与现有三维模型对齐，随后利用变化检测来定位场景中的改动。接着，利用三维生成式人工智能模型生成新元素的高质量三维资产，并无缝地融入到已有三维模型中。IDU 流水线还引入人工干预，以确保在目标识别和放置上的高精度，每次更新集中于单个新对象。实验结果表明，所提出的 IDU 流水线显著降低了更新所需的时间与人工成本，为快速变化的军事场景中保持最新三维模型提供了一种高效、低成本且有针对性的解决方案。\n"
  },
  {
    "path": "abs/2508.17600.md",
    "content": "### GWM: Towards Scalable Gaussian World Models for Robotic Manipulation\n\nTraining robot policies within a learned world model is trending due to the inefficiency of real-world interactions. The established image-based world models and policies have shown prior success, but lack robust geometric information that requires consistent spatial and physical understanding of the three-dimensional world, even pre-trained on internet-scale video sources. To this end, we propose a novel branch of world model named Gaussian World Model (GWM) for robotic manipulation, which reconstructs the future state by inferring the propagation of Gaussian primitives under the effect of robot actions. At its core is a latent Diffusion Transformer (DiT) combined with a 3D variational autoencoder, enabling fine-grained scene-level future state reconstruction with Gaussian Splatting. GWM can not only enhance the visual representation for imitation learning agent by self-supervised future prediction training, but can serve as a neural simulator that supports model-based reinforcement learning. Both simulated and real-world experiments depict that GWM can precisely predict future scenes conditioned on diverse robot actions, and can be further utilized to train policies that outperform the state-of-the-art by impressive margins, showcasing the initial data scaling potential of 3D world model.\n\n由于现实世界交互的低效性，在学习到的世界模型中训练机器人策略正逐渐成为趋势。已有的基于图像的世界模型和策略虽然取得了先前的成功，但缺乏稳健的几何信息，而几何信息对于三维世界的一致空间和物理理解至关重要，即便是在经过互联网规模视频预训练的情况下仍然如此。为此，我们提出了一种新型的世界模型分支——高斯世界模型（Gaussian World Model，GWM），用于机器人操作，其通过推理机器人动作作用下高斯基元的传播来重建未来状态。其核心是结合三维变分自编码器的潜变量扩散Transformer（Diffusion Transformer, DiT），从而能够利用高斯点绘（Gaussian Splatting）实现精细的场景级未来状态重建。GWM不仅可以通过自监督的未来预测训练增强模仿学习智能体的视觉表征，还可以作为神经模拟器，支持基于模型的强化学习。仿真和真实环境的实验结果均表明，GWM能够在多样化机器人动作条件下精确预测未来场景，并进一步用于训练在性能上显著超越现有最先进方法的策略，展示了三维世界模型在数据扩展方面的潜在能力。\n"
  },
  {
    "path": "abs/2508.17811.md",
    "content": "### MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting\n\nSurface reconstruction has been widely studied in computer vision and graphics. However, existing surface reconstruction works struggle to recover accurate scene geometry when the input views are extremely sparse. To address this issue, we propose MeshSplat, a generalizable sparse-view surface reconstruction framework via Gaussian Splatting. Our key idea is to leverage 2DGS as a bridge, which connects novel view synthesis to learned geometric priors and then transfers these priors to achieve surface reconstruction. Specifically, we incorporate a feed-forward network to predict per-view pixel-aligned 2DGS, which enables the network to synthesize novel view images and thus eliminates the need for direct 3D ground-truth supervision. To improve the accuracy of 2DGS position and orientation prediction, we propose a Weighted Chamfer Distance Loss to regularize the depth maps, especially in overlapping areas of input views, and also a normal prediction network to align the orientation of 2DGS with normal vectors predicted by a monocular normal estimator. Extensive experiments validate the effectiveness of our proposed improvement, demonstrating that our method achieves state-of-the-art performance in generalizable sparse-view mesh reconstruction tasks.\n\n表面重建在计算机视觉和图形学中已被广泛研究。然而，当输入视角极其稀疏时，现有的表面重建方法难以恢复准确的场景几何。为了解决这一问题，我们提出了 **MeshSplat**，一种基于高斯点绘（Gaussian Splatting）的可泛化稀疏视角表面重建框架。我们的核心思想是利用 **2DGS** 作为桥梁，将新视角合成与学习到的几何先验相连接，并将这些先验迁移以实现表面重建。具体来说，我们引入了一个前馈网络，用于预测逐视角像素对齐的 2DGS，使网络能够合成新视角图像，从而消除了对直接三维真实监督的需求。为提高 2DGS 的位置与方向预测的准确性，我们提出了一种 **加权Chamfer距离损失** 来正则化深度图，特别是在输入视角的重叠区域，同时引入一个法向预测网络，将 2DGS 的方向与单目法向估计器预测的法向向量对齐。大量实验验证了所提改进的有效性，结果表明，我们的方法在可泛化的稀疏视角网格重建任务中实现了当前最优的性能。\n"
  },
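MeshSplat's Weighted Chamfer Distance Loss admits a simple plausible form: a standard two-sided Chamfer distance whose per-point residuals are weighted, for instance to emphasize the overlap region of the input views. The weighting scheme below is an assumption for illustration, not the paper's exact design.

```python
import torch

def weighted_chamfer(p: torch.Tensor, q: torch.Tensor,
                     w_p: torch.Tensor, w_q: torch.Tensor) -> torch.Tensor:
    """Weighted Chamfer distance between point sets p (N, 3) and q (M, 3),
    e.g., depth maps back-projected from two views.

    w_p (N,) and w_q (M,) weight each point's nearest-neighbour residual,
    e.g., up-weighting points that fall in the views' overlap region.
    """
    d = torch.cdist(p, q)           # (N, M) pairwise Euclidean distances
    p_to_q = d.min(dim=1).values    # (N,) distance of each p to its closest q
    q_to_p = d.min(dim=0).values    # (M,) distance of each q to its closest p
    return (w_p * p_to_q).mean() + (w_q * q_to_p).mean()
```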
  {
    "path": "abs/2508.17876.md",
    "content": "### Camera Pose Refinement via 3D Gaussian Splatting\n\nCamera pose refinement aims at improving the accuracy of initial pose estimation for applications in 3D computer vision. Most refinement approaches rely on 2D-3D correspondences with specific descriptors or dedicated networks, requiring reconstructing the scene again for a different descriptor or fully retraining the network for each scene. Some recent methods instead infer pose from feature similarity, but their lack of geometry constraints results in less accuracy. To overcome these limitations, we propose a novel camera pose refinement framework leveraging 3D Gaussian Splatting (3DGS), referred to as GS-SMC. Given the widespread usage of 3DGS, our method can employ an existing 3DGS model to render novel views, providing a lightweight solution that can be directly applied to diverse scenes without additional training or fine-tuning. Specifically, we introduce an iterative optimization approach, which refines the camera pose using epipolar geometric constraints among the query and multiple rendered images. Our method allows flexibly choosing feature extractors and matchers to establish these constraints. Extensive empirical evaluations on the 7-Scenes and the Cambridge Landmarks datasets demonstrate that our method outperforms state-of-the-art camera pose refinement approaches, achieving 53.3% and 56.9% reductions in median translation and rotation errors on 7-Scenes, and 40.7% and 53.2% on Cambridge.\n\n相机位姿优化旨在提升三维计算机视觉应用中初始位姿估计的准确性。大多数优化方法依赖于结合特定描述符或专用网络的二维-三维对应关系，这通常需要针对不同描述符重新重建场景，或为每个场景完全重新训练网络。近期一些方法则尝试基于特征相似性推断位姿，但由于缺乏几何约束，其精度往往不足。为克服这些局限，我们提出了一种利用三维高斯点绘（3D Gaussian Splatting, 3DGS）的新型相机位姿优化框架，称为 **GS-SMC**。鉴于 3DGS 的广泛应用，我们的方法可以直接利用现有的 3DGS 模型生成新视角图像，从而提供一种无需额外训练或微调、可直接应用于多种场景的轻量化解决方案。具体而言，我们引入了一种迭代优化方法，通过在查询图像与多个渲染图像之间引入极线几何约束来优化相机位姿。该方法允许灵活选择特征提取器和匹配器来建立这些约束。大量在 7-Scenes 和 Cambridge Landmarks 数据集上的实证评估表明，我们的方法在性能上超越了现有最先进的相机位姿优化方法，在 7-Scenes 上分别实现了 53.3% 和 56.9% 的平移与旋转误差中值下降，在 Cambridge 上分别实现了 40.7% 和 53.2% 的下降。\n"
  },
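The epipolar constraints that drive GS-SMC's iterative refinement can be sketched as a residual to be minimized over candidate poses. Below is the standard Sampson epipolar error between matched keypoints of the query image and a rendered view, given a candidate relative pose; the function and variable names are illustrative, and the paper's actual objective may differ.

```python
import numpy as np

def skew(t: np.ndarray) -> np.ndarray:
    """Skew-symmetric matrix so that skew(t) @ x == cross(t, x)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def sampson_error(x1, x2, R, t, K):
    """Mean Sampson epipolar error for matched pixels x1, x2 ((N, 2) each)
    under a candidate relative pose (R, t) and shared intrinsics K (3, 3)."""
    E = skew(t) @ R                                   # essential matrix
    F = np.linalg.inv(K).T @ E @ np.linalg.inv(K)     # fundamental matrix
    h1 = np.c_[x1, np.ones(len(x1))]                  # homogeneous pixels (N, 3)
    h2 = np.c_[x2, np.ones(len(x2))]
    Fx1 = h1 @ F.T                                    # epipolar lines in image 2
    Ftx2 = h2 @ F                                     # epipolar lines in image 1
    num = np.einsum('ij,ij->i', h2, Fx1) ** 2         # (x2^T F x1)^2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return np.mean(num / np.maximum(den, 1e-12))
```

Any off-the-shelf feature extractor and matcher can supply `x1`/`x2`, which is the flexibility the abstract highlights.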
  {
    "path": "abs/2508.18242.md",
    "content": "### GSVisLoc: Generalizable Visual Localization for Gaussian Splatting Scene Representations\n\nWe introduce GSVisLoc, a visual localization method designed for 3D Gaussian Splatting (3DGS) scene representations. Given a 3DGS model of a scene and a query image, our goal is to estimate the camera's position and orientation. We accomplish this by robustly matching scene features to image features. Scene features are produced by downsampling and encoding the 3D Gaussians while image features are obtained by encoding image patches. Our algorithm proceeds in three steps, starting with coarse matching, then fine matching, and finally by applying pose refinement for an accurate final estimate. Importantly, our method leverages the explicit 3DGS scene representation for visual localization without requiring modifications, retraining, or additional reference images. We evaluate GSVisLoc on both indoor and outdoor scenes, demonstrating competitive localization performance on standard benchmarks while outperforming existing 3DGS-based baselines. Moreover, our approach generalizes effectively to novel scenes without additional training.\n\n我们提出了 **GSVisLoc**，一种专为三维高斯点绘（3D Gaussian Splatting, 3DGS）场景表示设计的视觉定位方法。给定一个场景的 3DGS 模型和一张查询图像，我们的目标是估计相机的位置和朝向。我们通过将场景特征与图像特征进行稳健匹配来实现这一目标。场景特征通过对三维高斯进行下采样和编码获得，而图像特征则通过对图像块进行编码获得。我们的算法分为三个步骤：首先是粗匹配，其次是精匹配，最后通过位姿优化获得准确的最终估计。重要的是，我们的方法直接利用了显式的 3DGS 场景表示来进行视觉定位，无需修改、重新训练或额外的参考图像。我们在室内和室外场景中对 GSVisLoc 进行了评估，结果表明其在标准基准上实现了具有竞争力的定位性能，并优于现有基于 3DGS 的基线方法。此外，我们的方法能够在无需额外训练的情况下有效泛化到新的场景。\n"
  },
  {
    "path": "abs/2508.18389.md",
    "content": "### FastAvatar: Instant 3D Gaussian Splatting for Faces from Single Unconstrained Poses\n\nWe present FastAvatar, a pose-invariant, feed-forward framework that can generate a 3D Gaussian Splatting (3DGS) model from a single face image from an arbitrary pose in near-instant time (&lt;10ms). FastAvatar uses a novel encoder-decoder neural network design to achieve both fast fitting and identity preservation regardless of input pose. First, FastAvatar constructs a 3DGS face \"template\" model from a training dataset of faces with multi-view captures. Second, FastAvatar encodes the input face image into an identity-specific and pose-invariant latent embedding, and decodes this embedding to predict residuals to the structural and appearance parameters of each Gaussian in the template 3DGS model. By only inferring residuals in a feed-forward fashion, model inference is fast and robust. FastAvatar significantly outperforms existing feed-forward face 3DGS methods (e.g., GAGAvatar) in reconstruction quality, and runs 1000x faster than per-face optimization methods (e.g., FlashAvatar, GaussianAvatars and GASP). In addition, FastAvatar's novel latent space design supports real-time identity interpolation and attribute editing which is not possible with any existing feed-forward 3DGS face generation framework. FastAvatar's combination of excellent reconstruction quality and speed expands the scope of 3DGS for photorealistic avatar applications in consumer and interactive systems.\n\n我们提出了 **FastAvatar**，一个姿态无关的前馈框架，可以在近乎瞬时的时间内（&lt;10ms）从任意姿态的单张人脸图像生成三维高斯点绘（3D Gaussian Splatting, 3DGS）模型。FastAvatar 采用新颖的编码器-解码器神经网络设计，在保证输入姿态无关的情况下，实现了快速拟合和身份保持。首先，FastAvatar 从一个多视角采集的人脸训练数据集中构建一个 3DGS 人脸“模板”模型。其次，FastAvatar 将输入人脸图像编码为一个身份相关且姿态无关的潜在嵌入，再通过解码器预测模板 3DGS 模型中每个高斯的结构和外观参数残差。通过仅以前馈方式推断残差，模型推理快速且鲁棒。FastAvatar 在重建质量上显著优于现有的前馈式人脸 3DGS 方法（如 GAGAvatar），并且比基于每张人脸的优化方法（如 FlashAvatar、GaussianAvatars 和 GASP）快 1000 倍。此外，FastAvatar 的新颖潜在空间设计支持实时身份插值和属性编辑，这是现有任何前馈式 3DGS 人脸生成框架所无法实现的。FastAvatar 将卓越的重建质量与高速推理相结合，拓展了 3DGS 在消费级和交互式系统中逼真化身应用的范围。\n"
  },
  {
    "path": "abs/2508.18696.md",
    "content": "### ColorGS: High-fidelity Surgical Scene Reconstruction with Colored Gaussian Splatting\n\nHigh-fidelity reconstruction of deformable tissues from endoscopic videos remains challenging due to the limitations of existing methods in capturing subtle color variations and modeling global deformations. While 3D Gaussian Splatting (3DGS) enables efficient dynamic reconstruction, its fixed per-Gaussian color assignment struggles with intricate textures, and linear deformation modeling fails to model consistent global deformation. To address these issues, we propose ColorGS, a novel framework that integrates spatially adaptive color encoding and enhanced deformation modeling for surgical scene reconstruction. First, we introduce Colored Gaussian Primitives, which employ dynamic anchors with learnable color parameters to adaptively encode spatially varying textures, significantly improving color expressiveness under complex lighting and tissue similarity. Second, we design an Enhanced Deformation Model (EDM) that combines time-aware Gaussian basis functions with learnable time-independent deformations, enabling precise capture of both localized tissue deformations and global motion consistency caused by surgical interactions. Extensive experiments on DaVinci robotic surgery videos and benchmark datasets (EndoNeRF, StereoMIS) demonstrate that ColorGS achieves state-of-the-art performance, attaining a PSNR of 39.85 (1.5 higher than prior 3DGS-based methods) and superior SSIM (97.25\\%) while maintaining real-time rendering efficiency. Our work advances surgical scene reconstruction by balancing high fidelity with computational practicality, critical for intraoperative guidance and AR/VR applications.\n\n从内窥镜视频中实现对可变形组织的高保真重建依然具有挑战性，现有方法在捕捉细微色彩变化和建模全局形变方面存在局限。虽然三维高斯点绘（3D Gaussian Splatting, 3DGS）能够实现高效的动态重建，但其固定的单高斯颜色分配难以处理复杂纹理，而线性形变建模无法刻画一致的全局形变。为解决这些问题，我们提出了 **ColorGS**，一个融合空间自适应颜色编码和增强形变建模的外科场景重建新框架。首先，我们引入 **有色高斯基元**，通过具有可学习颜色参数的动态锚点，自适应地编码空间变化纹理，在复杂光照和组织相似性条件下显著提升了颜色表现力。其次，我们设计了 **增强形变模型（EDM）**，将时序感知的高斯基函数与可学习的时间无关形变相结合，从而能够精确捕捉局部组织形变与因外科交互引起的全局运动一致性。在 DaVinci 机器人手术视频和基准数据集（EndoNeRF, StereoMIS）上的大量实验表明，ColorGS 实现了当前最优性能，获得了 39.85 的 PSNR（比现有基于 3DGS 的方法高出 1.5）和 97.25% 的 SSIM，同时保持了实时渲染效率。我们的工作通过在高保真度和计算实用性之间取得平衡，推动了外科场景重建的发展，这对于术中导航和 AR/VR 应用具有重要意义。\n"
  },
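One way to read the EDM's "time-aware Gaussian basis functions with learnable time-independent deformations" is as per-point offsets expanded in temporal radial basis functions plus a static term. The module below is a toy sketch under that assumption; all names and shapes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class GaussianBasisDeformation(nn.Module):
    """Toy deformation field: per-point xyz offsets expressed in K temporal
    Gaussian basis functions, plus a learnable time-independent term."""

    def __init__(self, n_points: int, n_basis: int = 8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(0, 1, n_basis))    # basis centres in time
        self.widths = nn.Parameter(torch.full((n_basis,), 0.1))       # basis bandwidths
        self.coeff = nn.Parameter(torch.zeros(n_points, n_basis, 3))  # per-point, per-basis offsets
        self.static = nn.Parameter(torch.zeros(n_points, 3))          # time-independent deformation

    def forward(self, t: float) -> torch.Tensor:
        # Activation of each temporal basis function at time t in [0, 1].
        w = torch.exp(-0.5 * ((t - self.centers) / self.widths) ** 2)  # (K,)
        # Blend per-point offsets and add the static component: (N, 3).
        return torch.einsum('k,nkc->nc', w, self.coeff) + self.static
```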
  {
    "path": "abs/2508.19243.md",
    "content": "### Style4D-Bench: A Benchmark Suite for 4D Stylization\n\nWe introduce Style4D-Bench, the first benchmark suite specifically designed for 4D stylization, with the goal of standardizing evaluation and facilitating progress in this emerging area. Style4D-Bench comprises: 1) a comprehensive evaluation protocol measuring spatial fidelity, temporal coherence, and multi-view consistency through both perceptual and quantitative metrics, 2) a strong baseline that make an initial attempt for 4D stylization, and 3) a curated collection of high-resolution dynamic 4D scenes with diverse motions and complex backgrounds. To establish a strong baseline, we present Style4D, a novel framework built upon 4D Gaussian Splatting. It consists of three key components: a basic 4DGS scene representation to capture reliable geometry, a Style Gaussian Representation that leverages lightweight per-Gaussian MLPs for temporally and spatially aware appearance control, and a Holistic Geometry-Preserved Style Transfer module designed to enhance spatio-temporal consistency via contrastive coherence learning and structural content preservation. Extensive experiments on Style4D-Bench demonstrate that Style4D achieves state-of-the-art performance in 4D stylization, producing fine-grained stylistic details with stable temporal dynamics and consistent multi-view rendering. We expect Style4D-Bench to become a valuable resource for benchmarking and advancing research in stylized rendering of dynamic 3D scenes.\n\n我们提出了 **Style4D-Bench**，这是首个专为 4D 风格化设计的基准套件，旨在标准化评估并推动该新兴领域的发展。Style4D-Bench 包含：1）一套全面的评估协议，通过感知指标与定量指标测量空间保真度、时间一致性和多视角一致性；2）一个强有力的基线方法，作为对 4D 风格化的初步探索；3）一个精心整理的高分辨率动态 4D 场景集合，涵盖多样化运动和复杂背景。为建立强基线，我们提出了 **Style4D**，一个基于四维高斯点绘（4D Gaussian Splatting）的新颖框架。它包含三个关键组件：基础的 4DGS 场景表示，用于捕捉可靠的几何结构；**风格高斯表示**，利用轻量级的逐高斯 MLP，实现时空感知的外观控制；以及 **整体几何保持的风格迁移模块**，通过对比一致性学习与结构内容保持来增强时空一致性。在 Style4D-Bench 上的大量实验表明，Style4D 在 4D 风格化任务中达到了最新的性能水平，能够生成细致的风格化细节，保持稳定的时间动态和一致的多视角渲染。我们期望 Style4D-Bench 成为推动动态三维场景风格化渲染研究的重要资源。\n"
  },
  {
    "path": "abs/2508.19699.md",
    "content": "### LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation\n\n3D Gaussian Splatting (3DGS) has emerged as a novel explicit representation for 3D scenes, offering both high-fidelity reconstruction and efficient rendering. However, 3DGS lacks 3D segmentation ability, which limits its applicability in tasks that require scene understanding. The identification and isolating of specific object components is crucial. To address this limitation , we propose Label-aware 3D Gaussian Splatting (LabelGS), a method that augments the Gaussian representation with object label. LabelGS introduces cross-view consistent semantic masks for 3D Gaussians and employs a novel Occlusion Analysis Model to avoid overfitting occlusion during optimization, Main Gaussian Labeling model to lift 2D semantic prior to 3D Gaussian and Gaussian Projection Filter to avoid Gaussian label conflict. Our approach achieves effective decoupling of Gaussian representations and refines the 3DGS optimization process through a random region sampling strategy, significantly improving efficiency. Extensive experiments demonstrate that LabelGS outperforms previous state-of-the-art methods, including Feature-3DGS, in the 3D scene segmentation task. Notably, LabelGS achieves a remarkable 22× speedup in training compared to Feature-3DGS, at a resolution of 1440×1080.\n\n三维高斯点绘（3D Gaussian Splatting, 3DGS）作为一种新颖的显式三维场景表示方式，兼具高保真重建与高效渲染。然而，3DGS 缺乏三维分割能力，限制了其在需要场景理解的任务中的应用。对特定对象组件的识别与分离至关重要。为克服这一局限，我们提出了 **标签感知三维高斯点绘（LabelGS）**，一种将对象标签融入高斯表示的方法。LabelGS 为 3D 高斯引入跨视角一致的语义掩码，并设计了三种新模型：**遮挡分析模型**，用于避免优化过程中对遮挡的过拟合；**主高斯标注模型**，将二维语义先验提升到三维高斯层面；**高斯投影过滤器**，用于避免高斯标签冲突。我们的方法实现了对高斯表示的有效解耦，并通过随机区域采样策略优化 3DGS 训练过程，大幅提升效率。大量实验结果表明，LabelGS 在三维场景分割任务中优于以往的最新方法（包括 Feature-3DGS）。值得注意的是，在 1440×1080 分辨率下，LabelGS 相比 Feature-3DGS 的训练速度提升了 22 倍。\n"
  },
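Lifting 2D semantic priors to 3D Gaussians, as LabelGS's Main Gaussian Labeling model does, can be approximated by majority voting across views. The sketch below assumes Gaussian centres are already projected and visibility-filtered; the actual model additionally performs occlusion analysis and resolves label conflicts.

```python
import numpy as np

def vote_gaussian_labels(centers_px: np.ndarray, masks: list, n_labels: int):
    """Majority-vote a semantic label for each Gaussian.

    centers_px: (V, N, 2) integer pixel coords of N Gaussian centres in V views,
                already projected and visibility-filtered; x == -1 marks
                'not visible in this view'.
    masks:      list of V (H, W) integer label maps from a 2D segmenter.
    Returns (N,) per-Gaussian labels.
    """
    n = centers_px.shape[1]
    votes = np.zeros((n, n_labels), dtype=np.int64)
    for v, mask in enumerate(masks):
        for i, (x, y) in enumerate(centers_px[v]):
            if x >= 0:                       # Gaussian is visible in this view
                votes[i, mask[y, x]] += 1    # its centre pixel casts one vote
    return votes.argmax(axis=1)
```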
  {
    "path": "abs/2508.19754.md",
    "content": "### FastAvatar: Towards Unified Fast High-Fidelity 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers\n\nDespite significant progress in 3D avatar reconstruction, it still faces challenges such as high time complexity, sensitivity to data quality, and low data utilization. We propose FastAvatar, a feedforward 3D avatar framework capable of flexibly leveraging diverse daily recordings (e.g., a single image, multi-view observations, or monocular video) to reconstruct a high-quality 3D Gaussian Splatting (3DGS) model within seconds, using only a single unified model. FastAvatar's core is a Large Gaussian Reconstruction Transformer featuring three key designs: First, a variant VGGT-style transformer architecture aggregating multi-frame cues while injecting initial 3D prompt to predict an aggregatable canonical 3DGS representation; Second, multi-granular guidance encoding (camera pose, FLAME expression, head pose) mitigating animation-induced misalignment for variable-length inputs; Third, incremental Gaussian aggregation via landmark tracking and sliced fusion losses. Integrating these features, FastAvatar enables incremental reconstruction, i.e., improving quality with more observations, unlike prior work wasting input data. This yields a quality-speed-tunable paradigm for highly usable avatar modeling. Extensive experiments show that FastAvatar has higher quality and highly competitive speed compared to existing methods.\n\n尽管三维虚拟人重建已取得显著进展，但仍面临计算复杂度高、对数据质量敏感、数据利用率低等挑战。我们提出了 **FastAvatar**，一个前馈式三维虚拟人框架，能够灵活利用多样化的日常记录（如单张图像、多视角观测或单目视频），在数秒内利用单一统一模型重建出高质量的三维高斯点绘（3D Gaussian Splatting, 3DGS）模型。FastAvatar 的核心是一个 **大型高斯重建 Transformer**，具有三个关键设计：第一，VGGT 风格的变体 Transformer 架构，在引入初始三维提示的同时聚合多帧信息，以预测可聚合的规范化 3DGS 表示；第二，多粒度引导编码（相机位姿、FLAME 表情、头部姿态），缓解动画引起的错位问题，适配可变长度输入；第三，通过基于关键点跟踪和切片融合损失的增量高斯聚合机制实现逐步优化。通过整合这些特性，FastAvatar 实现了增量式重建，即随着观测数量的增加提升重建质量，而不同于以往方法对输入数据的浪费。这为虚拟人建模带来了一个质量与速度可调的高实用性范式。大量实验结果表明，FastAvatar 相比现有方法具有更高的质量和极具竞争力的速度。\n"
  },
  {
    "path": "abs/2508.19786.md",
    "content": "### MAPo : Motion-Aware Partitioning of Deformable 3D Gaussian Splatting for High-Fidelity Dynamic Scene Reconstruction\n\n3D Gaussian Splatting, known for enabling high-quality static scene reconstruction with fast rendering, is increasingly being applied to dynamic scene reconstruction. A common strategy involves learning a deformation field to model the temporal changes of a canonical set of 3D Gaussians. However, these deformation-based methods often produce blurred renderings and lose fine motion details in highly dynamic regions due to the inherent limitations of a single, unified model in representing diverse motion patterns. To address these challenges, we introduce Motion-Aware Partitioning of Deformable 3D Gaussian Splatting (MAPo), a novel framework for high-fidelity dynamic scene reconstruction. Its core is a dynamic score-based partitioning strategy that distinguishes between high- and low-dynamic 3D Gaussians. For high-dynamic 3D Gaussians, we recursively partition them temporally and duplicate their deformation networks for each new temporal segment, enabling specialized modeling to capture intricate motion details. Concurrently, low-dynamic 3DGs are treated as static to reduce computational costs. However, this temporal partitioning strategy for high-dynamic 3DGs can introduce visual discontinuities across frames at the partition boundaries. To address this, we introduce a cross-frame consistency loss, which not only ensures visual continuity but also further enhances rendering quality. Extensive experiments demonstrate that MAPo achieves superior rendering quality compared to baselines while maintaining comparable computational costs, particularly in regions with complex or rapid motions.\n\n三维高斯点绘（3D Gaussian Splatting, 3DGS）因其能够实现高质量的静态场景重建并具备快速渲染能力，正逐渐被应用于动态场景重建。一种常见策略是学习形变场以建模一组规范化 3D 高斯的时间变化。然而，这类基于形变的方法在高度动态区域往往会生成模糊渲染，并丢失细致的运动信息，其根本原因在于单一统一模型难以表示多样化的运动模式。为应对这一挑战，我们提出了 **可变形三维高斯点绘的运动感知分区方法（MAPo）**，一个用于高保真动态场景重建的新框架。其核心是基于动态评分的分区策略，用以区分高动态和低动态 3D 高斯。对于高动态 3D 高斯，我们在时间维度上递归分区，并为每个新的时间段复制其形变网络，从而实现专门化建模以捕捉复杂的运动细节。同时，低动态 3D 高斯被视为静态，以降低计算开销。然而，这种对高动态 3D 高斯的时间分区策略可能在分区边界处引入跨帧视觉不连续性。为解决此问题，我们提出了 **跨帧一致性损失**，不仅保证视觉连续性，还进一步提升渲染质量。大量实验表明，MAPo 在保持相当计算开销的同时，在渲染质量上优于基线方法，尤其在复杂或快速运动区域表现突出。\n"
  },
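A minimal reading of MAPo's dynamic score-based partitioning: score each Gaussian by how much its centre moves over a coarse temporal fit, then threshold. The criterion below (accumulated centre displacement) is an assumption for illustration; the paper's actual score may differ.

```python
import torch

def partition_by_dynamic_score(positions: torch.Tensor, thresh: float):
    """positions: (T, N, 3) per-frame Gaussian centres from a coarse fit.

    Score each Gaussian by the total displacement of its centre over time and
    split into high-/low-dynamic sets; low-dynamic ones can be frozen as
    static, high-dynamic ones get temporally partitioned deformation nets.
    """
    disp = positions.diff(dim=0).norm(dim=-1).sum(dim=0)  # (N,) accumulated motion
    high = disp > thresh
    return high, ~high    # boolean masks over the N Gaussians
```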
  {
    "path": "abs/2508.20080.md",
    "content": "### Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images\n\n360-degree visual content is widely shared on platforms such as YouTube and plays a central role in virtual reality, robotics, and autonomous navigation. However, consumer-grade dual-fisheye systems consistently yield imperfect panoramas due to inherent lens separation and angular distortions. In this work, we introduce a novel calibration framework that incorporates a dual-fisheye camera model into the 3D Gaussian splatting pipeline. Our approach not only simulates the realistic visual artifacts produced by dual-fisheye cameras but also enables the synthesis of seamlessly rendered 360-degree images. By jointly optimizing 3D Gaussian parameters alongside calibration variables that emulate lens gaps and angular distortions, our framework transforms imperfect omnidirectional inputs into flawless novel view synthesis. Extensive evaluations on real-world datasets confirm that our method produces seamless renderings-even from imperfect images-and outperforms existing 360-degree rendering models.\n\n360 度视觉内容在 YouTube 等平台上被广泛分享，并在虚拟现实、机器人技术和自动驾驶中发挥着核心作用。然而，消费级双鱼眼系统由于固有的镜头分离和角度畸变，常常生成不完美的全景图。在这项工作中，我们提出了一种新颖的标定框架，将双鱼眼相机模型引入三维高斯点绘（3D Gaussian Splatting）管线。我们的方法不仅能够模拟双鱼眼相机产生的真实视觉伪影，还可以合成无缝渲染的 360 度图像。通过联合优化三维高斯参数与标定变量（用于模拟镜头间隙和角度畸变），我们的框架能够将不完美的全向输入转化为无瑕的新视角合成结果。在真实世界数据集上的大量评估结果表明，我们的方法即便在输入图像不完美的情况下也能生成无缝渲染，并优于现有的 360 度渲染模型。\n"
  },
  {
    "path": "abs/2508.20471.md",
    "content": "### Realistic and Controllable 3D Gaussian-Guided Object Editing for Driving Video Generation\n\nCorner cases are crucial for training and validating autonomous driving systems, yet collecting them from the real world is often costly and hazardous. Editing objects within captured sensor data offers an effective alternative for generating diverse scenarios, commonly achieved through 3D Gaussian Splatting or image generative models. However, these approaches often suffer from limited visual fidelity or imprecise pose control. To address these issues, we propose G^2Editor, a framework designed for photorealistic and precise object editing in driving videos. Our method leverages a 3D Gaussian representation of the edited object as a dense prior, injected into the denoising process to ensure accurate pose control and spatial consistency. A scene-level 3D bounding box layout is employed to reconstruct occluded areas of non-target objects. Furthermore, to guide the appearance details of the edited object, we incorporate hierarchical fine-grained features as additional conditions during generation. Experiments on the Waymo Open Dataset demonstrate that G^2Editor effectively supports object repositioning, insertion, and deletion within a unified framework, outperforming existing methods in both pose controllability and visual quality, while also benefiting downstream data-driven tasks.\n\n边缘场景对于自动驾驶系统的训练和验证至关重要，但在真实世界中采集此类数据往往代价高昂且存在风险。在已采集的传感器数据中编辑对象是一种生成多样化场景的有效替代方案，通常通过三维高斯点绘（3D Gaussian Splatting）或图像生成模型实现。然而，这些方法常常受到视觉保真度有限或位姿控制不精确的困扰。为解决这些问题，我们提出了 **G^2Editor**，一个用于驾驶视频中逼真且精确对象编辑的框架。我们的方法利用编辑对象的三维高斯表示作为稠密先验，并将其注入去噪过程，从而确保位姿控制的准确性和空间一致性。同时，我们引入场景级三维包围盒布局，用于重建非目标对象的遮挡区域。此外，为了引导编辑对象的外观细节，我们在生成过程中加入了分层细粒度特征作为附加条件。在 Waymo Open Dataset 上的实验表明，G^2Editor 能够在统一框架下有效支持对象的重定位、插入与删除，在位姿可控性和视觉质量方面均优于现有方法，并能进一步促进下游数据驱动任务。\n"
  },
  {
    "path": "abs/2508.20623.md",
    "content": "### AvatarBack: Back-Head Generation for Complete 3D Avatars from Front-View Images\n\nRecent advances in Gaussian Splatting have significantly boosted the reconstruction of head avatars, enabling high-quality facial modeling by representing an 3D avatar as a collection of 3D Gaussians. However, existing methods predominantly rely on frontal-view images, leaving the back-head poorly constructed. This leads to geometric inconsistencies, structural blurring, and reduced realism in the rear regions, ultimately limiting the fidelity of reconstructed avatars. To address this challenge, we propose AvatarBack, a novel plug-and-play framework specifically designed to reconstruct complete and consistent 3D Gaussian avatars by explicitly modeling the missing back-head regions. AvatarBack integrates two core technical innovations,i.e., the Subject-specific Generator (SSG) and the Adaptive Spatial Alignment Strategy (ASA). The former leverages a generative prior to synthesize identity-consistent, plausible back-view pseudo-images from sparse frontal inputs, providing robust multi-view supervision. To achieve precise geometric alignment between these synthetic views and the 3D Gaussian representation, the later employs learnable transformation matrices optimized during training, effectively resolving inherent pose and coordinate discrepancies. Extensive experiments on NeRSemble and K-hairstyle datasets, evaluated using geometric, photometric, and GPT-4o-based perceptual metrics, demonstrate that AvatarBack significantly enhances back-head reconstruction quality while preserving frontal fidelity. Moreover, the reconstructed avatars maintain consistent visual realism under diverse motions and remain fully animatable.\n\n近期在高斯点绘（Gaussian Splatting）方面的进展极大推动了头部虚拟人重建的发展，使得通过将三维虚拟人表示为三维高斯集合来实现高质量人脸建模成为可能。然而，现有方法主要依赖正面视角图像，导致后脑部分重建不足。这会引发几何不一致、结构模糊以及后部区域真实感下降，从而限制了重建虚拟人的保真度。为应对这一挑战，我们提出了 **AvatarBack**，一个新颖的即插即用框架，专门通过显式建模缺失的后脑区域来重建完整且一致的三维高斯虚拟人。AvatarBack 融合了两项核心技术创新：**主体特定生成器（SSG）** 和 **自适应空间对齐策略（ASA）**。前者利用生成先验，从稀疏的正面输入中合成与身份一致、合理可信的背面伪图像，从而提供稳健的多视角监督；后者通过在训练过程中优化的可学习变换矩阵，实现这些合成视图与三维高斯表示之间的精确几何对齐，有效解决了固有的姿态和坐标差异。在 NeRSemble 和 K-hairstyle 数据集上的大量实验（使用几何、光度和基于 GPT-4o 的感知指标进行评估）表明，AvatarBack 显著提升了后脑重建质量，同时保持了正面保真度。此外，重建的虚拟人在多样化动作下依然保持一致的视觉真实感，并且完全支持动画化。\n"
  },
  {
    "path": "abs/2508.20965.md",
    "content": "### DrivingGaussian++: Towards Realistic Reconstruction and Editable Simulation for Surrounding Dynamic Driving Scenes\n\nWe present DrivingGaussian++, an efficient and effective framework for realistic reconstructing and controllable editing of surrounding dynamic autonomous driving scenes. DrivingGaussian++ models the static background using incremental 3D Gaussians and reconstructs moving objects with a composite dynamic Gaussian graph, ensuring accurate positions and occlusions. By integrating a LiDAR prior, it achieves detailed and consistent scene reconstruction, outperforming existing methods in dynamic scene reconstruction and photorealistic surround-view synthesis. DrivingGaussian++ supports training-free controllable editing for dynamic driving scenes, including texture modification, weather simulation, and object manipulation, leveraging multi-view images and depth priors. By integrating large language models (LLMs) and controllable editing, our method can automatically generate dynamic object motion trajectories and enhance their realism during the optimization process. DrivingGaussian++ demonstrates consistent and realistic editing results and generates dynamic multi-view driving scenarios, while significantly enhancing scene diversity.\n\n我们提出了 **DrivingGaussian++**，一个高效且实用的框架，用于逼真重建和可控编辑自动驾驶场景中的周围动态环境。DrivingGaussian++ 通过增量三维高斯建模静态背景，并利用复合动态高斯图重建运动物体，从而保证位置和遮挡的精确性。通过引入激光雷达（LiDAR）先验，它实现了细致且一致的场景重建，在动态场景重建和逼真环视合成方面优于现有方法。DrivingGaussian++ 支持无需训练的动态驾驶场景可控编辑，包括纹理修改、天气模拟和对象操作，依托多视角图像和深度先验。通过结合大语言模型（LLMs）与可控编辑，我们的方法能够在优化过程中自动生成动态对象运动轨迹，并增强其真实感。DrivingGaussian++ 展现出一致且逼真的编辑效果，并能生成动态多视角驾驶场景，同时显著提升场景多样性。\n"
  },
  {
    "path": "abs/2508.21154.md",
    "content": "### RadGS-Reg: Registering Spine CT with Biplanar X-rays via Joint 3D Radiative Gaussians Reconstruction and 3D/3D Registration\n\nComputed Tomography (CT)/X-ray registration in image-guided navigation remains challenging because of its stringent requirements for high accuracy and real-time performance. Traditional \"render and compare\" methods, relying on iterative projection and comparison, suffer from spatial information loss and domain gap. 3D reconstruction from biplanar X-rays supplements spatial and shape information for 2D/3D registration, but current methods are limited by dense-view requirements and struggles with noisy X-rays. To address these limitations, we introduce RadGS-Reg, a novel framework for vertebral-level CT/X-ray registration through joint 3D Radiative Gaussians (RadGS) reconstruction and 3D/3D registration. Specifically, our biplanar X-rays vertebral RadGS reconstruction module explores learning-based RadGS reconstruction method with a Counterfactual Attention Learning (CAL) mechanism, focusing on vertebral regions in noisy X-rays. Additionally, a patient-specific pre-training strategy progressively adapts the RadGS-Reg from simulated to real data while simultaneously learning vertebral shape prior knowledge. Experiments on in-house datasets demonstrate the state-of-the-art performance for both tasks, surpassing existing methods.\n\n计算机断层扫描（CT）/X 光配准在图像引导导航中仍然具有挑战性，因为其对高精度和实时性的要求极为严格。传统的“渲染并比较”方法依赖迭代投影与比较，容易出现空间信息丢失和域间差异问题。基于双平面 X 光的三维重建能够为二维/三维配准补充空间与形状信息，但现有方法受制于对密集视角的需求，并且在处理噪声较大的 X 光时表现欠佳。为解决这些问题，我们提出了 **RadGS-Reg**，一个通过联合三维辐射高斯（Radiative Gaussians, RadGS）重建与三维/三维配准实现椎体级 CT/X 光配准的新框架。具体而言，我们的双平面 X 光椎体 RadGS 重建模块探索了一种基于学习的 RadGS 重建方法，并引入 **反事实注意力学习（Counterfactual Attention Learning, CAL）机制**，以在噪声 X 光中更好地聚焦椎体区域。此外，我们设计了患者特定的预训练策略，使 RadGS-Reg 能够在从模拟数据到真实数据的过程中逐步适应，同时学习椎体形状的先验知识。在内部数据集上的实验结果表明，RadGS-Reg 在这两项任务中均实现了最新的性能，超越了现有方法。\n"
  },
  {
    "path": "abs/2508.21344.md",
    "content": "### ARGS: Advanced Regularization on Aligning Gaussians over the Surface\n\nReconstructing high-quality 3D meshes and visuals from 3D Gaussian Splatting(3DGS) still remains a central challenge in computer graphics. Although existing models such as SuGaR offer effective solutions for rendering, there is is still room to improve improve both visual fidelity and scene consistency. This work builds upon SuGaR by introducing two complementary regularization strategies that address common limitations in both the shape of individual Gaussians and the coherence of the overall surface. The first strategy introduces an effective rank regularization, motivated by recent studies on Gaussian primitive structures. This regularization discourages extreme anisotropy-specifically, \"needle-like\" shapes-by favoring more balanced, \"disk-like\" forms that are better suited for stable surface reconstruction. The second strategy integrates a neural Signed Distance Function (SDF) into the optimization process. The SDF is regularized with an Eikonal loss to maintain proper distance properties and provides a continuous global surface prior, guiding Gaussians toward better alignment with the underlying geometry. These two regularizations aim to improve both the fidelity of individual Gaussian primitives and their collective surface behavior. The final model can make more accurate and coherent visuals from 3DGS data.\n\n从三维高斯点绘（3DGS）中重建高质量的三维网格和视觉效果，仍然是计算机图形学中的核心挑战。尽管现有模型（如 SuGaR）在渲染方面提供了有效的解决方案，但在视觉保真度和场景一致性方面仍有提升空间。本工作在 SuGaR 的基础上提出了两种互补的正则化策略，针对单个高斯形状和整体表面连贯性中的常见局限性进行改进。第一种策略是有效秩正则化，灵感来源于近期对高斯原语结构的研究。该正则化通过抑制极端的各向异性，尤其是“针状”形态，而更倾向于平衡的“盘状”形态，从而更适合稳定的表面重建。第二种策略是在优化过程中引入神经有符号距离函数（SDF）。SDF 结合 Eikonal 损失进行正则化，以保持合理的距离性质，并提供连续的全局表面先验，从而引导高斯更好地与潜在几何对齐。这两种正则化旨在同时提升单个高斯原语的保真度及其整体表面表现。最终模型能够从 3DGS 数据中生成更准确、更连贯的视觉效果。\n"
  },
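Both regularizers in ARGS have standard forms that fit in a few lines. The sketch below uses the common definitions: effective rank as the exponential of the entropy of the normalized scale variances, and the Eikonal term on an SDF network's gradient norm. The threshold and any weighting are illustrative, not the paper's exact choices.

```python
import torch

def effective_rank_penalty(scales: torch.Tensor) -> torch.Tensor:
    """Penalize needle-like Gaussians. scales: (N, 3) positive axis lengths.

    Effective rank = exp(entropy of the normalized variance distribution):
    a needle has erank near 1, a disk near 2. We push erank toward >= 2,
    favoring disk-like primitives suited for surface reconstruction.
    """
    p = scales**2 / (scales**2).sum(dim=1, keepdim=True)       # per-axis variance share
    erank = torch.exp(-(p * torch.log(p + 1e-12)).sum(dim=1))  # (N,)
    return torch.relu(2.0 - erank).mean()

def eikonal_loss(sdf_net, pts: torch.Tensor) -> torch.Tensor:
    """Standard Eikonal term: ||grad f(x)|| should equal 1 for a valid SDF."""
    pts = pts.requires_grad_(True)
    f = sdf_net(pts).sum()
    (grad,) = torch.autograd.grad(f, pts, create_graph=True)
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```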
  {
    "path": "abs/2508.21444.md",
    "content": "### Scale-GS: Efficient Scalable Gaussian Splatting via Redundancy-filtering Training on Streaming Content\n\n3D Gaussian Splatting (3DGS) enables high-fidelity real-time rendering, a key requirement for immersive applications. However, the extension of 3DGS to dynamic scenes remains limitations on the substantial data volume of dense Gaussians and the prolonged training time required for each frame. This paper presents Scale-GS, a scalable Gaussian Splatting framework designed for efficient training in streaming tasks. Specifically, Gaussian spheres are hierarchically organized by scale within an anchor-based structure. Coarser-level Gaussians represent the low-resolution structure of the scene, while finer-level Gaussians, responsible for detailed high-fidelity rendering, are selectively activated by the coarser-level Gaussians. To further reduce computational overhead, we introduce a hybrid deformation and spawning strategy that models motion of inter-frame through Gaussian deformation and triggers Gaussian spawning to characterize wide-range motion. Additionally, a bidirectional adaptive masking mechanism enhances training efficiency by removing static regions and prioritizing informative viewpoints. Extensive experiments demonstrate that Scale-GS achieves superior visual quality while significantly reducing training time compared to state-of-the-art methods.\n\n三维高斯点绘（3DGS）支持高保真的实时渲染，这是沉浸式应用的关键需求。然而，将 3DGS 扩展至动态场景仍然面临限制：一方面是密集高斯带来的庞大数据量，另一方面是每帧所需的长时间训练。本文提出了 Scale-GS，这是一种面向流式任务的可扩展高斯点绘框架，旨在实现高效训练。具体而言，高斯球在基于锚点的结构中按照尺度进行分层组织。粗层高斯表示场景的低分辨率结构，而细层高斯负责高保真渲染细节，并由粗层高斯选择性激活。为进一步降低计算开销，我们引入了一种混合变形与生成策略，通过高斯变形来建模帧间运动，并在大范围运动时触发高斯生成。同时，双向自适应掩码机制通过剔除静态区域并优先考虑信息量较高的视角，从而提升训练效率。大量实验证明，Scale-GS在显著缩短训练时间的同时，依然能实现优于现有方法的视觉质量。\n"
  },
  {
    "path": "abs/2508.21542.md",
    "content": "### Complete Gaussian Splats from a Single Image with Denoising Diffusion Models\n\nGaussian splatting typically requires dense observations of the scene and can fail to reconstruct occluded and unobserved areas. We propose a latent diffusion model to reconstruct a complete 3D scene with Gaussian splats, including the occluded parts, from only a single image during inference. Completing the unobserved surfaces of a scene is challenging due to the ambiguity of the plausible surfaces. Conventional methods use a regression-based formulation to predict a single \"mode\" for occluded and out-of-frustum surfaces, leading to blurriness, implausibility, and failure to capture multiple possible explanations. Thus, they often address this problem partially, focusing either on objects isolated from the background, reconstructing only visible surfaces, or failing to extrapolate far from the input views. In contrast, we propose a generative formulation to learn a distribution of 3D representations of Gaussian splats conditioned on a single input image. To address the lack of ground-truth training data, we propose a Variational AutoReconstructor to learn a latent space only from 2D images in a self-supervised manner, over which a diffusion model is trained. Our method generates faithful reconstructions and diverse samples with the ability to complete the occluded surfaces for high-quality 360-degree renderings.\n\n高斯点绘通常需要对场景进行密集观测，但在重建被遮挡或未观测区域时往往失败。我们提出了一种潜在扩散模型，在推理阶段仅需一张图像即可利用高斯点绘重建完整的三维场景，包括被遮挡的部分。补全场景中未观测表面是一个挑战，因为合理表面存在多重歧义。传统方法采用基于回归的建模方式，对被遮挡或视锥外表面预测单一“模式”，这通常导致模糊、不合理，且无法捕捉多种可能性。因此，这类方法往往只能部分解决问题，例如仅针对从背景中分离的物体、只重建可见表面，或无法从输入视角进行远距离外推。相比之下，我们提出了一种生成式建模方法，学习在单张输入图像条件下高斯点绘的三维表示分布。为应对缺乏真实训练数据的问题，我们提出了一种变分自编码重建器（Variational AutoReconstructor），仅利用二维图像以自监督方式学习潜在空间，并在其上训练扩散模型。我们的方法能够生成真实可信的重建结果和多样化样本，并具备补全遮挡表面的能力，从而实现高质量的全景（360 度）渲染。\n"
  },
  {
    "path": "abs/2509.00433.md",
    "content": "### AGS: Accelerating 3D Gaussian Splatting SLAM via CODEC-Assisted Frame Covisibility Detection\n\nSimultaneous Localization and Mapping (SLAM) is a critical task that enables autonomous vehicles to construct maps and localize themselves in unknown environments. Recent breakthroughs combine SLAM with 3D Gaussian Splatting (3DGS) to achieve exceptional reconstruction fidelity. However, existing 3DGS-SLAM systems provide insufficient throughput due to the need for multiple training iterations per frame and the vast number of Gaussians.\nIn this paper, we propose AGS, an algorithm-hardware co-design framework to boost the efficiency of 3DGS-SLAM based on the intuition that SLAM systems process frames in a streaming manner, where adjacent frames exhibit high similarity that can be utilized for acceleration. On the software level: 1) We propose a coarse-then-fine-grained pose tracking method with respect to the robot's movement. 2) We avoid redundant computations of Gaussians by sharing their contribution information across frames. On the hardware level, we propose a frame covisibility detection engine to extract intermediate data from the video CODEC. We also implement a pose tracking engine and a mapping engine with workload schedulers to efficiently deploy the AGS algorithm. Our evaluation shows that AGS achieves up to 17.12×, 6.71×, and 5.41× speedups against the mobile and high-end GPUs, and a state-of-the-art 3DGS accelerator, GSCore.\n\n同时定位与建图（SLAM）是一项关键任务，使得自动驾驶车辆能够在未知环境中构建地图并实现自我定位。近期的突破性进展将 SLAM 与三维高斯点绘（3DGS）相结合，从而实现了卓越的重建精度。然而，现有的 3DGS-SLAM 系统由于每帧需要多次训练迭代以及高斯数量庞大，导致吞吐量不足。\n本文提出了 AGS，这是一种算法-硬件协同设计的框架，旨在提升 3DGS-SLAM 的效率。该框架的核心直觉在于 SLAM 系统以流式方式处理帧，且相邻帧之间存在较高相似性，可以利用这一特性进行加速。在软件层面：1）我们提出了一种粗到细的位姿跟踪方法，以适应机器人运动；2）通过在跨帧共享高斯的贡献信息来避免冗余计算。在硬件层面，我们提出了一个帧共视检测引擎，用于从视频编解码器中提取中间数据。同时，我们还实现了一个位姿跟踪引擎和一个带有任务调度器的建图引擎，以高效部署 AGS 算法。实验评估表明，AGS 相较于移动端 GPU、高端 GPU 以及最先进的 3DGS 加速器 GSCore，分别实现了高达 17.12×、6.71× 和 5.41× 的加速效果。\n"
  },
  {
    "path": "abs/2509.00757.md",
    "content": "### MarkSplatter: Generalizable Watermarking for 3D Gaussian Splatting Model via Splatter Image Structure\n\nThe growing popularity of 3D Gaussian Splatting (3DGS) has intensified the need for effective copyright protection. Current 3DGS watermarking methods rely on computationally expensive fine-tuning procedures for each predefined message. We propose the first generalizable watermarking framework that enables efficient protection of Splatter Image-based 3DGS models through a single forward pass. We introduce GaussianBridge that transforms unstructured 3D Gaussians into Splatter Image format, enabling direct neural processing for arbitrary message embedding. To ensure imperceptibility, we design a Gaussian-Uncertainty-Perceptual heatmap prediction strategy for preserving visual quality. For robust message recovery, we develop a dense segmentation-based extraction mechanism that maintains reliable extraction even when watermarked objects occupy minimal regions in rendered views.\n\n随着三维高斯溅射（3DGS）的日益普及，有效的版权保护需求愈发迫切。现有的3DGS水印方法依赖于对每个预定义信息进行计算代价高昂的微调过程。我们提出了首个可泛化的水印框架，通过单次前向传播即可高效保护基于溅射图像的3DGS模型。我们引入GaussianBridge，将非结构化的三维高斯转换为溅射图像格式，从而支持对任意信息的直接神经嵌入。为确保不可感知性，我们设计了一种基于高斯不确定性感知热图预测的策略，以保持视觉质量。为实现稳健的信息提取，我们开发了一种基于密集分割的提取机制，即使在带水印对象仅占据渲染视图的极小区域时，也能保持可靠的提取效果。\n"
  },
  {
    "path": "abs/2509.00800.md",
    "content": "### SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting\n\nAccurate 3D reconstruction in underwater environments remains a complex challenge due to issues such as light distortion, turbidity, and limited visibility. AI-based techniques have been applied to address these issues, however, existing methods have yet to fully exploit the potential of AI, particularly in integrating language models with visual processing. In this paper, we propose a novel framework that leverages multimodal cross-knowledge to create semantic-guided 3D Gaussian Splatting for robust and high-fidelity deep-sea scene reconstruction. By embedding an extra semantic feature into each Gaussian primitive and supervised by the CLIP extracted semantic feature, our method enforces semantic and structural awareness throughout the training. The dedicated semantic consistency loss ensures alignment with high-level scene understanding. Besides, we propose a novel stage-wise training strategy, combining coarse-to-fine learning with late-stage parameter refinement, to further enhance both stability and reconstruction quality. Extensive results show that our approach consistently outperforms state-of-the-art methods on SeaThru-NeRF and Submerged3D datasets across three metrics, with an improvement of up to 3.09 dB on average in terms of PSNR, making it a strong candidate for applications in underwater exploration and marine perception.\n\n在水下环境中实现精确的三维重建仍然是一项复杂的挑战，其原因在于光学畸变、浑浊度以及可见度受限等问题。尽管已有基于人工智能的技术尝试解决这些问题，但现有方法尚未充分发挥AI的潜力，尤其是在将语言模型与视觉处理相结合方面。在本文中，我们提出了一种新的框架，利用多模态跨知识构建语义引导的三维高斯溅射，以实现稳健且高保真的深海场景重建。通过在每个高斯基元中嵌入额外的语义特征，并在CLIP提取的语义特征监督下，我们的方法在训练过程中强化了语义与结构感知。专门设计的语义一致性损失确保了与高层次场景理解的对齐。此外，我们提出了一种新的分阶段训练策略，将由粗到细的学习与后期参数优化相结合，以进一步提升稳定性与重建质量。大量实验结果表明，我们的方法在SeaThru-NeRF和Submerged3D数据集的三个指标上均稳定优于现有最先进的方法，平均在PSNR上提升最高可达3.09 dB，使其成为水下探测与海洋感知应用中的有力候选方案。\n"
  },
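The semantic consistency loss is plausibly a cosine-style alignment between a rendered semantic feature map and the CLIP-derived target; a minimal sketch, with assumed tensor layout:

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(rendered_sem: torch.Tensor,
                              clip_sem: torch.Tensor) -> torch.Tensor:
    """Cosine-distance alignment between a rendered per-pixel semantic feature
    map (B, D, H, W) and CLIP-extracted target features of the same shape."""
    return (1.0 - F.cosine_similarity(rendered_sem, clip_sem, dim=1)).mean()
```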
  {
    "path": "abs/2509.00831.md",
    "content": "### UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring\n\nReconstructing dynamic 3D scenes from monocular video has broad applications in AR/VR, robotics, and autonomous navigation, but often fails due to severe motion blur caused by camera and object motion. Existing methods commonly follow a two-step pipeline, where camera poses are first estimated and then 3D Gaussians are optimized. Since blurring artifacts usually undermine pose estimation, pose errors could be accumulated to produce inferior reconstruction results. To address this issue, we introduce a unified optimization framework by incorporating camera poses as learnable parameters complementary to 3DGS attributes for end-to-end optimization. Specifically, we recast camera and object motion as per-primitive SE(3) affine transformations on 3D Gaussians and formulate a unified optimization objective. For stable optimization, we introduce a three-stage training schedule that optimizes camera poses and Gaussians alternatively. Particularly, 3D Gaussians are first trained with poses being fixed, and then poses are optimized with 3D Gaussians being untouched. Finally, all learnable parameters are optimized together. Extensive experiments on the Stereo Blur dataset and challenging real-world sequences demonstrate that our method achieves significant gains in reconstruction quality and pose estimation accuracy over prior dynamic deblurring methods.\n\n从单目视频中重建动态三维场景在AR/VR、机器人和自动导航中具有广泛应用，但常因相机和物体运动导致的严重运动模糊而失败。现有方法通常遵循两步流程：先估计相机位姿，再优化三维高斯。由于模糊伪影常常破坏位姿估计，位姿误差会不断累积，最终导致较差的重建结果。为解决这一问题，我们提出了一个统一优化框架，将相机位姿作为可学习参数，与3DGS属性互补，实现端到端优化。具体而言，我们将相机和物体运动重新表述为三维高斯上的逐基元SE(3)仿射变换，并构建统一的优化目标。为保证优化的稳定性，我们设计了一个三阶段训练策略，交替优化相机位姿和高斯：首先在固定位姿的情况下训练三维高斯，然后在保持三维高斯不变的情况下优化位姿，最后联合优化所有可学习参数。大量在Stereo Blur数据集和具有挑战性的真实序列上的实验表明，我们的方法在重建质量和位姿估计精度方面较现有动态去模糊方法取得了显著提升。\n"
  },
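Recasting camera and object motion as per-primitive SE(3) transforms means each Gaussian's mean and orientation are moved by a learnable rigid transform. A self-contained sketch using axis-angle parameters (Rodrigues' formula) follows; the parameterization is an assumption, since the abstract does not fix one.

```python
import torch

def axis_angle_to_matrix(aa: torch.Tensor) -> torch.Tensor:
    """Rodrigues' formula: (N, 3) axis-angle -> (N, 3, 3) rotation matrices."""
    theta = aa.norm(dim=-1, keepdim=True).clamp(min=1e-8)  # (N, 1)
    k = aa / theta                                          # unit rotation axes
    K = torch.zeros(aa.shape[0], 3, 3, device=aa.device)    # skew matrices
    K[:, 0, 1], K[:, 0, 2] = -k[:, 2], k[:, 1]
    K[:, 1, 0], K[:, 1, 2] = k[:, 2], -k[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -k[:, 1], k[:, 0]
    s = torch.sin(theta)[..., None]                          # (N, 1, 1)
    c = torch.cos(theta)[..., None]
    eye = torch.eye(3, device=aa.device).expand_as(K)
    return eye + s * K + (1 - c) * (K @ K)                   # R = I + sin K + (1-cos) K^2

def apply_se3(means, rots, aa, trans):
    """Move Gaussian means (N, 3) and orientations (N, 3, 3) by learnable
    per-primitive SE(3) parameters: axis-angle aa (N, 3), translation (N, 3)."""
    R = axis_angle_to_matrix(aa)
    new_means = torch.einsum('nij,nj->ni', R, means) + trans
    new_rots = R @ rots                                      # rotate orientations too
    return new_means, new_rots
```

Because `aa` and `trans` are plain tensors, they can be registered as `nn.Parameter`s alongside the 3DGS attributes and optimized end to end, which is the point of the unified objective.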
  {
    "path": "abs/2509.00911.md",
    "content": "### GS-TG: 3D Gaussian Splatting Accelerator with Tile Grouping for Reducing Redundant Sorting while Preserving Rasterization Efficiency\n\n3D Gaussian Splatting (3D-GS) has emerged as a promising alternative to neural radiance fields (NeRF) as it offers high speed as well as high image quality in novel view synthesis. Despite these advancements, 3D-GS still struggles to meet the frames per second (FPS) demands of real-time applications. In this paper, we introduce GS-TG, a tile-grouping-based accelerator that enhances 3D-GS rendering speed by reducing redundant sorting operations and preserving rasterization efficiency. GS-TG addresses a critical trade-off issue in 3D-GS rendering: increasing the tile size effectively reduces redundant sorting operations, but it concurrently increases unnecessary rasterization computations. So, during sorting of the proposed approach, GS-TG groups small tiles (for making large tiles) to share sorting operations across tiles within each group, significantly reducing redundant computations. During rasterization, a bitmask assigned to each Gaussian identifies relevant small tiles, to enable efficient sharing of sorting results. Consequently, GS-TG enables sorting to be performed as if a large tile size is used by grouping tiles during the sorting stage, while allowing rasterization to proceed with the original small tiles by using bitmasks in the rasterization stage. GS-TG is a lossless method requiring no retraining or fine-tuning and it can be seamlessly integrated with previous 3D-GS optimization techniques. Experimental results show that GS-TG achieves an average speed-up of 1.54 times over state-of-the-art 3D-GS accelerators.\n\n三维高斯溅射（3D-GS）作为神经辐射场（NeRF）的有前景替代方案，在新视角合成中兼具高速和高图像质量。然而，尽管已有诸多进展，3D-GS仍难以满足实时应用对帧率（FPS）的需求。本文提出了一种基于瓦片分组的加速器GS-TG，通过减少冗余排序操作并保持光栅化效率来提升3D-GS的渲染速度。GS-TG解决了3D-GS渲染中的一个关键权衡问题：增大瓦片尺寸能够有效减少冗余排序，但同时会增加不必要的光栅化计算。为此，在排序阶段，GS-TG将小瓦片分组成大瓦片，使组内瓦片共享排序操作，从而显著减少冗余计算；在光栅化阶段，通过为每个高斯分配一个位掩码来标识其相关的小瓦片，从而高效共享排序结果。因此，GS-TG在排序阶段实现了如同使用大瓦片的效果，而在光栅化阶段则通过位掩码仍能基于原始小瓦片进行处理。GS-TG是一种无损方法，无需重新训练或微调，并可无缝集成到已有的3D-GS优化技术中。实验结果表明，GS-TG在最新的3D-GS加速方法上实现了平均1.54倍的加速。\n"
  },
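The grouping-plus-bitmask bookkeeping can be illustrated in plain Python. The real design is a hardware accelerator, so this only shows the data structure: sorting runs once per large tile (a group of small tiles), and each Gaussian carries a bitmask of the small tiles its screen-space footprint actually touches.

```python
def group_and_mask(bboxes, tile=16, group=4):
    """Toy version of GS-TG's bookkeeping.

    bboxes: per-Gaussian screen-space boxes (x0, y0, x1, y1), assumed clipped
    to the screen. Returns, for each large tile (a group x group block of
    small tiles), the Gaussians touching it plus a bitmask over its small
    tiles. Sorting then runs once per large tile; rasterizing a small tile
    keeps only the Gaussians whose bit is set.
    """
    big = tile * group
    per_big = {}  # (bx, by) -> list of (gaussian_id, bitmask)
    for gid, (x0, y0, x1, y1) in enumerate(bboxes):
        for by in range(int(y0) // big, int(y1) // big + 1):
            for bx in range(int(x0) // big, int(x1) // big + 1):
                mask = 0
                for sy in range(group):          # small tiles inside this large tile
                    for sx in range(group):
                        tx, ty = bx * big + sx * tile, by * big + sy * tile
                        if x0 < tx + tile and x1 > tx and y0 < ty + tile and y1 > ty:
                            mask |= 1 << (sy * group + sx)
                if mask:
                    per_big.setdefault((bx, by), []).append((gid, mask))
    return per_big
```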
  {
    "path": "abs/2509.00989.md",
    "content": "### Towards Integrating Multi-Spectral Imaging with Gaussian Splatting\n\nWe present a study of how to integrate color (RGB) and multi-spectral imagery (red, green, red-edge, and near-infrared) into the 3D Gaussian Splatting (3DGS) framework, a state-of-the-art explicit radiance-field-based method for fast and high-fidelity 3D reconstruction from multi-view images. While 3DGS excels on RGB data, naive per-band optimization of additional spectra yields poor reconstructions due to inconsistently appearing geometry in the spectral domain. This problem is prominent, even though the actual geometry is the same, regardless of spectral modality. To investigate this, we evaluate three strategies: 1) Separate per-band reconstruction with no shared structure. 2) Splitting optimization, in which we first optimize RGB geometry, copy it, and then fit each new band to the model by optimizing both geometry and band representation. 3) Joint, in which the modalities are jointly optimized, optionally with an initial RGB-only phase. We showcase through quantitative metrics and qualitative novel-view renderings on multi-spectral datasets the effectiveness of our dedicated optimized Joint strategy, increasing overall spectral reconstruction as well as enhancing RGB results through spectral cross-talk. We therefore suggest integrating multi-spectral data directly into the spherical harmonics color components to compactly model each Gaussian's multi-spectral reflectance. Moreover, our analysis reveals several key trade-offs in when and how to introduce spectral bands during optimization, offering practical insights for robust multi-modal 3DGS reconstruction.\n\n我们研究了如何将彩色（RGB）和多光谱图像（红光、绿光、红边、近红外）集成到三维高斯溅射（3DGS）框架中。3DGS是一种基于显式辐射场的最新方法，可实现快速且高保真的多视图三维重建。虽然3DGS在RGB数据上表现优异，但对额外光谱进行逐波段的简单优化会导致较差的重建效果，这是由于光谱域中几何结构表现不一致所致。即便实际几何在不同光谱模态下是相同的，这一问题依然突出。为此，我们评估了三种策略：1）逐波段独立重建，不共享结构；2）分裂优化，先优化RGB几何，再复制并通过同时优化几何与光谱表示来拟合每个新波段；3）联合优化，所有模态同时优化，可选地以仅RGB的阶段作为起点。通过在多光谱数据集上的定量指标和新视角渲染结果，我们展示了专门优化的联合策略的有效性，不仅提升了整体光谱重建效果，还通过光谱交互增强了RGB结果。因此，我们建议将多光谱数据直接集成到球谐函数的颜色分量中，以紧凑建模每个高斯的多光谱反射率。此外，我们的分析揭示了在优化过程中何时以及如何引入光谱波段的若干关键权衡，为稳健的多模态3DGS重建提供了实用见解。\n"
  },
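A minimal sketch of the suggested representation, storing one spherical-harmonics channel per spectral band instead of three RGB channels; the band list is taken from the abstract, while the Gaussian count is illustrative (degree-3 SH with 16 coefficients matches standard 3DGS):

```python
# Per-Gaussian SH color coefficients extended from 3 RGB channels to one
# channel per spectral band, so a single geometry carries all modalities.
import numpy as np

BANDS = ["R", "G", "B", "red_edge", "NIR"]
N_SH = 16                                   # degree-3 SH, as in standard 3DGS

rng = np.random.default_rng(0)
sh = rng.normal(0, 0.01, size=(100_000, N_SH, len(BANDS)))  # per-Gaussian coeffs

def eval_bands(sh_coeffs, sh_basis):
    """sh_coeffs: (N_SH, bands); sh_basis: (N_SH,) SH basis at a view dir."""
    return sh_basis @ sh_coeffs             # -> (bands,) reflectance values

basis = rng.normal(size=N_SH)               # stand-in for real SH basis values
print(dict(zip(BANDS, eval_bands(sh[0], basis))))
```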
  {
    "path": "abs/2509.01469.md",
    "content": "### Im2Haircut: Single-view Strand-based Hair Reconstruction for Human Avatars\n\nWe present a novel approach for 3D hair reconstruction from single photographs based on a global hair prior combined with local optimization. Capturing strand-based hair geometry from single photographs is challenging due to the variety and geometric complexity of hairstyles and the lack of ground truth training data. Classical reconstruction methods like multi-view stereo only reconstruct the visible hair strands, missing the inner structure of hairstyles and hampering realistic hair simulation. To address this, existing methods leverage hairstyle priors trained on synthetic data. Such data, however, is limited in both quantity and quality since it requires manual work from skilled artists to model the 3D hairstyles and create near-photorealistic renderings. To address this, we propose a novel approach that uses both, real and synthetic data to learn an effective hairstyle prior. Specifically, we train a transformer-based prior model on synthetic data to obtain knowledge of the internal hairstyle geometry and introduce real data in the learning process to model the outer structure. This training scheme is able to model the visible hair strands depicted in an input image, while preserving the general 3D structure of hairstyles. We exploit this prior to create a Gaussian-splatting-based reconstruction method that creates hairstyles from one or more images. Qualitative and quantitative comparisons with existing reconstruction pipelines demonstrate the effectiveness and superior performance of our method for capturing detailed hair orientation, overall silhouette, and backside consistency.\n\n我们提出了一种基于全局发型先验与局部优化相结合的从单张照片进行三维头发重建的新方法。从单张照片中捕捉基于发丝的头发几何结构极具挑战性，这源于发型的多样性与几何复杂性，以及缺乏真实的训练数据。传统的重建方法（如多视角立体重建）只能重建可见的发丝，忽略了发型的内部结构，从而阻碍了真实感头发模拟。为了解决这一问题，现有方法通常利用在合成数据上训练的发型先验。然而，这类数据在数量和质量上都存在局限性，因为其需要专业艺术家手工建模三维发型并制作近乎照片级的渲染。针对这一问题，我们提出了一种新方法，结合真实数据和合成数据共同学习有效的发型先验。具体而言，我们在合成数据上训练一个基于Transformer的先验模型，以获取发型内部几何结构知识，并在训练过程中引入真实数据来建模外部结构。该训练方案能够对输入图像中可见的发丝进行建模，同时保持发型整体的三维结构。我们进一步利用这一先验，提出了一种基于高斯溅射的重建方法，可从一张或多张图像生成发型。与现有重建流程的定性与定量比较表明，我们的方法在捕捉头发细节方向、整体轮廓以及背面一致性方面具有有效性和优越性能。\n"
  },
  {
    "path": "abs/2509.01547.md",
    "content": "### FGO-SLAM: Enhancing Gaussian SLAM with Globally Consistent Opacity Radiance Field\n\nVisual SLAM has regained attention due to its ability to provide perceptual capabilities and simulation test data for Embodied AI. However, traditional SLAM methods struggle to meet the demands of high-quality scene reconstruction, and Gaussian SLAM systems, despite their rapid rendering and high-quality mapping capabilities, lack effective pose optimization methods and face challenges in geometric reconstruction. To address these issues, we introduce FGO-SLAM, a Gaussian SLAM system that employs an opacity radiance field as the scene representation to enhance geometric mapping performance. After initial pose estimation, we apply global adjustment to optimize camera poses and sparse point cloud, ensuring robust tracking of our approach. Additionally, we maintain a globally consistent opacity radiance field based on 3D Gaussians and introduce depth distortion and normal consistency terms to refine the scene representation. Furthermore, after constructing tetrahedral grids, we identify level sets to directly extract surfaces from 3D Gaussians. Results across various real-world and large-scale synthetic datasets demonstrate that our method achieves state-of-the-art tracking accuracy and mapping performance.\n\n视觉SLAM因其在赋能智能体AI中的感知能力和提供仿真测试数据的作用而重新受到关注。然而，传统的SLAM方法难以满足高质量场景重建的需求，而高斯SLAM系统虽然具备快速渲染和高质量建图能力，却缺乏有效的位姿优化方法，并在几何重建方面面临挑战。为解决这些问题，我们提出了FGO-SLAM，这是一种基于高斯的SLAM系统，采用不透明辐射场作为场景表示以增强几何建图性能。在初始位姿估计后，我们通过全局调整优化相机位姿和稀疏点云，从而确保方法的稳健跟踪。此外，我们基于三维高斯维护全局一致的不透明辐射场，并引入深度畸变和法向一致性项以进一步优化场景表示。同时，在构建四面体网格后，我们识别水平集以直接从三维高斯中提取表面。在多个真实场景和大规模合成数据集上的实验结果表明，我们的方法在跟踪精度和建图性能方面均达到了最新的先进水平。\n"
  },
  {
    "path": "abs/2509.01964.md",
    "content": "### 2D Gaussian Splatting with Semantic Alignment for Image Inpainting\n\nGaussian Splatting (GS), a recent technique for converting discrete points into continuous spatial representations, has shown promising results in 3D scene modeling and 2D image super-resolution. In this paper, we explore its untapped potential for image inpainting, which demands both locally coherent pixel synthesis and globally consistent semantic restoration. We propose the first image inpainting framework based on 2D Gaussian Splatting, which encodes incomplete images into a continuous field of 2D Gaussian splat coefficients and reconstructs the final image via a differentiable rasterization process. The continuous rendering paradigm of GS inherently promotes pixel-level coherence in the inpainted results. To improve efficiency and scalability, we introduce a patch-wise rasterization strategy that reduces memory overhead and accelerates inference. For global semantic consistency, we incorporate features from a pretrained DINO model. We observe that DINO's global features are naturally robust to small missing regions and can be effectively adapted to guide semantic alignment in large-mask scenarios, ensuring that the inpainted content remains contextually consistent with the surrounding scene. Extensive experiments on standard benchmarks demonstrate that our method achieves competitive performance in both quantitative metrics and perceptual quality, establishing a new direction for applying Gaussian Splatting to 2D image processing.\n\n高斯溅射（GS）是一种将离散点转换为连续空间表示的新技术，已在三维场景建模和二维图像超分辨率任务中展现出良好效果。本文探索了其在图像修复中的潜力，该任务既要求局部像素生成的连贯性，也要求全局语义的统一性。我们提出了首个基于二维高斯溅射的图像修复框架，将不完整图像编码为二维高斯溅射系数的连续场，并通过可微分的光栅化过程重建最终图像。GS的连续渲染范式天然促进了修复结果在像素级的连贯性。为提高效率和可扩展性，我们引入了一种分块光栅化策略，降低了内存开销并加速推理。为了保持全局语义一致性，我们结合了预训练的DINO模型特征。我们观察到，DINO的全局特征对小缺失区域具有天然的鲁棒性，并能有效适应大遮罩场景中的语义对齐，从而确保修复内容与周围场景在语境上保持一致。大量在标准基准上的实验表明，我们的方法在定量指标和感知质量方面均取得了有竞争力的性能，为高斯溅射在二维图像处理中的应用开辟了新的方向。\n"
  },
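The patch-wise rasterization idea can be sketched in a few lines (our toy illustration, not the paper's renderer): evaluate the 2D Gaussian field only over one patch at a time so memory scales with the patch, not the image; the additive compositing below is a simplification.

```python
# Toy patch-wise 2D Gaussian rasterizer: each patch is rendered independently,
# keeping peak memory proportional to the patch size.
import numpy as np

def splat_patch(means, covs_inv, colors, y0, x0, P=32):
    """Rasterize 2D Gaussians into one P x P patch (additive toy compositing).
    means: (N,2) pixel centers; covs_inv: (N,2,2); colors: (N,3)."""
    ys, xs = np.mgrid[y0:y0 + P, x0:x0 + P]
    out = np.zeros((P, P, 3))
    for mu, Si, c in zip(means, covs_inv, colors):
        d = np.stack([xs - mu[0], ys - mu[1]], axis=-1)          # (P,P,2)
        w = np.exp(-0.5 * np.einsum("...i,ij,...j->...", d, Si, d))
        out += w[..., None] * c
    return out

patch = splat_patch(np.array([[16.0, 16.0]]), np.array([np.eye(2) * 0.05]),
                    np.array([[1.0, 0.5, 0.2]]), 0, 0)
print(patch.shape, patch.max())
```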
  {
    "path": "abs/2509.02141.md",
    "content": "### GRMM: Real-Time High-Fidelity Gaussian Morphable Head Model with Learned Residuals\n\n3D Morphable Models (3DMMs) enable controllable facial geometry and expression editing for reconstruction, animation, and AR/VR, but traditional PCA-based mesh models are limited in resolution, detail, and photorealism. Neural volumetric methods improve realism but remain too slow for interactive use. Recent Gaussian Splatting (3DGS) based facial models achieve fast, high-quality rendering but still depend solely on a mesh-based 3DMM prior for expression control, limiting their ability to capture fine-grained geometry, expressions, and full-head coverage. We introduce GRMM, the first full-head Gaussian 3D morphable model that augments a base 3DMM with residual geometry and appearance components, additive refinements that recover high-frequency details such as wrinkles, fine skin texture, and hairline variations. GRMM provides disentangled control through low-dimensional, interpretable parameters (e.g., identity shape, facial expressions) while separately modelling residuals that capture subject- and expression-specific detail beyond the base model's capacity. Coarse decoders produce vertex-level mesh deformations, fine decoders represent per-Gaussian appearance, and a lightweight CNN refines rasterised images for enhanced realism, all while maintaining 75 FPS real-time rendering. To learn consistent, high-fidelity residuals, we present EXPRESS-50, the first dataset with 60 aligned expressions across 50 identities, enabling robust disentanglement of identity and expression in Gaussian-based 3DMMs. Across monocular 3D face reconstruction, novel-view synthesis, and expression transfer, GRMM surpasses state-of-the-art methods in fidelity and expression accuracy while delivering interactive real-time performance.\n\n三维可变形模型（3DMM）能够实现可控的人脸几何与表情编辑，广泛应用于重建、动画和AR/VR，但传统基于PCA的网格模型在分辨率、细节和真实感方面存在局限。神经体积方法虽然提升了真实感，但速度过慢，难以满足交互式应用需求。近年来基于高斯溅射（3DGS）的人脸模型实现了快速且高质量的渲染，但仍完全依赖基于网格的3DMM先验进行表情控制，限制了其对细粒度几何、表情以及全头覆盖的捕捉能力。为此，我们提出了GRMM，这是首个全头高斯三维可变形模型，在基础3DMM上增添残差几何和外观组件，通过加性修复恢复诸如皱纹、精细皮肤纹理和发际线变化等高频细节。GRMM通过低维、可解释的参数（如身份形状、面部表情）提供解耦控制，同时单独建模残差，以捕捉超出基础模型能力的个体与表情特定细节。粗解码器生成顶点级网格变形，细解码器表示逐高斯的外观，一个轻量级CNN对光栅化图像进行细化以增强真实感，同时保持75 FPS的实时渲染。为学习一致且高保真的残差，我们提出了EXPRESS-50，这是首个包含50个身份、每个身份对齐60种表情的数据集，从而实现了高斯3DMM中身份与表情的稳健解耦。在单目三维人脸重建、新视角合成和表情迁移等任务上，GRMM在保真度和表情精度方面均超越现有最先进方法，并实现了交互式实时性能。\n"
  },
  {
    "path": "abs/2509.02232.md",
    "content": "### Efficient Geometry Compression and Communication for 3D Gaussian Splatting Point Clouds\n\nStorage and transmission challenges in dynamic 3D scene representation based on the i3DV platform, With increasing scene complexity, the explosive growth of 3D Gaussian data volume causes excessive storage space occupancy. To address this issue, we propose adopting the AVS PCRM reference software for efficient compression of Gaussian point cloud geometry data. The strategy deeply integrates the advanced encoding capabilities of AVS PCRM into the i3DV platform, forming technical complementarity with the original rate-distortion optimization mechanism based on binary hash tables. On one hand, the hash table efficiently caches inter-frame Gaussian point transformation relationships, which allows for high-fidelity transmission within a 40 Mbps bandwidth constraint. On the other hand, AVS PCRM performs precise compression on geometry data. Experimental results demonstrate that the joint framework maintains the advantages of fast rendering and high-quality synthesis in 3D Gaussian technology while achieving significant 10%-25% bitrate savings on universal test sets. It provides a superior rate-distortion tradeoff solution for the storage, transmission, and interaction of 3D volumetric video.\n\n基于i3DV平台的动态三维场景表示在存储与传输方面面临挑战。随着场景复杂度的提升，三维高斯数据体量呈爆炸式增长，导致存储空间占用过大。为解决这一问题，我们提出采用AVS PCRM参考软件，对高斯点云几何数据进行高效压缩。该策略将AVS PCRM的先进编码能力深度融入i3DV平台，与原有基于二进制哈希表的率失真优化机制形成技术互补。一方面，哈希表能够高效缓存帧间高斯点的变换关系，使其在40 Mbps带宽约束下实现高保真传输；另一方面，AVS PCRM对几何数据进行精确压缩。实验结果表明，该联合框架在保持三维高斯技术快速渲染与高质量合成优势的同时，在通用测试集上实现了显著的10%-25%的码率节省。该方法为三维体视频的存储、传输与交互提供了一种更优的率失真折中方案。\n"
  },
  {
    "path": "abs/2509.03775.md",
    "content": "### ContraGS: Codebook-Condensed and Trainable Gaussian Splatting for Fast, Memory-Efficient Reconstruction\n\n3D Gaussian Splatting (3DGS) is a state-of-art technique to model real-world scenes with high quality and real-time rendering. Typically, a higher quality representation can be achieved by using a large number of 3D Gaussians. However, using large 3D Gaussian counts significantly increases the GPU device memory for storing model parameters. A large model thus requires powerful GPUs with high memory capacities for training and has slower training/rendering latencies due to the inefficiencies of memory access and data movement. In this work, we introduce ContraGS, a method to enable training directly on compressed 3DGS representations without reducing the Gaussian Counts, and thus with a little loss in model quality. ContraGS leverages codebooks to compactly store a set of Gaussian parameter vectors throughout the training process, thereby significantly reducing memory consumption. While codebooks have been demonstrated to be highly effective at compressing fully trained 3DGS models, directly training using codebook representations is an unsolved challenge. ContraGS solves the problem of learning non-differentiable parameters in codebook-compressed representations by posing parameter estimation as a Bayesian inference problem. To this end, ContraGS provides a framework that effectively uses MCMC sampling to sample over a posterior distribution of these compressed representations. With ContraGS, we demonstrate that ContraGS significantly reduces the peak memory during training (on average 3.49X) and accelerated training and rendering (1.36X and 1.88X on average, respectively), while retraining close to state-of-art quality.\n\n三维高斯溅射（3DGS）是一种最先进的技术，可用于高质量、实时渲染的真实场景建模。通常，使用更多的三维高斯可以获得更高质量的表示。然而，大量高斯的使用会显著增加用于存储模型参数的GPU显存需求。如此庞大的模型不仅需要大显存的高性能GPU进行训练，还会因内存访问和数据传输效率低下而导致训练与渲染延迟增加。在本研究中，我们提出了ContraGS，这是一种能够直接在压缩的3DGS表示上进行训练的方法，无需减少高斯数量，从而仅带来极小的质量损失。ContraGS利用码本在整个训练过程中紧凑地存储一组高斯参数向量，从而显著降低内存消耗。尽管码本已被证明在压缩已完全训练的3DGS模型中非常有效，但直接基于码本表示进行训练仍是一个未解决的挑战。ContraGS通过将参数估计视为贝叶斯推理问题，解决了在码本压缩表示中学习不可微参数的难题。为此，ContraGS提供了一个框架，有效利用MCMC采样在这些压缩表示的后验分布上进行采样。实验表明，ContraGS在训练过程中显著降低了峰值内存（平均减少3.49倍），并加速了训练和渲染（平均加速1.36倍和1.88倍），同时保持接近最先进的重建质量。\n"
  },
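A hedged sketch of the core idea: a small dense codebook stays trainable by ordinary gradient descent, while the discrete per-Gaussian code indices are sampled (here with a toy Metropolis step) rather than differentiated. The loss and shapes below are stand-ins, not the paper's formulation.

```python
# Sampling discrete codebook indices instead of differentiating through them.
import numpy as np

rng = np.random.default_rng(0)
K, D, N = 256, 8, 10_000                 # codebook size, param dim, Gaussians
codebook = rng.normal(size=(K, D))       # dense, trained by gradient descent
idx = rng.integers(0, K, size=N)         # per-Gaussian code index (discrete)
target = rng.normal(size=(N, D))         # toy stand-in for the rendering loss

def loss(i, n):                          # negative log-likelihood for Gaussian n
    return np.sum((codebook[i] - target[n]) ** 2)

# One Metropolis sweep over the discrete indices (posterior sampling):
for n in range(N):
    prop = rng.integers(0, K)
    if rng.random() < np.exp(loss(idx[n], n) - loss(prop, n)):
        idx[n] = prop                    # accept moves toward lower loss
```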
  {
    "path": "abs/2509.04379.md",
    "content": "### SSGaussian: Semantic-Aware and Structure-Preserving 3D Style Transfer\n\nRecent advancements in neural representations, such as Neural Radiance Fields and 3D Gaussian Splatting, have increased interest in applying style transfer to 3D scenes. While existing methods can transfer style patterns onto 3D-consistent neural representations, they struggle to effectively extract and transfer high-level style semantics from the reference style image. Additionally, the stylized results often lack structural clarity and separation, making it difficult to distinguish between different instances or objects within the 3D scene. To address these limitations, we propose a novel 3D style transfer pipeline that effectively integrates prior knowledge from pretrained 2D diffusion models. Our pipeline consists of two key stages: First, we leverage diffusion priors to generate stylized renderings of key viewpoints. Then, we transfer the stylized key views onto the 3D representation. This process incorporates two innovative designs. The first is cross-view style alignment, which inserts cross-view attention into the last upsampling block of the UNet, allowing feature interactions across multiple key views. This ensures that the diffusion model generates stylized key views that maintain both style fidelity and instance-level consistency. The second is instance-level style transfer, which effectively leverages instance-level consistency across stylized key views and transfers it onto the 3D representation. This results in a more structured, visually coherent, and artistically enriched stylization. Extensive qualitative and quantitative experiments demonstrate that our 3D style transfer pipeline significantly outperforms state-of-the-art methods across a wide range of scenes, from forward-facing to challenging 360-degree environments.\n\n近年来，随着神经表示（如Neural Radiance Fields和3D Gaussian Splatting）的快速发展，人们对三维场景风格迁移的研究兴趣日益增长。现有方法虽然能够将风格图案迁移到三维一致的神经表示上，但在从参考风格图像中有效提取并传递高层次风格语义方面仍存在困难。此外，生成的风格化结果常常缺乏结构清晰度和层次分离，难以在三维场景中区分不同的实例或物体。为了解决这些问题，我们提出了一种新颖的三维风格迁移管线，能够有效整合来自预训练二维扩散模型的先验知识。整个管线包括两个关键阶段：首先，我们利用扩散模型的先验生成关键视角的风格化渲染结果；其次，将这些风格化关键视图迁移到三维表示中。在这一过程中，我们设计了两个创新模块。第一个是跨视图风格对齐模块（cross-view style alignment），在UNet的最后一个上采样块中引入跨视图注意力机制，从而实现多关键视图间的特征交互，确保扩散模型生成的关键视图既保持风格一致性，又具备实例级一致性。第二个是实例级风格迁移模块（instance-level style transfer），该模块有效利用关键视图之间的实例一致性，将其迁移到三维表示中，从而获得结构更加清晰、视觉更加协调且艺术表现更为丰富的风格化结果。大量定性与定量实验表明，我们提出的三维风格迁移管线在从前向场景到复杂的360°环境等多种场景下，均显著优于当前最先进的方法。\n"
  },
  {
    "path": "abs/2509.05075.md",
    "content": "### GeoSplat: A Deep Dive into Geometry-Constrained Gaussian Splatting\n\nA few recent works explored incorporating geometric priors to regularize the optimization of Gaussian splatting, further improving its performance. However, those early studies mainly focused on the use of low-order geometric priors (e.g., normal vector), and they might also be unreliably estimated by noise-sensitive methods, like local principal component analysis. To address their limitations, we first present GeoSplat, a general geometry-constrained optimization framework that exploits both first-order and second-order geometric quantities to improve the entire training pipeline of Gaussian splatting, including Gaussian initialization, gradient update, and densification. As an example, we initialize the scales of 3D Gaussian primitives in terms of principal curvatures, leading to a better coverage of the object surface than random initialization. Secondly, based on certain geometric structures (e.g., local manifold), we introduce efficient and noise-robust estimation methods that provide dynamic geometric priors for our framework. We conduct extensive experiments on multiple datasets for novel view synthesis, showing that our framework, GeoSplat, significantly improves the performance of Gaussian splatting and outperforms previous baselines.\n\n近期有一些研究探索了将几何先验引入高斯溅射（Gaussian Splatting）的优化过程，以进一步提升其性能。然而，这些早期研究主要集中于使用低阶几何先验（如法向量），并且往往依赖于对噪声敏感的估计方法（例如局部主成分分析），从而导致结果不够稳定。为了解决这些局限性，我们提出了一个通用的几何约束优化框架——GeoSplat。该框架同时利用一阶和二阶几何量来改进高斯溅射的整个训练流程，包括高斯初始化、梯度更新和密集化等环节。举例而言，我们基于主曲率来初始化三维高斯原语的尺度参数，相比随机初始化能更好地覆盖物体表面。其次，我们还基于特定的几何结构（如局部流形）设计了高效且抗噪的估计方法，为框架提供动态更新的几何先验。我们在多个新视角合成数据集上进行了大量实验，结果表明，我们提出的GeoSplat框架显著提升了高斯溅射的性能，并优于以往的基线方法。\n"
  },
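As an illustration of curvature-driven initialization (our sketch; the constants and exact mapping are assumptions), the two tangential scales can be set from the principal-curvature radii so flat regions get larger splats, with a thin scale along the normal:

```python
# Map per-point principal curvatures (k1, k2) to anisotropic Gaussian scales.
import numpy as np

def init_scales(k1, k2, base=0.01, cap=0.1):
    """Flatter surface (small |k|) -> larger tangential splat, capped at `cap`."""
    s1 = np.clip(base / (np.abs(k1) + 1e-3), base, cap)  # tangent direction 1
    s2 = np.clip(base / (np.abs(k2) + 1e-3), base, cap)  # tangent direction 2
    s3 = np.full_like(s1, base * 0.1)                    # thin along the normal
    return np.stack([s1, s2, s3], axis=-1)

k1 = np.array([0.0, 5.0, 50.0])   # flat, gently curved, highly curved point
k2 = np.array([0.0, 1.0, 40.0])
print(init_scales(k1, k2))
```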
  {
    "path": "abs/2509.05515.md",
    "content": "### Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting\n\nRecently, distilling open-vocabulary language features from 2D images into 3D Gaussians has attracted significant attention. Although existing methods achieve impressive language-based interactions of 3D scenes, we observe two fundamental issues: background Gaussians contributing negligibly to a rendered pixel get the same feature as the dominant foreground ones, and multi-view inconsistencies due to view-specific noise in language embeddings. We introduce Visibility-Aware Language Aggregation (VALA), a lightweight yet effective method that computes marginal contributions for each ray and applies a visibility-aware gate to retain only visible Gaussians. Moreover, we propose a streaming weighted geometric median in cosine space to merge noisy multi-view features. Our method yields a robust, view-consistent language feature embedding in a fast and memory-efficient manner. VALA improves open-vocabulary localization and segmentation across reference datasets, consistently surpassing existing works.\n\n近年来，将开放词汇语言特征从二维图像蒸馏到三维高斯表示中受到了广泛关注。虽然现有方法在三维场景的语言交互方面取得了令人印象深刻的成果，但我们观察到其中存在两个根本性问题：其一，背景高斯对渲染像素的贡献几乎可以忽略，却被赋予与主导前景相同的特征；其二，多视图语言嵌入中存在视角特定噪声，导致跨视图不一致性。为此，我们提出了可见性感知语言聚合方法（Visibility-Aware Language Aggregation, VALA），这是一种轻量但高效的方案。VALA通过为每条光线计算边际贡献，并引入可见性门控机制，仅保留可见的高斯。此外，我们还提出了一种在余弦空间中进行流式加权几何中值融合的方法，用于整合多视图噪声特征。该方法能够快速、内存高效地生成稳健且跨视图一致的语言特征嵌入。实验结果表明，VALA在多个参考数据集上的开放词汇定位和分割任务中均取得了显著提升，并持续超越现有方法。\n"
  },
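The two ingredients can be sketched compactly (our reading of the abstract, not released code): gate each Gaussian by its marginal contribution T_i * alpha_i along the ray, and fuse per-view features with a weighted geometric median on the unit sphere via Weiszfeld iterations; the threshold value is an assumption.

```python
import numpy as np

def marginal_weights(alphas):
    """Front-to-back contribution T_i * alpha_i of each Gaussian on a ray."""
    T = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return T * alphas

def visible(alphas, tau=0.05):
    return marginal_weights(alphas) > tau     # visibility-aware gate

def geometric_median(feats, w, iters=20):
    """Weiszfeld iterations with cosine distance on normalized features."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    m = np.average(feats, axis=0, weights=w)
    for _ in range(iters):
        d = np.maximum(1.0 - feats @ (m / np.linalg.norm(m)), 1e-6)
        m = np.average(feats, axis=0, weights=w / d)
    return m / np.linalg.norm(m)

alphas = np.array([0.6, 0.3, 0.05, 0.01])
print(visible(alphas))                        # -> [ True  True False False]
```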
  {
    "path": "abs/2509.05582.md",
    "content": "### Reconstruction and Reenactment Separated Method for Realistic Gaussian Head\n\nIn this paper, we explore a reconstruction and reenactment separated framework for 3D Gaussians head, which requires only a single portrait image as input to generate controllable avatar. Specifically, we developed a large-scale one-shot gaussian head generator built upon WebSSL and employed a two-stage training approach that significantly enhances the capabilities of generalization and high-frequency texture reconstruction. During inference, an ultra-lightweight gaussian avatar driven by control signals enables high frame-rate rendering, achieving 90 FPS at a resolution of 512x512. We further demonstrate that the proposed framework follows the scaling law, whereby increasing the parameter scale of the reconstruction module leads to improved performance. Moreover, thanks to the separation design, driving efficiency remains unaffected. Finally, extensive quantitative and qualitative experiments validate that our approach outperforms current state-of-the-art methods.\n\n本文提出了一种重建与驱动分离的三维高斯人头生成框架，该方法仅需输入一张人像图像即可生成可控虚拟头像。具体而言，我们基于WebSSL构建了一个大规模单次生成（one-shot）的高斯人头生成器，并采用了两阶段训练策略，从而显著提升了模型的泛化能力与高频纹理重建效果。在推理阶段，一个由控制信号驱动的超轻量级高斯头像模型实现了高帧率渲染，在512×512分辨率下可达90帧每秒（FPS）。进一步实验表明，所提出的框架遵循“缩放定律”（scaling law）：当重建模块的参数规模增加时，模型性能也随之提升。此外，得益于重建与驱动的分离设计，驱动阶段的效率不受影响。最后，大量定量与定性实验验证了我们的方法在性能上全面优于当前最先进的技术。\n"
  },
  {
    "path": "abs/2509.06400.md",
    "content": "### 3DOF+Quantization: 3DGS quantization for large scenes with limited Degrees of Freedom\n\n3D Gaussian Splatting (3DGS) is a major breakthrough in 3D scene reconstruction. With a number of views of a given object or scene, the algorithm trains a model composed of 3D gaussians, which enables the production of novel views from arbitrary points of view. This freedom of movement is referred to as 6DoF for 6 degrees of freedom: a view is produced for any position (3 degrees), orientation of camera (3 other degrees). On large scenes, though, the input views are acquired from a limited zone in space, and the reconstruction is valuable for novel views from the same zone, even if the scene itself is almost unlimited in size. We refer to this particular case as 3DoF+, meaning that the 3 degrees of freedom of camera position are limited to small offsets around the central position. Considering the problem of coordinate quantization, the impact of position error on the projection error in pixels is studied. It is shown that the projection error is proportional to the squared inverse distance of the point being projected. Consequently, a new quantization scheme based on spherical coordinates is proposed. Rate-distortion performance of the proposed method are illustrated on the well-known Garden scene.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）是三维场景重建领域的一项重大突破。给定某个物体或场景的多视角图像，该算法通过训练由三维高斯组成的模型，从而能够从任意视点生成新视角图像。这种自由的视角变化被称为“6自由度”（6DoF），即视图可以根据任意位置（3个自由度）和相机朝向（另3个自由度）进行生成。然而，在大规模场景中，输入视图通常来自空间中的有限区域，因此重建结果仅在该区域内生成的新视角具有实际价值，即使整个场景的空间范围几乎是无限的。我们将这种特定情形称为“3DoF+”，表示相机位置的三个自由度仅限于围绕中心位置的小范围偏移。本文研究了坐标量化（coordinate quantization）问题，分析了位置误差对像素投影误差的影响。研究表明，投影误差与被投影点距离的平方反比成正比。基于此，我们提出了一种基于球坐标系的新型量化方案。最后，我们在著名的“Garden”场景上展示了该方法的率失真（rate-distortion）性能表现。\n"
  },
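A worked illustration of the observation (ours; the paper's exact quantizer may differ): since a radial error dr on a point at distance r moves its projection by roughly f*dr/r^2 pixels, quantizing the inverse radius uniformly equalizes pixel error across depths, while angles take a fixed angular step.

```python
# Spherical-coordinate quantization with a uniform step in 1/r.
import numpy as np

def to_spherical(p):
    r = np.linalg.norm(p, axis=-1)
    theta = np.arccos(p[..., 2] / r)
    phi = np.arctan2(p[..., 1], p[..., 0])
    return r, theta, phi

def quantize(p, ang_step=1e-3, inv_r_step=1e-3):
    r, th, ph = to_spherical(p)
    r_hat = 1.0 / (np.round(1.0 / r / inv_r_step) * inv_r_step)  # uniform in 1/r
    th_hat = np.round(th / ang_step) * ang_step
    ph_hat = np.round(ph / ang_step) * ang_step
    return np.stack([r_hat * np.sin(th_hat) * np.cos(ph_hat),
                     r_hat * np.sin(th_hat) * np.sin(ph_hat),
                     r_hat * np.cos(th_hat)], axis=-1)

p = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
print(np.abs(quantize(p) - p))   # absolute error grows with distance,
                                 # projected pixel error stays roughly even
```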
  {
    "path": "abs/2509.07021.md",
    "content": "### MEGS2: Memory-Efficient Gaussian Splatting via Spherical Gaussians and Unified Pruning\n\n3D Gaussian Splatting (3DGS) has emerged as a dominant novel-view synthesis technique, but its high memory consumption severely limits its applicability on edge devices. A growing number of 3DGS compression methods have been proposed to make 3DGS more efficient, yet most only focus on storage compression and fail to address the critical bottleneck of rendering memory. To address this problem, we introduce MEGS2, a novel memory-efficient framework that tackles this challenge by jointly optimizing two key factors: the total primitive number and the parameters per primitive, achieving unprecedented memory compression. Specifically, we replace the memory-intensive spherical harmonics with lightweight, arbitrarily oriented spherical Gaussian lobes as our color representations. More importantly, we propose a unified soft pruning framework that models primitive-number and lobe-number pruning as a single constrained optimization problem. Experiments show that MEGS2 achieves a 50% static VRAM reduction and a 40% rendering VRAM reduction compared to existing methods, while maintaining comparable rendering quality.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）已成为当前最具代表性的三维新视角合成技术之一，但其高内存消耗严重限制了其在边缘设备上的应用。为提高3DGS的效率，近年来出现了越来越多的压缩方法，但多数方法仅关注存储压缩，而忽视了渲染内存这一关键瓶颈。为了解决这一问题，我们提出了一个全新的高内存效率框架——MEGS2。该框架通过联合优化两个关键因素：原语总数量与每个原语的参数数量，从而实现了前所未有的内存压缩效果。具体而言，我们用轻量化的任意方向球面高斯瓣（spherical Gaussian lobes）替代了内存占用较高的球谐函数作为颜色表示方式。更重要的是，我们提出了一个统一的软剪枝框架（soft pruning framework），将原语数量剪枝与高斯瓣数量剪枝建模为一个受约束的联合优化问题。实验结果表明，与现有方法相比，MEGS2在保持渲染质量基本一致的情况下，实现了50%的静态显存占用减少和40%的渲染显存占用降低。\n"
  },
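For reference, view-dependent color from spherical Gaussian lobes evaluates as a few exponentials rather than the 48 floats of degree-3 SH; the lobe count and parameter shapes below are illustrative assumptions, the SG formula itself is standard.

```python
# View-dependent RGB from a small set of spherical Gaussian (SG) lobes.
import numpy as np

def sg_color(view_dir, mu, lam, amp):
    """view_dir: (3,); mu: (L,3) lobe axes; lam: (L,) sharpness; amp: (L,3) RGB."""
    mu = mu / np.linalg.norm(mu, axis=1, keepdims=True)
    d = view_dir / np.linalg.norm(view_dir)
    w = np.exp(lam * (mu @ d - 1.0))          # SG basis, peaks at 1 on the axis
    return w @ amp                            # (3,) RGB

L = 3                                          # e.g., 3 lobes per Gaussian
rng = np.random.default_rng(0)
print(sg_color(np.array([0.0, 0.0, 1.0]),
               rng.normal(size=(L, 3)),
               np.abs(rng.normal(size=L)) * 4.0,
               rng.uniform(0, 1, size=(L, 3))))
```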
  {
    "path": "abs/2509.07435.md",
    "content": "### DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation\n\nThe labor- and experience-intensive creation of 3D assets with physically based rendering (PBR) materials demands an autonomous 3D asset creation pipeline. However, most existing 3D generation methods focus on geometry modeling, either baking textures into simple vertex colors or leaving texture synthesis to post-processing with image diffusion models. To achieve end-to-end PBR-ready 3D asset generation, we present Lightweight Gaussian Asset Adapter (LGAA), a novel framework that unifies the modeling of geometry and PBR materials by exploiting multi-view (MV) diffusion priors from a novel perspective. The LGAA features a modular design with three components. Specifically, the LGAA Wrapper reuses and adapts network layers from MV diffusion models, which encapsulate knowledge acquired from billions of images, enabling better convergence in a data-efficient manner. To incorporate multiple diffusion priors for geometry and PBR synthesis, the LGAA Switcher aligns multiple LGAA Wrapper layers encapsulating different knowledge. Then, a tamed variational autoencoder (VAE), termed LGAA Decoder, is designed to predict 2D Gaussian Splatting (2DGS) with PBR channels. Finally, we introduce a dedicated post-processing procedure to effectively extract high-quality, relightable mesh assets from the resulting 2DGS. Extensive quantitative and qualitative experiments demonstrate the superior performance of LGAA with both text-and image-conditioned MV diffusion models. Additionally, the modular design enables flexible incorporation of multiple diffusion priors, and the knowledge-preserving scheme leads to efficient convergence trained on merely 69k multi-view instances.\n\n基于物理渲染（Physically Based Rendering, PBR）材质的三维资产创建过程往往高度依赖人工经验与劳动投入，因此迫切需要一种自主化的三维资产生成管线。然而，现有大多数三维生成方法主要聚焦于几何建模，要么仅将纹理烘焙为简单的顶点颜色，要么将纹理合成交由基于图像的扩散模型后处理。为实现端到端的PBR就绪三维资产生成，我们提出了轻量高斯资产适配器（Lightweight Gaussian Asset Adapter, LGAA），这是一个新颖的框架，从全新视角利用多视角（Multi-View, MV）扩散模型的先验，实现几何与PBR材质建模的统一。LGAA具有模块化设计，由三个核心组件组成：首先，LGAA Wrapper重用并适配多视角扩散模型的网络层，这些层包含从数十亿张图像中学习到的知识，从而在数据高效的情况下实现更快的收敛；其次，LGAA Switcher用于对齐封装不同知识的多个LGAA Wrapper层，从而融合几何与PBR合成的多重扩散先验；第三，我们设计了一个经过调控的变分自编码器（tamed Variational Autoencoder, VAE），称为LGAA Decoder，用于预测包含PBR通道的二维高斯溅射（2D Gaussian Splatting, 2DGS）。最后，我们引入了专门的后处理流程，从生成的2DGS中高效提取高质量、可再光照（relightable）的网格资产。大量定量与定性实验表明，LGAA在基于文本和图像条件的多视角扩散模型上均表现出优越性能。此外，其模块化设计支持灵活集成多种扩散先验，而知识保留机制使得模型仅需在69k个多视角样本上训练即可实现高效收敛。\n"
  },
  {
    "path": "abs/2509.07493.md",
    "content": "### Accurate and Complete Surface Reconstruction from 3D Gaussians via Direct SDF Learning\n\n3D Gaussian Splatting (3DGS) has recently emerged as a powerful paradigm for photorealistic view synthesis, representing scenes with spatially distributed Gaussian primitives. While highly effective for rendering, achieving accurate and complete surface reconstruction remains challenging due to the unstructured nature of the representation and the absence of explicit geometric supervision. In this work, we propose DiGS, a unified framework that embeds Signed Distance Field (SDF) learning directly into the 3DGS pipeline, thereby enforcing strong and interpretable surface priors. By associating each Gaussian with a learnable SDF value, DiGS explicitly aligns primitives with underlying geometry and improves cross-view consistency. To further ensure dense and coherent coverage, we design a geometry-guided grid growth strategy that adaptively distributes Gaussians along geometry-consistent regions under a multi-scale hierarchy. Extensive experiments on standard benchmarks, including DTU, Mip-NeRF 360, and Tanks& Temples, demonstrate that DiGS consistently improves reconstruction accuracy and completeness while retaining high rendering fidelity.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）近年来已成为实现照片级真实感视图合成的强大范式，它通过空间分布的高斯原语来表示场景。尽管3DGS在渲染方面表现出色，但由于其表示形式非结构化且缺乏显式几何监督，实现精确且完整的表面重建仍然具有挑战性。为此，我们提出了DiGS——一种将有符号距离场（Signed Distance Field, SDF）学习直接嵌入3DGS管线的统一框架，从而引入强而可解释的几何表面先验。通过为每个高斯分配一个可学习的SDF值，DiGS显式地将高斯原语与底层几何结构对齐，并提升了跨视角一致性。为进一步确保稠密且连贯的表面覆盖，我们设计了一种几何引导的网格生长策略（geometry-guided grid growth），在多尺度层次下自适应地将高斯分布于几何一致区域。大量在标准基准（包括DTU、Mip-NeRF 360和Tanks & Temples）上的实验表明，DiGS在保持高渲染保真度的同时，显著提升了重建的精确度与完整性。\n"
  },
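A very rough sketch of attaching a learnable SDF value to each Gaussian; the two losses below (pull primitives to the zero level set, enforce the 1-Lipschitz property of a distance field between neighbors) are our assumptions about what such supervision could look like, not the paper's exact terms.

```python
import torch

N = 1024
centers = torch.randn(N, 3, requires_grad=True)
sdf = torch.zeros(N, requires_grad=True)        # learnable per-Gaussian SDF

def sdf_losses(centers, sdf, k_idx):
    surface = sdf.abs().mean()                  # pull primitives onto the surface
    # A true distance field is 1-Lipschitz: |s_i - s_j| <= ||c_i - c_j||.
    diff = sdf[:, None] - sdf[k_idx]            # (N, K)
    dist = (centers[:, None] - centers[k_idx]).norm(dim=-1)
    lipschitz = (diff.abs() - dist).clamp(min=0).mean()
    return surface + lipschitz

k_idx = torch.randint(0, N, (N, 8))             # stand-in for kNN indices
sdf_losses(centers, sdf, k_idx).backward()
```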
  {
    "path": "abs/2509.07552.md",
    "content": "### PanoLAM: Large Avatar Model for Gaussian Full-Head Synthesis from One-shot Unposed Image\n\nWe present a feed-forward framework for Gaussian full-head synthesis from a single unposed image. Unlike previous work that relies on time-consuming GAN inversion and test-time optimization, our framework can reconstruct the Gaussian full-head model given a single unposed image in a single forward pass. This enables fast reconstruction and rendering during inference. To mitigate the lack of large-scale 3D head assets, we propose a large-scale synthetic dataset from trained 3D GANs and train our framework using only synthetic data. For efficient high-fidelity generation, we introduce a coarse-to-fine Gaussian head generation pipeline, where sparse points from the FLAME model interact with the image features by transformer blocks for feature extraction and coarse shape reconstruction, which are then densified for high-fidelity reconstruction. To fully leverage the prior knowledge residing in pretrained 3D GANs for effective reconstruction, we propose a dual-branch framework that effectively aggregates the structured spherical triplane feature and unstructured point-based features for more effective Gaussian head reconstruction. Experimental results show the effectiveness of our framework towards existing work.\n\n本文提出了一种前馈式框架，用于从单张无姿态图像生成完整的高斯人头模型。与以往依赖耗时的GAN反演和测试阶段优化的方法不同，我们的框架仅通过一次前向传播即可根据单张无姿态图像重建高斯人头模型，从而在推理过程中实现快速重建与渲染。为弥补大规模三维人头资产的匮乏，我们利用训练好的三维GAN模型构建了一个大规模的合成数据集，并仅使用合成数据对框架进行训练。为实现高效且高保真的生成，我们设计了一条由粗到细的高斯人头生成管线：首先，来自FLAME模型的稀疏点通过Transformer模块与图像特征交互以提取特征并重建粗略形状，随后通过密集化操作实现高保真重建。为了充分利用预训练三维GAN中蕴含的先验知识以提升重建效果，我们进一步提出了双分支框架（dual-branch framework），能够有效融合结构化的球面三平面（spherical triplane）特征与非结构化的点云特征，从而实现更精确的高斯人头重建。实验结果表明，与现有方法相比，我们的框架在重建质量与效率方面均表现出显著优势。\n"
  },
  {
    "path": "abs/2509.07774.md",
    "content": "### HairGS: Hair Strand Reconstruction based on 3D Gaussian Splatting\n\nHuman hair reconstruction is a challenging problem in computer vision, with growing importance for applications in virtual reality and digital human modeling. Recent advances in 3D Gaussians Splatting (3DGS) provide efficient and explicit scene representations that naturally align with the structure of hair strands. In this work, we extend the 3DGS framework to enable strand-level hair geometry reconstruction from multi-view images. Our multi-stage pipeline first reconstructs detailed hair geometry using a differentiable Gaussian rasterizer, then merges individual Gaussian segments into coherent strands through a novel merging scheme, and finally refines and grows the strands under photometric supervision. While existing methods typically evaluate reconstruction quality at the geometric level, they often neglect the connectivity and topology of hair strands. To address this, we propose a new evaluation metric that serves as a proxy for assessing topological accuracy in strand reconstruction. Extensive experiments on both synthetic and real-world datasets demonstrate that our method robustly handles a wide range of hairstyles and achieves efficient reconstruction, typically completing within one hour.\n\n人类头发重建是计算机视觉中的一项具有挑战性的问题，在虚拟现实和数字人建模等应用中具有日益重要的意义。近年来，三维高斯溅射（3D Gaussian Splatting, 3DGS）的发展为场景提供了一种高效且显式的表示方式，其天然契合了发丝的结构特性。在本研究中，我们扩展了3DGS框架，使其能够从多视角图像中实现基于发丝级别的头发几何重建。我们提出的多阶段管线首先利用可微分高斯光栅化器（differentiable Gaussian rasterizer）重建头发的细节几何结构，然后通过一种全新的合并策略将独立的高斯段融合为连贯的发丝，最后在光度监督下对发丝进行精修与生长。现有方法通常仅在几何层面评估重建质量，而忽视了发丝的连通性与拓扑结构。针对这一问题，我们提出了一种新的评估指标，用于近似衡量发丝重建的拓扑准确性。大量基于合成数据和真实数据集的实验结果表明，我们的方法能够稳健地处理多种不同发型，并实现高效重建，通常在一小时内即可完成。\n"
  },
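The merging scheme itself is not spelled out in the abstract; purely as an illustration, a greedy rule that chains two Gaussian segments when their endpoints are close and their directions align might look like this (thresholds are invented):

```python
# Hypothetical greedy merge test for chaining Gaussian segments into strands.
import numpy as np

def can_merge(seg_a, seg_b, d_max=0.005, cos_min=0.9):
    """seg: (p0, p1) endpoints; chain b onto a if close and aligned."""
    da = seg_a[1] - seg_a[0]
    db = seg_b[1] - seg_b[0]
    close = np.linalg.norm(seg_a[1] - seg_b[0]) < d_max
    aligned = da @ db / (np.linalg.norm(da) * np.linalg.norm(db)) > cos_min
    return close and aligned

a = (np.zeros(3), np.array([0.0, 0.0, 0.004]))
b = (np.array([0.0, 0.0, 0.0045]), np.array([0.0, 0.0, 0.009]))
print(can_merge(a, b))   # True: chain into one strand polyline
```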
  {
    "path": "abs/2509.07809.md",
    "content": "### SplatFill: 3D Scene Inpainting via Depth-Guided Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has enabled the creation of highly realistic 3D scene representations from sets of multi-view images. However, inpainting missing regions, whether due to occlusion or scene editing, remains a challenging task, often leading to blurry details, artifacts, and inconsistent geometry. In this work, we introduce SplatFill, a novel depth-guided approach for 3DGS scene inpainting that achieves state-of-the-art perceptual quality and improved efficiency. Our method combines two key ideas: (1) joint depth-based and object-based supervision to ensure inpainted Gaussians are accurately placed in 3D space and aligned with surrounding geometry, and (2) we propose a consistency-aware refinement scheme that selectively identifies and corrects inconsistent regions without disrupting the rest of the scene. Evaluations on the SPIn-NeRF dataset demonstrate that SplatFill not only surpasses existing NeRF-based and 3DGS-based inpainting methods in visual fidelity but also reduces training time by 24.5%. Qualitative results show our method delivers sharper details, fewer artifacts, and greater coherence across challenging viewpoints.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）使得从多视角图像集构建高度逼真的三维场景表示成为可能。然而，无论是由于遮挡造成的缺失区域，还是场景编辑引起的空洞修复（inpainting），仍然是一个具有挑战性的问题，常常会导致细节模糊、伪影增多以及几何结构不一致等问题。为此，我们提出了SplatFill——一种基于深度引导的3DGS场景修复新方法，能够在感知质量和效率上均达到当前最优水平。该方法融合了两个核心设计思想：（1）结合基于深度和基于物体的联合监督机制，以确保填充后的高斯在三维空间中精确定位并与周围几何结构对齐；（2）提出一致性感知的细化策略（consistency-aware refinement scheme），能够选择性地检测并修正不一致区域，而不会破坏场景的其他部分。在SPIn-NeRF数据集上的实验结果表明，SplatFill不仅在视觉保真度上显著优于现有的基于NeRF和基于3DGS的修复方法，还将训练时间缩短了24.5%。定性结果进一步显示，我们的方法在细节清晰度、伪影抑制和跨视角一致性方面均表现出更高的水准。\n"
  },
  {
    "path": "abs/2509.10241.md",
    "content": "### On the Geometric Accuracy of Implicit and Primitive-based Representations Derived from View Rendering Constraints\n\nWe present the first systematic comparison of implicit and explicit Novel View Synthesis methods for space-based 3D object reconstruction, evaluating the role of appearance embeddings. While embeddings improve photometric fidelity by modeling lighting variation, we show they do not translate into meaningful gains in geometric accuracy - a critical requirement for space robotics applications. Using the SPEED+ dataset, we compare K-Planes, Gaussian Splatting, and Convex Splatting, and demonstrate that embeddings primarily reduce the number of primitives needed for explicit methods rather than enhancing geometric fidelity. Moreover, convex splatting achieves more compact and clutter-free representations than Gaussian splatting, offering advantages for safety-critical applications such as interaction and collision avoidance. Our findings clarify the limits of appearance embeddings for geometry-centric tasks and highlight trade-offs between reconstruction quality and representation efficiency in space scenarios.\n\n本文首次对基于空间的三维物体重建任务中，隐式与显式新视角合成（Novel View Synthesis）方法进行了系统比较，并重点评估了外观嵌入（appearance embeddings）的作用。尽管外观嵌入通过建模光照变化能够提升光度保真度，但我们的实验表明，它们并未在几何精度上带来显著提升——而几何精度正是空间机器人应用中的关键要求。基于SPEED+数据集，我们比较了K-Planes、Gaussian Splatting和Convex Splatting三种方法，结果显示外观嵌入主要作用于减少显式方法所需的原语数量，而非提升几何精度。此外，Convex Splatting相较于Gaussian Splatting能生成更紧凑且无杂散的表示形式，在交互与避碰等安全关键任务中具有潜在优势。我们的研究结果揭示了外观嵌入在几何中心任务中的局限性，并进一步阐明了空间场景中重建质量与表示效率之间的权衡关系。\n"
  },
  {
    "path": "abs/2509.10678.md",
    "content": "### T2Bs: Text-to-Character Blendshapes via Video Generation\n\nWe present T2Bs, a framework for generating high-quality, animatable character head morphable models from text by combining static text-to-3D generation with video diffusion. Text-to-3D models produce detailed static geometry but lack motion synthesis, while video diffusion models generate motion with temporal and multi-view geometric inconsistencies. T2Bs bridges this gap by leveraging deformable 3D Gaussian splatting to align static 3D assets with video outputs. By constraining motion with static geometry and employing a view-dependent deformation MLP, T2Bs (i) outperforms existing 4D generation methods in accuracy and expressiveness while reducing video artifacts and view inconsistencies, and (ii) reconstructs smooth, coherent, fully registered 3D geometries designed to scale for building morphable models with diverse, realistic facial motions. This enables synthesizing expressive, animatable character heads that surpass current 4D generation techniques.\n\n我们提出了T2Bs框架，一种结合静态文本到三维（text-to-3D）生成与视频扩散（video diffusion）的方法，用于从文本生成高质量、可动画化的角色头部可变形模型（morphable models）。现有的text-to-3D模型能够生成细致的静态几何结构，但缺乏运动合成能力；而视频扩散模型虽然能够生成运动，但通常存在时间一致性差和多视图几何不一致等问题。T2Bs通过引入可变形三维高斯溅射（deformable 3D Gaussian splatting）来对齐静态三维资产与视频输出，从而弥合两者之间的差距。该方法通过静态几何约束运动，并引入视角相关的变形多层感知机（view-dependent deformation MLP），从而：（i）在准确性与表现力方面优于现有的四维生成（4D generation）方法，同时有效减少视频伪影与视角不一致问题；（ii）能够重建平滑、连贯且完全配准的三维几何，为构建具有多样化且逼真面部运动的可变形模型提供可扩展基础。这一框架实现了超越当前4D生成技术的高表现力可动画角色头部合成能力。\n"
  },
  {
    "path": "abs/2509.11003.md",
    "content": "### AD-GS: Alternating Densification for Sparse-Input 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has shown impressive results in real-time novel view synthesis. However, it often struggles under sparse-view settings, producing undesirable artifacts such as floaters, inaccurate geometry, and overfitting due to limited observations. We find that a key contributing factor is uncontrolled densification, where adding Gaussian primitives rapidly without guidance can harm geometry and cause artifacts. We propose AD-GS, a novel alternating densification framework that interleaves high and low densification phases. During high densification, the model densifies aggressively, followed by photometric loss based training to capture fine-grained scene details. Low densification then primarily involves aggressive opacity pruning of Gaussians followed by regularizing their geometry through pseudo-view consistency and edge-aware depth smoothness. This alternating approach helps reduce overfitting by carefully controlling model capacity growth while progressively refining the scene representation. Extensive experiments on challenging datasets demonstrate that AD-GS significantly improves rendering quality and geometric consistency compared to existing methods.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）在实时新视角合成中取得了令人印象深刻的成果。然而，在稀疏视图条件下，3DGS常常表现不佳，容易产生悬浮伪影（floaters）、几何结构不准确以及因观测有限而导致的过拟合等问题。我们发现，其关键原因之一在于“无约束的密集化”（uncontrolled densification）——即在缺乏指导的情况下快速增加高斯原语，反而可能破坏几何结构并引入伪影。为此，我们提出了AD-GS，这是一种交替密集化（alternating densification）框架，通过在高密集化与低密集化阶段之间交替优化，实现对模型复杂度的有效控制与逐步精化。在高密集化阶段，模型进行积极的高斯扩增，并结合基于光度损失的训练以捕获细粒度场景细节；而在低密集化阶段，主要进行高斯的不透明度剪枝（opacity pruning），并通过伪视图一致性（pseudo-view consistency）与边缘感知的深度平滑（edge-aware depth smoothness）约束几何结构。该交替优化策略通过精细控制模型容量增长，有效缓解了过拟合问题，并持续提升场景表示的精度与一致性。大量在具有挑战性的数据集上的实验结果表明，AD-GS在渲染质量与几何一致性方面均显著优于现有方法。\n"
  },
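The alternating schedule can be summarized in a schematic loop (our paraphrase of the abstract; the Model stub and its method names are hypothetical stand-ins for a 3DGS trainer):

```python
# Alternating high / low densification phases, schematically.
import torch

class Model:
    """Hypothetical stand-in for a 3DGS trainer; method names are ours."""
    def __init__(self): self.n = 100_000                   # Gaussian count
    def photometric_loss(self): return torch.rand(1, requires_grad=True).sum()
    def regularizers(self): return torch.rand(1, requires_grad=True).sum()
    def densify(self): self.n = int(self.n * 1.3)          # aggressive growth
    def prune(self, thr=0.05): self.n = int(self.n * 0.7)  # opacity pruning

model, PHASE = Model(), 1000
for phase in range(4):                    # alternate high / low densification
    if phase % 2 == 0:
        model.densify()                   # high phase: grow capacity, then fit
    else:
        model.prune()                     # low phase: prune, then regularize
    for _ in range(PHASE):
        loss = model.photometric_loss()
        if phase % 2 == 1:                # pseudo-view + depth-smoothness terms
            loss = loss + model.regularizers()
        loss.backward()
```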
  {
    "path": "abs/2509.11116.md",
    "content": "### SVR-GS: Spatially Variant Regularization for Probabilistic Masks in 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) enables fast, high-quality novel view synthesis but typically relies on densification followed by pruning to optimize the number of Gaussians. Existing mask-based pruning, such as MaskGS, regularizes the global mean of the mask, which is misaligned with the local per-pixel (per-ray) reconstruction loss that determines image quality along individual camera rays. This paper introduces SVR-GS, a spatially variant regularizer that renders a per-pixel spatial mask from each Gaussian's effective contribution along the ray, thereby applying sparsity pressure where it matters: on low-importance Gaussians. We explore three spatial-mask aggregation strategies, implement them in CUDA, and conduct a gradient analysis to motivate our final design. Extensive experiments on Tanks\\&Temples, Deep Blending, and Mip-NeRF360 datasets demonstrate that, on average across the three datasets, the proposed SVR-GS reduces the number of Gaussians by 1.79× compared to MaskGS and 5.63× compared to 3DGS, while incurring only 0.50 dB and 0.40 dB PSNR drops, respectively. These gains translate into significantly smaller, faster, and more memory-efficient models, making them well-suited for real-time applications such as robotics, AR/VR, and mobile perception.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）实现了快速且高质量的新视角合成，但通常依赖“密集化（densification）+剪枝（pruning）”的过程来优化高斯原语数量。现有的基于掩码的剪枝方法（如MaskGS）通过正则化掩码的全局均值来约束模型，但这种做法与决定单个相机光线图像质量的局部逐像素（per-pixel / per-ray）重建误差并不一致。为解决这一问题，我们提出了SVR-GS，一种空间可变正则化方法（spatially variant regularizer）。SVR-GS通过渲染每个高斯沿光线的有效贡献，生成逐像素空间掩码，从而在关键位置施加稀疏性约束，重点抑制低重要性高斯。我们探索了三种空间掩码聚合策略，并基于CUDA实现了高效加速，同时进行了梯度分析以验证最终设计的合理性。在Tanks&Temples、Deep Blending和Mip-NeRF360三个基准数据集上的实验表明，SVR-GS在平均表现上相比MaskGS减少了1.79倍的高斯数量，相比原始3DGS减少了5.63倍，同时PSNR仅下降0.50 dB和0.40 dB。这一性能提升使得SVR-GS生成的模型在规模更小、渲染更快、内存更高效的同时，依然保持高质量表现，非常适合实时应用场景，如机器人、增强/虚拟现实（AR/VR）以及移动感知。\n"
  },
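One way to read the regularizer (our sketch, and only one of the aggregation strategies the paper compares): render each Gaussian's mask along the ray with the same T_i * alpha_i weights used for color, so the sparsity penalty acts on the same per-pixel quantities the reconstruction loss sees instead of on a global mask mean.

```python
import torch

def svr_loss(masks, contrib):
    """masks: (N,) learnable mask per Gaussian in [0,1];
    contrib: (R, N) marginal contributions T_i * alpha_i per ray."""
    rendered_mask = contrib * masks[None, :]     # per-ray spatial mask render
    return rendered_mask.sum(dim=1).mean()       # penalize per-pixel mask mass

masks = torch.rand(100, requires_grad=True)
contrib = torch.rand(32, 100).softmax(dim=1)     # toy alpha-compositing weights
svr_loss(masks, contrib).backward()
print(masks.grad.shape)                          # gradient scales with w_i
```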
  {
    "path": "abs/2509.11171.md",
    "content": "### SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion\n\nCamera-based 3D Semantic Scene Completion (SSC) is a critical task in autonomous driving systems, assessing voxel-level geometry and semantics for holistic scene perception. While existing voxel-based and plane-based SSC methods have achieved considerable progress, they struggle to capture physical regularities for realistic geometric details. On the other hand, neural reconstruction methods like NeRF and 3DGS demonstrate superior physical awareness, but suffer from high computational cost and slow convergence when handling large-scale, complex autonomous driving scenes, leading to inferior semantic accuracy. To address these issues, we propose the Semantic-PHysical Engaged REpresentation (SPHERE) for camera-based SSC, which integrates voxel and Gaussian representations for joint exploitation of semantic and physical information. First, the Semantic-guided Gaussian Initialization (SGI) module leverages dual-branch 3D scene representations to locate focal voxels as anchors to guide efficient Gaussian initialization. Then, the Physical-aware Harmonics Enhancement (PHE) module incorporates semantic spherical harmonics to model physical-aware contextual details and promote semantic-geometry consistency through focal distribution alignment, generating SSC results with realistic details. Extensive experiments and analyses on the popular SemanticKITTI and SSCBench-KITTI-360 benchmarks validate the effectiveness of SPHERE.\n\n基于摄像头的三维语义场景补全（3D Semantic Scene Completion, SSC）是自动驾驶系统中的关键任务，其目标是对体素级几何与语义进行评估，从而实现对场景的整体感知。尽管现有的基于体素（voxel-based）和基于平面（plane-based）的SSC方法已取得显著进展，但它们在捕捉物理规律以生成逼真几何细节方面仍存在不足。另一方面，神经重建方法（如NeRF和3DGS）在物理感知方面表现优越，但在处理大规模、复杂的自动驾驶场景时，往往面临高计算成本和收敛速度慢的问题，从而导致语义精度下降。为解决上述问题，我们提出了用于基于摄像头SSC任务的**语义-物理融合表示（Semantic-PHysical Engaged REpresentation, SPHERE）**，该方法将体素与高斯表示相结合，以联合利用语义与物理信息。具体而言，**语义引导高斯初始化模块（Semantic-guided Gaussian Initialization, SGI）**通过双分支三维场景表示定位关键体素作为锚点，从而引导高效的高斯初始化；随后，**物理感知谐波增强模块（Physical-aware Harmonics Enhancement, PHE）**结合语义球谐函数（semantic spherical harmonics）建模物理感知的上下文细节，并通过焦点分布对齐（focal distribution alignment）促进语义与几何的一致性，从而生成具有真实细节的SSC结果。我们在SemanticKITTI和SSCBench-KITTI-360等主流基准数据集上进行了大量实验与分析，结果验证了SPHERE方法的有效性。\n"
  },
  {
    "path": "abs/2509.11275.md",
    "content": "### ROSGS: Relightable Outdoor Scenes With Gaussian Splatting\n\nImage data captured outdoors often exhibit unbounded scenes and unconstrained, varying lighting conditions, making it challenging to decompose them into geometry, reflectance, and illumination. Recent works have focused on achieving this decomposition using Neural Radiance Fields (NeRF) or the 3D Gaussian Splatting (3DGS) representation but remain hindered by two key limitations: the high computational overhead associated with neural networks of NeRF and the use of low-frequency lighting representations, which often result in inefficient rendering and suboptimal relighting accuracy. We propose ROSGS, a two-stage pipeline designed to efficiently reconstruct relightable outdoor scenes using the Gaussian Splatting representation. By leveraging monocular normal priors, ROSGS first reconstructs the scene's geometry with the compact 2D Gaussian Splatting (2DGS) representation, providing an efficient and accurate geometric foundation. Building upon this reconstructed geometry, ROSGS then decomposes the scene's texture and lighting through a hybrid lighting model. This model effectively represents typical outdoor lighting by employing a spherical Gaussian function to capture the directional, high-frequency components of sunlight, while learning a radiance transfer function via Spherical Harmonic coefficients to model the remaining low-frequency skylight comprehensively. Both quantitative metrics and qualitative comparisons demonstrate that ROSGS achieves state-of-the-art performance in relighting outdoor scenes and highlight its ability to deliver superior relighting accuracy and rendering efficiency.\n\n室外拍摄的图像数据通常包含无边界场景和不受约束的光照变化，这使得将其分解为几何、反射率和光照成分成为一项具有挑战性的任务。近年来，一些研究尝试利用神经辐射场（Neural Radiance Fields, NeRF）或三维高斯溅射（3D Gaussian Splatting, 3DGS）表示来实现这种分解，但仍面临两个关键问题：一是NeRF神经网络带来的高计算开销；二是低频光照表示的使用，导致渲染效率低下和重光照（relighting）精度不足。为了解决这些问题，我们提出了ROSGS——一种基于高斯溅射表示的两阶段高效室外可重光照场景重建管线。ROSGS首先利用单目法线先验（monocular normal priors），通过紧凑的二维高斯溅射（2D Gaussian Splatting, 2DGS）表示重建场景几何，从而提供高效且准确的几何基础。在此基础上，ROSGS进一步通过一种混合光照模型（hybrid lighting model）对场景的纹理与光照进行分解。该模型采用球面高斯函数（spherical Gaussian function）捕捉太阳光的方向性高频分量，同时通过球谐系数（Spherical Harmonic coefficients）学习辐射传输函数，以全面建模剩余的低频天空光。定量指标与定性对比实验表明，ROSGS在室外场景重光照任务中达到了当前最先进的性能，展现出卓越的重光照精度与渲染效率。\n"
  },
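The hybrid lighting model can be written compactly (notation ours, inferred from the abstract): a single sharp spherical Gaussian for directional sunlight plus a low-order SH radiance-transfer term for skylight, modulating the recovered albedo.

```latex
% Hybrid outdoor lighting as described in the abstract (symbols ours):
% one sharp spherical Gaussian (SG) for the sun, low-order spherical-harmonic
% (SH) radiance transfer for skylight, modulating the albedo rho.
L_o(\mathbf{x}, \boldsymbol{\omega}) =
  \rho(\mathbf{x}) \Big(
    \underbrace{a \, e^{\lambda (\boldsymbol{\mu} \cdot \boldsymbol{\omega} - 1)}}_{\text{sun (SG)}}
    + \underbrace{\sum_{l=0}^{2} \sum_{m=-l}^{l} t_{lm}(\mathbf{x}) \, Y_{lm}(\boldsymbol{\omega})}_{\text{sky (SH transfer)}}
  \Big)
```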
  {
    "path": "abs/2509.11574.md",
    "content": "### Gaussian-Plus-SDF SLAM: High-fidelity 3D Reconstruction at 150+ fps\n\nWhile recent Gaussian-based SLAM methods achieve photorealistic reconstruction from RGB-D data, their computational performance remains a critical bottleneck. State-of-the-art techniques operate at less than 20 fps, significantly lagging behind geometry-centric approaches like KinectFusion (hundreds of fps). This limitation stems from the heavy computational burden: modeling scenes requires numerous Gaussians and complex iterative optimization to fit RGB-D data, where insufficient Gaussian counts or optimization iterations cause severe quality degradation. To address this, we propose a Gaussian-SDF hybrid representation, combining a colorized Signed Distance Field (SDF) for smooth geometry and appearance with 3D Gaussians to capture underrepresented details. The SDF is efficiently constructed via RGB-D fusion (as in geometry-centric methods), while Gaussians undergo iterative optimization. Our representation enables drastic Gaussian reduction (50% fewer) by avoiding full-scene Gaussian modeling, and efficient Gaussian optimization (75% fewer iterations) through targeted appearance refinement. Building upon this representation, we develop GPS-SLAM (Gaussian-Plus-SDF SLAM), a real-time 3D reconstruction system achieving over 150 fps on real-world Azure Kinect sequences -- delivering an order-of-magnitude speedup over state-of-the-art techniques while maintaining comparable reconstruction quality. We will release the source code and data to facilitate future research.\n\n尽管近年来基于高斯的SLAM方法能够从RGB-D数据中实现照片级真实感的重建，但其计算性能仍是一个关键瓶颈。当前最先进的方法通常运行速度不足20帧每秒（fps），远低于几何为中心的方法（如KinectFusion，可达数百fps）。这一限制主要源于巨大的计算负担：场景建模需要大量高斯原语以及复杂的迭代优化过程来拟合RGB-D数据；若高斯数量或优化迭代不足，重建质量会显著下降。为此，我们提出了一种**高斯-SDF混合表示（Gaussian-SDF hybrid representation）**，将彩色有符号距离场（colorized Signed Distance Field, SDF）与三维高斯相结合，用于同时建模平滑的几何与外观，并补充未充分表示的细节。SDF部分通过类似几何中心方法的RGB-D融合高效构建，而高斯部分则通过迭代优化进行外观细化。该混合表示使得高斯数量减少约50%，避免了全场景高斯建模；同时，基于目标区域的外观优化使得优化迭代次数减少约75%。基于此表示，我们开发了**GPS-SLAM（Gaussian-Plus-SDF SLAM）**——一种实时三维重建系统，在真实的Azure Kinect数据上可实现超过150帧每秒的重建速度，相较现有最先进方法实现了数量级的加速，同时保持了可比的重建质量。我们将公开源码与数据以促进后续研究。\n"
  },
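The SDF half of the hybrid is built by classic RGB-D fusion rather than per-frame optimization; below is a minimal truncated-signed-distance voxel update in the KinectFusion style the abstract points to (truncation and weighting values are generic choices, not the paper's):

```python
# Running weighted average of truncated signed distances per voxel.
import numpy as np

TRUNC = 0.04                                   # truncation distance (m)

def integrate(tsdf, weight, depth_at_voxel, voxel_depth):
    """Fuse one frame: positive in front of the surface, negative behind."""
    sdf = np.clip(depth_at_voxel - voxel_depth, -TRUNC, TRUNC) / TRUNC
    new_w = weight + 1.0
    tsdf = (tsdf * weight + sdf) / new_w
    return tsdf, new_w

tsdf, w = np.zeros(5), np.zeros(5)
for _ in range(10):                            # fuse ten identical frames
    tsdf, w = integrate(tsdf, w, depth_at_voxel=np.full(5, 1.0),
                        voxel_depth=np.linspace(0.9, 1.1, 5))
print(tsdf)        # crosses zero where the measured surface lies
```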
  {
    "path": "abs/2509.11624.md",
    "content": "### A Controllable 3D Deepfake Generation Framework with Gaussian Splatting\n\nWe propose a novel 3D deepfake generation framework based on 3D Gaussian Splatting that enables realistic, identity-preserving face swapping and reenactment in a fully controllable 3D space. Compared to conventional 2D deepfake approaches that suffer from geometric inconsistencies and limited generalization to novel view, our method combines a parametric head model with dynamic Gaussian representations to support multi-view consistent rendering, precise expression control, and seamless background integration. To address editing challenges in point-based representations, we explicitly separate the head and background Gaussians and use pre-trained 2D guidance to optimize the facial region across views. We further introduce a repair module to enhance visual consistency under extreme poses and expressions. Experiments on NeRSemble and additional evaluation videos demonstrate that our method achieves comparable performance to state-of-the-art 2D approaches in identity preservation, as well as pose and expression consistency, while significantly outperforming them in multi-view rendering quality and 3D consistency. Our approach bridges the gap between 3D modeling and deepfake synthesis, enabling new directions for scene-aware, controllable, and immersive visual forgeries, revealing the threat that emerging 3D Gaussian Splatting technique could be used for manipulation attacks.\n\n我们提出了一种基于三维高斯溅射（3D Gaussian Splatting）的新型三维深度伪造生成框架，能够在可完全控制的三维空间中实现真实且保留身份特征的换脸与重演。与传统的二维深度伪造方法相比，后者常受几何不一致性与对新视角泛化能力有限的问题影响，我们的方法将参数化头部模型与动态高斯表示相结合，从而支持多视角一致渲染、精确的表情控制以及与背景的无缝融合。为了解决点基表示（point-based representations）中的编辑难题，我们对头部与背景高斯进行了显式分离，并利用预训练的二维引导在多视图间优化面部区域。此外，我们引入了一个修复模块，以在极端姿态与表情下增强视觉一致性。在NeRSemble及其他评估视频上的实验表明，我们的方法在身份保持以及姿态与表情一致性方面可达到与最先进二维方法相当的效果，同时在多视角渲染质量与三维一致性方面显著优于它们。我们的方法弥合了三维建模与深度伪造合成之间的鸿沟，推动了面向场景、可控且沉浸式的视觉伪造新方向，并揭示了新兴的三维高斯溅射技术可能被用于操纵攻击的风险。\n"
  },
  {
    "path": "abs/2509.11853.md",
    "content": "### Segmentation-Driven Initialization for Sparse-view 3D Gaussian Splatting\n\nSparse-view synthesis remains a challenging problem due to the difficulty of recovering accurate geometry and appearance from limited observations. While recent advances in 3D Gaussian Splatting (3DGS) have enabled real-time rendering with competitive quality, existing pipelines often rely on Structure-from-Motion (SfM) for camera pose estimation, an approach that struggles in genuinely sparse-view settings. Moreover, several SfM-free methods replace SfM with multi-view stereo (MVS) models, but generate massive numbers of 3D Gaussians by back-projecting every pixel into 3D space, leading to high memory costs. We propose Segmentation-Driven Initialization for Gaussian Splatting (SDI-GS), a method that mitigates inefficiency by leveraging region-based segmentation to identify and retain only structurally significant regions. This enables selective downsampling of the dense point cloud, preserving scene fidelity while substantially reducing Gaussian count. Experiments across diverse benchmarks show that SDI-GS reduces Gaussian count by up to 50% and achieves comparable or superior rendering quality in PSNR and SSIM, with only marginal degradation in LPIPS. It further enables faster training and lower memory footprint, advancing the practicality of 3DGS for constrained-view scenarios.\n\n稀疏视图合成仍然是一个具有挑战性的问题，因为从有限的观测中恢复精确的几何与外观信息极为困难。尽管近年来三维高斯溅射（3D Gaussian Splatting, 3DGS）在实时高质量渲染方面取得了显著进展，但现有管线通常依赖结构自运动（Structure-from-Motion, SfM）进行相机位姿估计，而该方法在真正的稀疏视图场景下表现不佳。此外，一些无SfM方法采用多视图立体（Multi-View Stereo, MVS）模型替代SfM，但往往通过将每个像素反投影到三维空间来生成大量三维高斯，从而导致极高的内存开销。为解决这一问题，我们提出了**基于分割驱动的高斯初始化方法（Segmentation-Driven Initialization for Gaussian Splatting, SDI-GS）**，该方法通过区域级分割识别并保留结构上重要的区域，从而显著提升建模效率。这一策略实现了对稠密点云的选择性下采样，在保持场景保真度的同时大幅减少高斯数量。大量实验表明，SDI-GS在多个基准数据集上可将高斯数量减少多达50%，并在PSNR和SSIM指标上取得与现有方法相当甚至更优的渲染质量，仅在LPIPS上有轻微性能下降。此外，该方法还显著提升了训练速度并降低了内存占用，使3DGS在受限视角场景下的应用更加实用。\n"
  },
  {
    "path": "abs/2509.11964.md",
    "content": "### E2-BKI: Evidential Ellipsoidal Bayesian Kernel Inference for Uncertainty-aware Gaussian Semantic Mapping\n\nSemantic mapping aims to construct a 3D semantic representation of the environment, providing essential knowledge for robots operating in complex outdoor settings. While Bayesian Kernel Inference (BKI) addresses discontinuities of map inference from sparse sensor data, existing semantic mapping methods suffer from various sources of uncertainties in challenging outdoor environments. To address these issues, we propose an uncertainty-aware semantic mapping framework that handles multiple sources of uncertainties, which significantly degrade mapping performance. Our method estimates uncertainties in semantic predictions using Evidential Deep Learning and incorporates them into BKI for robust semantic inference. It further aggregates noisy observations into coherent Gaussian representations to mitigate the impact of unreliable points, while employing geometry-aligned kernels that adapt to complex scene structures. These Gaussian primitives effectively fuse local geometric and semantic information, enabling robust, uncertainty-aware mapping in complex outdoor scenarios. Comprehensive evaluation across diverse off-road and urban outdoor environments demonstrates consistent improvements in mapping quality, uncertainty calibration, representational flexibility, and robustness, while maintaining real-time efficiency.\n\n语义建图（Semantic Mapping）旨在构建环境的三维语义表示，为机器人在复杂的户外环境中运行提供关键知识。尽管贝叶斯核推断（Bayesian Kernel Inference, BKI）在应对稀疏传感数据带来的地图推断不连续性方面表现良好，但现有的语义建图方法在复杂的户外环境中仍面临多种不确定性来源，严重影响建图性能。为此，我们提出了一种**不确定性感知的语义建图框架（Uncertainty-Aware Semantic Mapping Framework）**，能够同时处理多种类型的不确定性，从而显著提升建图的鲁棒性。该方法基于证据深度学习（Evidential Deep Learning）估计语义预测的不确定性，并将其引入BKI框架中以实现稳健的语义推断。此外，我们将噪声观测聚合为一致的高斯表示，以减弱不可靠点的影响，并设计了**几何对齐核函数（geometry-aligned kernels）**以自适应复杂场景结构。这些高斯原语能够有效融合局部几何与语义信息，从而实现鲁棒的、不确定性感知的复杂户外场景建图。在多种越野与城市户外环境中的综合评估表明，该方法在建图质量、不确定性校准、表示灵活性及鲁棒性等方面均取得了显著提升，同时保持了实时性能。\n"
  },
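A sketch of an evidential BKI update as we read the abstract (not the paper's exact equations): each query cell keeps Dirichlet concentrations, and observations add kernel-weighted evidence discounted by the predicted uncertainty; the compactly supported kernel below is the standard choice in the BKI mapping literature.

```python
import numpy as np

def sparse_kernel(d, ell=1.0):
    """Compact-support kernel that decays to 0 at distance ell."""
    r = np.clip(d / ell, 0.0, 1.0)
    return ((2 + np.cos(2 * np.pi * r)) / 3 * (1 - r)
            + np.sin(2 * np.pi * r) / (2 * np.pi))

def bki_update(alpha, dists, probs, conf):
    """alpha: (C,) Dirichlet concentrations of one cell; dists: (M,) point
    distances; probs: (M, C) predicted semantics; conf: (M,) evidential
    certainty used to discount unreliable observations."""
    k = sparse_kernel(dists)
    return alpha + (k * conf) @ probs

alpha = np.full(3, 0.1)                         # weak Dirichlet prior
dists = np.array([0.2, 0.5, 0.9])
probs = np.array([[0.9, 0.05, 0.05]] * 3)
conf = np.array([0.9, 0.8, 0.3])                # low-confidence point counts less
print(bki_update(alpha, dists, probs, conf))
```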
  {
    "path": "abs/2509.12138.md",
    "content": "### Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization\n\n3D Gaussian Splatting (3D-GS) has recently emerged as a powerful technique for real-time, photorealistic rendering by optimizing anisotropic Gaussian primitives from view-dependent images. While 3D-GS has been extended to scientific visualization, prior work remains limited to single-GPU settings, restricting scalability for large datasets on high-performance computing (HPC) systems. We present a distributed 3D-GS pipeline tailored for HPC. Our approach partitions data across nodes, trains Gaussian splats in parallel using multi-nodes and multi-GPUs, and merges splats for global rendering. To eliminate artifacts, we add ghost cells at partition boundaries and apply background masks to remove irrelevant pixels. Benchmarks on the Richtmyer-Meshkov datasets (about 106.7M Gaussians) show up to 3X speedup across 8 nodes on Polaris while preserving image quality. These results demonstrate that distributed 3D-GS enables scalable visualization of large-scale scientific data and provide a foundation for future in situ applications.\n\n三维高斯溅射（3D Gaussian Splatting, 3D-GS）近年来作为一种强大的实时照片级渲染技术崭露头角，其核心思想是从视角相关图像中优化各向异性高斯原语。尽管3D-GS已被扩展用于科学可视化，但现有研究主要局限于单GPU环境，难以在高性能计算（High-Performance Computing, HPC）系统上高效处理大规模数据集。为此，我们提出了一种面向HPC环境的**分布式3D-GS管线**。该方法通过对数据进行分区，在多节点、多GPU间并行训练高斯溅射，并在训练后对各子区域结果进行合并以实现全局渲染。为消除边界伪影，我们在分区边界引入“幽灵单元”（ghost cells），并使用背景掩码去除无关像素。基于Richtmyer-Meshkov数据集（约1.067亿个高斯）的基准测试结果表明，我们的方法在Polaris系统上使用8个节点可实现高达3倍的加速，同时保持图像质量不变。实验结果表明，分布式3D-GS能够实现对大规模科学数据的可扩展可视化，为未来的**原位可视化**应用奠定了基础。\n"
  },
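The partitioning-with-ghost-cells step lends itself to a short sketch (ours; slab partitioning along one axis and the margin width are illustrative assumptions): each node trains on its slab plus a ghost margin so splats near boundaries are optimized consistently on both sides.

```python
# Slab partitioning with ghost margins for distributed training.
import numpy as np

def partition_with_ghosts(points, n_parts, ghost=0.05):
    """Split points along x into slabs, each padded by a ghost margin."""
    lo, hi = points[:, 0].min(), points[:, 0].max()
    edges = np.linspace(lo, hi, n_parts + 1)
    return [points[(points[:, 0] >= a - ghost) & (points[:, 0] <= b + ghost)]
            for a, b in zip(edges[:-1], edges[1:])]

pts = np.random.default_rng(0).uniform(0, 1, size=(10_000, 3))
parts = partition_with_ghosts(pts, 8)
print([len(p) for p in parts])   # overlapping counts due to ghost cells
```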
  {
    "path": "abs/2509.12931.md",
    "content": "### 4DRadar-GS: Self-Supervised Dynamic Driving Scene Reconstruction with 4D Radar\n\n3D reconstruction and novel view synthesis are critical for validating autonomous driving systems and training advanced perception models. Recent self-supervised methods have gained significant attention due to their cost-effectiveness and enhanced generalization in scenarios where annotated bounding boxes are unavailable. However, existing approaches, which often rely on frequency-domain decoupling or optical flow, struggle to accurately reconstruct dynamic objects due to imprecise motion estimation and weak temporal consistency, resulting in incomplete or distorted representations of dynamic scene elements. To address these challenges, we propose 4DRadar-GS, a 4D Radar-augmented self-supervised 3D reconstruction framework tailored for dynamic driving scenes. Specifically, we first present a 4D Radar-assisted Gaussian initialization scheme that leverages 4D Radar's velocity and spatial information to segment dynamic objects and recover monocular depth scale, generating accurate Gaussian point representations. In addition, we propose a Velocity-guided PointTrack (VGPT) model, which is jointly trained with the reconstruction pipeline under scene flow supervision, to track fine-grained dynamic trajectories and construct temporally consistent representations. Evaluated on the OmniHD-Scenes dataset, 4DRadar-GS achieves state-of-the-art performance in dynamic driving scene 3D reconstruction.\n\n三维重建与新视角合成在自动驾驶系统验证和高级感知模型训练中具有关键作用。近年来，自监督方法因其高性价比和在无标注边界框场景中的优越泛化能力而备受关注。然而，现有方法多依赖频域解耦或光流估计，在处理动态目标时仍存在运动估计不精确、时间一致性弱等问题，导致动态场景要素的重建结果不完整或发生畸变。为应对这些挑战，我们提出了**4DRadar-GS**——一种针对动态驾驶场景的**基于4D雷达增强的自监督三维重建框架**。具体而言，我们首先提出了**4D雷达辅助的高斯初始化方案（4D Radar-assisted Gaussian Initialization）**，利用4D雷达的速度与空间信息对动态目标进行分割并恢复单目深度尺度，从而生成精确的高斯点表示。此外，我们还提出了**速度引导的点追踪模型（Velocity-guided PointTrack, VGPT）**，该模型在场景流监督下与重建管线联合训练，以跟踪细粒度的动态轨迹并构建时间一致的场景表示。在OmniHD-Scenes数据集上的实验结果表明，4DRadar-GS在动态驾驶场景三维重建任务中取得了当前最优的性能。\n"
  },
  {
    "path": "abs/2509.12938.md",
    "content": "### Beyond Averages: Open-Vocabulary 3D Scene Understanding with Gaussian Splatting and Bag of Embeddings\n\nNovel view synthesis has seen significant advancements with 3D Gaussian Splatting (3DGS), enabling real-time photorealistic rendering. However, the inherent fuzziness of Gaussian Splatting presents challenges for 3D scene understanding, restricting its broader applications in AR/VR and robotics. While recent works attempt to learn semantics via 2D foundation model distillation, they inherit fundamental limitations: alpha blending averages semantics across objects, making 3D-level understanding impossible. We propose a paradigm-shifting alternative that bypasses differentiable rendering for semantics entirely. Our key insight is to leverage predecomposed object-level Gaussians and represent each object through multiview CLIP feature aggregation, creating comprehensive \"bags of embeddings\" that holistically describe objects. This allows: (1) accurate open-vocabulary object retrieval by comparing text queries to object-level (not Gaussian-level) embeddings, and (2) seamless task adaptation: propagating object IDs to pixels for 2D segmentation or to Gaussians for 3D extraction. Experiments demonstrate that our method effectively overcomes the challenges of 3D open-vocabulary object extraction while remaining comparable to state-of-the-art performance in 2D open-vocabulary segmentation, ensuring minimal compromise.\n\n三维新视角合成（Novel View Synthesis）随着三维高斯溅射（3D Gaussian Splatting, 3DGS）的发展取得了显著突破，实现了实时的照片级渲染。然而，高斯溅射固有的模糊性对三维场景理解带来了挑战，从而限制了其在增强现实（AR）、虚拟现实（VR）以及机器人等领域的广泛应用。尽管近期一些工作尝试通过二维基础模型蒸馏（2D foundation model distillation）来学习语义信息，但这类方法存在根本性缺陷：α混合会在多个物体间平均语义，使得实现三维层面的语义理解成为不可能。为此，我们提出了一种**全新的范式转变式方法**，完全绕过语义的可微渲染过程。我们的核心思想是利用预分解的物体级高斯表示（object-level Gaussians），并通过多视角CLIP特征聚合构建出全面的“特征嵌入包（bags of embeddings）”，从整体上描述每个物体。这一设计带来了两大能力：（1）通过将文本查询与物体级（而非高斯级）嵌入进行匹配，实现精确的开放词汇物体检索；（2）实现任务的无缝适配：可将物体ID传播到像素层用于二维分割，或传播到高斯层用于三维提取。实验结果表明，我们的方法有效克服了三维开放词汇物体提取的核心难题，同时在二维开放词汇分割任务上保持与当前最先进方法相当的性能，实现了性能与灵活性的双重平衡。\n"
  },
  {
    "path": "abs/2509.13013.md",
    "content": "### Dream3DAvatar: Text-Controlled 3D Avatar Reconstruction from a Single Image\n\nWith the rapid advancement of 3D representation techniques and generative models, substantial progress has been made in reconstructing full-body 3D avatars from a single image. However, this task remains fundamentally ill-posedness due to the limited information available from monocular input, making it difficult to control the geometry and texture of occluded regions during generation. To address these challenges, we redesign the reconstruction pipeline and propose Dream3DAvatar, an efficient and text-controllable two-stage framework for 3D avatar generation. In the first stage, we develop a lightweight, adapter-enhanced multi-view generation model. Specifically, we introduce the Pose-Adapter to inject SMPL-X renderings and skeletal information into SDXL, enforcing geometric and pose consistency across views. To preserve facial identity, we incorporate ID-Adapter-G, which injects high-resolution facial features into the generation process. Additionally, we leverage BLIP2 to generate high-quality textual descriptions of the multi-view images, enhancing text-driven controllability in occluded regions. In the second stage, we design a feedforward Transformer model equipped with a multi-view feature fusion module to reconstruct high-fidelity 3D Gaussian Splat representations (3DGS) from the generated images. Furthermore, we introduce ID-Adapter-R, which utilizes a gating mechanism to effectively fuse facial features into the reconstruction process, improving high-frequency detail recovery. Extensive experiments demonstrate that our method can generate realistic, animation-ready 3D avatars without any post-processing and consistently outperforms existing baselines across multiple evaluation metrics.\n\n随着三维表示技术与生成模型的快速发展，从单张图像重建全身三维头像的研究取得了显著进展。然而，由于单目输入所提供的信息有限，该任务在本质上仍是不适定问题，使得在生成过程中难以有效控制被遮挡区域的几何形状与纹理外观。为应对这一挑战，我们重新设计了重建流程，提出了 **Dream3DAvatar**——一个高效且可文本控制的两阶段三维头像生成框架。第一阶段中，我们构建了一个轻量化、适配器增强的多视角生成模型。具体而言，我们引入 **Pose-Adapter**，将 SMPL-X 渲染与骨骼信息注入到 SDXL 模型中，以确保多视角间的几何与姿态一致性。为了保持面部身份特征，我们进一步设计 **ID-Adapter-G**，在生成过程中注入高分辨率的面部特征。此外，我们利用 **BLIP2** 自动生成多视角图像的高质量文本描述，从而增强在被遮挡区域的文本驱动可控性。在第二阶段，我们设计了一个具备多视角特征融合模块的前馈式 Transformer 模型，以从生成图像中重建高保真的三维高斯溅射表示（3DGS）。此外，我们提出 **ID-Adapter-R**，通过门控机制将面部特征有效融合到重建过程中，从而提升高频细节的恢复能力。大量实验表明，我们的方法能够在无需后处理的情况下生成逼真且可直接用于动画制作的三维头像，并在多项评测指标上持续优于现有基线方法。\n"
  },
  {
    "path": "abs/2509.13482.md",
    "content": "### Improving 3D Gaussian Splatting Compression by Scene-Adaptive Lattice Vector Quantization\n\n3D Gaussian Splatting (3DGS) is rapidly gaining popularity for its photorealistic rendering quality and real-time performance, but it generates massive amounts of data. Hence compressing 3DGS data is necessary for the cost effectiveness of 3DGS models. Recently, several anchor-based neural compression methods have been proposed, achieving good 3DGS compression performance. However, they all rely on uniform scalar quantization (USQ) due to its simplicity. A tantalizing question is whether more sophisticated quantizers can improve the current 3DGS compression methods with very little extra overhead and minimal change to the system. The answer is yes by replacing USQ with lattice vector quantization (LVQ). To better capture scene-specific characteristics, we optimize the lattice basis for each scene, improving LVQ's adaptability and R-D efficiency. This scene-adaptive LVQ (SALVQ) strikes a balance between the R-D efficiency of vector quantization and the low complexity of USQ. SALVQ can be seamlessly integrated into existing 3DGS compression architectures, enhancing their R-D performance with minimal modifications and computational overhead. Moreover, by scaling the lattice basis vectors, SALVQ can dynamically adjust lattice density, enabling a single model to accommodate multiple bit rate targets. This flexibility eliminates the need to train separate models for different compression levels, significantly reducing training time and memory consumption.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）因其逼真的渲染质量和实时性能而迅速流行，但同时会产生海量数据。因此，为了提高 3DGS 模型的成本效益，对 3DGS 数据进行压缩是必要的。近年来，已有多种基于锚点的神经压缩方法被提出，并在 3DGS 压缩性能上取得了良好效果。然而，这些方法普遍依赖于统一标量量化（Uniform Scalar Quantization, USQ），主要因为其实现简单。一个引人注目的问题是：能否通过更复杂的量化器，在几乎不增加开销、且仅需最小改动的情况下进一步提升当前 3DGS 压缩方法的性能？答案是肯定的——可以通过将 USQ 替换为晶格向量量化（Lattice Vector Quantization, LVQ）实现。为了更好地捕捉场景特有特征，我们针对每个场景优化晶格基，从而提高 LVQ 的自适应性与码率失真（R-D）效率。这种场景自适应晶格向量量化（Scene-Adaptive LVQ, SALVQ）在向量量化的 R-D 效率与 USQ 的低复杂度之间取得了平衡。SALVQ 能够无缝集成到现有的 3DGS 压缩架构中，以极小的修改和计算开销提升其 R-D 性能。此外，通过缩放晶格基向量，SALVQ 可以动态调整晶格密度，使单个模型能够支持多种比特率目标。这种灵活性消除了针对不同压缩等级分别训练模型的需求，大大减少了训练时间和内存消耗。\n"
  },
  {
    "path": "abs/2509.13536.md",
    "content": "### MemGS: Memory-Efficient Gaussian Splatting for Real-Time SLAM\n\nRecent advancements in 3D Gaussian Splatting (3DGS) have made a significant impact on rendering and reconstruction techniques. Current research predominantly focuses on improving rendering performance and reconstruction quality using high-performance desktop GPUs, largely overlooking applications for embedded platforms like micro air vehicles (MAVs). These devices, with their limited computational resources and memory, often face a trade-off between system performance and reconstruction quality. In this paper, we improve existing methods in terms of GPU memory usage while enhancing rendering quality. Specifically, to address redundant 3D Gaussian primitives in SLAM, we propose merging them in voxel space based on geometric similarity. This reduces GPU memory usage without impacting system runtime performance. Furthermore, rendering quality is improved by initializing 3D Gaussian primitives via Patch-Grid (PG) point sampling, enabling more accurate modeling of the entire scene. Quantitative and qualitative evaluations on publicly available datasets demonstrate the effectiveness of our improvements.\n\n近年来，三维高斯溅射（3D Gaussian Splatting, 3DGS）的发展对渲染与重建技术产生了显著影响。现有研究主要集中在利用高性能桌面 GPU 提升渲染性能与重建质量，而对诸如微型飞行器（Micro Air Vehicles, MAVs）等嵌入式平台的应用关注较少。由于这些设备的计算资源与内存有限，往往需要在系统性能与重建质量之间进行权衡。本文针对这一问题，从 GPU 内存占用与渲染质量两方面改进了现有方法。具体而言，为了解决 SLAM 过程中冗余三维高斯基元的问题，我们在体素空间中基于几何相似性进行合并，从而在不影响系统运行时性能的前提下降低 GPU 内存占用。此外，我们通过基于 Patch-Grid（PG）点采样的初始化方法提升了三维高斯基元的渲染质量，使得整体场景建模更加精确。在公开数据集上的定量与定性实验结果均验证了所提改进方法的有效性。\n"
  },
  {
    "path": "abs/2509.13938.md",
    "content": "### Plug-and-Play PDE Optimization for 3D Gaussian Splatting: Toward High-Quality Rendering and Reconstruction\n\n3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction by achieving high-quality novel view synthesis with fast rendering speed, introducing 3D Gaussian primitives to represent the scene. However, 3DGS encounters blurring and floaters when applied to complex scenes, caused by the reconstruction of redundant and ambiguous geometric structures. We attribute this issue to the unstable optimization of the Gaussians. To address this limitation, we present a plug-and-play PDE-based optimization method that overcomes the optimization constraints of 3DGS-based approaches in various tasks, such as novel view synthesis and surface reconstruction. Firstly, we theoretically derive that the 3DGS optimization procedure can be modeled as a PDE, and introduce a viscous term to ensure stable optimization. Secondly, we use the Material Point Method (MPM) to obtain a stable numerical solution of the PDE, which enhances both global and local constraints. Additionally, an effective Gaussian densification strategy and particle constraints are introduced to ensure fine-grained details. Extensive qualitative and quantitative experiments confirm that our method achieves state-of-the-art rendering and reconstruction quality.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）以其高速渲染与高质量的新视图合成能力，革新了辐射场重建方法，引入三维高斯基元作为场景表示。然而，当 3DGS 应用于复杂场景时，常会出现模糊与漂浮伪影等问题，其根源在于冗余和模糊几何结构的重建。我们将这一现象归因于高斯基元优化过程的不稳定性。为解决这一问题，本文提出了一种可即插即用的基于偏微分方程（PDE）的优化方法，突破了现有 3DGS 方法在新视图合成与表面重建等任务中的优化限制。首先，我们从理论上推导出 3DGS 的优化过程可建模为一个 PDE，并引入粘性项以确保优化的稳定性。其次，我们采用材料点法（Material Point Method, MPM）来求解 PDE 的稳定数值解，从而在全局与局部层面同时增强约束。此外，我们还提出了一种有效的高斯致密化策略与粒子约束机制，以保证细节的精细表达。大量定性与定量实验结果表明，本文方法在渲染与重建质量上均达到了当前最优水平。\n"
  },
  {
    "path": "abs/2509.14191.md",
    "content": "### MCGS-SLAM: A Multi-Camera SLAM Framework Using Gaussian Splatting for High-Fidelity Mapping\n\nRecent progress in dense SLAM has primarily targeted monocular setups, often at the expense of robustness and geometric coverage. We present MCGS-SLAM, the first purely RGB-based multi-camera SLAM system built on 3D Gaussian Splatting (3DGS). Unlike prior methods relying on sparse maps or inertial data, MCGS-SLAM fuses dense RGB inputs from multiple viewpoints into a unified, continuously optimized Gaussian map. A multi-camera bundle adjustment (MCBA) jointly refines poses and depths via dense photometric and geometric residuals, while a scale consistency module enforces metric alignment across views using low-rank priors. The system supports RGB input and maintains real-time performance at large scale. Experiments on synthetic and real-world datasets show that MCGS-SLAM consistently yields accurate trajectories and photorealistic reconstructions, usually outperforming monocular baselines. Notably, the wide field of view from multi-camera input enables reconstruction of side-view regions that monocular setups miss, critical for safe autonomous operation. These results highlight the promise of multi-camera Gaussian Splatting SLAM for high-fidelity mapping in robotics and autonomous driving.\n\n近年来，稠密 SLAM 的研究主要集中于单目系统，这往往以牺牲鲁棒性和几何覆盖率为代价。本文提出 **MCGS-SLAM**，这是首个基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的纯 RGB 多相机 SLAM 系统。与以往依赖稀疏地图或惯性数据的方法不同，MCGS-SLAM 将来自多个视角的稠密 RGB 输入融合为一个统一、持续优化的高斯地图。系统中的多相机联合优化（Multi-Camera Bundle Adjustment, MCBA）通过稠密的光度与几何残差共同优化相机位姿与深度；同时，尺度一致性模块利用低秩先验实现多视角间的度量对齐。该系统支持 RGB 输入，并在大规模场景下保持实时性能。在公开的合成与真实数据集上的实验表明，MCGS-SLAM 能稳定地获得高精度轨迹与逼真的重建结果，整体性能优于单目基线方法。值得注意的是，多相机输入带来的宽视场显著提升了侧向区域的重建能力，而这些区域在单目系统中往往被忽略，这对于安全的自主操作尤为关键。实验结果表明，多相机高斯溅射 SLAM 在机器人与自动驾驶等领域的高保真建图任务中具有广阔的应用前景。\n"
  },
  {
    "path": "abs/2509.14421.md",
    "content": "### Perception-Integrated Safety Critical Control via Analytic Collision Cone Barrier Functions on 3D Gaussian Splatting\n\nWe present a perception-driven safety filter that converts each 3D Gaussian Splat (3DGS) into a closed-form forward collision cone, which in turn yields a first-order control barrier function (CBF) embedded within a quadratic program (QP). By exploiting the analytic geometry of splats, our formulation provides a continuous, closed-form representation of collision constraints that is both simple and computationally efficient. Unlike distance-based CBFs, which tend to activate reactively only when an obstacle is already close, our collision-cone CBF activates proactively, allowing the robot to adjust earlier and thereby produce smoother and safer avoidance maneuvers at lower computational cost. We validate the method on a large synthetic scene with approximately 170k splats, where our filter reduces planning time by a factor of 3 and significantly decreased trajectory jerk compared to a state-of-the-art 3DGS planner, while maintaining the same level of safety. The approach is entirely analytic, requires no high-order CBF extensions (HOCBFs), and generalizes naturally to robots with physical extent through a principled Minkowski-sum inflation of the splats. These properties make the method broadly applicable to real-time navigation in cluttered, perception-derived extreme environments, including space robotics and satellite systems.\n\n本文提出了一种基于感知驱动的安全过滤器（safety filter），将每个三维高斯溅射（3D Gaussian Splat, 3DGS）转换为闭式的前向碰撞锥（forward collision cone），从而构建出嵌入于二次规划（Quadratic Program, QP）中的一阶控制屏障函数（Control Barrier Function, CBF）。该方法充分利用高斯溅射的解析几何特性，提供了一种连续、闭式且计算高效的碰撞约束表示。与传统的基于距离的 CBF 不同，后者通常在障碍物已接近时才被动触发，我们的碰撞锥 CBF 能够**主动激活**，使机器人能够更早地调整轨迹，从而在更低的计算成本下实现更平滑、更安全的避障行为。我们在一个包含约 17 万个高斯溅射的大规模合成场景中验证了该方法，结果表明，相较于最新的 3DGS 规划器，我们的过滤器将规划时间缩短了 3 倍，并显著降低了轨迹抖动（trajectory jerk），同时保持相同的安全水平。该方法完全基于解析形式，无需使用高阶控制屏障函数（HOCBF），并可通过严格的 Minkowski 求和膨胀（Minkowski-sum inflation）自然推广至具有物理尺寸的机器人。凭借这些特性，该方法可广泛应用于复杂、基于感知的极端环境中的实时导航任务，如空间机器人与卫星系统等。\n"
  },
  {
    "path": "abs/2509.14687.md",
    "content": "### RealMirror: A Comprehensive, Open-Source Vision-Language-Action Platform for Embodied AI\n\nThe emerging field of Vision-Language-Action (VLA) for humanoid robots faces several fundamental challenges, including the high cost of data acquisition, the lack of a standardized benchmark, and the significant gap between simulation and the real world. To overcome these obstacles, we propose RealMirror, a comprehensive, open-source embodied AI VLA platform. RealMirror builds an efficient, low-cost data collection, model training, and inference system that enables end-to-end VLA research without requiring a real robot. To facilitate model evolution and fair comparison, we also introduce a dedicated VLA benchmark for humanoid robots, featuring multiple scenarios, extensive trajectories, and various VLA models. Furthermore, by integrating generative models and 3D Gaussian Splatting to reconstruct realistic environments and robot models, we successfully demonstrate zero-shot Sim2Real transfer, where models trained exclusively on simulation data can perform tasks on a real robot seamlessly, without any fine-tuning. In conclusion, with the unification of these critical components, RealMirror provides a robust framework that significantly accelerates the development of VLA models for humanoid robots.\n\n新兴的人形机器人视觉-语言-动作（Vision-Language-Action, VLA）研究领域面临若干根本性挑战，包括数据采集成本高、缺乏统一基准测试，以及模拟与真实世界之间存在显著差距。为应对这些问题，我们提出了 **RealMirror**——一个全面的开源具身智能 VLA 平台。RealMirror 构建了一个高效、低成本的数据采集、模型训练与推理系统，使研究者无需真实机器人即可开展端到端的 VLA 研究。为促进模型演化与公平比较，我们还引入了一个专门面向人形机器人的 VLA 基准测试，涵盖多种场景、大量轨迹及多类型 VLA 模型。此外，我们结合生成模型与三维高斯溅射（3D Gaussian Splatting）重建逼真的环境与机器人模型，实现了**零样本仿真到现实（Sim2Real）迁移**——即仅基于模拟数据训练的模型即可无缝执行真实机器人任务，无需任何微调。综上所述，RealMirror 通过整合这些关键组件，提供了一个稳健的框架，显著加速了人形机器人 VLA 模型的研究与发展。\n"
  },
  {
    "path": "abs/2509.14739.md",
    "content": "### FMGS-Avatar: Mesh-Guided 2D Gaussian Splatting with Foundation Model Priors for 3D Monocular Avatar Reconstruction\n\nReconstructing high-fidelity animatable human avatars from monocular videos remains challenging due to insufficient geometric information in single-view observations. While recent 3D Gaussian Splatting methods have shown promise, they struggle with surface detail preservation due to the free-form nature of 3D Gaussian primitives. To address both the representation limitations and information scarcity, we propose a novel method, \\textbf{FMGS-Avatar}, that integrates two key innovations. First, we introduce Mesh-Guided 2D Gaussian Splatting, where 2D Gaussian primitives are attached directly to template mesh faces with constrained position, rotation, and movement, enabling superior surface alignment and geometric detail preservation. Second, we leverage foundation models trained on large-scale datasets, such as Sapiens, to complement the limited visual cues from monocular videos. However, when distilling multi-modal prior knowledge from foundation models, conflicting optimization objectives can emerge as different modalities exhibit distinct parameter sensitivities. We address this through a coordinated training strategy with selective gradient isolation, enabling each loss component to optimize its relevant parameters without interference. Through this combination of enhanced representation and coordinated information distillation, our approach significantly advances 3D monocular human avatar reconstruction. Experimental evaluation demonstrates superior reconstruction quality compared to existing methods, with notable gains in geometric accuracy and appearance fidelity while providing rich semantic information. Additionally, the distilled prior knowledge within a shared canonical space naturally enables spatially and temporally consistent rendering under novel views and poses.\n\n从单目视频重建高保真、可动画化的人体头像仍然是一项极具挑战的任务，原因在于单视角观测所提供的几何信息有限。尽管近期的三维高斯溅射（3D Gaussian Splatting, 3DGS）方法展现了潜力，但由于三维高斯基元的自由形式特征，其在表面细节保留方面仍存在不足。为同时解决表示能力受限与信息不足的问题，我们提出了一种全新的方法——**FMGS-Avatar**，其核心包含两项创新设计。首先，我们提出了**基于网格引导的二维高斯溅射（Mesh-Guided 2D Gaussian Splatting）**，将二维高斯基元直接附着于模板网格的面片上，并在位置、旋转与运动上施加约束，从而实现更优的表面对齐与几何细节保持。其次，我们利用在大规模数据集（如 Sapiens）上训练的基础模型，弥补单目视频视觉线索不足的问题。然而，在从基础模型蒸馏多模态先验知识的过程中，不同模态的参数敏感性差异会导致优化目标冲突。针对这一问题，我们设计了一种**选择性梯度隔离的协同训练策略**，使得各个损失项仅优化其对应的相关参数，避免相互干扰。通过增强的表示方式与协调的信息蒸馏机制，我们的方法在单目人体头像重建方面取得了显著进展。实验结果表明，所提方法在几何精度与外观保真度上均优于现有方法，并能提供丰富的语义信息。此外，在共享的规范化空间中蒸馏的先验知识，使得在新视角与新姿态下的渲染在空间与时间上都具有良好一致性。\n"
  },
  {
    "path": "abs/2509.15249.md",
    "content": "### Causal Reasoning Elicits Controllable 3D Scene Generation\n\nExisting 3D scene generation methods often struggle to model the complex logical dependencies and physical constraints between objects, limiting their ability to adapt to dynamic and realistic environments. We propose CausalStruct, a novel framework that embeds causal reasoning into 3D scene generation. Utilizing large language models (LLMs), We construct causal graphs where nodes represent objects and attributes, while edges encode causal dependencies and physical constraints. CausalStruct iteratively refines the scene layout by enforcing causal order to determine the placement order of objects and applies causal intervention to adjust the spatial configuration according to physics-driven constraints, ensuring consistency with textual descriptions and real-world dynamics. The refined scene causal graph informs subsequent optimization steps, employing a Proportional-Integral-Derivative(PID) controller to iteratively tune object scales and positions. Our method uses text or images to guide object placement and layout in 3D scenes, with 3D Gaussian Splatting and Score Distillation Sampling improving shape accuracy and rendering stability. Extensive experiments show that CausalStruct generates 3D scenes with enhanced logical coherence, realistic spatial interactions, and robust adaptability.\n\n现有的三维场景生成方法在建模物体之间复杂的逻辑依赖关系和物理约束方面仍存在困难，从而限制了其对动态和真实环境的适应能力。为此，我们提出了 **CausalStruct**——一种将因果推理嵌入三维场景生成的新框架。通过利用大型语言模型（Large Language Models, LLMs），我们构建因果图，其中节点表示物体及其属性，边则编码物体间的因果依赖与物理约束。CausalStruct 通过施加因果顺序来确定物体的放置顺序，并通过因果干预（causal intervention）依据物理驱动约束调整空间配置，从而确保场景与文本描述及真实物理动态相一致。经过优化的场景因果图还可指导后续优化步骤，其中我们引入比例-积分-微分（Proportional-Integral-Derivative, PID）控制器以迭代调整物体的尺度与位置。该方法可利用文本或图像引导三维场景中物体的布局与摆放，并结合三维高斯溅射（3D Gaussian Splatting）与得分蒸馏采样（Score Distillation Sampling）以提升形状精度与渲染稳定性。大量实验结果表明，CausalStruct 能生成具有更高逻辑一致性、更真实空间交互以及更强鲁棒适应性的三维场景。\n"
  },
  {
    "path": "abs/2509.15548.md",
    "content": "### MS-GS: Multi-Appearance Sparse-View 3D Gaussian Splatting in the Wild\n\nIn-the-wild photo collections often contain limited volumes of imagery and exhibit multiple appearances, e.g., taken at different times of day or seasons, posing significant challenges to scene reconstruction and novel view synthesis. Although recent adaptations of Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) have improved in these areas, they tend to oversmooth and are prone to overfitting. In this paper, we present MS-GS, a novel framework designed with Multi-appearance capabilities in Sparse-view scenarios using 3DGS. To address the lack of support due to sparse initializations, our approach is built on the geometric priors elicited from monocular depth estimations. The key lies in extracting and utilizing local semantic regions with a Structure-from-Motion (SfM) points anchored algorithm for reliable alignment and geometry cues. Then, to introduce multi-view constraints, we propose a series of geometry-guided supervision at virtual views in a fine-grained and coarse scheme to encourage 3D consistency and reduce overfitting. We also introduce a dataset and an in-the-wild experiment setting to set up more realistic benchmarks. We demonstrate that MS-GS achieves photorealistic renderings under various challenging sparse-view and multi-appearance conditions and outperforms existing approaches significantly across different datasets.\n\n野外场景的照片集合通常包含有限数量的图像，并且呈现多种外观变化（例如不同时间或季节拍摄），这给场景重建和新视图合成带来了极大挑战。尽管近期针对神经辐射场（Neural Radiance Field, NeRF）和三维高斯溅射（3D Gaussian Splatting, 3DGS）的改进方法在这些方面取得了进展，但仍容易出现过度平滑和过拟合问题。本文提出了 **MS-GS**，一种基于 3DGS 的新框架，旨在应对稀疏视角（Sparse-view）条件下的多外观（Multi-appearance）建模。针对稀疏初始化导致的几何信息不足，我们的方法利用来自单目深度估计的几何先验进行构建。其关键在于基于运动结构（Structure-from-Motion, SfM）特征点的锚定算法，提取并利用局部语义区域以提供可靠的几何对齐与结构线索。随后，为引入多视约束，我们提出了一种结合细粒度与粗粒度策略的几何引导虚拟视图监督机制，以增强三维一致性并缓解过拟合。此外，我们还构建了相应的数据集与野外实验设置，以建立更加贴近真实环境的评测基准。实验结果表明，MS-GS 在多种具有挑战性的稀疏视角与多外观条件下均能实现逼真的渲染效果，并在多个数据集上显著优于现有方法。\n"
  },
  {
    "path": "abs/2509.15645.md",
    "content": "### GS-Scale: Unlocking Large-Scale 3D Gaussian Splatting Training via Host Offloading\n\nThe advent of 3D Gaussian Splatting has revolutionized graphics rendering by delivering high visual quality and fast rendering speeds. However, training large-scale scenes at high quality remains challenging due to the substantial memory demands required to store parameters, gradients, and optimizer states, which can quickly overwhelm GPU memory. To address these limitations, we propose GS-Scale, a fast and memory-efficient training system for 3D Gaussian Splatting. GS-Scale stores all Gaussians in host memory, transferring only a subset to the GPU on demand for each forward and backward pass. While this dramatically reduces GPU memory usage, it requires frustum culling and optimizer updates to be executed on the CPU, introducing slowdowns due to CPU's limited compute and memory bandwidth. To mitigate this, GS-Scale employs three system-level optimizations: (1) selective offloading of geometric parameters for fast frustum culling, (2) parameter forwarding to pipeline CPU optimizer updates with GPU computation, and (3) deferred optimizer update to minimize unnecessary memory accesses for Gaussians with zero gradients. Our extensive evaluations on large-scale datasets demonstrate that GS-Scale significantly lowers GPU memory demands by 3.3-5.6x, while achieving training speeds comparable to GPU without host offloading. This enables large-scale 3D Gaussian Splatting training on consumer-grade GPUs; for instance, GS-Scale can scale the number of Gaussians from 4 million to 18 million on an RTX 4070 Mobile GPU, leading to 23-35% LPIPS (learned perceptual image patch similarity) improvement.\n\n三维高斯溅射（3D Gaussian Splatting）的出现革新了图形渲染领域，在保证高视觉质量的同时实现了高速渲染。然而，在大规模场景下实现高质量训练仍然面临挑战，其主要瓶颈在于存储参数、梯度及优化器状态所需的巨大显存开销，这些需求极易耗尽 GPU 内存。为应对这一问题，我们提出了 **GS-Scale**——一种针对三维高斯溅射的快速且内存高效的训练系统。GS-Scale 将所有高斯基元存储在主机内存中，并在每次前向与反向传播时按需将部分数据传输至 GPU。尽管这一设计显著降低了 GPU 内存占用，但也导致视锥裁剪（frustum culling）与优化器更新需在 CPU 上执行，从而因 CPU 计算与内存带宽受限而引入性能瓶颈。为缓解这一问题，GS-Scale 采用了三项系统级优化策略：（1）**几何参数的选择性卸载**，以加速视锥裁剪；（2）**参数前传机制**，通过流水线方式并行 CPU 优化器更新与 GPU 计算；（3）**延迟优化更新策略**，以减少对梯度为零的高斯基元的冗余内存访问。我们在大规模数据集上的实验表明，GS-Scale 可将 GPU 内存需求降低 3.3–5.6 倍，同时在训练速度上与无主机卸载的纯 GPU 训练相当。该系统使得在消费级 GPU 上进行大规模 3DGS 训练成为可能——例如，GS-Scale 可在 RTX 4070 Mobile GPU 上将高斯数量从 400 万扩展至 1800 万，从而在 LPIPS（学习感知图像块相似度）指标上提升 23–35%。\n"
  },
  {
    "path": "abs/2509.15648.md",
    "content": "### FingerSplat: Contactless Fingerprint 3D Reconstruction and Generation based on 3D Gaussian Splatting\n\nResearchers have conducted many pioneer researches on contactless fingerprints, yet the performance of contactless fingerprint recognition still lags behind contact-based methods primary due to the insufficient contactless fingerprint data with pose variations and lack of the usage of implicit 3D fingerprint representations. In this paper, we introduce a novel contactless fingerprint 3D registration, reconstruction and generation framework by integrating 3D Gaussian Splatting, with the goal of offering a new paradigm for contactless fingerprint recognition that integrates 3D fingerprint reconstruction and generation. To our knowledge, this is the first work to apply 3D Gaussian Splatting to the field of fingerprint recognition, and the first to achieve effective 3D registration and complete reconstruction of contactless fingerprints with sparse input images and without requiring camera parameters information. Experiments on 3D fingerprint registration, reconstruction, and generation prove that our method can accurately align and reconstruct 3D fingerprints from 2D images, and sequentially generates high-quality contactless fingerprints from 3D model, thus increasing the performances for contactless fingerprint recognition.\n\n研究者们已在非接触式指纹识别领域进行了大量开创性研究，但其识别性能仍明显落后于接触式方法，主要原因在于现有非接触式指纹数据在姿态变化上的不足，以及缺乏对隐式三维指纹表示的有效利用。本文提出了一种融合三维高斯溅射（3D Gaussian Splatting）的全新非接触式指纹三维配准、重建与生成框架，旨在为非接触式指纹识别提供一个融合三维重建与生成的新范式。据我们所知，这是首次将三维高斯溅射应用于指纹识别领域的研究，也是首个在稀疏输入图像且无需相机参数信息的条件下，实现非接触式指纹的有效三维配准与完整重建的方法。针对三维指纹配准、重建与生成的实验结果表明，所提方法能够从二维图像中准确对齐并重建三维指纹，并可进一步从三维模型中生成高质量的非接触式指纹，从而显著提升非接触式指纹识别的整体性能。\n"
  },
  {
    "path": "abs/2509.15677.md",
    "content": "### Camera Splatting for Continuous View Optimization\n\nWe propose Camera Splatting, a novel view optimization framework for novel view synthesis. Each camera is modeled as a 3D Gaussian, referred to as a camera splat, and virtual cameras, termed point cameras, are placed at 3D points sampled near the surface to observe the distribution of camera splats. View optimization is achieved by continuously and differentiably refining the camera splats so that desirable target distributions are observed from the point cameras, in a manner similar to the original 3D Gaussian splatting. Compared to the Farthest View Sampling (FVS) approach, our optimized views demonstrate superior performance in capturing complex view-dependent phenomena, including intense metallic reflections and intricate textures such as text.\n\n本文提出了 **Camera Splatting**，一种用于新视图合成的全新视图优化框架。该方法将每个相机建模为一个三维高斯分布（即“相机溅射”），并在靠近表面的三维点上放置虚拟相机（称为“点相机”），用于观测相机溅射的分布。视图优化过程通过对这些相机溅射进行连续且可微分的优化，使得点相机所观测到的目标分布逐渐接近理想状态，这一过程与原始三维高斯溅射方法的思想相似。与“最远视图采样”（Farthest View Sampling, FVS）方法相比，我们的优化视图在捕捉复杂的视角相关现象（如强烈的金属反射和精细的纹理细节，如文字）方面表现出显著优势。\n"
  },
  {
    "path": "abs/2509.15871.md",
    "content": "### Zero-Shot Visual Grounding in 3D Gaussians via View Retrieval\n\n3D Visual Grounding (3DVG) aims to locate objects in 3D scenes based on text prompts, which is essential for applications such as robotics. However, existing 3DVG methods encounter two main challenges: first, they struggle to handle the implicit representation of spatial textures in 3D Gaussian Splatting (3DGS), making per-scene training indispensable; second, they typically require larges amounts of labeled data for effective training. To this end, we propose Grounding via View Retrieval (GVR), a novel zero-shot visual grounding framework for 3DGS to transform 3DVG as a 2D retrieval task that leverages object-level view retrieval to collect grounding clues from multiple views, which not only avoids the costly process of 3D annotation, but also eliminates the need for per-scene training. Extensive experiments demonstrate that our method achieves state-of-the-art visual grounding performance while avoiding per-scene training, providing a solid foundation for zero-shot 3DVG research.\n\n三维视觉指代（3D Visual Grounding, 3DVG）旨在根据文本提示在三维场景中定位目标物体，这对于机器人等应用具有重要意义。然而，现有的 3DVG 方法面临两大挑战：其一，难以处理三维高斯溅射（3D Gaussian Splatting, 3DGS）中的空间纹理隐式表示，导致必须进行逐场景训练；其二，通常需要大量标注数据才能实现有效训练。为此，我们提出了 **Grounding via View Retrieval (GVR)**，一种面向 3DGS 的零样本视觉指代新框架。该方法将 3DVG 转化为二维检索任务，通过基于对象级视图检索从多个视角收集指代线索，从而既避免了昂贵的三维标注过程，也无需逐场景训练。大量实验结果表明，我们的方法在无需逐场景训练的情况下仍能实现当前最优的视觉指代性能，为零样本 3DVG 研究提供了坚实基础。\n"
  },
  {
    "path": "abs/2509.16119.md",
    "content": "### RadarGaussianDet3D: An Efficient and Effective Gaussian-based 3D Detector with 4D Automotive Radars\n\n4D automotive radars have gained increasing attention for autonomous driving due to their low cost, robustness, and inherent velocity measurement capability. However, existing 4D radar-based 3D detectors rely heavily on pillar encoders for BEV feature extraction, where each point contributes to only a single BEV grid, resulting in sparse feature maps and degraded representation quality. In addition, they also optimize bounding box attributes independently, leading to sub-optimal detection accuracy. Moreover, their inference speed, while sufficient for high-end GPUs, may fail to meet the real-time requirement on vehicle-mounted embedded devices. To overcome these limitations, an efficient and effective Gaussian-based 3D detector, namely RadarGaussianDet3D is introduced, leveraging Gaussian primitives and distributions as intermediate representations for radar points and bounding boxes. In RadarGaussianDet3D, a novel Point Gaussian Encoder (PGE) is designed to transform each point into a Gaussian primitive after feature aggregation and employs the 3D Gaussian Splatting (3DGS) technique for BEV rasterization, yielding denser feature maps. PGE exhibits exceptionally low latency, owing to the optimized algorithm for point feature aggregation and fast rendering of 3DGS. In addition, a new Box Gaussian Loss (BGL) is proposed, which converts bounding boxes into 3D Gaussian distributions and measures their distance to enable more comprehensive and consistent optimization. Extensive experiments on TJ4DRadSet and View-of-Delft demonstrate that RadarGaussianDet3D achieves state-of-the-art detection accuracy while delivering substantially faster inference, highlighting its potential for real-time deployment in autonomous driving.\n\n由于成本低、鲁棒性强且具备速度测量能力，四维车载雷达（4D automotive radar）在自动驾驶领域正受到越来越多的关注。然而，现有的基于 4D 雷达的三维检测器在鸟瞰视角（BEV）特征提取中普遍依赖柱状编码器（pillar encoder），其中每个点仅对单个 BEV 网格产生贡献，导致特征图稀疏、表示能力不足。此外，这些方法通常独立优化目标框的各个属性，容易造成检测精度的下降。更进一步，其推理速度虽能满足高端 GPU 的需求，但在车载嵌入式设备上往往无法达到实时要求。为此，我们提出了一种高效且精确的基于高斯表示的三维检测器——**RadarGaussianDet3D**，利用高斯基元与分布作为雷达点与边界框的中间表示。在 RadarGaussianDet3D 中，我们设计了一种全新的 **点高斯编码器（Point Gaussian Encoder, PGE）**，通过特征聚合将每个点转换为高斯基元，并采用三维高斯溅射（3D Gaussian Splatting, 3DGS）进行 BEV 栅格化，从而获得更致密的特征图。得益于点特征聚合算法与 3DGS 快速渲染的优化，PGE 具有极低的延迟性能。此外，我们还提出了新的 **高斯框损失（Box Gaussian Loss, BGL）**，将边界框转换为三维高斯分布并通过度量分布间距离实现更全面、一致的优化。在 TJ4DRadSet 与 View-of-Delft 数据集上的大量实验表明，RadarGaussianDet3D 在实现当前最优检测精度的同时显著提升了推理速度，展示了其在自动驾驶实时部署中的巨大潜力。\n"
  },
  {
    "path": "abs/2509.16423.md",
    "content": "### 3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction\n\nRecent advances in radiance fields and novel view synthesis enable creation of realistic digital twins from photographs. However, current methods struggle with flat, texture-less surfaces, creating uneven and semi-transparent reconstructions, due to an ill-conditioned photometric reconstruction objective. Surface reconstruction methods solve this issue but sacrifice visual quality. We propose a novel hybrid 2D/3D representation that jointly optimizes constrained planar (2D) Gaussians for modeling flat surfaces and freeform (3D) Gaussians for the rest of the scene. Our end-to-end approach dynamically detects and refines planar regions, improving both visual fidelity and geometric accuracy. It achieves state-of-the-art depth estimation on ScanNet++ and ScanNetv2, and excels at mesh extraction without overfitting to a specific camera model, showing its effectiveness in producing high-quality reconstruction of indoor scenes.\n\n近年来，辐射场与新视图合成技术的进步使得从照片中生成逼真的数字孪生成为可能。然而，现有方法在处理平坦且缺乏纹理的表面时表现不佳，常导致重建结果出现不均匀或半透明的伪影，这源于光度重建目标的不良条件化问题。虽然基于显式表面重建的方法可以缓解这一问题，但往往以牺牲视觉质量为代价。为此，我们提出了一种全新的**二维/三维混合表示方法**，通过联合优化受约束的平面（2D）高斯以建模平坦表面，并使用自由形式（3D）高斯来表示场景的其他部分。该端到端框架能够动态检测并细化平面区域，从而在视觉保真度与几何精度上同时获得显著提升。我们的方法在 ScanNet++ 与 ScanNetv2 数据集上实现了当前最优的深度估计性能，并在网格提取任务中表现优异，且无需针对特定相机模型进行过拟合，充分证明了其在高质量室内场景重建中的有效性。\n"
  },
  {
    "path": "abs/2509.16552.md",
    "content": "### ST-GS: Vision-Based 3D Semantic Occupancy Prediction with Spatial-Temporal Gaussian Splatting\n\n3D occupancy prediction is critical for comprehensive scene understanding in vision-centric autonomous driving. Recent advances have explored utilizing 3D semantic Gaussians to model occupancy while reducing computational overhead, but they remain constrained by insufficient multi-view spatial interaction and limited multi-frame temporal consistency. To overcome these issues, in this paper, we propose a novel Spatial-Temporal Gaussian Splatting (ST-GS) framework to enhance both spatial and temporal modeling in existing Gaussian-based pipelines. Specifically, we develop a guidance-informed spatial aggregation strategy within a dual-mode attention mechanism to strengthen spatial interaction in Gaussian representations. Furthermore, we introduce a geometry-aware temporal fusion scheme that effectively leverages historical context to improve temporal continuity in scene completion. Extensive experiments on the large-scale nuScenes occupancy prediction benchmark showcase that our proposed approach not only achieves state-of-the-art performance but also delivers markedly better temporal consistency compared to existing Gaussian-based methods.\n\n三维占据预测对于以视觉为核心的自动驾驶场景理解至关重要。近年来的研究尝试利用三维语义高斯来建模占据信息，从而降低计算开销，但仍受到多视角空间交互不足和多帧时序一致性有限的制约。为了解决这些问题，本文提出了一种新的时空高斯溅射（Spatial-Temporal Gaussian Splatting, ST-GS）框架，以增强现有基于高斯管线的空间与时间建模能力。具体而言，我们在双模注意力机制中设计了一种基于引导的空间聚合策略，以强化高斯表示的空间交互。此外，我们提出了一种几何感知的时序融合方案，能够有效利用历史上下文信息以提升场景补全的时间连续性。在大规模 nuScenes 占据预测基准上的大量实验表明，我们的方法不仅达到了当前最优性能，还在时序一致性方面显著优于现有的基于高斯的方法。\n"
  },
  {
    "path": "abs/2509.16588.md",
    "content": "### SQS: Enhancing Sparse Perception Models via Query-based Splatting in Autonomous Driving\n\nSparse Perception Models (SPMs) adopt a query-driven paradigm that forgoes explicit dense BEV or volumetric construction, enabling highly efficient computation and accelerated inference. In this paper, we introduce SQS, a novel query-based splatting pre-training specifically designed to advance SPMs in autonomous driving. SQS introduces a plug-in module that predicts 3D Gaussian representations from sparse queries during pre-training, leveraging self-supervised splatting to learn fine-grained contextual features through the reconstruction of multi-view images and depth maps. During fine-tuning, the pre-trained Gaussian queries are seamlessly integrated into downstream networks via query interaction mechanisms that explicitly connect pre-trained queries with task-specific queries, effectively accommodating the diverse requirements of occupancy prediction and 3D object detection. Extensive experiments on autonomous driving benchmarks demonstrate that SQS delivers considerable performance gains across multiple query-based 3D perception tasks, notably in occupancy prediction and 3D object detection, outperforming prior state-of-the-art pre-training approaches by a significant margin (i.e., +1.3 mIoU on occupancy prediction and +1.0 NDS on 3D detection).\n\n稀疏感知模型（Sparse Perception Models, SPMs）采用了一种基于查询驱动的范式，舍弃了显式的稠密 BEV 或体素构建，从而实现了高效计算与快速推理。本文提出了一种名为 SQS 的创新性基于查询的溅射预训练方法，专为推动自动驾驶中的 SPMs 发展而设计。SQS 在预训练阶段引入了一个可插拔模块，用于从稀疏查询中预测三维高斯表示，并通过自监督溅射在重建多视角图像与深度图的过程中学习细粒度的上下文特征。在微调阶段，预训练得到的高斯查询通过查询交互机制无缝融入下游网络，该机制显式地连接预训练查询与任务特定查询，从而有效适配占据预测和三维目标检测等不同任务需求。在自动驾驶基准上的大量实验结果表明，SQS 在多种基于查询的三维感知任务中带来了显著性能提升，尤其在占据预测和三维目标检测上表现突出，分别较现有最优预训练方法提升了 +1.3 mIoU 和 +1.0 NDS。\n"
  },
  {
    "path": "abs/2509.16806.md",
    "content": "### MedGS: Gaussian Splatting for Multi-Modal 3D Medical Imaging\n\nMulti-modal three-dimensional (3D) medical imaging data, derived from ultrasound, magnetic resonance imaging (MRI), and potentially computed tomography (CT), provide a widely adopted approach for non-invasive anatomical visualization. Accurate modeling, registration, and visualization in this setting depend on surface reconstruction and frame-to-frame interpolation. Traditional methods often face limitations due to image noise and incomplete information between frames. To address these challenges, we present MedGS, a semi-supervised neural implicit surface reconstruction framework that employs a Gaussian Splatting (GS)-based interpolation mechanism. In this framework, medical imaging data are represented as consecutive two-dimensional (2D) frames embedded in 3D space and modeled using Gaussian-based distributions. This representation enables robust frame interpolation and high-fidelity surface reconstruction across imaging modalities. As a result, MedGS offers more efficient training than traditional neural implicit methods. Its explicit GS-based representation enhances noise robustness, allows flexible editing, and supports precise modeling of complex anatomical structures with fewer artifacts. These features make MedGS highly suitable for scalable and practical applications in medical imaging.\n\n多模态三维（3D）医学影像数据来源于超声、磁共振成像（MRI）以及可能的计算机断层扫描（CT），为非侵入式解剖可视化提供了一种被广泛采用的方式。在此背景下，实现精确的建模、配准与可视化依赖于表面重建与帧间插值。然而，传统方法常因图像噪声及帧间信息不完整而受到限制。为应对这些挑战，我们提出了 MedGS——一种基于高斯溅射（Gaussian Splatting, GS）插值机制的半监督神经隐式表面重建框架。在该框架中，医学影像数据被表示为嵌入三维空间的连续二维（2D）帧，并通过基于高斯的分布进行建模。这种表示方式能够在不同成像模态下实现稳健的帧间插值与高保真表面重建。得益于此，MedGS 的训练效率显著优于传统神经隐式方法。其显式的 GS 表示增强了对噪声的鲁棒性，支持灵活编辑，并能以更少伪影精确建模复杂的解剖结构。这些特性使得 MedGS 在医学影像的可扩展与实际应用中具有高度的适用性。\n"
  },
  {
    "path": "abs/2509.16863.md",
    "content": "### ConfidentSplat: Confidence-Weighted Depth Fusion for Accurate 3D Gaussian Splatting SLAM\n\nWe introduce ConfidentSplat, a novel 3D Gaussian Splatting (3DGS)-based SLAM system for robust, highfidelity RGB-only reconstruction. Addressing geometric inaccuracies in existing RGB-only 3DGS SLAM methods that stem from unreliable depth estimation, ConfidentSplat incorporates a core innovation: a confidence-weighted fusion mechanism. This mechanism adaptively integrates depth cues from multiview geometry with learned monocular priors (Omnidata ViT), dynamically weighting their contributions based on explicit reliability estimates-derived predominantly from multi-view geometric consistency-to generate high-fidelity proxy depth for map supervision. The resulting proxy depth guides the optimization of a deformable 3DGS map, which efficiently adapts online to maintain global consistency following pose updates from a DROID-SLAM-inspired frontend and backend optimizations (loop closure, global bundle adjustment). Extensive validation on standard benchmarks (TUM-RGBD, ScanNet) and diverse custom mobile datasets demonstrates significant improvements in reconstruction accuracy (L1 depth error) and novel view synthesis fidelity (PSNR, SSIM, LPIPS) over baselines, particularly in challenging conditions. ConfidentSplat underscores the efficacy of principled, confidence-aware sensor fusion for advancing state-of-the-art dense visual SLAM.\n\n我们提出了 **ConfidentSplat**，一种基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的全新 SLAM 系统，用于实现鲁棒且高保真的纯 RGB 重建。针对现有 RGB-only 3DGS SLAM 方法中因深度估计不可靠而导致的几何不准确问题，ConfidentSplat 引入了核心创新：**置信度加权融合机制**。该机制自适应地融合来自多视几何的深度线索与基于学习的单目先验（Omnidata ViT），并依据显式的可靠性评估（主要来源于多视几何一致性）动态调整各自的贡献，从而生成高保真的代理深度用于地图监督。所得代理深度进一步引导可形变 3DGS 地图的优化，使系统能够在 DROID-SLAM 启发的前端与后端优化（如回环检测与全局束调整）后的位姿更新中高效地在线适应并保持全局一致性。在标准基准（TUM-RGBD、ScanNet）及多样化的自采移动数据集上的广泛实验表明，ConfidentSplat 在重建精度（L1 深度误差）及新视角合成质量（PSNR、SSIM、LPIPS）方面均显著优于基线方法，尤其在复杂环境中表现突出。该方法充分展示了基于置信度感知的传感器融合在推动密集视觉 SLAM 领域最先进水平方面的有效性。\n"
  },
  {
    "path": "abs/2509.16922.md",
    "content": "### PGSTalker: Real-Time Audio-Driven Talking Head Generation via 3D Gaussian Splatting with Pixel-Aware Density Control\n\nAudio-driven talking head generation is crucial for applications in virtual reality, digital avatars, and film production. While NeRF-based methods enable high-fidelity reconstruction, they suffer from low rendering efficiency and suboptimal audio-visual synchronization. This work presents PGSTalker, a real-time audio-driven talking head synthesis framework based on 3D Gaussian Splatting (3DGS). To improve rendering performance, we propose a pixel-aware density control strategy that adaptively allocates point density, enhancing detail in dynamic facial regions while reducing redundancy elsewhere. Additionally, we introduce a lightweight Multimodal Gated Fusion Module to effectively fuse audio and spatial features, thereby improving the accuracy of Gaussian deformation prediction. Extensive experiments on public datasets demonstrate that PGSTalker outperforms existing NeRF- and 3DGS-based approaches in rendering quality, lip-sync precision, and inference speed. Our method exhibits strong generalization capabilities and practical potential for real-world deployment.\n\n基于音频驱动的说话人头像生成在虚拟现实、数字人和影视制作等领域具有重要应用价值。尽管基于 NeRF 的方法能够实现高保真重建，但其渲染效率较低，音视频同步效果不理想。本文提出了一种基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的实时音频驱动说话人合成框架——**PGSTalker**。为提升渲染性能，我们提出了一种**像素感知密度控制策略**，可自适应地分配点的密度，在动态面部区域增强细节，同时减少其他区域的冗余。此外，我们设计了一个轻量级的**多模态门控融合模块（Multimodal Gated Fusion Module）**，用于有效融合音频与空间特征，从而提升高斯形变预测的精度。在多个公开数据集上的广泛实验表明，PGSTalker 在渲染质量、唇形同步精度及推理速度方面均优于现有的基于 NeRF 和 3DGS 的方法。该方法展现出较强的泛化能力与实际部署潜力。\n"
  },
  {
    "path": "abs/2509.16960.md",
    "content": "### SemanticGarment: Semantic-Controlled Generation and Editing of 3D Gaussian Garments\n\n3D digital garment generation and editing play a pivotal role in fashion design, virtual try-on, and gaming. Traditional methods struggle to meet the growing demand due to technical complexity and high resource costs. Learning-based approaches offer faster, more diverse garment synthesis based on specific requirements and reduce human efforts and time costs. However, they still face challenges such as inconsistent multi-view geometry or textures and heavy reliance on detailed garment topology and manual rigging. We propose SemanticGarment, a 3D Gaussian-based method that realizes high-fidelity 3D garment generation from text or image prompts and supports semantic-based interactive editing for flexible user customization. To ensure multi-view consistency and garment fitting, we propose to leverage structural human priors for the generative model by introducing a 3D semantic clothing model, which initializes the geometry structure and lays the groundwork for view-consistent garment generation and editing. Without the need to regenerate or rely on existing mesh templates, our approach allows for rapid and diverse modifications to existing Gaussians, either globally or within a local region. To address the artifacts caused by self-occlusion for garment reconstruction based on single image, we develop a self-occlusion optimization strategy to mitigate holes and artifacts that arise when directly animating self-occluded garments. Extensive experiments are conducted to demonstrate our superior performance in 3D garment generation and editing.\n\n三维数字服装的生成与编辑在时尚设计、虚拟试衣和游戏领域中发挥着关键作用。传统方法由于技术复杂度高、资源消耗大，难以满足日益增长的需求。基于学习的方法能够根据特定需求实现更快速、更丰富的服装生成，从而显著减少人工成本和时间开销。然而，这类方法仍面临诸如多视角几何与纹理不一致、对精细服装拓扑及手动绑定的高度依赖等问题。为此，我们提出了 **SemanticGarment**——一种基于三维高斯（3D Gaussian）的高保真服装生成与语义可交互编辑方法。该方法能够根据文本或图像提示生成高质量的三维服装，并支持语义驱动的灵活用户自定义编辑。为保证多视一致性与服装贴合性，我们在生成模型中引入了结构化人体先验，通过构建三维语义服装模型来初始化几何结构，为多视一致的服装生成与编辑奠定基础。我们的方案无需重新生成或依赖现有网格模板，即可对已有高斯表示进行快速、多样的全局或局部修改。针对单张图像驱动的服装重建中因自遮挡带来的伪影问题，我们提出了自遮挡优化策略，有效缓解在直接动画化自遮挡服装时出现的空洞与伪影。大量实验验证了 SemanticGarment 在三维服装生成与编辑任务中的卓越性能。\n"
  },
  {
    "path": "abs/2509.17027.md",
    "content": "### Efficient 3D Scene Reconstruction and Simulation from Sparse Endoscopic Views\n\nSurgical simulation is essential for medical training, enabling practitioners to develop crucial skills in a risk-free environment while improving patient safety and surgical outcomes. However, conventional methods for building simulation environments are cumbersome, time-consuming, and difficult to scale, often resulting in poor details and unrealistic simulations. In this paper, we propose a Gaussian Splatting-based framework to directly reconstruct interactive surgical scenes from endoscopic data while ensuring efficiency, rendering quality, and realism. A key challenge in this data-driven simulation paradigm is the restricted movement of endoscopic cameras, which limits viewpoint diversity. As a result, the Gaussian Splatting representation overfits specific perspectives, leading to reduced geometric accuracy. To address this issue, we introduce a novel virtual camera-based regularization method that adaptively samples virtual viewpoints around the scene and incorporates them into the optimization process to mitigate overfitting. An effective depth-based regularization is applied to both real and virtual views to further refine the scene geometry. To enable fast deformation simulation, we propose a sparse control node-based Material Point Method, which integrates physical properties into the reconstructed scene while significantly reducing computational costs. Experimental results on representative surgical data demonstrate that our method can efficiently reconstruct and simulate surgical scenes from sparse endoscopic views. Notably, our method takes only a few minutes to reconstruct the surgical scene and is able to produce physically plausible deformations in real-time with user-defined interactions.\n\n外科手术模拟对于医学培训至关重要，它能够帮助医生在无风险的环境中培养关键技能，从而提升患者安全性并改善手术效果。然而，传统的手术仿真环境构建方法过程繁琐、耗时且难以扩展，往往导致细节不足和仿真逼真度低。本文提出了一种基于高斯溅射（Gaussian Splatting）的框架，可直接从内窥镜数据中高效重建交互式手术场景，同时兼顾效率、渲染质量与真实感。该数据驱动的仿真范式面临的核心挑战在于内窥镜相机运动范围受限，导致视角多样性不足，从而使高斯溅射表示在特定视角上过拟合，削弱几何精度。为此，我们提出了一种新的基于虚拟相机的正则化方法，自适应地在场景周围采样虚拟视角，并将其引入优化过程以缓解过拟合问题。同时，我们在真实与虚拟视角中均引入了基于深度的有效正则化，以进一步优化场景几何结构。为实现快速的变形仿真，我们提出了一种基于稀疏控制节点的材料点法（Material Point Method, MPM），在显著降低计算开销的同时，将物理属性融入重建场景中。基于典型外科数据的实验结果表明，我们的方法能够从稀疏的内窥镜视角中高效重建并模拟手术场景。值得注意的是，本方法仅需数分钟即可完成手术场景重建，并能够在用户交互下实现实时且物理合理的变形模拟。\n"
  },
  {
    "path": "abs/2509.17083.md",
    "content": "### HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis\n\nRecently, 3D Gaussian Splatting (3DGS) has emerged as a powerful alternative to NeRF-based approaches, enabling real-time, high-quality novel view synthesis through explicit, optimizable 3D Gaussians. However, 3DGS suffers from significant memory overhead due to its reliance on per-Gaussian parameters to model view-dependent effects and anisotropic shapes. While recent works propose compressing 3DGS with neural fields, these methods struggle to capture high-frequency spatial variations in Gaussian properties, leading to degraded reconstruction of fine details. We present Hybrid Radiance Fields (HyRF), a novel scene representation that combines the strengths of explicit Gaussians and neural fields. HyRF decomposes the scene into (1) a compact set of explicit Gaussians storing only critical high-frequency parameters and (2) grid-based neural fields that predict remaining properties. To enhance representational capacity, we introduce a decoupled neural field architecture, separately modeling geometry (scale, opacity, rotation) and view-dependent color. Additionally, we propose a hybrid rendering scheme that composites Gaussian splatting with a neural field-predicted background, addressing limitations in distant scene representation. Experiments demonstrate that HyRF achieves state-of-the-art rendering quality while reducing model size by over 20 times compared to 3DGS and maintaining real-time performance.\n\n近年来，三维高斯溅射（3D Gaussian Splatting, 3DGS）作为 NeRF 的强有力替代方案迅速兴起，它通过显式、可优化的三维高斯实现了实时的高质量新视角合成。然而，3DGS 由于依赖于每个高斯的独立参数来建模视角相关效应与各向异性形状，导致了显著的内存开销。尽管近期有研究尝试利用神经场对 3DGS 进行压缩，但这些方法难以捕捉高斯属性中的高频空间变化，从而造成细节重建质量下降。为此，我们提出了 **混合辐射场（Hybrid Radiance Fields, HyRF）**，这是一种结合显式高斯与神经场优势的全新场景表示方式。HyRF 将场景分解为两部分：（1）一组紧凑的显式高斯，用于存储关键的高频参数；（2）基于网格的神经场，用于预测剩余属性。为提升表示能力，我们设计了一种**解耦神经场结构**，分别建模几何属性（尺度、不透明度、旋转）与视角相关颜色。此外，我们提出了一种**混合渲染方案**，将高斯溅射结果与神经场预测的背景进行复合，从而解决了远景场景表示的不足。实验结果表明，HyRF 在保持实时性能的同时，相较于 3DGS 将模型大小减少了 20 倍以上，并在渲染质量上达到了当前最优水平。\n"
  },
  {
    "path": "abs/2509.17246.md",
    "content": "### SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views\n\nWe introduce SPFSplatV2, an efficient feed-forward framework for 3D Gaussian splatting from sparse multi-view images, requiring no ground-truth poses during training and inference. It employs a shared feature extraction backbone, enabling simultaneous prediction of 3D Gaussian primitives and camera poses in a canonical space from unposed inputs. A masked attention mechanism is introduced to efficiently estimate target poses during training, while a reprojection loss enforces pixel-aligned Gaussian primitives, providing stronger geometric constraints. We further demonstrate the compatibility of our training framework with different reconstruction architectures, resulting in two model variants. Remarkably, despite the absence of pose supervision, our method achieves state-of-the-art performance in both in-domain and out-of-domain novel view synthesis, even under extreme viewpoint changes and limited image overlap, and surpasses recent methods that rely on geometric supervision for relative pose estimation. By eliminating dependence on ground-truth poses, our method offers the scalability to leverage larger and more diverse datasets.\n\n我们提出了 **SPFSplatV2**，一种高效的前馈式三维高斯溅射（3D Gaussian Splatting）框架，能够从稀疏多视图图像中进行重建，在训练与推理阶段均无需使用真实位姿监督。该方法采用共享特征提取骨干网络，可在无姿态输入的情况下同时预测规范空间中的三维高斯基元与相机位姿。我们引入了**掩码注意力机制（masked attention mechanism）**以高效估计训练阶段的目标位姿，并通过**重投影损失（reprojection loss）**实现像素级对齐的高斯基元，从而提供更强的几何约束。此外，我们展示了该训练框架与不同重建架构的兼容性，构建了两个模型变体。值得注意的是，即便在无姿态监督的条件下，本方法在域内与跨域的新视角合成任务中均达到了当前最优性能，尤其在极端视角变化与图像重叠有限的情况下，依然显著优于依赖几何监督进行相对姿态估计的最新方法。通过消除对真实位姿的依赖，SPFSplatV2 具备了利用更大规模与更多样化数据集进行扩展的能力。\n"
  },
  {
    "path": "abs/2509.17329.md",
    "content": "### SmokeSeer: 3D Gaussian Splatting for Smoke Removal and Scene Reconstruction\n\nSmoke in real-world scenes can severely degrade the quality of images and hamper visibility. Recent methods for image restoration either rely on data-driven priors that are susceptible to hallucinations, or are limited to static low-density smoke. We introduce SmokeSeer, a method for simultaneous 3D scene reconstruction and smoke removal from a video capturing multiple views of a scene. Our method uses thermal and RGB images, leveraging the fact that the reduced scattering in thermal images enables us to see through the smoke. We build upon 3D Gaussian splatting to fuse information from the two image modalities, and decompose the scene explicitly into smoke and non-smoke components. Unlike prior approaches, SmokeSeer handles a broad range of smoke densities and can adapt to temporally varying smoke. We validate our approach on synthetic data and introduce a real-world multi-view smoke dataset with RGB and thermal images.\n\n现实场景中的烟雾会严重降低图像质量并影响可见性。现有的图像恢复方法要么依赖数据驱动的先验，容易产生幻觉伪影，要么仅适用于静态、低密度烟雾。本文提出了 **SmokeSeer**，一种从多视角视频中同时实现三维场景重建与烟雾去除的方法。我们利用热成像与 RGB 图像的互补特性，基于热成像中散射效应减弱的原理，使系统能够“透视”烟雾。SmokeSeer 基于三维高斯溅射（3D Gaussian Splatting）框架，将两种模态的信息进行融合，并显式地将场景分解为烟雾与非烟雾成分。与以往方法不同，SmokeSeer 能够处理不同密度范围的烟雾，并自适应地应对时间变化的烟雾动态。我们在合成数据上验证了该方法的有效性，并首次引入了包含 RGB 与热成像数据的真实多视角烟雾数据集。\n"
  },
  {
    "path": "abs/2509.17390.md",
    "content": "### FGGS-LiDAR: Ultra-Fast, GPU-Accelerated Simulation from General 3DGS Models to LiDAR\n\nWhile 3D Gaussian Splatting (3DGS) has revolutionized photorealistic rendering, its vast ecosystem of assets remains incompatible with high-performance LiDAR simulation, a critical tool for robotics and autonomous driving. We present FGGS-LiDAR, a framework that bridges this gap with a truly plug-and-play approach. Our method converts any pretrained 3DGS model into a high-fidelity, watertight mesh without requiring LiDAR-specific supervision or architectural alterations. This conversion is achieved through a general pipeline of volumetric discretization and Truncated Signed Distance Field (TSDF) extraction. We pair this with a highly optimized, GPU-accelerated ray-casting module that simulates LiDAR returns at over 500 FPS. We validate our approach on indoor and outdoor scenes, demonstrating exceptional geometric fidelity; By enabling the direct reuse of 3DGS assets for geometrically accurate depth sensing, our framework extends their utility beyond visualization and unlocks new capabilities for scalable, multimodal simulation.\n\n尽管三维高斯溅射（3D Gaussian Splatting, 3DGS）在真实感渲染方面取得了革命性进展，但其庞大的资产生态系统仍无法兼容高性能的激光雷达（LiDAR）仿真——这一在机器人与自动驾驶领域至关重要的工具。本文提出了 **FGGS-LiDAR**，一个真正意义上的即插即用框架，用以弥合这一鸿沟。我们的方法能够将任意预训练的 3DGS 模型转换为高保真、密闭（watertight）的网格，而无需任何针对 LiDAR 的特定监督或架构修改。该转换通过通用的体素离散化与**截断符号距离场（Truncated Signed Distance Field, TSDF）提取**流程实现。我们进一步配备了一个高效优化、GPU 加速的射线投射模块，可在 **500 FPS 以上**的速度下模拟激光雷达回波。我们在室内与室外场景上对该方法进行了验证，结果显示其在几何保真度方面表现卓越。通过实现对 3DGS 资产在几何精确深度感知任务中的直接复用，该框架不仅扩展了其在可视化之外的应用范围，也为大规模、多模态仿真带来了新的可能性。\n"
  },
  {
    "path": "abs/2509.17430.md",
    "content": "### EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device\n\nThe field of Embodied AI predominantly relies on simulation for training and evaluation, often using either fully synthetic environments that lack photorealism or high-fidelity real-world reconstructions captured with expensive hardware. As a result, sim-to-real transfer remains a major challenge. In this paper, we introduce EmbodiedSplat, a novel approach that personalizes policy training by efficiently capturing the deployment environment and fine-tuning policies within the reconstructed scenes. Our method leverages 3D Gaussian Splatting (GS) and the Habitat-Sim simulator to bridge the gap between realistic scene capture and effective training environments. Using iPhone-captured deployment scenes, we reconstruct meshes via GS, enabling training in settings that closely approximate real-world conditions. We conduct a comprehensive analysis of training strategies, pre-training datasets, and mesh reconstruction techniques, evaluating their impact on sim-to-real predictivity in real-world scenarios. Experimental results demonstrate that agents fine-tuned with EmbodiedSplat outperform both zero-shot baselines pre-trained on large-scale real-world datasets (HM3D) and synthetically generated datasets (HSSD), achieving absolute success rate improvements of 20% and 40% on real-world Image Navigation task. Moreover, our approach yields a high sim-vs-real correlation (0.87-0.97) for the reconstructed meshes, underscoring its effectiveness in adapting policies to diverse environments with minimal effort.\n\n具身智能（Embodied AI）领域主要依赖仿真环境进行训练和评估，这些环境通常要么是缺乏真实感的全合成场景，要么是通过昂贵硬件捕获的高保真真实场景重建。因此，从仿真到现实（sim-to-real）的迁移仍然是一个主要挑战。本文提出了一种新方法——EmbodiedSplat，通过高效获取部署环境并在重建场景中微调策略，实现策略训练的个性化。我们的方法结合了三维高斯溅射（3D Gaussian Splatting，GS）与 Habitat-Sim 模拟器，在真实场景捕获与高效训练环境之间搭建桥梁。通过使用 iPhone 拍摄的部署场景，我们利用 GS 进行网格重建，使训练环境能够高度接近真实世界条件。我们对训练策略、预训练数据集和网格重建技术进行了系统分析，评估它们在真实场景中的仿真到现实预测性能上的影响。实验结果表明，经过 EmbodiedSplat 微调的智能体在真实世界的图像导航任务中，较在大规模真实数据集（HM3D）和合成数据集（HSSD）上预训练的零样本基线模型，分别实现了 20% 和 40% 的绝对成功率提升。此外，我们的方法在重建网格上的仿真与现实相关性达到 0.87–0.97，充分验证了其在以最小代价适应多样化环境中的有效性。\n"
  },
  {
    "path": "abs/2509.17789.md",
    "content": "### From Restoration to Reconstruction: Rethinking 3D Gaussian Splatting for Underwater Scenes\n\nUnderwater image degradation poses significant challenges for 3D reconstruction, where simplified physical models often fail in complex scenes. We propose R-Splatting, a unified framework that bridges underwater image restoration (UIR) with 3D Gaussian Splatting (3DGS) to improve both rendering quality and geometric fidelity. Our method integrates multiple enhanced views produced by diverse UIR models into a single reconstruction pipeline. During inference, a lightweight illumination generator samples latent codes to support diverse yet coherent renderings, while a contrastive loss ensures disentangled and stable illumination representations. Furthermore, we propose Uncertainty-Aware Opacity Optimization (UAOO), which models opacity as a stochastic function to regularize training. This suppresses abrupt gradient responses triggered by illumination variation and mitigates overfitting to noisy or view-specific artifacts. Experiments on Seathru-NeRF and our new BlueCoral3D dataset demonstrate that R-Splatting outperforms strong baselines in both rendering quality and geometric accuracy.\n\n水下图像退化对三维重建带来了巨大挑战，尤其是在复杂场景中，简化的物理模型往往难以奏效。我们提出了 **R-Splatting**，一个将水下图像恢复（UIR）与三维高斯溅射（3D Gaussian Splatting，3DGS）相结合的统一框架，以同时提升渲染质量与几何精度。我们的方法将多个由不同 UIR 模型生成的增强视图整合到同一个重建管线中。在推理阶段，一个轻量级光照生成器通过采样潜在代码来支持多样而一致的渲染结果，同时对比损失（contrastive loss）确保光照表征的解耦与稳定。此外，我们提出了 *不确定性感知不透明度优化（Uncertainty-Aware Opacity Optimization，UAOO）*，将不透明度建模为随机函数以正则化训练过程。该方法能抑制由光照变化引起的梯度突变响应，并缓解对噪声或特定视角伪影的过拟合。基于 Seathru-NeRF 和我们新提出的 BlueCoral3D 数据集的实验结果表明，R-Splatting 在渲染质量和几何精度上均优于强基线模型。\n"
  },
  {
    "path": "abs/2509.17864.md",
    "content": "### ProDyG: Progressive Dynamic Scene Reconstruction via Gaussian Splatting from Monocular Videos\n\nAchieving truly practical dynamic 3D reconstruction requires online operation, global pose and map consistency, detailed appearance modeling, and the flexibility to handle both RGB and RGB-D inputs. However, existing SLAM methods typically merely remove the dynamic parts or require RGB-D input, while offline methods are not scalable to long video sequences, and current transformer-based feedforward methods lack global consistency and appearance details. To this end, we achieve online dynamic scene reconstruction by disentangling the static and dynamic parts within a SLAM system. The poses are tracked robustly with a novel motion masking strategy, and dynamic parts are reconstructed leveraging a progressive adaptation of a Motion Scaffolds graph. Our method yields novel view renderings competitive to offline methods and achieves on-par tracking with state-of-the-art dynamic SLAM methods.\n\n要实现真正实用的动态三维重建，需要具备在线处理能力、全局位姿与地图一致性、精细的外观建模，以及同时兼容 RGB 与 RGB-D 输入的灵活性。然而，现有的 SLAM 方法通常仅简单地去除动态部分或依赖于 RGB-D 输入；而离线方法无法扩展到长视频序列，当前基于 Transformer 的前馈方法又缺乏全局一致性与外观细节。为此，我们通过在 SLAM 系统中解耦静态与动态部分，实现了在线动态场景重建。我们提出了一种新的运动遮罩策略以实现稳健的位姿跟踪，并利用逐步自适应的运动支架图（Motion Scaffolds Graph）对动态部分进行重建。实验结果表明，我们的方法在新视图渲染质量上可与离线方法相媲美，并在跟踪性能上达到当前动态 SLAM 最先进水平。\n"
  },
  {
    "path": "abs/2509.18501.md",
    "content": "### BridgeSplat: Bidirectionally Coupled CT and Non-Rigid Gaussian Splatting for Deformable Intraoperative Surgical Navigation\n\nWe introduce BridgeSplat, a novel approach for deformable surgical navigation that couples intraoperative 3D reconstruction with preoperative CT data to bridge the gap between surgical video and volumetric patient data. Our method rigs 3D Gaussians to a CT mesh, enabling joint optimization of Gaussian parameters and mesh deformation through photometric supervision. By parametrizing each Gaussian relative to its parent mesh triangle, we enforce alignment between Gaussians and mesh and obtain deformations that can be propagated back to update the CT. We demonstrate BridgeSplat's effectiveness on visceral pig surgeries and synthetic data of a human liver under simulation, showing sensible deformations of the preoperative CT on monocular RGB data.\n\n我们提出了 BridgeSplat，这是一种用于可变形手术导航的新方法，通过将术中三维重建与术前 CT 数据相结合，弥合手术视频与体素化患者数据之间的鸿沟。我们的方法将三维高斯绑定（rig）到 CT 网格上，通过光度监督实现高斯参数与网格形变的联合优化。通过将每个高斯相对于其父网格三角形进行参数化，我们强制保持高斯与网格之间的对齐，从而获得可回传更新 CT 的形变结果。我们在猪内脏手术和基于仿真的人类肝脏合成数据上验证了 BridgeSplat 的有效性，结果表明该方法能够在单目 RGB 数据上实现对术前 CT 的合理形变。\n"
  },
  {
    "path": "abs/2509.18566.md",
    "content": "### Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction\n\nReconstructing dynamic humans together with static scenes from monocular videos remains difficult, especially under fast motion, where RGB frames suffer from motion blur. Event cameras exhibit distinct advantages, e.g., microsecond temporal resolution, making them a superior sensing choice for dynamic human reconstruction. Accordingly, we present a novel event-guided human-scene reconstruction framework that jointly models human and scene from a single monocular event camera via 3D Gaussian Splatting. Specifically, a unified set of 3D Gaussians carries a learnable semantic attribute; only Gaussians classified as human undergo deformation for animation, while scene Gaussians stay static. To combat blur, we propose an event-guided loss that matches simulated brightness changes between consecutive renderings with the event stream, improving local fidelity in fast-moving regions. Our approach removes the need for external human masks and simplifies managing separate Gaussian sets. On two benchmark datasets, ZJU-MoCap-Blur and MMHPSD-Blur, it delivers state-of-the-art human-scene reconstruction, with notable gains over strong baselines in PSNR/SSIM and reduced LPIPS, especially for high-speed subjects.\n\n从单目视频中同时重建动态人体与静态场景仍然是一项具有挑战性的任务，尤其是在快速运动情况下，RGB 帧容易受到运动模糊的影响。事件相机具有独特优势，例如微秒级时间分辨率，使其成为动态人体重建的理想传感器选择。为此，我们提出了一种基于事件引导的人体-场景联合重建框架，通过三维高斯溅射（3D Gaussian Splatting）在单个事件相机的输入下同时建模人体与场景。具体而言，我们使用一组统一的三维高斯，并为其引入可学习的语义属性；其中仅被分类为人体的高斯会进行形变以实现动画，而场景高斯保持静态。为应对运动模糊问题，我们提出了一种事件引导损失，通过匹配相邻渲染帧之间的模拟亮度变化与事件流，从而提升快速运动区域的局部保真度。该方法无需外部人体掩码，并简化了对独立高斯集合的管理。在 ZJU-MoCap-Blur 与 MMHPSD-Blur 两个基准数据集上的实验表明，我们的方法在人体-场景联合重建方面达到了当前最先进水平，在 PSNR/SSIM 指标上显著优于强基线模型，同时有效降低了 LPIPS，尤其在高速运动场景中表现突出。\n"
  },
  {
    "path": "abs/2509.18759.md",
    "content": "### FixingGS: Enhancing 3D Gaussian Splatting via Training-Free Score Distillation\n\nRecently, 3D Gaussian Splatting (3DGS) has demonstrated remarkable success in 3D reconstruction and novel view synthesis. However, reconstructing 3D scenes from sparse viewpoints remains highly challenging due to insufficient visual information, which results in noticeable artifacts persisting across the 3D representation. To address this limitation, recent methods have resorted to generative priors to remove artifacts and complete missing content in under-constrained areas. Despite their effectiveness, these approaches struggle to ensure multi-view consistency, resulting in blurred structures and implausible details. In this work, we propose FixingGS, a training-free method that fully exploits the capabilities of the existing diffusion model for sparse-view 3DGS reconstruction enhancement. At the core of FixingGS is our distillation approach, which delivers more accurate and cross-view coherent diffusion priors, thereby enabling effective artifact removal and inpainting. In addition, we propose an adaptive progressive enhancement scheme that further refines reconstructions in under-constrained regions. Extensive experiments demonstrate that FixingGS surpasses existing state-of-the-art methods with superior visual quality and reconstruction performance.\n\n近年来，三维高斯溅射（3D Gaussian Splatting，3DGS）在三维重建和新视角合成方面取得了显著成功。然而，由于视角稀疏导致视觉信息不足，从有限视点重建三维场景仍然极具挑战性，这会使得伪影在三维表示中持续存在。为了解决这一问题，近年来的方法引入生成先验来去除伪影并在欠约束区域补全缺失内容。尽管这些方法有效，但它们难以保证多视图一致性，常导致结构模糊和细节不合理。为此，我们提出了 FixingGS，一种无需训练的方法，充分利用现有扩散模型的能力以增强稀疏视点下的 3DGS 重建效果。FixingGS 的核心是一种蒸馏策略，可生成更加精确且跨视图一致的扩散先验，从而实现有效的伪影消除与内容修复。此外，我们还提出了一种自适应渐进增强方案，用于进一步优化欠约束区域的重建效果。大量实验结果表明，FixingGS 在视觉质量与重建性能方面均显著超越现有最先进方法。\n"
  },
  {
    "path": "abs/2509.18898.md",
    "content": "### DeblurSplat: SfM-free 3D Gaussian Splatting with Event Camera for Robust Deblurring\n\nIn this paper, we propose the first Structure-from-Motion (SfM)-free deblurring 3D Gaussian Splatting method via event camera, dubbed DeblurSplat. We address the motion-deblurring problem in two ways. First, we leverage the pretrained capability of the dense stereo module (DUSt3R) to directly obtain accurate initial point clouds from blurred images. Without calculating camera poses as an intermediate result, we avoid the cumulative errors transfer from inaccurate camera poses to the initial point clouds' positions. Second, we introduce the event stream into the deblur pipeline for its high sensitivity to dynamic change. By decoding the latent sharp images from the event stream and blurred images, we can provide a fine-grained supervision signal for scene reconstruction optimization. Extensive experiments across a range of scenes demonstrate that DeblurSplat not only excels in generating high-fidelity novel views but also achieves significant rendering efficiency compared to the SOTAs in deblur 3D-GS.\n\n本文提出了首个基于事件相机的无 SfM（Structure-from-Motion）去模糊三维高斯溅射方法，命名为 DeblurSplat。我们从两个方面解决运动去模糊问题。首先，利用密集立体匹配模块（DUSt3R）的预训练能力，直接从模糊图像中获得精确的初始点云。在不需要计算相机位姿作为中间结果的情况下，我们有效避免了由位姿估计误差传递至初始点云位置的累积误差。其次，我们将事件流引入去模糊流程中，利用其对动态变化的高敏感性。通过结合事件流与模糊图像解码潜在清晰图像，我们能够为场景重建优化提供细粒度的监督信号。在多种场景下的大量实验表明，DeblurSplat 不仅在生成高保真新视图方面表现出色，还在去模糊三维高斯溅射（deblur 3D-GS）任务中相较现有最先进方法（SOTAs）显著提升了渲染效率。\n"
  },
  {
    "path": "abs/2509.18956.md",
    "content": "### Seeing Through Reflections: Advancing 3D Scene Reconstruction in Mirror-Containing Environments with Gaussian Splatting\n\nMirror-containing environments pose unique challenges for 3D reconstruction and novel view synthesis (NVS), as reflective surfaces introduce view-dependent distortions and inconsistencies. While cutting-edge methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) excel in typical scenes, their performance deteriorates in the presence of mirrors. Existing solutions mainly focus on handling mirror surfaces through symmetry mapping but often overlook the rich information carried by mirror reflections. These reflections offer complementary perspectives that can fill in absent details and significantly enhance reconstruction quality. To advance 3D reconstruction in mirror-rich environments, we present MirrorScene3D, a comprehensive dataset featuring diverse indoor scenes, 1256 high-quality images, and annotated mirror masks, providing a benchmark for evaluating reconstruction methods in reflective settings. Building on this, we propose ReflectiveGS, an extension of 3D Gaussian Splatting that utilizes mirror reflections as complementary viewpoints rather than simple symmetry artifacts, enhancing scene geometry and recovering absent details. Experiments on MirrorScene3D show that ReflectiveGaussian outperforms existing methods in SSIM, PSNR, LPIPS, and training speed, setting a new benchmark for 3D reconstruction in mirror-rich environments.\n\n含有镜面的环境为三维重建和新视角合成（NVS）带来了独特挑战，因为反射表面会引入视角相关的畸变与不一致性。尽管前沿方法如神经辐射场（Neural Radiance Fields, NeRF）和三维高斯溅射（3D Gaussian Splatting, 3DGS）在常规场景中表现出色，但在镜面存在的情况下性能显著下降。现有方法主要通过对称映射来处理镜面问题，但往往忽略了镜面反射中所蕴含的丰富信息。这些反射提供了互补视角，可用于填补缺失细节，从而显著提升重建质量。为推动镜面场景下的三维重建研究，我们提出了 MirrorScene3D，一个包含多样室内场景、1256 张高质量图像及镜面掩码标注的综合数据集，为反射环境下的重建方法提供了评测基准。在此基础上，我们提出 ReflectiveGS，这是对三维高斯溅射的扩展方法，将镜面反射视为互补视角而非单纯的对称伪影，从而增强场景几何结构并恢复缺失细节。基于 MirrorScene3D 的实验结果表明，ReflectiveGaussian 在 SSIM、PSNR、LPIPS 以及训练速度等方面均优于现有方法，树立了镜面环境下三维重建的新基准。\n"
  },
  {
    "path": "abs/2509.19073.md",
    "content": "### WaveletGaussian: Wavelet-domain Diffusion for Sparse-view 3D Gaussian Object Reconstruction\n\n3D Gaussian Splatting (3DGS) has become a powerful representation for image-based object reconstruction, yet its performance drops sharply in sparse-view settings. Prior works address this limitation by employing diffusion models to repair corrupted renders, subsequently using them as pseudo ground truths for later optimization. While effective, such approaches incur heavy computation from the diffusion fine-tuning and repair steps. We present WaveletGaussian, a framework for more efficient sparse-view 3D Gaussian object reconstruction. Our key idea is to shift diffusion into the wavelet domain: diffusion is applied only to the low-resolution LL subband, while high-frequency subbands are refined with a lightweight network. We further propose an efficient online random masking strategy to curate training pairs for diffusion fine-tuning, replacing the commonly used, but inefficient, leave-one-out strategy. Experiments across two benchmark datasets, Mip-NeRF 360 and OmniObject3D, show WaveletGaussian achieves competitive rendering quality while substantially reducing training time.\n\n三维高斯溅射（3D Gaussian Splatting，3DGS）已成为基于图像的物体重建中一种强大的表示方式，但在稀疏视角设置下，其性能会急剧下降。以往方法通常通过扩散模型修复受损渲染结果，并将其作为伪真值用于后续优化。虽然此类方法有效，但扩散微调与修复步骤带来了高昂的计算开销。为此，我们提出了 WaveletGaussian，一种更高效的稀疏视角三维高斯物体重建框架。其核心思想是将扩散过程转移到小波域中：扩散仅作用于低分辨率的 LL 子带，而高频子带则通过轻量级网络进行精细化补偿。此外，我们提出了一种高效的在线随机掩码策略，用于生成扩散微调的训练对，替代常用但低效的留一法（leave-one-out）策略。在两个基准数据集 Mip-NeRF 360 和 OmniObject3D 上的实验表明，WaveletGaussian 在显著降低训练时间的同时，仍能实现与最先进方法相当的渲染质量。\n"
  },
  {
    "path": "abs/2509.19296.md",
    "content": "### Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation\n\nThe ability to generate virtual environments is crucial for applications ranging from gaming to physical AI domains such as robotics, autonomous driving, and industrial AI. Current learning-based 3D reconstruction methods rely on the availability of captured real-world multi-view data, which is not always readily available. Recent advancements in video diffusion models have shown remarkable imagination capabilities, yet their 2D nature limits the applications to simulation where a robot needs to navigate and interact with the environment. In this paper, we propose a self-distillation framework that aims to distill the implicit 3D knowledge in the video diffusion models into an explicit 3D Gaussian Splatting (3DGS) representation, eliminating the need for multi-view training data. Specifically, we augment the typical RGB decoder with a 3DGS decoder, which is supervised by the output of the RGB decoder. In this approach, the 3DGS decoder can be purely trained with synthetic data generated by video diffusion models. At inference time, our model can synthesize 3D scenes from either a text prompt or a single image for real-time rendering. Our framework further extends to dynamic 3D scene generation from a monocular input video. Experimental results show that our framework achieves state-of-the-art performance in static and dynamic 3D scene generation.\n\n虚拟环境的生成能力对于从游戏到机器人、自主驾驶和工业智能等物理人工智能领域的应用至关重要。当前基于学习的三维重建方法依赖于真实世界多视角数据的获取，而这类数据往往难以获得。近年来的视频扩散模型展现出了强大的想象能力，但其二维特性限制了其在需要机器人导航与交互的仿真环境中的应用。为此，本文提出了一种自蒸馏框架，旨在将视频扩散模型中隐含的三维知识蒸馏为显式的三维高斯溅射（3D Gaussian Splatting，3DGS）表示，从而无需依赖多视角训练数据。具体而言，我们在常规的 RGB 解码器基础上增加了一个 3DGS 解码器，并通过 RGB 解码器的输出对其进行监督。在该框架下，3DGS 解码器可完全基于视频扩散模型生成的合成数据进行训练。在推理阶段，我们的模型能够从文本提示或单张图像合成三维场景，并实现实时渲染。此外，该框架还可扩展至从单目视频生成动态三维场景。实验结果表明，我们的框架在静态与动态三维场景生成任务中均达到了当前最先进的性能。\n"
  },
  {
    "path": "abs/2509.19297.md",
    "content": "### VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction\n\nFeed-forward 3D Gaussian Splatting (3DGS) has emerged as a highly effective solution for novel view synthesis. Existing methods predominantly rely on a pixel-aligned Gaussian prediction paradigm, where each 2D pixel is mapped to a 3D Gaussian. We rethink this widely adopted formulation and identify several inherent limitations: it renders the reconstructed 3D models heavily dependent on the number of input views, leads to view-biased density distributions, and introduces alignment errors, particularly when source views contain occlusions or low texture. To address these challenges, we introduce VolSplat, a new multi-view feed-forward paradigm that replaces pixel alignment with voxel-aligned Gaussians. By directly predicting Gaussians from a predicted 3D voxel grid, it overcomes pixel alignment's reliance on error-prone 2D feature matching, ensuring robust multi-view consistency. Furthermore, it enables adaptive control over Gaussian density based on 3D scene complexity, yielding more faithful Gaussian point clouds, improved geometric consistency, and enhanced novel-view rendering quality. Experiments on widely used benchmarks including RealEstate10K and ScanNet demonstrate that VolSplat achieves state-of-the-art performance while producing more plausible and view-consistent Gaussian reconstructions. In addition to superior results, our approach establishes a more scalable framework for feed-forward 3D reconstruction with denser and more robust representations, paving the way for further research in wider communities. :w\n\n前向三维高斯溅射（Feed-forward 3D Gaussian Splatting，3DGS）已成为新视角合成中极具效率的解决方案。现有方法大多采用像素对齐的高斯预测范式，即将每个二维像素映射为一个三维高斯。我们重新审视这一被广泛采用的公式化方法，并发现其存在多种内在局限：它使得重建的三维模型严重依赖输入视角数量，导致视角偏置的密度分布，并在源视图存在遮挡或低纹理区域时引入对齐误差。为解决这些问题，我们提出了 VolSplat，一种基于体素对齐高斯的新型多视角前向范式。通过直接从预测的三维体素网格中生成高斯，VolSplat 摆脱了像素对齐对易错二维特征匹配的依赖，确保了稳健的多视图一致性。此外，该方法能够根据三维场景复杂度自适应地控制高斯密度，从而获得更真实的高斯点云、更一致的几何结构以及更高质量的新视角渲染效果。在 RealEstate10K 和 ScanNet 等常用基准数据集上的实验表明，VolSplat 不仅在性能上达到当前最先进水平，还能生成更合理且视图一致的高斯重建结果。除此之外，我们的方法还建立了一个更具可扩展性的前向三维重建框架，具备更高的密度与鲁棒性，为更广泛的研究探索奠定了基础。\n"
  },
  {
    "path": "abs/2509.19726.md",
    "content": "### PolGS: Polarimetric Gaussian Splatting for Fast Reflective Surface Reconstruction\n\nEfficient shape reconstruction for surfaces with complex reflectance properties is crucial for real-time virtual reality. While 3D Gaussian Splatting (3DGS)-based methods offer fast novel view rendering by leveraging their explicit surface representation, their reconstruction quality lags behind that of implicit neural representations, particularly in the case of recovering surfaces with complex reflective reflectance. To address these problems, we propose PolGS, a Polarimetric Gaussian Splatting model allowing fast reflective surface reconstruction in 10 minutes. By integrating polarimetric constraints into the 3DGS framework, PolGS effectively separates specular and diffuse components, enhancing reconstruction quality for challenging reflective materials. Experimental results on the synthetic and real-world dataset validate the effectiveness of our method.\n\n对于具有复杂反射特性的表面，高效的形状重建对于实时虚拟现实应用至关重要。虽然基于三维高斯溅射（3D Gaussian Splatting，3DGS）的方法通过显式表面表示实现了快速的新视角渲染，但其重建质量仍低于隐式神经表示，尤其在处理复杂反射表面时表现不足。为解决这一问题，我们提出了 PolGS，一种偏振高斯溅射模型，可在 10 分钟内实现快速反射表面重建。通过将偏振约束引入 3DGS 框架，PolGS 能有效分离镜面反射与漫反射成分，从而提升对高反射性材料的重建质量。基于合成数据集和真实世界数据集的实验结果验证了该方法的有效性。\n"
  },
  {
    "path": "abs/2509.19793.md",
    "content": "### BiTAA: A Bi-Task Adversarial Attack for Object Detection and Depth Estimation via 3D Gaussian Splatting\n\nCamera-based perception is critical to autonomous driving yet remains vulnerable to task-specific adversarial manipulations in object detection and monocular depth estimation. Most existing 2D/3D attacks are developed in task silos, lack mechanisms to induce controllable depth bias, and offer no standardized protocol to quantify cross-task transfer, leaving the interaction between detection and depth underexplored. We present BiTAA, a bi-task adversarial attack built on 3D Gaussian Splatting that yields a single perturbation capable of simultaneously degrading detection and biasing monocular depth. Specifically, we introduce a dual-model attack framework that supports both full-image and patch settings and is compatible with common detectors and depth estimators, with optional expectation-over-transformation (EOT) for physical reality. In addition, we design a composite loss that couples detection suppression with a signed, magnitude-controlled log-depth bias within regions of interest (ROIs) enabling controllable near or far misperception while maintaining stable optimization across tasks. We also propose a unified evaluation protocol with cross-task transfer metrics and real-world evaluations, showing consistent cross-task degradation and a clear asymmetry between Det to Depth and from Depth to Det transfer. The results highlight practical risks for multi-task camera-only perception and motivate cross-task-aware defenses in autonomous driving scenarios.\n\n基于摄像头的感知对于自动驾驶至关重要，但在目标检测和单目深度估计等任务中仍易受到特定任务的对抗扰动攻击。现有的二维/三维攻击方法多在各自任务领域独立发展，缺乏可控深度偏差机制，也没有标准化协议来量化跨任务迁移效应，使得检测与深度任务间的交互关系仍未得到充分研究。为此，我们提出了 BiTAA，这是一种基于三维高斯溅射（3D Gaussian Splatting）的双任务对抗攻击方法，能够通过单一扰动同时削弱检测性能并对单目深度估计产生偏移。具体而言，我们设计了一个双模型攻击框架，支持全图与补丁两种设置，兼容常见检测器与深度估计器，并可选用期望变换（EOT）机制以增强物理可实现性。此外，我们构建了一种复合损失函数，将检测抑制与在感兴趣区域（ROI）内具有符号与幅值可控的对数深度偏差相结合，从而实现可控的近距或远距误感知，同时保持跨任务优化的稳定性。我们还提出了统一的评测协议，包含跨任务迁移指标与真实场景测试，结果表明该方法在检测与深度任务中均表现出一致的跨任务退化效应，并揭示了从检测到深度与从深度到检测迁移之间的明显不对称性。实验结果揭示了多任务纯视觉感知系统在现实环境下的潜在安全风险，并为自动驾驶场景中的跨任务防御研究提供了新的启示。\n"
  },
  {
    "path": "abs/2509.19898.md",
    "content": "### Aerial-Ground Image Feature Matching via 3D Gaussian Splatting-based Intermediate View Rendering\n\nThe integration of aerial and ground images has been a promising solution in 3D modeling of complex scenes, which is seriously restricted by finding reliable correspondences. The primary contribution of this study is a feature matching algorithm for aerial and ground images, whose core idea is to generate intermediate views to alleviate perspective distortions caused by the extensive viewpoint changes. First, by using aerial images only, sparse models are reconstructed through an incremental SfM (Structure from Motion) engine due to their large scene coverage. Second, 3D Gaussian Splatting is then adopted for scene rendering by taking as inputs sparse points and oriented images. For accurate view rendering, a render viewpoint determination algorithm is designed by using the oriented camera poses of aerial images, which is used to generate high-quality intermediate images that can bridge the gap between aerial and ground images. Third, with the aid of intermediate images, reliable feature matching is conducted for match pairs from render-aerial and render-ground images, and final matches can be generated by transmitting correspondences through intermediate views. By using real aerial and ground datasets, the validation of the proposed solution has been verified in terms of feature matching and scene rendering and compared comprehensively with widely used methods. The experimental results demonstrate that the proposed solution can provide reliable feature matches for aerial and ground images with an obvious increase in the number of initial and refined matches, and it can provide enough matches to achieve accurate ISfM reconstruction and complete 3DGS-based scene rendering.\n\n融合航拍图像与地面图像已成为复杂场景三维建模中的一种重要解决方案，但其性能受限于难以找到可靠的特征对应关系。本文的主要贡献是一种面向航拍与地面图像的特征匹配算法，其核心思想是通过生成中间视图来缓解大视角变化所带来的透视畸变。首先，仅使用航拍图像，通过增量式运动恢复结构（SfM）引擎重建稀疏模型，以利用其较大的场景覆盖范围。其次，采用三维高斯溅射（3D Gaussian Splatting, 3DGS）进行场景渲染，以稀疏点与定向图像作为输入。为实现精确的视图渲染，设计了一种基于航拍图像相机位姿的渲染视点确定算法，用于生成高质量的中间图像，从而在航拍与地面图像之间建立视觉桥梁。第三，借助这些中间图像，在渲染—航拍与渲染—地面图像之间执行可靠的特征匹配，并通过中间视图传递特征对应关系以获得最终匹配。基于真实的航拍与地面数据集的实验验证了所提方法在特征匹配与场景渲染方面的有效性，并与多种主流方法进行了全面比较。实验结果表明，该方法能够显著提高航拍与地面图像的初始及精化匹配数量，获得更加稳定可靠的特征对应关系，并为实现精确的 ISfM 重建和完整的基于 3DGS 的场景渲染提供了充分支撑。\n"
  },
  {
    "path": "abs/2509.19937.md",
    "content": "### GS-RoadPatching: Inpainting Gaussians via 3D Searching and Placing for Driving Scenes\n\nThis paper presents GS-RoadPatching, an inpainting method for driving scene completion by referring to completely reconstructed regions, which are represented by 3D Gaussian Splatting (3DGS). Unlike existing 3DGS inpainting methods that perform generative completion relying on 2D perspective-view-based diffusion or GAN models to predict limited appearance or depth cues for missing regions, our approach enables substitutional scene inpainting and editing directly through the 3DGS modality, extricating it from requiring spatial-temporal consistency of 2D cross-modals and eliminating the need for time-intensive retraining of Gaussians. Our key insight is that the highly repetitive patterns in driving scenes often share multi-modal similarities within the implicit 3DGS feature space and are particularly suitable for structural matching to enable effective 3DGS-based substitutional inpainting. Practically, we construct feature-embedded 3DGS scenes to incorporate a patch measurement method for abstracting local context at different scales and, subsequently, propose a structural search method to find candidate patches in 3D space effectively. Finally, we propose a simple yet effective substitution-and-fusion optimization for better visual harmony. We conduct extensive experiments on multiple publicly available datasets to demonstrate the effectiveness and efficiency of our proposed method in driving scenes, and the results validate that our method achieves state-of-the-art performance compared to the baseline methods in terms of both quality and interoperability. Additional experiments in general scenes also demonstrate the applicability of the proposed 3D inpainting strategy.\n\n本文提出了一种基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的驾驶场景补全方法——GS-RoadPatching，通过参考已完全重建的区域实现场景修复。与现有依赖二维透视视图扩散或生成对抗网络（GAN）模型进行生成式补全的 3DGS 修复方法不同，GS-RoadPatching 能够在 3DGS 模态下直接进行替代式场景修复与编辑，从而摆脱对二维跨模态时空一致性的依赖，并避免了耗时的高斯再训练过程。我们的方法的核心洞见在于：驾驶场景中具有高度重复的模式，这些模式在 3DGS 隐式特征空间中通常具有多模态相似性，非常适合进行结构匹配以实现高效的替代式三维修复。具体而言，我们首先构建了带有特征嵌入的 3DGS 场景，并引入了一种多尺度的补丁测度方法以提取局部上下文信息，随后提出了一个结构化搜索算法以在三维空间中高效查找候选补丁。最后，我们设计了一种简单而有效的替换-融合优化策略，以实现更自然的视觉一致性。我们在多个公开驾驶场景数据集上进行了大量实验，结果表明 GS-RoadPatching 在修复质量和互操作性方面均显著优于现有基线方法，达到了当前最先进的性能。此外，在一般场景上的额外实验也进一步验证了所提出三维修复策略的广泛适用性。\n"
  },
  {
    "path": "abs/2509.20207.md",
    "content": "### PU-Gaussian: Point Cloud Upsampling using 3D Gaussian Representation\n\nPoint clouds produced by 3D sensors are often sparse and noisy, posing challenges for tasks requiring dense and high-fidelity 3D representations. Prior work has explored both implicit feature-based upsampling and distance-function learning to address this, but often at the expense of geometric interpretability or robustness to input sparsity. To overcome these limitations, we propose PU-Gaussian, a novel upsampling network that models the local neighborhood around each point using anisotropic 3D Gaussian distributions. These Gaussians capture the underlying geometric structure, allowing us to perform upsampling explicitly in the local geometric domain by direct point sampling. The sampling process generates a dense, but coarse, point cloud. A subsequent refinement network adjusts the coarse output to produce a more uniform distribution and sharper edges. We perform extensive testing on the PU1K and PUGAN datasets, demonstrating that PU-Gaussian achieves state-of-the-art performance.\n\n由三维传感器生成的点云通常稀疏且含噪，这给需要高密度、高保真三维表示的任务带来了挑战。以往的研究尝试通过隐式特征上采样或距离函数学习来解决这一问题，但往往牺牲了几何可解释性或对输入稀疏性的鲁棒性。为克服这些局限，我们提出了 PU-Gaussian，一种新型的上采样网络，通过各点的局部邻域建模为各向异性的三维高斯分布。该高斯分布能够捕获底层几何结构，从而在局部几何域中通过直接采样显式地进行上采样。采样过程生成稠密但相对粗糙的点云，随后通过一个细化网络调整输出，使其分布更均匀、边缘更锐利。我们在 PU1K 和 PUGAN 数据集上进行了大量实验，结果表明 PU-Gaussian 在性能上达到了当前最先进水平。\n"
  },
  {
    "path": "abs/2509.20251.md",
    "content": "### 4D Driving Scene Generation With Stereo Forcing\n\nCurrent generative models struggle to synthesize dynamic 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS) without per-scene optimization. Bridging generation and novel view synthesis remains a major challenge. We present PhiGenesis, a unified framework for 4D scene generation that extends video generation techniques with geometric and temporal consistency. Given multi-view image sequences and camera parameters, PhiGenesis produces temporally continuous 4D Gaussian splatting representations along target 3D trajectories. In its first stage, PhiGenesis leverages a pre-trained video VAE with a novel range-view adapter to enable feed-forward 4D reconstruction from multi-view images. This architecture supports single-frame or video inputs and outputs complete 4D scenes including geometry, semantics, and motion. In the second stage, PhiGenesis introduces a geometric-guided video diffusion model, using rendered historical 4D scenes as priors to generate future views conditioned on trajectories. To address geometric exposure bias in novel views, we propose Stereo Forcing, a novel conditioning strategy that integrates geometric uncertainty during denoising. This method enhances temporal coherence by dynamically adjusting generative influence based on uncertainty-aware perturbations. Our experimental results demonstrate that our method achieves state-of-the-art performance in both appearance and geometric reconstruction, temporal generation and novel view synthesis (NVS) tasks, while simultaneously delivering competitive performance in downstream evaluations.\n\n当前的生成模型在合成动态四维驾驶场景时仍存在显著挑战，尤其是在无需针对每个场景进行单独优化的情况下，同时实现时间外推与空间新视角合成（NVS）仍然困难。生成与新视角合成之间的桥接问题尚未得到充分解决。本文提出了 PhiGenesis，一个统一的四维场景生成框架，将视频生成技术扩展至具备几何与时间一致性。给定多视角图像序列及相机参数，PhiGenesis 可沿目标三维轨迹生成时间连续的四维高斯溅射（4D Gaussian Splatting）表示。在第一阶段，PhiGenesis 利用预训练的视频 VAE，并引入新颖的视域适配器（range-view adapter），实现从多视角图像到四维场景的前向重建。该架构同时支持单帧或视频输入，输出包含几何、语义与运动信息的完整四维场景。在第二阶段，PhiGenesis 引入几何引导的视频扩散模型，利用历史渲染的四维场景作为先验，基于轨迹生成未来视图。为应对新视角中的几何曝光偏差，我们提出了一种名为立体约束（Stereo Forcing）的新型条件策略，在去噪过程中融合几何不确定性。该方法通过基于不确定性感知扰动动态调整生成影响，从而增强时间一致性。实验结果表明，我们的方法在外观与几何重建、时间生成以及新视角合成（NVS）任务上均达到了当前最先进的性能，同时在下游评测中也展现出优异的综合表现。\n"
  },
  {
    "path": "abs/2509.20400.md",
    "content": "### SeHDR: Single-Exposure HDR Novel View Synthesis via 3D Gaussian Bracketing\n\nThis paper presents SeHDR, a novel high dynamic range 3D Gaussian Splatting (HDR-3DGS) approach for generating HDR novel views given multi-view LDR images. Unlike existing methods that typically require the multi-view LDR input images to be captured from different exposures, which are tedious to capture and more likely to suffer from errors (e.g., object motion blurs and calibration/alignment inaccuracies), our approach learns the HDR scene representation from multi-view LDR images of a single exposure. Our key insight to this ill-posed problem is that by first estimating Bracketed 3D Gaussians (i.e., with different exposures) from single-exposure multi-view LDR images, we may then be able to merge these bracketed 3D Gaussians into an HDR scene representation. Specifically, SeHDR first learns base 3D Gaussians from single-exposure LDR inputs, where the spherical harmonics parameterize colors in a linear color space. We then estimate multiple 3D Gaussians with identical geometry but varying linear colors conditioned on exposure manipulations. Finally, we propose the Differentiable Neural Exposure Fusion (NeEF) to integrate the base and estimated 3D Gaussians into HDR Gaussians for novel view rendering. Extensive experiments demonstrate that SeHDR outperforms existing methods as well as carefully designed baselines.\n\n本文提出了 SeHDR，一种新型的高动态范围三维高斯溅射（High Dynamic Range 3D Gaussian Splatting，HDR-3DGS）方法，可基于多视角低动态范围（LDR）图像生成 HDR 新视角图像。与现有方法通常需要多视角 LDR 图像来自不同曝光（导致采集过程繁琐，且易受物体运动模糊与标定/配准误差影响）不同，SeHDR 能够仅基于单一曝光的多视角 LDR 图像学习 HDR 场景表示。我们针对这一病态问题的核心洞见在于：若先从单曝光多视角 LDR 图像中估计出不同曝光的括号式三维高斯（Bracketed 3D Gaussians），即可将这些高斯融合为统一的 HDR 场景表示。具体而言，SeHDR 首先从单曝光 LDR 输入中学习基础三维高斯，其中颜色由球谐函数在线性色彩空间中进行参数化；然后基于曝光变换条件，估计具有相同几何结构但线性色彩不同的多组三维高斯。最后，我们提出可微神经曝光融合（Differentiable Neural Exposure Fusion, NeEF），将基础与估计的高斯融合为 HDR 高斯以实现新视角渲染。大量实验结果表明，SeHDR 在性能上优于现有方法和精心设计的基线模型。\n"
  },
  {
    "path": "abs/2509.21702.md",
    "content": "### PowerGS: Display-Rendering Power Co-Optimization for Neural Rendering in Power-Constrained XR Systems\n\n3D Gaussian Splatting (3DGS) combines classic image-based rendering, pointbased graphics, and modern differentiable techniques, and offers an interesting alternative to traditional physically-based rendering. 3DGS-family models are far from efficient for power-constrained Extended Reality (XR) devices, which need to operate at a Watt-level. This paper introduces PowerGS, the first framework to jointly minimize the rendering and display power in 3DGS under a quality constraint. We present a general problem formulation and show that solving the problem amounts to 1) identifying the iso-quality curve(s) in the landscape subtended by the display and rendering power and 2) identifying the power-minimal point on a given curve, which has a closed-form solution given a proper parameterization of the curves. PowerGS also readily supports foveated rendering for further power savings. Extensive experiments and user studies show that PowerGS achieves up to 86% total power reduction compared to state-of-the-art 3DGS models, with minimal loss in both subjective and objective quality.\n\n三维高斯溅射（3D Gaussian Splatting，简称 3DGS）融合了经典的基于图像渲染、基于点的图形学以及现代可微分技术，为传统的物理驱动渲染提供了一种有趣的替代方案。然而，3DGS 系列模型在功耗受限的扩展现实（XR）设备上效率较低，而这些设备通常只能在瓦级功耗下运行。本文提出了 PowerGS，这是首个在质量约束下联合最小化 3DGS 渲染与显示功耗的框架。我们提出了一个通用问题的形式化描述，并指出问题的求解可归结为：1）在显示功耗与渲染功耗构成的空间中确定等质量曲线；2）在给定曲线上找到功耗最小点。若对曲线进行适当参数化，该问题可获得解析解。此外，PowerGS 还自然支持注视点渲染（foveated rendering），以进一步节省功耗。大量实验与用户研究表明，PowerGS 在主观与客观质量几乎无损的情况下，相较于当前最先进的 3DGS 模型，可实现高达 86% 的整体功耗降低。\n"
  },
  {
    "path": "abs/2509.21853.md",
    "content": "### Dynamic Novel View Synthesis in High Dynamic Range\n\nHigh Dynamic Range Novel View Synthesis (HDR NVS) seeks to learn an HDR 3D model from Low Dynamic Range (LDR) training images captured under conventional imaging conditions. Current methods primarily focus on static scenes, implicitly assuming all scene elements remain stationary and non-living. However, real-world scenarios frequently feature dynamic elements, such as moving objects, varying lighting conditions, and other temporal events, thereby presenting a significantly more challenging scenario. To address this gap, we propose a more realistic problem named HDR Dynamic Novel View Synthesis (HDR DNVS), where the additional dimension \"Dynamic\" emphasizes the necessity of jointly modeling temporal radiance variations alongside sophisticated 3D translation between LDR and HDR. To tackle this complex, intertwined challenge, we introduce HDR-4DGS, a Gaussian Splatting-based architecture featured with an innovative dynamic tone-mapping module that explicitly connects HDR and LDR domains, maintaining temporal radiance coherence by dynamically adapting tone-mapping functions according to the evolving radiance distributions across the temporal dimension. As a result, HDR-4DGS achieves both temporal radiance consistency and spatially accurate color translation, enabling photorealistic HDR renderings from arbitrary viewpoints and time instances. Extensive experiments demonstrate that HDR-4DGS surpasses existing state-of-the-art methods in both quantitative performance and visual fidelity.\n\n高动态范围新视图合成（High Dynamic Range Novel View Synthesis，HDR NVS）旨在从在常规成像条件下拍摄的低动态范围（LDR）训练图像中学习一个高动态范围（HDR）三维模型。现有方法主要聚焦于静态场景，隐含假设场景中的所有元素都是静止且无生命的。然而，真实世界中普遍存在动态因素，例如移动的物体、变化的光照条件以及其他时序事件，从而使问题变得更加复杂与挑战性更高。为弥补这一空白，我们提出了一个更为现实的问题：高动态范围动态新视图合成（HDR Dynamic Novel View Synthesis，HDR DNVS），其中新增的“动态”维度强调了在低动态范围与高动态范围之间进行复杂三维转换的同时，还必须联合建模时变辐射特性。为应对这一复杂且交织的问题，我们提出了基于高斯溅射的架构 HDR-4DGS。该方法引入了创新的动态色调映射模块，显式地连接 HDR 与 LDR 域，并通过根据时间维度上不断变化的辐射分布动态调整色调映射函数，从而保持时序辐射一致性。由此，HDR-4DGS 同时实现了时间维度上的辐射一致性与空间维度上的精确颜色转换，能够在任意视角与时间点生成逼真的 HDR 渲染结果。大量实验表明，HDR-4DGS 在定量性能与视觉保真度上均显著优于现有的最先进方法。\n"
  },
  {
    "path": "abs/2509.22112.md",
    "content": "### Large Material Gaussian Model for Relightable 3D Generation\n\nThe increasing demand for 3D assets across various industries necessitates efficient and automated methods for 3D content creation. Leveraging 3D Gaussian Splatting, recent large reconstruction models (LRMs) have demonstrated the ability to efficiently achieve high-quality 3D rendering by integrating multiview diffusion for generation and scalable transformers for reconstruction. However, existing models fail to produce the material properties of assets, which is crucial for realistic rendering in diverse lighting environments. In this paper, we introduce the Large Material Gaussian Model (MGM), a novel framework designed to generate high-quality 3D content with Physically Based Rendering (PBR) materials, ie, albedo, roughness, and metallic properties, rather than merely producing RGB textures with uncontrolled light baking. Specifically, we first fine-tune a new multiview material diffusion model conditioned on input depth and normal maps. Utilizing the generated multiview PBR images, we explore a Gaussian material representation that not only aligns with 2D Gaussian Splatting but also models each channel of the PBR materials. The reconstructed point clouds can then be rendered to acquire PBR attributes, enabling dynamic relighting by applying various ambient light maps. Extensive experiments demonstrate that the materials produced by our method not only exhibit greater visual appeal compared to baseline methods but also enhance material modeling, thereby enabling practical downstream rendering applications.\n\n随着各行业对三维资产需求的不断增长，三维内容创作亟需更高效、自动化的方法。基于三维高斯溅射（3D Gaussian Splatting），近年来的大规模重建模型（Large Reconstruction Models，LRMs）通过结合多视角扩散生成与可扩展 Transformer 重建，展现了高效实现高质量三维渲染的能力。然而，现有模型无法生成资产的材质属性，而这对于在不同光照环境下实现真实感渲染至关重要。为此，本文提出了“大规模材质高斯模型”（Large Material Gaussian Model，MGM），一种旨在生成具备物理真实感渲染（Physically Based Rendering，PBR）材质的高质量三维内容的新型框架，即同时生成反照率（albedo）、粗糙度（roughness）和金属度（metallic）属性，而不仅仅是带有不受控光照烘焙的 RGB 纹理。具体而言，我们首先在输入的深度图与法线图条件下微调一个新的多视角材质扩散模型。利用生成的多视角 PBR 图像，我们进一步提出了一种高斯材质表示方法，该表示不仅与二维高斯溅射兼容，还能分别建模 PBR 材质的各个通道。通过对重建的点云进行渲染，可获得完整的 PBR 属性，从而能够通过不同的环境光照贴图实现动态重光照。大量实验表明，与基线方法相比，本方法生成的材质不仅在视觉上更具吸引力，同时在材质建模上表现更优，显著提升了其实用渲染应用的可行性。\n"
  },
  {
    "path": "abs/2509.22222.md",
    "content": "### Rigidity-Aware 3D Gaussian Deformation from a Single Image\n\nReconstructing object deformation from a single image remains a significant challenge in computer vision and graphics. Existing methods typically rely on multi-view video to recover deformation, limiting their applicability under constrained scenarios. To address this, we propose DeformSplat, a novel framework that effectively guides 3D Gaussian deformation from only a single image. Our method introduces two main technical contributions. First, we present Gaussian-to-Pixel Matching which bridges the domain gap between 3D Gaussian representations and 2D pixel observations. This enables robust deformation guidance from sparse visual cues. Second, we propose Rigid Part Segmentation consisting of initialization and refinement. This segmentation explicitly identifies rigid regions, crucial for maintaining geometric coherence during deformation. By combining these two techniques, our approach can reconstruct consistent deformations from a single image. Extensive experiments demonstrate that our approach significantly outperforms existing methods and naturally extends to various applications,such as frame interpolation and interactive object manipulation.\n\n从单张图像重建物体形变仍然是计算机视觉与图形学中的一项重大挑战。现有方法通常依赖多视角视频来恢复形变，这限制了其在受限场景中的适用性。为此，我们提出了 **DeformSplat**，一种能够仅基于单张图像有效引导三维高斯形变的新框架。我们的方法包含两个主要技术创新。首先，我们提出了 **高斯-像素匹配（Gaussian-to-Pixel Matching）**，用于弥合三维高斯表示与二维像素观测之间的域差距，从而能够从稀疏视觉线索中实现稳健的形变引导。其次，我们提出了 **刚性部分分割（Rigid Part Segmentation）**，包括初始化与细化两个阶段。该分割显式识别出刚性区域，对于在形变过程中保持几何一致性至关重要。通过结合这两项技术，我们的方法能够从单张图像中重建一致且合理的形变。大量实验结果表明，我们的方法显著优于现有方法，并可自然拓展至多种应用场景，如帧插值与交互式物体操控。\n"
  },
  {
    "path": "abs/2509.22225.md",
    "content": "### Polysemous Language Gaussian Splatting via Matching-based Mask Lifting\n\nLifting 2D open-vocabulary understanding into 3D Gaussian Splatting (3DGS) scenes is a critical challenge. However, mainstream methods suffer from three key flaws: (i) their reliance on costly per-scene retraining prevents plug-and-play application; (ii) their restrictive monosemous design fails to represent complex, multi-concept semantics; and (iii) their vulnerability to cross-view semantic inconsistencies corrupts the final semantic representation. To overcome these limitations, we introduce MUSplat, a training-free framework that abandons feature optimization entirely. Leveraging a pre-trained 2D segmentation model, our pipeline generates and lifts multi-granularity 2D masks into 3D, where we estimate a foreground probability for each Gaussian point to form initial object groups. We then optimize the ambiguous boundaries of these initial groups using semantic entropy and geometric opacity. Subsequently, by interpreting the object's appearance across its most representative viewpoints, a Vision-Language Model (VLM) distills robust textual features that reconciles visual inconsistencies, enabling open-vocabulary querying via semantic matching. By eliminating the costly per-scene training process, MUSplat reduces scene adaptation time from hours to mere minutes. On benchmark tasks for open-vocabulary 3D object selection and semantic segmentation, MUSplat outperforms established training-based frameworks while simultaneously addressing their monosemous limitations.\n\n将二维开放词汇理解提升到三维高斯溅射（3D Gaussian Splatting，3DGS）场景中是一项关键挑战。然而，主流方法存在三大缺陷：（i）依赖于高成本的逐场景重新训练，难以实现即插即用；（ii）受限于单义设计，无法表达复杂的多概念语义；（iii）易受跨视角语义不一致影响，从而破坏最终的语义表示。为克服这些限制，我们提出了 **MUSplat**，一个完全无训练的框架，彻底摒弃了特征优化过程。该方法利用预训练的二维分割模型生成多粒度的二维掩膜，并将其提升到三维空间，在此过程中为每个高斯点估计前景概率，从而形成初始的物体分组。随后，我们基于语义熵与几何不透明度对这些初始分组的模糊边界进行优化。接着，通过分析物体在其最具代表性的多个视角下的外观，视觉语言模型（Vision-Language Model，VLM）提炼出稳健的文本特征，从而协调视觉不一致性，实现基于语义匹配的开放词汇查询。通过消除昂贵的逐场景训练过程，MUSplat 将场景适配时间从数小时缩短至数分钟。在开放词汇三维物体选择与语义分割等基准任务上，MUSplat 在性能上超越了现有的基于训练的方法，同时有效解决了其单义语义限制问题。\n"
  },
  {
    "path": "abs/2509.22276.md",
    "content": "### GS-2M: Gaussian Splatting for Joint Mesh Reconstruction and Material Decomposition\n\nWe propose a unified solution for mesh reconstruction and material decomposition from multi-view images based on 3D Gaussian Splatting, referred to as GS-2M. Previous works handle these tasks separately and struggle to reconstruct highly reflective surfaces, often relying on priors from external models to enhance the decomposition results. Conversely, our method addresses these two problems by jointly optimizing attributes relevant to the quality of rendered depth and normals, maintaining geometric details while being resilient to reflective surfaces. Although contemporary works effectively solve these tasks together, they often employ sophisticated neural components to learn scene properties, which hinders their performance at scale. To further eliminate these neural components, we propose a novel roughness supervision strategy based on multi-view photometric variation. When combined with a carefully designed loss and optimization process, our unified framework produces reconstruction results comparable to state-of-the-art methods, delivering triangle meshes and their associated material components for downstream tasks. We validate the effectiveness of our approach with widely used datasets from previous works and qualitative comparisons with state-of-the-art surface reconstruction methods.\n\n我们提出了一种基于三维高斯溅射（3D Gaussian Splatting）的统一解决方案，用于从多视图图像中同时进行网格重建与材质分解，称为 **GS-2M**。以往的研究通常将这两项任务分开处理，并且在重建高反射表面时表现不佳，往往依赖外部模型的先验知识来提升分解效果。相反，我们的方法通过联合优化影响渲染深度与法线质量的属性，在保持几何细节的同时，对高反射表面具有更强的鲁棒性。尽管一些当代方法能够同时解决这两项任务，但它们往往采用复杂的神经组件来学习场景属性，从而限制了大规模场景下的性能。为进一步消除对这些神经组件的依赖，我们提出了一种基于多视图光度变化的新型粗糙度监督策略。结合精心设计的损失函数与优化过程，我们的统一框架能够生成与当前最先进方法相当的重建结果，同时输出可用于下游任务的三角网格及其对应的材质组件。我们通过多个常用基准数据集及与主流表面重建方法的定性比较验证了该方法的有效性。\n"
  },
  {
    "path": "abs/2509.22498.md",
    "content": "### HELIOS: Hierarchical Exploration for Language-grounded Interaction in Open Scenes\n\nLanguage-specified mobile manipulation tasks in novel environments simultaneously face challenges interacting with a scene which is only partially observed, grounding semantic information from language instructions to the partially observed scene, and actively updating knowledge of the scene with new observations. To address these challenges, we propose HELIOS, a hierarchical scene representation and associated search objective to perform language specified pick and place mobile manipulation tasks. We construct 2D maps containing the relevant semantic and occupancy information for navigation while simultaneously actively constructing 3D Gaussian representations of task-relevant objects. We fuse observations across this multi-layered representation while explicitly modeling the multi-view consistency of the detections of each object. In order to efficiently search for the target object, we formulate an objective function balancing exploration of unobserved or uncertain regions with exploitation of scene semantic information. We evaluate HELIOS on the OVMM benchmark in the Habitat simulator, a pick and place benchmark in which perception is challenging due to large and complex scenes with comparatively small target objects. HELIOS achieves state-of-the-art results on OVMM. As our approach is zero-shot, HELIOS can also transfer to the real world without requiring additional data, as we illustrate by demonstrating it in a real world office environment on a Spot robot.\n\n在未知环境中执行由语言指定的移动操作任务（language-specified mobile manipulation）同时面临多重挑战：与仅部分可观测的场景交互、将语言指令中的语义信息与部分观测的场景进行语义对齐，以及在获取新观测后主动更新场景知识。为应对这些挑战，我们提出了 **HELIOS**，一种分层场景表示与联合搜索目标函数，用于执行由语言指定的拾取与放置移动操作任务。我们在导航层构建包含语义与占据信息的二维地图，同时主动构建与任务相关物体的三维高斯表示。该多层次表示结构可融合跨视角观测，并显式建模每个物体检测结果的多视一致性。为了高效搜索目标物体，我们设计了一个目标函数，在探索未观测或不确定区域与利用场景语义信息之间实现平衡。我们在 Habitat 模拟器中的 OVMM 基准上对 HELIOS 进行了评估，该基准测试用于复杂大场景中小目标物体的拾取与放置任务，具有极高的感知难度。HELIOS 在 OVMM 上达到了当前最先进的性能。由于该方法具备零样本（zero-shot）特性，HELIOS 无需额外数据即可直接迁移至现实环境，我们在真实办公场景中通过波士顿动力 Spot 机器人演示了其实用性。\n"
  },
  {
    "path": "abs/2509.22917.md",
    "content": "### Learning Unified Representation of 3D Gaussian Splatting\n\nA well-designed vectorized representation is crucial for the learning systems natively based on 3D Gaussian Splatting. While 3DGS enables efficient and explicit 3D reconstruction, its parameter-based representation remains hard to learn as features, especially for neural-network-based models. Directly feeding raw Gaussian parameters into learning frameworks fails to address the non-unique and heterogeneous nature of the Gaussian parameterization, yielding highly data-dependent models. This challenge motivates us to explore a more principled approach to represent 3D Gaussian Splatting in neural networks that preserves the underlying color and geometric structure while enforcing unique mapping and channel homogeneity. In this paper, we propose an embedding representation of 3DGS based on continuous submanifold fields that encapsulate the intrinsic information of Gaussian primitives, thereby benefiting the learning of 3DGS.\n\n对于原生基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的学习系统而言，设计良好的向量化表示至关重要。尽管 3DGS 具备高效且显式的三维重建能力，但其基于参数的表示形式难以直接作为特征被学习，尤其在神经网络模型中表现突出。将原始高斯参数直接输入学习框架，无法解决高斯参数化的非唯一性与异质性问题，从而导致模型对数据的高度依赖性。该挑战促使我们探索一种更为系统的方式，在神经网络中对 3DGS 进行表示，使其既能保留底层的颜色与几何结构，又能实现唯一映射与通道一致性。为此，本文提出了一种基于连续子流形场（continuous submanifold fields）的 3DGS 嵌入式表示方法，用以封装高斯基元的内在信息，从而有助于提升 3DGS 的学习能力。\n"
  },
  {
    "path": "abs/2509.23258.md",
    "content": "### OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting\n\nSparse-view novel view synthesis is fundamentally ill-posed due to severe geometric ambiguity. Current methods are caught in a trade-off: regressive models are geometrically faithful but incomplete, whereas generative models can complete scenes but often introduce structural inconsistencies. We propose OracleGS, a novel framework that reconciles generative completeness with regressive fidelity for sparse view Gaussian Splatting. Instead of using generative models to patch incomplete reconstructions, our \"propose-and-validate\" framework first leverages a pre-trained 3D-aware diffusion model to synthesize novel views to propose a complete scene. We then repurpose a multi-view stereo (MVS) model as a 3D-aware oracle to validate the 3D uncertainties of generated views, using its attention maps to reveal regions where the generated views are well-supported by multi-view evidence versus where they fall into regions of high uncertainty due to occlusion, lack of texture, or direct inconsistency. This uncertainty signal directly guides the optimization of a 3D Gaussian Splatting model via an uncertainty-weighted loss. Our approach conditions the powerful generative prior on multi-view geometric evidence, filtering hallucinatory artifacts while preserving plausible completions in under-constrained regions, outperforming state-of-the-art methods on datasets including Mip-NeRF 360 and NeRF Synthetic.\n\n稀疏视角的新视图合成由于存在严重的几何歧义而本质上是一个不适定问题。现有方法在两种范式之间存在权衡：回归式模型在几何上更加准确，但重建结果往往不完整；生成式模型能够补全场景，但常会引入结构性不一致。为此，我们提出了 **OracleGS**，一种在稀疏视角高斯溅射（Gaussian Splatting）中同时兼顾生成完整性与回归保真度的新框架。不同于以往直接使用生成模型修补不完整重建的思路，我们的“生成-验证（propose-and-validate）”框架首先利用预训练的具备三维感知能力的扩散模型（3D-aware diffusion model）合成新视图，从而生成一个完整场景的候选方案；随后，我们将多视图立体（MVS）模型重新利用为三维感知的“Oracle”，用于验证生成视图的三维不确定性。通过其注意力图（attention maps），可以区分出哪些区域由多视几何证据充分支撑，哪些区域由于遮挡、纹理缺失或直接不一致而存在高不确定性。该不确定性信号直接通过不确定性加权损失函数（uncertainty-weighted loss）引导 3D 高斯溅射模型的优化过程。我们的方式在多视几何证据的约束下利用强大的生成先验，有效过滤幻觉式伪影，同时在欠约束区域保持合理的场景补全效果。在包括 Mip-NeRF 360 与 NeRF Synthetic 等数据集上的实验结果表明，OracleGS 在性能上超越了当前最先进的方法。\n"
  },
  {
    "path": "abs/2509.23402.md",
    "content": "### WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving\n\nRecent advances in driving-scene generation and reconstruction have demonstrated significant potential for enhancing autonomous driving systems by producing scalable and controllable training data. Existing generation methods primarily focus on synthesizing diverse and high-fidelity driving videos; however, due to limited 3D consistency and sparse viewpoint coverage, they struggle to support convenient and high-quality novel-view synthesis (NVS). Conversely, recent 3D/4D reconstruction approaches have significantly improved NVS for real-world driving scenes, yet inherently lack generative capabilities. To overcome this dilemma between scene generation and reconstruction, we propose WorldSplat, a novel feed-forward framework for 4D driving-scene generation. Our approach effectively generates consistent multi-track videos through two key steps: (i) We introduce a 4D-aware latent diffusion model integrating multi-modal information to produce pixel-aligned 4D Gaussians in a feed-forward manner. (ii) Subsequently, we refine the novel view videos rendered from these Gaussians using a enhanced video diffusion model. Extensive experiments conducted on benchmark datasets demonstrate that WorldSplat effectively generates high-fidelity, temporally and spatially consistent multi-track novel view driving videos.\n\n近年来，驾驶场景的生成与重建取得了显著进展，为自动驾驶系统提供可扩展、可控的训练数据展现出巨大潜力。现有的生成方法主要致力于合成多样化且高保真的驾驶视频，但由于缺乏充分的三维一致性与稀疏的视角覆盖，这些方法难以支持便捷且高质量的新视图合成（Novel View Synthesis, NVS）。相反，近期的三维/四维重建方法显著提升了真实驾驶场景下的 NVS 表现，但其本质上缺乏生成能力。为打破场景生成与重建之间的这一困境，我们提出了 **WorldSplat**，一种用于四维驾驶场景生成的新型前馈框架。该方法通过两个关键步骤有效生成时空一致的多轨道视频：（i）我们提出了一个具备四维感知能力的潜空间扩散模型（4D-aware latent diffusion model），融合多模态信息，以前馈方式生成像素对齐的四维高斯；（ii）随后，我们利用增强型视频扩散模型对由这些高斯渲染的新视角视频进行精细化优化。大量在基准数据集上的实验结果表明，WorldSplat 能够高效生成具有高保真度、时空一致性和多轨道特征的新视角驾驶视频。\n"
  },
  {
    "path": "abs/2509.23492.md",
    "content": "### Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos\n\nWe present Orientation-anchored Gaussian Splatting (OriGS), a novel framework for high-quality 4D reconstruction from casually captured monocular videos. While recent advances extend 3D Gaussian Splatting to dynamic scenes via various motion anchors, such as graph nodes or spline control points, they often rely on low-rank assumptions and fall short in modeling complex, region-specific deformations inherent to unconstrained dynamics. OriGS addresses this by introducing a hyperdimensional representation grounded in scene orientation. We first estimate a Global Orientation Field that propagates principal forward directions across space and time, serving as stable structural guidance for dynamic modeling. Built upon this, we propose Orientation-aware Hyper-Gaussian, a unified formulation that embeds time, space, geometry, and orientation into a coherent probabilistic state. This enables inferring region-specific deformation through principled conditioned slicing, adaptively capturing diverse local dynamics in alignment with global motion intent. Experiments demonstrate the superior reconstruction fidelity of OriGS over mainstream methods in challenging real-world dynamic scenes.\n\n我们提出了 **Orientation-anchored Gaussian Splatting（OriGS）**，一种从随手拍摄的单目视频中实现高质量四维重建的新框架。尽管近期的研究已通过图节点或样条控制点等不同运动锚点，将三维高斯溅射（3D Gaussian Splatting）扩展至动态场景，但这些方法通常依赖低秩假设，难以准确建模真实复杂场景中区域特定的非刚性形变。OriGS 通过引入一种基于场景方向的高维表示（hyperdimensional representation）有效解决了这一问题。我们首先估计 **全局方向场（Global Orientation Field）**，以在时空中传播主要的前向方向，为动态建模提供稳定的结构性约束。基于此，我们提出了 **方向感知超高斯（Orientation-aware Hyper-Gaussian）**，一种统一的概率表征形式，将时间、空间、几何与方向信息共同嵌入到一致的状态空间中。该表征能够通过有原则的条件切片（conditioned slicing）推断区域特定的形变，从而自适应地捕获多样化的局部动态，并保持与全局运动意图的一致性。实验结果表明，OriGS 在复杂的真实动态场景中显著优于主流方法，展现出更高的重建保真度。\n"
  },
  {
    "path": "abs/2509.23947.md",
    "content": "### CrashSplat: 2D to 3D Vehicle Damage Segmentation in Gaussian Splatting\n\nAutomatic car damage detection has been a topic of significant interest for the auto insurance industry as it promises faster, accurate, and cost-effective damage assessments. However, few works have gone beyond 2D image analysis to leverage 3D reconstruction methods, which have the potential to provide a more comprehensive and geometrically accurate representation of the damage. Moreover, recent methods employing 3D representations for novel view synthesis, particularly 3D Gaussian Splatting (3D-GS), have demonstrated the ability to generate accurate and coherent 3D reconstructions from a limited number of views. In this work we introduce an automatic car damage detection pipeline that performs 3D damage segmentation by up-lifting 2D masks. Additionally, we propose a simple yet effective learning-free approach for single-view 3D-GS segmentation. Specifically, Gaussians are projected onto the image plane using camera parameters obtained via Structure from Motion (SfM). They are then filtered through an algorithm that utilizes Z-buffering along with a normal distribution model of depth and opacities. Through experiments we found that this method is particularly effective for challenging scenarios like car damage detection, where target objects (e.g., scratches, small dents) may only be clearly visible in a single view, making multi-view consistency approaches impractical or impossible.\n\n自动车辆损伤检测在汽车保险行业中一直是一个备受关注的研究方向，因为它能够实现更快速、精准且具成本效益的损伤评估。然而，目前的研究很少超越二维图像分析去利用三维重建方法，而三维重建有潜力提供更全面、更符合几何真实的损伤表征。此外，近年来基于三维表示的新视图合成（Novel View Synthesis, NVS）方法，特别是三维高斯溅射（3D Gaussian Splatting, 3D-GS），已经证明即使在有限视角下，也能生成精确且一致的三维重建结果。在本研究中，我们提出了一种自动化的车辆损伤检测流程，该流程通过二维掩膜的上升（up-lifting）实现三维损伤分割。此外，我们还提出了一种简单但高效的、无学习（learning-free）的单视角 3D-GS 分割方法。具体而言，我们利用通过运动结构（Structure from Motion, SfM）获取的相机参数，将高斯点投影到图像平面上。随后，结合深度与不透明度的正态分布模型，并借助 Z-buffer 算法进行筛选。实验表明，该方法在诸如车辆损伤检测等具有挑战性的场景中表现尤为出色，因为目标对象（如划痕、小凹陷）往往仅在单一视角下清晰可见，使得多视一致性方法在此类任务中难以应用甚至不可行。\n"
  },
  {
    "path": "abs/2509.24209.md",
    "content": "### Forge4D: Feed-Forward 4D Human Reconstruction and Interpolation from Uncalibrated Sparse-view Videos\n\nInstant reconstruction of dynamic 3D humans from uncalibrated sparse-view videos is critical for numerous downstream applications. Existing methods, however, are either limited by the slow reconstruction speeds or incapable of generating novel-time representations. To address these challenges, we propose Forge4D, a feed-forward 4D human reconstruction and interpolation model that efficiently reconstructs temporally aligned representations from uncalibrated sparse-view videos, enabling both novel view and novel time synthesis. Our model simplifies the 4D reconstruction and interpolation problem as a joint task of streaming 3D Gaussian reconstruction and dense motion prediction. For the task of streaming 3D Gaussian reconstruction, we first reconstruct static 3D Gaussians from uncalibrated sparse-view images and then introduce learnable state tokens to enforce temporal consistency in a memory-friendly manner by interactively updating shared information across different timestamps. For novel time synthesis, we design a novel motion prediction module to predict dense motions for each 3D Gaussian between two adjacent frames, coupled with an occlusion-aware Gaussian fusion process to interpolate 3D Gaussians at arbitrary timestamps. To overcome the lack of the ground truth for dense motion supervision, we formulate dense motion prediction as a dense point matching task and introduce a self-supervised retargeting loss to optimize this module. An additional occlusion-aware optical flow loss is introduced to ensure motion consistency with plausible human movement, providing stronger regularization. Extensive experiments demonstrate the effectiveness of our model on both in-domain and out-of-domain datasets.\n\n从未标定的稀疏视角视频中即时重建动态三维人体，对众多下游应用至关重要。然而，现有方法要么受限于较慢的重建速度，要么无法生成新的时间维度表示。为解决这些问题，我们提出了 **Forge4D**，一种前馈式的四维人体重建与插值模型，能够从未标定的稀疏视角视频中高效重建时间对齐的表示，从而同时实现新视角与新时间的合成。我们的模型将四维重建与插值问题简化为“流式三维高斯重建”和“稠密运动预测”的联合任务。对于流式三维高斯重建任务，我们首先从未标定的稀疏视角图像中重建静态三维高斯，然后引入可学习的状态标记（learnable state tokens），通过在不同时间戳间交互更新共享信息，以一种内存友好的方式强制时间一致性。针对新时间合成，我们设计了一个新颖的运动预测模块，用于预测相邻帧之间每个三维高斯的稠密运动，并结合遮挡感知的高斯融合过程（occlusion-aware Gaussian fusion），在任意时间戳插值出三维高斯。为克服缺乏稠密运动监督真值的问题，我们将稠密运动预测建模为稠密点匹配任务，并引入自监督的重定向损失（self-supervised retargeting loss）来优化该模块。此外，我们还引入了遮挡感知的光流损失（occlusion-aware optical flow loss），以保证运动的一致性与合理性，提供更强的正则约束。大量实验结果表明，Forge4D 在域内与跨域数据集上均取得了显著的性能提升。\n"
  },
  {
    "path": "abs/2509.24421.md",
    "content": "### Proxy-GS: Efficient 3D Gaussian Splatting via Proxy Mesh\n\n3D Gaussian Splatting (3DGS) has emerged as an efficient approach for achieving photorealistic rendering. Recent MLP-based variants further improve visual fidelity but introduce substantial decoding overhead during rendering. To alleviate computation cost, several pruning strategies and level-of-detail (LOD) techniques have been introduced, aiming to effectively reduce the number of Gaussian primitives in large-scale scenes. However, our analysis reveals that significant redundancy still remains due to the lack of occlusion awareness. In this work, we propose Proxy-GS, a novel pipeline that exploits a proxy to introduce Gaussian occlusion awareness from any view. At the core of our approach is a fast proxy system capable of producing precise occlusion depth maps at a resolution of 1000x1000 under 1ms. This proxy serves two roles: first, it guides the culling of anchors and Gaussians to accelerate rendering speed. Second, it guides the densification towards surfaces during training, avoiding inconsistencies in occluded regions, and improving the rendering quality. In heavily occluded scenarios, such as the MatrixCity Streets dataset, Proxy-GS not only equips MLP-based Gaussian splatting with stronger rendering capability but also achieves faster rendering speed. Specifically, it achieves more than 2.5x speedup over Octree-GS, and consistently delivers substantially higher rendering quality.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）已成为实现逼真渲染的一种高效方法。近期的基于多层感知机（MLP）的变体进一步提升了视觉保真度，但在渲染过程中引入了显著的解码开销。为降低计算成本，已有研究提出了多种剪枝策略和层次细节（Level of Detail, LOD）技术，旨在有效减少大规模场景中的高斯基元数量。然而，我们的分析表明，由于缺乏对遮挡的感知，仍存在大量冗余。为此，本文提出了 **Proxy-GS**，一种利用代理机制从任意视角引入高斯遮挡感知的新型管线。我们方法的核心是一个高速代理系统，能够在 1 毫秒内生成分辨率为 1000×1000 的精确遮挡深度图。该代理系统发挥两项关键作用：其一，在渲染过程中引导锚点与高斯的剔除，从而加速渲染速度；其二，在训练过程中引导高斯密化（densification）向表面区域聚集，避免遮挡区域的不一致性，从而提升渲染质量。在高度遮挡的场景中（如 MatrixCity Streets 数据集），Proxy-GS 不仅为基于 MLP 的高斯溅射提供了更强的渲染能力，还显著提升了渲染效率。具体而言，其相较于 Octree-GS 实现了超过 2.5 倍的速度提升，并在渲染质量上持续保持显著优势。\n"
  },
  {
    "path": "abs/2509.24758.md",
    "content": "### ExGS: Extreme 3D Gaussian Compression with Diffusion Priors\n\nNeural scene representations, such as 3D Gaussian Splatting (3DGS), have enabled high-quality neural rendering; however, their large storage and transmission costs hinder deployment in resource-constrained environments. Existing compression methods either rely on costly optimization, which is slow and scene-specific, or adopt training-free pruning and quantization, which degrade rendering quality under high compression ratios. In contrast, recent data-driven approaches provide a promising direction to overcome this trade-off, enabling efficient compression while preserving high rendering quality. We introduce ExGS, a novel feed-forward framework that unifies Universal Gaussian Compression (UGC) with GaussPainter for Extreme 3DGS compression. UGC performs re-optimization-free pruning to aggressively reduce Gaussian primitives while retaining only essential information, whereas GaussPainter leverages powerful diffusion priors with mask-guided refinement to restore high-quality renderings from heavily pruned Gaussian scenes. Unlike conventional inpainting, GaussPainter not only fills in missing regions but also enhances visible pixels, yielding substantial improvements in degraded renderings. To ensure practicality, it adopts a lightweight VAE and a one-step diffusion design, enabling real-time restoration. Our framework can even achieve over 100X compression (reducing a typical 354.77 MB model to about 3.31 MB) while preserving fidelity and significantly improving image quality under challenging conditions. These results highlight the central role of diffusion priors in bridging the gap between extreme compression and high-quality neural rendering.\n\n神经场景表示（Neural Scene Representation）方法，如三维高斯溅射（3D Gaussian Splatting, 3DGS），已实现高质量的神经渲染。然而，其庞大的存储与传输开销严重制约了在资源受限环境中的部署。现有压缩方法要么依赖于代价高昂的优化过程（速度慢且场景特定），要么采用无训练（training-free）的剪枝与量化策略，在高压缩比下会显著损害渲染质量。相比之下，近期基于数据驱动的压缩方法为打破这一权衡提供了新的方向，使得在保持高渲染质量的同时实现高效压缩成为可能。本文提出 **ExGS**，一种统一了通用高斯压缩（Universal Gaussian Compression, UGC）与高斯修复器（GaussPainter）的新型前馈框架，用于实现极致的 3DGS 压缩。UGC 通过无再优化的剪枝（re-optimization-free pruning）极大地减少高斯基元数量，仅保留关键信息；而 GaussPainter 则利用强大的扩散先验（diffusion priors）与掩膜引导的精修（mask-guided refinement），在严重剪枝的高斯场景中恢复高质量渲染结果。不同于传统的图像修补（inpainting），GaussPainter 不仅填补缺失区域，还能增强可见像素，从而显著改善退化渲染效果。为保证实用性，该方法采用轻量化的变分自编码器（VAE）与一步扩散（one-step diffusion）设计，实现实时恢复。实验表明，该框架可实现超过 **100 倍压缩率**（将典型的 354.77 MB 模型压缩至约 3.31 MB），同时在复杂条件下保持高保真度并显著提升图像质量。结果凸显了扩散先验在连接极限压缩与高质量神经渲染之间的核心作用。\n"
  },
  {
    "path": "abs/2509.24893.md",
    "content": "### HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping\n\nNovel View Synthesis (NVS) from sparse views presents a formidable challenge in 3D reconstruction, where limited multi-view constraints lead to severe overfitting, geometric distortion, and fragmented scenes. While 3D Gaussian Splatting (3DGS) delivers real-time, high-fidelity rendering, its performance drastically deteriorates under sparse inputs, plagued by floating artifacts and structural failures. To address these challenges, we introduce HBSplat, a unified framework that elevates 3DGS by seamlessly integrating robust structural cues, virtual view constraints, and occluded region completion. Our core contributions are threefold: a Hybrid-Loss Depth Estimation module that ensures multi-view consistency by leveraging dense matching priors and integrating reprojection, point propagation, and smoothness constraints; a Bidirectional Warping Virtual View Synthesis method that enforces substantially stronger constraints by creating high-fidelity virtual views through bidirectional depth-image warping and multi-view fusion; and an Occlusion-Aware Reconstruction component that recovers occluded areas using a depth-difference mask and a learning-based inpainting model. Extensive evaluations on LLFF, Blender, and DTU benchmarks validate that HBSplat sets a new state-of-the-art, achieving up to 21.13 dB PSNR and 0.189 LPIPS, while maintaining real-time inference.\n\n从稀疏视角进行新视图合成（Novel View Synthesis, NVS）在三维重建中是一项极具挑战性的问题。由于多视约束有限，模型容易出现严重的过拟合、几何畸变和场景碎片化现象。尽管三维高斯溅射（3D Gaussian Splatting, 3DGS）能够实现实时高保真渲染，但在稀疏输入条件下，其性能急剧下降，常受到漂浮伪影和结构性失败的困扰。为应对这些挑战，我们提出了 **HBSplat**，一个统一的框架，通过无缝整合结构约束、虚拟视图约束以及遮挡区域补全，显著提升 3DGS 的性能。我们的核心贡献包括三点：① **混合损失深度估计模块（Hybrid-Loss Depth Estimation）**，通过利用稠密匹配先验并结合重投影、点传播与平滑约束，实现多视一致性；② **双向变形虚拟视图合成方法（Bidirectional Warping Virtual View Synthesis）**，通过双向深度图像变形与多视融合生成高保真的虚拟视图，从而施加更强的约束；③ **遮挡感知重建模块（Occlusion-Aware Reconstruction）**，利用深度差掩膜与基于学习的修复模型恢复被遮挡区域。我们在 LLFF、Blender 和 DTU 等基准数据集上进行了大量实验验证，结果表明 HBSplat 刷新了当前最先进性能，在保持实时推理的同时，达到了最高 **21.13 dB PSNR** 和 **0.189 LPIPS** 的渲染质量。\n"
  },
  {
    "path": "abs/2509.25001.md",
    "content": "### LVT: Large-Scale Scene Reconstruction via Local View Transformers\n\nLarge transformer models are proving to be a powerful tool for 3D vision and novel view synthesis. However, the standard Transformer's well-known quadratic complexity makes it difficult to scale these methods to large scenes. To address this challenge, we propose the Local View Transformer (LVT), a large-scale scene reconstruction and novel view synthesis architecture that circumvents the need for the quadratic attention operation. Motivated by the insight that spatially nearby views provide more useful signal about the local scene composition than distant views, our model processes all information in a local neighborhood around each view. To attend to tokens in nearby views, we leverage a novel positional encoding that conditions on the relative geometric transformation between the query and nearby views. We decode the output of our model into a 3D Gaussian Splat scene representation that includes both color and opacity view-dependence. Taken together, the Local View Transformer enables reconstruction of arbitrarily large, high-resolution scenes in a single forward pass.\n\n大型 Transformer 模型已被证明是三维视觉与新视图合成中的强大工具。然而，标准 Transformer 所具有的著名二次复杂度（quadratic complexity）使得其难以在大规模场景中扩展。为应对这一挑战，我们提出了 **局部视图 Transformer（Local View Transformer, LVT）**，一种面向大规模场景重建与新视图合成的架构，能够绕过传统二次注意力操作的限制。受到这样一个关键洞察的启发——空间上相邻的视角比远处视角能提供更多关于局部场景组成的有用信息——我们的模型在每个视角的局部邻域内处理所有相关信息。为了高效地关注相邻视角中的特征标记（tokens），我们设计了一种新颖的位置编码方式，该编码基于查询视角与相邻视角之间的相对几何变换进行条件化建模。模型的输出被解码为包含颜色与不透明度视角依赖性的三维高斯溅射（3D Gaussian Splat）场景表示。综上所述，**Local View Transformer** 能够在一次前向推理中重建任意大规模、高分辨率的三维场景。\n"
  },
  {
    "path": "abs/2509.25075.md",
    "content": "### GEM: 3D Gaussian Splatting for Efficient and Accurate Cryo-EM Reconstruction\n\nCryo-electron microscopy (cryo-EM) has become a central tool for high-resolution structural biology, yet the massive scale of datasets (often exceeding 100k particle images) renders 3D reconstruction both computationally expensive and memory intensive. Traditional Fourier-space methods are efficient but lose fidelity due to repeated transforms, while recent real-space approaches based on neural radiance fields (NeRFs) improve accuracy but incur cubic memory and computation overhead. Therefore, we introduce GEM, a novel cryo-EM reconstruction framework built on 3D Gaussian Splatting (3DGS) that operates directly in real-space while maintaining high efficiency. Instead of modeling the entire density volume, GEM represents proteins with compact 3D Gaussians, each parameterized by only 11 values. To further improve the training efficiency, we designed a novel gradient computation to 3D Gaussians that contribute to each voxel. This design substantially reduced both memory footprint and training cost. On standard cryo-EM benchmarks, GEM achieves up to 48% faster training and 12% lower memory usage compared to state-of-the-art methods, while improving local resolution by as much as 38.8%. These results establish GEM as a practical and scalable paradigm for cryo-EM reconstruction, unifying speed, efficiency, and high-resolution accuracy.\n\n冷冻电子显微镜（Cryo-Electron Microscopy，cryo-EM）已成为高分辨率结构生物学的重要工具，但其数据集规模极为庞大（通常超过 10 万张粒子图像），使得三维重建计算代价高昂且内存占用巨大。传统的傅里叶空间方法虽然计算高效，但由于多次变换会导致精度损失；而基于神经辐射场（Neural Radiance Fields, NeRF）的实空间重建方法虽然提升了精度，却带来了立方级别的内存与计算开销。为此，我们提出了 **GEM**，一种基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的新型 cryo-EM 重建框架，能够直接在实空间中高效运行。不同于对整个密度体积进行建模，GEM 采用紧凑的三维高斯表示蛋白质，每个高斯仅由 11 个参数描述。为进一步提升训练效率，我们设计了一种针对体素贡献的三维高斯梯度计算方法，大幅降低了内存占用与训练成本。在标准 cryo-EM 基准测试中，GEM 相比现有最先进方法实现了 **最高 48% 的训练加速** 和 **12% 的内存节省**，同时 **局部分辨率提升高达 38.8%**。实验结果表明，GEM 在速度、效率与高分辨率精度之间实现了统一，为 cryo-EM 重建提供了一种实用且可扩展的新范式。\n"
  },
  {
    "path": "abs/2509.25079.md",
    "content": "### UniLat3D: Geometry-Appearance Unified Latents for Single-Stage 3D Generation\n\nHigh-fidelity 3D asset generation is crucial for various industries. While recent 3D pretrained models show strong capability in producing realistic content, most are built upon diffusion models and follow a two-stage pipeline that first generates geometry and then synthesizes appearance. Such a decoupled design tends to produce geometry-texture misalignment and non-negligible cost. In this paper, we propose UniLat3D, a unified framework that encodes geometry and appearance in a single latent space, enabling direct single-stage generation. Our key contribution is a geometry-appearance Unified VAE, which compresses high-resolution sparse features into a compact latent representation -- UniLat. UniLat integrates structural and visual information into a dense low-resolution latent, which can be efficiently decoded into diverse 3D formats, e.g., 3D Gaussians and meshes. Based on this unified representation, we train a single flow-matching model to map Gaussian noise directly into UniLat, eliminating redundant stages. Trained solely on public datasets, UniLat3D produces high-quality 3D assets in seconds from a single image, achieving superior appearance fidelity and geometric quality.\n\n高保真三维资产生成在多个行业中具有关键重要性。尽管近年来的三维预训练模型在生成逼真内容方面展现出强大能力，但大多数方法基于扩散模型，并采用“几何生成—外观合成”两阶段流程。这种解耦式设计容易导致几何与纹理不对齐，并带来显著的计算开销。为此，我们提出了 **UniLat3D**，一个在单一潜空间中联合编码几何与外观信息的统一框架，实现直接的一阶段生成。我们的核心创新在于 **几何-外观统一变分自编码器（Unified VAE）**，该模型将高分辨率稀疏特征压缩为紧凑的潜表示——**UniLat**。UniLat 将结构信息与视觉信息融合为密集、低分辨率的潜特征，可高效解码为多种三维形式，例如三维高斯（3D Gaussians）和网格（meshes）。基于这种统一表示，我们训练了一个单一的流匹配模型（flow-matching model），能够将高斯噪声直接映射到 UniLat，从而消除冗余阶段。该模型仅基于公开数据集进行训练，能够在数秒内从单张图像生成高质量的三维资产，在外观逼真度与几何精度上均显著优于现有方法。\n"
  },
  {
    "path": "abs/2509.25122.md",
    "content": "### Triangle Splatting+: Differentiable Rendering with Opaque Triangles\n\nReconstructing 3D scenes and synthesizing novel views has seen rapid progress in recent years. Neural Radiance Fields demonstrated that continuous volumetric radiance fields can achieve high-quality image synthesis, but their long training and rendering times limit practicality. 3D Gaussian Splatting (3DGS) addressed these issues by representing scenes with millions of Gaussians, enabling real-time rendering and fast optimization. However, Gaussian primitives are not natively compatible with the mesh-based pipelines used in VR headsets, and real-time graphics applications. Existing solutions attempt to convert Gaussians into meshes through post-processing or two-stage pipelines, which increases complexity and degrades visual quality. In this work, we introduce Triangle Splatting+, which directly optimizes triangles, the fundamental primitive of computer graphics, within a differentiable splatting framework. We formulate triangle parametrization to enable connectivity through shared vertices, and we design a training strategy that enforces opaque triangles. The final output is immediately usable in standard graphics engines without post-processing. Experiments on the Mip-NeRF360 and Tanks & Temples datasets show that Triangle Splatting+achieves state-of-the-art performance in mesh-based novel view synthesis. Our method surpasses prior splatting approaches in visual fidelity while remaining efficient and fast to training. Moreover, the resulting semi-connected meshes support downstream applications such as physics-based simulation or interactive walkthroughs.\n\n三维场景重建与新视图合成近年来取得了快速进展。神经辐射场（Neural Radiance Fields, NeRF）证明了连续体辐射场能够实现高质量的图像合成，但其训练与渲染耗时过长，限制了实际应用。三维高斯溅射（3D Gaussian Splatting, 3DGS）通过以数百万个高斯表示场景，有效解决了上述问题，实现了实时渲染与快速优化。然而，高斯基元与虚拟现实（VR）头显及实时图形应用中常用的基于网格的渲染管线并不兼容。现有方法通常通过后处理或两阶段流程将高斯转换为网格，这不仅增加了复杂度，还会降低视觉质量。为此，我们提出了 **Triangle Splatting+**，一种在可微分溅射框架中直接优化三角形（计算机图形学的基本单元）的新方法。我们提出了三角形参数化策略，以通过共享顶点实现几何连通性，并设计了一种强制三角形不透明的训练策略。最终输出可直接用于标准图形引擎，无需额外后处理。我们在 Mip-NeRF360 和 Tanks & Temples 等数据集上进行了实验，结果表明 **Triangle Splatting+** 在基于网格的新视图合成任务上达到了当前最先进的性能。该方法在保持高训练效率的同时，显著提升了视觉保真度。此外，生成的半连通网格还能支持下游任务，如基于物理的仿真或交互式漫游。\n"
  },
  {
    "path": "abs/2509.25183.md",
    "content": "### PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos\n\nWe present PAD3R, a method for reconstructing deformable 3D objects from casually captured, unposed monocular videos. Unlike existing approaches, PAD3R handles long video sequences featuring substantial object deformation, large-scale camera movement, and limited view coverage that typically challenge conventional systems. At its core, our approach trains a personalized, object-centric pose estimator, supervised by a pre-trained image-to-3D model. This guides the optimization of deformable 3D Gaussian representation. The optimization is further regularized by long-term 2D point tracking over the entire input video. By combining generative priors and differentiable rendering, PAD3R reconstructs high-fidelity, articulated 3D representations of objects in a category-agnostic way. Extensive qualitative and quantitative results show that PAD3R is robust and generalizes well across challenging scenarios, highlighting its potential for dynamic scene understanding and 3D content creation.\n\n我们提出了 **PAD3R**，一种从随手拍摄、无姿态标注的单目视频中重建可变形三维物体的方法。与现有方法不同，PAD3R 能够处理包含显著物体形变、大范围相机运动以及有限视角覆盖的长视频序列，这些因素通常会对传统系统造成严重挑战。该方法的核心是训练一个个性化、以物体为中心的姿态估计器，并通过预训练的图像到三维模型（image-to-3D model）进行监督，从而引导可变形三维高斯表示（deformable 3D Gaussian representation）的优化过程。优化过程中还结合了整个输入视频的长期二维点跟踪（long-term 2D point tracking）进行正则化。通过结合生成先验与可微渲染（differentiable rendering），PAD3R 能够以类别无关的方式重建高保真、具关节结构的三维物体表示。大量定性与定量实验结果表明，PAD3R 在复杂场景中表现出极高的鲁棒性与泛化能力，展现了其在动态场景理解与三维内容创作中的巨大潜力。\n"
  },
  {
    "path": "abs/2509.25603.md",
    "content": "### GaussianLens: Localized High-Resolution Reconstruction via On-Demand Gaussian Densification\n\nWe perceive our surroundings with an active focus, paying more attention to regions of interest, such as the shelf labels in a grocery store. When it comes to scene reconstruction, this human perception trait calls for spatially varying degrees of detail ready for closer inspection in critical regions, preferably reconstructed on demand. While recent works in 3D Gaussian Splatting (3DGS) achieve fast, generalizable reconstruction from sparse views, their uniform resolution output leads to high computational costs unscalable to high-resolution training. As a result, they cannot leverage available images at their original high resolution to reconstruct details. Per-scene optimization methods reconstruct finer details with adaptive density control, yet require dense observations and lengthy offline optimization. To bridge the gap between the prohibitive cost of high-resolution holistic reconstructions and the user needs for localized fine details, we propose the problem of localized high-resolution reconstruction via on-demand Gaussian densification. Given a low-resolution 3DGS reconstruction, the goal is to learn a generalizable network that densifies the initial 3DGS to capture fine details in a user-specified local region of interest (RoI), based on sparse high-resolution observations of the RoI. This formulation avoids the high cost and redundancy of uniformly high-resolution reconstructions and fully leverages high-resolution captures in critical regions. We propose GaussianLens, a feed-forward densification framework that fuses multi-modal information from the initial 3DGS and multi-view images. We further design a pixel-guided densification mechanism that effectively captures details under large resolution increases. Experiments demonstrate our method's superior performance in local fine detail reconstruction and strong scalability to images of up to 1024×1024 resolution.\n\n我们以主动聚焦的方式感知周围环境，对感兴趣区域（例如超市货架上的标签）给予更多注意。在场景重建中，这种人类感知特性启示我们应针对关键区域提供空间分辨率可变、可按需精细重建的能力。尽管近期的三维高斯溅射（3D Gaussian Splatting, 3DGS）方法能够从稀疏视角实现快速且具泛化能力的重建，但其输出为均匀分辨率，导致计算成本高昂，难以扩展至高分辨率训练。因此，它们无法充分利用原始高分辨率图像来重建细节。基于逐场景优化的传统方法虽然通过自适应密度控制能够恢复更精细的细节，但依赖密集观测且训练耗时长。为弥合高分辨率整体重建的高昂代价与用户在局部精细区域需求之间的差距，我们提出了 **基于按需高斯密化的局部高分辨率重建问题**。在给定低分辨率 3DGS 重建结果的情况下，我们的目标是学习一个具备泛化能力的网络，利用稀疏的高分辨率观测，对用户指定的兴趣区域（Region of Interest, RoI）进行密化，从而捕获细节结构。该方案避免了全局高分辨率重建的高成本与冗余，同时充分利用关键区域的高分辨率图像。为此，我们提出了 **GaussianLens**，一个基于前馈架构的密化框架，可融合初始 3DGS 与多视图图像的多模态信息。此外，我们设计了 **像素引导的密化机制（pixel-guided densification）**，能够在大幅提升分辨率时有效捕获细节。实验结果表明，GaussianLens 在局部细节重建方面表现优异，并在高达 1024×1024 分辨率的图像上展现出强大的可扩展性。\n"
  },
  {
    "path": "abs/2509.25626.md",
    "content": "### LLM-Powered Code Analysis and Optimization for Gaussian Splatting Kernels\n\n3D Gaussian splatting (3DGS) is a transformative technique with profound implications on novel view synthesis and real-time rendering. Given its importance, there have been many attempts to improve its performance. However, with the increasing complexity of GPU architectures and the vast search space of performance-tuning parameters, it is a challenging task. Although manual optimizations have achieved remarkable speedups, they require domain expertise and the optimization process can be highly time consuming and error prone. In this paper, we propose to exploit large language models (LLMs) to analyze and optimize Gaussian splatting kernels. To our knowledge, this is the first work to use LLMs to optimize highly specialized real-world GPU kernels. We reveal the intricacies of using LLMs for code optimization and analyze the code optimization techniques from the LLMs. We also propose ways to collaborate with LLMs to further leverage their capabilities. For the original 3DGS code on the MipNeRF360 datasets, LLMs achieve significant speedups, 19% with Deepseek and 24% with GPT-5, demonstrating the different capabilities of different LLMs. By feeding additional information from performance profilers, the performance improvement from LLM-optimized code is enhanced to up to 42% and 38% on average. In comparison, our best-effort manually optimized version can achieve a performance improvement up to 48% and 39% on average, showing that there are still optimizations beyond the capabilities of current LLMs. On the other hand, even upon a newly proposed 3DGS framework with algorithmic optimizations, Seele, LLMs can still further enhance its performance by 6%, showing that there are optimization opportunities missed by domain experts. This highlights the potential of collaboration between domain experts and LLMs.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）是一项具有变革性意义的技术，在新视图合成与实时渲染等领域具有深远影响。鉴于其重要性，学界与业界已进行了大量研究以提升其性能。然而，随着 GPU 架构的复杂性不断增加以及性能调优参数空间的急剧扩张，性能优化仍然是一项极具挑战的任务。尽管人工优化已取得显著加速效果，但这类优化依赖专家经验，且过程耗时长、易出错。本文提出利用大语言模型（Large Language Models, LLMs）对高斯溅射核心（kernels）进行分析与优化。据我们所知，这是首个利用 LLM 优化真实世界中高度专业化 GPU 核函数的研究。我们揭示了使用 LLM 进行代码优化的复杂性，并系统分析了其产生的优化策略。同时，我们提出了与 LLM 协同优化的多种方法，以进一步发挥其潜能。在 MipNeRF360 数据集上的原始 3DGS 代码测试中，LLMs 实现了显著加速效果：Deepseek 提升 19%，GPT-5 提升 24%，体现了不同 LLM 的性能差异。当引入性能分析工具（performance profilers）提供的附加信息后，LLM 优化代码的性能提升可进一步达到 **42%** 与 **38%（平均）**。相比之下，我们人工尽力优化的版本可实现 **48%** 与 **39%（平均）** 的性能提升，说明当前 LLM 仍存在未覆盖的优化潜力。另一方面，即使在一个已包含算法级优化的新型 3DGS 框架 **Seele** 上，LLMs 仍能进一步提升 **6%** 的性能，表明仍有专家未能发现的优化机会。总体而言，这项研究凸显了领域专家与大语言模型协同优化的巨大潜力。\n"
  },
  {
    "path": "abs/2509.26008.md",
    "content": "### PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion\n\nIn this paper, we present the first pinhole-fisheye framework for heterogeneous multi-view depth estimation, PFDepth. Our key insight is to exploit the complementary characteristics of pinhole and fisheye imagery (undistorted vs. distorted, small vs. large FOV, far vs. near field) for joint optimization. PFDepth employs a unified architecture capable of processing arbitrary combinations of pinhole and fisheye cameras with varied intrinsics and extrinsics. Within PFDepth, we first explicitly lift 2D features from each heterogeneous view into a canonical 3D volumetric space. Then, a core module termed Heterogeneous Spatial Fusion is designed to process and fuse distortion-aware volumetric features across overlapping and non-overlapping regions. Additionally, we subtly reformulate the conventional voxel fusion into a novel 3D Gaussian representation, in which learnable latent Gaussian spheres dynamically adapt to local image textures for finer 3D aggregation. Finally, fused volume features are rendered into multi-view depth maps. Through extensive experiments, we demonstrate that PFDepth sets a state-of-the-art performance on KITTI-360 and RealHet datasets over current mainstream depth networks. To the best of our knowledge, this is the first systematic study of heterogeneous pinhole-fisheye depth estimation, offering both technical novelty and valuable empirical insights.\n\n本文提出了首个用于异构多视图深度估计的针孔-鱼眼统一框架 —— **PFDepth**。我们的方法核心思想在于利用针孔与鱼眼成像的互补特性（无畸变 vs. 有畸变、小视场 vs. 大视场、远场 vs. 近场）进行联合优化。PFDepth 采用统一的网络架构，能够同时处理具有不同内外参配置的任意针孔与鱼眼相机组合。在 PFDepth 中，我们首先将来自各异构视角的二维特征显式提升到标准化的三维体素空间中。随后，我们设计了一个关键模块 —— **异构空间融合模块（Heterogeneous Spatial Fusion）**，用于在重叠与非重叠区域之间处理并融合考虑畸变的体素特征。此外，我们对传统的体素融合进行了重新表述，提出了一种新颖的 **三维高斯表示（3D Gaussian Representation）**，其中可学习的潜在高斯球能够根据局部图像纹理动态自适应，从而实现更精细的三维聚合。最后，融合后的体积特征被渲染为多视深度图。大量实验结果表明，PFDepth 在 KITTI-360 与 RealHet 数据集上均显著超越现有主流深度网络，达到了最新的性能水平。据我们所知，这也是首个系统性研究针孔-鱼眼异构深度估计的工作，兼具方法创新性与重要的实证价值。\n"
  },
  {
    "path": "abs/2509.26055.md",
    "content": "### GaussEdit: Adaptive 3D Scene Editing with Text and Image Prompts\n\nThis paper presents GaussEdit, a framework for adaptive 3D scene editing guided by text and image prompts. GaussEdit leverages 3D Gaussian Splatting as its backbone for scene representation, enabling convenient Region of Interest selection and efficient editing through a three-stage process. The first stage involves initializing the 3D Gaussians to ensure high-quality edits. The second stage employs an Adaptive Global-Local Optimization strategy to balance global scene coherence and detailed local edits and a category-guided regularization technique to alleviate the Janus problem. The final stage enhances the texture of the edited objects using a sophisticated image-to-image synthesis technique, ensuring that the results are visually realistic and align closely with the given prompts. Our experimental results demonstrate that GaussEdit surpasses existing methods in editing accuracy, visual fidelity, and processing speed. By successfully embedding user-specified concepts into 3D scenes, GaussEdit is a powerful tool for detailed and user-driven 3D scene editing, offering significant improvements over traditional methods.\n\n本文提出了 **GaussEdit**，一个由文本与图像提示共同引导的自适应三维场景编辑框架。GaussEdit 以三维高斯溅射（3D Gaussian Splatting）作为场景表示的核心骨干，使得兴趣区域（Region of Interest, RoI）的选择更为便捷，并通过三阶段流程实现高效编辑。第一阶段对三维高斯进行初始化，以确保编辑的高质量；第二阶段引入 **自适应全局-局部优化策略（Adaptive Global-Local Optimization）**，在保证场景整体一致性的同时实现细节层面的精准编辑，并结合 **类别引导的正则化（Category-Guided Regularization）** 来缓解 Janus 问题；最后阶段利用先进的 **图像到图像合成技术（image-to-image synthesis）** 对编辑区域的纹理进行增强，确保生成结果在视觉上逼真且与用户提示高度一致。实验结果表明，GaussEdit 在编辑精度、视觉保真度与处理速度上均显著优于现有方法。通过将用户指定概念精准嵌入三维场景，GaussEdit 成为一个强大的、以用户为中心的三维场景编辑工具，在细节控制与交互体验上实现了对传统方法的突破性提升。\n"
  },
  {
    "path": "abs/2509.26455.md",
    "content": "### Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting\n\nWe present Stylos, a single-forward 3D Gaussian framework for 3D style transfer that operates on unposed content, from a single image to a multi-view collection, conditioned on a separate reference style image. Stylos synthesizes a stylized 3D Gaussian scene without per-scene optimization or precomputed poses, achieving geometry-aware, view-consistent stylization that generalizes to unseen categories, scenes, and styles. At its core, Stylos adopts a Transformer backbone with two pathways: geometry predictions retain self-attention to preserve geometric fidelity, while style is injected via global cross-attention to enforce visual consistency across views. With the addition of a voxel-based 3D style loss that aligns aggregated scene features to style statistics, Stylos enforces view-consistent stylization while preserving geometry. Experiments across multiple datasets demonstrate that Stylos delivers high-quality zero-shot stylization, highlighting the effectiveness of global style-content coupling, the proposed 3D style loss, and the scalability of our framework from single view to large-scale multi-view settings.\n\n本文提出了 **Stylos**，一种单前向（single-forward）三维高斯框架，用于基于三维风格迁移的生成任务。Stylos 能够在无姿态标注的输入下工作，从单张图像到多视图集合均可适用，并以独立的参考风格图像作为条件输入。该方法无需逐场景优化或预计算姿态，即可生成具备几何感知与视角一致性的三维高斯风格化场景，并具备对未见类别、场景与风格的良好泛化能力。Stylos 的核心结构基于 Transformer 骨干网络，包含两条路径：几何预测路径通过自注意力机制保持几何保真度，而风格路径则通过全局交叉注意力（global cross-attention）实现多视图间的视觉一致性。进一步地，我们提出了一种基于体素的三维风格损失（voxel-based 3D style loss），通过对聚合的场景特征与风格统计进行对齐，实现几何保持下的视角一致性风格迁移。大量实验表明，Stylos 在多个数据集上均能实现高质量的零样本风格化效果，验证了其 **全局风格-内容耦合机制**、**三维风格损失设计** 以及 **从单视图到大规模多视图场景的可扩展性** 的有效性。\n"
  },
  {
    "path": "abs/2509.26621.md",
    "content": "### HART: Human Aligned Reconstruction Transformer\n\nWe introduce HART, a unified framework for sparse-view human reconstruction. Given a small set of uncalibrated RGB images of a person as input, it outputs a watertight clothed mesh, the aligned SMPL-X body mesh, and a Gaussian-splat representation for photorealistic novel-view rendering. Prior methods for clothed human reconstruction either optimize parametric templates, which overlook loose garments and human-object interactions, or train implicit functions under simplified camera assumptions, limiting applicability in real scenes. In contrast, HART predicts per-pixel 3D point maps, normals, and body correspondences, and employs an occlusion-aware Poisson reconstruction to recover complete geometry, even in self-occluded regions. These predictions also align with a parametric SMPL-X body model, ensuring that reconstructed geometry remains consistent with human structure while capturing loose clothing and interactions. These human-aligned meshes initialize Gaussian splats to further enable sparse-view rendering. While trained on only 2.3K synthetic scans, HART achieves state-of-the-art results: Chamfer Distance improves by 18-23 percent for clothed-mesh reconstruction, PA-V2V drops by 6-27 percent for SMPL-X estimation, LPIPS decreases by 15-27 percent for novel-view synthesis on a wide range of datasets. These results suggest that feed-forward transformers can serve as a scalable model for robust human reconstruction in real-world settings.\n\n我们提出了 HART，一个用于稀疏视角人体重建的统一框架。该方法以一小组未经校准的人体 RGB 图像作为输入，输出包括：闭合的穿衣网格、对齐的 SMPL-X 身体网格，以及用于真实感新视角渲染的高斯泼溅表示。此前的穿衣人体重建方法通常是优化参数化模板，这种方式忽略了宽松服饰与人-物交互，或是在简化相机假设下训练隐式函数，从而限制了其在真实场景中的适用性。相比之下，HART 能预测每像素的三维点图、法向量和身体对应关系，并使用遮挡感知的泊松重建方法，恢复包括自遮挡区域在内的完整几何结构。这些预测结果还与参数化的 SMPL-X 人体模型对齐，确保重建几何既保持人体结构的一致性，又能捕捉宽松服饰和交互行为。这些对齐的人体网格进一步用于初始化高斯泼溅表示，以支持稀疏视角渲染。尽管仅在 2.3K 个合成扫描数据上进行了训练，HART 仍取得了当前最优的效果：在穿衣网格重建方面，Chamfer 距离提升了 18%-23%；在 SMPL-X 估计方面，PA-V2V 降低了 6%-27%；在新视角合成方面，LPIPS 降低了 15%-27%，涵盖了多种数据集。这些结果表明，前馈 Transformer 可以作为一种可扩展的模型，在真实场景中实现稳健的人体重建。\n"
  },
  {
    "path": "abs/2510.01619.md",
    "content": "### MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics\n\nWhile there has been significant progress in the field of 3D avatar creation from visual observations, modeling physically plausible dynamics of humans with loose garments remains a challenging problem. Although a few existing works address this problem by leveraging physical simulation, they suffer from limited accuracy or robustness to novel animation inputs. In this work, we present MPMAvatar, a framework for creating 3D human avatars from multi-view videos that supports highly realistic, robust animation, as well as photorealistic rendering from free viewpoints. For accurate and robust dynamics modeling, our key idea is to use a Material Point Method-based simulator, which we carefully tailor to model garments with complex deformations and contact with the underlying body by incorporating an anisotropic constitutive model and a novel collision handling algorithm. We combine this dynamics modeling scheme with our canonical avatar that can be rendered using 3D Gaussian Splatting with quasi-shadowing, enabling high-fidelity rendering for physically realistic animations. In our experiments, we demonstrate that MPMAvatar significantly outperforms the existing state-of-the-art physics-based avatar in terms of (1) dynamics modeling accuracy, (2) rendering accuracy, and (3) robustness and efficiency. Additionally, we present a novel application in which our avatar generalizes to unseen interactions in a zero-shot manner-which was not achievable with previous learning-based methods due to their limited simulation generalizability.\n\n尽管从视觉观测中构建三维虚拟人技术已取得显著进展，但对于穿着宽松服饰的人体进行物理合理的动态建模，仍然是一个极具挑战性的问题。已有少数工作尝试借助物理仿真解决该问题，但在面对新颖动画输入时，其准确性和鲁棒性仍然有限。在本研究中，我们提出了 MPMAvatar——一个从多视角视频中创建三维人体虚拟人的框架，支持高度真实且鲁棒的动画，同时实现任意视角下的逼真渲染。为了实现准确且稳健的动态建模，我们的核心思想是使用基于材料点法（Material Point Method, MPM）的仿真器，并通过引入各向异性的本构模型和全新的碰撞处理算法，对其进行精细定制，使其能够建模服饰的复杂形变以及与身体的接触行为。我们将这一动态建模机制与可使用带有准阴影效果的三维高斯泼溅进行渲染的标准虚拟人结合，从而实现具备物理真实性的高保真动画渲染。在实验中，我们展示了 MPMAvatar 在以下几个方面显著优于当前最先进的基于物理的虚拟人方法：（1）动态建模的准确性，（2）渲染的精度，以及（3）鲁棒性和效率。此外，我们还展示了一个新颖的应用场景，即该虚拟人可以在零样本条件下泛化至未见过的交互情境，而这在以往的基于学习的方法中是无法实现的，因为其仿真泛化能力受限。\n"
  },
  {
    "path": "abs/2510.01767.md",
    "content": "### LOBE-GS: Load-Balanced and Efficient 3D Gaussian Splatting for Large-Scale Scene Reconstruction\n\n3D Gaussian Splatting (3DGS) has established itself as an efficient representation for real-time, high-fidelity 3D scene reconstruction. However, scaling 3DGS to large and unbounded scenes such as city blocks remains difficult. Existing divide-and-conquer methods alleviate memory pressure by partitioning the scene into blocks, but introduce new bottlenecks: (i) partitions suffer from severe load imbalance since uniform or heuristic splits do not reflect actual computational demands, and (ii) coarse-to-fine pipelines fail to exploit the coarse stage efficiently, often reloading the entire model and incurring high overhead. In this work, we introduce LoBE-GS, a novel Load-Balanced and Efficient 3D Gaussian Splatting framework, that re-engineers the large-scale 3DGS pipeline. LoBE-GS introduces a depth-aware partitioning method that reduces preprocessing from hours to minutes, an optimization-based strategy that balances visible Gaussians -- a strong proxy for computational load -- across blocks, and two lightweight techniques, visibility cropping and selective densification, to further reduce training cost. Evaluations on large-scale urban and outdoor datasets show that LoBE-GS consistently achieves up to 2× faster end-to-end training time than state-of-the-art baselines, while maintaining reconstruction quality and enabling scalability to scenes infeasible with vanilla 3DGS.\n\n三维高斯泼溅（3D Gaussian Splatting，简称 3DGS）已成为实时高保真三维场景重建中高效的表示方式。然而，将 3DGS 扩展到如城市街区这类大规模、无边界的场景仍然具有挑战性。现有的“分而治之”方法虽然通过将场景划分为若干块来缓解内存压力，但也引入了新的瓶颈：（i）由于均匀或启发式划分无法反映实际的计算负载，划分结果往往存在严重的负载不均衡问题；（ii）粗到细的重建流程未能高效利用粗阶段的结果，常常需要重新加载整个模型，导致极高的开销。为此，我们提出了 LoBE-GS——一种面向负载均衡与高效计算的 3D 高斯泼溅新框架，全面重构了大规模 3DGS 流水线。LoBE-GS 引入了一种深度感知的划分方法，将预处理时间从数小时缩短至数分钟；同时提出一种基于优化的策略，在各个块之间平衡可见高斯的分布——这是计算负载的有效代理。此外，我们还提出了两个轻量级技术：可见性裁剪（visibility cropping）与选择性加密（selective densification），进一步降低训练成本。我们在大规模城市与户外数据集上的评估结果表明，LoBE-GS 在保持重建质量的同时，训练端到端时间可达当前最优基线的 2 倍速度，并支持扩展到原始 3DGS 无法处理的大规模场景。\n"
  },
  {
    "path": "abs/2510.01978.md",
    "content": "### ROI-GS: Interest-based Local Quality 3D Gaussian Splatting\n\nWe tackle the challenge of efficiently reconstructing 3D scenes with high detail on objects of interest. Existing 3D Gaussian Splatting (3DGS) methods allocate resources uniformly across the scene, limiting fine detail to Regions Of Interest (ROIs) and leading to inflated model size. We propose ROI-GS, an object-aware framework that enhances local details through object-guided camera selection, targeted Object training, and seamless integration of high-fidelity object of interest reconstructions into the global scene. Our method prioritizes higher resolution details on chosen objects while maintaining real-time performance. Experiments show that ROI-GS significantly improves local quality (up to 2.96 dB PSNR), while reducing overall model size by ≈17% of baseline and achieving faster training for a scene with a single object of interest, outperforming existing methods.\n\n三维高斯泼溅（3D Gaussian Splatting，简称 3DGS）已成为实时高保真三维场景重建中高效的表示方式。然而，将 3DGS 扩展到如城市街区这类大规模、无边界的场景仍然具有挑战性。现有的“分而治之”方法虽然通过将场景划分为若干块来缓解内存压力，但也引入了新的瓶颈：（i）由于均匀或启发式划分无法反映实际的计算负载，划分结果往往存在严重的负载不均衡问题；（ii）粗到细的重建流程未能高效利用粗阶段的结果，常常需要重新加载整个模型，导致极高的开销。为此，我们提出了 LoBE-GS——一种面向负载均衡与高效计算的 3D 高斯泼溅新框架，全面重构了大规模 3DGS 流水线。LoBE-GS 引入了一种深度感知的划分方法，将预处理时间从数小时缩短至数分钟；同时提出一种基于优化的策略，在各个块之间平衡可见高斯的分布——这是计算负载的有效代理。此外，我们还提出了两个轻量级技术：可见性裁剪（visibility cropping）与选择性加密（selective densification），进一步降低训练成本。我们在大规模城市与户外数据集上的评估结果表明，LoBE-GS 在保持重建质量的同时，训练端到端时间可达当前最优基线的 2 倍速度，并支持扩展到原始 3DGS 无法处理的大规模场景。\n"
  },
  {
    "path": "abs/2510.02034.md",
    "content": "### GaussianMorphing: Mesh-Guided 3D Gaussians for Semantic-Aware Object Morphing\n\nWe introduce GaussianMorphing, a novel framework for semantic-aware 3D shape and texture morphing from multi-view images. Previous approaches usually rely on point clouds or require pre-defined homeomorphic mappings for untextured data. Our method overcomes these limitations by leveraging mesh-guided 3D Gaussian Splatting (3DGS) for high-fidelity geometry and appearance modeling. The core of our framework is a unified deformation strategy that anchors 3DGaussians to reconstructed mesh patches, ensuring geometrically consistent transformations while preserving texture fidelity through topology-aware constraints. In parallel, our framework establishes unsupervised semantic correspondence by using the mesh topology as a geometric prior and maintains structural integrity via physically plausible point trajectories. This integrated approach preserves both local detail and global semantic coherence throughout the morphing process with out requiring labeled data. On our proposed TexMorph benchmark, GaussianMorphing substantially outperforms prior 2D/3D methods, reducing color consistency error (∆E) by 22.2% and EI by 26.2%.\n\n我们提出了 GaussianMorphing，一种面向语义感知的三维形状与纹理变形新框架，支持从多视角图像中直接进行高保真变形。以往方法通常依赖点云，或要求无纹理数据具备预定义的同胚映射。我们的方案通过引入 mesh-guided 的三维高斯泼溅（3D Gaussian Splatting, 3DGS），实现对几何与外观的高保真建模，从而克服了上述限制。该框架的核心是一种统一的变形策略：将 3D 高斯锚定于重建的网格片段上，通过拓扑感知约束在保持纹理保真度的同时实现几何一致的变形。与此同时，框架还通过将网格拓扑作为几何先验，建立无监督的语义对应关系，并利用物理合理的点轨迹保持结构完整性。该集成方法在整个变形过程中无需标注数据，便可同时保持局部细节与全局语义的一致性。在我们提出的 TexMorph 基准上，GaussianMorphing 显著优于现有的二维/三维方法，颜色一致性误差（∆E）降低了 22.2%，EI 指标降低了 26.2%。\n"
  },
  {
    "path": "abs/2510.02314.md",
    "content": "### StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions\n\n3D scene representation methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have significantly advanced novel view synthesis. As these methods become prevalent, addressing their vulnerabilities becomes critical. We analyze 3DGS robustness against image-level poisoning attacks and propose a novel density-guided poisoning method. Our method strategically injects Gaussian points into low-density regions identified via Kernel Density Estimation (KDE), embedding viewpoint-dependent illusory objects clearly visible from poisoned views while minimally affecting innocent views. Additionally, we introduce an adaptive noise strategy to disrupt multi-view consistency, further enhancing attack effectiveness. We propose a KDE-based evaluation protocol to assess attack difficulty systematically, enabling objective benchmarking for future research. Extensive experiments demonstrate our method's superior performance compared to state-of-the-art techniques.\n\nNeural Radiance Fields（NeRF）与三维高斯泼溅（3D Gaussian Splatting, 3DGS）等三维场景表示方法，已显著推动新视角合成的发展。随着这些方法的广泛应用，研究其潜在脆弱性变得愈发重要。本文系统分析了 3DGS 在图像级投毒攻击下的鲁棒性，并提出了一种新颖的基于密度引导的投毒方法。该方法利用核密度估计（Kernel Density Estimation, KDE）识别的低密度区域，策略性地注入高斯点，在不显著影响无辜视角的前提下，在特定视角中嵌入具有视角依赖性的幻觉物体。此外，我们还引入了一种自适应噪声策略，用于破坏多视角一致性，从而进一步增强攻击效果。我们还提出了一套基于 KDE 的评估协议，用于系统地衡量攻击难度，为后续研究提供客观的基准。大量实验结果表明，本文方法在效果上明显优于现有最先进的技术。\n"
  },
  {
    "path": "abs/2510.02732.md",
    "content": "### From Tokens to Nodes: Semantic-Guided Motion Control for Dynamic 3D Gaussian Splatting\n\nDynamic 3D reconstruction from monocular videos remains difficult due to the ambiguity inferring 3D motion from limited views and computational demands of modeling temporally varying scenes. While recent sparse control methods alleviate computation by reducing millions of Gaussians to thousands of control points, they suffer from a critical limitation: they allocate points purely by geometry, leading to static redundancy and dynamic insufficiency. We propose a motion-adaptive framework that aligns control density with motion complexity. Leveraging semantic and motion priors from vision foundation models, we establish patch-token-node correspondences and apply motion-adaptive compression to concentrate control points in dynamic regions while suppressing redundancy in static backgrounds. Our approach achieves flexible representational density adaptation through iterative voxelization and motion tendency scoring, directly addressing the fundamental mismatch between control point allocation and motion complexity. To capture temporal evolution, we introduce spline-based trajectory parameterization initialized by 2D tracklets, replacing MLP-based deformation fields to achieve smoother motion representation and more stable optimization. Extensive experiments demonstrate significant improvements in reconstruction quality and efficiency over existing state-of-the-art methods.\n\n从单目视频中进行动态三维重建仍然是一项具有挑战性的任务，原因在于从有限视角中推理三维运动存在固有歧义，同时对时变场景建模的计算开销巨大。尽管近期的稀疏控制方法通过将百万级高斯点压缩为数千个控制点，从而缓解了计算负担，但它们存在一个关键问题：控制点的分配仅基于几何信息，导致静态区域出现冗余、动态区域控制不足。为此，我们提出了一种运动自适应框架，使控制点密度与运动复杂度相匹配。该方法借助视觉基础模型中的语义和运动先验，构建 patch-token-node 的对应关系，并引入运动自适应压缩策略，将控制点聚焦于动态区域，同时压缩静态背景中的冗余。通过迭代体素化与运动趋势评分，我们实现了灵活的表示密度自适应，有效解决了控制点分配与运动复杂度不匹配的问题。为了更好地捕捉时间演化过程，我们提出基于样条的轨迹参数化方法，以二维跟踪点为初始化，替代基于 MLP 的变形场，进而实现更平滑的运动表达和更稳定的优化过程。大量实验表明，本文方法在重建质量与效率方面均显著优于现有最先进方法。\n"
  },
  {
    "path": "abs/2510.02884.md",
    "content": "### GS-Share: Enabling High-fidelity Map Sharing with Incremental Gaussian Splatting\n\nConstructing and sharing 3D maps is essential for many applications, including autonomous driving and augmented reality. Recently, 3D Gaussian splatting has emerged as a promising approach for accurate 3D reconstruction. However, a practical map-sharing system that features high-fidelity, continuous updates, and network efficiency remains elusive. To address these challenges, we introduce GS-Share, a photorealistic map-sharing system with a compact representation. The core of GS-Share includes anchor-based global map construction, virtual-image-based map enhancement, and incremental map update. We evaluate GS-Share against state-of-the-art methods, demonstrating that our system achieves higher fidelity, particularly for extrapolated views, with improvements of 11%, 22%, and 74% in PSNR, LPIPS, and Depth L1, respectively. Furthermore, GS-Share is significantly more compact, reducing map transmission overhead by 36%.\n\n三维地图的构建与共享对于自动驾驶、增强现实等众多应用至关重要。近年来，三维高斯泼溅（3D Gaussian Splatting）作为一种精确的三维重建方法逐渐受到关注。然而，具备高保真度、持续更新能力与网络传输效率的实用地图共享系统仍未真正实现。为应对这些挑战，我们提出了 GS-Share——一个具有紧凑表示的真实感地图共享系统。GS-Share 的核心包括基于锚点的全局地图构建、基于虚拟图像的地图增强以及增量式地图更新机制。我们在与现有最先进方法的对比实验中表明，GS-Share 在重建保真度上（特别是在外推视角下）具有显著优势，PSNR、LPIPS 与 Depth L1 分别提升了 11%、22% 与 74%。此外，GS-Share 的表示也更加紧凑，地图传输开销减少了 36%。\n"
  },
  {
    "path": "abs/2510.03312.md",
    "content": "### Universal Beta Splatting\n\nWe introduce Universal Beta Splatting (UBS), a unified framework that generalizes 3D Gaussian Splatting to N-dimensional anisotropic Beta kernels for explicit radiance field rendering. Unlike fixed Gaussian primitives, Beta kernels enable controllable dependency modeling across spatial, angular, and temporal dimensions within a single representation. Our unified approach captures complex light transport effects, handles anisotropic view-dependent appearance, and models scene dynamics without requiring auxiliary networks or specific color encodings. UBS maintains backward compatibility by approximating to Gaussian Splatting as a special case, guaranteeing plug-in usability and lower performance bounds. The learned Beta parameters naturally decompose scene properties into interpretable without explicit supervision: spatial (surface vs. texture), angular (diffuse vs. specular), and temporal (static vs. dynamic). Our CUDA-accelerated implementation achieves real-time rendering while consistently outperforming existing methods across static, view-dependent, and dynamic benchmarks, establishing Beta kernels as a scalable universal primitive for radiance field rendering.\n\n我们提出了 Universal Beta Splatting（UBS），一个统一框架，将三维高斯泼溅（3D Gaussian Splatting）推广为适用于显式辐射场渲染的 N 维各向异性 Beta 核函数。与固定的高斯基元不同，Beta 核允许在一个统一表示中对空间、角度与时间维度的依赖关系进行可控建模。该统一方法能够捕捉复杂的光传输效应，处理各向异性的视角依赖外观，并建模场景动态变化，无需辅助网络或特定的颜色编码。UBS 保持对高斯泼溅的向后兼容性，可将其作为特殊情况近似，从而保证良好的可插拔性与性能下限。同时，学习得到的 Beta 参数可在无显式监督的情况下，自然地将场景属性分解为可解释的表示：空间维度（表面 vs. 纹理）、角度维度（漫反射 vs. 镜面反射）、时间维度（静态 vs. 动态）。我们的 CUDA 加速实现支持实时渲染，并在静态、视角依赖和动态基准测试中持续优于现有方法，确立了 Beta 核函数作为可扩展的通用显式辐射场基元的地位。\n"
  },
  {
    "path": "abs/2510.03545.md",
    "content": "### SketchPlan: Diffusion Based Drone Planning From Human Sketches\n\nWe propose SketchPlan, a diffusion-based planner that interprets 2D hand-drawn sketches over depth images to generate 3D flight paths for drone navigation. SketchPlan comprises two components: a SketchAdapter that learns to map the human sketches to projected 2D paths, and DiffPath, a diffusion model that infers 3D trajectories from 2D projections and a first person view depth image. Our model achieves zero-shot sim-to-real transfer, generating accurate and safe flight paths in previously unseen real-world environments. To train the model, we build a synthetic dataset of 32k flight paths using a diverse set of photorealistic 3D Gaussian Splatting scenes. We automatically label the data by computing 2D projections of the 3D flight paths onto the camera plane, and use this to train the DiffPath diffusion model. However, since real human 2D sketches differ significantly from ideal 2D projections, we additionally label 872 of the 3D flight paths with real human sketches and use this to train the SketchAdapter to infer the 2D projection from the human sketch. We demonstrate SketchPlan's effectiveness in both simulated and real-world experiments, and show through ablations that training on a mix of human labeled and auto-labeled data together with a modular design significantly boosts its capabilities to correctly interpret human intent and infer 3D paths. In real-world drone tests, SketchPlan achieved 100% success in low/medium clutter and 40% in unseen high-clutter environments, outperforming key ablations by 20-60% in task completion.\n\n我们提出了 SketchPlan，一种基于扩散模型的规划器，能够解释深度图上的二维手绘草图，并生成用于无人机导航的三维飞行路径。SketchPlan 由两个关键组件构成：SketchAdapter 和 DiffPath。SketchAdapter 学习将人类草图映射为投影的二维路径，DiffPath 是一个扩散模型，用于结合二维投影和第一人称视角的深度图，推理出三维飞行轨迹。该模型实现了零样本的仿真到真实迁移，在此前未见的真实环境中也能生成准确且安全的飞行路径。为了训练模型，我们构建了一个包含 32,000 条飞行路径的合成数据集，场景基于多样化的真实感三维高斯泼溅（3DGS）场景生成。我们自动将三维飞行路径投影到相机平面，得到二维路径作为标签，用于训练 DiffPath 扩散模型。但由于真实的人类手绘草图与理想二维投影之间存在显著差异，我们另外手工标注了 872 条三维飞行路径对应的人类草图，并利用这些数据训练 SketchAdapter，使其能够从草图中推断出对应的二维投影路径。我们在仿真和真实世界中均验证了 SketchPlan 的有效性，并通过消融实验发现，使用人类标注与自动标注数据的混合训练方式，以及模块化的设计，显著增强了系统对人类意图的正确理解与三维路径的推理能力。在真实无人机飞行测试中，SketchPlan 在低/中等复杂度场景中实现了 100% 成功率，在未见的高复杂度场景中达到 40%，在任务完成率方面相比关键消融模型提升了 20%-60%。\n"
  },
  {
    "path": "abs/2510.04759.md",
    "content": "### Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction\n\nThe 3D occupancy prediction task has witnessed remarkable progress in recent years, playing a crucial role in vision-based autonomous driving systems. While traditional methods are limited to fixed semantic categories, recent approaches have moved towards predicting text-aligned features to enable open-vocabulary text queries in real-world scenes. However, there exists a trade-off in text-aligned scene modeling: sparse Gaussian representation struggles to capture small objects in the scene, while dense representation incurs significant computational overhead. To address these limitations, we present PG-Occ, an innovative Progressive Gaussian Transformer Framework that enables open-vocabulary 3D occupancy prediction. Our framework employs progressive online densification, a feed-forward strategy that gradually enhances the 3D Gaussian representation to capture fine-grained scene details. By iteratively enhancing the representation, the framework achieves increasingly precise and detailed scene understanding. Another key contribution is the introduction of an anisotropy-aware sampling strategy with spatio-temporal fusion, which adaptively assigns receptive fields to Gaussians at different scales and stages, enabling more effective feature aggregation and richer scene information capture. Through extensive evaluations, we demonstrate that PG-Occ achieves state-of-the-art performance with a relative 14.3% mIoU improvement over the previous best performing method.\n\n近年来，三维占据预测任务取得了显著进展，在基于视觉的自动驾驶系统中发挥着关键作用。传统方法通常局限于固定的语义类别，而最新研究则逐渐转向预测与文本对齐的特征，以支持真实场景中的开放词汇查询。然而，文本对齐的场景建模存在权衡：稀疏的高斯表示难以捕捉场景中的小物体，而稠密表示则带来较高的计算开销。为了解决这一问题，我们提出了 PG-Occ——一种创新的渐进式高斯变换器框架，用于实现开放词汇的三维占据预测。该框架采用渐进式在线加密策略，这是一种前馈式方法，能够逐步增强三维高斯表示，从而捕捉细粒度的场景细节。通过迭代式表示增强，模型实现了越来越精细且精准的场景理解。另一个关键贡献是引入了具备各向异性感知能力的采样策略，并结合时空融合机制，能够在不同尺度与阶段自适应地分配高斯的感受野，从而实现更有效的特征聚合与更丰富的场景信息捕捉。大量评估结果表明，PG-Occ 在性能上达到当前最先进水平，相较于此前最优方法，mIoU 提升达 14.3%。\n"
  },
  {
    "path": "abs/2510.05488.md",
    "content": "### ArchitectHead: Continuous Level of Detail Control for 3D Gaussian Head Avatars\n\n3D Gaussian Splatting (3DGS) has enabled photorealistic and real-time rendering of 3D head avatars. Existing 3DGS-based avatars typically rely on tens of thousands of 3D Gaussian points (Gaussians), with the number of Gaussians fixed after training. However, many practical applications require adjustable levels of detail (LOD) to balance rendering efficiency and visual quality. In this work, we propose \"ArchitectHead\", the first framework for creating 3D Gaussian head avatars that support continuous control over LOD. Our key idea is to parameterize the Gaussians in a 2D UV feature space and propose a UV feature field composed of multi-level learnable feature maps to encode their latent features. A lightweight neural network-based decoder then transforms these latent features into 3D Gaussian attributes for rendering. ArchitectHead controls the number of Gaussians by dynamically resampling feature maps from the UV feature field at the desired resolutions. This method enables efficient and continuous control of LOD without retraining. Experimental results show that ArchitectHead achieves state-of-the-art (SOTA) quality in self and cross-identity reenactment tasks at the highest LOD, while maintaining near SOTA performance at lower LODs. At the lowest LOD, our method uses only 6.2% of the Gaussians while the quality degrades moderately (L1 Loss +7.9%, PSNR --0.97%, SSIM --0.6%, LPIPS Loss +24.1%), and the rendering speed nearly doubles.\n\n三维高斯泼溅（3D Gaussian Splatting，3DGS）已实现对三维头部头像的真实感和实时渲染。现有基于 3DGS 的头像系统通常依赖于数万个三维高斯点（Gaussians），且在训练完成后高斯点数量是固定的。然而，许多实际应用需要具备可调节的细节层级（LOD），以在渲染效率与视觉质量之间进行平衡。为此，我们提出了 ArchitectHead，这是首个支持连续细节层级控制的三维高斯头部头像生成框架。我们的核心思想是将高斯参数化于二维 UV 特征空间，并设计一个由多层可学习特征图组成的 UV 特征场，用于编码其潜在表示。随后，一个轻量级神经网络解码器将这些潜在特征转换为用于渲染的三维高斯属性。ArchitectHead 通过在 UV 特征场中以所需分辨率动态重采样特征图，来调节高斯点的数量，从而实现无需重新训练的高效、连续 LOD 控制。实验结果表明，ArchitectHead 在自我和跨身份重演任务中，在最高 LOD 下达到了当前最优（SOTA）水平，同时在较低 LOD 下也保持了接近 SOTA 的表现。在最低 LOD 设置下，本方法仅使用了 6.2% 的高斯点，渲染质量仅有适度下降（L1 损失增加 7.9%、PSNR 降低 0.97%、SSIM 降低 0.6%、LPIPS 增加 24.1%），而渲染速度几乎提升了一倍。\n"
  },
  {
    "path": "abs/2510.06644.md",
    "content": "### RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction\n\n3D Gaussian Splatting (3DGS) based Simultaneous Localization and Mapping (SLAM) systems can largely benefit from 3DGS's state-of-the-art rendering efficiency and accuracy, but have not yet been adopted in resource-constrained edge devices due to insufficient speed. Addressing this, we identify notable redundancies across the SLAM pipeline for acceleration. While conceptually straightforward, practical approaches are required to minimize the overhead associated with identifying and eliminating these redundancies. In response, we propose RTGS, an algorithm-hardware co-design framework that comprehensively reduces the redundancies for real-time 3DGS-SLAM on edge. To minimize the overhead, RTGS fully leverages the characteristics of the 3DGS-SLAM pipeline. On the algorithm side, we introduce (1) an adaptive Gaussian pruning step to remove the redundant Gaussians by reusing gradients computed during backpropagation; and (2) a dynamic downsampling technique that directly reuses the keyframe identification and alpha computing steps to eliminate redundant pixels. On the hardware side, we propose (1) a subtile-level streaming strategy and a pixel-level pairwise scheduling strategy that mitigates workload imbalance via a Workload Scheduling Unit (WSU) guided by previous iteration information; (2) a Rendering and Backpropagation (R&B) Buffer that accelerates the rendering backpropagation by reusing intermediate data computed during rendering; and (3) a Gradient Merging Unit (GMU) to reduce intensive memory accesses caused by atomic operations while enabling pipelined aggregation. Integrated into an edge GPU, RTGS achieves real-time performance (>= 30 FPS) on four datasets and three algorithms, with up to 82.5x energy efficiency over the baseline and negligible quality loss.\n\n基于三维高斯泼溅（3D Gaussian Splatting，3DGS）的同时定位与建图（SLAM）系统，因其在渲染效率与精度方面具备当前最优表现，具有广泛的应用潜力。然而，由于速度不足，该类系统尚未在资源受限的边缘设备上得到广泛部署。为此，我们识别出 SLAM 流水线中的关键冗余，并致力于消除这些冗余以实现加速。尽管在概念上较为直接，但在实践中需要有效的手段来最小化冗余识别与剔除过程带来的开销。对此，我们提出了 RTGS——一种算法与硬件协同设计的框架，旨在实现边缘设备上的实时 3DGS-SLAM。为降低开销，RTGS 深度利用 3DGS-SLAM 流水线的结构特性。在算法层面，我们引入了：（1）自适应高斯剪枝模块，重用反向传播过程中的梯度信息以剔除冗余高斯点；（2）动态下采样机制，直接复用关键帧识别与 alpha 计算过程，消除冗余像素。在硬件层面，我们提出了：（1）基于子图块的流式处理策略与基于像素对的调度策略，借助工作负载调度单元（WSU）在前一轮迭代信息的指导下缓解负载不均问题；（2）渲染与反向传播缓存（R&B Buffer），通过复用渲染阶段生成的中间结果来加速反向传播；（3）梯度合并单元（GMU），降低原子操作带来的高频内存访问，同时支持流水线式聚合。在部署于边缘 GPU 后，RTGS 在四个数据集与三个算法上实现了实时性能（≥30 FPS），在保证几乎无质量损失的前提下，能效相比基线最高提升 82.5 倍。\n"
  },
  {
    "path": "abs/2510.06694.md",
    "content": "### SCas4D: Structural Cascaded Optimization for Boosting Persistent 4D Novel View Synthesis\n\nPersistent dynamic scene modeling for tracking and novel-view synthesis remains challenging due to the difficulty of capturing accurate deformations while maintaining computational efficiency. We propose SCas4D, a cascaded optimization framework that leverages structural patterns in 3D Gaussian Splatting for dynamic scenes. The key idea is that real-world deformations often exhibit hierarchical patterns, where groups of Gaussians share similar transformations. By progressively refining deformations from coarse part-level to fine point-level, SCas4D achieves convergence within 100 iterations per time frame and produces results comparable to existing methods with only one-twentieth of the training iterations. The approach also demonstrates effectiveness in self-supervised articulated object segmentation, novel view synthesis, and dense point tracking tasks.\n\n针对动态场景中的持续建模任务，如目标跟踪与新视角合成，如何在保持计算效率的同时精确捕捉形变仍是一大挑战。为此，我们提出了 SCas4D——一个基于级联优化的框架，利用三维高斯泼溅（3DGS）中动态场景的结构性规律进行建模。其核心思想是：现实世界中的形变往往具有层次结构，不同高斯点群之间可能共享相似的变换关系。SCas4D 通过从粗粒度的部件级变形到细粒度的点级变形逐步优化，实现了每帧时间内 100 次迭代内的快速收敛，仅用传统方法二十分之一的训练迭代量即可达到可比的效果。此外，该方法还在自监督的关节物体分割、新视角合成和稠密点跟踪等任务中展现了出色的性能。\n"
  },
  {
    "path": "abs/2510.06802.md",
    "content": "### Capture and Interact: Rapid 3D Object Acquisition and Rendering with Gaussian Splatting in Unity\n\nCapturing and rendering three-dimensional (3D) objects in real time remain a significant challenge, yet hold substantial potential for applications in augmented reality, digital twin systems, remote collaboration and prototyping. We present an end-to-end pipeline that leverages 3D Gaussian Splatting (3D GS) to enable rapid acquisition and interactive rendering of real-world objects using a mobile device, cloud processing and a local computer. Users scan an object with a smartphone video, upload it for automated 3D reconstruction, and visualize it interactively in Unity at an average of 150 frames per second (fps) on a laptop. The system integrates mobile capture, cloud-based 3D GS and Unity rendering to support real-time telepresence. Our experiments show that the pipeline processes scans in approximately 10 minutes on a graphics processing unit (GPU) achieving real-time rendering on the laptop.\n\n实时捕捉与渲染三维（3D）物体仍面临巨大挑战，但在增强现实、数字孪生系统、远程协作与快速原型等应用中具有广泛前景。我们提出了一条端到端的处理流程，基于三维高斯泼溅（3D Gaussian Splatting, 3D GS），实现了利用移动设备、云端处理和本地计算机对真实物体的快速获取与交互式渲染。用户可通过智能手机视频扫描目标物体，上传后由系统自动完成三维重建，并在 Unity 中以平均每秒 150 帧（fps）的速度在笔记本电脑上实现交互式可视化。该系统整合了移动端采集、云端 3D GS 重建与 Unity 渲染，支持实时远程呈现。实验结果表明，该流程可在图形处理单元（GPU）上大约 10 分钟内完成扫描数据处理，并在笔记本电脑上实现实时渲染。\n"
  },
  {
    "path": "abs/2510.06967.md",
    "content": "### Generating Surface for Text-to-3D using 2D Gaussian Splatting\n\nRecent advancements in Text-to-3D modeling have shown significant potential for the creation of 3D content. However, due to the complex geometric shapes of objects in the natural world, generating 3D content remains a challenging task. Current methods either leverage 2D diffusion priors to recover 3D geometry, or train the model directly based on specific 3D representations. In this paper, we propose a novel method named DirectGaussian, which focuses on generating the surfaces of 3D objects represented by surfels. In DirectGaussian, we utilize conditional text generation models and the surface of a 3D object is rendered by 2D Gaussian splatting with multi-view normal and texture priors. For multi-view geometric consistency problems, DirectGaussian incorporates curvature constraints on the generated surface during optimization process. Through extensive experiments, we demonstrate that our framework is capable of achieving diverse and high-fidelity 3D content creation.\n\n近年来，Text-to-3D（文本生成三维）建模取得了显著进展，在三维内容创作方面展现出巨大潜力。然而，由于自然界中物体几何形状的复杂性，生成高质量三维内容依然是一项挑战。目前的方法主要采用两种策略：一是借助二维扩散先验来恢复三维几何，二是直接基于特定的三维表示形式对模型进行训练。本文提出了一种新方法，名为 DirectGaussian，专注于以面元（surfels）为表示形式生成三维物体表面。在 DirectGaussian 中，我们利用条件文本生成模型，并通过结合多视角法线与纹理先验，以二维高斯泼溅方式对三维物体表面进行渲染。针对多视角几何一致性问题，DirectGaussian 在优化过程中引入了曲率约束，以提升生成表面的结构连贯性。通过大量实验，我们验证了该框架在多样化与高保真三维内容生成方面的强大能力。\n"
  },
  {
    "path": "abs/2510.07729.md",
    "content": "### ComGS: Efficient 3D Object-Scene Composition via Surface Octahedral Probes\n\nGaussian Splatting (GS) enables immersive rendering, but realistic 3D object-scene composition remains challenging. Baked appearance and shadow information in GS radiance fields cause inconsistencies when combining objects and scenes. Addressing this requires relightable object reconstruction and scene lighting estimation. For relightable object reconstruction, existing Gaussian-based inverse rendering methods often rely on ray tracing, leading to low efficiency. We introduce Surface Octahedral Probes (SOPs), which store lighting and occlusion information and allow efficient 3D querying via interpolation, avoiding expensive ray tracing. SOPs provide at least a 2x speedup in reconstruction and enable real-time shadow computation in Gaussian scenes. For lighting estimation, existing Gaussian-based inverse rendering methods struggle to model intricate light transport and often fail in complex scenes, while learning-based methods predict lighting from a single image and are viewpoint-sensitive. We observe that 3D object-scene composition primarily concerns the object's appearance and nearby shadows. Thus, we simplify the challenging task of full scene lighting estimation by focusing on the environment lighting at the object's placement. Specifically, we capture a 360 degrees reconstructed radiance field of the scene at the location and fine-tune a diffusion model to complete the lighting. Building on these advances, we propose ComGS, a novel 3D object-scene composition framework. Our method achieves high-quality, real-time rendering at around 28 FPS, produces visually harmonious results with vivid shadows, and requires only 36 seconds for editing.\n\n高斯泼溅（Gaussian Splatting, GS）支持沉浸式渲染，但逼真的三维物体-场景合成仍然面临诸多挑战。GS 辐射场中烘焙的外观与阴影信息在物体与场景合成时会导致视觉不一致。要解决该问题，需同时实现可重光照的物体重建与场景光照估计。对于可重光照的物体重建，现有基于高斯的逆向渲染方法多依赖光线追踪，效率较低。为此，我们提出了表面八面体探针（Surface Octahedral Probes, SOPs），用于存储光照与遮挡信息，并通过插值实现高效的三维查询，从而避免高成本的光线追踪。SOPs 可将重建效率提升至少两倍，并支持高斯场景中的实时阴影计算。\n对于光照估计，现有基于高斯的逆渲染方法难以建模复杂光传输，容易在复杂场景中失败，而基于学习的方法常常从单张图像预测光照，受视角影响较大。我们观察到，三维物体-场景合成的核心关注在于物体外观及其附近的阴影表现。因此，我们将全局场景光照估计这一难题简化为物体摆放位置处的环境光估计。具体而言，我们在目标位置采集 360 度的重建辐射场，并微调扩散模型以完成光照预测。\n基于以上技术，我们提出了 ComGS —— 一个新颖的三维物体-场景合成框架。该方法在约 28 帧每秒（FPS）的速率下实现高质量、实时渲染，能够生成光影生动、视觉协调的合成结果，且编辑时间仅需 36 秒。\n"
  },
  {
    "path": "abs/2510.07752.md",
    "content": "### DEGS: Deformable Event-based 3D Gaussian Splatting from RGB and Event Stream\n\nReconstructing Dynamic 3D Gaussian Splatting (3DGS) from low-framerate RGB videos is challenging. This is because large inter-frame motions will increase the uncertainty of the solution space. For example, one pixel in the first frame might have more choices to reach the corresponding pixel in the second frame. Event cameras can asynchronously capture rapid visual changes and are robust to motion blur, but they do not provide color information. Intuitively, the event stream can provide deterministic constraints for the inter-frame large motion by the event trajectories. Hence, combining low-temporal-resolution images with high-framerate event streams can address this challenge. However, it is challenging to jointly optimize Dynamic 3DGS using both RGB and event modalities due to the significant discrepancy between these two data modalities. This paper introduces a novel framework that jointly optimizes dynamic 3DGS from the two modalities. The key idea is to adopt event motion priors to guide the optimization of the deformation fields. First, we extract the motion priors encoded in event streams by using the proposed LoCM unsupervised fine-tuning framework to adapt an event flow estimator to a certain unseen scene. Then, we present the geometry-aware data association method to build the event-Gaussian motion correspondence, which is the primary foundation of the pipeline, accompanied by two useful strategies, namely motion decomposition and inter-frame pseudo-label. Extensive experiments show that our method outperforms existing image and event-based approaches across synthetic and real scenes and prove that our method can effectively optimize dynamic 3DGS with the help of event data.\n\n从低帧率 RGB 视频中重建动态三维高斯泼溅（3DGS）是一项具有挑战性的任务。这是因为帧间存在较大的运动，会增加解空间的不确定性。例如，第一帧中的一个像素可能对应到第二帧中多个不同的位置。事件相机可以异步捕捉快速的视觉变化，并且对运动模糊具有鲁棒性，但它们不提供颜色信息。直观来看，事件流所记录的轨迹可以为帧间大位移提供确定性约束。因此，将低时域分辨率的图像与高帧率事件流结合使用，有望解决该问题。然而，由于 RGB 图像与事件数据之间存在显著模态差异，如何联合优化动态 3DGS 成为一大难点。本文提出了一种新颖的框架，用于融合这两种模态对动态 3DGS 进行联合优化。其核心思想是利用事件中的运动先验来引导变形场的优化。具体而言，我们首先提出了名为 LoCM 的无监督微调框架，用于将事件流估计器自适应到特定的未知场景中，从而提取事件流中蕴含的运动先验。接着，我们设计了几何感知的数据关联方法，建立事件与高斯之间的运动对应关系，这是整个流程的基础，并辅以运动分解与帧间伪标签两项策略以提升性能。大量实验结果表明，本文方法在合成与真实场景中均优于现有基于图像或事件的方案，验证了事件数据对于动态 3DGS 优化的有效性。\n"
  },
  {
    "path": "abs/2510.07830.md",
    "content": "### PrismGS: Physically-Grounded Anti-Aliasing for High-Fidelity Large-Scale 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has recently enabled real-time photorealistic rendering in compact scenes, but scaling to large urban environments introduces severe aliasing artifacts and optimization instability, especially under high-resolution (e.g., 4K) rendering. These artifacts, manifesting as flickering textures and jagged edges, arise from the mismatch between Gaussian primitives and the multi-scale nature of urban geometry. While existing \"divide-and-conquer\" pipelines address scalability, they fail to resolve this fidelity gap. In this paper, we propose PrismGS, a physically-grounded regularization framework that improves the intrinsic rendering behavior of 3D Gaussians. PrismGS integrates two synergistic regularizers. The first is pyramidal multi-scale supervision, which enforces consistency by supervising the rendering against a pre-filtered image pyramid. This compels the model to learn an inherently anti-aliased representation that remains coherent across different viewing scales, directly mitigating flickering textures. This is complemented by an explicit size regularization that imposes a physically-grounded lower bound on the dimensions of the 3D Gaussians. This prevents the formation of degenerate, view-dependent primitives, leading to more stable and plausible geometric surfaces and reducing jagged edges. Our method is plug-and-play and compatible with existing pipelines. Extensive experiments on MatrixCity, Mill-19, and UrbanScene3D demonstrate that PrismGS achieves state-of-the-art performance, yielding significant PSNR gains around 1.5 dB against CityGaussian, while maintaining its superior quality and robustness under demanding 4K rendering.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）近期已在小型场景中实现了实时的真实感渲染，但将其扩展至大规模城市环境时，会引发严重的走样伪影与优化不稳定问题，尤其是在高分辨率（如 4K）渲染条件下。这些伪影表现为纹理闪烁和边缘锯齿，源于高斯基元与城市场景中多尺度几何之间的不匹配。尽管现有的“分而治之”式处理流程在一定程度上解决了可扩展性问题，但仍难以弥合渲染质量的差距。为此，本文提出了 PrismGS——一个具有物理基础的正则化框架，用于提升 3D 高斯的本质渲染行为。PrismGS 结合了两个协同正则项：其一为金字塔多尺度监督，通过将渲染结果与预滤波图像金字塔进行监督对齐，强制模型学习本质上具备抗锯齿能力的表示，从而跨视角尺度保持一致性并直接抑制纹理闪烁；其二为显式的尺寸正则项，对 3D 高斯的尺寸设定物理合理的下界，防止退化成视角依赖的异常基元，从而生成更加稳定且可信的几何表面，并有效减少边缘锯齿。我们的方法为即插即用型设计，可无缝集成至现有渲染流程中。我们在 MatrixCity、Mill-19 和 UrbanScene3D 等多个数据集上进行了大量实验证明，PrismGS 相较 CityGaussian 获得了约 1.5 dB 的 PSNR 提升，并在要求极高的 4K 渲染任务中持续展现出优异的质量与鲁棒性。\n"
  },
  {
    "path": "abs/2510.07944.md",
    "content": "### CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving\n\nGenerative models have been widely applied to world modeling for environment simulation and future state prediction. With advancements in autonomous driving, there is a growing demand not only for high-fidelity video generation under various controls, but also for producing diverse and meaningful information such as depth estimation. To address this, we propose CVD-STORM, a cross-view video diffusion model utilizing a spatial-temporal reconstruction Variational Autoencoder (VAE) that generates long-term, multi-view videos with 4D reconstruction capabilities under various control inputs. Our approach first fine-tunes the VAE with an auxiliary 4D reconstruction task, enhancing its ability to encode 3D structures and temporal dynamics. Subsequently, we integrate this VAE into the video diffusion process to significantly improve generation quality. Experimental results demonstrate that our model achieves substantial improvements in both FID and FVD metrics. Additionally, the jointly-trained Gaussian Splatting Decoder effectively reconstructs dynamic scenes, providing valuable geometric information for comprehensive scene understanding.\n\n生成模型已被广泛应用于世界建模任务，如环境模拟与未来状态预测。随着自动驾驶技术的发展，系统不仅需要在多种控制条件下生成高保真视频，还需产出如深度估计等多样且有意义的信息。为此，我们提出了 CVD-STORM，一种基于跨视角视频扩散的模型，结合时空重建变分自编码器（VAE），能够在多种控制输入下生成具有四维重建能力的长时多视角视频。我们的方法首先通过辅助的四维重建任务对 VAE 进行微调，从而增强其对三维结构与时间动态的编码能力；随后，将该 VAE 融入视频扩散生成流程中，显著提升视频生成质量。实验结果表明，CVD-STORM 在 FID 与 FVD 指标上均取得了显著提升。此外，联合训练的高斯泼溅解码器可有效重建动态场景，提供有价值的几何信息，助力全面的场景理解。\n"
  },
  {
    "path": "abs/2510.08096.md",
    "content": "### Efficient Label Refinement for Face Parsing Under Extreme Poses Using 3D Gaussian Splatting\n\nAccurate face parsing under extreme viewing angles remains a significant challenge due to limited labeled data in such poses. Manual annotation is costly and often impractical at scale. We propose a novel label refinement pipeline that leverages 3D Gaussian Splatting (3DGS) to generate accurate segmentation masks from noisy multiview predictions. By jointly fitting two 3DGS models, one to RGB images and one to their initial segmentation maps, our method enforces multiview consistency through shared geometry, enabling the synthesis of pose-diverse training data with only minimal post-processing. Fine-tuning a face parsing model on this refined dataset significantly improves accuracy on challenging head poses, while maintaining strong performance on standard views. Extensive experiments, including human evaluations, demonstrate that our approach achieves superior results compared to state-of-the-art methods, despite requiring no ground-truth 3D annotations and using only a small set of initial images. Our method offers a scalable and effective solution for improving face parsing robustness in real-world settings.\n\n在极端视角下实现精确的人脸解析仍是一项重大挑战，原因在于此类姿态下的标注数据极为稀缺。人工标注不仅成本高昂，而且在大规模场景中往往难以实现。为此，我们提出了一种基于三维高斯泼溅（3DGS）的标签精炼新方案，可通过多视角噪声预测生成准确的分割掩码。该方法联合拟合两个 3DGS 模型，分别对应原始 RGB 图像与其初始分割图，通过共享几何实现多视角一致性，从而仅需极少的后处理便可合成姿态多样的训练数据。利用该精炼数据集微调人脸解析模型，可显著提升其在困难头部姿态下的解析精度，同时在标准视角下仍保持优异性能。大量实验（包括人工评估）表明，尽管无需任何真实三维标注，且仅依赖少量初始图像，我们的方法在性能上仍全面优于现有最先进方法，提供了一种可扩展、有效的现实世界人脸解析鲁棒性提升方案。\n"
  },
  {
    "path": "abs/2510.08491.md",
    "content": "### Splat the Net: Radiance Fields with Splattable Neural Primitives\n\nRadiance fields have emerged as a predominant representation for modeling 3D scene appearance. Neural formulations such as Neural Radiance Fields provide high expressivity but require costly ray marching for rendering, whereas primitive-based methods such as 3D Gaussian Splatting offer real-time efficiency through splatting, yet at the expense of representational power. Inspired by advances in both these directions, we introduce splattable neural primitives, a new volumetric representation that reconciles the expressivity of neural models with the efficiency of primitive-based splatting. Each primitive encodes a bounded neural density field parameterized by a shallow neural network. Our formulation admits an exact analytical solution for line integrals, enabling efficient computation of perspectively accurate splatting kernels. As a result, our representation supports integration along view rays without the need for costly ray marching. The primitives flexibly adapt to scene geometry and, being larger than prior analytic primitives, reduce the number required per scene. On novel-view synthesis benchmarks, our approach matches the quality and speed of 3D Gaussian Splatting while using 10× fewer primitives and 6× fewer parameters. These advantages arise directly from the representation itself, without reliance on complex control or adaptation frameworks.\n\n辐射场已成为建模三维场景外观的主流表示方式。神经辐射场（Neural Radiance Fields）等神经方法具有较强的表达能力，但渲染时需进行代价高昂的光线行进计算；而基于基元的方法如三维高斯泼溅（3D Gaussian Splatting）则通过泼溅操作实现实时效率，代价是表达能力的降低。受这两类方法进展的启发，我们提出了可泼溅神经基元（Splattable Neural Primitives），这是一种结合神经模型表达性与基元泼溅效率的新型体积表示方式。每个基元编码一个由浅层神经网络参数化的有界神经密度场。我们的方法形式允许对路径积分求得精确解析解，从而高效计算透视精确的泼溅核函数。因此，该表示可在无需光线行进的情况下完成视线方向的积分计算。所提出的神经基元可灵活适应场景几何，且由于尺寸大于以往的解析基元，所需基元数量显著减少。在新视角合成基准测试中，我们的方法在保持与 3D Gaussian Splatting 相当的图像质量和渲染速度的同时，使用的基元数量减少了 10 倍，参数量减少了 6 倍。这些优势完全源自于表示本身，而非依赖复杂的控制或自适应机制。\n"
  },
  {
    "path": "abs/2510.08551.md",
    "content": "### ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation\n\nOn-the-fly 3D reconstruction from monocular image sequences is a long-standing challenge in computer vision, critical for applications such as real-to-sim, AR/VR, and robotics. Existing methods face a major tradeoff: per-scene optimization yields high fidelity but is computationally expensive, whereas feed-forward foundation models enable real-time inference but struggle with accuracy and robustness. In this work, we propose ARTDECO, a unified framework that combines the efficiency of feed-forward models with the reliability of SLAM-based pipelines. ARTDECO uses 3D foundation models for pose estimation and point prediction, coupled with a Gaussian decoder that transforms multi-scale features into structured 3D Gaussians. To sustain both fidelity and efficiency at scale, we design a hierarchical Gaussian representation with a LoD-aware rendering strategy, which improves rendering fidelity while reducing redundancy. Experiments on eight diverse indoor and outdoor benchmarks show that ARTDECO delivers interactive performance comparable to SLAM, robustness similar to feed-forward systems, and reconstruction quality close to per-scene optimization, providing a practical path toward on-the-fly digitization of real-world environments with both accurate geometry and high visual fidelity.\n\n从单目图像序列中进行实时三维重建一直是计算机视觉领域的长期挑战，对于 real-to-sim、增强/虚拟现实（AR/VR）以及机器人等应用至关重要。现有方法面临一个关键权衡：逐场景优化可实现高保真重建，但计算代价高昂；而前馈式基础模型可支持实时推理，但在精度与鲁棒性方面表现不足。为解决这一问题，我们提出了 ARTDECO——一个融合前馈模型高效性与 SLAM 系统稳定性的统一框架。ARTDECO 利用三维基础模型进行位姿估计与点预测，并结合高斯解码器将多尺度特征转换为结构化的三维高斯表示。为在大规模场景中同时维持重建保真度与效率，我们设计了分层高斯表示与面向细节层级（LoD-aware）的渲染策略，有效提升了渲染质量并减少冗余。我们在八个不同类型的室内外基准数据集上进行了实验，结果表明 ARTDECO 实现了与 SLAM 相当的交互式性能，具备前馈系统级别的鲁棒性，同时在重建质量上接近逐场景优化水平，提供了一种可行的路径，实现现实环境的高精度、高保真实时数字化。\n"
  },
  {
    "path": "abs/2510.08566.md",
    "content": "### D2GS: Depth-and-Density Guided Gaussian Splatting for Stable and Accurate Sparse-View Reconstruction\n\nRecent advances in 3D Gaussian Splatting (3DGS) enable real-time, high-fidelity novel view synthesis (NVS) with explicit 3D representations. However, performance degradation and instability remain significant under sparse-view conditions. In this work, we identify two key failure modes under sparse-view conditions: overfitting in regions with excessive Gaussian density near the camera, and underfitting in distant areas with insufficient Gaussian coverage. To address these challenges, we propose a unified framework D2GS, comprising two key components: a Depth-and-Density Guided Dropout strategy that suppresses overfitting by adaptively masking redundant Gaussians based on density and depth, and a Distance-Aware Fidelity Enhancement module that improves reconstruction quality in under-fitted far-field areas through targeted supervision. Moreover, we introduce a new evaluation metric to quantify the stability of learned Gaussian distributions, providing insights into the robustness of the sparse-view 3DGS. Extensive experiments on multiple datasets demonstrate that our method significantly improves both visual quality and robustness under sparse view conditions.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）的最新进展使得基于显式三维表示的实时高保真新视角合成（NVS）成为可能。然而，在稀疏视角条件下，3DGS 仍面临显著的性能下降与稳定性问题。本文识别出两类典型失败模式：其一是在靠近相机的区域，高斯密度过高导致的过拟合现象；其二是在远距离区域，高斯覆盖不足导致的欠拟合问题。为应对这些挑战，我们提出了一个统一框架 D2GS，其中包含两个关键模块：（1）深度-密度引导的 Dropout 策略（Depth-and-Density Guided Dropout），该方法基于深度与密度信息自适应地屏蔽冗余高斯点，以抑制过拟合；（2）距离感知的保真度增强模块（Distance-Aware Fidelity Enhancement），通过有针对性的监督机制提升远场欠拟合区域的重建质量。此外，我们还引入了一种新的评估指标，用于量化所学习高斯分布的稳定性，从而更好地洞察稀疏视角下 3DGS 的鲁棒性。大量在多个数据集上的实验证明，D2GS 显著提升了稀疏视角条件下的视觉质量与系统稳定性。\n"
  },
  {
    "path": "abs/2510.08575.md",
    "content": "### ReSplat: Learning Recurrent Gaussian Splats\n\nWhile feed-forward Gaussian splatting models provide computational efficiency and effectively handle sparse input settings, their performance is fundamentally limited by the reliance on a single forward pass during inference. We propose ReSplat, a feed-forward recurrent Gaussian splatting model that iteratively refines 3D Gaussians without explicitly computing gradients. Our key insight is that the Gaussian splatting rendering error serves as a rich feedback signal, guiding the recurrent network to learn effective Gaussian updates. This feedback signal naturally adapts to unseen data distributions at test time, enabling robust generalization. To initialize the recurrent process, we introduce a compact reconstruction model that operates in a 16× subsampled space, producing 16× fewer Gaussians than previous per-pixel Gaussian models. This substantially reduces computational overhead and allows for efficient Gaussian updates. Extensive experiments across varying of input views (2, 8, 16), resolutions (256×256 to 540×960), and datasets (DL3DV and RealEstate10K) demonstrate that our method achieves state-of-the-art performance while significantly reducing the number of Gaussians and improving the rendering speed.\n\n虽然前馈式高斯泼溅模型在计算效率方面表现出色，并能有效处理稀疏输入设置，但其性能受限于推理时仅执行一次前向传播的结构性限制。为突破这一瓶颈，我们提出了 ReSplat——一种前馈式循环高斯泼溅模型，可在不显式计算梯度的情况下迭代优化三维高斯表示。我们的关键洞察是：高斯泼溅渲染误差本身可作为富含信息的反馈信号，引导循环网络学习有效的高斯更新策略。该反馈机制可自然适应测试时未见过的数据分布，从而实现强泛化能力。为初始化循环过程，我们设计了一种紧凑的重建模型，其在 16 倍下采样的空间中运行，生成的高斯数量比以往每像素高斯模型少 16 倍，显著降低了计算开销并支持高效的高斯更新。我们在不同输入视角数量（2、8、16）、图像分辨率（256×256 至 540×960）和数据集（DL3DV 和 RealEstate10K）上的大量实验证明：ReSplat 在大幅减少高斯数量与加快渲染速度的同时，仍实现了当前最优的性能。\n"
  },
  {
    "path": "abs/2510.08587.md",
    "content": "### EGSTalker: Real-Time Audio-Driven Talking Head Generation with Efficient Gaussian Deformation\n\nThis paper presents EGSTalker, a real-time audio-driven talking head generation framework based on 3D Gaussian Splatting (3DGS). Designed to enhance both speed and visual fidelity, EGSTalker requires only 3-5 minutes of training video to synthesize high-quality facial animations. The framework comprises two key stages: static Gaussian initialization and audio-driven deformation. In the first stage, a multi-resolution hash triplane and a Kolmogorov-Arnold Network (KAN) are used to extract spatial features and construct a compact 3D Gaussian representation. In the second stage, we propose an Efficient Spatial-Audio Attention (ESAA) module to fuse audio and spatial cues, while KAN predicts the corresponding Gaussian deformations. Extensive experiments demonstrate that EGSTalker achieves rendering quality and lip-sync accuracy comparable to state-of-the-art methods, while significantly outperforming them in inference speed. These results highlight EGSTalker's potential for real-time multimedia applications.\n\n本文提出了EGSTalker，一种基于三维高斯泼溅（3D Gaussian Splatting, 3DGS）的实时音频驱动拟人头像生成框架。EGSTalker旨在提升生成速度与视觉保真度，仅需3至5分钟的训练视频即可合成高质量面部动画。该框架包含两个关键阶段：静态高斯初始化和音频驱动变形。在第一阶段，采用多分辨率哈希三平面（multi-resolution hash triplane）和Kolmogorov-Arnold网络（Kolmogorov-Arnold Network, KAN）提取空间特征，并构建紧凑的三维高斯表示。在第二阶段，我们提出高效空间-音频注意力（Efficient Spatial-Audio Attention, ESAA）模块，用于融合音频与空间线索，同时由KAN预测相应的高斯变形。大量实验表明，EGSTalker在渲染质量和唇形同步准确性方面可媲美现有最先进方法，同时在推理速度上显著超越它们。上述结果凸显了EGSTalker在实时多媒体应用中的潜力。\n"
  },
  {
    "path": "abs/2510.09364.md",
    "content": "### Visibility-Aware Densification for 3D Gaussian Splatting in Dynamic Urban Scenes\n\n3D Gaussian splatting (3DGS) has demonstrated impressive performance in synthesizing high-fidelity novel views. Nonetheless, its effectiveness critically depends on the quality of the initialized point cloud. Specifically, achieving uniform and complete point coverage over the underlying scene structure requires overlapping observation frustums, an assumption that is often violated in unbounded, dynamic urban environments. Training Gaussian models with partially initialized point clouds often leads to distortions and artifacts, as camera rays may fail to intersect valid surfaces, resulting in incorrect gradient propagation to Gaussian primitives associated with occluded or invisible geometry. Additionally, existing densification strategies simply clone and split Gaussian primitives from existing ones, incapable of reconstructing missing structures. To address these limitations, we propose VAD-GS, a 3DGS framework tailored for geometry recovery in challenging urban scenes. Our method identifies unreliable geometry structures via voxel-based visibility reasoning, selects informative supporting views through diversity-aware view selection, and recovers missing structures via patch matching-based multi-view stereo reconstruction. This design enables the generation of new Gaussian primitives guided by reliable geometric priors, even in regions lacking initial points. Extensive experiments on the Waymo and nuScenes datasets demonstrate that VAD-GS outperforms state-of-the-art 3DGS approaches and significantly improves the quality of reconstructed geometry for both static and dynamic objects.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）在合成高保真新视角方面展现出了出色性能。然而，其效果在很大程度上依赖于初始点云的质量。具体而言，为了在场景结构上实现均匀且完整的点覆盖，通常需要重叠的观测视锥，但这一假设在无界、动态的城市环境中往往难以满足。使用部分初始化的点云训练高斯模型常常会导致失真和伪影，因为摄像机光线可能无法与有效表面相交，进而导致与被遮挡或不可见几何相关的高斯基元接收到错误的梯度传播。此外，现有的加密策略通常仅通过克隆和拆分已有的高斯基元来进行点密度增强，无法重建缺失结构。为了解决上述问题，我们提出了VAD-GS，一种面向复杂城市场景几何恢复的3DGS框架。该方法通过基于体素的可见性推理识别不可靠的几何结构，结合多样性感知的视角选择策略提取具有信息量的辅助视角，并利用基于图块匹配的多视图立体重建技术恢复缺失结构。该设计使得即使在初始点缺失区域，也能依据可靠的几何先验生成新的高斯基元。我们在Waymo和nuScenes数据集上进行了大量实验证明，VAD-GS在几何重建质量方面显著优于现有最先进的3DGS方法，且在静态与动态物体上均取得了优异性能。\n"
  },
  {
    "path": "abs/2510.09438.md",
    "content": "### Mono4DEditor: Text-Driven 4D Scene Editing from Monocular Video via Point-Level Localization of Language-Embedded Gaussians\n\nEditing 4D scenes reconstructed from monocular videos based on text prompts is a valuable yet challenging task with broad applications in content creation and virtual environments. The key difficulty lies in achieving semantically precise edits in localized regions of complex, dynamic scenes, while preserving the integrity of unedited content. To address this, we introduce Mono4DEditor, a novel framework for flexible and accurate text-driven 4D scene editing. Our method augments 3D Gaussians with quantized CLIP features to form a language-embedded dynamic representation, enabling efficient semantic querying of arbitrary spatial regions. We further propose a two-stage point-level localization strategy that first selects candidate Gaussians via CLIP similarity and then refines their spatial extent to improve accuracy. Finally, targeted edits are performed on localized regions using a diffusion-based video editing model, with flow and scribble guidance ensuring spatial fidelity and temporal coherence. Extensive experiments demonstrate that Mono4DEditor enables high-quality, text-driven edits across diverse scenes and object types, while preserving the appearance and geometry of unedited areas and surpassing prior approaches in both flexibility and visual fidelity.\n\n基于文本提示编辑由单目视频重建而成的4D场景是一项具有广泛应用价值但极具挑战性的任务，广泛应用于内容创作与虚拟环境。其核心难点在于，如何在复杂动态场景中的局部区域实现语义精确的编辑，同时保持未编辑内容的完整性。为此，我们提出Mono4DEditor，一个灵活且精确的文本驱动4D场景编辑新框架。该方法通过将3D高斯与量化的CLIP特征相结合，构建出一种融合语言信息的动态表示，使得对任意空间区域的语义查询更加高效。我们进一步提出一种两阶段的点级定位策略，首先通过CLIP相似度筛选候选高斯点，然后细化其空间范围以提升定位精度。最后，我们在这些局部区域中使用基于扩散模型的视频编辑方法进行目标编辑，同时引入光流与涂鸦引导，确保空间保真与时间连贯。大量实验证明，Mono4DEditor能够在多种场景与对象类型下实现高质量、文本驱动的编辑，既保留了未编辑区域的外观与几何结构，又在灵活性与视觉保真度方面优于现有方法。\n"
  },
  {
    "path": "abs/2510.09881.md",
    "content": "### LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates\n\nRecent advances in novel-view synthesis can create the photo-realistic visualization of real-world environments from conventional camera captures. However, acquiring everyday environments from casual captures faces challenges due to frequent scene changes, which require dense observations both spatially and temporally. We propose long-term Gaussian scene chronology from sparse-view updates, coined LTGS, an efficient scene representation that can embrace everyday changes from highly under-constrained casual captures. Given an incomplete and unstructured Gaussian splatting representation obtained from an initial set of input images, we robustly model the long-term chronology of the scene despite abrupt movements and subtle environmental variations. We construct objects as template Gaussians, which serve as structural, reusable priors for shared object tracks. Then, the object templates undergo a further refinement pipeline that modulates the priors to adapt to temporally varying environments based on few-shot observations. Once trained, our framework is generalizable across multiple time steps through simple transformations, significantly enhancing the scalability for a temporal evolution of 3D environments. As existing datasets do not explicitly represent the long-term real-world changes with a sparse capture setup, we collect real-world datasets to evaluate the practicality of our pipeline. Experiments demonstrate that our framework achieves superior reconstruction quality compared to other baselines while enabling fast and light-weight updates.\n\n新视角合成技术的最新进展使得可以利用常规相机捕捉数据生成真实环境的照片级可视化效果。然而，日常环境的随手拍摄常面临频繁场景变化所带来的挑战，这类变化要求在空间和时间上都具备高密度的观测。为应对这一问题，我们提出LTGS（long-term Gaussian scene chronology from sparse-view updates），一种可从极度欠约束的随手拍摄中适应日常变化的高效场景表示方式。面对仅由初始图像集获得的不完整、非结构化高斯泼溅表示，我们能够稳健建模场景的长期时间序列，即便在存在剧烈运动和微妙环境变化的情况下亦如此。我们将场景中的对象构建为模板高斯，这些模板作为可共享的结构性先验用于跟踪对象轨迹。随后，这些对象模板将进入进一步的优化流程，根据少量观测对先验进行调节，以适应时间变化的环境。一旦训练完成，我们的框架便可通过简单的变换泛化到多个时间步，大幅提升了3D环境时间演化建模的可扩展性。鉴于现有数据集并未显式覆盖稀疏拍摄条件下的长期现实变化，我们采集了真实世界的数据集用于评估本方法的实用性。实验结果表明，我们的框架不仅在重建质量上优于现有基线方法，同时支持快速且轻量的更新。\n"
  },
  {
    "path": "abs/2510.09962.md",
    "content": "### VG-Mapping: Variation-Aware 3D Gaussians for Online Semi-static Scene Mapping\n\nMaintaining an up-to-date map that accurately reflects recent changes in the environment is crucial, especially for robots that repeatedly traverse the same space. Failing to promptly update the changed regions can degrade map quality, resulting in poor localization, inefficient operations, and even lost robots. 3D Gaussian Splatting (3DGS) has recently seen widespread adoption in online map reconstruction due to its dense, differentiable, and photorealistic properties, yet accurately and efficiently updating the regions of change remains a challenge. In this paper, we propose VG-Mapping, a novel online 3DGS-based mapping system tailored for such semi-static scenes. Our approach introduces a hybrid representation that augments 3DGS with a TSDF-based voxel map to efficiently identify changed regions in a scene, along with a variation-aware density control strategy that inserts or deletes Gaussian primitives in regions undergoing change. Furthermore, to address the absence of public benchmarks for this task, we construct a RGB-D dataset comprising both synthetic and real-world semi-static environments. Experimental results demonstrate that our method substantially improves the rendering quality and map update efficiency in semi-static scenes.\n\n维护一张能够准确反映环境最新变化的地图至关重要，尤其对于那些需反复穿越同一区域的机器人而言更是如此。如果未能及时更新发生变化的区域，地图质量将下降，导致定位误差、操作效率低下，甚至机器人丢失等严重问题。三维高斯泼溅（3D Gaussian Splatting, 3DGS）因其稠密、可微分及高度写实的特性，近年来被广泛应用于在线地图重建中，然而，如何高效且准确地更新变化区域仍然是一个挑战。本文提出VG-Mapping，一种专为半静态场景设计的在线3DGS地图构建系统。该方法引入一种混合表示，将3DGS与基于TSDF的体素地图相结合，以高效识别场景中发生变化的区域，并提出一种变化感知的密度控制策略，在变化区域中插入或删除高斯基元以实现动态更新。此外，鉴于该任务尚无公开基准，我们构建了一个涵盖合成与真实半静态环境的RGB-D数据集。实验结果表明，VG-Mapping在半静态场景中显著提升了渲染质量与地图更新效率。\n"
  },
  {
    "path": "abs/2510.09997.md",
    "content": "### CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting\n\nLevel of Detail (LoD) is a fundamental technique in real-time computer graphics for managing the rendering costs of complex scenes while preserving visual fidelity. Traditionally, LoD is implemented using discrete levels (DLoD), where multiple, distinct versions of a model are swapped out at different distances. This long-standing paradigm, however, suffers from two major drawbacks: it requires significant storage for multiple model copies and causes jarring visual \"popping\" artifacts during transitions, degrading the user experience. We argue that the explicit, primitive-based nature of the emerging 3D Gaussian Splatting (3DGS) technique enables a more ideal paradigm: Continuous LoD (CLoD). A CLoD approach facilitates smooth, seamless quality scaling within a single, unified model, thereby circumventing the core problems of DLOD. To this end, we introduce CLoD-GS, a framework that integrates a continuous LoD mechanism directly into a 3DGS representation. Our method introduces a learnable, distance-dependent decay parameter for each Gaussian primitive, which dynamically adjusts its opacity based on viewpoint proximity. This allows for the progressive and smooth filtering of less significant primitives, effectively creating a continuous spectrum of detail within one model. To train this model to be robust across all distances, we introduce a virtual distance scaling mechanism and a novel coarse-to-fine training strategy with rendered point count regularization. Our approach not only eliminates the storage overhead and visual artifacts of discrete methods but also reduces the primitive count and memory footprint of the final model. Extensive experiments demonstrate that CLoD-GS achieves smooth, quality-scalable rendering from a single model, delivering high-fidelity results across a wide range of performance targets.\n\n细节层次（Level of Detail, LoD）是实时计算机图形中一项核心技术，用于在保留视觉保真度的同时控制复杂场景的渲染开销。传统上，LoD采用离散层次（Discrete LoD, DLoD）方式实现，即在不同距离下切换多个不同版本的模型。然而，这一长期使用的范式存在两个主要问题：一是需要为多个模型副本预留大量存储空间，二是在切换过程中会产生明显的“跳变”伪影，影响用户体验。我们认为，新兴的三维高斯泼溅（3D Gaussian Splatting, 3DGS）技术具有显式、基元驱动的特性，为构建更理想的连续细节层次（Continuous LoD, CLoD）范式提供了可能。CLoD方法允许在单一统一模型内实现细节的平滑、无缝缩放，从而绕开DLoD的核心问题。为此，我们提出CLoD-GS，一个将连续LoD机制直接融入3DGS表示的框架。我们为每个高斯基元引入一个可学习的、基于距离衰减的参数，依据观察视角与对象的接近程度动态调整其透明度，从而实现不重要基元的逐步、平滑滤除，最终在单一模型中构建出连续的细节层次谱。为了使该模型在所有距离下均具鲁棒性，我们引入虚拟距离缩放机制，并提出一种从粗到细的训练策略，结合渲染点数量正则化。该方法不仅消除了离散方法带来的存储冗余与视觉伪影，还降低了最终模型的基元数量和内存占用。大量实验表明，CLoD-GS可从单一模型实现平滑、可调细节的渲染，在不同性能需求下均能呈现高保真结果。\n"
  },
  {
    "path": "abs/2510.10030.md",
    "content": "### P-4DGS: Predictive 4D Gaussian Splatting with 90× Compression\n\n3D Gaussian Splatting (3DGS) has garnered significant attention due to its superior scene representation fidelity and real-time rendering performance, especially for dynamic 3D scene reconstruction (i.e., 4D reconstruction). However, despite achieving promising results, most existing algorithms overlook the substantial temporal and spatial redundancies inherent in dynamic scenes, leading to prohibitive memory consumption. To address this, we propose P-4DGS, a novel dynamic 3DGS representation for compact 4D scene modeling. Inspired by intra- and inter-frame prediction techniques commonly used in video compression, we first design a 3D anchor point-based spatial-temporal prediction module to fully exploit the spatial-temporal correlations across different 3D Gaussian primitives. Subsequently, we employ an adaptive quantization strategy combined with context-based entropy coding to further reduce the size of the 3D anchor points, thereby achieving enhanced compression efficiency. To evaluate the rate-distortion performance of our proposed P-4DGS in comparison with other dynamic 3DGS representations, we conduct extensive experiments on both synthetic and real-world datasets. Experimental results demonstrate that our approach achieves state-of-the-art reconstruction quality and the fastest rendering speed, with a remarkably low storage footprint (around 1MB on average), achieving up to 40× and 90× compression on synthetic and real-world scenes, respectively.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）因其卓越的场景表达保真度与实时渲染性能，特别是在动态三维场景重建（即4D重建）中的表现，近年来受到广泛关注。然而，尽管现有方法取得了令人鼓舞的成果，但大多数算法忽视了动态场景中大量存在的时间与空间冗余，导致内存消耗极为高昂。为此，我们提出P-4DGS，一种用于紧凑型4D场景建模的动态3DGS新表示方法。该方法受到视频压缩中帧内与帧间预测技术的启发，首先设计了一种基于三维锚点的时空预测模块，充分挖掘不同高斯基元之间的时空相关性。随后，我们结合上下文熵编码，采用自适应量化策略进一步压缩三维锚点的存储开销，从而实现更高效的压缩率。为评估P-4DGS在码率-失真性能方面相较其他动态3DGS表示的优劣，我们在合成数据与真实场景数据上进行了大量实验。实验结果表明，我们的方法在重建质量和渲染速度上均达到当前最优水平，同时具有极低的存储占用（平均约为1MB），在合成场景和真实场景上分别实现了高达40×与90×的压缩率。\n"
  },
  {
    "path": "abs/2510.10097.md",
    "content": "### Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting\n\nNeural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have advanced 3D reconstruction and novel view synthesis, but remain heavily dependent on accurate camera poses and dense viewpoint coverage. These requirements limit their applicability in sparse-view settings, where pose estimation becomes unreliable and supervision is insufficient. To overcome these challenges, we introduce Gesplat, a 3DGS-based framework that enables robust novel view synthesis and geometrically consistent reconstruction from unposed sparse images. Unlike prior works that rely on COLMAP for sparse point cloud initialization, we leverage the VGGT foundation model to obtain more reliable initial poses and dense point clouds. Our approach integrates several key innovations: 1) a hybrid Gaussian representation with dual position-shape optimization enhanced by inter-view matching consistency; 2) a graph-guided attribute refinement module to enhance scene details; and 3) flow-based depth regularization that improves depth estimation accuracy for more effective supervision. Comprehensive quantitative and qualitative experiments demonstrate that our approach achieves more robust performance on both forward-facing and large-scale complex datasets compared to other pose-free methods.\n\n神经辐射场（Neural Radiance Fields, NeRF）与三维高斯泼溅（3D Gaussian Splatting, 3DGS）在三维重建与新视角合成方面取得了重要进展，但仍严重依赖于准确的相机位姿和密集的视角覆盖。这些要求限制了其在稀疏视角设置下的适用性，因为在此条件下位姿估计变得不可靠，监督信息也十分有限。为应对这一挑战，我们提出Gesplat，一种基于3DGS的框架，能够从无位姿的稀疏图像中实现鲁棒的新视角合成与几何一致的三维重建。与以往依赖COLMAP进行稀疏点云初始化的方法不同，我们采用VGGT基础模型来获得更可靠的初始位姿与稠密点云。我们的方法融合了多项关键创新：1）融合位置-形状双重优化与视角间匹配一致性增强的混合高斯表示；2）用于提升场景细节的图引导属性细化模块；3）基于光流的深度正则化机制，以提高深度估计精度，实现更有效的监督。大量定量与定性实验表明，与其他无需位姿的方法相比，我们的方法在前向视角和大规模复杂数据集上均展现出更为鲁棒的性能表现。\n"
  },
  {
    "path": "abs/2510.10152.md",
    "content": "### Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer\n\nIn this work, we present Color3D, a highly adaptable framework for colorizing both static and dynamic 3D scenes from monochromatic inputs, delivering visually diverse and chromatically vibrant reconstructions with flexible user-guided control. In contrast to existing methods that focus solely on static scenarios and enforce multi-view consistency by averaging color variations which inevitably sacrifice both chromatic richness and controllability, our approach is able to preserve color diversity and steerability while ensuring cross-view and cross-time consistency. In particular, the core insight of our method is to colorize only a single key view and then fine-tune a personalized colorizer to propagate its color to novel views and time steps. Through personalization, the colorizer learns a scene-specific deterministic color mapping underlying the reference view, enabling it to consistently project corresponding colors to the content in novel views and video frames via its inherent inductive bias. Once trained, the personalized colorizer can be applied to infer consistent chrominance for all other images, enabling direct reconstruction of colorful 3D scenes with a dedicated Lab color space Gaussian splatting representation. The proposed framework ingeniously recasts complicated 3D colorization as a more tractable single image paradigm, allowing seamless integration of arbitrary image colorization models with enhanced flexibility and controllability. Extensive experiments across diverse static and dynamic 3D colorization benchmarks substantiate that our method can deliver more consistent and chromatically rich renderings with precise user control.\n\n本文提出了Color3D，一个高度灵活的框架，可从单色输入中为静态与动态三维场景进行上色，生成具有视觉多样性和色彩丰富性的重建结果，并支持灵活的用户引导控制。与现有方法仅关注静态场景，并通过对颜色变化取平均以实现多视角一致性（从而不可避免地牺牲色彩丰富性与可控性）不同，我们的方法在保证视角间与时间间一致性的同时，保留了颜色多样性与可操控性。我们方法的核心思想在于仅对一个关键视角进行上色，并通过微调一个个性化上色器，将其颜色传播至新视角和时间帧。通过个性化训练，该上色器能够学习参考视图中场景特定的确定性颜色映射，利用其固有的归纳偏置，将对应颜色一致地投射到新视角与视频帧中的内容上。训练完成后，该个性化上色器可用于为所有其他图像推断一致的色度，从而直接重建出具有专用Lab色彩空间高斯泼溅表示的彩色三维场景。该框架巧妙地将复杂的三维上色任务转化为更易处理的单图像范式，使任意图像上色模型得以无缝集成，并具备更高的灵活性与可控性。我们在多个静态与动态三维上色基准上进行了大量实验证明，Color3D能够在提供精确用户控制的同时，实现更一致、色彩更丰富的渲染效果。\n"
  },
  {
    "path": "abs/2510.10257.md",
    "content": "### Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) struggles in few-shot scenarios, where its standard adaptive density control (ADC) can lead to overfitting and bloated reconstructions. While state-of-the-art methods like FSGS improve quality, they often do so by significantly increasing the primitive count. This paper presents a framework that revises the core 3DGS optimization to prioritize efficiency. We replace the standard positional gradient heuristic with a novel densification trigger that uses the opacity gradient as a lightweight proxy for rendering error. We find this aggressive densification is only effective when paired with a more conservative pruning schedule, which prevents destructive optimization cycles. Combined with a standard depth-correlation loss for geometric guidance, our framework demonstrates a fundamental improvement in efficiency. On the 3-view LLFF dataset, our model is over 40% more compact (32k vs. 57k primitives) than FSGS, and on the Mip-NeRF 360 dataset, it achieves a reduction of approximately 70%. This dramatic gain in compactness is achieved with a modest trade-off in reconstruction metrics, establishing a new state-of-the-art on the quality-vs-efficiency Pareto frontier for few-shot view synthesis.\n\n三维高斯泼溅（3D Gaussian Splatting, 3DGS）在小样本（few-shot）场景中表现不佳，其标准的自适应密度控制（Adaptive Density Control, ADC）机制往往导致过拟合与冗余重建。尽管诸如FSGS等最新方法提升了重建质量，但通常以显著增加基元数量为代价。本文提出了一个优化框架，对3DGS的核心优化策略进行调整，以效率为优先目标。我们用一种新颖的加密触发机制替代了传统的位置梯度启发式，该机制以不透明度梯度作为渲染误差的轻量代理指标。我们发现，这种激进的加密策略仅在结合更保守的剪枝调度时才有效，从而避免了破坏性的优化循环。此外，我们引入标准的深度相关损失作为几何引导，进一步提升效率表现。在3视图LLFF数据集上，我们的模型相较FSGS实现了超过40%的压缩（基元数从57k降至32k）；在Mip-NeRF 360数据集上压缩幅度约达70%。这一显著的紧凑性提升仅以较小的重建指标代价换取，建立了小样本视角合成任务中质量-效率折中曲线（Pareto前沿）的新标杆。\n"
  },
  {
    "path": "abs/2510.10492.md",
    "content": "### Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework\n\nThis paper proposes an efficient 3D avatar coding framework that leverages compact human priors and canonical-to-target transformation to enable high-quality 3D human avatar video compression at ultra-low bit rates. The framework begins by training a canonical Gaussian avatar using articulated splatting in a network-free manner, which serves as the foundation for avatar appearance modeling. Simultaneously, a human-prior template is employed to capture temporal body movements through compact parametric representations. This decomposition of appearance and temporal evolution minimizes redundancy, enabling efficient compression: the canonical avatar is shared across the sequence, requiring compression only once, while the temporal parameters, consisting of just 94 parameters per frame, are transmitted with minimal bit-rate. For each frame, the target human avatar is generated by deforming canonical avatar via Linear Blend Skinning transformation, facilitating temporal coherent video reconstruction and novel view synthesis. Experimental results demonstrate that the proposed method significantly outperforms conventional 2D/3D codecs and existing learnable dynamic 3D Gaussian splatting compression method in terms of rate-distortion performance on mainstream multi-view human video datasets, paving the way for seamless immersive multimedia experiences in meta-verse applications.\n\n本文提出了一种高效的三维虚拟人编码框架，利用紧凑的人体先验与标准到目标的变换机制，实现了在超低比特率下的高质量三维虚拟人视频压缩。该框架首先通过无网络的关节高斯泼溅方式训练出标准高斯虚拟人，用于建模人物外观。与此同时，引入人体先验模板，通过紧凑的参数化表示捕捉时间维度上的身体运动。该种对外观与时间演化的分离建模有效减少了冗余，从而实现高效压缩：标准虚拟人在整个序列中共享，仅需压缩一次；而每帧仅需传输94个时间参数，极大降低了码率。每帧的目标虚拟人通过线性混合蒙皮（Linear Blend Skinning）对标准虚拟人进行形变生成，从而实现时间连贯的视频重建与新视角合成。实验结果表明，该方法在主流多视角人体视频数据集上，相较传统2D/3D编解码器和现有可学习的动态三维高斯泼溅压缩方法，在码率-失真表现方面具有显著优势，为元宇宙应用中的沉浸式多媒体体验铺平了道路。\n"
  },
  {
    "path": "abs/2510.10637.md",
    "content": "### High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting\n\nThe scalability of robotic learning is fundamentally bottlenecked by the significant cost and labor of real-world data collection. While simulated data offers a scalable alternative, it often fails to generalize to the real world due to significant gaps in visual appearance, physical properties, and object interactions. To address this, we propose RoboSimGS, a novel Real2Sim2Real framework that converts multi-view real-world images into scalable, high-fidelity, and physically interactive simulation environments for robotic manipulation. Our approach reconstructs scenes using a hybrid representation: 3D Gaussian Splatting (3DGS) captures the photorealistic appearance of the environment, while mesh primitives for interactive objects ensure accurate physics simulation. Crucially, we pioneer the use of a Multi-modal Large Language Model (MLLM) to automate the creation of physically plausible, articulated assets. The MLLM analyzes visual data to infer not only physical properties (e.g., density, stiffness) but also complex kinematic structures (e.g., hinges, sliding rails) of objects. We demonstrate that policies trained entirely on data generated by RoboSimGS achieve successful zero-shot sim-to-real transfer across a diverse set of real-world manipulation tasks. Furthermore, data from RoboSimGS significantly enhances the performance and generalization capabilities of SOTA methods. Our results validate RoboSimGS as a powerful and scalable solution for bridging the sim-to-real gap.\n\n机器人学习的可扩展性受到真实世界数据采集所需高昂成本与人力的根本性限制。尽管模拟数据提供了可扩展的替代方案，但由于其在视觉外观、物理属性和物体交互等方面与现实存在显著差异，往往难以泛化至真实环境。为解决这一问题，我们提出RoboSimGS，一个创新的Real2Sim2Real框架，可将多视角真实图像转换为可扩展、高保真、具物理交互能力的机器人操作模拟环境。我们采用混合表示重建场景：三维高斯泼溅（3DGS）用于捕捉环境的照片级真实外观，而交互物体则以网格基元建模，以确保物理模拟的准确性。关键在于，我们率先引入多模态大语言模型（Multi-modal Large Language Model, MLLM），自动生成具备物理可行性与可动结构的资产。该模型分析视觉数据，不仅能够推断物体的物理属性（如密度、刚度），还能识别复杂的运动结构（如铰链、滑轨等）。我们展示了完全基于RoboSimGS生成数据训练的策略，在多个真实世界操作任务中成功实现零样本从模拟到现实的迁移。同时，RoboSimGS生成的数据显著提升了多种SOTA方法的性能与泛化能力。我们的实验结果验证了RoboSimGS作为连接模拟与现实之间鸿沟的强大且可扩展的解决方案。\n"
  },
  {
    "path": "abs/2510.10726.md",
    "content": "### WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting\n\nWe present WorldMirror, an all-in-one, feed-forward model for versatile 3D geometric prediction tasks. Unlike existing methods constrained to image-only inputs or customized for a specific task, our framework flexibly integrates diverse geometric priors, including camera poses, intrinsics, and depth maps, while simultaneously generating multiple 3D representations: dense point clouds, multi-view depth maps, camera parameters, surface normals, and 3D Gaussians. This elegant and unified architecture leverages available prior information to resolve structural ambiguities and delivers geometrically consistent 3D outputs in a single forward pass. WorldMirror achieves state-of-the-art performance across diverse benchmarks from camera, point map, depth, and surface normal estimation to novel view synthesis, while maintaining the efficiency of feed-forward inference.\n\n我们提出了WorldMirror，一个面向多种三维几何预测任务的全能型前馈模型。不同于现有仅限于图像输入或针对特定任务定制的方法，我们的框架可灵活融合多种几何先验信息，包括相机位姿、内参以及深度图，同时一次前向推理即可生成多种三维表示形式：稠密点云、多视图深度图、相机参数、表面法向以及三维高斯。该架构结构优雅统一，能够利用已有先验信息解决结构歧义，并在保持几何一致性的前提下输出高质量三维结果。WorldMirror在多个评测任务中均达到了当前最优性能，包括相机估计、点图重建、深度估计、表面法向预测和新视角合成，同时保持了前馈推理的高效性。\n"
  },
  {
    "path": "abs/2510.10993.md",
    "content": "### Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency\n\n3D Gaussian inpainting, a critical technique for numerous applications in virtual reality and multimedia, has made significant progress with pretrained diffusion models. However, ensuring multi-view consistency, an essential requirement for high-quality inpainting, remains a key challenge. In this work, we present PAInpainter, a novel approach designed to advance 3D Gaussian inpainting by leveraging perspective-aware content propagation and consistency verification across multi-view inpainted images. Our method iteratively refines inpainting and optimizes the 3D Gaussian representation with multiple views adaptively sampled from a perspective graph. By propagating inpainted images as prior information and verifying consistency across neighboring views, PAInpainter substantially enhances global consistency and texture fidelity in restored 3D scenes. Extensive experiments demonstrate the superiority of PAInpainter over existing methods. Our approach achieves superior 3D inpainting quality, with PSNR scores of 26.03 dB and 29.51 dB on the SPIn-NeRF and NeRFiller datasets, respectively, highlighting its effectiveness and generalization capability.\n\n三维高斯图像修复（3D Gaussian Inpainting）是虚拟现实与多媒体等众多应用中的关键技术，近年来借助预训练扩散模型取得了显著进展。然而，实现多视角一致性这一高质量修复所必需的核心要求，仍然面临重大挑战。为此，我们提出PAInpainter，一种利用透视感知内容传播与多视角一致性验证推进三维高斯修复的全新方法。该方法通过从透视图构建的视图图中自适应采样多个视角，迭代执行图像修复与三维高斯表示优化；同时，将修复图像作为先验信息进行传播，并对邻近视角之间的一致性进行验证，从而显著提升了三维场景修复的整体一致性与纹理真实感。大量实验验证了PAInpainter相较现有方法的优越性：在SPIn-NeRF与NeRFiller数据集上分别取得了26.03 dB与29.51 dB的PSNR成绩，充分展现了其有效性与泛化能力。\n"
  },
  {
    "path": "abs/2510.11473.md",
    "content": "### VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment\n\n3D Gaussian Splatting has recently emerged as an efficient solution for high-quality and real-time novel view synthesis. However, its capability for accurate surface reconstruction remains underexplored. Due to the discrete and unstructured nature of Gaussians, supervision based solely on image rendering loss often leads to inaccurate geometry and inconsistent multi-view alignment. In this work, we propose a novel method that enhances the geometric representation of 3D Gaussians through view alignment (VA). Specifically, we incorporate edge-aware image cues into the rendering loss to improve surface boundary delineation. To enforce geometric consistency across views, we introduce a visibility-aware photometric alignment loss that models occlusions and encourages accurate spatial relationships among Gaussians. To further mitigate ambiguities caused by lighting variations, we incorporate normal-based constraints to refine the spatial orientation of Gaussians and improve local surface estimation. Additionally, we leverage deep image feature embeddings to enforce cross-view consistency, enhancing the robustness of the learned geometry under varying viewpoints and illumination. Extensive experiments on standard benchmarks demonstrate that our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.\n\n三维高斯泼溅（3D Gaussian Splatting）近年来作为一种高效的高质量实时新视角合成方案逐渐受到关注。然而，其在精确表面重建方面的能力尚未被充分挖掘。由于高斯基元的离散性与非结构化特性，单纯依赖图像渲染损失进行监督往往会导致几何结构不准确以及多视角对齐不一致。为此，本文提出了一种通过视角对齐（View Alignment, VA）提升三维高斯几何表达能力的新方法。具体而言，我们在渲染损失中引入边缘感知图像线索，以提升表面边界的清晰度；同时，为实现跨视角的几何一致性，我们提出了一种感知可见性的光度对齐损失，该损失建模遮挡关系，鼓励高斯基元间准确的空间结构。此外，为减弱光照变化带来的歧义，我们引入基于法向的约束以优化高斯的空间朝向，从而提升局部表面估计的精度。我们还利用深度图像特征嵌入进行跨视角一致性约束，增强所学几何在不同视角和光照条件下的鲁棒性。我们在多个标准基准数据集上进行大量实验，结果表明本方法在表面重建与新视角合成任务中均达到了当前最优水平。\n"
  },
  {
    "path": "abs/2510.11689.md",
    "content": "### Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation\n\nLearning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. Phys2Real conditions policies on interpretable physical parameters, refining VLM predictions with online estimates via ensemble-based uncertainty quantification. On planar pushing tasks of a T-block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23% in the challenging top-weighted T-block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success.\n\n在真实环境中直接学习机器人操作策略往往代价高昂且耗时漫长。尽管在仿真中训练的强化学习（RL）策略提供了可扩展的替代方案，但实现有效的模拟到现实（sim-to-real）迁移仍面临挑战，尤其是在对动力学精度要求较高的任务中。为解决这一问题，我们提出Phys2Real，一种结合视觉-语言模型（VLM）推理物理参数估计与交互式自适应的真实到模拟再回归现实（real-to-sim-to-real）RL训练管线。该方法包含三个核心模块：（1）基于三维高斯泼溅的高保真几何重建；（2）由VLM推理得到的物理参数先验分布；（3）基于交互数据的在线物理参数估计。Phys2Real在策略训练中引入可解释的物理参数，通过基于集成的不确定性量化机制，利用在线估计结果对VLM预测进行细化。在T形积木的平面推动任务中（中心质量位置不同）及一侧加重的锤子推动任务中，Phys2Real在多个指标上均显著优于基于域随机化的基线方法：在底部加重的T积木任务中成功率为100%（对比基线的79%），在顶部加重的更具挑战性T积木任务中为57%（对比23%），锤子推动任务的平均完成时间也快了15%。消融实验进一步表明，VLM与交互信息的结合对于成功完成任务至关重要。\n"
  },
  {
    "path": "abs/2510.11717.md",
    "content": "### Ev4DGS: Novel-view Rendering of Non-Rigid Objects from Monocular Event Streams\n\nEvent cameras offer various advantages for novel view rendering compared to synchronously operating RGB cameras, and efficient event-based techniques supporting rigid scenes have been recently demonstrated in the literature. In the case of non-rigid objects, however, existing approaches additionally require sparse RGB inputs, which can be a substantial practical limitation; it remains unknown if similar models could be learned from event streams only. This paper sheds light on this challenging open question and introduces Ev4DGS, i.e., the first approach for novel view rendering of non-rigidly deforming objects in the explicit observation space (i.e., as RGB or greyscale images) from monocular event streams. Our method regresses a deformable 3D Gaussian Splatting representation through 1) a loss relating the outputs of the estimated model with the 2D event observation space, and 2) a coarse 3D deformation model trained from binary masks generated from events. We perform experimental comparisons on existing synthetic and newly recorded real datasets with non-rigid objects. The results demonstrate the validity of Ev4DGS and its superior performance compared to multiple naive baselines that can be applied in our setting.\n\n事件相机在新视图合成方面相比同步工作的RGB相机具有多种优势，并且近期已有文献展示了在刚性场景中高效的事件驱动技术。然而，对于非刚性物体，现有方法通常还需要稀疏的RGB输入，这在实际应用中可能构成显著限制；目前仍不清楚是否可以仅从事件流中学习出类似的模型。本文探讨了这一具有挑战性的开放问题，并提出了Ev4DGS，即首个能够基于单目事件流，在显式观测空间（即RGB或灰度图像）中对非刚性变形物体实现新视图合成的方法。我们的方法通过以下两点回归可变形的三维高斯溅射表示：1）一种将估计模型输出与二维事件观测空间关联的损失函数，2）一种利用事件生成的二值掩码训练的粗略三维变形模型。我们在现有的合成数据集以及新采集的非刚性物体真实数据集上进行了实验对比。结果表明，Ev4DGS在本设定下不仅是可行的，而且在性能上优于多种可用的朴素基线方法。\n"
  },
  {
    "path": "abs/2510.11878.md",
    "content": "### GS-Verse: Mesh-based Gaussian Splatting for Physics-aware Interaction in Virtual Reality\n\nAs the demand for immersive 3D content grows, the need for intuitive and efficient interaction methods becomes paramount. Current techniques for physically manipulating 3D content within Virtual Reality (VR) often face significant limitations, including reliance on engineering-intensive processes and simplified geometric representations, such as tetrahedral cages, which can compromise visual fidelity and physical accuracy. In this paper, we introduce GS-Verse (Gaussian Splatting for Virtual Environment Rendering and Scene Editing), a novel method designed to overcome these challenges by directly integrating an object's mesh with a Gaussian Splatting (GS) representation. Our approach enables more precise surface approximation, leading to highly realistic deformations and interactions. By leveraging existing 3D mesh assets, GS-Verse facilitates seamless content reuse and simplifies the development workflow. Moreover, our system is designed to be physics-engine-agnostic, granting developers robust deployment flexibility. This versatile architecture delivers a highly realistic, adaptable, and intuitive approach to interactive 3D manipulation. We rigorously validate our method against the current state-of-the-art technique that couples VR with GS in a comparative user study involving 18 participants. Specifically, we demonstrate that our approach is statistically significantly better for physics-aware stretching manipulation and is also more consistent in other physics-based manipulations like twisting and shaking. Further evaluation across various interactions and scenes confirms that our method consistently delivers high and reliable performance, showing its potential as a plausible alternative to existing methods.\n\n随着沉浸式3D内容需求的增长，对直观高效的交互方式的需求也日益迫切。目前在虚拟现实（VR）中进行3D内容物理操控的技术常常面临诸多限制，包括依赖高工程成本的流程，以及采用诸如四面体笼状结构等简化几何表示，这些都可能影响视觉真实感与物理准确性。本文提出GS-Verse（Gaussian Splatting for Virtual Environment Rendering and Scene Editing），一种新颖的方法，旨在通过将物体网格与高斯溅射（Gaussian Splatting, GS）表示直接融合，来克服上述挑战。我们的方法可实现更精确的表面近似，从而带来高度真实的变形效果与交互体验。通过利用现有的3D网格资产，GS-Verse支持无缝的内容复用，简化开发流程。此外，该系统在设计上不依赖具体的物理引擎，为开发者提供了强大的部署灵活性。这一通用架构为交互式3D操控提供了真实感强、适应性高、操作直观的新范式。我们通过一项涉及18位参与者的用户对比研究，严谨地将本方法与当前结合VR与GS的最先进技术进行比较验证。结果表明，我们的方法在感知物理性的拉伸操控方面具有统计显著优势，在扭转和抖动等其他物理操控任务中也表现出更高的一致性。在不同交互类型和场景中的进一步评估亦验证了本方法持续稳定的高性能，展现出其作为现有方法可行替代方案的潜力。\n"
  },
  {
    "path": "abs/2510.12099.md",
    "content": "### G4Splat: Geometry-Guided Gaussian Splatting with Generative Prior\n\nDespite recent advances in leveraging generative prior from pre-trained diffusion models for 3D scene reconstruction, existing methods still face two critical limitations. First, due to the lack of reliable geometric supervision, they struggle to produce high-quality reconstructions even in observed regions, let alone in unobserved areas. Second, they lack effective mechanisms to mitigate multi-view inconsistencies in the generated images, leading to severe shape-appearance ambiguities and degraded scene geometry. In this paper, we identify accurate geometry as the fundamental prerequisite for effectively exploiting generative models to enhance 3D scene reconstruction. We first propose to leverage the prevalence of planar structures to derive accurate metric-scale depth maps, providing reliable supervision in both observed and unobserved regions. Furthermore, we incorporate this geometry guidance throughout the generative pipeline to improve visibility mask estimation, guide novel view selection, and enhance multi-view consistency when inpainting with video diffusion models, resulting in accurate and consistent scene completion. Extensive experiments on Replica, ScanNet++, and DeepBlending show that our method consistently outperforms existing baselines in both geometry and appearance reconstruction, particularly for unobserved regions. Moreover, our method naturally supports single-view inputs and unposed videos, with strong generalizability in both indoor and outdoor scenarios with practical real-world applicability.\n\n尽管近期在利用预训练扩散模型的生成先验进行三维场景重建方面取得了显著进展，现有方法仍面临两个关键限制。首先，由于缺乏可靠的几何监督，即便是在已观测区域也难以生成高质量的重建结果，更不用说在未观测区域。其次，它们缺乏有效机制来缓解生成图像中的多视角不一致问题，导致严重的形状与外观歧义，进一步削弱了场景几何的准确性。本文指出，精确的几何结构是充分利用生成模型以提升三维场景重建效果的基础前提。我们首先提出利用场景中广泛存在的平面结构，推导出具有度量尺度的高精度深度图，从而为已观测与未观测区域均提供可靠监督。此外，我们在整个生成流程中引入几何引导信息，用于改进可见性掩码估计、引导新视角选择，并在利用视频扩散模型进行修复时增强多视角一致性，从而实现准确且一致的场景补全。在Replica、ScanNet++和DeepBlending等数据集上的大量实验证明，我们的方法在几何和外观重建方面均优于现有各类基线方法，尤其在未观测区域表现出显著优势。此外，我们的方法天然支持单视图输入和无位姿视频，在室内与室外场景中均展现出强泛化能力，具备良好的现实应用潜力。\n"
  },
  {
    "path": "abs/2510.12101.md",
    "content": "### Gaussian Semantic Field for One-shot LiDAR Global Localization\n\nWe present a one-shot LiDAR global localization algorithm featuring semantic disambiguation ability based on a lightweight tri-layered scene graph. While landmark semantic registration-based methods have shown promising performance improvements in global localization compared with geometric-only methods, landmarks can be repetitive and misleading for correspondence establishment. We propose to mitigate this problem by modeling semantic distributions with continuous functions learned from a population of Gaussian processes. Compared with discrete semantic labels, the continuous functions capture finer-grained geo-semantic information and also provide more detailed metric information for correspondence establishment. We insert this continuous function as the middle layer between the object layer and the metric-semantic layer, forming a tri-layered 3D scene graph, serving as a light-weight yet performant backend for one-shot localization. We term our global localization pipeline Outram-GSF (Gaussian semantic field) and conduct a wide range of experiments on publicly available data sets, validating the superior performance against the current state-of-the-art.\n\n我们提出了一种具备语义消歧能力的一次性LiDAR全局定位算法，基于轻量级的三层场景图构建而成。尽管基于语义地标配准的方法相比仅利用几何信息的方法在全局定位任务中表现出了显著提升，但由于地标往往具有重复性，容易导致误导性的匹配关系，影响对应关系的建立。为缓解这一问题，我们提出采用从高斯过程族中学习得到的连续函数来建模语义分布。相较于离散语义标签，这些连续函数能够捕捉更细粒度的地理-语义信息，并提供更丰富的度量信息用于对应关系建立。我们将该连续函数作为中间层插入在对象层与度量-语义层之间，构建出一个三层结构的三维场景图，作为一次性定位任务中轻量但高效的后端支持。我们将该全局定位流程命名为Outram-GSF（Gaussian Semantic Field），并在多个公开数据集上开展了广泛实验，验证了其相较现有最先进方法在性能上的显著优势。\n"
  },
  {
    "path": "abs/2510.12174.md",
    "content": "### UniGS: Unified Geometry-Aware Gaussian Splatting for Multimodal Rendering\n\nIn this paper, we propose UniGS, a unified map representation and differentiable framework for high-fidelity multimodal 3D reconstruction based on 3D Gaussian Splatting. Our framework integrates a CUDA-accelerated rasterization pipeline capable of rendering photo-realistic RGB images, geometrically accurate depth maps, consistent surface normals, and semantic logits simultaneously. We redesign the rasterization to render depth via differentiable ray-ellipsoid intersection rather than using Gaussian centers, enabling effective optimization of rotation and scale attribute through analytic depth gradients. Furthermore, we derive the analytic gradient formulation for surface normal rendering, ensuring geometric consistency among reconstructed 3D scenes. To improve computational and storage efficiency, we introduce a learnable attribute that enables differentiable pruning of Gaussians with minimal contribution during training. Quantitative and qualitative experiments demonstrate state-of-the-art reconstruction accuracy across all modalities, validating the efficacy of our geometry-aware paradigm.\n\n本文提出了UniGS，一种基于三维高斯溅射（3D Gaussian Splatting）的统一地图表示与可微框架，用于高保真多模态三维重建。该框架集成了一条CUDA加速的光栅化管线，能够同时渲染照片级真实感的RGB图像、几何精确的深度图、一致的表面法线以及语义logits。我们重新设计了光栅化过程，通过可微的射线-椭球体交点计算方式来渲染深度，而非采用高斯中心，从而可通过解析深度梯度高效优化旋转和尺度属性。此外，我们推导了表面法线渲染的解析梯度公式，确保所重建三维场景之间的几何一致性。为提高计算与存储效率，我们引入了一种可学习属性，允许在训练过程中对贡献较小的高斯进行可微裁剪。定量与定性实验结果显示，我们的方法在各模态上均达到了当前最优的重建精度，有力验证了我们几何感知范式的有效性。\n"
  },
  {
    "path": "abs/2510.12282.md",
    "content": "### PAGS: Priority-Adaptive Gaussian Splatting for Dynamic Driving Scenes\n\nReconstructing dynamic 3D urban scenes is crucial for autonomous driving, yet current methods face a stark trade-off between fidelity and computational cost. This inefficiency stems from their semantically agnostic design, which allocates resources uniformly, treating static backgrounds and safety-critical objects with equal importance. To address this, we introduce Priority-Adaptive Gaussian Splatting (PAGS), a framework that injects task-aware semantic priorities directly into the 3D reconstruction and rendering pipeline. PAGS introduces two core contributions: (1) Semantically-Guided Pruning and Regularization strategy, which employs a hybrid importance metric to aggressively simplify non-critical scene elements while preserving fine-grained details on objects vital for navigation. (2) Priority-Driven Rendering pipeline, which employs a priority-based depth pre-pass to aggressively cull occluded primitives and accelerate the final shading computations. Extensive experiments on the Waymo and KITTI datasets demonstrate that PAGS achieves exceptional reconstruction quality, particularly on safety-critical objects, while significantly reducing training time and boosting rendering speeds to over 350 FPS.\n\n动态三维城市场景的重建对自动驾驶至关重要，然而当前方法在重建精度与计算成本之间存在显著权衡。这一低效性源于现有方法在语义上缺乏感知，其资源分配方式一视同仁地对待静态背景与安全关键物体。为了解决这一问题，我们提出了优先级自适应高斯溅射（Priority-Adaptive Gaussian Splatting, PAGS）框架，将面向任务的语义优先级直接引入三维重建与渲染流程中。PAGS包含两项核心创新：（1）语义引导的裁剪与正则化策略，通过混合重要性度量对非关键场景元素进行激进简化，同时保留对导航至关重要物体的细粒度细节；（2）优先级驱动的渲染流程，利用基于优先级的深度预处理阶段，积极剔除被遮挡图元，加速最终着色计算。我们在Waymo与KITTI数据集上的大量实验表明，PAGS在重建安全关键目标方面展现出卓越的重建质量，同时显著缩短训练时间，将渲染速度提升至350帧每秒以上。\n"
  },
  {
    "path": "abs/2510.12308.md",
    "content": "### Hybrid Gaussian Splatting for Novel Urban View Synthesis\n\nThis paper describes the Qualcomm AI Research solution to the RealADSim-NVS challenge, hosted at the RealADSim Workshop at ICCV 2025. The challenge concerns novel view synthesis in street scenes, and participants are required to generate, starting from car-centric frames captured during some training traversals, renders of the same urban environment as viewed from a different traversal (e.g. different street lane or car direction). Our solution is inspired by hybrid methods in scene generation and generative simulators merging gaussian splatting and diffusion models, and it is composed of two stages: First, we fit a 3D reconstruction of the scene and render novel views as seen from the target cameras. Then, we enhance the resulting frames with a dedicated single-step diffusion model. We discuss specific choices made in the initialization of gaussian primitives as well as the finetuning of the enhancer model and its training data curation. We report the performance of our model design and we ablate its components in terms of novel view quality as measured by PSNR, SSIM and LPIPS. On the public leaderboard reporting test results, our proposal reaches an aggregated score of 0.432, achieving the second place overall.\n\n本文介绍了Qualcomm AI Research在ICCV 2025的RealADSim研讨会中所举办的RealADSim-NVS挑战赛中的解决方案。该挑战聚焦于街景场景下的新视图合成，要求参赛者基于若干训练轨迹中采集的车载视角图像，生成同一城市环境在不同轨迹（例如不同车道或车辆行驶方向）下的渲染结果。我们的解决方案受到混合式场景生成方法及融合高斯溅射与扩散模型的生成式模拟器的启发，整体由两个阶段组成：首先，对场景进行三维重建，并从目标视角渲染出新视图；随后，利用专用的单步扩散模型对渲染帧进行增强。我们详细讨论了高斯基元初始化、增强模型的微调策略以及训练数据的筛选方法。我们报告了该模型设计在新视图质量方面的性能表现，并通过PSNR、SSIM和LPIPS指标对各组件进行消融分析。在官方测试集排行榜上，我们的方法获得了总分0.432的成绩，排名第二。\n"
  },
  {
    "path": "abs/2510.12493.md",
    "content": "### BSGS: Bi-stage 3D Gaussian Splatting for Camera Motion Deblurring\n\n3D Gaussian Splatting has exhibited remarkable capabilities in 3D scene reconstruction. However, reconstructing high-quality 3D scenes from motion-blurred images caused by camera motion poses a significant challenge. The performance of existing 3DGS-based deblurring methods are limited due to their inherent mechanisms, such as extreme dependence on the accuracy of camera poses and inability to effectively control erroneous Gaussian primitives densification caused by motion blur. To solve these problems, we introduce a novel framework, Bi-Stage 3D Gaussian Splatting, to accurately reconstruct 3D scenes from motion-blurred images. BSGS contains two stages. First, Camera Pose Refinement roughly optimizes camera poses to reduce motion-induced distortions. Second, with fixed rough camera poses, Global Rigid Transformation further corrects motion-induced blur distortions. To alleviate multi-subframe gradient conflicts, we propose a subframe gradient aggregation strategy to optimize both stages. Furthermore, a space-time bi-stage optimization strategy is introduced to dynamically adjust primitive densification thresholds and prevent premature noisy Gaussian generation in blurred regions. Comprehensive experiments verify the effectiveness of our proposed deblurring method and show its superiority over the state of the arts.\n\n三维高斯溅射（3D Gaussian Splatting）在三维场景重建方面展现出卓越能力。然而，针对因相机运动而产生的运动模糊图像进行高质量三维重建，仍然是一个极具挑战性的问题。现有基于3DGS的去模糊方法在性能上受限，主要原因在于其机制上的固有限制，例如对相机位姿精度的极度依赖，以及难以有效控制运动模糊导致的错误高斯基元过度密化问题。为解决上述难题，我们提出了一种新颖的双阶段三维高斯溅射框架（Bi-Stage 3D Gaussian Splatting，BSGS），用于从运动模糊图像中精确重建三维场景。BSGS包括两个阶段：第一阶段为相机位姿粗调，用于粗略优化相机姿态以减小运动引起的畸变；第二阶段在固定粗位姿的基础上，通过全局刚体变换进一步校正由运动模糊带来的失真。为缓解多子帧梯度冲突，我们提出了一种子帧梯度聚合策略，用于联合优化两个阶段。此外，我们还引入了时空双阶段优化策略，动态调整基元密化阈值，有效防止在模糊区域中过早生成噪声高斯点。大量实验验证了我们所提出的去模糊方法的有效性，并显示其在性能上显著优于现有最先进技术。\n"
  },
  {
    "path": "abs/2510.12768.md",
    "content": "### Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction\n\nReconstructing dynamic 3D scenes from monocular input is fundamentally under-constrained, with ambiguities arising from occlusion and extreme novel views. While dynamic Gaussian Splatting offers an efficient representation, vanilla models optimize all Gaussian primitives uniformly, ignoring whether they are well or poorly observed. This limitation leads to motion drifts under occlusion and degraded synthesis when extrapolating to unseen views. We argue that uncertainty matters: Gaussians with recurring observations across views and time act as reliable anchors to guide motion, whereas those with limited visibility are treated as less reliable. To this end, we introduce USplat4D, a novel Uncertainty-aware dynamic Gaussian Splatting framework that propagates reliable motion cues to enhance 4D reconstruction. Our key insight is to estimate time-varying per-Gaussian uncertainty and leverages it to construct a spatio-temporal graph for uncertainty-aware optimization. Experiments on diverse real and synthetic datasets show that explicitly modeling uncertainty consistently improves dynamic Gaussian Splatting models, yielding more stable geometry under occlusion and high-quality synthesis at extreme viewpoints.\n\n从单目输入重建动态三维场景本质上是一个欠约束问题，遮挡和极端新视角带来的歧义使该任务更加困难。尽管动态高斯溅射提供了一种高效的表示方式，但基础模型对所有高斯基元一视同仁地进行优化，忽略了它们是否被充分观测。这一限制在遮挡区域易导致运动漂移，并在外推至未见视角时引发合成质量下降。我们认为，不确定性至关重要：在多个视角和时间中被重复观测的高斯基元可作为可靠锚点以引导运动，而可见性有限的基元则应视为不太可靠。为此，我们提出USplat4D，一种新颖的不确定性感知动态高斯溅射框架，能够传播可靠的运动信息以提升四维重建质量。其核心思想在于估计每个高斯基元的时间变化不确定性，并据此构建时空图，用于不确定性感知优化。在多个真实与合成数据集上的实验表明，显式建模不确定性能稳定提升动态高斯溅射模型的性能，在遮挡区域获得更稳定的几何结构，在极端视角下实现更高质量的合成效果。\n"
  },
  {
    "path": "abs/2510.12901.md",
    "content": "### SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms\n\nRigorous testing of autonomous robots, such as self-driving vehicles, is essential to ensure their safety in real-world deployments. This requires building high-fidelity simulators to test scenarios beyond those that can be safely or exhaustively collected in the real-world. Existing neural rendering methods based on NeRF and 3DGS hold promise but suffer from low rendering speeds or can only render pinhole camera models, hindering their suitability to applications that commonly require high-distortion lenses and LiDAR data. Multi-sensor simulation poses additional challenges as existing methods handle cross-sensor inconsistencies by favoring the quality of one modality at the expense of others. To overcome these limitations, we propose SimULi, the first method capable of rendering arbitrary camera models and LiDAR data in real-time. Our method extends 3DGUT, which natively supports complex camera models, with LiDAR support, via an automated tiling strategy for arbitrary spinning LiDAR models and ray-based culling. To address cross-sensor inconsistencies, we design a factorized 3D Gaussian representation and anchoring strategy that reduces mean camera and depth error by up to 40% compared to existing methods. SimULi renders 10-20x faster than ray tracing approaches and 1.5-10x faster than prior rasterization-based work (and handles a wider range of camera models). When evaluated on two widely benchmarked autonomous driving datasets, SimULi matches or exceeds the fidelity of existing state-of-the-art methods across numerous camera and LiDAR metrics.\n\n对自动驾驶汽车等自主机器人进行严格测试对于其在真实环境中的安全部署至关重要。这要求构建高保真模拟器，以测试那些在现实中难以安全或全面采集的场景。现有基于NeRF和3DGS的神经渲染方法虽具有潜力，但存在渲染速度慢或仅支持针孔相机模型的问题，限制了它们在需要高畸变镜头和LiDAR数据的常见应用场景中的适用性。多传感器模拟还面临额外挑战，目前的方法往往通过牺牲某一模态的质量来应对跨传感器不一致问题。为克服这些限制，我们提出了SimULi，这是首个能够实时渲染任意相机模型和LiDAR数据的方法。该方法在支持复杂相机模型的3DGUT基础上扩展了对LiDAR的支持，采用自动平铺策略来适配任意旋转LiDAR模型，并结合基于射线的剔除机制。为缓解跨传感器不一致问题，我们设计了一种分解式三维高斯表示及锚定策略，与现有方法相比可将平均相机误差与深度误差降低最多达40%。SimULi在渲染速度方面比光线追踪方法快10至20倍，比现有基于光栅化的方法快1.5至10倍，同时支持更广泛的相机模型。在两个广泛使用的自动驾驶数据集上进行评估时，SimULi在多个相机和LiDAR指标上均达到了或超过了现有最先进方法的保真度表现。\n"
  },
  {
    "path": "abs/2510.13381.md",
    "content": "#### Leveraging 2D Priors and SDF Guidance for Dynamic Urban Scene Rendering\n\nDynamic scene rendering and reconstruction play a crucial role in computer vision and augmented reality. Recent methods based on 3D Gaussian Splatting (3DGS), have enabled accurate modeling of dynamic urban scenes, but for urban scenes they require both camera and LiDAR data, ground-truth 3D segmentations and motion data in the form of tracklets or pre-defined object templates such as SMPL. In this work, we explore whether a combination of 2D object agnostic priors in the form of depth and point tracking coupled with a signed distance function (SDF) representation for dynamic objects can be used to relax some of these requirements. We present a novel approach that integrates Signed Distance Functions (SDFs) with 3D Gaussian Splatting (3DGS) to create a more robust object representation by harnessing the strengths of both methods. Our unified optimization framework enhances the geometric accuracy of 3D Gaussian splatting and improves deformation modeling within the SDF, resulting in a more adaptable and precise representation. We demonstrate that our method achieves state-of-the-art performance in rendering metrics even without LiDAR data on urban scenes. When incorporating LiDAR, our approach improved further in reconstructing and generating novel views across diverse object categories, without ground-truth 3D motion annotation. Additionally, our method enables various scene editing tasks, including scene decomposition, and scene composition.\n\n动态场景的渲染与重建在计算机视觉和增强现实中具有关键作用。近期基于三维高斯溅射（3D Gaussian Splatting, 3DGS）的方法实现了对动态城市场景的高精度建模，但对于城市环境，它们通常依赖于相机与LiDAR数据、三维分割的真实标签以及形式为tracklets或预定义对象模板（如SMPL）的运动信息。本文探讨了是否可以结合以深度和点追踪为形式的无类别二维先验信息，以及针对动态物体的有符号距离函数（Signed Distance Function, SDF）表示，以放宽这些数据要求。我们提出了一种新颖方法，将SDF与3DGS相结合，利用两者的优势构建更鲁棒的物体表示。我们的统一优化框架不仅提升了3DGS的几何精度，还增强了SDF内的变形建模能力，从而实现更灵活、精确的表示方式。实验表明，即便在没有LiDAR数据的情况下，我们的方法在渲染指标上也达到了当前最先进的性能；而在引入LiDAR数据后，我们的方法在无需真实3D运动标注的前提下，进一步提升了对多种物体类别的新视图重建与生成能力。此外，我们的方法还支持多种场景编辑任务，包括场景分解与组合。\n"
  },
  {
    "path": "abs/2510.13454.md",
    "content": "### VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator\n\nThe rapid progress of large, pretrained models for both visual content generation and 3D reconstruction opens up new possibilities for text-to-3D generation. Intuitively, one could obtain a formidable 3D scene generator if one were able to combine the power of a modern latent text-to-video model as \"generator\" with the geometric abilities of a recent (feedforward) 3D reconstruction system as \"decoder\". We introduce VIST3A, a general framework that does just that, addressing two main challenges. First, the two components must be joined in a way that preserves the rich knowledge encoded in their weights. We revisit model stitching, i.e., we identify the layer in the 3D decoder that best matches the latent representation produced by the text-to-video generator and stitch the two parts together. That operation requires only a small dataset and no labels. Second, the text-to-video generator must be aligned with the stitched 3D decoder, to ensure that the generated latents are decodable into consistent, perceptually convincing 3D scene geometry. To that end, we adapt direct reward finetuning, a popular technique for human preference alignment. We evaluate the proposed VIST3A approach with different video generators and 3D reconstruction models. All tested pairings markedly improve over prior text-to-3D models that output Gaussian splats. Moreover, by choosing a suitable 3D base model, VIST3A also enables high-quality text-to-pointmap generation.\n\n近年来，大规模预训练模型在视觉内容生成与三维重建领域的迅猛发展为文本到三维（text-to-3D）生成带来了全新机遇。直观来看，若能将现代潜变量文本到视频生成模型作为“生成器”与最新的（前馈式）三维重建系统作为“解码器”相结合，则有望构建出强大的三维场景生成器。我们提出了VIST3A，一个实现上述目标的通用框架，并针对其中的两大关键挑战提出解决方案。首先，这两个组件必须以一种能够保留其权重中丰富知识的方式进行衔接。我们重新审视了模型拼接的策略，即在三维解码器中识别最匹配文本到视频生成器所生成潜变量表示的层，并将两者拼接起来。该过程仅需少量数据且无需标签。其次，为确保生成的潜变量能够被解码为一致且感知上可信的三维场景几何结构，需对文本到视频生成器进行与拼接后的三维解码器的对齐。为此，我们采用了一种流行的人类偏好对齐技术——直接奖励微调（Direct Reward Finetuning）。我们在多种视频生成器与三维重建模型组合下对VIST3A进行了评估，所有组合在效果上均显著优于以往输出高斯点云的text-to-3D模型。此外，得益于灵活的三维基础模型选择，VIST3A还支持高质量的文本到点云图（text-to-pointmap）生成。\n"
  },
  {
    "path": "abs/2510.13678.md",
    "content": "### FlashWorld: High-quality 3D Scene Generation within Seconds\n\nWe propose FlashWorld, a generative model that produces 3D scenes from a single image or text prompt in seconds, 10~100× faster than previous works while possessing superior rendering quality. Our approach shifts from the conventional multi-view-oriented (MV-oriented) paradigm, which generates multi-view images for subsequent 3D reconstruction, to a 3D-oriented approach where the model directly produces 3D Gaussian representations during multi-view generation. While ensuring 3D consistency, 3D-oriented method typically suffers poor visual quality. FlashWorld includes a dual-mode pre-training phase followed by a cross-mode post-training phase, effectively integrating the strengths of both paradigms. Specifically, leveraging the prior from a video diffusion model, we first pre-train a dual-mode multi-view diffusion model, which jointly supports MV-oriented and 3D-oriented generation modes. To bridge the quality gap in 3D-oriented generation, we further propose a cross-mode post-training distillation by matching distribution from consistent 3D-oriented mode to high-quality MV-oriented mode. This not only enhances visual quality while maintaining 3D consistency, but also reduces the required denoising steps for inference. Also, we propose a strategy to leverage massive single-view images and text prompts during this process to enhance the model's generalization to out-of-distribution inputs. Extensive experiments demonstrate the superiority and efficiency of our method.\n\n我们提出了FlashWorld，一种能够在数秒内基于单张图像或文本提示生成三维场景的生成模型，其速度相比现有方法提升10到100倍，同时具备更优越的渲染质量。我们的方法突破了传统多视图导向（MV-oriented）范式——即先生成多视图图像再进行三维重建的流程，转而采用三维导向（3D-oriented）策略，在多视图生成阶段直接生成三维高斯表示。尽管3D导向方法能够保证三维一致性，但通常会带来较差的视觉质量。FlashWorld通过“双模态预训练 + 跨模态后训练”的方式，有效融合了两种范式的优点。具体而言，我们首先借助视频扩散模型的先验，预训练了一个支持MV导向与3D导向双模式的多视图扩散模型。在此基础上，我们提出了一种跨模态后训练蒸馏策略，将3D导向模式生成的分布与高质量MV导向模式对齐，从而在保持三维一致性的同时显著提升视觉质量，并减少推理过程中所需的去噪步数。此外，我们还提出了一种利用大规模单视图图像与文本提示的策略，以增强模型在处理分布外输入时的泛化能力。大量实验验证了我们方法在效率与质量上的显著优势。\n"
  },
  {
    "path": "abs/2510.13978.md",
    "content": "### Instant Skinned Gaussian Avatars for Web, Mobile and VR Applications\n\nWe present Instant Skinned Gaussian Avatars, a real-time and cross-platform 3D avatar system. Many approaches have been proposed to animate Gaussian Splatting, but they often require camera arrays, long preprocessing times, or high-end GPUs. Some methods attempt to convert Gaussian Splatting into mesh-based representations, achieving lightweight performance but sacrificing visual fidelity. In contrast, our system efficiently animates Gaussian Splatting by leveraging parallel splat-wise processing to dynamically follow the underlying skinned mesh in real time while preserving high visual fidelity. From smartphone-based 3D scanning to on-device preprocessing, the entire process takes just around five minutes, with the avatar generation step itself completed in only about 30 seconds. Our system enables users to instantly transform their real-world appearance into a 3D avatar, making it ideal for seamless integration with social media and metaverse applications.\n\n我们提出了Instant Skinned Gaussian Avatars，一种实时、跨平台的三维头像系统。尽管已有许多方法被提出用于驱动高斯溅射动画，但这些方法往往依赖摄像头阵列、长时间的预处理流程，或高端GPU设备。一些方法尝试将高斯溅射转换为基于网格的表示，以实现轻量级性能，但通常会牺牲视觉保真度。相比之下，我们的系统通过并行的逐点溅射处理策略，使高斯点能够实时动态地跟随底层蒙皮网格，从而在保持高视觉保真的同时实现高效动画驱动。从基于智能手机的三维扫描到设备端预处理，整个流程仅需约五分钟，其中头像生成环节只需约30秒。我们的系统让用户可以瞬间将现实外貌转化为三维头像，非常适合无缝集成到社交媒体和元宇宙应用中。\n"
  },
  {
    "path": "abs/2510.14081.md",
    "content": "### Capture, Canonicalize, Splat: Zero-Shot 3D Gaussian Avatars from Unstructured Phone Images\n\nWe present a novel, zero-shot pipeline for creating hyperrealistic, identity-preserving 3D avatars from a few unstructured phone images. Existing methods face several challenges: single-view approaches suffer from geometric inconsistencies and hallucinations, degrading identity preservation, while models trained on synthetic data fail to capture high-frequency details like skin wrinkles and fine hair, limiting realism. Our method introduces two key contributions: (1) a generative canonicalization module that processes multiple unstructured views into a standardized, consistent representation, and (2) a transformer-based model trained on a new, large-scale dataset of high-fidelity Gaussian splatting avatars derived from dome captures of real people. This \"Capture, Canonicalize, Splat\" pipeline produces static quarter-body avatars with compelling realism and robust identity preservation from unstructured photos.\n\n我们提出了一种新颖的零样本生成流程，能够仅通过几张非结构化手机照片创建高度逼真、身份保留良好的三维头像。现有方法面临诸多挑战：单视角方案容易产生几何不一致和虚构内容，损害身份还原效果；而基于合成数据训练的模型则难以捕捉皮肤皱纹、细发丝等高频细节，限制了真实感。为此，我们引入了两个关键贡献：（1）一个生成式标准化模块，用于将多个非结构化视角处理为统一、连续的表示；（2）一个基于Transformer的模型，训练于一个全新构建的大规模高保真高斯溅射头像数据集，该数据集来源于真实人像的穹顶捕捉。该“捕捉-标准化-溅射”（Capture, Canonicalize, Splat）流程能够从非结构化图像中生成静态四分身头像，兼具极高的真实感与稳健的身份保留能力。\n"
  },
  {
    "path": "abs/2510.14179.md",
    "content": "### Virtually Being: Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures\n\nWe introduce a framework that enables both multi-view character consistency and 3D camera control in video diffusion models through a novel customization data pipeline. We train the character consistency component with recorded volumetric capture performances re-rendered with diverse camera trajectories via 4D Gaussian Splatting (4DGS), lighting variability obtained with a video relighting model. We fine-tune state-of-the-art open-source video diffusion models on this data to provide strong multi-view identity preservation, precise camera control, and lighting adaptability. Our framework also supports core capabilities for virtual production, including multi-subject generation using two approaches: joint training and noise blending, the latter enabling efficient composition of independently customized models at inference time; it also achieves scene and real-life video customization as well as control over motion and spatial layout during customization. Extensive experiments show improved video quality, higher personalization accuracy, and enhanced camera control and lighting adaptability, advancing the integration of video generation into virtual production.\n\n我们提出了一个新框架，通过创新的数据定制流程，使视频扩散模型同时具备多视角角色一致性与三维相机控制能力。我们利用4D高斯溅射（4D Gaussian Splatting, 4DGS）对录制的体积捕捉表演进行多相机轨迹重渲染，并结合视频重光照模型引入光照变化，用以训练角色一致性模块。在此基础上，我们对开源的先进视频扩散模型进行微调，使其在多视角身份保留、精确相机控制与光照适应性方面具备强大能力。该框架还支持虚拟制作中的核心能力，包括多角色生成（采用联合训练与噪声融合两种方式，其中噪声融合支持在推理阶段高效组合独立定制模型）；此外，还支持场景与真实视频定制，以及对运动轨迹与空间布局的控制。大量实验结果表明，该方法在视频质量、个性化准确性、相机控制与光照适应性方面均有显著提升，推动了视频生成技术在虚拟制作中的深度融合。\n"
  },
  {
    "path": "abs/2510.14270.md",
    "content": "### GauSSmart: Enhanced 3D Reconstruction through 2D Foundation Models and Geometric Filtering\n\nScene reconstruction has emerged as a central challenge in computer vision, with approaches such as Neural Radiance Fields (NeRF) and Gaussian Splatting achieving remarkable progress. While Gaussian Splatting demonstrates strong performance on large-scale datasets, it often struggles to capture fine details or maintain realism in regions with sparse coverage, largely due to the inherent limitations of sparse 3D training data.\nIn this work, we propose GauSSmart, a hybrid method that effectively bridges 2D foundational models and 3D Gaussian Splatting reconstruction. Our approach integrates established 2D computer vision techniques, including convex filtering and semantic feature supervision from foundational models such as DINO, to enhance Gaussian-based scene reconstruction. By leveraging 2D segmentation priors and high-dimensional feature embeddings, our method guides the densification and refinement of Gaussian splats, improving coverage in underrepresented areas and preserving intricate structural details.\nWe validate our approach across three datasets, where GauSSmart consistently outperforms existing Gaussian Splatting in the majority of evaluated scenes. Our results demonstrate the significant potential of hybrid 2D-3D approaches, highlighting how the thoughtful combination of 2D foundational models with 3D reconstruction pipelines can overcome the limitations inherent in either approach alone.\n\n场景重建已成为计算机视觉领域的核心挑战之一，近年来包括Neural Radiance Fields（NeRF）和高斯溅射（Gaussian Splatting）在内的方法取得了显著进展。尽管高斯溅射在大规模数据集上表现优异，但在稀疏区域往往难以捕捉细节或保持真实感，这主要源于稀疏三维训练数据的固有限制。\n为了解决这一问题，我们提出了GauSSmart，一种有效融合二维基础模型与三维高斯溅射重建的混合方法。该方法结合了经典的二维计算机视觉技术，包括凸性滤波与来自DINO等基础模型的语义特征监督，从而增强基于高斯的场景重建。通过引入二维分割先验与高维特征嵌入，GauSSmart能够指导高斯点的密化与细化过程，提升在低覆盖区域的重建效果，并更好地保留精细的结构细节。\n我们在三个数据集上对该方法进行了验证，结果表明在大多数场景中，GauSSmart相较于现有高斯溅射方法表现出更优性能。我们的研究展示了2D-3D混合方法的巨大潜力，表明通过将二维基础模型与三维重建流程有机结合，有望突破单一方法固有的局限。\n"
  },
  {
    "path": "abs/2510.14564.md",
    "content": "### BalanceGS: Algorithm-System Co-design for Efficient 3D Gaussian Splatting Training on GPU\n\n3D Gaussian Splatting (3DGS) has emerged as a promising 3D reconstruction technique. The traditional 3DGS training pipeline follows three sequential steps: Gaussian densification, Gaussian projection, and color splatting. Despite its promising reconstruction quality, this conventional approach suffers from three critical inefficiencies: (1) Skewed density allocation during Gaussian densification, (2) Imbalanced computation workload during Gaussian projection and (3) Fragmented memory access during color splatting.\nTo tackle the above challenges, we introduce BalanceGS, the algorithm-system co-design for efficient training in 3DGS. (1) At the algorithm level, we propose heuristic workload-sensitive Gaussian density control to automatically balance point distributions - removing 80% redundant Gaussians in dense regions while filling gaps in sparse areas. (2) At the system level, we propose Similarity-based Gaussian sampling and merging, which replaces the static one-to-one thread-pixel mapping with adaptive workload distribution - threads now dynamically process variable numbers of Gaussians based on local cluster density. (3) At the mapping level, we propose reordering-based memory access mapping strategy that restructures RGB storage and enables batch loading in shared memory.\nExtensive experiments demonstrate that compared with 3DGS, our approach achieves a 1.44× training speedup on a NVIDIA A100 GPU with negligible quality degradation.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）已成为一种具有前景的三维重建技术。传统的3DGS训练流程通常包含三个顺序步骤：高斯密化、高斯投影和颜色溅射。尽管其重建质量令人瞩目，但该常规方法存在三项关键效率瓶颈：（1）高斯密化过程中密度分配不均，（2）高斯投影阶段计算负载不平衡，（3）颜色溅射阶段内存访问碎片化。\n为应对上述挑战，我们提出了BalanceGS，这是一种面向3DGS高效训练的算法-系统协同设计方法。（1）在算法层面，我们设计了启发式的负载敏感型高斯密度控制机制，自动平衡点云分布——在高密度区域剔除80%的冗余高斯，同时填补稀疏区域的空洞。（2）在系统层面，我们提出了基于相似度的高斯采样与合并方法，替代静态的一对一线程-像素映射，采用自适应负载分配机制——线程可根据局部聚类密度动态处理不同数量的高斯点。（3）在映射层面，我们设计了基于重排的内存访问映射策略，重构RGB数据的存储结构，实现共享内存中的批量加载。\n大量实验表明，与传统3DGS相比，BalanceGS在NVIDIA A100 GPU上实现了1.44倍的训练加速，且几乎无视觉质量损失。\n"
  },
  {
    "path": "abs/2510.14705.md",
    "content": "### Leveraging Learned Image Prior for 3D Gaussian Compression\n\nCompression techniques for 3D Gaussian Splatting (3DGS) have recently achieved considerable success in minimizing storage overhead for 3D Gaussians while preserving high rendering quality. Despite the impressive storage reduction, the lack of learned priors restricts further advances in the rate-distortion trade-off for 3DGS compression tasks. To address this, we introduce a novel 3DGS compression framework that leverages the powerful representational capacity of learned image priors to recover compression-induced quality degradation. Built upon initially compressed Gaussians, our restoration network effectively models the compression artifacts in the image space between degraded and original Gaussians. To enhance the rate-distortion performance, we provide coarse rendering residuals into the restoration network as side information. By leveraging the supervision of restored images, the compressed Gaussians are refined, resulting in a highly compact representation with enhanced rendering performance. Our framework is designed to be compatible with existing Gaussian compression methods, making it broadly applicable across different baselines. Extensive experiments validate the effectiveness of our framework, demonstrating superior rate-distortion performance and outperforming the rendering quality of state-of-the-art 3DGS compression methods while requiring substantially less storage.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）的压缩技术近年来在减少存储开销的同时保持高质量渲染方面取得了显著进展。尽管现有方法在压缩率方面表现出色，但由于缺乏可学习先验，其在码率-失真（rate-distortion）平衡上的进一步提升受限。为此，我们提出了一种新颖的3DGS压缩框架，利用图像领域中强大的可学习先验能力来恢复压缩所导致的质量退化。该方法以初步压缩后的高斯点为基础，引入一个恢复网络，在图像空间中建模降质高斯与原始高斯之间的压缩伪影。为了进一步提升码率-失真表现，我们将粗渲染残差信息作为辅助输入引入恢复网络。通过借助恢复图像的监督信号，压缩高斯得以进一步细化，从而获得更紧凑的表示和更高质量的渲染效果。该框架可与现有高斯压缩方法兼容，具有良好的通用性和适配性。大量实验证明，该方法在压缩性能上显著优于当前最先进的3DGS压缩方法，在大幅减少存储需求的同时，提升了最终渲染质量。\n"
  },
  {
    "path": "abs/2510.14977.md",
    "content": "### Terra: Explorable Native 3D World Model with Point Latents\n\nWorld models have garnered increasing attention for comprehensive modeling of the real world. However, most existing methods still rely on pixel-aligned representations as the basis for world evolution, neglecting the inherent 3D nature of the physical world. This could undermine the 3D consistency and diminish the modeling efficiency of world models. In this paper, we present Terra, a native 3D world model that represents and generates explorable environments in an intrinsic 3D latent space. Specifically, we propose a novel point-to-Gaussian variational autoencoder (P2G-VAE) that encodes 3D inputs into a latent point representation, which is subsequently decoded as 3D Gaussian primitives to jointly model geometry and appearance. We then introduce a sparse point flow matching network (SPFlow) for generating the latent point representation, which simultaneously denoises the positions and features of the point latents. Our Terra enables exact multi-view consistency with native 3D representation and architecture, and supports flexible rendering from any viewpoint with only a single generation process. Furthermore, Terra achieves explorable world modeling through progressive generation in the point latent space. We conduct extensive experiments on the challenging indoor scenes from ScanNet v2. Terra achieves state-of-the-art performance in both reconstruction and generation with high 3D consistency.\n\n世界模型因其对现实世界的全面建模能力而受到越来越多的关注。然而，大多数现有方法仍然依赖于像素对齐的表示作为世界演化的基础，忽视了物理世界本质上的三维属性，这可能削弱模型的三维一致性并降低建模效率。本文提出Terra，一种原生三维世界模型，能够在内在的三维潜空间中表示和生成可探索环境。具体而言，我们提出了一种新颖的点到高斯变分自编码器（P2G-VAE），该模型将三维输入编码为点状潜表示，并进一步解码为三维高斯基元，从而实现几何与外观的联合建模。我们还引入了稀疏点流匹配网络（SPFlow）以生成点状潜表示，该网络可同时对潜点的位置与特征进行去噪。得益于原生三维表示和架构，Terra具备严格的多视图一致性，仅通过一次生成过程即可支持从任意视角的灵活渲染。此外，Terra还通过在点潜空间中的逐步生成实现可探索的世界建模。我们在ScanNet v2这一具有挑战性的室内场景数据集上进行了大量实验，结果表明Terra在重建与生成任务中均达到了当前最先进的性能，并展现出极高的三维一致性。\n"
  },
  {
    "path": "abs/2510.15072.md",
    "content": "### SaLon3R: Structure-aware Long-term Generalizable 3D Reconstruction from Unposed Images\n\nRecent advances in 3D Gaussian Splatting (3DGS) have enabled generalizable, on-the-fly reconstruction of sequential input views. However, existing methods often predict per-pixel Gaussians and combine Gaussians from all views as the scene representation, leading to substantial redundancies and geometric inconsistencies in long-duration video sequences. To address this, we propose SaLon3R, a novel framework for Structure-aware, Long-term 3DGS Reconstruction. To our best knowledge, SaLon3R is the first online generalizable GS method capable of reconstructing over 50 views in over 10 FPS, with 50% to 90% redundancy removal. Our method introduces compact anchor primitives to eliminate redundancy through differentiable saliency-aware Gaussian quantization, coupled with a 3D Point Transformer that refines anchor attributes and saliency to resolve cross-frame geometric and photometric inconsistencies. Specifically, we first leverage a 3D reconstruction backbone to predict dense per-pixel Gaussians and a saliency map encoding regional geometric complexity. Redundant Gaussians are compressed into compact anchors by prioritizing high-complexity regions. The 3D Point Transformer then learns spatial structural priors in 3D space from training data to refine anchor attributes and saliency, enabling regionally adaptive Gaussian decoding for geometric fidelity. Without known camera parameters or test-time optimization, our approach effectively resolves artifacts and prunes the redundant 3DGS in a single feed-forward pass. Experiments on multiple datasets demonstrate our state-of-the-art performance on both novel view synthesis and depth estimation, demonstrating superior efficiency, robustness, and generalization ability for long-term generalizable 3D reconstruction.\n\n近年来，三维高斯溅射（3D Gaussian Splatting, 3DGS）在实现通用性强、可在线处理的逐帧视角重建方面取得了显著进展。然而，现有方法通常基于逐像素预测高斯表示，并将所有视角的高斯点直接组合为场景表示，这在处理长时序视频时易导致大量冗余和几何不一致问题。为此，我们提出SaLon3R，一种结构感知的长期3DGS重建新框架。据我们所知，SaLon3R是首个支持在线、通用的3DGS方法，能够以每秒超过10帧的速度重建超过50个视角，并实现50%至90%的冗余消除。该方法引入紧凑的锚点基元，通过可微的显著性感知高斯量化机制消除冗余，并结合三维点云Transformer网络（3D Point Transformer）对锚点属性和显著性进行优化，从而缓解跨帧几何与光照不一致问题。具体而言，我们首先利用三维重建骨干网络预测密集的逐像素高斯表示和编码区域几何复杂度的显著性图；再根据复杂区域优先原则，将冗余高斯压缩为紧凑锚点。随后，3D Point Transformer在三维空间中学习空间结构先验，对锚点属性与显著性进行细化，使解码过程在区域层面具备几何自适应性。在无需已知相机参数或测试时优化的前提下，我们的方法可在一次前向推理中有效修复伪影并裁剪冗余高斯。大量数据集实验表明，SaLon3R在新视图合成与深度估计任务上均达成当前最优性能，展现出在长期通用三维重建任务中的卓越效率、鲁棒性与泛化能力。\n"
  },
  {
    "path": "abs/2510.15264.md",
    "content": "### DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion\n\nWe present DriveGen3D, a novel framework for generating high-quality and highly controllable dynamic 3D driving scenes that addresses critical limitations in existing methodologies. Current approaches to driving scene synthesis either suffer from prohibitive computational demands for extended temporal generation, focus exclusively on prolonged video synthesis without 3D representation, or restrict themselves to static single-scene reconstruction. Our work bridges this methodological gap by integrating accelerated long-term video generation with large-scale dynamic scene reconstruction through multimodal conditional control. DriveGen3D introduces a unified pipeline consisting of two specialized components: FastDrive-DiT, an efficient video diffusion transformer for high-resolution, temporally coherent video synthesis under text and Bird's-Eye-View (BEV) layout guidance; and FastRecon3D, a feed-forward reconstruction module that rapidly builds 3D Gaussian representations across time, ensuring spatial-temporal consistency. Together, these components enable real-time generation of extended driving videos (up to 424×800 at 12 FPS) and corresponding dynamic 3D scenes, achieving SSIM of 0.811 and PSNR of 22.84 on novel view synthesis, all while maintaining parameter efficiency.\n\n我们提出了DriveGen3D，一种新颖的框架，旨在生成高质量、高可控性的动态三维驾驶场景，解决现有方法中的关键瓶颈。目前主流的驾驶场景合成方法要么在长时间序列生成上计算成本过高，要么仅关注视频合成而不具备三维表示能力，或者仅限于静态单场景的重建。DriveGen3D通过多模态条件控制，将加速的长期视频生成与大规模动态场景重建相融合，有效弥合了这一方法学断层。\n该框架引入了一个统一流程，由两个专用模块构成：FastDrive-DiT，是一个高效的视频扩散Transformer，可在文本和俯视图（BEV）布局引导下，合成高分辨率、时序连贯的视频；FastRecon3D，是一个前馈式三维重建模块，能够快速在时间序列上构建三维高斯表示，确保时空一致性。两者协同运行，实现了实时生成长时间驾驶视频（分辨率达424×800，帧率达12 FPS）及其对应的动态三维场景。在新视图合成任务中，该框架取得了SSIM 0.811 和 PSNR 22.84 的出色表现，同时保持了参数高效性。\n"
  },
  {
    "path": "abs/2510.15352.md",
    "content": "### GaussGym: An open-source real-to-sim framework for learning locomotion from pixels\n\nWe present a novel approach for photorealistic robot simulation that integrates 3D Gaussian Splatting as a drop-in renderer within vectorized physics simulators such as IsaacGym. This enables unprecedented speed -- exceeding 100,000 steps per second on consumer GPUs -- while maintaining high visual fidelity, which we showcase across diverse tasks. We additionally demonstrate its applicability in a sim-to-real robotics setting. Beyond depth-based sensing, our results highlight how rich visual semantics improve navigation and decision-making, such as avoiding undesirable regions. We further showcase the ease of incorporating thousands of environments from iPhone scans, large-scale scene datasets (e.g., GrandTour, ARKit), and outputs from generative video models like Veo, enabling rapid creation of realistic training worlds. This work bridges high-throughput simulation and high-fidelity perception, advancing scalable and generalizable robot learning. All code and data will be open-sourced for the community to build upon.\n\n我们提出了一种用于真实感机器人仿真的新方法，将三维高斯溅射（3D Gaussian Splatting）作为即插即用的渲染器集成进如IsaacGym等向量化物理模拟器中。该方法在保持高视觉保真度的同时，实现了前所未有的模拟速度——在消费级GPU上每秒可超过100,000步，并在多项任务中展示了出色效果。我们还展示了该方法在“模拟到现实”（sim-to-real）机器人任务中的适用性。除了基于深度的感知外，我们的实验结果还表明，丰富的视觉语义信息有助于提升导航与决策能力，例如避开不安全区域。此外，我们展示了如何轻松地将数千个环境集成进系统，包括iPhone扫描数据、大规模场景数据集（如GrandTour、ARKit）以及生成式视频模型（如Veo）的输出，从而可快速构建逼真的训练世界。该工作打通了高吞吐量仿真与高保真感知之间的桥梁，推动了可扩展、可泛化的机器人学习发展。我们将开放所有代码和数据，供社区共享与扩展。\n"
  },
  {
    "path": "abs/2510.15386.md",
    "content": "### PFGS: Pose-Fused 3D Gaussian Splatting for Complete Multi-Pose Object Reconstruction\n\nRecent advances in 3D Gaussian Splatting (3DGS) have enabled high-quality, real-time novel-view synthesis from multi-view images. However, most existing methods assume the object is captured in a single, static pose, resulting in incomplete reconstructions that miss occluded or self-occluded regions. We introduce PFGS, a pose-aware 3DGS framework that addresses the practical challenge of reconstructing complete objects from multi-pose image captures. Given images of an object in one main pose and several auxiliary poses, PFGS iteratively fuses each auxiliary set into a unified 3DGS representation of the main pose. Our pose-aware fusion strategy combines global and local registration to merge views effectively and refine the 3DGS model. While recent advances in 3D foundation models have improved registration robustness and efficiency, they remain limited by high memory demands and suboptimal accuracy. PFGS overcomes these challenges by incorporating them more intelligently into the registration process: it leverages background features for per-pose camera pose estimation and employs foundation models for cross-pose registration. This design captures the best of both approaches while resolving background inconsistency issues. Experimental results demonstrate that PFGS consistently outperforms strong baselines in both qualitative and quantitative evaluations, producing more complete reconstructions and higher-fidelity 3DGS models.\n\n三维高斯散射（3D Gaussian Splatting，简称 3DGS）领域的最新进展，使得基于多视图图像的高质量、实时新视角合成成为可能。然而，现有大多数方法假设目标物体处于单一静态姿态，从而导致重建结果不完整，遗漏了被遮挡或自遮挡的区域。为了解决从多姿态图像中重建完整物体的实际挑战，我们提出了 PFGS —— 一种姿态感知的 3DGS 框架。给定一个物体在一个主姿态和若干辅助姿态下的图像，PFGS 通过迭代方式将每个辅助姿态的信息融合到主姿态对应的统一 3DGS 表示中。我们提出的姿态感知融合策略结合了全局与局部配准方法，能够有效整合各视角信息，并细化最终的 3DGS 模型。尽管近期的三维基础模型在配准的鲁棒性与效率方面取得了进展，但仍受限于高内存消耗和精度不佳的问题。PFGS 通过更智能地将这些基础模型引入配准流程，有效克服了上述挑战：它利用背景特征进行每个姿态的相机位姿估计，同时使用基础模型实现跨姿态的配准操作。该设计兼顾两种方法的优势，并解决了背景不一致的问题。实验结果表明，PFGS 在定性与定量评估中均显著优于强基线，能够实现更完整的重建效果与更高保真度的 3DGS 模型。\n"
  },
  {
    "path": "abs/2510.15736.md",
    "content": "### Fix False Transparency by Noise Guided Splatting\n\nOpaque objects reconstructed by 3DGS often exhibit a falsely transparent surface, leading to inconsistent background and internal patterns under camera motion in interactive viewing. This issue stems from the ill-posed optimization in 3DGS. During training, background and foreground Gaussians are blended via alpha-compositing and optimized solely against the input RGB images using a photometric loss. As this process lacks an explicit constraint on surface opacity, the optimization may incorrectly assign transparency to opaque regions, resulting in view-inconsistent and falsely transparent. This issue is difficult to detect in standard evaluation settings but becomes particularly evident in object-centric reconstructions under interactive viewing. Although other causes of view-inconsistency have been explored recently, false transparency has not been explicitly identified. To the best of our knowledge, we are the first to identify, characterize, and develop solutions for this artifact, an underreported artifact in 3DGS. Our strategy, NGS, encourages surface Gaussians to adopt higher opacity by injecting opaque noise Gaussians in the object volume during training, requiring only minimal modifications to the existing splatting process. To quantitatively evaluate false transparency in static renderings, we propose a transmittance-based metric that measures the severity of this artifact. In addition, we introduce a customized, high-quality object-centric scan dataset exhibiting pronounced transparency issues, and we augment popular existing datasets with complementary infill noise specifically designed to assess the robustness of 3D reconstruction methods to false transparency. Experiments across multiple datasets show that NGS substantially reduces false transparency while maintaining competitive performance on standard rendering metrics, demonstrating its overall effectiveness.\n\n由 3D 高斯散射（3DGS）重建的非透明物体，常常表现出伪透明表面，在交互式浏览中随相机移动呈现出背景和物体内部纹理不一致的问题。这一问题源于 3DGS 中病态的优化过程。在训练阶段，前景与背景高斯通过 alpha 混合进行融合，并仅以输入的 RGB 图像为监督信号，通过光度损失进行优化。由于该过程缺乏对表面不透明性的显式约束，优化过程可能错误地将透明性赋予本应不透明的区域，从而导致视角不一致并产生伪透明现象。该问题在标准评估设定下难以察觉，但在以物体为中心的交互式重建任务中尤为明显。尽管已有研究探讨了其他导致视角不一致的因素，但“伪透明”现象尚未被明确识别。据我们所知，我们是首个识别、分析并提出解决方案来应对这一问题的工作，该现象是 3DGS 中一个被低估的伪影。我们提出的策略 NGS，通过在训练期间向物体内部注入不透明的噪声高斯，鼓励表面高斯趋向于更高的不透明性，只需对现有的 splatting 流程进行最小修改。为在静态渲染中对伪透明进行定量评估，我们提出了一种基于透射率的指标，用于衡量此类伪影的严重程度。此外，我们构建了一个定制的、高质量的以物体为中心的扫描数据集，展现出明显的伪透明问题；同时还在已有的主流数据集上加入了专门设计的填充噪声，用以评估 3D 重建方法对伪透明问题的鲁棒性。跨多个数据集的实验证明，NGS 能显著减少伪透明现象，同时在标准渲染指标上保持竞争性能，展示了其整体有效性。\n"
  },
  {
    "path": "abs/2510.16410.md",
    "content": "### REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting\n\nBridging the gap between complex human instructions and precise 3D object grounding remains a significant challenge in vision and robotics. Existing 3D segmentation methods often struggle to interpret ambiguous, reasoning-based instructions, while 2D vision-language models that excel at such reasoning lack intrinsic 3D spatial understanding. In this paper, we introduce REALM, an innovative MLLM-agent framework that enables open-world reasoning-based segmentation without requiring extensive 3D-specific post-training. We perform segmentation directly on 3D Gaussian Splatting representations, capitalizing on their ability to render photorealistic novel views that are highly suitable for MLLM comprehension. As directly feeding one or more rendered views to the MLLM can lead to high sensitivity to viewpoint selection, we propose a novel Global-to-Local Spatial Grounding strategy. Specifically, multiple global views are first fed into the MLLM agent in parallel for coarse-level localization, aggregating responses to robustly identify the target object. Then, several close-up novel views of the object are synthesized to perform fine-grained local segmentation, yielding accurate and consistent 3D masks. Extensive experiments show that REALM achieves remarkable performance in interpreting both explicit and implicit instructions across LERF, 3D-OVS, and our newly introduced REALM3D benchmarks. Furthermore, our agent framework seamlessly supports a range of 3D interaction tasks, including object removal, replacement, and style transfer, demonstrating its practical utility and versatility.\n\n在视觉与机器人领域，将复杂的人类指令与精准的三维物体定位相结合，仍是一项重大挑战。现有的三维分割方法往往难以理解含糊或基于推理的指令，而那些擅长语言推理的二维视觉语言模型则缺乏对三维空间的本体理解。本文提出了 REALM —— 一种创新的 MLLM-agent 框架，能够在无需大规模三维专属后训练的前提下，实现开放世界的推理驱动分割。我们直接在三维高斯散射（3D Gaussian Splatting）表示上执行分割任务，充分利用其可渲染高保真新视角的特性，使其非常适合被多模态大模型（MLLM）理解。由于直接将一幅或多幅渲染图输入 MLLM 会对视角选择高度敏感，我们提出了一种新颖的“全局到局部空间定位”策略。具体而言，首先并行地将多个全局视角输入 MLLM agent 进行粗粒度定位，通过聚合响应结果稳健地识别目标物体；随后再合成多个该物体的近距离新视角，用于精细的局部分割，从而获得准确一致的三维掩码。大量实验证明，REALM 在 LERF、3D-OVS 以及我们新提出的 REALM3D 基准上，对于显式与隐式指令的理解均表现出色。此外，该 agent 框架还可无缝支持一系列三维交互任务，如物体移除、替换与风格迁移，展现出良好的实用性与多样性。\n"
  },
  {
    "path": "abs/2510.16463.md",
    "content": "### HGC-Avatar: Hierarchical Gaussian Compression for Streamable Dynamic 3D Avatars\n\nRecent advances in 3D Gaussian Splatting (3DGS) have enabled fast, photorealistic rendering of dynamic 3D scenes, showing strong potential in immersive communication. However, in digital human encoding and transmission, the compression methods based on general 3DGS representations are limited by the lack of human priors, resulting in suboptimal bitrate efficiency and reconstruction quality at the decoder side, which hinders their application in streamable 3D avatar systems. We propose HGC-Avatar, a novel Hierarchical Gaussian Compression framework designed for efficient transmission and high-quality rendering of dynamic avatars. Our method disentangles the Gaussian representation into a structural layer, which maps poses to Gaussians via a StyleUNet-based generator, and a motion layer, which leverages the SMPL-X model to represent temporal pose variations compactly and semantically. This hierarchical design supports layer-wise compression, progressive decoding, and controllable rendering from diverse pose inputs such as video sequences or text. Since people are most concerned with facial realism, we incorporate a facial attention mechanism during StyleUNet training to preserve identity and expression details under low-bitrate constraints. Experimental results demonstrate that HGC-Avatar provides a streamable solution for rapid 3D avatar rendering, while significantly outperforming prior methods in both visual quality and compression efficiency.\n\n近年来，三维高斯散射（3D Gaussian Splatting, 简称 3DGS）的发展，使得动态三维场景的快速、照片级真实感渲染成为可能，展现出在沉浸式通信中的巨大潜力。然而，在数字人编码与传输过程中，基于通用 3DGS 表示的压缩方法由于缺乏人体先验，导致在解码端比特率效率和重建质量不足，限制了其在可流式传输的三维头像系统中的应用。为此，我们提出 HGC-Avatar —— 一种新颖的层次化高斯压缩（Hierarchical Gaussian Compression）框架，用于高效传输与高质量渲染动态虚拟人。我们的方法将高斯表示解耦为两个层次：结构层通过基于 StyleUNet 的生成器将姿态映射为高斯；运动层则借助 SMPL-X 模型以紧凑且语义化的方式表征时序姿态变化。该层次化设计支持分层压缩、渐进式解码，并能从多种姿态输入（如视频序列或文本）中进行可控渲染。考虑到人们对面部真实感的关注，我们在 StyleUNet 训练过程中引入了面部注意力机制，以在低比特率约束下保留身份和表情细节。实验结果表明，HGC-Avatar 提供了一种可流式传输的快速三维头像渲染方案，在视觉质量与压缩效率方面均显著优于现有方法。\n"
  },
  {
    "path": "abs/2510.16777.md",
    "content": "### GS2POSE: Marry Gaussian Splatting to 6D Object Pose Estimation\n\nAccurate 6D pose estimation of 3D objects is a fundamental task in computer vision, and current research typically predicts the 6D pose by establishing correspondences between 2D image features and 3D model features. However, these methods often face difficulties with textureless objects and varying illumination conditions. To overcome these limitations, we propose GS2POSE, a novel approach for 6D object pose estimation. GS2POSE formulates a pose regression algorithm inspired by the principles of Bundle Adjustment (BA). By leveraging Lie algebra, we extend the capabilities of 3DGS to develop a pose-differentiable rendering pipeline, which iteratively optimizes the pose by comparing the input image to the rendered image. Additionally, GS2POSE updates color parameters within the 3DGS model, enhancing its adaptability to changes in illumination. Compared to previous models, GS2POSE demonstrates accuracy improvements of 1.4\\%, 2.8\\% and 2.5\\% on the T-LESS, LineMod-Occlusion and LineMod datasets, respectively.\n\n三维物体的 6D 姿态估计是计算机视觉中的一项基础任务，目前主流研究通常通过建立二维图像特征与三维模型特征之间的对应关系来预测 6D 姿态。然而，这些方法在处理无纹理物体或光照变化时常常面临挑战。为克服上述限制，我们提出了一种用于 6D 物体姿态估计的新方法 —— GS2POSE。GS2POSE 借鉴束束调整（Bundle Adjustment, BA）原理，设计了一种姿态回归算法。通过引入李代数（Lie Algebra），我们扩展了 3DGS 的能力，构建了一个可导的姿态渲染管线，该管线通过将输入图像与渲染图像进行比对，迭代优化物体姿态。此外，GS2POSE 还会在训练过程中动态更新 3DGS 模型中的颜色参数，从而增强其对光照变化的适应性。与现有方法相比，GS2POSE 在 T-LESS、LineMod-Occlusion 和 LineMod 数据集上分别取得了 1.4%、2.8% 和 2.5% 的精度提升。\n"
  },
  {
    "path": "abs/2510.16837.md",
    "content": "### 2DGS-R: Revisiting the Normal Consistency Regularization in 2D Gaussian Splatting\n\n\nRecent advancements in 3D Gaussian Splatting (3DGS) have greatly influenced neural fields, as it enables high-fidelity rendering with impressive visual quality. However, 3DGS has difficulty accurately representing surfaces. In contrast, 2DGS transforms the 3D volume into a collection of 2D planar Gaussian disks. Despite advancements in geometric fidelity, rendering quality remains compromised, highlighting the challenge of achieving both high-quality rendering and precise geometric structures. This indicates that optimizing both geometric and rendering quality in a single training stage is currently unfeasible. To overcome this limitation, we present 2DGS-R, a new method that uses a hierarchical training approach to improve rendering quality while maintaining geometric accuracy. 2DGS-R first trains the original 2D Gaussians with the normal consistency regularization. Then 2DGS-R selects the 2D Gaussians with inadequate rendering quality and applies a novel in-place cloning operation to enhance the 2D Gaussians. Finally, we fine-tune the 2DGS-R model with opacity frozen. Experimental results show that compared to the original 2DGS, our method requires only 1% more storage and minimal additional training time. Despite this negligible overhead, it achieves high-quality rendering results while preserving fine geometric structures. These findings indicate that our approach effectively balances efficiency with performance, leading to improvements in both visual fidelity and geometric reconstruction accuracy.\n\n三维高斯散射（3D Gaussian Splatting，简称 3DGS）的最新进展极大地推动了神经场的发展，其出色的视觉质量使高保真渲染成为可能。然而，3DGS 在精确表面表达方面仍存在困难。相较之下，2DGS 将三维体积转化为一组二维平面高斯圆盘。尽管几何保真度有所提升，但渲染质量仍受到影响，凸显了在保持高质量渲染与精确几何结构之间取得双重优化的挑战。这表明，在单阶段训练过程中同时优化几何与渲染质量仍不可行。为克服该限制，我们提出了 2DGS-R —— 一种通过分层训练策略提升渲染质量的同时保持几何精度的新方法。2DGS-R 首先在原始 2D 高斯上引入法线一致性正则项进行训练；随后选出渲染质量不足的高斯，并通过一种新颖的“原地克隆”操作对其增强；最后在冻结透明度的情况下对整个 2DGS-R 模型进行微调。实验结果表明，与原始 2DGS 相比，我们的方法仅增加约 1% 的存储开销，并带来极小的训练时间代价。尽管开销可以忽略不计，2DGS-R 依然实现了高质量的渲染效果，同时保留了精细的几何结构。这些结果表明，该方法在效率与性能之间实现了良好平衡，显著提升了视觉保真度与几何重建精度。\n"
  },
  {
    "path": "abs/2510.17095.md",
    "content": "### GSPlane: Concise and Accurate Planar Reconstruction via Structured Representation\n\nPlanes are fundamental primitives of 3D sences, especially in man-made environments such as indoor spaces and urban streets. Representing these planes in a structured and parameterized format facilitates scene editing and physical simulations in downstream applications. Recently, Gaussian Splatting (GS) has demonstrated remarkable effectiveness in the Novel View Synthesis task, with extensions showing great potential in accurate surface reconstruction. However, even state-of-the-art GS representations often struggle to reconstruct planar regions with sufficient smoothness and precision. To address this issue, we propose GSPlane, which recovers accurate geometry and produces clean and well-structured mesh connectivity for plane regions in the reconstructed scene. By leveraging off-the-shelf segmentation and normal prediction models, GSPlane extracts robust planar priors to establish structured representations for planar Gaussian coordinates, which help guide the training process by enforcing geometric consistency. To further enhance training robustness, a Dynamic Gaussian Re-classifier is introduced to adaptively reclassify planar Gaussians with persistently high gradients as non-planar, ensuring more reliable optimization. Furthermore, we utilize the optimized planar priors to refine the mesh layouts, significantly improving topological structure while reducing the number of vertices and faces. We also explore applications of the structured planar representation, which enable decoupling and flexible manipulation of objects on supportive planes. Extensive experiments demonstrate that, with no sacrifice in rendering quality, the introduction of planar priors significantly improves the geometric accuracy of the extracted meshes across various baselines.\n\n平面是三维场景中的基本构成元素，尤其在人造环境中，如室内空间与城市街道中尤为常见。将这些平面以结构化、参数化的方式进行表示，有助于在后续应用中实现场景编辑与物理模拟。近年来，高斯散射（Gaussian Splatting, 简称 GS）在新视角合成任务中表现出显著效果，其扩展方法也在高精度表面重建方面展现出巨大潜力。然而，即使是最先进的 GS 表示，在重建平面区域时，仍常面临平滑度与精度不足的问题。为此，我们提出 GSPlane —— 一种针对平面区域的重建方法，能够恢复精确几何结构，并生成干净且结构良好的网格连接。GSPlane 借助现成的分割与法线预测模型，提取稳健的平面先验，用于构建平面高斯坐标的结构化表示，并通过几何一致性约束，引导训练过程。为了进一步提升训练的稳健性，我们引入了一种动态高斯重分类器（Dynamic Gaussian Re-classifier），用于自适应地将持续保持高梯度的平面高斯重新分类为非平面，以实现更可靠的优化。此外，我们还利用优化后的平面先验对网格布局进行细化，在显著提升拓扑结构的同时，有效减少顶点与面片数量。我们还探索了结构化平面表示的应用场景，如支撑面上物体的解耦与灵活操控。大量实验证明，在不牺牲渲染质量的前提下，引入平面先验可显著提升所提取网格的几何精度，在多个基准方法上均表现出优越性能。\n"
  },
  {
    "path": "abs/2510.17479.md",
    "content": "### Initialize to Generalize: A Stronger Initialization Pipeline for Sparse-View 3DGS\n\nSparse-view 3D Gaussian Splatting (3DGS) often overfits to the training views, leading to artifacts like blurring in novel view rendering. Prior work addresses it either by enhancing the initialization (i.e., the point cloud from Structure-from-Motion (SfM)) or by adding training-time constraints (regularization) to the 3DGS optimization. Yet our controlled ablations reveal that initialization is the decisive factor: it determines the attainable performance band in sparse-view 3DGS, while training-time constraints yield only modest within-band improvements at extra cost. Given initialization's primacy, we focus our design there. Although SfM performs poorly under sparse views due to its reliance on feature matching, it still provides reliable seed points. Thus, building on SfM, our effort aims to supplement the regions it fails to cover as comprehensively as possible. Specifically, we design: (i) frequency-aware SfM that improves low-texture coverage via low-frequency view augmentation and relaxed multi-view correspondences; (ii) 3DGS self-initialization that lifts photometric supervision into additional points, compensating SfM-sparse regions with learned Gaussian centers; and (iii) point-cloud regularization that enforces multi-view consistency and uniform spatial coverage through simple geometric/visibility priors, yielding a clean and reliable point cloud. Our experiments on LLFF and Mip-NeRF360 demonstrate consistent gains in sparse-view settings, establishing our approach as a stronger initialization strategy.\n\n在稀疏视角条件下，三维高斯散射（3D Gaussian Splatting，简称 3DGS）常常容易过拟合训练视角，从而在新视角渲染中出现模糊等伪影。以往的研究主要通过两种方式来缓解该问题：一是提升初始化质量（即来自 SfM 的点云），二是在训练过程中引入额外约束（正则项）优化 3DGS。然而，我们的对照消融实验表明，初始化才是决定性因素：它决定了在稀疏视角条件下 3DGS 所能达到的性能上限，而训练时的约束仅能在该性能带内带来有限改进，且需要额外成本。因此，我们将设计重点放在初始化阶段。尽管传统 SfM 在稀疏视角下由于依赖特征匹配而表现不佳，但它仍能提供可靠的种子点。因此，我们的目标是在 SfM 基础上尽可能补充其未覆盖的区域。具体而言，我们设计了以下三项改进：\n（i）频率感知 SfM，通过低频视角增强与宽松的多视图匹配机制，提升低纹理区域的覆盖能力；\n（ii）3DGS 自初始化，将光度监督提升为额外点，引入可学习的高斯中心以弥补 SfM 稀疏区域；\n（iii）点云正则化，通过几何与可见性先验，增强多视角一致性与空间覆盖均匀性，从而获得干净可靠的点云。\n我们在 LLFF 和 Mip-NeRF360 数据集上的实验证明，该方法在稀疏视角设置下带来一致性性能提升，确立了我们方法作为更优初始化策略的地位。\n"
  },
  {
    "path": "abs/2510.17719.md",
    "content": "### Raindrop GS: A Benchmark for 3D Gaussian Splatting under Raindrop Conditions\n\n3D Gaussian Splatting (3DGS) under raindrop conditions suffers from severe occlusions and optical distortions caused by raindrop contamination on the camera lens, substantially degrading reconstruction quality. Existing benchmarks typically evaluate 3DGS using synthetic raindrop images with known camera poses (constrained images), assuming ideal conditions. However, in real-world scenarios, raindrops often interfere with accurate camera pose estimation and point cloud initialization. Moreover, a significant domain gap between synthetic and real raindrops further impairs generalization. To tackle these issues, we introduce RaindropGS, a comprehensive benchmark designed to evaluate the full 3DGS pipeline-from unconstrained, raindrop-corrupted images to clear 3DGS reconstructions. Specifically, the whole benchmark pipeline consists of three parts: data preparation, data processing, and raindrop-aware 3DGS evaluation, including types of raindrop interference, camera pose estimation and point cloud initialization, single image rain removal comparison, and 3D Gaussian training comparison. First, we collect a real-world raindrop reconstruction dataset, in which each scene contains three aligned image sets: raindrop-focused, background-focused, and rain-free ground truth, enabling a comprehensive evaluation of reconstruction quality under different focus conditions. Through comprehensive experiments and analyses, we reveal critical insights into the performance limitations of existing 3DGS methods on unconstrained raindrop images and the varying impact of different pipeline components: the impact of camera focus position on 3DGS reconstruction performance, and the interference caused by inaccurate pose and point cloud initialization on reconstruction. These insights establish clear directions for developing more robust 3DGS methods under raindrop conditions.\n\n在雨滴干扰场景下，三维高斯散射（3D Gaussian Splatting，简称 3DGS）面临由雨滴附着在摄像头镜头上所导致的严重遮挡和光学畸变问题，从而大幅降低重建质量。现有评测基准通常基于合成的雨滴图像进行 3DGS 评估，这些图像具备已知相机位姿（即受约束图像），假设理想条件成立。然而在真实场景中，雨滴常常干扰相机位姿估计和点云初始化过程。此外，合成雨滴与真实雨滴之间存在显著的域差异，进一步影响模型泛化能力。为解决上述问题，我们提出 RaindropGS —— 一个面向雨滴干扰条件下的 3DGS 全流程评测基准，覆盖从无约束、被雨滴污染的图像输入到清晰 3DGS 重建结果的全链路评估。该评测基准由三个部分组成：数据准备、数据处理和雨滴感知的 3DGS 评估，涵盖雨滴干扰类型、相机位姿估计与点云初始化、单图像去雨方法对比以及 3DGS 训练对比。\n我们首先采集了一个真实世界的雨滴重建数据集，每个场景包含三组对齐图像：以雨滴为焦点、以背景为焦点、以及无雨的真实图像，用于全面评估不同焦点条件下的重建质量。通过系统实验与分析，我们揭示了现有 3DGS 方法在处理无约束雨滴图像时的性能瓶颈，并进一步分析了不同流程组件的影响：例如相机焦点位置对 3DGS 重建效果的影响，以及不准确的相机位姿与点云初始化对重建质量的干扰。这些发现为在雨滴场景下构建更鲁棒的 3DGS 方法提供了明确的发展方向。\n"
  },
  {
    "path": "abs/2510.17783.md",
    "content": "### Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats\n\nCommercial plant phenotyping systems using fixed cameras cannot perceive many plant details due to leaf occlusion. In this paper, we present Botany-Bot, a system for building detailed \"annotated digital twins\" of living plants using two stereo cameras, a digital turntable inside a lightbox, an industrial robot arm, and 3D segmentated Gaussian Splat models. We also present robot algorithms for manipulating leaves to take high-resolution indexable images of occluded details such as stem buds and the underside/topside of leaves. Results from experiments suggest that Botany-Bot can segment leaves with 90.8% accuracy, detect leaves with 86.2% accuracy, lift/push leaves with 77.9% accuracy, and take detailed overside/underside images with 77.3% accuracy.\n\n商业植物表型分析系统通常使用固定摄像头，因此在叶片遮挡的情况下难以获取植物的许多细节信息。本文提出了 Botany-Bot —— 一个用于构建植物“注释数字孪生体”的系统。该系统结合了两个双目摄像头、一个置于光箱内的数字转台、一只工业机械臂，以及三维分割高斯散射（3D segmentated Gaussian Splat）模型。我们还提出了一系列用于操控叶片的机器人算法，能够获取被遮挡细节（如茎部芽点和叶片正反面）的高分辨率、可索引图像。实验结果表明，Botany-Bot 在叶片分割任务上达到了 90.8% 的准确率，在叶片检测上为 86.2%，在叶片抬升/推动操作中达到 77.9%，在叶片正反面细节拍摄任务中实现了 77.3% 的准确率。\n"
  },
  {
    "path": "abs/2510.17864.md",
    "content": "### InsideOut: Integrated RGB-Radiative Gaussian Splatting for Comprehensive 3D Object Representation\n\nWe introduce InsideOut, an extension of 3D Gaussian splatting (3DGS) that bridges the gap between high-fidelity RGB surface details and subsurface X-ray structures. The fusion of RGB and X-ray imaging is invaluable in fields such as medical diagnostics, cultural heritage restoration, and manufacturing. We collect new paired RGB and X-ray data, perform hierarchical fitting to align RGB and X-ray radiative Gaussian splats, and propose an X-ray reference loss to ensure consistent internal structures. InsideOut effectively addresses the challenges posed by disparate data representations between the two modalities and limited paired datasets. This approach significantly extends the applicability of 3DGS, enhancing visualization, simulation, and non-destructive testing capabilities across various domains.\n\n我们提出了 InsideOut —— 一种扩展的三维高斯散射（3D Gaussian Splatting，简称 3DGS）方法，用于融合高保真的 RGB 表面细节与物体内部的 X 射线结构信息。RGB 与 X 射线成像的融合在医学诊断、文化遗产修复及制造业等领域具有重要价值。为实现该融合，我们采集了新的 RGB-X 射线配对数据，通过分层拟合方法对 RGB 与 X 射线的辐射高斯散点进行对齐，并引入 X 射线参考损失项，以确保内部结构的一致性。InsideOut 有效应对了跨模态数据表示差异和配对数据有限所带来的挑战，显著拓展了 3DGS 在可视化、模拟及无损检测等多领域的应用能力。\n"
  },
  {
    "path": "abs/2510.18054.md",
    "content": "### HouseTour: A Virtual Real Estate A(I)gent\n\nWe introduce HouseTour, a method for spatially-aware 3D camera trajectory and natural language summary generation from a collection of images depicting an existing 3D space. Unlike existing vision-language models (VLMs), which struggle with geometric reasoning, our approach generates smooth video trajectories via a diffusion process constrained by known camera poses and integrates this information into the VLM for 3D-grounded descriptions. We synthesize the final video using 3D Gaussian splatting to render novel views along the trajectory. To support this task, we present the HouseTour dataset, which includes over 1,200 house-tour videos with camera poses, 3D reconstructions, and real estate descriptions. Experiments demonstrate that incorporating 3D camera trajectories into the text generation process improves performance over methods handling each task independently. We evaluate both individual and end-to-end performance, introducing a new joint metric. Our work enables automated, professional-quality video creation for real estate and touristic applications without requiring specialized expertise or equipment.\n\n我们提出了 HouseTour —— 一种面向现有三维空间图像集合的空间感知型 3D 相机轨迹生成与自然语言摘要生成方法。不同于现有视觉语言模型（VLM）在几何推理方面的不足，我们的方法通过受已知相机位姿约束的扩散过程生成平滑的视频轨迹，并将这些空间信息注入 VLM，实现基于三维场景的描述生成。最终视频通过三维高斯散射（3D Gaussian Splatting）渲染生成，在轨迹路径上合成新视角图像。为支撑该任务，我们构建了 HouseTour 数据集，包含 1200 多个带有相机轨迹、三维重建与房产描述的房屋导览视频。实验表明，将 3D 相机轨迹引入文本生成过程相比于将两任务分开处理的方法取得了更优性能。我们对子任务与端到端流程均进行了评估，并提出了一项新的联合评估指标。我们的工作实现了无需专业知识或设备即可自动生成高质量房产与旅游视频内容的能力。\n"
  },
  {
    "path": "abs/2510.18101.md",
    "content": "### From Volume Rendering to 3D Gaussian Splatting: Theory and Applications\n\nThe problem of 3D reconstruction from posed images is undergoing a fundamental transformation, driven by continuous advances in 3D Gaussian Splatting (3DGS). By modeling scenes explicitly as collections of 3D Gaussians, 3DGS enables efficient rasterization through volumetric splatting, offering thus a seamless integration with common graphics pipelines. Despite its real-time rendering capabilities for novel view synthesis, 3DGS suffers from a high memory footprint, the tendency to bake lighting effects directly into its representation, and limited support for secondary-ray effects. This tutorial provides a concise yet comprehensive overview of the 3DGS pipeline, starting from its splatting formulation and then exploring the main efforts in addressing its limitations. Finally, we survey a range of applications that leverage 3DGS for surface reconstruction, avatar modeling, animation, and content generation-highlighting its efficient rendering and suitability for feed-forward pipelines.\n\n基于已知相机位姿图像的三维重建问题正在经历一场根本性的变革，这一变革由三维高斯投影（3D Gaussian Splatting，简称3DGS）的持续进步所推动。3DGS通过将场景显式建模为多个三维高斯分布的集合，实现了基于体素投影的高效光栅化，从而能够与常见图形渲染管线无缝集成。尽管3DGS在新视角合成中具备实时渲染能力，但它仍存在显著的内存占用、将光照效果直接烘焙进表示的倾向，以及对次级光线效应支持有限等问题。本教程对3DGS流程进行了简洁而全面的概述，首先介绍其投影表达方式，随后探讨当前主要的改进方向与解决方案。最后，我们回顾了一系列利用3DGS实现的应用，包括表面重建、虚拟人建模、动画生成以及内容创作，强调其高效渲染特性以及对前馈渲染流程的良好适配性。\n"
  },
  {
    "path": "abs/2510.18253.md",
    "content": "### OpenInsGaussian: Open-vocabulary Instance Gaussian Segmentation with Context-aware Cross-view Fusion\n\nUnderstanding 3D scenes is pivotal for autonomous driving, robotics, and augmented reality. Recent semantic Gaussian Splatting approaches leverage large-scale 2D vision models to project 2D semantic features onto 3D scenes. However, they suffer from two major limitations: (1) insufficient contextual cues for individual masks during preprocessing and (2) inconsistencies and missing details when fusing multi-view features from these 2D models. In this paper, we introduce OpenInsGaussian, an Open-vocabulary Ins}tance Gaussian segmentation framework with Context-aware Cross-view Fusion. Our method consists of two modules: Context-Aware Feature Extraction, which augments each mask with rich semantic context, and Attention-Driven Feature Aggregation, which selectively fuses multi-view features to mitigate alignment errors and incompleteness. Through extensive experiments on benchmark datasets, OpenInsGaussian achieves state-of-the-art results in open-vocabulary 3D Gaussian segmentation, outperforming existing baselines by a large margin. These findings underscore the robustness and generality of our proposed approach, marking a significant step forward in 3D scene understanding and its practical deployment across diverse real-world scenarios.\n\n三维场景理解对于自动驾驶、机器人技术以及增强现实具有关键意义。近年来的语义高斯投影方法利用大规模二维视觉模型，将二维语义特征投影至三维场景中。然而，这类方法存在两个主要局限：(1) 在预处理阶段，个体掩膜缺乏足够的上下文信息；(2) 在融合多视角特征时，容易出现不一致和细节缺失的问题。本文提出了**OpenInsGaussian**，一个基于上下文感知跨视角融合的**开放词汇实例高斯分割框架**。该方法由两个模块组成：上下文感知特征提取模块，为每个掩膜引入丰富语义上下文；注意力驱动特征聚合模块，有选择地融合多视角特征以缓解对齐误差和信息缺失。通过在多个基准数据集上的广泛实验证明，OpenInsGaussian在开放词汇三维高斯分割任务中取得了当前最优的性能，远超现有基线方法。上述结果充分展示了所提方法的鲁棒性与泛化能力，为三维场景理解及其在多种现实应用场景中的部署迈出了重要一步。\n"
  },
  {
    "path": "abs/2510.18739.md",
    "content": "### Moving Light Adaptive Colonoscopy Reconstruction via Illumination-Attenuation-Aware 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has emerged as a pivotal technique for real-time view synthesis in colonoscopy, enabling critical applications such as virtual colonoscopy and lesion tracking. However, the vanilla 3DGS assumes static illumination and that observed appearance depends solely on viewing angle, which causes incompatibility with the photometric variations in colonoscopic scenes induced by dynamic light source/camera. This mismatch forces most 3DGS methods to introduce structure-violating vaporous Gaussian blobs between the camera and tissues to compensate for illumination attenuation, ultimately degrading the quality of 3D reconstructions. Previous works only consider the illumination attenuation caused by light distance, ignoring the physical characters of light source and camera. In this paper, we propose ColIAGS, an improved 3DGS framework tailored for colonoscopy. To mimic realistic appearance under varying illumination, we introduce an Improved Appearance Modeling with two types of illumination attenuation factors, which enables Gaussians to adapt to photometric variations while preserving geometry accuracy. To ensure the geometry approximation condition of appearance modeling, we propose an Improved Geometry Modeling using high-dimensional view embedding to enhance Gaussian geometry attribute prediction. Furthermore, another cosine embedding input is leveraged to generate illumination attenuation solutions in an implicit manner. Comprehensive experimental results on standard benchmarks demonstrate that our proposed ColIAGS achieves the dual capabilities of novel view synthesis and accurate geometric reconstruction. It notably outperforms other state-of-the-art methods by achieving superior rendering fidelity while significantly reducing Depth MSE.\n\n三维高斯投影（3D Gaussian Splatting，简称3DGS）已成为结肠镜检查中实时视图合成的关键技术，支持如虚拟结肠镜和病灶追踪等重要应用。然而，原始的3DGS方法假设照明为静态，且观测到的外观仅依赖于视角，这与结肠镜场景中由动态光源/摄像头引起的光照变化不兼容。这种不匹配迫使大多数3DGS方法在摄像头与组织之间引入违反结构的雾状高斯斑点，以补偿光照衰减，最终导致三维重建质量下降。现有工作通常仅考虑由光源距离引起的光照衰减，忽略了光源和摄像头的物理特性。本文提出ColIAGS——一种面向结肠镜场景的改进型3DGS框架。为模拟在变化光照条件下的真实外观，我们引入了改进的外观建模机制，利用两类光照衰减因子，使高斯能够适应光度变化，同时保持几何精度。为满足外观建模对几何逼近的要求，我们提出改进的几何建模方法，引入高维视角嵌入以提升高斯几何属性的预测能力。此外，我们还利用另一组余弦嵌入作为输入，以隐式方式生成光照衰减解。大量标准基准测试的实验结果表明，所提出的ColIAGS不仅能够实现新视角合成，还具备高精度的几何重建能力。在渲染保真度和深度MSE显著下降方面，均显著优于现有最先进的方法。\n"
  },
  {
    "path": "abs/2510.19200.md",
    "content": "### GRASPLAT: Enabling dexterous grasping through novel view synthesis\n\nAchieving dexterous robotic grasping with multi-fingered hands remains a significant challenge. While existing methods rely on complete 3D scans to predict grasp poses, these approaches face limitations due to the difficulty of acquiring high-quality 3D data in real-world scenarios. In this paper, we introduce GRASPLAT, a novel grasping framework that leverages consistent 3D information while being trained solely on RGB images. Our key insight is that by synthesizing physically plausible images of a hand grasping an object, we can regress the corresponding hand joints for a successful grasp. To achieve this, we utilize 3D Gaussian Splatting to generate high-fidelity novel views of real hand-object interactions, enabling end-to-end training with RGB data. Unlike prior methods, our approach incorporates a photometric loss that refines grasp predictions by minimizing discrepancies between rendered and real images. We conduct extensive experiments on both synthetic and real-world grasping datasets, demonstrating that GRASPLAT improves grasp success rates up to 36.9% over existing image-based methods.\n\n实现多指灵巧手的机器人抓取仍然是一个重大挑战。现有方法通常依赖完整的三维扫描数据来预测抓取姿态，但在现实环境中获取高质量三维数据具有较大困难，限制了这些方法的应用。本文提出了一种新颖的抓取框架——GRASPLAT，它在训练过程中仅使用RGB图像，同时利用一致的三维信息来辅助学习。我们核心的洞见在于：通过合成物理上合理的手部抓取物体图像，可以反向推理出实现成功抓取所需的手部关节位置。为此，我们采用三维高斯投影（3D Gaussian Splatting）技术生成真实手物交互的高保真新视角图像，从而实现基于RGB数据的端到端训练。与以往方法不同，我们引入了一种光度损失项，通过最小化渲染图与真实图像之间的差异来优化抓取预测。我们在合成和真实抓取数据集上进行了大量实验证明，GRASPLAT在抓取成功率方面相比现有基于图像的方法最高可提升36.9%。\n"
  },
  {
    "path": "abs/2510.19210.md",
    "content": "### MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting\n\nRecent advances in dynamic scene reconstruction have significantly benefited from 3D Gaussian Splatting, yet existing methods show inconsistent performance across diverse scenes, indicating no single approach effectively handles all dynamic challenges. To overcome these limitations, we propose Mixture of Experts for Dynamic Gaussian Splatting (MoE-GS), a unified framework integrating multiple specialized experts via a novel Volume-aware Pixel Router. Our router adaptively blends expert outputs by projecting volumetric Gaussian-level weights into pixel space through differentiable weight splatting, ensuring spatially and temporally coherent results. Although MoE-GS improves rendering quality, the increased model capacity and reduced FPS are inherent to the MoE architecture. To mitigate this, we explore two complementary directions: (1) single-pass multi-expert rendering and gate-aware Gaussian pruning, which improve efficiency within the MoE framework, and (2) a distillation strategy that transfers MoE performance to individual experts, enabling lightweight deployment without architectural changes. To the best of our knowledge, MoE-GS is the first approach incorporating Mixture-of-Experts techniques into dynamic Gaussian splatting. Extensive experiments on the N3V and Technicolor datasets demonstrate that MoE-GS consistently outperforms state-of-the-art methods with improved efficiency.\n\n近年来，动态场景重建在三维高斯投影（3D Gaussian Splatting）技术的发展下取得了显著进展，但现有方法在不同场景中的表现仍不稳定，表明尚无一种方法能有效应对所有动态挑战。为克服这一局限，本文提出了一种面向动态高斯投影的专家混合方法（Mixture of Experts for Dynamic Gaussian Splatting，简称MoE-GS），该方法通过引入新颖的体素感知像素路由器（Volume-aware Pixel Router）实现多个专业子模型的统一融合。该路由器通过可微分的权重投影机制将体素级高斯权重映射到像素空间，从而自适应地融合各专家输出，确保空间和时间上的一致性。尽管MoE-GS提升了渲染质量，但由于其架构本身的特点，模型容量增大和帧率下降也随之而来。为缓解这一问题，我们探索了两个互补方向：(1) 单次多专家渲染与感知门控的高斯剪枝机制，以提升MoE架构下的运行效率；(2) 蒸馏策略，将MoE的表现迁移至单一专家模型，从而在不更改模型架构的前提下实现轻量化部署。据我们所知，MoE-GS是首个将专家混合机制引入动态高斯投影的工作。大量在N3V和Technicolor数据集上的实验结果表明，MoE-GS在保持高效性的同时，始终优于当前最先进方法。\n"
  },
  {
    "path": "abs/2510.19653.md",
    "content": "### Re-Activating Frozen Primitives for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3D-GS) achieves real-time photorealistic novel view synthesis, yet struggles with complex scenes due to over-reconstruction artifacts, manifesting as local blurring and needle-shape distortions. While recent approaches attribute these issues to insufficient splitting of large-scale Gaussians, we identify two fundamental limitations: gradient magnitude dilution during densification and the primitive frozen phenomenon, where essential Gaussian densification is inhibited in complex regions while suboptimally scaled Gaussians become trapped in local optima. To address these challenges, we introduce ReAct-GS, a method founded on the principle of re-activation. Our approach features: (1) an importance-aware densification criterion incorporating α-blending weights from multiple viewpoints to re-activate stalled primitive growth in complex regions, and (2) a re-activation mechanism that revitalizes frozen primitives through adaptive parameter perturbations. Comprehensive experiments across diverse real-world datasets demonstrate that ReAct-GS effectively eliminates over-reconstruction artifacts and achieves state-of-the-art performance on standard novel view synthesis metrics while preserving intricate geometric details. Additionally, our re-activation mechanism yields consistent improvements when integrated with other 3D-GS variants such as Pixel-GS, demonstrating its broad applicability.\n\n三维高斯投影（3D Gaussian Splatting，简称3D-GS）实现了实时的照片级新视角合成，但在处理复杂场景时仍面临重建过度伪影的问题，表现为局部模糊和针状形变。尽管近期方法将此类问题归因于大尺度高斯未能充分拆分，但我们识别出两个更为根本的限制因素：其一是密化过程中的梯度幅值稀释问题；其二是“原语冻结”现象——在复杂区域中，关键高斯未能实现有效密化，而子优化尺度的高斯则陷入局部最优。为解决上述问题，本文提出ReAct-GS，一种基于“重新激活”原理的方法。我们的方法主要包括两项设计：(1) 一种基于重要性感知的密化准则，该准则融合来自多视角的α混合权重，用于在复杂区域重新激活停滞的原语增长；(2) 一种重新激活机制，通过自适应参数扰动恢复被冻结的原语活性。我们在多个真实世界数据集上进行了全面实验，结果表明ReAct-GS有效消除了重建过度伪影，在标准的新视角合成评估指标上取得了当前最优性能，并能精细保留复杂几何结构。此外，该重新激活机制在与Pixel-GS等其他3D-GS变体结合时也表现出一致的性能提升，展示了其广泛的适用性。\n"
  },
  {
    "path": "abs/2510.20027.md",
    "content": "### Extreme Views: 3DGS Filter for Novel View Synthesis from Out-of-Distribution Camera Poses\n\nWhen viewing a 3D Gaussian Splatting (3DGS) model from camera positions significantly outside the training data distribution, substantial visual noise commonly occurs. These artifacts result from the lack of training data in these extrapolated regions, leading to uncertain density, color, and geometry predictions from the model.\nTo address this issue, we propose a novel real-time render-aware filtering method. Our approach leverages sensitivity scores derived from intermediate gradients, explicitly targeting instabilities caused by anisotropic orientations rather than isotropic variance. This filtering method directly addresses the core issue of generative uncertainty, allowing 3D reconstruction systems to maintain high visual fidelity even when users freely navigate outside the original training viewpoints.\nExperimental evaluation demonstrates that our method substantially improves visual quality, realism, and consistency compared to existing Neural Radiance Field (NeRF)-based approaches such as BayesRays. Critically, our filter seamlessly integrates into existing 3DGS rendering pipelines in real-time, unlike methods that require extensive post-hoc retraining or fine-tuning.\n\n当从远离训练数据分布的摄像机位置观察三维高斯投影（3D Gaussian Splatting，简称3DGS）模型时，常常会出现明显的视觉噪声。这些伪影源于外推区域缺乏训练数据，导致模型在密度、颜色和几何预测方面产生不确定性。\n为解决这一问题，本文提出一种新颖的实时渲染感知滤波方法。我们的方法利用中间梯度中提取的敏感度分数，重点抑制由各向异性方向引起的不稳定性，而非仅考虑各向同性方差。该滤波策略直接应对生成式不确定性的核心问题，使得三维重建系统即便在用户自由游览至训练视角之外时，也能保持较高的视觉保真度。\n实验评估表明，与BayesRays等现有基于NeRF的方法相比，我们的方法在视觉质量、真实感和一致性方面均有显著提升。更为关键的是，该滤波器可无缝实时集成至现有3DGS渲染流程中，而无需复杂的后期再训练或微调步骤。\n"
  },
  {
    "path": "abs/2510.20238.md",
    "content": "### COS3D: Collaborative Open-Vocabulary 3D Segmentation\n\nOpen-vocabulary 3D segmentation is a fundamental yet challenging task, requiring a mutual understanding of both segmentation and language. However, existing Gaussian-splatting-based methods rely either on a single 3D language field, leading to inferior segmentation, or on pre-computed class-agnostic segmentations, suffering from error accumulation. To address these limitations, we present COS3D, a new collaborative prompt-segmentation framework that contributes to effectively integrating complementary language and segmentation cues throughout its entire pipeline. We first introduce the new concept of collaborative field, comprising an instance field and a language field, as the cornerstone for collaboration. During training, to effectively construct the collaborative field, our key idea is to capture the intrinsic relationship between the instance field and language field, through a novel instance-to-language feature mapping and designing an efficient two-stage training strategy. During inference, to bridge distinct characteristics of the two fields, we further design an adaptive language-to-instance prompt refinement, promoting high-quality prompt-segmentation inference. Extensive experiments not only demonstrate COS3D's leading performance over existing methods on two widely-used benchmarks but also show its high potential to various applications,~\\ie, novel image-based 3D segmentation, hierarchical segmentation, and robotics.\n\n开放词汇三维分割是一项基础但极具挑战性的任务，要求模型在分割与语言之间建立深层次的相互理解。然而，现有基于高斯投影的方法要么依赖于单一的三维语言场，导致分割性能不足；要么依赖于预计算的类别无关分割结果，容易积累误差。为克服这些局限，本文提出了COS3D，一种新颖的协同提示分割框架，能够在整个流程中有效融合语言与分割的互补信息。\n我们首先提出“协同场”（collaborative field）的新概念，它由实例场和语言场共同构成，作为协同建模的基础。在训练阶段，为了高效构建协同场，我们的核心思想是通过创新的“实例到语言”特征映射方式，并设计高效的两阶段训练策略，从而建模实例场与语言场之间的内在关系。在推理阶段，为了弥合两种场之间的特征差异，我们进一步设计了自适应的“语言到实例”提示优化机制，以提升提示驱动分割的推理质量。\n大量实验证明，COS3D在两个广泛使用的基准上都取得了领先性能，并展现出在多种应用场景中的强大潜力，例如新颖图像驱动的三维分割、分层分割以及机器人系统等。\n"
  },
  {
    "path": "abs/2510.20605.md",
    "content": "### OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects\n\nFree-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion. We introduce OnlineSplatter, a novel online feed-forward framework generating high-quality, object-centric 3D Gaussians directly from RGB frames without requiring camera pose, depth priors, or bundle optimization. Our approach anchors reconstruction using the first frame and progressively refines the object representation through a dense Gaussian primitive field, maintaining constant computational cost regardless of video sequence length. Our core contribution is a dual-key memory module combining latent appearance-geometry keys with explicit directional keys, robustly fusing current frame features with temporally aggregated object states. This design enables effective handling of free-moving objects via spatial-guided memory readout and an efficient sparsification mechanism, ensuring comprehensive yet compact object coverage. Evaluations on real-world datasets demonstrate that OnlineSplatter significantly outperforms state-of-the-art pose-free reconstruction baselines, consistently improving with more observations while maintaining constant memory and runtime.\n\n从单目视频中重建自由运动物体仍是一项具有挑战性的任务，尤其是在缺乏可靠的位姿或深度信息，并且物体运动方式多变的情况下。为应对这一问题，本文提出了OnlineSplatter——一种新颖的在线前馈式重建框架，可直接从RGB图像帧生成高质量、以物体为中心的三维高斯表示，无需相机位姿、深度先验或Bundle Adjustment优化。\n该方法以首帧为锚点启动重建流程，并通过密集的高斯原语场对物体表示进行逐步细化，其计算开销在任意视频序列长度下始终保持恒定。我们工作的核心贡献在于提出了一种双键记忆模块，结合了潜在的外观-几何键与显式的方向键，能够稳健地融合当前帧特征与时间聚合的物体状态。该模块通过空间引导的记忆读取与高效稀疏化机制，有效应对自由运动物体的建模需求，实现全面而紧凑的物体表示覆盖。\n在多个真实世界数据集上的评估表明，OnlineSplatter在无需位姿条件下显著优于现有最先进的重建方法，且在增加观察次数的同时持续提升重建质量，同时保持恒定的内存占用与运行时间。\n"
  },
  {
    "path": "abs/2510.20813.md",
    "content": "### GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation\n\nThis paper presents GSWorld, a robust, photo-realistic simulator for robotics manipulation that combines 3D Gaussian Splatting with physics engines. Our framework advocates \"closing the loop\" of developing manipulation policies with reproducible evaluation of policies learned from real-robot data and sim2real policy training without using real robots. To enable photo-realistic rendering of diverse scenes, we propose a new asset format, which we term GSDF (Gaussian Scene Description File), that infuses Gaussian-on-Mesh representation with robot URDF and other objects. With a streamlined reconstruction pipeline, we curate a database of GSDF that contains 3 robot embodiments for single-arm and bimanual manipulation, as well as more than 40 objects. Combining GSDF with physics engines, we demonstrate several immediate interesting applications: (1) learning zero-shot sim2real pixel-to-action manipulation policy with photo-realistic rendering, (2) automated high-quality DAgger data collection for adapting policies to deployment environments, (3) reproducible benchmarking of real-robot manipulation policies in simulation, (4) simulation data collection by virtual teleoperation, and (5) zero-shot sim2real visual reinforcement learning.\n\n本文提出了GSWorld——一个结合三维高斯投影（3D Gaussian Splatting）与物理引擎的稳健、逼真的机器人操作仿真平台。该框架主张“闭环”机器人操作策略的开发流程，即在不使用真实机器人的前提下，实现基于真实机器人数据训练的策略评估与仿真到真实（sim2real）策略训练的可复现闭环流程。\n为实现多样场景的照片级渲染，我们提出了一种新型资产格式GSDF（Gaussian Scene Description File，高斯场景描述文件），将高斯-网格混合表达方式与机器人URDF模型和其他物体信息融合。在高效重建流程支持下，我们构建了一个GSDF数据库，包含3种用于单臂和双臂操作的机器人形态及40余种物体。\n结合GSDF与物理引擎，我们展示了多个具有现实意义的直接应用：（1）基于照片级渲染的零样本sim2real像素到动作操作策略学习；（2）自动化高质量DAgger数据采集以适配部署环境；（3）真实机器人操作策略在仿真中的可复现评测；（4）通过虚拟远程操控采集仿真数据；（5）零样本sim2real视觉强化学习。\n"
  },
  {
    "path": "abs/2510.21307.md",
    "content": "### Towards Physically Executable 3D Gaussian for Embodied Navigation\n\n3D Gaussian Splatting (3DGS), a 3D representation method with photorealistic real-time rendering capabilities, is regarded as an effective tool for narrowing the sim-to-real gap. However, it lacks fine-grained semantics and physical executability for Visual-Language Navigation (VLN). To address this, we propose SAGE-3D (Semantically and Physically Aligned Gaussian Environments for 3D Navigation), a new paradigm that upgrades 3DGS into an executable, semantically and physically aligned environment. It comprises two components: (1) Object-Centric Semantic Grounding, which adds object-level fine-grained annotations to 3DGS; and (2) Physics-Aware Execution Jointing, which embeds collision objects into 3DGS and constructs rich physical interfaces. We release InteriorGS, containing 1K object-annotated 3DGS indoor scene data, and introduce SAGE-Bench, the first 3DGS-based VLN benchmark with 2M VLN data. Experiments show that 3DGS scene data is more difficult to converge, while exhibiting strong generalizability, improving baseline performance by 31% on the VLN-CE Unseen task.\n\n3D 高斯泼洒（3DGS）是一种具有照片级真实感实时渲染能力的三维表示方法，被认为是缩小仿真到现实差距（sim-to-real gap）的有效工具。然而，它在视觉-语言导航（VLN）任务中缺乏细粒度语义和物理可执行性。为了解决这一问题，我们提出了 SAGE-3D（Semantically and Physically Aligned Gaussian Environments for 3D Navigation，语义与物理对齐的高斯三维导航环境），这是一种将 3DGS 升级为可执行、语义与物理对齐的新范式。该方法包含两个组成部分：（1）面向物体的语义对齐机制，为 3DGS 添加物体级细粒度标注；（2）具备物理感知的执行连接机制，在 3DGS 中嵌入碰撞物体并构建丰富的物理交互接口。我们发布了 InteriorGS，其中包含 1,000 个带有物体标注的 3DGS 室内场景数据，并引入了 SAGE-Bench，这是首个基于 3DGS 的 VLN 基准测试，包含 200 万条 VLN 数据。实验表明，3DGS 场景数据更难以收敛，但展现出较强的泛化能力，在 VLN-CE 的 Unseen 任务上将基线性能提升了 31%。\n"
  },
  {
    "path": "abs/2510.22140.md",
    "content": "### STG-Avatar: Animatable Human Avatars via Spacetime Gaussian\n\nRealistic animatable human avatars from monocular videos are crucial for advancing human-robot interaction and enhancing immersive virtual experiences. While recent research on 3DGS-based human avatars has made progress, it still struggles with accurately representing detailed features of non-rigid objects (e.g., clothing deformations) and dynamic regions (e.g., rapidly moving limbs). To address these challenges, we present STG-Avatar, a 3DGS-based framework for high-fidelity animatable human avatar reconstruction. Specifically, our framework introduces a rigid-nonrigid coupled deformation framework that synergistically integrates Spacetime Gaussians (STG) with linear blend skinning (LBS). In this hybrid design, LBS enables real-time skeletal control by driving global pose transformations, while STG complements it through spacetime adaptive optimization of 3D Gaussians. Furthermore, we employ optical flow to identify high-dynamic regions and guide the adaptive densification of 3D Gaussians in these regions. Experimental results demonstrate that our method consistently outperforms state-of-the-art baselines in both reconstruction quality and operational efficiency, achieving superior quantitative metrics while retaining real-time rendering capabilities.\n\n从单目视频中构建真实可动画的人体头像对推动人机交互和增强沉浸式虚拟体验至关重要。尽管近期基于 3DGS 的人体头像研究取得了进展，但在精确表示非刚体对象（如衣物变形）和动态区域（如快速移动的四肢）方面仍面临挑战。为了解决这些问题，我们提出了 STG-Avatar，这是一个基于 3DGS 的高保真可动画人体头像重建框架。具体而言，我们的框架引入了刚体-非刚体耦合的变形机制，将时空高斯（Spacetime Gaussians, STG）与线性混合蒙皮（Linear Blend Skinning, LBS）协同融合。在这种混合设计中，LBS 通过驱动全局姿态变换实现实时骨骼控制，而 STG 则通过对三维高斯进行时空自适应优化予以补充。此外，我们还利用光流识别高动态区域，并引导这些区域中 3D 高斯的自适应加密。实验结果表明，我们的方法在重建质量和运行效率方面均优于现有最先进的方法，在保持实时渲染能力的同时实现了更优的定量指标。\n"
  },
  {
    "path": "abs/2510.22213.md",
    "content": "### DynamicTree: Interactive Real Tree Animation via Sparse Voxel Spectrum\n\nGenerating dynamic and interactive 3D objects, such as trees, has wide applications in virtual reality, games, and world simulation. Nevertheless, existing methods still face various challenges in generating realistic 4D motion for complex real trees. In this paper, we propose DynamicTree, the first framework that can generate long-term, interactive animation of 3D Gaussian Splatting trees. Unlike prior optimization-based methods, our approach generates dynamics in a fast feed-forward manner. The key success of our approach is the use of a compact sparse voxel spectrum to represent the tree movement. Given a 3D tree from Gaussian Splatting reconstruction, our pipeline first generates mesh motion using the sparse voxel spectrum and then binds Gaussians to deform the mesh. Additionally, the proposed sparse voxel spectrum can also serve as a basis for fast modal analysis under external forces, allowing real-time interactive responses. To train our model, we also introduce 4DTree, the first large-scale synthetic 4D tree dataset containing 8,786 animated tree meshes with semantic labels and 100-frame motion sequences. Extensive experiments demonstrate that our method achieves realistic and responsive tree animations, significantly outperforming existing approaches in both visual quality and computational efficiency.\n\n生成动态且具交互性的三维物体（如树木）在虚拟现实、游戏和世界仿真等领域具有广泛应用。然而，现有方法在为复杂真实树木生成逼真的四维（4D）运动方面仍面临诸多挑战。为此，我们提出了 DynamicTree，这是首个可生成 3D 高斯泼洒树木长期、交互式动画的框架。与以往基于优化的方法不同，我们的方法以快速前馈的方式生成动态效果。该方法的关键在于引入紧凑的稀疏体素频谱来表示树木的运动。对于从高斯泼洒重建得到的三维树木，我们的流程首先利用稀疏体素频谱生成网格运动，然后将高斯绑定至网格以实现变形。此外，该稀疏体素频谱还可作为在外力作用下进行快速模态分析的基础，从而支持实时交互响应。为训练我们的模型，我们还引入了 4DTree，这是首个大规模合成的四维树木数据集，包含 8,786 个带语义标签的树木动画网格以及 100 帧的运动序列。大量实验表明，我们的方法在实现逼真且响应灵敏的树木动画方面显著优于现有方法，在视觉质量和计算效率上均表现出色。\n"
  },
  {
    "path": "abs/2510.22473.md",
    "content": "### DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss\n\nRecent advancements in 2D and 3D generative models have expanded the capabilities of computer vision. However, generating high-quality 4D dynamic content from a single static image remains a significant challenge. Traditional methods have limitations in modeling temporal dependencies and accurately capturing dynamic geometry changes, especially when considering variations in camera perspective. To address this issue, we propose DynaPose4D, an innovative solution that integrates 4D Gaussian Splatting (4DGS) techniques with Category-Agnostic Pose Estimation (CAPE) technology. This framework uses 3D Gaussian Splatting to construct a 3D model from single images, then predicts multi-view pose keypoints based on one-shot support from a chosen view, leveraging supervisory signals to enhance motion consistency. Experimental results show that DynaPose4D achieves excellent coherence, consistency, and fluidity in dynamic motion generation. These findings not only validate the efficacy of the DynaPose4D framework but also indicate its potential applications in the domains of computer vision and animation production.\n\n近年来，二维与三维生成模型的进展显著拓展了计算机视觉的能力。然而，从单张静态图像生成高质量四维动态内容仍是一项重大挑战。传统方法在建模时间依赖性和精确捕捉动态几何变化方面存在局限，尤其是在摄像机视角变化的情况下。为了解决这一问题，我们提出了 DynaPose4D，这是一种将四维高斯泼洒（4DGS）技术与类别无关的姿态估计（CAPE）技术相结合的创新方案。该框架首先利用三维高斯泼洒从单张图像构建三维模型，然后基于选定视角的 one-shot 支持预测多视角姿态关键点，并利用监督信号增强运动一致性。实验结果表明，DynaPose4D 在动态运动生成方面实现了极高的连贯性、一致性和流畅性。这些发现不仅验证了 DynaPose4D 框架的有效性，也表明其在计算机视觉与动画制作等领域具有广泛的应用潜力。\n"
  },
  {
    "path": "abs/2510.22600.md",
    "content": "### RoGER-SLAM: A Robust Gaussian Splatting SLAM System for Noisy and Low-light Environment Resilience\n\nThe reliability of Simultaneous Localization and Mapping (SLAM) is severely constrained in environments where visual inputs suffer from noise and low illumination. Although recent 3D Gaussian Splatting (3DGS) based SLAM frameworks achieve high-fidelity mapping under clean conditions, they remain vulnerable to compounded degradations that degrade mapping and tracking performance. A key observation underlying our work is that the original 3DGS rendering pipeline inherently behaves as an implicit low-pass filter, attenuating high-frequency noise but also risking over-smoothing. Building on this insight, we propose RoGER-SLAM, a robust 3DGS SLAM system tailored for noise and low-light resilience. The framework integrates three innovations: a Structure-Preserving Robust Fusion (SP-RoFusion) mechanism that couples rendered appearance, depth, and edge cues; an adaptive tracking objective with residual balancing regularization; and a Contrastive Language-Image Pretraining (CLIP)-based enhancement module, selectively activated under compounded degradations to restore semantic and structural fidelity. Comprehensive experiments on Replica, TUM, and real-world sequences show that RoGER-SLAM consistently improves trajectory accuracy and reconstruction quality compared with other 3DGS-SLAM systems, especially under adverse imaging conditions.\n\n在视觉输入受噪声和低照度影响的环境中，同时定位与建图（SLAM）的可靠性受到严重限制。尽管近年来基于三维高斯泼洒（3DGS）的 SLAM 框架在理想清洁条件下可实现高保真建图，但在复合退化环境下，其建图与跟踪性能仍然容易受到影响。我们工作的一个关键观察是，原始的 3DGS 渲染管线本质上表现出一种隐式低通滤波器的特性，能够抑制高频噪声，但也存在过度平滑的风险。基于这一洞察，我们提出了 RoGER-SLAM，这是一个面向噪声和低光照环境鲁棒性的 3DGS SLAM 系统。该框架融合了三项创新：结构保持的鲁棒融合机制（SP-RoFusion），结合渲染外观、深度与边缘线索；带残差平衡正则项的自适应跟踪目标；以及基于对比语言-图像预训练（CLIP）的增强模块，可在复合退化条件下有选择地激活，以恢复语义和结构的保真度。我们在 Replica、TUM 及真实世界序列上的综合实验表明，RoGER-SLAM 在各种恶劣成像条件下均显著提升了轨迹精度与重建质量，相较其他 3DGS-SLAM 系统表现更加出色。\n"
  },
  {
    "path": "abs/2510.22669.md",
    "content": "### LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes via Hierarchical Explicit-Implicit Representation Collaboration Rendering\n\n3D Gaussian Splatting SLAM has emerged as a widely used technique for high-fidelity mapping in spatial intelligence. However, existing methods often rely on a single representation scheme, which limits their performance in large-scale dynamic outdoor scenes and leads to cumulative pose errors and scale ambiguity. To address these challenges, we propose \\textbf{LVD-GS}, a novel LiDAR-Visual 3D Gaussian Splatting SLAM system. Motivated by the human chain-of-thought process for information seeking, we introduce a hierarchical collaborative representation module that facilitates mutual reinforcement for mapping optimization, effectively mitigating scale drift and enhancing reconstruction robustness. Furthermore, to effectively eliminate the influence of dynamic objects, we propose a joint dynamic modeling module that generates fine-grained dynamic masks by fusing open-world segmentation with implicit residual constraints, guided by uncertainty estimates from DINO-Depth features. Extensive evaluations on KITTI, nuScenes, and self-collected datasets demonstrate that our approach achieves state-of-the-art performance compared to existing methods.\n\n3D 高斯泼洒 SLAM 已成为空间智能中广泛应用的高保真建图技术。然而，现有方法通常依赖单一的表示方案，这在大规模动态户外场景中会限制其性能，导致位姿误差累积与尺度歧义。为了解决这些问题，我们提出了 **LVD-GS**，一种新颖的激光-视觉融合的 3D 高斯泼洒 SLAM 系统。受人类在信息搜索过程中“思维链”机制的启发，我们引入了层次化协同表示模块，实现了建图优化中的互补强化，有效缓解了尺度漂移问题并提升了重建的鲁棒性。此外，为了有效消除动态物体的干扰，我们提出了联合动态建模模块，通过融合开放世界分割与隐式残差约束，并结合 DINO-Depth 特征的不确定性估计，引导生成细粒度的动态掩码。在 KITTI、nuScenes 以及自采数据集上的大量实验表明，我们的方法在性能上全面超越现有方法，达到了最先进的水平。\n"
  },
  {
    "path": "abs/2510.22812.md",
    "content": "### Region-Adaptive Learned Hierarchical Encoding for 3D Gaussian Splatting Data\n\nWe introduce Region-Adaptive Learned Hierarchical Encoding (RALHE) for 3D Gaussian Splatting (3DGS) data. While 3DGS has recently become popular for novel view synthesis, the size of trained models limits its deployment in bandwidth-constrained applications such as volumetric media streaming. To address this, we propose a learned hierarchical latent representation that builds upon the principles of \"overfitted\" learned image compression (e.g., Cool-Chic and C3) to efficiently encode 3DGS attributes. Unlike images, 3DGS data have irregular spatial distributions of Gaussians (geometry) and consist of multiple attributes (signals) defined on the irregular geometry. Our codec is designed to account for these differences between images and 3DGS. Specifically, we leverage the octree structure of the voxelized 3DGS geometry to obtain a hierarchical multi-resolution representation. Our approach overfits latents to each Gaussian attribute under a global rate constraint. These latents are decoded independently through a lightweight decoder network. To estimate the bitrate during training, we employ an autoregressive probability model that leverages octree-derived contexts from the 3D point structure. The multi-resolution latents, decoder, and autoregressive entropy coding networks are jointly optimized for each Gaussian attribute. Experiments demonstrate that the proposed RALHE compression framework achieves a rendering PSNR gain of up to 2dB at low bitrates (less than 1 MB) compared to the baseline 3DGS compression methods.\n\n我们提出了一种用于 3D 高斯泼洒（3DGS）数据的区域自适应学习分层编码方法（RALHE）。尽管 3DGS 近期在新视角合成任务中广受关注，但其训练模型的体积限制了其在带宽受限的应用场景（如体积媒体流传输）中的部署。为了解决这一问题，我们提出了一种基于“过拟合”图像压缩原理（如 Cool-Chic 和 C3）构建的分层潜表示方法，以高效编码 3DGS 属性。与图像不同，3DGS 数据的高斯分布在空间上是不规则的（几何结构），并且包含多个定义于该不规则几何上的属性（信号）。我们设计的编码器充分考虑了图像与 3DGS 之间的这些差异。具体而言，我们利用体素化后的 3DGS 几何结构中的八叉树（octree）构建多分辨率的层次表示。我们的方法在全局码率约束下，对每个高斯属性的潜变量进行过拟合，并通过轻量级解码网络独立地解码各个潜变量。训练过程中，我们采用一种自回归概率模型对码率进行估计，该模型利用了来自 3D 点结构中八叉树上下文的信息。多分辨率潜变量、解码器和自回归熵编码网络被联合优化，以适配每个高斯属性。实验表明，所提出的 RALHE 压缩框架在低比特率（小于 1MB）下，相较于现有 3DGS 压缩方法，在渲染 PSNR 上最多可提升 2dB。\n"
  },
  {
    "path": "abs/2510.22930.md",
    "content": "### Gen-LangSplat: Generalized Language Gaussian Splatting with Pre-Trained Feature Compression\n\nModeling open-vocabulary language fields in 3D is essential for intuitive human-AI interaction and querying within physical environments. State-of-the-art approaches, such as LangSplat, leverage 3D Gaussian Splatting to efficiently construct these language fields, encoding features distilled from high-dimensional models like CLIP. However, this efficiency is currently offset by the requirement to train a scene-specific language autoencoder for feature compression, introducing a costly, per-scene optimization bottleneck that hinders deployment scalability. In this work, we introduce Gen-LangSplat, that eliminates this requirement by replacing the scene-wise autoencoder with a generalized autoencoder, pre-trained extensively on the large-scale ScanNet dataset. This architectural shift enables the use of a fixed, compact latent space for language features across any new scene without any scene-specific training. By removing this dependency, our entire language field construction process achieves a efficiency boost while delivering querying performance comparable to, or exceeding, the original LangSplat method. To validate our design choice, we perform a thorough ablation study empirically determining the optimal latent embedding dimension and quantifying representational fidelity using Mean Squared Error and cosine similarity between the original and reprojected 512-dimensional CLIP embeddings. Our results demonstrate that generalized embeddings can efficiently and accurately support open-vocabulary querying in novel 3D scenes, paving the way for scalable, real-time interactive 3D AI applications.\n\n在三维空间中建模开放词汇语言场，对于实现直观的人机交互与物理环境中的查询至关重要。当前最先进的方法（如 LangSplat）利用 3D 高斯泼洒技术高效构建语言场，通过编码由高维模型（如 CLIP）提取的特征来实现。然而，这种高效性被其对场景特定语言自动编码器进行训练的需求所抵消，该过程用于特征压缩，但会引入昂贵的逐场景优化瓶颈，从而限制了部署的可扩展性。为了解决这一问题，我们提出了 Gen-LangSplat，通过引入一个在大规模 ScanNet 数据集上预训练的通用自动编码器，替代原先需要针对每个场景单独训练的自动编码器。该架构变更使得我们能够在任何新场景中使用一个固定、紧凑的语言特征潜空间，无需进行场景特定的训练。通过移除这一依赖，我们整体的语言场构建流程在显著提升效率的同时，仍可提供与原始 LangSplat 方法相当甚至更优的查询性能。为验证我们的设计选择，我们进行了详尽的消融实验，实证确定了最佳潜嵌入维度，并通过均方误差（MSE）和余弦相似度指标，量化了原始与重投影后的 512 维 CLIP 嵌入之间的表征保真度。实验结果表明，通用嵌入可以高效且精确地支持在新颖三维场景中的开放词汇查询，为可扩展、实时的交互式三维 AI 应用奠定了基础。\n"
  },
  {
    "path": "abs/2510.23087.md",
    "content": "### EndoWave: Rational-Wavelet 4D Gaussian Splatting for Endoscopic Reconstruction\n\nIn robot-assisted minimally invasive surgery, accurate 3D reconstruction from endoscopic video is vital for downstream tasks and improved outcomes. However, endoscopic scenarios present unique challenges, including photometric inconsistencies, non-rigid tissue motion, and view-dependent highlights. Most 3DGS-based methods that rely solely on appearance constraints for optimizing 3DGS are often insufficient in this context, as these dynamic visual artifacts can mislead the optimization process and lead to inaccurate reconstructions. To address these limitations, we present EndoWave, a unified spatio-temporal Gaussian Splatting framework by incorporating an optical flow-based geometric constraint and a multi-resolution rational wavelet supervision. First, we adopt a unified spatio-temporal Gaussian representation that directly optimizes primitives in a 4D domain. Second, we propose a geometric constraint derived from optical flow to enhance temporal coherence and effectively constrain the 3D structure of the scene. Third, we propose a multi-resolution rational orthogonal wavelet as a constraint, which can effectively separate the details of the endoscope and enhance the rendering performance. Extensive evaluations on two real surgical datasets, EndoNeRF and StereoMIS, demonstrate that our method EndoWave achieves state-of-the-art reconstruction quality and visual accuracy compared to the baseline method.\n\n在机器人辅助手术的微创场景中，从内窥镜视频中实现精确的三维重建对后续任务与提高手术效果至关重要。然而，内窥镜环境具有独特的挑战，例如光照不一致、非刚性组织运动以及视角相关的高光等问题。多数仅依赖外观约束进行 3DGS 优化的方法，在此类动态视觉干扰下常常难以保证重建的准确性。为应对这些限制，我们提出了 EndoWave——一种融合了光流几何约束与多分辨率有理小波监督的统一时空高斯泼洒框架。首先，我们采用统一的时空高斯表示，直接在四维空间中优化原始图元；其次，我们引入基于光流的几何约束，以增强时序一致性并有效约束场景的三维结构；第三，我们提出使用多分辨率有理正交小波作为附加监督，有效分离内窥镜图像中的细节信息并提升渲染表现。我们在两个真实手术数据集 EndoNeRF 和 StereoMIS 上进行了广泛评估，结果表明 EndoWave 在重建质量与视觉准确性方面均优于现有基线方法，达到了当前最先进水平。\n"
  },
  {
    "path": "abs/2510.23205.md",
    "content": "### VR-Drive: Viewpoint-Robust End-to-End Driving with Feed-Forward 3D Gaussian Splatting\n\nEnd-to-end autonomous driving (E2E-AD) has emerged as a promising paradigm that unifies perception, prediction, and planning into a holistic, data-driven framework. However, achieving robustness to varying camera viewpoints, a common real-world challenge due to diverse vehicle configurations, remains an open problem. In this work, we propose VR-Drive, a novel E2E-AD framework that addresses viewpoint generalization by jointly learning 3D scene reconstruction as an auxiliary task to enable planning-aware view synthesis. Unlike prior scene-specific synthesis approaches, VR-Drive adopts a feed-forward inference strategy that supports online training-time augmentation from sparse views without additional annotations. To further improve viewpoint consistency, we introduce a viewpoint-mixed memory bank that facilitates temporal interaction across multiple viewpoints and a viewpoint-consistent distillation strategy that transfers knowledge from original to synthesized views. Trained in a fully end-to-end manner, VR-Drive effectively mitigates synthesis-induced noise and improves planning under viewpoint shifts. In addition, we release a new benchmark dataset to evaluate E2E-AD performance under novel camera viewpoints, enabling comprehensive analysis. Our results demonstrate that VR-Drive is a scalable and robust solution for the real-world deployment of end-to-end autonomous driving systems.\n\n端到端自动驾驶（E2E-AD）作为一种有前景的新范式，将感知、预测与规划整合为一个整体的数据驱动框架。然而，在应对不同摄像头视角这一由于车辆配置多样性而在真实世界中普遍存在的挑战方面，仍存在显著困难。为此，我们提出了 VR-Drive，一个新颖的 E2E-AD 框架，通过联合学习三维场景重建作为辅助任务，实现面向规划的视图合成，从而增强视角泛化能力。与以往依赖特定场景合成的方法不同，VR-Drive 采用前馈式推理策略，支持从稀疏视图进行在线训练时增强，无需额外注释。为了进一步提升视角一致性，我们引入了视角混合记忆库，用于促进跨多个视角的时序交互，并提出了一种视角一致的蒸馏策略，将原始视图中的知识迁移至合成视图。在完全端到端的训练范式下，VR-Drive 有效缓解了由合成引入的噪声，并提升了在视角变换下的规划性能。此外，我们还发布了一个全新的基准数据集，用于评估在新颖摄像头视角下的 E2E-AD 表现，从而支持更全面的分析。实验结果表明，VR-Drive 是一个具备可扩展性与鲁棒性的解决方案，适用于真实环境中的端到端自动驾驶系统部署。\n"
  },
  {
    "path": "abs/2510.23521.md",
    "content": "### Explicit Memory through Online 3D Gaussian Splatting Improves Class-Agnostic Video Segmentation\n\nRemembering where object segments were predicted in the past is useful for improving the accuracy and consistency of class-agnostic video segmentation algorithms. Existing video segmentation algorithms typically use either no object-level memory (e.g. FastSAM) or they use implicit memories in the form of recurrent neural network features (e.g. SAM2). In this paper, we augment both types of segmentation models using an explicit 3D memory and show that the resulting models have more accurate and consistent predictions. For this, we develop an online 3D Gaussian Splatting (3DGS) technique to store predicted object-level segments generated throughout the duration of a video. Based on this 3DGS representation, a set of fusion techniques are developed, named FastSAM-Splat and SAM2-Splat, that use the explicit 3DGS memory to improve their respective foundation models' predictions. Ablation experiments are used to validate the proposed techniques' design and hyperparameter settings. Results from both real-world and simulated benchmarking experiments show that models which use explicit 3D memories result in more accurate and consistent predictions than those which use no memory or only implicit neural network memories.\n\n记忆视频中过去所预测的目标分割位置，有助于提升类别无关视频分割算法的准确性与一致性。现有视频分割算法通常要么不使用目标级别的记忆（如 FastSAM），要么仅使用隐式记忆，例如循环神经网络特征（如 SAM2）。在本文中，我们通过引入显式的三维记忆来增强这两类分割模型，并证明增强后的模型在预测准确性和一致性方面都有显著提升。为此，我们提出了一种在线 3D 高斯泼洒（3DGS）技术，用于存储视频整个时序过程中预测得到的目标级别分割信息。基于该 3DGS 表示，我们开发了一系列融合技术，分别命名为 FastSAM-Splat 和 SAM2-Splat，利用显式的 3DGS 记忆来提升各自基础模型的预测性能。我们还通过消融实验验证了所提出方法的设计合理性与超参数设置。来自真实世界与模拟基准测试的实验结果表明，相较于不使用记忆或仅使用隐式神经网络记忆的模型，采用显式三维记忆的模型在预测准确度和一致性方面表现得更加优越。\n"
  },
  {
    "path": "abs/2510.23930.md",
    "content": "### PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors\n\nThree-dimensional Gaussian Splatting (3DGS) has recently emerged as an efficient representation for novel-view synthesis, achieving impressive visual quality. However, in scenes dominated by large and low-texture regions, common in indoor environments, the photometric loss used to optimize 3DGS yields ambiguous geometry and fails to recover high-fidelity 3D surfaces. To overcome this limitation, we introduce PlanarGS, a 3DGS-based framework tailored for indoor scene reconstruction. Specifically, we design a pipeline for Language-Prompted Planar Priors (LP3) that employs a pretrained vision-language segmentation model and refines its region proposals via cross-view fusion and inspection with geometric priors. 3D Gaussians in our framework are optimized with two additional terms: a planar prior supervision term that enforces planar consistency, and a geometric prior supervision term that steers the Gaussians toward the depth and normal cues. We have conducted extensive experiments on standard indoor benchmarks. The results show that PlanarGS reconstructs accurate and detailed 3D surfaces, consistently outperforming state-of-the-art methods by a large margin.\n\n三维高斯泼洒（3DGS）近年来作为一种高效的新视角合成表示方法受到广泛关注，并在视觉质量上取得了令人印象深刻的成果。然而，在以大面积低纹理区域为主的场景中（如常见的室内环境），用于优化 3DGS 的光度损失容易导致几何形状模糊，难以还原高保真的三维表面。为了解决这一问题，我们提出了 PlanarGS——一个专为室内场景重建设计的 3DGS 框架。具体而言，我们设计了一条名为“语言提示平面先验”（LP3）的处理流程，利用预训练的视觉-语言分割模型生成初始区域提议，并通过跨视图融合与几何先验的检验机制进一步优化这些提议。在此基础上，我们在 3D 高斯的优化过程中引入了两个附加监督项：一是平面先验监督项，用于保持平面一致性；二是几何先验监督项，用于引导高斯分布对齐于深度和法向信息。我们在多个标准室内数据集上进行了大量实验，结果表明 PlanarGS 能够重建出准确且细节丰富的三维表面，在多个指标上大幅超越当前最先进方法。\n"
  },
  {
    "path": "abs/2510.24118.md",
    "content": "### LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation\n\nNavigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed set goal settings. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During exploration, LagMemo constructs a unified 3D language memory. With incoming task goals, the system queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench tailored to multi-modal open-vocabulary multi-goal visual navigation. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation.\n\n利用视觉信息导航至指定目标是智能机器人的一项基本能力。大多数传统视觉导航方法局限于单目标、单模态以及封闭集目标的设定，难以满足实际应用中对多模态、开放词汇目标查询以及多目标导航的需求。为此，我们提出了 LagMemo，一种基于语言三维高斯泼洒记忆的导航系统。在探索过程中，LagMemo 构建一个统一的三维语言记忆；在接收到任务目标后，系统会查询该记忆体，预测候选目标位置，并融合基于局部感知的验证机制，在导航过程中动态匹配和验证目标。为实现公平且严谨的评估，我们从 GOAT-Bench 中精选并构建了 GOAT-Core，一个专门针对多模态、开放词汇、多目标视觉导航任务的高质量核心数据划分。实验结果表明，LagMemo 的记忆模块能够实现高效的多模态开放词汇目标定位，且整体性能在多目标视觉导航任务中显著优于现有最先进方法。\n"
  },
  {
    "path": "abs/2510.24335.md",
    "content": "### NVSim: Novel View Synthesis Simulator for Large Scale Indoor Navigation\n\nWe present NVSim, a framework that automatically constructs large-scale, navigable indoor simulators from only common image sequences, overcoming the cost and scalability limitations of traditional 3D scanning. Our approach adapts 3D Gaussian Splatting to address visual artifacts on sparsely observed floors a common issue in robotic traversal data. We introduce Floor-Aware Gaussian Splatting to ensure a clean, navigable ground plane, and a novel mesh-free traversability checking algorithm that constructs a topological graph by directly analyzing rendered views. We demonstrate our system's ability to generate valid, large-scale navigation graphs from real-world data.\n\n我们提出了 NVSim，一个可从普通图像序列中自动构建大规模、可导航室内仿真环境的框架，有效克服了传统三维扫描在成本和可扩展性方面的限制。我们的方法对 3D 高斯泼洒进行了适配，以解决机器人导航数据中常见的稀疏观测地面导致的视觉伪影问题。我们引入了“地面感知高斯泼洒”技术，以确保生成干净、可行走的地面平面，并设计了一种全新的无网格可通行性检测算法，该算法通过直接分析渲染视图构建拓扑导航图。我们展示了该系统在真实世界数据中自动生成有效且大规模导航图的能力。\n"
  },
  {
    "path": "abs/2510.24734.md",
    "content": "### DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes\n\nReal-time, high-fidelity reconstruction of dynamic driving scenes is challenged by complex dynamics and sparse views, with prior methods struggling to balance quality and efficiency. We propose DrivingScene, an online, feed-forward framework that reconstructs 4D dynamic scenes from only two consecutive surround-view images. Our key innovation is a lightweight residual flow network that predicts the non-rigid motion of dynamic objects per camera on top of a learned static scene prior, explicitly modeling dynamics via scene flow. We also introduce a coarse-to-fine training paradigm that circumvents the instabilities common to end-to-end approaches. Experiments on nuScenes dataset show our image-only method simultaneously generates high-quality depth, scene flow, and 3D Gaussian point clouds online, significantly outperforming state-of-the-art methods in both dynamic reconstruction and novel view synthesis.\n\n在动态驾驶场景中实现实时、高保真的重建面临复杂动态变化和视角稀疏性的挑战，现有方法在质量与效率之间往往难以兼顾。为此，我们提出了 DrivingScene，一个基于在线前馈机制的框架，仅利用两帧连续的环视图像即可重建四维动态场景。我们的核心创新在于引入一个轻量级残差流网络，结合静态场景先验，在每个摄像头视角下预测动态物体的非刚性运动，通过场景流显式建模动态变化。此外，我们还提出了粗到细的训练范式，有效规避了端到端方法常见的不稳定性问题。在 nuScenes 数据集上的实验表明，我们的图像驱动方法能够同时在线生成高质量的深度图、场景流和三维高斯点云，在动态重建与新视角合成方面均显著优于现有最先进方法。\n"
  },
  {
    "path": "abs/2510.25129.md",
    "content": "### AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians\n\n3D reconstruction of indoor and urban environments is a prominent research topic with various downstream applications. However, existing geometric priors for addressing low-texture regions in indoor and urban settings often lack global consistency. Moreover, Gaussian Splatting and implicit SDF fields often suffer from discontinuities or exhibit computational inefficiencies, resulting in a loss of detail. To address these issues, we propose an Atlanta-world guided implicit-structured Gaussian Splatting that achieves smooth indoor and urban scene reconstruction while preserving high-frequency details and rendering efficiency. By leveraging the Atlanta-world model, we ensure the accurate surface reconstruction for low-texture regions, while the proposed novel implicit-structured GS representations provide smoothness without sacrificing efficiency and high-frequency details. Specifically, we propose a semantic GS representation to predict the probability of all semantic regions and deploy a structure plane regularization with learnable plane indicators for global accurate surface reconstruction. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in both indoor and urban scenes, delivering superior surface reconstruction quality.\n\n室内与城市环境的三维重建是一个重要的研究课题，具有广泛的下游应用前景。然而，现有用于处理低纹理区域的几何先验在室内与城市场景中往往缺乏全局一致性。同时，Gaussian Splatting 和隐式 SDF 场通常存在几何不连续或计算效率低下的问题，导致细节丢失。为了解决这些问题，我们提出了一种基于 Atlanta-world 模型引导的隐式结构化 Gaussian Splatting 方法，能够在保持高频细节与渲染效率的同时，实现平滑的室内与城市场景重建。通过引入 Atlanta-world 模型，我们确保了低纹理区域的表面重建准确性；而我们所提出的新型隐式结构化 GS 表示则在不牺牲效率与细节的前提下带来了结构连续性。具体来说，我们提出了一种语义高斯表示方法，用于预测所有语义区域的概率分布，并引入带有可学习平面指示器的结构平面正则项，以实现全局准确的表面重建。大量实验表明，我们的方法在室内与城市场景中均优于现有最先进方法，能够提供更高质量的表面重建效果。\n"
  },
  {
    "path": "abs/2510.25146.md",
    "content": "### EA3D: Online Open-World 3D Object Extraction from Streaming Videos\n\nCurrent 3D scene understanding methods are limited by offline-collected multi-view data or pre-constructed 3D geometry. In this paper, we present ExtractAnything3D (EA3D), a unified online framework for open-world 3D object extraction that enables simultaneous geometric reconstruction and holistic scene understanding. Given a streaming video, EA3D dynamically interprets each frame using vision-language and 2D vision foundation encoders to extract object-level knowledge. This knowledge is integrated and embedded into a Gaussian feature map via a feed-forward online update strategy. We then iteratively estimate visual odometry from historical frames and incrementally update online Gaussian features with new observations. A recurrent joint optimization module directs the model's attention to regions of interest, simultaneously enhancing both geometric reconstruction and semantic understanding. Extensive experiments across diverse benchmarks and tasks, including photo-realistic rendering, semantic and instance segmentation, 3D bounding box and semantic occupancy estimation, and 3D mesh generation, demonstrate the effectiveness of EA3D. Our method establishes a unified and efficient framework for joint online 3D reconstruction and holistic scene understanding, enabling a broad range of downstream tasks.\n\n当前的三维场景理解方法通常依赖离线采集的多视角数据或预构建的三维几何信息，存在适应性差和灵活性不足的问题。为此，我们提出了 ExtractAnything3D（EA3D），一个面向开放世界的统一在线三维目标提取框架，能够同时完成几何重建与整体场景理解。面对视频流输入，EA3D 利用视觉-语言与二维视觉基础模型对每一帧图像进行动态解析，提取目标级别的知识，并通过前馈式在线更新策略将这些知识集成并嵌入至高斯特征图中。随后，系统通过历史帧迭代估计视觉里程计，并用新的观测不断更新在线高斯特征。一个循环联合优化模块引导模型聚焦于兴趣区域，从而同时提升几何重建的精度与语义理解的完整性。我们在多个基准数据集和任务上进行了广泛实验证明，包括照片级真实感渲染、语义与实例分割、三维边界框与语义占据估计、三维网格生成等。结果表明，EA3D 在多个方面均展现出卓越性能，成功构建了一个统一高效的三维在线重建与整体场景理解框架，支持多样的下游任务应用。\n"
  },
  {
    "path": "abs/2510.25234.md",
    "content": "### Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation\n\nExpressions are fundamental to conveying human emotions. With the rapid advancement of AI-generated content (AIGC), realistic and expressive 3D facial animation has become increasingly crucial. Despite recent progress in speech-driven lip-sync for talking-face animation, generating emotionally expressive talking faces remains underexplored. A major obstacle is the scarcity of real emotional 3D talking-face datasets due to the high cost of data capture. To address this, we model facial animation driven by both speech and emotion as a linear additive problem. Leveraging a 3D talking-face dataset with neutral expressions (VOCAset) and a dataset of 3D expression sequences (Florence4D), we jointly learn a set of blendshapes driven by speech and emotion. We introduce a sparsity constraint loss to encourage disentanglement between the two types of blendshapes while allowing the model to capture inherent secondary cross-domain deformations present in the training data. The learned blendshapes can be further mapped to the expression and jaw pose parameters of the FLAME model, enabling the animation of 3D Gaussian avatars. Qualitative and quantitative experiments demonstrate that our method naturally generates talking faces with specified expressions while maintaining accurate lip synchronization. Perceptual studies further show that our approach achieves superior emotional expressivity compared to existing methods, without compromising lip-sync quality.\n\n表情是传达情感的基本方式。随着 AIGC（AI 生成内容）的迅猛发展，真实且富有表现力的三维人脸动画变得日益重要。尽管近年来语音驱动的说话人脸动画（lip-sync）技术取得了显著进展，但生成具有情绪表达的说话人脸仍然研究不足。一个主要障碍在于获取真实的情感三维说话人脸数据集成本高昂，导致相关数据资源稀缺。为此，我们将语音与情绪共同驱动的人脸动画建模为一个线性可加的问题。我们联合使用中性表情的三维说话人脸数据集（VOCAset）与三维表情序列数据集（Florence4D），共同学习一组分别由语音与情绪驱动的 blendshape。我们引入稀疏约束损失，以鼓励两种 blendshape 之间的解耦，同时允许模型捕捉训练数据中固有的跨域次级变形。学习得到的 blendshape 可进一步映射为 FLAME 模型中的表情与下颌姿态参数，从而实现三维高斯头像的动画生成。定性与定量实验表明，我们的方法能够自然地生成带有指定情绪表达的说话人脸，同时保持高度准确的唇形同步。感知研究进一步表明，与现有方法相比，我们的方法在不牺牲唇形同步质量的前提下，实现了更优越的情感表现力。\n"
  },
  {
    "path": "abs/2510.26117.md",
    "content": "### JOGS: Joint Optimization of Pose Estimation and 3D Gaussian Splatting\n\nTraditional novel view synthesis methods heavily rely on external camera pose estimation tools such as COLMAP, which often introduce computational bottlenecks and propagate errors. To address these challenges, we propose a unified framework that jointly optimizes 3D Gaussian points and camera poses without requiring pre-calibrated inputs. Our approach iteratively refines 3D Gaussian parameters and updates camera poses through a novel co-optimization strategy, ensuring simultaneous improvements in scene reconstruction fidelity and pose accuracy. The key innovation lies in decoupling the joint optimization into two interleaved phases: first, updating 3D Gaussian parameters via differentiable rendering with fixed poses, and second, refining camera poses using a customized 3D optical flow algorithm that incorporates geometric and photometric constraints. This formulation progressively reduces projection errors, particularly in challenging scenarios with large viewpoint variations and sparse feature distributions, where traditional methods struggle. Extensive evaluations on multiple datasets demonstrate that our approach significantly outperforms existing COLMAP-free techniques in reconstruction quality, and also surpasses the standard COLMAP-based baseline in general.\n\n传统的新视角合成方法严重依赖于外部相机位姿估计工具，如 COLMAP，这类工具常常成为计算瓶颈，并可能引入累积误差。为了解决这些问题，我们提出了一个统一框架，在无需预先校准输入的情况下联合优化三维高斯点与相机位姿。我们的方法通过一种新颖的协同优化策略，迭代地优化三维高斯参数并更新相机位姿，从而实现对场景重建质量和位姿准确性的同步提升。其核心创新在于将联合优化过程解耦为两个交替阶段：第一阶段在固定位姿条件下，通过可微渲染优化三维高斯参数；第二阶段使用定制的三维光流算法，在结合几何与光度约束的基础上对相机位姿进行精细优化。该策略能够持续降低投影误差，尤其适用于视角变化大、特征分布稀疏等传统方法难以处理的复杂场景。在多个数据集上的大量实验结果表明，我们的方法在重建质量方面显著优于现有的无 COLMAP 方法，整体表现也超过了标准的基于 COLMAP 的方法。\n"
  },
  {
    "path": "abs/2510.26166.md",
    "content": "### 6D Channel Knowledge Map Construction via Bidirectional Wireless Gaussian Splatting\n\nThis paper investigates the construction of channel knowledge map (CKM) from sparse channel measurements. Dif ferent from conventional two-/three-dimensional (2D/3D) CKM approaches assuming fixed base station configurations, we present a six-dimensional (6D) CKM framework named bidirectional wireless Gaussian splatting (BiWGS), which is capable of mod eling wireless channels across dynamic transmitter (Tx) and receiver (Rx) positions in 3D space. BiWGS uses Gaussian el lipsoids to represent virtual scatterer clusters and environmental obstacles in the wireless environment. By properly learning the bidirectional scattering patterns and complex attenuation profiles based on channel measurements, these ellipsoids inherently cap ture the electromagnetic transmission characteristics of wireless environments, thereby accurately modeling signal transmission under varying transceiver configurations. Experiment results show that BiWGS significantly outperforms classic multi-layer perception (MLP) for the construction of 6D channel power gain map with varying Tx-Rx positions, and achieves spatial spectrum prediction accuracy comparable to the state-of-the art wireless radiation field Gaussian splatting (WRF-GS) for 3D CKM construction. This validates the capability of the proposed BiWGS in accomplishing dimensional expansion of 6D CKM construction, without compromising fidelity.\n\n本文研究了如何从稀疏信道测量数据中构建信道知识图（CKM）。不同于传统的二维/三维（2D/3D）CKM 方法通常假设基站配置固定，我们提出了一种六维（6D）CKM 框架，称为双向无线高斯泼洒（BiWGS），该方法能够建模三维空间中发射端（Tx）与接收端（Rx）位置动态变化下的无线信道。BiWGS 使用高斯椭球体来表示无线环境中的虚拟散射簇与环境障碍物。通过基于信道测量数据学习双向散射模式与复杂衰减特征，这些椭球体能够内在地捕捉无线传播环境中的电磁传输特性，从而在不同的收发配置下准确建模信号传输过程。实验结果表明，BiWGS 在构建具有动态 Tx-Rx 位置的 6D 信道功率增益图方面，显著优于传统的多层感知机（MLP），并在三维 CKM 构建的空间频谱预测精度上可与当前最先进的无线辐射场高斯泼洒（WRF-GS）方法相媲美。这验证了 BiWGS 能够在不牺牲建模精度的前提下，有效实现 CKM 构建的维度扩展。\n"
  },
  {
    "path": "abs/2510.26358.md",
    "content": "### AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM\n\nAutonomous robots in orchards require real-time 3D scene understanding despite repetitive row geometry, seasonal appearance changes, and wind-driven foliage motion. We present AgriGS-SLAM, a Visual--LiDAR SLAM framework that couples direct LiDAR odometry and loop closures with multi-camera 3D Gaussian Splatting (3DGS) rendering. Batch rasterization across complementary viewpoints recovers orchard structure under occlusions, while a unified gradient-driven map lifecycle executed between keyframes preserves fine details and bounds memory. Pose refinement is guided by a probabilistic LiDAR-based depth consistency term, back-propagated through the camera projection to tighten geometry-appearance coupling. We deploy the system on a field platform in apple and pear orchards across dormancy, flowering, and harvesting, using a standardized trajectory protocol that evaluates both training-view and novel-view synthesis to reduce 3DGS overfitting in evaluation. Across seasons and sites, AgriGS-SLAM delivers sharper, more stable reconstructions and steadier trajectories than recent state-of-the-art 3DGS-SLAM baselines while maintaining real-time performance on-tractor. While demonstrated in orchard monitoring, the approach can be applied to other outdoor domains requiring robust multimodal perception.\n\n果园中的自主机器人需要在具有重复行间几何、季节性外观变化和风致叶片运动等复杂条件下，实现实时的三维场景理解。为此，我们提出了 AgriGS-SLAM，一种结合了直接激光雷达里程计与回环检测的视觉–激光 SLAM 框架，并集成了多相机三维高斯泼洒（3DGS）渲染模块。该系统通过从互补视角批量光栅化，有效恢复了被遮挡区域的果园结构，同时在关键帧之间执行统一的梯度驱动地图生命周期管理，既保持了细节，又控制了内存使用。位姿优化由基于激光雷达的概率深度一致性项引导，并通过相机投影反向传播，从而增强几何与外观之间的耦合。我们在实地平台上将该系统部署于苹果和梨果园中，覆盖休眠期、开花期和采摘期，并使用标准化轨迹协议对训练视角和新视角的合成效果进行评估，从而降低 3DGS 在评估阶段的过拟合风险。在不同季节和场地中，AgriGS-SLAM 相较于当前最先进的 3DGS-SLAM 基线方法，在保证实时性（可在拖拉机上运行）的同时，实现了更清晰、更稳定的重建效果与更平稳的轨迹表现。虽然该方法主要应用于果园监测，但同样适用于其他需要强鲁棒多模态感知的户外环境。\n"
  },
  {
    "path": "abs/2510.26786.md",
    "content": "### HEIR: Learning Graph-Based Motion Hierarchies\n\nHierarchical structures of motion exist across research fields, including computer vision, graphics, and robotics, where complex dynamics typically arise from coordinated interactions among simpler motion components. Existing methods to model such dynamics typically rely on manually-defined or heuristic hierarchies with fixed motion primitives, limiting their generalizability across different tasks. In this work, we propose a general hierarchical motion modeling method that learns structured, interpretable motion relationships directly from data. Our method represents observed motions using graph-based hierarchies, explicitly decomposing global absolute motions into parent-inherited patterns and local motion residuals. We formulate hierarchy inference as a differentiable graph learning problem, where vertices represent elemental motions and directed edges capture learned parent-child dependencies through graph neural networks. We evaluate our hierarchical reconstruction approach on three examples: 1D translational motion, 2D rotational motion, and dynamic 3D scene deformation via Gaussian splatting. Experimental results show that our method reconstructs the intrinsic motion hierarchy in 1D and 2D cases, and produces more realistic and interpretable deformations compared to the baseline on dynamic 3D Gaussian splatting scenes. By providing an adaptable, data-driven hierarchical modeling paradigm, our method offers a formulation applicable to a broad range of motion-centric tasks.\n\n层次化的运动结构广泛存在于计算机视觉、图形学与机器人等多个研究领域中，其中复杂的动态通常来源于多个简单运动组件之间的协调交互。现有对这类动态建模的方法大多依赖人工定义或启发式构建的固定运动层级，因而在跨任务泛化方面存在局限性。为此，我们提出了一种通用的层次化运动建模方法，能够直接从数据中学习结构化、可解释的运动关系。我们的方法通过图结构层级表示观测到的运动，将整体的绝对运动显式地分解为继承自父节点的模式与局部运动残差。我们将层级结构推理建模为一个可微分的图学习问题，其中图的顶点表示基本运动单元，边则通过图神经网络学习捕捉父子之间的依赖关系。我们在三个任务上评估了该层次建模方法的重建性能：一维平移运动、二维旋转运动，以及基于高斯泼洒的三维动态场景形变。实验结果表明，我们的方法在一维与二维场景中能够成功重建内在的运动层级结构，并在动态三维高斯泼洒场景中生成更具真实感和可解释性的形变效果，优于现有基线方法。该方法提供了一种可适用于多种以运动为核心任务的数据驱动层次建模范式。\n"
  },
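To make HEIR's parent-inherited/residual decomposition concrete, here is a heavily simplified differentiable sketch: instead of the paper's GNN, a soft parent-assignment matrix is learned directly, and each trajectory is explained as an inherited copy of its parent plus a local residual. The entropy penalty, learning rate, and toy data are all our assumptions.

```python
import torch
import torch.nn.functional as F

N, T, D = 8, 40, 2                                   # trajectories, steps, dims
X = torch.cumsum(torch.randn(N, T, D) * 0.1, dim=1)  # toy observed motions

logits = torch.zeros(N, N, requires_grad=True)       # learnable parent scores
opt = torch.optim.Adam([logits], lr=0.05)
self_edge = torch.eye(N, dtype=torch.bool)           # forbid self-parenting

for step in range(300):
    P = F.softmax(logits.masked_fill(self_edge, -1e9), dim=1)  # soft parent choice
    inherited = torch.einsum('ij,jtd->itd', P, X)    # parent-inherited motion
    residual = X - inherited                         # local motion residual
    entropy = -(P * P.clamp(min=1e-12).log()).sum(dim=1).mean()
    loss = residual.pow(2).mean() + 1e-3 * entropy   # prefer peaked, tree-like edges
    opt.zero_grad()
    loss.backward()
    opt.step()

parents = P.argmax(dim=1)                            # hard hierarchy readout
```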
  {
    "path": "abs/2510.26921.md",
    "content": "### DC4GS: Directional Consistency-Driven Adaptive Density Control for 3D Gaussian Splatting\n\nWe present a Directional Consistency (DC)-driven Adaptive Density Control (ADC) for 3D Gaussian Splatting (DC4GS). Whereas the conventional ADC bases its primitive splitting on the magnitudes of positional gradients, we further incorporate the DC of the gradients into ADC, and realize it through the angular coherence of the gradients. Our DC better captures local structural complexities in ADC, avoiding redundant splitting. When splitting is required, we again utilize the DC to define optimal split positions so that sub-primitives best align with the local structures than the conventional random placement. As a consequence, our DC4GS greatly reduces the number of primitives (up to 30% in our experiments) than the existing ADC, and also enhances reconstruction fidelity greatly.\n\n我们提出了一种用于三维高斯泼洒（3D Gaussian Splatting, 3DGS）的基于方向一致性（Directional Consistency, DC）驱动的自适应密度控制方法（Adaptive Density Control, ADC），简称 DC4GS。传统 ADC 方法在进行图元分裂时主要依据位置梯度的幅值，而我们进一步引入梯度的方向一致性，并通过梯度的角度相干性实现该机制。相比之下，DC 能更有效地捕捉局部结构复杂度，从而避免冗余的分裂操作。在必须进行分裂的情况下，我们同样利用方向一致性来确定最优的分裂位置，使得子图元能更好地贴合局部结构，而非传统的随机放置策略。因此，DC4GS 在实验中相较于现有的 ADC 方法显著减少了图元数量（最多可达 30%），同时大幅提升了重建的保真度。\n"
  },
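The angular-coherence test in DC4GS lends itself to a short sketch. Below, the coherence of per-view positional gradients is the norm of their mean unit direction: large but directionally incoherent gradients trigger a split, and the split axis follows the dominant direction of spread. The thresholds, the SVD-based placement heuristic, and the function name are our assumptions, not the paper's exact rule.

```python
import torch

def dc_split_decision(grad_samples: torch.Tensor,
                      mag_thresh: float = 2e-4,
                      dc_thresh: float = 0.5):
    """Split test for one Gaussian from its per-view positional gradients.

    grad_samples: (V, 3) positional gradients accumulated over V views.
    Returns (should_split, suggested split axis).
    """
    mags = grad_samples.norm(dim=-1)
    dirs = grad_samples / mags.clamp(min=1e-12).unsqueeze(-1)   # unit directions
    dc = dirs.mean(dim=0).norm()        # angular coherence in [0, 1]
    # Large but directionally *incoherent* gradients hint at unresolved local
    # structure -> split; coherent gradients are better served by moving/cloning.
    should_split = bool(mags.mean() > mag_thresh and dc < dc_thresh)
    # Suggest sub-primitive placement along the dominant axis of the
    # directional spread rather than a random offset.
    _, _, Vh = torch.linalg.svd(dirs - dirs.mean(dim=0, keepdim=True))
    return should_split, Vh[0]

split, axis = dc_split_decision(torch.randn(16, 3) * 1e-3)
```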
  {
    "path": "abs/2510.27318.md",
    "content": "### SAGS: Self-Adaptive Alias-Free Gaussian Splatting for Dynamic Surgical Endoscopic Reconstruction\n\nSurgical reconstruction of dynamic tissues from endoscopic videos is a crucial technology in robot-assisted surgery. The development of Neural Radiance Fields (NeRFs) has greatly advanced deformable tissue reconstruction, achieving high-quality results from video and image sequences. However, reconstructing deformable endoscopic scenes remains challenging due to aliasing and artifacts caused by tissue movement, which can significantly degrade visualization quality. The introduction of 3D Gaussian Splatting (3DGS) has improved reconstruction efficiency by enabling a faster rendering pipeline. Nevertheless, existing 3DGS methods often prioritize rendering speed while neglecting these critical issues. To address these challenges, we propose SAGS, a self-adaptive alias-free Gaussian splatting framework. We introduce an attention-driven, dynamically weighted 4D deformation decoder, leveraging 3D smoothing filters and 2D Mip filters to mitigate artifacts in deformable tissue reconstruction and better capture the fine details of tissue movement. Experimental results on two public benchmarks, EndoNeRF and SCARED, demonstrate that our method achieves superior performance in all metrics of PSNR, SSIM, and LPIPS compared to the state of the art while also delivering better visualization quality.\n\n通过内窥镜视频对动态组织进行手术重建，是机器人辅助手术中的一项关键技术。神经辐射场（NeRFs）的发展极大推动了可变形组织的重建，使得从视频和图像序列中实现高质量的重建成为可能。然而，由于组织运动引起的混叠与伪影问题，可变形内窥镜场景的重建仍然面临挑战，这些问题会显著降低可视化质量。3D高斯投影（3D Gaussian Splatting, 3DGS）的引入通过更快的渲染管线提升了重建效率。然而，现有3DGS方法往往更关注渲染速度，而忽视了上述关键问题。为应对这些挑战，我们提出了SAGS——一种自适应、无混叠的高斯投影框架。该方法引入了一个基于注意力机制的动态加权4D形变解码器，并结合3D平滑滤波器与2D Mip滤波器，有效缓解可变形组织重建中的伪影问题，更好地捕捉组织运动的细节。在两个公开基准数据集EndoNeRF与SCARED上的实验结果表明，我们的方法在PSNR、SSIM与LPIPS等所有指标上均优于现有技术，并在可视化质量上取得了更佳表现。\n"
  },
  {
    "path": "abs/2511.00248.md",
    "content": "### Object-Aware 4D Human Motion Generation\n\nRecent advances in video diffusion models have enabled the generation of high-quality videos. However, these videos still suffer from unrealistic deformations, semantic violations, and physical inconsistencies that are largely rooted in the absence of 3D physical priors. To address these challenges, we propose an object-aware 4D human motion generation framework grounded in 3D Gaussian representations and motion diffusion priors. With pre-generated 3D humans and objects, our method, Motion Score Distilled Interaction (MSDI), employs the spatial and prompt semantic information in large language models (LLMs) and motion priors through the proposed Motion Diffusion Score Distillation Sampling (MSDS). The combination of MSDS and LLMs enables our spatial-aware motion optimization, which distills score gradients from pre-trained motion diffusion models, to refine human motion while respecting object and semantic constraints. Unlike prior methods requiring joint training on limited interaction datasets, our zero-shot approach avoids retraining and generalizes to out-of-distribution object aware human motions. Experiments demonstrate that our framework produces natural and physically plausible human motions that respect 3D spatial context, offering a scalable solution for realistic 4D generation.\n\n视频扩散模型的最新进展已实现了高质量视频的生成。然而，这些视频仍然存在不真实的形变、语义违背和物理不一致性等问题，其根源在于缺乏三维物理先验。为了解决这些挑战，我们提出了一个基于三维高斯表示和运动扩散先验的、具备物体感知能力的四维人体运动生成框架。借助预生成的3D人体与物体，我们的方法——Motion Score Distilled Interaction（MSDI），通过所提出的Motion Diffusion Score Distillation Sampling（MSDS）机制，结合大语言模型（LLMs）中的空间与语义提示信息以及运动先验，实现空间感知的运动优化。MSDS与LLMs的结合，使我们能够从预训练的运动扩散模型中提取梯度得分，用于在人-物体语义约束下优化人体运动。与以往需在有限交互数据集上联合训练的方法不同，我们的零样本方法无需重新训练，能够泛化到分布外的物体感知人体运动。实验结果表明，我们的框架能够生成自然、物理合理的3D空间上下文一致的人体运动，为现实可行的4D生成提供了可扩展的解决方案。\n"
  },
  {
    "path": "abs/2511.00503.md",
    "content": "### Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models\n\nWe introduce Diff4Splat, a feed-forward method that synthesizes controllable and explicit 4D scenes from a single image. Our approach unifies the generative priors of video diffusion models with geometry and motion constraints learned from large-scale 4D datasets. Given a single input image, a camera trajectory, and an optional text prompt, Diff4Splat directly predicts a deformable 3D Gaussian field that encodes appearance, geometry, and motion, all in a single forward pass, without test-time optimization or post-hoc refinement. At the core of our framework lies a video latent transformer, which augments video diffusion models to jointly capture spatio-temporal dependencies and predict time-varying 3D Gaussian primitives. Training is guided by objectives on appearance fidelity, geometric accuracy, and motion consistency, enabling Diff4Splat to synthesize high-quality 4D scenes in 30 seconds. We demonstrate the effectiveness of Diff4Splatacross video generation, novel view synthesis, and geometry extraction, where it matches or surpasses optimization-based methods for dynamic scene synthesis while being significantly more efficient.\n\n我们提出了 Diff4Splat，一种前馈式方法，可从单张图像合成可控且显式的四维场景。该方法融合了视频扩散模型的生成先验与从大规模四维数据集中学习到的几何与运动约束。给定一张输入图像、一个相机轨迹以及可选的文本提示，Diff4Splat 能够在一次前向推理中直接预测一个可变形的三维高斯场，编码外观、几何和运动信息，无需测试时优化或后处理精修。我们框架的核心是一个视频潜变量变换器，它增强了视频扩散模型，使其能够联合捕捉时空依赖关系，并预测时变的三维高斯基元。训练过程中通过外观保真度、几何精度和运动一致性等目标进行引导，使 Diff4Splat 能在 30 秒内合成高质量的四维场景。我们在视频生成、新视角合成和几何提取等任务中展示了 Diff4Splat 的有效性，其性能与基于优化的方法相当甚至优于它们，同时具备显著更高的效率。\n"
  },
  {
    "path": "abs/2511.00560.md",
    "content": "### 4D Neural Voxel Splatting: Dynamic Scene Rendering with Voxelized Guassian Splatting\n\nAlthough 3D Gaussian Splatting (3D-GS) achieves efficient rendering for novel view synthesis, extending it to dynamic scenes still results in substantial memory overhead from replicating Gaussians across frames. To address this challenge, we propose 4D Neural Voxel Splatting (4D-NVS), which combines voxel-based representations with neural Gaussian splatting for efficient dynamic scene modeling. Instead of generating separate Gaussian sets per timestamp, our method employs a compact set of neural voxels with learned deformation fields to model temporal dynamics. The design greatly reduces memory consumption and accelerates training while preserving high image quality. We further introduce a novel view refinement stage that selectively improves challenging viewpoints through targeted optimization, maintaining global efficiency while enhancing rendering quality for difficult viewing angles. Experiments demonstrate that our method outperforms state-of-the-art approaches with significant memory reduction and faster training, enabling real-time rendering with superior visual fidelity.\n\n尽管三维高斯投影（3D Gaussian Splatting, 3D-GS）在新视角合成任务中实现了高效渲染，但将其扩展到动态场景时仍然面临显著的内存开销问题，这是由于在每一帧中重复复制高斯点所致。为了解决这一挑战，我们提出了四维神经体素投影（4D Neural Voxel Splatting, 4D-NVS），该方法将基于体素的表示与神经高斯投影相结合，实现对动态场景的高效建模。我们不再为每个时间戳生成独立的高斯集合，而是采用一组紧凑的神经体素，并通过学习的形变场来建模时间变化。该设计在保留图像质量的同时大幅降低了内存消耗并加快了训练速度。我们进一步引入了一个新颖的视角优化阶段，通过针对性的优化提升困难视角的渲染质量，在保持整体效率的同时增强了复杂角度下的渲染表现。实验结果表明，4D-NVS 在显著降低内存占用和加速训练的同时，超越了现有的最新方法，实现了具有更高视觉保真度的实时渲染。\n"
  },
  {
    "path": "abs/2511.00998.md",
    "content": "### GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies\n\nRecently, effective coordination in embodied multi-agent systems has remained a fundamental challenge, particularly in scenarios where agents must balance individual perspectives with global environmental awareness. Existing approaches often struggle to balance fine-grained local control with comprehensive scene understanding, resulting in limited scalability and compromised collaboration quality. In this paper, we present GauDP, a novel Gaussian-image synergistic representation that facilitates scalable, perception-aware imitation learning in multi-agent collaborative systems. Specifically, GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations, then dynamically redistributes 3D Gaussian attributes to each agent's local perspective. This enables all agents to adaptively query task-critical features from the shared scene representation while maintaining their individual viewpoints. This design facilitates both fine-grained control and globally coherent behavior without requiring additional sensing modalities (e.g., 3D point cloud). We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks. Our method achieves superior performance over existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while maintaining strong scalability as the number of agents increases.\n\n近年来，在具身多智能体系统中实现高效协同始终是一项核心挑战，尤其是在需要智能体兼顾个体视角与全局环境感知的场景中。现有方法常常难以在精细的局部控制与全面的场景理解之间取得平衡，导致系统可扩展性受限，协作质量下降。本文提出了 GauDP，一种高斯图像协同表示方法，支持多智能体协作系统中的可扩展、感知感知的模仿学习。具体而言，GauDP 从分散的 RGB 观测中构建一个全局一致的三维高斯场，并将其属性动态映射到每个智能体的局部视角中。这一机制使得所有智能体能够从共享的场景表示中自适应地查询与任务相关的关键特征，同时保持各自的独立观察视角。该设计无需额外的感知模态（如三维点云），即可实现细粒度控制与全局一致行为的统一。在 RoboFactory 基准测试中，涵盖了多种多机械臂协作任务，我们的方法在性能上显著优于现有的基于图像的方法，并接近基于点云的方法的效果，同时在智能体数量增加时保持良好的扩展性。\n"
  },
  {
    "path": "abs/2511.01373.md",
    "content": "### 3D Gaussian Radiation Field Modeling for Integrated RIS-FAS Systems: Analysis and Optimization\n\nThe integration of reconfigurable intelligent surfaces (RIS) and fluid antenna systems (FAS) has attracted considerable attention due to its tremendous potential in enhancing wireless communication performance. However, under fast-fading channel conditions, rapidly and effectively performing joint optimization of the antenna positions in an FAS system and the RIS phase configuration remains a critical challenge. Traditional optimization methods typically rely on complex iterative computations, thus making it challenging to obtain optimal solutions in real time within dynamic channel environments. To address this issue, this paper introduces a field information-driven optimization method based on three-dimensional Gaussian radiation-field modeling for real-time optimization of integrated FAS-RIS systems. In the proposed approach, obstacles are treated as virtual transmitters and, by separately learning the amplitude and phase variations, the model can quickly generate high-precision channel information based on the transmitter's position. This design eliminates the need for extensive pilot overhead and cumbersome computations. On this framework, an alternating optimization scheme is presented to jointly optimize the FAS position and the RIS phase configuration. Simulation results demonstrate that the proposed method significantly outperforms existing approaches in terms of spectrum prediction accuracy, convergence speed, and minimum achievable rate, validating its effectiveness and practicality in fast-fading scenarios.\n\n可重构智能表面（RIS）与流动天线系统（FAS）的融合因其在提升无线通信性能方面的巨大潜力而备受关注。然而，在快衰落信道条件下，如何快速且高效地联合优化FAS系统中的天线位置与RIS的相位配置，仍然是一个关键难题。传统的优化方法通常依赖于复杂的迭代计算，因此难以在动态信道环境中实时获得最优解。为了解决这一问题，本文提出了一种基于三维高斯辐射场建模的场信息驱动优化方法，用于FAS-RIS一体化系统的实时优化。在该方法中，将障碍物视为虚拟发射源，并通过分别学习振幅与相位的变化，模型能够依据发射源位置快速生成高精度的信道信息。这一设计免去了大量导频开销与繁重的计算。在此框架下，进一步提出了一种交替优化机制，用于联合优化FAS的位置与RIS的相位配置。仿真结果表明，该方法在频谱预测精度、收敛速度以及最小可达速率等方面显著优于现有方法，验证了其在快衰落场景中的有效性与实用性。\n"
  },
  {
    "path": "abs/2511.02207.md",
    "content": "### Object-Centric 3D Gaussian Splatting for Strawberry Plant Reconstruction and Phenotyping\n\nStrawberries are among the most economically significant fruits in the United States, generating over $2 billion in annual farm-gate sales and accounting for approximately 13% of the total fruit production value. Plant phenotyping plays a vital role in selecting superior cultivars by characterizing plant traits such as morphology, canopy structure, and growth dynamics. However, traditional plant phenotyping methods are time-consuming, labor-intensive, and often destructive. Recently, neural rendering techniques, notably Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have emerged as powerful frameworks for high-fidelity 3D reconstruction. By capturing a sequence of multi-view images or videos around a target plant, these methods enable non-destructive reconstruction of complex plant architectures. Despite their promise, most current applications of 3DGS in agricultural domains reconstruct the entire scene, including background elements, which introduces noise, increases computational costs, and complicates downstream trait analysis. To address this limitation, we propose a novel object-centric 3D reconstruction framework incorporating a preprocessing pipeline that leverages the Segment Anything Model v2 (SAM-2) and alpha channel background masking to achieve clean strawberry plant reconstructions. This approach produces more accurate geometric representations while substantially reducing computational time. With a background-free reconstruction, our algorithm can automatically estimate important plant traits, such as plant height and canopy width, using DBSCAN clustering and Principal Component Analysis (PCA). Experimental results show that our method outperforms conventional pipelines in both accuracy and efficiency, offering a scalable and non-destructive solution for strawberry plant phenotyping.\n\n草莓是美国最具经济意义的水果之一，其年度农场销售额超过20亿美元，占水果总产值的约13%。植物表型分析在优良品种筛选中扮演着关键角色，通过表征植物的形态特征、冠层结构和生长动态等属性，实现品种优选。然而，传统的植物表型分析方法通常耗时长、劳动强度大，且常常具有破坏性。近年来，神经渲染技术，尤其是神经辐射场（NeRF）和三维高斯投影（3D Gaussian Splatting, 3DGS），已成为实现高保真三维重建的强大框架。通过围绕目标植物拍摄多视角图像或视频序列，这些方法能够以非破坏性的方式重建复杂的植物结构。尽管这类方法极具潜力，但当前3DGS在农业领域的应用多数会重建整个场景，包括背景元素，这会引入噪声、增加计算开销，并使后续的性状分析变得更加复杂。为了解决这一问题，我们提出了一种新颖的以目标为中心的三维重建框架，集成了预处理流程，该流程利用Segment Anything Model v2（SAM-2）和alpha通道背景遮罩技术，实现了草莓植物的干净重建。该方法能够生成更为精准的几何表示，同时显著降低计算时间。借助无背景的重建结果，我们的算法可以通过DBSCAN聚类和主成分分析（PCA）自动估计植物的重要性状，如植株高度和冠幅宽度。实验结果表明，我们的方法在精度和效率上均优于传统流程，提供了一种可扩展且非破坏性的草莓植物表型分析解决方案。\n"
  },
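The trait-extraction step of the strawberry pipeline is stated explicitly above (DBSCAN clustering plus PCA), so a minimal sketch is straightforward. It assumes a background-free cloud of Gaussian centers in metres with the z-axis up; the `eps`/`min_samples` values and the toy cloud are illustrative only.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

def plant_traits(points: np.ndarray, eps: float = 0.05, min_samples: int = 10):
    """Estimate plant height and canopy width from a background-free cloud.

    points: (N, 3) Gaussian centers in metres, z-axis assumed up.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    plant = points[labels == np.bincount(labels[labels >= 0]).argmax()]
    height = plant[:, 2].max() - plant[:, 2].min()
    # Canopy width: extent of the horizontal footprint along its first
    # principal axis (PCA makes the measure rotation-invariant).
    axis0 = PCA(n_components=2).fit_transform(plant[:, :2])[:, 0]
    return height, axis0.max() - axis0.min()

pts = np.random.rand(2000, 3) * [0.3, 0.3, 0.25]   # toy, roughly plant-sized
h, w = plant_traits(pts)
```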
  {
    "path": "abs/2511.02777.md",
    "content": "### PercHead: Perceptual Head Model for Single-Image 3D Head Reconstruction & Editing\n\nWe present PercHead, a method for single-image 3D head reconstruction and semantic 3D editing - two tasks that are inherently challenging due to severe view occlusions, weak perceptual supervision, and the ambiguity of editing in 3D space. We develop a unified base model for reconstructing view-consistent 3D heads from a single input image. The model employs a dual-branch encoder followed by a ViT-based decoder that lifts 2D features into 3D space through iterative cross-attention. Rendering is performed using Gaussian Splatting. At the heart of our approach is a novel perceptual supervision strategy based on DINOv2 and SAM2.1, which provides rich, generalized signals for both geometric and appearance fidelity. Our model achieves state-of-the-art performance in novel-view synthesis and, furthermore, exhibits exceptional robustness to extreme viewing angles compared to established baselines. Furthermore, this base model can be seamlessly extended for semantic 3D editing by swapping the encoder and finetuning the network. In this variant, we disentangle geometry and style through two distinct input modalities: a segmentation map to control geometry and either a text prompt or a reference image to specify appearance. We highlight the intuitive and powerful 3D editing capabilities of our model through a lightweight, interactive GUI, where users can effortlessly sculpt geometry by drawing segmentation maps and stylize appearance via natural language or image prompts.\n\n我们提出了 PercHead，这是一种用于单张图像的三维人头重建与语义三维编辑的方法。这两个任务本质上都极具挑战性，原因包括视角遮挡严重、感知监督信号薄弱，以及三维空间中编辑的模糊性。我们构建了一个统一的基础模型，能够从一张输入图像中重建视角一致的三维人头。该模型采用双分支编码器，接着是一个基于ViT的解码器，利用迭代交叉注意力将二维特征提升至三维空间。渲染则通过高斯投影（Gaussian Splatting）完成。我们方法的核心是一种基于 DINOv2 和 SAM2.1 的新型感知监督策略，能够为几何结构和外观保真提供丰富且通用的信号。我们的模型在新视角合成方面达到了当前最先进的性能，并在极端视角下表现出优于现有基线的强鲁棒性。此外，该基础模型可无缝扩展至语义三维编辑任务，只需替换编码器并微调网络。在这一变体中，我们通过两种不同的输入模态实现几何与风格的解耦：使用分割图控制几何结构，使用文本提示或参考图像指定外观风格。我们通过一个轻量级、交互式的图形界面展示了该模型直观而强大的三维编辑能力，用户可以通过绘制分割图轻松塑造几何结构，并通过自然语言或图像提示进行风格化外观编辑。\n"
  },
  {
    "path": "abs/2511.03099.md",
    "content": "### DentalSplat: Dental Occlusion Novel View Synthesis from Sparse Intra-Oral Photographs\n\nIn orthodontic treatment, particularly within telemedicine contexts, observing patients' dental occlusion from multiple viewpoints facilitates timely clinical decision-making. Recent advances in 3D Gaussian Splatting (3DGS) have shown strong potential in 3D reconstruction and novel view synthesis. However, conventional 3DGS pipelines typically rely on densely captured multi-view inputs and precisely initialized camera poses, limiting their practicality. Orthodontic cases, in contrast, often comprise only three sparse images, specifically, the anterior view and bilateral buccal views, rendering the reconstruction task especially challenging. The extreme sparsity of input views severely degrades reconstruction quality, while the absence of camera pose information further complicates the process. To overcome these limitations, we propose DentalSplat, an effective framework for 3D reconstruction from sparse orthodontic imagery. Our method leverages a prior-guided dense stereo reconstruction model to initialize the point cloud, followed by a scale-adaptive pruning strategy to improve the training efficiency and reconstruction quality of 3DGS. In scenarios with extremely sparse viewpoints, we further incorporate optical flow as a geometric constraint, coupled with gradient regularization, to enhance rendering fidelity. We validate our approach on a large-scale dataset comprising 950 clinical cases and an additional video-based test set of 195 cases designed to simulate real-world remote orthodontic imaging conditions. Experimental results demonstrate that our method effectively handles sparse input scenarios and achieves superior novel view synthesis quality for dental occlusion visualization, outperforming state-of-the-art techniques.\n\n在正畸治疗中，尤其是在远程医疗环境下，从多个视角观察患者的牙合情况有助于临床决策的及时制定。近年来，三维高斯投影（3D Gaussian Splatting, 3DGS）在三维重建和新视角合成方面展现出强大的潜力。然而，传统的3DGS流程通常依赖密集捕获的多视角输入和精确初始化的相机位姿，这限制了其实际应用。而正畸病例往往仅包含三张稀疏图像，分别为正面视角和双侧颊面视角，使得重建任务尤为具有挑战性。输入视角的极度稀疏性严重影响重建质量，而缺乏相机位姿信息进一步加剧了困难。为克服这些限制，我们提出了DentalSplat，一种用于稀疏正畸图像三维重建的有效框架。我们的方法首先利用先验引导的密集立体重建模型初始化点云，然后采用尺度自适应的剪枝策略，以提升3DGS的训练效率和重建质量。在极度稀疏视角的场景下，我们进一步引入光流作为几何约束，并结合梯度正则化，以增强渲染的保真度。我们在一个包含950个临床病例的大规模数据集上进行了验证，并额外引入一个由195个病例构成的视频测试集，用以模拟真实的远程正畸成像条件。实验结果表明，我们的方法在稀疏输入场景中表现出色，并在牙合可视化的新视角合成质量方面超越了现有最先进技术。\n"
  },
  {
    "path": "abs/2511.03950.md",
    "content": "### Improving Multi-View Reconstruction via Texture-Guided Gaussian-Mesh Joint Optimization\n\nReconstructing real-world objects from multi-view images is essential for applications in 3D editing, AR/VR, and digital content creation. Existing methods typically prioritize either geometric accuracy (Multi-View Stereo) or photorealistic rendering (Novel View Synthesis), often decoupling geometry and appearance optimization, which hinders downstream editing tasks. This paper advocates an unified treatment on geometry and appearance optimization for seamless Gaussian-mesh joint optimization. More specifically, we propose a novel framework that simultaneously optimizes mesh geometry (vertex positions and faces) and vertex colors via Gaussian-guided mesh differentiable rendering, leveraging photometric consistency from input images and geometric regularization from normal and depth maps. The obtained high-quality 3D reconstruction can be further exploit in down-stream editing tasks, such as relighting and shape deformation.\n\n从多视角图像中重建真实世界的物体对于三维编辑、增强现实/虚拟现实（AR/VR）以及数字内容创作等应用至关重要。现有方法通常倾向于在几何精度（如多视图立体重建）或照片级真实渲染（如新视角合成）之间做权衡，往往将几何与外观优化过程解耦，这在很大程度上限制了后续的编辑任务。本文倡导对几何与外观优化进行统一处理，实现高斯与网格的联合优化。具体而言，我们提出了一个新颖的框架，能够通过高斯引导的可微分网格渲染同时优化网格几何（顶点位置与面片）和顶点颜色，该框架利用输入图像的光度一致性以及法线图与深度图提供的几何正则信息。最终获得的高质量三维重建结果可以进一步用于后续的编辑任务，例如重光照和形状变形。\n"
  },
  {
    "path": "abs/2511.03992.md",
    "content": "### CaRF: Enhancing Multi-View Consistency in Referring 3D Gaussian Splatting Segmentation\n\nReferring 3D Gaussian Splatting Segmentation (R3DGS) aims to interpret free-form language expressions and localize the corresponding 3D regions in Gaussian fields. While recent advances have introduced cross-modal alignment between language and 3D geometry, existing pipelines still struggle with cross-view consistency due to their reliance on 2D rendered pseudo supervision and view specific feature learning. In this work, we present Camera Aware Referring Field (CaRF), a fully differentiable framework that operates directly in the 3D Gaussian space and achieves multi view consistency. Specifically, CaRF introduces Gaussian Field Camera Encoding (GFCE), which incorporates camera geometry into Gaussian text interactions to explicitly model view dependent variations and enhance geometric reasoning. Building on this, In Training Paired View Supervision (ITPVS) is proposed to align per Gaussian logits across calibrated views during training, effectively mitigating single view overfitting and exposing inter view discrepancies for optimization. Extensive experiments on three representative benchmarks demonstrate that CaRF achieves average improvements of 16.8%, 4.3%, and 2.0% in mIoU over state of the art methods on the Ref LERF, LERF OVS, and 3D OVS datasets, respectively. Moreover, this work promotes more reliable and view consistent 3D scene understanding, with potential benefits for embodied AI, AR/VR interaction, and autonomous perception.\n\n指向性三维高斯投影分割（Referring 3D Gaussian Splatting Segmentation, R3DGS）旨在理解自由形式的语言表达，并在高斯场中定位对应的三维区域。尽管近年来在语言与三维几何之间的跨模态对齐方面取得了进展，现有方法仍依赖于二维渲染的伪监督和视图特定的特征学习，难以实现视角一致性。在本研究中，我们提出了Camera Aware Referring Field（CaRF），一个直接在三维高斯空间中操作并实现多视角一致性的全可微分框架。具体而言，CaRF引入了高斯场相机编码（Gaussian Field Camera Encoding, GFCE），将相机几何信息融入高斯与文本的交互之中，显式建模视图相关的变化并增强几何推理能力。在此基础上，我们提出了训练中配对视图监督（In Training Paired View Supervision, ITPVS），在训练过程中对校准视图下的每个高斯的logits进行对齐，有效缓解了单视图过拟合，并暴露视图间差异以供优化。我们在三个具有代表性的基准数据集上进行了大量实验，结果表明CaRF在Ref LERF、LERF OVS和3D OVS数据集上相较于现有最先进方法在mIoU指标上分别实现了16.8%、4.3%和2.0%的平均提升。此外，该方法有助于实现更可靠、视角一致的三维场景理解，在具身智能、增强/虚拟现实交互以及自动感知等领域具有广阔的应用前景。\n"
  },
  {
    "path": "abs/2511.04283.md",
    "content": "### FastGS: Training 3D Gaussian Splatting in 100 Seconds\n\nThe dominant 3D Gaussian splatting (3DGS) acceleration methods fail to properly regulate the number of Gaussians during training, causing redundant computational time overhead. In this paper, we propose FastGS, a novel, simple, and general acceleration framework that fully considers the importance of each Gaussian based on multi-view consistency, efficiently solving the trade-off between training time and rendering quality. We innovatively design a densification and pruning strategy based on multi-view consistency, dispensing with the budgeting mechanism. Extensive experiments on Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets demonstrate that our method significantly outperforms the state-of-the-art methods in training speed, achieving a 3.32× training acceleration and comparable rendering quality compared with DashGaussian on the Mip-NeRF 360 dataset and a 15.45× acceleration compared with vanilla 3DGS on the Deep Blending dataset. We demonstrate that FastGS exhibits strong generality, delivering 2-7× training acceleration across various tasks, including dynamic scene reconstruction, surface reconstruction, sparse-view reconstruction, large-scale reconstruction, and simultaneous localization and mapping.\n\n当前主流的三维高斯投影（3D Gaussian Splatting, 3DGS）加速方法在训练过程中未能有效调控高斯数量，导致冗余的计算时间开销。本文提出了FastGS，一种新颖、简洁且通用的加速框架，能够基于多视角一致性全面评估每个高斯的重要性，从而高效解决训练时间与渲染质量之间的权衡问题。我们创新性地设计了基于多视角一致性的密化与剪枝策略，摒弃了传统的预算机制。我们在Mip-NeRF 360、Tanks & Temples以及Deep Blending等数据集上进行了大量实验，结果表明：与最先进方法相比，FastGS在训练速度方面显著提升，在Mip-NeRF 360数据集上相较于DashGaussian实现了3.32倍的加速，同时保持了可比的渲染质量；在Deep Blending数据集上相较于原始3DGS实现了15.45倍的加速。此外，我们还展示了FastGS在多种任务中的强泛化能力，在动态场景重建、表面重建、稀疏视图重建、大规模重建和同时定位与建图（SLAM）等任务中实现了2至7倍的训练加速效果。\n"
  },
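FastGS's abstract names multi-view-consistency-based importance but not its exact form; the sketch below is one plausible reading, scoring each Gaussian by the fraction of training views in which it contributes non-trivially, then gating densification and pruning on that score. All thresholds, the `contrib` statistic, and both function names are our assumptions.

```python
import torch

def multiview_importance(contrib: torch.Tensor, tau: float = 0.01) -> torch.Tensor:
    """Fraction of training views in which each Gaussian meaningfully contributes.

    contrib: (V, G) peak blending weight of each of G Gaussians in V views.
    """
    return (contrib > tau).float().mean(dim=0)        # per-Gaussian score in [0, 1]

def densify_and_prune(contrib, grads, grad_thresh=2e-4, prune_q=0.10):
    imp = multiview_importance(contrib)
    # Densify only where the view-averaged gradient is large *and* many views
    # agree the Gaussian matters; prune the least view-consistent primitives.
    densify = (grads > grad_thresh) & (imp > imp.median())
    prune = imp < imp.quantile(prune_q)
    return densify, prune

V, G = 8, 1_000
densify, prune = densify_and_prune(torch.rand(V, G) * 0.05, torch.rand(G) * 4e-4)
```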
  {
    "path": "abs/2511.04595.md",
    "content": "### UniSplat: Unified Spatio-Temporal Fusion via 3D Latent Scaffolds for Dynamic Driving Scene Reconstruction\n\nFeed-forward 3D reconstruction for autonomous driving has advanced rapidly, yet existing methods struggle with the joint challenges of sparse, non-overlapping camera views and complex scene dynamics. We present UniSplat, a general feed-forward framework that learns robust dynamic scene reconstruction through unified latent spatio-temporal fusion. UniSplat constructs a 3D latent scaffold, a structured representation that captures geometric and semantic scene context by leveraging pretrained foundation models. To effectively integrate information across spatial views and temporal frames, we introduce an efficient fusion mechanism that operates directly within the 3D scaffold, enabling consistent spatio-temporal alignment. To ensure complete and detailed reconstructions, we design a dual-branch decoder that generates dynamic-aware Gaussians from the fused scaffold by combining point-anchored refinement with voxel-based generation, and maintain a persistent memory of static Gaussians to enable streaming scene completion beyond current camera coverage. Extensive experiments on real-world datasets demonstrate that UniSplat achieves state-of-the-art performance in novel view synthesis, while providing robust and high-quality renderings even for viewpoints outside the original camera coverage.\n\n用于自动驾驶的前馈式三维重建技术发展迅速，但现有方法在面对稀疏且无重叠的相机视角以及复杂的场景动态时仍然面临巨大挑战。我们提出了 UniSplat，一个通用的前馈框架，通过统一的时空潜表示融合学习稳健的动态场景重建。UniSplat 构建了一个三维潜在支架，这是一种结构化表示，借助预训练基础模型来捕捉场景的几何与语义上下文。为了高效整合空间视角与时间帧之间的信息，我们引入了一种直接在三维支架中进行操作的高效融合机制，实现一致的时空对齐。为保证完整且细致的重建效果，我们设计了一个双分支解码器，从融合后的支架中生成对动态敏感的高斯表示，结合点锚定精细化与基于体素的生成方式，同时保留静态高斯的持久记忆，从而实现超出当前相机覆盖范围的流式场景补全。我们在真实场景数据集上进行了大量实验，结果表明 UniSplat 在新视角合成方面达到了当前最先进的性能，并在原始相机视角之外仍能生成稳健且高质量的渲染效果。\n"
  },
  {
    "path": "abs/2511.04665.md",
    "content": "### Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions\n\nRobotic manipulation policies are advancing rapidly, but their direct evaluation in the real world remains costly, time-consuming, and difficult to reproduce, particularly for tasks involving deformable objects. Simulation provides a scalable and systematic alternative, yet existing simulators often fail to capture the coupled visual and physical complexity of soft-body interactions. We present a real-to-sim policy evaluation framework that constructs soft-body digital twins from real-world videos and renders robots, objects, and environments with photorealistic fidelity using 3D Gaussian Splatting. We validate our approach on representative deformable manipulation tasks, including plush toy packing, rope routing, and T-block pushing, demonstrating that simulated rollouts correlate strongly with real-world execution performance and reveal key behavioral patterns of learned policies. Our results suggest that combining physics-informed reconstruction with high-quality rendering enables reproducible, scalable, and accurate evaluation of robotic manipulation policies.\n\n机器人操作策略正迅速发展，但在现实环境中的直接评估仍然成本高昂、耗时且难以复现，尤其是在涉及可变形物体的任务中。仿真提供了一种可扩展且系统性的替代方案，然而现有模拟器往往难以真实捕捉软体物体交互中视觉与物理之间的耦合复杂性。我们提出了一种从真实到仿真的策略评估框架，通过真实世界视频构建可变形物体的数字孪生体，并利用三维高斯投影（3D Gaussian Splatting）以照片级真实感渲染机器人、物体和环境。我们在具有代表性的可变形操作任务中验证了该方法，包括毛绒玩具打包、绳索布线和T形块推动，实验表明仿真回放与真实执行表现之间具有高度相关性，能够揭示学习策略中的关键行为模式。我们的研究结果表明，将物理感知的重建与高质量渲染相结合，可以实现对机器人操作策略的可复现、可扩展且精确的评估。\n"
  },
  {
    "path": "abs/2511.04797.md",
    "content": "### 3D Gaussian Point Encoders\n\nIn this work, we introduce the 3D Gaussian Point Encoder, an explicit per-point embedding built on mixtures of learned 3D Gaussians. This explicit geometric representation for 3D recognition tasks is a departure from widely used implicit representations such as PointNet. However, it is difficult to learn 3D Gaussian encoders in end-to-end fashion with standard optimizers. We develop optimization techniques based on natural gradients and distillation from PointNets to find a Gaussian Basis that can reconstruct PointNet activations. The resulting 3D Gaussian Point Encoders are faster and more parameter efficient than traditional PointNets. As in the 3D reconstruction literature where there has been considerable interest in the move from implicit (e.g., NeRF) to explicit (e.g., Gaussian Splatting) representations, we can take advantage of computational geometry heuristics to accelerate 3D Gaussian Point Encoders further. We extend filtering techniques from 3D Gaussian Splatting to construct encoders that run 2.7 times faster as a comparable accuracy PointNet while using 46% less memory and 88% fewer FLOPs. Furthermore, we demonstrate the effectiveness of 3D Gaussian Point Encoders as a component in Mamba3D, running 1.27 times faster and achieving a reduction in memory and FLOPs by 42% and 54% respectively. 3D Gaussian Point Encoders are lightweight enough to achieve high framerates on CPU-only devices.\n\n在本研究中，我们提出了3D Gaussian Point Encoder，一种基于学习得到的三维高斯混合模型的显式逐点嵌入方法。这种用于三维识别任务的显式几何表示方式突破了PointNet等广泛使用的隐式表示。然而，使用标准优化器很难以端到端的方式训练3D高斯编码器。为此，我们开发了基于自然梯度和PointNet蒸馏的优化技术，以学习能够重建PointNet激活的高斯基表示。所得到的3D Gaussian Point Encoder在速度和参数效率上均优于传统的PointNet。正如三维重建领域中从隐式（如NeRF）向显式（如Gaussian Splatting）表示迁移的趋势一样，我们也可利用计算几何启发式加速3D Gaussian Point Encoder的效率。我们扩展了3D Gaussian Splatting中的过滤技术，构建出的编码器在达到与PointNet相当精度的前提下，实现了2.7倍的加速，同时内存占用减少46%，FLOPs减少88%。此外，我们进一步展示了3D Gaussian Point Encoder在Mamba3D框架中的有效性，其运行速度提升至1.27倍，同时内存和FLOPs分别降低42%和54%。该方法足够轻量，可在仅使用CPU的设备上实现高帧率运行。\n"
  },
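A per-point embedding from a mixture of learned 3D Gaussians can be written compactly: each output channel is the response of one anisotropic Gaussian basis function. The sketch below shows this forward pass only; the natural-gradient optimization, PointNet distillation, and splatting-style filtering described above are omitted, and the parameterization (a learnable whitening factor per Gaussian) is our assumption.

```python
import torch

class GaussianPointEncoder(torch.nn.Module):
    """Explicit per-point embedding: K learned anisotropic Gaussian responses."""
    def __init__(self, k: int = 64, dim: int = 3):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.randn(k, dim))
        # Parameterize the inverse covariance via a factor L: Sigma^-1 = L L^T.
        self.L = torch.nn.Parameter(torch.eye(dim).repeat(k, 1, 1))
        self.weight = torch.nn.Parameter(torch.ones(k))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3) points -> (N, K) basis activations.
        d = x.unsqueeze(1) - self.mu.unsqueeze(0)      # (N, K, 3) offsets
        z = torch.einsum('nkd,kde->nke', d, self.L)    # whiten per Gaussian
        return self.weight * torch.exp(-0.5 * (z * z).sum(-1))

enc = GaussianPointEncoder()
feats = enc(torch.randn(128, 3))                       # (128, 64)
# Global shape feature via max pooling, as in PointNet-style pipelines.
global_feat = feats.max(dim=0).values
```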
  {
    "path": "abs/2511.04951.md",
    "content": "### CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) is an increasingly popular novel view synthesis approach due to its fast rendering time, and high-quality output. However, scaling 3DGS to large (or intricate) scenes is challenging due to its large memory requirement, which exceed most GPU's memory capacity. In this paper, we describe CLM, a system that allows 3DGS to render large scenes using a single consumer-grade GPU, e.g., RTX4090. It does so by offloading Gaussians to CPU memory, and loading them into GPU memory only when necessary. To reduce performance and communication overheads, CLM uses a novel offloading strategy that exploits observations about 3DGS's memory access pattern for pipelining, and thus overlap GPU-to-CPU communication, GPU computation and CPU computation. Furthermore, we also exploit observation about the access pattern to reduce communication volume. Our evaluation shows that the resulting implementation can render a large scene that requires 100 million Gaussians on a single RTX4090 and achieve state-of-the-art reconstruction quality.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）因其渲染速度快、输出质量高，近年来在新视角合成任务中日益受到关注。然而，将3DGS扩展到大型或复杂场景面临显著挑战，主要原因在于其对内存的需求极高，超出了大多数GPU的内存容量。本文提出了CLM系统，使得3DGS可以在一块消费级GPU（如RTX4090）上渲染大规模场景。该系统通过将高斯数据卸载至CPU内存，仅在需要时再加载至GPU，从而规避了GPU内存瓶颈。为减少性能与通信开销，CLM设计了一种新颖的卸载策略，基于对3DGS内存访问模式的观察进行流水线处理，从而实现GPU到CPU通信、GPU计算与CPU计算的并行化。此外，我们还进一步利用访问模式特性，降低了通信数据量。评估结果表明，该实现能够在单块RTX4090上渲染一个包含1亿个高斯的大型场景，同时达到当前最先进的重建质量。\n"
  },
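The pipelining idea in CLM can be illustrated with generic double buffering: prefetch the next bucket of Gaussian attributes on a side CUDA stream while the current bucket is being used. The bucket layout, sizes, and visibility schedule below are illustrative only; CLM's actual policy exploits 3DGS access patterns well beyond this sketch.

```python
import torch

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

buckets = [torch.randn(100_000, 59) for _ in range(8)]   # CPU-resident attributes
if use_cuda:
    buckets = [b.pin_memory() for b in buckets]          # enables async H2D copies
copy_stream = torch.cuda.Stream() if use_cuda else None

def prefetch(i: int) -> torch.Tensor:
    """Kick off the host-to-device copy of bucket i on a side stream."""
    if copy_stream is None:
        return buckets[i]
    with torch.cuda.stream(copy_stream):
        return buckets[i].to(device, non_blocking=True)

nxt = prefetch(0)
for i in range(len(buckets)):
    if copy_stream is not None:
        torch.cuda.current_stream().wait_stream(copy_stream)  # copy finished
    cur, nxt = nxt, prefetch((i + 1) % len(buckets))     # overlap copy and compute
    _ = cur.sum()        # stand-in for rasterizing the visible bucket on GPU
```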
  {
    "path": "abs/2511.05109.md",
    "content": "### Efficient representation of 3D spatial data for defense-related applications\n\nGeospatial sensor data is essential for modern defense and security, offering indispensable 3D information for situational awareness. This data, gathered from sources like lidar sensors and optical cameras, allows for the creation of detailed models of operational environments. In this paper, we provide a comparative analysis of traditional representation methods, such as point clouds, voxel grids, and triangle meshes, alongside modern neural and implicit techniques like Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS). Our evaluation reveals a fundamental trade-off: traditional models offer robust geometric accuracy ideal for functional tasks like line-of-sight analysis and physics simulations, while modern methods excel at producing high-fidelity, photorealistic visuals but often lack geometric reliability. Based on these findings, we conclude that a hybrid approach is the most promising path forward. We propose a system architecture that combines a traditional mesh scaffold for geometric integrity with a neural representation like 3DGS for visual detail, managed within a hierarchical scene structure to ensure scalability and performance.\n\n地理空间传感数据对于现代国防与安全至关重要，能够为态势感知提供不可或缺的三维信息。这类数据通常来自激光雷达传感器和光学相机等来源，可用于构建作战环境的高精度模型。本文对传统表示方法（如点云、体素网格和三角网格）与现代神经和隐式技术（如Neural Radiance Fields, NeRFs 和三维高斯投影 3D Gaussian Splatting, 3DGS）进行了对比分析。我们的评估揭示出一个基本的权衡关系：传统模型在几何精度方面更具优势，适用于视线分析、物理仿真等功能性任务，而现代方法则擅长生成高保真、照片级真实感的视觉效果，但通常缺乏几何可靠性。基于上述发现，我们认为混合式方法是未来最具前景的发展方向。我们提出了一种系统架构，将具有几何准确性的传统网格支架与用于视觉细节表达的神经表示（如3DGS）相结合，并通过分层场景结构进行管理，以保障系统的可扩展性与运行效率。\n"
  },
  {
    "path": "abs/2511.05229.md",
    "content": "### 4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos\n\nNovel view synthesis from monocular videos of dynamic scenes with unknown camera poses remains a fundamental challenge in computer vision and graphics. While recent advances in 3D representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown promising results for static scenes, they struggle with dynamic content and typically rely on pre-computed camera poses. We present 4D3R, a pose-free dynamic neural rendering framework that decouples static and dynamic components through a two-stage approach. Our method first leverages 3D foundational models for initial pose and geometry estimation, followed by motion-aware refinement. 4D3R introduces two key technical innovations: (1) a motion-aware bundle adjustment (MA-BA) module that combines transformer-based learned priors with SAM2 for robust dynamic object segmentation, enabling more accurate camera pose refinement; and (2) an efficient Motion-Aware Gaussian Splatting (MA-GS) representation that uses control points with a deformation field MLP and linear blend skinning to model dynamic motion, significantly reducing computational cost while maintaining high-quality reconstruction. Extensive experiments on real-world dynamic datasets demonstrate that our approach achieves up to 1.8dB PSNR improvement over state-of-the-art methods, particularly in challenging scenarios with large dynamic objects, while reducing computational requirements by 5x compared to previous dynamic scene representations.\n\n在相机位姿未知的情况下，从单目视频中对动态场景进行新视角合成，仍然是计算机视觉与计算机图形学中的一项基础性挑战。尽管近年来诸如神经辐射场（Neural Radiance Fields，NeRF）和三维高斯溅射（3D Gaussian Splatting，3DGS）等三维表示方法在静态场景中取得了令人鼓舞的成果，但它们在处理动态内容时表现受限，并且通常依赖于预先计算的相机位姿。本文提出了 4D3R，一种无需相机位姿的动态神经渲染框架，通过两阶段策略对静态与动态成分进行解耦。我们的方法首先利用三维基础模型进行初始相机位姿与几何结构估计，随后引入运动感知的精细化优化。4D3R 包含两项关键技术创新：（1）一种运动感知的束调整（Motion-Aware Bundle Adjustment，MA-BA）模块，将基于 Transformer 的学习先验与 SAM2 相结合，实现对动态物体的鲁棒分割，从而支持更精确的相机位姿优化；（2）一种高效的运动感知高斯溅射（Motion-Aware Gaussian Splatting，MA-GS）表示方法，通过引入控制点、形变场 MLP 以及线性混合蒙皮来建模动态运动，在保持高质量重建的同时显著降低计算开销。在多个真实世界动态数据集上的大量实验表明，与现有最先进方法相比，我们的方法在 PSNR 指标上最高可提升 1.8dB，尤其在包含大尺度动态物体的挑战性场景中表现突出，同时相较于以往的动态场景表示方法，计算需求降低了约 5 倍。\n"
  },
  {
    "path": "abs/2511.06046.md",
    "content": "### StreamSTGS: Streaming Spatial and Temporal Gaussian Grids for Real-Time Free-Viewpoint Video\n\nStreaming free-viewpoint video~(FVV) in real-time still faces significant challenges, particularly in training, rendering, and transmission efficiency. Harnessing superior performance of 3D Gaussian Splatting~(3DGS), recent 3DGS-based FVV methods have achieved notable breakthroughs in both training and rendering. However, the storage requirements of these methods can reach up to 10MB per frame, making stream FVV in real-time impossible. To address this problem, we propose a novel FVV representation, dubbed StreamSTGS, designed for real-time streaming. StreamSTGS represents a dynamic scene using canonical 3D Gaussians, temporal features, and a deformation field. For high compression efficiency, we encode canonical Gaussian attributes as 2D images and temporal features as a video. This design not only enables real-time streaming, but also inherently supports adaptive bitrate control based on network condition without any extra training. Moreover, we propose a sliding window scheme to aggregate adjacent temporal features to learn local motions, and then introduce a transformer-guided auxiliary training module to learn global motions. On diverse FVV benchmarks, StreamSTGS demonstrates competitive performance on all metrics compared to state-of-the-art methods. Notably, StreamSTGS increases the PSNR by an average of 1dB while reducing the average frame size to just 170KB.\n\n实时流式自由视角视频（Free-Viewpoint Video，FVV）仍然面临诸多挑战，尤其是在训练效率、渲染效率以及传输效率方面。得益于三维高斯溅射（3D Gaussian Splatting，3DGS）的优越性能，近期基于 3DGS 的 FVV 方法在训练与渲染方面均取得了显著突破。然而，这类方法的存储开销可高达每帧 10MB，使得实时流式 FVV 几乎不可行。为了解决这一问题，我们提出了一种新的 FVV 表示方法，称为 StreamSTGS，专为实时流式传输而设计。StreamSTGS 通过规范化的三维高斯、时间特征以及形变场来表示动态场景。为了实现高效压缩，我们将规范化高斯的属性编码为二维图像，并将时间特征编码为视频。这一设计不仅支持实时流式传输，还能够在无需额外训练的情况下，根据网络状况自然地实现自适应码率控制。此外，我们提出了一种滑动窗口机制，用于聚合相邻的时间特征以学习局部运动，并进一步引入一个由 Transformer 引导的辅助训练模块来建模全局运动。在多个 FVV 基准数据集上的实验结果表明，StreamSTGS 在各项指标上均展现出与当前最先进方法相当的性能。值得注意的是，StreamSTGS 在将平均单帧大小压缩至仅 170KB 的同时，使 PSNR 平均提升了约 1dB。\n"
  },
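Encoding canonical Gaussian attributes as 2D images, as StreamSTGS does, amounts to quantizing per-Gaussian rows into image pixels that a standard codec can compress. A minimal sketch follows; the 8-bit min-max quantization, image width, and function name are our assumptions, and in practice one would sort Gaussians (e.g., by Morton order) first so the codec sees smoother images.

```python
import numpy as np

def pack_attributes(attrs: np.ndarray, width: int = 1024):
    """Quantize per-Gaussian attributes into an 8-bit image, one channel per attribute.

    attrs: (N, C) canonical Gaussian attributes (e.g. position, scale, color).
    Returns the (H, width, C) uint8 image and the per-channel (lo, hi) to invert.
    """
    n, c = attrs.shape
    h = int(np.ceil(n / width))
    lo, hi = attrs.min(axis=0), attrs.max(axis=0)
    q = np.round((attrs - lo) / np.maximum(hi - lo, 1e-8) * 255).astype(np.uint8)
    img = np.zeros((h * width, c), dtype=np.uint8)
    img[:n] = q                                  # pad the last row with zeros
    return img.reshape(h, width, c), (lo, hi)

img, (lo, hi) = pack_attributes(np.random.randn(30_000, 3).astype(np.float32))
# `img` can be fed to an off-the-shelf image/video encoder; decoding inverts
# the quantization using the stored (lo, hi).
```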
  {
    "path": "abs/2511.06299.md",
    "content": "### Physics-Informed Deformable Gaussian Splatting: Towards Unified Constitutive Laws for Time-Evolving Material Field\n\nRecently, 3D Gaussian Splatting (3DGS), an explicit scene representation technique, has shown significant promise for dynamic novel-view synthesis from monocular video input. However, purely data-driven 3DGS often struggles to capture the diverse physics-driven motion patterns in dynamic scenes. To fill this gap, we propose Physics-Informed Deformable Gaussian Splatting (PIDG), which treats each Gaussian particle as a Lagrangian material point with time-varying constitutive parameters and is supervised by 2D optical flow via motion projection. Specifically, we adopt static-dynamic decoupled 4D decomposed hash encoding to reconstruct geometry and motion efficiently. Subsequently, we impose the Cauchy momentum residual as a physics constraint, enabling independent prediction of each particle's velocity and constitutive stress via a time-evolving material field. Finally, we further supervise data fitting by matching Lagrangian particle flow to camera-compensated optical flow, which accelerates convergence and improves generalization. Experiments on a custom physics-driven dataset as well as on standard synthetic and real-world datasets demonstrate significant gains in physical consistency and monocular dynamic reconstruction quality.\n\n近年来，作为一种显式场景表示技术，三维高斯溅射（3D Gaussian Splatting，3DGS）在基于单目视频输入的动态新视角合成任务中展现出显著潜力。然而，纯数据驱动的 3DGS 往往难以刻画动态场景中由物理机制驱动的多样化运动模式。为弥补这一不足，我们提出了物理信息引导的可形变高斯溅射（Physics-Informed Deformable Gaussian Splatting，PIDG），将每一个高斯粒子视为具有时变本构参数的拉格朗日材料点，并通过运动投影引入二维光流进行监督。具体而言，我们采用静态—动态解耦的四维分解哈希编码，以高效重建几何结构与运动信息。随后，我们引入柯西动量方程残差作为物理约束，通过随时间演化的材料场，实现对每个粒子速度与本构应力的独立预测。最后，通过将拉格朗日粒子流与经相机运动补偿的光流进行匹配，对数据拟合过程施加进一步监督，从而加速收敛并提升泛化能力。在自建的物理驱动数据集以及标准合成与真实世界数据集上的实验结果表明，该方法在物理一致性和单目动态重建质量方面均取得了显著提升。\n"
  },
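The Cauchy momentum residual named in the PIDG abstract admits a standard form. The notation below is ours, not the paper's: one plausible residual per Lagrangian particle, with the mean squared norm used as the physics loss.

```latex
% Cauchy momentum balance for a Lagrangian material point i:
%   rho Dv/Dt = div(sigma) + rho g
\[
  \mathbf{r}_i(t) \;=\; \rho_i \,\frac{D\mathbf{v}_i}{Dt}
  \;-\; \nabla \!\cdot \boldsymbol{\sigma}\big(\mathbf{x}_i, t;\, \theta_i(t)\big)
  \;-\; \rho_i\, \mathbf{g},
  \qquad
  \mathcal{L}_{\mathrm{phys}} \;=\; \frac{1}{N}\sum_{i=1}^{N}
  \big\lVert \mathbf{r}_i(t) \big\rVert_2^2 ,
\]
```

where $\mathbf{v}_i$ is the predicted particle velocity, $\boldsymbol{\sigma}$ the constitutive stress produced by the time-evolving material parameters $\theta_i(t)$, and $\mathbf{g}$ gravity; driving $\mathcal{L}_{\mathrm{phys}}$ toward zero enforces momentum balance alongside the photometric and optical-flow objectives.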
  {
    "path": "abs/2511.06457.md",
    "content": "### Inpaint360GS: Efficient Object-Aware 3D Inpainting via Gaussian Splatting for 360° Scenes\n\nDespite recent advances in single-object front-facing inpainting using NeRF and 3D Gaussian Splatting (3DGS), inpainting in complex 360° scenes remains largely underexplored. This is primarily due to three key challenges: (i) identifying target objects in the 3D field of 360° environments, (ii) dealing with severe occlusions in multi-object scenes, which makes it hard to define regions to inpaint, and (iii) maintaining consistent and high-quality appearance across views effectively. To tackle these challenges, we propose Inpaint360GS, a flexible 360° editing framework based on 3DGS that supports multi-object removal and high-fidelity inpainting in 3D space. By distilling 2D segmentation into 3D and leveraging virtual camera views for contextual guidance, our method enables accurate object-level editing and consistent scene completion. We further introduce a new dataset tailored for 360° inpainting, addressing the lack of ground truth object-free scenes. Experiments demonstrate that Inpaint360GS outperforms existing baselines and achieves state-of-the-art performance.\n\n尽管近年来基于 NeRF 和三维高斯溅射（3D Gaussian Splatting，3DGS）的单目标正面视角修复取得了显著进展，但在复杂的 360° 场景中进行修复仍然缺乏系统性的研究。这主要源于三项关键挑战：（i）在 360° 环境的三维场中准确识别目标物体；（ii）多物体场景中普遍存在的严重遮挡问题，使得待修复区域难以精确定义；以及（iii）如何在不同视角之间有效地保持一致且高质量的外观。为应对上述挑战，我们提出了 Inpaint360GS，一种基于 3DGS 的灵活 360° 编辑框架，支持多目标移除以及三维空间中的高保真修复。通过将二维分割结果蒸馏至三维表示，并利用虚拟相机视角提供上下文引导，我们的方法实现了精确的物体级编辑与一致的场景补全。此外，我们还引入了一个专为 360° 修复任务设计的新数据集，以弥补缺乏无目标真实场景真值数据的问题。实验结果表明，Inpaint360GS 优于现有基线方法，并达到了当前最先进的性能水平。\n"
  },
  {
    "path": "abs/2511.06632.md",
    "content": "### DIAL-GS: Dynamic Instance Aware Reconstruction for Label-free Street Scenes with 4D Gaussian Splatting\n\nUrban scene reconstruction is critical for autonomous driving, enabling structured 3D representations for data synthesis and closed-loop testing. Supervised approaches rely on costly human annotations and lack scalability, while current self-supervised methods often confuse static and dynamic elements and fail to distinguish individual dynamic objects, limiting fine-grained editing. We propose DIAL-GS, a novel dynamic instance-aware reconstruction method for label-free street scenes with 4D Gaussian Splatting. We first accurately identify dynamic instances by exploiting appearance-position inconsistency between warped rendering and actual observation. Guided by instance-level dynamic perception, we employ instance-aware 4D Gaussians as the unified volumetric representation, realizing dynamic-adaptive and instance-aware reconstruction. Furthermore, we introduce a reciprocal mechanism through which identity and dynamics reinforce each other, enhancing both integrity and consistency. Experiments on urban driving scenarios show that DIAL-GS surpasses existing self-supervised baselines in reconstruction quality and instance-level editing, offering a concise yet powerful solution for urban scene modeling.\n\n\n城市场景重建是自动驾驶中的关键技术，可为数据合成与闭环测试提供结构化的三维表示。监督式方法依赖高成本的人工标注，且难以规模化扩展；而现有的自监督方法往往混淆静态与动态元素，且难以区分独立的动态物体，从而限制了细粒度编辑能力。为此，我们提出了 DIAL-GS，一种基于四维高斯溅射（4D Gaussian Splatting）的、面向无标注街景的动态实例感知重建方法。我们首先利用扭曲渲染结果与真实观测之间的外观—位置不一致性，精确识别动态实例。在实例级动态感知的引导下，我们采用实例感知的 4D 高斯作为统一的体素化表示，实现了动态自适应且实例感知的场景重建。此外，我们引入了一种互促机制，使实例身份与动态信息相互增强，从而同时提升重建的完整性与一致性。在城市场景自动驾驶数据上的实验结果表明，DIAL-GS 在重建质量和实例级编辑能力方面均优于现有的自监督基线方法，为城市场景建模提供了一种简洁而高效的解决方案。\n"
  },
  {
    "path": "abs/2511.06734.md",
    "content": "### Rethinking Rainy 3D Scene Reconstruction via Perspective Transforming and Brightness Tuning\n\nRain degrades the visual quality of multi-view images, which are essential for 3D scene reconstruction, resulting in inaccurate and incomplete reconstruction results. Existing datasets often overlook two critical characteristics of real rainy 3D scenes: the viewpoint-dependent variation in the appearance of rain streaks caused by their projection onto 2D images, and the reduction in ambient brightness resulting from cloud coverage during rainfall. To improve data realism, we construct a new dataset named OmniRain3D that incorporates perspective heterogeneity and brightness dynamicity, enabling more faithful simulation of rain degradation in 3D scenes. Based on this dataset, we propose an end-to-end reconstruction framework named REVR-GSNet (Rain Elimination and Visibility Recovery for 3D Gaussian Splatting). Specifically, REVR-GSNet integrates recursive brightness enhancement, Gaussian primitive optimization, and GS-guided rain elimination into a unified architecture through joint alternating optimization, achieving high-fidelity reconstruction of clean 3D scenes from rain-degraded inputs. Extensive experiments show the effectiveness of our dataset and method. Our dataset and method provide a foundation for future research on multi-view image deraining and rainy 3D scene reconstruction.\n\n降雨会显著降低多视角图像的视觉质量，而多视角图像是三维场景重建的关键输入，从而导致重建结果不准确且不完整。现有数据集往往忽略了真实雨天三维场景的两个关键特性：其一，雨线在投影到二维图像时，由于视角变化而呈现出与视点相关的外观差异；其二，降雨过程中云层遮挡会导致环境亮度整体降低。为提升数据的真实性，我们构建了一个新的数据集 OmniRain3D，引入了视角异质性与亮度动态性，从而能够更加真实地模拟三维场景中的雨天退化现象。基于该数据集，我们提出了一种端到端的重建框架 REVR-GSNet（Rain Elimination and Visibility Recovery for 3D Gaussian Splatting）。具体而言，REVR-GSNet 通过联合交替优化，将递归亮度增强、高斯基元优化以及 GS 引导的去雨过程整合到一个统一的架构中，从受雨退化的输入中实现对干净三维场景的高保真重建。大量实验验证了我们所构建数据集及所提出方法的有效性。我们的数据集与方法为未来多视角图像去雨以及雨天三维场景重建研究奠定了基础。\n"
  },
  {
    "path": "abs/2511.06765.md",
    "content": "### Robust and High-Fidelity 3D Gaussian Splatting: Fusing Pose Priors and Geometry Constraints for Texture-Deficient Outdoor Scenes\n\n3D Gaussian Splatting (3DGS) has emerged as a key rendering pipeline for digital asset creation due to its balance between efficiency and visual quality. To address the issues of unstable pose estimation and scene representation distortion caused by geometric texture inconsistency in large outdoor scenes with weak or repetitive textures, we approach the problem from two aspects: pose estimation and scene representation. For pose estimation, we leverage LiDAR-IMU Odometry to provide prior poses for cameras in large-scale environments. These prior pose constraints are incorporated into COLMAP's triangulation process, with pose optimization performed via bundle adjustment. Ensuring consistency between pixel data association and prior poses helps maintain both robustness and accuracy. For scene representation, we introduce normal vector constraints and effective rank regularization to enforce consistency in the direction and shape of Gaussian primitives. These constraints are jointly optimized with the existing photometric loss to enhance the map quality. We evaluate our approach using both public and self-collected datasets. In terms of pose optimization, our method requires only one-third of the time while maintaining accuracy and robustness across both datasets. In terms of scene representation, the results show that our method significantly outperforms conventional 3DGS pipelines. Notably, on self-collected datasets characterized by weak or repetitive textures, our approach demonstrates enhanced visualization capabilities and achieves superior overall performance.\n\n三维高斯溅射（3D Gaussian Splatting，3DGS）因在效率与视觉质量之间取得良好平衡，已成为数字资产创建中的关键渲染管线。针对在弱纹理或重复纹理的大规模室外场景中，由几何与纹理不一致所引发的相机位姿估计不稳定以及场景表示畸变问题，我们从两个方面对该问题进行研究：位姿估计与场景表示。在位姿估计方面，我们利用 LiDAR-IMU 里程计为大尺度环境中的相机提供先验位姿，并将这些先验位姿约束引入 COLMAP 的三角化过程，通过束调整（bundle adjustment）进行位姿优化。通过确保像素数据关联与先验位姿之间的一致性，从而在保持鲁棒性的同时提升估计精度。在场景表示方面，我们引入法向量约束与有效秩正则化，以强化高斯基元在方向与形状上的一致性，并将这些约束与现有的光度损失进行联合优化，从而提升整体地图质量。我们在公开数据集和自采集数据集上对所提出的方法进行了评估。在位姿优化方面，我们的方法在保证精度与鲁棒性的前提下，仅需约三分之一的计算时间。在场景表示方面，实验结果表明，该方法显著优于传统的 3DGS 管线。尤其是在具有弱纹理或重复纹理特征的自采集数据集中，我们的方法展现出更强的可视化能力，并取得了更为优越的整体性能。\n"
  },
  {
    "path": "abs/2511.06810.md",
    "content": "### ConeGS: Error-Guided Densification Using Pixel Cones for Improved Reconstruction with Fewer Primitives\n\n3D Gaussian Splatting (3DGS) achieves state-of-the-art image quality and real-time performance in novel view synthesis but often suffers from a suboptimal spatial distribution of primitives. This issue stems from cloning-based densification, which propagates Gaussians along existing geometry, limiting exploration and requiring many primitives to adequately cover the scene. We present ConeGS, an image-space-informed densification framework that is independent of existing scene geometry state. ConeGS first creates a fast Instant Neural Graphics Primitives (iNGP) reconstruction as a geometric proxy to estimate per-pixel depth. During the subsequent 3DGS optimization, it identifies high-error pixels and inserts new Gaussians along the corresponding viewing cones at the predicted depth values, initializing their size according to the cone diameter. A pre-activation opacity penalty rapidly removes redundant Gaussians, while a primitive budgeting strategy controls the total number of primitives, either by a fixed budget or by adapting to scene complexity, ensuring high reconstruction quality. Experiments show that ConeGS consistently enhances reconstruction quality and rendering performance across Gaussian budgets, with especially strong gains under tight primitive constraints where efficient placement is crucial.\n\n三维高斯溅射（3D Gaussian Splatting，3DGS）在新视角合成任务中实现了最先进的图像质量与实时渲染性能，但其高斯基元的空间分布往往并不理想。该问题主要源于基于克隆的致密化策略：该策略沿着已有几何结构传播高斯基元，限制了对新区域的探索，并需要大量基元才能充分覆盖整个场景。为此，我们提出了 ConeGS，一种基于图像空间信息、且不依赖于现有场景几何状态的致密化框架。ConeGS 首先构建一个快速的 Instant Neural Graphics Primitives（iNGP）重建结果，作为几何代理来估计逐像素深度。在随后的 3DGS 优化过程中，方法识别出高误差像素，并在预测的深度位置沿对应的视锥插入新的高斯基元，其初始尺度根据视锥直径进行设置。通过在激活前引入不透明度惩罚，可以快速移除冗余的高斯基元；同时，基元预算策略用于控制基元总数，既可以采用固定预算，也可以根据场景复杂度自适应调整，从而在保证高重建质量的同时提高效率。实验结果表明，ConeGS 在不同高斯基元预算条件下均能持续提升重建质量与渲染性能，尤其在基元数量受限、对高效布局要求极高的场景中表现出显著优势。\n"
  },
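ConeGS's insertion rule is concrete enough to sketch: unproject a high-error pixel along its viewing ray to the proxy depth, and initialize the new Gaussian's size from the pixel cone's diameter at that depth. The function name and the isotropic-scale simplification are our assumptions; only the geometry follows the abstract.

```python
import torch

def spawn_gaussian(u: float, v: float, depth: float, K: torch.Tensor,
                   cam_to_world: torch.Tensor):
    """Place a new Gaussian on the viewing ray of pixel (u, v) at a proxy depth.

    K: (3, 3) intrinsics; cam_to_world: (4, 4) camera pose.
    Returns (world position, isotropic scale from the pixel cone diameter).
    """
    ray_cam = torch.linalg.inv(K) @ torch.tensor([u, v, 1.0])
    p_cam = ray_cam / ray_cam[2] * depth                 # point at iNGP-predicted depth
    p_world = (cam_to_world @ torch.cat([p_cam, torch.ones(1)]))[:3]
    # One pixel subtends roughly 1/fx radians; the cone's diameter at `depth`
    # gives a sensible initial footprint for the sub-pixel it must explain.
    pixel_cone_diam = depth / K[0, 0]
    return p_world, pixel_cone_diam

K = torch.tensor([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
pos, scale = spawn_gaussian(100.5, 80.5, depth=2.0, K=K, cam_to_world=torch.eye(4))
```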
  {
    "path": "abs/2511.06830.md",
    "content": "### MUGSQA: Novel Multi-Uncertainty-Based Gaussian Splatting Quality Assessment Method, Dataset, and Benchmarks\n\nGaussian Splatting (GS) has recently emerged as a promising technique for 3D object reconstruction, delivering high-quality rendering results with significantly improved reconstruction speed. As variants continue to appear, assessing the perceptual quality of 3D objects reconstructed with different GS-based methods remains an open challenge. To address this issue, we first propose a unified multi-distance subjective quality assessment method that closely mimics human viewing behavior for objects reconstructed with GS-based methods in actual applications, thereby better collecting perceptual experiences. Based on it, we also construct a novel GS quality assessment dataset named MUGSQA, which is constructed considering multiple uncertainties of the input data. These uncertainties include the quantity and resolution of input views, the view distance, and the accuracy of the initial point cloud. Moreover, we construct two benchmarks: one to evaluate the robustness of various GS-based reconstruction methods under multiple uncertainties, and the other to evaluate the performance of existing quality assessment metrics. Our dataset and benchmark code will be released soon.\n\n高斯溅射（Gaussian Splatting，GS）近年来已成为一种极具潜力的三维物体重建技术，在显著提升重建速度的同时，能够提供高质量的渲染结果。随着各种 GS 变体方法不断涌现，如何评估不同 GS 方法重建得到的三维物体的感知质量，仍然是一个尚未解决的挑战。为此，我们首先提出了一种统一的多视距主观质量评估方法，该方法紧密模拟了真实应用场景中人类对 GS 重建物体的观看行为，从而能够更有效地采集人类的感知体验。基于该评估方法，我们进一步构建了一个新的 GS 质量评估数据集，命名为 MUGSQA，该数据集在构建过程中综合考虑了输入数据中的多种不确定性因素，包括输入视角的数量与分辨率、观看距离以及初始点云的精度。此外，我们还构建了两个基准：其一用于评估不同 GS 重建方法在多种不确定性条件下的鲁棒性，其二用于评估现有质量评估指标的性能。我们的数据集及基准测试代码将于近期公开。\n"
  },
  {
    "path": "abs/2511.06953.md",
    "content": "### GFix: Perceptually Enhanced Gaussian Splatting Video Compression\n\n3D Gaussian Splatting (3DGS) enhances 3D scene reconstruction through explicit representation and fast rendering, demonstrating potential benefits for various low-level vision tasks, including video compression. However, existing 3DGS-based video codecs generally exhibit more noticeable visual artifacts and relatively low compression ratios. In this paper, we specifically target the perceptual enhancement of 3DGS-based video compression, based on the assumption that artifacts from 3DGS rendering and quantization resemble noisy latents sampled during diffusion training. Building on this premise, we propose a content-adaptive framework, GFix, comprising a streamlined, single-step diffusion model that serves as an off-the-shelf neural enhancer. Moreover, to increase compression efficiency, We propose a modulated LoRA scheme that freezes the low-rank decompositions and modulates the intermediate hidden states, thereby achieving efficient adaptation of the diffusion backbone with highly compressible updates. Experimental results show that GFix delivers strong perceptual quality enhancement, outperforming GSVC with up to 72.1% BD-rate savings in LPIPS and 21.4% in FID.\n\n三维高斯溅射（3D Gaussian Splatting，3DGS）通过显式表示与快速渲染增强了三维场景重建能力，在包括视频压缩在内的多种低层视觉任务中展现出潜在优势。然而，现有基于 3DGS 的视频编解码方法通常存在较为明显的视觉伪影，并且压缩率相对较低。本文聚焦于提升基于 3DGS 的视频压缩的感知质量，基于这样一种假设：3DGS 渲染与量化所产生的伪影与扩散模型训练过程中采样得到的噪声潜变量在分布特性上具有相似性。在此基础上，我们提出了一种内容自适应框架 GFix，其核心是一个精简的单步扩散模型，可作为即插即用的神经感知增强器。此外，为了进一步提升压缩效率，我们提出了一种调制式 LoRA 方案，在冻结低秩分解参数的同时，对中间隐藏状态进行调制，从而以高度可压缩的更新实现对扩散骨干网络的高效自适应。实验结果表明，GFix 在感知质量提升方面表现突出，在 LPIPS 指标上相较 GSVC 实现了最高 72.1% 的 BD-rate 节省，在 FID 指标上实现了 21.4% 的节省。\n"
  },
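The modulated LoRA idea above (frozen low-rank factors, trainable modulation of the intermediate hidden state) can be shown on a single linear layer. This is a minimal sketch, not GFix's implementation: the class name is ours, and in practice the A/B factors would come from a pre-trained LoRA rather than random initialization.

```python
import torch

class ModulatedLoRALinear(torch.nn.Module):
    """LoRA whose low-rank factors are frozen; only a per-rank modulation of the
    intermediate hidden state is trained, so per-content updates stay tiny."""
    def __init__(self, base: torch.nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        # Assume A/B come from a pre-trained LoRA; random here for the sketch.
        self.A = torch.nn.Parameter(torch.randn(base.in_features, rank) * 0.02,
                                    requires_grad=False)
        self.B = torch.nn.Parameter(torch.randn(rank, base.out_features) * 0.02,
                                    requires_grad=False)
        self.m = torch.nn.Parameter(torch.ones(rank))    # the only trained part

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + ((x @ self.A) * self.m) @ self.B

layer = ModulatedLoRALinear(torch.nn.Linear(64, 64))
out = layer(torch.randn(4, 64))
# Only `m` (rank floats per adapted layer) is signalled alongside the bitstream.
```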
  {
    "path": "abs/2511.07241.md",
    "content": "### 4DSTR: Advancing Generative 4D Gaussians with Spatial-Temporal Rectification for High-Quality and Consistent 4D Generation\n\nRemarkable advances in recent 2D image and 3D shape generation have induced a significant focus on dynamic 4D content generation. However, previous 4D generation methods commonly struggle to maintain spatial-temporal consistency and adapt poorly to rapid temporal variations, due to the lack of effective spatial-temporal modeling. To address these problems, we propose a novel 4D generation network called 4DSTR, which modulates generative 4D Gaussian Splatting with spatial-temporal rectification. Specifically, temporal correlation across generated 4D sequences is designed to rectify deformable scales and rotations and guarantee temporal consistency. Furthermore, an adaptive spatial densification and pruning strategy is proposed to address significant temporal variations by dynamically adding or deleting Gaussian points with the awareness of their pre-frame movements. Extensive experiments demonstrate that our 4DSTR achieves state-of-the-art performance in video-to-4D generation, excelling in reconstruction quality, spatial-temporal consistency, and adaptation to rapid temporal movements.\n\n近年来二维图像与三维形状生成领域的显著进展，推动了对动态四维内容生成的广泛关注。然而，受限于缺乏有效的时空建模，现有的四维生成方法通常难以保持时空一致性，并且对快速时间变化的适应能力较弱。为了解决上述问题，我们提出了一种新的四维生成网络 4DSTR，通过引入时空校正机制，对生成式四维高斯溅射（4D Gaussian Splatting）进行调制。具体而言，我们利用生成的四维序列之间的时间相关性，对可形变高斯的尺度与旋转进行校正，从而保证时间一致性。此外，我们提出了一种自适应的空间致密化与裁剪策略，在感知前一帧运动信息的基础上，动态地增加或删除高斯点，以应对剧烈的时间变化。大量实验结果表明，4DSTR 在视频到四维生成任务中达到了当前最先进的性能，在重建质量、时空一致性以及对快速时间运动的适应能力方面均表现出显著优势。\n"
  },
  {
    "path": "abs/2511.07321.md",
    "content": "### YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting\n\nFast and flexible 3D scene reconstruction from unstructured image collections remains a significant challenge. We present YoNoSplat, a feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images. Our model is highly versatile, operating effectively with both posed and unposed, calibrated and uncalibrated inputs. YoNoSplat predicts local Gaussians and camera poses for each view, which are aggregated into a global representation using either predicted or provided poses. To overcome the inherent difficulty of jointly learning 3D Gaussians and camera parameters, we introduce a novel mixing training strategy. This approach mitigates the entanglement between the two tasks by initially using ground-truth poses to aggregate local Gaussians and gradually transitioning to a mix of predicted and ground-truth poses, which prevents both training instability and exposure bias. We further resolve the scale ambiguity problem by a novel pairwise camera-distance normalization scheme and by embedding camera intrinsics into the network. Moreover, YoNoSplat also predicts intrinsic parameters, making it feasible for uncalibrated inputs. YoNoSplat demonstrates exceptional efficiency, reconstructing a scene from 100 views (at 280x518 resolution) in just 2.69 seconds on an NVIDIA GH200 GPU. It achieves state-of-the-art performance on standard benchmarks in both pose-free and pose-dependent settings.\n\n从无结构图像集合中实现快速且灵活的三维场景重建仍然是一项重要挑战。我们提出了 YoNoSplat，一种前馈式模型，能够从任意数量的图像中重建高质量的三维高斯溅射（3D Gaussian Splatting）表示。该模型具有很强的通用性，可同时适用于有位姿与无位姿、已标定与未标定的输入数据。YoNoSplat 为每个视角预测局部高斯基元和相机位姿，并利用预测的或已知的位姿将其聚合为全局表示。针对联合学习三维高斯与相机参数所固有的困难，我们提出了一种新的混合训练策略：在训练初期使用真实位姿来聚合局部高斯，随后逐步过渡到预测位姿与真实位姿的混合使用，从而有效缓解两项任务之间的耦合，避免训练不稳定和暴露偏差问题。我们还通过一种新的成对相机距离归一化方案，并将相机内参嵌入网络中，解决了尺度不确定性问题。此外，YoNoSplat 还能直接预测相机内参，使其能够处理未标定输入。在效率方面，YoNoSplat 表现出色：在 NVIDIA GH200 GPU 上，仅需 2.69 秒即可从 100 个视角（分辨率为 280×518）重建一个完整场景。该方法在标准基准测试中，无论在无位姿还是有位姿设置下，均达到了当前最先进的性能。\n"
  },
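> The abstract names two mechanisms without detailing them: the mixed ground-truth/predicted pose schedule and the pairwise camera-distance normalization. Below is a minimal sketch of how these could look, assuming a linear ramp for the mixing probability and the mean pairwise camera distance as the scale reference (both are assumptions, not the paper's stated choices):

```python
import numpy as np

def pose_for_aggregation(pred_pose, gt_pose, step, total_steps, rng):
    """Mixing training strategy (sketch): aggregate local Gaussians with the
    ground-truth pose early on, then raise the probability of substituting
    the predicted pose as training progresses (linear ramp assumed)."""
    p_pred = min(1.0, step / (0.5 * total_steps))
    return pred_pose if rng.random() < p_pred else gt_pose

def normalize_by_pairwise_distance(cam_centers):
    """Pairwise camera-distance normalization (sketch): rescale so the mean
    distance between camera centers is 1, removing global scale ambiguity."""
    c = np.asarray(cam_centers, dtype=float)            # (N, 3)
    d = np.linalg.norm(c[:, None] - c[None, :], axis=-1)
    mean_dist = d[np.triu_indices(len(c), k=1)].mean()  # upper triangle only
    return c / mean_dist, 1.0 / mean_dist               # scaled centers, factor
```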
  {
    "path": "abs/2511.07409.md",
    "content": "### DIMO: Diverse 3D Motion Generation for Arbitrary Objects\n\nWe present DIMO, a generative approach capable of generating diverse 3D motions for arbitrary objects from a single image. The core idea of our work is to leverage the rich priors in well-trained video models to extract the common motion patterns and then embed them into a shared low-dimensional latent space. Specifically, we first generate multiple videos of the same object with diverse motions. We then embed each motion into a latent vector and train a shared motion decoder to learn the distribution of motions represented by a structured and compact motion representation, i.e., neural key point trajectories. The canonical 3D Gaussians are then driven by these key points and fused to model the geometry and appearance. During inference time with learned latent space, we can instantly sample diverse 3D motions in a single-forward pass and support several interesting applications including 3D motion interpolation and language-guided motion generation.\n\n我们提出了 **DIMO**，一种仅从单张图像即可为任意物体生成多样化三维运动的生成式方法。该方法的核心思想是利用成熟视频模型中蕴含的丰富先验知识，提取通用的运动模式，并将其嵌入到一个共享的低维潜在空间中。具体而言，我们首先为同一物体生成多段具有不同运动形态的视频；随后将每种运动嵌入为一个潜在向量，并训练一个共享的运动解码器，以学习由结构化且紧凑的运动表示——即**神经关键点轨迹**——所刻画的运动分布。接着，利用这些关键点来驱动规范化的三维高斯基元，并进行融合以建模物体的几何结构与外观。在推理阶段，借助学习到的潜在空间，模型能够在单次前向传播中高效采样多样化的三维运动，并支持多种有趣的应用场景，包括**三维运动插值**以及**语言引导的运动生成**。\n"
  },
  {
    "path": "abs/2511.08032.md",
    "content": "### Perceptual Quality Assessment of 3D Gaussian Splatting: A Subjective Dataset and Prediction Metric\n\nWith the rapid advancement of 3D visualization, 3D Gaussian Splatting (3DGS) has emerged as a leading technique for real-time, high-fidelity rendering. While prior research has emphasized algorithmic performance and visual fidelity, the perceptual quality of 3DGS-rendered content, especially under varying reconstruction conditions, remains largely underexplored. In practice, factors such as viewpoint sparsity, limited training iterations, point downsampling, noise, and color distortions can significantly degrade visual quality, yet their perceptual impact has not been systematically studied. To bridge this gap, we present 3DGS-QA, the first subjective quality assessment dataset for 3DGS. It comprises 225 degraded reconstructions across 15 object types, enabling a controlled investigation of common distortion factors. Based on this dataset, we introduce a no-reference quality prediction model that directly operates on native 3D Gaussian primitives, without requiring rendered images or ground-truth references. Our model extracts spatial and photometric cues from the Gaussian representation to estimate perceived quality in a structure-aware manner. We further benchmark existing quality assessment methods, spanning both traditional and learning-based approaches. Experimental results show that our method consistently achieves superior performance, highlighting its robustness and effectiveness for 3DGS content evaluation.\n\n随着三维可视化技术的快速发展，**三维高斯溅射（3D Gaussian Splatting，3DGS）**已成为实现实时高保真渲染的代表性技术。尽管以往研究主要关注算法性能和视觉保真度，但在不同重建条件下 3DGS 渲染内容的**感知质量**仍然缺乏系统研究。在实际应用中，诸如**视角稀疏、训练迭代次数受限、点下采样、噪声以及颜色失真**等因素都会显著降低视觉质量，但其对人类感知的影响尚未得到系统分析。\n为弥补这一空白，我们提出了 **3DGS-QA**，这是首个面向 3DGS 的主观质量评价数据集。该数据集涵盖了 **15 类物体上的 225 个退化重建结果**，使得对常见失真因素的可控研究成为可能。基于该数据集，我们进一步提出了一种**无参考质量预测模型**，该模型直接作用于原生的三维高斯基元表示，无需渲染图像或真实参考。模型从高斯表示中提取**空间与光度线索**，以**结构感知**的方式估计主观感知质量。\n此外，我们还对现有质量评价方法进行了基准测试，涵盖传统方法与学习方法。实验结果表明，我们的方法始终取得更优的性能，凸显了其在 **3DGS 内容质量评估**中的鲁棒性与有效性。\n"
  },
  {
    "path": "abs/2511.08294.md",
    "content": "### SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering\n\nAccurate 3D human pose estimation is fundamental for applications such as augmented reality and human-robot interaction. State-of-the-art multi-view methods learn to fuse predictions across views by training on large annotated datasets, leading to poor generalization when the test scenario differs. To overcome these limitations, we propose SkelSplat, a novel framework for multi-view 3D human pose estimation based on differentiable Gaussian rendering. Human pose is modeled as a skeleton of 3D Gaussians, one per joint, optimized via differentiable rendering to enable seamless fusion of arbitrary camera views without 3D ground-truth supervision. Since Gaussian Splatting was originally designed for dense scene reconstruction, we propose a novel one-hot encoding scheme that enables independent optimization of human joints. SkelSplat outperforms approaches that do not rely on 3D ground truth in Human3.6M and CMU, while reducing the cross-dataset error up to 47.8% compared to learning-based methods. Experiments on Human3.6M-Occ and Occlusion-Person demonstrate robustness to occlusions, without scenario-specific fine-tuning.\n\n精确的三维人体姿态估计是增强现实和人机交互等应用的基础。当前最先进的多视角方法通常依赖大规模标注数据进行训练，通过学习跨视角融合预测来实现性能提升，但当测试场景与训练分布不一致时，其泛化能力往往较差。\n为克服这些局限性，我们提出了 **SkelSplat**，一种基于**可微高斯渲染**的多视角三维人体姿态估计新框架。该方法将人体姿态建模为由三维高斯组成的骨架结构，每个关节对应一个高斯，并通过可微渲染进行优化，从而在无需三维真实标注监督的情况下，实现对任意相机视角的无缝融合。\n鉴于高斯溅射最初是为稠密场景重建而设计的，我们提出了一种新的 **one-hot 编码方案**，使得各个人体关节能够被独立优化。SkelSplat 在 **Human3.6M** 和 **CMU** 数据集上优于不依赖三维真实标注的方法，并且相比基于学习的方法，跨数据集误差最多降低 **47.8%**。在 **Human3.6M-Occ** 和 **Occlusion-Person** 数据集上的实验进一步表明，该方法在无需针对特定场景进行微调的情况下，依然对遮挡具有良好的鲁棒性。\n"
  },
  {
    "path": "abs/2511.09397.md",
    "content": "### OUGS: Active View Selection via Object-aware Uncertainty Estimation in 3DGS\n\nRecent advances in 3D Gaussian Splatting (3DGS) have achieved state-of-the-art results for novel view synthesis. However, efficiently capturing high-fidelity reconstructions of specific objects within complex scenes remains a significant challenge. A key limitation of existing active reconstruction methods is their reliance on scene-level uncertainty metrics, which are often biased by irrelevant background clutter and lead to inefficient view selection for object-centric tasks. We present OUGS, a novel framework that addresses this challenge with a more principled, physically-grounded uncertainty formulation for 3DGS. Our core innovation is to derive uncertainty directly from the explicit physical parameters of the 3D Gaussian primitives (e.g., position, scale, rotation). By propagating the covariance of these parameters through the rendering Jacobian, we establish a highly interpretable uncertainty model. This foundation allows us to then seamlessly integrate semantic segmentation masks to produce a targeted, object-aware uncertainty score that effectively disentangles the object from its environment. This allows for a more effective active view selection strategy that prioritizes views critical to improving object fidelity. Experimental evaluations on public datasets demonstrate that our approach significantly improves the efficiency of the 3DGS reconstruction process and achieves higher quality for targeted objects compared to existing state-of-the-art methods, while also serving as a robust uncertainty estimator for the global scene.\n\n近年来，**三维高斯溅射（3D Gaussian Splatting，3DGS）**在新视角合成任务上取得了最先进的性能。然而，在复杂场景中高效地捕获**特定目标的高保真重建**仍然是一个重大挑战。现有主动重建方法的一个关键局限在于其依赖于**场景级不确定性度量**，而这些度量往往受到无关背景杂波的干扰，从而在以目标为中心的任务中导致低效的视角选择。\n为此，我们提出了 **OUGS**，这是一种针对 3DGS 的新型框架，通过更加原则化、**物理可解释的不确定性建模**来解决上述问题。我们的核心创新在于，直接从三维高斯基元的**显式物理参数**（如位置、尺度和旋转）中推导不确定性。通过将这些参数的协方差经由**渲染雅可比矩阵**进行传播，我们构建了一个高度可解释的不确定性模型。\n在此基础上，我们进一步无缝融合**语义分割掩码**，生成一种具有针对性的、**目标感知的不确定性评分**，从而有效地将目标与其环境解耦。这使得我们能够制定更加有效的**主动视角选择策略**，优先选择对提升目标重建质量至关重要的视角。\n在公开数据集上的实验评估表明，与现有最先进方法相比，我们的方法显著提升了 **3DGS 重建过程的效率**，并在目标对象上取得了更高的重建质量，同时也可作为一种**稳健的全局场景不确定性估计器**。\n"
  },
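> The covariance-through-Jacobian step described in the abstract is standard first-order uncertainty propagation, var(f) ≈ J Σ Jᵀ. A toy sketch follows, with a placeholder renderer and a finite-difference Jacobian; the actual method differentiates the real 3DGS rasterizer, which is not reproduced here:

```python
import numpy as np

def render_pixel(params):
    """Placeholder for rendering one pixel from a Gaussian's flattened
    parameters (position, scale, rotation, ...); not the real rasterizer."""
    return np.tanh(params).sum()

def rendered_variance(params, param_cov, eps=1e-5):
    """First-order propagation of parameter covariance through the rendering
    Jacobian: var(f) ~= J @ Sigma @ J^T (J taken by central differences)."""
    J = np.array([(render_pixel(params + eps * e) - render_pixel(params - eps * e))
                  / (2 * eps) for e in np.eye(len(params))])
    return float(J @ param_cov @ J)
```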
  {
    "path": "abs/2511.09695.md",
    "content": "### A Shared-Autonomy Construction Robotic System for Overhead Works\n\nWe present the ongoing development of a robotic system for overhead work such as ceiling drilling. The hardware platform comprises a mobile base with a two-stage lift, on which a bimanual torso is mounted with a custom-designed drilling end effector and RGB-D cameras. To support teleoperation in dynamic environments with limited visibility, we use Gaussian splatting for online 3D reconstruction and introduce motion parameters to model moving objects. For safe operation around dynamic obstacles, we developed a neural configuration-space barrier approach for planning and control. Initial feasibility studies demonstrate the capability of the hardware in drilling, bolting, and anchoring, and the software in safe teleoperation in a dynamic environment.\n\n我们介绍了一套用于**顶置作业**（如天花板钻孔）的机器人系统的持续研发工作。该硬件平台由一台带有**两级升降机构的移动底座**组成，其上安装了一个**双臂躯干**，配备定制设计的**钻孔末端执行器**以及 **RGB-D 相机**。\n为支持在**可视性受限的动态环境**中的远程操作，我们采用**高斯溅射**进行在线三维重建，并引入**运动参数**以对移动物体进行建模。为在**动态障碍物**周围实现安全作业，我们提出了一种**基于神经配置空间势垒**的方法用于规划与控制。\n初步可行性研究表明，该**硬件系统**具备执行**钻孔、螺栓安装和锚固作业**的能力，而**软件系统**能够在动态环境中实现**安全的远程操作**。\n"
  },
  {
    "path": "abs/2511.09818.md",
    "content": "### Lumos3D: A Single-Forward Framework for Low-Light 3D Scene Restoration\n\nRestoring 3D scenes captured under low-light con- ditions remains a fundamental yet challenging problem. Most existing approaches depend on precomputed camera poses and scene-specific optimization, which greatly restricts their scala- bility to dynamic real-world environments. To overcome these limitations, we introduce Lumos3D, a generalizable pose-free framework for 3D low-light scene restoration. Trained once on a single dataset, Lumos3D performs inference in a purely feed- forward manner, directly restoring illumination and structure from unposed, low-light multi-view images without any per- scene training or optimization. Built upon a geometry-grounded backbone, Lumos3D reconstructs a normal-light 3D Gaussian representation that restores illumination while faithfully pre- serving structural details. During training, a cross-illumination distillation scheme is employed, where the teacher network is distilled on normal-light ground truth to transfer accurate geometric information, such as depth, to the student model. A dedicated Lumos loss is further introduced to promote photomet- ric consistency within the reconstructed 3D space. Experiments on real-world datasets demonstrate that Lumos3D achieves high- fidelity low-light 3D scene restoration with accurate geometry and strong generalization to unseen cases. Furthermore, the framework naturally extends to handle over-exposure correction, highlighting its versatility for diverse lighting restoration tasks.\n\n在低照度条件下采集的三维场景恢复仍然是一个基础而具有挑战性的问题。现有大多数方法依赖于**预先计算的相机位姿**以及**针对单一场景的优化过程**，这在很大程度上限制了其在动态真实环境中的可扩展性。\n为克服这些局限，我们提出了 **Lumos3D**，一种具有良好泛化能力的、**无需位姿**的三维低照度场景恢复框架。Lumos3D 仅需在单一数据集上训练一次，即可在推理阶段以**纯前向传播**的方式工作，直接从**无位姿的低照度多视图图像**中恢复照明与结构，而无需任何逐场景训练或优化。\n基于以几何为基础的网络骨干，Lumos3D 重建出一种**正常照度下的三维高斯表示**，在恢复照明的同时忠实地保留结构细节。在训练过程中，我们采用了一种**跨照度蒸馏策略**，即在正常照度真实数据上对教师网络进行蒸馏，将准确的几何信息（如深度）传递给学生模型。此外，我们还引入了一种专门的 **Lumos 损失**，以促进重建三维空间内的**光度一致性**。\n在真实世界数据集上的实验表明，Lumos3D 能够实现**高保真的低照度三维场景恢复**，具备准确的几何重建能力，并对未见过的场景表现出很强的**泛化性**。进一步地，该框架还可以自然地扩展用于**过曝光校正**，体现了其在多种照明恢复任务中的通用性与灵活性。\n"
  },
  {
    "path": "abs/2511.09827.md",
    "content": "### AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting\n\nWe present a novel framework for animating humans in 3D scenes using 3D Gaussian Splatting (3DGS), a neural scene representation that has recently achieved state-of-the-art photorealistic results for novel-view synthesis but remains under-explored for human-scene animation and interaction. Unlike existing animation pipelines that use meshes or point clouds as the underlying 3D representation, our approach introduces the use of 3DGS as the 3D representation to the problem of animating humans in scenes. By representing humans and scenes as Gaussians, our approach allows for geometry-consistent free-viewpoint rendering of humans interacting with 3D scenes. Our key insight is that the rendering can be decoupled from the motion synthesis and each sub-problem can be addressed independently, without the need for paired human-scene data. Central to our method is a Gaussian-aligned motion module that synthesizes motion without explicit scene geometry, using opacity-based cues and projected Gaussian structures to guide human placement and pose alignment. To ensure natural interactions, we further propose a human-scene Gaussian refinement optimization that enforces realistic contact and navigation. We evaluate our approach on scenes from Scannet++ and the SuperSplat library, and on avatars reconstructed from sparse and dense multi-view human capture. Finally, we demonstrate that our framework allows for novel applications such as geometry-consistent free-viewpoint rendering of edited monocular RGB videos with new animated humans, showcasing the unique advantage of 3DGS for monocular video-based human animation.\n\n我们提出了一种利用**三维高斯溅射（3D Gaussian Splatting，3DGS）**在三维场景中进行人体动画的新型框架。3DGS 作为一种神经场景表示，近年来在新视角合成任务中取得了最先进的照片级真实效果，但在**人体—场景动画与交互**方面仍有待深入探索。\n不同于现有以**网格或点云**作为底层三维表示的动画流程，我们的方法首次将 **3DGS 作为三维表示**引入到场景中人体动画的问题中。通过将人体和场景统一表示为**高斯基元**，该方法能够实现人体与三维场景交互时在**几何上一致的自由视角渲染**。\n我们的关键洞见在于，可以将**渲染过程与运动合成过程解耦**，并在无需成对的人体—场景数据的情况下，独立地解决这两个子问题。方法的核心是一个**与高斯对齐的运动模块**，该模块在不显式依赖场景几何的情况下进行运动合成，利用**基于不透明度的线索**以及**投影后的高斯结构**来引导人体的放置与姿态对齐。\n为确保交互的自然性，我们进一步提出了一种**人体—场景高斯细化优化策略**，以约束真实的接触关系与行走导航。我们在 **Scannet++** 和 **SuperSplat** 库中的场景上，以及由**稀疏和稠密多视角人体采集**重建的虚拟角色上对该方法进行了评估。\n最后，我们展示了该框架支持的一系列新颖应用，例如在**经过编辑的单目 RGB 视频**中引入新的动画人体，并进行**几何一致的自由视角渲染**，从而体现了 **3DGS 在基于单目视频的人体动画任务中的独特优势**。\n"
  },
  {
    "path": "abs/2511.09944.md",
    "content": "### TSPE-GS: Probabilistic Depth Extraction for Semi-Transparent Surface Reconstruction via 3D Gaussian Splatting\n\n3D Gaussian Splatting offers a strong speed-quality trade-off but struggles to reconstruct semi-transparent surfaces because most methods assume a single depth per pixel, which fails when multiple surfaces are visible. We propose TSPE-GS (Transparent Surface Probabilistic Extraction for Gaussian Splatting), which uniformly samples transmittance to model a pixel-wise multi-modal distribution of opacity and depth, replacing the prior single-peak assumption and resolving cross-surface depth ambiguity. By progressively fusing truncated signed distance functions, TSPE-GS reconstructs external and internal surfaces separately within a unified framework. The method generalizes to other Gaussian-based reconstruction pipelines without extra training overhead. Extensive experiments on public and self-collected semi-transparent and opaque datasets show TSPE-GS significantly improves semi-transparent geometry reconstruction while maintaining performance on opaque scenes.\n\n**三维高斯溅射（3D Gaussian Splatting）**在速度与质量之间实现了良好的权衡，但在重建**半透明表面**时仍面临困难，其根本原因在于大多数方法假设每个像素仅对应**单一深度**，当存在多层可见表面时该假设将失效。\n为此，我们提出了 **TSPE-GS（Transparent Surface Probabilistic Extraction for Gaussian Splatting）**，通过对**透射率进行均匀采样**，建模像素级**不透明度与深度的多模态分布**，从而替代先前的单峰假设，并有效消除**跨表面的深度歧义**。\n通过逐步融合**截断符号距离函数**，TSPE-GS 在统一框架内分别重建**外部与内部表面**。该方法无需额外训练开销即可推广至其他**基于高斯的重建流水线**。\n在公开数据集和自采集的**半透明与不透明数据集**上的大量实验表明，TSPE-GS 在**显著提升半透明几何重建质量**的同时，仍能保持在**不透明场景**上的重建性能。\n"
  },
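> The core idea, extracting several depths per pixel by uniformly sampling transmittance instead of assuming a single peak, is easy to sketch. The level spacing and the crossing rule below are assumptions; the paper's exact sampling scheme may differ:

```python
import numpy as np

def depths_at_transmittance_levels(depths, alphas, n_levels=8):
    """Probabilistic depth extraction (sketch): composite a pixel's Gaussians
    front to back and report the depth at which transmittance first falls
    below each uniformly spaced level. Several distinct reported depths
    indicate several visible surfaces (e.g., a glass front and an interior)."""
    order = np.argsort(depths)                          # front-to-back
    d, a = np.asarray(depths)[order], np.asarray(alphas)[order]
    T = np.concatenate([[1.0], np.cumprod(1.0 - a)])    # T before each sample
    out = []
    for level in np.linspace(1.0, 0.0, n_levels + 2)[1:-1]:
        i = np.searchsorted(-T, -level, side="left")    # first i with T[i] <= level
        if 0 < i <= len(d):
            out.append(float(d[i - 1]))
    return out
```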
  {
    "path": "abs/2511.10060.md",
    "content": "### Multivariate Gaussian Representation Learning for Medical Action Evaluation\n\nFine-grained action evaluation in medical vision faces unique challenges due to the unavailability of comprehensive datasets, stringent precision requirements, and insufficient spatiotemporal dynamic modeling of very rapid actions. To support development and evaluation, we introduce CPREval-6k, a multi-view, multi-label medical action benchmark containing 6,372 expert-annotated videos with 22 clinical labels. Using this dataset, we present GaussMedAct, a multivariate Gaussian encoding framework, to advance medical motion analysis through adaptive spatiotemporal representation learning. Multivariate Gaussian Representation projects the joint motions to a temporally scaled multi-dimensional space, and decomposes actions into adaptive 3D Gaussians that serve as tokens. These tokens preserve motion semantics through anisotropic covariance modeling while maintaining robustness to spatiotemporal noise. Hybrid Spatial Encoding, employing a Cartesian and Vector dual-stream strategy, effectively utilizes skeletal information in the form of joint and bone features. The proposed method achieves 92.1% Top-1 accuracy with real-time inference on the benchmark, outperforming baseline by +5.9% accuracy with only 10% FLOPs. Cross-dataset experiments confirm the superiority of our method in robustness.\n\n医学视觉中的**细粒度动作评估**面临着独特挑战，主要源于**缺乏全面的数据集**、**对精度的严格要求**，以及对**高速动作时空动态建模能力不足**。\n为支持相关方法的开发与评测，我们提出了 **CPREval-6k**，这是一个**多视角、多标签**的医学动作基准数据集，包含 **6,372 段由专家标注的视频**，覆盖 **22 个临床标签**。\n基于该数据集，我们提出了 **GaussMedAct**，一种**多变量高斯编码框架**，通过**自适应的时空表示学习**推进医学动作分析。**多变量高斯表示**将关节运动投影到经过时间尺度调整的多维空间中，并将动作分解为作为 **token 的自适应三维高斯**。这些 token 通过**各向异性的协方差建模**来保留运动语义，同时对**时空噪声**保持鲁棒性。\n**混合空间编码**采用**笛卡尔坐标与向量的双流策略**，有效利用了以**关节和骨骼特征**形式表示的骨架信息。所提出的方法在该基准上实现了 **92.1% 的 Top-1 准确率**，并支持**实时推理**，在仅使用基线方法 **10% FLOPs** 的情况下，准确率提升了 **5.9%**。**跨数据集实验**进一步验证了该方法在**鲁棒性**方面的优势。\n"
  },
  {
    "path": "abs/2511.10316.md",
    "content": "### Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision\n\nThree-dimensional reconstruction in scenes with extreme depth variations remains challenging due to inconsistent supervisory signals between near-field and far-field regions. Existing methods fail to simultaneously address inaccurate depth estimation in distant areas and structural degradation in close-range regions. This paper proposes a novel computational framework that integrates depth-of-field supervision and multi-view consistency supervision to advance 3D Gaussian Splatting. Our approach comprises two core components: (1) Depth-of-field Supervision employs a scale-recovered monocular depth estimator (e.g., Metric3D) to generate depth priors, leverages defocus convolution to synthesize physically accurate defocused images, and enforces geometric consistency through a novel depth-of-field loss, thereby enhancing depth fidelity in both far-field and near-field regions; (2) Multi-View Consistency Supervision employing LoFTR-based semi-dense feature matching to minimize cross-view geometric errors and enforce depth consistency via least squares optimization of reliable matched points. By unifying defocus physics with multi-view geometric constraints, our method achieves superior depth fidelity, demonstrating a 0.8 dB PSNR improvement over the state-of-the-art method on the Waymo Open Dataset. This framework bridges physical imaging principles and learning-based depth regularization, offering a scalable solution for complex depth stratification in urban environments.\n\n在具有极端深度变化的场景中进行三维重建仍然面临巨大挑战，其根本原因在于近场与远场区域之间监督信号的不一致性。现有方法难以同时解决远距离区域深度估计不准确以及近距离区域结构退化的问题。本文提出了一种新的计算框架，将景深监督（depth-of-field supervision）与多视角一致性监督相结合，以推进 3D Gaussian Splatting 的发展。我们的方法包含两个核心组成部分：（1）景深监督：利用尺度恢复的单目深度估计器（如 Metric3D）生成深度先验，并通过离焦卷积合成物理一致的虚焦图像，同时引入一种新的景深损失来约束几何一致性，从而同时提升远场与近场区域的深度保真度；（2）多视角一致性监督：采用基于 LoFTR 的半稠密特征匹配来最小化跨视角的几何误差，并通过对可靠匹配点进行最小二乘优化以强化深度一致性。通过将离焦成像物理与多视角几何约束相统一，我们的方法在深度保真度方面取得了显著提升，在 Waymo Open Dataset 上相比当前最先进方法实现了 0.8 dB 的 PSNR 提升。该框架在物理成像原理与基于学习的深度正则化之间架起了桥梁，为城市环境中复杂深度分层问题提供了一种可扩展的解决方案。\n"
  },
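> The defocus synthesis rests on the thin-lens circle-of-confusion model. A sketch of the blur-diameter computation that such a defocus convolution would build its per-pixel kernels from is given below; the symbols follow the standard thin-lens formula, not the paper's notation:

```python
def circle_of_confusion(depth, focus_dist, focal_len, aperture):
    """Thin-lens circle-of-confusion diameter: zero at the focal plane and
    growing with distance from it. All lengths share one unit; a physically
    based defocused image is synthesized by blurring each pixel with a
    kernel of roughly this diameter."""
    return abs(aperture * focal_len * (depth - focus_dist)
               / (depth * (focus_dist - focal_len)))
```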
  {
    "path": "abs/2511.11048.md",
    "content": "### PINGS-X: Physics-Informed Normalized Gaussian Splatting with Axes Alignment for Efficient Super-Resolution of 4D Flow MRI\n\n4D flow magnetic resonance imaging (MRI) is a reliable, non-invasive approach for estimating blood flow velocities, vital for cardiovascular diagnostics. Unlike conventional MRI focused on anatomical structures, 4D flow MRI requires high spatiotemporal resolution for early detection of critical conditions such as stenosis or aneurysms. However, achieving such resolution typically results in prolonged scan times, creating a trade-off between acquisition speed and prediction accuracy. Recent studies have leveraged physics-informed neural networks (PINNs) for super-resolution of MRI data, but their practical applicability is limited as the prohibitively slow training process must be performed for each patient. To overcome this limitation, we propose PINGS-X, a novel framework modeling high-resolution flow velocities using axes-aligned spatiotemporal Gaussian representations. Inspired by the effectiveness of 3D Gaussian splatting (3DGS) in novel view synthesis, PINGS-X extends this concept through several non-trivial novel innovations: (i) normalized Gaussian splatting with a formal convergence guarantee, (ii) axes-aligned Gaussians that simplify training for high-dimensional data while preserving accuracy and the convergence guarantee, and (iii) a Gaussian merging procedure to prevent degenerate solutions and boost computational efficiency. Experimental results on computational fluid dynamics (CFD) and real 4D flow MRI datasets demonstrate that PINGS-X substantially reduces training time while achieving superior super-resolution accuracy.\n\n四维流磁共振成像（4D flow MRI）是一种可靠的无创方法，可用于估计血流速度，对心血管疾病诊断至关重要。不同于主要关注解剖结构的传统 MRI，4D flow MRI 需要高时空分辨率，以便及早检测诸如血管狭窄或动脉瘤等关键病变。然而，实现如此高的分辨率通常会导致扫描时间显著延长，从而在数据采集速度与预测精度之间形成权衡。近年来的研究利用物理信息神经网络（PINNs）对 MRI 数据进行超分辨率重建，但其实际应用受到限制，因为训练过程极其缓慢且必须针对每位患者单独执行。为克服这一局限性，我们提出了 PINGS-X，这是一种利用轴对齐的时空高斯表示来建模高分辨率流速的全新框架。受 3D Gaussian Splatting（3DGS）在新视角合成中有效性的启发，PINGS-X 在此基础上引入了若干非平凡的创新：（i）具有形式化收敛性保证的归一化高斯 splatting；（ii）轴对齐高斯表示，在保持精度和收敛性保证的同时，简化了高维数据的训练过程；（iii）高斯合并策略，用于避免退化解并提升计算效率。在计算流体力学（CFD）数据集和真实 4D flow MRI 数据集上的实验结果表明，PINGS-X 在显著降低训练时间的同时，实现了更优的超分辨率重建精度。\n"
  },
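> A sketch of what axes-aligned, normalized Gaussian splatting amounts to at a single query point: diagonal covariances make each Gaussian a product of 1-D factors, and weights are normalized into a convex combination (the paper's convergence guarantee is attached to this normalization). This is an illustrative evaluation only, not the training procedure:

```python
import numpy as np

def axes_aligned_gaussian(x, mu, sigma):
    """Diagonal-covariance Gaussian: O(D) per evaluation, no rotation
    parameters to learn."""
    z = (np.asarray(x) - mu) / sigma
    return np.exp(-0.5 * np.dot(z, z))

def normalized_splat(x, mus, sigmas, values, eps=1e-8):
    """Normalized Gaussian splatting (sketch): the field at x is a convex
    combination of per-Gaussian values with weights that sum to one."""
    w = np.array([axes_aligned_gaussian(x, m, s) for m, s in zip(mus, sigmas)])
    w = w / (w.sum() + eps)
    return w @ np.asarray(values)   # e.g., values: (N, 3) flow velocities
```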
  {
    "path": "abs/2511.11213.md",
    "content": "### RealisticDreamer: Guidance Score Distillation for Few-shot Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has recently gained great attention in the 3D scene representation for its high-quality real-time rendering capabilities. However, when the input comprises sparse training views, 3DGS is prone to overfitting, primarily due to the lack of intermediate-view supervision. Inspired by the recent success of Video Diffusion Models (VDM), we propose a framework called Guidance Score Distillation (GSD) to extract the rich multi-view consistency priors from pretrained VDMs. Building on the insights from Score Distillation Sampling (SDS), GSD supervises rendered images from multiple neighboring views, guiding the Gaussian splatting representation towards the generative direction of VDM. However, the generative direction often involves object motion and random camera trajectories, making it challenging for direct supervision in the optimization process. To address this problem, we introduce an unified guidance form to correct the noise prediction result of VDM. Specifically, we incorporate both a depth warp guidance based on real depth maps and a guidance based on semantic image features, ensuring that the score update direction from VDM aligns with the correct camera pose and accurate geometry. Experimental results show that our method outperforms existing approaches across multiple datasets.\n\n3D Gaussian Splatting（3DGS）因其高质量的实时渲染能力，近年来在三维场景表示领域受到广泛关注。然而，当输入仅包含稀疏的训练视角时，3DGS 容易发生过拟合，其主要原因在于缺乏中间视角的监督。受视频扩散模型（Video Diffusion Models，VDM）近期成功的启发，我们提出了一种名为引导得分蒸馏（Guidance Score Distillation，GSD）的框架，用于从预训练的 VDM 中提取丰富的多视角一致性先验。在得分蒸馏采样（Score Distillation Sampling，SDS）的思想基础上，GSD 对来自多个相邻视角的渲染图像进行监督，引导高斯 splatting 表示朝向 VDM 的生成方向优化。然而，该生成方向往往包含物体运动和随机相机轨迹，这使其在优化过程中难以直接用于监督。为了解决这一问题，我们提出了一种统一的引导形式，用于修正 VDM 的噪声预测结果。具体而言，我们同时引入了基于真实深度图的深度变形引导，以及基于语义图像特征的引导，从而确保来自 VDM 的得分更新方向与正确的相机位姿和精确的几何结构保持一致。实验结果表明，我们的方法在多个数据集上均优于现有方法。\n"
  },
  {
    "path": "abs/2511.11231.md",
    "content": "### 3D Gaussian and Diffusion-Based Gaze Redirection\n\nHigh-fidelity gaze redirection is critical for generating augmented data to improve the generalization of gaze estimators. 3D Gaussian Splatting (3DGS) models like GazeGaussian represent the state-of-the-art but can struggle with rendering subtle, continuous gaze shifts. In this paper, we propose DiT-Gaze, a framework that enhances 3D gaze redirection models using a novel combination of Diffusion Transformer (DiT), weak supervision across gaze angles, and an orthogonality constraint loss. DiT allows higher-fidelity image synthesis, while our weak supervision strategy using synthetically generated intermediate gaze angles provides a smooth manifold of gaze directions during training. The orthogonality constraint loss mathematically enforces the disentanglement of internal representations for gaze, head pose, and expression. Comprehensive experiments show that DiT-Gaze sets a new state-of-the-art in both perceptual quality and redirection accuracy, reducing the state-of-the-art gaze error by 4.1% to 6.353 degrees, providing a superior method for creating synthetic training data. Our code and models will be made available for the research community to benchmark against.\n\n高保真的视线重定向对于生成增强数据、提升视线估计器的泛化能力至关重要。以 GazeGaussian 为代表的 3D Gaussian Splatting（3DGS）模型已达到当前最先进水平，但在渲染细微且连续的视线变化时仍面临困难。本文提出了 DiT-Gaze，一个通过结合扩散 Transformer（Diffusion Transformer，DiT）、跨视线角度的弱监督以及正交约束损失来增强三维视线重定向模型的框架。DiT 能够实现更高保真的图像合成，而我们基于合成生成的中间视线角度的弱监督策略，则在训练过程中提供了平滑的视线方向流形。正交约束损失从数学上强制视线、头部姿态与表情等内部表示之间的解耦。大量实验表明，DiT-Gaze 在感知质量和重定向精度两方面均达到了新的最先进水平，将当前最优的视线误差降低了 4.1%，达到 6.353 度，为构建合成训练数据提供了一种更优的方法。我们的代码和模型将向研究社区公开，以便进行基准评测。\n"
  },
  {
    "path": "abs/2511.12040.md",
    "content": "### SRSplat: Feed-Forward Super-Resolution Gaussian Splatting from Sparse Multi-View Images\n\nFeed-forward 3D reconstruction from sparse, low-resolution (LR) images is a crucial capability for real-world applications, such as autonomous driving and embodied AI. However, existing methods often fail to recover fine texture details. This limitation stems from the inherent lack of high-frequency information in LR inputs. To address this, we propose \\textbf{SRSplat}, a feed-forward framework that reconstructs high-resolution 3D scenes from only a few LR views. Our main insight is to compensate for the deficiency of texture information by jointly leveraging external high-quality reference images and internal texture cues. We first construct a scene-specific reference gallery, generated for each scene using Multimodal Large Language Models (MLLMs) and diffusion models. To integrate this external information, we introduce the Reference-Guided Feature Enhancement (RGFE) module, which aligns and fuses features from the LR input images and their reference twin image. Subsequently, we train a decoder to predict the Gaussian primitives using the multi-view fused feature obtained from RGFE. To further refine predicted Gaussian primitives, we introduce Texture-Aware Density Control (TADC), which adaptively adjusts Gaussian density based on the internal texture richness of the LR inputs. Extensive experiments demonstrate that our SRSplat outperforms existing methods on various datasets, including RealEstate10K, ACID, and DTU, and exhibits strong cross-dataset and cross-resolution generalization capabilities.\n\n从稀疏、低分辨率（LR）图像进行前馈式三维重建是自动驾驶、具身智能等真实世界应用中的一项关键能力。然而，现有方法往往难以恢复精细的纹理细节。这一局限主要源于 LR 输入中固有的高频信息缺失。为此，我们提出了 **SRSplat**，一种仅利用少量 LR 视角即可重建高分辨率三维场景的前馈式框架。我们的核心洞见在于，通过联合利用外部高质量参考图像与内部纹理线索来弥补纹理信息的不足。首先，我们为每个场景构建一个场景特定的参考图像库，该参考库由多模态大语言模型（MLLMs）与扩散模型生成。为有效融合这些外部信息，我们提出了 *参考引导特征增强（Reference-Guided Feature Enhancement，RGFE）* 模块，用于对齐并融合 LR 输入图像及其对应的参考孪生图像的特征。随后，我们基于 *RGFE* 得到的多视角融合特征训练一个解码器，以预测高斯基元。为进一步细化预测得到的高斯基元，我们引入了 *纹理感知密度控制（Texture-Aware Density Control，TADC）*，该机制根据 LR 输入的内部纹理丰富程度自适应地调节高斯密度。大量实验结果表明，SRSplat 在包括 RealEstate10K、ACID 和 DTU 在内的多个数据集上均优于现有方法，并展现出强大的跨数据集与跨分辨率泛化能力。\n"
  },
  {
    "path": "abs/2511.12370.md",
    "content": "### Changes in Real Time: Online Scene Change Detection with Multi-View Fusion\n\nOnline Scene Change Detection (SCD) is an extremely challenging problem that requires an agent to detect relevant changes on the fly while observing the scene from unconstrained viewpoints. Existing online SCD methods are significantly less accurate than offline approaches. We present the first online SCD approach that is pose-agnostic, label-free, and ensures multi-view consistency, while operating at over 10 FPS and achieving new state-of-the-art performance, surpassing even the best offline approaches. Our method introduces a new self-supervised fusion loss to infer scene changes from multiple cues and observations, PnP-based fast pose estimation against the reference scene, and a fast change-guided update strategy for the 3D Gaussian Splatting scene representation. Extensive experiments on complex real-world datasets demonstrate that our approach outperforms both online and offline baselines.\n\n在线场景变化检测（Online Scene Change Detection，SCD）是一项极具挑战性的任务，要求智能体在从不受约束的视角观察场景的同时，实时检测与任务相关的变化。现有的在线 SCD 方法在精度上显著落后于离线方法。我们提出了首个同时具备位姿无关、无需标注并保证多视角一致性的在线 SCD 方法，在超过 10 FPS 的运行速度下取得了新的最先进性能，甚至超越了当前最优的离线方法。该方法引入了一种新的自监督融合损失，用于从多种线索和观测中推断场景变化；同时采用基于 PnP 的快速位姿估计与参考场景进行对齐，并提出了一种面向变化的快速更新策略，用于 3D Gaussian Splatting 场景表示。在复杂真实世界数据集上的大量实验表明，我们的方法在在线和离线基线方法之上均取得了显著优势。\n"
  },
  {
    "path": "abs/2511.12895.md",
    "content": "### Reconstructing 3D Scenes in Native High Dynamic Range\n\nHigh Dynamic Range (HDR) imaging is essential for professional digital media creation, e.g., filmmaking, virtual production, and photorealistic rendering. However, 3D scene reconstruction has primarily focused on Low Dynamic Range (LDR) data, limiting its applicability to professional workflows. Existing approaches that reconstruct HDR scenes from LDR observations rely on multi-exposure fusion or inverse tone-mapping, which increase capture complexity and depend on synthetic supervision. With the recent emergence of cameras that directly capture native HDR data in a single exposure, we present the first method for 3D scene reconstruction that directly models native HDR observations. We propose {\\bf Native High dynamic range 3D Gaussian Splatting (NH-3DGS)}, which preserves the full dynamic range throughout the reconstruction pipeline. Our key technical contribution is a novel luminance-chromaticity decomposition of the color representation that enables direct optimization from native HDR camera data. We demonstrate on both synthetic and real multi-view HDR datasets that NH-3DGS significantly outperforms existing methods in reconstruction quality and dynamic range preservation, enabling professional-grade 3D reconstruction directly from native HDR captures.\n\n高动态范围（High Dynamic Range，HDR）成像对于专业数字媒体创作至关重要，例如电影制作、虚拟制作以及照片级真实感渲染。然而，三维场景重建长期以来主要聚焦于低动态范围（Low Dynamic Range，LDR）数据，这在很大程度上限制了其在专业工作流程中的应用。从 LDR 观测重建 HDR 场景的现有方法通常依赖多曝光融合或反色调映射，这不仅增加了数据采集的复杂性，还依赖于合成监督。随着能够在单次曝光中直接捕获原生 HDR 数据的相机逐渐出现，我们提出了首个直接建模原生 HDR 观测的三维场景重建方法。具体而言，我们提出了 **原生高动态范围 3D Gaussian Splatting（Native High Dynamic Range 3D Gaussian Splatting，NH-3DGS）**，在整个重建流程中完整保留了动态范围。我们的核心技术贡献在于一种新的颜色表示的亮度–色度分解方法，使得能够直接基于原生 HDR 相机数据进行优化。我们在合成数据集和真实多视角 HDR 数据集上的实验表明，NH-3DGS 在重建质量和动态范围保持方面均显著优于现有方法，从而实现了直接基于原生 HDR 采集的专业级三维重建。\n"
  },
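> The key contribution is a luminance-chromaticity color decomposition, which the abstract does not spell out. One common form, log of the channel sum as luminance with sum-normalized chromaticity, is sketched below purely as an assumption:

```python
import numpy as np

def to_luminance_chromaticity(rgb, eps=1e-8):
    """Split linear HDR RGB into log-luminance (compressing the wide dynamic
    range) and a chromaticity vector whose channels sum to one."""
    lum = rgb.sum(axis=-1, keepdims=True)
    return np.log(lum + eps), rgb / (lum + eps)

def to_rgb(log_lum, chroma):
    """Inverse mapping: rgb = exp(log_lum) * chroma recovers native HDR color."""
    return np.exp(log_lum) * chroma
```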
  {
    "path": "abs/2511.12930.md",
    "content": "### Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration\n\n3D Gaussian Splatting (3DGS) rendering in real-time on resource-constrained devices is essential for delivering immersive augmented and virtual reality (AR/VR) experiences. However, existing solutions struggle to achieve high frame rates, especially for high-resolution rendering. Our analysis identifies the sorting stage in the 3DGS rendering pipeline as the major bottleneck due to its high memory bandwidth demand. This paper presents Neo, which introduces a reuse-and-update sorting algorithm that exploits temporal redundancy in Gaussian ordering across consecutive frames, and devises a hardware accelerator optimized for this algorithm. By efficiently tracking and updating Gaussian depth ordering instead of re-sorting from scratch, Neo significantly reduces redundant computations and memory bandwidth pressure. Experimental results show that Neo achieves up to 10.0x and 5.6x higher throughput than state-of-the-art edge GPU and ASIC solution, respectively, while reducing DRAM traffic by 94.5% and 81.3%. These improvements make high-quality and low-latency on-device 3D rendering more practical.\n\n在资源受限设备上实现 3D Gaussian Splatting（3DGS）的实时渲染，对于提供沉浸式增强现实与虚拟现实（AR/VR）体验至关重要。然而，现有解决方案在实现高帧率方面仍面临困难，尤其是在高分辨率渲染场景下。我们的分析表明，3DGS 渲染流水线中的排序阶段由于对内存带宽需求极高，成为主要性能瓶颈。本文提出了 Neo，引入了一种重用-更新式排序算法，利用相邻帧之间高斯顺序的时间冗余性，并设计了一种针对该算法优化的硬件加速器。通过对高斯深度顺序进行高效跟踪与更新，而非从头重新排序，Neo 显著减少了冗余计算并缓解了内存带宽压力。实验结果表明，Neo 相比当前最先进的边缘 GPU 和 ASIC 方案，吞吐率分别提升最高可达 10.0× 和 5.6×，同时将 DRAM 访问流量分别降低了 94.5% 和 81.3%。这些改进使得在设备端实现高质量、低时延的三维渲染更加切实可行。\n"
  },
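> The reuse-and-update idea exploits the fact that the depth order of Gaussians barely changes between consecutive frames. A minimal software sketch of the principle (the paper realizes it in a dedicated hardware accelerator) is an insertion-sort repair pass, which is near-linear on an almost-sorted order:

```python
def reuse_and_update_sort(prev_order, depths):
    """Repair last frame's Gaussian order under this frame's depths instead
    of re-sorting from scratch; insertion sort costs O(n + inversions), so
    the work tracks how much the ordering actually changed between frames."""
    order = list(prev_order)
    for i in range(1, len(order)):
        g, j = order[i], i - 1
        while j >= 0 and depths[order[j]] > depths[g]:
            order[j + 1] = order[j]   # shift back-of-order Gaussians right
            j -= 1
        order[j + 1] = g
    return order
```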
  {
    "path": "abs/2511.12941.md",
    "content": "### GUIDE: Gaussian Unified Instance Detection for Enhanced Obstacle Perception in Autonomous Driving\n\nIn the realm of autonomous driving, accurately detecting surrounding obstacles is crucial for effective decision-making. Traditional methods primarily rely on 3D bounding boxes to represent these obstacles, which often fail to capture the complexity of irregularly shaped, real-world objects. To overcome these limitations, we present GUIDE, a novel framework that utilizes 3D Gaussians for instance detection and occupancy prediction. Unlike conventional occupancy prediction methods, GUIDE also offers robust tracking capabilities. Our framework employs a sparse representation strategy, using Gaussian-to-Voxel Splatting to provide fine-grained, instance-level occupancy data without the computational demands associated with dense voxel grids. Experimental validation on the nuScenes dataset demonstrates GUIDE's performance, with an instance occupancy mAP of 21.61, marking a 50\\% improvement over existing methods, alongside competitive tracking capabilities. GUIDE establishes a new benchmark in autonomous perception systems, effectively combining precision with computational efficiency to better address the complexities of real-world driving environments.\n\n在自动驾驶领域，准确检测周围障碍物对于实现高效决策至关重要。传统方法主要依赖三维边界框来表示这些障碍物，但这种表示方式往往难以刻画现实世界中形状不规则物体的复杂性。为克服上述局限性，我们提出了 GUIDE，一种利用三维高斯表示进行实例检测与占用预测的新型框架。不同于传统的占用预测方法，GUIDE 还具备鲁棒的目标跟踪能力。该框架采用稀疏表示策略，通过 Gaussian-to-Voxel Splatting 在不引入稠密体素网格所需高额计算开销的前提下，提供细粒度、实例级别的占用信息。在 nuScenes 数据集上的实验验证表明，GUIDE 的实例占用 mAP 达到 21.61，相比现有方法提升了 50\\%，同时在跟踪性能方面也具有竞争力。GUIDE 为自动驾驶感知系统树立了新的基准，在精度与计算效率之间实现了有效平衡，从而更好地应对真实驾驶环境中的复杂挑战。\n"
  },
  {
    "path": "abs/2511.12972.md",
    "content": "### SplatSearch: Instance Image Goal Navigation for Mobile Robots using 3D Gaussian Splatting and Diffusion Models\n\nThe Instance Image Goal Navigation (IIN) problem requires mobile robots deployed in unknown environments to search for specific objects or people of interest using only a single reference goal image of the target. This problem can be especially challenging when: 1) the reference image is captured from an arbitrary viewpoint, and 2) the robot must operate with sparse-view scene reconstructions. In this paper, we address the IIN problem, by introducing SplatSearch, a novel architecture that leverages sparse-view 3D Gaussian Splatting (3DGS) reconstructions. SplatSearch renders multiple viewpoints around candidate objects using a sparse online 3DGS map, and uses a multi-view diffusion model to complete missing regions of the rendered images, enabling robust feature matching against the goal image. A novel frontier exploration policy is introduced which uses visual context from the synthesized viewpoints with semantic context from the goal image to evaluate frontier locations, allowing the robot to prioritize frontiers that are semantically and visually relevant to the goal image. Extensive experiments in photorealistic home and real-world environments validate the higher performance of SplatSearch against current state-of-the-art methods in terms of Success Rate and Success Path Length. An ablation study confirms the design choices of SplatSearch.\n\n实例图像目标导航（Instance Image Goal Navigation，IIN）问题要求部署在未知环境中的移动机器人，仅依赖一张目标的参考图像来搜索特定物体或感兴趣的人。这一问题在以下情况下尤为具有挑战性：（1）参考图像是从任意视角拍摄的；（2）机器人必须在仅有稀疏视角场景重建的条件下运行。本文针对 IIN 问题，提出了一种名为 SplatSearch 的新型架构，该架构利用稀疏视角的 3D Gaussian Splatting（3DGS）重建。SplatSearch 基于稀疏的在线 3DGS 地图，在候选物体周围渲染多个视角，并采用多视角扩散模型对渲染图像中的缺失区域进行补全，从而实现与目标图像之间的鲁棒特征匹配。我们还引入了一种新的前沿探索策略，将合成视角中的视觉上下文与目标图像中的语义上下文相结合，对前沿位置进行评估，使机器人能够优先探索在语义和视觉上与目标图像高度相关的前沿区域。在照片级真实感的家庭环境和真实世界环境中的大量实验验证了 SplatSearch 在成功率（Success Rate）和成功路径长度（Success Path Length）等指标上均优于当前最先进方法。消融实验进一步验证了 SplatSearch 各项设计选择的有效性。\n"
  },
  {
    "path": "abs/2511.13009.md",
    "content": "### TR-Gaussians: High-fidelity Real-time Rendering of Planar Transmission and Reflection with 3D Gaussian Splatting\n\nWe propose Transmission-Reflection Gaussians (TR-Gaussians), a novel 3D-Gaussian-based representation for high-fidelity rendering of planar transmission and reflection, which are ubiquitous in indoor scenes. Our method combines 3D Gaussians with learnable reflection planes that explicitly model the glass planes with view-dependent reflectance strengths. Real scenes and transmission components are modeled by 3D Gaussians and the reflection components are modeled by the mirrored Gaussians with respect to the reflection plane. The transmission and reflection components are blended according to a Fresnel-based, view-dependent weighting scheme, allowing for faithful synthesis of complex appearance effects under varying viewpoints. To effectively optimize TR-Gaussians, we develop a multi-stage optimization framework incorporating color and geometry constraints and an opacity perturbation mechanism. Experiments on different datasets demonstrate that TR-Gaussians achieve real-time, high-fidelity novel view synthesis in scenes with planar transmission and reflection, and outperform state-of-the-art approaches both quantitatively and qualitatively.\n\n我们提出了传输–反射高斯（Transmission-Reflection Gaussians，TR-Gaussians），这是一种基于 3D 高斯的全新表示方法，用于对室内场景中普遍存在的平面透射与反射现象进行高保真渲染。该方法将 3D 高斯与可学习的反射平面相结合，显式建模具有视角相关反射强度的玻璃平面。真实场景及其透射成分由 3D 高斯进行建模，而反射成分则通过相对于反射平面镜像的高斯来表示。透射与反射成分依据基于菲涅尔效应的视角相关加权方案进行融合，从而在不同视角下忠实合成复杂的外观效果。为高效优化 TR-Gaussians，我们设计了一个多阶段优化框架，结合颜色与几何约束，并引入不透明度扰动机制。在不同数据集上的实验表明，TR-Gaussians 能够在包含平面透射与反射的场景中实现实时、高保真的新视角合成，并在定量与定性评估中均优于当前最先进的方法。\n"
  },
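> Two ingredients of this abstract are standard and easy to sketch: mirroring a Gaussian center across the learned reflection plane, and a Fresnel-style view-dependent blend weight. Schlick's approximation is used below as a stand-in, since the paper's exact weighting scheme is not given in the abstract:

```python
import numpy as np

def mirror_point(x, n, d):
    """Reflect a Gaussian center x across the plane {p : n.p + d = 0}
    (n a unit normal); mirrored Gaussians model the reflection component."""
    return x - 2.0 * (np.dot(x, n) + d) * n

def fresnel_weight(view_dir, n, f0=0.04):
    """Schlick's Fresnel approximation: reflectance rises toward grazing
    angles; f0 ~= 0.04 suits typical glass. Blend per pixel as
    color = (1 - w) * transmission + w * reflection."""
    cos_theta = abs(np.dot(view_dir, n))
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5
```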
  {
    "path": "abs/2511.13011.md",
    "content": "### Beyond Darkness: Thermal-Supervised 3D Gaussian Splatting for Low-Light Novel View Synthesis\n\nUnder extremely low-light conditions, novel view synthesis (NVS) faces severe degradation in terms of geometry, color consistency, and radiometric stability. Standard 3D Gaussian Splatting (3DGS) pipelines fail when applied directly to underexposed inputs, as independent enhancement across views causes illumination inconsistencies and geometric distortion. To address this, we present DTGS, a unified framework that tightly couples Retinex-inspired illumination decomposition with thermal-guided 3D Gaussian Splatting for illumination-invariant reconstruction. Unlike prior approaches that treat enhancement as a pre-processing step, DTGS performs joint optimization across enhancement, geometry, and thermal supervision through a cyclic enhancement-reconstruction mechanism. A thermal supervisory branch stabilizes both color restoration and geometry learning by dynamically balancing enhancement, structural, and thermal losses. Moreover, a Retinex-based decomposition module embedded within the 3DGS loop provides physically interpretable reflectance-illumination separation, ensuring consistent color and texture across viewpoints. To evaluate our method, we construct RGBT-LOW, a new multi-view low-light thermal dataset capturing severe illumination degradation. Extensive experiments show that DTGS significantly outperforms existing low-light enhancement and 3D reconstruction baselines, achieving superior radiometric consistency, geometric fidelity, and color stability under extreme illumination.\n\n在极端低照度条件下，新视角合成（Novel View Synthesis，NVS）在几何结构、颜色一致性和辐射稳定性方面都会遭遇严重退化。标准的 3D Gaussian Splatting（3DGS）流程在直接应用于欠曝光输入时往往失效，因为跨视角的独立增强会导致光照不一致和几何畸变。为此，我们提出了 DTGS，一种将受 Retinex 启发的光照分解与热成像引导的 3D Gaussian Splatting 紧密耦合的统一框架，用于实现光照不变的三维重建。不同于将增强视为预处理步骤的以往方法，DTGS 通过循环式的增强–重建机制，对增强、几何以及热监督进行联合优化。热监督分支通过动态平衡增强损失、结构损失与热损失，有效稳定了颜色恢复与几何学习过程。此外，嵌入在 3DGS 循环中的基于 Retinex 的分解模块提供了具有物理可解释性的反射率–光照分离，从而保证跨视角的颜色与纹理一致性。为评估所提出的方法，我们构建了 RGBT-LOW，一个新的多视角低照度热成像数据集，用以刻画严重的光照退化情况。大量实验结果表明，DTGS 在极端光照条件下在辐射一致性、几何保真度以及颜色稳定性方面均显著优于现有的低照度增强与三维重建基线方法。\n"
  },
  {
    "path": "abs/2511.13055.md",
    "content": "### Monocular 3D Lane Detection via Structure Uncertainty-Aware Network with Curve-Point Queries\n\nMonocular 3D lane detection is challenged by aleatoric uncertainty arising from inherent observation noise. Existing methods rely on simplified geometric assumptions, such as independent point predictions or global planar modeling, failing to capture structural variations and aleatoric uncertainty in real-world scenarios. In this paper, we propose MonoUnc, a bird's-eye view (BEV)-free 3D lane detector that explicitly models aleatoric uncertainty informed by local lane structures. Specifically, 3D lanes are projected onto the front-view (FV) space and approximated by parametric curves. Guided by curve predictions, curve-point query embeddings are dynamically generated for lane point predictions in 3D space. Each segment formed by two adjacent points is modeled as a 3D Gaussian, parameterized by the local structure and uncertainty estimations. Accordingly, a novel 3D Gaussian matching loss is designed to constrain these parameters jointly. Experiments on the ONCE-3DLanes and OpenLane datasets demonstrate that MonoUnc outperforms previous state-of-the-art (SoTA) methods across all benchmarks under stricter evaluation criteria. Additionally, we propose two comprehensive evaluation metrics for ONCE-3DLanes, calculating the average and maximum bidirectional Chamfer distances to quantify global and local errors.\n\n单目三维车道线检测面临来自观测噪声所导致的非系统性（Aleatoric）不确定性挑战。现有方法通常依赖简化的几何假设，如独立点预测或全局平面建模，难以捕捉真实场景中的结构变化与非系统性不确定性。本文提出了一种无需鸟瞰图（BEV）的三维车道线检测方法——MonoUnc，它显式建模由局部车道结构引导的非系统性不确定性。具体地，三维车道线被投影到前视图（FV）空间，并通过参数曲线进行拟合。在曲线预测的引导下，系统动态生成曲线点查询嵌入（curve-point query embeddings），用于三维空间中的车道点预测。每一段由相邻两个点构成的车道线段被建模为一个三维高斯分布，其参数由局部结构与不确定性估计共同决定。为此，我们设计了一种新的三维高斯匹配损失，用于联合约束这些参数。在 ONCE-3DLanes 和 OpenLane 数据集上的实验表明，MonoUnc 在所有基准上均优于现有最先进方法，尤其在更严格的评估标准下表现更加突出。此外，我们还为 ONCE-3DLanes 提出两项全面的评估指标，基于双向 Chamfer 距离的平均值与最大值，以量化全局与局部误差。\n"
  },
  {
    "path": "abs/2511.13264.md",
    "content": "### SymGS : Leveraging Local Symmetries for 3D Gaussian Splatting Compression\n\n3D Gaussian Splatting has emerged as a transformative technique in novel view synthesis, primarily due to its high rendering speed and photorealistic fidelity. However, its memory footprint scales rapidly with scene complexity, often reaching several gigabytes. Existing methods address this issue by introducing compression strategies that exploit primitive-level redundancy through similarity detection and quantization. We aim to surpass the compression limits of such methods by incorporating symmetry-aware techniques, specifically targeting mirror symmetries to eliminate redundant primitives. We propose a novel compression framework, SymGS, introducing learnable mirrors into the scene, thereby eliminating local and global reflective redundancies for compression. Our framework functions as a plug-and-play enhancement to state-of-the-art compression methods, (e.g. HAC) to achieve further compression. Compared to HAC, we achieve 1.66× compression across benchmark datasets (upto 3× on large-scale scenes). On an average, SymGS enables 108× compression of a 3DGS scene, while preserving rendering quality.\n\n三维高斯投影（3D Gaussian Splatting）因其高速渲染能力和真实感极强的图像质量，已成为新视角合成领域的一项变革性技术。然而，其内存开销随场景复杂度迅速增长，常常达到数GB之多。现有方法主要通过相似性检测与量化等手段挖掘原始图元级冗余，从而实现压缩。我们希望通过引入对称感知机制，突破这些方法的压缩上限，尤其是针对镜像对称性以消除冗余图元。为此，我们提出了一种全新的压缩框架——SymGS，通过在场景中引入可学习的镜面结构，同时消除局部与全局的镜像冗余，从而实现更高效的压缩。该框架可作为当前主流压缩方法（如 HAC）的即插即用增强模块，实现进一步压缩。与 HAC 相比，我们在多个基准数据集上平均实现了 1.66× 的压缩（在大规模场景中最高可达 3×）。整体而言，SymGS 可在保证渲染质量的前提下，实现平均 108× 的三维高斯场景压缩率。\n"
  },
  {
    "path": "abs/2511.13278.md",
    "content": "### SF-Recon: Simplification-Free Lightweight Building Reconstruction via 3D Gaussian Splatting\n\nLightweight building surface models are crucial for digital city, navigation, and fast geospatial analytics, yet conventional multi-view geometry pipelines remain cumbersome and quality-sensitive due to their reliance on dense reconstruction, meshing, and subsequent simplification. This work presents SF-Recon, a method that directly reconstructs lightweight building surfaces from multi-view images without post-hoc mesh simplification. We first train an initial 3D Gaussian Splatting (3DGS) field to obtain a view-consistent representation. Building structure is then distilled by a normal-gradient-guided Gaussian optimization that selects primitives aligned with roof and wall boundaries, followed by multi-view edge-consistency pruning to enhance structural sharpness and suppress non-structural artifacts without external supervision. Finally, a multi-view depth-constrained Delaunay triangulation converts the structured Gaussian field into a lightweight, structurally faithful building mesh. Based on a proposed SF dataset, the experimental results demonstrate that our SF-Recon can directly reconstruct lightweight building models from multi-view imagery, achieving substantially fewer faces and vertices while maintaining computational efficiency.\n\n轻量级建筑表面模型在数字城市、导航与快速地理空间分析等任务中具有重要意义，然而传统的多视几何流程仍然繁琐且对质量敏感，原因在于其依赖于稠密重建、网格生成及后续简化操作。本文提出 SF-Recon，一种可直接从多视图图像重建轻量级建筑表面的方法，省去了事后网格简化步骤。我们首先训练一个初始的三维高斯投影（3DGS）场，以获得视角一致的表示。随后，通过法向梯度引导的高斯优化过程提取与屋顶和墙体边界对齐的结构图元，并通过多视图边缘一致性剪枝来增强结构清晰度、抑制非结构伪影，全过程无需外部监督。最后，基于多视图深度约束的 Delaunay 三角剖分将结构化高斯场转换为轻量且结构保真的建筑网格。基于我们构建的 SF 数据集，实验结果表明 SF-Recon 可直接从多视图图像中重建轻量级建筑模型，在显著减少面数与顶点数的同时保持良好的计算效率。\n"
  },
  {
    "path": "abs/2511.13571.md",
    "content": "### Opt3DGS: Optimizing 3D Gaussian Splatting with Adaptive Exploration and Curvature-Aware Exploitation\n\n3D Gaussian Splatting (3DGS) has emerged as a leading framework for novel view synthesis, yet its core optimization challenges remain underexplored. We identify two key issues in 3DGS optimization: entrapment in suboptimal local optima and insufficient convergence quality. To address these, we propose Opt3DGS, a robust framework that enhances 3DGS through a two-stage optimization process of adaptive exploration and curvature-guided exploitation. In the exploration phase, an Adaptive Weighted Stochastic Gradient Langevin Dynamics (SGLD) method enhances global search to escape local optima. In the exploitation phase, a Local Quasi-Newton Direction-guided Adam optimizer leverages curvature information for precise and efficient convergence. Extensive experiments on diverse benchmark datasets demonstrate that Opt3DGS achieves state-of-the-art rendering quality by refining the 3DGS optimization process without modifying its underlying representation.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）已成为新视角合成中的主流框架，然而其核心优化问题仍未被充分探索。我们识别出当前 3DGS 优化中的两个关键挑战：陷入次优的局部最优解，以及收敛质量不足。为应对这些问题，我们提出 Opt3DGS，一种通过“自适应探索 + 曲率引导利用”两阶段优化流程增强 3DGS 的鲁棒优化框架。在探索阶段，我们采用自适应加权随机梯度朗之万动力学（Adaptive Weighted Stochastic Gradient Langevin Dynamics, SGLD）方法提升全局搜索能力，帮助跳出局部最优。在利用阶段，我们设计了基于局部拟牛顿方向引导的 Adam 优化器（Local Quasi-Newton Direction-guided Adam），利用曲率信息实现更精确且高效的收敛。大量基准数据集的实验证明，Opt3DGS 在不改变 3DGS 表示形式的前提下，通过优化过程的精细化设计，显著提升了渲染质量，达到了当前最先进水平。\n"
  },
  {
    "path": "abs/2511.13684.md",
    "content": "### Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting\n\nWe introduce GS-Light, an efficient, textual position-aware pipeline for text-guided relighting of 3D scenes represented via Gaussian Splatting (3DGS). GS-Light implements a training-free extension of a single-input diffusion model to handle multi-view inputs. Given a user prompt that may specify lighting direction, color, intensity, or reference objects, we employ a large vision-language model (LVLM) to parse the prompt into lighting priors. Using off-the-shelf estimators for geometry and semantics (depth, surface normals, and semantic segmentation), we fuse these lighting priors with view-geometry constraints to compute illumination maps and generate initial latent codes for each view. These meticulously derived init latents guide the diffusion model to generate relighting outputs that more accurately reflect user expectations, especially in terms of lighting direction. By feeding multi-view rendered images, along with the init latents, into our multi-view relighting model, we produce high-fidelity, artistically relit images. Finally, we fine-tune the 3DGS scene with the relit appearance to obtain a fully relit 3D scene. We evaluate GS-Light on both indoor and outdoor scenes, comparing it to state-of-the-art baselines including per-view relighting, video relighting, and scene editing methods. Using quantitative metrics (multi-view consistency, imaging quality, aesthetic score, semantic similarity, etc.) and qualitative assessment (user studies), GS-Light demonstrates consistent improvements over baselines.\n\n我们提出 GS-Light，一种高效的、具备文本位置信息感知能力的 3D 场景重光照方法，适用于以三维高斯投影（3DGS）表示的场景。GS-Light 将单输入扩散模型扩展为可处理多视图输入的形式，且无需额外训练。针对用户提供的提示词（可能包含光照方向、颜色、强度或参考物体等信息），我们使用大型视觉语言模型（LVLM）将其解析为光照先验信息。结合现成的几何与语义估计器（如深度图、法线图和语义分割结果），我们将这些光照先验与视图几何约束融合，生成每个视图的光照图以及初始潜变量（init latents）。这些精细推导得到的潜变量可有效引导扩散模型生成更加符合用户预期的重光照结果，尤其在光照方向感知方面表现突出。将多视图渲染图像与初始潜变量一并输入多视图重光照模型后，我们能够生成高保真、具有艺术感的光照效果图。最后，我们将这些重光照结果用于微调 3DGS 场景，从而获得完整的重光照三维场景。我们在室内和室外数据集上对 GS-Light 进行了评估，并与现有最先进的逐视图重光照、视频重光照和场景编辑方法进行了比较。在多视图一致性、成像质量、美学评分、语义相似度等定量指标和用户调研等定性评估中，GS-Light 均展现出对现有方法的全面提升。\n"
  },
  {
    "path": "abs/2511.14149.md",
    "content": "### iGaussian: Real-Time Camera Pose Estimation via Feed-Forward 3D Gaussian Splatting Inversion\n\nRecent trends in SLAM and visual navigation have embraced 3D Gaussians as the preferred scene representation, highlighting the importance of estimating camera poses from a single image using a pre-built Gaussian model. However, existing approaches typically rely on an iterative \\textit{render-compare-refine} loop, where candidate views are first rendered using NeRF or Gaussian Splatting, then compared against the target image, and finally, discrepancies are used to update the pose. This multi-round process incurs significant computational overhead, hindering real-time performance in robotics. In this paper, we propose iGaussian, a two-stage feed-forward framework that achieves real-time camera pose estimation through direct 3D Gaussian inversion. Our method first regresses a coarse 6DoF pose using a Gaussian Scene Prior-based Pose Regression Network with spatial uniform sampling and guided attention mechanisms, then refines it through feature matching and multi-model fusion. The key contribution lies in our cross-correlation module that aligns image embeddings with 3D Gaussian attributes without differentiable rendering, coupled with a Weighted Multiview Predictor that fuses features from Multiple strategically sampled viewpoints. Experimental results on the NeRF Synthetic, Mip-NeRF 360, and T\\&T+DB datasets demonstrate a significant performance improvement over previous methods, reducing median rotation errors to 0.2° while achieving 2.87 FPS tracking on mobile robots, which is an impressive 10 times speedup compared to optimization-based approaches.\n\n近年来，SLAM 与视觉导航领域日益采用三维高斯作为首选的场景表示方式，这凸显了基于预构建高斯模型从单张图像估计相机位姿的重要性。然而，现有方法普遍依赖迭代式的“渲染-比较-优化（render-compare-refine）”流程：先使用 NeRF 或 Gaussian Splatting 渲染候选视角图像，再与目标图像进行比较，最后根据差异迭代更新相机位姿。这种多轮循环计算开销巨大，严重限制了机器人系统中的实时性能。本文提出 iGaussian，一种两阶段前馈式框架，通过直接的三维高斯反演实现实时相机位姿估计。第一阶段中，我们利用结合空间均匀采样与引导注意力机制的高斯场景先验位姿回归网络，预测粗略的 6 自由度（6DoF）位姿；第二阶段则通过特征匹配与多模型融合进一步细化位姿估计。核心创新在于设计了一种无需可微渲染的跨相关模块，可将图像特征嵌入与三维高斯属性进行对齐，同时结合加权多视图预测器（Weighted Multiview Predictor），融合来自多个策略性采样视角的特征信息。我们在 NeRF Synthetic、Mip-NeRF 360 与 T\\&T+DB 数据集上的实验表明，iGaussian 显著优于现有方法，将中位数旋转误差降低至 0.2°，并在移动机器人上实现了 2.87 FPS 的追踪速度，相较于基于优化的方法，获得了 10 倍的速度提升。\n"
  },
  {
    "path": "abs/2511.14291.md",
    "content": "### GEN3D: Generating Domain-Free 3D Scenes from a Single Image\n\nDespite recent advancements in neural 3D reconstruction, the dependence on dense multi-view captures restricts their broader applicability. Additionally, 3D scene generation is vital for advancing embodied AI and world models, which depend on diverse, high-quality scenes for learning and evaluation. In this work, we propose Gen3d, a novel method for generation of high-quality, wide-scope, and generic 3D scenes from a single image. After the initial point cloud is created by lifting the RGBD image, Gen3d maintains and expands its world model. The 3D scene is finalized through optimizing a Gaussian splatting representation. Extensive experiments on diverse datasets demonstrate the strong generalization capability and superior performance of our method in generating a world model and Synthesizing high-fidelity and consistent novel views.\n\n尽管神经三维重建技术取得了显著进展，但其对密集多视图采集的依赖性限制了在更广泛场景中的应用潜力。同时，三维场景生成对于推动具身智能（Embodied AI）与世界模型的发展至关重要，这些系统依赖多样且高质量的场景进行学习与评估。为此，本文提出 Gen3d，一种可从单张图像生成高质量、广覆盖、通用型三维场景的新方法。在通过提升 RGBD 图像生成初始点云后，Gen3d 持续维护并扩展其世界模型，最终通过优化三维高斯投影（Gaussian Splatting）表示来完成三维场景构建。在多个多样化数据集上的实验结果表明，Gen3d 在构建世界模型和合成高保真、一致性强的新视角图像方面表现出色，具有良好的泛化能力。\n"
  },
  {
    "path": "abs/2511.14315.md",
    "content": "### Dental3R: Geometry-Aware Pairing for Intraoral 3D Reconstruction from Sparse-View Photographs\n\nIntraoral 3D reconstruction is fundamental to digital orthodontics, yet conventional methods like intraoral scanning are inaccessible for remote tele-orthodontics, which typically relies on sparse smartphone imagery. While 3D Gaussian Splatting (3DGS) shows promise for novel view synthesis, its application to the standard clinical triad of unposed anterior and bilateral buccal photographs is challenging. The large view baselines, inconsistent illumination, and specular surfaces common in intraoral settings can destabilize simultaneous pose and geometry estimation. Furthermore, sparse-view photometric supervision often induces a frequency bias, leading to over-smoothed reconstructions that lose critical diagnostic details. To address these limitations, we propose Dental3R, a pose-free, graph-guided pipeline for robust, high-fidelity reconstruction from sparse intraoral photographs. Our method first constructs a Geometry-Aware Pairing Strategy (GAPS) to intelligently select a compact subgraph of high-value image pairs. The GAPS focuses on correspondence matching, thereby improving the stability of the geometry initialization and reducing memory usage. Building on the recovered poses and point cloud, we train the 3DGS model with a wavelet-regularized objective. By enforcing band-limited fidelity using a discrete wavelet transform, our approach preserves fine enamel boundaries and interproximal edges while suppressing high-frequency artifacts. We validate our approach on a large-scale dataset of 950 clinical cases and an additional video-based test set of 195 cases. Experimental results demonstrate that Dental3R effectively handles sparse, unposed inputs and achieves superior novel view synthesis quality for dental occlusion visualization, outperforming state-of-the-art methods.\n\n口腔内三维重建是数字正畸的核心技术，但传统方法如口扫仪（intraoral scanning）并不适用于远程正畸（tele-orthodontics）场景，后者通常依赖稀疏的智能手机图像。尽管三维高斯投影（3D Gaussian Splatting, 3DGS）在新视角合成中展现出良好潜力，但应用于标准临床三视图（未配准的正面和双侧颊面照片）仍面临巨大挑战。由于视角基线较大、照明不一致以及口腔区域常见的镜面反射，导致位姿与几何的联合估计极不稳定。此外，稀疏视图下的光度监督常引发频率偏差，产生过度平滑的重建结果，丢失关键诊断细节。为克服上述问题，我们提出了 **Dental3R**：一种无需位姿估计、基于图结构引导的高保真口腔重建新方法。该方法首先设计了一个几何感知配对策略（Geometry-Aware Pairing Strategy, GAPS），智能选择信息量丰富的图像对构建紧凑子图，从而提升几何初始化稳定性并降低内存开销。随后，在恢复的位姿与点云基础上，我们引入离散小波变换的波段约束目标函数，训练3DGS模型，确保在压制高频伪影的同时保留牙釉质边界与牙缝等关键细节。我们在950例临床病例组成的大规模数据集以及195例视频测试集上进行了验证，实验结果表明，Dental3R 能够稳健处理稀疏、未配准图像输入，在牙合关系可视化等任务中生成高质量的新视角图像，显著优于现有先进方法。\n"
  },
  {
    "path": "abs/2511.14357.md",
    "content": "### IBGS: Image-Based Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) has recently emerged as a fast, high-quality method for novel view synthesis (NVS). However, its use of low-degree spherical harmonics limits its ability to capture spatially varying color and view-dependent effects such as specular highlights. Existing works augment Gaussians with either a global texture map, which struggles with complex scenes, or per-Gaussian texture maps, which introduces high storage overhead. We propose Image-Based Gaussian Splatting, an efficient alternative that leverages high-resolution source images for fine details and view-specific color modeling. Specifically, we model each pixel color as a combination of a base color from standard 3DGS rendering and a learned residual inferred from neighboring training images. This promotes accurate surface alignment and enables rendering images of high-frequency details and accurate view-dependent effects. Experiments on standard NVS benchmarks show that our method significantly outperforms prior Gaussian Splatting approaches in rendering quality, without increasing the storage footprint.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）近期作为一种快速且高质量的新视角合成（Novel View Synthesis, NVS）方法获得广泛关注。然而，3DGS 采用的低阶球谐函数限制了其对空间变化颜色与视角相关效果（如镜面高光）的建模能力。现有方法通常通过全局纹理图进行增强，但在复杂场景中表现不佳；或为每个高斯图元引入独立纹理图，虽能提升表现，但存储开销巨大。为此，我们提出了一种高效替代方案——基于图像的高斯投影（Image-Based Gaussian Splatting）。该方法利用高分辨率原始图像建模细节与视角特有颜色信息。具体而言，我们将每个像素的颜色表示为标准 3DGS 渲染结果与来自相邻训练图像推断的残差项之和。这一机制有助于提升表面对齐精度，并实现高频细节和准确的视角相关效果渲染。在标准 NVS 基准上的实验表明，该方法在不增加存储开销的前提下，显著优于现有高斯投影方法的渲染质量。\n"
  },
  {
    "path": "abs/2511.14540.md",
    "content": "### Interaction-Aware 4D Gaussian Splatting for Dynamic Hand-Object Interaction Reconstruction\n\nThis paper focuses on a challenging setting of simultaneously modeling geometry and appearance of hand-object interaction scenes without any object priors. We follow the trend of dynamic 3D Gaussian Splatting based methods, and address several significant challenges. To model complex hand-object interaction with mutual occlusion and edge blur, we present interaction-aware hand-object Gaussians with newly introduced optimizable parameters aiming to adopt piecewise linear hypothesis for clearer structural representation. Moreover, considering the complementarity and tightness of hand shape and object shape during interaction dynamics, we incorporate hand information into object deformation field, constructing interaction-aware dynamic fields to model flexible motions. To further address difficulties in the optimization process, we propose a progressive strategy that handles dynamic regions and static background step by step. Correspondingly, explicit regularizations are designed to stabilize the hand-object representations for smooth motion transition, physical interaction reality, and coherent lighting. Experiments show that our approach surpasses existing dynamic 3D-GS-based methods and achieves state-of-the-art performance in reconstructing dynamic hand-object interaction.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）近期作为一种快速且高质量的新视角合成（Novel View Synthesis, NVS）方法获得广泛关注。然而，3DGS 采用的低阶球谐函数限制了其对空间变化颜色与视角相关效果（如镜面高光）的建模能力。现有方法通常通过全局纹理图进行增强，但在复杂场景中表现不佳；或为每个高斯图元引入独立纹理图，虽能提升表现，但存储开销巨大。为此，我们提出了一种高效替代方案——基于图像的高斯投影（Image-Based Gaussian Splatting）。该方法利用高分辨率原始图像建模细节与视角特有颜色信息。具体而言，我们将每个像素的颜色表示为标准 3DGS 渲染结果与来自相邻训练图像推断的残差项之和。这一机制有助于提升表面对齐精度，并实现高频细节和准确的视角相关效果渲染。在标准 NVS 基准上的实验表明，该方法在不增加存储开销的前提下，显著优于现有高斯投影方法的渲染质量。\n"
  },
  {
    "path": "abs/2511.14633.md",
    "content": "### SparseSurf: Sparse-View 3D Gaussian Splatting for Surface Reconstruction\n\nRecent advances in optimizing Gaussian Splatting for scene geometry have enabled efficient reconstruction of detailed surfaces from images. However, when input views are sparse, such optimization is prone to overfitting, leading to suboptimal reconstruction quality. Existing approaches address this challenge by employing flattened Gaussian primitives to better fit surface geometry, combined with depth regularization to alleviate geometric ambiguities under limited viewpoints. Nevertheless, the increased anisotropy inherent in flattened Gaussians exacerbates overfitting in sparse-view scenarios, hindering accurate surface fitting and degrading novel view synthesis performance. In this paper, we propose \\net{}, a method that reconstructs more accurate and detailed surfaces while preserving high-quality novel view rendering. Our key insight is to introduce Stereo Geometry-Texture Alignment, which bridges rendering quality and geometry estimation, thereby jointly enhancing both surface reconstruction and view synthesis. In addition, we present a Pseudo-Feature Enhanced Geometry Consistency that enforces multi-view geometric consistency by incorporating both training and unseen views, effectively mitigating overfitting caused by sparse supervision. Extensive experiments on the DTU, BlendedMVS, and Mip-NeRF360 datasets demonstrate that our method achieves the state-of-the-art performance.\n\n近期在高斯投影（Gaussian Splatting）优化场景几何方面的研究取得了显著进展，使得从图像中高效重建精细表面成为可能。然而，当输入视角较稀疏时，此类优化易产生过拟合，导致重建质量下降。现有方法通常采用扁平化高斯图元以更好地贴合表面几何，同时引入深度正则项缓解视角受限下的几何歧义。但由于扁平高斯本身存在更高的各向异性，这种策略在稀疏视角下反而会加剧过拟合，影响表面拟合准确性并降低新视角合成质量。为解决这一问题，本文提出方法 \\net{}，在保持高质量新视图渲染的同时，实现更准确、细节更丰富的表面重建。我们的核心思想是引入“立体几何-纹理对齐”（Stereo Geometry-Texture Alignment）机制，建立渲染质量与几何估计之间的桥梁，实现两者的协同优化。此外，我们还提出“伪特征增强的几何一致性”策略（Pseudo-Feature Enhanced Geometry Consistency），通过结合训练视图与未见视图，引导多视图几何一致性，有效缓解稀疏监督导致的过拟合问题。我们在 DTU、BlendedMVS 以及 Mip-NeRF360 数据集上的大量实验表明，该方法在表面重建与新视图合成任务中均达到了当前最优性能。\n"
  },
  {
    "path": "abs/2511.14848.md",
    "content": "### Gaussian See, Gaussian Do: Semantic 3D Motion Transfer from Multiview Video\n\nWe present Gaussian See, Gaussian Do, a novel approach for semantic 3D motion transfer from multiview video. Our method enables rig-free, cross-category motion transfer between objects with semantically meaningful correspondence. Building on implicit motion transfer techniques, we extract motion embeddings from source videos via condition inversion, apply them to rendered frames of static target shapes, and use the resulting videos to supervise dynamic 3D Gaussian Splatting reconstruction. Our approach introduces an anchor-based view-aware motion embedding mechanism, ensuring cross-view consistency and accelerating convergence, along with a robust 4D reconstruction pipeline that consolidates noisy supervision videos. We establish the first benchmark for semantic 3D motion transfer and demonstrate superior motion fidelity and structural consistency compared to adapted baselines.\n\n我们提出了 Gaussian See, Gaussian Do，一种新颖的语义三维动作迁移方法，能够从多视角视频中实现无绑定、跨类别的动作迁移，适用于具有语义对应关系的物体之间。基于隐式动作迁移技术，我们首先通过条件反演从源视频中提取动作嵌入（motion embeddings），并将其应用于静态目标形状的渲染帧上，所生成的视频用于监督动态三维高斯投影（3D Gaussian Splatting）重建过程。我们引入了一种基于锚点的视角感知动作嵌入机制，以确保多视图间的一致性并加速训练收敛，同时设计了一条稳健的四维重建流程，用于整合存在噪声的监督视频。我们建立了首个语义三维动作迁移的评测基准，并在运动保真度与结构一致性方面显著优于现有改进基线方法。\n"
  },
  {
    "path": "abs/2511.15102.md",
    "content": "### Gaussian Blending: Rethinking Alpha Blending in 3D Gaussian Splatting\n\nThe recent introduction of 3D Gaussian Splatting (3DGS) has significantly advanced novel view synthesis. Several studies have further improved the rendering quality of 3DGS, yet they still exhibit noticeable visual discrepancies when synthesizing views at sampling rates unseen during training. Specifically, they suffer from (i) erosion-induced blurring artifacts when zooming in and (ii) dilation-induced staircase artifacts when zooming out. We speculate that these artifacts arise from the fundamental limitation of the alpha blending adopted in 3DGS methods. Instead of the conventional alpha blending that computes alpha and transmittance as scalar quantities over a pixel, we propose to replace it with our novel Gaussian Blending that treats alpha and transmittance as spatially varying distributions. Thus, transmittances can be updated considering the spatial distribution of alpha values across the pixel area, allowing nearby background splats to contribute to the final rendering. Our Gaussian Blending maintains real-time rendering speed and requires no additional memory cost, while being easily integrated as a drop-in replacement into existing 3DGS-based or other NVS frameworks. Extensive experiments demonstrate that Gaussian Blending effectively captures fine details at various sampling rates unseen during training, consistently outperforming existing novel view synthesis models across both unseen and seen sampling rates.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）的提出显著推动了新视角合成的发展。尽管已有多项研究提升了 3DGS 的渲染质量，但在合成训练阶段未见采样率的新视角时，仍会出现明显的视觉伪影。具体而言：(i) 在放大视角时会出现由图元侵蚀引起的模糊伪影；(ii) 在缩小视角时则会出现由图元膨胀导致的阶梯状伪影。我们推测，这些伪影源于 3DGS 方法中所采用的 alpha 混合方式的基本局限。为此，我们提出了一种新颖的“高斯混合（Gaussian Blending）”策略，取代传统的 alpha 混合（将 alpha 和透射率视为像素上的标量值），将其视为具有空间变化分布的函数。这样，透射率的更新可以考虑 alpha 值在像素区域内的空间分布，使得邻近的背景图元也能对最终图像渲染做出贡献。该高斯混合方法不仅保持了实时渲染速度，不增加额外内存开销，还可作为现有 3DGS 或其他新视角合成（NVS）框架的无缝替代组件。大量实验结果表明，高斯混合在不同于训练采样率的视角下均能有效保留图像细节，在已见和未见采样率上均显著优于现有的新视角合成模型。\n"
  },
  {
    "path": "abs/2511.16030.md",
    "content": "### CuriGS: Curriculum-Guided Gaussian Splatting for Sparse View Synthesis\n\n3D Gaussian Splatting (3DGS) has recently emerged as an efficient, high-fidelity representation for real-time scene reconstruction and rendering. However, extending 3DGS to sparse-view settings remains challenging because of supervision scarcity and overfitting caused by limited viewpoint coverage. In this paper, we present CuriGS, a curriculum-guided framework for sparse-view 3D reconstruction using 3DGS. CuriGS addresses the core challenge of sparse-view synthesis by introducing student views: pseudo-views sampled around ground-truth poses (teacher). For each teacher, we generate multiple groups of student views with different perturbation levels. During training, we follow a curriculum schedule that gradually unlocks higher perturbation level, randomly sampling candidate students from the active level to assist training. Each sampled student is regularized via depth-correlation and co-regularization, and evaluated using a multi-signal metric that combines SSIM, LPIPS, and an image-quality measure. For every teacher and perturbation level, we periodically retain the best-performing students and promote those that satisfy a predefined quality threshold to the training set, resulting in a stable augmentation of sparse training views. Experimental results show that CuriGS outperforms state-of-the-art baselines in both rendering fidelity and geometric consistency across various synthetic and real sparse-view scenes.\n\n三维高斯投影（3D Gaussian Splatting, 3DGS）作为一种高效且高保真的实时场景重建与渲染表示方式，近年来受到广泛关注。然而，将 3DGS 扩展到稀疏视图场景仍面临重大挑战，主要原因在于监督信号稀缺以及视角覆盖受限所导致的过拟合问题。为解决这一问题，本文提出 CuriGS，一种基于课程学习的 3DGS 稀疏视图重建框架。CuriGS 的核心策略是引入“学生视图”（student views）：围绕真实视角（教师）进行扰动采样所生成的伪视角图像。针对每个教师视角，我们生成多个具有不同扰动强度的学生视图组。训练过程中遵循课程学习策略，逐步解锁更高扰动等级，并从当前活跃等级中随机采样候选学生视图参与训练。每个采样到的学生视图会通过深度相关性和协同正则进行约束，并通过结合 SSIM、LPIPS 与图像质量指标的多信号评分体系进行评估。在训练过程中，我们会周期性保留表现最优的学生视图，并将达到预设质量阈值的学生样本加入训练集，从而稳定扩充原始稀疏监督视角。大量实验结果表明，CuriGS 在多个合成与真实稀疏视图场景中，在渲染保真度与几何一致性方面均优于当前主流方法。\n"
  },
  {
    "path": "abs/2511.16091.md",
    "content": "### Rad-GS: Radar-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments\n\nWe present Rad-GS, a 4D radar-camera SLAM system designed for kilometer-scale outdoor environments, utilizing 3D Gaussian as a differentiable spatial representation. Rad-GS combines the advantages of raw radar point cloud with Doppler information and geometrically enhanced point cloud to guide dynamic object masking in synchronized images, thereby alleviating rendering artifacts and improving localization accuracy. Additionally, unsynchronized image frames are leveraged to globally refine the 3D Gaussian representation, enhancing texture consistency and novel view synthesis fidelity. Furthermore, the global octree structure coupled with a targeted Gaussian primitive management strategy further suppresses noise and significantly reduces memory consumption in large-scale environments. Extensive experiments and ablation studies demonstrate that Rad-GS achieves performance comparable to traditional 3D Gaussian methods based on camera or LiDAR inputs, highlighting the feasibility of robust outdoor mapping using 4D mmWave radar. Real-world reconstruction at kilometer scale validates the potential of Rad-GS for large-scale scene reconstruction.\n\n我们提出了 Rad-GS，一种面向千米级户外环境的四维雷达-相机联合 SLAM 系统，采用三维高斯作为可微分的空间表示。Rad-GS 结合了原始雷达点云中的多普勒信息与几何增强点云的优势，用于引导同步图像中的动态目标掩模，从而缓解渲染伪影并提升定位精度。此外，系统还利用非同步图像帧对三维高斯表示进行全局优化，增强纹理一致性与新视角合成的保真度。进一步地，Rad-GS 引入了全局八叉树结构和针对性的高斯图元管理策略，在大规模场景中有效抑制噪声并显著降低内存开销。大量实验与消融研究表明，Rad-GS 在性能上可与基于相机或激光雷达的传统三维高斯方法媲美，验证了采用四维毫米波雷达进行稳健户外建图的可行性。实际的千米级重建结果进一步展示了 Rad-GS 在大规模场景建模中的应用潜力。\n"
  },
  {
    "path": "abs/2511.16112.md",
    "content": "### Clustered Error Correction with Grouped 4D Gaussian Splatting\n\nExisting 4D Gaussian Splatting (4DGS) methods struggle to accurately reconstruct dynamic scenes, often failing to resolve ambiguous pixel correspondences and inadequate densification in dynamic regions. We address these issues by introducing a novel method composed of two key components: (1) Elliptical Error Clustering and Error Correcting Splat Addition that pinpoints dynamic areas to improve and initialize fitting splats, and (2) Grouped 4D Gaussian Splatting that improves consistency of mapping between splats and represented dynamic objects. Specifically, we classify rendering errors into missing-color and occlusion types, then apply targeted corrections via backprojection or foreground splitting guided by cross-view color consistency. Evaluations on Neural 3D Video and Technicolor datasets demonstrate that our approach significantly improves temporal consistency and achieves state-of-the-art perceptual rendering quality, improving 0.39dB of PSNR on the Technicolor Light Field dataset. Our visualization shows improved alignment between splats and dynamic objects, and the error correction method's capability to identify errors and properly initialize new splats.\n\n现有的四维高斯投影（4D Gaussian Splatting, 4DGS）方法在动态场景重建方面表现不佳，常常难以解决像素对应关系模糊与动态区域点云稀疏等问题。为此，我们提出了一种新颖的方法，包含两个核心组件：（1）椭圆误差聚类与误差修正图元添加（Elliptical Error Clustering and Error Correcting Splat Addition），用于精准定位动态区域并初始化/优化拟合图元；（2）分组式四维高斯投影（Grouped 4D Gaussian Splatting），提升图元与动态物体之间的映射一致性。具体来说，我们将渲染误差划分为“颜色缺失型”和“遮挡型”两类，并通过反投影或前景分割等策略，在跨视角颜色一致性指导下进行有针对性的修正。在 Neural 3D Video 和 Technicolor 数据集上的评估结果表明，我们的方法显著提升了时间一致性，并在感知渲染质量方面达到当前最优，在 Technicolor 光场数据集上提升了 0.39dB 的 PSNR。可视化结果进一步验证了本方法在图元与动态物体对齐方面的改进效果，以及误差修正机制在错误检测与图元初始化中的有效性。\n"
  },
  {
    "path": "abs/2511.16144.md",
    "content": "### LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM\n\nRecent advances in 3D Gaussian Splatting (3DGS) have enabled Simultaneous Localization and Mapping (SLAM) systems to build photorealistic maps. However, these maps lack the open-vocabulary semantic understanding required for advanced robotic interaction. Integrating language features into SLAM remains a significant challenge, as storing high-dimensional features demands excessive memory and rendering overhead, while existing methods with static models lack adaptability for novel environments. To address these limitations, we propose LEGO-SLAM (Language-Embedded Gaussian Optimization SLAM), the first framework to achieve real-time, open-vocabulary mapping within a 3DGS-based SLAM system. At the core of our method is a scene-adaptive encoder-decoder that distills high-dimensional language embeddings into a compact 16-dimensional feature space. This design reduces the memory per Gaussian and accelerates rendering, enabling real-time performance. Unlike static approaches, our encoder adapts online to unseen scenes. These compact features also enable a language-guided pruning strategy that identifies semantic redundancy, reducing the map's Gaussian count by over 60% while maintaining rendering quality. Furthermore, we introduce a language-based loop detection approach that reuses these mapping features, eliminating the need for a separate detection model. Extensive experiments demonstrate that LEGO-SLAM achieves competitive mapping quality and tracking accuracy, all while providing open-vocabulary capabilities at 15 FPS.\n\n三维高斯溅射（3DGS）的最新进展使得同时定位与建图（SLAM）系统能够构建出具备照片级真实感的地图。然而，这些地图缺乏面向高级机器人交互所需的开放词汇语义理解能力。将语言特征引入SLAM仍然面临重大挑战：高维特征的存储需要大量内存并带来渲染开销，而现有使用静态模型的方法缺乏对新环境的适应性。为了解决这些问题，我们提出了 LEGO-SLAM（Language-Embedded Gaussian Optimization SLAM），这是首个在基于3DGS的SLAM系统中实现实时、开放词汇建图的框架。我们方法的核心是一个场景自适应的编码器-解码器结构，它将高维语言嵌入提炼为紧凑的16维特征空间。该设计显著降低了每个高斯的内存开销并加速了渲染，从而实现实时性能。与静态方法不同，我们的编码器能够在线适应未见场景。这些紧凑特征还支持一种语言引导的剪枝策略，用于识别语义冗余，在保持渲染质量的同时将地图中的高斯数量减少超过60%。此外，我们还引入了一种基于语言的闭环检测方法，能够重用这些建图特征，无需单独的检测模型。大量实验证明，LEGO-SLAM在提供15 FPS开放词汇能力的同时，在建图质量和跟踪精度上也达到了竞争性水平。\n"
  },
  {
    "path": "abs/2511.16298.md",
    "content": "### Optimizing 3D Gaussian Splattering for Mobile GPUs\n\nImage-based 3D scene reconstruction, which transforms multi-view images into a structured 3D representation of the surrounding environment, is a common task across many modern applications. 3D Gaussian Splatting (3DGS) is a new paradigm to address this problem and offers considerable efficiency as compared to the previous methods. Motivated by this, and considering various benefits of mobile device deployment (data privacy, operating without internet connectivity, and potentially faster responses), this paper develops Texture3dgs, an optimized mapping of 3DGS for a mobile GPU. A critical challenge in this area turns out to be optimizing for the two-dimensional (2D) texture cache, which needs to be exploited for faster executions on mobile GPUs. As a sorting method dominates the computations in 3DGS on mobile platforms, the core of Texture3dgs is a novel sorting algorithm where the processing, data movement, and placement are highly optimized for 2D memory. The properties of this algorithm are analyzed in view of a cost model for the texture cache. In addition, we accelerate other steps of the 3DGS algorithm through improved variable layout design and other optimizations. End-to-end evaluation shows that Texture3dgs delivers up to 4.1× and 1.7× speedup for the sorting and overall 3D scene reconstruction, respectively -- while also reducing memory usage by up to 1.6× -- demonstrating the effectiveness of our design for efficient mobile 3D scene reconstruction.\n\n基于图像的三维场景重建是将多视角图像转换为周围环境的结构化三维表示的任务，广泛应用于许多现代应用中。三维高斯溅射（3DGS）是一种解决该问题的新范式，与传统方法相比具有显著的效率优势。受此启发，并结合移动设备部署的多项优势（数据隐私、无需联网运行以及潜在的更快响应），本文提出了 Texture3dgs，即一种针对移动GPU优化的3DGS映射方案。在该领域面临的一个关键挑战是如何优化二维（2D）纹理缓存的使用，以实现移动GPU上的快速执行。由于排序方法在移动平台上的3DGS计算中占据主导地位，Texture3dgs 的核心是一个新颖的排序算法，在处理过程、数据移动和数据布局方面针对二维内存进行了高度优化。该算法的性能通过纹理缓存的成本模型进行了分析。此外，我们还通过改进变量布局设计以及其他优化手段加速了3DGS算法的其他步骤。端到端的评估结果表明，Texture3dgs 在排序和整体三维场景重建上分别实现了最高 4.1 倍和 1.7 倍的加速，同时内存使用最多减少 1.6 倍，展示了我们设计在高效移动端三维场景重建方面的有效性。\n"
  },
  {
    "path": "abs/2511.16542.md",
    "content": "### EOGS++: Earth Observation Gaussian Splatting with Internal Camera Refinement and Direct Panchromatic Rendering\n\nRecently, 3D Gaussian Splatting has been introduced as a compelling alternative to NeRF for Earth observation, offering com- petitive reconstruction quality with significantly reduced training times. In this work, we extend the Earth Observation Gaussian Splatting (EOGS) framework to propose EOGS++, a novel method tailored for satellite imagery that directly operates on raw high-resolution panchromatic data without requiring external preprocessing. Furthermore, leveraging optical flow techniques we embed bundle adjustment directly within the training process, avoiding reliance on external optimization tools while improving camera pose estimation. We also introduce several improvements to the original implementation, including early stopping and TSDF post-processing, all contributing to sharper reconstructions and better geometric accuracy. Experiments on the IARPA 2016 and DFC2019 datasets demonstrate that EOGS++ achieves state-of-the-art performance in terms of reconstruction quality and effi- ciency, outperforming the original EOGS method and other NeRF-based methods while maintaining the computational advantages of Gaussian Splatting. Our model demonstrates an improvement from 1.33 to 1.19 mean MAE errors on buildings compared to the original EOGS models\n\n近年来，三维高斯溅射（3D Gaussian Splatting）被提出作为NeRF在地球观测任务中的有力替代方案，在显著减少训练时间的同时提供了具有竞争力的重建质量。在本研究中，我们在原有的地球观测高斯溅射（EOGS）框架基础上提出了EOGS++，这是一种专为卫星影像设计的新方法，可直接在高分辨率全色原始数据上运行，无需额外的预处理步骤。此外，我们利用光流技术将束束调整（bundle adjustment）直接嵌入训练流程中，摆脱了对外部优化工具的依赖，同时提升了相机位姿估计的精度。我们还对原始实现进行了多项改进，包括早停机制和TSDF后处理，这些都共同提升了重建的清晰度和几何精度。在IARPA 2016 和 DFC2019 数据集上的实验表明，EOGS++在重建质量和效率方面均达到了当前最优性能，不仅优于原始EOGS方法，也超过了其他基于NeRF的方法，同时仍保留了高斯溅射在计算效率上的优势。与原始EOGS模型相比，我们的模型在建筑物的平均MAE误差上从1.33下降至1.19，进一步验证了其有效性。\n"
  },
  {
    "path": "abs/2511.16642.md",
    "content": "### TRIM: Scalable 3D Gaussian Diffusion Inference with Temporal and Spatial Trimming\n\nRecent advances in 3D Gaussian diffusion models suffer from time-intensive denoising and post-denoising processing due to the massive number of Gaussian primitives, resulting in slow generation and limited scalability along sampling trajectories. To improve the efficiency of 3D diffusion models, we propose TRIM (Trajectory Reduction and Instance Mask denoising), a post-training approach that incorporates both temporal and spatial trimming strategies, to accelerate inference without compromising output quality while supporting the inference-time scaling for Gaussian diffusion models. Instead of scaling denoising trajectories in a costly end-to-end manner, we develop a lightweight selector model to evaluate latent Gaussian primitives derived from multiple sampled noises, enabling early trajectory reduction by selecting candidates with high-quality potential. Furthermore, we introduce instance mask denoising to prune learnable Gaussian primitives by filtering out redundant background regions, reducing inference computation at each denoising step. Extensive experiments and analysis demonstrate that TRIM significantly improves both the efficiency and quality of 3D generation.\n\n近年来，三维高斯扩散模型在去噪及其后处理阶段由于涉及大量高斯基元，导致处理耗时严重，从而在生成速度和采样路径的可扩展性方面受到限制。为提升三维扩散模型的推理效率，我们提出了TRIM（Trajectory Reduction and Instance Mask denoising），这是一种后训练阶段的方法，融合了时间与空间两个维度的裁剪策略，在不牺牲输出质量的前提下加速推理，并支持高斯扩散模型的推理时缩放能力。与代价高昂的端到端扩展去噪路径方法不同，我们设计了一个轻量级选择器模型，用于评估从多组采样噪声中生成的潜在高斯基元，从而在早期阶段就进行路径裁剪，优先选择具有高质量潜力的候选项。此外，我们引入了实例掩码去噪机制，在去除冗余背景区域的同时，对可学习的高斯基元进行裁剪，从而在每一步去噪过程中降低计算负担。大量实验和分析表明，TRIM 显著提升了三维生成的效率与质量。\n"
  },
  {
    "path": "abs/2511.16831.md",
    "content": "### Vorion: A RISC-V GPU with Hardware-Accelerated 3D Gaussian Rendering and Training\n\n3D Gaussian Splatting (3DGS) has recently emerged as a foundational technique for real-time neural rendering, 3D scene generation, volumetric video (4D) capture. However, its rendering and training impose massive computation, making real-time rendering on edge devices and real-time 4D reconstruction on workstations currently infeasible. Given its fixed-function nature and similarity with traditional rasterization, 3DGS presents a strong case for dedicated hardware in the graphics pipeline of next-generation GPUs. This work, Vorion, presents the first GPGPU prototype with hardware-accelerated 3DGS rendering and training. Vorion features scalable architecture, minimal hardware change to traditional rasterizers, z-tiling to increase parallelism, and Gaussian/pixel-centric hybrid dataflow. We prototype the minimal system (8 SIMT cores, 2 Gaussian rasterizer) using TSMC 16nm FinFET technology, which achieves 19 FPS for rendering. The scaled design with 16 rasterizers achieves 38.6 iterations/s for training.\n\n三维高斯溅射（3D Gaussian Splatting, 3DGS）近年来已成为实时神经渲染、三维场景生成以及体积视频（4D）捕获的基础性技术。然而，3DGS 的渲染与训练过程计算量极大，导致目前在边缘设备上实现实时渲染，以及在工作站上实现实时4D重建仍不可行。鉴于其固定功能特性以及与传统光栅化方法的相似性，3DGS 极有潜力成为下一代 GPU 图形渲染管线中专用硬件的候选对象。本工作提出 Vorion，这是首个支持 3DGS 渲染与训练的硬件加速 GPGPU 原型系统。Vorion 采用可扩展架构，对传统光栅器仅需最小硬件改动，引入 z-tiling 技术以提升并行度，并设计了高斯/像素混合的数据流方式。我们基于台积电 16nm FinFET 工艺实现了最小系统原型（8 个 SIMT 核心，2 个高斯光栅器），在渲染任务中可达 19 FPS；在训练任务中，扩展为 16 个光栅器的设计实现了每秒 38.6 次迭代的性能。\n"
  },
  {
    "path": "abs/2511.16966.md",
    "content": "### One Walk is All You Need: Data-Efficient 3D RF Scene Reconstruction with Human Movements\n\nReconstructing 3D Radiance Field (RF) scenes through opaque obstacles is a long-standing goal, yet it is fundamentally constrained by a laborious data acquisition process requiring thousands of static measurements, which treats human motion as noise to be filtered. This work introduces a new paradigm with a core objective: to perform fast, data-efficient, and high-fidelity RF reconstruction of occluded 3D static scenes, using only a single, brief human walk. We argue that this unstructured motion is not noise, but is in fact an information-rich signal available for reconstruction. To achieve this, we design a factorization framework based on composite 3D Gaussian Splatting (3DGS) that learns to model the dynamic effects of human motion from the persistent static scene geometry within a raw RF stream. Trained on just a single 60-second casual walk, our model reconstructs the full static scene with a Structural Similarity Index (SSIM) of 0.96, remarkably outperforming heavily-sampled state-of-the-art (SOTA) by 12%. By transforming the human movements into its valuable signals, our method eliminates the data acquisition bottleneck and paves the way for on-the-fly 3D RF mapping of unseen environments.\n\n通过不透明障碍物重建三维辐射场（Radiance Field, RF）场景一直是一个长期目标，但该任务本质上受到繁重的数据采集流程的限制，通常需要成千上万次静态测量，并将人类运动视为需要过滤的噪声。本文提出了一种新的范式，核心目标是仅利用一次短暂的人类行走，实现对被遮挡三维静态场景的快速、高效且高保真的RF重建。我们认为这种非结构化运动并非噪声，反而是可用于重建的高信息量信号。为此，我们设计了一个基于复合三维高斯溅射（3DGS）的因式分解框架，该框架能够从原始RF流中持续存在的静态场景几何中学习人类运动的动态效应。仅通过一次 60 秒的随意行走训练后，我们的模型即可重建完整静态场景，结构相似度指数（SSIM）高达 0.96，显著超过依赖密集采样的最先进方法（SOTA）12%。通过将人类动作转化为有价值的信号，我们的方法消除了数据采集瓶颈，为未知环境的即时三维RF建图开辟了新路径。\n"
  },
  {
    "path": "abs/2511.16980.md",
    "content": "### Gradient-Driven Natural Selection for Compact 3D Gaussian Splatting\n\n3DGS employs a large number of Gaussian primitives to fit scenes, resulting in substantial storage and computational overhead. Existing pruning methods rely on manually designed criteria or introduce additional learnable parameters, yielding suboptimal results. To address this, we propose an natural selection inspired pruning framework that models survival pressure as a regularization gradient field applied to opacity, allowing the optimization gradients--driven by the goal of maximizing rendering quality--to autonomously determine which Gaussians to retain or prune. This process is fully learnable and requires no human intervention. We further introduce an opacity decay technique with a finite opacity prior, which accelerates the selection process without compromising pruning effectiveness. Compared to 3DGS, our method achieves over 0.6 dB PSNR gain under 15% budgets, establishing state-of-the-art performance for compact 3DGS.\n\n3DGS 使用大量高斯基元对场景进行拟合，导致存储和计算开销巨大。现有的剪枝方法依赖于人工设计的标准或引入额外的可学习参数，通常难以取得最优效果。为了解决这一问题，我们提出了一种受自然选择启发的剪枝框架，将“生存压力”建模为作用于不透明度的正则化梯度场，使得由优化目标（即最大化渲染质量）驱动的梯度能够自主决定哪些高斯应被保留、哪些应被剪除。该过程完全可学习，无需人工干预。我们进一步引入了带有限不透明度先验的“不透明度衰减”技术，在不影响剪枝效果的前提下加快了选择过程。与原始 3DGS 方法相比，在 15% 的预算下，我们的方法实现了超过 0.6 dB 的 PSNR 提升，在紧凑型 3DGS 任务上达到了当前最优性能。\n"
  },
  {
    "path": "abs/2511.16988.md",
    "content": "### PhysMorph-GS: Differentiable Shape Morphing via Joint Optimization of Physics and Rendering Objectives\n\nShape morphing with physics-based simulation naturally supports large deformations and topology changes, but existing methods suffer from a \"rendering gap\": nondifferentiable surface extraction prevents image losses from directly guiding physics optimization. We introduce PhysMorph-GS, which couples a differentiable material point method (MPM) with 3D Gaussian splatting through a deformation-aware upsampling bridge that maps sparse particle states (x, F) to dense Gaussians (mu, Sigma). Multi-modal rendering losses on silhouette and depth backpropagate along two paths, from covariances to deformation gradients via a stretch-based mapping and from Gaussian means to particle positions. Through the MPM adjoint, these gradients update deformation controls while mass is conserved at a compact set of anchor particles. A multi-pass interleaved optimization scheme repeatedly injects rendering gradients into successive physics steps, avoiding collapse to purely physics-driven solutions. On challenging morphing sequences, PhysMorph-GS improves boundary fidelity and temporal stability over a differentiable MPM baseline and better reconstructs thin structures such as ears and tails. Quantitatively, our depth-supervised variant reduces Chamfer distance by about 2.5 percent relative to the physics-only baseline. By providing a differentiable particle-to-Gaussian bridge, PhysMorph-GS closes a key gap in physics-aware rendering pipelines and enables inverse design directly from image-space supervision.\n\n基于物理的形变模拟天然支持大范围变形和拓扑结构变化，但现有方法存在“渲染鸿沟”：不可微的表面提取过程阻碍了图像损失对物理优化的直接引导。我们提出了 PhysMorph-GS，它将可微分的质点法（MPM）与三维高斯溅射（3DGS）耦合，通过一个感知形变的上采样桥梁将稀疏的粒子状态（x, F）映射为稠密的高斯参数（μ, Σ）。轮廓和深度的多模态渲染损失通过两条路径反向传播：从协方差到形变梯度的路径基于拉伸映射；从高斯均值到粒子位置的路径直接连接形变几何。这些梯度通过MPM的伴随方法更新形变控制，同时质量在一组紧凑的锚点粒子上保持守恒。我们引入了一个多次迭代、交替优化的策略，将渲染梯度不断注入后续的物理步骤中，避免最终收敛为纯物理主导的解决方案。在具有挑战性的形变序列任务中，PhysMorph-GS 相比于可微 MPM 基线在边界保真度和时间稳定性方面有显著提升，尤其能更好地重建诸如耳朵、尾巴等细长结构。定量实验显示，我们的深度监督变体在 Chamfer 距离指标上相较物理基线降低了约 2.5%。通过构建可微的粒子到高斯的桥梁，PhysMorph-GS 弥合了物理感知渲染管线中的关键鸿沟，使得从图像空间监督中直接实现逆向设计成为可能。\n"
  },
  {
    "path": "abs/2511.17048.md",
    "content": "### RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation\n\nIn this paper, we propose RoomPlanner, the first fully automatic 3D room generation framework for painlessly creating realistic indoor scenes with only short text as input. Without any manual layout design or panoramic image guidance, our framework can generate explicit layout criteria for rational spatial placement. We begin by introducing a hierarchical structure of language-driven agent planners that can automatically parse short and ambiguous prompts into detailed scene descriptions. These descriptions include raw spatial and semantic attributes for each object and the background, which are then used to initialize 3D point clouds. To position objects within bounded environments, we implement two arrangement constraints that iteratively optimize spatial arrangements, ensuring a collision-free and accessible layout solution. In the final rendering stage, we propose a novel AnyReach Sampling strategy for camera trajectory, along with the Interval Timestep Flow Sampling (ITFS) strategy, to efficiently optimize the coarse 3D Gaussian scene representation. These approaches help reduce the total generation time to under 30 minutes. Extensive experiments demonstrate that our method can produce geometrically rational 3D indoor scenes, surpassing prior approaches in both rendering speed and visual quality while preserving editability.\n\n基于物理的形变模拟天然支持大范围变形和拓扑结构变化，但现有方法存在“渲染鸿沟”：不可微的表面提取过程阻碍了图像损失对物理优化的直接引导。我们提出了 PhysMorph-GS，它将可微分的质点法（MPM）与三维高斯溅射（3DGS）耦合，通过一个感知形变的上采样桥梁将稀疏的粒子状态（x, F）映射为稠密的高斯参数（μ, Σ）。轮廓和深度的多模态渲染损失通过两条路径反向传播：从协方差到形变梯度的路径基于拉伸映射；从高斯均值到粒子位置的路径直接连接形变几何。这些梯度通过MPM的伴随方法更新形变控制，同时质量在一组紧凑的锚点粒子上保持守恒。我们引入了一个多次迭代、交替优化的策略，将渲染梯度不断注入后续的物理步骤中，避免最终收敛为纯物理主导的解决方案。在具有挑战性的形变序列任务中，PhysMorph-GS 相比于可微 MPM 基线在边界保真度和时间稳定性方面有显著提升，尤其能更好地重建诸如耳朵、尾巴等细长结构。定量实验显示，我们的深度监督变体在 Chamfer 距离指标上相较物理基线降低了约 2.5%。通过构建可微的粒子到高斯的桥梁，PhysMorph-GS 弥合了物理感知渲染管线中的关键鸿沟，使得从图像空间监督中直接实现逆向设计成为可能。\n"
  },
  {
    "path": "abs/2511.17092.md",
    "content": "### SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting\n\nArticulated objects are ubiquitous in daily environments, and their 3D reconstruction holds great significance across various fields. However, existing articulated object reconstruction methods typically require costly inputs such as multi-stage and multi-view observations. To address the limitations, we propose a category-agnostic articulated object reconstruction framework via planar Gaussian Splatting, which only uses sparse-view RGB images from a single state. Specifically, we first introduce a Gaussian information field to perceive the optimal sparse viewpoints from candidate camera poses. Then we compress 3D Gaussians into planar Gaussians to facilitate accurate estimation of normal and depth. The planar Gaussians are optimized in a coarse-to-fine manner through depth smooth regularization and few-shot diffusion. Moreover, we introduce a part segmentation probability for each Gaussian primitive and update them by back-projecting part segmentation masks of renderings. Extensive experimental results demonstrate that our method achieves higher-fidelity part-level surface reconstruction on both synthetic and real-world data than existing methods.\n\n关节物体在日常环境中无处不在，其三维重建在多个领域具有重要意义。然而，现有的关节物体重建方法通常依赖代价高昂的输入，如多阶段和多视角观测。为解决这一局限，我们提出了一种基于平面高斯溅射的类别无关关节物体重建框架，仅使用单一状态下的稀疏视角RGB图像。具体而言，我们首先引入高斯信息场，用于从候选相机姿态中感知最优的稀疏视角；随后将三维高斯压缩为平面高斯，以促进法线与深度的准确估计。平面高斯通过深度平滑正则项与少量扩散优化策略，以粗到细的方式进行优化。此外，我们为每个高斯基元引入部件分割概率，并通过反投影渲染结果的分割掩码对其进行更新。大量实验证明，我们的方法在合成数据与真实数据上均实现了更高保真的部件级表面重建，优于现有方法。\n"
  },
  {
    "path": "abs/2511.17116.md",
    "content": "### PEGS: Physics-Event Enhanced Large Spatiotemporal Motion Reconstruction via 3D Gaussian Splatting\n\nReconstruction of rigid motion over large spatiotemporal scales remains a challenging task due to limitations in modeling paradigms, severe motion blur, and insufficient physical consistency. In this work, we propose PEGS, a framework that integrates Physical priors with Event stream enhancement within a 3D Gaussian Splatting pipeline to perform deblurred target-focused modeling and motion recovery. We introduce a cohesive triple-level supervision scheme that enforces physical plausibility via an acceleration constraint, leverages event streams for high-temporal resolution guidance, and employs a Kalman regularizer to fuse multi-source observations. Furthermore, we design a motion-aware simulated annealing strategy that adaptively schedules the training process based on real-time kinematic states. We also contribute the first RGB-Event paired dataset targeting natural, fast rigid motion across diverse scenarios. Experiments show PEGS's superior performance in reconstructing motion over large spatiotemporal scales compared to mainstream dynamic methods.\n\n在大时空尺度下重建刚体运动依然面临巨大挑战，主要原因包括建模范式的局限性、严重的运动模糊以及物理一致性不足。为应对这些问题，我们提出了 PEGS 框架，该方法在三维高斯溅射（3DGS）管线中融合了物理先验与事件流增强，用于实现去模糊的目标聚焦建模与运动恢复。我们引入了一个统一的三层监督机制：通过加速度约束确保物理合理性、利用事件流提供高时间分辨率的引导、并使用卡尔曼正则器融合多源观测信息。此外，我们设计了一种具备运动感知能力的模拟退火策略，根据实时运动状态自适应调度训练过程。我们还首次构建了一个针对自然快速刚体运动的 RGB-事件对齐数据集，涵盖多样化场景。实验表明，PEGS 在大时空尺度的运动重建方面显著优于主流动态方法。\n"
  },
  {
    "path": "abs/2511.17207.md",
    "content": "### SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors\n\nRecent advances in dense 3D reconstruction enable the accurate capture of local geometry; however, integrating them into SLAM is challenging due to drift and redundant point maps, which limit efficiency and downstream tasks, such as novel view synthesis. To address these issues, we propose SING3R-SLAM, a globally consistent and compact Gaussian-based dense RGB SLAM framework. The key idea is to combine locally consistent 3D reconstructions with a unified global Gaussian representation that jointly refines scene geometry and camera poses, enabling efficient and versatile 3D mapping for multiple downstream applications. SING3R-SLAM first builds locally consistent submaps through our lightweight tracking and reconstruction module, and then progressively aligns and fuses them into a global Gaussian map that enforces cross-view geometric consistency. This global map, in turn, provides feedback to correct local drift and enhance the robustness of tracking. Extensive experiments demonstrate that SING3R-SLAM achieves state-of-the-art tracking, 3D reconstruction, and novel view rendering, resulting in over 12% improvement in tracking and producing finer, more detailed geometry, all while maintaining a compact and memory-efficient global representation on real-world datasets.\n\n近年来，稠密三维重建技术取得了显著进展，使得局部几何的高精度捕捉成为可能；然而，将其整合进SLAM系统仍面临挑战，主要包括位姿漂移和冗余点图，这些问题限制了系统的效率以及下游任务（如新视角合成）的性能表现。为了解决这些问题，我们提出了 SING3R-SLAM，这是一种具备全局一致性和紧凑表达的基于高斯分布的稠密RGB SLAM框架。其核心思想是将局部一致的三维重建结果与统一的全局高斯表示相结合，在联合优化场景几何和相机位姿的同时，实现高效且灵活的三维建图，适用于多种下游应用。SING3R-SLAM 首先通过轻量级的跟踪与重建模块构建局部一致的子图，然后逐步将其对齐并融合到一个全局高斯地图中，以强制跨视角几何一致性。反过来，该全局地图还能反馈用于修正局部漂移并增强跟踪的鲁棒性。大量实验表明，SING3R-SLAM 在跟踪精度、三维重建质量及新视角合成方面均达到了当前最优水平，跟踪性能提升超过12%，几何细节更精致丰富，同时保持了全局地图的紧凑性与内存效率，在真实世界数据集上展现了优异表现。\n"
  },
  {
    "path": "abs/2511.17210.md",
    "content": "### FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception\n\nAccurate BEV semantic segmentation from fisheye imagery remains challenging due to extreme non-linear distortion, occlusion, and depth ambiguity inherent to wide-angle projections. We present a distortion-aware BEV segmentation framework that directly processes multi-camera high-resolution fisheye images,utilizing calibrated geometric unprojection and per-pixel depth distribution estimation. Each image pixel is lifted into 3D space via Gaussian parameterization, predicting spatial means and anisotropic covariances to explicitly model geometric uncertainty. The projected 3D Gaussians are fused into a BEV representation via differentiable splatting, producing continuous, uncertainty-aware semantic maps without requiring undistortion or perspective rectification. Extensive experiments demonstrate strong segmentation performance on complex parking and urban driving scenarios, achieving IoU scores of 87.75% for drivable regions and 57.26% for vehicles under severe fisheye distortion and diverse environmental conditions.\n\n由于广角投影固有的非线性畸变、遮挡和深度模糊，从鱼眼图像中实现高精度的鸟瞰图（BEV）语义分割仍然是一项挑战。我们提出了一种畸变感知的BEV分割框架，能够直接处理多摄像头高分辨率鱼眼图像，结合标定后的几何反投影与逐像素深度分布估计。具体地，框架通过高斯参数化将每个图像像素提升到三维空间，预测空间均值与各向异性的协方差，以显式建模几何不确定性。投影后的三维高斯通过可微溅射融合到BEV表示中，生成连续的、不确定性感知的语义图，无需执行去畸变或透视校正。大量实验表明，该方法在复杂的停车场与城市驾驶场景中表现优异，在严重鱼眼畸变和多样环境条件下，分别在可通行区域与车辆类别上达到了87.75%和57.26%的IoU得分。\n"
  },
  {
    "path": "abs/2511.17747.md",
    "content": "### AEGIS: Preserving privacy of 3D Facial Avatars with Adversarial Perturbations\n\nThe growing adoption of photorealistic 3D facial avatars, particularly those utilizing efficient 3D Gaussian Splatting representations, introduces new risks of online identity theft, especially in systems that rely on biometric authentication. While effective adversarial masking methods have been developed for 2D images, a significant gap remains in achieving robust, viewpoint-consistent identity protection for dynamic 3D avatars. To address this, we present AEGIS, the first privacy-preserving identity masking framework for 3D Gaussian Avatars that maintains the subject's perceived characteristics. Our method aims to conceal identity-related facial features while preserving the avatar's perceptual realism and functional integrity. AEGIS applies adversarial perturbations to the Gaussian color coefficients, guided by a pre-trained face verification network, ensuring consistent protection across multiple viewpoints without retraining or modifying the avatar's geometry. AEGIS achieves complete de-identification, reducing face retrieval and verification accuracy to 0%, while maintaining high perceptual quality (SSIM = 0.9555, PSNR = 35.52 dB). It also preserves key facial attributes such as age, race, gender, and emotion, demonstrating strong privacy protection with minimal visual distortion.\n\n真实感三维人脸头像的广泛应用，尤其是那些采用高效三维高斯溅射（3DGS）表示的方法，正在带来新的在线身份盗用风险，特别是在依赖生物识别身份验证的系统中。尽管已有针对二维图像的对抗遮蔽方法取得良好效果，但在动态三维头像中实现稳健且视角一致的身份保护仍存在明显空白。为此，我们提出了 AEGIS，这是首个面向三维高斯头像的隐私保护身份遮蔽框架，同时保持人物感知特征的一致性。该方法旨在隐藏与身份相关的面部特征，同时保留头像的感知真实感与功能完整性。AEGIS 在不修改头像几何结构或重新训练模型的前提下，通过对高斯颜色系数施加对抗扰动，并借助预训练的人脸验证网络进行引导，实现在多视角下的一致性身份保护。AEGIS 可实现完全去身份化，使人脸检索与验证准确率降为 0%，同时保持较高的感知质量（SSIM = 0.9555，PSNR = 35.52 dB）。此外，它还能保留关键面部属性，如年龄、种族、性别和情感状态，展现出在几乎无可见失真的情况下的强隐私保护能力。\n"
  },
  {
    "path": "abs/2511.17904.md",
    "content": "### CUS-GS: A Compact Unified Structured Gaussian Splatting Framework for Multimodal Scene Representation\n\nRecent advances in Gaussian Splatting based 3D scene representation have shown two major trends: semantics-oriented approaches that focus on high-level understanding but lack explicit 3D geometry modeling, and structure-oriented approaches that capture spatial structures yet provide limited semantic abstraction. To bridge this gap, we present CUS-GS, a compact unified structured Gaussian Splatting representation, which connects multimodal semantic features with structured 3D geometry. Specifically, we design a voxelized anchor structure that constructs a spatial scaffold, while extracting multimodal semantic features from a set of foundation models (e.g., CLIP, DINOv2, SEEM). Moreover, we introduce a multimodal latent feature allocation mechanism to unify appearance, geometry, and semantics across heterogeneous feature spaces, ensuring a consistent representation across multiple foundation models. Finally, we propose a feature-aware significance evaluation strategy to dynamically guide anchor growing and pruning, effectively removing redundant or invalid anchors while maintaining semantic integrity. Extensive experiments show that CUS-GS achieves competitive performance compared to state-of-the-art methods using as few as 6M parameters - an order of magnitude smaller than the closest rival at 35M - highlighting the excellent trade off between performance and model efficiency of the proposed framework.\n\n近年来，基于高斯溅射的三维场景表示研究呈现出两大趋势：一类是以语义为导向的方法，侧重高层次理解但缺乏显式的三维几何建模；另一类则是以结构为导向的方法，注重空间结构捕捉但语义抽象能力有限。为弥合这一差距，我们提出了 CUS-GS（紧凑统一结构化高斯溅射表示），该方法将多模态语义特征与结构化三维几何进行融合。具体而言，我们设计了一种体素化锚点结构，作为空间骨架，并从一系列基础模型（如 CLIP、DINOv2、SEEM）中提取多模态语义特征。同时，我们引入了一种多模态潜特征分配机制，实现对外观、几何和语义在异构特征空间中的统一编码，确保跨模型表示的一致性。最后，我们提出了一种感知特征重要性的锚点评估策略，可动态引导锚点的生长与裁剪，有效剔除冗余或无效锚点，同时保持语义完整性。大量实验结果表明，CUS-GS 在仅使用 600 万参数的情况下即可达到与最先进方法相媲美的性能——相比之下，最接近的对比方法需使用 3500 万参数，凸显了该框架在性能与模型效率之间的卓越平衡。\n"
  },
  {
    "path": "abs/2511.17918.md",
    "content": "### Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization\n\nDespite 3D Gaussian Splatting (3DGS) excelling in most configurations, it lacks generalization across novel viewpoints in a few-shot scenario because it overfits to the sparse observations. We revisit 3DGS optimization from a machine learning perspective, framing novel view synthesis as a generalization problem to unseen viewpoints-an underexplored direction. We propose Frequency-Adaptive Sharpness Regularization (FASR), which reformulates the 3DGS training objective, thereby guiding 3DGS to converge toward a better generalization solution. Although Sharpness-Aware Minimization (SAM) similarly reduces the sharpness of the loss landscape to improve generalization of classification models, directly employing it to 3DGS is suboptimal due to the discrepancy between the tasks. Specifically, it hinders reconstructing high-frequency details due to excessive regularization, while reducing its strength leads to under-penalizing sharpness. To address this, we reflect the local frequency of images to set the regularization weight and the neighborhood radius when estimating the local sharpness. It prevents floater artifacts in novel viewpoints and reconstructs fine details that SAM tends to oversmooth. Across datasets with various configurations, our method consistently improves a wide range of baselines.\n\n尽管三维高斯溅射（3DGS）在大多数配置中表现出色，但在小样本场景下对新视角的泛化能力不足，因为它容易对稀疏观测结果过拟合。我们从机器学习的视角重新审视3DGS的优化过程，将新视角合成问题建模为一个对未见视角的泛化问题——这是一个尚未被充分探索的方向。为此，我们提出了频率自适应锐度正则化（FASR），重新构建3DGS的训练目标，引导其收敛至更具泛化能力的解。虽然锐度感知最小化（SAM）也通过降低损失函数的锐度提升分类模型的泛化能力，但将其直接应用于3DGS是不理想的，因为两者任务差异显著。具体而言，SAM会由于正则过强而抑制高频细节重建，而降低正则强度又会导致对锐度惩罚不足。为解决这一问题，我们通过图像的局部频率信息动态设定正则权重和局部锐度估计的邻域半径。这种方法有效避免了新视角下的浮动伪影，同时保留了SAM常常过度平滑的细节。在多个数据集与不同配置下，FASR均显著提升了各类基线方法的表现。\n"
  },
  {
    "path": "abs/2511.17932.md",
    "content": "### Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion\n\nGiven just a few glimpses of a scene, can you imagine the movie playing out as the camera glides through it? That's the lens we take on sparse-input novel view synthesis, not only as filling spatial gaps between widely spaced views, but also as completing a natural video unfolding through space.\nWe recast the task as test-time natural video completion, using powerful priors from pretrained video diffusion models to hallucinate plausible in-between views. Our zero-shot, generation-guided framework produces pseudo views at novel camera poses, modulated by an uncertainty-aware mechanism for spatial coherence. These synthesized frames densify supervision for 3D Gaussian Splatting (3D-GS) for scene reconstruction, especially in under-observed regions. An iterative feedback loop lets 3D geometry and 2D view synthesis inform each other, improving both the scene reconstruction and the generated views.\nThe result is coherent, high-fidelity renderings from sparse inputs without any scene-specific training or fine-tuning. On LLFF, DTU, DL3DV, and MipNeRF-360, our method significantly outperforms strong 3D-GS baselines under extreme sparsity.\n\n仅凭对场景的一些片段性观察，你能想象摄像机穿梭其中所呈现出的完整电影画面吗？这正是我们看待稀疏输入的新视角合成任务的方式：不仅是填补稀疏视角之间的空间空白，更是重建一段自然流动的视频。我们将该任务重新定义为测试时的自然视频补全，利用预训练视频扩散模型中蕴含的强大先验，以生成可信的中间视角画面。我们提出了一种零样本的生成引导框架，可在新颖视角下合成伪视图，并通过不确定性感知机制调控空间一致性。这些合成帧用于增强3D高斯溅射（3D-GS）场景重建中的监督信号，尤其在观测稀缺区域表现显著。我们还引入了一个迭代反馈机制，使三维几何与二维视图合成相互作用，协同提升场景重建与视图生成质量。最终，系统在无任何特定场景训练或微调的前提下，便能从极少量输入中生成连贯、高保真的渲染结果。在 LLFF、DTU、DL3DV 和 MipNeRF-360 数据集上，我们的方法在极度稀疏场景下显著优于现有强大的 3D-GS 基线方法。\n"
  },
  {
    "path": "abs/2511.17961.md",
    "content": "### RoboArmGS: High-Quality Robotic Arm Splatting via Bézier Curve Refinement\n\nConstructing photorealistic and controllable robotic arm digital assets from real observations is fundamental to robotic applications. Current approaches naively bind static 3D Gaussians according to URDF links, forcing them to follow an URDF-rigged motion passively. However, the idealized URDF-rigged motion cannot accurately model the actual motion captured in real-world observations, leading to severe rendering artifacts in 3D Gaussians. To address these challenges, we propose RoboArmGS, a novel hybrid representation that refines the URDF-rigged motion with learnable Bézier curves, enabling more accurate real-world motion modeling. To be more specific, we present a learnable Bézier Curve motion refiner that corrects per-joint residuals to address mismatches between real-world motion and URDF-rigged motion. RoboArmGS enables the learning of more accurate real-world motion while achieving a coherent binding of 3D Gaussians across arm parts. To support future research, we contribute a carefully collected dataset named RoboArm4D, which comprises several widely used robotic arms for evaluating the quality of building high-quality digital assets. We evaluate our approach on RoboArm4D, and RoboArmGS achieves state-of-the-art performance in real-world motion modeling and rendering quality.\n\n从真实观测中构建具备真实感且可控的机器人手臂数字资产是机器人应用中的关键问题。当前方法通常将静态三维高斯基元按 URDF 链接进行绑定，使其被动地跟随 URDF 骨架驱动的运动轨迹。然而，这种理想化的 URDF 骨架运动无法精确拟合真实世界中的实际运动，从而在三维高斯渲染中引发严重伪影。为了解决这一问题，我们提出了 RoboArmGS，一种新颖的混合式表示方法，利用可学习的贝塞尔曲线对 URDF 骨架运动进行细化，从而实现更准确的真实运动建模。具体而言，我们设计了一个可学习的贝塞尔曲线运动优化器，用于修正各关节的残差偏差，从而弥合实际观测运动与 URDF 运动之间的不一致。RoboArmGS 不仅能够学习更符合真实世界的运动轨迹，还能实现机器人手臂各部分高斯基元之间的连贯绑定。为支持后续研究，我们还构建并发布了 RoboArm4D 数据集，涵盖多个主流机器人手臂模型，用于评估高质量数字资产的构建效果。我们在 RoboArm4D 上进行了评估，结果显示 RoboArmGS 在真实运动建模与渲染质量方面均达到了当前最优水平。\n"
  },
  {
    "path": "abs/2511.18140.md",
    "content": "### Observer Actor: Active Vision Imitation Learning with Sparse View Gaussian Splatting\n\nWe propose Observer Actor (ObAct), a novel framework for active vision imitation learning in which the observer moves to optimal visual observations for the actor. We study ObAct on a dual-arm robotic system equipped with wrist-mounted cameras. At test time, ObAct dynamically assigns observer and actor roles: the observer arm constructs a 3D Gaussian Splatting (3DGS) representation from three images, virtually explores this to find an optimal camera pose, then moves to this pose; the actor arm then executes a policy using the observer's observations. This formulation enhances the clarity and visibility of both the object and the gripper in the policy's observations. As a result, we enable the training of ambidextrous policies on observations that remain closer to the occlusion-free training distribution, leading to more robust policies. We study this formulation with two existing imitation learning methods -- trajectory transfer and behavior cloning -- and experiments show that ObAct significantly outperforms static-camera setups: trajectory transfer improves by 145% without occlusion and 233% with occlusion, while behavior cloning improves by 75% and 143%, respectively.\n\n我们提出了 Observer Actor（ObAct），一个用于主动视觉模仿学习的新颖框架，其中观察者会主动移动到最佳视角为执行者提供观察信息。我们在配备腕部摄像头的双臂机器人系统上研究了 ObAct。在测试阶段，ObAct 会动态分配观察者与执行者的角色：观察者手臂通过三张图像构建三维高斯溅射（3DGS）表示，随后在虚拟空间中探索最优相机位姿并移动至该位置；执行者手臂则基于观察者提供的视角执行策略。该机制提高了策略观察中物体与机械夹爪的清晰度与可见性。因此，我们能够在更接近于无遮挡训练分布的观察下训练双臂通用策略，从而提升策略的鲁棒性。我们在两个现有模仿学习方法上验证了该框架——轨迹迁移与行为克隆，实验表明 ObAct 相较静态相机方案取得了显著提升：在无遮挡场景下，轨迹迁移提升 145%，有遮挡时提升 233%；行为克隆分别提升 75% 和 143%。\n"
  },
  {
    "path": "abs/2511.18386.md",
    "content": "### SegSplat: Feed-forward Gaussian Splatting and Open-Set Semantic Segmentation\n\nWe have introduced SegSplat, a novel framework designed to bridge the gap between rapid, feed-forward 3D reconstruction and rich, open-vocabulary semantic understanding. By constructing a compact semantic memory bank from multi-view 2D foundation model features and predicting discrete semantic indices alongside geometric and appearance attributes for each 3D Gaussian in a single pass, SegSplat efficiently imbues scenes with queryable semantics. Our experiments demonstrate that SegSplat achieves geometric fidelity comparable to state-of-the-art feed-forward 3D Gaussian Splatting methods while simultaneously enabling robust open-set semantic segmentation, crucially without requiring any per-scene optimization for semantic feature integration. This work represents a significant step towards practical, on-the-fly generation of semantically aware 3D environments, vital for advancing robotic interaction, augmented reality, and other intelligent systems.\n\n我们提出了 SegSplat，一种旨在弥合快速前馈式三维重建与丰富开放词汇语义理解之间鸿沟的新型框架。该方法通过从多视角图像中提取基础模型的二维语义特征构建紧凑的语义记忆库，并在一次前向传播中为每个三维高斯同时预测离散语义索引、几何属性和外观信息，从而高效地为场景注入可查询的语义信息。实验结果表明，SegSplat 在保持几何重建精度与现有最先进的前馈式三维高斯溅射方法相当的同时，还实现了稳健的开放集语义分割，且整个语义整合过程无需任何针对单场景的优化步骤。这项工作向实用、即时报生成具有语义感知能力的三维环境迈出了关键一步，对推动机器人交互、增强现实及其他智能系统的发展具有重要意义。\n"
  },
  {
    "path": "abs/2511.18441.md",
    "content": "### ReCoGS: Real-time ReColoring for Gaussian Splatting scenes\n\nGaussian Splatting has emerged as a leading method for novel view synthesis, offering superior training efficiency and real-time inference compared to NeRF approaches, while still delivering high-quality reconstructions. Beyond view synthesis, this 3D representation has also been explored for editing tasks. Many existing methods leverage 2D diffusion models to generate multi-view datasets for training, but they often suffer from limitations such as view inconsistencies, lack of fine-grained control, and high computational demand. In this work, we focus specifically on the editing task of recoloring. We introduce a user-friendly pipeline that enables precise selection and recoloring of regions within a pre-trained Gaussian Splatting scene. To demonstrate the real-time performance of our method, we also present an interactive tool that allows users to experiment with the pipeline in practice.\n\n高斯溅射（Gaussian Splatting）已成为新视角合成任务中的领先方法，在保持高质量重建效果的同时，相较于 NeRF 方法展现出更高的训练效率和实时推理能力。除了视角合成，该三维表示还被广泛应用于编辑任务。许多现有方法利用二维扩散模型生成多视角数据集进行训练，但常常存在视角不一致、控制粒度不足以及计算开销大的问题。在本工作中，我们聚焦于重新着色这一编辑任务，提出了一条用户友好的管线，可对预训练的高斯溅射场景中的特定区域进行精确选择与重新着色。为展示方法的实时性能，我们还开发了一个交互式工具，使用户能够在实践中便捷地体验和操作整个编辑流程。\n"
  },
  {
    "path": "abs/2511.18570.md",
    "content": "### PhysGS: Bayesian-Inferred Gaussian Splatting for Physical Property Estimation\n\nUnderstanding physical properties such as friction, stiffness, hardness, and material composition is essential for enabling robots to interact safely and effectively with their surroundings. However, existing 3D reconstruction methods focus on geometry and appearance and cannot infer these underlying physical properties. We present PhysGS, a Bayesian-inferred extension of 3D Gaussian Splatting that estimates dense, per-point physical properties from visual cues and vision--language priors. We formulate property estimation as Bayesian inference over Gaussian splats, where material and property beliefs are iteratively refined as new observations arrive. PhysGS also models aleatoric and epistemic uncertainties, enabling uncertainty-aware object and scene interpretation. Across object-scale (ABO-500), indoor, and outdoor real-world datasets, PhysGS improves accuracy of the mass estimation by up to 22.8%, reduces Shore hardness error by up to 61.2%, and lowers kinetic friction error by up to 18.1% compared to deterministic baselines. Our results demonstrate that PhysGS unifies 3D reconstruction, uncertainty modeling, and physical reasoning in a single, spatially continuous framework for dense physical property estimation.\n\n理解摩擦系数、刚度、硬度以及材料组成等物理属性，对于实现机器人与环境的安全高效交互至关重要。然而，现有的三维重建方法主要聚焦于几何和外观信息，无法推理这些底层物理属性。我们提出 PhysGS，一种基于贝叶斯推理的三维高斯溅射（3D Gaussian Splatting）扩展方法，能够从视觉线索与视觉-语言先验中估计密集的逐点物理属性。我们将属性估计建模为在高斯基元上的贝叶斯推理过程，其中关于材料与物理属性的信念会随着新观测的到来不断更新。PhysGS 同时建模了 Aleatoric（数据固有）与 Epistemic（模型知识）不确定性，从而支持对物体和场景的感知具备不确定性意识。在对象级（ABO-500）、室内与室外真实场景数据集上，PhysGS 将质量估计准确率提升最高达 22.8%，Shore 硬度误差降低最高达 61.2%，动摩擦系数误差降低最高达 18.1%，相较于确定性方法表现显著更优。我们的研究结果表明，PhysGS 在一个统一、空间连续的框架中整合了三维重建、不确定性建模与物理属性推理，实现了高密度物理属性的估计。\n"
  },
  {
    "path": "abs/2511.18600.md",
    "content": "### NeAR: Coupled Neural Asset-Renderer Stack\n\nNeural asset authoring and neural rendering have traditionally evolved as disjoint paradigms: one generates digital assets for fixed graphics pipelines, while the other maps conventional assets to images. However, treating them as independent entities limits the potential for end-to-end optimization in fidelity and consistency. In this paper, we bridge this gap with NeAR, a Coupled Neural Asset--Renderer Stack. We argue that co-designing the asset representation and the renderer creates a robust \"contract\" for superior generation. On the asset side, we introduce the Lighting-Homogenized SLAT (LH-SLAT). Leveraging a rectified-flow model, NeAR lifts casually lit single images into a canonical, illumination-invariant latent space, effectively suppressing baked-in shadows and highlights. On the renderer side, we design a lighting-aware neural decoder tailored to interpret these homogenized latents. Conditioned on HDR environment maps and camera views, it synthesizes relightable 3D Gaussian splats in real-time without per-object optimization. We validate NeAR on four tasks: (1) G-buffer-based forward rendering, (2) random-lit reconstruction, (3) unknown-lit relighting, and (4) novel-view relighting. Extensive experiments demonstrate that our coupled stack outperforms state-of-the-art baselines in both quantitative metrics and perceptual quality. We hope this coupled asset-renderer perspective inspires future graphics stacks that view neural assets and renderers as co-designed components instead of independent entities.\n\n神经资产生成与神经渲染传统上被视为两个独立发展的范式：前者为固定图形管线生成数字资产，后者则将传统资产映射为图像。然而，将二者作为彼此独立的模块处理，限制了在保真度与一致性上的端到端优化潜力。为此，本文提出 NeAR（Coupled Neural Asset--Renderer Stack），一种耦合的神经资产与渲染器框架。我们认为，资产表示与渲染器的协同设计能够构建更稳健的“契约”，从而实现更优的生成质量。在资产端，我们提出了光照归一化的 SLAT（Lighting-Homogenized SLAT，简称 LH-SLAT），利用整流流模型将自然光照下的单张图像提升至一个标准化、对光照不敏感的潜空间，从而有效抑制烘焙阴影与高光。在渲染端，我们设计了一个具备光照感知能力的神经解码器，专用于解析上述归一化潜空间。该解码器结合 HDR 环境贴图与相机视角，能够实时合成可重光照的三维高斯溅射表示，且无需每个物体进行独立优化。我们在四项任务中验证了 NeAR 的有效性：（1）基于 G-buffer 的前向渲染；（2）随机光照下的重建；（3）未知光照下的重光照；（4）新视角下的重光照。大量实验证明，该耦合框架在定量指标与感知质量方面均显著优于当前最先进的基线方法。我们希望这一资产与渲染器协同设计的视角，能启发未来神经图形系统将二者作为共同构建的组成部分，而非彼此孤立的单元。\n"
  },
  {
    "path": "abs/2511.18755.md",
    "content": "### Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing\n\n3D Gaussian splatting (3DGS) has emerged as a promising direction for SLAM due to its high-fidelity reconstruction and rapid convergence. However, 3DGS-SLAM algorithms remain impractical for mobile platforms due to their high computational cost, especially for their tracking process.\nThis work introduces Splatonic, a sparse and efficient real-time 3DGS-SLAM algorithm-hardware co-design for resource-constrained devices. Inspired by classical SLAMs, we propose an adaptive sparse pixel sampling algorithm that reduces the number of rendered pixels by up to 256× while retaining accuracy. To unlock this performance potential on mobile GPUs, we design a novel pixel-based rendering pipeline that improves hardware utilization via Gaussian-parallel rendering and preemptive α-checking. Together, these optimizations yield up to 121.7× speedup on the bottleneck stages and 14.6× end-to-end speedup on off-the-shelf GPUs. To further address new bottlenecks introduced by our rendering pipeline, we propose a pipelined architecture that simplifies the overall design while addressing newly emerged bottlenecks in projection and aggregation. Evaluated across four 3DGS-SLAM algorithms, Splatonic achieves up to 274.9× speedup and 4738.5× energy savings over mobile GPUs and up to 25.2× speedup and 241.1× energy savings over state-of-the-art accelerators, all with comparable accuracy.\n\n三维高斯溅射（3DGS）因其高保真重建与快速收敛特性，已成为构建SLAM系统的一种有前景的方向。然而，由于计算开销巨大，尤其是在跟踪阶段的代价，现有的3DGS-SLAM算法在移动平台上难以实用化。为了解决这一问题，我们提出了 Splatonic——一种面向资源受限设备的稀疏高效实时3DGS-SLAM算法-硬件协同设计方法。受经典SLAM启发，我们提出了一种自适应稀疏像素采样算法，将渲染像素数量最多减少256倍，同时保持重建精度。为充分释放移动GPU上的性能潜力，我们设计了一种基于像素的全新渲染管线，通过高斯并行渲染与前瞻性α检查提高硬件利用率。这些优化带来了瓶颈阶段最高121.7倍加速和整体14.6倍端到端加速（在主流GPU上）。针对该渲染管线引入的新瓶颈，我们进一步提出了一种流水线架构，简化整体设计的同时解决投影与聚合阶段的新瓶颈。在四种3DGS-SLAM算法上进行评估后，Splatonic 在移动GPU上可实现最高274.9倍加速与4738.5倍能耗节省，在先进加速器上也可实现最高25.2倍加速与241.1倍能耗节省，且保持与原始方法相当的准确率。\n"
  },
  {
    "path": "abs/2511.18873.md",
    "content": "### Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction\n\n3D Gaussian Splatting (3DGS) has emerged as a leading approach for high-quality novel view synthesis, with numerous variants extending its applicability to a broad spectrum of 3D and 4D scene reconstruction tasks. Despite its success, the representational capacity of 3DGS remains limited by the use of 3D Gaussian kernels to model local variations. Recent works have proposed to augment 3DGS with additional per-primitive capacity, such as per-splat textures, to enhance its expressiveness. However, these per-splat texture approaches primarily target dense novel view synthesis with a reduced number of Gaussian primitives, and their effectiveness tends to diminish when applied to more general reconstruction scenarios. In this paper, we aim to achieve concrete performance improvement over state-of-the-art 3DGS variants across a wide range of reconstruction tasks, including novel view synthesis, geometry and dynamic reconstruction, under both sparse and dense input settings. To this end, we introduce Neural Texture Splatting (NTS). At the core of our approach is a global neural field (represented as a hybrid of a tri-plane and a neural decoder) that predicts local appearance and geometric fields for each primitive. By leveraging this shared global representation that models local texture fields across primitives, we significantly reduce model size and facilitate efficient global information exchange, demonstrating strong generalization across tasks. Furthermore, our neural modeling of local texture fields introduces expressive view- and time-dependent effects, a critical aspect that existing methods fail to account for. Extensive experiments show that Neural Texture Splatting consistently improves models and achieves state-of-the-art results across multiple benchmarks.\n\n三维高斯溅射（3DGS）已成为高质量新视角合成的领先方法，众多变体不断拓展其在三维和四维场景重建中的应用范围。尽管取得了显著成功，3DGS 的表达能力仍受限于使用三维高斯核对局部变化的建模能力。近期一些研究尝试通过引入每个高斯基元的纹理等增强结构来提升其表达力，但这些方法主要针对的是使用较少高斯基元进行的稠密新视角合成，在更一般性的重建场景中表现往往不佳。本文旨在在更广泛的任务范围内，包括新视角合成、几何重建与动态重建，并同时涵盖稀疏和稠密输入设置，全面超越当前最先进的 3DGS 变体。为此，我们提出了 Neural Texture Splatting（NTS）。其核心思想是在全局神经场中引入混合结构（由 tri-plane 与神经解码器构成），为每个基元预测其局部外观与几何属性。借助这种跨基元共享的全局表示，我们显著减少了模型规模，并实现了高效的全局信息交互，从而展现出良好的任务泛化能力。此外，我们对局部纹理场的神经建模还引入了具备表达力的视角与时间依赖效果，这是现有方法普遍忽略的关键要素。大量实验证明，Neural Texture Splatting 在多个基准任务上持续提升模型性能，达成当前最优水平。\n"
  },
  {
    "path": "abs/2511.19172.md",
    "content": "### MetroGS: Efficient and Stable Reconstruction of Geometrically Accurate High-Fidelity Large-Scale Scenes\n\nRecently, 3D Gaussian Splatting and its derivatives have achieved significant breakthroughs in large-scale scene reconstruction. However, how to efficiently and stably achieve high-quality geometric fidelity remains a core challenge. To address this issue, we introduce MetroGS, a novel Gaussian Splatting framework for efficient and robust reconstruction in complex urban environments. Our method is built upon a distributed 2D Gaussian Splatting representation as the core foundation, serving as a unified backbone for subsequent modules. To handle potential sparse regions in complex scenes, we propose a structured dense enhancement scheme that utilizes SfM priors and a pointmap model to achieve a denser initialization, while incorporating a sparsity compensation mechanism to improve reconstruction completeness. Furthermore, we design a progressive hybrid geometric optimization strategy that organically integrates monocular and multi-view optimization to achieve efficient and accurate geometric refinement. Finally, to address the appearance inconsistency commonly observed in large-scale scenes, we introduce a depth-guided appearance modeling approach that learns spatial features with 3D consistency, facilitating effective decoupling between geometry and appearance and further enhancing reconstruction stability. Experiments on large-scale urban datasets demonstrate that MetroGS achieves superior geometric accuracy, rendering quality, offering a unified solution for high-fidelity large-scale scene reconstruction.\n\n近年来，三维高斯溅射及其衍生方法在大规模场景重建中取得了显著突破。然而，如何高效且稳定地实现高质量几何保真仍然是核心挑战。为此，我们提出了 MetroGS，一种面向复杂城市环境的高效鲁棒重建高斯溅射框架。该方法以分布式二维高斯溅射表示为核心基础，作为统一主干支撑后续各模块。为应对复杂场景中可能出现的稀疏区域，我们提出了一种结构化稠密增强策略，结合SfM先验与点图模型进行稠密初始化，并引入稀疏性补偿机制以提升重建完整性。此外，我们设计了一种渐进式混合几何优化策略，将单目与多视角优化有机融合，实现高效精确的几何细化。针对大规模场景中常见的外观不一致问题，我们引入一种基于深度引导的外观建模方法，学习具有三维一致性的空间特征，实现几何与外观的有效解耦，进一步增强重建稳定性。在多个大规模城市数据集上的实验结果表明，MetroGS 在几何精度与渲染质量方面均取得领先性能，提供了一种统一的高保真大规模场景重建解决方案。\n"
  },
  {
    "path": "abs/2511.19202.md",
    "content": "### NVGS: Neural Visibility for Occlusion Culling in 3D Gaussian Splatting\n\n3D Gaussian Splatting can exploit frustum culling and level-of-detail strategies to accelerate rendering of scenes containing a large number of primitives. However, the semi-transparent nature of Gaussians prevents the application of another highly effective technique: occlusion culling. We address this limitation by proposing a novel method to learn the viewpoint-dependent visibility function of all Gaussians in a trained model using a small, shared MLP across instances of an asset in a scene. By querying it for Gaussians within the viewing frustum prior to rasterization, our method can discard occluded primitives during rendering. Leveraging Tensor Cores for efficient computation, we integrate these neural queries directly into a novel instanced software rasterizer. Our approach outperforms the current state of the art for composed scenes in terms of VRAM usage and image quality, utilizing a combination of our instanced rasterizer and occlusion culling MLP, and exhibits complementary properties to existing LoD techniques.\n\n三维高斯溅射可通过视锥剔除（frustum culling）和细节层次（LoD）策略来加速包含大量图元的场景渲染。然而，由于高斯具有半透明特性，无法应用另一种高效技术：遮挡剔除（occlusion culling）。为了解决这一限制，我们提出了一种新方法，通过一个小型共享的多层感知机（MLP）来学习已训练模型中所有高斯的视角相关可见性函数，并在场景中各个资产实例之间共享。在光栅化之前，我们可对视锥内的高斯进行查询，从而剔除渲染过程中被遮挡的图元。我们进一步利用 Tensor Core 实现高效计算，并将神经查询模块直接集成至新设计的实例化软件光栅器中。该方法结合了实例化光栅器与遮挡剔除 MLP，在复杂场景中显著降低了显存占用并提升了图像质量，同时与现有的 LoD 技术互为补充，展现出更优越的性能表现。\n"
  },
  {
    "path": "abs/2511.19235.md",
    "content": "### IDSplat: Instance-Decomposed 3D Gaussian Splatting for Driving Scenes\n\nReconstructing dynamic driving scenes is essential for developing autonomous systems through sensor-realistic simulation. Although recent methods achieve high-fidelity reconstructions, they either rely on costly human annotations for object trajectories or use time-varying representations without explicit object-level decomposition, leading to intertwined static and dynamic elements that hinder scene separation. We present IDSplat, a self-supervised 3D Gaussian Splatting framework that reconstructs dynamic scenes with explicit instance decomposition and learnable motion trajectories, without requiring human annotations. Our key insight is to model dynamic objects as coherent instances undergoing rigid transformations, rather than unstructured time-varying primitives. For instance decomposition, we employ zero-shot, language-grounded video tracking anchored to 3D using lidar, and estimate consistent poses via feature correspondences. We introduce a coordinated-turn smoothing scheme to obtain temporally and physically consistent motion trajectories, mitigating pose misalignments and tracking failures, followed by joint optimization of object poses and Gaussian parameters. Experiments on the Waymo Open Dataset demonstrate that our method achieves competitive reconstruction quality while maintaining instance-level decomposition and generalizes across diverse sequences and view densities without retraining, making it practical for large-scale autonomous driving applications.\n\n动态驾驶场景的重建对于通过传感器真实感模拟发展自动驾驶系统至关重要。尽管近期方法已能实现高保真重建，但它们往往依赖昂贵的人工标注以获取物体轨迹，或采用不具备显式实例级分解的时变表示，从而导致静态与动态元素相互交织，难以实现清晰的场景分离。我们提出 IDSplat，一种自监督的三维高斯溅射框架，能够在无需人工标注的前提下，实现具有显式实例分解与可学习运动轨迹的动态场景重建。其核心思想是将动态物体建模为执行刚性变换的连贯实例，而非非结构化的时变基元。在实例分解方面，我们结合激光雷达数据，利用零样本、语言引导的视频跟踪方法将对象锚定于三维空间，并通过特征匹配估计一致的姿态。我们还提出了协同转向平滑机制，以获得时间上与物理上均一致的运动轨迹，从而缓解姿态误差与跟踪失败问题，并进一步联合优化物体姿态与高斯参数。在 Waymo Open Dataset 上的实验证明，IDSplat 在保持实例级分解的同时，实现了有竞争力的重建质量，且在不同序列和视角密度条件下具备良好的泛化能力，无需重新训练，展现出其在大规模自动驾驶应用中的实用性。\n"
  },
  {
    "path": "abs/2511.19294.md",
    "content": "### DensifyBeforehand: LiDAR-assisted Content-aware Densification for Efficient and Quality 3D Gaussian Splatting\n\nThis paper addresses the limitations of existing 3D Gaussian Splatting (3DGS) methods, particularly their reliance on adaptive density control, which can lead to floating artifacts and inefficient resource usage. We propose a novel densify beforehand approach that enhances the initialization of 3D scenes by combining sparse LiDAR data with monocular depth estimation from corresponding RGB images. Our ROI-aware sampling scheme prioritizes semantically and geometrically important regions, yielding a dense point cloud that improves visual fidelity and computational efficiency. This densify beforehand approach bypasses the adaptive density control that may introduce redundant Gaussians in the original pipeline, allowing the optimization to focus on the other attributes of 3D Gaussian primitives, reducing overlap while enhancing visual quality. Our method achieves comparable results to state-of-the-art techniques while significantly lowering resource consumption and training time. We validate our approach through extensive comparisons and ablation studies on four newly collected datasets, showcasing its effectiveness in preserving regions of interest in complex scenes.\n\n本文针对当前三维高斯溅射（3DGS）方法的局限性，特别是其对自适应密度控制的依赖问题，提出了一种全新的预密集化（densify beforehand）策略。传统的自适应密度控制常会引入漂浮伪影并导致资源使用低效，而我们的方法在三维场景初始化阶段，通过融合稀疏的 LiDAR 数据与对应 RGB 图像的单目深度估计，生成更具信息性的初始点云。我们设计了一个 ROI 感知采样机制，优先保留在语义和几何上重要的区域，从而获得更高密度、更高保真度的点云表示，同时提升计算效率。通过在预处理阶段完成密度增强，我们绕过了原始流程中易引入冗余高斯的自适应密度机制，使得后续优化过程可聚焦于高斯基元的其他属性，如颜色、透明度与协方差，从而有效减少重叠并增强视觉质量。实验在四个新采集的数据集上进行，涵盖定量对比与消融分析，结果表明我们的方法在保持语义关注区域清晰重建的同时，大幅降低了资源消耗与训练时间，并在视觉效果上与当前最先进技术持平。\n"
  },
  {
    "path": "abs/2511.19542.md",
    "content": "### Proxy-Free Gaussian Splats Deformation with Splat-Based Surface Estimation\n\nWe introduce SpLap, a proxy-free deformation method for Gaussian splats (GS) based on a Laplacian operator computed from our novel surface-aware splat graph. Existing approaches to GS deformation typically rely on deformation proxies such as cages or meshes, but they suffer from dependency on proxy quality and additional computational overhead. An alternative is to directly apply Laplacian-based deformation techniques by treating splats as point clouds. However, this often fail to properly capture surface information due to lack of explicit structure. To address this, we propose a novel method that constructs a surface-aware splat graph, enabling the Laplacian operator derived from it to support more plausible deformations that preserve details and topology. Our key idea is to leverage the spatial arrangement encoded in splats, defining neighboring splats not merely by the distance between their centers, but by their intersections. Furthermore, we introduce a Gaussian kernel adaptation technique that preserves surface structure under deformation, thereby improving rendering quality after deformation. In our experiments, we demonstrate the superior performance of our method compared to both proxy-based and proxy-free baselines, evaluated on 50 challenging objects from the ShapeNet, Objaverse, and Sketchfab datasets, as well as the NeRF-Synthetic dataset.\n\n我们提出了 SpLap，这是一种无需代理的高斯球面 (Gaussian splats, GS) 变形方法，基于我们提出的具有表面感知能力的 Splat 图计算拉普拉斯算子。现有的 GS 变形方法通常依赖于笼架（cages）或网格（meshes）等变形代理，但这些方法受到代理质量的影响，并带来额外的计算开销。另一种方式是将 splats 视为点云，直接应用基于拉普拉斯的变形技术，但由于缺乏显式结构，这类方法往往难以准确捕捉表面信息。为了解决这一问题，我们提出了一种新方法，构建具备表面感知能力的 Splat 图，使得由其导出的拉普拉斯算子能够支持更合理的变形过程，从而更好地保持细节与拓扑结构。我们提出的关键思想是利用 splats 中编码的空间排列关系，不仅根据中心之间的距离定义邻近关系，还考虑它们之间的相交情况。此外，我们还提出了一种高斯核自适应技术，用于在变形过程中保持表面结构，从而提升变形后的渲染质量。在实验中，我们在来自 ShapeNet、Objaverse 和 Sketchfab 的 50 个挑战性对象上进行评估，同时包含 NeRF-Synthetic 数据集，实验结果表明，我们的方法在性能上优于基于代理和无代理的各类基线方法。\n"
  },
  {
    "path": "abs/2511.19854.md",
    "content": "### STAvatar: Soft Binding and Temporal Density Control for Monocular 3D Head Avatars Reconstruction\n\nReconstructing high-fidelity and animatable 3D head avatars from monocular videos remains a challenging yet essential task. Existing methods based on 3D Gaussian Splatting typically bind Gaussians to mesh triangles and model deformations solely via Linear Blend Skinning, which results in rigid motion and limited expressiveness. Moreover, they lack specialized strategies to handle frequently occluded regions (e.g., mouth interiors, eyelids). To address these limitations, we propose STAvatar, which consists of two key components: (1) a UV-Adaptive Soft Binding framework that leverages both image-based and geometric priors to learn per-Gaussian feature offsets within the UV space. This UV representation supports dynamic resampling, ensuring full compatibility with Adaptive Density Control (ADC) and enhanced adaptability to shape and textural variations. (2) a Temporal ADC strategy, which first clusters structurally similar frames to facilitate more targeted computation of the densification criterion. It further introduces a novel fused perceptual error as clone criterion to jointly capture geometric and textural discrepancies, encouraging densification in regions requiring finer details. Extensive experiments on four benchmark datasets demonstrate that STAvatar achieves state-of-the-art reconstruction performance, especially in capturing fine-grained details and reconstructing frequently occluded regions.\n\n从单目视频中重建高保真、可动画驱动的三维头部头像仍是一项具有挑战性但至关重要的任务。现有基于三维高斯渲染（3D Gaussian Splatting）的方法通常将高斯绑定到网格三角形上，并仅通过线性混合蒙皮（Linear Blend Skinning）进行变形建模，这会导致动作僵硬、表现力有限。此外，这些方法缺乏专门策略处理频繁遮挡的区域（如口腔内部、眼睑等）。为克服这些局限，我们提出了 STAvatar 方法，包含两个关键组件：（1）UV 自适应软绑定（UV-Adaptive Soft Binding）框架，结合图像先验和几何先验，在 UV 空间中学习每个高斯的特征偏移量。该 UV 表示支持动态重采样，确保与自适应密度控制（Adaptive Density Control, ADC）完全兼容，并提升对形状与纹理变化的适应性；（2）时序自适应密度控制（Temporal ADC）策略，首先聚类结构上相似的帧，以实现更具针对性的致密化准则计算。该策略还引入了一种融合感知误差的全新克隆准则（clone criterion），能够同时捕捉几何与纹理差异，促进需要更细节的区域进行致密化。我们在四个基准数据集上进行了广泛实验，结果表明 STAvatar 在细节捕捉及遮挡区域重建方面均达到了当前最优性能。\n"
  },
  {
    "path": "abs/2511.19861.md",
    "content": "### GigaWorld-0: World Models as Data Engine to Empower Embodied AI\n\nWorld models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: GigaWorld-0-Video, which leverages large-scale video generation to produce diverse, texture-rich, and temporally coherent embodied sequences under fine-grained control of appearance, camera viewpoint, and action semantics; and GigaWorld-0-3D, which combines 3D generative modeling, 3D Gaussian Splatting reconstruction, physically differentiable system identification, and executable motion planning to ensure geometric consistency and physical realism. Their joint optimization enables the scalable synthesis of embodied interaction data that is visually compelling, spatially coherent, physically plausible, and instruction-aligned. Training at scale is made feasible through our efficient GigaTrain framework, which exploits FP8-precision and sparse attention to drastically reduce memory and compute requirements. We conduct comprehensive evaluations showing that GigaWorld-0 generates high-quality, diverse, and controllable data across multiple dimensions. Critically, VLA model (e.g., GigaBrain-0) trained on GigaWorld-0-generated data achieve strong real-world performance, significantly improving generalization and task success on physical robots without any real-world interaction during training.\n\n世界模型正在成为可扩展、高数据效率的具身智能（embodied AI）的基础范式。在本研究中，我们提出了 GigaWorld-0，这是一个专为视觉-语言-动作（Vision-Language-Action, VLA）学习而设计的统一世界模型框架，作为数据引擎使用。GigaWorld-0 融合了两个协同模块：GigaWorld-0-Video 利用大规模视频生成，在外观、相机视角和动作语义的细粒度控制下，生成多样、纹理丰富且时间连贯的具身交互序列；GigaWorld-0-3D 则融合了三维生成建模、三维高斯重建（3D Gaussian Splatting）、物理可微系统识别以及可执行的运动规划，以确保几何一致性与物理真实感。这两个模块的联合优化，使得具身交互数据的可扩展合成在视觉吸引力、空间连贯性、物理合理性与指令一致性方面兼具。为了支持大规模训练，我们提出了高效的 GigaTrain 框架，结合 FP8 精度和稀疏注意力机制，大幅降低内存与计算资源消耗。我们进行了全面评估，表明 GigaWorld-0 能够在多个维度上生成高质量、多样性强且高度可控的数据。尤为重要的是，使用 GigaWorld-0 所生成数据训练的 VLA 模型（如 GigaBrain-0）在真实世界任务中表现优异，显著提升了泛化能力和物理机器人任务成功率，而在训练阶段无需任何真实世界交互。\n"
  },
  {
    "path": "abs/2511.20050.md",
    "content": "### Active3D: Active High-Fidelity 3D Reconstruction via Hierarchical Uncertainty Quantification\n\nIn this paper, we present an active exploration framework for high-fidelity 3D reconstruction that incrementally builds a multi-level uncertainty space and selects next-best-views through an uncertainty-driven motion planner. We introduce a hybrid implicit-explicit representation that fuses neural fields with Gaussian primitives to jointly capture global structural priors and locally observed details. Based on this hybrid state, we derive a hierarchical uncertainty volume that quantifies both implicit global structure quality and explicit local surface confidence. To focus optimization on the most informative regions, we propose an uncertainty-driven keyframe selection strategy that anchors high-entropy viewpoints as sparse attention nodes, coupled with a viewpoint-space sliding window for uncertainty-aware local refinement. The planning module formulates next-best-view selection as an Expected Hybrid Information Gain problem and incorporates a risk-sensitive path planner to ensure efficient and safe exploration. Extensive experiments on challenging benchmarks demonstrate that our approach consistently achieves state-of-the-art accuracy, completeness, and rendering quality, highlighting its effectiveness for real-world active reconstruction and robotic perception tasks.\n\n本文提出了一种面向高保真三维重建的主动探索框架，该框架通过逐步构建多层次的不确定性空间，并基于不确定性驱动的运动规划器选择下一最佳视角（next-best-view）。我们引入了一种显式与隐式结合的混合表示方式，将神经场与高斯基元融合，用于同时建模全局结构先验与局部观测细节。在此混合状态基础上，我们构建了一个分层的不确定性体积，用于同时量化隐式的全局结构质量和显式的局部表面置信度。为了将优化聚焦在最具信息量的区域，我们提出了一种基于不确定性的关键帧选择策略，将高熵视角作为稀疏注意节点，并结合视角空间滑动窗口机制进行局部不确定性感知的细化优化。规划模块将下一最佳视角选择建模为期望混合信息增益（Expected Hybrid Information Gain）问题，并结合风险敏感路径规划器，确保探索过程的高效性与安全性。在多个具有挑战性的基准测试中，我们的方法在准确性、完整性和渲染质量方面均稳定达到当前最优水平，展现出其在真实场景主动重建与机器人感知任务中的优越性。\n"
  },
  {
    "path": "abs/2511.20348.md",
    "content": "### Material-informed Gaussian Splatting for 3D World Reconstruction in a Digital Twin\n\n3D reconstruction for Digital Twins often relies on LiDAR-based methods, which provide accurate geometry but lack the semantics and textures naturally captured by cameras. Traditional LiDAR-camera fusion approaches require complex calibration and still struggle with certain materials like glass, which are visible in images but poorly represented in point clouds. We propose a camera-only pipeline that reconstructs scenes using 3D Gaussian Splatting from multi-view images, extracts semantic material masks via vision models, converts Gaussian representations to mesh surfaces with projected material labels, and assigns physics-based material properties for accurate sensor simulation in modern graphics engines and simulators. This approach combines photorealistic reconstruction with physics-based material assignment, providing sensor simulation fidelity comparable to LiDAR-camera fusion while eliminating hardware complexity and calibration requirements. We validate our camera-only method using an internal dataset from an instrumented test vehicle, leveraging LiDAR as ground truth for reflectivity validation alongside image similarity metrics.\n\n数字孪生中的三维重建通常依赖于基于激光雷达（LiDAR）的方法，这类方法能够提供精确的几何信息，但缺乏摄像头自然捕捉到的语义与纹理。传统的激光雷达-摄像头融合方法需要复杂的标定流程，且在处理诸如玻璃等材料时仍存在困难——这些材料在图像中清晰可见，但在点云中表现不佳。为此，我们提出了一种仅基于摄像头的重建流程：通过多视角图像实现三维高斯渲染（3D Gaussian Splatting）进行场景重建，借助视觉模型提取语义材质掩膜，将高斯表示转换为带有投影材质标签的网格表面，并赋予物理驱动的材质属性，以支持现代图形引擎与仿真器中的精确传感器模拟。该方法融合了真实感重建与基于物理的材质赋值，在消除硬件复杂性与标定需求的同时，提供了可与激光雷达-摄像头融合方法媲美的仿真精度。我们使用来自一辆搭载传感器的测试车辆的内部数据集对该摄像头-only 方法进行了验证，利用激光雷达作为反射率验证的参考，并结合图像相似度指标进行评估。\n"
  },
  {
    "path": "abs/2511.20354.md",
    "content": "### GS-Checker: Tampering Localization for 3D Gaussian Splatting\n\nRecent advances in editing technologies for 3D Gaussian Splatting (3DGS) have made it simple to manipulate 3D scenes. However, these technologies raise concerns about potential malicious manipulation of 3D content. To avoid such malicious applications, localizing tampered regions becomes crucial. In this paper, we propose GS-Checker, a novel method for locating tampered areas in 3DGS models. Our approach integrates a 3D tampering attribute into the 3D Gaussian parameters to indicate whether the Gaussian has been tampered. Additionally, we design a 3D contrastive mechanism by comparing the similarity of key attributes between 3D Gaussians to seek tampering cues at 3D level. Furthermore, we introduce a cyclic optimization strategy to refine the 3D tampering attribute, enabling more accurate tampering localization. Notably, our approach does not require expensive 3D labels for supervision. Extensive experimental results demonstrate the effectiveness of our proposed method to locate the tampered 3DGS area.\n\n近年来，3D 高斯渲染（3D Gaussian Splatting，3DGS）编辑技术的快速发展使得操控三维场景变得十分便捷。然而，这些技术也引发了对三维内容被恶意篡改的担忧。为防止此类恶意应用，定位被篡改区域显得尤为关键。本文提出了一种新颖的方法——GS-Checker，用于在 3DGS 模型中定位被篡改的区域。我们的方法在三维高斯参数中引入了一个 3D 篡改属性，用于指示对应高斯是否被篡改。此外，我们设计了一种三维对比机制，通过比较高斯之间关键属性的相似性，从三维层面寻找篡改线索。进一步地，我们引入了一种循环优化策略，对 3D 篡改属性进行精细优化，从而实现更准确的篡改定位。值得一提的是，该方法无需昂贵的三维标注数据作为监督。大量实验证明，所提出的方法在定位 3DGS 篡改区域方面具有显著效果。\n"
  },
  {
    "path": "abs/2512.07345.md",
    "content": "### Debiasing Diffusion Priors via 3D Attention for Consistent Gaussian Splatting\n\nVersatile 3D tasks (e.g., generation or editing) that distill from Text-to-Image (T2I) diffusion models have attracted significant research interest for not relying on extensive 3D training data. However, T2I models exhibit limitations resulting from prior view bias, which produces conflicting appearances between different views of an object. This bias causes subject-words to preferentially activate prior view features during cross-attention (CA) computation, regardless of the target view condition. To overcome this limitation, we conduct a comprehensive mathematical analysis to reveal the root cause of the prior view bias in T2I models. Moreover, we find different UNet layers show different effects of prior view in CA. Therefore, we propose a novel framework, TD-Attn, which addresses multi-view inconsistency via two key components: (1) the 3D-Aware Attention Guidance Module (3D-AAG) constructs a view-consistent 3D attention Gaussian for subject-words to enforce spatial consistency across attention-focused regions, thereby compensating for the limited spatial information in 2D individual view CA maps; (2) the Hierarchical Attention Modulation Module (HAM) utilizes a Semantic Guidance Tree (SGT) to direct the Semantic Response Profiler (SRP) in localizing and modulating CA layers that are highly responsive to view conditions, where the enhanced CA maps further support the construction of more consistent 3D attention Gaussians. Notably, HAM facilitates semantic-specific interventions, enabling controllable and precise 3D editing. Extensive experiments firmly establish that TD-Attn has the potential to serve as a universal plugin, significantly enhancing multi-view consistency across 3D tasks.\n\n从文本到图像扩散模型蒸馏得到的多种三维任务，例如生成和编辑，由于无需大量三维训练数据而受到广泛关注。然而，这类模型存在先验视角偏置，导致物体在不同视角之间出现相互冲突的外观。这种偏置会使与主体相关的词在交叉注意力计算时，无论目标视角如何，都优先激活先验视角特征。为解决这一问题，我们进行了系统的数学分析，以揭示文本到图像模型中先验视角偏置的根源。同时我们发现，不同 UNet 层中的交叉注意力受先验视角影响的方式并不相同。因此，我们提出新框架 TD-Attn，通过两个关键模块解决多视角不一致问题：(1) 3D-Aware Attention Guidance Module (3D-AAG) 为主体词构建视角一致的三维注意力高斯，以在注意力聚焦区域之间施加空间一致性，从而弥补二维单视角交叉注意力图空间信息不足的问题；(2) Hierarchical Attention Modulation Module (HAM) 通过 Semantic Guidance Tree (SGT) 引导 Semantic Response Profiler (SRP) 定位并调节对视角条件响应最强的交叉注意力层，进而利用增强后的注意力图构建更一致的三维注意力高斯。值得注意的是，HAM 支持语义特定的干预，从而实现可控且精细的三维编辑。大量实验表明，TD-Attn 有潜力作为通用插件，显著提升多种三维任务中的多视角一致性。\n"
  },
  {
    "path": "abs/2512.07381.md",
    "content": "### Tessellation GS: Neural Mesh Gaussians for Robust Monocular Reconstruction of Dynamic Objects\n\n3D Gaussian Splatting (GS) enables highly photorealistic scene reconstruction from posed image sequences but struggles with viewpoint extrapolation due to its anisotropic nature, leading to overfitting and poor generalization, particularly in sparse-view and dynamic scene reconstruction. We propose Tessellation GS, a structured 2D GS approach anchored on mesh faces, to reconstruct dynamic scenes from a single continuously moving or static camera. Our method constrains 2D Gaussians to localized regions and infers their attributes via hierarchical neural features on mesh faces. Gaussian subdivision is guided by an adaptive face subdivision strategy driven by a detail-aware loss function. Additionally, we leverage priors from a reconstruction foundation model to initialize Gaussian deformations, enabling robust reconstruction of general dynamic objects from a single static camera, previously extremely challenging for optimization-based methods. Our method outperforms previous SOTA method, reducing LPIPS by 29.1% and Chamfer distance by 49.2% on appearance and mesh reconstruction tasks.\n\n3D Gaussian Splatting (GS) 能够从已知位姿的图像序列中实现高度逼真的场景重建，但由于其各向异性特性，在视角外推时容易过拟合且泛化能力较差，尤其是在稀疏视角和动态场景重建中更为明显。我们提出 Tessellation GS，这是一种锚定在网格面片上的结构化二维 GS 方法，可从单个连续运动或静止相机中重建动态场景。该方法将二维高斯约束在局部区域内，并通过网格面片上的分层神经特征推断其属性。高斯细分由细节感知损失驱动的自适应面片细分策略进行引导。此外，我们利用重建基础模型的先验来初始化高斯形变，从而使得从单个静止相机稳健重建一般动态物体成为可能，而这对于基于优化的方法来说此前极具挑战。我们的方法优于先前最先进方法，在外观和网格重建任务上将 LPIPS 降低 29.1%，并将 Chamfer 距离降低 49.2%。\n"
  },
  {
    "path": "abs/2512.08334.md",
    "content": "### HybridSplat: Fast Reflection-baked Gaussian Tracing using Hybrid Splatting\n\nRendering complex reflection of real-world scenes using 3D Gaussian splatting has been a quite promising solution for photorealistic novel view synthesis, but still faces bottlenecks especially in rendering speed and memory storage. This paper proposes a new Hybrid Splatting(HybridSplat) mechanism for Gaussian primitives. Our key idea is a new reflection-baked Gaussian tracing, which bakes the view-dependent reflection within each Gaussian primitive while rendering the reflection using tile-based Gaussian splatting. Then we integrate the reflective Gaussian primitives with base Gaussian primitives using a unified hybrid splatting framework for high-fidelity scene reconstruction. Moreover, we further introduce a pipeline-level acceleration for the hybrid splatting, and reflection-sensitive Gaussian pruning to reduce the model size, thus achieving much faster rendering speed and lower memory storage while preserving the reflection rendering quality. By extensive evaluation, our HybridSplat accelerates about 7x rendering speed across complex reflective scenes from Ref-NeRF, NeRF-Casting with 4x fewer Gaussian primitives than similar ray-tracing based Gaussian splatting baselines, serving as a new state-of-the-art method especially for complex reflective scenes.\n\n使用 3D Gaussian splatting 渲染真实世界场景中的复杂反射，已成为实现照片级新视角合成的一个很有前景的方案，但在渲染速度和内存占用方面仍存在瓶颈。本文提出一种新的 Gaussian primitive 混合 splatting 机制 HybridSplat。其核心思想是新的 reflection-baked Gaussian tracing：把视角相关反射预烘焙到每个 Gaussian primitive 中，同时在渲染反射时使用基于 tile 的 Gaussian splatting。随后，我们在统一的混合 splatting 框架中，将反射 Gaussian primitive 与基础 Gaussian primitive 融合，以实现高保真场景重建。此外，我们还引入了面向整个管线的加速方案，以及对反射敏感的 Gaussian 剪枝，以在保持反射渲染质量的同时显著提升渲染速度并降低内存占用。大量评估表明，HybridSplat 在复杂反射场景上的渲染速度约提升 7 倍，相比类似的基于射线追踪的 Gaussian splatting 基线只需四分之一的 Gaussian primitive，成为复杂反射场景上的新 SOTA 方法。\n"
  },
  {
    "path": "abs/2512.08625.md",
    "content": "### OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics\n\nSimultaneous Localization and Mapping (SLAM) is a foundational component in robotics, AR/VR, and autonomous systems. With the rising focus on spatial AI in recent years, combining SLAM with semantic understanding has become increasingly important for enabling intelligent perception and interaction. Recent efforts have explored this integration, but they often rely on depth sensors or closed-set semantic models, limiting their scalability and adaptability in open-world environments. In this work, we present OpenMonoGS-SLAM, the first monocular SLAM framework that unifies 3D Gaussian Splatting (3DGS) with open-set semantic understanding. To achieve our goal, we leverage recent advances in Visual Foundation Models (VFMs), including MASt3R for visual geometry and SAM and CLIP for open-vocabulary semantics. These models provide robust generalization across diverse tasks, enabling accurate monocular camera tracking and mapping, as well as a rich understanding of semantics in open-world environments. Our method operates without any depth input or 3D semantic ground truth, relying solely on self-supervised learning objectives. Furthermore, we propose a memory mechanism specifically designed to manage high-dimensional semantic features, which effectively constructs Gaussian semantic feature maps, leading to strong overall performance. Experimental results demonstrate that our approach achieves performance comparable to or surpassing existing baselines in both closed-set and open-set segmentation tasks, all without relying on supplementary sensors such as depth maps or semantic annotations.\n\n同步定位与建图（SLAM）是机器人、AR/VR 和自动系统中的基础组成部分。随着近几年空间智能受到越来越多关注，将 SLAM 与语义理解结合起来，对于实现智能感知与交互也变得更加重要。近期已有一些工作尝试进行这种融合，但它们通常依赖深度传感器或闭集语义模型，从而限制了在开放世界环境中的可扩展性和适应性。本文提出 OpenMonoGS-SLAM，这是首个将 3D Gaussian Splatting (3DGS) 与开放集语义理解统一起来的单目 SLAM 框架。为实现这一目标，我们利用视觉基础模型的最新进展，包括用于视觉几何的 MASt3R，以及用于开放词汇语义的 SAM 和 CLIP。这些模型在多种任务上具备强泛化能力，使系统能够实现准确的单目相机跟踪与建图，并获得丰富的开放世界语义理解。我们的方法不需要任何深度输入或三维语义真值，仅依赖自监督学习目标。此外，我们还提出一种专门管理高维语义特征的记忆机制，能够有效构建高斯语义特征图，并带来整体性能提升。实验结果表明，该方法在闭集和开放集分割任务上都达到与现有基线相当甚至更优的性能，同时无需依赖深度图或语义标注等额外传感器。\n"
  },
  {
    "path": "abs/2512.09162.md",
    "content": "### GTAvatar: Bridging Gaussian Splatting and Texture Mapping for Relightable and Editable Gaussian Avatars\n\nRecent advancements in Gaussian Splatting have enabled increasingly accurate reconstruction of photorealistic head avatars, opening the door to numerous applications in visual effects, videoconferencing, and virtual reality. This, however, comes with the lack of intuitive editability offered by traditional triangle mesh-based methods. In contrast, we propose a method that combines the accuracy and fidelity of 2D Gaussian Splatting with the intuitiveness of UV texture mapping. By embedding each canonical Gaussian primitive's local frame into a patch in the UV space of a template mesh in a computationally efficient manner, we reconstruct continuous editable material head textures from a single monocular video on a conventional UV domain. Furthermore, we leverage an efficient physically based reflectance model to enable relighting and editing of these intrinsic material maps. Through extensive comparisons with state-of-the-art methods, we demonstrate the accuracy of our reconstructions, the quality of our relighting results, and the ability to provide intuitive controls for modifying an avatar's appearance and geometry via texture mapping without additional optimization.\n\nGaussian Splatting 的最新进展使对写实头部化身的重建越来越精确，为视觉特效、视频会议和虚拟现实等应用打开了大门。但与此同时，这类方法缺乏传统三角网格方法所具备的直观可编辑性。为此，我们提出一种将 2D Gaussian Splatting 的精度与保真度和 UV 纹理映射的直观编辑能力结合起来的方法。通过以计算高效的方式将每个规范高斯 primitive 的局部坐标系嵌入到模板网格 UV 空间中的一个 patch，我们能够从单个单目视频在常规 UV 域中重建连续且可编辑的材质头部纹理。此外，我们利用高效的物理反射模型，使这些内在材质图支持重光照和编辑。通过与最先进方法的大量比较，我们验证了重建精度、重光照质量，以及在无需额外优化的情况下通过纹理映射直观控制化身外观和几何的能力。\n"
  },
  {
    "path": "abs/2512.09335.md",
    "content": "### Relightable and Dynamic Gaussian Avatar Reconstruction from Monocular Video\n\nModeling relightable and animatable human avatars from monocular video is a long-standing and challenging task. Recently, Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) methods have been employed to reconstruct the avatars. However, they often produce unsatisfactory photo-realistic results because of insufficient geometrical details related to body motion, such as clothing wrinkles. In this paper, we propose a 3DGS-based human avatar modeling framework, termed as Relightable and Dynamic Gaussian Avatar (RnD-Avatar), that presents accurate pose-variant deformation for high-fidelity geometrical details. To achieve this, we introduce dynamic skinning weights that define the human avatar's articulation based on pose while also learning additional deformations induced by body motion. We also introduce a novel regularization to capture fine geometric details under sparse visual cues. Furthermore, we present a new multi-view dataset with varied lighting conditions to evaluate relight. Our framework enables realistic rendering of novel poses and views while supporting photo-realistic lighting effects under arbitrary lighting conditions. Our method achieves state-of-the-art performance in novel view synthesis, novel pose rendering, and relighting.\n\n从单目视频中建模可重光照、可动画的人体化身是一项长期且困难的任务。近年来，NeRF 和 3D Gaussian Splatting (3DGS) 方法被用于重建这类化身，但由于对衣物褶皱等与身体运动相关的几何细节建模不足，往往难以达到令人满意的照片级真实感。本文提出一种基于 3DGS 的人体化身建模框架，称为 Relightable and Dynamic Gaussian Avatar（RnD-Avatar），可为高保真几何细节提供准确的姿态相关形变。为此，我们引入动态蒙皮权重，根据姿态定义人体化身的关节运动，同时学习由身体运动引起的附加形变。我们还提出一种新的正则化，在稀疏视觉线索下捕获精细几何细节。此外，我们构建了一个具有多种光照条件的新多视角数据集，用于评估重光照能力。该框架支持新姿态和新视角下的真实渲染，并能在任意光照条件下实现照片级真实的光照效果。方法在新视角合成、新姿态渲染和重光照任务上都达到最先进性能。\n"
  },
  {
    "path": "abs/2512.09411.md",
    "content": "### D$^2$GSLAM: 4D Dynamic Gaussian Splatting SLAM\n\nRecent advances in Dense Simultaneous Localization and Mapping (SLAM) have demonstrated remarkable performance in static environments. However, dense SLAM in dynamic environments remains challenging. Most methods directly remove dynamic objects and focus solely on static scene reconstruction, which ignores the motion information contained in these dynamic objects. In this paper, we present D$^2$GSLAM, a novel dynamic SLAM system utilizing Gaussian representation, which simultaneously performs accurate dynamic reconstruction and robust tracking within dynamic environments. Our system is composed of four key components: (i) We propose a geometric-prompt dynamic separation method to distinguish between static and dynamic elements of the scene. This approach leverages the geometric consistency of Gaussian representation and scene geometry to obtain coarse dynamic regions. The regions then serve as prompts to guide the refinement of the coarse mask for achieving accurate motion mask. (ii) To facilitate accurate and efficient mapping of the dynamic scene, we introduce dynamic-static composite representation that integrates static 3D Gaussians with dynamic 4D Gaussians. This representation allows for modeling the transitions between static and dynamic states of objects in the scene for composite mapping and optimization. (iii) We employ a progressive pose refinement strategy that leverages both the multi-view consistency of static scene geometry and motion information from dynamic objects to achieve accurate camera tracking. (iv) We introduce a motion consistency loss, which leverages the temporal continuity in object motions for accurate dynamic modeling. Our D$^2$GSLAM demonstrates superior performance on dynamic scenes in terms of mapping and tracking accuracy, while also showing capability in accurate dynamic modeling.\n\n近期稠密同步定位与建图（SLAM）在静态环境中表现出色，但动态环境下的稠密 SLAM 仍然极具挑战。大多数方法直接移除动态物体，只关注静态场景重建，从而忽略了这些动态物体所包含的运动信息。本文提出 D²GSLAM，一个利用高斯表示的动态 SLAM 系统，可在动态环境中同时实现精确的动态重建与稳健跟踪。系统包含四个关键组成部分：(i) 我们提出基于几何提示的动静分离方法，利用高斯表示与场景几何的一致性获得粗略动态区域，再将其作为提示引导粗掩码细化，得到准确的运动掩码。(ii) 为实现高效且准确的动态场景建图，我们引入动静组合表示，将静态 3D 高斯与动态 4D 高斯结合起来，用于建模场景中物体在静态与动态状态之间的转变，并进行联合建图与优化。(iii) 我们采用渐进式位姿优化策略，同时利用静态场景几何的多视角一致性和动态物体的运动信息，实现准确的相机跟踪。(iv) 我们引入运动一致性损失，利用物体运动的时间连续性来实现准确的动态建模。D²GSLAM 在动态场景的建图和跟踪精度上都表现优异，并展现了准确动态建模能力。\n"
  },
  {
    "path": "abs/2512.09608.md",
    "content": "### Super4DR: 4D Radar-centric Self-supervised Odometry and Gaussian-based Map Optimization\n\nConventional SLAM systems using visual or LiDAR data often struggle in poor lighting and severe weather. Although 4D radar is suited for such environments, its sparse and noisy point clouds hinder accurate odometry estimation, while the radar maps suffer from obscure and incomplete structures. Thus, we propose Super4DR, a 4D radar-centric framework for learning-based odometry estimation and gaussian-based map optimization. First, we design a cluster-aware odometry network that incorporates object-level cues from the clustered radar points for inter-frame matching, alongside a hierarchical self-supervision mechanism to overcome outliers through spatio-temporal consistency, knowledge transfer, and feature contrast. Second, we propose using 3D gaussians as an intermediate representation, coupled with a radar-specific growth strategy, selective separation, and multi-view regularization, to recover blurry map areas and those undetected based on image texture. Experiments show that Super4DR achieves a 67% performance gain over prior self-supervised methods, nearly matches supervised odometry, and narrows the map quality disparity with LiDAR while enabling multi-modal image rendering.\n\n传统基于视觉或 LiDAR 的 SLAM 系统在弱光和恶劣天气下往往表现不佳。尽管 4D radar 很适合这类环境，但其点云稀疏且噪声较大，不利于精确里程计估计，基于 radar 构建的地图也常常结构模糊且不完整。为此，我们提出 Super4DR，这是一个以 4D radar 为中心的框架，用于学习式里程计估计和基于 Gaussian 的地图优化。首先，我们设计了一个具备聚类感知能力的里程计网络，将聚类后的 radar 点中的目标级线索用于帧间匹配，并结合分层自监督机制，通过时空一致性、知识迁移和特征对比来抑制离群点。其次，我们提出将 3D Gaussian 作为中间表示，并结合 radar 特定的生长策略、选择性分离和多视图正则，以恢复模糊区域以及那些仅靠图像纹理难以检测到的地图区域。实验表明，Super4DR 相比先前的自监督方法实现了 67% 的性能提升，效果接近监督式里程计，并缩小了与 LiDAR 地图质量之间的差距，同时还能支持多模态图像渲染。\n"
  },
  {
    "path": "abs/2512.09903.md",
    "content": "### YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos\n\nVisual navigation has emerged as a practical alternative to traditional robotic navigation pipelines that rely on detailed mapping and path planning. However, constructing and maintaining 3D maps is often computationally expensive and memory-intensive. We address the problem of visual navigation when exploration videos of a large environment are available. The videos serve as a visual reference, allowing a robot to retrace the explored trajectories without relying on metric maps. Our proposed method, YOPO-Nav (You Only Pass Once), encodes an environment into a compact spatial representation composed of interconnected local 3D Gaussian Splatting (3DGS) models. During navigation, the framework aligns the robot's current visual observation with this representation and predicts actions that guide it back toward the demonstrated trajectory. YOPO-Nav employs a hierarchical design: a visual place recognition (VPR) module provides coarse localization, while the local 3DGS models refine the goal and intermediate poses to generate control actions. To evaluate our approach, we introduce the YOPO-Campus dataset, comprising 4 hours of egocentric video and robot controller inputs from over 6 km of human-teleoperated robot trajectories. We benchmark recent visual navigation methods on trajectories from YOPO-Campus using a Clearpath Jackal robot. Experimental results show YOPO-Nav provides excellent performance in image-goal navigation for real-world scenes on a physical robot. The dataset and code will be made publicly available for visual navigation and scene representation research.\n\n视觉导航已成为依赖详细建图和路径规划的传统机器人导航流程的一种实用替代方案。然而，构建和维护三维地图通常计算开销大、内存占用高。我们研究当大环境探索视频可用时的视觉导航问题。这些视频可作为视觉参考，使机器人在不依赖度量地图的情况下沿着已探索轨迹返回。我们提出 YOPO-Nav（You Only Pass Once），将环境编码为由多个互联局部 3D Gaussian Splatting (3DGS) 模型组成的紧凑空间表示。导航过程中，系统将机器人当前视觉观测与该表示对齐，并预测动作以引导其返回示教轨迹。YOPO-Nav 采用分层设计：视觉地点识别模块负责粗定位，局部 3DGS 模型进一步细化目标位姿和中间位姿，从而生成控制动作。为评估该方法，我们提出 YOPO-Campus 数据集，其中包含 4 小时第一视角视频和机器人控制输入，覆盖超过 6 公里的人类遥操作机器人轨迹。我们在 Clearpath Jackal 机器人上，使用 YOPO-Campus 的轨迹对近期视觉导航方法进行了基准评测。实验结果表明，YOPO-Nav 在真实场景中的图像目标导航任务上表现优异。数据集和代码将公开。\n"
  },
  {
    "path": "abs/2512.09923.md",
    "content": "### Splatent: Splatting Diffusion Latents for Novel View Synthesis\n\nRadiance field representations have recently been explored in the latent space of VAEs that are commonly used by diffusion models. This direction offers efficient rendering and seamless integration with diffusion-based pipelines. However, these methods face a fundamental limitation: The VAE latent space lacks multi-view consistency, leading to blurred textures and missing details during 3D reconstruction. Existing approaches attempt to address this by fine-tuning the VAE, at the cost of reconstruction quality, or by relying on pre-trained diffusion models to recover fine-grained details, at the risk of some hallucinations. We present Splatent, a diffusion-based enhancement framework designed to operate on top of 3D Gaussian Splatting (3DGS) in the latent space of VAEs. Our key insight departs from the conventional 3D-centric view: rather than reconstructing fine-grained details in 3D space, we recover them in 2D from input views through multi-view attention mechanisms. This approach preserves the reconstruction quality of pretrained VAEs while achieving faithful detail recovery. Evaluated across multiple benchmarks, Splatent establishes a new state-of-the-art for VAE latent radiance field reconstruction. We further demonstrate that integrating our method with existing feed-forward frameworks, consistently improves detail preservation, opening new possibilities for high-quality sparse-view 3D reconstruction.\n\n近年来，辐射场表示已经开始在扩散模型常用的 VAE 潜空间中进行探索。这一方向具有渲染高效、且易于与基于扩散的流程集成等优点。然而，这类方法存在一个根本性限制：VAE 潜空间缺乏多视图一致性，会在 3D 重建过程中造成纹理模糊和细节缺失。现有方法要么通过微调 VAE 来缓解这一问题，但会牺牲重建质量；要么依赖预训练扩散模型恢复细粒度细节，但存在幻觉风险。我们提出 Splatent，这是一个设计在 VAE 潜空间 3D Gaussian Splatting (3DGS) 之上的扩散增强框架。我们的核心洞见不同于传统以 3D 为中心的视角：我们不是在 3D 空间中重建细节，而是通过多视图注意力机制在输入视图的 2D 空间中恢复细节。该方法在保持预训练 VAE 重建质量的同时，实现了忠实的细节恢复。在多个基准上的评测表明，Splatent 为 VAE 潜空间辐射场重建建立了新的 SOTA。我们还展示了将该方法与现有前馈框架结合后，可以稳定提升细节保留能力，为高质量稀疏视角 3D 重建打开了新的可能。\n"
  },
  {
    "path": "abs/2512.09925.md",
    "content": "### GAINS: Gaussian-based Inverse Rendering from Sparse Multi-View Captures\n\nRecent advances in Gaussian Splatting-based inverse rendering extend Gaussian primitives with shading parameters and physically grounded light transport, enabling high-quality material recovery from dense multi-view captures. However, these methods degrade sharply under sparse-view settings, where limited observations lead to severe ambiguity between geometry, reflectance, and lighting. We introduce GAINS (Gaussian-based Inverse rendering from Sparse multi-view captures), a two-stage inverse rendering framework that leverages learning-based priors to stabilize geometry and material estimation. GAINS first refines geometry using monocular depth/normal and diffusion priors, then employs segmentation, intrinsic image decomposition (IID), and diffusion priors to regularize material recovery. Extensive experiments on synthetic and real-world datasets show that GAINS significantly improves material parameter accuracy, relighting quality, and novel-view synthesis compared to state-of-the-art Gaussian-based inverse rendering methods, especially under sparse-view settings.\n\n近期基于 Gaussian Splatting 的 inverse rendering 方法通过为 Gaussian primitive 扩展着色参数并引入具有物理依据的光传输，使得从稠密多视图采集恢复高质量材质成为可能。然而，在稀疏视角设定下，这类方法性能会急剧下降，因为有限观测会在几何、反射率和光照之间带来严重歧义。我们提出 GAINS（Gaussian-based Inverse Rendering from Sparse Multi-View Captures），这是一个两阶段 inverse rendering 框架，利用学习型先验来稳定几何与材质估计。GAINS 首先利用单目深度/法线先验和扩散先验细化几何；随后通过分割、intrinsic image decomposition (IID) 以及扩散先验对材质恢复进行正则化。大量合成与真实数据集实验表明，相比现有最先进的 Gaussian-based inverse rendering 方法，GAINS 在材质参数精度、重光照质量和新视角合成方面都有显著提升，尤其是在稀疏视角条件下。\n"
  },
  {
    "path": "abs/2512.10095.md",
    "content": "### TraceFlow: Dynamic 3D Reconstruction of Specular Scenes Driven by Ray Tracing\n\nWe present TraceFlow, a novel framework for high-fidelity rendering of dynamic specular scenes by addressing two key challenges: precise reflection direction estimation and physically accurate reflection modeling. To achieve this, we propose a Residual Material-Augmented 2D Gaussian Splatting representation that models dynamic geometry and material properties, allowing accurate reflection ray computation. Furthermore, we introduce a Dynamic Environment Gaussian and a hybrid rendering pipeline that decomposes rendering into diffuse and specular components, enabling physically grounded specular synthesis via rasterization and ray tracing. Finally, we devise a coarse-to-fine training strategy to improve optimization stability and promote physically meaningful decomposition. Extensive experiments on dynamic scene benchmarks demonstrate that TraceFlow outperforms prior methods both quantitatively and qualitatively, producing sharper and more realistic specular reflections in complex dynamic environments.\n\n我们提出 TraceFlow，这是一种用于高保真渲染动态镜面场景的新框架，重点解决两个关键问题：精确的反射方向估计，以及物理上更准确的反射建模。为此，我们提出 Residual Material-Augmented 2D Gaussian Splatting 表示，用于同时建模动态几何与材质属性，从而支持准确的反射光线计算。此外，我们引入 Dynamic Environment Gaussian 和一个混合渲染管线，将渲染分解为漫反射与镜面反射两部分，并通过光栅化与射线追踪实现具有物理依据的镜面反射合成。最后，我们设计了粗到细的训练策略，以提升优化稳定性并促进更符合物理意义的分解。大量动态场景基准实验表明，TraceFlow 在定量与定性上都优于现有方法，能够在复杂动态环境中生成更锐利、更真实的镜面反射效果。\n"
  },
  {
    "path": "abs/2512.10267.md",
    "content": "### Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction\n\nRecent advances in generalizable Gaussian splatting (GS) have enabled feed-forward reconstruction of scenes from tens of input views. Long-LRM notably scales this paradigm to 32 input images at 950x540 resolution, achieving 360-degree scene-level reconstruction in a single forward pass. However, directly predicting millions of Gaussian parameters at once remains highly error-sensitive: small inaccuracies in positions or other attributes lead to noticeable blurring, particularly in fine structures such as text. In parallel, implicit representation methods such as LVSM and LaCT have demonstrated significantly higher rendering fidelity by compressing scene information into model weights rather than explicit Gaussians, and decoding RGB frames using the full transformer or TTT backbone. However, this computationally intensive decompression process for every rendered frame makes real-time rendering infeasible. These observations raise key questions: Is the deep, sequential \"decompression\" process necessary? Can we retain the benefits of implicit representations while enabling real-time performance? We address these questions with Long-LRM++, a model that adopts a semi-explicit scene representation combined with a lightweight decoder. Long-LRM++ matches the rendering quality of LaCT on DL3DV while achieving real-time 14 FPS rendering on an A100 GPU, overcoming the speed limitations of prior implicit methods. Our design also scales to 64 input views at the 950x540 resolution, demonstrating strong generalization to increased input lengths. Additionally, Long-LRM++ delivers superior novel-view depth prediction on ScanNetv2 compared to direct depth rendering from Gaussians. Extensive ablation studies validate the effectiveness of each component in the proposed framework.\n\n近年来，可泛化 Gaussian splatting (GS) 的进展使得从数十张输入视图进行前馈式场景重建成为可能。Long-LRM 将这一范式扩展到 32 张 950x540 分辨率输入图像，可在单次前向传播中完成 360 度场景级重建。然而，一次性直接预测数百万个 Gaussian 参数仍然对误差极其敏感：位置或其他属性中的微小偏差都会导致明显模糊，尤其是在文本等精细结构上。与此同时，LVSM 和 LaCT 等隐式表示方法通过将场景信息压缩到模型权重中，而不是显式 Gaussian 中，并使用完整 transformer 或 TTT 主干解码 RGB 帧，展示出显著更高的渲染保真度。但这种对每一帧都进行高计算量“解压”的过程使实时渲染不可行。这引出了几个关键问题：这种深层、串行的“解压”过程是否必需？我们能否在保留隐式表示优势的同时实现实时性能？为此，我们提出 Long-LRM++，采用半显式场景表示并结合轻量解码器。Long-LRM++ 在 DL3DV 上匹配 LaCT 的渲染质量，同时在 A100 GPU 上实现实时 14 FPS 渲染，克服了先前隐式方法的速度限制。我们的设计还可扩展到 64 张 950x540 分辨率输入视图，展现出对更长输入序列的强泛化能力。此外，Long-LRM++ 在 ScanNetv2 上的新视角深度预测也优于直接从 Gaussian 渲染深度的方法。大量消融实验验证了所提框架中各组件的有效性。\n"
  },
  {
    "path": "abs/2512.10369.md",
    "content": "### Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views\n\n3D Gaussian Splatting (3DGS) has emerged as a state-of-the-art method for novel view synthesis. However, its performance heavily relies on dense, high-quality input imagery, an assumption that is often violated in real-world applications, where data is typically sparse and motion-blurred. These two issues create a vicious cycle: sparse views ignore the multi-view constraints necessary to resolve motion blur, while motion blur erases high-frequency details crucial for aligning the limited views. Thus, reconstruction often fails catastrophically, with fragmented views and a low-frequency bias. To break this cycle, we introduce CoherentGS, a novel framework for high-fidelity 3D reconstruction from sparse and blurry images. Our key insight is to address these compound degradations using a dual-prior strategy. Specifically, we combine two pre-trained generative models: a specialized deblurring network for restoring sharp details and providing photometric guidance, and a diffusion model that offers geometric priors to fill in unobserved regions of the scene. This dual-prior strategy is supported by several key techniques, including a consistency-guided camera exploration module that adaptively guides the generative process, and a depth regularization loss that ensures geometric plausibility. We evaluate CoherentGS through both quantitative and qualitative experiments on synthetic and real-world scenes, using as few as 3, 6, and 9 input views. Our results demonstrate that CoherentGS significantly outperforms existing methods, setting a new state-of-the-art for this challenging task. The code and video demos are available at https://potatobigroom.github.io/CoherentGS/.\n\n3D Gaussian Splatting (3DGS) 已成为新视角合成的先进方法，但其性能高度依赖稠密且高质量的输入图像，而这一假设在真实应用中往往不成立，因为数据通常既稀疏又存在运动模糊。这两个问题会形成恶性循环：稀疏视角缺乏解决运动模糊所需的多视角约束，而运动模糊又抹去了对有限视角对齐至关重要的高频细节，导致重建灾难性失败，表现为视图碎裂和低频偏置。为打破这一循环，我们提出 CoherentGS，一个从稀疏且模糊的图像中进行高保真三维重建的新框架。核心思想是利用双先验策略应对复合退化：将用于恢复清晰细节并提供光度引导的专用去模糊网络，与用于补全未观测区域几何先验的扩散模型结合起来。该双先验策略还配合多个关键技术，包括一致性引导的相机探索模块，以及确保几何合理性的深度正则损失。我们在合成和真实场景上，以仅 3、6、9 个输入视角进行定量和定性评估。结果表明，CoherentGS 显著优于现有方法，在这一困难任务上建立了新的最先进结果。代码和视频演示见 https://potatobigroom.github.io/CoherentGS/ 。\n"
  },
  {
    "path": "abs/2512.10572.md",
    "content": "### DeMapGS: Simultaneous Mesh Deformation and Surface Attribute Mapping via Gaussian Splatting\n\nWe propose DeMapGS, a structured Gaussian Splatting framework that jointly optimizes deformable surfaces and surface-attached 2D Gaussian splats. By anchoring splats to a deformable template mesh, our method overcomes topological inconsistencies and enhances editing flexibility, addressing limitations of prior Gaussian Splatting methods that treat points independently. The unified representation in our method supports extraction of high-fidelity diffuse, normal, and displacement maps, enabling the reconstructed mesh to inherit the photorealistic rendering quality of Gaussian Splatting. To support robust optimization, we introduce a gradient diffusion strategy that propagates supervision across the surface, along with an alternating 2D/3D rendering scheme to handle concave regions. Experiments demonstrate that DeMapGS achieves state-of-the-art mesh reconstruction quality and enables downstream applications for Gaussian splats such as editing and cross-object manipulation through a shared parametric surface.\n\n我们提出 DeMapGS，这是一种结构化的 Gaussian Splatting 框架，可联合优化可变形表面以及附着在表面上的 2D Gaussian splat。通过将 splat 锚定到可变形模板网格上，该方法克服了拓扑不一致问题并增强了编辑灵活性，解决了以往将点独立处理的 Gaussian Splatting 方法的局限。我们的方法中的统一表示支持提取高保真的 diffuse、normal 和 displacement 贴图，使重建网格能够继承 Gaussian Splatting 的照片级渲染质量。为支持稳健优化，我们引入梯度扩散策略，将监督在表面上传播，并采用交替的 2D/3D 渲染方案处理凹陷区域。实验表明，DeMapGS 在网格重建质量上达到 SOTA，并支持 Gaussian splat 的下游应用，例如基于共享参数化表面的编辑与跨物体操控。\n"
  },
  {
    "path": "abs/2512.10685.md",
    "content": "### Sharp Monocular View Synthesis in Less Than a Second\n\nWe present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25-34% and DISTS by 21-43% versus the best prior model, while lowering the synthesis time by three orders of magnitude.\n\n我们提出 SHARP，这是一种从单张图像生成照片级新视角的方案。给定一张照片，SHARP 通过一次神经网络前馈传播，在标准 GPU 上于 1 秒内回归出场景的 3D Gaussian 表示参数。得到的 3D Gaussian 表示可以实时渲染，从而为邻近视角生成高分辨率、照片级图像。该表示具有绝对尺度，是一个度量化表示，因此支持具有实际尺度意义的相机移动。实验结果表明，SHARP 在多个数据集上都展现出稳健的零样本泛化能力，并在多个基准上达到新的 SOTA：相较最佳已有方法，LPIPS 降低 25%-34%，DISTS 降低 21%-43%，同时合成时间缩短了三个数量级。\n"
  },
  {
    "path": "abs/2512.10939.md",
    "content": "### GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting\n\nSpeech-driven talking heads have recently emerged and enable interactive avatars. However, real-world applications are limited, as current methods achieve high visual fidelity but slow or fast yet temporally unstable. Diffusion methods provide realistic image generation, yet struggle with oneshot settings. Gaussian Splatting approaches are real-time, yet inaccuracies in facial tracking, or inconsistent Gaussian mappings, lead to unstable outputs and video artifacts that are detrimental to realistic use cases. We address this problem by mapping Gaussian Splatting using 3D Morphable Models to generate person-specific avatars. We introduce transformer-based prediction of model parameters, directly from audio, to drive temporal consistency. From monocular video and independent audio speech inputs, our method enables generation of real-time talking head videos where we report competitive quantitative and qualitative performance.\n\n语音驱动的 talking head 近来发展迅速，可用于构建交互式虚拟化身。然而在真实应用中，现有方法通常要么视觉质量高但速度慢，要么速度快但时间稳定性差。扩散方法能够生成逼真图像，但在 one-shot 场景中表现受限。Gaussian Splatting 方法虽然可以实时运行，但由于人脸跟踪不准确或高斯映射不一致，往往会产生不稳定输出和视频伪影，不利于真实使用场景。为解决这一问题，我们利用 3D Morphable Model 对 Gaussian Splatting 进行参数化映射，从而生成人物特定的头像化身。我们引入基于 Transformer 的参数预测模块，直接由音频驱动，以增强时间一致性。给定单目视频和独立语音输入，我们的方法能够实时生成 talking head 视频，并在定量和定性评估中取得有竞争力的结果。\n"
  },
  {
    "path": "abs/2512.11186.md",
    "content": "### Lightweight 3D Gaussian Splatting Compression via Video Codec\n\nCurrent video-based GS compression methods rely on using Parallel Linear Assignment Sorting (PLAS) to convert 3D GS into smooth 2D maps, which are computationally expensive and time-consuming, limiting the application of GS on lightweight devices. In this paper, we propose a Lightweight 3D Gaussian Splatting (GS) Compression method based on Video codec (LGSCV). First, a two-stage Morton scan is proposed to generate blockwise 2D maps that are friendly for canonical video codecs in which the coding units (CU) are square blocks. A 3D Morton scan is used to permute GS primitives, followed by a 2D Morton scan to map the ordered GS primitives to 2D maps in a blockwise style. However, although the blockwise 2D maps report close performance to the PLAS map in high-bitrate regions, they show a quality collapse at medium-to-low bitrates. Therefore, a principal component analysis (PCA) is used to reduce the dimensionality of spherical harmonics (SH), and a MiniPLAS, which is flexible and fast, is designed to permute the primitives within certain block sizes. Incorporating SH PCA and MiniPLAS leads to a significant gain in rate-distortion (RD) performance, especially at medium and low bitrates. MiniPLAS can also guide the setting of the codec CU size configuration and significantly reduce encoding time. Experimental results on the MPEG dataset demonstrate that the proposed LGSCV achieves over 20% RD gain compared with state-of-the-art methods, while reducing 2D map generation time to approximately 1 second and cutting encoding time by 50%. The code is available at https://github.com/Qi-Yangsjtu/LGSCV .\n\n当前基于视频的 GS 压缩方法依赖 Parallel Linear Assignment Sorting (PLAS) 将 3D GS 转为平滑的二维映射，这一过程计算代价高且耗时长，限制了 GS 在轻量设备上的应用。本文提出一种基于视频编解码器的轻量级 3D Gaussian Splatting 压缩方法 LGSCV。首先，我们设计了两阶段 Morton 扫描，以生成更适配标准视频编码器方形编码单元的块状二维映射：先通过三维 Morton 扫描对 GS primitives 重排，再通过二维 Morton 扫描将有序 primitives 映射为块状二维图。然而，尽管这种块状二维图在高码率区域的表现接近 PLAS 图，但在中低码率下会出现明显的质量崩塌。为此，我们利用主成分分析对球谐系数进行降维，并设计了灵活高效的 MiniPLAS，在给定块大小内对 primitives 进行置换。结合 SH PCA 和 MiniPLAS 后，率失真性能显著提升，尤其是在中低码率区域。MiniPLAS 还能指导编码器编码单元大小配置，并显著减少编码时间。在 MPEG 数据集上的实验表明，LGSCV 相比最先进方法取得超过 20% 的率失真增益，同时将二维图生成时间缩短到约 1 秒，并将编码时间减少 50%。代码见 https://github.com/Qi-Yangsjtu/LGSCV 。\n"
  },
  {
    "path": "abs/2512.11356.md",
    "content": "### Prior-Enhanced Gaussian Splatting for Dynamic Scene Reconstruction from Casual Video\n\nWe introduce a fully automatic pipeline for dynamic scene reconstruction from casually captured monocular RGB videos. Rather than designing a new scene representation, we enhance the priors that drive Dynamic Gaussian Splatting. Video segmentation combined with epipolar-error maps yields object-level masks that closely follow thin structures; these masks (i) guide an object-depth loss that sharpens the consistent video depth, and (ii) support skeleton-based sampling plus mask-guided re-identification to produce reliable, comprehensive 2-D tracks. Two additional objectives embed the refined priors in the reconstruction stage: a virtual-view depth loss removes floaters, and a scaffold-projection loss ties motion nodes to the tracks, preserving fine geometry and coherent motion. The resulting system surpasses previous monocular dynamic scene reconstruction methods and delivers visibly superior renderings\n\n我们提出一条完全自动化的流程，用于从随手拍摄的单目 RGB 视频中重建动态场景。我们并未重新设计一种新的场景表示，而是强化驱动 Dynamic Gaussian Splatting 的先验。视频分割结合极线误差图生成紧贴细结构的目标级掩码；这些掩码一方面通过目标深度损失提升视频深度的一致性和清晰度，另一方面结合基于骨架的采样与掩码引导的重识别，生成可靠且完整的 2D 轨迹。我们又在重建阶段引入两个额外目标，将这些改进的先验嵌入优化中：虚拟视角深度损失用于去除漂浮伪影，scaffold 投影损失将运动节点与轨迹绑定，从而保留精细几何并维持连贯运动。最终系统超过了已有的单目动态场景重建方法，并在渲染效果上明显更好。\n"
  },
  {
    "path": "abs/2512.11624.md",
    "content": "### Fast and Explicit: Slice-to-Volume Reconstruction via 3D Gaussian Primitives with Analytic Point Spread Function Modeling\n\nRecovering high-fidelity 3D images from sparse or degraded 2D images is a fundamental challenge in medical imaging, with broad applications ranging from 3D ultrasound reconstruction to MRI super-resolution. In the context of fetal MRI, high-resolution 3D reconstruction of the brain from motion-corrupted low-resolution 2D acquisitions is a prerequisite for accurate neurodevelopmental diagnosis. While implicit neural representations (INRs) have recently established state-of-the-art performance in self-supervised slice-to-volume reconstruction (SVR), they suffer from a critical computational bottleneck: accurately modeling the image acquisition physics requires expensive stochastic Monte Carlo sampling to approximate the point spread function (PSF). In this work, we propose a shift from neural network based implicit representations to Gaussian based explicit representations. By parameterizing the HR 3D image volume as a field of anisotropic Gaussian primitives, we leverage the property of Gaussians being closed under convolution and thus derive a closed-form analytical solution for the forward model. This formulation reduces the previously intractable acquisition integral to an exact covariance addition, `Sigma_obs = Sigma_HR + Sigma_PSF`, effectively bypassing the need for compute-intensive stochastic sampling while ensuring exact gradient propagation. We demonstrate that our approach matches the reconstruction quality of self-supervised state-of-the-art SVR frameworks while delivering a 5x-10x speed-up on neonatal and fetal data. With convergence often reached in under 30 seconds, our framework paves the way towards translation into clinical routine of real-time fetal 3D MRI.\n\n从稀疏或退化的 2D 图像中恢复高保真的 3D 图像是医学成像中的基础挑战，应用范围涵盖 3D 超声重建到 MRI 超分辨率。在胎儿 MRI 场景中，从受运动影响的低分辨率 2D 采集中重建大脑的高分辨率 3D 图像，是进行准确神经发育诊断的前提。尽管隐式神经表示 (INR) 近来在自监督 slice-to-volume reconstruction (SVR) 上取得 SOTA，但它们存在一个关键计算瓶颈：为了准确建模成像物理，需要通过代价高昂的随机蒙特卡洛采样来逼近点扩散函数 (PSF)。本文提出从基于神经网络的隐式表示转向基于 Gaussian 的显式表示。通过把高分辨率 3D 图像体参数化为各向异性 Gaussian primitive 场，我们利用 Gaussian 在卷积下封闭的性质，为前向模型推导出闭式解析解。这一公式把原本难以处理的成像积分简化为精确的协方差相加，即 `Sigma_obs = Sigma_HR + Sigma_PSF`，从而在保证精确梯度传播的同时，绕开高成本的随机采样。实验表明，我们的方法在新生儿和胎儿数据上能够在与自监督 SOTA SVR 框架相当的重建质量下，实现 5 到 10 倍加速，且通常在 30 秒内收敛，为实时胎儿 3D MRI 进入临床流程铺平了道路。\n"
  },
  {
    "path": "abs/2512.11800.md",
    "content": "### Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance\n\nThe recent success of 3D Gaussian Splatting (3DGS) has reshaped novel view synthesis by enabling fast optimization and real-time rendering of high-quality radiance fields. However, it relies on simplified, order-dependent alpha blending and coarse approximations of the density integral within the rasterizer, thereby limiting its ability to render complex, overlapping semi-transparent objects. In this paper, we extend rasterization-based rendering of 3D Gaussian representations with a novel method for high-fidelity transmittance computation, entirely avoiding the need for ray tracing or per-pixel sample sorting. Building on prior work in moment-based order-independent transparency, our key idea is to characterize the density distribution along each camera ray with a compact and continuous representation based on statistical moments. To this end, we analytically derive and compute a set of per-pixel moments from all contributing 3D Gaussians. From these moments, a continuous transmittance function is reconstructed for each ray, which is then independently sampled within each Gaussian. As a result, our method bridges the gap between rasterization and physical accuracy by modeling light attenuation in complex translucent media, significantly improving overall reconstruction and rendering quality.\n\n近年来 3D Gaussian Splatting (3DGS) 的成功通过快速优化和实时渲染高质量辐射场，重塑了新视角合成。然而，它依赖简化的、与顺序相关的 alpha 混合，以及光栅化器中对密度积分的粗略近似，因此难以渲染复杂、相互重叠的半透明物体。本文提出一种用于 3D Gaussian 表示光栅化渲染的高保真透射率计算方法，完全避免射线追踪和逐像素样本排序。基于以矩为基础的顺序无关透明度工作，我们的核心思想是用基于统计矩的紧凑连续表示来刻画每条相机射线上的密度分布。为此，我们从所有有贡献的 3D Gaussian 中解析推导并计算每个像素的一组矩，再据此为每条射线重建连续的透射率函数，并在每个 Gaussian 内独立采样。由此，该方法通过在复杂半透明介质中更准确地建模光衰减，缩小了光栅化与物理准确性之间的差距，并显著提升了整体重建和渲染质量。\n"
  },
  {
    "path": "abs/2512.13411.md",
    "content": "### Computer vision training dataset generation for robotic environments using Gaussian splatting\n\nThis paper introduces a novel pipeline for generating large-scale, highly realistic, and automatically labeled datasets for computer vision tasks in robotic environments. Our approach addresses the critical challenges of the domain gap between synthetic and real-world imagery and the time-consuming bottleneck of manual annotation. We leverage 3D Gaussian Splatting (3DGS) to create photorealistic representations of the operational environment and objects. These assets are then used in a game engine where physics simulations create natural arrangements. A novel, two-pass rendering technique combines the realism of splats with a shadow map generated from proxy meshes. This map is then algorithmically composited with the image to add both physically plausible shadows and subtle highlights, significantly enhancing realism. Pixel-perfect segmentation masks are generated automatically and formatted for direct use with object detection models like YOLO. Our experiments show that a hybrid training strategy, combining a small set of real images with a large volume of our synthetic data, yields the best detection and segmentation performance, confirming this as an optimal strategy for efficiently achieving robust and accurate models.\n\n本文提出一条新的流程，用于在机器人环境中生成大规模、高真实感且可自动标注的计算机视觉训练数据集。我们的方法针对两个关键问题：合成图像与真实图像之间的域差距，以及人工标注耗时这一瓶颈。我们利用 3D Gaussian Splatting (3DGS) 构建作业环境和目标物体的高真实感表示，再将这些资产导入游戏引擎，通过物理仿真生成自然摆放。我们提出一种新的双阶段渲染方法，将 splat 的真实感与由代理网格生成的阴影贴图结合起来，再通过算法将其与图像合成，从而加入物理上更合理的阴影和细微高光，显著提升真实感。系统还能自动生成像素级分割掩码，并直接格式化为 YOLO 等目标检测模型可用的数据。实验表明，将少量真实图像与大量合成数据结合的混合训练策略，能够取得最佳的检测与分割性能，证明这是高效获得稳健且准确模型的优选方案。\n"
  },
  {
    "path": "abs/2512.13796.md",
    "content": "### Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries\n\nThough Gaussian splatting has achieved impressive results in novel view synthesis, it requires millions of primitives to model highly textured scenes, even when the geometry of the scene is simple. We propose a representation that goes beyond point-based rendering and decouples geometry and appearance in order to achieve a compact representation. We use surfels for geometry and a combination of a global neural field and per-primitive colours for appearance. The neural field textures a fixed number of primitives for each pixel, ensuring that the added compute is low. Our representation matches the perceptual quality of 3D Gaussian splatting while using $9.7\\\\times$ fewer primitives and $5.5\\\\times$ less memory on outdoor scenes and using $31\\\\times$ fewer primitives and $3.7\\\\times$ less memory on indoor scenes. Our representation also renders twice as fast as existing textured primitives while improving upon their visual quality.\n\n尽管 Gaussian splatting 在新视角合成中取得了令人印象深刻的结果，但即便场景几何较为简单，它仍需要数百万个 primitive 来建模高纹理场景。我们提出一种超越点渲染的表示，通过解耦几何与外观来实现更紧凑的表示。我们使用 surfel 表示几何，并用全局神经场与每个 primitive 的颜色组合表示外观。神经场会为每个像素的固定数量 primitive 提供纹理，因此额外计算开销较低。我们的表示在感知质量上可与 3D Gaussian Splatting 相当，同时在室外场景中使用的 primitive 数量减少 9.7 倍、显存减少 5.5 倍，在室内场景中 primitive 数量减少 31 倍、显存减少 3.7 倍。该表示的渲染速度也比现有纹理 primitive 快 2 倍，同时视觉质量更好。\n"
  },
  {
    "path": "abs/2512.14039.md",
    "content": "### ASAP-Textured Gaussians: Enhancing Textured Gaussians with Adaptive Sampling and Anisotropic Parameterization\n\nRecent advances have equipped 3D Gaussian Splatting with texture parameterizations to capture spatially varying attributes, improving the performance of both appearance modeling and downstream tasks. However, the added texture parameters introduce significant memory efficiency challenges. Rather than proposing new texture formulations, we take a step back to examine the characteristics of existing textured Gaussian methods and identify two key limitations in common: (1) Textures are typically defined in canonical space, leading to inefficient sampling that wastes textures' capacity on low-contribution regions; and (2) texture parameterization is uniformly assigned across all Gaussians, regardless of their visual complexity, resulting in over-parameterization. In this work, we address these issues through two simple yet effective strategies: adaptive sampling based on the Gaussian density distribution and error-driven anisotropic parameterization that allocates texture resources according to rendering error. Our proposed ASAP Textured Gaussians, short for Adaptive Sampling and Anisotropic Parameterization, significantly improve the quality efficiency tradeoff, achieving high-fidelity rendering with far fewer texture parameters.\n\n近期的工作已经为 3D Gaussian Splatting 引入了纹理参数化，以捕获空间变化属性，从而提升外观建模和下游任务的表现。然而，新增的纹理参数会带来显著的内存效率问题。我们并未提出新的纹理形式，而是回过头审视现有 textured Gaussian 方法的共同特征，并指出其中两个关键局限：(1) 纹理通常定义在 canonical space 中，导致采样效率低下，使大量纹理容量浪费在贡献较低的区域；(2) 纹理参数化在所有 Gaussians 上统一分配，而不考虑其视觉复杂度，从而造成过度参数化。为解决这些问题，我们提出两个简单而有效的策略：基于 Gaussian 密度分布的自适应采样，以及基于渲染误差驱动的各向异性参数分配，按需为不同 Gaussian 分配纹理资源。我们提出的 ASAP-Textured Gaussians，即 Adaptive Sampling and Anisotropic Parameterization，在质量与效率之间取得了更优权衡，能够使用更少的纹理参数实现高保真渲染。\n"
  },
  {
    "path": "abs/2512.14087.md",
    "content": "### GaussianPlant: Structure-aligned Gaussian Splatting for 3D Reconstruction of Plants\n\nWe present a method for jointly recovering the appearance and internal structure of botanical plants from multi-view images based on 3D Gaussian Splatting (3DGS). While 3DGS exhibits robust reconstruction of scene appearance for novel-view synthesis, it lacks structural representations underlying those appearances (e.g., branching patterns of plants), which limits its applicability to tasks such as plant phenotyping. To achieve both high-fidelity appearance and structural reconstruction, we introduce GaussianPlant, a hierarchical 3DGS representation, which disentangles structure and appearance. Specifically, we employ structure primitives (StPs) to explicitly represent branch and leaf geometry, and appearance primitives (ApPs) to the plants' appearance using 3D Gaussians. StPs represent a simplified structure of the plant, i.e., modeling branches as cylinders and leaves as disks. To accurately distinguish the branches and leaves, StP's attributes (i.e., branches or leaves) are optimized in a self-organized manner. ApPs are bound to each StP to represent the appearance of branches or leaves as in conventional 3DGS. StPs and ApPs are jointly optimized using a re-rendering loss on the input multi-view images, as well as the gradient flow from ApP to StP using the binding correspondence information. We conduct experiments to qualitatively evaluate the reconstruction accuracy of both appearance and structure, as well as real-world experiments to qualitatively validate the practical performance. Experiments show that the GaussianPlant achieves both high-fidelity appearance reconstruction via ApPs and accurate structural reconstruction via StPs, enabling the extraction of branch structure and leaf instances.\n\n我们提出一种基于 3D Gaussian Splatting (3DGS) 的方法，用于从多视角图像中联合恢复植物的外观和内部结构。虽然 3DGS 在用于新视角合成的场景外观重建方面表现稳健，但它缺乏支撑这些外观的结构表示，例如植物的分枝模式，这限制了其在植物表型分析等任务中的应用。为了同时实现高保真外观重建和结构重建，我们提出 GaussianPlant，这是一种分层式 3DGS 表示，将结构与外观显式解耦。具体来说，我们使用 structure primitives (StPs) 显式表示枝条和叶片的几何，而使用 appearance primitives (ApPs) 通过 3D Gaussians 表示植物外观。StPs 以简化结构形式表示植物，即用圆柱体建模枝条，用圆盘建模叶片。为了准确区分枝条和叶片，StP 的属性通过自组织方式进行优化。ApPs 则绑定到每个 StP 上，以常规 3DGS 的方式表示枝条或叶片的外观。StPs 和 ApPs 通过输入多视角图像上的重渲染损失以及 ApP 到 StP 的梯度流进行联合优化，后者依赖绑定对应关系信息。我们通过实验从外观和结构两个方面定性评估了重建准确性，并通过真实世界实验验证其实用性。结果表明，GaussianPlant 能同时实现通过 ApPs 进行高保真外观重建和通过 StPs 进行准确结构重建，从而支持分枝结构和叶片实例的提取。\n"
  },
  {
    "path": "abs/2512.14126.md",
    "content": "### Consistent Instance Field for Dynamic Scene Understanding\n\nUnderstanding dynamic scenes remains a core challenge in computer vision, requiring the modeling of geometry, semantics, and scene dynamics simultaneously. Traditional methods have focused primarily on motion tracking and on-demand rendering, with limited progress in broader scene understanding. Recent neural field reconstructions based on 3D Gaussian representations show strong potential for geometry and appearance modeling. However, direct projection of 2D foundation model features onto 3D Gaussians often fails in dynamic settings, as 2D features are inherently inconsistent and fragmented. To address this, we propose the Consistent Instance Field (CIF), a neural rendering-based approach that transforms sparse and inconsistent 2D features into a consistent, compact 3D feature representation by aggregating dynamic 3D Gaussian features over time. CIF constructs temporally aligned 3D feature representations for real-world long videos in an unsupervised manner, enabling text, point, or box-based querying of target objects and interactive editing for dynamic scene understanding.\n\n理解动态场景一直是计算机视觉中的核心挑战，因为这要求同时建模几何、语义和场景动态。传统方法主要关注运动跟踪和按需渲染，而在更广义的场景理解方面进展有限。近期基于 3D Gaussian 表示的神经场重建在几何与外观建模方面展现出很强潜力。然而，在动态场景中，直接将 2D 基础模型特征投影到 3D Gaussian 上通常会失败，因为 2D 特征本身是不一致且碎片化的。为解决这一问题，我们提出 Consistent Instance Field (CIF)，这是一种基于 neural rendering 的方法，通过随时间聚合动态 3D Gaussian 特征，把稀疏且不一致的 2D 特征转换为一致且紧凑的 3D 特征表示。CIF 能够以无监督方式为真实长视频构建时间对齐的 3D 特征表示，并支持基于文本、点或框的目标查询，以及面向动态场景理解的交互式编辑。\n"
  },
  {
    "path": "abs/2512.14180.md",
    "content": "### Spherical Voronoi: Directional Appearance as a Differentiable Partition of the Sphere\n\nRadiance field methods (e.g. 3D Gaussian Splatting) have emerged as a powerful paradigm for novel view synthesis, yet their appearance modeling often relies on Spherical Harmonics (SH), which impose fundamental limitations. SH struggle with high-frequency signals, exhibit Gibbs ringing artifacts, and fail to capture specular reflections - a key component of realistic rendering. Although alternatives like spherical Gaussians offer improvements, they add significant optimization complexity. We propose Spherical Voronoi (SV) as a unified framework for appearance representation in 3D Gaussian Splatting. SV partitions the directional domain into learnable regions with smooth boundaries, providing an intuitive and stable parameterization for view-dependent effects. For diffuse appearance, SV achieves competitive results while keeping optimization simpler than existing alternatives. For reflections - where SH fail - we leverage SV as learnable reflection probes, taking reflected directions as input following principles from classical graphics. This formulation attains state-of-the-art results on synthetic and real-world datasets, demonstrating that SV offers a principled, efficient, and general solution for appearance modeling in explicit 3D representations.\n\nRadiance field 方法，例如 3D Gaussian Splatting，已经成为新视角合成中的强大范式，但其外观建模通常依赖球谐函数 (Spherical Harmonics, SH)，而 SH 存在根本性限制。它们难以表达高频信号，容易出现 Gibbs 振铃伪影，也无法有效捕捉真实感渲染中至关重要的镜面反射。尽管球面高斯等替代方案有所改进，但它们会显著增加优化复杂度。为此，我们提出 Spherical Voronoi (SV)，作为一种统一的 3D Gaussian Splatting 外观表示框架。SV 将方向域划分为带有平滑边界的可学习区域，从而为视角相关效应提供更直观、更稳定的参数化方式。对于漫反射外观，SV 在保持优化更简单的同时取得了与现有方法有竞争力的结果。对于 SH 无法处理的反射项，我们将 SV 作为可学习反射探针，并按照经典图形学原理以反射方向作为输入。这一表述在合成数据集和真实数据集上都达到了最先进结果，表明 SV 为显式三维表示中的外观建模提供了一种原则明确、高效且通用的解决方案。\n"
  },
  {
    "path": "abs/2512.14200.md",
    "content": "### Beyond a Single Light: A Large-Scale Aerial Dataset for Urban Scene Reconstruction Under Varying Illumination\n\nRecent advances in Neural Radiance Fields and 3D Gaussian Splatting have demonstrated strong potential for large-scale UAV-based 3D reconstruction tasks by fitting the appearance of images. However, real-world large-scale captures are often based on multi-temporal data capture, where illumination inconsistencies across different times of day can significantly lead to color artifacts, geometric inaccuracies, and inconsistent appearance. Due to the lack of UAV datasets that systematically capture the same areas under varying illumination conditions, this challenge remains largely underexplored. To fill this gap, we introduceSkyLume, a large-scale, real-world UAV dataset specifically designed for studying illumination robust 3D reconstruction in urban scene modeling: (1) We collect data from 10 urban regions data comprising more than 100k high resolution UAV images (four oblique views and nadir), where each region is captured at three periods of the day to systematically isolate illumination changes. (2) To support precise evaluation of geometry and appearance, we provide per-scene LiDAR scans and accurate 3D ground-truth for assessing depth, surface normals, and reconstruction quality under varying illumination. (3) For the inverse rendering task, we introduce the Temporal Consistency Coefficient (TCC), a metric that measuress cross-time albedo stability and directly evaluates the robustness of the disentanglement of light and material. We aim for this resource to serve as a foundation that advances research and real-world evaluation in large-scale inverse rendering, geometry reconstruction, and novel view synthesis.\n\n近期 Neural Radiance Fields 和 3D Gaussian Splatting 在大规模基于无人机的三维重建任务中展示了很强潜力，它们通过拟合图像外观来实现高质量重建。然而，真实世界中的大规模采集往往依赖多时段数据采集，不同时段之间的光照不一致会显著导致颜色伪影、几何误差和外观不一致。由于缺乏能够系统性记录同一区域在不同光照条件下数据的无人机数据集，这一问题仍然缺少充分研究。为填补这一空白，我们提出 SkyLume，这是一个专门为研究城市场景中光照鲁棒三维重建而设计的大规模真实世界无人机数据集：(1) 我们从 10 个城市区域采集超过 10 万张高分辨率无人机图像，每个区域都在一天中的三个时段采集，以系统性隔离光照变化；(2) 为了支持几何和外观的精确评估，我们提供逐场景 LiDAR 扫描和准确的三维真值，用于评估深度、表面法线和不同光照下的重建质量；(3) 针对逆渲染任务，我们提出 Temporal Consistency Coefficient (TCC)，用于衡量跨时间的 albedo 稳定性，并直接评估光照与材质解耦的鲁棒性。我们希望该资源能成为推动大规模逆渲染、几何重建和新视角合成研究与真实世界评测的基础。\n"
  },
  {
    "path": "abs/2512.14352.md",
    "content": "### HGS: Hybrid Gaussian Splatting with Static-Dynamic Decomposition for Compact Dynamic View Synthesis\n\nDynamic novel view synthesis (NVS) is essential for creating immersive experiences. Existing approaches have advanced dynamic NVS by introducing 3D Gaussian Splatting (3DGS) with implicit deformation fields or indiscriminately assigned time-varying parameters, surpassing NeRF-based methods. However, due to excessive model complexity and parameter redundancy, they incur large model sizes and slow rendering speeds, making them inefficient for real-time applications, particularly on resource-constrained devices. To obtain a more efficient model with fewer redundant parameters, in this paper, we propose Hybrid Gaussian Splatting (HGS), a compact and efficient framework explicitly designed to disentangle static and dynamic regions of a scene within a unified representation. The core innovation of HGS lies in our Static-Dynamic Decomposition (SDD) strategy, which leverages Radial Basis Function (RBF) modeling for Gaussian primitives. Specifically, for dynamic regions, we employ time-dependent RBFs to effectively capture temporal variations and handle abrupt scene changes, while for static regions, we reduce redundancy by sharing temporally invariant parameters. Additionally, we introduce a two-stage training strategy tailored for explicit models to enhance temporal coherence at static-dynamic boundaries. Experimental results demonstrate that our method reduces model size by up to 98% and achieves real-time rendering at up to 125 FPS at 4K resolution on a single RTX 3090 GPU. It further sustains 160 FPS at 1352 * 1014 on an RTX 3050 and has been integrated into the VR system. Moreover, HGS achieves comparable rendering quality to state-of-the-art methods while providing significantly improved visual fidelity for high-frequency details and abrupt scene changes.\n\n动态新视角合成对构建沉浸式体验至关重要。现有方法通过在 3D Gaussian Splatting (3DGS) 中引入隐式形变场，或直接为所有 Gaussian indiscriminately 赋予随时间变化的参数，已经在动态新视角合成中超越了基于 NeRF 的方法。然而，由于模型复杂度过高和参数冗余严重，这些方法通常模型体积大、渲染速度慢，在资源受限设备上尤其不适合实时应用。为获得更高效且冗余更少的模型，我们提出 Hybrid Gaussian Splatting (HGS)，这是一个紧凑且高效的框架，能够在统一表示中显式解耦场景的静态区域和动态区域。HGS 的核心创新在于 Static-Dynamic Decomposition (SDD) 策略，它利用 Radial Basis Function (RBF) 对 Gaussian primitives 进行建模。具体而言，对于动态区域，我们使用时间相关的 RBF 以捕捉时间变化并处理突发场景变化；对于静态区域，则通过共享时间不变参数来减少冗余。此外，我们提出面向显式模型的两阶段训练策略，以增强静态与动态边界上的时间一致性。实验结果表明，该方法可将模型大小最多减少 98%，并在单张 RTX 3090 上实现最高 125 FPS 的 4K 实时渲染；在 RTX 3050 上也能以 160 FPS 运行于 1352×1014 分辨率，并已集成到 VR 系统中。同时，HGS 在保持与最先进方法相当渲染质量的同时，显著提升了高频细节和突发场景变化的视觉保真度。\n"
  },
  {
    "path": "abs/2512.14406.md",
    "content": "### Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos\n\nDespite recent progress in dynamic scene novel view synthesis, existing methods typically rely on stable camera trajectories or assume variations around a single constrained motion pattern, such as a circular trajectory with fixed radius. These restrictions limit their applicability to videos captured by mobile cameras with broad movement, such as handheld or drone footage. We propose a novel method for reconstructing a high-fidelity dynamic 3D Gaussian scene from a single freely moving monocular video and synthesizing it over a significantly broader range of viewpoints centered around the original camera trajectory. Our approach combines coarse-grained dynamic reconstruction with extensive camera trajectory interpolation, effectively leveraging reference frames in dynamic masked regions while preserving temporal consistency under wider viewpoint variation. Compared to existing methods, ours expands the coverage of camera trajectory variations by up to 64%, while still synthesizing realistic and temporally coherent dynamic scene sequences.\n\n尽管动态场景的新视角合成发展迅速，但现有方法通常依赖稳定的相机轨迹，或假设相机变化围绕单一受限运动模式，例如固定半径的环绕轨迹。这些限制使得它们难以适用于由大范围移动的手持或无人机相机拍摄的视频。我们提出一种新方法，可从单个自由移动的单目视频中重建高保真的动态 3D Gaussian 场景，并在以原始相机轨迹为中心、显著更宽的视角范围内进行合成。我们的方法将粗粒度动态重建与大规模相机轨迹插值结合，在动态掩码区域有效利用参考帧，并在更宽视角变化下保持时间一致性。相比现有方法，我们将相机轨迹变化覆盖范围最多扩大 64%，同时仍能合成真实且时间连贯的动态场景序列。\n"
  },
  {
    "path": "abs/2512.15048.md",
    "content": "### MVGSR: Multi-View Consistent 3D Gaussian Super-Resolution via Epipolar Guidance\n\nScenes reconstructed by 3D Gaussian Splatting (3DGS) trained on low-resolution (LR) images are unsuitable for high-resolution (HR) rendering. Consequently, a 3DGS super-resolution (SR) method is needed to bridge LR inputs and HR rendering. Early 3DGS SR methods rely on single-image SR networks, which lack cross-view consistency and fail to fuse complementary information across views. More recent video-based SR approaches attempt to address this limitation but require strictly sequential frames, limiting their applicability to unstructured multi-view datasets. In this work, we introduce Multi-View Consistent 3D Gaussian Splatting Super-Resolution (MVGSR), a framework that focuses on integrating multi-view information for 3DGS rendering with high-frequency details and enhanced consistency. We first propose an Auxiliary View Selection Method based on camera poses, making our method adaptable for arbitrarily organized multi-view datasets without the need of temporal continuity or data reordering. Furthermore, we introduce, for the first time, an epipolar-constrained multi-view attention mechanism into 3DGS SR, which serves as the core of our proposed multi-view SR network. This design enables the model to selectively aggregate consistent information from auxiliary views, enhancing the geometric consistency and detail fidelity of 3DGS representations. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both object-centric and scene-level 3DGS SR benchmarks.\n\n在低分辨率图像上训练得到的 3D Gaussian Splatting（3DGS）场景并不适合进行高分辨率渲染，因此需要一种 3DGS 超分辨率方法来连接低分辨率输入与高分辨率渲染。早期 3DGS 超分方法依赖单图像超分网络，缺乏跨视角一致性，也无法融合不同视角中的互补信息。后来的视频超分方法虽然尝试解决这一问题，但要求输入必须是严格连续的帧，从而限制了它们在非结构化多视角数据集上的适用性。本文提出 Multi-View Consistent 3D Gaussian Splatting Super-Resolution（MVGSR），重点在于整合多视角信息，以为 3DGS 渲染带来更多高频细节和更强一致性。我们首先提出一种基于相机位姿的辅助视角选择方法，使该方法能够适配任意组织形式的多视角数据集，而不需要时间连续性或数据重排序。进一步地，我们首次将受极线约束的多视角注意力机制引入 3DGS 超分任务，并将其作为所提出多视角超分网络的核心。该设计使模型能够有选择地聚合来自辅助视角的一致信息，从而增强 3DGS 表示的几何一致性与细节保真度。大量实验表明，我们的方法在以物体为中心和场景级的 3DGS 超分基准上都达到了最先进性能。\n"
  },
  {
    "path": "abs/2512.15258.md",
    "content": "### VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments\n\nThis paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent in stochastic generative policies. Finally, through deep optimization of the onboard deployment pipeline, VLA-AN achieves a robust real-time 8.3x improvement in inference throughput on resource-constrained UAVs. Extensive experiments demonstrate that VLA-AN significantly improves spatial grounding, scene reasoning, and long-horizon navigation, achieving a maximum single-task success rate of 98.1%, and providing an efficient, practical solution for realizing full-chain closed-loop autonomy in lightweight aerial robots.\n\n本文提出 VLA-AN，一个面向复杂环境下自主无人机导航的高效机载 Vision-Language-Action (VLA) 框架。VLA-AN 旨在解决现有大型空中导航模型的四个主要问题：数据域差距、长时序导航推理不足、生成式动作策略带来的安全风险，以及机载部署受限。首先，我们利用 3D Gaussian Splatting (3D-GS) 构建高保真数据集，以有效缩小数据域差距。其次，我们提出渐进式三阶段训练框架，依次强化场景理解、基础飞行技能和复杂导航能力。第三，我们设计了轻量、实时的动作模块，并结合几何安全校正机制，以保证动作生成快速、无碰撞且稳定，从而缓解随机生成式策略固有的安全问题。最后，通过对机载部署流程进行深度优化，VLA-AN 在资源受限无人机上实现了显著的实时推理吞吐提升。大量实验表明，VLA-AN 在空间定位、场景推理和长时程导航方面均有明显提升，单任务最高成功率达到 98.1%，为轻量级空中机器人实现全链路闭环自主提供了一种高效而实用的方案。\n"
  },
  {
    "path": "abs/2512.15508.md",
    "content": "### Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting\n\nFeed-forward 3D Gaussian Splatting (3DGS) models enable real-time scene generation but are hindered by suboptimal pixel-aligned primitive placement, which relies on a dense, rigid grid and limits both quality and efficiency. We introduce a new feed-forward architecture that detects 3D Gaussian primitives at a sub-pixel level, replacing the pixel grid with an adaptive, \"Off The Grid\" distribution. Inspired by keypoint detection, our multi-resolution decoder learns to distribute primitives across image patches. This module is trained end-to-end with a 3D reconstruction backbone using self-supervised learning. Our resulting pose-free model generates photorealistic scenes in seconds, achieving state-of-the-art novel view synthesis for feed-forward models. It outperforms competitors while using far fewer primitives, demonstrating a more accurate and efficient allocation that captures fine details and reduces artifacts. Moreover, we observe that by learning to render 3D Gaussians, our 3D reconstruction backbone improves camera pose estimation, suggesting opportunities to train these foundational models without labels.\n\n前馈式 3D Gaussian Splatting（3DGS）模型能够实时生成场景，但其 primitive 放置方式仍受制于像素对齐策略，需要依赖致密且刚性的网格，从而限制了质量和效率。我们提出一种新的前馈架构，在子像素级检测 3D Gaussian primitives，以自适应的“Off The Grid”分布取代像素网格。受关键点检测启发，我们的多分辨率解码器能够学习如何在图像 patch 上分布 primitives。该模块与 3D 重建骨干网络一起，通过自监督学习端到端训练。最终得到的无位姿模型能够在数秒内生成写实场景，并在前馈模型的新视角合成任务上达到最先进性能。它在使用更少 primitives 的情况下优于现有方法，说明这种分配方式更准确、更高效，能够捕获细节并减少伪影。此外，我们还观察到，通过学习渲染 3D Gaussians，3D 重建骨干网络还能提升相机位姿估计，说明这类基础模型有望在无标签条件下训练。\n"
  },
  {
    "path": "abs/2512.15711.md",
    "content": "### Gaussian Pixel Codec Avatars: A Hybrid Representation for Efficient Rendering\n\nWe present Gaussian Pixel Codec Avatars (GPiCA), photorealistic head avatars that can be generated from multi-view images and efficiently rendered on mobile devices. GPiCA utilizes a unique hybrid representation that combines a triangle mesh and anisotropic 3D Gaussians. This combination maximizes memory and rendering efficiency while maintaining a photorealistic appearance. The triangle mesh is highly efficient in representing surface areas like facial skin, while the 3D Gaussians effectively handle non-surface areas such as hair and beard. To this end, we develop a unified differentiable rendering pipeline that treats the mesh as a semi-transparent layer within the volumetric rendering paradigm of 3D Gaussian Splatting. We train neural networks to decode a facial expression code into three components: a 3D face mesh, an RGBA texture, and a set of 3D Gaussians. These components are rendered simultaneously in a unified rendering engine. The networks are trained using multi-view image supervision. Our results demonstrate that GPiCA achieves the realism of purely Gaussian-based avatars while matching the rendering performance of mesh-based avatars.\n\n我们提出 Gaussian Pixel Codec Avatars（GPiCA），这是一种能够由多视角图像生成并在移动设备上高效渲染的照片级头部化身。GPiCA 采用独特的混合表示，将三角网格与各向异性 3D 高斯结合起来，在保持照片级外观的同时最大化内存与渲染效率。三角网格非常适合表示面部皮肤等表面区域，而 3D 高斯则能有效处理头发和胡须等非表面区域。为此，我们开发了统一的可微渲染管线，将网格视为 3D Gaussian Splatting 体渲染范式中的半透明层。我们训练神经网络将面部表情编码解码为三个组成部分：3D 人脸网格、RGBA 纹理以及一组 3D 高斯。这些部分在统一渲染引擎中同时渲染，网络通过多视角图像监督进行训练。实验结果表明，GPiCA 在达到纯高斯化身真实感的同时，也具备与基于网格化身相当的渲染性能。\n"
  },
  {
    "path": "abs/2512.16397.md",
    "content": "### Using Gaussian Splats to Create High-Fidelity Facial Geometry and Texture\n\nWe leverage increasingly popular three-dimensional neural representations in order to construct a unified and consistent explanation of a collection of uncalibrated images of the human face. Our approach utilizes Gaussian Splatting, since it is more explicit and thus more amenable to constraints than NeRFs. We leverage segmentation annotations to align the semantic regions of the face, facilitating the reconstruction of a neutral pose from only 11 images (as opposed to requiring a long video). We soft constrain the Gaussians to an underlying triangulated surface in order to provide a more structured Gaussian Splat reconstruction, which in turn informs subsequent perturbations to increase the accuracy of the underlying triangulated surface. The resulting triangulated surface can then be used in a standard graphics pipeline. In addition, and perhaps most impactful, we show how accurate geometry enables the Gaussian Splats to be transformed into texture space where they can be treated as a view-dependent neural texture. This allows one to use high visual fidelity Gaussian Splatting on any asset in a scene without the need to modify any other asset or any other aspect (geometry, lighting, renderer, etc.) of the graphics pipeline. We utilize a relightable Gaussian model to disentangle texture from lighting in order to obtain a delit high-resolution albedo texture that is also readily usable in a standard graphics pipeline. The flexibility of our system allows for training with disparate images, even with incompatible lighting, facilitating robust regularization. Finally, we demonstrate the efficacy of our approach by illustrating its use in a text-driven asset creation pipeline.\n\n我们利用日益流行的三维神经表示，对一组未经标定的人脸图像构建统一且一致的解释。我们选择 Gaussian Splatting，因为它比 NeRF 更显式，更便于施加约束。借助分割标注，我们对齐人脸的语义区域，从而只用 11 张图像就能重建中性姿态，而不需要长视频。我们对 Gaussian 施加软约束，使其贴合底层三角网格表面，从而得到更结构化的 Gaussian Splat 重建，并反过来指导后续扰动以提升底层三角网格表面的精度。得到的三角网格随后可以直接用于标准图形管线。更重要的是，我们展示了准确几何如何使 Gaussian Splats 能被变换到纹理空间，并作为视角相关的神经纹理处理。这使得在无需修改场景中其他资产，也无需改动图形管线中的几何、光照、渲染器等其他部分的情况下，就能把高视觉保真的 Gaussian Splatting 用到任意资产上。我们还利用可重光照的 Gaussian 模型将纹理与光照解耦，以获得去光照的高分辨率 albedo 纹理，同样可直接用于标准图形管线。该系统具有很强灵活性，能够利用来源不同、甚至光照不兼容的图像进行训练，从而实现稳健正则化。最后，我们通过一个文本驱动的资产创建流程展示了该方法的有效性。\n"
  },
  {
    "path": "abs/2512.16706.md",
    "content": "### SDFoam: Signed-Distance Foam for explicit surface reconstruction\n\nNeural radiance fields (NeRF) have driven impressive progress in view synthesis by using ray-traced volumetric rendering. Splatting-based methods such as 3D Gaussian Splatting (3DGS) provide faster rendering by rasterizing 3D primitives. RadiantFoam (RF) brought ray tracing back, achieving throughput comparable to Gaussian Splatting by organizing radiance with an explicit Voronoi Diagram (VD). Yet, all the mentioned methods still struggle with precise mesh reconstruction. We address this gap by jointly learning an explicit VD with an implicit Signed Distance Field (SDF). The scene is optimized via ray tracing and regularized by an Eikonal objective. The SDF introduces metric-consistent isosurfaces, which, in turn, bias near-surface Voronoi cell faces to align with the zero level set. The resulting model produces crisper, view-consistent surfaces with fewer floaters and improved topology, while preserving photometric quality and maintaining training speed on par with RadiantFoam. Across diverse scenes, our hybrid implicit-explicit formulation, which we name SDFoam, substantially improves mesh reconstruction accuracy (Chamfer distance) with comparable appearance (PSNR, SSIM), without sacrificing efficiency.\n\nNeural radiance fields (NeRF) 通过基于光线追踪的体渲染推动了视图合成领域的巨大进展。基于 splatting 的方法，例如 3D Gaussian Splatting (3DGS)，通过对三维 primitives 进行光栅化实现了更快的渲染。RadiantFoam (RF) 则将光线追踪重新引入该方向，通过使用显式 Voronoi 图组织辐射信息，实现了与 Gaussian Splatting 相当的吞吐性能。然而，这些方法在精确网格重建方面仍然存在困难。为此，我们提出将显式 Voronoi 图与隐式 Signed Distance Field (SDF) 联合学习。场景通过光线追踪进行优化，并辅以 Eikonal 正则项。SDF 引入了具有度量一致性的等值面，进一步推动靠近表面的 Voronoi 单元面与零水平集对齐。最终模型在保持光度质量和训练速度不逊于 RadiantFoam 的同时，生成了更清晰、视图更一致、浮点伪影更少、拓扑结构更好的表面。在多个场景上的实验表明，我们命名为 SDFoam 的这一显式与隐式混合表示，在不牺牲效率的前提下，显著提升了网格重建精度，同时保持与基线相当的外观质量。\n"
  },
  {
    "path": "abs/2512.16893.md",
    "content": "### Instant Expressive Gaussian Head Avatar via 3D-Aware Expression Distillation\n\nPortrait animation has witnessed tremendous quality improvements thanks to recent advances in video diffusion models. However, these 2D methods often compromise 3D consistency and speed, limiting their applicability in real-world scenarios, such as digital twins or telepresence. In contrast, 3D-aware facial animation feedforward methods, built upon explicit 3D representations such as neural radiance fields or Gaussian splatting, ensure 3D consistency and achieve faster inference speed, but come with inferior expression details. In this paper, we aim to combine their strengths by distilling knowledge from a 2D diffusion-based method into a feed-forward encoder, which instantly converts an in-the-wild single image into a 3D-consistent, fast yet expressive animatable representation. Our animation representation is decoupled from the face's 3D representation and learns motion implicitly from data, eliminating the dependency on pre-defined parametric models that often constrain animation capabilities. Unlike previous computationally intensive global fusion mechanisms for fusing 3D structural and animation information, our design employs an efficient lightweight local fusion strategy to achieve high animation expressivity. As a result, our method runs at 107.31 FPS for animation and pose control while achieving comparable animation quality to the state of the art.\n\n得益于视频扩散模型的进展，人像动画在质量上取得了显著提升。然而，这类 2D 方法通常会牺牲 3D 一致性和速度，限制了其在数字孪生、远程呈现等真实场景中的应用。相比之下，基于显式 3D 表示，如 neural radiance fields 或 Gaussian splatting 的 3D-aware 前馈人脸动画方法能够保证 3D 一致性并实现更快推理，但表情细节较弱。本文旨在结合两者优点，将 2D 扩散模型的知识蒸馏到一个前馈编码器中，使其能够把一张 in-the-wild 单图像即时转换为 3D 一致、速度快且表情丰富的可动画表示。我们的动画表示与人脸的 3D 表示解耦，并从数据中隐式学习运动，摆脱了对预定义参数化模型的依赖，从而避免这类模型对动画能力的限制。不同于以往使用高代价全局融合机制来融合 3D 结构与动画信息，我们采用轻量的局部融合策略，以高效实现强表情表现力。最终，该方法在动画与位姿控制上可达 107.31 FPS，同时在动画质量上与 SOTA 相当。\n"
  },
  {
    "path": "abs/2512.17094.md",
    "content": "### DGH: Dynamic Gaussian Hair\n\nThe creation of photorealistic dynamic hair remains a major challenge in digital human modeling because of the complex motions, occlusions, and light scattering. Existing methods often resort to static capture and physics-based models that do not scale as they require manual parameter fine-tuning to handle the diversity of hairstyles and motions, and heavy computation to obtain high-quality appearance. In this paper, we present Dynamic Gaussian Hair (DGH), a novel framework that efficiently learns hair dynamics and appearance. We propose: (1) a coarse-to-fine model that learns temporally coherent hair motion dynamics across diverse hairstyles; (2) a strand-guided optimization module that learns a dynamic 3D Gaussian representation for hair appearance with support for differentiable rendering, enabling gradient-based learning of view-consistent appearance under motion. Unlike prior simulation-based pipelines, our approach is fully data-driven, scales with training data, and generalizes across various hairstyles and head motion sequences. Additionally, DGH can be seamlessly integrated into a 3D Gaussian avatar framework, enabling realistic, animatable hair for high-fidelity avatar representation. DGH achieves promising geometry and appearance results, providing a scalable, data-driven alternative to physics-based simulation and rendering.\n\n在数字人建模中，构建照片级真实的动态头发仍是一项重大挑战，因为头发具有复杂的运动、遮挡和光散射。现有方法通常依赖静态采集和基于物理的模型，但这类方法需要手工微调参数以适应多样的发型和运动，并且需要大量计算才能获得高质量外观，因而难以扩展。本文提出 Dynamic Gaussian Hair (DGH)，这是一个能够高效学习头发动态与外观的新框架。我们提出：(1) 一个粗到细模型，用于在多样发型之间学习时间一致的头发运动动态；(2) 一个由发丝引导的优化模块，用于学习支持可微渲染的动态 3D Gaussian 头发外观表示，从而能够在运动过程中通过梯度学习视角一致的外观。不同于以往基于仿真的管线，我们的方法完全数据驱动，可随着训练数据扩展，并能泛化到不同发型和头部运动序列。此外，DGH 还可以无缝集成到 3D Gaussian avatar 框架中，为高保真 avatar 表示提供真实、可动画的头发。DGH 在几何和外观上都取得了有前景的结果，为基于物理的仿真和渲染提供了一种可扩展的数据驱动替代方案。\n"
  },
  {
    "path": "abs/2512.17349.md",
    "content": "### Flying in Clutter on Monocular RGB by Learning in 3D Radiance Fields with Domain Adaptation\n\nModern autonomous navigation systems predominantly rely on lidar and depth cameras. However, a fundamental question remains: Can flying robots navigate in clutter using solely monocular RGB images? Given the prohibitive costs of real-world data collection, learning policies in simulation offers a promising path. Yet, deploying such policies directly in the physical world is hindered by the significant sim-to-real perception gap. Thus, we propose a framework that couples the photorealism of 3D Gaussian Splatting (3DGS) environments with Adversarial Domain Adaptation. By training in high-fidelity simulation while explicitly minimizing feature discrepancy, our method ensures the policy relies on domain-invariant cues. Experimental results demonstrate that our policy achieves robust zero-shot transfer to the physical world, enabling safe and agile flight in unstructured environments with varying illumination.\n\n现代自主导航系统主要依赖激光雷达和深度相机。然而，一个根本问题依然存在：飞行机器人能否仅凭单目 RGB 图像在杂乱环境中完成飞行导航？考虑到真实世界数据采集成本极高，在仿真中学习策略是一条很有前景的路线。但由于从仿真到真实世界存在显著的感知域差异，这类策略往往难以直接部署到物理系统中。为此，我们提出一个将 3D Gaussian Splatting (3DGS) 环境的高真实感与对抗式域适应结合起来的框架。通过在高保真仿真环境中训练，并显式缩小不同域之间的特征差异，我们的方法能够使策略依赖于域不变线索。实验结果表明，该策略能够稳健地零样本迁移到真实世界，在光照变化明显的非结构化环境中实现安全且灵活的飞行。\n"
  },
  {
    "path": "abs/2512.17528.md",
    "content": "### Voxel-GS: Quantized Scaffold Gaussian Splatting Compression with Run-Length Coding\n\nSubstantial Gaussian splatting format point clouds require effective compression. In this paper, we propose Voxel-GS, a simple yet highly effective framework that departs from the complex neural entropy models of prior work, instead achieving competitive performance using only a lightweight rate proxy and run-length coding. Specifically, we employ a differentiable quantization to discretize the Gaussian attributes of Scaffold-GS. Subsequently, a Laplacian-based rate proxy is devised to impose an entropy constraint, guiding the generation of high-fidelity and compact reconstructions. Finally, this integer-type Gaussian point cloud is compressed losslessly using Octree and run-length coding. Experiments validate that the proposed rate proxy accurately estimates the bitrate of run-length coding, enabling Voxel-GS to eliminate redundancy and optimize for a more compact representation. Consequently, our method achieves a remarkable compression ratio with significantly faster coding speeds than prior art. The code is available at https://github.com/zb12138/VoxelGS.\n\n大规模 Gaussian Splatting 格式点云需要有效的压缩。本文提出 Voxel-GS，一个简单但高效的框架，区别于以往复杂的神经熵模型，仅依赖轻量级码率代理和游程编码就实现了有竞争力的性能。具体而言，我们首先对 Scaffold-GS 的 Gaussian 属性进行可微量化离散化。随后设计基于 Laplacian 的码率代理，以施加熵约束，引导生成高保真且紧凑的重建结果。最后，再利用八叉树和游程编码对这种整数类型的 Gaussian 点云进行无损压缩。实验表明，所提出的码率代理能够准确估计游程编码的比特率，使 Voxel-GS 能够有效消除冗余并优化得到更紧凑的表示。因此，我们的方法在实现显著压缩比的同时，编码速度也明显快于现有方法。代码见 https://github.com/zb12138/VoxelGS 。\n"
  },
  {
    "path": "abs/2512.17541.md",
    "content": "### FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views\n\nWe present FLEG, a feed-forward network that reconstructs language-embedded 3D Gaussians from any views. Previous straightforward solutions combine feed-forward reconstruction with Gaussian heads but suffer from fixed input views and insufficient 3D training data. In contrast, we propose a 3D-annotation-free training framework for 2D-to-3D lifting from arbitrary uncalibrated and unposed multi-view images. Since the framework does not require 3D annotations, we can leverage large-scale video data with easily obtained 2D instance information to enrich semantic embedding. We also propose an instance-guided contrastive learning to align 2D semantics with the 3D representations. In addition, to mitigate the high memory and computational cost of dense views, we further propose a geometry-semantic hierarchical sparsification strategy. Our FLEG efficiently reconstructs language-embedded 3D Gaussian representation in a feed-forward manner from arbitrary sparse or dense views, jointly producing accurate geometry, high-fidelity appearance, and language-aligned semantics. Extensive experiments show that it outperforms existing methods on various related tasks. Project page: https://fangzhou2000.github.io/projects/fleg.\n\n我们提出 FLEG，一个能够从任意视角重建语言嵌入 3D 高斯的前馈网络。此前直接将前馈式重建与 Gaussian heads 结合的方案通常受限于固定输入视角以及 3D 训练数据不足。相比之下，我们提出一种无需 3D 标注的训练框架，可从任意未经标定且未定姿的多视角图像中进行二维到三维提升。由于不需要 3D 标注，我们可以利用易于获取二维实例信息的大规模视频数据来丰富语义嵌入。我们还提出实例引导的对比学习，用于对齐二维语义与三维表示。此外，为缓解密集视角带来的高内存和高计算成本，我们进一步提出几何-语义分层稀疏化策略。FLEG 能够以前馈方式从任意稀疏或稠密视角高效重建语言嵌入的 3D 高斯表示，同时获得准确几何、高保真外观和语言对齐语义。大量实验表明，它在多项相关任务上优于现有方法。项目主页为 https://fangzhou2000.github.io/projects/fleg 。\n"
  },
  {
    "path": "abs/2512.17547.md",
    "content": "### G3Splat: Geometrically Consistent Generalizable Gaussian Splatting\n\n3D Gaussians have recently emerged as an effective scene representation for real-time splatting and accurate novel-view synthesis, motivating several works to adapt multi-view structure prediction networks to regress per-pixel 3D Gaussians from images. However, most prior work extends these networks to predict additional Gaussian parameters -- orientation, scale, opacity, and appearance -- while relying almost exclusively on view-synthesis supervision. We show that a view-synthesis loss alone is insufficient to recover geometrically meaningful splats in this setting. We analyze and address the ambiguities of learning 3D Gaussian splats under self-supervision for pose-free generalizable splatting, and introduce G3Splat, which enforces geometric priors to obtain geometrically consistent 3D scene representations. Trained on RE10K, our approach achieves state-of-the-art performance in (i) geometrically consistent reconstruction, (ii) relative pose estimation, and (iii) novel-view synthesis. We further demonstrate strong zero-shot generalization on ScanNet, substantially outperforming prior work in both geometry recovery and relative pose estimation.\n\n3D Gaussian 近来已成为实时 splatting 和高精度新视角合成的有效场景表示，因此不少工作开始将多视图结构预测网络改造为从图像回归逐像素 3D Gaussian。然而，大多数已有工作只是扩展这些网络去预测额外的 Gaussian 参数，如朝向、尺度、不透明度和外观，同时几乎完全依赖视图合成监督。我们表明，仅靠视图合成损失不足以在这种设定下恢复具有几何意义的 splat。我们分析并解决了在无姿态、可泛化 splatting 场景下，通过自监督学习 3D Gaussian splat 所面临的歧义，并提出 G3Splat，通过引入几何先验来获得几何一致的 3D 场景表示。在 RE10K 上训练后，我们的方法在几何一致重建、相对位姿估计和新视角合成三项任务上都达到 SOTA。我们还展示了在 ScanNet 上的强零样本泛化能力，在几何恢复和相对位姿估计方面都显著优于现有方法。\n"
  },
  {
    "path": "abs/2512.17796.md",
    "content": "### Animate Any Character in Any World\n\nRecent advances in world models have greatly enhanced interactive environment simulation. Existing methods mainly fall into two categories: (1) static world generation models, which construct 3D environments without active agents, and (2) controllable-entity models, which allow a single entity to perform limited actions in an otherwise uncontrollable environment. In this work, we introduce AniX, leveraging the realism and structural grounding of static world generation while extending controllable-entity models to support user-specified characters capable of performing open-ended actions. Users can provide a 3DGS scene and a character, then direct the character through natural language to perform diverse behaviors from basic locomotion to object-centric interactions while freely exploring the environment. AniX synthesizes temporally coherent video clips that preserve visual fidelity with the provided scene and character, formulated as a conditional autoregressive video generation problem. Built upon a pre-trained video generator, our training strategy significantly enhances motion dynamics while maintaining generalization across actions and characters. Our evaluation covers a broad range of aspects, including visual quality, character consistency, action controllability, and long-horizon coherence.\n\n近期 world model 的进展显著增强了交互式环境仿真能力。现有方法主要分为两类：(1) 静态世界生成模型，用于构建没有主动体的 3D 环境；(2) 可控实体模型，允许单个实体在其余环境不可控的情况下执行有限动作。本文提出 AniX，在利用静态世界生成所带来的真实感和结构基础的同时，将可控实体模型扩展为支持用户指定角色执行开放式动作。用户可以提供一个 3DGS 场景和一个角色，再通过自然语言指挥角色执行从基本运动到面向物体交互的多样行为，并在环境中自由探索。AniX 将这一任务表述为条件自回归视频生成问题，生成在时间上连贯、并保持与给定场景和角色视觉一致的视频片段。基于预训练视频生成器，我们的训练策略显著增强了动作动态，同时保持了对不同动作和角色的泛化能力。我们的评估覆盖视觉质量、角色一致性、动作可控性以及长时程连贯性等多个方面。\n"
  },
  {
    "path": "abs/2512.17817.md",
    "content": "### Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding\n\nWhile 3DGS has emerged as a high-fidelity scene representation, encoding rich, general-purpose features directly from its primitives remains under-explored. We address this gap by introducing Chorus, a multi-teacher pretraining framework that learns a holistic feed-forward 3D Gaussian Splatting (3DGS) scene encoder by distilling complementary signals from 2D foundation models. Chorus employs a shared 3D encoder and teacher-specific projectors to learn from language-aligned, generalist, and object-aware teachers, encouraging a shared embedding space that captures signals from high-level semantics to fine-grained structure. We evaluate Chorus on a wide range of tasks: open-vocabulary semantic and instance segmentation, linear and decoder probing, as well as data-efficient supervision. Besides 3DGS, we also test Chorus on several benchmarks that only support point clouds by pretraining a variant using only Gaussians' centers, colors, estimated normals as inputs. Interestingly, this encoder shows strong transfer and outperforms the point clouds baseline while using 39.9 times fewer training scenes. Finally, we propose a render-and-distill adaptation that facilitates out-of-domain finetuning. Our code and model will be released upon publication.\n\n尽管 3DGS 已成为一种高保真的场景表示方式，但如何直接从其 primitives 中编码丰富且通用的特征仍研究不足。为解决这一问题，我们提出 Chorus，一个多教师预训练框架，通过蒸馏二维基础模型中的互补信号来学习整体性的前馈式 3D Gaussian Splatting (3DGS) 场景编码器。Chorus 使用共享的三维编码器和教师特定投影头，从语言对齐、通用感知和目标感知等多个教师中学习，鼓励形成一个能够同时捕获高层语义与细粒度结构信息的共享嵌入空间。我们在多种任务上评估 Chorus，包括开放词汇语义与实例分割、线性探测、解码器探测以及数据高效监督。除了 3DGS 外，我们还在只支持点云的多个基准上测试了一个仅以高斯中心、颜色和估计法线为输入的变体。值得注意的是，这个编码器展现了很强的迁移能力，并且在训练场景数减少 39.9 倍的情况下，仍优于点云基线。最后，我们还提出一种 render-and-distill 适配方式，以支持跨域微调。代码和模型将在论文发表后公开。\n"
  },
  {
    "path": "abs/2512.18314.md",
    "content": "### MatSpray: Fusing 2D Material World Knowledge on 3D Geometry\n\nManual modeling of material parameters and 3D geometry is a time consuming yet essential task in the gaming and film industries. While recent advances in 3D reconstruction have enabled accurate approximations of scene geometry and appearance, these methods often fall short in relighting scenarios due to the lack of precise, spatially varying material parameters. At the same time, diffusion models operating on 2D images have shown strong performance in predicting physically based rendering (PBR) properties such as albedo, roughness, and metallicity. However, transferring these 2D material maps onto reconstructed 3D geometry remains a significant challenge. We propose a framework for fusing 2D material data into 3D geometry using a combination of novel learning-based and projection-based approaches. We begin by reconstructing scene geometry via Gaussian Splatting. From the input images, a diffusion model generates 2D maps for albedo, roughness, and metallic parameters. Any existing diffusion model that can convert images or videos to PBR materials can be applied. The predictions are further integrated into the 3D representation either by optimizing an image-based loss or by directly projecting the material parameters onto the Gaussians using Gaussian ray tracing. To enhance fine-scale accuracy and multi-view consistency, we further introduce a light-weight neural refinement step (Neural Merger), which takes ray-traced material features as input and produces detailed adjustments. Our results demonstrate that the proposed methods outperform existing techniques in both quantitative metrics and perceived visual realism. This enables more accurate, relightable, and photorealistic renderings from reconstructed scenes, significantly improving the realism and efficiency of asset creation workflows in content production pipelines.\n\n在游戏和影视工业中，手工建模材质参数和 3D 几何既耗时又必不可少。尽管近期 3D 重建方法已经能够较准确地逼近场景几何与外观，但在重光照场景中，这类方法通常因缺乏精确且空间变化的材质参数而表现不足。与此同时，基于 2D 图像的扩散模型在预测 albedo、roughness、metallicity 等物理渲染属性方面已经表现出很强能力。然而，如何将这些 2D 材质图稳定迁移到重建后的 3D 几何上，仍是一个重大挑战。我们提出一个将 2D 材质知识融合到 3D 几何中的框架，结合了新的学习式方法和投影式方法。首先，我们利用 Gaussian Splatting 重建场景几何；然后，从输入图像中由扩散模型生成 albedo、roughness 和 metallic 参数的 2D 图。任意可将图像或视频转换为 PBR 材质的扩散模型都可用于该流程。接着，这些预测结果要么通过优化图像损失融入 3D 表示，要么通过 Gaussian ray tracing 直接投影到 Gaussian 上。为进一步提升细节精度和多视图一致性，我们引入一个轻量级神经细化步骤 Neural Merger，以 ray-traced 材质特征为输入并输出细节修正。实验结果表明，该方法在定量指标和视觉真实感上都优于现有技术，使从重建场景得到更准确、可重光照、更加照片级的渲染成为可能，并显著提升了内容生产流程中资产创建的真实感与效率。\n"
  },
  {
    "path": "abs/2512.18640.md",
    "content": "### Geometric-Photometric Event-based 3D Gaussian Ray Tracing\n\nEvent cameras offer a high temporal resolution over traditional frame-based cameras, which makes them suitable for motion and structure estimation. However, it has been unclear how event-based 3D Gaussian Splatting (3DGS) approaches could leverage fine-grained temporal information of sparse events. This work proposes a framework to address the trade-off between accuracy and temporal resolution in event-based 3DGS. Our key idea is to decouple the rendering into two branches: event-by-event geometry (depth) rendering and snapshot-based radiance (intensity) rendering, by using ray-tracing and the image of warped events. The extensive evaluation shows that our method achieves state-of-the-art performance on the real-world datasets and competitive performance on the synthetic dataset. Also, the proposed method works without prior information (e.g., pretrained image reconstruction models) or COLMAP-based initialization, is more flexible in the event selection number, and achieves sharp reconstruction on scene edges with fast training time. We hope that this work deepens our understanding of the sparse nature of events for 3D reconstruction. The code will be released.\n\n事件相机相比传统基于帧的相机具有更高的时间分辨率，因此非常适合运动与结构估计。然而，事件驱动的 3D Gaussian Splatting（3DGS）方法究竟应如何利用稀疏事件中的细粒度时间信息，一直并不清楚。本文提出一个新框架，用于解决事件驱动 3DGS 中精度与时间分辨率之间的权衡。核心思想是将渲染解耦为两个分支：利用光线追踪和扭曲事件图像进行逐事件的几何（深度）渲染，以及基于快照的辐射（强度）渲染。大量评估结果表明，我们的方法在真实数据集上达到最先进性能，在合成数据集上也具备竞争力。此外，该方法无需先验信息，例如预训练图像重建模型或基于 COLMAP 的初始化，在事件选择数量上更加灵活，并且能以较快训练速度在场景边缘实现清晰重建。我们希望这项工作能加深对事件稀疏特性在三维重建中作用的理解。代码将公开。\n"
  },
  {
    "path": "abs/2512.18655.md",
    "content": "### SplatBright: Generalizable Low-Light Scene Reconstruction from Sparse Views via Physically-Guided Gaussian Enhancement\n\nLow-light 3D reconstruction from sparse views remains challenging due to exposure imbalance and degraded color fidelity. While existing methods struggle with view inconsistency and require per-scene training, we propose SplatBright, which is, to our knowledge, the first generalizable 3D Gaussian framework for joint low-light enhancement and reconstruction from sparse sRGB inputs. Our key idea is to integrate physically guided illumination modeling with geometry-appearance decoupling for consistent low-light reconstruction. Specifically, we adopt a dual-branch predictor that provides stable geometric initialization of 3D Gaussian parameters. On the appearance side, illumination consistency leverages frequency priors to enable controllable and cross-view coherent lighting, while an appearance refinement module further separates illumination, material, and view-dependent cues to recover fine texture. To tackle the lack of large-scale geometrically consistent paired data, we synthesize dark views via a physics-based camera model for training. Extensive experiments on public and self-collected datasets demonstrate that SplatBright achieves superior novel view synthesis, cross-view consistency, and better generalization to unseen low-light scenes compared with both 2D and 3D methods.\n\n稀疏视角下的低照度 3D 重建仍然具有挑战性，因为存在曝光不平衡和颜色保真度下降的问题。现有方法往往难以保持视角一致性，并且需要针对每个场景单独训练。为此，我们提出 SplatBright，据我们所知，这是首个面向稀疏 sRGB 输入、同时进行低照度增强与重建的可泛化 3D Gaussian 框架。其核心思想是将物理引导的光照建模与几何-外观解耦结合起来，以实现一致的低照度重建。具体来说，我们采用双分支预测器，为 3D Gaussian 参数提供稳定的几何初始化。在外观侧，光照一致性利用频率先验实现可控且跨视角一致的光照，而外观细化模块进一步分离光照、材质和视角相关线索，以恢复精细纹理。针对缺乏大规模几何一致配对数据的问题，我们通过基于物理的相机模型合成暗光视图进行训练。大量公共和自采数据集上的实验表明，相比 2D 和 3D 方法，SplatBright 在新视角合成、跨视角一致性以及对未见低照度场景的泛化能力方面都更优。\n"
  },
  {
    "path": "abs/2512.18692.md",
    "content": "### EcoSplat: Efficiency-controllable Feed-forward 3D Gaussian Splatting from Multi-view Images\n\nFeed-forward 3D Gaussian Splatting (3DGS) enables efficient one-pass scene reconstruction, providing 3D representations for novel view synthesis without per-scene optimization. However, existing methods typically predict pixel-aligned primitives per-view, producing an excessive number of primitives in dense-view settings and offering no explicit control over the number of predicted Gaussians. To address this, we propose EcoSplat, the first efficiency-controllable feed-forward 3DGS framework that adaptively predicts the 3D representation for any given target primitive count at inference time. EcoSplat adopts a two-stage optimization process. The first stage is Pixel-aligned Gaussian Training (PGT) where our model learns initial primitive prediction. The second stage is Importance-aware Gaussian Finetuning (IGF) stage where our model learns rank primitives and adaptively adjust their parameters based on the target primitive count. Extensive experiments across multiple dense-view settings show that EcoSplat is robust and outperforms state-of-the-art methods under strict primitive-count constraints, making it well-suited for flexible downstream rendering tasks.\n\n前馈式 3D Gaussian Splatting (3DGS) 能够在无需场景级优化的情况下实现一次前向的场景重建，为新视角合成提供三维表示。然而，现有方法通常按视角预测像素对齐的 primitives，在密集视角设置下会产生过量 primitives，并且无法显式控制预测高斯的数量。为此，我们提出 EcoSplat，这是首个可控效率的前馈式 3DGS 框架，能够在推理时针对任意目标 primitive 数量自适应预测三维表示。EcoSplat 采用两阶段优化过程。第一阶段是 Pixel-aligned Gaussian Training (PGT)，用于学习初始 primitive 预测；第二阶段是 Importance-aware Gaussian Finetuning (IGF)，用于学习对 primitives 进行排序，并根据目标 primitive 数量自适应调整其参数。跨多种密集视角设置的大量实验表明，EcoSplat 在严格 primitive 数量约束下仍然稳健，并优于现有最先进方法，适合灵活的下游渲染任务。\n"
  },
  {
    "path": "abs/2512.19648.md",
    "content": "### 4D Gaussian Splatting as a Learned Dynamical System\n\nWe reinterpret 4D Gaussian Splatting as a continuous-time dynamical system, where scene motion arises from integrating a learned neural dynamical field rather than applying per-frame deformations. This formulation, which we call EvoGS, treats the Gaussian representation as an evolving physical system whose state evolves continuously under a learned motion law. This unlocks capabilities absent in deformation-based approaches:(1) sample-efficient learning from sparse temporal supervision by modeling the underlying motion law; (2) temporal extrapolation enabling forward and backward prediction beyond observed time ranges; and (3) compositional dynamics that allow localized dynamics injection for controllable scene synthesis. Experiments on dynamic scene benchmarks show that EvoGS achieves better motion coherence and temporal consistency compared to deformation-field baselines while maintaining real-time rendering\n\n我们将 4D Gaussian Splatting 重新解释为一个连续时间动力系统，其中场景运动来自对学习到的神经动力场进行积分，而不是逐帧施加形变。我们将这一表述称为 EvoGS，它把高斯表示视为在学习到的运动规律下持续演化的物理系统。该框架带来了传统形变场方法所不具备的能力：(1) 通过建模底层运动规律，在稀疏时间监督下实现更高的样本效率；(2) 进行时间外推，可在观测时间范围之外做前向和后向预测；(3) 通过局部动力学注入实现可控场景合成的组合式动态。动态场景基准上的实验表明，EvoGS 在保持实时渲染的同时，相比基于形变场的基线方法具有更好的运动连贯性和时间一致性。\n"
  },
  {
    "path": "abs/2512.19678.md",
    "content": "### WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion\n\nGenerating long-range, geometrically consistent video presents a fundamental dilemma: while consistency demands strict adherence to 3D geometry in pixel space, state-of-the-art generative models operate most effectively in a camera-conditioned latent space. This disconnect causes current methods to struggle with occluded areas and complex camera trajectories. To bridge this gap, we propose WorldWarp, a framework that couples a 3D structural anchor with a 2D generative refiner. To establish geometric grounding, WorldWarp maintains an online 3D geometric cache built via Gaussian Splatting (3DGS). By explicitly warping historical content into novel views, this cache acts as a structural scaffold, ensuring each new frame respects prior geometry. However, static warping inevitably leaves holes and artifacts due to occlusions. We address this using a Spatio-Temporal Diffusion (ST-Diff) model designed for a \"fill-and-revise\" objective. Our key innovation is a spatio-temporal varying noise schedule: blank regions receive full noise to trigger generation, while warped regions receive partial noise to enable refinement. By dynamically updating the 3D cache at every step, WorldWarp maintains consistency across video chunks. Consequently, it achieves state-of-the-art fidelity by ensuring that 3D logic guides structure while diffusion logic perfects texture. Project page: https://hyokong.github.io/worldwarp-page/.\n\n生成长时段且几何一致的视频面临一个根本性矛盾：一致性要求在像素空间严格遵守三维几何，而最先进的生成模型却通常在相机条件化的潜空间中效果最佳。这种割裂使现有方法在遮挡区域和复杂相机轨迹下表现不佳。为弥合这一差距，我们提出 WorldWarp，一个将三维结构锚点与二维生成式细化器耦合起来的框架。为了建立几何约束，WorldWarp 维护一个通过 Gaussian Splatting (3DGS) 构建的在线三维几何缓存。通过将历史内容显式扭曲到新视角中，该缓存充当结构支架，确保每个新帧都遵循先前几何。然而，静态扭曲不可避免地会因遮挡留下空洞和伪影。为此，我们设计了用于“填充并修正”目标的时空扩散模型 ST-Diff。其关键创新在于时空变化的噪声调度：空白区域施加全噪声以触发生成，而已扭曲区域仅施加部分噪声以进行细化。通过在每一步动态更新三维缓存，WorldWarp 能够在视频块之间维持一致性。因此，它通过让三维逻辑负责结构、扩散逻辑完善纹理，达到了当前最先进的保真度。项目主页为 https://hyokong.github.io/worldwarp-page/ 。\n"
  },
  {
    "path": "abs/2512.19871.md",
    "content": "### HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction\n\n3D Panoptic Occupancy Prediction aims to reconstruct a dense volumetric scene map by predicting the semantic class and instance identity of every occupied region in 3D space. Achieving such fine-grained 3D understanding requires precise geometric reasoning and spatially consistent scene representation across complex environments. However, existing approaches often struggle to maintain precise geometry and capture the precise spatial range of 3D instances critical for robust panoptic separation. To overcome these limitations, we introduce HyGE-Occ, a novel framework that leverages a hybrid view-transformation branch with 3D Gaussian and edge priors to enhance both geometric consistency and boundary awareness in 3D panoptic occupancy prediction. HyGE-Occ employs a hybrid view-transformation branch that fuses a continuous Gaussian-based depth representation with a discretized depth-bin formulation, producing BEV features with improved geometric consistency and structural coherence. In parallel, we extract edge maps from BEV features and use them as auxiliary information to learn edge cues. In our extensive experiments on the Occ3D-nuScenes dataset, HyGE-Occ outperforms existing work, demonstrating superior 3D geometric reasoning.\n\n3D Panoptic Occupancy Prediction 的目标，是通过预测 3D 空间中每个被占据区域的语义类别和实例身份，重建一个稠密体素场景图。实现这种细粒度 3D 理解需要精确的几何推理，以及在复杂环境中的空间一致场景表示。然而，现有方法往往难以保持精确几何，也难以准确捕捉对稳健 panoptic 分离至关重要的 3D 实例空间范围。为解决这些问题，我们提出 HyGE-Occ，这是一种新框架，利用带有 3D Gaussian 和边缘先验的混合视角变换分支，同时增强 3D panoptic occupancy prediction 中的几何一致性与边界感知能力。HyGE-Occ 采用混合视角变换分支，将连续的 Gaussian-based 深度表示与离散深度 bin 表示融合，生成几何一致性和结构连贯性更好的 BEV 特征。同时，我们从 BEV 特征中提取边缘图，并将其作为辅助信息学习边缘线索。在 Occ3D-nuScenes 数据集上的大量实验表明，HyGE-Occ 优于现有工作，展现出更强的 3D 几何推理能力。\n"
  },
  {
    "path": "abs/2512.20129.md",
    "content": "### Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs\n\nAuthoring 3D scenes is a central task for spatial computing applications. Competing visions for lowering existing barriers are (1) focus on immersive, direct manipulation of 3D content or (2) leverage AI techniques that capture real scenes (3D Radiance Fields such as, NeRFs, 3D Gaussian Splatting) and modify them at a higher level of abstraction, at the cost of high latency. We unify the complementary strengths of these approaches and investigate how to integrate generative AI advances into real-time, immersive 3D Radiance Field editing. We introduce Dreamcrafter, a VR-based 3D scene editing system that: (1) provides a modular architecture to integrate generative AI algorithms; (2) combines different levels of control for creating objects, including natural language and direct manipulation; and (3) introduces proxy representations that support interaction during high-latency operations. We contribute empirical findings on control preferences and discuss how generative AI interfaces beyond text input enhance creativity in scene editing and world building.\n\n三维场景创作是空间计算应用中的核心任务。当前降低创作门槛主要有两种思路：一是强调对三维内容进行沉浸式、直接操作；二是借助 AI 技术对真实场景进行采集与高层次修改，例如 NeRF 和 3D Gaussian Splatting 等 3D Radiance Field 表示，但代价是较高延迟。我们将这两种思路的互补优势结合起来，探索如何把生成式 AI 的进展融入实时、沉浸式的 3D Radiance Field 编辑中。为此，我们提出 Dreamcrafter，一个基于 VR 的三维场景编辑系统，它：(1) 提供模块化架构，便于集成生成式 AI 算法；(2) 结合多层级控制方式来创建物体，包括自然语言和直接操作；(3) 引入代理表示，以支持在高延迟操作期间进行交互。我们还给出了关于控制偏好的实证发现，并讨论了除文本输入之外的生成式 AI 交互方式如何增强三维场景编辑和世界构建中的创造力。\n"
  },
  {
    "path": "abs/2512.20148.md",
    "content": "### Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS)\n\nAutomating tasks in orchards is challenging because of the large amount of variation in the environment and occlusions. One of the challenges is apple pose estimation, where key points, such as the calyx, are often occluded. Recently developed pose estimation methods no longer rely on these key points, but still require them for annotations, making annotating challenging and time-consuming. Due to the abovementioned occlusions, there can be conflicting and missing annotations of the same fruit between different images. Novel 3D reconstruction methods can be used to simplify annotating and enlarge datasets. We propose a novel pipeline consisting of 3D Gaussian Splatting to reconstruct an orchard scene, simplified annotations, automated projection of the annotations to images, and the training and evaluation of a pose estimation method. Using our pipeline, 105 manual annotations were required to obtain 28,191 training labels, a reduction of 99.6%. Experimental results indicated that training with labels of fruits that are $\\leq95\\%$ occluded resulted in the best performance, with a neutral F1 score of 0.927 on the original images and 0.970 on the rendered images. Adjusting the size of the training dataset had small effects on the model performance in terms of F1 score and pose estimation accuracy. It was found that the least occluded fruits had the best position estimation, which worsened as the fruits became more occluded. It was also found that the tested pose estimation method was unable to correctly learn the orientation estimation of apples.\n\n果园作业自动化具有挑战性，因为环境变化多且遮挡严重。其中一个问题是苹果位姿估计，像花萼这类关键点经常被遮挡。近期提出的位姿估计方法虽然不再依赖这些关键点本身，但在标注阶段仍需要它们，因此标注过程依然困难且耗时。由于上述遮挡问题，同一个果实在不同图像中可能会出现相互冲突或缺失的标注。新的三维重建方法可用于简化标注流程并扩充数据集。我们提出一个新的流程，利用 3D Gaussian Splatting 重建果园场景，结合简化标注、标注自动投影回图像，以及位姿估计方法的训练与评估。使用该流程，只需 105 条人工标注即可得到 28,191 条训练标签，减少了 99.6% 的人工标注量。实验结果表明，当训练标签中的果实遮挡程度不超过 95% 时性能最佳，在原始图像上的 neutral F1 分数为 0.927，在渲染图像上为 0.970。调整训练集大小对模型性能的影响较小。结果还表明，遮挡越少的果实位置估计越准确，而随着遮挡程度加深，位置估计会变差；同时，被测试的位姿估计方法无法正确学习苹果的朝向估计。\n"
  },
  {
    "path": "abs/2512.20495.md",
    "content": "### Nebula: Enable City-Scale 3D Gaussian Splatting in Virtual Reality via Collaborative Rendering and Accelerated Stereo Rasterization\n\n3D Gaussian splatting (3DGS) has drawn significant attention in the architectural community recently. However, current architectural designs often overlook the 3DGS scalability, making them fragile for extremely large-scale 3DGS. Meanwhile, the VR bandwidth requirement makes it impossible to deliver high-fidelity and smooth VR content from the cloud. We present Nebula, a coherent acceleration framework for large-scale 3DGS collaborative rendering. Instead of streaming videos, Nebula streams intermediate results after the LoD search, reducing 1925% data communication between the cloud and the client. To further enhance the motion-to-photon experience, we introduce a temporal-aware LoD search in the cloud that tames the irregular memory access and reduces redundant data access by exploiting temporal coherence across frames. On the client side, we propose a novel stereo rasterization that enables two eyes to share most computations during the stereo rendering with bit-accurate quality. With minimal hardware augmentations, Nebula achieves 2.7x motion-to-photon speedup and reduces 1925% bandwidth over lossy video streaming.\n\n3D Gaussian splatting (3DGS) 近期在建筑领域受到广泛关注。然而，现有体系结构设计往往忽视 3DGS 的可扩展性，使其在超大规模 3DGS 场景下十分脆弱。与此同时，VR 对带宽的要求也使得从云端传输高保真且流畅的 VR 内容变得不可行。我们提出 Nebula，一个面向大规模 3DGS 协同渲染的统一加速框架。Nebula 不直接传输视频，而是在完成 LoD 搜索后传输中间结果，从而将云端与客户端之间的数据通信减少 1925%。为进一步提升 motion-to-photon 体验，我们在云端引入时序感知的 LoD 搜索，利用跨帧时间一致性抑制不规则内存访问并减少冗余数据访问。客户端侧则提出一种新的双目光栅化方法，使双眼在立体渲染中共享大部分计算，同时保持位级精确的画质。在极少硬件增补下，Nebula 可将 motion-to-photon 速度提升 2.7 倍，并相较有损视频流减少 1925% 带宽。\n"
  },
  {
    "path": "abs/2512.20927.md",
    "content": "### Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting\n\nRecent advancements in computer vision have successfully extended Open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, efficiently rendering the high-dimensional features required for open-vocabulary queries poses a significant challenge. Existing methods employ codebooks or feature compression, causing information loss, thereby degrading segmentation quality. To address this limitation, we introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Unlike conventional volume rendering, which densely samples all 3D Gaussians intersecting each ray, Q-Render sparsely samples only those with dominant influence along the ray. By integrating Q-Render into a generalizable 3D neural network, we also propose Gaussian Splatting Network (GS-Net), which predicts Gaussian features in a generalizable manner. Extensive experiments on ScanNet and LeRF demonstrate that our framework outperforms state-of-the-art methods, while enabling real-time rendering with an approximate 43.7x speedup on 512-D feature maps. Code will be made publicly available.\n\n近期计算机视觉研究已经借助 3D Gaussian Splatting (3D-GS) 将开放词汇分割扩展到三维领域。然而，高效渲染开放词汇查询所需的高维特征仍然是主要挑战。现有方法通常依赖码本或特征压缩，这会带来信息损失，从而降低分割质量。为解决这一问题，我们提出 Quantile Rendering (Q-Render)，这是一种面向 3D 高斯的新型渲染策略，能够在保持高保真的同时高效处理高维特征。不同于传统体渲染会密集采样沿每条光线相交的所有 3D 高斯，Q-Render 只稀疏采样沿光线具有主导影响的高斯。我们进一步将 Q-Render 集成到一个可泛化的三维神经网络中，并提出 Gaussian Splatting Network (GS-Net)，以可泛化的方式预测高斯特征。在 ScanNet 和 LeRF 上的大量实验表明，我们的框架优于现有最先进方法，并在 512 维特征图上实现约 43.7 倍的实时渲染加速。代码将公开。\n"
  },
  {
    "path": "abs/2512.20943.md",
    "content": "### AirGS: Real-Time 4D Gaussian Streaming for Free-Viewpoint Video Experiences\n\nFree-viewpoint video (FVV) enables immersive viewing experiences by allowing users to view scenes from arbitrary perspectives. As a prominent reconstruction technique for FVV generation, 4D Gaussian Splatting (4DGS) models dynamic scenes with time-varying 3D Gaussian ellipsoids and achieves high-quality rendering via fast rasterization. However, existing 4DGS approaches suffer from quality degradation over long sequences and impose substantial bandwidth and storage overhead, limiting their applicability in real-time and wide-scale deployments. Therefore, we present AirGS, a streaming-optimized 4DGS framework that rearchitects the training and delivery pipeline to enable high-quality, low-latency FVV experiences. AirGS converts Gaussian video streams into multi-channel 2D formats and intelligently identifies keyframes to enhance frame reconstruction quality. It further combines temporal coherence with inflation loss to reduce training time and representation size. To support communication-efficient transmission, AirGS models 4DGS delivery as an integer linear programming problem and design a lightweight pruning level selection algorithm to adaptively prune the Gaussian updates to be transmitted, balancing reconstruction quality and bandwidth consumption. Extensive experiments demonstrate that AirGS reduces quality deviation in PSNR by more than 20% when scene changes, maintains frame-level PSNR consistently above 30, accelerates training by 6 times, reduces per-frame transmission size by nearly 50% compared to the SOTA 4DGS approaches.\n\n自由视点视频允许用户从任意视角观察场景，从而带来沉浸式观看体验。作为生成自由视点视频的重要重建技术，4D Gaussian Splatting (4DGS) 通过随时间变化的三维高斯椭球对动态场景建模，并借助快速光栅化实现高质量渲染。然而，现有 4DGS 方法在长序列上会出现质量退化，并带来较高的带宽和存储开销，限制了其在实时和大规模部署中的应用。因此，我们提出 AirGS，一个面向流式传输优化的 4DGS 框架，通过重新设计训练与传输流程来实现高质量、低时延的自由视点视频体验。AirGS 将高斯视频流转换为多通道二维格式，并智能识别关键帧以提升帧重建质量；同时结合时间一致性和 inflation loss 来减少训练时间和表示规模。为支持高通信效率传输，AirGS 将 4DGS 传输建模为整数线性规划问题，并设计了轻量的剪枝级别选择算法，自适应选择待传输的高斯更新，在重建质量和带宽消耗之间取得平衡。大量实验表明，AirGS 在场景变化时可将 PSNR 的质量波动降低 20% 以上，帧级 PSNR 持续保持在 30 以上，训练加速 6 倍，并将单帧传输大小相较最先进 4DGS 方法减少近 50%。\n"
  },
  {
    "path": "abs/2512.21099.md",
    "content": "### TexAvatars : Hybrid Texel-3D Representations for Stable Rigging of Photorealistic Gaussian Head Avatars\n\nConstructing drivable and photorealistic 3D head avatars has become a central task in AR/XR, enabling immersive and expressive user experiences. With the emergence of high-fidelity and efficient representations such as 3D Gaussians, recent works have pushed toward ultra-detailed head avatars. Existing approaches typically fall into two categories: rule-based analytic rigging or neural network-based deformation fields. While effective in constrained settings, both approaches often fail to generalize to unseen expressions and poses, particularly in extreme reenactment scenarios. Other methods constrain Gaussians to the global texel space of 3DMMs to reduce rendering complexity. However, these texel-based avatars tend to underutilize the underlying mesh structure. They apply minimal analytic deformation and rely heavily on neural regressors and heuristic regularization in UV space, which weakens geometric consistency and limits extrapolation to complex, out-of-distribution deformations. To address these limitations, we introduce TexAvatars, a hybrid avatar representation that combines the explicit geometric grounding of analytic rigging with the spatial continuity of texel space. Our approach predicts local geometric attributes in UV space via CNNs, but drives 3D deformation through mesh-aware Jacobians, enabling smooth and semantically meaningful transitions across triangle boundaries. This hybrid design separates semantic modeling from geometric control, resulting in improved generalization, interpretability, and stability. Furthermore, TexAvatars captures fine-grained expression effects, including muscle-induced wrinkles, glabellar lines, and realistic mouth cavity geometry, with high fidelity. Our method achieves state-of-the-art performance under extreme pose and expression variations, demonstrating strong generalization in challenging head reenactment settings.\n\n可驱动且照片级真实的 3D 头部 avatar 构建，已经成为 AR/XR 中的核心任务，能够带来沉浸式且富有表现力的用户体验。随着 3D Gaussian 等高保真、高效率表示的出现，近期工作不断推动超高细节头部 avatar 的发展。现有方法通常分为两类：基于规则的解析 rigging，以及基于神经网络的形变场。尽管它们在受限场景中有效，但在未见表情和姿态，尤其是极端重演场景下，往往难以泛化。还有一些方法将 Gaussian 约束到 3DMM 的全局 texel 空间中以降低渲染复杂度，但这类 texel-based avatar 往往没有充分利用底层网格结构，只进行很少的解析形变，同时过度依赖 UV 空间中的神经回归器和启发式正则，从而削弱几何一致性，并限制对复杂分布外形变的外推能力。为解决这些问题，我们提出 TexAvatars，这是一种混合 avatar 表示，将解析 rigging 的显式几何基础与 texel 空间的空间连续性结合起来。我们的方法通过 CNN 在 UV 空间中预测局部几何属性，但通过 mesh-aware Jacobian 驱动 3D 形变，从而在三角形边界间实现平滑且具有语义意义的过渡。这种混合设计将语义建模与几何控制分离，带来更好的泛化性、可解释性和稳定性。此外，TexAvatars 还能高保真捕捉细粒度表情效果，包括肌肉驱动的皱纹、眉间纹和真实的口腔几何。我们的方法在极端姿态和表情变化下达到 SOTA，展示了在困难头部重演场景中的强泛化能力。\n"
  },
  {
    "path": "abs/2512.22771.md",
    "content": "### Next Best View Selections for Semantic and Dynamic 3D Gaussian Splatting\n\nUnderstanding semantics and dynamics has been crucial for embodied agents in various tasks. Both tasks have much more data redundancy than the static scene understanding task. We formulate the view selection problem as an active learning problem, where the goal is to prioritize frames that provide the greatest information gain for model training. To this end, we propose an active learning algorithm with Fisher Information that quantifies the informativeness of candidate views with respect to both semantic Gaussian parameters and deformation networks. This formulation allows our method to jointly handle semantic reasoning and dynamic scene modeling, providing a principled alternative to heuristic or random strategies. We evaluate our method on large-scale static images and dynamic video datasets by selecting informative frames from multi-camera setups. Experimental results demonstrate that our approach consistently improves rendering quality and semantic segmentation performance, outperforming baseline methods based on random selection and uncertainty-based heuristics.\n\n语义理解和动态理解对具身智能体的多种任务都十分关键，而这两类任务相比静态场景理解存在更高的数据冗余。我们将视角选择问题表述为主动学习问题，目标是在模型训练中优先选择能够带来最大信息增益的帧。为此，我们提出一种基于 Fisher 信息的主动学习算法，用于量化候选视角对语义高斯参数和形变网络的共同信息量。该表述使我们的方法能够同时处理语义推理和动态场景建模，为启发式或随机策略提供了一种更有原则的替代方案。我们在大规模静态图像和动态视频数据集上进行了评估，从多相机设置中选择信息量更高的帧。实验结果表明，我们的方法能够稳定提升渲染质量和语义分割性能，优于基于随机选择和不确定性启发式的基线方法。\n"
  },
  {
    "path": "abs/2512.23180.md",
    "content": "### GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation\n\nDriving World Models (DWMs) have been developing rapidly with the advances of generative models. However, existing DWMs lack 3D scene understanding capabilities and can only generate content conditioned on input data, without the ability to interpret or reason about the driving environment. Moreover, current approaches represent 3D spatial information with point cloud or BEV features do not accurately align textual information with the underlying 3D scene. To address these limitations, we propose a novel unified DWM framework based on 3D Gaussian scene representation, which enables both 3D scene understanding and multi-modal scene generation, while also enabling contextual enrichment for understanding and generation tasks. Our approach directly aligns textual information with the 3D scene by embedding rich linguistic features into each Gaussian primitive, thereby achieving early modality alignment. In addition, we design a novel task-aware language-guided sampling strategy that removes redundant 3D Gaussians and injects accurate and compact 3D tokens into LLM. Furthermore, we design a dual-condition multi-modal generation model, where the information captured by our vision-language model is leveraged as a high-level language condition in combination with a low-level image condition, jointly guiding the multi-modal generation process. We conduct comprehensive studies on the nuScenes and NuInteract datasets to validate the effectiveness of our framework. Our method achieves state-of-the-art performance.\n\n随着生成模型的发展，Driving World Models (DWM) 正在快速演进。然而，现有 DWM 缺乏 3D 场景理解能力，只能根据输入数据生成内容，而无法解释或推理驾驶环境。此外，当前方法通常使用点云或 BEV 特征表示 3D 空间信息，难以把文本信息与底层 3D 场景准确对齐。为解决这些问题，我们提出一种基于 3D Gaussian 场景表示的统一 DWM 框架，既能进行 3D 场景理解，也能进行多模态场景生成，并可为理解与生成任务提供上下文增强。我们的方法通过将丰富的语言特征嵌入每个 Gaussian primitive，直接把文本信息与 3D 场景对齐，从而实现早期模态对齐。此外，我们设计了新的任务感知语言引导采样策略，以去除冗余 3D Gaussian，并向 LLM 注入准确且紧凑的 3D token。进一步地，我们设计了双条件多模态生成模型，将视觉语言模型捕获的信息作为高层语言条件，并结合低层图像条件，共同引导多模态生成过程。我们在 nuScenes 和 NuInteract 数据集上进行了全面实验，验证了框架的有效性，方法达到 SOTA。\n"
  },
  {
    "path": "abs/2512.23998.md",
    "content": "### Improved 3D Gaussian Splatting of Unknown Spacecraft Structure Using Space Environment Illumination Knowledge\n\nThis work presents a novel pipeline to recover the 3D structure of an unknown target spacecraft from a sequence of images captured during Rendezvous and Proximity Operations (RPO) in space. The target's geometry and appearance are represented as a 3D Gaussian Splatting (3DGS) model. However, learning 3DGS requires static scenes, an assumption in contrast to dynamic lighting conditions encountered in spaceborne imagery. The trained 3DGS model can also be used for camera pose estimation through photometric optimization. Therefore, in addition to recovering a geometrically accurate 3DGS model, the photometric accuracy of the rendered images is imperative to downstream pose estimation tasks during the RPO process. This work proposes to incorporate the prior knowledge of the Sun's position, estimated and maintained by the servicer spacecraft, into the training pipeline for improved photometric quality of 3DGS rasterization. Experimental studies demonstrate the effectiveness of the proposed solution, as 3DGS models trained on a sequence of images learn to adapt to rapidly changing illumination conditions in space and reflect global shadowing and self-occlusion.\n\n本文提出一种新流程，用于从空间交会与近距离操作期间采集的图像序列中恢复未知目标航天器的三维结构。目标的几何和外观由 3D Gaussian Splatting (3DGS) 模型表示。然而，3DGS 的学习通常假设场景静态，这与太空图像中会遇到的动态光照条件相矛盾。训练好的 3DGS 模型还可通过光度优化用于相机位姿估计。因此，除了恢复几何精确的 3DGS 模型之外，渲染图像的光度精度对交会与近距离操作过程中的下游位姿估计任务也至关重要。本文提出将服务航天器估计并维护的太阳位置先验纳入训练流程，以提升 3DGS 光栅化的光度质量。实验表明，该方案能够使基于图像序列训练的 3DGS 模型适应太空中快速变化的光照条件，并反映全局阴影和自遮挡。\n"
  },
  {
    "path": "abs/2512.24742.md",
    "content": "### Splatwizard: A Benchmark Toolkit for 3D Gaussian Splatting Compression\n\nThe recent advent of 3D Gaussian Splatting (3DGS) has marked a significant breakthrough in real-time novel view synthesis. However, the rapid proliferation of 3DGS-based algorithms has created a pressing need for standardized and comprehensive evaluation tools, especially for compression task. Existing benchmarks often lack the specific metrics necessary to holistically assess the unique characteristics of different methods, such as rendering speed, rate distortion trade-offs memory efficiency, and geometric accuracy. To address this gap, we introduce Splatwizard, a unified benchmark toolkit designed specifically for benchmarking 3DGS compression models. Splatwizard provides an easy-to-use framework to implement new 3DGS compression model and utilize state-of-the-art techniques proposed by previous work. Besides, an integrated pipeline that automates the calculation of key performance indicators, including image-based quality metrics, chamfer distance of reconstruct mesh, rendering frame rates, and computational resource consumption is included in the framework as well. Code is available at https://github.com/splatwizard/splatwizard\n\n近来 3D Gaussian Splatting (3DGS) 的出现推动了实时新视角合成的发展，但 3DGS 算法的快速增长也带来了对统一、全面评测工具的迫切需求，尤其是在压缩任务上。现有基准往往缺少能够整体评估不同方法独特特性的指标，例如渲染速度、率失真权衡、内存效率和几何精度。为填补这一空白，我们提出 Splatwizard，一个专门用于评测 3DGS 压缩模型的统一基准工具包。Splatwizard 提供了易用框架，方便实现新的 3DGS 压缩模型并复用已有工作的先进技术。此外，框架还集成了自动化流水线，可计算图像质量指标、重建网格的 Chamfer 距离、渲染帧率和计算资源消耗等关键性能指标。代码见 https://github.com/splatwizard/splatwizard 。\n"
  },
  {
    "path": "abs/2512.24763.md",
    "content": "### UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning\n\n3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have advanced novel-view synthesis. Recent methods extend multi-view 2D segmentation to 3D, enabling instance/semantic segmentation for better scene understanding. A key challenge is the inconsistency of 2D instance labels across views, leading to poor 3D predictions. Existing methods use a two-stage approach in which some rely on contrastive learning with hyperparameter-sensitive clustering, while others preprocess labels for consistency. We propose a unified framework that merges these steps, reducing training time and improving performance by introducing a learnable feature embedding for segmentation in Gaussian primitives. This embedding is then efficiently decoded into instance labels through a novel \"Embedding-to-Label\" process, effectively integrating the optimization. While this unified framework offers substantial benefits, we observed artifacts at the object boundaries. To address the object boundary issues, we propose hard-mining samples along these boundaries. However, directly applying hard mining to the feature embeddings proved unstable. Therefore, we apply a linear layer to the rasterized feature embeddings before calculating the triplet loss, which stabilizes training and significantly improves performance. Our method outperforms baselines qualitatively and quantitatively on the ScanNet, Replica3D, and Messy-Rooms datasets.\n\n3D Gaussian Splatting (3DGS) 和 Neural Radiance Fields (NeRF) 推动了新视角合成的发展。近期方法将多视角二维分割扩展到三维，从而支持实例分割和语义分割，以增强场景理解。一个关键挑战是不同视角下二维实例标签不一致，这会导致三维预测效果较差。现有方法通常采用两阶段流程，其中一类依赖对超参数敏感的对比学习聚类，另一类则先对标签进行一致性预处理。我们提出一个统一框架，将这些步骤合并起来，通过在高斯 primitives 中引入可学习的分割特征嵌入来缩短训练时间并提升性能。随后，这种嵌入可通过一种新的“Embedding-to-Label”过程高效解码为实例标签，从而将优化整合为一个整体。尽管该统一框架带来了显著优势，我们观察到在目标边界处仍会出现伪影。为此，我们提出沿边界进行困难样本挖掘。但直接对特征嵌入施加困难挖掘会导致训练不稳定，因此我们在计算 triplet loss 之前，先对光栅化后的特征嵌入施加一个线性层，从而稳定训练并显著提升性能。该方法在 ScanNet、Replica3D 和 Messy-Rooms 数据集上均在定性和定量上优于基线。\n"
  },
  {
    "path": "abs/2512.24986.md",
    "content": "### PhysTalk: Language-driven Real-time Physics in 3D Gaussian Scenes\n\nRealistic visual simulations are omnipresent, yet their creation requires computing time, rendering, and expert animation knowledge. Open-vocabulary visual effects generation from text inputs emerges as a promising solution that can unlock immense creative potential. However, current pipelines lack both physical realism and effective language interfaces, requiring slow offline optimization. In contrast, PhysTalk takes a 3D Gaussian Splatting (3DGS) scene as input and translates arbitrary user prompts into real time, physics based, interactive 4D animations. A large language model (LLM) generates executable code that directly modifies 3DGS parameters through lightweight proxies and particle dynamics. Notably, PhysTalk is the first framework to couple 3DGS directly with a physics simulator without relying on time consuming mesh extraction. While remaining open vocabulary, this design enables interactive 3D Gaussian animation via collision aware, physics based manipulation of arbitrary, multi material objects. Finally, PhysTalk is train-free and computationally lightweight: this makes 4D animation broadly accessible and shifts these workflows from a \"render and wait\" paradigm toward an interactive dialogue with a modern, physics-informed pipeline.\n\n逼真的视觉仿真无处不在，但其制作通常需要大量计算时间、渲染开销和专业动画知识。基于文本输入的开放词汇视觉效果生成提供了极具潜力的解决方案，但现有流程同时缺乏物理真实感和有效的语言接口，并且依赖缓慢的离线优化。相比之下，PhysTalk 以 3D Gaussian Splatting (3DGS) 场景为输入，将任意用户提示实时转换为基于物理的交互式 4D 动画。大语言模型会生成可执行代码，通过轻量代理和粒子动力学直接修改 3DGS 参数。值得注意的是，PhysTalk 是首个在不依赖耗时网格提取的前提下，将 3DGS 与物理模拟器直接耦合的框架。在保持开放词汇能力的同时，这一设计实现了对任意多材料物体的碰撞感知、基于物理的交互式 3D 高斯动画。最后，PhysTalk 无需训练且计算开销轻量，使 4D 动画更易获得，并把工作流程从“渲染后等待”转向现代物理感知管线下的交互式对话。\n"
  },
  {
    "path": "abs/2601.00285.md",
    "content": "### SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting\n\nReconstructing a dynamic target moving over a large area is challenging. Standard approaches for dynamic object reconstruction require dense coverage in both the viewing space and the temporal dimension, typically relying on multi-view videos captured at each time step. However, such setups are only possible in constrained environments. In real-world scenarios, observations are often sparse over time and captured sparsely from diverse viewpoints (e.g., from security cameras), making dynamic reconstruction highly ill-posed. We present SV-GS, a framework that simultaneously estimates a deformation model and the object's motion over time under sparse observations. To initialize SV-GS, we leverage a rough skeleton graph and an initial static reconstruction as inputs to guide motion estimation. (Later, we show that this input requirement can be relaxed.) Our method optimizes a skeleton-driven deformation field composed of a coarse skeleton joint pose estimator and a module for fine-grained deformations. By making only the joint pose estimator time-dependent, our model enables smooth motion interpolation while preserving learned geometric details. Experiments on synthetic datasets show that our method outperforms existing approaches under sparse observations by up to 34% in PSNR, and achieves comparable performance to dense monocular video methods on real-world datasets despite using significantly fewer frames. Moreover, we demonstrate that the input initial static reconstruction can be replaced by a diffusion-based generative prior, making our method more practical for real-world scenarios.\n\n对在大范围区域内运动的动态目标进行重建是一项具有挑战性的任务。标准的动态目标重建方法通常要求在视角空间和时间维度上都具备稠密观测，也就是在每个时间步都依赖多视角视频，这种设置往往只在受控环境中才可实现。而在真实场景中，观测通常在时间上稀疏、视角上也分散，比如来自安防摄像头的输入，这使得动态重建问题高度病态。本文提出 SV-GS，一个在稀疏观测条件下同时估计形变模型和目标时序运动的框架。为初始化 SV-GS，方法利用粗略骨架图和初始静态重建结果来引导运动估计，后文还进一步表明这一输入条件可以放宽。该方法优化了一个由粗粒度骨架关节位姿估计器和细粒度形变模块组成的骨架驱动形变场。通过仅让关节位姿估计器随时间变化，模型既能实现平滑的运动插值，又能保留已学习到的几何细节。合成数据集实验表明，在稀疏观测条件下，该方法相较已有方法在 PSNR 上最高提升 34%；在真实数据集上，尽管使用的帧数明显更少，也能取得与稠密单目视频方法相当的性能。此外，作者还展示了可以用基于扩散模型的生成先验替代初始静态重建输入，从而使方法在真实场景中更具实用性。\n"
  },
  {
    "path": "abs/2601.00705.md",
    "content": "### RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization\n\nWe introduce RGS-SLAM, a robust Gaussian-splatting SLAM framework that replaces the residual-driven densification stage of GS-SLAM with a training-free correspondence-to-Gaussian initialization. Instead of progressively adding Gaussians as residuals reveal missing geometry, RGS-SLAM performs a one-shot triangulation of dense multi-view correspondences derived from DINOv3 descriptors refined through a confidence-aware inlier classifier, generating a well-distributed and structure-aware Gaussian seed prior to optimization. This initialization stabilizes early mapping and accelerates convergence by roughly 20%, yielding higher rendering fidelity in texture-rich and cluttered scenes while remaining fully compatible with existing GS-SLAM pipelines. Evaluated on the TUM RGB-D and Replica datasets, RGS-SLAM achieves competitive or superior localization and reconstruction accuracy compared with state-of-the-art Gaussian and point-based SLAM systems, sustaining real-time mapping performance at up to 925 FPS. Additional details and resources are available at this URL: https://breeze1124.github.io/rgs-slam-project-page/\n\n本文提出 RGS-SLAM，这是一种稳健的 Gaussian Splatting SLAM 框架，它用一种无需训练的“对应关系到高斯”的初始化方式，替代了 GS-SLAM 中由残差驱动的增密阶段。不同于在残差逐步暴露缺失几何后再逐渐添加高斯，RGS-SLAM 先基于经由置信感知内点分类器优化的 DINOv3 描述子，获取稠密多视角对应关系，再通过一次性三角化，在优化前生成分布均匀且具备结构感知能力的高斯种子。这样的初始化能够稳定早期建图过程，并将收敛速度提升约 20%，同时在纹理丰富和杂乱场景中获得更高的渲染保真度，且与现有 GS-SLAM 流水线完全兼容。在 TUM RGB-D 和 Replica 数据集上的评测表明，RGS-SLAM 在定位和重建精度上达到甚至超过当前先进的高斯和点基 SLAM 系统，并保持最高 925 FPS 的实时建图性能。\n"
  },
  {
    "path": "abs/2601.00913.md",
    "content": "### Clean-GS: Semantic Mask-Guided Pruning for 3D Gaussian Splatting\n\n3D Gaussian Splatting produces high-quality scene reconstructions but generates hundreds of thousands of spurious Gaussians (floaters) scattered throughout the environment. These artifacts obscure objects of interest and inflate model sizes, hindering deployment in bandwidth-constrained applications. We present Clean-GS, a method for removing background clutter and floaters from 3DGS reconstructions using sparse semantic masks. Our approach combines whitelist-based spatial filtering with color-guided validation and outlier removal to achieve 60-80% model compression while preserving object quality. Unlike existing 3DGS pruning methods that rely on global importance metrics, Clean-GS uses semantic information from as few as 3 segmentation masks (1% of views) to identify and remove Gaussians not belonging to the target object. Our multi-stage approach consisting of (1) whitelist filtering via projection to masked regions, (2) depth-buffered color validation, and (3) neighbor-based outlier removal isolates monuments and objects from complex outdoor scenes. Experiments on Tanks and Temples show that Clean-GS reduces file sizes from 125MB to 47MB while maintaining rendering quality, making 3DGS models practical for web deployment and AR/VR applications. Our code is available at https://github.com/smlab-niser/clean-gs\n\n3D Gaussian Splatting 能够生成高质量场景重建结果，但同时也会在场景中产生数十万个杂散高斯点，也就是常说的 floater。这些伪影会遮挡目标对象并增大模型体积，从而阻碍其在带宽受限场景中的部署。本文提出 Clean-GS，一种利用稀疏语义掩码从 3DGS 重建结果中移除背景杂物和 floater 的方法。该方法结合基于白名单的空间过滤、颜色引导验证以及离群点移除，在保留目标质量的同时实现 60% 到 80% 的模型压缩。与依赖全局重要性指标的现有 3DGS 剪枝方法不同，Clean-GS 仅需最少 3 张分割掩码（约占 1% 的视图）所提供的语义信息，就能识别并删除不属于目标对象的高斯点。其多阶段流程包括：1）通过投影到掩码区域进行白名单过滤；2）利用深度缓冲进行颜色验证；3）基于邻域的离群点移除。Tanks and Temples 上的实验表明，Clean-GS 在保持渲染质量的同时，可将文件大小从 125MB 降至 47MB，使 3DGS 模型更适用于 Web 部署以及 AR/VR 应用。\n"
  },
  {
    "path": "abs/2601.00939.md",
    "content": "### ShadowGS: Shadow-Aware 3D Gaussian Splatting for Satellite Imagery\n\n3D Gaussian Splatting (3DGS) has emerged as a novel paradigm for 3D reconstruction from satellite imagery. However, in multi-temporal satellite images, prevalent shadows exhibit significant inconsistencies due to varying illumination conditions. To address this, we propose ShadowGS, a novel framework based on 3DGS. It leverages a physics-based rendering equation from remote sensing, combined with an efficient ray marching technique, to precisely model geometrically consistent shadows while maintaining efficient rendering. Additionally, it effectively disentangles different illumination components and apparent attributes in the scene. Furthermore, we introduce a shadow consistency constraint that significantly enhances the geometric accuracy of 3D reconstruction. We also incorporate a novel shadow map prior to improve performance with sparse-view inputs. Extensive experiments demonstrate that ShadowGS outperforms current state-of-the-art methods in shadow decoupling accuracy, 3D reconstruction precision, and novel view synthesis quality, with only a few minutes of training. ShadowGS exhibits robust performance across various settings, including RGB, pansharpened, and sparse-view satellite inputs.\n\n3D Gaussian Splatting（3DGS）已经成为利用卫星影像进行三维重建的一种新范式。然而，在多时相卫星图像中，由于光照条件变化，阴影往往表现出明显的不一致性。为解决这一问题，本文提出 ShadowGS，一个基于 3DGS 的新框架。该方法结合遥感中的物理渲染方程和高效的光线步进技术，在保持渲染效率的同时，能够精确建模几何一致的阴影。此外，方法还能够有效解耦场景中的不同光照成分和表观属性。本文进一步引入阴影一致性约束，以显著提升三维重建的几何精度，并加入一种新的阴影图先验，以改善稀疏视图输入下的性能。大量实验表明，ShadowGS 仅需数分钟训练，就能在阴影解耦精度、三维重建精度和新视角合成质量上优于当前先进方法，并在 RGB、全色锐化影像和稀疏视图卫星输入等多种设置下都表现出良好的鲁棒性。\n"
  },
  {
    "path": "abs/2601.01386.md",
    "content": "### ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking\n\nParking is a critical task for autonomous driving systems (ADS), with unique challenges in crowded parking slots and GPS-denied environments. However, existing works focus on 2D parking slot perception, mapping, and localization, 3D reconstruction remains underexplored, which is crucial for capturing complex spatial geometry in parking scenarios. Naively improving the visual quality of reconstructed parking scenes does not directly benefit autonomous parking, as the key entry point for parking is the slots perception module. To address these limitations, we curate the first benchmark named ParkRecon3D, specifically designed for parking scene reconstruction. It includes sensor data from four surround-view fisheye cameras with calibrated extrinsics and dense parking slot annotations. We then propose ParkGaussian, the first framework that integrates 3D Gaussian Splatting (3DGS) for parking scene reconstruction. To further improve the alignment between reconstruction and downstream parking slot detection, we introduce a slot-aware reconstruction strategy that leverages existing parking perception methods to enhance the synthesis quality of slot regions. Experiments on ParkRecon3D demonstrate that ParkGaussian achieves state-of-the-art reconstruction quality and better preserves perception consistency for downstream tasks. The code and dataset will be released at: https://github.com/wm-research/ParkGaussian\n\n泊车是自动驾驶系统中的关键任务，在拥挤车位和无 GPS 环境下尤具挑战。然而，现有工作主要集中在二维车位感知、建图和定位上，而对三维重建的研究仍然不足，而三维重建对于刻画泊车场景中的复杂空间几何至关重要。单纯提升重建场景的视觉质量并不能直接帮助自动泊车，因为泊车任务的关键入口其实是车位感知模块。为此，本文构建了首个专门面向泊车场景重建的基准数据集 ParkRecon3D，其中包含来自四个环视鱼眼相机、已完成外参标定的传感器数据，以及稠密的车位标注。基于此，作者提出 ParkGaussian，这是首个将 3D Gaussian Splatting（3DGS）引入泊车场景重建的框架。为了进一步提升重建结果与下游车位检测任务之间的一致性，本文提出一种车位感知的重建策略，利用现有泊车感知方法增强车位区域的合成质量。ParkRecon3D 上的实验表明，ParkGaussian 在重建质量上达到当前先进水平，并且更好地保留了下游任务所需的感知一致性。\n"
  },
  {
    "path": "abs/2601.01847.md",
    "content": "### ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting\n\nMost current audio-driven facial animation research primarily focuses on generating videos with neutral emotions. While some studies have addressed the generation of facial videos driven by emotional audio, efficiently generating high-quality talking head videos that integrate both emotional expressions and style features remains a significant challenge. In this paper, we propose ESGaussianFace, an innovative framework for emotional and stylized audio-driven facial animation. Our approach leverages 3D Gaussian Splatting to reconstruct 3D scenes and render videos, ensuring efficient generation of 3D consistent results. We propose an emotion-audio-guided spatial attention method that effectively integrates emotion features with audio content features. Through emotion-guided attention, the model is able to reconstruct facial details across different emotional states more accurately. To achieve emotional and stylized deformations of the 3D Gaussian points through emotion and style features, we introduce two 3D Gaussian deformation predictors. Futhermore, we propose a multi-stage training strategy, enabling the step-by-step learning of the character's lip movements, emotional variations, and style features. Our generated results exhibit high efficiency, high quality, and 3D consistency. Extensive experimental results demonstrate that our method outperforms existing state-of-the-art techniques in terms of lip movement accuracy, expression variation, and style feature expressiveness.\n\n当前大多数语音驱动人脸动画研究主要关注生成中性表情的视频。虽然已有一些工作尝试利用带有情绪的音频生成面部视频，但要高效生成同时融合情绪表达和风格特征的高质量说话人头像视频仍然十分具有挑战性。本文提出 ESGaussianFace，一个面向情绪化与风格化语音驱动人脸动画的创新框架。该方法利用 3D Gaussian Splatting 重建三维场景并进行视频渲染，从而高效生成具有三维一致性的结果。本文提出一种情绪与音频联合引导的空间注意力机制，以有效融合情绪特征和音频内容特征。借助情绪引导注意力，模型能够更准确地重建不同情绪状态下的人脸细节。为了让 3D 高斯点在情绪和风格特征的驱动下产生情绪化和风格化形变，本文进一步引入两个 3D 高斯形变预测器。此外，作者还设计了多阶段训练策略，使模型能够逐步学习人物的唇部运动、情绪变化和风格特征。大量实验结果表明，该方法在唇动准确性、表情变化以及风格特征表现力方面都优于现有先进技术。\n"
  },
  {
    "path": "abs/2601.02072.md",
    "content": "### SketchRodGS: Sketch-based Extraction of Slender Geometries for Animating Gaussian Splatting Scenes\n\nPhysics simulation of slender elastic objects often requires discretization as a polyline. However, constructing a polyline from Gaussian splatting is challenging as Gaussian splatting lacks connectivity information and the configuration of Gaussian primitives contains much noise. This paper presents a method to extract a polyline representation of the slender part of the objects in a Gaussian splatting scene from the user's sketching input. Our method robustly constructs a polyline mesh that represents the slender parts using the screen-space shortest path analysis that can be efficiently solved using dynamic programming. We demonstrate the effectiveness of our approach in several in-the-wild examples.\n\n细长弹性物体的物理模拟通常需要将其离散表示为折线。然而，要从 Gaussian Splatting 场景中构建折线并不容易，因为 Gaussian Splatting 本身缺乏连通性信息，而且高斯原语的分布往往含有较多噪声。本文提出一种方法，可根据用户的草图输入，从 Gaussian Splatting 场景中提取物体细长部分的折线表示。该方法通过屏幕空间最短路径分析来稳健地构建表示细长结构的折线网格，而这一分析过程可以通过动态规划高效求解。作者在多个真实场景示例中验证了该方法的有效性。\n"
  },
  {
    "path": "abs/2601.02102.md",
    "content": "### 360-GeoGS: Geometrically Consistent Feed-Forward 3D Gaussian Splatting Reconstruction for 360 Images\n\n3D scene reconstruction is fundamental for spatial intelligence applications such as AR, robotics, and digital twins. Traditional multi-view stereo struggles with sparse viewpoints or low-texture regions, while neural rendering approaches, though capable of producing high-quality results, require per-scene optimization and lack real-time efficiency. Explicit 3D Gaussian Splatting (3DGS) enables efficient rendering, but most feed-forward variants focus on visual quality rather than geometric consistency, limiting accurate surface reconstruction and overall reliability in spatial perception tasks. This paper presents a novel feed-forward 3DGS framework for 360 images, capable of generating geometrically consistent Gaussian primitives while maintaining high rendering quality. A Depth-Normal geometric regularization is introduced to couple rendered depth gradients with normal information, supervising Gaussian rotation, scale, and position to improve point cloud and surface accuracy. Experimental results show that the proposed method maintains high rendering quality while significantly improving geometric consistency, providing an effective solution for 3D reconstruction in spatial perception tasks.\n\n三维场景重建是增强现实、机器人和数字孪生等空间智能应用的基础。传统多视图立体方法在稀疏视角或低纹理区域中表现不佳，而神经渲染方法虽然能够生成高质量结果，却通常需要逐场景优化，难以满足实时效率要求。显式 3D Gaussian Splatting（3DGS）具备高效渲染能力，但现有大多数前馈式方法更关注视觉质量而非几何一致性，这限制了准确表面重建以及空间感知任务中的整体可靠性。本文提出一个面向 360 图像的新型前馈式 3DGS 框架，能够在保持高渲染质量的同时生成几何一致的高斯原语。作者引入一种深度-法向几何正则，将渲染深度梯度与法向信息耦合起来，并对高斯的旋转、尺度和位置进行监督，以提升点云和表面精度。实验结果表明，该方法在维持高渲染质量的同时显著改善了几何一致性，为空间感知任务中的三维重建提供了一种有效方案。\n"
  },
  {
    "path": "abs/2601.03024.md",
    "content": "### SA-ResGS: Self-Augmented Residual 3D Gaussian Splatting for Next Best View Selection\n\nWe propose Self-Augmented Residual 3D Gaussian Splatting (SA-ResGS), a novel framework to stabilize uncertainty quantification and enhancing uncertainty-aware supervision in next-best-view (NBV) selection for active scene reconstruction. SA-ResGS improves both the reliability of uncertainty estimates and their effectiveness for supervision by generating Self-Augmented point clouds (SA-Points) via triangulation between a training view and a rasterized extrapolated view, enabling efficient scene coverage estimation. While improving scene coverage through physically guided view selection, SA-ResGS also addresses the challenge of under-supervised Gaussians, exacerbated by sparse and wide-baseline views, by introducing the first residual learning strategy tailored for 3D Gaussian Splatting. This targeted supervision enhances gradient flow in high-uncertainty Gaussians by combining uncertainty-driven filtering with dropout- and hard-negative-mining-inspired sampling. Our contributions are threefold: (1) a physically grounded view selection strategy that promotes efficient and uniform scene coverage; (2) an uncertainty-aware residual supervision scheme that amplifies learning signals for weakly contributing Gaussians, improving training stability and uncertainty estimation across scenes with diverse camera distributions; (3) an implicit unbiasing of uncertainty quantification as a consequence of constrained view selection and residual supervision, which together mitigate conflicting effects of wide-baseline exploration and sparse-view ambiguity in NBV planning. Experiments on active view selection demonstrate that SA-ResGS outperforms state-of-the-art baselines in both reconstruction quality and view selection robustness.\n\n本文提出 Self-Augmented Residual 3D Gaussian Splatting（SA-ResGS），这是一个用于主动场景重建中下一最佳视角（NBV）选择的新框架，旨在稳定不确定性量化并增强基于不确定性的监督效果。SA-ResGS 通过在训练视角与栅格化外推视角之间做三角化，生成自增强点云（SA-Points），从而更有效地估计场景覆盖范围，并同时提升不确定性估计的可靠性及其用于监督时的有效性。在通过物理引导视角选择提升场景覆盖率的同时，SA-ResGS 还针对稀疏、大基线视图下高斯监督不足的问题，引入了首个专为 3D Gaussian Splatting 设计的残差学习策略。该有针对性的监督将不确定性驱动的筛选与受 dropout 和困难负样本挖掘启发的采样方式结合起来，以增强高不确定性高斯上的梯度流。本文的主要贡献包括：1）一种具备物理依据的视角选择策略，可促进高效且均匀的场景覆盖；2）一种感知不确定性的残差监督方案，可放大贡献较弱高斯的学习信号，从而提升不同相机分布场景中的训练稳定性和不确定性估计质量；3）通过受约束的视角选择和残差监督，隐式减轻不确定性量化中的偏差，缓解宽基线探索与稀疏视图歧义在 NBV 规划中的冲突。主动视角选择实验表明，SA-ResGS 在重建质量和视角选择鲁棒性上都优于当前先进基线。\n"
  },
  {
    "path": "abs/2601.03200.md",
    "content": "### A High-Fidelity Digital Twin for Robotic Manipulation Based on 3D Gaussian Splatting\n\nDeveloping high-fidelity, interactive digital twins is crucial for enabling closed-loop motion planning and reliable real-world robot execution, which are essential to advancing sim-to-real transfer. However, existing approaches often suffer from slow reconstruction, limited visual fidelity, and difficulties in converting photorealistic models into planning-ready collision geometry. We present a practical framework that constructs high-quality digital twins within minutes from sparse RGB inputs. Our system employs 3D Gaussian Splatting (3DGS) for fast, photorealistic reconstruction as a unified scene representation. We enhance 3DGS with visibility-aware semantic fusion for accurate 3D labelling and introduce an efficient, filter-based geometry conversion method to produce collision-ready models seamlessly integrated with a Unity-ROS2-MoveIt physics engine. In experiments with a Franka Emika Panda robot performing pick-and-place tasks, we demonstrate that this enhanced geometric accuracy effectively supports robust manipulation in real-world trials. These results demonstrate that 3DGS-based digital twins, enriched with semantic and geometric consistency, offer a fast, reliable, and scalable path from perception to manipulation in unstructured environments.\n\n构建高保真、可交互的数字孪生对于实现闭环运动规划和可靠的真实机器人执行至关重要，而这正是推动 sim-to-real 迁移的关键基础。然而，现有方法往往存在重建速度慢、视觉保真度有限，以及难以将照片级真实模型转换为适用于规划的碰撞几何等问题。本文提出一个实用框架，能够从稀疏 RGB 输入在数分钟内构建高质量数字孪生。系统采用 3D Gaussian Splatting（3DGS）作为统一场景表示，以实现快速、逼真的重建；并通过可见性感知的语义融合提升三维标注精度，同时引入一种高效的基于滤波的几何转换方法，将结果无缝转为可用于碰撞检测的模型，并接入 Unity-ROS2-MoveIt 物理引擎。在使用 Franka Emika Panda 机械臂执行抓取放置任务的实验中，本文展示了增强后的几何精度能够有效支持真实场景中的稳健操作。这表明，融合语义与几何一致性的 3DGS 数字孪生，为非结构化环境下从感知到操作提供了一条快速、可靠且可扩展的路径。\n"
  },
  {
    "path": "abs/2601.03319.md",
    "content": "### CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature\n\nA photorealistic and controllable 3D caricaturization framework for faces is introduced. We start with an intrinsic Gaussian curvature-based surface exaggeration technique, which, when coupled with texture, tends to produce over-smoothed renders. To address this, we resort to 3D Gaussian Splatting (3DGS), which has recently been shown to produce realistic free-viewpoint avatars. Given a multiview sequence, we extract a FLAME mesh, solve a curvature-weighted Poisson equation, and obtain its exaggerated form. However, directly deforming the Gaussians yields poor results, necessitating the synthesis of pseudo-ground-truth caricature images by warping each frame to its exaggerated 2D representation using local affine transformations. We then devise a training scheme that alternates real and synthesized supervision, enabling a single Gaussian collection to represent both natural and exaggerated avatars. This scheme improves fidelity, supports local edits, and allows continuous control over the intensity of the caricature. In order to achieve real-time deformations, an efficient interpolation between the original and exaggerated surfaces is introduced. We further analyze and show that it has a bounded deviation from closed-form solutions. In both quantitative and qualitative evaluations, our results outperform prior work, delivering photorealistic, geometry-controlled caricature avatars.\n\n本文提出一个面向人脸的高真实感、可控三维夸张化框架。方法首先使用基于内蕴高斯曲率的表面夸张技术，但这种方法与纹理结合时往往会产生过度平滑的渲染结果。为解决这一问题，本文转而采用 3D Gaussian Splatting（3DGS），该表示近期已被证明能够生成真实的自由视角头像。给定一段多视角序列，方法先提取 FLAME 网格，求解曲率加权的泊松方程，并得到其夸张形态。然而，直接对高斯进行形变会带来较差结果，因此本文通过局部仿射变换，将每帧图像扭曲到其夸张后的二维表示，合成伪真值夸张图像。随后，本文设计了一种在真实监督与合成监督之间交替进行的训练方案，使单一高斯集合既能表示自然头像，也能表示夸张头像。该方案提升了保真度，支持局部编辑，并允许连续控制夸张强度。为实现实时形变，本文还提出一种在原始表面与夸张表面之间高效插值的方法，并分析证明其与闭式解之间的偏差有界。无论在定量还是定性评估中，该方法都优于以往工作，能够生成兼具真实感和几何可控性的夸张头像。\n"
  },
  {
    "path": "abs/2601.03824.md",
    "content": "### IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting\n\nGeneralizable 3D Gaussian Splatting aims to directly predict Gaussian parameters using a feed-forward network for scene reconstruction. Among these parameters, Gaussian means are particularly difficult to predict, so depth is usually estimated first and then unprojected to obtain the Gaussian sphere centers. Existing methods typically rely solely on a single warp to estimate depth probability, which hinders their ability to fully leverage cross-view geometric cues, resulting in unstable and coarse depth maps. To address this limitation, we propose IDESplat, which iteratively applies warp operations to boost depth probability estimation for accurate Gaussian mean prediction. First, to eliminate the inherent instability of a single warp, we introduce a Depth Probability Boosting Unit (DPBU) that integrates epipolar attention maps produced by cascading warp operations in a multiplicative manner. Next, we construct an iterative depth estimation process by stacking multiple DPBUs, progressively identifying potential depth candidates with high likelihood. As IDESplat iteratively boosts depth probability estimates and updates the depth candidates, the depth map is gradually refined, resulting in accurate Gaussian means. We conduct experiments on RealEstate10K, ACID, and DL3DV. IDESplat achieves outstanding reconstruction quality and state-of-the-art performance with real-time efficiency. On RE10K, it outperforms DepthSplat by 0.33 dB in PSNR, using only 10.7% of the parameters and 70% of the memory. Additionally, our IDESplat improves PSNR by 2.95 dB over DepthSplat on the DTU dataset in cross-dataset experiments, demonstrating its strong generalization ability.\n\n可泛化的 3D Gaussian Splatting 旨在使用前馈网络直接预测高斯参数以完成场景重建。在这些参数中，高斯均值尤其难以预测，因此通常会先估计深度，再通过反投影得到高斯球心。现有方法往往只依赖单次 warp 来估计深度概率，这限制了它们对跨视角几何线索的充分利用，导致深度图不稳定且较粗糙。为解决这一问题，本文提出 IDESplat，通过迭代地执行 warp 操作来提升深度概率估计，从而更准确地预测高斯均值。首先，为消除单次 warp 本身的不稳定性，本文提出深度概率增强单元 DPBU，以乘法方式整合级联 warp 产生的极线注意力图。随后，方法通过堆叠多个 DPBU 构建迭代式深度估计过程，逐步识别高概率的深度候选。随着 IDESplat 持续增强深度概率估计并更新候选集合，深度图被逐步细化，最终得到更准确的高斯均值。在 RealEstate10K、ACID 和 DL3DV 上的实验表明，IDESplat 在保持实时效率的同时实现了优秀的重建质量和先进性能。在 RE10K 上，它以仅 10.7% 的参数量和 70% 的显存占用，相比 DepthSplat 将 PSNR 提升了 0.33 dB；在跨数据集 DTU 实验中，也比 DepthSplat 高出 2.95 dB，展示了很强的泛化能力。\n"
  },
  {
    "path": "abs/2601.04348.md",
    "content": "### SCAR-GS: Spatial Context Attention for Residuals in Progressive Gaussian Splatting\n\nRecent advances in 3D Gaussian Splatting have allowed for real-time, high-fidelity novel view synthesis. Nonetheless, these models have significant storage requirements for large and medium-sized scenes, hindering their deployment over cloud and streaming services. Some of the most recent progressive compression techniques for these models rely on progressive masking and scalar quantization techniques to reduce the bitrate of Gaussian attributes using spatial context models. While effective, scalar quantization may not optimally capture the correlations of high-dimensional feature vectors, which can potentially limit the rate-distortion performance. In this work, we introduce a novel progressive codec for 3D Gaussian Splatting that replaces traditional methods with a more powerful Residual Vector Quantization approach to compress the primitive features. Our key contribution is an auto-regressive entropy model, guided by a multi-resolution hash grid, that accurately predicts the conditional probability of each successive transmitted index, allowing for coarse and refinement layers to be compressed with high efficiency.\n\n近年来，3D Gaussian Splatting 的发展使实时高保真新视角合成成为可能。但对于中大型场景，这类模型仍然存在较高的存储需求，从而限制了其在云端和流媒体服务中的部署。近期一些渐进式压缩方法通常依赖渐进式掩码与标量量化，并结合空间上下文模型来降低高斯属性的码率。虽然这些方法有效，但标量量化未必能充分刻画高维特征向量之间的相关性，因此可能限制率失真性能。为此，本文提出一种新的 3D Gaussian Splatting 渐进式编解码方法，使用更强的残差向量量化来压缩原语特征，从而替代传统做法。其关键贡献是一个由多分辨率哈希网格引导的自回归熵模型，能够准确预测连续传输索引的条件概率，使粗层和细化层都能以较高效率完成压缩。\n"
  },
  {
    "path": "abs/2601.04754.md",
    "content": "### ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3D Gaussian Splatting\n\nWe present ProFuse, an efficient context-aware framework for open-vocabulary 3D scene understanding with 3D Gaussian Splatting (3DGS). The pipeline enhances cross-view consistency and intra-mask cohesion within a direct registration setup, adding minimal overhead and requiring no render-supervised fine-tuning. Instead of relying on a pretrained 3DGS scene, we introduce a dense correspondence-guided pre-registration phase that initializes Gaussians with accurate geometry while jointly constructing 3D Context Proposals via cross-view clustering. Each proposal carries a global feature obtained through weighted aggregation of member embeddings, and this feature is fused onto Gaussians during direct registration to maintain per-primitive language coherence across views. With associations established in advance, semantic fusion requires no additional optimization beyond standard reconstruction, and the model retains geometric refinement without densification. ProFuse achieves strong open-vocabulary 3DGS understanding while completing semantic attachment in about five minutes per scene, which is two times faster than SOTA. Additional details are available at our project page https://chiou1203.github.io/ProFuse/.\n\n本文提出 ProFuse，一个面向开放词汇三维场景理解的高效上下文感知框架，基于 3D Gaussian Splatting（3DGS）实现。该流程在直接配准框架下增强了跨视角一致性与掩码内部一致性，额外开销很小，也无需基于渲染监督的微调。不同于依赖预训练 3DGS 场景的方法，本文引入稠密对应引导的预配准阶段，在以准确几何初始化高斯的同时，通过跨视角聚类联合构建三维上下文候选。每个候选都携带一个由成员嵌入加权聚合得到的全局特征，并在直接配准过程中融合到高斯原语上，以保持跨视角的语言一致性。由于关联关系预先建立，语义融合在标准重建之外不再需要额外优化，模型也能在不做增密的情况下保留几何细化能力。ProFuse 在开放词汇 3DGS 理解任务上取得了很强的效果，并能在每个场景约 5 分钟内完成语义附着，速度约为当前先进方法的两倍。\n"
  },
  {
    "path": "abs/2601.04984.md",
    "content": "### OceanSplat: Object-aware Gaussian Splatting with Trinocular View Consistency for Underwater Scene Reconstruction\n\nWe introduce OceanSplat, a novel 3D Gaussian Splatting-based approach for high-fidelity underwater scene reconstruction. To overcome multi-view inconsistencies caused by scattering media, we design a trinocular setup for each camera pose by rendering from horizontally and vertically translated virtual viewpoints, enforcing view consistency to facilitate spatial optimization of 3D Gaussians. Furthermore, we derive synthetic epipolar depth priors from the virtual viewpoints, which serve as self-supervised depth regularizers to compensate for the limited geometric cues in degraded underwater scenes. We also propose a depth-aware alpha adjustment that modulates the opacity of 3D Gaussians during early training based on their depth along the viewing direction, deterring the formation of medium-induced primitives. Our approach promotes the disentanglement of 3D Gaussians from the scattering medium through effective geometric constraints, enabling accurate representation of scene structure and significantly reducing floating artifacts. Experiments on real-world underwater and simulated scenes demonstrate that OceanSplat substantially outperforms existing methods for both scene reconstruction and restoration in scattering media.\n\n本文提出 OceanSplat，一种基于 3D Gaussian Splatting 的高保真水下场景重建方法。为解决散射介质导致的多视角不一致问题，本文为每个相机位姿设计了一个三目设置，通过在水平和垂直方向平移得到虚拟视角并进行渲染，以施加视角一致性约束，从而促进 3D 高斯的空间优化。此外，本文还从这些虚拟视角中构造合成的极线深度先验，作为自监督深度正则项，用于弥补退化水下场景中几何线索不足的问题。方法还提出一种深度感知的 alpha 调整策略，在训练早期依据高斯沿视线方向的深度调节其透明度，以抑制由介质效应诱发的伪原语形成。通过这些有效的几何约束，OceanSplat 能够更好地将 3D 高斯与散射介质解耦，进而更准确地表示场景结构，并显著减少漂浮伪影。在真实水下场景和模拟场景上的实验表明，OceanSplat 在散射介质下的场景重建与图像恢复两项任务上都明显优于现有方法。\n"
  },
  {
    "path": "abs/2601.05511.md",
    "content": "### GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting\n\nWe introduce GaussianSwap, a novel video face swapping framework that constructs a 3D Gaussian Splatting based face avatar from a target video while transferring identity from a source image to the avatar. Conventional video swapping frameworks are limited to generating facial representations in pixel-based formats. The resulting swapped faces exist merely as a set of unstructured pixels without any capacity for animation or interactive manipulation. Our work introduces a paradigm shift from conventional pixel-based video generation to the creation of high-fidelity avatar with swapped faces. The framework first preprocesses target video to extract FLAME parameters, camera poses and segmentation masks, and then rigs 3D Gaussian splats to the FLAME model across frames, enabling dynamic facial control. To ensure identity preserving, we propose an compound identity embedding constructed from three state-of-the-art face recognition models for avatar finetuning. Finally, we render the face-swapped avatar on the background frames to obtain the face-swapped video. Experimental results demonstrate that GaussianSwap achieves superior identity preservation, visual clarity and temporal consistency, while enabling previously unattainable interactive applications.\n\n本文提出 GaussianSwap，一种新的视频换脸框架。该方法从目标视频中构建基于 3D Gaussian Splatting 的人脸头像，同时将源图像中的身份迁移到该头像上。传统视频换脸框架通常只能生成基于像素的人脸表示，换脸结果只是无结构的像素集合，无法进行动画驱动或交互式操控。本文将这一范式从像素级视频生成转向高保真可动画头像构建。方法首先对目标视频进行预处理，提取 FLAME 参数、相机位姿和分割掩码，然后将 3D 高斯点跨帧绑定到 FLAME 模型上，以实现动态人脸控制。为更好地保持身份一致性，本文提出一种复合身份嵌入，由三个先进的人脸识别模型共同构成，并用于头像微调。最后，方法将换脸后的人脸头像渲染回背景帧中，得到最终换脸视频。实验结果表明，GaussianSwap 在身份保持、视觉清晰度和时间一致性方面都表现更优，同时支持以往方法难以实现的交互式应用。\n"
  },
  {
    "path": "abs/2601.05584.md",
    "content": "### GS-DMSR: Dynamic Sensitive Multi-scale Manifold Enhancement for Accelerated High-Quality 3D Gaussian Splatting\n\nIn the field of 3D dynamic scene reconstruction, how to balance model convergence rate and rendering quality has long been a critical challenge that urgently needs to be addressed, particularly in high-precision modeling of scenes with complex dynamic motions. To tackle this issue, this study proposes the GS-DMSR method. By quantitatively analyzing the dynamic evolution process of Gaussian attributes, this mechanism achieves adaptive gradient focusing, enabling it to dynamically identify significant differences in the motion states of Gaussian models. It then applies differentiated optimization strategies to Gaussian models with varying degrees of significance, thereby significantly improving the model convergence rate. Additionally, this research integrates a multi-scale manifold enhancement module, which leverages the collaborative optimization of an implicit nonlinear decoder and an explicit deformation field to enhance the modeling efficiency for complex deformation scenes. Experimental results demonstrate that this method achieves a frame rate of up to 96 FPS on synthetic datasets, while effectively reducing both storage overhead and training time.Our code and data are available at https://anonymous.4open.science/r/GS-DMSR-2212.\n\n在三维动态场景重建领域，如何平衡模型收敛速度与渲染质量一直是亟待解决的关键问题，尤其是在存在复杂动态运动的高精度场景建模中更为突出。为此，本文提出 GS-DMSR 方法。该方法通过定量分析高斯属性的动态演化过程，实现自适应梯度聚焦，能够动态识别高斯模型在运动状态上的显著差异，并对不同重要程度的高斯模型采用差异化优化策略，从而显著提升模型收敛速度。此外，本文还引入多尺度流形增强模块，利用隐式非线性解码器与显式形变场的协同优化，提高复杂形变场景的建模效率。实验结果表明，该方法在合成数据集上可达到最高 96 FPS 的帧率，同时有效降低存储开销与训练时间。\n"
  },
  {
    "path": "abs/2601.05738.md",
    "content": "### FeatureSLAM: Feature-enriched 3D gaussian splatting SLAM in real time\n\nWe present a real-time tracking SLAM system that unifies efficient camera tracking with photorealistic feature-enriched mapping using 3D Gaussian Splatting (3DGS). Our main contribution is integrating dense feature rasterization into the novel-view synthesis, aligned with a visual foundation model. This yields strong semantics, going beyond basic RGB-D input, aiding both tracking and mapping accuracy. Unlike previous semantic SLAM approaches (which embed pre-defined class labels) FeatureSLAM enables entirely new downstream tasks via free-viewpoint, open-set segmentation. Across standard benchmarks, our method achieves real-time tracking, on par with state-of-the-art systems while improving tracking stability and map fidelity without prohibitive compute. Quantitatively, we obtain 9\\% lower pose error and 8\\% higher mapping accuracy compared to recent fixed-set SLAM baselines. Our results confirm that real-time feature-embedded SLAM, is not only valuable for enabling new downstream applications. It also improves the performance of the underlying tracking and mapping subsystems, providing semantic and language masking results that are on-par with offline 3DGS models, alongside state-of-the-art tracking, depth and RGB rendering.\n\n本文提出一个实时跟踪式 SLAM 系统，将高效相机跟踪与基于 3D Gaussian Splatting（3DGS）的高真实感特征增强建图统一起来。其主要贡献是在新视角合成过程中引入与视觉基础模型对齐的稠密特征光栅化，从而获得超越基础 RGB-D 输入的更强语义信息，并同时提升跟踪与建图精度。与以往嵌入预定义类别标签的语义 SLAM 不同，FeatureSLAM 通过自由视角、开放词汇分割支持全新的下游任务。在标准基准上，该方法实现了与先进系统相当的实时跟踪性能，同时在不引入过高计算开销的情况下提升了跟踪稳定性和地图保真度。定量结果显示，与近期固定类别集合的 SLAM 基线相比，该方法的位姿误差降低了 9%，建图精度提升了 8%。结果表明，实时特征嵌入式 SLAM 不仅能支持新的下游应用，也能切实改善底层跟踪与建图子系统，并在语义和语言掩码结果上达到与离线 3DGS 模型相当的水平。\n"
  },
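Dense feature rasterization reuses the same front-to-back alpha compositing that 3DGS applies to color, only with D-dimensional feature vectors. A minimal per-pixel sketch, with splat sorting and the 2D Gaussian falloff assumed to have happened upstream:

```python
import numpy as np

def composite_features(feat, alpha):
    """Front-to-back alpha compositing of per-Gaussian feature vectors for
    one pixel, mirroring how 3DGS composites color.
    feat:  (N, D) features of the N depth-sorted splats on the ray
    alpha: (N,)   their opacities after the 2D Gaussian falloff
    Returns the D-dimensional rendered feature for this pixel."""
    transmittance = 1.0
    out = np.zeros(feat.shape[1])
    for f, a in zip(feat, alpha):
        out += transmittance * a * f
        transmittance *= (1.0 - a)
        if transmittance < 1e-4:   # early termination, as in 3DGS
            break
    return out
```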
  {
    "path": "abs/2601.05853.md",
    "content": "### LayerGS: Decomposition and Inpainting of Layered 3D Human Avatars via 2D Gaussian Splatting\n\nWe propose a novel framework for decomposing arbitrarily posed humans into animatable multi-layered 3D human avatars, separating the body and garments. Conventional single-layer reconstruction methods lock clothing to one identity, while prior multi-layer approaches struggle with occluded regions. We overcome both limitations by encoding each layer as a set of 2D Gaussians for accurate geometry and photorealistic rendering, and inpainting hidden regions with a pretrained 2D diffusion model via score-distillation sampling (SDS). Our three-stage training strategy first reconstructs the coarse canonical garment via single-layer reconstruction, followed by multi-layer training to jointly recover the inner-layer body and outer-layer garment details. Experiments on two 3D human benchmark datasets (4D-Dress, Thuman2.0) show that our approach achieves better rendering quality and layer decomposition and recomposition than the previous state-of-the-art, enabling realistic virtual try-on under novel viewpoints and poses, and advancing practical creation of high-fidelity 3D human assets for immersive applications. Our code is available at https://github.com/RockyXu66/LayerGS\n\n本文提出一种新的框架，可将任意姿态的人体分解为可动画化的多层三维人体头像，从而分离身体与服装。传统的单层重建方法通常会将服装绑定到单一身份上，而此前的多层方法又难以处理被遮挡区域。为解决这两个问题，本文将每一层编码为一组二维高斯，以实现准确几何建模和逼真渲染，并通过带有 score distillation sampling（SDS）的预训练二维扩散模型对隐藏区域进行补全。方法采用三阶段训练策略：先通过单层重建恢复粗略的标准服装，再通过多层训练联合恢复内层身体与外层服装细节。在 4D-Dress 和 Thuman2.0 两个三维人体基准上的实验表明，该方法在渲染质量以及层分解与重组能力上均优于以往先进方法，能够在新视角和新姿态下实现更真实的虚拟试穿，并推动高保真三维人体资产在沉浸式应用中的实际构建。\n"
  },
  {
    "path": "abs/2601.06285.md",
    "content": "### NAS-GS: Noise-Aware Sonar Gaussian Splatting\n\nUnderwater sonar imaging plays a crucial role in various applications, including autonomous navigation in murky water, marine archaeology, and environmental monitoring. However, the unique characteristics of sonar images, such as complex noise patterns and the lack of elevation information, pose significant challenges for 3D reconstruction and novel view synthesis. In this paper, we present NAS-GS, a novel Noise-Aware Sonar Gaussian Splatting framework specifically designed to address these challenges. Our approach introduces a Two-Ways Splatting technique that accurately models the dual directions for intensity accumulation and transmittance calculation inherent in sonar imaging, significantly improving rendering speed without sacrificing quality. Moreover, we propose a Gaussian Mixture Model (GMM) based noise model that captures complex sonar noise patterns, including side-lobes, speckle, and multi-path noise. This model enhances the realism of synthesized images while preventing 3D Gaussian overfitting to noise, thereby improving reconstruction accuracy. We demonstrate state-of-the-art performance on both simulated and real-world large-scale offshore sonar scenarios, achieving superior results in novel view synthesis and 3D reconstruction.\n\n水下声呐成像在浑浊水域自主导航、海洋考古和环境监测等场景中具有重要作用。然而，声呐图像具有复杂噪声模式以及缺少高程信息等特殊特性，这给三维重建和新视角合成带来了明显挑战。为此，本文提出 NAS-GS，一种面向噪声感知的声呐 Gaussian Splatting 框架。该方法引入 Two-Ways Splatting 技术，能够准确建模声呐成像中强度累积和透射率计算所涉及的双向过程，从而在不牺牲质量的前提下显著提升渲染速度。此外，本文还提出基于高斯混合模型（GMM）的噪声建模方法，用于刻画声呐中的旁瓣噪声、散斑噪声和多路径噪声等复杂模式。该模型不仅增强了合成图像的真实性，还能防止 3D 高斯对噪声发生过拟合，从而提升重建精度。实验表明，NAS-GS 在模拟和真实世界的大规模近海声呐场景上都取得了先进性能，在新视角合成和三维重建任务中均优于现有方法。\n"
  },
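A Gaussian Mixture Model noise term can be fitted to the residuals between rendered and captured sonar intensities and then sampled to synthesize realistic noise, which is the role the abstract assigns to it. A sketch using scikit-learn; the component count, the residual source, and the image size are placeholders:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical residuals between rendered and captured sonar intensities;
# in the paper the mixture is meant to absorb side-lobe, speckle, and
# multi-path noise so the 3D Gaussians themselves do not overfit it.
residuals = np.random.randn(10000, 1) * 0.05

gmm = GaussianMixture(n_components=3, covariance_type="full")
gmm.fit(residuals)

# Sample synthetic sonar noise from the fitted mixture and add it to a
# noise-free rendering to make synthesized images more realistic.
noise, _ = gmm.sample(256 * 512)
clean_render = np.full((256, 512), 0.5)        # placeholder clean image
noisy_render = clean_render + noise.reshape(256, 512)
```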
  {
    "path": "abs/2601.07963.md",
    "content": "### 3DGS-Drag: Dragging Gaussians for Intuitive Point-Based 3D Editing\n\nThe transformative potential of 3D content creation has been progressively unlocked through advancements in generative models. Recently, intuitive drag editing with geometric changes has attracted significant attention in 2D editing yet remains challenging for 3D scenes. In this paper, we introduce 3DGS-Drag -- a point-based 3D editing framework that provides efficient, intuitive drag manipulation of real 3D scenes. Our approach bridges the gap between deformation-based and 2D-editing-based 3D editing methods, addressing their limitations to geometry-related content editing. We leverage two key innovations: deformation guidance utilizing 3D Gaussian Splatting for consistent geometric modifications and diffusion guidance for content correction and visual quality enhancement. A progressive editing strategy further supports aggressive 3D drag edits. Our method enables a wide range of edits, including motion change, shape adjustment, inpainting, and content extension. Experimental results demonstrate the effectiveness of 3DGS-Drag in various scenes, achieving state-of-the-art performance in geometry-related 3D content editing. Notably, the editing is efficient, taking 10 to 20 minutes on a single RTX 4090 GPU.\n\n随着生成模型的发展，三维内容创作的潜力正不断被释放。近年来，带有几何变化的直观拖拽编辑在二维图像编辑中受到广泛关注，但在三维场景中仍然较难实现。为此，本文提出 3DGS-Drag，一种基于点表示的三维编辑框架，可对真实三维场景进行高效且直观的拖拽操作。该方法连接了基于形变的三维编辑方法和基于二维编辑的三维编辑方法，弥补了它们在几何相关内容编辑方面的不足。其关键在于两项创新：一是利用 3D Gaussian Splatting 进行形变引导，从而实现一致的几何修改；二是引入扩散模型引导，用于内容校正和视觉质量提升。方法还采用渐进式编辑策略，以支持幅度较大的三维拖拽修改。3DGS-Drag 可以完成多种编辑任务，包括运动变化、形状调整、图像修补和内容扩展。实验结果表明，该方法在多种场景中都表现有效，在几何相关三维内容编辑任务上达到当前先进水平，并且编辑效率较高，在单张 RTX 4090 GPU 上仅需 10 到 20 分钟。\n"
  },
  {
    "path": "abs/2601.09291.md",
    "content": "### TIDI-GS: Floater Suppression in 3D Gaussian Splatting for Enhanced Indoor Scene Fidelity\n\n3D Gaussian Splatting (3DGS) is a technique to create high-quality, real-time 3D scenes from images. This method often produces visual artifacts known as floaters--nearly transparent, disconnected elements that drift in space away from the actual surface. This geometric inaccuracy undermines the reliability of these models for practical applications, which is critical. To address this issue, we introduce TIDI-GS, a new training framework designed to eliminate these floaters. A key benefit of our approach is that it functions as a lightweight plugin for the standard 3DGS pipeline, requiring no major architectural changes and adding minimal overhead to the training process. The core of our method is a floater pruning algorithm--TIDI--that identifies and removes floaters based on several criteria: their consistency across multiple viewpoints, their spatial relationship to other elements, and an importance score learned during training. The framework includes a mechanism to preserve fine details, ensuring that important high-frequency elements are not mistakenly removed. This targeted cleanup is supported by a monocular depth-based loss function that helps improve the overall geometric structure of the scene. Our experiments demonstrate that TIDI-GS improves both the perceptual quality and geometric integrity of reconstructions, transforming them into robust digital assets, suitable for high-fidelity applications.\n\n3D Gaussian Splatting（3DGS）是一种能够从图像构建高质量、实时三维场景的技术。但该方法常会产生一种被称为 floater 的视觉伪影，即一些几乎透明、与真实表面分离并漂浮在空间中的元素。这种几何不准确性会削弱模型在实际应用中的可靠性。为解决这一问题，本文提出 TIDI-GS，一种专门用于消除 floater 的新训练框架。其主要优势在于，它可以作为标准 3DGS 流程中的轻量插件使用，无需大幅修改原有架构，对训练过程带来的额外开销也很小。方法核心是一个名为 TIDI 的 floater 剪除算法，它依据多个准则识别并移除 floater，包括跨视角一致性、与其他元素的空间关系，以及训练过程中学习到的重要性分数。该框架还包含一种细节保护机制，以避免重要的高频结构被误删。此外，本文还引入基于单目深度的损失函数，以帮助改善场景整体几何结构。实验结果表明，TIDI-GS 同时提升了重建结果的感知质量和几何完整性，使其更适合作为高保真应用中的稳健数字资产。\n"
  },
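The TIDI pruning rule combines three per-Gaussian criteria: multi-view consistency, spatial relationship to other elements, and a learned importance score. A sketch of how such a combination could look; the weights, threshold, and detail-preservation guard below are illustrative, not the paper's values:

```python
import numpy as np

def floater_mask(view_consistency, isolation, importance,
                 thresh=0.5, detail_guard=None):
    """Combine the three pruning criteria named in the abstract into a
    boolean removal mask. All inputs are per-Gaussian arrays in [0, 1];
    the weights and threshold are illustrative.
      view_consistency: agreement of a splat across training views
      isolation:        distance-based score w.r.t. neighboring splats
      importance:       importance score learned during training"""
    score = (0.4 * (1 - view_consistency)
             + 0.3 * isolation
             + 0.3 * (1 - importance))
    mask = score > thresh
    if detail_guard is not None:     # protect high-frequency detail splats
        mask &= ~detail_guard
    return mask
```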
  {
    "path": "abs/2601.10075.md",
    "content": "### Thinking Like Van Gogh: Structure-Aware Style Transfer via Flow-Guided 3D Gaussian Splatting\n\nIn 1888, Vincent van Gogh wrote, \"I am seeking exaggeration in the essential.\" This principle, amplifying structural form while suppressing photographic detail, lies at the core of Post-Impressionist art. However, most existing 3D style transfer methods invert this philosophy, treating geometry as a rigid substrate for surface-level texture projection. To authentically reproduce Post-Impressionist stylization, geometric abstraction must be embraced as the primary vehicle of expression. We propose a flow-guided geometric advection framework for 3D Gaussian Splatting (3DGS) that operationalizes this principle in a mesh-free setting. Our method extracts directional flow fields from 2D paintings and back-propagates them into 3D space, rectifying Gaussian primitives to form flow-aligned brushstrokes that conform to scene topology without relying on explicit mesh priors. This enables expressive structural deformation driven directly by painterly motion rather than photometric constraints. Our contributions are threefold: (1) a projection-based, mesh-free flow guidance mechanism that transfers 2D artistic motion into 3D Gaussian geometry; (2) a luminance-structure decoupling strategy that isolates geometric deformation from color optimization, mitigating artifacts during aggressive structural abstraction; and (3) a VLM-as-a-Judge evaluation framework that assesses artistic authenticity through aesthetic judgment instead of conventional pixel-level metrics, explicitly addressing the subjective nature of artistic stylization.\n\n1888 年，梵高写道：“我在追求对本质的夸张。”这种通过强化结构形态、弱化照片式细节来表达的原则，正是后印象派艺术的核心。然而，现有大多数三维风格迁移方法恰恰与这一理念相反，它们把几何当作僵硬的载体，仅在表面投射纹理。若要真正再现后印象派风格，就必须把几何抽象本身视为主要表达手段。为此，本文提出一种基于流场引导的几何平流框架，用于 3D Gaussian Splatting（3DGS）的无网格风格化。该方法从二维画作中提取方向流场，并将其反向传播到三维空间，对高斯原语进行校正，使其形成与流向一致、又符合场景拓扑的笔触，而无需显式网格先验。这样一来，场景的结构变形可以直接由绘画运动感驱动，而不再受光度约束主导。本文的贡献主要有三点：1）提出一种基于投影的无网格流场引导机制，将二维艺术运动迁移到三维高斯几何中；2）提出亮度与结构解耦策略，将几何变形与颜色优化分离，减轻强结构抽象下的伪影；3）提出 VLM-as-a-Judge 评估框架，以审美判断而非传统像素级指标来评估艺术真实性，从而更直接地面对艺术风格化的主观性。\n"
  },
  {
    "path": "abs/2601.11772.md",
    "content": "### studentSplat: Your Student Model Learns Single-view 3D Gaussian Splatting\n\nRecent advance in feed-forward 3D Gaussian splatting has enable remarkable multi-view 3D scene reconstruction or single-view 3D object reconstruction but single-view 3D scene reconstruction remain under-explored due to inherited ambiguity in single-view. We present \\textbf{studentSplat}, a single-view 3D Gaussian splatting method for scene reconstruction. To overcome the scale ambiguity and extrapolation problems inherent in novel-view supervision from a single input, we introduce two techniques: 1) a teacher-student architecture where a multi-view teacher model provides geometric supervision to the single-view student during training, addressing scale ambiguity and encourage geometric validity; and 2) an extrapolation network that completes missing scene context, enabling high-quality extrapolation. Extensive experiments show studentSplat achieves state-of-the-art single-view novel-view reconstruction quality and comparable performance to multi-view methods at the scene level. Furthermore, studentSplat demonstrates competitive performance as a self-supervised single-view depth estimation method, highlighting its potential for general single-view 3D understanding tasks.\n\n近期前馈式 3D Gaussian Splatting 方法已经在多视角三维场景重建和单视角三维目标重建上取得了显著进展，但由于单视角输入天生存在歧义，单视角三维场景重建仍缺乏充分研究。本文提出 studentSplat，一种用于场景重建的单视角 3D Gaussian Splatting 方法。为解决单输入下新视角监督所固有的尺度歧义和外推问题，本文引入两项关键技术：1）教师-学生架构，在训练阶段由多视角教师模型为单视角学生模型提供几何监督，以缓解尺度歧义并增强几何合理性；2）外推网络，用于补全缺失的场景上下文，从而实现高质量外推。大量实验表明，studentSplat 在单视角新视角重建质量上达到当前最优水平，并在场景层面取得可与多视角方法相比的性能。此外，studentSplat 作为一种自监督单视角深度估计方法也展现出有竞争力的表现，说明其在更通用的单视角三维理解任务中具有潜力。\n"
  },
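One way to realize the teacher-student geometric supervision is to distill the multi-view teacher's rendered depth into the student while first resolving the per-image scale, which is exactly the ambiguity the abstract highlights. A sketch; the median-ratio alignment and L1 form are assumptions:

```python
import torch
import torch.nn.functional as F

def geometry_distillation_loss(student_depth, teacher_depth):
    """Scale-aligned depth distillation: a multi-view teacher's rendered
    depth supervises the single-view student. Aligning a per-image scale
    first addresses the scale ambiguity of single-view supervision; the
    median-ratio alignment here is one simple choice, not necessarily the
    paper's."""
    scale = (teacher_depth.median() / student_depth.median()).detach()
    return F.l1_loss(student_depth * scale, teacher_depth)
```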
  {
    "path": "abs/2601.12122.md",
    "content": "### Active Semantic Mapping of Horticultural Environments Using Gaussian Splatting\n\nSemantic reconstruction of agricultural scenes plays a vital role in tasks such as phenotyping and yield estimation. However, traditional approaches that rely on manual scanning or fixed camera setups remain a major bottleneck in this process. In this work, we propose an active 3D reconstruction framework for horticultural environments using a mobile manipulator. The proposed system integrates the classical Octomap representation with 3D Gaussian Splatting to enable accurate and efficient target-aware mapping. While a low-resolution Octomap provides probabilistic occupancy information for informative viewpoint selection and collision-free planning, 3D Gaussian Splatting leverages geometric, photometric, and semantic information to optimize a set of 3D Gaussians for high-fidelity scene reconstruction. We further introduce simple yet effective strategies to enhance robustness against segmentation noise and reduce memory consumption. Simulation experiments demonstrate that our method outperforms purely occupancy-based approaches in both runtime efficiency and reconstruction accuracy, enabling precise fruit counting and volume estimation. Compared to a 0.01m-resolution Octomap, our approach achieves an improvement of 6.6% in fruit-level F1 score under noise-free conditions, and up to 28.6% under segmentation noise. Additionally, it achieves a 50% reduction in runtime, highlighting its potential for scalable, real-time semantic reconstruction in agricultural robotics.\n\n农业场景的语义重建在表型分析、产量估计等任务中具有重要作用，但依赖人工扫描或固定相机布置的传统方法仍是该流程中的主要瓶颈。本文提出一个面向园艺环境的主动式三维重建框架，使用移动机械臂完成场景建图。该系统将经典的 Octomap 表示与 3D Gaussian Splatting 相结合，以实现准确且高效的目标感知建图。其中，低分辨率 Octomap 提供概率占据信息，用于信息量驱动的视角选择和无碰撞规划；3D Gaussian Splatting 则结合几何、光度和语义信息，对一组 3D 高斯进行优化，从而实现高保真场景重建。本文还提出一些简单但有效的策略，以提升系统对分割噪声的鲁棒性并降低内存消耗。仿真实验表明，该方法在运行效率和重建精度上均优于纯占据栅格方法，可支持更准确的果实计数和体积估计。与分辨率为 0.01 m 的 Octomap 相比，该方法在无噪声条件下将果实级 F1 分数提升 6.6%，在存在分割噪声时最高提升 28.6%，同时运行时间减少 50%，显示出其在农业机器人中实现可扩展、实时语义重建的潜力。\n"
  },
  {
    "path": "abs/2601.12683.md",
    "content": "### GaussianTrimmer: Online Trimming Boundaries for 3DGS Segmentation\n\nWith the widespread application of 3D Gaussians in 3D scene representation, 3D scene segmentation methods based on 3D Gaussians have also gradually emerged. However, existing 3D Gaussian segmentation methods basically segment on the basis of Gaussian primitives. Due to the large variation range of the scale of 3D Gaussians, large-sized Gaussians that often span the foreground and background lead to jagged boundaries of segmented objects. To this end, we propose an online boundary trimming method, GaussianTrimmer, which is an efficient and plug-and-play post-processing method capable of trimming coarse boundaries for existing 3D Gaussian segmentation methods. Our method consists of two core steps: 1. Generating uniformly and well-covered virtual cameras; 2. Trimming Gaussian at the primitive level based on 2D segmentation results on virtual cameras. Extensive quantitative and qualitative experiments demonstrate that our method can improve the segmentation quality of existing 3D Gaussian segmentation methods as a plug-and-play method.\n\n随着 3D 高斯在三维场景表示中的广泛应用，基于 3D 高斯的三维场景分割方法也逐渐出现。然而，现有 3D 高斯分割方法基本都是以高斯原语为基础进行分割。由于 3D 高斯的尺度变化范围很大，一些大尺度高斯往往会同时跨越前景和背景，从而导致分割结果边界锯齿明显。为此，本文提出一种在线边界修剪方法 GaussianTrimmer。该方法是一种高效、即插即用的后处理方案，可用于修剪现有 3D 高斯分割方法产生的粗糙边界。其核心包括两个步骤：1）生成分布均匀且覆盖充分的虚拟相机；2）基于这些虚拟相机上的二维分割结果，在高斯原语层面对边界进行修剪。大量定量与定性实验表明，该方法作为插件式后处理模块，能够有效提升现有 3D 高斯分割方法的分割质量。\n"
  },
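The two steps of GaussianTrimmer lend themselves to a compact sketch: place well-covering virtual cameras around the segmented object, then drop boundary Gaussians that too few virtual views vote for. The Fibonacci lattice and the vote threshold below are illustrative choices, not necessarily the paper's:

```python
import numpy as np

def fibonacci_sphere_cameras(n, radius, center):
    """Step 1: evenly covered virtual camera positions on a sphere around
    the segmented object. The Fibonacci lattice is one standard way to
    obtain uniform coverage."""
    i = np.arange(n)
    phi = np.arccos(1 - 2 * (i + 0.5) / n)
    theta = np.pi * (1 + 5 ** 0.5) * i
    dirs = np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=-1)
    return center + radius * dirs

def trim_by_votes(inside_votes, n_views, keep_ratio=0.6):
    """Step 2: keep a boundary Gaussian only if enough virtual views place
    its projection inside the 2D segmentation mask. `inside_votes` counts
    those views per Gaussian; the keep ratio is illustrative."""
    return inside_votes >= keep_ratio * n_views
```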
  {
    "path": "abs/2601.12814.md",
    "content": "### CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting\n\nWe present the first unified framework for rate-distortion-optimized compression and segmentation of 3D Gaussian Splatting (3DGS). While 3DGS has proven effective for both real-time rendering and semantic scene understanding, prior works have largely treated these tasks independently, leaving their joint consideration unexplored. Inspired by recent advances in rate-distortion-optimized 3DGS compression, this work integrates semantic learning into the compression pipeline to support decoder-side applications--such as scene editing and manipulation--that extend beyond traditional scene reconstruction and view synthesis. Our scheme features a lightweight implicit neural representation-based hyperprior, enabling efficient entropy coding of both color and semantic attributes while avoiding costly grid-based hyperprior as seen in many prior works. To facilitate compression and segmentation, we further develop compression-guided segmentation learning, consisting of quantization-aware training to enhance feature separability and a quality-aware weighting mechanism to suppress unreliable Gaussian primitives. Extensive experiments on the LERF and 3D-OVS datasets demonstrate that our approach significantly reduces transmission cost while preserving high rendering quality and strong segmentation performance.\n\n本文提出首个面向 3D Gaussian Splatting（3DGS）率失真优化压缩与分割的统一框架。虽然 3DGS 已被证明同时适用于实时渲染和语义场景理解，但以往工作基本将这两个任务分开处理，尚未探索它们的联合建模。受近期率失真优化 3DGS 压缩方法的启发，本文将语义学习整合进压缩流程，以支持解码端的场景编辑、场景操控等应用，而不仅限于传统的场景重建和视角合成。该方法采用基于轻量隐式神经表示的超先验，在避免许多先前工作中高成本网格超先验的同时，实现了对颜色属性和语义属性的高效熵编码。为同时提升压缩和分割效果，本文进一步提出压缩引导的分割学习机制，包括提升特征可分性的量化感知训练，以及抑制不可靠高斯原语影响的质量感知加权机制。在 LERF 和 3D-OVS 数据集上的大量实验表明，该方法能够在显著降低传输成本的同时，保持较高的渲染质量和较强的分割性能。\n"
  },
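Quantization-aware training, one half of the compression-guided segmentation learning above, is commonly implemented with a straight-through estimator so the forward pass sees exactly what the entropy coder will transmit. A minimal sketch with an illustrative step size:

```python
import torch

def quantize_ste(x, step=0.02):
    """Quantization-aware training via the straight-through estimator: the
    forward pass sees quantized semantic/color attributes (matching what
    the entropy coder transmits), while gradients pass through unchanged.
    The step size is illustrative, not the paper's value."""
    xq = torch.round(x / step) * step
    return x + (xq - x).detach()
```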
  {
    "path": "abs/2601.12823.md",
    "content": "### TreeDGS: Aerial Gaussian Splatting for Distant DBH Measurement\n\nAerial remote sensing enables efficient large-area surveying, but accurate direct object-level measurement remains difficult in complex natural scenes. Recent advancements in 3D vision, particularly learned radiance-field representations such as NeRF and 3D Gaussian Splatting, have begun to raise the ceiling on reconstruction fidelity and densifiable geometry from posed imagery. Nevertheless, direct aerial measurement of important natural attributes such as tree diameter at breast height (DBH) remains challenging. Trunks in aerial forest scans are distant and sparsely observed in image views: at typical operating altitudes, stems may span only a few pixels. With these constraints, conventional reconstruction methods leave breast-height trunk geometry weakly constrained. We present TreeDGS, an aerial image reconstruction method that leverages 3D Gaussian Splatting as a continuous, densifiable scene representation for trunk measurement. After SfM--MVS initialization and Gaussian optimization, we extract a dense point set from the Gaussian field using RaDe-GS's depth-aware cumulative-opacity integration and associate each sample with a multi-view opacity reliability score. Then, we estimate DBH from trunk-isolated points using opacity-weighted solid-circle fitting. Evaluated on 10 plots with field-measured DBH, TreeDGS reaches 4.79,cm RMSE (about 2.6 pixels at this GSD) and outperforms a state-of-the-art LiDAR baseline (7.91,cm RMSE). This shows that TreeDGS can enable accurate, low-cost aerial DBH measurement\n\n航空遥感能够高效完成大范围测绘，但在复杂自然场景中，要直接且准确地进行目标级测量仍然很困难。近年来，三维视觉的发展，尤其是 NeRF 和 3D Gaussian Splatting 这类学习式辐射场表示，显著提升了基于已知位姿图像进行高保真重建和致密几何恢复的能力。然而，针对胸径（DBH）这类重要自然属性的直接航空测量仍然具有挑战性。森林航拍中，树干距离远、在图像中观测稀疏，在常见飞行高度下往往只占几个像素，因此传统重建方法难以对胸高位置的树干几何形成足够约束。本文提出 TreeDGS，一种面向空中图像的重建方法，利用 3D Gaussian Splatting 作为连续且可致密化的场景表示来完成树干测量。在 SfM-MVS 初始化和高斯优化之后，方法通过 RaDe-GS 的深度感知累积透明度积分，从高斯场中提取稠密点集，并为每个样本关联一个多视角透明度可靠性分数。随后，方法在树干分离点集上使用透明度加权的实心圆拟合来估计 DBH。在 10 个具有实测 DBH 的样地上，TreeDGS 达到 4.79 cm 的 RMSE（约为该地面采样距离下 2.6 个像素），优于先进的 LiDAR 基线方法（7.91 cm RMSE）。这表明 TreeDGS 有望实现准确且低成本的航空 DBH 测量。\n"
  },
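The final DBH estimate comes from opacity-weighted circle fitting on trunk-isolated points. Below is a weighted algebraic (Kasa-style) fit as a simplified stand-in for the paper's solid-circle formulation, using the multi-view opacity reliability scores as weights:

```python
import numpy as np

def weighted_circle_fit(xy, w):
    """Opacity-weighted algebraic circle fit for a breast-height trunk
    slice. The paper fits a solid circle; this rim-based Kasa-style fit is
    a simplified stand-in.
    xy: (N, 2) trunk points in the horizontal plane
    w:  (N,)   multi-view opacity reliability scores used as weights
    Returns (center, radius); DBH is then 2 * radius."""
    A = np.column_stack([2 * xy[:, 0], 2 * xy[:, 1], np.ones(len(xy))])
    b = (xy ** 2).sum(axis=1)
    W = np.sqrt(w)[:, None]                   # weight rows by sqrt(w_i)
    sol, *_ = np.linalg.lstsq(W * A, W[:, 0] * b, rcond=None)
    cx, cy, c = sol
    r = np.sqrt(c + cx ** 2 + cy ** 2)
    return np.array([cx, cy]), r
```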
  {
    "path": "abs/2601.13132.md",
    "content": "### GaussExplorer: 3D Gaussian Splatting for Embodied Exploration and Reasoning\n\nWe present GaussExplorer, a framework for embodied exploration and reasoning built on 3D Gaussian Splatting (3DGS). While prior approaches to language-embedded 3DGS have made meaningful progress in aligning simple text queries with Gaussian embeddings, they are generally optimized for relatively simple queries and struggle to interpret more complex, compositional language queries. Alternative studies based on object-centric RGB-D structured memories provide spatial grounding but are constrained by pre-fixed viewpoints. To address these issues, GaussExplorer introduces Vision-Language Models (VLMs) on top of 3DGS to enable question-driven exploration and reasoning within 3D scenes. We first identify pre-captured images that are most correlated with the query question, and subsequently adjust them into novel viewpoints to more accurately capture visual information for better reasoning by VLMs. Experiments show that ours outperforms existing methods on several benchmarks, demonstrating the effectiveness of integrating VLM-based reasoning with 3DGS for embodied tasks.\n\n本文提出 GaussExplorer，一个构建在 3D Gaussian Splatting（3DGS）之上的具身探索与推理框架。以往将语言嵌入 3DGS 的方法，虽然在将简单文本查询与高斯表示对齐方面取得了进展，但通常只适用于较简单的问题，难以理解更复杂、具有组合结构的语言查询。另一类基于以目标为中心的 RGB-D 结构化记忆的方法虽然具备空间定位能力，但受限于预先固定的视角。为解决这些问题，GaussExplorer 在 3DGS 之上引入视觉语言模型（VLM），从而支持面向问题的三维场景探索与推理。该方法首先找出与问题最相关的预采集图像，再将这些图像调整到新的观察视角，以更准确地捕获视觉信息，提升 VLM 的推理效果。实验结果表明，该方法在多个基准上优于已有方法，验证了将基于 VLM 的推理与 3DGS 结合用于具身任务的有效性。\n"
  },
  {
    "path": "abs/2601.14208.md",
    "content": "### Rig-Aware 3D Reconstruction of Vehicle Undercarriages using Gaussian Splatting\n\nInspecting the undercarriage of used vehicles is a labor-intensive task that requires inspectors to crouch or crawl underneath each vehicle to thoroughly examine it. Additionally, online buyers rarely see undercarriage photos. We present an end-to-end pipeline that utilizes a three-camera rig to capture videos of the undercarriage as the vehicle drives over it, and produces an interactive 3D model of the undercarriage. The 3D model enables inspectors and customers to rotate, zoom, and slice through the undercarriage, allowing them to detect rust, leaks, or impact damage in seconds, thereby improving both workplace safety and buyer confidence. Our primary contribution is a rig-aware Structure-from-Motion (SfM) pipeline specifically designed to overcome the challenges of wide-angle lens distortion and low-parallax scenes. Our method overcomes the challenges of wide-angle lens distortion and low-parallax scenes by integrating precise camera calibration, synchronized video streams, and strong geometric priors from the camera rig. We use a constrained matching strategy with learned components, the DISK feature extractor, and the attention-based LightGlue matcher to generate high-quality sparse point clouds that are often unattainable with standard SfM pipelines. These point clouds seed the Gaussian splatting process to generate photorealistic undercarriage models that render in real-time. Our experiments and ablation studies demonstrate that our design choices are essential to achieve state-of-the-art quality.\n\n检查二手车底盘是一项劳动密集型任务，检验人员往往需要蹲下甚至钻到车底下才能完成全面检查。此外，线上购车用户通常也看不到底盘照片。本文提出一个端到端流程，利用三相机 rig 在车辆驶过时拍摄其底盘视频，并生成一个可交互的三维底盘模型。该模型允许检验人员和买家对底盘进行旋转、缩放和切片，从而在数秒内发现锈蚀、漏液或撞击损伤，进而提升工作安全性和买家信心。本文的主要贡献是一个 rig-aware 的运动恢复结构（SfM）流程，专门用于解决广角镜头畸变和低视差场景带来的难题。该方法通过结合精确相机标定、同步视频流以及来自相机 rig 的强几何先验，有效应对这些挑战。作者进一步采用带有学习组件的约束匹配策略，以及 DISK 特征提取器和基于注意力的 LightGlue 匹配器，生成高质量稀疏点云，而这些点云通常难以通过标准 SfM 流程获得。随后，这些点云作为 Gaussian Splatting 的初始化种子，用于生成可实时渲染的高真实感底盘模型。实验和消融研究表明，本文的这些设计选择对于达到先进水平的效果是关键的。\n"
  },
  {
    "path": "abs/2601.14510.md",
    "content": "### Structured Image-based Coding for Efficient Gaussian Splatting Compression\n\nGaussian Splatting (GS) has recently emerged as a state-of-the-art representation for radiance fields, combining real-time rendering with high visual fidelity. However, GS models require storing millions of parameters, leading to large file sizes that impair their use in practical multimedia systems. To address this limitation, this paper introduces GS Image-based Compression (GSICO), a novel GS codec that efficiently compresses pre-trained GS models while preserving perceptual fidelity. The core contribution lies in a mapping procedure that arranges GS parameters into structured images, guided by a novel algorithm that enhances spatial coherence. These GS parameter images are then encoded using a conventional image codec. Experimental evaluations on Tanks and Temples, Deep Blending, and Mip-NeRF360 datasets show that GSICO achieves average compression factors of 20.2x with minimal loss in visual quality, as measured by PSNR, SSIM, and LPIPS. Compared with state-of-the-art GS compression methods, the proposed codec consistently yields superior rate-distortion (RD) trade-offs.\n\nGaussian Splatting（GS）近来已成为辐射场表示中的先进方案，兼顾实时渲染能力与较高视觉保真度。然而，GS 模型需要存储数百万个参数，导致文件体积较大，限制了其在实际多媒体系统中的应用。为了解决这一问题，本文提出 GS Image-based Compression（GSICO），这是一种新的 GS 编码方法，能够在保持感知质量的同时高效压缩预训练的 GS 模型。其核心贡献在于一种映射过程：在新算法的引导下，将 GS 参数组织成具有更强空间一致性的结构化图像，再借助传统图像编解码器进行压缩。基于 Tanks and Temples、Deep Blending 和 Mip-NeRF360 数据集的实验表明，GSICO 平均可实现 20.2 倍压缩，同时在 PSNR、SSIM 和 LPIPS 指标上仅造成很小的视觉质量损失。与当前先进的 GS 压缩方法相比，该方法能够稳定取得更优的率失真折中表现。\n"
  },
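The core of GSICO is a mapping that lays per-Gaussian parameters out as images with high spatial coherence so that an ordinary image codec compresses them well. A sketch of the idea using a crude spatial sort key; the paper's coherence-enhancing algorithm is more elaborate:

```python
import numpy as np

def pack_attribute_image(positions, attribute):
    """Lay one per-Gaussian attribute channel out as a square image so a
    conventional image codec can exploit spatial coherence. Sorting by
    concatenated quantized coordinates is a crude spatial ordering (a true
    Morton code would interleave bits) and stands in for the paper's
    coherence-enhancing mapping algorithm."""
    span = np.ptp(positions, axis=0) + 1e-9
    q = ((positions - positions.min(axis=0)) / span * 1023).astype(np.uint64)
    key = (q[:, 0] << 20) | (q[:, 1] << 10) | q[:, 2]
    order = np.argsort(key)
    side = int(np.ceil(np.sqrt(len(attribute))))
    img = np.zeros(side * side, dtype=attribute.dtype)
    img[:len(attribute)] = attribute[order]
    return img.reshape(side, side)   # hand this to e.g. a PNG encoder
```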
  {
    "path": "abs/2601.14821.md",
    "content": "### POTR: Post-Training 3DGS Compression\n\n3D Gaussian Splatting (3DGS) has recently emerged as a promising contender to Neural Radiance Fields (NeRF) in 3D scene reconstruction and real-time novel view synthesis. 3DGS outperforms NeRF in training and inference speed but has substantially higher storage requirements. To remedy this downside, we propose POTR, a post-training 3DGS codec built on two novel techniques. First, POTR introduces a novel pruning approach that uses a modified 3DGS rasterizer to efficiently calculate every splat's individual removal effect simultaneously. This technique results in 2-4x fewer splats than other post-training pruning techniques and as a result also significantly accelerates inference with experiments demonstrating 1.5-2x faster inference than other compressed models. Second, we propose a novel method to recompute lighting coefficients, significantly reducing their entropy without using any form of training. Our fast and highly parallel approach especially increases AC lighting coefficient sparsity, with experiments demonstrating increases from 70% to 97%, with minimal loss in quality. Finally, we extend POTR with a simple fine-tuning scheme to further enhance pruning, inference, and rate-distortion performance. Experiments demonstrate that POTR, even without fine-tuning, consistently outperforms all other post-training compression techniques in both rate-distortion performance and inference speed.\n\n3D Gaussian Splatting（3DGS）近来已成为 Neural Radiance Fields（NeRF）的有力竞争者，可用于三维场景重建和实时新视角合成。虽然 3DGS 在训练和推理速度上优于 NeRF，但其存储开销明显更高。为缓解这一问题，本文提出后训练 3DGS 编解码方法 POTR，核心包含两项新技术。首先，POTR 提出一种新的剪枝方法，利用改造后的 3DGS 光栅化器同时高效计算每个高斯点被移除后的单独影响。该方法相比其他后训练剪枝技术可将高斯数量减少 2 到 4 倍，并使推理速度提升 1.5 到 2 倍。其次，本文提出一种无需训练即可重新计算光照系数的新方法，从而显著降低其熵值。该高并行方案尤其能提升 AC 光照系数的稀疏性，实验中其稀疏率可从 70% 提升到 97%，同时仅带来极小质量损失。最后，POTR 还结合了一个简单的微调方案，以进一步改善剪枝效果、推理速度和率失真表现。实验表明，即使不进行微调，POTR 在率失真性能和推理速度上也能稳定优于其他后训练压缩方法。\n"
  },
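The training-free recomputation of lighting coefficients aims at making the AC spherical-harmonics bands sparse and low-entropy. The sketch below captures only the simplest version of that idea, a magnitude threshold on AC coefficients; POTR's actual recomputation is more involved:

```python
import numpy as np

def sparsify_ac_coeffs(sh, tol=1e-2):
    """Zero out AC spherical-harmonics coefficients whose magnitude is too
    small to visibly change a splat's color, increasing sparsity (and thus
    compressibility) without any training. The plain magnitude threshold
    is a simplification of the paper's recomputation; POTR reports raising
    AC sparsity from roughly 70% to 97%.
    sh: (N, K, 3) per-Gaussian SH coefficients, band 0 = DC."""
    ac = sh[:, 1:, :]
    mask = np.abs(ac) < tol
    ac[mask] = 0.0
    return sh, mask.mean()   # modified coefficients and achieved sparsity
```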
  {
    "path": "abs/2601.15431.md",
    "content": "### SplatBus: A Gaussian Splatting Viewer Framework via GPU Interprocess Communication\n\nRadiance field-based rendering methods have attracted significant interest from the computer vision and computer graphics communities. They enable high-fidelity rendering with complex real-world lighting effects, but at the cost of high rendering time. 3D Gaussian Splatting solves this issue with a rasterization-based approach for real-time rendering, enabling applications such as autonomous driving, robotics, virtual reality, and extended reality. However, current 3DGS implementations are difficult to integrate into traditional mesh-based rendering pipelines, which is a common use case for interactive applications and artistic exploration. To address this limitation, this software solution uses Nvidia's interprocess communication APIs to allow easy integration and to enable results to be viewed in external clients such as Unity, Blender, Unreal Engine, and OpenGL viewers. The code is available at https://github.com/RockyXu66/splatbus.\n\n基于辐射场的渲染方法近年来受到计算机视觉和计算机图形学领域的广泛关注。它们能够实现带有复杂真实光照效果的高保真渲染，但代价是渲染时间较长。3D Gaussian Splatting 通过基于栅格化的实时渲染方式缓解了这一问题，从而支持自动驾驶、机器人、虚拟现实和扩展现实等应用。然而，现有 3DGS 实现很难融入传统基于 mesh 的渲染管线，而这恰恰是交互式应用和艺术探索中的常见需求。为了解决这一限制，本文提出一种软件框架，利用 Nvidia 的进程间通信 API，实现对现有系统的便捷集成，并允许在 Unity、Blender、Unreal Engine 以及 OpenGL 查看器等外部客户端中直接查看结果。代码地址为 https://github.com/RockyXu66/splatbus。\n"
  },
  {
    "path": "abs/2601.15766.md",
    "content": "### LL-GaussianMap: Zero-shot Low-Light Image Enhancement via 2D Gaussian Splatting Guided Gain Maps\n\nSignificant progress has been made in low-light image enhancement with respect to visual quality. However, most existing methods primarily operate in the pixel domain or rely on implicit feature representations. As a result, the intrinsic geometric structural priors of images are often neglected. 2D Gaussian Splatting (2DGS) has emerged as a prominent explicit scene representation technique characterized by superior structural fitting capabilities and high rendering efficiency. Despite these advantages, the utilization of 2DGS in low-level vision tasks remains unexplored. To bridge this gap, LL-GaussianMap is proposed as the first unsupervised framework incorporating 2DGS into low-light image enhancement. Distinct from conventional methodologies, the enhancement task is formulated as a gain map generation process guided by 2DGS primitives. The proposed method comprises two primary stages. First, high-fidelity structural reconstruction is executed utilizing 2DGS. Then, data-driven enhancement dictionary coefficients are rendered via the rasterization mechanism of Gaussian splatting through an innovative unified enhancement module. This design effectively incorporates the structural perception capabilities of 2DGS into gain map generation, thereby preserving edges and suppressing artifacts during enhancement. Additionally, the reliance on paired data is circumvented through unsupervised learning. Experimental results demonstrate that LL-GaussianMap achieves superior enhancement performance with an extremely low storage footprint, highlighting the effectiveness of explicit Gaussian representations for image enhancement.\n\n低照度图像增强在视觉质量方面已经取得了显著进展。然而，大多数现有方法主要工作在像素域，或者依赖隐式特征表示，因此常常忽略图像本身固有的几何结构先验。二维 Gaussian Splatting (2DGS) 作为一种显式场景表示技术，具备更强的结构拟合能力和更高的渲染效率。尽管如此，2DGS 在低层视觉任务中的应用仍然缺乏探索。为填补这一空白，我们提出 LL-GaussianMap，这是首个将 2DGS 引入低照度图像增强的无监督框架。不同于传统方法，我们将增强任务表述为一个由 2DGS primitives 引导的增益图生成过程。该方法主要包含两个阶段：首先利用 2DGS 执行高保真的结构重建；随后通过一种创新的统一增强模块，借助 Gaussian splatting 的光栅化机制渲染数据驱动的增强字典系数。该设计将 2DGS 的结构感知能力有效引入增益图生成过程中，从而在增强时更好地保留边缘并抑制伪影。此外，该方法通过无监督学习摆脱了对成对数据的依赖。实验结果表明，LL-GaussianMap 在极低存储开销下实现了更优的增强性能，说明显式高斯表示对于图像增强任务具有很强的有效性。\n"
  },
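The enhancement itself reduces to multiplying the low-light input by a rendered gain map. A sketch of that final step, plus one common Retinex-style way a gain map can be defined; the inverse-illumination form is an illustration, not the paper's estimator:

```python
import numpy as np

def apply_gain_map(low_light, gain):
    """Enhancement formulated as gain-map application: the 2DGS pipeline
    in the paper renders the gain map; this shows only the final per-pixel
    step. Values are assumed to lie in [0, 1]."""
    return np.clip(low_light * gain, 0.0, 1.0)

def gain_from_illumination(illum, eps=1e-3):
    """A Retinex-style gain, roughly the inverse of estimated illumination
    (assumed form for illustration only)."""
    return 1.0 / (illum + eps)
```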
  {
    "path": "abs/2601.15772.md",
    "content": "### LL-GaussianImage: Efficient Image Representation for Zero-shot Low-Light Enhancement with 2D Gaussian Splatting\n\n2D Gaussian Splatting (2DGS) is an emerging explicit scene representation method with significant potential for image compression due to high fidelity and high compression ratios. However, existing low-light enhancement algorithms operate predominantly within the pixel domain. Processing 2DGS-compressed images necessitates a cumbersome decompression-enhancement-recompression pipeline, which compromises efficiency and introduces secondary degradation. To address these limitations, we propose LL-GaussianImage, the first zero-shot unsupervised framework designed for low-light enhancement directly within the 2DGS compressed representation domain. Three primary advantages are offered by this framework. First, a semantic-guided Mixture-of-Experts enhancement framework is designed. Dynamic adaptive transformations are applied to the sparse attribute space of 2DGS using rendered images as guidance to enable compression-as-enhancement without full decompression to a pixel grid. Second, a multi-objective collaborative loss function system is established to strictly constrain smoothness and fidelity during enhancement, suppressing artifacts while improving visual quality. Third, a two-stage optimization process is utilized to achieve reconstruction-as-enhancement. The accuracy of the base representation is ensured through single-scale reconstruction and network robustness is enhanced. High-quality enhancement of low-light images is achieved while high compression ratios are maintained. The feasibility and superiority of the paradigm for direct processing within the compressed representation domain are validated through experimental results.\n\n2D Gaussian Splatting（2DGS）是一种新兴的显式场景表示方法，因其高保真度和高压缩率，在图像压缩方面展现出很大潜力。然而，现有低照度增强算法大多仍工作在像素域中。若要处理经过 2DGS 压缩的图像，通常需要先解压、再增强、再重新压缩，这一繁琐流程既降低效率，也会引入二次退化。为解决这一问题，本文提出 LL-GaussianImage，这是首个直接在 2DGS 压缩表示域内进行低照度增强的零样本无监督框架。该框架主要具有三方面优势：首先，设计了一个语义引导的专家混合增强框架，利用渲染图像作为指导，在 2DGS 的稀疏属性空间中施加动态自适应变换，从而实现无需完整解压到像素网格的“压缩即增强”；其次，建立了多目标协同损失函数体系，以严格约束增强过程中的平滑性和保真度，在提升视觉质量的同时抑制伪影；第三，采用两阶段优化过程实现“重建即增强”，既保证基础表示的准确性，又增强网络鲁棒性。实验结果验证了在保持高压缩率的同时实现高质量低照度增强的可行性，也证明了在压缩表示域内直接处理图像这一范式的有效性和优越性。\n"
  },
  {
    "path": "abs/2601.15897.md",
    "content": "### ThermoSplat: Cross-Modal 3D Gaussian Splatting with Feature Modulation and Geometry Decoupling\n\nMulti-modal scene reconstruction integrating RGB and thermal infrared data is essential for robust environmental perception across diverse lighting and weather conditions. However, extending 3D Gaussian Splatting (3DGS) to multi-spectral scenarios remains challenging. Current approaches often struggle to fully leverage the complementary information of multi-modal data, typically relying on mechanisms that either tend to neglect cross-modal correlations or leverage shared representations that fail to adaptively handle the complex structural correlations and physical discrepancies between spectrums. To address these limitations, we propose ThermoSplat, a novel framework that enables deep spectral-aware reconstruction through active feature modulation and adaptive geometry decoupling. First, we introduce a Spectrum-Aware Adaptive Modulation that dynamically conditions shared latent features on thermal structural priors, effectively guiding visible texture synthesis with reliable cross-modal geometric cues. Second, to accommodate modality-specific geometric inconsistencies, we propose a Modality-Adaptive Geometric Decoupling scheme that learns independent opacity offsets and executes an independent rasterization pass for the thermal branch. Additionally, a hybrid rendering pipeline is employed to integrate explicit Spherical Harmonics with implicit neural decoding, ensuring both semantic consistency and high-frequency detail preservation. Extensive experiments on the RGBT-Scenes dataset demonstrate that ThermoSplat achieves state-of-the-art rendering quality across both visible and thermal spectrums.\n\n融合 RGB 与热红外数据的多模态场景重建，对于在复杂光照和天气条件下实现稳健环境感知至关重要。然而，将 3D Gaussian Splatting 扩展到多光谱场景仍然具有挑战。现有方法通常难以充分利用多模态数据之间的互补信息，往往要么忽视跨模态相关性，要么依赖共享表示，而这些共享表示又难以自适应处理不同光谱之间复杂的结构关联和物理差异。为解决这些问题，我们提出 ThermoSplat，这是一种通过主动特征调制和自适应几何解耦实现深度光谱感知重建的新框架。首先，我们设计了 Spectrum-Aware Adaptive Modulation，通过热模态结构先验动态调制共享潜在特征，从而利用可靠的跨模态几何线索引导可见光纹理合成。其次，为了适应不同模态下的几何不一致性，我们提出 Modality-Adaptive Geometric Decoupling 方案，为热模态分支学习独立的不透明度偏移，并执行独立的栅格化过程。此外，我们还采用混合渲染管线，将显式的球谐表示与隐式神经解码结合起来，以同时保证语义一致性和高频细节保留。在 RGBT-Scenes 数据集上的大量实验表明，ThermoSplat 在可见光和热模态两个频谱上都取得了当前最先进的渲染质量。\n"
  },
  {
    "path": "abs/2601.15951.md",
    "content": "### EVolSplat4D: Efficient Volume-based Gaussian Splatting for 4D Urban Scene Synthesis\n\nNovel view synthesis of static and dynamic urban scenes is essential for autonomous driving simulation, yet existing methods often struggle to balance reconstruction time with quality. While state-of-the-art neural radiance fields and 3D Gaussian Splatting approaches achieve photorealism, they often rely on time-consuming per-scene optimization. Conversely, emerging feed-forward methods frequently adopt per-pixel Gaussian representations, which lead to 3D inconsistencies when aggregating multi-view predictions in complex, dynamic environments. We propose EVolSplat4D, a feed-forward framework that moves beyond existing per-pixel paradigms by unifying volume-based and pixel-based Gaussian prediction across three specialized branches. For close-range static regions, we predict consistent geometry of 3D Gaussians over multiple frames directly from a 3D feature volume, complemented by a semantically enhanced image-based rendering module for predicting appearance. For dynamic actors, we utilize object-centric canonical spaces and a motion-adjusted rendering module to aggregate temporal features, ensuring stable 4D reconstruction despite noisy motion priors. Far-field scenery is handled by an efficient per-pixel Gaussian branch to ensure full-scene coverage. Experimental results on the KITTI-360, KITTI, Waymo, and PandaSet datasets show that EVolSplat4D reconstructs both static and dynamic environments with superior accuracy and consistency, outperforming both per-scene optimization and state-of-the-art feed-forward baselines.\n\n静态和动态城市场景的新视角合成对于自动驾驶仿真至关重要，但现有方法往往难以兼顾重建速度与质量。尽管当前最先进的 neural radiance fields 和 3D Gaussian Splatting 方法能够实现逼真的渲染效果，但它们通常依赖耗时的逐场景优化。相比之下，新出现的前馈式方法往往采用逐像素高斯表示，在复杂动态环境中聚合多视图预测时容易产生三维不一致。为此，我们提出 EVolSplat4D，这是一种前馈式框架，通过三个专门分支统一体素级和像素级高斯预测，突破了现有逐像素范式的局限。对于近距离静态区域，我们直接从三维特征体中预测跨多帧一致的三维高斯几何，并配合语义增强的图像式渲染模块预测外观；对于动态目标，我们使用以对象为中心的规范空间和运动调整渲染模块聚合时间特征，从而在运动先验存在噪声的情况下依然保持稳定的四维重建；对于远场场景，则使用高效的逐像素高斯分支以保证全场景覆盖。KITTI-360、KITTI、Waymo 和 PandaSet 数据集上的实验结果表明，EVolSplat4D 在静态与动态环境重建中都表现出更好的精度和一致性，优于逐场景优化方法和当前最先进的前馈式基线。\n"
  },
  {
    "path": "abs/2601.16736.md",
    "content": "### A Step to Decouple Optimization in 3DGS\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful technique for real-time novel view synthesis. As an explicit representation optimized through gradient propagation among primitives, optimization methods widely accepted in deep neural networks are also adopted in 3DGS, such as synchronous weight updating and Adam with adaptive gradients. However, considering the physical significance and specific design of 3DGS, there are two overlooked details in its optimization: update step coupling, which induces optimizer state rescaling and costly attribute updates outside the viewpoints, and gradient coupling in the moment, which may lead to under- or over-effective regularization. Such complex coupling remains under-explored. After revisiting the optimization of 3DGS, we decouple it and recompose the process into Sparse Adam, Re-State Regularization, and Decoupled Attribute Regularization. Through a large number of experiments under both the 3DGS and 3DGS-MCMC frameworks, our work provides a deeper understanding of these components. Finally, based on the empirical analysis, we redesign the optimization and propose AdamW-GS by re-coupling the beneficial components, under which better optimization efficiency and representation effectiveness are achieved simultaneously.\n\n3D Gaussian Splatting 已成为实时新视角合成中的一种强大技术。作为一种通过基元间梯度传播进行优化的显式表示，深度神经网络中广泛采用的优化方式也被直接用于 3DGS，例如同步参数更新和带自适应梯度的 Adam。然而，考虑到 3DGS 的物理含义及其特定设计，其优化过程中的两个细节长期被忽视：一是更新步长耦合，这会引起优化器状态缩放，并在视角之外带来高代价的属性更新；二是动量中的梯度耦合，这可能导致正则化不足或过强。这样的复杂耦合机制目前研究仍不充分。本文重新审视 3DGS 的优化过程，将其拆解并重组为 Sparse Adam、Re-State Regularization 和 Decoupled Attribute Regularization 三个部分。通过在 3DGS 与 3DGS-MCMC 两个框架下进行大量实验，我们更深入地分析了这些组成部分的作用。最终，我们基于经验分析重新设计优化流程，通过重新组合有益组件提出 AdamW-GS，在优化效率和表示效果上同时取得更优表现。\n"
  },
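Of the three recomposed components, Sparse Adam is the most self-contained: Adam moments and step counts are advanced only for Gaussians actually touched by the current view, avoiding the state rescaling and out-of-view attribute updates the abstract points out. A sketch, assuming per-row step counters:

```python
import torch

@torch.no_grad()
def sparse_adam_step(param, state, visible, lr=1e-3,
                     betas=(0.9, 0.999), eps=1e-8):
    """One Adam step applied only to Gaussians visible in the current view,
    leaving the moments and step counts of untouched Gaussians frozen.
    This sketches the 'Sparse Adam' component; per-row step counters are
    one way to avoid the optimizer-state rescaling the paper points out.
    param:   (N, D) tensor with .grad populated for visible rows
    state:   dict with 'm', 'v' of shape (N, D) and float 't' of shape (N,)
    visible: (N,) boolean mask of Gaussians touched by this view"""
    g = param.grad[visible]
    state['t'][visible] += 1
    t = state['t'][visible].unsqueeze(-1)
    state['m'][visible] = betas[0] * state['m'][visible] + (1 - betas[0]) * g
    state['v'][visible] = betas[1] * state['v'][visible] + (1 - betas[1]) * g * g
    m_hat = state['m'][visible] / (1 - betas[0] ** t)
    v_hat = state['v'][visible] / (1 - betas[1] ** t)
    param.data[visible] -= lr * m_hat / (v_hat.sqrt() + eps)
```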
  {
    "path": "abs/2601.17185.md",
    "content": "### LGDWT-GS: Local and Global Discrete Wavelet-Regularized 3D Gaussian Splatting for Sparse-View Scene Reconstruction\n\nWe propose a new method for few-shot 3D reconstruction that integrates global and local frequency regularization to stabilize geometry and preserve fine details under sparse-view conditions, addressing a key limitation of existing 3D Gaussian Splatting (3DGS) models. We also introduce a new multispectral greenhouse dataset containing four spectral bands captured from diverse plant species under controlled conditions. Alongside the dataset, we release an open-source benchmarking package that defines standardized few-shot reconstruction protocols for evaluating 3DGS-based methods. Experiments on our multispectral dataset, as well as standard benchmarks, demonstrate that the proposed method achieves sharper, more stable, and spectrally consistent reconstructions than existing baselines. The dataset and code for this work are publicly available at https://github.com/Advanced-Vision-and-Learning-Lab/sparse-view-3dgs-pack.\n\n我们提出一种新的少样本三维重建方法，通过联合全局与局部频率正则，在稀疏视角条件下稳定几何结构并保留细节，从而解决现有 3D Gaussian Splatting 模型中的一个关键局限。与此同时，我们还发布了一个新的多光谱温室数据集，其中包含在受控环境下从多种植物物种采集的四个光谱波段数据。围绕该数据集，我们同步提供开源基准评测工具包，用于定义标准化的少样本重建协议，以评估基于 3DGS 的方法。我们在这一多光谱数据集及标准基准上的实验表明，该方法相较现有基线能够获得更锐利、更稳定且光谱一致性更好的重建结果。数据集和代码已公开在 https://github.com/Advanced-Vision-and-Learning-Lab/sparse-view-3dgs-pack。\n"
  },
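A frequency regularizer in this spirit can be written with a differentiable one-level Haar transform: match the low band for stable global geometry and the high bands for fine detail. The weights and the single-level, single-channel form below are simplifications of the paper's global-plus-local scheme:

```python
import torch
import torch.nn.functional as F

def haar_dwt2(x):
    """One-level 2D Haar transform of a (B, 1, H, W) image, written with
    strided convolutions so it stays differentiable. Returns the low band
    and the three high-frequency bands."""
    k = 0.5 * torch.tensor([[[[1., 1.], [1., 1.]]],      # LL
                            [[[1., -1.], [1., -1.]]],    # LH
                            [[[1., 1.], [-1., -1.]]],    # HL
                            [[[1., -1.], [-1., 1.]]]],   # HH
                           device=x.device)
    out = F.conv2d(x, k, stride=2)
    return out[:, :1], out[:, 1:]

def wavelet_loss(render, target, w_low=1.0, w_high=0.5):
    """Frequency regularizer in the spirit of LGDWT-GS: the low band
    stabilizes geometry, the high bands preserve detail. The weights are
    illustrative; the paper combines global and local (patch-wise)
    variants of this idea."""
    rl, rh = haar_dwt2(render)
    tl, th = haar_dwt2(target)
    return w_low * F.l1_loss(rl, tl) + w_high * F.l1_loss(rh, th)
```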
  {
    "path": "abs/2601.17354.md",
    "content": "### PocketGS: On-Device Training of 3D Gaussian Splatting for High Perceptual Modeling\n\nEfficient and high-fidelity 3D scene modeling is a long-standing pursuit in computer graphics. While recent 3D Gaussian Splatting (3DGS) methods achieve impressive real-time modeling performance, they rely on resource-unconstrained training assumptions that fail on mobile devices, which are limited by minute-scale training budgets and hardware-available peak memory. We present PocketGS, a mobile scene modeling paradigm that enables on-device 3DGS training under these tightly coupled constraints while preserving high perceptual fidelity. Our method resolves the fundamental contradictions of standard 3DGS through three co-designed operators: G builds geometry-faithful point-cloud priors; I injects local surface statistics to seed anisotropic Gaussians, thereby reducing early conditioning gaps; and T unrolls alpha compositing with cached intermediates and index-mapped gradient scattering for stable mobile backpropagation. Collectively, these operators satisfy the competing requirements of training efficiency, memory compactness, and modeling fidelity. Extensive experiments demonstrate that PocketGS is able to outperform the powerful mainstream workstation 3DGS baseline to deliver high-quality reconstructions, enabling a fully on-device, practical capture-to-rendering workflow.\n\n高效且高保真的三维场景建模一直是计算机图形学中的长期目标。尽管近期的 3D Gaussian Splatting 方法已经展现出出色的实时建模能力，但它们通常假设训练阶段不受资源限制，这一前提在移动设备上并不成立，因为移动端往往只允许分钟级训练预算，并受到可用峰值显存或内存的严格限制。为此，我们提出 PocketGS，这是一种面向移动端场景建模的新范式，能够在这些紧密耦合的限制下实现设备端 3DGS 训练，同时保持较高的感知质量。我们通过三个协同设计的算子解决标准 3DGS 中的核心矛盾：G 用于构建几何可信的点云先验；I 将局部表面统计信息注入初始化过程，以生成各向异性高斯并减小早期条件失配；T 则通过缓存中间量和基于索引映射的梯度散射展开 alpha 合成，从而支持稳定的移动端反向传播。三者共同满足了训练效率、内存紧凑性和建模保真度之间的竞争性要求。大量实验表明，PocketGS 甚至能够超越主流工作站级 3DGS 基线，获得高质量重建结果，从而实现真正可用的端侧采集到渲染流程。\n"
  },
  {
    "path": "abs/2601.17835.md",
    "content": "### Geometry-Grounded Gaussian Splatting\n\nGaussian Splatting has demonstrated impressive quality and efficiency in novel view synthesis. However, shape extraction from Gaussian primitives remains an open problem. Due to inadequate geometry parameterization and approximation, existing shape reconstruction methods suffer from poor multi-view consistency and are sensitive to floaters. In this paper, we present a rigorous theoretical derivation that establishes Gaussian primitives as a specific type of stochastic solids. This theoretical framework provides a principled foundation for Geometry-Grounded Gaussian Splatting by enabling the direct treatment of Gaussian primitives as explicit geometric representations. Using the volumetric nature of stochastic solids, our method efficiently renders high-quality depth maps for fine-grained geometry extraction. Experiments show that our method achieves the best shape reconstruction results among all Gaussian Splatting-based methods on public datasets.\n\nGaussian Splatting 在新视角合成任务中已经展现出优异的质量和效率，但如何从高斯基元中准确提取形状仍然是一个开放问题。由于几何参数化和近似方式不足，现有形状重建方法往往存在多视图一致性较差、并且对漂浮伪影敏感的问题。本文通过严格的理论推导，将高斯基元建立为一种特定类型的随机实体，从而为 Geometry-Grounded Gaussian Splatting 提供了有原则的理论基础，使高斯基元能够被直接视为显式几何表示。利用随机实体的体积性质，我们的方法能够高效渲染高质量深度图，用于细粒度几何提取。实验结果表明，在公开数据集上，该方法在所有基于 Gaussian Splatting 的方法中取得了最好的形状重建结果。\n"
  },
  {
    "path": "abs/2601.18475.md",
    "content": "### LoD-Structured 3D Gaussian Splatting for Streaming Video Reconstruction\n\nFree-Viewpoint Video (FVV) reconstruction enables photorealistic and interactive 3D scene visualization; however, real-time streaming is often bottlenecked by sparse-view inputs, prohibitive training costs, and bandwidth constraints. While recent 3D Gaussian Splatting (3DGS) has advanced FVV due to its superior rendering speed, Streaming Free-Viewpoint Video (SFVV) introduces additional demands for rapid optimization, high-fidelity reconstruction under sparse constraints, and minimal storage footprints. To bridge this gap, we propose StreamLoD-GS, an LoD-based Gaussian Splatting framework designed specifically for SFVV. Our approach integrates three core innovations: 1) an anchor- and octree-based LoD-structured 3DGS with a hierarchical Gaussian dropout technique to ensure efficient and stable optimization while maintaining high-quality rendering; 2) a GMM-based motion partitioning mechanism that separates dynamic and static content, refining dynamic regions while preserving background stability; and 3) a quantized residual refinement framework that significantly reduces storage requirements without compromising visual fidelity. Extensive experiments demonstrate that StreamLoD-GS achieves competitive or state-of-the-art performance in terms of quality, efficiency, and storage.\n\n自由视点视频重建能够实现逼真且可交互的三维场景可视化，但实时流式传输通常受限于稀疏视角输入、高昂的训练成本以及带宽约束。尽管近期 3D Gaussian Splatting 依靠优异的渲染速度推动了自由视点视频重建的发展，但流式自由视点视频场景对快速优化、稀疏条件下的高保真重建以及极小的存储开销提出了更高要求。为弥补这一缺口，我们提出 StreamLoD-GS，这是一种专为流式自由视点视频设计的基于层次细节的 Gaussian Splatting 框架。该方法包含三项核心创新：1）结合锚点和八叉树的层次细节结构化 3DGS，并引入分层高斯 dropout 技术，在保持高质量渲染的同时实现高效稳定优化；2）基于高斯混合模型的运动划分机制，将动态内容与静态内容分离，在保留背景稳定性的同时细化动态区域；3）量化残差细化框架，在不牺牲视觉保真的情况下显著降低存储需求。大量实验表明，StreamLoD-GS 在质量、效率和存储开销方面达到有竞争力或当前最先进的水平。\n"
  },
  {
    "path": "abs/2601.18629.md",
    "content": "### ExoGS: A 4D Real-to-Sim-to-Real Framework for Scalable Manipulation Data Collection\n\nReal-to-Sim-to-Real technique is gaining increasing interest for robotic manipulation, as it can generate scalable data in simulation while having a narrower sim-to-real gap. However, previous methods mainly focused on environment-level visual real-to-sim transfer, ignoring the transfer of interactions, which could be challenging and inefficient to obtain purely in simulation, especially for contact-rich tasks. We propose ExoGS, a robot-free 4D Real-to-Sim-to-Real framework that captures both static environments and dynamic interactions in the real world and transfers them seamlessly to a simulated environment. It provides a new solution for scalable manipulation data collection and policy learning. ExoGS employs a self-designed robot-isomorphic passive exoskeleton AirExo-3 to capture kinematically consistent trajectories with millimeter-level accuracy and synchronized RGB observations during direct human demonstrations. The robot, objects, and environment are reconstructed as editable 3D Gaussian Splatting assets, enabling geometry-consistent replay and large-scale data augmentation. Additionally, a lightweight Mask Adapter injects instance-level semantics into the policy to enhance robustness under visual domain shifts. Real-world experiments demonstrate that ExoGS significantly improves data efficiency and policy generalization compared to teleoperation-based baselines. Code and hardware files have been released at https://github.com/zaixiabalala/ExoGS.\n\nReal-to-Sim-to-Real 技术在机器人操作领域正受到越来越多关注，因为它能够在仿真环境中生成可扩展的数据，同时缩小仿真到现实之间的差距。然而，既有方法主要关注环境层面的视觉 real-to-sim 迁移，忽略了交互过程本身的迁移，而这类交互尤其在富接触任务中很难仅靠仿真高效获得。为此，我们提出 ExoGS，这是一种无需机器人参与的四维 Real-to-Sim-to-Real 框架，能够同时采集真实世界中的静态环境和动态交互，并将其无缝迁移到仿真环境中，从而为大规模操作数据采集和策略学习提供新的解决方案。ExoGS 使用自设计的类机器人被动外骨骼 AirExo-3，在人类直接示范过程中采集具有毫米级精度的运动学一致轨迹和同步 RGB 观测。机器人、物体和环境都会被重建为可编辑的 3D Gaussian Splatting 资产，以支持几何一致的回放和大规模数据增强。此外，一个轻量级 Mask Adapter 会将实例级语义注入策略中，以增强其在视觉域偏移下的鲁棒性。真实世界实验表明，相较基于遥操作的基线方法，ExoGS 显著提升了数据效率和策略泛化能力。代码和硬件文件已发布在 https://github.com/zaixiabalala/ExoGS。\n"
  },
  {
    "path": "abs/2601.18633.md",
    "content": "### Splat-Portrait: Generalizing Talking Heads with Gaussian Splatting\n\nTalking Head Generation aims at synthesizing natural-looking talking videos from speech and a single portrait image. Previous 3D talking head generation methods have relied on domain-specific heuristics such as warping-based facial motion representation priors to animate talking motions, yet still produce inaccurate 3D avatar reconstructions, thus undermining the realism of generated animations. We introduce Splat-Portrait, a Gaussian-splatting-based method that addresses the challenges of 3D head reconstruction and lip motion synthesis. Our approach automatically learns to disentangle a single portrait image into a static 3D reconstruction represented as static Gaussian Splatting, and a predicted whole-image 2D background. It then generates natural lip motion conditioned on input audio, without any motion-driven priors. Training is driven purely by 2D reconstruction and score-distillation losses, without 3D supervision nor landmarks. Experimental results demonstrate that Splat-Portrait exhibits superior performance on talking head generation and novel view synthesis, achieving better visual quality compared to previous works. Our project code and supplementary documents are publicly available at https://github.com/stonewalking/Splat-portrait.\n\nTalking Head Generation 旨在根据语音和单张人像图像合成自然逼真的说话视频。以往的三维说话头像生成方法通常依赖特定领域的启发式设计，例如基于形变的面部运动表示先验来驱动说话动作，但仍然会生成不准确的三维头像重建结果，从而削弱最终动画的真实感。本文提出 Splat-Portrait，这是一种基于 Gaussian Splatting 的方法，用于同时解决三维头部重建和唇部运动合成问题。该方法能够自动将单张人像图像分解为由静态 Gaussian Splatting 表示的静态三维重建，以及一个预测得到的整图二维背景。随后，它在不依赖任何运动先验的情况下，根据输入语音生成自然的唇部动作。训练过程完全由二维重建损失和 score distillation 损失驱动，不需要三维监督或人脸关键点。实验结果表明，Splat-Portrait 在说话头像生成和新视角合成任务上都表现优越，相较现有方法取得了更好的视觉质量。项目代码和补充材料已公开在 https://github.com/stonewalking/Splat-portrait。\n"
  },
  {
    "path": "abs/2601.19233.md",
    "content": "### UniMGS: Unifying Mesh and 3D Gaussian Splatting with Single-Pass Rasterization and Proxy-Based Deformation\n\nJoint rendering and deformation of mesh and 3D Gaussian Splatting (3DGS) have significant value as both representations offer complementary advantages for graphics applications. However, due to differences in representation and rendering pipelines, existing studies render meshes and 3DGS separately, making it difficult to accurately handle occlusions and transparency. Moreover, the deformed 3DGS still suffers from visual artifacts due to the sensitivity to the topology quality of the proxy mesh. These issues pose serious obstacles to the joint use of 3DGS and meshes, making it difficult to adapt 3DGS to conventional mesh-oriented graphics pipelines. We propose UniMGS, the first unified framework for rasterizing mesh and 3DGS in a single-pass anti-aliased manner, with a novel binding strategy for 3DGS deformation based on proxy mesh. Our key insight is to blend the colors of both triangle and Gaussian fragments by anti-aliased alpha blending in a single pass, achieving visually coherent results with precise handling of occlusion and transparency. To improve the visual appearance of the deformed 3DGS, our Gaussian-centric binding strategy employs a proxy mesh and spatially associates Gaussians with the mesh faces, significantly reducing rendering artifacts. With these two components, UniMGS enables the visualization and manipulation of 3D objects represented by mesh or 3DGS within a unified framework, opening up new possibilities in embodied AI, virtual reality, and gaming. We will release our source code to facilitate future research.\n\n将 mesh 与 3D Gaussian Splatting 统一用于渲染和形变具有重要价值，因为两种表示在图形应用中具备互补优势。然而，由于表示形式和渲染流程不同，现有工作通常分别渲染 mesh 与 3DGS，难以精确处理遮挡和透明关系。此外，形变后的 3DGS 仍会受到代理 mesh 拓扑质量敏感性的影响，产生视觉伪影。这些问题严重阻碍了 mesh 与 3DGS 的联合使用，也使 3DGS 难以适配传统面向 mesh 的图形管线。为此，我们提出 UniMGS，这是首个能够以单次前向、抗锯齿方式统一栅格化 mesh 和 3DGS 的框架，并为基于代理 mesh 的 3DGS 形变设计了新的绑定策略。我们的核心思想是在单次渲染过程中，通过抗锯齿的 alpha 混合同时融合三角形片元和高斯片元的颜色，从而更精确地处理遮挡与透明效果，获得视觉上一致的结果。为改善形变后 3DGS 的视觉质量，我们提出以高斯为中心的绑定策略，借助代理 mesh 将高斯与 mesh 面片建立空间关联，显著减少渲染伪影。依靠这两个核心组件，UniMGS 使得以 mesh 或 3DGS 表示的三维对象可以在统一框架中进行可视化与编辑，为具身智能、虚拟现实和游戏等方向带来新的可能。作者表示后续将公开源代码。\n"
  },
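The key insight of UniMGS, blending triangle and Gaussian fragments in one anti-aliased pass, boils down to depth-sorting all fragments at a pixel once and alpha-compositing them regardless of primitive type. A per-pixel sketch (a real rasterizer does this per tile on the GPU):

```python
import numpy as np

def blend_fragments(fragments):
    """Single-pass blending of mixed triangle and Gaussian fragments for
    one pixel: sort everything by depth once, then alpha-composite front
    to back regardless of primitive type. Each fragment is
    (depth, rgb, alpha); triangle fragments simply carry alpha close to 1
    (or a coverage value for anti-aliasing)."""
    color, transmittance = np.zeros(3), 1.0
    for depth, rgb, alpha in sorted(fragments, key=lambda f: f[0]):
        color += transmittance * alpha * np.asarray(rgb)
        transmittance *= (1.0 - alpha)
        if transmittance < 1e-4:
            break
    return color, 1.0 - transmittance   # blended color and final opacity
```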
  {
    "path": "abs/2601.19489.md",
    "content": "### Fast Converging 3D Gaussian Splatting for 1-Minute Reconstruction\n\nWe present a fast 3DGS reconstruction pipeline designed to converge within one minute, developed for the SIGGRAPH Asia 3DGS Fast Reconstruction Challenge. The challenge consists of an initial round using SLAM-generated camera poses with noisy trajectories and a final round using highly accurate COLMAP poses. To robustly handle these heterogeneous settings, we develop a two-stage solution. In the first round, we use reverse per-Gaussian parallel optimization and compact forward splatting based on Taming-GS and Speedy-splat, load-balanced tiling, an anchor-based Neural-Gaussian representation for rapid convergence with fewer learnable parameters, initialization from monocular depth and partially from feed-forward 3DGS models, and a global pose refinement module for noisy SLAM trajectories. In the final round, the accurate COLMAP poses change the optimization landscape; we disable pose refinement, revert from Neural-Gaussians to standard 3DGS to eliminate MLP inference overhead, introduce multi-view consistency-guided Gaussian splitting inspired by Fast-GS, and add a depth estimator to supervise rendered depth. Together, these techniques enable high-fidelity reconstruction under a strict one-minute budget. Our method achieved the top performance with a PSNR of 28.43 and ranked first in the competition. Code is available at https://github.com/will-zzy/siggraph_asia.\n\n我们提出一条面向 1 分钟内收敛的快速 3DGS 重建流程，并将其用于 SIGGRAPH Asia 3DGS Fast Reconstruction Challenge。该挑战的初赛使用带噪声轨迹的 SLAM 相机位姿，决赛则使用精度很高的 COLMAP 位姿。为稳健处理这两种差异明显的设置，我们设计了一个两阶段方案。在第一阶段，我们采用基于 Taming-GS 和 Speedy-splat 的反向逐高斯并行优化与紧凑前向 splatting、负载均衡分块、基于锚点的 Neural-Gaussian 表示以减少可学习参数并加快收敛，同时结合单目深度和部分前馈式 3DGS 模型进行初始化，并加入全局位姿优化模块来修正噪声 SLAM 轨迹。在第二阶段，由于准确的 COLMAP 位姿改变了优化条件，我们关闭位姿优化，改回标准 3DGS 以消除 MLP 推理开销，引入受 Fast-GS 启发的多视图一致性高斯分裂策略，并加入深度估计器监督渲染深度。上述设计共同实现了在严格 1 分钟预算下的高保真重建。我们的方法以 28.43 的 PSNR 取得最佳成绩，并在竞赛中排名第一。代码地址为 https://github.com/will-zzy/siggraph_asia。\n"
  },
  {
    "path": "abs/2601.19717.md",
    "content": "### DiffStyle3D: Consistent 3D Gaussian Stylization via Attention Optimization\n\n3D style transfer enables the creation of visually expressive 3D content, enriching the visual appearance of 3D scenes and objects. However, existing VGG- and CLIP-based methods struggle to model multi-view consistency within the model itself, while diffusion-based approaches can capture such consistency but rely on denoising directions, leading to unstable training. To address these limitations, we propose DiffStyle3D, a novel diffusion-based paradigm for 3DGS style transfer that directly optimizes in the latent space. Specifically, we introduce an Attention-Aware Loss that performs style transfer by aligning style features in the self-attention space, while preserving original content through content feature alignment. Inspired by the geometric invariance of 3D stylization, we propose a Geometry-Guided Multi-View Consistency method that integrates geometric information into self-attention to enable cross-view correspondence modeling. Based on geometric information, we additionally construct a geometry-aware mask to prevent redundant optimization in overlapping regions across views, which further improves multi-view consistency. Extensive experiments show that DiffStyle3D outperforms state-of-the-art methods, achieving higher stylization quality and visual realism.\n\n三维风格迁移能够创建更具表现力的三维内容，丰富三维场景和对象的视觉外观。然而，现有基于 VGG 和 CLIP 的方法难以在模型内部有效建模多视图一致性；而基于扩散模型的方法虽然能够捕获这类一致性，却依赖去噪方向，导致训练过程不稳定。为解决这些问题，我们提出 DiffStyle3D，这是一种新的基于扩散的 3DGS 风格化范式，直接在潜空间中进行优化。具体而言，我们提出 Attention-Aware Loss，在自注意力空间中对齐风格特征来实现风格迁移，同时通过内容特征对齐保留原始内容。受三维风格化中几何不变性的启发，我们进一步提出 Geometry-Guided Multi-View Consistency 方法，将几何信息融入自注意力机制，从而实现跨视图对应关系建模。基于几何信息，我们还构建了几何感知掩码，用于避免跨视图重叠区域中的冗余优化，进一步提升多视图一致性。大量实验表明，DiffStyle3D 优于现有最先进方法，在风格化质量和视觉真实感方面都取得了更好结果。\n"
  },
  {
    "path": "abs/2601.19753.md",
    "content": "### WaterClear-GS: Optical-Aware Gaussian Splatting for Underwater Reconstruction and Restoration\n\nUnderwater 3D reconstruction and appearance restoration are hindered by the complex optical properties of water, such as wavelength-dependent attenuation and scattering. Existing Neural Radiance Fields (NeRF)-based methods struggle with slow rendering speeds and suboptimal color restoration, while 3D Gaussian Splatting (3DGS) inherently lacks the capability to model complex volumetric scattering effects. To address these issues, we introduce WaterClear-GS, the first pure 3DGS-based framework that explicitly integrates underwater optical properties of local attenuation and scattering into Gaussian primitives, eliminating the need for an auxiliary medium network. Our method employs a dual-branch optimization strategy to ensure underwater photometric consistency while naturally recovering water-free appearances. This strategy is enhanced by depth-guided geometry regularization and perception-driven image loss, together with exposure constraints, spatially-adaptive regularization, and physically guided spectral regularization, which collectively enforce local 3D coherence and maintain natural visual perception. Experiments on standard benchmarks and our newly collected dataset demonstrate that WaterClear-GS achieves outstanding performance on both novel view synthesis (NVS) and underwater image restoration (UIR) tasks, while maintaining real-time rendering. The code will be available at https://buaaxrzhang.github.io/WaterClear-GS/.\n\n水下三维重建与外观恢复会受到水体复杂光学性质的显著影响，例如与波长相关的衰减和散射。现有基于 Neural Radiance Fields 的方法通常存在渲染速度慢、颜色恢复效果不理想的问题，而 3D Gaussian Splatting 本身又缺乏建模复杂体散射效应的能力。为了解决这些问题，我们提出 WaterClear-GS，这是首个纯 3DGS 框架，能够将局部衰减与散射等水下光学特性显式融入高斯基元中，从而无需额外的介质网络。我们的方法采用双分支优化策略，在保证水下光度一致性的同时自然恢复无水外观。该策略进一步结合深度引导的几何正则、感知驱动的图像损失，以及曝光约束、空间自适应正则和物理引导的光谱正则，共同约束局部三维一致性并维持自然视觉效果。标准基准和我们新采集的数据集上的实验表明，WaterClear-GS 在新视角合成和水下图像恢复两项任务上都取得了优异表现，同时保持了实时渲染能力。代码将发布在 https://buaaxrzhang.github.io/WaterClear-GS/。\n"
  },
  {
    "path": "abs/2601.19843.md",
    "content": "### Graphical X Splatting (GraphiXS): A Graphical Model for 4D Gaussian Splatting under Uncertainty\n\nWe propose a new framework to systematically incorporate data uncertainty in Gaussian Splatting. Being the new paradigm of neural rendering, Gaussian Splatting has been investigated in many applications, with the main effort in extending its representation, improving its optimization process, and accelerating its speed. However, one orthogonal, much needed, but under-explored area is data uncertainty. In standard 4D Gaussian Splatting, data uncertainty can manifest as view sparsity, missing frames, camera asynchronization, etc. So far, there has been little research to holistically incorporating various types of data uncertainty under a single framework. To this end, we propose Graphical X Splatting, or GraphiXS, a new probabilistic framework that considers multiple types of data uncertainty, aiming for a fundamental augmentation of the current 4D Gaussian Splatting paradigm into a probabilistic setting. GraphiXS is general and can be instantiated with a range of primitives, e.g. Gaussians and Student's t. Furthermore, GraphiXS can be used to upgrade existing methods to accommodate data uncertainty. Through exhaustive evaluation and comparison, we demonstrate that GraphiXS can systematically model various uncertainties in data, outperform existing methods in many settings where data are missing or polluted in space and time, and therefore is a major generalization of the current 4D Gaussian Splatting research.\n\n我们提出一个新框架，用于在 Gaussian Splatting 中系统性地建模数据不确定性。作为神经渲染的新范式，Gaussian Splatting 已在许多应用中得到研究，现有工作主要聚焦于扩展表示形式、改进优化过程和提升运行速度。然而，数据不确定性这一方向虽与之正交，却同样关键且研究明显不足。在标准的四维 Gaussian Splatting 中，不确定性可能体现为视角稀疏、帧缺失、相机不同步等问题。到目前为止，几乎没有工作能在统一框架下整体处理多种类型的数据不确定性。为此，我们提出 Graphical X Splatting，简称 GraphiXS。这是一个新的概率建模框架，能够同时考虑多种数据不确定性，旨在将现有四维 Gaussian Splatting 从确定性范式根本性扩展到概率范式。GraphiXS 具有通用性，可实例化为多种基元形式，例如高斯或 Student's t 分布；它还可以作为现有方法的升级模块，使其具备处理数据不确定性的能力。通过充分的实验评估与比较，我们表明 GraphiXS 能够系统地建模多种不确定性，并在时空数据缺失或受污染的多种场景中优于现有方法，因此代表了当前四维 Gaussian Splatting 研究的重要泛化方向。\n"
  },
  {
    "path": "abs/2601.20331.md",
    "content": "### GVGS: Gaussian Visibility-Aware Multi-View Geometry for Accurate Surface Reconstruction\n\n3D Gaussian Splatting enables efficient optimization and high-quality rendering, yet accurate surface reconstruction remains challenging. Prior methods improve surface reconstruction by refining Gaussian depth estimates, either via multi-view geometric consistency or through monocular depth priors. However, multi-view constraints become unreliable under large geometric discrepancies, while monocular priors suffer from scale ambiguity and local inconsistency, ultimately leading to inaccurate Gaussian depth supervision. To address these limitations, we introduce a Gaussian visibility-aware multi-view geometric consistency constraint that aggregates the visibility of shared Gaussian primitives across views, enabling more accurate and stable geometric supervision. In addition, we propose a progressive quadtree-calibrated monocular depth constraint that performs block-wise affine calibration from coarse to fine spatial scales, mitigating the scale ambiguity of depth priors while preserving fine-grained surface details. Extensive experiments on DTU and TNT datasets demonstrate consistent improvements in geometric accuracy over prior Gaussian-based and implicit surface reconstruction methods. Codes are available at https://github.com/GVGScode/GVGS.\n\n3D Gaussian Splatting 具备高效优化和高质量渲染能力，但要实现准确的表面重建仍然具有挑战性。现有方法通常通过改进高斯深度估计来提升表面重建质量，要么依赖多视图几何一致性，要么依赖单目深度先验。然而，当几何差异较大时，多视图约束会变得不可靠；而单目先验又会受到尺度歧义和局部不一致的影响，最终导致高斯深度监督不准确。为解决这些问题，本文提出一种高斯可见性感知的多视图几何一致性约束，通过聚合跨视图共享高斯基元的可见性，提供更准确、更稳定的几何监督。此外，我们还提出一种渐进式四叉树校准的单目深度约束，在从粗到细的空间尺度上执行分块仿射校准，从而缓解深度先验的尺度歧义并保留细粒度表面细节。在 DTU 和 TNT 数据集上的大量实验表明，该方法相较以往基于高斯和基于隐式表示的表面重建方法，在几何精度上均取得了稳定提升。代码地址为 https://github.com/GVGScode/GVGS。\n"
  },
  {
    "path": "abs/2601.20429.md",
    "content": "### GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering\n\n3D Gaussian Splatting has gained widespread adoption across diverse applications due to its exceptional rendering performance and visual quality. While most existing methods rely on rasterization to render Gaussians, recent research has started investigating ray tracing approaches to overcome the fundamental limitations inherent in rasterization. However, current Gaussian ray tracing methods suffer from inefficiencies such as bloated acceleration structures and redundant node traversals, which greatly degrade ray tracing performance. In this work, we present GRTX, a set of software and hardware optimizations that enable efficient ray tracing for 3D Gaussian-based rendering. First, we introduce a novel approach for constructing streamlined acceleration structures for Gaussian primitives. Our key insight is that anisotropic Gaussians can be treated as unit spheres through ray space transformations, which substantially reduces BVH size and traversal overhead. Second, we propose dedicated hardware support for traversal checkpointing within ray tracing units. This eliminates redundant node visits during multi-round tracing by resuming traversal from checkpointed nodes rather than restarting from the root node in each subsequent round. Our evaluation shows that GRTX significantly improves ray tracing performance compared to the baseline ray tracing method with a negligible hardware cost.\n\n3D Gaussian Splatting 凭借出色的渲染性能和视觉质量，已经在众多应用中得到广泛采用。尽管现有大多数方法依赖栅格化来渲染高斯，近期研究开始探索利用光线追踪克服栅格化固有的局限。然而，现有高斯光线追踪方法通常存在加速结构臃肿、节点遍历冗余等低效问题，显著拖慢了光线追踪性能。为此，本文提出 GRTX，这是一组面向三维高斯渲染的软硬件联合优化方案。首先，我们提出一种构建紧凑高斯基元加速结构的新方法。核心观察在于，各向异性高斯可以通过光线空间变换视作单位球，从而显著降低 BVH 的规模和遍历开销。其次，我们为光线追踪单元设计了专门的遍历检查点硬件支持，使得多轮追踪时可以从已保存的节点继续遍历，而不必在每一轮都从根节点重新开始，从而消除重复节点访问。实验结果表明，在几乎可忽略的硬件开销下，GRTX 相较基线高斯光线追踪方法能够显著提升性能。\n"
  },
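The ray-space transformation behind GRTX's streamlined acceleration structure admits a compact illustration: factor the covariance as Σ = MMᵀ with M = R · diag(s), pull the ray through M⁻¹, and every anisotropic Gaussian becomes a unit isotropic one, so hit testing collapses to a ray-vs-sphere check. Below is a minimal numpy sketch of that idea; the `cutoff` threshold and the exact factorization are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def ray_hits_gaussian(o, d, mu, R, s, cutoff=3.0):
    """Hit-test a ray o + t*d against a Gaussian N(mu, R diag(s)^2 R^T).

    Mapping the ray through M^{-1} = diag(1/s) R^T turns the anisotropic
    Gaussian into a unit isotropic one, so the test reduces to a
    ray-vs-sphere check of radius `cutoff` (measured in sigmas).
    """
    M_inv = np.diag(1.0 / s) @ R.T               # inverse of M = R diag(s)
    op, dp = M_inv @ (o - mu), M_inv @ d         # ray in canonical space
    t_star = -np.dot(op, dp) / np.dot(dp, dp)    # closest approach to origin
    r2 = np.sum((op + t_star * dp) ** 2)         # squared distance there
    return r2 <= cutoff**2, t_star

# Toy check: a strongly elongated Gaussian, ray straight through its center.
hit, t = ray_hits_gaussian(o=np.array([-5.0, 0.0, 0.0]),
                           d=np.array([1.0, 0.0, 0.0]),
                           mu=np.zeros(3), R=np.eye(3),
                           s=np.array([3.0, 0.5, 0.5]))
print(hit, t)  # True 5.0
```

Because the canonical-space primitive is the same unit sphere for every Gaussian, the BVH only has to bound spheres, which is what makes the structure compact.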
  {
    "path": "abs/2601.20857.md",
    "content": "### FreeFix: Boosting 3D Gaussian Splatting via Fine-Tuning-Free Diffusion Models\n\nNeural Radiance Fields and 3D Gaussian Splatting have advanced novel view synthesis, yet still rely on dense inputs and often degrade at extrapolated views. Recent approaches leverage generative models, such as diffusion models, to provide additional supervision, but face a trade-off between generalization and fidelity: fine-tuning diffusion models for artifact removal improves fidelity but risks overfitting, while fine-tuning-free methods preserve generalization but often yield lower fidelity. We introduce FreeFix, a fine-tuning-free approach that pushes the boundary of this trade-off by enhancing extrapolated rendering with pretrained image diffusion models. We present an interleaved 2D-3D refinement strategy, showing that image diffusion models can be leveraged for consistent refinement without relying on costly video diffusion models. Furthermore, we take a closer look at the guidance signal for 2D refinement and propose a per-pixel confidence mask to identify uncertain regions for targeted improvement. Experiments across multiple datasets show that FreeFix improves multi-frame consistency and achieves performance comparable to or surpassing fine-tuning-based methods, while retaining strong generalization ability. Our project page is at https://xdimlab.github.io/freefix/.\n\nNeural Radiance Fields 和 3D Gaussian Splatting 推动了新视角合成的发展，但它们仍依赖稠密输入，并且在外推视角下往往会明显退化。近期一些方法利用扩散模型等生成式模型提供额外监督，但会面临泛化性与保真度之间的权衡：对扩散模型进行微调有助于去除伪影并提升保真度，却容易过拟合；不做微调的方法虽然保留了泛化能力，但通常保真度较低。为此，我们提出 FreeFix，这是一种无需微调的方法，利用预训练图像扩散模型提升外推视角的渲染质量，进一步推动这一权衡的边界。我们设计了交替进行的二维到三维联合细化策略，说明图像扩散模型无需依赖高成本的视频扩散模型，也能实现一致的多视角细化。进一步地，我们重新审视二维细化中的引导信号，提出逐像素置信度掩码，以定位不确定区域并进行针对性改进。多个数据集上的实验表明，FreeFix 在保留强泛化能力的同时提升了多帧一致性，性能达到或超过基于微调的方法。项目主页为 https://xdimlab.github.io/freefix/。\n"
  },
  {
    "path": "abs/2601.22046.md",
    "content": "### PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction\n\nStreaming reconstruction from monocular image sequences remains challenging, as existing methods typically favor either high-quality rendering or accurate geometry, but rarely both. We present PLANING, an efficient on-the-fly reconstruction framework built on a hybrid representation that loosely couples explicit geometric primitives with neural Gaussians, enabling geometry and appearance to be modeled in a decoupled manner. This decoupling supports an online initialization and optimization strategy that separates geometry and appearance updates, yielding stable streaming reconstruction with substantially reduced structural redundancy. PLANING improves dense mesh Chamfer-L2 by 18.52% over PGSR, surpasses ARTDECO by 1.31 dB PSNR, and reconstructs ScanNetV2 scenes in under 100 seconds, over 5x faster than 2D Gaussian Splatting, while matching the quality of offline per-scene optimization. Beyond reconstruction quality, the structural clarity and computational efficiency of PLANING make it well suited for a broad range of downstream applications, such as enabling large-scale scene modeling and simulation-ready environments for embodied AI. Project page: https://city-super.github.io/PLANING/.\n\n基于单目图像序列的流式重建仍然具有挑战性，因为现有方法通常只能在高质量渲染和准确几何之间偏向其一，很难兼顾两者。本文提出 PLANING，这是一种高效的在线重建框架，建立在一种混合表示之上，将显式几何基元与神经高斯松耦合结合，从而使几何与外观能够以解耦方式建模。这种解耦支持在线初始化与优化策略，将几何更新和外观更新分离，进而实现稳定的流式重建，并显著减少结构冗余。相较于 PGSR，PLANING 将稠密 mesh 的 Chamfer-L2 降低了 18.52%；相较于 ARTDECO，PSNR 提升了 1.31 dB；在 ScanNetV2 上，它能在 100 秒内完成场景重建，速度比 2D Gaussian Splatting 快 5 倍以上，同时达到与离线逐场景优化相当的质量。除重建质量之外，PLANING 在结构清晰度和计算效率上的优势，也使其适合更广泛的下游应用，例如大规模场景建模和面向具身智能的可仿真环境。项目主页为 https://city-super.github.io/PLANING/。\n"
  },
  {
    "path": "abs/2601.22990.md",
    "content": "### Self-Supervised Slice-to-Volume Reconstruction with Gaussian Representations for Fetal MRI\n\nReconstructing 3D fetal MR volumes from motion-corrupted stacks of 2D slices is a crucial and challenging task. Conventional slice-to-volume reconstruction (SVR) methods are time-consuming and require multiple orthogonal stacks for reconstruction. While learning-based SVR approaches have significantly reduced the time required at the inference stage, they heavily rely on ground truth information for training, which is inaccessible in practice. To address these challenges, we propose GaussianSVR, a self-supervised framework for slice-to-volume reconstruction. GaussianSVR represents the target volume using 3D Gaussian representations to achieve high-fidelity reconstruction. It leverages a simulated forward slice acquisition model to enable self-supervised training, alleviating the need for ground-truth volumes. Furthermore, to enhance both accuracy and efficiency, we introduce a multi-resolution training strategy that jointly optimizes Gaussian parameters and spatial transformations across different resolution levels. Experiments show that GaussianSVR outperforms the baseline methods on fetal MR volumetric reconstruction. Code will be available upon acceptance.\n\n从受运动干扰的二维切片堆栈中重建三维胎儿磁共振体数据，是一项关键但极具挑战性的任务。传统的切片到体重建方法耗时较长，并且通常需要多组相互正交的切片堆栈。尽管基于学习的方法显著缩短了推理阶段的重建时间，但它们在训练时高度依赖真实体数据，而这在实际中往往无法获得。为了解决这些问题，本文提出 GaussianSVR，一种用于切片到体重建的自监督框架。GaussianSVR 使用三维高斯表示目标体数据，以实现高保真重建；同时结合模拟的前向切片采集模型进行自监督训练，从而避免对真实体数据的依赖。进一步地，为了同时提升精度与效率，我们提出多分辨率训练策略，在不同分辨率层级上联合优化高斯参数和空间变换。实验结果表明，在胎儿磁共振体重建任务上，GaussianSVR 优于基线方法。代码将在论文录用后公开。\n"
  },
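GaussianSVR's self-supervision hinges on a simulated forward slice-acquisition model. The following is a generic sketch of such a model, assuming a Gaussian through-plane slice profile (a common MRI approximation); the paper's actual profile, slice spacing, and motion handling may differ.

```python
import numpy as np

def acquire_slice(volume, z_center, slice_thickness, voxel_size=1.0):
    """Simulate one 2D slice from a 3D volume of shape (D, H, W).

    The slice integrates the volume along z with a Gaussian slice
    profile centered at z_center, a standard approximation of the
    MRI slice-selection process.
    """
    D = volume.shape[0]
    z = np.arange(D) * voxel_size
    sigma = slice_thickness / 2.355               # FWHM -> sigma
    w = np.exp(-0.5 * ((z - z_center) / sigma) ** 2)
    w /= w.sum()                                   # normalize the profile
    return np.tensordot(w, volume, axes=(0, 0))    # weighted sum over z -> (H, W)

vol = np.random.rand(32, 64, 64)
sl = acquire_slice(vol, z_center=16.0, slice_thickness=3.0)
print(sl.shape)  # (64, 64)
```

Rendering the current Gaussian volume through such a model and comparing against the observed slices is what removes the need for ground-truth volumes during training.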
  {
    "path": "abs/2601.23065.md",
    "content": "### EAG-PT: Emission-Aware Gaussians and Path Tracing for Indoor Scene Reconstruction and Editing\n\nRecent reconstruction methods based on radiance field such as NeRF and 3DGS reproduce indoor scenes with high visual fidelity, but break down under scene editing due to baked illumination and the lack of explicit light transport. In contrast, physically based inverse rendering relies on mesh representations and path tracing, which enforce correct light transport but place strong requirements on geometric fidelity, becoming a practical bottleneck for real indoor scenes. In this work, we propose Emission-Aware Gaussians and Path Tracing (EAG-PT), aiming for physically based light transport with a unified 2D Gaussian representation. Our design is based on three cores: (1) using 2D Gaussians as a unified scene representation and transport-friendly geometry proxy that avoids reconstructed mesh, (2) explicitly separating emissive and non-emissive components during reconstruction for further scene editing, and (3) decoupling reconstruction from final rendering by using efficient single-bounce optimization and high-quality multi-bounce path tracing after scene editing. Experiments on synthetic and real indoor scenes show that EAG-PT produces more natural and physically consistent renders after editing than radiant scene reconstructions, while preserving finer geometric detail and avoiding mesh-induced artifacts compared to mesh-based inverse path tracing. These results suggest promising directions for future use in interior design, XR content creation, and embodied AI.\n\n近期基于辐射场的重建方法，例如 NeRF 和 3DGS，能够以较高视觉保真度重建室内场景，但在场景编辑时往往会因光照被烘焙进表示中、且缺乏显式光传输机制而失效。相比之下，基于物理的逆渲染通常依赖 mesh 表示和路径追踪，虽然能正确建模光传输，但对几何精度要求很高，这在真实室内场景中会成为实际瓶颈。为此，本文提出 EAG-PT，即 Emission-Aware Gaussians and Path Tracing，旨在基于统一的二维高斯表示实现符合物理规律的光传输。其设计包含三个核心：1）使用二维高斯作为统一场景表示和适合光传输的几何代理，从而避免依赖重建 mesh；2）在重建阶段显式分离发光与非发光成分，以支持后续场景编辑；3）将重建与最终渲染解耦，在编辑后使用高效的单次反弹优化和高质量的多次反弹路径追踪。合成与真实室内场景实验表明，EAG-PT 在编辑后能够比传统辐射场重建产生更自然且物理一致的渲染结果，同时相较基于 mesh 的逆路径追踪保留更细的几何细节，并避免 mesh 带来的伪影。这些结果显示，该方向在室内设计、XR 内容创作和具身智能中具有良好的应用前景。\n"
  },
  {
    "path": "abs/2602.00395.md",
    "content": "### 3DGS$^2$-TR: Scalable Second-Order Trust-Region Method for 3D Gaussian Splatting\n\nWe propose 3DGS$^2$-TR,a second-order optimizer for accelerating the scene training problem in 3D Gaussian Splatting (3DGS). Unlike existing second-order approaches that rely on explicit or dense curvature representations, such as 3DGS-LM (Höllein et al., 2025) or 3DGS2 (Lan et al., 2025), our method approximates curvature using only the diagonal of the Hessian matrix, efficiently via Hutchinson's method. Our approach is fully matrix-free and has the same complexity as ADAM (Kingma, 2024), $O(n)$ in both computation and memory costs. To ensure stable optimization in the presence of strong nonlinearity in the 3DGS rasterization process, we introduce a parameter-wise trust-region technique based on the squared Hellinger distance, regularizing updates to Gaussian parameters. Under identical parameter initialization and without densification, 3DGS$^2$-TR is able to achieve better reconstruction quality on standard datasets, using 50% fewer training iterations compared to ADAM, while incurring less than 1GB of peak GPU memory overhead (17% more than ADAM and 85% less than 3DGS-LM), enabling scalability to very large scenes and potentially to distributed training settings.\n\n我们提出 3DGS2-TR，这是一种用于加速三维高斯喷溅场景训练的二阶优化器。不同于现有依赖显式或稠密曲率表示的二阶方法，例如 3DGS-LM 和 3DGS2，我们的方法仅使用 Hessian 矩阵的对角项来近似曲率，并通过 Hutchinson 方法高效计算。该方法完全不需要显式矩阵，与 ADAM 在计算和内存上的复杂度相同，均为 O(n)。为保证在 3DGS 光栅化强非线性条件下的优化稳定性，我们引入基于平方 Hellinger 距离的逐参数信赖域技术，对高斯参数更新进行正则化。在相同参数初始化且不进行增密的条件下，3DGS2-TR 在标准数据集上用比 ADAM 少 50% 的训练迭代就能获得更好的重建质量，同时峰值 GPU 额外内存开销不足 1 GB，仅比 ADAM 高 17%，却比 3DGS-LM 低 85%，因此能够扩展到超大场景，甚至具备分布式训练的潜力。\n"
  },
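The matrix-free curvature estimate at the heart of 3DGS$^2$-TR rests on Hutchinson's identity $\mathrm{diag}(H) = \mathbb{E}[z \odot Hz]$ for Rademacher vectors $z$, which needs only Hessian-vector products. A minimal PyTorch sketch on a toy quadratic (the trust-region step and the Hellinger-distance regularization are omitted):

```python
import torch

def hutchinson_diag_hessian(loss_fn, params, n_samples=64):
    """Estimate diag(H) of loss_fn at params via E[z * (H z)], z Rademacher.

    Each sample costs one Hessian-vector product, so both compute and
    memory stay O(n), matching first-order optimizers such as Adam.
    """
    grad = torch.autograd.grad(loss_fn(params), params, create_graph=True)[0]
    est = torch.zeros_like(params)
    for _ in range(n_samples):
        z = torch.empty_like(params).bernoulli_(0.5).mul_(2).sub_(1)  # +/-1 entries
        hz = torch.autograd.grad(grad, params, grad_outputs=z, retain_graph=True)[0]
        est += z * hz
    return est / n_samples

# Toy quadratic with H = diag(1, 2, 3): the estimator recovers it exactly,
# since z * (H z) = z^2 * diag(H) = diag(H) whenever H is diagonal.
p = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)
f = lambda x: 0.5 * (torch.tensor([1.0, 2.0, 3.0]) * x * x).sum()
print(hutchinson_diag_hessian(f, p))  # ~ tensor([1., 2., 3.])
```

For non-diagonal Hessians the estimate is unbiased but noisy, which is why a handful of probe vectors per step is the usual trade-off.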
  {
    "path": "abs/2602.00463.md",
    "content": "### PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting\n\nGenerating realistic 3D scenes from text is crucial for immersive applications like VR, AR, and gaming. While text-driven approaches promise efficiency, existing methods suffer from limited 3D-text data and inconsistent multi-view stitching, resulting in overly simplistic scenes. To address this, we propose PSGS, a two-stage framework for high-fidelity panoramic scene generation. First, a novel two-layer optimization architecture generates semantically coherent panoramas: a layout reasoning layer parses text into structured spatial relationships, while a self-optimization layer refines visual details via iterative MLLM feedback. Second, our panorama sliding mechanism initializes globally consistent 3D Gaussian Splatting point clouds by strategically sampling overlapping perspectives. By incorporating depth and semantic coherence losses during training, we greatly improve the quality and detail fidelity of rendered scenes. Our experiments demonstrate that PSGS outperforms existing methods in panorama generation and produces more appealing 3D scenes, offering a robust solution for scalable immersive content creation.\n\n从文本生成逼真的三维场景，对虚拟现实、增强现实和游戏等沉浸式应用至关重要。尽管文本驱动方法具有效率优势，但现有方法受限于三维文本数据不足以及多视角拼接不一致，生成的场景往往过于简单。为此，我们提出 PSGS，一种用于高保真全景场景生成的两阶段框架。首先，一个新的双层优化架构生成语义一致的全景图：布局推理层将文本解析为结构化空间关系，自优化层则通过多模态大语言模型的迭代反馈细化视觉细节。其次，我们设计全景滑动机制，通过策略性采样具有重叠区域的视角，初始化全局一致的三维高斯喷溅点云。在训练中结合深度一致性损失和语义一致性损失后，渲染场景的质量和细节保真度得到明显提升。实验表明，PSGS 在全景生成上优于现有方法，并能生成更具吸引力的三维场景，为可扩展的沉浸式内容创建提供了可靠方案。\n"
  },
  {
    "path": "abs/2602.00671.md",
    "content": "### HPC: Hierarchical Point-based Latent Representation for Streaming Dynamic Gaussian Splatting Compression\n\nWhile dynamic Gaussian Splatting has driven significant advances in free-viewpoint video, maintaining its rendering quality with a small memory footprint for efficient streaming transmission still presents an ongoing challenge. Existing streaming dynamic Gaussian Splatting compression methods typically leverage a latent representation to drive the neural network for predicting Gaussian residuals between frames. Their core latent representations can be categorized into structured grid-based and unstructured point-based paradigms. However, the former incurs significant parameter redundancy by inevitably modeling unoccupied space, while the latter suffers from limited compactness as it fails to exploit local correlations. To relieve these limitations, we propose HPC, a novel streaming dynamic Gaussian Splatting compression framework. It employs a hierarchical point-based latent representation that operates on a per-Gaussian basis to avoid parameter redundancy in unoccupied space. Guided by a tailored aggregation scheme, these latent points achieve high compactness with low spatial redundancy. To improve compression efficiency, we further undertake the first investigation to compress neural networks for streaming dynamic Gaussian Splatting through mining and exploiting the inter-frame correlation of parameters. Combined with latent compression, this forms a fully end-to-end compression framework. Comprehensive experimental evaluations demonstrate that HPC substantially outperforms state-of-the-art methods. It achieves a storage reduction of 67% against its baseline while maintaining high reconstruction fidelity.\n\n尽管 dynamic Gaussian Splatting 显著推动了自由视点视频的发展，但要在保持渲染质量的同时获得足够小的内存占用以支持高效流式传输，仍然是一个持续存在的挑战。现有流式 dynamic Gaussian Splatting 压缩方法通常依赖潜在表示来驱动神经网络预测帧间高斯残差，其核心潜在表示大致可分为结构化网格型和非结构化点型两类。然而，前者不可避免地对空区域建模，导致大量参数冗余；后者则由于未能有效利用局部相关性而难以达到高紧凑性。为缓解这些问题，我们提出 HPC，这是一种新的流式 dynamic Gaussian Splatting 压缩框架。它采用分层点式潜在表示，以单个高斯为粒度进行建模，从而避免对空区域产生参数冗余。在定制聚合机制的引导下，这些潜在点以较低空间冗余实现了较高紧凑性。为进一步提升压缩效率，我们还首次研究了如何通过挖掘并利用参数的帧间相关性来压缩面向流式 dynamic Gaussian Splatting 的神经网络。结合潜在表示压缩，这构成了一个完全端到端的压缩框架。综合实验表明，HPC 明显优于现有最先进方法，在保持较高重建保真度的同时，相比其基线实现了 67% 的存储减少。\n"
  },
  {
    "path": "abs/2602.01057.md",
    "content": "### Radioactive 3D Gaussian Ray Tracing for Tomographic Reconstruction\n\n3D Gaussian Splatting (3DGS) has recently emerged in computer vision as a promising rendering technique. By adapting the principles of Elliptical Weighted Average (EWA) splatting to a modern differentiable pipeline, 3DGS enables real-time, high-quality novel view synthesis. Building upon this, R2-Gaussian extended the 3DGS paradigm to tomographic reconstruction by rectifying integration bias, achieving state-of-the-art performance in computed tomography (CT). To enable differentiability, R2-Gaussian adopts a local affine approximation: each 3D Gaussian is locally mapped to a 2D Gaussian on the detector and composed via alpha blending to form projections. However, the affine approximation can degrade reconstruction quantitative accuracy and complicate the incorporation of nonlinear geometric corrections. To address these limitations, we propose a tomographic reconstruction framework based on 3D Gaussian ray tracing. Our approach provides two key advantages over splatting-based models: (i) it computes the line integral through 3D Gaussian primitives analytically, avoiding the local affine collapse and thus yielding a more physically consistent forward projection model; and (ii) the ray-tracing formulation gives explicit control over ray origins and directions, which facilitates the precise application of nonlinear geometric corrections, e.g., arc-correction used in positron emission tomography (PET). These properties extend the applicability of Gaussian-based reconstruction to a wider range of realistic tomography systems while improving projection accuracy.\n\n三维高斯喷溅近来在计算机视觉中成为一种很有前景的渲染技术。它将椭圆加权平均喷溅的原理适配到现代可微分流程中，从而实现了实时、高质量的新视角合成。在此基础上，R2-Gaussian 通过修正积分偏差，将 3DGS 扩展到断层重建任务，并在计算机断层成像中取得了最先进性能。为了支持可微分优化，R2-Gaussian 采用局部仿射近似：将每个三维高斯局部映射为探测器上的二维高斯，并通过 alpha 混合形成投影。然而，这种仿射近似会降低重建的定量精度，也不利于引入非线性几何校正。为解决这些限制，我们提出一种基于三维高斯光线追踪的断层重建框架。与基于喷溅的模型相比，该方法有两个主要优势：一是能够解析地计算穿过三维高斯基元的线积分，避免局部仿射塌缩，从而得到更符合物理规律的前向投影模型；二是光线追踪形式可显式控制光线的起点和方向，便于精确施加非线性几何校正，例如正电子发射断层成像中的弧形校正。这些特性将基于高斯的重建方法扩展到了更广泛、更真实的断层系统，并提升了投影精度。\n"
  },
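The analytic line integral that replaces the local affine approximation has a simple closed form: for a ray $x(t) = o + td$ and a Gaussian with mean $\mu$ and covariance $\Sigma$, set $a = d^\top\Sigma^{-1}d$, $b = d^\top\Sigma^{-1}(o-\mu)$, $c = (o-\mu)^\top\Sigma^{-1}(o-\mu)$; completing the square in $t$ gives $\int \exp(-\tfrac{1}{2}(x(t)-\mu)^\top\Sigma^{-1}(x(t)-\mu))\,dt = \sqrt{2\pi/a}\,\exp(-\tfrac{1}{2}(c - b^2/a))$. A numpy sketch verifying this against brute-force quadrature (the paper's full projector presumably adds amplitude scaling and the geometric corrections it discusses):

```python
import numpy as np

def gaussian_line_integral(o, d, mu, cov):
    """Closed form of the unnormalized Gaussian integrated along x = o + t*d."""
    P = np.linalg.inv(cov)
    r = o - mu
    a, b, c = d @ P @ d, d @ P @ r, r @ P @ r
    # exponent = -0.5*(a t^2 + 2 b t + c); complete the square in t
    return np.sqrt(2.0 * np.pi / a) * np.exp(-0.5 * (c - b * b / a))

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
cov = A @ A.T + np.eye(3)                        # random SPD covariance
o, d, mu = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)

t = np.linspace(-50.0, 50.0, 400001)             # brute-force quadrature
x = o[None, :] + t[:, None] * d[None, :]
q = np.einsum('ni,ij,nj->n', x - mu, np.linalg.inv(cov), x - mu)
print(gaussian_line_integral(o, d, mu, cov),
      np.exp(-0.5 * q).sum() * (t[1] - t[0]))    # the two values should match
```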
  {
    "path": "abs/2602.01674.md",
    "content": "### VRGaussianAvatar: Integrating 3D Gaussian Avatars into VR\n\nWe present VRGaussianAvatar, an integrated system that enables real-time full-body 3D Gaussian Splatting (3DGS) avatars in virtual reality using only head-mounted display (HMD) tracking signals. The system adopts a parallel pipeline with a VR Frontend and a GA Backend. The VR Frontend uses inverse kinematics to estimate full-body pose and streams the resulting pose along with stereo camera parameters to the backend. The GA Backend stereoscopically renders a 3DGS avatar reconstructed from a single image. To improve stereo rendering efficiency, we introduce Binocular Batching, which jointly processes left and right eye views in a single batched pass to reduce redundant computation and support high-resolution VR displays. We evaluate VRGaussianAvatar with quantitative performance tests and a within-subject user study against image- and video-based mesh avatar baselines. Results show that VRGaussianAvatar sustains interactive VR performance and yields higher perceived appearance similarity, embodiment, and plausibility. Project page and source code are available at https://vrgaussianavatar.github.io.\n\n我们提出 VRGaussianAvatar，这是一套集成系统，仅利用头戴显示设备的跟踪信号，即可在虚拟现实中实现实时全身三维高斯喷溅化身。系统采用并行管线，由 VR 前端和高斯化身后端组成。VR 前端利用逆运动学估计全身姿态，并将得到的姿态以及双目相机参数传输到后端；高斯化身后端则对由单张图像重建得到的三维高斯化身进行双目立体渲染。为提升双目渲染效率，我们提出双眼批处理机制，在一次批处理过程中联合处理左右眼视图，以减少冗余计算并支持高分辨率 VR 显示。我们通过定量性能测试和受试者内对比用户研究，将 VRGaussianAvatar 与基于图像和视频的网格化身基线方法进行比较。结果表明，VRGaussianAvatar 能够维持交互式 VR 性能，并在外观相似性、具身感和可信度方面带来更高的主观评价。项目页面和源代码见 https://vrgaussianavatar.github.io。\n"
  },
  {
    "path": "abs/2602.01723.md",
    "content": "### FastPhysGS: Accelerating Physics-based Dynamic 3DGS Simulation via Interior Completion and Adaptive Optimization\n\nExtending 3D Gaussian Splatting (3DGS) to 4D physical simulation remains challenging. Based on the Material Point Method (MPM), existing methods either rely on manual parameter tuning or distill dynamics from video diffusion models, limiting the generalization and optimization efficiency. Recent attempts using LLMs/VLMs suffer from a text/image-to-3D perceptual gap, yielding unstable physics behavior. In addition, they often ignore the surface structure of 3DGS, leading to implausible motion. We propose FastPhysGS, a fast and robust framework for physics-based dynamic 3DGS simulation:(1) Instance-aware Particle Filling (IPF) with Monte Carlo Importance Sampling (MCIS) to efficiently populate interior particles while preserving geometric fidelity; (2) Bidirectional Graph Decoupling Optimization (BGDO), an adaptive strategy that rapidly optimizes material parameters predicted from a VLM. Experiments show FastPhysGS achieves high-fidelity physical simulation in 1 minute using only 7 GB runtime memory, outperforming prior works with broad potential applications.\n\n将三维高斯喷溅扩展到四维物理仿真仍然具有挑战性。基于物质点法的现有方法，要么依赖人工调参，要么从视频扩散模型中蒸馏动态过程，限制了方法的泛化能力和优化效率。近期借助大语言模型或视觉语言模型的尝试，又受到文本或图像到三维之间感知鸿沟的影响，导致物理行为不稳定。此外，这些方法通常忽略 3DGS 的表面结构，从而产生不合理的运动。为此，我们提出 FastPhysGS，一个快速且鲁棒的基于物理的动态 3DGS 仿真框架，包括两项关键设计：一是结合蒙特卡罗重要性采样的实例感知粒子填充，用于高效填充内部粒子并保持几何保真度；二是双向图解耦优化，这是一种自适应策略，可快速优化由视觉语言模型预测的材料参数。实验表明，FastPhysGS 仅用 1 分钟和 7 GB 运行时内存，就能实现高保真的物理仿真，性能优于现有方法，并具有广泛的应用潜力。\n"
  },
  {
    "path": "abs/2602.01748.md",
    "content": "### OFERA: Blendshape-driven 3D Gaussian Control for Occluded Facial Expression to Realistic Avatars in VR\n\nWe propose OFERA, a novel framework for real-time expression control of photorealistic Gaussian head avatars for VR headset users. Existing approaches attempt to recover occluded facial expressions using additional sensors or internal cameras, but sensor-based methods increase device weight and discomfort, while camera-based methods raise privacy concerns and suffer from limited access to raw data. To overcome these limitations, we leverage the blendshape signals provided by commercial VR headsets as expression inputs. Our framework consists of three key components: (1) Blendshape Distribution Alignment (BDA), which applies linear regression to align the headset-provided blendshape distribution to a canonical input space; (2) an Expression Parameter Mapper (EPM) that maps the aligned blendshape signals into an expression parameter space for controlling Gaussian head avatars; and (3) a Mapper-integrated Avatar (MiA) that incorporates EPM into the avatar learning process to ensure distributional consistency. Furthermore, OFERA establishes an end-to-end pipeline that senses and maps expressions, updates Gaussian avatars, and renders them in real-time within VR environments. We show that EPM outperforms existing mapping methods on quantitative metrics, and we demonstrate through a user study that the full OFERA framework enhances expression fidelity while preserving avatar realism. By enabling real-time and photorealistic avatar expression control, OFERA significantly improves telepresence in VR communication. A project page is available at https://ysshwan147.github.io/projects/ofera/.\n\n我们提出 OFERA，这是一种面向 VR 头显用户的全新框架，可对照片级真实感的高斯头部化身进行实时表情控制。现有方法通常尝试借助额外传感器或内置摄像头恢复被遮挡的面部表情，但基于传感器的方法会增加设备重量并降低佩戴舒适性，而基于摄像头的方法则带来隐私问题，且难以获取原始数据。为克服这些限制，我们利用商用 VR 头显提供的 blendshape 信号作为表情输入。该框架包含三个关键组件：一是 Blendshape Distribution Alignment，用线性回归将头显输出的 blendshape 分布对齐到规范输入空间；二是 Expression Parameter Mapper，将对齐后的 blendshape 信号映射到控制高斯头部化身的表情参数空间；三是 Mapper-integrated Avatar，在化身学习过程中引入该映射器，以保证分布一致性。进一步地，OFERA 构建了一条端到端流程，能够在 VR 环境中感知和映射表情、更新高斯化身并进行实时渲染。实验表明，EPM 在定量指标上优于现有映射方法，用户研究也显示完整的 OFERA 框架在保持化身真实感的同时提升了表情还原质量。通过实现实时且高真实感的化身表情控制，OFERA 显著提升了 VR 通信中的临场感。项目页面见 https://ysshwan147.github.io/projects/ofera/。\n"
  },
  {
    "path": "abs/2602.02000.md",
    "content": "### SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors\n\nReconstructing 3D scenes from sparse images remains a challenging task due to the difficulty of recovering accurate geometry and texture without optimization. Recent approaches leverage generalizable models to generate 3D scenes using 3D Gaussian Splatting (3DGS) primitive. However, they often fail to produce continuous surfaces and instead yield discrete, color-biased point clouds that appear plausible at normal resolution but reveal severe artifacts under close-up views. To address this issue, we present SurfSplat, a feedforward framework based on 2D Gaussian Splatting (2DGS) primitive, which provides stronger anisotropy and higher geometric precision. By incorporating a surface continuity prior and a forced alpha blending strategy, SurfSplat reconstructs coherent geometry together with faithful textures. Furthermore, we introduce High-Resolution Rendering Consistency (HRRC), a new evaluation metric designed to evaluate high-resolution reconstruction quality. Extensive experiments on RealEstate10K, DL3DV, and ScanNet demonstrate that SurfSplat consistently outperforms prior methods on both standard metrics and HRRC, establishing a robust solution for high-fidelity 3D reconstruction from sparse inputs. Project page: https://hebing-sjtu.github.io/SurfSplat-website/\n\n从稀疏图像重建三维场景仍然是一项困难任务，因为在不进行优化的情况下，很难同时恢复精确的几何和纹理。近期方法通常借助可泛化模型，以三维高斯喷溅基元生成三维场景，但它们往往无法产生连续表面，而是输出离散且带有颜色偏置的点云，在普通分辨率下看似合理，却会在近距离观察时暴露出严重伪影。为解决这一问题，我们提出 SurfSplat，这是一种基于二维高斯喷溅基元的前馈式框架。二维高斯喷溅具有更强的各向异性和更高的几何精度。通过引入表面连续性先验和强制 alpha 混合策略，SurfSplat 能够重建连贯的几何结构和忠实的纹理。进一步地，我们提出高分辨率渲染一致性指标，用于评估高分辨率下的重建质量。在 RealEstate10K、DL3DV 和 ScanNet 上的大量实验表明，SurfSplat 在标准指标和该新指标上都持续优于现有方法，为稀疏输入下的高保真三维重建提供了稳健方案。项目页面见 https://hebing-sjtu.github.io/SurfSplat-website/。\n"
  },
  {
    "path": "abs/2602.02089.md",
    "content": "### UrbanGS: A Scalable and Efficient Architecture for Geometrically Accurate Large-Scene Reconstruction\n\nWhile 3D Gaussian Splatting (3DGS) enables high-quality, real-time rendering for bounded scenes, its extension to large-scale urban environments gives rise to critical challenges in terms of geometric consistency, memory efficiency, and computational scalability. To address these issues, we present UrbanGS, a scalable reconstruction framework that effectively tackles these challenges for city-scale applications. First, we propose a Depth-Consistent D-Normal Regularization module. Unlike existing approaches that rely solely on monocular normal estimators, which can effectively update rotation parameters yet struggle to update position parameters, our method integrates D-Normal constraints with external depth supervision. This allows for comprehensive updates of all geometric parameters. By further incorporating an adaptive confidence weighting mechanism based on gradient consistency and inverse depth deviation, our approach significantly enhances multi-view depth alignment and geometric coherence, which effectively resolves the issue of geometric accuracy in complex large-scale scenes. To improve scalability, we introduce a Spatially Adaptive Gaussian Pruning (SAGP) strategy, which dynamically adjusts Gaussian density based on local geometric complexity and visibility to reduce redundancy. Additionally, a unified partitioning and view assignment scheme is designed to eliminate boundary artifacts and optimize computational load. Extensive experiments on multiple urban datasets demonstrate that UrbanGS achieves superior performance in rendering quality, geometric accuracy, and memory efficiency, providing a systematic solution for high-fidelity large-scale scene reconstruction.\n\n虽然三维高斯喷溅已经能够在有界场景中实现高质量、实时渲染，但将其扩展到大规模城市环境时，会在几何一致性、内存效率和计算可扩展性方面带来关键挑战。为此，我们提出 UrbanGS，一种面向城市场景应用的可扩展重建框架，系统性地解决这些问题。首先，我们提出深度一致的 D-Normal 正则化模块。不同于仅依赖单目法线估计器的现有方法，那类方法虽然能有效更新旋转参数，却难以更新位置参数；我们的方法将 D-Normal 约束与外部深度监督结合，从而能够全面更新所有几何参数。进一步结合基于梯度一致性和逆深度偏差的自适应置信度加权机制后，我们的方法显著增强了多视图深度对齐与几何一致性，有效提升了复杂大场景中的几何精度。为了提高可扩展性，我们提出空间自适应高斯剪枝策略，根据局部几何复杂度和可见性动态调整高斯密度，以减少冗余。此外，我们还设计了统一的分区与视图分配方案，以消除边界伪影并优化计算负载。在多个城市场景数据集上的大量实验表明，UrbanGS 在渲染质量、几何精度和内存效率方面均优于现有方法，为高保真大规模场景重建提供了系统性解决方案。\n"
  },
  {
    "path": "abs/2602.02402.md",
    "content": "### SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation\n\nSimulating deformable objects under rich interactions remains a fundamental challenge for real-to-sim robot manipulation, with dynamics jointly driven by environmental effects and robot actions. Existing simulators rely on predefined physics or data-driven dynamics without robot-conditioned control, limiting accuracy, stability, and generalization. This paper presents SoMA, a 3D Gaussian Splat simulator for soft-body manipulation. SoMA couples deformable dynamics, environmental forces, and robot joint actions in a unified latent neural space for end-to-end real-to-sim simulation. Modeling interactions over learned Gaussian splats enables controllable, stable long-horizon manipulation and generalization beyond observed trajectories without predefined physical models. SoMA improves resimulation accuracy and generalization on real-world robot manipulation by 20%, enabling stable simulation of complex tasks such as long-horizon cloth folding.\n\n在真实到仿真的机器人操作中，对富含交互的可变形物体进行模拟仍是一项基础性挑战，因为其动力学同时受到环境效应和机器人动作的共同驱动。现有模拟器要么依赖预定义物理模型，要么采用不具备机器人条件控制的数据驱动动力学，因此在精度、稳定性和泛化能力上都受到限制。本文提出 SoMA，一种面向软体操作的三维高斯喷溅模拟器。SoMA 在统一的潜在神经空间中耦合可变形动力学、环境力以及机器人关节动作，从而实现端到端的真实到仿真模拟。通过在学习得到的高斯喷溅上建模交互，SoMA 在无需预定义物理模型的情况下，实现了可控、稳定的长时程操作，并能泛化到观测轨迹之外。SoMA 将真实机器人操作中的重模拟精度和泛化能力提升了 20%，并可稳定模拟长时程布料折叠等复杂任务。\n"
  },
  {
    "path": "abs/2602.02602.md",
    "content": "### Position: 3D Gaussian Splatting Watermarking Should Be Scenario-Driven and Threat-Model Explicit\n\n3D content acquisition and creation are expanding rapidly in the new era of machine learning and AI. 3D Gaussian Splatting (3DGS) has become a promising high-fidelity and real-time representation for 3D content. Similar to the initial wave of digital audio-visual content at the turn of the millennium, the demand for intellectual property protection is also increasing, since explicit and editable 3D parameterization makes unauthorized use and dissemination easier. In this position paper, we argue that effective progress in watermarking 3D assets requires articulated security objectives and realistic threat models, incorporating the lessons learned from digital audio-visual asset protection over the past decades. To address this gap in security specification and evaluation, we advocate a scenario-driven formulation, in which adversarial capabilities are formalized through a security model. Based on this formulation, we construct a reference framework that organizes existing methods and clarifies how specific design choices map to corresponding adversarial assumptions. Within this framework, we also examine a legacy spread-spectrum embedding scheme, characterizing its advantages and limitations and highlighting the important trade-offs it entails. Overall, this work aims to foster effective intellectual property protection for 3D assets.\n\n在机器学习和人工智能新时代，三维内容的采集与生成正在快速扩张。三维高斯喷溅已经成为一种兼具高保真与实时性的三维内容表示方式。与千禧年之交数字音视频内容爆发时的情况类似，由于三维参数表示显式且易编辑，未授权使用与传播更加容易，因此知识产权保护需求也随之增长。本文作为立场论文指出，要想在三维资产水印领域取得有效进展，必须明确安全目标并建立现实的威胁模型，同时吸收过去几十年数字音视频资产保护领域的经验。为弥补当前在安全规范与评测上的缺口，我们倡导一种场景驱动的建模方式，通过安全模型来形式化攻击者能力。基于这一表述，我们构建了一个参考框架，用于梳理现有方法，并阐明具体设计选择如何对应不同的对抗性假设。在该框架下，我们还分析了一种传统的扩频嵌入方案，刻画其优势与局限，并指出其中涉及的重要权衡。总体而言，本文旨在推动三维资产知识产权保护走向更有效的实践。\n"
  },
  {
    "path": "abs/2602.02989.md",
    "content": "### SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan Modulation\n\nNovel view synthesis of dynamic scenes is fundamental to achieving photorealistic 4D reconstruction and immersive visual experiences. Recent progress in Gaussian-based representations has significantly improved real-time rendering quality, yet existing methods still struggle to maintain a balance between long-term static and short-term dynamic regions in both representation and optimization. To address this, we present SharpTimeGS, a lifespan-aware 4D Gaussian framework that achieves temporally adaptive modeling of both static and dynamic regions under a unified representation. Specifically, we introduce a learnable lifespan parameter that reformulates temporal visibility from a Gaussian-shaped decay into a flat-top profile, allowing primitives to remain consistently active over their intended duration and avoiding redundant densification. In addition, the learned lifespan modulates each primitives' motion, reducing drift in long-lived static points while retaining unrestricted motion for short-lived dynamic ones. This effectively decouples motion magnitude from temporal duration, improving long-term stability without compromising dynamic fidelity. Moreover, we design a lifespan-velocity-aware densification strategy that mitigates optimization imbalance between static and dynamic regions by allocating more capacity to regions with pronounced motion while keeping static areas compact and stable. Extensive experiments on multiple benchmarks demonstrate that our method achieves state-of-the-art performance while supporting real-time rendering up to 4K resolution at 100 FPS on one RTX 4090.\n\n动态场景的新视角合成是实现照片级 4D 重建和沉浸式视觉体验的基础。尽管基于 Gaussian 的表示已经显著提升了实时渲染质量，现有方法仍难以在表示和优化层面同时兼顾长期静态区域与短期动态区域。为解决这一问题，我们提出 SharpTimeGS，一种具备生命周期感知的 4D Gaussian 框架，可在统一表示下对静态与动态区域进行时间自适应建模。具体而言，我们引入可学习的生命周期参数，将时间可见性从高斯衰减重新表述为平顶分布，使基元在其应有持续时间内保持稳定激活，并避免冗余增密。此外，学习得到的生命周期还会调制各基元的运动，使长期存在的静态点漂移更小，同时保留短生命周期动态点的自由运动能力。这有效地将运动幅度与时间持续性解耦，在不损害动态保真度的前提下提升了长期稳定性。我们还设计了生命周期和速度感知的增密策略，通过为显著运动区域分配更多容量、同时保持静态区域紧凑稳定，缓解静态与动态区域之间的优化不平衡。多个基准上的实验表明，该方法达到最先进性能，并可在单张 RTX 4090 上以 100 FPS 支持最高 4K 分辨率的实时渲染。\n"
  },
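The flat-top temporal visibility can be illustrated with a generalized Gaussian, which stays near 1 across a primitive's lifespan and then falls off sharply, unlike the usual Gaussian-shaped decay. The specific profile below is an assumption for illustration, not necessarily the paper's exact parameterization.

```python
import numpy as np

def gaussian_visibility(t, t0, sigma):
    """Gaussian-shaped temporal decay used by many 4DGS variants: the
    primitive starts fading as soon as t leaves t0."""
    return np.exp(-0.5 * ((t - t0) / sigma) ** 2)

def flat_top_visibility(t, t0, lifespan, sharpness=4):
    """Generalized Gaussian: ~1 across [t0 - lifespan/2, t0 + lifespan/2],
    then a fast roll-off, keeping the primitive active over its whole lifespan."""
    return np.exp(-(((t - t0) / (0.5 * lifespan)) ** (2 * sharpness)))

t = np.linspace(0.0, 1.0, 11)
print(np.round(gaussian_visibility(t, 0.5, 0.2), 2))   # decays immediately
print(np.round(flat_top_visibility(t, 0.5, 0.8), 2))   # plateau over ~[0.1, 0.9]
```

Raising `sharpness` pushes the profile toward a box function, which is what lets one primitive cover a long static interval without spawning duplicates.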
  {
    "path": "abs/2602.03327.md",
    "content": "### Pi-GS: Sparse-View Gaussian Splatting with Dense π^3 Initialization\n\nNovel view synthesis has evolved rapidly, advancing from Neural Radiance Fields to 3D Gaussian Splatting (3DGS), which offers real-time rendering and rapid training without compromising visual fidelity. However, 3DGS relies heavily on accurate camera poses and high-quality point cloud initialization, which are difficult to obtain in sparse-view scenarios. While traditional Structure from Motion (SfM) pipelines often fail in these settings, existing learning-based point estimation alternatives typically require reliable reference views and remain sensitive to pose or depth errors. In this work, we propose a robust method utilizing π^3, a reference-free point cloud estimation network. We integrate dense initialization from π^3 with a regularization scheme designed to mitigate geometric inaccuracies. Specifically, we employ uncertainty-guided depth supervision, normal consistency loss, and depth warping. Experimental results demonstrate that our approach achieves state-of-the-art performance on the Tanks and Temples, LLFF, DTU, and MipNeRF360 datasets.\n\n新视角合成发展迅速，已经从神经辐射场演进到三维高斯喷溅，在保持视觉保真度的同时实现了实时渲染和快速训练。然而，3DGS 严重依赖精确的相机位姿和高质量点云初始化，而这在稀疏视角场景下很难获得。传统的运动恢复结构流程在这类场景中经常失效，而现有基于学习的点估计替代方案通常又需要可靠的参考视图，并且对位姿或深度误差十分敏感。本文提出一种鲁棒方法，利用无需参考视图的点云估计网络 π 立方。我们将 π 立方 生成的稠密初始化与一套用于缓解几何误差的正则化方案结合起来，具体包括不确定性引导的深度监督、法线一致性损失以及深度扭曲。实验结果表明，该方法在 Tanks and Temples、LLFF、DTU 和 MipNeRF360 数据集上达到了最先进性能。\n"
  },
  {
    "path": "abs/2602.03538.md",
    "content": "### Constrained Dynamic Gaussian Splatting\n\nWhile Dynamic Gaussian Splatting enables high-fidelity 4D reconstruction, its deployment is severely hindered by a fundamental dilemma: unconstrained densification leads to excessive memory consumption incompatible with edge devices, whereas heuristic pruning fails to achieve optimal rendering quality under preset Gaussian budgets. In this work, we propose Constrained Dynamic Gaussian Splatting (CDGS), a novel framework that formulates dynamic scene reconstruction as a budget-constrained optimization problem to enforce a strict, user-defined Gaussian budget during training. Our key insight is to introduce a differentiable budget controller as the core optimization driver. Guided by a multi-modal unified importance score, this controller fuses geometric, motion, and perceptual cues for precise capacity regulation. To maximize the utility of this fixed budget, we further decouple the optimization of static and dynamic elements, employing an adaptive allocation mechanism that dynamically distributes capacity based on motion complexity. Furthermore, we implement a three-phase training strategy to seamlessly integrate these constraints, ensuring precise adherence to the target count. Coupled with a dual-mode hybrid compression scheme, CDGS not only strictly adheres to hardware constraints (error < 2%}) but also pushes the Pareto frontier of rate-distortion performance. Extensive experiments demonstrate that CDGS delivers optimal rendering quality under varying capacity limits, achieving over 3x compression compared to state-of-the-art methods.\n\n尽管 Dynamic Gaussian Splatting 能够实现高保真的 4D 重建，但其部署受到一个根本性困境的严重制约：无约束增密会导致与边缘设备不兼容的过高内存消耗，而启发式剪枝又难以在预设高斯预算下获得最优渲染质量。本文提出 Constrained Dynamic Gaussian Splatting（CDGS），一种将动态场景重建表述为预算约束优化问题的新框架，可在训练过程中严格满足用户定义的高斯预算。我们的关键思想是引入可微预算控制器，作为核心优化驱动，并借助多模态统一重要性分数，融合几何、运动和感知线索来实现精确的容量调控。为最大化固定预算的利用率，我们进一步解耦静态与动态元素的优化，并采用自适应分配机制，根据运动复杂度动态分配容量。此外，我们实现了三阶段训练策略，以无缝整合这些约束，确保对目标数量的精确遵守。结合双模式混合压缩方案，CDGS 不仅能严格满足硬件约束，误差小于 2%，还推动了率失真性能的 Pareto 前沿。大量实验表明，CDGS 在不同容量限制下都能实现最优渲染质量，并相较现有最先进方法获得 3 倍以上压缩率。\n"
  },
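A toy sketch of the budget-enforcement idea: fuse geometric, motion, and perceptual cues into one unified importance score and keep exactly the top-$B$ Gaussians. The cue weights and the hard top-$k$ below are illustrative assumptions; CDGS's controller is differentiable and woven into training rather than applied as a one-shot filter.

```python
import numpy as np

def enforce_budget(geom, motion, perc, budget, w=(0.4, 0.4, 0.2)):
    """Fuse per-Gaussian cues into one importance score and keep the top `budget`.

    geom, motion, perc: arrays of shape (N,) with per-Gaussian cues in [0, 1].
    Returns a boolean keep-mask with exactly `budget` True entries.
    """
    score = w[0] * geom + w[1] * motion + w[2] * perc
    keep = np.zeros(score.shape, dtype=bool)
    keep[np.argpartition(-score, budget)[:budget]] = True   # indices of top-B scores
    return keep

N, B = 100_000, 30_000
rng = np.random.default_rng(1)
mask = enforce_budget(rng.random(N), rng.random(N), rng.random(N), B)
print(mask.sum(), mask.sum() / N)  # 30000 0.3 -> budget met exactly
```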
  {
    "path": "abs/2602.03809.md",
    "content": "### Split&Splat: Zero-Shot Panoptic Segmentation via Explicit Instance Modeling and 3D Gaussian Splatting\n\n3D Gaussian Splatting (GS) enables fast and high-quality scene reconstruction, but it lacks an object-consistent and semantically aware structure. We propose Split&Splat, a framework for panoptic scene reconstruction using 3DGS. Our approach explicitly models object instances. It first propagates instance masks across views using depth, thus producing view-consistent 2D masks. Each object is then reconstructed independently and merged back into the scene while refining its boundaries. Finally, instance-level semantic descriptors are embedded in the reconstructed objects, supporting various applications, including panoptic segmentation, object retrieval, and 3D editing. Unlike existing methods, Split&Splat tackles the problem by first segmenting the scene and then reconstructing each object individually. This design naturally supports downstream tasks and allows Split&Splat to achieve state-of-the-art performance on the ScanNetv2 segmentation benchmark.\n\n三维高斯喷溅能够实现快速且高质量的场景重建，但它缺乏面向对象一致且具备语义感知的结构。我们提出 Split&Splat，一个基于 3DGS 的全景分割式场景重建框架。该方法显式建模对象实例，首先借助深度信息在不同视图间传播实例掩码，从而得到跨视图一致的二维掩码；随后对每个对象分别进行重建，并在细化边界后将其重新合并回场景中；最后，在重建对象中嵌入实例级语义描述符，以支持全景分割、目标检索和三维编辑等多种应用。不同于现有方法，Split&Splat 先对场景做分割，再对每个对象单独重建。这样的设计天然支持下游任务，并使其在 ScanNetv2 分割基准上达到了最先进性能。\n"
  },
  {
    "path": "abs/2602.03878.md",
    "content": "### Intellectual Property Protection for 3D Gaussian Splatting Assets: A Survey\n\n3D Gaussian Splatting (3DGS) has become a mainstream representation for real-time 3D scene synthesis, enabling applications in virtual and augmented reality, robotics, and 3D content creation. Its rising commercial value and explicit parametric structure raise emerging intellectual property (IP) protection concerns, prompting a surge of research on 3DGS IP protection. However, current progress remains fragmented, lacking a unified view of the underlying mechanisms, protection paradigms, and robustness challenges. To address this gap, we present the first systematic survey on 3DGS IP protection and introduce a bottom-up framework that examines (i) underlying Gaussian-based perturbation mechanisms, (ii) passive and active protection paradigms, and (iii) robustness threats under emerging generative AI era, revealing gaps in technical foundations and robustness characterization and indicating opportunities for deeper investigation. Finally, we outline six research directions across robustness, efficiency, and protection paradigms, offering a roadmap toward reliable and trustworthy IP protection for 3DGS assets.\n\n三维高斯喷溅已经成为实时三维场景合成的主流表示方式，并广泛应用于虚拟现实、增强现实、机器人和三维内容创作等领域。随着其商业价值上升以及参数结构显式可控，围绕 3DGS 资产的知识产权保护问题日益突出，也推动了相关研究快速增长。然而，当前研究仍较为零散，缺乏对底层机制、保护范式和鲁棒性挑战的统一认识。为弥补这一空缺，本文给出了首个关于 3DGS 知识产权保护的系统性综述，并提出一个自底向上的分析框架，从三个方面进行审视：一是基于高斯扰动的底层机制，二是被动与主动保护范式，三是在生成式人工智能时代下面临的鲁棒性威胁。该框架揭示了技术基础和鲁棒性刻画中的缺口，并指出值得深入研究的机会。最后，本文从鲁棒性、效率和保护范式三个方面总结了六个未来研究方向，为构建可靠、可信的 3DGS 资产知识产权保护体系提供路线图。\n"
  },
  {
    "path": "abs/2602.04043.md",
    "content": "### AnyStyle: Single-Pass Multimodal Stylization for 3D Gaussian Splatting\n\nThe growing demand for rapid and scalable 3D asset creation has driven interest in feed-forward 3D reconstruction methods, with 3D Gaussian Splatting (3DGS) emerging as an effective scene representation. While recent approaches have demonstrated pose-free reconstruction from unposed image collections, integrating stylization or appearance control into such pipelines remains underexplored. Existing attempts largely rely on image-based conditioning, which limits both controllability and flexibility. In this work, we introduce AnyStyle, a feed-forward 3D reconstruction and stylization framework that enables pose-free, zero-shot stylization through multimodal conditioning. Our method supports both textual and visual style inputs, allowing users to control the scene appearance using natural language descriptions or reference images. We propose a modular stylization architecture that requires only minimal architectural modifications and can be integrated into existing feed-forward 3D reconstruction backbones. Experiments demonstrate that AnyStyle improves style controllability over prior feed-forward stylization methods while preserving high-quality geometric reconstruction. A user study further confirms that AnyStyle achieves superior stylization quality compared to an existing state-of-the-art approach. Repository: https://github.com/joaxkal/AnyStyle.\n\n随着对快速、可扩展三维资产创建需求的不断增长，前馈式三维重建方法受到了广泛关注，而三维高斯喷溅已成为一种有效的场景表示。尽管近期方法已经展示了从无位姿图像集合进行无姿态重建的能力，但如何在这类流程中融入风格化或外观控制仍缺乏充分研究。现有尝试大多依赖基于图像的条件输入，因此在可控性和灵活性上都存在限制。本文提出 AnyStyle，一种前馈式三维重建与风格化框架，能够通过多模态条件实现无位姿、零样本风格化。我们的方法同时支持文本风格输入和视觉风格输入，用户可通过自然语言描述或参考图像控制场景外观。我们设计了一种模块化风格化架构，只需对原模型做极少改动，便可集成到现有前馈式三维重建骨干中。实验表明，AnyStyle 在保持高质量几何重建的同时，相比已有前馈式风格化方法具备更强的风格控制能力。用户研究进一步验证了其风格化质量优于现有最先进方法。代码仓库见 https://github.com/joaxkal/AnyStyle。\n"
  },
  {
    "path": "abs/2602.04251.md",
    "content": "### Towards Next-Generation SLAM: A Survey on 3DGS-SLAM Focusing on Performance, Robustness, and Future Directions\n\nTraditional Simultaneous Localization and Mapping (SLAM) systems often face limitations including coarse rendering quality, insufficient recovery of scene details, and poor robustness in dynamic environments. 3D Gaussian Splatting (3DGS), with its efficient explicit representation and high-quality rendering capabilities, offers a new reconstruction paradigm for SLAM. This survey comprehensively reviews key technical approaches for integrating 3DGS with SLAM. We analyze performance optimization of representative methods across four critical dimensions: rendering quality, tracking accuracy, reconstruction speed, and memory consumption, delving into their design principles and breakthroughs. Furthermore, we examine methods for enhancing the robustness of 3DGS-SLAM in complex environments such as motion blur and dynamic environments. Finally, we discuss future challenges and development trends in this area. This survey aims to provide a technical reference for researchers and foster the development of next-generation SLAM systems characterized by high fidelity, efficiency, and robustness.\n\n传统的同步定位与建图系统常常存在渲染质量粗糙、场景细节恢复不足以及在动态环境中鲁棒性较差等问题。三维高斯喷溅凭借高效的显式表示和高质量渲染能力，为 SLAM 提供了新的重建范式。本文系统综述了将 3DGS 与 SLAM 融合的关键技术路线。我们从四个关键维度分析代表性方法的性能优化，包括渲染质量、跟踪精度、重建速度和内存消耗，并深入讨论其设计原理与技术突破。此外，我们还梳理了提升 3DGS-SLAM 在运动模糊和动态环境等复杂场景中鲁棒性的方法。最后，本文讨论了该方向未来面临的挑战与发展趋势。我们希望这篇综述能为研究者提供技术参考，并推动高保真、高效率和高鲁棒性的下一代 SLAM 系统发展。\n"
  },
  {
    "path": "abs/2602.04271.md",
    "content": "### SkeletonGaussian: Editable 4D Generation through Gaussian Skeletonization\n\n4D generation has made remarkable progress in synthesizing dynamic 3D objects from input text, images, or videos. However, existing methods often represent motion as an implicit deformation field, which limits direct control and editability. To address this issue, we propose SkeletonGaussian, a novel framework for generating editable dynamic 3D Gaussians from monocular video input. Our approach introduces a hierarchical articulated representation that decomposes motion into sparse rigid motion explicitly driven by a skeleton and fine-grained non-rigid motion. Concretely, we extract a robust skeleton and drive rigid motion via linear blend skinning, followed by a hexplane-based refinement for non-rigid deformations, enhancing interpretability and editability. Experimental results demonstrate that SkeletonGaussian surpasses existing methods in generation quality while enabling intuitive motion editing, establishing a new paradigm for editable 4D generation. Project page: https://wusar.github.io/projects/skeletongaussian/\n\n四维生成在从文本、图像或视频合成动态三维对象方面已经取得了显著进展。然而，现有方法通常将运动表示为隐式形变场，限制了直接控制和可编辑性。为此，我们提出 SkeletonGaussian，一种利用单目视频生成可编辑动态三维高斯的新框架。该方法引入层次化的关节表示，将运动显式分解为由骨架驱动的稀疏刚体运动以及细粒度的非刚体运动。具体来说，我们先提取鲁棒骨架，并通过线性混合蒙皮驱动刚体运动，再使用基于 hexplane 的细化模块建模非刚体形变，从而提升可解释性与可编辑性。实验结果表明，SkeletonGaussian 在生成质量上超过现有方法，同时支持直观的运动编辑，为可编辑四维生成建立了新的范式。项目页面见 https://wusar.github.io/projects/skeletongaussian/。\n"
  },
  {
    "path": "abs/2602.04317.md",
    "content": "### JOintGS: Joint Optimization of Cameras, Bodies and 3D Gaussians for In-the-Wild Monocular Reconstruction\n\nReconstructing high-fidelity animatable 3D human avatars from monocular RGB videos remains challenging, particularly in unconstrained in-the-wild scenarios where camera parameters and human poses from off-the-shelf methods (e.g., COLMAP, HMR2.0) are often inaccurate. Splatting (3DGS) advances demonstrate impressive rendering quality and real-time performance, they critically depend on precise camera calibration and pose annotations, limiting their applicability in real-world settings. We present JOintGS, a unified framework that jointly optimizes camera extrinsics, human poses, and 3D Gaussian representations from coarse initialization through a synergistic refinement mechanism. Our key insight is that explicit foreground-background disentanglement enables mutual reinforcement: static background Gaussians anchor camera estimation via multi-view consistency; refined cameras improve human body alignment through accurate temporal correspondence; optimized human poses enhance scene reconstruction by removing dynamic artifacts from static constraints. We further introduce a temporal dynamics module to capture fine-grained pose-dependent deformations and a residual color field to model illumination variations. Extensive experiments on NeuMan and EMDB datasets demonstrate that JOintGS achieves superior reconstruction quality, with 2.1~dB PSNR improvement over state-of-the-art methods on NeuMan dataset, while maintaining real-time rendering. Notably, our method shows significantly enhanced robustness to noisy initialization compared to the baseline.Our source code is available at https://github.com/MiliLab/JOintGS.\n\n从单目 RGB 视频中重建高保真、可驱动的三维人体化身仍然具有挑战性，尤其是在自然场景中，相机参数和人体姿态由现成方法估计时往往并不准确，例如 COLMAP 或 HMR2.0。尽管基于三维高斯喷溅的方法在渲染质量和实时性能方面表现出色，但它们严重依赖精确的相机标定和姿态标注，因此在真实世界中的适用性受到限制。本文提出 JOintGS，一个统一框架，可从粗初始化出发，通过协同细化机制联合优化相机外参、人体姿态和三维高斯表示。我们的核心观察是，显式的前景背景解耦能够实现相互增强：静态背景高斯通过多视图一致性稳定相机估计，优化后的相机又通过准确的时间对应关系改善人体对齐，而优化后的人体姿态则通过移除静态约束中的动态伪影进一步提升场景重建质量。我们还引入时间动态模块以捕捉细粒度的姿态相关形变，并使用残差颜色场建模光照变化。在 NeuMan 和 EMDB 数据集上的大量实验表明，JOintGS 在保持实时渲染的同时，实现了更优的重建质量，在 NeuMan 上相较最先进方法提升了 2.1 dB 的 PSNR，并且对噪声初始化具有显著更强的鲁棒性。代码见 https://github.com/MiliLab/JOintGS。\n"
  },
  {
    "path": "abs/2602.04349.md",
    "content": "### VecSet-Edit: Unleashing Pre-trained LRM for Mesh Editing from Single Image\n\n3D editing has emerged as a critical research area to provide users with flexible control over 3D assets. While current editing approaches predominantly focus on 3D Gaussian Splatting or multi-view images, the direct editing of 3D meshes remains underexplored. Prior attempts, such as VoxHammer, rely on voxel-based representations that suffer from limited resolution and necessitate labor-intensive 3D mask. To address these limitations, we propose \\textbf{VecSet-Edit}, the first pipeline that leverages the high-fidelity VecSet Large Reconstruction Model (LRM) as a backbone for mesh editing. Our approach is grounded on a analysis of the spatial properties in VecSet tokens, revealing that token subsets govern distinct geometric regions. Based on this insight, we introduce Mask-guided Token Seeding and Attention-aligned Token Gating strategies to precisely localize target regions using only 2D image conditions. Also, considering the difference between VecSet diffusion process versus voxel we design a Drift-aware Token Pruning to reject geometric outliers during the denoising process. Finally, our Detail-preserving Texture Baking module ensures that we not only preserve the geometric details of original mesh but also the textural information. More details can be found in our project page: https://github.com/BlueDyee/VecSet-Edit/tree/main\n\n三维编辑已经成为一个关键研究方向，目的是让用户能够灵活控制三维资产。当前编辑方法主要集中在三维高斯喷溅或多视图图像上，而对三维网格的直接编辑仍研究不足。此前如 VoxHammer 之类的方法依赖体素表示，存在分辨率受限且需要耗费大量人工制作三维掩码的问题。为了解决这些限制，我们提出 VecSet-Edit，这是首个以高保真 VecSet 大型重建模型为骨干的单图像网格编辑流程。我们的方法建立在对 VecSet token 空间属性的分析之上，发现不同 token 子集控制着不同的几何区域。基于这一发现，我们提出掩码引导的 token 播种和注意力对齐的 token 门控策略，仅利用二维图像条件就能精确定位目标区域。同时，考虑到 VecSet 的扩散过程与体素方法之间的差异，我们设计了漂移感知 token 剪枝，用于在去噪过程中剔除几何离群点。最后，细节保持纹理烘焙模块不仅保留原始网格的几何细节，也尽可能保留其纹理信息。更多细节见项目页面 https://github.com/BlueDyee/VecSet-Edit/tree/main。\n"
  },
  {
    "path": "abs/2602.04549.md",
    "content": "### Nix and Fix: Targeting 1000x Compression of 3D Gaussian Splatting with Diffusion Models\n\n3D Gaussian Splatting (3DGS) revolutionized novel view rendering. Instead of inferring from dense spatial points, as implicit representations do, 3DGS uses sparse Gaussians. This enables real-time performance but increases space requirements, hindering applications such as immersive communication. 3DGS compression emerged as a field aimed at alleviating this issue. While impressive progress has been made, at low rates, compression introduces artifacts that degrade visual quality significantly. We introduce NiFi, a method for extreme 3DGS compression through restoration via artifact-aware, diffusion-based one-step distillation. We show that our method achieves state-of-the-art perceptual quality at extremely low rates, down to 0.1 MB, and towards 1000x rate improvement over 3DGS at comparable perceptual performance. The code will be open-sourced upon acceptance.\n\n三维高斯喷溅彻底改变了新视角渲染。与隐式表示依赖稠密空间点进行推断不同，3DGS 使用稀疏高斯，因此能够实现实时性能，但也带来了更高的存储需求，从而限制了其在沉浸式通信等场景中的应用。为缓解这一问题，3DGS 压缩逐渐成为一个研究方向。尽管该领域已经取得显著进展，但在低码率条件下，压缩引入的伪影仍会明显降低视觉质量。我们提出 NiFi，一种面向极限 3DGS 压缩的方法，通过具备伪影感知能力的扩散式一步蒸馏进行恢复。实验表明，我们的方法在极低码率下依然能实现最先进的感知质量，模型大小最低可压缩到 0.1 MB，并在保持相近感知表现的前提下，相比原始 3DGS 朝着约 1000 倍压缩率迈进。代码将在论文录用后开源。\n"
  },
  {
    "path": "abs/2602.05047.md",
    "content": "### QuantumGS: Quantum Encoding Framework for Gaussian Splatting\n\nRecent advances in neural rendering, particularly 3D Gaussian Splatting (3DGS), have enabled real-time rendering of complex scenes. However, standard 3DGS relies on spherical harmonics, which often struggle to accurately capture high-frequency view-dependent effects such as sharp reflections and transparency. While hybrid approaches like Viewing Direction Gaussian Splatting (VDGS) mitigate this limitation using classical Multi-Layer Perceptrons (MLPs), they remain limited by the expressivity of classical networks in low-parameter regimes. In this paper, we introduce QuantumGS, a novel hybrid framework that integrates Variational Quantum Circuits (VQC) into the Gaussian Splatting pipeline. We propose a unique encoding strategy that maps the viewing direction directly onto the Bloch sphere, leveraging the natural geometry of qubits to represent 3D directional data. By replacing classical color-modulating networks with quantum circuits generated via a hypernetwork or conditioning mechanism, we achieve higher expressivity and better generalization. Source code is available in the supplementary material. Code is available at https://github.com/gwilczynski95/QuantumGS\n\n近年来，神经渲染，尤其是三维高斯喷溅，已经能够实现复杂场景的实时渲染。然而，标准 3DGS 依赖球谐函数来表示视角相关外观，往往难以准确刻画锐利反射和透明等高频视角依赖效应。虽然如 Viewing Direction Gaussian Splatting 这样的混合方法通过经典多层感知机缓解了这一问题，但在低参数规模下，经典网络的表达能力仍然有限。本文提出 QuantumGS，一个将变分量子电路集成进高斯喷溅流程的新型混合框架。我们设计了一种独特的编码策略，将视线方向直接映射到 Bloch 球面上，利用量子比特的自然几何结构表示三维方向数据。通过使用由超网络或条件机制生成的量子电路替代传统颜色调制网络，我们获得了更强的表达能力和更好的泛化性。源代码已在补充材料中提供，代码地址为 https://github.com/gwilczynski95/QuantumGS。\n"
  },
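The Bloch-sphere encoding maps a unit viewing direction with polar angle $\theta$ and azimuth $\phi$ to the qubit state $\cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$, whose Bloch vector is exactly that direction. A numpy sketch of the mapping with a round-trip check; the downstream variational circuit and hypernetwork are omitted.

```python
import numpy as np

def direction_to_qubit(d):
    """Map a 3D viewing direction to a single-qubit state on the Bloch sphere.

    The Bloch vector of the returned state equals the normalized direction:
    |psi> = cos(theta/2)|0> + exp(i*phi) sin(theta/2)|1>.
    """
    x, y, z = d / np.linalg.norm(d)
    theta = np.arccos(np.clip(z, -1.0, 1.0))
    phi = np.arctan2(y, x)
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def bloch_vector(psi):
    """Recover (x, y, z) = <psi|sigma_{x,y,z}|psi> to verify the round trip."""
    a, b = psi
    return np.array([2 * (np.conj(a) * b).real,
                     2 * (np.conj(a) * b).imag,
                     abs(a) ** 2 - abs(b) ** 2])

d = np.array([0.3, -0.5, 0.8])
print(bloch_vector(direction_to_qubit(d)))  # matches d / ||d||
print(d / np.linalg.norm(d))
```

The appeal of this encoding is that it needs no learned embedding: the qubit's native geometry already represents a direction on the sphere.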
  {
    "path": "abs/2602.05617.md",
    "content": "### Unified Sensor Simulation for Autonomous Driving\n\nIn this work, we introduce \\textbf{XSIM}, a sensor simulation framework for autonomous driving. XSIM extends 3DGUT splatting with a generalized rolling-shutter modeling tailored for autonomous driving applications. Our framework provides a unified and flexible formulation for appearance and geometric sensor modeling, enabling rendering of complex sensor distortions in dynamic environments. We identify spherical cameras, such as LiDARs, as a critical edge case for existing 3DGUT splatting due to cyclic projection and time discontinuities at azimuth boundaries leading to incorrect particle projection. To address this issue, we propose a phase modeling mechanism that explicitly accounts temporal and shape discontinuities of Gaussians projected by the Unscented Transform at azimuth borders. In addition, we introduce an extended 3D Gaussian representation that incorporates two distinct opacity parameters to resolve mismatches between geometry and color distributions. As a result, our framework provides enhanced scene representations with improved geometric consistency and photorealistic appearance. We evaluate our framework extensively on multiple autonomous driving datasets, including Waymo Open Dataset, Argoverse 2, and PandaSet. Our framework consistently outperforms strong recent baselines and achieves state-of-the-art performance across all datasets. The source code is publicly available at \\href{https://github.com/whesense/XSIM}{https://github.com/whesense/XSIM}.\n\n本文提出 XSIM，一个面向自动驾驶的传感器仿真框架。XSIM 在 3DGUT 喷溅基础上扩展了适用于自动驾驶场景的广义滚动快门建模，为外观与几何传感器建模提供统一且灵活的表述，从而能够在动态环境中渲染复杂的传感器畸变。我们发现球形相机，例如激光雷达，是现有 3DGUT 喷溅方法中的关键边界情况，因为方位角边界处存在循环投影和时间不连续性，会导致粒子投影错误。为了解决这一问题，我们提出相位建模机制，显式考虑无迹变换投影到方位角边界时高斯在时间和形状上的不连续性。此外，我们还引入扩展的三维高斯表示，加入两个不同的不透明度参数，以解决几何分布与颜色分布不匹配的问题。因此，该框架能够提供几何一致性更强、外观更逼真的场景表示。我们在 Waymo Open Dataset、Argoverse 2 和 PandaSet 等多个自动驾驶数据集上进行了充分评估，结果表明该框架持续优于强基线方法，并在所有数据集上达到最先进性能。代码公开地址为 https://github.com/whesense/XSIM。\n"
  },
  {
    "path": "abs/2602.06032.md",
    "content": "### Splat and Distill: Augmenting Teachers with Feed-Forward 3D Reconstruction For 3D-Aware Distillation\n\nVision Foundation Models (VFMs) have achieved remarkable success when applied to various downstream 2D tasks. Despite their effectiveness, they often exhibit a critical lack of 3D awareness. To this end, we introduce Splat and Distill, a framework that instills robust 3D awareness into 2D VFMs by augmenting the teacher model with a fast, feed-forward 3D reconstruction pipeline. Given 2D features produced by a teacher model, our method first lifts these features into an explicit 3D Gaussian representation, in a feedforward manner. These 3D features are then ``splatted\" onto novel viewpoints, producing a set of novel 2D feature maps used to supervise the student model, ``distilling\" geometrically grounded knowledge. By replacing slow per-scene optimization of prior work with our feed-forward lifting approach, our framework avoids feature-averaging artifacts, creating a dynamic learning process where the teacher's consistency improves alongside that of the student. We conduct a comprehensive evaluation on a suite of downstream tasks, including monocular depth estimation, surface normal estimation, multi-view correspondence, and semantic segmentation. Our method significantly outperforms prior works, not only achieving substantial gains in 3D awareness but also enhancing the underlying semantic richness of 2D features. Project page is available at https://davidshavin4.github.io/Splat-and-Distill/\n\n视觉基础模型在多种二维下游任务中取得了显著成功，但它们往往缺乏关键的三维感知能力。为此，我们提出 Splat and Distill，一个通过快速前馈式三维重建管线增强教师模型，从而向二维视觉基础模型注入稳定三维感知能力的框架。给定教师模型产生的二维特征，我们的方法首先以前馈方式将这些特征提升为显式的三维高斯表示，再把这些三维特征投射到新的视角上，生成一组新的二维特征图，用来监督学生模型，从而蒸馏出具有几何依据的知识。与以往依赖逐场景慢速优化的方法不同，我们的前馈式提升方法避免了特征平均化伪影，并形成了一个动态学习过程，其中教师的一致性会随着学生的一致性同步提升。我们在单目深度估计、表面法线估计、多视图对应和语义分割等一系列下游任务上进行了全面评测。结果表明，该方法显著优于现有工作，不仅明显提升了模型的三维感知能力，也增强了二维特征本身的语义表达能力。项目页面见 https://davidshavin4.github.io/Splat-and-Distill/。\n"
  },
  {
    "path": "abs/2602.06122.md",
    "content": "### From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors\n\nCreating high-fidelity, animatable 3D talking heads is crucial for immersive applications, yet often hindered by the prevalence of low-quality image or video sources, which yield poor 3D reconstructions. In this paper, we introduce SuperHead, a novel framework for enhancing low-resolution, animatable 3D head avatars. The core challenge lies in synthesizing high-quality geometry and textures, while ensuring both 3D and temporal consistency during animation and preserving subject identity. Despite recent progress in image, video and 3D-based super-resolution (SR), existing SR techniques are ill-equipped to handle dynamic 3D inputs. To address this, SuperHead leverages the rich priors from pre-trained 3D generative models via a novel dynamics-aware 3D inversion scheme. This process optimizes the latent representation of the generative model to produce a super-resolved 3D Gaussian Splatting (3DGS) head model, which is subsequently rigged to an underlying parametric head model (e.g., FLAME) for animation. The inversion is jointly supervised using a sparse collection of upscaled 2D face renderings and corresponding depth maps, captured from diverse facial expressions and camera viewpoints, to ensure realism under dynamic facial motions. Experiments demonstrate that SuperHead generates avatars with fine-grained facial details under dynamic motions, significantly outperforming baseline methods in visual quality.\n\n构建高保真、可驱动的三维说话人头部对于沉浸式应用至关重要，但现实中常见的低质量图像或视频输入会导致较差的三维重建效果。本文提出 SuperHead，一个用于增强低分辨率可驱动三维头部化身的新框架。其核心挑战在于如何合成高质量几何和纹理，同时在动画过程中保持三维一致性、时间一致性以及人物身份不变。尽管图像、视频和三维超分辨率方法近年来已有进展，但现有超分技术并不适合处理动态三维输入。为此，SuperHead 通过一种新的动态感知三维反演方案，利用预训练三维生成模型中的丰富先验。该过程优化生成模型的潜在表示，以生成超分辨率的三维高斯喷溅头部模型，随后将其绑定到参数化头部模型，例如 FLAME，以实现动画驱动。反演过程通过一组稀疏采样的二维人脸上采样渲染结果及其对应深度图进行联合监督，这些监督来自不同面部表情和相机视角，从而保证在动态表情运动下依然具有真实感。实验表明，SuperHead 能在动态运动中生成拥有细粒度面部细节的化身，并在视觉质量上显著优于基线方法。\n"
  },
  {
    "path": "abs/2602.06343.md",
    "content": "### Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering\n\nHigh-fidelity rendering of dynamic humans from monocular videos typically degrades catastrophically under occlusions. Existing solutions incorporate external priors-either hallucinating missing content via generative models, which induces severe temporal flickering, or imposing rigid geometric heuristics that fail to capture diverse appearances. To this end, we reformulate the task as a Maximum A Posteriori estimation problem under heteroscedastic observation noise. In this paper, we propose U-4DGS, a framework integrating a Probabilistic Deformation Network and a Double Rasterization pipeline. This architecture renders pixel-aligned uncertainty maps that act as an adaptive gradient modulator, automatically attenuating artifacts from unreliable observations. Furthermore, to prevent geometric drift in regions lacking reliable visual cues, we enforce Confidence-Aware Regularizations, which leverage the learned uncertainty to selectively propagate spatial-temporal validity. Extensive experiments on ZJU-MoCap and OcMotion demonstrate that U-4DGS achieves SOTA rendering fidelity and robustness.\n\n从单目视频中对动态人体进行高保真渲染，在存在遮挡时通常会出现灾难性退化。现有方法通常依赖外部先验，要么通过生成模型补全缺失内容，从而引入严重的时间闪烁；要么施加僵硬的几何启发式，难以刻画多样化外观。为此，我们将该任务重新表述为异方差观测噪声下的最大后验估计问题。本文提出 U-4DGS 框架，将概率变形网络与双光栅化管线结合起来。该架构渲染出像素对齐的不确定性图，作为自适应梯度调制器，可自动削弱来自不可靠观测的伪影。此外，为防止在缺乏可靠视觉线索的区域出现几何漂移，我们引入置信度感知正则项，利用学习到的不确定性有选择地传播时空有效性。ZJU-MoCap 和 OcMotion 上的大量实验表明，U-4DGS 在渲染保真度和鲁棒性上均达到了最先进水平。\n"
  },
  {
    "path": "abs/2602.06400.md",
    "content": "### TFusionOcc: Student's t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction\n\n3D semantic occupancy prediction enables autonomous vehicles (AVs) to perceive fine-grained geometric and semantic structure of their surroundings from onboard sensors, which is essential for safe decision-making and navigation. Recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, the intermediate representations used by existing methods for 3D semantic occupancy prediction rely heavily on 3D voxel volumes or a set of 3D Gaussians, hindering the model's ability to efficiently and effectively capture fine-grained geometric details in the 3D driving environment. This paper introduces TFusionOcc, a novel object-centric multi-sensor fusion framework for predicting 3D semantic occupancy. By leveraging multi-stage multi-sensor fusion, Student's t-distribution, and the T-Mixture model (TMM), together with more geometrically flexible primitives, such as the deformable superquadric (superquadric with inverse warp), the proposed method achieved state-of-the-art (SOTA) performance on the nuScenes benchmark. In addition, extensive experiments were conducted on the nuScenes-C dataset to demonstrate the robustness of the proposed method in different camera and lidar corruption scenarios. The code will be available at: https://github.com/DanielMing123/TFusionOcc\n\n三维语义占据预测使自动驾驶车辆能够根据车载传感器感知周围环境中细粒度的几何和语义结构，这对安全决策和导航至关重要。近期模型已经能够较好应对现实世界中不同形状和类别目标的表示问题。然而，现有方法在中间表示上高度依赖三维体素或一组三维高斯，这限制了模型高效、有效捕获驾驶场景细粒度几何细节的能力。本文提出 TFusionOcc，一个新的以对象为中心的多传感器融合框架，用于三维语义占据预测。该方法结合多阶段多传感器融合、Student t 分布、T 混合模型，以及更具几何灵活性的基元，如可变形超二次曲面，在 nuScenes 基准上取得了最先进性能。此外，我们还在 nuScenes-C 数据集上进行了大量实验，验证了该方法在不同相机和激光雷达损坏场景下的鲁棒性。代码将发布于 https://github.com/DanielMing123/TFusionOcc。\n"
  },
  {
    "path": "abs/2602.06830.md",
    "content": "### GaussianPOP: Principled Simplification Framework for Compact 3D Gaussian Splatting via Error Quantification\n\nExisting 3D Gaussian Splatting simplification methods commonly use importance scores, such as blending weights or sensitivity, to identify redundant Gaussians. However, these scores are not driven by visual error metrics, often leading to suboptimal trade-offs between compactness and rendering fidelity. We present GaussianPOP, a principled simplification framework based on analytical Gaussian error quantification. Our key contribution is a novel error criterion, derived directly from the 3DGS rendering equation, that precisely measures each Gaussian's contribution to the rendered image. By introducing a highly efficient algorithm, our framework enables practical error calculation in a single forward pass. The framework is both accurate and flexible, supporting on-training pruning as well as post-training simplification via iterative error re-quantification for improved stability. Experimental results show that our method consistently outperforms existing state-of-the-art pruning methods across both application scenarios, achieving a superior trade-off between model compactness and high rendering quality.\n\n现有三维高斯喷溅简化方法通常依赖混合权重或敏感度等重要性分数来识别冗余高斯，但这些分数并非直接由视觉误差指标驱动，因此往往无法在模型紧凑性与渲染保真度之间取得最佳平衡。我们提出 GaussianPOP，一个基于解析高斯误差量化的原则性简化框架。其核心贡献是一种新的误差准则，直接从 3DGS 渲染方程推导而来，可精确衡量每个高斯对最终渲染图像的贡献。通过引入高效算法，我们的框架能够在一次前向传播中完成实用的误差计算。该框架既准确又灵活，既支持训练过程中的剪枝，也支持训练后的迭代误差重新量化简化，以获得更好的稳定性。实验结果表明，在这两种应用场景下，我们的方法都持续优于现有最先进剪枝方法，在模型压缩度与渲染质量之间实现了更优的平衡。\n"
  },
  {
    "path": "abs/2602.06846.md",
    "content": "### DynFOA: Generating First-Order Ambisonics with Conditional Diffusion for Dynamic and Acoustically Complex 360-Degree Videos\n\nSpatial audio is crucial for creating compelling immersive 360-degree video experiences. However, generating realistic spatial audio, such as first-order ambisonics (FOA), from 360-degree videos in complex acoustic scenes remains challenging. Existing methods often overlook the dynamic nature and acoustic complexity of 360-degree scenes, fail to fully account for dynamic sound sources, and neglect complex environmental effects such as occlusion, reflections, and reverberation, which are influenced by scene geometries and materials. We propose DynFOA, a framework based on dynamic acoustic perception and conditional diffusion, for generating high-fidelity FOA from 360-degree videos. DynFOA first performs visual processing via a video encoder, which detects and localizes multiple dynamic sound sources, estimates their depth and semantics, and reconstructs the scene geometry and materials using a 3D Gaussian Splatting. This reconstruction technique accurately models occlusion, reflections, and reverberation based on the geometries and materials of the reconstructed 3D scene and the listener's viewpoint. The audio encoder then captures the spatial motion and temporal 4D sound source trajectories to fine-tune the diffusion-based FOA generator. The fine-tuned FOA generator adjusts spatial cues in real time, ensuring consistent directional fidelity during listener head rotation and complex environmental changes. Extensive evaluations demonstrate that DynFOA consistently outperforms existing methods across metrics such as spatial accuracy, acoustic fidelity, and distribution matching, while also improving the user experience. Therefore, DynFOA provides a robust and scalable approach to rendering realistic dynamic spatial audio for VR and immersive media applications.\n\n空间音频对于构建有吸引力的沉浸式 360 度视频体验至关重要。然而，在复杂声学场景中，仅根据 360 度视频生成逼真的空间音频，例如一阶 Ambisonics，仍然十分困难。现有方法往往忽略 360 度场景的动态性和声学复杂性，无法充分考虑动态声源，也忽视了遮挡、反射和混响等复杂环境效应，而这些效应又受到场景几何和材质的影响。我们提出 DynFOA，一个基于动态声学感知和条件扩散的框架，用于从 360 度视频生成高保真的一阶 Ambisonics 音频。DynFOA 首先通过视频编码器进行视觉处理，检测并定位多个动态声源，估计其深度和语义，并借助三维高斯喷溅重建场景几何与材质。这种重建能够根据三维场景几何、材质以及听者视角，准确建模遮挡、反射和混响。随后，音频编码器捕捉空间运动和时间上的四维声源轨迹，以微调基于扩散的一阶 Ambisonics 生成器。微调后的生成器能够实时调整空间线索，保证在听者转头和复杂环境变化下仍保持一致的方向保真度。大量评测表明，DynFOA 在空间精度、声学保真度和分布匹配等指标上持续优于现有方法，同时也提升了用户体验，为虚拟现实和沉浸式媒体中的真实动态空间音频渲染提供了稳健且可扩展的方案。\n"
  },
  {
    "path": "abs/2602.06991.md",
    "content": "### LangGS-SLAM: Real-Time Language-Feature Gaussian Splatting SLAM\n\nIn this paper, we propose a RGB-D SLAM system that reconstructs a language-aligned dense feature field while sustaining low-latency tracking and mapping. First, we introduce a Top-K Rendering pipeline, a high-throughput and semantic-distortion-free method for efficiently rendering high-dimensional feature maps. To address the resulting semantic-geometric discrepancy and mitigate the memory consumption, we further design a multi-criteria map management strategy that prunes redundant or inconsistent Gaussians while preserving scene integrity. Finally, a hybrid field optimization framework jointly refines the geometric and semantic fields under real-time constraints by decoupling their optimization frequencies according to field characteristics. The proposed system achieves superior geometric fidelity compared to geometric-only baselines and comparable semantic fidelity to offline approaches while operating at 15 FPS. Our results demonstrate that online SLAM with dense, uncompressed language-aligned feature fields is both feasible and effective, bridging the gap between 3D perception and language-based reasoning.\n\n本文提出一种 RGB-D SLAM 系统，可在保持低延迟跟踪与建图的同时，重建与语言对齐的稠密特征场。首先，我们引入 Top-K Rendering 管线，这是一种高吞吐且无语义失真的方法，用于高效渲染高维特征图。为解决由此带来的语义与几何不一致问题并缓解内存消耗，我们进一步设计了多准则地图管理策略，在保留场景完整性的同时剪除冗余或不一致的高斯。最后，我们提出混合场优化框架，在实时约束下根据不同场的特性解耦其优化频率，从而联合细化几何场和语义场。所提出系统在以 15 FPS 运行时，相比仅几何基线取得了更好的几何保真度，并在语义保真度上达到与离线方法相当的水平。结果表明，使用稠密、未压缩的语言对齐特征场进行在线 SLAM 是可行且有效的，为三维感知与基于语言的推理之间架起了桥梁。\n"
  },
  {
    "path": "abs/2602.07101.md",
    "content": "### Zero-Shot UAV Navigation in Forests via Relightable 3D Gaussian Splatting\n\nUAV navigation in unstructured outdoor environments using passive monocular vision is hindered by the substantial visual domain gap between simulation and reality. While 3D Gaussian Splatting enables photorealistic scene reconstruction from real-world data, existing methods inherently couple static lighting with geometry, severely limiting policy generalization to dynamic real-world illumination. In this paper, we propose a novel end-to-end reinforcement learning framework designed for effective zero-shot transfer to unstructured outdoors. Within a high-fidelity simulation grounded in real-world data, our policy is trained to map raw monocular RGB observations directly to continuous control commands. To overcome photometric limitations, we introduce Relightable 3D Gaussian Splatting, which decomposes scene components to enable explicit, physically grounded editing of environmental lighting within the neural representation. By augmenting training with diverse synthesized lighting conditions ranging from strong directional sunlight to diffuse overcast skies, we compel the policy to learn robust, illumination-invariant visual features. Extensive real-world experiments demonstrate that a lightweight quadrotor achieves robust, collision-free navigation in complex forest environments at speeds up to 10 m/s, exhibiting significant resilience to drastic lighting variations without fine-tuning.\n\n在非结构化户外环境中，仅依赖被动单目视觉进行无人机导航，会受到仿真与真实世界之间巨大视觉域差的严重影响。虽然三维高斯喷溅能够利用真实数据重建高逼真场景，但现有方法将静态光照与几何结构紧密耦合，导致策略难以泛化到真实世界中不断变化的光照条件。本文提出一个新的端到端强化学习框架，用于实现向非结构化户外环境的有效零样本迁移。在基于真实世界数据构建的高保真仿真环境中，我们训练策略直接将原始单目 RGB 观测映射为连续控制指令。为克服光度限制，我们引入可重光照的三维高斯喷溅表示，将场景成分分解，使得环境光照能够在神经表示中进行显式、符合物理规律的编辑。通过在训练中加入从强定向阳光到漫射阴天等多样化合成光照条件，我们迫使策略学习对光照不敏感的稳健视觉特征。大量真实世界实验表明，轻量级四旋翼无人机无需微调，即可在复杂森林环境中以最高 10 米每秒的速度实现稳健、无碰撞导航，并且对剧烈光照变化具有显著鲁棒性。\n"
  },
  {
    "path": "abs/2602.07891.md",
    "content": "### Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video\n\nGeometric foundation models show promise in 3D reconstruction, yet their progress is severely constrained by the scarcity of diverse, large-scale 3D annotations. While Internet videos offer virtually unlimited raw data, utilizing them as a scaling source for geometric learning is challenging due to the absence of ground-truth geometry and the presence of observational noise. To address this, we propose SAGE, a framework for Scalable Adaptation of GEometric foundation models from raw video streams. SAGE leverages a hierarchical mining pipeline to transform videos into training trajectories and hybrid supervision: (1) Informative training trajectory selection; (2) Sparse Geometric Anchoring via SfM point clouds for global structural guidance; and (3) Dense Differentiable Consistency via 3D Gaussian rendering for multi-view constraints. To prevent catastrophic forgetting, we introduce a regularization strategy using anchor data. Extensive experiments show that SAGE significantly enhances zero-shot generalization, reducing Chamfer Distance by 20-42% on unseen benchmarks (7Scenes, TUM-RGBD, Matterport3D) compared to state-of-the-art baselines. To our knowledge, SAGE pioneers the adaptation of geometric foundation models via Internet video, establishing a scalable paradigm for general-purpose 3D learning.\n\n几何基础模型在三维重建中展现出良好前景，但其发展严重受限于大规模、多样化三维标注数据的缺乏。互联网视频虽然提供了几乎无限的原始数据，但由于缺少真实几何标签且观测噪声较多，要将其作为几何学习的扩展数据源仍十分困难。为此，我们提出 SAGE，一个从原始视频流中对几何基础模型进行可扩展适配的框架。SAGE 采用分层挖掘流程，将视频转化为训练轨迹和混合监督，包括三个步骤：一是选择信息量更高的训练轨迹，二是利用运动恢复结构点云进行稀疏几何锚定以提供全局结构指导，三是通过三维高斯渲染实现稠密可微一致性约束。为防止灾难性遗忘，我们还引入基于锚点数据的正则化策略。大量实验表明，SAGE 显著增强了模型的零样本泛化能力，在未见基准数据集 7Scenes、TUM-RGBD 和 Matterport3D 上，相比最先进基线将 Chamfer Distance 降低了 20% 到 42%。据我们所知，SAGE 首次实现了通过互联网视频对几何基础模型进行适配，为通用三维学习建立了可扩展的新范式。\n"
  },
  {
    "path": "abs/2602.08266.md",
    "content": "### Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes\n\nIn cluttered scenes with inevitable occlusions and incomplete observations, selecting informative viewpoints is essential for building a reliable representation. In this context, 3D Gaussian Splatting (3DGS) offers a distinct advantage, as it can explicitly guide the selection of subsequent viewpoints and then refine the representation with new observations. However, existing approaches rely solely on geometric cues, neglect manipulation-relevant semantics, and tend to prioritize exploitation over exploration. To tackle these limitations, we introduce an instance-aware Next Best View (NBV) policy that prioritizes underexplored regions by leveraging object features. Specifically, our object-aware 3DGS distills instancelevel information into one-hot object vectors, which are used to compute confidence-weighted information gain that guides the identification of regions associated with erroneous and uncertain Gaussians. Furthermore, our method can be easily adapted to an object-centric NBV, which focuses view selection on a target object, thereby improving reconstruction robustness to object placement. Experiments demonstrate that our NBV policy reduces depth error by up to 77.14% on the synthetic dataset and 34.10% on the real-world GraspNet dataset compared to baselines. Moreover, compared to targeting the entire scene, performing NBV on a specific object yields an additional reduction of 25.60% in depth error for that object. We further validate the effectiveness of our approach through real-world robotic manipulation tasks.\n\n在存在不可避免遮挡和不完整观测的杂乱场景中，选择信息量高的视角对于构建可靠表示至关重要。在这一背景下，三维高斯喷溅具有独特优势，因为它能够显式指导后续视角的选择，并利用新的观测来不断细化表示。然而，现有方法仅依赖几何线索，忽视了与操作相关的语义信息，而且往往更偏向利用而非探索。为解决这些问题，我们提出一种实例感知的下一最佳视角策略，通过利用目标特征优先关注尚未充分探索的区域。具体来说，我们的对象感知 3DGS 将实例级信息蒸馏为 one-hot 对象向量，并据此计算带置信度加权的信息增益，用于识别与错误高斯和不确定高斯相关的区域。进一步地，该方法还能自然扩展为以对象为中心的下一最佳视角策略，将视角选择聚焦于目标对象，从而提高对目标摆放变化的重建鲁棒性。实验表明，与基线相比，我们的策略在合成数据集上最多可将深度误差降低 77.14%，在真实世界 GraspNet 数据集上最多降低 34.10%。此外，相比面向整个场景执行下一最佳视角，在特定对象上执行该策略还能额外将该对象的深度误差降低 25.60%。我们还通过真实机器人操作任务进一步验证了该方法的有效性。\n"
  },
  {
    "path": "abs/2602.08558.md",
    "content": "### FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction\n\nWe introduce FLAG-4D, a novel framework for generating novel views of dynamic scenes by reconstructing how 3D Gaussian primitives evolve through space and time. Existing methods typically rely on a single Multilayer Perceptron (MLP) to model temporal deformations, and they often struggle to capture complex point motions and fine-grained dynamic details consistently over time, especially from sparse input views. Our approach, FLAG-4D, overcomes this by employing a dual-deformation network that dynamically warps a canonical set of 3D Gaussians over time into new positions and anisotropic shapes. This dual-deformation network consists of an Instantaneous Deformation Network (IDN) for modeling fine-grained, local deformations and a Global Motion Network (GMN) for capturing long-range dynamics, refined through mutual learning. To ensure these deformations are both accurate and temporally smooth, FLAG-4D incorporates dense motion features from a pretrained optical flow backbone. We fuse these motion cues from adjacent timeframes and use a deformation-guided attention mechanism to align this flow information with the current state of each evolving 3D Gaussian. Extensive experiments demonstrate that FLAG-4D achieves higher-fidelity and more temporally coherent reconstructions with finer detail preservation than state-of-the-art methods.\n\n我们提出 FLAG-4D，一个通过重建三维高斯基元在时空中演化过程来生成动态场景新视角的全新框架。现有方法通常依赖单个多层感知机来建模时间形变，尤其在稀疏输入视角条件下，往往难以一致地捕捉复杂点运动和细粒度动态细节。FLAG-4D 通过双形变网络克服了这一问题，该网络会随时间动态地将一组规范空间中的三维高斯扭曲到新的位置和各向异性形状。该双形变网络由两部分组成：用于建模细粒度局部形变的瞬时形变网络，以及用于捕捉长程动态的全局运动网络，两者通过互学习共同优化。为了确保这些形变既准确又具有时间平滑性，FLAG-4D 融合了来自预训练光流骨干网络的稠密运动特征。我们将相邻时刻的运动线索进行融合，并采用形变引导的注意力机制，使这些光流信息与每个演化中的三维高斯当前状态对齐。大量实验表明，FLAG-4D 比现有最先进方法能够实现更高保真、更具时间一致性的重建，并保留更细致的动态细节。\n"
  },
  {
    "path": "abs/2602.08784.md",
    "content": "### GaussianCaR: Gaussian Splatting for Efficient Camera-Radar Fusion\n\nRobust and accurate perception of dynamic objects and map elements is crucial for autonomous vehicles performing safe navigation in complex traffic scenarios. While vision-only methods have become the de facto standard due to their technical advances, they can benefit from effective and cost-efficient fusion with radar measurements. In this work, we advance fusion methods by repurposing Gaussian Splatting as an efficient universal view transformer that bridges the view disparity gap, mapping both image pixels and radar points into a common Bird's-Eye View (BEV) representation. Our main contribution is GaussianCaR, an end-to-end network for BEV segmentation that, unlike prior BEV fusion methods, leverages Gaussian Splatting to map raw sensor information into latent features for efficient camera-radar fusion. Our architecture combines multi-scale fusion with a transformer decoder to efficiently extract BEV features. Experimental results demonstrate that our approach achieves performance on par with, or even surpassing, the state of the art on BEV segmentation tasks (57.3%, 82.9%, and 50.1% IoU for vehicles, roads, and lane dividers) on the nuScenes dataset, while maintaining a 3.2x faster inference runtime. Code and project page are available online.\n\n对动态目标和地图元素进行稳健而精确的感知，对于自动驾驶车辆在复杂交通场景中的安全导航至关重要。尽管纯视觉方法因技术进步已成为事实标准，但它们仍可从与雷达测量进行有效且低成本的融合中受益。本文通过将 Gaussian Splatting 重新用于一种高效的通用视图变换器，推动了融合方法的发展；该变换器弥合了不同视角之间的差异，将图像像素和雷达点统一映射到鸟瞰视角表示中。我们的主要贡献是 GaussianCaR，这是一种端到端的 BEV 分割网络。与以往的 BEV 融合方法不同，它利用 Gaussian Splatting 将原始传感器信息映射为潜在特征，以实现高效的相机-雷达融合。该架构结合多尺度融合和 Transformer 解码器，以高效提取 BEV 特征。实验结果表明，在 nuScenes 数据集上的 BEV 分割任务中，我们的方法在车辆、道路和车道线上的性能与现有最先进方法相当甚至更优，同时推理速度提高了 3.2 倍。代码和项目主页已公开。\n"
  },
  {
    "path": "abs/2602.08909.md",
    "content": "### Analysis of Converged 3D Gaussian Splatting Solutions: Density Effects and Prediction Limit\n\nWe investigate what structure emerges in 3D Gaussian Splatting (3DGS) solutions from standard multi-view optimization. We term these Rendering-Optimal References (RORs) and analyze their statistical properties, revealing stable patterns: mixture-structured scales and bimodal radiance across diverse scenes. To understand what determines these parameters, we apply learnability probes by training predictors to reconstruct RORs from point clouds without rendering supervision. Our analysis uncovers fundamental density-stratification. Dense regions exhibit geometry-correlated parameters amenable to render-free prediction, while sparse regions show systematic failure across architectures. We formalize this through variance decomposition, demonstrating that visibility heterogeneity creates covariance-dominated coupling between geometric and appearance parameters in sparse regions. This reveals the dual character of RORs: geometric primitives where point clouds suffice, and view synthesis primitives where multi-view constraints are essential. We provide density-aware strategies that improve training robustness and discuss architectural implications for systems that adaptively balance feed-forward prediction and rendering-based refinement.\n\n我们研究在标准多视图优化下，三维高斯喷溅解中会形成怎样的结构。我们将这类解称为渲染最优参考，并分析其统计性质，发现不同场景中都存在稳定模式，例如具有混合结构的尺度分布和双峰辐射特性。为了理解这些参数由什么决定，我们通过可学习性探针进行分析，训练预测器在没有渲染监督的情况下仅凭点云去重建渲染最优参考。分析揭示出一种基本的密度分层现象：高密度区域中的参数与几何高度相关，适合进行无需渲染的预测；而低密度区域则在不同架构下都表现出系统性失败。我们通过方差分解对这一现象进行形式化，证明可见性异质性会在稀疏区域中造成几何参数与外观参数之间以协方差为主导的耦合关系。这揭示了渲染最优参考的双重属性：一类是点云即可刻画的几何基元，另一类是必须依赖多视图约束的视图合成基元。基于此，我们提出感知密度的策略来提升训练鲁棒性，并讨论对那些需要在前馈预测与基于渲染的细化之间自适应平衡的系统所带来的架构启发。\n"
  },
  {
    "path": "abs/2602.08958.md",
    "content": "### Grow with the Flow: 4D Reconstruction of Growing Plants with Gaussian Flow Fields\n\nModeling the time-varying 3D appearance of plants during their growth poses unique challenges: unlike many dynamic scenes, plants generate new geometry over time as they expand, branch, and differentiate. Recent motion modeling techniques are ill-suited to this problem setting. For example, deformation fields cannot introduce new geometry, and 4D Gaussian splatting constrains motion to a linear trajectory in space and time and cannot track the same set of Gaussians over time. Here, we introduce a 3D Gaussian flow field representation that models plant growth as a time-varying derivative over Gaussian parameters -- position, scale, orientation, color, and opacity -- enabling nonlinear and continuous-time growth dynamics. To initialize a sufficient set of Gaussian primitives, we reconstruct the mature plant and learn a process of reverse growth, effectively simulating the plant's developmental history in reverse. Our approach achieves superior image quality and geometric accuracy compared to prior methods on multi-view timelapse datasets of plant growth, providing a new approach for appearance modeling of growing 3D structures.\n\n对植物在生长过程中的时变三维外观进行建模具有独特挑战：不同于许多动态场景，植物会在扩展、分枝和分化过程中持续生成新的几何结构。近期的运动建模技术并不适合这一问题，例如形变场无法引入新几何，而四维高斯喷溅则将运动限制为空间和时间中的线性轨迹，无法在时间维度上持续跟踪同一组高斯。为此，我们提出三维高斯流场表示，将植物生长建模为高斯参数随时间变化的导数，这些参数包括位置、尺度、朝向、颜色和不透明度，从而支持非线性、连续时间的生长动态。为了初始化足够多的高斯基元，我们先重建成熟植物，再学习一个反向生长过程，相当于在反向模拟植物的发育历史。与现有方法相比，我们的方法在植物生长的多视图延时数据集上取得了更优的图像质量和几何精度，为生长型三维结构的外观建模提供了新思路。\n"
  },
  {
    "path": "abs/2602.09736.md",
    "content": "### Toward Fine-Grained Facial Control in 3D Talking Head Generation\n\nAudio-driven talking head generation is a core component of digital avatars, and 3D Gaussian Splatting has shown strong performance in real-time rendering of high-fidelity talking heads. However, achieving precise control over fine-grained facial movements remains a significant challenge, particularly due to lip-synchronization inaccuracies and facial jitter, both of which can contribute to the uncanny valley effect. To address these challenges, we propose Fine-Grained 3D Gaussian Splatting (FG-3DGS), a novel framework that enables temporally consistent and high-fidelity talking head generation. Our method introduces a frequency-aware disentanglement strategy to explicitly model facial regions based on their motion characteristics. Low-frequency regions, such as the cheeks, nose, and forehead, are jointly modeled using a standard MLP, while high-frequency regions, including the eyes and mouth, are captured separately using a dedicated network guided by facial area masks. The predicted motion dynamics, represented as Gaussian deltas, are applied to the static Gaussians to generate the final head frames, which are rendered via a rasterizer using frame-specific camera parameters. Additionally, a high-frequency-refined post-rendering alignment mechanism, learned from large-scale audio-video pairs by a pretrained model, is incorporated to enhance per-frame generation and achieve more accurate lip synchronization. Extensive experiments on widely used datasets for talking head generation demonstrate that our method outperforms recent state-of-the-art approaches in producing high-fidelity, lip-synced talking head videos.\n\n音频驱动的说话人头生成是数字化身的核心组成部分，而三维高斯喷溅在高保真说话人头的实时渲染中已表现出强大能力。然而，如何精确控制细粒度面部运动仍是一个重大挑战，尤其是唇形同步不准和面部抖动会共同加剧恐怖谷效应。为解决这些问题，我们提出细粒度三维高斯喷溅框架 FG-3DGS，用于生成时间一致且高保真的说话人头。该方法引入频率感知的解耦策略，根据不同面部区域的运动特性进行显式建模。脸颊、鼻子和额头等低频区域由标准多层感知机联合建模，而眼睛和嘴部等高频区域则在面部区域掩码引导下，由专门网络单独建模。预测得到的运动动态以高斯增量形式施加到静态高斯上，再结合每帧相机参数通过光栅化器渲染出最终头部图像。此外，我们还结合一个由大规模音视频对预训练得到的高频细化后渲染对齐机制，以提升逐帧生成质量并获得更准确的唇形同步。在常用说话人头数据集上的大量实验表明，我们的方法在生成高保真、口型同步说话人头视频方面优于近期最先进方法。\n"
  },
  {
    "path": "abs/2602.09816.md",
    "content": "### CompSplat: Compression-aware 3D Gaussian Splatting for Real-world Video\n\nHigh-quality novel view synthesis (NVS) from real-world videos is crucial for applications such as cultural heritage preservation, digital twins, and immersive media. However, real-world videos typically contain long sequences with irregular camera trajectories and unknown poses, leading to pose drift, feature misalignment, and geometric distortion during reconstruction. Moreover, lossy compression amplifies these issues by introducing inconsistencies that gradually degrade geometry and rendering quality. While recent studies have addressed either long-sequence NVS or unposed reconstruction, compression-aware approaches still focus on specific artifacts or limited scenarios, leaving diverse compression patterns in long videos insufficiently explored. In this paper, we propose CompSplat, a compression-aware training framework that explicitly models frame-wise compression characteristics to mitigate inter-frame inconsistency and accumulated geometric errors. CompSplat incorporates compression-aware frame weighting and an adaptive pruning strategy to enhance robustness and geometric consistency, particularly under heavy compression. Extensive experiments on challenging benchmarks, including Tanks and Temples, Free, and Hike, demonstrate that CompSplat achieves state-of-the-art rendering quality and pose accuracy, significantly surpassing most recent state-of-the-art NVS approaches under severe compression conditions.\n\n从真实视频中进行高质量新视角合成，对文化遗产保护、数字孪生和沉浸式媒体等应用至关重要。然而，真实视频通常包含长序列、相机轨迹不规则且位姿未知，这会在重建过程中引发位姿漂移、特征错位和几何畸变。此外，有损压缩还会通过引入跨帧不一致，进一步累积并放大几何和渲染质量的退化。虽然近期研究已经分别处理了长序列新视角合成或无位姿重建问题，但面向压缩感知的方法仍主要针对特定伪影或有限场景，对长视频中多样化的压缩模式探索不足。本文提出 CompSplat，一个压缩感知训练框架，通过显式建模逐帧压缩特性来减轻帧间不一致和累积几何误差。CompSplat 结合压缩感知帧加权与自适应剪枝策略，尤其在重压缩条件下显著提升了鲁棒性和几何一致性。在 Tanks and Temples、Free 和 Hike 等具有挑战性的基准上的大量实验表明，CompSplat 在严重压缩条件下实现了最先进的渲染质量和位姿精度，明显优于近期大多数新视角合成方法。\n"
  },
  {
    "path": "abs/2602.09999.md",
    "content": "### Faster-GS: Analyzing and Improving Gaussian Splatting Optimization\n\nRecent advances in 3D Gaussian Splatting (3DGS) have focused on accelerating optimization while preserving reconstruction quality. However, many proposed methods entangle implementation-level improvements with fundamental algorithmic modifications or trade performance for fidelity, leading to a fragmented research landscape that complicates fair comparison. In this work, we consolidate and evaluate the most effective and broadly applicable strategies from prior 3DGS research and augment them with several novel optimizations. We further investigate underexplored aspects of the framework, including numerical stability, Gaussian truncation, and gradient approximation. The resulting system, Faster-GS, provides a rigorously optimized algorithm that we evaluate across a comprehensive suite of benchmarks. Our experiments demonstrate that Faster-GS achieves up to 5$\\times$ faster training while maintaining visual quality, establishing a new cost-effective and resource efficient baseline for 3DGS optimization. Furthermore, we demonstrate that optimizations can be applied to 4D Gaussian reconstruction, leading to efficient non-rigid scene optimization.\n\n近期三维高斯喷溅研究一直在尝试在保持重建质量的同时加速优化。然而，许多已有方法将实现层面的改进与算法本身的修改混杂在一起，或者以牺牲保真度换取性能，导致研究图景较为碎片化，也使公平比较变得困难。本文整合并评估了以往 3DGS 研究中最有效、最具普适性的策略，并在此基础上加入若干新的优化手段。我们还进一步研究了该框架中一些被忽视的问题，包括数值稳定性、高斯截断和梯度近似。最终得到的 Faster-GS 是一个经过严格优化的系统，我们在全面的基准套件上进行了评测。实验表明，Faster-GS 在保持视觉质量的同时，训练速度最高可提升 5 倍，为 3DGS 优化建立了新的高性价比、资源高效的基线。此外，我们还验证了这些优化同样可以应用于四维高斯重建，从而实现高效的非刚体场景优化。\n"
  },
  {
    "path": "abs/2602.10173.md",
    "content": "### ArtisanGS: Interactive Tools for Gaussian Splat Selection with AI and Human in the Loop\n\nRepresentation in the family of 3D Gaussian Splats (3DGS) are growing into a viable alternative to traditional graphics for an expanding number of application, including recent techniques that facilitate physics simulation and animation. However, extracting usable objects from in-the-wild captures remains challenging and controllable editing techniques for this representation are limited. Unlike the bulk of emerging techniques, focused on automatic solutions or high-level editing, we introduce an interactive suite of tools centered around versatile Gaussian Splat selection and segmentation. We propose a fast AI-driven method to propagate user-guided 2D selection masks to 3DGS selections. This technique allows for user intervention in the case of errors and is further coupled with flexible manual selection and segmentation tools. These allow a user to achieve virtually any binary segmentation of an unstructured 3DGS scene. We evaluate our toolset against the state-of-the-art for Gaussian Splat selection and demonstrate their utility for downstream applications by developing a user-guided local editing approach, leveraging a custom Video Diffusion Model. With flexible selection tools, users have direct control over the areas that the AI can modify. Our selection and editing tools can be used for any in-the-wild capture without additional optimization.\n\n三维高斯喷溅这一类表示正逐渐成为传统图形学之外的一种可行替代方案，应用范围也在不断扩大，包括近期支持物理仿真和动画的技术。然而，从自然采集数据中提取可直接使用的对象仍然具有挑战性，而且针对这种表示的可控编辑技术仍然有限。不同于大多数聚焦自动化方案或高层编辑的工作，我们提出一套交互式工具，核心是灵活的 Gaussian Splat 选择与分割。我们设计了一种快速的 AI 驱动方法，可将用户引导的二维选择掩码传播为三维高斯喷溅上的选择结果。当自动传播出现错误时，用户仍可介入修正，并可配合灵活的手动选择和分割工具使用。这使用户几乎可以对任意非结构化 3DGS 场景完成任意二值分割。我们将工具集与当前最先进的 Gaussian Splat 选择方法进行比较，并进一步开发了一个由用户引导的局部编辑方案，结合定制的视频扩散模型，展示其在下游应用中的价值。借助灵活的选择工具，用户能够直接控制 AI 可修改的区域，而且整个工具链适用于任意自然采集场景，无需额外优化。\n"
  },
  {
    "path": "abs/2602.10239.md",
    "content": "### XSPLAIN: XAI-enabling Splat-based Prototype Learning for Attribute-aware INterpretability\n\n3D Gaussian Splatting (3DGS) has rapidly become a standard for high-fidelity 3D reconstruction, yet its adoption in multiple critical domains is hindered by the lack of interpretability of the generation models as well as classification of the Splats. While explainability methods exist for other 3D representations, like point clouds, they typically rely on ambiguous saliency maps that fail to capture the volumetric coherence of Gaussian primitives. We introduce XSPLAIN, the first ante-hoc, prototype-based interpretability framework designed specifically for 3DGS classification. Our approach leverages a voxel-aggregated PointNet backbone and a novel, invertible orthogonal transformation that disentangles feature channels for interpretability while strictly preserving the original decision boundaries. Explanations are grounded in representative training examples, enabling intuitive ``this looks like that'' reasoning without any degradation in classification performance. A rigorous user study (N=51) demonstrates a decisive preference for our approach: participants selected XSPLAIN explanations 48.4\\% of the time as the best, significantly outperforming baselines $(p<0.001)$, showing that XSPLAIN provides transparency and user trust. The source code for this work is available at: https://github.com/Solvro/ml-splat-xai\n\n三维高斯喷溅已经迅速成为高保真三维重建的标准表示之一，但由于生成模型和高斯类别判别缺乏可解释性，其在多个关键领域的应用仍受到限制。虽然点云等其他三维表示已经有一些可解释方法，但它们通常依赖含义模糊的显著图，难以体现高斯基元在体积上的一致性。我们提出 XSPLAIN，这是首个专为 3DGS 分类设计的、基于原型的先验可解释性框架。该方法结合体素聚合的 PointNet 骨干网络与一种新型可逆正交变换，在严格保持原始决策边界不变的前提下，对特征通道进行可解释的解耦。解释建立在具有代表性的训练样本之上，使模型能够以“这个看起来像那个”的直观方式进行推理，同时不降低分类性能。一项严格的用户研究表明，参与者有 48.4% 的时间将 XSPLAIN 生成的解释评为最佳，显著优于基线方法，说明 XSPLAIN 能有效提升透明度与用户信任。代码地址为 https://github.com/Solvro/ml-splat-xai。\n"
  },
  {
    "path": "abs/2602.10278.md",
    "content": "### ERGO: Excess-Risk-Guided Optimization for High-Fidelity Monocular 3D Gaussian Splatting\n\nGenerating 3D content from a single image remains a fundamentally challenging and ill-posed problem due to the inherent absence of geometric and textural information in occluded regions. While state-of-the-art generative models can synthesize auxiliary views to provide additional supervision, these views inevitably contain geometric inconsistencies and textural misalignments that propagate and amplify artifacts during 3D reconstruction. To effectively harness these imperfect supervisory signals, we propose an adaptive optimization framework guided by excess risk decomposition, termed ERGO. Specifically, ERGO decomposes the optimization losses in 3D Gaussian splatting into two components, i.e., excess risk that quantifies the suboptimality gap between current and optimal parameters, and Bayes error that models the irreducible noise inherent in synthesized views. This decomposition enables ERGO to dynamically estimate the view-specific excess risk and adaptively adjust loss weights during optimization. Furthermore, we introduce geometry-aware and texture-aware objectives that complement the excess-risk-derived weighting mechanism, establishing a synergistic global-local optimization paradigm. Consequently, ERGO demonstrates robustness against supervision noise while consistently enhancing both geometric fidelity and textural quality of the reconstructed 3D content. Extensive experiments on the Google Scanned Objects dataset and the OmniObject3D dataset demonstrate the superiority of ERGO over existing state-of-the-art methods.\n\n从单张图像生成三维内容本质上是一项困难且欠定的问题，因为被遮挡区域天然缺乏几何和纹理信息。虽然最先进的生成模型可以合成辅助视图来提供额外监督，但这些视图不可避免地包含几何不一致和纹理错位，并会在三维重建过程中传播和放大伪影。为了更有效地利用这些不完美监督信号，我们提出 ERGO，一个由超额风险分解引导的自适应优化框架。具体而言，ERGO 将三维高斯喷溅中的优化损失分解为两部分：一部分是衡量当前参数与最优参数之间差距的超额风险，另一部分是表征合成视图中不可约噪声的 Bayes 误差。基于这种分解，ERGO 能动态估计视图特定的超额风险，并在优化过程中自适应调整损失权重。进一步地，我们引入几何感知目标和纹理感知目标，与超额风险导出的加权机制相互补充，形成全局与局部协同的优化范式。因而，ERGO 在对监督噪声保持鲁棒性的同时，能够持续提升重建三维内容的几何保真度和纹理质量。在 Google Scanned Objects 和 OmniObject3D 数据集上的大量实验表明，ERGO 优于现有最先进方法。\n"
  },
  {
    "path": "abs/2602.11575.md",
    "content": "### ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles\n\nVisual navigation models often struggle in real-world dynamic environments due to limited robustness to the sim-to-real gap and the difficulty of training policies tailored to target deployment environments (e.g., households, restaurants, and factories). Although real-to-sim navigation simulation using 3D Gaussian Splatting (GS) can mitigate these challenges, prior GS-based works have considered only static scenes or non-photorealistic human obstacles built from simulator assets, despite the importance of safe navigation in dynamic environments. To address these issues, we propose ReaDy-Go, a novel real-to-sim simulation pipeline that synthesizes photorealistic dynamic scenarios in target environments by augmenting a reconstructed static GS scene with dynamic human GS obstacles, and trains navigation policies using the generated datasets. The pipeline provides three key contributions: (1) a dynamic GS simulator that integrates static scene GS with a human animation module, enabling the insertion of animatable human GS avatars and the synthesis of plausible human motions from 2D trajectories, (2) a navigation dataset generation framework that leverages the simulator along with a robot expert planner designed for dynamic GS representations and a human planner, and (3) robust navigation policies to both the sim-to-real gap and moving obstacles. The proposed simulator generates thousands of photorealistic navigation scenarios with animatable human GS avatars from arbitrary viewpoints. ReaDy-Go outperforms baselines across target environments in both simulation and real-world experiments, demonstrating improved navigation performance even after sim-to-real transfer and in the presence of moving obstacles. Moreover, zero-shot sim-to-real deployment in an unseen environment indicates its generalization potential. Project page: https://syeon-yoo.github.io/ready-go-site/.\n\n视觉导航模型在真实动态环境中往往表现不佳，原因在于对仿真到现实域差的鲁棒性不足，以及难以训练针对目标部署环境，例如家庭、餐厅和工厂，量身定制的策略。尽管基于三维高斯喷溅的真实到仿真导航模拟能够缓解这些问题，但先前相关工作只考虑静态场景，或者仅使用模拟器资产构建不够逼真的人体障碍物，而动态环境中的安全导航恰恰十分重要。为此，我们提出 ReaDy-Go，一个新的真实到仿真模拟流程，通过在重建的静态高斯场景中加入动态人体高斯障碍物，合成目标环境中的照片级真实动态场景，并利用生成的数据集训练导航策略。该流程包含三项关键贡献：一是动态高斯喷溅模拟器，将静态场景高斯与人体动画模块结合，能够插入可驱动的人体高斯化身，并从二维轨迹合成合理的人体运动；二是导航数据集生成框架，结合该模拟器、面向动态高斯表示设计的机器人专家规划器以及人体规划器；三是同时对仿真到现实域差和移动障碍物具有鲁棒性的导航策略。所提出模拟器能够从任意视角生成成千上万个包含可驱动人体高斯化身的照片级真实导航场景。ReaDy-Go 在仿真与真实实验中的多个目标环境上都优于基线，即便经历仿真到现实迁移并面对移动障碍物，仍能表现出更好的导航性能。与此同时，在未见环境中的零样本仿真到现实部署结果也显示了其良好的泛化潜力。项目页面见 https://syeon-yoo.github.io/ready-go-site/。\n"
  },
  {
    "path": "abs/2602.11577.md",
    "content": "### LeafFit: Plant Assets Creation from 3D Gaussian Splatting\n\nWe propose LeafFit, a pipeline that converts 3D Gaussian Splatting (3DGS) of individual plants into editable, instanced mesh assets. While 3DGS faithfully captures complex foliage, its high memory footprint and lack of mesh topology make it incompatible with traditional game production workflows. We address this by leveraging the repetition of leaf shapes; our method segments leaves from the unstructured 3DGS, with optional user interaction included as a fallback. A representative leaf group is selected and converted into a thin, sharp mesh to serve as a template; this template is then fitted to all other leaves via differentiable Moving Least Squares (MLS) deformation. At runtime, the deformation is evaluated efficiently on-the-fly using a vertex shader to minimize storage requirements. Experiments demonstrate that LeafFit achieves higher segmentation quality and deformation accuracy than recent baselines while significantly reducing data size and enabling parameter-level editing.\n\n我们提出 LeafFit，一个将单株植物的三维高斯喷溅转换为可编辑、可实例化网格资产的流程。虽然 3DGS 能够忠实捕捉复杂植被，但其较高的内存占用以及缺乏网格拓扑结构，使其难以融入传统游戏制作流程。为此，我们利用叶片形状的重复性来解决这一问题。我们的方法首先从非结构化 3DGS 中分割出叶片，并在必要时允许用户作为兜底进行交互修正。随后，从中选取一个具有代表性的叶片组并将其转换为薄而锐利的网格作为模板，再通过可微分的移动最小二乘形变，将该模板拟合到所有其他叶片上。运行时，这一形变可通过顶点着色器高效地按需计算，从而尽量减少存储需求。实验表明，LeafFit 在分割质量和形变精度上都优于近期基线，同时显著减小了数据体积，并支持参数级编辑。\n"
  },
  {
    "path": "abs/2602.11638.md",
    "content": "### Variation-aware Flexible 3D Gaussian Editing\n\nIndirect editing methods for 3D Gaussian Splatting (3DGS) have recently witnessed significant advancements. These approaches operate by first applying edits in the rendered 2D space and subsequently projecting the modifications back into 3D. However, this paradigm inevitably introduces cross-view inconsistencies and constrains both the flexibility and efficiency of the editing process. To address these challenges, we present VF-Editor, which enables native editing of Gaussian primitives by predicting attribute variations in a feedforward manner. To accurately and efficiently estimate these variations, we design a novel variation predictor distilled from 2D editing knowledge. The predictor encodes the input to generate a variation field and employs two learnable, parallel decoding functions to iteratively infer attribute changes for each 3D Gaussian. Thanks to its unified design, VF-Editor can seamlessly distill editing knowledge from diverse 2D editors and strategies into a single predictor, allowing for flexible and effective knowledge transfer into the 3D domain. Extensive experiments on both public and private datasets reveal the inherent limitations of indirect editing pipelines and validate the effectiveness and flexibility of our approach.\n\n针对三维高斯喷溅的间接编辑方法近期取得了显著进展。这类方法通常先在渲染出的二维图像空间中执行编辑，再将修改结果投影回三维。然而，这一范式不可避免地会引入跨视图不一致，同时限制了编辑过程的灵活性与效率。为了解决这些问题，我们提出 VF-Editor，它通过以前馈方式预测属性变化，实现对高斯基元的原生编辑。为了准确且高效地估计这些变化，我们设计了一个从二维编辑知识蒸馏而来的新型变化预测器。该预测器会对输入进行编码生成变化场，并使用两个可学习的并行解码函数，迭代推断每个三维高斯的属性变化。由于采用统一设计，VF-Editor 能够将来自不同二维编辑器和不同策略的编辑知识无缝蒸馏到单一预测器中，从而实现灵活且有效的三维知识迁移。在公开和私有数据集上的大量实验揭示了间接编辑流水线的固有局限，并验证了我们方法的有效性与灵活性。\n"
  },
  {
    "path": "abs/2602.11653.md",
    "content": "### GR-Diffusion: 3D Gaussian Representation Meets Diffusion in Whole-Body PET Reconstruction\n\nPositron emission tomography (PET) reconstruction is a critical challenge in molecular imaging, often hampered by noise amplification, structural blurring, and detail loss due to sparse sampling and the ill-posed nature of inverse problems. The three-dimensional discrete Gaussian representation (GR), which efficiently encodes 3D scenes using parameterized discrete Gaussian distributions, has shown promise in computer vision. In this work, we pro-pose a novel GR-Diffusion framework that synergistically integrates the geometric priors of GR with the generative power of diffusion models for 3D low-dose whole-body PET reconstruction. GR-Diffusion employs GR to generate a reference 3D PET image from projection data, establishing a physically grounded and structurally explicit benchmark that overcomes the low-pass limitations of conventional point-based or voxel-based methods. This reference image serves as a dual guide during the diffusion process, ensuring both global consistency and local accuracy. Specifically, we employ a hierarchical guidance mechanism based on the GR reference. Fine-grained guidance leverages differences to refine local details, while coarse-grained guidance uses multi-scale difference maps to correct deviations. This strategy allows the diffusion model to sequentially integrate the strong geometric prior from GR and recover sub-voxel information. Experimental results on the UDPET and Clinical datasets with varying dose levels show that GR-Diffusion outperforms state-of-the-art methods in enhancing 3D whole-body PET image quality and preserving physiological details.\n\n正电子发射断层成像重建是分子影像中的关键难题，通常会受到稀疏采样和逆问题不适定性带来的噪声放大、结构模糊和细节丢失影响。三维离散高斯表示能够用参数化离散高斯分布高效编码三维场景，在计算机视觉中已展现出潜力。本文提出 GR-Diffusion，一个将高斯表示的几何先验与扩散模型生成能力协同结合起来的框架，用于三维低剂量全身 PET 重建。GR-Diffusion 首先利用高斯表示根据投影数据生成参考三维 PET 图像，建立一个具有物理依据且结构显式的基准，从而克服传统点表示或体素表示的低通限制。该参考图像在扩散过程中作为双重引导，保证整体一致性和局部准确性。具体来说，我们基于这一参考设计了层次化引导机制：细粒度引导利用差异信息细化局部细节，粗粒度引导则利用多尺度差异图纠正偏差。借助这一策略，扩散模型能够逐步吸收来自高斯表示的强几何先验，并恢复亚体素级信息。在 UDPET 和 Clinical 数据集上针对不同剂量水平的实验结果表明，GR-Diffusion 在提升三维全身 PET 图像质量和保留生理细节方面优于现有最先进方法。\n"
  },
  {
    "path": "abs/2602.11693.md",
    "content": "### OMEGA-Avatar: One-shot Modeling of 360° Gaussian Avatars\n\nCreating high-fidelity, animatable 3D avatars from a single image remains a formidable challenge. We identified three desirable attributes of avatar generation: 1) the method should be feed-forward, 2) model a 360° full-head, and 3) should be animation-ready. However, current work addresses only two of the three points simultaneously. To address these limitations, we propose OMEGA-Avatar, the first feed-forward framework that simultaneously generates a generalizable, 360°-complete, and animatable 3D Gaussian head from a single image. Starting from a feed-forward and animatable framework, we address the 360° full-head avatar generation problem with two novel components. First, to overcome poor hair modeling in full-head avatar generation, we introduce a semantic-aware mesh deformation module that integrates multi-view normals to optimize a FLAME head with hair while preserving its topology structure. Second, to enable effective feed-forward decoding of full-head features, we propose a multi-view feature splatting module that constructs a shared canonical UV representation from features across multiple views through differentiable bilinear splatting, hierarchical UV mapping, and visibility-aware fusion. This approach preserves both global structural coherence and local high-frequency details across all viewpoints, ensuring 360° consistency without per-instance optimization. Extensive experiments demonstrate that OMEGA-Avatar achieves state-of-the-art performance, significantly outperforming existing baselines in 360° full-head completeness while robustly preserving identity across different viewpoints.\n\n从单张图像创建高保真、可驱动的三维化身仍然是一项极具挑战性的任务。我们认为理想的化身生成应同时具备三点：一是前馈式生成，二是完整建模 360 度全头部，三是可直接用于动画驱动。然而现有工作通常只能同时满足其中两项。为此，我们提出 OMEGA-Avatar，这是首个能够从单张图像同时生成可泛化、360 度完整且可驱动三维高斯头部的前馈式框架。我们在一个前馈且可驱动的基础框架上，引入两个新组件来解决完整全头部化身生成问题。首先，为了解决全头部化身生成中头发建模较差的问题，我们提出语义感知网格形变模块，融合多视图法线来优化包含头发的 FLAME 头部模型，同时保持其拓扑结构不变。其次，为了实现对全头部特征的高效前馈解码，我们提出多视图特征喷溅模块，通过可微双线性喷溅、层次化 UV 映射和可见性感知融合，从多个视角特征构建共享的规范 UV 表示。该方法能够在所有视角下同时保留全局结构一致性和局部高频细节，从而在无需逐实例优化的情况下保证 360 度一致性。大量实验表明，OMEGA-Avatar 达到了最先进性能，在 360 度全头完整性方面显著优于现有基线，并能稳健保持不同视角下的人物身份一致性。\n"
  },
  {
    "path": "abs/2602.11705.md",
    "content": "### TG-Field: Geometry-Aware Radiative Gaussian Fields for Tomographic Reconstruction\n\n3D Gaussian Splatting (3DGS) has revolutionized 3D scene representation with superior efficiency and quality. While recent adaptations for computed tomography (CT) show promise, they struggle with severe artifacts under highly sparse-view projections and dynamic motions. To address these challenges, we propose Tomographic Geometry Field (TG-Field), a geometry-aware Gaussian deformation framework tailored for both static and dynamic CT reconstruction. A multi-resolution hash encoder is employed to capture local spatial priors, regularizing primitive parameters under ultra-sparse settings. We further extend the framework to dynamic reconstruction by introducing time-conditioned representations and a spatiotemporal attention block to adaptively aggregate features, thereby resolving spatiotemporal ambiguities and enforcing temporal coherence. In addition, a motion-flow network models fine-grained respiratory motion to track local anatomical deformations. Extensive experiments on synthetic and real-world datasets demonstrate that TG-Field consistently outperforms existing methods, achieving state-of-the-art reconstruction accuracy under highly sparse-view conditions.\n\n三维高斯喷溅以优异的效率和质量改变了三维场景表示方式。尽管近期面向计算机断层成像的改造版本展现出潜力，但在极稀疏投影和动态运动条件下仍会出现严重伪影。为解决这些问题，我们提出 TG-Field，一种面向静态和动态 CT 重建的几何感知高斯形变框架。该方法使用多分辨率哈希编码器捕捉局部空间先验，在超稀疏场景下对基元参数进行正则化。我们进一步通过引入时间条件表示和时空注意力模块，将该框架扩展到动态重建，以自适应聚合特征，消除时空歧义并强化时间一致性。此外，运动流网络用于建模细粒度的呼吸运动，从而跟踪局部解剖结构形变。在合成和真实数据集上的大量实验表明，TG-Field 在高度稀疏视角条件下持续优于现有方法，达到最先进重建精度。\n"
  },
  {
    "path": "abs/2602.12159.md",
    "content": "### 3DGSNav: Enhancing Vision-Language Model Reasoning for Object Navigation via Active 3D Gaussian Splatting\n\nObject navigation is a core capability of embodied intelligence, enabling an agent to locate target objects in unknown environments. Recent advances in vision-language models (VLMs) have facilitated zero-shot object navigation (ZSON). However, existing methods often rely on scene abstractions that convert environments into semantic maps or textual representations, causing high-level decision making to be constrained by the accuracy of low-level perception. In this work, we present 3DGSNav, a novel ZSON framework that embeds 3D Gaussian Splatting (3DGS) as persistent memory for VLMs to enhance spatial reasoning. Through active perception, 3DGSNav incrementally constructs a 3DGS representation of the environment, enabling trajectory-guided free-viewpoint rendering of frontier-aware first-person views. Moreover, we design structured visual prompts and integrate them with Chain-of-Thought (CoT) prompting to further improve VLM reasoning. During navigation, a real-time object detector filters potential targets, while VLM-driven active viewpoint switching performs target re-verification, ensuring efficient and reliable recognition. Extensive evaluations across multiple benchmarks and real-world experiments on a quadruped robot demonstrate that our method achieves robust and competitive performance against state-of-the-art approaches.The Project Page:https://aczheng-cai.github.io/3dgsnav.github.io/\n\n目标导航是具身智能的核心能力之一，使智能体能够在未知环境中找到目标物体。近期视觉语言模型的发展推动了零样本目标导航，但现有方法往往依赖将环境转化为语义地图或文本表示等场景抽象形式，导致高层决策受限于底层感知精度。本文提出 3DGSNav，一个新的零样本目标导航框架，将三维高斯喷溅作为视觉语言模型的持久记忆，以增强其空间推理能力。通过主动感知，3DGSNav 逐步构建环境的 3DGS 表示，从而支持基于轨迹引导的、感知前沿区域的第一人称自由视角渲染。我们还设计了结构化视觉提示，并将其与思维链提示结合，进一步提升视觉语言模型的推理能力。在导航过程中，实时目标检测器用于筛选候选目标，而由视觉语言模型驱动的主动视角切换则进行目标复核，以确保识别高效可靠。在多个基准和四足机器人真实实验中的广泛评测表明，该方法在与现有最先进方法比较时表现出稳健且有竞争力的性能。项目页面见 https://aczheng-cai.github.io/3dgsnav.github.io/。\n"
  },
  {
    "path": "abs/2602.12314.md",
    "content": "### LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning\n\nWe present LatentAM, an online 3D Gaussian Splatting (3DGS) mapping framework that builds scalable latent feature maps from streaming RGB-D observations for open-vocabulary robotic perception. Instead of distilling high-dimensional Vision-Language Model (VLM) embeddings using model-specific decoders, LatentAM proposes an online dictionary learning approach that is both model-agnostic and pretraining-free, enabling plug-and-play integration with different VLMs at test time. Specifically, our approach associates each Gaussian primitive with a compact query vector that can be converted into approximate VLM embeddings using an attention mechanism with a learnable dictionary. The dictionary is initialized efficiently from streaming observations and optimized online to adapt to evolving scene semantics under trust-region regularization. To scale to long trajectories and large environments, we further propose an efficient map management strategy based on voxel hashing, where optimization is restricted to an active local map on the GPU, while the global map is stored and indexed on the CPU to maintain bounded GPU memory usage. Experiments on public benchmarks and a large-scale custom dataset demonstrate that LatentAM attains significantly better feature reconstruction fidelity compared to state-of-the-art methods, while achieving near-real-time speed (12-35 FPS) on the evaluated datasets. Our project page is at: https://junwoonlee.github.io/projects/LatentAM\n\n我们提出 LatentAM，一个在线三维高斯喷溅建图框架，可从流式 RGB-D 观测中构建可扩展的潜在特征地图，用于开放词汇机器人感知。不同于依赖特定模型解码器蒸馏高维视觉语言模型嵌入的方法，LatentAM 提出一种与模型无关、且无需预训练的在线字典学习方法，使不同视觉语言模型能够在测试时即插即用地接入。具体来说，我们为每个高斯基元关联一个紧凑的查询向量，并通过带可学习字典的注意力机制将其转换为近似的视觉语言模型嵌入。该字典可从流式观测中高效初始化，并在信赖域正则化约束下在线优化，以适应不断变化的场景语义。为了扩展到长轨迹和大规模环境，我们还提出基于体素哈希的高效地图管理策略，仅在 GPU 上对活动局部地图进行优化，而全局地图存储并索引在 CPU 上，以保证 GPU 内存占用受控。在公开基准和大规模自建数据集上的实验表明，LatentAM 在特征重建保真度上明显优于现有最先进方法，同时在所评测数据集上实现了接近实时的 12 到 35 FPS 速度。项目页面见 https://junwoonlee.github.io/projects/LatentAM。\n"
  },
  {
    "path": "abs/2602.12796.md",
    "content": "### GSM-GS: Geometry-Constrained Single and Multi-view Gaussian Splatting for Surface Reconstruction\n\nRecently, 3D Gaussian Splatting has emerged as a prominent research direction owing to its ultrarapid training speed and high-fidelity rendering capabilities. However, the unstructured and irregular nature of Gaussian point clouds poses challenges to reconstruction accuracy. This limitation frequently causes high-frequency detail loss in complex surface microstructures when relying solely on routine strategies. To address this limitation, we propose GSM-GS: a synergistic optimization framework integrating single-view adaptive sub-region weighting constraints and multi-view spatial structure refinement. For single-view optimization, we leverage image gradient features to partition scenes into texture-rich and texture-less sub-regions. The reconstruction quality is enhanced through adaptive filtering mechanisms guided by depth discrepancy features. This preserves high-weight regions while implementing a dual-branch constraint strategy tailored to regional texture variations, thereby improving geometric detail characterization. For multi-view optimization, we introduce a geometry-guided cross-view point cloud association method combined with a dynamic weight sampling strategy. This constructs 3D structural normal constraints across adjacent point cloud frames, effectively reinforcing multi-view consistency and reconstruction fidelity. Extensive experiments on public datasets demonstrate that our method achieves both competitive rendering quality and geometric reconstruction. See our interactive project page\n\n近年来，三维高斯喷溅凭借极快的训练速度和高保真渲染能力成为研究热点。然而，高斯点云的非结构化和不规则特性会给重建精度带来挑战，仅依赖常规策略时，复杂表面微结构中的高频细节常常容易丢失。为此，我们提出 GSM-GS，一个结合单视图自适应子区域加权约束与多视图空间结构细化的协同优化框架。在单视图优化中，我们利用图像梯度特征将场景划分为纹理丰富区域和纹理贫乏区域，并通过由深度差异特征引导的自适应滤波机制提升重建质量。该设计在保留高权重区域的同时，引入适配不同区域纹理变化的双分支约束策略，从而改善几何细节表征。在多视图优化中，我们提出几何引导的跨视图点云关联方法，并结合动态权重采样策略，在相邻点云帧之间构建三维结构法线约束，有效增强多视图一致性和重建保真度。在公开数据集上的大量实验表明，我们的方法在渲染质量和几何重建方面均具有竞争力。\n"
  },
  {
    "path": "abs/2602.13444.md",
    "content": "### FlowHOI: Flow-based Semantics-Grounded Generation of Hand-Object Interactions for Dexterous Robot Manipulation\n\nRecent vision-language-action (VLA) models can generate plausible end-effector motions, yet they often fail in long-horizon, contact-rich tasks because the underlying hand-object interaction (HOI) structure is not explicitly represented. An embodiment-agnostic interaction representation that captures this structure would make manipulation behaviors easier to validate and transfer across robots. We propose FlowHOI, a two-stage flow-matching framework that generates semantically grounded, temporally coherent HOI sequences, comprising hand poses, object poses, and hand-object contact states, conditioned on an egocentric observation, a language instruction, and a 3D Gaussian splatting (3DGS) scene reconstruction. We decouple geometry-centric grasping from semantics-centric manipulation, conditioning the latter on compact 3D scene tokens and employing a motion-text alignment loss to semantically ground the generated interactions in both the physical scene layout and the language instruction. To address the scarcity of high-fidelity HOI supervision, we introduce a reconstruction pipeline that recovers aligned hand-object trajectories and meshes from large-scale egocentric videos, yielding an HOI prior for robust generation. Across the GRAB and HOT3D benchmarks, FlowHOI achieves the highest action recognition accuracy and a 1.7$\\times$ higher physics simulation success rate than the strongest diffusion-based baseline, while delivering a 40$\\times$ inference speedup. We further demonstrate real-robot execution on four dexterous manipulation tasks, illustrating the feasibility of retargeting generated HOI representations to real-robot execution pipelines.\n\n近期视觉语言动作模型可以生成看起来合理的末端执行器运动，但在长时程、接触密集型任务中仍然容易失败，因为底层的手物交互结构没有被显式表示。如果能够建立一种与具体机器人形态无关、且能捕捉这种结构的交互表示，将更有利于验证操作行为，并在不同机器人之间迁移。我们提出 FlowHOI，一个两阶段 flow-matching 框架，可在第一视角观测、语言指令和三维高斯喷溅场景重建的条件下，生成具备语义约束且时间一致的手物交互序列，包括手部姿态、物体姿态以及手物接触状态。我们将以几何为中心的抓取与以语义为中心的操控解耦，后者依赖紧凑的三维场景 token，并采用运动文本对齐损失，使生成的交互在物理场景布局和语言指令上同时具备语义对应关系。为了解决高保真手物交互监督数据匮乏的问题，我们引入一个重建流程，从大规模第一视角视频中恢复对齐的手物轨迹和网格，从而获得稳健生成所需的交互先验。在 GRAB 和 HOT3D 基准上，FlowHOI 取得了最高的动作识别准确率，并比最强的扩散基线获得高出 1.7 倍的物理仿真成功率，同时推理速度提高 40 倍。我们还在四个灵巧操作任务上展示了真实机器人执行结果，说明生成的手物交互表示可以重定向到真实机器人执行流程中。\n"
  },
  {
    "path": "abs/2602.13549.md",
    "content": "### Nighttime Autonomous Driving Scene Reconstruction with Physically-Based Gaussian Splatting\n\nThis paper focuses on scene reconstruction under nighttime conditions in autonomous driving simulation. Recent methods based on Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) have achieved photorealistic modeling in autonomous driving scene reconstruction, but they primarily focus on normal-light conditions. Low-light driving scenes are more challenging to model due to their complex lighting and appearance conditions, which often causes performance degradation of existing methods. To address this problem, this work presents a novel approach that integrates physically based rendering into 3DGS to enhance nighttime scene reconstruction for autonomous driving. Specifically, our approach integrates physically based rendering into composite scene Gaussian representations and jointly optimizes Bidirectional Reflectance Distribution Function (BRDF) based material properties. We explicitly model diffuse components through a global illumination module and specular components by anisotropic spherical Gaussians. As a result, our approach improves reconstruction quality for outdoor nighttime driving scenes, while maintaining real-time rendering. Extensive experiments across diverse nighttime scenarios on two real-world autonomous driving datasets, including nuScenes and Waymo, demonstrate that our approach outperforms the state-of-the-art methods both quantitatively and qualitatively.\n\n本文关注自动驾驶仿真中的夜间场景重建。尽管基于神经辐射场和三维高斯喷溅的近期方法已经能够对自动驾驶场景进行照片级建模，但它们主要面向正常光照条件。低照度驾驶场景由于具有更复杂的光照与外观条件，更难建模，也常导致现有方法性能下降。为解决这一问题，我们提出一种将基于物理的渲染机制融入 3DGS 的新方法，以增强夜间自动驾驶场景重建能力。具体而言，我们将基于物理的渲染整合进复合场景高斯表示中，并联合优化基于双向反射分布函数的材质属性。我们通过全局光照模块显式建模漫反射成分，并利用各向异性球面高斯表示镜面反射成分。由此，该方法在保持实时渲染的同时，提高了户外夜间驾驶场景的重建质量。在 nuScenes 和 Waymo 两个真实自动驾驶数据集上的多种夜间场景实验表明，该方法在定量和定性上都优于现有最先进方法。\n"
  },
  {
    "path": "abs/2602.13801.md",
    "content": "### Joint Orientation and Weight Optimization for Robust Watertight Surface Reconstruction via Dirichlet-Regularized Winding Fields\n\nWe propose Dirichlet Winding Reconstruction (DiWR), a robust method for reconstructing watertight surfaces from unoriented point clouds with non-uniform sampling, noise, and outliers. Our method uses the generalized winding number (GWN) field as the target implicit representation and jointly optimizes point orientations, per-point area weights, and confidence coefficients in a single pipeline. The optimization minimizes the Dirichlet energy of the induced winding field together with additional GWN-based constraints, allowing DiWR to compensate for non-uniform sampling, reduce the impact of noise, and downweight outliers during reconstruction, with no reliance on separate preprocessing. We evaluate DiWR on point clouds from 3D Gaussian Splatting, a computer-vision pipeline, and corrupted graphics benchmarks. Experiments show that DiWR produces plausible watertight surfaces on these challenging inputs and outperforms both traditional multi-stage pipelines and recent joint orientation-reconstruction methods.\n\n我们提出 Dirichlet Winding Reconstruction，一种从无朝向点云中重建封闭水密表面的鲁棒方法，能够应对非均匀采样、噪声和离群点。该方法以广义绕数场作为目标隐式表示，并在单一流程中联合优化点的朝向、逐点面积权重以及置信系数。优化目标是在附加广义绕数约束的同时，最小化诱导绕数场的 Dirichlet 能量，从而使 DiWR 能够补偿非均匀采样、减弱噪声影响，并在重建过程中自动降低离群点权重，而无需依赖独立预处理。我们在来自三维高斯喷溅、计算机视觉流程和受损图形学基准的点云上对 DiWR 进行了评估。实验结果表明，DiWR 能在这些具有挑战性的输入上生成合理的水密表面，并优于传统多阶段流程以及近期的联合朝向重建方法。\n"
  },
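For reference, the implicit field DiWR optimizes has a simple closed form. The sketch below evaluates the generalized winding number of oriented, area-weighted points at a query location (the standard formula; in the paper the orientations, weights, and confidences are the unknowns, and the Dirichlet energy of this field is what gets minimized):

```python
import numpy as np

def winding_number(query, points, normals, areas):
    '''Generalized winding number at `query` for oriented points with
    per-point area weights: w(q) = sum_i a_i n_i . (p_i - q)
    / (4 pi |p_i - q|^3). Near 0 outside the shape, near 1 inside.'''
    d = points - query                              # (N, 3): p_i - q
    r3 = np.linalg.norm(d, axis=1) ** 3 + 1e-12
    dots = np.einsum('ij,ij->i', normals, d)        # n_i . (p_i - q)
    return float(np.sum(areas * dots / r3) / (4.0 * np.pi))
```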
  {
    "path": "abs/2602.13806.md",
    "content": "### Gaussian Sequences with Multi-Scale Dynamics for 4D Reconstruction from Monocular Casual Videos\n\nUnderstanding dynamic scenes from casual videos is critical for scalable robot learning, yet four-dimensional (4D) reconstruction under strictly monocular settings remains highly ill-posed. To address this challenge, our key insight is that real-world dynamics exhibits a multi-scale regularity from object to particle level. To this end, we design the multi-scale dynamics mechanism that factorizes complex motion fields. Within this formulation, we propose Gaussian sequences with multi-scale dynamics, a novel representation for dynamic 3D Gaussians derived through compositions of multi-level motion. This layered structure substantially alleviates ambiguity of reconstruction and promotes physically plausible dynamics. We further incorporate multi-modal priors from vision foundation models to establish complementary supervision, constraining the solution space and improving the reconstruction fidelity. Our approach enables accurate and globally consistent 4D reconstruction from monocular casual videos. Experiments of dynamic novel-view synthesis (NVS) on benchmark and real-world manipulation datasets demonstrate considerable improvements over existing methods.\n\n从随手拍摄的单目视频中理解动态场景，对于可扩展机器人学习至关重要，但在严格单目条件下进行四维重建仍然高度欠定。为应对这一挑战，我们的核心观察是，真实世界动态在从对象级到粒子级的不同层面上具有多尺度规律性。基于这一点，我们设计了多尺度动态机制，将复杂运动场进行因子化。在这一表述下，我们提出具有多尺度动态的高斯序列，这是一种新的动态三维高斯表示形式，通过多层级运动的组合得到。这样的分层结构显著减轻了重建歧义，并促进了更符合物理规律的动态表现。我们进一步引入来自视觉基础模型的多模态先验，建立互补监督，以约束解空间并提升重建保真度。该方法能够从单目随手视频中实现准确且全局一致的四维重建。在动态新视角合成基准和真实操作数据集上的实验表明，我们的方法相较现有方法有明显提升。\n"
  },
  {
    "path": "abs/2602.14199.md",
    "content": "### Learnable Multi-level Discrete Wavelet Transforms for 3D Gaussian Splatting Frequency Modulation\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful approach for novel view synthesis. However, the number of Gaussian primitives often grows substantially during training as finer scene details are reconstructed, leading to increased memory and storage costs. Recent coarse-to-fine strategies regulate Gaussian growth by modulating the frequency content of the ground-truth images. In particular, AutoOpti3DGS employs the learnable Discrete Wavelet Transform (DWT) to enable data-adaptive frequency modulation. Nevertheless, its modulation depth is limited by the 1-level DWT, and jointly optimizing wavelet regularization with 3D reconstruction introduces gradient competition that promotes excessive Gaussian densification. In this paper, we propose a multi-level DWT-based frequency modulation framework for 3DGS. By recursively decomposing the low-frequency subband, we construct a deeper curriculum that provides progressively coarser supervision during early training, consistently reducing Gaussian counts. Furthermore, we show that the modulation can be performed using only a single scaling parameter, rather than learning the full 2-tap high-pass filter. Experimental results on standard benchmarks demonstrate that our method further reduces Gaussian counts while maintaining competitive rendering quality.\n\n三维高斯喷溅已成为新视角合成中的强大方法。然而，随着更细致的场景细节不断被重建，训练过程中高斯基元数量通常会显著增长，从而带来更高的内存和存储开销。近期的由粗到细策略通过调节真实图像的频率内容来控制高斯增长，尤其是 AutoOpti3DGS 利用可学习离散小波变换实现数据自适应频率调制。不过，它的调制深度受限于单层离散小波变换，而且将小波正则化与三维重建联合优化会带来梯度竞争，促使高斯过度增密。本文提出一种基于多层离散小波变换的 3DGS 频率调制框架。通过递归分解低频子带，我们构建了更深层的训练课程，使模型在训练早期获得逐步更粗的监督，从而持续减少高斯数量。进一步地，我们证明该调制只需一个缩放参数即可完成，而不必学习完整的两抽头高通滤波器。在标准基准上的实验表明，我们的方法在保持有竞争力渲染质量的同时，进一步减少了高斯数量。\n"
  },
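A stripped-down version of the multi-level frequency curriculum described above can be written in a few lines. The sketch below uses fixed Haar averaging rather than the paper's learnable filters, and `s` stands in for the single scaling parameter that replaces the learned 2-tap high-pass (image sides must be divisible by 2^levels):

```python
import numpy as np

def haar_ll(img, levels):
    '''Recursively keep the Haar low-frequency (LL) subband `levels`
    times, then nearest-neighbor upsample back to the input size.'''
    low = img.astype(float)
    for _ in range(levels):
        low = 0.25 * (low[0::2, 0::2] + low[1::2, 0::2]
                      + low[0::2, 1::2] + low[1::2, 1::2])
    for _ in range(levels):
        low = low.repeat(2, axis=0).repeat(2, axis=1)
    return low

def modulated_target(img, levels, s):
    '''Curriculum target: s = 0 gives fully coarse supervision, s = 1
    restores the full image; sweeping levels high-to-low and s upward
    over training yields progressively finer ground truth.'''
    low = haar_ll(img, levels)
    return low + s * (img - low)

target = modulated_target(np.random.rand(64, 64), levels=3, s=0.25)
```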
  {
    "path": "abs/2602.14493.md",
    "content": "### Gaussian Mesh Renderer for Lightweight Differentiable Rendering\n\n3D Gaussian Splatting (3DGS) has enabled high-fidelity virtualization with fast rendering and optimization for novel view synthesis. On the other hand, triangle mesh models still remain a popular choice for surface reconstruction but suffer from slow or heavy optimization in traditional mesh-based differentiable renderers. To address this problem, we propose a new lightweight differentiable mesh renderer leveraging the efficient rasterization process of 3DGS, named Gaussian Mesh Renderer (GMR), which tightly integrates the Gaussian and mesh representations. Each Gaussian primitive is analytically derived from the corresponding mesh triangle, preserving structural fidelity and enabling the gradient flow. Compared to the traditional mesh renderers, our method achieves smoother gradients, which especially contributes to better optimization using smaller batch sizes with limited memory. Our implementation is available in the public GitHub repository at https://github.com/huntorochi/Gaussian-Mesh-Renderer.\n\n三维高斯喷溅已经实现了高保真虚拟化，并在新视角合成中兼具快速渲染和高效优化能力。另一方面，三角网格模型仍是表面重建中的常用选择，但传统基于网格的可微渲染器通常存在优化缓慢或计算开销较大的问题。为解决这一难题，我们提出一种新的轻量级可微网格渲染器 Gaussian Mesh Renderer，利用 3DGS 的高效光栅化过程，将高斯表示与网格表示紧密结合。每个高斯基元都由对应的网格三角形解析推导而来，从而保留结构保真度并支持梯度传播。与传统网格渲染器相比，我们的方法能够提供更平滑的梯度，这一点在内存受限、使用较小批量进行优化时尤其有利。实现代码已公开于 https://github.com/huntorochi/Gaussian-Mesh-Renderer。\n"
  },
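One natural analytic derivation of a splat from a mesh face is moment matching: place the Gaussian so its first and second moments equal those of a uniform density over the triangle. The sketch below implements that construction (the paper's exact mapping may differ); since the mean and covariance are smooth functions of the vertices, gradients flow back to the mesh:

```python
import numpy as np

def triangle_gaussian(v0, v1, v2):
    '''Mean/covariance of a Gaussian matching the moments of a uniform
    density on the triangle (v0, v1, v2). Follows from Var(u) = 1/18
    and Cov(u, w) = -1/36 for barycentric coordinates u, w.'''
    mean = (v0 + v1 + v2) / 3.0
    e1, e2 = v1 - v0, v2 - v0
    cov = ((np.outer(e1, e1) + np.outer(e2, e2)) / 18.0
           - (np.outer(e1, e2) + np.outer(e2, e1)) / 36.0)
    return mean, cov

m, C = triangle_gaussian(np.zeros(3), np.eye(3)[0], np.eye(3)[1])
```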
  {
    "path": "abs/2602.14929.md",
    "content": "### Wrivinder: Towards Spatial Intelligence for Geo-locating Ground Images onto Satellite Imagery\n\nAligning ground-level imagery with geo-registered satellite maps is crucial for mapping, navigation, and situational awareness, yet remains challenging under large viewpoint gaps or when GPS is unreliable. We introduce Wrivinder, a zero-shot, geometry-driven framework that aggregates multiple ground photographs to reconstruct a consistent 3D scene and align it with overhead satellite imagery. Wrivinder combines SfM reconstruction, 3D Gaussian Splatting, semantic grounding, and monocular depth--based metric cues to produce a stable zenith-view rendering that can be directly matched to satellite context for metrically accurate camera geo-localization. To support systematic evaluation of this task, which lacks suitable benchmarks, we also release MC-Sat, a curated dataset linking multi-view ground imagery with geo-registered satellite tiles across diverse outdoor environments. Together, Wrivinder and MC-Sat provide a first comprehensive baseline and testbed for studying geometry-centered cross-view alignment without paired supervision. In zero-shot experiments, Wrivinder achieves sub-30\\,m geolocation accuracy across both dense and large-area scenes, highlighting the promise of geometry-based aggregation for robust ground-to-satellite localization.\n\n将地面图像与已地理配准的卫星地图对齐，对于测绘、导航和态势感知至关重要，但在视角差距巨大或 GPS 不可靠时，这一任务仍十分困难。我们提出 Wrivinder，一个零样本、几何驱动的框架，通过聚合多张地面照片重建一致的三维场景，并将其与俯视卫星图像对齐。Wrivinder 结合运动恢复结构重建、三维高斯喷溅、语义对齐以及基于单目深度的度量线索，生成稳定的天顶视角渲染结果，从而可直接与卫星上下文匹配，实现度量精确的相机地理定位。为了系统评估这一缺乏合适基准的任务，我们还发布了 MC-Sat 数据集，将多视图地面图像与不同户外环境中的地理配准卫星图块对应起来。Wrivinder 与 MC-Sat 一起，为研究在无配对监督条件下以几何为中心的跨视角对齐提供了首个完整基线与测试平台。在零样本实验中，Wrivinder 在密集和大范围场景上都实现了 30 米以内的定位精度，显示出基于几何聚合进行稳健地面到卫星定位的潜力。\n"
  },
  {
    "path": "abs/2602.15181.md",
    "content": "### Time-Archival Camera Virtualization for Sports and Visual Performances\n\nCamera virtualization -- an emerging solution to novel view synthesis -- holds transformative potential for visual entertainment, live performances, and sports broadcasting by enabling the generation of photorealistic images from novel viewpoints using images from a limited set of calibrated multiple static physical cameras. Despite recent advances, achieving spatially and temporally coherent and photorealistic rendering of dynamic scenes with efficient time-archival capabilities, particularly in fast-paced sports and stage performances, remains challenging for existing approaches. Recent methods based on 3D Gaussian Splatting (3DGS) for dynamic scenes could offer real-time view-synthesis results. Yet, they are hindered by their dependence on accurate 3D point clouds from the structure-from-motion method and their inability to handle large, non-rigid, rapid motions of different subjects (e.g., flips, jumps, articulations, sudden player-to-player transitions). Moreover, independent motions of multiple subjects can break the Gaussian-tracking assumptions commonly used in 4DGS, ST-GS, and other dynamic splatting variants. This paper advocates reconsidering a neural volume rendering formulation for camera virtualization and efficient time-archival capabilities, making it useful for sports broadcasting and related applications. By modeling a dynamic scene as rigid transformations across multiple synchronized camera views at a given time, our method performs neural representation learning, providing enhanced visual rendering quality at test time. A key contribution of our approach is its support for time-archival, i.e., users can revisit any past temporal instance of a dynamic scene and can perform novel view synthesis, enabling retrospective rendering for replay, analysis, and archival of live events, a functionality absent in existing neural rendering approaches and novel view synthesis...\n\n相机虚拟化是新视角合成中的新兴方案，能够利用少量已标定的静态物理相机图像，从新视角生成照片级真实图像，因此在视觉娱乐、现场演出和体育转播中具有变革性潜力。尽管近期研究取得了进展，但要在快速体育运动和舞台表演等场景下，同时实现具有空间和时间一致性的照片级动态场景渲染，并支持高效的时间归档能力，现有方法仍然困难重重。面向动态场景的三维高斯喷溅方法虽然有望提供实时视图合成效果，但它们受限于需要借助运动恢复结构得到精确三维点云，而且难以处理不同主体之间大幅、非刚体且快速的运动，例如翻转、跳跃、关节运动和运动员之间的突然切换。此外，多主体之间相互独立的运动还会破坏四维高斯喷溅等动态喷溅方法中常见的高斯跟踪假设。本文主张重新考虑基于神经体渲染的建模形式，以支持相机虚拟化和高效时间归档能力，使其更适用于体育转播及相关应用。我们的方法将动态场景建模为给定时刻多个同步相机视图之间的刚体变换，并在此基础上进行神经表示学习，从而在测试阶段获得更好的视觉渲染质量。我们方法的一个关键贡献是支持时间归档，也就是说，用户可以回看动态场景过去任意时刻，并在该时刻执行新视角合成，从而支持对现场事件的回放、分析和归档，而这是现有神经渲染和新视角合成方法所不具备的。\n"
  },
  {
    "path": "abs/2602.15355.md",
    "content": "### DAV-GSWT: Diffusion-Active-View Sampling for Data-Efficient Gaussian Splatting Wang Tiles\n\nThe emergence of 3D Gaussian Splatting has fundamentally redefined the capabilities of photorealistic neural rendering by enabling high-throughput synthesis of complex environments. While procedural methods like Wang Tiles have recently been integrated to facilitate the generation of expansive landscapes, these systems typically remain constrained by a reliance on densely sampled exemplar reconstructions. We present DAV-GSWT, a data-efficient framework that leverages diffusion priors and active view sampling to synthesize high-fidelity Gaussian Splatting Wang Tiles from minimal input observations. By integrating a hierarchical uncertainty quantification mechanism with generative diffusion models, our approach autonomously identifies the most informative viewpoints while hallucinating missing structural details to ensure seamless tile transitions. Experimental results indicate that our system significantly reduces the required data volume while maintaining the visual integrity and interactive performance necessary for large-scale virtual environments.\n\n三维高斯喷溅的出现大幅提升了照片级神经渲染复杂环境的能力。尽管 Wang Tiles 等程序化方法已被引入以生成大规模场景，但这类系统通常仍依赖密集采样的示例重建。我们提出 DAV-GSWT，一个数据高效框架，结合扩散先验与主动视角采样，仅凭极少输入观测即可合成高保真的 Gaussian Splatting Wang Tiles。通过将层次化不确定性量化机制与生成式扩散模型结合，该方法能够自动识别最有信息量的视角，并在缺失结构处进行合理补全，以保证平铺块之间无缝衔接。实验结果表明，该系统在显著减少数据需求的同时，仍能保持大规模虚拟环境所需的视觉完整性和交互性能。\n"
  },
  {
    "path": "abs/2602.15516.md",
    "content": "### Semantic-Guided 3D Gaussian Splatting for Transient Object Removal\n\nTransient objects in casual multi-view captures cause ghosting artifacts in 3D Gaussian Splatting (3DGS) reconstruction. Existing solutions relied on scene decomposition at significant memory cost or on motion-based heuristics that were vulnerable to parallax ambiguity. A semantic filtering framework was proposed for category-aware transient removal using vision-language models. CLIP similarity scores between rendered views and distractor text prompts were accumulated per-Gaussian across training iterations. Gaussians exceeding a calibrated threshold underwent opacity regularization and periodic pruning. Unlike motion-based approaches, semantic classification resolved parallax ambiguity by identifying object categories independently of motion patterns. Experiments on the RobustNeRF benchmark demonstrated consistent improvement in reconstruction quality over vanilla 3DGS across four sequences, while maintaining minimal memory overhead and real-time rendering performance. Threshold calibration and comparisons with baselines validated semantic guidance as a practical strategy for transient removal in scenarios with predictable distractor categories.\n\n在随手采集的多视图数据中，瞬时物体会在三维高斯喷溅重建中引入重影伪影。现有方法要么依赖内存开销较大的场景分解，要么依赖容易受视差歧义影响的基于运动的启发式策略。本文提出一种基于语义过滤的框架，利用视觉语言模型实现类别感知的瞬时物体去除。具体而言，我们在训练过程中按高斯基元累计渲染视图与干扰文本提示之间的 CLIP 相似度分数。超过校准阈值的高斯会受到不透明度正则化并被周期性剪枝。与基于运动的方法不同，语义分类能够独立于运动模式识别物体类别，从而消除视差歧义。在 RobustNeRF 基准上的实验表明，该方法在四个序列上都能相较原始 3DGS 稳定提升重建质量，同时保持极小的内存开销和实时渲染性能。阈值校准与基线比较进一步验证了语义引导是处理可预测干扰类别瞬时物体的一种实用策略。\n"
  },
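The bookkeeping behind the method above is lightweight enough to sketch. Below, `view_scores` stands in for CLIP similarities between a rendered view and the distractor prompts, already attributed to the Gaussians visible in that view (that attribution and the CLIP call itself are abstracted away; the momentum and thresholds are illustrative):

```python
import numpy as np

def accumulate_scores(scores, view_scores, visible_idx, momentum=0.9):
    '''Running per-Gaussian accumulation of distractor similarity
    across training iterations (exponential moving average).'''
    scores[visible_idx] = (momentum * scores[visible_idx]
                           + (1.0 - momentum) * view_scores)
    return scores

def regularize_and_prune(opacity, scores, thresh, min_opacity=0.05):
    '''Gaussians above the calibrated threshold get their opacity
    pushed down; once faded, they are pruned periodically.'''
    hot = scores > thresh
    opacity = np.where(hot, 0.5 * opacity, opacity)
    keep = ~(hot & (opacity < min_opacity))
    return opacity[keep], scores[keep], keep
```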
  {
    "path": "abs/2602.16713.md",
    "content": "### Three-dimensional Damage Visualization of Civil Structures via Gaussian Splatting-enabled Digital Twins\n\nRecent advancements in civil infrastructure inspections underscore the need for precise three-dimensional (3D) damage visualization on digital twins, transcending traditional 2D image-based damage identifications. Compared to conventional photogrammetric 3D reconstruction techniques, modern approaches such as Neural Radiance Field (NeRF) and Gaussian Splatting (GS) excel in scene representation, rendering quality, and handling featureless regions. Among them, GS stands out for its efficiency, leveraging discrete anisotropic 3D Gaussians to represent radiance fields, unlike NeRF's continuous implicit model. This study introduces a GS-enabled digital twin method tailored for effective 3D damage visualization. The method's key contributions include: 1) utilizing GS-based 3D reconstruction to visualize 2D damage segmentation results while reducing segmentation errors; 2) developing a multi-scale reconstruction strategy to balance efficiency and damage detail; 3) enabling digital twin updates as damage evolves over time. Demonstrated on an open-source synthetic dataset for post-earthquake inspections, the proposed approach offers a promising solution for comprehensive 3D damage visualization in civil infrastructure digital twins.\n\n近年来，土木基础设施检测的发展凸显了在数字孪生上进行精确三维损伤可视化的需求，这超越了传统基于二维图像的损伤识别方式。与传统摄影测量三维重建技术相比，Neural Radiance Field（NeRF）和 Gaussian Splatting（GS）等现代方法在场景表示、渲染质量以及处理低纹理区域方面表现更优。其中，GS 通过离散各向异性的三维高斯来表示辐射场，相较于 NeRF 的连续隐式模型，具有更高效率。本文提出一种面向高效三维损伤可视化的 GS 数字孪生方法。其主要贡献包括：1）利用基于 GS 的三维重建对二维损伤分割结果进行可视化，同时减少分割误差；2）提出多尺度重建策略，在效率与损伤细节之间取得平衡；3）支持随着损伤随时间演化而对数字孪生进行更新。该方法在一个面向震后检测的开源合成数据集上进行了验证，为土木基础设施数字孪生中的全面三维损伤可视化提供了一个有前景的解决方案。\n"
  },
  {
    "path": "abs/2602.17117.md",
    "content": "### i-PhysGaussian: Implicit Physical Simulation for 3D Gaussian Splatting\n\nPhysical simulation predicts future states of objects based on material properties and external loads, enabling blueprints for both Industry and Engineering to conduct risk management. Current 3D reconstruction-based simulators typically rely on explicit, step-wise updates, which are sensitive to step time and suffer from rapid accuracy degradation under complicated scenarios, such as high-stiffness materials or quasi-static movement. To address this, we introduce i-PhysGaussian, a framework that couples 3D Gaussian Splatting (3DGS) with an implicit Material Point Method (MPM) integrator. Unlike explicit methods, our solution obtains an end-of-step state by minimizing a momentum-balance residual through implicit Newton-type optimization with a GMRES solver. This formulation significantly reduces time-step sensitivity and ensures physical consistency. Our results demonstrate that i-PhysGaussian maintains stability at up to 20x larger time steps than explicit baselines, preserving structural coherence and smooth motion even in complex dynamic transitions.\n\n物理仿真通过材料属性和外部载荷预测物体未来状态，是工业与工程领域进行风险管理的重要基础。当前基于三维重建的模拟器通常依赖显式的逐步更新方式，对时间步长敏感，并且在高刚度材料或准静态运动等复杂场景下精度会迅速下降。为此，我们提出 i-PhysGaussian，一个将三维高斯喷溅与隐式物质点法积分器耦合的框架。不同于显式方法，我们通过基于 GMRES 求解器的隐式牛顿型优化，最小化动量平衡残差来求得每一步结束时的状态。该形式显著降低了对时间步长的敏感性，并保证了物理一致性。实验结果表明，i-PhysGaussian 在时间步长放大到显式基线 20 倍时仍能保持稳定，并在复杂动态过渡中维持结构连贯性和平滑运动。\n"
  },
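The solver pattern in this abstract (Newton iterations on a momentum-balance residual with GMRES inside) is standard enough to show on a toy system. The sketch below takes one implicit-Euler step for a stiff linear spring network, which stays stable at step sizes that would blow up an explicit integrator; it is an analogue of the described integrator, not the paper's MPM solver:

```python
import numpy as np
from scipy.sparse.linalg import gmres

def implicit_step(x0, v0, M, K, h, iters=5):
    '''Implicit Euler for M dv/dt = -K x: Newton on the residual
    r(v) = M (v - v0) + h K (x0 + h v), inner solves via GMRES.'''
    v = v0.copy()
    J = M + h * h * K                   # Jacobian (constant: f linear)
    for _ in range(iters):
        r = M @ (v - v0) + h * K @ (x0 + h * v)
        if np.linalg.norm(r) < 1e-10:
            break
        dv, _ = gmres(J, -r)            # Newton direction
        v = v + dv
    return x0 + h * v, v

# two masses with very stiff coupling; h = 0.05 would explode explicitly
M = np.eye(2)
K = np.array([[2.0e4, -1.0e4], [-1.0e4, 2.0e4]])
x, v = implicit_step(np.array([0.1, -0.1]), np.zeros(2), M, K, h=0.05)
```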
  {
    "path": "abs/2602.17124.md",
    "content": "### 3D Scene Rendering with Multimodal Gaussian Splatting\n\n3D scene reconstruction and rendering are core tasks in computer vision, with applications spanning industrial monitoring, robotics, and autonomous driving. Recent advances in 3D Gaussian Splatting (GS) and its variants have achieved impressive rendering fidelity while maintaining high computational and memory efficiency. However, conventional vision-based GS pipelines typically rely on a sufficient number of camera views to initialize the Gaussian primitives and train their parameters, typically incurring additional processing cost during initialization while falling short in conditions where visual cues are unreliable, such as adverse weather, low illumination, or partial occlusions. To cope with these challenges, and motivated by the robustness of radio-frequency (RF) signals to weather, lighting, and occlusions, we introduce a multimodal framework that integrates RF sensing, such as automotive radar, with GS-based rendering as a more efficient and robust alternative to vision-only GS rendering. The proposed approach enables efficient depth prediction from only sparse RF-based depth measurements, yielding a high-quality 3D point cloud for initializing Gaussian functions across diverse GS architectures. Numerical tests demonstrate the merits of judiciously incorporating RF sensing into GS pipelines, achieving high-fidelity 3D scene rendering driven by RF-informed structural accuracy.\n\n三维场景重建与渲染是计算机视觉中的核心任务，广泛应用于工业监测、机器人和自动驾驶。近期三维高斯喷溅及其变体在保持较高计算与内存效率的同时，实现了出色的渲染保真度。然而，传统基于视觉的高斯喷溅流程通常需要足够多的相机视角来初始化高斯基元并训练参数，这不仅增加初始化成本，也难以应对恶劣天气、低照度或部分遮挡等视觉线索不可靠的情况。为此，受射频信号对天气、光照和遮挡更鲁棒这一特性的启发，我们提出一种多模态框架，将汽车雷达等射频传感与基于高斯喷溅的渲染结合起来，作为纯视觉高斯喷溅渲染更高效、更稳健的替代方案。该方法仅凭稀疏的射频深度测量即可实现高效深度预测，并生成高质量三维点云，用于初始化多种高斯喷溅架构中的高斯函数。数值实验表明，将射频感知合理引入高斯喷溅流程，能够借助射频提供的结构精度，实现高保真的三维场景渲染。\n"
  },
  {
    "path": "abs/2602.17134.md",
    "content": "### B$^3$-Seg: Camera-Free, Training-Free 3DGS Segmentation via Analytic EIG and Beta-Bernoulli Bayesian Updates\n\nInteractive 3D Gaussian Splatting (3DGS) segmentation is essential for real-time editing of pre-reconstructed assets in film and game production. However, existing methods rely on predefined camera viewpoints, ground-truth labels, or costly retraining, making them impractical for low-latency use. We propose B$^3$-Seg (Beta-Bernoulli Bayesian Segmentation for 3DGS), a fast and theoretically grounded method for open-vocabulary 3DGS segmentation under camera-free and training-free conditions. Our approach reformulates segmentation as sequential Beta-Bernoulli Bayesian updates and actively selects the next view via analytic Expected Information Gain (EIG). This Bayesian formulation guarantees the adaptive monotonicity and submodularity of EIG, which produces a greedy $(1{-}1/e)$ approximation to the optimal view sampling policy. Experiments on multiple datasets show that B$^3$-Seg achieves competitive results to high-cost supervised methods while operating end-to-end segmentation within a few seconds. The results demonstrate that B$^3$-Seg enables practical, interactive 3DGS segmentation with provable information efficiency.\n\n交互式三维高斯喷溅分割对于影视和游戏制作中预重建资产的实时编辑至关重要。然而，现有方法依赖预定义相机视角、真实标签或高开销再训练，难以满足低延迟需求。我们提出 B3-Seg，一种在无相机、无训练条件下进行开放词汇 3DGS 分割的快速且理论基础明确的方法。该方法将分割问题重写为序列式 Beta-Bernoulli 贝叶斯更新，并通过解析形式的期望信息增益主动选择下一视角。这一贝叶斯表述保证了期望信息增益的自适应单调性与次模性，因此可得到对最优视角采样策略的贪心近似，近似比为 1 减去 1 除以 e。多个数据集上的实验表明，B3-Seg 在端到端分割仅需数秒的情况下，能够获得与高成本监督方法相当的结果，说明它能够以可证明的信息效率实现实用的交互式 3DGS 分割。\n"
  },
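The analytic EIG the B$^3$-Seg abstract refers to has a closed form for the Beta-Bernoulli model: the mutual information between the next binary observation and the per-Gaussian membership parameter. A sketch, assuming each view contributes one Bernoulli hit per Gaussian (the view-to-observation mapping is abstracted away):

```python
import numpy as np
from scipy.special import digamma

def bern_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def beta_bernoulli_eig(a, b):
    '''EIG = H(predictive) - E_theta[H(Bernoulli(theta))] under a
    Beta(a, b) posterior; the expectation has a digamma closed form.
    For a = b = 1 this evaluates to ln 2 - 1/2, as expected.'''
    mu = a / (a + b)
    exp_cond = digamma(a + b + 1) - (a * digamma(a + 1)
                                     + b * digamma(b + 1)) / (a + b)
    return bern_entropy(mu) - exp_cond

def posterior_update(a, b, y):
    '''Sequential Beta-Bernoulli update from a binary observation y.'''
    return a + y, b + (1 - y)
```

Both functions vectorize over per-Gaussian arrays of (a, b), so scoring a candidate view amounts to summing `beta_bernoulli_eig` over the Gaussians that view covers, which is what makes the greedy selection cheap.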
  {
    "path": "abs/2602.17182.md",
    "content": "### NRGS-SLAM: Monocular Non-Rigid SLAM for Endoscopy via Deformation-Aware 3D Gaussian Splatting\n\nVisual simultaneous localization and mapping (V-SLAM) is a fundamental capability for autonomous perception and navigation. However, endoscopic scenes violate the rigidity assumption due to persistent soft-tissue deformations, creating a strong coupling ambiguity between camera ego-motion and intrinsic deformation. Although recent monocular non-rigid SLAM methods have made notable progress, they often lack effective decoupling mechanisms and rely on sparse or low-fidelity scene representations, which leads to tracking drift and limited reconstruction quality. To address these limitations, we propose NRGS-SLAM, a monocular non-rigid SLAM system for endoscopy based on 3D Gaussian Splatting. To resolve the coupling ambiguity, we introduce a deformation-aware 3D Gaussian map that augments each Gaussian primitive with a learnable deformation probability, optimized via a Bayesian self-supervision strategy without requiring external non-rigidity labels. Building on this representation, we design a deformable tracking module that performs robust coarse-to-fine pose estimation by prioritizing low-deformation regions, followed by efficient per-frame deformation updates. A carefully designed deformable mapping module progressively expands and refines the map, balancing representational capacity and computational efficiency. In addition, a unified robust geometric loss incorporates external geometric priors to mitigate the inherent ill-posedness of monocular non-rigid SLAM. Extensive experiments on multiple public endoscopic datasets demonstrate that NRGS-SLAM achieves more accurate camera pose estimation (up to 50\\% reduction in RMSE) and higher-quality photo-realistic reconstructions than state-of-the-art methods. Comprehensive ablation studies further validate the effectiveness of our key design choices. Source code will be publicly available upon paper acceptance.\n\n视觉同步定位与建图是自主感知和导航的基础能力。然而，内窥镜场景由于软组织持续形变而违反了刚性假设，导致相机自运动与场景内在形变之间存在强耦合歧义。尽管近期单目非刚性 SLAM 取得了进展，但其往往缺乏有效解耦机制，并依赖稀疏或低保真的场景表示，因此容易产生跟踪漂移并限制重建质量。为此，我们提出 NRGS-SLAM，一个基于三维高斯喷溅的内窥镜单目非刚性 SLAM 系统。为了解决耦合歧义，我们引入形变感知三维高斯地图，为每个高斯基元增加可学习的形变概率，并通过贝叶斯自监督策略进行优化，无需额外非刚性标签。在此表示基础上，我们设计了可形变跟踪模块，优先利用低形变区域执行由粗到细的稳健位姿估计，并进行高效逐帧形变更新。精心设计的可形变建图模块则逐步扩展和细化地图，在表示能力与计算效率之间取得平衡。此外，我们还引入统一的鲁棒几何损失，结合外部几何先验，以缓解单目非刚性 SLAM 的固有欠定性。在多个公开内窥镜数据集上的大量实验表明，NRGS-SLAM 相比现有最先进方法可实现更准确的相机位姿估计，RMSE 最多降低 50%，并生成更高质量的照片级真实重建。消融实验进一步验证了关键设计的有效性。代码将在论文录用后公开。\n"
  },
  {
    "path": "abs/2602.17473.md",
    "content": "### 4D Monocular Surgical Reconstruction under Arbitrary Camera Motions\n\nReconstructing deformable surgical scenes from endoscopic videos is challenging and clinically important. Recent state-of-the-art methods based on implicit neural representations or 3D Gaussian splatting have made notable progress. However, most are designed for deformable scenes with fixed endoscope viewpoints and rely on stereo depth priors or accurate structure-from-motion for initialization and optimization, limiting their ability to handle monocular sequences with large camera motion in real clinical settings. To address this, we propose Local-EndoGS, a high-quality 4D reconstruction framework for monocular endoscopic sequences with arbitrary camera motion. Local-EndoGS introduces a progressive, window-based global representation that allocates local deformable scene models to each observed window, enabling scalability to long sequences with substantial motion. To overcome unreliable initialization without stereo depth or accurate structure-from-motion, we design a coarse-to-fine strategy integrating multi-view geometry, cross-window information, and monocular depth priors, providing a robust foundation for optimization. We further incorporate long-range 2D pixel trajectory constraints and physical motion priors to improve deformation plausibility. Experiments on three public endoscopic datasets with deformable scenes and varying camera motions show that Local-EndoGS consistently outperforms state-of-the-art methods in appearance quality and geometry. Ablation studies validate the effectiveness of our key designs. Code will be released upon acceptance at: https://github.com/IRMVLab/Local-EndoGS.\n\n从内窥镜视频中重建可形变的手术场景既具有挑战性，也具有重要临床价值。近期基于隐式神经表示或三维高斯喷溅的方法虽已取得进展，但大多数方法都面向固定内窥镜视角的可形变场景，并依赖双目深度先验或准确的运动恢复结构进行初始化与优化，因此难以处理真实临床环境中存在大幅相机运动的单目序列。为此，我们提出 Local-EndoGS，一个面向任意相机运动单目内窥镜序列的高质量四维重建框架。Local-EndoGS 引入渐进式、基于窗口的全局表示，为每个观测窗口分配局部可形变场景模型，从而能够扩展到具有显著运动的长序列。为解决缺乏双目深度或精确运动恢复结构时初始化不可靠的问题，我们设计了一个由粗到细的策略，将多视图几何、跨窗口信息和单目深度先验结合起来，为优化提供稳健基础。我们还进一步引入长程二维像素轨迹约束和物理运动先验，以提高形变的合理性。在三个包含不同相机运动和可形变场景的公开内窥镜数据集上的实验表明，Local-EndoGS 在外观质量和几何重建方面都持续优于现有最先进方法。消融实验验证了关键设计的有效性。代码将在录用后发布于 https://github.com/IRMVLab/Local-EndoGS。\n"
  },
  {
    "path": "abs/2602.18322.md",
    "content": "### Unifying Color and Lightness Correction with View-Adaptive Curve Adjustment for Robust 3D Novel View Synthesis\n\nHigh-quality image acquisition in real-world environments remains challenging due to complex illumination variations and inherent limitations of camera imaging pipelines. These issues are exacerbated in multi-view capture, where differences in lighting, sensor responses, and image signal processor (ISP) configurations introduce photometric and chromatic inconsistencies that violate the assumptions of photometric consistency underlying modern 3D novel view synthesis (NVS) methods, including Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), leading to degraded reconstruction and rendering quality. We propose Luminance-GS++, a 3DGS-based framework for robust NVS under diverse illumination conditions. Our method combines a globally view-adaptive lightness adjustment with a local pixel-wise residual refinement for precise color correction. We further design unsupervised objectives that jointly enforce lightness correction and multi-view geometric and photometric consistency. Extensive experiments demonstrate state-of-the-art performance across challenging scenarios, including low-light, overexposure, and complex luminance and chromatic variations. Unlike prior approaches that modify the underlying representation, our method preserves the explicit 3DGS formulation, improving reconstruction fidelity while maintaining real-time rendering efficiency.\n\n由于复杂光照变化以及相机成像流程本身的限制，在真实环境中获取高质量图像仍然困难。在多视图采集场景下，这些问题会进一步放大，因为不同的光照条件、传感器响应和图像信号处理配置会引入光度与色彩不一致，破坏包括神经辐射场和三维高斯喷溅在内的现代三维新视角合成方法所依赖的光度一致性假设，从而降低重建和渲染质量。我们提出 Luminance-GS++，一个面向复杂光照条件下稳健新视角合成的 3DGS 框架。该方法结合全局的视角自适应亮度调整和局部逐像素残差细化，以实现精确色彩校正。我们还设计了无监督目标，使亮度校正与多视图几何和光度一致性联合成立。大量实验表明，在低照度、过曝以及复杂亮度和色彩变化等具有挑战性的场景中，该方法达到了最先进性能。与需要修改底层表示的方法不同，我们的方法保留了显式 3DGS 形式，在提高重建保真度的同时维持了实时渲染效率。\n"
  },
  {
    "path": "abs/2602.18830.md",
    "content": "### Spatial-Temporal State Propagation Autoregressive Model for 4D Object Generation\n\nGenerating high-quality 4D objects with spatial-temporal consistency is still formidable. Existing diffusion-based methods often struggle with spatial-temporal inconsistency, as they fail to leverage outputs from all previous timesteps to guide the generation at the current timestep. Therefore, we propose a Spatial-Temporal State Propagation AutoRegressive Model (4DSTAR), which generates 4D objects maintaining temporal-spatial consistency. 4DSTAR formulates the generation problem as the prediction of tokens that represent the 4D object. It consists of two key components: (1) The dynamic spatial-temporal state propagation autoregressive model (STAR) is proposed, which achieves spatial-temporal consistent generation. Unlike standard autoregressive models, STAR divides prediction tokens into groups based on timesteps. It models long-term dependencies by propagating spatial-temporal states from previous groups and utilizes these dependencies to guide generation at the next timestep. To this end, a spatial-temporal container is proposed, which dynamically updating the effective spatial-temporal state features from all historical groups, then updated features serve as conditional features to guide the prediction of the next token group. (2) The 4D VQ-VAE is proposed, which implicitly encodes the 4D structure into discrete space and decodes the discrete tokens predicted by STAR into temporally coherent dynamic 3D Gaussians. Experiments demonstrate that 4DSTAR generates spatial-temporal consistent 4D objects, and achieves performance competitive with diffusion models.\n\n生成同时具有空间和时间一致性的高质量四维对象仍然十分困难。现有基于扩散的方法常常难以保证时空一致性，因为它们无法充分利用所有先前时间步的输出去指导当前时间步的生成。为此，我们提出时空状态传播自回归模型 4DSTAR，用于生成保持时空一致性的四维对象。4DSTAR 将生成问题表述为预测表示四维对象的 token，并包含两个关键组成部分。第一，动态时空状态传播自回归模型 STAR 用于实现时空一致的生成。不同于标准自回归模型，STAR 按时间步对预测 token 进行分组，通过传播前面各组的时空状态来建模长程依赖，并利用这些依赖指导下一时间步的生成。为此，我们提出一个时空容器，能够动态更新来自所有历史组的有效时空状态特征，并将更新后的特征作为条件来指导下一组 token 的预测。第二，我们提出四维 VQ-VAE，用于将四维结构隐式编码到离散空间，并将 STAR 预测的离散 token 解码为具有时间一致性的动态三维高斯。实验表明，4DSTAR 能生成具有时空一致性的四维对象，并取得与扩散模型相当的性能。\n"
  },
  {
    "path": "abs/2602.19323.md",
    "content": "### DefenseSplat: Enhancing the Robustness of 3D Gaussian Splatting via Frequency-Aware Filtering\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for real-time and high-fidelity 3D reconstruction from posed images. However, recent studies reveal its vulnerability to adversarial corruptions in input views, where imperceptible yet consistent perturbations can drastically degrade rendering quality, increase training and rendering time, and inflate memory usage, even leading to server denial-of-service. In our work, to mitigate this issue, we begin by analyzing the distinct behaviors of adversarial perturbations in the low- and high-frequency components of input images using wavelet transforms. Based on this observation, we design a simple yet effective frequency-aware defense strategy that reconstructs training views by filtering high-frequency noise while preserving low-frequency content. This approach effectively suppresses adversarial artifacts while maintaining the authenticity of the original scene. Notably, it does not significantly impair training on clean data, achieving a desirable trade-off between robustness and performance on clean inputs. Through extensive experiments under a wide range of attack intensities on multiple benchmarks, we demonstrate that our method substantially enhances the robustness of 3DGS without access to clean ground-truth supervision. By highlighting and addressing the overlooked vulnerabilities of 3D Gaussian Splatting, our work paves the way for more robust and secure 3D reconstructions.\n\n三维高斯喷溅已经成为利用带位姿图像进行实时高保真三维重建的强大范式。然而，近期研究表明，它对输入视图中的对抗性扰动十分脆弱，即使这些扰动难以察觉，只要在多个视图中保持一致，就可能显著降低渲染质量、增加训练和渲染时间并抬高内存占用，甚至导致服务器拒绝服务。为缓解这一问题，我们首先利用小波变换分析了对抗扰动在输入图像低频和高频成分中的不同表现。基于这一观察，我们设计了一种简单而有效的频率感知防御策略，通过滤除高频噪声并保留低频内容来重建训练视图。该方法能够有效抑制对抗伪影，同时保持原始场景的真实性。值得注意的是，它不会明显削弱模型在干净数据上的训练效果，从而在鲁棒性与干净输入性能之间取得了理想平衡。我们在多个基准、不同攻击强度下进行了大量实验，证明该方法在无需干净真实监督的前提下，能够显著提升 3DGS 的鲁棒性。通过揭示并解决三维高斯喷溅中被忽视的脆弱性，这项工作为更稳健、更安全的三维重建铺平了道路。\n"
  },
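The DefenseSplat-style defense itself reduces to a wavelet low-pass on each training view. A minimal sketch with PyWavelets (the wavelet family, decomposition depth, and how many coarse detail bands to keep are knobs the paper would tune; the per-band analysis of adversarial energy is not reproduced here):

```python
import numpy as np
import pywt

def frequency_filter(img, wavelet='db2', level=2, keep_detail=0):
    '''Decompose a view, zero all but the `keep_detail` coarsest
    detail subbands, and reconstruct: high-frequency adversarial
    noise is suppressed while low-frequency content survives.'''
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    for i in range(1 + keep_detail, len(coeffs)):  # finest bands last
        coeffs[i] = tuple(np.zeros_like(c) for c in coeffs[i])
    return pywt.waverec2(coeffs, wavelet)

clean_view = frequency_filter(np.random.rand(128, 128))
```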
  {
    "path": "abs/2602.19753.md",
    "content": "### RAP: Fast Feedforward Rendering-Free Attribute-Guided Primitive Importance Score Prediction for Efficient 3D Gaussian Splatting Processing\n\n3D Gaussian Splatting (3DGS) has emerged as a leading technology for high-quality 3D scene reconstruction. However, the iterative refinement and densification process leads to the generation of a large number of primitives, each contributing to the reconstruction to a substantially different extent. Estimating primitive importance is thus crucial, both for removing redundancy during reconstruction and for enabling efficient compression and transmission. Existing methods typically rely on rendering-based analyses, where each primitive is evaluated through its contribution across multiple camera viewpoints. However, such methods are sensitive to the number and selection of views, rely on specialized differentiable rasterizers, and have long calculation times that grow linearly with view count, making them difficult to integrate as plug-and-play modules and limiting scalability and generalization. To address these issues, we propose RAP, a fast feedforward rendering-free attribute-guided method for efficient importance score prediction in 3DGS. RAP infers primitive significance directly from intrinsic Gaussian attributes and local neighborhood statistics, avoiding rendering-based or visibility-dependent computations. A compact MLP predicts per-primitive importance scores using rendering loss, pruning-aware loss, and significance distribution regularization. After training on a small set of scenes, RAP generalizes effectively to unseen data and can be seamlessly integrated into reconstruction, compression, and transmission pipelines. Our code is publicly available at https://github.com/yyyykf/RAP.\n\n三维高斯喷溅已成为高质量三维场景重建的重要技术。然而，迭代细化和增密过程会生成大量基元，而这些基元对重建的贡献程度差异很大。因此，估计基元重要性十分关键，不仅有助于在重建过程中去除冗余，也能支持高效压缩与传输。现有方法通常依赖基于渲染的分析，通过考察每个基元在多个相机视角下的贡献来估计其重要性。然而，这类方法对视图数量和选择高度敏感，依赖专门的可微光栅化器，而且计算时间随视图数量线性增长，难以作为即插即用模块集成，也限制了可扩展性和泛化性。为了解决这些问题，我们提出 RAP，一种快速、前馈式、无需渲染、基于属性引导的 3DGS 基元重要性预测方法。RAP 直接根据高斯本身的属性和局部邻域统计推断基元重要性，避免依赖渲染或可见性计算。一个紧凑的多层感知机利用渲染损失、剪枝感知损失和重要性分布正则化来预测逐基元的重要性分数。在少量场景上训练后，RAP 能有效泛化到未见数据，并可无缝接入重建、压缩和传输流程。代码见 https://github.com/yyyykf/RAP。\n"
  },
  {
    "path": "abs/2602.19916.md",
    "content": "### Augmented Radiance Field: A General Framework for Enhanced Gaussian Splatting\n\nDue to the real-time rendering performance, 3D Gaussian Splatting (3DGS) has emerged as the leading method for radiance field reconstruction. However, its reliance on spherical harmonics for color encoding inherently limits its ability to separate diffuse and specular components, making it challenging to accurately represent complex reflections. To address this, we propose a novel enhanced Gaussian kernel that explicitly models specular effects through view-dependent opacity. Meanwhile, we introduce an error-driven compensation strategy to improve rendering quality in existing 3DGS scenes. Our method begins with 2D Gaussian initialization and then adaptively inserts and optimizes enhanced Gaussian kernels, ultimately producing an augmented radiance field. Experiments demonstrate that our method not only surpasses state-of-the-art NeRF methods in rendering performance but also achieves greater parameter efficiency. Project page at: https://xiaoxinyyx.github.io/augs.\n\n由于具备实时渲染能力，三维高斯喷溅已成为辐射场重建的主流方法。然而，它依赖球谐函数进行颜色编码，这从根本上限制了其分离漫反射与镜面反射成分的能力，因此难以准确表示复杂反射。为此，我们提出一种新的增强型高斯核，通过视角相关的不透明度显式建模镜面效应。同时，我们引入一种由误差驱动的补偿策略，以提高现有 3DGS 场景的渲染质量。该方法从二维高斯初始化开始，然后自适应地插入并优化增强型高斯核，最终形成增强辐射场。实验表明，我们的方法不仅在渲染性能上超过现有最先进的 NeRF 方法，而且在参数效率上更具优势。项目页面见 https://xiaoxinyyx.github.io/augs。\n"
  },
  {
    "path": "abs/2602.20160.md",
    "content": "### tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction\n\nWe propose tttLRM, a novel large 3D reconstruction model that leverages a Test-Time Training (TTT) layer to enable long-context, autoregressive 3D reconstruction with linear computational complexity, further scaling the model's capability. Our framework efficiently compresses multiple image observations into the fast weights of the TTT layer, forming an implicit 3D representation in the latent space that can be decoded into various explicit formats, such as Gaussian Splats (GS) for downstream applications. The online learning variant of our model supports progressive 3D reconstruction and refinement from streaming observations. We demonstrate that pretraining on novel view synthesis tasks effectively transfers to explicit 3D modeling, resulting in improved reconstruction quality and faster convergence. Extensive experiments show that our method achieves superior performance in feedforward 3D Gaussian reconstruction compared to state-of-the-art approaches on both objects and scenes.\n\n我们提出 tttLRM，一个新型的大规模三维重建模型，通过引入测试时训练层，实现具有线性计算复杂度的长上下文自回归三维重建，从而进一步扩展模型能力。该框架能够高效地将多张图像观测压缩到测试时训练层的快速权重中，在潜在空间中形成隐式三维表示，并可解码为多种显式格式，例如适用于下游任务的 Gaussian Splats。我们模型的在线学习变体还支持从流式观测中进行渐进式三维重建与细化。实验表明，在新视角合成任务上的预训练能够有效迁移到显式三维建模，带来更好的重建质量和更快的收敛速度。大量实验显示，在物体和场景两类任务上，我们的方法在前馈式三维高斯重建方面都优于现有最先进方法。\n"
  },
  {
    "path": "abs/2602.20342.md",
    "content": "### Large-scale Photorealistic Outdoor 3D Scene Reconstruction from UAV Imagery Using Gaussian Splatting Techniques\n\nIn this study, we present an end-to-end pipeline capable of converting drone-captured video streams into high-fidelity 3D reconstructions with minimal latency. Unmanned aerial vehicles (UAVs) are extensively used in aerial real-time perception applications. Moreover, recent advances in 3D Gaussian Splatting (3DGS) have demonstrated significant potential for real-time neural rendering. However, their integration into end-to-end UAV-based reconstruction and visualization systems remains underexplored. Our goal is to propose an efficient architecture that combines live video acquisition via RTMP streaming, synchronized sensor fusion, camera pose estimation, and 3DGS optimization, achieving continuous model updates and low-latency deployment within interactive visualization environments that supports immersive augmented and virtual reality (AR/VR) applications. Experimental results demonstrate that the proposed method achieves competitive visual fidelity, while delivering significantly higher rendering performance and substantially reduced end-to-end latency, compared to NeRF-based approaches. Reconstruction quality remains within 4-7\\% of high-fidelity offline references, confirming the suitability of the proposed system for real-time, scalable augmented perception from aerial platforms.\n\n本文提出一条端到端流程，能够将无人机采集的视频流以较低延迟转换为高保真的三维重建。无人机已经被广泛应用于空中实时感知场景，而三维高斯喷溅的最新进展也显示出其在实时神经渲染中的巨大潜力。然而，如何将其整合进端到端的无人机重建与可视化系统仍缺乏充分研究。我们的目标是提出一种高效架构，将基于 RTMP 的实时视频采集、同步传感器融合、相机位姿估计和 3DGS 优化结合起来，在支持沉浸式增强现实和虚拟现实应用的交互式可视化环境中，实现模型持续更新和低延迟部署。实验结果表明，与基于 NeRF 的方法相比，该系统在保持有竞争力视觉保真度的同时，能够提供显著更高的渲染性能和更低的端到端延迟。其重建质量与高保真离线参考结果相比仅相差 4% 到 7%，说明该系统适合用于来自空中平台的实时、可扩展增强感知。\n"
  },
  {
    "path": "abs/2602.20363.md",
    "content": "### Aesthetic Camera Viewpoint Suggestion with 3D Aesthetic Field\n\nThe aesthetic quality of a scene depends strongly on camera viewpoint. Existing approaches for aesthetic viewpoint suggestion are either single-view adjustments, predicting limited camera adjustments from a single image without understanding scene geometry, or 3D exploration approaches, which rely on dense captures or prebuilt 3D environments coupled with costly reinforcement learning (RL) searches. In this work, we introduce the notion of 3D aesthetic field that enables geometry-grounded aesthetic reasoning in 3D with sparse captures, allowing efficient viewpoint suggestions in contrast to costly RL searches. We opt to learn this 3D aesthetic field using a feedforward 3D Gaussian Splatting network that distills high-level aesthetic knowledge from a pretrained 2D aesthetic model into 3D space, enabling aesthetic prediction for novel viewpoints from only sparse input views. Building on this field, we propose a two-stage search pipeline that combines coarse viewpoint sampling with gradient-based refinement, efficiently identifying aesthetically appealing viewpoints without dense captures or RL exploration. Extensive experiments show that our method consistently suggests viewpoints with superior framing and composition compared to existing approaches, establishing a new direction toward 3D-aware aesthetic modeling.\n\n场景的审美质量与相机视角密切相关。现有审美视角推荐方法要么只在单张图像上预测有限的相机调整，缺乏对场景几何的理解；要么依赖稠密采集或预建三维环境，并配合代价高昂的强化学习搜索。本文提出三维审美场这一概念，使得在稀疏采集条件下也能进行有几何依据的三维审美推理，从而高效推荐视角，而无需昂贵的强化学习搜索。我们采用前馈式三维高斯喷溅网络来学习这一三维审美场，将预训练二维审美模型中的高层审美知识蒸馏到三维空间中，因此只需稀疏输入视图，就能预测新视角的审美质量。在此基础上，我们设计了一个两阶段搜索流程，将粗粒度视角采样与基于梯度的细化结合起来，从而在无需稠密采集或强化学习探索的情况下，高效找到更具审美吸引力的视角。大量实验表明，我们的方法在构图和取景上持续优于现有方法，为面向三维感知的审美建模提供了新方向。\n"
  },
  {
    "path": "abs/2602.20556.md",
    "content": "### WildGHand: Learning Anti-Perturbation Gaussian Hand Avatars from Monocular In-the-Wild Videos\n\nDespite recent progress in 3D hand reconstruction from monocular videos, most existing methods rely on data captured in well-controlled environments and therefore degrade in real-world settings with severe perturbations, such as hand-object interactions, extreme poses, illumination changes, and motion blur. To tackle these issues, we introduce WildGHand, an optimization-based framework that enables self-adaptive 3D Gaussian splatting on in-the-wild videos and produces high-fidelity hand avatars. WildGHand incorporates two key components: (i) a dynamic perturbation disentanglement module that explicitly represents perturbations as time-varying biases on 3D Gaussian attributes during optimization, and (ii) a perturbation-aware optimization strategy that generates per-frame anisotropic weighted masks to guide optimization. Together, these components allow the framework to identify and suppress perturbations across both spatial and temporal dimensions. We further curate a dataset of monocular hand videos captured under diverse perturbations to benchmark in-the-wild hand avatar reconstruction. Extensive experiments on this dataset and two public datasets demonstrate that WildGHand achieves state-of-the-art performance and substantially improves over its base model across multiple metrics (e.g., up to a $15.8\\%$ relative gain in PSNR and a $23.1\\%$ relative reduction in LPIPS). Our implementation and dataset are available at https://github.com/XuanHuang0/WildGHand.\n\n尽管单目视频三维手部重建近期取得了进展，但大多数现有方法依赖受控环境下采集的数据，因此在真实世界中遇到手物交互、极端姿态、光照变化和运动模糊等强扰动时，性能会明显下降。为解决这些问题，我们提出 WildGHand，一个基于优化的框架，可在自然视频上进行自适应三维高斯喷溅建模，生成高保真手部化身。WildGHand 包含两个关键组件：一是动态扰动解耦模块，在优化过程中将扰动显式表示为施加在三维高斯属性上的时变偏置；二是扰动感知优化策略，通过生成逐帧各向异性加权掩码来引导优化。二者共同作用，使框架能够在空间和时间两个维度上识别并抑制扰动。我们还构建了一个涵盖多种扰动条件的单目手部视频数据集，用于评估自然场景手部化身重建。该数据集及两个公开数据集上的大量实验表明，WildGHand 达到了最先进性能，并在多项指标上显著优于其基础模型，例如 PSNR 相对提升最高可达 15.8%，LPIPS 相对降低最高可达 23.1%。代码和数据集见 https://github.com/XuanHuang0/WildGHand。\n"
  },
  {
    "path": "abs/2602.20718.md",
    "content": "### Monocular Endoscopic Tissue 3D Reconstruction with Multi-Level Geometry Regularization\n\nReconstructing deformable endoscopic tissues is crucial for achieving robot-assisted surgery. However, 3D Gaussian Splatting-based approaches encounter challenges in achieving consistent tissue surface reconstruction, while existing NeRF-based methods lack real-time rendering capabilities. In pursuit of both smooth deformable surfaces and real-time rendering, we introduce a novel approach based on 3D Gaussian Splatting. Specifically, we introduce surface-aware reconstruction, initially employing a Sign Distance Field-based method to construct a mesh, subsequently utilizing this mesh to constrain the Gaussian Splatting reconstruction process. Furthermore, to ensure the generation of physically plausible deformations, we incorporate local rigidity and global non-rigidity restrictions to guide Gaussian deformation, tailored for the highly deformable nature of soft endoscopic tissue. Based on 3D Gaussian Splatting, our proposed method delivers a fast rendering process and smooth surface appearances. Quantitative and qualitative analysis against alternative methodologies shows that our approach achieves solid reconstruction quality in both textures and geometries.\n\n对可形变内窥镜组织进行三维重建，是实现机器人辅助手术的重要基础。然而，基于三维高斯喷溅的方法在获得一致的组织表面重建方面仍存在困难，而现有基于 NeRF 的方法又缺乏实时渲染能力。为了同时获得平滑的可形变表面和实时渲染能力，我们提出一种基于三维高斯喷溅的新方法。具体而言，我们引入表面感知重建，首先采用基于符号距离场的方法构建网格，再利用该网格约束高斯喷溅重建过程。进一步地，为确保生成的形变符合物理规律，我们针对内窥镜软组织高度可形变的特性，引入局部刚性和全局非刚性约束来引导高斯形变。基于三维高斯喷溅，我们的方法能够实现快速渲染和平滑表面外观。与其他方法的定量和定性分析表明，该方法在纹理和几何两方面都能获得扎实的重建质量。\n"
  },
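The local-rigidity idea in this abstract is the classic as-rigid-as-possible-style penalty: distances between nearby Gaussians should survive deformation. A small sketch of such a constraint, assuming precomputed neighbor lists (not the paper's exact formulation, which pairs it with a global non-rigidity term):

```python
import numpy as np

def local_rigidity_loss(pos_t, pos_0, neighbors):
    '''Penalize changes in distances between each Gaussian and its
    neighbors between the rest state `pos_0` and the deformed state
    `pos_t`; `neighbors[i]` holds neighbor indices of Gaussian i.'''
    loss = 0.0
    for i, nbrs in enumerate(neighbors):
        d_t = np.linalg.norm(pos_t[nbrs] - pos_t[i], axis=1)
        d_0 = np.linalg.norm(pos_0[nbrs] - pos_0[i], axis=1)
        loss += np.sum((d_t - d_0) ** 2)
    return loss / len(neighbors)
```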
  {
    "path": "abs/2602.20807.md",
    "content": "### RU4D-SLAM: Reweighting Uncertainty in Gaussian Splatting SLAM for 4D Scene Reconstruction\n\nCombining 3D Gaussian splatting with Simultaneous Localization and Mapping (SLAM) has gained popularity as it enables continuous 3D environment reconstruction during motion. However, existing methods struggle in dynamic environments, particularly moving objects complicate 3D reconstruction and, in turn, hinder reliable tracking. The emergence of 4D reconstruction, especially 4D Gaussian splatting, offers a promising direction for addressing these challenges, yet its potential for 4D-aware SLAM remains largely underexplored. Along this direction, we propose a robust and efficient framework, namely Reweighting Uncertainty in Gaussian Splatting SLAM (RU4D-SLAM) for 4D scene reconstruction, that introduces temporal factors into spatial 3D representation while incorporating uncertainty-aware perception of scene changes, blurred image synthesis, and dynamic scene reconstruction. We enhance dynamic scene representation by integrating motion blur rendering, and improve uncertainty-aware tracking by extending per-pixel uncertainty modeling, which is originally designed for static scenarios, to handle blurred images. Furthermore, we propose a semantic-guided reweighting mechanism for per-pixel uncertainty estimation in dynamic scenes, and introduce a learnable opacity weight to support adaptive 4D mapping. Extensive experiments on standard benchmarks demonstrate that our method substantially outperforms state-of-the-art approaches in both trajectory accuracy and 4D scene reconstruction, particularly in dynamic environments with moving objects and low-quality inputs. Code available: https://ru4d-slam.github.io\n\n将三维高斯喷溅与同步定位与建图结合，因能够在运动过程中连续重建三维环境而受到广泛关注。然而，现有方法在动态环境中仍然表现不佳，尤其是运动物体会使三维重建更加复杂，并进一步妨碍稳定跟踪。四维重建，尤其是四维高斯喷溅的出现，为解决这些问题提供了有前景的方向，但其在面向四维感知的 SLAM 中的潜力仍缺乏深入探索。沿着这一方向，我们提出 RU4D-SLAM，一个面向四维场景重建的稳健高效框架。该方法在空间三维表示中引入时间因素，同时融合不确定性感知的场景变化建模、模糊图像合成和动态场景重建。我们通过引入运动模糊渲染增强动态场景表示，并将原本面向静态场景设计的逐像素不确定性建模扩展到模糊图像，从而提升不确定性感知跟踪能力。此外，我们提出语义引导的逐像素不确定性重加权机制，并引入可学习的不透明度权重来支持自适应四维建图。在标准基准上的大量实验表明，RU4D-SLAM 在轨迹精度和四维场景重建方面都显著优于现有最先进方法，尤其是在存在运动物体和低质量输入的动态环境中。代码见 https://ru4d-slam.github.io。\n"
  },
  {
    "path": "abs/2602.20933.md",
    "content": "### Dropping Anchor and Spherical Harmonics for Sparse-view Gaussian Splatting\n\nRecent 3D Gaussian Splatting (3DGS) Dropout methods address overfitting under sparse-view conditions by randomly nullifying Gaussian opacities. However, we identify a neighbor compensation effect in these approaches: dropped Gaussians are often compensated by their neighbors, weakening the intended regularization. Moreover, these methods overlook the contribution of high-degree spherical harmonic coefficients (SH) to overfitting. To address these issues, we propose DropAnSH-GS, a novel anchor-based Dropout strategy. Rather than dropping Gaussians independently, our method randomly selects certain Gaussians as anchors and simultaneously removes their spatial neighbors. This effectively disrupts local redundancies near anchors and encourages the model to learn more robust, globally informed representations. Furthermore, we extend the Dropout to color attributes by randomly dropping higher-degree SH to concentrate appearance information in lower-degree SH. This strategy further mitigates overfitting and enables flexible post-training model compression via SH truncation. Experimental results demonstrate that DropAnSH-GS substantially outperforms existing Dropout methods with negligible computational overhead, and can be readily integrated into various 3DGS variants to enhance their performances. Project Website: https://sk-fun.fun/DropAnSH-GS\n\n近期面向稀疏视角场景的三维高斯喷溅 Dropout 方法，通常通过随机将高斯不透明度置零来缓解过拟合。然而，我们发现这些方法存在邻域补偿效应，即被丢弃的高斯常会被邻近高斯所补偿，从而削弱原本的正则化效果。此外，这类方法还忽略了高阶球谐系数在过拟合中的作用。为此，我们提出 DropAnSH-GS，一种新的基于锚点的 Dropout 策略。不同于独立丢弃单个高斯，我们的方法随机选择部分高斯作为锚点，并同时移除其空间邻域中的高斯。这能够有效打破锚点附近的局部冗余，促使模型学习更稳健、更具全局性的表示。进一步地，我们还将 Dropout 扩展到颜色属性上，通过随机丢弃高阶球谐系数，使外观信息更多集中在低阶球谐中。这一策略既进一步减轻了过拟合，也支持通过截断球谐系数实现灵活的后训练模型压缩。实验结果表明，DropAnSH-GS 在几乎没有额外计算开销的情况下，显著优于现有 Dropout 方法，并且可以方便地集成到各种 3DGS 变体中以提升性能。项目主页见 https://sk-fun.fun/DropAnSH-GS。\n"
  },
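Both dropout variants in the DropAnSH-GS abstract are a few lines each. The sketch below drops an anchor together with its k nearest neighbors (so neighbors cannot compensate) and zeroes higher-degree SH coefficients; `k`, the drop rates, the low/high SH split, and the coefficient layout are illustrative assumptions:

```python
import numpy as np
from scipy.spatial import cKDTree

def anchor_dropout(xyz, opacity, p_anchor=0.05, k=8):
    '''Sample anchor Gaussians, then zero the opacity of each anchor
    and its k spatial neighbors, disrupting local redundancy.'''
    anchors = np.flatnonzero(np.random.rand(len(xyz)) < p_anchor)
    mask = np.ones(len(xyz), dtype=bool)
    if anchors.size:
        _, knn = cKDTree(xyz).query(xyz[anchors], k=k)
        mask[anchors] = False
        mask[np.asarray(knn).ravel()] = False
    return opacity * mask                 # dropped splats vanish

def sh_dropout(sh, p_drop=0.5, n_low=4):
    '''With probability p_drop, zero all SH coefficients past the
    first n_low (last axis indexes coefficients), pushing appearance
    into low-degree SH and permitting later SH truncation.'''
    if np.random.rand() < p_drop:
        sh = sh.copy()
        sh[..., n_low:] = 0.0
    return sh
```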
  {
    "path": "abs/2602.21105.md",
    "content": "### BrepGaussian: CAD reconstruction from Multi-View Images with Gaussian Splatting\n\nThe boundary representation (B-rep) models a 3D solid as its explicit boundaries: trimmed corners, edges, and faces. Recovering B-rep representation from unstructured data is a challenging and valuable task of computer vision and graphics. Recent advances in deep learning have greatly improved the recovery of 3D shape geometry, but still depend on dense and clean point clouds and struggle to generalize to novel shapes. We propose B-rep Gaussian Splatting (BrepGaussian), a novel framework that learns 3D parametric representations from 2D images. We employ a Gaussian Splatting renderer with learnable features, followed by a specific fitting strategy. To disentangle geometry reconstruction and feature learning, we introduce a two-stage learning framework that first captures geometry and edges and then refines patch features to achieve clean geometry and coherent instance representations. Extensive experiments demonstrate the superior performance of our approach to state-of-the-art methods. We will release our code and datasets upon acceptance.\n\n边界表示（B-rep）将三维实体建模为其显式边界，例如修剪后的角点、边和面。从非结构化数据中恢复 B-rep 表示，是计算机视觉与图形学中一个具有挑战性且很有价值的任务。近年来，深度学习显著提升了三维形状几何的恢复能力，但现有方法仍依赖稠密且干净的点云，并且难以泛化到新形状。我们提出 B-rep Gaussian Splatting（BrepGaussian），这是一个从二维图像学习三维参数化表示的新框架。该方法采用带可学习特征的 Gaussian Splatting 渲染器，并结合专门的拟合策略。为了解耦几何重建与特征学习，我们引入两阶段学习框架：第一阶段捕获几何与边界，第二阶段细化局部面片特征，从而得到干净的几何结构和一致的实例表示。大量实验表明，我们的方法优于现有最先进方法。代码和数据集将在论文被接收后公开。\n"
  },
  {
    "path": "abs/2602.21535.md",
    "content": "### Pseudo-View Enhancement via Confidence Fusion for Unposed Sparse-View Reconstruction\n\n3D scene reconstruction under unposed sparse viewpoints is a highly challenging yet practically important problem, especially in outdoor scenes due to complex lighting and scale variation. With extremely limited input views, directly utilizing diffusion model to synthesize pseudo frames will introduce unreasonable geometry, which will harm the final reconstruction quality. To address these issues, we propose a novel framework for sparse-view outdoor reconstruction that achieves high-quality results through bidirectional pseudo frame restoration and scene perception Gaussian management. Specifically, we introduce a bidirectional pseudo frame restoration method that restores missing content by diffusion-based synthesis guided by adjacent frames with a lightweight pseudo-view deblur model and confidence mask inference algorithm. Then we propose a scene perception Gaussian management strategy that optimize Gaussians based on joint depth-density information. These designs significantly enhance reconstruction completeness, suppress floating artifacts and improve overall geometric consistency under extreme view sparsity. Experiments on outdoor benchmarks demonstrate substantial gains over existing methods in both fidelity and stability.\n\n在无位姿稀疏视角条件下进行三维场景重建，是一个极具挑战但又具有重要实际意义的问题，尤其在户外场景中，复杂光照和尺度变化会进一步增加难度。当输入视图极其有限时，若直接利用扩散模型合成伪视图，往往会引入不合理的几何结构，从而损害最终重建质量。为解决这些问题，我们提出一种新的稀疏视角户外重建框架，通过双向伪视图恢复与场景感知高斯管理来实现高质量重建。具体而言，我们引入一种双向伪视图恢复方法，在轻量级伪视图去模糊模型和置信度掩码推断算法的辅助下，利用相邻帧引导的扩散式合成来恢复缺失内容。随后，我们提出一种场景感知高斯管理策略，基于联合深度-密度信息对高斯进行优化。这些设计显著提升了重建完整性，抑制了漂浮伪影，并在极端稀疏视角条件下改善了整体几何一致性。户外基准数据集上的实验表明，该方法在保真度和稳定性方面都相较现有方法取得了显著提升。\n"
  },
  {
    "path": "abs/2602.21644.md",
    "content": "### DAGS-SLAM: Dynamic-Aware 3DGS SLAM via Spatiotemporal Motion Probability and Uncertainty-Aware Scheduling\n\nMobile robots and IoT devices demand real-time localization and dense reconstruction under tight compute and energy budgets. While 3D Gaussian Splatting (3DGS) enables efficient dense SLAM, dynamic objects and occlusions still degrade tracking and mapping. Existing dynamic 3DGS-SLAM often relies on heavy optical flow and per-frame segmentation, which is costly for mobile deployment and brittle under challenging illumination. We present DAGS-SLAM, a dynamic-aware 3DGS-SLAM system that maintains a spatiotemporal motion probability (MP) state per Gaussian and triggers semantics on demand via an uncertainty-aware scheduler. DAGS-SLAM fuses lightweight YOLO instance priors with geometric cues to estimate and temporally update MP, propagates MP to the front-end for dynamic-aware correspondence selection, and suppresses dynamic artifacts in the back-end via MP-guided optimization. Experiments on public dynamic RGB-D benchmarks show improved reconstruction and robust tracking while sustaining real-time throughput on a commodity GPU, demonstrating a practical speed-accuracy tradeoff with reduced semantic invocations toward mobile deployment.\n\n移动机器人和物联网设备需要在严格的计算与能耗预算下实现实时定位和稠密重建。虽然三维高斯喷溅使高效稠密 SLAM 成为可能，但动态物体和遮挡仍会降低跟踪与建图效果。现有动态 3DGS-SLAM 通常依赖较重的光流计算和逐帧分割，这对于移动端部署代价较高，并且在复杂光照下不够稳健。我们提出 DAGS-SLAM，一个面向动态环境的 3DGS-SLAM 系统，它为每个高斯维护时空运动概率状态，并通过不确定性感知调度器按需触发语义分析。DAGS-SLAM 将轻量级 YOLO 实例先验与几何线索融合，用于估计并随时间更新运动概率；随后将这一状态传播到前端，以支持动态感知的对应点选择，并在后端通过运动概率引导优化抑制动态伪影。公开动态 RGB-D 基准上的实验表明，该方法在普通 GPU 上保持实时吞吐量的同时，提升了重建质量和跟踪稳健性，展示了适合移动部署的速度与精度折中，并减少了语义模块调用次数。\n"
  },
  {
    "path": "abs/2602.21874.md",
    "content": "### Interactive Augmented Reality-enabled Outdoor Scene Visualization For Enhanced Real-time Disaster Response\n\nA user-centered AR interface for disaster response is presented in this work that uses 3D Gaussian Splatting (3DGS) to visualize detailed scene reconstructions, while maintaining situational awareness and keeping cognitive load low. The interface relies on a lightweight interaction approach, combining World-in-Miniature (WIM) navigation with semantic Points of Interest (POIs) that can be filtered as needed, and it is supported by an architecture designed to stream updates as reconstructions evolve. User feedback from a preliminary evaluation indicates that this design is easy to use and supports real-time coordination, with participants highlighting the value of interaction and POIs for fast decision-making in context. Thorough user-centric performance evaluation demonstrates strong usability of the developed interface and high acceptance ratios.\n\n本文提出一种以用户为中心的增强现实灾害响应界面，利用三维高斯喷溅来可视化高细节场景重建，同时兼顾态势感知并降低认知负担。该界面采用轻量级交互方式，将世界缩影式导航与可按需筛选的语义兴趣点结合起来，并由一套可随重建演化实时流式更新的系统架构支持。初步评估中的用户反馈表明，这种设计易于使用，能够支持实时协同，参与者尤其强调了交互功能和兴趣点信息对于快速情境决策的重要价值。更为充分的以用户为中心的性能评估也表明，该界面具有较强的可用性和较高的接受度。\n"
  },
  {
    "path": "abs/2602.22376.md",
    "content": "### AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction\n\nRecent advances in 4D scene reconstruction have significantly improved dynamic modeling across various domains. However, existing approaches remain limited under aerial conditions with single-view capture, wide spatial range, and dynamic objects of limited spatial footprint and large motion disparity. These challenges cause severe depth ambiguity and unstable motion estimation, making monocular aerial reconstruction inherently ill-posed. To this end, we present AeroDGS, a physics-guided 4D Gaussian splatting framework for monocular UAV videos. AeroDGS introduces a Monocular Geometry Lifting module that reconstructs reliable static and dynamic geometry from a single aerial sequence, providing a robust basis for dynamic estimation. To further resolve monocular ambiguity, we propose a Physics-Guided Optimization module that incorporates differentiable ground-support, upright-stability, and trajectory-smoothness priors, transforming ambiguous image cues into physically consistent motion. The framework jointly refines static backgrounds and dynamic entities with stable geometry and coherent temporal evolution. We additionally build a real-world UAV dataset that spans various altitudes and motion conditions to evaluate dynamic aerial reconstruction. Experiments on synthetic and real UAV scenes demonstrate that AeroDGS outperforms state-of-the-art methods, achieving superior reconstruction fidelity in dynamic aerial environments.\n\n近年来，4D 场景重建在多种领域的动态建模能力上取得了显著进展。然而，现有方法在航拍场景下仍然受限于单视角采集、空间范围大，以及动态物体空间占比小且运动差异大的条件。这些挑战会导致严重的深度歧义和不稳定的运动估计，使单目航拍重建本质上成为病态问题。为此，我们提出 AeroDGS，这是一种面向单目无人机视频的物理引导 4D Gaussian splatting 框架。AeroDGS 引入单目几何提升模块，从单条航拍序列中重建可靠的静态与动态几何，为动态估计提供稳健基础。为进一步缓解单目歧义，我们提出物理引导优化模块，引入可微的地面支撑、直立稳定性和轨迹平滑先验，将模糊的图像线索转化为物理一致的运动。该框架联合优化静态背景和动态实体，使其具有稳定几何与连贯的时间演化。我们还构建了一个覆盖不同飞行高度和运动条件的真实 UAV 数据集，用于评估动态航拍重建。合成与真实 UAV 场景上的实验表明，AeroDGS 在动态航拍环境中实现了优于现有最先进方法的重建保真度。\n"
  },
  {
    "path": "abs/2602.22565.md",
    "content": "### SwiftNDC: Fast Neural Depth Correction for High-Fidelity 3D Reconstruction\n\nDepth-guided 3D reconstruction has gained popularity as a fast alternative to optimization-heavy approaches, yet existing methods still suffer from scale drift, multi-view inconsistencies, and the need for substantial refinement to achieve high-fidelity geometry. Here, we propose SwiftNDC, a fast and general framework built around a Neural Depth Correction field that produces cross-view consistent depth maps. From these refined depths, we generate a dense point cloud through back-projection and robust reprojection-error filtering, obtaining a clean and uniformly distributed geometric initialization for downstream reconstruction. This reliable dense geometry substantially accelerates 3D Gaussian Splatting (3DGS) for mesh reconstruction, enabling high-quality surfaces with significantly fewer optimization iterations. For novel-view synthesis, SwiftNDC can also improve 3DGS rendering quality, highlighting the benefits of strong geometric initialization. We conduct a comprehensive study across five datasets, including two for mesh reconstruction, as well as three for novel-view synthesis. SwiftNDC consistently reduces running time for accurate mesh reconstruction and boosts rendering fidelity for view synthesis, demonstrating the effectiveness of combining neural depth refinement with robust geometric initialization for high-fidelity and efficient 3D reconstruction.\n\n基于深度引导的三维重建因速度快而成为重优化方法的重要替代方案，但现有方法仍然受到尺度漂移、多视图不一致以及为了获得高保真几何结构而需要大量后续细化的限制。为此，我们提出 SwiftNDC，一个围绕神经深度校正场构建的快速通用框架，可生成跨视图一致的深度图。基于这些细化后的深度，我们通过反投影和稳健的重投影误差过滤生成稠密点云，从而为后续重建提供干净且均匀分布的几何初始化。这一可靠的稠密几何能够显著加速用于网格重建的三维高斯喷溅，只需更少的优化迭代即可获得高质量表面。对于新视角合成，SwiftNDC 也能提升 3DGS 的渲染质量，说明强几何初始化具有明显优势。我们在五个数据集上进行了全面研究，其中两个用于网格重建，三个用于新视角合成。结果表明，SwiftNDC 持续缩短了精确网格重建的运行时间，并提高了视图合成的渲染保真度，验证了将神经深度细化与稳健几何初始化结合用于高保真、高效率三维重建的有效性。\n"
  },
  {
    "path": "abs/2602.22571.md",
    "content": "### GIFSplat: Generative Prior-Guided Iterative Feed-Forward 3D Gaussian Splatting from Sparse Views\n\nFeed-forward 3D reconstruction offers substantial runtime advantages over per-scene optimization, which remains slow at inference and often fragile under sparse views. However, existing feed-forward methods still have potential for further performance gains, especially for out-of-domain data, and struggle to retain second-level inference time once a generative prior is introduced. These limitations stem from the one-shot prediction paradigm in existing feed-forward pipeline: models are strictly bounded by capacity, lack inference-time refinement, and are ill-suited for continuously injecting generative priors. We introduce GIFSplat, a purely feed-forward iterative refinement framework for 3D Gaussian Splatting from sparse unposed views. A small number of forward-only residual updates progressively refine current 3D scene using rendering evidence, achieve favorable balance between efficiency and quality. Furthermore, we distill a frozen diffusion prior into Gaussian-level cues from enhanced novel renderings without gradient backpropagation or ever-increasing view-set expansion, thereby enabling per-scene adaptation with generative prior while preserving feed-forward efficiency. Across DL3DV, RealEstate10K, and DTU, GIFSplat consistently outperforms state-of-the-art feed-forward baselines, improving PSNR by up to +2.1 dB, and it maintains second-scale inference time without requiring camera poses or any test-time gradient optimization.\n\n前馈式三维重建相比逐场景优化具有显著的推理速度优势，而后者不仅推理缓慢，在稀疏视角下也常常不稳定。然而，现有前馈式方法仍有进一步提升空间，尤其是在域外数据上，同时一旦引入生成先验，也往往无法维持秒级推理速度。这些局限源于当前前馈式流程的一次性预测范式：模型能力受到严格上限约束，缺乏推理阶段细化机制，也不适合持续注入生成先验。我们提出 GIFSplat，一个完全前馈式、迭代细化的三维高斯喷溅框架，面向稀疏无位姿视图重建。该方法通过少量仅前向的残差更新，利用渲染证据逐步细化当前三维场景，在效率和质量之间取得良好平衡。进一步地，我们将冻结的扩散先验蒸馏为高斯级线索，来源于增强后的新视角渲染，而无需梯度反向传播，也无需不断扩充视图集合，从而实现带生成先验的逐场景自适应，同时保持前馈式效率。在 DL3DV、RealEstate10K 和 DTU 上，GIFSplat 持续优于现有最先进前馈基线，PSNR 最高提升 2.1 dB，并在无需相机位姿和任何测试时梯度优化的情况下维持秒级推理时间。\n"
  },
  {
    "path": "abs/2602.22596.md",
    "content": "### BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model\n\nWe present BetterScene, an approach to enhance novel view synthesis (NVS) quality for diverse real-world scenes using extremely sparse, unconstrained photos. BetterScene leverages the production-ready Stable Video Diffusion (SVD) model pretrained on billions of frames as a strong backbone, aiming to mitigate artifacts and recover view-consistent details at inference time. Conventional methods have developed similar diffusion-based solutions to address these challenges of novel view synthesis. Despite significant improvements, these methods typically rely on off-the-shelf pretrained diffusion priors and fine-tune only the UNet module while keeping other components frozen, which still leads to inconsistent details and artifacts even when incorporating geometry-aware regularizations like depth or semantic conditions. To address this, we investigate the latent space of the diffusion model and introduce two components: (1) temporal equivariance regularization and (2) vision foundation model-aligned representation, both applied to the variational autoencoder (VAE) module within the SVD pipeline. BetterScene integrates a feed-forward 3D Gaussian Splatting (3DGS) model to render features as inputs for the SVD enhancer and generate continuous, artifact-free, consistent novel views. We evaluate on the challenging DL3DV-10K dataset and demonstrate superior performance compared to state-of-the-art methods.\n\n我们提出 BetterScene，一种在极稀疏、无约束照片条件下提升真实世界场景新视角合成质量的方法。BetterScene 利用在数十亿帧视频上预训练、可用于工业级生产的 Stable Video Diffusion 模型作为强大骨干，旨在在推理时减轻伪影并恢复跨视角一致的细节。尽管传统方法也提出了类似的扩散式方案来解决新视角合成问题，但它们通常直接使用现成的扩散先验，并且只微调 UNet 模块、冻结其他组件，因此即便加入深度或语义条件等几何感知正则项，仍会出现细节不一致和伪影。为解决这一问题，我们研究扩散模型的潜在空间，并在 Stable Video Diffusion 管线中的变分自编码器模块上引入两个组件：时间等变正则化和与视觉基础模型对齐的表示。BetterScene 还结合了一个前馈式三维高斯喷溅模型，用于渲染特征作为扩散增强器输入，并生成连续、无伪影且一致的新视角。在具有挑战性的 DL3DV-10K 数据集上的评测表明，BetterScene 优于现有最先进方法。\n"
  },
  {
    "path": "abs/2602.22666.md",
    "content": "### ArtPro: Self-Supervised Articulated Object Reconstruction with Adaptive Integration of Mobility Proposals\n\nReconstructing articulated objects into high-fidelity digital twins is crucial for applications such as robotic manipulation and interactive simulation. Recent self-supervised methods using differentiable rendering frameworks like 3D Gaussian Splatting remain highly sensitive to the initial part segmentation. Their reliance on heuristic clustering or pre-trained models often causes optimization to converge to local minima, especially for complex multi-part objects. To address these limitations, we propose ArtPro, a novel self-supervised framework that introduces adaptive integration of mobility proposals. Our approach begins with an over-segmentation initialization guided by geometry features and motion priors, generating part proposals with plausible motion hypotheses. During optimization, we dynamically merge these proposals by analyzing motion consistency among spatial neighbors, while a collision-aware motion pruning mechanism prevents erroneous kinematic estimation. Extensive experiments on both synthetic and real-world objects demonstrate that ArtPro achieves robust reconstruction of complex multi-part objects, significantly outperforming existing methods in accuracy and stability.\n\n将可动关节物体重建为高保真的数字孪生，对于机器人操作和交互式仿真等应用至关重要。近期基于可微渲染框架，如三维高斯喷溅的自监督方法，对初始部件分割极为敏感。它们依赖启发式聚类或预训练模型，往往会导致优化陷入局部最优，尤其是在复杂多部件物体上。为解决这些限制，我们提出 ArtPro，一个引入可动性提议自适应整合机制的新型自监督框架。该方法首先在几何特征和运动先验引导下进行过分割初始化，生成带有合理运动假设的部件提议。在优化过程中，我们通过分析空间邻居之间的运动一致性，动态合并这些提议，同时利用碰撞感知的运动剪枝机制避免错误的运动学估计。合成和真实世界物体上的大量实验表明，ArtPro 能够稳健地重建复杂多部件物体，并在精度和稳定性方面显著优于现有方法。\n"
  },
  {
    "path": "abs/2602.22731.md",
    "content": "### Sapling-NeRF: Geo-Localised Sapling Reconstruction in Forests for Ecological Monitoring\n\nSaplings are key indicators of forest regeneration and overall forest health. However, their fine-scale architectural traits are difficult to capture with existing 3D sensing methods, which make quantitative evaluation difficult. Terrestrial Laser Scanners (TLS), Mobile Laser Scanners (MLS), or traditional photogrammetry approaches poorly reconstruct thin branches, dense foliage, and lack the scale consistency needed for long-term monitoring. Implicit 3D reconstruction methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) are promising alternatives, but cannot recover the true scale of a scene and lack any means to be accurately geo-localised. In this paper, we present a pipeline which fuses NeRF, LiDAR SLAM, and GNSS to enable repeatable, geo-localised ecological monitoring of saplings. Our system proposes a three-level representation: (i) coarse Earth-frame localisation using GNSS, (ii) LiDAR-based SLAM for centimetre-accurate localisation and reconstruction, and (iii) NeRF-derived object-centric dense reconstruction of individual saplings. This approach enables repeatable quantitative evaluation and long-term monitoring of sapling traits. Our experiments in forest plots in Wytham Woods (Oxford, UK) and Evo (Finland) show that stem height, branching patterns, and leaf-to-wood ratios can be captured with increased accuracy as compared to TLS. We demonstrate that accurate stem skeletons and leaf distributions can be measured for saplings with heights between 0.5m and 2m in situ, giving ecologists access to richer structural and quantitative data for analysing forest dynamics.\n\n幼树是森林更新和整体森林健康状况的重要指标。然而，现有三维感知方法难以捕捉其细尺度结构特征，因此很难进行定量评估。地面激光扫描、移动激光扫描或传统摄影测量方法都难以重建细枝和茂密叶片，也缺乏长期监测所需的尺度一致性。神经辐射场和三维高斯喷溅等隐式三维重建方法是有前景的替代方案，但它们无法恢复场景的真实尺度，也缺乏精确地理定位能力。本文提出一条融合 NeRF、激光雷达 SLAM 和 GNSS 的流程，以支持对幼树进行可重复、可地理定位的生态监测。我们的系统采用三级表示：第一层使用 GNSS 进行地球坐标系下的粗定位；第二层利用激光雷达 SLAM 进行厘米级定位与重建；第三层利用 NeRF 对单株幼树进行以对象为中心的稠密重建。该方法使得对幼树特征进行可重复的定量评估和长期监测成为可能。在英国牛津 Wytham Woods 和芬兰 Evo 的森林样地实验表明，与地面激光扫描相比，该方法能更准确地获取树干高度、分枝模式以及叶木比。我们还展示了对高度在 0.5 米到 2 米之间幼树的树干骨架和叶片分布进行原位精确测量的能力，为生态学家分析森林动态提供了更丰富的结构与定量数据。\n"
  },
  {
    "path": "abs/2602.22800.md",
    "content": "### GSTurb: Gaussian Splatting for Atmospheric Turbulence Mitigation\n\nAtmospheric turbulence causes significant image degradation due to pixel displacement (tilt) and blur, particularly in long-range imaging applications. In this paper, we propose a novel framework for atmospheric turbulence mitigation, GSTurb, which integrates optical flow-guided tilt correction and Gaussian splatting for modeling non-isoplanatic blur. The framework employs Gaussian parameters to represent tilt and blur, and optimizes them across multiple frames to enhance restoration. Experimental results on the ATSyn-static dataset demonstrate the effectiveness of our method, achieving a peak PSNR of 27.67 dB and SSIM of 0.8735. Compared to the state-of-the-art method, GSTurb improves PSNR by 1.3 dB (a 4.5% increase) and SSIM by 0.048 (a 5.8% increase). Additionally, on real datasets, including the TSRWGAN Real-World and CLEAR datasets, GSTurb outperforms existing methods, showing significant improvements in both qualitative and quantitative performance. These results highlight that combining optical flow-guided tilt correction with Gaussian splatting effectively enhances image restoration under both synthetic and real-world turbulence conditions. The code for this method will be available at https://github.com/DuhlLiamz/3DGS_turbulence/tree/main.\n\n大气湍流会由于像素位移（倾斜）和模糊而显著降低图像质量，尤其在远距离成像应用中更为明显。本文提出一种新的大气湍流缓解框架 GSTurb，将光流引导的倾斜校正与 Gaussian splatting 结合起来，以建模非等晕模糊。该框架使用高斯参数表示倾斜和模糊，并在多帧之间对其进行优化，以增强图像恢复效果。在 ATSyn-static 数据集上的实验表明，该方法有效，将 PSNR 提升到 27.67 dB、SSIM 提升到 0.8735。与当前最先进方法相比，GSTurb 的 PSNR 提高了 1.3 dB，SSIM 提高了 0.048。此外，在 TSRWGAN Real-World 和 CLEAR 等真实数据集上，GSTurb 在定性和定量结果上都显著优于现有方法。这些结果表明，将光流引导的倾斜校正与 Gaussian splatting 结合，能够有效增强在合成和真实大气湍流条件下的图像恢复效果。该方法代码将公开。\n"
  },
  {
    "path": "abs/2602.23172.md",
    "content": "### Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking\n\nCapturing 4D spatiotemporal surroundings is crucial for the safe and reliable operation of robots in dynamic environments. However, most existing methods address only one side of the problem: they either provide coarse geometric tracking via bounding boxes, or detailed 3D structures like voxel-based occupancy that lack explicit temporal association. In this work, we present Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking (LaGS) that advances spatiotemporal scene understanding in a holistic direction. Our approach incorporates camera-based end-to-end tracking with mask-based multi-view panoptic occupancy prediction, and addresses the key challenge of efficiently aggregating multi-view information into 3D voxel grids via a novel latent Gaussian splatting approach. Specifically, we first fuse observations into 3D Gaussians that serve as a sparse point-centric latent representation of the 3D scene, and then splat the aggregated features onto a 3D voxel grid that is decoded by a mask-based segmentation head. We evaluate LaGS on the Occ3D nuScenes and Waymo datasets, achieving state-of-the-art performance for 4D panoptic occupancy tracking. We make our code available at https://lags.cs.uni-freiburg.de/.\n\n捕捉四维时空环境对于机器人在动态环境中的安全可靠运行至关重要。然而，大多数现有方法只解决了问题的一部分：要么提供基于边界框的粗粒度几何跟踪，要么给出基于体素占据的细致三维结构，但缺乏显式的时间关联。本文提出用于四维全景占据跟踪的潜在高斯喷溅方法 LaGS，推动时空场景理解向更整体的方向发展。该方法结合基于相机的端到端跟踪和基于掩码的多视图全景占据预测，并通过一种新的潜在高斯喷溅方法，高效地将多视图信息聚合到三维体素网格中。具体而言，我们首先将观测融合为三维高斯，作为稀疏、以点为中心的三维场景潜在表示，然后再将聚合后的特征喷溅到三维体素网格上，并由基于掩码的分割头进行解码。我们在 Occ3D nuScenes 和 Waymo 数据集上评测 LaGS，取得了四维全景占据跟踪的最先进性能。代码见 https://lags.cs.uni-freiburg.de/。\n"
  },
  {
    "path": "abs/2602.23559.md",
    "content": "### No Calibration, No Depth, No Problem: Cross-Sensor View Synthesis with 3D Consistency\n\nWe present the first study of cross-sensor view synthesis across different modalities. We examine a practical, fundamental, yet widely overlooked problem: getting aligned RGB-X data, where most RGB-X prior work assumes such pairs exist and focuses on modality fusion, but it empirically requires huge engineering effort in calibration. We propose a match-densify-consolidate method. First, we perform RGB-X image matching followed by guided point densification. Using the proposed confidence-aware densification and self-matching filtering, we attain better view synthesis and later consolidate them in 3D Gaussian Splatting (3DGS). Our method uses no 3D priors for X-sensor and only assumes nearly no-cost COLMAP for RGB. We aim to remove the cumbersome calibration for various RGB-X sensors and advance the popularity of cross-sensor learning by a scalable solution that breaks through the bottleneck in large-scale real-world RGB-X data collection.\n\n我们首次研究了跨不同传感模态的跨传感器视图合成问题。本文关注一个实际、基础却长期被忽视的问题，即如何获取对齐的 RGB-X 数据。大多数以往 RGB-X 工作默认这类配对数据已经存在，并将重点放在模态融合上，但事实上完成这些传感器标定往往需要巨大的工程代价。我们提出一种匹配、增密、整合的方法。首先，我们进行 RGB-X 图像匹配，然后执行引导式点增密。借助所提出的置信度感知增密和自匹配过滤，我们能够获得更好的视图合成结果，并随后在三维高斯喷溅中进行三维整合。该方法不依赖 X 传感器的三维先验，只假设对 RGB 使用近乎零成本的 COLMAP。我们的目标是去除各种 RGB-X 传感器繁琐的标定过程，并通过一种可扩展方案突破大规模真实世界 RGB-X 数据采集瓶颈，从而推动跨传感器学习的普及。\n"
  },
  {
    "path": "abs/2602.24020.md",
    "content": "### SR3R: Rethinking Super-Resolution 3D Reconstruction With Feed-Forward Gaussian Splatting\n\n3D super-resolution (3DSR) aims to reconstruct high-resolution (HR) 3D scenes from low-resolution (LR) multi-view images. Existing methods rely on dense LR inputs and per-scene optimization, which restricts the high-frequency priors for constructing HR 3D Gaussian Splatting (3DGS) to those inherited from pretrained 2D super-resolution (2DSR) models. This severely limits reconstruction fidelity, cross-scene generalization, and real-time usability. We propose to reformulate 3DSR as a direct feed-forward mapping from sparse LR views to HR 3DGS representations, enabling the model to autonomously learn 3D-specific high-frequency geometry and appearance from large-scale, multi-scene data. This fundamentally changes how 3DSR acquires high-frequency knowledge and enables robust generalization to unseen scenes. Specifically, we introduce SR3R, a feed-forward framework that directly predicts HR 3DGS representations from sparse LR views via the learned mapping network. To further enhance reconstruction fidelity, we introduce Gaussian offset learning and feature refinement, which stabilize reconstruction and sharpen high-frequency details. SR3R is plug-and-play and can be paired with any feed-forward 3DGS reconstruction backbone: the backbone provides an LR 3DGS scaffold, and SR3R upscales it to an HR 3DGS. Extensive experiments across three 3D benchmarks demonstrate that SR3R surpasses state-of-the-art (SOTA) 3DSR methods and achieves strong zero-shot generalization, even outperforming SOTA per-scene optimization methods on unseen scenes.\n\n三维超分辨率旨在从低分辨率多视图图像中重建高分辨率三维场景。现有方法依赖密集的低分辨率输入和逐场景优化，因此构建高分辨率三维高斯喷溅所需的高频先验基本只能继承自预训练二维超分模型。这严重限制了重建保真度、跨场景泛化能力和实时可用性。我们提出将三维超分辨率重新表述为从稀疏低分辨率视图直接前馈映射到高分辨率三维高斯喷溅表示，使模型能够从大规模多场景数据中自主学习三维特有的高频几何与外观信息。这从根本上改变了三维超分辨率获取高频知识的方式，并使其能够稳健泛化到未见场景。具体来说，我们提出 SR3R，一个前馈式框架，通过学习得到的映射网络直接从稀疏低分辨率视图预测高分辨率三维高斯喷溅表示。为进一步提高重建保真度，我们引入高斯偏移学习和特征细化，以稳定重建过程并增强高频细节。SR3R 是即插即用的，可与任意前馈式 3DGS 重建骨干配合使用：骨干先提供低分辨率 3DGS 骨架，SR3R 再将其上采样为高分辨率表示。三个三维基准上的大量实验表明，SR3R 不仅优于现有最先进的三维超分辨率方法，还表现出很强的零样本泛化能力，甚至在未见场景上超过了逐场景优化的最先进方法。\n"
  },
  {
    "path": "abs/2602.24096.md",
    "content": "### DiffusionHarmonizer: Bridging Neural Reconstruction and Photorealistic Simulation with Online Diffusion Enhancer\n\nSimulation is essential to the development and evaluation of autonomous robots such as self-driving vehicles. Neural reconstruction is emerging as a promising solution as it enables simulating a wide variety of scenarios from real-world data alone in an automated and scalable way. However, while methods such as NeRF and 3D Gaussian Splatting can produce visually compelling results, they often exhibit artifacts particularly when rendering novel views, and fail to realistically integrate inserted dynamic objects, especially when they were captured from different scenes. To overcome these limitations, we introduce DiffusionHarmonizer, an online generative enhancement framework that transforms renderings from such imperfect scenes into temporally consistent outputs while improving their realism. At its core is a single-step temporally-conditioned enhancer that is converted from a pretrained multi-step image diffusion model, capable of running in online simulators on a single GPU. The key to training it effectively is a custom data curation pipeline that constructs synthetic-real pairs emphasizing appearance harmonization, artifact correction, and lighting realism. The result is a scalable system that significantly elevates simulation fidelity in both research and production environments.\n\n仿真对于自动驾驶车辆等自主机器人系统的开发与评估至关重要。神经重建正逐渐成为一种很有前景的方案，因为它能够仅从真实世界数据出发，以自动化且可扩展的方式模拟各种场景。然而，尽管 NeRF 和三维高斯喷溅等方法能够产生视觉上颇具吸引力的结果，它们在渲染新视角时仍常常出现伪影，并且难以将插入的动态物体真实地融入场景，尤其是当这些物体来自其他场景时。为克服这些限制，我们提出 DiffusionHarmonizer，一个在线生成增强框架，可将这类不完美场景的渲染结果转化为时间一致且更加逼真的输出。其核心是一个由预训练多步图像扩散模型转换而来的、带时间条件的单步增强器，能够在单张 GPU 上运行于在线模拟器中。为了有效训练这一模型，我们设计了专门的数据构建流程，生成强调外观协调、伪影校正和光照真实感的合成与真实图像配对数据。最终得到的是一个可扩展系统，能够在研究和工业环境中显著提升仿真保真度。\n"
  },
  {
    "path": "abs/2602.24136.md",
    "content": "### Prune Wisely, Reconstruct Sharply: Compact 3D Gaussian Splatting via Adaptive Pruning and Difference-of-Gaussian Primitives\n\nRecent significant advances in 3D scene representation have been driven by 3D Gaussian Splatting (3DGS), which has enabled real-time rendering with photorealistic quality. 3DGS often requires a large number of primitives to achieve high fidelity, leading to redundant representations and high resource consumption, thereby limiting its scalability for complex or large-scale scenes. Consequently, effective pruning strategies and more expressive primitives that can reduce redundancy while preserving visual quality are crucial for practical deployment. We propose an efficient, integrated reconstruction-aware pruning strategy that adaptively determines pruning timing and refining intervals based on reconstruction quality, thus reducing model size while enhancing rendering quality. Moreover, we introduce a 3D Difference-of-Gaussians primitive that jointly models both positive and negative densities in a single primitive, improving the expressiveness of Gaussians under compact configurations. Our method significantly improves model compactness, achieving up to 90\\% reduction in Gaussian-count while delivering visual quality that is similar to, or in some cases better than, that produced by state-of-the-art methods. Code will be made publicly available.\n\n近期三维场景表示的显著进展很大程度上得益于三维高斯喷溅，它实现了具备照片级质量的实时渲染。然而，3DGS 往往需要大量基元才能达到高保真效果，进而带来冗余表示和较高资源消耗，限制了其在复杂或大规模场景中的可扩展性。因此，如何在保持视觉质量的同时通过有效剪枝和更有表达力的基元来减少冗余，是实际部署中的关键问题。我们提出一种高效且与重建过程紧密结合的剪枝策略，它可根据重建质量自适应地确定剪枝时机和细化间隔，从而在缩小模型规模的同时提升渲染质量。此外，我们还引入三维高斯差分基元，在单个基元中同时建模正密度和负密度，从而在紧凑配置下增强高斯表达能力。我们的方法显著提升了模型紧凑性，在高斯数量最多减少 90% 的同时，仍能获得与现有最先进方法相当，甚至在某些情况下更好的视觉质量。代码将公开发布。\n"
  },
  {
    "path": "abs/2602.24161.md",
    "content": "### GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction\n\nReconstructing photorealistic and animatable 4D head avatars from a single portrait image remains a fundamental challenge in computer vision. While diffusion models have enabled remarkable progress in image and video generation for avatar reconstruction, existing methods primarily rely on 2D priors and struggle to achieve consistent 3D geometry. We propose a novel framework that leverages geometry-aware diffusion to learn strong geometry priors for high-fidelity head avatar reconstruction. Our approach jointly synthesizes portrait images and corresponding surface normals, while a pose-free expression encoder captures implicit expression representations. Both synthesized images and expression latents are incorporated into 3D Gaussian-based avatars, enabling photorealistic rendering with accurate geometry. Extensive experiments demonstrate that our method substantially outperforms state-of-the-art approaches in visual quality, expression fidelity, and cross-identity generalization, while supporting real-time rendering.\n\n从单张人像图像重建照片级真实、可驱动的四维头部化身仍是计算机视觉中的基础难题。虽然扩散模型已经推动了图像和视频生成在化身重建中的进展，但现有方法主要依赖二维先验，难以获得一致的三维几何。我们提出一种新框架，利用几何感知扩散学习强几何先验，以实现高保真头部化身重建。该方法联合合成人像图像及其对应的表面法线，同时利用无姿态表达编码器捕捉隐式表达表示。合成图像与表达潜变量共同被引入基于三维高斯的化身表示中，从而实现几何准确的照片级渲染。大量实验表明，我们的方法在视觉质量、表情保真度和跨身份泛化方面都显著优于现有最先进方法，并支持实时渲染。\n"
  },
  {
    "path": "abs/2602.24290.md",
    "content": "### UFO-4D: Unposed Feedforward 4D Reconstruction from Two Images\n\nDense 4D reconstruction from unposed images remains a critical challenge, with current methods relying on slow test-time optimization or fragmented, task-specific feedforward models. We introduce UFO-4D, a unified feedforward framework to reconstruct a dense, explicit 4D representation from just a pair of unposed images. UFO-4D directly estimates dynamic 3D Gaussian Splats, enabling the joint and consistent estimation of 3D geometry, 3D motion, and camera pose in a feedforward manner. Our core insight is that differentiably rendering multiple signals from a single Dynamic 3D Gaussian representation offers major training advantages. This approach enables a self-supervised image synthesis loss while tightly coupling appearance, depth, and motion. Since all modalities share the same geometric primitives, supervising one inherently regularizes and improves the others. This synergy overcomes data scarcity, allowing UFO-4D to outperform prior work by up to 3 times in joint geometry, motion, and camera pose estimation. Our representation also enables high-fidelity 4D interpolation across novel views and time. Please visit our project page for visual results: https://ufo-4d.github.io/\n\n从无位姿图像中进行稠密四维重建仍然是一个关键挑战，现有方法要么依赖缓慢的测试时优化，要么依赖彼此割裂、任务特定的前馈模型。我们提出 UFO-4D，一个统一的前馈框架，仅需一对无位姿图像即可重建稠密且显式的四维表示。UFO-4D 直接估计动态三维 Gaussian Splats，以前馈方式联合且一致地估计三维几何、三维运动和相机位姿。我们的核心观察是，从单一动态三维高斯表示中可微渲染多种信号，会给训练带来显著优势。这使得我们能够使用自监督图像合成损失，同时紧密耦合外观、深度与运动。由于所有模态共享同一组几何基元，对任一模态的监督都会天然地正则化并改善其他模态。这种协同作用缓解了数据稀缺问题，使 UFO-4D 在联合几何、运动和相机位姿估计上相比已有工作最高提升 3 倍。该表示还支持跨新视角和时间的高保真四维插值。项目页面见 https://ufo-4d.github.io/。\n"
  },
  {
    "path": "abs/2603.00145.md",
    "content": "### M-Gaussian: An Magnetic Gaussian Framework for Efficient Multi-Stack MRI Reconstruction\n\nMagnetic Resonance Imaging (MRI) is a crucial non-invasive imaging modality. In routine clinical practice, multi-stack thick-slice acquisitions are widely used to reduce scan time and motion sensitivity, particularly in challenging scenarios such as fetal brain imaging. However, the resulting severe through-plane anisotropy compromises volumetric analysis and downstream quantitative assessment, necessitating robust reconstruction of isotropic high-resolution volumes. Implicit neural representation methods, while achieving high quality, suffer from computational inefficiency due to complex network structures. We present M-Gaussian, adapting 3D Gaussian Splatting to MRI reconstruction. Our contributions include: (1) Magnetic Gaussian primitives with physics-consistent volumetric rendering, (2) neural residual field for high-frequency detail refinement, and (3) multi-resolution progressive training. Our method achieves an optimal balance between quality and speed. On the FeTA dataset, M-Gaussian achieves 40.31 dB PSNR while being 14 times faster, representing the first successful adaptation of 3D Gaussian Splatting to multi-stack MRI reconstruction.\n\n磁共振成像是一种重要的无创成像方式。在常规临床实践中，多堆栈厚切片采集被广泛用于减少扫描时间并降低对运动的敏感性，尤其适用于胎儿脑成像等困难场景。然而，由此产生的严重层间各向异性会损害体数据分析和下游定量评估，因此需要稳健地重建各向同性高分辨率体数据。虽然隐式神经表示方法能够取得较高质量，但由于网络结构复杂，计算效率较低。我们提出 M-Gaussian，将三维高斯喷溅引入 MRI 重建。主要贡献包括：一是具备物理一致体渲染能力的磁性高斯基元，二是用于细化高频细节的神经残差场，三是多分辨率渐进式训练。我们的方法在质量与速度之间取得了良好平衡。在 FeTA 数据集上，M-Gaussian 达到 40.31 dB 的 PSNR，同时速度提升 14 倍，是首个成功将三维高斯喷溅应用于多堆栈 MRI 重建的方法。\n"
  },
  {
    "path": "abs/2603.00492.md",
    "content": "### ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models\n\nPer-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. Methods that leverage generative priors to correct artifacts in these areas hold promise but currently suffer from two shortcomings. The first is scalability, as existing methods use image diffusion models or bidirectional video models that are limited in the number of views they can generate in a single pass (and thus require a costly iterative distillation process for consistency). The second is quality itself, as generators used in prior work tend to produce outputs that are inconsistent with existing scene content and fail entirely in completely unobserved regions. To solve these, we propose a two-stage pipeline that leverages two key insights. First, we train a powerful bidirectional generative model with a novel opacity mixing strategy that encourages consistency with existing observations while retaining the model's ability to extrapolate novel content in unseen areas. Second, we distill it into a causal auto-regressive model that generates hundreds of frames in a single pass. This model can directly produce novel views or serve as pseudo-supervision to improve the underlying 3D representation in a simple and highly efficient manner. We evaluate our method extensively and demonstrate that it can generate plausible reconstructions in scenarios where existing approaches fail completely. When measured on commonly benchmarked datasets, we outperform existing all existing baselines by a wide margin, exceeding prior state-of-the-art methods by 1-3 dB PSNR.\n\n逐场景优化方法，例如三维高斯喷溅，能够提供最先进的新视角合成质量，但在观测不足区域的外推能力较弱。利用生成先验来修复这些区域伪影的方法很有前景，但目前存在两个主要问题。第一是可扩展性，现有方法通常使用图像扩散模型或双向视频模型，而它们单次能够生成的视图数量有限，因此需要代价高昂的迭代蒸馏来保证一致性。第二是生成质量本身，已有方法中的生成器往往与现有场景内容不一致，在完全未观测区域甚至会彻底失效。为解决这些问题，我们提出一个两阶段流程，基于两个关键观察。首先，我们训练一个强大的双向生成模型，并结合新的不透明度混合策略，使其在保持对已有观测一致的同时，仍具备在未见区域外推新内容的能力。其次，我们将该模型蒸馏为一个因果自回归模型，可在单次前向中生成数百帧。该模型既可直接生成新视角，也可作为伪监督，以简单高效的方式提升底层三维表示。我们进行了充分评估，证明该方法能够在现有方法完全失败的场景中生成合理重建。在常用基准数据集上，我们大幅超过所有现有基线，相比此前最先进方法提高了 1 到 3 dB 的 PSNR。\n"
  },
  {
    "path": "abs/2603.00500.md",
    "content": "### Zero-Shot Robotic Manipulation via 3D Gaussian Splatting-Enhanced Multimodal Retrieval-Augmented Generation\n\nExisting end-to-end approaches of robotic manipulation often lack generalization to unseen objects or tasks due to limited data and poor interpretability. While recent Multimodal Large Language Models (MLLMs) demonstrate strong commonsense reasoning, they struggle with geometric and spatial understanding required for pose prediction. In this paper, we propose RobMRAG, a 3D Gaussian Splatting-Enhanced Multimodal Retrieval-Augmented Generation (MRAG) framework for zero-shot robotic manipulation. Specifically, we construct a multi-source manipulation knowledge base containing object contact frames, task completion frames, and pose parameters. During inference, a Hierarchical Multimodal Retrieval module first employs a three-priority hybrid retrieval strategy to find task-relevant object prototypes, then selects the geometrically closest reference example based on pixel-level similarity and Instance Matching Distance (IMD). We further introduce a 3D-Aware Pose Refinement module based on 3D Gaussian Splatting into the MRAG framework, which aligns the pose of the reference object to the target object in 3D space. The aligned results are reprojected onto the image plane and used as input to the MLLM to enhance the generation of the final pose parameters. Extensive experiments show that on a test set containing 30 categories of household objects, our method improves the success rate by 7.76% compared to the best-performing zero-shot baseline under the same setting, and by 6.54% compared to the state-of-the-art supervised baseline. Our results validate that RobMRAG effectively bridges the gap between high-level semantic reasoning and low-level geometric execution, enabling robotic systems that generalize to unseen objects while remaining inherently interpretable.\n\n现有端到端机器人操作方法常因数据有限和可解释性差而难以泛化到未见物体或未见任务。尽管近期多模态大语言模型展现出较强的常识推理能力，但它们在姿态预测所需的几何与空间理解方面仍然不足。本文提出 RobMRAG，一个结合三维高斯喷溅增强的多模态检索增强生成框架，用于零样本机器人操作。具体而言，我们构建了一个多源操作知识库，包含物体接触帧、任务完成帧和姿态参数。在推理阶段，层次化多模态检索模块首先采用三优先级混合检索策略找到与任务相关的物体原型，然后基于像素级相似性和实例匹配距离选出几何上最接近的参考样例。我们进一步在 MRAG 框架中引入基于三维高斯喷溅的三维感知姿态细化模块，在三维空间中对齐参考物体与目标物体的姿态。对齐结果会重新投影到图像平面，并作为输入送入多模态大语言模型，以增强最终姿态参数的生成。在包含 30 类家庭物体的测试集上，大量实验表明，与同设置下表现最好的零样本基线相比，我们的方法成功率提升了 7.76%，相较现有最先进的监督基线也提升了 6.54%。结果验证了 RobMRAG 能有效弥合高层语义推理与底层几何执行之间的差距，使机器人系统在保持可解释性的同时泛化到未见物体。\n"
  },
  {
    "path": "abs/2603.00697.md",
    "content": "### TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction\n\nWe present TokenSplat, a feed-forward framework for joint 3D Gaussian reconstruction and camera pose estimation from unposed multi-view images. At its core, TokenSplat introduces a Token-aligned Gaussian Prediction module that aligns semantically corresponding information across views directly in the feature space. Guided by coarse token positions and fusion confidence, it aggregates multi-scale contextual features to enable long-range cross-view reasoning and reduce redundancy from overlapping Gaussians. To further enhance pose robustness and disentangle viewpoint cues from scene semantics, TokenSplat employs learnable camera tokens and an Asymmetric Dual-Flow Decoder (ADF-Decoder) that enforces directionally constrained communication between camera and image tokens. This maintains clean factorization within a feed-forward architecture, enabling coherent reconstruction and stable pose estimation without iterative refinement. Extensive experiments demonstrate that TokenSplat achieves higher reconstruction fidelity and novel-view synthesis quality in pose-free settings, and significantly improves pose estimation accuracy compared to prior pose-free methods. Project page: https://kidleyh.github.io/tokensplat/.\n\n我们提出 TokenSplat，一个从无位姿多视图图像中联合进行三维高斯重建和相机位姿估计的前馈框架。其核心是一个 Token 对齐的高斯预测模块，可直接在特征空间中对齐不同视图之间语义对应的信息。在粗粒度 token 位置和融合置信度的引导下，该模块聚合多尺度上下文特征，从而实现长程跨视图推理，并减少重叠高斯带来的冗余。为进一步提升位姿鲁棒性，并将视角线索与场景语义解耦，TokenSplat 采用可学习相机 token 和非对称双流解码器，在相机 token 与图像 token 之间施加方向受限的信息交互。这样就能在前馈架构中维持清晰的因子化表示，无需迭代细化也可获得一致的重建和稳定的位姿估计。大量实验表明，在无位姿设定下，TokenSplat 能获得更高的重建保真度和新视角合成质量，并相比以往无位姿方法显著提高位姿估计精度。项目页面见 https://kidleyh.github.io/tokensplat/。\n"
  },
  {
    "path": "abs/2603.00952.md",
    "content": "### Decoupling Motion and Geometry in 4D Gaussian Splatting\n\nHigh-fidelity reconstruction of dynamic scenes is an important yet challenging problem. While recent 4D Gaussian Splatting (4DGS) has demonstrated the ability to model temporal dynamics, it couples Gaussian motion and geometric attributes within a single covariance formulation, which limits its expressiveness for complex motions and often leads to visual artifacts. To address this, we propose VeGaS, a novel velocity-based 4D Gaussian Splatting framework that decouples Gaussian motion and geometry. Specifically, we introduce a Galilean shearing matrix that explicitly incorporates time-varying velocity to flexibly model complex non-linear motions, while strictly isolating the effects of Gaussian motion from the geometry-related conditional Gaussian covariance. Furthermore, a Geometric Deformation Network is introduced to refine Gaussian shapes and orientations using spatio-temporal context and velocity cues, enhancing temporal geometric modeling. Extensive experiments on public datasets demonstrate that VeGaS achieves state-of-the-art performance.\n\n高保真动态场景重建是一个重要但具有挑战性的问题。尽管近年来的 4D Gaussian Splatting（4DGS）已经展现出建模时序动态的能力，但它在单一协方差表述中耦合了高斯的运动与几何属性，从而限制了对复杂运动的表达能力，并常常导致视觉伪影。为解决这一问题，我们提出 VeGaS，一种基于速度的 4D Gaussian Splatting 新框架，将高斯运动与几何解耦。具体而言，我们引入伽利略剪切矩阵，将随时间变化的速度显式纳入建模，以灵活表达复杂的非线性运动，同时严格隔离高斯运动对与几何相关的条件高斯协方差的影响。此外，我们还提出几何变形网络，利用时空上下文和速度线索来细化高斯的形状与朝向，从而增强时序几何建模能力。公开数据集上的大量实验表明，VeGaS 达到了最先进性能。\n"
  },
  {
    "path": "abs/2603.01099.md",
    "content": "### HeroGS: Hierarchical Guidance for Robust 3D Gaussian Splatting under Sparse Views\n\n3D Gaussian Splatting (3DGS) has recently emerged as a promising approach in novel view synthesis, combining photorealistic rendering with real-time efficiency. However, its success heavily relies on dense camera coverage; under sparse-view conditions, insufficient supervision leads to irregular Gaussian distributions, characterized by globally sparse coverage, blurred background, and distorted high-frequency areas. To address this, we propose HeroGS, Hierarchical Guidance for Robust 3D Gaussian Splatting, a unified framework that establishes hierarchical guidance across the image, feature, and parameter levels. At the image level, sparse supervision is converted into pseudo-dense guidance, globally regularizing the Gaussian distributions and forming a consistent foundation for subsequent optimization. Building upon this, Feature-Adaptive Densification and Pruning (FADP) at the feature level leverages low-level features to refine high-frequency details and adaptively densifies Gaussians in background regions. The optimized distributions then support Co-Pruned Geometry Consistency (CPG) at parameter level, which guides geometric consistency through parameter freezing and co-pruning, effectively removing inconsistent splats. The hierarchical guidance strategy effectively constrains and optimizes the overall Gaussian distributions, thereby enhancing both structural fidelity and rendering quality. Extensive experiments demonstrate that HeroGS achieves high-fidelity reconstructions and consistently surpasses state-of-the-art baselines under sparse-view conditions.\n\n三维高斯喷溅近期在新视角合成中成为一种很有前景的方法，将照片级渲染与实时效率结合起来。然而，它的成功高度依赖稠密相机覆盖；在稀疏视角条件下，监督不足会导致高斯分布不规则，表现为全局覆盖稀疏、背景模糊以及高频区域失真。为此，我们提出 HeroGS，一个统一框架，在图像、特征和参数三个层级上建立分层引导，以增强稀疏视角下的三维高斯喷溅鲁棒性。在图像层面，稀疏监督被转换为伪稠密引导，从全局上正则化高斯分布，并为后续优化奠定一致基础；在此基础上，特征层面的特征自适应增密与剪枝模块利用低层特征细化高频细节，并在背景区域自适应增密高斯；优化后的高斯分布再支持参数层面的协同剪枝几何一致性，通过参数冻结和协同剪枝有效去除不一致的喷溅。分层引导策略能够有效约束并优化整体高斯分布，从而同时提升结构保真度和渲染质量。大量实验表明，在稀疏视角条件下，HeroGS 可以实现高保真重建，并持续优于现有最先进基线。\n"
  },
  {
    "path": "abs/2603.01158.md",
    "content": "### FLICKER: A Fine-Grained Contribution-Aware Accelerator for Real-Time 3D Gaussian Splatting\n\nRecently, 3D Gaussian Splatting (3DGS) has emerged as a mainstream rendering technique due to its photorealistic quality and low latency. However, processing massive numbers of non-contributing Gaussian points introduces significant computational overhead on resource-limited edge platforms, limiting its deployment in next-generation AR/VR devices. Contribution-based prior skipping alleviates this inefficiency, yet the resulting contribution-testing workload becomes prohibitive for edge execution. In this paper, we present FLICKER, a contribution-aware 3DGS accelerator based on hardware-software co-design. The proposed framework integrates adaptive leader pixels, pixel-rectangle grouping, hierarchical Gaussian testing, and a mixed-precision architecture to enable near pixel-level, contribution-driven rendering with minimal overhead. Experimental results demonstrate up to $1.5\\times$ speedup, $2.6\\times$ improvement in energy efficiency, and $14%$ area reduction compared with a state-of-the-art accelerator. Compared with a representative edge GPU, FLICKER achieves a $19.8\\times$ speedup and $26.7\\times$ higher energy efficiency.\n\n近年来，三维高斯喷溅因其照片级质量和低延迟而成为主流渲染技术。然而，在资源受限的边缘平台上处理大量对结果没有贡献的高斯点，会带来显著计算开销，从而限制其在下一代 AR 和 VR 设备中的部署。基于贡献的先验跳过虽然可以缓解这一问题，但随之而来的贡献判定工作量本身又过大，不适合边缘执行。本文提出 FLICKER，一个基于软硬件协同设计的贡献感知 3DGS 加速器。该框架融合自适应 leader 像素、像素矩形分组、分层高斯测试以及混合精度架构，实现了近似像素级、以贡献为驱动、且额外开销极低的渲染。实验结果显示，与最先进加速器相比，FLICKER 最多可获得 1.5 倍速度提升、2.6 倍能效提升和 14% 面积缩减；与代表性的边缘 GPU 相比，其速度提升达到 19.8 倍，能效提升达到 26.7 倍。\n"
  },
  {
    "path": "abs/2603.01603.md",
    "content": "### Sparse View Distractor-Free Gaussian Splatting\n\n3D Gaussian Splatting (3DGS) enables efficient training and fast novel view synthesis in static environments. To address challenges posed by transient objects, distractor-free 3DGS methods have emerged and shown promising results when dense image captures are available. However, their performance degrades significantly under sparse input conditions. This limitation primarily stems from the reliance on the color residual heuristics to guide the training, which becomes unreliable with limited observations. In this work, we propose a framework to enhance distractor-free 3DGS under sparse-view conditions by incorporating rich prior information. Specifically, we first adopt the geometry foundation model VGGT to estimate camera parameters and generate a dense set of initial 3D points. Then, we harness the attention maps from VGGT for efficient and accurate semantic entity matching. Additionally, we utilize Vision-Language Models (VLMs) to further identify and preserve the large static regions in the scene. We also demonstrate how these priors can be seamlessly integrated into existing distractor-free 3DGS methods. Extensive experiments confirm the effectiveness and robustness of our approach in mitigating transient distractors for sparse-view 3DGS training.\n\n三维高斯喷溅能够在静态环境中实现高效训练和快速新视角合成。为了解决瞬时物体带来的问题，去干扰的 3DGS 方法已经出现，并在稠密图像采集条件下展现出良好效果。然而，在稀疏输入条件下，这些方法的性能会明显下降。这一局限主要源于它们依赖颜色残差启发式来指导训练，而在观测有限时，这种信号会变得不可靠。本文提出一个框架，通过引入丰富先验信息，提升稀疏视角条件下去干扰 3DGS 的效果。具体而言，我们首先采用几何基础模型 VGGT 估计相机参数并生成稠密初始三维点。随后，我们利用 VGGT 的注意力图进行高效且准确的语义实体匹配。此外，我们还借助视觉语言模型进一步识别并保留场景中的大面积静态区域。我们同时展示了这些先验如何无缝集成到现有去干扰 3DGS 方法中。大量实验验证了该方法在缓解稀疏视角 3DGS 训练中的瞬时干扰方面具有良好的有效性与鲁棒性。\n"
  },
  {
    "path": "abs/2603.02129.md",
    "content": "### LiftAvatar: Kinematic-Space Completion for Expression-Controlled 3D Gaussian Avatar Animation\n\nWe present LiftAvatar, a new paradigm that completes sparse monocular observations in kinematic space (e.g., facial expressions and head pose) and uses the completed signals to drive high-fidelity avatar animation. LiftAvatar is a fine-grained, expression-controllable large-scale video diffusion Transformer that synthesizes high-quality, temporally coherent expression sequences conditioned on single or multiple reference images. The key idea is to lift incomplete input data into a richer kinematic representation, thereby strengthening both reconstruction and animation in downstream 3D avatar pipelines. To this end, we introduce (i) a multi-granularity expression control scheme that combines shading maps with expression coefficients for precise and stable driving, and (ii) a multi-reference conditioning mechanism that aggregates complementary cues from multiple frames, enabling strong 3D consistency and controllability. As a plug-and-play enhancer, LiftAvatar directly addresses the limited expressiveness and reconstruction artifacts of 3D Gaussian Splatting-based avatars caused by sparse kinematic cues in everyday monocular videos. By expanding incomplete observations into diverse pose-expression variations, LiftAvatar also enables effective prior distillation from large-scale video generative models into 3D pipelines, leading to substantial gains. Extensive experiments show that LiftAvatar consistently boosts animation quality and quantitative metrics of state-of-the-art 3D avatar methods, especially under extreme, unseen expressions.\n\n我们提出 LiftAvatar，一种新的范式：先在运动学空间中补全稀疏单目观测，例如面部表情和头部姿态，再利用补全后的信号驱动高保真化身动画。LiftAvatar 是一个细粒度、可控表情的大规模视频扩散 Transformer，可在单张或多张参考图像的条件下，生成高质量、时间一致的表情序列。其关键思想是将不完整输入提升到更丰富的运动学表示中，从而同时增强下游三维化身流程中的重建与动画表现。为此，我们引入两项设计：一是结合阴影图和表情系数的多粒度表情控制方案，实现精确且稳定的驱动；二是多参考条件机制，聚合多帧中的互补线索，以增强三维一致性和可控性。作为即插即用增强器，LiftAvatar 直接针对日常单目视频中由于运动学线索稀疏而导致的基于三维高斯喷溅化身在表达力不足和重建伪影方面的问题。通过将不完整观测扩展为多样化的姿态与表情变化，LiftAvatar 还可实现将大规模视频生成模型中的先验有效蒸馏到三维流程中，并带来显著性能提升。大量实验表明，特别是在极端、未见表情条件下，LiftAvatar 能持续提升现有最先进三维化身方法的动画质量和定量指标。\n"
  },
  {
    "path": "abs/2603.02134.md",
    "content": "### OnlineX: Unified Online 3D Reconstruction and Understanding with Active-to-Stable State Evolution\n\nRecent advances in generalizable 3D Gaussian Splatting (3DGS) have enabled rapid 3D scene reconstruction within seconds, eliminating the need for per-scene optimization. However, existing methods primarily follow an offline reconstruction paradigm, lacking the capacity for continuous reconstruction, which limits their applicability to online scenarios such as robotics and VR/AR. In this paper, we introduce OnlineX, a feed-forward framework that reconstructs both 3D visual appearance and language fields in an online manner using only streaming images. A key challenge in online formulation is the cumulative drift issue, which is rooted in the fundamental conflict between two opposing roles of the memory state: an active role that constantly refreshes to capture high-frequency local geometry, and a stable role that conservatively accumulates and preserves the long-term global structure. To address this, we introduce a decoupled active-to-stable state evolution paradigm. Our framework decouples the memory state into a dedicated active state and a persistent stable state, and then cohesively fuses the information from the former into the latter to achieve both fidelity and stability. Moreover, we jointly model visual appearance and language fields and incorporate an implicit Gaussian fusion module to enhance reconstruction quality. Experiments on mainstream datasets demonstrate that our method consistently outperforms prior work in novel view synthesis and semantic understanding, showcasing robust performance across input sequences of varying lengths with real-time inference speed.\n\n近期可泛化三维高斯喷溅的发展，已经能够在数秒内完成快速三维场景重建，无需逐场景优化。然而，现有方法主要遵循离线重建范式，缺乏持续重建能力，因此难以应用于机器人和 VR、AR 等在线场景。本文提出 OnlineX，一个仅凭流式图像在线重建三维视觉外观和语言场的前馈框架。在线建模中的一个关键挑战是累积漂移，其根源在于记忆状态承担了两种相互冲突的角色：一方面它需要保持活跃，以不断刷新并捕获高频局部几何；另一方面它又必须保持稳定，以保守地累积并保存长期全局结构。为解决这一问题，我们提出一种解耦的由活跃到稳定的状态演化范式。该框架将记忆状态拆分为专门的活跃状态和持久的稳定状态，再将前者的信息有机融合到后者中，从而同时实现高保真与稳定性。此外，我们联合建模视觉外观和语言场，并引入隐式高斯融合模块以提升重建质量。在主流数据集上的实验表明，我们的方法在新视角合成和语义理解任务上都持续优于已有工作，并在不同长度的输入序列上都表现出稳健的实时推理能力。\n"
  },
  {
    "path": "abs/2603.02548.md",
    "content": "### SemGS: Feed-Forward Semantic 3D Gaussian Splatting from Sparse Views for Generalizable Scene Understanding\n\nSemantic understanding of 3D scenes is essential for robots to operate effectively and safely in complex environments. Existing methods for semantic scene reconstruction and semantic-aware novel view synthesis often rely on dense multi-view inputs and require scene-specific optimization, limiting their practicality and scalability in real-world applications. To address these challenges, we propose SemGS, a feed-forward framework for reconstructing generalizable semantic fields from sparse image inputs. SemGS uses a dual-branch architecture to extract color and semantic features, where the two branches share shallow CNN layers, allowing semantic reasoning to leverage textural and structural cues in color appearance. We also incorporate a camera-aware attention mechanism into the feature extractor to explicitly model geometric relationships between camera viewpoints. The extracted features are decoded into dual-Gaussians that share geometric consistency while preserving branch-specific attributes, and further rasterized to synthesize semantic maps under novel viewpoints. Additionally, we introduce a regional smoothness loss to enhance semantic coherence. Experiments show that SemGS achieves state-of-the-art performance on benchmark datasets, while providing rapid inference and strong generalization capabilities across diverse synthetic and real-world scenarios.\n\n三维场景的语义理解对于机器人在复杂环境中高效、安全地运行至关重要。现有语义场景重建和语义感知新视角合成方法通常依赖稠密多视图输入，并需要针对每个场景单独优化，这限制了它们在真实应用中的实用性与可扩展性。为解决这些问题，我们提出 SemGS，一个从稀疏图像输入中重建可泛化语义场的前馈框架。SemGS 采用双分支结构分别提取颜色特征和语义特征，两条分支共享浅层卷积网络，使语义推理能够利用颜色外观中的纹理和结构线索。我们还在特征提取器中加入相机感知注意力机制，以显式建模不同相机视角之间的几何关系。提取得到的特征会被解码为双高斯表示，它们共享几何一致性，同时保留各分支特有属性，并进一步通过光栅化在新视角下合成语义图。此外，我们引入区域平滑损失来增强语义一致性。实验表明，SemGS 在基准数据集上实现了最先进性能，同时具有快速推理能力，并在多种合成和真实场景中表现出很强的泛化性。\n"
  },
  {
    "path": "abs/2603.02801.md",
    "content": "### R3GW: Relightable 3D Gaussians for Outdoor Scenes in the Wild\n\n3D Gaussian Splatting (3DGS) has established itself as a leading technique for 3D reconstruction and novel view synthesis of static scenes, achieving outstanding rendering quality and fast training. However, the method does not explicitly model the scene illumination, making it unsuitable for relighting tasks. Furthermore, 3DGS struggles to reconstruct scenes captured in the wild by unconstrained photo collections featuring changing lighting conditions. In this paper, we present R3GW, a novel method that learns a relightable 3DGS representation of an outdoor scene captured in the wild. Our approach separates the scene into a relightable foreground and a non-reflective background (the sky), using two distinct sets of Gaussians. R3GW models view-dependent lighting effects in the foreground reflections by combining Physically Based Rendering with the 3DGS scene representation in a varying illumination setting. We evaluate our method quantitatively and qualitatively on the NeRF-OSR dataset, offering state-of-the-art performance and enhanced support for physically-based relighting of unconstrained scenes. Our method synthesizes photorealistic novel views under arbitrary illumination conditions. Additionally, our representation of the sky mitigates depth reconstruction artifacts, improving rendering quality at the sky-foreground boundary\n\n三维高斯喷溅已经成为静态场景三维重建和新视角合成中的领先技术，具备优异的渲染质量和快速训练能力。然而，该方法并不显式建模场景光照，因此并不适合重光照任务。此外，面对来自自然拍摄、光照变化较大的无约束照片集合时，3DGS 也难以完成稳定重建。本文提出 R3GW，一种能够学习自然室外场景可重光照 3DGS 表示的新方法。我们将场景分解为可重光照的前景和不反光的背景，也就是天空，并分别使用两组不同的高斯来表示。R3GW 在变化光照条件下，将基于物理的渲染与 3DGS 场景表示结合，以建模前景反射中的视角相关光照效应。我们在 NeRF-OSR 数据集上进行了定量和定性评估，取得了最先进性能，并增强了对无约束场景进行基于物理的重光照支持。该方法能够在任意光照条件下合成照片级真实新视角。此外，我们对天空的表示还能缓解深度重建伪影，提升天空与前景边界处的渲染质量。\n"
  },
  {
    "path": "abs/2603.02866.md",
    "content": "### Multimodal-Prior-Guided Importance Sampling for Hierarchical Gaussian Splatting in Sparse-View Novel View Synthesis\n\nWe present multimodal-prior-guided importance sampling as the central mechanism for hierarchical 3D Gaussian Splatting (3DGS) in sparse-view novel view synthesis. Our sampler fuses complementary cues { -- } photometric rendering residuals, semantic priors, and geometric priors { -- } to produce a robust, local recoverability estimate that directly drives where to inject fine Gaussians. Built around this sampling core, our framework comprises (1) a coarse-to-fine Gaussian representation that encodes global shape with a stable coarse layer and selectively adds fine primitives where the multimodal metric indicates recoverable detail; and (2) a geometric-aware sampling and retention policy that concentrates refinement on geometrically critical and complex regions while protecting newly added primitives in underconstrained areas from premature pruning. By prioritizing regions supported by consistent multimodal evidence rather than raw residuals alone, our method alleviates overfitting texture-induced errors and suppresses noise from pose/appearance inconsistencies. Experiments on diverse sparse-view benchmarks demonstrate state-of-the-art reconstructions, with up to +0.3 dB PSNR on DTU.\n\n我们提出一种由多模态先验引导的重要性采样机制，作为稀疏视角新视角合成中层次化三维高斯喷溅的核心。采样器融合了多种互补线索，包括光度渲染残差、语义先验和几何先验，从而得到一个稳健的局部可恢复性估计，并据此直接决定在哪里注入更细粒度的高斯。围绕这一采样核心，我们的框架包含两个部分：一是由粗到细的高斯表示，用稳定的粗层编码全局形状，并仅在多模态指标表明细节可恢复的位置选择性加入精细基元；二是几何感知的采样与保留策略，将细化集中在几何关键且复杂的区域，同时保护欠约束区域中新加入的基元，避免其被过早剪枝。通过优先关注由一致多模态证据支持的区域，而不是单纯依赖残差，我们的方法能够缓解由纹理引起的过拟合错误，并抑制来自位姿和外观不一致的噪声。在多个稀疏视角基准上的实验表明，我们的方法实现了最先进的重建效果，在 DTU 上 PSNR 最高提升 0.3 dB。\n"
  },
  {
    "path": "abs/2603.02887.md",
    "content": "### Generalized non-exponential Gaussian splatting\n\nIn this work we generalize 3D Gaussian splatting (3DGS) to a wider family of physically-based alpha-blending operators. 3DGS has become the standard de-facto for radiance field rendering and reconstruction, given its flexibility and efficiency. At its core, it is based on alpha-blending sorted semitransparent primitives, which in the limit converges to the classic radiative transfer function with exponential transmittance. Inspired by recent research on non-exponential radiative transfer, we generalize the image formation model of 3DGS to non-exponential regimes. Based on this generalization, we use a quadratic transmittance to define sub-linear, linear, and super-linear versions of 3DGS, which exhibit faster-than-exponential decay. We demonstrate that these new non-exponential variants achieve similar quality than the original 3DGS but significantly reduce the number of overdraws, which result on speed-ups of up to $4\\times$ in complex real-world captures, on a ray-tracing-based renderer.\n\n本文将三维高斯喷溅推广到更广泛的一类基于物理的 alpha 混合算子中。由于具有灵活性和高效率，3DGS 已成为辐射场渲染与重建事实上的标准方法。其核心是对按深度排序的半透明基元进行 alpha 混合，在极限情况下可收敛到具有指数透射率的经典辐射传输函数。受近期非指数辐射传输研究启发，我们将 3DGS 的成像模型推广到非指数机制下。在这一推广基础上，我们采用二次透射率定义了亚线性、线性和超线性版本的 3DGS，它们都表现出比指数衰减更快的衰减行为。实验表明，这些新的非指数变体在质量上与原始 3DGS 相近，但能显著减少过绘次数，从而在基于光线追踪的渲染器上对复杂真实场景采集实现最高 4 倍的加速。\n"
  },
  {
    "path": "abs/2603.02893.md",
    "content": "### Intrinsic Geometry-Appearance Consistency Optimization for Sparse-View Gaussian Splatting\n\n3D human reconstruction from a single image is a challenging problem and has been exclusively studied in the literature. Recently, some methods have resorted to diffusion models for guidance, optimizing a 3D representation via Score Distillation Sampling(SDS) or generating a back-view image for facilitating reconstruction. However, these methods tend to produce unsatisfactory artifacts (\\textit{e.g.} flattened human structure or over-smoothing results caused by inconsistent priors from multiple views) and struggle with real-world generalization in the wild. In this work, we present \\emph{MVD-HuGaS}, enabling free-view 3D human rendering from a single image via a multi-view human diffusion model. We first generate multi-view images from the single reference image with an enhanced multi-view diffusion model, which is well fine-tuned on high-quality 3D human datasets to incorporate 3D geometry priors and human structure priors. To infer accurate camera poses from the sparse generated multi-view images for reconstruction, an alignment module is introduced to facilitate joint optimization of 3D Gaussians and camera poses. Furthermore, we propose a depth-based Facial Distortion Mitigation module to refine the generated facial regions, thereby improving the overall fidelity of the reconstruction. Finally, leveraging the refined multi-view images, along with their accurate camera poses, MVD-HuGaS optimizes the 3D Gaussians of the target human for high-fidelity free-view renderings. Extensive experiments on Thuman2.0 and 2K2K datasets show that the proposed MVD-HuGaS achieves state-of-the-art performance on single-view 3D human rendering.\n\n从单张图像进行三维人体重建一直是一个具有挑战性的问题，相关研究长期受到关注。近年来，一些方法借助扩散模型进行引导，通过得分蒸馏采样优化三维表示，或生成背视图图像以辅助重建。然而，这些方法往往会产生不理想的伪影，例如由于多视图先验不一致导致的人体结构扁平化或结果过度平滑，并且在真实自然场景中的泛化能力有限。为此，我们提出 MVD-HuGaS，通过多视图人体扩散模型，实现从单张图像进行自由视角三维人体渲染。我们首先利用增强的多视图扩散模型，从单一参考图像生成多视图图像。该模型在高质量三维人体数据集上进行了充分微调，以融入三维几何先验和人体结构先验。为从稀疏生成的多视图图像中推断准确相机位姿以支持后续重建，我们引入一个对齐模块，以促进三维高斯与相机位姿的联合优化。进一步地，我们提出基于深度的人脸畸变缓解模块，对生成人脸区域进行细化，从而提升整体重建保真度。最终，借助这些细化后的多视图图像及其准确位姿，MVD-HuGaS 对目标人体的三维高斯进行优化，以获得高保真的自由视角渲染。在 Thuman2.0 和 2K2K 数据集上的大量实验表明，MVD-HuGaS 在单视图三维人体渲染上达到了最先进性能。\n"
  },
  {
    "path": "abs/2603.02910.md",
    "content": "### Articulation in Motion: Prior-free Part Mobility Analysis for Articulated Objects By Dynamic-Static Disentanglement\n\nArticulated objects are ubiquitous in daily life. Our goal is to achieve a high-quality reconstruction, segmentation of independent moving parts, and analysis of articulation. Recent methods analyse two different articulation states and perform per-point part segmentation, optimising per-part articulation using cross-state correspondences, given a priori knowledge of the number of parts. Such assumptions greatly limit their applications and performance. Their robustness is reduced when objects cannot be clearly visible in both states. To address these issues, in this paper, we present a new framework, Articulation in Motion (AiM). We infer part-level decomposition, articulation kinematics, and reconstruct an interactive 3D digital replica from a user-object interaction video and a start-state scan. We propose a dual-Gaussian scene representation that is learned from an initial 3DGS scan of the object and a video that shows the movement of separate parts. It uses motion cues to segment the object into parts and assign articulation joints. Subsequently, a robust, sequential RANSAC is employed to achieve part mobility analysis without any part-level structural priors, which clusters moving primitives into rigid parts and estimates kinematics while automatically determining the number of parts. The proposed approach separates the object into parts, each represented as a 3D Gaussian set, enabling high-quality rendering. Our approach yields higher quality part segmentation than previous methods, without prior knowledge. Extensive experimental analysis on both simple and complex objects validates the effectiveness and strong generalisation ability of our approach. Project page: https://haoai-1997.github.io/AiM/.\n\n可动关节物体在日常生活中无处不在。我们的目标是实现高质量重建、独立运动部件分割以及关节运动分析。近期方法通常分析两个不同的关节状态，在已知部件数量的前提下，先进行逐点部件分割，再利用跨状态对应关系优化各部件的关节运动。这些假设极大限制了它们的适用范围和性能，当物体在两个状态下无法都被清晰观察到时，其鲁棒性也会明显下降。为解决这些问题，本文提出 Articulation in Motion，也就是 AiM 框架。我们从用户与物体交互的视频及一个起始状态扫描中，推断部件级分解、关节运动学，并重建可交互的三维数字副本。我们提出双高斯场景表示，基于物体的初始 3DGS 扫描和展示各独立部件运动的视频进行学习。该表示利用运动线索将物体分割为不同部件，并为其分配关节。随后，我们采用稳健的顺序式 RANSAC，在无需任何部件级结构先验的情况下完成部件运动分析，将运动基元聚类为刚体部件并估计其运动学，同时自动确定部件数量。所提出方法将物体拆分为多个部件，每个部件都以一组三维高斯表示，从而支持高质量渲染。与以往方法相比，我们的方法在无需先验知识的情况下可获得更高质量的部件分割。针对简单和复杂物体的大量实验分析验证了该方法的有效性和较强的泛化能力。项目页面见 https://haoai-1997.github.io/AiM/。\n"
  },
  {
    "path": "abs/2603.02986.md",
    "content": "### VIRGi: View-dependent Instant Recoloring of 3D Gaussians Splats\n\n3D Gaussian Splatting (3DGS) has recently transformed the fields of novel view synthesis and 3D reconstruction due to its ability to accurately model complex 3D scenes and its unprecedented rendering performance. However, a significant challenge persists: the absence of an efficient and photorealistic method for editing the appearance of the scene's content. In this paper we introduce VIRGi, a novel approach for rapidly editing the color of scenes modeled by 3DGS while preserving view-dependent effects such as specular highlights. Key to our method are a novel architecture that separates color into diffuse and view-dependent components, and a multi-view training strategy that integrates image patches from multiple viewpoints. Improving over the conventional single-view batch training, our 3DGS representation provides more accurate reconstruction and serves as a solid representation for the recoloring task. For 3DGS recoloring, we then introduce a rapid scheme requiring only one manually edited image of the scene from the end-user. By fine-tuning the weights of a single MLP, alongside a module for single-shot segmentation of the editable area, the color edits are seamlessly propagated to the entire scene in just two seconds, facilitating real-time interaction and providing control over the strength of the view-dependent effects. An exhaustive validation on diverse datasets demonstrates significant quantitative and qualitative advancements over competitors based on Neural Radiance Fields representations.\n\n三维高斯喷溅凭借对复杂三维场景的准确建模能力和前所未有的渲染效率，近期改变了新视角合成和三维重建领域。然而，一个重要挑战仍未解决：缺乏高效且具有照片级真实感的场景外观编辑方法。本文提出 VIRGi，一种能够快速编辑 3DGS 场景颜色，同时保留镜面高光等视角相关效应的新方法。该方法的关键在于一种新的网络结构，它将颜色分解为漫反射成分和视角相关成分，以及一种结合多视角图像块的训练策略。相较传统单视图批训练，我们的 3DGS 表示能够得到更准确的重建，并为后续重着色任务提供稳固基础。随后，在 3DGS 重着色阶段，我们提出一种快速方案，只需终端用户手动编辑一张场景图像。通过微调单个多层感知机的权重，并结合一个单次分割可编辑区域的模块，就能在两秒内将颜色编辑无缝传播到整个场景，实现实时交互，并允许用户控制视角相关效应的强弱。在多个不同数据集上的充分验证表明，该方法在定量和定性上都显著优于基于神经辐射场表示的竞争方法。\n"
  },
  {
    "path": "abs/2603.03602.md",
    "content": "### DM-CFO: A Diffusion Model for Compositional 3D Tooth Generation with Collision-Free Optimization\n\nThe automatic design of a 3D tooth model plays a crucial role in dental digitization. However, current approaches face challenges in compositional 3D tooth generation because both the layouts and shapes of missing teeth need to be optimized.In addition, collision conflicts are often omitted in 3D Gaussian-based compositional 3D generation, where objects may intersect with each other due to the absence of explicit geometric information on the object surfaces. Motivated by graph generation through diffusion models and collision detection using 3D Gaussians, we propose an approach named DM-CFO for compositional tooth generation, where the layout of missing teeth is progressively restored during the denoising phase under both text and graph constraints. Then, the Gaussian parameters of each layout-guided tooth and the entire jaw are alternately updated using score distillation sampling (SDS). Furthermore, a regularization term based on the distances between the 3D Gaussians of neighboring teeth and the anchor tooth is introduced to penalize tooth intersections. Experimental results on three tooth-design datasets demonstrate that our approach significantly improves the multiview consistency and realism of the generated teeth compared with existing methods. Project page: https://amateurc.github.io/CF-3DTeeth/.\n\n三维牙齿模型的自动设计在牙科数字化中发挥着关键作用。然而，当前组合式三维牙齿生成方法面临挑战，因为缺失牙齿的布局和形状都需要联合优化。此外，在基于三维高斯的组合式三维生成中，由于缺乏物体表面的显式几何信息，常常忽视碰撞冲突，导致生成物体彼此相交。受扩散模型图生成和三维高斯碰撞检测的启发，我们提出 DM-CFO，一种用于组合式牙齿生成的方法。在该方法中，缺失牙齿的布局会在去噪阶段结合文本和图约束被逐步恢复。随后，我们利用得分蒸馏采样，交替更新由布局引导的每颗牙齿及整个牙弓的高斯参数。进一步地，我们引入一种基于相邻牙齿与锚点牙齿之间三维高斯距离的正则项，用于惩罚牙齿间相交。三个牙齿设计数据集上的实验结果表明，相较现有方法，我们的方法显著提升了生成牙齿的多视图一致性和真实感。项目页面见 https://amateurc.github.io/CF-3DTeeth/。\n"
  },
  {
    "path": "abs/2603.04254.md",
    "content": "### EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding\n\nUnderstanding a 3D scene immediately with its exploration is essential for embodied tasks, where an agent must construct and comprehend the 3D scene in an online and nearly real-time manner. In this study, we propose EmbodiedSplat, an online feed-forward 3DGS for open-vocabulary scene understanding that enables simultaneous online 3D reconstruction and 3D semantic understanding from the streaming images. Unlike existing open-vocabulary 3DGS methods which are typically restricted to either offline or per-scene optimization setting, our objectives are two-fold: 1) Reconstructs the semantic-embedded 3DGS of the entire scene from over 300 streaming images in an online manner. 2) Highly generalizable to novel scenes with feed-forward design and supports nearly real-time 3D semantic reconstruction when combined with real-time 2D models. To achieve these objectives, we propose an Online Sparse Coefficients Field with a CLIP Global Codebook where it binds the 2D CLIP embeddings to each 3D Gaussian while minimizing memory consumption and preserving the full semantic generalizability of CLIP. Furthermore, we generate 3D geometric-aware CLIP features by aggregating the partial point cloud of 3DGS through 3D U-Net to compensate the 3D geometric prior to 2D-oriented language embeddings. Extensive experiments on diverse indoor datasets, including ScanNet, ScanNet++, and Replica, demonstrate both the effectiveness and efficiency of our method. Check out our project page in https://0nandon.github.io/EmbodiedSplat/.\n\n在具身任务中，伴随探索过程即时理解三维场景至关重要，因为智能体必须以在线、近实时的方式构建并理解三维场景。本文提出 EmbodiedSplat，一个面向开放词汇场景理解的在线前馈式 3DGS 框架，可从流式图像中同时实现在线三维重建和三维语义理解。不同于现有开放词汇 3DGS 方法通常局限于离线或逐场景优化设定，我们的目标有两点：一是能够从 300 多张流式图像中在线重建整个场景的语义嵌入 3DGS；二是通过前馈设计具备对新场景的强泛化能力，并在与实时二维模型结合时支持近实时三维语义重建。为实现这一目标，我们提出带有 CLIP 全局码本的在线稀疏系数字段，在最大限度降低内存消耗的同时，将二维 CLIP 嵌入绑定到每个三维高斯上，并完整保留 CLIP 的语义泛化能力。进一步地，我们通过三维 U-Net 聚合 3DGS 的局部点云，生成具备三维几何感知能力的 CLIP 特征，以弥补二维导向语言嵌入缺乏三维几何先验的问题。在包括 ScanNet、ScanNet++ 和 Replica 在内的多种室内数据集上的大量实验表明，我们的方法在效果和效率上都表现突出。项目页面见 https://0nandon.github.io/EmbodiedSplat/。\n"
  },
  {
    "path": "abs/2603.04290.md",
    "content": "### Gaussian Wardrobe: Compositional 3D Gaussian Avatars for Free-Form Virtual Try-On\n\nWe introduce Gaussian Wardrobe, a novel framework to digitalize compositional 3D neural avatars from multi-view videos. Existing methods for 3D neural avatars typically treat the human body and clothing as an inseparable entity. However, this paradigm fails to capture the dynamics of complex free-form garments and limits the reuse of clothing across different individuals. To overcome these problems, we develop a novel, compositional 3D Gaussian representation to build avatars from multiple layers of free-form garments. The core of our method is decomposing neural avatars into bodies and layers of shape-agnostic neural garments. To achieve this, our framework learns to disentangle each garment layer from multi-view videos and canonicalizes it into a shape-independent space. In experiments, our method models photorealistic avatars with high-fidelity dynamics, achieving new state-of-the-art performance on novel pose synthesis benchmarks. In addition, we demonstrate that the learned compositional garments contribute to a versatile digital wardrobe, enabling a practical virtual try-on application where clothing can be freely transferred to new subjects. Project page: https://ait.ethz.ch/gaussianwardrobe\n\n我们提出 Gaussian Wardrobe，一个从多视图视频中数字化组合式三维神经化身的新框架。现有三维神经化身方法通常将人体和服装视为不可分割的整体。然而，这一范式无法刻画复杂自由形态服装的动态，也限制了服装在不同人物之间的复用。为了解决这些问题，我们开发了一种新的组合式三维高斯表示，用于由多层自由形态服装构建化身。该方法的核心是将神经化身分解为人体以及多层与具体体型无关的神经服装。为实现这一点，我们的框架从多视图视频中学习将每一层服装解耦出来，并将其规范化到与体型无关的空间中。实验中，我们的方法能够建模具备高保真动态的照片级真实化身，并在新姿态合成基准上取得新的最先进性能。此外，我们还展示了学习得到的组合式服装如何构成一个灵活的数字衣橱，使服装能够自由迁移到新人物身上，从而支持实用的虚拟试衣应用。项目页面见 https://ait.ethz.ch/gaussianwardrobe。\n"
  },
  {
    "path": "abs/2603.04770.md",
    "content": "### DSA-SRGS: Super-Resolution Gaussian Splatting for Dynamic Sparse-View DSA Reconstruction\n\nDigital subtraction angiography (DSA) is a key imaging technique for the auxiliary diagnosis and treatment of cerebrovascular diseases. Recent advancements in gaussian splatting and dynamic neural representations have enabled robust 3D vessel reconstruction from sparse dynamic inputs. However, these methods are fundamentally constrained by the resolution of input projections, where performing naive upsampling to enhance rendering resolution inevitably results in severe blurring and aliasing artifacts. Such lack of super-resolution capability prevents the reconstructed 4D models from recovering fine-grained vascular details and intricate branching structures, which restricts their application in precision diagnosis and treatment. To solve this problem, this paper proposes DSA-SRGS, the first super-resolution gaussian splatting framework for dynamic sparse-view DSA reconstruction. Specifically, we introduce a Multi-Fidelity Texture Learning Module that integrates high-quality priors from a fine-tuned DSA-specific super-resolution model, into the 4D reconstruction optimization. To mitigate potential hallucination artifacts from pseudo-labels, this module employs a Confidence-Aware Strategy to adaptively weight supervision signals between the original low-resolution projections and the generated high-resolution pseudo-labels. Furthermore, we develop Radiative Sub-Pixel Densification, an adaptive strategy that leverages gradient accumulation from high-resolution sub-pixel sampling to refine the 4D radiative gaussian kernels. Extensive experiments on two clinical DSA datasets demonstrate that DSA-SRGS significantly outperforms state-of-the-art methods in both quantitative metrics and qualitative visual fidelity.\n\n数字减影血管造影（DSA）是脑血管疾病辅助诊断与治疗中的关键成像技术。近年来，gaussian splatting 与动态神经表示的发展，使得从稀疏动态输入中稳健地重建三维血管成为可能。然而，这类方法从根本上受限于输入投影视图的分辨率；若直接采用朴素上采样来提升渲染分辨率，往往会不可避免地引入严重的模糊和混叠伪影。缺乏超分辨能力，使得重建得到的四维模型难以恢复细粒度血管细节和复杂分支结构，从而限制了其在精准诊疗中的应用。为解决这一问题，本文提出 DSA-SRGS，这是首个面向动态稀疏视角 DSA 重建的超分辨 gaussian splatting 框架。具体而言，我们引入多保真纹理学习模块，将经过微调的 DSA 专用超分辨模型所提供的高质量先验融入四维重建优化过程中。为减轻伪标签可能带来的幻觉伪影，该模块采用一种置信度感知策略，在原始低分辨率投影与生成的高分辨率伪标签之间自适应地分配监督权重。此外，我们还提出辐射子像素增密策略，通过利用高分辨率子像素采样带来的梯度累积来细化四维辐射高斯核。在两个临床 DSA 数据集上的大量实验表明，DSA-SRGS 在定量指标和定性视觉保真度方面都显著优于现有最先进方法。\n"
  },
  {
    "path": "abs/2603.04847.md",
    "content": "### GloSplat: Joint Pose-Appearance Optimization for Faster and More Accurate 3D Reconstruction\n\nFeature extraction, matching, structure from motion (SfM), and novel view synthesis (NVS) have traditionally been treated as separate problems with independent optimization objectives. We present GloSplat, a framework that performs \\emph{joint pose-appearance optimization} during 3D Gaussian Splatting training. Unlike prior joint optimization methods (BARF, NeRF--, 3RGS) that rely purely on photometric gradients for pose refinement, GloSplat preserves \\emph{explicit SfM feature tracks} as first-class entities throughout training: track 3D points are maintained as separate optimizable parameters from Gaussian primitives, providing persistent geometric anchors via a reprojection loss that operates alongside photometric supervision. This architectural choice prevents early-stage pose drift while enabling fine-grained refinement -- a capability absent in photometric-only approaches. We introduce two pipeline variants: (1) \\textbf{GloSplat-F}, a COLMAP-free variant using retrieval-based pair selection for efficient reconstruction, and (2) \\textbf{GloSplat-A}, an exhaustive matching variant for maximum quality. Both employ global SfM initialization followed by joint photometric-geometric optimization during 3DGS training. Experiments demonstrate that GloSplat-F achieves state-of-the-art among COLMAP-free methods while GloSplat-A surpasses all COLMAP-based baselines.\n\n特征提取、匹配、运动恢复结构和新视角合成传统上都被视为相互独立、各自优化的问题。我们提出 GloSplat，一个在三维高斯喷溅训练过程中执行联合位姿与外观优化的框架。与以往仅依赖光度梯度进行位姿细化的联合优化方法不同，GloSplat 在整个训练过程中保留显式的运动恢复结构特征轨迹，将轨迹三维点作为与高斯基元分离的可优化参数，通过与光度监督并行的重投影损失持续提供几何锚点。这样的架构选择能够防止早期位姿漂移，同时支持细粒度细化，而这正是仅依赖光度的方法所缺失的。我们提出两个流程变体：一是 GloSplat-F，一个不依赖 COLMAP 的变体，通过基于检索的图像对选择实现高效重建；二是 GloSplat-A，一个穷举匹配变体，用于追求最高质量。两者都采用全局运动恢复结构初始化，并在 3DGS 训练中联合进行光度与几何优化。实验表明，GloSplat-F 在无 COLMAP 方法中达到了最先进性能，而 GloSplat-A 则超过了所有基于 COLMAP 的基线。\n"
  },
  {
    "path": "abs/2603.05108.md",
    "content": "### GaussTwin: Unified Simulation and Correction with Gaussian Splatting for Robotic Digital Twins\n\nDigital twins promise to enhance robotic manipulation by maintaining a consistent link between real-world perception and simulation. However, most existing systems struggle with the lack of a unified model, complex dynamic interactions, and the real-to-sim gap, which limits downstream applications such as model predictive control. Thus, we propose GaussTwin, a real-time digital twin that combines position-based dynamics with discrete Cosserat rod formulations for physically grounded simulation, and Gaussian splatting for efficient rendering and visual correction. By anchoring Gaussians to physical primitives and enforcing coherent SE(3) updates driven by photometric error and segmentation masks, GaussTwin achieves stable prediction-correction while preserving physical fidelity. Through experiments in both simulation and on a Franka Research 3 platform, we show that GaussTwin consistently improves tracking accuracy and robustness compared to shape-matching and rigid-only baselines, while also enabling downstream tasks such as push-based planning. These results highlight GaussTwin as a step toward unified, physically meaningful digital twins that can support closed-loop robotic interaction and learning.\n\n数字孪生有望通过维持真实世界感知与仿真之间的一致联系来提升机器人操作能力。然而，现有系统大多受限于缺乏统一模型、复杂动态交互以及真实到仿真的差距，从而限制了模型预测控制等下游应用。为此，我们提出 GaussTwin，一个实时数字孪生系统，将基于位置的动力学与离散 Cosserat 杆公式结合起来，实现具备物理依据的仿真，并结合 Gaussian splatting 进行高效渲染与视觉校正。通过将高斯锚定到物理基元，并利用由光度误差和分割掩码驱动的一致 SE(3) 更新，GaussTwin 在保持物理保真度的同时实现了稳定的预测-校正过程。仿真环境和 Franka Research 3 平台上的实验表明，与基于 shape matching 和仅刚体的方法相比，GaussTwin 在跟踪精度和鲁棒性上均有稳定提升，并能支持基于推动的规划等下游任务。这些结果表明，GaussTwin 是迈向统一且具有物理意义数字孪生的一步，可为闭环机器人交互与学习提供支撑。\n"
  },
  {
    "path": "abs/2603.05152.md",
    "content": "### SSR-GS: Separating Specular Reflection in Gaussian Splatting for Glossy Surface Reconstruction\n\nIn recent years, 3D Gaussian splatting (3DGS) has achieved remarkable progress in novel view synthesis. However, accurately reconstructing glossy surfaces under complex illumination remains challenging, particularly in scenes with strong specular reflections and multi-surface interreflections. To address this issue, we propose SSR-GS, a specular reflection modeling framework for glossy surface reconstruction. Specifically, we introduce a prefiltered Mip-Cubemap to model direct specular reflections efficiently, and propose an IndiASG module to capture indirect specular reflections. Furthermore, we design Visual Geometry Priors (VGP) that couple a reflection-aware visual prior via a reflection score (RS) to downweight the photometric loss contribution of reflection-dominated regions, with geometry priors derived from VGGT, including progressively decayed depth supervision and transformed normal constraints. Extensive experiments on both synthetic and real-world datasets demonstrate that SSR-GS achieves state-of-the-art performance in glossy surface reconstruction.\n\n近年来，三维高斯喷溅在新视角合成方面取得了显著进展。然而，在复杂光照条件下准确重建高光表面仍然困难，尤其是在存在强镜面反射和多表面间反射的场景中。为解决这一问题，我们提出 SSR-GS，一个用于高光表面重建的镜面反射建模框架。具体而言，我们引入预滤波的 Mip-Cubemap 来高效建模直接镜面反射，并提出 IndiASG 模块来捕捉间接镜面反射。进一步地，我们设计了视觉几何先验，将反射感知视觉先验通过反射分数耦合进来，以降低反射主导区域对光度损失的影响；同时结合来自 VGGT 的几何先验，包括逐步衰减的深度监督和变换法线约束。在合成和真实数据集上的大量实验表明，SSR-GS 在高光表面重建任务上达到了最先进性能。\n"
  },
  {
    "path": "abs/2603.05845.md",
    "content": "### Cog2Gen3D: Sculpturing 3D Semantic-Geometric Cognition for 3D Generation\n\nGenerative models have achieved success in producing semantically plausible 2D images, but it remains challenging in 3D generation due to the absence of spatial geometry constraints. Typically, existing methods utilize geometric features as conditions to enhance spatial awareness. However, these methods can only model relative relationships and are prone to scale inconsistency of absolute geometry. Thus, we argue that semantic information and absolute geometry empower 3D cognition, thereby enabling controllable 3D generation for the physical world. In this work, we propose Cog2Gen3D, a 3D cognition-guided diffusion framework for 3D generation. Our model is guided by three key designs: 1) Cognitive Feature Embeddings. We encode different modalities into semantic and geometric representations and further extract logical representations. 2) 3D Latent Cognition Graph. We structure different representations into dual-stream semantic-geometric graphs and fuse them via common-based cross-attention to obtain a 3D cognition graph. 3) Cognition-Guided Latent Diffusion. We leverage the fused 3D cognition graph as the condition to guide the latent diffusion process for 3D Gaussian generation. Under this unified framework, the 3D cognition graph ensures the physical plausibility and structural rationality of 3D generation. Moreover, we construct a validation subset based on the Marble World Labs. Extensive experiments demonstrate that our Cog2Gen3D significantly outperforms existing methods in both semantic fidelity and geometric plausibility.\n\n生成模型已经能够成功生成语义合理的二维图像，但由于缺乏空间几何约束，三维生成仍然具有挑战性。现有方法通常利用几何特征作为条件来增强空间感知，但这类方法只能建模相对关系，并容易出现绝对几何尺度不一致的问题。因此，我们认为语义信息和绝对几何共同构成了三维认知，使可控的物理世界三维生成成为可能。为此，我们提出 Cog2Gen3D，一个由三维认知引导的扩散式三维生成框架。该模型由三项关键设计支撑：一是认知特征嵌入，我们将不同模态编码为语义和几何表示，并进一步提取逻辑表示；二是三维潜在认知图，我们将不同表示组织为双流的语义几何图，并通过基于共性的交叉注意力进行融合，得到三维认知图；三是认知引导的潜在扩散，我们利用融合后的三维认知图作为条件，引导潜在扩散过程生成三维高斯。在这一统一框架下，三维认知图保证了三维生成在物理上合理、结构上可信。与此同时，我们还基于 Marble World Labs 构建了一个验证子集。大量实验表明，Cog2Gen3D 在语义保真度和几何合理性上都显著优于现有方法。\n"
  },
  {
    "path": "abs/2603.05882.md",
    "content": "### CylinderSplat: 3D Gaussian Splatting with Cylindrical Triplanes for Panoramic Novel View Synthesis\n\nFeed-forward 3D Gaussian Splatting (3DGS) has shown great promise for real-time novel view synthesis, but its application to panoramic imagery remains challenging. Existing methods often rely on multi-view cost volumes for geometric refinement, which struggle to resolve occlusions in sparse-view scenarios. Furthermore, standard volumetric representations like Cartesian Triplanes are poor in capturing the inherent geometry of $360^\\circ$ scenes, leading to distortion and aliasing. In this work, we introduce CylinderSplat, a feed-forward framework for panoramic 3DGS that addresses these limitations. The core of our method is a new {cylindrical Triplane} representation, which is better aligned with panoramic data and real-world structures adhering to the Manhattan-world assumption. We use a dual-branch architecture: a pixel-based branch reconstructs well-observed regions, while a volume-based branch leverages the cylindrical Triplane to complete occluded or sparsely-viewed areas. Our framework is designed to flexibly handle a variable number of input views, from single to multiple panoramas. Extensive experiments demonstrate that CylinderSplat achieves state-of-the-art results in both single-view and multi-view panoramic novel view synthesis, outperforming previous methods in both reconstruction quality and geometric accuracy.\n\n前馈式三维高斯喷溅在实时新视角合成中展现出巨大潜力，但将其应用于全景图像仍然具有挑战。现有方法通常依赖多视图代价体进行几何细化，而在稀疏视角场景中，这种方式难以解决遮挡问题。此外，笛卡尔 Triplane 等标准体表示也难以捕捉 360 度场景固有的几何结构，容易产生畸变和混叠。为此，我们提出 CylinderSplat，一个面向全景 3DGS 的前馈框架。该方法的核心是一种新的柱面 Triplane 表示，它与全景数据以及符合曼哈顿世界假设的真实结构更加匹配。我们采用双分支架构：像素分支负责重建观测充分的区域，而体表示分支则利用柱面 Triplane 补全被遮挡或观测稀疏的区域。该框架能够灵活处理从单张到多张全景图在内的不同数量输入视图。大量实验表明，CylinderSplat 在单视图和多视图全景新视角合成上都达到了最先进结果，在重建质量和几何精度方面均优于此前方法。\n"
  },
  {
    "path": "abs/2603.05932.md",
    "content": "### FTSplat: Feed-forward Triangle Splatting Network\n\nHigh-fidelity three-dimensional (3D) reconstruction is essential for robotics and simulation. While Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) achieve impressive rendering quality, their reliance on time-consuming per-scene optimization limits real-time deployment. Emerging feed-forward Gaussian splatting methods improve efficiency but often lack explicit, manifold geometry required for direct simulation. To address these limitations, we propose a feed-forward framework for triangle primitive generation that directly predicts continuous triangle surfaces from calibrated multi-view images. Our method produces simulation-ready models in a single forward pass, obviating the need for per-scene optimization or post-processing. We introduce a pixel-aligned triangle generation module and incorporate relative 3D point cloud supervision to enhance geometric learning stability and consistency. Experiments demonstrate that our method achieves efficient reconstruction while maintaining seamless compatibility with standard graphics and robotic simulators.\n\n高保真三维重建对于机器人和仿真至关重要。尽管神经辐射场和三维高斯喷溅能够获得出色的渲染质量，但它们依赖耗时的逐场景优化，限制了实时部署。新兴的前馈式高斯喷溅方法虽然提高了效率，却常常缺乏可直接用于仿真的显式流形几何。为解决这些限制，我们提出一个面向三角形基元生成的前馈框架，能够直接从已标定的多视图图像预测连续三角形表面。该方法通过单次前向传播生成可直接用于仿真的模型，无需逐场景优化或后处理。我们引入像素对齐的三角形生成模块，并结合相对三维点云监督，以增强几何学习的稳定性和一致性。实验表明，我们的方法在保持与标准图形和机器人模拟器无缝兼容的同时，实现了高效重建。\n"
  },
  {
    "path": "abs/2603.06061.md",
    "content": "### Transforming Omnidirectional RGB-LiDAR data into 3D Gaussian Splatting\n\nThe demand for large-scale digital twins is rapidly growing in robotics and autonomous driving. However, constructing these environments with 3D Gaussian Splatting (3DGS) usually requires expensive, purpose-built data collection. Meanwhile, deployed platforms routinely collect extensive omnidirectional RGB and LiDAR logs, but a significant portion of these sensor data is directly discarded or strictly underutilized due to transmission constraints and the lack of scalable reuse pipeline. In this paper, we present an omnidirectional RGB-LiDAR reuse pipeline that transforms these archived logs into robust initialization assets for 3DGS. Direct conversion of such raw logs introduces practical bottlenecks: inherent non-linear distortion leads to unreliable Structure-from-Motion (SfM) tracking, and dense, unorganized LiDAR clouds cause computational overhead during 3DGS optimization. To overcome these challenges, our pipeline strategically integrates an ERP-to-cubemap conversion module for deterministic spatial anchoring, alongside PRISM-a color stratified downsampling strategy. By bridging these multi-modal inputs via Fast Point Feature Histograms (FPFH) based global registration and Iterative Closest Point (ICP), our pipeline successfully repurposes a considerable fraction of discarded data into usable SfM geometry. Furthermore, our LiDAR-reinforced initialization consistently enhances the final 3DGS rendering fidelity in structurally complex scenes compared to vision-only baselines. Ultimately, this work provides a deterministic workflow for creating simulation-grade digital twins from standard archived sensor logs.\n\n机器人和自动驾驶领域对大规模数字孪生的需求正在迅速增长。然而，利用三维高斯喷溅构建这类环境通常需要昂贵且专门设计的数据采集。与此同时，实际部署平台会持续采集大量全向 RGB 和激光雷达日志，但由于传输限制和缺乏可扩展复用流程，这些传感器数据中有相当一部分被直接丢弃或严重闲置。本文提出一条全向 RGB 和激光雷达复用流程，可将这些存档日志转化为稳健的 3DGS 初始化资产。直接转换这类原始日志会引入实际瓶颈：固有的非线性畸变会导致运动恢复结构跟踪不可靠，而稠密、无组织的激光雷达点云又会在 3DGS 优化中带来较大计算开销。为解决这些问题，我们的流程结合 ERP 到立方体贴图转换模块以实现确定性的空间锚定，并配合 PRISM，一种按颜色分层的下采样策略。通过使用基于快速点特征直方图的全局配准和迭代最近点算法桥接多模态输入，我们的流程能够成功地将大量原本被丢弃的数据重新利用为可用于运动恢复结构的几何信息。进一步地，相较纯视觉基线，我们基于激光雷达增强的初始化在结构复杂场景中能持续提升最终 3DGS 的渲染保真度。总体来看，这项工作提供了一条从标准存档传感器日志创建仿真级数字孪生的确定性流程。\n"
  },
  {
    "path": "abs/2603.06210.md",
    "content": "### VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction\n\n3D semantic occupancy prediction has become a crucial perception task for comprehensive scene understanding in autonomous driving. While recent advances have explored 3D Gaussian splatting for occupancy modeling to substantially reduce computational overhead, the generation of high-quality 3D Gaussians relies heavily on accurate geometric cues, which are often insufficient in purely vision-centric paradigms. To bridge this gap, we advocate for injecting the strong geometric grounding capability from Vision Foundation Models (VFMs) into occupancy prediction. In this regard, we introduce Visual Geometry Grounded Gaussian Splatting (VG3S), a novel framework that empowers Gaussian-based occupancy prediction with cross-view 3D geometric grounding. Specifically, to fully exploit the rich 3D geometric priors from a frozen VFM, we propose a plug-and-play hierarchical geometric feature adapter, which can effectively transform generic VFM tokens via feature aggregation, task-specific alignment, and multi-scale restructuring. Extensive experiments on the nuScenes occupancy benchmark demonstrate that VG3S achieves remarkable improvements of 12.6% in IoU and 7.5% in mIoU over the baseline. Furthermore, we show that VG3S generalizes seamlessly across diverse VFMs, consistently enhancing occupancy prediction accuracy and firmly underscoring the immense value of integrating priors derived from powerful, pre-trained geometry-grounded VFMs.\n\n三维语义占据预测已成为自动驾驶中实现全面场景理解的关键感知任务。尽管近期有工作探索利用三维高斯喷溅进行占据建模，以显著降低计算开销，但高质量三维高斯的生成高度依赖准确的几何线索，而纯视觉范式下这类线索往往不足。为弥补这一缺口，我们主张将视觉基础模型强大的几何扎根能力注入占据预测中。基于这一思路，我们提出 VG3S，一个新的框架，通过跨视图三维几何扎根增强基于高斯的占据预测。具体来说，为了充分利用冻结视觉基础模型中丰富的三维几何先验，我们提出一种即插即用的层次化几何特征适配器，它能够通过特征聚合、任务特定对齐和多尺度重组，将通用视觉基础模型 token 有效转换为占据预测所需表示。在 nuScenes 占据预测基准上的大量实验表明，VG3S 相比基线在 IoU 和 mIoU 上分别提升了 12.6% 和 7.5%。此外，我们还证明了 VG3S 可以无缝泛化到不同视觉基础模型上，持续提升占据预测精度，并充分体现了将强大的几何感知预训练视觉基础模型先验融入该任务的巨大价值。\n"
  },
  {
    "path": "abs/2603.06216.md",
    "content": "### EntON: Eigenentropy-Optimized Neighborhood Densification in 3D Gaussian Splatting\n\nWe present a novel Eigenentropy-optimized neighboorhood densification strategy EntON in 3D Gaussian Splatting (3DGS) for geometrically accurate and high-quality rendered 3D reconstruction. While standard 3DGS produces Gaussians whose centers and surfaces are poorly aligned with the underlying object geometry, surface-focused reconstruction methods frequently sacrifice photometric accuracy. In contrast to the conventional densification strategy, which relies on the magnitude of the view-space position gradient, our approach introduces a geometry-aware strategy to guide adaptive splitting and pruning. Specifically, we compute the 3D shape feature Eigenentropy from the eigenvalues of the covariance matrix in the k-nearest neighborhood of each Gaussian center, which quantifies the local structural order. These Eigenentropy values are integrated into an alternating optimization framework: During the optimization process, the algorithm alternates between (i) standard gradient-based densification, which refines regions via view-space gradients, and (ii) Eigenentropy-aware densification, which preferentially densifies Gaussians in low-Eigenentropy (ordered, flat) neighborhoods to better capture fine geometric details on the object surface, and prunes those in high-Eigenentropy (disordered, spherical) regions. We provide quantitative and qualitative evaluations on two benchmark datasets: small-scale DTU dataset and large-scale TUM2TWIN dataset, covering man-made objects and urban scenes. Experiments demonstrate that our Eigenentropy-aware alternating densification strategy improves geometric accuracy by up to 33% and rendering quality by up to 7%, while reducing the number of Gaussians by up to 50% and training time by up to 23%. Overall, EnTON achieves a favorable balance between geometric accuracy, rendering quality and efficiency by avoiding unnecessary scene expansion.\n\n我们提出 EntON，一种基于特征熵优化的邻域增密策略，用于在三维高斯喷溅中实现几何准确且渲染质量高的三维重建。标准 3DGS 生成的高斯，其中心和表面往往与真实物体几何对齐较差；而以表面为重点的重建方法又常常牺牲光度精度。不同于传统依赖视空间位置梯度大小的增密策略，我们提出一种几何感知策略，用于指导自适应分裂与剪枝。具体而言，我们根据每个高斯中心的 k 近邻协方差矩阵特征值计算三维形状特征熵，用以量化局部结构有序程度。我们将这一特征熵整合进交替优化框架中：优化过程中，算法在两种模式之间切换，一种是标准的基于梯度的增密，通过视空间梯度细化区域；另一种是基于特征熵的增密，会优先在低特征熵，也就是更有序、更平坦的邻域中增密高斯，以更好捕捉物体表面的细粒度几何，同时在高特征熵，也就是更无序、更球状的区域中剪枝。我们在小规模 DTU 和大规模 TUM2TWIN 两个基准数据集上进行了定量和定性评估，覆盖人造物体和城市场景。实验表明，我们基于特征熵的交替增密策略最多可将几何精度提高 33%，渲染质量提高 7%，同时高斯数量最多减少 50%，训练时间最多减少 23%。总体而言，EntON 通过避免不必要的场景扩张，在几何精度、渲染质量和效率之间取得了良好平衡。\n"
  },
  {
    "path": "abs/2603.06852.md",
    "content": "### Active View Selection with Perturbed Gaussian Ensemble for Tomographic Reconstruction\n\nSparse-view computed tomography (CT) is critical for reducing radiation exposure to patients. Recent advances in radiative 3D Gaussian Splatting (3DGS) have enabled fast and accurate sparse-view CT reconstruction. Despite these algorithmic advancements, practical reconstruction fidelity remains fundamentally bounded by the quality of the captured data, raising the crucial yet underexplored problem of X-ray active view selection. Existing active view selection methods are primarily designed for natural-light scenes and fail to capture the unique geometric ambiguities and physical attenuation properties inherent in X-ray imaging. In this paper, we present Perturbed Gaussian Ensemble, an active view selection framework that integrates uncertainty modeling with sequential decision-making, tailored for X-ray Gaussian Splatting. Specifically, we identify low-density Gaussian primitives that are likely to be uncertain and apply stochastic density scaling to construct an ensemble of plausible Gaussian density fields. For each candidate projection, we measure the structural variance of the ensemble predictions and select the one with the highest variance as the next best view. Extensive experimental results on arbitrary-trajectory CT benchmarks demonstrate that our density-guided perturbation strategy effectively eliminates geometric artifacts and consistently outperforms existing baselines in progressive tomographic reconstruction under unified view selection protocols.\n\n稀疏视角计算机断层成像对于降低患者辐射暴露至关重要。近期辐射式三维高斯喷溅的发展已经使快速而准确的稀疏视角 CT 重建成为可能。然而，尽管算法不断进步，实际重建保真度仍从根本上受限于采集数据质量，这引出了一个关键但尚未被充分研究的问题，即 X 射线主动视角选择。现有主动视角选择方法主要面向自然光场景设计，无法捕捉 X 射线成像中独有的几何歧义和物理衰减特性。本文提出扰动高斯集成框架，一个结合不确定性建模和序列决策的主动视角选择方法，专门面向 X 射线高斯喷溅。具体而言，我们识别出可能具有较大不确定性的低密度高斯基元，并对其施加随机密度缩放，以构造一组合理的高斯密度场。对于每个候选投影，我们测量该集成预测结果的结构方差，并选择方差最大的投影作为下一最佳视角。在任意轨迹 CT 基准上的大量实验表明，我们基于密度引导的扰动策略能够有效消除几何伪影，并在统一视角选择协议下持续优于现有基线。\n"
  },
  {
    "path": "abs/2603.06860.md",
    "content": "### ColonSplat: Reconstruction of Peristaltic Motion in Colonoscopy with Dynamic Gaussian Splatting\n\nAccurate 3D reconstruction of colonoscopy data, accounting for complex peristaltic movements, is crucial for advanced surgical navigation and retrospective diagnostics. While recent novel view synthesis and 3D reconstruction methods have demonstrated remarkable success in general endoscopic scenarios, they struggle in the highly constrained environment of the colon. Due to the limited field of view of a camera moving through an actively deforming tubular structure, existing endoscopic methods reconstruct the colon appearance only for initial camera trajectory. However, the underlying anatomy remains largely static; instead of updating Gaussians' spatial coordinates (xyz), these methods encode deformation through either rotation, scale or opacity adjustments. In this paper, we first present a benchmark analysis of state-of-the-art dynamic endoscopic methods for realistic colonoscopic scenes, showing that they fail to model true anatomical motion. To enable rigorous evaluation of global reconstruction quality, we introduce DynamicColon, a synthetic dataset with ground-truth point clouds at every timestep. Building on these insights, we propose ColonSplat, a dynamic Gaussian Splatting framework that captures peristaltic-like motion while preserving global geometric consistency, achieving superior geometric fidelity on C3VDv2 and DynamicColon datasets. Project page: this https URL\n\n准确重建结肠镜数据中的三维结构，并考虑复杂的肠蠕动，对先进的手术导航和回顾性诊断都至关重要。尽管近期的新视角合成与三维重建方法在一般内窥镜场景中表现出色，但它们在结肠这一高度受限的环境中仍然面临困难。由于相机在一个主动变形的管状结构内移动，视野十分有限，现有内窥镜方法通常只能针对初始相机轨迹重建结肠外观。然而，底层解剖结构在很大程度上是静态的；这些方法并不更新高斯的空间坐标（xyz），而是通过旋转、尺度或不透明度调整来编码形变。本文首先对当前面向真实结肠镜场景的动态内窥镜方法进行了基准分析，表明它们无法建模真实的解剖运动。为实现对全局重建质量的严格评估，我们引入了 DynamicColon，这是一个在每个时间步都提供真实点云的合成数据集。在此基础上，我们提出 ColonSplat，这是一种动态 Gaussian Splatting 框架，能够在保持全局几何一致性的同时捕获类似肠蠕动的运动，并在 C3VDv2 和 DynamicColon 数据集上取得更优的几何保真度。项目页面：this https URL\n"
  },
  {
    "path": "abs/2603.06989.md",
    "content": "### MipSLAM: Alias-Free Gaussian Splatting SLAM\n\nThis paper introduces MipSLAM, a frequency-aware 3D Gaussian Splatting (3DGS) SLAM framework capable of high-fidelity anti-aliased novel view synthesis and robust pose estimation under varying camera configurations. Existing 3DGS-based SLAM systems often suffer from aliasing artifacts and trajectory drift due to inadequate filtering and purely spatial optimization. To overcome these limitations, we propose an Elliptical Adaptive Anti-aliasing (EAA) algorithm that approximates Gaussian contributions via geometry-aware numerical integration, avoiding costly analytic computation. Furthermore, we present a Spectral-Aware Pose Graph Optimization (SA-PGO) module that reformulates trajectory estimation in the frequency domain, effectively suppressing high-frequency noise and drift through graph Laplacian analysis. A novel local frequency-domain perceptual loss is also introduced to enhance fine-grained geometric detail recovery. Extensive evaluations on Replica and TUM datasets demonstrate that MipSLAM achieves state-of-the-art rendering quality and localization accuracy across multiple resolutions while maintaining real-time capability. Code is available at https://github.com/yzli1998/MipSLAM.\n\n本文提出 MipSLAM，一个具备频率感知能力的三维高斯喷溅 SLAM 框架，可在不同相机配置下实现高保真、无混叠的新视角合成和稳健位姿估计。现有基于 3DGS 的 SLAM 系统通常由于滤波不足和仅基于空间的优化而受到混叠伪影和轨迹漂移的影响。为克服这些限制，我们提出椭圆自适应抗混叠算法，通过几何感知的数值积分近似高斯贡献，避免昂贵的解析计算。进一步地，我们提出频谱感知位姿图优化模块，将轨迹估计重新表述到频域中，并通过图拉普拉斯分析有效抑制高频噪声和漂移。我们还引入一种新的局部频域感知损失，以增强细粒度几何细节恢复。在 Replica 和 TUM 数据集上的大量评测表明，MipSLAM 在保持实时能力的同时，在多种分辨率下实现了最先进的渲染质量和定位精度。代码见 https://github.com/yzli1998/MipSLAM。\n"
  },
  {
    "path": "abs/2603.07552.md",
    "content": "### ReconDrive: Fast Feed-Forward 4D Gaussian Splatting for Autonomous Driving Scene Reconstruction\n\nHigh-fidelity visual reconstruction and novel-view synthesis are essential for realistic closed-loop evaluation in autonomous driving. While 4D Gaussian Splatting (4DGS) offers a promising balance of accuracy and efficiency, existing per-scene optimization methods require costly iterative refinement, rendering them unscalable for extensive urban environments. Conversely, current feed-forward approaches often suffer from degraded photometric quality. To address these limitations, we propose ReconDrive, a feed-forward framework that leverages and extends the 3D foundation model VGGT for rapid, high-fidelity 4DGS generation. Our architecture introduces two core adaptations to tailor the foundation model to dynamic driving scenes: (1) Hybrid Gaussian Prediction Heads, which decouple the regression of spatial coordinates and appearance attributes to overcome the photometric deficiencies inherent in generalized foundation features; and (2) a Static-Dynamic 4D Composition strategy that explicitly captures temporal motion via velocity modeling to represent complex dynamic environments. Benchmarked on nuScenes, ReconDrive significantly outperforms existing feed-forward baselines in reconstruction, novel-view synthesis, and 3D perception. It achieves performance competitive with per-scene optimization while being orders of magnitude faster, providing a scalable and practical solution for realistic driving simulation.\n\n高保真视觉重建和新视角合成是实现自动驾驶真实闭环评测的关键。虽然四维 Gaussian Splatting（4DGS）在精度与效率之间提供了很有吸引力的平衡，但现有按场景逐一优化的方法需要代价高昂的迭代细化，因此难以扩展到大规模城市场景；相反，现有前馈式方法又常常出现光度质量下降的问题。为解决这些限制，我们提出 ReconDrive，这是一种前馈式框架，它利用并扩展三维基础模型 VGGT，以实现快速且高保真的 4DGS 生成。我们的架构针对动态驾驶场景引入了两个核心改动：1）混合高斯预测头，将空间坐标与外观属性的回归解耦，以缓解通用基础特征在光度建模上的缺陷；2）静态-动态四维组合策略，通过速度建模显式捕获时间运动，从而表示复杂动态环境。在 nuScenes 基准上，ReconDrive 在重建、新视角合成和三维感知方面都显著优于现有前馈基线。它在速度上比逐场景优化方法快了几个数量级，同时性能具有竞争力，为真实驾驶仿真提供了可扩展且实用的解决方案。\n"
  },
  {
    "path": "abs/2603.07587.md",
    "content": "### 3DGS-HPC: Distractor-free 3D Gaussian Splatting with Hybrid Patch-wise Classification\n\n3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in novel view synthesis and 3D scene reconstruction, yet its quality often degrades in real-world environments due to transient distractors, such as moving objects and varying shadows. Existing methods commonly rely on semantic cues extracted from pre-trained vision models to identify and suppress these distractors, but such semantics are misaligned with the binary distinction between static and transient regions and remain fragile under the appearance perturbations introduced during 3DGS optimization. We propose 3DGS-HPC, a framework that circumvents these limitations by combining two complementary principles: a patch-wise classification strategy that leverages local spatial consistency for robust region-level decisions, and a hybrid classification metric that adaptively integrates photometric and perceptual cues for more reliable separation. Extensive experiments demonstrate the superiority and robustness of our method in mitigating distractors to improve 3DGS-based novel view synthesis.\n\n三维高斯喷溅在新视角合成和三维场景重建中展现了卓越性能，但在真实环境中，其质量常常会因运动物体和变化阴影等瞬时干扰而下降。现有方法通常依赖预训练视觉模型提取的语义线索来识别并抑制这些干扰，但这类语义并不天然对应于静态区域与瞬时区域的二元划分，而且在 3DGS 优化过程中引入的外观扰动下也较为脆弱。我们提出 3DGS-HPC，一个结合两种互补原则的框架，以规避这些限制：一是利用局部空间一致性的基于图像块分类策略，用于实现更稳健的区域级判断；二是混合分类度量，可自适应整合光度和感知线索，从而更可靠地完成分离。大量实验表明，我们的方法在减轻干扰、提升基于 3DGS 的新视角合成质量方面具有明显优势和较强鲁棒性。\n"
  },
  {
    "path": "abs/2603.07604.md",
    "content": "### EmbedTalk: Triplane-Free Talking Head Synthesis using Embedding-Driven Gaussian Deformation\n\nReal-time talking head synthesis increasingly relies on deformable 3D Gaussian Splatting (3DGS) due to its low latency. Tri-planes are the standard choice for encoding Gaussians prior to deformation, since they provide a continuous domain with explicit spatial relationships. However, tri-plane representations are limited by grid resolution and approximation errors introduced by projecting 3D volumetric fields onto 2D subspaces. Recent work has shown the superiority of learnt embeddings for driving temporal deformations in 4D scene reconstruction. We introduce $\\textbf{EmbedTalk}$, which shows how such embeddings can be leveraged for modelling speech deformations in talking head synthesis. Through comprehensive experiments, we show that EmbedTalk outperforms existing 3DGS-based methods in rendering quality, lip synchronisation, and motion consistency, while remaining competitive with state-of-the-art generative models. Moreover, replacing the tri-plane encoding with learnt embeddings enables significantly more compact models that achieve over 60 FPS on a mobile GPU (RTX 2060 6 GB). Our code will be placed in the public domain on acceptance.\n\n由于延迟低，实时说话人头合成越来越多地依赖可形变的三维高斯喷溅。三平面通常被用作高斯形变前的编码方式，因为它提供了具有显式空间关系的连续表示域。然而，三平面表示会受到网格分辨率限制，并且将三维体场投影到二维子空间时会引入近似误差。近期研究已经表明，学习得到的嵌入在驱动四维场景重建中的时间形变方面具有优势。我们提出 EmbedTalk，展示了如何利用这类嵌入来建模说话人头合成中的语音驱动形变。通过全面实验，我们证明 EmbedTalk 在渲染质量、口型同步和运动一致性方面优于现有基于 3DGS 的方法，同时与最先进的生成模型相比也具有竞争力。此外，用学习嵌入替代三平面编码后，模型可显著压缩，并在移动 GPU，具体为 RTX 2060 6 GB 上实现超过 60 FPS 的运行速度。代码将在论文录用后公开。\n"
  },
  {
    "path": "abs/2603.07660.md",
    "content": "### Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence\n\nThe pursuit of spatial intelligence fundamentally relies on access to large-scale, fine-grained 3D data. However, existing approaches predominantly construct spatial understanding benchmarks by generating question-answer (QA) pairs from a limited number of manually annotated datasets, rather than systematically annotating new large-scale 3D scenes from raw web data. As a result, their scalability is severely constrained, and model performance is further hindered by domain gaps inherent in these narrowly curated datasets. In this work, we propose Holi-Spatial, the first fully automated, large-scale, spatially-aware multimodal dataset, constructed from raw video inputs without human intervention, using the proposed data curation pipeline. Holi-Spatial supports multi-level spatial supervision, ranging from geometrically accurate 3D Gaussian Splatting (3DGS) reconstructions with rendered depth maps to object-level and relational semantic annotations, together with corresponding spatial Question-Answer (QA) pairs. Following a principled and systematic pipeline, we further construct Holi-Spatial-4M, the first large-scale, high-quality 3D semantic dataset, containing 12K optimized 3DGS scenes, 1.3M 2D masks, 320K 3D bounding boxes, 320K instance captions, 1.2M 3D grounding instances, and 1.2M spatial QA pairs spanning diverse geometric, relational, and semantic reasoning tasks. Holi-Spatial demonstrates exceptional performance in data curation quality, significantly outperforming existing feed-forward and per-scene optimized methods on datasets such as ScanNet, ScanNet++, and DL3DV. Furthermore, fine-tuning Vision-Language Models (VLMs) on spatial reasoning tasks using this dataset has also led to substantial improvements in model performance.\n\n对空间智能的追求从根本上依赖于大规模、细粒度三维数据。然而，现有方法主要通过从少量人工标注数据集中生成问答对来构建空间理解基准，而不是系统地从原始网络数据中为新的大规模三维场景进行标注。因此，它们的可扩展性受到严重限制，模型性能还会进一步受到这些狭窄精选数据集所固有域差的影响。本文提出 Holi-Spatial，这是首个完全自动化、大规模、具备空间感知能力的多模态数据集，通过所提出的数据整理流程，从原始视频输入中无需人工干预构建而成。Holi-Spatial 支持多层次空间监督，既包括具有几何精度的三维高斯喷溅重建及其渲染深度图，也包括对象级与关系级语义标注，以及对应的空间问答对。沿着一条原则清晰、系统化的流程，我们进一步构建了 Holi-Spatial-4M，这是首个大规模高质量三维语义数据集，包含 1.2 万个优化后的 3DGS 场景、130 万个二维掩码、32 万个三维边界框、32 万条实例描述、120 万个三维 grounding 实例以及 120 万个覆盖几何、关系和语义推理任务的空间问答对。Holi-Spatial 在数据构建质量上表现出色，在 ScanNet、ScanNet++ 和 DL3DV 等数据集上显著优于现有前馈式和逐场景优化方法。此外，利用该数据集在空间推理任务上微调视觉语言模型，也显著提升了模型性能。\n"
  },
  {
    "path": "abs/2603.08254.md",
    "content": "### DynamicVGGT: Learning Dynamic Point Maps for 4D Scene Reconstruction in Autonomous Driving\n\nDynamic scene reconstruction in autonomous driving remains a fundamental challenge due to significant temporal variations, moving objects, and complex scene dynamics. Existing feed-forward 3D models have demonstrated strong performance in static reconstruction but still struggle to capture dynamic motion. To address these limitations, we propose DynamicVGGT, a unified feed-forward framework that extends VGGT from static 3D perception to dynamic 4D reconstruction. Our goal is to model point motion within feed-forward 3D models in a dynamic and temporally coherent manner. To this end, we jointly predict the current and future point maps within a shared reference coordinate system, allowing the model to implicitly learn dynamic point representations through temporal correspondence. To efficiently capture temporal dependencies, we introduce a Motion-aware Temporal Attention (MTA) module that learns motion continuity. Furthermore, we design a Dynamic 3D Gaussian Splatting Head that explicitly models point motion by predicting Gaussian velocities using learnable motion tokens under scene flow supervision. It refines dynamic geometry through continuous 3D Gaussian optimization. Extensive experiments on autonomous driving datasets demonstrate that DynamicVGGT significantly outperforms existing methods in reconstruction accuracy, achieving robust feed-forward 4D dynamic scene reconstruction under complex driving scenarios.\n\n自动驾驶中的动态场景重建由于显著的时间变化、运动物体和复杂场景动态，仍然是一项基础性挑战。现有前馈式三维模型在静态重建上已表现出较强性能，但在捕捉动态运动方面仍然不足。为此，我们提出 DynamicVGGT，一个统一的前馈框架，将 VGGT 从静态三维感知扩展到动态四维重建。我们的目标是在前馈式三维模型中以动态且时间一致的方式建模点的运动。为此，我们在共享参考坐标系中联合预测当前点图和未来点图，使模型能够通过时间对应关系隐式学习动态点表示。为了高效捕捉时间依赖，我们引入运动感知时间注意力模块来学习运动连续性。进一步地，我们设计了动态三维高斯喷溅头，在场景流监督下，通过可学习的运动 token 预测高斯速度，显式建模点运动，并通过连续三维高斯优化细化动态几何。在自动驾驶数据集上的大量实验表明，DynamicVGGT 在重建精度上显著优于现有方法，能够在复杂驾驶场景下实现稳健的前馈式四维动态场景重建。\n"
  },
  {
    "path": "abs/2603.08503.md",
    "content": "### Spherical-GOF: Geometry-Aware Panoramic Gaussian Opacity Fields for 3D Scene Reconstruction\n\nOmnidirectional images are increasingly used in robotics and vision due to their wide field of view. However, extending 3D Gaussian Splatting (3DGS) to panoramic camera models remains challenging, as existing formulations are designed for perspective projections and naive adaptations often introduce distortion and geometric inconsistencies. We present Spherical-GOF, an omnidirectional Gaussian rendering framework built upon Gaussian Opacity Fields (GOF). Unlike projection-based rasterization, Spherical-GOF performs GOF ray sampling directly on the unit sphere in spherical ray space, enabling consistent ray-Gaussian interactions for panoramic rendering. To make the spherical ray casting efficient and robust, we derive a conservative spherical bounding rule for fast ray-Gaussian culling and introduce a spherical filtering scheme that adapts Gaussian footprints to distortion-varying panoramic pixel sampling. Extensive experiments on standard panoramic benchmarks (OmniBlender and OmniPhotos) demonstrate competitive photometric quality and substantially improved geometric consistency. Compared with the strongest baseline, Spherical-GOF reduces depth reprojection error by 57% and improves cycle inlier ratio by 21%. Qualitative results show cleaner depth and more coherent normal maps, with strong robustness to global panorama rotations. We further validate generalization on OmniRob, a real-world robotic omnidirectional dataset introduced in this work, featuring UAV and quadruped platforms. The source code and the OmniRob dataset will be released at https://github.com/1170632760/Spherical-GOF.\n\n全向图像因视场角宽广而在机器人和视觉任务中被越来越多地使用。然而，将三维高斯喷溅扩展到全景相机模型仍然具有挑战，因为现有公式主要面向透视投影设计，简单改造往往会引入畸变和几何不一致。我们提出 Spherical-GOF，一个建立在 Gaussian Opacity Fields 之上的全向高斯渲染框架。不同于基于投影的光栅化方式，Spherical-GOF 直接在球面射线空间中的单位球面上执行 Gaussian Opacity Fields 射线采样，从而实现适用于全景渲染的一致射线与高斯交互。为使球面射线投射高效且稳健，我们推导了保守的球面包围规则以加速射线与高斯的裁剪，并引入一种球面滤波策略，使高斯足迹能够适应因畸变而变化的全景像素采样。在标准全景基准 OmniBlender 和 OmniPhotos 上的大量实验表明，该方法在保持有竞争力光度质量的同时，显著提升了几何一致性。与最强基线相比，Spherical-GOF 将深度重投影误差降低了 57%，并将循环内点比例提升了 21%。定性结果表明，它能够得到更干净的深度图和更一致的法线图，并且对全局全景旋转具有很强鲁棒性。我们还在本文引入的真实机器人全向数据集 OmniRob 上验证了其泛化能力，该数据集包含无人机和四足平台。代码和 OmniRob 数据集将发布于 https://github.com/1170632760/Spherical-GOF。\n"
  },
  {
    "path": "abs/2603.08661.md",
    "content": "### ImprovedGS+: A High-Performance C++/CUDA Re-Implementation Strategy for 3D Gaussian Splatting\n\nRecent advancements in 3D Gaussian Splatting (3DGS) have shifted the focus toward balancing reconstruction fidelity with computational efficiency. In this work, we propose ImprovedGS+, a high-performance, low-level reinvention of the ImprovedGS strategy, implemented natively within the LichtFeld-Studio framework. By transitioning from high-level Python logic to hardware-optimized C++/CUDA kernels, we achieve a significant reduction in host-device synchronization and training latency. Our implementation introduces a Long-Axis-Split (LAS) CUDA kernel, custom Laplacian-based importance kernels with Non-Maximum Suppression (NMS) for edge scores, and an adaptive Exponential Scale Scheduler. Experimental results on the Mip-NeRF360 dataset demonstrate that ImprovedGS+ establishes a new Pareto-optimal front for scene reconstruction. Our 1M-budget variant outperforms the state-of-the-art MCMC baseline by achieving a 26.8% reduction in training time (saving 17 minutes per session) and utilizing 13.3% fewer Gaussians while maintaining superior visual quality. Furthermore, our full variant demonstrates a 1.28 dB PSNR increase over the ADC baseline with a 38.4% reduction in parametric complexity. These results validate ImprovedGS+ as a scalable, high-speed solution that upholds the core pillars of Speed, Quality, and Usability within the LichtFeld-Studio ecosystem.\n\n近期三维高斯喷溅的研究重心逐渐转向如何在重建保真度与计算效率之间取得平衡。本文提出 ImprovedGS+，这是对 ImprovedGS 策略的一种高性能、底层重构实现，原生集成于 LichtFeld-Studio 框架中。通过将高层 Python 逻辑迁移到面向硬件优化的 C++ 和 CUDA 内核，我们显著减少了主机与设备之间的同步开销以及训练延迟。我们的实现引入了长轴分裂 CUDA 内核、自定义基于拉普拉斯的带非极大值抑制边缘评分重要性内核，以及自适应指数尺度调度器。在 Mip-NeRF360 数据集上的实验表明，ImprovedGS+ 为场景重建建立了新的帕累托最优前沿。预算为 100 万高斯的版本相较现有最先进的 MCMC 基线，训练时间减少了 26.8%，每次训练节省约 17 分钟，同时高斯数量减少 13.3%，且视觉质量更优。此外，完整版本在参数复杂度降低 38.4% 的前提下，相较 ADC 基线实现了 1.28 dB 的 PSNR 提升。这些结果证明，ImprovedGS+ 是一种可扩展、高速的解决方案，能够在 LichtFeld-Studio 生态中兼顾速度、质量与可用性。\n"
  },
  {
    "path": "abs/2603.08809.md",
    "content": "### Where, What, Why: Toward Explainable 3D-GS Watermarking\n\nAs 3D Gaussian Splatting becomes the de facto representation for interactive 3D assets, robust yet imperceptible watermarking is critical. We present a representation-native framework that separates where to write from how to preserve quality. A Trio-Experts module operates directly on Gaussian primitives to derive priors for carrier selection, while a Safety and Budget Aware Gate (SBAG) allocates Gaussians to watermark carriers, optimized for bit resilience under perturbation and bitrate budgets, and to visual compensators that are insulated from watermark loss. To maintain fidelity, we introduce a channel-wise group mask that controls gradient propagation for carriers and compensators, thereby limiting Gaussian parameter updates, repairing local artifacts, and preserving high-frequency details without increasing runtime. Our design yields view-consistent watermark persistence and strong robustness against common image distortions such as compression and noise, while achieving a favorable robustness-quality trade-off compared with prior methods. In addition, decoupled finetuning provides per-Gaussian attributions that reveal where the message is carried and why those carriers are selected, enabling auditable explainability. Compared with state-of-the-art methods, our approach achieves a PSNR improvement of +0.83 dB and a bit-accuracy gain of +1.24%.\n\n随着三维高斯喷溅成为交互式三维资产的事实标准表示方式，既稳健又不可感知的水印变得至关重要。我们提出一种原生于该表示的框架，将“写入哪里”和“如何保持质量”这两个问题分离开来。一个 Trio-Experts 模块直接作用于高斯基元，用于得到载体选择先验；随后，Safety and Budget Aware Gate 会根据扰动下比特稳健性和码率预算，将高斯分配给水印载体，或分配给不受水印丢失影响的视觉补偿器。为保持保真度，我们引入按通道分组的掩码，用于控制载体和补偿器的梯度传播，从而限制高斯参数更新、修复局部伪影并保留高频细节，而不会增加运行时间。我们的设计能够实现跨视角一致的水印持久性，并对压缩和噪声等常见图像失真具有较强鲁棒性，同时相比已有方法在稳健性与质量之间取得更优平衡。此外，解耦式微调还能提供逐高斯归因，揭示消息承载位置以及为何选择这些载体，从而支持可审计的可解释性。与现有最先进方法相比，我们的方法在 PSNR 上提高了 0.83 dB，在比特准确率上提升了 1.24%。\n"
  },
  {
    "path": "abs/2603.08983.md",
    "content": "### SurgCalib: Gaussian Splatting-Based Hand-Eye Calibration for Robot-Assisted Minimally Invasive Surgery\n\nWe present a Gaussian Splatting-based framework for hand-eye calibration of the da Vinci surgical robot. In a vision-guided robotic system, accurate estimation of the rigid transformation between the robot base and the camera frame is essential for reliable closed-loop control. For cable-driven surgical robots, this task faces unique challenges. The encoders of surgical instruments often produce inaccurate proprioceptive measurements due to cable stretch and backlash. Conventional hand-eye calibration approaches typically rely on known fiducial patterns and solve the AX = XB formulation. While effective, introducing additional markers into the operating room (OR) environment can violate sterility protocols and disrupt surgical workflows. In this study, we propose SurgCalib, an automatic, markerless framework that has the potential to be used in the OR. SurgCalib first initializes the pose of the surgical instrument using raw kinematic measurements and subsequently refines this pose through a two-phase optimization procedure under the RCM constraint within a Gaussian Splatting-based differentiable rendering pipeline. We evaluate the proposed method on the public dVRK benchmark, SurgPose. The results demonstrate average 2D tool-tip reprojection errors of 12.24 px (2.06 mm) and 11.33 px (1.9 mm), and 3D tool-tip Euclidean distance errors of 5.98 mm and 4.75 mm, for the left and right instruments, respectively.\n\n我们提出了一种基于 Gaussian Splatting 的达芬奇手术机器人手眼标定框架。在视觉引导机器人系统中，准确估计机器人底座坐标系与相机坐标系之间的刚体变换，是实现可靠闭环控制的关键。对于缆线驱动的手术机器人，这一任务存在独特挑战：由于缆线拉伸和回程间隙，手术器械编码器给出的本体感觉测量往往并不准确。传统手眼标定方法通常依赖已知的标志物图案，并求解 AX = XB 形式的方程。虽然这类方法有效，但在手术室环境中引入额外标记物可能破坏无菌规范，并干扰手术流程。为此，我们提出了 SurgCalib，这是一种自动、无标记的框架，具备在手术室中使用的潜力。SurgCalib 先利用原始运动学测量对手术器械位姿进行初始化，然后在基于 Gaussian Splatting 的可微渲染流程中，在 RCM 约束下通过两阶段优化进一步细化该位姿。我们在公开的 dVRK 基准数据集 SurgPose 上评估了该方法。结果表明，对于左右两个器械，其平均二维工具尖端重投影误差分别为 12.24 像素（2.06 毫米）和 11.33 像素（1.9 毫米），三维工具尖端欧氏距离误差分别为 5.98 毫米和 4.75 毫米。\n"
  },
  {
    "path": "abs/2603.08997.md",
    "content": "### SkipGS: Post-Densification Backward Skipping for Efficient 3DGS Training\n\n3D Gaussian Splatting (3DGS) achieves real-time novel-view synthesis by optimizing millions of anisotropic Gaussians, yet its training remains expensive, with the backward pass dominating runtime in the post-densification refinement phase. We observe substantial update redundancy in this phase: many sampled views have near-plateaued losses and provide diminishing gradient benefits, but standard training still runs full backpropagation. We propose SkipGS with a novel view-adaptive backward gating mechanism for efficient post-densification training. SkipGS always performs the forward pass to update per-view loss statistics, and selectively skips backward passes when the sampled view's loss is consistent with its recent per-view baseline, while enforcing a minimum backward budget for stable optimization. On Mip-NeRF 360, compared to 3DGS, SkipGS reduces end-to-end training time by 23.1%, driven by a 42.0% reduction in post-densification time, with comparable reconstruction quality. Because it only changes when to backpropagate -- without modifying the renderer, representation, or loss -- SkipGS is plug-and-play and compatible with other complementary efficiency strategies for additive speedups.\n\n三维高斯喷溅通过优化数百万个各向异性高斯，实现了实时新视角合成，但其训练成本依然很高，尤其是在增密后的细化阶段，反向传播占据了主要运行时间。我们观察到这一阶段存在明显的更新冗余：许多采样视图的损失已接近平台期，继续计算梯度的收益很小，但标准训练流程仍会对其执行完整反向传播。为此，我们提出 SkipGS，它带有一种新的视图自适应反向传播门控机制，用于高效执行增密后训练。SkipGS 总是执行前向传播，以更新逐视图损失统计；而当某个采样视图的损失与其近期基线保持一致时，它就会选择性跳过反向传播，同时又通过设定最小反向传播预算来保证优化稳定。在 Mip-NeRF 360 数据集上，相比原始 3DGS，SkipGS 将端到端训练时间减少了 23.1%，其中增密后阶段的时间减少了 42.0%，同时保持了相近的重建质量。由于它只改变何时执行反向传播，而不修改渲染器、表示方式或损失函数，因此 SkipGS 是即插即用的，并且可与其他互补的效率提升策略结合以获得累加式加速。\n"
  },
  {
    "path": "abs/2603.09079.md",
    "content": "### GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models\n\nVLA models encode visual observations as 2D patch tokens with no intrinsic geometric structure. We introduce GST-VLA with two contributions. First, the Gaussian Spatial Tokenizer (GST) converts frozen dense depth and frozen semantic patch features into $N_g{=}128$ anisotropic 3D Gaussian primitives, each parameterized by a metric residual mean $μ\\in \\mathbb{R}^3$, log-scale covariance $\\log σ\\in \\mathbb{R}^3$, and learned opacity $α\\in (0,1)$. The covariance eigenstructure encodes local surface orientation, and opacity provides per-primitive geometric confidence, both inaccessible from scalar depth. Spatial attention pooling with learned queries concentrates the fixed token budget on geometrically salient regions rather than distributing uniformly. Second, 3D Depth-Aware Chain-of-Thought (DA-CoT) reasoning supervises four structured intermediate spatial thoughts, covering 3D object grounding, grasp affordance contact geometry, pairwise metric distances, and coarse SE(3) waypoints, as explicit generation targets in the training loss. A cross-attention sublayer at every VLM transformer block provides direct access to the raw 256-primitive Gaussian field during DA-CoT generation. A 300M-parameter flow-matching action expert with mixture-of-experts feedforward sublayers decodes 7-DoF delta action chunks via conditional ODE integration, conditioned on both VLM hidden states and DA-CoT outputs through dual cross-attention. Trained with composite $\\mathcal{L}_\\mathrm{flow} + \\mathcal{L}_\\mathrm{CoT} + \\mathcal{L}_\\mathrm{depth}$ across three progressive stages, GST-VLA achieves 96.4% on LIBERO (+2.0%), and 80.2% on SimplerEnv (+5.4%). Ablations isolate the contribution of each GST component, each DA-CoT thought, and each training stage, confirming independent and synergistic gains concentrated on precision demanding tasks.\n\n现有视觉语言动作模型将视觉观测编码为二维图像块 token，本身缺乏几何结构。我们提出 GST-VLA，并包含两项主要贡献。首先，Gaussian Spatial Tokenizer 模块将冻结的稠密深度特征和冻结的语义图像块特征转换为 128 个各向异性的三维高斯基元，每个基元由度量残差均值、对数尺度协方差和可学习不透明度参数化。协方差的特征结构编码局部表面朝向，而不透明度则提供逐基元的几何置信度，这些都是单一深度值无法表达的信息。带有可学习查询的空间注意力池化会将有限的 token 预算集中到几何上更重要的区域，而不是平均分配。其次，三维深度感知思维链推理对四类结构化中间空间思维进行监督，包括三维目标定位、抓取可供性接触几何、两两度量距离以及粗粒度 SE3 路径点，并将它们作为训练损失中的显式生成目标。在每个视觉语言模型 Transformer 模块中，我们加入交叉注意力子层，使模型在生成深度感知思维链时可以直接访问原始的 256 基元高斯场。随后，一个 3 亿参数的基于 flow-matching 的动作专家通过混合专家前馈子层，利用条件常微分方程积分来解码七自由度的增量动作片段，其条件同时来自视觉语言模型隐藏状态和深度感知思维链输出。在包含流损失、思维链损失和深度损失的三阶段渐进式训练下，GST-VLA 在 LIBERO 上达到 96.4%，提升 2.0%，在 SimplerEnv 上达到 80.2%，提升 5.4%。消融实验分别验证了各个 GST 组件、各类深度感知思维链以及各训练阶段的独立贡献和协同增益，尤其在需要高精度的任务上表现明显。\n"
  },
  {
    "path": "abs/2603.09277.md",
    "content": "### Speeding Up the Learning of 3D Gaussians with Much Shorter Gaussian Lists\n\n3D Gaussian splatting (3DGS) has become a vital tool for learning a radiance field from multiple posed images. Although 3DGS shows great advantages over NeRF in terms of rendering quality and efficiency, it remains a research challenge to further improve the efficiency of learning 3D Gaussians. To overcome this challenge, we propose novel training strategies and losses to shorten each Gaussian list used to render a pixel, which speeds up the splatting by involving fewer Gaussians along a ray. Specifically, we shrink the size of each Gaussian by resetting their scales regularly, encouraging smaller Gaussians to cover fewer nearby pixels, which shortens the Gaussian lists of pixels. Additionally, we introduce an entropy constraint on the alpha blending procedure to sharpen the weight distribution of Gaussians along each ray, which drives dominant weights larger while making minor weights smaller. As a result, each Gaussian becomes more focused on the pixels where it is dominant, which reduces its impact on nearby pixels, leading to even shorter Gaussian lists. Eventually, we integrate our method into a rendering resolution scheduler which further improves efficiency through progressive resolution increase. We evaluate our method by comparing it with state-of-the-art methods on widely used benchmarks. Our results show significant advantages over others in efficiency without sacrificing rendering quality.\n\n三维高斯喷溅已经成为从多张带位姿图像中学习辐射场的重要工具。尽管 3DGS 在渲染质量和效率上相较 NeRF 具有明显优势，但如何进一步提升三维高斯学习效率仍然是研究挑战。为解决这一问题，我们提出一套新的训练策略和损失函数，用于缩短渲染每个像素时所需的高斯列表，从而通过减少沿射线参与计算的高斯数量来加速喷溅过程。具体而言，我们通过定期重置高斯尺度来缩小每个高斯的覆盖范围，促使更小的高斯只覆盖更少的邻近像素，从而缩短像素对应的高斯列表。此外，我们在 alpha 混合过程中引入熵约束，以锐化每条射线上高斯权重的分布，强化主导权重、削弱次要权重。这样，每个高斯会更加集中地作用于其占主导地位的像素，并减少对邻近像素的影响，从而进一步缩短高斯列表。最终，我们将该方法整合进一个渲染分辨率调度器中，通过渐进式提高分辨率进一步提升效率。在常用基准上的比较表明，我们的方法在不牺牲渲染质量的前提下，在效率方面显著优于现有最先进方法。\n"
  },
  {
    "path": "abs/2603.09291.md",
    "content": "### DenoiseSplat: Feed-Forward Gaussian Splatting for Noisy 3D Scene Reconstruction\n\n3D scene reconstruction and novel-view synthesis are fundamental for VR, robotics, and content creation. However, most NeRF and 3D Gaussian Splatting pipelines assume clean inputs and degrade under real noise and artifacts. We therefore propose DenoiseSplat, a feed-forward 3D Gaussian splatting method for noisy multi-view images. We build a large-scale, scene-consistent noisy--clean benchmark on RE10K by injecting Gaussian, Poisson, speckle, and salt-and-pepper noise with controlled intensities. With a lightweight MVSplat-style feed-forward backbone, we train end-to-end using only clean 2D renderings as supervision and no 3D ground truth. On noisy RE10K, DenoiseSplat outperforms vanilla MVSplat and a strong two-stage baseline (IDF + MVSplat) in PSNR/SSIM and LPIPS across noise types and levels.\n\n三维场景重建和新视角合成是虚拟现实、机器人和内容创作中的基础能力。然而，大多数 NeRF 和三维高斯喷溅流程都假设输入是干净的，在真实噪声和伪影存在时性能会明显下降。为此，我们提出 DenoiseSplat，一种面向噪声多视图图像的前馈式三维高斯喷溅方法。我们在 RE10K 数据集上构建了一个大规模、场景一致的噪声与干净图像基准，通过可控强度注入高斯噪声、泊松噪声、散斑噪声以及椒盐噪声。在一个轻量级、类似 MVSplat 的前馈骨干上，我们仅利用干净的二维渲染结果作为监督进行端到端训练，而无需任何三维真实标注。在带噪 RE10K 上，DenoiseSplat 在多种噪声类型和强度下都在 PSNR、SSIM 和 LPIPS 上优于原始 MVSplat 以及强大的两阶段基线，也就是 IDF 加 MVSplat。\n"
  },
  {
    "path": "abs/2603.09621.md",
    "content": "### Physics-Driven 3D Gaussian Rendering for Zero-Shot MRI Super-Resolution\n\nHigh-resolution Magnetic Resonance Imaging (MRI) is vital for clinical diagnosis but limited by long acquisition times and motion artifacts. Super-resolution (SR) reconstructs low-resolution scans into high-resolution images, yet existing methods are mutually constrained: paired-data methods achieve efficiency only by relying on costly aligned datasets, while implicit neural representation approaches avoid such data needs at the expense of heavy computation. We propose a zero-shot MRI SR framework using explicit Gaussian representation to balance data requirements and efficiency. MRI-tailored Gaussian parameters embed tissue physical properties, reducing learnable parameters while preserving MR signal fidelity. A physics-grounded volume rendering strategy models MRI signal formation via normalized Gaussian aggregation. Additionally, a brick-based order-independent rasterization scheme enables highly parallel 3D computation, lowering training and inference costs. Experiments on two public MRI datasets show superior reconstruction quality and efficiency, demonstrating the method's potential for clinical MRI SR.\n\n高分辨率磁共振成像对于临床诊断至关重要，但受到采集时间长和运动伪影的限制。超分辨率方法可以将低分辨率扫描重建为高分辨率图像，但现有方法存在两难：依赖成对数据的方法虽然高效，却需要昂贵的对齐数据集；隐式神经表示方法则避免了对这类数据的依赖，却要付出较高计算代价。我们提出一个零样本 MRI 超分辨率框架，使用显式高斯表示来平衡数据需求与计算效率。面向 MRI 设计的高斯参数编码了组织的物理属性，在保留磁共振信号保真度的同时减少可学习参数数量。基于物理的体渲染策略则通过归一化高斯聚合来模拟 MRI 信号形成过程。此外，我们还设计了基于砖块的无序独立光栅化方案，可支持高度并行的三维计算，降低训练和推理成本。在两个公开 MRI 数据集上的实验表明，该方法在重建质量和效率上都表现更优，显示出其在临床 MRI 超分辨率中的潜力。\n"
  },
  {
    "path": "abs/2603.09632.md",
    "content": "### X-GS: An Extensible Open Framework Unifying 3DGS Architectures with Downstream Multimodal Models\n\n3D Gaussian Splatting (3DGS) has emerged as a powerful technique for novel view synthesis, subsequently extending into numerous spatial AI applications. However, most existing 3DGS methods are isolated, focusing on specific domains such as online SLAM, semantic enrichment, or 3DGS for unposed images. In this paper, we introduce X-GS, an extensible open framework that unifies a broad range of techniques to enable real-time 3DGS-based online SLAM enriched with semantics, bridging the gap to downstream multimodal models. At the core of X-GS is a highly efficient pipeline called X-GS-Perceiver, capable of taking unposed RGB (or optionally RGB-D) video streams as input to co-optimize geometry and poses, and distill high-dimensional semantic features from vision foundation models into the 3D Gaussians. We achieve real-time performance through a novel online Vector Quantization (VQ) module, a GPU-accelerated grid-sampling scheme, and a highly parallelized pipeline design. The semantic 3D Gaussians can then be utilized by vision-language models within the X-GS-Thinker component, enabling downstream tasks such as object detection, zero-shot caption generation, and potentially embodied tasks. Experimental results on real-world datasets showcase the efficacy, efficiency, and newly unlocked multimodal capabilities of the X-GS framework.\n\n三维高斯喷溅已成为新视角合成中的强大技术，并进一步扩展到众多空间智能应用中。然而，大多数现有 3DGS 方法彼此孤立，分别聚焦于在线 SLAM、语义增强或无位姿图像的 3DGS 等特定方向。本文提出 X-GS，一个可扩展的开放框架，它统一了广泛的技术路线，从而实现带有语义增强的实时 3DGS 在线 SLAM，并进一步桥接到下游多模态模型。X-GS 的核心是一个高效流程 X-GS-Perceiver，它能够以无位姿 RGB 视频流，或者可选的 RGB-D 视频流作为输入，联合优化几何和位姿，并将视觉基础模型中的高维语义特征蒸馏到三维高斯中。我们通过新的在线向量量化模块、GPU 加速的网格采样方案以及高度并行化的流程设计，实现了实时性能。随后，这些带语义的三维高斯可以在 X-GS-Thinker 组件中被视觉语言模型利用，用于目标检测、零样本图像描述生成，乃至潜在的具身任务。真实世界数据集上的实验结果展示了 X-GS 框架在效果、效率和新解锁的多模态能力方面的优势。\n"
  },
  {
    "path": "abs/2603.09668.md",
    "content": "### DiffWind: Physics-Informed Differentiable Modeling of Wind-Driven Object Dynamics\n\nModeling wind-driven object dynamics from video observations is highly challenging due to the invisibility and spatio-temporal variability of wind, as well as the complex deformations of objects. We present DiffWind, a physics-informed differentiable framework that unifies wind-object interaction modeling, video-based reconstruction, and forward simulation. Specifically, we represent wind as a grid-based physical field and objects as particle systems derived from 3D Gaussian Splatting, with their interaction modeled by the Material Point Method (MPM). To recover wind-driven object dynamics, we introduce a reconstruction framework that jointly optimizes the spatio-temporal wind force field and object motion through differentiable rendering and simulation. To ensure physical validity, we incorporate the Lattice Boltzmann Method (LBM) as a physics-informed constraint, enforcing compliance with fluid dynamics laws. Beyond reconstruction, our method naturally supports forward simulation under novel wind conditions and enables new applications such as wind retargeting. We further introduce WD-Objects, a dataset of synthetic and real-world wind-driven scenes. Extensive experiments demonstrate that our method significantly outperforms prior dynamic scene modeling approaches in both reconstruction accuracy and simulation fidelity, opening a new avenue for video-based wind-object interaction modeling.\n\n从视频观测中建模风驱动物体动态十分困难，因为风是不可见且在时空上变化的，同时物体形变也很复杂。我们提出 DiffWind，一个结合物理先验的可微框架，统一了风与物体交互建模、基于视频的重建以及前向仿真。具体而言，我们将风表示为基于网格的物理场，将物体表示为由三维高斯喷溅导出的粒子系统，并利用物质点法建模两者之间的交互。为了恢复风驱动物体动态，我们引入一个重建框架，通过可微渲染和可微仿真联合优化时空风力场和物体运动。为保证物理有效性，我们还引入格子玻尔兹曼方法作为物理约束，使结果符合流体动力学规律。除重建外，该方法还能自然支持在新风场条件下的前向仿真，并支持诸如风场重定向等新应用。我们还提出 WD-Objects 数据集，包含合成和真实世界的风驱动场景。大量实验表明，我们的方法在重建精度和仿真保真度方面都显著优于现有动态场景建模方法，为基于视频的风与物体交互建模开辟了新方向。\n"
  },
  {
    "path": "abs/2603.09673.md",
    "content": "### VarSplat: Uncertainty-aware 3D Gaussian Splatting for Robust RGB-D SLAM\n\nSimultaneous Localization and Mapping (SLAM) with 3D Gaussian Splatting (3DGS) enables fast, differentiable rendering and high-fidelity reconstruction across diverse real-world scenes. However, existing 3DGS-SLAM approaches handle measurement reliability implicitly, making pose estimation and global alignment susceptible to drift in low-texture regions, transparent surfaces, or areas with complex reflectance properties. To this end, we introduce VarSplat, an uncertainty-aware 3DGS-SLAM system that explicitly learns per-splat appearance variance. By using the law of total variance with alpha compositing, we then render differentiable per-pixel uncertainty map via efficient, single-pass rasterization. This map guides tracking, submap registration, and loop detection toward focusing on reliable regions and contributes to more stable optimization. Experimental results on Replica (synthetic) and TUM-RGBD, ScanNet, and ScanNet++ (real-world) show that VarSplat improves robustness and achieves competitive or superior tracking, mapping, and novel view synthesis rendering compared to existing studies for dense RGB-D SLAM.\n\n结合三维高斯喷溅的同步定位与建图能够在多种真实场景中实现快速、可微渲染和高保真重建。然而，现有 3DGS-SLAM 方法对测量可靠性的处理较为隐式，导致在低纹理区域、透明表面或具有复杂反射特性的区域中，位姿估计和全局对齐容易发生漂移。为此，我们提出 VarSplat，一个具备不确定性感知能力的 3DGS-SLAM 系统，它显式学习每个喷溅基元的外观方差。随后，我们利用总方差定律和 alpha 合成，通过高效的单次光栅化渲染得到可微的逐像素不确定性图。该不确定性图会引导跟踪、子地图配准和回环检测聚焦于更可靠的区域，从而带来更稳定的优化。在 Replica 合成数据集以及 TUM-RGBD、ScanNet 和 ScanNet++ 真实数据集上的实验结果表明，VarSplat 提升了鲁棒性，并在稠密 RGB-D SLAM 的跟踪、建图和新视角渲染方面取得了与现有方法相比具有竞争力或更优的表现。\n"
  },
  {
    "path": "abs/2603.09703.md",
    "content": "### ProGS: Towards Progressive Coding for 3D Gaussian Splatting\n\nWith the emergence of 3D Gaussian Splatting (3DGS), numerous pioneering efforts have been made to address the effective compression issue of massive 3DGS data. 3DGS offers an efficient and scalable representation of 3D scenes by utilizing learnable 3D Gaussians, but the large size of the generated data has posed significant challenges for storage and transmission. Existing methods, however, have been limited by their inability to support progressive coding, a crucial feature in streaming applications with varying bandwidth. To tackle this limitation, this paper introduce a novel approach that organizes 3DGS data into an octree structure, enabling efficient progressive coding. The proposed ProGS is a streaming-friendly codec that facilitates progressive coding for 3D Gaussian splatting, and significantly improves both compression efficiency and visual fidelity. The proposed method incorporates mutual information enhancement mechanisms to mitigate structural redundancy, leveraging the relevance between nodes in the octree hierarchy. By adapting the octree structure and dynamically adjusting the anchor nodes, ProGS ensures scalable data compression without compromising the rendering quality. ProGS achieves a remarkable 45X reduction in file storage compared to the original 3DGS format, while simultaneously improving visual performance by over 10%. This demonstrates that ProGS can provide a robust solution for real-time applications with varying network conditions.\n\n随着三维高斯喷溅的发展，围绕海量 3DGS 数据高效压缩的问题已经出现了大量开创性研究。3DGS 通过可学习的三维高斯提供了一种高效、可扩展的三维场景表示，但其生成数据规模庞大，给存储与传输带来了显著挑战。不过，现有方法普遍无法支持渐进式编码，而这正是适应带宽变化的流式应用中的关键能力。为解决这一限制，本文提出一种新方法，将 3DGS 数据组织为八叉树结构，从而实现高效的渐进式编码。我们提出的 ProGS 是一种面向流式场景的编解码器，能够支持三维高斯喷溅的渐进式编码，并显著提升压缩效率和视觉保真度。该方法引入互信息增强机制，以利用八叉树层次结构中节点之间的相关性，减轻结构冗余。通过自适应调整八叉树结构和动态调节锚节点，ProGS 能在不损害渲染质量的情况下实现可扩展数据压缩。与原始 3DGS 格式相比，ProGS 在文件存储上实现了 45 倍压缩，同时视觉表现提升超过 10%。这表明 ProGS 能为网络条件不断变化的实时应用提供稳健解决方案。\n"
  },
  {
    "path": "abs/2603.09718.md",
    "content": "### GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System\n\nRecently, the 3D Gaussian splatting (3DGS) technique for real-time radiance field rendering has revolutionized the field of volumetric scene representation, providing users with an immersive experience. But in return, it also poses a large amount of data volume, which is extremely bandwidth-intensive. Cutting-edge researchers have tried to introduce different approaches and construct multiple variants for 3DGS to obtain a more compact scene representation, but it is still challenging for real-time distribution. In this paper, we propose GSStream, a novel volumetric scene streaming system to support 3DGS data format. Specifically, GSStream integrates a collaborative viewport prediction module to better predict users' future behaviors by learning collaborative priors and historical priors from multiple users and users' viewport sequences and a deep reinforcement learning (DRL)-based bitrate adaptation module to tackle the state and action space variability challenge of the bitrate adaptation problem, achieving efficient volumetric scene delivery. Besides, we first build a user viewport trajectory dataset for volumetric scenes to support the training and streaming simulation. Extensive experiments prove that our proposed GSStream system outperforms existing representative volumetric scene streaming systems in visual quality and network usage. Demo video: https://youtu.be/3WEe8PN8yvA.\n\n近年来，三维高斯喷溅这项用于实时辐射场渲染的技术，彻底改变了体场景表示领域，为用户带来了沉浸式体验。但与之相伴的是巨大的数据体量，对带宽提出了极高要求。尽管研究者已经尝试通过不同方法和多种 3DGS 变体来获得更紧凑的场景表示，但要实现实时分发仍然具有挑战。本文提出 GSStream，一个用于支持 3DGS 数据格式的新型体场景流式传输系统。具体来说，GSStream 集成了协同视口预测模块，通过学习多用户的协同先验和用户视口序列的历史先验，更准确地预测用户未来行为；同时还引入了基于深度强化学习的码率自适应模块，以解决码率自适应问题中状态空间和动作空间变化所带来的挑战，从而实现高效的体场景传输。此外，我们还首次构建了一个面向体场景的用户视口轨迹数据集，用于支持训练和流式传输仿真。大量实验表明，GSStream 在视觉质量和网络使用效率方面都优于现有代表性的体场景流式传输系统。演示视频见 https://youtu.be/3WEe8PN8yvA。\n"
  },
  {
    "path": "archive/202407.md",
    "content": "# 3D Gaussian Splatting Papers Before 2024/07/01\n\n#### [1] Lightweight Predictive 3D Gaussian Splats\n- **🧑‍🔬 作者**：Junli Cao, Vidit Goel, Chaoyang Wang, Anil Kag, Ju Hu, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren\n- **🏫 单位**：Snap Inc. ⟐ University of California, Los Angeles\n- **🔗 链接**：[[中英摘要](./abs/2406.19434.md)] [[arXiv:2406.19434](https://arxiv.org/abs/2406.19434)] [Code]\n- **📝 说明**：\n\n#### [2] FAGhead: Fully Animate Gaussian Head from Monocular Videos\n- **🧑‍🔬 作者**：Yixin Xuan, Xinyang Li, Gongxin Yao, Shiwei Zhou, Donghui Sun, Xiaoxin Chen, Yu Pan\n- **🏫 单位**：Zhejiang University ⟐ vivo AI Lab\n- **🔗 链接**：[[中英摘要](./abs/2406.19070.md)] [[arXiv:2406.19070](https://arxiv.org/abs/2406.19070)] [Code]\n- **📝 说明**：\n\n#### [3] GS-ROR: 3D Gaussian Splatting for Reflective Object Relighting via SDF Priors\n- **🧑‍🔬 作者**：Zuo-Liang Zhu, Beibei Wang, Jian Yang\n- **🏫 单位**：Nankai University\n- **🔗 链接**：[[中英摘要](./abs/2406.18544.md)] [[arXiv:2406.18544](https://arxiv.org/abs/2406.18544)] [Code]\n- **📝 说明**：\n\n#### [4] On Scaling Up 3D Gaussian Splatting Training\n- **🧑‍🔬 作者**：Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie\n- **🏫 单位**：New York University ⟐ Pacific Northwest National Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2406.18533.md)] [[arXiv:2406.18533](https://arxiv.org/abs/2406.18533)] [[Code](https://github.com/nyu-systems/Grendel-GS)]\n- **📝 说明**：\n\n#### [5] GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality\n- **🧑‍🔬 作者**：Taoran Yi, Jiemin Fang, Zanwei Zhou, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Xinggang Wang, Qi Tian\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ Huawei Inc. ⟐ Shanghai Jiao Tong University ⟐ Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2406.18462.md)] [[arXiv:2406.18462](https://arxiv.org/abs/2406.18462)] [[Code](https://github.com/hustvl/GaussianDreamerPro)]\n- **📝 说明**：\n\n#### [6] GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting\n- **🧑‍🔬 作者**：Jiaze Li, Zhengyu Wen, Luo Zhang, Jiangbei Hu, Fei Hou, Zhebin Zhang, Ying He\n- **🏫 单位**：Nanyang Technological University ⟐ Dalian University of Technology ⟐ Chinese Academy of Sciences ⟐ InnoPeak Technology, Inc.\n- **🔗 链接**：[[中英摘要](./abs/2406.18199.md)] [[arXiv:2406.18199](https://arxiv.org/abs/2406.18199)] [Code]\n- **📝 说明**：\n\n#### [7] VDG: Vision-Only Dynamic Gaussian for Driving Simulation\n- **🧑‍🔬 作者**：Hao Li, Jingfeng Li, Dingwen Zhang, Chenming Wu, Jieqi Shi, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han\n- **🏫 单位**：NWPU ⟐ Baidu Inc. 
⟐ HKUST\n- **🔗 链接**：[[中英摘要](./abs/2406.18198.md)] [[arXiv:2406.18198](https://arxiv.org/abs/2406.18198)] [Code]\n- **📝 说明**：\n\n#### [8] ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians\n- **🧑‍🔬 作者**：Yufei Liu, Junshu Tang, Chu Zheng, Shijie Zhang, Jinkun Hao, Junwei Zhu, Dongjin Huang\n- **🏫 单位**：Shanghai University ⟐ Shanghai Jiao Tong University ⟐ Fudan University ⟐ Tencent Youtu Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2406.16815.md)] [[arXiv:2406.16815](https://arxiv.org/abs/2406.16815)] [Code]\n- **📝 说明**：\n\n#### [9] Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling\n- **🧑‍🔬 作者**：Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-wha Kim, Seungryong Kim\n- **🏫 单位**：Korea University ⟐ NAVER AI Lab ⟐ AI Institute of Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2406.16695.md)] [[arXiv:2406.16695](https://arxiv.org/abs/2406.16695)] [[Code](https://github.com/cvlab-kaist/GSD)]\n- **📝 说明**：\n\n#### [10] Taming 3DGS: High-Quality Radiance Fields with Limited Resources\n- **🧑‍🔬 作者**：Saswat Subhajyoti Mallick, Rahul Goel, Bernhard Kerbl, Francisco Vicente Carrasco, Markus Steinberger, Fernando De La Torre\n- **🏫 单位**：Carnegie Mellon University ⟐ Graz University of Technology ⟐ International Institute of Information Technology, Hyderabad\n- **🔗 链接**：[[中英摘要](./abs/2406.15643.md)] [[arXiv:2406.15643](https://arxiv.org/abs/2406.15643)] [Code]\n- **📝 说明**：\n\n#### [11] Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks\n- **🧑‍🔬 作者**：Alex Quach, Makram Chahine, Alexander Amini, Ramin Hasani, Daniela Rus\n- **🏫 单位**：MIT\n- **🔗 链接**：[[中英摘要](./abs/2406.15149.md)] [[arXiv:2406.15149](https://arxiv.org/abs/2406.15149)] [Code]\n- **📝 说明**：\n\n#### [12] E2GS: Event Enhanced Gaussian Splatting\n- **🧑‍🔬 作者**：Hiroyuki Deguchi, Mana Masuda, Takuya Nakabayashi, Hideo Saito\n- **🏫 单位**：Keio University\n- **🔗 链接**：[[中英摘要](./abs/2406.14978.md)] [[arXiv:2406.14978](https://arxiv.org/abs/2406.14978)] [[Code](https://github.com/deguchihiroyuki/E2GS)]\n- **📝 说明**：\n\n#### [13] Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models\n- **🧑‍🔬 作者**：Paul Henderson, Melonie de Almeida, Daniela Ivanova, Titas Anciukevičius\n- **🏫 单位**：University of Glasgow ⟐ University of Edinburgh\n- **🔗 链接**：[[中英摘要](./abs/2406.13099.md)] [[arXiv:2406.13099](https://arxiv.org/abs/2406.13099)] [Code]\n- **📝 说明**：\n\n#### [14] RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians\n- **🧑‍🔬 作者**：Bingling Li, Shengyi Chen, Luchao Wang, Kaimin He, Sijie Yan, Yuanjun Xiong\n- **🏫 单位**：MThreads AI\n- **🔗 链接**：[[中英摘要](./abs/2406.11836.md)] [[arXiv:2406.11836](https://arxiv.org/abs/2406.11836)] [Code]\n- **📝 说明**：\n\n#### [15] Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections\n- **🧑‍🔬 作者**：Jiacong Xu, Yiqun Mei, Vishal M. 
Patel\n- **🏫 单位**：Johns Hopkins University\n- **🔗 链接**：[[中英摘要](./abs/2406.10373.md)] [[arXiv:2406.10373](https://arxiv.org/abs/2406.10373)] [Code]\n- **📝 说明**：\n\n#### [16] GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors\n- **🧑‍🔬 作者**：Xiqian Yu, Hanxin Zhu, Tianyu He, Zhibo Chen\n- **🏫 单位**：University of Science and Technology of China ⟐ Microsoft Research Asia\n- **🔗 链接**：[[中英摘要](./abs/2406.10111.md)] [[arXiv:2406.10111](https://arxiv.org/abs/2406.10111)] [[Code](https://github.com/chchnii/GaussianSR)]\n- **📝 说明**：\n\n#### [17] GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion\n- **🧑‍🔬 作者**：Trapoom Ukarapol, Kevin Pruvost\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2406.09850.md)] [[arXiv:2406.09850](https://arxiv.org/abs/2406.09850)] [[Code](https://github.com/trapoom555/GradeADreamer)]\n- **📝 说明**：\n\n#### [18] Unified Gaussian Primitives for Scene Representation and Rendering\n- **🧑‍🔬 作者**：Yang Zhou, Songyin Wu, Ling-Qi Yan\n- **🏫 单位**：University of California, Santa Barbara, USA\n- **🔗 链接**：[[中英摘要](./abs/2406.09733.md)] [[arXiv:2406.09733](https://arxiv.org/abs/2406.09733)] [Code]\n- **📝 说明**：\n\n#### [19] Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling\n- **🧑‍🔬 作者**：Fengyi Zhang, Tianjun Zhang, Lin Zhang, Helen Huang, Yadan Luo\n- **🏫 单位**：Tongji University ⟐ University of Queensland\n- **🔗 链接**：[[中英摘要](./abs/2406.08759.md)] [[arXiv:2406.08759](https://arxiv.org/abs/2406.08759)] [[Code](https://github.com/Xian-Bei/GaussianForest)]\n- **📝 说明**：\n\n#### [20] Trim 3D Gaussian Splatting for Accurate Geometry Representation\n- **🧑‍🔬 作者**：Lue Fan, Yuxue Yang, Minxing Li, Hongsheng Li, Zhaoxiang Zhang\n- **🏫 单位**：CASIA ⟐ MMLab,CUHK ⟐ Shanghai AILab\n- **🔗 链接**：[[中英摘要](./abs/2406.07499.md)] [[arXiv:2406.07499](https://arxiv.org/abs/2406.07499)] [[Code](https://github.com/YuxueYang1204/TrimGS)]\n- **📝 说明**：\n\n#### [21] InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping\n- **🧑‍🔬 作者**：Yunchao Zhang, Guandao Yang, Leonidas Guibas, Yanchao Yang\n- **🏫 单位**：The University of Hong Kong ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2406.05897.md)] [[arXiv:2406.05897](https://arxiv.org/abs/2406.05897)] [Code]\n- **📝 说明**：\n\n#### [22] RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering\n- **🧑‍🔬 作者**：Rui Zhang, Tianyue Luo, Weidong Yang, Ben Fei, Jingyi Xu, Qingyuan Zhou, Keyi Liu, Ying He\n- **🏫 单位**：Fudan University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2406.05852.md)] [[arXiv:2406.05852](https://arxiv.org/abs/2406.05852)] [Code]\n- **📝 说明**：\n\n#### [23] Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image\n- **🧑‍🔬 作者**：Stanislaw Szymanowicz, Eldar Insafutdinov, Chuanxia Zheng, Dylan Campbell, João F. 
Henriques, Christian Rupprecht, Andrea Vedaldi\n- **🏫 单位**：VGG, University of Oxford ⟐ Australian National University\n- **🔗 链接**：[[中英摘要](./abs/2406.04343.md)] [[arXiv:2406.04343](https://arxiv.org/abs/2406.04343)] [[Code](https://github.com/eldar/flash3d)]\n- **📝 说明**：\n\n#### [24] Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion\n- **🧑‍🔬 作者**：Fangfu Liu, Hanyang Wang, Shunyu Yao, Shengjun Zhang, Jie Zhou, Yueqi Duan\n- **🏫 单位**：Tsinghua University ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2406.04338.md)] [[arXiv:2406.04338](https://arxiv.org/abs/2406.04338)] [[Code](https://github.com/liuff19/Physics3D)]\n- **📝 说明**：\n\n#### [25] Gaussian Representation for Deformable Image Registration\n- **🧑‍🔬 作者**：Jihe Li, Fabian Zhang, Xia Li, Tianhao Zhang, Ye Zhang, Joachim Buhmann\n- **🏫 单位**：Peking University ⟐ ETH Zurich ⟐ Tsinghua University ⟐ Paul Scherrer Institut\n- **🔗 链接**：[[中英摘要](./abs/2406.03394.md)] [[arXiv:2406.03394](https://arxiv.org/abs/2406.03394)] [Code]\n- **📝 说明**：\n\n#### [26] SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition\n- **🧑‍🔬 作者**：Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. White\n- **🏫 单位**：NEural TransmissionS (NETS) Lab Florida Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2406.02533.md)] [[arXiv:2406.02533](https://arxiv.org/abs/2406.02533)] [Code]\n- **📝 说明**：\n\n#### [27] WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections\n- **🧑‍🔬 作者**：Yuze Wang, Junyi Wang, Yue Qi\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2406.02407.md)] [[arXiv:2406.02407](https://arxiv.org/abs/2406.02407)] [Code]\n- **📝 说明**：\n\n#### [28] Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning\n- **🧑‍🔬 作者**：Jiaxu Wang, Ziyi Zhang, Qiang Zhang, Jia Li, Jingkai Sun, Mingyuan Sun, Junhao He, Renjing Xu\n- **🏫 单位**：HKUST (GZ) ⟐ HKU ⟐ NEU\n- **🔗 链接**：[[中英摘要](./abs/2406.02370.md)] [[arXiv:2406.02370](https://arxiv.org/abs/2406.02370)] [Code]\n- **📝 说明**：\n\n#### [29] Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting\n- **🧑‍🔬 作者**：Shaojie Ma, Yawei Luo, Yi Yang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2406.01593.md)] [[arXiv:2406.01593](https://arxiv.org/abs/2406.01593)] [[Code](https://github.com/wcwac/MaGS)]\n- **📝 说明**：\n\n#### [30] RaDe-GS: Rasterizing Depth in Gaussian Splatting\n- **🧑‍🔬 作者**：Baowen Zhang, Chuan Fang, Rakesh Shrestha, Yixun Liang, Xiaoxiao Long, Ping Tan\n- **🏫 单位**：Hong Kong University of Science and Technology ⟐ Simon Fraser University\n- **🔗 链接**：[[中英摘要](./abs/2406.01467.md)] [[arXiv:2406.01467](https://arxiv.org/abs/2406.01467)] [[Code](https://github.com/BaowenZ/RaDe-GS)]\n- **📝 说明**：\n\n#### [31] Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting\n- **🧑‍🔬 作者**：Fang Li, Hao Zhang, Narendra Ahuja\n- **🏫 单位**：University of Illinois at Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2406.01042.md)] [[arXiv:2406.01042](https://arxiv.org/abs/2406.01042)] [[Code](https://github.com/fangli333/SC-4DGS)]\n- **📝 说明**：\n\n#### [32] GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis\n- **🧑‍🔬 作者**：Yumeng He, Yunbo Wang, Xiaokang Yang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2405.20791.md)] [[arXiv:2405.20791](https://arxiv.org/abs/2405.20791)] [Code]\n- **📝 说明**：\n\n#### [33] Fourier123: One Image to High-Quality 3D Object Generation with Hybrid Fourier 
Score Distillation\n- **🧑‍🔬 作者**：Shuzhou Yang, Yu Wang, Haijie Li, Jiarui Meng, Xiandong Meng, Jian Zhang\n- **🏫 单位**：Peking University ⟐ PengCheng Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2405.20669.md)] [[arXiv:2405.20669](https://arxiv.org/abs/2405.20669)] [[Code](https://github.com/Ysz2022/Fourier123)]\n- **📝 说明**：\n\n#### [34] S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving\n- **🧑‍🔬 作者**：Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang\n- **🏫 单位**：UC Berkeley ⟐ Peking University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2405.20323.md)] [[arXiv:2405.20323](https://arxiv.org/abs/2405.20323)] [[Code](https://github.com/nnanhuang/S3Gaussian/)]\n- **📝 说明**：\n\n#### [35] A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction\n- **🧑‍🔬 作者**：Jianghao Shen, Tianfu Wu\n- **🏫 单位**：North Carolina State University\n- **🔗 链接**：[[中英摘要](./abs/2405.20310.md)] [[arXiv:2405.20310](https://arxiv.org/abs/2405.20310)] [Code]\n- **📝 说明**：\n\n#### [36] EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images\n- **🧑‍🔬 作者**：Wangbo Yu, Chaoran Feng, Jiye Tang, Xu Jia, Li Yuan, Yonghong Tian\n- **🏫 单位**：Peking University ⟐ Pengcheng Laboratory ⟐ Dalian University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2405.20224.md)] [[arXiv:2405.20224](https://arxiv.org/abs/2405.20224)] [[Code](https://github.com/PKU-YuanGroup/EvaGaussians)]\n- **📝 说明**：\n\n#### [37] PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting\n- **🧑‍🔬 作者**：Qiaowei Miao, Yawei Luo, Yi Yang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2405.19957.md)] [[arXiv:2405.19957](https://arxiv.org/abs/2405.19957)] [Code]\n- **📝 说明**：\n\n#### [38] GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction\n- **🧑‍🔬 作者**：Haodong Xiang, Xinghui Li, Xiansong Lai, Wanting Zhang, Zhichao Liao, Kai Cheng, Xueping Liu\n- **🏫 单位**：Tsinghua University ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2405.19671.md)] [[arXiv:2405.19671](https://arxiv.org/abs/2405.19671)] [[Code](https://github.com/xhd0612/GaussianRoom)]\n- **📝 说明**：\n\n#### [39] Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian\n- **🧑‍🔬 作者**：Wei Sun, Qi Zhang, Yanzhao Zhou, Qixiang Ye, Jianbin Jiao, Yuan Li\n- **🏫 单位**：UCAS\n- **🔗 链接**：[[中英摘要](./abs/2405.19657.md)] [[arXiv:2405.19657](https://arxiv.org/abs/2405.19657)] [Code]\n- **📝 说明**：\n\n#### [40] TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM\n- **🧑‍🔬 作者**：Peifeng Jiang, Hong Liu, Xia Li, Ti Wang, Fabian Zhang, Joachim M. 
Buhmann\n- **🏫 单位**：Peking University ⟐ ETH Zurich\n- **🔗 链接**：[[中英摘要](./abs/2405.19614.md)] [[arXiv:2405.19614](https://arxiv.org/abs/2405.19614)] [[Code](https://github.com/ZeldaFromHeaven/TAMBRIDGE-DAVID)]\n- **📝 说明**：\n\n#### [41] E3Gen: Efficient, Expressive and Editable Avatars Generation\n- **🧑‍🔬 作者**：Weitian Zhang, Yichao Yan, Yunhui Liu, Xingdong Sheng, Xiaokang Yang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Lenovo Research\n- **🔗 链接**：[[中英摘要](./abs/2405.19203.md)] [[arXiv:2405.19203](https://arxiv.org/abs/2405.19203)] [[Code](https://github.com/olivia23333/E3Gen)]\n- **📝 说明**：\n\n#### [42] LP-3DGS: Learning to Prune 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhaoliang Zhang, Tianchen Song, Yongjae Lee, Li Yang, Cheng Peng, Rama Chellappa, Deliang Fan\n- **🏫 单位**：Johns Hopkins University ⟐ University of North Carolina at Charlotte\n- **🔗 链接**：[[中英摘要](./abs/2405.18784.md)] [[arXiv:2405.18784](https://arxiv.org/abs/2405.18784)] [Code]\n- **📝 说明**：\n\n#### [43] NegGS: Negative Gaussian Splatting\n- **🧑‍🔬 作者**：Artur Kasymov, Bartosz Czekaj, Marcin Mazur, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University\n- **🔗 链接**：[[中英摘要](./abs/2405.18163.md)] [[arXiv:2405.18163](https://arxiv.org/abs/2405.18163)] [Code]\n- **📝 说明**：\n\n#### [44] A Grid-Free Fluid Solver based on Gaussian Spatial Representation\n- **🧑‍🔬 作者**：Jingrui Xing, Bin Wang, Mengyu Chu, Baoquan Chen\n- **🏫 单位**：Peking University ⟐ Beijing Institute for General Artificial Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2405.18133.md)] [[arXiv:2405.18133](https://arxiv.org/abs/2405.18133)] [Code]\n- **📝 说明**：\n\n#### [45] A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Bin Zhang, Bi Zeng, Zexin Peng\n- **🏫 单位**：Guangdong University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2405.17891.md)] [[arXiv:2405.17891](https://arxiv.org/abs/2405.17891)] [Code]\n- **📝 说明**：\n\n#### [46] SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction\n- **🧑‍🔬 作者**：Yongjae Lee, Zhaoliang Zhang, Deliang Fan\n- **🏫 单位**：Johns Hopkins University\n- **🔗 链接**：[[中英摘要](./abs/2405.17793.md)] [[arXiv:2405.17793](https://arxiv.org/abs/2405.17793)] [Code]\n- **📝 说明**：\n\n#### [47] SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain\n- **🧑‍🔬 作者**：Butian Xiong, Xiaoyu Ye, Tze Ho Elden Tse, Kai Han, Shuguang Cui, Zhen Li\n- **🏫 单位**：CUHK, Shenzhen ⟐ Beijing Institute of Technology ⟐ Auki Labs ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2405.16923.md)] [[arXiv:2405.16923](https://arxiv.org/abs/2405.16923)] [Code]\n- **📝 说明**：\n\n#### [48] Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation\n- **🧑‍🔬 作者**：Zhoujie Fu, Jiacheng Wei, Wenhao Shen, Chaoyue Song, Xiaofeng Yang, Fayao Liu, Xulei Yang, Guosheng Lin\n- **🏫 单位**：Nanyang Technological University ⟐ A*STAR\n- **🔗 链接**：[[中英摘要](./abs/2405.16849.md)] [[arXiv:2405.16849](https://arxiv.org/abs/2405.16849)] [Code]\n- **📝 说明**：\n\n#### [49] PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zipeng Wang, Dan Xu\n- **🏫 单位**：HKUST\n- **🔗 链接**：[[中英摘要](./abs/2405.16829.md)] [[arXiv:2405.16829](https://arxiv.org/abs/2405.16829)] [Code]\n- **📝 说明**：\n\n#### [50] Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians\n- **🧑‍🔬 作者**：Erik Sandström, Keisuke Tateno, Michael Oechsle, Michael Niemeyer, Luc Van Gool, Martin R. 
Oswald, Federico Tombari\n- **🏫 单位**：Google ⟐ ETH Zürich ⟐ INSAIT ⟐ University of Amsterdam ⟐ TU München\n- **🔗 链接**：[[中英摘要](./abs/2405.16544.md)] [[arXiv:2405.16544](https://arxiv.org/abs/2405.16544)] [[Code](https://github.com/eriksandstroem/Splat-SLAM)]\n- **📝 说明**：\n\n#### [51] Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors\n- **🧑‍🔬 作者**：Soumava Paul, Christopher Wewer, Bernt Schiele, Jan Eric Lenssen\n- **🏫 单位**：Max Planck Institute for Informatics ⟐ Saarland University\n- **🔗 链接**：[[中英摘要](./abs/2405.16517.md)] [[arXiv:2405.16517](https://arxiv.org/abs/2405.16517)] [Code]\n- **📝 说明**：\n\n#### [52] Feature Splatting for Better Novel View Synthesis with Low Overlap\n- **🧑‍🔬 作者**：T. Berriel Martins, Javier Civera\n- **🏫 单位**：University of Zaragoza\n- **🔗 链接**：[[中英摘要](./abs/2405.15518.md)] [[arXiv:2405.15518](https://arxiv.org/abs/2405.15518)] [[Code](https://github.com/tberriel/FeatSplat)]\n- **📝 说明**：\n\n#### [53] GSDeformer: Direct Cage-based Deformation for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jiajun Huang, Hongchuan Yu\n- **🏫 单位**：Bournemouth University\n- **🔗 链接**：[[中英摘要](./abs/2405.15491.md)] [[arXiv:2405.15491](https://arxiv.org/abs/2405.15491)] [Code]\n- **📝 说明**：\n\n#### [54] MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes\n- **🧑‍🔬 作者**：Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong, Zhenguo Li, Qiang Xu\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Hong Kong University of Science and Technology ⟐ Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](./abs/2405.14475.md)] [[arXiv:2405.14475](https://arxiv.org/abs/2405.14475)] [[Code](https://github.com/flymin/MagicDrive3D)]\n- **📝 说明**：\n\n#### [55] TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing\n- **🧑‍🔬 作者**：Teng Xu, Jiamin Chen, Peng Chen, Youjia Zhang, Junqing Yu, Wei Yang\n- **🏫 单位**：Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2405.14455.md)] [[arXiv:2405.14455](https://arxiv.org/abs/2405.14455)] [Code]\n- **📝 说明**：\n\n#### [56] RoGS: Large Scale Road Surface Reconstruction based on 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhiheng Feng, Wenhua Wu, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2405.14342.md)] [[arXiv:2405.14342](https://arxiv.org/abs/2405.14342)] [Code]\n- **📝 说明**：\n\n#### [57] Monocular Gaussian SLAM with Language Extended Loop Closure\n- **🧑‍🔬 作者**：Tian Lan, Qinwei Lin, Haoqian Wang\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2405.13748.md)] [[arXiv:2405.13748](https://arxiv.org/abs/2405.13748)] [Code]\n- **📝 说明**：\n\n#### [58] Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances\n- **🧑‍🔬 作者**：Licheng Shen, Ho Ngai Chow, Lingyun Wang, Tong Zhang, Mengqiu Wang, Yuxing Han\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School, Tsinghua University ⟐ Zero-Zero Lab ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2405.13694.md)] [[arXiv:2405.13694](https://arxiv.org/abs/2405.13694)] [Code]\n- **📝 说明**：\n\n#### [59] MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video\n- **🧑‍🔬 作者**：Hongsheng Wang, Xiang Cai, Xi Sun, Jinhong Yue, Shengyu Zhang, Feng Lin, Fei Wu\n- **🏫 单位**：Zhejiang University ⟐ Zhejiang Lab, China\n- **🔗 链接**：[[中英摘要](./abs/2405.12806.md)] [[arXiv:2405.12806](https://arxiv.org/abs/2405.12806)] [[Code](https://github.com/3DHumanRehab/MOSS)]\n- **📝 说明**：\n\n#### [60] LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting\n- **🧑‍🔬 作者**：Jia Gong, Shenyu Ji, Lin Geng Foo, Kang Chen, 
Hossein Rahmani, Jun Liu\n- **🏫 单位**：Singapore University of Technology and Design ⟐ Netease ⟐ Lancaster University\n- **🔗 链接**：[[中英摘要](./abs/2405.12663.md)] [[arXiv:2405.12663](https://arxiv.org/abs/2405.12663)] [[Code](https://github.com/richzhang/webpage-template)]\n- **📝 说明**：\n\n#### [61] Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery\n- **🧑‍🔬 作者**：Hongsheng Wang, Weiyue Zhang, Sihao Liu, Xinrui Zhou, Shengyu Zhang, Fei Wu, Feng Lin\n- **🏫 单位**：Zhejiang University ⟐ Zhejiang Lab, China\n- **🔗 链接**：[[中英摘要](./abs/2405.12477.md)] [[arXiv:2405.12477](https://arxiv.org/abs/2405.12477)] [[Code](https://github.com/3DHumanRehab/SemanticGraph-Gaussian)]\n- **📝 说明**：\n\n#### [62] Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping\n- **🧑‍🔬 作者**：Tianhao Wu, Jing Yang, Zhilin Guo, Jingyi Wan, Fangcheng Zhong, Cengiz Oztireli\n- **🏫 单位**：University of Cambridge, United Kingdom\n- **🔗 链接**：[[中英摘要](./abs/2405.12069.md)] [[arXiv:2405.12069](https://arxiv.org/abs/2405.12069)] [[Code](https://github.com/ChikaYan/Gaussian-HS)]\n- **📝 说明**：\n\n#### [63] GGAvatar: Geometric Adjustment of Gaussian Head Avatar\n- **🧑‍🔬 作者**：Xinyang Li, Jiaxin Wang, Yixin Xuan, Gongxin Yao, Yu Pan\n- **🏫 单位**：Zhejiang University ⟐ Hangzhou Dianzi University\n- **🔗 链接**：[[中英摘要](./abs/2405.11993.md)] [[arXiv:2405.11993](https://arxiv.org/abs/2405.11993)] [Code]\n- **📝 说明**：\n\n#### [64] MotionGS: Compact Gaussian Splatting SLAM by Motion Filter\n- **🧑‍🔬 作者**：Xinli Guo, Peng Han, Weidong Zhang, Hongtian Chen\n- **🏫 单位**：Shanghai Jiao Tong University, China\n- **🔗 链接**：[[中英摘要](./abs/2405.11129.md)] [[arXiv:2405.11129](https://arxiv.org/abs/2405.11129)] [Code]\n- **📝 说明**：\n\n#### [65] Photorealistic 3D Urban Scene Reconstruction and Point Cloud Extraction using Google Earth Imagery and Gaussian Splatting\n- **🧑‍🔬 作者**：Kyle Gao, Dening Lu, Hongjie He, Linlin Xu, Jonathan Li\n- **🏫 单位**：Department of Systems Design Engineering, University of Waterloo\n- **🔗 链接**：[[中英摘要](./abs/2405.11021.md)] [[arXiv:2405.11021](https://arxiv.org/abs/2405.11021)] [Code]\n- **📝 说明**：\n\n#### [66] GS-Planner: A Gaussian-Splatting-based Planning Framework for Active High-Fidelity Reconstruction\n- **🧑‍🔬 作者**：Rui Jin, Yuman Gao, Haojian Lu, Fei Gao\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2405.10142.md)] [[arXiv:2405.10142](https://arxiv.org/abs/2405.10142)] [Code]\n- **📝 说明**：✏️\n\n#### [67] GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting\n- **🧑‍🔬 作者**：Haodong Chen, Yongle Huang, Haojian Huang, Xiangsheng Ge, Dian Shao\n- **🏫 单位**：Northwestern Polytechnical University ⟐  The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2405.07472.md)] [[arXiv:2405.07472](https://arxiv.org/abs/2405.07472)] [Code]\n- **📝 说明**：✏️\n\n#### [68] Direct Learning of Mesh and Appearance via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Ancheng Lin, Jun Li\n- **🏫 单位**：School of Computer Science, Australian Artificial Intelligence Institute (AAII) ⟐  University of Technology Sydney, Sydney, NSW 2007, Australia\n- **🔗 链接**：[[中英摘要](./abs/2405.06945.md)] [[arXiv:2405.06945](https://arxiv.org/abs/2405.06945)] [Code]\n- **📝 说明**：✏️\n\n#### [69] OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation\n- **🧑‍🔬 作者**：Jinwei Lin\n- **🏫 单位**：Monash University, Clayton Victoria, Australia\n- **🔗 链接**：[[中英摘要](./abs/2405.06547.md)] [[arXiv:2405.06547](https://arxiv.org/abs/2405.06547)] 
[[Code](https://github.com/lin-jinwei/OneTo3D)]\n- **📝 说明**：✏️\n\n#### [70] I3DGS: Improve 3D Gaussian Splatting from Multiple Dimension\n- **🧑‍🔬 作者**：Jinwei Lin\n- **🏫 单位**：Monash University, Clayton Victoria, Australia\n- **🔗 链接**：[[中英摘要](./abs/2405.06408.md)] [[arXiv:2405.06408](https://arxiv.org/abs/2405.06408)] [Code]\n- **📝 说明**：✏️\n\n#### [71] DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation\n- **🧑‍🔬 作者**：Sitian Shen, Jing Xu, Yuheng Yuan, Xingyi Yang, Qiuhong Shen, Xinchao Wang\n- **🏫 单位**：Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2405.05800.md)] [[arXiv:2405.05800](https://arxiv.org/abs/2405.05800)] [Code]\n- **📝 说明**：✏️\n\n#### [72] NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap\n- **🧑‍🔬 作者**：Mingrui Li, Jingwei Huang, Lei Sun, Aaron Xuxiang Tian, Tianchen Deng, Hongyu Wang\n- **🏫 单位**：Dalian University of Technology ⟐ University of Electronic Science and Technology of China ⟐ University of Pennsylvania\n- **🔗 链接**：[[中英摘要](./abs/2405.05702.md)] [[arXiv:2405.05702](https://arxiv.org/abs/2405.05702)] [Code]\n- **📝 说明**：✏️\n\n#### [73] GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields\n- **🧑‍🔬 作者**：Yuanhao Gong\n- **🏫 单位**：Electronics and Information Engineering, Shenzhen University, China\n- **🔗 链接**：[[中英摘要](./abs/2405.05446.md)] [[arXiv:2405.05446](https://arxiv.org/abs/2405.05446)] [Code]\n- **📝 说明**：✏️\n\n#### [74] Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting\n- **🧑‍🔬 作者**：Ola Shorinwa, Johnathan Tucker, Aliyah Smith, Aiden Swann, Timothy Chen, Roya Firoozi, Monroe Kennedy III, Mac Schwager\n- **🏫 单位**：Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2405.04378.md)] [[arXiv:2405.04378](https://arxiv.org/abs/2405.04378)] [[Code](https://github.com/StanfordMSL/Splat-MOVER)]\n- **📝 说明**：✏️\n\n#### [75] DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos\n- **🧑‍🔬 作者**：Wen-Hsuan Chu, Lei Ke, Katerina Fragkiadaki\n- **🏫 单位**：Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2405.02280.md)] [[arXiv:2405.02280](https://arxiv.org/abs/2405.02280)] [[Code](https://github.com/dreamscene4d/dreamscene4d)]\n- **📝 说明**：✏️\n\n#### [76] Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians\n- **🧑‍🔬 作者**：Zhenya Yang, Kai Chen, Yonghao Long, Qi Dou\n- **🏫 单位**：The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2405.00956.md)] [[arXiv:2405.00956](https://arxiv.org/abs/2405.00956)] [Code]\n- **📝 说明**：✏️\n\n#### [77] Spectrally Pruned Gaussian Fields with Neural Compensation\n- **🧑‍🔬 作者**：Runyi Yang, Zhenxin Zhu, Zhou Jiang, Baijun Ye, Xiaoxue Chen, Yifei Zhang, Yuantao Chen, Jian Zhao, Hao Zhao\n- **🏫 单位**：Tsinghua University ⟐ Imperial College London ⟐ Beihang University ⟐ Beijing Institute of Technology ⟐ UCAS ⟐ CUHK ⟐ China Telecom\n- **🔗 链接**：[[中英摘要](./abs/2405.00676.md)] [[arXiv:2405.00676](https://arxiv.org/abs/2405.00676)] [[Code](https://github.com/RunyiYang/SUNDAE)]\n- **📝 说明**：✏️\n\n#### [78] GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, Zexiang Xu\n- **🏫 单位**：Adobe Research ⟐ Cornell University\n- **🔗 链接**：[[中英摘要](./abs/2404.19702.md)] [[arXiv:2404.19702](https://arxiv.org/abs/2404.19702)] [Code]\n- **📝 说明**：✏️\n\n#### [79] MicroDreamer: Zero-shot 3D Generation in ∼20 Seconds by Score-based Iterative Reconstruction\n- **🧑‍🔬 作者**：Luxi Chen, Zhengyi Wang, Chongxuan 
Li, Tingting Gao, Hang Su, Jun Zhu\n- **🏫 单位**：Renmin University of China ⟐ Tsinghua University ⟐ Kuaishou Technology\n- **🔗 链接**：[[中英摘要](./abs/2404.19525.md)] [[arXiv:2404.19525](https://arxiv.org/abs/2404.19525)] [[Code](https://github.com/ML-GSAI/MicroDreamer)]\n- **📝 说明**：✏️\n\n#### [80] GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting\n- **🧑‍🔬 作者**：Bo Chen, Shoukang Hu, Qi Chen, Chenpeng Du, Ran Yi, Yanmin Qian, Xie Chen\n- **🏫 单位**：Shanghai Jiaotong University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2404.19040.md)] [[arXiv:2404.19040](https://arxiv.org/abs/2404.19040)] [Code]\n- **📝 说明**：✏️\n\n#### [81] Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yifei Gao, Jie Ou, Lei Wang, Jun Cheng\n- **🏫 单位**：University of Electronic Science and Technology of China ⟐ Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences ⟐ Sichuan Yuanzhigu Technology Co., Ltd\n- **🔗 链接**：[[中英摘要](./abs/2404.18669.md)] [[arXiv:2404.18669](https://arxiv.org/abs/2404.18669)] [Code]\n- **📝 说明**：✏️\n\n#### [82] 3D Gaussian Splatting with Deferred Reflection\n- **🧑‍🔬 作者**：Keyang Ye, Qiming Hou, Kun Zhou\n- **🏫 单位**：State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China\n- **🔗 链接**：[[中英摘要](./abs/2404.18454.md)] [[arXiv:2404.18454](https://arxiv.org/abs/2404.18454)] [Code]\n- **📝 说明**：✏️\n\n#### [83] Reconstructing Satellites in 3D from Amateur Telescope Images\n- **🧑‍🔬 作者**：Zhiming Chang, Boyang Liu, Yifei Xia, Youming Guo, Boxin Shi, He Sun\n- **🏫 单位**：Simon Fraser University ⟐ University of British Columbia ⟐ University of Toronto ⟐ Google DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2404.18394.md)] [[arXiv:2404.18394](https://arxiv.org/abs/2404.18394)] [Code]\n- **📝 说明**：✏️\n\n#### [84] SLAM for Indoor Mapping of Wide Area Construction Environments\n- **🧑‍🔬 作者**：Vincent Ress, Wei Zhang, David Skuddis, Norbert Haala, Uwe Soergel\n- **🏫 单位**：Institute for Photogrammetry and Geoinformatics, University of Stuttgart, Germany\n- **🔗 链接**：[[中英摘要](./abs/2404.17215.md)] [[arXiv:2404.17215](https://arxiv.org/abs/2404.17215)] [Code]\n- **📝 说明**：✏️\n\n#### [85] DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction\n- **🧑‍🔬 作者**：Jiamin Wu, Kenkun Liu, Han Gao, Xiaoke Jiang, Lei Zhang\n- **🏫 单位**：Hong Kong University of Science and Technology ⟐ International Digital Economy Academy (IDEA) ⟐ The Chinese University of Hong Kong, Shenzhen\n- **🔗 链接**：[[中英摘要](./abs/2404.16323.md)] [[arXiv:2404.16323](https://arxiv.org/abs/2404.16323)] [Code]\n- **📝 说明**：✏️\n\n#### [86] GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko, Sangjun Ahn, Seungryong Kim\n- **🏫 单位**：Korea University ⟐ NCSOFT\n- **🔗 链接**：[[中英摘要](./abs/2404.16012.md)] [[arXiv:2404.16012](https://arxiv.org/abs/2404.16012)] [[Code](https://github.com/KU-CVLAB/GaussianTalker)]\n- **📝 说明**：✏️\n\n#### [87] OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation\n- **🧑‍🔬 作者**：Lizhi Wang, Feng Zhou, Jianqin Yin\n- **🏫 单位**：Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2404.15891.md)] [[arXiv:2404.15891](https://arxiv.org/abs/2404.15891)] [[Code](https://github.com/CrystalWlz/OMEGAS)]\n- **📝 说明**：✏️\n\n#### [88] Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation\n- **🧑‍🔬 作者**：Myrna C. 
Silva, Mahtab Dahaghin, Matteo Toso, Alessio Del Bue\n- **🏫 单位**：Pattern Analysis and Computer Vision (PAVIS), Istituto Italiano di Tecnologia (IIT) Genoa, Italy\n- **🔗 链接**：[[中英摘要](./abs/2404.12784.md)] [[arXiv:2404.12784](https://arxiv.org/abs/2404.12784)] [Code]\n- **📝 说明**：✏️\n\n#### [89] EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation\n- **🧑‍🔬 作者**：Wenkai Liu, Tao Guan, Bin Zhu, Lili Ju, Zikai Song, Dan Li, Yuesong Wang, Wei Yang\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ Wuhan Farsee2 Technology Co., Ltd. ⟐ University of South Carolina\n- **🔗 链接**：[[中英摘要](./abs/2404.12777.md)] [[arXiv:2404.12777](https://arxiv.org/abs/2404.12777)] [Code]\n- **📝 说明**：✏️\n\n#### [90] Does Gaussian Splatting need SFM Initialization?\n- **🧑‍🔬 作者**：Yalda Foroutan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi\n- **🏫 单位**：Simon Fraser University ⟐ University of British Columbia ⟐ University of Toronto ⟐ Google DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2404.12547.md)] [[arXiv:2404.12547](https://arxiv.org/abs/2404.12547)] [Code]\n- **📝 说明**：✏️\n\n#### [91] InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior\n- **🧑‍🔬 作者**：Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao\n- **🏫 单位**：University of Science and Technology of China ⟐ The Hong Kong University of Science and Technology ⟐ Ant Group ⟐ Alibaba Group\n- **🔗 链接**：[[中英摘要](./abs/2404.11613.md)] [[arXiv:2404.11613](https://arxiv.org/abs/2404.11613)] [[Code](https://github.com/ali-vilab/infusion)]\n- **📝 说明**：✏️\n\n#### [92] DeblurGS: Gaussian Splatting for Camera Motion Blur\n- **🧑‍🔬 作者**：Jeongtaek Oh, Jaeyoung Chung, Dongwoo Lee, Kyoung Mu Lee\n- **🏫 单位**：Seoul National University ⟐ Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2404.11358.md)] [[arXiv:2404.11358](https://arxiv.org/abs/2404.11358)] [Code]\n- **📝 说明**：✏️\n\n#### [93] AbsGS: Recovering Fine Details for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zongxin Ye, Wenyu Li, Sidun Liu, Peng Qiao, Yong Dou\n- **🏫 单位**：National University of Defense Technology Changsha, China\n- **🔗 链接**：[[中英摘要](./abs/2404.10484.md)] [[arXiv:2404.10484](https://arxiv.org/abs/2404.10484)] [[Code](https://github.com/TY424/AbsGS)]\n- **📝 说明**：✏️\n\n#### [94] SRGS: Super-Resolution 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiang Feng, Yongbo He, Yubo Wang, Yan Yang, Zhenzhong Kuang, Yu Jun, Jianping Fan, Jiajun ding\n- **🏫 单位**：Hangzhou Dianzi University\n- **🔗 链接**：[[中英摘要](./abs/2404.10318.md)] [[arXiv:2404.10318](https://arxiv.org/abs/2404.10318)] [Code]\n- **📝 说明**：✏️\n\n#### [95] CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting\n- **🧑‍🔬 作者**：Xiangrui Liu, Xinju Wu, Pingping Zhang, Shiqi Wang, Zhu Li, Sam Kwong\n- **🏫 单位**：City University of Hong Kong ⟐ University of Missouri-Kansas City ⟐ Lingnan University\n- **🔗 链接**：[[中英摘要](./abs/2404.09458.md)] [[arXiv:2404.09458](https://arxiv.org/abs/2404.09458)] [Code]\n- **📝 说明**：✏️\n\n#### [96] DeferredGS: Decoupled and Editable Gaussian Splatting with Deferred Shading\n- **🧑‍🔬 作者**：Tong Wu, Jia-Mu Sun, Yu-Kun Lai, Yuewen Ma, Leif Kobbelt, Lin Gao\n- **🏫 单位**：Institute of Computing Technology\n- **🔗 链接**：[[中英摘要](./abs/2404.09412.md)] [[arXiv:2404.09412](https://arxiv.org/abs/2404.09412)] [Code]\n- **📝 说明**：✏️\n\n#### [97] DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling\n- **🧑‍🔬 作者**：Xuening Yuan, Hongyu Yang, Yueming Zhao, Di Huang\n- **🏫 单位**：Beihang University Beijing, 
China\n- **🔗 链接**：[[中英摘要](./abs/2404.09227.md)] [[arXiv:2404.09227](https://arxiv.org/abs/2404.09227)] [Code]\n- **📝 说明**：✏️\n\n#### [98] EGGS: Edge Guided Gaussian Splatting for Radiance Fields\n- **🧑‍🔬 作者**：Yuanhao Gong\n- **🏫 单位**：Electronics and Information Engineering, Shenzhen University, China\n- **🔗 链接**：[[中英摘要](./abs/2404.09105.md)] [[arXiv:2404.09105](https://arxiv.org/abs/2404.09105)] [Code]\n- **📝 说明**：✏️\n\n#### [99] OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering\n- **🧑‍🔬 作者**：Jingrui Ye, Zongkai Zhang, Yujiao Jiang, Qingmin Liao, Wenming Yang, Zongqing Lu\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School, Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2404.08449.md)] [[arXiv:2404.08449](https://arxiv.org/abs/2404.08449)] [Code]\n- **📝 说明**：✏️\n\n#### [100] Reinforcement Learning with Generalizable Gaussian Splatting\n- **🧑‍🔬 作者**：Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Yecheng Shao, Renjing Xu\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou), China ⟐  Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2404.07950.md)] [[arXiv:2404.07950](https://arxiv.org/abs/2404.07950)] [Code]\n- **📝 说明**：✏️\n\n#### [101] Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction\n- **🧑‍🔬 作者**：Sierra Bonilla, Shuai Zhang, Dimitrios Psychogyios, Danail Stoyanov, Francisco Vasconcelos, Sophia Bano\n- **🏫 单位**：University College London\n- **🔗 链接**：[[中英摘要](./abs/2404.06128.md)] [[arXiv:2404.06128](https://arxiv.org/abs/2404.06128)] [[Code](https://github.com/smbonilla/GaussianPancakes)]\n- **📝 说明**：✏️\n\n#### [102] Revising Densification in Gaussian Splatting\n- **🧑‍🔬 作者**：Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder\n- **🏫 单位**：Meta Reality Labs Zurich\n- **🔗 链接**：[[中英摘要](./abs/2404.06109.md)] [[arXiv:2404.06109](https://arxiv.org/abs/2404.06109)] [Code]\n- **📝 说明**：✏️\n\n#### [103] GauU-Scene V2: Expanse Lidar Image Dataset Shows Unreliable Geometric Reconstruction Using Gaussian Splatting and NeRF\n- **🧑‍🔬 作者**：Butian Xiong, Nanjun Zheng, Zhen Li\n- **🏫 单位**：The Chinese University of Hong Kong, Shenzhen\n- **🔗 链接**：[[中英摘要](./abs/2404.04880.md)] [[arXiv:2404.04880](https://arxiv.org/abs/2404.04880)] [[Code](https://github.com/saliteta/lidar_SfM_alignment)]\n- **📝 说明**：✏️\n\n#### [104] Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion\n- **🧑‍🔬 作者**：Ziyuan Qu, Omkar Vengurlekar, Mohamad Qadri, Kevin Zhang, Michael Kaess, Christopher Metzler, Suren Jayasuriya, Adithya Pediredla\n- **🏫 单位**：Dartmouth College ⟐ Arizona State University ⟐ Carnegie Mellon University ⟐ University of Maryland\n- **🔗 链接**：[[中英摘要](./abs/2404.04687.md)] [[arXiv:2404.04687](https://arxiv.org/abs/2404.04687)] [Code]\n- **📝 说明**：✏️\n\n#### [105] Robust Gaussian Splatting\n- **🧑‍🔬 作者**：François Darmon, Lorenzo Porzi, Samuel Rota-Bulò, Peter Kontschieder\n- **🏫 单位**：Meta Reality Labs Zurich\n- **🔗 链接**：[[中英摘要](./abs/2404.04211.md)] [[arXiv:2404.04211](https://arxiv.org/abs/2404.04211)] [Code]\n- **📝 说明**：✏️\n\n#### [106] GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis\n- **🧑‍🔬 作者**：Emmanouil Nikolakakis, Utkarsh Gupta, Jonathan Vengosh, Justin Bui, Razvan Marinescu\n- **🏫 单位**：University of California\n- **🔗 链接**：[[中英摘要](./abs/2404.03126.md)] [[arXiv:2404.03126](https://arxiv.org/abs/2404.03126)] [Code]\n- **📝 说明**：✏️\n\n#### [107] TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Surrounding Autonomous Driving Scenes\n- **🧑‍🔬 作者**：Cheng Zhao, Su Sun, Ruoyu Wang, Yuliang Guo, 
Jun-Jun Wan, Zhou Huang, Xinyu Huang, Yingjie Victor Chen, Liu Ren\n- **🏫 单位**：Bosch Research North America ⟐ Purdue University ⟐ XC Cross Domain Computing\n- **🔗 链接**：[[中英摘要](./abs/2404.02410.md)] [[arXiv:2404.02410](https://arxiv.org/abs/2404.02410)] [Code]\n- **📝 说明**：✏️\n\n#### [108] 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaoyang Lyu, Yang-Tian Sun, Yi-Hua Huang, Xiuzhe Wu, Ziyi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi\n- **🏫 单位**：The University of Hong Kong ⟐ Zhejiang University ⟐ Shanghai AI Lab\n- **🔗 链接**：[[中英摘要](./abs/2404.00409.md)] [[arXiv:2404.00409](https://arxiv.org/abs/2404.00409)] [Code]\n- **📝 说明**：✏️\n\n#### [109] InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds\n- **🧑‍🔬 作者**：Zhiwen Fan, Wenyan Cong, Kairun Wen, Kevin Wang, Jian Zhang, Xinghao Ding, Danfei Xu, Boris Ivanovic, Marco Pavone, Georgios Pavlakos, Zhangyang Wang, Yue Wang\n- **🏫 单位**：University of Texas at Austin ⟐ Nvidia Research ⟐ Xiamen University ⟐ Georgia Institute of Technology ⟐ Stanford University ⟐ University of Southern California\n- **🔗 链接**：[[中英摘要](./abs/2403.20309.md)] [[arXiv:2403.20309](https://arxiv.org/abs/2403.20309)] [[Code](https://github.com/NVlabs/InstantSplat)]\n- **📝 说明**：✏️\n\n#### [110] Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces\n- **🧑‍🔬 作者**：Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison\n- **🏫 单位**：University of Bristol ⟐ Google Zurich ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2403.20275.md)] [[arXiv:2403.20275](https://arxiv.org/abs/2403.20275)] [Code]\n- **📝 说明**：✏️\n\n#### [111] HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes\n- **🧑‍🔬 作者**：Ke Wu, Kaizhao Zhang, Zhiwei Zhang, Shanshuai Yuan, Muer Tie, Julong Wei, Zijun Xu, Jieru Zhao, Zhongxue Gan, Wenchao Ding\n- **🏫 单位**：Fudan University ⟐ Harbin Institute of Technology ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2403.20159.md)] [[arXiv:2403.20159](https://arxiv.org/abs/2403.20159)] [Code]\n- **📝 说明**：✏️\n\n#### [112] SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior\n- **🧑‍🔬 作者**：Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun\n- **🏫 单位**：ETH Zürich ⟐ Baidu Research ⟐ University of Chinese Academy of Sciences ⟐ Harbin Institute of Technology ⟐ Tianjin University\n- **🔗 链接**：[[中英摘要](./abs/2403.20079.md)] [[arXiv:2403.20079](https://arxiv.org/abs/2403.20079)] [Code]\n- **📝 说明**：✏️\n\n#### [113] HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes\n- **🧑‍🔬 作者**：Zhuopeng Li, Yilin Zhang, Chenming Wu, Jianke Zhu, Liangjun Zhang\n- **🏫 单位**：Zhejiang University ⟐ Baidu Research\n- **🔗 链接**：[[中英摘要](./abs/2403.20032.md)] [[arXiv:2403.20032](https://arxiv.org/abs/2403.20032)] [Code]\n- **📝 说明**：✏️\n\n#### [114] GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond\n- **🧑‍🔬 作者**：Chongjie Ye, Yinyu Nie, Jiahao Chang, Yuantao Chen, Yihao Zhi, Xiaoguang Han\n- **🏫 单位**：CUHKSZ ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2403.19632.md)] [[arXiv:2403.19632](https://arxiv.org/abs/2403.19632)] [[Code](https://github.com/GAP-LAB-CUHK-SZ/gaustudio)]\n- **📝 说明**：✏️\n\n#### [115] SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing\n- **🧑‍🔬 作者**：Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao\n- 
**🏫 单位**：Tsinghua University ⟐ Tongji University ⟐ Ocean University of China ⟐ Duke Kunshan University ⟐ Haomo.ai\n- **🔗 链接**：[[中英摘要](./abs/2403.19615.md)] [[arXiv:2403.19615](https://arxiv.org/abs/2403.19615)] [[Code](https://github.com/zsy1987/SA-GS)]\n- **📝 说明**：✏️\n\n#### [116] SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface\n- **🧑‍🔬 作者**：Jiahao Luo, Jing Liu, James Davis\n- **🏫 单位**：University of California ⟐ ByteDance Inc.\n- **🔗 链接**：[[中英摘要](./abs/2403.18784.md)] [[arXiv:2403.18784](https://arxiv.org/abs/2403.18784)] [Code]\n- **📝 说明**：✏️\n\n#### [117] Modeling uncertainty for Gaussian Splatting\n- **🧑‍🔬 作者**：Luca Savant, Diego Valsesia, Enrico Magli\n- **🏫 单位**：Politecnico di Torino\n- **🔗 链接**：[[中英摘要](./abs/2403.18476.md)] [[arXiv:2403.18476](https://arxiv.org/abs/2403.18476)] [Code]\n- **📝 说明**：✏️\n\n#### [118] Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians\n- **🧑‍🔬 作者**：Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory ⟐ Tongji University ⟐ University of Science and Technology of China ⟐ The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2403.17898.md)] [[arXiv:2403.17898](https://arxiv.org/abs/2403.17898)] [[Code](https://github.com/city-super/Octree-GS)]\n- **📝 说明**：✏️\n\n#### [119] 2D Gaussian Splatting for Geometrically Accurate Radiance Fields\n- **🧑‍🔬 作者**：Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao\n- **🏫 单位**：ShanghaiTech University ⟐ University of Tübingen\n- **🔗 链接**：[[中英摘要](./abs/2403.17888.md)] [[arXiv:2403.17888](https://arxiv.org/abs/2403.17888)] [Code]\n- **📝 说明**：✏️\n\n#### [120] DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion\n- **🧑‍🔬 作者**：Yuanze Lin, Ronald Clark, Philip Torr\n- **🏫 单位**：University of Oxford\n- **🔗 链接**：[[中英摘要](./abs/2403.17237.md)] [[arXiv:2403.17237](https://arxiv.org/abs/2403.17237)] [[Code](https://github.com/yuanze-lin/DreamPolisher)]\n- **📝 说明**：✏️\n\n#### [121] Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jun Guo, Xiaojian Ma, Yue Fan, Huaping Liu, Qing Li\n- **🏫 单位**：BIGAI ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2403.15624.md)] [[arXiv:2403.15624](https://arxiv.org/abs/2403.15624)] [[Code](https://github.com/sharinka0715/semantic-gaussians)]\n- **📝 说明**：✏️\n\n#### [122] EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting\n- **🧑‍🔬 作者**：Kailing Wang, Chen Yang, Yuehao Wang, Sikuang Li, Yan Wang, Qi Dou, Xiaokang Yang, Wei Shen\n- **🏫 单位**： Shanghai Jiao Tong University ⟐ The Chinese University of Hong Kong ⟐ East China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2403.15124.md)] [[arXiv:2403.15124](https://arxiv.org/abs/2403.15124)] [[Code](https://github.com/endogslam/EndoGSLAM)]:\n- **📝 说明**：✏️\n\n#### [123] GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation\n- **🧑‍🔬 作者**：Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein\n- **🏫 单位**：Stanford University ⟐ The Hong Kong University of Science and Technology ⟐ Shanghai AI Laboratory ⟐ Zhejiang University ⟐ Ant Group\n- **🔗 链接**：[[中英摘要](./abs/2403.14621.md)] [[arXiv:2403.14621](https://arxiv.org/abs/2403.14621)] [[Code](https://github.com/justimyhxu/GRM)]\n- **📝 说明**：✏️\n\n#### [124] Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering\n- **🧑‍🔬 作者**：Yuanhao Gong, Lantao 
Yu, Guanghui Yue\n- **🏫 单位**：Shenzhen University ⟐ Guangdong Key Laboratory of Intelligent Information Processing ⟐ Adobe Inc.\n- **🔗 链接**：[[中英摘要](./abs/2403.14244.md)] [[arXiv:2403.14244](https://arxiv.org/abs/2403.14244)] [Code]\n- **📝 说明**：✏️\n\n#### [125] GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation\n- **🧑‍🔬 作者**：Quankai Gao, Qiangeng Xu, Zhe Cao, Ben Mildenhall, Wenchao Ma, Le Chen, Danhang Tang, Ulrich Neumann\n- **🏫 单位**：University of Southern California ⟐ Google ⟐ Pennsylvania State University ⟐ Max Planck Institute for Intelligent Systems\n- **🔗 链接**：[[中英摘要](./abs/2403.12365.md)] [[arXiv:2403.12365](https://arxiv.org/abs/2403.12365)] [[Code](https://github.com/Zerg-Overmind/GaussianFlow)]\n- **📝 说明**：✏️\n\n#### [126] VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model\n- **🧑‍🔬 作者**：Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, Qixing Huang\n- **🏫 单位**：Alibaba Group ⟐ CUHKSZ ⟐ Fudan University ⟐ Peking University ⟐ The University of Texas at Austin\n- **🔗 链接**：[[中英摘要](./abs/2403.12010.md)] [[arXiv:2403.12010](https://arxiv.org/abs/2403.12010)] [[Code](https://github.com/alibaba/VideoMV)]\n- **📝 说明**：✏️\n\n#### [127] GaussNav: Gaussian Splatting for Visual Navigation\n- **🧑‍🔬 作者**：Xiaohan Lei, Min Wang, Wengang Zhou, Houqiang Li\n- **🏫 单位**：University of Science and Technology of China ⟐ Hefei Comprehensive National Science Center\n- **🔗 链接**：[[中英摘要](./abs/2403.11625.md)] [[arXiv:2403.11625](https://arxiv.org/abs/2403.11625)] [[Code](https://github.com/XiaohanLei/GaussNav)]\n- **📝 说明**：✏️\n\n#### [128] UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling\n- **🧑‍🔬 作者**：Yujiao Jiang, Qingmin Liao, Xiaoyu Li, Li Ma, Qi Zhang, Chaopeng Zhang, Zongqing Lu, Ying Shan\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School ⟐ Tencent AI Lab ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2403.11589.md)] [[arXiv:2403.11589](https://arxiv.org/abs/2403.11589)] [Code]\n- **📝 说明**：✏️\n\n#### [129] Fed3DGS: Scalable 3D Gaussian Splatting with Federated Learning\n- **🧑‍🔬 作者**：Teppei Suzuki\n- **🏫 单位**：Denso IT Laboratory, Inc.\n- **🔗 链接**：[[中英摘要](./abs/2403.11460.md)] [[arXiv:2403.11460](https://arxiv.org/abs/2403.11460)] [[Code](https://github.com/DensoITLab/Fed3DGS)]\n- **📝 说明**：✏️\n\n#### [130] Bridging 3D Gaussian and Mesh for Freeview Video Rendering\n- **🧑‍🔬 作者**：Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao\n- **🏫 单位**：ShanghaiTech University ⟐ Ant Group ⟐ Xi’an Jiaotong University ⟐ University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2403.11453.md)] [[arXiv:2403.11453](https://arxiv.org/abs/2403.11453)] [Code]\n- **📝 说明**：✏️\n\n#### [131] Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Zhiyang Guo, Wengang Zhou, Li Li, Min Wang, Houqiang Li\n- **🏫 单位**：University of Science and Technology of China ⟐ Institute of Artificial Intelligence, Hefei Comprehensive National Science Center\n- **🔗 链接**：[[中英摘要](./abs/2403.11447.md)] [[arXiv:2403.11447](https://arxiv.org/abs/2403.11447)] [Code]\n- **📝 说明**：✏️\n\n#### [132] BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors\n- **🧑‍🔬 作者**：Tingyang Zhang, Qingzhe Gao, Weiyu Li, Libin Liu, Baoquan Chen\n- **🏫 单位**：Peking University ⟐ Shandong University ⟐ The Hong Kong University of Science and 
Technology\n- **🔗 链接**：[[中英摘要](./abs/2403.11427.md)] [[arXiv:2403.11427](https://arxiv.org/abs/2403.11427)] [[Code](https://github.com/Talegqz/BAGS)]\n- **📝 说明**：✏️\n\n#### [133] BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis\n- **🧑‍🔬 作者**：Lutao Jiang, Lin Wang\n- **🏫 单位**：HKUST(GZ) ⟐ HKUST\n- **🔗 链接**：[[中英摘要](./abs/2403.11273.md)] [[arXiv:2403.11273](https://arxiv.org/abs/2403.11273)] [[Code](https://github.com/lutao2021/BrightDreamer)]\n- **📝 说明**：✏️\n\n#### [134] Compact 3D Gaussian Splatting For Dense Visual SLAM\n- **🧑‍🔬 作者**：Tianchen Deng, Yaohui Chen, Leyan Zhang, Jianfei Yang, Shenghai Yuan, Danwei Wang, Weidong Chen\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2403.11247.md)] [[arXiv:2403.11247](https://arxiv.org/abs/2403.11247)] [Code]\n- **📝 说明**：✏️\n\n#### [135] SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians\n- **🧑‍🔬 作者**：Hiba Dahmani, Moussab Bennehar, Nathan Piasco, Luis Roldao, Dzmitry Tsishkou\n- **🏫 单位**：Noah’s Ark, Huawei Paris Research Center\n- **🔗 链接**：[[中英摘要](./abs/2403.10427.md)] [[arXiv:2403.10427](https://arxiv.org/abs/2403.10427)] [Code]\n- **📝 说明**：✏️\n\n#### [136] FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model\n- **🧑‍🔬 作者**：Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang\n- **🏫 单位**：Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2403.10242.md)] [[arXiv:2403.10242](https://arxiv.org/abs/2403.10242)] [Code]\n- **📝 说明**：✏️\n\n#### [137] Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing\n- **🧑‍🔬 作者**：Tian-Xing Xu, Wenbo Hu, Yu-Kun Lai, Ying Shan, Song-Hai Zhang\n- **🏫 单位**：Tsinghua University ⟐ Tencent AI Lab ⟐ Cardiff University\n- **🔗 链接**：[[中英摘要](./abs/2403.10050.md)] [[arXiv:2403.10050](https://arxiv.org/abs/2403.10050)] [Code]\n- **📝 说明**：✏️\n\n#### [138] Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jaewoo Jung, Jisang Han, Honggyu An, Jiwon Kang, Seonghoon Park, Seungryong Kim\n- **🏫 单位**：Korea University\n- **🔗 链接**：[[中英摘要](./abs/2403.09413.md)] [[arXiv:2403.09413](https://arxiv.org/abs/2403.09413)] [[Code](https://github.com/KU-CVLAB/RAIN-GS)]\n- **📝 说明**：✏️\n\n#### [139] A New Split Algorithm for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Qiyuan Feng, Gengchen Cao, Haoxiang Chen, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu\n- **🏫 单位**：Tsinghua University ⟐ Cardiff University\n- **🔗 链接**：[[中英摘要](./abs/2403.09143.md)] [[arXiv:2403.09143](https://arxiv.org/abs/2403.09143)] [Code]\n- **📝 说明**：✏️\n\n#### [140] GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting\n- **🧑‍🔬 作者**：Francesco Palandra, Andrea Sanchietti, Daniele Baieri, Emanuele Rodolà\n- **🏫 单位**：Sapienza University of Rome\n- **🔗 链接**：[[中英摘要](./abs/2403.05154.md)] [[arXiv:2403.05154](https://arxiv.org/abs/2403.05154)] [Code]\n- **📝 说明**：✏️\n\n#### [141] Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps\n- **🧑‍🔬 作者**：Timothy Chen, Ola Shorinwa, Weijia Zeng, Joseph Bruno, Philip Dames, Mac Schwager\n- **🏫 单位**：Stanford University ⟐ University of California San Diego ⟐ Temple University\n- **🔗 链接**：[[中英摘要](./abs/2403.02751.md)] [[arXiv:2403.02751](https://arxiv.org/abs/2403.02751)] [[Code](https://github.com/chengine/splatnav)]\n- **📝 说明**：✏️\n\n#### [142] 3D Gaussian Model for Animation and Texturing\n- **🧑‍🔬 作者**：Xiangzhi Eric Wang, Zackary P. T. 
Sin\n- **🏫 单位**：The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](./abs/2402.19441.md)] [[arXiv:2402.19441](https://arxiv.org/abs/2402.19441)] [Code]\n- **📝 说明**：✏️\n\n#### [143] GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video\n- **🧑‍🔬 作者**：Xinqi Liu, Chenming Wu, Xing Liu, Jialun Liu, Jinbo Wu, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang\n- **🏫 单位**：Baidu Inc.\n- **🔗 链接**：[[中英摘要](./abs/2402.16607.md)] [[arXiv:2402.16607](https://arxiv.org/abs/2402.16607)] [Code]\n- **📝 说明**：✏️\n\n#### [144] Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Joongho Jo, Hyeongwon Kim, Jongsun Park\n- **🏫 单位**：Korea University\n- **🔗 链接**：[[中英摘要](./abs/2402.13827.md)] [[arXiv:2402.13827](https://arxiv.org/abs/2402.13827)] [Code]\n- **📝 说明**：This paper has been withdrawn\n\n#### [145] GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians\n- **🧑‍🔬 作者**：Haimin Luo, Min Ouyang, Zijun Zhao, Suyi Jiang, Longwen Zhang, Qixuan Zhang, Wei Yang, Lan Xu, Jingyi Yu\n- **🏫 单位**：ShanghaiTech University ⟐ Huazhong University of Science and Technology ⟐ Deemos Technology ⟐ LumiAni Technology\n- **🔗 链接**：[[中英摘要](./abs/2402.10483.md)] [[arXiv:2402.10483](https://arxiv.org/abs/2402.10483)] [Code]\n- **📝 说明**：✏️\n\n#### [146] Mesh-based Gaussian Splatting for Real-time Large-scale Deformation\n- **🧑‍🔬 作者**：Lin Gao, Jie Yang, Bo-Tao Zhang, Jia-Mu Sun, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai\n- **🏫 单位**：University of Chinese Academy of Sciences ⟐ City University of Hong Kong ⟐ Cardiff University\n- **🔗 链接**：[[中英摘要](./abs/2402.04796.md)] [[arXiv:2402.04796](https://arxiv.org/abs/2402.04796)] [Code]\n- **📝 说明**：✏️\n\n#### [147] Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos\n- **🧑‍🔬 作者**：Alfredo Rivero, ShahRukh Athar, Zhixin Shu, Dimitris Samaras\n- **🏫 单位**：Stony Brook University ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](./abs/2402.03723.md)] [[arXiv:2402.03723](https://arxiv.org/abs/2402.03723)] [Code]\n- **📝 说明**：✏️\n\n#### [148] GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting\n- **🧑‍🔬 作者**：Joanna Waczyńska, Piotr Borycki, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2402.01459.md)] [[arXiv:2402.01459](https://arxiv.org/abs/2402.01459)] [[Code](https://github.com/waczjoan/gaussian-mesh-splatting)]\n- **📝 说明**：✏️\n\n#### [149] 360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming\n- **🧑‍🔬 作者**：Jiayang Bai, Letian Huang, Jie Guo, Wen Gong, Yuanqi Li, Yanwen Guo\n- **🏫 单位**：Nanjing University\n- **🔗 链接**：[[中英摘要](./abs/2402.00763.md)] [[arXiv:2402.00763](https://arxiv.org/abs/2402.00763)] [Code]\n- **📝 说明**：✏️\n\n#### [150] Segment Anything in 3D Gaussians\n- **🧑‍🔬 作者**：Xu Hu, Yuxi Wang, Lue Fan, Junsong Fan, Junran Peng, Zhen Lei, Qing Li, Zhaoxiang Zhang\n- **🏫 单位**：The Hong Kong Polytechnic University ⟐ Center for Artificial Intelligence and Robotics, HKISI, CAS ⟐ Institute of Automation, Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Chongyue Technology\n- **🔗 链接**：[[中英摘要](./abs/2401.17857.md)] [[arXiv:2401.17857](https://arxiv.org/abs/2401.17857)] [[Code](https://github.com/Jumpat/SegAnyGAussians)]\n- **📝 说明**：✏️\n\n#### [151] PSAvatar: A Point-based Morphable Shape Model for Real-Time Head Avatar Creation with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhongyuan Zhao, Zhenyu Bao, Qing Li, Guoping Qiu, Kanglin Liu\n- **🏫 单位**：Pengcheng Laboratory ⟐ Peking 
University ⟐ University of Nottingham ⟐ Shenzhen University\n- **🔗 链接**：[[中英摘要](./abs/2401.12900.md)] [[arXiv:2401.12900](https://arxiv.org/abs/2401.12900)] [Code]\n- **📝 说明**：✏️\n\n#### [152] GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting\n- **🧑‍🔬 作者**：Mengtian Li, Shengxiang Yao, Zhifeng Xie, Keyu Chen, Yu-Gang Jiang\n- **🏫 单位**：Shanghai University ⟐ Fudan University ⟐ Shanghai Engineering Research Center of Motion Picture Special Effects ⟐ Tavus Inc.\n- **🔗 链接**：[[中英摘要](./abs/2401.09720.md)] [[arXiv:2401.09720](https://arxiv.org/abs/2401.09720)] [Code]\n- **📝 说明**：✏️\n\n#### [153] Fast Dynamic 3D Object Generation from a Single-view Video\n- **🧑‍🔬 作者**：Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang\n- **🏫 单位**：Fudan University ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](./abs/2401.08742.md)] [[arXiv:2401.08742](https://arxiv.org/abs/2401.08742)] [[Code](https://github.com/fudan-zvg/Efficient4D)]\n- **📝 说明**：✏️\n\n#### [154] CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians\n- **🧑‍🔬 作者**：Bin Dou, Tianyu Zhang, Yongjia Ma, Zhaohui Wang, Zejian Yuan\n- **🏫 单位**：Xi’an Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2401.05925.md)] [[arXiv:2401.05925](https://arxiv.org/abs/2401.05925)] [Code]\n- **📝 说明**：✏️\n\n#### [155] DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines\n- **🧑‍🔬 作者**：Sankeerth Durvasula, Adrian Zhao, Fan Chen, Ruofan Liang, Pawan Kumar Sanjaya, Nandita Vijaykumar\n- **🏫 单位**：University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2401.05345.md)] [[arXiv:2401.05345](https://arxiv.org/abs/2401.05345)] [Code]\n- **📝 说明**：✏️\n\n#### [156] AGG: Amortized Generative 3D Gaussians for Single Image to 3D\n- **🧑‍🔬 作者**：Dejia Xu, Ye Yuan, Morteza Mardani, Sifei Liu, Jiaming Song, Zhangyang Wang, Arash Vahdat\n- **🏫 单位**：The University of Texas at Austin ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2401.04099.md)] [[arXiv:2401.04099](https://arxiv.org/abs/2401.04099)] [Code]\n- **📝 说明**：✏️\n\n#### [157] Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Van Minh Nguyen, Emma Sandidge, Trupti Mahendrakar, Ryan T. 
White\n- **🏫 单位**：Florida Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2401.02588.md)] [[arXiv:2401.02588](https://arxiv.org/abs/2401.02588)] [Code]\n- **📝 说明**：✏️\n\n#### [158] 4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency\n- **🧑‍🔬 作者**：Yuyang Yin, Dejia Xu, Zhangyang Wang, Yao Zhao, Yunchao Wei\n- **🏫 单位**：Beijing Jiaotong University ⟐ University of Texas at Austin\n- **🔗 链接**：[[中英摘要](./abs/2312.17225.md)] [[arXiv:2312.17225](https://arxiv.org/abs/2312.17225)] [[Code](https://github.com/VITA-Group/4DGen)]\n- **📝 说明**：✏️\n\n#### [159] DreamGaussian4D: Generative 4D Gaussian Splatting\n- **🧑‍🔬 作者**：Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu\n- **🏫 单位**：Nanyang Technological University ⟐ Shanghai AI Laboratory ⟐ Peking University ⟐ University of Michigan\n- **🔗 链接**：[[中英摘要](./abs/2312.17142.md)] [[arXiv:2312.17142](https://arxiv.org/abs/2312.17142)] [[Code](https://github.com/jiawei-ren/dreamgaussian4d)]\n- **📝 说明**：✏️\n\n#### [160] 2D-Guided 3D Gaussian Segmentation\n- **🧑‍🔬 作者**：Kun Lan, Haoran Li, Haolin Shi, Wenjun Wu, Yong Liao, Lin Wang, Pengyuan Zhou\n- **🏫 单位**：University of Science and Technology of China ⟐ HKUST(GZ)\n- **🔗 链接**：[[中英摘要](./abs/2312.16047.md)] [[arXiv:2312.16047](https://arxiv.org/abs/2312.16047)] [Code]\n- **📝 说明**：✏️\n\n#### [161] Sparse-view CT Reconstruction with 3D Gaussian Volumetric Representation\n- **🧑‍🔬 作者**：Yingtai Li, Xueming Fu, Shang Zhao, Ruiyang Jin, S. Kevin Zhou\n- **🏫 单位**：University of Science and Technology of China ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2312.15676.md)] [[arXiv:2312.15676](https://arxiv.org/abs/2312.15676)] [Code]\n- **📝 说明**：✏️\n\n#### [162] Human101: Training 100+FPS Human Gaussians in 100s from 1 View\n- **🧑‍🔬 作者**：Mingwei Li, Jiachen Tao, Zongxin Yang, Yi Yang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2312.15258.md)] [[arXiv:2312.15258](https://arxiv.org/abs/2312.15258)] [[Code](https://github.com/longxiang-ai/Human101)]\n- **📝 说明**：✏️\n\n#### [163] Deformable 3D Gaussian Splatting for Animatable Human Avatars\n- **🧑‍🔬 作者**：HyunJun Jung, Nikolas Brasch, Jifei Song, Eduardo Perez-Pellitero, Yiren Zhou, Zhihao Li, Nassir Navab, Benjamin Busam\n- **🏫 单位**：Technical University of Munich ⟐ Huawei Noah’s Ark Lab ⟐ 3dwe.ai\n- **🔗 链接**：[[中英摘要](./abs/2312.15059.md)] [[arXiv:2312.15059](https://arxiv.org/abs/2312.15059)] [[Code](https://github.com/Junggy/pardy-human)]\n- **📝 说明**：Code link 404\n\n#### [164] Gaussian Splatting with NeRF-based Color and Opacity\n- **🧑‍🔬 作者**：Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2312.13729.md)] [[arXiv:2312.13729](https://arxiv.org/abs/2312.13729)] [[Code](https://github.com/gmum/ViewingDirectionGaussianSplatting)]\n- **📝 说明**：✏️\n\n#### [165] SWAGS: Sampling Windows Adaptively for Dynamic 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Richard Shaw, Jifei Song, Arthur Moreau, Michal Nazarczuk, Sibi Catley-Chandar, Helisa Dhamo, Eduardo Perez-Pellitero\n- **🏫 单位**：Huawei Noah’s Ark Lab\n- **🔗 链接**：[[中英摘要](./abs/2312.13308.md)] [[arXiv:2312.13308](https://arxiv.org/abs/2312.13308)] [Code]\n- **📝 说明**：✏️\n\n#### [166] Exploring the Feasibility of Generating Realistic 3D Models of Endangered Species Using DreamGaussian: An Analysis of Elevation Angle's Impact on Model Generation\n- **🧑‍🔬 作者**：Selcuk Anil Karatopak, Deniz Sen\n- **🏫 单位**：Huawei Türkiye R&D Center\n- **🔗 
链接**：[[中英摘要](./abs/2312.09682.md)] [[arXiv:2312.09682](https://arxiv.org/abs/2312.09682)] [Code]\n- **📝 说明**：✏️\n\n#### [167] Text2Immersion: Generative Immersive Scene with 3D Gaussians\n- **🧑‍🔬 作者**：Hao Ouyang, Kathryn Heal, Stephen Lombardi, Tiancheng Sun\n- **🏫 单位**：HKUST ⟐ Google\n- **🔗 链接**：[[中英摘要](./abs/2312.09242.md)] [[arXiv:2312.09242](https://arxiv.org/abs/2312.09242)] [Code]\n- **📝 说明**：✏️\n\n#### [168] iComMa: Inverting 3D Gaussians Splatting for Camera Pose Estimation via Comparing and Matching\n- **🧑‍🔬 作者**：Yuan Sun, Xuan Wang, Yunfan Zhang, Jie Zhang, Caigui Jiang, Yu Guo, Fei Wang\n- **🏫 单位**：Xi’an Jiaotong University ⟐ Ant Group\n- **🔗 链接**：[[中英摘要](./abs/2312.09031.md)] [[arXiv:2312.09031](https://arxiv.org/abs/2312.09031)] [Code]\n- **📝 说明**：✏️\n\n#### [169] NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance\n- **🧑‍🔬 作者**：Hanlin Chen, Chen Li, Gim Hee Lee\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2312.00846.md)] [[arXiv:2312.00846](https://arxiv.org/abs/2312.00846)] [Code]\n- **📝 说明**：✏️\n\n#### [170] Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering\n- **🧑‍🔬 作者**：Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, Li Zhang\n- **🏫 单位**：Fudan University ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](./abs/2311.18561.md)] [[arXiv:2311.18561](https://arxiv.org/abs/2311.18561)] [[Code](https://github.com/fudan-zvg/PVG)]\n- **📝 说明**：✏️\n\n#### [171] CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting\n- **🧑‍🔬 作者**：Alexander Vilesov, Pradyumna Chari, Achuta Kadambi\n- **🏫 单位**：University of California, Los Angeles\n- **🔗 链接**：[[中英摘要](./abs/2311.17907.md)] [[arXiv:2311.17907](https://arxiv.org/abs/2311.17907)] [Code]\n- **📝 说明**：✏️\n\n#### [172] Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars\n- **🧑‍🔬 作者**：Yang Liu, Xiang Huang, Minghan Qin, Qinwei Lin, Haoqian Wang\n- **🏫 单位**：Tsinghua University ⟐ Gala Sports\n- **🔗 链接**：[[中英摘要](./abs/2311.16482.md)] [[arXiv:2311.16482](https://arxiv.org/abs/2311.16482)] [[Code](https://github.com/jimmyYliu/Animatable-3D-Gaussian)]\n- **📝 说明**：✏️\n\n#### [173] LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes\n- **🧑‍🔬 作者**：Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2311.13384.md)] [[arXiv:2311.13384](https://arxiv.org/abs/2311.13384)] [[Code](https://github.com/luciddreamer-cvlab/LucidDreamer)]\n- **📝 说明**：✏️\n\n#### [174] GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise\n- **🧑‍🔬 作者**：Xinhai Li, Huaibin Wang, Kuo-Kun Tseng\n- **🏫 单位**：Harbin Institute of Technology (Shenzhen)\n- **🔗 链接**：[[中英摘要](./abs/2311.11221.md)] [[arXiv:2311.11221](https://arxiv.org/abs/2311.11221)] [Code]\n- **📝 说明**：✏️\n\n#### [175] SplatArmor: Articulated Gaussian splatting for animatable humans from monocular RGB videos\n- **🧑‍🔬 作者**：Rohit Jena, Ganesh Subramanian Iyer, Siddharth Choudhary, Brandon Smith, Pratik Chaudhari, James Gee\n- **🏫 单位**：University of Pennsylvania ⟐ Amazon.com, Inc.\n- **🔗 链接**：[[中英摘要](./abs/2311.10812.md)] [[arXiv:2311.10812](https://arxiv.org/abs/2311.10812)] [[Code](https://github.com/rohitrango/splatarmor)]\n- **📝 说明**：✏️\n\n#### [176] Flexible Techniques for Differentiable Rendering with 3D Gaussians\n- **🧑‍🔬 作者**：Leonid Keselman, Martial Hebert\n- **🏫 单位**：Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2308.14737.md)] 
[[arXiv:2308.14737](https://arxiv.org/abs/2308.14737)] [[Code](https://github.com/leonidk/fmb-plus)]\n- **📝 说明**：✏️\n"
  },
  {
    "path": "archive/202410.md",
    "content": "# 3D Gaussian Splatting Papers Before 2024/10/01\n\n#### [1] Robust Gaussian Splatting SLAM by Leveraging Loop Closure\n- **🧑‍🔬 作者**：Zunjie Zhu, Youxu Fang, Xin Li, Chengang Yan, Feng Xu, Chau Yuen, Yanyan Li\n- **🏫 单位**：Hangzhou Dianzi University ⟐ Macao Polytechnic University ⟐ Nanyang Technological University ⟐ Tsinghua University ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2409.20111.md)] [[arXiv:2409.20111](https://arxiv.org/abs/2409.20111)] [Code]\n- **📝 说明**：\n\n#### [2] GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting\n- **🧑‍🔬 作者**：Tao Liu, Runze Yuan, Yi'ang Ju, Xun Xu, Jiaqi Yang, Xiangting Meng, Xavier Lagorce, Laurent Kneip\n- **🏫 单位**：Mobile Perception Lab, ShanghaiTech University\n- **🔗 链接**：[[中英摘要](./abs/2409.19228.md)] [[arXiv:2409.19228](https://arxiv.org/abs/2409.19228)] [Code]\n- **📝 说明**：\n\n#### [3] 1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction\n- **🧑‍🔬 作者**：Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Hyein Hwang, Soohyun Hwang, Junuk Cha, Jaewook Han, Seungryul Baek\n- **🏫 单位**：UNIST\n- **🔗 链接**：[[中英摘要](./abs/2409.19215.md)] [[arXiv:2409.19215](https://arxiv.org/abs/2409.19215)] [Code]\n- **📝 说明**：\n\n#### [4] Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes\n- **🧑‍🔬 作者**：Shuo Wang, Binbin Huang, Ruoyu Wang, Shenghua Gao\n- **🏫 单位**：ShanghaiTech University ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2409.18852.md)] [[arXiv:2409.18852](https://arxiv.org/abs/2409.18852)] [[Code](https://github.com/tb2-sy/st-2dgs)]\n- **📝 说明**：\n\n#### [5] RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration\n- **🧑‍🔬 作者**：Yuezhan Tao, Dexter Ong, Varun Murali, Igor Spasojevic, Pratik Chaudhari, Vijay Kumar\n- **🏫 单位**：The GRASP Lab, University of Pennsylvania\n- **🔗 链接**：[[中英摘要](./abs/2409.18122.md)] [[arXiv:2409.18122](https://arxiv.org/abs/2409.18122)] [[Code](https://tyuezhan.github.io/RT_GuIDE/)]\n- **📝 说明**：\n\n#### [6] Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot\n- **🧑‍🔬 作者**：Justin Yu, Kush Hari, Kishore Srinivas, Karim El-Refai, Adam Rashid, Chung Min Kim, Justin Kerr, Richard Cheng, Muhammad Zubair Irshad, Ashwin Balakrishna, Thomas Kollar, Ken Goldberg\n- **🏫 单位**：The AUTOLab at UC Berkeley ⟐ The Toyota Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2409.18108.md)] [[arXiv:2409.18108](https://arxiv.org/abs/2409.18108)] [[Code](https://berkeleyautomation.github.io/LEGS/)]\n- **📝 说明**：\n\n#### [7] WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians\n- **🧑‍🔬 作者**：Dmytro Kotovenko, Olga Grebenkova, Nikolaos Sarafianos, Avinash Paliwal, Pingchuan Ma, Omid Poursaeed, Sreyas Mohan, Yuchen Fan, Yilei Li, Rakesh Ranjan, Björn Ommer\n- **🏫 单位**：CompVis @ LMU Munich, MCML ⟐ Meta Reality Labs ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2409.17917.md)] [[arXiv:2409.17917](https://arxiv.org/abs/2409.17917)] [[Code](https://github.com/facebookresearch/WaSt3D)]\n- **📝 说明**：\n\n#### [8] HGS-Planner: Hierarchical Planning Framework for Active Scene Reconstruction Using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zijun Xu, Rui Jin, Ke Wu, Yi Zhao, Zhiwei Zhang, Jieru Zhao, Zhongxue Gan, Wenchao Ding\n- **🏫 单位**：Academy for Engineering & Technology, Fudan University, Shanghai, China ⟐ Institute of Cyber-Systems and Control, College of Control Science and Engineering, Zhejiang University 
⟐ Department of Computer Science and Engineering, Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2409.17624.md)] [[arXiv:2409.17624](https://arxiv.org/abs/2409.17624)] [Code]\n- **📝 说明**：\n\n#### [9] Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Phu Pham, Dipam Patel, Damon Conover, Aniket Bera\n- **🏫 单位**：Department of Computer Science, Purdue University ⟐ DEVCOM Army Research Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2409.16944.md)] [[arXiv:2409.16944](https://arxiv.org/abs/2409.16944)] [Code]\n- **📝 说明**：\n\n#### [10] Let's Make a Splan: Risk-Aware Trajectory Optimization in a Normalized Gaussian Splat\n- **🧑‍🔬 作者**：Jonathan Michaux, Seth Isaacson, Challen Enninful Adu, Adam Li, Rahul Kashyap Swayampakula, Parker Ewen, Sean Rice, Katherine A. Skinner, Ram Vasudevan\n- **🏫 单位**：Department of Robotics, University of Michigan, Ann Arbor\n- **🔗 链接**：[[中英摘要](./abs/2409.16915.md)] [[arXiv:2409.16915](https://arxiv.org/abs/2409.16915)] [[Code](https://roahmlab.github.io/splanning/)]\n- **📝 说明**：\n\n#### [11] GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization\n- **🧑‍🔬 作者**：Gennady Sidorov, Malik Mohrat, Ksenia Lebedeva, Ruslan Rakhimov, Sergey Kolyubin\n- **🏫 单位**：ITMO University ⟐ Robotics Center\n- **🔗 链接**：[[中英摘要](./abs/2409.16502.md)] [[arXiv:2409.16502](https://arxiv.org/abs/2409.16502)] [[Code](https://github.com/haksorus/gsplatloc)]\n- **📝 说明**：\n\n#### [12] Frequency-based View Selection in Gaussian Splatting Reconstruction\n- **🧑‍🔬 作者**：Monica M.Q. Li, Pierre-Yves Lajoie, Giovanni Beltrame\n- **🏫 单位**：Department of Computer and Software Engineering, Polytechnique Montreal\n- **🔗 链接**：[[中英摘要](./abs/2409.16470.md)] [[arXiv:2409.16470](https://arxiv.org/abs/2409.16470)] [Code]\n- **📝 说明**：\n\n#### [13] LiDAR-3DGS: LiDAR Reinforced 3D Gaussian Splatting for Multimodal Radiance Field Rendering\n- **🧑‍🔬 作者**：Hansol Lim, Hanbeom Chang, Jongseong Brad Choi, Chul Min Yeum\n- **🏫 单位**：Mechanical Engineering, State University of New York, Stony Brook ⟐ Civil Engineering Department, University of Waterloo\n- **🔗 链接**：[[中英摘要](./abs/2409.16296.md)] [[arXiv:2409.16296](https://arxiv.org/abs/2409.16296)] [Code]\n- **📝 说明**：\n\n#### [14] Semantics-Controlled Gaussian Splatting for Outdoor Scene Reconstruction and Rendering in Virtual Reality\n- **🧑‍🔬 作者**：Hannah Schieber, Jacob Young, Tobias Langlotz, Stefanie Zollmann, Daniel Roth\n- **🏫 单位**：Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) ⟐ Department of Computer Science, University of Otago ⟐ Technical University of Munich, Human-Centered Computing and Extended Reality Lab, TUM University Hospital, Orthopedics and Sports Orthopedics\n- **🔗 链接**：[[中英摘要](./abs/2409.15959.md)] [[arXiv:2409.15959](https://arxiv.org/abs/2409.15959)] [Code]\n- **📝 说明**：\n\n#### [15] Plenoptic PNG: Real-Time Neural Radiance Fields in 150 KB\n- **🧑‍🔬 作者**：Jae Yong Lee, Yuqun Wu, Chuhang Zou, Derek Hoiem, Shenlong Wang\n- **🏫 单位**：University of Illinois at Urbana-Champaign ⟐ Amazon Inc\n- **🔗 链接**：[[中英摘要](./abs/2409.15689.md)] [[arXiv:2409.15689](https://arxiv.org/abs/2409.15689)] [[Code](https://github.com/leejaeyong7/ppng)]\n- **📝 说明**：\n\n#### [16] Human Hair Reconstruction with Strand-Aligned 3D Gaussians\n- **🧑‍🔬 作者**：Egor Zakharov, Vanessa Sklyarova, Michael Black, Giljoo Nam, Justus Thies, Otmar Hilliges\n- **🏫 单位**：ETH Zürich ⟐ Max Planck Institute for Intelligent 
Systems ⟐ Meta ⟐ Technical University of Darmstadt\n- **🔗 链接**：[[中英摘要](./abs/2409.14778.md)] [[arXiv:2409.14778](https://arxiv.org/abs/2409.14778)] [Code]\n- **📝 说明**：\n\n#### [17] Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors\n- **🧑‍🔬 作者**：Zixin Zhang, Kanghao Chen, Lin Wang\n- **🏫 单位**：Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2409.13392.md)] [[arXiv:2409.13392](https://arxiv.org/abs/2409.13392)] [Code]\n- **📝 说明**：\n\n#### [18] LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction\n- **🧑‍🔬 作者**：Changjian Jiang, Ruilan Gao, Kele Shao, Yue Wang, Rong Xiong, Yu Zhang\n- **🏫 单位**：State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, China ⟐ Key Laboratory of Collaborative sensing and autonomous unmanned systems of Zhejiang Province, Hangzhou, China\n- **🔗 链接**：[[中英摘要](./abs/2409.12899.md)] [[arXiv:2409.12899](https://arxiv.org/abs/2409.12899)] [[Code](https://changjianjiang01.github.io/LI-GS/)]\n- **📝 说明**：\n\n#### [19] GaRField++: Reinforced Gaussian Radiance Fields for Large-Scale 3D Scene Reconstruction\n- **🧑‍🔬 作者**：Hanyue Zhang, Zhiliu Yang, Xinhe Zuo, Yuxin Tong, Ying Long, Chen Liu\n- **🏫 单位**：Yunnan University ⟐  Clarkson University\n- **🔗 链接**：[[中英摘要](./abs/2409.12774.md)] [[arXiv:2409.12774](https://arxiv.org/abs/2409.12774)] [Code]\n- **📝 说明**：\n\n#### [20] Spectral-GS: Taming 3D Gaussian Splatting with Spectral Entropy\n- **🧑‍🔬 作者**：Letian Huang, Jie Guo, Jialin Dan, Ruoyu Fu, Shujie Wang, Yuanqi Li, Yanwen Guo\n- **🏫 单位**：Nanjing University\n- **🔗 链接**：[[中英摘要](./abs/2409.12771.md)] [[arXiv:2409.12771](https://arxiv.org/abs/2409.12771)] [Code]\n- **📝 说明**：\n\n#### [21] CrossRT: A cross platform programming technology for hardware-accelerated ray tracing in CG and CV applications\n- **🧑‍🔬 作者**：Vladimir Frolov, Vadim Sanzharov, Garifullin Albert, Maxim Raenchuk, Alexei Voloboy\n- **🏫 单位**：IAI Lomonosov Moscow State University ⟐ Keldysh Institute of Applied Mathematics\n- **🔗 链接**：[[中英摘要](./abs/2409.12617.md)] [[arXiv:2409.12617](https://arxiv.org/abs/2409.12617)] [Code]\n- **📝 说明**：\n\n#### [22] Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus\n- **🧑‍🔬 作者**：Jinchang Zhang, Ningning Xu, Hao Zhang, Guoyu Lu\n- **🏫 单位**：University of Georgia ⟐ University of Massachusetts Amherst\n- **🔗 链接**：[[中英摘要](./abs/2409.12323.md)] [[arXiv:2409.12323](https://arxiv.org/abs/2409.12323)] [Code]\n- **📝 说明**：\n\n#### [23] Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks\n- **🧑‍🔬 作者**：Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar\n- **🏫 单位**：RBCCPS, Indian Institute of Science, Bangalore, India\n- **🔗 链接**：[[中英摘要](./abs/2409.11681.md)] [[arXiv:2409.11681](https://arxiv.org/abs/2409.11681)] [[Code](https://github.com/JojiJoseph/3dgs-gradient-segmentation)]\n- **📝 说明**：\n\n#### [24] GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module\n- **🧑‍🔬 作者**：Yichen Zhang, Zihan Wang, Jiali Han, Peilin Li, Jiaxun Zhang, Jianqiang Wang, Lei He, Keqiang Li\n- **🏫 单位**：The School of Vehicle and Mobility, Tsinghua University, China ⟐ The State Key Laboratory of Intelligent Green Vehicle and Mobility, Tsinghua University, China ⟐ Sorbonne University, France ⟐ Tencent Technology (Beijing) Co., Ltd ⟐ University of Illinois at Urbana-Champaign, USA\n- **🔗 链接**：[[中英摘要](./abs/2409.11307.md)] [[arXiv:2409.11307](https://arxiv.org/abs/2409.11307)] [Code]\n- **📝 
说明**：\n\n#### [25] GLC-SLAM: Gaussian Splatting SLAM with Efficient Loop Closure\n- **🧑‍🔬 作者**：Ziheng Xu, Qingfeng Li, Chen Chen, Xuefeng Liu, Jianwei Niu\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2409.10982.md)] [[arXiv:2409.10982](https://arxiv.org/abs/2409.10982)] [Code]\n- **📝 说明**：\n\n#### [26] Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering\n- **🧑‍🔬 作者**：Euntae Choi, Sungjoo Yoo\n- **🏫 单位**：Department of Computer Science and Engineering, Seoul National University, South Korea\n- **🔗 链接**：[[中英摘要](./abs/2409.10335.md)] [[arXiv:2409.10335](https://arxiv.org/abs/2409.10335)] [Code]\n- **📝 说明**：\n\n#### [27] BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting\n- **🧑‍🔬 作者**：Wugang Meng, Tianfu Wu, Huan Yin, Fumin Zhang\n- **🏫 单位**：Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong SAR\n- **🔗 链接**：[[中英摘要](./abs/2409.10216.md)] [[arXiv:2409.10216](https://arxiv.org/abs/2409.10216)] [[Code](https://github.com/guaMass/BEINGS)]\n- **📝 说明**：\n\n#### [28] DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments\n- **🧑‍🔬 作者**：Mahmud A. Mohamad, Gamal Elghazaly, Arthur Hubert, Raphael Frank\n- **🏫 单位**：SnT- Interdisciplinary Centre for Security, Reliability and Trust, University of Luxembourg\n- **🔗 链接**：[[中英摘要](./abs/2409.10041.md)] [[arXiv:2409.10041](https://arxiv.org/abs/2409.10041)] [[Code](https://github.com/sntubix/denser)]\n- **📝 说明**：\n\n#### [29] GEVO: Memory-Efficient Monocular Visual Odometry Using Gaussians\n- **🧑‍🔬 作者**：Dasong Gao, Peter Zhi Xuan Li, Vivienne Sze, Sertac Karaman\n- **🏫 单位**：Massachusetts Institute of Technology, Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2409.09295.md)] [[arXiv:2409.09295](https://arxiv.org/abs/2409.09295)] [Code]\n- **📝 说明**：\n\n#### [30] Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints\n- **🧑‍🔬 作者**：Shan Chen, Jiale Zhou, Lei Li\n- **🏫 单位**：East China University of Science and Technology ⟐ University of Washington ⟐ University of Copenhagen\n- **🔗 链接**：[[中英摘要](./abs/2409.08613.md)] [[arXiv:2409.08613](https://arxiv.org/abs/2409.08613)] [Code]\n- **📝 说明**：\n\n#### [31] CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Runze Chen, Mingyu Xiao, Haiyong Luo, Fang Zhao, Fan Wu, Hao Xiong, Qi Liu, Meng Song\n- **🏫 单位**：Beijing University of Posts and Telecommunications ⟐ Chinese Academy of Sciences ⟐ China Unicom Smart City Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2409.08562.md)] [[arXiv:2409.08562](https://arxiv.org/abs/2409.08562)] [Code]\n- **📝 说明**：\n\n#### [32] SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length\n- **🧑‍🔬 作者**：Bangya Liu, Suman Banerjee\n- **🏫 单位**：University of Wisconsin-Madison\n- **🔗 链接**：[[中英摘要](./abs/2409.07759.md)] [[arXiv:2409.07759](https://arxiv.org/abs/2409.07759)] [Code]\n- **📝 说明**：\n\n#### [33] Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering\n- **🧑‍🔬 作者**：Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura\n- **🏫 单位**：University of Hong Kong ⟐ Deemos Technology ⟐ ShanghaiTech University ⟐ Adobe Research, USA\n- **🔗 链接**：[[中英摘要](./abs/2409.07441.md)] [[arXiv:2409.07441](https://arxiv.org/abs/2409.07441)] [Code]\n- **📝 说明**：\n\n#### [34] GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction\n- **🧑‍🔬 作者**：Junyi Chen, Weicai Ye, 
Yifan Wang, Danpeng Chen, Di Huang, Wanli Ouyang, Guofeng Zhang, Yu Qiao, Tong He\n- **🏫 单位**：Shanghai AI Laboratory ⟐ Shanghai Jiao Tong University ⟐ State Key Lab of CAD&CG, Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2409.06685.md)] [[arXiv:2409.06685](https://arxiv.org/abs/2409.06685)] [Code]\n- **📝 说明**：\n\n#### [35] GASP: Gaussian Splatting for Physic-Based Simulations\n- **🧑‍🔬 作者**：Piotr Borycki, Weronika Smolak, Joanna Waczyńska, Marcin Mazur, Sławomir Tadeja, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ IDEAS ⟐ Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2409.05819.md)] [[arXiv:2409.05819](https://arxiv.org/abs/2409.05819)] [Code]\n- **📝 说明**：\n\n#### [36] Lagrangian Hashing for Compressed Neural Field Representations\n- **🧑‍🔬 作者**：Shrisudhan Govindarajan, Zeno Sambugaro, Akhmedkhan Shabanov, Towaki Takikawa, Daniel Rebain, Weiwei Sun, Nicola Conci, Kwang Moo Yi, Andrea Tagliasacchi\n- **🏫 单位**：Simon Fraser University ⟐ University of Trento ⟐ University of Toronto ⟐ University of British Columbia ⟐ Google DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2409.05334.md)] [[arXiv:2409.05334](https://arxiv.org/abs/2409.05334)] [Code]\n- **📝 说明**：\n\n#### [37] GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning\n- **🧑‍🔬 作者**：Keyi Liu, Yeqi Luo, Weidong Yang, Jingyi Xu, Zhijun Li, Wen-Ming Chen, Ben Fei\n- **🏫 单位**：Fudan University ⟐ Tongji University\n- **🔗 链接**：[[中英摘要](./abs/2409.04963.md)] [[arXiv:2409.04963](https://arxiv.org/abs/2409.04963)] [Code]\n- **📝 说明**：\n\n#### [38] 3D-GP-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors\n- **🧑‍🔬 作者**：Yujun Huang, Bin Chen, Niu Lian, Baoyi An, Shu-Tao Xia\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2409.04013.md)] [[arXiv:2409.04013](https://arxiv.org/abs/2409.04013)] [[Code](https://github.com/YujunHuang063/3D-GP-LMVIC)]\n- **📝 说明**：\n\n#### [39] LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors\n- **🧑‍🔬 作者**：Hanyang Yu, Xiaoxiao Long, Ping Tan\n- **🏫 单位**：The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2409.03456.md)] [[arXiv:2409.03456](https://arxiv.org/abs/2409.03456)] [[Code](https://github.com/hanyangyu1021/LMGaussian)]\n- **📝 说明**：\n\n#### [40] Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction\n- **🧑‍🔬 作者**：Shen Chen, Jiale Zhou, Lei Li\n- **🏫 单位**：East China University of Science and Technology ⟐ University of Washington\n- **🔗 链接**：[[中英摘要](./abs/2409.03213.md)] [[arXiv:2409.03213](https://arxiv.org/abs/2409.03213)] [Code]\n- **📝 说明**：\n\n#### [41] Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models\n- **🧑‍🔬 作者**：Zhibin Liu, Haoye Dong, Aviral Chharia, Hefeng Wu\n- **🏫 单位**：Sun Yat-sen University ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2409.02851.md)] [[arXiv:2409.02851](https://arxiv.org/abs/2409.02851)] [[Code](https://github.com/Human-VDM/Human-VDM)]\n- **📝 说明**：\n\n#### [42] Object Gaussian for Monocular 6D Pose Estimation from Sparse Views\n- **🧑‍🔬 作者**：Luqing Luo, Shichu Sun, Jiangang Yang, Linfang Zheng, Jinwei Du, Jian Liu\n- **🏫 单位**：Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China ⟐ University of Birmingham, Birmingham, UK ⟐ NVIDIA, Shanghai, China\n- **🔗 链接**：[[中英摘要](./abs/2409.02581.md)] [[arXiv:2409.02581](https://arxiv.org/abs/2409.02581)] [Code]\n- **📝 说明**：\n\n#### [43] GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving\n- **🧑‍🔬 作者**：Huasong Han, Kaixuan 
Zhou, Xiaoxiao Long, Yusen Wang, Chunxia Xiao\n- **🏫 单位**：School of Computer Science, Wuhan University ⟐ Huawei ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2409.02382.md)] [[arXiv:2409.02382](https://arxiv.org/abs/2409.02382)] [Code]\n- **📝 说明**：\n\n#### [44] PRoGS: Progressive Rendering of Gaussian Splats\n- **🧑‍🔬 作者**：Brent Zoomers, Maarten Wijnants, Ivan Molenaers, Joni Vanherck, Jeroen Put, Lode Jorissen, Nick Michiels\n- **🏫 单位**：Hasselt University ⟐ Flanders Make ⟐ Expertise Centre for Digital Media, Diepenbeek, Belgium\n- **🔗 链接**：[[中英摘要](./abs/2409.01761.md)] [[arXiv:2409.01761](https://arxiv.org/abs/2409.01761)] [Code]\n- **📝 说明**：\n\n#### [45] GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zixuan Guo, Yifan Xie, Weijing Xie, Peng Huang, Fei Ma, Fei Richard Yu\n- **🏫 单位**：Peking University ⟐ Xi’an Jiaotong University ⟐ Sun Yat-Sen University ⟐ Nanjing University ⟐ Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)\n- **🔗 链接**：[[中英摘要](./abs/2409.01581.md)] [[arXiv:2409.01581](https://arxiv.org/abs/2409.01581)] [Code]\n- **📝 说明**：\n\n#### [46] Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos\n- **🧑‍🔬 作者**：Qian Li, Shuojue Yang, Daiyun Shen, Yueming Jin\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2409.01003.md)] [[arXiv:2409.01003](https://arxiv.org/abs/2409.01003)] [Code]\n- **📝 说明**：\n\n#### [47] 3D Gaussian Splatting for Large-scale 3D Surface Reconstruction from Aerial Images\n- **🧑‍🔬 作者**：YuanZheng Wu, Jin Liu, Shunping Ji\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2409.00381.md)] [[arXiv:2409.00381](https://arxiv.org/abs/2409.00381)] [Code]\n- **📝 说明**：\n\n#### [48] UDGS-SLAM : UniDepth Assisted Gaussian Splatting for Monocular SLAM\n- **🧑‍🔬 作者**：Mostafa Mansour, Ahmed Abdelsalam, Ari Happonen, Jari Porras, Esa Rahtu\n- **🏫 单位**：Faculty of Engineering and Natural Sciences, Tampere University, Finland ⟐ School of Engineering Science, LUT University, Finland ⟐ School of Electrical Engineering, Aalto University, Finland ⟐ Faculty of Information Technology and Communication Sciences, University, Finland\n- **🔗 链接**：[[中英摘要](./abs/2409.00362.md)] [[arXiv:2409.00362](https://arxiv.org/abs/2409.00362)] [Code]\n- **📝 说明**：\n\n#### [49] OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping\n- **🧑‍🔬 作者**：Meng Wang, Junyi Wang, Changqun Xia, Chen Wang, Yue Qi\n- **🏫 单位**：State Key Laboratory of Virtual Reality Technology and Systems, Beihang University ⟐ PengCheng Laboratory ⟐ School of Computer Science and Technology, Shandong University ⟐ Beijing Technology and Business University\n- **🔗 链接**：[[中英摘要](./abs/2408.17223.md)] [[arXiv:2408.17223](https://arxiv.org/abs/2408.17223)] [Code]\n- **📝 说明**：\n\n#### [50] 2DGH: 2D Gaussian-Hermite Splatting for High-quality Rendering and Better Geometry Reconstruction\n- **🧑‍🔬 作者**：Ruihan Yu, Tianyu Huang, Jingwang Ling, Feng Xu\n- **🏫 单位**：School of Software and BNRist, Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2408.16982.md)] [[arXiv:2408.16982](https://arxiv.org/abs/2408.16982)] [Code]\n- **📝 说明**：\n\n#### [51] ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model\n- **🧑‍🔬 作者**：Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan\n- **🏫 单位**：Tsinghua University ⟐ Technion ⟐ HKUST\n- **🔗 链接**：[[中英摘要](./abs/2408.16767.md)] [[arXiv:2408.16767](https://arxiv.org/abs/2408.16767)] 
[[Code](https://github.com/liuff19/ReconX)]\n- **📝 说明**：\n\n#### [52] G-Style: Stylized Gaussian Splatting\n- **🧑‍🔬 作者**：Áron Samuel Kovács, Pedro Hermosilla, Renata G. Raidou\n- **🏫 单位**：TU Wien, Austria\n- **🔗 链接**：[[中英摘要](./abs/2408.15695.md)] [[arXiv:2408.15695](https://arxiv.org/abs/2408.15695)] [Code]\n- **📝 说明**：\n\n#### [53] Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation\n- **🧑‍🔬 作者**：Haozhe Lou, Yurong Liu, Yike Pan, Yiran Geng, Jianteng Chen, Wenlong Ma, Chenglong Li, Lin Wang, Hengzhen Feng, Lu Shi, Liyi Luo, Yongliang Shi\n- **🏫 单位**：University of Southern California ⟐ National University of Singapore ⟐ University of Michigan ⟐ Peking University ⟐ Hong Kong University of Science and Technology ⟐ Beijing Institute of Technology ⟐ Tsinghua University ⟐ Xiaomi Robotics Lab\n- **🔗 链接**：[[中英摘要](./abs/2408.14873.md)] [[arXiv:2408.14873](https://arxiv.org/abs/2408.14873)] [Code]\n- **📝 说明**：\n\n#### [54] Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control\n- **🧑‍🔬 作者**：Yixuan He, Lin Geng Foo, Ajmal Saeed Mian, Hossein Rahmani, Jun Liu\n- **🏫 单位**：Singapore University of Technology and Design ⟐ University of Western Australia ⟐ Lancaster University\n- **🔗 链接**：[[中英摘要](./abs/2408.13995.md)] [[arXiv:2408.13995](https://arxiv.org/abs/2408.13995)] [Code]\n- **📝 说明**：\n\n#### [55] DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting\n- **🧑‍🔬 作者**：Weiwei Cai, Weicai Ye, Peng Ye, Tong He, Tao Chen\n- **🏫 单位**：Fudan University ⟐ Zhejiang University ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2408.13972.md)] [[arXiv:2408.13972](https://arxiv.org/abs/2408.13972)] [[Code](https://github.com/Open3DVLab/DynaSurfGS)]\n- **📝 说明**：\n\n#### [56] Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs\n- **🧑‍🔬 作者**：Brandon Smart, Chuanxia Zheng, Iro Laina, Victor Adrian Prisacariu\n- **🏫 单位**：Active Vision Lab, University of Oxford ⟐ Visual Geometry Group, University of Oxford\n- **🔗 链接**：[[中英摘要](./abs/2408.13912.md)] [[arXiv:2408.13912](https://arxiv.org/abs/2408.13912)] [Code]\n- **📝 说明**：\n\n#### [57] SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting\n- **🧑‍🔬 作者**：Wenrui Li, Yapeng Mi, Fucheng Cai, Zhe Yang, Wangmeng Zuo, Xingtao Wang, Xiaopeng Fan\n- **🏫 单位**：Harbin Institute of Technology ⟐ University of Electronic Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2408.13711.md)] [[arXiv:2408.13711](https://arxiv.org/abs/2408.13711)] [[Code](https://github.com/liwrui/SceneDreamer360)]\n- **📝 说明**：\n\n#### [58] BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhenyuan Liu, Yu Guo, Xinyuan Li, Bernd Bickel, Ran Zhang\n- **🏫 单位**：Tencent PCG, New York, USA ⟐ ETH Zurich ⟐ George Mason University\n- **🔗 链接**：[[中英摘要](./abs/2408.13370.md)] [[arXiv:2408.13370](https://arxiv.org/abs/2408.13370)] [Code]\n- **📝 说明**：\n\n#### [59] S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points\n- **🧑‍🔬 作者**：Bing He, Yunuo Chen, Guo Lu, Li Song, Wenjun Zhang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2408.13036.md)] [[arXiv:2408.13036](https://arxiv.org/abs/2408.13036)] [Code]\n- **📝 说明**：\n\n#### [60] Subsurface Scattering for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jan-Niklas Dihlmann, Arjun Majumdar, Andreas Engelhardt, Raphael Braun, Hendrik P. A. 
Lensch\n- **🏫 单位**：University of Tübingen ⟐ Sony Germany\n- **🔗 链接**：[[中英摘要](./abs/2408.12282.md)] [[arXiv:2408.12282](https://arxiv.org/abs/2408.12282)] [[Code](https://github.com/cgtuebingen/SSS-GS)]\n- **📝 说明**：\n\n#### [61] DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments\n- **🧑‍🔬 作者**：Shuhong Liu, Xiang Chen, Hongming Chen, Quanfeng Xu, Mingrui Li\n- **🏫 单位**：University of Tokyo ⟐ Nanjing University of Science and Technology ⟐ Dalian Maritime University ⟐ Shanghai Astronomical Observatory ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2408.11540.md)] [[arXiv:2408.11540](https://arxiv.org/abs/2408.11540)] [Code]\n- **📝 说明**：\n\n#### [62] Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation\n- **🧑‍🔬 作者**：Minye Wu, Tinne Tuytelaars\n- **🏫 单位**：KU Leuven\n- **🔗 链接**：[[中英摘要](./abs/2408.10041.md)] [[arXiv:2408.10041](https://arxiv.org/abs/2408.10041)] [Code]\n- **📝 说明**：This paper has been withdrawn\n\n#### [63] SG-GS: Photo-realistic Animatable Human Avatars with Semantically-Guided Gaussian Splatting\n- **🧑‍🔬 作者**：Haoyu Zhao, Chen Yang, Hao Wang, Xingyue Zhao, Wei Shen\n- **🏫 单位**：MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University ⟐ School of Computer Science, Wuhan University ⟐ Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology ⟐ School of Software Engineering, Xi’an Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2408.09665.md)] [[arXiv:2408.09665](https://arxiv.org/abs/2408.09665)] [Code]\n- **📝 说明**：\n\n#### [64] CHASE: 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning\n- **🧑‍🔬 作者**：Haoyu Zhao, Hao Wang, Chen Yang, Wei Shen\n- **🏫 单位**：MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University ⟐ School of Computer Science, Wuhan University ⟐ Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2408.09663.md)] [[arXiv:2408.09663](https://arxiv.org/abs/2408.09663)] [Code]\n- **📝 说明**：\n\n#### [65] Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS\n- **🧑‍🔬 作者**：Wei Sun, Xiaosong Zhang, Fang Wan, Yanzhao Zhou, Yuan Li, Qixiang Ye, Jianbin Jiao\n- **🏫 单位**：School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, China ⟐ Beijing Academy of Artificial Intelligence, China\n- **🔗 链接**：[[中英摘要](./abs/2408.08723.md)] [[arXiv:2408.08723](https://arxiv.org/abs/2408.08723)] [Code]\n- **📝 说明**：\n\n#### [66] GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization\n- **🧑‍🔬 作者**：Kang Du, Zhihao Liang, Zeyu Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou) ⟐ The Hong Kong University of Science and Technology ⟐ South China University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2408.08524.md)] [[arXiv:2408.08524](https://arxiv.org/abs/2408.08524)] [Code]\n- **📝 说明**：\n\n#### [67] 3D Gaussian Editing with A Single Image\n- **🧑‍🔬 作者**：Guan Luo, Tian-Xing Xu, Ying-Tian Liu, Xiao-Xiong Fan, Fang-Lue Zhang, Song-Hai Zhang\n- **🏫 单位**：Tsinghua University ⟐ Victoria University of Wellington\n- **🔗 链接**：[[中英摘要](./abs/2408.07540.md)] [[arXiv:2408.07540](https://arxiv.org/abs/2408.07540)] [Code]\n- **📝 说明**：\n\n#### [68] SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis\n- **🧑‍🔬 作者**：Saptarshi Neil 
Sinha, Holger Graf, Michael Weinmann\n- **🏫 单位**：Fraunhofer IGD ⟐ Delft University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2408.06975.md)] [[arXiv:2408.06975](https://arxiv.org/abs/2408.06975)] [Code]\n- **📝 说明**：\n\n#### [69] HDRGS: High Dynamic Range Gaussian Splatting\n- **🧑‍🔬 作者**：Jiahao Wu, Lu Xiao, Chao Wang, Rui Peng, Kaiqiang Xiong, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ MPI Informatik\n- **🔗 链接**：[[中英摘要](./abs/2408.06543.md)] [[arXiv:2408.06543](https://arxiv.org/abs/2408.06543)] [Code]\n- **📝 说明**：\n\n#### [70] Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering\n- **🧑‍🔬 作者**：Jiameng Li, Yue Shi, Jiezhang Cao, Bingbing Ni, Wenjun Zhang, Kai Zhang, Luc Van Gool\n- **🏫 单位**：Nanjing University ⟐ INSAIT, Sofia University\n- **🔗 链接**：[[中英摘要](./abs/2408.06286.md)] [[arXiv:2408.06286](https://arxiv.org/abs/2408.06286)] [Code]\n- **📝 说明**：\n\n#### [71] Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis\n- **🧑‍🔬 作者**：Zhongche Qu, Zhi Zhang, Cong Liu, Jianhua Yin\n- **🏫 单位**：Columbia University ⟐ New York University ⟐ Peng Cheng Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2408.05635.md)] [[arXiv:2408.05635](https://arxiv.org/abs/2408.05635)] [Code]\n- **📝 说明**：\n\n#### [72] PRTGaussian: Efficient Relighting Using 3D Gaussians with Precomputed Radiance Transfer\n- **🧑‍🔬 作者**：Libo Zhang, Yuxuan Han, Wenbin Lin, Jingwang Ling, Feng Xu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2408.05631.md)] [[arXiv:2408.05631](https://arxiv.org/abs/2408.05631)] [[Code](https://github.com/zhanglbthu/PRTGaussian)]\n- **📝 说明**：\n\n#### [73] Self-augmented Gaussian Splatting with Structure-aware Masks for Sparse-view 3D Reconstruction\n- **🧑‍🔬 作者**：Lingbei Meng, Bi'an Du, Wei Hu\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2408.04831.md)] [[arXiv:2408.04831](https://arxiv.org/abs/2408.04831)] [Code]\n- **📝 说明**：\n\n#### [74] InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xin-Yi Yu, Jun-Xin Yu, Li-Bo Zhou, Yan Wei, Lin-Lin Ou\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2408.04249.md)] [[arXiv:2408.04249](https://arxiv.org/abs/2408.04249)] [Code]\n- **📝 说明**：\n\n#### [75] Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM\n- **🧑‍🔬 作者**：Yan Song Hu, Dayou Mao, Yuhao Chen, John Zelek\n- **🏫 单位**：University of Waterloo\n- **🔗 链接**：[[中英摘要](./abs/2408.03825.md)] [[arXiv:2408.03825](https://arxiv.org/abs/2408.03825)] [Code]\n- **📝 说明**：\n\n#### [76] Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields\n- **🧑‍🔬 作者**：Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ University of North Carolina at Chapel Hill\n- **🔗 链接**：[[中英摘要](./abs/2408.03822.md)] [[arXiv:2408.03822](https://arxiv.org/abs/2408.03822)] [[Code](https://github.com/maincold2/Dynamic_C3DGS/)]\n- **📝 说明**：Extended Paper of Compact3DGS\n\n#### [77] PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting\n- **🧑‍🔬 作者**：Yijia Guo, Yuanxi Bai, Liwen Hu, Ziyi Guo, Mianzhi Liu, Yu Cai, Tiejun Huang, Lei Ma\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2408.03538.md)] [[arXiv:2408.03538](https://arxiv.org/abs/2408.03538)] [Code]\n- **📝 说明**：\n\n#### [78] MGFs: Masked Gaussian Fields for Meshing Building based on Multi-View Images\n- **🧑‍🔬 作者**：Tengfei Wang, Zongqian Zhan, Rui Xia, Linxia Ji, Xin Wang\n- **🏫 单位**：Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2408.03060.md)] 
[[arXiv:2408.03060](https://arxiv.org/abs/2408.03060)] [Code]\n- **📝 说明**：\n\n#### [79] IG-SLAM: Instant Gaussian SLAM\n- **🧑‍🔬 作者**：F. Aykut Sarikamis, A. Aydin Alatan\n- **🏫 单位**：Center for Image Analysis (OGAM), EEE Department, METU, Turkey\n- **🔗 链接**：[[中英摘要](./abs/2408.01126.md)] [[arXiv:2408.01126](https://arxiv.org/abs/2408.01126)] [Code]\n- **📝 说明**：\n\n#### [80] LoopSparseGS: Loop Based Sparse-View Friendly Gaussian Splatting\n- **🧑‍🔬 作者**：Zhenyu Bao, Guibiao Liao, Kaichen Zhou, Kanglin Liu, Qing Li, Guoping Qiu\n- **🏫 单位**：Peking University ⟐ Pengcheng Laboratory ⟐ University of Nottingham\n- **🔗 链接**：[[中英摘要](./abs/2408.00254.md)] [[arXiv:2408.00254](https://arxiv.org/abs/2408.00254)] [Code]\n- **📝 说明**：\n\n#### [81] Registering Neural 4D Gaussians for Endoscopic Surgery\n- **🧑‍🔬 作者**：Yiming Huang, Beilei Cui, Ikemura Kei, Jiekai Zhang, Long Bai, Hongliang Ren\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Kungliga Tekniska högskolan (Royal Institute of Technology) ⟐ Hong Kong Applied Science and Technology Research Institute Company Limited\n- **🔗 链接**：[[中英摘要](./abs/2407.20213.md)] [[arXiv:2407.20213](https://arxiv.org/abs/2407.20213)] [Code]\n- **📝 说明**：\n\n#### [82] ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting\n- **🧑‍🔬 作者**：Shen Chen, Jiale Zhou, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei Li\n- **🏫 单位**：East China University of Science and Technology ⟐ University of Washington ⟐ University of Copenhagen ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2407.19035.md)] [[arXiv:2407.19035](https://arxiv.org/abs/2407.19035)] [Code]\n- **📝 说明**：\n\n#### [83] DHGS: Decoupled Hybrid Gaussian Splatting for Driving Scene\n- **🧑‍🔬 作者**：Xi Shi, Lingli Chen, Peng Wei, Xi Wu, Tian Jiang, Yonggang Luo, Lecheng Xie\n- **🏫 单位**：Changan Auto, AILab\n- **🔗 链接**：[[中英摘要](./abs/2407.16600.md)] [[arXiv:2407.16600](https://arxiv.org/abs/2407.16600)] [Code]\n- **📝 说明**：\n\n#### [84] Integrating Meshes and 3D Gaussians for Indoor Scene Reconstruction with SAM Mask Guidance\n- **🧑‍🔬 作者**：Jiyeop Kim, Jongwoo Lim\n- **🏫 单位**：IPAI, Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2407.16173.md)] [[arXiv:2407.16173](https://arxiv.org/abs/2407.16173)] [Code]\n- **📝 说明**：\n\n#### [85] Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures\n- **🧑‍🔬 作者**：Ruizhe Wang, Chunliang Hua, Tomakayev Shingys, Mengyuan Niu, Qingxin Yang, Lizhong Gao, Yi Zheng, Junyan Yang, Qiao Wang\n- **🏫 单位**：Southeast University\n- **🔗 链接**：[[中英摘要](./abs/2407.15435.md)] [[arXiv:2407.15435](https://arxiv.org/abs/2407.15435)] [Code]\n- **📝 说明**：\n\n#### [86] HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions\n- **🧑‍🔬 作者**：Haiyang Zhou, Xinhua Cheng, Wangbo Yu, Yonghong Tian, Li Yuan\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2407.15187.md)] [[arXiv:2407.15187](https://arxiv.org/abs/2407.15187)] [[Code](https://github.com/zhouhyOcean/HoloDreamer)]\n- **📝 说明**：\n\n#### [87] EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting\n- **🧑‍🔬 作者**：Yuchen Weng, Zhengwen Shen, Ruofan Chen, Qi Wang, Jun Wang\n- **🏫 单位**：China University of Mining and Technology\n- **🔗 链接**：[[中英摘要](./abs/2407.13520.md)] [[arXiv:2407.13520](https://arxiv.org/abs/2407.13520)] [Code]\n- **📝 说明**：\n\n#### [88] Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections\n- **🧑‍🔬 作者**：Congrong Xu, Justin Kerr, Angjoo Kanazawa\n- **🏫 单位**：UC 
Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2407.12306.md)] [[arXiv:2407.12306](https://arxiv.org/abs/2407.12306)] [[Code](https://github.com/KevinXu02/splatfacto-w)]\n- **📝 说明**：\n\n#### [89] MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification\n- **🧑‍🔬 作者**：Zhuoxiao Li, Shanliang Yao, Yijie Chu, Angel F. Garcia-Fernandez, Yong Yue, Eng Gee Lim, Xiaohui Zhu\n- **🏫 单位**：University of Liverpool ⟐ Xi’an Jiaotong-Liverpool University ⟐ ARIES Research Centre, Universidad Antonio de Nebrija\n- **🔗 链接**：[[中英摘要](./abs/2407.11840.md)] [[arXiv:2407.11840](https://arxiv.org/abs/2407.11840)] [Code]\n- **📝 说明**：\n\n#### [90] RecGS: Removing Water Caustic with Recurrent Gaussian Splatting\n- **🧑‍🔬 作者**：Tianyi Zhang, Weiming Zhi, Kaining Huang, Joshua Mangelson, Corina Barbalata, Matthew Johnson-Roberson\n- **🏫 单位**：Carnegie Mellon University ⟐ Louisiana State University ⟐ Brigham Young University\n- **🔗 链接**：[[中英摘要](./abs/2407.10318.md)] [[arXiv:2407.10318](https://arxiv.org/abs/2407.10318)] [Code]\n- **📝 说明**：\n\n#### [91] SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion\n- **🧑‍🔬 作者**：Jiyuan Zhang, Kang Chen, Shiyan Chen, Yajing Zheng, Tiejun Huang, Zhaofei Yu\n- **🏫 单位**：Peking University ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2407.10062.md)] [[arXiv:2407.10062](https://arxiv.org/abs/2407.10062)] [Code]\n- **📝 说明**：\n\n#### [92] StyleSplat: 3D Object Style Transfer with Gaussian Splatting\n- **🧑‍🔬 作者**：Sahil Jain, Avik Kuthiala, Prabhdeep Singh Sethi, Prakanshul Saxena\n- **🏫 单位**：Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2407.09473.md)] [[arXiv:2407.09473](https://arxiv.org/abs/2407.09473)] [[Code](https://github.com/bernard0047/style-splat)]\n- **📝 说明**：\n\n#### [93] PICA: Physics-Integrated Clothed Avatar\n- **🧑‍🔬 作者**：Bo Peng, Yunfan Tao, Haoyu Zhan, Yudong Guo, Juyong Zhang\n- **🏫 单位**：School of Mathematical Sciences, University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2407.05324.md)] [[arXiv:2407.05324](https://arxiv.org/abs/2407.05324)] [Code]\n- **📝 说明**：\n\n#### [94] SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction\n- **🧑‍🔬 作者**：Weixing Xie, Junfeng Yao, Xianpeng Cao, Qiqin Lin, Zerui Tang, Xiao Dong, Xiaohu Guo\n- **🏫 单位**：Xiamen University ⟐ BNU-HKBU United International College ⟐ The University of Texas at Dallas\n- **🔗 链接**：[[中英摘要](./abs/2407.05023.md)] [[arXiv:2407.05023](https://arxiv.org/abs/2407.05023)] [[Code](https://github.com/SurgicalGaussian/SurgicalGaussian)]\n- **📝 说明**：\n\n#### [95] Segment Any 4D Gaussians\n- **🧑‍🔬 作者**：Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ Huawei Inc. 
⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2407.04504.md)] [[arXiv:2407.04504](https://arxiv.org/abs/2407.04504)] [[Code](https://github.com/jsxzs/SA4D)]\n- **📝 说明**：\n\n#### [96] CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images\n- **🧑‍🔬 作者**：Junghe Lee, Donghyeong Kim, Dogyoon Lee, Suhwan Cho, Sangyoun Lee\n- **🏫 单位**：Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2407.03923.md)] [[arXiv:2407.03923](https://arxiv.org/abs/2407.03923)] [[Code](https://github.com/Jho-Yonsei/CRiM-GS)]\n- **📝 说明**：\n\n#### [97] PFGS: High Fidelity Point Cloud Rendering via Feature Splatting\n- **🧑‍🔬 作者**：Jiaxu Wang, Ziyi Zhang, Junhao He, Renjing Xu\n- **🏫 单位**：Hong Kong University of Science and Technology (GZ)\n- **🔗 链接**：[[中英摘要](./abs/2407.03857.md)] [[arXiv:2407.03857](https://arxiv.org/abs/2407.03857)] [[Code](https://github.com/Mercerai/PFGS)]\n- **📝 说明**：\n\n#### [98] TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation\n- **🧑‍🔬 作者**：Chaofan Luo, Donglin Di, Yongjia Ma, Zhou Xue, Chen Wei, Xun Yang, Yebin Liu\n- **🏫 单位**：Space AI, Li Auto ⟐ University of Science and Technology of China ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2407.02034.md)] [[arXiv:2407.02034](https://arxiv.org/abs/2407.02034)] [Code]\n- **📝 说明**：\n\n#### [99] OccFusion: Rendering Occluded Humans with Generative Diffusion Priors\n- **🧑‍🔬 作者**：Adam Sun, Tiange Xiang, Scott Delp, Li Fei-Fei, Ehsan Adeli\n- **🏫 单位**：Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2407.00316.md)] [[arXiv:2407.00316](https://arxiv.org/abs/2407.00316)] [Code]\n- **📝 说明**：\n"
  },
  {
    "path": "archive/202501.md",
    "content": "# 3D Gaussian Splatting Papers Before 2025/01/01\n\n#### [1] 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives\n- **🧑‍🔬 作者**：Zeyu Yang, Zijie Pan, Xiatian Zhu, Li Zhang, Yu-Gang Jiang, Philip H. S. Torr\n- **🏫 单位**：Fudan University ⟐ University of Surrey ⟐ University of Oxford\n- **🔗 链接**：[[中英摘要](./abs/2412.20720.md)] [[arXiv:2412.20720](https://arxiv.org/abs/2412.20720)] [Code]\n- **📝 说明**：\n\n#### [2] GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Atticus J. Zeller\n- **🏫 单位**：Southeast University Chengxian College\n- **🔗 链接**：[[中英摘要](./abs/2412.20056.md)] [[arXiv:2412.20056](https://arxiv.org/abs/2412.20056)] [[Code](https://github.com/AtticusZeller/GsplatLoc)]\n- **📝 说明**：\n\n#### [3] FlameGS: Reconstruct flame light field via Gaussian Splatting\n- **🧑‍🔬 作者**：Yunhao Shui, Fuhao Zhang, Can Gao, Hao Xue, Zhiyin Ma, Gang Xun, Xuesong Li\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2412.19841.md)] [[arXiv:2412.19841](https://arxiv.org/abs/2412.19841)] [Code]\n- **📝 说明**：\n\n#### [4] DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction\n- **🧑‍🔬 作者**：Kai Xu, Tze Ho Elden Tse, Jizong Peng, Angela Yao\n- **🏫 单位**：National University of Singapore ⟐ dConstruct Robotics\n- **🔗 链接**：[[中英摘要](./abs/2412.19584.md)] [[arXiv:2412.19584](https://arxiv.org/abs/2412.19584)] [[Code](https://github.com/kai422/das3r)]\n- **📝 说明**：\n\n#### [5] Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images\n- **🧑‍🔬 作者**：Xudong Cai, Yongcai Wang, Zhaoxin Fan, Deng Haoran, Shuo Wang, Wanting Li, Deying Li, Lun Luo, Minhang Wang, Jintao Xu\n- **🏫 单位**：School of Information, Renmin University of China, Beijing, China ⟐ Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Institute of Artificial Intelligence, Beihang University, Beijing, China ⟐ HAOMO.AI, Beijing, Chin\n- **🔗 链接**：[[中英摘要](./abs/2412.19518.md)] [[arXiv:2412.19518](https://arxiv.org/abs/2412.19518)] [Code]\n- **📝 说明**：\n\n#### [6] Learning Radiance Fields from a Single Snapshot Compressive Image\n- **🧑‍🔬 作者**：Yunhao Li, Xiang Liu, Xiaodong Wang, Xin Yuan, Peidong Liu\n- **🏫 单位**：College of Computer Science and Technology at Zhejiang University and the School of Engineering at Westlake University ⟐ School of Engineering at Westlake University, Hangzhou, Zhejiang, China\n- **🔗 链接**：[[中英摘要](./abs/2412.19483.md)] [[arXiv:2412.19483](https://arxiv.org/abs/2412.19483)] [Code]\n- **📝 说明**：\n\n#### [7] Generating Editable Head Avatars with 3D Gaussian GANs\n- **🧑‍🔬 作者**：Guohao Li, Hongyu Yang, Yifang Men, Di Huang, Weixin Li, Ruijie Yang, Yunhong Wang\n- **🏫 单位**：School of Computer Science and Engineering, Beihang University, Beijing, China ⟐ School of Artificial Intelligence, Beihang University, Beijing, China ⟐ Shanghai Artificial Intelligence Laboratory, Shanghai, China\n- **🔗 链接**：[[中英摘要](./abs/2412.19149.md)] [[arXiv:2412.19149](https://arxiv.org/abs/2412.19149)] [[Code](https://github.com/liguohao96/EGG3D)]\n- **📝 说明**：\n\n#### [8] CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Siyu Jiao, Haoye Dong, Yuyang Yin, Zequn Jie, Yinlong Qian, Yao Zhao, Humphrey Shi, Yunchao Wei\n- **🏫 单位**：Institute of Information Science, Beijing Jiaotong University ⟐ Visual Intellgence +X International Cooperation Joint Laboratory of MOE ⟐ National University of Singapore ⟐ Meituan ⟐ Georgia Institute of Technology ⟐ Picsart AI Research (PAIR)\n- **🔗 
链接**：[[中英摘要](./abs/2412.19142.md)] [[arXiv:2412.19142](https://arxiv.org/abs/2412.19142)] [Code]\n- **📝 说明**：\n\n#### [9] MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo\n- **🧑‍🔬 作者**：Byeonggwon Lee, Junkyu Park, Khang Truong Giang, Sungho Jo, Soohwan Song\n- **🏫 单位**：College of AI Convergence, Dongguk University ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2412.19130.md)] [[arXiv:2412.19130](https://arxiv.org/abs/2412.19130)] [Code]\n- **📝 说明**：\n\n#### [10] ArtNVG: Content-Style Separated Artistic Neighboring-View Gaussian Stylization\n- **🧑‍🔬 作者**：Zixiao Gu, Mengtian Li, Ruhua Chen, Zhongxia Ji, Sichen Guo, Zhenye Zhang, Guangnan Ye, Zuo Hu\n- **🏫 单位**：Fudan University ⟐ Shanghai University ⟐ Shanghai Theatre Academy\n- **🔗 链接**：[[中英摘要](./abs/2412.18783.md)] [[arXiv:2412.18783](https://arxiv.org/abs/2412.18783)] [Code]\n- **📝 说明**：\n\n#### [11] RSGaussian:3D Gaussian Splatting with LiDAR for Aerial Remote Sensing Novel View Synthesis\n- **🧑‍🔬 作者**：Yiling Yao, Wenjuan Zhang, Bing Zhang, Bocheng Li, Yaning Wang, Bowen Wang\n- **🏫 单位**：Chinese Academy of Sciences ⟐ International Research Center of Big Data for Sustainable Development Goals ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2412.18380.md)] [[arXiv:2412.18380](https://arxiv.org/abs/2412.18380)] [Code]\n- **📝 说明**：\n\n#### [12] LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding\n- **🧑‍🔬 作者**：Hao Li, Roy Qin, Zhengyu Zou, Diqi He, Bohan Li, Bingquan Dai, Dingwen Zhang, Junwei Han\n- **🏫 单位**：Northwestern Polytechnical University ⟐ Tsinghua University ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2412.17635.md)] [[arXiv:2412.17635](https://arxiv.org/abs/2412.17635)] [[Code](https://github.com/lifuguan/LangSurf)]\n- **📝 说明**：\n\n#### [13] CoSurfGS:Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction\n- **🧑‍🔬 作者**：Yuanyuan Gao, Yalun Dai, Hao Li, Weicai Ye, Junyi Chen, Danpeng Chen, Dingwen Zhang, Tong He, Guofeng Zhang, Junwei Han\n- **🏫 单位**：Northwestern Polytechnical University ⟐ Nanyang Technological University ⟐ Zhejiang University ⟐ Shanghai AI Lab\n- **🔗 链接**：[[中英摘要](./abs/2412.17612.md)] [[arXiv:2412.17612](https://arxiv.org/abs/2412.17612)] [[Code](https://github.com/zju3dv/CoSurfGS)]\n- **📝 说明**：\n\n#### [14] Exploring Dynamic Novel View Synthesis Technologies for Cinematography\n- **🧑‍🔬 作者**：Adrian Azzarelli, Nantheera Anantrasirichai, David R Bull\n- **🏫 单位**：University of Bristol\n- **🔗 链接**：[[中英摘要](./abs/2412.17532.md)] [[arXiv:2412.17532](https://arxiv.org/abs/2412.17532)] [Code]\n- **📝 说明**：\n\n#### [15] Balanced 3DGS: Gaussian-wise Parallelism Rendering with Fine-Grained Tiling\n- **🧑‍🔬 作者**：Hao Gui, Lin Hu, Rui Chen, Mingxiao Huang, Yuxin Yin, Jin Yang, Yong Wu, Chen Liu, Zhongxu Sun, Xueyang Zhang, Kun Zhan\n- **🏫 单位**：Li Auto ⟐ NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2412.17378.md)] [[arXiv:2412.17378](https://arxiv.org/abs/2412.17378)] [Code]\n- **📝 说明**：\n\n#### [16] GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs\n- **🧑‍🔬 作者**：Xingrui Wang, Cuiling Lan, Hanxin Zhu, Zhibo Chen, Yan Lu\n- **🏫 单位**：University of Science and Technology of China ⟐ Microsoft Research Asia\n- **🔗 链接**：[[中英摘要](./abs/2412.16932.md)] [[arXiv:2412.16932](https://arxiv.org/abs/2412.16932)] [Code]\n- **📝 说明**：\n\n#### [17] GeoTexDensifier: Geometry-Texture-Aware Densification for High-Quality Photorealistic 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hanqing Jiang, Xiaojun 
Xiang, Han Sun, Hongjie Li, Liyang Zhou, Xiaoyu Zhang, Guofeng Zhang\n- **🏫 单位**：SenseTime Research ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2412.16809.md)] [[arXiv:2412.16809](https://arxiv.org/abs/2412.16809)] [Code]\n- **📝 说明**：\n\n#### [18] SOUS VIDE: Cooking Visual Drone Navigation Policies in a Gaussian Splatting Vacuum\n- **🧑‍🔬 作者**：JunEn Low, Maximilian Adang, Javier Yu, Keiko Nagami, Mac Schwager\n- **🏫 单位**：Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2412.16346.md)] [[arXiv:2412.16346](https://arxiv.org/abs/2412.16346)] [[Code](https://github.com/StanfordMSL/SousVide)]\n- **📝 说明**：\n\n#### [19] Interactive Scene Authoring with Specialized Generative Primitives\n- **🧑‍🔬 作者**：Clément Jambon, Changwoon Choi, Dongsu Zhang, Olga Sorkine-Hornung, Young Min Kim\n- **🏫 单位**：ETH Zurich ⟐ Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2412.16253.md)] [[arXiv:2412.16253](https://arxiv.org/abs/2412.16253)] [Code]\n- **📝 说明**：\n\n#### [20] AvatarPerfect: User-Assisted 3D Gaussian Splatting Avatar Refinement with Automatic Pose Suggestion\n- **🧑‍🔬 作者**：Jotaro Sakamiya, I-Chao Shen, Jinsong Zhang, Mustafa Doga Dogan, Takeo Igarashi\n- **🏫 单位**：The University of Tokyo ⟐ Tianjin University ⟐ Adobe Research\n- **🔗 链接**：[[中英摘要](./abs/2412.15609.md)] [[arXiv:2412.15609](https://arxiv.org/abs/2412.15609)] [Code]\n- **📝 说明**：\n\n#### [21] LiHi-GS: LiDAR-Supervised Gaussian Splatting for Highway Driving Scene Reconstruction\n- **🧑‍🔬 作者**：Pou-Chun Kung, Xianling Zhang, Katherine A. Skinner, Nikita Jaipuria\n- **🏫 单位**：Latitude AI ⟐ University of Michigan, Ann Arbor\n- **🔗 链接**：[[中英摘要](./abs/2412.15447.md)] [[arXiv:2412.15447](https://arxiv.org/abs/2412.15447)] [Code]\n- **📝 说明**：\n\n#### [22] SolidGS: Consolidating Gaussian Surfel Splatting for Sparse-View Surface Reconstruction\n- **🧑‍🔬 作者**：Zhuowen Shen, Yuan Liu, Zhang Chen, Zhong Li, Jiepeng Wang, Yongqing Liang, Zhengming Yu, Jingdong Zhang, Yi Xu, Scott Schaefer, Xin Li, Wenping Wang\n- **🏫 单位**：Texas A&M University ⟐ OPPO US Research ⟐ Nanyang Technological University ⟐ Hong Kong University of Science and Technology ⟐ University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2412.15400.md)] [[arXiv:2412.15400](https://arxiv.org/abs/2412.15400)] [Code]\n- **📝 说明**：\n\n#### [23] GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Qianpu Sun, Changyong Shu, Sifan Zhou, Zichen Yu, Yan Chen, Dawei Yang, Yuan Chun\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School ⟐ Houmo AI ⟐ Dalian University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2412.14579.md)] [[arXiv:2412.14579](https://arxiv.org/abs/2412.14579)] [Code]\n- **📝 说明**：\n\n#### [24] Improving Geometry in Sparse-View 3DGS via Reprojection-based DoF Separation\n- **🧑‍🔬 作者**：Yongsung Kim, Minjun Park, Jooyoung Choi, Sungroh Yoon\n- **🏫 单位**：Seoul National University ⟐ ECE ⟐ AIIS, ASRI, INMC, ISRC, Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2412.14568.md)] [[arXiv:2412.14568](https://arxiv.org/abs/2412.14568)] [Code]\n- **📝 说明**：\n\n#### [25] GAGS: Granularity-Aware Feature Distillation for Language Gaussian Splatting\n- **🧑‍🔬 作者**：Yuning Peng, Haiping Wang, Yuan Liu, Chenglu Wen, Zhen Dong, Bisheng Yang\n- **🏫 单位**：Wuhan University ⟐ Hong Kong University of Science and Technology ⟐ Xiamen University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2412.13654.md)] [[arXiv:2412.13654](https://arxiv.org/abs/2412.13654)] [[Code](https://github.com/WHU-USI3DV/GAGS)]\n- **📝 说明**：\n\n#### [26] 4D Radar-Inertial Odometry based 
on Gaussian Modeling and Multi-Hypothesis Scan Matching\n- **🧑‍🔬 作者**：Fernando Amodeo, Luis Merino, Fernando Caballero\n- **🏫 单位**：Universidad Pablo de Olavide\n- **🔗 链接**：[[中英摘要](./abs/2412.13639.md)] [[arXiv:2412.13639](https://arxiv.org/abs/2412.13639)] [[Code](https://github.com/robotics-upo/gaussian-rio)]\n- **📝 说明**：\n\n#### [27] Turbo-GS: Accelerating 3D Gaussian Fitting for High-Quality Radiance Fields\n- **🧑‍🔬 作者**：Tao Lu, Ankit Dhiman, R Srinath, Emre Arslan, Angela Xing, Yuanbo Xiangli, R Venkatesh Babu, Srinath Sridhar\n- **🏫 单位**：Brown University ⟐ Indian Institute of Science, Bangalore ⟐ Cornell University\n- **🔗 链接**：[[中英摘要](./abs/2412.13547.md)] [[arXiv:2412.13547](https://arxiv.org/abs/2412.13547)] [[Code](https://github.com/inspirelt/Turbo-GS)]\n- **📝 说明**：\n\n#### [28] NFL-BA: Improving Endoscopic SLAM with Near-Field Light Bundle Adjustment\n- **🧑‍🔬 作者**：Andrea Dunn Beltran, Daniel Rho, Marc Niethammer, Roni Sengupta\n- **🏫 单位**：University of North Carolina at Chapel Hill\n- **🔗 链接**：[[中英摘要](./abs/2412.13176.md)] [[arXiv:2412.13176](https://arxiv.org/abs/2412.13176)] [Code]\n- **📝 说明**：\n\n#### [29] CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image\n- **🧑‍🔬 作者**：Wonseok Roh, Hwanhee Jung, Jong Wook Kim, Seunggwan Lee, Innfarn Yoo, Andreas Lugmayr, Seunggeun Chi, Karthik Ramani, Sangpil Kim\n- **🏫 单位**：Korea University ⟐ Google Research ⟐ Purdue University\n- **🔗 链接**：[[中英摘要](./abs/2412.12906.md)] [[arXiv:2412.12906](https://arxiv.org/abs/2412.12906)] [Code]\n- **📝 说明**：\n\n#### [30] HyperGS: Hyperspectral 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Christopher Thirgood, Oscar Mendez, Erin Chao Ling, Jon Storey, Simon Hadfield\n- **🏫 单位**：University of Surrey ⟐ I3D Robotics, Kent, UK\n- **🔗 链接**：[[中英摘要](./abs/2412.12849.md)] [[arXiv:2412.12849](https://arxiv.org/abs/2412.12849)] [Code]\n- **📝 说明**：\n\n#### [31] Gaussian Billboards: Expressive 2D Gaussian Splatting with Textures\n- **🧑‍🔬 作者**：Sebastian Weiss, Derek Bradley\n- **🏫 单位**：Disney Research\n- **🔗 链接**：[[中英摘要](./abs/2412.12734.md)] [[arXiv:2412.12734](https://arxiv.org/abs/2412.12734)] [Code]\n- **📝 说明**：\n\n#### [32] Wonderland: Navigating 3D Scenes from a Single Image\n- **🧑‍🔬 作者**：Hanwen Liang, Junli Cao, Vidit Goel, Guocheng Qian, Sergei Korolev, Demetri Terzopoulos, Konstantinos N. Plataniotis, Sergey Tulyakov, Jian Ren\n- **🏫 单位**：University of Toronto ⟐ Snap Inc. ⟐ University of California, Los Angeles\n- **🔗 链接**：[[中英摘要](./abs/2412.12091.md)] [[arXiv:2412.12091](https://arxiv.org/abs/2412.12091)] [Code]\n- **📝 说明**：\n\n#### [33] GS-ProCams: Gaussian Splatting-based Projector-Camera Systems\n- **🧑‍🔬 作者**：Qingyue Deng, Jijiang Li, Haibin Ling, Bingyao Huang\n- **🏫 单位**：Southwest University ⟐ Stony Brook University\n- **🔗 链接**：[[中英摘要](./abs/2412.11762.md)] [[arXiv:2412.11762](https://arxiv.org/abs/2412.11762)] [Code]\n- **📝 说明**：\n\n#### [34] Deformable Radial Kernel Splatting\n- **🧑‍🔬 作者**：Yi-Hua Huang, Ming-Xian Lin, Yang-Tian Sun, Ziyi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi\n- **🏫 单位**：The University of Hong Kong ⟐ VAST\n- **🔗 链接**：[[中英摘要](./abs/2412.11752.md)] [[arXiv:2412.11752](https://arxiv.org/abs/2412.11752)] [Code]\n- **📝 说明**：\n\n#### [35] SweepEvGS: Event-Based 3D Gaussian Splatting for Macro and Micro Radiance Field Rendering from a Single Sweep\n- **🧑‍🔬 作者**：Jingqian Wu, Shuo Zhu, Chutian Wang, Boxin Shi, Edmund Y. 
Lam\n- **🏫 单位**：The University of Hong Kong ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2412.11579.md)] [[arXiv:2412.11579](https://arxiv.org/abs/2412.11579)] [Code]\n- **📝 说明**：\n\n#### [36] EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Dong In Lee, Hyeongcheol Park, Jiyoung Seo, Eunbyung Park, Hyunje Park, Ha Dam Baek, Shin Sangheon, Sangmin kim, Sangpil Kim\n- **🏫 单位**：Korea University ⟐ Google ⟐ Sungkyunkwan University ⟐ Hanhwa Systems\n- **🔗 链接**：[[中英摘要](./abs/2412.11520.md)] [[arXiv:2412.11520](https://arxiv.org/abs/2412.11520)] [Code]\n- **📝 说明**：\n\n#### [37] GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs\n- **🧑‍🔬 作者**：Xinli Xu, Wenhang Ge, Dicong Qiu, ZhiFei Chen, Dongyu Yan, Zhuoyun Liu, Haoyu Zhao, Hanfeng Zhao, Shunsi Zhang, Junwei Liang, Ying-Cong Chen\n- **🏫 单位**：HKUST(GZ) ⟐ HKUST ⟐ Quwan\n- **🔗 链接**：[[中英摘要](./abs/2412.11258.md)] [[arXiv:2412.11258](https://arxiv.org/abs/2412.11258)] [[Code](https://github.com/xxlbigbrother/Gaussian-Property)]\n- **📝 说明**：\n\n#### [38] GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction\n- **🧑‍🔬 作者**：Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2412.10373.md)] [[arXiv:2412.10373](https://arxiv.org/abs/2412.10373)] [[Code](https://github.com/zuosc19/GaussianWorld)]\n- **📝 说明**：\n\n#### [39] GaussianAD: Gaussian-Centric End-to-End Autonomous Driving\n- **🧑‍🔬 作者**：Wenzhao Zheng, Junjie Wu, Yao Zheng, Sicheng Zuo, Zixun Xie, Longchao Yang, Yong Pan, Zhihui Hao, Peng Jia, Xianpeng Lang, Shanghang Zhang\n- **🏫 单位**：Tsinghua University ⟐ Li Auto ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2412.10371.md)] [[arXiv:2412.10371](https://arxiv.org/abs/2412.10371)] [[Code](https://github.com/wzzheng/GaussianAD)]\n- **📝 说明**：\n\n#### [40] SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians\n- **🧑‍🔬 作者**：Siyun Liang, Sen Wang, Kunyi Li, Michael Niemeyer, Stefano Gasperini, Nassir Navab, Federico Tombari\n- **🏫 单位**：Technical University of Munich ⟐ Google ⟐ Munich Center for Machine Learning ⟐ Visual AI\n- **🔗 链接**：[[中英摘要](./abs/2412.10231.md)] [[arXiv:2412.10231](https://arxiv.org/abs/2412.10231)] [Code]\n- **📝 说明**：\n\n#### [41] Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories\n- **🧑‍🔬 作者**：Xiaohan Zhang, Zhenyu Sun, Yukui Qiu, Junyan Su, Qi Liu\n- **🏫 单位**：South China University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2412.10078.md)] [[arXiv:2412.10078](https://arxiv.org/abs/2412.10078)] [Code]\n- **📝 说明**：\n\n#### [42] TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views\n- **🧑‍🔬 作者**：Liang Zhao, Zehan Bao, Yi Xie, Hong Chen, Yaohui Chen, Weifu Li\n- **🏫 单位**：Huazhong Agricultural University ⟐ Engineering Research Center of Intelligent Technology for Agriculture\n- **🔗 链接**：[[中英摘要](./abs/2412.10051.md)] [[arXiv:2412.10051](https://arxiv.org/abs/2412.10051)] [[Code](https://github.com/leon2000-ai/TSGaussian)]\n- **📝 说明**：\n\n#### [43] RP-SLAM: Real-time Photorealistic SLAM with Efficient 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Lizhi Bai, Chunqi Tian, Jun Yang, Siyu Zhang, Masanori Suganuma, Takayuki Okatani\n- **🏫 单位**：Tongji University ⟐ Tohoku University\n- **🔗 链接**：[[中英摘要](./abs/2412.09868.md)] [[arXiv:2412.09868](https://arxiv.org/abs/2412.09868)] [Code]\n- **📝 说明**：\n\n#### [44] DSplats: 3D Generation by Denoising Splats-Based 
Multiview Diffusion Models\n- **🧑‍🔬 作者**：Kevin Miao, Harsh Agrawal, Qihang Zhang, Federico Semeraro, Marco Cavallo, Jiatao Gu, Alexander Toshev\n- **🏫 单位**：Apple\n- **🔗 链接**：[[中英摘要](./abs/2412.09648.md)] [[arXiv:2412.09648](https://arxiv.org/abs/2412.09648)] [Code]\n- **📝 说明**：\n\n#### [45] LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors\n- **🧑‍🔬 作者**：Yabo Chen, Chen Yang, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, Wei Shen, Wenrui Dai, Hongkai Xiong, Qi Tian\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Huawei Inc.\n- **🔗 链接**：[[中英摘要](./abs/2412.09597.md)] [[arXiv:2412.09597](https://arxiv.org/abs/2412.09597)] [[Code](https://github.com/AbrahamYabo/LiftImage3D)]\n- **📝 说明**：\n\n#### [46] FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction\n- **🧑‍🔬 作者**：Jiale Xu, Shenghua Gao, Ying Shan\n- **🏫 单位**：ARC Lab, Tencent PCG ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2412.09573.md)] [[arXiv:2412.09573](https://arxiv.org/abs/2412.09573)] [[Code](https://github.com/TencentARC/FreeSplatter)]\n- **📝 说明**：\n\n#### [47] SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing\n- **🧑‍🔬 作者**：Xueting Li, Ye Yuan, Shalini De Mello, Gilles Daviet, Jonathan Leaf, Miles Macklin, Jan Kautz, Umar Iqbal\n- **🏫 单位**：NVIDIA\n- **🔗 链接**：[[中英摘要](./abs/2412.09545.md)] [[arXiv:2412.09545](https://arxiv.org/abs/2412.09545)] [Code]\n- **📝 说明**：\n\n#### [48] LIVE-GS: LLM Powers Interactive VR by Enhancing Gaussian Splatting\n- **🧑‍🔬 作者**：Haotian Mao, Zhuoxiong Xu, Siyue Wei, Yule Quan, Nianchen Deng, Xubo Yang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Shanghai AI Lab\n- **🔗 链接**：[[中英摘要](./abs/2412.09176.md)] [[arXiv:2412.09176](https://arxiv.org/abs/2412.09176)] [Code]\n- **📝 说明**：\n\n#### [49] ProGDF: Progressive Gaussian Differential Field for Controllable and Flexible 3D Editing\n- **🧑‍🔬 作者**：Yian Zhao, Wanshi Xu, Yang Wu, Weiheng Huang, Zhongqian Sun, Wei Yang\n- **🏫 单位**：Peking University ⟐ Tencent AI Lab\n- **🔗 链接**：[[中英摘要](./abs/2412.08152.md)] [[arXiv:2412.08152](https://arxiv.org/abs/2412.08152)] [Code]\n- **📝 说明**：\n\n#### [50] Diffusion-Based Attention Warping for Consistent 3D Scene Editing\n- **🧑‍🔬 作者**：Eyal Gomel, Lior Wolf\n- **🏫 单位**：Tel-Aviv University\n- **🔗 链接**：[[中英摘要](./abs/2412.07984.md)] [[arXiv:2412.07984](https://arxiv.org/abs/2412.07984)] [[Code](https://attention-warp.github.io/)]\n- **📝 说明**：\n\n#### [51] Proc-GS: Procedural Building Generation for City Assembly with 3D Gaussians\n- **🧑‍🔬 作者**：Yixuan Li, Xingjian Ran, Linning Xu, Tao Lu, Mulin Yu, Zhenzhi Wang, Yuanbo Xiangli, Dahua Lin, Bo Dai\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Brown University ⟐ Cornell University ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2412.07660.md)] [[arXiv:2412.07660](https://arxiv.org/abs/2412.07660)] [[Code](https://github.com/city-super/ProcGS/)]\n- **📝 说明**：\n\n#### [52] Faster and Better 3D Splatting via Group Training\n- **🧑‍🔬 作者**：Chengbo Wang, Guozheng Ma, Yifei Xue, Yizhen Lao\n- **🏫 单位**：Hunan University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2412.07608.md)] [[arXiv:2412.07608](https://arxiv.org/abs/2412.07608)] [Code]\n- **📝 说明**：\n\n#### [53] ReCap: Better Gaussian Relighting with Cross-Environment Captures\n- **🧑‍🔬 作者**：Jingzhi Li, Zongwei Wu, Eduard Zamfir, Radu Timofte\n- **🏫 单位**：University of Wurzburg\n- **🔗 链接**：[[中英摘要](./abs/2412.07534.md)] [[arXiv:2412.07534](https://arxiv.org/abs/2412.07534)] [Code]\n- **📝 说明**：\n\n#### 
[54] ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery\n- **🧑‍🔬 作者**：Yanzhe Lyu, Kai Cheng, Xin Kang, Xuejin Chen\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2412.07494.md)] [[arXiv:2412.07494](https://arxiv.org/abs/2412.07494)] [Code]\n- **📝 说明**：\n\n#### [55] EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering\n- **🧑‍🔬 作者**：Toshiya Yura, Ashkan Mirzaei, Igor Gilitschenski\n- **🏫 单位**：Sony Semiconductor Solutions Corporation ⟐ University of Toronto ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2412.07293.md)] [[arXiv:2412.07293](https://arxiv.org/abs/2412.07293)] [Code]\n- **📝 说明**：\n\n#### [56] MV-DUSt3R+: Single-Stage Scene Reconstruction from Sparse Views In 2 Seconds\n- **🧑‍🔬 作者**：Zhenggang Tang, Yuchen Fan, Dilin Wang, Hongyu Xu, Rakesh Ranjan, Alexander Schwing, Zhicheng Yan\n- **🏫 单位**：Meta Reality Labs ⟐ University of Illinois Urbana Champaign\n- **🔗 链接**：[[中英摘要](./abs/2412.06974.md)] [[arXiv:2412.06974](https://arxiv.org/abs/2412.06974)] [[Code](https://mv-dust3rp.github.io/)]\n- **📝 说明**：\n\n#### [57] Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video\n- **🧑‍🔬 作者**：Renlong Wu, Zhilu Zhang, Mingyang Chen, Xiaopeng Fan, Zifei Yan, Wangmeng Zuo\n- **🏫 单位**：Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2412.06424.md)] [[arXiv:2412.06424](https://arxiv.org/abs/2412.06424)] [[Code](https://github.com/ZcsrenlongZ/Deblur4DGS)]\n- **📝 说明**：\n\n#### [58] Efficient Semantic Splatting for Remote Sensing Multi-view Segmentation\n- **🧑‍🔬 作者**：Zipeng Qi, Hao Chen, Haotian Zhang, Zhengxia Zou, Zhenwei Shi\n- **🏫 单位**：Beihang University ⟐ Shanghai Artificial Intelligence Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2412.05969.md)] [[arXiv:2412.05969](https://arxiv.org/abs/2412.05969)] [Code]\n- **📝 说明**：\n\n#### [59] GBR: Generative Bundle Refinement for High-fidelity Gaussian Splatting and Meshing\n- **🧑‍🔬 作者**：Jianing Zhang, Yuchao Zheng, Ziwei Li, Qionghai Dai, Xiaoyun Yuan\n- **🏫 单位**：Fudan University ⟐ Tsinghua University ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2412.05908.md)] [[arXiv:2412.05908](https://arxiv.org/abs/2412.05908)] [Code]\n- **📝 说明**：\n\n#### [60] SizeGS: Size-aware Compression of 3D Gaussians with Hierarchical Mixed Precision Quantization\n- **🧑‍🔬 作者**：Shuzhao Xie, Jiahang Liu, Weixiang Zhang, Shijia Ge, Sicheng Pan, Chen Tang, Yunpeng Bai, Zhi Wang\n- **🏫 单位**：Tsinghua University ⟐ Harbin Institute of Technology, Shenzhen ⟐ Chinese University of Hong Kong ⟐ The University of Texas at Austin\n- **🔗 链接**：[[中英摘要](./abs/2412.05808.md)] [[arXiv:2412.05808](https://arxiv.org/abs/2412.05808)] [[Code](https://github.com/mmlab-sigs/sizegs)]\n- **📝 说明**：\n\n#### [61] Temporally Compressed 3D Gaussian Splatting for Dynamic Scenes\n- **🧑‍🔬 作者**：Saqib Javed, Ahmad Jarrar Khan, Corentin Dumery, Chen Zhao, Mathieu Salzmann\n- **🏫 单位**：EPFL ⟐ Swiss Data Science Center\n- **🔗 链接**：[[中英摘要](./abs/2412.05700.md)] [[arXiv:2412.05700](https://arxiv.org/abs/2412.05700)] [Code]\n- **📝 说明**：\n\n#### [62] WATER-GS: Toward Copyright Protection for 3D Gaussian Splatting via Universal Watermarking\n- **🧑‍🔬 作者**：Yuqi Tan, Xiang Liu, Shuzhao Xie, Bin Chen, Shu-Tao Xia, Zhi Wang\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School ⟐ Harbin Institute of Technology, Shenzhen ⟐ Peng Cheng Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2412.05695.md)] [[arXiv:2412.05695](https://arxiv.org/abs/2412.05695)] [Code]\n- **📝 说明**：\n\n#### [63] Text-to-3D Gaussian Splatting with 
Physics-Grounded Motion Generation\n- **🧑‍🔬 作者**：Wenqing Wang, Yun Fu\n- **🏫 单位**：Northeastern University, USA\n- **🔗 链接**：[[中英摘要](./abs/2412.05560.md)] [[arXiv:2412.05560](https://arxiv.org/abs/2412.05560)] [Code]\n- **📝 说明**：\n\n#### [64] Radiant: Large-scale 3D Gaussian Rendering based on Hierarchical Framework\n- **🧑‍🔬 作者**：Haosong Peng, Tianyu Qi, Yufeng Zhan, Hao Li, Yalun Dai, Yuanqing Xia\n- **🏫 单位**：Beijing Institute of Technology ⟐ Sun Yat-sen University, Shenzhen ⟐ Northwestern Polytechnical University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2412.05546.md)] [[arXiv:2412.05546](https://arxiv.org/abs/2412.05546)] [Code]\n- **📝 说明**：\n\n#### [65] Extrapolated Urban View Synthesis Benchmark\n- **🧑‍🔬 作者**：Xiangyu Han, Zhen Jia, Boyi Li, Yan Wang, Boris Ivanovic, Yurong You, Lingjie Liu, Yue Wang, Marco Pavone, Chen Feng, Yiming Li\n- **🏫 单位**：NYU ⟐ NVIDIA ⟐ University of Pennsylvania ⟐ USC ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2412.05256.md)] [[arXiv:2412.05256](https://arxiv.org/abs/2412.05256)] [[Code](https://github.com/ai4ce/EUVS-Benchmark/)]\n- **📝 说明**：\n\n#### [66] MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting\n- **🧑‍🔬 作者**：Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, Ming Lu\n- **🏫 单位**：Institute of Software, Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Intel Labs China ⟐ Tsinghua University ⟐ Nankai University ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2412.04955.md)] [[arXiv:2412.04955](https://arxiv.org/abs/2412.04955)] [[Code](https://github.com/ChenVoid/MGA/)]\n- **📝 说明**：\n\n#### [67] Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction\n- **🧑‍🔬 作者**：Jixuan Fan, Wanhua Li, Yifei Han, Yansong Tang\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School, Tsinghua University ⟐ Harvard University\n- **🔗 链接**：[[中英摘要](./abs/2412.04887.md)] [[arXiv:2412.04887](https://arxiv.org/abs/2412.04887)] [[Code](https://github.com/Jixuan-Fan/Momentum-GS)]\n- **📝 说明**：\n\n#### [68] Pushing Rendering Boundaries: Hard Gaussian Splatting\n- **🧑‍🔬 作者**：Qingshan Xu, Jiequan Cui, Xuanyu Yi, Yuxuan Wang, Yuan Zhou, Yew-Soon Ong, Hanwang Zhang\n- **🏫 单位**：Nanyang Technological University ⟐ A*STAR, Singapore\n- **🔗 链接**：[[中英摘要](./abs/2412.04826.md)] [[arXiv:2412.04826](https://arxiv.org/abs/2412.04826)] [[Code](https://github.com/GhiXu/HGS)]\n- **📝 说明**：\n\n#### [69] PBDyG: Position Based Dynamic Gaussians for Motion-Aware Clothed Human Avatars\n- **🧑‍🔬 作者**：Shota Sasaki, Jane Wu, Ko Nishino\n- **🏫 单位**：Kyoto University ⟐ University of California, Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2412.04433.md)] [[arXiv:2412.04433](https://arxiv.org/abs/2412.04433)] [Code]\n- **📝 说明**：\n\n#### [70] DGNS: Deformable Gaussian Splatting and Dynamic Neural Surface for Monocular Dynamic 3D Reconstruction\n- **🧑‍🔬 作者**：Xuesong Li, Jinguang Tong, Jie Hong, Vivien Rolland, Lars Petersson\n- **🏫 单位**：Australian National University ⟐ CSIRO, Australia ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2412.03910.md)] [[arXiv:2412.03910](https://arxiv.org/abs/2412.03910)] [Code]\n- **📝 说明**：\n\n#### [71] Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos\n- **🧑‍🔬 作者**：Hanxue Liang, Jiawei Ren, Ashkan Mirzaei, Antonio Torralba, Ziwei Liu, Igor Gilitschenski, Sanja Fidler, Cengiz Oztireli, Huan Ling, Zan Gojcic, Jiahui Huang\n- **🏫 单位**：NVIDIA ⟐ University of Cambridge 
⟐ Nanyang Technological University ⟐ University of Toronto ⟐ MIT ⟐ Vector Institute\n- **🔗 链接**：[[中英摘要](./abs/2412.03526.md)] [[arXiv:2412.03526](https://arxiv.org/abs/2412.03526)] [Code]\n- **📝 说明**：\n\n#### [72] Urban4D: Semantic-Guided 4D Gaussian Splatting for Urban Scene Reconstruction\n- **🧑‍🔬 作者**：Ziwen Li, Jiaxin Huang, Runnan Chen, Yunlong Che, Yandong Guo, Tongliang Liu, Fakhri Karray, Mingming Gong\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2412.03473.md)] [[arXiv:2412.03473](https://arxiv.org/abs/2412.03473)] [Code]\n- **📝 说明**：\n\n#### [73] 2DGS-Room: Seed-Guided 2D Gaussian Splatting with Geometric Constrains for High-Fidelity Indoor Scene Reconstruction\n- **🧑‍🔬 作者**：Wanting Zhang, Haodong Xiang, Zhichao Liao, Xiansong Lai, Xinghui Li, Long Zeng\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2412.03428.md)] [[arXiv:2412.03428](https://arxiv.org/abs/2412.03428)] [[Code](https://github.com/Valentina-Zhang/2DGS-Room)]\n- **📝 说明**：\n\n#### [74] SGSST: Scaling Gaussian Splatting StyleTransfer\n- **🧑‍🔬 作者**：Bruno Galerne, Jianling Wang, Lara Raad, Jean-Michel Morel\n- **🏫 单位**：Institut Denis Poisson ⟐ Institut Universitaire de France (IUF) ⟐ Universidad de la Republica ⟐ City University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2412.03371.md)] [[arXiv:2412.03371](https://arxiv.org/abs/2412.03371)] [Code]\n- **📝 说明**：\n\n#### [75] Splats in Splats: Embedding Invisible 3D Watermark within Gaussian Splatting\n- **🧑‍🔬 作者**：Yijia Guo, Wenkai Huang, Yang Li, Gaolei Li, Hang Zhang, Liwen Hu, Jianhua Li, Tiejun Huang, Lei Ma\n- **🏫 单位**：Peking University ⟐ Shanghai Jiao Tong University ⟐ Cornell University\n- **🔗 链接**：[[中英摘要](./abs/2412.03121.md)] [[arXiv:2412.03121](https://arxiv.org/abs/2412.03121)] [Code]\n- **📝 说明**：\n\n#### [76] RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos\n- **🧑‍🔬 作者**：Yoonwoo Jeong, Junmyeong Lee, Hoseung Choi, Minsu Cho\n- **🏫 单位**：POSTECH\n- **🔗 链接**：[[中英摘要](./abs/2412.03077.md)] [[arXiv:2412.03077](https://arxiv.org/abs/2412.03077)] [[Code](https://github.com/POSTECH-CVLab/RoDyGS)]\n- **📝 说明**：\n\n#### [77] RelayGS: Reconstructing Dynamic Scenes with Large-Scale and Complex Motions via Relay Gaussians\n- **🧑‍🔬 作者**：Qiankun Gao, Yanmin Wu, Chengxiang Wen, Jiarui Meng, Luyang Tang, Jie Chen, Ronggang Wang, Jian Zhang\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory ⟐ Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, Peking University Shenzhen Graduate School\n- **🔗 链接**：[[中英摘要](./abs/2412.02493.md)] [[arXiv:2412.02493](https://arxiv.org/abs/2412.02493)] [[Code](https://github.com/gqk/RelayGS)]\n- **📝 说明**：\n\n#### [78] GSGTrack: Gaussian Splatting-Guided Object Pose Tracking from RGB Videos\n- **🧑‍🔬 作者**：Zhiyuan Chen, Fan Lu, Guo Yu, Bin Li, Sanqing Qu, Yuan Huang, Changhong Fu, Guang Chen\n- **🏫 单位**：Tongji University ⟐ Beijing Institute of Control Engineering\n- **🔗 链接**：[[中英摘要](./abs/2412.02267.md)] [[arXiv:2412.02267](https://arxiv.org/abs/2412.02267)] [Code]\n- **📝 说明**：\n\n#### [79] Multi-robot autonomous 3D reconstruction using Gaussian splatting with Semantic guidance\n- **🧑‍🔬 作者**：Jing Zeng, Qi Ye, Tianle Liu, Yang Xu, Jin Li, Jinming Xu, Liang Li, Jiming Chen\n- **🏫 单位**：Zhejiang University ⟐ Zhejiang University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2412.02249.md)] [[arXiv:2412.02249](https://arxiv.org/abs/2412.02249)] [Code]\n- **📝 说明**：\n\n#### [80] SparseLGS: Sparse View Language Embedded Gaussian Splatting\n- **🧑‍🔬 作者**：Jun Hu, Zhang Chen, Zhong Li, Yi Xu, Juyong Zhang\n- **🏫 
单位**：University of Science and Technology of China ⟐ OPPO US Research Center\n- **🔗 链接**：[[中英摘要](./abs/2412.02245.md)] [[arXiv:2412.02245](https://arxiv.org/abs/2412.02245)] [Code]\n- **📝 说明**：\n\n#### [81] SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images\n- **🧑‍🔬 作者**：Junqiu Yu, Xinlin Ren, Yongchong Gu, Haitao Lin, Tianyu Wang, Yi Zhu, Hang Xu, Yu-Gang Jiang, Xiangyang Xue, Yanwei Fu\n- **🏫 单位**：Fudan University ⟐ Noah’s Ark Lab, Huawei Technology\n- **🔗 链接**：[[中英摘要](./abs/2412.02140.md)] [[arXiv:2412.02140](https://arxiv.org/abs/2412.02140)] [Code]\n- **📝 说明**：\n\n#### [82] Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion\n- **🧑‍🔬 作者**：Liu Liu, Xinjie Wang, Jiaxiong Qiu, Tianwei Lin, Xiaolin Zhou, Zhizhong Su\n- **🏫 单位**：Horizon Robotics, Beijing, China\n- **🔗 链接**：[[中英摘要](./abs/2412.02075.md)] [[arXiv:2412.02075](https://arxiv.org/abs/2412.02075)] [[Code](https://github.com/liuliu3dv/GOC)]\n- **📝 说明**：\n\n#### [83] HDGS: Textured 2D Gaussian Splatting for Enhanced Scene Rendering\n- **🧑‍🔬 作者**：Jiahuan Cheng, Jan-Nico Zaech, Luc Van Gool, Danda Pani Paudel\n- **🏫 单位**：University of Pennsylvania ⟐ Archimedes, Athena RC\n- **🔗 链接**：[[中英摘要](./abs/2412.01823.md)] [[arXiv:2412.01823](https://arxiv.org/abs/2412.01823)] [[Code](https://github.com/TimSong412/HDGS)]\n- **📝 说明**：\n\n#### [84] Occam's LGS: A Simple Approach for Language Gaussian Splatting\n- **🧑‍🔬 作者**：Jiahuan Cheng, Jan-Nico Zaech, Luc Van Gool, Danda Pani Paudel\n- **🏫 单位**：Johns Hopkins University ⟐ INSAIT, Sofia University\n- **🔗 链接**：[[中英摘要](./abs/2412.01807.md)] [[arXiv:2412.01807](https://arxiv.org/abs/2412.01807)] [[Code](https://github.com/insait-institute/OccamLGS)]\n- **📝 说明**：\n\n#### [85] 3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting\n- **🧑‍🔬 作者**：Ziyang Yan, Lei Li, Yihua Shao, Siyu Chen, Wuzong Kai, Jenq-Neng Hwang, Hao Zhao, Fabio Remondino\n- **🏫 单位**：Bruno Kessler Foundation ⟐ University of Trento ⟐ University of Washington ⟐ University of Copenhagen ⟐ University of Science and Technology Beijing ⟐ Fancy Tech ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2412.01583.md)] [[arXiv:2412.01583](https://arxiv.org/abs/2412.01583)] [[Code](https://github.com/ZiyangYan/3DSceneEditor)]\n- **📝 说明**：\n\n#### [86] 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting\n- **🧑‍🔬 作者**：Yufeng Jin, Vignesh Prasad, Snehal Jauhri, Mathias Franzius, Georgia Chalvatzaki\n- **🏫 单位**：Technische Universitat Darmstadt, Germany ⟐ Honda Research Institute Europe GmbH, Offenbach, Germany ⟐ Hessian.AI, Darmstadt, Germany\n- **🔗 链接**：[[中英摘要](./abs/2412.01543.md)] [[arXiv:2412.01543](https://arxiv.org/abs/2412.01543)] [Code]\n- **📝 说明**：\n\n#### [87] ULSR-GS: Ultra Large-scale Surface Reconstruction Gaussian Splatting with Multi-View Geometric Consistency\n- **🧑‍🔬 作者**：Zhuoxiao Li, Shanliang Yao, Qizhong Gao, Angel F. 
Garcia-Fernandez, Yong Yue, Xiaohui Zhu\n- **🏫 单位**：University of Liverpool ⟐ Xi’an Jiaotong-Liverpool University ⟐ ARIES Research Centre, Universidad Antonio de Nebrija\n- **🔗 链接**：[[中英摘要](./abs/2412.01402.md)] [[arXiv:2412.01402](https://arxiv.org/abs/2412.01402)] [Code]\n- **📝 说明**：\n\n#### [88] RGBDS-SLAM: A RGB-D Semantic Dense SLAM Based on 3D Multi Level Pyramid Gaussian Splatting\n- **🧑‍🔬 作者**：Zhenzhong Cao, Qianyi Zhang, Jinzheng Guang, Yinuo Song, Jingtai Liu\n- **🏫 单位**：Nankai University\n- **🔗 链接**：[[中英摘要](./abs/2412.01217.md)] [[arXiv:2412.01217](https://arxiv.org/abs/2412.01217)] [[Code](https://github.com/zhenzhongcao/RGBDS-SLAM)]\n- **📝 说明**：\n\n#### [89] DynSUP: Dynamic Gaussian Splatting from An Unposed Image Pair\n- **🧑‍🔬 作者**：Weihang Li, Weirong Chen, Shenhan Qian, Jiajie Chen, Daniel Cremers, Haoang Li\n- **🏫 单位**：Technical University of Munich ⟐ Munich Center for Machine Learning ⟐ The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2412.00851.md)] [[arXiv:2412.00851](https://arxiv.org/abs/2412.00851)] [Code]\n- **📝 说明**：\n\n#### [90] ChatSplat: 3D Conversational Gaussian Splatting\n- **🧑‍🔬 作者**：Hanlin Chen, Fangyin Wei, Gim Hee Lee\n- **🏫 单位**：National University of Singapore ⟐ Princeton University\n- **🔗 链接**：[[中英摘要](./abs/2412.00734.md)] [[arXiv:2412.00734](https://arxiv.org/abs/2412.00734)] [Code]\n- **📝 说明**：\n\n#### [91] FlashSLAM: Accelerated RGB-D SLAM for Real-Time 3D Scene Reconstruction with Gaussian Splatting\n- **🧑‍🔬 作者**：Phu Pham, Damon Conover, Aniket Bera\n- **🏫 单位**：Purdue University ⟐ DEVCOM Army Research Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2412.00682.md)] [[arXiv:2412.00682](https://arxiv.org/abs/2412.00682)] [Code]\n- **📝 说明**：\n\n#### [92] LineGS : 3D Line Segment Representation on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Chenggang Yang, Yuang Shi, Wei Tsang Ooi\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2412.00477.md)] [[arXiv:2412.00477](https://arxiv.org/abs/2412.00477)] [Code]\n- **📝 说明**：\n\n#### [93] GradiSeg: Gradient-Guided Gaussian Segmentation with Enhanced 3D Boundary Precision\n- **🧑‍🔬 作者**：Zehao Li, Wenwei Han, Yujun Cai, Hao Jiang, Baolong Bi, Shuqin Gao, Honglong Zhao, Zhaoqi Wang\n- **🏫 单位**：Institute of Computing Technology, Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ The University of Queensland\n- **🔗 链接**：[[中英摘要](./abs/2412.00392.md)] [[arXiv:2412.00392](https://arxiv.org/abs/2412.00392)] [Code]\n- **📝 说明**：\n\n#### [94] Gaussians on their Way: Wasserstein-Constrained 4D Gaussian Splatting with State-Space Modeling\n- **🧑‍🔬 作者**：Junli Deng, Yihao Luo\n- **🏫 单位**：Communication University of China ⟐ Imperial College London\n- **🔗 链接**：[[中英摘要](./abs/2412.00333.md)] [[arXiv:2412.00333](https://arxiv.org/abs/2412.00333)] [Code]\n- **📝 说明**：\n\n#### [95] T-3DGS: Removing Transient Objects for 3D Scene Reconstruction\n- **🧑‍🔬 作者**：Vadim Pryadilshchikov, Alexander Markin, Artem Komarichev, Ruslan Rakhimov, Peter Wonka, Evgeny Burnaev\n- **🏫 单位**：Skoltech, Russia ⟐ Robotics Center, Russia ⟐ KAUST, Saudi Arabia ⟐ AIRI, Russia\n- **🔗 链接**：[[中英摘要](./abs/2412.00155.md)] [[arXiv:2412.00155](https://arxiv.org/abs/2412.00155)] [[Code](https://github.com/Vadim200116/T-3DGS)]\n- **📝 说明**：\n\n#### [96] DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering\n- **🧑‍🔬 作者**：Yihao Wang, Marcus Klasson, Matias Turkulainen, Shuzhe Wang, Juho Kannala, Arno Solin\n- **🏫 单位**：Technical University of Munich ⟐ Aalto University ⟐ University of Oulu\n- **🔗 
链接**：[[中英摘要](./abs/2411.19756.md)] [[arXiv:2411.19756](https://arxiv.org/abs/2411.19756)] [[Code](https://github.com/AaltoML/desplat/)]\n- **📝 说明**：\n\n#### [97] Tortho-Gaussian: Splatting True Digital Orthophoto Maps\n- **🧑‍🔬 作者**：Xin Wang, Wendi Zhang, Hong Xie, Haibin Ai, Qiangqiang Yuan, Zongqian Zhan\n- **🏫 单位**：School of Geodesy and Geomatics, Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2411.19594.md)] [[arXiv:2411.19594](https://arxiv.org/abs/2411.19594)] [Code]\n- **📝 说明**：\n\n#### [98] Gaussian Splashing: Direct Volumetric Rendering Underwater\n- **🧑‍🔬 作者**：Nir Mualem, Roy Amoyal, Oren Freifeld, Derya Akkaynak\n- **🏫 单位**：Ben-Gurion University ⟐ The Inter-University Institute for Marine Sciences and the University of Haifa\n- **🔗 链接**：[[中英摘要](./abs/2411.19588.md)] [[arXiv:2411.19588](https://arxiv.org/abs/2411.19588)] [Code]\n- **📝 说明**：\n\n#### [99] GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction\n- **🧑‍🔬 作者**：Jiepeng Wang, Yuan Liu, Peng Wang, Cheng Lin, Junhui Hou, Xin Li, Taku Komura, Wenping Wang\n- **🏫 单位**：The University of Hong Kong ⟐ Hong Kong University of Science and Technology ⟐ Nanyang Technological University ⟐ City University of Hong Kong ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2411.19454.md)] [[arXiv:2411.19454](https://arxiv.org/abs/2411.19454)] [[Code](https://github.com/jiepengwang/GausSurf)]\n- **📝 说明**：\n\n#### [100] SADG: Segment Any Dynamic Gaussian Without Object Trackers\n- **🧑‍🔬 作者**：Yun-Jin Li, Mariia Gladkova, Yan Xia, Daniel Cremers\n- **🏫 单位**：Technical University of Munich ⟐ Munich Center for Machine Learning\n- **🔗 链接**：[[中英摘要](./abs/2411.19290.md)] [[arXiv:2411.19290](https://arxiv.org/abs/2411.19290)] [[Code](https://github.com/yunjinli/SADG-SegmentAnyDynamicGaussian)]\n- **📝 说明**：\n\n#### [101] SuperGaussians: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors\n- **🧑‍🔬 作者**：Rui Xu, Wenyue Chen, Jiepeng Wang, Yuan Liu, Peng Wang, Lin Gao, Shiqing Xin, Taku Komura, Xin Li, Wenping Wang\n- **🏫 单位**：The University of Hong Kong ⟐ Dalian University of Technology ⟐ Nanyang Technological University ⟐ Hong Kong University of Science and Technology ⟐ Chinese Academy of Sciences ⟐ Shandong University ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2411.18966.md)] [[arXiv:2411.18966](https://arxiv.org/abs/2411.18966)] [[Code](https://github.com/Xrvitd/SuperGaussians)]\n- **📝 说明**：\n\n#### [102] RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning\n- **🧑‍🔬 作者**：Jiacheng Wang, Zhedong Zheng, Wei Xu, Ping Liu\n- **🏫 单位**：EIC, Huazhong University of Science and Technology ⟐ FST and ICI, University of Macau ⟐ CSE, University of Nevada, Reno\n- **🔗 链接**：[[中英摘要](./abs/2411.18866.md)] [[arXiv:2411.18866](https://arxiv.org/abs/2411.18866)] [Code]\n- **📝 说明**：\n\n#### [103] GaussianSpeech: Audio-Driven Gaussian Avatars\n- **🧑‍🔬 作者**：Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner\n- **🏫 单位**：Technical University of Munich ⟐ Max Planck Institute for Intelligent Systems ⟐ Technical University of Darmstadt\n- **🔗 链接**：[[中英摘要](./abs/2411.18675.md)] [[arXiv:2411.18675](https://arxiv.org/abs/2411.18675)] [[Code](https://github.com/shivangi-aneja/GaussianSpeech)]\n- **📝 说明**：\n\n#### [104] Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Hao Liu, Minglin Chen, Yanni Ma, Haihong Xiao, Ying He\n- **🏫 单位**：Nanyang Technological University ⟐ Sun Yat-Sen University ⟐ South China University of Technology\n- 
**🔗 链接**：[[中英摘要](./abs/2411.18667.md)] [[arXiv:2411.18667](https://arxiv.org/abs/2411.18667)] [Code]\n- **📝 说明**：\n\n#### [105] PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image\n- **🧑‍🔬 作者**：Han Yan, Mingrui Zhang, Yang Li, Chao Ma, Pan Ji\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Tencent XR Vision Labs\n- **🔗 链接**：[[中英摘要](./abs/2411.18548.md)] [[arXiv:2411.18548](https://arxiv.org/abs/2411.18548)] [[Code](https://github.com/Tencent/Tencent-XR-3DGen/tree/main/character/phy_cage)]\n- **📝 说明**：\n\n#### [106] HEMGS: A Hybrid Entropy Model for 3D Gaussian Splatting Data Compression\n- **🧑‍🔬 作者**：Lei Liu, Zhenghao Chen, Dong Xu\n- **🏫 单位**：Beihang University ⟐ The University of Newcastle ⟐ The University of Hong Kong, Hong Kong SAR, China\n- **🔗 链接**：[[中英摘要](./abs/2411.18473.md)] [[arXiv:2411.18473](https://arxiv.org/abs/2411.18473)] [Code]\n- **📝 说明**：\n\n#### [107] Neural Surface Priors for Editable Gaussian Splatting\n- **🧑‍🔬 作者**：Jakub Szymkowiak, Weronika Jakubowska, Dawid Malarz, Weronika Smolak-Dyżewska, Maciej Zięba, Przemysław Musialski, Wojtek Pałubicki, Przemysław Spurek\n- **🏫 单位**：IDEAS NCBR ⟐ Adam Mickiewicz University ⟐ Wrocław University of Science and Technology ⟐ Jagiellonian University ⟐ Tooploox ⟐ New Jersey Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2411.18311.md)] [[arXiv:2411.18311](https://arxiv.org/abs/2411.18311)] [[Code](https://github.com/WJakubowska/NeuralSurfacePriors)]\n- **📝 说明**：\n\n#### [108] SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images\n- **🧑‍🔬 作者**：Yanyan Li, Yixin Fang, Federico Tombari, Gim Hee Lee\n- **🏫 单位**：National University of Singapore ⟐ Zhejiang University ⟐ Technical University of Munich ⟐ Google\n- **🔗 链接**：[[中英摘要](./abs/2411.18072.md)] [[arXiv:2411.18072](https://arxiv.org/abs/2411.18072)] [[Code](https://github.com/yanyan-li/SmileSplat)]\n- **📝 说明**：\n\n#### [109] GLS: Geometry-aware 3D Language Gaussian Splatting\n- **🧑‍🔬 作者**：Jiaxiong Qiu, Liu Liu, Zhizhong Su, Tianwei Lin\n- **🏫 单位**：Horizon Robotics, Beijing, China\n- **🔗 链接**：[[中英摘要](./abs/2411.18066.md)] [[arXiv:2411.18066](https://arxiv.org/abs/2411.18066)] [[Code](https://github.com/JiaxiongQ/GLS)]\n- **📝 说明**：\n\n#### [110] HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction\n- **🧑‍🔬 作者**：Wei Zhang, Qing Cheng, David Skuddis, Niclas Zeller, Daniel Cremers, Norbert Haala\n- **🏫 单位**：Institute for Photogrammetry and Geoinformatics, University of Stuttgart, Germany ⟐ Technical University of Munich ⟐ Karlsruhe University of Applied Sciences ⟐ Munich Center for Machine Learning\n- **🔗 链接**：[[中英摘要](./abs/2411.17982.md)] [[arXiv:2411.17982](https://arxiv.org/abs/2411.17982)] [[Code](https://github.com/Willyzw/HI-SLAM2)]\n- **📝 说明**：\n\n#### [111] DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Christian Homeyer, Leon Begiristain, Christoph Schnörr\n- **🏫 单位**：Image and Pattern Analysis Group, Heidelberg University, Germany\n- **🔗 链接**：[[中英摘要](./abs/2411.17660.md)] [[arXiv:2411.17660](https://arxiv.org/abs/2411.17660)] [[Code](https://github.com/ChenHoy/DROID-Splat)]\n- **📝 说明**：\n\n#### [112] Distractor-free Generalizable 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yanqi Bao, Jing Liao, Jing Huo, Yang Gao\n- **🏫 单位**：Nanjing University ⟐ City University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2411.17605.md)] [[arXiv:2411.17605](https://arxiv.org/abs/2411.17605)] [[Code](https://github.com/bbbbby-99/DGGS)]\n- **📝 说明**：\n\n#### [113] 4D Scaffold Gaussian Splatting for Memory 
Efficient Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Woong Oh Cho, In Cho, Seoha Kim, Jeongmin Bae, Youngjung Uh, Seon Joo Kim\n- **🏫 单位**：Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2411.17044.md)] [[arXiv:2411.17044](https://arxiv.org/abs/2411.17044)] [Code]\n- **📝 说明**：\n\n#### [114] G2SDF: Surface Reconstruction from Explicit Gaussians with Implicit SDFs\n- **🧑‍🔬 作者**：Kunyi Li, Michael Niemeyer, Zeyu Chen, Nassir Navab, Federico Tombari\n- **🏫 单位**：Technical University of Munich ⟐ Google ⟐ Tsinghua University ⟐ Johns Hopkins University\n- **🔗 链接**：[[中英摘要](./abs/2411.16898.md)] [[arXiv:2411.16898](https://arxiv.org/abs/2411.16898)] [Code]\n- **📝 说明**：\n\n#### [115] PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence\n- **🧑‍🔬 作者**：Zequn Chen, Jiezhi Yang, Heng Yang\n- **🏫 单位**：Harvard University\n- **🔗 链接**：[[中英摘要](./abs/2411.16877.md)] [[arXiv:2411.16877](https://arxiv.org/abs/2411.16877)] [[Code](https://github.com/ComputationalRobotics/PreF3R)]\n- **📝 说明**：\n\n#### [116] NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model\n- **🧑‍🔬 作者**：Jinpeng Liu, Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Ying Shan, Yansong Tang\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School, Tsinghua University ⟐ ARC Lab, Tencent PCG\n- **🔗 链接**：[[中英摘要](./abs/2411.16779.md)] [[arXiv:2411.16779](https://arxiv.org/abs/2411.16779)] [Code]\n- **📝 说明**：\n\n#### [117] Bundle Adjusted Gaussian Avatars Deblurring\n- **🧑‍🔬 作者**：Muyao Niu, Yifan Zhan, Qingtian Zhu, Zhuoxiao Li, Wei Wang, Zhihang Zhong, Xiao Sun, Yinqiang Zheng\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory ⟐ The University of Tokyo\n- **🔗 链接**：[[中英摘要](./abs/2411.16758.md)] [[arXiv:2411.16758](https://arxiv.org/abs/2411.16758)] [[Code](https://github.com/MyNiuuu/BAGA)]\n- **📝 说明**：\n\n#### [118] Quadratic Gaussian Splatting for Efficient and Detailed Surface Reconstruction\n- **🧑‍🔬 作者**：Ziyu Zhang, Binbin Huang, Hanqing Jiang, Liyang Zhou, Xiaojun Xiang, Shunhan Shen\n- **🏫 单位**：CASIA ⟐ The University of Hong Kong ⟐ SenseTime Research\n- **🔗 链接**：[[中英摘要](./abs/2411.16392.md)] [[arXiv:2411.16392](https://arxiv.org/abs/2411.16392)] [[Code](https://github.com/QuadraticGS/QGS)]\n- **📝 说明**：\n\n#### [119] Event-boosted Deformable 3D Gaussians for Fast Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Wenhao Xu, Wenming Weng, Yueyi Zhang, Ruikang Xu, Zhiwei Xiong\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2411.16180.md)] [[arXiv:2411.16180](https://arxiv.org/abs/2411.16180)] [Code]\n- **📝 说明**：\n\n#### [120] UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation\n- **🧑‍🔬 作者**：Guangzhao Dai, Jian Zhao, Yuantao Chen, Yusen Qin, Hao Zhao, Guosen Xie, Yazhou Yao, Xiangbo Shu, Xuelong Li\n- **🏫 单位**：Nanjing University of Science and Technology ⟐ Northwest Polytechnical University ⟐ The Chinese University of Hong Kong, Shenzhen ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2411.16053.md)] [[arXiv:2411.16053](https://arxiv.org/abs/2411.16053)] [Code]\n- **📝 说明**：\n\n#### [121] Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors\n- **🧑‍🔬 作者**：Soumava Paul, Prakhar Kaushik, Alan Yuille\n- **🏫 单位**：Johns Hopkins University\n- **🔗 链接**：[[中英摘要](./abs/2411.15966.md)] [[arXiv:2411.15966](https://arxiv.org/abs/2411.15966)] [Code]\n- **📝 说明**：\n\n#### [122] PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments\n- **🧑‍🔬 作者**：Haoang Li, Xiangqi Meng, 
Xingxing Zuo, Zhe Liu, Hesheng Wang, Daniel Cremers\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou) ⟐ Technical University of Munich ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2411.15800.md)] [[arXiv:2411.15800](https://arxiv.org/abs/2411.15800)] [Code]\n- **📝 说明**：\n\n#### [123] ZeroGS: Training 3D Gaussian Splatting from Unposed Images\n- **🧑‍🔬 作者**：Yu Chen, Rolandos Alexandros Potamias, Evangelos Ververas, Jifei Song, Jiankang Deng, Gim Hee Lee\n- **🏫 单位**：National University of Singapore ⟐ Imperial College London\n- **🔗 链接**：[[中英摘要](./abs/2411.15779.md)] [[arXiv:2411.15779](https://arxiv.org/abs/2411.15779)] [[Code](https://github.com/aibluefisher/ZeroGS)]\n- **📝 说明**：\n\n#### [124] DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models\n- **🧑‍🔬 作者**：Yangyang Qian, Yuan Sun, Yu Guo\n- **🏫 单位**：Xi’an Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2411.15732.md)] [[arXiv:2411.15732](https://arxiv.org/abs/2411.15732)] [Code]\n- **📝 说明**：\n\n#### [125] GSurf: 3D Reconstruction via Signed Distance Fields with Direct Gaussian Supervision\n- **🧑‍🔬 作者**：Xu Baixin, Hu Jiangbei, Li Jiaze, He Ying\n- **🏫 单位**：Nanyang Technological University ⟐ Dalian University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2411.15723.md)] [[arXiv:2411.15723](https://arxiv.org/abs/2411.15723)] [[Code](https://github.com/xubaixinxbx/Gsurf)]\n- **📝 说明**：\n\n#### [126] SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving\n- **🧑‍🔬 作者**：Su Sun, Cheng Zhao, Zhuoyang Sun, Yingjie Victor Chen, Mei Chen\n- **🏫 单位**：Purdue University ⟐ Microsoft\n- **🔗 链接**：[[中英摘要](./abs/2411.15482.md)] [[arXiv:2411.15482](https://arxiv.org/abs/2411.15482)] [Code]\n- **📝 说明**：\n\n#### [127] Gassidy: Gaussian Splatting SLAM in Dynamic Environments\n- **🧑‍🔬 作者**：Long Wen, Shixin Li, Yu Zhang, Yuhong Huang, Jianjie Lin, Fengjunjie Pan, Zhenshan Bing, Alois Knoll\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2411.15476.md)] [[arXiv:2411.15476](https://arxiv.org/abs/2411.15476)] [[Code](https://blarklee.github.io/splatsdf/)]\n- **📝 说明**：\n\n#### [128] SplatSDF: Boosting Neural Implicit SDF via Gaussian Splatting Fusion\n- **🧑‍🔬 作者**：Runfa Blark Li, Keito Suzuki, Bang Du, Ki Myung Brian Le, Nikolay Atanasov, Truong Nguyen\n- **🏫 单位**：Video Processing Lab ⟐ Existential Robotics Lab, UC San Diego\n- **🔗 链接**：[[中英摘要](./abs/2411.15468.md)] [[arXiv:2411.15468](https://arxiv.org/abs/2411.15468)] [[Code](https://github.com/BlarkLee/SplatSDF_official)]\n- **📝 说明**：\n\n#### [129] UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations\n- **🧑‍🔬 作者**：Yuan Ren, Guile Wu, Runhao Li, Zheyuan Yang, Yibo Liu, Xingxin Chen, Tongtong Cao, Bingbing Liu\n- **🏫 单位**：Huawei Noah’s Ark Lab ⟐ University of Toronto ⟐ York University\n- **🔗 链接**：[[中英摘要](./abs/2411.15355.md)] [[arXiv:2411.15355](https://arxiv.org/abs/2411.15355)] [Code]\n- **📝 说明**：\n\n#### [130] Gradient-Weighted Feature Back-Projection: A Fast Alternative to Feature Distillation in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Joji Joseph, Bharadwaj Amrutur, Shalabh Bhatnagar\n- **🏫 单位**：Indian Institute of Science\n- **🔗 链接**：[[中英摘要](./abs/2411.15193.md)] [[arXiv:2411.15193](https://arxiv.org/abs/2411.15193)] [[Code](https://jojijoseph.github.io/3dgs-backprojection/)]\n- **📝 说明**：\n\n#### [131] Neural 4D Evolution under Large Topological Changes from 2D Images\n- **🧑‍🔬 作者**：AmirHossein Naghi Razlighi, 
Tiago Novello, Asen Nachkov, Thomas Probst, Danda Paudel\n- **🏫 单位**：INSAIT, Sofia University ⟐ IMPA\n- **🔗 链接**：[[中英摘要](./abs/2411.15018.md)] [[arXiv:2411.15018](https://arxiv.org/abs/2411.15018)] [Code]\n- **📝 说明**：\n\n#### [132] Dynamics-Aware Gaussian Splatting Streaming Towards Fast On-the-Fly Training for 4D Reconstruction\n- **🧑‍🔬 作者**：Zhening Liu, Yingdong Hu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun Zhang\n- **🏫 单位**：Hong Kong University of Science and Technology ⟐ Institute of Artificial Intelligence (TeleAI), China Telecom\n- **🔗 链接**：[[中英摘要](./abs/2411.14847.md)] [[arXiv:2411.14847](https://arxiv.org/abs/2411.14847)] [[Code](https://github.com/LIUZhening111/DASS)]\n- **📝 说明**：\n\n#### [133] NexusSplats: Efficient 3D Gaussian Splatting in the Wild\n- **🧑‍🔬 作者**：Yuzhou Tang, Dejun Xu, Yongjie Hou, Zhenzhong Wang, Min Jiang\n- **🏫 单位**：Xiamen University\n- **🔗 链接**：[[中英摘要](./abs/2411.14514.md)] [[arXiv:2411.14514](https://arxiv.org/abs/2411.14514)] [[Code](https://github.com/juliantang324/NexusSplats)]\n- **📝 说明**：\n\n#### [134] FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting\n- **🧑‍🔬 作者**：Ola Shorinwa, Jiankai Sun, Mac Schwager\n- **🏫 单位**：Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2411.13753.md)] [[arXiv:2411.13753](https://arxiv.org/abs/2411.13753)] [Code]\n- **📝 说明**：\n\n#### [135] Mini-Splatting2: Building 360 Scenes within Minutes via Aggressive Gaussian Densification\n- **🧑‍🔬 作者**：Guangchi Fang, Bing Wang\n- **🏫 单位**：The Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](./abs/2411.12788.md)] [[arXiv:2411.12788](https://arxiv.org/abs/2411.12788)] [Code]\n- **📝 说明**：\n\n#### [136] PR-ENDO: Physically Based Relightable Gaussian Splatting for Endoscopy\n- **🧑‍🔬 作者**：Joanna Kaleta, Weronika Smolak-Dyżewska, Dawid Malarz, Diego Dall'Alba, Przemysław Korzeniowski, Przemysław Spurek\n- **🏫 单位**：Warsaw University of Technology ⟐ Sano Centre for Computational Medicine ⟐ Jagiellonian University ⟐ University of Verona\n- **🔗 链接**：[[中英摘要](./abs/2411.12510.md)] [[arXiv:2411.12510](https://arxiv.org/abs/2411.12510)] [Code]\n- **📝 说明**：\n\n#### [137] SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image\n- **🧑‍🔬 作者**：Zixu Wang, Hao Yang, Yu Guo, Fei Wang\n- **🏫 单位**：National Key Laboratory of Human-Machine Hybrid Augmented Intelligence ⟐ National Engineering Research Center for Visual Information and Applications ⟐ Xi’an Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2411.12471.md)] [[arXiv:2411.12471](https://arxiv.org/abs/2411.12471)] [Code]\n- **📝 说明**：\n\n#### [138] GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving\n- **🧑‍🔬 作者**：Shaoqing Xu, Fang Li, Shengyin Jiang, Ziying Song, Li Liu, Zhi-xin Yang\n- **🏫 单位**：University of Macau ⟐ Beijing Institute of Technology ⟐ Beijing University of Posts and Telecommunications ⟐ Beijing Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2411.12452.md)] [[arXiv:2411.12452](https://arxiv.org/abs/2411.12452)] [Code]\n- **📝 说明**：\n\n#### [139] Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels\n- **🧑‍🔬 作者**：Haodong Chen, Runnan Chen, Qiang Qu, Zhaoqing Wang, Tongliang Liu, Xiaoming Chen, Yuk Ying Chung\n- **🏫 单位**：University of Sydney ⟐ Beijing Technology and Business University\n- **🔗 链接**：[[中英摘要](./abs/2411.12440.md)] [[arXiv:2411.12440](https://arxiv.org/abs/2411.12440)] [Code]\n- **📝 说明**：\n\n#### [140] LiV-GS: LiDAR-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments\n- **🧑‍🔬 作者**：Renxiang Xiao, Wei Liu, 
Yushuai Chen, Liang Hu\n- **🏫 单位**：Harbin Institute of Technology, Shenzhen\n- **🔗 链接**：[[中英摘要](./abs/2411.12185.md)] [[arXiv:2411.12185](https://arxiv.org/abs/2411.12185)] [Code]\n- **📝 说明**：\n\n#### [141] Sketch-guided Cage-based 3D Gaussian Splatting Deformation\n- **🧑‍🔬 作者**：Tianhao Xie, Noam Aigerman, Eugene Belilovsky, Tiberiu Popa\n- **🏫 单位**：Concordia University ⟐ Université de Montréal ⟐ MILA\n- **🔗 链接**：[[中英摘要](./abs/2411.12168.md)] [[arXiv:2411.12168](https://arxiv.org/abs/2411.12168)] [Code]\n- **📝 说明**：\n\n#### [142] RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator\n- **🧑‍🔬 作者**：Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang\n- **🏫 单位**：Harbin Institute of Technology, Shenzhen ⟐ Chinese Academy of Sciences ⟐ MEGVII Technology ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2411.11839.md)] [[arXiv:2411.11839](https://arxiv.org/abs/2411.11839)] [Code]\n- **📝 说明**：\n\n#### [143] VeGaS: Video Gaussian Splatting\n- **🧑‍🔬 作者**：Weronika Smolak-Dyżewska, Dawid Malarz, Kornel Howil, Jan Kaczmarczyk, Marcin Mazur, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University\n- **🔗 链接**：[[中英摘要](./abs/2411.11024.md)] [[arXiv:2411.11024](https://arxiv.org/abs/2411.11024)] [[Code](https://github.com/gmum/VeGaS)]\n- **📝 说明**：\n\n#### [144] DGS-SLAM: Gaussian Splatting SLAM in Dynamic Environment\n- **🧑‍🔬 作者**：Mangyu Kong, Jaewon Lee, Seongwon Lee, Euntai Kim\n- **🏫 单位**：Yonsei University ⟐ Kookmin University\n- **🔗 链接**：[[中英摘要](./abs/2411.10722.md)] [[arXiv:2411.10722](https://arxiv.org/abs/2411.10722)] [Code]\n- **📝 说明**：\n\n#### [145] Efficient Density Control for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaobin Deng, Changyu Diao, Min Li, Ruohan Yu, Duanqing Xu\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2411.10133.md)] [[arXiv:2411.10133](https://arxiv.org/abs/2411.10133)] [[Code](https://github.com/XiaoBin2001/EDC)]\n- **📝 说明**：\n\n#### [146] DyGASR: Dynamic Generalized Exponential Splatting with Surface Alignment for Accelerated 3D Mesh Reconstruction\n- **🧑‍🔬 作者**：Shengchao Zhao, Yundong Li\n- **🏫 单位**：School of Information Science and Technology, North China University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2411.09156.md)] [[arXiv:2411.09156](https://arxiv.org/abs/2411.09156)] [Code]\n- **📝 说明**：\n\n#### [147] BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis\n- **🧑‍🔬 作者**：David Svitov, Pietro Morerio, Lourdes Agapito, Alessio Del Bue\n- **🏫 单位**：Università degli Studi di Genova, Genoa, Italy ⟐ Istituto Italiano di Tecnologia (IIT), Genoa, Italy ⟐ Department of Computer Science, University College London\n- **🔗 链接**：[[中英摘要](./abs/2411.08508.md)] [[arXiv:2411.08508](https://arxiv.org/abs/2411.08508)] [[Code](https://github.com/david-svitov/BBSplat)]\n- **📝 说明**：\n\n#### [148] Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation\n- **🧑‍🔬 作者**：Han Qi, Tao Cai, Xiyue Han\n- **🏫 单位**：Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2411.07579.md)] [[arXiv:2411.07579](https://arxiv.org/abs/2411.07579)] [Code]\n- **📝 说明**：\n\n#### [149] GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering\n- **🧑‍🔬 作者**：Zhihao Liang, Hongdong Li, Kui Jia, Kailing Guo, Qi Zhang\n- **🏫 单位**：School of Electronic and Information Engineering, South China University of Technology ⟐ School of Data Science, The Chinese University of Hong Kong ⟐ VIVO\n- **🔗 链接**：[[中英摘要](./abs/2411.07478.md)] [[arXiv:2411.07478](https://arxiv.org/abs/2411.07478)] 
[Code]\n- **📝 说明**：\n\n#### [150] A Hierarchical Compression Technique for 3D Gaussian Splatting Compression\n- **🧑‍🔬 作者**：He Huang, Wenjie Huang, Qi Yang, Yiling Xu, Zhu Li\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Missouri-Kansas City\n- **🔗 链接**：[[中英摘要](./abs/2411.06976.md)] [[arXiv:2411.06976](https://arxiv.org/abs/2411.06976)] [Code]\n- **📝 说明**：\n\n#### [151] GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting\n- **🧑‍🔬 作者**：Jilan Mei, Junbo Li, Cai Meng\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2411.03807.md)] [[arXiv:2411.03807](https://arxiv.org/abs/2411.03807)] [Code]\n- **📝 说明**：\n\n#### [152] 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement\n- **🧑‍🔬 作者**：Ziqi Lu, Jianbo Ye, John Leonard\n- **🏫 单位**：Massachusetts Institute of Technology ⟐ Amazon\n- **🔗 链接**：[[中英摘要](./abs/2411.03706.md)] [[arXiv:2411.03706](https://arxiv.org/abs/2411.03706)] [Code]\n- **📝 说明**：\n\n#### [153] HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features\n- **🧑‍🔬 作者**：Arnab Dey, Cheng-You Lu, Andrew I. Comport, Srinath Sridhar, Chin-Teng Lin, Jean Martinet\n- **🏫 单位**：Université Côte d’Azur ⟐ University of Technology Sydney ⟐ Brown University\n- **🔗 链接**：[[中英摘要](./abs/2411.03086.md)] [[arXiv:2411.03086](https://arxiv.org/abs/2411.03086)] [Code]\n- **📝 说明**：\n\n#### [154] LVI-GS: Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Huibin Zhao, Weipeng Guan, Peng Lu\n- **🏫 单位**：The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2411.02703.md)] [[arXiv:2411.02703](https://arxiv.org/abs/2411.02703)] [Code]\n- **📝 说明**：\n\n#### [155] Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting\n- **🧑‍🔬 作者**：Joey Wilson, Marcelino Almeida, Min Sun, Sachit Mahajan, Maani Ghaffari, Parker Ewen, Omid Ghasemalizadeh, Cheng-Hao Kuo, Arnie Sen\n- **🏫 单位**：University of Michigan ⟐ Amazon Lab 126\n- **🔗 链接**：[[中英摘要](./abs/2411.02547.md)] [[arXiv:2411.02547](https://arxiv.org/abs/2411.02547)] [Code]\n- **📝 说明**：\n\n#### [156] Real-Time Spatio-Temporal Reconstruction of Dynamic Endoscopic Scenes with 4D Gaussian Splatting\n- **🧑‍🔬 作者**：Fengze Li, Jishuai He, Jieming Ma, Zhijing Wu\n- **🏫 单位**：University of Liverpool ⟐ Xi’an Jiaotong-Liverpool University, Suzhou ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2411.01218.md)] [[arXiv:2411.01218](https://arxiv.org/abs/2411.01218)] [Code]\n- **📝 说明**：\n\n#### [157] Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes\n- **🧑‍🔬 作者**：Shaohua Liu, Junzhe Lu, Zuoya Gu, Jiajun Li, Yue Deng\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2411.00239.md)] [[arXiv:2411.00239](https://arxiv.org/abs/2411.00239)] [[Code](https://github.com/deng-ai-lab/AquaticGS)]\n- **📝 说明**：\n\n#### [158] FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives\n- **🧑‍🔬 作者**：Qizhi Chen, Delin Qu, Yiwen Tang, Haoming Song, Yiting Zhang, Dong Wang, Bin Zhao, Xuelong Li\n- **🏫 单位**：Zhejiang University ⟐ Shanghai AI Laboratory ⟐ Fudan University ⟐ Northwestern Polytechnical University\n- **🔗 链接**：[[中英摘要](./abs/2410.22070.md)] [[arXiv:2410.22070](https://arxiv.org/abs/2410.22070)] [Code]\n- **📝 说明**：\n\n#### [159] LoDAvatar: Hierarchical Embedding and Adaptive Levels of Detail with Gaussian Splatting for Enhanced Human Avatars\n- **🧑‍🔬 作者**：Xiaonuo Dongye, Hanzhi Guo, Le Luo, Haiyan Jiang, Yihua Bao, Zeyu Tian, Dongdong Weng\n- **🏫 单位**：Beijing Institute of Technology ⟐ Peng Cheng 
Laboratory ⟐ Beijing Institute of Technology Zhengzhou Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2410.20789.md)] [[arXiv:2410.20789](https://arxiv.org/abs/2410.20789)] [Code]\n- **📝 说明**：\n\n#### [160] CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians\n- **🧑‍🔬 作者**：Chongjian Ge, Chenfeng Xu, Yuanfeng Ji, Chensheng Peng, Masayoshi Tomizuka, Ping Luo, Mingyu Ding, Varun Jampani, Wei Zhan\n- **🏫 单位**：The University of Hong Kong ⟐ University of California, Berkeley ⟐ UNC-Chapel Hill ⟐ Stability AI\n- **🔗 链接**：[[中英摘要](./abs/2410.20723.md)] [[arXiv:2410.20723](https://arxiv.org/abs/2410.20723)] [Code]\n- **📝 说明**：\n\n#### [161] PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views\n- **🧑‍🔬 作者**：Xin Fei, Wenzhao Zheng, Yueqi Duan, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Jiwen Lu\n- **🏫 单位**：Tsinghua University ⟐ University of California, Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2410.18979.md)] [[arXiv:2410.18979](https://arxiv.org/abs/2410.18979)] [[Code](https://github.com/Barrybarry-Smith/PixelGaussian)]\n- **📝 说明**：\n\n#### [162] PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yu Wang, Xiaobao Wei, Ming Lu, Guoliang Kang\n- **🏫 单位**：Beihang University ⟐ Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2410.17505.md)] [[arXiv:2410.17505](https://arxiv.org/abs/2410.17505)] [Code]\n- **📝 说明**：\n\n#### [163] GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting\n- **🧑‍🔬 作者**：Yusen Xie, Zhenmin Huang, Jin Wu, Jun Ma\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ Hunan University ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2410.17084.md)] [[arXiv:2410.17084](https://arxiv.org/abs/2410.17084)] [[Code](https://github.com/xieyuser/GS-LIVM)]\n- **📝 说明**：\n\n#### [164] Multi-Layer Gaussian Splatting for Immersive Anatomy Visualization\n- **🧑‍🔬 作者**：Constantin Kleinbeck, Hannah Schieber, Klaus Engel, Ralf Gutjahr, Daniel Roth\n- **🏫 单位**：Siemens Healthineers AG ⟐ Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2410.16978.md)] [[arXiv:2410.16978](https://arxiv.org/abs/2410.16978)] [Code]\n- **📝 说明**：\n\n#### [165] MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors\n- **🧑‍🔬 作者**：Honghua Chen, Yushi Lan, Yongwei Chen, Yifan Zhou, Xingang Pan\n- **🏫 单位**：Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2410.16272.md)] [[arXiv:2410.16272](https://arxiv.org/abs/2410.16272)] [[Code](https://github.com/chenhonghua/MvDrag3D)]\n- **📝 说明**：\n\n#### [166] LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images\n- **🧑‍🔬 作者**：Hao He, Yixun Liang, Luozhou Wang, Yuanhao Cai, Xinli Xu, Hao-Xiang Guo, Xiang Wen, Yingcong Chen\n- **🏫 单位**：HKUST(GZ) ⟐ HKUST ⟐ Johns Hopkins University ⟐ SkyWork AI\n- **🔗 链接**：[[中英摘要](./abs/2410.15636.md)] [[arXiv:2410.15636](https://arxiv.org/abs/2410.15636)] [[Code](https://github.com/EnVision-Research/LucidFusion)]\n- **📝 说明**：\n\n#### [167] EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Bohao Liao, Wei Zhai, Zengyu Wan, Tianzhu Zhang, Yang Cao, Zheng-Jun Zha\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2410.15392.md)] [[arXiv:2410.15392](https://arxiv.org/abs/2410.15392)] [Code]\n- **📝 说明**：\n\n#### [168] GlossyGS: Inverse Rendering of Glossy Objects with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Shuichang Lai, Letian Huang, 
Jie Guo, Kai Cheng, Bowen Pan, Xiaoxiao Long, Jiangjing Lyu, Chengfei Lv, Yanwen Guo\n- **🏫 单位**：Alibaba Group ⟐ Nanjing University ⟐ University of Science and Technology of China ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2410.13349.md)] [[arXiv:2410.13349](https://arxiv.org/abs/2410.13349)] [Code]\n- **📝 说明**：\n\n#### [169] UniG: Modelling Unitary 3D Gaussians for View-consistent 3D Reconstruction\n- **🧑‍🔬 作者**：Jiamin Wu, Kenkun Liu, Yukai Shi, Xiaoke Jiang, Yuan Yao, Lei Zhang\n- **🏫 单位**：Hong Kong University of Science and Technology ⟐ IDEA ⟐ The Chinese University of Hong Kong, Shenzhen ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2410.13195.md)] [[arXiv:2410.13195](https://arxiv.org/abs/2410.13195)] [[Code](https://github.com/jwubz123/UNIG)]\n- **📝 说明**：\n\n#### [170] SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection\n- **🧑‍🔬 作者**：Yizhe Liu, Yan Song Hu, Yuhao Chen, John Zelek\n- **🏫 单位**：University of Waterloo\n- **🔗 链接**：[[中英摘要](./abs/2410.12080.md)] [[arXiv:2410.12080](https://arxiv.org/abs/2410.12080)] [Code]\n- **📝 说明**：\n\n#### [171] LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images\n- **🧑‍🔬 作者**：Yuzhou Cheng, Jianhao Jiao, Yue Wang, Dimitrios Kanoulas\n- **🏫 单位**：University College London ⟐ Zhejiang University ⟐ University College London\n- **🔗 链接**：[[中英摘要](./abs/2410.11505.md)] [[arXiv:2410.11505](https://arxiv.org/abs/2410.11505)] [Code]\n- **📝 说明**：\n\n#### [172] 4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting\n- **🧑‍🔬 作者**：Wanlin Liang, Hongbin Xu, Weitao Chen, Feng Xiao, Wenxiong Kang\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2410.10412.md)] [[arXiv:2410.10412](https://arxiv.org/abs/2410.10412)] [Code]\n- **📝 说明**：\n\n#### [173] Gaussian Splatting Visual MPC for Granular Media Manipulation\n- **🧑‍🔬 作者**：Wei-Cheng Tseng, Ellina Zhang, Krishna Murthy Jatavallabhula, Florian Shkurti\n- **🏫 单位**：University of Toronto ⟐ Vector Institute ⟐ University College London ⟐ MIT CSAIL\n- **🔗 链接**：[[中英摘要](./abs/2410.09740.md)] [[arXiv:2410.09740](https://arxiv.org/abs/2410.09740)] [Code]\n- **📝 说明**：\n\n#### [174] Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors\n- **🧑‍🔬 作者**：Hritam Basak, Hadi Tabatabaee, Shreekant Gayaka, Ming-Feng Li, Xin Yang, Cheng-Hao Kuo, Arnie Sen, Min Sun, Zhaozheng Yin\n- **🏫 单位**：Amazon Lab126 ⟐ Carnegie Mellon University ⟐ University College London ⟐ Stony Brook University\n- **🔗 链接**：[[中英摘要](./abs/2410.09467.md)] [[arXiv:2410.09467](https://arxiv.org/abs/2410.09467)] [Code]\n- **📝 说明**：\n\n#### [175] SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction\n- **🧑‍🔬 作者**：Jialei Chen, Xin Zhang, Mobarakol Islam, Francisco Vasconcelos, Danail Stoyanov, Daniel S. 
Elson, Baoru Huang\n- **🏫 单位**：Imperial College London ⟐ The Hong Kong University of Science and Technology ⟐ University College London ⟐ University of Liverpool\n- **🔗 链接**：[[中英摘要](./abs/2410.09292.md)] [[arXiv:2410.09292](https://arxiv.org/abs/2410.09292)] [Code]\n- **📝 说明**：\n\n#### [176] FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction\n- **🧑‍🔬 作者**：Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang\n- **🏫 单位**：New York University ⟐ Carnegie Mellon University ⟐ University of Illinois, Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2410.08282.md)] [[arXiv:2410.08282](https://arxiv.org/abs/2410.08282)] [[Code](https://github.com/ai4ce/FusionSense)]\n- **📝 说明**：\n\n#### [177] RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image\n- **🧑‍🔬 作者**：Xiaoxue Chen, Jv Zheng, Hao Huang, Haoran Xu, Weihao Gu, Kangliang Chen, He xiang, Huan-ang Gao, Hao Zhao, Guyue Zhou, Yaqin Zhang\n- **🏫 单位**：Tsinghua University ⟐ Haomo.ai ⟐ Beijing Jiaotong University ⟐ Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2410.08181.md)] [[arXiv:2410.08181](https://arxiv.org/abs/2410.08181)] [Code]\n- **📝 说明**：\n\n#### [178] Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting\n- **🧑‍🔬 作者**：Weixing Zhang, Zongrui Li, De Ma, Huajin Tang, Xudong Jiang, Qian Zheng, Gang Pan\n- **🏫 单位**：Zhejiang University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2410.07266.md)] [[arXiv:2410.07266](https://arxiv.org/abs/2410.07266)] [[Code](https://github.com/shippoT/Spiking_GS)]\n- **📝 说明**：\n\n#### [179] ES-Gaussian: Gaussian Splatting Mapping via Error Space-Based Gaussian Completion\n- **🧑‍🔬 作者**：Lu Chen, Yingfu Zeng, Haoang Li, Zhitao Deng, Jiafu Yan, Zhenjun Zhao\n- **🏫 单位**：Dreame Technology, Shenzhen ⟐ Hong Kong University of Science and Technology ⟐ Harbin Institute of Technology ⟐ Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2410.06613.md)] [[arXiv:2410.06613](https://arxiv.org/abs/2410.06613)] [[Code](https://github.com/Dreame-Simulation-Group/ESGaussian)]\n- **📝 说明**：This paper has been withdrawn\n\n#### [180] HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction\n- **🧑‍🔬 作者**：Shengji Tang, Weicai Ye, Peng Ye, Weihao Lin, Yang Zhou, Tao Chen, Wanli Ouyang\n- **🏫 单位**：Fudan University ⟐ Shanghai AI Laboratory ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2410.06245.md)] [[arXiv:2410.06245](https://arxiv.org/abs/2410.06245)] [[Code](https://github.com/Open3DVLab/HiSplat)]\n- **📝 说明**：\n\n#### [181] GSLoc: Visual Localization with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kazii Botashev, Vladislav Pyatov, Gonzalo Ferrer, Stamatios Lefkimmiatis\n- **🏫 单位**：Skolkovo Institute of Science and Technology ⟐ MTS AI, Russia\n- **🔗 链接**：[[中英摘要](./abs/2410.06165.md)] [[arXiv:2410.06165](https://arxiv.org/abs/2410.06165)] [Code]\n- **📝 说明**：\n\n#### [182] SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting\n- **🧑‍🔬 作者**：Xinyi Liu, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi\n- **🏫 单位**：Carnegie Mellon University ⟐ University of North Carolina-Chapel Hill\n- **🔗 链接**：[[中英摘要](./abs/2410.06014.md)] [[arXiv:2410.06014](https://arxiv.org/abs/2410.06014)] [Code]\n- **📝 说明**：\n\n#### [183] GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting\n- **🧑‍🔬 作者**：Yukang Cao, Masoud Hadi, Liang Pan, Ziwei Liu\n- **🏫 单位**：Nanyang 
Technological University ⟐ Shanghai AI Laboratory ⟐ Isfahan University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2410.05259.md)] [[arXiv:2410.05259](https://arxiv.org/abs/2410.05259)] [[Code](https://github.com/yukangcao/GS-VTON)]\n- **📝 说明**：\n\n#### [184] LiDAR-GS: Real-time LiDAR Re-Simulation using Gaussian Splatting\n- **🧑‍🔬 作者**：Qifeng Chen, Sheng Yang, Sicong Du, Tao Tang, Peng Chen, Yuchi Huo\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2410.05111.md)] [[arXiv:2410.05111](https://arxiv.org/abs/2410.05111)] [[Code](https://github.com/cqf7419/LiDAR-GS)]\n- **📝 说明**：\n\n#### [185] PhotoReg: Photometrically Registering 3D Gaussian Splatting Models\n- **🧑‍🔬 作者**：Ziwen Yuan, Tianyi Zhang, Matthew Johnson-Roberson, Weiming Zhi\n- **🏫 单位**：Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2410.05044.md)] [[arXiv:2410.05044](https://arxiv.org/abs/2410.05044)] [[Code](https://github.com/ziweny11/PhotoRegCodes)]\n- **📝 说明**：\n\n#### [186] Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering\n- **🧑‍🔬 作者**：Yonghan Lee, Jaehoon Choi, Dongki Jung, Jaeseong Yun, Soohyun Ryu, Dinesh Manocha, Suyong Yeon\n- **🏫 单位**：University of Maryland ⟐ NAVER LABS\n- **🔗 链接**：[[中英摘要](./abs/2410.04646.md)] [[arXiv:2410.04646](https://arxiv.org/abs/2410.04646)] [Code]\n- **📝 说明**：\n\n#### [187] StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting\n- **🧑‍🔬 作者**：Xiao Cui, Weicai Ye, Yifan Wang, Guofeng Zhang, Wengang Zhou, Tong He, Houqiang Li\n- **🏫 单位**：Department of Electrical Engineering and Information Science, University of Science and Technology of China ⟐ State Key Laboratory of CAD and CG, Zhejiang University ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2410.04354.md)] [[arXiv:2410.04354](https://arxiv.org/abs/2410.04354)] [Code]\n- **📝 说明**：\n\n#### [188] Variational Bayes Gaussian Splatting\n- **🧑‍🔬 作者**：Toon Van de Maele, Ozan Catal, Alexander Tschantz, Christopher L. 
Buckley, Tim Verbelen\n- **🏫 单位**：VERSES AI Research Lab ⟐ School of Engineering and Informatics, University of Sussex, Brighton, UK\n- **🔗 链接**：[[中英摘要](./abs/2410.03592.md)] [[arXiv:2410.03592](https://arxiv.org/abs/2410.03592)] [Code]\n- **📝 说明**：\n\n#### [189] SuperGS: Super-Resolution 3D Gaussian Splatting via Latent Feature Field and Gradient-guided Splitting\n- **🧑‍🔬 作者**：Shiyun Xie, Zhiru Wang, Yinghao Zhu, Chengwei Pan\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2410.02571.md)] [[arXiv:2410.02571](https://arxiv.org/abs/2410.02571)] [[Code](https://github.com/SYXieee/SuperGS)]\n- **📝 说明**：\n\n#### [190] MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis\n- **🧑‍🔬 作者**：Xiaobiao Du, Yida Wang, Xin Yu\n- **🏫 单位**：The University of Technology Sydney ⟐ The University of Queensland ⟐ Liauto Inc.\n- **🔗 链接**：[[中英摘要](./abs/2410.02103.md)] [[arXiv:2410.02103](https://arxiv.org/abs/2410.02103)] [[Code](https://github.com/xiaobiaodu/MVGS)]\n- **📝 说明**：\n\n#### [191] EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis\n- **🧑‍🔬 作者**：Alexander Mai, Peter Hedman, George Kopanas, Dor Verbin, David Futschik, Qiangeng Xu, Falko Kuester, Jon Barron, Yinda Zhang\n- **🏫 单位**：University of California, San Diego ⟐ Google\n- **🔗 链接**：[[中英摘要](./abs/2410.01804.md)] [[arXiv:2410.01804](https://arxiv.org/abs/2410.01804)] [[Code](https://half-potato.gitlab.io/posts/ever/)]\n- **📝 说明**：\n\n#### [192] 3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection\n- **🧑‍🔬 作者**：Yang Cao, Yuanliang Jv, Dan Xu\n- **🏫 单位**：The Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2410.01647.md)] [[arXiv:2410.01647](https://arxiv.org/abs/2410.01647)] [[Code](https://github.com/yangcaoai/3DGS-DET)]\n- **📝 说明**：\n\n#### [193] GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians\n- **🧑‍🔬 作者**：Shuyi Jiang, Qihao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao\n- **🏫 单位**：Singapore University of Technology and Design ⟐ Microsoft Research Asia ⟐ Lancaster University\n- **🔗 链接**：[[中英摘要](./abs/2410.01535.md)] [[arXiv:2410.01535](https://arxiv.org/abs/2410.01535)] [Code]\n- **📝 说明**：\n\n#### [194] MiraGe: Editable 2D Images using Gaussian Splatting\n- **🧑‍🔬 作者**：Joanna Waczyńska, Tomasz Szczepanik, Piotr Borycki, Sławomir Tadeja, Thomas Bohné, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2410.01521.md)] [[arXiv:2410.01521](https://arxiv.org/abs/2410.01521)] [Code]\n- **📝 说明**：\n\n#### [195] EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings\n- **🧑‍🔬 作者**：Yingdong Hu, Zhening Liu, Jiawei Shao, Zehong Lin, Jun Zhang\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ Institute of Artificial Intelligence (TeleAI), China Telecom\n- **🔗 链接**：[[中英摘要](./abs/2410.01425.md)] [[arXiv:2410.01425](https://arxiv.org/abs/2410.01425)] [[Code](https://zhenliuzju.github.io/huyingdong/EVA-Gaussian)]\n- **📝 说明**：\n\n#### [196] GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving\n- **🧑‍🔬 作者**：Zhangshuo Qi, Junyi Ma, Jingyi Xu, Zijie Zhou, Luqi Cheng, Guangming Xiong\n- **🏫 单位**：Beijing Institute of Technology ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2410.00299.md)] [[arXiv:2410.00299](https://arxiv.org/abs/2410.00299)] [[Code](https://github.com/QiZS-BIT/GSPR)]\n- **📝 说明**：\n"
  },
  {
    "path": "archive/202504.md",
    "content": "# 3D Gaussian Splatting Papers Before 2025/04/01\n\n#### [1] StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Shakiba Kheradmand, Delio Vicini, George Kopanas, Dmitry Lagun, Kwang Moo Yi, Mark Matthews, Andrea Tagliasacchi\n- **🏫 单位**：Google DeepMind ⟐ University of British Columbia ⟐ Google ⟐ Runway ML ⟐ Simon Fraser University ⟐ University of Toronto\n- **🔗 链接**：[[中英摘要](./abs/2503.24366.md)] [[arXiv:2503.24366](https://arxiv.org/abs/2503.24366)] [Code]\n- **📝 说明**：\n\n#### [2] Visual Acoustic Fields\n- **🧑‍🔬 作者**：Yuelei Li, Hyunjin Kim, Fangneng Zhan, Ri-Zhao Qiu, Mazeyu Ji, Xiaojun Shan, Xueyan Zou, Paul Liang, Hanspeter Pfister, Xiaolong Wang\n- **🏫 单位**：UC San Diego ⟐ Harvard University ⟐ MIT\n- **🔗 链接**：[[中英摘要](./abs/2503.24270.md)] [[arXiv:2503.24270](https://arxiv.org/abs/2503.24270)] [Code]\n- **📝 说明**：\n\n#### [3] Learning 3D-Gaussian Simulators from RGB Videos\n- **🧑‍🔬 作者**：Mikel Zhobro, Andreas René Geist, Georg Martius\n- **🏫 单位**：University of T¨ubingen ⟐ Max Planck Institute for Intelligent Systems, T¨ ubingen\n- **🔗 链接**：[[中英摘要](./abs/2503.24009.md)] [[arXiv:2503.24009](https://arxiv.org/abs/2503.24009)] [Code]\n- **📝 说明**：\n\n#### [4] NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations\n- **🧑‍🔬 作者**：Zhenyu Tang, Chaoran Feng, Xinhua Cheng, Wangbo Yu, Junwu Zhang, Yuan Liu, Xiaoxiao Long, Wenping Wang, Li Yuan\n- **🏫 单位**：Peking University ⟐ Hong Kong University of Science and Technology ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2503.23162.md)] [[arXiv:2503.23162](https://arxiv.org/abs/2503.23162)] [[Code](https://github.com/PKU-YuanGroup/NeuralGS)]\n- **📝 说明**：\n\n#### [5] CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction\n- **🧑‍🔬 作者**：Yuanyuan Gao, Hao Li, Jiaqi Chen, Zhengyu Zou, Zhihang Zhong, Dingwen Zhang, Xiao Sun, Junwei Han\n- **🏫 单位**：Northwestern Polytechnical University ⟐ Shanghai Artificial Intelligence Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2503.23044.md)] [[arXiv:2503.23044](https://arxiv.org/abs/2503.23044)] [[Code](https://github.com/gyy456/CityGS-X)]\n- **📝 说明**：\n\n#### [6] FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction\n- **🧑‍🔬 作者**：Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee\n- **🏫 单位**：CVRP Lab, National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2503.22986.md)] [[arXiv:2503.22986](https://arxiv.org/abs/2503.22986)] [[Code](https://github.com/wangys16/FreeSplatPP)]\n- **📝 说明**：\n\n#### [7] TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Boyang (Tony)Yu, Yanlin Jin, Ashok Veeraraghavan, Akshat Dave, Guha Balakrishnan\n- **🏫 单位**：Rice University ⟐ Massachusetts Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.22676.md)] [[arXiv:2503.22676](https://arxiv.org/abs/2503.22676)] [Code]\n- **📝 说明**：\n\n#### [8] Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis\n- **🧑‍🔬 作者**：Shuai Shen, Wanhua Li, Yunpeng Zhang, Weipeng Hu, Yap-Peng Tan\n- **🏫 单位**：Nanyang Technological University ⟐ Harvard University ⟐ Dept of Computer Science, University College London, UK ⟐ PhiGent Robotics\n- **🔗 链接**：[[中英摘要](./abs/2503.22605.md)] [[arXiv:2503.22605](https://arxiv.org/abs/2503.22605)] [[Code](https://sstzal.github.io/Audio-Plane/)]\n- **📝 说明**：\n\n#### [9] EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and 
Gaussian Splatting\n- **🧑‍🔬 作者**：Xu Wang, Shuai Zhang, Baoru Huang, Danail Stoyanov, Evangelos B. Mazomenos\n- **🏫 单位**：UCL Hawkes Institute, University College London, UK ⟐ Dept of Medical Physics & Biomedical Engineering, University College London, UK ⟐ Dept of Computer Science, University College London, UK ⟐ Dept of Computer Science, University of Liverpool, UK\n- **🔗 链接**：[[中英摘要](./abs/2503.22437.md)] [[arXiv:2503.22437](https://arxiv.org/abs/2503.22437)] [Code]\n- **📝 说明**：\n\n#### [10] AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation\n- **🧑‍🔬 作者**：Chenyang Xu, XingGuo Deng, Rui Zhong\n- **🏫 单位**：Fuzhou University ⟐ Central China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2503.22324.md)] [[arXiv:2503.22324](https://arxiv.org/abs/2503.22324)] [Code]\n- **📝 说明**：\n\n#### [11] Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance\n- **🧑‍🔬 作者**：Wenjie Liu, Zhongliang Liu, Xiaoyan Yang, Man Sha, Yang Li\n- **🏫 单位**：School of Computer Science and Technology, East China Normal University, Shanghai, China ⟐ School of Software Engineering, East China Normal University, Shanghai, China ⟐ Shanghai Chinafortune Co Ltd, Shanghai, China\n- **🔗 链接**：[[中英摘要](./abs/2503.22225.md)] [[arXiv:2503.22225](https://arxiv.org/abs/2503.22225)] [Code]\n- **📝 说明**：\n\n#### [12] Disentangled 4D Gaussian Splatting: Towards Faster and More Efficient Dynamic Scene Rendering\n- **🧑‍🔬 作者**：Hao Feng, Hao Sun, Wei Xie\n- **🏫 单位**：Central China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2503.22159.md)] [[arXiv:2503.22159](https://arxiv.org/abs/2503.22159)] [Code]\n- **📝 说明**：\n\n#### [13] Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying\n- **🧑‍🔬 作者**：Hairong Yin, Huangying Zhan, Yi Xu, Raymond A. 
Yeh\n- **🏫 单位**：Department of Computer Science, Purdue University ⟐ Goertek Alpha Labs\n- **🔗 链接**：[[中英摘要](./abs/2503.21767.md)] [[arXiv:2503.21767](https://arxiv.org/abs/2503.21767)] [Code]\n- **📝 说明**：\n\n#### [14] STAMICS: Splat, Track And Map with Integrated Consistency and Semantics for Dense RGB-D SLAM\n- **🧑‍🔬 作者**：Yongxu Wang, Xu Cao, Weiyun Yi, Zhaoxin Fan\n- **🏫 单位**：University of Science and Technology of China ⟐ University of Science and Technology Liaoning ⟐ Beijing Institute of Technology ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2503.21425.md)] [[arXiv:2503.21425](https://arxiv.org/abs/2503.21425)] [Code]\n- **📝 说明**：\n\n#### [15] Frequency-Aware Gaussian Splatting Decomposition\n- **🧑‍🔬 作者**：Yishai Lavi, Leo Segre, Shai Avidan\n- **🏫 单位**：Tel Aviv University\n- **🔗 链接**：[[中英摘要](./abs/2503.21226.md)] [[arXiv:2503.21226](https://arxiv.org/abs/2503.21226)] [[Code](https://github.com/yishailavi/nerfstudio_lap/tree/main)]\n- **📝 说明**：\n\n#### [16] High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Qian Wang, Zhihao Zhan, Jialei He, Zhituo Tu, Xiang Zhu, Jie Yuan\n- **🏫 单位**：Nanjing University ⟐ TopXGun Robotics\n- **🔗 链接**：[[中英摘要](./abs/2503.19703.md)] [[arXiv:2503.19703](https://arxiv.org/abs/2503.19703)] [Code]\n- **📝 说明**：\n\n#### [17] SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors\n- **🧑‍🔬 作者**：Yiqing Li, Xuan Wang, Jiawei Wu, Yikun Ma, Zhi Jin\n- **🏫 单位**：Sun Yat-sen University ⟐ Ant Research\n- **🔗 链接**：[[中英摘要](./abs/2503.19452.md)] [[arXiv:2503.19452](https://arxiv.org/abs/2503.19452)] [Code]\n- **📝 说明**：\n\n#### [18] MATT-GS: Masked Attention-based 3DGS for Robot Perception and Object Detection\n- **🧑‍🔬 作者**：Jee Won Lee, Hansol Lim, SooYeun Yang, Jongseong Brad Choi\n- **🏫 单位**：State University of New York\n- **🔗 链接**：[[中英摘要](./abs/2503.19330.md)] [[arXiv:2503.19330](https://arxiv.org/abs/2503.19330)] [Code]\n- **📝 说明**：\n\n#### [19] GS-Marker: Generalizable and Robust Watermarking for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Lijiang Li, Jinglu Wang, Xiang Ming, Yan Lu\n- **🏫 单位**：Microsoft Research Asia\n- **🔗 链接**：[[中英摘要](./abs/2503.18718.md)] [[arXiv:2503.18718](https://arxiv.org/abs/2503.18718)] [Code]\n- **📝 说明**：\n\n#### [20] LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment\n- **🧑‍🔬 作者**：Haoran Wang, Jingwei Huang, Lu Yang, Tianchen Deng, Gaojing Zhang, Mingrui Li\n- **🏫 单位**：University of Sussex ⟐ University of Electronic Science and Technology of China ⟐ Shanghai Jiao Tong University ⟐ Dalian University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.18640.md)] [[arXiv:2503.18640](https://arxiv.org/abs/2503.18640)] [Code]\n- **📝 说明**：\n\n#### [21] StableGS: A Floater-Free Framework for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Luchao Wang, Qian Ren, Kaimin Liao, Hua Wang, Zhi Chen, Yaohua Tang\n- **🏫 单位**：Moore Threads AI\n- **🔗 链接**：[[中英摘要](./abs/2503.18458.md)] [[arXiv:2503.18458](https://arxiv.org/abs/2503.18458)] [Code]\n- **📝 说明**：\n\n#### [22] ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation\n- **🧑‍🔬 作者**：Guosheng Zhao, Xiaofeng Wang, Chaojun Ni, Zheng Zhu, Wenkang Qin, Guan Huang, Xingang Wang\n- **🏫 单位**：GigaAI ⟐ Chinese Academy of Sciences ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2503.18438.md)] [[arXiv:2503.18438](https://arxiv.org/abs/2503.18438)] [[Code](https://github.com/GigaAI-research/ReconDreamer-Plus)]\n- **📝 说明**：\n\n#### [23] GI-SLAM: Gaussian-Inertial 
SLAM\n- **🧑‍🔬 作者**：Xulang Liu, Ning Tan\n- **🏫 单位**：Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2503.18275.md)] [[arXiv:2503.18275](https://arxiv.org/abs/2503.18275)] [Code]\n- **📝 说明**：\n\n#### [24] Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving\n- **🧑‍🔬 作者**：Junhao Ge, Zuhong Liu, Longteng Fan, Yifan Jiang, Jiaqi Su, Yiming Li, Zhejun Zhang, Siheng Chen\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ New York University ⟐ ETH Zurich ⟐ Shanghai Artificial Intelligence Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2503.18108.md)] [[arXiv:2503.18108](https://arxiv.org/abs/2503.18108)] [[Code](https://github.com/cancaries/SceneCrafter)]\n- **📝 说明**：\n\n#### [25] PanopticSplatting: End-to-End Panoptic Gaussian Splatting\n- **🧑‍🔬 作者**：Yuxuan Xie, Xuan Yu, Changjian Jiang, Sitong Mao, Shunbo Zhou, Rui Fan, Rong Xiong, Yue Wang\n- **🏫 单位**：Zhejiang University ⟐ Huawei Cloud Computing Technologies Co., Ltd., Shenzhen, China ⟐ Tongji University\n- **🔗 链接**：[[中英摘要](./abs/2503.18073.md)] [[arXiv:2503.18073](https://arxiv.org/abs/2503.18073)] [Code]\n- **📝 说明**：\n\n#### [26] SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining\n- **🧑‍🔬 作者**：Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel\n- **🏫 单位**：University of Amsterdam ⟐ Computer Vision Lab, ETH Zurich ⟐ INSAIT, Sofia University ⟐ Nanjing University of Aeronautics and Astronautics ⟐ University of Pisa ⟐ University of Trento\n- **🔗 链接**：[[中英摘要](./abs/2503.18052.md)] [[arXiv:2503.18052](https://arxiv.org/abs/2503.18052)] [Code]\n- **📝 说明**：\n\n#### [27] Real-time Global Illumination for Dynamic 3D Gaussian Scenes\n- **🧑‍🔬 作者**：Chenxiao Hu, Meng Gai, Guoping Wang, Sheng Li\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2503.17897.md)] [[arXiv:2503.17897](https://arxiv.org/abs/2503.17897)] [Code]\n- **📝 说明**：\n\n#### [28] GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zexu Huang, Min Xu, Stuart Perry\n- **🏫 单位**：School of Electrical and Data Engineering, University of Technology Sydney\n- **🔗 链接**：[[中英摘要](./abs/2503.17798.md)] [[arXiv:2503.17798](https://arxiv.org/abs/2503.17798)] [Code]\n- **📝 说明**：\n\n#### [29] GS-LTS: 3D Gaussian Splatting-Based Adaptive Modeling for Long-Term Service Robots\n- **🧑‍🔬 作者**：Bin Fu, Jialin Li, Bin Zhang, Ruiping Wang, Xilin Chen\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2503.17733.md)] [[arXiv:2503.17733](https://arxiv.org/abs/2503.17733)] [Code]\n- **📝 说明**：\n\n#### [30] Is there anything left? Measuring semantic residuals of objects removed from 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Simona Kocour, Assia Benbihi, Aikaterini Adam, Torsten Sattler\n- **🏫 单位**：Czech Technical University in Prague ⟐ Archimedes/Athena RC\n- **🔗 链接**：[[中英摘要](./abs/2503.17574.md)] [[arXiv:2503.17574](https://arxiv.org/abs/2503.17574)] [Code]\n- **📝 说明**：\n\n#### [31] Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping\n- **🧑‍🔬 作者**：Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti, Giorgio Grisetti, Martin R. 
Oswald\n- **🏫 单位**：Sapienza University of Rome ⟐ University of Melbourne ⟐ University of Amsterdam\n- **🔗 链接**：[[中英摘要](./abs/2503.17491.md)] [[arXiv:2503.17491](https://arxiv.org/abs/2503.17491)] [Code]\n- **📝 说明**：\n\n#### [32] ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes\n- **🧑‍🔬 作者**：Zhengqing Gao, Dongting Hu, Jia-Wang Bian, Huan Fu, Yan Li, Tongliang Liu, Mingming Gong, Kun Zhang\n- **🏫 单位**：Mohamed bin Zayed University of Artificial Intelligence ⟐ University of Melbourne ⟐ Alibaba Group ⟐ The University of Sydney ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2503.17486.md)] [[arXiv:2503.17486](https://arxiv.org/abs/2503.17486)] [Code]\n- **📝 说明**：\n\n#### [33] DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery\n- **🧑‍🔬 作者**：Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, Yi Yang\n- **🏫 单位**：Beijing Institute of Technology ⟐ Hangzhou Dianzi University\n- **🔗 链接**：[[中英摘要](./abs/2503.16964.md)] [[arXiv:2503.16964](https://arxiv.org/abs/2503.16964)] [Code]\n- **📝 说明**：\n\n#### [34] Optimized Minimal 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Joo Chan Lee, Jong Hwan Ko, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2503.16924.md)] [[arXiv:2503.16924](https://arxiv.org/abs/2503.16924)] [[Code](https://github.com/maincold2/OMG)]\n- **📝 说明**：\n\n#### [35] SAGE: Semantic-Driven Adaptive Gaussian Splatting in Extended Reality\n- **🧑‍🔬 作者**：Chiara Schiavo, Elena Camuffo, Leonardo Badia, Simone Milani\n- **🏫 单位**：Dept. of Information Engineering, University of Padova, Padua, Italy\n- **🔗 链接**：[[中英摘要](./abs/2503.16747.md)] [[arXiv:2503.16747](https://arxiv.org/abs/2503.16747)] [Code]\n- **📝 说明**：\n\n#### [36] 4D Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, Federico Tombari\n- **🏫 单位**：Technical University of Munich ⟐ Hangzhou Dianzi University ⟐ Zhejiang University ⟐ Google\n- **🔗 链接**：[[中英摘要](./abs/2503.16710.md)] [[arXiv:2503.16710](https://arxiv.org/abs/2503.16710)] [[Code](https://github.com/yanyan-li/4DGS-SLAM)]\n- **📝 说明**：\n\n#### [37] 1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering\n- **🧑‍🔬 作者**：Yuheng Yuan, Qiuhong Shen, Xingyi Yang, Xinchao Wang\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2503.16422.md)] [[arXiv:2503.16422](https://arxiv.org/abs/2503.16422)] [[Code](https://4dgs-1k.github.io/)]\n- **📝 说明**：\n\n#### [38] VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling\n- **🧑‍🔬 作者**：Hyojun Go, Byeongjun Park, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim\n- **🏫 单位**：EverEx ⟐ KAIST ⟐ Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2503.15855.md)] [[arXiv:2503.15855](https://arxiv.org/abs/2503.15855)] [[Code](https://github.com/gohyojun15/VideoRFSplat)]\n- **📝 说明**：\n\n#### [39] Controlling Avatar Diffusion with Learnable Gaussian Embedding\n- **🧑‍🔬 作者**：Xuan Gao, Jingtao Zhou, Dongyu Liu, Yuqi Zhou, Juyong Zhang\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2503.15809.md)] [[arXiv:2503.15809](https://arxiv.org/abs/2503.15809)] [Code]\n- **📝 说明**：\n\n#### [40] ClimateGS: Real-Time Climate Simulation with 3D Gaussian Style Transfer\n- **🧑‍🔬 作者**：Yuezhen Xie, Meiying Zhang, Qi Hao\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2503.14845.md)] [[arXiv:2503.14845](https://arxiv.org/abs/2503.14845)] [Code]\n- **📝 说明**：\n\n#### [41] SketchSplat: 3D Edge Reconstruction 
via Differentiable Multi-view Sketch Splatting\n- **🧑‍🔬 作者**：Haiyang Ying, Matthias Zwicker\n- **🏫 单位**：University of Maryland, College Park\n- **🔗 链接**：[[中英摘要](./abs/2503.14786.md)] [[arXiv:2503.14786](https://arxiv.org/abs/2503.14786)] [Code]\n- **📝 说明**：\n\n#### [42] HandSplat: Embedding-Driven Gaussian Splatting for High-Fidelity Hand Rendering\n- **🧑‍🔬 作者**：Yilan Dong, Haohe Liu, Qing Wang, Jiahao Yang, Wenqing Wang, Gregory Slabaugh, Shanxin Yuan\n- **🏫 单位**：Queen Mary University of London ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](./abs/2503.14736.md)] [[arXiv:2503.14736](https://arxiv.org/abs/2503.14736)] [Code]\n- **📝 说明**：\n\n#### [43] Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation\n- **🧑‍🔬 作者**：Umar Farooq, Jean-Yves Guillemaut, Adrian Hilton, Marco Volino\n- **🏫 单位**：CVSSP ⟐ University of Surrey\n- **🔗 链接**：[[中英摘要](./abs/2503.14475.md)] [[arXiv:2503.14475](https://arxiv.org/abs/2503.14475)] [Code]\n- **📝 说明**：\n\n#### [44] Improving Adaptive Density Control for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Glenn Grubert, Florian Barthel, Anna Hilsmann, Peter Eisert\n- **🏫 单位**：Humboldt-Universität zu Berlin, Berlin, Germany ⟐ Fraunhofer HHI, Berlin, Germany\n- **🔗 链接**：[[中英摘要](./abs/2503.14274.md)] [[arXiv:2503.14274](https://arxiv.org/abs/2503.14274)] [Code]\n- **📝 说明**：\n\n#### [45] Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images\n- **🧑‍🔬 作者**：Simon Niedermayr, Christoph Neuhauser, Rüdiger Westermann\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2503.14171.md)] [[arXiv:2503.14171](https://arxiv.org/abs/2503.14171)] [Code]\n- **📝 说明**：\n\n#### [46] BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering\n- **🧑‍🔬 作者**：Minye Wu, Haizhao Dai, Kaixin Yao, Tinne Tuytelaars, Jingyi Yu\n- **🏫 单位**：KU Leuven ⟐ ShanghaiTech University ⟐ Cellverse Co, Ltd.\n- **🔗 链接**：[[中英摘要](./abs/2503.13961.md)] [[arXiv:2503.13961](https://arxiv.org/abs/2503.13961)] [Code]\n- **📝 说明**：\n\n#### [47] Light4GS: Lightweight Compact 4D Gaussian Splatting Generation via Context Model\n- **🧑‍🔬 作者**：Mufan Liu, Qi Yang, He Huang, Wenjie Huang, Zhenlong Yuan, Zhu Li, Yiling Xu\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Missouri, Kansas City ⟐ University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2503.13948.md)] [[arXiv:2503.13948](https://arxiv.org/abs/2503.13948)] [Code]\n- **📝 说明**：\n\n#### [48] Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors\n- **🧑‍🔬 作者**：Katja Schwarz, Norman Mueller, Peter Kontschieder\n- **🏫 单位**：Meta Reality Labs Zurich, Switzerland\n- **🔗 链接**：[[中英摘要](./abs/2503.13272.md)] [[arXiv:2503.13272](https://arxiv.org/abs/2503.13272)] [[Code](https://katjaschwarz.github.io/ggs/)]\n- **📝 说明**：\n\n#### [49] Gaussian On-the-Fly Splatting: A Progressive Framework for Robust Near Real-Time 3DGS Optimization\n- **🧑‍🔬 作者**：Yiwei Xu, Yifei Yu, Wentian Gan, Tengfei Wang, Zongqian Zhan, Hao Cheng, Xin Wang\n- **🏫 单位**：Wuhan University ⟐ University of Twente\n- **🔗 链接**：[[中英摘要](./abs/2503.13086.md)] [[arXiv:2503.13086](https://arxiv.org/abs/2503.13086)] [[Code](https://xywjohn.github.io/GS_On-the-Fly.github.io/)]\n- **📝 说明**：\n\n#### [50] CAT-3DGS Pro: A New Benchmark for Efficient 3DGS Compression\n- **🧑‍🔬 作者**：Yu-Ting Zhan, He-bi Yang, Cheng-Yuan Ho, Jui-Chiu Chiang, Wen-Hsiao Peng\n- **🏫 单位**：National Yang Ming Chiao Tung University ⟐ National Chung Cheng University\n- **🔗 链接**：[[中英摘要](./abs/2503.12862.md)] 
[[arXiv:2503.12862](https://arxiv.org/abs/2503.12862)] [Code]\n- **📝 说明**：\n\n#### [51] CompMarkGS: Robust Watermarking for Compression 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Sumin In, Youngdong Jang, Utae Jeong, MinHyuk Jang, Hyeongcheol Park, Eunbyung Park, Sangpil Kim\n- **🏫 单位**：Korea University ⟐ Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2503.12836.md)] [[arXiv:2503.12836](https://arxiv.org/abs/2503.12836)] [Code]\n- **📝 说明**：\n\n#### [52] Deblur Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Francesco Girlanda, Denys Rozumnyi, Marc Pollefeys, Martin R. Oswald\n- **🏫 单位**：ETH Zürich ⟐ Microsoft ⟐ University of Amsterdam\n- **🔗 链接**：[[中英摘要](./abs/2503.12572.md)] [[arXiv:2503.12572](https://arxiv.org/abs/2503.12572)] [Code]\n- **📝 说明**：\n\n#### [53] Niagara: Normal-Integrated Geometric Affine Field for Scene Reconstruction from a Single View\n- **🧑‍🔬 作者**：Xianzu Wu, Zhenxin Ai, Harry Yang, Ser-Nam Lim, Jun Liu, Huan Wang\n- **🏫 单位**：Westlake University ⟐ Jiangxi University of Science and Technology ⟐ The Hong Kong University of Science and Technology ⟐ University of Central Florida ⟐ Lancaster University ⟐ Everlyn AI\n- **🔗 链接**：[[中英摘要](./abs/2503.12553.md)] [[arXiv:2503.12553](https://arxiv.org/abs/2503.12553)] [[Code](https://github.com/xianzuwu/Niagara)]\n- **📝 说明**：\n\n#### [54] MTGS: Multi-Traversal Gaussian Splatting\n- **🧑‍🔬 作者**：Tianyu Li, Yihang Qiu, Zhenhua Wu, Carl Lindström, Peng Su, Matthias Nießner, Hongyang Li\n- **🏫 单位**：Shanghai Innovation Institute ⟐ OpenDriveLab and MMLab, The University of Hong Kong ⟐ Technical University of Munich ⟐ Chalmers University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.12552.md)] [[arXiv:2503.12552](https://arxiv.org/abs/2503.12552)] [Code]\n- **📝 说明**：\n\n#### [55] VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting\n- **🧑‍🔬 作者**：Songen Gu, Haoxuan Song, Binjie Liu, Qian Yu, Sanyi Zhang, Haiyong Jiang, Jin Huang, Feng Tian\n- **🏫 单位**：Institute of Software, CAS ⟐ UCAS ⟐ Communication University of China ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2503.12383.md)] [[arXiv:2503.12383](https://arxiv.org/abs/2503.12383)] [Code]\n- **📝 说明**：\n\n#### [56] TopoGaussian: Inferring Internal Topology Structures from Visual Clues\n- **🧑‍🔬 作者**：Xiaoyu Xiong, Changyu Hu, Chunru Lin, Pingchuan Ma, Chuang Gan, Tao Du\n- **🏫 单位**：Tsinghua University ⟐ University of Massachusetts Amherst ⟐ Massachusetts Institute of Technology ⟐ Shanghai Qi Zhi Institute\n- **🔗 链接**：[[中英摘要](./abs/2503.12343.md)] [[arXiv:2503.12343](https://arxiv.org/abs/2503.12343)] [[Code](https://topo-gaussian.github.io/TopoGaussian/)]\n- **📝 说明**：\n\n#### [57] GS-I3: Gaussian Splatting for Surface Reconstruction from Illumination-Inconsistent Images\n- **🧑‍🔬 作者**：Tengfei Wang, Yongmao Hou, Zhaoning Zhang, Yiwei Xu, Zongqian Zhan, Xin Wang\n- **🏫 单位**：School of Geodesy and Geomatics, Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2503.12335.md)] [[arXiv:2503.12335](https://arxiv.org/abs/2503.12335)] [[Code](https://github.com/TFwang-9527/GS-3I)]\n- **📝 说明**：\n\n#### [58] REdiSplats: Ray Tracing for Editable Gaussian Splatting\n- **🧑‍🔬 作者**：Krzysztof Byrski, Grzegorz Wilczyński, Weronika Smolak-Dyżewska, Piotr Borycki, Dawid Baran, Sławomir Tadeja, Przemysław Spurek\n- **🏫 单位**：Faculty of Mathematics and Computer Science, Jagiellonian University, Poland ⟐ Department of Engineering, University of Cambridge, Cambridge, United Kingdom\n- **🔗 链接**：[[中英摘要](./abs/2503.12284.md)] [[arXiv:2503.12284](https://arxiv.org/abs/2503.12284)] 
[[Code](https://github.com/KByrski/REdiSplats)]\n- **📝 说明**：\n\n#### [59] 3D Gaussian Splatting against Moving Objects for High-Fidelity Street Scene Reconstruction\n- **🧑‍🔬 作者**：Peizhen Zheng, Longfei Wei, Dongjing Jiang, Jianfei Zhang\n- **🏫 单位**：ThinkX, Canada ⟐ MedicineX, Canada\n- **🔗 链接**：[[中英摘要](./abs/2503.12001.md)] [[arXiv:2503.12001](https://arxiv.org/abs/2503.12001)] [Code]\n- **📝 说明**：\n\n#### [60] DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting\n- **🧑‍🔬 作者**：Utkarsh Nath, Rajeev Goel, Rahul Khurana, Kyle Min, Mark Ollila, Pavan Turaga, Varun Jampani, Tejaswi Gowda\n- **🏫 单位**：Arizona State University ⟐ Intel Labs ⟐ Stability AI\n- **🔗 链接**：[[中英摘要](./abs/2503.11981.md)] [[arXiv:2503.11981](https://arxiv.org/abs/2503.11981)] [[Code](https://decompdreamer3d.github.io/)]\n- **📝 说明**：\n\n#### [61] DynaGSLAM: Real-Time Gaussian-Splatting SLAM for Online Rendering, Tracking, Motion Predictions of Moving Objects in Dynamic Scenes\n- **🧑‍🔬 作者**：Runfa Blark Li, Mahdi Shaghaghi, Keito Suzuki, Xinshuang Liu, Varun Moparthi, Bang Du, Walker Curtis, Martin Renschler, Ki Myung Brian Lee, Nikolay Atanasov, Truong Nguyen\n- **🏫 单位**：UC San Diego ⟐ Qualcomm XR Advanced Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.11979.md)] [[arXiv:2503.11979](https://arxiv.org/abs/2503.11979)] [[Code](https://github.com/BlarkLee/DynaGSLAM_official)]\n- **📝 说明**：\n\n#### [62] Snapmoji: Instant Generation of Animatable Dual-Stylized Avatars\n- **🧑‍🔬 作者**：Eric M. Chen, Di Liu, Sizhuo Ma, Michael Vasilkovsky, Bing Zhou, Qiang Gao, Wenzhou Wang, Jiahao Luo, Dimitris N. Metaxas, Vincent Sitzmann, Jian Wang\n- **🏫 单位**：Snap Inc ⟐ MIT ⟐ Rutgers University ⟐ University of California, Santa Cruz\n- **🔗 链接**：[[中英摘要](./abs/2503.11978.md)] [[arXiv:2503.11978](https://arxiv.org/abs/2503.11978)] [Code]\n- **📝 说明**：\n\n#### [63] Industrial-Grade Sensor Simulation via Gaussian Splatting: A Modular Framework for Scalable Editing and Full-Stack Validation\n- **🧑‍🔬 作者**：Xianming Zeng, Sicong Du, Qifeng Chen, Lizhe Liu, Haoyu Shu, Jiaxuan Gao, Jiarun Liu, Jiulong Xu, Jianyun Xu, Mingxia Chen, Yiru Zhao, Peng Chen, Yapeng Xue, Chunming Zhao, Sheng Yang, Qiang Li\n- **🏫 单位**：Alibaba Group\n- **🔗 链接**：[[中英摘要](./abs/2503.11731.md)] [[arXiv:2503.11731](https://arxiv.org/abs/2503.11731)] [Code]\n- **📝 说明**：\n\n#### [64] Advancing 3D Gaussian Splatting Editing with Complementary and Consensus Information\n- **🧑‍🔬 作者**：Xuanqi Zhang, Jieun Lee, Chris Joslin, Wonsook Lee\n- **🏫 单位**：University of Ottawa ⟐ Hansung University ⟐ Carleton University\n- **🔗 链接**：[[中英摘要](./abs/2503.11601.md)] [[arXiv:2503.11601](https://arxiv.org/abs/2503.11601)] [Code]\n- **📝 说明**：\n\n#### [65] EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Di Li, Jie Feng, Jiahao Chen, Weisheng Dong, Guanbin Li, Guangming Shi, Licheng Jiao\n- **🏫 单位**：Xidian University ⟐ Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2503.11345.md)] [[arXiv:2503.11345](https://arxiv.org/abs/2503.11345)] [Code]\n- **📝 说明**：\n\n#### [66] Uncertainty-Aware Normal-Guided Gaussian Splatting for Surface Reconstruction from Sparse Image Sequences\n- **🧑‍🔬 作者**：Zhen Tan, Xieyuanli Chen, Jinpu Zhang, Lei Feng, Dewen Hu\n- **🏫 单位**：National University of Defense Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.11172.md)] [[arXiv:2503.11172](https://arxiv.org/abs/2503.11172)] [Code]\n- **📝 说明**：\n\n#### [67] RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion 
Priors\n- **🧑‍🔬 作者**：Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, Nima Khademi Kalantari\n- **🏫 单位**：Texas A&M University ⟐ Meta Reality Labs ⟐ Max Planck Institute for Informatics\n- **🔗 链接**：[[中英摘要](./abs/2503.10860.md)] [[arXiv:2503.10860](https://arxiv.org/abs/2503.10860)] [[Code](https://github.com/avinashpaliwal/RI3D)]\n- **📝 说明**：\n\n#### [68] LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds\n- **🧑‍🔬 作者**：Lingteng Qiu, Xiaodong Gu, Peihao Li, Qi Zuo, Weichao Shen, Junfei Zhang, Kejie Qiu, Weihao Yuan, Guanying Chen, Zilong Dong, Liefeng Bo\n- **🏫 单位**：Tongyi Lab ⟐ Alibaba Group\n- **🔗 链接**：[[中英摘要](./abs/2503.10625.md)] [[arXiv:2503.10625](https://arxiv.org/abs/2503.10625)] [[Code](https://github.com/aigc3d/LHM)]\n- **📝 说明**：\n\n#### [69] MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction\n- **🧑‍🔬 作者**：Yingshuang Zou, Yikang Ding, Chuanrui Zhang, Jiazhe Guo, Bohan Li, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Haoqian Wang\n- **🏫 单位**：THU ⟐ MEGVII ⟐ MachDrive ⟐ Brown University ⟐ SJTU ⟐ HKU\n- **🔗 链接**：[[中英摘要](./abs/2503.10604.md)] [[arXiv:2503.10604](https://arxiv.org/abs/2503.10604)] [[Code](https://github.com/heiheishuang/MuDG)]\n- **📝 说明**：\n\n#### [70] VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames\n- **🧑‍🔬 作者**：Zhiqi Li, Chengrui Dong, Yiming Chen, Zhangchi Huang, Peidong Liu\n- **🏫 单位**：Zhejiang University ⟐ Westlake University\n- **🔗 链接**：[[中英摘要](./abs/2503.10286.md)] [[arXiv:2503.10286](https://arxiv.org/abs/2503.10286)] [[Code](https://github.com/WU-CVGL/VicaSplat)]\n- **📝 说明**：\n\n#### [71] ROODI: Reconstructing Occluded Objects with Denoising Inpainters\n- **🧑‍🔬 作者**：Yeonjin Chang, Erqun Dong, Seunghyeon Seo, Nojun Kwak, Kwang Moo Yi\n- **🏫 单位**：Seoul National University ⟐ University of British Columbia\n- **🔗 链接**：[[中英摘要](./abs/2503.10256.md)] [[arXiv:2503.10256](https://arxiv.org/abs/2503.10256)] [[Code](https://github.com/yeonjin-chang/ROODI)]\n- **📝 说明**：\n\n#### [72] 3D Student Splatting and Scooping\n- **🧑‍🔬 作者**：Jialin Zhu, Jiangbei Yue, Feixiang He, He Wang\n- **🏫 单位**：University College London, UK ⟐ University of Leeds, UK ⟐ AI Centre, University College London, UK\n- **🔗 链接**：[[中英摘要](./abs/2503.10148.md)] [[arXiv:2503.10148](https://arxiv.org/abs/2503.10148)] [Code]\n- **📝 说明**：\n\n#### [73] TGP: Two-modal occupancy prediction with 3D Gaussian and sparse points for 3D Environment Awareness\n- **🧑‍🔬 作者**：Mu Chen, Wenyu Chen, Mingchuan Yang, Yuan Zhang, Tao Han, Xinchi Li, Yunlong Li, Huaici Zhao\n- **🏫 单位**： China Telecom Research Institute ⟐ University of Chinese Academy of Sciences ⟐ Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2503.09941.md)] [[arXiv:2503.09941](https://arxiv.org/abs/2503.09941)] [Code]\n- **📝 说明**：\n\n#### [74] Physics-Aware Human-Object Rendering from Sparse Views via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Weiquan Wang, Jun Xiao, Yueting Zhuang, Long Chen\n- **🏫 单位**：Zhejiang University ⟐ Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.09640.md)] [[arXiv:2503.09640](https://arxiv.org/abs/2503.09640)] [Code]\n- **📝 说明**：\n\n#### [75] FPGS: Feed-Forward Semantic-aware Photorealistic Style Transfer of Large-Scale Gaussian Splatting\n- **🧑‍🔬 作者**：GeonU Kim, Kim Youwang, Lee Hyoseok, Tae-Hyun Oh\n- **🏫 单位**：POSTECH ⟐ KAIST\n- **🔗 链接**：[[中英摘要](./abs/2503.09635.md)] [[arXiv:2503.09635](https://arxiv.org/abs/2503.09635)] 
[[Code](https://github.com/kaist-ami/FPGS)]\n- **📝 说明**：\n\n#### [76] Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation\n- **🧑‍🔬 作者**：Máté Tóth, Péter Kovács, Zoltán Bendefy, Zoltán Hortsin, Balázs Teréki, Tamás Matuszka\n- **🏫 单位**：aiMotive\n- **🔗 链接**：[[中英摘要](./abs/2503.09464.md)] [[arXiv:2503.09464](https://arxiv.org/abs/2503.09464)] [Code]\n- **📝 说明**：\n\n#### [77] Online Language Splatting\n- **🧑‍🔬 作者**：Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo, Xinyu Huang, Guoquan Huang, Liu Ren\n- **🏫 单位**：University of Delaware ⟐ Bosch Research North America Center for AI\n- **🔗 链接**：[[中英摘要](./abs/2503.09447.md)] [[arXiv:2503.09447](https://arxiv.org/abs/2503.09447)] [[Code](https://saimouli.github.io/onlineLang/)]\n- **📝 说明**：\n\n#### [78] Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training\n- **🧑‍🔬 作者**：Jiatong Xia, Lingqiao Liu\n- **🏫 单位**：Australian Institute for Machine Learning, The University of Adelaide\n- **🔗 链接**：[[中英摘要](./abs/2503.09396.md)] [[arXiv:2503.09396](https://arxiv.org/abs/2503.09396)] [Code]\n- **📝 说明**：\n\n#### [79] GASPACHO: Gaussian Splatting for Controllable Humans and Objects\n- **🧑‍🔬 作者**：Aymen Mir, Arthur Moreau, Helisa Dhamo, Zhensong Zhang, Eduardo Pérez-Pellitero\n- **🏫 单位**：Huawei Noah’s Ark Lab, London ⟐ University of Tübingen, Germany\n- **🔗 链接**：[[中英摘要](./abs/2503.09342.md)] [[arXiv:2503.09342](https://arxiv.org/abs/2503.09342)] [Code]\n- **📝 说明**：\n\n#### [80] SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction\n- **🧑‍🔬 作者**：Dai Sun, Huhao Guan, Kun Zhang, Xike Xie, S. Kevin Zhou\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2503.09332.md)] [[arXiv:2503.09332](https://arxiv.org/abs/2503.09332)] [Code]\n- **📝 说明**：\n\n#### [81] PCGS: Progressive Compression of 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yihang Chen, Mengyao Li, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Monash University ⟐ Shanghai University\n- **🔗 链接**：[[中英摘要](./abs/2503.08511.md)] [[arXiv:2503.08511](https://arxiv.org/abs/2503.08511)] [[Code](https://github.com/YihangChen-ee/PCGS)]\n- **📝 说明**：\n\n#### [82] TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting\n- **🧑‍🔬 作者**：Fengyi Zhang, Huitong Yang, Zheng Zhang, Zi Huang, Yadan Luo\n- **🏫 单位**：The University of Queensland, Australia ⟐ Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.08485.md)] [[arXiv:2503.08485](https://arxiv.org/abs/2503.08485)] [[Code](https://github.com/Xian-Bei/TT-Occ)]\n- **📝 说明**：\n\n#### [83] Uni-Gaussians: Unifying Camera and Lidar Simulation with Gaussians for Dynamic Driving Scenarios\n- **🧑‍🔬 作者**：Zikang Yuan, Yuechuan Pu, Hongcheng Luo, Fengtian Lang, Cheng Chi, Teng Li, Yingying Shen, Haiyang Sun, Bing Wang, Xin Yang\n- **🏫 单位**：AI Chip Center for Emerging Smart Systems, InnoHK Centers ⟐ Xiaomi EV ⟐ Huazhong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.08317.md)] [[arXiv:2503.08317](https://arxiv.org/abs/2503.08317)] [Code]\n- **📝 说明**：\n\n#### [84] MVD-HuGaS: Human Gaussians from a Single Image via 3D Human Multi-view Diffusion Prior\n- **🧑‍🔬 作者**：Kaiqiang Xiong, Ying Feng, Qi Zhang, Jianbo Jiao, Yang Zhao, Zhihao Liang, Huachen Gao, Ronggang Wang\n- **🏫 单位**：Peking University ⟐ Peng Cheng Laboratory ⟐ Migu Culture Technology Co., Ltd ⟐ vivo Mobile Communication (Hangzhou) Co., 
Ltd ⟐ University of Birmingham ⟐ Hefei University of Technology ⟐ South China University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.08218.md)] [[arXiv:2503.08218](https://arxiv.org/abs/2503.08218)] [Code]\n- **📝 说明**：\n\n#### [85] S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction\n- **🧑‍🔬 作者**：Guangting Zheng, Jiajun Deng, Xiaomeng Chu, Yu Yuan, Houqiang Li, Yanyong Zhang\n- **🏫 单位**：University of Science and Technology of China ⟐ The University of Adelaide\n- **🔗 链接**：[[中英摘要](./abs/2503.08217.md)] [[arXiv:2503.08217](https://arxiv.org/abs/2503.08217)] [Code]\n- **📝 说明**：\n\n#### [86] MVGSR: Multi-View Consistency Gaussian Splatting for Robust Surface Reconstruction\n- **🧑‍🔬 作者**：Chenfeng Hou, Qi Xun Yeo, Mengqi Guo, Yongxin Su, Yanyan Li, Gim Hee Lee\n- **🏫 单位**：Beihang University ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2503.08093.md)] [[arXiv:2503.08093](https://arxiv.org/abs/2503.08093)] [Code]\n- **📝 说明**：\n\n#### [87] POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality\n- **🧑‍🔬 作者**：Joey Wilson, Marcelino Almeida, Sachit Mahajan, Martin Labrie, Maani Ghaffari, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnab Sen\n- **🏫 单位**：University of Michigan ⟐ Amazon Lab 126\n- **🔗 链接**：[[中英摘要](./abs/2503.07819.md)] [[arXiv:2503.07819](https://arxiv.org/abs/2503.07819)] [Code]\n- **📝 说明**：\n\n#### [88] EigenGS Representation: From Eigenspace to Gaussian Image Space\n- **🧑‍🔬 作者**：Lo-Wei Tai, Ching-En Li, Cheng-Lin Chen, Chih-Jung Tsai, Hwann-Tzong Chen, Tyng-Luh Liu\n- **🏫 单位**：National Tsing Hua University ⟐ Aeolus Robotics ⟐ Academia Sinica, Taiwan\n- **🔗 链接**：[[中英摘要](./abs/2503.07446.md)] [[arXiv:2503.07446](https://arxiv.org/abs/2503.07446)] [Code]\n- **📝 说明**：\n\n#### [89] All That Glitters Is Not Gold: Key-Secured 3D Secrets within 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yan Ren, Shilin Lu, Adams Wai-Kin Kong\n- **🏫 单位**：Nanyang Technological University, Singapore\n- **🔗 链接**：[[中英摘要](./abs/2503.07191.md)] [[arXiv:2503.07191](https://arxiv.org/abs/2503.07191)] [[Code](https://github.com/RY-Paper/KeySS)]\n- **📝 说明**：\n\n#### [90] ActiveInitSplat: How Active Image Selection Helps Gaussian Splatting\n- **🧑‍🔬 作者**：Konstantinos D. 
Polyzos, Athanasios Bacharis, Saketh Madhuvarasu, Nikos Papanikolopoulos, Tara Javidi\n- **🏫 单位**：University of California San Diego ⟐ University of Minnesota\n- **🔗 链接**：[[中英摘要](./abs/2503.06859.md)] [[arXiv:2503.06859](https://arxiv.org/abs/2503.06859)] [Code]\n- **📝 说明**：\n\n#### [91] Gaussian RBFNet: Gaussian Radial Basis Functions for Fast and Accurate Representation and Reconstruction of Neural Fields\n- **🧑‍🔬 作者**：Abdelaziz Bouzidi, Hamid Laga, Hazem Wannous\n- **🏫 单位**：Murdoch University ⟐ IMT Nord Europe\n- **🔗 链接**：[[中英摘要](./abs/2503.06762.md)] [[arXiv:2503.06762](https://arxiv.org/abs/2503.06762)] [[Code](https://grbfnet.github.io/)]\n- **📝 说明**：\n\n#### [92] CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving\n- **🧑‍🔬 作者**：Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll\n- **🏫 单位**：Fraunhofer IVI ⟐ TU Munich ⟐ TU Delft ⟐ TH Ingolstadt\n- **🔗 链接**：[[中英摘要](./abs/2503.06744.md)] [[arXiv:2503.06744](https://arxiv.org/abs/2503.06744)] [Code]\n- **📝 说明**：\n\n#### [93] D3DR: Lighting-Aware Object Insertion in Gaussian Splatting\n- **🧑‍🔬 作者**：Vsevolod Skorokhodov, Nikita Durasov, Pascal Fua\n- **🏫 单位**：EPFL, Lausanne, Switzerland\n- **🔗 链接**：[[中英摘要](./abs/2503.06740.md)] [[arXiv:2503.06740](https://arxiv.org/abs/2503.06740)] [Code]\n- **📝 说明**：\n\n#### [94] REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints\n- **🧑‍🔬 作者**：Di Wu, Liu Liu, Zhou Linli, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, Cewu Lu\n- **🏫 单位**：Hefei Institutes of Physical Science Chinese Academy of Sciences ⟐ University of Science and Technology of China ⟐ Hefei University of Technology ⟐ Shanghai Jiao Tong University ⟐ ByteDance\n- **🔗 链接**：[[中英摘要](./abs/2503.06677.md)] [[arXiv:2503.06677](https://arxiv.org/abs/2503.06677)] [Code]\n- **📝 说明**：\n\n#### [95] Pixel to Gaussian: Ultra-Fast Continuous Super-Resolution with 2D Gaussian Modeling\n- **🧑‍🔬 作者**：Long Peng, Anran Wu, Wenbo Li, Peizhe Xia, Xueyuan Dai, Xinjie Zhang, Xin Di, Haoze Sun, Renjing Pei, Yang Wang, Yang Cao, Zheng-Jun Zha\n- **🏫 单位**：USTC ⟐ AHU ⟐ Huawei Noah’s Ark Lab ⟐ Chang’an University ⟐ HKUST ⟐ THU\n- **🔗 链接**：[[中英摘要](./abs/2503.06617.md)] [[arXiv:2503.06617](https://arxiv.org/abs/2503.06617)] [[Code](https://github.com/peylnog/ContinuousSR)]\n- **📝 说明**：\n\n#### [96] StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zexu Huang, Min Xu, Stuart Perry\n- **🏫 单位**：University of Technology Sydney\n- **🔗 链接**：[[中英摘要](./abs/2503.06462.md)] [[arXiv:2503.06462](https://arxiv.org/abs/2503.06462)] [Code]\n- **📝 说明**：\n\n#### [97] StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams\n- **🧑‍🔬 作者**：Yang LI, Jinglu Wang, Lei Chu, Xiao Li, Shiu-hong Kao, Ying-Cong Chen, Yan Lu\n- **🏫 单位**：Media Computing Group, Microsoft Research Asia ⟐ CSE Dept., HKUST ⟐ AI Thrust, HKUST(GZ)\n- **🔗 链接**：[[中英摘要](./abs/2503.06235.md)] [[arXiv:2503.06235](https://arxiv.org/abs/2503.06235)] [Code]\n- **📝 说明**：\n\n#### [98] ForestSplats: Deformable transient field for Gaussian Splatting in the Wild\n- **🧑‍🔬 作者**：Wongi Park, Myeongseok Nam, Siwon Kim, Sangwoo Jo, Soomok Lee\n- **🏫 单位**：Ajou University ⟐ Minds and Company\n- **🔗 链接**：[[中英摘要](./abs/2503.06179.md)] [[arXiv:2503.06179](https://arxiv.org/abs/2503.06179)] [Code]\n- **📝 说明**：\n\n#### [99] Feature-EndoGaussian: Feature Distilled Gaussian Splatting in 
Surgical Deformable Scene Reconstruction\n- **🧑‍🔬 作者**：Kai Li, Junhao Wang, William Han, Ding Zhao\n- **🏫 单位**：University of Toronto ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2503.06161.md)] [[arXiv:2503.06161](https://arxiv.org/abs/2503.06161)] [Code]\n- **📝 说明**：\n\n#### [100] GSV3D: Gaussian Splatting-based Geometric Distillation with Stable Video Diffusion for Single-Image 3D Object Generation\n- **🧑‍🔬 作者**：Ye Tao, Jiawei Zhang, Yahao Shi, Dongqing Zou, Bin Zhou\n- **🏫 单位**：State Key Laboratory of Virtual Reality Technology and Systems, Beihang University ⟐ SenseTime Research ⟐ PBVR\n- **🔗 链接**：[[中英摘要](./abs/2503.06136.md)] [[arXiv:2503.06136](https://arxiv.org/abs/2503.06136)] [Code]\n- **📝 说明**：\n\n#### [101] Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting\n- **🧑‍🔬 作者**：Dominic Maggio, Luca Carlone\n- **🏫 单位**：Massachusetts Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.05949.md)] [[arXiv:2503.05949](https://arxiv.org/abs/2503.05949)] [[Code](https://github.com/MIT-SPARK/Bayesian-Fields)]\n- **📝 说明**：\n\n#### [102] D2GV: Deformable 2D Gaussian Splatting for Video Representation in 400FPS\n- **🧑‍🔬 作者**：Mufan Liu, Qi Yang, Miaoran Zhao, He Huang, Le Yang, Zhu Li, Yiling Xu\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Missouri, Kansas City ⟐ University of Canterbury\n- **🔗 链接**：[[中英摘要](./abs/2503.05600.md)] [[arXiv:2503.05600](https://arxiv.org/abs/2503.05600)] [[Code](https://github.com/Evan-sudo/D2GV)]\n- **📝 说明**：\n\n#### [103] Free Your Hands: Lightweight Relightable Turntable Capture Pipeline\n- **🧑‍🔬 作者**：Jian Shen, Huai Yu, Ji Wu, Wen Yang, Gui-Song Xia\n- **🏫 单位**：Nanjing University of Science and Technology, China ⟐ Adobe Research, USA\n- **🔗 链接**：[[中英摘要](./abs/2503.05511.md)] [[arXiv:2503.05511](https://arxiv.org/abs/2503.05511)] [Code]\n- **📝 说明**：\n\n#### [104] STGA: Selective-Training Gaussian Head Avatars\n- **🧑‍🔬 作者**：Hanzhi Guo, Yixiao Chen, Dongye Xiaonuo, Zeyu Tian, Dongdong Weng, Le Luo\n- **🏫 单位**：Beijing Institute of Technology, China\n- **🔗 链接**：[[中英摘要](./abs/2503.05196.md)] [[arXiv:2503.05196](https://arxiv.org/abs/2503.05196)] [Code]\n- **📝 说明**：\n\n#### [105] SplatPose: Geometry-Aware 6-DoF Pose Estimation from Single RGB Image via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Linqi Yang, Xiongwei Zhao, Qihao Sun, Ke Wang, Ao Chen, Peng Kang\n- **🏫 单位**：State Key Laboratory of Robotics and System, Harbin Institute of Technology ⟐ Zhengzhou Research Institute, Harbin Institute of Technology ⟐ School of Information Science and Technology, Harbin Institute of Technology (Shenzhen) ⟐ Jianghuai Advance Technology Center, Hefei\n- **🔗 链接**：[[中英摘要](./abs/2503.05174.md)] [[arXiv:2503.05174](https://arxiv.org/abs/2503.05174)] [Code]\n- **📝 说明**：\n\n#### [106] SeeLe: A Unified Acceleration Framework for Real-Time Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaotong Huang, He Zhu, Zihan Liu, Weikai Lin, Xiaohong Liu, Zhezhi He, Jingwen Leng, Minyi Guo, Yu Feng\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Shanghai Qi Zhi Institute ⟐ University of Rochester\n- **🔗 链接**：[[中英摘要](./abs/2503.05168.md)] [[arXiv:2503.05168](https://arxiv.org/abs/2503.05168)] [[Code](https://github.com/SJTU-MVCLab/SeeLe)]\n- **📝 说明**：\n\n#### [107] EvolvingGS: High-Fidelity Streamable Volumetric Video via Evolving 3D Gaussian Representation\n- **🧑‍🔬 作者**：Chao Zhang, Yifeng Zhou, Shuheng Wang, Wenfa Li, Degang Wang, Yi Xu, Shaohui Jiao\n- **🏫 单位**：Bytedance\n- **🔗 链接**：[[中英摘要](./abs/2503.05162.md)] 
[[arXiv:2503.05162](https://arxiv.org/abs/2503.05162)] [Code]\n- **📝 说明**：\n\n#### [108] GaussianCAD: Robust Self-Supervised CAD Reconstruction from Three Orthographic Views Using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zheng Zhou, Zhe Li, Bo Yu, Lina Hu, Liang Dong, Zijian Yang, Xiaoli Liu, Ning Xu, Ziwei Wang, Yonghao Dang, Jianqin Yin\n- **🏫 单位**：State Grid Hubei Electric Power Co., Ltd. Information and Communication Company, Hubei, China ⟐ School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications, Beijing, China\n- **🔗 链接**：[[中英摘要](./abs/2503.05161.md)] [[arXiv:2503.05161](https://arxiv.org/abs/2503.05161)] [Code]\n- **📝 说明**：\n\n#### [109] GSplatVNM: Point-of-View Synthesis for Visual Navigation Models Using Gaussian Splatting\n- **🧑‍🔬 作者**：Yingji Zhong, Zhihao Li, Dave Zhenyu Chen, Lanqing Hong, Dan Xu\n- **🏫 单位**：CyberAgent AI Lab, Tokyo, Japan ⟐ Department of Mechanical Systems Engineering, Nagoya University\n- **🔗 链接**：[[中英摘要](./abs/2503.05152.md)] [[arXiv:2503.05152](https://arxiv.org/abs/2503.05152)] [Code]\n- **📝 说明**：\n\n#### [110] GaussianVideo: Efficient Video Representation and Compression by Gaussian Splatting\n- **🧑‍🔬 作者**：Inseo Lee, Youngyoon Choi, Joonseok Lee\n- **🏫 单位**：Seoul National University\n- **🔗 链接**：[[中英摘要](./abs/2503.04333.md)] [[arXiv:2503.04333](https://arxiv.org/abs/2503.04333)] [Code]\n- **📝 说明**：\n\n#### [111] Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting\n- **🧑‍🔬 作者**：Shuojue Yang, Zijian Wu, Mingxuan Hong, Qian Li, Daiyun Shen, Septimiu E. Salcudean, Yueming Jin\n- **🏫 单位**：National University of Singapore, Singapore, Singapore ⟐ The University of British Columbia, Vancouver, Canada\n- **🔗 链接**：[[中英摘要](./abs/2503.04082.md)] [[arXiv:2503.04082](https://arxiv.org/abs/2503.04082)] [Code]\n- **📝 说明**：\n\n#### [112] Surgical Gaussian Surfels: Highly Accurate Real-time Surgical Scene Rendering\n- **🧑‍🔬 作者**：Idris O. Sunmola, Zhenjun Zhao, Samuel Schmidgall, Yumeng Wang, Paul Maria Scheikl, Axel Krieger\n- **🏫 单位**：Johns Hopkins University, Baltimore, MD, USA ⟐ Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2503.04079.md)] [[arXiv:2503.04079](https://arxiv.org/abs/2503.04079)] [[Code](https://github.com/aloma85/SurgicalGaussianSurfels)]\n- **📝 说明**：\n\n#### [113] Beyond Existance: Fulfill 3D Reconstructed Scenes with Pseudo Details\n- **🧑‍🔬 作者**：Yifei Gao, Jun Huang, Lei Wang, Ruiting Dai, Jun Cheng\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2503.04037.md)] [[arXiv:2503.04037](https://arxiv.org/abs/2503.04037)] [Code]\n- **📝 说明**：\n\n#### [114] GaussianGraph: 3D Gaussian-based Scene Graph Generation for Open-world Scene Understanding\n- **🧑‍🔬 作者**：Xihan Wang, Dianyi Yang, Yu Gao, Yufeng Yue, Yi Yang, Mengyin Fu\n- **🏫 单位**：Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.04034.md)] [[arXiv:2503.04034](https://arxiv.org/abs/2503.04034)] [[Code](https://github.com/WangXihan-bit/GaussianGraph/)]\n- **📝 说明**：\n\n#### [115] LensDFF: Language-enhanced Sparse Feature Distillation for Efficient Few-Shot Dexterous Manipulation\n- **🧑‍🔬 作者**：Qian Feng, David S. 
Martinez Lema, Jianxiang Feng, Zhaopeng Chen, Alois Knoll\n- **🏫 单位**：Agile Robots SE ⟐ TUM School of Computation, Information and Technology, Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2503.03890.md)] [[arXiv:2503.03890](https://arxiv.org/abs/2503.03890)] [Code]\n- **📝 说明**：\n\n#### [116] DQO-MAP: Dual Quadrics Multi-Object mapping with Gaussian Splatting\n- **🧑‍🔬 作者**：Haoyuan Li, Ziqin Ye, Yue Hao, Weiyang Lin, Chao Ye\n- **🏫 单位**：Research Institute of Intelligent Control and Systems, Harbin Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.02223.md)] [[arXiv:2503.02223](https://arxiv.org/abs/2503.02223)] [[Code](https://github.com/LiHaoy-ux/DQO-MAP)]\n- **📝 说明**：\n\n#### [117] LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training\n- **🧑‍🔬 作者**：Kaimin Liao\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2503.01199.md)] [[arXiv:2503.01199](https://arxiv.org/abs/2503.01199)] [[Code](https://github.com/MooreThreads/LiteGS)]\n- **📝 说明**：\n\n#### [118] FGS-SLAM: Fourier-based Gaussian Splatting for Real-time SLAM with Sparse and Dense Map Fusion\n- **🧑‍🔬 作者**：Yansong Xu, Junlin Li, Wei Zhang, Siyu Chen, Shengyong Zhang, Yuquan Leng, Weijia Zhou\n- **🏫 单位**：State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ Mechanical and Energy Engineering, Southern University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2503.01109.md)] [[arXiv:2503.01109](https://arxiv.org/abs/2503.01109)] [Code]\n- **📝 说明**：\n\n#### [119] Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization\n- **🧑‍🔬 作者**：You Shen, Zhipeng Zhang, Xinyang Li, Yansong Qu, Yu Lin, Shengchuan Zhang, Liujuan Cao\n- **🏫 单位**：Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University\n- **🔗 链接**：[[中英摘要](./abs/2503.00881.md)] [[arXiv:2503.00881](https://arxiv.org/abs/2503.00881)] [Code]\n- **📝 说明**：\n\n#### [120] Vid2Fluid: 3D Dynamic Fluid Assets from Single-View Videos with Generative Gaussian Splatting\n- **🧑‍🔬 作者**：Zhiwei Zhao, Alan Zhao, Minchen Li, Yixin Hu\n- **🏫 单位**：Tencent, China ⟐ Carnegie Mellon University, USA\n- **🔗 链接**：[[中英摘要](./abs/2503.00868.md)] [[arXiv:2503.00868](https://arxiv.org/abs/2503.00868)] [Code]\n- **📝 说明**：\n\n#### [121] PSRGS: Progressive Spectral Residual of 3D Gaussian for High-Frequency Recovery\n- **🧑‍🔬 作者**：BoCheng Li, WenJuan Zhang, Bing Zhang, YiLing Yao, YaNing Wang\n- **🏫 单位**：Aerospace Information Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2503.00848.md)] [[arXiv:2503.00848](https://arxiv.org/abs/2503.00848)] [Code]\n- **📝 说明**：\n\n#### [122] Enhancing Monocular 3D Scene Completion with Diffusion Model\n- **🧑‍🔬 作者**：Changlin Song, Jiaqi Wang, Liyun Zhu, He Weng\n- **🏫 单位**：Australian National University\n- **🔗 链接**：[[中英摘要](./abs/2503.00726.md)] [[arXiv:2503.00726](https://arxiv.org/abs/2503.00726)] [[Code](https://github.com/CharlieSong1999/FlashDreamer)]\n- **📝 说明**：\n\n#### [123] Scalable Real2Sim: Physics-Aware Asset Generation Via Robotic Pick-and-Place Setups\n- **🧑‍🔬 作者**：Nicholas Pfaff, Evelyn Fu, Jeremy Binagia, Phillip Isola, Russ Tedrake\n- **🏫 单位**：Massachusetts Institute of Technology, Cambridge, MA, USA ⟐ Amazon Robotics\n- **🔗 链接**：[[中英摘要](./abs/2503.00370.md)] [[arXiv:2503.00370](https://arxiv.org/abs/2503.00370)] [[Code](https://github.com/nepfaff/scalable-real2sim)]\n- **📝 说明**：\n\n#### [124] Abstract Rendering: Computing All that is 
Seen in Gaussian Splat Scenes\n- **🧑‍🔬 作者**：Yangge Li, Chenxi Ji, Xiangru Zhong, Huan Zhang, Sayan Mitra\n- **🏫 单位**：University of Illinois at Urbana-Champaign ⟐ National Chung Cheng University, Taiwan\n- **🔗 链接**：[[中英摘要](./abs/2503.00308.md)] [[arXiv:2503.00308](https://arxiv.org/abs/2503.00308)] [Code]\n- **📝 说明**：\n\n#### [125] Seeing A 3D World in A Grain of Sand\n- **🧑‍🔬 作者**：Yufan Zhang, Yu Ji, Yu Guo, Jinwei Ye\n- **🏫 单位**：George Mason University ⟐ LightThought LLC\n- **🔗 链接**：[[中英摘要](./abs/2503.00260.md)] [[arXiv:2503.00260](https://arxiv.org/abs/2503.00260)] [Code]\n- **📝 说明**：\n\n#### [126] FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering\n- **🧑‍🔬 作者**：Jingqiu Zhou, Lue Fan, Linjiang Huang, Xiaoyu Shi, Si Liu, Zhaoxiang Zhang, Hongsheng Li\n- **🏫 单位**：Multimedia Laboratory, The Chinese University of Hong Kong ⟐ Centre for Perceptual and Interactive Intelligence, Hong Kong ⟐ Institute of Automation, Chinese Academy of Sciences ⟐ Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2502.21093.md)] [[arXiv:2502.21093](https://arxiv.org/abs/2502.21093)] [Code]\n- **📝 说明**：\n\n#### [127] EndoPBR: Material and Lighting Estimation for Photorealistic Surgical Simulations via Physically-based Rendering\n- **🧑‍🔬 作者**：John J. Han, Jie Ying Wu\n- **🏫 单位**：Vanderbilt University\n- **🔗 链接**：[[中英摘要](./abs/2502.20669.md)] [[arXiv:2502.20669](https://arxiv.org/abs/2502.20669)] [[Code](https://github.com/juseonghan/EndoPBR)]\n- **📝 说明**：\n\n#### [128] No Parameters, No Problem: 3D Gaussian Splatting without Camera Intrinsics and Extrinsics\n- **🧑‍🔬 作者**：Dongbo Shi, Shen Cao, Lubin Fan, Bojian Wu, Jinhui Guo, Renjie Chen, Ligang Liu, Jieping Ye\n- **🏫 单位**：University of Science and Technology of China ⟐ Individual Researcher\n- **🔗 链接**：[[中英摘要](./abs/2502.19800.md)] [[arXiv:2502.19800](https://arxiv.org/abs/2502.19800)] [Code]\n- **📝 说明**：\n\n#### [129] LUCAS: Layered Universal Codec Avatars\n- **🧑‍🔬 作者**：Di Liu, Teng Deng, Giljoo Nam, Yu Rong, Stanislav Pidhorskyi, Junxuan Li, Jason Saragih, Dimitris N. 
Metaxas, Chen Cao\n- **🏫 单位**：Codec Avatars Lab, Meta ⟐ Rutgers University\n- **🔗 链接**：[[中英摘要](./abs/2502.19739.md)] [[arXiv:2502.19739](https://arxiv.org/abs/2502.19739)] [Code]\n- **📝 说明**：\n\n#### [130] OpenFly: A Versatile Toolchain and Large-scale Benchmark for Aerial Vision-Language Navigation\n- **🧑‍🔬 作者**：Yunpeng Gao, Chenhui Li, Zhongrui You, Junli Liu, Zhen Li, Pengan Chen, Qizhi Chen, Zhonghan Tang, Liansheng Wang, Penghui Yang, Yiwen Tang, Yuhang Tang, Shuai Liang, Songyi Zhu, Ziqin Xiong, Yifei Su, Xinyi Ye, Jianan Li, Yan Ding, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li\n- **🏫 单位**：Shanghai AI Laboratory ⟐ Northwestern Polytechnical University ⟐ Beijing University of Posts and Telecommunications ⟐ Shanghai Jiao Tong University ⟐ The University of Hong Kong ⟐ Zhejiang University ⟐ University of Science and Technology of China ⟐ East China University of Science and Technology ⟐ Fudan University ⟐ Institute of Automation, Chinese Academy of Sciences ⟐ Tele AI\n- **🔗 链接**：[[中英摘要](./abs/2502.18041.md)] [[arXiv:2502.18041](https://arxiv.org/abs/2502.18041)] [[Code](https://github.com/SHAILAB-IPEC/OpenFly-Platform)]\n- **📝 说明**：\n\n#### [131] Laplace-Beltrami Operator for Gaussian Splatting\n- **🧑‍🔬 作者**：Hongyu Zhou, Zorah Lähner\n- **🏫 单位**：University of Bonn ⟐ Lamarr Institute\n- **🔗 链接**：[[中英摘要](./abs/2502.17531.md)] [[arXiv:2502.17531](https://arxiv.org/abs/2502.17531)] [Code]\n- **📝 说明**：\n\n#### [132] Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control\n- **🧑‍🔬 作者**：Jinbo Yan, Alan Zhao, Yixin Hu\n- **🏫 单位**：Tencent\n- **🔗 链接**：[[中英摘要](./abs/2502.16475.md)] [[arXiv:2502.16475](https://arxiv.org/abs/2502.16475)] [Code]\n- **📝 说明**：\n\n#### [133] DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation\n- **🧑‍🔬 作者**：Luzhou Ge, Xiangyu Zhu, Zhuo Yang, Xuesong Li\n- **🏫 单位**：School of Computer Science, Beijing Institute of Technology, China\n- **🔗 链接**：[[中英摘要](./abs/2502.15309.md)] [[arXiv:2502.15309](https://arxiv.org/abs/2502.15309)] [[Code](https://github.com/GeLuzhou/Dynamic-GSG)]\n- **📝 说明**：\n\n#### [134] GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models\n- **🧑‍🔬 作者**：Miao Tao, Yuanzhen Zhou, Haoran Xu, Zeyu He, Zhenyu Yang, Yuchang Zhang, Zhongling Su, Linning Xu, Zhenxiang Ma, Rong Fu, Hengjie Li, Xingcheng Zhang, Jidong Zhai\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory, Shanghai ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2502.14938.md)] [[arXiv:2502.14938](https://arxiv.org/abs/2502.14938)] [Code]\n- **📝 说明**：\n\n#### [135] Hier-SLAM++: Neuro-Symbolic Semantic SLAM with a Hierarchically Categorical Gaussian Splatting\n- **🧑‍🔬 作者**：Boying Li, Vuong Chi Hao, Peter J. 
Stuckey, Ian Reid, Hamid Rezatofighi\n- **🏫 单位**：Monash University ⟐ VinUniversity, Vietnam ⟐ Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates\n- **🔗 链接**：[[中英摘要](./abs/2502.14931.md)] [[arXiv:2502.14931](https://arxiv.org/abs/2502.14931)] [Code]\n- **📝 说明**：\n\n#### [136] CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Qilin Zhang, Olaf Wysocki, Steffen Urban, Boris Jutzi\n- **🏫 单位**：Photogrammetry and Remote Sensing, TUM School of Engineering and Design, Technical University of Munich (TUM), Munich, Germany\n- **🔗 链接**：[[中英摘要](./abs/2502.14684.md)] [[arXiv:2502.14684](https://arxiv.org/abs/2502.14684)] [[Code](https://github.com/zqlin0521/cdgs-release)]\n- **📝 说明**：\n\n#### [137] OG-Gaussian: Occupancy Based Street Gaussians for Autonomous Driving\n- **🧑‍🔬 作者**：Yedong Shen, Xinran Zhang, Yifan Duan, Shiqi Zhang, Heng Li, Yilong Wu, Jianmin Ji, Yanyong Zhang\n- **🏫 单位**：School of Computer Science and Technology, University of Science and Technology of China ⟐ School of Artificial Intelligence and Data Science, University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2502.14235.md)] [[arXiv:2502.14235](https://arxiv.org/abs/2502.14235)] [Code]\n- **📝 说明**：\n\n#### [138] GlossGau: Efficient Inverse Rendering for Glossy Surface with Anisotropic Spherical Gaussian\n- **🧑‍🔬 作者**：Bang Du, Runfa Blark Li, Chen Du, Truong Nguyen\n- **🏫 单位**：University of California San Diego\n- **🔗 链接**：[[中英摘要](./abs/2502.14129.md)] [[arXiv:2502.14129](https://arxiv.org/abs/2502.14129)] [Code]\n- **📝 说明**：\n\n#### [139] Inter3D: A Benchmark and Strong Baseline for Human-Interactive 3D Object Reconstruction\n- **🧑‍🔬 作者**：Vincent Ress, Jonas Meyer, Wei Zhang, David Skuddis, Uwe Soergel, Norbert Haala\n- **🏫 单位**：Shenzhen University ⟐ Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ) ⟐ Shanghai AI Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2502.14004.md)] [[arXiv:2502.14004](https://arxiv.org/abs/2502.14004)] [Code]\n- **📝 说明**：\n\n#### [140] 3D Gaussian Splatting aided Localization for Large and Complex Indoor-Environments\n- **🧑‍🔬 作者**：Vincent Ress, Jonas Meyer, Wei Zhang, David Skuddis, Uwe Soergel, Norbert Haala\n- **🏫 单位**：Institute for Photogrammetry and Geoinformatics, University of Stuttgart, Germany ⟐ Institute of Geomatics, University of Applied Sciences and Arts Northwestern Switzerland, Switzerland\n- **🔗 链接**：[[中英摘要](./abs/2502.13803.md)] [[arXiv:2502.13803](https://arxiv.org/abs/2502.13803)] [Code]\n- **📝 说明**：\n\n#### [141] GS-QA: Comprehensive Quality Assessment Benchmark for Gaussian Splatting View Synthesis\n- **🧑‍🔬 作者**：Pedro Martin, António Rodrigues, João Ascenso, Maria Paula Queluz\n- **🏫 单位**：Instituto de Telecomunicações, Instituto Superior Técnico, University of Lisbon\n- **🔗 链接**：[[中英摘要](./abs/2502.13196.md)] [[arXiv:2502.13196](https://arxiv.org/abs/2502.13196)] [Code]\n- **📝 说明**：\n\n#### [142] RadSplatter: Extending 3D Gaussian Splatting to Radio Frequencies for Wireless Radiomap Extrapolation\n- **🧑‍🔬 作者**：Yiheng Wang, Ye Xue, Shutao Zhang, Tsung-Hui Chang\n- **🏫 单位**：Shenzhen Research Institute of Big Data, Shenzhen ⟐ School of Data Science, The Chinese University of Hong Kong-Shenzhen, Shenzhen ⟐ School of Science and Engineering, The Chinese University of Hong Kong-Shenzhen, Shenzhen\n- **🔗 链接**：[[中英摘要](./abs/2502.12686.md)] [[arXiv:2502.12686](https://arxiv.org/abs/2502.12686)] [Code]\n- **📝 说明**：\n\n#### [143] Exploring the Versal AI Engine for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Kotaro 
Shimamura, Ayumi Ohno, Shinya Takamaeda-Yamazaki\n- **🏫 单位**：The University of Tokyo\n- **🔗 链接**：[[中英摘要](./abs/2502.11782.md)] [[arXiv:2502.11782](https://arxiv.org/abs/2502.11782)] [Code]\n- **📝 说明**：\n\n#### [144] GaussianMotion: End-to-End Learning of Animatable Gaussian Avatars with Pose Guidance from Text\n- **🧑‍🔬 作者**：Gyumin Shim, Sangmin Lee, Jaegul Choo\n- **🏫 单位**：Korea Advanced Institute of Science and Technology ⟐ Sungkyunkwan University\n- **🔗 链接**：[[中英摘要](./abs/2502.11642.md)] [[arXiv:2502.11642](https://arxiv.org/abs/2502.11642)] [Code]\n- **📝 说明**：\n\n#### [145] GS-GVINS: A Tightly-integrated GNSS-Visual-Inertial Navigation System Augmented by 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zelin Zhou, Saurav Uprety, Shichuang Nie, Hongzhou Yang\n- **🏫 单位**：University of Calgary\n- **🔗 链接**：[[中英摘要](./abs/2502.10975.md)] [[arXiv:2502.10975](https://arxiv.org/abs/2502.10975)] [Code]\n- **📝 说明**：\n\n#### [146] X-SG2S: Safe and Generalizable Gaussian Splatting with X-dimensional Watermarks\n- **🧑‍🔬 作者**：Zihang Cheng, Huiping Zhuang, Chun Li, Xin Meng, Ming Li, Fei Richard Yu\n- **🏫 单位**：South China University of Technology ⟐ ShenZhen MSU-BIT University ⟐ Peking University ⟐ Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)\n- **🔗 链接**：[[中英摘要](./abs/2502.10475.md)] [[arXiv:2502.10475](https://arxiv.org/abs/2502.10475)] [[Code](https://github.com/ChengLiDuoJi/XSGS)]\n- **📝 说明**：\n\n#### [147] DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior\n- **🧑‍🔬 作者**：Mingrui Li, Shuhong Liu, Tianchen Deng, Hongyu Wang\n- **🏫 单位**：Dalian University of Technology ⟐ The University of Tokyo ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2502.09111.md)] [[arXiv:2502.09111](https://arxiv.org/abs/2502.09111)] [Code]\n- **📝 说明**：\n\n#### [148] TranSplat: Surface Embedding-guided 3D Gaussian Splatting for Transparent Object Manipulation\n- **🧑‍🔬 作者**：Jeongyun Kim, Jeongho Noh, Dong-Guw Lee, Ayoung Kim\n- **🏫 单位**：SNU, Seoul, S. 
Korea\n- **🔗 链接**：[[中英摘要](./abs/2502.07840.md)] [[arXiv:2502.07840](https://arxiv.org/abs/2502.07840)] [[Code](https://github.com/jeongyun0609/TranSplat)]\n- **📝 说明**：\n\n#### [149] MeshSplats: Mesh-Based Rendering with Gaussian Splatting Initialization\n- **🧑‍🔬 作者**：Rafał Tobiasz, Grzegorz Wilczyński, Marcin Mazur, Sławomir Tadeja, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2502.07754.md)] [[arXiv:2502.07754](https://arxiv.org/abs/2502.07754)] [[Code](https://github.com/gwilczynski95/meshsplats)]\n- **📝 说明**：\n\n#### [150] SIREN: Semantic, Initialization-Free Registration of Multi-Robot Gaussian Splatting Maps\n- **🧑‍🔬 作者**：Ola Shorinwa, Jiankai Sun, Mac Schwager, Anirudha Majumdar\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2502.06519.md)] [[arXiv:2502.06519](https://arxiv.org/abs/2502.06519)] [Code]\n- **📝 说明**：\n\n#### [151] Three-Dimensional MRI Reconstruction with Gaussian Representations: Tackling the Undersampling Problem\n- **🧑‍🔬 作者**：Tengya Peng, Ruyi Zha, Zhen Li, Xiaofeng Liu, Qing Zou\n- **🏫 单位**：University of Texas Southwestern Medical Center ⟐ Australian National University ⟐ Yale University\n- **🔗 链接**：[[中英摘要](./abs/2502.06510.md)] [[arXiv:2502.06510](https://arxiv.org/abs/2502.06510)] [Code]\n- **📝 说明**：\n\n#### [152] Digital Twin Buildings: 3D Modeling, GIS Integration, and Visual Descriptions Using Gaussian Splatting, ChatGPT/Deepseek, and Google Maps Platform\n- **🧑‍🔬 作者**：Kyle Gao, Dening Lu, Liangzhi Li, Nan Chen, Hongjie He, Linlin Xu, Jonathan Li\n- **🏫 单位**：University of Waterloo ⟐ Xi’an Aeronautical University ⟐ Chang’an University ⟐ University of Calgary\n- **🔗 链接**：[[中英摘要](./abs/2502.05769.md)] [[arXiv:2502.05769](https://arxiv.org/abs/2502.05769)] [Code]\n- **📝 说明**：\n\n#### [153] High-Speed Dynamic 3D Imaging with Sensor Fusion Splatting\n- **🧑‍🔬 作者**：Zihao Zou, Ziyuan Qu, Xi Peng, Vivek Boominathan, Adithya Pediredla, Praneeth Chakravarthula\n- **🏫 单位**：University of North Carolina, Chapel Hill, USA ⟐ Dartmouth College, USA ⟐ Rice University, USA\n- **🔗 链接**：[[中英摘要](./abs/2502.04630.md)] [[arXiv:2502.04630](https://arxiv.org/abs/2502.04630)] [Code]\n- **📝 说明**：\n\n#### [154] GP-GS: Gaussian Processes for Enhanced Gaussian Splatting\n- **🧑‍🔬 作者**：Zhihao Guo, Jingxuan Su, Shenglin Wang, Jinlong Fan, Jing Zhang, Liangxiu Han, Peng Wang\n- **🏫 单位**：Manchester Metropolitan University ⟐ SECE, Peking University ⟐ Pengcheng Laboratory ⟐ Hangzhou Dianzi University ⟐ Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2502.02283.md)] [[arXiv:2502.02283](https://arxiv.org/abs/2502.02283)] [Code]\n- **📝 说明**：\n\n#### [155] LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation\n- **🧑‍🔬 作者**：Yang Zhou, Zongjin He, Qixuan Li, Chao Wang\n- **🏫 单位**：Shanghai University\n- **🔗 链接**：[[中英摘要](./abs/2502.01949.md)] [[arXiv:2502.01949](https://arxiv.org/abs/2502.01949)] [Code]\n- **📝 说明**：\n\n#### [156] Scalable 3D Gaussian Splatting-Based RF Signal Spatial Propagation Modeling\n- **🧑‍🔬 作者**：Kang Yang, Gaofeng Dong, Sijie Ji, Wan Du, Mani Srivastava\n- **🏫 单位**：University of California, Los Angeles ⟐ University of California, Merced\n- **🔗 链接**：[[中英摘要](./abs/2502.01826.md)] [[arXiv:2502.01826](https://arxiv.org/abs/2502.01826)] [Code]\n- **📝 说明**：\n\n#### [157] Radiant Foam: Real-Time Differentiable Ray Tracing\n- **🧑‍🔬 作者**：Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi\n- **🏫 单位**：Simon Fraser University ⟐ University of British Columbia ⟐ University of Toronto ⟐ Google 
DeepMind\n- **🔗 链接**：[[中英摘要](./abs/2502.01157.md)] [[arXiv:2502.01157](https://arxiv.org/abs/2502.01157)] [[Code](https://github.com/theialab/radfoam)]\n- **📝 说明**：\n\n#### [158] PhiP-G: Physics-Guided Text-to-3D Compositional Scene Generation\n- **🧑‍🔬 作者**：Qixuan Li, Chao Wang, Zongjin He, Yan Peng\n- **🏫 单位**：Shanghai University\n- **🔗 链接**：[[中英摘要](./abs/2502.00708.md)] [[arXiv:2502.00708](https://arxiv.org/abs/2502.00708)] [Code]\n- **📝 说明**：\n\n#### [159] EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis\n- **🧑‍🔬 作者**：Junuk Cha, Seongro Yoon, Valeriya Strizhkova, Francois Bremond, Seungryul Baek\n- **🏫 单位**：UNIST ⟐ Inria\n- **🔗 链接**：[[中英摘要](./abs/2502.00654.md)] [[arXiv:2502.00654](https://arxiv.org/abs/2502.00654)] [Code]\n- **📝 说明**：\n\n#### [160] RaySplats: Ray Tracing based Gaussian Splatting\n- **🧑‍🔬 作者**：Krzysztof Byrski, Marcin Mazur, Jacek Tabor, Tadeusz Dziarmaga, Marcin Kądziołka, Dawid Baran, Przemysław Spurek\n- **🏫 单位**：Jagiellonian University\n- **🔗 链接**：[[中英摘要](./abs/2501.19196.md)] [[arXiv:2501.19196](https://arxiv.org/abs/2501.19196)] [[Code](https://github.com/KByrski/RaySplatting)]\n- **📝 说明**：\n\n#### [161] JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zhoutao Sun, Xukun Shen, Yong Hu, Yuyou Zhong, Xueyang Zhou\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2501.19088.md)] [[arXiv:2501.19088](https://arxiv.org/abs/2501.19088)] [Code]\n- **📝 说明**：\n\n#### [162] StructuredField: Unifying Structured Geometry and Radiance Field\n- **🧑‍🔬 作者**：Kaiwen Song, Jinkai Cui, Zherui Qiu, Juyong Zhang\n- **🏫 单位**：University of Science and Technology of China\n- **🔗 链接**：[[中英摘要](./abs/2501.18152.md)] [[arXiv:2501.18152](https://arxiv.org/abs/2501.18152)] [Code]\n- **📝 说明**：\n\n#### [163] VoD-3DGS: View-opacity-Dependent 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Mateusz Nowak, Wojciech Jarosz, Peter Chin\n- **🏫 单位**：Dartmouth College\n- **🔗 链接**：[[中英摘要](./abs/2501.17978.md)] [[arXiv:2501.17978](https://arxiv.org/abs/2501.17978)] [Code]\n- **📝 说明**：\n\n#### [164] CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering\n- **🧑‍🔬 作者**：Xiaohan Sun, Yinghan Xu, John Dingliana, Carol O'Sullivan\n- **🏫 单位**：Trinity College Dublin\n- **🔗 链接**：[[中英摘要](./abs/2501.17792.md)] [[arXiv:2501.17792](https://arxiv.org/abs/2501.17792)] [Code]\n- **📝 说明**：\n\n#### [165] FeatureGS: Eigenvalue-Feature Optimization in 3D Gaussian Splatting for Geometrically Accurate and Artifact-Reduced Reconstruction\n- **🧑‍🔬 作者**：Miriam Jäger, Markus Hillemann, Boris Jutzi\n- **🏫 单位**：Karlsruhe Institute of Technology, Karlsruhe, Germany\n- **🔗 链接**：[[中英摘要](./abs/2501.17655.md)] [[arXiv:2501.17655](https://arxiv.org/abs/2501.17655)] [Code]\n- **📝 说明**：\n\n#### [166] Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds\n- **🧑‍🔬 作者**：Xiaohan Sun, Yinghan Xu, John Dingliana, Carol O'Sullivan\n- **🏫 单位**：Trinity College Dublin, Ireland\n- **🔗 链接**：[[中英摘要](./abs/2501.17085.md)] [[arXiv:2501.17085](https://arxiv.org/abs/2501.17085)] [Code]\n- **📝 说明**：\n\n#### [167] GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Jiajun Dong, Chengkun Wang, Wenzhao Zheng, Lei Chen, Jiwen Lu, Yansong Tang\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2501.15619.md)] [[arXiv:2501.15619](https://arxiv.org/abs/2501.15619)] [[Code](https://github.com/ChrisDong-THU/GaussianToken)]\n- **📝 说明**：\n\n#### [168] Towards Better Robustness: Progressively Joint Pose-3DGS Learning for Arbitrarily Long Videos\n- **🧑‍🔬 
作者**：Zhen-Hui Dong, Sheng Ye, Yu-Hui Wen, Nannan Li, Yong-Jin Liu\n- **🏫 单位**：Tsinghua University ⟐ Beijing Jiaotong University ⟐ Maritime University\n- **🔗 链接**：[[中英摘要](./abs/2501.15096.md)] [[arXiv:2501.15096](https://arxiv.org/abs/2501.15096)] [Code]\n- **📝 说明**：\n\n#### [169] HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion\n- **🧑‍🔬 作者**：Yingzhi Tang, Qijian Zhang, Junhui Hou\n- **🏫 单位**：City University of Hong Kong ⟐ TiMi L1 Studio of Tencent Games\n- **🔗 链接**：[[中英摘要](./abs/2501.15008.md)] [[arXiv:2501.15008](https://arxiv.org/abs/2501.15008)] [Code]\n- **📝 说明**：\n\n#### [170] 3DGS2: Near Second-order Converging 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Lei Lan, Tianjia Shao, Zixuan Lu, Yu Zhang, Chenfanfu Jiang, Yin Yang\n- **🏫 单位**：University of Utah ⟐ Zhejiang University ⟐ University of California, Los Angeles\n- **🔗 链接**：[[中英摘要](./abs/2501.13975.md)] [[arXiv:2501.13975](https://arxiv.org/abs/2501.13975)] [Code]\n- **📝 说明**：\n\n#### [171] GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting\n- **🧑‍🔬 作者**：Junzhe Jiang, Chun Gu, Yurui Chen, Li Zhang\n- **🏫 单位**：School of Data Science, Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2501.13971.md)] [[arXiv:2501.13971](https://arxiv.org/abs/2501.13971)] [[Code](https://github.com/fudan-zvg/GS-LiDAR)]\n- **📝 说明**：\n\n#### [172] GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression\n- **🧑‍🔬 作者**：Francesco Di Sario, Riccardo Renzulli, Marco Grangetto, Akihiro Sugimoto, Enzo Tartaglione\n- **🏫 单位**：University of Turin ⟐ National Institute of Informatics, Japan ⟐ Institut Polytechnique de Paris, France\n- **🔗 链接**：[[中英摘要](./abs/2501.13558.md)] [[arXiv:2501.13558](https://arxiv.org/abs/2501.13558)] [Code]\n- **📝 说明**：\n\n#### [173] MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance\n- **🧑‍🔬 作者**：Wooseok Song, Seunggyu Chang, Jaejun Yoo\n- **🏫 单位**：UNIST ⟐ NAVER Cloud\n- **🔗 链接**：[[中英摘要](./abs/2501.13449.md)] [[arXiv:2501.13449](https://arxiv.org/abs/2501.13449)] [Code]\n- **📝 说明**：\n\n#### [174] GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization\n- **🧑‍🔬 作者**：Jaewon Lee, Mangyu Kong, Minseong Park, Euntai Kim\n- **🏫 单位**：Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2501.13417.md)] [[arXiv:2501.13417](https://arxiv.org/abs/2501.13417)] [Code]\n- **📝 说明**：\n\n#### [175] VIGS SLAM: IMU-based Large-Scale 3D Gaussian Splatting SLAM\n- **🧑‍🔬 作者**：Gyuhyeon Pak, Euntai Kim\n- **🏫 单位**：Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2501.13402.md)] [[arXiv:2501.13402](https://arxiv.org/abs/2501.13402)] [Code]\n- **📝 说明**：\n\n#### [176] Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos\n- **🧑‍🔬 作者**：Xianrui Luo, Juewen Peng, Zhongang Cai, Lei Yang, Fan Yang, Zhiguo Cao, Guosheng Lin\n- **🏫 单位**：Nanyang Technological University ⟐ Huazhong University of Science and Technology ⟐ SenseTime Research\n- **🔗 链接**：[[中英摘要](./abs/2501.13335.md)] [[arXiv:2501.13335](https://arxiv.org/abs/2501.13335)] [Code]\n- **📝 说明**：\n\n#### [177] Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes\n- **🧑‍🔬 作者**：Yuang Shi, Simone Gasparini, Géraldine Morin, Chenggang Yang, Wei Tsang Ooi\n- **🏫 单位**：National University of Singapore ⟐ IRIT - University of Toulouse\n- **🔗 链接**：[[中英摘要](./abs/2501.13045.md)] [[arXiv:2501.13045](https://arxiv.org/abs/2501.13045)] [Code]\n- **📝 说明**：\n\n#### [178] DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions\n- **🧑‍🔬 
作者**：Vishagar Arunan, Saeedha Nazar, Hashiru Pramuditha, Vinasirajan Viruthshaan, Sameera Ramasinghe, Simon Lucey, Ranga Rodrigo\n- **🏫 单位**：University of Moratuwa ⟐ University of Adelaide\n- **🔗 链接**：[[中英摘要](./abs/2501.12369.md)] [[arXiv:2501.12369](https://arxiv.org/abs/2501.12369)] [[Code](https://github.com/viruthshaan/darb-splatting)]\n- **📝 说明**：\n\n#### [179] GaussianVideo: Efficient Video Representation Through 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Longan Wang, Yuang Shi, Wei Tsang Ooi\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2501.12060.md)] [[arXiv:2501.12060](https://arxiv.org/abs/2501.12060)] [Code]\n- **📝 说明**：\n\n#### [180] RDG-GS: Relative Depth Guidance with Gaussian Splatting for Real-time Sparse-View 3D Rendering\n- **🧑‍🔬 作者**：Chenlu Zhan, Yufei Zhang, Yu Lin, Gaoang Wang, Hongwei Wang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2501.11102.md)] [[arXiv:2501.11102](https://arxiv.org/abs/2501.11102)] [Code]\n- **📝 说明**：\n\n#### [181] Car-GS: Addressing Reflective and Transparent Surface Challenges in 3D Car Reconstruction\n- **🧑‍🔬 作者**：Congcong Li, Jin Wang, Xiaomeng Wang, Xingchen Zhou, Wei Wu, Yuzhi Zhang, Tongyi Cao\n- **🏫 单位**：DeepRoute.AI\n- **🔗 链接**：[[中英摘要](./abs/2501.11020.md)] [[arXiv:2501.11020](https://arxiv.org/abs/2501.11020)] [Code]\n- **📝 说明**：\n\n#### [182] CityLoc: 6 DoF Localization of Text Descriptions in Large-Scale Scenes with Gaussian Representation\n- **🧑‍🔬 作者**：Qi Ma, Runyi Yang, Bin Ren, Ender Konukoglu, Luc Van Gool, Danda Pani Paudel\n- **🏫 单位**：Computer Vision Lab, ETH Zurich ⟐ INSAIT, Sofia University ⟐ University of Pisa ⟐ University of Trento\n- **🔗 链接**：[[中英摘要](./abs/2501.08982.md)] [[arXiv:2501.08982](https://arxiv.org/abs/2501.08982)] [Code]\n- **📝 说明**：\n\n#### [183] GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping\n- **🧑‍🔬 作者**：Sheng Hong, Chunran Zheng, Yishu Shen, Changze Li, Fu Zhang, Tong Qin, Shaojie Shen\n- **🏫 单位**：The Hong Kong University of Science and Technology ⟐ The University of Hong Kong ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2501.08672.md)] [[arXiv:2501.08672](https://arxiv.org/abs/2501.08672)] [Code]\n- **📝 说明**：\n\n#### [184] VINGS-Mono: Visual-Inertial Gaussian Splatting Monocular SLAM in Large Scenes\n- **🧑‍🔬 作者**：Ke Wu, Zicheng Zhang, Muer Tie, Ziqing Ai, Zhongxue Gan, Wenchao Ding\n- **🏫 单位**：Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2501.08286.md)] [[arXiv:2501.08286](https://arxiv.org/abs/2501.08286)] [Code]\n- **📝 说明**：\n\n#### [185] Evaluating Human Perception of Novel View Synthesis: Subjective Quality Assessment of Gaussian Splatting and NeRF in Dynamic Scenes\n- **🧑‍🔬 作者**：Yuhang Zhang, Joshua Maraval, Zhengyu Zhang, Nicolas Ramin, Shishun Tian, Lu Zhang\n- **🏫 单位**：Shenzhen University ⟐ IRT ⟐ Guangzhou University ⟐ Univ Rennes\n- **🔗 链接**：[[中英摘要](./abs/2501.08072.md)] [[arXiv:2501.08072](https://arxiv.org/abs/2501.08072)] [Code]\n- **📝 说明**：\n\n#### [186] 3DGS-to-PC: Convert a 3D Gaussian Splatting Scene into a Dense Point Cloud or Mesh\n- **🧑‍🔬 作者**：Lewis A G Stuart, Michael P Pound\n- **🏫 单位**：University of Nottingham\n- **🔗 链接**：[[中英摘要](./abs/2501.07478.md)] [[arXiv:2501.07478](https://arxiv.org/abs/2501.07478)] [[Code](https://github.com/Lewis-Stuart-11/3DGS-to-PC)]\n- **📝 说明**：\n\n#### [187] F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Consistent Gaussian Splatting\n- **🧑‍🔬 作者**：Yuxin Wang, Qianyi Wu, Dan Xu\n- **🏫 单位**：HKUST ⟐ Monash University\n- **🔗 
链接**：[[中英摘要](./abs/2501.06714.md)] [[arXiv:2501.06714](https://arxiv.org/abs/2501.06714)] [[Code](https://github.com/W-Ted/F3D-Gaus)]\n- **📝 说明**：\n\n#### [188] Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation\n- **🧑‍🔬 作者**：Xuyi Meng, Chen Wang, Jiahui Lei, Kostas Daniilidis, Jiatao Gu, Lingjie Liu\n- **🏫 单位**：University of Pennsylvania ⟐ Apple\n- **🔗 链接**：[[中英摘要](./abs/2501.05427.md)] [[arXiv:2501.05427](https://arxiv.org/abs/2501.05427)] [Code]\n- **📝 说明**：\n\n#### [189] GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting\n- **🧑‍🔬 作者**：Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem\n- **🏫 单位**：Koç University ⟐ Adobe Research ⟐ Hacettepe University\n- **🔗 链接**：[[中英摘要](./abs/2501.04782.md)] [[arXiv:2501.04782](https://arxiv.org/abs/2501.04782)] [Code]\n- **📝 说明**：\n\n#### [190] Spatiotemporal Gaussian Optimization for 4D Cone Beam CT Reconstruction from Sparse Projections\n- **🧑‍🔬 作者**：Yabo Fu, Hao Zhang, Weixing Cai, Huiqiao Xie, Licheng Kuo, Laura Cervino, Jean Moran, Xiang Li, Tianfang Li\n- **🏫 单位**：Memorial Sloan Kettering Cancer Center\n- **🔗 链接**：[[中英摘要](./abs/2501.04140.md)] [[arXiv:2501.04140](https://arxiv.org/abs/2501.04140)] [[Code](https://github.com/fuyabo/4DGS_for_4DCBCT/tree/main)]\n- **📝 说明**：\n\n#### [191] ZDySS -- Zero-Shot Dynamic Scene Stylization using Gaussian Splatting\n- **🧑‍🔬 作者**：Abhishek Saroha, Florian Hofherr, Mariia Gladkova, Cecilia Curreli, Or Litany, Daniel Cremers\n- **🏫 单位**：Technical University of Munich ⟐ Munich Center for Machine Learning ⟐ Technion ⟐ Nvidia\n- **🔗 链接**：[[中英摘要](./abs/2501.03875.md)] [[arXiv:2501.03875](https://arxiv.org/abs/2501.03875)] [Code]\n- **📝 说明**：\n\n#### [192] Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs\n- **🧑‍🔬 作者**：Soonbin Lee, Fangwen Shu, Yago Sanchez, Thomas Schierl, Cornelius Hellge\n- **🏫 单位**：Fraunhofer Heinrich Hertz Institute, HHI\n- **🔗 链接**：[[中英摘要](./abs/2501.03399.md)] [[arXiv:2501.03399](https://arxiv.org/abs/2501.03399)] [[Code](https://github.com/fraunhoferhhi/CodecGS/)]\n- **📝 说明**：\n\n#### [193] Gaussian Masked Autoencoders\n- **🧑‍🔬 作者**：Jathushan Rajasegaran, Xinlei Chen, Rulilong Li, Christoph Feichtenhofer, Jitendra Malik, Shiry Ginosar\n- **🏫 单位**：Meta, FAIR ⟐ UC Berkeley ⟐ Toyota Technological Institute at Chicago\n- **🔗 链接**：[[中英摘要](./abs/2501.03229.md)] [[arXiv:2501.03229](https://arxiv.org/abs/2501.03229)] [Code]\n- **📝 说明**：\n\n#### [194] GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking\n- **🧑‍🔬 作者**：Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yijin Li, Fu-Yun Wang, Hongsheng Li\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Centre for Perceptual and Interactive Intelligence ⟐ Avolution AI\n- **🔗 链接**：[[中英摘要](./abs/2501.02690.md)] [[arXiv:2501.02690](https://arxiv.org/abs/2501.02690)] [[Code](https://github.com/wkbian/GS-DiT)]\n- **📝 说明**：\n\n#### [195] CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction\n- **🧑‍🔬 作者**：Chenhao Zhang, Yuanping Cao, Lei Zhang\n- **🏫 单位**：Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2501.01695.md)] [[arXiv:2501.01695](https://arxiv.org/abs/2501.01695)] [Code]\n- **📝 说明**：\n\n#### [196] PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping\n- **🧑‍🔬 作者**：Tengfei Wang, Xin Wang, Yongmao Hou, Yiwei Xu, Wendi Zhang, Zongqian Zhan\n- **🏫 单位**：Wuhan University\n- **🔗 
链接**：[[中英摘要](./abs/2501.01677.md)] [[arXiv:2501.01677](https://arxiv.org/abs/2501.01677)] [[Code](https://github.com/TFWang-9527/PG-SAG)]\n- **📝 说明**：\n\n#### [197] Deformable Gaussian Splatting for Efficient and High-Fidelity Reconstruction of Surgical Scenes\n- **🧑‍🔬 作者**：Jiwei Shan, Zeyu Cai, Cheng-Tai Hsieh, Shing Shin Cheng, Hesheng Wang\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2501.01101.md)] [[arXiv:2501.01101](https://arxiv.org/abs/2501.01101)] [Code]\n- **📝 说明**：\n\n#### [198] EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy\n- **🧑‍🔬 作者**：Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang\n- **🏫 单位**：Nanjing University ⟐ Nanjing University of Aeronautics and Astronautics ⟐ China Mobile\n- **🔗 链接**：[[中英摘要](./abs/2501.01003.md)] [[arXiv:2501.01003](https://arxiv.org/abs/2501.01003)] [Code]\n- **📝 说明**：\n\n#### [199] Gaussian Building Mesh (GBM): Extract a Building's 3D Mesh with Google Earth and Gaussian Splatting\n- **🧑‍🔬 作者**：Kyle Gao, Liangzhi Li, Hongjie He, Dening Lu, Linlin Xu, Jonathan Li\n- **🏫 单位**：University of Calgary ⟐ University of Waterloo ⟐ Chang’an University\n- **🔗 链接**：[[中英摘要](./abs/2501.00625.md)] [[arXiv:2501.00625](https://arxiv.org/abs/2501.00625)] [Code]\n- **📝 说明**：\n\n#### [200] DreamDrive: Generative 4D Scene Modeling from Street View Images\n- **🧑‍🔬 作者**：Jiageng Mao, Boyi Li, Boris Ivanovic, Yuxiao Chen, Yan Wang, Yurong You, Chaowei Xiao, Danfei Xu, Marco Pavone, Yue Wang\n- **🏫 单位**：NVIDIA Research ⟐ University of Southern California\n- **🔗 链接**：[[中英摘要](./abs/2501.00601.md)] [[arXiv:2501.00601](https://arxiv.org/abs/2501.00601)] [Code]\n- **📝 说明**：\n\n#### [201] PanoSLAM: Panoptic 3D Scene Reconstruction via Gaussian SLAM\n- **🧑‍🔬 作者**：Runnan Chen, Zhaoqing Wang, Jiepeng Wang, Yuexin Ma, Mingming Gong, Wenping Wang, Tongliang Liu\n- **🏫 单位**：The University of Sydney ⟐ The University of Hong Kong ⟐ ShanghaiTech University ⟐ The University of Melbourne ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2501.00352.md)] [[arXiv:2501.00352](https://arxiv.org/abs/2501.00352)] [[Code](https://github.com/runnanchen/PanoSLAM)]\n- **📝 说明**：\n\n#### [202] SG-Splatting: Accelerating 3D Gaussian Splatting with Spherical Gaussians\n- **🧑‍🔬 作者**：Yiwen Wang, Siyuan Chen, Ran Yi\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2501.00342.md)] [[arXiv:2501.00342](https://arxiv.org/abs/2501.00342)] [Code]\n- **📝 说明**：\n\n#### [203] OVGaussian: Generalizable 3D Gaussian Segmentation with Open Vocabularies\n- **🧑‍🔬 作者**：Runnan Chen, Xiangyu Sun, Zhaoqing Wang, Youquan Liu, Jiepeng Wang, Lingdong Kong, Jiankang Deng, Mingming Gong, Liang Pan, Wenping Wang, Tongliang Liu\n- **🏫 单位**：The University of Sydney ⟐ Fudan University ⟐ The University of Hong Kong ⟐ National University of Singapore ⟐ Imperial College London ⟐ The University of Melbourne ⟐ Shanghai AI Laboratory ⟐ Texas A&M University\n- **🔗 链接**：[[中英摘要](./abs/2501.00326.md)] [[arXiv:2501.00326](https://arxiv.org/abs/2501.00326)] [[Code](https://github.com/runnanchen/OVGaussian)]\n- **📝 说明**：\n"
  },
  {
    "path": "archive/202507.md",
    "content": "# 3D Gaussian Splatting Papers Before 2025/04/01\n\n#### [1] MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction\n- **🧑‍🔬 作者**：Antoine Guédon, Diego Gomez, Nissim Maruani, Bingchen Gong, George Drettakis, Maks Ovsjanikov\n- **🏫 单位**：École Polytechnique ⟐ Université Côte d’Azur\n- **🔗 链接**：[[中英摘要](./abs/2506.24096.md)] [[arXiv:2506.24096](https://arxiv.org/abs/2506.24096)] [Code]\n- **📝 说明**:\n\n#### [2] AttentionGS: Towards Initialization-Free 3D Gaussian Splatting via Structural Attention\n- **🧑‍🔬 作者**：Ziao Liu, Zhenjia Li, Yifeng Shi, Xiangang Li\n- **🏫 单位**：Wuhan University ⟐ BEKE.inc\n- **🔗 链接**：[[中英摘要](./abs/2506.23611.md)] [[arXiv:2506.23611](https://arxiv.org/abs/2506.23611)] [Code]\n- **📝 说明**:\n\n#### [3] TVG-SLAM: Robust Gaussian Splatting SLAM with Tri-view Geometric Constraints\n- **🧑‍🔬 作者**：Zhen Tan, Xieyuanli Chen, Lei Feng, Yangbing Ge, Shuaifeng Zhi, Jiaxiong Liu, Dewen Hu\n- **🏫 单位**：National University of Defense Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.23207.md)] [[arXiv:2506.23207](https://arxiv.org/abs/2506.23207)] [Code]\n- **📝 说明**:\n\n#### [4] Confident Splatting: Confidence-Based Compression of 3D Gaussian Splatting via Learnable Beta Distributions\n- **🧑‍🔬 作者**：AmirHossein Naghi Razlighi, Elaheh Badali Golezani, Shohreh Kasaei\n- **🏫 单位**：Sharif University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.22973.md)] [[arXiv:2506.22973](https://arxiv.org/abs/2506.22973)] [[Code](https://github.com/amirhossein-razlighi/Confident-Splatting)]\n- **📝 说明**:\n\n#### [5] Part Segmentation and Motion Estimation for Articulated Objects with Dynamic 3D Gaussians\n- **🧑‍🔬 作者**：Jun-Jee Chao, Qingyuan Jiang, Volkan Isler\n- **🏫 单位**：University of Minnesota ⟐ The University of Texas at Austin\n- **🔗 链接**：[[中英摘要](./abs/2506.22718.md)] [[arXiv:2506.22718](https://arxiv.org/abs/2506.22718)] [Code]\n- **📝 说明**:\n\n#### [6] Few-Shot Identity Adaptation for 3D Talking Heads via Global Gaussian Field\n- **🧑‍🔬 作者**：Hong Nie, Fuyuan Cao, Lu Chen, Fengxin Chen, Yuefeng Zou, Jun Yu\n- **🏫 单位**：USTC\n- **🔗 链接**：[[中英摘要](./abs/2506.22044.md)] [[arXiv:2506.22044](https://arxiv.org/abs/2506.22044)] [[Code](https://github.com/gme-hong/FIAG)]\n- **📝 说明**:\n\n#### [7] SAR-GS: 3D Gaussian Splatting for Synthetic Aperture Radar Target Reconstruction\n- **🧑‍🔬 作者**：Aobo Li, Zhengxin Lei, Jiangtao Wei, Feng Xu\n- **🏫 单位**：Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2506.21633.md)] [[arXiv:2506.21633](https://arxiv.org/abs/2506.21633)] [[Code](https://github.com/gme-hong/FIAG)]\n- **📝 说明**:\n\n#### [8] MADrive: Memory-Augmented Driving Scene Modeling\n- **🧑‍🔬 作者**：Polina Karpikova, Daniil Selikhanovych, Kirill Struminsky, Ruslan Musaev, Maria Golitsyna, Dmitry Baranchuk\n- **🏫 单位**：Yandex ⟐ Yandex Research ⟐ HSE University ⟐ Skoltech\n- **🔗 链接**：[[中英摘要](./abs/2506.21520.md)] [[arXiv:2506.21520](https://arxiv.org/abs/2506.21520)] [Code]\n- **📝 说明**:\n\n#### [9] Geometry and Perception Guided Gaussians for Multiview-consistent 3D Generation from a Single Image\n- **🧑‍🔬 作者**：Pufan Li, Bi'an Du, Wei Hu\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2506.21152.md)] [[arXiv:2506.21152](https://arxiv.org/abs/2506.21152)] [Code]\n- **📝 说明**:\n\n#### [10] Virtual Memory for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jonathan Haberl, Philipp Fleck, Clemens Arth\n- **🏫 单位**： Graz University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.19415.md)] [[arXiv:2506.19415](https://arxiv.org/abs/2506.19415)] [Code]\n- **📝 说明**:\n\n#### [11] GRAND-SLAM: Local Optimization 
for Globally Consistent Large-Scale Multi-Agent Gaussian SLAM\n- **🧑‍🔬 作者**：Annika Thomas, Aneesa Sonawalla, Alex Rose, Jonathan P. How\n- **🏫 单位**： Massachusetts Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.18885.md)] [[arXiv:2506.18885](https://arxiv.org/abs/2506.18885)] [Code]\n- **📝 说明**:\n\n#### [12] 4Real-Video-V2: Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation\n- **🧑‍🔬 作者**：Chaoyang Wang, Ashkan Mirzaei, Vidit Goel, Willi Menapace, Aliaksandr Siarohin, Avalon Vinella, Michael Vasilkovsky, Ivan Skorokhodov, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Peter Wonka\n- **🏫 单位**：Snap Inc. ⟐ KAUST\n- **🔗 链接**：[[中英摘要](./abs/2506.18839.md)] [[arXiv:2506.18839](https://arxiv.org/abs/2506.18839)] [Code]\n- **📝 说明**:\n\n#### [13] BulletGen: Improving 4D Reconstruction with Bullet-Time Generation\n- **🧑‍🔬 作者**：Denys Rozumnyi, Jonathon Luiten, Numair Khan, Johannes Schönberger, Peter Kontschieder\n- **🏫 单位**：Meta Reality Labs\n- **🔗 链接**：[[中英摘要](./abs/2506.18601.md)] [[arXiv:2506.18601](https://arxiv.org/abs/2506.18601)] [Code]\n- **📝 说明**:\n\n#### [14] 2D Triangle Splatting for Direct Differentiable Mesh Training\n- **🧑‍🔬 作者**：Kaifeng Sheng, Zheng Zhou, Yingliang Peng, Qianwei Wang\n- **🏫 单位**：AMAP\n- **🔗 链接**：[[中英摘要](./abs/2506.18575.md)] [[arXiv:2506.18575](https://arxiv.org/abs/2506.18575)] [[Code](https://github.com/GaodeRender/triangle-splatting)]\n- **📝 说明**:\n\n#### [15] Part2GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Tianjiao Yu, Vedant Shah, Muntasir Wahed, Ying Shen, Kiet A. Nguyen, Ismini Lourentzou\n- **🏫 单位**：University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2506.17212.md)] [[arXiv:2506.17212](https://arxiv.org/abs/2506.17212)] [[Code](https://plan-lab.github.io/projects/part2gs)]\n- **📝 说明**:\n\n#### [16] GraphGSOcc: Semantic-Geometric Graph Transformer with Dynamic-Static Decoupling for 3D Gaussian Splatting-based Occupancy Prediction\n- **🧑‍🔬 作者**：Ke Song, Yunhe Wu, Chunchit Siu, Huiyuan Xiong\n- **🏫 单位**：Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2506.14825.md)] [[arXiv:2506.14825](https://arxiv.org/abs/2506.14825)] [Code]\n- **📝 说明**:\n\n#### [17] 3DGS-IEval-15K: A Large-scale Image Quality Evaluation Database for 3D Gaussian-Splatting\n- **🧑‍🔬 作者**：Yuke Xing, Jiarui Wang, Peizhi Niu, Wenjie Huang, Guangtao Zhai, Yiling Xu\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2506.14642.md)] [[arXiv:2506.14642](https://arxiv.org/abs/2506.14642)] [[Code](https://github.com/YukeXing/3DGS-IEval-15K)]\n- **📝 说明**:\n\n#### [18] HRGS: Hierarchical Gaussian Splatting for Memory-Efficient High-Resolution 3D Reconstruction\n- **🧑‍🔬 作者**：Changbai Li, Haodong Zhu, Hanlin Chen, Juan Zhang, Tongfei Chen, Shuo Yang, Shuwei Shao, Wenhao Dong, Baochang Zhang\n- **🏫 单位**：Beihang University ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2506.14229.md)] [[arXiv:2506.14229](https://arxiv.org/abs/2506.14229)] [Code]\n- **📝 说明**:\n\n#### [19] GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation\n- **🧑‍🔬 作者**：Ying Chai, Litao Deng, Ruizhi Shao, Jiajun Zhang, Liangjun Xing, Hongwen Zhang, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ Beijing Normal University ⟐ Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2506.14135.md)] [[arXiv:2506.14135](https://arxiv.org/abs/2506.14135)] [Code]\n- **📝 说明**:\n\n#### [20] GRaD-Nav++: Vision-Language Model Enabled Visual Drone 
Navigation with Gaussian Radiance Fields and Differentiable Dynamics\n- **🧑‍🔬 作者**：Qianzhong Chen, Naixiang Gao, Suning Huang, JunEn Low, Timothy Chen, Jiankai Sun, Mac Schwager\n- **🏫 单位**：Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2506.14009.md)] [[arXiv:2506.14009](https://arxiv.org/abs/2506.14009)] [Code]\n- **📝 说明**:\n\n#### [21] PF-LHM: 3D Animatable Avatar Reconstruction from Pose-free Articulated Human Images\n- **🧑‍🔬 作者**：Lingteng Qiu, Peihao Li, Qi Zuo, Xiaodong Gu, Yuan Dong, Weihao Yuan, Siyu Zhu, Xiaoguang Han, Guanying Chen, Zilong Dong\n- **🏫 单位**：Tongyi Lab ⟐ Sun Yat-sen University ⟐ CUHKSZ ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2506.13766.md)] [[arXiv:2506.13766](https://arxiv.org/abs/2506.13766)] [Code]\n- **📝 说明**:\n\n#### [22] Micro-macro Gaussian Splatting with Enhanced Scalability for Unconstrained Scene Reconstruction\n- **🧑‍🔬 作者**：Yihui Li, Chengxin Lv, Hongyu Yang, Di Huang\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2506.13516.md)] [[arXiv:2506.13516](https://arxiv.org/abs/2506.13516)] [[Code](https://github.com/Kidleyh/SMW-GS)]\n- **📝 说明**:\n\n#### [23] Efficient multi-view training for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Minhyuk Choi, Injae Kim, Hyunwoo J. Kim\n- **🏫 单位**：Korea University ⟐ KAIST\n- **🔗 链接**：[[中英摘要](./abs/2506.12727.md)] [[arXiv:2506.12727](https://arxiv.org/abs/2506.12727)] [Code]\n- **📝 说明**:\n\n#### [24] Generative 4D Scene Gaussian Splatting with Object View-Synthesis Priors\n- **🧑‍🔬 作者**：Wen-Hsuan Chu, Lei Ke, Jianmeng Liu, Mingxiao Huo, Pavel Tokmakov, Katerina Fragkiadaki\n- **🏫 单位**：Carnegie Mellon University ⟐ Toyota Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2506.12716.md)] [[arXiv:2506.12716](https://arxiv.org/abs/2506.12716)] [Code]\n- **📝 说明**: Extended version of CVPR paper\n\n#### [25] PointGS: Point Attention-Aware Sparse View Synthesis with Gaussian Splatting\n- **🧑‍🔬 作者**：Lintao Xiang, Hongpei Zheng, Yating Huang, Qijun Yang, Hujun Yin\n- **🏫 单位**：The University of Manchester\n- **🔗 链接**：[[中英摘要](./abs/2506.10335.md)] [[arXiv:2506.10335](https://arxiv.org/abs/2506.10335)] [Code]\n- **📝 说明**:\n\n#### [26] DGS-LRM: Real-Time Deformable 3D Gaussian Reconstruction From Monocular Videos\n- **🧑‍🔬 作者**：Chieh Hubert Lin, Zhaoyang Lv, Songyin Wu, Zhen Xu, Thu Nguyen-Phuoc, Hung-Yu Tseng, Julian Straub, Numair Khan, Lei Xiao, Ming-Hsuan Yang, Yuheng Ren, Richard Newcombe, Zhao Dong, Zhengqin Li\n- **🏫 单位**：Meta ⟐ UC Merced ⟐ UC Santa Barbara\n- **🔗 链接**：[[中英摘要](./abs/2506.09997.md)] [[arXiv:2506.09997](https://arxiv.org/abs/2506.09997)] [Code]\n- **📝 说明**:\n\n#### [27] Self-Supervised Multi-Part Articulated Objects Modeling via Deformable Gaussian Splatting and Progressive Primitive Segmentation\n- **🧑‍🔬 作者**：Haowen Wang, Xiaoping Yuan, Zhao Jin, Zhen Zhao, Zhengping Che, Yousong Xue, Jin Tian, Yakun Huang, Jian Tang\n- **🏫 单位**：Anhui University ⟐ Beijing Innovation Center of Humanoid Robotics ⟐ Beijing Institute of Architecture Design ⟐ Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2506.09663.md)] [[arXiv:2506.09663](https://arxiv.org/abs/2506.09663)] [Code]\n- **📝 说明**:\n\n#### [28] SemanticSplat: Feed-Forward 3D Scene Understanding with Language-Aware Gaussian Fields\n- **🧑‍🔬 作者**：Qijing Li, Jingxiang Sun, Liang An, Zhaoqi Su, Hongwen Zhang, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ Beijing Normal University\n- **🔗 链接**：[[中英摘要](./abs/2506.09565.md)] [[arXiv:2506.09565](https://arxiv.org/abs/2506.09565)] [Code]\n- **📝 说明**:\n\n#### [29] Gaussian Herding across Pens: An Optimal 
Transport Perspective on Global Gaussian Reduction for 3DGS\n- **🧑‍🔬 作者**：Tao Wang, Mengyu Li, Geduo Zeng, Cheng Meng, Qiong Zhang\n- **🏫 单位**：Renmin University of China\n- **🔗 链接**：[[中英摘要](./abs/2506.09534.md)] [[arXiv:2506.09534](https://arxiv.org/abs/2506.09534)] [Code]\n- **📝 说明**:\n\n#### [30] TinySplat: Feedforward Approach for Generating Compact 3D Scene Representation\n- **🧑‍🔬 作者**：Zetian Song, Jiaye Fu, Jiaqi Zhang, Xiaohan Lu, Chuanmin Jia, Siwei Ma, Wen Gao\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2506.09479.md)] [[arXiv:2506.09479](https://arxiv.org/abs/2506.09479)] [Code]\n- **📝 说明**:\n\n#### [31] ODG: Occupancy Prediction Using Dual Gaussians\n- **🧑‍🔬 作者**：Yunxiao Shi, Yinhao Zhu, Shizhong Han, Jisoo Jeong, Amin Ansari, Hong Cai, Fatih Porikli\n- **🏫 单位**：Qualcomm AI Research ⟐ Qualcomm Technologies, Inc\n- **🔗 链接**：[[中英摘要](./abs/2506.09417.md)] [[arXiv:2506.09417](https://arxiv.org/abs/2506.09417)] [Code]\n- **📝 说明**:\n\n#### [32] UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images\n- **🧑‍🔬 作者**：Qijian Tian, Xin Tan, Jingyu Gong, Yuan Xie, Lizhuang Ma\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ East China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2506.09378.md)] [[arXiv:2506.09378](https://arxiv.org/abs/2506.09378)] [Code]\n- **📝 说明**:\n\n#### [33] STREAMINGGS: Voxel-Based Streaming 3D Gaussian Splatting with Memory Optimization and Architectural Support\n- **🧑‍🔬 作者**：Chenqi Zhang, Yu Feng, Jieru Zhao, Guangda Liu, Wenchao Ding, Chentao Wu, Minyi Guo\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2506.09070.md)] [[arXiv:2506.09070](https://arxiv.org/abs/2506.09070)] [Code]\n- **📝 说明**:\n\n#### [34] StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams\n- **🧑‍🔬 作者**：Zike Wu, Qi Yan, Xuanyu Yi, Lele Wang, Renjie Liao\n- **🏫 单位**：University of British Columbia ⟐ Vector Institute for AI ⟐ Canada CIFAR AI Chair ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2506.08862.md)] [[arXiv:2506.08862](https://arxiv.org/abs/2506.08862)] [[Code](https://github.com/nickwzk/StreamSplat)]\n- **📝 说明**:\n\n#### [35] Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Keyi Liu, Weidong Yang, Ben Fei, Ying He\n- **🏫 单位**：Fudan University ⟐ The Chinese University of Hong Kong ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2506.08777.md)] [[arXiv:2506.08777](https://arxiv.org/abs/2506.08777)] [Code]\n- **📝 说明**:\n\n#### [36] SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting\n- **🧑‍🔬 作者**：Mengjiao Ma, Qi Ma, Yue Li, Jiahuan Cheng, Runyi Yang, Bin Ren, Nikola Popovic, Mingqiang Wei, Nicu Sebe, Luc Van Gool, Theo Gevers, Martin R. 
Oswald, Danda Pani Paudel\n- **🏫 单位**：Sofia University ⟐ Nanjing University of Aeronautics and Astronautics ⟐ ETH Zürich ⟐ University of Amsterdam ⟐ Johns Hopkins University ⟐ University of Pisa ⟐ University of Trento\n- **🔗 链接**：[[中英摘要](./abs/2506.08710.md)] [[arXiv:2506.08710](https://arxiv.org/abs/2506.08710)] [Code]\n- **📝 说明**:\n\n#### [37] TraGraph-GS: Trajectory Graph-based Gaussian Splatting for Arbitrary Large-Scale Scene Rendering\n- **🧑‍🔬 作者**：Xiaohan Zhang, Sitong Wang, Yushen Yan, Yi Yang, Mingda Xu, Qi Liu\n- **🏫 单位**： South China University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.08704.md)] [[arXiv:2506.08704](https://arxiv.org/abs/2506.08704)] [Code]\n- **📝 说明**:\n\n#### [38] Complex-Valued Holographic Radiance Fields\n- **🧑‍🔬 作者**：Yicheng Zhan, Dong-Ha Shin, Seung-Hwan Baek, Kaan Akşit\n- **🏫 单位**：University College London ⟐ Pohang University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.08350.md)] [[arXiv:2506.08350](https://arxiv.org/abs/2506.08350)] [Code]\n- **📝 说明**:\n\n#### [39] Speedy Deformable 3D Gaussian Splatting: Fast Rendering and Compression of Dynamic Scenes\n- **🧑‍🔬 作者**：Allen Tu, Haiyang Ying, Alex Hanson, Yonghan Lee, Tom Goldstein, Matthias Zwicker\n- **🏫 单位**：University of Maryland, College Park\n- **🔗 链接**：[[中英摘要](./abs/2506.07917.md)] [[arXiv:2506.07917](https://arxiv.org/abs/2506.07917)] [[Code](https://github.com/tuallen/speede3dgs)]\n- **📝 说明**:\n\n#### [40] GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution\n- **🧑‍🔬 作者**：Shuja Khalid, Mohamed Ibrahim, Yang Liu\n- **🏫 单位**：Huawei Canada\n- **🔗 链接**：[[中英摘要](./abs/2506.07897.md)] [[arXiv:2506.07897](https://arxiv.org/abs/2506.07897)] [Code]\n- **📝 说明**:\n\n#### [41] R3D2: Realistic 3D Asset Insertion via Diffusion for Autonomous Driving Simulation\n- **🧑‍🔬 作者**：William Ljungbergh, Bernardo Taveira, Wenzhao Zheng, Adam Tonderski, Chensheng Peng, Fredrik Kahl, Christoffer Petersson, Michael Felsberg, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan\n- **🏫 单位**：Zenseact ⟐ Linköping University ⟐ Chalmers University ⟐ UC Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2506.07826.md)] [[arXiv:2506.07826](https://arxiv.org/abs/2506.07826)] [[Code](https://github.com/bertaveira/R3D2)]\n- **📝 说明**:\n\n#### [42] OpenSplat3D: Open-Vocabulary 3D Instance Segmentation using Gaussian Splatting\n- **🧑‍🔬 作者**：Jens Piekenbrinck, Christian Schmidt, Alexander Hermans, Narunas Vaskevicius, Timm Linder, Bastian Leibe\n- **🏫 单位**：RWTH Aachen University ⟐ Robert Bosch GmbH\n- **🔗 链接**：[[中英摘要](./abs/2506.07697.md)] [[arXiv:2506.07697](https://arxiv.org/abs/2506.07697)] [Code]\n- **📝 说明**:\n\n#### [43] ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views\n- **🧑‍🔬 作者**：Xiaohan Lu, Jiaye Fu, Jiaqi Zhang, Zetian Song, Chuanmin Jia, Siwei Ma\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2506.07670.md)] [[arXiv:2506.07670](https://arxiv.org/abs/2506.07670)] [Code]\n- **📝 说明**:\n\n#### [44] PIG: Physically-based Multi-Material Interaction with 3D Gaussians\n- **🧑‍🔬 作者**：Zeyu Xiao, Zhenyi Wu, Mingyang Sun, Qipeng Yan, Yufan Guo, Zhuoer Liang, Lihua Zhang\n- **🏫 单位**： Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2506.07657.md)] [[arXiv:2506.07657](https://arxiv.org/abs/2506.07657)] [Code]\n- **📝 说明**:\n\n#### [45] Hierarchical Scoring with 3D Gaussian Splatting for Instance Image-Goal Navigation\n- **🧑‍🔬 作者**：Yijie Deng, Shuaihang Yuan, Geeta Chandra Raju Bethala, Anthony Tzes, Yu-Shen Liu, Yi Fang\n- **🏫 单位**：NYU Abu Dhabi ⟐ Tsinghua University\n- **🔗 
链接**：[[中英摘要](./abs/2506.07338.md)] [[arXiv:2506.07338](https://arxiv.org/abs/2506.07338)] [Code]\n- **📝 说明**:\n\n#### [46] Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization\n- **🧑‍🔬 作者**：Zhican Wang, Guanghui He, Dantong Liu, Lingjun Gao, Shell Xu Hu, Chen Zhang, Zhuoran Song, Nicholas Lane, Wayne Luk, Hongxiang Fan\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Cambridge ⟐ Imperial College London ⟐ Samsung AI Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2506.07069.md)] [[arXiv:2506.07069](https://arxiv.org/abs/2506.07069)] [Code]\n- **📝 说明**:\n\n#### [47] Hybrid Mesh-Gaussian Representation for Efficient Indoor Scene Reconstruction\n- **🧑‍🔬 作者**：Binxiao Huang, Zhihao Li, Shiyong Liu, Xiao Tang, Jiajun Tang, Jiaqi Lin, Yuxin Cheng, Zhenyu Chen, Xiaofei Wu, Ngai Wong\n- **🏫 单位**：The University of Hong Kong ⟐ Huawei Technologies Ltd ⟐ Peking University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2506.06988.md)] [[arXiv:2506.06988](https://arxiv.org/abs/2506.06988)] [Code]\n- **📝 说明**:\n\n#### [48] Gaussian Mapping for Evolving Scenes\n- **🧑‍🔬 作者**：Vladimir Yugay, Thies Kersten, Luca Carlone, Theo Gevers, Martin R. Oswald, Lukas Schmid\n- **🏫 单位**：University of Amsterdam ⟐ Massachusetts Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.06909.md)] [[arXiv:2506.06909](https://arxiv.org/abs/2506.06909)] [Code]\n- **📝 说明**:\n\n#### [49] Hi-LSplat: Hierarchical 3D Language Gaussian Splatting\n- **🧑‍🔬 作者**：Chenlu Zhan, Yufei Zhang, Gaoang Wang, Hongwei Wang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2506.06822.md)] [[arXiv:2506.06822](https://arxiv.org/abs/2506.06822)] [Code]\n- **📝 说明**:\n\n#### [50] Parametric Gaussian Human Model: Generalizable Prior for Efficient and Realistic Human Avatar Modeling\n- **🧑‍🔬 作者**：Cheng Peng, Jingxiang Sun, Yushuo Chen, Zhaoqi Su, Zhuo Su, Yebin Liu\n- **🏫 单位**：Tsinghua University ⟐ ByteDance\n- **🔗 链接**：[[中英摘要](./abs/2506.06645.md)] [[arXiv:2506.06645](https://arxiv.org/abs/2506.06645)] [Code]\n- **📝 说明**:\n\n#### [51] GS4: Generalizable Sparse Splatting Semantic SLAM\n- **🧑‍🔬 作者**：Mingqi Jiang, Chanho Kim, Chen Ziwen, Li Fuxin\n- **🏫 单位**：Oregon State University\n- **🔗 链接**：[[中英摘要](./abs/2506.06517.md)] [[arXiv:2506.06517](https://arxiv.org/abs/2506.06517)] [Code]\n- **📝 说明**:\n\n#### [52] Dy3DGS-SLAM: Monocular 3D Gaussian Splatting SLAM for Dynamic Environments\n- **🧑‍🔬 作者**：Mingrui Li, Yiming Zhou, Hongxing Zhou, Xinggang Hu, Florian Roemer, Hongyu Wang, Ahmad Osman\n- **🏫 单位**：Dalian University of Technology ⟐ Saarland University of Applied Sciences ⟐ Fraunhofer Institute for Nondestructive Testing ⟐ Beijing University of Chemical Technology ⟐ Laval University\n- **🔗 链接**：[[中英摘要](./abs/2506.05965.md)] [[arXiv:2506.05965](https://arxiv.org/abs/2506.05965)] [Code]\n- **📝 说明**:\n\n#### [53] SurGSplat: Progressive Geometry-Constrained Gaussian Splatting for Surgical Scene Reconstruction\n- **🧑‍🔬 作者**：Yuchao Zheng, Jianing Zhang, Guochen Ning, Hongen Liao\n- **🏫 单位**：Tsinghua University ⟐ Fudan University\n- **🔗 链接**：[[中英摘要](./abs/2506.05935.md)] [[arXiv:2506.05935](https://arxiv.org/abs/2506.05935)] [Code]\n- **📝 说明**:\n\n#### [54] Lumina: Real-Time Mobile Neural Rendering by Exploiting Computational Redundancy\n- **🧑‍🔬 作者**：Yu Feng, Weikai Lin, Yuge Cheng, Zihan Liu, Jingwen Leng, Minyi Guo, Chen Chen, Shixuan Sun, Yuhao Zhu\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ University of Rochester\n- **🔗 链接**：[[中英摘要](./abs/2506.05682.md)] [[arXiv:2506.05682](https://arxiv.org/abs/2506.05682)] [Code]\n- 
**📝 说明**:\n\n#### [55] ODE-GS: Latent ODEs for Dynamic Scene Extrapolation with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Daniel Wang, Patrick Rim, Tian Tian, Alex Wong, Ganesh Sundaramoorthi\n- **🏫 单位**：Yale University ⟐ RISD ⟐ RTX\n- **🔗 链接**：[[中英摘要](./abs/2506.05480.md)] [[arXiv:2506.05480](https://arxiv.org/abs/2506.05480)] [Code]\n- **📝 说明**:\n\n#### [56] S2GO: Streaming Sparse Gaussian Occupancy Prediction\n- **🧑‍🔬 作者**：Jinhyung Park, Yihan Hu, Chensheng Peng, Wenzhao Zheng, Kris Kitani, Wei Zhan\n- **🏫 单位**：Applied Intuition ⟐ Carnegie Mellon University ⟐ University of California, Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2506.05473.md)] [[arXiv:2506.05473](https://arxiv.org/abs/2506.05473)] [Code]\n- **📝 说明**:\n\n#### [57] Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Duochao Shi, Weijie Wang, Donny Y. Chen, Zeyu Zhang, Jia-Wang Bian, Bohan Zhuang, Chunhua Shen\n- **🏫 单位**：Zhejiang University ⟐ Monash University ⟐ MBZUAI ⟐ GigaAI\n- **🔗 链接**：[[中英摘要](./abs/2506.05327.md)] [[arXiv:2506.05327](https://arxiv.org/abs/2506.05327)] [[Code](https://github.com/aim-uofa/PM-Loss)]\n- **📝 说明**:\n\n#### [58] DSG-World: Learning a 3D Gaussian World Model from Dual State Videos\n- **🧑‍🔬 作者**：Wenhao Hu, Xuexiang Wen, Xi Li, Gaoang Wang\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2506.05217.md)] [[arXiv:2506.05217](https://arxiv.org/abs/2506.05217)] [Code]\n- **📝 说明**:\n\n#### [59] OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View\n- **🧑‍🔬 作者**：Yanbo Wang, Ziyi Wang, Wenzhao Zheng, Jie Zhou, Jiwen Lu\n- **🏫 单位**：Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2506.05204.md)] [[arXiv:2506.05204](https://arxiv.org/abs/2506.05204)] [[Code](https://github.com/Yanbo-23/OGGSplat)]\n- **📝 说明**:\n\n#### [60] Synthetic Dataset Generation for Autonomous Mobile Robots Using 3D Gaussian Splatting for Vision Training\n- **🧑‍🔬 作者**：Aneesh Deogan, Wout Beks, Peter Teurlings, Koen de Vos, Mark van den Brand, Rene van de Molengraft\n- **🏫 单位**：Eindhoven University of Technology\n- **🔗 链接**：[[中英摘要](./abs/2506.05092.md)] [[arXiv:2506.05092](https://arxiv.org/abs/2506.05092)] [Code]\n- **📝 说明**:\n\n#### [61] UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting\n- **🧑‍🔬 作者**：Jaehoon Choi, Dongki Jung, Christopher Maxey, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon\n- **🏫 单位**：University of Maryland ⟐ DEVCOM Army Research Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2506.05011.md)] [[arXiv:2506.05011](https://arxiv.org/abs/2506.05011)] [Code]\n- **📝 说明**:\n\n#### [62] Point Cloud Segmentation of Agricultural Vehicles using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Alfred T. Christiansen, Andreas H. Højrup, Morten K. Stephansen, Md Ibtihaj A. Sakib, Taman S. Poojary, Filip Slezak, Morten S. Laursen, Thomas B. Moeslund, Joakim B. Haurum\n- **🏫 单位**：Aalborg University ⟐ AGCO A/S, Denmark ⟐ Pioneer Centre for AI, Denmark\n- **🔗 链接**：[[中英摘要](./abs/2506.05009.md)] [[arXiv:2506.05009](https://arxiv.org/abs/2506.05009)] [Code]\n- **📝 说明**:\n\n#### [63] Generating Synthetic Stereo Datasets using 3D Gaussian Splatting and Expert Knowledge Transfer\n- **🧑‍🔬 作者**：Filip Slezak, Magnus K. Gjerde, Joakim B. Haurum, Ivan Nikolov, Morten S. Laursen, Thomas B. 
Moeslund\n- **🏫 单位**：Aalborg University ⟐ AGCO A/S, Denmark ⟐ Pioneer Centre for AI, Denmark\n- **🔗 链接**：[[中英摘要](./abs/2506.04908.md)] [[arXiv:2506.04908](https://arxiv.org/abs/2506.04908)] [Code]\n- **📝 说明**:\n\n#### [64] Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations\n- **🧑‍🔬 作者**：Gaia Di Lorenzo, Federico Tombari, Marc Pollefeys, Daniel Barath\n- **🏫 单位**：ETH Zurich ⟐ Google ⟐ Microsoft\n- **🔗 链接**：[[中英摘要](./abs/2506.04789.md)] [[arXiv:2506.04789](https://arxiv.org/abs/2506.04789)] [[Code](https://github.com/gaiadilorenzo/object-x)]\n- **📝 说明**:\n\n#### [65] HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting\n- **🧑‍🔬 作者**：Maksym Ivashechkin, Oscar Mendez, Richard Bowden\n- **🏫 单位**：University of Surrey\n- **🔗 链接**：[[中英摘要](./abs/2506.04351.md)] [[arXiv:2506.04351](https://arxiv.org/abs/2506.04351)] [Code]\n- **📝 说明**:\n\n#### [66] Splatting Physical Scenes: End-to-End Real-to-Sim from Imperfect Robot Data\n- **🧑‍🔬 作者**：Ben Moran, Mauro Comi, Arunkumar Byravan, Steven Bohez, Tom Erez, Zhibin Li, Leonard Hasenclever\n- **🏫 单位**：Google DeepMind ⟐ University College London ⟐ University of Bristol\n- **🔗 链接**：[[中英摘要](./abs/2506.04120.md)] [[arXiv:2506.04120](https://arxiv.org/abs/2506.04120)] [Code]\n- **📝 说明**:\n\n#### [67] JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting\n- **🧑‍🔬 作者**：Yang Xiao, Guoan Xu, Qiang Wu, Wenjing Jia\n- **🏫 单位**：University of Technology Sydney\n- **🔗 链接**：[[中英摘要](./abs/2506.03872.md)] [[arXiv:2506.03872](https://arxiv.org/abs/2506.03872)] [Code]\n- **📝 说明**:\n\n#### [68] SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad, Vitor Campagnolo Guizilini, Rares Andrei Ambrus, Greg Shakhnarovich, Matthew R. 
Walter\n- **🏫 单位**：Toyota Technological Institute at Chicago ⟐ Toyota Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2506.03594.md)] [[arXiv:2506.03594](https://arxiv.org/abs/2506.03594)] [[Code](https://github.com/ripl/splart)]\n- **📝 说明**:\n\n#### [69] Multi-Spectral Gaussian Splatting with Neural Color Representation\n- **🧑‍🔬 作者**：Lukas Meyer, Josef Grün, Maximilian Weiherer, Bernhard Egger, Marc Stamminger, Linus Franke\n- **🏫 单位**：Friedrich-Alexander-Universität Erlangen-Nürnberg-Fürth\n- **🔗 链接**：[[中英摘要](./abs/2506.03407.md)] [[arXiv:2506.03407](https://arxiv.org/abs/2506.03407)] [Code]\n- **📝 说明**:\n\n#### [70] LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM\n- **🧑‍🔬 作者**：Roman Titkov, Egor Zubkov, Dmitry Yudin, Jaafar Mahmoud, Malik Mohrat, Gennady Sidorov\n- **🏫 单位**：Moscow Institute of Physics and Technology ⟐ AIRI ⟐ Sberbank of Russia, Robotics Center\n- **🔗 链接**：[[中英摘要](./abs/2506.03073.md)] [[arXiv:2506.03073](https://arxiv.org/abs/2506.03073)] [[Code](https://github.com/Titrom025/LEG-SLAM)]\n- **📝 说明**:\n\n#### [71] Voyager: Real-Time Splatting City-Scale 3D Gaussians on Your Phone\n- **🧑‍🔬 作者**：Zheng Liu, He Zhu, Xinyang Li, Yirun Wang, Yujiao Shi, Wei Li, Jingwen Leng, Minyi Guo, Yu Feng\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ ShanghaiTech University ⟐ Shanghai Qi Zhi Institute\n- **🔗 链接**：[[中英摘要](./abs/2506.02774.md)] [[arXiv:2506.02774](https://arxiv.org/abs/2506.02774)] [Code]\n- **📝 说明**:\n\n#### [72] EyeNavGS: A 6-DoF Navigation Dataset and Record-n-Replay Software for Real-World 3DGS Scenes in VR\n- **🧑‍🔬 作者**：Zihao Ding, Cheng-Tse Lee, Mufeng Zhu, Tao Guan, Yuan-Chun Sun, Cheng-Hsin Hsu, Yao Liu\n- **🏫 单位**：Rutgers University ⟐ National Tsing Hua University\n- **🔗 链接**：[[中英摘要](./abs/2506.02380.md)] [[arXiv:2506.02380](https://arxiv.org/abs/2506.02380)] [Code]\n- **📝 说明**:\n\n#### [73] GSCodec Studio: A Modular Framework for Gaussian Splat Compression\n- **🧑‍🔬 作者**：Sicheng Li, Chengzhen Wu, Hao Li, Xiang Gao, Yiyi Liao, Lu Yu\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2506.01822.md)] [[arXiv:2506.01822](https://arxiv.org/abs/2506.01822)] [Code]\n- **📝 说明**:\n\n#### [74] RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes\n- **🧑‍🔬 作者**：Pou-Chun Kung, Skanda Harisha, Ram Vasudevan, Aline Eid, Katherine A. 
Skinner\n- **🏫 单位**：University of Michigan\n- **🔗 链接**：[[中英摘要](./abs/2506.01379.md)] [[arXiv:2506.01379](https://arxiv.org/abs/2506.01379)] [Code]\n- **📝 说明**:\n\n#### [75] CountingFruit: Real-Time 3D Fruit Counting with Language-Guided Semantic Gaussian Splatting\n- **🧑‍🔬 作者**：Fengze Li, Yangle Liu, Jieming Ma, Hai-Ning Liang, Yaochun Shen, Huangxiang Li, Zhijing Wu\n- **🏫 单位**：University of Liverpool ⟐ Xi’an Jiaotong-Liverpool University ⟐ Hong Kong University of Science and Technology (Guangzhou) ⟐ Baidu ⟐ University of Cambridge\n- **🔗 链接**：[[中英摘要](./abs/2506.01109.md)] [[arXiv:2506.01109](https://arxiv.org/abs/2506.01109)] [Code]\n- **📝 说明**:\n\n#### [76] PromptVFX: Text-Driven Fields for Open-World 3D Gaussian Animation\n- **🧑‍🔬 作者**：Mert Kiray, Paul Uhlenbruck, Nassir Navab, Benjamin Busam\n- **🏫 单位**：Technical University of Munich ⟐ 3Dwe.ai\n- **🔗 链接**：[[中英摘要](./abs/2506.01091.md)] [[arXiv:2506.01091](https://arxiv.org/abs/2506.01091)] [Code]\n- **📝 说明**:\n\n#### [77] Globally Consistent RGB-D SLAM with 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Xingguang Zhong, Yue Pan, Liren Jin, Marija Popović, Jens Behley, Cyrill Stachniss\n- **🏫 单位**：University of Bonn ⟐ MAVLab ⟐ Lamarr Institute for Machine Learning and Artificial Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2506.00970.md)] [[arXiv:2506.00970](https://arxiv.org/abs/2506.00970)] [Code]\n- **📝 说明**:\n\n#### [78] Adaptive Voxelization for Transform coding of 3D Gaussian splatting data\n- **🧑‍🔬 作者**：Chenjunjie Wang, Shashank N. Sridhara, Eduardo Pavez, Antonio Ortega, Cheng Chang\n- **🏫 单位**：University of Southern California ⟐ Meta\n- **🔗 链接**：[[中英摘要](./abs/2506.00271.md)] [[arXiv:2506.00271](https://arxiv.org/abs/2506.00271)] [[Code](https://github.com/STAC-USC/3DGS_Compression_Adaptive_Voxelization)]\n- **📝 说明**:\n\n#### [79] Understanding while Exploring: Semantics-driven Active Mapping\n- **🧑‍🔬 作者**：Liyan Chen, Huangying Zhan, Hairong Yin, Yi Xu, Philippos Mordohai\n- **🏫 单位**：Stevens Institute of Technology ⟐ Goertek Alpha Labs ⟐ Purdue University\n- **🔗 链接**：[[中英摘要](./abs/2506.00225.md)] [[arXiv:2506.00225](https://arxiv.org/abs/2506.00225)] [Code]\n- **📝 说明**:\n\n#### [80] TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores\n- **🧑‍🔬 作者**：Zimu Liao, Jifeng Ding, Rong Fu, Siwei Cui, Ruixuan Gong, Li Wang, Boni Hu, Yi Wang, Hengjie Li, Xingcheng Zhang, Hui Wang\n- **🏫 单位**：Shanghai Artificial Intelligence Laboratory ⟐ Shanghai Jiao Tong University ⟐ University of Electronic Science and Technology of China ⟐ Fudan University ⟐ Beijing Institute of Technology ⟐ Northwestern Polytechnical University ⟐ Shanghai Innovation Institute\n- **🔗 链接**：[[中英摘要](./abs/2505.24796.md)] [[arXiv:2505.24796](https://arxiv.org/abs/2505.24796)] [[Code](https://github.com/TensorCore3DGS/3DGSTensorCore)]\n- **📝 说明**:\n\n#### [81] GARLIC: GAussian Representation LearnIng for spaCe partitioning\n- **🧑‍🔬 作者**：Panagiotis Rigas, Panagiotis Drivas, Charalambos Tzamos, Ioannis Chamodrakas, George Ioannakis, Leonidas J. Guibas, Ioannis Z. 
Emiris\n- **🏫 单位**：National and Kapodistrian University of Athens ⟐ Athena Research Center ⟐ Czech Technical University in Prague ⟐ Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2505.24608.md)] [[arXiv:2505.24608](https://arxiv.org/abs/2505.24608)] [Code]\n- **📝 说明**:\n\n#### [82] 3DGEER: Exact and Efficient Volumetric Rendering with 3D Gaussians\n- **🧑‍🔬 作者**：Zixun Huang, Cho-Ying Wu, Yuliang Guo, Xinyu Huang, Liu Ren\n- **🏫 单位**：Bosch Research North America & Bosch Center for AI\n- **🔗 链接**：[[中英摘要](./abs/2505.24053.md)] [[arXiv:2505.24053](https://arxiv.org/abs/2505.24053)] [Code]\n- **📝 说明**:\n\n#### [83] Holistic Large-Scale Scene Reconstruction via Mixed Gaussian Splatting\n- **🧑‍🔬 作者**：Chuandong Liu, Huijiao Wang, Lei Yu, Gui-Song Xia\n- **🏫 单位**：Wuhan University\n- **🔗 链接**：[[中英摘要](./abs/2505.23280.md)] [[arXiv:2505.23280](https://arxiv.org/abs/2505.23280)] [[Code](https://github.com/azhuantou/MixGS)]\n- **📝 说明**:\n\n#### [84] 3DGS Compression with Sparsity-guided Hierarchical Transform Coding\n- **🧑‍🔬 作者**：Hao Xu, Xiaolin Wu, Xi Zhang\n- **🏫 单位**：McMaster University ⟐ Southwest Jiaotong University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2505.22908.md)] [[arXiv:2505.22908](https://arxiv.org/abs/2505.22908)] [Code]\n- **📝 说明**:\n\n#### [85] STDR: Spatio-Temporal Decoupling for Real-Time Dynamic Scene Rendering\n- **🧑‍🔬 作者**：Zehao Li, Hao Jiang, Yujun Cai, Jianing Chen, Baolong Bi, Shuqin Gao, Honglong Zhao, Yiwei Wang, Tianlu Mao, Zhaoqi Wang\n- **🏫 单位**：Chinese Academy of Sciences ⟐ University of Chinese Academy of Sciences ⟐ The University of Queensland ⟐ University of California, Merced\n- **🔗 链接**：[[中英摘要](./abs/2505.22400.md)] [[arXiv:2505.22400](https://arxiv.org/abs/2505.22400)] [Code]\n- **📝 说明**:\n\n#### [86] UP-SLAM: Adaptively Structured Gaussian SLAM with Uncertainty Prediction in Dynamic Environments\n- **🧑‍🔬 作者**：Wancai Zheng, Linlin Ou, Jiajie He, Libo Zhou, Xinyi Yu, Yan Wei\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2505.22335.md)] [[arXiv:2505.22335](https://arxiv.org/abs/2505.22335)] [Code]\n- **📝 说明**:\n\n#### [87] Learning Fine-Grained Geometry for Sparse-View Splatting via Cascade Depth Loss\n- **🧑‍🔬 作者**：Wenjun Lu, Haodong Chen, Anqi Yi, Yuk Ying Chung, Zhiyong Wang, Kun Hu\n- **🏫 单位**：The University of Sydney ⟐ Edith Cowan University\n- **🔗 链接**：[[中英摘要](./abs/2505.22279.md)] [[arXiv:2505.22279](https://arxiv.org/abs/2505.22279)] [Code]\n- **📝 说明**:\n\n#### [88] Diffusion-Denoised Hyperspectral Gaussian Splatting\n- **🧑‍🔬 作者**：Sunil Kumar Narayanan, Lingjun Zhao, Lu Gan, Yongsheng Chen\n- **🏫 单位**：Georgia Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2505.21890.md)] [[arXiv:2505.21890](https://arxiv.org/abs/2505.21890)] [Code]\n- **📝 说明**: 🏆 Accepted to 3DV 2026\n\n#### [89] Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis\n- **🧑‍🔬 作者**：Yipengjing Sun, Chenyang Wang, Shunyuan Zheng, Zonglin Li, Shengping Zhang, Xiangyang Ji\n- **🏫 单位**：Harbin Institute of Technology ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2505.21502.md)] [[arXiv:2505.21502](https://arxiv.org/abs/2505.21502)] [Code]\n- **📝 说明**:\n\n#### [90] MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation\n- **🧑‍🔬 作者**：Kerui Ren, Jiayang Bai, Linning Xu, Lihan Jiang, Jiangmiao Pang, Mulin Yu, Bo Dai\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Nanjing University ⟐ The Chinese University of Hong Kong ⟐ University of Science and Technology of 
China ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2505.21483.md)] [[arXiv:2505.21483](https://arxiv.org/abs/2505.21483)] [Code]\n- **📝 说明**:\n\n#### [91] Plenodium: UnderWater 3D Scene Reconstruction with Plenoptic Medium Representation\n- **🧑‍🔬 作者**：Changguang Wu, Jiangxin Dong, Chengjian Li, Jinhui Tang\n- **🏫 单位**：Nanjing University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2505.21258.md)] [[arXiv:2505.21258](https://arxiv.org/abs/2505.21258)] [Code]\n- **📝 说明**:\n\n#### [92] 3D-UIR: 3D Gaussian for Underwater 3D Scene Reconstruction via Physics Based Appearance-Medium Decoupling\n- **🧑‍🔬 作者**：Jieyu Yuan, Yujun Li, Yuanlin Zhang, Chunle Guo, Xiongxin Tang, Ruixing Wang, Chongyi Li\n- **🏫 单位**：Nankai University ⟐ Chinese Academy of Sciences ⟐ Nankai International Advanced Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2505.21238.md)] [[arXiv:2505.21238](https://arxiv.org/abs/2505.21238)] [[Code](https://github.com/bilityniu/3D-UIR)]\n- **📝 说明**:\n\n#### [93] CityGo: Lightweight Urban Modeling and Rendering with Proxy Buildings and Residual Gaussians\n- **🧑‍🔬 作者**：Weihang Liu, Yuhui Zhong, Yuke Li, Xi Chen, Jiadi Cui, Honglong Zhang, Lan Xu, Xin Lou, Yujiao Shi, Jingyi Yu, Yingliang Zhang\n- **🏫 单位**：ShanghaiTech University ⟐ DGene ⟐ Migu Cultural Technology Co., Ltd ⟐ GGU Technology Co., Ltd ⟐ Stereye\n- **🔗 链接**：[[中英摘要](./abs/2505.21041.md)] [[arXiv:2505.21041](https://arxiv.org/abs/2505.21041)] [Code]\n- **📝 说明**:\n\n#### [94] ProBA: Probabilistic Bundle Adjustment with the Bhattacharyya Coefficient\n- **🧑‍🔬 作者**：Jason Chui, Daniel Cremers\n- **🏫 单位**：Technical University of Munich\n- **🔗 链接**：[[中英摘要](./abs/2505.20858.md)] [[arXiv:2505.20858](https://arxiv.org/abs/2505.20858)] [Code]\n- **📝 说明**:\n\n#### [95] Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Xiangyu Sun, Runnan Chen, Mingming Gong, Dong Xu, Tongliang Liu\n- **🏫 单位**：University of Sydney\n- **🔗 链接**：[[中英摘要](./abs/2505.20729.md)] [[arXiv:2505.20729](https://arxiv.org/abs/2505.20729)] [Code]\n- **📝 说明**:\n\n#### [96] Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Zechen Li, Lanqing Yang, Yiheng Bian, Hao Pan, Yongjian Fu, Yezhou Wang, Yi-Chao Chen, Guangtao Xue, Ju Ren\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Central South University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2505.20714.md)] [[arXiv:2505.20714](https://arxiv.org/abs/2505.20714)] [[Code](https://github.com/sim-2-real/Wideband3DGS)]\n- **📝 说明**:\n\n#### [97] OmniIndoor3D: Comprehensive Indoor 3D Reconstruction\n- **🧑‍🔬 作者**：Xiaobao Wei, Xiaoan Zhang, Hao Wang, Qingpo Wuwu, Ming Lu, Wenzhao Zheng, Shanghang Zhang\n- **🏫 单位**：Peking University ⟐ University of California\n- **🔗 链接**：[[中英摘要](./abs/2505.20610.md)] [[arXiv:2505.20610](https://arxiv.org/abs/2505.20610)] [Code]\n- **📝 说明**:\n\n#### [98] WeatherEdit: Controllable Weather Editing with 4D Gaussian Field\n- **🧑‍🔬 作者**：Chenghao Qian, Wenjing Li, Yuhu Guo, Gustav Markkula\n- **🏫 单位**：University of Leeds ⟐ Carnegie Mellon University\n- **🔗 链接**：[[中英摘要](./abs/2505.20471.md)] [[arXiv:2505.20471](https://arxiv.org/abs/2505.20471)] [[Code](https://github.com/Jumponthemoon/WeatherEdit)]\n- **📝 说明**:\n\n#### [99] ParticleGS: Particle-Based Dynamics Modeling of 3D Gaussians for Prior-free Motion Extrapolation\n- **🧑‍🔬 作者**：Jinsheng Quan, Chunshi Wang, Yawei Luo\n- **🏫 单位**：Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2505.20270.md)] [[arXiv:2505.20270](https://arxiv.org/abs/2505.20270)] 
[[Code](https://github.com/QuanJinSheng/ParticleGS)]\n- **📝 说明**:\n\n#### [100] HaloGS: Loose Coupling of Compact Geometry and Gaussian Splats for 3D Scenes\n- **🧑‍🔬 作者**：Changjian Jiang, Kerui Ren, Linning Xu, Jiong Chen, Jiangmiao Pang, Yu Zhang, Bo Dai, Mulin Yu\n- **🏫 单位**：Zhejiang University ⟐ Shanghai Artificial Intelligence Laboratory ⟐ Shanghai Jiao Tong University ⟐ The Chinese University of Hong Kong ⟐ Inria ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2505.20267.md)] [[arXiv:2505.20267](https://arxiv.org/abs/2505.20267)] [Code]\n- **📝 说明**:\n\n#### [101] ADD-SLAM: Adaptive Dynamic Dense SLAM with Gaussian Splatting\n- **🧑‍🔬 作者**：Wenhua Wu, Chenpeng Su, Siting Zhu, Tianchen Deng, Zhe Liu, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2505.19420.md)] [[arXiv:2505.19420](https://arxiv.org/abs/2505.19420)] [Code]\n- **📝 说明**:\n\n#### [102] Improving Novel view synthesis of 360° Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images\n- **🧑‍🔬 作者**：Guangan Chen, Anh Minh Truong, Hanhe Lin, Michiel Vlaminck, Wilfried Philips, Hiep Luong\n- **🏫 单位**：Ghent University ⟐ University of Dundee\n- **🔗 链接**：[[中英摘要](./abs/2505.19264.md)] [[arXiv:2505.19264](https://arxiv.org/abs/2505.19264)] [[Code](https://github.com/angchen-dev/hemiSparseGS)]\n- **📝 说明**:\n\n#### [103] Triangle Splatting for Real-Time Radiance Field Rendering\n- **🧑‍🔬 作者**：Jan Held, Renaud Vandeghen, Adrien Deliege, Abdullah Hamdi, Silvio Giancola, Anthony Cioppa, Andrea Vedaldi, Bernard Ghanem, Andrea Tagliasacchi, Marc Van Droogenbroeck\n- **🏫 单位**：University of Liège ⟐ KAUST ⟐ University of Oxford\n- **🔗 链接**：[[中英摘要](./abs/2505.19175.md)] [[arXiv:2505.19175](https://arxiv.org/abs/2505.19175)] [[Code](https://github.com/trianglesplatting/triangle-splatting)]\n- **📝 说明**:\n\n#### [104] FHGS: Feature-Homogenized Gaussian Splatting\n- **🧑‍🔬 作者**：Q. G. Duan, Benyun Zhao, Mingqiao Han, Yijun Huang, Ben M. 
Chen\n- **🏫 单位**：The Chinese University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2505.19154.md)] [[arXiv:2505.19154](https://arxiv.org/abs/2505.19154)] [Code]\n- **📝 说明**:\n\n#### [105] Veta-GS: View-dependent deformable 3D Gaussian Splatting for thermal infrared Novel-view Synthesis\n- **🧑‍🔬 作者**：Myeongseok Nam, Wongi Park, Minsol Kim, Hyejin Hur, Soomok Lee\n- **🏫 单位**：Ajou University ⟐ Sejong University ⟐ Korea University\n- **🔗 链接**：[[中英摘要](./abs/2505.19138.md)] [[arXiv:2505.19138](https://arxiv.org/abs/2505.19138)] [[Code](https://github.com/nbril0313/Veta-GS)]\n- **📝 说明**:\n\n#### [106] VPGS-SLAM: Voxel-based Progressive 3D Gaussian SLAM in Large-Scale Scenes\n- **🧑‍🔬 作者**：Tianchen Deng, Wenhua Wu, Junjie He, Yue Pan, Xirui Jiang, Shenghai Yuan, Danwei Wang, Hesheng Wang, Weidong Chen\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ HKUST ⟐ University of Bonn ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2505.18992.md)] [[arXiv:2505.18992](https://arxiv.org/abs/2505.18992)] [[Code](https://github.com/dtc111111/vpgs-slam)]\n- **📝 说明**:\n\n#### [107] Efficient Differentiable Hardware Rasterization for 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Yitian Yuan, Qianyue He\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Tsinghua University\n- **🔗 链接**：[[中英摘要](./abs/2505.18764.md)] [[arXiv:2505.18764](https://arxiv.org/abs/2505.18764)] [Code]\n- **📝 说明**:\n\n#### [108] SuperGS: Consistent and Detailed 3D Super-Resolution Scene Reconstruction via Gaussian Splatting\n- **🧑‍🔬 作者**：Shiyun Xie, Zhiru Wang, Yinghao Zhu, Xu Wang, Chengwei Pan, Xiwang Dong\n- **🏫 单位**：Beihang University\n- **🔗 链接**：[[中英摘要](./abs/2505.18649.md)] [[arXiv:2505.18649](https://arxiv.org/abs/2505.18649)] [Code]\n- **📝 说明**:\n\n#### [109] Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance\n- **🧑‍🔬 作者**：Jack Goffinet, Youngjo Min, Carlo Tomasi, David E. 
Carlson\n- **🏫 单位**：Duke University\n- **🔗 链接**：[[中英摘要](./abs/2505.18342.md)] [[arXiv:2505.18342](https://arxiv.org/abs/2505.18342)] [Code]\n- **📝 说明**:\n\n#### [110] CGS-GAN: 3D Consistent Gaussian Splatting GANs for High Resolution Human Head Synthesis\n- **🧑‍🔬 作者**：Florian Barthel, Wieland Morgenstern, Paul Hinzer, Anna Hilsmann, Peter Eisert\n- **🏫 单位**：Fraunhofer HHI\n- **🔗 链接**：[[中英摘要](./abs/2505.17590.md)] [[arXiv:2505.17590](https://arxiv.org/abs/2505.17590)] [[Code](https://github.com/fraunhoferhhi/cgs-gan)]\n- **📝 说明**:\n\n#### [111] From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation\n- **🧑‍🔬 作者**：Mahmoud Chick Zaouali, Todd Charter, Homayoun Najjaran\n- **🏫 单位**： University of Victoria\n- **🔗 链接**：[[中英摘要](./abs/2505.17402.md)] [[arXiv:2505.17402](https://arxiv.org/abs/2505.17402)] [Code]\n- **📝 说明**:\n\n#### [112] Render-FM: A Foundation Model for Real-time Photorealistic Volumetric Rendering\n- **🧑‍🔬 作者**：Zhongpai Gao, Meng Zheng, Benjamin Planche, Anwesa Choudhuri, Terrence Chen, Ziyan Wu\n- **🏫 单位**：United Imaging Intelligence\n- **🔗 链接**：[[中英摘要](./abs/2505.17338.md)] [[arXiv:2505.17338](https://arxiv.org/abs/2505.17338)] [Code]\n- **📝 说明**:\n\n#### [113] PlantDreamer: Achieving Realistic 3D Plant Models with Diffusion-Guided Gaussian Splatting\n- **🧑‍🔬 作者**：Zane K J Hartley, Lewis A G Stuart, Andrew P French, Michael P Pound\n- **🏫 单位**：University of Nottingham\n- **🔗 链接**：[[中英摘要](./abs/2505.15528.md)] [[arXiv:2505.15528](https://arxiv.org/abs/2505.15528)] [Code]\n- **📝 说明**:\n\n#### [114] GS2E: Gaussian Splatting is an Effective Data Generator for Event Stream Generation\n- **🧑‍🔬 作者**：Yuchen Li, Chaoran Feng, Zhenyu Tang, Kaiyuan Deng, Wangbo Yu, Yonghong Tian, Li Yuan\n- **🏫 单位**：Peking University ⟐ Clemson University\n- **🔗 链接**：[[中英摘要](./abs/2505.15287.md)] [[arXiv:2505.15287](https://arxiv.org/abs/2505.15287)] [[Code](https://github.com/PKU-YuanGroup/GS2E)]\n- **📝 说明**:\n\n#### [115] X-GRM: Large Gaussian Reconstruction Model for Sparse-view X-rays to Computed Tomography\n- **🧑‍🔬 作者**：Yifan Liu, Wuyang Li, Weihao Yu, Chenxin Li, Alexandre Alahi, Max Meng, Yixuan Yuan\n- **🏫 单位**：The Chinese University of Hong Kong ⟐ EPFL ⟐ SUSTech\n- **🔗 链接**：[[中英摘要](./abs/2505.15235.md)] [[arXiv:2505.15235](https://arxiv.org/abs/2505.15235)] [[Code](https://github.com/CUHK-AIM-Group/X-GRM)]\n- **📝 说明**:\n\n#### [116] GT^2-GS: Geometry-aware Texture Transfer for Gaussian Splatting\n- **🧑‍🔬 作者**：Wenjie Liu, Zhongliang Liu, Junwei Shu, Changbo Wang, Yang Li\n- **🏫 单位**：East China Normal University\n- **🔗 链接**：[[中英摘要](./abs/2505.15208.md)] [[arXiv:2505.15208](https://arxiv.org/abs/2505.15208)] [[Code](https://github.com/vpx-ecnu/GT2-GS)]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [117] Scan, Materialize, Simulate: A Generalizable Framework for Physically Grounded Robot Planning\n- **🧑‍🔬 作者**：Amine Elhafsi, Daniel Morton, Marco Pavone\n- **🏫 单位**：Stanford University ⟐ NVIDIA Research\n- **🔗 链接**：[[中英摘要](./abs/2505.14938.md)] [[arXiv:2505.14938](https://arxiv.org/abs/2505.14938)] [Code]\n- **📝 说明**:\n\n#### [118] Personalize Your Gaussian: Consistent 3D Scene Personalization from a Single Image\n- **🧑‍🔬 作者**：Yuxuan Wang, Xuanyu Yi, Qingshan Xu, Yuan Zhou, Long Chen, Hanwang Zhang\n- **🏫 单位**：Nanyang Technological University ⟐ Hong Kong University of Science and Technology\n- **🔗 链接**：[[中英摘要](./abs/2505.14537.md)] [[arXiv:2505.14537](https://arxiv.org/abs/2505.14537)] [Code]\n- **📝 说明**:\n\n#### [119] MGStream: 
Motion-aware 3D Gaussian for Streamable Dynamic Scene Reconstruction\n- **🧑‍🔬 作者**：Zhenyu Bao, Qing Li, Guibiao Liao, Zhongyuan Zhao, Kanglin Liu\n- **🏫 单位**：Peking University ⟐ Pengcheng Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2505.13839.md)] [[arXiv:2505.13839](https://arxiv.org/abs/2505.13839)] [[Code](https://github.com/pcl3dv/MGStream)]\n- **📝 说明**:\n\n#### [120] Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos\n- **🧑‍🔬 作者**：Ruoyu Wang, Yi Ma, Shenghua Gao\n- **🏫 单位**：Transcengram\n- **🔗 链接**：[[中英摘要](./abs/2505.13440.md)] [[arXiv:2505.13440](https://arxiv.org/abs/2505.13440)] [[Code](https://github.com/Dwawayu/Pensieve)]\n- **📝 说明**:\n\n#### [121] Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation\n- **🧑‍🔬 作者**：Seungjun Oh, Younggeun Lee, Hyejin Jeon, Eunbyung Park\n- **🏫 单位**：Sungkyunkwan University ⟐ Yonsei University\n- **🔗 链接**：[[中英摘要](./abs/2505.13215.md)] [[arXiv:2505.13215](https://arxiv.org/abs/2505.13215)] [[Code](https://github.com/ohsngjun/3D-4DGS)]\n- **📝 说明**:\n\n#### [122] 3D Gaussian Adaptive Reconstruction for Fourier Light-Field Microscopy\n- **🧑‍🔬 作者**：Chenyu Xu, Zhouyu Jin, Chengkang Shen, Hao Zhu, Zhan Ma, Bo Xiong, You Zhou, Xun Cao, Ning Gu\n- **🏫 单位**：Nanjing University ⟐ Peking University\n- **🔗 链接**：[[中英摘要](./abs/2505.12875.md)] [[arXiv:2505.12875](https://arxiv.org/abs/2505.12875)] [Code]\n- **📝 说明**:\n\n#### [123] TACOcc: Target-Adaptive Cross-Modal Fusion with Volume Rendering for 3D Semantic Occupancy\n- **🧑‍🔬 作者**：Luyao Lei, Shuo Xu, Yifan Bai, Xing Wei\n- **🏫 单位**：Xi’an Jiaotong University\n- **🔗 链接**：[[中英摘要](./abs/2505.12693.md)] [[arXiv:2505.12693](https://arxiv.org/abs/2505.12693)] [Code]\n- **📝 说明**:\n\n#### [124] GTR: Gaussian Splatting Tracking and Reconstruction of Unknown Objects Based on Appearance and Geometric Complexity\n- **🧑‍🔬 作者**：Takuya Ikeda, Sergey Zakharov, Muhammad Zubair Irshad, Istvan Balazs Opra, Shun Iwase, Dian Chen, Mark Tjersland, Robert Lee, Alexandre Dilly, Rares Ambrus, Koichi Nishiwaki\n- **🏫 单位**：Woven by Toyota, Inc. ⟐ Toyota Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2505.11905.md)] [[arXiv:2505.11905](https://arxiv.org/abs/2505.11905)] [Code]\n- **📝 说明**:\n\n#### [125] MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos\n- **🧑‍🔬 作者**：Hongyi Zhou, Xiaogang Wang, Yulan Guo, Kai Xu\n- **🏫 单位**：National University of Defense Technology ⟐ Southwest University ⟐ Sun Yat-sen University\n- **🔗 链接**：[[中英摘要](./abs/2505.11868.md)] [[arXiv:2505.11868](https://arxiv.org/abs/2505.11868)] [Code]\n- **📝 说明**:\n\n#### [126] GrowSplat: Constructing Temporal Digital Twins of Plants with Gaussian Splats\n- **🧑‍🔬 作者**：Simeon Adebola, Shuangyu Xie, Chung Min Kim, Justin Kerr, Bart M. van Marrewijk, Mieke van Vlaardingen, Tim van Daalen, E.N. 
van Loo, Jose Luis Susa Rincon, Eugen Solowjow, Rick van de Zedde, Ken Goldberg\n- **🏫 单位**：UC Berkeley ⟐ Siemens Research Lab ⟐ Wageningen University and Research\n- **🔗 链接**：[[中英摘要](./abs/2505.10923.md)] [[arXiv:2505.10923](https://arxiv.org/abs/2505.10923)] [Code]\n- **📝 说明**:\n\n#### [127] EA-3DGS: Efficient and Adaptive 3D Gaussians with Highly Enhanced Quality for outdoor scenes\n- **🧑‍🔬 作者**：Jianlin Guo, Haihong Xiao, Wenxiong Kang\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2505.10787.md)] [[arXiv:2505.10787](https://arxiv.org/abs/2505.10787)] [Code]\n- **📝 说明**:\n\n#### [128] GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention\n- **🧑‍🔬 作者**：Lingjun Zhao, Sizhe Wei, James Hays, Lu Gan\n- **🏫 单位**：Georgia Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2505.10685.md)] [[arXiv:2505.10685](https://arxiv.org/abs/2505.10685)] [Code]\n- **📝 说明**:\n\n#### [129] ExploreGS: a vision-based low overhead framework for 3D scene reconstruction\n- **🧑‍🔬 作者**：Yunji Feng, Chengpu Yu, Fengrui Ran, Zhi Yang, Yinni Liu\n- **🏫 单位**：Beijing Institute of Technology\n- **🔗 链接**：[[中英摘要](./abs/2505.10578.md)] [[arXiv:2505.10578](https://arxiv.org/abs/2505.10578)] [Code]\n- **📝 说明**:\n\n#### [130] Consistent Quantity-Quality Control across Scenes for Deployment-Aware Gaussian Splatting\n- **🧑‍🔬 作者**：Fengdi Zhang, Hongkun Cao, Ruqi Huang\n- **🏫 单位**：Tsinghua University ⟐ Pengcheng Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2505.10473.md)] [[arXiv:2505.10473](https://arxiv.org/abs/2505.10473)] [[Code](https://github.com/zhang-fengdi/ControlGS)]\n- **📝 说明**:\n\n#### [131] ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars\n- **🧑‍🔬 作者**：Rui-Yang Ju, Sheng-Yen Huang, Yi-Ping Hung\n- **🏫 单位**：National Taiwan University\n- **🔗 链接**：[[中英摘要](./abs/2505.10072.md)] [[arXiv:2505.10072](https://arxiv.org/abs/2505.10072)] [Code]\n- **📝 说明**:\n\n#### [132] Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware\n- **🧑‍🔬 作者**：Justin Yu, Letian Fu, Huang Huang, Karim El-Refai, Rares Andrei Ambrus, Richard Cheng, Muhammad Zubair Irshad, Ken Goldberg\n- **🏫 单位**：University of California, Berkeley ⟐ Toyota Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2505.09601.md)] [[arXiv:2505.09601](https://arxiv.org/abs/2505.09601)] [[Code](https://github.com/uynitsuj/real2render2real)]\n- **📝 说明**:\n\n#### [133] Neural Video Compression using 2D Gaussian Splatting\n- **🧑‍🔬 作者**：Lakshya Gupta, Imran N. 
Junejo\n- **🏫 单位**：University of Toronto ⟐ AMD\n- **🔗 链接**：[[中英摘要](./abs/2505.09324.md)] [[arXiv:2505.09324](https://arxiv.org/abs/2505.09324)] [Code]\n- **📝 说明**:\n\n#### [134] TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian\n- **🧑‍🔬 作者**：Shijie Lian, Ziyi Zhang, Laurence Tianruo Yang, Mengyu Ren, Debin Liu, Hua Li\n- **🏫 单位**：Huazhong University of Science and Technology ⟐ The Chinese University of Hong Kong, Shenzhen ⟐ Zhengzhou University ⟐ Hainan University\n- **🔗 链接**：[[中英摘要](./abs/2505.08811.md)] [[arXiv:2505.08811](https://arxiv.org/abs/2505.08811)] [Code]\n- **📝 说明**:\n\n#### [135] SLAG: Scalable Language-Augmented Gaussian Splatting\n- **🧑‍🔬 作者**：Laszlo Szilagyi, Francis Engelmann, Jeannette Bohg\n- **🏫 单位**：Stanford University\n- **🔗 链接**：[[中英摘要](./abs/2505.08124.md)] [[arXiv:2505.08124](https://arxiv.org/abs/2505.08124)] [Code]\n- **📝 说明**:\n\n#### [136] UltraGauss: Ultrafast Gaussian Reconstruction of 3D Ultrasound Volumes\n- **🧑‍🔬 作者**：Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner, Angela Dai\n- **🏫 单位**：Visual Geometry Group, University of Oxford ⟐ Oxford Machine Learning in NeuroImaging Lab, University of Oxford\n- **🔗 链接**：[[中英摘要](./abs/2505.05643.md)] [[arXiv:2505.05643](https://arxiv.org/abs/2505.05643)] [Code]\n- **📝 说明**:\n\n#### [137] MoRe-3DGSMR: Motion-resolved reconstruction framework for free-breathing pulmonary MRI based on 3D Gaussian representation\n- **🧑‍🔬 作者**：Tengya Peng, Ruyi Zha, Qing Zou\n- **🏫 单位**：University of Texas Southwestern Medical Center ⟐ Australian National University\n- **🔗 链接**：[[中英摘要](./abs/2505.04959.md)] [[arXiv:2505.04959](https://arxiv.org/abs/2505.04959)] [Code]\n- **📝 说明**:\n\n#### [138] GSsplat: Generalizable Semantic Gaussian Splatting for Novel-view Synthesis in 3D Scenes\n- **🧑‍🔬 作者**：Feng Xiao, Hongbin Xu, Wanlin Liang, Wenxiong Kang\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2505.04659.md)] [[arXiv:2505.04659](https://arxiv.org/abs/2505.04659)] [[Code](https://github.com/onmyoji-xiao/GSsplat)]\n- **📝 说明**:\n\n#### [139] 3D Gaussian Splatting Data Compression with Mixture of Priors\n- **🧑‍🔬 作者**：Lei Liu, Zhenghao Chen, Dong Xu\n- **🏫 单位**：The University of Hong Kong, Hong Kong SAR, China ⟐ The University of Newcastle, Newcastle, Australia\n- **🔗 链接**：[[中英摘要](./abs/2505.03310.md)] [[arXiv:2505.03310](https://arxiv.org/abs/2505.03310)] [Code]\n- **📝 说明**:\n\n#### [140] SignSplat: Rendering Sign Language via Gaussian Splatting\n- **🧑‍🔬 作者**：Maksym Ivashechkin, Oscar Mendez, Richard Bowden\n- **🏫 单位**：CVSSP, University of Surrey, Guildford, United Kingdom\n- **🔗 链接**：[[中英摘要](./abs/2505.02108.md)] [[arXiv:2505.02108](https://arxiv.org/abs/2505.02108)] [Code]\n- **📝 说明**:\n\n#### [141] GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Anushka Agarwal, Muhammad Yusuf Hassan, Talha Chafekar\n- **🏫 单位**：University of Massachusetts Amherst\n- **🔗 链接**：[[中英摘要](./abs/2505.01928.md)] [[arXiv:2505.01928](https://arxiv.org/abs/2505.01928)] [Code]\n- **📝 说明**:\n\n#### [142] AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting\n- **🧑‍🔬 作者**：Junhao Shi, Jisheng Xu, Jianping He, Zhiliang Lin\n- **🏫 单位**：School of Electronic Information and Electrical Engineering ⟐ Shanghai Jiao Tong University\n- **🔗 链接**：[[中英摘要](./abs/2505.01799.md)] [[arXiv:2505.01799](https://arxiv.org/abs/2505.01799)] [Code]\n- **📝 说明**:\n\n#### [143] Real-Time Animatable 2DGS-Avatars with Detail Enhancement from Monocular Videos\n- **🧑‍🔬 
作者**：Xia Yuan, Hai Yuan, Wenyi Ge, Ying Fu, Xi Wu, Guanyu Xing\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2505.00421.md)] [[arXiv:2505.00421](https://arxiv.org/abs/2505.00421)] [Code]\n- **📝 说明**:\n\n#### [144] HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation\n- **🧑‍🔬 作者**：Haiyang Zhou, Wangbo Yu, Jiawen Guan, Xinhua Cheng, Yonghong Tian, Li Yuan\n- **🏫 单位**：School of Electronic and Computer Engineering, Peking University, Shenzhen, China ⟐ Harbin Institute of Technology, Shenzhen, China\n- **🔗 链接**：[[中英摘要](./abs/2504.21650.md)] [[arXiv:2504.21650](https://arxiv.org/abs/2504.21650)] [[Code](https://github.com/PKU-YuanGroup/HoloTime)]\n- **📝 说明**:\n\n#### [145] GauSS-MI: Gaussian Splatting Shannon Mutual Information for Active 3D Reconstruction\n- **🧑‍🔬 作者**：Yuhan Xie, Yixi Cai, Yinqiang Zhang, Lei Yang, Jia Pan\n- **🏫 单位**：School of Computing and Data Science, The University of Hong Kong, Hong Kong SAR, China ⟐ Division of Robotics, Perception, and Learning, KTH Royal Institute of Technology, Stockholm, Sweden ⟐ Faculty of Engineering, The University of Hong Kong, Hong Kong SAR, China ⟐ Centre for Transformative Garment Production, Hong Kong SAR, China\n- **🔗 链接**：[[中英摘要](./abs/2504.21067.md)] [[arXiv:2504.21067](https://arxiv.org/abs/2504.21067)] [Code]\n- **📝 说明**:\n\n#### [146] GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion\n- **🧑‍🔬 作者**：Jiaxin Hong, Sixu Chen, Shuoyang Sun, Hongyao Yu, Hao Fang, Yuqi Tan, Bin Chen, Shuhan Qi, Jiawei Li\n- **🏫 单位**：Harbin Institute of Technology (Shenzhen), Shenzhen, China ⟐ South China University of Technology, Guangzhou, China ⟐ Shenzhen International Graduate School, Tsinghua University, Shenzhen, China ⟐ Huawei Manufacturing, Shenzhen, China\n- **🔗 链接**：[[中英摘要](./abs/2504.20829.md)] [[arXiv:2504.20829](https://arxiv.org/abs/2504.20829)] [Code]\n- **📝 说明**:\n\n#### [147] Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting\n- **🧑‍🔬 作者**：Hanxi Liu, Yifang Men, Zhouhui Lian\n- **🏫 单位**：Wangxuan Institute of Computer Technology, Peking University, China ⟐ Institute for Intelligent Computing, Alibaba Group\n- **🔗 链接**：[[中英摘要](./abs/2504.20403.md)] [[arXiv:2504.20403](https://arxiv.org/abs/2504.20403)] [Code]\n- **📝 说明**:\n\n#### [148] GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jongwon Lee, Timothy Bretl\n- **🏫 单位**：University of Illinois Urbana-Champaign\n- **🔗 链接**：[[中英摘要](./abs/2504.20379.md)] [[arXiv:2504.20379](https://arxiv.org/abs/2504.20379)] [Code]\n- **📝 说明**:\n\n#### [149] GSFF-SLAM: 3D Semantic Gaussian Splatting SLAM via Feature Field\n- **🧑‍🔬 作者**：Zuxing Lu, Xin Yuan, Shaowen Yang, Jingyu Liu, Jiawei Wang, Changyin Sun\n- **🏫 单位**：Southeast University ⟐ Tongji University\n- **🔗 链接**：[[中英摘要](./abs/2504.19409.md)] [[arXiv:2504.19409](https://arxiv.org/abs/2504.19409)] [Code]\n- **📝 说明**:\n\n#### [150] Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting\n- **🧑‍🔬 作者**：Xiaofeng Jin, Yan Fang, Matteo Frosi, Jianfei Ge, Jiangjian Xiao, Matteo Matteucci\n- **🏫 单位**：Politecnico di Milano, Milan 20133, Italy ⟐ Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2504.19261.md)] [[arXiv:2504.19261](https://arxiv.org/abs/2504.19261)] [Code]\n- **📝 说明**:\n\n#### [151] 4DGS-CC: A Contextual Coding Framework for 4D Gaussian Splatting Data Compression\n- **🧑‍🔬 作者**：Zicong Chen, Zhenghao Chen, Wei Jiang, Wei Wang, Lei Liu, 
Dong Xu\n- **🏫 单位**：Beihang University ⟐ The University of Newcastle, Australia ⟐ Futurewei Technologies Inc ⟐ The University of Hong Kong\n- **🔗 链接**：[[中英摘要](./abs/2504.18925.md)] [[arXiv:2504.18925](https://arxiv.org/abs/2504.18925)] [Code]\n- **📝 说明**:\n\n#### [152] RGS-DR: Reflective Gaussian Surfels with Deferred Rendering for Shiny Objects\n- **🧑‍🔬 作者**：Georgios Kouros, Minye Wu, Tinne Tuytelaars\n- **🏫 单位**：KU Leuven\n- **🔗 链接**：[[中英摘要](./abs/2504.18468.md)] [[arXiv:2504.18468](https://arxiv.org/abs/2504.18468)] [Code]\n- **📝 说明**:\n\n#### [153] STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting\n- **🧑‍🔬 作者**：Yunze Deng, Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu\n- **🏫 单位**：School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China\n- **🔗 链接**：[[中英摘要](./abs/2504.18318.md)] [[arXiv:2504.18318](https://arxiv.org/abs/2504.18318)] [Code]\n- **📝 说明**:\n\n#### [154] When Gaussian Meets Surfel: Ultra-fast High-fidelity Radiance Field Rendering\n- **🧑‍🔬 作者**：Keyang Ye, Tianjia Shao, Kun Zhou\n- **🏫 单位**：State Key Lab of CAD&CG, Zhejiang University, China\n- **🔗 链接**：[[中英摘要](./abs/2504.17545.md)] [[arXiv:2504.17545](https://arxiv.org/abs/2504.17545)] [Code]\n- **📝 说明**:\n\n#### [155] Gaussian Splatting is an Effective Data Generator for 3D Object Detection\n- **🧑‍🔬 作者**：Farhad G. Zanjani, Davide Abati, Auke Wiggers, Dimitris Kalatzis, Jens Petersen, Hong Cai, Amirhossein Habibian\n- **🏫 单位**：Qualcomm AI Research\n- **🔗 链接**：[[中英摘要](./abs/2504.16740.md)] [[arXiv:2504.16740](https://arxiv.org/abs/2504.16740)] [Code]\n- **📝 说明**:\n\n#### [156] ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration\n- **🧑‍🔬 作者**：Andrea Conti, Matteo Poggi, Valerio Cambareri, Martin R. Oswald, Stefano Mattoccia\n- **🏫 单位**：University of Bologna, Italy ⟐ Sony DepthSensing Solutions, Belgium ⟐ University of Amsterdam, Netherlands\n- **🔗 链接**：[[中英摘要](./abs/2504.16545.md)] [[arXiv:2504.16545](https://arxiv.org/abs/2504.16545)] [Code]\n- **📝 说明**:\n\n#### [157] StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians\n- **🧑‍🔬 作者**：Cailin Zhuang, Yaoqi Hu, Xuanyang Zhang, Wei Cheng, Jiacheng Bao, Shengqi Liu, Yiying Yang, Xianfang Zeng, Gang Yu, Ming Li\n- **🏫 单位**：ShanghaiTech University ⟐ StepFun ⟐ AIGC Research (AI4C Team) ⟐ Guangming Lab\n- **🔗 链接**：[[中英摘要](./abs/2504.15281.md)] [[arXiv:2504.15281](https://arxiv.org/abs/2504.15281)] [[Code](https://github.com/AIGCResearch/styleme3d)]\n- **📝 说明**:\n\n#### [158] MoBGS: Motion Deblurring Dynamic 3D Gaussian Splatting for Blurry Monocular Video\n- **🧑‍🔬 作者**：Minh-Quan Viet Bui, Jongmin Park, Juan Luis Gonzalez Bello, Jaeho Moon, Jihyong Oh, Munchurl Kim\n- **🏫 单位**：KAIST ⟐ Department of Imaging Science, GSAIM, Chung-Ang University\n- **🔗 链接**：[[中英摘要](./abs/2504.15122.md)] [[arXiv:2504.15122](https://arxiv.org/abs/2504.15122)] [Code]\n- **📝 说明**: 🏆 Accepted to AAAI 2026\n\n#### [159] IXGS-Intraoperative 3D Reconstruction from Sparse, Arbitrarily Posed Real X-rays\n- **🧑‍🔬 作者**：Sascha Jecklin, Aidana Massalimova, Ruyi Zha, Lilian Calvet, Christoph J. 
Laux, Mazda Farshad, Philipp Fürnstahl\n- **🏫 单位**：Research in Orthopedic Computer Science, Balgrist University Hospital, Zurich, 8008, Switzerland ⟐ Department of Orthopedics, Balgrist University Hospital, University of Zurich, Zurich, 8008, Switzerland ⟐ The Australian National University, Canberra, ACT 2601, Australia\n- **🔗 链接**：[[中英摘要](./abs/2504.14699.md)] [[arXiv:2504.14699](https://arxiv.org/abs/2504.14699)] [[Code](https://github.com/MrMonk3y/IXGS)]\n- **📝 说明**：\n\n#### [160] VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control\n- **🧑‍🔬 作者**：Lifeng Lin, Rongfeng Lu, Quan Chen, Haofan Ren, Ming Lu, Yaoqi Sun, Chenggang Yan, Anke Xue\n- **🏫 单位**：Hangzhou Dianzi University ⟐ Intel Labs China ⟐ Lishui University\n- **🔗 链接**：[[中英摘要](./abs/2504.14548.md)] [[arXiv:2504.14548](https://arxiv.org/abs/2504.14548)] [Code]\n- **📝 说明**：\n\n#### [161] Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding\n- **🧑‍🔬 作者**：Junyan Su, Baozhu Zhao, Xiaohan Zhang, Qi Liu\n- **🏫 单位**：Department of Future Technology, South China University of Technology, Guangzhou\n- **🔗 链接**：[[中英摘要](./abs/2504.14460.md)] [[arXiv:2504.14460](https://arxiv.org/abs/2504.14460)] [Code]\n- **📝 说明**：\n\n#### [162] SEGA: Drivable 3D Gaussian Head Avatar from a Single Image\n- **🧑‍🔬 作者**：Chen Guo, Zhuo Su, Jian Wang, Shuang Li, Xu Chang, Zhaohu Li, Yang Zhao, Guidong Wang, Ruqi Huang\n- **🏫 单位**：Tsinghua Shenzhen International Graduate School ⟐ ByteDance\n- **🔗 链接**：[[中英摘要](./abs/2504.14373.md)] [[arXiv:2504.14373](https://arxiv.org/abs/2504.14373)] [Code]\n- **📝 说明**：\n\n#### [163] EG-Gaussian: Epipolar Geometry and Graph Network Enhanced 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Beizhen Zhao, Yifan Zhou, Zijian Wang, Hao Wang\n- **🏫 单位**：The Hong Kong University of Science and Technology (Guangzhou)\n- **🔗 链接**：[[中英摘要](./abs/2504.13540.md)] [[arXiv:2504.13540](https://arxiv.org/abs/2504.13540)] [Code]\n- **📝 说明**：\n\n#### [164] Volume Encoding Gaussians: Transfer Function-Agnostic 3D Gaussians for Volume Rendering\n- **🧑‍🔬 作者**：Landon Dyken, Andres Sewell, Will Usher, Steve Petruzza, Sidharth Kumar\n- **🏫 单位**：\n- **🔗 链接**：[[中英摘要](./abs/2504.13339.md)] [[arXiv:2504.13339](https://arxiv.org/abs/2504.13339)] [Code]\n- **📝 说明**：\n\n#### [165] BEV-GS: Feed-forward Gaussian Splatting in Bird's-Eye-View for Road Reconstruction\n- **🧑‍🔬 作者**：Wenhua Wu, Tong Zhao, Chensheng Peng, Lei Yang, Yintao Wei, Zhe Liu, Hesheng Wang\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Tsinghua University ⟐ University of California, Berkeley\n- **🔗 链接**：[[中英摘要](./abs/2504.13207.md)] [[arXiv:2504.13207](https://arxiv.org/abs/2504.13207)] [[Code](https://github.com/cat-wwh/BEV-GS)]\n- **📝 说明**：\n\n#### [166] EDGS: Eliminating Densification for Efficient Convergence of 3DGS\n- **🧑‍🔬 作者**：Dmytro Kotovenko, Olga Grebenkova, Björn Ommer\n- **🏫 单位**：CompVis @ LMU Munich ⟐ Munich Center for Machine Learning (MCML)\n- **🔗 链接**：[[中英摘要](./abs/2504.13204.md)] [[arXiv:2504.13204](https://arxiv.org/abs/2504.13204)] [[Code](https://github.com/CompVis/EDGS)]\n- **📝 说明**：\n\n#### [167] Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs\n- **🧑‍🔬 作者**：Shaohui Dai, Yansong Qu, Zheyan Li, Xinyang Li, Shengchuan Zhang, Liujuan Cao\n- **🏫 单位**：Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China\n- **🔗 链接**：[[中英摘要](./abs/2504.13153.md)] 
[[arXiv:2504.13153](https://arxiv.org/abs/2504.13153)] [[Code](https://github.com/Atrovast/THGS)]\n- **📝 说明**：\n\n#### [168] CompGS++: Compressed Gaussian Splatting for Static and Dynamic Scene Representation\n- **🧑‍🔬 作者**：Xiangrui Liu, Xinju Wu, Shiqi Wang, Zhu Li, Sam Kwong\n- **🏫 单位**：City University of Hong Kong ⟐ University of Missouri–Kansas City ⟐ Lingnan University\n- **🔗 链接**：[[中英摘要](./abs/2504.13022.md)] [[arXiv:2504.13022](https://arxiv.org/abs/2504.13022)] [Code]\n- **📝 说明**：\n\n#### [169] GSAC: Leveraging Gaussian Splatting for Photorealistic Avatar Creation with Unity Integration\n- **🧑‍🔬 作者**：Rendong Zhang, Alexandra Watkins, Nilanjan Sarkar\n- **🏫 单位**：Dept. of Computer Science, Vanderbilt University\n- **🔗 链接**：[[中英摘要](./abs/2504.12999.md)] [[arXiv:2504.12999](https://arxiv.org/abs/2504.12999)] [Code]\n- **📝 说明**：\n\n#### [170] Second-order Optimization of Gaussian Splats with Importance Sampling\n- **🧑‍🔬 作者**：Hamza Pehlivan, Andrea Boscolo Camiletto, Lin Geng Foo, Marc Habermann, Christian Theobalt\n- **🏫 单位**：Max Planck Institute for Informatics, Saarland Informatics Campus\n- **🔗 链接**：[[中英摘要](./abs/2504.12905.md)] [[arXiv:2504.12905](https://arxiv.org/abs/2504.12905)] [Code]\n- **📝 说明**：\n\n#### [171] AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering\n- **🧑‍🔬 作者**：Michael Steiner, Thomas Köhler, Lukas Radl, Felix Windisch, Dieter Schmalstieg, Markus Steinberger\n- **🏫 单位**：Graz University of Technology ⟐ University of Stuttgart\n- **🔗 链接**：[[中英摘要](./abs/2504.12811.md)] [[arXiv:2504.12811](https://arxiv.org/abs/2504.12811)] [Code]\n- **📝 说明**：\n\n#### [172] CAGE-GS: High-fidelity Cage Based 3D Gaussian Splatting Deformation\n- **🧑‍🔬 作者**：Yifei Tong, Runze Tian, Xiao Han, Dingyao Liu, Fenggen Yu, Yan Zhang\n- **🏫 单位**：Nanjing University ⟐ Simon Fraser University\n- **🔗 链接**：[[中英摘要](./abs/2504.12800.md)] [[arXiv:2504.12800](https://arxiv.org/abs/2504.12800)] [Code]\n- **📝 说明**：\n\n#### [173] ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior\n- **🧑‍🔬 作者**：Xiao Han, Runze Tian, Yifei Tong, Fenggen Yu, Dingyao Liu, Yan Zhang\n- **🏫 单位**：Nanjing University ⟐ Simon Fraser University\n- **🔗 链接**：[[中英摘要](./abs/2504.12788.md)] [[arXiv:2504.12788](https://arxiv.org/abs/2504.12788)] [Code]\n- **📝 说明**：\n\n#### [174] SHeaP: Self-Supervised Head Geometry Predictor Learned via 2D Gaussians\n- **🧑‍🔬 作者**：Liam Schoneveld, Zhe Chen, Davide Davoli, Jiapeng Tang, Saimon Terazawa, Ko Nishino, Matthias Nießner\n- **🏫 单位**：Woven by Toyota ⟐ Toyota Motor Europe NV/SA ⟐ Technical University of Munich ⟐ Kyoto University\n- **🔗 链接**：[[中英摘要](./abs/2504.12292.md)] [[arXiv:2504.12292](https://arxiv.org/abs/2504.12292)] [Code]\n- **📝 说明**：\n\n#### [175] CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting\n- **🧑‍🔬 作者**：Wei Sun, Yanzhao Zhou, Jianbin Jiao, Yuan Li\n- **🏫 单位**：University of Chinese Academy of Sciences\n- **🔗 链接**：[[中英摘要](./abs/2504.11893.md)] [[arXiv:2504.11893](https://arxiv.org/abs/2504.11893)] [Code]\n- **📝 说明**：\n\n#### [176] 3DAffordSplat: Efficient Affordance Reasoning with 3D Gaussians\n- **🧑‍🔬 作者**：Zeming Wei, Junyi Lin, Yang Liu, Weixing Chen, Jingzhou Luo, Guanbin Li, Liang Lin\n- **🏫 单位**：Sun Yat-sen University, China ⟐ Peng Cheng Laboratory ⟐ Guangdong Key Laboratory of Big Data Analysis and Processing\n- **🔗 链接**：[[中英摘要](./abs/2504.11218.md)] [[arXiv:2504.11218](https://arxiv.org/abs/2504.11218)] [[Code](https://github.com/HCPLab-SYSU/3DAffordSplat)]\n- **📝 说明**：\n\n#### [177] Easy3D: A 
Simple Yet Effective Method for 3D Interactive Segmentation\n- **🧑‍🔬 作者**：Andrea Simonelli, Norman Müller, Peter Kontschieder\n- **🏫 单位**：Meta Reality Labs Zürich\n- **🔗 链接**：[[中英摘要](./abs/2504.11024.md)] [[arXiv:2504.11024](https://arxiv.org/abs/2504.11024)] [Code]\n- **📝 说明**：\n\n#### [178] LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis\n- **🧑‍🔬 作者**：Hao Sun, Fenggen Yu, Huiyao Xu, Tao Zhang, Changqing Zou\n- **🏫 单位**：Zhejiang Lab ⟐ University of Chinese Academy of Sciences ⟐ State Key Lab of CAD&CG, Zhejiang University ⟐ Simon Fraser University ⟐ Hangzhou Dianzi University\n- **🔗 链接**：[[中英摘要](./abs/2504.10331.md)] [[arXiv:2504.10331](https://arxiv.org/abs/2504.10331)] [Code]\n- **📝 说明**：\n\n#### [179] ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting\n- **🧑‍🔬 作者**：Huiqi Wu, Jianbo Mei, Yingjie Huang, Yining Xu, Jingjiao You, Yilong Liu, Li Yao\n- **🏫 单位**：Southeast University, China\n- **🔗 链接**：[[中英摘要](./abs/2504.10316.md)] [[arXiv:2504.10316](https://arxiv.org/abs/2504.10316)] [Code]\n- **📝 说明**：\n\n#### [180] EBAD-Gaussian: Event-driven Bundle Adjusted Deblur Gaussian Splatting\n- **🧑‍🔬 作者**：Yufei Deng, Yuanjian Wang, Rong Xiao, Chenwei Tang, Jizhe Zhou, Jiahao Fan, Deng Xiong, Jiancheng Lv, Huajin Tang\n- **🏫 单位**：Sichuan University ⟐ Stevens Institute of Technology, Hoboken, NJ, USA ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2504.10012.md)] [[arXiv:2504.10012](https://arxiv.org/abs/2504.10012)] [Code]\n- **📝 说明**：\n\n#### [181] GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting\n- **🧑‍🔬 作者**：Junlin Hao, Peiheng Wang, Haoyang Wang, Xinggong Zhang, Zongming Guo\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2504.10001.md)] [[arXiv:2504.10001](https://arxiv.org/abs/2504.10001)] [Code]\n- **📝 说明**：\n\n#### [182] MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling\n- **🧑‍🔬 作者**：Yunpeng Tan, Junlin Hao, Jiangkai Wu, Liming Liu, Qingyang Li, Xinggong Zhang\n- **🏫 单位**：Peking University\n- **🔗 链接**：[[中英摘要](./abs/2504.09878.md)] [[arXiv:2504.09878](https://arxiv.org/abs/2504.09878)] [Code]\n- **📝 说明**：\n\n#### [183] LightHeadEd: Relightable & Editable Head Avatars from a Smartphone\n- **🧑‍🔬 作者**：Pranav Manu, Astitva Srivastava, Amit Raj, Varun Jampani, Avinash Sharma, P.J. 
Narayanan\n- **🏫 单位**：IIIT Hyderabad ⟐ Google Research ⟐ Stability AI ⟐ IIT Jodhpur\n- **🔗 链接**：[[中英摘要](./abs/2504.09671.md)] [[arXiv:2504.09671](https://arxiv.org/abs/2504.09671)] [Code]\n- **📝 说明**：\n\n#### [184] TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting\n- **🧑‍🔬 作者**：Zhicong Wu, Hongbin Xu, Gang Xu, Ping Nie, Zhixin Yan, Jinkai Zheng, Liangqiong Qu, Ming Li, Liqiang Nie\n- **🏫 单位**：Xiamen University ⟐ South China University of Technology ⟐ Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), China ⟐ Peking University ⟐ Hangzhou Dianzi University ⟐ University of Hong Kong, China ⟐ Harbin Institute of Technology (Shenzhen), China\n- **🔗 链接**：[[中英摘要](./abs/2504.09588.md)] [[arXiv:2504.09588](https://arxiv.org/abs/2504.09588)] [Code]\n- **📝 说明**：\n\n#### [185] A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds\n- **🧑‍🔬 作者**：Jizong Peng, Tze Ho Elden Tse, Kai Xu, Wenchao Gao, Angela Yao\n- **🏫 单位**：dConstruct Robotics ⟐ National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2504.09129.md)] [[arXiv:2504.09129](https://arxiv.org/abs/2504.09129)] [Code]\n- **📝 说明**：\n\n#### [186] You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting\n- **🧑‍🔬 作者**：Zhijie Shen, Chunyu Lin, Shujuan Huang, Lang Nie, Kang Liao, Yao Zhao\n- **🏫 单位**：Beijing Jiaotong University ⟐ Nanyang Technological University\n- **🔗 链接**：[[中英摘要](./abs/2504.09062.md)] [[arXiv:2504.09062](https://arxiv.org/abs/2504.09062)] [[Code](https://github.com/zhijieshen-bjtu/TPGS)]\n- **📝 说明**：\n\n#### [187] BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting\n- **🧑‍🔬 作者**：Yongchang Wu, Zipeng Qi, Zhenwei Shi, Zhengxia Zou\n- **🏫 单位**：Beihang University (BUAA)\n- **🔗 链接**：[[中英摘要](./abs/2504.09048.md)] [[arXiv:2504.09048](https://arxiv.org/abs/2504.09048)] [[Code](https://github.com/SunshineWYC/BlockGaussian)]\n- **📝 说明**：\n\n#### [188] FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents\n- **🧑‍🔬 作者**：Xin Tan, Yuzhou Ji, He Zhu, Yuan Xie\n- **🏫 单位**：East China Normal University ⟐ Shanghai Innovation Institute\n- **🔗 链接**：[[中英摘要](./abs/2504.08581.md)] [[arXiv:2504.08581](https://arxiv.org/abs/2504.08581)] [Code]\n- **📝 说明**：\n\n#### [189] InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians\n- **🧑‍🔬 作者**：Kefan Chen, Sergiu Oprea, Justin Theiss, Sreyas Mohan, Srinath Sridhar, Aayush Prakash\n- **🏫 单位**：Brown University ⟐ Meta Reality Labs\n- **🔗 链接**：[[中英摘要](./abs/2504.07949.md)] [[arXiv:2504.07949](https://arxiv.org/abs/2504.07949)] [Code]\n- **📝 说明**：\n\n#### [190] View-Dependent Uncertainty Estimation of 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Chenyu Han, Corentin Dumery\n- **🏫 单位**：Computer Vision Lab, EPFL\n- **🔗 链接**：[[中英摘要](./abs/2504.07370.md)] [[arXiv:2504.07370](https://arxiv.org/abs/2504.07370)] [Code]\n- **📝 说明**：\n\n#### [191] GIGA: Generalizable Sparse Image-driven Gaussian Avatars\n- **🧑‍🔬 作者**：Anton Zubekhin, Heming Zhu, Paulo Gotardo, Thabo Beeler, Marc Habermann, Christian Theobalt\n- **🏫 单位**：Max Planck Institute for Informatics ⟐ Saarbrücken Research Center for Visual Computing, Interaction and AI ⟐ Google\n- **🔗 链接**：[[中英摘要](./abs/2504.07144.md)] [[arXiv:2504.07144](https://arxiv.org/abs/2504.07144)] [[Code](https://github.com/antonzub99/giga)]\n- **📝 说明**：\n\n#### [192] IAAO: Interactive Affordance Learning for 
Articulated Objects in 3D Environments\n- **🧑‍🔬 作者**：Can Zhang, Gim Hee Lee\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2504.06827.md)] [[arXiv:2504.06827](https://arxiv.org/abs/2504.06827)] [Code]\n- **📝 说明**：\n\n#### [193] SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering\n- **🧑‍🔬 作者**：Hanxiao Sun, YuPeng Gao, Jin Xie, Jian Yang, Beibei Wang\n- **🏫 单位**：Nankai University ⟐ Nanjing University\n- **🔗 链接**：[[中英摘要](./abs/2504.06815.md)] [[arXiv:2504.06815](https://arxiv.org/abs/2504.06815)] [Code]\n- **📝 说明**：\n\n#### [194] GSta: Efficient Training Scheme with Siestaed Gaussians for Monocular 3D Scene Reconstruction\n- **🧑‍🔬 作者**：Anil Armagan, Albert Saà-Garriga, Bruno Manganelli, Kyuwon Kim, M. Kerim Yucel\n- **🏫 单位**：Samsung R&D Institute UK ⟐ Samsung Electronics\n- **🔗 链接**：[[中英摘要](./abs/2504.06716.md)] [[arXiv:2504.06716](https://arxiv.org/abs/2504.06716)] [Code]\n- **📝 说明**：\n\n#### [195] Stochastic Ray Tracing of 3D Transparent Gaussians\n- **🧑‍🔬 作者**：Xin Sun, Iliyan Georgiev, Yun Fei, Miloš Hašan\n- **🏫 单位**：Adobe\n- **🔗 链接**：[[中英摘要](./abs/2504.06598.md)] [[arXiv:2504.06598](https://arxiv.org/abs/2504.06598)] [Code]\n- **📝 说明**：\n\n#### [196] econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians\n- **🧑‍🔬 作者**：Can Zhang, Gim Hee Lee\n- **🏫 单位**：National University of Singapore\n- **🔗 链接**：[[中英摘要](./abs/2504.06003.md)] [[arXiv:2504.06003](https://arxiv.org/abs/2504.06003)] [Code]\n- **📝 说明**：\n\n#### [197] Micro-splatting: Maximizing Isotropic Constraints for Refined Optimization in 3D Gaussian Splatting\n- **🧑‍🔬 作者**：Jee Won Lee, Hansol Lim, Sooyeun Yang, Jongseong Choi\n- **🏫 单位**：State University of New York, Korea ⟐ State University of New York, Stony Brook\n- **🔗 链接**：[[中英摘要](./abs/2504.05740.md)] [[arXiv:2504.05740](https://arxiv.org/abs/2504.05740)] [Code]\n- **📝 说明**：\n\n#### [198] View-Dependent Deformation Fields for 2D Editing of 3D Models\n- **🧑‍🔬 作者**：Martin El Mqirmi, Noam Aigerman\n- **🏫 单位**：Université de Montréal\n- **🔗 链接**：[[中英摘要](./abs/2504.05544.md)] [[arXiv:2504.05544](https://arxiv.org/abs/2504.05544)] [Code]\n- **📝 说明**：\n\n#### [199] L3GS: Layered 3D Gaussian Splats for Efficient 3D Scene Delivery\n- **🧑‍🔬 作者**：Yi-Zhen Tsai, Xuechen Zhang, Zheng Li, Jiasi Chen\n- **🏫 单位**：University of California ⟐ University of Michigan\n- **🔗 链接**：[[中英摘要](./abs/2504.05517.md)] [[arXiv:2504.05517](https://arxiv.org/abs/2504.05517)] [Code]\n- **📝 说明**：\n\n#### [200] Let it Snow! 
Animating Static Gaussian Scenes With Dynamic Weather Effects\n- **🧑‍🔬 作者**：Gal Fiebelman, Hadar Averbuch-Elor, Sagie Benaim\n- **🏫 单位**：The Hebrew University of Jerusalem ⟐ Cornell University\n- **🔗 链接**：[[中英摘要](./abs/2504.05296.md)] [[arXiv:2504.05296](https://arxiv.org/abs/2504.05296)] [Code]\n- **📝 说明**：\n\n#### [201] 3D Gaussian Particle Approximation of VDB Datasets: A Study for Scientific Visualization\n- **🧑‍🔬 作者**：Isha Sharma, Dieter Schmalstieg\n- **🏫 单位**：University of Stuttgart\n- **🔗 链接**：[[中英摘要](./abs/2504.04857.md)] [[arXiv:2504.04857](https://arxiv.org/abs/2504.04857)] [Code]\n- **📝 说明**：\n\n#### [202] 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS\n- **🧑‍🔬 作者**：Zhisheng Huang, Peng Wang, Jingdong Zhang, Yuan Liu, Xin Li, Wenping Wang\n- **🏫 单位**：Texas A&M University ⟐ University of Hong Kong (HKU) ⟐ Hong Kong University of Science and Technology (HKUST)\n- **🔗 链接**：[[中英摘要](./abs/2504.04294.md)] [[arXiv:2504.04294](https://arxiv.org/abs/2504.04294)] [Code]\n- **📝 说明**：\n\n#### [203] Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning\n- **🧑‍🔬 作者**：Yuyang Zhang, Baao Xie, Hu Zhu, Qi Wang, Huanting Guo, Xin Jin, Wenjun Zeng\n- **🏫 单位**：Shanghai Jiao Tong University ⟐ Ningbo Institute of Digital Twin, Eastern Institute of Technology ⟐ Zhejiang Key Laboratory of Industrial Intelligence and Digital Twin, Eastern Institute of Technology ⟐ Hong Kong Polytechnic University\n- **🔗 链接**：[[中英摘要](./abs/2504.04190.md)] [[arXiv:2504.04190](https://arxiv.org/abs/2504.04190)] [Code]\n- **📝 说明**：\n\n#### [204] HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration\n- **🧑‍🔬 作者**：Boyuan Wang, Runqi Ouyang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Guan Huang, Lihong Liu, Xingang Wang\n- **🏫 单位**：GigaAI ⟐ Institute of Automation, Chinese Academy of Sciences, China ⟐ Peking University, China\n- **🔗 链接**：[[中英摘要](./abs/2504.03536.md)] [[arXiv:2504.03536](https://arxiv.org/abs/2504.03536)] [[Code](https://github.com/GigaAI-research/HumanDreamer-X)]\n- **📝 说明**：\n\n#### [205] MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM\n- **🧑‍🔬 作者**：Renwu Li, Wenjing Ke, Dong Li, Lu Tian, Emad Barsoum\n- **🏫 单位**：Advanced Micro Devices, Inc.\n- **🔗 链接**：[[中英摘要](./abs/2504.02437.md)] [[arXiv:2504.02437](https://arxiv.org/abs/2504.02437)] [Code]\n- **📝 说明**：\n\n#### [206] ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation\n- **🧑‍🔬 作者**：Yuan Zhou, Shilong Jin, Litao Hua, Wanjun Lv, Haoran Duan, Jungong Han\n- **🏫 单位**：Nanjing University of Information Science and Technology ⟐ Tsinghua University ⟐ Lenovo\n- **🔗 链接**：[[中英摘要](./abs/2504.02316.md)] [[arXiv:2504.02316](https://arxiv.org/abs/2504.02316)] [Code]\n- **📝 说明**：\n\n#### [207] Digital-twin imaging based on descattering Gaussian splatting\n- **🧑‍🔬 作者**：Suguru Shimomura, Kazuki Yamanouchi, Jun Tanida\n- **🏫 单位**：Graduate School of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka, Japan\n- **🔗 链接**：[[中英摘要](./abs/2504.02278.md)] [[arXiv:2504.02278](https://arxiv.org/abs/2504.02278)] [Code]\n- **📝 说明**：\n\n#### [208] UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting\n- **🧑‍🔬 作者**：Jaehoon Choi, Dongki Jung, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon\n- **🏫 单位**：University of Maryland, College Park ⟐ DEVCOM Army Research Laboratory\n- **🔗 链接**：[[中英摘要](./abs/2504.02158.md)] [[arXiv:2504.02158](https://arxiv.org/abs/2504.02158)] [Code]\n- 
**📝 说明**：\n\n#### [209] BOGausS: Better Optimized Gaussian Splatting\n- **🧑‍🔬 作者**：Stéphane Pateux, Matthieu Gendrin, Luce Morin, Théo Ladune, Xiaoran Jiang\n- **🏫 单位**：Orange Innovation, Cesson-Sévigné, France ⟐ Univ Rennes, INSA Rennes, CNRS, IETR-UMR 6164, F-35000 Rennes, France\n- **🔗 链接**：[[中英摘要](./abs/2504.01844.md)] [[arXiv:2504.01844](https://arxiv.org/abs/2504.01844)] [Code]\n- **📝 说明**：\n\n#### [210] RealityAvatar: Towards Realistic Loose Clothing Modeling in Animatable 3D Gaussian Avatars\n- **🧑‍🔬 作者**：Yahui Li, Zhi Zeng, Liming Pang, Guixuan Zhang, Shuwu Zhang\n- **🏫 单位**：Beijing University of Posts and Telecommunications\n- **🔗 链接**：[[中英摘要](./abs/2504.01559.md)] [[arXiv:2504.01559](https://arxiv.org/abs/2504.01559)] [Code]\n- **📝 说明**：\n\n#### [211] High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model\n- **🧑‍🔬 作者**：Yiyang Shen, Kun Zhou, He Wang, Yin Yang, Tianjia Shao\n- **🏫 单位**：State Key Lab of CAD&CG, Zhejiang University ⟐ AI Centre, University College London ⟐ University of Utah\n- **🔗 链接**：[[中英摘要](./abs/2504.01512.md)] [[arXiv:2504.01512](https://arxiv.org/abs/2504.01512)] [Code]\n- **📝 说明**：\n\n#### [212] 3D Gaussian Inverse Rendering with Approximated Global Illumination\n- **🧑‍🔬 作者**：Hyunwoo Park, Gun Ryu, Wonjun Kim\n- **🏫 单位**：HKUST(GZ) ⟐ NIO ⟐ University of Amsterdam ⟐ HKUST\n- **🔗 链接**：[[中英摘要](./abs/2504.01358.md)] [[arXiv:2504.01358](https://arxiv.org/abs/2504.01358)] [[Code](https://github.com/wuzirui/gs-ssr)]\n- **📝 说明**：\n\n#### [213] Coca-Splat: Collaborative Optimization for Camera Parameters and 3D Gaussians\n- **🧑‍🔬 作者**：Jiamin Wu, Hongyang Li, Xiaoke Jiang, Yuan Yao, Lei Zhang\n- **🏫 单位**：Hong Kong University of Science and Technology ⟐ International Digital Economy Academy (IDEA)\n- **🔗 链接**：[[中英摘要](./abs/2504.00639.md)] [[arXiv:2504.00639](https://arxiv.org/abs/2504.00639)] [Code]\n- **📝 说明**：\n\n#### [214] Distilling Multi-view Diffusion Models into 3D Generators\n- **🧑‍🔬 作者**：Hao Qin, Luyuan Chen, Ming Kong, Mengxu Lu, Qiang Zhu\n- **🏫 单位**：Zhejiang University ⟐ Beijing Information Science and Technology University ⟐ Hikvision Research Institute\n- **🔗 链接**：[[中英摘要](./abs/2504.00457.md)] [[arXiv:2504.00457](https://arxiv.org/abs/2504.00457)] [[Code](https://qinbaigao.github.io/DD3G_project/)]\n- **📝 说明**：\n\n#### [215] ADGaussian: Generalizable Gaussian Splatting for Autonomous Driving with Multi-modal Inputs\n- **🧑‍🔬 作者**：Qi Song, Chenghong Li, Haotong Lin, Sida Peng, Rui Huang\n- **🏫 单位**：SSE, CUHKSZ ⟐ Zhejiang University\n- **🔗 链接**：[[中英摘要](./abs/2504.00437.md)] [[arXiv:2504.00437](https://arxiv.org/abs/2504.00437)] [Code]\n- **📝 说明**：\n\n#### [216] SonarSplat: Novel View Synthesis of Imaging Sonar via Gaussian Splatting\n- **🧑‍🔬 作者**：Advaith V. Sethuraman, Max Rucker, Onur Bagoren, Pou-Chun Kung, Nibarkavi N.B. Amutha, Katherine A. Skinner\n- **🏫 单位**：Department of Robotics, University of Michigan, Ann Arbor\n- **🔗 链接**：[[中英摘要](./abs/2504.00159.md)] [[arXiv:2504.00159](https://arxiv.org/abs/2504.00159)] [Code]\n- **📝 说明**：\n"
  },
  {
    "path": "scripts/changelog.py",
    "content": "#!/usr/bin/env python\n\nimport subprocess\nimport re\nfrom collections import defaultdict\nfrom datetime import datetime\n\n\ndef parse_git_log(git_log):\n    commits = git_log.split(\"commit \")[1:]\n    changelog_entries = defaultdict(list)\n\n    for commit in commits:\n        date_raw = re.search(r\"Date:\\s+(.+)\", commit).group(1).strip()\n        date = datetime.strptime(date_raw, \"%a %b %d %H:%M:%S %Y %z\")\n        date_formatted = date.strftime(\"%Y/%m/%d\")\n        if 'Add \"' in commit:\n            papers = re.findall(r'Add \"(.+?)\"', commit)\n            for paper in papers:\n                changelog_entries[date_formatted].append(f'Add \"{paper}\"')\n\n    return changelog_entries\n\n\ndef write_changelog(changelog_entries, filename=\"Changelog.md\"):\n    with open(filename, \"w\") as file:\n        file.write(\"# Changelog\\n\\n\")\n        for date in sorted(changelog_entries.keys(), reverse=True):\n            file.write(f\"### {date}\\n\\n\")\n            for entry in changelog_entries[date]:\n                file.write(entry + \"\\n\\n\")\n\n\ndef main():\n    try:\n        kwargs = dict(check=True, text=True, capture_output=True)\n        process = subprocess.run([\"git\", \"log\"], **kwargs)\n        write_changelog(parse_git_log(process.stdout))\n        print(\"Changelog generated successfully.\")\n    except subprocess.CalledProcessError as e:\n        print(f\"Error executing git log: {e}\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "scripts/format.py",
    "content": "#!/usr/bin/env python\n\nimport re\nimport sys\n\n\ndef correct_markdown_numbering_simple(markdown_text):\n    \"\"\"\n    A simpler version of the previous function. It corrects the numbering in a markdown file where each entry is\n    preceded by a number in square brackets and starts with '####'.\n\n    Parameters:\n    markdown_text (str): The original markdown text with possibly incorrect numbering.\n\n    Returns:\n    str: The markdown text with corrected numbering.\n    \"\"\"\n    lines = markdown_text.split(\"\\n\")\n    counter = 1\n    pattern = re.compile(r\"####\\s*\\[\\d*\\]\")\n    for i in range(len(lines)):\n        if pattern.match(lines[i]):\n            lines[i] = pattern.sub(f\"#### [{counter}]\", lines[i])\n            counter += 1\n    return \"\\n\".join(lines)\n\n\ndef main():\n    if len(sys.argv) < 2:\n        print(\"Usage: python script.py <input_filename> [output_filename]\")\n        sys.exit(1)\n\n    input_filename = sys.argv[1]\n    output_filename = sys.argv[2] if len(sys.argv) > 2 else None\n\n    if not output_filename or output_filename == input_filename:\n        confirm = input(f\"Overwrite {input_filename}? (y/n): \")\n        if confirm.lower() != \"y\":\n            print(\"Operation cancelled.\")\n            sys.exit(1)\n        output_filename = input_filename\n\n    with open(input_filename, \"r\", encoding=\"utf-8\") as file:\n        content = file.read()\n\n    corrected_content = correct_markdown_numbering_simple(content)\n\n    with open(output_filename, \"w\", encoding=\"utf-8\") as file:\n        file.write(corrected_content)\n    print(f\"Corrected content written to {output_filename}\")\n\n\nif __name__ == \"__main__\":\n    main()\n"
  }
]