Repository: YU-deep/Awesome-Latent-Space
Branch: main
Commit: a056ea83b779
Files: 3
Total size: 110.9 KB

Directory structure:
gitextract_gi5h8kyw/
├── CONTRIBUTING.md
├── LICENSE
└── README.md

================================================
FILE CONTENTS
================================================

================================================
FILE: CONTRIBUTING.md
================================================
# 🤝 Contributing

We sincerely welcome paper updates and contributions of any kind (and please do that lol)! Feel free to *open issues* or *create pull requests*.
## Adding New Papers

If you want to add new papers to the existing list, please modify the README.md and follow the format in the table:

```markdown
| year/month | [Title of the Paper](Arxiv Link) | image | [Github](Github Link) |
```

### Recommended Guidelines:

- Find the earliest date of the paper, and place the entry in ascending order of date.
- If possible, use arXiv links rather than links from other resources (e.g. a conference page), and link to the *abstract* page instead of the PDF.
- If the paper is accepted by a conference or journal, please add the tag ![NAME'YEAR](https://img.shields.io/badge/NAME'YEAR-f1b800). See existing cases.
- An introduction image, which should be an overview of the method, must be added to the /img folder. Keep it as aesthetically pleasing as possible; avoid leaving too much blank space.
- If a Github link is available, please add it; otherwise, use "-" instead.

### Examples

```markdown
| 2024/12 | ![ICML'25](https://img.shields.io/badge/ICML'25-f1b800)
[Deliberation in Latent Space via Differentiable Cache Augmentation](https://arxiv.org/abs/2412.17747) | image | - |
```

```markdown
| 2025/05 | [Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space](https://arxiv.org/abs/2505.13308) | image | [Github](https://github.com/bigai-nlco/LatentSeek) |
```

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2025 Neil_Yu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================

# The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook


This repository is a manually curated collection of works on **latent space** and will be continuously updated.

## 📖 News

**[2026/04/03]** We release our survey: [The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook](https://arxiv.org/pdf/2604.02029)!

**[2025/11/30]** We release the initial version!

[![Star History Chart](https://api.star-history.com/svg?repos=YU-deep/Awesome-Latent-Space&type=Date)](https://star-history.com/#YU-deep/Awesome-Latent-Space&Date)

## 🌟 Overview

- [📖 News](#-news)
- [🌟 Overview](#-overview)
- [📄 Citation](#-citation)
- [🤝 Contributing](#-contributing)
- [🔥 Methods](#-methods)
  - [Large-Language-Model](#large-language-model)
  - [Vision-Language-Model](#vision-language-model)
  - [Vision-Language-Action-Model](#vision-language-action-model)

## 📄 Citation

If you find this survey helpful, a citation to our paper would be greatly appreciated:

```bibtex
@article{yu2026latent,
  title={The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook},
  author={Yu, Xinlei and Chen, Zhangquan and He, Yongbo and Fu, Tianyu and Yang, Cheng and Xu, Chengming and Ma, Yue and Hu, Xiaobin and Cao, Zhe and Xu, Jie and others},
  journal={arXiv preprint arXiv:2604.02029},
  year={2026}
}
```

## 🤝 Contributing

We warmly welcome contributions of excellent resources via **pull request**. Please follow the instructions in **CONTRIBUTING.md** if you want to make one. If you have any other questions, please join our WeChat group.
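For contributors who prefer to generate entries rather than type them by hand, the row format from CONTRIBUTING.md can be sketched as a small Python helper. This is only an illustrative sketch, not tooling shipped with this repository: the function names (`venue_badge`, `format_row`, `insert_in_order`), their parameters, and the single space placed between a venue badge and the title are all assumptions, and the `image` cell is left as the same placeholder used in the format template.

```python
import re

BADGE_COLOR = "f1b800"  # colour code used by the existing venue badges


def venue_badge(venue: str) -> str:
    """Build the shields.io badge markdown for a tag such as "ICML'25"."""
    return f"![{venue}](https://img.shields.io/badge/{venue}-{BADGE_COLOR})"


def format_row(date, title, arxiv_id, github=None, venue=None, intro="image"):
    """Render one table row (hypothetical helper, names are illustrative)."""
    if not re.fullmatch(r"\d{4}/(0[1-9]|1[0-2])", date):
        raise ValueError(f"date must look like YYYY/MM, got {date!r}")
    # Link to the arXiv abstract page rather than the PDF.
    paper = f"[{title}](https://arxiv.org/abs/{arxiv_id})"
    if venue:
        # Assumption: badge and title are separated by a single space.
        paper = f"{venue_badge(venue)} {paper}"
    # Use "-" when no code link is available.
    code = f"[Github]({github})" if github else "-"
    return f"| {date} | {paper} | {intro} | {code} |"


def row_date(row: str) -> str:
    """Extract the YYYY/MM cell from a rendered row."""
    return row.split("|")[1].strip()


def insert_in_order(rows, new_row):
    """Return a new list with new_row placed in ascending date order."""
    new_date = row_date(new_row)
    for i, row in enumerate(rows):
        # YYYY/MM strings sort chronologically when compared as text.
        if row_date(row) > new_date:
            return rows[:i] + [new_row] + rows[i:]
    return rows + [new_row]


if __name__ == "__main__":
    print(format_row(
        "2025/05",
        "Seek in the Dark: Reasoning via Test-Time Instance-Level "
        "Policy Gradient in Latent Space",
        "2505.13308",
        github="https://github.com/bigai-nlco/LatentSeek",
    ))
```

The sketch does not escape characters that shields.io treats specially in badge labels (such as dashes or spaces), so a venue tag containing them would need manual adjustment.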
## 🔥 Methods ### Large-Language-Model | Date | Paper Title | Introduction | Code | |----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------| | 2024/09 | [Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding](https://arxiv.org/abs/2409.08561) | image | - | | 2024/09 | [Uncovering Latent Chain of Thought Vectors in Language Models](https://arxiv.org/abs/2409.14026) | image | - | | 2024/10 | [Understanding Reasoning in Chain-of-Thought from the Hopfieldian View](https://arxiv.org/abs/2410.03595) | image | - | | 2024/10 | ![ICLR'25](https://img.shields.io/badge/ICLR'25-f1b800)
[Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation](https://arxiv.org/abs/2410.13640) | image | [Github](https://github.com/Alsace08/Chain-of-Embedding) | | 2024/11 | [Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding](https://arxiv.org/abs/2411.04282) | image | [Github](https://github.com/SalesforceAIResearch/LaTRO) | | 2024/12 | ![COLM'25](https://img.shields.io/badge/COLM'25-f1b800)
[Training Large Language Models to Reason in a Continuous Latent Space](https://arxiv.org/abs/2412.06769) | image | [Github](https://github.com/facebookresearch/coconut) | | 2024/12 | [Compressed Chain of Thought: Efficient Reasoning Through Dense Representations](https://arxiv.org/abs/2412.13171) | image | - | | 2024/12 | ![ICML'25](https://img.shields.io/badge/ICML'25-f1b800)
[Deliberation in Latent Space via Differentiable Cache Augmentation](https://arxiv.org/abs/2412.17747) | image | - | | 2025/01 | [Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks](https://arxiv.org/abs/2501.10639) | image | [Github](https://github.com/xinykou/Against_Jailbreak) | | 2025/01 | [LF-Steering: Latent Feature Activation Steering for Enhancing Semantic Consistency in Large Language Models](https://arxiv.org/abs/2501.11036) | image | - | | 2025/02 | ![ICML'25](https://img.shields.io/badge/ICML'25-f1b800)
[Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning](https://arxiv.org/abs/2502.03275) | image | - | | 2025/02 | ![ICML'25](https://img.shields.io/badge/ICML'25-f1b800)
[Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization](https://arxiv.org/abs/2502.04686) | image | - | | 2025/02 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/abs/2502.05171) | image | [Github](https://github.com/seal-rg/recurrent-pretraining) | | 2025/02 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[LLM Pretraining with Continuous Concepts](https://arxiv.org/abs/2502.08524) | image | [Github](https://github.com/facebookresearch/RAM/tree/main/projects/cocomix) | | 2025/02 | ![ACL'25](https://img.shields.io/badge/ACL'25-f1b800)
[SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs](https://arxiv.org/abs/2502.12134) | image | [Github](https://github.com/xuyige/SoftCoT) | | 2025/02 | [Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction](https://arxiv.org/abs/2502.16280) | image | - | | 2025/02 | ![ICLR'25](https://img.shields.io/badge/ICLR'25-f1b800)
[Reasoning with Latent Thoughts: On the Power of Looped Transformers](https://arxiv.org/abs/2502.17416) | image | - | | 2025/02 | [Beyond Words: A Latent Memory Approach to Internal Reasoning in LLMs](https://arxiv.org/abs/2502.21030) | image | - | | 2025/02 | ![EMNLP'25](https://img.shields.io/badge/EMNLP'25-f1b800)
[CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation](https://arxiv.org/abs/2502.21074) | image | [Github](https://github.com/zhenyi4/codi) | | 2025/03 | ![ICLR'25](https://img.shields.io/badge/ICLR'25-f1b800)
[Reasoning to Learn from Latent Thoughts](https://arxiv.org/abs/2503.18866) | image | [Github](https://github.com/ryoungj/BoLT) | | 2025/03 | [Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation](https://arxiv.org/abs/2503.22675) | image | - | | 2025/03 | [MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models](https://arxiv.org/abs/2503.23100) | image | - | | 2025/04 | [Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models](https://arxiv.org/abs/2504.10615) | - | - | | 2025/04 | [Efficient Pretraining Length Scaling](https://arxiv.org/abs/2504.14992) | image | - | | 2025/05 | [SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning](https://arxiv.org/abs/2505.11484) | image | [Github](https://github.com/xuyige/SoftCoT) | | 2025/05 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought](https://arxiv.org/abs/2505.12514) | image | [Github](https://github.com/Ber666/reasoning-by-superposition) | | 2025/05 | [Enhancing Latent Computation in Transformers with Latent Tokens](https://arxiv.org/abs/2505.12629) | image | - | | 2025/05 | [Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space](https://arxiv.org/abs/2505.13308) | image | [Github](https://github.com/bigai-nlco/LatentSeek) | | 2025/05 | [Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs](https://arxiv.org/abs/2505.14530) | image | [Github](https://github.com/yzp11/Internal-Chain-of-Thought) | | 2025/05 | [Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space](https://arxiv.org/abs/2505.15778) | image | - | | 2025/05 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains](https://arxiv.org/abs/2505.16552) | image | [Github](https://github.com/xiaomi-research/colar) | | 2025/05 | [LARES: Latent Reasoning for Sequential Recommendation](https://arxiv.org/abs/2505.16865) | image | - | | 2025/05 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[Hybrid Latent Reasoning via Reinforcement Learning](https://arxiv.org/abs/2505.18454) | image | [Github](https://github.com/thu-nics/C2C) | | 2025/05 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts](https://arxiv.org/abs/2505.18962) | image | - | | 2025/05 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Reinforced Latent Reasoning for LLM-based Recommendation](https://arxiv.org/abs/2505.19092) | image | [Github](https://github.com/xuwenxinedu/R3) | | 2025/05 | [Continuous Chain of Thought Enables Parallel Exploration and Reasoning](https://arxiv.org/abs/2505.23648) | image | [Github](https://github.com/alperengozeten/CoT2) | | 2025/05 | ![ICML'25](https://img.shields.io/badge/ICML'25-f1b800)
[Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration](https://arxiv.org/abs/2505.24688) | image | [Github](https://github.com/alickzhu/Soft-Reasoning) | | 2025/06 | [Efficient Post-Training Refinement of Latent Reasoning in Large Language Models](https://arxiv.org/abs/2506.08552) | image | [Github](https://github.com/anord-wang/Lateng-Reasoning) | | 2025/06 | [DART: Distilling Autoregressive Reasoning to Silent Thought](https://arxiv.org/abs/2506.11752) | image | - | | 2025/06 | ![EMNLP'25](https://img.shields.io/badge/EMNLP'25-f1b800)
[Parallel Continuous Chain-of-Thought with Jacobi Iteration](https://arxiv.org/abs/2506.18582) | image | [Github](https://github.com/whyNLP/PCCoT) | | 2025/07 | [Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer](https://arxiv.org/abs/2507.02199) | image | [Github](https://github.com/wenquanlu/huginn-latent-cot) | | 2025/07 | [CTRLS: Chain-of-Thought Reasoning via Latent State Transition](https://arxiv.org/abs/2507.08182) | image | - | | 2025/07 | [Geometry of Knowledge Allows Extending Diversity Boundaries of Large Language Models](https://arxiv.org/abs/2507.13874) | image | - | | 2025/08 | [Bridging Search and Recommendation through Latent Cross Reasoning](https://arxiv.org/abs/2508.04152) | image | - | | 2025/08 | [LatentPrompt: Optimizing Promts in Latent Space](https://arxiv.org/abs/2508.02452) | image | - | | 2025/08 | [Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs](https://arxiv.org/abs/2508.10029) | image | - | | 2025/09 | [Decoding in Latent Spaces for Efficient Inference in LLM-based Recommendation](https://arxiv.org/abs/2509.11524) | image | - | | 2025/09 | [LTA-thinker: Latent Thought-Augmented Training Framework for Large Language Models on Complex Reasoning](https://arxiv.org/abs/2509.12875) | image | [Github](https://github.com/wangjiaqi886/LTA-Thinker) | | 2025/09 | ![EMNLP'25](https://img.shields.io/badge/EMNLP'25-f1b800)
[The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Latent Space Transitions in Multilingual LLMs](https://arxiv.org/abs/2509.17030) | image | - | | 2025/09 | [LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation](https://arxiv.org/abs/2509.19839) | image | - | | 2025/09 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[SIM-CoT: Supervised Implicit Chain-of-Thought](https://arxiv.org/abs/2509.20317) | image | [Github](https://github.com/InternLM/SIM-CoT) | | 2025/09 | [PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space](https://arxiv.org/abs/2509.23184) | image | [Github](https://github.com/LUMIA-Group/PonderLM-2) | | 2025/09 | [Fast Thinking for Large Language Models](https://arxiv.org/abs/2509.23633) | image | - | | 2025/09 | [Learning to Ponder: Adaptive Reasoning in Latent Space](https://arxiv.org/abs/2509.24238) | image | - | | 2025/09 | [Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory](https://arxiv.org/abs/2509.24653) | image | - | | 2025/09 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[MemGen: Weaving Generative Latent Memory for Self-Evolving Agents](https://arxiv.org/abs/2509.24704) | image | [Github](https://github.com/KANABOON1/MemGen) | | 2025/09 | [LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space](https://arxiv.org/abs/2509.24771) | image | [Github](https://github.com/jins7/LatentEvolve) | | 2025/09 | [MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts](https://arxiv.org/abs/2509.25020) | image | - | | 2025/09 | [A Formal Comparison Between Chain of Thought and Latent Thought](https://arxiv.org/abs/2509.25239) | - | - | | 2025/09 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts](https://arxiv.org/abs/2509.26314) | image | - | | 2025/10 | [Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space](https://arxiv.org/abs/2510.00219) | image | [Github](https://github.com/stanfordnlp/thoughtbubbles) | | 2025/10 | [Analyzing Latent Concepts in Code Language Models](https://arxiv.org/abs/2510.00476) | image | - | | 2025/10 | [Exploring System 1 and 2 communication for latent reasoning in LLMs](https://arxiv.org/abs/2510.00494) | image | - | | 2025/10 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[KaVa: Latent Reasoning via Compressed KV-Cache Distillation](https://arxiv.org/abs/2510.02312) | image | - | | 2025/10 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization](https://arxiv.org/abs/2510.04182) | image | [Github](https://github.com/ltpo2025/LTPO) | | 2025/10 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning](https://arxiv.org/abs/2510.04573) | image | [Github](https://github.com/mk322/LaDiR) | | 2025/10 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs](https://arxiv.org/abs/2510.05069) | image | [Github](https://github.com/sdc17/SwiReasoning) | | 2025/10 | [Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts](https://arxiv.org/abs/2510.07358) | image | - | | 2025/10 | [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745) | image | [Github](https://github.com/YRYangang/LatentTTS) | | 2025/10 | [LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback](https://arxiv.org/abs/2510.08604) | image | - | | 2025/10 | [Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection](https://arxiv.org/abs/2510.09694) | image | [Github](https://github.com/Alibaba-AAIG/Kelp) | | 2025/10 | [Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning](https://arxiv.org/abs/2510.10494) | image | - | | 2025/10 | [Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning](https://arxiv.org/abs/2510.14095) | image | [Github](https://github.com/Awni00/algorithmic-generalization-transformer-architectures) | | 2025/10 | [Language Models are Injective and Hence Invertible](https://arxiv.org/abs/2510.15511) | image | - | | 2025/10 | [LLM Latent Reasoning as Chain of Superposition](https://arxiv.org/abs/2510.15522) | image | [Github](https://github.com/DJC-GO-SOLO/Latent-SFT) | | 2025/10 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[ActivationReasoning: Logical Reasoning in Latent Activation Spaces](https://arxiv.org/abs/2510.18184) | image | - | | 2025/10 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models](https://arxiv.org/abs/2510.22042) | image | - | | 2025/10 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[SALS: Sparse Attention in Latent Space for KV cache Compression](https://arxiv.org/abs/2510.24273) | image | - | | 2025/10 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens](https://arxiv.org/abs/2510.24940) | image | [Github](https://github.com/YinhanHe123/SemCoT) | | 2025/10 | [Scaling Latent Reasoning via Looped Language Models](https://arxiv.org/abs/2510.25741) | image | - | | 2025/10 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Cache-to-Cache: Direct Semantic Communication Between Large Language Model](https://arxiv.org/abs/2510.03215) | image | [Github](https://github.com/thu-nics/C2C) | | 2025/10 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[Thought Communication in Multiagent Collaboration](https://arxiv.org/abs/2510.20733) | image | - | | 2025/11 | [SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization](https://arxiv.org/abs/2511.06411) | image | [Github](https://github.com/zz1358m/SofT-GRPO-master) | | 2025/11 | [Think Consistently, Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought](https://arxiv.org/abs/2511.07124) | image | - | | 2025/11 | [Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models](https://arxiv.org/abs/2511.08577) | image | [Github](https://github.com/thu-nics/TaH) | | 2025/11 | [SpiralThinker: Latent Reasoning through an Iterative Process with Text-Latent Interleaving](https://arxiv.org/abs/2511.08983) | image | - | | 2025/11 | [Enabling Agents to Communicate Entirely in Latent Space](https://arxiv.org/abs/2511.09149) | image | - | | 2025/11 | [Improving Latent Reasoning in LLMs via Soft Concept Mixing](https://arxiv.org/abs/2511.16885) | image | - | | 2025/11 | [Your Latent Reasoning is Secretly Policy Improvement Operator](https://arxiv.org/abs/2511.16886) | - | - | | 2025/11 | [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659) | image | [Github](https://github.com/apple/ml-clara) | | 2025/11 | [Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning](https://arxiv.org/abs/2511.21581) | image | [Github](https://github.com/apning/adaptive-latent-reasoning) | | 2025/11 | [Visualizing LLM Latent Space Geometry Through Dimensionality Reduction](https://arxiv.org/abs/2511.21594) | image | [Github](https://github.com/Vainateya/Feature_Geometry_Visualization) | | 2025/11 | [Polarity-Aware Probing for Quantifying Latent Alignment in Language Models](https://arxiv.org/abs/2511.21737) | image | [Github](https://github.com/SadSabrina/polarity-probing) | | 2025/11 | [Latent Collaboration in 
Multi-Agent Systems](https://arxiv.org/abs/2511.20639) | image | [Github](https://github.com/Gen-Verse/LatentMAS) | | 2025/12 | [Latent Debate: A Surrogate Framework for Interpreting LLM Thinking](https://arxiv.org/abs/2512.01909) | image | [Github](https://github.com/tigerchen52/latent_debate) | | 2025/12 | [Lightweight Latent Reasoning for Narrative Tasks](https://arxiv.org/abs/2512.02240) | image | - | | 2025/12 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation](https://arxiv.org/abs/2512.06690) | image | - | | 2025/12 | [ReLaX: Reasoning with Latent Exploration for Large Reasoning Models](https://arxiv.org/abs/2512.07558) | image | - | | 2025/12 | [Reinforcement Learning for Latent-Space Thinking in LLMs](https://arxiv.org/abs/2512.11816) | image | [Github](https://github.com/enesozeren/latent-space-thinking-model) | | 2025/12 | [Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs](https://arxiv.org/abs/2512.17206) | image | - | | 2025/12 | [JEPA-Reasoner: Decoupling Latent Reasoning from Token Generation](https://arxiv.org/abs/2512.19171) | image | - | | 2025/12 | [Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought](https://arxiv.org/abs/2512.21711) | image | - | | 2025/12 | [iCLP: Large Language Model Reasoning with Implicit Cognition Latent Planning](https://arxiv.org/abs/2512.24014) | image | [Github](https://github.com/AgenticFinLab/latent-planning) | | 2025/12 | [Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space](https://arxiv.org/abs/2512.24617) | image | - | | 2025/12 | [Learning Evolving Latent Strategies for Multi-Agent Language Systems without Model Fine-Tuning](https://arxiv.org/abs/2512.20629) | - | - | | 2026/01 | [Parallel Latent Reasoning for Sequential Recommendation](https://arxiv.org/abs/2601.03153) | image | - | | 2026/01 | [Latent Space Communication via K-V Cache Alignment](https://arxiv.org/abs/2601.06123) | image | - | | 2026/01 | [Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models](https://arxiv.org/abs/2601.03542) | image | [Github](https://github.com/laquabe/Layer-Order-Inversion) | | 2026/01 | [FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse](https://arxiv.org/abs/2601.05505) | image | - | | 2026/01 | [IIB-LPO: Latent Policy Optimization 
via Iterative Information Bottleneck](https://arxiv.org/abs/2601.05870) | image | [Github](https://github.com/denghuilin-cyber/IIB-LPO) | | 2026/01 | [Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space](https://arxiv.org/abs/2601.06220) | image | [Github](https://github.com/Codeffun3/ZeroRouter) | | 2026/01 | [Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering](https://arxiv.org/abs/2601.08427) | image | - | | 2026/01 | [Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models](https://arxiv.org/abs/2601.08058) | image | - | | 2026/01 | [RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering](https://arxiv.org/abs/2601.09269) | image | [Github](https://github.com/gooogleshanghai/RISER-Orchestrating-Latent-Reasoning-Skills-for-Adaptive-Activation-Steering) | | 2026/01 | [GeoSteer: Faithful Chain-of-Thought Steering via Latent Manifold Gradients](https://arxiv.org/abs/2601.10229) | image | - | | 2026/01 | [Reasoning While Recommending: Entropy-Guided Latent Reasoning in Generative Re-ranking Models](https://arxiv.org/abs/2601.13533) | image | - | | 2026/01 | [Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning](https://arxiv.org/abs/2601.17275) | image | - | | 2026/01 | [UniCog: Uncovering Cognitive Abilities of LLMs through Latent Mind Space Analysis](https://arxiv.org/abs/2601.17897) | image | [Github](https://github.com/milksalute/unicog) | | 2026/01 | [S2GR: Stepwise Semantic-Guided Reasoning in Latent Space for Generative Recommendation](https://arxiv.org/abs/2601.18664) | image | - | | 2026/01 | [The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning](https://arxiv.org/abs/2601.18832) | image | - | | 2026/01 | [PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models](https://arxiv.org/abs/2601.19917) | image | - | | 
2026/01 | [Beyond Imitation: Reinforcement Learning for Active Latent Planning](https://arxiv.org/abs/2601.21598) | image | [Github](https://github.com/zz1358m/ATP-Latent-master) | | 2026/01 | [Latent Adversarial Regularization for Offline Preference Optimization](https://arxiv.org/abs/2601.22083) | image | - | | 2026/01 | [Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization](https://arxiv.org/abs/2601.21358) | image | [Github](https://github.com/yunsaijc/PLaT) | | 2026/01 | [Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves](https://arxiv.org/abs/2601.21582) | image | - | | 2026/01 | [From Logits to Latents: Contrastive Representation Shaping for LLM Unlearning](https://arxiv.org/abs/2601.22028) | - | - | | 2026/01 | [ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought](https://arxiv.org/abs/2601.23184) | image | [Github](https://github.com/FanmengWang/ReGuLaR) | | 2026/02 | [G-MemLLM: Gated Latent Memory Augmentation for Long-Context Reasoning in Large Language Models](https://arxiv.org/abs/2602.00015) | image | - | | 2026/02 | [Do Latent-CoT Models Think Step-by-Step? 
A Mechanistic Study on Sequential Reasoning Tasks](https://arxiv.org/abs/2602.00449) | image | [Github](https://github.com/jialiang19/latent-cot-thinking) | | 2026/02 | [Capabilities and Fundamental Limits of Latent Chain-of-Thought](https://arxiv.org/abs/2602.01148) | - | - | | 2026/02 | [Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models](https://arxiv.org/abs/2602.01698) | image | [Github](https://github.com/Xiaomi-Research/LED) | | 2026/02 | [No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs](https://arxiv.org/abs/2602.02103) | - | [Github](https://github.com/lxucs/tele-lens) | | 2026/02 | [CoLT: Reasoning with Chain of Latent Tool Calls](https://arxiv.org/abs/2602.04246) | image | - | | 2026/02 | [Internalizing LLM Reasoning via Discovery and Replay of Latent Actions](https://arxiv.org/abs/2602.04925) | image | [Github](https://github.com/sznnzs/LLM-Latent-Action) | | 2026/02 | [Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning](https://arxiv.org/abs/2602.06584) | image | - | | 2026/02 | [LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning](https://arxiv.org/abs/2602.07075) | image | [Github](https://github.com/xinwuye/LatentChem) | | 2026/02 | [DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity](https://arxiv.org/abs/2602.08005) | image | [Github](https://github.com/CURRENTF/Sparse-vLLM) | | 2026/02 | [Pretraining with Token-Level Adaptive Latent Chain-of-Thought](https://arxiv.org/abs/2602.08220) | image | - | | 2026/02 | [Latent Reasoning with Supervised Thinking States](https://arxiv.org/abs/2602.08332) | image | - | | 2026/02 | [Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure](https://arxiv.org/abs/2602.08783) | image | - | | 2026/02 | [Next Concept Prediction in Discrete Latent Space Leads to Stronger Language Models](https://arxiv.org/abs/2602.08984) | image | 
[Github](https://github.com/LUMIA-Group/ConceptLM) | | 2026/02 | [Talking with the Latents -- how to convert your LLM into an astronomer](https://arxiv.org/abs/2602.09670) | image | - | | 2026/02 | [Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens](https://arxiv.org/abs/2602.10229) | image | [Github](https://github.com/NeosKnight233/Latent-Thoughts-Tuning) | | 2026/02 | [Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models](https://arxiv.org/abs/2602.10520) | image | - | | 2026/02 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation](https://arxiv.org/abs/2602.11451) | image | [Github](https://github.com/armenjeddi/loopformer) | | 2026/02 | [Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models](https://arxiv.org/abs/2602.11495) | image | - | | 2026/02 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Native Reasoning Models: Training Language Models to Reason on Unverifiable Data](https://arxiv.org/abs/2602.11549) | image | - | | 2026/02 | [ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces](https://arxiv.org/abs/2602.11683) | image | - | | 2026/02 | [SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion](https://arxiv.org/abs/2602.11698) | image | - | | 2026/02 | [GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler](https://arxiv.org/abs/2602.14077) | image | - | | 2026/02 | [Measuring and Mitigating Post-hoc Rationalization in Reverse Chain-of-Thought Generation](https://arxiv.org/abs/2602.14469) | image | - | | 2026/02 | [Inner Loop Inference for Pretrained Transformers: Unlocking Latent Capabilities Without Training](https://arxiv.org/abs/2602.14759) | - | [Github](https://github.com/jonathanlys01/looped-transformer) | | 2026/02 | [LatentMem: Customizing Latent Memory for Multi-Agent Systems](https://arxiv.org/abs/2602.03036) | image | [Github](https://github.com/KANABOON1/LatentMem) | | 2026/02 | [Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems](https://arxiv.org/abs/2602.03695) | image | - | | 2026/03 | [LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval](https://arxiv.org/abs/2603.01425) | image | [Github](https://github.com/ignorejjj/LaSER) | | 2026/03 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Multi-Head Low-Rank Attention](https://arxiv.org/abs/2603.02188) | - | [Github](https://github.com/SongtaoLiu0823/MLRA) |
| 2026/03 | [AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth](https://arxiv.org/abs/2603.01914) | image | - |
| 2026/03 | [PonderLM-3: Adaptive Token-Wise Pondering with Differentiable Masking](https://arxiv.org/abs/2603.02023) | image | - |
| 2026/03 | [When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning](https://arxiv.org/abs/2603.03475) | - | [Github](https://github.com/SubramanyamSahoo/When-Shallow-Wins) |
| 2026/03 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[∇-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space](https://arxiv.org/abs/2603.04948) | - | - |
| 2026/03 | [SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models](https://arxiv.org/abs/2603.06222) | image | - |
| 2026/03 | [NextMem: Towards Latent Factual Memory for LLM-based Agents](https://arxiv.org/abs/2603.15634) | image | [Github](https://github.com/nuster1128/NextMem) |
| 2026/03 | [Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations](https://arxiv.org/abs/2603.17305) | image | - |
| 2026/03 | [LoopRPT: Reinforcement Pre-Training for Looped Language Models](https://arxiv.org/abs/2603.19714) | image | - |

### Vision-Language-Model

| Date | Paper Title | Introduction | Code |
|----------|-------------|--------------|------|
| 2024/10 | [Reducing hallucinations in large vision-language models via latent space steering](https://arxiv.org/abs/2410.15778) | image | [Github](https://github.com/shengliu66/VTI) |
| 2024/12 | ![CVPR'25](https://img.shields.io/badge/CVPR'25-f1b800)
[Perception Tokens Enhance Visual Reasoning in Multimodal Language Models](https://arxiv.org/abs/2412.03548) | image | [Github](https://github.com/mahtabbigverdi/Aurora-perception) |
| 2025/01 | [Efficient Reasoning with Hidden Thinking](https://arxiv.org/abs/2501.19201) | image | [Github](https://github.com/shawnricecake/Heima) |
| 2025/02 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding](https://arxiv.org/abs/2502.01341) | image | - |
| 2025/03 | [Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models](https://arxiv.org/abs/2503.17142) | image | [Github](https://github.com/BerasiDavide/vlm_image_compositionality) |
| 2025/05 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[Towards General Continuous Memory for Vision-Language Models](https://arxiv.org/abs/2505.17670) | image | [Github](https://github.com/WenyiWU0111/CoMEM/tree/main) |
| 2025/05 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[Image Tokens Matter: Mitigating Hallucination in Discrete Tokenizer-based Large Vision-Language Models via Latent Editing](https://arxiv.org/abs/2505.21547) | image | [Github](https://github.com/weixingW/CGC-VTD/tree/main) |
| 2025/06 | [Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens](https://arxiv.org/abs/2506.17218) | image | [Github](https://github.com/UMass-Embodied-AGI/Mirage) |
| 2025/08 | [Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models](https://arxiv.org/abs/2508.12587) | image | - |
| 2025/09 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning](https://arxiv.org/abs/2509.22761) | image | [Github](https://github.com/spatigen/milr) |
| 2025/09 | ![ICLR'26](https://img.shields.io/badge/ICLR'26-f1b800)
[Latent Visual Reasoning](https://arxiv.org/abs/2509.24251) | image | [Github](https://github.com/VincentLeebang/lvr) |
| 2025/10 | [Auto-scaling Continuous Memory for GUI Agent](https://arxiv.org/abs/2510.09038) | image | [Github](https://github.com/WenyiWU0111/CoMEM-Agent) |
| 2025/10 | [Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space](https://arxiv.org/abs/2510.12603) | image | [Github](https://github.com/FYYDCC/IVT-LR) |
| 2025/10 | ![CVPR'26](https://img.shields.io/badge/CVPR'26-f1b800)
[Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views](https://arxiv.org/abs/2510.18632) | image | [Github](https://github.com/zhangquanchen/3DThinker) |
| 2025/10 | [Latent Chain-of-Thought for Visual Reasoning](https://arxiv.org/abs/2510.23925) | image | [Github](https://github.com/heliossun/LaCoT) |
| 2025/10 | [Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs](https://arxiv.org/abs/2510.24514) | image | [Github](https://github.com/hwanyu112/Latent-Sketchpad) |
| 2025/11 | [Multimodal Reasoning via Latent Refocusing](https://arxiv.org/abs/2511.02360) | image | - |
| 2025/11 | ![CVPR'26](https://img.shields.io/badge/CVPR'26-f1b800)
[VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Model](https://arxiv.org/abs/2511.11007) | image | [Github](https://github.com/YU-deep/VisMem) |
| 2025/11 | [L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention](https://arxiv.org/abs/2511.17910) | image | - |
| 2025/11 | [Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens](https://arxiv.org/abs/2511.19418) | image | [Github](https://github.com/Wakals/CoVT) |
| 2025/11 | [Reading Between the Lines: Abstaining from VLM-Generated OCR Errors via Latent Representation Probes](https://arxiv.org/abs/2511.19806) | image | - |
| 2025/11 | [Monet: Reasoning in Latent Visual Space Beyond Image and Language](https://arxiv.org/abs/2511.21395) | image | [Github](https://github.com/NOVAglow646/) |
| 2025/12 | [Interleaved Latent Visual Reasoning with Selective Perceptual Modeling](https://arxiv.org/abs/2512.05665) | image | [Github](https://github.com/XD111ds/ILVR) |
| 2025/12 | [Mull-Tokens: Modality-Agnostic Latent Thinking](https://arxiv.org/abs/2512.10941) | image | - |
| 2025/12 | [VL-JEPA: Joint Embedding Predictive Architecture for Vision-language](https://arxiv.org/abs/2512.10942) | image | - |
| 2025/12 | [Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space](https://arxiv.org/abs/2512.12623) | image | [Github](https://github.com/eric-ai-lab/DMLR) |
| 2025/12 | [Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs](https://arxiv.org/abs/2512.16584) | image | [Github](https://github.com/TungChintao/SkiLa) |
| 2025/12 | [Latent Implicit Visual Reasoning](https://arxiv.org/abs/2512.21218) | image | - |
| 2026/01 | [Forest Before Trees: Latent Superposition for Efficient Visual Reasoning](https://arxiv.org/abs/2601.06803) | image | [Github](https://github.com/ybb6/laser) |
| 2026/01 | [Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions](https://arxiv.org/abs/2601.07516) | image | - |
| 2026/01 | [LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning](https://arxiv.org/abs/2601.10129) | image | [Github](https://github.com/Svardfox/LaViT) |
| 2026/01 | [PREGEN: Uncovering Latent Thoughts in Composed Video Retrieval](https://arxiv.org/abs/2601.13797) | image | - |
| 2026/01 | [Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning](https://arxiv.org/abs/2601.14750) | image | [Github](https://github.com/TencentBAC/RoT) |
| 2026/01 | [CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding](https://arxiv.org/abs/2601.21262) | image | - |
| 2026/02 | [PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Multimodal Agents](https://arxiv.org/abs/2602.00415) | image | [Github](https://github.com/czs-ict/PolarMem) |
| 2026/02 | [LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs](https://arxiv.org/abs/2602.00462) | image | [Github](https://github.com/McGill-NLP/latentlens) |
| 2026/02 | [Dual Latent Memory for Visual Multi-agent System](https://arxiv.org/abs/2602.00471) | image | [Github](https://github.com/YU-deep/L2-VMAS) |
| 2026/02 | [Learning Modal-Mixed Chain-of-Thought Reasoning with Latent Embeddings](https://arxiv.org/abs/2602.00574) | image | - |
| 2026/02 | [Toward Cognitive Supersensing in Multimodal Large Language Model](https://arxiv.org/abs/2602.01541) | image | [Github](https://github.com/PediaMedAI/Cognition-MLLM) |
| 2026/02 | [Visual Reasoning over Time Series via Multi-Agent Systems](https://arxiv.org/abs/2602.03026) | image | - |
| 2026/02 | [Vision-aligned Latent Reasoning for Multi-modal Large Language Model](https://arxiv.org/abs/2602.04476) | image | - |
| 2026/02 | [Multimodal Latent Reasoning via Hierarchical Visual Cues Injection](https://arxiv.org/abs/2602.05359) | image | - |
| 2026/02 | [LCLA: Language-Conditioned Latent Alignment for Vision-Language Navigation](https://arxiv.org/abs/2602.07629) | image | - |
| 2026/02 | [MaD-Mix: Multi-Modal Data Mixtures via Latent Space Coupling for Vision-Language Model Training](https://arxiv.org/abs/2602.07790) | image | - |
| 2026/02 | [Reason-IAD: Knowledge-Guided Dynamic Latent Reasoning for Explainable Industrial Anomaly Detection](https://arxiv.org/abs/2602.09850) | image | [Github](https://github.com/chenpeng052/Reason-IAD) |
| 2026/02 | [Revis: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models](https://arxiv.org/abs/2602.11824) | image | [Github](https://github.com/antgroup/Revis) |
| 2026/02 | [OneLatent: Single-Token Compression for Visual Latent Reasoning](https://arxiv.org/abs/2602.13738) | image | - |
| 2026/02 | [The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems](https://arxiv.org/abs/2602.15382) | image | [Github](https://github.com/xz-liu/heterogeneous-latent-mas) |
| 2026/02 | [Test-Time Computing for Referring Multimodal Large Language Models](https://arxiv.org/abs/2602.19505) | image | - |
| 2026/02 | [CrystaL: Spontaneous Emergence of Visual Latents in MLLMs](https://arxiv.org/abs/2602.20980) | image | [Github](https://github.com/yangzhangok/crystal) |
| 2026/02 | [Imagination Helps Visual Reasoning, But Not Yet in Latent Space](https://arxiv.org/abs/2602.22766) | image | - |
| 2026/02 | [Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection](https://arxiv.org/abs/2602.24021) | image | - |
| 2026/03 | [Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding](https://arxiv.org/abs/2603.13366) | image | - |
| 2026/04 | [Visual Enhanced Depth Scaling for Multimodal Latent Reasoning](https://arxiv.org/abs/2604.10500) | image | [Github](https://github.com/Simon98-AI/Vedas) |
| 2026/04 | [HyLaR: Hybrid Latent Reasoning with Decoupled Policy Optimization](https://arxiv.org/abs/2604.20328) | image | [Github](https://github.com/EthenCheng/HyLaR) |
| 2026/04 | [MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution](https://arxiv.org/abs/2604.26283) | image | - |
| 2026/05 | [Representative Attention For Vision Transformers](https://arxiv.org/abs/2605.14913) | image | [Github](https://github.com/Liyuntong123/RPAtten) |

### Vision-Language-Action-Model

| Date | Paper Title | Introduction | Code |
|----------|-------------|--------------|------|
| 2024/10 | ![ICLR'25](https://img.shields.io/badge/ICLR'25-f1b800)
[Latent Action Pretraining from Videos](https://arxiv.org/abs/2410.11758) | image | [Github](https://github.com/LatentActionPretraining/LAPA) |
| 2025/05 | [UniVLA: Learning to Act Anywhere with Task-centric Latent Actions](https://arxiv.org/abs/2505.06111) | image | [Github](https://github.com/OpenDriveLab/UniVLA) |
| 2025/07 | ![NeurIPS'25](https://img.shields.io/badge/NeurIPS'25-f1b800)
[ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning](https://arxiv.org/abs/2507.16815) | image | [Github](https://jasper0314-huang.github.io/thinkact-vla) |
| 2025/09 | [Align-Then-Steer: Adapting the Vision-Language Action Models through Unified Latent Guidance](https://arxiv.org/abs/2509.02055) | image | [Github](https://github.com/TeleHuman/Align-Then-Steer) |
| 2025/09 | [OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision](https://arxiv.org/abs/2509.05578) | image | - |
| 2025/09 | [Latent Action Pretraining Through World Modeling](https://arxiv.org/abs/2509.18428) | image | - |
| 2025/09 | [Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA](https://arxiv.org/abs/2509.26251) | image | - |
| 2025/11 | [SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models](https://arxiv.org/abs/2511.15605) | image | [Github](https://github.com/sii-research/siiRL) |
| 2025/11 | [LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models](https://arxiv.org/abs/2511.23034) | image | - |
| 2025/11 | [Unifying Perception and Action: A Hybrid-Modality Pipeline with Implicit Visual Chain-of-Thought for Robotic Action Generation](https://arxiv.org/abs/2511.19859) | image | [Github](https://github.com/vita-cvpr26/vita) |
| 2025/12 | [SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead](https://arxiv.org/abs/2512.00903) | image | [Github](https://github.com/GigaAI-research/SwiftVLA) |
| 2025/12 | [GLaD: Geometric Latent Distillation for Vision-Language-Action Models](https://arxiv.org/abs/2512.09619) | image | - |
| 2025/12 | [Latent Chain-of-Thought World Modeling for End-to-End Autonomous Driving](https://arxiv.org/abs/2512.10226) | image | - |
| 2025/12 | [WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control](https://arxiv.org/abs/2512.11047) | image | [Github](https://github.com/OpenDriveLab/WholebodyVLA) |
| 2025/12 | [Motus: A Unified Latent Action World Model](https://arxiv.org/abs/2512.13030) | image | [Github](https://github.com/thu-ml/Motus) |
| 2025/12 | [LoLA: Long Horizon Latent Action Learning for General Robot Manipulation](https://arxiv.org/abs/2512.20166) | image | - |
| 2025/12 | [ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving](https://arxiv.org/abs/2512.22939) | image | [Github](https://github.com/pqh22/ColaVLA) |
| 2026/01 | [Learning to Act Robustly with View-Invariant Latent Actions](https://arxiv.org/abs/2601.02994) | image | - |
| 2026/01 | [CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos](https://arxiv.org/abs/2601.04061) | image | - |
| 2026/01 | [LaST0: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model](https://arxiv.org/abs/2601.05248) | image | - |
| 2026/01 | [LatentVLA: Efficient Vision-Language Models for Autonomous Driving via Latent Action Prediction](https://arxiv.org/abs/2601.05611) | image | - |
| 2026/01 | [Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning](https://arxiv.org/abs/2601.09708) | image | - |
| 2026/01 | [LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries](https://arxiv.org/abs/2601.15197) | image | [Github](https://github.com/ZGC-EmbodyAI/LangForce) |
| 2026/01 | [CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control](https://arxiv.org/abs/2601.22467) | image | - |
| 2026/01 | [Vision-Language Models Unlock Task-Centric Latent Actions](https://arxiv.org/abs/2601.22714) | image | - |
| 2026/02 | [Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models](https://arxiv.org/abs/2602.01166) | image | [Github](https://github.com/LoveJu1y/LaRA-VLA) |
| 2026/02 | [DriveWorld-VLA: Unified Latent-Space World Modeling for Autonomous Driving](https://arxiv.org/abs/2602.06521) | image | - |
| 2026/02 | [Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning](https://arxiv.org/abs/2602.07845) | image | - |
| 2026/02 | [ConLA: Contrastive Latent Action Learning from Human Videos for Robotic Manipulation](https://arxiv.org/abs/2602.00557) | image | [Github](https://github.com/WeishengDAI/ConLA) |
| 2026/02 | [VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model](https://arxiv.org/abs/2602.10098) | image | [Github](https://github.com/ginwind/VLA-JEPA/) |
| 2026/02 | [FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution](https://arxiv.org/abs/2602.15882) | image | - |
| 2026/02 | [UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models](https://arxiv.org/abs/2602.20231) | image | - |
| 2026/02 | ![CVPR'26](https://img.shields.io/badge/CVPR'26-f1b800)
[JALA: Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild](https://arxiv.org/abs/2602.21736) | image | - |
| 2026/03 | [LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving](https://arxiv.org/abs/2603.01928) | image | [Github](https://github.com/luo-yc17/LaST-VLA) |
| 2026/03 | [Chain of World: World Model Thinking in Latent Motion](https://arxiv.org/abs/2603.03195) | image | [Github](https://github.com/fx-hit/CoWVLA) |
| 2026/04 | [OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation](https://arxiv.org/abs/2604.18486) | image | - |