Repository: YU-deep/Awesome-Latent-Space
Branch: main
Commit: a056ea83b779
Files: 3
Total size: 110.9 KB
Directory structure:
gitextract_gi5h8kyw/
├── CONTRIBUTING.md
├── LICENSE
└── README.md
================================================
FILE CONTENTS
================================================
================================================
FILE: CONTRIBUTING.md
================================================
# 🤝 Contributing
We sincerely welcome paper updates and contributions of any kind (and please do send them, lol)! Feel free to *open issues* or *create pull requests*. <br>
## Adding New Papers
If you want to add new papers to the existing list, please modify the README.md and follow the format in the table:
```markdown
| year/month | [Title of the Paper](Arxiv Link) | <img width="700" alt="image" src="img/name.png"> | [Github](Github Link) |
```
### Recommended Guidelines:
- Use the earliest release date of the paper, and place the entry in ascending date order.
- If possible, use arXiv links rather than links from other sources (e.g., a conference page), and link to the *abstract* page instead of the PDF.
- If the paper has been accepted by a conference or journal, please add the corresponding venue tag; see existing entries for examples.
- An introduction image must be added to the /img folder; it should show an overview of the method. Keep it as aesthetically pleasing as possible and avoid leaving too much blank space.
- If a GitHub link is available, please add it; otherwise, use "-" instead.
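As a rough aid for the first two guidelines, the sketch below rewrites an arXiv PDF link into its abstract link and checks that a list of `year/month` dates is in ascending order. The helper names `to_abs_link` and `dates_ascending` are hypothetical and not part of this repository; the code assumes dates use the `YYYY/MM` form shown in the table.

```python
import re


def to_abs_link(url: str) -> str:
    """Rewrite an arXiv PDF link to the abstract link; other URLs are returned unchanged."""
    # Hypothetical helper: handles ".../pdf/<id>" with or without a trailing ".pdf".
    return re.sub(
        r"^(https?://arxiv\.org)/pdf/([^?#]+?)(?:\.pdf)?$",
        r"\1/abs/\2",
        url,
    )


def dates_ascending(dates: list[str]) -> bool:
    """Return True when 'YYYY/MM' strings are in non-decreasing order."""
    keys = [tuple(int(part) for part in d.split("/")) for d in dates]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    print(to_abs_link("https://arxiv.org/pdf/2412.17747"))
    print(dates_ascending(["2024/12", "2025/05"]))
```

Entries from the same month compare as equal, so their relative order within a month is left to the contributor.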
### Examples
```markdown
| 2024/12 |  <br/> [Deliberation in Latent Space via Differentiable Cache Augmentation](https://arxiv.org/abs/2412.17747) | <img width="700" alt="image" src="img/deliberation.png"> | - |
```
```markdown
| 2025/05 | [Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space](https://arxiv.org/abs/2505.13308) | <img width="700" alt="image" src="img/latent_seek.png"> | [Github](https://github.com/bigai-nlco/LatentSeek) |
```
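The row format above can also be generated with a small script. The following is a minimal sketch, assuming a hypothetical helper `format_row` that is not part of this repository; it does not handle the venue tag for accepted papers, which still has to be added by hand.

```python
from typing import Optional


def format_row(
    date: str,
    title: str,
    arxiv_url: str,
    image_path: str,
    github_url: Optional[str] = None,
) -> str:
    """Build one table row in the format used by README.md (hypothetical helper)."""
    # Use "-" in the Code column when no repository is available.
    code = f"[Github]({github_url})" if github_url else "-"
    image = f'<img width="700" alt="image" src="{image_path}">'
    return f"| {date} | [{title}]({arxiv_url}) | {image} | {code} |"


if __name__ == "__main__":
    print(
        format_row(
            "2025/05",
            "Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space",
            "https://arxiv.org/abs/2505.13308",
            "img/latent_seek.png",
            "https://github.com/bigai-nlco/LatentSeek",
        )
    )
```

Leaving out `github_url` produces a row whose last column is `-`, matching the guideline for papers without public code.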
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2025 Neil_Yu
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
<div align="center">
<h1 style="display: inline-flex; align-items: center;">
<img src="img/static/icon.png" alt="icon" style="width: 32px; height: 32px; margin-right: 8px;">
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
</h1>
</div>
<p align="center">
<a href="https://github.com/sindresorhus/awesome"><img src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg" alt="Awesome list badge"></a>
<a href="https://github.com/YU-deep/Awesome-Latent-Space/stargazers"><img src="https://img.shields.io/github/stars/YU-deep/Awesome-Latent-Space?style=social" alt="GitHub stars"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="MIT License"></a>
<a href="https://arxiv.org/abs/2604.02029"><img src="https://img.shields.io/badge/Arxiv-2604.02029-b31b1b.svg?logo=arXiv" alt="arXiv"></a>
<a href="https://huggingface.co/papers/2604.02029"><img src="https://img.shields.io/badge/Hugging_Face-2604.02029-292929.svg?logo=huggingface" alt="Hugging Face"></a>
<a href="CONTRIBUTING.md"><img src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg" alt="PRs welcome"></a>
<a href="img/static/wechat_group.jpg"><img src="https://img.shields.io/badge/Group-WeChat-07c160?logo=wechat&logoColor=white" alt="WeChat Group"></a>
<a href="https://www.semanticscholar.org/paper/The-Latent-Space%3A-Foundation%2C-Evolution%2C-Mechanism-Yu-Chen/47e1ecd29e617de26cc03f9615e303b19f52cfe1"><img src="https://img.shields.io/badge/dynamic/json?label=Citations&query=%24.citationCount&url=https%3A%2F%2Fapi.semanticscholar.org%2Fgraph%2Fv1%2Fpaper%2F47e1ecd29e617de26cc03f9615e303b19f52cfe1%3Ffields%3DcitationCount&logo=semanticscholar&cacheSeconds=3600" alt="Semantic Scholar Citations"></a>
</p>
This repository is a manually curated collection of works on **latent space** and will be continuously updated.
## 📖 News
**[2026/04/03]** We release our survey: [The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook](https://arxiv.org/pdf/2604.02029)!
**[2025/11/30]** We release the initial version!
[Star History](https://star-history.com/#YU-deep/Awesome-Latent-Space&Date)
## 🌟 Overview
- [📖 News](#-news)
- [🌟 Overview](#-overview)
- [📄 Citation](#-citation)
- [🤝 Contributing](#-contributing)
- [🔥 Methods](#-methods)
  - [Large-Language-Model](#large-language-model)
  - [Vision-Language-Model](#vision-language-model)
  - [Vision-Language-Action-Model](#vision-language-action-model)
## 📄 Citation
If you find this survey helpful, a citation to our paper would be greatly appreciated:
```bibtex
@article{yu2026latent,
title={The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook},
author={Yu, Xinlei and Chen, Zhangquan and He, Yongbo and Fu, Tianyu and Yang, Cheng and Xu, Chengming and Ma, Yue and Hu, Xiaobin and Cao, Zhe and Xu, Jie and others},
journal={arXiv preprint arXiv:2604.02029},
year={2026}
}
```
## 🤝 Contributing
We warmly welcome contributions of excellent resources via **pull request**. Please follow the instructions in **CONTRIBUTING.md** if you want to open one.
If you have any other questions, please join our WeChat group.
## 🔥 Methods
### Large-Language-Model
| Date | Paper Title | Introduction | Code |
|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| 2024/09 | [Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding](https://arxiv.org/abs/2409.08561) | <img width="700" alt="image" src="img/llm/2409_hcot.png"> | - |
| 2024/09 | [Uncovering Latent Chain of Thought Vectors in Language Models](https://arxiv.org/abs/2409.14026) | <img width="700" alt="image" src="img/llm/2409_uncover.png"> | - |
| 2024/10 | [Understanding Reasoning in Chain-of-Thought from the Hopfieldian View](https://arxiv.org/abs/2410.03595) | <img width="700" alt="image" src="img/llm/2410_hopfieldian.png"> | - |
| 2024/10 |  <br/> [Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation](https://arxiv.org/abs/2410.13640) | <img width="700" alt="image" src="img/llm/2410_coe.png"> | [Github](https://github.com/Alsace08/Chain-of-Embedding) |
| 2024/11 | [Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding](https://arxiv.org/abs/2411.04282) | <img width="700" alt="image" src="img/llm/2411_latro.png"> | [Github](https://github.com/SalesforceAIResearch/LaTRO) |
| 2024/12 |  <br/> [Training Large Language Models to Reason in a Continuous Latent Space](https://arxiv.org/abs/2412.06769) | <img width="700" alt="image" src="img/llm/2412_coconut.png"> | [Github](https://github.com/facebookresearch/coconut) |
| 2024/12 | [Compressed Chain of Thought: Efficient Reasoning Through Dense Representations](https://arxiv.org/abs/2412.13171) | <img width="700" alt="image" src="img/llm/2412_ccot.png"> | - |
| 2024/12 |  <br/> [Deliberation in Latent Space via Differentiable Cache Augmentation](https://arxiv.org/abs/2412.17747) | <img width="700" alt="image" src="img/llm/2412_deliberation.png"> | - |
| 2025/01 | [Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks](https://arxiv.org/abs/2501.10639) | <img width="700" alt="image" src="img/llm/2501_latpc.png"> | [Github](https://github.com/xinykou/Against_Jailbreak) |
| 2025/01 | [LF-Steering: Latent Feature Activation Steering for Enhancing Semantic Consistency in Large Language Models](https://arxiv.org/abs/2501.11036) | <img width="700" alt="image" src="img/llm/2501_lf_steering.png"> | - |
| 2025/02 |  <br/> [Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning](https://arxiv.org/abs/2502.03275) | <img width="700" alt="image" src="img/llm/2502_token.png"> | - |
| 2025/02 |  <br/> [Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization](https://arxiv.org/abs/2502.04686) | <img width="700" alt="image" src="img/llm/2502_lspo.png"> | - |
| 2025/02 |  <br/> [Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/abs/2502.05171) | <img width="700" alt="image" src="img/llm/2502_scaling.png"> | [Github](https://github.com/seal-rg/recurrent-pretraining) |
| 2025/02 |  <br/> [LLM Pretraining with Continuous Concepts](https://arxiv.org/abs/2502.08524) | <img width="700" alt="image" src="img/llm/2502_cocomix.png"> | [Github](https://github.com/facebookresearch/RAM/tree/main/projects/cocomix) |
| 2025/02 |  <br/> [SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs](https://arxiv.org/abs/2502.12134) | <img width="700" alt="image" src="img/llm/2502_soft_cot.png"> | [Github](https://github.com/xuyige/SoftCoT) |
| 2025/02 | [Human Preferences in Large Language Model Latent Space: A Technical Analysis on the Reliability of Synthetic Data in Voting Outcome Prediction](https://arxiv.org/abs/2502.16280) | <img width="700" alt="image" src="img/llm/2502_human.png"> | - |
| 2025/02 |  <br/> [Reasoning with Latent Thoughts: On the Power of Looped Transformers](https://arxiv.org/abs/2502.17416) | <img width="700" alt="image" src="img/llm/2502_reasoning.png"> | - |
| 2025/02 | [Beyond Words: A Latent Memory Approach to Internal Reasoning in LLMs](https://arxiv.org/abs/2502.21030) | <img width="700" alt="image" src="img/llm/2502_nano_gpt.png"> | - |
| 2025/02 |  <br/> [CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation](https://arxiv.org/abs/2502.21074) | <img width="700" alt="image" src="img/llm/2502_codi.png"> | [Github](https://github.com/zhenyi4/codi) |
| 2025/03 |  <br/> [Reasoning to Learn from Latent Thoughts](https://arxiv.org/abs/2503.18866) | <img width="700" alt="image" src="img/llm/2503_bolt.png"> | [Github](https://github.com/ryoungj/BoLT) |
| 2025/03 | [Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation](https://arxiv.org/abs/2503.22675) | <img width="700" alt="image" src="img/llm/2503_rearec.png"> | - |
| 2025/03 | [MoLAE: Mixture of Latent Experts for Parameter-Efficient Language Models](https://arxiv.org/abs/2503.23100) | <img width="700" alt="image" src="img/llm/2503_molae.png"> | - |
| 2025/04 | [Beyond Chains of Thought: Benchmarking Latent-Space Reasoning Abilities in Large Language Models](https://arxiv.org/abs/2504.10615) | - | - |
| 2025/04 | [Efficient Pretraining Length Scaling](https://arxiv.org/abs/2504.14992) | <img width="700" alt="image" src="img/llm/2504_phd.png"> | - |
| 2025/05 | [SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning](https://arxiv.org/abs/2505.11484) | <img width="700" alt="image" src="img/llm/2505_soft_cot_plus.png"> | [Github](https://github.com/xuyige/SoftCoT) |
| 2025/05 |  <br/> [Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought](https://arxiv.org/abs/2505.12514) | <img width="700" alt="image" src="img/llm/2505_reasoning.png"> | [Github](https://github.com/Ber666/reasoning-by-superposition) |
| 2025/05 | [Enhancing Latent Computation in Transformers with Latent Tokens](https://arxiv.org/abs/2505.12629) | <img width="700" alt="image" src="img/llm/2505_enhancing.png"> | - |
| 2025/05 | [Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space](https://arxiv.org/abs/2505.13308) | <img width="700" alt="image" src="img/llm/2505_latent_seek.png"> | [Github](https://github.com/bigai-nlco/LatentSeek) |
| 2025/05 | [Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs](https://arxiv.org/abs/2505.14530) | <img width="700" alt="image" src="img/llm/2505_internal.png"> | [Github](https://github.com/yzp11/Internal-Chain-of-Thought) |
| 2025/05 | [Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space](https://arxiv.org/abs/2505.15778) | <img width="700" alt="image" src="img/llm/2505_soft_think.png"> | - |
| 2025/05 |  <br/> [Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains](https://arxiv.org/abs/2505.16552) | <img width="700" alt="image" src="img/llm/2505_colar.png"> | [Github](https://github.com/xiaomi-research/colar) |
| 2025/05 | [LARES: Latent Reasoning for Sequential Recommendation](https://arxiv.org/abs/2505.16865) | <img width="700" alt="image" src="img/llm/2505_lares.png"> | - |
| 2025/05 |  <br/> [Hybrid Latent Reasoning via Reinforcement Learning](https://arxiv.org/abs/2505.18454) | <img width="700" alt="image" src="img/llm/2505_hrpo.png"> | [Github](https://github.com/thu-nics/C2C) |
| 2025/05 |  <br/> [System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts](https://arxiv.org/abs/2505.18962) | <img width="700" alt="image" src="img/llm/2505_system_15.png"> | - |
| 2025/05 |  <br/> [Reinforced Latent Reasoning for LLM-based Recommendation](https://arxiv.org/abs/2505.19092) | <img width="700" alt="image" src="img/llm/2505_r3.png"> | [Github](https://github.com/xuwenxinedu/R3) |
| 2025/05 | [Continuous Chain of Thought Enables Parallel Exploration and Reasoning](https://arxiv.org/abs/2505.23648) | <img width="700" alt="image" src="img/llm/2505_cot2.png"> | [Github](https://github.com/alperengozeten/CoT2) |
| 2025/05 |  <br/> [Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration](https://arxiv.org/abs/2505.24688) | <img width="700" alt="image" src="img/llm/2505_soft.png"> | [Github](https://github.com/alickzhu/Soft-Reasoning) |
| 2025/06 | [Efficient Post-Training Refinement of Latent Reasoning in Large Language Models](https://arxiv.org/abs/2506.08552) | <img width="700" alt="image" src="img/llm/2506_efficient.png"> | [Github](https://github.com/anord-wang/Lateng-Reasoning) |
| 2025/06 | [DART: Distilling Autoregressive Reasoning to Silent Thought](https://arxiv.org/abs/2506.11752) | <img width="700" alt="image" src="img/llm/2506_dart.png"> | - |
| 2025/06 |  <br/> [Parallel Continuous Chain-of-Thought with Jacobi Iteration](https://arxiv.org/abs/2506.18582) | <img width="700" alt="image" src="img/llm/2506_pccot.png"> | [Github](https://github.com/whyNLP/PCCoT) |
| 2025/07 | [Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer](https://arxiv.org/abs/2507.02199) | <img width="700" alt="image" src="img/llm/2507_latent.png"> | [Github](https://github.com/wenquanlu/huginn-latent-cot) |
| 2025/07 | [CTRLS: Chain-of-Thought Reasoning via Latent State Transition](https://arxiv.org/abs/2507.08182) | <img width="700" alt="image" src="img/llm/2507_ctrls.png"> | - |
| 2025/07 | [Geometry of Knowledge Allows Extending Diversity Boundaries of Large Language Models](https://arxiv.org/abs/2507.13874) | <img width="700" alt="image" src="img/llm/2507_g2_seeds.png"> | - |
| 2025/08 | [Bridging Search and Recommendation through Latent Cross Reasoning](https://arxiv.org/abs/2508.04152) | <img width="700" alt="image" src="img/llm/2508_lcr_ser.png"> | - |
| 2025/08 | [LatentPrompt: Optimizing Promts in Latent Space](https://arxiv.org/abs/2508.02452) | <img width="700" alt="image" src="img/llm/2508_latent_prompt.png"> | - |
| 2025/08 | [Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs](https://arxiv.org/abs/2508.10029) | <img width="700" alt="image" src="img/llm/2508_lfj.png"> | - |
| 2025/09 | [Decoding in Latent Spaces for Efficient Inference in LLM-based Recommendation](https://arxiv.org/abs/2509.11524) | <img width="700" alt="image" src="img/llm/2509_l2d.png"> | - |
| 2025/09 | [LTA-thinker: Latent Thought-Augmented Training Framework for Large Language Models on Complex Reasoning](https://arxiv.org/abs/2509.12875) | <img width="700" alt="image" src="img/llm/2509_lta_thinker.png"> | [Github](https://github.com/wangjiaqi886/LTA-Thinker) |
| 2025/09 |  <br/> [The Transfer Neurons Hypothesis: An Underlying Mechanism for Language Latent Space Transitions in Multilingual LLMs](https://arxiv.org/abs/2509.17030) | <img width="700" alt="image" src="img/llm/2509_lsa.png"> | - |
| 2025/09 | [LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation](https://arxiv.org/abs/2509.19839) | <img width="700" alt="image" src="img/llm/2509_latent_guard.png"> | - |
| 2025/09 |  <br/> [SIM-CoT: Supervised Implicit Chain-of-Thought](https://arxiv.org/abs/2509.20317) | <img width="700" alt="image" src="img/llm/2509_sim_cot.png"> | [Github](https://github.com/InternLM/SIM-CoT) |
| 2025/09 | [PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space](https://arxiv.org/abs/2509.23184) | <img width="700" alt="image" src="img/llm/2509_ponderlm2.png"> | [Github](https://github.com/LUMIA-Group/PonderLM-2) |
| 2025/09 | [Fast Thinking for Large Language Models](https://arxiv.org/abs/2509.23633) | <img width="700" alt="image" src="img/llm/2509_fast_thinking.png"> | - |
| 2025/09 | [Learning to Ponder: Adaptive Reasoning in Latent Space](https://arxiv.org/abs/2509.24238) | <img width="700" alt="image" src="img/llm/2509_fr_ponder.png"> | - |
| 2025/09 | [Identity Bridge: Enabling Implicit Reasoning via Shared Latent Memory](https://arxiv.org/abs/2509.24653) | <img width="700" alt="image" src="img/llm/2509_identify.png"> | - |
| 2025/09 |  <br/> [MemGen: Weaving Generative Latent Memory for Self-Evolving Agents](https://arxiv.org/abs/2509.24704) | <img width="700" alt="image" src="img/llm/2509_memgen.png"> | [Github](https://github.com/KANABOON1/MemGen) |
| 2025/09 | [LatentEvolve: Self-Evolving Test-Time Scaling in Latent Space](https://arxiv.org/abs/2509.24771) | <img width="700" alt="image" src="img/llm/2509_latent_evolve.png"> | [Github](https://github.com/jins7/LatentEvolve) |
| 2025/09 | [MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts](https://arxiv.org/abs/2509.25020) | <img width="700" alt="image" src="img/llm/2509_marcos.png"> | - |
| 2025/09 | [A Formal Comparison Between Chain of Thought and Latent Thought](https://arxiv.org/abs/2509.25239) | - | - |
| 2025/09 |  <br/> [Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts](https://arxiv.org/abs/2509.26314) | <img width="700" alt="image" src="img/llm/2509_huginn.png"> | - |
| 2025/10 | [Thoughtbubbles: an Unsupervised Method for Parallel Thinking in Latent Space](https://arxiv.org/abs/2510.00219) | <img width="700" alt="image" src="img/llm/2510_thought_bubbles.png"> | [Github](https://github.com/stanfordnlp/thoughtbubbles) |
| 2025/10 | [Analyzing Latent Concepts in Code Language Models](https://arxiv.org/abs/2510.00476) | <img width="700" alt="image" src="img/llm/2510_cocoa.png"> | - |
| 2025/10 | [Exploring System 1 and 2 communication for latent reasoning in LLMs](https://arxiv.org/abs/2510.00494) | <img width="700" alt="image" src="img/llm/2510_exploring.png"> | - |
| 2025/10 |  <br/> [KaVa: Latent Reasoning via Compressed KV-Cache Distillation](https://arxiv.org/abs/2510.02312) | <img width="700" alt="image" src="img/llm/2510_kava.png"> | - |
| 2025/10 |  <br/> [Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization](https://arxiv.org/abs/2510.04182) | <img width="700" alt="image" src="img/llm/2510_ltpo.png"> | [Github](https://github.com/ltpo2025/LTPO) |
| 2025/10 |  <br/> [LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning](https://arxiv.org/abs/2510.04573) | <img width="700" alt="image" src="img/llm/2510_ladir.png"> | [Github](https://github.com/mk322/LaDiR) |
| 2025/10 |  <br/> [SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs](https://arxiv.org/abs/2510.05069) | <img width="700" alt="image" src="img/llm/2510_swin_reasoning.png"> | [Github](https://github.com/sdc17/SwiReasoning) |
| 2025/10 | [Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts](https://arxiv.org/abs/2510.07358) | <img width="700" alt="image" src="img/llm/2510_etd.png"> | - |
| 2025/10 | [Parallel Test-Time Scaling for Latent Reasoning Models](https://arxiv.org/abs/2510.07745) | <img width="700" alt="image" src="img/llm/2510_agn.png"> | [Github](https://github.com/YRYangang/LatentTTS) |
| 2025/10 | [LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback](https://arxiv.org/abs/2510.08604) | <img width="700" alt="image" src="img/llm/2510_latent_break.png"> | - |
| 2025/10 | [Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection](https://arxiv.org/abs/2510.09694) | <img width="700" alt="image" src="img/llm/2510_kelp.png"> | [Github](https://github.com/Alibaba-AAIG/Kelp) |
| 2025/10 | [Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning](https://arxiv.org/abs/2510.10494) | <img width="700" alt="image" src="img/llm/2510_latent-trajectory.png"> | - |
| 2025/10 | [Unlocking Out-of-Distribution Generalization in Transformers via Recursive Latent Space Reasoning](https://arxiv.org/abs/2510.14095) | <img width="700" alt="image" src="img/llm/2510_ood.png"> | [Github](https://github.com/Awni00/algorithmic-generalization-transformer-architectures) |
| 2025/10 | [Language Models are Injective and Hence Invertible](https://arxiv.org/abs/2510.15511) | <img width="700" alt="image" src="img/llm/2510_language.png"> | - |
| 2025/10 | [LLM Latent Reasoning as Chain of Superposition](https://arxiv.org/abs/2510.15522) | <img width="700" alt="image" src="img/llm/2510_latent_sft.png"> | [Github](https://github.com/DJC-GO-SOLO/Latent-SFT) |
| 2025/10 |  <br/> [ActivationReasoning: Logical Reasoning in Latent Activation Spaces](https://arxiv.org/abs/2510.18184) | <img width="700" alt="image" src="img/llm/2510_activation_reasoning.png"> | - |
| 2025/10 |  <br/> [Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models](https://arxiv.org/abs/2510.22042) | <img width="700" alt="image" src="img/llm/2510_emotion.png"> | - |
| 2025/10 |  <br/> [SALS: Sparse Attention in Latent Space for KV cache Compression](https://arxiv.org/abs/2510.24273) | <img width="700" alt="image" src="img/llm/2510_sals.png"> | - |
| 2025/10 |  <br/> [SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens](https://arxiv.org/abs/2510.24940) | <img width="700" alt="image" src="img/llm/2510_sem_cot.png"> | [Github](https://github.com/YinhanHe123/SemCoT) |
| 2025/10 | [Scaling Latent Reasoning via Looped Language Models](https://arxiv.org/abs/2510.25741) | <img width="700" alt="image" src="img/llm/2510_ouro.png"> | - |
| 2025/10 |  <br/> [Cache-to-Cache: Direct Semantic Communication Between Large Language Models](https://arxiv.org/abs/2510.03215) | <img width="700" alt="image" src="img/llm/2510_c2c.png"> | [Github](https://github.com/thu-nics/C2C) |
| 2025/10 |  <br/> [Thought Communication in Multiagent Collaboration](https://arxiv.org/abs/2510.20733) | <img width="700" alt="image" src="img/llm/2510_thoughtcomm.png"> | - |
| 2025/11 | [SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization](https://arxiv.org/abs/2511.06411) | <img width="700" alt="image" src="img/llm/2511_soft_cot.png"> | [Github](https://github.com/zz1358m/SofT-GRPO-master) |
| 2025/11 | [Think Consistently, Reason Efficiently: Energy-Based Calibration for Implicit Chain-of-Thought](https://arxiv.org/abs/2511.07124) | <img width="700" alt="image" src="img/llm/2511_ebm.png"> | - |
| 2025/11 | [Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models](https://arxiv.org/abs/2511.08577) | <img width="700" alt="image" src="img/llm/2511_tah.png"> | [Github](https://github.com/thu-nics/TaH) |
| 2025/11 | [SpiralThinker: Latent Reasoning through an Iterative Process with Text-Latent Interleaving](https://arxiv.org/abs/2511.08983) | <img width="700" alt="image" src="img/llm/2511_spiral_thinker.png"> | - |
| 2025/11 | [Enabling Agents to Communicate Entirely in Latent Space](https://arxiv.org/abs/2511.09149) | <img width="700" alt="image" src="img/llm/2511_interlat.png"> | - |
| 2025/11 | [Improving Latent Reasoning in LLMs via Soft Concept Mixing](https://arxiv.org/abs/2511.16885) | <img width="700" alt="image" src="img/llm/2511_scm.png"> | - |
| 2025/11 | [Your Latent Reasoning is Secretly Policy Improvement Operator](https://arxiv.org/abs/2511.16886) | - | - |
| 2025/11 | [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659) | <img width="700" alt="image" src="img/llm/2511_clara.png"> | [Github](https://github.com/apple/ml-clara) |
| 2025/11 | [Learning When to Stop: Adaptive Latent Reasoning via Reinforcement Learning](https://arxiv.org/abs/2511.21581) | <img width="700" alt="image" src="img/llm/2511_learning.png"> | [Github](https://github.com/apning/adaptive-latent-reasoning) |
| 2025/11 | [Visualizing LLM Latent Space Geometry Through Dimensionality Reduction](https://arxiv.org/abs/2511.21594) | <img width="700" alt="image" src="img/llm/2511_visualization.png"> | [Github](https://github.com/Vainateya/Feature_Geometry_Visualization) |
| 2025/11 | [Polarity-Aware Probing for Quantifying Latent Alignment in Language Models](https://arxiv.org/abs/2511.21737) | <img width="700" alt="image" src="img/llm/2511_pa_ccs.png"> | [Github](https://github.com/SadSabrina/polarity-probing) |
| 2025/11 | [Latent Collaboration in Multi-Agent Systems](https://arxiv.org/abs/2511.20639) | <img width="700" alt="image" src="img/llm/2511_latent_mas.png"> | [Github](https://github.com/Gen-Verse/LatentMAS) |
| 2025/12 | [Latent Debate: A Surrogate Framework for Interpreting LLM Thinking](https://arxiv.org/abs/2512.01909) | <img width="700" alt="image" src="img/llm/2512_latent.png"> | [Github](https://github.com/tigerchen52/latent_debate) |
| 2025/12 | [Lightweight Latent Reasoning for Narrative Tasks](https://arxiv.org/abs/2512.02240) | <img width="700" alt="image" src="img/llm/2512_lite_reason.png"> | - |
| 2025/12 |  <br/> [Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation](https://arxiv.org/abs/2512.06690) | <img width="700" alt="image" src="img/llm/2512_flythinker.png"> | - |
| 2025/12 | [ReLaX: Reasoning with Latent Exploration for Large Reasoning Models](https://arxiv.org/abs/2512.07558) | <img width="700" alt="image" src="img/llm/2512_relax.png"> | - |
| 2025/12 | [Reinforcement Learning for Latent-Space Thinking in LLMs](https://arxiv.org/abs/2512.11816) | <img width="700" alt="image" src="img/llm/2512_reinforcement.png"> | [Github](https://github.com/enesozeren/latent-space-thinking-model) |
| 2025/12 | [Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs](https://arxiv.org/abs/2512.17206) | <img width="700" alt="image" src="img/llm/2512_repa.png"> | - |
| 2025/12 | [JEPA-Reasoner: Decoupling Latent Reasoning from Token Generation](https://arxiv.org/abs/2512.19171) | <img width="700" alt="image" src="img/llm/2512_jepa_reasoner.png"> | - |
| 2025/12 | [Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought](https://arxiv.org/abs/2512.21711) | <img width="700" alt="image" src="img/llm/2512_coconut.png"> | - |
| 2025/12 | [iCLP: Large Language Model Reasoning with Implicit Cognition Latent Planning](https://arxiv.org/abs/2512.24014) | <img width="700" alt="image" src="img/llm/2512_iclp.png"> | [Github](https://github.com/AgenticFinLab/latent-planning) |
| 2025/12 | [Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space](https://arxiv.org/abs/2512.24617) | <img width="700" alt="image" src="img/llm/2512_dlcm.png"> | - |
| 2025/12 | [Learning Evolving Latent Strategies for Multi-Agent Language Systems without Model Fine-Tuning](https://arxiv.org/abs/2512.20629) | - | - |
| 2026/01 | [Parallel Latent Reasoning for Sequential Recommendation](https://arxiv.org/abs/2601.03153) | <img width="700" alt="image" src="img/llm/2601_plr.png"> | - |
| 2026/01 | [Latent Space Communication via K-V Cache Alignment](https://arxiv.org/abs/2601.06123) | <img width="700" alt="image" src="img/llm/2601_latent.png"> | - |
| 2026/01 | [Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models](https://arxiv.org/abs/2601.03542) | <img width="700" alt="image" src="img/llm/2601_layer.png"> | [Github](https://github.com/laquabe/Layer-Order-Inversion) |
| 2026/01 | [FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse](https://arxiv.org/abs/2601.05505) | <img width="700" alt="image" src="img/llm/2601_flashmem.png"> | - |
| 2026/01 | [IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck](https://arxiv.org/abs/2601.05870) | <img width="700" alt="image" src="img/llm/2601_i2b_lpo.png"> | [Github](https://github.com/denghuilin-cyber/IIB-LPO) |
| 2026/01 | [Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space](https://arxiv.org/abs/2601.06220) | <img width="700" alt="image" src="img/llm/2601_zero_router.png"> | [Github](https://github.com/Codeffun3/ZeroRouter) |
| 2026/01 | [Silence the Judge: Reinforcement Learning with Self-Verifier via Latent Geometric Clustering](https://arxiv.org/abs/2601.08427) | <img width="700" alt="image" src="img/llm/2601_latent_grpo.png"> | - |
| 2026/01 | [Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models](https://arxiv.org/abs/2601.08058) | <img width="700" alt="image" src="img/llm/2601_reasoning.png"> | - |
| 2026/01 | [RISER: Orchestrating Latent Reasoning Skills for Adaptive Activation Steering](https://arxiv.org/abs/2601.09269) | <img width="700" alt="image" src="img/llm/2601_riser.png"> | [Github](https://github.com/gooogleshanghai/RISER-Orchestrating-Latent-Reasoning-Skills-for-Adaptive-Activation-Steering) |
| 2026/01 | [GeoSteer: Faithful Chain-of-Thought Steering via Latent Manifold Gradients](https://arxiv.org/abs/2601.10229) | <img width="700" alt="image" src="img/llm/2601_geosteer.png"> | - |
| 2026/01 | [Reasoning While Recommending: Entropy-Guided Latent Reasoning in Generative Re-ranking Models](https://arxiv.org/abs/2601.13533) | <img width="700" alt="image" src="img/llm/2601_eglr.png"> | - |
| 2026/01 | [Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning](https://arxiv.org/abs/2601.17275) | <img width="700" alt="image" src="img/llm/2601_dlr.png"> | - |
| 2026/01 | [UniCog: Uncovering Cognitive Abilities of LLMs through Latent Mind Space Analysis](https://arxiv.org/abs/2601.17897) | <img width="700" alt="image" src="img/llm/2601_unicog.png"> | [Github](https://github.com/milksalute/unicog) |
| 2026/01 | [S2GR: Stepwise Semantic-Guided Reasoning in Latent Space for Generative Recommendation](https://arxiv.org/abs/2601.18664) | <img width="700" alt="image" src="img/llm/2601_s2gr.png"> | - |
| 2026/01 | [The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning](https://arxiv.org/abs/2601.18832) | <img width="700" alt="image" src="img/llm/2601_tgr.png"> | - |
| 2026/01 | [PILOT: Planning via Internalized Latent Optimization Trajectories for Large Language Models](https://arxiv.org/abs/2601.19917) | <img width="700" alt="image" src="img/llm/2601_pilot.png"> | - |
| 2026/01 | [Beyond Imitation: Reinforcement Learning for Active Latent Planning](https://arxiv.org/abs/2601.21598) | <img width="700" alt="image" src="img/llm/2601_atp_latent.png"> | [Github](https://github.com/zz1358m/ATP-Latent-master) |
| 2026/01 | [Latent Adversarial Regularization for Offline Preference Optimization](https://arxiv.org/abs/2601.22083) | <img width="700" alt="image" src="img/llm/2601_ganpo.png"> | - |
| 2026/01 | [Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization](https://arxiv.org/abs/2601.21358) | <img width="700" alt="image" src="img/llm/2601_plat.png"> | [Github](https://github.com/yunsaijc/PLaT) |
| 2026/01 | [Depth-Recurrent Attention Mixtures: Giving Latent Reasoning the Attention it Deserves](https://arxiv.org/abs/2601.21582) | <img width="700" alt="image" src="img/llm/2601_dreamer.png"> | - |
| 2026/01 | [From Logits to Latents: Contrastive Representation Shaping for LLM Unlearning](https://arxiv.org/abs/2601.22028) | - | - |
| 2026/01 | [ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought](https://arxiv.org/abs/2601.23184) | <img width="700" alt="image" src="img/llm/2601_regular.png"> | [Github](https://github.com/FanmengWang/ReGuLaR) |
| 2026/02 | [G-MemLLM: Gated Latent Memory Augmentation for Long-Context Reasoning in Large Language Models](https://arxiv.org/abs/2602.00015) | <img width="700" alt="image" src="img/llm/2602_g_memlm.png"> | - |
| 2026/02 | [Do Latent-CoT Models Think Step-by-Step? A Mechanistic Study on Sequential Reasoning Tasks](https://arxiv.org/abs/2602.00449) | <img width="700" alt="image" src="img/llm/2602_codi.png"> | [Github](https://github.com/jialiang19/latent-cot-thinking) |
| 2026/02 | [Capabilities and Fundamental Limits of Latent Chain-of-Thought](https://arxiv.org/abs/2602.01148) | - | - |
| 2026/02 | [Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models](https://arxiv.org/abs/2602.01698) | <img width="700" alt="image" src="img/llm/2602_led.png"> | [Github](https://github.com/Xiaomi-Research/LED) |
| 2026/02 | [No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs](https://arxiv.org/abs/2602.02103) | - | [Github](https://github.com/lxucs/tele-lens) |
| 2026/02 | [CoLT: Reasoning with Chain of Latent Tool Calls](https://arxiv.org/abs/2602.04246) | <img width="700" alt="image" src="img/llm/2602_colt.png"> | - |
| 2026/02 | [Internalizing LLM Reasoning via Discovery and Replay of Latent Actions](https://arxiv.org/abs/2602.04925) | <img width="700" alt="image" src="img/llm/2602_stir.png"> | [Github](https://github.com/sznnzs/LLM-Latent-Action) |
| 2026/02 | [Inference-Time Rethinking with Latent Thought Vectors for Math Reasoning](https://arxiv.org/abs/2602.06584) | <img width="700" alt="image" src="img/llm/2602_inference.png"> | - |
| 2026/02 | [LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning](https://arxiv.org/abs/2602.07075) | <img width="700" alt="image" src="img/llm/2602_latent_chem.png"> | [Github](https://github.com/xinwuye/LatentChem) |
| 2026/02 | [DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity](https://arxiv.org/abs/2602.08005) | <img width="700" alt="image" src="img/llm/2602_delta_kv.png"> | [Github](https://github.com/CURRENTF/Sparse-vLLM) |
| 2026/02 | [Pretraining with Token-Level Adaptive Latent Chain-of-Thought](https://arxiv.org/abs/2602.08220) | <img width="700" alt="image" src="img/llm/2602_pretraining.png"> | - |
| 2026/02 | [Latent Reasoning with Supervised Thinking States](https://arxiv.org/abs/2602.08332) | <img width="700" alt="image" src="img/llm/2602_latent.png"> | - |
| 2026/02 | [Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure](https://arxiv.org/abs/2602.08783) | <img width="700" alt="image" src="img/llm/2602_dynamics.png"> | - |
| 2026/02 | [Next Concept Prediction in Discrete Latent Space Leads to Stronger Language Models](https://arxiv.org/abs/2602.08984) | <img width="700" alt="image" src="img/llm/2602_concept_lm.png"> | [Github](https://github.com/LUMIA-Group/ConceptLM) |
| 2026/02 | [Talking with the Latents -- how to convert your LLM into an astronomer](https://arxiv.org/abs/2602.09670) | <img width="700" alt="image" src="img/llm/2602_talking.png"> | - |
| 2026/02 | [Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens](https://arxiv.org/abs/2602.10229) | <img width="700" alt="image" src="img/llm/2602_lt_tuning.png"> | [Github](https://github.com/NeosKnight233/Latent-Thoughts-Tuning) |
| 2026/02 | [Prioritize the Process, Not Just the Outcome: Rewarding Latent Thought Trajectories Improves Reasoning in Looped Language Models](https://arxiv.org/abs/2602.10520) | <img width="700" alt="image" src="img/llm/2602_rltt.png"> | - |
| 2026/02 |  <br/> [LoopFormer: Elastic-Depth Looped Transformers for Latent Reasoning via Shortcut Modulation](https://arxiv.org/abs/2602.11451) | <img width="700" alt="image" src="img/llm/2602_loop_former.png"> | [Github](https://github.com/armenjeddi/loopformer) |
| 2026/02 | [Jailbreaking Leaves a Trace: Understanding and Detecting Jailbreak Attacks from Internal Representations of Large Language Models](https://arxiv.org/abs/2602.11495) | <img width="700" alt="image" src="img/llm/2602_jailbreaking.png"> | - |
| 2026/02 |  <br/> [Native Reasoning Models: Training Language Models to Reason on Unverifiable Data](https://arxiv.org/abs/2602.11549) | <img width="700" alt="image" src="img/llm/2602_nrt.png"> | - |
| 2026/02 | [ThinkRouter: Efficient Reasoning via Routing Thinking between Latent and Discrete Spaces](https://arxiv.org/abs/2602.11683) | <img width="700" alt="image" src="img/llm/2602_think_router.png"> | - |
| 2026/02 | [SpiralFormer: Looped Transformers Can Learn Hierarchical Dependencies via Multi-Resolution Recursion](https://arxiv.org/abs/2602.11698) | <img width="700" alt="image" src="img/llm/2602_spiralformer.png"> | - |
| 2026/02 | [GTS: Inference-Time Scaling of Latent Reasoning with a Learnable Gaussian Thought Sampler](https://arxiv.org/abs/2602.14077) | <img width="700" alt="image" src="img/llm/2602_gts.png"> | - |
| 2026/02 | [Measuring and Mitigating Post-hoc Rationalization in Reverse Chain-of-Thought Generation](https://arxiv.org/abs/2602.14469) | <img width="700" alt="image" src="img/llm/2602_ssr.png"> | - |
| 2026/02 | [Inner Loop Inference for Pretrained Transformers: Unlocking Latent Capabilities Without Training](https://arxiv.org/abs/2602.14759) | - | [Github](https://github.com/jonathanlys01/looped-transformer) |
| 2026/02 | [LatentMem: Customizing Latent Memory for Multi-Agent Systems](https://arxiv.org/abs/2602.03036) | <img width="700" alt="image" src="img/llm/2602_latent_mem.png"> | [Github](https://github.com/KANABOON1/LatentMem) |
| 2026/02 | [Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems](https://arxiv.org/abs/2602.03695) | <img width="700" alt="image" src="img/llm/2602_agent.png"> | - |
| 2026/03 | [LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval](https://arxiv.org/abs/2603.01425) | <img width="700" alt="image" src="img/llm/2603_laser.png"> | [Github](https://github.com/ignorejjj/LaSER) |
| 2026/03 |  <br/> [Multi-Head Low-Rank Attention](https://arxiv.org/abs/2603.02188) | - | [Github](https://github.com/SongtaoLiu0823/MLRA) |
| 2026/03 | [AdaPonderLM: Gated Pondering Language Models with Token-Wise Adaptive Depth](https://arxiv.org/abs/2603.01914) | <img width="700" alt="image" src="img/llm/2603_adaponderlm.png"> | - |
| 2026/03 | [PonderLM-3: Adaptive Token-Wise Pondering with Differentiable Masking](https://arxiv.org/abs/2603.02023) | <img width="700" alt="image" src="img/llm/2603_ponderlm3.png"> | - |
| 2026/03 | [When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning](https://arxiv.org/abs/2603.03475) | - | [Github](https://github.com/SubramanyamSahoo/When-Shallow-Wins) |
| 2026/03 |  <br/> [∇-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space](https://arxiv.org/abs/2603.04948) | - | - |
| 2026/03 | [SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models](https://arxiv.org/abs/2603.06222) | <img width="700" alt="image" src="img/llm/2603_spot.png"> | - |
| 2026/03 | [NextMem: Towards Latent Factual Memory for LLM-based Agents](https://arxiv.org/abs/2603.15634) | <img width="700" alt="image" src="img/llm/2603_nextmem.png"> | [Github](https://github.com/nuster1128/NextMem) |
| 2026/03 | [Contrastive Reasoning Alignment: Reinforcement Learning from Hidden Representations](https://arxiv.org/abs/2603.17305) | <img width="700" alt="image" src="img/llm/2603_craft.png"> | - |
| 2026/03 | [LoopRPT: Reinforcement Pre-Training for Looped Language Models](https://arxiv.org/abs/2603.19714) | <img width="700" alt="image" src="img/llm/2603_looprpt.png"> | - |
### Vision-Language-Model
| Date | Paper Title | Introduction | Code |
|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|----------------------------------------------------------------------|
| 2024/10 | [Reducing hallucinations in large vision-language models via latent space steering](https://arxiv.org/abs/2410.15778) | <img width="700" alt="image" src="img/vlm/2410_vti.png"> | [Github](https://github.com/shengliu66/VTI) |
| 2024/12 |  <br/> [Perception Tokens Enhance Visual Reasoning in Multimodal Language Models](https://arxiv.org/abs/2412.03548) | <img width="700" alt="image" src="img/vlm/2412_aurora.png"> | [Github](https://github.com/mahtabbigverdi/Aurora-perception) |
| 2025/01 | [Efficient Reasoning with Hidden Thinking](https://arxiv.org/abs/2501.19201) | <img width="700" alt="image" src="img/vlm/2501_heima.png"> | [Github](https://github.com/shawnricecake/Heima) |
| 2025/02 |  <br/> [AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding](https://arxiv.org/abs/2502.01341) | <img width="700" alt="image" src="img/vlm/2502_alignvlm.png"> | - |
| 2025/03 | [Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models](https://arxiv.org/abs/2503.17142) | <img width="700" alt="image" src="img/vlm/2503_gde.png"> | [Github](https://github.com/BerasiDavide/vlm_image_compositionality) |
| 2025/05 |  <br/> [Towards General Continuous Memory for Vision-Language Models](https://arxiv.org/abs/2505.17670) | <img width="700" alt="image" src="img/vlm/2505_comem.png"> | [Github](https://github.com/WenyiWU0111/CoMEM/tree/main) |
| 2025/05 |  <br/> [Image Tokens Matter: Mitigating Hallucination in Discrete Tokenizer-based Large Vision-Language Models via Latent Editing](https://arxiv.org/abs/2505.21547) | <img width="700" alt="image" src="img/vlm/2505_cgc_vtd.png"> | [Github](https://github.com/weixingW/CGC-VTD/tree/main) |
| 2025/06 | [Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens](https://arxiv.org/abs/2506.17218) | <img width="700" alt="image" src="img/vlm/2506_mirage.png"> | [Github](https://github.com/UMass-Embodied-AGI/Mirage) |
| 2025/08 | [Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models](https://arxiv.org/abs/2508.12587) | <img width="700" alt="image" src="img/vlm/2508_mcout.png"> | - |
| 2025/09 |  <br/> [MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning](https://arxiv.org/abs/2509.22761) | <img width="700" alt="image" src="img/vlm/2509_milr.png"> | [Github](https://github.com/spatigen/milr) |
| 2025/09 |  <br/> [Latent Visual Reasoning](https://arxiv.org/abs/2509.24251) | <img width="700" alt="image" src="img/vlm/2509_lvr.png"> | [Github](https://github.com/VincentLeebang/lvr) |
| 2025/10 | [Auto-scaling Continuous Memory for GUI Agent](https://arxiv.org/abs/2510.09038) | <img width="700" alt="image" src="img/vlm/2510_auto.png"> | [Github](https://github.com/WenyiWU0111/CoMEM-Agent) |
| 2025/10 | [Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space](https://arxiv.org/abs/2510.12603) | <img width="700" alt="image" src="img/vlm/2510_ivt_lr.png"> | [Github](https://github.com/FYYDCC/IVT-LR) |
| 2025/10 |  <br/> [Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views](https://arxiv.org/abs/2510.18632) | <img width="700" alt="image" src="img/vlm/2510_think.png"> | [Github](https://github.com/zhangquanchen/3DThinker) |
| 2025/10 | [Latent Chain-of-Thought for Visual Reasoning](https://arxiv.org/abs/2510.23925) | <img width="700" alt="image" src="img/vlm/2510_lacot.png"> | [Github](https://github.com/heliossun/LaCoT) |
| 2025/10 | [Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs](https://arxiv.org/abs/2510.24514) | <img width="700" alt="image" src="img/vlm/2510_sketchpad.png"> | [Github](https://github.com/hwanyu112/Latent-Sketchpad) |
| 2025/11 | [Multimodal Reasoning via Latent Refocusing](https://arxiv.org/abs/2511.02360) | <img width="700" alt="image" src="img/vlm/2511_lare.png"> | - |
| 2025/11 |  <br/> [VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Model](https://arxiv.org/abs/2511.11007) | <img width="700" alt="image" src="img/vlm/2511_vismem.png"> | [Github](https://github.com/YU-deep/VisMem) |
| 2025/11 | [L2V-CoT: Cross-Modal Transfer of Chain-of-Thought Reasoning via Latent Intervention](https://arxiv.org/abs/2511.17910) | <img width="700" alt="image" src="img/vlm/2511_l2v_cot.png"> | - |
| 2025/11 | [Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens](https://arxiv.org/abs/2511.19418) | <img width="700" alt="image" src="img/vlm/2511_covt.png"> | [Github](https://github.com/Wakals/CoVT) |
| 2025/11 | [Reading Between the Lines: Abstaining from VLM-Generated OCR Errors via Latent Representation Probes](https://arxiv.org/abs/2511.19806) | <img width="700" alt="image" src="img/vlm/2511_lrp.png"> | - |
| 2025/11 | [Monet: Reasoning in Latent Visual Space Beyond Image and Language](https://arxiv.org/abs/2511.21395) | <img width="700" alt="image" src="img/vlm/2511_monet.png"> | [Github](https://github.com/NOVAglow646/) |
| 2025/12 | [Interleaved Latent Visual Reasoning with Selective Perceptual Modeling](https://arxiv.org/abs/2512.05665) | <img width="700" alt="image" src="img/vlm/2512_ilvr.png"> | [Github](https://github.com/XD111ds/ILVR) |
| 2025/12 | [Mull-Tokens: Modality-Agnostic Latent Thinking](https://arxiv.org/abs/2512.10941) | <img width="700" alt="image" src="img/vlm/2512_mull.png"> | - |
| 2025/12 | [VL-JEPA: Joint Embedding Predictive Architecture for Vision-language](https://arxiv.org/abs/2512.10942) | <img width="700" alt="image" src="img/vlm/2512_vl_jepa.png"> | - |
| 2025/12 | [Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space](https://arxiv.org/abs/2512.12623) | <img width="700" alt="image" src="img/vlm/2512_mind.png"> | [Github](https://github.com/eric-ai-lab/DMLR) |
| 2025/12 | [Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs](https://arxiv.org/abs/2512.16584) | <img width="700" alt="image" src="img/vlm/2512_skila.png"> | [Github](https://github.com/TungChintao/SkiLa) |
| 2025/12 | [Latent Implicit Visual Reasoning](https://arxiv.org/abs/2512.21218) | <img width="700" alt="image" src="img/vlm/2512_livr.png"> | - |
| 2026/01 | [Forest Before Trees: Latent Superposition for Efficient Visual Reasoning](https://arxiv.org/abs/2601.06803) | <img width="700" alt="image" src="img/vlm/2601_laser.png"> | [Github](https://github.com/ybb6/laser) |
| 2026/01 | [Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions](https://arxiv.org/abs/2601.07516) | <img width="700" alt="image" src="img/vlm/2601_controlling.png"> | - |
| 2026/01 | [LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning](https://arxiv.org/abs/2601.10129) | <img width="700" alt="image" src="img/vlm/2601_lavit.png"> | [Github](https://github.com/Svardfox/LaViT) |
| 2026/01 | [PREGEN: Uncovering Latent Thoughts in Composed Video Retrieval](https://arxiv.org/abs/2601.13797) | <img width="700" alt="image" src="img/vlm/2601_pregen.png"> | - |
| 2026/01 | [Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning](https://arxiv.org/abs/2601.14750) | <img width="700" alt="image" src="img/vlm/2601_rot.png"> | [Github](https://github.com/TencentBAC/RoT) |
| 2026/01 | [CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding](https://arxiv.org/abs/2601.21262) | <img width="700" alt="image" src="img/vlm/2601_casual_embed.png"> | - |
| 2026/02 | [PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Multimodal Agents](https://arxiv.org/abs/2602.00415) | <img width="700" alt="image" src="img/vlm/2602_polar_mem.png"> | [Github](https://github.com/czs-ict/PolarMem) |
| 2026/02 | [LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs](https://arxiv.org/abs/2602.00462) | <img width="700" alt="image" src="img/vlm/2602_latent_lens.png"> | [Github](https://github.com/McGill-NLP/latentlens) |
| 2026/02 | [Dual Latent Memory for Visual Multi-agent System](https://arxiv.org/abs/2602.00471) | <img width="700" alt="image" src="img/vlm/2602_l2_vmas.png"> | [Github](https://github.com/YU-deep/L2-VMAS) |
| 2026/02 | [Learning Modal-Mixed Chain-of-Thought Reasoning with Latent Embeddings](https://arxiv.org/abs/2602.00574) | <img width="700" alt="image" src="img/vlm/2602_learning.png"> | - |
| 2026/02 | [Toward Cognitive Supersensing in Multimodal Large Language Model](https://arxiv.org/abs/2602.01541) | <img width="700" alt="image" src="img/vlm/2602_cog_sense.png"> | [Github](https://github.com/PediaMedAI/Cognition-MLLM) |
| 2026/02 | [Visual Reasoning over Time Series via Multi-Agent Systems](https://arxiv.org/abs/2602.03026) | <img width="700" alt="image" src="img/vlm/2602_mas4ts.png"> | - |
| 2026/02 | [Vision-aligned Latent Reasoning for Multi-modal Large Language Model](https://arxiv.org/abs/2602.04476) | <img width="700" alt="image" src="img/vlm/2602_valr.png"> | - |
| 2026/02 | [Multimodal Latent Reasoning via Hierarchical Visual Cues Injection](https://arxiv.org/abs/2602.05359) | <img width="700" alt="image" src="img/vlm/2602_hive.png"> | - |
| 2026/02 | [LCLA: Language-Conditioned Latent Alignment for Vision-Language Navigation](https://arxiv.org/abs/2602.07629) | <img width="700" alt="image" src="img/vlm/2602_lcla.png"> | - |
| 2026/02 | [MaD-Mix: Multi-Modal Data Mixtures via Latent Space Coupling for Vision-Language Model Training](https://arxiv.org/abs/2602.07790) | <img width="700" alt="image" src="img/vlm/2602_mad_mix.png"> | - |
| 2026/02 | [Reason-IAD: Knowledge-Guided Dynamic Latent Reasoning for Explainable Industrial Anomaly Detection](https://arxiv.org/abs/2602.09850) | <img width="700" alt="image" src="img/vlm/2602_reason_iad.png"> | [Github](https://github.com/chenpeng052/Reason-IAD) |
| 2026/02 | [Revis: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models](https://arxiv.org/abs/2602.11824) | <img width="700" alt="image" src="img/vlm/2602_revis.png"> | [Github](https://github.com/antgroup/Revis) |
| 2026/02 | [OneLatent: Single-Token Compression for Visual Latent Reasoning](https://arxiv.org/abs/2602.13738) | <img width="700" alt="image" src="img/vlm/2602_one_latent.png"> | - |
| 2026/02 | [The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems](https://arxiv.org/abs/2602.15382) | <img width="700" alt="image" src="img/vlm/2602_wormhole.png"> | [Github](https://github.com/xz-liu/heterogeneous-latent-mas) |
| 2026/02 | [Test-Time Computing for Referring Multimodal Large Language Models](https://arxiv.org/abs/2602.19505) | <img width="700" alt="image" src="img/vlm/2602_controlmllm.png"> | - |
| 2026/02 | [CrystaL: Spontaneous Emergence of Visual Latents in MLLMs](https://arxiv.org/abs/2602.20980) | <img width="700" alt="image" src="img/vlm/2602_ctystal.png"> | [Github](https://github.com/yangzhangok/crystal) |
| 2026/02 | [Imagination Helps Visual Reasoning, But Not Yet in Latent Space](https://arxiv.org/abs/2602.22766) | <img width="700" alt="image" src="img/vlm/2602_capimagine.png"> | - |
| 2026/02 | [Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection](https://arxiv.org/abs/2602.24021) | <img width="700" alt="image" src="img/vlm/2602_steer_vad.png"> | - |
| 2026/03 | [Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding](https://arxiv.org/abs/2603.13366) | <img width="700" alt="image" src="img/vlm/2603_lead.png"> | - |
| 2026/04 | [Visual Enhanced Depth Scaling for Multimodal Latent Reasoning](https://arxiv.org/abs/2604.10500) | <img width="700" alt="image" src="img/vlm/2604_vedas.png"> | [Github](https://github.com/Simon98-AI/Vedas) |
| 2026/04 | [HyLaR: Hybrid Latent Reasoning with Decoupled Policy Optimization](https://arxiv.org/abs/2604.20328) | <img width="700" alt="image" src="img/vlm/2604_hylar.png"> | [Github](https://github.com/EthenCheng/HyLaR) |
| 2026/04 | [MedSynapse-V: Bridging Visual Perception and Clinical Intuition via Latent Memory Evolution](https://arxiv.org/abs/2604.26283) | <img width="700" alt="image" src="img/vlm/2604_medsynapse_v.png"> | - |
| 2026/05 | [Representative Attention For Vision Transformers](https://arxiv.org/abs/2605.14913) | <img width="700" alt="image" src="img/vlm/2605_rpatten.png"> | [Github](https://github.com/Liyuntong123/RPAtten) |
### Vision-Language-Action-Model
| Date | Paper Title | Introduction | Code |
|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|-----------------------------------------------------------|
| 2024/10 |  <br/> [Latent Action Pretraining from Videos](https://arxiv.org/abs/2410.11758) | <img width="700" alt="image" src="img/vla/2410_lapa.png"> | [Github](https://github.com/LatentActionPretraining/LAPA) |
| 2025/05 | [UniVLA: Learning to Act Anywhere with Task-centric Latent Actions](https://arxiv.org/abs/2505.06111) | <img width="700" alt="image" src="img/vla/2505_univla.png"> | [Github](https://github.com/OpenDriveLab/UniVLA) |
| 2025/07 |  <br/> [ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning](https://arxiv.org/abs/2507.16815) | <img width="700" alt="image" src="img/vla/2507_thinkact.png"> | [Github](https://jasper0314-huang.github.io/thinkact-vla) |
| 2025/09 | [Align-Then-Steer: Adapting the Vision-Language Action Models through Unified Latent Guidance](https://arxiv.org/abs/2509.02055) | <img width="700" alt="image" src="img/vla/2509_ate.png"> | [Github](https://github.com/TeleHuman/Align-Then-Steer) |
| 2025/09 | [OccVLA: Vision-Language-Action Model with Implicit 3D Occupancy Supervision](https://arxiv.org/abs/2509.05578) | <img width="700" alt="image" src="img/vla/2509_occvla.png"> | - |
| 2025/09 | [Latent Action Pretraining Through World Modeling](https://arxiv.org/abs/2509.18428) | <img width="700" alt="image" src="img/vla/2509_lawm.png"> | - |
| 2025/09 | [Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA](https://arxiv.org/abs/2509.26251) | <img width="700" alt="image" src="img/vla/2509_ssm_vla.png"> | - |
| 2025/11 | [SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models](https://arxiv.org/abs/2511.15605) | <img width="700" alt="image" src="img/vla/2511_srpo.png"> | [Github](https://github.com/sii-research/siiRL) |
| 2025/11 | [LatBot: Distilling Universal Latent Actions for Vision-Language-Action Models](https://arxiv.org/abs/2511.23034) | <img width="700" alt="image" src="img/vla/2511_latbot.png"> | - |
| 2025/11 | [Unifying Perception and Action: A Hybrid-Modality Pipeline with Implicit Visual Chain-of-Thought for Robotic Action Generation](https://arxiv.org/abs/2511.19859) | <img width="700" alt="image" src="img/vla/2511_vita.png"> | [Github](https://github.com/vita-cvpr26/vita) |
| 2025/12 | [SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead](https://arxiv.org/abs/2512.00903) | <img width="700" alt="image" src="img/vla/2512_swiftvla.png"> | [Github](https://github.com/GigaAI-research/SwiftVLA) |
| 2025/12 | [GLaD: Geometric Latent Distillation for Vision-Language-Action Models](https://arxiv.org/abs/2512.09619) | <img width="700" alt="image" src="img/vla/2512_glad.png"> | - |
| 2025/12 | [Latent Chain-of-Thought World Modeling for End-to-End Autonomous Driving](https://arxiv.org/abs/2512.10226) | <img width="700" alt="image" src="img/vla/2512_lc_drive.png"> | - |
| 2025/12 | [WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control](https://arxiv.org/abs/2512.11047) | <img width="700" alt="image" src="img/vla/2512_whole_body_vla.png"> | [Github](https://github.com/OpenDriveLab/WholebodyVLA) |
| 2025/12 | [Motus: A Unified Latent Action World Model](https://arxiv.org/abs/2512.13030) | <img width="700" alt="image" src="img/vla/2512_motus.png"> | [Github](https://github.com/thu-ml/Motus) |
| 2025/12 | [LoLA: Long Horizon Latent Action Learning for General Robot Manipulation](https://arxiv.org/abs/2512.20166) | <img width="700" alt="image" src="img/vla/2512_lola.png"> | - |
| 2025/12 | [ColaVLA: Leveraging Cognitive Latent Reasoning for Hierarchical Parallel Trajectory Planning in Autonomous Driving](https://arxiv.org/abs/2512.22939) | <img width="700" alt="image" src="img/vla/2512_colavla.png"> | [Github](https://github.com/pqh22/ColaVLA) |
| 2026/01 | [Learning to Act Robustly with View-Invariant Latent Actions](https://arxiv.org/abs/2601.02994) | <img width="700" alt="image" src="img/vla/2601_vila.png"> | - |
| 2026/01 | [CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos](https://arxiv.org/abs/2601.04061) | <img width="700" alt="image" src="img/vla/2601_clap.png"> | - |
| 2026/01 | [LaST0: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model](https://arxiv.org/abs/2601.05248) | <img width="700" alt="image" src="img/vla/2601_last.png"> | - |
| 2026/01 | [LatentVLA: Efficient Vision-Language Models for Autonomous Driving via Latent Action Prediction](https://arxiv.org/abs/2601.05611) | <img width="700" alt="image" src="img/vla/2601_latent_vla.png"> | - |
| 2026/01 | [Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning](https://arxiv.org/abs/2601.09708) | <img width="700" alt="image" src="img/vla/2601_fast_thinkact.png"> | - |
| 2026/01 | [LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries](https://arxiv.org/abs/2601.15197) | <img width="700" alt="image" src="img/vla/2601_longforce.png"> | [Github](https://github.com/ZGC-EmbodyAI/LangForce) |
| 2026/01 | [CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control](https://arxiv.org/abs/2601.22467) | <img width="700" alt="image" src="img/vla/2601_care.png"> | - |
| 2026/01 | [Vision-Language Models Unlock Task-Centric Latent Actions](https://arxiv.org/abs/2601.22714) | <img width="700" alt="image" src="img/vla/2601_vision.png"> | - |
| 2026/02 | [Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models](https://arxiv.org/abs/2602.01166) | <img width="700" alt="image" src="img/vla/2602_lara_vla.png"> | [Github](https://github.com/LoveJu1y/LaRA-VLA) |
| 2026/02 | [DriveWorld-VLA: Unified Latent-Space World Modeling for Autonomous Driving](https://arxiv.org/abs/2602.06521) | <img width="700" alt="image" src="img/vla/2602_driveworld_vla.png"> | - |
| 2026/02 | [Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning](https://arxiv.org/abs/2602.07845) | <img width="700" alt="image" src="img/vla/2602_recurrent_depth_vla.png"> | - |
| 2026/02 | [ConLA: Contrastive Latent Action Learning from Human Videos for Robotic Manipulation](https://arxiv.org/abs/2602.00557) | <img width="700" alt="image" src="img/vla/2602_conla.png"> | [Github](https://github.com/WeishengDAI/ConLA) |
| 2026/02 | [VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model](https://arxiv.org/abs/2602.10098) | <img width="700" alt="image" src="img/vla/2602_vla_jepa.png"> | [Github](https://github.com/ginwind/VLA-JEPA/) |
| 2026/02 | [FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution](https://arxiv.org/abs/2602.15882) | <img width="700" alt="image" src="img/vla/2602_futurevla.png"> | - |
| 2026/02 | [UniLACT: Depth-Aware RGB Latent Action Learning for Vision-Language-Action Models](https://arxiv.org/abs/2602.20231) | <img width="700" alt="image" src="img/vla/2602_unilact.png"> | - |
| 2026/02 |  <br/> [JALA: Joint-Aligned Latent Action: Towards Scalable VLA Pretraining in the Wild](https://arxiv.org/abs/2602.21736) | <img width="700" alt="image" src="img/vla/2602_jala.png"> | - |
| 2026/03 | [LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving](https://arxiv.org/abs/2603.01928) | <img width="700" alt="image" src="img/vla/2603_last_vla.png"> | [Github](https://github.com/luo-yc17/LaST-VLA) |
| 2026/03 | [Chain of World: World Model Thinking in Latent Motion](https://arxiv.org/abs/2603.03195) | <img width="700" alt="image" src="img/vla/2603_cowvla.png"> | [Github](https://github.com/fx-hit/CoWVLA) |
| 2026/04 | [OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation](https://arxiv.org/abs/2604.18486) | <img width="700" alt="image" src="img/vla/2604_onevl.png"> | - |