HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published Jan 21 • 75
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published Jan 4 • 46
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper • 2510.12276 • Published Oct 14, 2025 • 149
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 51
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6, 2025 • 130
Soul: Breathe Life into Digital Human for High-fidelity Long-term Multimodal Animation Paper • 2512.13495 • Published Dec 15, 2025 • 11
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242