HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 6 days ago • 58
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 11 days ago • 63
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published 11 days ago • 90
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs Paper • 2603.09906 • Published 13 days ago • 71
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing Paper • 2603.03143 • Published 20 days ago • 144
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing Paper • 2603.09877 • Published 13 days ago • 47
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs Paper • 2603.09095 • Published 13 days ago • 28
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion Paper • 2603.06577 • Published 16 days ago • 48
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published 23 days ago • 97
Enhancing Spatial Understanding in Image Generation via Reward Modeling Paper • 2602.24233 • Published 24 days ago • 58
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published 26 days ago • 99
PyVision-RL: Forging Open Agentic Vision Models via RL Paper • 2602.20739 • Published 27 days ago • 31
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published Feb 11 • 193
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 349
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots Paper • 2602.00919 • Published Jan 31 • 318