HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 10 days ago • 104
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published 2 days ago • 83
Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2603.18118 • Published 9 days ago • 11
Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation Paper • 2603.12247 • Published 15 days ago • 23
Video-Based Reward Modeling for Computer-Use Agents Paper • 2603.10178 • Published 17 days ago • 42
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising Paper • 2603.08703 • Published 18 days ago • 31
RISE-Video: Can Video Generators Decode Implicit World Rules? Paper • 2602.05986 • Published Feb 5 • 26
Architecture Decoupling Is Not All You Need For Unified Multimodal Model Paper • 2511.22663 • Published Nov 27, 2025 • 29