Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty Paper • 2602.18312 • Published 4 days ago • 1
DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning Paper • 2602.16742 • Published 6 days ago • 3
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control Paper • 2602.18422 • Published 4 days ago • 22
Does Your Reasoning Model Implicitly Know When to Stop Thinking? Paper • 2602.08354 • Published 15 days ago • 122
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published 13 days ago • 162