Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published 5 days ago • 30
Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction Paper • 2605.12070 • Published 8 days ago • 16
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective Paper • 2509.22613 • Published Sep 26, 2025 • 10