F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking Paper • 2605.12995 • Published 3 days ago
Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning Paper • 2605.02913 • Published Apr 8