Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes Paper • 2603.25562 • Published Mar 26 • 15
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 23 days ago • 90
Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing Paper • 2604.02288 • Published Apr 2 • 32