On-Policy Self-Distillation for Reasoning Compression Paper • 2603.05433 • Published about 19 hours ago • 2
Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning Paper • 2602.21420 • Published 10 days ago • 5