Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published 9 days ago • 12
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 9 days ago • 85
Distilling 100B+ Models 40x Faster with TRL: TRL distillation for 100B+ teachers, 40x faster