mSFT: Addressing Dataset Mixtures Overfiting Heterogeneously in Multi-task SFT Paper • 2603.21606 • Published 9 days ago • 38
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 133
Running on CPU Upgrade Featured 3.07k The Smol Training Playbook 📚 3.07k The secrets to building world-class LLMs