- Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention (Oct 7, 2024)
- Article: You could have designed state of the art positional encoding (Nov 25, 2024)
- Space: The Smol Training Playbook 📚 — The secrets to building world-class LLMs
- Space: The Ultra-Scale Playbook 🌌 — The ultimate guide to training LLMs on large GPU clusters
- Model: deepseek-ai/DeepSeek-V3-0324 — Text Generation, 685B parameters (updated Mar 27, 2025)
- Dataset: nvidia/Llama-Nemotron-Post-Training-Dataset — 3.91M rows (updated May 8, 2025)