Strong Teacher Not Needed? On Distillation in LLM Pretraining Paper • 2605.23857 • Published 23 days ago • 1
i1: A Simple and Fully Open Recipe for Strong Text-to-Image Models Paper • 2606.11289 • Published 5 days ago • 9
Derf Collection Model Checkpoints for [CVPR 2026] Stronger Normalization-Free Transformers • 2 items • Updated Apr 7 • 1
SoFlow Collection [ICLR 2026] SoFlow: Solution Flow Models for One-Step Generative Modeling. • 1 item • Updated Mar 24 • 1
Article Transformers v5: Simple model definitions powering the AI ecosystem +2 lysandre, ArthurZ, cyrilvallez, reach-vb • Dec 1, 2025 • 311
Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.14k