Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 773
Hypersolid: Emergent Vision Representations via Short-Range Repulsion Paper • 2601.21255 • Published Jan 29
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025 • 129
sradc/chunked-shuffled-wikipedia20220301en-bookcorpusopen Viewer • Updated Jul 17, 2023 • 33.5M • 160 • 4
Article Finally, a Replacement for BERT: Introducing ModernBERT +13 bwarner, NohTow, bclavie, orionweller, ohallstrom, staghado, alexisgallagher, rbiswasfc, fladhak, tomaarsen, ncoop57, griffin, jph00, johnowhitaker, iacolippo • Dec 19, 2024 • 740
Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published Mar 3, 2025 • 32