Ale San

alecco

AI & ML interests

Data indexing and retrieval

Organizations

None yet

Recent Activity

upvoted a paper 4 days ago

Normalized Architectures are Natively 4-Bit

Paper • 2605.06067 • Published 29 days ago • 1

upvoted a paper 5 days ago

Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models

Paper • 2501.12370 • Published Jan 21, 2025 • 12

updated a collection 5 days ago

Distillation

2 items • Updated 5 days ago

upvoted 2 papers 5 days ago

Distillation Scaling Laws

Paper • 2502.08606 • Published Feb 12, 2025 • 48

Strong Teacher Not Needed? On Distillation in LLM Pretraining

Paper • 2605.23857 • Published 14 days ago • 1

upvoted a paper about 1 month ago

Adam's Law: Textual Frequency Law on Large Language Models

Paper • 2604.02176 • Published Apr 2 • 504

upvoted a paper about 2 months ago

KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs

Paper • 2604.13226 • Published Apr 14 • 10

liked a dataset about 2 months ago

SimpleStories/SimpleStories

Viewer • Updated Dec 19, 2025 • 2.14M • 6.21k • 29

liked a Space about 2 months ago

Distilling 100B+ Models 40x Faster with TRL

TRL distillation for 100B+ teachers, 40x faster

upvoted an article about 2 months ago

SmolLM3: smol, multilingual, long-context reasoner

Article • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 778