
chengluo PRO

wdlctc

AI & ML interests

None yet


Organizations

None yet

upvoted a paper about 10 hours ago

Delta Attention Residuals

Paper • 2605.18855 • Published 8 days ago • 5

published a model about 2 months ago

wdlctc/open-attnres-0.6b-full

Updated Mar 27

updated a model about 2 months ago

wdlctc/open-attnres-0.6b-block

0.5B • Updated Mar 27 • 20

published a model about 2 months ago

wdlctc/open-attnres-0.6b-block

0.5B • Updated Mar 27 • 20

updated a model about 2 months ago

wdlctc/open-attnres-0.6b-baseline

0.5B • Updated Mar 27 • 25

published a model about 2 months ago

wdlctc/open-attnres-0.6b-baseline

0.5B • Updated Mar 27 • 25

upvoted a paper about 2 months ago

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Paper • 2603.24472 • Published Mar 25 • 55

updated a model about 2 months ago

wdlctc/open-attnres-full

0.1B • Updated Mar 26

published a model about 2 months ago

wdlctc/open-attnres-full

0.1B • Updated Mar 26

updated a model about 2 months ago

wdlctc/open-attnres-block

0.1B • Updated Mar 26 • 3

published a model about 2 months ago

wdlctc/open-attnres-block

0.1B • Updated Mar 26 • 3

updated a model about 2 months ago

wdlctc/open-attnres-baseline

0.1B • Updated Mar 26

published a model about 2 months ago

wdlctc/open-attnres-baseline

0.1B • Updated Mar 26

upvoted a paper 8 months ago

Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?

Paper • 2510.01161 • Published Oct 1, 2025 • 14

upvoted a paper 11 months ago

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11, 2025 • 55

upvoted an article about 1 year ago

SmolLM - blazingly fast and remarkably powerful

Article • loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455

authored a paper over 1 year ago

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Paper • 2502.12574 • Published Feb 18, 2025 • 13

upvoted a paper over 1 year ago

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Paper • 2502.12574 • Published Feb 18, 2025 • 13

commented on a paper over 1 year ago

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Paper • 2502.12574 • Published Feb 18, 2025 • 13

authored a paper over 1 year ago

MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

Paper • 2407.15892 • Published Jul 22, 2024
