admarcosai's Collections: Efficient Training
Rethinking Optimization and Architecture for Tiny Language Models
Paper
• 2402.02791
• Published
• 13
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper
• 2402.01093
• Published
• 47
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper
• 2401.17574
• Published
• 17
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper
• 2401.02038
• Published
• 65
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Paper
• 2312.00678
• Published
• 2
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published
• 95
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper
• 2401.02954
• Published
• 53
Ziya2: Data-centric Learning is All LLMs Need
Paper
• 2311.03301
• Published
• 20
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper
• 2401.16380
• Published
• 51
Towards Optimal Learning of Language Models
Paper
• 2402.17759
• Published
• 18
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Paper
• 2403.03507
• Published
• 189
Beyond Language Models: Byte Models are Digital World Simulators
Paper
• 2402.19155
• Published
• 53