AI & ML interests
A new generation of foundation models from first principles.
mlabonne authored 2 papers 3 months ago
Post · 10332
New family of 1B models just dropped!
> LiquidAI/LFM2.5-1.2B-Base: 10T → 28T tokens
> LiquidAI/LFM2.5-1.2B-Instruct: new large-scale multi-stage RL
> LiquidAI/LFM2.5-1.2B-JP: our most polite model
> LiquidAI/LFM2.5-VL-1.6B: multi-image multilingual
> LiquidAI/LFM2.5-Audio-1.5B: 8× faster, no quality loss
Super proud of this release 🤗
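Below is a minimal sketch of chatting with the Instruct checkpoint via transformers, assuming it ships a standard chat template; the sampling settings are illustrative, not official recommendations.

```python
# Minimal sketch: chat with LFM2.5-1.2B-Instruct through transformers.
# Assumes the checkpoint provides a chat template; sampling values are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2.5-1.2B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize what a liquid foundation model is in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```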
ykhrustalev authored a paper 4 months ago
adityatadimeti authored a paper 5 months ago
fernandofernandes authored 3 papers 5 months ago
Spectrum: Targeted Training on Signal to Noise Ratio
Paper • 2406.06623 • Published • 16
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation
Paper • 2406.14971 • Published
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit
Paper • 2506.06607 • Published • 3
zetianli authored a paper 5 months ago
fernandofernandes authored a paper 5 months ago
kohsei authored a paper 5 months ago
sam-paech authored 3 papers 6 months ago
EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models
Paper • 2312.06281 • Published • 2
Democratizing Diplomacy: A Harness for Evaluating Any Large Language Model on Full-Press Diplomacy
Paper • 2508.07485 • Published • 10
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
Paper • 2510.15061 • Published • 3
GAD-cell authored a paper 7 months ago
Post · 8435
LiquidAI/LFM2-8B-A1B just dropped!
8.3B params with only 1.5B active/token 🚀
> Quality ≈ 3–4B dense, yet faster than Qwen3-1.7B
> MoE designed to run on phones/laptops (llama.cpp / vLLM; see the sketch below)
> Pre-trained on 12T tokens → strong math/code/instruction following
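A minimal sketch of serving the model with vLLM's offline API, assuming vLLM supports this MoE architecture; the prompt and sampling settings are illustrative.

```python
# Minimal sketch: run LFM2-8B-A1B with vLLM's offline inference API.
# Assumes vLLM support for this MoE architecture; settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="LiquidAI/LFM2-8B-A1B")
params = SamplingParams(temperature=0.3, max_tokens=128)

outputs = llm.generate(["Explain mixture-of-experts routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```

For local use on phones/laptops, the same checkpoint can instead be run through llama.cpp once converted to GGUF.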
s-jse authored 2 papers 7 months ago
Post · 3887
⚛️ New drop of tiny task-specific models!
Want to do data extraction, translation, RAG, tool use, or math on a Raspberry Pi? We got you covered! ✅
These tiny models were fine-tuned to perform narrow tasks extremely well, making them competitive with much larger models.
You can deploy them today on-device or even on GPUs for big data operations!
LiquidAI/liquid-nanos-68b98d898414dd94d4d5f99a
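A minimal sketch of running one of these nanos on-device with a transformers pipeline. The model ID below is a hypothetical placeholder, not a real repo; substitute a checkpoint from the liquid-nanos collection linked above.

```python
# Minimal sketch: run a task-specific nano locally via a transformers pipeline.
# "LiquidAI/<nano-model>" is a hypothetical placeholder; pick a real checkpoint
# (e.g. an extraction or translation model) from the liquid-nanos collection.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="LiquidAI/<nano-model>",  # placeholder, not a real repo ID
    device_map="auto",
)
result = generator(
    "Extract the invoice number from: 'Invoice #4821, due March 3.'",
    max_new_tokens=32,
)
print(result[0]["generated_text"])
```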