TINY MODELS WITH BIG INTELLIGENCE
Tiny (<30B) models that tend to outperform other models of similar parameter count.
Text Generation • 23B • Updated • 3.44k • 39 • Note: Scores 30 on the Artificial Analysis Intelligence Index (Jan '26), beating GPT-OSS 20B and sitting only 3 points behind the larger GPT-OSS 120B. More than HALF as intelligent as its big sibling, GLM 4.7 (Reasoning). Only 23B params after pruning "unused" experts, and it's MoE: only 3B params active per token. Uniquely good for its size. https://artificialanalysis.ai/models/glm-4-7-flash GGUF: unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF https://huggingface.co/unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF
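If you want to try the GGUF locally, here is a minimal sketch using llama-cpp-python (my assumption; any GGUF runner works). The quant filename pattern is also an assumption, so check the repo's file list for what's actually uploaded.

```python
# Minimal sketch: run the REAP-pruned GLM 4.7 Flash GGUF via llama-cpp-python.
# Assumptions: llama-cpp-python is installed and the repo ships a Q4_K_M quant.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/GLM-4.7-Flash-REAP-23B-A3B-GGUF",
    filename="*Q4_K_M.gguf",  # assumed quant; pick whichever file the repo actually has
    n_ctx=8192,               # context window; raise it if you have the RAM/VRAM
    n_gpu_layers=-1,          # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize mixture-of-experts in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```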
janhq/Jan-v3-4B-base-instruct
Text Generation • 4B • Updated • 42 • 19 • Note: Beats Qwen3 4B Thinking... but it's not a thinking model. Just instruct! Same param count.
ServiceNow-AI/Apriel-1.6-15b-Thinker
Image-Text-to-Text • 15B • Updated • 2.89k • 266 • Note: Doesn't usually overthink; a massive improvement over the previous 1.5 model. Outstanding intelligence for a 15B model.
Alibaba-Apsara/DASD-4B-Thinking
Text Generation • 4B • Updated • 2.09k • 171 • Note: Born from a great paper. Visibly outperforms all models of similar size.
Nanbeige/Nanbeige4-3B-Thinking-2511
Text Generation • 4B • Updated • 2.82k • 176 • Note: Outperforms Qwen3 4B Thinking at a slightly smaller size.
ByteDance/Ouro-1.4B-Thinking
Text Generation • Updated • 1.82k • 28 • Note: On par with 3-4B models.
ByteDance/Ouro-2.6B-Thinking
Text Generation • Updated • 244 • 72 • Note: On par with 4-8B models.
tiiuae/Falcon-H1R-7B
Text Generation • 8B • Updated • 6.59k • 196 • Note: Overthinks, but a good proof of concept. Similar in intelligence to Apriel 1.5 Thinker (a 15B model), but not as good at agentic tasks. A bit benchmaxxed, and not so great at general knowledge. Better with RAG.
AiAsistent/Gemma3-4B-Dark-Chain-of-Thought-CoT
Text Generation • 4B • Updated • 247 • 11 • Note: Experimental model.
-
osunlp/Mind2Web
Viewer • Updated • 253 • 1.89k • 119