# Tiny-LM-15M
A nano-sized language model (15M parameters) that demonstrates the power of high-quality synthetic data. Despite its tiny size, it recovers over 80% of GPT-2's (124M) benchmark performance by training on distilled and simplified English datasets.
## Performance Comparison
This model was evaluated using the lm-evaluation-harness against OpenAI's GPT-2 (124M). The results show that Tiny-LM-15M punches far above its weight class:
| Task | Tiny-LM (15M) | GPT-2 (124M) | % of GPT-2 Perf. |
|---|---|---|---|
| ARC-Easy (acc_norm) | 31.73% | 39.48% | 80.4% |
| HellaSwag (acc_norm) | 27.00% | 31.14% | 86.7% |
**Key Takeaway:** With roughly 12% of the parameters, this model reaches over 80% of GPT-2's accuracy on these benchmarks, showing that a modern architecture combined with curated data can drastically reduce model size.
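If you want to reproduce the comparison, a minimal sketch using the harness's Python API (v0.4+) is below. The exact harness version and settings behind the table above are not stated, so numbers may differ slightly:

```python
# Minimal sketch of re-running the benchmarks with lm-evaluation-harness
# (pip install lm-eval); the harness version used for the table above is
# not stated in the card, so results may differ slightly.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=sixf0ur/tiny-lm-15M",
    tasks=["arc_easy", "hellaswag"],
)
print(results["results"])  # per-task acc / acc_norm
```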
## Model Architecture
The model is based on the Llama-2 architecture with several modern optimizations:
- Parameters: 15.2 Million
- Layers: 6
- Attention Heads: 6
- Hidden Dimension: 288
- Context Length: 256 tokens
- Vocabulary Size: 4096 (Custom SentencePiece Tokenizer)
- Features: Rotary Positional Embeddings (RoPE), RMSNorm, SwiGLU activation.
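For reference, these hyperparameters map onto a standard Hugging Face `LlamaConfig` roughly as sketched below. This is an approximation, not the shipped config: the SwiGLU intermediate size is not stated in the card, so the value here is an assumption.

```python
from transformers import LlamaConfig

# Approximate reconstruction of the architecture from the numbers above.
# intermediate_size is NOT stated in the card; 768 is an assumed SwiGLU width.
config = LlamaConfig(
    vocab_size=4096,
    hidden_size=288,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=256,
    intermediate_size=768,  # assumption
    rms_norm_eps=1e-5,
)
```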
## Training Data
The secret sauce of this model is the training data, designed for maximum information density:
- Distilled BabyLM (10M): A subset of the BabyLM dataset, rewritten by DeepSeek3 into simplified, high-clarity English.
- Synthetic Wiki: Educational Wikipedia content rewritten into child-friendly English by Gemma-27B.
This combination ensures the model learns factual world knowledge without the "noise" and complexity of raw web crawls.
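The 4096-token vocabulary mentioned above can be recreated with a SentencePiece run along these lines; the input file name and `model_type` are illustrative assumptions, not the card's actual training command:

```python
import sentencepiece as spm

# Hypothetical sketch of training a small SentencePiece tokenizer like the
# one described above; "corpus.txt" and model_type="bpe" are assumptions.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="tiny_lm",
    vocab_size=4096,
    model_type="bpe",
)
```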
## Usage
You can use this model directly with the Hugging Face transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sixf0ur/tiny-lm-15M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short continuation from the model
prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Output:
# The meaning of life is a set of ways that people can share, feel, and learn about things.
# People have thought about things like how they find their way, where they look for adventures, and how they fit together
```
## Training Progress
- Final Train Loss: 2.5206
- Final Val Loss: 2.7290
- Training Steps: 3,600
- Epochs: ~18
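Assuming these losses are per-token cross-entropy in nats (the convention for Llama-style trainers, though the card does not say so explicitly), they translate to perplexity as follows:

```python
import math

# Convert the reported cross-entropy losses (assumed to be in nats) to perplexity.
print(math.exp(2.5206))  # ~12.4 train perplexity
print(math.exp(2.7290))  # ~15.3 validation perplexity
```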