# Tiny-LM-15M
A nano-sized language model (15M parameters) that demonstrates the power of high-quality synthetic data. Despite its tiny size, it recovers over 80% of GPT-2's (124M) benchmark performance by training on distilled and simplified English datasets.
## Performance Comparison
This model was evaluated using the lm-evaluation-harness against OpenAI's GPT-2 (124M). The results show that Tiny-LM-15M punches far above its weight class:
| Task | Tiny-LM (15M) | GPT-2 (124M) | % of GPT-2 Perf. |
|---|---|---|---|
| ARC-Easy (acc_norm) | 31.73% | 39.48% | 80.4% |
| HellaSwag (acc_norm) | 27.00% | 31.14% | 86.7% |
**Key Takeaway:** With roughly 12% of the parameters, this model reaches over 80% of GPT-2's accuracy on these benchmarks, showing that a modern architecture combined with curated data can drastically reduce model size.
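If you want to reproduce the comparison, a minimal sketch using the harness's Python API (v0.4+) is below. The exact harness version and settings behind the table above are not stated, so numbers may differ slightly:

```python
# Minimal sketch of re-running the benchmarks with lm-evaluation-harness
# (pip install lm-eval); the harness version used for the table above is
# not stated in the card, so results may differ slightly.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=sixf0ur/tiny-lm-15M",
    tasks=["arc_easy", "hellaswag"],
)
print(results["results"])  # per-task acc / acc_norm
```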
## Model Architecture
The model is based on the Llama-2 architecture with several modern optimizations:
- Parameters: 15.2 Million
- Layers: 6
- Attention Heads: 6
- Hidden Dimension: 288
- Context Length: 256 tokens
- Vocabulary Size: 4096 (Custom SentencePiece Tokenizer)
- Features: Rotary Positional Embeddings (RoPE), RMSNorm, SwiGLU activation.
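For reference, these hyperparameters map onto a standard Hugging Face `LlamaConfig` roughly as sketched below. This is an approximation, not the shipped config: the SwiGLU intermediate size is not stated in the card, so the value here is an assumption.

```python
from transformers import LlamaConfig

# Approximate reconstruction of the architecture from the numbers above.
# intermediate_size is NOT stated in the card; 768 is an assumed SwiGLU width.
config = LlamaConfig(
    vocab_size=4096,
    hidden_size=288,
    num_hidden_layers=6,
    num_attention_heads=6,
    max_position_embeddings=256,
    intermediate_size=768,  # assumption
    rms_norm_eps=1e-5,
)
```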
## Training Data
The secret sauce of this model is the training data, designed for maximum information density:
- Distilled BabyLM (10M): A subset of the BabyLM dataset, rewritten by DeepSeek3 into simplified, high-clarity English.
- Synthetic Wiki: Educational Wikipedia content rewritten into child-friendly English by Gemma-27B.
This combination ensures the model learns factual world knowledge without the "noise" and complexity of raw web crawls.
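The 4096-token vocabulary mentioned above can be recreated with a SentencePiece run along these lines; the input file name and `model_type` are illustrative assumptions, not the card's actual training command:

```python
import sentencepiece as spm

# Hypothetical sketch of training a small SentencePiece tokenizer like the
# one described above; "corpus.txt" and model_type="bpe" are assumptions.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="tiny_lm",
    vocab_size=4096,
    model_type="bpe",
)
```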
## Usage
You can use this model directly with the Hugging Face transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sixf0ur/tiny-lm-15M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short continuation from the model
prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Output:
# The meaning of life is a set of ways that people can share, feel, and learn about things.
# People have thought about things like how they find their way, where they look for adventures, and how they fit together
```
## Training Progress
- Final Train Loss: 2.5206
- Final Val Loss: 2.7290
- Training Steps: 3,600
- Epochs: ~18
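Assuming these losses are per-token cross-entropy in nats (the convention for Llama-style trainers, though the card does not say so explicitly), they translate to perplexity as follows:

```python
import math

# Convert the reported cross-entropy losses (assumed to be in nats) to perplexity.
print(math.exp(2.5206))  # ~12.4 train perplexity
print(math.exp(2.7290))  # ~15.3 validation perplexity
```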