Tiny-LM-15M

A nano-sized language model (15M parameters) that demonstrates the power of high-quality synthetic data. Despite its tiny size, it reaches a significant share of the performance of GPT-2 (124M) by training on distilled and simplified English datasets.

Performance Comparison

This model was evaluated using the lm-evaluation-harness against OpenAI's GPT-2 (124M). The results show that Tiny-LM-15M punches far above its weight class:

Task | Tiny-LM (15M) | GPT-2 (124M) | % of GPT-2 Perf.
ARC-Easy (acc_norm) | 31.73% | 39.48% | 80.4%
HellaSwag (acc_norm) | 27.00% | 31.14% | 86.7%

Key Takeaway: With only about 12% of the parameters, this model achieves over 80% of GPT-2's performance on these reasoning benchmarks, showing that a modern architecture combined with curated data can drastically shrink the model size needed.
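
The scores above can be reproduced with the lm-evaluation-harness. The sketch below assumes a recent harness release that exposes lm_eval.simple_evaluate and that the checkpoint loads through the standard "hf" model wrapper; exact task names and arguments may differ across harness versions.

import lm_eval

# Evaluate the checkpoint on the two benchmarks reported in the table above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=sixf0ur/tiny-lm-15M",
    tasks=["arc_easy", "hellaswag"],
    batch_size=8,
)
print(results["results"]["arc_easy"])
print(results["results"]["hellaswag"])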

Model Architecture

The model is based on the Llama-2 architecture with several modern optimizations (a configuration sketch follows the list):

  • Parameters: 15.2 Million
  • Layers: 6
  • Attention Heads: 6
  • Hidden Dimension: 288
  • Context Length: 256 tokens
  • Vocabulary Size: 4096 (Custom SentencePiece Tokenizer)
  • Features: Rotary Positional Embeddings (RoPE), RMSNorm, SwiGLU activation.
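
For readers who prefer code, here is a hypothetical transformers LlamaConfig that mirrors the hyperparameters listed above. The intermediate (SwiGLU feed-forward) width is an assumption, since the card does not state it; the released checkpoint's own config.json is authoritative.

from transformers import LlamaConfig

# Hypothetical config mirroring the card's listed hyperparameters.
# intermediate_size is an assumption (not stated in the card).
config = LlamaConfig(
    vocab_size=4096,              # custom SentencePiece tokenizer
    hidden_size=288,              # hidden dimension
    num_hidden_layers=6,          # layers
    num_attention_heads=6,        # attention heads
    max_position_embeddings=256,  # context length
    intermediate_size=768,        # assumed SwiGLU feed-forward width
    rms_norm_eps=1e-5,
)

RoPE, RMSNorm, and SwiGLU come built into the Llama architecture, so they need no extra configuration flags.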

Training Data

The secret sauce of this model is the training data, designed for maximum information density:

  1. Distilled BabyLM (10M): A subset of the BabyLM dataset, rewritten by DeepSeek-V3 into simplified, high-clarity English.
  2. Synthetic Wiki: Educational Wikipedia content rewritten into child-friendly English by Gemma-27B.

This combination ensures the model learns factual world knowledge without the "noise" and complexity of raw web crawls.

Usage

You can use this model directly with the Hugging Face transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sixf0ur/tiny-lm-15M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 50 new tokens; temperature=0.7 adds mild randomness.
output = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Example output (will vary due to sampling):
# The meaning of life is a set of ways that people can share, feel, and learn about things.
# People have thought about things like how they find their way, where they look for adventures, and how they fit together
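
Equivalently, the high-level pipeline API wraps the same steps in one call; a minimal sketch, assuming the model and tokenizer load without extra arguments:

from transformers import pipeline

# text-generation pipeline bundles tokenizer, model, and generate in one object.
generator = pipeline("text-generation", model="sixf0ur/tiny-lm-15M")
result = generator("The meaning of life is", max_new_tokens=50, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])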

Training Progress

  • Final Train Loss: 2.5206
  • Final Val Loss: 2.7290
  • Training Steps: 3,600
  • Epochs: ~18
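
For context, assuming the reported losses are mean per-token cross-entropy in nats, the final validation loss corresponds to a perplexity of roughly exp(2.7290) ≈ 15.3:

import math

# Perplexity from cross-entropy loss (assumes loss is nats per token).
val_loss = 2.7290
print(math.exp(val_loss))  # ≈ 15.32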
