TinyLM-5M

A tiny causal language model with about 4.92M parameters, trained with the standard next-token cross-entropy (CE) loss plus a Closed Learning attention KL term that is applied only during training.

Model

  • Architecture: custom HF-compatible Llama-style causal LM
  • Parameters: ~4.92M total, ~4.13M excluding tied token embeddings
  • Vocab size: 4096
  • Context length: 1024
  • Layers: 9
  • Hidden size: 192
  • Attention heads: 6
  • Key/value heads: 2
  • MLP intermediate: 640
  • Tokenizer source: AxiomicLabs/GPT-S-5M
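
As a rough cross-check of the sizes listed above, the sketch below counts parameters for a standard Llama-style layout. It assumes bias-free linear layers, a gated MLP with gate/up/down projections, two RMSNorm weights per layer plus a final norm, and tied input/output embeddings. The card confirms none of these details, and the function name is hypothetical. Because the architecture is described as custom, the estimate need not match the reported totals exactly.

```python
def llama_style_param_count(vocab=4096, hidden=192, layers=9,
                            heads=6, kv_heads=2, intermediate=640):
    """Estimate parameters for an ASSUMED standard Llama-style layout."""
    head_dim = hidden // heads            # 192 / 6 = 32
    kv_dim = kv_heads * head_dim          # grouped-query K/V width: 2 * 32 = 64
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v projections
    mlp = 3 * hidden * intermediate       # gate, up, down projections
    norms = 2 * hidden                    # two RMSNorm weight vectors per layer
    non_embedding = layers * (attn + mlp + norms) + hidden  # + final norm
    embeddings = vocab * hidden           # counted once because weights are tied
    return {
        "embeddings": embeddings,
        "non_embedding": non_embedding,
        "total": embeddings + non_embedding,
    }


if __name__ == "__main__":
    for name, value in llama_style_param_count().items():
        print(f"{name}: {value:,}")
```

Under these assumptions the estimate is 4,992,576 parameters in total, of which 786,432 sit in the tied embedding matrix. That is slightly above the ~4.92M stated above, which suggests the custom model departs from this layout somewhere.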

Data Mix

  • 60% HuggingFaceFW/fineweb-edu (config sample-100BT, split train)
  • 25% HuggingFaceFW/finewiki (config en, split train)
  • 15% HuggingFaceFW/fineweb (config sample-100BT, split train)
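
One way to realize such a mix is to pick the source of each training example at random with the listed probabilities. The sketch below is a minimal pure-Python version under that assumption. The card does not say whether the percentages apply per document or per token, and `mix_streams` is a hypothetical helper, not part of the training code.

```python
import random

# Sampling weights taken from the data mix above.
MIX = {
    "HuggingFaceFW/fineweb-edu": 0.60,
    "HuggingFaceFW/finewiki": 0.25,
    "HuggingFaceFW/fineweb": 0.15,
}


def mix_streams(streams, weights, seed=0):
    """Yield (source_name, example) pairs, choosing the source by weight.

    `streams` maps a source name to any iterable of examples.
    Iteration stops as soon as the chosen source is exhausted.
    """
    rng = random.Random(seed)
    names = list(streams)
    iterators = {name: iter(streams[name]) for name in names}
    probs = [weights[name] for name in names]
    while True:
        name = rng.choices(names, weights=probs, k=1)[0]
        try:
            yield name, next(iterators[name])
        except StopIteration:
            return
```

With the Hugging Face `datasets` library, `interleave_datasets` with a `probabilities` argument offers similar behavior for streaming datasets.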

Total training tokens: 131,072

Training

  • Steps: 1
  • Batch size: 128
  • LR: 0.0025
  • Warmup steps: 100
  • Scheduler: cosine
  • Dropout: 0.0
  • Torch compile: True
  • Closed Learning: True
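
The training-token total is consistent with a single optimizer step over 128 sequences of 1024 tokens each. The short sketch below makes that arithmetic explicit. It assumes that the batch size counts full-context sequences and that no gradient accumulation is used; the card states neither, and `training_tokens` is a hypothetical helper.

```python
def training_tokens(steps, batch_size, context_length):
    """Tokens seen in training, assuming every sequence fills the context."""
    return steps * batch_size * context_length


if __name__ == "__main__":
    # Steps: 1, Batch size: 128, Context length: 1024
    print(training_tokens(steps=1, batch_size=128, context_length=1024))  # 131072
```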

The inference path is a normal causal transformer. Closed Learning is used only during training.

Evaluation

Evaluation uses the full validation split of Salesforce/wikitext (config wikitext-103-raw-v1) as a neutral validation set. Perplexity and bits per byte (BPB) are computed with sliding-window evaluation, using a context length of 1024 and a stride of 512.
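
A common way to implement sliding-window evaluation is to advance a 1024-token window by 512 tokens at a time and score only the tokens that no earlier window has scored, so every token is counted once and most are predicted with at least 512 tokens of context. The sketch below generates such windows. Whether the evaluation script used here follows exactly this scheme is an assumption, and `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(n_tokens, context=1024, stride=512):
    """Yield (begin, end, n_scored) for each evaluation window.

    Tokens in [begin, end) are fed to the model; only the last
    `n_scored` of them contribute to the loss, the rest are context.
    """
    prev_end = 0
    for begin in range(0, n_tokens, stride):
        end = min(begin + context, n_tokens)
        yield begin, end, end - prev_end
        prev_end = end
        if end == n_tokens:
            break


if __name__ == "__main__":
    for window in list(sliding_windows(3000))[:3]:
        print(window)
```

A smaller stride gives each scored token more context, at the cost of more forward passes.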

  • Training tokens: 131,072
  • Final train window total loss: nan
  • Final train window LM loss: nan
  • Final train window CL loss: nan
  • WikiText validation loss: 8.3485
  • WikiText validation perplexity: 4223.78
  • WikiText validation UTF-8 BPB: 3.8196
  • WikiText validation tokens: 365,255
  • WikiText validation UTF-8 bytes: 1,151,766
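
The reported perplexity and BPB can be recovered from the validation loss and the token and byte counts. The sketch below assumes the loss is the mean negative log-likelihood in nats per token, so that perplexity is exp(loss) and BPB is the total loss in nats divided by the number of UTF-8 bytes times ln 2. The function names are hypothetical.

```python
import math


def perplexity(mean_loss_nats):
    """Perplexity from a mean per-token loss measured in nats."""
    return math.exp(mean_loss_nats)


def bits_per_byte(mean_loss_nats, n_tokens, n_bytes):
    """Total loss converted from nats to bits, spread over the UTF-8 bytes."""
    total_nats = mean_loss_nats * n_tokens
    return total_nats / (n_bytes * math.log(2))


if __name__ == "__main__":
    loss, tokens, n_bytes = 8.3485, 365_255, 1_151_766
    print(f"perplexity: {perplexity(loss):.2f}")
    print(f"BPB: {bits_per_byte(loss, tokens, n_bytes):.4f}")
```

Because the loss above is rounded to four decimals, the recomputed perplexity can differ from the reported value in the second decimal place.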

Inference

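import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "User01110/TinyLM-5M"
prompt = "Artificial intelligence is"

# trust_remote_code=True lets transformers load the custom model code from the repository.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    dtype="auto",  # use the dtype stored in the checkpoint (older transformers releases call this torch_dtype)
    device_map="auto" if torch.cuda.is_available() else None,  # stay on CPU when CUDA is unavailable
)

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 512 new tokens; the repetition penalty discourages loops.
out = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.2,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))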