TinyLM-5M

A tiny causal language model (4.92M parameters) trained with the standard next-token cross-entropy (CE) loss plus a Closed Learning attention KL-divergence term that is applied only during training.

Model

Architecture: custom HF-compatible Llama-style causal LM
Parameters: ~4.92M total, ~4.13M excluding tied token embeddings
Vocab size: 4096
Context length: 1024
Layers: 9
Hidden size: 192
Attention heads: 6
Key/value heads: 2
MLP intermediate: 640
Tokenizer source: AxiomicLabs/GPT-S-5M
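As a rough cross-check of the sizes above, the sketch below counts parameters from the listed hyperparameters. It assumes a standard Llama layout: bias-free linear projections, grouped-query attention with head_dim = 192 / 6 = 32, a gate/up/down MLP, two RMSNorm weight vectors per layer plus a final norm, and an lm_head tied to the token embeddings. `llama_param_count` is a hypothetical helper, not part of this repo. Under these assumptions the count is 4,992,576 total (4,206,144 excluding the tied embeddings), which matches the 4.99M reported for the Safetensors weights. The ~4.92M / ~4.13M figures above are about 74K lower, so either the custom implementation differs from this assumed layout in some detail or the figures were counted differently.

```python
# Hypothetical helper: parameter count for an ASSUMED standard Llama layout
# (bias-free projections, gate/up/down MLP, RMSNorm weights, tied lm_head).
def llama_param_count(vocab=4096, hidden=192, layers=9, heads=6,
                      kv_heads=2, intermediate=640):
    head_dim = hidden // heads               # 192 / 6 = 32
    kv_dim = kv_heads * head_dim             # 2 * 32 = 64 (grouped-query attention)
    attn = hidden * hidden                   # q_proj
    attn += 2 * hidden * kv_dim              # k_proj + v_proj
    attn += hidden * hidden                  # o_proj
    mlp = 3 * hidden * intermediate          # gate_proj + up_proj + down_proj
    norms = 2 * hidden                       # input + post-attention RMSNorm
    body = layers * (attn + mlp + norms) + hidden  # plus the final RMSNorm
    embed = vocab * hidden                   # shared with lm_head when tied
    return body + embed, body


total, non_embedding = llama_param_count()
print(f"total={total:,} non-embedding={non_embedding:,}")
```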

Data Mix

60%: HuggingFaceFW/fineweb-edu (config sample-100BT, split train)
25%: HuggingFaceFW/finewiki (config en, split train)
15%: HuggingFaceFW/fineweb (config sample-100BT, split train)

Total training tokens: 131,072
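The token total is consistent with a single optimizer step over full-length sequences, using the step count, batch size, and context length listed in this card. The sketch below reproduces it and splits the budget by the data-mix weights, assuming every sequence is packed to the full 1024-token context and that the percentages are token shares (if they are sampling probabilities over documents, the split is only an expectation).

```python
# Assumes every training sequence is packed to the full 1024-token context.
steps, batch_size, context_len = 1, 128, 1024
total_tokens = steps * batch_size * context_len
print(f"{total_tokens:,}")  # 131,072

# Assumes the data-mix percentages are token shares.
mix = {
    "HuggingFaceFW/fineweb-edu": 0.60,
    "HuggingFaceFW/finewiki": 0.25,
    "HuggingFaceFW/fineweb": 0.15,
}
for source, share in mix.items():
    print(f"{source}: ~{share * total_tokens:,.0f} tokens")
```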

Training

Steps: 1
Batch size: 128
LR: 0.0025
Warmup steps: 100
Scheduler: cosine
Dropout: 0.0
Torch compile: True
Closed Learning: True
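The card does not state exactly how warmup and cosine decay are combined. A minimal sketch, assuming the common linear-warmup-then-cosine form used by `transformers.get_cosine_schedule_with_warmup` (`lr_at` is a hypothetical re-implementation of that formula, not code from this repo): with 1 training step and 100 warmup steps the run never leaves warmup, so the only step taken (step index 0) would use a learning rate of 0 under this schedule rather than the 0.0025 peak.

```python
import math


def lr_at(step, peak_lr=0.0025, warmup_steps=100, total_steps=1):
    """Assumed schedule: linear warmup to peak_lr, then cosine decay to 0
    (mirrors the formula in transformers.get_cosine_schedule_with_warmup)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# With Steps: 1, only step index 0 is ever taken.
print(lr_at(0), lr_at(1), lr_at(100))
```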

The inference path is a normal causal transformer. Closed Learning is used only during training.

Evaluation

Evaluation uses the full validation split of Salesforce/wikitext (config wikitext-103-raw-v1) as a neutral validation set. Perplexity and UTF-8 bits per byte (BPB) are computed with sliding-window evaluation (context length 1024, stride 512).
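The card does not include its evaluation code. A common way to implement strided sliding-window evaluation, following the pattern in the Hugging Face Transformers perplexity guide, is sketched below: each window feeds up to 1024 tokens to the model, but only the positions not already covered by the previous window contribute to the loss, so every position is assigned to exactly one window and, after the first window, each scored token has at least 512 tokens of left context. `sliding_windows` is a hypothetical helper that only computes the spans; the model call is omitted. In a full implementation the first `(end - begin) - num_scored` labels of each window would be masked (for example set to -100, the ignore index used by Transformers models).

```python
def sliding_windows(num_tokens: int, context_len: int = 1024, stride: int = 512):
    """Return (begin, end, num_scored) spans for strided sliding-window eval.

    tokens[begin:end] is fed to the model; only the last `num_scored`
    positions of that window contribute to the loss.
    """
    windows = []
    prev_end = 0
    for begin in range(0, num_tokens, stride):
        end = min(begin + context_len, num_tokens)
        windows.append((begin, end, end - prev_end))  # score only new positions
        prev_end = end
        if end == num_tokens:
            break
    return windows


spans = sliding_windows(365_255)  # WikiText validation token count from this card
print(len(spans), spans[:2], spans[-1])
```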

Training tokens: 131,072
Final train window total loss: nan
Final train window LM loss: nan
Final train window CL loss: nan
WikiText validation loss (nats/token): 8.3485
WikiText validation perplexity: 4223.78
WikiText validation UTF-8 BPB: 3.8196
WikiText validation tokens: 365,255
WikiText validation UTF-8 bytes: 1,151,766
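The reported metrics are mutually consistent under the usual definitions: perplexity is the exponential of the mean per-token loss in nats, and UTF-8 BPB converts the total loss from nats to bits and divides by the number of UTF-8 bytes instead of tokens. The snippet recomputes both from the loss, token count, and byte count above (rounding of the 4-decimal loss explains the small gap in perplexity). For reference it also computes the loss of a uniform distribution over the 4,096-token vocabulary, ln 4096 ≈ 8.318 nats/token, which is slightly below the measured validation loss.

```python
import math

# Values reported in this card.
val_loss = 8.3485       # mean cross-entropy, nats per token
val_tokens = 365_255
val_bytes = 1_151_766
vocab_size = 4096

perplexity = math.exp(val_loss)
# Total nats over all scored tokens, converted to bits, divided by UTF-8 bytes.
bpb = val_loss * val_tokens / (val_bytes * math.log(2))
# Loss of a model that assigns equal probability to every vocabulary entry.
uniform_loss = math.log(vocab_size)

print(f"perplexity ~ {perplexity:.1f}")
print(f"BPB ~ {bpb:.4f}")
print(f"uniform baseline = {uniform_loss:.4f} nats/token")
```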

Inference

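import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "User01110/TinyLM-5M"
prompt = "Artificial intelligence is"

# trust_remote_code=True is needed because the model uses a custom architecture.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    dtype="auto",  # use the checkpoint's dtype; older transformers versions name this torch_dtype
    device_map="auto" if torch.cuda.is_available() else None,  # "auto" requires the accelerate package
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,          # sample instead of greedy decoding
    temperature=0.8,         # values below 1.0 sharpen the next-token distribution
    repetition_penalty=1.2,  # down-weights tokens that have already appeared
)
# out[0] holds the prompt tokens followed by the generated continuation.
print(tokenizer.decode(out[0], skip_special_tokens=True))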

Safetensors

Model size: 4.99M params
Tensor type: F32