language: en
license: mit
tags:
  - pretrained
  - causal-lm
  - fineweb-edu
  - custom-architecture

tiny-edu-166m (ParchmentLM)

A 166M-parameter decoder-only transformer pretrained from scratch on roughly 4B tokens of the FineWeb-Edu 10BT sample.

Architecture (ParchmentLM)

Custom decoder-only transformer:

Parameters: 166M
Layers: 12
Hidden size: 768
Attention heads: 12
FFN: SwiGLU (hidden=2048)
Context length: 1024
Positional encoding: RoPE (base=10000)
Normalization: RMSNorm
Tokenizer: cl100k_base (100277 tokens)
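
As a sanity check on the figures above, the parameter count can be estimated by hand. This is a rough sketch, not the model's own code: it assumes tied input/output embeddings, bias-free linear layers, and two RMSNorm weight vectors per layer plus a final norm, none of which is stated in this card.

```python
# Rough parameter estimate from the architecture figures listed in this card.
# Assumptions (NOT stated in the card): tied input/output embeddings,
# bias-free linear layers, two RMSNorm weights per layer plus one final norm.
vocab_size = 100_277
d_model = 768
n_layers = 12
ffn_hidden = 2048

embedding = vocab_size * d_model      # token embedding, assumed shared with the output head
attention = 4 * d_model * d_model     # Q, K, V and output projections
swiglu = 3 * d_model * ffn_hidden     # gate, up and down projections
norms = 2 * d_model                   # two RMSNorm weight vectors per layer

per_layer = attention + swiglu + norms
total = embedding + n_layers * per_layer + d_model  # + final RMSNorm

print(f"embedding: {embedding / 1e6:.1f}M")
print(f"per layer: {per_layer / 1e6:.2f}M")
print(f"total:     {total / 1e6:.1f}M")
```

Under these assumptions the estimate comes to about 162M, in the same range as the stated 166M. The exact count depends on details this card does not specify, such as biases, weight tying, or vocabulary padding.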

Training

Dataset: FineWeb-Edu 10BT sample
Tokens seen: ~4B
Steps: 30,000
Optimizer: AdamW (lr=3e-4, cosine decay to 3e-5)
Hardware: Single A100 80GB
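
The learning-rate schedule and the implied batch size can be sketched from the numbers above. This is a minimal illustration, not the actual training script: it assumes no warmup phase and a decay that spans all 30,000 steps, and the names `cosine_lr`, `PEAK_LR`, `MIN_LR`, and `TOTAL_STEPS` are hypothetical.

```python
import math

PEAK_LR = 3e-4        # initial learning rate from this card
MIN_LR = 3e-5         # final learning rate from this card
TOTAL_STEPS = 30_000  # training steps from this card


def cosine_lr(step: int) -> float:
    """Cosine decay from PEAK_LR to MIN_LR (assumption: no warmup)."""
    progress = min(step, TOTAL_STEPS) / TOTAL_STEPS
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


# Batch size implied by ~4B tokens over 30,000 steps at context length 1024.
tokens_seen = 4e9
context_length = 1024
tokens_per_step = tokens_seen / TOTAL_STEPS
sequences_per_step = tokens_per_step / context_length

for step in (0, 15_000, 30_000):
    print(f"step {step:>6}: lr = {cosine_lr(step):.2e}")
print(f"tokens per step:    {tokens_per_step:,.0f}")
print(f"sequences per step: {sequences_per_step:.1f}")
```

Since the token count is approximate, the derived batch size of roughly 130 sequences (about 133k tokens) per step is only an estimate.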

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True allows transformers to load the custom ParchmentLM
# code from the model repository; review that code before enabling it.
tokenizer = AutoTokenizer.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)
model     = AutoModelForCausalLM.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)

# Sample up to 100 new tokens from a short prompt.
inputs = tokenizer("The history of mathematics", return_tensors="pt")
out    = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))

License

Model weights: MIT. Training data: ODC-By 1.0.