---
language: en
license: mit
tags:
- pretrained
- causal-lm
- fineweb-edu
- custom-architecture
---

# tiny-edu-166m (ParchmentLM)

A 166M-parameter decoder-only transformer pretrained from scratch on roughly 4B tokens of [FineWeb-Edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).

## Architecture (ParchmentLM)

Custom decoder-only transformer:
- **Parameters:** 166M
- **Layers:** 12
- **Hidden size:** 768
- **Attention heads:** 12
- **FFN:** SwiGLU (hidden=2048)
- **Context length:** 1024
- **Positional encoding:** RoPE (base=10000)
- **Normalization:** RMSNorm
- **Tokenizer:** cl100k_base (100277 tokens)
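
As a rough sanity check on the figures above, the parameter count can be estimated from the listed dimensions. This is only a sketch under assumptions the card does not confirm: no bias terms, two RMSNorm weight vectors per layer plus a final one, and input/output embeddings that share weights. The function name `estimate_params` and its `tied_embeddings` flag are illustrative, not part of the model code.

```python
def estimate_params(vocab=100_277, d_model=768, n_layers=12, d_ff=2048,
                    tied_embeddings=True):
    """Approximate parameter count for a bias-free decoder with SwiGLU and RMSNorm."""
    embed = vocab * d_model                 # token embedding table
    attn = 4 * d_model * d_model            # Q, K, V and output projections
    ffn = 3 * d_model * d_ff                # SwiGLU gate, up and down projections
    norms = 2 * d_model                     # assumed: two RMSNorm weights per layer
    per_layer = attn + ffn + norms
    total = embed + n_layers * per_layer + d_model  # + assumed final RMSNorm
    if not tied_embeddings:
        total += vocab * d_model            # separate output head (assumption)
    return total


print(f"tied:   {estimate_params() / 1e6:.1f}M")
print(f"untied: {estimate_params(tied_embeddings=False) / 1e6:.1f}M")
```

Under these assumptions the tied-embedding estimate comes to about 162M, close to the stated 166M, while untied embeddings would give about 239M. The remaining difference would come from details the card does not list.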

## Training

- **Dataset:** FineWeb-Edu 10BT sample
- **Tokens seen:** ~4B
- **Steps:** 30,000
- **Optimizer:** AdamW (lr=3e-4, cosine decay to 3e-5)
- **Hardware:** Single A100 80GB
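
The learning-rate schedule listed above can be sketched as a plain function of the step. This is a minimal illustration, not the training script: the card does not say whether warmup was used, so `warmup_steps` is a hypothetical parameter that defaults to zero, and the decay is assumed to span all 30,000 steps.

```python
import math

PEAK_LR, MIN_LR, TOTAL_STEPS = 3e-4, 3e-5, 30_000


def cosine_lr(step, warmup_steps=0):
    """Learning rate at `step`. warmup_steps is hypothetical; the card lists none."""
    if warmup_steps and step < warmup_steps:
        # Linear warmup up to the peak learning rate
        return PEAK_LR * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    # Half-cosine from PEAK_LR down to MIN_LR
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


for step in (0, 15_000, 30_000):
    print(step, f"{cosine_lr(step):.2e}")

# Implied batch size, assuming every sequence fills the 1024-token context
sequences_per_step = 4e9 / TOTAL_STEPS / 1024
print(f"~{sequences_per_step:.0f} sequences per step")
```

Taking the approximate 4B-token total at face value, 30,000 steps imply roughly 133K tokens per step, or about 130 full-length sequences. The actual batch configuration is not stated in the card.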

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True lets transformers run the custom ParchmentLM code shipped in the model repo.
tokenizer = AutoTokenizer.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)

# Sample up to 100 new tokens from a short prompt.
inputs = tokenizer("The history of mathematics", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## License

Model weights: MIT. Training data: ODC-By 1.0.