---
library_name: transformers
tags:
- tiny
- from-scratch
- instruction-tuned
- causal-lm
- parchmentlm
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
- Cleanlab/databricks-dolly-15k-cleaned
- ProCreations/SimpleMath
language:
- en
base_model:
- SlitherCode/tiny-edu-166m
---
# ParchmentLM 166M Instruct
A 166M-parameter instruction-tuned language model trained entirely from scratch (custom architecture, real pretraining data, and a full SFT pipeline) for under $55 in cloud compute.
This is a proof-of-concept demonstrating the full LLM development pipeline: architecture design, pretraining on real web data, supervised fine-tuning, and deployment. It is not intended for production use.
## Model Details
- **Developed by:** Pranay Narula (SlitherCode)
- **Model type:** ParchmentLM, a custom decoder-only transformer architecture
- **Language:** English
- **License:** MIT
- **Base model:** [SlitherCode/tiny-edu-166m](https://huggingface.co/SlitherCode/tiny-edu-166m) (pretrained from scratch)
### Architecture
ParchmentLM is a custom LLaMA-style architecture with the following components:
| Component | Details |
|---|---|
| Parameters | ~166M |
| Layers | 12 |
| Attention heads | 12 |
| Hidden size | 768 |
| FFN size | 2048 |
| Context length | 1024 tokens |
| Positional encoding | RoPE |
| Normalization | RMSNorm (pre-norm) |
| Activation | SwiGLU |
| Attention | FlashAttention (via `scaled_dot_product_attention`) |
| Tokenizer | tiktoken cl100k_base (vocab size 100,277) |
| Weight tying | Yes (input embeddings = output projection) |
### Chat Template (ParchmentLM format)
```
system
You are a helpful assistant<|endoftext|>
user
{user message}<|endoftext|>
assistant
{assistant response}<|endoftext|>
```
`<|endoftext|>` (token ID 100257) serves as both the turn separator and stop token.
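As a minimal sketch of the layout above, the helper below renders a list of role/content messages into the ParchmentLM format. The function name `format_parchment_chat` is hypothetical, and the exact whitespace (a newline after each role name and after each `<|endoftext|>`) is an assumption read off the template; the tokenizer's own `apply_chat_template` remains the reference.

```python
EOT = "<|endoftext|>"  # turn separator and stop token (per the template above)


def format_parchment_chat(messages, add_generation_prompt=True):
    """Render each message as: role, newline, content, EOT, newline.

    Hypothetical helper: the whitespace between turns is an assumption.
    """
    parts = [f"{m['role']}\n{m['content']}{EOT}\n" for m in messages]
    if add_generation_prompt:
        # Open the assistant turn so the model writes the reply next.
        parts.append("assistant\n")
    return "".join(parts)


prompt = format_parchment_chat([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is the capital of France?"},
])
print(prompt)
```

Because every turn ends with `<|endoftext|>`, generation can stop at the first occurrence of that token after the opened `assistant` turn.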
## Training
### Stage 1: Pretraining
- **Dataset:** FineWeb-Edu 10BT sample (HuggingFaceFW/fineweb-edu)
- **Tokens trained on:** ~4B
- **Infrastructure:** Modal, single A100-40GB
- **Throughput:** ~75,000 tokens/sec
- **Duration:** ~14.8 hours
- **Cost:** ~$46
- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
- **Learning rate:** 3e-4 with cosine decay to 3e-5, 2000-step warmup
- **Batch size:** 16 × 8 grad accum × 1024 seq len ≈ 131k tokens/step
- **Precision:** bfloat16
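The figures above can be cross-checked with a few lines of arithmetic, and the learning-rate schedule can be sketched as a function of the step. This is an illustration, not the training script: the linear shape of the warmup and the assumption that the cosine decay spans the whole ~4B-token run are mine, and `lr_at` is a hypothetical name.

```python
import math

# Figures taken from the list above.
MICRO_BATCH, GRAD_ACCUM, SEQ_LEN = 16, 8, 1024
PEAK_LR, MIN_LR, WARMUP_STEPS = 3e-4, 3e-5, 2000
TOTAL_TOKENS = 4e9  # approximate

tokens_per_step = MICRO_BATCH * GRAD_ACCUM * SEQ_LEN  # 131,072
# Assumption: the schedule covers the full run.
total_steps = round(TOTAL_TOKENS / tokens_per_step)
# Throughput x duration should land near the stated token count.
tokens_from_throughput = 75_000 * 14.8 * 3600


def lr_at(step):
    """Linear warmup (assumed), then cosine decay from PEAK_LR to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


print(tokens_per_step, total_steps, f"{tokens_from_throughput:.3e}")
for s in (0, WARMUP_STEPS, total_steps // 2, total_steps):
    print(s, f"{lr_at(s):.2e}")
```

Under these assumptions the run is roughly 30.5k optimizer steps, and throughput times duration comes out just under 4B tokens, consistent with the stated total.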
### Stage 2: Supervised Fine-Tuning
- **Datasets:**
  - [Cleanlab/databricks-dolly-15k-cleaned](https://huggingface.co/datasets/Cleanlab/databricks-dolly-15k-cleaned): filtered to the `closed_qa`, `open_qa`, and `information_extraction` categories (~7k examples)
  - [ProCreations/SimpleMath](https://huggingface.co/datasets/ProCreations/SimpleMath): 2,500 examples per operation (+, -, *, /), balanced, 10k total
- **Total SFT examples:** ~17k
- **Loss:** Completion-only (prompt and padding tokens masked to -100)
- **Pad token:** `<|endofprompt|>` (token ID 83285) to preserve EOT as a learnable stop signal
- **Epochs:** 8
- **Learning rate:** 1e-4 cosine decay
- **Batch size:** 16 × 2 grad accum
- **Duration:** ~38 minutes
- **Cost:** ~$1.50
- **Infrastructure:** Modal, single A100-40GB
- **Precision:** bfloat16
**Total training cost: ~$55, including multiple SFT iterations**
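The completion-only loss from Stage 2 can be illustrated with plain Python lists. This is a sketch under stated assumptions rather than the actual SFT code: `build_labels` is a hypothetical helper, right-padding is assumed, and the prompt/completion token IDs are toy values. The two special-token IDs are the ones stated in this card.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy ignores by default


def build_labels(prompt_ids, completion_ids, pad_id, max_len):
    """Return (input_ids, labels) with prompt and padding positions masked.

    Hypothetical helper; right-padding to max_len is an assumption.
    """
    input_ids = (prompt_ids + completion_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids)[:max_len]
    n_pad = max_len - len(input_ids)
    input_ids = input_ids + [pad_id] * n_pad
    labels = labels + [IGNORE_INDEX] * n_pad
    return input_ids, labels


EOT_ID, PAD_ID = 100257, 83285  # IDs as stated in this card
# Toy token IDs for illustration only.
ids, labels = build_labels([11, 12, 13], [21, 22, EOT_ID], PAD_ID, max_len=8)
print(ids)     # prompt + completion + padding
print(labels)  # only completion tokens (including EOT) carry a loss
```

Using a pad token distinct from `<|endoftext|>` means the final EOT keeps its label, so the model still receives a training signal for when to stop.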
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# The tokenizer is loaded from the base model repo; the weights come from the instruct repo.
tokenizer = AutoTokenizer.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)
tokenizer.pad_token = "<|endofprompt|>"
model = AutoModelForCausalLM.from_pretrained("SlitherCode/tiny-edu-166M-instruct", trust_remote_code=True)
model.eval()

PAD_ID = tokenizer.convert_tokens_to_ids("<|endofprompt|>")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
# Render the ParchmentLM chat format and open the assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
input_len = inputs["input_ids"].shape[1]

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False,  # greedy decoding
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=PAD_ID,
    )

# Decode only the newly generated tokens and cut at the first stop token.
raw = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=False)
response = raw.split("<|endoftext|>")[0].strip()
print(response)
# The capital of France is Paris.
```
**Note:** For arithmetic, use the format `"47 + 83 ="` rather than `"What is 47 + 83?"` to match the training distribution.
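One way to apply this note is to normalize arithmetic questions before building the chat messages. The sketch below is a hypothetical convenience, not part of the model's code: `to_arithmetic_prompt` is an invented name, and it only handles a single binary operation on non-negative integers.

```python
import re

# Matches "<int> <op> <int>" anywhere in the text (integers only, by assumption).
_EXPR = re.compile(r"(\d+)\s*([+\-*/])\s*(\d+)")


def to_arithmetic_prompt(text):
    """Rewrite e.g. 'What is 47 + 83?' as '47 + 83 ='.

    Returns the text unchanged when no arithmetic expression is found.
    """
    m = _EXPR.search(text)
    if m is None:
        return text
    a, op, b = m.groups()
    return f"{a} {op} {b} ="


print(to_arithmetic_prompt("What is 47 + 83?"))
```

The returned string can be used as the `content` of the user message in the usage example.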
## Evaluation
Informal evaluation on held-out questions:
| Question | Response | Correct? |
|---|---|---|
| What is the capital of France? | The capital of France is Paris. | ✅ |
| What is the capital of Germany? | The capital of Germany is Berlin. | ✅ |
| Who wrote Romeo and Juliet? | Romeo and Juliet was written by William Shakespeare. | ✅ |
| 12 + 5 = | 17 | ✅ |
| 900 - 345 = | 700 | ❌ (correct: 555, off by 145) |
| 2790 + 6698 = | 9648 | ❌ (correct: 9488) |
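The arithmetic rows can be re-checked mechanically. The snippet below is a small illustrative scorer (the `score` helper is hypothetical, and the prompt/response pairs are copied from the table rather than regenerated from the model):

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# (prompt, model response) pairs copied from the table above.
cases = [("12 + 5 =", "17"), ("900 - 345 =", "700"), ("2790 + 6698 =", "9648")]


def score(prompt, response):
    """Return (expected, is_correct, abs_error) for an 'a op b =' prompt."""
    a, op, b, _ = prompt.split()
    expected = OPS[op](int(a), int(b))
    got = int(response)
    return expected, got == expected, abs(got - expected)


results = [score(p, r) for p, r in cases]
for (p, r), (exp, ok, err) in zip(cases, results):
    print(f"{p} {r}  expected={exp} correct={ok} abs_error={err}")
```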
**Limitations:**
- Reliable arithmetic only up to ~2-3 digit operands
- Tends to hallucinate on out-of-distribution factual questions
- No safety filtering or alignment
- Will not stop gracefully on prompts with no clear answer (creative writing, open-ended tasks)
- Undertrained relative to model capacity: 4B tokens vs. the ~300B tokens that models of this size typically see
## Compute & Environmental Impact
- **Hardware:** NVIDIA A100-40GB (via Modal)
- **Cloud provider:** Modal (AWS us-east-1 region)
- **Total GPU hours:** ~15.5 hours
- **Total cost:** ~$55 USD
## Citation
If you use this model or find this project useful, a link back to the repository is appreciated.
```
@misc{narula2025parchmentlm,
author = {Pranay Narula},
title = {ParchmentLM 166M Instruct: Full LLM Pipeline From Scratch},
year = {2025},
url = {https://huggingface.co/SlitherCode/tiny-edu-166M-instruct}
}
```