---
library_name: transformers
tags:
- tiny
- from-scratch
- instruction-tuned
- causal-lm
- parchmentlm
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
- Cleanlab/databricks-dolly-15k-cleaned
- ProCreations/SimpleMath
language:
- en
base_model:
- SlitherCode/tiny-edu-166m
---

# ParchmentLM 166M Instruct

A 166M-parameter instruction-tuned language model trained entirely from scratch (custom architecture, real pretraining data, and a full SFT pipeline) for under $55 in cloud compute.

This is a proof-of-concept demonstrating the full LLM development pipeline: architecture design, pretraining on real web data, supervised fine-tuning, and deployment. It is not intended for production use.

## Model Details

- **Developed by:** Pranay Narula (SlitherCode)
- **Model type:** ParchmentLM, a custom decoder-only transformer architecture
- **Language:** English
- **License:** MIT
- **Base model:** [SlitherCode/tiny-edu-166m](https://huggingface.co/SlitherCode/tiny-edu-166m) (pretrained from scratch)

### Architecture

ParchmentLM is a custom LLaMA-style architecture with the following components:

| Component | Details |
|---|---|
| Parameters | ~166M |
| Layers | 12 |
| Attention heads | 12 |
| Hidden size | 768 |
| FFN size | 2048 |
| Context length | 1024 tokens |
| Positional encoding | RoPE |
| Normalization | RMSNorm (pre-norm) |
| Activation | SwiGLU |
| Attention | PyTorch `scaled_dot_product_attention` (can dispatch to FlashAttention kernels) |
| Tokenizer | tiktoken cl100k_base (vocab size 100,277) |
| Weight tying | Yes (input embeddings = output projection) |

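
As a rough cross-check of the table, the parameter count can be estimated from the listed dimensions. The sketch below assumes no bias terms, standard multi-head attention with four 768 × 768 projections, and a three-matrix SwiGLU feed-forward block. None of these details are confirmed by this card, and `estimate_parameters` is a hypothetical helper, not part of the model code.

```python
# Rough parameter estimate from the architecture table.
# Assumptions (not stated in the card): no biases, full multi-head attention
# with Q/K/V/output projections, and a gate/up/down SwiGLU feed-forward block.

VOCAB_SIZE = 100_277
HIDDEN = 768
FFN = 2048
LAYERS = 12


def estimate_parameters(vocab=VOCAB_SIZE, hidden=HIDDEN, ffn=FFN, layers=LAYERS):
    embedding = vocab * hidden       # counted once because of weight tying
    attention = 4 * hidden * hidden  # Q, K, V and output projections
    swiglu = 3 * hidden * ffn        # gate, up and down projections
    norms = 2 * hidden               # two RMSNorm weight vectors per layer
    per_layer = attention + swiglu + norms
    final_norm = hidden              # RMSNorm before the output projection
    return embedding + layers * per_layer + final_norm


total = estimate_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate comes to about 162M, slightly below the stated ~166M, so the actual implementation presumably differs in some detail the table does not list. The tied embedding table alone accounts for roughly 77M of the total.
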
### Chat Template (ParchmentLM format)

```
system
You are a helpful assistant<|endoftext|>
user
{user message}<|endoftext|>
assistant
{assistant response}<|endoftext|>
```

`<|endoftext|>` (token ID 100257) serves as both the turn separator and stop token.

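
The format above can be reproduced by hand when the tokenizer's built-in template is not available. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the exact whitespace (one newline after the role name and one after each `<|endoftext|>`) is an assumption read off the template as displayed. The chat template shipped with the tokenizer remains the reference.

```python
EOT = "<|endoftext|>"  # turn separator and stop token, per the card


def build_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts in the ParchmentLM format.

    Whitespace handling is an assumption based on the template shown above.
    """
    parts = []
    for message in messages:
        parts.append(f"{message['role']}\n{message['content']}{EOT}\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the response.
        parts.append("assistant\n")
    return "".join(parts)


prompt = build_prompt([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is the capital of France?"},
])
print(prompt)
```
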
## Training

### Stage 1: Pretraining

- **Dataset:** FineWeb-Edu 10BT sample (HuggingFaceFW/fineweb-edu)
- **Tokens trained on:** ~4B
- **Infrastructure:** Modal, single A100-40GB
- **Throughput:** ~75,000 tokens/sec
- **Duration:** ~14.8 hours
- **Cost:** ~$46
- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
- **Learning rate:** 3e-4 with cosine decay to 3e-5, 2,000-step warmup
- **Batch size:** 16 × 8 grad accum × 1024 seq len ≈ 131k tokens/step
- **Precision:** bfloat16

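
The figures in this list are mutually consistent, which a few lines of arithmetic can confirm. The sketch below also shows one common way to implement the stated schedule. The linear warmup shape and the total step count (derived here from ~4B tokens) are assumptions, and `learning_rate` is a hypothetical helper, not the training script's own function.

```python
import math

BATCH, GRAD_ACCUM, SEQ_LEN = 16, 8, 1024
tokens_per_step = BATCH * GRAD_ACCUM * SEQ_LEN  # 131,072 tokens per optimizer step

# Throughput x duration should land near the stated ~4B tokens.
tokens_from_throughput = 75_000 * 14.8 * 3600   # about 3.996e9

# Assumed total number of optimizer steps, derived from the token budget.
total_steps = round(4e9 / tokens_per_step)      # 30,518


def learning_rate(step, max_lr=3e-4, min_lr=3e-5, warmup=2000, total=total_steps):
    """Linear warmup (assumed shape) followed by cosine decay to min_lr."""
    if step < warmup:
        return max_lr * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


print(f"tokens/step: {tokens_per_step:,}")
print(f"tokens from throughput: {tokens_from_throughput:.3e}")
print(f"estimated steps: {total_steps:,}")
for step in (0, 1999, 2000, total_steps // 2, total_steps):
    print(step, f"{learning_rate(step):.2e}")
```
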
### Stage 2: Supervised Fine-Tuning

- **Datasets:**
  - [Cleanlab/databricks-dolly-15k-cleaned](https://huggingface.co/datasets/Cleanlab/databricks-dolly-15k-cleaned): filtered to the `closed_qa`, `open_qa`, and `information_extraction` categories (~7k examples)
  - [ProCreations/SimpleMath](https://huggingface.co/datasets/ProCreations/SimpleMath): balanced at 2,500 examples per operation (+, -, *, /), 10k total
- **Total SFT examples:** ~17k
- **Loss:** Completion-only (prompt and padding tokens masked to -100)
- **Pad token:** `<|endofprompt|>` (token ID 83285), used instead of `<|endoftext|>` so that EOT is never masked as padding and remains a learnable stop signal
- **Epochs:** 8
- **Learning rate:** 1e-4 with cosine decay
- **Batch size:** 16 × 2 grad accum
- **Duration:** ~38 minutes
- **Cost:** ~$1.50
- **Infrastructure:** Modal, single A100-40GB
- **Precision:** bfloat16

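
To make the completion-only loss concrete, here is a minimal sketch of how labels might be built for one example. `build_labels` is a hypothetical helper and the token IDs in the usage lines are toy values. Only the two special-token IDs and the -100 mask value come from this card. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so masked positions contribute nothing to the gradient.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss
PAD_ID = 83285       # <|endofprompt|>, as stated in this card
EOT_ID = 100257      # <|endoftext|>, as stated in this card


def build_labels(prompt_ids, completion_ids, max_len):
    """Return (input_ids, labels) padded to max_len.

    The prompt and the padding are masked; the completion and its closing
    EOT token are kept, so the model is trained to emit the stop token.
    """
    input_ids = (prompt_ids + completion_ids + [EOT_ID])[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + completion_ids + [EOT_ID])[:max_len]
    pad = max_len - len(input_ids)
    return input_ids + [PAD_ID] * pad, labels + [IGNORE_INDEX] * pad


# Toy token IDs, for illustration only.
input_ids, labels = build_labels([11, 12, 13], [21, 22], max_len=8)
print(input_ids)
print(labels)
```

Padding with a token other than EOT matters here: if `<|endoftext|>` doubled as the pad token, masking the padding would risk masking the real stop token as well.
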
**Total training cost: ~$55, including multiple SFT iterations**

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Repository IDs as given in this card; the tokenizer is loaded from the base model repo.
tokenizer = AutoTokenizer.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)
# Pad with <|endofprompt|> so that <|endoftext|> stays reserved as the stop token.
tokenizer.pad_token = "<|endofprompt|>"
PAD_ID = tokenizer.convert_tokens_to_ids("<|endofprompt|>")

model = AutoModelForCausalLM.from_pretrained("SlitherCode/tiny-edu-166M-instruct", trust_remote_code=True)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
# Render the ParchmentLM chat format and open the assistant turn.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
input_len = inputs["input_ids"].shape[1]

# Greedy decoding with a mild repetition penalty.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False,
        repetition_penalty=1.1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=PAD_ID,
    )

# Decode only the newly generated tokens and cut at the first stop token.
raw = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=False)
response = raw.split("<|endoftext|>")[0].strip()
print(response)
# The capital of France is Paris.
```

**Note:** For arithmetic, use the format `"47 + 83 ="` rather than `"What is 47 + 83?"` to match the training distribution.

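
If questions arrive in natural language, they can be rewritten into the bare `a op b =` form before being sent to the model. The helper below is a hypothetical sketch and not part of the model's code. It handles only a single binary operation on integers and returns any other input unchanged.

```python
import re

# One integer, one of + - * /, then another integer, with optional spaces.
_ARITHMETIC = re.compile(r"(-?\d+)\s*([+\-*/])\s*(-?\d+)")


def to_arithmetic_prompt(question):
    """Rewrite e.g. 'What is 47 + 83?' as '47 + 83 ='; pass other text through."""
    match = _ARITHMETIC.search(question)
    if match is None:
        return question
    left, op, right = match.groups()
    return f"{left} {op} {right} ="


print(to_arithmetic_prompt("What is 47 + 83?"))
print(to_arithmetic_prompt("What is the capital of France?"))
```
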
## Evaluation

Informal evaluation on held-out questions:

| Question | Response | Correct? |
|---|---|---|
| What is the capital of France? | The capital of France is Paris. | Yes |
| What is the capital of Germany? | The capital of Germany is Berlin. | Yes |
| Who wrote Romeo and Juliet? | Romeo and Juliet was written by William Shakespeare. | Yes |
| 12 + 5 = | 17 | Yes |
| 900 - 345 = | 700 | No (correct: 555, off by 145) |
| 2790 + 6698 = | 9648 | No (correct: 9488) |

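
The arithmetic rows can be checked mechanically. This is a small sketch of such a check, using the prompts and responses from the table above. `score` is a hypothetical helper that assumes the `a op b =` prompt format and an integer response, and it omits division to stay in integer arithmetic.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def score(prompt, response):
    """Return (expected, error) for a prompt like '900 - 345 =' and a numeric response."""
    left, op, right, _equals = prompt.split()
    expected = OPS[op](int(left), int(right))
    return expected, int(response) - expected


rows = [("12 + 5 =", "17"), ("900 - 345 =", "700"), ("2790 + 6698 =", "9648")]
for prompt, response in rows:
    expected, error = score(prompt, response)
    status = "correct" if error == 0 else f"wrong by {error:+d}"
    print(f"{prompt} {response} (expected {expected}, {status})")
```
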
**Limitations:**

- Arithmetic is reliable only up to roughly 2-3 digit operands
- Tends to hallucinate on out-of-distribution factual questions
- No safety filtering or alignment
- Does not stop gracefully on prompts with no clear answer (creative writing, open-ended tasks)
- Undertrained relative to model capacity: 4B tokens versus the ~300B tokens that models of this size typically see

## Compute & Environmental Impact

- **Hardware:** NVIDIA A100-40GB (via Modal)
- **Cloud provider:** Modal (AWS us-east-1 region)
- **Total GPU hours:** ~15.5 hours
- **Total cost:** ~$55 USD

## Citation

If you use this model or find this project useful, a link back to the repository is appreciated.

```
@misc{narula2025parchmentlm,
  author = {Pranay Narula},
  title = {ParchmentLM 166M Instruct: Full LLM Pipeline From Scratch},
  year = {2025},
  url = {https://huggingface.co/SlitherCode/tiny-edu-166M-instruct}
}
```