SlitherCode committed 7 days ago

Commit ea2d4a7 (verified) · 1 parent: 5259975

Update README.md

Files changed (1)

README.md +166 -187

@@ -1,199 +1,178 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
 ## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+tags:
+- tiny
+- from-scratch
+- instruction-tuned
+- causal-lm
+- parchmentlm
+license: mit
+datasets:
+- HuggingFaceFW/fineweb-edu
+- Cleanlab/databricks-dolly-15k-cleaned
+- ProCreations/SimpleMath
+language:
+- en
+base_model:
+- SlitherCode/tiny-edu-166m
 ---
+# ParchmentLM 166M Instruct
+A 166M parameter instruction-tuned language model trained entirely from scratch — custom architecture, real pretraining data, and full SFT pipeline — for under $55 in cloud compute.
+This is a proof-of-concept demonstrating the full LLM development pipeline: architecture design, pretraining on real web data, supervised fine-tuning, and deployment. It is not intended for production use.
 ## Model Details
+- **Developed by:** Pranay Narula (SlitherCode)
+- **Model type:** ParchmentLM — a custom decoder-only transformer architecture
+- **Language:** English
+- **License:** MIT
+- **Base model:** [SlitherCode/tiny-edu-166m](https://huggingface.co/SlitherCode/tiny-edu-166m) (pretrained from scratch)
+### Architecture
+ParchmentLM is a custom LLaMA-style architecture with the following components:
+| Component | Details |
+|---|---|
+| Parameters | ~166M |
+| Layers | 12 |
+| Attention heads | 12 |
+| Hidden size | 768 |
+| FFN size | 3072 |
+| Context length | 1024 tokens |
+| Positional encoding | RoPE |
+| Normalization | RMSNorm (pre-norm) |
+| Activation | SwiGLU |
+| Attention | FlashAttention (via `scaled_dot_product_attention`) |
+| Tokenizer | tiktoken cl100k_base (vocab size 100,277) |
+| Weight tying | Yes (input embeddings = output projection) |
+### Chat Template (ParchmentLM format)
+```
+system
+You are a helpful assistant<|endoftext|>
+user
+{user message}<|endoftext|>
+assistant
+{assistant response}<|endoftext|>
+```
+`<|endoftext|>` (token ID 100257) serves as both the turn separator and stop token.
+## Training
+### Stage 1 — Pretraining
+- **Dataset:** FineWeb-Edu 10BT sample (HuggingFaceFW/fineweb-edu)
+- **Tokens trained on:** ~4B
+- **Infrastructure:** Modal, single A100-40GB
+- **Throughput:** ~75,000 tokens/sec
+- **Duration:** ~14.8 hours
+- **Cost:** ~$46
+- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
+- **Learning rate:** 3e-4 with cosine decay to 3e-5, 2000 step warmup
+- **Batch size:** 16 × 8 grad accum × 1024 seq len ≈ 131k tokens/step
+- **Precision:** bfloat16
+### Stage 2 — Supervised Fine-Tuning
+- **Datasets:**
+  - [Cleanlab/databricks-dolly-15k-cleaned](https://huggingface.co/datasets/Cleanlab/databricks-dolly-15k-cleaned): filtered to the `closed_qa`, `open_qa`, and `information_extraction` categories (~7k examples)
+  - [ProCreations/SimpleMath](https://huggingface.co/datasets/ProCreations/SimpleMath): balanced at 2,500 examples per operation (+, -, *, /), 10k total
+- **Total SFT examples:** ~17k
+- **Loss:** Completion-only (prompt and padding tokens masked to -100)
+- **Pad token:** `<|endofprompt|>` (token ID 83285) to preserve EOT as a learnable stop signal
+- **Epochs:** 8
+- **Learning rate:** 1e-4 cosine decay
+- **Batch size:** 16 × 2 grad accum
+- **Duration:** ~38 minutes
+- **Cost:** ~$1.50
+- **Infrastructure:** Modal, single A100-40GB
+- **Precision:** bfloat16
+**Total training cost: ~$55, including multiple SFT iterations.** The pretraining run (~$46) plus a single SFT run (~$1.50) comes to about $47.50.
+## Usage
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# The tokenizer is loaded from the base model repo; the pad token matches the one used during SFT.
+tokenizer = AutoTokenizer.from_pretrained("SlitherCode/tiny-edu-166m", trust_remote_code=True)
+tokenizer.pad_token = "<|endofprompt|>"
+model = AutoModelForCausalLM.from_pretrained("SlitherCode/tiny-edu-166M-instruct", trust_remote_code=True)
+model.eval()
+PAD_ID = tokenizer.convert_tokens_to_ids("<|endofprompt|>")
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "What is the capital of France?"},
+]
+# Render the conversation in the ParchmentLM chat format and leave the assistant turn open.
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors="pt")
+# Remember the prompt length so that only newly generated tokens are decoded.
+input_len = inputs["input_ids"].shape[1]
+with torch.no_grad():
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=100,
+        do_sample=False,
+        repetition_penalty=1.1,
+        eos_token_id=tokenizer.eos_token_id,
+        pad_token_id=PAD_ID,
+    )
+raw = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=False)
+response = raw.split("<|endoftext|>")[0].strip()
+print(response)
+# The capital of France is Paris.
+```
+**Note:** For arithmetic, use the format `"47 + 83 ="` rather than `"What is 47 + 83?"` to match the training distribution.
 ## Evaluation
+Informal evaluation on held-out questions:
+| Question | Response | Correct? |
+|---|---|---|
+| What is the capital of France? | The capital of France is Paris. | ✓ |
+| What is the capital of Germany? | The capital of Germany is Berlin. | ✓ |
+| Who wrote Romeo and Juliet? | Romeo and Juliet was written by William Shakespeare. | ✓ |
+| 12 + 5 = | 17 | ✓ |
+| 900 - 345 = | 700 | ✗ (correct: 555) |
+| 2790 + 6698 = | 9648 | ✗ (correct: 9488) |
+**Limitations:**
+- Reliable arithmetic only up to ~2-3 digit operands
+- Tends to hallucinate on out-of-distribution factual questions
+- No safety filtering or alignment
+- Will not stop gracefully on prompts with no clear answer (creative writing, open-ended tasks)
+- Undertrained relative to model capacity: ~4B training tokens versus the ~300B tokens that models of this size typically see
+## Compute & Environmental Impact
+- **Hardware:** NVIDIA A100-40GB (via Modal)
+- **Cloud provider:** Modal (AWS us-east-1 region)
+- **Total GPU hours:** ~15.5 hours
+- **Total cost:** ~$55 USD
+## Citation
+If you use this model or find this project useful, a link back to the repository is appreciated.
+```
+@misc{narula2025parchmentlm,
+  author = {Pranay Narula},
+  title = {ParchmentLM 166M Instruct: Full LLM Pipeline From Scratch},
+  year = {2025},
+  url = {https://huggingface.co/SlitherCode/tiny-edu-166M-instruct}
+}
+```
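
The ParchmentLM chat format described in the README's "Chat Template" section can be sketched in plain Python, which may help when checking what `tokenizer.apply_chat_template` produces. This is a minimal sketch under stated assumptions, not the template shipped with the repository: the helper names `build_prompt` and `extract_response` are hypothetical, and the exact newline placement between turns is inferred from the layout shown in the README, so the tokenizer's own template may differ in whitespace.

```python
EOT = "<|endoftext|>"  # turn separator and stop token, per the README


def build_prompt(messages, add_generation_prompt=True):
    """Render messages in the layout shown in the README's chat template.

    Assumption: each turn is "<role>\n<content><|endoftext|>\n". The real
    template shipped with the tokenizer may use different whitespace.
    """
    parts = [f"{m['role']}\n{m['content']}{EOT}\n" for m in messages]
    if add_generation_prompt:
        parts.append("assistant\n")  # leave the assistant turn open
    return "".join(parts)


def extract_response(raw):
    """Keep only the text before the first stop token, as in the README usage."""
    return raw.split(EOT)[0].strip()


if __name__ == "__main__":
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "47 + 83 ="},
    ]
    print(build_prompt(messages))
```

Comparing the string returned by `build_prompt` with the output of `tokenizer.apply_chat_template(..., tokenize=False, add_generation_prompt=True)` is a quick way to test the whitespace assumption before relying on either.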