Wind-Edge-1.6-Instruct

Wind-Edge-1.6-Instruct is a compact, Qwen3-compatible assistant model for local and edge inference. It was built from a depth-pruned Wind-Edge base and fine-tuned with a Claude-heavy public distillation SFT mix, code/math instruction data, and a final behavior-polish pass.

This is a small model. It is intended for short answers, simple coding help, summaries, and lightweight local assistant use. It is not a replacement for large reasoning models.

Recommended Usage

Use trust_remote_code=True; the custom loader re-applies tied weights from model.safetensors.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "North-ML1/Wind-Edge-1.6-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    # Stop on either the tokenizer's EOS or the chat end-of-turn token.
    eos_token_id=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|im_end|>"),
    ],
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))

Suggested Settings

For chat:

  • enable_thinking=False
  • temperature=0.55-0.7
  • top_p=0.85-0.92
  • repetition_penalty=1.05-1.08
  • max_new_tokens=128-512

For deterministic tests:

  • do_sample=False
  • repetition_penalty=1.06
  • Keep prompts short and direct.
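The two presets above can be packaged as ready-to-pass generate() keyword sets. This is a convenience sketch, not part of the model's API; the sampled values are midpoints of the suggested ranges.

```python
# Chat preset: midpoints of the suggested ranges above.
CHAT_KWARGS = dict(
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    max_new_tokens=256,
)

# Deterministic preset: greedy decoding with the same repetition penalty.
DETERMINISTIC_KWARGS = dict(
    do_sample=False,
    repetition_penalty=1.06,
    max_new_tokens=256,
)
```

Usage: `model.generate(**inputs, **CHAT_KWARGS)` or `model.generate(**inputs, **DETERMINISTIC_KWARGS)`.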

The bundled chat template injects a minimal default identity system message if no system message is supplied:

You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.
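The injection logic can be mirrored in plain Python. The helper below is a rough sketch of what the bundled template does (the actual template lives in the tokenizer config), useful for pre-processing message lists yourself:

```python
DEFAULT_IDENTITY = (
    "You are Wind-Edge-1.6, a compact AI assistant model. You are not a human."
)

def with_default_identity(messages):
    """Roughly mirror the bundled chat template: prepend the default
    identity system message only when the caller supplied none."""
    if not any(m["role"] == "system" for m in messages):
        return [{"role": "system", "content": DEFAULT_IDENTITY}] + messages
    return list(messages)
```

Supplying your own system message therefore suppresses the default identity line.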

Training Summary

  • Source family: Qwen3-compatible Wind-Edge architecture
  • Base: depth-pruned and healed Wind-Edge base from Qwen3-0.6B-compatible weights
  • Final SFT:
    • 12M tokens of no-thinking distillation SFT
    • Claude-style public distillation data plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct
Teacher rows with incorrect self-identity filtered out
    • 6M-token system-template adaptation pass
    • 2M-token local quality polish for identity, simple arithmetic, list sorting, and concise coding behavior

Quick Sanity Outputs

Expected behavior after the final polish:

  • hi -> short greeting as Wind-Edge-1.6
  • Who are you? -> identifies as Wind-Edge-1.6, not human
  • sort this list: [3, 1, 2] -> [1, 2, 3]
  • 60 miles in 1.5 hours -> 40 mph
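The checks above can be scripted as a minimal smoke-test harness. `generate_reply` is a placeholder for your own wrapper around the model.generate/decode calls from the quick-start snippet; only the substring checks below come from this card.

```python
# (prompt, expected substring) pairs from the sanity list above.
SANITY_CASES = [
    ("Who are you?", "Wind-Edge-1.6"),
    ("sort this list: [3, 1, 2]", "[1, 2, 3]"),
    ("60 miles in 1.5 hours", "40"),
]

def run_sanity(generate_reply):
    """Return a list of (prompt, expected, reply) tuples for failed cases."""
    failures = []
    for prompt, expected in SANITY_CASES:
        reply = generate_reply(prompt)
        if expected not in reply:
            failures.append((prompt, expected, reply))
    return failures
```

An empty return value means all sanity cases passed.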

Limitations

Wind-Edge-1.6-Instruct is small and can still make arithmetic, factual, and reasoning mistakes. It may overgeneralize from prompts; use concise instructions and verify its outputs for anything important.

Citation

See wind_edge_1_6_paper.html in this repository for a short technical write-up of the build and tuning process.

Model size: 0.4B parameters (Safetensors, F32)