Wind-Edge-1.6-Instruct

Wind-Edge-1.6-Instruct is a compact, Qwen3-compatible assistant model for local and edge inference. It was built from a depth-pruned Wind-Edge base and fine-tuned with a Claude-heavy public distillation SFT mix, code/math instruction data, and a final behavior-polish pass.

This is a small model. It is intended for short answers, simple coding help, summaries, and lightweight local assistant use. It is not a replacement for large reasoning models.

Recommended Usage

Use trust_remote_code=True; the custom loader re-applies tied weights from model.safetensors.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "North-ML1/Wind-Edge-1.6-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    # Stop on either the tokenizer's EOS or the chat end-of-turn token.
    eos_token_id=[
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|im_end|>"),
    ],
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))

Suggested Settings

For chat:

  • enable_thinking=False
  • temperature=0.55-0.7
  • top_p=0.85-0.92
  • repetition_penalty=1.05-1.08
  • max_new_tokens=128-512

For deterministic tests:

  • do_sample=False
  • repetition_penalty=1.06
  • Keep prompts short and direct.
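The two presets above can be packaged as ready-to-pass generate() keyword sets. This is a convenience sketch, not part of the model's API; the sampled values are midpoints of the suggested ranges.

```python
# Chat preset: midpoints of the suggested ranges above.
CHAT_KWARGS = dict(
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    repetition_penalty=1.06,
    max_new_tokens=256,
)

# Deterministic preset: greedy decoding with the same repetition penalty.
DETERMINISTIC_KWARGS = dict(
    do_sample=False,
    repetition_penalty=1.06,
    max_new_tokens=256,
)
```

Usage: `model.generate(**inputs, **CHAT_KWARGS)` or `model.generate(**inputs, **DETERMINISTIC_KWARGS)`.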

The bundled chat template injects a minimal default identity system message if no system message is supplied:

You are Wind-Edge-1.6, a compact AI assistant model. You are not a human.
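The injection logic can be mirrored in plain Python. The helper below is a rough sketch of what the bundled template does (the actual template lives in the tokenizer config), useful for pre-processing message lists yourself:

```python
DEFAULT_IDENTITY = (
    "You are Wind-Edge-1.6, a compact AI assistant model. You are not a human."
)

def with_default_identity(messages):
    """Roughly mirror the bundled chat template: prepend the default
    identity system message only when the caller supplied none."""
    if not any(m["role"] == "system" for m in messages):
        return [{"role": "system", "content": DEFAULT_IDENTITY}] + messages
    return list(messages)
```

Supplying your own system message therefore suppresses the default identity line.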

Training Summary

  • Source family: Qwen3-compatible Wind-Edge architecture
  • Base: depth-pruned and healed Wind-Edge base from Qwen3-0.6B-compatible weights
  • Final SFT:
    • 12M tokens of no-thinking distillation SFT
    • Claude-style public distillation data plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct
Teacher rows with incorrect self-identity filtered out
    • 6M-token system-template adaptation pass
    • 2M-token local quality polish for identity, simple arithmetic, list sorting, and concise coding behavior

Quick Sanity Outputs

Expected behavior after the final polish:

  • hi -> short greeting as Wind-Edge-1.6
  • Who are you? -> identifies as Wind-Edge-1.6, not human
  • sort this list: [3, 1, 2] -> [1, 2, 3]
  • 60 miles in 1.5 hours -> 40 mph
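The checks above can be scripted as a minimal smoke-test harness. `generate_reply` is a placeholder for your own wrapper around the model.generate/decode calls from the quick-start snippet; only the substring checks below come from this card.

```python
# (prompt, expected substring) pairs from the sanity list above.
SANITY_CASES = [
    ("Who are you?", "Wind-Edge-1.6"),
    ("sort this list: [3, 1, 2]", "[1, 2, 3]"),
    ("60 miles in 1.5 hours", "40"),
]

def run_sanity(generate_reply):
    """Return a list of (prompt, expected, reply) tuples for failed cases."""
    failures = []
    for prompt, expected in SANITY_CASES:
        reply = generate_reply(prompt)
        if expected not in reply:
            failures.append((prompt, expected, reply))
    return failures
```

An empty return value means all sanity cases passed.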

Limitations

Wind-Edge-1.6-Instruct is small and can still make arithmetic, factual, and reasoning mistakes. It may overgeneralize from prompts; use concise instructions and verify its outputs for anything important.

Citation

See wind_edge_1_6_paper.html in this repository for a short technical write-up of the build and tuning process.

Model size: 0.4B parameters (Safetensors, F32)