How to use from the llama-cpp-python library
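# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF from the Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
	repo_id="rockypod/neotoi-coder-8b",
	# File name as listed in the Files section of this card.
	filename="neotoi-coder-v3.2-8b-q4_k_m_patched.gguf",
)
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The reply text is in the first choice of the OpenAI-style response dict.
print(response["choices"][0]["message"]["content"])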

Neotoi Coder v3.2: 8B

A Rust / Dioxus 0.7 specialist fine-tuned from Qwen3-8B (8.2B parameters, 6.95B non-embedding) using RAFT (Retrieval-Augmented Fine-Tuning). Optimized for production-quality Dioxus 0.7 components with Tailwind v4 and WCAG 2.2 AAA accessibility.

This is the 8B variant of the v3.2 release. Companion repos: 4B (rockypod/neotoi-coder-4b) · 15B family hub (rockypod/neotoi-coder)

v3.2 Exam Results: 114Q Dioxus 0.7 Spec Exam

160.0 / 164.0 weighted | 111 / 114 raw | 97.56%

| Tier | Name | Cnt | Raw | Wtd | Max | Rate | Floor | Status |
|------|------|-----|-----|-----|-----|------|-------|--------|
| T1 | Fundamentals | 12 | 12 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T2 | RSX Syntax | 12 | 11 | 11.0 | 12.0 | 91.7% | 82% | ✅ |
| T3 | Signal Hygiene | 12 | 12 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T4 | WCAG / ARIA | 15 | 15 | 22.5 | 22.5 | 100.0% | 82% | ✅ |
| T5 | use_resource | 8 | 8 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T6 | Hard Reasoning | 10 | 10 | 20.0 | 20.0 | 100.0% | 88% | ✅ |
| T7 | Primitives + CSS | 13 | 13 | 19.5 | 19.5 | 100.0% | 82% | ✅ |
| T8 | GlobalSignal / i18n | 8 | 7 | 10.5 | 12.0 | 87.5% | 82% | ✅ |
| T9 | Static Navigator | 6 | 6 | 9.0 | 9.0 | 100.0% | 82% | ✅ |
| T10 | Dioxus 0.7.4 | 6 | 6 | 12.0 | 12.0 | 100.0% | 88% | ✅ |
| T11 | Server Functions | 4 | 4 | 6.0 | 6.0 | 100.0% | 82% | ✅ |
| T12 | Format Compliance (NEW) | 6 | 6 | 12.0 | 12.0 | 100.0% | 88% | ✅ |
| T13 | SyncStore (NEW) | 2 | 1 | 1.5 | 3.0 | 50.0% | 82% | ⚠️ |
| Total | | 114 | 111 | 160.0 | 164.0 | 97.56% | - | - |
  • Publication bar (90%): PASS
  • Release bar (95%): PASS
  • Tier floors: FAIL (T13 only; structural: 2 questions, so a single real miss = 50%)

3 misses: q022 (T2, rsx! macro), q087 (T8, use_signal), q113 (T13, tokio::spawn)

The T13 SyncStore floor failure is structural: with only 2 questions in the tier, any single real miss drops the rate to 50% and fails the 82% floor regardless of difficulty.
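
The totals and bar checks above can be reproduced from the per-tier rows. The sketch below is only an illustration, not the exam's actual grader: the per-question weights (1.0, 1.5, 2.0) are inferred by dividing each tier's Max by its Cnt, and the names `TIERS` and `summarize` are hypothetical.

```python
# Rows copied from the results table: (tier, raw correct, question count, weight, floor).
# Weights are an assumption, inferred as Max / Cnt for each tier.
TIERS = [
    ("T1", 12, 12, 1.0, 0.82),
    ("T2", 11, 12, 1.0, 0.82),
    ("T3", 12, 12, 1.0, 0.82),
    ("T4", 15, 15, 1.5, 0.82),
    ("T5", 8, 8, 1.5, 0.82),
    ("T6", 10, 10, 2.0, 0.88),
    ("T7", 13, 13, 1.5, 0.82),
    ("T8", 7, 8, 1.5, 0.82),
    ("T9", 6, 6, 1.5, 0.82),
    ("T10", 6, 6, 2.0, 0.88),
    ("T11", 4, 4, 1.5, 0.82),
    ("T12", 6, 6, 2.0, 0.88),
    ("T13", 1, 2, 1.5, 0.82),
]


def summarize(tiers):
    """Return (weighted earned, weighted max, tiers whose raw rate is below the floor)."""
    earned = sum(raw * weight for _, raw, _, weight, _ in tiers)
    maximum = sum(count * weight for _, _, count, weight, _ in tiers)
    below_floor = [name for name, raw, count, _, floor in tiers if raw / count < floor]
    return earned, maximum, below_floor


earned, maximum, below_floor = summarize(TIERS)
rate = earned / maximum
print(f"{earned:.1f} / {maximum:.1f} = {rate:.2%}")
print("publication bar (90%):", rate >= 0.90)
print("release bar (95%):", rate >= 0.95)
print("tiers below floor:", below_floor)
```

With only two questions, T13 can score 0%, 50%, or 100%, so the 82% floor is reachable only with a perfect tier.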

v3.2 vs v3.1 (8B)

| Metric | v3.1 8B | v3.2 8B |
|--------|---------|---------|
| Score | 144.5/144.5 (100.0%) | 160.0/164.0 (97.56%) |
| Exam | 103Q, max 144.5, 11 tiers | 114Q, max 164.0, 13 tiers |
| T4 WCAG / ARIA | 100.0% | 100.0% ✅ |
| T7 Primitives + CSS | 100.0% | 100.0% ✅ (15B was 92.3%) |
| T12 Format Compliance | - | 100.0% ✅ (15B was 83.3%) |
| T13 SyncStore | - | 50.0% ⚠️ |
| Dioxus surface | 0.7.0–0.7.4 | 0.7.0–0.7.9 |
| Dataset | 4,880 rows, 43 topics | 5,287 rows, 57 topics |

The 8B v3.2 outscores the 15B v3.2 (97.56% vs 95.12%), consistent with the v3.1 pattern where the 8B outperformed the larger model.

Version History

| Version | Base (params) | Score | Exam | Dataset |
|---------|---------------|-------|------|---------|
| v3.0 15B | Qwen3-Coder-14B (14.8B) | 124.0/144.5 (85.8%) | 103Q weighted | 4,535 |
| v3.1 15B | Qwen3-Coder-14B (14.8B) | 137.0/144.5 (94.81%) | 103Q weighted | 4,880 |
| v3.1 8B | Qwen3-8B (8.2B) | 144.5/144.5 (100.00%) | 103Q weighted | 4,880 |
| v3.1 4B | Qwen3-4B (4.0B) | 143.5/144.5 (99.31%) | 103Q weighted | 4,880 |
| v3.2 15B | Qwen3-Coder-14B (14.8B) | 156.0/164.0 (95.12%) | 114Q weighted | 5,287 |
| v3.2 8B | Qwen3-8B (8.2B) | 160.0/164.0 (97.56%) | 114Q weighted | 5,287 |

Files

  • neotoi-coder-v3.2-8b-q4_k_m_patched.gguf: current Q4_K_M quant with the qwen3.thinking=true patch (~4.68 GB)
  • neotoi-coder-v3.1-8b-q4_k_m_patched.gguf: v3.1 archive

Install

Ollama

ollama pull rockypod/neotoi-coder:8b
ollama run rockypod/neotoi-coder:8b "Write a Dioxus 0.7 counter with use_signal"

LM Studio

Download neotoi-coder-v3.2-8b-q4_k_m_patched.gguf from this repo (~4.68 GB). See integration/lm_studio.md in the GitHub repo for prompt template setup.

llama.cpp

./llama-cli -m neotoi-coder-v3.2-8b-q4_k_m_patched.gguf -ngl 99 --temp 0.2 \
  -p "<|im_start|>user\nYour question<|im_end|>\n<|im_start|>assistant\n<think>"
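
The `-p` string above is a single ChatML turn that ends with an open `<think>` tag. As a rough sketch, the same string can be assembled in Python when scripting prompts; `build_prompt` is a hypothetical helper name, and the template is copied from the command above rather than from the model's official chat template.

```python
def build_prompt(user_message: str, open_think: bool = True) -> str:
    """Wrap one user turn in ChatML markers, mirroring the -p string above."""
    prompt = (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    if open_think:
        # Ending on <think> nudges the model to start with its reasoning block.
        prompt += "<think>"
    return prompt


print(build_prompt("Write a Dioxus 0.7 counter with use_signal"))
```

Because the prompt already contains the opening `<think>`, the completion that comes back will typically contain only the closing `</think>` tag.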

Model Details

  • Base model: Qwen/Qwen3-8B (8.2B total, 6.95B non-embedding)
  • Method: RAFT with LoRA adapters (Unsloth)
  • Dataset: 5,287 curated Dioxus 0.7 examples across 57 topics (T1–T57)
  • Scope: Rust + Dioxus 0.7.0–0.7.9 + Tailwind v4 + WCAG 2.2 AAA
  • Quantization: Q4_K_M (~4.68 GB)
  • Thinking tokens: patched (qwen3.thinking = true)

Training

| Field | Value |
|-------|-------|
| Steps | 2,644 |
| Epochs | 4 |
| LoRA rank | 16 (alpha 16, dropout 0) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Sequence length | 8192 |
| Precision | bf16 + 4-bit base |
| Hardware | RTX 3090 Ti (24 GB) |
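
As a sanity check on these numbers, the step count can be related to the dataset size. The sketch below assumes that all 5,287 rows were used for training, that each step consumes one batch, and that the last partial batch of an epoch is kept. The card does not state the batch size, so the value found here is an inference, not a reported setting.

```python
import math

STEPS = 2_644      # from the table above
EPOCHS = 4         # from the table above
ROWS = 5_287       # dataset size from Model Details
LORA_RANK = 16
LORA_ALPHA = 16

# 2,644 steps over 4 epochs divides evenly.
steps_per_epoch = STEPS // EPOCHS
assert STEPS % EPOCHS == 0

# Effective batch sizes b for which ceil(ROWS / b) matches the steps per epoch.
candidates = [b for b in range(1, 65) if math.ceil(ROWS / b) == steps_per_epoch]

# Standard LoRA scales the adapter update by alpha / rank.
lora_scale = LORA_ALPHA / LORA_RANK

print("steps per epoch:", steps_per_epoch)
print("consistent effective batch sizes:", candidates)
print("LoRA scale (alpha / rank):", lora_scale)
```

Under those assumptions the figures are consistent with an effective batch size of 8 (for example, a small per-device batch with gradient accumulation), though other setups that drop rows or batches could produce the same step count.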

What's New in v3.2

  • Full Dioxus 0.7 series coverage (0.7.0–0.7.9): scoped CSS + CSS modules (0.7.3), SyncStore + use_store_sync (0.7.2), onauxclick/onscrollend events (0.7.3), server-only extractors + serde_qs, inert attribute + web panic resilience (0.7.6), IntoAttributeValue for &T, Action::PartialEq
  • Format compliance training: fenced-code-only outputs, no orphan </think>, no prose preamble
  • Preserve-and-append training: edits to .ftl catalogs, Cargo.toml, and route enums add new entries without replacing existing ones
  • WCAG / ARIA corrections: the T55 correction set targets cases where the rsx! macro was dropped on ARIA-heavy components
  • 5,287 training examples across 57 topics (up from 4,880 / 43 in v3.1)
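
The three format-compliance rules listed above (fenced-code-only output, no orphan `</think>`, no prose preamble) can be checked mechanically. The function below is a hypothetical checker written for illustration; it is not the grader used for the T12 tier, and its interpretation of each rule is an assumption.

```python
import re

# Built programmatically so this example can itself sit inside a code fence.
FENCE = "`" * 3


def check_format(output: str) -> list[str]:
    """Return the list of rule violations; an empty list means the output passes."""
    problems = []
    # Rule: no orphan </think> (more closing tags than opening tags).
    if output.count("</think>") > output.count("<think>"):
        problems.append("orphan </think>")
    # Well-formed thinking blocks are not part of the visible answer.
    visible = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()
    # Rule: no prose preamble, so the visible answer must open with a fence.
    if not visible.startswith(FENCE):
        problems.append("prose preamble")
    # Rule: fenced-code-only, so nothing but whitespace may sit outside fences.
    fenced = re.escape(FENCE) + r".*?" + re.escape(FENCE)
    if re.sub(fenced, "", visible, flags=re.DOTALL).strip():
        problems.append("text outside code fences")
    return problems


good = "<think>plan</think>\n" + FENCE + "rust\nfn main() {}\n" + FENCE
bad = "Here is the code:\n" + FENCE + "rust\nfn main() {}\n" + FENCE + "\n</think>"
print(check_format(good))
print(check_format(bad))
```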

Enabling Thinking Mode

This model emits Qwen3 native <think>...</think> blocks. Thinking is on by default with the _patched.gguf quants on inference backends that honor qwen3.thinking.
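
Client code usually wants the reasoning and the final answer separately. A minimal sketch, assuming the output follows the `<think>...</think>` convention described above; `split_thinking` is a hypothetical helper, not part of any inference library.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty when no block is present."""
    # If the prompt already ended with <think>, the completion starts mid-block,
    # so restore the opening tag before matching.
    if "</think>" in text and "<think>" not in text:
        text = "<think>" + text
    reasoning = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think>use a signal</think>\nfn main() {}")
print(repr(reasoning))
print(repr(answer))
```

The mid-block case matters when the prompt is pre-filled with `<think>`, as in the llama.cpp command in the Install section.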

Transparency

The training dataset is not redistributed; see the GitHub repo for the data-generation pipeline.

License

Fine-tuned weights: Neotoi Coder Community License v1.0. Commercial use of outputs is permitted, weight redistribution is prohibited, and mental health deployment requires written permission. See LICENSE.

Base model: Qwen3-8B, Apache 2.0 © Alibaba Cloud.

Credits

  • Unsloth: 2× faster fine-tuning
  • Qwen3-8B: base model
  • Dioxus: the framework this model specializes in
  • Claude Code: dataset pipeline and training infrastructure

Built on a homelab RTX 3090 Ti in Washington State.
