Neotoi Coder v3.2 - 8B
A Rust / Dioxus 0.7 specialist fine-tuned from Qwen3-8B (8.2B parameters, 6.95B non-embedding) using RAFT (Retrieval-Augmented Fine-Tuning). Optimized for production-quality Dioxus 0.7 components with Tailwind v4 and WCAG 2.2 AAA accessibility.
This is the 8B variant of the v3.2 release. Companion repos: 4B (rockypod/neotoi-coder-4b) · 15B family hub (rockypod/neotoi-coder)
v3.2 Exam Results: 114Q Dioxus 0.7 Spec Exam
160.0 / 164.0 weighted | 111 / 114 raw | 97.56%
| Tier | Name | Cnt | Raw | Wtd | /Max | Rate | Floor | Status |
|---|---|---|---|---|---|---|---|---|
| T1 | Fundamentals | 12 | 12 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T2 | RSX Syntax | 12 | 11 | 11.0 | 12.0 | 91.7% | 82% | ✅ |
| T3 | Signal Hygiene | 12 | 12 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T4 | WCAG / ARIA | 15 | 15 | 22.5 | 22.5 | 100.0% | 82% | ✅ |
| T5 | use_resource | 8 | 8 | 12.0 | 12.0 | 100.0% | 82% | ✅ |
| T6 | Hard Reasoning | 10 | 10 | 20.0 | 20.0 | 100.0% | 88% | ✅ |
| T7 | Primitives + CSS | 13 | 13 | 19.5 | 19.5 | 100.0% | 82% | ✅ |
| T8 | GlobalSignal / i18n | 8 | 7 | 10.5 | 12.0 | 87.5% | 82% | ✅ |
| T9 | Static Navigator | 6 | 6 | 9.0 | 9.0 | 100.0% | 82% | ✅ |
| T10 | Dioxus 0.7.4 | 6 | 6 | 12.0 | 12.0 | 100.0% | 88% | ✅ |
| T11 | Server Functions | 4 | 4 | 6.0 | 6.0 | 100.0% | 82% | ✅ |
| T12 | Format Compliance (NEW) | 6 | 6 | 12.0 | 12.0 | 100.0% | 88% | ✅ |
| T13 | SyncStore (NEW) | 2 | 1 | 1.5 | 3.0 | 50.0% | 82% | ⚠️ |
| Total | | 114 | 111 | 160.0 | 164.0 | 97.56% | n/a | ✅ |
- Publication bar (90%): PASS
- Release bar (95%): PASS
- Tier floors: FAIL (T13 only; structural: 2 questions, single real miss = 50%)
3 misses: q022 (T2, rsx! macro), q087 (T8, use_signal), q113 (T13, tokio::spawn)
The T13 SyncStore floor failure is structural: with only 2 questions in the tier, any single real miss drops the tier to 50%, below its 82% floor, regardless of difficulty.
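The totals and the floor check above can be reproduced from the tier table. The sketch below is not the project's grader: the per-tier weights are inferred from the Wtd / Raw ratio in the table, and the names `TIERS` and `summarize` are hypothetical.

```python
# Tier rows copied from the v3.2 table: (tier, question count, raw correct, weight, floor %).
# Assumption: weights are inferred from Wtd / Raw in the table, not read from the grader.
TIERS = [
    ("T1", 12, 12, 1.0, 82), ("T2", 12, 11, 1.0, 82), ("T3", 12, 12, 1.0, 82),
    ("T4", 15, 15, 1.5, 82), ("T5", 8, 8, 1.5, 82), ("T6", 10, 10, 2.0, 88),
    ("T7", 13, 13, 1.5, 82), ("T8", 8, 7, 1.5, 82), ("T9", 6, 6, 1.5, 82),
    ("T10", 6, 6, 2.0, 88), ("T11", 4, 4, 1.5, 82), ("T12", 6, 6, 2.0, 88),
    ("T13", 2, 1, 1.5, 82),
]


def summarize(tiers):
    """Return (weighted score, maximum score, tiers whose pass rate is below the floor)."""
    weighted = sum(raw * weight for _, _, raw, weight, _ in tiers)
    maximum = sum(count * weight for _, count, _, weight, _ in tiers)
    below_floor = [
        name for name, count, raw, _, floor in tiers if 100 * raw / count < floor
    ]
    return weighted, maximum, below_floor


weighted, maximum, below_floor = summarize(TIERS)
print(f"{weighted} / {maximum} = {100 * weighted / maximum:.2f}%", below_floor)
# prints: 160.0 / 164.0 = 97.56% ['T13']
```

With a 2-question tier the only reachable rates are 0%, 50%, and 100%, so an 82% floor effectively requires a perfect score in T13.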
v3.2 vs v3.1 (8B)
| Metric | v3.1 8B | v3.2 8B |
|---|---|---|
| Score | 144.5/144.5 (100.0%) | 160.0/164.0 (97.56%) |
| Exam | 103Q, max 144.5, 11 tiers | 114Q, max 164.0, 13 tiers |
| T4 WCAG / ARIA | 100.0% | 100.0% ✅ |
| T7 Primitives + CSS | 100.0% | 100.0% ✅ (15B was 92.3%) |
| T12 Format Compliance | n/a | 100.0% ✅ (15B was 83.3%) |
| T13 SyncStore | n/a | 50.0% ⚠️ |
| Dioxus surface | 0.7.0–0.7.4 | 0.7.0–0.7.9 |
| Dataset | 4,880 rows, 43 topics | 5,287 rows, 57 topics |
The 8B v3.2 outscores the 15B v3.2 (97.56% vs 95.12%), consistent with the v3.1 pattern where the 8B outperformed the larger model.
Version History
| Version | Base (params) | Score | Exam | Dataset |
|---|---|---|---|---|
| v3.0 15B | Qwen3-Coder-14B (14.8B) | 124.0/144.5 (85.8%) | 103Q weighted | 4,535 |
| v3.1 15B | Qwen3-Coder-14B (14.8B) | 137.0/144.5 (94.81%) | 103Q weighted | 4,880 |
| v3.1 8B | Qwen3-8B (8.2B) | 144.5/144.5 (100.00%) | 103Q weighted | 4,880 |
| v3.1 4B | Qwen3-4B (4.0B) | 143.5/144.5 (99.31%) | 103Q weighted | 4,880 |
| v3.2 15B | Qwen3-Coder-14B (14.8B) | 156.0/164.0 (95.12%) | 114Q weighted | 5,287 |
| v3.2 8B | Qwen3-8B (8.2B) | 160.0/164.0 (97.56%) | 114Q weighted | 5,287 |
Files
- neotoi-coder-v3.2-8b-q4_k_m_patched.gguf: current Q4_K_M + qwen3.thinking=true patch (~4.68 GB)
- neotoi-coder-v3.1-8b-q4_k_m_patched.gguf: v3.1 archive
Install
Ollama
ollama pull rockypod/neotoi-coder:8b
ollama run rockypod/neotoi-coder:8b "Write a Dioxus 0.7 counter with use_signal"
LM Studio
Download neotoi-coder-v3.2-8b-q4_k_m_patched.gguf from this repo (~4.68 GB).
See integration/lm_studio.md in the GitHub repo for prompt template setup.
llama.cpp
./llama-cli -m neotoi-coder-v3.2-8b-q4_k_m_patched.gguf -ngl 99 --temp 0.2 \
-p "<|im_start|>user\nYour question<|im_end|>\n<|im_start|>assistant\n<think>"
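The `-p` string above follows the ChatML layout used by Qwen3, with the assistant turn opened by `<think>`. As a minimal sketch, the same prompt can be assembled programmatically; the helper name `build_prompt` is hypothetical, and the layout is copied from the command above rather than from the model's bundled chat template.

```python
def build_prompt(question: str) -> str:
    """Build a single-turn ChatML prompt that opens the assistant's thinking block."""
    return (
        "<|im_start|>user\n"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
        "<think>"  # mirrors the llama-cli example, which ends the prompt with <think>
    )


if __name__ == "__main__":
    print(build_prompt("Write a Dioxus 0.7 counter with use_signal"))
```

Because the prompt already contains the opening `<think>`, the completion will normally contain only the closing `</think>` tag before the answer.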
Model Details
- Base model: Qwen/Qwen3-8B (8.2B total, 6.95B non-embedding)
- Method: RAFT with LoRA adapters (Unsloth)
- Dataset: 5,287 curated Dioxus 0.7 examples across 57 topics (T1–T57)
- Scope: Rust + Dioxus 0.7.0–0.7.9 + Tailwind v4 + WCAG 2.2 AAA
- Quantization: Q4_K_M (~4.68 GB)
- Thinking tokens: patched (qwen3.thinking = true)
Training
| Field | Value |
|---|---|
| Steps | 2,644 |
| Epochs | 4 |
| LoRA rank | 16 (alpha 16, dropout 0) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Sequence length | 8192 |
| Precision | bf16 + 4-bit base |
| Hardware | RTX 3090 Ti (24 GB) |
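The step count can be sanity-checked against the dataset size. The card does not state a batch size, so the sketch below only searches for effective batch sizes that are consistent with 5,287 rows, 4 epochs, and 2,644 steps. It assumes every row is used for training (no held-out split) and that a final partial batch counts as one step; `total_steps` is a hypothetical helper.

```python
import math

ROWS, EPOCHS, STEPS = 5287, 4, 2644  # values reported on this card
LORA_RANK, LORA_ALPHA = 16, 16       # from the Training table


def total_steps(rows: int, epochs: int, effective_batch: int) -> int:
    """Optimizer steps, assuming one step per effective batch and no dropped remainder."""
    return math.ceil(rows / effective_batch) * epochs


# Effective batch sizes (per-device batch x gradient accumulation) that match the card.
consistent = [b for b in range(1, 65) if total_steps(ROWS, EPOCHS, b) == STEPS]
print(consistent)  # prints: [8]

# Standard LoRA scales the adapter update by alpha / rank.
print(LORA_ALPHA / LORA_RANK)  # prints: 1.0
```

Under these assumptions the reported figures are consistent with an effective batch size of 8; this is an inference from the numbers, not a documented setting.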
What's New in v3.2
- Full Dioxus 0.7 series coverage (0.7.0–0.7.9): Scoped CSS + CSS modules (0.7.3), SyncStore + use_store_sync (0.7.2), onauxclick / onscrollend events (0.7.3), server-only extractors + serde_qs, inert attribute + web panic resilience (0.7.6), IntoAttributeValue for &T, Action::PartialEq
- Format compliance training: fenced-code-only outputs, no orphan </think>, no prose preamble
- Preserve-and-append training: edits to .ftl catalogs, Cargo.toml, and route enums add entries without replacing existing ones
- WCAG / ARIA corrections: the T55 correction set ensures the rsx! macro is never dropped on ARIA-heavy components
- 5,287 training examples across 57 topics (up from 4,880 / 43 in v3.1)
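The format-compliance goals in the list above (fenced code only, no orphan `</think>`, no prose preamble) can be checked mechanically on a model response. The following is an illustrative sketch under those three rules, not the exam's grader; `is_format_compliant` is a hypothetical name.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to avoid nesting fences in this example
FENCE_RE = re.compile(FENCE + r"[^\n]*\n.*?" + FENCE, re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def is_format_compliant(output: str) -> bool:
    """True if the output is only fenced code, apart from well-formed thinking blocks."""
    body = THINK_RE.sub("", output)  # drop complete <think>...</think> blocks
    if "<think>" in body or "</think>" in body:
        return False  # an unmatched (orphan) thinking tag remains
    blocks = FENCE_RE.findall(body)
    remainder = FENCE_RE.sub("", body)
    # At least one fenced block, and nothing but whitespace outside the fences.
    return bool(blocks) and remainder.strip() == ""
```

A response such as a thinking block followed by a single fenced Rust snippet passes, while one that starts with "Here is the component:" fails on the prose-preamble rule.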
Enabling Thinking Mode
This model emits Qwen3 native <think>...</think> blocks. Thinking is on
by default with the _patched.gguf quants on inference backends that
honor qwen3.thinking.
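Client code usually needs to separate the reasoning from the final answer. A minimal sketch, assuming the response is available as a plain string; `split_thinking` is a hypothetical helper, and it also handles the case where the prompt already ended with `<think>`, so that only the closing tag appears in the completion.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from a Qwen3-style response."""
    match = THINK_RE.search(text)
    if match:
        reasoning = match.group(1).strip()
        answer = (text[: match.start()] + text[match.end():]).strip()
        return reasoning, answer
    if "</think>" in text:
        # The opening tag was part of the prompt, so only the closing tag is present.
        reasoning, _, answer = text.partition("</think>")
        return reasoning.strip(), answer.strip()
    return "", text.strip()  # no thinking block at all


if __name__ == "__main__":
    print(split_thinking("<think>Use use_signal for state.</think>\nfn main() {}"))
```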
Transparency
- Weights: HuggingFace → rockypod/neotoi-coder-8b
- Family hub (8B / 4B / 15B comparison): rockypod/neotoi-coder
- Exam runner, grader, per-question results: GitHub → rockypod/neotoi-coder
- Ollama: ollama pull rockypod/neotoi-coder:8b
The training dataset is not redistributed; see the GitHub repo for the data-generation pipeline.
License
Fine-tuned weights: Neotoi Coder Community License v1.0. Commercial use of outputs is permitted, weight redistribution is prohibited, and mental health deployment requires written permission. See LICENSE.
Base model: Qwen3-8B, Apache 2.0 © Alibaba Cloud.
Credits
- Unsloth: 2× faster fine-tuning
- Qwen3-8B: base model
- Dioxus: the framework this model specializes in
- Claude Code: dataset pipeline and training infrastructure
Built on a homelab RTX 3090 Ti in Washington State.