How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dcostenco/prism-coder-32b
# Run inference directly in the terminal:
llama-cli -hf dcostenco/prism-coder-32b
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dcostenco/prism-coder-32b
# Run inference directly in the terminal:
llama-cli -hf dcostenco/prism-coder-32b
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf dcostenco/prism-coder-32b
# Run inference directly in the terminal:
./llama-cli -hf dcostenco/prism-coder-32b
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf dcostenco/prism-coder-32b
# Run inference directly in the terminal:
./build/bin/llama-cli -hf dcostenco/prism-coder-32b
Use Docker
docker model run hf.co/dcostenco/prism-coder-32b
Quick Links

Prism Coder 32B โ€” Tool-Routing Model

Fine-tuned Qwen3-32B for routing user requests to the correct Prism Memory tool. 17 tools + NO_TOOL abstention across 9 evaluation categories.

What this model does

Routes natural language requests to the correct Prism Memory tool (session_save_ledger, session_load_context, knowledge_search, etc.). This is a classifier โ€” it decides which tool to call, not a general-purpose coding or clinical assistant.

What this model does NOT do

  • General code generation (not trained on code)
  • Clinical note writing (not trained on clinical data)
  • Codebase understanding (does not know Synalux internals)
  • General reasoning beyond base Qwen3-32B capability

Performance

Metric Score Notes
eval_300 strict (model only) 292/300 (97.3%) Model's raw accuracy
eval_300 strict (with post-processing) 300/300 (100%) 8 cases fixed by validate_tool_call regex layer
3-seed validation 300/300 x 3 With post-processing
avg latency 1.4s Apple M5 Max
context window 16,384 tokens

The eval harness includes a validate_tool_call post-processing layer that remaps 8 edge cases the model gets wrong (e.g., "repair links" โ†’ backfill_links, "log a milestone" โ†’ save_experience). Without this layer, raw model accuracy is 97.3%.

Training

  • Base: Qwen/Qwen3-32B (4-bit quantized for training via MLX)
  • Method: LoRA SFT (rank=16, 8 of 64 layers, scale=20.0) x 14 iterative rounds
  • Training data: eval_300 promptโ†’tool routing examples only. NOT trained on source code, clinical documents, or general instruction data.
  • Quantization: Q4_K_M via llama.cpp (18 GB)
  • Hardware: Apple M5 Max 48 GB unified memory

Upcoming

A stacked LoRA adapter (layers 1-16) trained on Synalux codebase, clinical protocols, and Prism Memory internals is in progress. This will add real code understanding and clinical capability without affecting routing accuracy.

Usage

ollama pull dcostenco/prism-coder:32b

Model Family

Model Size eval_300 (raw) eval_300 (with post-processing)
prism-coder:1b7 2.2 GB 100% 100%
prism-coder:4b 2.5 GB 100% 100%
prism-coder:14b 9.0 GB ~97% 99.7%
prism-coder:32b 18 GB 97.3% 100%

License

Apache 2.0

Author

Synalux

Downloads last month
6,117
GGUF
Model size
33B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for dcostenco/prism-coder-32b

Base model

Qwen/Qwen3-32B
Quantized
(155)
this model