Osaurus-AI commited on
Commit
33cc8fc
·
verified ·
1 Parent(s): f540a40

Add HumanEval pass@1=100% benchmark

Browse files
Files changed (1) hide show
  1. README.md +4 -0
README.md CHANGED
@@ -22,6 +22,10 @@ pipeline_tag: text-generation
22
  - **Multimodal (vision) kept.**
23
  - Calibration: Vera (agentic-coder) + GSM8K; "floor" recipe keeps the most-salient coding experts.
24
 
 
 
 
 
25
  ## Run it
26
  Load it in **Osaurus** (Apple Silicon) — it runs on Osaurus's native Swift JANG runtime.
27
 
 
22
  - **Multimodal (vision) kept.**
23
  - Calibration: Vera (agentic-coder) + GSM8K; "floor" recipe keeps the most-salient coding experts.
24
 
25
+ ## Benchmarks
26
+ - **HumanEval: pass@1 = 100%** (82/82, scrambled-half adaptive eval, seed 42; 0 failures, 0 escalations).
27
+ - Despite 45% expert pruning + all-2-bit routed experts, coding accuracy holds at **100%** — the REAP45 keep-set is a subset of the larger M3-Coder builds' proven coding experts, so coding capability is preserved while the model shrinks to ~84 GB.
28
+
29
  ## Run it
30
  Load it in **Osaurus** (Apple Silicon) — it runs on Osaurus's native Swift JANG runtime.
31