Tsunamayo7 committed on
Commit b07b0f2 · verified · 1 Parent(s): f8adfb4

Add benchmark comparison table (base 9.4 vs fine-tuned 10.0)

Files changed (1)
  1. README.md +57 -9
README.md CHANGED
@@ -30,6 +30,42 @@ pipeline_tag: text-generation
 - **Function calling**: Native Ollama/OpenAI tool use format
 - **Zero API cost**: Runs locally on 20GB+ VRAM
 
 ## Training Details
 
 | Parameter | Value |
@@ -40,9 +76,12 @@ pipeline_tag: text-generation
 | LoRA alpha | 32 |
 | Target modules | q/k/v/o_proj, gate/up/down_proj |
 | Trainable params | 133M / 31B (0.43%) |
-| Training data | 1,500+ custom samples |
-| Epochs | 3 |
-| Learning rate | 2e-4 (cosine) |
 | Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |
 
 ## Training Data Categories
@@ -69,12 +108,15 @@ pipeline_tag: text-generation
 ## Use with Ollama
 
 ```bash
 ollama create gemma4-ja-agent-coder -f Modelfile
 ollama run gemma4-ja-agent-coder
 ```
 
 ## Use with helix-agents (Claude Code MCP)
 
 ```json
 {
   "mcpServers": {
@@ -89,18 +131,24 @@ ollama run gemma4-ja-agent-coder
 ## Use with transformers
 
 ```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
 from peft import PeftModel
-
-base = AutoModelForCausalLM.from_pretrained("google/gemma-4-31b-it")
-model = PeftModel.from_pretrained(base, "tsunamayo7/gemma4-31b-ja-agent-coder")
-tokenizer = AutoTokenizer.from_pretrained("tsunamayo7/gemma4-31b-ja-agent-coder")
 ```
 
 ## License
 
 Apache 2.0 (same as base model)
 
 ## Author
 
-[tsunamayo7](https://github.com/tsunamayo7)
 
 - **Function calling**: Native Ollama/OpenAI tool use format
 - **Zero API cost**: Runs locally on 20GB+ VRAM
 
+## Benchmark Results
+
+Evaluated on 12 task categories covering agentic coding capabilities. Each criterion is scored 0-1; per-category averages are scaled to 0-10.
+
+| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
+|----------|:---:|:---:|:---:|
+| ReAct Tool Call | 10.0 | **10.0** | — |
+| Function Calling | 8.0 | **10.0** | +2.0 |
+| Multi-step ReAct | 8.0 | **10.0** | +2.0 |
+| JP Code Gen (API) | 10.0 | **10.0** | — |
+| JP Code Gen (Algorithm) | 10.0 | **10.0** | — |
+| JP Code Gen (Database) | 9.0 | **10.0** | +1.0 |
+| JP Debug (TypeError) | 10.0 | **10.0** | — |
+| JP Debug (KeyError) | 10.0 | **10.0** | — |
+| JP Code Review | 8.0 | **10.0** | +2.0 |
+| JP Git Strategy | 10.0 | **10.0** | — |
+| JP Self-correction | 10.0 | **10.0** | — |
+| JP Documentation | 10.0 | **10.0** | — |
+| **Overall** | **9.4** | **10.0** | **+0.6** |
+
+### Key Improvements
+
+- **Function Calling**: Clean `<tool_call>` JSON output (the base model appends extra explanation)
+- **Multi-step ReAct**: Structured JSON reasoning with a proper Thought/Action/Observation flow
+- **Code Review**: Suggests parameterized queries when fixing SQL injection
+- **Database CRUD**: Complete Create/Read/Update/Delete coverage
+
+### Inference Test Results (v2 adapter)
+
+| Test | Input | Result |
+|------|-------|--------|
+| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
+| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" (create a health-check endpoint with FastAPI) | Clean Python with `/healthz` endpoint |
+| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" (cause and fix) | Japanese explanation + fix code |
+| Function Calling | "Use read_file to read README.md" | Clean `<tool_call>` JSON format |
+
 ## Training Details
 
 | Parameter | Value |
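The clean `<tool_call>` output benchmarked above can be consumed with a few lines of client-side parsing. A minimal sketch, assuming the model wraps one JSON object per `<tool_call>...</tool_call>` tag pair (the `parse_tool_calls` helper is hypothetical, not part of this repo):

```python
import json
import re

def parse_tool_calls(text: str) -> list[dict]:
    """Extract and decode each JSON payload wrapped in <tool_call> tags."""
    payloads = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    return [json.loads(p) for p in payloads]

reply = '<tool_call>{"name": "read_file", "arguments": {"path": "README.md"}}</tool_call>'
print(parse_tool_calls(reply))
# → [{'name': 'read_file', 'arguments': {'path': 'README.md'}}]
```

Replies with no tags simply yield an empty list, which makes it easy to distinguish tool calls from plain-text answers.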
 
 | LoRA alpha | 32 |
 | Target modules | q/k/v/o_proj, gate/up/down_proj |
 | Trainable params | 133M / 31B (0.43%) |
+| Training data | 1,546 custom samples (v2) |
+| Epochs | 2 (3rd epoch interrupted, checkpoint-388 used) |
+| Learning rate | 1.5e-4 (cosine) |
+| Final loss | 0.98 |
+| Token accuracy | 96.8% |
+| Training time | ~1.5 hours |
 | Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |
 
 ## Training Data Categories
 
 ## Use with Ollama
 
 ```bash
+# After GGUF conversion
 ollama create gemma4-ja-agent-coder -f Modelfile
 ollama run gemma4-ja-agent-coder
 ```
 
 ## Use with helix-agents (Claude Code MCP)
 
+Reduce Claude Code API token consumption by delegating routine tasks to this local model.
+
 ```json
 {
   "mcpServers": {
 
 ## Use with transformers
 
 ```python
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
 from peft import PeftModel
+import torch
+
+bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
+                         bnb_4bit_compute_dtype=torch.bfloat16)
+base = AutoModelForCausalLM.from_pretrained("google/gemma-4-31b-it",
+                                            quantization_config=bnb, device_map="auto")
+model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
+tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
 ```
 
+> **Note**: Gemma4 uses `Gemma4ClippableLinear`, which requires a PEFT monkey-patch. See [this gist](https://gist.github.com/) for the workaround.
+
 ## License
 
 Apache 2.0 (same as base model)
 
 ## Author
 
+[tsunamayo7](https://github.com/tsunamayo7) — Builder of [helix-agents](https://github.com/tsunamayo7/helix-agents), a local LLM delegation framework for Claude Code.
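The Thought/Action/Observation JSON flow exercised in the inference tests can be wired into a simple client-side dispatch loop. A minimal sketch, assuming a step schema with `thought`/`action`/`action_input` keys — the schema, the `TOOLS` registry, and `run_step` are illustrative assumptions, not part of this repo or of helix-agents:

```python
import json

# Illustrative tool registry; a real agent would expose read_file, run_command, etc.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
}

def run_step(model_output: str) -> str:
    """Parse one ReAct step emitted by the model and return the observation."""
    step = json.loads(model_output)
    tool = TOOLS[step["action"]]
    return tool(**step["action_input"])

# A step like the one the "ReAct" inference test expects from the model:
step_json = json.dumps({
    "thought": "I need src/main.py before answering.",
    "action": "read_file",
    "action_input": {"path": "src/main.py"},
})
print(run_step(step_json))  # observation fed back to the model as the next turn
```

In a full loop, the returned observation would be appended to the conversation and the model queried again until it emits a final answer instead of an action.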