File size: 5,598 Bytes
95c1d5f
f8adfb4
 
 
 
95c1d5f
 
f8adfb4
 
 
 
 
 
 
 
 
 
 
95c1d5f
 
f8adfb4
95c1d5f
f8adfb4
95c1d5f
f8adfb4
95c1d5f
f8adfb4
 
 
 
 
95c1d5f
b07b0f2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
95c1d5f
 
f8adfb4
 
 
 
 
 
 
 
b07b0f2
 
 
 
 
 
f8adfb4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b07b0f2
f8adfb4
 
 
 
 
 
b07b0f2
 
f8adfb4
 
 
 
 
 
 
 
 
 
 
 
 
 
b07b0f2
f8adfb4
b07b0f2
 
 
 
 
 
 
 
f8adfb4
 
b07b0f2
 
f8adfb4
 
 
 
 
 
b07b0f2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
language:
- ja
- en
license: apache-2.0
base_model: google/gemma-4-31b-it
tags:
- gemma4
- code
- agent
- japanese
- qlora
- react
- mcp
- claude-code
datasets:
- custom
pipeline_tag: text-generation
---

# gemma4-31b-ja-agent-coder

**Japanese-enhanced agentic coding model** — Fine-tuned gemma4-31b-it for autonomous coding agents with Japanese language support.

## Highlights

- **Agentic behavior**: ReAct reasoning, multi-step tool calling, self-correction
- **Japanese coding**: Code generation, review, debugging in Japanese
- **Claude Code compatible**: Designed as a local subagent for Claude Code via MCP
- **Function calling**: Native Ollama/OpenAI tool use format
- **Zero API cost**: Runs locally on 20GB+ VRAM

## Benchmark Results

Evaluated on 12 task categories across agentic coding capabilities. Each criterion is scored 0-1, averaged per category (scale 0-10).

| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
|----------|:---:|:---:|:---:|
| ReAct Tool Call | 10.0 | **10.0** | — |
| Function Calling | 8.0 | **10.0** | +2.0 |
| Multi-step ReAct | 8.0 | **10.0** | +2.0 |
| JP Code Gen (API) | 10.0 | **10.0** | — |
| JP Code Gen (Algorithm) | 10.0 | **10.0** | — |
| JP Code Gen (Database) | 9.0 | **10.0** | +1.0 |
| JP Debug (TypeError) | 10.0 | **10.0** | — |
| JP Debug (KeyError) | 10.0 | **10.0** | — |
| JP Code Review | 8.0 | **10.0** | +2.0 |
| JP Git Strategy | 10.0 | **10.0** | — |
| JP Self-correction | 10.0 | **10.0** | — |
| JP Documentation | 10.0 | **10.0** | — |
| **Overall** | **9.4** | **10.0** | **+0.6** |

### Key Improvements

- **Function Calling**: Clean `<tool_call>` JSON format output (base model adds extra explanation)
- **Multi-step ReAct**: Structured JSON reasoning with proper Thought/Action/Observation flow
- **Code Review**: Parameterized query suggestions for SQL injection fixes
- **Database CRUD**: Complete Create/Read/Update/Delete coverage

### Inference Test Results (v2 adapter)

| Test | Input | Result |
|------|-------|--------|
| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" | Clean Python with `/healthz` endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" | Japanese explanation + fix code |
| Function Calling | "Use read_file to read README.md" | Clean `<tool_call>` JSON format |

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | google/gemma-4-31b-it |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 133M / 31B (0.43%) |
| Training data | 1,546 custom samples (v2) |
| Epochs | 2 (3rd epoch interrupted, checkpoint-388 used) |
| Learning rate | 1.5e-4 (cosine) |
| Final loss | 0.98 |
| Token accuracy | 96.8% |
| Training time | ~1.5 hours |
| Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |

## Training Data Categories

| Category | Samples | Description |
|----------|---------|-------------|
| ReAct Tool Calling | ~120 | Single/chained tool calls |
| Multi-step Agentic Trajectory | ~100 | Plan→Tool→Observe→Correct→Answer loops |
| Self-correction | ~40 | Error recovery patterns |
| Function Calling | ~50 | Ollama native tool format |
| Japanese Code Generation | ~200 | JP instruction → Python/TS code |
| Japanese Code Review | ~100 | Security, refactoring, best practices |
| Japanese Error Explanation | ~80 | Error → JP diagnosis + fix |
| Japanese Comprehension | ~50 | Reading, reasoning, summarization |
| Debugging & Troubleshooting | ~100 | Error analysis → root cause → fix |
| Git & CI/CD | ~80 | Branch strategy, PR, GitHub Actions |
| Project Planning | ~80 | Requirements → task decomposition |
| Technical Documentation | ~80 | README, API docs, specs |
| Algorithms & Data Structures | ~200 | Binary search, DP, graph, sorting |
| Web Frameworks | ~200 | FastAPI, Django, React, Next.js |
| Database Operations | ~150 | SQLAlchemy, PostgreSQL, Redis |
| Testing & DevOps | ~150 | pytest, Docker, K8s, Terraform |

## Use with Ollama

```bash
# After GGUF conversion
ollama create gemma4-ja-agent-coder -f Modelfile
ollama run gemma4-ja-agent-coder
```

## Use with helix-agents (Claude Code MCP)

Reduce Claude Code API token consumption by delegating routine tasks to this local model.

```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```

## Use with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                          bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-31b-it",
                                              quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
```

> **Note**: Gemma4 uses `Gemma4ClippableLinear` which requires a PEFT monkey-patch. See [this gist](https://gist.github.com/) for the workaround.

## License

Apache 2.0 (same as base model)

## Author

[tsunamayo7](https://github.com/tsunamayo7) — Builder of [helix-agents](https://github.com/tsunamayo7/helix-agents), a local LLM delegation framework for Claude Code.