File size: 5,598 Bytes

95c1d5f
f8adfb4
 
 
 
95c1d5f
 
f8adfb4
 
 
 
 
 
 
 
 
 
 
95c1d5f
 
f8adfb4
95c1d5f
f8adfb4
95c1d5f
f8adfb4
95c1d5f
f8adfb4
 
 
 
 
95c1d5f
b07b0f2
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
95c1d5f
 
f8adfb4
 
 
 
 
 
 
 
b07b0f2
 
 
 
 
 
f8adfb4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b07b0f2
f8adfb4
 
 
 
 
 
b07b0f2
 
f8adfb4
 
 
 
 
 
 
 
 
 
 
 
 
 
b07b0f2
f8adfb4
b07b0f2
 
 
 
 
 
 
 
f8adfb4
 
b07b0f2
 
f8adfb4
 
 
 
 
 
b07b0f2

---
language:
- ja
- en
license: apache-2.0
base_model: google/gemma-4-31b-it
tags:
- gemma4
- code
- agent
- japanese
- qlora
- react
- mcp
- claude-code
datasets:
- custom
pipeline_tag: text-generation
---

# gemma4-31b-ja-agent-coder

**Japanese-enhanced agentic coding model** — Fine-tuned gemma4-31b-it for autonomous coding agents with Japanese language support.

## Highlights

- **Agentic behavior**: ReAct reasoning, multi-step tool calling, self-correction
- **Japanese coding**: Code generation, review, debugging in Japanese
- **Claude Code compatible**: Designed as a local subagent for Claude Code via MCP
- **Function calling**: Native Ollama/OpenAI tool use format
- **Zero API cost**: Runs locally on 20GB+ VRAM

## Benchmark Results

Evaluated on 12 task categories across agentic coding capabilities. Each criterion is scored 0-1, averaged per category (scale 0-10).

| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
|----------|:---:|:---:|:---:|
| ReAct Tool Call | 10.0 | **10.0** | — |
| Function Calling | 8.0 | **10.0** | +2.0 |
| Multi-step ReAct | 8.0 | **10.0** | +2.0 |
| JP Code Gen (API) | 10.0 | **10.0** | — |
| JP Code Gen (Algorithm) | 10.0 | **10.0** | — |
| JP Code Gen (Database) | 9.0 | **10.0** | +1.0 |
| JP Debug (TypeError) | 10.0 | **10.0** | — |
| JP Debug (KeyError) | 10.0 | **10.0** | — |
| JP Code Review | 8.0 | **10.0** | +2.0 |
| JP Git Strategy | 10.0 | **10.0** | — |
| JP Self-correction | 10.0 | **10.0** | — |
| JP Documentation | 10.0 | **10.0** | — |
| **Overall** | **9.4** | **10.0** | **+0.6** |

### Key Improvements

- **Function Calling**: Clean `<tool_call>` JSON format output (base model adds extra explanation)
- **Multi-step ReAct**: Structured JSON reasoning with proper Thought/Action/Observation flow
- **Code Review**: Parameterized query suggestions for SQL injection fixes
- **Database CRUD**: Complete Create/Read/Update/Delete coverage

### Inference Test Results (v2 adapter)

| Test | Input | Result |
|------|-------|--------|
| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" | Clean Python with `/healthz` endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" | Japanese explanation + fix code |
| Function Calling | "Use read_file to read README.md" | Clean `<tool_call>` JSON format |

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | google/gemma-4-31b-it |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 133M / 31B (0.43%) |
| Training data | 1,546 custom samples (v2) |
| Epochs | 2 (3rd epoch interrupted, checkpoint-388 used) |
| Learning rate | 1.5e-4 (cosine) |
| Final loss | 0.98 |
| Token accuracy | 96.8% |
| Training time | ~1.5 hours |
| Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |

## Training Data Categories

| Category | Samples | Description |
|----------|---------|-------------|
| ReAct Tool Calling | ~120 | Single/chained tool calls |
| Multi-step Agentic Trajectory | ~100 | Plan→Tool→Observe→Correct→Answer loops |
| Self-correction | ~40 | Error recovery patterns |
| Function Calling | ~50 | Ollama native tool format |
| Japanese Code Generation | ~200 | JP instruction → Python/TS code |
| Japanese Code Review | ~100 | Security, refactoring, best practices |
| Japanese Error Explanation | ~80 | Error → JP diagnosis + fix |
| Japanese Comprehension | ~50 | Reading, reasoning, summarization |
| Debugging & Troubleshooting | ~100 | Error analysis → root cause → fix |
| Git & CI/CD | ~80 | Branch strategy, PR, GitHub Actions |
| Project Planning | ~80 | Requirements → task decomposition |
| Technical Documentation | ~80 | README, API docs, specs |
| Algorithms & Data Structures | ~200 | Binary search, DP, graph, sorting |
| Web Frameworks | ~200 | FastAPI, Django, React, Next.js |
| Database Operations | ~150 | SQLAlchemy, PostgreSQL, Redis |
| Testing & DevOps | ~150 | pytest, Docker, K8s, Terraform |

## Use with Ollama

```bash
# After GGUF conversion
ollama create gemma4-ja-agent-coder -f Modelfile
ollama run gemma4-ja-agent-coder
```

## Use with helix-agents (Claude Code MCP)

Reduce Claude Code API token consumption by delegating routine tasks to this local model.

```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```

## Use with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                          bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-31b-it",
                                              quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
```

> **Note**: Gemma4 uses `Gemma4ClippableLinear` which requires a PEFT monkey-patch. See [this gist](https://gist.github.com/) for the workaround.

## License

Apache 2.0 (same as base model)

## Author

[tsunamayo7](https://github.com/tsunamayo7) — Builder of [helix-agents](https://github.com/tsunamayo7/helix-agents), a local LLM delegation framework for Claude Code.