---
language:
- ja
- en
license: apache-2.0
base_model: google/gemma-4-31b-it
tags:
- gemma4
- code
- agent
- japanese
- qlora
- react
- mcp
- claude-code
datasets:
- custom
pipeline_tag: text-generation
---

# gemma4-31b-ja-agent-coder

**Japanese-enhanced agentic coding model** — Fine-tuned gemma4-31b-it for autonomous coding agents with Japanese language support.

## Highlights

- **Agentic behavior**: ReAct reasoning, multi-step tool calling, self-correction
- **Japanese coding**: Code generation, review, and debugging in Japanese
- **Claude Code compatible**: Designed as a local subagent for Claude Code via MCP
- **Function calling**: Native Ollama/OpenAI tool-use format
- **Zero API cost**: Runs locally on GPUs with 20 GB+ VRAM

## Benchmark Results

Evaluated on 12 task categories covering agentic coding capabilities. Each criterion is scored 0-1, then averaged per category and scaled to 0-10.

| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
|----------|:---:|:---:|:---:|
| ReAct Tool Call | 10.0 | **10.0** | — |
| Function Calling | 8.0 | **10.0** | +2.0 |
| Multi-step ReAct | 8.0 | **10.0** | +2.0 |
| JP Code Gen (API) | 10.0 | **10.0** | — |
| JP Code Gen (Algorithm) | 10.0 | **10.0** | — |
| JP Code Gen (Database) | 9.0 | **10.0** | +1.0 |
| JP Debug (TypeError) | 10.0 | **10.0** | — |
| JP Debug (KeyError) | 10.0 | **10.0** | — |
| JP Code Review | 8.0 | **10.0** | +2.0 |
| JP Git Strategy | 10.0 | **10.0** | — |
| JP Self-correction | 10.0 | **10.0** | — |
| JP Documentation | 10.0 | **10.0** | — |
| **Overall** | **9.4** | **10.0** | **+0.6** |

### Key Improvements

- **Function Calling**: Clean JSON tool-call output (the base model appends extra explanation)
- **Multi-step ReAct**: Structured JSON reasoning with a proper Thought/Action/Observation flow
- **Code Review**: Suggests parameterized queries for SQL injection fixes
- **Database CRUD**: Complete Create/Read/Update/Delete coverage

### Inference Test Results (v2 adapter)

| Test | Input | Result |
|------|-------|--------|
| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" (create a health-check endpoint with FastAPI) | Clean Python with `/healthz` endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" (cause and fix) | Japanese explanation + fix code |
| Function Calling | "Use read_file to read README.md" | Clean JSON tool-call format |

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | google/gemma-4-31b-it |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 133M / 31B (0.43%) |
| Training data | 1,546 custom samples (v2) |
| Epochs | 2 (3rd epoch interrupted; checkpoint-388 used) |
| Learning rate | 1.5e-4 (cosine schedule) |
| Final loss | 0.98 |
| Token accuracy | 96.8% |
| Training time | ~1.5 hours |
| Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |

## Training Data Categories

| Category | Samples | Description |
|----------|---------|-------------|
| ReAct Tool Calling | ~120 | Single/chained tool calls |
| Multi-step Agentic Trajectory | ~100 | Plan→Tool→Observe→Correct→Answer loops |
| Self-correction | ~40 | Error recovery patterns |
| Function Calling | ~50 | Ollama native tool format |
| Japanese Code Generation | ~200 | JP instruction → Python/TS code |
| Japanese Code Review | ~100 | Security, refactoring, best practices |
| Japanese Error Explanation | ~80 | Error → JP diagnosis + fix |
| Japanese Comprehension | ~50 | Reading, reasoning, summarization |
| Debugging & Troubleshooting | ~100 | Error analysis → root cause → fix |
| Git & CI/CD | ~80 | Branch strategy, PRs, GitHub Actions |
| Project Planning | ~80 | Requirements → task decomposition |
| Technical Documentation | ~80 | READMEs, API docs, specs |
| Algorithms & Data Structures | ~200 | Binary search, DP, graphs, sorting |
| Web Frameworks | ~200 | FastAPI, Django, React, Next.js |
| Database Operations | ~150 | SQLAlchemy, PostgreSQL, Redis |
| Testing & DevOps | ~150 | pytest, Docker, K8s, Terraform |

## Use with Ollama

```bash
# After GGUF conversion
ollama create gemma4-ja-agent-coder -f Modelfile
ollama run gemma4-ja-agent-coder
```

## Use with helix-agents (Claude Code MCP)

Delegating routine tasks to this local model reduces Claude Code API token consumption.

```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```

## Use with transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31b-it", quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
```

> **Note**: Gemma4 uses `Gemma4ClippableLinear`, which requires a PEFT monkey-patch. See [this gist](https://gist.github.com/) for the workaround.

## License

Apache 2.0 (same as the base model)

## Author

[tsunamayo7](https://github.com/tsunamayo7) — Builder of [helix-agents](https://github.com/tsunamayo7/helix-agents), a local LLM delegation framework for Claude Code.
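## Appendix: Consuming ReAct Output

As a rough illustration of the ReAct behavior the inference tests above describe, a client can parse the model's JSON step and dispatch it to a local tool. The sketch below is illustrative only: the `thought`/`action`/`action_input` key names and the `read_file` tool mirror the examples in this card, but the exact schema your deployment emits may differ, and the tool registry here is a hypothetical stub.

```python
import json

# Hypothetical tool registry; a real agent would do sandboxed file I/O here.
def read_file(path: str) -> str:
    return f"<contents of {path}>"  # stub, no real file access

TOOLS = {"read_file": read_file}

def dispatch(reply: str) -> str:
    """Parse one ReAct-style JSON step and run the requested tool."""
    step = json.loads(reply)
    tool = TOOLS[step["action"]]         # e.g. "read_file"
    return tool(**step["action_input"])  # e.g. {"path": "src/main.py"}

# A reply shaped like the model's ReAct step (assumed schema):
reply = (
    '{"thought": "I should read the file first.", '
    '"action": "read_file", '
    '"action_input": {"path": "src/main.py"}}'
)
print(dispatch(reply))  # → <contents of src/main.py>
```

The tool's return value would then be fed back to the model as the Observation for the next ReAct step.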