---
language:
- ja
- en
license: apache-2.0
base_model: google/gemma-4-31b-it
tags:
- gemma4
- code
- agent
- japanese
- qlora
- react
- mcp
- claude-code
datasets:
- custom
pipeline_tag: text-generation
---

| # gemma4-31b-ja-agent-coder |
|
|
**Japanese-enhanced agentic coding model**: gemma4-31b-it fine-tuned for autonomous coding agents with Japanese language support.
|
|
| ## Highlights |
|
|
| - **Agentic behavior**: ReAct reasoning, multi-step tool calling, self-correction |
| - **Japanese coding**: Code generation, review, debugging in Japanese |
| - **Claude Code compatible**: Designed as a local subagent for Claude Code via MCP |
| - **Function calling**: Native Ollama/OpenAI tool use format |
- **Zero API cost**: Runs locally on a GPU with 20 GB+ of VRAM
|
|
| ## Benchmark Results |
|
|
Evaluated on 12 task categories covering agentic coding capabilities. Each criterion is scored on a 0-1 scale; per-category scores are the average across criteria, rescaled to 0-10.
|
|
| | Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta | |
| |----------|:---:|:---:|:---:| |
| | ReAct Tool Call | 10.0 | **10.0** | — | |
| | Function Calling | 8.0 | **10.0** | +2.0 | |
| | Multi-step ReAct | 8.0 | **10.0** | +2.0 | |
| | JP Code Gen (API) | 10.0 | **10.0** | — | |
| | JP Code Gen (Algorithm) | 10.0 | **10.0** | — | |
| | JP Code Gen (Database) | 9.0 | **10.0** | +1.0 | |
| | JP Debug (TypeError) | 10.0 | **10.0** | — | |
| | JP Debug (KeyError) | 10.0 | **10.0** | — | |
| | JP Code Review | 8.0 | **10.0** | +2.0 | |
| | JP Git Strategy | 10.0 | **10.0** | — | |
| | JP Self-correction | 10.0 | **10.0** | — | |
| | JP Documentation | 10.0 | **10.0** | — | |
| | **Overall** | **9.4** | **10.0** | **+0.6** | |
|
|
| ### Key Improvements |
|
|
| - **Function Calling**: Clean `<tool_call>` JSON format output (base model adds extra explanation) |
| - **Multi-step ReAct**: Structured JSON reasoning with proper Thought/Action/Observation flow |
- **Code Review**: Parameterized query suggestions for SQL injection fixes (see the sketch after this list)
| - **Database CRUD**: Complete Create/Read/Update/Delete coverage |
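
As a concrete example of that code-review behavior, the parameterized-query fix looks like this (an illustrative snippet, not a sample from the training set):

```python
# Illustrative snippet (not from the training set): the class of fix the
# model suggests for SQL injection findings during code review.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

user_id = "1 OR 1=1"  # hostile input

# Vulnerable: attacker-controlled input is spliced into the SQL string.
#   conn.execute(f"SELECT name FROM users WHERE id = {user_id}")

# Fixed: the driver binds the value, so it cannot alter the query shape.
rows = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchall()
print(rows)  # []: the malicious string no longer matches every row
```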
|
|
| ### Inference Test Results (v2 adapter) |
|
|
| | Test | Input | Result | |
| |------|-------|--------| |
| | ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action | |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" (create a health-check endpoint with FastAPI) | Clean Python with `/healthz` endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" (cause and fix for the TypeError) | Japanese explanation + fix code |
| | Function Calling | "Use read_file to read README.md" | Clean `<tool_call>` JSON format | |
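
For illustration, output in the `<tool_call>` style referenced above can be extracted like this (the tag name and JSON schema in this sketch are assumptions for demonstration, not the model's guaranteed format):

```python
# Illustrative parser for a <tool_call> style response; the exact tag and
# JSON schema are assumptions, not a documented output contract.
import json
import re

sample = '<tool_call>{"name": "read_file", "arguments": {"path": "README.md"}}</tool_call>'

match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", sample, re.DOTALL)
if match:
    call = json.loads(match.group(1))
    print(call["name"], call["arguments"])  # read_file {'path': 'README.md'}
```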
|
|
| ## Training Details |
|
|
| | Parameter | Value | |
| |-----------|-------| |
| | Base model | google/gemma-4-31b-it | |
| | Method | QLoRA (4-bit NF4) | |
| | LoRA rank | 16 | |
| | LoRA alpha | 32 | |
| | Target modules | q/k/v/o_proj, gate/up/down_proj | |
| | Trainable params | 133M / 31B (0.43%) | |
| | Training data | 1,546 custom samples (v2) | |
| | Epochs | 2 (3rd epoch interrupted, checkpoint-388 used) | |
| | Learning rate | 1.5e-4 (cosine) | |
| | Final loss | 0.98 | |
| | Token accuracy | 96.8% | |
| | Training time | ~1.5 hours | |
| | Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) | |
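
For reference, the adapter settings in the table map onto a PEFT `LoraConfig` roughly as follows (values the table does not list, such as dropout and bias handling, are assumptions, not taken from the actual run):

```python
# PEFT configuration matching the training table above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # LoRA rank (from the table)
    lora_alpha=32,   # LoRA alpha (from the table)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,  # assumption: not stated in the table
    bias="none",        # assumption
    task_type="CAUSAL_LM",
)
```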
|
|
| ## Training Data Categories |
|
|
| | Category | Samples | Description | |
| |----------|---------|-------------| |
| | ReAct Tool Calling | ~120 | Single/chained tool calls | |
| | Multi-step Agentic Trajectory | ~100 | Plan→Tool→Observe→Correct→Answer loops | |
| | Self-correction | ~40 | Error recovery patterns | |
| | Function Calling | ~50 | Ollama native tool format | |
| | Japanese Code Generation | ~200 | JP instruction → Python/TS code | |
| | Japanese Code Review | ~100 | Security, refactoring, best practices | |
| | Japanese Error Explanation | ~80 | Error → JP diagnosis + fix | |
| | Japanese Comprehension | ~50 | Reading, reasoning, summarization | |
| | Debugging & Troubleshooting | ~100 | Error analysis → root cause → fix | |
| | Git & CI/CD | ~80 | Branch strategy, PR, GitHub Actions | |
| | Project Planning | ~80 | Requirements → task decomposition | |
| | Technical Documentation | ~80 | README, API docs, specs | |
| | Algorithms & Data Structures | ~200 | Binary search, DP, graph, sorting | |
| | Web Frameworks | ~200 | FastAPI, Django, React, Next.js | |
| | Database Operations | ~150 | SQLAlchemy, PostgreSQL, Redis | |
| | Testing & DevOps | ~150 | pytest, Docker, K8s, Terraform | |
|
|
| ## Use with Ollama |
|
|
| ```bash |
| # After GGUF conversion |
| ollama create gemma4-ja-agent-coder -f Modelfile |
| ollama run gemma4-ja-agent-coder |
| ``` |
|
|
| ## Use with helix-agents (Claude Code MCP) |
|
|
| Reduce Claude Code API token consumption by delegating routine tasks to this local model. |
|
|
| ```json |
| { |
| "mcpServers": { |
| "helix-agents": { |
| "command": "uv", |
| "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"] |
| } |
| } |
| } |
| ``` |
|
|
| ## Use with transformers |
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# 4-bit NF4 quantization, matching the QLoRA training setup
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained("google/gemma-4-31b-it",
                                            quantization_config=bnb, device_map="auto")
# Attach the fine-tuned LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
```
|
|
> **Note**: Gemma4 uses `Gemma4ClippableLinear`, which requires a PEFT monkey-patch. See [this gist](https://gist.github.com/) for the workaround.
|
|
| ## License |
|
|
Apache 2.0 (same as the base model)
|
|
| ## Author |
|
|
| [tsunamayo7](https://github.com/tsunamayo7) — Builder of [helix-agents](https://github.com/tsunamayo7/helix-agents), a local LLM delegation framework for Claude Code. |
|
|