---
language:
- ja
- en
license: apache-2.0
base_model: google/gemma-4-31b-it
tags:
- gemma4
- code
- agent
- japanese
- qlora
- react
- mcp
- claude-code
datasets:
- custom
pipeline_tag: text-generation
---

# gemma4-31b-ja-agent-coder

**Japanese-enhanced agentic coding model** — Fine-tuned gemma4-31b-it for autonomous coding agents with Japanese language support.

## Highlights

- **Agentic behavior**: ReAct reasoning, multi-step tool calling, self-correction
- **Japanese coding**: Code generation, review, and debugging in Japanese
- **Claude Code compatible**: Designed as a local subagent for Claude Code via MCP
- **Function calling**: Native Ollama/OpenAI tool-use format
- **Zero API cost**: Runs locally on GPUs with 20 GB+ VRAM

## Benchmark Results

Evaluated on 12 task categories covering agentic coding capabilities. Each criterion is scored 0-1, then averaged per category and scaled to 0-10.

| Category | Base (gemma4-31b-it) | Fine-tuned (v2) | Delta |
|----------|:---:|:---:|:---:|
| ReAct Tool Call | 10.0 | **10.0** | — |
| Function Calling | 8.0 | **10.0** | +2.0 |
| Multi-step ReAct | 8.0 | **10.0** | +2.0 |
| JP Code Gen (API) | 10.0 | **10.0** | — |
| JP Code Gen (Algorithm) | 10.0 | **10.0** | — |
| JP Code Gen (Database) | 9.0 | **10.0** | +1.0 |
| JP Debug (TypeError) | 10.0 | **10.0** | — |
| JP Debug (KeyError) | 10.0 | **10.0** | — |
| JP Code Review | 8.0 | **10.0** | +2.0 |
| JP Git Strategy | 10.0 | **10.0** | — |
| JP Self-correction | 10.0 | **10.0** | — |
| JP Documentation | 10.0 | **10.0** | — |
| **Overall** | **9.4** | **10.0** | **+0.6** |

### Key Improvements

- **Function Calling**: Clean JSON tool-call output (the base model appends extra explanation)
- **Multi-step ReAct**: Structured JSON reasoning with a proper Thought/Action/Observation flow
- **Code Review**: Suggests parameterized queries for SQL injection fixes
- **Database CRUD**: Complete Create/Read/Update/Delete coverage

### Inference Test Results (v2 adapter)

| Test | Input | Result |
|------|-------|--------|
| ReAct | "Read src/main.py using read_file tool" | Correct JSON with thought + action |
| JP Code Gen | "FastAPIでヘルスチェックエンドポイントを作成" (create a health-check endpoint with FastAPI) | Clean Python with `/healthz` endpoint |
| JP Debug | "TypeError: 'NoneType' is not subscriptable の原因と修正" (cause and fix) | Japanese explanation + fix code |
| Function Calling | "Use read_file to read README.md" | Clean JSON tool-call format |

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | google/gemma-4-31b-it |
| Method | QLoRA (4-bit NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Trainable params | 133M / 31B (0.43%) |
| Training data | 1,546 custom samples (v2) |
| Epochs | 2 (3rd epoch interrupted; checkpoint-388 used) |
| Learning rate | 1.5e-4 (cosine schedule) |
| Final loss | 0.98 |
| Token accuracy | 96.8% |
| Training time | ~1.5 hours |
| Hardware | NVIDIA RTX PRO 6000 (96GB VRAM) |

## Training Data Categories

| Category | Samples | Description |
|----------|---------|-------------|
| ReAct Tool Calling | ~120 | Single/chained tool calls |
| Multi-step Agentic Trajectory | ~100 | Plan→Tool→Observe→Correct→Answer loops |
| Self-correction | ~40 | Error recovery patterns |
| Function Calling | ~50 | Ollama native tool format |
| Japanese Code Generation | ~200 | JP instruction → Python/TS code |
| Japanese Code Review | ~100 | Security, refactoring, best practices |
| Japanese Error Explanation | ~80 | Error → JP diagnosis + fix |
| Japanese Comprehension | ~50 | Reading, reasoning, summarization |
| Debugging & Troubleshooting | ~100 | Error analysis → root cause → fix |
| Git & CI/CD | ~80 | Branch strategy, PRs, GitHub Actions |
| Project Planning | ~80 | Requirements → task decomposition |
| Technical Documentation | ~80 | READMEs, API docs, specs |
| Algorithms & Data Structures | ~200 | Binary search, DP, graphs, sorting |
| Web Frameworks | ~200 | FastAPI, Django, React, Next.js |
| Database Operations | ~150 | SQLAlchemy, PostgreSQL, Redis |
| Testing & DevOps | ~150 | pytest, Docker, K8s, Terraform |

## Use with Ollama

```bash
# After GGUF conversion
ollama create gemma4-ja-agent-coder -f Modelfile
ollama run gemma4-ja-agent-coder
```

## Use with helix-agents (Claude Code MCP)

Delegating routine tasks to this local model reduces Claude Code API token consumption.

```json
{
  "mcpServers": {
    "helix-agents": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/helix-agent", "python", "server.py"]
    }
  }
}
```

## Use with transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31b-it", quantization_config=bnb, device_map="auto"
)
model = PeftModel.from_pretrained(base, "Tsunamayo7/gemma4-31b-ja-agent-coder")
tokenizer = AutoTokenizer.from_pretrained("Tsunamayo7/gemma4-31b-ja-agent-coder")
```

> **Note**: Gemma4 uses `Gemma4ClippableLinear`, which requires a PEFT monkey-patch. See [this gist](https://gist.github.com/) for the workaround.

## License

Apache 2.0 (same as the base model)

## Author

[tsunamayo7](https://github.com/tsunamayo7) — Builder of [helix-agents](https://github.com/tsunamayo7/helix-agents), a local LLM delegation framework for Claude Code.
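## Appendix: Consuming ReAct Output

As a rough illustration of the ReAct behavior the inference tests above describe, a client can parse the model's JSON step and dispatch it to a local tool. The sketch below is illustrative only: the `thought`/`action`/`action_input` key names and the `read_file` tool mirror the examples in this card, but the exact schema your deployment emits may differ, and the tool registry here is a hypothetical stub.

```python
import json

# Hypothetical tool registry; a real agent would do sandboxed file I/O here.
def read_file(path: str) -> str:
    return f"<contents of {path}>"  # stub, no real file access

TOOLS = {"read_file": read_file}

def dispatch(reply: str) -> str:
    """Parse one ReAct-style JSON step and run the requested tool."""
    step = json.loads(reply)
    tool = TOOLS[step["action"]]         # e.g. "read_file"
    return tool(**step["action_input"])  # e.g. {"path": "src/main.py"}

# A reply shaped like the model's ReAct step (assumed schema):
reply = (
    '{"thought": "I should read the file first.", '
    '"action": "read_file", '
    '"action_input": {"path": "src/main.py"}}'
)
print(dispatch(reply))  # → <contents of src/main.py>
```

The tool's return value would then be fed back to the model as the Observation for the next ReAct step.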