YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
Kimi K2.6 β Fixed Chat Template v2 (Generic Tool Call Format)
A community-fixed Jinja2 chat template for Kimi K2.6 (all quant formats β GGUF, MLX, etc.) that resolves tool calling failures across all major inference engines by replacing Kimi's native special tokens with a generic format that llama.cpp, ik_llama.cpp, oMLX, and LM Studio can actually parse.
Drop-in replacement for the default tokenizer_config.json chat template.
The Core Problem
Kimi K2.6 uses unique special tokens for tool calls (<|tool_call_begin|>, <|tool_call_argument_begin|>, <|tool_calls_section_begin|>, etc.). The only inference engine with a native parser for these tokens is vLLM. Every other engine β llama.cpp, ik_llama.cpp, oMLX, LM Studio, KoboldCpp β either strips these tokens silently or fails to parse them, resulting in:
- Tool calls being swallowed entirely (model generates correct output, parser eats it)
- Empty responses after tool use (harness never sees the tool call)
Native tool parser failed (ValueError: No tool call found.)errors in oMLX- Tool call content being returned as plain text instead of structured function calls
This template solves the problem by having the model output tool calls in the standard
<tool_call>JSON format that every inference engine's generic parser understands.
Changes from Stock Template
v2 β Generic Tool Call Format (this version)
| Issue | Stock / v1 Behavior | v2 Fix |
|---|---|---|
| Native special tokens for tool calls | Outputs <|tool_call_begin|>, <|tool_call_argument_begin|>, etc. β only parseable by vLLM |
Outputs generic <tool_call>{"name": "...", "arguments": {...}}</tool_call> format understood by all engines |
| Tool response format | Uses ## Return of {{ tool_call_id }} β non-standard |
Uses <tool_response>...</tool_response> tags matching generic format |
| System prompt tool instructions | Shows native token format as example | Shows <tool_call> JSON format so model follows the generic pattern |
| Missing function name | Stock template never outputs function.name in tool calls |
Function name included in JSON output |
v1 Fixes (retained in v2)
| Fix | Description |
|---|---|
Auto-close <think> before tool calls |
Prevents reasoning from leaking into structured tool output |
| Strict tool calling rules | System prompt includes behavioral rules for reliable tool use |
| String-form argument handling | Handles model outputting arguments as JSON string instead of object |
Both </think> and </thinking> recognized |
Supports both close tag variants |
| Think toggles | <|think_on|> / <|think_off|> in any message |
| Developer role | Supports developer role messages |
| Cross-runtime compatible | No .get() calls, no is sequence β works on limited Jinja runtimes |
| Historical reasoning hidden | Previous turns' think blocks stripped to save tokens |
Preserved from Original
- Kimi role tokens:
<|im_user|>,<|im_assistant|>,<|im_system|>,<|im_middle|>,<|im_end|> - Vision tokens:
<|media_begin|>,<|media_content|>,<|media_pad|>,<|media_end|> - Video placeholder:
<|kimi_k25_video_placeholder|> - History/suffix split for reasoning preservation (Kimi's training approach)
preserve_thinkingflag for debuggingtools_ts_strsupport for pre-formatted tool strings- Named roles via
message.name
Important Note
This template changes the tool call output format from Kimi's native tokens to a generic format. The model was trained with native tokens, so there is a possibility it may occasionally revert to its trained format. In testing, providing clear format instructions in the system prompt (which this template does) is sufficient to guide the model to use the generic format consistently. If you see native tokens leaking through, try lowering temperature or adding reinforcing instructions to your system prompt.
Installation
Option 1: llama.cpp / llama-server (recommended)
llama-server -m your-kimi-model.gguf \
--jinja \
--chat-template-file kimi_k2.6_fixed_template_v2.jinja
Option 2: ik_llama.cpp
ik_llama_server -m your-kimi-model.gguf \
--jinja \
--chat-template-file kimi_k2.6_fixed_template_v2.jinja
For thinking control via API:
ik_llama_server -m your-kimi-model.gguf \
--jinja \
--chat-template-file kimi_k2.6_fixed_template_v2.jinja \
--chat-template-kwargs '{"enable_thinking":false}' \
--reasoning-budget 0
Option 3: Replace in tokenizer_config.json
Copy the contents of kimi_k2.6_fixed_template_v2.jinja into the chat_template field of your model's tokenizer_config.json.
Option 4: LM Studio
Go to My Models β model settings β Prompt Template and paste the template contents.
Tool Call Format Comparison
Stock template output (broken on most engines):
<|tool_calls_section_begin|>
<|tool_call_begin|>read
call_abc123<|tool_call_argument_begin|>{"filePath": "/src/main.py"}<|tool_call_end|>
<|tool_calls_section_end|>
v2 template output (works everywhere):
<tool_call>
{"name": "read", "arguments": {"filePath": "/src/main.py"}}
</tool_call>
Tested With
- Kimi K2.6 GGUF (various quants) via llama.cpp and ik_llama.cpp
- Kimi K2.6 MLX (DQ3_K_M-q8) via oMLX on M3 Ultra 512GB
- Kilo Code (VS Code) β agentic coding workflows
- LM Studio β template loads without Jinja errors
Known Limitations
- The model may occasionally emit native Kimi tool call tokens despite template instructions, especially at higher temperatures. Lower temperature (0.6 or below) improves format compliance.
- Vision/multimodal tool calling is untested β the media tokens are preserved but tool calls involving image analysis may behave differently.
- The
tools_ts_strpre-formatted tool string path is preserved but untested with the generic format.
Credits
Template fixes by Hunterx, based on the community fix pattern established by:
- fakezeta's merged Qwen 3.6 template
- allanchan339's vLLM Qwen chat template fix
- froggeric's Qwen Fixed Chat Templates
- ubergarm's Kimi K2.6 GGUF β approach of making Kimi "behave like Qwen3.6" for llama.cpp compatibility
Related
- Kimi K2.6 official model
- MiniMax M2.7 Fixed Template (same fix pattern, native tokens preserved since MiniMax parsers work)
- ubergarm's Kimi K2.6 GGUF quants (ik_llama.cpp optimized quants)
License
Same license as the original Kimi K2.6 model.