YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Kimi K2.6 — Fixed Chat Template v2 (Generic Tool Call Format)

A community-fixed Jinja2 chat template for Kimi K2.6 (all quant formats — GGUF, MLX, etc.) that resolves tool calling failures across all major inference engines by replacing Kimi's native special tokens with a generic format that llama.cpp, ik_llama.cpp, oMLX, and LM Studio can actually parse.

Drop-in replacement for the default tokenizer_config.json chat template.

The Core Problem

Kimi K2.6 uses unique special tokens for tool calls (<|tool_call_begin|>, <|tool_call_argument_begin|>, <|tool_calls_section_begin|>, etc.). The only inference engine with a native parser for these tokens is vLLM. Every other engine — llama.cpp, ik_llama.cpp, oMLX, LM Studio, KoboldCpp — either strips these tokens silently or fails to parse them, resulting in:

Tool calls being swallowed entirely (model generates correct output, parser eats it)
Empty responses after tool use (harness never sees the tool call)
Native tool parser failed (ValueError: No tool call found.) errors in oMLX
Tool call content being returned as plain text instead of structured function calls This template solves the problem by having the model output tool calls in the standard <tool_call> JSON format that every inference engine's generic parser understands.

Changes from Stock Template

v2 — Generic Tool Call Format (this version)

Issue	Stock / v1 Behavior	v2 Fix
Native special tokens for tool calls	Outputs `<\|tool_call_begin\|>`, `<\|tool_call_argument_begin\|>`, etc. — only parseable by vLLM	Outputs generic `<tool_call>{"name": "...", "arguments": {...}}</tool_call>` format understood by all engines
Tool response format	Uses `## Return of {{ tool_call_id }}` — non-standard	Uses `<tool_response>...</tool_response>` tags matching generic format
System prompt tool instructions	Shows native token format as example	Shows `<tool_call>` JSON format so model follows the generic pattern
Missing function name	Stock template never outputs `function.name` in tool calls	Function name included in JSON output

v1 Fixes (retained in v2)

Fix	Description
Auto-close `<think>` before tool calls	Prevents reasoning from leaking into structured tool output
Strict tool calling rules	System prompt includes behavioral rules for reliable tool use
String-form argument handling	Handles model outputting arguments as JSON string instead of object
Both `</think>` and `</thinking>` recognized	Supports both close tag variants
Think toggles	`<\|think_on\|>` / `<\|think_off\|>` in any message
Developer role	Supports `developer` role messages
Cross-runtime compatible	No `.get()` calls, no `is sequence` — works on limited Jinja runtimes
Historical reasoning hidden	Previous turns' think blocks stripped to save tokens

Preserved from Original

Kimi role tokens: <|im_user|>, <|im_assistant|>, <|im_system|>, <|im_middle|>, <|im_end|>
Vision tokens: <|media_begin|>, <|media_content|>, <|media_pad|>, <|media_end|>
Video placeholder: <|kimi_k25_video_placeholder|>
History/suffix split for reasoning preservation (Kimi's training approach)
preserve_thinking flag for debugging
tools_ts_str support for pre-formatted tool strings
Named roles via message.name

Important Note

This template changes the tool call output format from Kimi's native tokens to a generic format. The model was trained with native tokens, so there is a possibility it may occasionally revert to its trained format. In testing, providing clear format instructions in the system prompt (which this template does) is sufficient to guide the model to use the generic format consistently. If you see native tokens leaking through, try lowering temperature or adding reinforcing instructions to your system prompt.

Installation

Option 1: llama.cpp / llama-server (recommended)

llama-server -m your-kimi-model.gguf \
  --jinja \
  --chat-template-file kimi_k2.6_fixed_template_v2.jinja

Option 2: ik_llama.cpp

ik_llama_server -m your-kimi-model.gguf \
  --jinja \
  --chat-template-file kimi_k2.6_fixed_template_v2.jinja

For thinking control via API:

ik_llama_server -m your-kimi-model.gguf \
  --jinja \
  --chat-template-file kimi_k2.6_fixed_template_v2.jinja \
  --chat-template-kwargs '{"enable_thinking":false}' \
  --reasoning-budget 0

Option 3: Replace in tokenizer_config.json

Copy the contents of kimi_k2.6_fixed_template_v2.jinja into the chat_template field of your model's tokenizer_config.json.

Option 4: LM Studio

Go to My Models → model settings → Prompt Template and paste the template contents.

Tool Call Format Comparison

Stock template output (broken on most engines):

<|tool_calls_section_begin|>
<|tool_call_begin|>read
call_abc123<|tool_call_argument_begin|>{"filePath": "/src/main.py"}<|tool_call_end|>
<|tool_calls_section_end|>

v2 template output (works everywhere):

<tool_call>
{"name": "read", "arguments": {"filePath": "/src/main.py"}}
</tool_call>

Tested With

Kimi K2.6 GGUF (various quants) via llama.cpp and ik_llama.cpp
Kimi K2.6 MLX (DQ3_K_M-q8) via oMLX on M3 Ultra 512GB
Kilo Code (VS Code) — agentic coding workflows
LM Studio — template loads without Jinja errors

Known Limitations

The model may occasionally emit native Kimi tool call tokens despite template instructions, especially at higher temperatures. Lower temperature (0.6 or below) improves format compliance.
Vision/multimodal tool calling is untested — the media tokens are preserved but tool calls involving image analysis may behave differently.
The tools_ts_str pre-formatted tool string path is preserved but untested with the generic format.

Credits

Template fixes by Hunterx, based on the community fix pattern established by:

fakezeta's merged Qwen 3.6 template
allanchan339's vLLM Qwen chat template fix
froggeric's Qwen Fixed Chat Templates
ubergarm's Kimi K2.6 GGUF — approach of making Kimi "behave like Qwen3.6" for llama.cpp compatibility

Kimi K2.6 official model
MiniMax M2.7 Fixed Template (same fix pattern, native tokens preserved since MiniMax parsers work)
ubergarm's Kimi K2.6 GGUF quants (ik_llama.cpp optimized quants)

License

Same license as the original Kimi K2.6 model.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Hunterx
/

Kimi_K2.6_ToolCall_Template