YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Kimi K2.6 β€” Fixed Chat Template v2 (Generic Tool Call Format)

A community-fixed Jinja2 chat template for Kimi K2.6 (all quant formats β€” GGUF, MLX, etc.) that resolves tool calling failures across all major inference engines by replacing Kimi's native special tokens with a generic format that llama.cpp, ik_llama.cpp, oMLX, and LM Studio can actually parse.

Drop-in replacement for the default tokenizer_config.json chat template.

The Core Problem

Kimi K2.6 uses unique special tokens for tool calls (<|tool_call_begin|>, <|tool_call_argument_begin|>, <|tool_calls_section_begin|>, etc.). The only inference engine with a native parser for these tokens is vLLM. Every other engine β€” llama.cpp, ik_llama.cpp, oMLX, LM Studio, KoboldCpp β€” either strips these tokens silently or fails to parse them, resulting in:

  • Tool calls being swallowed entirely (model generates correct output, parser eats it)
  • Empty responses after tool use (harness never sees the tool call)
  • Native tool parser failed (ValueError: No tool call found.) errors in oMLX
  • Tool call content being returned as plain text instead of structured function calls This template solves the problem by having the model output tool calls in the standard <tool_call> JSON format that every inference engine's generic parser understands.

Changes from Stock Template

v2 β€” Generic Tool Call Format (this version)

Issue Stock / v1 Behavior v2 Fix
Native special tokens for tool calls Outputs <|tool_call_begin|>, <|tool_call_argument_begin|>, etc. β€” only parseable by vLLM Outputs generic <tool_call>{"name": "...", "arguments": {...}}</tool_call> format understood by all engines
Tool response format Uses ## Return of {{ tool_call_id }} β€” non-standard Uses <tool_response>...</tool_response> tags matching generic format
System prompt tool instructions Shows native token format as example Shows <tool_call> JSON format so model follows the generic pattern
Missing function name Stock template never outputs function.name in tool calls Function name included in JSON output

v1 Fixes (retained in v2)

Fix Description
Auto-close <think> before tool calls Prevents reasoning from leaking into structured tool output
Strict tool calling rules System prompt includes behavioral rules for reliable tool use
String-form argument handling Handles model outputting arguments as JSON string instead of object
Both </think> and </thinking> recognized Supports both close tag variants
Think toggles <|think_on|> / <|think_off|> in any message
Developer role Supports developer role messages
Cross-runtime compatible No .get() calls, no is sequence β€” works on limited Jinja runtimes
Historical reasoning hidden Previous turns' think blocks stripped to save tokens

Preserved from Original

  • Kimi role tokens: <|im_user|>, <|im_assistant|>, <|im_system|>, <|im_middle|>, <|im_end|>
  • Vision tokens: <|media_begin|>, <|media_content|>, <|media_pad|>, <|media_end|>
  • Video placeholder: <|kimi_k25_video_placeholder|>
  • History/suffix split for reasoning preservation (Kimi's training approach)
  • preserve_thinking flag for debugging
  • tools_ts_str support for pre-formatted tool strings
  • Named roles via message.name

Important Note

This template changes the tool call output format from Kimi's native tokens to a generic format. The model was trained with native tokens, so there is a possibility it may occasionally revert to its trained format. In testing, providing clear format instructions in the system prompt (which this template does) is sufficient to guide the model to use the generic format consistently. If you see native tokens leaking through, try lowering temperature or adding reinforcing instructions to your system prompt.

Installation

Option 1: llama.cpp / llama-server (recommended)

llama-server -m your-kimi-model.gguf \
  --jinja \
  --chat-template-file kimi_k2.6_fixed_template_v2.jinja

Option 2: ik_llama.cpp

ik_llama_server -m your-kimi-model.gguf \
  --jinja \
  --chat-template-file kimi_k2.6_fixed_template_v2.jinja

For thinking control via API:

ik_llama_server -m your-kimi-model.gguf \
  --jinja \
  --chat-template-file kimi_k2.6_fixed_template_v2.jinja \
  --chat-template-kwargs '{"enable_thinking":false}' \
  --reasoning-budget 0

Option 3: Replace in tokenizer_config.json

Copy the contents of kimi_k2.6_fixed_template_v2.jinja into the chat_template field of your model's tokenizer_config.json.

Option 4: LM Studio

Go to My Models β†’ model settings β†’ Prompt Template and paste the template contents.

Tool Call Format Comparison

Stock template output (broken on most engines):

<|tool_calls_section_begin|>
<|tool_call_begin|>read
call_abc123<|tool_call_argument_begin|>{"filePath": "/src/main.py"}<|tool_call_end|>
<|tool_calls_section_end|>

v2 template output (works everywhere):

<tool_call>
{"name": "read", "arguments": {"filePath": "/src/main.py"}}
</tool_call>

Tested With

  • Kimi K2.6 GGUF (various quants) via llama.cpp and ik_llama.cpp
  • Kimi K2.6 MLX (DQ3_K_M-q8) via oMLX on M3 Ultra 512GB
  • Kilo Code (VS Code) β€” agentic coding workflows
  • LM Studio β€” template loads without Jinja errors

Known Limitations

  • The model may occasionally emit native Kimi tool call tokens despite template instructions, especially at higher temperatures. Lower temperature (0.6 or below) improves format compliance.
  • Vision/multimodal tool calling is untested β€” the media tokens are preserved but tool calls involving image analysis may behave differently.
  • The tools_ts_str pre-formatted tool string path is preserved but untested with the generic format.

Credits

Template fixes by Hunterx, based on the community fix pattern established by:

Related

License

Same license as the original Kimi K2.6 model.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support