MiniMax M2.7 — Fixed Chat Template

A community-fixed Jinja2 chat template for MiniMax M2.7 (all quant formats — GGUF, MLX, etc.) that resolves tool calling failures, thinking block corruption, and improves multi-turn agent reliability.

Drop-in replacement for the default tokenizer_config.json chat template.

Why This Fix?

The stock MiniMax M2.7 chat template has several issues that cause tool calling to silently break, especially in agentic coding workflows (Kilo Code, Continue, OpenClaw, etc.). These failures don't produce errors — the model just generates malformed output that the harness can't parse.

This fixed template resolves all known issues while maintaining full backward compatibility with the original MiniMax M2.7 special token format.

Changes from Stock Template

Critical Fixes

Issue	Stock Behavior	Fixed Behavior
Unclosed `<think>` before tool calls	Model reasoning leaks into `<minimax:tool_call>` XML, corrupting structured output	Auto-detects unclosed `<think>` tags and inserts `</think>` before the first tool call
No tool calling instructions	System prompt only shows format example, no behavioral rules	Injects strict rules: close thinking first, reasoning before not after tool calls, include all required params, preserve user strings verbatim
String-form tool arguments	Assumes `arguments` is always a dict — breaks with `TypeError` when model outputs JSON string	New `render_tool_arguments` macro parses string-form JSON into proper key-value parameters
Only `</think>` recognized	`</thinking>` variant is silently ignored, leaving reasoning content in the response body	Both `</think>` and `</thinking>` are recognized as valid close tags

Quality of Life Improvements

Feature	Description
Think toggles	`<\|think_on\|>` / `<\|think_off\|>` in system, user, or assistant messages to control thinking per-message
Developer role	Supports `developer` role as first message (same as `system` but distinct for frameworks that use it)
Historical reasoning hidden	Previous turns' `<think>` blocks are stripped from context by default, saving tokens. Only the current generation's thinking is shown. Set `preserve_thinking=true` to show all.
LM Studio compatible	Uses `is iterable` instead of `is sequence` for compatibility with LM Studio's limited Jinja runtime
Better error messages	Clearer exceptions for unexpected argument types and role ordering

Preserved from Original

MiniMax special tokens: ]~!b[, ]~b], [e~[ (BOS, role start, role end)
<minimax:tool_call> / </minimax:tool_call> tool call delimiters
<invoke> / <parameter> XML structure for tool calls
<response> wrapping for tool results with multi-response grouping
Default model identity string
current_date and current_location system message fields
ensure_ascii=False for non-ASCII safe JSON rendering

Installation

Option 1: Replace in tokenizer_config.json

Copy the contents of minimax_m2.7_fixed_template.jinja into the chat_template field of your model's tokenizer_config.json.

Option 2: LM Studio

Go to My Models → model settings → Prompt Template and paste the template contents.

Option 3: llama.cpp / llama-server

llama-server -m your-model.gguf --chat-template-file minimax_m2.7_fixed_template.jinja

Option 4: ik_llama.cpp

ik_llama_server -m your-model.gguf --chat-template-file minimax_m2.7_fixed_template.jinja

Tested With

MiniMax M2.7 GGUF Q8 (Unsloth) via llama.cpp / ik_llama.cpp on M3 Ultra 512GB — 21 t/s generation, clean tool calling
Kilo Code (VS Code) — repeated code edits with no tool call corruption
LM Studio — template loads and renders without errors
Hermes agent framework — multi-turn agentic workflows stable

Credits

Template fixes by Hunterx, based on the community fix pattern established by:

License

Same license as the original MiniMax M2.7 model.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Hunterx
/

MinimaxM2.7FixedTemplate