Instructions to use MiniMaxAI/MiniMax-M2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MiniMaxAI/MiniMax-M2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MiniMaxAI/MiniMax-M2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
HuggingChat
Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use MiniMaxAI/MiniMax-M2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MiniMaxAI/MiniMax-M2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MiniMaxAI/MiniMax-M2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/MiniMaxAI/MiniMax-M2

SGLang

How to use MiniMaxAI/MiniMax-M2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MiniMaxAI/MiniMax-M2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MiniMaxAI/MiniMax-M2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MiniMaxAI/MiniMax-M2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MiniMaxAI/MiniMax-M2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use MiniMaxAI/MiniMax-M2 with Docker Model Runner:
```
docker model run hf.co/MiniMaxAI/MiniMax-M2
```

sriting commited on Oct 27, 2025

Commit

a970723

1 Parent(s): 4167c37

update guides

Browse files

Files changed (6) hide show

docs/sglang_deploy_guide.md +2 -0
docs/sglang_deploy_guide_cn.md +2 -0
docs/tool_calling_guide.md +2 -0
docs/tool_calling_guide_cn.md +2 -0
docs/vllm_deploy_guide.md +2 -0
docs/vllm_deploy_guide_cn.md +2 -0

docs/sglang_deploy_guide.md CHANGED Viewed

@@ -1,5 +1,7 @@
 # MiniMax M2 Model SGLang Deployment Guide
 We recommend using [SGLang](https://github.com/sgl-project/sglang) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. SGLang is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing SGLang's official documentation to check hardware compatibility before deployment.
 ## Applicable Models

 # MiniMax M2 Model SGLang Deployment Guide
+[English Version](./sglang_deploy_guide.md) | [Chinese Version](./sglang_deploy_guide_cn.md)
 We recommend using [SGLang](https://github.com/sgl-project/sglang) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. SGLang is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing SGLang's official documentation to check hardware compatibility before deployment.
 ## Applicable Models

docs/sglang_deploy_guide_cn.md CHANGED Viewed

@@ -1,5 +1,7 @@
 # MiniMax M2 模型 SGLang 部署指南
 我们推荐使用 [SGLang](https://github.com/sgl-project/sglang) 来部署 [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) 模型。SGLang 是一个高性能的推理引擎，其具有卓越的服务吞吐、高效智能的内存管理机制、强大的批量请求处理能力、深度优化的底层性能等特性。我们建议在部署之前查看 SGLang 的官方文档以检查硬件兼容性。
 ## 本文档适用模型

 # MiniMax M2 模型 SGLang 部署指南
+[英文版](./sglang_deploy_guide.md) | [中文版](./sglang_deploy_guide_cn.md)
 我们推荐使用 [SGLang](https://github.com/sgl-project/sglang) 来部署 [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) 模型。SGLang 是一个高性能的推理引擎，其具有卓越的服务吞吐、高效智能的内存管理机制、强大的批量请求处理能力、深度优化的底层性能等特性。我们建议在部署之前查看 SGLang 的官方文档以检查硬件兼容性。
 ## 本文档适用模型

docs/tool_calling_guide.md CHANGED Viewed

@@ -1,5 +1,7 @@
 # MiniMax-M2 Tool Calling Guide
 ## Introduction
 The MiniMax-M2 model supports tool calling capabilities, enabling the model to identify when external tools need to be called and output tool call parameters in a structured format. This document provides detailed instructions on how to use the tool calling features of MiniMax-M2.

 # MiniMax-M2 Tool Calling Guide
+[English Version](./tool_calling_guide.md) | [Chinese Version](./tool_calling_guide_cn.md)
 ## Introduction
 The MiniMax-M2 model supports tool calling capabilities, enabling the model to identify when external tools need to be called and output tool call parameters in a structured format. This document provides detailed instructions on how to use the tool calling features of MiniMax-M2.

docs/tool_calling_guide_cn.md CHANGED Viewed

@@ -1,5 +1,7 @@
 # MiniMax-M2 工具调用指南
 ## 简介
 MiniMax-M2 模型支持工具调用功能，使模型能够识别何时需要调用外部工具，并以结构化格式输出工具调用参数。本文档提供了有关如何使用 MiniMax-M2 工具调用功能的详细说明。

 # MiniMax-M2 工具调用指南
+[英文版](./tool_calling_guide.md) | [中文版](./tool_calling_guide_cn.md)
 ## 简介
 MiniMax-M2 模型支持工具调用功能，使模型能够识别何时需要调用外部工具，并以结构化格式输出工具调用参数。本文档提供了有关如何使用 MiniMax-M2 工具调用功能的详细说明。

docs/vllm_deploy_guide.md CHANGED Viewed

@@ -1,5 +1,7 @@
 # MiniMax M2 Model vLLM Deployment Guide
 We recommend using [vLLM](https://docs.vllm.ai/en/stable/) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. vLLM is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing vLLM's official documentation to check hardware compatibility before deployment.
 ## Applicable Models

 # MiniMax M2 Model vLLM Deployment Guide
+[English Version](./vllm_deploy_guide.md) | [Chinese Version](./vllm_deploy_guide_cn.md)
 We recommend using [vLLM](https://docs.vllm.ai/en/stable/) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. vLLM is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing vLLM's official documentation to check hardware compatibility before deployment.
 ## Applicable Models

docs/vllm_deploy_guide_cn.md CHANGED Viewed

@@ -1,5 +1,7 @@
 # MiniMax M2 模型 vLLM 部署指南
 我们推荐使用 [vLLM](https://docs.vllm.ai/en/stable/) 来部署 [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) 模型。vLLM 是一个高性能的推理引擎，其具有卓越的服务吞吐、高效智能的内存管理机制、强大的批量请求处理能力、深度优化的底层性能等特性。我们建议在部署之前查看 vLLM 的官方文档以检查硬件兼容性。
 ## 本文档适用模型

 # MiniMax M2 模型 vLLM 部署指南
+[英文版](./vllm_deploy_guide.md) | [中文版](./vllm_deploy_guide_cn.md)
 我们推荐使用 [vLLM](https://docs.vllm.ai/en/stable/) 来部署 [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) 模型。vLLM 是一个高性能的推理引擎，其具有卓越的服务吞吐、高效智能的内存管理机制、强大的批量请求处理能力、深度优化的底层性能等特性。我们建议在部署之前查看 vLLM 的官方文档以检查硬件兼容性。
 ## 本文档适用模型