update guides
docs/sglang_deploy_guide.md
CHANGED
@@ -1,5 +1,7 @@
 # MiniMax M2 Model SGLang Deployment Guide
 
+[English Version](./sglang_deploy_guide.md) | [Chinese Version](./sglang_deploy_guide_cn.md)
+
 We recommend using [SGLang](https://github.com/sgl-project/sglang) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. SGLang is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing SGLang's official documentation to check hardware compatibility before deployment.
 
 ## Applicable Models
docs/sglang_deploy_guide_cn.md
CHANGED
@@ -1,5 +1,7 @@
 # MiniMax M2 Model SGLang Deployment Guide
 
+[English Version](./sglang_deploy_guide.md) | [Chinese Version](./sglang_deploy_guide_cn.md)
+
 We recommend using [SGLang](https://github.com/sgl-project/sglang) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. SGLang is a high-performance inference engine offering excellent serving throughput, efficient and intelligent memory management, powerful batch request processing, and deeply optimized low-level performance. We recommend reviewing SGLang's official documentation to check hardware compatibility before deployment.
 
 ## Models Covered by This Document
docs/tool_calling_guide.md
CHANGED
@@ -1,5 +1,7 @@
 # MiniMax-M2 Tool Calling Guide
 
+[English Version](./tool_calling_guide.md) | [Chinese Version](./tool_calling_guide_cn.md)
+
 ## Introduction
 
 The MiniMax-M2 model supports tool calling capabilities, enabling the model to identify when external tools need to be called and output tool call parameters in a structured format. This document provides detailed instructions on how to use the tool calling features of MiniMax-M2.
docs/tool_calling_guide_cn.md
CHANGED
@@ -1,5 +1,7 @@
 # MiniMax-M2 Tool Calling Guide
 
+[English Version](./tool_calling_guide.md) | [Chinese Version](./tool_calling_guide_cn.md)
+
 ## Introduction
 
 The MiniMax-M2 model supports tool calling, enabling the model to identify when external tools need to be called and to output tool call parameters in a structured format. This document provides detailed instructions on how to use the tool calling features of MiniMax-M2.
docs/vllm_deploy_guide.md
CHANGED
@@ -1,5 +1,7 @@
 # MiniMax M2 Model vLLM Deployment Guide
 
+[English Version](./vllm_deploy_guide.md) | [Chinese Version](./vllm_deploy_guide_cn.md)
+
 We recommend using [vLLM](https://docs.vllm.ai/en/stable/) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. vLLM is a high-performance inference engine with excellent serving throughput, efficient and intelligent memory management, powerful batch request processing capabilities, and deeply optimized underlying performance. We recommend reviewing vLLM's official documentation to check hardware compatibility before deployment.
 
 ## Applicable Models
docs/vllm_deploy_guide_cn.md
CHANGED
@@ -1,5 +1,7 @@
 # MiniMax M2 Model vLLM Deployment Guide
 
+[English Version](./vllm_deploy_guide.md) | [Chinese Version](./vllm_deploy_guide_cn.md)
+
 We recommend using [vLLM](https://docs.vllm.ai/en/stable/) to deploy the [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) model. vLLM is a high-performance inference engine offering excellent serving throughput, efficient and intelligent memory management, powerful batch request processing, and deeply optimized low-level performance. We recommend reviewing vLLM's official documentation to check hardware compatibility before deployment.
 
 ## Models Covered by This Document