Instructions to use mlx-community/MiniMax-M2.7-3bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/MiniMax-M2.7-3bit with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/MiniMax-M2.7-3bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use mlx-community/MiniMax-M2.7-3bit with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/MiniMax-M2.7-3bit"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "mlx-community/MiniMax-M2.7-3bit" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use mlx-community/MiniMax-M2.7-3bit with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/MiniMax-M2.7-3bit"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/MiniMax-M2.7-3bit
Run Hermes
hermes
- MLX LM
How to use mlx-community/MiniMax-M2.7-3bit with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "mlx-community/MiniMax-M2.7-3bit"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "mlx-community/MiniMax-M2.7-3bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "mlx-community/MiniMax-M2.7-3bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
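The same chat request can also be sent from Python using only the standard library. This is a minimal sketch, not part of the official instructions: it assumes the server is reachable on port 8080 (the port used in the Pi and Hermes base URLs above), and the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally and listening on port 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "mlx-community/MiniMax-M2.7-3bit"


def build_chat_request(user_text, base_url=BASE_URL, model=MODEL_ID):
    """Build (without sending) a POST request for the chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello")` should return the model's reply as a string. If the server was started on a different port, pass a matching `base_url` to `build_chat_request`.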
Low quality quant
I tried this quant with the oMLX backend server, and it seems pretty broken: it confuses simple numbers in tasks that require calculations and goes into infinite loops. I tried the recommended sampling parameters as well as others.
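For anyone who wants to flag the looping behaviour automatically when comparing quants, a rough heuristic is to check whether the output ends with the same chunk of text repeated several times in a row. The sketch below is only an illustration: the function name and thresholds are hypothetical, and it is not how oMLX or mlx-lm handles repetition.

```python
def has_repeating_tail(text, min_unit=8, min_repeats=3):
    """Return True if text ends with one chunk repeated back to back.

    min_unit is the shortest chunk length (in characters) to consider and
    min_repeats is how many consecutive copies count as a loop. Both are
    arbitrary illustrative defaults.
    """
    max_unit = len(text) // min_repeats
    for unit in range(min_unit, max_unit + 1):
        chunk = text[-unit:]
        if text.endswith(chunk * min_repeats):
            return True
    return False
```

Running it on each generated response (or on the tail of a streamed one) gives a cheap way to count how often a given quant gets stuck, although it will miss loops that vary slightly between repetitions.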
It could also be that M2.7 is more sensitive to quantization?
@bibproj thanks,
I remember reading that MiniMax M2.x is more sensitive to aggressive quantization compared to Qwen3 or Qwen3.5 models.
I think I tried the MiniMax M2.1 3-bit MLX quant with LM Studio in the past, and it was OK.
Sorry, I already deleted this 3-bit quant. I’m curious to try some of the mixed/dynamic MLX versions that seem to be popular now. I have a Mac Studio with 128 GB of memory, so I’m looking for something that fits and leaves room for a decent context.
Ubergarm is quite good at this. He normally makes his quants with ik_llama.cpp, with good results. It is not MLX, but it normally works on Macs as well. You can find his quants for MiniMax-M2.7 at https://huggingface.co/ubergarm/MiniMax-M2.7-GGUF. Try the smol-IQ3_KS version at https://huggingface.co/ubergarm/MiniMax-M2.7-GGUF/tree/main/smol-IQ3_KS, which is 93.7 GB. That sounds about right for your 128 GB Mac Studio.
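As a quick sanity check on that fit, the memory left over for context is simply total memory minus the model size. The sketch below works through the numbers from this thread; the `system_reserve_gb` parameter and the 16 GB value used for it are hypothetical placeholders for whatever the OS and other apps need, and both sizes are treated as being in the same unit.

```python
def context_headroom_gb(total_ram_gb, model_size_gb, system_reserve_gb=0.0):
    """Memory left for KV cache and activations after loading the weights."""
    return total_ram_gb - model_size_gb - system_reserve_gb


# Numbers from the thread: a 128 GB Mac Studio and the 93.7 GB smol-IQ3_KS quant.
headroom = context_headroom_gb(128, 93.7)
print(f"Headroom with nothing reserved: {headroom:.1f} GB")

# Hypothetical: keep 16 GB aside for the OS and other applications.
reserved = context_headroom_gb(128, 93.7, system_reserve_gb=16)
print(f"Headroom with 16 GB reserved: {reserved:.1f} GB")
```

So the quant leaves roughly 34 GB before anything else is accounted for, and less once the system's own usage is subtracted. How much context that buys depends on the model's KV cache size per token, which is not covered here.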