Instructions to use mlx-community/MiniMax-M2.7-3bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use mlx-community/MiniMax-M2.7-3bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/MiniMax-M2.7-3bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

How to use mlx-community/MiniMax-M2.7-3bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/MiniMax-M2.7-3bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "mlx-community/MiniMax-M2.7-3bit"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use mlx-community/MiniMax-M2.7-3bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/MiniMax-M2.7-3bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/MiniMax-M2.7-3bit

Run Hermes

hermes

OpenClaw

How to use mlx-community/MiniMax-M2.7-3bit with OpenClaw:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/MiniMax-M2.7-3bit"

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "mlx-community/MiniMax-M2.7-3bit" \
  --custom-provider-id mlx-lm \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

MLX LM

How to use mlx-community/MiniMax-M2.7-3bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "mlx-community/MiniMax-M2.7-3bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "mlx-community/MiniMax-M2.7-3bit"
# Calling the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 unless --port is given)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "mlx-community/MiniMax-M2.7-3bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
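
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server is reachable at http://localhost:8080/v1 (the base URL used in the Pi configuration in this document). The helper names `build_chat_request` and `chat` are illustrative and not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: the server runs locally on port 8080; adjust BASE_URL if you
# started mlx_lm.server with a different --host or --port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "mlx-community/MiniMax-M2.7-3bit"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, timeout=600):
    """Send the prompt to the server and return the assistant's reply text."""
    request = build_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. The generous timeout is a guess to allow for slow first-token latency on a large model.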

Low quality quant

by DaniDubi - opened Apr 13

Discussion

DaniDubi

Apr 13

I tried this quant with the oMLX backend server, and it seems pretty broken: it confuses simple numbers in tasks that require calculations and goes into infinite loops. I tried the recommended sampling params as well as others.

bibproj

MLX Community org Apr 13

@DaniDubi

Hi. I created this using the standard procedure, exactly the same as the one for MiniMax-M2.5-3bit. Could you please also try the same prompts using the standard MLX-LM, just in case oMLX is not working well with it?
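
One way to make such a side-by-side check repeatable is a small harness that feeds identical prompts to each backend and flags outputs that appear to loop. This is only a sketch: `looks_like_loop`, `compare_backends`, the 8-word window and the repeat threshold are illustrative assumptions, not part of MLX-LM or oMLX, and each backend is represented by a plain callable that you would supply yourself.

```python
from collections import Counter


def looks_like_loop(text, ngram=8, max_repeats=4):
    """Heuristic: True if any run of `ngram` words occurs more than `max_repeats` times."""
    words = text.split()
    counts = Counter(
        tuple(words[i:i + ngram]) for i in range(len(words) - ngram + 1)
    )
    return any(n > max_repeats for n in counts.values())


def compare_backends(prompts, backends):
    """Run every prompt through every backend and record whether the output loops.

    `backends` maps a label to a callable that takes a prompt and returns text.
    """
    report = []
    for prompt in prompts:
        for label, generate_fn in backends.items():
            output = generate_fn(prompt)
            report.append({
                "prompt": prompt,
                "backend": label,
                "looping": looks_like_loop(output),
                "output": output,
            })
    return report
```

In practice one callable could wrap `mlx_lm.generate` with the loaded model and another could call the oMLX server over HTTP. Keeping prompts and sampling parameters identical across the two is what makes the comparison meaningful. The heuristic only catches verbatim repetition, so arithmetic mistakes still have to be checked by reading the outputs.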

bibproj

MLX Community org Apr 13

It could also be that M2.7 is more sensitive to quantization?

DaniDubi

Apr 13

@bibproj thanks,
I remember reading that MiniMax M2.x is more sensitive to aggressive quantization compared to Qwen3 or Qwen3.5 models.

In the past I tried, I think, the MiniMax M2.1 3-bit MLX quant with LM Studio, and it was OK.

Sorry, I already deleted this 3-bit quant. I'm curious to try some of the mixed/dynamic MLX versions that seem to be popular now. I have a Mac Studio with 128 GB of memory, so I'm looking for something that fits and leaves room for a decent context.

bibproj

MLX Community org Apr 13

Ubergarm is quite good with this. He normally does this using ik_llama.cpp, with good results. It is not MLX, but it normally also works on Macs. You can find his quants for MiniMax-M2.7 at https://huggingface.co/ubergarm/MiniMax-M2.7-GGUF. Try the smol-IQ3_KS version at https://huggingface.co/ubergarm/MiniMax-M2.7-GGUF/tree/main/smol-IQ3_KS, which is 93.7 GB. That sounds about right for your 128GB Mac Studio.
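
The sizing in this exchange is simple subtraction, sketched below as a rough estimate. It treats the 128 GB and 93.7 GB figures from the thread as the same unit, and the `reserved_gb` parameter is a hypothetical allowance for the OS and other apps, not a measured value.

```python
def headroom_gb(total_memory_gb, model_size_gb, reserved_gb=0.0):
    """Memory left for context (KV cache) after the weights and a reserved amount."""
    return total_memory_gb - model_size_gb - reserved_gb


# Figures from the thread: a 128 GB Mac Studio and the 93.7 GB smol-IQ3_KS quant.
weights_only = round(headroom_gb(128, 93.7), 1)

# Same calculation with a made-up 8 GB set aside for the OS and other apps.
with_reserve = round(headroom_gb(128, 93.7, reserved_gb=8), 1)
```

Here `weights_only` is 34.3 and `with_reserve` is 26.3. How much context that remainder actually buys depends on the runtime and its cache settings, which the thread does not specify.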
