| | --- |
| | language: |
| | - en |
| | license: mit |
| | pipeline_tag: text-generation |
| | tags: |
| | - mlx |
| | - mixture-of-experts |
| | - moe |
| | - pruning |
| | - reap |
| | - minimax |
| | - 4bit |
| | - quantized |
| | - apple-silicon |
| | library_name: mlx |
| | base_model: Akicou/MiniMax-M2-5-REAP-29 |
| | --- |
| | |
| | <p align="center"> |
| | <a href="https://vmlx.net"> |
| | <img src="vmlx-logo.png" alt="vMLX" width="120"> |
| | </a> |
| | </p> |
| | |
| | # MiniMax-M2.5 REAP-29 — MLX 4-bit |
| |
|
| | MLX 4-bit quantized version of [Akicou/MiniMax-M2-5-REAP-29](https://huggingface.co/Akicou/MiniMax-M2-5-REAP-29) for efficient local inference on Apple Silicon. |
| |
|
| | - **Quantization**: 4-bit (group size 64, affine mode; router gates at 8-bit) |
| | - **Architecture**: MiniMax M2.5 MoE — 62 layers, 180 experts (REAP-pruned from 256), 8 active per token |
| | - **Context**: 196K tokens |
| | - **Size**: ~85 GB |
| | - **Pruning**: 29% of experts removed via [REAP](https://github.com/CerebrasResearch/reap) (Router Expert Activation Pruning) |
| |
|
| | ## Usage |
| |
|
| | ```python |
| | from mlx_lm import load, generate |
| | |
| | model, tokenizer = load("shieldstackllc/MiniMax-M2.5-REAP-29-mlx-4bit") |
| | response = generate(model, tokenizer, prompt="Hello!", verbose=True) |
| | ``` |
| |
|
| | Or with [vMLX](https://vmlx.net) for native macOS inference. |
| |
|
| | ## About |
| |
|
| | MiniMax-M2.5 is a large Mixture-of-Experts language model by MiniMax AI. This variant was pruned to 29% fewer experts by [Akicou](https://huggingface.co/Akicou) using REAP (Router Expert Activation Pruning), reducing model size and memory footprint while maintaining strong performance. MLX quantization by [vMLX](https://vmlx.net). |
| |
|
| | ## Also Available |
| |
|
| | - [MiniMax-M2.5-REAP-39 MLX 4-bit](https://huggingface.co/shieldstackllc/MiniMax-M2-5-REAP-39-mlx-4bit) (~73 GB) — 39% pruned variant |
| | - [MiniMax-M2.5-REAP-39 MLX 8-bit](https://huggingface.co/shieldstackllc/MiniMax-M2-5-REAP-39-mlx-8bit) (~138 GB) — 39% pruned variant |
| |
|
| | ## Made for vMLX |
| |
|
| | This model was converted and optimized for [vMLX](https://vmlx.net) — a free, open source macOS native MLX inference engine for Apple Silicon. Download vMLX to run this model locally with zero configuration. |
| |
|
| | ## Credits |
| |
|
| | - **Base model**: [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) by MiniMax AI |
| | - **REAP pruning**: [Akicou/MiniMax-M2-5-REAP-29](https://huggingface.co/Akicou/MiniMax-M2-5-REAP-29) by Akicou |
| | - **MLX conversion**: [vMLX](https://vmlx.net) — Run AI locally on Mac. No compromises. |
| |
|
| | ## Contact |
| |
|
| | For questions, issues, or collaboration: **admin@vmlx.net** |
| |
|