---
license: other
base_model: MiniMaxAI/MiniMax-M3
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- minimax-m3
- fp8
- compressed-tensors
- llm-compressor
- vllm
- rocm
- conversational
- image-text-to-text
---

# MiniMax-M3-FP8-dynamic

## Model Overview

This model is an FP8 dynamic quantized version of [MiniMaxAI/MiniMax-M3](https://huggingface.co/MiniMaxAI/MiniMax-M3).

- Base model: `MiniMaxAI/MiniMax-M3`
- Optimization: FP8 dynamic quantization
- Format: safetensors / compressed-tensors
- Validated runtime: vLLM OpenAI-compatible server
- Tested hardware: AMD MI350, tensor parallel size 8

MiniMax-M3 is a native multimodal MoE model. The original model card describes it as a ~428B parameter model with ~23B activated parameters and 1M context support.

## License

This quantized checkpoint follows the license terms of the base model, [MiniMaxAI/MiniMax-M3](https://huggingface.co/MiniMaxAI/MiniMax-M3).

The Hugging Face model-card metadata uses `license: other` because the MiniMax community license is not one of the Hub's enumerated license identifiers.

## Model Optimizations

This checkpoint uses FP8 dynamic quantization to reduce memory and disk requirements while preserving model quality. Validation below compares this quantized checkpoint against the BF16 `MiniMaxAI/MiniMax-M3` baseline.

## Evaluation

The model was evaluated against BF16 `MiniMaxAI/MiniMax-M3`. Scores are averaged across seeds.

| Benchmark | MiniMaxAI/MiniMax-M3 | EmbeddedLLM/MiniMax-M3-FP8-dynamic | Recovery (%) |
|---|---:|---:|---:|
| GSM8K Platinum | 95.81 | 95.92 | 100.12 |
| IFEval | 80.65 | 79.42 | 98.47 |
| AIME 2025 | 20.83 | 19.17 | 92.00 |
| GPQA Diamond | 77.78 | 77.95 | 100.22 |
| Math 500 | 81.20 | 79.93 | 98.44 |
| LiveCodeBench v6 (code generation) | 37.14 | 35.62 | 95.90 |
| MMLU Pro Chat | 79.85 | 79.62 | 99.72 |

## Evaluation Setup

- Standard seeds: `42, 1234, 4158`
- AIME 2025 seeds: `42, 1234, 4158, 5322, 1356, 9843, 3344, 5678`
- GSM8K Platinum cap: `max_gen_toks=64000`
- IFEval, AIME, GPQA, Math 500, MMLU Pro Chat cap: `max_gen_toks=4096`
- LiveCodeBench v6 cap: `max_gen_toks=2048`
- MiniMax thinking mode: disabled
- Runners: lm-eval harness and lighteval through LiteLLM endpoint mode
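
## Reading the Recovery Column

This card does not state how `Recovery (%)` is computed. The sketch below assumes the common definition, the quantized score divided by the BF16 baseline score, times 100, and recomputes it from the rounded scores in the evaluation table. The `recovery` helper and the `SCORES` dictionary are illustrative names, not part of any evaluation tool. Under this assumption the recomputed values agree with the reported ones to within a few hundredths of a point, which is consistent with the reported column having been derived from unrounded seed averages.

```python
# Scores copied from the evaluation table:
# (BF16 baseline, FP8 dynamic, reported recovery %)
SCORES = {
    "GSM8K Platinum": (95.81, 95.92, 100.12),
    "IFEval": (80.65, 79.42, 98.47),
    "AIME 2025": (20.83, 19.17, 92.00),
    "GPQA Diamond": (77.78, 77.95, 100.22),
    "Math 500": (81.20, 79.93, 98.44),
    "LiveCodeBench v6": (37.14, 35.62, 95.90),
    "MMLU Pro Chat": (79.85, 79.62, 99.72),
}


def recovery(baseline: float, quantized: float) -> float:
    """Assumed definition: quantized score as a percentage of the baseline."""
    return 100.0 * quantized / baseline


for name, (bf16, fp8, reported) in SCORES.items():
    recomputed = recovery(bf16, fp8)
    # Small gaps are expected because the table scores are already rounded.
    print(f"{name:<18} recomputed={recomputed:7.2f}  reported={reported:7.2f}")
```

A recovery above 100 means the quantized checkpoint scored slightly higher than the baseline on that benchmark.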