pstan's picture
Add files using upload-large-folder tool
cf38c3b verified
|
Raw
History Blame Contribute Delete
2.23 kB
metadata
license: other
base_model: MiniMaxAI/MiniMax-M3
pipeline_tag: image-text-to-text
library_name: transformers
tags:
  - minimax-m3
  - fp8
  - compressed-tensors
  - llm-compressor
  - vllm
  - rocm
  - conversational
  - image-text-to-text

MiniMax-M3-FP8-dynamic

Model Overview

This model is an FP8 dynamic quantized version of MiniMaxAI/MiniMax-M3.

  • Base model: MiniMaxAI/MiniMax-M3
  • Optimization: FP8 dynamic quantization
  • Format: safetensors / compressed-tensors
  • Validated runtime: vLLM OpenAI-compatible server
  • Tested hardware: AMD MI350, tensor parallel size 8

MiniMax-M3 is a native multimodal MoE model. The original model card describes it as a ~428B parameter model with ~23B activated parameters and 1M context support.

License

This quantized checkpoint follows the license terms of the base model, MiniMaxAI/MiniMax-M3. The Hugging Face model-card metadata uses license: other because the MiniMax community license is not one of the Hub's enumerated license identifiers.

Model Optimizations

This checkpoint uses FP8 dynamic quantization to reduce memory and disk requirements while preserving model quality. Validation below compares this quantized checkpoint against the BF16 MiniMaxAI/MiniMax-M3 baseline.

Evaluation

The model was evaluated against BF16 MiniMaxAI/MiniMax-M3. Scores are averaged across seeds.

Benchmark MiniMaxAI/MiniMax-M3 EmbeddedLLM/MiniMax-M3-FP8-dynamic Recovery (%)
GSM8k Platinum 95.81 95.92 100.12
IfEval 80.65 79.42 98.47
AIME 2025 20.83 19.17 92.00
GPQA diamond 77.78 77.95 100.22
Math 500 81.20 79.93 98.44
Lcb Codegeneration V6 37.14 35.62 95.90
MMLU Pro Chat 79.85 79.62 99.72

Evaluation Setup

  • Standard seeds: 42, 1234, 4158
  • AIME 2025 seeds: 42, 1234, 4158, 5322, 1356, 9843, 3344, 5678
  • GSM8K Platinum cap: max_gen_toks=64000
  • IFEval, AIME, GPQA, Math 500, MMLU Pro Chat cap: max_gen_toks=4096
  • LiveCodeBench v6 cap: max_gen_toks=2048
  • MiniMax thinking mode: disabled
  • Runners: lm-eval harness and lighteval through LiteLLM endpoint mode