MLX

MLX is an array framework for machine learning on Apple silicon that also works with CUDA. On Apple silicon, arrays stay in shared memory to avoid data copies between CPU and GPU. Lazy computation enables graph manipulation and optimizations. Native safetensors support means Transformers language models run directly on MLX.
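
Lazy computation is easy to see in a few lines: operations only build a graph, and nothing runs until a result is needed. A minimal sketch using the standard mlx.core API:

import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = a @ a.T + 1.0  # builds a computation graph; nothing is computed yet
mx.eval(b)         # materializes the result
print(b.shape)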

Install the mlx-lm and Transformers libraries.

pip install mlx-lm transformers
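
To confirm the install, a quick sanity check (this assumes only the standard mlx.core API):

python -c "import mlx.core as mx; print(mx.default_device())"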

Load any Transformers language model from the Hub as long as the model architecture is supported. No weight conversion is required.

from mlx_lm import load, generate

# Download the checkpoint from the Hub (or use a local path) and
# return the MLX model together with its tokenizer.
model, tokenizer = load("openai/gpt-oss-20b")

# Generate up to 100 new tokens and return them as a single string.
output = generate(
    model,
    tokenizer,
    prompt="The capital of France is",
    max_tokens=100,
)
print(output)
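
For incremental output, mlx-lm also provides stream_generate. This sketch assumes a recent mlx-lm release, where it yields response objects carrying the newly generated text in a text attribute:

from mlx_lm import load, stream_generate

model, tokenizer = load("openai/gpt-oss-20b")
for response in stream_generate(
    model,
    tokenizer,
    prompt="The capital of France is",
    max_tokens=100,
):
    print(response.text, end="", flush=True)  # print each segment as it arrives
print()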

Transformers integration

  • mlx_lm.load loads safetensors weights and returns a model and tokenizer.
  • MLX loads weight arrays keyed by tensor name and maps them into an MLX nn.Module parameter tree, which matches how Transformers checkpoints are organized (see the sketch after this list).
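
A minimal sketch of that mapping, assuming a local safetensors shard (the file path below is hypothetical). mx.load returns a flat dict of arrays keyed by tensor name:

import mlx.core as mx

# mx.load reads a safetensors file into {tensor_name: mx.array},
# e.g. "model.layers.0.self_attn.q_proj.weight" -> array
weights = mx.load("model.safetensors")  # hypothetical local path
for name, array in list(weights.items())[:5]:
    print(name, array.shape, array.dtype)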

The MLX Transformers integration is bidirectional. Transformers can also load and run MLX weights from the Hub.
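
For example, an MLX community checkpoint published as standard (non-quantized) safetensors should load with the usual Transformers API; the repo id below is illustrative:

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "mlx-community/Llama-3.2-1B-Instruct"  # illustrative repo id
model = AutoModelForCausalLM.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)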

Resources

  • MLX documentation
  • mlx-lm repository containing MLX LLM implementations
  • mlx-vlm community library with VLM implementations