MLX
MLX is an array framework for machine learning on Apple silicon that also supports CUDA. On Apple silicon, arrays live in shared memory, which avoids data copies between the CPU and GPU. Lazy computation enables graph manipulation and optimizations. Native safetensors support means Transformers language models run directly on MLX.
Install the mlx-lm library along with Transformers.
pip install mlx-lm transformers
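The lazy computation mentioned above is easy to see with the core mlx API, which mlx-lm pulls in as a dependency. A minimal sketch (the shapes are arbitrary):

import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = a @ a   # builds a computation graph; nothing runs yet
mx.eval(b)  # materializes the result on the default device
print(b.shape, b.dtype)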
Load any Transformers language model from the Hub as long as the model architecture is supported by mlx-lm. No weight conversion is required.
from mlx_lm import load, generate

# load() fetches the safetensors checkpoint from the Hub and returns
# an MLX model and its tokenizer
model, tokenizer = load("openai/gpt-oss-20b")

output = generate(
    model,
    tokenizer,
    prompt="The capital of France is",
    max_tokens=100,
)
print(output)
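mlx-lm can also stream tokens as they are decoded with stream_generate. A minimal sketch, assuming the same model as above; each yielded response carries the newly generated text in its text attribute:

from mlx_lm import load, stream_generate

model, tokenizer = load("openai/gpt-oss-20b")

for response in stream_generate(
    model,
    tokenizer,
    prompt="The capital of France is",
    max_tokens=100,
):
    print(response.text, end="", flush=True)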
Transformers integration
- mlx_lm.load loads safetensors weights and returns a model and tokenizer.
- MLX loads weight arrays keyed by tensor names and maps them into an MLX nn.Module parameter tree, which matches how Transformers checkpoints are organized (see the sketch after this list).
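A minimal sketch of that mapping, assuming a local model.safetensors file; the tensor name in the comment is illustrative:

import mlx.core as mx
from mlx.utils import tree_unflatten

# mx.load reads safetensors into a flat dict keyed by tensor name,
# e.g. {"model.layers.0.self_attn.q_proj.weight": array, ...}
weights = mx.load("model.safetensors")

# tree_unflatten turns the dotted names into the nested parameter tree
# that an MLX nn.Module consumes via model.update(...)
params = tree_unflatten(list(weights.items()))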
The MLX Transformers integration is bidirectional. Transformers can also load and run MLX weights from the Hub.
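A sketch of the reverse direction; the repo id below is a placeholder for any unquantized MLX checkpoint on the Hub with a supported architecture:

from transformers import AutoModelForCausalLM, AutoTokenizer

# placeholder repo id; substitute a real MLX checkpoint from the Hub
model_id = "mlx-community/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)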
Resources
- MLX documentation
- mlx-lm repository containing MLX LLM implementations
- mlx-vlm community library with VLM implementations