How to use runanywhere/whisper_medium_4bit with MLX:
```shell
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir whisper_medium_4bit runanywhere/whisper_medium_4bit
```
A custom MLX 4-bit quantization of OpenAI's Whisper Medium, optimized for MetalRT GPU inference on Apple Silicon.
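For intuition, here is a minimal, simplified sketch of group-wise affine 4-bit quantization, the general scheme behind MLX's 4-bit mode. This is an illustrative toy, not MLX's implementation: the real kernels pack eight 4-bit values per 32-bit word and run on the GPU, and the helper names below are hypothetical.

```python
# Toy group-wise 4-bit quantization: each group of weights shares one
# scale and bias, and each weight is stored as an integer in [0, 15].
def quantize_4bit(weights, group_size=32):
    """Quantize floats to 4-bit ints with a per-group scale and bias."""
    groups = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / 15 or 1.0          # 4 bits -> 16 levels
        q = [round((w - lo) / scale) for w in g]
        groups.append((q, scale, lo))
    return groups

def dequantize(groups):
    """Reconstruct approximate floats from the packed representation."""
    out = []
    for q, scale, bias in groups:
        out.extend(v * scale + bias for v in q)
    return out

w = [0.05 * i - 0.8 for i in range(64)]        # toy weight vector
restored = dequantize(quantize_4bit(w))
max_err = max(abs(a - b) for a, b in zip(w, restored))
```

The reconstruction error is bounded by half the group's quantization step, which is why group-wise scales keep 4-bit weights usable for inference.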
Used by RCLI with the MetalRT engine for speech-to-text:
```shell
rcli setup   # select MetalRT or Both engines
```
Note: Whisper Medium is in GPU beta. Whisper Tiny is recommended for production use.
| Metric | Value |
|---|---|
| Latency (1.2s audio) | 233 ms |
| RTF (real-time factor) | 0.19x |
| Quantization | MLX 4-bit |
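The RTF row is simply processing time divided by audio duration, which can be checked against the latency figure above:

```python
# Real-time factor (RTF) = processing time / audio duration.
latency_s = 0.233    # 233 ms from the benchmark table
audio_s = 1.2        # 1.2 s clip
rtf = latency_s / audio_s
print(f"RTF = {rtf:.2f}x")   # prints "RTF = 0.19x"
```

An RTF below 1.0 means the model transcribes faster than real time.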
Model weights: MIT (OpenAI)
MetalRT engine: Proprietary (RunAnywhere, Inc.)