How to use TerminatorPower/Qwen3.5-4B-MLX-4bit with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Qwen3.5-4B-MLX-4bit TerminatorPower/Qwen3.5-4B-MLX-4bit
Qwen3.5-4B-MLX-4bit
A verbatim mirror of mlx-community/Qwen3.5-4B-MLX-4bit, kept here so the Vanta iOS app always has a stable place to download it from.
Run it on your iPhone with Vanta
This is one of the built-in one-tap downloads in Vanta — Local AI LLM Chat, a local-first AI chat app for iPhone and iPad. Vanta runs models like this one fully on-device with Apple's MLX framework: no account, no cloud, and your chats stay on your device. Because this is a vision-capable model, you can also chat about images.
Download Vanta on the App Store →
This is a copy. Every file in this repository is an exact copy of mlx-community/Qwen3.5-4B-MLX-4bit. We cloned it so that the Vanta app always has a reliable source to download this model from, independent of any upstream changes. All credit for the model weights and the MLX conversion goes to mlx-community and the original authors.
Model Details
- Original Model: Qwen/Qwen3.5-4B
- Quantization: 4-bit (5.347 effective bits per weight on average)
- Group Size: 64
- Format: MLX SafeTensors
- Framework: mlx-vlm
- Disk Size: ~2.9 GB
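The 5.347 bits-per-weight figure is an average over the whole checkpoint, so it is higher than the nominal 4 bits. As a rough sanity check, the sketch below computes the storage cost of a single group-wise quantized tensor, assuming one 16-bit scale and one 16-bit bias are stored per group (an assumption about the affine layout, not something this card states). Under that assumption a purely 4-bit, group-size-64 tensor costs 4.5 bits per weight, so the higher average presumably reflects tensors kept at higher precision. The helper name `bits_per_weight` is hypothetical.

```python
def bits_per_weight(bits: int, group_size: int,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Average storage cost per weight for group-wise affine quantization.

    Assumes each group of `group_size` weights shares one scale and one bias,
    stored with `scale_bits` and `bias_bits` bits respectively.
    """
    overhead_per_group = scale_bits + bias_bits
    # Spread the per-group overhead evenly across the weights in the group
    return bits + overhead_per_group / group_size


# 4-bit codes, groups of 64, one 16-bit scale + one 16-bit bias per group
print(bits_per_weight(4, 64))
# Same layout at 8 bits, as used by the 8-bit sibling model
print(bits_per_weight(8, 64))
```

Smaller groups track local weight ranges more closely but pay more overhead per weight (for example, group size 32 raises the 4-bit cost to 5.0 bits under the same assumption).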
Conversion Details
This model was converted using mlx-vlm from the pc/fix-qwen35-predicate branch, which includes fixes for Qwen3.5 model support (proper handling of MoE gate layers, shared_expert_gate, and A_log casting).
Conversion command:
python3 -m mlx_vlm convert \
--hf-path "Qwen/Qwen3.5-4B" \
--mlx-path "./Qwen3.5-4B-MLX-4bit" \
-q --q-bits 4 --q-group-size 64
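The `-q --q-bits 4 --q-group-size 64` flags request group-wise quantization. The pure-Python sketch below illustrates the general affine idea, assuming each group of 64 weights is mapped to 4-bit integer codes with a per-group scale and bias so that value ≈ code * scale + bias. It is an illustration only, not mlx-vlm's implementation, whose rounding and storage details may differ; all function names are hypothetical.

```python
import random


def quantize_group(values, bits=4):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes.

    Returns (codes, scale, bias) such that value ~= code * scale + bias.
    """
    levels = (1 << bits) - 1  # 15 distinct steps above zero for 4-bit codes
    lo, hi = min(values), max(values)
    # A constant group has zero range; any nonzero scale reconstructs it exactly
    scale = (hi - lo) / levels if hi > lo else 1.0
    bias = lo
    codes = [min(levels, max(0, round((v - bias) / scale))) for v in values]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from one group's codes."""
    return [c * scale + bias for c in codes]


def quantize(weights, bits=4, group_size=64):
    """Split a flat weight list into groups of `group_size` and quantize each."""
    if len(weights) % group_size:
        raise ValueError("length must be a multiple of group_size")
    return [quantize_group(weights[i:i + group_size], bits)
            for i in range(0, len(weights), group_size)]


def dequantize(groups):
    """Concatenate the dequantized groups back into a flat list."""
    out = []
    for codes, scale, bias in groups:
        out.extend(dequantize_group(codes, scale, bias))
    return out


# Demo on synthetic weights (not taken from the real model)
random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize(w, bits=4, group_size=64)
w_hat = dequantize(groups)

max_err = max(abs(a - b) for a, b in zip(w, w_hat))
# Round-to-nearest error is at most half a quantization step within each group
worst_half_step = max(scale for _, scale, _ in groups) / 2
print(f"groups={len(groups)} max_err={max_err:.6f} bound={worst_half_step:.6f}")
```

Because each group gets its own scale and bias, an outlier only widens the quantization step for the 63 weights that share its group rather than for the whole tensor.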
Related Models
- bf16 (full precision): mlx-community/Qwen3.5-4B-MLX-bf16
- 8-bit quantized: mlx-community/Qwen3.5-4B-MLX-8bit
- Original: Qwen/Qwen3.5-4B
Usage
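from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "TerminatorPower/Qwen3.5-4B-MLX-4bit"
model, processor = load(model_path)
config = load_config(model_path)

image = ["path/to/image.jpg"]
# Wrap the prompt in the model's chat template so the image placeholder tokens are inserted
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=len(image))

output = generate(
    model,
    processor,
    prompt,
    image=image,
    max_tokens=512,
)
print(output)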
CLI:
python3 -m mlx_vlm.generate \
--model TerminatorPower/Qwen3.5-4B-MLX-4bit \
--image path/to/image.jpg \
--prompt "Describe this image."
License
This model inherits the Apache 2.0 license from the original Qwen model. The mirror does not add any restrictions.