Vanta — Local AI LLM Chat

Qwen3.5-4B-MLX-4bit

A verbatim mirror of mlx-community/Qwen3.5-4B-MLX-4bit, kept here so the Vanta iOS app always has a stable place to download it from.

Run it on your iPhone with Vanta

This is one of the built-in one-tap downloads in Vanta, a local-first AI chat app for iPhone and iPad. Vanta runs models like this one fully on-device with Apple's MLX framework: no account, no cloud, and your chats stay on your device. Because the model is vision-capable, you can also chat about images.

Download Vanta on the App Store →


This is a copy. Every file in this repository is an exact copy of mlx-community/Qwen3.5-4B-MLX-4bit. We mirrored it so that Vanta always has a reliable source to download this model from, independent of any upstream changes. All credit for the model weights and the MLX conversion goes to mlx-community and the original authors.
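
One way to check the "exact copy" claim locally is to hash every file in two downloaded snapshots and compare the digests. The sketch below assumes both repositories have already been downloaded into local directories. The helper names `hash_tree` and `diff_trees` are illustrative and not part of Vanta or mlx-community tooling, and README.md is skipped by default on the assumption that a mirror's model card differs from upstream.

```python
import hashlib
from pathlib import Path


def hash_tree(root, ignore=()):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if rel in ignore:
            continue
        h = hashlib.sha256()
        with path.open("rb") as f:
            # Read in 1 MiB chunks so large weight files are never held in memory whole.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digests[rel] = h.hexdigest()
    return digests


def diff_trees(mirror_dir, upstream_dir, ignore=("README.md",)):
    """Return sorted relative paths that are missing from one tree or differ in content."""
    a = hash_tree(mirror_dir, ignore)
    b = hash_tree(upstream_dir, ignore)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from `diff_trees("path/to/mirror", "path/to/upstream")` means every compared file is byte-identical; both paths here are placeholders.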


Model Details

  • Original Model: Qwen/Qwen3.5-4B
  • Quantization: 4-bit (5.347 bits per weight, averaged over all tensors)
  • Group Size: 64
  • Format: MLX SafeTensors
  • Framework: mlx-vlm
  • Disk Size: ~2.9G
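
The 5.347 bits-per-weight figure is higher than the nominal 4 bits, and a little arithmetic suggests why. The sketch below assumes MLX's usual affine quantization layout, in which each group of weights stores one 16-bit scale and one 16-bit bias; under that assumption a quantized layer costs 4.5 bits per weight at group size 64, and the rest of the gap would come from tensors left at higher precision. It also assumes "~2.9G" means roughly 2.9e9 bytes (if it means GiB, the estimate is about 7% higher). Both function names are hypothetical.

```python
def quantized_bits_per_weight(q_bits, group_size, scale_bits=16, bias_bits=16):
    """Bits per weight for one quantized tensor, including per-group scale and bias."""
    return q_bits + (scale_bits + bias_bits) / group_size


def implied_weight_count(disk_bytes, bits_per_weight):
    """Rough number of weights implied by a file size and an average bit width."""
    return disk_bytes * 8 / bits_per_weight


# Cost of a layer quantized with the settings listed above (4-bit, group size 64).
layer_bpw = quantized_bits_per_weight(q_bits=4, group_size=64)

# Weight count implied by the reported average and the approximate disk size.
approx_weights = implied_weight_count(disk_bytes=2.9e9, bits_per_weight=5.347)

print(f"quantized layer: {layer_bpw} bits per weight")
print(f"implied weights: {approx_weights / 1e9:.2f} billion")
```

The implied count of a little over four billion weights is in line with a 4B-class model, which is a useful sanity check on the listed numbers.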

Conversion Details

This model was converted using mlx-vlm from the pc/fix-qwen35-predicate branch, which includes fixes for Qwen3.5 model support (proper handling of MoE gate layers, shared_expert_gate, and A_log casting).

Conversion command:

python3 -m mlx_vlm convert \
  --hf-path "Qwen/Qwen3.5-4B" \
  --mlx-path "./Qwen3.5-4B-MLX-4bit" \
  -q --q-bits 4 --q-group-size 64
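
To script the conversion, for example to produce several quantization variants, the same command can be assembled as an argument list. This sketch uses only the flags shown in the command above; the helper name `build_convert_command` is hypothetical, and nothing is executed until the list is handed to `subprocess.run`.

```python
import shlex
import sys


def build_convert_command(hf_path, mlx_path, q_bits=4, q_group_size=64,
                          python=sys.executable):
    """Build the argv list for the mlx_vlm conversion command without running it."""
    return [
        python, "-m", "mlx_vlm", "convert",
        "--hf-path", hf_path,
        "--mlx-path", mlx_path,
        "-q", "--q-bits", str(q_bits), "--q-group-size", str(q_group_size),
    ]


cmd = build_convert_command("Qwen/Qwen3.5-4B", "./Qwen3.5-4B-MLX-4bit")
print(shlex.join(cmd))

# To actually run the conversion (requires mlx-vlm to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems if a path contains spaces.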

Usage

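from mlx_vlm import load, generate

# Fetches the weights from the Hugging Face Hub on first use, then loads them.
model, processor = load("TerminatorPower/Qwen3.5-4B-MLX-4bit")

# Ask the model to describe a local image file.
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="path/to/image.jpg",
    max_tokens=512,
)
print(output)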

CLI:

python3 -m mlx_vlm.generate \
  --model TerminatorPower/Qwen3.5-4B-MLX-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image."

License

This model inherits the Apache 2.0 license from the original Qwen model. The mirror does not add any restrictions.
