Qwen3.5-4B-MLX-4bit

A verbatim mirror of mlx-community/Qwen3.5-4B-MLX-4bit, kept here so the Vanta iOS app always has a stable place to download it from.

Run it on your iPhone with Vanta

This is one of the built-in one-tap downloads in Vanta - Local AI LLM Chat, a local-first AI chat app for iPhone and iPad. Vanta runs models like this one fully on-device with Apple's MLX framework. There is no account and no cloud, and your chats stay on your device. Because this model is vision-capable, you can also chat about images.

Download Vanta on the App Store →

This is a copy. Every file in this repository is an exact copy of mlx-community/Qwen3.5-4B-MLX-4bit. We cloned it so that the Vanta client always has a reliable source to download this model from, independent of any upstream changes. All credit for the model weights and the MLX conversion goes to mlx-community and the original authors.
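
If you want to check the "exact copy" claim yourself, one option is to compare file hashes between a local download of this repository and a local download of the upstream one. The sketch below assumes both have already been downloaded into two directories. The paths and helper names are hypothetical placeholders and not part of any tool mentioned here, and only the Python standard library is used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_hashes(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(mirror: Path, upstream: Path) -> list[str]:
    """List relative paths that are missing on one side or differ in content."""
    a, b = snapshot_hashes(mirror), snapshot_hashes(upstream)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__":
    # Hypothetical local paths; point these at your own downloads.
    mirror_dir = Path("./mirror/Qwen3.5-4B-MLX-4bit")
    upstream_dir = Path("./upstream/Qwen3.5-4B-MLX-4bit")
    if mirror_dir.is_dir() and upstream_dir.is_dir():
        differences = diff_snapshots(mirror_dir, upstream_dir)
        print("identical" if not differences else f"differs: {differences}")
```

If the model card or hidden cache folders show up as differences, that may be expected. The weight and config files are the ones that matter for this check.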

Model Details

Original Model: Qwen/Qwen3.5-4B
Quantization: 4-bit (5.347 bits per weight on average)
Group Size: 64
Format: MLX SafeTensors
Framework: mlx-vlm
Disk Size: ~2.9 GB
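
The bits-per-weight figure is higher than the nominal 4 bits, and a little arithmetic shows roughly why. The sketch below assumes group-wise quantization that stores one 16-bit scale and one 16-bit bias per group of 64 weights. That layout is an assumption, not something stated in this card, and the function names are illustrative. Under that assumption a quantized layer costs 4.5 bits per weight, so the reported 5.347 average would plausibly come from some tensors being kept at higher precision. The second helper cross-checks the disk size against the average bit width.

```python
def effective_bits(q_bits: int, group_size: int, overhead_bits_per_group: int = 32) -> float:
    """Storage cost per weight for group-wise quantization.

    Assumes each group stores one scale and one bias at 16 bits each
    (32 bits of overhead per group). This is an assumed layout.
    """
    return q_bits + overhead_bits_per_group / group_size


def implied_param_count(disk_bytes: float, bits_per_weight: float) -> float:
    """Rough parameter count implied by a file size and an average bit width."""
    return disk_bytes * 8 / bits_per_weight


quantized_layer_bits = effective_bits(q_bits=4, group_size=64)
print(quantized_layer_bits)  # 4.5

# Treating "~2.9G" as 2.9e9 bytes (an assumption; it could also mean GiB).
approx_params = implied_param_count(2.9e9, 5.347)
print(f"{approx_params / 1e9:.2f}B parameters (rough estimate)")
```

The estimate is only a sanity check: the download also contains tokenizer and config files, and the size above is rounded.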

Conversion Details

This model was converted using mlx-vlm from the pc/fix-qwen35-predicate branch, which includes fixes for Qwen3.5 model support (proper handling of MoE gate layers, shared_expert_gate, and A_log casting).

Conversion command:

python3 -m mlx_vlm convert \
  --hf-path "Qwen/Qwen3.5-4B" \
  --mlx-path "./Qwen3.5-4B-MLX-4bit" \
  -q --q-bits 4 --q-group-size 64

Related Models

bf16 (full precision): mlx-community/Qwen3.5-4B-MLX-bf16
8-bit quantized: mlx-community/Qwen3.5-4B-MLX-8bit
Original: Qwen/Qwen3.5-4B

Usage

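from mlx_vlm import load, generate

# Fetches the weights from the Hugging Face Hub on first use, then loads them from the local cache.
model, processor = load("TerminatorPower/Qwen3.5-4B-MLX-4bit")

# Ask a question about a local image; replace the path with your own file.
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="path/to/image.jpg",
    max_tokens=512,
)
print(output)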

CLI:

python3 -m mlx_vlm.generate \
  --model TerminatorPower/Qwen3.5-4B-MLX-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image."

License

This model inherits the Apache 2.0 license from the original Qwen model. The mirror does not add any restrictions.
