How to use from the llama-cpp-python library
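# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download model.gguf from the Hugging Face Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
    repo_id="dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4",
    filename="model.gguf",
)

# Run a single chat turn; the result is an OpenAI-style completion dict.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])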
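For interactive use on a phone, streaming tokens as they are generated usually feels more responsive than waiting for the full reply. The following is a minimal sketch, assuming the OpenAI-style chunk layout that llama-cpp-python returns when `stream=True`; the helper names `join_stream` and `stream_answer`, and the `n_ctx`, `n_threads`, and `max_tokens` values, are illustrative choices rather than settings from this model card.

```python
from typing import Iterable


def join_stream(chunks: Iterable[dict]) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        # Each chunk carries a "delta" that may hold a role, some content, or nothing.
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)


def stream_answer(prompt: str) -> str:
    """Load the model and return the streamed reply as one string."""
    # Imported lazily so join_stream can be used without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4",
        filename="model.gguf",
        n_ctx=2048,      # assumed context length for short chats
        n_threads=4,     # assumed thread count
        verbose=False,
    )
    chunks = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
        stream=True,
    )
    return join_stream(chunks)
```

Calling `stream_answer("What is the capital of France?")` downloads the model on first use. In a real app you would likely display each delta as it arrives instead of joining them at the end.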

Qwen 2.5 0.5B Instruct - Mobile INT4 (GGUF)

Alibaba's Qwen 2.5 0.5B Instruct, the smallest model in the Qwen 2.5 family, quantized to INT4 GGUF for general-purpose use on phones, where it runs very fast.

| Property | Value |
|---|---|
| Base | Qwen/Qwen2.5-0.5B-Instruct |
| Parameters | 494 million |
| Quantization | INT4 GGUF |
| Size | ~398 MB |
| License | Apache 2.0 |
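The parameter count and file size above give a rough estimate of the effective bits stored per weight. This is a back-of-the-envelope sketch that assumes "MB" means 10^6 bytes and that the whole file is weight data; the function name is illustrative.

```python
def effective_bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits the file spends on each parameter."""
    return file_size_bytes * 8 / n_params


# Figures from the table: ~398 MB file, 494 million parameters.
bpw = effective_bits_per_weight(398e6, 494e6)
print(f"{bpw:.2f} bits per weight")  # prints "6.45 bits per weight"
```

The result is above 4 bits. A likely explanation is that GGUF quantization schemes often keep some tensors (for example the embeddings) at higher precision and store per-block scales and metadata alongside the 4-bit values, though the exact scheme used here is not stated.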

Performance

  • ~45 tok/s on Samsung S20 FE CPU (fastest in our collection!)
  • ~0.7 GB memory footprint
  • Small enough to run on any modern smartphone
  • ~94% quality retention
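The throughput figure converts directly into per-token latency and a rough generation time. The sketch below uses the ~45 tok/s number from the list and the 128-token limit from the Quick Start command; it ignores prompt processing time, so real end-to-end latency will be somewhat higher. Function names are illustrative.

```python
def ms_per_token(tokens_per_second: float) -> float:
    """Average time to generate one token, in milliseconds."""
    return 1000.0 / tokens_per_second


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady rate (prompt processing excluded)."""
    return n_tokens / tokens_per_second


print(f"{ms_per_token(45):.1f} ms per token")               # prints "22.2 ms per token"
print(f"{generation_seconds(128, 45):.2f} s for 128 tokens")  # prints "2.84 s for 128 tokens"
```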

Use Cases

  • Code generation on mobile IDEs
  • Quick text classification / extraction
  • Embedded assistants in apps
  • Ultra-low-latency responses (<50ms per token)
  • Batch processing at massive scale
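As an illustration of the text classification use case, one possible approach is to constrain the model to a fixed label set through the system prompt and then match its reply against those labels. This is a sketch, not a method described by the model authors: the helper names, prompt wording, `max_tokens=8`, and `temperature=0.0` are all assumptions, and `llm` is expected to be a loaded `llama_cpp.Llama` instance.

```python
from typing import Optional, Sequence


def build_classification_messages(text: str, labels: Sequence[str]) -> list:
    """Build a chat prompt that asks for exactly one label."""
    system = (
        "Classify the user's text. Reply with exactly one of these labels "
        "and nothing else: " + ", ".join(labels)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]


def parse_label(reply: str, labels: Sequence[str]) -> Optional[str]:
    """Return the first label (in list order) that appears in the reply, else None."""
    cleaned = reply.strip().lower()
    for label in labels:
        if label.lower() in cleaned:
            return label
    return None


def classify(llm, text: str, labels: Sequence[str]) -> Optional[str]:
    """Run one short, deterministic chat completion and map it to a label."""
    out = llm.create_chat_completion(
        messages=build_classification_messages(text, labels),
        max_tokens=8,      # a label needs only a few tokens
        temperature=0.0,   # greedy decoding for repeatable results
    )
    return parse_label(out["choices"][0]["message"]["content"], labels)
```

A call such as `classify(llm, "Battery life is terrible.", ["positive", "negative"])` returns one of the labels, or `None` when the reply matches neither. A model this small can drift from the requested format, so handling the `None` case matters.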

Quick Start

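# Download the GGUF file into ./models
huggingface-cli download dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4 --local-dir ./models
# Run from a llama.cpp checkout that has already been built (newer builds name the binary llama-cli instead of main)
./build/bin/main -m ./models/model.gguf -p "Explain quantum computing simply." -n 128 -t 4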
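The same two steps can be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and that a llama.cpp binary already exists at the given path; the function names are illustrative, and the default binary path mirrors the command above (recent llama.cpp builds name it `llama-cli`, so pass that path instead if needed).

```python
import subprocess
from typing import List


def build_cli_command(
    model_path: str,
    prompt: str,
    n_predict: int = 128,
    threads: int = 4,
    binary: str = "./build/bin/main",
) -> List[str]:
    """Assemble the llama.cpp command line as an argument list."""
    return [
        binary,
        "-m", model_path,
        "-p", prompt,
        "-n", str(n_predict),
        "-t", str(threads),
    ]


def download_and_run(prompt: str) -> str:
    """Fetch model.gguf into ./models and run a single prompt through llama.cpp."""
    # Imported lazily so build_cli_command works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(
        repo_id="dispatchAI/Qwen2.5-0.5B-Instruct-mobile-int4",
        filename="model.gguf",
        local_dir="./models",
    )
    result = subprocess.run(
        build_cli_command(model_path, prompt),
        capture_output=True,
        text=True,
        check=True,  # raise if the binary exits with an error
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or spaces.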
GGUF model size: 0.6B params
Architecture: qwen2