Best Practices: How to Get the Best Results from dispatchAI Models

#26

by 3morixd - opened about 11 hours ago

Best Practices for dispatchAI Models

1. Always Set the Correct Chat Format

Family	Format	Code
SmolLM2	llama-3	`chat_format="llama-3"`
Llama-3.2	llama-3	`chat_format="llama-3"`
Qwen2.5	chatml	`chat_format="chatml"`
Gemma-2	gemma	`chat_format="gemma"`
Others	chatml	`chat_format="chatml"`

2. Use create_chat_completion(), Not call()

# CORRECT
response = llm.create_chat_completion(messages=[...], max_tokens=50)

# WRONG (produces gibberish)
response = llm("prompt", max_tokens=50)

3. Choose the Right Model Size

RAM	Best Model	Size
2GB	SmolLM2-135M	101MB
4GB	Qwen2.5-0.5B-int4	469MB
6GB	Llama-3.2-1B-Q4	770MB

4. Use Low Temperature for Facts

# Factual answers (low temperature)
response = llm.create_chat_completion(messages=[...], max_tokens=30, temperature=0.1)

# Creative answers (higher temperature)
response = llm.create_chat_completion(messages=[...], max_tokens=50, temperature=0.7)

5. Keep Prompts Short for Small Models

Small models (135M-500M) work best with short, direct prompts:

✅ "What is the capital of France?"
❌ "Please tell me everything you know about the capital city of France and its history"

6. Use the SDK (Auto-Detects Everything)

from dispatchai import load_model
model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
print(model.chat("What is the capital of France?"))

🚀 dispatchAI

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment