Best Practices: How to Get the Best Results from dispatchAI Models
#26
by 3morixd - opened
Best Practices for dispatchAI Models
1. Always Set the Correct Chat Format
| Family | Format | Code |
|---|---|---|
| SmolLM2 | llama-3 | chat_format="llama-3" |
| Llama-3.2 | llama-3 | chat_format="llama-3" |
| Qwen2.5 | chatml | chat_format="chatml" |
| Gemma-2 | gemma | chat_format="gemma" |
| Others | chatml | chat_format="chatml" |
2. Use create_chat_completion(), Not call()
# CORRECT
response = llm.create_chat_completion(messages=[...], max_tokens=50)
# WRONG (produces gibberish)
response = llm("prompt", max_tokens=50)
3. Choose the Right Model Size
| RAM | Best Model | Size |
|---|---|---|
| 2GB | SmolLM2-135M | 101MB |
| 4GB | Qwen2.5-0.5B-int4 | 469MB |
| 6GB | Llama-3.2-1B-Q4 | 770MB |
4. Use Low Temperature for Facts
# Factual answers (low temperature)
response = llm.create_chat_completion(messages=[...], max_tokens=30, temperature=0.1)
# Creative answers (higher temperature)
response = llm.create_chat_completion(messages=[...], max_tokens=50, temperature=0.7)
5. Keep Prompts Short for Small Models
Small models (135M-500M) work best with short, direct prompts:
- β "What is the capital of France?"
- β "Please tell me everything you know about the capital city of France and its history"
6. Use the SDK (Auto-Detects Everything)
from dispatchai import load_model
model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
print(model.chat("What is the capital of France?"))
π dispatchAI