Best Practices: How to Get the Best Results from dispatchAI Models

#26
by 3morixd - opened
dispatchAI org

Best Practices for dispatchAI Models

1. Always Set the Correct Chat Format

Family Format Code
SmolLM2 llama-3 chat_format="llama-3"
Llama-3.2 llama-3 chat_format="llama-3"
Qwen2.5 chatml chat_format="chatml"
Gemma-2 gemma chat_format="gemma"
Others chatml chat_format="chatml"

2. Use create_chat_completion(), Not call()

# CORRECT
response = llm.create_chat_completion(messages=[...], max_tokens=50)

# WRONG (produces gibberish)
response = llm("prompt", max_tokens=50)

3. Choose the Right Model Size

RAM Best Model Size
2GB SmolLM2-135M 101MB
4GB Qwen2.5-0.5B-int4 469MB
6GB Llama-3.2-1B-Q4 770MB

4. Use Low Temperature for Facts

# Factual answers (low temperature)
response = llm.create_chat_completion(messages=[...], max_tokens=30, temperature=0.1)

# Creative answers (higher temperature)
response = llm.create_chat_completion(messages=[...], max_tokens=50, temperature=0.7)

5. Keep Prompts Short for Small Models

Small models (135M-500M) work best with short, direct prompts:

  • βœ… "What is the capital of France?"
  • ❌ "Please tell me everything you know about the capital city of France and its history"

6. Use the SDK (Auto-Detects Everything)

from dispatchai import load_model
model = load_model("SmolLM2-135M-Instruct-mobile", backend="gguf")
print(model.chat("What is the capital of France?"))

πŸš€ dispatchAI

Sign up or log in to comment