Using Stack 2.9 with Together AI
This guide explains how to use Stack 2.9 with Together AI as the model provider.
Overview
Together AI provides powerful cloud-hosted models with high performance and competitive pricing. Stack 2.9 supports Together AI through its OpenAI-compatible API, allowing you to use models like:
- togethercomputer/meta-llama-3-70b-instruct
- togethercomputer/CodeLlama-34b-instruct
- togethercomputer/Qwen2.5-Coder-32B-Instruct (recommended for Stack 2.9)
- And many others from Together's model library
Prerequisites
- Together AI Account: Sign up at together.ai
- API Key: Obtain your API key from the Together dashboard
- OpenAI Python Package: Install openai>=1.0.0 (required for the Together client)
pip install openai
Environment Variables
Configure your environment with the following variables:
# Required: Together AI API key
export TOGETHER_API_KEY="your-together-api-key-here"
# Optional: Model selection (default: togethercomputer/Qwen2.5-Coder-32B-Instruct)
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"
# Optional: Provider configuration (for auto-detection)
export MODEL_PROVIDER="together"
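As an illustration of how these variables fit together, the sketch below reads them from an environment mapping and applies the documented default model. It is not Stack 2.9's actual loader: the helper name `load_together_config` is hypothetical, and returning `None` for an unset `MODEL_PROVIDER` is an assumption.

```python
import os

# Default taken from the comment on TOGETHER_MODEL above.
DEFAULT_MODEL = "togethercomputer/Qwen2.5-Coder-32B-Instruct"


def load_together_config(env=None):
    """Collect Together AI settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    api_key = env.get("TOGETHER_API_KEY")
    if not api_key:
        # The key is the only required variable.
        raise ValueError("TOGETHER_API_KEY is not set")
    return {
        "api_key": api_key,
        "model": env.get("TOGETHER_MODEL", DEFAULT_MODEL),
        "provider": env.get("MODEL_PROVIDER"),  # None when unset
    }
```

Passing a plain dict instead of `os.environ` makes the lookup easy to exercise in tests without touching the real environment.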
Setting up in Shell
Add these lines to your ~/.zshrc, ~/.bashrc, or shell profile:
# Together AI configuration
export TOGETHER_API_KEY="tog-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"
Then reload your shell:
source ~/.zshrc # or ~/.bashrc
Using .env file (recommended for development)
Create a .env file in your project root:
TOGETHER_API_KEY=tog-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TOGETHER_MODEL=togethercomputer/Qwen2.5-Coder-32B-Instruct
MODEL_PROVIDER=together
Then load it with python-dotenv:
pip install python-dotenv
And in your Python script:
from dotenv import load_dotenv
load_dotenv() # loads .env file
Usage Examples
Command Line
Use the built-in CLI with Together provider:
# Using the default model (togethercomputer/Qwen2.5-Coder-32B-Instruct unless TOGETHER_MODEL is set)
python stack.py --provider together "Write a Python function to reverse a string"
# Using a specific model (override env var)
TOGETHER_MODEL=togethercomputer/Qwen2.5-Coder-32B-Instruct python stack.py --provider together "def factorial(n):"
Python API
from stack_2_9_eval.model_client import create_model_client
# Create Together client (reads TOGETHER_API_KEY from env)
client = create_model_client(provider="together")
# Or specify explicitly
client = create_model_client(
provider="together",
model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
api_key="your-api-key"
)
# Generate code
result = client.generate(
prompt="Write a Python function to sort a list using quicksort",
temperature=0.2,
max_tokens=1024
)
print(result.text)
Chat Mode
from stack_2_9_eval.model_client import create_model_client, ChatMessage
client = create_model_client(provider="together")
messages = [
ChatMessage(role="system", content="You are an expert Python programmer."),
ChatMessage(role="user", content="How do I read a JSON file in Python?"),
]
result = client.chat(messages, temperature=0.2, max_tokens=512)
print(result.text)
Using with Tool Calls
tools = [
{
"type": "function",
"function": {
"name": "FileReadTool",
"description": "Read file contents",
"parameters": {
"type": "object",
"properties": {
"path": {"type": "string", "description": "File path"}
},
"required": ["path"]
}
}
}
]
messages = [
ChatMessage(role="user", content="Read the file 'config.yaml' and tell me what's in it")
]
result = client.chat(messages, temperature=0.2, tools=tools)
print(result.text)
# Check result.raw_response for tool_calls if model requested a tool
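To make the last comment concrete, here is a minimal sketch of pulling tool calls out of a response. It assumes `result.raw_response` follows the OpenAI chat-completion shape and is available as a plain dict; if it is an `openai` v1 response object, calling `.model_dump()` on it first should give an equivalent dict. The helper name `extract_tool_calls` and the sample payload are illustrative, not part of Stack 2.9.

```python
import json


def extract_tool_calls(raw_response):
    """Return (name, arguments) pairs from an OpenAI-style chat completion dict."""
    calls = []
    for choice in raw_response.get("choices", []):
        message = choice.get("message") or {}
        for call in message.get("tool_calls") or []:
            function = call.get("function", {})
            # In the OpenAI format, arguments arrive as a JSON-encoded string.
            arguments = json.loads(function.get("arguments") or "{}")
            calls.append((function.get("name"), arguments))
    return calls


# Hand-written sample shaped like an OpenAI-compatible response (not real output).
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_0",
                        "type": "function",
                        "function": {
                            "name": "FileReadTool",
                            "arguments": "{\"path\": \"config.yaml\"}",
                        },
                    }
                ],
            }
        }
    ]
}

for name, arguments in extract_tool_calls(sample):
    print(name, arguments)
```

An empty list means the model answered in plain text, in which case `result.text` holds the reply.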
Recommended Models
For Stack 2.9 use cases (coding + tool use), these Together models are recommended:
Primary Recommendation
togethercomputer/qwen2.5-coder-32b-instruct
- Matches Stack 2.9's base model (Qwen2.5-Coder-32B)
- Excellent code generation
- Strong tool-calling capabilities
- Cost-effective: ~$0.22 / 1M tokens (input)
- Use this for production Stack 2.9 deployments
Alternatives
togethercomputer/meta-llama-3-70b-instruct
- Larger model (70B) with strong reasoning
- Slightly higher cost but excellent quality
- Good for complex problem-solving
togethercomputer/codellama-34b-instruct
- Code-specialized Llama 34B
- Good performance, lower cost than 70B models
togethercomputer/qwen2.5-72b-instruct
- 72B variant of Qwen2.5 (if you need maximum quality)
- Higher cost and latency
Model Selection Tips
- Match training distribution: Use Qwen models for Stack 2.9 pattern compatibility
- Budget: 34B models offer best price/performance for coding tasks
- Latency: Smaller models (7B-13B) are faster but less capable
- Throughput: Consider batching for large-scale usage
Cost Estimation
Together AI pricing (as of 2025, check their site for current rates):
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Qwen2.5-Coder-32B | ~0.22 | ~0.22 |
| Meta-Llama-3-70B | ~0.70 | ~0.70 |
| CodeLlama-34B | ~0.22 | ~0.22 |
| Qwen2.5-72B | ~0.70 | ~0.70 |
Example Cost Calculation
If your typical usage is:
- 100 queries/day
- Average 2,000 tokens per query (input + output)
- Using Qwen2.5-Coder-32B
Daily cost: (100 * 2000 / 1,000,000) * $0.22 ≈ $0.044
Monthly cost: ~$1.32
Very affordable for development and light production use.
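The same arithmetic can be wrapped in a small function to try other volumes or models. This is a rough sketch: `estimate_cost` is a hypothetical helper, it assumes one blended price for input and output tokens (as in the table above), and it uses a 30-day month.

```python
def estimate_cost(queries_per_day, tokens_per_query, price_per_million, days=30):
    """Return (daily, monthly) cost in dollars for a flat per-token price."""
    daily_tokens = queries_per_day * tokens_per_query
    daily = daily_tokens / 1_000_000 * price_per_million
    return daily, daily * days


daily, monthly = estimate_cost(100, 2000, 0.22)
print(f"Daily: ${daily:.3f}, monthly: ${monthly:.2f}")
# prints: Daily: $0.044, monthly: $1.32
```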
Performance Considerations
- Latency: Expect 100-500ms per request depending on model size and complexity
- Rate Limits: Together provides generous rate limits (check your plan)
- Throughput: Use concurrent requests for batch processing (respect rate limits)
- Streaming: Together supports streaming; use stream=True in the client for long generations
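For the throughput point, one possible approach is a bounded thread pool, which caps the number of requests in flight so you stay under your rate limit. The sketch below is a generic helper, not a Stack 2.9 API: `generate_many` is a hypothetical name, and `max_workers=4` is an arbitrary starting value to tune against your plan.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_many(generate_fn, prompts, max_workers=4):
    """Run generate_fn over prompts with at most max_workers requests in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input prompts.
        return list(pool.map(generate_fn, prompts))
```

With the client from the examples above, a call might look like `generate_many(lambda p: client.generate(prompt=p, temperature=0.2, max_tokens=1024), prompts)`. An exception raised by any single request propagates when its result is collected, so combine this with retry logic for production use.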
Error Handling
Implement robust error handling for production:
import os
import time

from stack_2_9_eval.model_client import create_model_client

def generate_with_retry(client, prompt, max_retries=3):
    """Call client.generate, retrying failed attempts with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.generate(prompt, temperature=0.2, max_tokens=1024)
        except Exception as e:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s, ...
            print(f"Error: {e}. Retrying in {wait}s...")
            time.sleep(wait)

client = create_model_client(provider="together", api_key=os.getenv("TOGETHER_API_KEY"))
result = generate_with_retry(client, "Write a function to calculate prime numbers")
Comparison with Other Providers
| Feature | Together AI | Ollama (local) | OpenAI | Anthropic |
|---|---|---|---|---|
| Cost (32B class) | Low (~$0.22/M) | Free (your hardware) | High (~$3/M) | High (~$3/M) |
| Qwen2.5-Coder | ✓ Supported | ✓ Via pull | ✗ No | ✗ No |
| Privacy | Cloud (check TOS) | Full local | Cloud | Cloud |
| Latency | Medium | Fast (local) | Medium | Medium |
| Setup Complexity | Low (API key) | Medium (install) | Low | Low |
| Rate Limits | Generous | Unlimited | Pay-as-you-go | Pay-as-you-go |
| Tool Calling | ✓ Yes | ✓ Yes | ✓ Yes | ✓ Yes |
Best for Stack 2.9: Together AI when you need cloud access and Qwen models without running locally.
Troubleshooting
API Key Errors
ValueError: Together AI API key required.
Solution: Set TOGETHER_API_KEY environment variable or pass api_key param.
Model Not Found
openai.BadRequestError: The model '...' does not exist
Solution: Check model name spelling. Browse available models at Together Models. Use full model ID like togethercomputer/qwen2.5-coder-32b-instruct.
Rate Limit Exceeded
Solution: Add retry logic with exponential backoff. Consider upgrading your Together plan.
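A minimal sketch of such retry logic follows. It is a generic helper rather than anything shipped with Stack 2.9: the name `retry_with_backoff`, the delay values, and the use of random jitter (which keeps many clients from retrying in lockstep) are all assumptions for illustration.

```python
import random
import time


def retry_with_backoff(call, retry_on=(Exception,), max_retries=5,
                       base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call call() and retry on the given exception types with capped, jittered backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            # Exponential growth (1s, 2s, 4s, ...) capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the capped delay.
            sleep(random.uniform(0, delay))
```

If the underlying client raises the `openai` package's `RateLimitError` for HTTP 429 responses, passing `retry_on=(openai.RateLimitError,)` would limit retries to rate-limit failures instead of masking unrelated errors.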
Import Errors
ImportError: openai package required
Solution: pip install openai (version 1.0+)
Advanced Configuration
Custom Base URL
If you need to use a custom endpoint (e.g., for regional deployments):
client = create_model_client(
provider="together",
model="togethercomputer/qwen2.5-coder-32b-instruct",
base_url="https://your-custom-endpoint.together.ai/v1"
)
Timeouts and Retries
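import os

# Assumption: TogetherClient is exported by the same module as create_model_client.
from stack_2_9_eval.model_client import TogetherClient

client = TogetherClient(
    model="togethercomputer/qwen2.5-coder-32b-instruct",
    api_key=os.getenv("TOGETHER_API_KEY"),
    timeout=300  # 5 minute timeout
)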
Streaming Responses
For long generations, use streaming (requires modifying client or using OpenAI library directly):
import os

from openai import OpenAI

# Point the OpenAI client at Together's OpenAI-compatible endpoint.
client = OpenAI(api_key=os.getenv("TOGETHER_API_KEY"), base_url="https://api.together.xyz/v1")
stream = client.chat.completions.create(
    model="togethercomputer/qwen2.5-coder-32b-instruct",
    messages=[{"role": "user", "content": "Write a detailed explanation of binary search"}],
    stream=True
)
for chunk in stream:
    # Each chunk carries an incremental piece of text; some chunks have none.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
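If you also need the complete text after streaming finishes, the chunks can be accumulated as they arrive. The helper below is a sketch under the assumption that chunks follow the OpenAI streaming shape used above (`chunk.choices[0].delta.content`); `collect_stream` and its `on_token` callback are hypothetical names.

```python
def collect_stream(stream, on_token=None):
    """Join the text deltas of an OpenAI-style streaming response into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_token is not None:
                on_token(text)  # e.g. print each piece as it arrives
    return "".join(parts)
```

For example, `full_text = collect_stream(stream, on_token=lambda t: print(t, end=""))` prints incrementally and still returns the whole response.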
Integration with Stack 2.9 CLI
To make Together AI the default provider:
# Set environment variables permanently
echo 'export MODEL_PROVIDER="together"' >> ~/.zshrc
echo 'export TOGETHER_MODEL="togethercomputer/qwen2.5-coder-32b-instruct"' >> ~/.zshrc
source ~/.zshrc
Now stack.py will automatically use Together AI without --provider flag.
Security Best Practices
- Never commit API keys to version control. Use .env files or environment variables.
- Rotate keys periodically from the Together dashboard.
- Limit exposure: Together API keys grant full account access, so share them only where they are needed.
- Enable billing alerts to avoid unexpected charges.
- Review Together's TOS for data usage and privacy policies.
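One practical way to keep keys out of logs and error reports is to mask them before printing. The sketch below is an illustrative helper, not part of Stack 2.9; the name `mask_key` and the choice to reveal the last four characters are assumptions.

```python
def mask_key(key, visible=4):
    """Return a log-safe form of an API key, showing only the last few characters."""
    if not key:
        return "<unset>"
    if len(key) <= visible:
        # Too short to reveal anything safely.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging `mask_key(os.getenv("TOGETHER_API_KEY"))` instead of the raw value still lets you confirm which key is loaded.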
Support
- Together Documentation: https://docs.together.io/
- Stack 2.9 Issues: https://github.com/my-ai-stack/stack-2.9/issues
- Model Cards: See MODEL_CARD.md for Stack 2.9 details
Last Updated: 2025-04-02
Compatible Stack 2.9 Version: 2.9.0+