
Using Stack 2.9 with Together AI

This guide explains how to use Stack 2.9 with Together AI as the model provider.

Overview

Together AI provides high-performance, cloud-hosted models at competitive prices. Stack 2.9 supports Together AI through its OpenAI-compatible API, allowing you to use models such as:

  • togethercomputer/meta-llama-3-70b-instruct
  • togethercomputer/CodeLlama-34b-instruct
  • togethercomputer/Qwen2.5-Coder-32B-Instruct (recommended for Stack 2.9)
  • And many others from Together's model library

Prerequisites

  1. Together AI Account: Sign up at together.ai
  2. API Key: Obtain your API key from the Together dashboard
  3. OpenAI Python Package: Install openai>=1.0.0 (required for Together client)
pip install openai

Environment Variables

Configure your environment with the following variables:

# Required: Together AI API key
export TOGETHER_API_KEY="your-together-api-key-here"

# Optional: Model selection (default: togethercomputer/Qwen2.5-Coder-32B-Instruct)
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"

# Optional: Provider configuration (for auto-detection)
export MODEL_PROVIDER="together"
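To fail fast when the key is missing, a small helper can read these variables and apply the documented defaults (a sketch; the variable names and defaults match those above):

```python
import os

def together_config():
    """Read Together AI settings from the environment, applying the documented defaults."""
    api_key = os.environ.get("TOGETHER_API_KEY")
    if not api_key:
        raise ValueError("TOGETHER_API_KEY is not set")
    model = os.environ.get("TOGETHER_MODEL", "togethercomputer/Qwen2.5-Coder-32B-Instruct")
    provider = os.environ.get("MODEL_PROVIDER", "together")
    return {"api_key": api_key, "model": model, "provider": provider}
```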

Setting up in Shell

Add these lines to your ~/.zshrc, ~/.bashrc, or shell profile:

# Together AI configuration
export TOGETHER_API_KEY="tog-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"

Then reload your shell:

source ~/.zshrc  # or ~/.bashrc

Using a .env file (recommended for development)

Create a .env file in your project root:

TOGETHER_API_KEY=tog-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TOGETHER_MODEL=togethercomputer/Qwen2.5-Coder-32B-Instruct
MODEL_PROVIDER=together

Then load it with python-dotenv:

pip install python-dotenv

And in your Python script:

from dotenv import load_dotenv
load_dotenv()  # loads .env file

Usage Examples

Command Line

Use the built-in CLI with Together provider:

# Using the default model (togethercomputer/Qwen2.5-Coder-32B-Instruct)
python stack.py --provider together "Write a Python function to reverse a string"

# Using a specific model (override env var)
TOGETHER_MODEL=togethercomputer/Qwen2.5-Coder-32B-Instruct python stack.py --provider together "def factorial(n):"

Python API

from stack_2_9_eval.model_client import create_model_client

# Create Together client (reads TOGETHER_API_KEY from env)
client = create_model_client(provider="together")

# Or specify explicitly
client = create_model_client(
    provider="together",
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    api_key="your-api-key"
)

# Generate code
result = client.generate(
    prompt="Write a Python function to sort a list using quicksort",
    temperature=0.2,
    max_tokens=1024
)

print(result.text)

Chat Mode

from stack_2_9_eval.model_client import create_model_client, ChatMessage

client = create_model_client(provider="together")

messages = [
    ChatMessage(role="system", content="You are an expert Python programmer."),
    ChatMessage(role="user", content="How do I read a JSON file in Python?"),
]

result = client.chat(messages, temperature=0.2, max_tokens=512)
print(result.text)

Using with Tool Calls

tools = [
    {
        "type": "function",
        "function": {
            "name": "FileReadTool",
            "description": "Read file contents",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"}
                },
                "required": ["path"]
            }
        }
    }
]

messages = [
    ChatMessage(role="user", content="Read the file 'config.yaml' and tell me what's in it")
]

result = client.chat(messages, temperature=0.2, tools=tools)
print(result.text)
# Check result.raw_response for tool_calls if model requested a tool
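When the model requests a tool, the call details have to be pulled out of the raw response. The sketch below assumes raw_response is a dict in the OpenAI chat-completions shape (choices[0].message.tool_calls); adapt the field access if your client returns a response object instead:

```python
import json

def extract_tool_calls(raw_response):
    """Collect tool-call requests from an OpenAI-compatible chat completion.

    Assumes raw_response is a plain dict following the OpenAI
    chat-completions schema; tool-call arguments arrive as a JSON string.
    """
    message = raw_response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        calls.append({
            "name": call["function"]["name"],
            "arguments": json.loads(call["function"]["arguments"]),
        })
    return calls
```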

Recommended Models

For Stack 2.9 use cases (coding + tool use), these Together models are recommended:

Primary Recommendation

togethercomputer/qwen2.5-coder-32b-instruct

  • Matches Stack 2.9's base model (Qwen2.5-Coder-32B)
  • Excellent code generation
  • Strong tool-calling capabilities
  • Cost-effective: ~$0.22 / 1M tokens (input)
  • Use this for production Stack 2.9 deployments

Alternatives

togethercomputer/meta-llama-3-70b-instruct

  • Larger model (70B) with strong reasoning
  • Slightly higher cost but excellent quality
  • Good for complex problem-solving

togethercomputer/codellama-34b-instruct

  • Code-specialized Llama 34B
  • Good performance, lower cost than 70B models

togethercomputer/qwen2.5-72b-instruct

  • 72B variant of Qwen2.5 (if you need maximum quality)
  • Higher cost and latency

Model Selection Tips

  • Match training distribution: Use Qwen models for Stack 2.9 pattern compatibility
  • Budget: 34B models offer best price/performance for coding tasks
  • Latency: Smaller models (7B-13B) are faster but less capable
  • Throughput: Consider batching for large-scale usage
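The batching tip above can be sketched with a thread pool that bounds concurrency; this assumes client.generate follows the signature shown in the Python API examples and that your worker count stays well under your plan's rate limit:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_batch(client, prompts, max_workers=4):
    """Run several prompts concurrently, preserving input order.

    max_workers bounds in-flight requests; keep it conservative
    relative to your Together rate limit.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(client.generate, p, temperature=0.2, max_tokens=1024)
                   for p in prompts]
        return [f.result() for f in futures]
```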

Cost Estimation

Together AI pricing (as of 2025, check their site for current rates):

| Model             | Input ($/1M tokens) | Output ($/1M tokens) |
|-------------------|---------------------|----------------------|
| Qwen2.5-Coder-32B | ~0.22               | ~0.22                |
| Meta-Llama-3-70B  | ~0.70               | ~0.70                |
| CodeLlama-34B     | ~0.22               | ~0.22                |
| Qwen2.5-72B       | ~0.70               | ~0.70                |

Example Cost Calculation

Assume typical usage of:

  • 100 queries/day
  • Average 2,000 tokens per query (input + output)
  • Using Qwen2.5-Coder-32B

Daily cost: (100 * 2,000 / 1,000,000) * $0.22 β‰ˆ $0.044
Monthly cost: ~$1.32

Very affordable for development and light production use.
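The arithmetic above generalizes to a small estimator (assumes a flat per-token price, as in the table, where input and output rates match):

```python
def monthly_cost(queries_per_day, avg_tokens_per_query, price_per_million_tokens, days=30):
    """Estimated monthly spend from average daily usage at a flat per-token price."""
    daily = queries_per_day * avg_tokens_per_query / 1_000_000 * price_per_million_tokens
    return daily * days

# The example above: 100 queries/day, 2,000 tokens each, ~$0.22 per 1M tokens
estimate = monthly_cost(100, 2000, 0.22)  # matches the ~$1.32/month figure
```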

Performance Considerations

  • Latency: Expect roughly 100-500 ms to first token, depending on model size and prompt complexity; full generations take proportionally longer
  • Rate Limits: Together provides generous rate limits (check your plan)
  • Throughput: Use concurrent requests for batch processing (respect rate limits)
  • Streaming: Together supports streaming; use stream=True in client for long generations

Error Handling

Implement robust error handling for production:

from stack_2_9_eval.model_client import create_model_client
import os
import time

def generate_with_retry(client, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = client.generate(prompt, temperature=0.2, max_tokens=1024)
            return result
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # exponential backoff
            print(f"Error: {e}. Retrying in {wait}s...")
            time.sleep(wait)

client = create_model_client(provider="together", api_key=os.getenv("TOGETHER_API_KEY"))
result = generate_with_retry(client, "Write a function to calculate prime numbers")

Comparison with Other Providers

| Feature          | Together AI        | Ollama (local)       | OpenAI         | Anthropic      |
|------------------|--------------------|----------------------|----------------|----------------|
| Cost (32B class) | Low (~$0.22/M)     | Free (your hardware) | High (~$3/M)   | High (~$3/M)   |
| Qwen2.5-Coder    | βœ… Supported       | βœ… Via pull          | ❌ No          | ❌ No          |
| Privacy          | Cloud (check TOS)  | Full local           | Cloud          | Cloud          |
| Latency          | Medium             | Fast (local)         | Medium         | Medium         |
| Setup complexity | Low (API key)      | Medium (install)     | Low            | Low            |
| Rate limits      | Generous           | Unlimited            | Pay-as-you-go  | Pay-as-you-go  |
| Tool calling     | βœ… Yes             | βœ… Yes               | βœ… Yes         | βœ… Yes         |

Best for Stack 2.9: Together AI when you need cloud access and Qwen models without running locally.

Troubleshooting

API Key Errors

ValueError: Together AI API key required.

Solution: Set TOGETHER_API_KEY environment variable or pass api_key param.

Model Not Found

openai.BadRequestError: The model '...' does not exist

Solution: Check model name spelling. Browse available models at Together Models. Use full model ID like togethercomputer/qwen2.5-coder-32b-instruct.

Rate Limit Exceeded

Solution: Add retry logic with exponential backoff. Consider upgrading your Together plan.

Import Errors

ImportError: openai package required

Solution: pip install openai (version 1.0+)

Advanced Configuration

Custom Base URL

If you need to use a custom endpoint (e.g., for regional deployments):

client = create_model_client(
    provider="together",
    model="togethercomputer/qwen2.5-coder-32b-instruct",
    base_url="https://your-custom-endpoint.together.ai/v1"
)

Timeouts and Retries

import os

# Assumes TogetherClient is exported from the same module as create_model_client
from stack_2_9_eval.model_client import TogetherClient

client = TogetherClient(
    model="togethercomputer/qwen2.5-coder-32b-instruct",
    api_key=os.getenv("TOGETHER_API_KEY"),
    timeout=300  # 5-minute request timeout
)

Streaming Responses

For long generations, use streaming (requires modifying the client or using the OpenAI library directly):

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("TOGETHER_API_KEY"),
    base_url="https://api.together.xyz/v1",
)

stream = client.chat.completions.create(
    model="togethercomputer/qwen2.5-coder-32b-instruct",
    messages=[{"role": "user", "content": "Write a detailed explanation of binary search"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Integration with Stack 2.9 CLI

To make Together AI the default provider:

# Set environment variables permanently
echo 'export MODEL_PROVIDER="together"' >> ~/.zshrc
echo 'export TOGETHER_MODEL="togethercomputer/qwen2.5-coder-32b-instruct"' >> ~/.zshrc
source ~/.zshrc

Now stack.py will automatically use Together AI without --provider flag.

Security Best Practices

  1. Never commit API keys to version control. Use .env files or environment variables.
  2. Rotate keys periodically from Together dashboard.
  3. Use minimal permissions - Together API keys have full access; protect them.
  4. Enable billing alerts to avoid unexpected charges.
  5. Review Together's TOS for data usage and privacy policies.

Support


Last Updated: 2025-04-02
Compatible Stack 2.9 Version: 2.9.0+