# Using Stack 2.9 with Together AI


This guide explains how to use Stack 2.9 with Together AI as the model provider.


## Overview


Together AI hosts high-performance open models at competitive prices. Stack 2.9 supports Together AI through its OpenAI-compatible API, allowing you to use models such as:


- `togethercomputer/meta-llama-3-70b-instruct`
- `togethercomputer/CodeLlama-34b-instruct`
- `togethercomputer/Qwen2.5-Coder-32B-Instruct` (recommended for Stack 2.9)
- And many others from Together's model library
|
|
## Prerequisites


1. **Together AI Account**: Sign up at [together.ai](https://together.ai)
2. **API Key**: Obtain your API key from the Together dashboard
3. **OpenAI Python Package**: Install `openai>=1.0.0` (required for the Together client)


```bash
pip install "openai>=1.0.0"
```
|
|
## Environment Variables


Configure your environment with the following variables:


```bash
# Required: Together AI API key
export TOGETHER_API_KEY="your-together-api-key-here"

# Optional: Model selection (default: togethercomputer/Qwen2.5-Coder-32B-Instruct)
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"

# Optional: Provider configuration (for auto-detection)
export MODEL_PROVIDER="together"
```
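
As a sketch of how these variables can be consumed (the helper name and precedence below are illustrative, not part of Stack 2.9's API): explicit arguments win over environment variables, which win over the built-in default.

```python
import os

# Illustrative resolution order: explicit args > env vars > built-in default.
DEFAULT_MODEL = "togethercomputer/Qwen2.5-Coder-32B-Instruct"

def resolve_together_config(api_key=None, model=None):
    """Return (api_key, model), mirroring the precedence described above."""
    key = api_key or os.getenv("TOGETHER_API_KEY")
    if not key:
        raise ValueError("Together AI API key required.")
    return key, model or os.getenv("TOGETHER_MODEL", DEFAULT_MODEL)
```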
|
|
### Setting up in Shell


Add these lines to your `~/.zshrc`, `~/.bashrc`, or shell profile:


```bash
# Together AI configuration
export TOGETHER_API_KEY="tog-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"
```


Then reload your shell:


```bash
source ~/.zshrc  # or ~/.bashrc
```
|
|
### Using a .env file (recommended for development)


Create a `.env` file in your project root:


```env
TOGETHER_API_KEY=tog-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TOGETHER_MODEL=togethercomputer/Qwen2.5-Coder-32B-Instruct
MODEL_PROVIDER=together
```


Then load it with `python-dotenv`:


```bash
pip install python-dotenv
```


And in your Python script:


```python
from dotenv import load_dotenv

load_dotenv()  # loads the .env file
```
|
|
## Usage Examples


### Command Line


Use the built-in CLI with the Together provider:


```bash
# Using the default model (TOGETHER_MODEL, or Qwen2.5-Coder-32B-Instruct if unset)
python stack.py --provider together "Write a Python function to reverse a string"

# Using a specific model (override env var)
TOGETHER_MODEL=togethercomputer/Qwen2.5-Coder-32B-Instruct python stack.py --provider together "def factorial(n):"
```
|
|
### Python API


```python
from stack_2_9_eval.model_client import create_model_client

# Create Together client (reads TOGETHER_API_KEY from env)
client = create_model_client(provider="together")

# Or specify explicitly
client = create_model_client(
    provider="together",
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    api_key="your-api-key"
)

# Generate code
result = client.generate(
    prompt="Write a Python function to sort a list using quicksort",
    temperature=0.2,
    max_tokens=1024
)

print(result.text)
```
|
|
### Chat Mode


```python
from stack_2_9_eval.model_client import create_model_client, ChatMessage

client = create_model_client(provider="together")

messages = [
    ChatMessage(role="system", content="You are an expert Python programmer."),
    ChatMessage(role="user", content="How do I read a JSON file in Python?"),
]

result = client.chat(messages, temperature=0.2, max_tokens=512)
print(result.text)
```
|
|
### Using with Tool Calls


```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "FileReadTool",
            "description": "Read file contents",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"}
                },
                "required": ["path"]
            }
        }
    }
]

messages = [
    ChatMessage(role="user", content="Read the file 'config.yaml' and tell me what's in it")
]

result = client.chat(messages, temperature=0.2, tools=tools)
print(result.text)
# Check result.raw_response for tool_calls if the model requested a tool
```
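
Extracting the actual tool call is straightforward if `raw_response` follows the OpenAI chat-completions schema, which is assumed here since Together's API is OpenAI-compatible. The helper below is a sketch, demonstrated against a hand-written response dict rather than a live call:

```python
import json

def extract_tool_calls(raw_response):
    """Return (tool_name, parsed_arguments) pairs from the first choice."""
    message = raw_response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Hand-written response of the assumed shape:
sample = {
    "choices": [{
        "message": {
            "tool_calls": [{
                "function": {
                    "name": "FileReadTool",
                    "arguments": '{"path": "config.yaml"}',
                }
            }]
        }
    }]
}
print(extract_tool_calls(sample))  # → [('FileReadTool', {'path': 'config.yaml'})]
```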
|
|
## Recommended Models


For Stack 2.9 use cases (coding + tool use), these Together models are recommended:


### Primary Recommendation


**`togethercomputer/Qwen2.5-Coder-32B-Instruct`**
- Matches Stack 2.9's base model (Qwen2.5-Coder-32B)
- Excellent code generation
- Strong tool-calling capabilities
- Cost-effective: ~$0.22 / 1M tokens (input)
- Use this for production Stack 2.9 deployments


### Alternatives


**`togethercomputer/meta-llama-3-70b-instruct`**
- Larger model (70B) with strong reasoning
- Slightly higher cost but excellent quality
- Good for complex problem-solving


**`togethercomputer/CodeLlama-34b-instruct`**
- Code-specialized Llama 34B
- Good performance, lower cost than 70B models


**`togethercomputer/Qwen2.5-72B-Instruct`**
- 72B variant of Qwen2.5 (if you need maximum quality)
- Higher cost and latency


### Model Selection Tips


- **Match training distribution**: Use Qwen models for Stack 2.9 pattern compatibility
- **Budget**: 34B models offer the best price/performance for coding tasks
- **Latency**: Smaller models (7B-13B) are faster but less capable
- **Throughput**: Consider batching for large-scale usage
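
The tips above can be condensed into a small lookup. The priority labels are this guide's invention, not part of Stack 2.9's API; the model IDs are the ones recommended above:

```python
MODEL_CHOICES = {
    "compatibility": "togethercomputer/Qwen2.5-Coder-32B-Instruct",  # matches the base model
    "budget": "togethercomputer/CodeLlama-34b-instruct",             # best price/performance
    "quality": "togethercomputer/meta-llama-3-70b-instruct",         # strongest reasoning
}

def pick_model(priority="compatibility"):
    """Map a priority label to a Together model ID."""
    if priority not in MODEL_CHOICES:
        raise ValueError(f"unknown priority {priority!r}; choose from {sorted(MODEL_CHOICES)}")
    return MODEL_CHOICES[priority]
```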
|
|
## Cost Estimation


Together AI pricing (as of 2025; check their site for current rates):


| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| Qwen2.5-Coder-32B | ~0.22 | ~0.22 |
| Meta-Llama-3-70B | ~0.70 | ~0.70 |
| CodeLlama-34B | ~0.22 | ~0.22 |
| Qwen2.5-72B | ~0.70 | ~0.70 |


### Example Cost Calculation


Suppose your typical usage is:
- 100 queries/day
- Average of 2,000 tokens per query (input + output)
- Qwen2.5-Coder-32B as the model


Then:
- Daily cost: `(100 * 2,000 / 1,000,000) * $0.22 ≈ $0.044`
- Monthly cost: ~$1.32
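
The same arithmetic as a tiny estimator (the helper name is ours, not a Stack 2.9 API; input and output tokens are priced equally, as in the table above):

```python
def daily_cost(queries_per_day, tokens_per_query, price_per_million):
    """Estimated daily spend in dollars."""
    return queries_per_day * tokens_per_query / 1_000_000 * price_per_million

cost = daily_cost(100, 2000, 0.22)
print(f"${cost:.3f}/day, ~${cost * 30:.2f}/month")  # → $0.044/day, ~$1.32/month
```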
|
|
**Very affordable for development and light production use.**
|
|
## Performance Considerations


- **Latency**: Expect 100-500 ms per request, depending on model size and prompt complexity
- **Rate Limits**: Together provides generous rate limits (check your plan)
- **Throughput**: Use concurrent requests for batch processing (while respecting rate limits)
- **Streaming**: Together supports streaming; use `stream=True` in the client for long generations
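
A minimal concurrency sketch for the throughput point. `fake_generate` is a stand-in for `client.generate` so the example runs offline; the worker cap doubles as a crude client-side rate limit:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_generate(prompt):
    # Stand-in for client.generate(prompt, ...); swap in the real client for use.
    return f"completion for: {prompt}"

def run_batch(prompts, generate=fake_generate, max_workers=4):
    """Run prompts concurrently, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, prompts))

results = run_batch([f"task {i}" for i in range(8)])
```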
|
|
## Error Handling


Implement robust error handling for production:


```python
import os
import time

from stack_2_9_eval.model_client import create_model_client

def generate_with_retry(client, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = client.generate(prompt, temperature=0.2, max_tokens=1024)
            return result
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # exponential backoff
            print(f"Error: {e}. Retrying in {wait}s...")
            time.sleep(wait)

client = create_model_client(provider="together", api_key=os.getenv("TOGETHER_API_KEY"))
result = generate_with_retry(client, "Write a function to calculate prime numbers")
```
|
|
## Comparison with Other Providers


| Feature | Together AI | Ollama (local) | OpenAI | Anthropic |
|---------|-------------|----------------|--------|-----------|
| Cost (32B class) | Low (~$0.22/M) | Free (your hardware) | High (~$3/M) | High (~$3/M) |
| Qwen2.5-Coder | ✅ Supported | ✅ Via pull | ❌ No | ❌ No |
| Privacy | Cloud (check TOS) | Full local | Cloud | Cloud |
| Latency | Medium | Fast (local) | Medium | Medium |
| Setup Complexity | Low (API key) | Medium (install) | Low | Low |
| Rate Limits | Generous | Unlimited | Pay-as-you-go | Pay-as-you-go |
| Tool Calling | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |


**Best for Stack 2.9**: Together AI when you need cloud access and Qwen models without running locally.
|
|
## Troubleshooting


### API Key Errors


```
ValueError: Together AI API key required.
```


**Solution**: Set the `TOGETHER_API_KEY` environment variable or pass the `api_key` parameter.


### Model Not Found


```
openai.BadRequestError: The model '...' does not exist
```


**Solution**: Check the model name's spelling and capitalization (model IDs are case-sensitive). Browse available models at [Together Models](https://together.ai/models). Use the full model ID, e.g. `togethercomputer/Qwen2.5-Coder-32B-Instruct`.


### Rate Limit Exceeded


**Solution**: Add retry logic with exponential backoff (see Error Handling above). Consider upgrading your Together plan.


### Import Errors


```
ImportError: openai package required
```


**Solution**: `pip install "openai>=1.0.0"`
|
|
## Advanced Configuration


### Custom Base URL


If you need to use a custom endpoint (e.g., for regional deployments):


```python
client = create_model_client(
    provider="together",
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    base_url="https://your-custom-endpoint.together.ai/v1"
)
```


### Timeouts and Retries


```python
import os

from stack_2_9_eval.model_client import TogetherClient

client = TogetherClient(
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    api_key=os.getenv("TOGETHER_API_KEY"),
    timeout=300  # 5-minute timeout
)
```
|
|
### Streaming Responses


For long generations, use streaming (this requires modifying the client or calling the OpenAI library directly):


```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("TOGETHER_API_KEY"), base_url="https://api.together.xyz/v1")

stream = client.chat.completions.create(
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    messages=[{"role": "user", "content": "Write a detailed explanation of binary search"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
|
|
## Integration with Stack 2.9 CLI


To make Together AI the default provider:


```bash
# Set the environment variables permanently
echo 'export MODEL_PROVIDER="together"' >> ~/.zshrc
echo 'export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"' >> ~/.zshrc
source ~/.zshrc
```


Now `stack.py` will automatically use Together AI without the `--provider` flag.
|
|
## Security Best Practices


1. **Never commit API keys** to version control. Use `.env` files or environment variables.
2. **Rotate keys** periodically from the Together dashboard.
3. **Use minimal permissions**: Together API keys have full access; protect them.
4. **Enable billing alerts** to avoid unexpected charges.
5. **Review Together's TOS** for data usage and privacy policies.
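
Points 1 and 3 can be backed up in code: fail fast if the key is missing, and never log it in full. The helper below is a sketch, not part of Stack 2.9:

```python
import os

def load_api_key(var="TOGETHER_API_KEY"):
    """Read an API key from the environment, failing fast if missing."""
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"{var} is not set; refusing to start.")
    # Mask the key when logging (assumes keys longer than 8 characters).
    print(f"Loaded {var}: {key[:4]}...{key[-4:]}")
    return key
```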
|
|
## Support


- **Together Documentation**: https://docs.together.ai/
- **Stack 2.9 Issues**: https://github.com/my-ai-stack/stack-2.9/issues
- **Model Cards**: See `MODEL_CARD.md` for Stack 2.9 details


---


**Last Updated**: 2025-04-02  
**Compatible Stack 2.9 Version**: 2.9.0+
|
|