# Using Stack 2.9 with Together AI


This guide explains how to use Stack 2.9 with Together AI as the model provider.


## Overview


Together AI hosts high-performance open models at competitive prices. Stack 2.9 supports Together AI through its OpenAI-compatible API, allowing you to use models such as:


- `togethercomputer/meta-llama-3-70b-instruct`
- `togethercomputer/CodeLlama-34b-instruct`
- `togethercomputer/Qwen2.5-Coder-32B-Instruct` (recommended for Stack 2.9)
- And many others from Together's model library
|
|
## Prerequisites


1. **Together AI Account**: Sign up at [together.ai](https://together.ai)
2. **API Key**: Obtain your API key from the Together dashboard
3. **OpenAI Python Package**: Install `openai>=1.0.0` (required for the Together client)


```bash
pip install "openai>=1.0.0"
```
|
|
## Environment Variables


Configure your environment with the following variables:


```bash
# Required: Together AI API key
export TOGETHER_API_KEY="your-together-api-key-here"

# Optional: Model selection (default: togethercomputer/Qwen2.5-Coder-32B-Instruct)
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"

# Optional: Provider configuration (for auto-detection)
export MODEL_PROVIDER="together"
```
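
As a sketch of how these variables can be consumed (the helper name and precedence below are illustrative, not part of Stack 2.9's API): explicit arguments win over environment variables, which win over the built-in default.

```python
import os

# Illustrative resolution order: explicit args > env vars > built-in default.
DEFAULT_MODEL = "togethercomputer/Qwen2.5-Coder-32B-Instruct"

def resolve_together_config(api_key=None, model=None):
    """Return (api_key, model), mirroring the precedence described above."""
    key = api_key or os.getenv("TOGETHER_API_KEY")
    if not key:
        raise ValueError("Together AI API key required.")
    return key, model or os.getenv("TOGETHER_MODEL", DEFAULT_MODEL)
```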
|
|
### Setting up in Shell


Add these lines to your `~/.zshrc`, `~/.bashrc`, or shell profile:


```bash
# Together AI configuration
export TOGETHER_API_KEY="tog-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"
```


Then reload your shell:


```bash
source ~/.zshrc  # or ~/.bashrc
```
|
|
### Using a .env file (recommended for development)


Create a `.env` file in your project root:


```env
TOGETHER_API_KEY=tog-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TOGETHER_MODEL=togethercomputer/Qwen2.5-Coder-32B-Instruct
MODEL_PROVIDER=together
```


Then load it with `python-dotenv`:


```bash
pip install python-dotenv
```


And in your Python script:


```python
from dotenv import load_dotenv

load_dotenv()  # loads the .env file
```
|
|
## Usage Examples


### Command Line


Use the built-in CLI with the Together provider:


```bash
# Using the default model (TOGETHER_MODEL, or Qwen2.5-Coder-32B-Instruct if unset)
python stack.py --provider together "Write a Python function to reverse a string"

# Using a specific model (override env var)
TOGETHER_MODEL=togethercomputer/Qwen2.5-Coder-32B-Instruct python stack.py --provider together "def factorial(n):"
```
|
|
### Python API


```python
from stack_2_9_eval.model_client import create_model_client

# Create Together client (reads TOGETHER_API_KEY from env)
client = create_model_client(provider="together")

# Or specify explicitly
client = create_model_client(
    provider="together",
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    api_key="your-api-key"
)

# Generate code
result = client.generate(
    prompt="Write a Python function to sort a list using quicksort",
    temperature=0.2,
    max_tokens=1024
)

print(result.text)
```
|
|
### Chat Mode


```python
from stack_2_9_eval.model_client import create_model_client, ChatMessage

client = create_model_client(provider="together")

messages = [
    ChatMessage(role="system", content="You are an expert Python programmer."),
    ChatMessage(role="user", content="How do I read a JSON file in Python?"),
]

result = client.chat(messages, temperature=0.2, max_tokens=512)
print(result.text)
```
|
|
### Using with Tool Calls


```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "FileReadTool",
            "description": "Read file contents",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"}
                },
                "required": ["path"]
            }
        }
    }
]

messages = [
    ChatMessage(role="user", content="Read the file 'config.yaml' and tell me what's in it")
]

result = client.chat(messages, temperature=0.2, tools=tools)
print(result.text)
# Check result.raw_response for tool_calls if the model requested a tool
```
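
Extracting the actual tool call is straightforward if `raw_response` follows the OpenAI chat-completions schema, which is assumed here since Together's API is OpenAI-compatible. The helper below is a sketch, demonstrated against a hand-written response dict rather than a live call:

```python
import json

def extract_tool_calls(raw_response):
    """Return (tool_name, parsed_arguments) pairs from the first choice."""
    message = raw_response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls

# Hand-written response of the assumed shape:
sample = {
    "choices": [{
        "message": {
            "tool_calls": [{
                "function": {
                    "name": "FileReadTool",
                    "arguments": '{"path": "config.yaml"}',
                }
            }]
        }
    }]
}
print(extract_tool_calls(sample))  # → [('FileReadTool', {'path': 'config.yaml'})]
```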
|
|
## Recommended Models


For Stack 2.9 use cases (coding + tool use), these Together models are recommended:


### Primary Recommendation


**`togethercomputer/Qwen2.5-Coder-32B-Instruct`**
- Matches Stack 2.9's base model (Qwen2.5-Coder-32B)
- Excellent code generation
- Strong tool-calling capabilities
- Cost-effective: ~$0.22 / 1M tokens (input)
- Use this for production Stack 2.9 deployments


### Alternatives


**`togethercomputer/meta-llama-3-70b-instruct`**
- Larger model (70B) with strong reasoning
- Slightly higher cost but excellent quality
- Good for complex problem-solving


**`togethercomputer/CodeLlama-34b-instruct`**
- Code-specialized Llama 34B
- Good performance, lower cost than 70B models


**`togethercomputer/Qwen2.5-72B-Instruct`**
- 72B variant of Qwen2.5 (if you need maximum quality)
- Higher cost and latency


### Model Selection Tips


- **Match training distribution**: Use Qwen models for Stack 2.9 pattern compatibility
- **Budget**: 34B models offer the best price/performance for coding tasks
- **Latency**: Smaller models (7B-13B) are faster but less capable
- **Throughput**: Consider batching for large-scale usage
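
The tips above can be condensed into a small lookup. The priority labels are this guide's invention, not part of Stack 2.9's API; the model IDs are the ones recommended above:

```python
MODEL_CHOICES = {
    "compatibility": "togethercomputer/Qwen2.5-Coder-32B-Instruct",  # matches the base model
    "budget": "togethercomputer/CodeLlama-34b-instruct",             # best price/performance
    "quality": "togethercomputer/meta-llama-3-70b-instruct",         # strongest reasoning
}

def pick_model(priority="compatibility"):
    """Map a priority label to a Together model ID."""
    if priority not in MODEL_CHOICES:
        raise ValueError(f"unknown priority {priority!r}; choose from {sorted(MODEL_CHOICES)}")
    return MODEL_CHOICES[priority]
```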
|
|
## Cost Estimation


Together AI pricing (as of 2025; check their site for current rates):


| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|-------|---------------------|----------------------|
| Qwen2.5-Coder-32B | ~0.22 | ~0.22 |
| Meta-Llama-3-70B | ~0.70 | ~0.70 |
| CodeLlama-34B | ~0.22 | ~0.22 |
| Qwen2.5-72B | ~0.70 | ~0.70 |


### Example Cost Calculation


Suppose your typical usage is:
- 100 queries/day
- Average of 2,000 tokens per query (input + output)
- Qwen2.5-Coder-32B as the model


Then:
- Daily cost: `(100 * 2,000 / 1,000,000) * $0.22 ≈ $0.044`
- Monthly cost: ~$1.32
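
The same arithmetic as a tiny estimator (the helper name is ours, not a Stack 2.9 API; input and output tokens are priced equally, as in the table above):

```python
def daily_cost(queries_per_day, tokens_per_query, price_per_million):
    """Estimated daily spend in dollars."""
    return queries_per_day * tokens_per_query / 1_000_000 * price_per_million

cost = daily_cost(100, 2000, 0.22)
print(f"${cost:.3f}/day, ~${cost * 30:.2f}/month")  # → $0.044/day, ~$1.32/month
```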
|
|
**Very affordable for development and light production use.**
|
|
## Performance Considerations


- **Latency**: Expect 100-500 ms per request, depending on model size and prompt complexity
- **Rate Limits**: Together provides generous rate limits (check your plan)
- **Throughput**: Use concurrent requests for batch processing (while respecting rate limits)
- **Streaming**: Together supports streaming; use `stream=True` in the client for long generations
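
A minimal concurrency sketch for the throughput point. `fake_generate` is a stand-in for `client.generate` so the example runs offline; the worker cap doubles as a crude client-side rate limit:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_generate(prompt):
    # Stand-in for client.generate(prompt, ...); swap in the real client for use.
    return f"completion for: {prompt}"

def run_batch(prompts, generate=fake_generate, max_workers=4):
    """Run prompts concurrently, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate, prompts))

results = run_batch([f"task {i}" for i in range(8)])
```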
|
|
## Error Handling


Implement robust error handling for production:


```python
import os
import time

from stack_2_9_eval.model_client import create_model_client

def generate_with_retry(client, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = client.generate(prompt, temperature=0.2, max_tokens=1024)
            return result
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # exponential backoff
            print(f"Error: {e}. Retrying in {wait}s...")
            time.sleep(wait)

client = create_model_client(provider="together", api_key=os.getenv("TOGETHER_API_KEY"))
result = generate_with_retry(client, "Write a function to calculate prime numbers")
```
|
|
## Comparison with Other Providers


| Feature | Together AI | Ollama (local) | OpenAI | Anthropic |
|---------|-------------|----------------|--------|-----------|
| Cost (32B class) | Low (~$0.22/M) | Free (your hardware) | High (~$3/M) | High (~$3/M) |
| Qwen2.5-Coder | ✅ Supported | ✅ Via pull | ❌ No | ❌ No |
| Privacy | Cloud (check TOS) | Full local | Cloud | Cloud |
| Latency | Medium | Fast (local) | Medium | Medium |
| Setup Complexity | Low (API key) | Medium (install) | Low | Low |
| Rate Limits | Generous | Unlimited | Pay-as-you-go | Pay-as-you-go |
| Tool Calling | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |


**Best for Stack 2.9**: Together AI when you need cloud access and Qwen models without running locally.
|
|
## Troubleshooting


### API Key Errors


```
ValueError: Together AI API key required.
```


**Solution**: Set the `TOGETHER_API_KEY` environment variable or pass the `api_key` parameter.


### Model Not Found


```
openai.BadRequestError: The model '...' does not exist
```


**Solution**: Check the model name's spelling and capitalization (model IDs are case-sensitive). Browse available models at [Together Models](https://together.ai/models). Use the full model ID, e.g. `togethercomputer/Qwen2.5-Coder-32B-Instruct`.


### Rate Limit Exceeded


**Solution**: Add retry logic with exponential backoff (see Error Handling above). Consider upgrading your Together plan.


### Import Errors


```
ImportError: openai package required
```


**Solution**: `pip install "openai>=1.0.0"`
|
|
## Advanced Configuration


### Custom Base URL


If you need to use a custom endpoint (e.g., for regional deployments):


```python
client = create_model_client(
    provider="together",
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    base_url="https://your-custom-endpoint.together.ai/v1"
)
```


### Timeouts and Retries


```python
import os

from stack_2_9_eval.model_client import TogetherClient

client = TogetherClient(
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    api_key=os.getenv("TOGETHER_API_KEY"),
    timeout=300  # 5-minute timeout
)
```
|
|
### Streaming Responses


For long generations, use streaming (this requires modifying the client or calling the OpenAI library directly):


```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("TOGETHER_API_KEY"), base_url="https://api.together.xyz/v1")

stream = client.chat.completions.create(
    model="togethercomputer/Qwen2.5-Coder-32B-Instruct",
    messages=[{"role": "user", "content": "Write a detailed explanation of binary search"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
|
|
## Integration with Stack 2.9 CLI


To make Together AI the default provider:


```bash
# Set the environment variables permanently
echo 'export MODEL_PROVIDER="together"' >> ~/.zshrc
echo 'export TOGETHER_MODEL="togethercomputer/Qwen2.5-Coder-32B-Instruct"' >> ~/.zshrc
source ~/.zshrc
```


Now `stack.py` will automatically use Together AI without the `--provider` flag.
|
|
## Security Best Practices


1. **Never commit API keys** to version control. Use `.env` files or environment variables.
2. **Rotate keys** periodically from the Together dashboard.
3. **Use minimal permissions**: Together API keys have full access; protect them.
4. **Enable billing alerts** to avoid unexpected charges.
5. **Review Together's TOS** for data usage and privacy policies.
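
Points 1 and 3 can be backed up in code: fail fast if the key is missing, and never log it in full. The helper below is a sketch, not part of Stack 2.9:

```python
import os

def load_api_key(var="TOGETHER_API_KEY"):
    """Read an API key from the environment, failing fast if missing."""
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"{var} is not set; refusing to start.")
    # Mask the key when logging (assumes keys longer than 8 characters).
    print(f"Loaded {var}: {key[:4]}...{key[-4:]}")
    return key
```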
|
|
## Support


- **Together Documentation**: https://docs.together.ai/
- **Stack 2.9 Issues**: https://github.com/my-ai-stack/stack-2.9/issues
- **Model Cards**: See `MODEL_CARD.md` for Stack 2.9 details


---


**Last Updated**: 2025-04-02  
**Compatible Stack 2.9 Version**: 2.9.0+
|
|