| # Stack 2.9 — 5-Minute Quick Start |
|
|
| > **Goal:** Get Stack 2.9 running and solving coding tasks in under 5 minutes. |
|
|
| Stack 2.9 is an AI coding assistant powered by **Qwen2.5-Coder-32B** with Pattern Memory — it learns from your interactions and improves over time. |
|
|
| --- |
|
|
| ## 📋 Prerequisites |
|
|
| ### Required |
| | Requirement | Version | Check | |
| |-------------|---------|-------| |
| | Python | 3.10+ | `python3 --version` | |
| | Git | Any recent | `git --version` | |
| | pip | Latest | `pip --version` | |
|
|
| ### Optional (Recommended) |
| | Resource | Why You Need It | Minimum | |
| |----------|----------------|---------| |
| | **GPU** | Fast code generation | RTX 3070 / M1 Pro | |
| | **16GB VRAM** | Run 32B model smoothly | 8GB for 7B quantized | |
|
|
| > **No GPU?** Stack 2.9 works on CPU via Ollama or cloud providers (OpenAI, Together AI, etc.). |
|
|
| --- |
|
|
| ## ⚡ Step 1 — Install in 60 Seconds |
|
|
| ```bash |
| # 1. Clone the repository |
| git clone https://github.com/my-ai-stack/stack-2.9.git |
| cd stack-2.9 |
| |
| # 2. Create a virtual environment (recommended) |
| python3 -m venv venv |
| source venv/bin/activate # On Windows: venv\Scripts\activate |
| |
| # 3. Install dependencies |
| pip install --upgrade pip |
| pip install -r requirements.txt |
| |
| # 4. Copy environment template |
| cp .env.example .env |
| ``` |
|
|
| **That's it.** If you hit errors, see [Troubleshooting](#-troubleshooting) below. |
|
|
| --- |
|
|
| ## 🔑 Step 2 — Configure Your Model Provider |
|
|
| Stack 2.9 supports multiple LLM providers. **Pick one that matches your setup:** |
|
|
| ### Option A: Ollama (Recommended — Local, Private) |
|
|
| ```bash |
| # Install Ollama (macOS/Linux) |
| curl -fsSL https://ollama.ai/install.sh | sh |
| |
| # Pull the Qwen model |
| ollama pull qwen2.5-coder:32b |
| |
| # Set environment |
| export MODEL_PROVIDER=ollama |
| export OLLAMA_MODEL=qwen2.5-coder:32b |
| ``` |
|
|
| Edit your `.env` file: |
| ```env |
| MODEL_PROVIDER=ollama |
| OLLAMA_MODEL=qwen2.5-coder:32b |
| ``` |
|
|
| ### Option B: Together AI (Best for Qwen, Cloud) |
|
|
| ```bash |
| # Get your API key at https://together.ai |
| export TOGETHER_API_KEY=tog-your-key-here |
| ``` |
|
|
| Edit your `.env`: |
| ```env |
| MODEL_PROVIDER=together |
| TOGETHER_API_KEY=tog-your-key-here |
| TOGETHER_MODEL=togethercomputer/qwen2.5-32b-instruct |
| ``` |
|
|
| ### Option C: OpenAI (GPT-4o) |
|
|
| ```env |
| MODEL_PROVIDER=openai |
| OPENAI_API_KEY=sk-your-key-here |
| OPENAI_MODEL=gpt-4o |
| ``` |
|
|
| ### Option D: Anthropic (Claude) |
|
|
| ```env |
| MODEL_PROVIDER=anthropic |
| ANTHROPIC_API_KEY=sk-ant-your-key-here |
| ANTHROPIC_MODEL=claude-3-5-sonnet-20240229 |
| ``` |
|
|
| ### Option E: OpenRouter (Unified Access) |
|
|
| ```env |
| MODEL_PROVIDER=openrouter |
| OPENROUTER_API_KEY=sk-or-your-key-here |
| OPENROUTER_MODEL=openai/gpt-4o |
| ``` |
|
|
| --- |
|
|
| ## 🚀 Step 3 — Run Your First Task |
|
|
| ### Interactive Chat Mode |
|
|
| ```bash |
| python stack.py |
| ``` |
|
|
| You'll see: |
| ``` |
| ╔══════════════════════════════════════════════╗ |
| ║ Stack 2.9 — AI Coding Assistant ║ |
| ║ Pattern Memory: Active | Tools: 46 ║ |
| ╚══════════════════════════════════════════════╝ |
| |
| You: Write a Python function to reverse a string |
| ``` |
|
|
| ### Single Query Mode |
|
|
| ```bash |
| python stack.py -c "Write a Python function to reverse a string" |
| ``` |
|
|
| **Expected output:** |
| ```python |
| def reverse_string(s): |
| """Reverse a string and return it.""" |
| return s[::-1] |
| |
| # Or for a more robust version: |
| def reverse_string(s): |
| return ''.join(reversed(s)) |
| ``` |
|
|
| ### Ask About Your Codebase |
|
|
| ```bash |
| python stack.py -c "Find all Python files modified in the last week and list them" |
| ``` |
|
|
| ### Generate and Run Code |
|
|
| ```bash |
| python stack.py -c "Create a hello world Flask app with one route" |
| ``` |
|
|
| --- |
|
|
| ## 📊 Step 4 — Run Evaluation (Optional) |
|
|
| > **Note:** Evaluation requires a GPU with ~16GB VRAM or more. |
|
|
| ### Prepare Your Fine-Tuned Model |
|
|
| After training Stack 2.9 on your data, your merged model will be in: |
| ``` |
| ./output/merged/ |
| ``` |
|
|
| ### Run HumanEval Benchmark |
|
|
| ```bash |
| python evaluate_model.py \ |
| --model-path ./output/merged \ |
| --benchmark humaneval \ |
| --num-samples 10 \ |
| --output results.json |
| ``` |
|
|
| ### Run MBPP Benchmark |
|
|
| ```bash |
| python evaluate_model.py \ |
| --model-path ./output/merged \ |
| --benchmark mbpp \ |
| --num-samples 10 \ |
| --output results.json |
| ``` |
|
|
| ### Run Both Benchmarks |
|
|
| ```bash |
| python evaluate_model.py \ |
| --model-path ./output/merged \ |
| --benchmark both \ |
| --num-samples 10 \ |
| --k-values 1,10 \ |
| --output results.json |
| ``` |
|
|
| **Expected output format:** |
| ``` |
| ============================================================ |
| HumanEval Results |
| ============================================================ |
| pass@1: 65.00% |
| pass@10: 82.00% |
| Total problems evaluated: 12 |
| ============================================================ |
|
|
| ============================================================ |
| MBPP Results |
| ============================================================ |
| pass@1: 70.00% |
| pass@10: 85.00% |
| Total problems evaluated: 12 |
| ============================================================ |
| ``` |
| |
| ### Quick Evaluation (5 Problems Only) |
| |
| ```bash |
| python evaluate_model.py \ |
| --model-path ./output/merged \ |
| --benchmark humaneval \ |
| --num-problems 5 \ |
| --num-samples 5 |
| ``` |
| |
| --- |
| |
| ## 🐳 Step 5 — Deploy Stack 2.9 |
| |
| ### Deploy Locally with Docker |
| |
| ```bash |
| # Start the container |
| docker build -t stack-2.9 . |
| docker run -p 7860:7860 \ |
| -e MODEL_PROVIDER=ollama \ |
| -e OLLAMA_MODEL=qwen2.5-coder:32b \ |
| stack-2.9 |
| ``` |
| |
| Access at: **http://localhost:7860** |
|
|
| ### Deploy to RunPod (Cloud GPU) |
|
|
| ```bash |
| # Edit runpod_deploy.sh with your config first |
| bash runpod_deploy.sh --gpu a100 --instance hourly |
| ``` |
|
|
| ### Deploy to Kubernetes |
|
|
| ```bash |
| # 1. Edit k8s/secret.yaml with your HuggingFace token |
| # 2. Apply the manifests |
| kubectl apply -f k8s/namespace.yaml |
| kubectl apply -f k8s/secret.yaml |
| kubectl apply -f k8s/configmap.yaml |
| kubectl apply -f k8s/pvc.yaml |
| kubectl apply -f k8s/deployment.yaml |
| kubectl apply -f k8s/service.yaml |
| |
| # Check status |
| kubectl get pods -n stack-29 |
| kubectl logs -n stack-29 deployment/stack-29 |
| ``` |
|
|
| ### Hardware Requirements for Deployment |
|
|
| | Model Size | Minimum GPU | Recommended | Quantized (4-bit) | |
| |------------|-------------|-------------|-------------------| |
| | 7B | RTX 3070 (8GB) | A100 40GB | RTX 3060 (6GB) | |
| | 32B | A100 40GB | A100 80GB | RTX 3090 (24GB) | |
|
|
| --- |
|
|
| ## 🧠 Pattern Memory Quick Guide |
|
|
| Stack 2.9 stores successful patterns to help with future tasks. |
|
|
| ### List Your Patterns |
|
|
| ```bash |
| python stack.py --patterns list |
| python stack.py --patterns stats |
| ``` |
|
|
| ### Extract Patterns from Your Git History |
|
|
| ```bash |
| python scripts/extract_patterns_from_git.py \ |
| --repo-path . \ |
| --output patterns.jsonl \ |
| --since-date "2024-01-01" |
| ``` |
|
|
| ### Merge LoRA Adapters (Team Sharing) |
|
|
| ```bash |
| python scripts/merge_lora_adapters.py \ |
| --adapters adapter_a.safetensors adapter_b.safetensors \ |
| --weights 0.7 0.3 \ |
| --output merged.safetensors |
| ``` |
|
|
| --- |
|
|
| ## 🛠️ Troubleshooting |
|
|
| ### "Module not found" errors |
|
|
| ```bash |
| pip install -r requirements.txt |
| ``` |
|
|
| ### "CUDA out of memory" during evaluation |
|
|
| ```bash |
| # Reduce batch size |
| python evaluate_model.py --model-path ./merged --num-samples 5 |
| |
| # Or use 4-bit quantization |
| # (See docs/TRAINING_7B.md for quantized training) |
| ``` |
|
|
| ### "Model not found" with Ollama |
|
|
| ```bash |
| ollama pull qwen2.5-coder:32b |
| ollama list # Verify it's installed |
| ``` |
|
|
| ### "API key not set" errors |
|
|
| ```bash |
| # Double-check your .env file |
| cat .env |
| |
| # For testing, you can also set inline |
| export TOGETHER_API_KEY=tog-your-key |
| ``` |
|
|
| ### Slow inference on CPU |
|
|
| ```bash |
| # Use a smaller model |
| export OLLAMA_MODEL=qwen2.5-coder:7b |
| |
| # Or switch to cloud |
| export MODEL_PROVIDER=together |
| ``` |
|
|
| ### Docker build fails |
|
|
| ```bash |
| # Use Python 3.10 explicitly |
| docker build --build-arg PYTHON_VERSION=3.10 -t stack-2.9 . |
| ``` |
|
|
| ### Kubernetes GPU not found |
|
|
| ```bash |
| # Verify nvidia.com/gpu label on your node |
| kubectl get nodes -L nvidia.com/gpu |
| |
| # Install NVIDIA GPU Operator if missing |
| # https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/ |
| ``` |
|
|
| --- |
|
|
| ## 📚 What's Next? |
|
|
| | Goal | Go To | |
| |------|-------| |
| | Train on my own data | `docs/TRAINING_7B.md` | |
| | Learn all 46 tools | `TOOLS.md` | |
| | Set up team pattern sharing | `docs/pattern-moat.md` | |
| | Understand the architecture | `docs/reference/ARCHITECTURE.md` | |
| | Report a bug | `SECURITY.md` / GitHub Issues | |
|
|
| --- |
|
|
| ## ⚡ Quick Reference Card |
|
|
| ```bash |
| # Install |
| git clone https://github.com/my-ai-stack/stack-2.9.git |
| cd stack-2.9 && pip install -r requirements.txt |
| |
| # Configure |
| cp .env.example .env # Edit with your API keys |
| |
| # Run |
| python stack.py # Interactive |
| python stack.py -c "your code request" # Single query |
| |
| # Evaluate |
| python evaluate_model.py --model-path ./merged --benchmark humaneval |
| |
| # Deploy |
| docker build -t stack-2.9 . && docker run -p 7860:7860 stack-2.9 |
| ``` |
|
|
| --- |
|
|
| *Stack 2.9 — AI that learns your patterns and grows with you.* |
|
|