# Stack 2.9 — 5-Minute Quick Start
> **Goal:** Get Stack 2.9 running and solving coding tasks in under 5 minutes.
Stack 2.9 is an AI coding assistant powered by **Qwen2.5-Coder-32B** with Pattern Memory — it learns from your interactions and improves over time.
---
## 📋 Prerequisites
### Required
| Requirement | Version | Check |
|-------------|---------|-------|
| Python | 3.10+ | `python3 --version` |
| Git | Any recent | `git --version` |
| pip | Latest | `pip --version` |
### Optional (Recommended)
| Resource | Why You Need It | Minimum |
|----------|----------------|---------|
| **GPU** | Fast code generation | RTX 3070 / M1 Pro |
| **24GB VRAM** | Run the 32B model (4-bit quantized) | 8GB for 7B quantized |
> **No GPU?** Stack 2.9 works on CPU via Ollama or cloud providers (OpenAI, Together AI, etc.).
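If you'd rather script the checks from the table above, here is a minimal Python sketch; it only assumes that `git` and `pip` land on your PATH when installed:

```python
import shutil
import sys

def check_prereqs(min_python=(3, 10)):
    """Return a dict mapping each prerequisite to a pass/fail boolean."""
    return {
        "python": sys.version_info[:2] >= min_python,
        "git": shutil.which("git") is not None,
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
    }

if __name__ == "__main__":
    for name, ok in check_prereqs().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```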
---
## ⚡ Step 1 — Install in 60 Seconds
```bash
# 1. Clone the repository
git clone https://github.com/my-ai-stack/stack-2.9.git
cd stack-2.9
# 2. Create a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# 3. Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# 4. Copy environment template
cp .env.example .env
```
**That's it.** If you hit errors, see [Troubleshooting](#-troubleshooting) below.
---
## 🔑 Step 2 — Configure Your Model Provider
Stack 2.9 supports multiple LLM providers. **Pick one that matches your setup:**
### Option A: Ollama (Recommended — Local, Private)
```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull the Qwen model
ollama pull qwen2.5-coder:32b
# Set environment
export MODEL_PROVIDER=ollama
export OLLAMA_MODEL=qwen2.5-coder:32b
```
Edit your `.env` file:
```env
MODEL_PROVIDER=ollama
OLLAMA_MODEL=qwen2.5-coder:32b
```
### Option B: Together AI (Best for Qwen, Cloud)
```bash
# Get your API key at https://together.ai
export TOGETHER_API_KEY=tog-your-key-here
```
Edit your `.env`:
```env
MODEL_PROVIDER=together
TOGETHER_API_KEY=tog-your-key-here
TOGETHER_MODEL=Qwen/Qwen2.5-Coder-32B-Instruct
```
### Option C: OpenAI (GPT-4o)
```env
MODEL_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-4o
```
### Option D: Anthropic (Claude)
```env
MODEL_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-your-key-here
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
```
### Option E: OpenRouter (Unified Access)
```env
MODEL_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-your-key-here
OPENROUTER_MODEL=openai/gpt-4o
```
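To see how these variables fit together, here is a hypothetical sketch of provider resolution. Stack 2.9's actual config loader may differ; the variable names simply match the `.env` keys above:

```python
import os

# Hypothetical mapping: which API-key variable each provider requires.
# Ollama runs locally, so no key is needed.
PROVIDER_KEY_VARS = {
    "ollama": None,
    "together": "TOGETHER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def resolve_provider(env=os.environ):
    """Validate MODEL_PROVIDER and its API key; return the provider name."""
    provider = env.get("MODEL_PROVIDER", "ollama").lower()
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"Unknown MODEL_PROVIDER: {provider}")
    key_var = PROVIDER_KEY_VARS[provider]
    if key_var and not env.get(key_var):
        raise ValueError(f"{key_var} must be set for provider '{provider}'")
    return provider
```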
---
## 🚀 Step 3 — Run Your First Task
### Interactive Chat Mode
```bash
python stack.py
```
You'll see:
```
╔══════════════════════════════════════════════╗
║    Stack 2.9 — AI Coding Assistant           ║
║    Pattern Memory: Active | Tools: 46        ║
╚══════════════════════════════════════════════╝
You: Write a Python function to reverse a string
```
### Single Query Mode
```bash
python stack.py -c "Write a Python function to reverse a string"
```
**Expected output:**
```python
def reverse_string(s):
    """Reverse a string and return it."""
    return s[::-1]

# Or for a more robust version:
def reverse_string(s):
    return ''.join(reversed(s))
```
### Ask About Your Codebase
```bash
python stack.py -c "Find all Python files modified in the last week and list them"
```
### Generate and Run Code
```bash
python stack.py -c "Create a hello world Flask app with one route"
```
---
## 📊 Step 4 — Run Evaluation (Optional)
> **Note:** Evaluation requires a GPU with ~16GB VRAM or more.
### Prepare Your Fine-Tuned Model
After training Stack 2.9 on your data, your merged model will be in:
```
./output/merged/
```
### Run HumanEval Benchmark
```bash
python evaluate_model.py \
--model-path ./output/merged \
--benchmark humaneval \
--num-samples 10 \
--output results.json
```
### Run MBPP Benchmark
```bash
python evaluate_model.py \
--model-path ./output/merged \
--benchmark mbpp \
--num-samples 10 \
--output results.json
```
### Run Both Benchmarks
```bash
python evaluate_model.py \
--model-path ./output/merged \
--benchmark both \
--num-samples 10 \
--k-values 1,10 \
--output results.json
```
**Expected output format:**
```
============================================================
HumanEval Results
============================================================
pass@1: 65.00%
pass@10: 82.00%
Total problems evaluated: 12
============================================================
============================================================
MBPP Results
============================================================
pass@1: 70.00%
pass@10: 85.00%
Total problems evaluated: 12
============================================================
```
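The pass@k figures above are conventionally computed with the unbiased estimator used for HumanEval-style evaluation. Whether `evaluate_model.py` uses exactly this implementation is an assumption, but the arithmetic looks like:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples generated, c of them correct, budget k.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # fewer than k failures, so any k draws include a pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 2 samples per problem and 1 passing, pass@1 is 0.5: a single random draw hits the correct sample half the time.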
### Quick Evaluation (5 Problems Only)
```bash
python evaluate_model.py \
--model-path ./output/merged \
--benchmark humaneval \
--num-problems 5 \
--num-samples 5
```
---
## 🐳 Step 5 — Deploy Stack 2.9
### Deploy Locally with Docker
```bash
# Start the container
docker build -t stack-2.9 .
docker run -p 7860:7860 \
-e MODEL_PROVIDER=ollama \
-e OLLAMA_MODEL=qwen2.5-coder:32b \
stack-2.9
```
Access at: **http://localhost:7860**
### Deploy to RunPod (Cloud GPU)
```bash
# Edit runpod_deploy.sh with your config first
bash runpod_deploy.sh --gpu a100 --instance hourly
```
### Deploy to Kubernetes
```bash
# 1. Edit k8s/secret.yaml with your HuggingFace token
# 2. Apply the manifests
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/pvc.yaml
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
# Check status
kubectl get pods -n stack-29
kubectl logs -n stack-29 deployment/stack-29
```
### Hardware Requirements for Deployment
| Model Size | Minimum GPU | Recommended | Quantized (4-bit) |
|------------|-------------|-------------|-------------------|
| 7B | RTX 3070 (8GB) | A100 40GB | RTX 3060 (6GB) |
| 32B | A100 40GB | A100 80GB | RTX 3090 (24GB) |
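The table's sizing follows from bytes per parameter; a back-of-envelope sketch (the 20% overhead factor for activations and KV cache is a rough assumption):

```python
def estimated_vram_gb(params_billions: float, bits_per_param: int,
                      overhead: float = 1.2) -> float:
    """Rough VRAM needed to hold the weights, plus a fudge factor.

    1B parameters at 8 bits per parameter is about 1 GB of weights.
    """
    weight_gb = params_billions * bits_per_param / 8
    return weight_gb * overhead
```

At fp16, a 32B model needs ~64 GB for weights alone (hence the A100 80GB recommendation), while 4-bit quantization brings it to ~16 GB, which fits a 24 GB card with room for overhead.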
---
## 🧠 Pattern Memory Quick Guide
Stack 2.9 stores successful patterns to help with future tasks.
### List Your Patterns
```bash
python stack.py --patterns list
python stack.py --patterns stats
```
### Extract Patterns from Your Git History
```bash
python scripts/extract_patterns_from_git.py \
--repo-path . \
--output patterns.jsonl \
--since-date "2024-01-01"
```
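The extracted `patterns.jsonl` can be filtered with a few lines of Python. Note that the `name` and `uses` fields below are a hypothetical schema; check the script's actual output for the real field names:

```python
import json

def load_patterns(lines, min_uses=1):
    """Parse pattern records from JSONL lines, keeping frequently used ones.

    'name' and 'uses' are a hypothetical schema for illustration.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("uses", 0) >= min_uses:
            patterns.append(record)
    return patterns
```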
### Merge LoRA Adapters (Team Sharing)
```bash
python scripts/merge_lora_adapters.py \
--adapters adapter_a.safetensors adapter_b.safetensors \
--weights 0.7 0.3 \
--output merged.safetensors
```
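`merge_lora_adapters.py` presumably computes a weighted average of same-shaped adapter tensors. The arithmetic can be illustrated with plain Python lists standing in for tensors (the real script operates on `.safetensors` files):

```python
def weighted_merge(adapters, weights):
    """Weighted average of same-shaped parameter dicts.

    Lists of floats stand in for real tensors in this sketch.
    """
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights should sum to 1.0")
    merged = {}
    for key in adapters[0]:
        merged[key] = [
            sum(w * adapter[key][i] for w, adapter in zip(weights, adapters))
            for i in range(len(adapters[0][key]))
        ]
    return merged
```

With weights `0.7 0.3` as in the command above, each merged parameter is 70% from the first adapter and 30% from the second.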
---
## 🛠️ Troubleshooting
### "Module not found" errors
```bash
pip install -r requirements.txt
```
### "CUDA out of memory" during evaluation
```bash
# Reduce the number of samples per problem
python evaluate_model.py --model-path ./output/merged --num-samples 5
# Or use 4-bit quantization
# (See docs/TRAINING_7B.md for quantized training)
```
### "Model not found" with Ollama
```bash
ollama pull qwen2.5-coder:32b
ollama list # Verify it's installed
```
### "API key not set" errors
```bash
# Double-check your .env file
cat .env
# For testing, you can also set inline
export TOGETHER_API_KEY=tog-your-key
```
### Slow inference on CPU
```bash
# Use a smaller model
export OLLAMA_MODEL=qwen2.5-coder:7b
# Or switch to cloud
export MODEL_PROVIDER=together
```
### Docker build fails
```bash
# Use Python 3.10 explicitly
docker build --build-arg PYTHON_VERSION=3.10 -t stack-2.9 .
```
### Kubernetes GPU not found
```bash
# Verify nvidia.com/gpu label on your node
kubectl get nodes -L nvidia.com/gpu
# Install NVIDIA GPU Operator if missing
# https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/
```
---
## 📚 What's Next?
| Goal | Go To |
|------|-------|
| Train on my own data | `docs/TRAINING_7B.md` |
| Learn all 46 tools | `TOOLS.md` |
| Set up team pattern sharing | `docs/pattern-moat.md` |
| Understand the architecture | `docs/reference/ARCHITECTURE.md` |
| Report a bug | `SECURITY.md` / GitHub Issues |
---
## ⚡ Quick Reference Card
```bash
# Install
git clone https://github.com/my-ai-stack/stack-2.9.git
cd stack-2.9 && pip install -r requirements.txt
# Configure
cp .env.example .env # Edit with your API keys
# Run
python stack.py # Interactive
python stack.py -c "your code request" # Single query
# Evaluate
python evaluate_model.py --model-path ./output/merged --benchmark humaneval
# Deploy
docker build -t stack-2.9 . && docker run -p 7860:7860 stack-2.9
```
---
*Stack 2.9 — AI that learns your patterns and grows with you.*