Stack-2-9-finetuned / docs /QUICKSTART.md

walidsobhie-code

feat: add inference API, quickstart guide, roadmap, and combined tool data

b03a8a0 20 days ago

8.92 kB

	# Stack 2.9 — 5-Minute Quick Start

	> Goal: Get Stack 2.9 running and solving coding tasks in under 5 minutes.

	Stack 2.9 is an AI coding assistant powered by Qwen2.5-Coder-32B with Pattern Memory — it learns from your interactions and improves over time.

	---

	## 📋 Prerequisites

	### Required
	\| Requirement \| Version \| Check \|
	\|-------------\|---------\|-------\|
	\| Python \| 3.10+ \| `python3 --version` \|
	\| Git \| Any recent \| `git --version` \|
	\| pip \| Latest \| `pip --version` \|

	### Optional (Recommended)
	\| Resource \| Why You Need It \| Minimum \|
	\|----------\|----------------\|---------\|
	\| GPU \| Fast code generation \| RTX 3070 / M1 Pro \|
	\| 16GB VRAM \| Run 32B model smoothly \| 8GB for 7B quantized \|

	> No GPU? Stack 2.9 works on CPU via Ollama or cloud providers (OpenAI, Together AI, etc.).

	---

	## ⚡ Step 1 — Install in 60 Seconds

	```bash
	# 1. Clone the repository
	git clone https://github.com/my-ai-stack/stack-2.9.git
	cd stack-2.9

	# 2. Create a virtual environment (recommended)
	python3 -m venv venv
	source venv/bin/activate # On Windows: venv\Scripts\activate

	# 3. Install dependencies
	pip install --upgrade pip
	pip install -r requirements.txt

	# 4. Copy environment template
	cp .env.example .env
	```

	That's it. If you hit errors, see [Troubleshooting](#-troubleshooting) below.

	---

	## 🔑 Step 2 — Configure Your Model Provider

	Stack 2.9 supports multiple LLM providers. Pick one that matches your setup:

	### Option A: Ollama (Recommended — Local, Private)

	```bash
	# Install Ollama (macOS/Linux)
	curl -fsSL https://ollama.ai/install.sh \| sh

	# Pull the Qwen model
	ollama pull qwen2.5-coder:32b

	# Set environment
	export MODEL_PROVIDER=ollama
	export OLLAMA_MODEL=qwen2.5-coder:32b
	```

	Edit your `.env` file:
	```env
	MODEL_PROVIDER=ollama
	OLLAMA_MODEL=qwen2.5-coder:32b
	```

	### Option B: Together AI (Best for Qwen, Cloud)

	```bash
	# Get your API key at https://together.ai
	export TOGETHER_API_KEY=tog-your-key-here
	```

	Edit your `.env`:
	```env
	MODEL_PROVIDER=together
	TOGETHER_API_KEY=tog-your-key-here
	TOGETHER_MODEL=togethercomputer/qwen2.5-32b-instruct
	```

	### Option C: OpenAI (GPT-4o)

	```env
	MODEL_PROVIDER=openai
	OPENAI_API_KEY=sk-your-key-here
	OPENAI_MODEL=gpt-4o
	```

	### Option D: Anthropic (Claude)

	```env
	MODEL_PROVIDER=anthropic
	ANTHROPIC_API_KEY=sk-ant-your-key-here
	ANTHROPIC_MODEL=claude-3-5-sonnet-20240229
	```

	### Option E: OpenRouter (Unified Access)

	```env
	MODEL_PROVIDER=openrouter
	OPENROUTER_API_KEY=sk-or-your-key-here
	OPENROUTER_MODEL=openai/gpt-4o
	```

	---

	## 🚀 Step 3 — Run Your First Task

	### Interactive Chat Mode

	```bash
	python stack.py
	```

	You'll see:
	```
	╔══════════════════════════════════════════════╗
	║ Stack 2.9 — AI Coding Assistant ║
	║ Pattern Memory: Active \| Tools: 46 ║
	╚══════════════════════════════════════════════╝

	You: Write a Python function to reverse a string
	```

	### Single Query Mode

	```bash
	python stack.py -c "Write a Python function to reverse a string"
	```

	Expected output:
	```python
	def reverse_string(s):
	"""Reverse a string and return it."""
	return s[::-1]

	# Or for a more robust version:
	def reverse_string(s):
	return ''.join(reversed(s))
	```

	### Ask About Your Codebase

	```bash
	python stack.py -c "Find all Python files modified in the last week and list them"
	```

	### Generate and Run Code

	```bash
	python stack.py -c "Create a hello world Flask app with one route"
	```

	---

	## 📊 Step 4 — Run Evaluation (Optional)

	> Note: Evaluation requires a GPU with ~16GB VRAM or more.

	### Prepare Your Fine-Tuned Model

	After training Stack 2.9 on your data, your merged model will be in:
	```
	./output/merged/
	```

	### Run HumanEval Benchmark

	```bash
	python evaluate_model.py \
	--model-path ./output/merged \
	--benchmark humaneval \
	--num-samples 10 \
	--output results.json
	```

	### Run MBPP Benchmark

	```bash
	python evaluate_model.py \
	--model-path ./output/merged \
	--benchmark mbpp \
	--num-samples 10 \
	--output results.json
	```

	### Run Both Benchmarks

	```bash
	python evaluate_model.py \
	--model-path ./output/merged \
	--benchmark both \
	--num-samples 10 \
	--k-values 1,10 \
	--output results.json
	```

	Expected output format:
	```
	============================================================
	HumanEval Results
	============================================================
	pass@1: 65.00%
	pass@10: 82.00%
	Total problems evaluated: 12
	============================================================

	============================================================
	MBPP Results
	============================================================
	pass@1: 70.00%
	pass@10: 85.00%
	Total problems evaluated: 12
	============================================================
	```

	### Quick Evaluation (5 Problems Only)

	```bash
	python evaluate_model.py \
	--model-path ./output/merged \
	--benchmark humaneval \
	--num-problems 5 \
	--num-samples 5
	```

	---

	## 🐳 Step 5 — Deploy Stack 2.9

	### Deploy Locally with Docker

	```bash
	# Start the container
	docker build -t stack-2.9 .
	docker run -p 7860:7860 \
	-e MODEL_PROVIDER=ollama \
	-e OLLAMA_MODEL=qwen2.5-coder:32b \
	stack-2.9
	```

	Access at: http://localhost:7860

	### Deploy to RunPod (Cloud GPU)

	```bash
	# Edit runpod_deploy.sh with your config first
	bash runpod_deploy.sh --gpu a100 --instance hourly
	```

	### Deploy to Kubernetes

	```bash
	# 1. Edit k8s/secret.yaml with your HuggingFace token
	# 2. Apply the manifests
	kubectl apply -f k8s/namespace.yaml
	kubectl apply -f k8s/secret.yaml
	kubectl apply -f k8s/configmap.yaml
	kubectl apply -f k8s/pvc.yaml
	kubectl apply -f k8s/deployment.yaml
	kubectl apply -f k8s/service.yaml

	# Check status
	kubectl get pods -n stack-29
	kubectl logs -n stack-29 deployment/stack-29
	```

	### Hardware Requirements for Deployment

	\| Model Size \| Minimum GPU \| Recommended \| Quantized (4-bit) \|
	\|------------\|-------------\|-------------\|-------------------\|
	\| 7B \| RTX 3070 (8GB) \| A100 40GB \| RTX 3060 (6GB) \|
	\| 32B \| A100 40GB \| A100 80GB \| RTX 3090 (24GB) \|

	---

	## 🧠 Pattern Memory Quick Guide

	Stack 2.9 stores successful patterns to help with future tasks.

	### List Your Patterns

	```bash
	python stack.py --patterns list
	python stack.py --patterns stats
	```

	### Extract Patterns from Your Git History

	```bash
	python scripts/extract_patterns_from_git.py \
	--repo-path . \
	--output patterns.jsonl \
	--since-date "2024-01-01"
	```

	### Merge LoRA Adapters (Team Sharing)

	```bash
	python scripts/merge_lora_adapters.py \
	--adapters adapter_a.safetensors adapter_b.safetensors \
	--weights 0.7 0.3 \
	--output merged.safetensors
	```

	---

	## 🛠️ Troubleshooting

	### "Module not found" errors

	```bash
	pip install -r requirements.txt
	```

	### "CUDA out of memory" during evaluation

	```bash
	# Reduce batch size
	python evaluate_model.py --model-path ./merged --num-samples 5

	# Or use 4-bit quantization
	# (See docs/TRAINING_7B.md for quantized training)
	```

	### "Model not found" with Ollama

	```bash
	ollama pull qwen2.5-coder:32b
	ollama list # Verify it's installed
	```

	### "API key not set" errors

	```bash
	# Double-check your .env file
	cat .env

	# For testing, you can also set inline
	export TOGETHER_API_KEY=tog-your-key
	```

	### Slow inference on CPU

	```bash
	# Use a smaller model
	export OLLAMA_MODEL=qwen2.5-coder:7b

	# Or switch to cloud
	export MODEL_PROVIDER=together
	```

	### Docker build fails

	```bash
	# Use Python 3.10 explicitly
	docker build --build-arg PYTHON_VERSION=3.10 -t stack-2.9 .
	```

	### Kubernetes GPU not found

	```bash
	# Verify nvidia.com/gpu label on your node
	kubectl get nodes -L nvidia.com/gpu

	# Install NVIDIA GPU Operator if missing
	# https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/
	```

	---

	## 📚 What's Next?

	\| Goal \| Go To \|
	\|------\|-------\|
	\| Train on my own data \| `docs/TRAINING_7B.md` \|
	\| Learn all 46 tools \| `TOOLS.md` \|
	\| Set up team pattern sharing \| `docs/pattern-moat.md` \|
	\| Understand the architecture \| `docs/reference/ARCHITECTURE.md` \|
	\| Report a bug \| `SECURITY.md` / GitHub Issues \|

	---

	## ⚡ Quick Reference Card

	```bash
	# Install
	git clone https://github.com/my-ai-stack/stack-2.9.git
	cd stack-2.9 && pip install -r requirements.txt

	# Configure
	cp .env.example .env # Edit with your API keys

	# Run
	python stack.py # Interactive
	python stack.py -c "your code request" # Single query

	# Evaluate
	python evaluate_model.py --model-path ./merged --benchmark humaneval

	# Deploy
	docker build -t stack-2.9 . && docker run -p 7860:7860 stack-2.9
	```

	---

	Stack 2.9 — AI that learns your patterns and grows with you.