walidsobhie-code Claude Opus 4.6 committed on
Commit
239da7a
·
1 Parent(s): 65888d5

feat: Add free deployment support for Stack 2.9


New additions:
- Together AI fine-tuning script (free credits)
- HuggingFace Spaces deployment (free hosting)
- Free deployment guide with cost comparison
- Updated README with free tier options

Enables deployment on:
- HuggingFace Spaces (free inference API)
- Together AI (free fine-tuning)
- Google Colab (free training)

Recommended: Qwen2.5-Coder-7B for free tier

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

README.md CHANGED
@@ -130,7 +130,23 @@ Stack 2.9 requires a GPU for optimal performance. Minimum and recommended config
 - Multi-GPU (tensor parallelism) supported for large models
 - Ensure NVIDIA drivers and CUDA toolkit are installed
 
-For detailed deployment options (Docker, RunPod, Vast.ai, Kubernetes), see `stack-2.9-deploy/README.md`.
+### Free Deployment (No Cost)
+
+Stack 2.9 can be deployed on free platforms:
+
+| Platform | What's Free | How |
+|----------|-------------|-----|
+| **HuggingFace Spaces** | 2 CPU / 4 GB inference | `stack/deploy/FREE_DEPLOYMENT.md` |
+| **Together AI** | Fine-tuning credits | `stack/training/together_finetune.py` |
+| **Google Colab** | ~0.5 hr GPU/day | `colab_train_stack29.ipynb` |
+
+**Recommended for the free tier:**
+- Model: `Qwen2.5-Coder-7B` (runs on a free GPU)
+- Fine-tune: Together AI (free credits)
+- Deploy: HuggingFace Spaces (free hosting)
+
+See `stack/deploy/FREE_DEPLOYMENT.md` for the detailed guide.
+For paid deployment (Docker, RunPod, Vast.ai), see `stack/deploy/README.md`.
 
 ### Interactive Chat
 
@@ -424,3 +440,20 @@ Licensed under the Apache License 2.0. See [LICENSE](LICENSE) for details.
 <p align="center">
 Built with ❤️ for developers who want an AI that grows with them
 </p>
+
+
+### Free Deployment (No Cost)
+
+Stack 2.9 can run on free platforms:
+
+| Platform | What's Free | Recommended For |
+|----------|-------------|-----------------|
+| **HuggingFace Spaces** | 2 CPU / 4 GB hosting | API deployment |
+| **Together AI** | Fine-tuning credits | Model customization |
+| **Google Colab** | ~0.5 hr GPU/day | Training experiments |
+
+**Free-tier model:** use Qwen2.5-Coder-7B (runs on a free GPU)
+
+See `stack/deploy/FREE_DEPLOYMENT.md` for the detailed guide.
+
+For paid options, see `stack/deploy/README.md`.
stack/deploy/FREE_DEPLOYMENT.md ADDED
@@ -0,0 +1,132 @@
+# Free Deployment Guide for Stack 2.9
+
+This guide covers deploying Stack 2.9 on free-tier platforms.
+
+---
+
+## Option 1: HuggingFace Spaces (Free Inference)
+
+### Step 1: Create Space
+Go to https://huggingface.co/spaces and create a new Space.
+Choose: Docker SDK, Python 3.11, Small (2 CPU / 4 GB).
+
+### Step 2: Push Your Model
+```python
+# Upload your fine-tuned model to the Hugging Face Hub
+from huggingface_hub import HfApi
+
+api = HfApi()
+api.upload_folder(
+    folder_path="./stack-2.9-7b",
+    repo_id="yourusername/stack-2.9",
+    repo_type="model"
+)
+```
+
+### Step 3: Configure API URL
+Set environment variables in the Space settings:
+- `API_URL`: your model inference URL
+- `HF_TOKEN`: your HF token
+
+### Step 4: Deploy
+```bash
+# Clone the Space and push the app files
+git clone https://huggingface.co/spaces/yourusername/stack-2.9
+cp deploy/hfSpaces/* .
+git add . && git push
+```
+
+---
+
+## Option 2: Together AI Fine-tuning (Free Credits)
+
+### Free Tier Limits
+- Fine-tuning of models up to 7B parameters
+- Limited training minutes (varies by promotion)
+- Requires a Together AI account
+
+### Setup
+```bash
+# Get an API key from https://together.ai
+export TOGETHER_API_KEY="your-key"
+
+# Fine-tune a 7B model (free-tier friendly)
+python stack/training/together_finetune.py \
+    --model 7b \
+    --data data/final/train.jsonl \
+    --epochs 3
+```
+
+### Use the Fine-tuned Model
+```python
+from together import Together
+
+client = Together(api_key="your-key")
+
+response = client.chat.completions.create(
+    model="your-finetuned-model",
+    messages=[{"role": "user", "content": "Write a function"}]
+)
+```
+
+---
+
+## Option 3: Google Colab (Free Training)
+
+### Run Training
+Open `colab_train_stack29.ipynb` in Google Colab and select a GPU runtime
+(free tier: T4, 15 GB VRAM).
+
+```python
+# For a 7B model on the free tier:
+batch_size = 2              # reduced to fit 15 GB VRAM
+gradient_accumulation = 8   # effective batch size: 2 * 8 = 16
+```
+
+### Model Sizes for Free Tier
+| Model | VRAM Needed | Free Tier? |
+|-------|-------------|------------|
+| 1.5B  | ~4 GB       | ✅ Yes      |
+| 3B    | ~8 GB       | ✅ Yes (T4) |
+| 7B    | ~16 GB      | ⚠️ Limited  |
+| 32B   | ~64 GB      | ❌ No       |
+
+---
+
+## Option 4: RunPod / Vast.ai (Cheap, Not Free)
+
+### Quick Start
+```bash
+# Deploy on RunPod (~$0.20/hour for an A100)
+cd stack/deploy
+./runpod_deploy.sh --template runpod-template.json
+
+# Deploy on Vast.ai (~$0.15/hour)
+./vastai_deploy.sh --template vastai-template.json
+```
+
+---
+
+## Recommended Free Stack
+
+```
+┌─────────────────────────────────────────────┐
+│ Stack 2.9 Free Deployment Stack             │
+├─────────────────────────────────────────────┤
+│ Model:     Qwen2.5-Coder-7B                 │
+│ Fine-tune: Together AI (free credits)       │
+│ Deploy:    HuggingFace Spaces (free)        │
+│ UI:        Gradio (included in Spaces)      │
+└─────────────────────────────────────────────┘
+```
+
+## Cost Comparison
+
+| Platform    | Cost     | What's Free          |
+|-------------|----------|----------------------|
+| HF Spaces   | $0       | 2 CPU / 4 GB hosting |
+| Together AI | varies   | Fine-tuning credits  |
+| Colab       | $0       | ~0.5 hr GPU/day      |
+| RunPod      | $0.20/hr | First $10 credit     |
+| Vast.ai     | $0.15/hr | First $5 credit      |
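Once a Space is live, its chat endpoint expects the request shape shown in the guide above. A minimal client sketch, assuming a placeholder Space hostname (`yourusername-stack-2-9.hf.space` is illustrative, not a real URL); only the payload construction is executed here, with the actual POST left as a comment:

```python
import json

# Sketch of a client for the Space's /v1/chat/completions endpoint.
# SPACE_URL is a placeholder; substitute your own Space hostname.
SPACE_URL = "https://yourusername-stack-2-9.hf.space"

def build_chat_payload(prompt: str, max_tokens: int = 1024,
                       temperature: float = 0.7) -> dict:
    """Assemble the JSON body expected by /v1/chat/completions."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_payload("Write a function that reverses a string")
print(json.dumps(payload, indent=2))
# Send with: requests.post(f"{SPACE_URL}/v1/chat/completions", json=payload, timeout=60)
```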
stack/deploy/hfSpaces/Dockerfile ADDED
@@ -0,0 +1,26 @@
+# HuggingFace Spaces Dockerfile for Stack 2.9
+# Use this for free inference hosting on HF Spaces
+# https://huggingface.co/docs/hub/spaces-sdks-docker
+
+FROM python:3.11-slim
+
+# Set environment
+ENV PYTHONUNBUFFERED=1
+ENV PORT=7860
+
+# Install dependencies (quote uvicorn[standard] so the shell does not glob it)
+RUN pip install --no-cache-dir \
+    fastapi \
+    "uvicorn[standard]" \
+    pydantic \
+    requests \
+    huggingface_hub
+
+# Copy app
+COPY app.py .
+
+# Expose port
+EXPOSE 7860
+
+# Run app
+CMD ["python", "app.py"]
stack/deploy/hfSpaces/app.py ADDED
@@ -0,0 +1,147 @@
+"""
+HuggingFace Spaces Deployment for Stack 2.9
+
+Free inference API on HuggingFace Spaces.
+https://huggingface.co/docs/hub/spaces-sdks-docker
+"""
+
+import os
+from typing import Optional, List, Dict
+
+import requests
+from fastapi import FastAPI, HTTPException
+from pydantic import BaseModel
+
+app = FastAPI(title="Stack 2.9 API")
+
+# Model configuration
+MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-Coder-7B-Instruct")
+API_URL = os.environ.get("API_URL", "")    # Your model API URL
+HF_TOKEN = os.environ.get("HF_TOKEN", "")  # HuggingFace token
+
+# ============================================================================
+# Request/Response Models
+# ============================================================================
+
+class ChatMessage(BaseModel):
+    role: str
+    content: str
+
+class ChatRequest(BaseModel):
+    messages: List[ChatMessage]
+    max_tokens: int = 1024
+    temperature: float = 0.7
+    top_p: float = 0.9
+
+class ChatResponse(BaseModel):
+    content: str
+    model: str
+    usage: Optional[Dict] = None
+
+class CompletionRequest(BaseModel):
+    prompt: str
+    max_tokens: int = 512
+    temperature: float = 0.7
+
+# ============================================================================
+# Health Check
+# ============================================================================
+
+@app.get("/health")
+async def health():
+    return {"status": "healthy", "model": MODEL_NAME}
+
+@app.get("/")
+async def root():
+    return {
+        "name": "Stack 2.9",
+        "version": "1.0.0",
+        "model": MODEL_NAME,
+        "endpoints": {
+            "chat": "/v1/chat/completions",
+            "complete": "/v1/completions",
+            "health": "/health"
+        }
+    }
+
+# ============================================================================
+# OpenAI-Compatible API
+# ============================================================================
+
+@app.post("/v1/chat/completions", response_model=ChatResponse)
+async def chat_completions(request: ChatRequest):
+    """OpenAI-compatible chat endpoint"""
+
+    if API_URL:
+        # Forward to the external model API
+        response = requests.post(
+            f"{API_URL}/v1/chat/completions",
+            headers={"Authorization": f"Bearer {HF_TOKEN}"},
+            json={
+                "messages": [m.dict() for m in request.messages],
+                "max_tokens": request.max_tokens,
+                "temperature": request.temperature,
+            },
+            timeout=60
+        )
+        response.raise_for_status()
+        data = response.json()
+        # Map the upstream OpenAI-style payload onto ChatResponse;
+        # returning the raw JSON would fail response_model validation.
+        return ChatResponse(
+            content=data["choices"][0]["message"]["content"],
+            model=data.get("model", MODEL_NAME),
+            usage=data.get("usage"),
+        )
+
+    raise HTTPException(
+        status_code=503,
+        detail="No model API configured. Set API_URL environment variable."
+    )
+
+@app.post("/v1/completions")
+async def completions(request: CompletionRequest):
+    """OpenAI-compatible completion endpoint (upstream JSON passed through)"""
+
+    if API_URL:
+        response = requests.post(
+            f"{API_URL}/v1/completions",
+            headers={"Authorization": f"Bearer {HF_TOKEN}"},
+            json={
+                "prompt": request.prompt,
+                "max_tokens": request.max_tokens,
+                "temperature": request.temperature,
+            },
+            timeout=60
+        )
+        response.raise_for_status()
+        return response.json()
+
+    raise HTTPException(
+        status_code=503,
+        detail="No model API configured"
+    )
+
+# ============================================================================
+# Model Info
+# ============================================================================
+
+@app.get("/v1/models")
+async def list_models():
+    return {
+        "object": "list",
+        "data": [
+            {
+                "id": MODEL_NAME,
+                "object": "model",
+                "created": 1700000000,
+                "owned_by": "stack-2.9"
+            }
+        ]
+    }
+
+# ============================================================================
+# Run Server
+# ============================================================================
+
+if __name__ == "__main__":
+    import uvicorn
+    port = int(os.environ.get("PORT", "7860"))
+    uvicorn.run(app, host="0.0.0.0", port=port)
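Since `/v1/completions` hands the upstream JSON back to the caller unchanged, clients parse the usual OpenAI-style completion shape. A stdlib-only sketch; the sample payload below is illustrative, not real model output:

```python
# Pull the generated text out of an OpenAI-style completions response,
# which the /v1/completions proxy passes through unchanged.
def extract_text(response_json: dict) -> str:
    return response_json["choices"][0]["text"]

# Illustrative upstream payload (abbreviated, made up for the example):
sample = {
    "model": "Qwen/Qwen2.5-Coder-7B-Instruct",
    "choices": [{"index": 0, "text": "def add(a, b):\n    return a + b"}],
}
print(extract_text(sample))
```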
stack/training/together_finetune.py ADDED
@@ -0,0 +1,138 @@
+"""
+Together AI Fine-tuning Script for Stack 2.9
+
+Free fine-tuning on the Together AI platform.
+https://docs.together.ai/docs/fine-tuning
+"""
+
+import os
+from typing import Optional
+
+import requests
+
+TOGETHER_API = "https://api.together.xyz/v1"
+
+class TogetherFineTuner:
+    def __init__(self, api_key: Optional[str] = None):
+        self.api_key = api_key or os.environ.get("TOGETHER_API_KEY")
+        if not self.api_key:
+            raise ValueError("TOGETHER_API_KEY required")
+
+    def upload_dataset(self, file_path: str) -> str:
+        """Upload training data to Together AI"""
+        url = f"{TOGETHER_API}/files"
+
+        with open(file_path, 'rb') as f:
+            response = requests.post(
+                url,
+                headers={"Authorization": f"Bearer {self.api_key}"},
+                files={"file": f}
+            )
+
+        if response.status_code == 200:
+            return response.json()['id']
+        raise RuntimeError(f"Upload failed: {response.text}")
+
+    def create_finetune_job(
+        self,
+        model: str,
+        training_file: str,
+        epochs: int = 3,
+        batch_size: int = 4,
+        learning_rate: float = 1e-5,
+    ) -> dict:
+        """
+        Create a fine-tuning job on Together AI.
+
+        Free tier: up to 7B models, limited training minutes.
+        """
+        url = f"{TOGETHER_API}/fine_tuning/jobs"
+
+        payload = {
+            "model": model,  # e.g., "Qwen/Qwen2.5-Coder-7B"
+            "training_file": training_file,
+            "epochs": epochs,
+            "batch_size": batch_size,
+            "learning_rate": learning_rate,
+            "lora": True,  # Enable LoRA for efficiency
+            "lora_r": 64,
+            "lora_alpha": 128,
+        }
+
+        response = requests.post(
+            url,
+            headers={
+                "Authorization": f"Bearer {self.api_key}",
+                "Content-Type": "application/json"
+            },
+            json=payload
+        )
+
+        if response.status_code == 200:
+            return response.json()
+        raise RuntimeError(f"Job creation failed: {response.text}")
+
+    def get_job_status(self, job_id: str) -> dict:
+        """Check fine-tuning job status"""
+        url = f"{TOGETHER_API}/fine_tuning/jobs/{job_id}"
+
+        response = requests.get(
+            url,
+            headers={"Authorization": f"Bearer {self.api_key}"}
+        )
+
+        return response.json()
+
+    def list_fine_tuned_models(self) -> list:
+        """List your fine-tuned models"""
+        url = f"{TOGETHER_API}/fine_tuning/models"
+
+        response = requests.get(
+            url,
+            headers={"Authorization": f"Bearer {self.api_key}"}
+        )
+
+        return response.json().get('models', [])
+
+
+# Recommended models for the free tier
+FREE_TIER_MODELS = {
+    "7b": "Qwen/Qwen2.5-Coder-7B",
+    "3b": "Qwen/Qwen2.5-Coder-3B",
+    "1.5b": "Qwen/Qwen2.5-Coder-1.5B",
+}
+
+def main():
+    import argparse
+    parser = argparse.ArgumentParser(description="Fine-tune on Together AI")
+    parser.add_argument("--api-key", type=str, help="Together AI API key")
+    parser.add_argument("--model", default="7b", choices=["7b", "3b", "1.5b"],
+                        help="Model size")
+    parser.add_argument("--data", required=True, help="Training data file (JSONL)")
+    parser.add_argument("--epochs", type=int, default=3)
+
+    args = parser.parse_args()
+
+    tuner = TogetherFineTuner(args.api_key)
+
+    # Upload data
+    print("Uploading dataset...")
+    file_id = tuner.upload_dataset(args.data)
+    print(f"Uploaded: {file_id}")
+
+    # Start job
+    model_name = FREE_TIER_MODELS[args.model]
+    print(f"Starting fine-tune on {model_name}...")
+
+    job = tuner.create_finetune_job(
+        model=model_name,
+        training_file=file_id,
+        epochs=args.epochs,
+    )
+
+    print(f"Job created: {job['id']}")
+    print(f"Status: {job['status']}")
+
+
+if __name__ == "__main__":
+    main()
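The script above returns as soon as the job is created; in practice callers poll `get_job_status` until a terminal state. A polling sketch with the status fetcher injected so it can run without network access; the terminal status names here are assumptions about Together's job lifecycle, not confirmed values:

```python
import time

# Poll a fine-tuning job until it reaches a terminal state.
# `fetch_status` is injected, e.g. lambda: tuner.get_job_status(job_id)["status"],
# so the loop can be exercised without hitting the API.
def wait_for_job(fetch_status, poll_seconds: float = 30, max_polls: int = 100,
                 terminal=("completed", "error", "cancelled")):
    """Return the first terminal status reported by fetch_status."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish within max_polls")

# Fake fetcher standing in for the Together API:
states = iter(["pending", "running", "completed"])
print(wait_for_job(lambda: next(states), poll_seconds=0))  # prints "completed"
```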