Deployment Troubleshooting Guide
Quick Diagnostic
Run the health check first:
curl http://localhost:8000/health
Or use Python:
python3 -c "import urllib.request; print(urllib.request.urlopen('http://localhost:8000/health').read())"
Check logs:
docker-compose logs -f vllm
# or
tail -f logs/vllm.log
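Because first-time model loading can take several minutes, it can be handy to poll the health endpoint in a loop instead of re-running curl by hand. A minimal sketch, assuming the /health endpoint shown above (the URL and timeouts are illustrative):

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, timeout: float = 900.0, interval: float = 5.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not up yet; keep polling
        time.sleep(interval)
    return False


if __name__ == "__main__":
    ok = wait_for_health("http://localhost:8000/health")
    print("healthy" if ok else "timed out")
```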
Common Issues and Solutions
1. Docker/Compose Issues
Problem: docker: command not found
Error: Docker is not installed or not in PATH.
Solution:
# Install Docker (Ubuntu/Debian)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Log out and back in
# Install Docker Compose
sudo apt-get install docker-compose-plugin
# or download binary: https://github.com/docker/compose/releases
Problem: Cannot connect to the Docker daemon
Error: Permission denied or socket not found.
Solution:
# Start Docker service
sudo systemctl start docker
sudo systemctl enable docker
# Verify permissions
docker info
Problem: nvidia: driver not installed or GPU not detected
Error: Docker doesn't see NVIDIA GPU.
Solution:
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
2. vLLM Service Issues
Problem: GPU Out of Memory (OOM)
Error in logs: CUDA out of memory or CUDA error: out of memory
Solution:
- Reduce model memory usage via environment variables:
export GPU_MEMORY_UTILIZATION=0.7 # Lower from 0.9
export MAX_MODEL_LEN=8192 # Reduce from 131072
export BLOCK_SIZE=16 # Smaller blocks
- Use a quantized model (recommended): convert the model to AWQ or GGUF format and set QUANTIZATION=awq in the environment
- Use a smaller model: switch from Llama-3.1-8B to a smaller variant
- Reduce batch size:
export MAX_BATCH_SIZE=4
- Ensure no other processes are using GPU:
nvidia-smi # Check for other processes
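As a rough sizing rule, weight memory alone is about parameter count times bytes per parameter, before the KV cache and activations are added on top. A quick back-of-envelope helper (illustrative, not a vLLM API):

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate GPU memory for model weights only, in GiB.

    The KV cache and activations come on top of this, which is why
    lowering GPU_MEMORY_UTILIZATION or MAX_MODEL_LEN also helps.
    """
    return num_params * bytes_per_param / 2**30

# 8B parameters at fp16 (2 bytes) vs. 4-bit AWQ (~0.5 bytes)
print(f"fp16: {weight_memory_gib(8e9, 2):.1f} GiB")   # ~14.9 GiB
print(f"awq4: {weight_memory_gib(8e9, 0.5):.1f} GiB") # ~3.7 GiB
```

If the fp16 figure is already close to your card's total VRAM, no amount of tuning GPU_MEMORY_UTILIZATION will help; quantize or pick a smaller model.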
Problem: Model not found
Error: Model fails to load, FileNotFoundError, or stays in loading state.
Solution:
- Check model path:
# For local model:
ls -la models/
# Should contain config.json plus weight files (*.safetensors or pytorch_model.bin)
# For HuggingFace model:
# Set MODEL_NAME to HF name, e.g., meta-llama/Llama-3.1-8B-Instruct
- Download model manually if automatic download fails:
# Install huggingface-cli
pip install huggingface-hub
# Download (requires authentication for gated models)
huggingface-cli login # if needed
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir models
- Check disk space:
df -h
# An 8B model needs ~16GB in fp16/bf16 (~32GB in fp32, ~8GB quantized)
- Use a pre-downloaded model:
- Upload the model to the models/ directory before starting
- Mount an external volume containing the model
Problem: Health check timeout or 503 Service Unavailable
Cause: Model still loading, or failed to start.
Diagnosis:
docker-compose logs vllm
# Look for "Model loaded successfully" or error messages
Solution:
- Wait longer (first load can take 5-15 minutes)
- Check logs for specific errors (OOM, missing files)
- Increase healthcheck start_period:
healthcheck:
  start_period: 300s  # Increase from 120s
Problem: CORS or network errors when calling API
Symptoms: Connection refused, network timeout.
Solution:
# Check if container is running
docker-compose ps
# Check port mapping
docker-compose port vllm 8000
# Test from inside container
docker-compose exec vllm curl http://localhost:8000/health
# Check firewall
sudo ufw status
sudo ufw allow 8000
Problem: Redis connection failed
Error: Could not connect to Redis
Solution:
- Redis is optional (caching). vLLM will continue without it.
- If you want Redis:
docker-compose ps redis # Check if running
docker-compose logs redis
3. Docker Compose Issues
Problem: Port already in use
Error: Bind for 0.0.0.0:8000 failed: port is already allocated
Solution:
# Find process using port
lsof -i :8000
# or
netstat -tulpn | grep :8000
# Kill process or change port in docker-compose.yml:
# ports:
# - "8001:8000" # Map host 8001 to container 8000
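If you would rather not guess which host port is free, you can ask the OS for one and use it in the port mapping. A small illustrative helper:

```python
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # Port 0 = kernel picks a free port
        return s.getsockname()[1]


if __name__ == "__main__":
    port = find_free_port()
    print(f'Map host port {port} to the container, e.g. "{port}:8000"')
```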
Problem: Volume mount permission denied
Error: Cannot mount ./models:/models
Solution:
# Create directories with proper permissions
mkdir -p models logs
sudo chown -R $(id -u):$(id -g) models logs
# On SELinux systems, append :z to the mount (e.g. ./models:/models:z)
Problem: docker-compose: command not found
Solution:
# Docker Compose v2 (included with Docker)
sudo apt-get install docker-compose-plugin
# Or the standalone docker-compose binary (Docker Compose v2)
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
4. Cloud Deployment Issues
RunPod Specific
Problem: runpodctl: command not found
# Install
curl -L https://github.com/runpod/runpodctl/releases/latest/download/runpodctl-linux-amd64 -o runpodctl
sudo install runpodctl /usr/local/bin/
runpodctl config # Set API key
Problem: Template not found or pod creation failed
- Ensure you have sufficient quota/balance
- Check GPU availability in your region
- Verify template name (case-sensitive)
Problem: SCP/SSH connection failed
- Pod may still be starting; wait 2-3 minutes
- Check pod status: runpodctl get pod <id>
- Verify the pod is in the RUNNING state
Problem: Insufficient disk space on pod
- Increase the disk size in the script (DISK_SIZE=100 or higher)
- Upload the model separately to /workspace/models before starting
Vast.ai Specific
Problem: vastai: command not found
pip install vastai
# or download from: https://vast.ai/docs/cli
Problem: No suitable instance found
- Relax search criteria (lower VAST_GPU_RAM)
- Increase VAST_SEARCH_LIMIT
- Check the marketplace manually: vastai search offers "cuda>=11.8"
Problem: SSH connection refused
- Instance may still be provisioning
- Check vastai show instance <id>
- Ensure port forwarding is set up correctly
Problem: Instance died or unresponsive
- Check if balance depleted
- Instance may have been evicted (low priority)
- Use the --priority flag or choose higher-cost instances
Performance Tuning
Reduce Latency
export MAX_BATCH_SIZE=4 # Smaller batches for lower latency
export MAX_MODEL_LEN=4096 # Shorter context window
export GPU_MEMORY_UTILIZATION=0.8
Increase Throughput
export MAX_BATCH_SIZE=32 # Larger batches
export MAX_MODEL_LEN=16384 # Longer context capability
export GPU_MEMORY_UTILIZATION=0.95
Multi-GPU Setup
# Automatically detected. Ensure tensor parallel size matches GPU count:
# export TENSOR_PARALLEL_SIZE=2 # For 2 GPUs (usually auto-detected)
Monitoring
Health Endpoint
curl http://localhost:8000/health | jq
# Returns: {"status":"healthy","model":{...},"timestamp":...}
Readiness Endpoint (K8s liveness)
curl http://localhost:8000/ready
# Returns: {"status":"ready"}
Prometheus Metrics
curl http://localhost:9090/metrics
# Look for: vllm_requests_total, vllm_request_latency_seconds
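To pull a single counter out of the Prometheus text format without extra tooling, a few lines of parsing suffice. A sketch; the metric name used below is the one mentioned in the comment above and may differ between vLLM versions:

```python
def read_counter(metrics_text: str, name: str) -> float:
    """Sum all samples of `name` in Prometheus text exposition format."""
    total = 0.0
    for line in metrics_text.splitlines():
        if line.startswith("#") or not line.startswith(name):
            continue  # Skip comments/TYPE lines and other metrics
        # Each sample line is "name{labels} value" or "name value"
        metric, _, value = line.rpartition(" ")
        if metric == name or metric.startswith(name + "{"):
            total += float(value)
    return total


sample = """# TYPE vllm_requests_total counter
vllm_requests_total{status="ok"} 40
vllm_requests_total{status="error"} 2
"""
print(read_counter(sample, "vllm_requests_total"))  # 42.0
```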
Container Logs
# All logs
docker-compose logs -f vllm
# Last 100 lines
docker-compose logs --tail=100 vllm
# Search for errors
docker-compose logs vllm | grep -i error
Model Compatibility
Supported Formats
- HuggingFace (default): MODEL_FORMAT=hf
- Local directory: mount the model folder to /models
- AWQ quantized: set QUANTIZATION=awq and use an AWQ model
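Putting the format options together, a typical environment for a locally mounted AWQ model might look like this (variable names as used elsewhere in this guide; the values are illustrative):

```shell
# AWQ-quantized model served from a mounted local directory
export MODEL_NAME=/models            # mounted model folder
export MODEL_FORMAT=hf
export QUANTIZATION=awq
export GPU_MEMORY_UTILIZATION=0.8
```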
Gated Models (Llama 3.1, etc.)
- Request access on HuggingFace
- Get your token: https://huggingface.co/settings/tokens
- Authenticate:
huggingface-cli login
# Paste token
Unsupported Models
If vLLM doesn't support your model architecture:
- Use trust_remote_code=True (already set)
- Convert the model to a supported format
- Check vLLM supported models: https://docs.vllm.ai/
Debug Mode
Enable verbose logging:
export LOG_LEVEL=DEBUG
# restart services
docker-compose down && docker-compose up -d
Getting Help
- Check this guide for common symptoms
- Review logs: docker-compose logs vllm
- Search issues: https://github.com/vllm-project/vllm/issues
- Community: https://discord.gg/vllm
Quick Reference Commands
# Start deployment
cd stack-2.9-deploy
./local_deploy.sh
# Stop deployment
docker-compose down
# View logs
docker-compose logs -f vllm
# Restart single service
docker-compose restart vllm
# Check service status
docker-compose ps
# Access container shell
docker-compose exec vllm bash
# Clean everything (WARNING: deletes data!)
docker-compose down -v
rm -rf models logs
# Rebuild image (after Dockerfile changes)
docker-compose build --no-cache vllm
docker-compose up -d