# 🛡️ CodeSentry Backend

**AI/ML Code Security Analysis Engine — powered by Qwen2.5-Coder-32B on AMD MI300X**

> Zero Data Retention. All inference runs locally. No code leaves your machine.

---

## Overview

CodeSentry is a multi-agent backend that audits AI/ML codebases for security vulnerabilities and performance issues:

- **Security Agent** — OWASP Top-10 + OWASP LLM Top-10 scanning (static regex + LLM deep analysis)
- **Performance Agent** — GPU memory leaks, N+1 embeddings, FP32 waste, missing `@torch.no_grad`
- **Fix Agent** — Generates unified diffs, security reports, and PR descriptions
- **AMD Migration Advisor** — 10-category CUDA → ROCm/HIP compatibility scanner with an AMD Compatibility Score
- **AMD Metrics Collector** — Real-time MI300X GPU monitoring via `rocm-smi` (with simulated fallback)
- **Privacy Guard** — Blocks outbound connections and generates cryptographically signed ZDR certificates

**Model stack:** `Qwen/Qwen2.5-Coder-32B-Instruct` via vLLM on AMD MI300X (192 GB HBM3)

---

## Quick Start

### 1. Set up vLLM on AMD MI300X

```bash
cd codesentry-backend
chmod +x scripts/setup_vllm.sh
./scripts/setup_vllm.sh
```

This installs vLLM with the ROCm backend, starts the model server, and launches the CodeSentry API.

### 2. Manual startup

```bash
# Copy and configure environment
cp .env.example .env

# Install dependencies
pip install -r requirements.txt

# Start vLLM (in background)
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --port 8080 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 32768 &

# Start CodeSentry API
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

---

## API Reference

### `GET /api/health`

Check service status, GPU memory, and live AMD hardware metrics.

```bash
curl http://localhost:8000/api/health
```

**Response:**

```json
{
  "status": "ok",
  "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
  "vllm_ready": true,
  "gpu_memory_free_gb": 142.5,
  "vllm_endpoint": "http://localhost:8080",
  "amd_hardware": {
    "gpu_utilization_percent": 85,
    "vram_used_gb": 48.2,
    "vram_total_gb": 192.0,
    "temperature_c": 63,
    "power_draw_w": 612,
    "memory_bandwidth_tbs": 4.7,
    "tokens_per_sec": 1250,
    "timestamp": "2026-05-09T13:30:00Z"
  }
}
```

---
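The `amd_hardware` block above is produced by the AMD Metrics Collector (`amd_metrics.py`), which reads `rocm-smi` and falls back to simulated values when no AMD GPU is present. A minimal sketch of that fallback pattern; the `rocm-smi` flags, JSON key names, and the `_parse_rocm_smi` helper are illustrative assumptions, not the module's actual code:

```python
# Sketch only: flags, JSON keys, and _parse_rocm_smi are assumptions,
# not the real amd_metrics.py implementation.
import json
import random
import shutil
import subprocess
from datetime import datetime, timezone

def _parse_rocm_smi(raw: dict) -> dict:
    """Map rocm-smi JSON output to the API schema. Key names are assumptions."""
    card = next(iter(raw.values()))  # e.g. raw["card0"]
    return {
        "gpu_utilization_percent": float(card.get("GPU use (%)", 0)),
        "temperature_c": float(card.get("Temperature (Sensor edge) (C)", 0)),
        "power_draw_w": float(card.get("Average Graphics Package Power (W)", 0)),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def collect_amd_metrics() -> dict:
    """Live MI300X metrics via rocm-smi, or simulated values if unavailable."""
    if shutil.which("rocm-smi"):
        try:
            out = subprocess.run(
                ["rocm-smi", "--showuse", "--showtemp", "--showpower", "--json"],
                capture_output=True, text=True, timeout=5, check=True,
            )
            return _parse_rocm_smi(json.loads(out.stdout))
        except (subprocess.SubprocessError, json.JSONDecodeError, StopIteration):
            pass  # fall through to the simulated fallback
    # Simulated fallback keeps /api/health usable on machines without an AMD GPU.
    return {
        "gpu_utilization_percent": random.randint(80, 95),
        "vram_used_gb": round(random.uniform(40.0, 60.0), 1),
        "vram_total_gb": 192.0,
        "temperature_c": random.randint(55, 70),
        "power_draw_w": random.randint(550, 700),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

---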
### `POST /api/scan` & `GET /api/scan/stream/{session_id}` — SSE Stream

Analyse a codebase. Returns a Server-Sent Events stream.

```bash
# Analyse a GitHub repository (creates scan session)
curl -X POST http://localhost:8000/api/scan \
  -H "Content-Type: application/json" \
  -d '{
    "source": "https://github.com/example/vulnerable-ml-app",
    "source_type": "github",
    "session_id": "test-123"
  }'

# Stream the results
curl -N http://localhost:8000/api/scan/stream/test-123
```

**SSE Events:**

```
event: status
data: {"message": "Ingesting code...", "session_id": "test-123"}

event: agent_start
data: {"agent": "security", "status": "scanning"}

event: finding
data: {"severity": "critical", "title": "Insecure Pickle Deserialization", "cwe": "CWE-502", "line_number": 2}

event: amd_metrics
data: {"gpu_utilization_percent": 87, "vram_used_gb": 48.2, "vram_total_gb": 192.0, "temperature_c": 63, ...}

event: agent_start
data: {"agent": "performance", "status": "analyzing"}

event: finding
data: {"agent": "performance", "type": "gpu_memory", "saving_mb": 3584, "suggestion": "Switch from FP32 to BF16"}

event: amd_migration_finding
data: {"id": "AMD_M02", "title": "NVIDIA-Specific CLI Tool", "severity": "critical", "rocm_fix": "..."}

event: amd_migration_summary
data: {"compatibility_score": 72, "compatibility_label": "Mostly Compatible", "total_cuda_patterns_found": 3}

event: fix_ready
data: {"findingId": "SEC-STATIC-1", "title": "Fix pickle.load", "before": "...", "after": "..."}

event: complete
data: {"summary": {...}, "privacy_certificate": {...}, "amd_migration_guide": {...}}
```

---
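The stream uses standard SSE framing, so any EventSource-style client works. A minimal Python consumer, as a sketch assuming the third-party `requests` package (event names follow the list above):

```python
# Minimal SSE consumer sketch for the scan stream. Assumes the `requests`
# package is installed; not an official client for this API.
import json
import requests

def stream_scan(session_id: str, base_url: str = "http://localhost:8000"):
    """Yield (event, payload) pairs from the scan SSE endpoint."""
    url = f"{base_url}/api/scan/stream/{session_id}"
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        event = None
        for line in resp.iter_lines(decode_unicode=True):
            if not line:                      # blank line terminates one SSE event
                event = None
            elif line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:") and event:
                yield event, json.loads(line[len("data:"):].strip())

for event, payload in stream_scan("test-123"):
    if event == "finding":
        print(f"[{payload.get('severity', 'perf')}] "
              f"{payload.get('title') or payload.get('suggestion')}")
    elif event == "complete":
        break
```

---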
### `POST /api/analyze/demo`

Returns a pre-computed result for the vulnerable test fixture. **No GPU required.** Intended for frontend development and CI.

```bash
curl -X POST http://localhost:8000/api/analyze/demo | python -m json.tool
```

---

### `GET /api/session/{session_id}`

Retrieve the full analysis result for a completed session (includes `amd_migration_guide`).

```bash
curl http://localhost:8000/api/session/test-123
```

---

### `GET /api/privacy-certificate/{session_id}`

Get the Zero Data Retention audit certificate for a session.

```bash
curl http://localhost:8000/api/privacy-certificate/test-123
```

**Response:**

```json
{
  "session_id": "test-123",
  "timestamp": "2024-01-01T00:00:00+00:00",
  "guarantee": "All inference ran exclusively on localhost AMD MI300X via vLLM. Zero data transmitted to external services.",
  "model_endpoint": "http://localhost:8080",
  "external_calls_blocked": [],
  "data_wiped": true,
  "signature": "a3f8d2..."
}
```

---

## Running Tests

```bash
# Install test dependencies and run all tests (no GPU required)
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh

# Or directly with pytest
export USE_LLM=false
pytest tests/ -v --asyncio-mode=auto
```

All 15+ tests use **static analysis only** — no GPU or vLLM server needed.

---

## Benchmarking

```bash
# Requires a running API at localhost:8000
chmod +x scripts/benchmark.sh
./scripts/benchmark.sh

# Custom URL and run count
CODESENTRY_URL=http://localhost:8000 BENCHMARK_RUNS=5 ./scripts/benchmark.sh
```

Outputs `benchmark_results.json` with TTFF, total latency, and findings statistics.

---

## Project Structure

```
codesentry-backend/
├── main.py                      # FastAPI app entry point
├── amd_metrics.py               # AMD MI300X live metrics (rocm-smi + simulated fallback)
├── api/
│   ├── routes.py                # All API endpoints
│   └── models.py                # Pydantic request/response schemas
├── agents/
│   ├── orchestrator.py          # Master agent (coordinates all sub-agents, SSE)
│   ├── security_agent.py        # OWASP + OWASP-LLM-Top-10 scanner
│   ├── performance_agent.py     # GPU memory, latency, ROCm optimisation
│   ├── fix_agent.py             # Code fixes, diffs, security report
│   └── amd_migration_advisor.py # CUDA → ROCm migration (10 pattern categories)
├── tools/
│   ├── code_parser.py           # AST parsing, GitHub/zip/string ingestion
│   ├── github_connector.py      # GitHub shallow clone
│   ├── vulnerability_db.py      # OWASP knowledge base + regex patterns
│   ├── diff_generator.py        # Unified diff generation
│   └── benchmark_tool.py        # GPU memory estimation + timing
├── privacy/
│   └── privacy_guard.py         # ZDR enforcement + HMAC certificates
├── memory/
│   └── session_store.py         # In-memory TTL session store
├── tests/
│   ├── fixtures/
│   │   ├── vulnerable_ml_code.py   # Deliberately vulnerable ML app
│   │   ├── clean_ml_code.py        # Secure baseline
│   │   └── expected_findings.json  # Ground truth for assertions
│   ├── test_security_agent.py
│   ├── test_performance_agent.py
│   ├── test_api_endpoints.py
│   └── test_privacy_guard.py
├── scripts/
│   ├── setup_vllm.sh            # One-command AMD MI300X setup
│   ├── run_tests.sh             # Full test suite runner
│   └── benchmark.sh             # Latency + throughput benchmark
├── requirements.txt
├── .env.example
└── README.md
```

---

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `VLLM_BASE_URL` | `http://localhost:8080/v1` | vLLM OpenAI-compatible endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-Coder-32B-Instruct` | Model served by vLLM |
| `USE_LLM` | `true` | Set `false` for static-only mode (CI) |
| `PORT` | `8000` | CodeSentry API port |
| `CORS_ORIGINS` | `*` | Allowed frontend origins |
| `ZDR_SIGNING_KEY` | (dev default) | HMAC key for certificates — **change in production** |
| `GROQ_API_KEY` | — | Groq cloud API key (optional alternative to local vLLM; bypasses the local-inference guarantee) |

---

## Zero Data Retention

Every analysis session runs inside a `ZeroDataRetentionGuard` that:

1. **Blocks** all outbound non-localhost network connections at the socket level
2. **Logs** any blocked connection attempts to the audit trail
3. **Wipes** all session data from memory after the analysis completes
4. **Generates** a cryptographically signed audit certificate

The certificate is available at `GET /api/privacy-certificate/{session_id}`.
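A minimal sketch of the guard pattern described above (socket-level egress blocking plus an HMAC-signed certificate). The class and field names mirror this README, but the code is illustrative rather than the actual `privacy/privacy_guard.py` implementation:

```python
# Hypothetical sketch of the ZeroDataRetentionGuard pattern; not the
# project's actual code.
import hashlib
import hmac
import json
import socket
from datetime import datetime, timezone

_real_connect = socket.socket.connect

class ZeroDataRetentionGuard:
    def __init__(self, session_id: str, signing_key: bytes):
        self.session_id = session_id
        self.signing_key = signing_key
        self.blocked: list[str] = []

    def __enter__(self):
        guard = self

        def patched_connect(sock, address):
            # Refuse any connection that is not loopback; record the attempt.
            if address[0] not in ("127.0.0.1", "localhost", "::1"):
                guard.blocked.append(str(address))
                raise ConnectionRefusedError(f"ZDR guard blocked {address}")
            return _real_connect(sock, address)

        socket.socket.connect = patched_connect  # block non-localhost egress
        return self

    def __exit__(self, *exc):
        socket.socket.connect = _real_connect    # restore normal networking
        return False

    def certificate(self) -> dict:
        body = {
            "session_id": self.session_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "external_calls_blocked": self.blocked,
            "data_wiped": True,
        }
        payload = json.dumps(body, sort_keys=True).encode()
        body["signature"] = hmac.new(self.signing_key, payload, hashlib.sha256).hexdigest()
        return body
```

Signing the canonicalised JSON body with the `ZDR_SIGNING_KEY` HMAC key is what makes the certificate tamper-evident: any change to the session fields invalidates the signature.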
---

## Vulnerability Coverage

### Security (OWASP)

| Category | ID | Description |
|---|---|---|
| OWASP LLM | LLM01 | Prompt Injection |
| OWASP LLM | LLM02 | Insecure Output Handling (eval, exec) |
| OWASP LLM | LLM03 | Training Data Poisoning |
| OWASP LLM | LLM04 | Model Denial of Service |
| OWASP LLM | LLM06 | Sensitive Information Disclosure |
| OWASP LLM | LLM08 | Excessive Agency |
| OWASP LLM | LLM09 | Overreliance |
| OWASP Web | A01 | Broken Access Control |
| OWASP Web | A02 | Cryptographic Failures |
| OWASP Web | A03 | SQL Injection |
| OWASP Web | A04 | Insecure Deserialization (CWE-502) |
| OWASP Web | A05 | Security Misconfiguration |
| OWASP Web | A07 | Hardcoded Credentials |
| OWASP Web | A08 | Software & Data Integrity Failures |
| OWASP Web | A10 | Server-Side Request Forgery |
| ML-Specific | ML01 | GPU Memory Leak |
| ML-Specific | ML02 | Missing `@torch.no_grad` |
| ML-Specific | ML03 | N+1 Embedding Calls |
| ML-Specific | ML04 | FP32 vs BF16 Inefficiency |
| ML-Specific | ML05 | Synchronous Model Loading in Handler |

### AMD Migration (CUDA → ROCm)

| ID | Severity | Description |
|---|---|---|
| AMD_M01 | Low | `torch.cuda.is_available()` — CUDA device check |
| AMD_M02 | Critical | `nvidia-smi` — NVIDIA-only CLI tool |
| AMD_M03 | High | `CUDA_VISIBLE_DEVICES` — CUDA env variable |
| AMD_M04 | High | `torch.cuda.amp.autocast/GradScaler` — Legacy CUDA AMP |
| AMD_M05 | Medium | `.half()` / `torch.float16` — FP16 suboptimal on MI300X |
| AMD_M06 | Medium | `torch.backends.cudnn.*` — cuDNN configuration |
| AMD_M07 | High | `import flash_attn` — CUDA-only Flash Attention |
| AMD_M08 | Low | `torch.cuda.memory_allocated()` — CUDA memory profiling |
| AMD_M09 | Low | `device = 'cuda'` — Hardcoded device string |
| AMD_M10 | Critical | `BitsAndBytesConfig` — CUDA-only quantization |
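Migration detection is pattern-driven. The sketch below shows the general shape of a regex scan for three of the categories; the patterns, severities, and fix strings are illustrative examples, not the actual rules in `amd_migration_advisor.py`:

```python
# Illustrative sketch of regex-based CUDA-pattern detection in the style of
# amd_migration_advisor.py. Patterns and fixes shown are examples only.
import re
from dataclasses import dataclass

@dataclass
class MigrationFinding:
    id: str
    severity: str
    line_number: int
    rocm_fix: str

PATTERNS = [
    ("AMD_M02", "critical", re.compile(r"\bnvidia-smi\b"),
     "Use rocm-smi for GPU monitoring on AMD hardware."),
    ("AMD_M05", "medium", re.compile(r"\.half\(\)|torch\.float16"),
     "Prefer torch.bfloat16 on MI300X."),
    ("AMD_M09", "low", re.compile(r"device\s*=\s*['\"]cuda['\"]"),
     "Avoid hardcoding the device string; ROCm PyTorch maps 'cuda' to AMD GPUs, "
     "but select the device at runtime."),
]

def scan_source(source: str) -> list[MigrationFinding]:
    """Return one finding per (pattern, line) match in the given source text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pid, severity, pattern, fix in PATTERNS:
            if pattern.search(line):
                findings.append(MigrationFinding(pid, severity, lineno, fix))
    return findings

# Example: a hardcoded device string triggers AMD_M09
print(scan_source("model.to(device='cuda')"))
```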