# πŸ›‘οΈ CodeSentry Backend

**AI/ML Code Security Analysis Engine — powered by Qwen2.5-Coder-32B on AMD MI300X**

> Zero Data Retention. All inference runs locally. No code leaves your machine.

---

## Overview

CodeSentry is a multi-agent backend that audits AI/ML codebases for security vulnerabilities and performance issues:

- **Security Agent** — OWASP Top-10 + OWASP LLM Top-10 scanning (static regex + LLM deep analysis)
- **Performance Agent** — GPU memory leaks, N+1 embeddings, FP32 waste, missing `@torch.no_grad`
- **Fix Agent** — Generates unified diffs, security reports, and PR descriptions
- **AMD Migration Advisor** — 10-category CUDA → ROCm/HIP compatibility scanner with AMD Compatibility Score
- **AMD Metrics Collector** — Real-time MI300X GPU monitoring via `rocm-smi` (with simulated fallback)
- **Privacy Guard** — Blocks outbound connections, generates cryptographically signed ZDR certificates
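
The static-regex pass of the Security Agent can be sketched in a few lines. The patterns and finding shape below are illustrative; the real rules live in `tools/vulnerability_db.py`:

```python
import re

# Illustrative rules only -- the production pattern set is larger
STATIC_PATTERNS = [
    (r"\bpickle\.loads?\(", "critical", "Insecure Pickle Deserialization", "CWE-502"),
    (r"\beval\(", "high", "Insecure Output Handling (eval)", "LLM02"),
    (r"(?i)api[_-]?key\s*=\s*['\"]\w+", "high", "Hardcoded Credentials", "A07"),
]

def static_scan(source: str) -> list:
    """Return one finding per pattern match, with a 1-based line number."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, severity, title, rule_id in STATIC_PATTERNS:
            if re.search(pattern, line):
                findings.append({
                    "severity": severity,
                    "title": title,
                    "id": rule_id,
                    "line_number": lineno,
                })
    return findings

findings = static_scan("import pickle\nmodel = pickle.load(open('m.pkl', 'rb'))\n")
print(findings[0]["line_number"])  # 2
```

The LLM deep-analysis pass then re-examines code that the cheap regex pass cannot judge, such as data flow into `eval`.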

**Model stack:** `Qwen/Qwen2.5-Coder-32B-Instruct` via vLLM on AMD MI300X (192 GB HBM3)

---

## Quick Start

### 1. Setup vLLM on AMD MI300X

```bash
cd codesentry-backend
chmod +x scripts/setup_vllm.sh
./scripts/setup_vllm.sh
```

This installs vLLM with ROCm backend, starts the model server, and launches the CodeSentry API.

### 2. Manual startup

```bash
# Copy and configure environment
cp .env.example .env

# Install dependencies
pip install -r requirements.txt

# Start vLLM (in background)
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --port 8080 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 32768 &

# Start CodeSentry API
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

---

## API Reference

### `GET /api/health`

Check service status, GPU memory, and live AMD hardware metrics.

```bash
curl http://localhost:8000/api/health
```

**Response:**
```json
{
  "status": "ok",
  "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
  "vllm_ready": true,
  "gpu_memory_free_gb": 142.5,
  "vllm_endpoint": "http://localhost:8080",
  "amd_hardware": {
    "gpu_utilization_percent": 85,
    "vram_used_gb": 48.2,
    "vram_total_gb": 192.0,
    "temperature_c": 63,
    "power_draw_w": 612,
    "memory_bandwidth_tbs": 4.7,
    "tokens_per_sec": 1250,
    "timestamp": "2026-05-09T13:30:00Z"
  }
}
```

---

### `POST /api/scan` & `GET /api/scan/stream/{session_id}` — SSE Stream

Start an analysis of a codebase with `POST /api/scan`, then consume the results as a Server-Sent Events stream from the companion `GET` endpoint.

```bash
# Analyse a GitHub repository (creates scan session)
curl -X POST http://localhost:8000/api/scan \
  -H "Content-Type: application/json" \
  -d '{
    "source": "https://github.com/example/vulnerable-ml-app",
    "source_type": "github",
    "session_id": "test-123"
  }'

# Stream the results
curl -N http://localhost:8000/api/scan/stream/test-123
```

**SSE Events:**
```
event: status
data: {"message": "Ingesting code...", "session_id": "test-123"}

event: agent_start
data: {"agent": "security", "status": "scanning"}

event: finding
data: {"severity": "critical", "title": "Insecure Pickle Deserialization", "cwe": "CWE-502", "line_number": 2}

event: amd_metrics
data: {"gpu_utilization_percent": 87, "vram_used_gb": 48.2, "vram_total_gb": 192.0, "temperature_c": 63, ...}

event: agent_start
data: {"agent": "performance", "status": "analyzing"}

event: finding
data: {"agent": "performance", "type": "gpu_memory", "saving_mb": 3584, "suggestion": "Switch from FP32 to BF16"}

event: amd_migration_finding
data: {"id": "AMD_M02", "title": "NVIDIA-Specific CLI Tool", "severity": "critical", "rocm_fix": "..."}

event: amd_migration_summary
data: {"compatibility_score": 72, "compatibility_label": "Mostly Compatible", "total_cuda_patterns_found": 3}

event: fix_ready
data: {"findingId": "SEC-STATIC-1", "title": "Fix pickle.load", "before": "...", "after": "..."}

event: complete
data: {"summary": {...}, "privacy_certificate": {...}, "amd_migration_guide": {...}}
```
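
The stream above can be consumed from plain Python. The parser below is a minimal sketch of the SSE wire format; in practice a client library such as `httpx-sse` would also handle reconnection and chunked reads:

```python
import json

def parse_sse(raw: str):
    """Yield (event_name, payload_dict) pairs from raw SSE text."""
    event, data_lines = "message", []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:  # a blank line terminates an event
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []

# A fragment shaped like the stream documented above
raw = (
    "event: finding\n"
    'data: {"severity": "critical", "title": "Insecure Pickle Deserialization"}\n'
    "\n"
    "event: complete\n"
    'data: {"summary": {}}\n'
    "\n"
)
for name, payload in parse_sse(raw):
    print(name, payload.get("severity"))
```

Feed it the body of `GET /api/scan/stream/{session_id}` and dispatch on the event name (`finding`, `amd_metrics`, `fix_ready`, `complete`, ...).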

---

### `POST /api/analyze/demo`

Returns a pre-computed result from the vulnerable fixture. **No GPU required**, so it is suitable for frontend development and CI.

```bash
curl -X POST http://localhost:8000/api/analyze/demo | python -m json.tool
```

---

### `GET /api/session/{session_id}`

Retrieve the full analysis result for a completed session (includes `amd_migration_guide`).

```bash
curl http://localhost:8000/api/session/test-123
```

---

### `GET /api/privacy-certificate/{session_id}`

Get the Zero Data Retention audit certificate for a session.

```bash
curl http://localhost:8000/api/privacy-certificate/test-123
```

**Response:**
```json
{
  "session_id": "test-123",
  "timestamp": "2024-01-01T00:00:00+00:00",
  "guarantee": "All inference ran exclusively on localhost AMD MI300X via vLLM. Zero data transmitted to external services.",
  "model_endpoint": "http://localhost:8080",
  "external_calls_blocked": [],
  "data_wiped": true,
  "signature": "a3f8d2..."
}
```
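
Because the signature is an HMAC (see `ZDR_SIGNING_KEY`), a client holding the key can verify a certificate offline. The canonicalisation below (sorted-key JSON over the body minus the `signature` field) is an assumption for illustration; the actual signing layout is defined in `privacy/privacy_guard.py`:

```python
import hashlib
import hmac
import json

def sign_certificate(cert: dict, key: bytes) -> str:
    """HMAC-SHA256 over the certificate body, excluding the signature field."""
    body = {k: v for k, v in cert.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()

def verify_certificate(cert: dict, key: bytes) -> bool:
    """Constant-time comparison against a freshly computed signature."""
    return hmac.compare_digest(cert.get("signature", ""), sign_certificate(cert, key))

key = b"dev-signing-key"  # stands in for ZDR_SIGNING_KEY
cert = {"session_id": "test-123", "data_wiped": True}
cert["signature"] = sign_certificate(cert, key)
assert verify_certificate(cert, key)
assert not verify_certificate({**cert, "data_wiped": False}, key)  # tampering detected
```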

---

## Running Tests

```bash
# Install test dependencies and run all tests (no GPU required)
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh

# Or directly with pytest
export USE_LLM=false
pytest tests/ -v --asyncio-mode=auto
```

All 15+ tests use **static analysis only** — no GPU or vLLM server needed.

---

## Benchmarking

```bash
# Requires running API at localhost:8000
chmod +x scripts/benchmark.sh
./scripts/benchmark.sh

# Custom URL and run count
CODESENTRY_URL=http://localhost:8000 BENCHMARK_RUNS=5 ./scripts/benchmark.sh
```

Outputs `benchmark_results.json` with TTFF, total latency, and findings statistics.
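
TTFF here means time to first finding. A minimal way to measure it from any stream of `(event, payload)` pairs (a sketch only; `benchmark.sh` remains the canonical tool):

```python
import time

def time_to_first(events, predicate):
    """Seconds until the first event satisfying predicate, or None if absent."""
    start = time.monotonic()
    for name, payload in events:
        if predicate(name, payload):
            return time.monotonic() - start
    return None

# Example against a pre-recorded stream; in practice, feed it (event, payload)
# pairs parsed live from GET /api/scan/stream/{session_id}
events = [("status", {}), ("agent_start", {}), ("finding", {"severity": "critical"})]
ttff = time_to_first(iter(events), lambda name, _: name == "finding")
assert ttff is not None and ttff >= 0
```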

---

## Project Structure

```
codesentry-backend/
├── main.py                    # FastAPI app entry point
├── amd_metrics.py             # AMD MI300X live metrics (rocm-smi + simulated fallback)
├── api/
│   ├── routes.py              # All API endpoints
│   └── models.py              # Pydantic request/response schemas
├── agents/
│   ├── orchestrator.py        # Master agent (coordinates all sub-agents, SSE)
│   ├── security_agent.py      # OWASP + OWASP-LLM-Top-10 scanner
│   ├── performance_agent.py   # GPU memory, latency, ROCm optimisation
│   ├── fix_agent.py           # Code fixes, diffs, security report
│   └── amd_migration_advisor.py  # CUDA → ROCm migration (10 pattern categories)
├── tools/
│   ├── code_parser.py         # AST parsing, GitHub/zip/string ingestion
│   ├── github_connector.py    # GitHub shallow clone
│   ├── vulnerability_db.py    # OWASP knowledge base + regex patterns
│   ├── diff_generator.py      # Unified diff generation
│   └── benchmark_tool.py      # GPU memory estimation + timing
├── privacy/
│   └── privacy_guard.py       # ZDR enforcement + HMAC certificates
├── memory/
│   └── session_store.py       # In-memory TTL session store
├── tests/
│   ├── fixtures/
│   │   ├── vulnerable_ml_code.py  # Deliberately vulnerable ML app
│   │   ├── clean_ml_code.py       # Secure baseline
│   │   └── expected_findings.json # Ground truth for assertions
│   ├── test_security_agent.py
│   ├── test_performance_agent.py
│   ├── test_api_endpoints.py
│   └── test_privacy_guard.py
├── scripts/
│   ├── setup_vllm.sh          # One-command AMD MI300X setup
│   ├── run_tests.sh           # Full test suite runner
│   └── benchmark.sh           # Latency + throughput benchmark
├── requirements.txt
├── .env.example
└── README.md
```

---

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `VLLM_BASE_URL` | `http://localhost:8080/v1` | vLLM OpenAI-compatible endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-Coder-32B-Instruct` | Model served by vLLM |
| `USE_LLM` | `true` | Set `false` for static-only mode (CI) |
| `PORT` | `8000` | CodeSentry API port |
| `CORS_ORIGINS` | `*` | Allowed frontend origins |
| `ZDR_SIGNING_KEY` | (dev default) | HMAC key for certificates — **change in production** |
| `GROQ_API_KEY` | — | Groq cloud API key (alternative to local vLLM) |
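
A typical way to load these with the defaults from the table (a sketch; the actual loading happens at startup via `.env`):

```python
import os

def load_settings(env: dict) -> dict:
    """Read CodeSentry settings from an environment mapping, applying defaults."""
    get = env.get
    return {
        "vllm_base_url": get("VLLM_BASE_URL", "http://localhost:8080/v1"),
        "model_name": get("MODEL_NAME", "Qwen/Qwen2.5-Coder-32B-Instruct"),
        "use_llm": get("USE_LLM", "true").lower() in ("1", "true", "yes"),
        "port": int(get("PORT", "8000")),
        "cors_origins": [o.strip() for o in get("CORS_ORIGINS", "*").split(",")],
    }

settings = load_settings(dict(os.environ))
```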

---

## Zero Data Retention

Every analysis session runs inside a `ZeroDataRetentionGuard` that:

1. **Blocks** all outbound non-localhost network connections at the socket level
2. **Logs** any blocked connection attempts to the audit trail
3. **Wipes** all session data from memory after the analysis completes
4. **Generates** a cryptographically signed audit certificate

The certificate is available at `GET /api/privacy-certificate/{session_id}`.
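
The socket-level blocking idea can be sketched as a context manager that patches `socket.socket.connect` (illustrative only; the real guard lives in `privacy/privacy_guard.py` and covers more entry points):

```python
import socket
from contextlib import contextmanager

@contextmanager
def zdr_guard(audit_log: list):
    """Block outbound connections to non-localhost hosts for the duration."""
    original_connect = socket.socket.connect

    def guarded_connect(self, address):
        host = address[0] if isinstance(address, tuple) else address
        if host not in ("127.0.0.1", "localhost", "::1"):
            audit_log.append(host)  # record the attempt for the audit trail
            raise ConnectionRefusedError(f"ZDR guard blocked connection to {host}")
        return original_connect(self, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        socket.socket.connect = original_connect  # always restore

blocked = []
with zdr_guard(blocked):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # An arbitrary public address: refused before any packet is sent
        s.connect(("93.184.216.34", 443))
    except ConnectionRefusedError:
        pass
    finally:
        s.close()
print(blocked)  # ['93.184.216.34']
```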

---

## Vulnerability Coverage

### Security (OWASP)

| Category | ID | Description |
|---|---|---|
| OWASP LLM | LLM01 | Prompt Injection |
| OWASP LLM | LLM02 | Insecure Output Handling (eval, exec) |
| OWASP LLM | LLM03 | Training Data Poisoning |
| OWASP LLM | LLM04 | Model Denial of Service |
| OWASP LLM | LLM06 | Sensitive Information Disclosure |
| OWASP LLM | LLM08 | Excessive Agency |
| OWASP LLM | LLM09 | Overreliance |
| OWASP Web | A01 | Broken Access Control |
| OWASP Web | A02 | Cryptographic Failures |
| OWASP Web | A03 | SQL Injection |
| OWASP Web | A04 | Insecure Deserialization (CWE-502) |
| OWASP Web | A05 | Security Misconfiguration |
| OWASP Web | A07 | Hardcoded Credentials |
| OWASP Web | A08 | Software & Data Integrity Failures |
| OWASP Web | A10 | Server-Side Request Forgery |
| ML-Specific | ML01 | GPU Memory Leak |
| ML-Specific | ML02 | Missing `@torch.no_grad` |
| ML-Specific | ML03 | N+1 Embedding Calls |
| ML-Specific | ML04 | FP32 vs BF16 Inefficiency |
| ML-Specific | ML05 | Synchronous Model Loading in Handler |

### AMD Migration (CUDA → ROCm)

| ID | Severity | Description |
|---|---|---|
| AMD_M01 | Low | `torch.cuda.is_available()` — CUDA device check |
| AMD_M02 | Critical | `nvidia-smi` — NVIDIA-only CLI tool |
| AMD_M03 | High | `CUDA_VISIBLE_DEVICES` — CUDA env variable |
| AMD_M04 | High | `torch.cuda.amp.autocast/GradScaler` — Legacy CUDA AMP |
| AMD_M05 | Medium | `.half()` / `torch.float16` — FP16 suboptimal on MI300X |
| AMD_M06 | Medium | `torch.backends.cudnn.*` — cuDNN configuration |
| AMD_M07 | High | `import flash_attn` — CUDA-only Flash Attention |
| AMD_M08 | Low | `torch.cuda.memory_allocated()` — CUDA memory profiling |
| AMD_M09 | Low | `device = 'cuda'` — Hardcoded device string |
| AMD_M10 | Critical | `BitsAndBytesConfig` — CUDA-only quantization |
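
A sketch of how such a pattern table can drive both findings and the AMD Compatibility Score. The severity weights and scoring formula below are invented for illustration; the advisor's actual scoring may differ:

```python
import re

# Subset of the pattern table above; weights and formula are illustrative
CUDA_PATTERNS = {
    "AMD_M01": (r"torch\.cuda\.is_available\(\)", "low"),
    "AMD_M02": (r"\bnvidia-smi\b", "critical"),
    "AMD_M03": (r"CUDA_VISIBLE_DEVICES", "high"),
    "AMD_M07": (r"\bimport\s+flash_attn\b", "high"),
    "AMD_M09": (r"device\s*=\s*['\"]cuda", "low"),
}
WEIGHTS = {"low": 2, "medium": 5, "high": 10, "critical": 20}

def migration_scan(source: str) -> dict:
    """Flag CUDA-specific patterns and derive a 0-100 compatibility score."""
    findings = [
        {"id": pid, "severity": sev}
        for pid, (pattern, sev) in CUDA_PATTERNS.items()
        if re.search(pattern, source)
    ]
    penalty = sum(WEIGHTS[f["severity"]] for f in findings)
    return {"findings": findings, "compatibility_score": max(0, 100 - penalty)}

result = migration_scan(
    "import torch\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\n"
)
print(result["compatibility_score"])  # 96
```

Critical findings (AMD_M02, AMD_M10) drag the score hardest, mirroring how a single `nvidia-smi` dependency blocks a ROCm port outright.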