# 🛡️ CodeSentry Backend
**AI/ML Code Security Analysis Engine – powered by Qwen2.5-Coder-32B on AMD MI300X**
> Zero Data Retention. All inference runs locally. No code leaves your machine.
---
## Overview
CodeSentry is a multi-agent backend that audits AI/ML codebases for security vulnerabilities and performance issues:
- **Security Agent** – OWASP Top-10 + OWASP LLM Top-10 scanning (static regex + LLM deep analysis)
- **Performance Agent** – GPU memory leaks, N+1 embeddings, FP32 waste, missing `@torch.no_grad`
- **Fix Agent** – Generates unified diffs, security reports, and PR descriptions
- **AMD Migration Advisor** – 10-category CUDA → ROCm/HIP compatibility scanner with AMD Compatibility Score
- **AMD Metrics Collector** – Real-time MI300X GPU monitoring via `rocm-smi` (with simulated fallback)
- **Privacy Guard** – Blocks outbound connections, generates cryptographically signed ZDR certificates
**Model stack:** `Qwen/Qwen2.5-Coder-32B-Instruct` via vLLM on AMD MI300X (192 GB HBM3)
---
## Quick Start
### 1. Setup vLLM on AMD MI300X
```bash
cd codesentry-backend
chmod +x scripts/setup_vllm.sh
./scripts/setup_vllm.sh
```
This installs vLLM with ROCm backend, starts the model server, and launches the CodeSentry API.
### 2. Manual startup
```bash
# Copy and configure environment
cp .env.example .env
# Install dependencies
pip install -r requirements.txt
# Start vLLM (in background)
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
--port 8080 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.85 \
--max-model-len 32768 &
# Start CodeSentry API
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```
---
## API Reference
### `GET /api/health`
Check service status, GPU memory, and live AMD hardware metrics.
```bash
curl http://localhost:8000/api/health
```
**Response:**
```json
{
"status": "ok",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
"vllm_ready": true,
"gpu_memory_free_gb": 142.5,
"vllm_endpoint": "http://localhost:8080",
"amd_hardware": {
"gpu_utilization_percent": 85,
"vram_used_gb": 48.2,
"vram_total_gb": 192.0,
"temperature_c": 63,
"power_draw_w": 612,
"memory_bandwidth_tbs": 4.7,
"tokens_per_sec": 1250,
"timestamp": "2026-05-09T13:30:00Z"
}
}
```
---
### `POST /api/scan` & `GET /api/scan/stream/{session_id}` – SSE Stream
Create a scan session with `POST`, then consume the analysis results as a Server-Sent Events stream from the matching `GET` endpoint.
```bash
# Analyse a GitHub repository (creates scan session)
curl -X POST http://localhost:8000/api/scan \
-H "Content-Type: application/json" \
-d '{
"source": "https://github.com/example/vulnerable-ml-app",
"source_type": "github",
"session_id": "test-123"
}'
# Stream the results
curl -N http://localhost:8000/api/scan/stream/test-123
```
**SSE Events:**
```
event: status
data: {"message": "Ingesting code...", "session_id": "test-123"}
event: agent_start
data: {"agent": "security", "status": "scanning"}
event: finding
data: {"severity": "critical", "title": "Insecure Pickle Deserialization", "cwe": "CWE-502", "line_number": 2}
event: amd_metrics
data: {"gpu_utilization_percent": 87, "vram_used_gb": 48.2, "vram_total_gb": 192.0, "temperature_c": 63, ...}
event: agent_start
data: {"agent": "performance", "status": "analyzing"}
event: finding
data: {"agent": "performance", "type": "gpu_memory", "saving_mb": 3584, "suggestion": "Switch from FP32 to BF16"}
event: amd_migration_finding
data: {"id": "AMD_M02", "title": "NVIDIA-Specific CLI Tool", "severity": "critical", "rocm_fix": "..."}
event: amd_migration_summary
data: {"compatibility_score": 72, "compatibility_label": "Mostly Compatible", "total_cuda_patterns_found": 3}
event: fix_ready
data: {"findingId": "SEC-STATIC-1", "title": "Fix pickle.load", "before": "...", "after": "..."}
event: complete
data: {"summary": {...}, "privacy_certificate": {...}, "amd_migration_guide": {...}}
```
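
The event stream above can be consumed with a small client-side parser. The sketch below handles only the basic SSE framing (`event:` lines, `data:` lines, blank-line dispatch); on a live scan you would feed it lines from `curl -N` or an HTTP client instead of the canned sample:

```python
import json
from typing import Iterator, Tuple

def parse_sse(lines: Iterator[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from raw SSE lines.

    Minimal sketch: a blank line dispatches the pending event, and the
    data payload is assumed to be JSON, as in the stream shown above.
    """
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("event:"):
            event = line[6:].strip()
        elif line.startswith("data:"):
            data.append(line[5:].strip())
        elif line == "" and data:
            yield (event or "message", json.loads("\n".join(data)))
            event, data = None, []
    if data:  # stream ended without a trailing blank line
        yield (event or "message", json.loads("\n".join(data)))

# Canned sample mirroring the stream documented above
sample = [
    "event: status",
    'data: {"message": "Ingesting code...", "session_id": "test-123"}',
    "",
    "event: finding",
    'data: {"severity": "critical", "cwe": "CWE-502"}',
    "",
]
for name, payload in parse_sse(iter(sample)):
    print(name, payload)
```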
---
### `POST /api/analyze/demo`
Pre-computed result from the vulnerable fixture. **No GPU required.** For frontend development and CI.
```bash
curl -X POST http://localhost:8000/api/analyze/demo | python -m json.tool
```
---
### `GET /api/session/{session_id}`
Retrieve the full analysis result for a completed session (includes `amd_migration_guide`).
```bash
curl http://localhost:8000/api/session/test-123
```
---
### `GET /api/privacy-certificate/{session_id}`
Get the Zero Data Retention audit certificate for a session.
```bash
curl http://localhost:8000/api/privacy-certificate/test-123
```
**Response:**
```json
{
"session_id": "test-123",
"timestamp": "2024-01-01T00:00:00+00:00",
"guarantee": "All inference ran exclusively on localhost AMD MI300X via vLLM. Zero data transmitted to external services.",
"model_endpoint": "http://localhost:8080",
"external_calls_blocked": [],
"data_wiped": true,
"signature": "a3f8d2..."
}
```
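
A client can check the `signature` field offline if it holds the signing key. The sketch below assumes HMAC-SHA256 over a canonical JSON form of the certificate with the `signature` field excluded; the exact canonicalization used by `privacy/privacy_guard.py` may differ:

```python
import hashlib
import hmac
import json

def sign_certificate(cert: dict, key: bytes) -> str:
    # Canonical form: sorted keys, compact separators, signature excluded.
    # This layout is an illustrative assumption, not the guard's actual scheme.
    body = {k: v for k, v in cert.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_certificate(cert: dict, key: bytes) -> bool:
    # Constant-time comparison to avoid timing side channels
    return hmac.compare_digest(sign_certificate(cert, key), cert.get("signature", ""))

cert = {"session_id": "test-123", "data_wiped": True}
cert["signature"] = sign_certificate(cert, b"dev-key")
print(verify_certificate(cert, b"dev-key"))  # True
print(verify_certificate(cert, b"wrong"))    # False
```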
---
## Running Tests
```bash
# Install test dependencies and run all tests (no GPU required)
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh
# Or directly with pytest
export USE_LLM=false
pytest tests/ -v --asyncio-mode=auto
```
All 15+ tests use **static analysis only** – no GPU or vLLM server needed.
---
## Benchmarking
```bash
# Requires running API at localhost:8000
chmod +x scripts/benchmark.sh
./scripts/benchmark.sh
# Custom URL and run count
CODESENTRY_URL=http://localhost:8000 BENCHMARK_RUNS=5 ./scripts/benchmark.sh
```
Outputs `benchmark_results.json` with TTFF, total latency, and findings statistics.
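
Summarising the output is a one-liner with the standard library. The field names below (`runs`, `ttff_ms`, `total_ms`) are a hypothetical shape for `benchmark_results.json`; check `scripts/benchmark.sh` for the actual layout:

```python
import json
from statistics import mean

# Stand-in for json.load(open("benchmark_results.json")); field names assumed
raw = '{"runs": [{"ttff_ms": 410, "total_ms": 9200}, {"ttff_ms": 395, "total_ms": 8900}]}'

runs = json.loads(raw)["runs"]
summary = {
    "mean_ttff_ms": mean(r["ttff_ms"] for r in runs),
    "mean_total_ms": mean(r["total_ms"] for r in runs),
    "runs": len(runs),
}
print(summary)  # {'mean_ttff_ms': 402.5, 'mean_total_ms': 9050.0, 'runs': 2}
```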
---
## Project Structure
```
codesentry-backend/
├── main.py                      # FastAPI app entry point
├── amd_metrics.py               # AMD MI300X live metrics (rocm-smi + simulated fallback)
├── api/
│   ├── routes.py                # All API endpoints
│   └── models.py                # Pydantic request/response schemas
├── agents/
│   ├── orchestrator.py          # Master agent (coordinates all sub-agents, SSE)
│   ├── security_agent.py        # OWASP + OWASP-LLM-Top-10 scanner
│   ├── performance_agent.py     # GPU memory, latency, ROCm optimisation
│   ├── fix_agent.py             # Code fixes, diffs, security report
│   └── amd_migration_advisor.py # CUDA → ROCm migration (10 pattern categories)
├── tools/
│   ├── code_parser.py           # AST parsing, GitHub/zip/string ingestion
│   ├── github_connector.py      # GitHub shallow clone
│   ├── vulnerability_db.py      # OWASP knowledge base + regex patterns
│   ├── diff_generator.py        # Unified diff generation
│   └── benchmark_tool.py        # GPU memory estimation + timing
├── privacy/
│   └── privacy_guard.py         # ZDR enforcement + HMAC certificates
├── memory/
│   └── session_store.py         # In-memory TTL session store
├── tests/
│   ├── fixtures/
│   │   ├── vulnerable_ml_code.py  # Deliberately vulnerable ML app
│   │   ├── clean_ml_code.py       # Secure baseline
│   │   └── expected_findings.json # Ground truth for assertions
│   ├── test_security_agent.py
│   ├── test_performance_agent.py
│   ├── test_api_endpoints.py
│   └── test_privacy_guard.py
├── scripts/
│   ├── setup_vllm.sh            # One-command AMD MI300X setup
│   ├── run_tests.sh             # Full test suite runner
│   └── benchmark.sh             # Latency + throughput benchmark
├── requirements.txt
├── .env.example
└── README.md
```
---
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `VLLM_BASE_URL` | `http://localhost:8080/v1` | vLLM OpenAI-compatible endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-Coder-32B-Instruct` | Model served by vLLM |
| `USE_LLM` | `true` | Set `false` for static-only mode (CI) |
| `PORT` | `8000` | CodeSentry API port |
| `CORS_ORIGINS` | `*` | Allowed frontend origins |
| `ZDR_SIGNING_KEY` | (dev default) | HMAC key for certificates – **change in production** |
| `GROQ_API_KEY` | – | Groq cloud API key (alternative to local vLLM) |
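
The defaults above can be mirrored in a plain `os.getenv` settings loader. This is an illustrative sketch, not the app's actual config module; only the variable names and defaults come from the table:

```python
import os
from typing import Mapping, Optional

def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented environment variables, applying table defaults."""
    env = os.environ if env is None else env
    return {
        "vllm_base_url": env.get("VLLM_BASE_URL", "http://localhost:8080/v1"),
        "model_name": env.get("MODEL_NAME", "Qwen/Qwen2.5-Coder-32B-Instruct"),
        "use_llm": env.get("USE_LLM", "true").lower() == "true",
        "port": int(env.get("PORT", "8000")),
        "cors_origins": env.get("CORS_ORIGINS", "*").split(","),
    }

# Static-only mode on a custom port, as used in CI
print(load_settings({"USE_LLM": "false", "PORT": "9000"}))
```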
---
## Zero Data Retention
Every analysis session runs inside a `ZeroDataRetentionGuard` that:
1. **Blocks** all outbound non-localhost network connections at the socket level
2. **Logs** any blocked connection attempts to the audit trail
3. **Wipes** all session data from memory after the analysis completes
4. **Generates** a cryptographically signed audit certificate
The certificate is available at `GET /api/privacy-certificate/{session_id}`.
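
The allow/deny decision behind step 1 boils down to a loopback check. The sketch below shows only that check; the real guard in `privacy/privacy_guard.py` enforces it at the socket level and records denied attempts for the audit trail, which this sketch does not do:

```python
import ipaddress

def is_local(host: str) -> bool:
    """Return True only for loopback destinations (the ZDR policy)."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # deny unresolved hostnames by default

# Only the external host would be blocked
blocked = [h for h in ("localhost", "127.0.0.1", "::1", "api.openai.com")
           if not is_local(h)]
print(blocked)  # ['api.openai.com']
```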
---
## Vulnerability Coverage
### Security (OWASP)
| Category | ID | Description |
|---|---|---|
| OWASP LLM | LLM01 | Prompt Injection |
| OWASP LLM | LLM02 | Insecure Output Handling (eval, exec) |
| OWASP LLM | LLM03 | Training Data Poisoning |
| OWASP LLM | LLM04 | Model Denial of Service |
| OWASP LLM | LLM06 | Sensitive Information Disclosure |
| OWASP LLM | LLM08 | Excessive Agency |
| OWASP LLM | LLM09 | Overreliance |
| OWASP Web | A01 | Broken Access Control |
| OWASP Web | A02 | Cryptographic Failures |
| OWASP Web | A03 | SQL Injection |
| OWASP Web | A04 | Insecure Deserialization (CWE-502) |
| OWASP Web | A05 | Security Misconfiguration |
| OWASP Web | A07 | Hardcoded Credentials |
| OWASP Web | A08 | Software & Data Integrity Failures |
| OWASP Web | A10 | Server-Side Request Forgery |
| ML-Specific | ML01 | GPU Memory Leak |
| ML-Specific | ML02 | Missing `@torch.no_grad` |
| ML-Specific | ML03 | N+1 Embedding Calls |
| ML-Specific | ML04 | FP32 vs BF16 Inefficiency |
| ML-Specific | ML05 | Synchronous Model Loading in Handler |
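
The static half of this coverage amounts to (regex pattern → finding) rules. The three rules below are illustrative stand-ins for the larger set in `tools/vulnerability_db.py`:

```python
import re

# (pattern, (owasp_id, severity, title)) - illustrative subset only
RULES = [
    (re.compile(r"\bpickle\.loads?\("), ("A04", "critical", "Insecure Deserialization")),
    (re.compile(r"\beval\(|\bexec\("), ("LLM02", "high", "Insecure Output Handling")),
    (re.compile(r"(?i)(api_key|password)\s*=\s*['\"]"), ("A07", "high", "Hardcoded Credentials")),
]

def scan(source: str) -> list:
    """Run every rule against every line, recording 1-based line numbers."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, (owasp_id, severity, title) in RULES:
            if pattern.search(line):
                findings.append({"id": owasp_id, "severity": severity,
                                 "title": title, "line_number": lineno})
    return findings

code = 'API_KEY = "sk-123"\nmodel = pickle.load(open("m.pkl", "rb"))\n'
for f in scan(code):
    print(f)
```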
### AMD Migration (CUDA → ROCm)
| ID | Severity | Description |
|---|---|---|
| AMD_M01 | Low | `torch.cuda.is_available()` – CUDA device check |
| AMD_M02 | Critical | `nvidia-smi` – NVIDIA-only CLI tool |
| AMD_M03 | High | `CUDA_VISIBLE_DEVICES` – CUDA env variable |
| AMD_M04 | High | `torch.cuda.amp.autocast/GradScaler` – Legacy CUDA AMP |
| AMD_M05 | Medium | `.half()` / `torch.float16` – FP16 suboptimal on MI300X |
| AMD_M06 | Medium | `torch.backends.cudnn.*` – cuDNN configuration |
| AMD_M07 | High | `import flash_attn` – CUDA-only Flash Attention |
| AMD_M08 | Low | `torch.cuda.memory_allocated()` – CUDA memory profiling |
| AMD_M09 | Low | `device = 'cuda'` – Hardcoded device string |
| AMD_M10 | Critical | `BitsAndBytesConfig` – CUDA-only quantization |
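
A score like the `compatibility_score: 72` in the SSE summary can be derived by subtracting severity-weighted penalties from 100. The weights and the non-"Mostly Compatible" labels below are illustrative assumptions; the actual scoring lives in `agents/amd_migration_advisor.py`:

```python
# Illustrative per-severity penalties; the advisor's real weights may differ
WEIGHTS = {"critical": 15, "high": 8, "medium": 4, "low": 1}

def compatibility_score(findings: list) -> tuple:
    """Subtract one weighted penalty per finding, floored at 0."""
    score = max(0, 100 - sum(WEIGHTS[f["severity"]] for f in findings))
    if score >= 90:
        label = "Fully Compatible"       # hypothetical label
    elif score >= 60:
        label = "Mostly Compatible"      # label from the SSE summary above
    else:
        label = "Needs Migration Work"   # hypothetical label
    return score, label

findings = [
    {"id": "AMD_M02", "severity": "critical"},  # nvidia-smi
    {"id": "AMD_M03", "severity": "high"},      # CUDA_VISIBLE_DEVICES
    {"id": "AMD_M09", "severity": "low"},       # device = 'cuda'
]
print(compatibility_score(findings))  # (76, 'Mostly Compatible')
```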