# πŸ›‘οΈ CodeSentry Backend
**AI/ML Code Security Analysis Engine β€” powered by Qwen2.5-Coder-32B on AMD MI300X**
> Zero Data Retention. All inference runs locally. No code leaves your machine.
---
## Overview
CodeSentry is a multi-agent backend that audits AI/ML codebases for security vulnerabilities and performance issues:
- **Security Agent** — OWASP Top-10 + OWASP LLM Top-10 scanning (static regex + LLM deep analysis)
- **Performance Agent** — GPU memory leaks, N+1 embeddings, FP32 waste, missing `@torch.no_grad`
- **Fix Agent** — Generates unified diffs, security reports, and PR descriptions
- **AMD Migration Advisor** — 10-category CUDA → ROCm/HIP compatibility scanner with AMD Compatibility Score
- **AMD Metrics Collector** — Real-time MI300X GPU monitoring via `rocm-smi` (with simulated fallback)
- **Privacy Guard** — Blocks outbound connections, generates cryptographically signed ZDR certificates
**Model stack:** `Qwen/Qwen2.5-Coder-32B-Instruct` via vLLM on AMD MI300X (192 GB HBM3)
---
## Quick Start
### 1. Set up vLLM on AMD MI300X
```bash
cd codesentry-backend
chmod +x scripts/setup_vllm.sh
./scripts/setup_vllm.sh
```
This installs vLLM with the ROCm backend, starts the model server, and launches the CodeSentry API.
### 2. Manual startup
```bash
# Copy and configure environment
cp .env.example .env
# Install dependencies
pip install -r requirements.txt
# Start vLLM (in background)
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
--port 8080 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.85 \
--max-model-len 32768 &
# Start CodeSentry API
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```
---
## API Reference
### `GET /api/health`
Check service status, GPU memory, and live AMD hardware metrics.
```bash
curl http://localhost:8000/api/health
```
**Response:**
```json
{
"status": "ok",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
"vllm_ready": true,
"gpu_memory_free_gb": 142.5,
"vllm_endpoint": "http://localhost:8080",
"amd_hardware": {
"gpu_utilization_percent": 85,
"vram_used_gb": 48.2,
"vram_total_gb": 192.0,
"temperature_c": 63,
"power_draw_w": 612,
"memory_bandwidth_tbs": 4.7,
"tokens_per_sec": 1250,
"timestamp": "2026-05-09T13:30:00Z"
}
}
```
---
### `POST /api/scan` & `GET /api/scan/stream/{session_id}` — SSE Stream
Analyse a codebase. The `POST` creates a scan session; the `GET` endpoint streams the results as Server-Sent Events.
```bash
# Analyse a GitHub repository (creates scan session)
curl -X POST http://localhost:8000/api/scan \
-H "Content-Type: application/json" \
-d '{
"source": "https://github.com/example/vulnerable-ml-app",
"source_type": "github",
"session_id": "test-123"
}'
# Stream the results
curl -N http://localhost:8000/api/scan/stream/test-123
```
**SSE Events:**
```
event: status
data: {"message": "Ingesting code...", "session_id": "test-123"}
event: agent_start
data: {"agent": "security", "status": "scanning"}
event: finding
data: {"severity": "critical", "title": "Insecure Pickle Deserialization", "cwe": "CWE-502", "line_number": 2}
event: amd_metrics
data: {"gpu_utilization_percent": 87, "vram_used_gb": 48.2, "vram_total_gb": 192.0, "temperature_c": 63, ...}
event: agent_start
data: {"agent": "performance", "status": "analyzing"}
event: finding
data: {"agent": "performance", "type": "gpu_memory", "saving_mb": 3584, "suggestion": "Switch from FP32 to BF16"}
event: amd_migration_finding
data: {"id": "AMD_M02", "title": "NVIDIA-Specific CLI Tool", "severity": "critical", "rocm_fix": "..."}
event: amd_migration_summary
data: {"compatibility_score": 72, "compatibility_label": "Mostly Compatible", "total_cuda_patterns_found": 3}
event: fix_ready
data: {"findingId": "SEC-STATIC-1", "title": "Fix pickle.load", "before": "...", "after": "..."}
event: complete
data: {"summary": {...}, "privacy_certificate": {...}, "amd_migration_guide": {...}}
```
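For quick experiments, the stream can be consumed with a few lines of Python. The parser below is a minimal sketch that only handles the single-line `event:`/`data:` framing shown above; a production client would use a streaming SSE library such as `sseclient-py` or `httpx-sse`.

```python
import json

def parse_sse(raw: str):
    """Parse raw SSE text into (event, data) pairs.

    Minimal sketch: assumes each `event:` line is followed by one
    single-line JSON `data:` line, as in the examples above.
    """
    events, current = [], None
    for line in raw.splitlines():
        if line.startswith("event:"):
            current = line[len("event:"):].strip()
        elif line.startswith("data:") and current is not None:
            events.append((current, json.loads(line[len("data:"):].strip())))
            current = None
    return events

# Example: frames of the kind /api/scan/stream emits
sample = (
    "event: status\n"
    'data: {"message": "Ingesting code...", "session_id": "test-123"}\n'
    "\n"
    "event: finding\n"
    'data: {"severity": "critical", "cwe": "CWE-502"}\n'
)
for name, payload in parse_sse(sample):
    print(name, payload)
```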
---
### `POST /api/analyze/demo`
Returns a pre-computed result from the vulnerable fixture. **No GPU required.** Intended for frontend development and CI.
```bash
curl -X POST http://localhost:8000/api/analyze/demo | python -m json.tool
```
---
### `GET /api/session/{session_id}`
Retrieve the full analysis result for a completed session (includes `amd_migration_guide`).
```bash
curl http://localhost:8000/api/session/test-123
```
---
### `GET /api/privacy-certificate/{session_id}`
Get the Zero Data Retention audit certificate for a session.
```bash
curl http://localhost:8000/api/privacy-certificate/test-123
```
**Response:**
```json
{
"session_id": "test-123",
"timestamp": "2024-01-01T00:00:00+00:00",
"guarantee": "All inference ran exclusively on localhost AMD MI300X via vLLM. Zero data transmitted to external services.",
"model_endpoint": "http://localhost:8080",
"external_calls_blocked": [],
"data_wiped": true,
"signature": "a3f8d2..."
}
```
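Clients holding the shared `ZDR_SIGNING_KEY` can verify a certificate offline. The canonicalisation below (sorted-key JSON over every field except `signature`) is an assumption for illustration; `privacy/privacy_guard.py` defines the actual scheme.

```python
import hashlib
import hmac
import json

def sign_certificate(cert: dict, key: bytes) -> str:
    # Assumed canonicalisation: sorted-key JSON of all fields except the signature
    payload = json.dumps(
        {k: v for k, v in cert.items() if k != "signature"},
        sort_keys=True,
    ).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_certificate(cert: dict, key: bytes) -> bool:
    # compare_digest gives a constant-time comparison
    return hmac.compare_digest(cert.get("signature", ""), sign_certificate(cert, key))

cert = {"session_id": "test-123", "data_wiped": True, "signature": ""}
cert["signature"] = sign_certificate(cert, b"dev-key")
print(verify_certificate(cert, b"dev-key"))  # True
```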
---
## Running Tests
```bash
# Install test dependencies and run all tests (no GPU required)
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh
# Or directly with pytest
export USE_LLM=false
pytest tests/ -v --asyncio-mode=auto
```
All 15+ tests use **static analysis only** — no GPU or vLLM server needed.
---
## Benchmarking
```bash
# Requires running API at localhost:8000
chmod +x scripts/benchmark.sh
./scripts/benchmark.sh
# Custom URL and run count
CODESENTRY_URL=http://localhost:8000 BENCHMARK_RUNS=5 ./scripts/benchmark.sh
```
Outputs `benchmark_results.json` with TTFF, total latency, and findings statistics.
---
## Project Structure
```
codesentry-backend/
├── main.py                      # FastAPI app entry point
├── amd_metrics.py               # AMD MI300X live metrics (rocm-smi + simulated fallback)
├── api/
│   ├── routes.py                # All API endpoints
│   └── models.py                # Pydantic request/response schemas
├── agents/
│   ├── orchestrator.py          # Master agent (coordinates all sub-agents, SSE)
│   ├── security_agent.py        # OWASP + OWASP-LLM-Top-10 scanner
│   ├── performance_agent.py     # GPU memory, latency, ROCm optimisation
│   ├── fix_agent.py             # Code fixes, diffs, security report
│   └── amd_migration_advisor.py # CUDA → ROCm migration (10 pattern categories)
├── tools/
│   ├── code_parser.py           # AST parsing, GitHub/zip/string ingestion
│   ├── github_connector.py      # GitHub shallow clone
│   ├── vulnerability_db.py      # OWASP knowledge base + regex patterns
│   ├── diff_generator.py        # Unified diff generation
│   └── benchmark_tool.py        # GPU memory estimation + timing
├── privacy/
│   └── privacy_guard.py         # ZDR enforcement + HMAC certificates
├── memory/
│   └── session_store.py         # In-memory TTL session store
├── tests/
│   ├── fixtures/
│   │   ├── vulnerable_ml_code.py  # Deliberately vulnerable ML app
│   │   ├── clean_ml_code.py       # Secure baseline
│   │   └── expected_findings.json # Ground truth for assertions
│   ├── test_security_agent.py
│   ├── test_performance_agent.py
│   ├── test_api_endpoints.py
│   └── test_privacy_guard.py
├── scripts/
│   ├── setup_vllm.sh            # One-command AMD MI300X setup
│   ├── run_tests.sh             # Full test suite runner
│   └── benchmark.sh             # Latency + throughput benchmark
├── requirements.txt
├── .env.example
└── README.md
```
---
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `VLLM_BASE_URL` | `http://localhost:8080/v1` | vLLM OpenAI-compatible endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-Coder-32B-Instruct` | Model served by vLLM |
| `USE_LLM` | `true` | Set `false` for static-only mode (CI) |
| `PORT` | `8000` | CodeSentry API port |
| `CORS_ORIGINS` | `*` | Allowed frontend origins |
| `ZDR_SIGNING_KEY` | (dev default) | HMAC key for certificates — **change in production** |
| `GROQ_API_KEY` | — | Groq cloud API key (alternative to local vLLM) |
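For reference, a minimal `.env` assembled from the table above (documented defaults, with `USE_LLM` flipped to `false` for static-only CI and a placeholder signing key):

```shell
# .env — defaults from the table above; the signing key value is a placeholder
VLLM_BASE_URL=http://localhost:8080/v1
MODEL_NAME=Qwen/Qwen2.5-Coder-32B-Instruct
USE_LLM=false   # static-only mode for CI
PORT=8000
CORS_ORIGINS=*
ZDR_SIGNING_KEY=change-me-in-production
```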
---
## Zero Data Retention
Every analysis session runs inside a `ZeroDataRetentionGuard` that:
1. **Blocks** all outbound non-localhost network connections at the socket level
2. **Logs** any blocked connection attempts to the audit trail
3. **Wipes** all session data from memory after the analysis completes
4. **Generates** a cryptographically signed audit certificate
The certificate is available at `GET /api/privacy-certificate/{session_id}`.
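The numbered guarantees above can be pictured as a context manager that monkey-patches `socket.socket.connect`. This is a hypothetical simplification; the real guard in `privacy/privacy_guard.py` may differ in detail.

```python
import socket

class ZDRGuardSketch:
    """Simplified sketch of a socket-level egress guard (illustrative only)."""

    ALLOWED_HOSTS = {"localhost", "127.0.0.1", "::1"}

    def __init__(self):
        self.blocked = []                       # audit trail of blocked attempts
        self._orig_connect = socket.socket.connect

    def __enter__(self):
        guard = self

        def guarded_connect(sock, address):
            host = address[0]
            if host not in guard.ALLOWED_HOSTS:
                guard.blocked.append(host)      # log the attempt (step 2)
                raise PermissionError(f"ZDR: blocked outbound connection to {host}")
            return guard._orig_connect(sock, address)

        socket.socket.connect = guarded_connect  # block at the socket level (step 1)
        return self

    def __exit__(self, *exc):
        socket.socket.connect = self._orig_connect  # restore on exit
        return False

with ZDRGuardSketch() as guard:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect(("203.0.113.1", 80))  # TEST-NET address: always blocked
    except PermissionError as e:
        print(e)
    finally:
        s.close()
print(guard.blocked)  # ['203.0.113.1']
```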
---
## Vulnerability Coverage
### Security (OWASP)
| Category | ID | Description |
|---|---|---|
| OWASP LLM | LLM01 | Prompt Injection |
| OWASP LLM | LLM02 | Insecure Output Handling (eval, exec) |
| OWASP LLM | LLM03 | Training Data Poisoning |
| OWASP LLM | LLM04 | Model Denial of Service |
| OWASP LLM | LLM06 | Sensitive Information Disclosure |
| OWASP LLM | LLM08 | Excessive Agency |
| OWASP LLM | LLM09 | Overreliance |
| OWASP Web | A01 | Broken Access Control |
| OWASP Web | A02 | Cryptographic Failures |
| OWASP Web | A03 | SQL Injection |
| OWASP Web | A04 | Insecure Deserialization (CWE-502) |
| OWASP Web | A05 | Security Misconfiguration |
| OWASP Web | A07 | Hardcoded Credentials |
| OWASP Web | A08 | Software & Data Integrity Failures |
| OWASP Web | A10 | Server-Side Request Forgery |
| ML-Specific | ML01 | GPU Memory Leak |
| ML-Specific | ML02 | Missing `@torch.no_grad` |
| ML-Specific | ML03 | N+1 Embedding Calls |
| ML-Specific | ML04 | FP32 vs BF16 Inefficiency |
| ML-Specific | ML05 | Synchronous Model Loading in Handler |
### AMD Migration (CUDA → ROCm)
| ID | Severity | Description |
|---|---|---|
| AMD_M01 | Low | `torch.cuda.is_available()` — CUDA device check |
| AMD_M02 | Critical | `nvidia-smi` — NVIDIA-only CLI tool |
| AMD_M03 | High | `CUDA_VISIBLE_DEVICES` — CUDA env variable |
| AMD_M04 | High | `torch.cuda.amp.autocast/GradScaler` — Legacy CUDA AMP |
| AMD_M05 | Medium | `.half()` / `torch.float16` — FP16 suboptimal on MI300X |
| AMD_M06 | Medium | `torch.backends.cudnn.*` — cuDNN configuration |
| AMD_M07 | High | `import flash_attn` — CUDA-only Flash Attention |
| AMD_M08 | Low | `torch.cuda.memory_allocated()` — CUDA memory profiling |
| AMD_M09 | Low | `device = 'cuda'` — Hardcoded device string |
| AMD_M10 | Critical | `BitsAndBytesConfig` — CUDA-only quantization |
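At its core, the static half of the migration advisor can be thought of as a line-oriented regex scan. Below is a toy three-pattern version for illustration; the pattern IDs and severities mirror the table above, but the regexes and output shape are assumptions, and the real advisor covers all ten categories plus ROCm fixes.

```python
import re

# Toy subset of the pattern table above (IDs/severities mirror the table)
CUDA_PATTERNS = [
    ("AMD_M02", "critical", re.compile(r"\bnvidia-smi\b")),
    ("AMD_M07", "high", re.compile(r"\bimport\s+flash_attn\b")),
    ("AMD_M09", "low", re.compile(r"device\s*=\s*['\"]cuda['\"]")),
]

def scan_cuda_patterns(source: str):
    """Return one finding per pattern hit, with the 1-based line number."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern_id, severity, regex in CUDA_PATTERNS:
            if regex.search(line):
                findings.append({"id": pattern_id, "severity": severity, "line": lineno})
    return findings

snippet = "import flash_attn\ndevice = 'cuda'\n"
print(scan_cuda_patterns(snippet))
```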