🛡️ CodeSentry Backend
AI/ML Code Security Analysis Engine, powered by Qwen2.5-Coder-32B on AMD MI300X
Zero Data Retention. All inference runs locally. No code leaves your machine.
Overview
CodeSentry is a multi-agent backend that audits AI/ML codebases for security vulnerabilities and performance issues:
- Security Agent: OWASP Top-10 + OWASP LLM Top-10 scanning (static regex + LLM deep analysis)
- Performance Agent: GPU memory leaks, N+1 embeddings, FP32 waste, missing @torch.no_grad
- Fix Agent: generates unified diffs, security reports, and PR descriptions
- AMD Migration Advisor: 10-category CUDA → ROCm/HIP compatibility scanner with an AMD Compatibility Score
- AMD Metrics Collector: real-time MI300X GPU monitoring via rocm-smi (with simulated fallback)
- Privacy Guard: blocks outbound connections, generates cryptographically signed ZDR certificates
Model stack: Qwen/Qwen2.5-Coder-32B-Instruct via vLLM on AMD MI300X (192 GB HBM3)
Quick Start
1. Set up vLLM on AMD MI300X
cd codesentry-backend
chmod +x scripts/setup_vllm.sh
./scripts/setup_vllm.sh
This installs vLLM with the ROCm backend, starts the model server, and launches the CodeSentry API.
2. Manual startup
# Copy and configure environment
cp .env.example .env
# Install dependencies
pip install -r requirements.txt
# Start vLLM (in background)
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
--port 8080 \
--tensor-parallel-size 1 \
--gpu-memory-utilization 0.85 \
--max-model-len 32768 &
# Start CodeSentry API
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
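Before starting the API, you can confirm the model server is actually serving the model. A minimal sketch (check_vllm.py is a hypothetical helper, not part of the repo; it uses the requests library and the default endpoint and model name from .env.example):

```python
# check_vllm.py -- hypothetical helper, not part of the repo.
# Confirms the vLLM OpenAI-compatible server is up before starting CodeSentry.
import requests

VLLM_BASE_URL = "http://localhost:8080/v1"  # default from .env.example
MODEL_NAME = "Qwen/Qwen2.5-Coder-32B-Instruct"

resp = requests.get(f"{VLLM_BASE_URL}/models", timeout=5)
resp.raise_for_status()
served = [m["id"] for m in resp.json()["data"]]
print("vLLM is serving:", served)
assert MODEL_NAME in served, f"{MODEL_NAME} not loaded yet"
```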
API Reference
GET /api/health
Check service status, GPU memory, and live AMD hardware metrics.
curl http://localhost:8000/api/health
Response:
{
"status": "ok",
"model": "Qwen/Qwen2.5-Coder-32B-Instruct",
"vllm_ready": true,
"gpu_memory_free_gb": 142.5,
"vllm_endpoint": "http://localhost:8080",
"amd_hardware": {
"gpu_utilization_percent": 85,
"vram_used_gb": 48.2,
"vram_total_gb": 192.0,
"temperature_c": 63,
"power_draw_w": 612,
"memory_bandwidth_tbs": 4.7,
"tokens_per_sec": 1250,
"timestamp": "2026-05-09T13:30:00Z"
}
}
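For scripting against the API, a small hypothetical poller that waits until vllm_ready is true before starting a scan (field names are taken from the response above; wait_for_ready.py is not part of the repo):

```python
# wait_for_ready.py -- hypothetical helper, not part of the repo.
import time
import requests

def wait_until_ready(base_url: str = "http://localhost:8000", timeout_s: int = 300) -> dict:
    """Poll /api/health until vllm_ready is true, then return the health payload."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            health = requests.get(f"{base_url}/api/health", timeout=5).json()
            if health.get("vllm_ready"):
                return health
        except requests.RequestException:
            pass  # API not up yet, keep polling
        time.sleep(5)
    raise TimeoutError("CodeSentry never reported vllm_ready=true")

health = wait_until_ready()
print(f"{health['model']} ready, {health['gpu_memory_free_gb']} GB free")
```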
POST /api/scan & GET /api/scan/stream/{session_id} (SSE stream)
Analyse a codebase. The POST creates the scan session; the stream endpoint returns results as Server-Sent Events.
# Analyse a GitHub repository (creates scan session)
curl -X POST http://localhost:8000/api/scan \
-H "Content-Type: application/json" \
-d '{
"source": "https://github.com/example/vulnerable-ml-app",
"source_type": "github",
"session_id": "test-123"
}'
# Stream the results
curl -N http://localhost:8000/api/scan/stream/test-123
SSE Events:
event: status
data: {"message": "Ingesting code...", "session_id": "test-123"}
event: agent_start
data: {"agent": "security", "status": "scanning"}
event: finding
data: {"severity": "critical", "title": "Insecure Pickle Deserialization", "cwe": "CWE-502", "line_number": 2}
event: amd_metrics
data: {"gpu_utilization_percent": 87, "vram_used_gb": 48.2, "vram_total_gb": 192.0, "temperature_c": 63, ...}
event: agent_start
data: {"agent": "performance", "status": "analyzing"}
event: finding
data: {"agent": "performance", "type": "gpu_memory", "saving_mb": 3584, "suggestion": "Switch from FP32 to BF16"}
event: amd_migration_finding
data: {"id": "AMD_M02", "title": "NVIDIA-Specific CLI Tool", "severity": "critical", "rocm_fix": "..."}
event: amd_migration_summary
data: {"compatibility_score": 72, "compatibility_label": "Mostly Compatible", "total_cuda_patterns_found": 3}
event: fix_ready
data: {"findingId": "SEC-STATIC-1", "title": "Fix pickle.load", "before": "...", "after": "..."}
event: complete
data: {"summary": {...}, "privacy_certificate": {...}, "amd_migration_guide": {...}}
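A minimal Python client for these two endpoints could look like the sketch below. It is illustrative only: scan_client.py is not part of the repo, the request body and event names are taken from the examples above, and it does naive line-based SSE parsing rather than using a dedicated SSE library.

```python
# scan_client.py -- hypothetical client sketch, not part of the repo.
import json
import uuid
import requests

BASE = "http://localhost:8000"
session_id = f"cli-{uuid.uuid4().hex[:8]}"

# 1. Create the scan session.
requests.post(
    f"{BASE}/api/scan",
    json={
        "source": "https://github.com/example/vulnerable-ml-app",
        "source_type": "github",
        "session_id": session_id,
    },
    timeout=30,
).raise_for_status()

# 2. Consume the SSE stream: events arrive as "event: <name>" / "data: <json>" line pairs.
with requests.get(f"{BASE}/api/scan/stream/{session_id}", stream=True, timeout=600) as stream:
    event = None
    for line in stream.iter_lines(decode_unicode=True):
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            payload = json.loads(line.split(":", 1)[1].strip())
            print(event, payload.get("title") or payload.get("message") or payload)
            if event == "complete":
                break
```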
POST /api/analyze/demo
Returns a pre-computed result from the vulnerable test fixture. No GPU required; intended for frontend development and CI.
curl -X POST http://localhost:8000/api/analyze/demo | python -m json.tool
GET /api/session/{session_id}
Retrieve the full analysis result for a completed session (includes amd_migration_guide).
curl http://localhost:8000/api/session/test-123
GET /api/privacy-certificate/{session_id}
Get the Zero Data Retention audit certificate for a session.
curl http://localhost:8000/api/privacy-certificate/test-123
Response:
{
"session_id": "test-123",
"timestamp": "2024-01-01T00:00:00+00:00",
"guarantee": "All inference ran exclusively on localhost AMD MI300X via vLLM. Zero data transmitted to external services.",
"model_endpoint": "http://localhost:8080",
"external_calls_blocked": [],
"data_wiped": true,
"signature": "a3f8d2..."
}
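To check a certificate offline, you can recompute the HMAC. The sketch below is hypothetical: the exact payload canonicalisation and key handling are defined in privacy/privacy_guard.py, so the assumption made here (HMAC-SHA256 over the sorted-key JSON of every field except signature) may need adjusting.

```python
# verify_certificate.py -- hypothetical sketch; the real scheme lives in privacy/privacy_guard.py.
import hashlib
import hmac
import json
import os

def verify_certificate(cert: dict, signing_key: str) -> bool:
    """Recompute HMAC-SHA256 over the certificate body and compare it to the signature.

    Assumes the signature covers the sorted-key JSON of every field except 'signature';
    adjust to match the actual implementation.
    """
    body = {k: v for k, v in cert.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(signing_key.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["signature"])

with open("certificate.json") as f:  # e.g. saved from the curl above
    cert = json.load(f)
print(verify_certificate(cert, os.environ["ZDR_SIGNING_KEY"]))
```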
Running Tests
# Install test dependencies and run all tests (no GPU required)
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh
# Or directly with pytest
export USE_LLM=false
pytest tests/ -v --asyncio-mode=auto
All 15+ tests use static analysis only; no GPU or vLLM server is needed.
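New tests can follow the same GPU-free pattern. A hypothetical example using FastAPI's TestClient against the demo endpoint (it assumes USE_LLM=false and that main:app imports without a running vLLM server; this file is not one of the shipped tests):

```python
# tests/test_demo_endpoint_example.py -- hypothetical test sketch, not one of the shipped tests.
import os

os.environ["USE_LLM"] = "false"  # force static-only mode before the app is imported

from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

def test_demo_analysis_returns_findings():
    resp = client.post("/api/analyze/demo")
    assert resp.status_code == 200
    result = resp.json()
    assert result  # exact response shape is defined in api/models.py
```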
Benchmarking
# Requires running API at localhost:8000
chmod +x scripts/benchmark.sh
./scripts/benchmark.sh
# Custom URL and run count
CODESENTRY_URL=http://localhost:8000 BENCHMARK_RUNS=5 ./scripts/benchmark.sh
Outputs benchmark_results.json with TTFF, total latency, and findings statistics.
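If you prefer to script the measurement yourself, here is a rough Python analogue of the TTFF timing (hypothetical sketch; it reuses the scan endpoints documented above and measures from the POST to the first finding event):

```python
# ttff.py -- hypothetical sketch, a rough Python analogue of scripts/benchmark.sh.
import time
import uuid
import requests

BASE = "http://localhost:8000"
session_id = f"bench-{uuid.uuid4().hex[:8]}"

t0 = time.perf_counter()
requests.post(f"{BASE}/api/scan", json={
    "source": "https://github.com/example/vulnerable-ml-app",
    "source_type": "github",
    "session_id": session_id,
}, timeout=30).raise_for_status()

ttff = None
with requests.get(f"{BASE}/api/scan/stream/{session_id}", stream=True, timeout=600) as stream:
    for line in stream.iter_lines(decode_unicode=True):
        if line.startswith("event: finding") and ttff is None:
            ttff = time.perf_counter() - t0  # time to first finding
        if line.startswith("event: complete"):
            break

total = time.perf_counter() - t0
if ttff is not None:
    print(f"TTFF: {ttff:.2f} s, total: {total:.2f} s")
else:
    print(f"no findings emitted; total: {total:.2f} s")
```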
Project Structure
codesentry-backend/
├── main.py                        # FastAPI app entry point
├── amd_metrics.py                 # AMD MI300X live metrics (rocm-smi + simulated fallback)
├── api/
│   ├── routes.py                  # All API endpoints
│   └── models.py                  # Pydantic request/response schemas
├── agents/
│   ├── orchestrator.py            # Master agent (coordinates all sub-agents, SSE)
│   ├── security_agent.py          # OWASP + OWASP LLM Top-10 scanner
│   ├── performance_agent.py       # GPU memory, latency, ROCm optimisation
│   ├── fix_agent.py               # Code fixes, diffs, security report
│   └── amd_migration_advisor.py   # CUDA → ROCm migration (10 pattern categories)
├── tools/
│   ├── code_parser.py             # AST parsing, GitHub/zip/string ingestion
│   ├── github_connector.py        # GitHub shallow clone
│   ├── vulnerability_db.py        # OWASP knowledge base + regex patterns
│   ├── diff_generator.py          # Unified diff generation
│   └── benchmark_tool.py          # GPU memory estimation + timing
├── privacy/
│   └── privacy_guard.py           # ZDR enforcement + HMAC certificates
├── memory/
│   └── session_store.py           # In-memory TTL session store
├── tests/
│   ├── fixtures/
│   │   ├── vulnerable_ml_code.py    # Deliberately vulnerable ML app
│   │   ├── clean_ml_code.py         # Secure baseline
│   │   └── expected_findings.json   # Ground truth for assertions
│   ├── test_security_agent.py
│   ├── test_performance_agent.py
│   ├── test_api_endpoints.py
│   └── test_privacy_guard.py
├── scripts/
│   ├── setup_vllm.sh              # One-command AMD MI300X setup
│   ├── run_tests.sh               # Full test suite runner
│   └── benchmark.sh               # Latency + throughput benchmark
├── requirements.txt
├── .env.example
└── README.md
Environment Variables
| Variable | Default | Description |
|---|---|---|
| VLLM_BASE_URL | http://localhost:8080/v1 | vLLM OpenAI-compatible endpoint |
| MODEL_NAME | Qwen/Qwen2.5-Coder-32B-Instruct | Model served by vLLM |
| USE_LLM | true | Set false for static-only mode (CI) |
| PORT | 8000 | CodeSentry API port |
| CORS_ORIGINS | * | Allowed frontend origins |
| ZDR_SIGNING_KEY | (dev default) | HMAC key for certificates; change in production |
| GROQ_API_KEY | (none) | Groq cloud API key (alternative to local vLLM) |
Zero Data Retention
Every analysis session runs inside a ZeroDataRetentionGuard that:
- Blocks all outbound non-localhost network connections at the socket level
- Logs any blocked connection attempts to the audit trail
- Wipes all session data from memory after the analysis completes
- Generates a cryptographically signed audit certificate
The certificate is available at GET /api/privacy-certificate/{session_id}.
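As an illustration of the socket-level blocking idea only (this is not the actual privacy_guard.py implementation), the guard can be pictured as a context manager that patches socket connections and refuses anything that is not loopback:

```python
# zdr_sketch.py -- conceptual sketch of socket-level egress blocking;
# the real implementation lives in privacy/privacy_guard.py.
import socket
from contextlib import contextmanager

@contextmanager
def block_outbound_connections(audit_log: list):
    allowed = {"127.0.0.1", "::1", "localhost"}
    original_connect = socket.socket.connect

    def guarded_connect(sock, address):
        host = address[0] if isinstance(address, tuple) else address
        if host not in allowed:
            audit_log.append(f"blocked connection attempt to {host}")
            raise ConnectionRefusedError(f"ZDR guard: outbound connection to {host} blocked")
        return original_connect(sock, address)

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        socket.socket.connect = original_connect  # always restore the real connect

audit = []
with block_outbound_connections(audit):
    ...  # run the analysis here; only localhost (e.g. the vLLM server) is reachable
print(audit)
```

The real guard additionally wipes session data from memory and emits the signed certificate described above.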
Vulnerability Coverage
Security (OWASP)
| Category | ID | Description |
|---|---|---|
| OWASP LLM | LLM01 | Prompt Injection |
| OWASP LLM | LLM02 | Insecure Output Handling (eval, exec) |
| OWASP LLM | LLM03 | Training Data Poisoning |
| OWASP LLM | LLM04 | Model Denial of Service |
| OWASP LLM | LLM06 | Sensitive Information Disclosure |
| OWASP LLM | LLM08 | Excessive Agency |
| OWASP LLM | LLM09 | Overreliance |
| OWASP Web | A01 | Broken Access Control |
| OWASP Web | A02 | Cryptographic Failures |
| OWASP Web | A03 | SQL Injection |
| OWASP Web | A04 | Insecure Deserialization (CWE-502) |
| OWASP Web | A05 | Security Misconfiguration |
| OWASP Web | A07 | Hardcoded Credentials |
| OWASP Web | A08 | Software & Data Integrity Failures |
| OWASP Web | A10 | Server-Side Request Forgery |
| ML-Specific | ML01 | GPU Memory Leak |
| ML-Specific | ML02 | Missing @torch.no_grad |
| ML-Specific | ML03 | N+1 Embedding Calls |
| ML-Specific | ML04 | FP32 vs BF16 Inefficiency |
| ML-Specific | ML05 | Synchronous Model Loading in Handler |
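To make these categories concrete, here is a small, deliberately insecure snippet of the kind tests/fixtures/vulnerable_ml_code.py exercises (illustrative only; the actual fixture differs):

```python
# Illustrative vulnerable patterns (not the actual test fixture).
import pickle

API_KEY = "sk-live-1234567890abcdef"      # A07: hardcoded credential

def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)             # A04 / CWE-502: insecure deserialization

def embed(texts, model):
    # ML02: no torch.no_grad() around inference, so autograd state wastes GPU memory
    # ML03: N+1 embedding calls -- one model call per item instead of a single batch
    return [model(t) for t in texts]

def run_user_snippet(snippet):
    return eval(snippet)                  # LLM02: insecure output handling via eval
```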
AMD Migration (CUDA → ROCm)
| ID | Severity | Pattern | Issue |
|---|---|---|---|
| AMD_M01 | Low | torch.cuda.is_available() | CUDA device check |
| AMD_M02 | Critical | nvidia-smi | NVIDIA-only CLI tool |
| AMD_M03 | High | CUDA_VISIBLE_DEVICES | CUDA environment variable |
| AMD_M04 | High | torch.cuda.amp.autocast / GradScaler | Legacy CUDA AMP |
| AMD_M05 | Medium | .half() / torch.float16 | FP16 suboptimal on MI300X |
| AMD_M06 | Medium | torch.backends.cudnn.* | cuDNN configuration |
| AMD_M07 | High | import flash_attn | CUDA-only Flash Attention |
| AMD_M08 | Low | torch.cuda.memory_allocated() | CUDA memory profiling |
| AMD_M09 | Low | device = 'cuda' | Hardcoded device string |
| AMD_M10 | Critical | BitsAndBytesConfig | CUDA-only quantization |
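Two of these fixes in before/after form (illustrative sketch; the actual rocm_fix text comes from agents/amd_migration_advisor.py, and running the snippet assumes rocm-smi and a ROCm build of PyTorch are installed):

```python
# Illustrative before/after for two of the patterns above
# (the actual rocm_fix text comes from agents/amd_migration_advisor.py).
import subprocess
import torch

# AMD_M02 -- before (NVIDIA-only CLI tool):
#   subprocess.run(["nvidia-smi"], check=True)
# after (ROCm equivalent):
subprocess.run(["rocm-smi"], check=True)

model = torch.nn.Linear(16, 16)

# AMD_M05 -- before (FP16 is suboptimal on MI300X):
#   model = model.half()
# after (prefer BF16 on MI300X):
model = model.to(torch.bfloat16)
```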