
πŸ›‘οΈ CodeSentry Backend

AI/ML Code Security Analysis Engine — powered by Qwen2.5-Coder-32B on AMD MI300X

Zero Data Retention. All inference runs locally. No code leaves your machine.


Overview

CodeSentry is a multi-agent backend that audits AI/ML codebases for security vulnerabilities and performance issues:

  • Security Agent — OWASP Top-10 + OWASP LLM Top-10 scanning (static regex + LLM deep analysis)
  • Performance Agent — GPU memory leaks, N+1 embeddings, FP32 waste, missing @torch.no_grad
  • Fix Agent — Generates unified diffs, security reports, and PR descriptions
  • AMD Migration Advisor — 10-category CUDA → ROCm/HIP compatibility scanner with AMD Compatibility Score
  • AMD Metrics Collector — Real-time MI300X GPU monitoring via rocm-smi (with simulated fallback)
  • Privacy Guard — Blocks outbound connections, generates cryptographically signed ZDR certificates

Model stack: Qwen/Qwen2.5-Coder-32B-Instruct via vLLM on AMD MI300X (192 GB HBM3)


Quick Start

1. Setup vLLM on AMD MI300X

cd codesentry-backend
chmod +x scripts/setup_vllm.sh
./scripts/setup_vllm.sh

This installs vLLM with ROCm backend, starts the model server, and launches the CodeSentry API.

2. Manual startup

# Copy and configure environment
cp .env.example .env

# Install dependencies
pip install -r requirements.txt

# Start vLLM (in background)
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct \
  --port 8080 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 32768 &

# Start CodeSentry API
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
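A 32B model can take several minutes to load, so it can help to wait for the vLLM server before sending scans. A minimal readiness poll, as a sketch: the /v1/models probe is the standard vLLM OpenAI-compatible listing endpoint, but the polling interval and injectable `probe` parameter are illustrative choices, not part of CodeSentry.

```python
import time
import urllib.error
import urllib.request


def wait_for_vllm(base_url: str = "http://localhost:8080",
                  timeout_s: float = 300.0, probe=None) -> bool:
    """Poll the vLLM /v1/models endpoint until it responds or the deadline passes.

    `probe` is injectable so the wait loop can be tested without a live server;
    by default it performs a real HTTP GET against the OpenAI-compatible API.
    """
    if probe is None:
        def probe() -> bool:
            try:
                with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                    return resp.status == 200
            except (urllib.error.URLError, OSError):
                return False

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(2.0)  # back off between probes
    return False
```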

API Reference

GET /api/health

Check service status, GPU memory, and live AMD hardware metrics.

curl http://localhost:8000/api/health

Response:

{
  "status": "ok",
  "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
  "vllm_ready": true,
  "gpu_memory_free_gb": 142.5,
  "vllm_endpoint": "http://localhost:8080",
  "amd_hardware": {
    "gpu_utilization_percent": 85,
    "vram_used_gb": 48.2,
    "vram_total_gb": 192.0,
    "temperature_c": 63,
    "power_draw_w": 612,
    "memory_bandwidth_tbs": 4.7,
    "tokens_per_sec": 1250,
    "timestamp": "2026-05-09T13:30:00Z"
  }
}

POST /api/scan & GET /api/scan/stream/{session_id} — SSE Stream

Analyse a codebase: the POST creates a scan session, and the GET endpoint streams results back as Server-Sent Events.

# Analyse a GitHub repository (creates scan session)
curl -X POST http://localhost:8000/api/scan \
  -H "Content-Type: application/json" \
  -d '{
    "source": "https://github.com/example/vulnerable-ml-app",
    "source_type": "github",
    "session_id": "test-123"
  }'

# Stream the results
curl -N http://localhost:8000/api/scan/stream/test-123

SSE Events:

event: status
data: {"message": "Ingesting code...", "session_id": "test-123"}

event: agent_start
data: {"agent": "security", "status": "scanning"}

event: finding
data: {"severity": "critical", "title": "Insecure Pickle Deserialization", "cwe": "CWE-502", "line_number": 2}

event: amd_metrics
data: {"gpu_utilization_percent": 87, "vram_used_gb": 48.2, "vram_total_gb": 192.0, "temperature_c": 63, ...}

event: agent_start
data: {"agent": "performance", "status": "analyzing"}

event: finding
data: {"agent": "performance", "type": "gpu_memory", "saving_mb": 3584, "suggestion": "Switch from FP32 to BF16"}

event: amd_migration_finding
data: {"id": "AMD_M02", "title": "NVIDIA-Specific CLI Tool", "severity": "critical", "rocm_fix": "..."}

event: amd_migration_summary
data: {"compatibility_score": 72, "compatibility_label": "Mostly Compatible", "total_cuda_patterns_found": 3}

event: fix_ready
data: {"findingId": "SEC-STATIC-1", "title": "Fix pickle.load", "before": "...", "after": "..."}

event: complete
data: {"summary": {...}, "privacy_certificate": {...}, "amd_migration_guide": {...}}
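Events like those above can be consumed with any SSE client. A minimal parser sketch that turns `event:`/`data:` line pairs into tuples — it assumes one JSON `data:` line per event, as in the examples, and is not part of CodeSentry itself:

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) tuples from an SSE line stream.

    Minimal sketch: assumes a single 'data:' line per event and JSON payloads.
    """
    event = "message"  # SSE default event name
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
            yield event, payload
            event = "message"  # reset for the next event block
```

In practice the `lines` iterable would come from the streaming HTTP response of GET /api/scan/stream/{session_id} (e.g. a line iterator over the curl -N output).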

POST /api/analyze/demo

Pre-computed result from the vulnerable fixture. No GPU required. For frontend development and CI.

curl -X POST http://localhost:8000/api/analyze/demo | python -m json.tool

GET /api/session/{session_id}

Retrieve the full analysis result for a completed session (includes amd_migration_guide).

curl http://localhost:8000/api/session/test-123

GET /api/privacy-certificate/{session_id}

Get the Zero Data Retention audit certificate for a session.

curl http://localhost:8000/api/privacy-certificate/test-123

Response:

{
  "session_id": "test-123",
  "timestamp": "2024-01-01T00:00:00+00:00",
  "guarantee": "All inference ran exclusively on localhost AMD MI300X via vLLM. Zero data transmitted to external services.",
  "model_endpoint": "http://localhost:8080",
  "external_calls_blocked": [],
  "data_wiped": true,
  "signature": "a3f8d2..."
}
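A client can recompute the HMAC to check a certificate against a shared ZDR_SIGNING_KEY. The sketch below assumes HMAC-SHA256 over the sorted-key JSON of every field except "signature" — the actual canonicalisation lives in privacy/privacy_guard.py and may differ:

```python
import hashlib
import hmac
import json


def _message(cert: dict) -> bytes:
    # Assumed canonical form: sorted-key compact JSON, signature field excluded.
    payload = {k: v for k, v in cert.items() if k != "signature"}
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_certificate(cert: dict, signing_key: str) -> dict:
    """Attach an HMAC-SHA256 signature (shown here only for the round trip)."""
    sig = hmac.new(signing_key.encode(), _message(cert), hashlib.sha256).hexdigest()
    return {**{k: v for k, v in cert.items() if k != "signature"}, "signature": sig}


def verify_certificate(cert: dict, signing_key: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = hmac.new(signing_key.encode(), _message(cert), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert.get("signature", ""))
```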

Running Tests

# Install test dependencies and run all tests (no GPU required)
chmod +x scripts/run_tests.sh
./scripts/run_tests.sh

# Or directly with pytest
export USE_LLM=false
pytest tests/ -v --asyncio-mode=auto

All 15+ tests use static analysis only — no GPU or vLLM server needed.


Benchmarking

# Requires running API at localhost:8000
chmod +x scripts/benchmark.sh
./scripts/benchmark.sh

# Custom URL and run count
CODESENTRY_URL=http://localhost:8000 BENCHMARK_RUNS=5 ./scripts/benchmark.sh

Outputs benchmark_results.json with TTFF, total latency, and findings statistics.


Project Structure

codesentry-backend/
├── main.py                    # FastAPI app entry point
├── amd_metrics.py             # AMD MI300X live metrics (rocm-smi + simulated fallback)
├── api/
│   ├── routes.py              # All API endpoints
│   └── models.py              # Pydantic request/response schemas
├── agents/
│   ├── orchestrator.py        # Master agent (coordinates all sub-agents, SSE)
│   ├── security_agent.py      # OWASP + OWASP-LLM-Top-10 scanner
│   ├── performance_agent.py   # GPU memory, latency, ROCm optimisation
│   ├── fix_agent.py           # Code fixes, diffs, security report
│   └── amd_migration_advisor.py  # CUDA → ROCm migration (10 pattern categories)
├── tools/
│   ├── code_parser.py         # AST parsing, GitHub/zip/string ingestion
│   ├── github_connector.py    # GitHub shallow clone
│   ├── vulnerability_db.py    # OWASP knowledge base + regex patterns
│   ├── diff_generator.py      # Unified diff generation
│   └── benchmark_tool.py      # GPU memory estimation + timing
├── privacy/
│   └── privacy_guard.py       # ZDR enforcement + HMAC certificates
├── memory/
│   └── session_store.py       # In-memory TTL session store
├── tests/
│   ├── fixtures/
│   │   ├── vulnerable_ml_code.py  # Deliberately vulnerable ML app
│   │   ├── clean_ml_code.py       # Secure baseline
│   │   └── expected_findings.json # Ground truth for assertions
│   ├── test_security_agent.py
│   ├── test_performance_agent.py
│   ├── test_api_endpoints.py
│   └── test_privacy_guard.py
├── scripts/
│   ├── setup_vllm.sh          # One-command AMD MI300X setup
│   ├── run_tests.sh           # Full test suite runner
│   └── benchmark.sh           # Latency + throughput benchmark
├── requirements.txt
├── .env.example
└── README.md

Environment Variables

Variable         Default                          Description
VLLM_BASE_URL    http://localhost:8080/v1         vLLM OpenAI-compatible endpoint
MODEL_NAME       Qwen/Qwen2.5-Coder-32B-Instruct  Model served by vLLM
USE_LLM          true                             Set false for static-only mode (CI)
PORT             8000                             CodeSentry API port
CORS_ORIGINS     *                                Allowed frontend origins
ZDR_SIGNING_KEY  (dev default)                    HMAC key for certificates — change in production
GROQ_API_KEY     —                                Groq cloud API key (alternative to local vLLM)
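A sketch of how these variables might be read, mirroring the defaults above — the actual settings loading in main.py may differ:

```python
import os


def load_config() -> dict:
    """Read CodeSentry settings from the environment, applying the documented defaults."""
    return {
        "vllm_base_url": os.getenv("VLLM_BASE_URL", "http://localhost:8080/v1"),
        "model_name": os.getenv("MODEL_NAME", "Qwen/Qwen2.5-Coder-32B-Instruct"),
        # USE_LLM=false switches to static-only mode for CI
        "use_llm": os.getenv("USE_LLM", "true").lower() == "true",
        "port": int(os.getenv("PORT", "8000")),
        "cors_origins": os.getenv("CORS_ORIGINS", "*").split(","),
    }
```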

Zero Data Retention

Every analysis session runs inside a ZeroDataRetentionGuard that:

  1. Blocks all outbound non-localhost network connections at the socket level
  2. Logs any blocked connection attempts to the audit trail
  3. Wipes all session data from memory after the analysis completes
  4. Generates a cryptographically signed audit certificate

The certificate is available at GET /api/privacy-certificate/{session_id}.
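The socket-level idea behind steps 1 and 2 can be sketched as follows. This is an illustration only — the real guard lives in privacy/privacy_guard.py — but it shows the standard technique of patching socket.socket.connect to refuse non-localhost destinations while recording blocked attempts:

```python
import socket

_LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_allowed(host: str) -> bool:
    """Only localhost destinations pass the guard (step 1)."""
    return host in _LOCAL_HOSTS


class OutboundBlocker:
    """Context manager that patches socket.socket.connect to block
    non-localhost connections and keep an audit trail (step 2)."""

    def __init__(self):
        self.blocked: list[str] = []
        self._orig_connect = None

    def __enter__(self):
        self._orig_connect = socket.socket.connect
        blocker = self

        def guarded_connect(sock, address):
            host = address[0]
            if not is_allowed(host):
                blocker.blocked.append(host)  # audit trail entry
                raise PermissionError(f"ZDR guard blocked outbound connection to {host}")
            return blocker._orig_connect(sock, address)

        socket.socket.connect = guarded_connect
        return self

    def __exit__(self, *exc):
        socket.socket.connect = self._orig_connect  # always restore on exit
        return False
```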


Vulnerability Coverage

Security (OWASP)

Category     ID     Description
OWASP LLM    LLM01  Prompt Injection
OWASP LLM    LLM02  Insecure Output Handling (eval, exec)
OWASP LLM    LLM03  Training Data Poisoning
OWASP LLM    LLM04  Model Denial of Service
OWASP LLM    LLM06  Sensitive Information Disclosure
OWASP LLM    LLM08  Excessive Agency
OWASP LLM    LLM09  Overreliance
OWASP Web    A01    Broken Access Control
OWASP Web    A02    Cryptographic Failures
OWASP Web    A03    SQL Injection
OWASP Web    A04    Insecure Deserialization (CWE-502)
OWASP Web    A05    Security Misconfiguration
OWASP Web    A07    Hardcoded Credentials
OWASP Web    A08    Software & Data Integrity Failures
OWASP Web    A10    Server-Side Request Forgery
ML-Specific  ML01   GPU Memory Leak
ML-Specific  ML02   Missing @torch.no_grad
ML-Specific  ML03   N+1 Embedding Calls
ML-Specific  ML04   FP32 vs BF16 Inefficiency
ML-Specific  ML05   Synchronous Model Loading in Handler
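The static-regex half of the Security Agent works by matching rule patterns against each source line. A sketch in the style of vulnerability_db.py, covering just two of the categories above — the rule tuples and regexes here are illustrative, not CodeSentry's actual rule set:

```python
import re

# (pattern, rule_id, severity) — two illustrative rules:
# A04 Insecure Deserialization (CWE-502) and LLM02 Insecure Output Handling.
RULES = [
    (re.compile(r"\bpickle\.loads?\s*\("), "A04", "critical"),
    (re.compile(r"\b(eval|exec)\s*\("), "LLM02", "high"),
]


def scan_source(source: str) -> list[dict]:
    """Return one finding per matching line, with a 1-based line number."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, rule_id, severity in RULES:
            if pattern.search(line):
                findings.append({"id": rule_id, "severity": severity,
                                 "line_number": lineno})
    return findings
```

Findings of this shape feed the SSE `finding` events shown earlier, with the LLM pass layered on top for deeper, context-aware analysis.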

AMD Migration (CUDA → ROCm)

ID       Severity  Description
AMD_M01  Low       torch.cuda.is_available() — CUDA device check
AMD_M02  Critical  nvidia-smi — NVIDIA-only CLI tool
AMD_M03  High      CUDA_VISIBLE_DEVICES — CUDA env variable
AMD_M04  High      torch.cuda.amp.autocast/GradScaler — Legacy CUDA AMP
AMD_M05  Medium    .half() / torch.float16 — FP16 suboptimal on MI300X
AMD_M06  Medium    torch.backends.cudnn.* — cuDNN configuration
AMD_M07  High      import flash_attn — CUDA-only Flash Attention
AMD_M08  Low       torch.cuda.memory_allocated() — CUDA memory profiling
AMD_M09  Low       device = 'cuda' — Hardcoded device string
AMD_M10  Critical  BitsAndBytesConfig — CUDA-only quantization
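To illustrate how such a pattern scan could roll up into an AMD Compatibility Score, here is a sketch covering two of the rules above (AMD_M02 and AMD_M09). The regexes, penalty weights, and score formula are assumptions made for illustration, not CodeSentry's actual implementation:

```python
import re

# Two illustrative migration rules: (pattern, rule_id, severity).
MIGRATION_RULES = [
    (re.compile(r"\bnvidia-smi\b"), "AMD_M02", "critical"),
    (re.compile(r"device\s*=\s*['\"]cuda['\"]"), "AMD_M09", "low"),
]
# Assumed per-severity penalties subtracted from a perfect score of 100.
PENALTY = {"critical": 25, "high": 15, "medium": 8, "low": 3}


def compatibility_score(source: str) -> tuple[int, list[str]]:
    """Return (score, matched rule ids): 100 minus one penalty per hit, floored at 0."""
    hits = [(rule_id, sev) for pattern, rule_id, sev in MIGRATION_RULES
            if pattern.search(source)]
    penalty = sum(PENALTY[sev] for _, sev in hits)
    return max(0, 100 - penalty), [rule_id for rule_id, _ in hits]
```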