code-security-analyzer

ayshajavd

Update README with v2 features and metrics

4aeba64 verified about 2 months ago

	---
	title: Code Security Risk Analyzer
	emoji: 🔒
	colorFrom: red
	colorTo: purple
	sdk: gradio
	sdk_version: 6.13.0
	app_file: app.py
	pinned: true
	license: apache-2.0
	tags:
	- security
	- vulnerability-detection
	- owasp
	- cwe
	- code-analysis
	- static-analysis
	short_description: AI-powered code vulnerability detection with OWASP mapping
	---

	# 🔒 Code Security Risk Analyzer v2

	AI-powered multi-label vulnerability detection across 30 CWE categories, each mapped to the OWASP Top 10 (2021). Supports Python, JavaScript, Java, C, C++, PHP, and Go.

	## v2 Improvements
	- Per-class threshold optimization: each CWE has its own tuned detection threshold instead of a single global threshold of 0.3
	- Temperature-calibrated probabilities: confidence scores are meaningful (about 80% of findings reported at 0.8 should be true positives)
	- CWE-aware fix generation: the fixer model is told which vulnerability to fix
	- 3.7x larger fixer model: CodeT5+ 220M (v1 used flan-t5-small, 60M)
	- Asymmetric Loss (ASL) training: handles the class imbalance of a dataset that is about 90% safe samples
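
The calibration and threshold steps can be sketched as follows. This is a minimal illustration, not the Space's actual code: the labels, logits, temperature, and thresholds are made-up values, and using a single scalar temperature for all classes is an assumption.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def calibrated_probs(logits, temperature):
    """Temperature scaling: divide each logit by T before the sigmoid."""
    return [sigmoid(z / temperature) for z in logits]


def predict_labels(probs, thresholds, labels):
    """Keep each label whose probability reaches its own threshold."""
    return [name for name, p, t in zip(labels, probs, thresholds) if p >= t]


# Hypothetical values for illustration only.
labels = ["CWE-79", "CWE-89", "CWE-22"]
logits = [2.0, -0.5, 0.4]
temperature = 1.5                # assumed: one scalar fitted on validation data
thresholds = [0.50, 0.35, 0.70]  # assumed: one tuned threshold per CWE

probs = calibrated_probs(logits, temperature)
flagged = predict_labels(probs, thresholds, labels)
print(flagged)
```

Because the task is multi-label, each class gets an independent sigmoid rather than a shared softmax, so any number of CWEs can be flagged for the same snippet.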

	## Model Performance

	| Model | Metric | Score |
	|-------|--------|-------|
	| Classifier (GraphCodeBERT 125M) | Macro F1 | 0.476 (+311% vs baseline) |
	| | Weighted F1 | 0.945 |
	| | Safe Detection F1 | 0.982 |
	| Fixer (CodeT5+ 220M) | BLEU | 81.0 |
	| | ROUGE-L | 0.788 |
	| | Eval Loss | 0.175 (3.1x better than v1) |
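
The gap between macro and weighted F1 in the table is typical of imbalanced data. Macro F1 averages every class equally, while weighted F1 scales each class by its support, so a dominant, well-detected safe class pulls the weighted score up. The sketch below shows the effect with invented per-class counts that are not taken from the actual evaluation.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when the class never appears."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical confusion counts per class: (true pos, false pos, false neg).
counts = {
    "safe":   (900, 10, 10),
    "CWE-89": (30, 20, 20),
    "CWE-79": (10, 30, 30),
}

per_class = {name: f1(*c) for name, c in counts.items()}
support = {name: tp + fn for name, (tp, fp, fn) in counts.items()}
total = sum(support.values())

# Macro: plain mean over classes. Weighted: mean weighted by class support.
macro_f1 = sum(per_class.values()) / len(per_class)
weighted_f1 = sum(per_class[n] * support[n] for n in counts) / total

print(f"macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```

Macro F1 is therefore the stricter number for judging how well rare CWE classes are detected.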

	## Features
	- Detection Model: [GraphCodeBERT classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier) — 125M params, two-phase training with ASL loss
	- Fix Generator: [CodeT5+ 220M](https://huggingface.co/ayshajavd/codet5p-vuln-fixer) — CWE-aware input format, beam search generation
	- Structured Reports: CWE ID, OWASP category, severity score, exploit likelihood, plain English explanation
	- Attack Chain Analysis: shows how multiple detected vulnerabilities could be chained together
	- REST API: JSON endpoint for integration into CI/CD pipelines
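
For readers unfamiliar with the ASL loss mentioned above, here is a minimal pure-Python sketch of the asymmetric loss formula for multi-label classification. The hyperparameters (`gamma_pos`, `gamma_neg`, `margin`) are commonly cited defaults and are assumptions here; the values used to train this classifier are not stated in this README.

```python
import math


def asymmetric_loss(probs, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-label asymmetric losses for one sample.

    Positives use a focal term with gamma_pos. Negatives first have their
    probability shifted down by `margin` (clipped at 0), then use a stronger
    focal term with gamma_neg, so easy negatives contribute almost nothing.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        if y == 1:
            total += -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
        else:
            p_m = max(p - margin, 0.0)
            total += -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))
    return total


# An easy negative is heavily down-weighted; a missed positive is not.
easy_negative = asymmetric_loss([0.1], [0])
missed_positive = asymmetric_loss([0.1], [1])
print(easy_negative, missed_positive)
```

With mostly safe samples, nearly every label in every batch is a negative, so down-weighting easy negatives keeps them from drowning out the gradient from the rare positive labels.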

	## API Usage

	```python
	from gradio_client import Client

	client = Client("ayshajavd/code-security-analyzer")

	# Get markdown report
	report = client.predict(code="your code here", api_name="/analyze")

	# Get structured JSON report
	json_report = client.predict(code="your code here", api_name="/get_json_report")
	```
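
In a CI/CD pipeline, the JSON report could gate a build. The sketch below is hypothetical: the report schema is not documented here, so the `vulnerabilities`, `cwe`, and `severity` keys and the 7.0 cutoff are assumptions that should be checked against a real response from `/get_json_report`.

```python
import json


def blocking_findings(report, min_severity=7.0):
    """Return findings at or above min_severity.

    Accepts either a JSON string or an already-parsed dict, since the
    client's return type is not specified here.
    """
    data = json.loads(report) if isinstance(report, str) else report
    findings = data.get("vulnerabilities", [])  # hypothetical key
    return [f for f in findings
            if float(f.get("severity", 0.0)) >= min_severity]  # hypothetical key


# Made-up report used only to exercise the function.
sample_report = json.dumps({
    "vulnerabilities": [
        {"cwe": "CWE-89", "severity": 9.1},
        {"cwe": "CWE-209", "severity": 3.4},
    ]
})

blocking = blocking_findings(sample_report)
print([f["cwe"] for f in blocking])
```

A CI job could call `sys.exit(1)` when `blocking` is non-empty so that high-severity findings fail the build.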

	## Models & Dataset
	- [graphcodebert-vuln-classifier](https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier) — Multi-label CWE detection
	- [codet5p-vuln-fixer](https://huggingface.co/ayshajavd/codet5p-vuln-fixer) — Vulnerability fix generation
	- [code-security-vulnerability-dataset](https://huggingface.co/datasets/ayshajavd/code-security-vulnerability-dataset) — 175K labeled samples

	## Training Notebooks
	All training code: [vuln-classifier-training-notebooks](https://huggingface.co/ayshajavd/vuln-classifier-training-notebooks)