---
language: en
tags:
- code
- security
- vulnerability-detection
- codebert
- classification
license: mit
---

# PolyGuard: Code Vulnerability Scanner

A fine-tuned [CodeBERT](https://huggingface.co/microsoft/codebert-base) model
for detecting security vulnerabilities in source code.

## Supported Languages
Python, JavaScript, SQL, PHP, Java, C, C++, Go, Ruby, Rust

## Performance
- **F1 Score**: 0.6698
- **Training samples**: 16,681
- **Base model**: microsoft/codebert-base
- **Training date**: 2026-04-29
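
Assuming the reported score is the standard binary F1 for the Vulnerable class (label 1), which the card does not state explicitly, the metric can be sketched from scratch as below. The function name `binary_f1` and the toy label lists are illustrative assumptions, not the author's evaluation code or data.

```python
def binary_f1(y_true, y_pred, positive=1):
    """Return F1 for the `positive` class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not taken from the model's evaluation set)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(f"F1: {binary_f1(y_true, y_pred):.4f}")
```

Because F1 ignores true negatives, it is usually more informative than plain accuracy when vulnerable samples are the class of interest.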

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
model_id = "MUHAMMADSAADAMIN/PolyGuard"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode: disables dropout

code = "eval(input())"
# Inputs longer than 512 tokens are truncated
inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

# Index 0 = Clean, index 1 = Vulnerable
probs = torch.softmax(logits, dim=1).squeeze().tolist()
print(f"Clean: {probs[0]*100:.1f}% Vulnerable: {probs[1]*100:.1f}%")
```
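
The snippet above truncates input at 512 tokens, so anything past that point in a long file is never seen by the model. One possible workaround, sketched here as an assumption rather than a documented feature of PolyGuard, is to split the token ids into overlapping windows and score each window separately. The helper `sliding_windows` and its `window` and `stride` defaults are hypothetical.

```python
def sliding_windows(token_ids, window=510, stride=255):
    """Split a list of token ids into overlapping windows.

    window=510 leaves room for the two special tokens (<s>, </s>) that a
    RoBERTa-style tokenizer adds, keeping each chunk within 512 tokens.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(token_ids) <= window:
        return [list(token_ids)]
    chunks = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
        start += stride
    return chunks


# Stand-in ids; in practice they could come from
# tokenizer(code, add_special_tokens=False)["input_ids"]
ids = list(range(1200))
chunks = sliding_windows(ids)
print(len(chunks), [len(c) for c in chunks])
```

Each window could then be scored on its own, with the file flagged if any window is classified as Vulnerable. How window scores are best aggregated is a design choice the card does not address.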

## Labels
- 0 = Clean / Safe
- 1 = Vulnerable
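
To turn the two softmax probabilities into one of the labels above, a small helper can be used. This is a minimal sketch: `ID2LABEL`, `to_label`, and the 0.5 `threshold` default are illustrative assumptions, and the threshold may need tuning for a given false-positive tolerance.

```python
ID2LABEL = {0: "Clean", 1: "Vulnerable"}  # mirrors the label ids listed above


def to_label(probs, threshold=0.5):
    """Map [p_clean, p_vulnerable] to a label name.

    The input is flagged as Vulnerable when p_vulnerable >= threshold
    (threshold is a hypothetical knob, not part of the model).
    """
    if len(probs) != 2:
        raise ValueError("expected two probabilities: [clean, vulnerable]")
    return ID2LABEL[1] if probs[1] >= threshold else ID2LABEL[0]


print(to_label([0.12, 0.88]))                  # Vulnerable
print(to_label([0.70, 0.30]))                  # Clean
print(to_label([0.70, 0.30], threshold=0.25))  # Vulnerable
```

Lowering the threshold trades more false alarms for fewer missed vulnerabilities, which may suit a scanner used as a first-pass filter.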

## Training Data
Fine-tuned on the CrossVul dataset (~9,300 real-world CVE pairs), plus
curated augmentation examples covering common CWEs.