A multi-label code vulnerability detection model that predicts 31 labels (30 CWE classes plus "safe"), with each CWE mapped to an OWASP Top 10 2021 category. Fine-tuned from CodeBERTa-small-v1 on 175K+ labeled code samples.
| Property | Value |
|---|---|
| Task | Multi-label classification (BCEWithLogitsLoss with class weights) |
| Labels | 31 (30 CWE categories + "safe") |
| Max Sequence Length | 512 tokens |
| Recommended Threshold | 0.5 (balanced precision/recall) or 0.3 (high recall, security-first) |
| Supported Languages | Python, JavaScript, Java, C, C++, PHP, Go |
The model was trained on a diverse multi-language dataset. Performance is strongest on C/C++ (largest training subset from BigVul) and Python/JavaScript (from the multi-language datasets).
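As a minimal sketch of how the recommended thresholds apply to a multi-label head: each label gets its own sigmoid probability, and every label at or above the threshold is reported. The label names and logit values below are invented for illustration; the real model emits 31 logits in the order defined by its own configuration.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-x))

def decode_labels(logits, label_names, threshold=0.5):
    """Return (label, probability) pairs at or above the threshold, highest first."""
    if len(logits) != len(label_names):
        raise ValueError("expected one logit per label")
    scored = [(name, sigmoid(z)) for name, z in zip(label_names, logits)]
    flagged = [(name, p) for name, p in scored if p >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical 4-label slice; these values are illustrative, not model output.
LABELS = ["safe", "CWE-79", "CWE-89", "CWE-119"]
LOGITS = [-1.2, -0.6, 0.4, -3.0]

balanced = decode_labels(LOGITS, LABELS, threshold=0.5)
high_recall = decode_labels(LOGITS, LABELS, threshold=0.3)
print([name for name, _ in balanced])     # ['CWE-89']
print([name for name, _ in high_recall])  # ['CWE-89', 'CWE-79']
```

Lowering the threshold from 0.5 to 0.3 admits the borderline CWE-79 score (about 0.35), which is the recall-for-precision trade the two recommended settings represent.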
Evaluation Results (Test Set, 5,000 samples)
Threshold Comparison
| Threshold | Macro F1 | Micro F1 | Weighted F1 | Macro Precision | Macro Recall |
|---|---|---|---|---|---|
| 0.2 | 0.066 | 0.301 | 0.859 | 0.048 | 0.562 |
| 0.3 | 0.081 | 0.458 | 0.865 | 0.057 | 0.502 |
| 0.4 | 0.101 | 0.626 | 0.870 | 0.070 | 0.439 |
| 0.5 | 0.125 | 0.739 | 0.870 | 0.088 | 0.366 |
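The wide gap between weighted F1 (about 0.87) and macro F1 (about 0.1) comes from how the averages are formed: macro F1 gives every class equal weight, while weighted F1 scales each class by its support, so the "safe" class (4,496 of the 5,000 test samples) dominates. The sketch below computes all three averages from per-class true-positive, false-positive, and false-negative counts. The counts are invented to mimic one large, well-predicted class and two rare, poorly predicted ones; they are not taken from this model's evaluation.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def averaged_f1(counts):
    """counts maps label -> (tp, fp, fn); returns (macro, micro, weighted) F1."""
    per_class = {label: f1(*c) for label, c in counts.items()}
    support = {label: tp + fn for label, (tp, fp, fn) in counts.items()}

    # Macro: unweighted mean of per-class F1.
    macro = sum(per_class.values()) / len(per_class)

    # Micro: pool the counts over all classes, then compute a single F1.
    tp_sum = sum(c[0] for c in counts.values())
    fp_sum = sum(c[1] for c in counts.values())
    fn_sum = sum(c[2] for c in counts.values())
    micro = f1(tp_sum, fp_sum, fn_sum)

    # Weighted: per-class F1 weighted by the number of true instances.
    total = sum(support.values())
    weighted = sum(per_class[k] * support[k] for k in counts) / total
    return macro, micro, weighted

# Hypothetical counts: one dominant class and two rare ones.
toy_counts = {
    "safe": (900, 50, 20),
    "rare_a": (8, 200, 2),
    "rare_b": (0, 30, 5),
}
macro, micro, weighted = averaged_f1(toy_counts)
print(f"macro={macro:.3f} micro={micro:.3f} weighted={weighted:.3f}")
```

With counts shaped like these, macro F1 lands far below micro and weighted F1, the same ordering the threshold table shows.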
Per-Class Performance (threshold=0.3)
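OWASP A01:2021 - Broken Access Control

| CWE | Name | Support | Precision | Recall | F1 |
|---|---|---|---|---|---|
| CWE-22 | Path Traversal | 2 | 0.000 | 0.000 | 0.000 |
| CWE-200 | Information Exposure | 30 | 0.063 | 0.800 | 0.117 |
| CWE-264 | Permissions/Privileges | 23 | 0.025 | 0.696 | 0.049 |
| CWE-269 | Improper Privilege Mgmt | 1 | 0.000 | 0.000 | 0.000 |
| CWE-276 | Incorrect Permissions | 0 | n/a | n/a | n/a |
| CWE-284 | Access Control | 5 | 0.000 | 0.000 | 0.000 |
| CWE-352 | CSRF | 1 | 0.000 | 0.000 | 0.000 |
| CWE-601 | Open Redirect | 0 | n/a | n/a | n/a |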
CWE-284
Access Control
5
0.000
0.000
0.000
CWE-352
CSRF
1
0.000
0.000
0.000
CWE-601
Open Redirect
0
β
β
β
OWASP A02:2021 - Cryptographic Failures

| CWE | Name | Support | Precision | Recall | F1 |
|---|---|---|---|---|---|
| CWE-310 | Cryptographic Issues | 5 | 0.000 | 0.000 | 0.000 |
| CWE-327 | Broken Crypto Algorithm | 1 | 0.000 | 0.000 | 0.000 |
| CWE-330 | Insufficient Randomness | 1 | 0.000 | 0.000 | 0.000 |
OWASP A03:2021 - Injection

| CWE | Name | Support | Precision | Recall | F1 |
|---|---|---|---|---|---|
| CWE-20 | Input Validation | 69 | 0.023 | 0.957 | 0.046 |
| CWE-78 | Command Injection | 1 | 0.011 | 1.000 | 0.021 |
| CWE-79 | XSS | 16 | 0.084 | 0.750 | 0.151 |
| CWE-89 | SQL Injection | 15 | 0.096 | 1.000 | 0.174 |
| CWE-94 | Code Injection | 27 | 0.123 | 1.000 | 0.220 |
| CWE-119 | Buffer Overflow | 118 | 0.088 | 0.898 | 0.160 |
| CWE-125 | Out-of-bounds Read | 35 | 0.048 | 0.829 | 0.091 |
| CWE-190 | Integer Overflow | 14 | 0.033 | 1.000 | 0.064 |
| CWE-401 | Memory Leak | 2 | 0.022 | 1.000 | 0.044 |
| CWE-416 | Use After Free | 20 | 0.048 | 0.400 | 0.086 |
| CWE-476 | NULL Pointer Deref | 30 | 0.032 | 0.867 | 0.061 |
| CWE-787 | Out-of-bounds Write | 46 | 0.052 | 0.891 | 0.099 |
OWASP A04:2021 - Insecure Design

| CWE | Name | Support | Precision | Recall | F1 |
|---|---|---|---|---|---|
| CWE-362 | Race Condition | 11 | 0.035 | 0.636 | 0.065 |
| CWE-399 | Resource Management | 21 | 0.008 | 0.857 | 0.015 |
| CWE-434 | File Upload | 0 | n/a | n/a | n/a |
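OWASP A07-A10

| CWE | Name | Support | Precision | Recall | F1 |
|---|---|---|---|---|---|
| CWE-287 | Authentication | 0 | n/a | n/a | n/a |
| CWE-798 | Hardcoded Credentials | 0 | n/a | n/a | n/a |
| CWE-502 | Deserialization | 10 | 0.056 | 1.000 | 0.106 |
| CWE-918 | SSRF | 0 | n/a | n/a | n/a |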
Key Metric: Safe Code Detection
| Class | Support | Precision | Recall | F1 |
|---|---|---|---|---|
| safe | 4,496 | 0.927 | 0.975 | 0.950 |
Model Strengths
High recall on many vulnerability classes at threshold 0.3 (0.75 to 1.00 for SQL injection, buffer overflow, XSS, and code injection, among others)
High sensitivity: at threshold 0.3, macro recall is 0.50, and 11 of the 15 vulnerability classes with at least 10 test samples reach recall of 0.80 or higher
Model Limitations
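Low precision on rare classes: many false positives, especially on CWEs with few training examples
Raising the threshold from 0.3 to 0.5 improves precision (macro precision 0.057 to 0.088, macro F1 0.081 to 0.125) at the cost of recall (macro recall 0.502 to 0.366)
Classes with 0 test support (CWE-276, CWE-601, CWE-434, CWE-287, CWE-798, CWE-918) cannot be evaluated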
Design choice: For security applications, we prioritize recall (catching real vulnerabilities) over precision (reducing false positives). Missing a real vulnerability (false negative) is worse than flagging safe code (false positive).
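To make this trade-off concrete, a row of the per-class table can be converted into approximate alert counts. The sketch below back-solves true positives, false positives, and false negatives from the reported support, precision, and recall for CWE-20 at threshold 0.3. Because the published figures are rounded to three decimals, the results are estimates, and the helper name is ours rather than part of any evaluation script.

```python
def implied_counts(support: int, precision: float, recall: float):
    """Estimate (tp, fp, fn) from rounded support/precision/recall figures."""
    tp = round(recall * support)          # recall = tp / support
    fn = support - tp
    # precision = tp / (tp + fp)  =>  fp = tp * (1 - precision) / precision
    # With zero precision the false-positive count cannot be recovered.
    fp = round(tp * (1 - precision) / precision) if precision > 0 else None
    return tp, fp, fn

# CWE-20 (Input Validation) row: support 69, precision 0.023, recall 0.957.
tp, fp, fn = implied_counts(support=69, precision=0.023, recall=0.957)
print(f"CWE-20: ~{tp} caught, ~{fn} missed, ~{fp} false alarms")
```

Under these estimates, catching 66 of the 69 true CWE-20 cases comes with roughly 2,800 false alarms across the 5,000 test samples, which is the cost side of prioritizing recall.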
| Property | Value |
|---|---|
| Loss | BCEWithLogitsLoss (class-weighted, pos_weight clipped to 30x) |
| Training Subset | 20K balanced samples |
| Optimizer | AdamW (fused) |
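The class-weighted loss with a 30x cap can be sketched as follows. This assumes the common convention of setting each label's positive weight to negatives divided by positives, which the card does not state, and the per-label counts are invented. In PyTorch the resulting weights would be passed as the `pos_weight` tensor of `torch.nn.BCEWithLogitsLoss`; the pure-Python loss here mirrors that formula for a single example so the effect of the cap is visible without any dependencies.

```python
import math

def clipped_pos_weights(positives_per_label, n_samples, cap=30.0):
    """Assumed scheme: weight = negatives / positives, clipped to `cap`."""
    weights = []
    for n_pos in positives_per_label:
        n_neg = n_samples - n_pos
        raw = n_neg / n_pos if n_pos > 0 else cap  # no positives: use the cap
        weights.append(min(raw, cap))
    return weights

def weighted_bce_with_logits(logits, targets, pos_weight):
    """Mean over labels of -[w * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    total = 0.0
    for x, y, w in zip(logits, targets, pos_weight):
        # Adequate for small logits; production code uses a more stable form.
        log_sig = -math.log1p(math.exp(-x))           # log(sigmoid(x))
        log_one_minus_sig = -math.log1p(math.exp(x))  # log(1 - sigmoid(x))
        total += -(w * y * log_sig + (1 - y) * log_one_minus_sig)
    return total / len(logits)

# Hypothetical positive counts for three labels in a 20,000-sample subset.
weights = clipped_pos_weights([18000, 400, 25], n_samples=20000)
print(weights)

# An uncertain prediction (logit 0) on a true positive of the rarest label.
capped = weighted_bce_with_logits([0.0], [1], [weights[2]])
unweighted = weighted_bce_with_logits([0.0], [1], [1.0])
print(capped / unweighted)
```

Without the cap, the rarest label in this toy setup would receive a weight near 800; clipping keeps a handful of examples from dominating the gradient.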
Limitations
Class imbalance: Many rare CWE types have very few training examples, leading to high false positive rates
Sequence length: Limited to 512 tokens; long functions may be truncated
Language bias: Strongest on C/C++ due to BigVul's dominance. Go and PHP performance may be lower
Single-function analysis: Analyzes individual functions, not cross-function or cross-file vulnerabilities
Not a replacement: Should complement manual review and established SAST tools (Semgrep, CodeQL, etc.)
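One possible workaround for the 512-token limit, offered as a sketch rather than something this model card prescribes, is to split a long function into overlapping token windows, score each window, and keep the per-label maximum. The window size of 510 (leaving room for two special tokens) and the stride of 384 are assumptions, and the scores in the usage lines are placeholders for real model output.

```python
def sliding_windows(token_ids, window=510, stride=384):
    """Split token ids into overlapping chunks that each fit the model."""
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    if len(token_ids) <= window:
        return [list(token_ids)]
    chunks, start = [], 0
    while True:
        chunks.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):  # this chunk reached the end
            break
        start += stride
    return chunks

def max_pool_scores(chunk_scores):
    """Per-label maximum across chunks: a label counts if any window flags it."""
    return [max(label_scores) for label_scores in zip(*chunk_scores)]

# A hypothetical 1,200-token function represented by dummy ids.
tokens = list(range(1200))
chunks = sliding_windows(tokens)
print(len(chunks), [len(c) for c in chunks])

# Placeholder per-window probabilities for two labels.
pooled = max_pool_scores([[0.1, 0.7], [0.4, 0.2]])
print(pooled)
```

Max-pooling favors recall, in line with the security-first threshold, but it also compounds false positives, since each extra window is another chance for a spurious flag.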
Interactive Demo
Try the model in our Code Security Analyzer Space: paste any code and get a full security report with OWASP mapping, severity scores, attack chain analysis, and suggested fixes.
Citation
@misc{graphcodebert-vuln-classifier,
title={GraphCodeBERT Vulnerability Classifier: Multi-label CWE Detection Mapped to OWASP Top 10},
author={ayshajavd},
year={2025},
url={https://huggingface.co/ayshajavd/graphcodebert-vuln-classifier}
}