AetherPrior committed
Commit afbe1d6 · verified · 1 parent: 3d98807

Add dataset card with CWE labeling details

Files changed (1)
  1. README.md +26 -0
README.md ADDED
@@ -0,0 +1,26 @@
+ # CWE-Code_Vulnerability_Security_DPO
+
+ This dataset adds a `cwe_ids` field to each row of
+ `CyberNative/Code_Vulnerability_Security_DPO`, obtained by querying an LLM
+ with the row's vulnerability description and its rejected (vulnerable) code sample.
+
+ ## How the CWE IDs were obtained
+
+ - Model: `gpt-5-mini` via the OpenAI Responses API
+ - Reasoning effort: low
+ - Tools: `web_search`, restricted to the `cwe.mitre.org` domain
+ - Prompt format:
+
+   ```
+   Given the following description of a CWE and a code segment that replicates it:
+   Description: {desc}
+
+   Code: {vuln_code}
+
+   What's the most likely CWE ID associated with this vulnerability? Answer in the following format:
+
+   ## Answer:
+   CWE-#: CWE Name
+   ```
+
+ - Parsing: lines of the model's answer that start with `CWE-` are extracted, and only the numeric ID is kept.
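The prompt template in the card uses `{desc}` and `{vuln_code}`, which read like Python `str.format` placeholders. A minimal sketch of filling it for one row follows. It assumes the description comes from a `vulnerability` column and the code from the `rejected` column of the source dataset; that column mapping is a guess, not something the card states. `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names.

```python
# Prompt text copied from the dataset card; {desc} and {vuln_code} are
# str.format placeholders. Whether the original ends with a trailing
# newline is unknown, so none is added here.
PROMPT_TEMPLATE = (
    "Given the following description of a CWE and a code segment that replicates it:\n"
    "Description: {desc}\n"
    "\n"
    "Code: {vuln_code}\n"
    "\n"
    "What's the most likely CWE ID associated with this vulnerability? "
    "Answer in the following format:\n"
    "\n"
    "## Answer:\n"
    "CWE-#: CWE Name"
)


def build_prompt(row: dict) -> str:
    """Fill the prompt template from one dataset row.

    Assumption: the vulnerability description is in row["vulnerability"]
    and the vulnerable code is in row["rejected"] (hypothetical mapping).
    """
    # str.format does not re-parse braces inside the substituted values,
    # so code containing '{' or '}' is inserted verbatim.
    return PROMPT_TEMPLATE.format(
        desc=row["vulnerability"],
        vuln_code=row["rejected"],
    )
```

Using `str.format` rather than manual concatenation keeps the template identical to the one documented in the card, so the two are easy to compare.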
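The model settings listed in the card (model, reasoning effort, and a `web_search` tool limited to `cwe.mitre.org`) map onto a single Responses API request. The sketch below builds keyword arguments for `client.responses.create` and returns `response.output_text`. The `filters` / `allowed_domains` shape used for the domain restriction is an assumption about the web search tool's syntax and should be checked against OpenAI's current documentation; `responses_request` and `ask_model` are hypothetical helper names.

```python
from typing import Any


def responses_request(prompt: str) -> dict[str, Any]:
    """Keyword arguments for client.responses.create(...) mirroring the card's settings."""
    return {
        "model": "gpt-5-mini",
        "reasoning": {"effort": "low"},
        "tools": [
            {
                "type": "web_search",
                # Assumed domain-filter syntax; verify against the OpenAI docs.
                "filters": {"allowed_domains": ["cwe.mitre.org"]},
            }
        ],
        "input": prompt,
    }


def ask_model(client: Any, prompt: str) -> str:
    """Send one prompt and return the model's text answer.

    `client` is assumed to be an `openai.OpenAI()` instance.
    """
    response = client.responses.create(**responses_request(prompt))
    return response.output_text
```

Keeping the request as a plain dict makes the settings easy to inspect or log alongside each labeled row without making a network call.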
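The parsing rule in the card can be sketched as a small function. It assumes leading whitespace is stripped before checking for `CWE-` and that the retained numeric ID is stored as an `int`; the card specifies neither detail. `parse_cwe_ids` is a hypothetical name.

```python
import re

# Matches "CWE-<digits>" at the start of a line (after stripping whitespace).
_CWE_LINE = re.compile(r"^CWE-(\d+)")


def parse_cwe_ids(answer: str) -> list[int]:
    """Return the numeric IDs from answer lines that start with 'CWE-'."""
    ids: list[int] = []
    for line in answer.splitlines():
        match = _CWE_LINE.match(line.strip())
        # Lines such as "## Answer:" or prose mentioning "CWE-79" mid-line
        # do not match; an echoed "CWE-#: CWE Name" has no digits and is skipped.
        if match:
            ids.append(int(match.group(1)))
    return ids
```

With the Hugging Face `datasets` library, a per-row function returning `{"cwe_ids": parse_cwe_ids(answer)}` could be passed to `Dataset.map`, which adds the returned key as a new column.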