---
license: apache-2.0
---
# 🔐 PyJavaCPP-Vuln-Fixer

PyJavaCPP-Vuln-Fixer is a security-focused code repair model fine-tuned to automatically fix vulnerabilities in:

- Python
- Java
- C++

The model is built on Qwen2.5-Coder-1.5B-Instruct and fine-tuned with LoRA for automated vulnerability remediation.

It takes vulnerable source code as input and outputs only the fixed, secure version of the code.

## 🚀 Quick Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "jugalgajjar/PyJavaCPP-Vuln-Fixer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_MESSAGE = (
    "You are a code security expert. Given vulnerable source code, "
    "output ONLY the fixed version of the code with the vulnerability repaired. "
    "Do not include explanations, just the corrected code."
)

# Example input: a Flask route with a command-injection vulnerability
language = "python"
vulnerable_code = """import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/run")
def run():
    cmd = request.args.get("cmd")
    return os.popen(cmd).read()

if __name__ == "__main__":
    app.run()"""

messages = [
    {"role": "system", "content": SYSTEM_MESSAGE},
    {"role": "user", "content": f"Fix the below given vulnerable {language} code:\n{vulnerable_code}"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.2,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.15,
    )

# Decode only the newly generated tokens (the proposed fix)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
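
Instruction-tuned coder models sometimes wrap their answer in a markdown fence even when prompted for code only. A small post-processing sketch (the `extract_code` helper below is illustrative, not part of the model's API) that tolerates both fenced and bare output:

```python
import re

def extract_code(response: str) -> str:
    """Return the code from a model response, stripping a surrounding
    markdown fence (with or without a language tag) if one is present."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", response, re.DOTALL)
    return match.group(1).rstrip() if match else response.strip()
```

Running the decoded output through a helper like this keeps downstream tooling (diffing, linting, test runs) working regardless of how the model formats its reply.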

## 🎯 Intended Use

- Automated vulnerability remediation
- Secure code refactoring
- Research in AI-assisted program repair
- Secure CI/CD integration