File size: 2,050 Bytes
5e654f0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
---
license: apache-2.0
---
# 🔐 PyJavaCPP-Vuln-Fixer

PyJavaCPP-Vuln-Fixer is a security-focused code repair model fine-tuned to automatically fix vulnerabilities in:

- Python  
- Java  
- C++  

The model is built on Qwen2.5-Coder-1.5B-Instruct and fine-tuned using LoRA for automated vulnerability remediation.

It takes vulnerable source code as input and outputs only the fixed, secure version of the code.

## 🚀 Quick Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "jugalgajjar/PyJavaCPP-Vuln-Fixer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_MESSAGE = (
    "You are a code security expert. Given vulnerable source code, "
    "output ONLY the fixed version of the code with the vulnerability repaired. "
    "Do not include explanations, just the corrected code."
)

language = "python"
vulnerable_code = """import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/run")
def run():
    cmd = request.args.get("cmd")
    return os.popen(cmd).read()

if __name__ == "__main__":
    app.run()"""

messages = [
    {"role": "system", "content": SYSTEM_MESSAGE},
    {"role": "user", "content": f"Fix the below given vulnerable {language} code:\n{vulnerable_code}"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.2,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.15,
    )

new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

## 🎯 Intended Use

- Automated vulnerability remediation
- Secure code refactoring
- Research in AI-assisted program repair
- Secure CI/CD integration