---
license: apache-2.0
---
# 🔐 PyJavaCPP-Vuln-Fixer

PyJavaCPP-Vuln-Fixer is a security-focused code repair model fine-tuned to automatically fix vulnerabilities in:

- Python
- Java
- C++

The model is built on Qwen2.5-Coder-1.5B-Instruct and fine-tuned with LoRA for automated vulnerability remediation.

It takes vulnerable source code as input and outputs only the fixed, secure version of the code.

## 🚀 Quick Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "jugalgajjar/PyJavaCPP-Vuln-Fixer"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_MESSAGE = (
    "You are a code security expert. Given vulnerable source code, "
    "output ONLY the fixed version of the code with the vulnerability repaired. "
    "Do not include explanations, just the corrected code."
)

# Example input: a Flask route with a command-injection vulnerability
language = "python"
vulnerable_code = """import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/run")
def run():
    cmd = request.args.get("cmd")
    return os.popen(cmd).read()

if __name__ == "__main__":
    app.run()"""

messages = [
    {"role": "system", "content": SYSTEM_MESSAGE},
    {"role": "user", "content": f"Fix the below given vulnerable {language} code:\n{vulnerable_code}"},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.2,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.15,
    )

# Decode only the newly generated tokens (the proposed fix)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
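
Instruction-tuned coder models sometimes wrap their answer in a markdown fence even when prompted for code only. A small post-processing sketch (the `extract_code` helper below is illustrative, not part of the model's API) that tolerates both fenced and bare output:

```python
import re

def extract_code(response: str) -> str:
    """Return the code from a model response, stripping a surrounding
    markdown fence (with or without a language tag) if one is present."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", response, re.DOTALL)
    return match.group(1).rstrip() if match else response.strip()
```

Running the decoded output through a helper like this keeps downstream tooling (diffing, linting, test runs) working regardless of how the model formats its reply.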

## 🎯 Intended Use

- Automated vulnerability remediation
- Secure code refactoring
- Research in AI-assisted program repair
- Secure CI/CD integration