---
license: apache-2.0
---
# PyJavaCPP-Vuln-Fixer

PyJavaCPP-Vuln-Fixer is a security-focused code repair model fine-tuned to automatically fix vulnerabilities in:

- Python
- Java
- C++

The model is built on Qwen2.5-Coder-1.5B-Instruct and fine-tuned using LoRA for automated vulnerability remediation.

It takes vulnerable source code as input and outputs only the fixed, secure version of the code.

## Quick Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "jugalgajjar/PyJavaCPP-Vuln-Fixer"

# Load the tokenizer and the model in half precision.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

SYSTEM_MESSAGE = (
    "You are a code security expert. Given vulnerable source code, "
    "output ONLY the fixed version of the code with the vulnerability repaired. "
    "Do not include explanations, just the corrected code."
)

# Example input: a Flask route that passes user input straight to a shell.
language = "python"
vulnerable_code = """import os
from flask import Flask, request

app = Flask(__name__)

@app.route("/run")
def run():
    cmd = request.args.get("cmd")
    return os.popen(cmd).read()

if __name__ == "__main__":
    app.run()"""

messages = [
    {"role": "system", "content": SYSTEM_MESSAGE},
    {"role": "user", "content": f"Fix the below given vulnerable {language} code:\n{vulnerable_code}"},
]

# Render the chat messages into a single prompt string.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=0.2,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.15,
    )

# Decode only the newly generated tokens, skipping the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

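To reuse the Quick Usage prompt format across the supported languages, the message construction can be pulled into a small helper. The following is a minimal sketch, not part of the model's repository: `build_messages`, `strip_code_fence`, and `SUPPORTED_LANGUAGES` are hypothetical names, the language strings `java` and `c++` are assumptions (only `python` appears in the example), and the fence stripping is a defensive assumption in case the model wraps its answer in Markdown despite the system message.

```python
FENCE = "`" * 3  # Markdown code-fence marker

# Assumption: these are the language strings to put in the prompt.
# Only "python" is shown in the Quick Usage example.
SUPPORTED_LANGUAGES = {"python", "java", "c++"}

SYSTEM_MESSAGE = (
    "You are a code security expert. Given vulnerable source code, "
    "output ONLY the fixed version of the code with the vulnerability repaired. "
    "Do not include explanations, just the corrected code."
)


def build_messages(language: str, vulnerable_code: str) -> list:
    """Build chat messages in the same format as the Quick Usage example."""
    lang = language.strip().lower()
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language!r}")
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {
            "role": "user",
            "content": f"Fix the below given vulnerable {lang} code:\n{vulnerable_code}",
        },
    ]


def strip_code_fence(text: str) -> str:
    """Remove one surrounding Markdown code fence from model output, if present."""
    stripped = text.strip()
    lines = stripped.splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        return "\n".join(lines[1:-1])
    return stripped
```

The result of `build_messages("java", source)` can be passed to `tokenizer.apply_chat_template` exactly as `messages` is in the example, and `strip_code_fence` can be applied to the decoded output before writing it anywhere.
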
## Intended Use

- Automated vulnerability remediation
- Secure code refactoring
- Research in AI-assisted program repair
- Secure CI/CD integration
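
For the CI/CD use case, one cautious pattern is to surface the model's proposal as a diff for human review rather than overwriting files directly. The sketch below uses only the Python standard library. `propose_fix` is a hypothetical helper, and `fix_fn` stands in for any callable that wraps the generation code from Quick Usage; the stand-in fixer in the usage section is a plain string replacement used purely for illustration.

```python
import difflib
from typing import Callable


def propose_fix(path: str, source: str, fix_fn: Callable[[str], str]) -> str:
    """Return a unified diff between the original source and the proposed fix.

    An empty string means the proposal is identical to the original.
    """
    fixed = fix_fn(source)
    diff = difflib.unified_diff(
        source.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


if __name__ == "__main__":
    # Stand-in for the model call, for illustration only.
    def fake_fix(code: str) -> str:
        return code.replace("yaml.load(", "yaml.safe_load(")

    original = "import yaml\ndata = yaml.load(text)\n"
    print(propose_fix("app.py", original, fake_fix))
```

Emitting a diff keeps a reviewer in the loop: a pipeline step can fail or open a review when the diff is non-empty, without applying unreviewed model output to the codebase.
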