DebugGPT LoRA Adapter for Phi-3 Mini

A lightweight LoRA adapter fine-tuned on synthetic Python bug-fixing tasks using the MBPP dataset. This model enhances the ability of Phi-3 Mini to detect and correct common Python syntax errors while preserving general language capabilities.


Model Description

  • Base Model: microsoft/phi-3-mini-4k-instruct
  • Fine-Tuning Method: QLoRA (Low-Rank Adaptation with 4-bit quantization)
  • Task: Automated Python bug fixing

The model takes buggy Python code as input and generates the corrected version.


Intended Use

This model is designed for:

  • Python debugging assistance
  • Educational coding tools
  • AI-assisted code correction
  • Research experiments in code repair

Out-of-Scope Use

  • Production-critical systems
  • Security-sensitive applications
  • Complex multi-file debugging

Dataset

We use the MBPP (Mostly Basic Python Problems) dataset. Since MBPP contains only correct reference solutions, we generate a bug-fixing dataset by injecting synthetic bugs into them.
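A minimal sketch of loading the source data with the Hugging Face datasets library (the split sizes listed below match the standard MBPP splits):

from datasets import load_dataset

# MBPP provides train/validation/test splits of short Python problems.
mbpp = load_dataset("mbpp")
print(mbpp["train"][0]["code"])  # a correct reference solution to corrupt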

Data Format

Each example follows an instruction-tuning format:

{
  "instruction": "Fix the bug in the following Python code",
  "input": "<buggy code>",
  "output": "<correct code>"
}
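At training time each record is flattened into a single prompt string. The exact template used for this adapter is not documented, so the Alpaca-style layout below is an assumption:

def format_example(example):
    # Collapse one instruction-tuning record into a single training string
    # (Alpaca-style section headers; the real template may differ).
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Input:\n{example['input']}\n\n"
        f"### Response:\n{example['output']}"
    )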

Bug Injection Strategy

We introduce controlled bugs such as the following (a minimal injector sketch appears after the list):

  • Operator replacement (+ → -)
  • Comparison changes (> → <)
  • Removal of return statements
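A minimal sketch of such an injector; the mutation list and fallback logic are illustrative, not the exact pipeline:

import random

# Each mutation maps a source pattern to a buggy replacement.
MUTATIONS = [
    (" + ", " - "),   # operator replacement
    (" > ", " < "),   # comparison change
]

def inject_bug(code: str) -> str:
    """Apply one random mutation; fall back to dropping a return keyword."""
    applicable = [(src, dst) for src, dst in MUTATIONS if src in code]
    if applicable:
        src, dst = random.choice(applicable)
        return code.replace(src, dst, 1)
    # If no operator matched, strip the first return statement's keyword.
    lines = code.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("return "):
            lines[i] = line.replace("return ", "", 1)
            break
    return "\n".join(lines)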

Dataset Size

Split        Samples
Train        ~374
Validation   ~90
Test         ~500

Training Procedure

Method: QLoRA

To enable efficient training on limited hardware (see the loading sketch after this list):

  • Base model loaded in 4-bit precision (NF4)
  • Base weights frozen
  • Only LoRA adapters trained
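A sketch of this loading step, assuming bitsandbytes is available; the exact flags of the original run are not published:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4 quantization
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)  # freeze base weights for k-bit training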

LoRA Configuration

Parameter        Value
Rank (r)         16
Alpha            32
Dropout          0.05
Target Modules   q_proj, k_proj, v_proj, o_proj
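These values map directly onto a PEFT LoraConfig (task_type is the standard choice for decoder-only models, assumed here; base comes from the loading sketch above):

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable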

Training Configuration

Parameter              Value
Epochs                 3
Learning Rate          2e-4
Batch Size             1
Gradient Accumulation  8
Precision              FP16
Optimizer              AdamW
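A corresponding trainer sketch; SFTTrainer keyword arguments vary across TRL versions, and model and train_dataset are the objects built in the sketches above:

from transformers import TrainingArguments
from trl import SFTTrainer

training_args = TrainingArguments(
    output_dir="phi3-debug-lora",
    num_train_epochs=3,
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    fp16=True,
    optim="adamw_torch",
    report_to="wandb",               # Weights & Biases logging
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,     # formatted instruction examples
)
trainer.train()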

Hardware & Frameworks

  • GPU: NVIDIA Tesla T4
  • Frameworks: Hugging Face Transformers, PEFT (LoRA), TRL (SFTTrainer), Weights & Biases

Evaluation Results

Performance Summary

Metric                  Base Model     Fine-Tuned Model
Syntax Fix Accuracy     Low            Noticeably Higher
Indentation Correction  Inconsistent   Reliable
Variable Error Fixing   Occasional     Improved
Complex Logic Bugs      Limited        Limited (unchanged)
Instruction Adherence   Moderate       High

Note: Quantitative metrics (e.g., exact match accuracy, CodeBLEU) were not computed due to dataset and tooling constraints.


Example

Input β€” Buggy Code

for i in range(5)
    print(i)

Output β€” Fixed Code

for i in range(5):
    print(i)

Limitations

  • Small dataset size limits generalization
  • Focused primarily on syntax-level bugs
  • Limited performance on complex logical errors
  • Not evaluated on large-scale real-world codebases

Discussion

What Worked Well

  • QLoRA enabled efficient fine-tuning on limited hardware
  • Significant improvement in syntax correction tasks
  • Strong adherence to instruction format

Challenges

  • Limited dataset size
  • Lack of quantitative evaluation metrics
  • Difficulty handling complex multi-line logic bugs

Ethical Considerations

  • The model may generate incorrect fixes for complex bugs
  • Should be used as an assistive tool, not a final authority
  • Users should validate outputs before deployment

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model and its tokenizer.
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-3-mini-4k-instruct"
)

tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/phi-3-mini-4k-instruct"
)

# Attach the fine-tuned LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(
    base_model,
    "Sud1212/phi3-debug-llm-lora"
)

prompt = "Fix the bug:\nfor i in range(5)\n    print(i)"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
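For deployment, the adapter can optionally be folded into the base weights with PEFT's merge_and_unload (this requires the base model in full precision, as loaded above; the output directory name is illustrative):

merged = model.merge_and_unload()   # folds LoRA weights into the base model
merged.save_pretrained("phi3-debug-merged")
tokenizer.save_pretrained("phi3-debug-merged")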

Author

Sudarshan Maddi, Woxsen University


License

MIT License
