mistral-7b-docstring

Mistral 7B fine-tuned with QLoRA on Python docstring generation from CodeSearchNet.

On the held-out evaluation below, it outperforms Llama 3.3 70B, a model 10x larger, on both ROUGE-L and BERTScore F1 for NumPy-style docstring generation.

Evaluation results

Evaluated on 100 held-out Python functions from CodeSearchNet (never seen during training).

| Model | ROUGE-L | BERTScore F1 |
|---|---|---|
| Mistral 7B fine-tuned (this model) | 0.2033 | 0.7739 |
| Llama 3.3 70B (via Groq) | 0.1715 | 0.7594 |
| Mistral 7B base (no fine-tuning) | 0.1102 | 0.7118 |
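
For readers unfamiliar with the ROUGE-L column, the sketch below shows the idea behind the metric: an F1 score built from the longest common subsequence (LCS) of reference and generated tokens. This card does not say which ROUGE implementation produced the numbers above, so the lowercased whitespace tokenisation and the function names `lcs_length` and `rouge_l_f1` are assumptions made for illustration. Library implementations tokenise differently and may apply stemming, so this will not reproduce the table exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Dynamic programme over the two sequences, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over lowercased whitespace tokens (simplified tokenisation)."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of generated tokens that are in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens that are in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "Compute the body mass index from weight and height."
    candidate = "Compute body mass index given weight and height."
    print(round(rouge_l_f1(reference, candidate), 4))
```

Because the score rewards in-order token overlap, a docstring that paraphrases the reference can score low on ROUGE-L and still score well on BERTScore, which is why the card reports both.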

The fine-tuned 7B model beats Llama 3.3 70B on ROUGE-L (+18.5% relative) and BERTScore F1 (+1.9% relative) while being 10x smaller and running at a fraction of the inference cost.
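
The percentages quoted above are relative changes, not absolute differences in score. The short check below recomputes them from the table; the dictionary keys and the helper name `relative_gain` are chosen here for illustration only.

```python
# Scores copied from the evaluation table in this card.
scores = {
    "fine_tuned_7b": {"rouge_l": 0.2033, "bertscore_f1": 0.7739},
    "llama_3_3_70b": {"rouge_l": 0.1715, "bertscore_f1": 0.7594},
    "base_7b": {"rouge_l": 0.1102, "bertscore_f1": 0.7118},
}


def relative_gain(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


for metric in ("rouge_l", "bertscore_f1"):
    tuned = scores["fine_tuned_7b"][metric]
    vs_70b = relative_gain(tuned, scores["llama_3_3_70b"][metric])
    vs_base = relative_gain(tuned, scores["base_7b"][metric])
    print(f"{metric}: {vs_70b:+.1f}% vs Llama 3.3 70B, {vs_base:+.1f}% vs base")
```

The same helper also gives the gain over the untuned base model, which isolates the effect of fine-tuning from the choice of base model.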

How to use

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE_MODEL = "mistralai/Mistral-7B-v0.1"

# Load in 4-bit for efficient inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "kk014/mistral-7b-docstring")
model.eval()

# Generate a docstring
function_code = """
def calculate_bmi(weight_kg, height_m):
    return weight_kg / (height_m ** 2)
""".strip()

prompt = (
    "You are a Python documentation expert. "
    "Write a clear, concise NumPy-style docstring for the following Python function.\n\n"
    f"### Function:\n{function_code}\n\n"
    "### Docstring:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

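# Decode only the newly generated tokens. Slicing the decoded string by
# len(prompt) is fragile, because decoding may not reproduce the prompt exactly.
prompt_len = inputs["input_ids"].shape[1]
docstring = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()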
print(docstring)
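
When documenting many functions, it can help to wrap the prompt format and some light post-processing in small helpers. The sketch below reuses the exact prompt text from the example above. The helper names `build_prompt` and `clean_completion` are hypothetical, and the clean-up rules (cutting at a new `###` header, stripping surrounding triple quotes) are assumptions about what the model might emit, not documented behaviour.

```python
PROMPT_TEMPLATE = (
    "You are a Python documentation expert. "
    "Write a clear, concise NumPy-style docstring for the following Python function.\n\n"
    "### Function:\n{function_code}\n\n"
    "### Docstring:"
)


def build_prompt(function_code: str) -> str:
    """Format a function's source into the prompt layout shown in this card."""
    return PROMPT_TEMPLATE.format(function_code=function_code.strip())


def clean_completion(completion: str) -> str:
    """Keep only the first docstring from a raw model completion."""
    # Assumption: a new "###" header means the model has started another example.
    text = completion.split("###", 1)[0].strip()
    # Assumption: the model may wrap its answer in triple quotes; drop them if present.
    for quote in ('"""', "'''"):
        if text.startswith(quote):
            text = text[len(quote):]
        if text.endswith(quote):
            text = text[: -len(quote)]
    return text.strip()


if __name__ == "__main__":
    code = "def calculate_bmi(weight_kg, height_m):\n    return weight_kg / (height_m ** 2)"
    print(build_prompt(code))
    raw = '"""Compute the body mass index."""\n\n### Function:\ndef other(): ...'
    print(clean_completion(raw))
```

Pass the result of `build_prompt` to the tokenizer in place of `prompt`, and run `clean_completion` on the decoded new tokens.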

Training details

| Parameter | Value |
|---|---|
| Base model | mistralai/Mistral-7B-v0.1 |
| Dataset | CodeSearchNet (Python split) |
| Training samples | 8,000 |
| Method | QLoRA (4-bit NF4 quantisation) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Epochs | 1 |
| Batch size | 2 (effective 16 with gradient accumulation) |
| Learning rate | 2e-4 |
| Hardware | Kaggle T4 x2 (free tier) |
| Training time | ~4 hours |
| Framework | HuggingFace PEFT + TRL |
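
A few quantities follow directly from the table above. The sketch below derives them. It assumes a single data-parallel process, so that gradient accumulation alone bridges the per-step batch of 2 and the effective batch of 16; with two data-parallel workers the accumulation count would be halved. The LoRA scaling factor uses the standard `alpha / r` definition.

```python
import math

# Values copied from the training table in this card.
train_samples = 8_000
per_step_batch = 2
effective_batch = 16
epochs = 1
lora_rank = 16
lora_alpha = 32

# Assumption: one data-parallel process, so accumulation covers the whole gap.
grad_accum_steps = effective_batch // per_step_batch

# One optimizer update per effective batch.
optimizer_steps = math.ceil(train_samples / effective_batch) * epochs

# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = lora_alpha / lora_rank

print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"optimizer steps: {optimizer_steps}")
print(f"LoRA scaling factor: {lora_scale}")
```

With a single epoch over 8,000 samples this is a short run, which is consistent with the reported training time on free-tier hardware.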

Limitations

  • Trained specifically on NumPy-style docstrings, so it may not produce well-formed Google or Sphinx style output
  • Works best on standalone functions under ~50 lines
  • May repeat examples within the generated docstring at very low sampling temperatures
  • Evaluated on the CodeSearchNet Python split only, so performance on other codebases may vary
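
Given the size guideline above, one way to choose inputs is to pull top-level functions out of a module and skip the long ones. This is a minimal sketch using the standard-library `ast` module (Python 3.8+). The helper name `short_top_level_functions` and the fixed 50-line cut-off are illustrative assumptions, not part of the model.

```python
import ast

MAX_LINES = 50  # mirrors the "~50 lines" guideline; not a hard limit of the model


def short_top_level_functions(source: str, max_lines: int = MAX_LINES):
    """Yield (name, code) for top-level functions of at most `max_lines` lines."""
    tree = ast.parse(source)
    for node in tree.body:  # top level only, so methods inside classes are skipped
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            n_lines = node.end_lineno - node.lineno + 1
            if n_lines <= max_lines:
                yield node.name, ast.get_source_segment(source, node)


if __name__ == "__main__":
    module_source = (
        "def calculate_bmi(weight_kg, height_m):\n"
        "    return weight_kg / (height_m ** 2)\n"
        "\n"
        "class Patient:\n"
        "    def age(self):\n"
        "        return 0\n"
    )
    for name, code in short_top_level_functions(module_source):
        print(name)
        print(code)
```

Each yielded `code` string can be used as `function_code` in the usage example.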

Citation

If you use this model, please cite the original QLoRA paper:

@article{dettmers2023qlora,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Dettmers, Tim and others},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}