mistral-7b-docstring

Mistral 7B fine-tuned with QLoRA on Python docstring generation from CodeSearchNet.

On a held-out sample of 100 functions, it outperforms Llama 3.3 70B, a model 10x larger, on both ROUGE-L and BERTScore F1 for NumPy-style docstring generation.

Evaluation results

Evaluated on 100 held-out Python functions from CodeSearchNet (never seen during training).

Model	ROUGE-L	BERTScore F1
Mistral 7B fine-tuned (this model)	0.2033	0.7739
Llama 3.3 70B via Groq	0.1715	0.7594
Mistral 7B base (no fine-tuning)	0.1102	0.7118
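
For readers unfamiliar with the metric, ROUGE-L scores the longest common subsequence (LCS) of tokens shared by a generated docstring and its reference. The sketch below is a minimal F1 variant over whitespace-split tokens, written for illustration only. The function names and example strings are assumptions, and this is not the evaluation script behind the table, which may have used a library implementation with different tokenisation or stemming.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over lowercased, whitespace-split tokens (illustrative only)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical candidate/reference pair, not taken from the evaluation set.
score = rouge_l_f1(
    "Compute the body mass index.",
    "Calculate the body mass index from weight and height.",
)
print(f"ROUGE-L F1: {score:.4f}")
```

Because punctuation stays attached to tokens here ("index." does not match "index"), scores from this sketch will generally differ from those of a tokeniser-aware implementation.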

The fine-tuned 7B model beats Llama 3.3 70B on ROUGE-L (+18.5% relative) and BERTScore F1 (+1.9% relative) while being 10x smaller and running at a fraction of the inference cost.
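
The percentages are relative gains over the Llama 3.3 70B scores, not absolute differences. A short check using the figures from the table (the `relative_gain` helper is only an illustrative name):

```python
def relative_gain(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100


# Scores copied from the evaluation table.
rouge_gain = relative_gain(0.2033, 0.1715)
bert_gain = relative_gain(0.7739, 0.7594)

print(f"ROUGE-L:      +{rouge_gain:.1f}%")  # +18.5%
print(f"BERTScore F1: +{bert_gain:.1f}%")   # +1.9%
```

In absolute terms the gaps are 0.0318 ROUGE-L and 0.0145 BERTScore F1, measured on 100 functions.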

How to use

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE_MODEL = "mistralai/Mistral-7B-v0.1"

# Load in 4-bit for efficient inference
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "kk014/mistral-7b-docstring")
model.eval()

# Generate a docstring
function_code = """
def calculate_bmi(weight_kg, height_m):
    return weight_kg / (height_m ** 2)
""".strip()

prompt = (
    "You are a Python documentation expert. "
    "Write a clear, concise NumPy-style docstring for the following Python function.\n\n"
    f"### Function:\n{function_code}\n\n"
    "### Docstring:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

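# Decode only the newly generated tokens. Slicing the decoded string by
# len(prompt) can misalign if decoding does not reproduce the prompt exactly.
prompt_len = inputs["input_ids"].shape[1]
docstring = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True).strip()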
print(docstring)
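
Since the prompt uses `### Function:` and `### Docstring:` markers, the model may keep generating past the docstring and start a new section. One possible way to handle that is to cut the output at the first marker. The `truncate_at_marker` helper and the sample string below are hypothetical and are not part of the model's published usage:

```python
def truncate_at_marker(text, markers=("### Function:", "### Docstring:")):
    """Cut generated text at the first prompt-style marker, if one appears."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()


# Made-up output that runs on into a second section.
raw = "Calculate the body mass index.\n\n### Function:\ndef other(): ..."
cleaned = truncate_at_marker(raw)
print(cleaned)
```

Text with no marker is returned unchanged apart from whitespace stripping.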

Training details

Parameter	Value
Base model	mistralai/Mistral-7B-v0.1
Dataset	CodeSearchNet (Python split)
Training samples	8,000
Method	QLoRA (4-bit NF4 quantisation)
LoRA rank	16
LoRA alpha	32
Epochs	1
Batch size	2 (effective 16 with gradient accumulation)
Learning rate	2e-4
Hardware	Kaggle T4 x2 (free tier)
Training time	~4 hours
Framework	HuggingFace PEFT + TRL
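
A few derived quantities follow from the table. The sketch below assumes the batch size of 2 is the micro-batch of a single training process. The card does not say how the two T4s were used, so the accumulation factor would be smaller if both GPUs ran as data-parallel workers. The LoRA scale uses the standard alpha / rank definition.

```python
import math

# Values copied from the training table.
train_samples = 8_000
micro_batch = 2
effective_batch = 16
epochs = 1
lora_rank = 16
lora_alpha = 32

# Optimizer updates over the whole run.
optimizer_steps = math.ceil(train_samples / effective_batch) * epochs

# Assumption: one training process, so all batching beyond the
# micro-batch comes from gradient accumulation.
accum_factor = effective_batch // micro_batch

# Standard LoRA scaling applied to the low-rank update.
lora_scale = lora_alpha / lora_rank

print(optimizer_steps, accum_factor, lora_scale)  # 500 8 2.0
```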

Limitations

Trained on NumPy-style docstrings specifically — output style may differ for Google or Sphinx style
Best on standalone functions under ~50 lines
May repeat examples in generated output at very low temperatures
Evaluated on CodeSearchNet Python split only — performance on other codebases may vary
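
To act on the function-length limitation above, inputs can be screened before generation. The following is a rough sketch using only the standard library. The `function_line_counts` helper and the 50-line threshold constant are illustrative assumptions based on the "~50 lines" guidance, not a tested cutoff.

```python
import ast


def function_line_counts(source):
    """Map each function name in `source` to its length in lines
    (from the `def` line through the last line of the body)."""
    tree = ast.parse(source)
    return {
        node.name: node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


MAX_LINES = 50  # rough threshold taken from the limitation above

source = '''
def calculate_bmi(weight_kg, height_m):
    return weight_kg / (height_m ** 2)
'''

counts = function_line_counts(source)
for name, n_lines in counts.items():
    status = "ok" if n_lines <= MAX_LINES else "long: output quality may drop"
    print(f"{name}: {n_lines} lines ({status})")
```

Nested functions are counted as well, and functions sharing a name overwrite each other in the result, which is acceptable for a quick screen but not for a full codebase scan.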

Citation

If you use this model, please cite the original QLoRA paper:

@article{dettmers2023qlora,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Dettmers, Tim and others},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}
