---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-7B-Instruct
pipeline_tag: text-generation
library_name: peft
tags:
- lora
- peft
- qwen2.5
- miniscript
- code
---

# miniscript-code-helper-lora

This repository contains a LoRA adapter for `Qwen/Qwen2.5-Coder-7B-Instruct`, fine-tuned to help answer questions about the MiniScript programming language.

The adapter was trained on a small MiniScript Q&A corpus. On its own, it gives the model only a modest improvement in MiniScript knowledge; the best results come from using it together with a retrieval-augmented generation (RAG) pipeline over MiniScript reference materials.

## Base model

- Qwen/Qwen2.5-Coder-7B-Instruct

## What this repo contains

- PEFT/LoRA adapter weights only
- Not the full base model: `Qwen/Qwen2.5-Coder-7B-Instruct` must be loaded separately, and the adapter applied on top of it
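
Because this repository ships only the adapter, the base model's weights are needed every time it is used. As a rough illustration of why the adapter is small: for each adapted weight matrix `W0` of shape `d x k`, LoRA stores two low-rank matrices, `B` (`d x r`) and `A` (`r x k`), and the effective weight is `W0 + (alpha / r) * B @ A`. The pure-Python sketch below counts parameters and applies that merge to a toy matrix. The rank `r = 16`, the `alpha` values, and the square `3584 x 3584` shape are illustrative assumptions; this card does not state the adapter's rank, scaling, or target modules.

```python
# Sketch: why a LoRA adapter is small, and why it still needs the base weights W0.
# Assumed values (not from this card): rank r = 16 and a 3584 x 3584 matrix.

def full_params(d: int, k: int) -> int:
    """Parameters in one dense d x k weight matrix of the base model."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters LoRA stores for that matrix: B (d x r) plus A (r x k)."""
    return r * (d + k)


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def merge(W0, B, A, alpha, r):
    """Effective weight W0 + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * dw for w, dw in zip(w_row, d_row)]
            for w_row, d_row in zip(W0, delta)]


d = k = 3584
r = 16
print(full_params(d, k))                           # 12845056
print(lora_params(d, k, r))                        # 114688
print(full_params(d, k) // lora_params(d, k, r))   # 112

# Toy 2 x 2 example with a rank-1 adapter: B and A only become a usable
# weight once they are added to the frozen base matrix W0.
W0 = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d x r
A = [[0.5, 0.5]]     # r x k
print(merge(W0, B, A, alpha=2, r=1))               # [[2.0, 1.0], [2.0, 3.0]]
```

In PEFT, `merge_and_unload()` performs the same addition on a real `PeftModel` and returns a plain Transformers model, but the merged weights are still built from the base model's weights.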

## Intended use

- Answering questions about MiniScript
- Assisting with MiniScript syntax and examples
- Best used with retrieval augmentation (RAG)
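
Retrieval augmentation means fetching relevant MiniScript documentation for each question and placing it in the prompt before generation. The sketch below shows the shape of that step with a toy keyword-overlap retriever over three hand-written snippets. The snippets, the scoring, and the helper names (`retrieve`, `build_messages`) are hypothetical; a real pipeline would more likely use embedding search over the MiniScript manual. The resulting `messages` list has the same format as the one passed to `tokenizer.apply_chat_template` in the usage example.

```python
# Minimal RAG prompt-building sketch. DOCS, the overlap scoring, and the
# helper names are hypothetical stand-ins for a real documentation retriever.
import re

DOCS = [
    "Iterating over a map with `for kv in m` yields small maps with "
    "`key` and `value` entries.",
    "Blocks are closed with `end if`, `end for`, `end while`, or `end function`.",
    "Comments start with `//` and run to the end of the line.",
]


def tokenize(text: str) -> set[str]:
    """Lower-case word set used for a crude overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k snippets sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_messages(question: str, docs: list[str]) -> list[dict]:
    """Put retrieved snippets in the system prompt, ahead of the user turn."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, docs))
    system = (
        "You are a helpful assistant specializing in MiniScript programming.\n"
        "Use this MiniScript reference material:\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_messages("How do I iterate over a map in MiniScript?", DOCS)
print(messages[0]["content"])
```

Putting the retrieved text in the system message leaves the user's question untouched; appending it to the user message instead is an equally reasonable choice.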

## Limitations

- Used alone, the adapter is not fully reliable
- It may still fall back on Python-flavored assumptions inherited from the base model
- For best accuracy, pair it with a MiniScript documentation retriever
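
One cheap safeguard against the Python fallback noted above is to scan generated code for idioms that are not valid MiniScript before showing it to a user. This is a heuristic sketch, not part of this repository: the pattern table (`PYTHONISMS`) and the `find_pythonisms` helper are hypothetical, intentionally incomplete, and can produce false positives.

```python
# Heuristic sketch: flag common Python idioms that are not MiniScript.
# The pattern list is illustrative and deliberately incomplete.
import re

PYTHONISMS = {
    r"^\s*def\s": "define functions as `name = function(...)` ... `end function`",
    r"^\s*elif\b": "MiniScript uses `else if`",
    r":\s*$": "MiniScript does not open blocks with a trailing colon",
    r"\b(True|False|None)\b": "MiniScript uses `true`, `false`, and `null`",
    r"^\s*#": "MiniScript comments start with `//`",
}


def find_pythonisms(code: str) -> list[tuple[int, str]]:
    """Return (line_number, hint) pairs for lines that look like Python."""
    hits = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for pattern, hint in PYTHONISMS.items():
            if re.search(pattern, line):
                hits.append((lineno, hint))
    return hits


suspect = "def greet(name):\n    print(name)\nif x == None:\n    pass"
for lineno, hint in find_pythonisms(suspect):
    print(f"line {lineno}: {hint}")   # flags lines 1 and 3, twice each

clean = "greet = function(name)\n    print name\nend function"
print(find_pythonisms(clean))         # []
```

An answer that trips these checks could be regenerated with more retrieved context, or shown with a warning.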

## Example usage

The snippet below loads the base model, applies this adapter with PEFT, and asks a sample question:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "JoeStrout/miniscript-code-helper-lora"

# Load the tokenizer from the base model.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the full base model: torch_dtype="auto" keeps the checkpoint's dtype,
# and device_map="auto" spreads layers across the available devices.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Apply the LoRA adapter on top of the base model and switch to inference mode.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant specializing in MiniScript programming."},
    {"role": "user", "content": "How do I iterate over a map in MiniScript?"},
]

# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output[0][len(inputs.input_ids[0]):],
    skip_special_tokens=True,
)

print(response)
```
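
The reply is Markdown text and may wrap code in fenced blocks. If an application needs only the code (for example, to run it in a MiniScript interpreter), a small post-processing step can pull it out. This sketch assumes the reply uses triple-backtick fences with an optional language tag; `extract_code_blocks` is a hypothetical helper and the sample reply is made up for illustration.

```python
# Post-processing sketch: extract fenced code blocks from a Markdown reply.
import re

TICKS = "`" * 3  # a Markdown code fence, built here to avoid a literal fence in the source
FENCE = re.compile(TICKS + r"[\w+-]*\n(.*?)" + TICKS, re.DOTALL)


def extract_code_blocks(response: str) -> list[str]:
    """Return the body of every fenced block, in order of appearance."""
    return [body.rstrip("\n") for body in FENCE.findall(response)]


reply = (
    "Use a for loop:\n"
    + TICKS + "miniscript\n"
    + "for kv in m\n    print kv.key\nend for\n"
    + TICKS + "\n"
    + "Each kv has `key` and `value` entries."
)
print(extract_code_blocks(reply)[0])
```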