---
library_name: peft
base_model: codellama/CodeLlama-7b-Instruct-hf
tags:
- instruction-tuning
- qlora
- code-llama
- text-generation
language:
- en
datasets:
- mingyue0101/prompt_code_parquet
- mingyue0101/prompts_modi
license: apache-2.0
---
# Model Card for codellama-7b-matplotlib-assistant
This model is a fine-tuned version of `codellama/CodeLlama-7b-Instruct-hf` designed to enhance instruction-following capabilities. It was developed as part of a Master's thesis project.
## Model Details
### Model Description
The `codellama-7b-matplotlib-assistant` model is a large language model fine-tuned using the QLoRA (4-bit quantization + LoRA) technique. The goal was to adapt the base CodeLlama model to follow user instructions more reliably while preserving its coding and reasoning capabilities.
- **Developed by:** mingyue0101
- **Model type:** Causal Language Model (Fine-tuned with PEFT/LoRA)
- **Language(s) (NLP):** English, Chinese
- **License:** Apache-2.0 (inherited from CodeLlama)
- **Finetuned from model:** codellama/CodeLlama-7b-Instruct-hf
### Model Sources
- **Repository:** https://huggingface.co/mingyue0101/codellama-7b-matplotlib-assistant
- **Dataset:** https://huggingface.co/datasets/mingyue0101/prompt_code_parquet
## Uses
### Direct Use
The model can be used for text generation, code assistance, and general-purpose instruction following. It is particularly suited for tasks where a balance of technical coding knowledge and conversational instruction following is required.
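For instruction-style prompts, CodeLlama-Instruct models conventionally wrap the user request in `[INST] ... [/INST]` tags. The snippet below is a minimal sketch of building such a prompt for a plotting request; whether the fine-tuning data used this exact template is an assumption, so adjust it if your own evaluation suggests otherwise.

```python
# Build an instruct-style prompt for a matplotlib request.
# The [INST] ... [/INST] wrapper follows the CodeLlama-Instruct convention;
# that the fine-tuning data used this exact template is an assumption.
user_request = "Plot a sine wave with matplotlib, add axis labels and a title."
prompt = f"[INST] {user_request} [/INST]"
```

The resulting `prompt` string can be passed directly to the generation code shown in the "How to Get Started with the Model" section below.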
### Out-of-Scope Use
The model should not be used for high-stakes decision-making, generating malicious code, or any application that violates the safety guidelines of the base CodeLlama model.
## Bias, Risks, and Limitations
This model may inherit biases present in the training data or the base model. Since it was fine-tuned on a specific dataset (`parquet02`), it might exhibit limitations when handling domains outside of its training distribution. Users should expect potential hallucinations in complex reasoning tasks.
### Recommendations
Users are encouraged to use safety filters when deploying this model in production and to perform domain-specific evaluation before use.
## How to Get Started with the Model
Use the code below to load the model in 4-bit precision:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
model_id = "codellama/CodeLlama-7b-Instruct-hf"
peft_model_id = "mingyue0101/codellama-7b-matplotlib-assistant"
# Load 4-bit configuration
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
)
# Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(
model_id,
quantization_config=bnb_config,
device_map="auto"
)
# Load the fine-tuned adapter
model = PeftModel.from_pretrained(base_model, peft_model_id)
# Inference
prompt = "Write a Python function to sort a list."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
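If you want to serve the model without the PEFT dependency, the adapter can also be folded into a full-precision copy of the base model. The snippet below is a hedged sketch: LoRA weights cannot be merged into a 4-bit quantized model, so the base weights are reloaded in fp16 first, and the output path is an assumption.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in fp16 (LoRA weights cannot be merged into 4-bit weights).
base_fp16 = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-Instruct-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(
    base_fp16, "mingyue0101/codellama-7b-matplotlib-assistant"
)
merged = merged.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("codellama-7b-matplotlib-assistant-merged")  # assumed path
```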
## Training Details
### Training Data
The model was trained on the `mingyue0101/parquet02` dataset. This dataset contains instruction-response pairs formatted for Supervised Fine-Tuning (SFT).
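To inspect the records before training, the dataset can be loaded straight from the Hub. This is a minimal sketch; the `train` split name is an assumption, and the column layout should be checked rather than assumed.

```python
from datasets import load_dataset

# Load the SFT dataset named above ("train" split is an assumption).
ds = load_dataset("mingyue0101/parquet02", split="train")
print(ds.column_names)  # check the actual instruction/response column names
print(ds[0])            # first instruction-response pair
```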
### Training Procedure
**Training Hyperparameters** (a matching configuration sketch follows this list)
- Training regime: QLoRA 4-bit (NF4) with fp16 mixed-precision compute
- Learning rate: 2e-4
- Optimizer: paged_adamw_32bit
- Batch size: 4
- Epochs: 1
- LoRA Rank (r): 64
- LoRA Alpha: 16
- LoRA Dropout: 0.1
- LR Scheduler: constant
- Warmup Ratio: 0.03
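The sketch below reconstructs a QLoRA training setup from the hyperparameters above. It is an illustration under stated assumptions, not the original training script: the output path and text column name are assumptions, and the exact `SFTTrainer` keyword arguments vary between TRL versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

model_id = "codellama/CodeLlama-7b-Instruct-hf"

# 4-bit NF4 quantization with fp16 compute (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# LoRA settings from the list above
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

# Optimizer and schedule settings from the list above
training_args = TrainingArguments(
    output_dir="./qlora-codellama",  # assumed path
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    fp16=True,
)

# Dataset id taken from the Training Data section above
dataset = load_dataset("mingyue0101/parquet02", split="train")

# Keyword arguments differ between TRL versions; "text" as the formatted
# prompt column is an assumption.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
)
trainer.train()
```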
## Technical Specifications
### Model Architecture and Objective
Based on the Llama 2 architecture, this model uses rotary positional embeddings (RoPE) and a causal language modeling objective; fine-tuning applied LoRA adapters on top of the 4-bit (NF4) quantized base weights.
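The architectural details of the base model can be read directly from its Hugging Face config; a quick sketch:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
print(cfg.num_hidden_layers, cfg.hidden_size)             # transformer depth and width
print(cfg.num_attention_heads, cfg.num_key_value_heads)   # head counts for attention
print(cfg.rope_theta, cfg.max_position_embeddings)        # RoPE base and context length
```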
### Compute Infrastructure
### Software
- PEFT 0.10.0
- Transformers
- Bitsandbytes
- TRL (SFTTrainer) |