---
library_name: peft
base_model: codellama/CodeLlama-7b-Instruct-hf
tags:
- instruction-tuning
- qlora
- code-llama
- text-generation
language:
- en
datasets:
- mingyue0101/prompt_code_parquet
- mingyue0101/prompts_modi
license: apache-2.0
---

# Model Card for codellama-7b-matplotlib-assistant

This model is a fine-tuned version of `codellama/CodeLlama-7b-Instruct-hf` designed to improve instruction following, with a focus on Matplotlib-oriented coding assistance. It was developed as part of a Master's thesis project.

## Model Details

### Model Description

The `codellama-7b-matplotlib-assistant` model is a large language model fine-tuned with QLoRA (4-bit quantization + LoRA). The goal was to adapt the base CodeLlama model to follow user instructions more reliably while preserving its coding and reasoning capabilities.

- **Developed by:** mingyue0101
- **Model type:** Causal language model (fine-tuned with PEFT/LoRA)
- **Language(s) (NLP):** English, Chinese
- **License:** Apache-2.0 (adapter weights); the base CodeLlama model is distributed under Meta's Llama 2 Community License
- **Finetuned from model:** codellama/CodeLlama-7b-Instruct-hf


### Model Sources

- **Repository:** https://huggingface.co/mingyue0101/codellama-7b-matplotlib-assistant
- **Dataset:** https://huggingface.co/datasets/mingyue0101/prompt_code_parquet

## Uses

### Direct Use

The model can be used for text generation, code assistance, and general-purpose instruction following. It is particularly suited to tasks that require both technical coding knowledge and conversational instruction following, such as generating Matplotlib plotting code from natural-language requests.

### Out-of-Scope Use

The model should not be used for high-stakes decision-making, generating malicious code, or any application that violates the safety guidelines of the base CodeLlama model.


## Bias, Risks, and Limitations

This model may inherit biases present in the training data or the base model. Because it was fine-tuned on a narrow, code-focused dataset (`mingyue0101/prompt_code_parquet`), it may perform poorly on domains outside its training distribution, and users should expect potential hallucinations in complex reasoning tasks.

### Recommendations

Users are encouraged to apply safety filters when deploying this model in production and to perform domain-specific evaluation before use.

## How to Get Started with the Model

Use the code below to load the model in 4-bit precision:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_id = "codellama/CodeLlama-7b-Instruct-hf"
peft_model_id = "mingyue0101/codellama-7b-matplotlib-assistant"

# Load 4-bit configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the fine-tuned adapter
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Inference
prompt = "Write a Python function to sort a list."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
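
CodeLlama-Instruct models expect the Llama 2 `[INST] ... [/INST]` chat format, so wrapping requests with the tokenizer's chat template usually gives better instruction following than a raw string. The snippet below is a minimal sketch of that usage, assuming the adapter keeps the base model's prompt format and reusing the `tokenizer` and `model` objects from above:

```python
# Format the request with the base model's chat template (assumes the
# adapter was trained on the same [INST] ... [/INST] instruction format).
messages = [
    {"role": "user", "content": "Plot a sine wave with matplotlib and label both axes."}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
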
## Training Details

### Training Data

The model was trained on the `mingyue0101/prompt_code_parquet` dataset, which contains instruction-response pairs formatted for supervised fine-tuning (SFT).
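
The exact preprocessing script is not included in this card. As an illustration only, a common way to serialize an instruction-response pair into a single SFT training string for a CodeLlama-Instruct model is the Llama 2 `[INST]` format (the field names below are assumptions, not the dataset's actual column names):

```python
# Illustrative serialization only; the real dataset columns and template may differ.
def format_example(example: dict) -> str:
    instruction = example["instruction"]  # assumed column name
    response = example["response"]        # assumed column name
    return f"<s>[INST] {instruction} [/INST] {response} </s>"

print(format_example({
    "instruction": "Draw a bar chart of monthly sales with matplotlib.",
    "response": "import matplotlib.pyplot as plt\n# ...plotting code...",
}))
```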

### Training Procedure

**Training Hyperparameters**

- Training regime: QLoRA 4-bit (NF4) quantization with fp16 mixed precision
- Learning rate: 2e-4
- Optimizer: paged_adamw_32bit
- Batch size: 4
- Epochs: 1
- LoRA rank (r): 64
- LoRA alpha: 16
- LoRA dropout: 0.1
- LR scheduler: constant
- Warmup ratio: 0.03
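
These hyperparameters can be wired together with `peft`, `bitsandbytes`, and TRL's `SFTTrainer` (the libraries listed under Software below). The following is only a sketch of such a setup for a TRL version contemporary with PEFT 0.10.0; the LoRA target modules, dataset column name, and sequence length are assumptions rather than values taken from the actual training script:

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_id = "codellama/CodeLlama-7b-Instruct-hf"

# 4-bit NF4 base model, as in the QLoRA recipe above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# LoRA settings from the list above; target_modules is an assumption
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

args = TrainingArguments(
    output_dir="codellama-7b-matplotlib-assistant",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    fp16=True,
)

dataset = load_dataset("mingyue0101/prompt_code_parquet", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # assumed column name
    max_seq_length=2048,        # assumed
)
trainer.train()
```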

## Technical Specifications

### Model Architecture and Objective

Based on the Llama 2 architecture, this model uses rotary positional embeddings (RoPE) with the enlarged base frequency that CodeLlama employs for long-context handling, and it was fine-tuned with a causal language modeling objective.
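
Because the LoRA adapter leaves the backbone unchanged, these architectural details can be checked directly against the base model's configuration; the short snippet below only downloads the config file, not the weights:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
# Llama-2-style architecture fields, including the RoPE base frequency
# that CodeLlama enlarges for long-context handling.
print(config.num_hidden_layers, config.hidden_size)
print(config.num_attention_heads, config.num_key_value_heads)
print(config.rope_theta, config.max_position_embeddings)
```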

### Compute Infrastructure

### Software

- PEFT 0.10.0
- Transformers
- bitsandbytes
- TRL (SFTTrainer)