---
library_name: peft
base_model: codellama/CodeLlama-7b-Instruct-hf
tags:
- instruction-tuning
- qlora
- code-llama
- text-generation
language:
- en
datasets:
- mingyue0101/prompt_code_parquet
- mingyue0101/prompts_modi
license: apache-2.0
---

# Model Card for codellama-7b-matplotlib-assistant

This model is a fine-tuned version of `codellama/CodeLlama-7b-Instruct-hf` designed to enhance instruction-following capabilities. It was developed as part of a Master's thesis project.

## Model Details

### Model Description

The `codellama-7b-matplotlib-assistant` model is a large language model fine-tuned using the QLoRA (4-bit quantization + LoRA) technique. The goal of this fine-tune is to adapt the base CodeLlama model to follow user instructions more reliably while preserving its coding and reasoning capabilities.

- **Developed by:** mingyue0101
- **Model type:** Causal Language Model (fine-tuned with PEFT/LoRA)
- **Language(s) (NLP):** English, Chinese
- **License:** Apache-2.0 (inherited from CodeLlama)
- **Finetuned from model:** codellama/CodeLlama-7b-Instruct-hf

### Model Sources

- **Repository:** https://huggingface.co/mingyue0101/codellama-7b-matplotlib-assistant
- **Dataset:** https://huggingface.co/datasets/mingyue0101/prompt_code_parquet

## Uses

### Direct Use

The model can be used for text generation, code assistance, and general-purpose instruction following. It is particularly suited to tasks that require both technical coding knowledge and conversational instruction following.

### Out-of-Scope Use

The model should not be used for high-stakes decision-making, generating malicious code, or any application that violates the safety guidelines of the base CodeLlama model.

## Bias, Risks, and Limitations

This model may inherit biases present in the training data or the base model. Because it was fine-tuned on a single dataset (`mingyue0101/prompt_code_parquet`), it may perform poorly on domains outside its training distribution. Users should expect potential hallucinations in complex reasoning tasks.

### Recommendations

Users are encouraged to apply safety filters when deploying this model in production and to perform domain-specific evaluation before use.

## How to Get Started with the Model

Use the code below to load the model in 4-bit precision:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_id = "codellama/CodeLlama-7b-Instruct-hf"
peft_model_id = "mingyue0101/codellama-7b-matplotlib-assistant"

# 4-bit NF4 quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Load the fine-tuned adapter on top of the quantized base model
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Inference. For best results with an Instruct model, consider wrapping
# the prompt in CodeLlama's [INST] ... [/INST] tags.
prompt = "Write a Python function to sort a list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The model was trained on the `mingyue0101/prompt_code_parquet` dataset, which contains instruction-response pairs formatted for supervised fine-tuning (SFT).
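The card does not document the dataset's schema. The sketch below shows one plausible way to load the data and render it into CodeLlama's instruction format; the `prompt` and `completion` field names and the `train` split are assumptions, so adjust them to the actual schema.

```python
from datasets import load_dataset

# Load the SFT dataset from the Hub ("train" split assumed).
dataset = load_dataset("mingyue0101/prompt_code_parquet", split="train")

def format_example(example):
    # Hypothetical field names ("prompt"/"completion"); adjust to the
    # dataset's real columns. CodeLlama-Instruct expects [INST] tags.
    return {
        "text": f"<s>[INST] {example['prompt']} [/INST] {example['completion']}</s>"
    }

formatted = dataset.map(format_example)
print(formatted[0]["text"][:200])
```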
### Training Procedure

**Training Hyperparameters**

- Training regime: QLoRA 4-bit (NF4) with fp16 mixed-precision compute
- Learning rate: 2e-4
- Optimizer: paged_adamw_32bit
- Batch size: 4
- Epochs: 1
- LoRA rank (r): 64
- LoRA alpha: 16
- LoRA dropout: 0.1
- LR scheduler: constant
- Warmup ratio: 0.03

## Technical Specifications

### Model Architecture and Objective

Based on the Llama 2 architecture, this model uses rotary positional embeddings (RoPE) and was fine-tuned with a causal language modeling objective.

### Compute Infrastructure

### Software

- PEFT 0.10.0
- Transformers
- bitsandbytes
- TRL (SFTTrainer)

A hedged sketch combining this stack with the hyperparameters above follows.
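The card does not include the training script itself. The sketch below reconstructs a plausible run from the listed hyperparameters and software stack; it assumes the pre-0.9 `trl` `SFTTrainer` API (newer versions move these arguments into `SFTConfig`), a `text` column in the dataset, and a maximum sequence length of 1024, none of which are documented here.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_id = "codellama/CodeLlama-7b-Instruct-hf"

# 4-bit NF4 quantization, matching the training regime above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# LoRA settings from the hyperparameters above; target modules are not
# documented, so PEFT's defaults for Llama-type models apply.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)

# Trainer arguments from the hyperparameters above
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    warmup_ratio=0.03,
    fp16=True,
)

dataset = load_dataset("mingyue0101/prompt_code_parquet", split="train")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",  # assumed column name; adjust to the schema
    max_seq_length=1024,        # assumed; not documented in this card
    tokenizer=tokenizer,
)
trainer.train()
```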