🧠 Llama-3.2-1B Code Solver (QLoRA Fine-Tuned)

A lightweight yet powerful code-focused language model fine-tuned from Meta Llama-3.2-1B using QLoRA (4-bit) on the CodeAlpaca-20K dataset.
Designed for efficient code generation, reasoning, and problem-solving on limited GPU resources.

🚀 Trained on a single Tesla P100 GPU
⚡ Optimized for Kaggle, Colab, and low-VRAM environments
🧩 Ideal for research, education, and rapid prototyping


πŸ” Model Overview

Attribute Value
Base Model meta-llama/Llama-3.2-1B
Model Type Decoder-only causal language model
Fine-Tuning Method QLoRA (4-bit quantization + LoRA)
LoRA Rank 16
Task Domain Code generation & code reasoning
Training Samples 10,000
Training Time ~5 hours
Hardware NVIDIA Tesla P100
Precision 4-bit (NF4)
Frameworks Hugging Face Transformers, PEFT, BitsAndBytes

🎯 What This Model Is Good At

  • 🧑‍💻 Code generation (Python-focused, but generalizable)
  • 🧠 Step-by-step coding reasoning
  • 🧪 Algorithmic problem solving
  • 📘 Educational coding assistance
  • ⚙️ Running efficiently on low-VRAM GPUs

📚 Training Dataset

CodeAlpaca-20K

A high-quality instruction-tuning dataset in the Alpaca format, specialized for coding tasks.

  • Total dataset size: 20,000 samples
  • Used for training: 10,000 samples (50%)
  • Data format:

    ```json
    {
      "instruction": "Describe the coding task",
      "input": "Optional context or input code",
      "output": "Expected code solution"
    }
    ```

  • Task Types:

    • Algorithm implementation
    • Code completion
    • Debugging
    • Function writing
    • Problem solving
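During fine-tuning, each record is typically rendered into a single prompt string. The exact template used for this model is not documented, so the following Alpaca-style formatter is an illustrative assumption:

```python
def format_example(example: dict) -> str:
    """Render a CodeAlpaca record into an Alpaca-style training prompt.

    NOTE: the exact template used for this model is not documented;
    this is the conventional Alpaca layout, shown for illustration.
    """
    if example.get("input"):
        return (
            "### Instruction:\n" + example["instruction"] + "\n\n"
            "### Input:\n" + example["input"] + "\n\n"
            "### Response:\n" + example["output"]
        )
    # Records with an empty "input" field omit the Input section entirely.
    return (
        "### Instruction:\n" + example["instruction"] + "\n\n"
        "### Response:\n" + example["output"]
    )


sample = {
    "instruction": "Write a function that reverses a string.",
    "input": "",
    "output": "def reverse(s):\n    return s[::-1]",
}
print(format_example(sample))
```

Whatever template is used, the important part is that inference-time prompts follow the same layout as training-time prompts.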

πŸ—οΈ Training Methodology

This model was fine-tuned using QLoRA, enabling efficient adaptation of large language models on limited hardware.

Key Techniques Used

  • 4-bit Quantization (NF4) via BitsAndBytes
  • LoRA adapters applied to attention layers
  • Frozen base model weights
  • Low-rank updates only

Why QLoRA?

  • 🔻 Drastically reduces GPU memory usage
  • ⚡ Enables training on consumer-grade GPUs
  • 📈 Maintains strong downstream performance
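The memory saving is easy to estimate from first principles: weight storage scales linearly with bits per parameter (this back-of-envelope sketch ignores activations, gradients, optimizer state, the KV cache, and NF4's small quantization-constant overhead):

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB.

    Weights only: ignores activations, gradients, optimizer state,
    the KV cache, and quantization-constant overhead.
    """
    return n_params * bits_per_param / 8 / 2**30


fp16_gib = weight_gib(1e9, 16)  # a 1B model in fp16: ~1.86 GiB
nf4_gib = weight_gib(1e9, 4)    # the same model in 4-bit NF4: ~0.47 GiB
print(f"fp16: {fp16_gib:.2f} GiB, nf4: {nf4_gib:.2f} GiB")
```

On top of the quantized base weights, only the small LoRA adapter matrices are trained and stored in higher precision, which is what makes a P100-class GPU sufficient.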

βš™οΈ Training Configuration

Parameter Value
Max Sequence Length 1024
LoRA Rank (r) 16
LoRA Alpha 32
LoRA Dropout 0.05
Optimizer AdamW
Learning Rate 2e-4
Batch Size Small (GPU-constrained)
Gradient Accumulation Enabled
Quantization 4-bit
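Given these hyperparameters, the training setup can be sketched with Transformers + PEFT. Note the `target_modules` list is an assumption: the card only says "attention layers" and does not name the exact projection modules.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base-model loading, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Assumption: standard Llama attention projections; the card
    # does not specify which modules received adapters.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Base weights stay frozen; only the low-rank adapter matrices train.
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

From here, training proceeds with a standard causal-LM loss (e.g. via the Hugging Face `Trainer`) over the formatted CodeAlpaca prompts.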

🚀 Usage

Load the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "YOUR_USERNAME/llama-3.2-1b-code-solver"

# 4-bit NF4 loading, matching the training-time quantization
# (the load_in_4bit= kwarg on from_pretrained is deprecated;
# pass a BitsAndBytesConfig instead)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
)
```

Example Inference

```python
prompt = "Write a Python function to check if a number is prime."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

🧪 Evaluation Notes

  • This model is instruction-tuned, not benchmark-optimized
  • No formal benchmarks (HumanEval / MBPP) were run
  • Best evaluated qualitatively, by inspecting the code it generates
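Qualitative evaluation can still be lightly automated by executing generated code against hand-written test cases. A minimal sketch, where the `generated` string stands in for actual model output:

```python
def passes_cases(code: str, func_name: str, cases) -> bool:
    """Exec generated code in a fresh namespace and check test cases.

    Only run code you trust, or sandbox it appropriately: exec()
    executes arbitrary Python.
    """
    ns = {}
    exec(code, ns)
    fn = ns[func_name]
    return all(fn(arg) == expected for arg, expected in cases)


# Stand-in for text the model might generate for the prime-check prompt.
generated = """
def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True
"""

cases = [(1, False), (2, True), (9, False), (17, True)]
print(passes_cases(generated, "is_prime", cases))
```

This is essentially a hand-rolled, single-problem version of what HumanEval-style benchmarks do at scale.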

⚠️ Limitations

  • 1B parameters → limited long-context reasoning
  • Not optimized for natural language chat
  • May hallucinate on complex or ambiguous prompts
  • English-centric training data

🧭 Intended Use

✅ Allowed

  • Research and experimentation
  • Coding assistants
  • Educational tools
  • Prototyping LLM systems

πŸ™ Acknowledgements

  • Meta AI for Llama 3.2
  • CodeAlpaca dataset creators
  • Hugging Face ecosystem
  • QLoRA & PEFT authors