🧠 Llama-3.2-1B Code Solver (QLoRA Fine-Tuned)

A lightweight yet powerful code-focused language model fine-tuned from Meta Llama-3.2-1B using QLoRA (4-bit) on the CodeAlpaca-20K dataset.
Designed for efficient code generation, reasoning, and problem-solving on limited GPU resources.

🚀 Trained on a single Tesla P100 GPU
⚡ Optimized for Kaggle, Colab, and low-VRAM environments
🧩 Ideal for research, education, and rapid prototyping


πŸ” Model Overview

Attribute Value
Base Model meta-llama/Llama-3.2-1B
Model Type Decoder-only causal language model
Fine-Tuning Method QLoRA (4-bit quantization + LoRA)
LoRA Rank 16
Task Domain Code generation & code reasoning
Training Samples 10,000
Training Time ~5 hours
Hardware NVIDIA Tesla P100
Precision 4-bit (NF4)
Frameworks Hugging Face Transformers, PEFT, BitsAndBytes

🎯 What This Model Is Good At

  • 🧑‍💻 Code generation (Python-focused, but generalizable)
  • 🧠 Step-by-step coding reasoning
  • 🧪 Algorithmic problem solving
  • 📘 Educational coding assistance
  • ⚙️ Running efficiently on low-VRAM GPUs

📚 Training Dataset

CodeAlpaca-20K

A high-quality instruction-tuning dataset in the Alpaca format, specialized for coding tasks.

  • Total dataset size: 20,000 samples
  • Used for training: 10,000 samples (50%)
  • Data format:

    ```json
    {
      "instruction": "Describe the coding task",
      "input": "Optional context or input code",
      "output": "Expected code solution"
    }
    ```

  • Task Types:

    • Algorithm implementation
    • Code completion
    • Debugging
    • Function writing
    • Problem solving
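During fine-tuning, each record is typically rendered into a single prompt string. The exact template used for this model is not documented, so the following Alpaca-style formatter is an illustrative assumption:

```python
def format_example(example: dict) -> str:
    """Render a CodeAlpaca record into an Alpaca-style training prompt.

    NOTE: the exact template used for this model is not documented;
    this is the conventional Alpaca layout, shown for illustration.
    """
    if example.get("input"):
        return (
            "### Instruction:\n" + example["instruction"] + "\n\n"
            "### Input:\n" + example["input"] + "\n\n"
            "### Response:\n" + example["output"]
        )
    # Records with an empty "input" field omit the Input section entirely.
    return (
        "### Instruction:\n" + example["instruction"] + "\n\n"
        "### Response:\n" + example["output"]
    )


sample = {
    "instruction": "Write a function that reverses a string.",
    "input": "",
    "output": "def reverse(s):\n    return s[::-1]",
}
print(format_example(sample))
```

Whatever template is used, the important part is that inference-time prompts follow the same layout as training-time prompts.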

πŸ—οΈ Training Methodology

This model was fine-tuned using QLoRA, enabling efficient adaptation of large language models on limited hardware.

Key Techniques Used

  • 4-bit Quantization (NF4) via BitsAndBytes
  • LoRA adapters applied to attention layers
  • Frozen base model weights
  • Low-rank updates only

Why QLoRA?

  • 🔻 Drastically reduces GPU memory usage
  • ⚡ Enables training on consumer-grade GPUs
  • 📈 Maintains strong downstream performance
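The memory saving is easy to estimate from first principles: weight storage scales linearly with bits per parameter (this back-of-envelope sketch ignores activations, gradients, optimizer state, the KV cache, and NF4's small quantization-constant overhead):

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB.

    Weights only: ignores activations, gradients, optimizer state,
    the KV cache, and quantization-constant overhead.
    """
    return n_params * bits_per_param / 8 / 2**30


fp16_gib = weight_gib(1e9, 16)  # a 1B model in fp16: ~1.86 GiB
nf4_gib = weight_gib(1e9, 4)    # the same model in 4-bit NF4: ~0.47 GiB
print(f"fp16: {fp16_gib:.2f} GiB, nf4: {nf4_gib:.2f} GiB")
```

On top of the quantized base weights, only the small LoRA adapter matrices are trained and stored in higher precision, which is what makes a P100-class GPU sufficient.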

βš™οΈ Training Configuration

Parameter Value
Max Sequence Length 1024
LoRA Rank (r) 16
LoRA Alpha 32
LoRA Dropout 0.05
Optimizer AdamW
Learning Rate 2e-4
Batch Size Small (GPU-constrained)
Gradient Accumulation Enabled
Quantization 4-bit
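Given these hyperparameters, the training setup can be sketched with Transformers + PEFT. Note the `target_modules` list is an assumption: the card only says "attention layers" and does not name the exact projection modules.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 base-model loading, as described above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Assumption: standard Llama attention projections; the card
    # does not specify which modules received adapters.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Base weights stay frozen; only the low-rank adapter matrices train.
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

From here, training proceeds with a standard causal-LM loss (e.g. via the Hugging Face `Trainer`) over the formatted CodeAlpaca prompts.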

🚀 Usage

Load the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "YOUR_USERNAME/llama-3.2-1b-code-solver"

# 4-bit NF4 loading, matching the training-time quantization
# (the load_in_4bit= kwarg on from_pretrained is deprecated;
# pass a BitsAndBytesConfig instead)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
)
```

Example Inference

```python
prompt = "Write a Python function to check if a number is prime."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

🧪 Evaluation Notes

  • This model is instruction-tuned, not benchmark-optimized
  • No formal benchmarks (HumanEval / MBPP) were run
  • Best evaluated qualitatively, by inspecting the code it generates
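Qualitative evaluation can still be lightly automated by executing generated code against hand-written test cases. A minimal sketch, where the `generated` string stands in for actual model output:

```python
def passes_cases(code: str, func_name: str, cases) -> bool:
    """Exec generated code in a fresh namespace and check test cases.

    Only run code you trust, or sandbox it appropriately: exec()
    executes arbitrary Python.
    """
    ns = {}
    exec(code, ns)
    fn = ns[func_name]
    return all(fn(arg) == expected for arg, expected in cases)


# Stand-in for text the model might generate for the prime-check prompt.
generated = """
def is_prime(n):
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True
"""

cases = [(1, False), (2, True), (9, False), (17, True)]
print(passes_cases(generated, "is_prime", cases))
```

This is essentially a hand-rolled, single-problem version of what HumanEval-style benchmarks do at scale.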

⚠️ Limitations

  • 1B parameters → limited long-context reasoning
  • Not optimized for natural language chat
  • May hallucinate on complex or ambiguous prompts
  • English-centric training data

🧭 Intended Use

✅ Allowed

  • Research and experimentation
  • Coding assistants
  • Educational tools
  • Prototyping LLM systems

πŸ™ Acknowledgements

  • Meta AI for Llama 3.2
  • CodeAlpaca dataset creators
  • Hugging Face ecosystem
  • QLoRA & PEFT authors