---
language:
- en
license: llama3.2
base_model: unsloth/Llama-3.2-3B
tags:
- text-generation
- sql
- distributed-databases
- qlora
- peft
- fine-tuned
- e-commerce
pipeline_tag: text-generation
---
# Llama 3.2 3B — E-commerce Distributed SQL
Fine-tuned version of Llama 3.2 3B that converts natural language questions
into SQL queries for distributed e-commerce databases.
## Example
**Input:**
```
### Instruction:
Convert to distributed SQL
### Input:
Find all customers who spent more than 1000 euros in Germany
### Response:
```
**Output:**
```sql
SELECT * FROM customers
WHERE country = 'Germany' AND amount > 1000;
```
## Model Details
| Property | Value |
|----------|-------|
| Base model | Llama 3.2 3B |
| Fine-tuning method | QLoRA (4-bit quantization + LoRA) |
| LoRA rank | 16 |
| Trainable parameters | 0.14% |
| Training GPU | Google Colab T4 (free tier) |
| Training time | ~20 minutes |
| Dataset size | 25 examples |
| Training epochs | 3 |
## Training Details
Fine-tuned with QLoRA: the base model is loaded with 4-bit NF4 quantization, and
LoRA adapters are trained on the attention projections `q_proj` and `v_proj`.
This reduced memory requirements enough to train on a free Colab T4 GPU (about
15 GB of usable VRAM) in roughly 20 minutes while updating only 0.14% of the parameters.
**Libraries used:** HuggingFace Transformers, PEFT, TRL (SFTTrainer),
bitsandbytes, datasets
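
As a rough sanity check on the 0.14% figure, the LoRA parameter count can be worked out by hand. The sketch below assumes the public Llama 3.2 3B configuration (hidden size 3072, 28 layers, 8 key/value heads of dimension 128, about 3.21B parameters in total). None of these values are stated in this card, so treat them as assumptions.

```python
# Back-of-the-envelope count for rank-16 LoRA adapters on q_proj and v_proj.
# The architecture numbers below are assumptions taken from the public
# Llama 3.2 3B config, not from this model card.
HIDDEN_SIZE = 3072
NUM_LAYERS = 28
HEAD_DIM = 128
NUM_KV_HEADS = 8
TOTAL_BASE_PARAMS = 3_212_749_824  # assumed base model size (~3.21B)
LORA_RANK = 16


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


# q_proj maps hidden -> hidden; v_proj maps hidden -> (kv heads * head dim).
q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM, LORA_RANK)

trainable = NUM_LAYERS * (q_proj + v_proj)
fraction = trainable / (TOTAL_BASE_PARAMS + trainable)

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of all parameters: {fraction:.2%}")
```

Under these assumptions the adapters add about 4.6M parameters, roughly 0.14% of the total, which is consistent with the figure reported in the table above.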
## Dataset
25 natural language → SQL pairs covering distributed e-commerce scenarios:
- Orders across regions and shards
- Inventory across warehouses
- Customer analytics and segmentation
- Revenue aggregations
- JOIN queries across fragmented tables
**Prompt format used during training:**
```
### Instruction:
Convert to distributed SQL
### Input:
{natural language question}
### Response:
{SQL query}
```
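
The template above can be filled in programmatically. The following is a minimal sketch, not the original training script: `build_prompt` is a hypothetical helper, and the exact whitespace used during training is an assumption.

```python
from typing import Optional

INSTRUCTION = "Convert to distributed SQL"


def build_prompt(question: str, sql: Optional[str] = None) -> str:
    """Format a question (and optionally its SQL answer) in the card's prompt layout.

    Hypothetical helper: newline placement is assumed, not confirmed by the card.
    """
    prompt = (
        "### Instruction:\n"
        f"{INSTRUCTION}\n"
        "### Input:\n"
        f"{question.strip()}\n"
        "### Response:"
    )
    if sql is not None:
        # Training example: append the target query after the response header.
        prompt += "\n" + sql.strip()
    return prompt


# Inference prompt: the response is left empty for the model to complete.
print(build_prompt("Find top 5 customers by total order value"))
```

Calling `build_prompt(question, sql)` with both arguments yields a full training example, while omitting `sql` yields the prompt to pass at inference time.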
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Load the base model, then attach the fine-tuned LoRA adapter
base = AutoModelForCausalLM.from_pretrained("unsloth/Llama-3.2-3B")
model = PeftModel.from_pretrained(base, "haricharanhl22/ecommerce-distributed-sql")
tokenizer = AutoTokenizer.from_pretrained("haricharanhl22/ecommerce-distributed-sql")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Use the same prompt layout as in training and leave the response empty
query = """### Instruction:
Convert to distributed SQL
### Input:
Find top 5 customers by total order value
### Response:"""

# Greedy decoding (do_sample=False) keeps the output deterministic
result = pipe(query, max_new_tokens=100, do_sample=False)
print(result[0]["generated_text"])  # prompt followed by the generated SQL
```
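
By default the text-generation pipeline returns the prompt followed by the completion, and a model trained on this template may keep generating a new `###` header after its answer. The sketch below shows one way to isolate the SQL. `extract_sql` is a hypothetical helper, and the sample string is hand-written for illustration, not real model output.

```python
RESPONSE_MARKER = "### Response:"


def extract_sql(generated_text: str) -> str:
    """Return the text after the first response header, cut at any later '###' header."""
    text = generated_text
    if RESPONSE_MARKER in text:
        # Drop the echoed prompt, keeping only what follows the response header.
        text = text.split(RESPONSE_MARKER, 1)[1]
    # Discard anything the model generated after starting a new section.
    return text.split("###", 1)[0].strip()


# Hand-written sample for illustration only (not actual model output).
sample = (
    "### Instruction:\nConvert to distributed SQL\n"
    "### Input:\nCount all orders\n"
    "### Response:\nSELECT COUNT(*) FROM orders;\n"
    "### Instruction:\nConvert to distributed SQL"
)
print(extract_sql(sample))
```

With the snippet above, this would be applied as `extract_sql(result[0]["generated_text"])`.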
## Limitations
- Trained on only 25 examples, so it works best for common query patterns
- Optimized for e-commerce schemas (orders, customers, products, inventory)
- May not generalize well to complex queries with several levels of nested subqueries
- Generated SQL is closest to standard SQL / SQLite syntax
## Author
**Hari Charan Hosakote Lokesh**
M.Sc. Digital Engineering — Otto-von-Guericke-Universität Magdeburg
- GitHub: [haricharanhl22](https://github.com/haricharanhl22)
- LinkedIn: [haricharanhl22](https://linkedin.com/in/haricharanhl22)
- Live project: [ai-bewerbung-assistant.vercel.app](https://ai-bewerbung-assistant.vercel.app)