	---
	language:
	- en
	license: llama3.2
	base_model: unsloth/Llama-3.2-3B
	tags:
	- text-generation
	- sql
	- distributed-databases
	- qlora
	- peft
	- fine-tuned
	- e-commerce
	pipeline_tag: text-generation
	---

	# Llama 3.2 3B — E-commerce Distributed SQL

	A fine-tuned version of Llama 3.2 3B that converts natural-language questions
	into SQL queries for distributed e-commerce databases.

	## Example

	Input:
	```
	### Instruction:
	Convert to distributed SQL

	### Input:
	Find all customers who spent more than 1000 euros in Germany

	### Response:
	```

	Output:
	```sql
	SELECT * FROM customers
	WHERE country = 'Germany' AND amount > 1000;
	```

	## Model Details

	| Property | Value |
	|----------|-------|
	| Base model | Llama 3.2 3B |
	| Fine-tuning method | QLoRA (4-bit quantization + LoRA) |
	| LoRA rank | 16 |
	| Trainable parameters | 0.14% |
	| Training GPU | Google Colab T4 (free tier) |
	| Training time | ~20 minutes |
	| Dataset size | 25 examples |
	| Training epochs | 3 |

	## Training Details

	Fine-tuned with QLoRA: the base model is quantized to 4-bit NF4 and LoRA
	adapters are trained on the attention projections `q_proj` and `v_proj`. This
	cuts memory use enough to train on a free Colab T4 GPU (15 GB VRAM) in about
	20 minutes while updating only 0.14% of the parameters.

	Libraries used: Hugging Face Transformers, PEFT, TRL (SFTTrainer),
	bitsandbytes, and datasets.
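
The 0.14% figure can be sanity-checked with a little arithmetic. The sketch below counts the LoRA parameters for rank 16 on `q_proj` and `v_proj`. The layer count, hidden size, key/value width, and total parameter count are assumptions taken from the published Llama 3.2 3B configuration, not from this card.

```python
# Rough check of the trainable-parameter share for LoRA on q_proj and v_proj.
# ASSUMPTION: the architecture numbers below come from the public Llama 3.2 3B
# config (28 layers, hidden size 3072, 8 KV heads of 128 dims), not this card.
NUM_LAYERS = 28
HIDDEN = 3072
KV_DIM = 8 * 128              # output width of v_proj under grouped-query attention
TOTAL_PARAMS = 3_212_749_824  # assumed base-model parameter count
RANK = 16                     # LoRA rank stated in the model details


def lora_params(in_dim: int, out_dim: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: A (rank x in_dim) and B (out_dim x rank)."""
    return rank * (in_dim + out_dim)


per_layer = lora_params(HIDDEN, HIDDEN, RANK) + lora_params(HIDDEN, KV_DIM, RANK)
trainable = per_layer * NUM_LAYERS
percent = 100 * trainable / TOTAL_PARAMS

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of base model: {percent:.2f}%")
```

Under these assumptions the adapter holds roughly 4.6 million parameters. The exact percentage reported by PEFT can differ slightly depending on how quantized weights are counted.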

	## Dataset

	25 natural language → SQL pairs covering distributed e-commerce scenarios:
	- Orders across regions and shards
	- Inventory across warehouses
	- Customer analytics and segmentation
	- Revenue aggregations
	- JOIN queries across fragmented tables

	Prompt format used during training:
	```
	### Instruction:
	Convert to distributed SQL

	### Input:
	{natural language question}

	### Response:
	{SQL query}
	```
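
Because the model only saw this one layout during training, prompts at inference time should reproduce it exactly. A minimal sketch of a formatting helper follows; `build_prompt` and `INSTRUCTION` are hypothetical names introduced for illustration, not part of the released code.

```python
from typing import Optional

INSTRUCTION = "Convert to distributed SQL"


def build_prompt(question: str, sql: Optional[str] = None) -> str:
    """Format one example in the prompt layout described above.

    With `sql` given, the result is a full training example; without it,
    the prompt ends at the response header so the model can complete it.
    """
    prompt = (
        "### Instruction:\n"
        f"{INSTRUCTION}\n\n"
        "### Input:\n"
        f"{question.strip()}\n\n"
        "### Response:"
    )
    if sql is not None:
        prompt += "\n" + sql.strip()
    return prompt


print(build_prompt("Find top 5 customers by total order value"))
```

Using one function for both training examples and inference prompts keeps whitespace and headers identical, which matters for a model trained on so few examples.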

	## How to Use
	```python
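	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
	from peft import PeftModel

	# Load the base model, then attach the LoRA adapter from this repository.
	# float16 with device_map="auto" assumes a GPU; drop both arguments to run on CPU.
	base = AutoModelForCausalLM.from_pretrained(
	    "unsloth/Llama-3.2-3B", torch_dtype=torch.float16, device_map="auto"
	)
	model = PeftModel.from_pretrained(base, "haricharanhl22/ecommerce-distributed-sql")
	tokenizer = AutoTokenizer.from_pretrained("haricharanhl22/ecommerce-distributed-sql")

	pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

	# The prompt must match the training format exactly.
	query = """### Instruction:
	Convert to distributed SQL

	### Input:
	Find top 5 customers by total order value

	### Response:"""

	# Greedy decoding; return_full_text=False drops the echoed prompt from the output.
	result = pipe(query, max_new_tokens=100, do_sample=False, return_full_text=False)
	print(result[0]["generated_text"])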
	```
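
The text-generation pipeline may echo the prompt, and a small model can keep generating extra `###` sections after the query. The helper below is one possible way to isolate the SQL; `extract_sql` is a hypothetical name, and cutting at the next `###` header is an assumption about how the output tends to look.

```python
RESPONSE_MARKER = "### Response:"


def extract_sql(generated_text: str) -> str:
    """Return only the SQL portion of the generated text."""
    text = generated_text.lstrip()
    if text.startswith("### Instruction:"):
        # The prompt was echoed back: keep what follows the first response header.
        text = text.partition(RESPONSE_MARKER)[2]
    # Drop any further "###" sections the model may append after the query.
    return text.split("###", 1)[0].strip()


sample = (
    "### Instruction:\nConvert to distributed SQL\n\n"
    "### Input:\nFind all customers in Germany\n\n"
    "### Response:\nSELECT * FROM customers WHERE country = 'Germany';\n\n"
    "### Instruction:\n"
)
print(extract_sql(sample))
```

The function can be applied to `result[0]["generated_text"]` whether or not the prompt is included in the returned text.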

	## Limitations

	- Trained on a small dataset (25 examples), so it works best for common query patterns
	- Optimized for e-commerce schemas (orders, customers, products, inventory)
	- May not generalize well to deeply nested subqueries
	- Generated SQL is closest to standard SQL / SQLite syntax
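
Given these limitations, it can help to check generated queries before running them. Since the output is close to SQLite syntax, one option is to compile each query against an empty in-memory SQLite database. This is only a sketch: the schema below is hypothetical and should be replaced with your own tables, and `check_sql` is an illustrative name.

```python
import sqlite3
from typing import Optional

# HYPOTHETICAL schema for illustration only; replace with your real tables.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT, amount REAL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, region TEXT);
"""


def check_sql(sql: str, schema: str = SCHEMA) -> Optional[str]:
    """Return None if SQLite can compile `sql` against `schema`, else the error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN compiles the statement without running it, so syntax errors
        # and unknown tables or columns surface here.
        conn.execute("EXPLAIN " + sql)
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)
    finally:
        conn.close()


print(check_sql("SELECT * FROM customers WHERE country = 'Germany' AND amount > 1000;"))
print(check_sql("SELECT * FROM shipments;"))
```

A passing check only shows that the query compiles under SQLite. It does not show that the query answers the question, or that it is valid in the dialect of a particular distributed database.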

	## Author

	Hari Charan Hosakote Lokesh
	M.Sc. Digital Engineering — Otto-von-Guericke-Universität Magdeburg

	- GitHub: [haricharanhl22](https://github.com/haricharanhl22)
	- LinkedIn: [haricharanhl22](https://linkedin.com/in/haricharanhl22)
	- Live project: [ai-bewerbung-assistant.vercel.app](https://ai-bewerbung-assistant.vercel.app)