---
language:
- en
license: llama3.2
base_model: unsloth/Llama-3.2-3B
tags:
- text-generation
- sql
- distributed-databases
- qlora
- peft
- fine-tuned
- e-commerce
pipeline_tag: text-generation
---

# Llama 3.2 3B — E-commerce Distributed SQL

A fine-tuned version of Llama 3.2 3B that converts natural language questions into SQL queries for distributed e-commerce databases.

## Example

**Input:**

```
### Instruction:
Convert to distributed SQL

### Input:
Find all customers who spent more than 1000 euros in Germany

### Response:
```

**Output:**

```sql
SELECT * FROM customers WHERE country = 'Germany' AND amount > 1000;
```

## Model Details

| Property | Value |
|----------|-------|
| Base model | Llama 3.2 3B |
| Fine-tuning method | QLoRA (4-bit quantization + LoRA) |
| LoRA rank | 16 |
| Trainable parameters | 0.14% |
| Training GPU | Google Colab T4 (free tier) |
| Training time | ~20 minutes |
| Dataset size | 25 examples |
| Training epochs | 3 |

## Training Details

Fine-tuned using QLoRA: 4-bit NF4 quantization with LoRA adapters on the attention layers (`q_proj`, `v_proj`). This reduces memory requirements enough to train on a free Colab T4 GPU (15 GB VRAM) in under 20 minutes, while updating only 0.14% of the model's parameters.
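The 0.14% figure can be sanity-checked from the adapter shapes alone. The sketch below is a back-of-the-envelope calculation, not code from this repo; the architecture numbers (hidden size 3072, 28 decoder layers, grouped-query attention with 8 KV heads of head dim 128, ~3.21B total parameters) are assumptions taken from Llama 3.2 3B's published config.

```python
# Back-of-the-envelope check of the trainable-parameter fraction for
# rank-16 LoRA on q_proj and v_proj only. Architecture dimensions below
# are assumptions from Llama 3.2 3B's published config, not this repo.
rank = 16
hidden = 3072             # hidden size
kv_dim = 8 * 128          # 8 KV heads * head dim 128 = 1024 (GQA)
layers = 28               # decoder layers
total_params = 3.21e9     # approximate base-model parameter count

# Each LoRA adapter adds rank * (in_features + out_features) parameters.
per_layer = rank * (hidden + hidden)   # q_proj: 3072 -> 3072
per_layer += rank * (hidden + kv_dim)  # v_proj: 3072 -> 1024

trainable = per_layer * layers
print(f"{trainable:,} trainable params "
      f"({100 * trainable / total_params:.2f}% of the base model)")
# -> 4,587,520 trainable params (0.14% of the base model)
```

This agrees with the 0.14% reported in the table above.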
**Libraries used:** Hugging Face Transformers, PEFT, TRL (SFTTrainer), bitsandbytes, datasets

## Dataset

25 natural language → SQL pairs covering distributed e-commerce scenarios:

- Orders across regions and shards
- Inventory across warehouses
- Customer analytics and segmentation
- Revenue aggregations
- JOIN queries across fragmented tables

**Prompt format used during training:**

```
### Instruction:
Convert to distributed SQL

### Input:
{natural language question}

### Response:
{SQL query}
```

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel

# Load the base model and attach the LoRA adapter
base = AutoModelForCausalLM.from_pretrained("unsloth/Llama-3.2-3B")
model = PeftModel.from_pretrained(base, "haricharanhl22/ecommerce-distributed-sql")
tokenizer = AutoTokenizer.from_pretrained("haricharanhl22/ecommerce-distributed-sql")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

query = """### Instruction:
Convert to distributed SQL

### Input:
Find top 5 customers by total order value

### Response:"""

result = pipe(query, max_new_tokens=100, do_sample=False)
print(result[0]["generated_text"])
```

## Limitations

- Trained on a small dataset (25 examples), so it works best for common query patterns
- Optimized for e-commerce schemas (orders, customers, products, inventory)
- May not generalize well to complex, deeply nested subqueries
- SQL dialect is closest to standard SQL / SQLite

## Author

**Hari Charan Hosakote Lokesh**
M.Sc. Digital Engineering — Otto-von-Guericke-Universität Magdeburg

- GitHub: [haricharanhl22](https://github.com/haricharanhl22)
- LinkedIn: [haricharanhl22](https://linkedin.com/in/haricharanhl22)
- Live project: [ai-bewerbung-assistant.vercel.app](https://ai-bewerbung-assistant.vercel.app)
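With a fine-tune this small, inference prompts should match the training template verbatim, including the blank lines. A minimal sketch of a prompt builder; the function name `build_prompt` is hypothetical, not part of this repo:

```python
# Hypothetical helper (not shipped with this model) that reproduces the
# training prompt template exactly, blank lines included.
def build_prompt(question: str) -> str:
    return (
        "### Instruction:\n"
        "Convert to distributed SQL\n\n"
        "### Input:\n"
        f"{question}\n\n"
        "### Response:"
    )

print(build_prompt("Find top 5 customers by total order value"))
```

The returned string can be passed directly to the `pipeline` call shown in the How to Use section.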