---
language:
- en
license: apache-2.0
tags:
- text-to-sql
- sql
- tally
- accounting
- erp
- llama-3
- sqlcoder
- postgresql
- finance
base_model: defog/llama-3-sqlcoder-8b
library_name: peft
pipeline_tag: text-generation
---

# Tally SQLCoder - Fine-tuned for TallyPrime ERP

**Created by:** Jay Viramgami

A fine-tuned LLaMA 3 SQLCoder model specialized in converting natural language questions into PostgreSQL queries for **TallyPrime ERP** systems.

## 🎯 Model Description

This model is trained to understand the accounting and business terminology used in TallyPrime ERP and to generate accurate SQL queries against a PostgreSQL database migrated from Tally.

### Key Features

- 🏦 **Accounting Domain Expertise** - Understands financial terms, GST, vouchers, ledgers
- 📊 **28 Database Tables** - Covers all master and transaction tables from Tally
- 🎓 **ICAI Compliant** - Based on Indian accounting standards
- 🚀 **Fast Inference** - Optimized with QLoRA for efficient deployment
- 💯 **High Accuracy** - Fine-tuned on 5,000+ Tally-specific query pairs

### Use Cases

- Customer receivables and vendor payables analysis
- Sales and purchase reporting
- Inventory and stock management queries
- GST and tax compliance reports
- Financial statements (Profit & Loss, Balance Sheet)
- Voucher and transaction searches

## 📊 Model Details

- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b)
- **Fine-tuning Method:** QLoRA (4-bit quantization)
- **Training Data:** 5,000 synthetic Tally accounting text-to-SQL pairs
- **Target Database:** PostgreSQL (Tally migration schema with 28 tables)
- **Training Platform:** Kaggle (NVIDIA T4 GPU)
- **Training Time:** ~4 hours
- **Final Training Loss:** 0.05-0.07

### Query Categories Supported

| Category | Templates | Examples |
|----------|-----------|----------|
| Simple Filters | 20 | "Show all customers", "List bank accounts" |
| Date Ranges | 20 | "Sales in March 2024", "Payments this quarter" |
| Aggregations | 25 | "Total sales amount", "Top 10 customers" |
| Joins | 25 | "Customer-wise outstanding", "Item sales by godown" |
| Accounting | 20 | "P&L items", "Assets and liabilities" |
| GST/Tax | 15 | "GST collected", "TDS deducted" |
| Inventory | 15 | "Stock movements", "Items with zero balance" |
| Financial Statements | 10 | "Trial balance", "Balance sheet data" |

## 🚀 Quick Start

### Installation

```bash
pip install transformers peft torch accelerate bitsandbytes
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "defog/llama-3-sqlcoder-8b",
    device_map="auto",
    torch_dtype=torch.float16
)

# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "jaykv/tally-sqlcoder-finetuned")
tokenizer = AutoTokenizer.from_pretrained("jaykv/tally-sqlcoder-finetuned")

# Generate SQL
question = "Show all customers with outstanding balance above 50000"
schema = """CREATE TABLE mst_ledger (
    name VARCHAR(1024),
    parent VARCHAR(1024),
    closing_balance DECIMAL(17,2)
);"""

prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql = result.split("[SQL]")[-1].strip()
print(sql)
```

**Output:**

```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Sundry Debtors'
  AND closing_balance > 50000
```

## 📝 Example Queries

### Simple Customer Query

**Question:** "Show all
customers"

**Generated SQL:**

```sql
SELECT name
FROM mst_ledger
WHERE parent = 'Sundry Debtors'
```

### Sales Analysis

**Question:** "What is the total sales amount for March 2024?"

**Generated SQL:**

```sql
SELECT SUM(ABS(amount)) AS total_sales
FROM trn_accounting ta
JOIN trn_voucher tv ON ta.guid = tv.guid
WHERE tv.voucher_type = 'Sales'
  AND tv.date BETWEEN '2024-03-01' AND '2024-03-31'
```

### Top Customers

**Question:** "Show top 10 customers by sales"

**Generated SQL:**

```sql
SELECT tv.party_name, SUM(ABS(ta.amount)) AS total_sales
FROM trn_voucher tv
JOIN trn_accounting ta ON tv.guid = ta.guid
WHERE tv.voucher_type = 'Sales'
GROUP BY tv.party_name
ORDER BY total_sales DESC
LIMIT 10
```

### GST Query

**Question:** "Show GST payable amount"

**Generated SQL:**

```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Duties & Taxes'
  AND name LIKE '%GST%'
```

## 🗄️ Database Schema

The model is trained on a PostgreSQL schema with **28 tables** from TallyPrime:

### Master Tables (15)

- `mst_ledger` - Customers, vendors, banks, expenses, incomes
- `mst_group` - Account group hierarchy
- `mst_stock_item` - Inventory items with GST details
- `mst_stock_group` - Stock categories
- `mst_vouchertype` - Voucher type definitions
- `mst_godown` - Warehouse locations
- `mst_cost_centre` - Cost centers
- And 8 more...

### Transaction Tables (13)

- `trn_voucher` - All financial transactions
- `trn_accounting` - Ledger-wise entries
- `trn_inventory` - Item-wise stock movements
- `trn_bill` - Bill allocations
- `trn_bank` - Bank transaction details
- And 8 more...
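Because the model expects the relevant schema inside the prompt, multi-table questions (such as the join examples above) need the DDL of every table involved. The sketch below assembles such a prompt using the template from the Basic Usage section; `build_prompt` is a hypothetical helper, and the column lists shown are illustrative, not the exact migration schema:

```python
def build_prompt(question: str, table_ddls: list[str]) -> str:
    """Assemble a SQLCoder prompt from a question and a list of table DDL strings,
    following the [QUESTION]/[SQL] template used in the Basic Usage example."""
    schema = "\n\n".join(table_ddls)
    return f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]"""

# Illustrative DDL for the two tables the join examples rely on
# (column names taken from the example queries; types are assumed)
ddls = [
    """CREATE TABLE trn_voucher (
    guid VARCHAR(64),
    date DATE,
    voucher_type VARCHAR(64),
    party_name VARCHAR(256)
);""",
    """CREATE TABLE trn_accounting (
    guid VARCHAR(64),
    ledger VARCHAR(1024),
    amount DECIMAL(17,2)
);""",
]

prompt = build_prompt("Show top 10 customers by sales", ddls)
```

Keeping the schema limited to the tables a question actually needs also helps stay within the 2048-token context window.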
## 📈 Training Details

### Dataset

- **Size:** 5,000 text-to-SQL pairs
- **Source:** Synthetically generated using 150 query templates
- **Split:** 90/10 train/test
- **Categories:** 8 query types covering all Tally operations

### Training Configuration

- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Quantization:** 4-bit (NF4)
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Batch Size:** 2 per device
- **Gradient Accumulation:** 4 steps
- **Learning Rate:** 2e-4
- **Epochs:** ~3 (1,600 steps)
- **Optimizer:** PagedAdamW 8-bit
- **Max Sequence Length:** 2048 tokens

### Hardware

- **Platform:** Kaggle Notebooks
- **GPU:** NVIDIA T4 (16GB)
- **Training Time:** ~4 hours

## 📊 Performance

- **Valid SQL Syntax:** >95%
- **Keyword Match:** >85%
- **Exact Match (normalized):** >70%

## ⚠️ Limitations

- **Tally-Specific:** Optimized for the TallyPrime PostgreSQL schema
- **PostgreSQL Only:** Generated SQL targets the PostgreSQL dialect
- **Schema Required:** Needs the database schema in the prompt
- **Context Window:** Limited to 2048 tokens
- **Custom Schemas:** May require additional fine-tuning for non-Tally schemas

## 🔧 Deployment Tips

### For Production Use:

1. **Add validation** - Verify generated SQL before execution
2. **Read-only mode** - Restrict to SELECT queries only
3. **Query timeout** - Set execution time limits
4. **Error handling** - Catch and handle syntax errors
5. **Logging** - Track all queries for audit

### Optimization:

- Use GPU for faster inference (2-3 seconds per query)
- CPU inference works but is slower (~10-15 seconds)
- Consider caching frequently asked queries

## 📄 License

This model is released under the **Apache 2.0** license, inherited from the base model.
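The "validation" and "read-only" deployment tips above can be sketched as a simple pre-execution guard. This is illustrative only, with assumed keyword rules; a production deployment should pair it with a real SQL parser and a PostgreSQL role that has SELECT-only permissions:

```python
import re

# Keywords that indicate a write or DDL statement (illustrative, not exhaustive)
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|GRANT)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Accept a single read-only statement; reject writes and stacked statements."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # multiple statements chained together
        return False
    if not stripped.upper().startswith(("SELECT", "WITH")):
        return False
    return not FORBIDDEN.search(stripped)
```

Usage: run every generated query through the guard, and execute it only when the check passes (ideally with `statement_timeout` set on the PostgreSQL session, per the query-timeout tip).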
## 🙏 Acknowledgments

- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) by Defog.ai
- **Training Standards:** ICAI (Institute of Chartered Accountants of India) Foundation Course
- **Platform:** Trained on Kaggle's free GPU infrastructure

## 📧 Contact

**Author:** Jay Viramgami

For questions, feedback, or collaboration inquiries, please open an issue on the model's discussion page.

## 🔗 Related Resources

- [TallyPrime ERP](https://tallysolutions.com/)
- [Defog SQLCoder](https://github.com/defog-ai/sqlcoder)
- [PEFT Library](https://github.com/huggingface/peft)

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{tally-sqlcoder-finetuned,
  author = {Jay Viramgami},
  title = {Tally SQLCoder - Fine-tuned for TallyPrime ERP},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jaykv/tally-sqlcoder-finetuned}
}
```

---

**Model Card created by Jay Viramgami | March 2024**