---
language:
- en
license: apache-2.0
tags:
- text-to-sql
- sql
- tally
- accounting
- erp
- llama-3
- sqlcoder
- postgresql
- finance
base_model: defog/llama-3-sqlcoder-8b
library_name: peft
pipeline_tag: text-generation
---

# Tally SQLCoder - Fine-tuned for TallyPrime ERP

**Created by:** Jay Viramgami

A fine-tuned LLaMA 3 SQLCoder model specialized for converting natural language questions to PostgreSQL queries for **TallyPrime ERP** systems.

## Model Description

This model is specifically trained to understand the accounting and business terminology used in TallyPrime ERP and to generate accurate SQL queries for a PostgreSQL database migrated from Tally.

### Key Features

- **Accounting Domain Expertise** - Understands financial terms, GST, vouchers, ledgers
- **28 Database Tables** - Covers all master and transaction tables from Tally
- **ICAI Compliant** - Based on Indian accounting standards
- **Fast Inference** - Optimized with QLoRA for efficient deployment
- **High Accuracy** - Fine-tuned on 5,000+ Tally-specific query pairs

### Use Cases

- Customer receivables and vendor payables analysis
- Sales and purchase reporting
- Inventory and stock management queries
- GST and tax compliance reports
- Financial statements (Profit & Loss, Balance Sheet)
- Voucher and transaction searches

## Model Details

- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b)
- **Fine-tuning Method:** QLoRA (4-bit quantization)
- **Training Data:** 5,000 synthetic Tally accounting text-to-SQL pairs
- **Target Database:** PostgreSQL (Tally migration schema with 28 tables)
- **Training Platform:** Kaggle (NVIDIA T4 GPU)
- **Training Time:** ~4 hours
- **Final Training Loss:** 0.05-0.07

### Query Categories Supported

| Category | Templates | Examples |
|----------|-----------|----------|
| Simple Filters | 20 | "Show all customers", "List bank accounts" |
| Date Ranges | 20 | "Sales in March 2024", "Payments this quarter" |
| Aggregations | 25 | "Total sales amount", "Top 10 customers" |
| Joins | 25 | "Customer-wise outstanding", "Item sales by godown" |
| Accounting | 20 | "P&L items", "Assets and liabilities" |
| GST/Tax | 15 | "GST collected", "TDS deducted" |
| Inventory | 15 | "Stock movements", "Items with zero balance" |
| Financial Statements | 10 | "Trial balance", "Balance sheet data" |

## Quick Start

### Installation

```bash
pip install transformers peft torch accelerate bitsandbytes
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "defog/llama-3-sqlcoder-8b",
    device_map="auto",
    torch_dtype=torch.float16
)

# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "jaykv/tally-sqlcoder-finetuned")
tokenizer = AutoTokenizer.from_pretrained("jaykv/tally-sqlcoder-finetuned")

# Generate SQL
question = "Show all customers with outstanding balance above 50000"
schema = """CREATE TABLE mst_ledger (
    name VARCHAR(1024),
    parent VARCHAR(1024),
    closing_balance DECIMAL(17,2)
);"""

prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql = result.split("[SQL]")[-1].strip()
print(sql)
```

**Output:**
```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Sundry Debtors'
  AND closing_balance > 50000
```

## Example Queries

### Simple Customer Query
**Question:** "Show all customers"
**Generated SQL:**
```sql
SELECT name FROM mst_ledger WHERE parent = 'Sundry Debtors'
```

### Sales Analysis
**Question:** "What is the total sales amount for March 2024?"
**Generated SQL:**
```sql
SELECT SUM(ABS(amount)) AS total_sales
FROM trn_accounting ta
JOIN trn_voucher tv ON ta.guid = tv.guid
WHERE tv.voucher_type = 'Sales'
  AND tv.date BETWEEN '2024-03-01' AND '2024-03-31'
```

### Top Customers
**Question:** "Show top 10 customers by sales"
**Generated SQL:**
```sql
SELECT tv.party_name, SUM(ABS(ta.amount)) AS total_sales
FROM trn_voucher tv
JOIN trn_accounting ta ON tv.guid = ta.guid
WHERE tv.voucher_type = 'Sales'
GROUP BY tv.party_name
ORDER BY total_sales DESC
LIMIT 10
```

### GST Query
**Question:** "Show GST payable amount"
**Generated SQL:**
```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Duties & Taxes'
  AND name LIKE '%GST%'
```

## Database Schema

The model is trained on a PostgreSQL schema with **28 tables** from TallyPrime:

### Master Tables (15)
- `mst_ledger` - Customers, vendors, banks, expenses, incomes
- `mst_group` - Account group hierarchy
- `mst_stock_item` - Inventory items with GST details
- `mst_stock_group` - Stock categories
- `mst_vouchertype` - Voucher type definitions
- `mst_godown` - Warehouse locations
- `mst_cost_centre` - Cost centers
- And 8 more...

### Transaction Tables (13)
- `trn_voucher` - All financial transactions
- `trn_accounting` - Ledger-wise entries
- `trn_inventory` - Item-wise stock movements
- `trn_bill` - Bill allocations
- `trn_bank` - Bank transaction details
- And 8 more...

## Training Details

### Dataset
- **Size:** 5,000 text-to-SQL pairs
- **Source:** Synthetically generated using 150 query templates
- **Split:** 90/10 train/test
- **Categories:** 8 query types covering all Tally operations
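
As an illustration of the template approach, a pair generator can fill question/SQL templates with schema-specific slot values. The templates and slot lists below are hypothetical stand-ins; the actual 150 templates used for training are not published in this card.

```python
import random

# Hypothetical templates; the real 150 training templates are not part of this card.
TEMPLATES = [
    (
        "Show all ledgers under {group}",
        "SELECT name FROM mst_ledger WHERE parent = '{group}'",
    ),
    (
        "Total {vtype} amount",
        "SELECT SUM(ABS(amount)) FROM trn_accounting ta "
        "JOIN trn_voucher tv ON ta.guid = tv.guid "
        "WHERE tv.voucher_type = '{vtype}'",
    ),
]
GROUPS = ["Sundry Debtors", "Sundry Creditors", "Bank Accounts"]
VTYPES = ["Sales", "Purchase", "Payment"]

def make_pair(rng: random.Random) -> dict:
    """Fill one randomly chosen template with random slot values."""
    question_t, sql_t = rng.choice(TEMPLATES)
    slots = {"group": rng.choice(GROUPS), "vtype": rng.choice(VTYPES)}
    return {"question": question_t.format(**slots), "sql": sql_t.format(**slots)}

rng = random.Random(0)
pairs = [make_pair(rng) for _ in range(5)]
```

Scaling this to 150 templates and thousands of slot combinations yields a dataset of the size described above.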

### Training Configuration
- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Quantization:** 4-bit (NF4)
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Batch Size:** 2 per device
- **Gradient Accumulation:** 4 steps
- **Learning Rate:** 2e-4
- **Epochs:** ~3 (1,600 steps)
- **Optimizer:** PagedAdamW 8-bit
- **Max Sequence Length:** 2048 tokens
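
The configuration above corresponds roughly to the following `peft`/`bitsandbytes` setup. This is a sketch, not the exact training script: the dataset and trainer wiring are omitted, and `lora_dropout` is an assumed value not stated in this card.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, matching the settings listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter matching the stated rank, alpha, and target modules
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,  # assumption: dropout is not stated in this card
    task_type="CAUSAL_LM",
)
```

The base model would be loaded with `quantization_config=bnb_config` and wrapped with `get_peft_model(model, lora_config)` before training.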

### Hardware
- **Platform:** Kaggle Notebooks
- **GPU:** NVIDIA T4 (16 GB)
- **Training Time:** ~4 hours

## Performance

- **Valid SQL Syntax:** >95%
- **Keyword Match:** >85%
- **Exact Match (normalized):** >70%
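
"Exact Match (normalized)" means the generated and reference SQL agree after simple normalization. The exact normalization used in evaluation is not published, so the following is an assumed minimal version (lowercasing, whitespace collapsing, trailing-semicolon removal):

```python
import re

def normalize_sql(sql: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing semicolon."""
    sql = sql.strip().rstrip(";")
    sql = re.sub(r"\s+", " ", sql)
    return sql.lower()

def exact_match(generated: str, reference: str) -> bool:
    return normalize_sql(generated) == normalize_sql(reference)

print(exact_match(
    "SELECT name\nFROM mst_ledger;",
    "select name from mst_ledger",
))  # True under this normalization
```

Note that this treats semantically equivalent but differently written queries (e.g. reordered columns) as mismatches, which is why the keyword-match figure is higher.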

## Limitations

- **Tally-Specific:** Optimized for the TallyPrime PostgreSQL schema
- **PostgreSQL Only:** SQL is generated in the PostgreSQL dialect
- **Schema Required:** Needs the database schema in the prompt
- **Context Window:** Limited to 2048 tokens
- **Custom Schemas:** May require additional fine-tuning for non-Tally schemas

## Deployment Tips

### For Production Use
1. **Add validation** - Verify generated SQL before execution
2. **Read-only mode** - Restrict to SELECT queries only
3. **Query timeout** - Set execution time limits
4. **Error handling** - Catch and handle syntax errors
5. **Logging** - Track all queries for audit
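
For the validation and read-only points, a minimal guard can reject anything that is not a single SELECT statement. This is a sketch only; a real deployment should also use a read-only database role and parser-based validation rather than keyword filtering alone.

```python
import re

# Keywords that should never appear in a read-only query
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Accept a single SELECT statement; reject writes and stacked queries."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # stacked statements like "SELECT 1; DELETE ..."
        return False
    if not stripped.lower().startswith("select"):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_select("SELECT name FROM mst_ledger"))        # True
print(is_safe_select("DROP TABLE mst_ledger"))              # False
print(is_safe_select("SELECT 1; DELETE FROM trn_voucher"))  # False
```

Query timeouts can then be enforced at the database level, e.g. PostgreSQL's `statement_timeout` setting.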

### Optimization
- Use a GPU for faster inference (2-3 seconds per query)
- CPU inference works but is slower (~10-15 seconds per query)
- Consider caching frequently asked queries
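
Caching can be as simple as memoizing on the (question, schema) pair with the standard library. In this sketch, `generate_sql` is a placeholder for the actual model call shown in Basic Usage above.

```python
from functools import lru_cache

def generate_sql(question: str, schema: str) -> str:
    """Placeholder for the real model inference call."""
    return f"-- SQL for: {question}"

@lru_cache(maxsize=1024)
def cached_generate_sql(question: str, schema: str) -> str:
    # Identical (question, schema) pairs skip model inference entirely.
    return generate_sql(question, schema)

first = cached_generate_sql("Show all customers", "CREATE TABLE mst_ledger (...)")
second = cached_generate_sql("Show all customers", "CREATE TABLE mst_ledger (...)")
```

For repeated dashboard-style queries, this avoids the 2-15 second inference cost on every request.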

## License

This model is released under the **Apache 2.0** license, inherited from the base model.

## Acknowledgments

- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) by Defog.ai
- **Training Standards:** ICAI (Institute of Chartered Accountants of India) Foundation Course
- **Platform:** Trained on Kaggle's free GPU infrastructure

## Contact

**Author:** Jay Viramgami

For questions, feedback, or collaboration inquiries, please open an issue on the model's discussion page.

## Related Resources

- [TallyPrime ERP](https://tallysolutions.com/)
- [Defog SQLCoder](https://github.com/defog-ai/sqlcoder)
- [PEFT Library](https://github.com/huggingface/peft)

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{tally-sqlcoder-finetuned,
  author = {Jay Viramgami},
  title = {Tally SQLCoder - Fine-tuned for TallyPrime ERP},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/jaykv/tally-sqlcoder-finetuned}
}
```

---

**Model Card created by Jay Viramgami | March 2024**