---
language:
- en
license: apache-2.0
tags:
- text-to-sql
- sql
- tally
- accounting
- erp
- llama-3
- sqlcoder
- postgresql
- finance
base_model: defog/llama-3-sqlcoder-8b
library_name: peft
pipeline_tag: text-generation
---
# Tally SQLCoder - Fine-tuned for TallyPrime ERP
**Created by:** Jay Viramgami
A fine-tuned LLaMA 3 SQLCoder model specialized for converting natural language questions to PostgreSQL queries for **TallyPrime ERP** systems.
## Model Description
This model is specifically trained to understand accounting and business terminology used in TallyPrime ERP and generate accurate SQL queries for a PostgreSQL database migrated from Tally.
### Key Features
- **Accounting Domain Expertise** - Understands financial terms, GST, vouchers, ledgers
- **28 Database Tables** - Covers all master and transaction tables from Tally
- **ICAI Compliant** - Based on Indian accounting standards
- **Fast Inference** - Optimized with QLoRA for efficient deployment
- **High Accuracy** - Fine-tuned on 5,000+ Tally-specific query pairs
### Use Cases
- Customer receivables and vendor payables analysis
- Sales and purchase reporting
- Inventory and stock management queries
- GST and tax compliance reports
- Financial statements (Profit & Loss, Balance Sheet)
- Voucher and transaction searches
## Model Details
- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b)
- **Fine-tuning Method:** QLoRA (4-bit quantization)
- **Training Data:** 5,000 synthetic Tally accounting text-to-SQL pairs
- **Target Database:** PostgreSQL (Tally migration schema with 28 tables)
- **Training Platform:** Kaggle (NVIDIA T4 GPU)
- **Training Time:** ~4 hours
- **Final Training Loss:** 0.05-0.07
### Query Categories Supported
| Category | Templates | Examples |
|----------|-----------|----------|
| Simple Filters | 20 | "Show all customers", "List bank accounts" |
| Date Ranges | 20 | "Sales in March 2024", "Payments this quarter" |
| Aggregations | 25 | "Total sales amount", "Top 10 customers" |
| Joins | 25 | "Customer-wise outstanding", "Item sales by godown" |
| Accounting | 20 | "P&L items", "Assets and liabilities" |
| GST/Tax | 15 | "GST collected", "TDS deducted" |
| Inventory | 15 | "Stock movements", "Items with zero balance" |
| Financial Statements | 10 | "Trial balance", "Balance sheet data" |
## Quick Start
### Installation
```bash
pip install transformers peft torch accelerate bitsandbytes
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"defog/llama-3-sqlcoder-8b",
device_map="auto",
torch_dtype=torch.float16
)
# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "jaykv/tally-sqlcoder-finetuned")
tokenizer = AutoTokenizer.from_pretrained("jaykv/tally-sqlcoder-finetuned")
# Generate SQL
question = "Show all customers with outstanding balance above 50000"
schema = """CREATE TABLE mst_ledger (
name VARCHAR(1024),
parent VARCHAR(1024),
closing_balance DECIMAL(17,2)
);"""
prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]
### Database Schema
The query will run on a database with the following schema:
{schema}
### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]"""
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=300,
temperature=0.1,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql = result.split("[SQL]")[-1].strip()
print(sql)
```
**Output:**
```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Sundry Debtors'
AND closing_balance > 50000
```
## Example Queries
### Simple Customer Query
**Question:** "Show all customers"
**Generated SQL:**
```sql
SELECT name FROM mst_ledger WHERE parent = 'Sundry Debtors'
```
### Sales Analysis
**Question:** "What is the total sales amount for March 2024?"
**Generated SQL:**
```sql
SELECT SUM(ABS(amount)) AS total_sales
FROM trn_accounting ta
JOIN trn_voucher tv ON ta.guid = tv.guid
WHERE tv.voucher_type = 'Sales'
AND tv.date BETWEEN '2024-03-01' AND '2024-03-31'
```
### Top Customers
**Question:** "Show top 10 customers by sales"
**Generated SQL:**
```sql
SELECT tv.party_name, SUM(ABS(ta.amount)) AS total_sales
FROM trn_voucher tv
JOIN trn_accounting ta ON tv.guid = ta.guid
WHERE tv.voucher_type = 'Sales'
GROUP BY tv.party_name
ORDER BY total_sales DESC
LIMIT 10
```
### GST Query
**Question:** "Show GST payable amount"
**Generated SQL:**
```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Duties & Taxes'
AND name LIKE '%GST%'
```
## Database Schema
The model is trained on a PostgreSQL schema with **28 tables** from TallyPrime:
### Master Tables (15)
- `mst_ledger` - Customers, vendors, banks, expenses, incomes
- `mst_group` - Account group hierarchy
- `mst_stock_item` - Inventory items with GST details
- `mst_stock_group` - Stock categories
- `mst_vouchertype` - Voucher type definitions
- `mst_godown` - Warehouse locations
- `mst_cost_centre` - Cost centers
- And 8 more...
### Transaction Tables (13)
- `trn_voucher` - All financial transactions
- `trn_accounting` - Ledger-wise entries
- `trn_inventory` - Item-wise stock movements
- `trn_bill` - Bill allocations
- `trn_bank` - Bank transaction details
- And 8 more...
## Training Details
### Dataset
- **Size:** 5,000 text-to-SQL pairs
- **Source:** Synthetically generated using 150 query templates
- **Split:** 90/10 train/test
- **Categories:** 8 query types covering all Tally operations
### Training Configuration
- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Quantization:** 4-bit (NF4)
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Batch Size:** 2 per device
- **Gradient Accumulation:** 4 steps
- **Learning Rate:** 2e-4
- **Epochs:** ~3 (1,600 steps)
- **Optimizer:** PagedAdamW 8-bit
- **Max Sequence Length:** 2048 tokens
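The hyperparameters above map onto `peft` and `bitsandbytes` configuration objects roughly as follows. This is an illustrative sketch, not the exact training script; the `lora_dropout` value and `task_type` are assumptions not stated in this card.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, matching the settings listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapter targeting the four attention projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,  # assumption: dropout not stated in this card
    task_type="CAUSAL_LM",
)
```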
### Hardware
- **Platform:** Kaggle Notebooks
- **GPU:** NVIDIA T4 (16GB)
- **Training Time:** ~4 hours
## Performance
- **Valid SQL Syntax:** >95%
- **Keyword Match:** >85%
- **Exact Match (normalized):** >70%
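The "exact match (normalized)" figure presumably compares queries after canonicalizing case, whitespace, and trailing semicolons. The evaluation code is not published, so the following stdlib-only normalizer is illustrative of the idea rather than the actual metric:

```python
import re

def normalize_sql(sql: str) -> str:
    """Canonicalize a SQL string for exact-match comparison."""
    sql = sql.strip().rstrip(";")   # drop trailing semicolon
    sql = re.sub(r"\s+", " ", sql)  # collapse all whitespace runs
    return sql.lower()              # case-insensitive comparison

a = "SELECT name\nFROM mst_ledger;"
b = "select name from  mst_ledger"
print(normalize_sql(a) == normalize_sql(b))  # True
```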
## Limitations
- **Tally-Specific:** Optimized for TallyPrime PostgreSQL schema
- **PostgreSQL Only:** SQL generated for PostgreSQL dialect
- **Schema Required:** Needs database schema in the prompt
- **Context Window:** Limited to 2048 tokens
- **Custom Schemas:** May require additional fine-tuning for non-Tally schemas
## Deployment Tips
### For Production Use:
1. **Add validation** - Verify generated SQL before execution
2. **Read-only mode** - Restrict to SELECT queries only
3. **Query timeout** - Set execution time limits
4. **Error handling** - Catch and handle syntax errors
5. **Logging** - Track all queries for audit
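Points 1-2 can be enforced with a simple guard before any generated query touches the database. A minimal stdlib-only sketch (a real deployment should also connect with a read-only database role rather than rely on string checks alone):

```python
import re

# conservative keyword blocklist; \b prevents matching e.g. "created_at"
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Allow only a single SELECT statement with no write/DDL keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement input
        return False
    if not stripped.lower().startswith("select"):
        return False
    return not FORBIDDEN.search(stripped)

print(is_safe_select("SELECT name FROM mst_ledger"))        # True
print(is_safe_select("DROP TABLE mst_ledger"))              # False
print(is_safe_select("SELECT 1; DELETE FROM trn_voucher"))  # False
```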
### Optimization:
- Use GPU for faster inference (2-3 seconds per query)
- CPU inference works but is slower (~10-15 seconds)
- Consider caching frequently asked queries
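Caching (the last point above) can be as simple as memoizing on the `(question, schema)` pair. A sketch using `functools.lru_cache`, with the model call replaced by a placeholder standing in for the Quick Start inference code:

```python
from functools import lru_cache

def generate_sql(question: str, schema: str) -> str:
    # placeholder for the actual model inference shown in Quick Start
    return f"-- SQL for: {question}"

@lru_cache(maxsize=256)
def cached_generate(question: str, schema: str) -> str:
    """Memoize generated SQL per (question, schema) pair."""
    return generate_sql(question, schema)

schema = "CREATE TABLE mst_ledger (name VARCHAR, closing_balance DECIMAL);"
cached_generate("Show all customers", schema)  # runs inference
cached_generate("Show all customers", schema)  # served from cache
print(cached_generate.cache_info().hits)  # 1
```

Note that `lru_cache` keys on exact string equality, so trivially rephrased questions still miss the cache; normalizing the question text first improves the hit rate.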
## License
This model is released under the **Apache 2.0** license, inheriting from the base model.
## Acknowledgments
- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) by Defog.ai
- **Training Standards:** ICAI (Institute of Chartered Accountants of India) Foundation Course
- **Platform:** Trained on Kaggle's free GPU infrastructure
## Contact
**Author:** Jay Viramgami
For questions, feedback, or collaboration inquiries, please open an issue on the model's discussion page.
## Related Resources
- [TallyPrime ERP](https://tallysolutions.com/)
- [Defog SQLCoder](https://github.com/defog-ai/sqlcoder)
- [PEFT Library](https://github.com/huggingface/peft)
---
## Citation
If you use this model in your work, please cite:
```bibtex
@misc{tally-sqlcoder-finetuned,
author = {Jay Viramgami},
title = {Tally SQLCoder - Fine-tuned for TallyPrime ERP},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/jaykv/tally-sqlcoder-finetuned}
}
```
---
**Model Card created by Jay Viramgami | March 2024**