File size: 8,494 Bytes
4b269de
19d65fc
 
 
 
 
 
 
 
 
 
 
 
 
4b269de
 
19d65fc
4b269de
 
19d65fc
4b269de
19d65fc
4b269de
19d65fc
4b269de
19d65fc
4b269de
19d65fc
4b269de
19d65fc
4b269de
19d65fc
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4b269de
19d65fc
4b269de
19d65fc
4b269de
19d65fc
4b269de
19d65fc
 
 
 
 
 
 
 
 
4b269de
19d65fc
4b269de
19d65fc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
---
language:
- en
license: apache-2.0
tags:
- text-to-sql
- sql
- tally
- accounting
- erp
- llama-3
- sqlcoder
- postgresql
- finance
base_model: defog/llama-3-sqlcoder-8b
library_name: peft
pipeline_tag: text-generation
---

# Tally SQLCoder - Fine-tuned for TallyPrime ERP

**Created by:** Jay Viramgami

A fine-tuned LLaMA 3 SQLCoder model specialized for converting natural language questions to PostgreSQL queries for **TallyPrime ERP** systems.

## 🎯 Model Description

This model is specifically trained to understand accounting and business terminology used in TallyPrime ERP and generate accurate SQL queries for a PostgreSQL database migrated from Tally.

### Key Features

- 🏦 **Accounting Domain Expertise** - Understands financial terms, GST, vouchers, ledgers
- πŸ“Š **28 Database Tables** - Covers all master and transaction tables from Tally
- πŸŽ“ **ICAI Compliant** - Based on Indian accounting standards
- πŸš€ **Fast Inference** - Optimized with QLoRA for efficient deployment
- πŸ’― **High Accuracy** - Fine-tuned on 5,000+ Tally-specific query pairs

### Use Cases

- Customer receivables and vendor payables analysis
- Sales and purchase reporting
- Inventory and stock management queries
- GST and tax compliance reports
- Financial statements (Profit & Loss, Balance Sheet)
- Voucher and transaction searches

## πŸ“Š Model Details

- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b)
- **Fine-tuning Method:** QLoRA (4-bit quantization)
- **Training Data:** 5,000 synthetic Tally accounting text-to-SQL pairs
- **Target Database:** PostgreSQL (Tally migration schema with 28 tables)
- **Training Platform:** Kaggle (NVIDIA T4 GPU)
- **Training Time:** ~4 hours
- **Final Training Loss:** 0.05-0.07

### Query Categories Supported

| Category | Templates | Examples |
|----------|-----------|----------|
| Simple Filters | 20 | "Show all customers", "List bank accounts" |
| Date Ranges | 20 | "Sales in March 2024", "Payments this quarter" |
| Aggregations | 25 | "Total sales amount", "Top 10 customers" |
| Joins | 25 | "Customer-wise outstanding", "Item sales by godown" |
| Accounting | 20 | "P&L items", "Assets and liabilities" |
| GST/Tax | 15 | "GST collected", "TDS deducted" |
| Inventory | 15 | "Stock movements", "Items with zero balance" |
| Financial Statements | 10 | "Trial balance", "Balance sheet data" |

## πŸš€ Quick Start

### Installation

```bash
pip install transformers peft torch accelerate bitsandbytes
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "defog/llama-3-sqlcoder-8b",
    device_map="auto",
    torch_dtype=torch.float16
)

# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "jaykv/tally-sqlcoder-finetuned")
tokenizer = AutoTokenizer.from_pretrained("jaykv/tally-sqlcoder-finetuned")

# Generate SQL
question = "Show all customers with outstanding balance above 50000"
schema = """CREATE TABLE mst_ledger (
    name VARCHAR(1024),
    parent VARCHAR(1024),
    closing_balance DECIMAL(17,2)
);"""

prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]

### Database Schema
The query will run on a database with the following schema:
{schema}

### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]"""

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=300,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql = result.split("[SQL]")[-1].strip()
print(sql)
```

**Output:**
```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Sundry Debtors'
AND closing_balance > 50000
```

## πŸ“ Example Queries

### Simple Customer Query
**Question:** "Show all customers"
**Generated SQL:**
```sql
SELECT name FROM mst_ledger WHERE parent = 'Sundry Debtors'
```

### Sales Analysis
**Question:** "What is the total sales amount for March 2024?"
**Generated SQL:**
```sql
SELECT SUM(ABS(amount)) as total_sales
FROM trn_accounting ta
JOIN trn_voucher tv ON ta.guid = tv.guid
WHERE tv.voucher_type = 'Sales'
AND tv.date BETWEEN '2024-03-01' AND '2024-03-31'
```

### Top Customers
**Question:** "Show top 10 customers by sales"
**Generated SQL:**
```sql
SELECT tv.party_name, SUM(ABS(ta.amount)) as total_sales
FROM trn_voucher tv
JOIN trn_accounting ta ON tv.guid = ta.guid
WHERE tv.voucher_type = 'Sales'
GROUP BY tv.party_name
ORDER BY total_sales DESC
LIMIT 10
```

### GST Query
**Question:** "Show GST payable amount"
**Generated SQL:**
```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Duties & Taxes'
AND name LIKE '%GST%'
```

## πŸ—„οΈ Database Schema

The model is trained on a PostgreSQL schema with **28 tables** from TallyPrime:

### Master Tables (15)
- `mst_ledger` - Customers, vendors, banks, expenses, incomes
- `mst_group` - Account group hierarchy
- `mst_stock_item` - Inventory items with GST details
- `mst_stock_group` - Stock categories
- `mst_vouchertype` - Voucher type definitions
- `mst_godown` - Warehouse locations
- `mst_cost_centre` - Cost centers
- And 8 more...

### Transaction Tables (13)
- `trn_voucher` - All financial transactions
- `trn_accounting` - Ledger-wise entries
- `trn_inventory` - Item-wise stock movements
- `trn_bill` - Bill allocations
- `trn_bank` - Bank transaction details
- And 8 more...

## πŸ“ˆ Training Details

### Dataset
- **Size:** 5,000 text-to-SQL pairs
- **Source:** Synthetically generated using 150 query templates
- **Split:** 90/10 train/test
- **Categories:** 8 query types covering all Tally operations

### Training Configuration
- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Quantization:** 4-bit (NF4)
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Batch Size:** 2 per device
- **Gradient Accumulation:** 4 steps
- **Learning Rate:** 2e-4
- **Epochs:** ~3 (1,600 steps)
- **Optimizer:** PagedAdamW 8-bit
- **Max Sequence Length:** 2048 tokens

### Hardware
- **Platform:** Kaggle Notebooks
- **GPU:** NVIDIA T4 (16GB)
- **Training Time:** ~4 hours

## πŸ“Š Performance

- **Valid SQL Syntax:** >95%
- **Keyword Match:** >85%
- **Exact Match (normalized):** >70%

## ⚠️ Limitations

- **Tally-Specific:** Optimized for TallyPrime PostgreSQL schema
- **PostgreSQL Only:** SQL generated for PostgreSQL dialect
- **Schema Required:** Needs database schema in the prompt
- **Context Window:** Limited to 2048 tokens
- **Custom Schemas:** May require additional fine-tuning for non-Tally schemas

## πŸ”§ Deployment Tips

### For Production Use:
1. **Add validation** - Verify generated SQL before execution
2. **Read-only mode** - Restrict to SELECT queries only
3. **Query timeout** - Set execution time limits
4. **Error handling** - Catch and handle syntax errors
5. **Logging** - Track all queries for audit

### Optimization:
- Use GPU for faster inference (2-3 seconds per query)
- CPU inference works but is slower (~10-15 seconds)
- Consider caching frequently asked queries

## πŸ“„ License

This model is released under the **Apache 2.0** license, inheriting from the base model.

## πŸ™ Acknowledgments

- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) by Defog.ai
- **Training Standards:** ICAI (Institute of Chartered Accountants of India) Foundation Course
- **Platform:** Trained on Kaggle's free GPU infrastructure

## πŸ“§ Contact

**Author:** Jay Viramgami

For questions, feedback, or collaboration inquiries, please open an issue on the model's discussion page.

## πŸ”— Related Resources

- [TallyPrime ERP](https://tallysolutions.com/)
- [Defog SQLCoder](https://github.com/defog-ai/sqlcoder)
- [PEFT Library](https://github.com/huggingface/peft)

---

## Citation

If you use this model in your work, please cite:

```bibtex
@misc{tally-sqlcoder-finetuned,
  author = {Jay Viramgami},
  title = {Tally SQLCoder - Fine-tuned for TallyPrime ERP},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jaykv/tally-sqlcoder-finetuned}
}
```

---

**Model Card created by Jay Viramgami | March 2024**