---
language:
- en
license: apache-2.0
tags:
- text-to-sql
- sql
- tally
- accounting
- erp
- llama-3
- sqlcoder
- postgresql
- finance
base_model: defog/llama-3-sqlcoder-8b
library_name: peft
pipeline_tag: text-generation
---
# Tally SQLCoder - Fine-tuned for TallyPrime ERP
**Created by:** Jay Viramgami
A fine-tuned LLaMA 3 SQLCoder model specialized for converting natural language questions to PostgreSQL queries for **TallyPrime ERP** systems.
## 🎯 Model Description
This model is specifically trained to understand accounting and business terminology used in TallyPrime ERP and generate accurate SQL queries for a PostgreSQL database migrated from Tally.
### Key Features
- 🏦 **Accounting Domain Expertise** - Understands financial terms, GST, vouchers, ledgers
- 📊 **28 Database Tables** - Covers all master and transaction tables from Tally
- 🎓 **ICAI Compliant** - Based on Indian accounting standards
- 🚀 **Fast Inference** - Optimized with QLoRA for efficient deployment
- 💯 **High Accuracy** - Fine-tuned on 5,000 Tally-specific query pairs
### Use Cases
- Customer receivables and vendor payables analysis
- Sales and purchase reporting
- Inventory and stock management queries
- GST and tax compliance reports
- Financial statements (Profit & Loss, Balance Sheet)
- Voucher and transaction searches
## 📊 Model Details
- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b)
- **Fine-tuning Method:** QLoRA (4-bit quantization)
- **Training Data:** 5,000 synthetic Tally accounting text-to-SQL pairs
- **Target Database:** PostgreSQL (Tally migration schema with 28 tables)
- **Training Platform:** Kaggle (NVIDIA T4 GPU)
- **Training Time:** ~4 hours
- **Final Training Loss:** 0.05-0.07
### Query Categories Supported
| Category | Templates | Examples |
|----------|-----------|----------|
| Simple Filters | 20 | "Show all customers", "List bank accounts" |
| Date Ranges | 20 | "Sales in March 2024", "Payments this quarter" |
| Aggregations | 25 | "Total sales amount", "Top 10 customers" |
| Joins | 25 | "Customer-wise outstanding", "Item sales by godown" |
| Accounting | 20 | "P&L items", "Assets and liabilities" |
| GST/Tax | 15 | "GST collected", "TDS deducted" |
| Inventory | 15 | "Stock movements", "Items with zero balance" |
| Financial Statements | 10 | "Trial balance", "Balance sheet data" |
## 🚀 Quick Start
### Installation
```bash
pip install transformers peft torch accelerate bitsandbytes
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"defog/llama-3-sqlcoder-8b",
device_map="auto",
torch_dtype=torch.float16
)
# Load fine-tuned adapter
model = PeftModel.from_pretrained(base_model, "jaykv/tally-sqlcoder-finetuned")
tokenizer = AutoTokenizer.from_pretrained("jaykv/tally-sqlcoder-finetuned")
# Generate SQL
question = "Show all customers with outstanding balance above 50000"
schema = """CREATE TABLE mst_ledger (
name VARCHAR(1024),
parent VARCHAR(1024),
closing_balance DECIMAL(17,2)
);"""
prompt = f"""### Task
Generate a SQL query to answer [QUESTION]{question}[/QUESTION]
### Database Schema
The query will run on a database with the following schema:
{schema}
### Answer
Given the database schema, here is the SQL query that answers [QUESTION]{question}[/QUESTION]
[SQL]"""
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048).to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=300,
temperature=0.1,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
sql = result.split("[SQL]")[-1].strip()
print(sql)
```
**Output:**
```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Sundry Debtors'
AND closing_balance > 50000
```
## πŸ“ Example Queries
### Simple Customer Query
**Question:** "Show all customers"
**Generated SQL:**
```sql
SELECT name FROM mst_ledger WHERE parent = 'Sundry Debtors'
```
### Sales Analysis
**Question:** "What is the total sales amount for March 2024?"
**Generated SQL:**
```sql
SELECT SUM(ABS(amount)) as total_sales
FROM trn_accounting ta
JOIN trn_voucher tv ON ta.guid = tv.guid
WHERE tv.voucher_type = 'Sales'
AND tv.date BETWEEN '2024-03-01' AND '2024-03-31'
```
### Top Customers
**Question:** "Show top 10 customers by sales"
**Generated SQL:**
```sql
SELECT tv.party_name, SUM(ABS(ta.amount)) as total_sales
FROM trn_voucher tv
JOIN trn_accounting ta ON tv.guid = ta.guid
WHERE tv.voucher_type = 'Sales'
GROUP BY tv.party_name
ORDER BY total_sales DESC
LIMIT 10
```
### GST Query
**Question:** "Show GST payable amount"
**Generated SQL:**
```sql
SELECT name, closing_balance
FROM mst_ledger
WHERE parent = 'Duties & Taxes'
AND name LIKE '%GST%'
```
## πŸ—„οΈ Database Schema
The model is trained on a PostgreSQL schema with **28 tables** from TallyPrime:
### Master Tables (15)
- `mst_ledger` - Customers, vendors, banks, expenses, incomes
- `mst_group` - Account group hierarchy
- `mst_stock_item` - Inventory items with GST details
- `mst_stock_group` - Stock categories
- `mst_vouchertype` - Voucher type definitions
- `mst_godown` - Warehouse locations
- `mst_cost_centre` - Cost centers
- And 8 more...
### Transaction Tables (13)
- `trn_voucher` - All financial transactions
- `trn_accounting` - Ledger-wise entries
- `trn_inventory` - Item-wise stock movements
- `trn_bill` - Bill allocations
- `trn_bank` - Bank transaction details
- And 8 more...
## 📈 Training Details
### Dataset
- **Size:** 5,000 text-to-SQL pairs
- **Source:** Synthetically generated using 150 query templates
- **Split:** 90/10 train/test
- **Categories:** 8 query types covering all Tally operations
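As an illustration of the template approach, here is a minimal sketch of how (question, SQL) pairs could be sampled from slot-filled templates. The template text, slot values, and `generate_pair` helper are hypothetical simplifications, not the actual generation script.

```python
import random

# Hypothetical simplification: each template pairs a question pattern
# with a matching SQL pattern; slot values are sampled to produce
# concrete (question, sql) training pairs.
TEMPLATES = [
    (
        "What is the total {vtype} amount for {month} {year}?",
        "SELECT SUM(ABS(amount)) FROM trn_accounting ta "
        "JOIN trn_voucher tv ON ta.guid = tv.guid "
        "WHERE tv.voucher_type = '{vtype}' "
        "AND tv.date BETWEEN '{year}-{mnum}-01' AND '{year}-{mnum}-31'",
    ),
]

MONTHS = {"January": "01", "March": "03", "July": "07"}

def generate_pair(rng: random.Random) -> dict:
    """Sample one (question, sql) training pair from a random template."""
    q_tpl, s_tpl = rng.choice(TEMPLATES)
    month, mnum = rng.choice(sorted(MONTHS.items()))
    slots = {
        "vtype": rng.choice(["Sales", "Purchase"]),
        "month": month,
        "mnum": mnum,
        "year": rng.choice(["2023", "2024"]),
    }
    return {"question": q_tpl.format(**slots), "sql": s_tpl.format(**slots)}
```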
### Training Configuration
- **Method:** QLoRA (Quantized Low-Rank Adaptation)
- **Quantization:** 4-bit (NF4)
- **LoRA Rank:** 16
- **LoRA Alpha:** 32
- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Batch Size:** 2 per device
- **Gradient Accumulation:** 4 steps
- **Learning Rate:** 2e-4
- **Epochs:** ~3 (1,600 steps)
- **Optimizer:** PagedAdamW 8-bit
- **Max Sequence Length:** 2048 tokens
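The configuration above can be sketched with the standard `transformers`/`peft` config objects. This is a sketch only: the `lora_dropout` and `bias` values below are assumptions not stated in this card.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization (QLoRA), matching the settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapters on the attention projections, rank 16, alpha 32.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,  # assumed; dropout is not stated in this card
    bias="none",
    task_type="CAUSAL_LM",
)
```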
### Hardware
- **Platform:** Kaggle Notebooks
- **GPU:** NVIDIA T4 (16GB)
- **Training Time:** ~4 hours
## 📊 Performance
- **Valid SQL Syntax:** >95%
- **Keyword Match:** >85%
- **Exact Match (normalized):** >70%
## ⚠️ Limitations
- **Tally-Specific:** Optimized for TallyPrime PostgreSQL schema
- **PostgreSQL Only:** SQL generated for PostgreSQL dialect
- **Schema Required:** Needs database schema in the prompt
- **Context Window:** Limited to 2048 tokens
- **Custom Schemas:** May require additional fine-tuning for non-Tally schemas
## 🔧 Deployment Tips
### For Production Use:
1. **Add validation** - Verify generated SQL before execution
2. **Read-only mode** - Restrict to SELECT queries only
3. **Query timeout** - Set execution time limits
4. **Error handling** - Catch and handle syntax errors
5. **Logging** - Track all queries for audit
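Tips 1 and 2 can be combined into a simple pre-execution guard. This is a minimal sketch: `is_safe_select` is a hypothetical helper, and a real deployment should also enforce a read-only database role rather than rely on keyword filtering alone.

```python
import re

# Reject any statement containing DML/DDL keywords (hypothetical deny-list).
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)

def is_safe_select(sql: str) -> bool:
    """Return True only for a single SELECT statement with no DML/DDL keywords."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not stripped.upper().startswith("SELECT"):
        return False
    return not FORBIDDEN.search(stripped)
```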
### Optimization:
- Use GPU for faster inference (2-3 seconds per query)
- CPU inference works but is slower (~10-15 seconds)
- Consider caching frequently asked queries
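The caching tip can be sketched with `functools.lru_cache`, keyed on the (question, schema) pair so repeated questions skip inference. The `generate_sql` stub below stands in for the model call shown in Quick Start; all names here are hypothetical.

```python
from functools import lru_cache

CALLS = {"n": 0}

def generate_sql(question: str, schema: str) -> str:
    """Stand-in for the model call from Quick Start; counts invocations."""
    CALLS["n"] += 1
    return f"-- SQL for: {question}"

@lru_cache(maxsize=256)
def cached_sql(question: str, schema: str) -> str:
    # lru_cache keys on the argument tuple, so a repeated question
    # against the same schema returns without re-running inference.
    return generate_sql(question, schema)
```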
## 📄 License
This model is released under the **Apache 2.0** license, inheriting from the base model.
## πŸ™ Acknowledgments
- **Base Model:** [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b) by Defog.ai
- **Training Standards:** ICAI (Institute of Chartered Accountants of India) Foundation Course
- **Platform:** Trained on Kaggle's free GPU infrastructure
## 📧 Contact
**Author:** Jay Viramgami
For questions, feedback, or collaboration inquiries, please open an issue on the model's discussion page.
## 🔗 Related Resources
- [TallyPrime ERP](https://tallysolutions.com/)
- [Defog SQLCoder](https://github.com/defog-ai/sqlcoder)
- [PEFT Library](https://github.com/huggingface/peft)
---
## Citation
If you use this model in your work, please cite:
```bibtex
@misc{tally-sqlcoder-finetuned,
  author    = {Jay Viramgami},
  title     = {Tally SQLCoder - Fine-tuned for TallyPrime ERP},
  year      = {2024},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/jaykv/tally-sqlcoder-finetuned}
}
```
---
**Model Card created by Jay Viramgami | March 2024**