|
|
--- |
|
|
license: cc-by-4.0 |
|
|
library_name: transformers |
|
|
tags: |
|
|
- text-to-sql |
|
|
- code |
|
|
- qwen3 |
|
|
- knowledge-distillation |
|
|
datasets: |
|
|
- birdsql/bird_mini_dev |
|
|
- craterlabs/struct-sql-data |
|
|
base_model: |
|
|
- Qwen/Qwen3-4B-Instruct-2507 |
|
|
language: |
|
|
- en |
|
|
--- |
|
|
|
|
|
# Struct-SQL-4B: Knowledge Distillation with Structured Chain-of-Thought
|
|
|
|
|
**Struct-SQL** is a specialized Text-to-SQL model based on [**Qwen3-4B-Instruct-2507**](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507). It was trained using a novel Knowledge Distillation (KD) framework that transfers **structured reasoning** (Query Execution Plans) from a state-of-the-art teacher LLM (GPT-4o) to a smaller student model. |
|
|
|
|
|
Unlike standard distillation methods that rely on unstructured Chain-of-Thought (CoT), Struct-SQL learns to generate a formal, logical blueprint (a query plan) before generating the final SQL. This approach significantly reduces syntactic errors and schema hallucinations. |
|
|
|
|
|
📄 **Paper:** [Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL](https://arxiv.org/abs/2512.17053) |
|
|
*(Accepted at Canadian AI Conference 2026)* |
|
|
|
|
|
|
|
|
## Performance |
|
|
|
|
|
On the **BIRD mini-dev** benchmark, Struct-SQL achieves an **Execution Accuracy (EX) of 45.0%**, outperforming standard unstructured CoT distillation baselines by **8.1 points**. |
|
|
|
|
|
| Model | Distillation Method | Execution Accuracy (EX) | |
|
|
|:---|:---|:---| |
|
|
| **Struct-SQL (Ours)** | **Structured QP-CoT** | **45.0%** | |
|
|
| ReasonSQL Baseline | Unstructured CoT | 36.9% | |
|
|
| FN-Gold Baseline | No Reasoning (SQL Only) | 34.3% | |
|
|
| Base Student (Zero-shot) | None | 17.0% | |
|
|
|
|
|
--- |
|
|
## Methodology |
|
|
|
|
|
The model was trained on a curated dataset of **1,000 samples** generated by GPT-4o. The training data consists of: |
|
|
1. **Input:** Natural Language Question + Database Schema. |
|
|
2. **Output:** A structured **Query Execution Plan** (Reasoning) + Final **SQL Query**. |
|
|
|
|
|
By forcing the model to explicitly plan query execution (e.g., "Scan Table", "Filter by...", "Join with..."), the student learns the logical structure of SQL generation rather than simply memorizing surface patterns.
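
For illustration, a single training example might be structured as follows. The field names and the exact plan wording here are hypothetical sketches; the released dataset defines the authoritative format.

```python
# Hypothetical training sample; field names and plan wording are illustrative only.
sample = {
    "question": "How many customers placed an order in 2023?",
    "schema": (
        "CREATE TABLE customers (id INTEGER, name TEXT);\n"
        "CREATE TABLE orders (id INTEGER, customer_id INTEGER, order_date TEXT);"
    ),
    "query_plan": [
        "Scan Table: orders",
        "Filter by: order_date between '2023-01-01' and '2023-12-31'",
        "Aggregate: COUNT(DISTINCT customer_id)",
    ],
    "sql": (
        "SELECT COUNT(DISTINCT customer_id) FROM orders "
        "WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31';"
    ),
}
```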
|
|
|
|
|
--- |
|
|
## Usage |
|
|
|
|
|
You can use this model with the `transformers` library. To elicit the query execution plan before the final SQL, the input should contain the database schema and the natural-language question; the snippet below uses an illustrative prompt layout.
|
|
|
|
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "craterlabs/Struct-SQL"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Illustrative prompt; the exact schema/question layout used during training
# is not documented here, so adapt this format to your data.
prompt = (
    "Database schema:\n"
    "CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);\n\n"
    "Question: What is the average salary of employees in the Sales department?"
)

# The model is trained to emit a query execution plan followed by the final SQL.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
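
The generated text contains the query execution plan followed by the final SQL. Assuming the query is emitted inside a fenced SQL block (an assumption about the output format; adjust the pattern to your observed outputs), a small helper can extract it:

```python
import re

def extract_sql(generated: str) -> str:
    """Return the last fenced SQL block if present, else the raw text.

    Assumes the model wraps the final query in a ```sql ... ``` fence;
    adjust the pattern if outputs use a different delimiter.
    """
    blocks = re.findall(r"```sql\s*(.*?)```", generated, flags=re.DOTALL | re.IGNORECASE)
    return blocks[-1].strip() if blocks else generated.strip()
```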
|
|
--- |
|
|
## Intended Use |
|
|
|
|
|
Struct-SQL-4B is intended for **research and academic use** in tasks involving **Text-to-SQL generation** and **semantic parsing over relational databases**. The model is particularly suited for studying: |
|
|
|
|
|
- Knowledge distillation techniques that leverage **structured intermediate representations** |
|
|
- Explicit **query planning** as an alternative to unstructured chain-of-thought reasoning |
|
|
- Error reduction in SQL generation, including syntactic validity and schema grounding |
|
|
- Compact language models for complex reasoning under limited parameter budgets |
|
|
|
|
|
The model is not optimized for direct deployment in production database systems without additional validation and safety constraints. |
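
As a minimal illustration of such a safeguard (a sketch only, not part of the released model or the paper), generated SQL can be screened against a read-only SQLite connection before it is ever executed:

```python
import sqlite3

def is_safe_select(db_path: str, sql: str) -> bool:
    """Accept only SELECT statements that SQLite can plan.

    Minimal sketch; production use needs stricter sandboxing, query
    timeouts, and result-size limits.
    """
    if not sql.lstrip().lower().startswith("select"):
        return False
    # Read-only connection: writes fail even if the prefix check is bypassed.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```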
|
|
|
|
|
--- |
|
|
## Limitations |
|
|
|
|
|
- Evaluation is confined to the SQLite-based BIRD benchmark |
|
|
- The model may generate logically plausible but incorrect SQL for highly complex multi-hop queries |
|
|
|
|
|
--- |
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@article{thaker2025knowledge, |
|
|
title={Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL}, |
|
|
author={Thaker, Khushboo and Bresler, Yony}, |
|
|
journal={arXiv preprint arXiv:2512.17053}, |
|
|
year={2025} |
|
|
} |
|
|
@inproceedings{thaker2026knowledge, |
|
|
title={Struct-SQL: Distilling Structured Reasoning for Small Text-to-SQL Models}, |
|
|
author={Thaker, Khushboo and Bresler, Yony}, |
|
|
booktitle={Proceedings of the 39th Canadian Conference on Artificial Intelligence}, |
|
|
year={2026}, |
|
|
note={To appear} |
|
|
}
```