KhushbooThaker committed · Commit 843b0b2 · verified · 1 Parent(s): 511fa5a

Update README.md

Files changed (1): README.md (+78 -1)

README.md CHANGED

---
license: cc-by-4.0
metrics:
- exact_match
language:
- en
pipeline_tag: text-generation
tags:
- text-to-sql
- knowledge-distillation
- struct-sql
- qwen
- generated_from_trainer
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- bird-bench/bird
arxiv: 2512.17053
---

# Struct-SQL: Knowledge Distillation with Structured Chain-of-Thought

**Struct-SQL** is a specialized Text-to-SQL model based on **Qwen3-4B-Instruct**. It was trained with a novel knowledge distillation (KD) framework that transfers **structured reasoning** (query execution plans) from a state-of-the-art teacher LLM (GPT-4o) to a smaller student model.

Unlike standard distillation methods that rely on unstructured Chain-of-Thought (CoT), Struct-SQL learns to generate a formal, logical blueprint (a query plan) before generating the final SQL. This approach significantly reduces syntactic errors and schema hallucinations.

📄 **Paper:** [Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL](https://arxiv.org/abs/2512.17053)

## Performance

On the **BIRD mini-dev** benchmark, Struct-SQL achieves an **Execution Accuracy (EX) of 45.0%**, outperforming standard unstructured CoT distillation baselines by **8.1 points**.

| Model | Distillation Method | Execution Accuracy (EX) |
|:---|:---|:---|
| **Struct-SQL (Ours)** | **Structured QP-CoT** | **45.0%** |
| ReasonSQL Baseline | Unstructured CoT | 36.9% |
| FN-Gold Baseline | No Reasoning (SQL Only) | 34.3% |
| Base Student (Zero-shot) | None | 17.0% |
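
Execution Accuracy (EX) on BIRD counts a prediction as correct when the predicted SQL returns the same result set as the gold SQL on the target database. The sketch below shows that check in minimal form; `db_path`, `predicted_sql`, and `gold_sql` are hypothetical, and the official BIRD evaluator adds timeouts and other safeguards not shown here.

```python
import sqlite3

def execution_match(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Minimal sketch of the execution-accuracy check (not the official evaluator)."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as incorrect
    finally:
        conn.close()
    # Order-insensitive comparison of the two result sets.
    return set(pred_rows) == set(gold_rows)
```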

## Methodology

The model was trained on a curated dataset of **1,000 samples** generated by GPT-4o. Each training sample consists of:
1. **Input:** Natural Language Question + Database Schema.
2. **Output:** A structured **Query Execution Plan** (Reasoning) + Final **SQL Query**.

Forcing the model to explicitly plan the query execution (e.g., "Scan Table", "Filter by...", "Join with...") teaches it the logical structure of SQL generation rather than just memorizing patterns.
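
For illustration only, a distilled training pair in this style might look like the sketch below; the field names, plan wording, and schema are assumptions, not the paper's exact serialization.

```python
# Hypothetical example of one distilled training pair (format assumed for illustration).
sample = {
    "input": (
        "Schema:\n"
        "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL, dept_id INTEGER);\n"
        "CREATE TABLE departments (id INTEGER, name TEXT);\n"
        "Question: What is the average salary in the Sales department?"
    ),
    "output": (
        "Query Execution Plan:\n"
        "1. Scan Table: departments; Filter by: name = 'Sales'\n"
        "2. Join with: employees ON employees.dept_id = departments.id\n"
        "3. Aggregate: AVG(employees.salary)\n"
        "Final SQL:\n"
        "SELECT AVG(e.salary) FROM employees AS e\n"
        "JOIN departments AS d ON e.dept_id = d.id\n"
        "WHERE d.name = 'Sales';"
    ),
}
print(sample["output"])
```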

## Usage

You can use this model with the `transformers` library. To elicit the query plan before the final SQL, format the input with the specific system prompt or structure the model expects; the prompt in the example below is only illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "craterlabs/Struct-SQL"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Illustrative prompt: provide the database schema and the question.
# The exact prompt template used during training may differ.
prompt = (
    "Database schema:\n"
    "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL, dept_id INTEGER);\n"
    "CREATE TABLE departments (id INTEGER, name TEXT);\n\n"
    "Question: What is the average salary in the Sales department?\n"
    "First write a query execution plan, then the final SQL."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
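
Since the base model is an instruction-tuned Qwen3 checkpoint, you can also build the input with the tokenizer's chat template. The snippet below continues from the code above and is a minimal sketch; the system prompt is an assumption for illustration, not necessarily the one used during training.

```python
# Optional: format the same prompt with the chat template of the Qwen3 base model.
# The system prompt below is illustrative, not the official training prompt.
messages = [
    {"role": "system", "content": "You are a Text-to-SQL assistant. "
                                  "First write a query execution plan, then the final SQL."},
    {"role": "user", "content": prompt},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=1200)
# Decode only the newly generated tokens (plan + SQL).
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```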

## Citation

If you use this model or method in your research, please cite our paper:

```bibtex
@article{thaker2025knowledge,
  title={Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL},
  author={Thaker, Khushboo and Bresler, Yony},
  journal={arXiv preprint arXiv:2512.17053},
  year={2025}
}
```