KhushbooThaker committed · Commit 843b0b2 · verified · 1 Parent(s): 511fa5a

Update README.md

Files changed (1): README.md (+78 -1)

README.md CHANGED

---
license: cc-by-4.0
metrics:
- exact_match
language:
- en
pipeline_tag: text-generation
tags:
- text-to-sql
- knowledge-distillation
- struct-sql
- qwen
- generated_from_trainer
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- bird-bench/bird
arxiv: 2512.17053
---

# Struct-SQL: Knowledge Distillation with Structured Chain-of-Thought

**Struct-SQL** is a specialized Text-to-SQL model based on **Qwen3-4B-Instruct**. It was trained with a novel knowledge distillation (KD) framework that transfers **structured reasoning** (query execution plans) from a state-of-the-art teacher LLM (GPT-4o) to a smaller student model.

Unlike standard distillation methods that rely on unstructured Chain-of-Thought (CoT), Struct-SQL learns to generate a formal, logical blueprint (a query plan) before generating the final SQL. This approach significantly reduces syntactic errors and schema hallucinations.

📄 **Paper:** [Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL](https://arxiv.org/abs/2512.17053)

## Performance

On the **BIRD mini-dev** benchmark, Struct-SQL achieves an **Execution Accuracy (EX) of 45.0%**, outperforming standard unstructured CoT distillation baselines by **8.1 points**.

| Model | Distillation Method | Execution Accuracy (EX) |
|:---|:---|:---|
| **Struct-SQL (Ours)** | **Structured QP-CoT** | **45.0%** |
| ReasonSQL Baseline | Unstructured CoT | 36.9% |
| FN-Gold Baseline | No Reasoning (SQL Only) | 34.3% |
| Base Student (Zero-shot) | None | 17.0% |
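
Execution Accuracy (EX) on BIRD counts a prediction as correct when the predicted SQL returns the same result set as the gold SQL on the target database. The sketch below shows that check in minimal form; `db_path`, `predicted_sql`, and `gold_sql` are hypothetical, and the official BIRD evaluator adds timeouts and other safeguards not shown here.

```python
import sqlite3

def execution_match(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Minimal sketch of the execution-accuracy check (not the official evaluator)."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to execute counts as incorrect
    finally:
        conn.close()
    # Order-insensitive comparison of the two result sets.
    return set(pred_rows) == set(gold_rows)
```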

## Methodology

The model was trained on a curated dataset of **1,000 samples** generated by GPT-4o. Each training sample consists of:
1. **Input:** Natural Language Question + Database Schema.
2. **Output:** A structured **Query Execution Plan** (Reasoning) + Final **SQL Query**.

Forcing the model to explicitly plan the query execution (e.g., "Scan Table", "Filter by...", "Join with...") teaches it the logical structure of SQL generation rather than just memorizing patterns.
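
For illustration only, a distilled training pair in this style might look like the sketch below; the field names, plan wording, and schema are assumptions, not the paper's exact serialization.

```python
# Hypothetical example of one distilled training pair (format assumed for illustration).
sample = {
    "input": (
        "Schema:\n"
        "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL, dept_id INTEGER);\n"
        "CREATE TABLE departments (id INTEGER, name TEXT);\n"
        "Question: What is the average salary in the Sales department?"
    ),
    "output": (
        "Query Execution Plan:\n"
        "1. Scan Table: departments; Filter by: name = 'Sales'\n"
        "2. Join with: employees ON employees.dept_id = departments.id\n"
        "3. Aggregate: AVG(employees.salary)\n"
        "Final SQL:\n"
        "SELECT AVG(e.salary) FROM employees AS e\n"
        "JOIN departments AS d ON e.dept_id = d.id\n"
        "WHERE d.name = 'Sales';"
    ),
}
print(sample["output"])
```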

## Usage

You can use this model with the `transformers` library. To elicit the query plan before the final SQL, format the input with the specific system prompt or structure the model expects; the prompt in the example below is only illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "craterlabs/Struct-SQL"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Illustrative prompt: provide the database schema and the question.
# The exact prompt template used during training may differ.
prompt = (
    "Database schema:\n"
    "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL, dept_id INTEGER);\n"
    "CREATE TABLE departments (id INTEGER, name TEXT);\n\n"
    "Question: What is the average salary in the Sales department?\n"
    "First write a query execution plan, then the final SQL."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
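
Since the base model is an instruction-tuned Qwen3 checkpoint, you can also build the input with the tokenizer's chat template. The snippet below continues from the code above and is a minimal sketch; the system prompt is an assumption for illustration, not necessarily the one used during training.

```python
# Optional: format the same prompt with the chat template of the Qwen3 base model.
# The system prompt below is illustrative, not the official training prompt.
messages = [
    {"role": "system", "content": "You are a Text-to-SQL assistant. "
                                  "First write a query execution plan, then the final SQL."},
    {"role": "user", "content": prompt},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=1200)
# Decode only the newly generated tokens (plan + SQL).
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```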

## Citation

If you use this model or method in your research, please cite our paper:

```bibtex
@article{thaker2025knowledge,
  title={Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL},
  author={Thaker, Khushboo and Bresler, Yony},
  journal={arXiv preprint arXiv:2512.17053},
  year={2025}
}
```