Commit 7e815bc (verified), committed by nishu08
1 Parent(s): d8b9b39

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +65 -0
README.md ADDED
@@ -0,0 +1,65 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - codebert
+ - sql
+ - education
+ - text-classification
+ - cross-encoder
+ base_model: microsoft/codebert-base
+ pipeline_tag: text-classification
+ ---
+
+ # SQL CodeBERT Cross-Encoder
+
+ Multi-label SQL error classifier using **microsoft/codebert-base** as a cross-encoder.
+
+ ## Input Format
+
+ All fields are concatenated into one sequence:
+
+ ```
+ QUESTION:
+ {question}
+
+ SCHEMA:
+ {schema}
+
+ STUDENT_SQL:
+ {student_sql}
+
+ CORRECT_SQL:
+ {correct_sql}
+ ```
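The template above can be reproduced with a small formatting helper. This is a minimal sketch, not the repository's code: the name `build_input` is hypothetical, and stripping surrounding whitespace from each field is an assumption that may not match what the training script does.

```python
# Hypothetical helper; `build_input` and TEMPLATE are illustrative names,
# not identifiers taken from the repository.
TEMPLATE = (
    "QUESTION:\n{question}\n\n"
    "SCHEMA:\n{schema}\n\n"
    "STUDENT_SQL:\n{student_sql}\n\n"
    "CORRECT_SQL:\n{correct_sql}"
)


def build_input(question: str, schema: str, student_sql: str, correct_sql: str) -> str:
    """Concatenate the four fields into one sequence using the README template."""
    # Assumption: surrounding whitespace in each field is not significant.
    return TEMPLATE.format(
        question=question.strip(),
        schema=schema.strip(),
        student_sql=student_sql.strip(),
        correct_sql=correct_sql.strip(),
    )


if __name__ == "__main__":
    print(build_input(
        "What is the average score per department?",
        "students(id, score, department_id)",
        "SELECT department_id, SUM(score) FROM students GROUP BY department_id",
        "SELECT department_id, AVG(score) FROM students GROUP BY department_id",
    ))
```

Because the whole string is passed to the tokenizer as one sequence, long schemas or queries may be truncated at the model's maximum input length.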
+
+ ## Labels
+
+ `JOIN_ERROR`, `AGGREGATION_ERROR`, `FILTER_ERROR`, `WINDOW_FUNCTION_ERROR`,
+ `SUBQUERY_ERROR`, `NULL_HANDLING_ERROR`, `PERFORMANCE_ERROR`, `LOGICAL_ERROR`, `SYNTAX_ERROR`
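Since the classifier is multi-label, each label is typically scored independently and any number of them can fire for one query. The sketch below shows one common way to decode per-label logits into label names. It rests on assumptions the README does not confirm: that the model emits one logit per label in the order listed above, and that a sigmoid with a 0.5 threshold is used. The name `decode_labels` is hypothetical.

```python
import math

# Assumption: output index i corresponds to LABELS[i], in the README's order.
LABELS = [
    "JOIN_ERROR", "AGGREGATION_ERROR", "FILTER_ERROR", "WINDOW_FUNCTION_ERROR",
    "SUBQUERY_ERROR", "NULL_HANDLING_ERROR", "PERFORMANCE_ERROR",
    "LOGICAL_ERROR", "SYNTAX_ERROR",
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_labels(logits, threshold: float = 0.5):
    """Return every label whose independent probability reaches the threshold."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    # Illustrative logits only, not real model output.
    print(decode_labels([-3.0, 2.0, -1.0, -5.0, -4.0, -2.0, -6.0, -0.1, -3.0]))
```

An empty list means no label reached the threshold. Raising or lowering `threshold` trades precision against recall per label.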
+
+ ## Training
+
+ ```bash
+ python -m src.hf_train_codebert \
+     --data data/sql_errors_1m.parquet \
+     --output-dir models/codebert-cross-encoder \
+     --epochs 3 \
+     --push-to-hub \
+     --hub-model-id YOUR_USERNAME/sql-codebert-cross-encoder
+ ```
+
+ ## Inference
+
+ ```python
+ from src.hf_predict_codebert import CodeBERTSQLErrorClassifier
+
+ clf = CodeBERTSQLErrorClassifier("YOUR_USERNAME/sql-codebert-cross-encoder")
+ result = clf.predict(
+     question="What is the average score per department?",
+     schema="students(id, score, department_id)",
+     student_sql="SELECT department_id, SUM(score) FROM students GROUP BY department_id",
+     correct_sql="SELECT department_id, AVG(score) FROM students GROUP BY department_id",
+ )
+ print(result["error_labels"])
+ ```