---
title: SQL Error Classifier Training
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
hardware: t4-small
---

# SQL Error Classifier — CodeBERT Training Space

Train `microsoft/codebert-base` as a **cross-encoder** for multi-label SQL error classification.

## Setup

1. **Hardware:** Settings → Hardware → **GPU t4-small** (recommended)
2. **Secrets:** Settings → Secrets → add `HF_TOKEN` (Hugging Face write token) to push models to your account
3. **Data:** Include `data/sql_errors_dev.parquet` in this Space repo, or upload parquet at runtime

## Usage

1. Choose bundled dataset or upload your own parquet
2. Set epochs, batch size, max samples
3. Click **Start Training**
4. Optionally enable **Push to Hub** with model id `your-username/sql-codebert-classifier`

## Dataset columns

Required (aliases supported):

| Column | Aliases |
|--------|---------|
| `question` | — |
| `schema` | — |
| `student_sql` | `query` |
| `correct_sql` | `correct_query` |
| `error_labels` | `label_name` |

## Labels (9-class multi-label)

`JOIN_ERROR`, `AGGREGATION_ERROR`, `FILTER_ERROR`, `WINDOW_FUNCTION_ERROR`, `SUBQUERY_ERROR`, `NULL_HANDLING_ERROR`, `PERFORMANCE_ERROR`, `LOGICAL_ERROR`, `SYNTAX_ERROR`
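
The column aliases and the 9-label set can be illustrated with a minimal sketch. This is not the Space's actual training code: `normalize_row` and `encode_labels` are hypothetical helper names, the example row is made up, and the sketch assumes `error_labels` holds either a list of label names or a comma-separated string (the stored format is not specified here). The label order simply follows the list in this README.

```python
# Hypothetical helpers for illustration; not taken from the Space's code.
LABELS = [
    "JOIN_ERROR", "AGGREGATION_ERROR", "FILTER_ERROR",
    "WINDOW_FUNCTION_ERROR", "SUBQUERY_ERROR", "NULL_HANDLING_ERROR",
    "PERFORMANCE_ERROR", "LOGICAL_ERROR", "SYNTAX_ERROR",
]

# Alias -> canonical column name, following the "Dataset columns" table.
ALIASES = {
    "query": "student_sql",
    "correct_query": "correct_sql",
    "label_name": "error_labels",
}
REQUIRED = ["question", "schema", "student_sql", "correct_sql", "error_labels"]


def normalize_row(row):
    """Rename aliased keys to canonical names and check required columns."""
    out = {ALIASES.get(key, key): value for key, value in row.items()}
    missing = [col for col in REQUIRED if col not in out]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    return out


def encode_labels(error_labels):
    """Turn label names into a 9-element multi-hot vector.

    Assumption: labels arrive as a list of names or a comma-separated string.
    """
    if isinstance(error_labels, str):
        error_labels = [p.strip() for p in error_labels.split(",") if p.strip()]
    unknown = set(error_labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if name in error_labels else 0.0 for name in LABELS]


# Made-up example row that uses the alias column names.
row = normalize_row({
    "question": "Total sales per region?",
    "schema": "sales(region, amount)",
    "query": "SELECT region, amount FROM sales",
    "correct_query": "SELECT region, SUM(amount) FROM sales GROUP BY region",
    "label_name": "AGGREGATION_ERROR",
})
vector = encode_labels(row["error_labels"])
print(vector)
```

Because the task is multi-label, a single row may set several positions to `1.0` at once, for example `encode_labels("JOIN_ERROR, FILTER_ERROR")`.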