File size: 7,044 Bytes
aa3a171
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5db060f
 
 
 
aa3a171
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
---

title: Sql Debug Environment Server
emoji: ๐Ÿ’
colorFrom: pink
colorTo: red
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
  - sql
  - debugging
  - optimization
---


# ๐Ÿ’ OpenEnv: SQL Debug Environment

An [OpenEnv](https://openenv.dev)-compliant environment where AI agents fix broken SQL queries and optimize slow ones against in-memory SQLite databases.

> โœ… **Validator:** `openenv validate` passes when the environment is wired up correctly
> ๐Ÿš€ **Local API:** `https://abhinavthedev-sql-debug.hf.space`
> ๐Ÿ“– **Swagger UI:** `https://abhinavthedev-sql-debug.hf.space/docs`

---

## ๐ŸŽฏ Environment Description

This environment simulates the work of a SQL engineer who must repair syntax errors, correct logic bugs, and improve query performance. Agents receive a schema, a broken or slow query, and a natural-language target description. They submit SQL queries, observe the execution result and query plan, and are scored on correctness and efficiency.

The environment is intentionally practical: each task mirrors a real debugging pattern used in analytics, reporting, and data engineering workflows.

---

## ๐Ÿ“‹ Tasks

### Task 1 - Syntax Fix *(Easy)*
**Task ID:** `syntax_fix_001`

**Objective:** Fix a malformed query so it returns all orders where `amount > 500`.

| Field | Description |
|---|---|
| `schema` | `orders` table with `id`, `customer`, `amount`, `order_date` |
| `broken_query` | `SELEC * FORM orders WERE amount > 500` |
| `target` | Return all orders where amount is greater than 500 |

**Max steps:** 5 | **Difficulty:** Easy

---

### Task 2 - Logic Fix *(Medium)*
**Task ID:** `logic_fix_001`

**Objective:** Correct a join bug so only employees in valid departments are returned.

| Field | Description |
|---|---|
| `schema` | `employees` and `departments` tables |
| `broken_query` | Query uses `LEFT JOIN` but should exclude missing departments |
| `target` | Return employees in departments with budget > 400000 |

**Max steps:** 8 | **Difficulty:** Medium

---

### Task 3 - Query Optimization *(Hard)*
**Task ID:** `optimize_001`

**Objective:** Rewrite a correlated subquery into an efficient CTE or grouped subquery.

| Field | Description |
|---|---|
| `schema` | `transactions` table with generated sample rows |
| `broken_query` | Correlated subquery that scans per row |
| `target` | Return completed transactions above the user's average amount |

**Max steps:** 10 | **Difficulty:** Hard

---

## ๐Ÿ”Œ API Reference

### Base URL
```text

https://abhinavthedev-sql-debug.hf.space

```

### Core Endpoints

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/reset` | Start a new episode; pass `task_id` to choose a task |
| `POST` | `/step` | Submit a SQL query and receive the next observation |
| `GET` | `/state/{session_id}` | Inspect the current episode state |
| `GET` | `/schema` | View action, observation, and state schemas |
| `GET` | `/ws` | WebSocket endpoint for low-latency sessions |
| `GET` | `/health` | Health check |
| `GET` | `/docs` | Swagger UI |

---

## ๐ŸŽฎ Action Space

The agent submits a single SQL query each step.

```json

{

  "query": "SELECT * FROM orders WHERE amount > 500"

}

```

### Example Actions

```json

{ "query": "SELECT * FROM orders WHERE amount > 500" }



{ "query": "SELECT e.name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.id WHERE d.budget > 400000" }



{ "query": "WITH avg_amount AS (SELECT user_id, AVG(amount) AS avg_amount FROM transactions GROUP BY user_id) SELECT t.* FROM transactions t JOIN avg_amount a ON t.user_id = a.user_id WHERE t.status = 'completed' AND t.amount > a.avg_amount" }

```

---

## ๐Ÿ“Š Observation Space

```json

{

  "task_id": "syntax_fix_001",

  "schema_sql": "CREATE TABLE orders (...)",

  "current_query": "SELEC * FORM orders WERE amount > 500",

  "error_message": "near \"SELEC\": syntax error",

  "query_result": [],

  "execution_plan": "",

  "step_count": 0,

  "target_description": "Return all orders where amount is greater than 500",

  "reward_so_far": 0.0,

  "available_tasks": ["syntax_fix_001", "logic_fix_001", "optimize_001"],

  "done": false,

  "reward": 0.05

}

```

---

## ๐Ÿ’ฐ Reward Function

The reward is computed from syntax validity, result correctness, and query plan quality.

| Event | Reward |
|---|---|
| Query fails with syntax error | `0.05` |
| Query runs successfully | contributes to the main score |
| Correct row match on easy and medium tasks | up to `0.6` of the score |
| Good query plan on hard task | up to `0.2` of the score |
| Uses correlated-subquery pattern on hard task | heavy plan penalty |
| Excessively long query | length penalty |

Final scores are clamped to the range `[0.0, 1.0]`.

---

## ๐Ÿš€ Setup & Usage

### Option 1 - Run Locally

```bash

pip install -e .

uvicorn server.app:app --host 0.0.0.0 --port 8000 --reload

# Open https://abhinavthedev-sql-debug.hf.space/docs

```

### Option 2 - Run with Docker

```bash

docker build -t sql-debug-env -f server/Dockerfile .

docker run -p 8000:8000 sql-debug-env

curl https://abhinavthedev-sql-debug.hf.space/health

```

### Option 3 - Run the Inference Loop

```bash

export SERVER_URL=https://abhinavthedev-sql-debug.hf.space

export API_KEY=sk-...

python inference.py

```

The inference script defaults to `syntax_fix_001`, logs each step, and stops when the episode ends or the step budget is reached.

---

## ๐Ÿ—๏ธ Project Structure

```text

sql_exp/

โ”œโ”€โ”€ client.py              # OpenEnv client wrapper

โ”œโ”€โ”€ inference.py           # LLM-driven inference loop

โ”œโ”€โ”€ models.py              # Action and observation models

โ”œโ”€โ”€ openenv.yaml           # OpenEnv manifest

โ”œโ”€โ”€ pyproject.toml         # Project metadata and dependencies

โ”œโ”€โ”€ runner.py              # SQLite query runner

โ”œโ”€โ”€ server/

โ”‚   โ”œโ”€โ”€ app.py             # FastAPI app and OpenEnv wiring

โ”‚   โ”œโ”€โ”€ Dockerfile         # Container definition

โ”‚   โ””โ”€โ”€ sql_debug_environment.py  # Core environment logic

โ”œโ”€โ”€ tasks/

โ”‚   โ”œโ”€โ”€ task_easy.py       # Syntax-fix task

โ”‚   โ”œโ”€โ”€ task_medium.py     # Join logic task

โ”‚   โ””โ”€โ”€ task_hard.py       # Query optimization task

โ”œโ”€โ”€ graders/

โ”‚   โ”œโ”€โ”€ grader_easy.py       # Syntax-fix task

โ”‚   โ”œโ”€โ”€ grader_medium.py     # Join logic task

โ”‚   โ””โ”€โ”€ grader_hard.py       # Query optimization task

โ””โ”€โ”€ README.md              # Project overview

```

---

## ๐Ÿ› ๏ธ Tech Stack

- **Python 3.10+** - Runtime
- **FastAPI** - HTTP framework
- **OpenEnv Core** - Environment server and client primitives
- **SQLite** - Query execution engine
- **Uvicorn** - ASGI server
- **Docker** - Containerization

---

## ๐Ÿ“ License

BSD-style license, matching the source headers in this repository.