---
title: SQLEnv
emoji: πŸ€–
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: true
base_path: /web
---
# SQLEnv: Teaching Small Models to Explore Databases
![Python](https://img.shields.io/badge/python-3.12-blue.svg)
![License](https://img.shields.io/badge/license-MIT-green.svg)
![Data](https://img.shields.io/badge/data-CC%20BY--SA%204.0-orange.svg)
SQLEnv is an RL environment for training small language models to answer questions about SQL databases through iterative exploration. Instead of producing one-shot SQL from a fully visible schema, the agent discovers the schema step by step using four tools: DESCRIBE, SAMPLE, QUERY, and ANSWER.
Built on [OpenEnv](https://github.com/meta-pytorch/OpenEnv) and trained with [TRL](https://huggingface.co/docs/trl)'s GRPO implementation. A 0.6B parameter model trained in this environment goes from 0% to ~30% accuracy on a curated Spider subset, learning to explore schemas, recover from SQL errors, and format answers correctly.
**[Blog post](https://hjerpe-sqlenv-blog.static.hf.space)** | **[Live environment](https://huggingface.co/spaces/hjerpe/sql_env)** | **[Training notebook](notebooks/train_grpo.ipynb)**
## Quick Start
```bash
uv sync
uv run pytest tests/ -v
```
Run the environment locally:
```bash
uv run uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
```
Or with Docker:
```bash
docker build -t sqlenv:latest -f server/Dockerfile .
docker run -p 8000:8000 sqlenv:latest
```
## How It Works
Each episode starts with a natural-language question and a list of table names. The schema (columns, types, relationships) is hidden. The agent uses four actions to explore:
| Action | Purpose |
|--------|---------|
| `DESCRIBE table` | Reveal column names, types, and row count |
| `SAMPLE table` | Preview representative rows |
| `QUERY sql` | Execute read-only SQL |
| `ANSWER value` | Submit a final answer (ends episode) |
The environment provides a dense reward at each step (operational feedback plus progress toward the answer) and a terminal reward for correctness (+1.0 correct, 0.0 wrong). See the [blog post](https://hjerpe-sqlenv-blog.static.hf.space) for details on the reward architecture. A minimal episode looks like this:
```python
from server.sql_environment import SQLEnvironment, SQLAction

env = SQLEnvironment(
    questions_path="data/questions/questions_train.json",
    db_dir="data/databases",
    tokenizer=tok,  # any Hugging Face tokenizer instance
)
obs = env.reset(seed=42)
obs = env.step(SQLAction(action_type="DESCRIBE", argument="employee"))
obs = env.step(SQLAction(action_type="QUERY", argument="SELECT COUNT(*) FROM employee"))
obs = env.step(SQLAction(action_type="ANSWER", argument="10"))
# obs.done=True, obs.reward=1.0
```
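A standard way to keep dense per-step rewards from distorting the optimal policy is potential-based shaping (Ng et al., 1999, cited below). The sketch here is illustrative only, not the environment's actual reward code; the `potential_*` values stand in for a hypothetical progress estimate:

```python
def shaped_step_reward(base_reward, potential_prev, potential_next, gamma=1.0):
    """Potential-based shaping (Ng, Harada, Russell, 1999): adding
    F(s, s') = gamma * phi(s') - phi(s) to the base reward provably
    leaves the optimal policy unchanged."""
    return base_reward + gamma * potential_next - potential_prev

# e.g. describing a relevant table raises the progress potential
print(shaped_step_reward(0.0, potential_prev=0.25, potential_next=0.5))  # 0.25
```

Because the shaping terms telescope over an episode, the agent is rewarded for making progress at every step while the total return still reflects only the terminal correctness signal plus the net change in potential.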
## Training
We train [Qwen3-0.6B](https://arxiv.org/abs/2505.09388) using [GRPO](https://arxiv.org/abs/2402.03300) (from DeepSeekMath) through TRL's `environment_factory`. The full pipeline (SFT warmup + two-phase GRPO) runs in ~5 hours on a single Colab L4.
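GRPO's key idea is that it needs no learned value function: each completion is scored relative to the other completions sampled for the same prompt. A minimal sketch of the group-relative advantage computation (illustrative of the algorithm, not TRL's internals):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO (DeepSeekMath):
    normalize each rollout's reward against its group's mean and std."""
    mean = statistics.mean(rewards)
    std = statistics.stdev(rewards) if len(rewards) > 1 else 0.0
    if std == 0.0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]

# Four rollouts of the same question, one of them correct:
print(grpo_advantages([1.0, 0.0, 0.0, 0.0]))  # [1.5, -0.5, -0.5, -0.5]
```

The degenerate case matters in practice: when every rollout in a group gets the same reward (all wrong, or all right), the group yields zero gradient signal, which is one reason the dense shaping reward above helps early training.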
**Notebooks:**
- **[train_grpo.ipynb](notebooks/train_grpo.ipynb)** runs the full SFT + GRPO pipeline
- **[compare_methods.ipynb](notebooks/compare_methods.ipynb)** evaluates base vs trained models
- **[showcase_sqlenv.ipynb](notebooks/showcase_sqlenv.ipynb)** lets you explore the environment interactively
**Local test (CPU, ~3 min):**
```bash
docker build -f Dockerfile.test -t sqlenv-test .
docker run --rm sqlenv-test
```
## Evaluation
All evaluation runs through the Green Agent evaluator:
```python
from sql_env.evaluation import evaluate, RandomPolicy, OraclePolicy

policy = RandomPolicy()  # or OraclePolicy() for an upper bound
result = evaluate(env, policy, n_episodes=50, seed=0)
print(f"Accuracy: {result.success_rate:.1%}, Reward: {result.avg_reward:.3f}")
```
Results on our curated 10-database Spider subset (N=50, 2 runs):
| Method | Accuracy | Parse Rate | Avg Steps |
|--------|----------|------------|-----------|
| Zero-shot | 0% | 24-28% | 10.8-12.4 |
| 1-shot | 0-2% | 16-17% | 14.0-14.8 |
| 3-shot | 0% | 19-20% | 13.8-14.8 |
| GRPO v1 (2 epochs) | 28-30% | 95-100% | 3.5-4.0 |
| GRPO v2 (4 epochs) | 24-32% | 87-95% | 3.5-4.0 |
This evaluation is not comparable to the official Spider leaderboard, which uses different scoring, full-schema input, and a broader database set. See the [blog post](https://hjerpe-sqlenv-blog.static.hf.space) for detailed analysis.
## Data
676 questions (473 train, 203 eval) across 10 Spider databases with difficulty labels, plus 120 multi-turn SFT warmup trajectories generated from gold SQL. See [docs/data-sources.md](docs/data-sources.md) for full details on provenance, curation, and regeneration.
Data in `data/` is adapted from [Spider](https://yale-lily.github.io/spider) (Yu et al., 2018) and shared under CC BY-SA 4.0. See [DATA_LICENSE](DATA_LICENSE).
## Project Structure
```
sqlenv/
β”œβ”€β”€ __init__.py, client.py, models.py # Core types and client
β”œβ”€β”€ server/
β”‚ β”œβ”€β”€ app.py # FastAPI server
β”‚ β”œβ”€β”€ sql_environment.py # Environment implementation
β”‚ β”œβ”€β”€ reward.py # Three-layer reward function
β”‚ β”œβ”€β”€ verifier.py # Answer verification
β”‚ └── Dockerfile # HF Spaces deployment
β”œβ”€β”€ evaluation/ # Green Agent evaluator, policies
β”œβ”€β”€ training/ # TRL adapter, data loading
β”œβ”€β”€ scripts/ # Data curation, SFT generation
β”œβ”€β”€ notebooks/ # Training, evaluation, showcase
β”œβ”€β”€ data/
β”‚ β”œβ”€β”€ databases/ # 10 Spider SQLite databases
β”‚ β”œβ”€β”€ questions/ # Train/eval question sets
β”‚ └── sft/ # SFT warmup trajectories
β”œβ”€β”€ configs/ # Training configurations
β”œβ”€β”€ tests/ # Unit and integration tests
└── docs/
β”œβ”€β”€ data-sources.md # Data provenance
└── ARCHITECTURE.md # System architecture
```
## References
- Yu et al. (2018). [Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task](https://yale-lily.github.io/spider). EMNLP.
- Shao et al. (2024). [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://arxiv.org/abs/2402.03300). (GRPO algorithm)
- Ng, Harada, Russell (1999). [Policy Invariance Under Reward Transformations](https://people.eecs.berkeley.edu/~pabbeel/cs287-fa09/readings/NgHaradaRussell-shaping-ICML1999.pdf). ICML.
- [OpenEnv framework](https://github.com/meta-pytorch/OpenEnv)
- [TRL OpenEnv docs](https://huggingface.co/docs/trl/openenv)
## License
Code: [MIT](LICENSE). Data: [CC BY-SA 4.0](DATA_LICENSE).