# Exploration
Ideas, technology research, and ad-hoc investigation notes. This is a scratchpad: content here is not the system of record.
**Diataxis type:** Exploration (learning-oriented, not yet distilled)
## What Goes Here
- Technology evaluations and comparisons
- Prototype findings
- External API exploration
- Performance investigations
- Ideation and backlog notes
## What Does NOT Go Here
- Durable learnings (go to `docs/learnings/`)
- Design decisions (go to `docs/design-docs/`)
- Implementation specs (go to `specs/`)
- Operational how-to guides (go to `docs/guides/`)
## Exploration Index
| Topic | Type | Date | Summary |
|-------|------|------|---------|
| [grpo-collapse-analysis.md](grpo-collapse-analysis.md) | Investigation | 2026-04 | Post-mortem on Qwen3-1.7B GRPO collapse into degenerate null-argument tool calls |
| [grpo-plateau-plan.md](grpo-plateau-plan.md) | Investigation | 2026-04 | Interventions to push past 30-40% accuracy plateau in GRPO training |
| [grpo-training-session-log.md](grpo-training-session-log.md) | Investigation | 2026-04 | Running log of SFT warmup + GRPO training sessions on Colab L4 |
| [rl-vs-icl-research.md](rl-vs-icl-research.md) | Comparison | 2026-04 | When GRPO training adds value over pure prompting for small SQL agents |
| [train-grpo-walkthrough.md](train-grpo-walkthrough.md) | Prototype | 2026-04 | Step-by-step companion guide for train_grpo.ipynb |
## Types
- **Tech Eval:** Evaluating a library, framework, or service
- **Prototype:** Findings from exploratory prototyping
- **Investigation:** Deep dive into a specific problem
- **Comparison:** Side-by-side analysis of options
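As the index grows, a small check can catch rows whose Type is not one of the values listed above. The following is a minimal sketch, not an existing tool in this repository: the function names `parse_index_row` and `check_index_row` are hypothetical, and the `YYYY-MM` date format is an assumption inferred from the current index rows.

```python
import re

# Allowed values for the "Type" column, taken from the Types list in this README.
ALLOWED_TYPES = {"Tech Eval", "Prototype", "Investigation", "Comparison"}

# Matches a markdown link such as [name.md](name.md).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_index_row(line):
    """Split one index table row into (link_target, type, date, summary).

    Hypothetical helper. Assumes the four-column layout used in the
    Exploration Index and that no cell contains a literal "|".
    """
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != 4:
        raise ValueError(f"expected 4 columns, got {len(cells)}: {line!r}")
    match = LINK_RE.fullmatch(cells[0])
    if match is None:
        raise ValueError(f"Topic cell is not a markdown link: {cells[0]!r}")
    return match.group(2), cells[1], cells[2], cells[3]


def check_index_row(line):
    """Return a list of problems found in one row (empty list means OK)."""
    target, row_type, date, _summary = parse_index_row(line)
    problems = []
    if row_type not in ALLOWED_TYPES:
        problems.append(f"unknown type {row_type!r}")
    # Assumption: dates are written as YYYY-MM, as in the current rows.
    if not re.fullmatch(r"\d{4}-\d{2}", date):
        problems.append(f"date {date!r} is not YYYY-MM")
    if not target.endswith(".md"):
        problems.append(f"link target {target!r} is not a .md file")
    return problems
```

Calling `check_index_row` on each data row of the index table (skipping the header and separator rows) returns an empty list for a well-formed row and a list of human-readable problems otherwise.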
## Graduating Content
When exploration produces durable insights:
1. Extract patterns to `docs/learnings/<category>.md`
2. Create reference files in `docs/references/` for agent context
3. Create how-to guides in `docs/guides/` for operational procedures