# ADAPT DSA Tutor OpenEnv
ADAPT, the Adversarial DSA Tutor, is an OpenEnv-compliant RLVR environment for training code-generation agents on small DSA tasks. The agent receives a problem prompt, examples, and visible tests, then submits Python code. The environment runs the code against visible and hidden tests and returns reward, pass-rate metrics, execution status, and feedback.
This repo now focuses solely on the environment layer; verifier work and training scripts are maintained separately.
## Why This Environment
The hackathon asks for OpenEnv environments that can improve LLM behavior through verifiable interaction. ADAPT targets a simple but useful skill loop:
agent writes code -> environment executes it -> hidden tests and reward signals score it -> trainer improves the agent
The differentiator is curriculum-ready DSA practice: each episode carries a problem id and difficulty tier so training can track per-tier success instead of only aggregate reward.
## OpenEnv Interface
The environment uses the latest OpenEnv API shape:
- `AdaptEnvironment(Environment[AdaptAction, AdaptObservation, AdaptState])`
- `reset()` returns a typed observation.
- `step(action)` accepts an `AdaptAction` with a Python `code` string.
- `state` exposes episode id, step count, current problem id, difficulty, and recent metrics.
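As a rough sketch of how these pieces fit together, the following self-contained Python mimics the `reset()`/`step(action)`/`state` shape with stand-in dataclasses. Every field name, default, and the `AdaptEnvironmentSketch` class here is an assumption for illustration, not the environment's actual definition:

```python
from dataclasses import dataclass, field

@dataclass
class AdaptAction:
    code: str  # submitted Python source, as in the Action schema below

@dataclass
class AdaptObservation:
    # Illustrative subset of the observation fields described below.
    problem_statement: str
    visible_tests: list = field(default_factory=list)
    feedback: str = ""
    reward: float = 0.0

@dataclass
class AdaptState:
    episode_id: int = 0
    step_count: int = 0
    problem_id: str = ""
    difficulty: str = "easy"

class AdaptEnvironmentSketch:
    """Minimal stand-in for the reset()/step(action)/state loop."""
    def __init__(self):
        self.state = AdaptState()

    def reset(self) -> AdaptObservation:
        # A real environment would sample a problem from the bank here.
        self.state = AdaptState(episode_id=self.state.episode_id + 1,
                                problem_id="easy_double", difficulty="easy")
        return AdaptObservation(problem_statement="Read n and print n * 2.",
                                visible_tests=[("3", "6")])

    def step(self, action: AdaptAction) -> AdaptObservation:
        self.state.step_count += 1
        # A real environment would execute action.code against tests here.
        return AdaptObservation(problem_statement="", feedback="executed",
                                reward=1.0)
```

A trainer would loop `reset()` followed by repeated `step(...)` calls, reading reward and feedback from each returned observation.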
`openenv.yaml` points to:

```yaml
app: server.app:app
port: 7860
```
### Action

```json
{
  "code": "n = int(input())\nprint(n * 2)"
}
```
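The environment executes submitted code against test inputs and compares stdout. As a simplified stand-in for that executor (the real sandboxing and timeout handling are not shown here), the example action's `code` can be run in a fresh interpreter via `subprocess`:

```python
import subprocess
import sys

def run_submission(code: str, stdin_text: str, timeout: float = 2.0) -> str:
    """Run submitted code in a fresh interpreter, feeding stdin and
    capturing stdout. A simplified stand-in for the environment's executor."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()

# The example action's code, checked against a visible-style test case:
code = "n = int(input())\nprint(n * 2)"
print(run_submission(code, "21"))  # -> 42
```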
### Observation
Reset and step observations include:
- problem statement
- input format
- constraints
- examples
- visible tests
- problem id
- difficulty tier
- feedback
- pass rate, visible pass rate, and hidden pass rate
- syntax/runtime/timeout status
- reward components
Hidden test inputs and expected outputs are never returned in observations.
## Reward
Reward is clipped to [0.0, 1.0] and combines multiple environment-level signals:
- correctness from visible and hidden pass rate
- syntax validity
- clean execution
- output format compliance
- timeout penalty
- runtime error penalty
- static safety rejection for dangerous imports such as `os`, `subprocess`, `socket`, `pathlib`, and `shutil`
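The static safety rejection can be sketched with an `ast`-based import scan. The module list matches the one above, but the environment's actual check may differ:

```python
import ast

BLOCKED_MODULES = {"os", "subprocess", "socket", "pathlib", "shutil"}

def passes_static_safety(code: str) -> bool:
    """Return False if the code imports a blocked module or fails to parse.
    Illustrative sketch only; not the environment's actual implementation."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import os.path` still resolves to top-level module "os"
            if any(alias.name.split(".")[0] in BLOCKED_MODULES
                   for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                return False
    return True
```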
If `verifier.verifier.verify(code, test_cases)` exists, the environment can use it as an optional reward augmentation. If the verifier is absent, the environment still works using executor-derived reward.
## Local Setup
Use Python 3.10+.
```powershell
cd C:\Users\kaust\PycharmProjects\meta-rl-dsa-solver
python -m venv .venv
.\.venv\Scripts\pip install -e .
```
During development on this machine, you can also point `PYTHONPATH` at the existing checked-out OpenEnv repo:

```powershell
$env:PYTHONPATH="C:\Users\kaust\PycharmProjects\OpenEnv\src;$PWD"
```
## Smoke Tests
Run the local smoke test:
```shell
python test.py
```
Check syntax:
```shell
python -m py_compile models.py env\adapt_env.py env\executor.py env\test_cases.py server\app.py
```
Start the OpenEnv server:
```shell
uvicorn server.app:app --host 0.0.0.0 --port 7860
```
Useful endpoints:
- `GET /health`
- `GET /schema`
- `POST /reset`
- `POST /step`
- `GET /state`
Example step request:
```shell
curl -X POST http://localhost:7860/step -H "Content-Type: application/json" -d "{\"action\":{\"code\":\"n=int(input())\nprint(n*2)\"}}"
```
Validate with OpenEnv once dependencies are installed:
```shell
openenv validate .
```
## Hugging Face Spaces
This repo is ready to push as a Docker Space:

```shell
openenv push --repo-id <your-hf-username>/adapt-dsa-tutor
```
Before final submission, add:
- live Hugging Face Space link
- training reward/loss plots from Disha's run
- before/after code example showing a problem the model failed before training and solved after training
- mini-blog or short video link
## Current Problem Bank
The environment includes a lightweight curated bank:
- `easy_double`
- `easy_sum_two`
- `medium_maximum`
- `medium_count_even`
- `hard_reverse_words`
The bank is intentionally small to keep the submission stable. Later work can expand it to 30-50 tiered problems without changing the OpenEnv API.
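Since each episode carries a problem id and difficulty tier, the per-tier success tracking mentioned under "Why This Environment" can be sketched like this, using the current bank's ids (the `TierStats` helper is illustrative, not part of the repo):

```python
from collections import defaultdict

# Mapping from problem id to tier, matching the current bank above.
PROBLEM_TIERS = {
    "easy_double": "easy",
    "easy_sum_two": "easy",
    "medium_maximum": "medium",
    "medium_count_even": "medium",
    "hard_reverse_words": "hard",
}

class TierStats:
    """Track per-tier success instead of only aggregate reward."""
    def __init__(self):
        self.attempts = defaultdict(int)
        self.solved = defaultdict(int)

    def record(self, problem_id: str, success: bool) -> None:
        tier = PROBLEM_TIERS[problem_id]
        self.attempts[tier] += 1
        self.solved[tier] += int(success)

    def success_rate(self, tier: str) -> float:
        if self.attempts[tier] == 0:
            return 0.0
        return self.solved[tier] / self.attempts[tier]
```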