ADAPT DSA Tutor OpenEnv

ADAPT, the Adversarial DSA Tutor, is an OpenEnv-compliant RLVR environment for training code-generation agents on small DSA tasks. The agent receives a problem prompt, examples, and visible tests, then submits Python code. The environment runs the code against visible and hidden tests and returns reward, pass-rate metrics, execution status, and feedback.

This repo now focuses on the environment layer only. Verifier work and training scripts are owned separately.

Why This Environment

The hackathon asks for OpenEnv environments that can improve LLM behavior through verifiable interaction. ADAPT targets a simple but useful skill loop:

agent writes code -> environment executes it -> hidden tests and reward signals score it -> trainer improves the agent

The differentiator is curriculum-ready DSA practice: each episode carries a problem id and difficulty tier so training can track per-tier success instead of only aggregate reward.

OpenEnv Interface

The environment uses the latest OpenEnv API shape:

  • AdaptEnvironment(Environment[AdaptAction, AdaptObservation, AdaptState])
  • reset() returns a typed observation.
  • step(action) accepts an AdaptAction with a Python code string.
  • state exposes episode id, step count, current problem id, difficulty, and recent metrics.

openenv.yaml points to:

app: server.app:app
port: 7860

Action

{
    "code": "n = int(input())\nprint(n * 2)"
}

Observation

Reset and step observations include:

  • problem statement
  • input format
  • constraints
  • examples
  • visible tests
  • problem id
  • difficulty tier
  • feedback
  • pass rate, visible pass rate, and hidden pass rate
  • syntax/runtime/timeout status
  • reward components

Hidden test inputs and expected outputs are never returned in observations.

Reward

Reward is clipped to [0.0, 1.0] and combines multiple environment-level signals:

  • correctness from visible and hidden pass rate
  • syntax validity
  • clean execution
  • output format compliance
  • timeout penalty
  • runtime error penalty
  • static safety rejection for dangerous imports such as os, subprocess, socket, pathlib, and shutil

If verifier.verifier.verify(code, test_cases) exists, the environment can use it as an optional reward augmentation. If the verifier is absent, the environment still works using executor-derived reward.

Local Setup

Use Python 3.10+.

cd C:\Users\kaust\PycharmProjects\meta-rl-dsa-solver
python -m venv .venv
.\.venv\Scripts\pip install -e .

For this local machine, the existing checked-out OpenEnv repo can also be used during development:

$env:PYTHONPATH="C:\Users\kaust\PycharmProjects\OpenEnv\src;$PWD"

Smoke Tests

Run the local smoke test:

python test.py

Check syntax:

python -m py_compile models.py env\adapt_env.py env\executor.py env\test_cases.py server\app.py

Start the OpenEnv server:

uvicorn server.app:app --host 0.0.0.0 --port 7860

Useful endpoints:

  • GET /health
  • GET /schema
  • POST /reset
  • POST /step
  • GET /state

Example step request:

curl -X POST http://localhost:7860/step -H "Content-Type: application/json" -d "{\"action\":{\"code\":\"n=int(input())\nprint(n*2)\"}}"

Validate with OpenEnv once dependencies are installed:

openenv validate .

Hugging Face Spaces

This repo is Docker Space ready:

openenv push --repo-id <your-hf-username>/adapt-dsa-tutor

Before final submission, add:

  • live Hugging Face Space link
  • training reward/loss plots from Disha's run
  • before/after code example showing a problem the model failed before training and solved after training
  • mini-blog or short video link

Current Problem Bank

The environment includes a lightweight curated bank:

  • easy_double
  • easy_sum_two
  • medium_maximum
  • medium_count_even
  • hard_reverse_words

This is intentionally small for submission-minimum stability. Later work can expand it to 30-50 tiered problems without changing the OpenEnv API.

Downloads last month

-

Downloads are not tracked for this model. How to track
Video Preview
loading