
SETUP.md - Local Development Guide

Prerequisites

  • Python 3.10+
  • Git
  • Docker (optional, for a containerised run)
  • An OpenAI API key (optional, only for the LLM baseline)

1. Clone the repository

git clone https://github.com/Shivoo29/dummy_1.git
cd dummy_1
git checkout claude/openenv-ai-agent-environment-qJ9pB

2. Create a virtual environment

python -m venv .venv

# macOS / Linux
source .venv/bin/activate

# Windows (PowerShell)
.venv\Scripts\Activate.ps1

3. Install dependencies

pip install -r requirements.txt

4. Run the server

uvicorn app:app --host 0.0.0.0 --port 7860 --reload

5. Quick smoke test

# Health check
curl http://localhost:7860/health

# List tasks
curl http://localhost:7860/tasks

# Start a task1 episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task1", "ticket_index": 0}'

# The response contains an episode_id; use it below
EPISODE_ID="<paste episode_id here>"

# Submit a classification action
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d "{\"episode_id\": \"$EPISODE_ID\", \"action\": {\"action_type\": \"classify\", \"category\": \"billing\", \"priority\": \"high\"}}"

# Submit to close the episode
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d "{\"episode_id\": \"$EPISODE_ID\", \"action\": {\"action_type\": \"submit\"}}"

# Grade the episode
curl -X POST http://localhost:7860/grader \
  -H "Content-Type: application/json" \
  -d "{\"episode_id\": \"$EPISODE_ID\"}"
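The same smoke test can be scripted. The sketch below only builds the JSON bodies sent to /reset, /step, and /grader, so it runs without the server; the field names come from the curl calls above, while the helper names and the `"ep-123"` episode id are illustrative.

```python
import json

BASE = "http://localhost:7860"  # server from step 4

def reset_payload(task_id: str, ticket_index: int) -> str:
    # Body for POST /reset, as in the curl example above
    return json.dumps({"task_id": task_id, "ticket_index": ticket_index})

def step_payload(episode_id: str, action: dict) -> str:
    # Body for POST /step; every action carries an action_type
    return json.dumps({"episode_id": episode_id, "action": action})

def grade_payload(episode_id: str) -> str:
    # Body for POST /grader
    return json.dumps({"episode_id": episode_id})

# The classify-then-submit sequence from the smoke test:
classify = step_payload("ep-123", {"action_type": "classify",
                                   "category": "billing",
                                   "priority": "high"})
submit = step_payload("ep-123", {"action_type": "submit"})
print(classify)
print(submit)
```

Posting these bodies (e.g. with `curl -d` or any HTTP client) reproduces the flow above.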

6. Run the baseline

Heuristic baseline (no API key required)

# Single ticket (ticket_index 0)
python baseline.py --mode heuristic

# All 5 tickets per task, averaged
python baseline.py --mode heuristic --all-tickets

Expected output:

task1: 0.8600  (scores: [1.0, 1.0, 1.0, 1.0, 0.3])
task2: 0.5614  (scores: [0.8, 0.386, 0.45, 0.7, 0.471])
task3: 0.9895  (scores: [1.0, 0.992, 0.961, 0.994, 1.0])
OVERALL AVERAGE: 0.8036
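As a sanity check on the numbers above: each task score is the mean of its five ticket scores, and (assuming the tasks are weighted equally) the overall average is the mean of the three task scores. Note the printed per-ticket scores are rounded, so recomputed averages can differ in the last digit.

```python
scores = {
    "task1": [1.0, 1.0, 1.0, 1.0, 0.3],
    "task2": [0.8, 0.386, 0.45, 0.7, 0.471],
    "task3": [1.0, 0.992, 0.961, 0.994, 1.0],
}

# Per-task average over the 5 tickets
task_avg = {t: sum(s) / len(s) for t, s in scores.items()}
for t, avg in task_avg.items():
    print(f"{t}: {avg:.4f}")

# Unweighted mean of the three task averages
overall = sum(task_avg.values()) / len(task_avg)
print(f"OVERALL AVERAGE: {overall:.4f}")
```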

LLM baseline (requires OpenAI API key)

export OPENAI_API_KEY="sk-..."          # macOS/Linux
# $env:OPENAI_API_KEY="sk-..."          # Windows PowerShell

python baseline.py --mode llm --model gpt-4o-mini
python baseline.py --mode llm --model gpt-4o-mini --all-tickets

7. Run with Docker

# Build
docker build -t supportenv .

# Run (no API key needed for heuristic mode)
docker run -p 7860:7860 supportenv

# Run with OpenAI key for LLM baseline
docker run -p 7860:7860 -e OPENAI_API_KEY="sk-..." supportenv
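Once the container starts, it can take a few seconds before /health responds. If you script against the container, a small poll helper is handy; the one below is a hypothetical utility (not part of this repo), demonstrated with a stub check instead of a real HTTP call.

```python
import time

def wait_until(check, timeout: float = 30.0, interval: float = 0.1) -> bool:
    """Call check() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Against the real container, check() would GET http://localhost:7860/health
# and return True on a 200 response. A stub shows the retry behaviour:
attempts = []
ready = wait_until(lambda: attempts.append(1) or len(attempts) >= 3)
print(ready, len(attempts))
```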

8. Project layout

dummy_1/
├── app.py            FastAPI server - all HTTP endpoints
├── environment.py    Episode lifecycle: reset / step / state / grade
├── graders.py        Deterministic graders for all 3 tasks
├── data.py           15 pre-defined tickets + ground-truth answers
├── models.py         Pydantic typed models (Observation, Action, Reward, ...)
├── baseline.py       Heuristic + LLM baseline inference scripts
├── openenv.yaml      OpenEnv spec metadata
├── Dockerfile        HF Spaces-compatible container (port 7860)
├── requirements.txt  Python dependencies
├── README.md         Full environment documentation
└── SETUP.md          This file

9. Key files to edit when extending

| What you want to change | File to edit |
| --- | --- |
| Add / modify tickets | data.py - TASK1/2/3_TICKETS lists |
| Change grader weights | graders.py - grade_task1/2/3() |
| Add a new task | data.py (add task meta) + graders.py + app.py (_ACTION_SCHEMAS) |
| Change reward shaping | environment.py - _step_reward_task* functions and constants |
| Add an endpoint | app.py |
| Change typed models | models.py |

10. Deploy to Hugging Face Spaces

  1. Create a new Space at https://huggingface.co/new-space
    • SDK: Docker
    • Visibility: Public
  2. Add the HF Space as a remote:
    git remote add hf https://huggingface.co/spaces/<your-username>/<space-name>
    
  3. Push:
    git push hf claude/openenv-ai-agent-environment-qJ9pB:main
    
  4. The Space auto-builds from the Dockerfile and exposes port 7860.

11. Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | Only for the LLM baseline | Your OpenAI API key |
| PORT | No (default 7860) | Overrides the server port |
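For reference, this is how the variables would typically be read; a sketch only, since the actual lookup lives in app.py / baseline.py.

```python
import os

# PORT falls back to 7860 when unset, matching the Dockerfile
port = int(os.environ.get("PORT", "7860"))

# OPENAI_API_KEY is only needed for `baseline.py --mode llm`;
# in heuristic mode it may simply be absent
api_key = os.environ.get("OPENAI_API_KEY")
print(port, api_key is not None)
```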

12. Running tests

python -c "
import environment as env
from models import Action

# Verify all 3 tasks reset and grade correctly
for task_id in ['task1', 'task2', 'task3']:
    for i in range(5):
        obs = env.reset(task_id, i)
        env.step(obs.episode_id, Action(action_type='submit'))
        gr = env.grade(obs.episode_id)
        assert 0.0 <= gr.score <= 1.0, f'Score out of range: {gr.score}'
        print(f'{task_id} ticket[{i}]: score={gr.score:.4f} OK')

print('All tests passed.')
"