# SETUP.md: Local Development Guide

## Prerequisites

- Python 3.10+ ([download](https://www.python.org/downloads/))
- Git
- Docker (optional, for containerised run)
- An OpenAI API key (optional, only for the LLM baseline)

---

## 1. Clone the repository

```bash
git clone https://github.com/Shivoo29/dummy_1.git
cd dummy_1
git checkout claude/openenv-ai-agent-environment-qJ9pB
```

---

## 2. Create a virtual environment

```bash
python -m venv .venv

# macOS / Linux
source .venv/bin/activate

# Windows (PowerShell)
.venv\Scripts\Activate.ps1
```

---

## 3. Install dependencies

```bash
pip install -r requirements.txt
```

---

## 4. Run the server

```bash
uvicorn app:app --host 0.0.0.0 --port 7860 --reload
```

- API: http://localhost:7860
- Interactive docs (Swagger UI): http://localhost:7860/docs
- ReDoc: http://localhost:7860/redoc

---

## 5. Quick smoke test

```bash
# Health check
curl http://localhost:7860/health

# List tasks
curl http://localhost:7860/tasks

# Start a task1 episode
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_id": "task1", "ticket_index": 0}'

# The response contains an episode_id; use it below
EPISODE_ID="<paste episode_id here>"

# Submit a classification action
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d "{\"episode_id\": \"$EPISODE_ID\", \"action\": {\"action_type\": \"classify\", \"category\": \"billing\", \"priority\": \"high\"}}"

# Submit to close the episode
curl -X POST http://localhost:7860/step \
  -H "Content-Type: application/json" \
  -d "{\"episode_id\": \"$EPISODE_ID\", \"action\": {\"action_type\": \"submit\"}}"

# Grade the episode
curl -X POST http://localhost:7860/grader \
  -H "Content-Type: application/json" \
  -d "{\"episode_id\": \"$EPISODE_ID\"}"
```
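
The same sequence can be scripted in Python, which avoids the nested shell quoting and the manual copy of `episode_id`. This is a minimal sketch using only the standard library; the helper names (`smoke_requests`, `post_json`, `run_smoke_test`) are hypothetical, and it assumes that `episode_id` is a top-level key of the `/reset` response, which this guide does not confirm.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumes the server from step 4 is running


def smoke_requests(episode_id):
    """Return the (path, payload) pairs sent after /reset, in order."""
    return [
        ("/step", {"episode_id": episode_id,
                   "action": {"action_type": "classify",
                              "category": "billing",
                              "priority": "high"}}),
        ("/step", {"episode_id": episode_id,
                   "action": {"action_type": "submit"}}),
        ("/grader", {"episode_id": episode_id}),
    ]


def post_json(path, payload, base_url=BASE_URL):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_smoke_test(base_url=BASE_URL):
    """Reset a task1 episode, then classify, submit and grade it."""
    reset = post_json("/reset", {"task_id": "task1", "ticket_index": 0}, base_url)
    # Assumption: episode_id is a top-level key of the /reset response.
    episode_id = reset["episode_id"]
    return [post_json(path, body, base_url)
            for path, body in smoke_requests(episode_id)]
```

With the server running, `run_smoke_test()` returns the three decoded responses; the last one is the grader result.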

---

## 6. Run the baseline

### Heuristic baseline (no API key required)

```bash
# Single ticket (ticket_index 0)
python baseline.py --mode heuristic

# All 5 tickets per task, averaged
python baseline.py --mode heuristic --all-tickets
```

Expected output with `--all-tickets`:
```
task1: 0.8600  (scores: [1.0, 1.0, 1.0, 1.0, 0.3])
task2: 0.5614  (scores: [0.8, 0.386, 0.45, 0.7, 0.471])
task3: 0.9895  (scores: [1.0, 0.992, 0.961, 0.994, 1.0])
OVERALL AVERAGE: 0.8036
```
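
As a sanity check, the averages in that output can be recomputed from the listed per-ticket scores. The sketch below assumes each task figure is the plain mean of its five ticket scores and that the overall figure is the unweighted mean of the three task averages; the numbers are consistent with that, but `baseline.py` is the authority. The per-ticket scores appear to be rounded to three places, so the task3 check allows a small tolerance.

```python
from statistics import mean

# Per-ticket scores copied from the expected output above.
scores = {
    "task1": [1.0, 1.0, 1.0, 1.0, 0.3],
    "task2": [0.8, 0.386, 0.45, 0.7, 0.471],
    "task3": [1.0, 0.992, 0.961, 0.994, 1.0],
}
# Task averages as printed in the expected output.
printed = {"task1": 0.8600, "task2": 0.5614, "task3": 0.9895}

task_means = {task: mean(vals) for task, vals in scores.items()}
for task, value in task_means.items():
    # Displayed ticket scores look rounded, so allow a gap of up to 0.001.
    assert abs(value - printed[task]) < 1e-3, task

# Assumption: the overall figure is the mean of the three task averages.
overall = mean(list(printed.values()))
print(f"OVERALL AVERAGE: {overall:.4f}")
```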

### LLM baseline (requires OpenAI API key)

```bash
export OPENAI_API_KEY="sk-..."          # macOS/Linux
# $env:OPENAI_API_KEY="sk-..."          # Windows PowerShell

python baseline.py --mode llm --model gpt-4o-mini
python baseline.py --mode llm --model gpt-4o-mini --all-tickets
```

---

## 7. Run with Docker

```bash
# Build
docker build -t supportenv .

# Run (no API key needed for heuristic mode)
docker run -p 7860:7860 supportenv

# Run with OpenAI key for LLM baseline
docker run -p 7860:7860 -e OPENAI_API_KEY="sk-..." supportenv
```

---

## 8. Project layout

```
dummy_1/
├── app.py            FastAPI server: all HTTP endpoints
├── environment.py    Episode lifecycle: reset / step / state / grade
├── graders.py        Deterministic graders for all 3 tasks
├── data.py           15 pre-defined tickets + ground truth answers
├── models.py         Pydantic typed models (Observation, Action, Reward…)
├── baseline.py       Heuristic + LLM baseline inference scripts
├── openenv.yaml      OpenEnv spec metadata
├── Dockerfile        HF Spaces-compatible container (port 7860)
├── requirements.txt  Python dependencies
├── README.md         Full environment documentation
└── SETUP.md          This file
```

---

## 9. Key files to edit when extending

| What you want to change | File to edit |
|------------------------|-------------|
| Add / modify tickets | `data.py`: `TASK1/2/3_TICKETS` lists |
| Change grader weights | `graders.py`: `grade_task1/2/3()` |
| Add a new task | `data.py` (add task meta) + `graders.py` + `app.py` (`_ACTION_SCHEMAS`) |
| Change reward shaping | `environment.py`: `_step_reward_task*` functions and constants |
| Add an endpoint | `app.py` |
| Change typed models | `models.py` |

---

## 10. Deploy to Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space
   - SDK: **Docker**
   - Visibility: Public
2. Add the HF Space as a remote:
   ```bash
   git remote add hf https://huggingface.co/spaces/<your-username>/<space-name>
   ```
3. Push:
   ```bash
   git push hf claude/openenv-ai-agent-environment-qJ9pB:main
   ```
4. The Space auto-builds from the `Dockerfile` and exposes port 7860.

---

## 11. Environment variables

| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Only for LLM baseline | Your OpenAI API key |
| `PORT` | No (default 7860) | Override server port |
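
For illustration, the two variables could be read as follows. This is a hedged sketch: `load_settings` and `DEFAULT_PORT` are hypothetical names, and the repository's own handling of `PORT` and `OPENAI_API_KEY` may differ.

```python
import os

DEFAULT_PORT = 7860  # default taken from the table above


def load_settings(environ=os.environ):
    """Read PORT and OPENAI_API_KEY from an environment mapping (hypothetical helper)."""
    port = int(environ.get("PORT", DEFAULT_PORT))
    api_key = environ.get("OPENAI_API_KEY")  # None unless the variable is set
    # Only report whether a key is present; never echo the key itself.
    return {"port": port, "llm_enabled": bool(api_key)}
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching the real environment, for example `load_settings({"PORT": "8000"})`.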

---

## 12. Running tests

```bash
python -c "
import environment as env
from models import Action

# Verify all 3 tasks reset and grade correctly
for task_id in ['task1', 'task2', 'task3']:
    for i in range(5):
        obs = env.reset(task_id, i)
        env.step(obs.episode_id, Action(action_type='submit'))
        gr = env.grade(obs.episode_id)
        assert 0.0 <= gr.score <= 1.0, f'Score out of range: {gr.score}'
        print(f'{task_id} ticket[{i}]: score={gr.score:.4f} OK')

print('All tests passed.')
"
```