# setup.md - SupportEnv Validator-Focused Runbook

## 1. What the judges/validator execute

Most checks follow this flow:

1. `POST /reset` on the deployed Space
2. `docker build` from repo root
3. `openenv validate`
4. endpoint contract checks for `/health`, `/reset`, `/step`, `/state`, `/grader`
5. `python inference.py` and stdout format check for `[START]`, `[STEP]`, `[END]`
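
The stdout format check in step 5 can be approximated locally before submitting. The following is a minimal sketch, assuming each log line begins with one of the three markers and that a run consists of one or more episodes shaped `[START]`, any number of `[STEP]` lines, then `[END]`. The helper name `check_log_order` is hypothetical, and the validator's real rules may be stricter.

```python
import re

# Assumption: marker lines start with [START], [STEP] or [END]; the payload
# that follows each marker is not specified in this runbook and is ignored.
MARKER = re.compile(r"^\[(START|STEP|END)\]")


def check_log_order(stdout: str) -> bool:
    """Return True if marker lines form complete START ... STEP ... END episodes."""
    inside_episode = False
    finished_episodes = 0
    for line in stdout.splitlines():
        match = MARKER.match(line.strip())
        if match is None:
            continue  # non-marker lines are skipped in this sketch
        tag = match.group(1)
        if tag == "START":
            if inside_episode:
                return False  # START before the previous episode ended
            inside_episode = True
        elif tag == "STEP":
            if not inside_episode:
                return False  # STEP outside an episode
        else:  # END
            if not inside_episode:
                return False  # END without a matching START
            inside_episode = False
            finished_episodes += 1
    # At least one episode must have completed, and none may be left open.
    return finished_episodes > 0 and not inside_episode
```

One way to use it is to capture the output of `python inference.py` into a string and pass it to `check_log_order`.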


## 2. File-by-file usage (root)

- `app.py`: FastAPI API surface (`/reset`, `/step`, `/state`, `/tasks`, `/grader`, `/health`)
- `environment.py`: episode lifecycle and reward accumulation (`reset`, `step`, `get_state`, `grade`)
- `graders.py`: deterministic terminal scoring per task with score clamped to `[0.0, 1.0]`
- `data.py`: task metadata and ticket datasets with ground truth labels/entities/steps
- `models.py`: typed Pydantic models used by API and internal state
- `inference.py`: baseline runner; calls the API, logs strict `[START]/[STEP]/[END]`
- `openenv.yaml`: OpenEnv metadata and interface declaration used by validator
- `Dockerfile`: image build/runtime contract for HF Docker Spaces (serves on `7860`)
- `requirements.txt`: runtime dependencies
- `pyproject.toml`: packaging metadata + script entrypoint expected by validator tooling
- `uv.lock`: lockfile required by OpenEnv multi-mode validation path
- `server/app.py`: validator-friendly script entrypoint (`server = server.app:main`)
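
The clamping that `graders.py` is described as applying can be illustrated with a small helper. This is a sketch rather than the project's actual code: the name `clamp_score` is hypothetical, and mapping NaN to `0.0` is an assumption made here so that the result is always a valid score.

```python
import math


def clamp_score(raw: float) -> float:
    """Clamp a raw score into the closed interval [0.0, 1.0]."""
    if math.isnan(raw):
        return 0.0  # assumption: a NaN score counts as zero
    return max(0.0, min(1.0, raw))
```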


## 3. Local setup

### Windows PowerShell

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
```

### macOS/Linux

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```


## 4. Validation checklist (exact order)

1. OpenEnv validator

```bash
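# Run with the virtual environment activated.
# On Windows without activation, call .venv/Scripts/openenv.exe directly.
openenv validate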
```

2. Docker build

```bash
docker build -t supportenv .
```

3. Run server locally

```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```

4. API checks

```bash
curl http://127.0.0.1:7860/health
curl -X POST http://127.0.0.1:7860/reset -H "Content-Type: application/json" -d '{"task_id":"task1","ticket_index":0}'
curl -X POST http://127.0.0.1:7860/step -H "Content-Type: application/json" -d '{"episode_id":"<id>","action":{"action_type":"classify","category":"billing","priority":"high"}}'
curl -X POST "http://127.0.0.1:7860/state?episode_id=<id>"
curl -X POST http://127.0.0.1:7860/grader -H "Content-Type: application/json" -d '{"episode_id":"<id>"}'
```

5. Baseline inference

```bash
python inference.py
```
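
The API checks in step 4 require copying the episode id by hand between calls. A scripted version might look like the sketch below, which uses only the standard library. It assumes the `/reset` response exposes the id under an `episode_id` key, which this runbook does not confirm. The helper names `post_json`, `send` and `smoke_test` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7860"  # local server started in step 3


def post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def smoke_test() -> dict:
    """Run reset, one classify step, then the grader, and return the grader reply."""
    reset = send(post_json("/reset", {"task_id": "task1", "ticket_index": 0}))
    # Assumption: the reset response carries the id under "episode_id".
    episode_id = reset["episode_id"]
    action = {"action_type": "classify", "category": "billing", "priority": "high"}
    send(post_json("/step", {"episode_id": episode_id, "action": action}))
    return send(post_json("/grader", {"episode_id": episode_id}))
```

With the server running, calling `smoke_test()` from a Python shell exercises the same three POST endpoints as the `curl` commands.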


## 5. Docker and Spaces runtime model

- Build stage installs from `requirements.txt`.
- Runtime command runs Uvicorn: `app:app` on `0.0.0.0:7860`.
- HF Space should set `sdk: docker` and `app_port: 7860` in `README.md` frontmatter.
- The healthcheck targets `/health` to report container liveness.
- If the Docker daemon is not running locally, `docker build`/`docker run` will fail even if the repo is correct.


## 6. Inference variables

- Required for LLM call path:
  - `API_BASE_URL`
  - `MODEL_NAME`
  - `HF_TOKEN`
- Environment endpoint:
  - `OPENENV_BASE_URL` (preferred)
  - `API_BASE_URL_ENV` (legacy alias)
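
The precedence between the two endpoint variables can be sketched as follows. This is an illustration, not the logic in `inference.py`: the helper names are hypothetical, and the fallback to `http://127.0.0.1:7860` when neither variable is set is an assumption.

```python
import os

LLM_VARS = ("API_BASE_URL", "MODEL_NAME", "HF_TOKEN")


def resolve_env_url(environ=None, default="http://127.0.0.1:7860"):
    """Pick the environment endpoint: preferred name first, then the legacy alias."""
    env = os.environ if environ is None else environ
    for name in ("OPENENV_BASE_URL", "API_BASE_URL_ENV"):
        value = env.get(name, "").strip()
        if value:
            return value.rstrip("/")  # drop a trailing slash for clean joins
    return default  # assumption: fall back to the local server


def missing_llm_vars(environ=None):
    """List the LLM variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in LLM_VARS if not env.get(name)]
```

Running `missing_llm_vars()` before `python inference.py` gives a quick indication of whether the LLM call path is configured.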


## 7. Example scorer sanity checks

- Task 1: submit `classify` then `submit`, verify non-binary reward and final score in `[0, 1]`
- Task 2: include deterministic entity/action coverage keys from ticket text
- Task 3: include professional response plus ordered resolution steps


## 8. Common failure causes

- Missing `pyproject.toml` or `uv.lock`
- Missing script entrypoint (`server = server.app:main`)
- App not serving on `0.0.0.0:7860`
- Duplicate HF variable/secret names in Space settings
- Invalid or missing `HF_TOKEN` for real LLM inference
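
The first two causes can be caught with a quick preflight check before running the validator. The sketch below is a hypothetical helper, not part of the repo. It tests only that the files exist and that the string `server.app:main` appears somewhere in `pyproject.toml`. It does not parse the TOML or verify which table the entry sits in.

```python
from pathlib import Path


def preflight(repo_root: str = ".") -> list[str]:
    """Return a list of human-readable problems found in the repo root."""
    root = Path(repo_root)
    problems = []
    for name in ("pyproject.toml", "uv.lock"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")
    pyproject = root / "pyproject.toml"
    if pyproject.is_file():
        text = pyproject.read_text(encoding="utf-8")
        # Plain substring check: enough to catch a forgotten entrypoint,
        # but it does not validate the surrounding TOML structure.
        if "server.app:main" not in text:
            problems.append("script entrypoint server.app:main not found in pyproject.toml")
    return problems
```

An empty list means none of these particular problems were detected; it says nothing about the port or Space settings.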