Spaces:

exploring-solver
/

my-env

Sleeping

App Files Files Community

my-env / setup.md

exploring-solver

Submission tweaks

62c5bbf 28 days ago

preview code

raw

history blame contribute delete

3.64 kB

	# setup.md - SupportEnv Validator-Focused Runbook

	## 1. What judges/validator execute

	Most checks align to this flow:

	1. `POST /reset` on the deployed Space
	2. `docker build` from repo root
	3. `openenv validate`
	4. endpoint contract checks for `/health`, `/reset`, `/step`, `/state`, `/grader`
	5. `python inference.py` and stdout format check for `[START]`, `[STEP]`, `[END]`


	## 2. File-by-file usage (root)

	- `app.py`: FastAPI API surface (`/reset`, `/step`, `/state`, `/tasks`, `/grader`, `/health`)
	- `environment.py`: episode lifecycle and reward accumulation (`reset`, `step`, `get_state`, `grade`)
	- `graders.py`: deterministic terminal scoring per task with score clamped to `[0.0, 1.0]`
	- `data.py`: task metadata and ticket datasets with ground truth labels/entities/steps
	- `models.py`: typed Pydantic models used by API and internal state
	- `inference.py`: baseline runner; calls the API, logs strict `[START]/[STEP]/[END]`
	- `openenv.yaml`: OpenEnv metadata and interface declaration used by validator
	- `Dockerfile`: image build/runtime contract for HF Docker Spaces (serves on `7860`)
	- `requirements.txt`: runtime dependencies
	- `pyproject.toml`: packaging metadata + script entrypoint expected by validator tooling
	- `uv.lock`: lockfile required by OpenEnv multi-mode validation path
	- `server/app.py`: validator-friendly script entrypoint (`server = server.app:main`)


	## 3. Local setup

	### Windows PowerShell

	```powershell
	python -m venv .venv
	.venv\Scripts\Activate.ps1
	pip install -r requirements.txt
	```

	### macOS/Linux

	```bash
	python -m venv .venv
	source .venv/bin/activate
	pip install -r requirements.txt
	```


	## 4. Validation checklist (exact order)

	1. OpenEnv validator

	```bash
	.venv/Scripts/openenv.exe validate
	```

	2. Docker build

	```bash
	docker build -t supportenv .
	```

	3. Run server locally

	```bash
	uvicorn app:app --host 0.0.0.0 --port 7860
	```

	4. API checks

	```bash
	curl http://127.0.0.1:7860/health
	curl -X POST http://127.0.0.1:7860/reset -H "Content-Type: application/json" -d '{"task_id":"task1","ticket_index":0}'
	curl -X POST http://127.0.0.1:7860/step -H "Content-Type: application/json" -d '{"episode_id":"<id>","action":{"action_type":"classify","category":"billing","priority":"high"}}'
	curl -X POST http://127.0.0.1:7860/state?episode_id=<id>
	curl -X POST http://127.0.0.1:7860/grader -H "Content-Type: application/json" -d '{"episode_id":"<id>"}'
	```

	5. Baseline inference

	```bash
	python inference.py
	```


	## 5. Docker and Spaces runtime model

	- Build stage installs from `requirements.txt`.
	- Runtime command runs Uvicorn: `app:app` on `0.0.0.0:7860`.
	- HF Space should set `sdk: docker` and `app_port: 7860` in `README.md` frontmatter.
	- Healthcheck points at `/health` to indicate container liveness.
	- If Docker daemon is not running locally, `docker build`/`docker run` will fail even if repo is correct.


	## 6. Inference variables

	- Required for LLM call path:
	- `API_BASE_URL`
	- `MODEL_NAME`
	- `HF_TOKEN`
	- Environment endpoint:
	- `OPENENV_BASE_URL` (preferred)
	- `API_BASE_URL_ENV` (legacy alias)


	## 7. Example scorer sanity checks

	- Task 1: submit `classify` then `submit`, verify non-binary reward and final score in `[0, 1]`
	- Task 2: include deterministic entity/action coverage keys from ticket text
	- Task 3: include professional response plus ordered resolution steps


	## 8. Common failure causes

	- Missing `pyproject.toml` or `uv.lock`
	- Missing script entrypoint (`server = server.app:main`)
	- App not serving on `0.0.0.0:7860`
	- Duplicate HF variable/secret names in Space settings
	- Invalid or missing `HF_TOKEN` for real LLM inference