from __future__ import annotations


def render_home_page() -> str:
    return """
FlakyGym Control Center
FlakyGym Space

FlakyGym Control Center

This console runs flaky-test benchmark episodes and streams live logs. Use it to configure runs, estimate runtime, and review grader outcomes quickly.

Quick Brief: Dataset + Graders

Dataset: dataset/py_tasks.csv

Each row is one flaky-test investigation task created from py-data.csv (repo + SHA + target test + labels + optional known fix diff).

Headers:

`repo_url`, `sha`, `test_name`, `test_file`, `category`, `label`, `status`, `pr_link`, `task_types`, `test_code`, `known_fix_diff`

3 Graders in Brief

  • Task 1 (`classify`): exact match on the flaky vs. stable label.
  • Task 2 (`root_cause`): scored with a category-similarity matrix, so a related but wrong category can earn partial credit.
  • Task 3 (`fix_proposal`): weighted score combining pattern match, patch applicability, and an LLM judge.

Run Configuration

Episodes: 1 (range 1-100)
Steps: 20 (range 1-100)

Add tasks from the dropdown; remove one with the x on its chip.

Estimated runtime: ~09m 00s

3 task(s) × 1 episode(s) × 180s/episode = 540s

Open API Docs

Tip: if no API key is provided, inference.py falls back to its heuristic agent.

Run Status

idle
Job ID: -
Return Code: -
Started: -
Finished: -
Live Logs (0 lines)
No run started yet.
"""