AE-Shree committed on
Commit ·
f3f7834
1
Parent(s): dfa9f05
Selected Us To Round 2 !!
Browse files
- README.md +6 -12
- grader/clm_graders.py +82 -63
- inference.py +5 -2
- models.py +231 -100
- openenv.yaml +4 -4
- server/app.py +13 -7
- tests/test_clm.py +257 -0
README.md
CHANGED
|
@@ -17,21 +17,21 @@ tags: [openenv, rl, scheduling, agent-eval, productivity]
|
|
| 17 |
[](#)
|
| 18 |
[](#)
|
| 19 |
|
| 20 |
-
CLM is a **real-world productivity simulation** where an AI agent plays the role of a human knowledge worker's task scheduler. It must manage heterogeneous work items
|
| 21 |
|
| 22 |
*This is not a toy game.* CLM models how humans actually experience workload: stress accumulates when deadlines approach, fatigue reduces efficiency, context-switching has a cognitive cost, and deep focus yields better output at the expense of higher energy.
|
| 23 |
|
| 24 |
-
|
| 25 |
|
| 26 |
## Why This Environment Matters
|
| 27 |
|
| 28 |
-
Modern knowledge workers face **cognitive load management** as one of their most critical daily challenges
|
| 29 |
|
| 30 |
- **Useful for training agents** that assist with personal productivity tools, calendar management, and task triage systems.
|
| 31 |
-
- **Useful for evaluating LLM planning ability**
|
| 32 |
- **Realistic dynamics**: energy, stress, fatigue, and task dependencies create emergent difficulty that pure search algorithms cannot exploit.
|
| 33 |
|
| 34 |
-
|
| 35 |
|
| 36 |
## Actions
|
| 37 |
|
|
@@ -50,7 +50,7 @@ Action format:
|
|
| 50 |
{"type": "break", "task_id": null}
|
| 51 |
```
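
For readers wiring up a client, a small validation sketch (a hypothetical helper, not part of the repo) captures the action contract implied above: `work`, `focus`, and `switch` target a task, while `break` and `delay` take a null `task_id`. The `switch`-needs-a-task assumption follows from its "change active task" description.

```python
# Hypothetical client-side check of the documented action format.
VALID_TYPES = {"work", "break", "switch", "delay", "focus"}
NEEDS_TASK = {"work", "focus", "switch"}  # assumption: switch targets a task

def validate_action(action: dict) -> bool:
    """Return True if the dict matches the documented action format."""
    if action.get("type") not in VALID_TYPES:
        return False
    if action["type"] in NEEDS_TASK:
        return action.get("task_id") is not None
    return action.get("task_id") is None

print(validate_action({"type": "break", "task_id": None}))  # True
print(validate_action({"type": "work", "task_id": None}))   # False
```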
|
| 52 |
|
| 53 |
-
|
| 54 |
|
| 55 |
## Observation Space
|
| 56 |
|
|
@@ -85,7 +85,6 @@ Action format:
|
|
| 85 |
- `upcoming_deadlines` → tasks with deadline within the next 5 steps
|
| 86 |
- `focus_mode` → whether the agent is currently in deep-work state
|
| 87 |
|
| 88 |
-
---
|
| 89 |
|
| 90 |
## Tasks & Baseline Scores
|
| 91 |
|
|
@@ -99,7 +98,6 @@ Action format:
|
|
| 99 |
Scores produced by a heuristic agent (priority + deadline triage with focus mode).
|
| 100 |
A strong LLM agent should achieve: easy >0.85, medium >0.55, hard >0.35, expert >0.25.
|
| 101 |
|
| 102 |
-
---
|
| 103 |
|
| 104 |
## Scoring Formula
|
| 105 |
|
|
score = weighted_completion × 0.60
|
|
| 119 |
|
| 120 |
Score is always in **(0.01, 0.99)**: never exactly 0 or 1.
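
As a quick sanity check, the formula and its clamp can be sketched in a few lines (illustrative only; the 0.60 and 0.22 weights come from the formula above, and the remaining components are passed in already weighted at their documented 0.10 / 0.05 / 0.03 caps):

```python
def clm_score(weighted_completion: float, deadline_adherence: float,
              energy_eff: float, dep_bonus: float, int_bonus: float) -> float:
    """Combine the documented score components, then clamp into (0.01, 0.99)."""
    raw = (weighted_completion * 0.60
           + deadline_adherence * 0.22
           + energy_eff + dep_bonus + int_bonus)
    return min(0.99, max(0.01, raw))

print(clm_score(1.0, 1.0, 0.10, 0.05, 0.03))  # 0.99 (clamped from 1.0)
```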
|
| 121 |
|
| 122 |
-
---
|
| 123 |
|
| 124 |
## Setup
|
| 125 |
|
|
@@ -149,7 +146,6 @@ cd frontend && npm install && npm run dev
|
|
| 149 |
# Visit http://localhost:5173
|
| 150 |
```
|
| 151 |
|
| 152 |
-
---
|
| 153 |
|
| 154 |
## Architecture
|
| 155 |
|
|
@@ -177,7 +173,6 @@ graph TD
|
|
| 177 |
API -->|OpenEnv spec| OE[openenv validate]
|
| 178 |
```
|
| 179 |
|
| 180 |
-
---
|
| 181 |
|
| 182 |
## Reward Shaping Details
|
| 183 |
|
|
@@ -197,7 +192,6 @@ Step rewards provide **dense signal** across the full trajectory:
|
|
| 197 |
| Episode: all done (on time) | +1.0 |
|
| 198 |
| Episode: all done (late) | +0.5 |
|
| 199 |
|
| 200 |
-
---
|
| 201 |
|
| 202 |
## Environment Variables
|
| 203 |
|
| 17 |
[](#)
|
| 18 |
[](#)
|
| 19 |
|
| 20 |
+
CLM is a **real-world productivity simulation** where an AI agent plays the role of a human knowledge worker's task scheduler. It must manage heterogeneous work items like emails, meetings, code reviews, reports, and calls, each with different cognitive demands, deadlines, priorities, and dependencies, while keeping the worker's energy and stress within safe bounds.
|
| 21 |
|
| 22 |
*This is not a toy game.* CLM models how humans actually experience workload: stress accumulates when deadlines approach, fatigue reduces efficiency, context-switching has a cognitive cost, and deep focus yields better output at the expense of higher energy.
|
| 23 |
|
| 24 |
+
|
| 25 |
|
| 26 |
## Why This Environment Matters
|
| 27 |
|
| 28 |
+
Modern knowledge workers face **cognitive load management** as one of their most critical daily challenges, yet no RL environment has modelled this domain in a principled, agent-evaluatable way. CLM fills this gap:
|
| 29 |
|
| 30 |
- **Useful for training agents** that assist with personal productivity tools, calendar management, and task triage systems.
|
| 31 |
+
- **Useful for evaluating LLM planning ability**, especially multi-step planning under resource constraints.
|
| 32 |
- **Realistic dynamics**: energy, stress, fatigue, and task dependencies create emergent difficulty that pure search algorithms cannot exploit.
|
| 33 |
|
| 34 |
+
|
| 35 |
|
| 36 |
## Actions
|
| 37 |
|
| 50 |
{"type": "break", "task_id": null}
|
| 51 |
```
|
| 52 |
|
| 53 |
+
|
| 54 |
|
| 55 |
## Observation Space
|
| 56 |
|
| 85 |
- `upcoming_deadlines` → tasks with deadline within the next 5 steps
|
| 86 |
- `focus_mode` → whether the agent is currently in deep-work state
|
| 88 |
|
| 89 |
## Tasks & Baseline Scores
|
| 90 |
|
| 98 |
Scores produced by a heuristic agent (priority + deadline triage with focus mode).
|
| 99 |
A strong LLM agent should achieve: easy >0.85, medium >0.55, hard >0.35, expert >0.25.
|
| 100 |
|
| 101 |
|
| 102 |
## Scoring Formula
|
| 103 |
|
| 117 |
|
| 118 |
Score is always in **(0.01, 0.99)**: never exactly 0 or 1.
|
| 119 |
|
| 120 |
|
| 121 |
## Setup
|
| 122 |
|
| 146 |
# Visit http://localhost:5173
|
| 147 |
```
|
| 148 |
|
| 149 |
|
| 150 |
## Architecture
|
| 151 |
|
| 173 |
API -->|OpenEnv spec| OE[openenv validate]
|
| 174 |
```
|
| 175 |
|
| 176 |
|
| 177 |
## Reward Shaping Details
|
| 178 |
|
| 192 |
| Episode: all done (on time) | +1.0 |
|
| 193 |
| Episode: all done (late) | +0.5 |
|
| 194 |
|
| 195 |
|
| 196 |
## Environment Variables
|
| 197 |
|
grader/clm_graders.py
CHANGED
|
@@ -1,10 +1,11 @@
|
|
| 1 |
"""
|
| 2 |
-
Class-based graders for CLM tasks
|
| 3 |
|
| 4 |
-
|
| 5 |
-
|
|
|
|
| 6 |
|
| 7 |
-
|
| 8 |
"""
|
| 9 |
import sys, os
|
| 10 |
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
|
@@ -22,18 +23,68 @@ def _safe(raw) -> float:
|
|
| 22 |
return _MIN
|
| 23 |
|
| 24 |
|
|
| 25 |
def _heuristic_action(env: CLMEnvironment) -> Action:
|
| 26 |
"""
|
| 27 |
Competent heuristic agent:
|
| 28 |
-
- Enters focus mode on critical tasks with approaching deadlines
|
| 29 |
- Takes breaks when fatigued or stressed
|
| 30 |
- Prioritises: critical > high > normal > low, then earliest deadline
|
| 31 |
-
- Respects task dependencies
|
| 32 |
"""
|
| 33 |
-
state
|
| 34 |
blocked = env._blocked_ids()
|
| 35 |
|
| 36 |
-
# Rest condition
|
| 37 |
if state.energy < 0.30 or state.stress > 0.70:
|
| 38 |
return Action(type="break", task_id=None)
|
| 39 |
|
|
@@ -41,80 +92,48 @@ def _heuristic_action(env: CLMEnvironment) -> Action:
|
|
| 41 |
if not pending:
|
| 42 |
return Action(type="delay", task_id=None)
|
| 43 |
|
| 44 |
-
# Sort by priority weight DESC then deadline ASC
|
| 45 |
pending.sort(key=lambda t: (
|
| 46 |
-PRIORITY_WEIGHT[t.priority],
|
| 47 |
t.deadline if t.deadline is not None else 9999
|
| 48 |
))
|
| 49 |
target = pending[0]
|
| 50 |
|
| 51 |
-
# Use focus mode for critical tasks with deadline in ≤10 steps
|
| 52 |
use_focus = (
|
| 53 |
target.priority == "critical"
|
| 54 |
and target.deadline is not None
|
| 55 |
and (target.deadline - state.time_step) <= 10
|
| 56 |
and state.energy > 0.55
|
| 57 |
)
|
| 58 |
-
|
| 59 |
-
if state.current_task_id == target.id:
|
| 60 |
-
return Action(type="focus" if use_focus else "work", task_id=target.id)
|
| 61 |
return Action(type="focus" if use_focus else "work", task_id=target.id)
|
| 62 |
|
| 63 |
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
max_s = 60 if difficulty == "expert" else 50
|
| 68 |
-
env = CLMEnvironment(tasks=tasks, max_steps=max_s)
|
| 69 |
-
env.reset()
|
| 70 |
-
done, step = False, 0
|
| 71 |
-
while not done and step < max_s:
|
| 72 |
-
action = _heuristic_action(env)
|
| 73 |
-
_, _, done, _ = env.step(action)
|
| 74 |
-
step += 1
|
| 75 |
-
raw = deterministic_grader(env.state.tasks, env.state.time_step, env.state.energy)
|
| 76 |
-
score = _safe(raw)
|
| 77 |
-
comp = sum(1 for t in env.state.tasks if t.progress >= 1.0)
|
| 78 |
-
msg = (
|
| 79 |
-
f"CLM {difficulty} | score={score:.4f} | "
|
| 80 |
-
f"steps={step} energy={env.state.energy:.2f} "
|
| 81 |
-
f"completed={comp}/{len(env.state.tasks)}"
|
| 82 |
-
)
|
| 83 |
-
return score, score >= 0.5, msg
|
| 84 |
-
except Exception as e:
|
| 85 |
-
return _MIN, False, f"Grader error: {e}"
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
def _from_trajectory(trajectory: dict, difficulty: str) -> tuple:
|
| 89 |
-
if trajectory and "tasks" in trajectory:
|
| 90 |
-
raw_tasks = trajectory.get("tasks", [])
|
| 91 |
-
ts = trajectory.get("time_step", 50)
|
| 92 |
-
eng = trajectory.get("energy", 0.5)
|
| 93 |
-
task_objs = [Task(**t) if isinstance(t, dict) else t for t in raw_tasks]
|
| 94 |
-
raw = deterministic_grader(task_objs, ts, eng)
|
| 95 |
-
score = _safe(raw)
|
| 96 |
-
comp = sum(1 for t in task_objs if t.progress >= 1.0)
|
| 97 |
-
msg = f"CLM {difficulty} | score={score:.4f} | completed={comp}/{len(task_objs)}"
|
| 98 |
-
return score, score >= 0.5, msg
|
| 99 |
-
return _run_episode(difficulty)
|
| 100 |
-
|
| 101 |
-
|
| 102 |
class EasyGrader:
|
| 103 |
-
"""Easy: 2 tasks (email + report), no deadlines. Expected
|
| 104 |
-
def grade(self, trajectory=None, *a, **kw):
|
| 105 |
-
|
| 106 |
|
| 107 |
class MediumGrader:
|
| 108 |
-
"""Medium: 5 tasks
|
| 109 |
-
def grade(self, trajectory=None, *a, **kw):
|
| 110 |
-
|
| 111 |
|
| 112 |
class HardGrader:
|
| 113 |
-
"""Hard: 8 tasks
|
| 114 |
-
def grade(self, trajectory=None, *a, **kw):
|
| 115 |
-
|
| 116 |
|
| 117 |
class ExpertGrader:
|
| 118 |
-
"""Expert: 10 tasks, deep dependencies, 3
|
| 119 |
-
def grade(self, trajectory=None, *a, **kw):
|
| 120 |
-
|
|
| 1 |
"""
|
| 2 |
+
Class-based graders for CLM tasks.
|
| 3 |
|
| 4 |
+
FIX 1: _from_trajectory no longer falls back to running a heuristic episode
|
| 5 |
+
when the trajectory is empty or missing. It returns 0.01 immediately.
|
| 6 |
+
The grader MUST score the actual agent, not a proxy.
|
| 7 |
|
| 8 |
+
Graders produce scores strictly in (0.01, 0.99).
|
| 9 |
"""
|
| 10 |
import sys, os
|
| 11 |
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
|
|
|
| 23 |
return _MIN
|
| 24 |
|
| 25 |
|
| 26 |
+
def _from_trajectory(trajectory: dict, difficulty: str) -> tuple:
|
| 27 |
+
"""
|
| 28 |
+
Score a completed agent trajectory.
|
| 29 |
+
|
| 30 |
+
FIX 1: If trajectory is empty or has no tasks, return 0.01 immediately.
|
| 31 |
+
We must never rerun a heuristic episode here; that would score the
|
| 32 |
+
heuristic agent, not the LLM agent under evaluation.
|
| 33 |
+
"""
|
| 34 |
+
if not trajectory or not trajectory.get("tasks"):
|
| 35 |
+
return _MIN, False, f"CLM {difficulty} | score=0.0100 | empty trajectory"
|
| 36 |
+
|
| 37 |
+
raw_tasks = trajectory["tasks"]
|
| 38 |
+
ts = trajectory.get("time_step", 50)
|
| 39 |
+
eng = trajectory.get("energy", 0.5)
|
| 40 |
+
task_objs = [Task(**t) if isinstance(t, dict) else t for t in raw_tasks]
|
| 41 |
+
raw = deterministic_grader(task_objs, ts, eng)
|
| 42 |
+
score = _safe(raw)
|
| 43 |
+
comp = sum(1 for t in task_objs if t.progress >= 1.0)
|
| 44 |
+
msg = f"CLM {difficulty} | score={score:.4f} | completed={comp}/{len(task_objs)}"
|
| 45 |
+
return score, score >= 0.5, msg
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
def _run_heuristic_baseline(difficulty: str) -> tuple:
|
| 49 |
+
"""
|
| 50 |
+
Run a heuristic agent to produce a BASELINE reference score only.
|
| 51 |
+
This is used for reporting / README baseline numbers, NEVER for
|
| 52 |
+
grading an LLM agent's actual trajectory.
|
| 53 |
+
"""
|
| 54 |
+
try:
|
| 55 |
+
tasks = generate_tasks(difficulty, seed=42) # fixed seed for reproducibility
|
| 56 |
+
max_s = 60 if difficulty == "expert" else 50
|
| 57 |
+
env = CLMEnvironment(tasks=tasks, max_steps=max_s, seed=42)
|
| 58 |
+
env.reset()
|
| 59 |
+
done, step = False, 0
|
| 60 |
+
while not done and step < max_s:
|
| 61 |
+
action = _heuristic_action(env)
|
| 62 |
+
_, _, done, _ = env.step(action)
|
| 63 |
+
step += 1
|
| 64 |
+
raw = deterministic_grader(env.state.tasks, env.state.time_step, env.state.energy)
|
| 65 |
+
score = _safe(raw)
|
| 66 |
+
comp = sum(1 for t in env.state.tasks if t.progress >= 1.0)
|
| 67 |
+
msg = (
|
| 68 |
+
f"CLM {difficulty} baseline | score={score:.4f} | "
|
| 69 |
+
f"steps={step} energy={env.state.energy:.2f} "
|
| 70 |
+
f"completed={comp}/{len(env.state.tasks)}"
|
| 71 |
+
)
|
| 72 |
+
return score, score >= 0.5, msg
|
| 73 |
+
except Exception as e:
|
| 74 |
+
return _MIN, False, f"Baseline error: {e}"
|
| 75 |
+
|
| 76 |
+
|
| 77 |
def _heuristic_action(env: CLMEnvironment) -> Action:
|
| 78 |
"""
|
| 79 |
Competent heuristic agent:
|
| 80 |
- Takes breaks when fatigued or stressed
|
| 81 |
- Prioritises: critical > high > normal > low, then earliest deadline
|
| 82 |
+
- Respects task dependencies
|
| 83 |
+
- Uses focus mode on critical tasks near their deadline
|
| 84 |
"""
|
| 85 |
+
state = env.state
|
| 86 |
blocked = env._blocked_ids()
|
| 87 |
|
|
|
|
| 88 |
if state.energy < 0.30 or state.stress > 0.70:
|
| 89 |
return Action(type="break", task_id=None)
|
| 90 |
|
|
|
|
| 92 |
if not pending:
|
| 93 |
return Action(type="delay", task_id=None)
|
| 94 |
|
|
|
|
| 95 |
pending.sort(key=lambda t: (
|
| 96 |
-PRIORITY_WEIGHT[t.priority],
|
| 97 |
t.deadline if t.deadline is not None else 9999
|
| 98 |
))
|
| 99 |
target = pending[0]
|
| 100 |
|
|
|
|
| 101 |
use_focus = (
|
| 102 |
target.priority == "critical"
|
| 103 |
and target.deadline is not None
|
| 104 |
and (target.deadline - state.time_step) <= 10
|
| 105 |
and state.energy > 0.55
|
| 106 |
)
|
|
| 107 |
return Action(type="focus" if use_focus else "work", task_id=target.id)
|
| 108 |
|
| 109 |
|
| 110 |
+
# ==========================================
|
| 111 |
+
# PUBLIC GRADER CLASSES
|
| 112 |
+
# ==========================================
|
| 113 |
class EasyGrader:
|
| 114 |
+
"""Easy: 2 tasks (email + report), no deadlines. Expected score: ~0.72β0.82."""
|
| 115 |
+
def grade(self, trajectory=None, *a, **kw):
|
| 116 |
+
return _from_trajectory(trajectory or {}, "easy")
|
| 117 |
+
def __call__(self, trajectory=None, *a, **kw):
|
| 118 |
+
return _from_trajectory(trajectory or {}, "easy")[0]
|
| 119 |
|
| 120 |
class MediumGrader:
|
| 121 |
+
"""Medium: 5 tasks, mixed priorities and deadlines. Expected: ~0.38β0.52."""
|
| 122 |
+
def grade(self, trajectory=None, *a, **kw):
|
| 123 |
+
return _from_trajectory(trajectory or {}, "medium")
|
| 124 |
+
def __call__(self, trajectory=None, *a, **kw):
|
| 125 |
+
return _from_trajectory(trajectory or {}, "medium")[0]
|
| 126 |
|
| 127 |
class HardGrader:
|
| 128 |
+
"""Hard: 8 tasks, dependencies, tight deadlines, stochastic interruptions. Expected: ~0.15β0.28."""
|
| 129 |
+
def grade(self, trajectory=None, *a, **kw):
|
| 130 |
+
return _from_trajectory(trajectory or {}, "hard")
|
| 131 |
+
def __call__(self, trajectory=None, *a, **kw):
|
| 132 |
+
return _from_trajectory(trajectory or {}, "hard")[0]
|
| 133 |
|
| 134 |
class ExpertGrader:
|
| 135 |
+
"""Expert: 10 tasks, deep dependencies, 3 stochastic interruptions. Expected: ~0.05β0.15."""
|
| 136 |
+
def grade(self, trajectory=None, *a, **kw):
|
| 137 |
+
return _from_trajectory(trajectory or {}, "expert")
|
| 138 |
+
def __call__(self, trajectory=None, *a, **kw):
|
| 139 |
+
return _from_trajectory(trajectory or {}, "expert")[0]
|
inference.py
CHANGED
|
@@ -100,7 +100,9 @@ def heuristic_fallback(obs: dict) -> Dict:
|
|
| 100 |
blocked = set(vs.get("blocked_tasks", []))
|
| 101 |
tasks = [t for t in obs.get("tasks", [])
|
| 102 |
if t.get("progress", 0.0) < 1.0 and t["id"] not in blocked]
|
| 103 |
-
|
| 104 |
return {"type": "break", "task_id": None}
|
| 105 |
if tasks:
|
| 106 |
# Sort: critical > high > normal > low, then nearest deadline
|
|
@@ -108,7 +110,8 @@ def heuristic_fallback(obs: dict) -> Dict:
|
|
| 108 |
tasks.sort(key=lambda t: (pmap.get(t.get("priority", "normal"), 2),
|
| 109 |
t.get("deadline") or 9999))
|
| 110 |
t = tasks[0]
|
| 111 |
-
|
| 112 |
return {"type": atype, "task_id": t["id"]}
|
| 113 |
return {"type": "delay", "task_id": None}
|
| 114 |
|
|
|
|
| 100 |
blocked = set(vs.get("blocked_tasks", []))
|
| 101 |
tasks = [t for t in obs.get("tasks", [])
|
| 102 |
if t.get("progress", 0.0) < 1.0 and t["id"] not in blocked]
|
| 103 |
+
# FIX 6: observation is now partially observable; use categorical labels
|
| 104 |
+
fatigue = vs.get("fatigue_level", "low")
|
| 105 |
+
if fatigue == "high" or vs.get("stress_warning", False):
|
| 106 |
return {"type": "break", "task_id": None}
|
| 107 |
if tasks:
|
| 108 |
# Sort: critical > high > normal > low, then nearest deadline
|
|
|
|
| 110 |
tasks.sort(key=lambda t: (pmap.get(t.get("priority", "normal"), 2),
|
| 111 |
t.get("deadline") or 9999))
|
| 112 |
t = tasks[0]
|
| 113 |
+
fatigue_ok = vs.get("fatigue_level", "low") != "high"
|
| 114 |
+
atype = "focus" if t.get("priority") == "critical" and fatigue_ok else "work"
|
| 115 |
return {"type": atype, "task_id": t["id"]}
|
| 116 |
return {"type": "delay", "task_id": None}
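
The triage order used by the fallback can be checked in isolation. This standalone sketch (sample tasks are made up) reproduces the priority-then-deadline sort key, including the `None`-deadline-sorts-last behaviour:

```python
# Standalone reproduction of the fallback's triage key:
# critical > high > normal > low, then nearest deadline; None sorts last.
pmap = {"critical": 0, "high": 1, "normal": 2, "low": 3}

tasks = [
    {"id": "t1", "priority": "normal", "deadline": 5},
    {"id": "t2", "priority": "critical", "deadline": None},
    {"id": "t3", "priority": "critical", "deadline": 9},
]
tasks.sort(key=lambda t: (pmap.get(t.get("priority", "normal"), 2),
                          t.get("deadline") or 9999))
print([t["id"] for t in tasks])  # ['t3', 't2', 't1']
```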
|
| 117 |
|
models.py
CHANGED
|
@@ -1,8 +1,9 @@
|
|
| 1 |
from pydantic import BaseModel, Field
|
| 2 |
from typing import List, Optional, Literal, Tuple, Dict, Any
|
|
|
|
| 3 |
|
| 4 |
# ==========================================
|
| 5 |
-
# TASK TYPES
|
| 6 |
# ==========================================
|
| 7 |
TaskType = Literal["email", "meeting", "code_review", "report", "call"]
|
| 8 |
Priority = Literal["critical", "high", "normal", "low"]
|
|
@@ -11,6 +12,9 @@ PRIORITY_WEIGHT = {"critical": 1.5, "high": 1.2, "normal": 1.0, "low": 0.7}
|
|
| 11 |
TASK_ENERGY_COST = {"email": 0.08, "meeting": 0.18, "code_review": 0.20, "report": 0.14, "call": 0.11}
|
| 12 |
TASK_PROGRESS_RATE = {"email": 0.35, "meeting": 0.30, "code_review": 0.20, "report": 0.22, "call": 0.28}
|
| 13 |
|
|
| 14 |
# ==========================================
|
| 15 |
# OPENENV SCHEMAS
|
| 16 |
# ==========================================
|
|
@@ -21,17 +25,22 @@ class Task(BaseModel):
|
|
| 21 |
priority: Priority = "normal"
|
| 22 |
progress: float = 0.0
|
| 23 |
deadline: Optional[int] = None
|
| 24 |
-
depends_on: Optional[str] = None
|
| 25 |
-
is_interrupted: bool = False
|
| 26 |
|
| 27 |
class VisibleState(BaseModel):
|
| 28 |
fatigue_level: str # "low" | "medium" | "high"
|
|
|
|
| 29 |
stress_warning: bool
|
| 30 |
-
energy_level: float = 1.0
|
| 31 |
-
stress_level: float = 0.0
|
| 32 |
focus_mode: bool = False
|
| 33 |
-
upcoming_deadlines: List[str] = []
|
| 34 |
-
blocked_tasks: List[str] = []
|
|
|
|
| 35 |
|
| 36 |
class Observation(BaseModel):
|
| 37 |
tasks: List[Task]
|
|
@@ -40,72 +49,165 @@ class Observation(BaseModel):
|
|
| 40 |
|
| 41 |
class Action(BaseModel):
|
| 42 |
type: Literal["work", "break", "switch", "delay", "focus"]
|
| 43 |
-
# work → normal work on task_id
|
| 44 |
-
# break → rest; recover energy + reduce stress
|
| 45 |
-
# switch → change active task (small context-switch cost)
|
| 46 |
-
# delay → do nothing; slight stress relief
|
| 47 |
-
# focus → deep-work mode: 2× progress, 2× energy cost
|
| 48 |
task_id: Optional[str] = None
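
The focus trade-off noted in the comments (2× progress for 2× energy) can be sketched numerically; the per-type rates are copied from the `TASK_PROGRESS_RATE` / `TASK_ENERGY_COST` tables above, and the `step_effect` helper is illustrative, not repo code:

```python
TASK_PROGRESS_RATE = {"email": 0.35, "report": 0.22}  # subset of the constants above
TASK_ENERGY_COST = {"email": 0.08, "report": 0.14}

def step_effect(task_type: str, focus: bool):
    """Progress gained and energy spent for one step, per the focus rule."""
    mult = 2.0 if focus else 1.0
    return (TASK_PROGRESS_RATE[task_type] * mult,
            TASK_ENERGY_COST[task_type] * mult)

print(step_effect("report", focus=False))  # (0.22, 0.14)
print(step_effect("report", focus=True))   # (0.44, 0.28)
```

Focus doubles throughput but drains energy twice as fast, which is why the heuristic only engages it above an energy threshold.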
|
| 49 |
|
| 50 |
class EnvState(BaseModel):
|
| 51 |
-
energy:
|
| 52 |
-
stress:
|
| 53 |
-
fatigue:
|
| 54 |
-
time_step:
|
| 55 |
-
current_task_id:
|
| 56 |
-
tasks:
|
| 57 |
-
focus_mode:
|
| 58 |
-
interruption_count:
|
| 59 |
-
milestone_rewards:
|
| 60 |
|
| 61 |
|
| 62 |
# ==========================================
|
| 63 |
-
# TASK GENERATION
|
|
|
|
|
|
|
| 64 |
# ==========================================
|
| 65 |
-
def generate_tasks(level: str) -> list[Task]:
|
| 66 |
if level == "easy":
|
| 67 |
-
# 2 simple tasks, no deadlines β learn basics
|
| 68 |
return [
|
| 69 |
-
Task(id="e1", difficulty="easy",
|
| 70 |
-
|
| 71 |
]
|
| 72 |
|
| 73 |
elif level == "medium":
|
| 74 |
-
# 5 mixed tasks with deadlines and priorities
|
| 75 |
return [
|
| 76 |
-
Task(id="m1", difficulty="medium",
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
Task(id="
|
|
| 81 |
]
|
| 82 |
|
| 83 |
elif level == "hard":
|
| 84 |
-
# 8 tasks with task dependencies + 2 mid-episode interruptions
|
| 85 |
return [
|
| 86 |
-
Task(id="h1", difficulty="hard",
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
Task(id="
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
]
|
| 95 |
|
| 96 |
elif level == "expert":
|
| 97 |
-
# 10 tasks, deep dependencies, 3 mid-episode interruptions
|
| 98 |
return [
|
| 99 |
-
Task(id="x1", difficulty="expert",
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
Task(id="
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
Task(id="
|
| 108 |
-
|
| 109 |
]
|
| 110 |
|
| 111 |
return []
|
|
@@ -126,8 +228,19 @@ def _inject_interruption(state: EnvState, step: int) -> None:
|
|
| 126 |
# GRADER
|
| 127 |
# ==========================================
|
| 128 |
def grader(trajectory: dict) -> float:
|
| 129 |
-
"""
|
| 130 |
-
|
| 131 |
ts = trajectory.get("time_step", 50)
|
| 132 |
eng = trajectory.get("energy", 0.5)
|
| 133 |
task_objs = [Task(**t) if isinstance(t, dict) else t for t in raw_tasks]
|
|
@@ -136,41 +249,35 @@ def grader(trajectory: dict) -> float:
|
|
| 136 |
|
| 137 |
def deterministic_grader(tasks: list[Task], time_step: int, final_energy: float) -> float:
|
| 138 |
"""
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
+ deadline_adherence Γ 0.22 (fraction of tasks meeting deadline)
|
| 148 |
-
+ energy_efficiency Γ 0.10 (reward for not burning out)
|
| 149 |
-
+ dependency_bonus Γ 0.05 (rewarded correct sequencing)
|
| 150 |
-
+ interruption_bonus Γ 0.03 (handled urgent tasks)
|
| 151 |
-
|
| 152 |
-
Always returns value in (0.01, 0.99).
|
| 153 |
"""
|
| 154 |
if not tasks:
|
| 155 |
return 0.01
|
| 156 |
|
| 157 |
total_weight = sum(PRIORITY_WEIGHT[t.priority] for t in tasks)
|
| 158 |
|
| 159 |
-
# Weighted completion
|
| 160 |
wc = sum(t.progress * PRIORITY_WEIGHT[t.priority] for t in tasks) / max(total_weight, 0.01)
|
| 161 |
|
| 162 |
-
# Deadline adherence
|
| 163 |
completable = [t for t in tasks if t.deadline is not None]
met_deadline = sum(
    1 for t in completable
    if t.progress >= 1.0 and time_step <= t.deadline
)
da = (met_deadline / len(completable)) if completable else 1.0
|
| 169 |
|
| 170 |
-
# Energy efficiency
|
| 171 |
ee = max(0.0, (final_energy - 0.10) * 0.13)
|
| 172 |
|
| 173 |
-
# Dependency bonus
|
| 174 |
dep_bonus = 0.0
|
| 175 |
for t in tasks:
|
| 176 |
if t.depends_on and t.progress >= 1.0:
|
|
@@ -179,11 +286,11 @@ def deterministic_grader(tasks: list[Task], time_step: int, final_energy: float)
|
|
| 179 |
dep_bonus += 0.015
|
| 180 |
dep_bonus = min(0.05, dep_bonus)
|
| 181 |
|
| 182 |
-
# Interruption bonus
|
| 183 |
interrupted = [t for t in tasks if t.is_interrupted]
|
| 184 |
int_bonus = 0.0
|
| 185 |
if interrupted:
|
| 186 |
-
handled = sum(1 for t in interrupted if t.progress >= 1.0)
|
| 187 |
int_bonus = min(0.03, (handled / len(interrupted)) * 0.03)
|
| 188 |
|
| 189 |
raw = wc * 0.60 + da * 0.22 + ee + dep_bonus + int_bonus
|
|
@@ -191,22 +298,41 @@ def deterministic_grader(tasks: list[Task], time_step: int, final_energy: float)
|
|
| 191 |
|
| 192 |
|
| 193 |
# ==========================================
|
| 194 |
-
#
|
| 195 |
# ==========================================
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
|
|
|
| 201 |
|
| 202 |
-
|
|
|
|
|
|
|
| 203 |
self.max_steps = max_steps
|
| 204 |
self.initial_tasks = tasks
|
| 205 |
self.difficulty = tasks[0].difficulty if tasks else "easy"
|
| 206 |
-
self.
|
| 207 |
|
| 208 |
def reset(self) -> Observation:
|
| 209 |
-
|
| 210 |
return self._get_observation()
|
| 211 |
|
| 212 |
def _blocked_ids(self) -> set[str]:
|
|
@@ -221,12 +347,16 @@ class CLMEnvironment:
|
|
| 221 |
|
| 222 |
def _get_observation(self) -> Observation:
|
| 223 |
e = self.state.energy
|
| 224 |
-
|
| 225 |
vs = VisibleState(
|
| 226 |
-
fatigue_level=
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
stress_level=round(self.state.stress, 3),
|
| 230 |
focus_mode=self.state.focus_mode,
|
| 231 |
upcoming_deadlines=self._upcoming_ids(),
|
| 232 |
blocked_tasks=list(self._blocked_ids()),
|
|
@@ -237,23 +367,25 @@ class CLMEnvironment:
|
|
| 237 |
reward = 0.0
|
| 238 |
blocked = self._blocked_ids()
|
| 239 |
|
| 240 |
-
#
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
and self.
|
| 244 |
_inject_interruption(self.state, self.state.time_step)
|
|
| 245 |
reward -= 0.05
|
| 246 |
|
| 247 |
-
#
|
| 248 |
if action.type in ("work", "focus"):
|
| 249 |
is_focus = (action.type == "focus")
|
| 250 |
|
| 251 |
if action.task_id:
|
| 252 |
if action.task_id in blocked:
|
| 253 |
-
reward -= 0.15
|
| 254 |
else:
|
| 255 |
if self.state.current_task_id and self.state.current_task_id != action.task_id:
|
| 256 |
-
reward -= 0.07
|
| 257 |
self.state.current_task_id = action.task_id
|
| 258 |
self.state.focus_mode = is_focus
|
| 259 |
|
|
@@ -272,7 +404,6 @@ class CLMEnvironment:
|
|
| 272 |
|
| 273 |
reward += 0.10 * (task.progress - old_p) * pw
|
| 274 |
|
| 275 |
-
# Milestone rewards
|
| 276 |
for ms, bonus in [(0.25, 0.04), (0.50, 0.07), (0.75, 0.09), (1.00, 0.18)]:
|
| 277 |
key = f"{task.id}@{ms}"
|
| 278 |
if task.progress >= ms and key not in self.state.milestone_rewards:
|
|
@@ -298,7 +429,7 @@ class CLMEnvironment:
|
|
| 298 |
|
| 299 |
self.state.time_step += 1
|
| 300 |
|
| 301 |
-
# Deadline pressure
|
| 302 |
for t in (tt for tt in self.state.tasks if tt.progress < 1.0):
|
| 303 |
if t.deadline:
|
| 304 |
ttd = t.deadline - self.state.time_step
|
|
@@ -308,7 +439,7 @@ class CLMEnvironment:
|
|
| 308 |
elif ttd < 0:
|
| 309 |
self.state.stress = min(1.0, self.state.stress + 0.12 * pw)
|
| 310 |
|
| 311 |
-
# Termination conditions
|
| 312 |
all_done = all(t.progress >= 1.0 for t in self.state.tasks)
|
| 313 |
burnout = self.state.energy < 0.07
|
| 314 |
timeout = self.state.time_step >= self.max_steps
|
|
|
|
| 1 |
from pydantic import BaseModel, Field
|
| 2 |
from typing import List, Optional, Literal, Tuple, Dict, Any
|
| 3 |
+
import random
|
| 4 |
|
| 5 |
# ==========================================
|
| 6 |
+
# TASK TYPES
|
| 7 |
# ==========================================
|
| 8 |
TaskType = Literal["email", "meeting", "code_review", "report", "call"]
|
| 9 |
Priority = Literal["critical", "high", "normal", "low"]
|
|
|
|
| 12 |
TASK_ENERGY_COST = {"email": 0.08, "meeting": 0.18, "code_review": 0.20, "report": 0.14, "call": 0.11}
|
| 13 |
TASK_PROGRESS_RATE = {"email": 0.35, "meeting": 0.30, "code_review": 0.20, "report": 0.22, "call": 0.28}
|
| 14 |
|
| 15 |
+
ALL_TASK_TYPES: list[TaskType] = ["email", "meeting", "code_review", "report", "call"]
|
| 16 |
+
ALL_PRIORITIES: list[Priority] = ["critical", "high", "normal", "low"]
|
| 17 |
+
|
| 18 |
# ==========================================
|
| 19 |
# OPENENV SCHEMAS
|
| 20 |
# ==========================================
|
|
|
|
| 25 |
priority: Priority = "normal"
|
| 26 |
progress: float = 0.0
|
| 27 |
deadline: Optional[int] = None
|
| 28 |
+
depends_on: Optional[str] = None
|
| 29 |
+
is_interrupted: bool = False
|
| 30 |
|
| 31 |
class VisibleState(BaseModel):
|
| 32 |
+
"""
|
| 33 |
+
FIX 6 (partial observability): agent sees only categorical labels,
|
| 34 |
+
not raw float values for energy/stress. This rewards agents that
|
| 35 |
+
reason from context rather than reading exact numbers.
|
| 36 |
+
"""
|
| 37 |
fatigue_level: str # "low" | "medium" | "high"
|
| 38 |
+
stress_level: str # "calm" | "elevated" | "critical"
|
| 39 |
stress_warning: bool
|
| 40 |
focus_mode: bool = False
|
| 41 |
+
upcoming_deadlines: List[str] = []
|
| 42 |
+
blocked_tasks: List[str] = []
|
| 43 |
+
# energy_level and stress float removed; use fatigue_level / stress_level instead
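
One plausible bucketing from the hidden floats to these categorical labels can be sketched as follows. The 0.30 energy and 0.70 stress cutoffs are borrowed from the heuristic's rest condition; the 0.60 and 0.40 thresholds are illustrative assumptions, not the repo's actual values:

```python
def fatigue_bucket(energy: float) -> str:
    """Illustrative mapping from hidden energy to the visible fatigue label."""
    if energy < 0.30:
        return "high"
    if energy < 0.60:
        return "medium"
    return "low"

def stress_bucket(stress: float) -> str:
    """Illustrative mapping from hidden stress to the visible stress label."""
    if stress > 0.70:
        return "critical"
    if stress > 0.40:
        return "elevated"
    return "calm"

print(fatigue_bucket(0.25), stress_bucket(0.5))  # high elevated
```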
|
| 44 |
|
| 45 |
class Observation(BaseModel):
|
| 46 |
tasks: List[Task]
|
|
|
|
| 49 |
|
| 50 |
class Action(BaseModel):
|
| 51 |
type: Literal["work", "break", "switch", "delay", "focus"]
|
| 52 |
task_id: Optional[str] = None
|
| 53 |
|
| 54 |
class EnvState(BaseModel):
|
| 55 |
+
energy: float = 1.0
|
| 56 |
+
stress: float = 0.0
|
| 57 |
+
fatigue: float = 0.0
|
| 58 |
+
time_step: int = 0
|
| 59 |
+
current_task_id: Optional[str] = None
|
| 60 |
+
tasks: List[Task] = []
|
| 61 |
+
focus_mode: bool = False
|
| 62 |
+
interruption_count: int = 0
|
| 63 |
+
milestone_rewards: Dict[str, float] = {}
|
| 64 |
+
# FIX 3: stochastic interrupt tracking
|
| 65 |
+
next_interrupt_eligible: int = 999
|
| 66 |
+
interrupt_budget: int = 0
|
| 67 |
|
| 68 |
|
| 69 |
# ==========================================
|
| 70 |
+
# FIX 2: PROCEDURAL TASK GENERATION
|
| 71 |
+
# Seed-based so episodes are reproducible on request but vary by default.
|
| 72 |
+
# Deadlines jitter ±3 steps; task types and secondary priorities randomised.
|
| 73 |
# ==========================================
|
| 74 |
+
def generate_tasks(level: str, seed: Optional[int] = None) -> list[Task]:
|
| 75 |
+
"""
|
| 76 |
+
Generate tasks for the given difficulty level.
|
| 77 |
+
Pass seed=None for a random seed (default for live play),
|
| 78 |
+
or an explicit int for reproducible evaluation runs.
|
| 79 |
+
"""
|
| 80 |
+
rng = random.Random(seed)
|
| 81 |
+
|
| 82 |
+
def _jitter(base: int, lo: int = -3, hi: int = 3) -> int:
|
| 83 |
+
return max(1, base + rng.randint(lo, hi))
|
| 84 |
+
|
| 85 |
+
def _p(pool: list) -> str:
|
| 86 |
+
return rng.choice(pool)
|
| 87 |
+
|
| 88 |
if level == "easy":
|
| 89 |
return [
|
| 90 |
+
Task(id="e1", difficulty="easy",
|
| 91 |
+
task_type=_p(["email", "report"]),
|
| 92 |
+
priority=_p(["normal", "high"]),
|
| 93 |
+
deadline=None),
|
| 94 |
+
Task(id="e2", difficulty="easy",
|
| 95 |
+
task_type=_p(["report", "code_review"]),
|
| 96 |
+
priority=_p(["normal", "low"]),
|
| 97 |
+
deadline=None),
|
| 98 |
]
|
| 99 |
|
| 100 |
elif level == "medium":
|
| 101 |
return [
|
| 102 |
+
Task(id="m1", difficulty="medium",
|
| 103 |
+
task_type=_p(["email", "call"]),
|
| 104 |
+
priority="critical",
|
| 105 |
+
deadline=_jitter(14)),
|
| 106 |
+
Task(id="m2", difficulty="medium",
|
| 107 |
+
task_type=_p(["meeting", "code_review"]),
|
| 108 |
+
priority=_p(["high", "normal"]),
|
| 109 |
+
deadline=_jitter(20)),
|
| 110 |
+
Task(id="m3", difficulty="medium",
|
| 111 |
+
task_type=_p(["code_review", "report"]),
|
| 112 |
+
priority=_p(["normal", "high"]),
|
| 113 |
+
deadline=_jitter(28)),
|
| 114 |
+
Task(id="m4", difficulty="medium",
|
| 115 |
+
task_type=_p(["report", "meeting"]),
|
| 116 |
+
priority=_p(["high", "normal"]),
|
| 117 |
+
deadline=_jitter(35)),
|
| 118 |
+
Task(id="m5", difficulty="medium",
|
| 119 |
+
task_type=_p(["call", "email"]),
|
| 120 |
+
priority=_p(["low", "normal"]),
|
| 121 |
+
deadline=_jitter(45)),
|
| 122 |
]
|
| 123 |
|
| 124 |
elif level == "hard":
|
|
|
|
| 125 |
return [
|
| 126 |
+
Task(id="h1", difficulty="hard",
|
| 127 |
+
task_type=_p(["email", "call"]),
|
| 128 |
+
priority="critical",
|
| 129 |
+
deadline=_jitter(12)),
|
| 130 |
+
Task(id="h2", difficulty="hard",
|
| 131 |
+
task_type=_p(["code_review", "report"]),
|
| 132 |
+
priority=_p(["high", "normal"]),
|
| 133 |
+
deadline=_jitter(16)),
|
| 134 |
+
Task(id="h3", difficulty="hard",
|
| 135 |
+
task_type=_p(["meeting", "call"]),
|
| 136 |
+
priority="critical",
|
| 137 |
+
deadline=_jitter(20),
|
| 138 |
+
depends_on="h1"),
|
| 139 |
+
Task(id="h4", difficulty="hard",
|
| 140 |
+
task_type=_p(["report", "code_review"]),
|
| 141 |
+
priority=_p(["high", "normal"]),
|
| 142 |
+
deadline=_jitter(24)),
|
| 143 |
+
Task(id="h5", difficulty="hard",
|
| 144 |
+
task_type=_p(["call", "meeting"]),
|
| 145 |
+
priority=_p(["normal", "high"]),
|
| 146 |
+
deadline=_jitter(28),
|
| 147 |
+
depends_on="h2"),
|
| 148 |
+
Task(id="h6", difficulty="hard",
|
| 149 |
+
task_type=_p(["email", "report"]),
|
| 150 |
+
priority=_p(["high", "normal"]),
|
| 151 |
+
deadline=_jitter(32)),
|
| 152 |
+
Task(id="h7", difficulty="hard",
|
| 153 |
+
task_type=_p(["code_review", "meeting"]),
|
| 154 |
+
priority="critical",
|
| 155 |
+
deadline=_jitter(38),
|
| 156 |
+
depends_on="h4"),
|
| 157 |
+
Task(id="h8", difficulty="hard",
|
| 158 |
+
task_type=_p(["report", "email"]),
|
| 159 |
+
priority=_p(["normal", "low"]),
|
| 160 |
+
deadline=_jitter(46)),
|
| 161 |
]
|
| 162 |
|
| 163 |
elif level == "expert":
|
|
|
|
| 164 |
return [
|
| 165 |
+
Task(id="x1", difficulty="expert",
|
| 166 |
+
task_type=_p(["email", "call"]),
|
| 167 |
+
priority="critical",
|
| 168 |
+
deadline=_jitter(8)),
|
| 169 |
+
Task(id="x2", difficulty="expert",
|
| 170 |
+
task_type=_p(["code_review", "report"]),
|
| 171 |
+
priority=_p(["high", "critical"]),
|
| 172 |
+
deadline=_jitter(12)),
|
| 173 |
+
Task(id="x3", difficulty="expert",
|
| 174 |
+
task_type=_p(["meeting", "call"]),
|
| 175 |
+
priority="critical",
|
| 176 |
+
deadline=_jitter(14),
|
| 177 |
+
depends_on="x1"),
|
| 178 |
+
Task(id="x4", difficulty="expert",
|
| 179 |
+
task_type=_p(["report", "code_review"]),
|
| 180 |
+
priority=_p(["high", "normal"]),
|
| 181 |
+
deadline=_jitter(18),
|
| 182 |
+
depends_on="x2"),
|
| 183 |
+
Task(id="x5", difficulty="expert",
|
| 184 |
+
task_type=_p(["call", "meeting"]),
|
| 185 |
+
priority=_p(["normal", "high"]),
|
| 186 |
+
deadline=_jitter(22),
|
| 187 |
+
depends_on="x3"),
|
| 188 |
+
Task(id="x6", difficulty="expert",
|
| 189 |
+
task_type=_p(["code_review", "email"]),
|
| 190 |
+
priority="critical",
|
| 191 |
+
deadline=_jitter(24)),
|
| 192 |
+
Task(id="x7", difficulty="expert",
|
| 193 |
+
task_type=_p(["email", "report"]),
|
| 194 |
+
priority=_p(["high", "normal"]),
|
| 195 |
+
deadline=_jitter(28),
|
| 196 |
+
depends_on="x4"),
|
| 197 |
+
Task(id="x8", difficulty="expert",
|
| 198 |
+
task_type=_p(["report", "call"]),
|
| 199 |
+
priority=_p(["normal", "high"]),
|
| 200 |
+
deadline=_jitter(33),
|
| 201 |
+
depends_on="x6"),
|
| 202 |
+
Task(id="x9", difficulty="expert",
|
| 203 |
+
task_type=_p(["meeting", "code_review"]),
|
| 204 |
+
priority="critical",
|
| 205 |
+
deadline=_jitter(36),
|
| 206 |
+
depends_on="x5"),
|
| 207 |
+
Task(id="x10", difficulty="expert",
|
| 208 |
+
task_type=_p(["call", "email"]),
|
| 209 |
+
priority=_p(["high", "normal"]),
|
| 210 |
+
deadline=_jitter(44)),
|
| 211 |
]
|
| 212 |
|
| 213 |
return []
|
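The seeding pattern in `generate_tasks` can be sketched in isolation. The helper below (`jitter_deadlines` is an illustrative name, not part of the module) shows why a single `random.Random(seed)` per call makes an episode's jitter fully reproducible, while `seed=None` gives fresh variation:

```python
import random
from typing import Optional

def jitter_deadlines(bases: list[int], seed: Optional[int] = None,
                     lo: int = -3, hi: int = 3) -> list[int]:
    # One Random(seed) per call, consumed in a fixed order:
    # identical seeds therefore yield identical jittered deadlines.
    rng = random.Random(seed)
    return [max(1, b + rng.randint(lo, hi)) for b in bases]

medium_bases = [14, 20, 28, 35, 45]  # medium-level base deadlines from the table above
a = jitter_deadlines(medium_bases, seed=7)
b = jitter_deadlines(medium_bases, seed=7)
assert a == b                      # reproducible for a fixed seed
assert all(d >= 1 for d in a)      # the max(1, ...) floor keeps deadlines positive
```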
```diff
 # GRADER
 # ==========================================
 def grader(trajectory: dict) -> float:
+    """
+    OpenEnv single-argument grader.
+
+    FIX 1: If trajectory is empty or missing tasks, return 0.01 immediately.
+    The grader MUST score the actual agent trajectory – it must never silently
+    fall back to re-running a heuristic episode. Doing so would let the
+    environment grade itself rather than the agent under evaluation.
+    """
+    if not trajectory or not trajectory.get("tasks"):
+        # Empty trajectory = agent produced no useful state → minimum score
+        return 0.01
+
+    raw_tasks = trajectory["tasks"]
     ts = trajectory.get("time_step", 50)
     eng = trajectory.get("energy", 0.5)
     task_objs = [Task(**t) if isinstance(t, dict) else t for t in raw_tasks]
@@
 def deterministic_grader(tasks: list[Task], time_step: int, final_energy: float) -> float:
     """
+    Scores the ACTUAL final task state. Always returns a value in (0.01, 0.99).
+
+    Formula:
+        weighted_completion × 0.60
+      + deadline_adherence  × 0.22
+      + energy_efficiency   × 0.10
+      + dependency_bonus    × 0.05
+      + interruption_bonus  × 0.03
     """
     if not tasks:
         return 0.01

     total_weight = sum(PRIORITY_WEIGHT[t.priority] for t in tasks)

+    # Weighted completion (partial progress counts)
     wc = sum(t.progress * PRIORITY_WEIGHT[t.priority] for t in tasks) / max(total_weight, 0.01)

+    # Deadline adherence
+    completable = [t for t in tasks if t.deadline is not None]
+    met_deadline = sum(
         1 for t in completable
         if t.progress >= 1.0 and time_step <= t.deadline
     )
     da = (met_deadline / len(completable)) if completable else 1.0

+    # Energy efficiency
     ee = max(0.0, (final_energy - 0.10) * 0.13)

+    # Dependency ordering bonus
     dep_bonus = 0.0
     for t in tasks:
         if t.depends_on and t.progress >= 1.0:
@@
             dep_bonus += 0.015
     dep_bonus = min(0.05, dep_bonus)

+    # Interruption handling bonus
     interrupted = [t for t in tasks if t.is_interrupted]
     int_bonus = 0.0
     if interrupted:
+        handled = sum(1 for t in interrupted if t.progress >= 1.0)
         int_bonus = min(0.03, (handled / len(interrupted)) * 0.03)

     raw = wc * 0.60 + da * 0.22 + ee + dep_bonus + int_bonus
```
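As a sanity check on the scoring formula, the weighted sum can be combined and clamped in a standalone helper. `combine` is a hypothetical name, and the clamp bounds come from the docstring's (0.01, 0.99) guarantee rather than from code shown in the diff:

```python
def combine(wc: float, da: float, ee: float,
            dep_bonus: float, int_bonus: float) -> float:
    # Same weights as deterministic_grader, clamped to the documented range.
    raw = wc * 0.60 + da * 0.22 + ee + dep_bonus + int_bonus
    return max(0.01, min(0.99, raw))

# Nothing completed, no bonuses -> floor score.
assert combine(0.0, 0.0, 0.0, 0.0, 0.0) == 0.01
# Perfect run: full completion, all deadlines met, maximum energy term
# ((1.0 - 0.10) * 0.13 = 0.117) and both bonuses -> ceiling score.
assert combine(1.0, 1.0, 0.117, 0.05, 0.03) == 0.99
```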
```diff
 # ==========================================
+# FIX 3 – STOCHASTIC INTERRUPTION CONFIG
+# Interruptions fire with a per-step probability once an eligibility
+# window opens, with a cooldown to prevent back-to-back fires.
+# budget = max number of interrupts for the difficulty level.
 # ==========================================
+_INTERRUPT_CONFIG = {
+    # prob_per_step  eligible_from  cooldown_steps  budget
+    "hard":   (0.18, 10, 8, 2),
+    "expert": (0.22,  6, 7, 3),
+}
+

+class CLMEnvironment:
+    def __init__(self, tasks: list[Task], max_steps: int = 50,
+                 seed: Optional[int] = None):
         self.max_steps = max_steps
         self.initial_tasks = tasks
         self.difficulty = tasks[0].difficulty if tasks else "easy"
+        self._rng = random.Random(seed)
+        cfg = _INTERRUPT_CONFIG.get(self.difficulty, (0.0, 999, 999, 0))
+        self._interrupt_prob, eligible_from, self._cooldown, budget = cfg
+        self.state = EnvState(
+            tasks=[t.model_copy() for t in tasks],
+            next_interrupt_eligible=eligible_from,
+            interrupt_budget=budget,
+        )

     def reset(self) -> Observation:
+        cfg = _INTERRUPT_CONFIG.get(self.difficulty, (0.0, 999, 999, 0))
+        _, eligible_from, _, budget = cfg
+        self.state = EnvState(
+            tasks=[t.model_copy() for t in self.initial_tasks],
+            next_interrupt_eligible=eligible_from,
+            interrupt_budget=budget,
+        )
         return self._get_observation()

     def _blocked_ids(self) -> set[str]:
@@
     def _get_observation(self) -> Observation:
         e = self.state.energy
+        s = self.state.stress
+
+        # FIX 6: Categorical labels only – no raw floats exposed to agent
+        fatigue_label = "high" if e < 0.30 else ("medium" if e < 0.60 else "low")
+        stress_label = "critical" if s > 0.75 else ("elevated" if s > 0.45 else "calm")
+
         vs = VisibleState(
+            fatigue_level=fatigue_label,
+            stress_level=stress_label,
+            stress_warning=s > 0.65,
             focus_mode=self.state.focus_mode,
             upcoming_deadlines=self._upcoming_ids(),
             blocked_tasks=list(self._blocked_ids()),
@@
         reward = 0.0
         blocked = self._blocked_ids()

+        # FIX 3: Stochastic interruption – probabilistic, not fixed-step
+        if (self.state.interrupt_budget > 0
+                and self.state.time_step >= self.state.next_interrupt_eligible
+                and self._rng.random() < self._interrupt_prob):
             _inject_interruption(self.state, self.state.time_step)
+            self.state.interrupt_budget -= 1
+            self.state.next_interrupt_eligible = self.state.time_step + self._cooldown
             reward -= 0.05

+        # Action processing
         if action.type in ("work", "focus"):
             is_focus = (action.type == "focus")

             if action.task_id:
                 if action.task_id in blocked:
+                    reward -= 0.15
                 else:
                     if self.state.current_task_id and self.state.current_task_id != action.task_id:
+                        reward -= 0.07
                     self.state.current_task_id = action.task_id
                     self.state.focus_mode = is_focus
@@
             reward += 0.10 * (task.progress - old_p) * pw

             for ms, bonus in [(0.25, 0.04), (0.50, 0.07), (0.75, 0.09), (1.00, 0.18)]:
                 key = f"{task.id}@{ms}"
                 if task.progress >= ms and key not in self.state.milestone_rewards:
@@
         self.state.time_step += 1

+        # Stress dynamics
         for t in (tt for tt in self.state.tasks if tt.progress < 1.0):
             if t.deadline:
                 ttd = t.deadline - self.state.time_step
@@
                 elif ttd < 0:
                     self.state.stress = min(1.0, self.state.stress + 0.12 * pw)

+        # Episode termination
         all_done = all(t.progress >= 1.0 for t in self.state.tasks)
         burnout = self.state.energy < 0.07
         timeout = self.state.time_step >= self.max_steps
```
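To get a feel for the hard-level interruption parameters (p = 0.18 per step from step 10, cooldown 8, budget 2), the trigger rule can be re-implemented in a throwaway sketch that simulates only the firing logic (`count_interrupts` is a name invented here, not part of the environment):

```python
import random

def count_interrupts(prob: float, eligible_from: int, cooldown: int,
                     budget: int, max_steps: int, seed: int) -> int:
    # Re-implementation of the trigger rule only: fire with probability
    # `prob` once eligible, then wait `cooldown` steps; stop at `budget`.
    rng = random.Random(seed)
    fired, next_eligible = 0, eligible_from
    for step in range(max_steps):
        if fired < budget and step >= next_eligible and rng.random() < prob:
            fired += 1
            next_eligible = step + cooldown
    return fired

# Hard-level parameters over 200 seeded 50-step episodes.
counts = [count_interrupts(0.18, 10, 8, 2, 50, seed=s) for s in range(200)]
assert max(counts) <= 2            # the budget is a hard cap
assert any(c > 0 for c in counts)  # interruptions do fire across seeds
```

Unlike a fixed-step schedule, an agent cannot memorise when the interruption arrives; only the eligibility window, cooldown, and budget are predictable.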
openenv.yaml
CHANGED

```diff
@@ -48,10 +48,10 @@ observation_space:
   - depends_on: task_id or null
   - is_interrupted: bool
   visible_state:
-
-
-
-
+    # Partial observability: energy/stress are categorical labels, not raw floats.
+    - fatigue_level: "low | medium | high"          # energy bands: >0.6 | 0.3-0.6 | <0.3
+    - stress_level: "calm | elevated | critical"    # stress bands: <0.45 | 0.45-0.75 | >0.75
+    - stress_warning: bool                          # true when stress > 0.65
   - focus_mode: bool
   - upcoming_deadlines: list[task_id]
   - blocked_tasks: list[task_id]
```
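The band thresholds above can be expressed as two small pure functions. The names `fatigue_band` and `stress_band` are illustrative; the cut-offs match the yaml comments and the `_get_observation` labelling in models.py:

```python
def fatigue_band(energy: float) -> str:
    # energy bands: > 0.6 -> low fatigue, 0.3-0.6 -> medium, < 0.3 -> high
    return "high" if energy < 0.30 else ("medium" if energy < 0.60 else "low")

def stress_band(stress: float) -> str:
    # stress bands: < 0.45 -> calm, 0.45-0.75 -> elevated, > 0.75 -> critical
    return "critical" if stress > 0.75 else ("elevated" if stress > 0.45 else "calm")

assert fatigue_band(0.9) == "low" and fatigue_band(0.5) == "medium" and fatigue_band(0.1) == "high"
assert stress_band(0.1) == "calm" and stress_band(0.5) == "elevated" and stress_band(0.8) == "critical"
```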
server/app.py
CHANGED

```diff
@@ -1,15 +1,21 @@
-
+"""
+server/app.py – single entry point for CLM OpenEnv server.
+
+Imports the FastAPI app built in backend/main.py and exposes it for:
+  - Dockerfile:    uvicorn server.app:app --host 0.0.0.0 --port 7860
+  - openenv.yaml:  app: server.app:app
+
+All route logic lives in backend/main.py. This file is intentionally thin.
+"""
 import sys
 import os

 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

-from backend.main import app  #
-
-
-def main():
-    uvicorn.run(app, host="0.0.0.0", port=7860)
+from backend.main import app  # single source of truth for the FastAPI app
+
+__all__ = ["app"]


 if __name__ == "__main__":
-
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=7860)
```
tests/test_clm.py
ADDED

```python
"""
tests/test_clm.py – unit tests for the Cognitive Load Manager environment.

Run with: pytest tests/test_clm.py -v
"""
import sys, os, pytest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from models import (
    Action, Task, EnvState, CLMEnvironment,
    generate_tasks, deterministic_grader, grader,
    PRIORITY_WEIGHT,
)
from grader.clm_graders import (
    EasyGrader, MediumGrader, HardGrader, ExpertGrader, _from_trajectory,
)


# ─────────────────────────────────────────────────────────────────────────────
# FIX 2 – Procedural generation
# ─────────────────────────────────────────────────────────────────────────────
class TestProceduralGeneration:
    def test_seed_produces_same_tasks(self):
        a = generate_tasks("medium", seed=7)
        b = generate_tasks("medium", seed=7)
        assert [t.model_dump() for t in a] == [t.model_dump() for t in b]

    def test_different_seeds_differ(self):
        results = set()
        for s in range(20):
            tasks = generate_tasks("medium", seed=s)
            results.add(tuple(t.deadline for t in tasks))
        assert len(results) > 1, "All seeds produced identical deadlines"

    def test_task_counts(self):
        assert len(generate_tasks("easy")) == 2
        assert len(generate_tasks("medium")) == 5
        assert len(generate_tasks("hard")) == 8
        assert len(generate_tasks("expert")) == 10

    def test_deadlines_positive_and_bounded(self):
        """Jitter can reorder adjacent deadlines, but all must be positive and sane."""
        base_deadlines = {"medium": [14, 20, 28, 35, 45],
                          "hard": [12, 16, 20, 24, 28, 32, 38, 46]}
        for level, bases in base_deadlines.items():
            for seed in range(20):
                tasks = generate_tasks(level, seed=seed)
                for t in tasks:
                    if t.deadline is not None:
                        assert t.deadline >= 1, f"Deadline must be >= 1, got {t.deadline}"
                        # Should be within ±5 of the nearest base (generous bound)
                        nearest = min(bases, key=lambda b: abs(b - t.deadline))
                        assert abs(t.deadline - nearest) <= 5, \
                            f"Deadline {t.deadline} too far from base {nearest}"


# ─────────────────────────────────────────────────────────────────────────────
# FIX 1 – Grader trajectory bug
# ─────────────────────────────────────────────────────────────────────────────
class TestGraderTrajectoryBug:
    def test_empty_trajectory_returns_min(self):
        assert grader({}) == 0.01

    def test_missing_tasks_returns_min(self):
        assert grader({"time_step": 50, "energy": 0.8}) == 0.01

    def test_empty_tasks_list_returns_min(self):
        assert grader({"tasks": [], "time_step": 50, "energy": 0.8}) == 0.01

    def test_grader_class_empty_trajectory(self):
        for cls in [EasyGrader, MediumGrader, HardGrader, ExpertGrader]:
            score = cls()(trajectory={})
            assert score == 0.01, f"{cls.__name__} returned {score} for empty trajectory"

    def test_from_trajectory_empty(self):
        score, success, msg = _from_trajectory({}, "easy")
        assert score == 0.01
        assert success is False
        assert "empty trajectory" in msg

    def test_real_trajectory_scores_above_min(self):
        """A trajectory with completed tasks should score > 0.01."""
        tasks = generate_tasks("easy", seed=1)
        for t in tasks:
            t.progress = 1.0
        traj = {"tasks": [t.model_dump() for t in tasks], "time_step": 20, "energy": 0.7}
        assert grader(traj) > 0.01


# ─────────────────────────────────────────────────────────────────────────────
# Environment basics
# ─────────────────────────────────────────────────────────────────────────────
class TestReset:
    def test_reset_produces_clean_state(self):
        env = CLMEnvironment(tasks=generate_tasks("easy", seed=0), max_steps=50)
        obs = env.reset()
        assert env.state.energy == 1.0
        assert env.state.stress == 0.0
        assert env.state.time_step == 0
        assert all(t.progress == 0.0 for t in env.state.tasks)

    def test_reset_after_episode_clears_state(self):
        env = CLMEnvironment(tasks=generate_tasks("easy", seed=0), max_steps=50)
        env.reset()
        for _ in range(10):
            env.step(Action(type="work", task_id="e1"))
        env.reset()
        assert env.state.time_step == 0
        assert env.state.energy == 1.0


# ─────────────────────────────────────────────────────────────────────────────
# Blocked-task penalty (Fix 3 indirectly – env mechanics)
# ─────────────────────────────────────────────────────────────────────────────
class TestBlockedTaskPenalty:
    def test_working_on_blocked_task_gives_penalty(self):
        tasks = generate_tasks("hard", seed=0)
        env = CLMEnvironment(tasks=tasks, max_steps=50)
        env.reset()

        # h3 depends on h1 – h1 not done yet, so h3 is blocked
        blocked = env._blocked_ids()
        assert "h3" in blocked, "h3 should be blocked at episode start"

        _, reward, _, _ = env.step(Action(type="work", task_id="h3"))
        assert reward <= -0.15, f"Expected penalty for blocked task, got {reward}"


# ─────────────────────────────────────────────────────────────────────────────
# FIX 3 – Stochastic interruptions
# ─────────────────────────────────────────────────────────────────────────────
class TestStochasticInterruptions:
    def test_hard_eventually_interrupts(self):
        """Over many seeds, at least one hard episode should fire an interruption."""
        fired = False
        for seed in range(50):
            tasks = generate_tasks("hard", seed=seed)
            env = CLMEnvironment(tasks=tasks, max_steps=50, seed=seed)
            env.reset()
            done = False
            while not done:
                _, _, done, _ = env.step(Action(type="work", task_id=tasks[0].id))
            if env.state.interruption_count > 0:
                fired = True
                break
        assert fired, "Expected at least one interruption across 50 hard seeds"

    def test_interruptions_respect_budget(self):
        """Hard episodes should never exceed budget=2 interruptions."""
        for seed in range(30):
            tasks = generate_tasks("hard", seed=seed)
            env = CLMEnvironment(tasks=tasks, max_steps=50, seed=seed)
            env.reset()
            done = False
            while not done:
                _, _, done, _ = env.step(Action(type="work", task_id=tasks[0].id))
            assert env.state.interruption_count <= 2, \
                f"Seed {seed}: got {env.state.interruption_count} interruptions, max is 2"

    def test_no_interruptions_on_easy(self):
        for seed in range(10):
            tasks = generate_tasks("easy", seed=seed)
            env = CLMEnvironment(tasks=tasks, max_steps=50, seed=seed)
            env.reset()
            done = False
            while not done:
                _, _, done, _ = env.step(Action(type="break"))
            assert env.state.interruption_count == 0


# ─────────────────────────────────────────────────────────────────────────────
# Burnout terminates episode
# ─────────────────────────────────────────────────────────────────────────────
class TestBurnout:
    def test_burnout_terminates_episode(self):
        tasks = generate_tasks("easy", seed=0)
        env = CLMEnvironment(tasks=tasks, max_steps=200)
        env.reset()
        env.state.energy = 0.08  # just above burnout threshold
        done = False
        for _ in range(5):
            _, _, done, info = env.step(Action(type="work", task_id="e1"))
            if done:
                break
        assert done, "Episode should terminate on burnout"

    def test_burnout_applies_penalty(self):
        tasks = generate_tasks("easy", seed=0)
        env = CLMEnvironment(tasks=tasks, max_steps=200)
        env.reset()
        env.state.energy = 0.08
        rewards = []
        done = False
        for _ in range(5):
            _, r, done, _ = env.step(Action(type="work", task_id="e1"))
            rewards.append(r)
            if done:
                break
        assert any(r <= -0.5 for r in rewards), "Burnout should produce a large negative reward"


# ─────────────────────────────────────────────────────────────────────────────
# Grader score bounds
# ─────────────────────────────────────────────────────────────────────────────
class TestGraderBounds:
    def test_grader_always_in_bounds(self):
        for level in ["easy", "medium", "hard", "expert"]:
            for seed in range(10):
                tasks = generate_tasks(level, seed=seed)
                for frac in [0.0, 0.3, 0.7, 1.0]:
                    for t in tasks:
                        t.progress = frac
                    score = deterministic_grader(tasks, time_step=30, final_energy=0.5)
                    assert 0.01 <= score <= 0.99, \
                        f"Score {score} out of bounds for {level} seed={seed} progress={frac}"

    def test_grader_higher_completion_scores_higher(self):
        tasks_low = generate_tasks("medium", seed=1)
        tasks_high = generate_tasks("medium", seed=1)
        for t in tasks_low: t.progress = 0.0
        for t in tasks_high: t.progress = 1.0
        assert deterministic_grader(tasks_high, 30, 0.7) > \
               deterministic_grader(tasks_low, 30, 0.7)


# ─────────────────────────────────────────────────────────────────────────────
# FIX 6 – Partial observability
# ─────────────────────────────────────────────────────────────────────────────
class TestPartialObservability:
    def test_observation_has_no_raw_floats(self):
        env = CLMEnvironment(tasks=generate_tasks("easy", seed=0))
        obs = env.reset()
        vs = obs.visible_state
        # energy_level and stress float must NOT appear in visible state
        assert not hasattr(vs, "energy_level"), "energy_level float should not be in observation"
        assert isinstance(vs.fatigue_level, str)
        assert isinstance(vs.stress_level, str)

    def test_fatigue_levels_are_valid(self):
        env = CLMEnvironment(tasks=generate_tasks("easy", seed=0))
        env.reset()
        env.state.energy = 0.1  # should be "high" fatigue
        obs = env._get_observation()
        assert obs.visible_state.fatigue_level == "high"
        env.state.energy = 0.5  # "medium"
        assert env._get_observation().visible_state.fatigue_level == "medium"
        env.state.energy = 0.9  # "low"
        assert env._get_observation().visible_state.fatigue_level == "low"

    def test_stress_levels_are_valid(self):
        env = CLMEnvironment(tasks=generate_tasks("easy", seed=0))
        env.reset()
        env.state.stress = 0.8
        assert env._get_observation().visible_state.stress_level == "critical"
        env.state.stress = 0.5
        assert env._get_observation().visible_state.stress_level == "elevated"
        env.state.stress = 0.1
        assert env._get_observation().visible_state.stress_level == "calm"
```