# Cloud Queue Env - High Severity Analysis (Updated)

Date: 2026-04-12

This note captures the two highest-impact issues still present in the environment logic.

## 1) Arrival Modeling and Arrival Metrics Mismatch

Files and lines:

- cloud_queue_env/server/cloud_queue_env_environment.py:240
- cloud_queue_env/server/cloud_queue_env_environment.py:241
- cloud_queue_env/server/cloud_queue_env_environment.py:248
- cloud_queue_env/server/cloud_queue_env_environment.py:259

What happens now:

- The simulator samples a Poisson arrival count each step.
- If the sampled count is greater than 1, the code still creates only one incoming job object.
- The arrivals metric is incremented by 1.0, not by the sampled arrival count.

Why this is high severity:

- Bursts are compressed into a stream of at most one job per step, so load spikes are underrepresented.
- Several business metrics and grader components become biased (rejections, abandonment, SLA pressure).
- Policy ranking can drift because the environment under-penalizes burst scenarios.
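
As a rough sense of scale, the sketch below estimates how many arrivals are lost when every step with one or more sampled arrivals yields exactly one job. It assumes arrivals per step are Poisson with rate `lam` and that a job is created whenever the sampled count is at least 1. Under that assumption the expected number of jobs created per step is `1 - exp(-lam)` rather than `lam`. The function names and the rates are illustrative and are not taken from the environment.

```python
import math


def expected_created_per_step(lam: float) -> float:
    """Expected jobs created per step when any count >= 1 collapses to one job.

    Equals P(N >= 1) for N ~ Poisson(lam).
    """
    return 1.0 - math.exp(-lam)


def undercount_fraction(lam: float) -> float:
    """Fraction of the true mean arrivals (lam) that never become jobs."""
    return 1.0 - expected_created_per_step(lam) / lam


# Illustrative rates only; the environment's actual rates may differ.
for lam in (0.2, 0.5, 1.0, 2.0):
    print(
        f"lam={lam:.1f}  "
        f"created/step={expected_created_per_step(lam):.3f}  "
        f"lost={undercount_fraction(lam):.1%}"
    )
```

Under these assumptions the loss grows with the rate: at `lam = 1.0` roughly 37% of sampled arrivals never become jobs, and the fraction is larger at higher rates, which is exactly where burst handling matters most.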

Impact on benchmark credibility:

- High. This directly affects realism, fairness of grading, and the strength of any reproducibility claims.

Recommended fix direction:

- Track all sampled arrivals each step.
- Either queue all arrivals or maintain an explicit backlog of pending incoming jobs.
- Increment the arrivals metric by the true sampled count.
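
The fix direction above could look roughly like the following. This is a minimal sketch, not the environment's code: `ArrivalState`, `process_arrivals`, the `capacity` field, and the reject-when-full policy are all hypothetical, and the Poisson sampler is a stdlib-only stand-in (Knuth's method) for whatever sampler the environment actually uses.

```python
import math
import random
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ArrivalState:
    """Hypothetical container for the queue and its metrics."""
    capacity: int = 10                             # assumed queue capacity
    queue: deque = field(default_factory=deque)    # waiting job ids
    arrivals: float = 0.0                          # arrivals metric
    rejections: float = 0.0                        # jobs dropped when full
    next_job_id: int = 0


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for small per-step rates."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def process_arrivals(state: ArrivalState, rng: random.Random, lam: float) -> int:
    """Create one job per sampled arrival and count every arrival."""
    n = sample_poisson(rng, lam)
    state.arrivals += n                  # true sampled count, not 1.0
    for _ in range(n):
        if len(state.queue) < state.capacity:
            state.queue.append(state.next_job_id)
        else:
            state.rejections += 1        # assumed policy: reject when full
        state.next_job_id += 1
    return n


rng = random.Random(0)
state = ArrivalState(capacity=5)
total = sum(process_arrivals(state, rng, lam=1.5) for _ in range(100))
print(total, state.arrivals, len(state.queue), state.rejections)
```

Because nothing is served in this toy loop, every sampled arrival ends up either in the queue or in the rejection count, which makes the accounting easy to check.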

## 2) Agent Dispatch Control Is Partially Bypassed by Autodispatch

Files and lines:

- cloud_queue_env/server/cloud_queue_env_environment.py:353
- cloud_queue_env/server/cloud_queue_env_environment.py:391
- cloud_queue_env/server/cloud_queue_env_environment.py:738

What happens now:

- The agent may choose an action other than dispatch.
- After the action is applied, the environment still runs autodispatch and moves work to idle servers.

Why this is high severity:

- It weakens action-to-outcome causality for dispatch decisions.
- A policy can look better than it should because server assignment still happens automatically.
- It reduces benchmark difficulty in exactly the control surface the task is evaluating.

Impact on benchmark credibility:

- High. This can alter policy comparisons and invalidate assumptions about explicit control.

Recommended fix direction:

- Make dispatch behavior explicit by mode:
  - strict-control mode: only the agent dispatches.
  - assisted mode: autodispatch on, but document this clearly and score accordingly.
- Keep one consistent mode for official benchmark scoring.
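
One way to make the two modes explicit is a mode flag that gates the autodispatch pass. The sketch below is a toy model under stated assumptions: `DispatchMode`, `MiniEnv`, and the action encoding (a server index, or `None` for any non-dispatch action) are hypothetical and do not mirror the real environment's API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class DispatchMode(Enum):
    STRICT = "strict"      # only the agent's dispatch action assigns jobs
    ASSISTED = "assisted"  # the environment also fills idle servers


@dataclass
class MiniEnv:
    """Toy environment: a FIFO queue and two servers (None means idle)."""
    mode: DispatchMode
    queue: list = field(default_factory=list)
    servers: list = field(default_factory=lambda: [None, None])

    def _assign(self, server_idx: int) -> bool:
        """Move the head of the queue to the server if it is idle."""
        if self.queue and self.servers[server_idx] is None:
            self.servers[server_idx] = self.queue.pop(0)
            return True
        return False

    def step(self, action: Optional[int]) -> None:
        """Apply the agent action, then autodispatch only in assisted mode."""
        if action is not None:
            self._assign(action)
        if self.mode is DispatchMode.ASSISTED:
            for idx in range(len(self.servers)):
                self._assign(idx)


strict = MiniEnv(DispatchMode.STRICT, queue=["job-a", "job-b"])
strict.step(None)      # non-dispatch action: nothing is assigned
assisted = MiniEnv(DispatchMode.ASSISTED, queue=["job-a", "job-b"])
assisted.step(None)    # same action, but idle servers are filled anyway
print(strict.servers, assisted.servers)
```

With the same non-dispatch action, the strict environment leaves both servers idle while the assisted one assigns both jobs, which is the difference official scoring would need to pin down.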

## Priority Summary

1. Fix arrival accounting and multi-arrival handling first.
2. Fix dispatch authority semantics second.

Both should be addressed before claiming benchmark-grade reliability.