Stanford-CongLab/LabHorizon-Model

@@ -192,7 +192,7 @@ Main training settings:
 ## 🧠 Training Result
-The table compares direct-prompting SOTA/baseline systems, the base Qwen model, this trained LoRA adapter, and the trained+agents system evaluated on the same LabHorizon test splits.
 | System | Level 1 Next Action Accuracy | Level 2 Action Sequence Similarity | Level 2 Parameter Accuracy | Level 2 Final Score |
 |:---|---:|---:|---:|---:|
@@ -201,7 +201,6 @@ The table compares direct-prompting SOTA/baseline systems, the base Qwen model,
 | GPT-5.5 | 0.535 | 0.2092 | 0.2459 | 0.2276 |
 | Kimi K2.6 | 0.550 | 0.2845 | 0.3456 | 0.3150 |
 | Qwen3.6-35B-A3B | 0.475 | 0.2585 | 0.2483 | 0.2534 |
-| Qwen3.6-35B-A3B(trained) | 0.635 | 0.4030 | 0.4170 | 0.4100 |
 | Qwen3.6-35B-A3B(trained+agents) | **0.665** | **0.4485** | **0.4580** | **0.4532** |
 Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro is used as Simulator/Selector. The Simulator/Selector choice is the current setting and has not been exhaustively ablated.

 ## 🧠 Training Result
+The table compares direct-prompting SOTA/baseline systems, the base Qwen model, and the trained+agents system evaluated on the same LabHorizon test splits.
 | System | Level 1 Next Action Accuracy | Level 2 Action Sequence Similarity | Level 2 Parameter Accuracy | Level 2 Final Score |
 |:---|---:|---:|---:|---:|
 | GPT-5.5 | 0.535 | 0.2092 | 0.2459 | 0.2276 |
 | Kimi K2.6 | 0.550 | 0.2845 | 0.3456 | 0.3150 |
 | Qwen3.6-35B-A3B | 0.475 | 0.2585 | 0.2483 | 0.2534 |
 | Qwen3.6-35B-A3B(trained+agents) | **0.665** | **0.4485** | **0.4580** | **0.4532** |
 Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro is used as Simulator/Selector. The Simulator/Selector choice is the current setting and has not been exhaustively ablated.
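
The README does not say how the Level 2 Final Score is derived. The reported values are numerically consistent with the arithmetic mean of Level 2 Action Sequence Similarity and Level 2 Parameter Accuracy, which the sketch below checks against the rows of the updated table. That formula is an observation from the numbers, not a documented definition, and the names `ROWS`, `mean_of_level2_metrics`, and `check_rows` are hypothetical helpers introduced only for illustration.

```python
# Rows copied from the updated table:
# (system, Level 2 sequence similarity, Level 2 parameter accuracy, reported Level 2 final score)
ROWS = [
    ("GPT-5.5", 0.2092, 0.2459, 0.2276),
    ("Kimi K2.6", 0.2845, 0.3456, 0.3150),
    ("Qwen3.6-35B-A3B", 0.2585, 0.2483, 0.2534),
    ("Qwen3.6-35B-A3B(trained+agents)", 0.4485, 0.4580, 0.4532),
]


def mean_of_level2_metrics(seq_sim: float, param_acc: float) -> float:
    """Assumed formula: plain average of the two Level 2 metrics."""
    return (seq_sim + param_acc) / 2


def check_rows(rows, tol: float = 1e-4) -> dict:
    """Return, per system, whether the reported final score matches the
    assumed mean within `tol` (one unit in the fourth decimal place)."""
    return {
        name: abs(mean_of_level2_metrics(seq_sim, param_acc) - final) <= tol
        for name, seq_sim, param_acc, final in rows
    }


if __name__ == "__main__":
    for system, matches in check_rows(ROWS).items():
        print(f"{system}: consistent with mean = {matches}")
```

The tolerance covers the four-decimal rounding used in the table. A mismatch on a future row would suggest the final score is computed some other way, for example with a weighted average.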