Image-Text-to-Text
PEFT
Safetensors
laboratory
protocol-conditioned-action-prediction
lora
qwen
long-horizon-planning
conversational
Instructions to use Stanford-CongLab/LabHorizon-Model with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use Stanford-CongLab/LabHorizon-Model with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the LoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.6-35B-A3B")
model = PeftModel.from_pretrained(base_model, "Stanford-CongLab/LabHorizon-Model")
```
- Notebooks
- Google Colab
- Kaggle
Update model card
README.md
CHANGED
@@ -137,13 +137,13 @@ The tables below report direct-prompting baselines on the same test split used f
 | 11 | Qwen3.5 9B | 0.485 |
 | 12 | Gemini 3.5 Flash | 0.485 |
 | 13 | Qwen3.6 35B-A3B | 0.475 |
-| 14 | Gemini 3.1 Pro
+| 14 | Gemini 3.1 Pro | 0.465 |

 ### 🧪 Level 2: Protocol-Conditioned Planning

 | Rank | Model | Final Score | Action Sequence Similarity | Parameter Accuracy |
 |:---:|:---|---:|---:|---:|
-| 🥇 | Gemini 3.1 Pro
+| 🥇 | Gemini 3.1 Pro | 0.3263 | 0.3195 | 0.3331 |
 | 🥈 | Grok 4.3 | 0.3244 | 0.3339 | 0.3148 |
 | 🥉 | Kimi K2.6 | 0.3150 | 0.2845 | 0.3456 |
 | 4 | Gemini 3.5 Flash | 0.3039 | 0.2686 | 0.3391 |
@@ -197,14 +197,14 @@ The table compares direct-prompting SOTA/baseline systems, the base Qwen model,
 | System | Level 1 Next Action Accuracy | Level 2 Action Sequence Similarity | Level 2 Parameter Accuracy | Level 2 Final Score |
 |:---|---:|---:|---:|---:|
 | Grok 4.3 | 0.555 | 0.3339 | 0.3148 | 0.3244 |
-| Gemini 3.1 Pro
+| Gemini 3.1 Pro | 0.465 | 0.3195 | 0.3331 | 0.3263 |
 | GPT-5.5 | 0.535 | 0.2092 | 0.2459 | 0.2276 |
 | Kimi K2.6 | 0.550 | 0.2845 | 0.3456 | 0.3150 |
 | Qwen3.6-35B-A3B | 0.475 | 0.2585 | 0.2483 | 0.2534 |
 | Qwen3.6-35B-A3B(trained) | 0.635 | 0.4030 | 0.4170 | 0.4100 |
 | Qwen3.6-35B-A3B(trained+agents) | **0.665** | **0.4485** | **0.4580** | **0.4532** |

-Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro
+Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro is used as Simulator/Selector. The Simulator/Selector choice is the current setting and has not been exhaustively ablated.

 The trained adapter improves both levels over the direct Qwen3.6-35B-A3B baseline. Level 1 improves from `0.475` to `0.635`, indicating better laboratory asset-to-action alignment. Level 2 Final Score improves from `0.2534` to `0.4100`, indicating better action ordering, parameter retention, and dependency tracking. The trained+agents setting further improves consistency by selecting candidates with stronger symbolic protocol-state validity.

@@ -212,7 +212,7 @@ The trained adapter improves both levels over the direct Qwen3.6-35B-A3B baselin

 The trained+agents result uses this adapter as the Actor and combines it with a separate Simulator/Selector model. The agent is not a physical simulator and does not execute wet-lab actions. It samples candidate next actions or action sequences, checks symbolic protocol-state consistency, and selects the most consistent candidate.

-Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro
+Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro is used as Simulator/Selector. This Simulator/Selector choice is the current setting and has not been exhaustively ablated.

 ## Quick Start
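The comparison table in this diff quotes Level 1 and Level 2 gains for the trained adapter over the direct Qwen3.6-35B-A3B baseline. As a quick sanity check, the sketch below recomputes those gains from the table values on the new side of the diff. It also tests whether each reported Level 2 Final Score is consistent with the plain mean of Action Sequence Similarity and Parameter Accuracy. The README does not state how Final Score is computed, so the mean is an assumption being checked here, not a documented definition. The names `RESULTS`, `gains`, and `mean_consistent` are illustrative only.

```python
# Values copied from the comparison table:
# (Level 1 accuracy, Level 2 sequence similarity, Level 2 parameter accuracy, Level 2 final score)
RESULTS = {
    "Grok 4.3": (0.555, 0.3339, 0.3148, 0.3244),
    "Gemini 3.1 Pro": (0.465, 0.3195, 0.3331, 0.3263),
    "GPT-5.5": (0.535, 0.2092, 0.2459, 0.2276),
    "Kimi K2.6": (0.550, 0.2845, 0.3456, 0.3150),
    "Qwen3.6-35B-A3B": (0.475, 0.2585, 0.2483, 0.2534),
    "Qwen3.6-35B-A3B(trained)": (0.635, 0.4030, 0.4170, 0.4100),
    "Qwen3.6-35B-A3B(trained+agents)": (0.665, 0.4485, 0.4580, 0.4532),
}


def gains(before, after):
    """Return (absolute gain, relative gain) of `after` over `before`."""
    absolute = after - before
    return absolute, absolute / before


def mean_consistent(seq_sim, param_acc, final, tol=1e-4):
    """ASSUMPTION under test: final score equals the mean of the two components.

    The tolerance allows for the table values being rounded to four decimals.
    """
    return abs((seq_sim + param_acc) / 2 - final) <= tol


base = RESULTS["Qwen3.6-35B-A3B"]
trained = RESULTS["Qwen3.6-35B-A3B(trained)"]

l1_abs, l1_rel = gains(base[0], trained[0])
l2_abs, l2_rel = gains(base[3], trained[3])
all_consistent = all(mean_consistent(s, p, f) for _, s, p, f in RESULTS.values())

print(f"Level 1: +{l1_abs:.3f} absolute ({l1_rel:.1%} relative)")
print(f"Level 2 Final Score: +{l2_abs:.4f} absolute ({l2_rel:.1%} relative)")
print("Final Score consistent with component mean:", all_consistent)
```

On these values the Level 1 gain is 0.160 absolute (about 33.7% relative) and the Level 2 Final Score gain is 0.1566 absolute (about 61.8% relative). Every row matches the component mean to within rounding, which is consistent with the assumed formula but does not confirm it.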
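The trained+agents description in the last hunk (sample candidates, check symbolic protocol-state consistency, select the most consistent) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The README uses a separate Simulator/Selector model, whereas here a small rule table (`RULES`) and `toy_consistency` stand in for it, and the candidate sequences are hard-coded rather than sampled from the Actor. All action names, rules, and function names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Action = str

# HYPOTHETICAL symbolic rules: action -> (preconditions, effects).
# A rule-based stand-in for the Simulator/Selector model described in the README.
RULES: Dict[Action, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "add_buffer": (frozenset(), frozenset({"buffer_added"})),
    "vortex": (frozenset({"buffer_added"}), frozenset({"mixed"})),
    "centrifuge": (frozenset({"mixed"}), frozenset({"pelleted"})),
}


def toy_consistency(sequence: Sequence[Action]) -> float:
    """Fraction of steps whose preconditions hold in the running symbolic state."""
    if not sequence:
        return 0.0
    state: set = set()
    valid_steps = 0
    for action in sequence:
        preconditions, effects = RULES[action]
        if preconditions <= state:
            valid_steps += 1
            state |= effects  # only valid steps update the state
    return valid_steps / len(sequence)


def select_most_consistent(
    candidates: Sequence[Sequence[Action]],
    score: Callable[[Sequence[Action]], float],
) -> Tuple[List[Action], float]:
    """Return the highest-scoring candidate; ties go to the earliest candidate."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = [score(candidate) for candidate in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return list(candidates[best_index]), scores[best_index]


# Hard-coded stand-ins for sequences an Actor might propose.
candidates = [
    ["centrifuge", "add_buffer", "vortex"],
    ["add_buffer", "vortex", "centrifuge"],
    ["vortex", "add_buffer", "centrifuge"],
]

best, best_score = select_most_consistent(candidates, toy_consistency)
print(best, best_score)
```

In this toy setup only the second candidate satisfies every precondition in order, so it is the one selected. Swapping `toy_consistency` for a call to a real Simulator/Selector would keep the selection logic unchanged.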