black-yt committed on
Commit
c03a16b
·
1 Parent(s): 96b75d0

Update model card

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -137,13 +137,13 @@ The tables below report direct-prompting baselines on the same test split used f
  | 11 | Qwen3.5 9B | 0.485 |
  | 12 | Gemini 3.5 Flash | 0.485 |
  | 13 | Qwen3.6 35B-A3B | 0.475 |
- | 14 | Gemini 3.1 Pro Preview | 0.465 |

  ### 🧪 Level 2: Protocol-Conditioned Planning

  | Rank | Model | Final Score | Action Sequence Similarity | Parameter Accuracy |
  |:---:|:---|---:|---:|---:|
- | 🥇 | Gemini 3.1 Pro Preview | 0.3263 | 0.3195 | 0.3331 |
  | 🥈 | Grok 4.3 | 0.3244 | 0.3339 | 0.3148 |
  | 🥉 | Kimi K2.6 | 0.3150 | 0.2845 | 0.3456 |
  | 4 | Gemini 3.5 Flash | 0.3039 | 0.2686 | 0.3391 |
@@ -197,14 +197,14 @@ The table compares direct-prompting SOTA/baseline systems, the base Qwen model,
  | System | Level 1 Next Action Accuracy | Level 2 Action Sequence Similarity | Level 2 Parameter Accuracy | Level 2 Final Score |
  |:---|---:|---:|---:|---:|
  | Grok 4.3 | 0.555 | 0.3339 | 0.3148 | 0.3244 |
- | Gemini 3.1 Pro Preview | 0.465 | 0.3195 | 0.3331 | 0.3263 |
  | GPT-5.5 | 0.535 | 0.2092 | 0.2459 | 0.2276 |
  | Kimi K2.6 | 0.550 | 0.2845 | 0.3456 | 0.3150 |
  | Qwen3.6-35B-A3B | 0.475 | 0.2585 | 0.2483 | 0.2534 |
  | Qwen3.6-35B-A3B(trained) | 0.635 | 0.4030 | 0.4170 | 0.4100 |
  | Qwen3.6-35B-A3B(trained+agents) | **0.665** | **0.4485** | **0.4580** | **0.4532** |

- Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro Preview is used as Simulator/Selector. The Simulator/Selector choice is the current setting and has not been exhaustively ablated.

  The trained adapter improves both levels over the direct Qwen3.6-35B-A3B baseline. Level 1 improves from `0.475` to `0.635`, indicating better laboratory asset-to-action alignment. Level 2 Final Score improves from `0.2534` to `0.4100`, indicating better action ordering, parameter retention, and dependency tracking. The trained+agents setting further improves consistency by selecting candidates with stronger symbolic protocol-state validity.
 
@@ -212,7 +212,7 @@ The trained adapter improves both levels over the direct Qwen3.6-35B-A3B baselin

  The trained+agents result uses this adapter as the Actor and combines it with a separate Simulator/Selector model. The agent is not a physical simulator and does not execute wet-lab actions. It samples candidate next actions or action sequences, checks symbolic protocol-state consistency, and selects the most consistent candidate.

- Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro Preview is used as Simulator/Selector. This Simulator/Selector choice is the current setting and has not been exhaustively ablated.

  ## 🚀 Quick Start

 
  | 11 | Qwen3.5 9B | 0.485 |
  | 12 | Gemini 3.5 Flash | 0.485 |
  | 13 | Qwen3.6 35B-A3B | 0.475 |
+ | 14 | Gemini 3.1 Pro | 0.465 |

  ### 🧪 Level 2: Protocol-Conditioned Planning

  | Rank | Model | Final Score | Action Sequence Similarity | Parameter Accuracy |
  |:---:|:---|---:|---:|---:|
+ | 🥇 | Gemini 3.1 Pro | 0.3263 | 0.3195 | 0.3331 |
  | 🥈 | Grok 4.3 | 0.3244 | 0.3339 | 0.3148 |
  | 🥉 | Kimi K2.6 | 0.3150 | 0.2845 | 0.3456 |
  | 4 | Gemini 3.5 Flash | 0.3039 | 0.2686 | 0.3391 |
 
  | System | Level 1 Next Action Accuracy | Level 2 Action Sequence Similarity | Level 2 Parameter Accuracy | Level 2 Final Score |
  |:---|---:|---:|---:|---:|
  | Grok 4.3 | 0.555 | 0.3339 | 0.3148 | 0.3244 |
+ | Gemini 3.1 Pro | 0.465 | 0.3195 | 0.3331 | 0.3263 |
  | GPT-5.5 | 0.535 | 0.2092 | 0.2459 | 0.2276 |
  | Kimi K2.6 | 0.550 | 0.2845 | 0.3456 | 0.3150 |
  | Qwen3.6-35B-A3B | 0.475 | 0.2585 | 0.2483 | 0.2534 |
  | Qwen3.6-35B-A3B(trained) | 0.635 | 0.4030 | 0.4170 | 0.4100 |
  | Qwen3.6-35B-A3B(trained+agents) | **0.665** | **0.4485** | **0.4580** | **0.4532** |

+ Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro is used as Simulator/Selector. The Simulator/Selector choice is the current setting and has not been exhaustively ablated.

  The trained adapter improves both levels over the direct Qwen3.6-35B-A3B baseline. Level 1 improves from `0.475` to `0.635`, indicating better laboratory asset-to-action alignment. Level 2 Final Score improves from `0.2534` to `0.4100`, indicating better action ordering, parameter retention, and dependency tracking. The trained+agents setting further improves consistency by selecting candidates with stronger symbolic protocol-state validity.
 
 

  The trained+agents result uses this adapter as the Actor and combines it with a separate Simulator/Selector model. The agent is not a physical simulator and does not execute wet-lab actions. It samples candidate next actions or action sequences, checks symbolic protocol-state consistency, and selects the most consistent candidate.

+ Agent setting: `Qwen3.6-35B-A3B(trained)` is used as Actor, and Gemini 3.1 Pro is used as Simulator/Selector. This Simulator/Selector choice is the current setting and has not been exhaustively ablated.

  ## 🚀 Quick Start
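
The README text in this diff describes the trained+agents setting as sampling candidate action sequences, checking symbolic protocol-state consistency, and selecting the most consistent candidate. The following is a minimal Python sketch of that select step only, not the repository's implementation. The `Action` type, the `requires`/`produces` asset sets, the `consistency_score` heuristic (fraction of steps whose preconditions hold), and the example action names are all hypothetical and chosen for illustration; in the described setting the check is done by a separate Simulator/Selector model rather than a hand-written rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set


@dataclass(frozen=True)
class Action:
    """Hypothetical symbolic action: needs some assets, yields others."""
    name: str
    requires: FrozenSet[str] = frozenset()
    produces: FrozenSet[str] = frozenset()


def consistency_score(plan: Sequence[Action], initial_state: Set[str]) -> float:
    """Fraction of steps whose required assets exist in the tracked state."""
    if not plan:
        return 0.0
    state = set(initial_state)
    valid_steps = 0
    for action in plan:
        if action.requires <= state:
            valid_steps += 1
            state |= action.produces  # only valid steps update the state
    return valid_steps / len(plan)


def select_most_consistent(
    candidates: Sequence[Sequence[Action]], initial_state: Set[str]
) -> Sequence[Action]:
    """Return the candidate plan with the highest consistency score."""
    if not candidates:
        raise ValueError("no candidates to select from")
    return max(candidates, key=lambda plan: consistency_score(plan, initial_state))


# Toy candidates standing in for sequences sampled from the Actor.
lyse = Action("lyse_cells", frozenset({"cell_pellet"}), frozenset({"lysate"}))
spin = Action("centrifuge", frozenset({"lysate"}), frozenset({"supernatant"}))
candidates: List[List[Action]] = [[spin, lyse], [lyse, spin]]

best = select_most_consistent(candidates, {"cell_pellet"})
print([action.name for action in best])  # ['lyse_cells', 'centrifuge']
```

In this toy case the first candidate centrifuges before any lysate exists, so only one of its two steps is valid, while the second candidate keeps every precondition satisfied and is selected.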