# Results Raw

This file records exact values and exact partial statuses without additional conclusions.

## Proxy Sprint v7 Main Table

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

| Item | Raw values |
| --- | --- |
| base_model | mean success `0.28`; foliage `0.39`; bag `0.31`; cloth `0.14` |
| random | mean success `0.43333333333333335`; foliage `0.41`; bag `0.37`; cloth `0.52` |
| candidate0 | mean success `0.2`; foliage `0.24`; bag `0.22`; cloth `0.14` |
| oracle | mean success `0.4066666666666667`; foliage `0.5`; bag `0.42`; cloth `0.3` |
| scripted | mean success `1.0`; foliage `1.0`; bag `1.0`; cloth `1.0` |
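
The mean success values in this table can be cross-checked against the per-task values. The sketch below assumes the mean is an unweighted average over foliage, bag, and cloth; that weighting is an assumption, not something stated in the source JSON. The `rows` dictionary and the `unweighted_mean` helper are illustrative names, not part of the repository.

```python
import math

# Values copied from the table above: reported mean, then foliage, bag, cloth.
rows = {
    "base_model": (0.28, [0.39, 0.31, 0.14]),
    "random": (0.43333333333333335, [0.41, 0.37, 0.52]),
    "candidate0": (0.2, [0.24, 0.22, 0.14]),
    "oracle": (0.4066666666666667, [0.5, 0.42, 0.3]),
    "scripted": (1.0, [1.0, 1.0, 1.0]),
}


def unweighted_mean(values):
    """Plain arithmetic mean (assumed weighting, see lead-in)."""
    return sum(values) / len(values)


# Collect any row whose reported mean differs from the recomputed one.
mismatches = [
    name
    for name, (reported, per_task) in rows.items()
    if not math.isclose(unweighted_mean(per_task), reported, abs_tol=1e-9)
]
print(mismatches)
```

An empty list means every reported mean matches the recomputed one within floating-point tolerance.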

## Proxy Sprint v7 Ablation Table

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

| Item | Raw values |
| --- | --- |
| no_planner | `0.2` |
| no_memory | `0.3233333333333333` |
| no_task_conditioning | `0.28` |
| no_geometry | `0.27` |
| no_camera_pose | `0.29333333333333333` |

## Selector Table

Sources:

- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`

| Item | Raw values |
| --- | --- |
| iter6 | mean success `0.4566666666666667`; foliage `0.46`; bag `0.4`; cloth `0.51` |
| iter7 | mean success `0.4666666666666666`; foliage `0.4`; bag `0.41`; cloth `0.59` |
| iter8 bag fixed slice | mean success `0.41`; nominal `0.45`; high_reocclusion `0.4`; camera_perturbation `0.5`; one_sided_slip `0.25` |
| routed controller | mean success `0.48666666666666664`; route `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`; foliage `0.46`; bag `0.41`; cloth `0.59` |

## Proxy Baseline Compare Table

Source:

- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`

| Item | Raw values |
| --- | --- |
| baseline_rgbd_stage3 | mean success `0.31`; foliage `0.21`; bag `0.15`; cloth `0.57` |
| iter5_selector | mean success `0.45`; foliage `0.44`; bag `0.4`; cloth `0.51` |

## RLBench Recovered Push-Box Comparator

Sources:

- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`

| Item | Raw values |
| --- | --- |
| current fair-step1 final | mean success `0.7`; mean return `0.7`; successes `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]` |
| historical push-box control | mean success `0.4`; mean return `0.4`; successes `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]` |
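
The mean success values follow from the per-episode success lists. A minimal sketch, assuming each list entry is one episode's binary success flag and the mean is taken over all ten episodes; the variable names are illustrative.

```python
import math

# Per-episode successes copied from the table above.
fair_step1 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
historical = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def mean(values):
    """Arithmetic mean over episodes."""
    return sum(values) / len(values)


fair_mean = mean(fair_step1)
historical_mean = mean(historical)
print(len(fair_step1), fair_mean)
print(len(historical), historical_mean)
```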

## Official AnyBimanual Overlap Training Milestones

Sources:

- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md`

| Global step | Raw values |
| --- | --- |
| 300 | loss `40.91718`; sample time `0.093029`; step time `14.0686` |
| 400 | loss `33.26684`; sample time `0.073085`; step time `14.3032` |
| 500 | loss `36.07054`; sample time `0.048558`; step time `11.1376` |
| 600 | loss `35.32345`; sample time `0.040642`; step time `9.7719` |
| 700 | loss `28.50959`; sample time `0.057937`; step time `10.9347` |
| 800 | loss `23.60169`; sample time `0.032697`; step time `11.8652` |
| 900 | loss `15.28901`; sample time `0.051232`; step time `11.5073` |
| 1000 checkpoint | train reached `weights/1000` and exited cleanly |

## Official AnyBimanual Overlap Eval Final Output

Sources:

- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`

| Item | Raw values |
| --- | --- |
| local last complete step | `1000` |
| local mean success | `0.16` |
| coordinated_push_box | success `0.0`; return `0.0`; final score log line `0.0` |
| coordinated_lift_ball | success `0.0`; return `0.0`; final score log line `0.0` |
| dual_push_buttons | success `0.48`; return `12.0`; final score log line `12.0` |
| public best overlap step in local summary | step `60000`; mean success `0.6933333333333334` |
| public best overlap per-task success | coordinated_push_box `0.8`; coordinated_lift_ball `0.32`; dual_push_buttons `0.96` |
| delta vs public best mean success | `-0.5333333333333333` |
| delta vs public best per-task success | coordinated_push_box `-0.8`; coordinated_lift_ball `-0.32`; dual_push_buttons `-0.48` |
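
The delta rows can be reproduced from the local and public per-task success values. The sketch below assumes each delta is local minus public and that the mean is an unweighted average over the three tasks; both are assumptions about how `summary.json` was computed, and the names are illustrative.

```python
import math

# Per-task success values copied from the table above.
local = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}
public = {
    "coordinated_push_box": 0.8,
    "coordinated_lift_ball": 0.32,
    "dual_push_buttons": 0.96,
}


def mean(values):
    """Unweighted arithmetic mean (assumed weighting)."""
    values = list(values)
    return sum(values) / len(values)


# Assumed sign convention: local minus public.
per_task_delta = {task: local[task] - public[task] for task in local}
mean_delta = mean(local.values()) - mean(public.values())
print(per_task_delta)
print(mean_delta)
```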

## Validated General-Task Anchor: dual_push_buttons

Source:

- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`

| Item | Raw values |
| --- | --- |
| public AnyBimanual release | step `60000`; success `0.96`; return `24.0`; length `21.56` |
| local official single-task eval | step `60000`; episodes `25`; success `0.96`; return `24.0`; length `21.84` |
| local clip backbone-only | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local elastic reveal proxy iter6 | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local RVT hybrid frozen fixed-bounds | success `0.0`; return `0.0`; path `reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |

## RVT Overlap Branch

Sources:

- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`

| Item | Raw values |
| --- | --- |
| frozen RVT stage1 train | checkpoint `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt`; final train total `0.043179353826920445`; final val total `0.039591669984665984`; train seconds `2261.2839448451996` |
| frozen RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| frozen fixed-bounds RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| local overlap floor used for gate | `0.16` |
| stage2 run flag | `false` |

## Dual-Push Nonzero Branch

Source:

- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`

| Item | Raw values |
| --- | --- |
| direct rollout smoke planning | `5` episodes; `25` steps; mean success `0.0`; path `reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json` |
| controller sweep planning_c4 | `0.0` |
| controller sweep ik_c1 | `0.0` |
| controller sweep planning_c1_s05 | `0.0` |
| kNN top-1 planning | `5` episodes; `25` steps; mean success `0.0` |
| weighted rollout smoke planning | `5` episodes; `25` steps; mean success `0.0` |
| demo replay through absolute_action_from_delta | mean success `0.8`; mean return `0.8`; successful demo step counts `89`, `112`, `93`, `112` |
| weighted kNN top-1 planning length120 | `2` episodes; mean success `0.0` |
| chunk8 probe IK length120 | `1` episode; success `0.0`; return `0.0`; path recoveries `119`; noop fallbacks `1` |
| retargeted demo task_state smoke | `2` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep1 | `1` episode; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |

## Dual-Push Full-Architecture Hybrid

Sources:

- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`

| Item | Raw values |
| --- | --- |
| elastic checkpoint retargeted-demo probe | `1` episode; mean success `1.0`; mean return `1.0`; steps `94`; retrieved episode index `11`; retrieval similarity `0.9998629689216614` |
| full-architecture hybrid eval | `1` episode; mean success `1.0`; mean return `1.0`; steps `116`; path recoveries `0`; noop fallbacks `0`; first selected mode `residual::maintain_opening`; last selected mode `residual::base_action` |

## Previous Repo Raw Results

Previous raw tables are preserved in:

- `history/VLAarchtests_previous_README.md`