# Results Raw
This file records exact values and exact partial statuses without additional conclusions.
## Proxy Sprint v7 Main Table
Source:
- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
| Item | Raw values |
| --- | --- |
| base_model | mean success `0.28`; foliage `0.39`; bag `0.31`; cloth `0.14` |
| random | mean success `0.43333333333333335`; foliage `0.41`; bag `0.37`; cloth `0.52` |
| candidate0 | mean success `0.2`; foliage `0.24`; bag `0.22`; cloth `0.14` |
| oracle | mean success `0.4066666666666667`; foliage `0.5`; bag `0.42`; cloth `0.3` |
| scripted | mean success `1.0`; foliage `1.0`; bag `1.0`; cloth `1.0` |
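The `mean success` values in this table are numerically consistent with an unweighted mean of the three per-task values (foliage, bag, cloth). The sketch below checks that arithmetic against the numbers copied from the table. It assumes equal task weighting, which is inferred from the values and not confirmed by the summary script; the names `rows` and `unweighted_mean` are illustrative only.

```python
import math

# (reported mean success, [foliage, bag, cloth]) copied from the table above.
rows = {
    "base_model": (0.28, [0.39, 0.31, 0.14]),
    "random": (0.43333333333333335, [0.41, 0.37, 0.52]),
    "candidate0": (0.2, [0.24, 0.22, 0.14]),
    "oracle": (0.4066666666666667, [0.5, 0.42, 0.3]),
    "scripted": (1.0, [1.0, 1.0, 1.0]),
}


def unweighted_mean(values):
    """Plain arithmetic mean; assumes every task counts equally."""
    return sum(values) / len(values)


# True for a row when the reported mean matches the recomputed one
# up to floating-point rounding.
consistent = {
    name: math.isclose(reported, unweighted_mean(tasks), abs_tol=1e-9)
    for name, (reported, tasks) in rows.items()
}

for name, ok in consistent.items():
    print(name, ok)
```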
## Proxy Sprint v7 Ablation Table
Source:
- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
| Item | Raw values |
| --- | --- |
| no_planner | `0.2` |
| no_memory | `0.3233333333333333` |
| no_task_conditioning | `0.28` |
| no_geometry | `0.27` |
| no_camera_pose | `0.29333333333333333` |
## Selector Table
Sources:
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
| Item | Raw values |
| --- | --- |
| iter6 | mean success `0.4566666666666667`; foliage `0.46`; bag `0.4`; cloth `0.51` |
| iter7 | mean success `0.4666666666666666`; foliage `0.4`; bag `0.41`; cloth `0.59` |
| iter8 bag fixed slice | mean success `0.41`; nominal `0.45`; high_reocclusion `0.4`; camera_perturbation `0.5`; one_sided_slip `0.25` |
| routed controller | mean success `0.48666666666666664`; route `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`; foliage `0.46`; bag `0.41`; cloth `0.59` |
## Proxy Baseline Compare Table
Source:
- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`
| Item | Raw values |
| --- | --- |
| baseline_rgbd_stage3 | mean success `0.31`; foliage `0.21`; bag `0.15`; cloth `0.57` |
| iter5_selector | mean success `0.45`; foliage `0.44`; bag `0.4`; cloth `0.51` |
## RLBench Recovered Push-Box Comparator
Sources:
- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
| Item | Raw values |
| --- | --- |
| current fair-step1 final | mean success `0.7`; mean return `0.7`; successes `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]` |
| historical push-box control | mean success `0.4`; mean return `0.4`; successes `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]` |
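The reported `mean success` for each run matches the fraction of successful episodes in its `successes` list. A minimal sketch of that check, using only the lists above; the variable names are illustrative and the actual layout of `rollout_eval.json` is not assumed here.

```python
import math

# Per-episode success flags copied from the table above.
fair_step1_final = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
historical_control = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def mean_success(successes):
    """Fraction of episodes flagged as successful."""
    return sum(successes) / len(successes)


fair_mean = mean_success(fair_step1_final)
historical_mean = mean_success(historical_control)

print(len(fair_step1_final), fair_mean)
print(len(historical_control), historical_mean)
```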
## Official AnyBimanual Overlap Training Milestones
Sources:
- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md`
| Global step | Raw values |
| --- | --- |
| 300 | loss `40.91718`; sample time `0.093029`; step time `14.0686` |
| 400 | loss `33.26684`; sample time `0.073085`; step time `14.3032` |
| 500 | loss `36.07054`; sample time `0.048558`; step time `11.1376` |
| 600 | loss `35.32345`; sample time `0.040642`; step time `9.7719` |
| 700 | loss `28.50959`; sample time `0.057937`; step time `10.9347` |
| 800 | loss `23.60169`; sample time `0.032697`; step time `11.8652` |
| 900 | loss `15.28901`; sample time `0.051232`; step time `11.5073` |
| 1000 checkpoint | train reached `weights/1000` and exited cleanly |
## Official AnyBimanual Overlap Eval Final Output
Sources:
- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
| Item | Raw values |
| --- | --- |
| local last complete step | `1000` |
| local mean success | `0.16` |
| coordinated_push_box | success `0.0`; return `0.0`; final score log line `0.0` |
| coordinated_lift_ball | success `0.0`; return `0.0`; final score log line `0.0` |
| dual_push_buttons | success `0.48`; return `12.0`; final score log line `12.0` |
| public best overlap step in local summary | step `60000`; mean success `0.6933333333333334` |
| public best overlap per-task success | coordinated_push_box `0.8`; coordinated_lift_ball `0.32`; dual_push_buttons `0.96` |
| delta vs public best mean success | `-0.5333333333333333` |
| delta vs public best per-task success | coordinated_push_box `-0.8`; coordinated_lift_ball `-0.32`; dual_push_buttons `-0.48` |
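The delta rows are consistent with `local - public best`, with each mean taken as an unweighted average over the three overlap tasks. The sketch below reproduces that arithmetic from the values in the table. The sign convention and the equal weighting are assumptions inferred from the numbers, not read from `summary.json`.

```python
import math

# Per-task success copied from the table above.
local = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}
public_best = {
    "coordinated_push_box": 0.8,
    "coordinated_lift_ball": 0.32,
    "dual_push_buttons": 0.96,
}

# Assumed convention: delta = local value minus public best value.
per_task_delta = {task: local[task] - public_best[task] for task in local}

local_mean = sum(local.values()) / len(local)
public_mean = sum(public_best.values()) / len(public_best)
mean_delta = local_mean - public_mean

print(per_task_delta)
print(local_mean, public_mean, mean_delta)
```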
## Validated General-Task Anchor: dual_push_buttons
Source:
- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
| Item | Raw values |
| --- | --- |
| public AnyBimanual release | step `60000`; success `0.96`; return `24.0`; length `21.56` |
| local official single-task eval | step `60000`; episodes `25`; success `0.96`; return `24.0`; length `21.84` |
| local clip backbone-only | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local elastic reveal proxy iter6 | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local RVT hybrid frozen fixed-bounds | success `0.0`; return `0.0`; path `reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
## RVT Overlap Branch
Sources:
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`
| Item | Raw values |
| --- | --- |
| frozen RVT stage1 train | checkpoint `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt`; final train total `0.043179353826920445`; final val total `0.039591669984665984`; train seconds `2261.2839448451996` |
| frozen RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| frozen fixed-bounds RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| local overlap floor used for gate | `0.16` |
| stage2 run flag | `false` |
## Dual-Push Nonzero Branch
Source:
- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
| Item | Raw values |
| --- | --- |
| direct rollout smoke planning | `5` episodes; `25` steps; mean success `0.0`; path `reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json` |
| controller sweep planning_c4 | `0.0` |
| controller sweep ik_c1 | `0.0` |
| controller sweep planning_c1_s05 | `0.0` |
| kNN top-1 planning | `5` episodes; `25` steps; mean success `0.0` |
| weighted rollout smoke planning | `5` episodes; `25` steps; mean success `0.0` |
| demo replay through absolute_action_from_delta | mean success `0.8`; mean return `0.8`; successful demo step counts `89`, `112`, `93`, `112` |
| weighted kNN top-1 planning length120 | `2` episodes; mean success `0.0` |
| chunk8 probe IK length120 | `1` episode; success `0.0`; return `0.0`; path recoveries `119`; noop fallbacks `1` |
| retargeted demo task_state smoke | `2` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep1 | `1` episode; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
## Dual-Push Full-Architecture Hybrid
Sources:
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`
| Item | Raw values |
| --- | --- |
| elastic checkpoint retargeted-demo probe | `1` episode; mean success `1.0`; mean return `1.0`; steps `94`; retrieved episode index `11`; retrieval similarity `0.9998629689216614` |
| full-architecture hybrid eval | `1` episode; mean success `1.0`; mean return `1.0`; steps `116`; path recoveries `0`; noop fallbacks `0`; first selected mode `residual::maintain_opening`; last selected mode `residual::base_action` |
## Previous Repo Raw Results
Previous raw tables are preserved in:
- `history/VLAarchtests_previous_README.md`