# Results Raw

This file records exact values and exact partial statuses without additional conclusions.

## Proxy Sprint v7 Main Table

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

| Item | Raw values |
| --- | --- |
| base_model | mean success `0.28`; foliage `0.39`; bag `0.31`; cloth `0.14` |
| random | mean success `0.43333333333333335`; foliage `0.41`; bag `0.37`; cloth `0.52` |
| candidate0 | mean success `0.2`; foliage `0.24`; bag `0.22`; cloth `0.14` |
| oracle | mean success `0.4066666666666667`; foliage `0.5`; bag `0.42`; cloth `0.3` |
| scripted | mean success `1.0`; foliage `1.0`; bag `1.0`; cloth `1.0` |

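In every row of this table, `mean success` equals the unweighted mean of the three per-task values (foliage, bag, cloth). The sketch below recomputes each row from the values above. Equal weighting per task is an assumption inferred from the numbers; the source JSON does not state it. The helper name `task_mean` is illustrative.

```python
from statistics import fmean

# Per-task proxy success copied from the table above.
rows = {
    "base_model": {"foliage": 0.39, "bag": 0.31, "cloth": 0.14},
    "random": {"foliage": 0.41, "bag": 0.37, "cloth": 0.52},
    "candidate0": {"foliage": 0.24, "bag": 0.22, "cloth": 0.14},
    "oracle": {"foliage": 0.5, "bag": 0.42, "cloth": 0.3},
    "scripted": {"foliage": 1.0, "bag": 1.0, "cloth": 1.0},
}

# Reported mean success, also copied from the table above.
reported = {
    "base_model": 0.28,
    "random": 0.43333333333333335,
    "candidate0": 0.2,
    "oracle": 0.4066666666666667,
    "scripted": 1.0,
}


def task_mean(per_task: dict[str, float]) -> float:
    """Unweighted mean over tasks (assumes each task carries equal weight)."""
    return fmean(per_task.values())


for name, per_task in rows.items():
    recomputed = task_mean(per_task)
    # Compare with a tolerance: float sums can differ in the last digits.
    assert abs(recomputed - reported[name]) < 1e-9, name
    print(f"{name}: {recomputed:.4f}")
```

The tolerance check matters because values such as `0.43333333333333335` are the float rendering of 1.30 / 3. An exact `==` comparison against a recomputed sum could fail on the last bit.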
## Proxy Sprint v7 Ablation Table

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

| Item | Raw values |
| --- | --- |
| no_planner | `0.2` |
| no_memory | `0.3233333333333333` |
| no_task_conditioning | `0.28` |
| no_geometry | `0.27` |
| no_camera_pose | `0.29333333333333333` |

## Selector Table

Sources:

- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`

| Item | Raw values |
| --- | --- |
| iter6 | mean success `0.4566666666666667`; foliage `0.46`; bag `0.4`; cloth `0.51` |
| iter7 | mean success `0.4666666666666666`; foliage `0.4`; bag `0.41`; cloth `0.59` |
| iter8 bag fixed slice | mean success `0.41`; nominal `0.45`; high_reocclusion `0.4`; camera_perturbation `0.5`; one_sided_slip `0.25` |
| routed controller | mean success `0.48666666666666664`; route `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`; foliage `0.46`; bag `0.41`; cloth `0.59` |

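The routed controller takes each task's result from the selector checkpoint named in its `route`. Its `mean success` equals the unweighted mean of the three routed per-task values. This table does not list full per-task iter8 numbers, so the sketch uses the routed values from the row directly rather than looking them up per checkpoint. The helper name `routed_mean` and the equal weighting per task are assumptions.

```python
from statistics import fmean

# Route copied from the routed-controller row: task -> selector checkpoint.
route = {"foliage": "iter6", "bag": "iter8", "cloth": "iter8"}

# Routed per-task success copied from the same row.
routed_success = {"foliage": 0.46, "bag": 0.41, "cloth": 0.59}


def routed_mean(per_task: dict[str, float], route: dict[str, str]) -> float:
    """Unweighted mean over the routed tasks; every routed task needs a value."""
    missing = set(route) - set(per_task)
    if missing:
        raise KeyError(f"no success value for routed tasks: {sorted(missing)}")
    return fmean(per_task[task] for task in route)


mean = routed_mean(routed_success, route)
for task, checkpoint in route.items():
    print(f"{task}: {routed_success[task]} (from {checkpoint})")
print(f"mean success: {mean:.4f}")
```

The `KeyError` guard keeps a route that names a task without a recorded value from silently averaging over fewer tasks.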
## Proxy Baseline Compare Table

Source:

- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`

| Item | Raw values |
| --- | --- |
| baseline_rgbd_stage3 | mean success `0.31`; foliage `0.21`; bag `0.15`; cloth `0.57` |
| iter5_selector | mean success `0.45`; foliage `0.44`; bag `0.4`; cloth `0.51` |

## RLBench Recovered Push-Box Comparator

Sources:

- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`

| Item | Raw values |
| --- | --- |
| current fair-step1 final | mean success `0.7`; mean return `0.7`; successes `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]` |
| historical push-box control | mean success `0.4`; mean return `0.4`; successes `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]` |

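Each `successes` list holds ten per-episode flags, and `mean success` is the fraction of entries equal to `1.0`. The sketch below recomputes both rows from the lists above. The helper name `success_rate` is illustrative.

```python
from statistics import fmean

# Per-episode success flags copied from the table above.
current_fair_step1 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
historical_control = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def success_rate(flags: list[float]) -> float:
    """Fraction of episodes marked successful (1.0); rejects an empty list."""
    if not flags:
        raise ValueError("no episodes")
    return fmean(flags)


for name, flags in [("current", current_fair_step1), ("historical", historical_control)]:
    print(f"{name}: {int(sum(flags))}/{len(flags)} episodes = {success_rate(flags)}")
```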
## Official AnyBimanual Overlap Training Milestones

Sources:

- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md`

| Global step | Raw values |
| --- | --- |
| 300 | loss `40.91718`; sample time `0.093029`; step time `14.0686` |
| 400 | loss `33.26684`; sample time `0.073085`; step time `14.3032` |
| 500 | loss `36.07054`; sample time `0.048558`; step time `11.1376` |
| 600 | loss `35.32345`; sample time `0.040642`; step time `9.7719` |
| 700 | loss `28.50959`; sample time `0.057937`; step time `10.9347` |
| 800 | loss `23.60169`; sample time `0.032697`; step time `11.8652` |
| 900 | loss `15.28901`; sample time `0.051232`; step time `11.5073` |
| 1000 checkpoint | training reached `weights/1000` and exited cleanly |

## Official AnyBimanual Overlap Eval Final Output

Sources:

- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`

| Item | Raw values |
| --- | --- |
| local last complete step | `1000` |
| local mean success | `0.16` |
| coordinated_push_box | success `0.0`; return `0.0`; final score log line `0.0` |
| coordinated_lift_ball | success `0.0`; return `0.0`; final score log line `0.0` |
| dual_push_buttons | success `0.48`; return `12.0`; final score log line `12.0` |
| public best overlap step in local summary | step `60000`; mean success `0.6933333333333334` |
| public best overlap per-task success | coordinated_push_box `0.8`; coordinated_lift_ball `0.32`; dual_push_buttons `0.96` |
| delta vs public best mean success | `-0.5333333333333333` |
| delta vs public best per-task success | coordinated_push_box `-0.8`; coordinated_lift_ball `-0.32`; dual_push_buttons `-0.48` |

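The two `delta vs public best` rows are differences of the form local minus public best. The sketch below reproduces them from the per-task success values above. It assumes both means are unweighted over the same three tasks, which matches `0.16` and `0.6933333333333334`. The helper name `deltas` is illustrative.

```python
from statistics import fmean

# Per-task success at local step 1000 and at the public best overlap step (60000),
# copied from the table above.
local = {"coordinated_push_box": 0.0, "coordinated_lift_ball": 0.0, "dual_push_buttons": 0.48}
public_best = {"coordinated_push_box": 0.8, "coordinated_lift_ball": 0.32, "dual_push_buttons": 0.96}


def deltas(ours: dict[str, float], ref: dict[str, float]) -> dict[str, float]:
    """Per-task difference ours - ref (negative means below the reference)."""
    return {task: ours[task] - ref[task] for task in ref}


per_task = deltas(local, public_best)
mean_delta = fmean(local.values()) - fmean(public_best.values())

print({task: round(d, 4) for task, d in per_task.items()})
print(round(mean_delta, 4))
```

Because both means weight the same three tasks equally, the mean delta also equals the average of the per-task deltas: (-0.8 - 0.32 - 0.48) / 3.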
## Validated General-Task Anchor: dual_push_buttons

Source:

- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`

| Item | Raw values |
| --- | --- |
| public AnyBimanual release | step `60000`; success `0.96`; return `24.0`; length `21.56` |
| local official single-task eval | step `60000`; episodes `25`; success `0.96`; return `24.0`; length `21.84` |
| local clip backbone-only | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local elastic reveal proxy iter6 | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local RVT hybrid frozen fixed-bounds | success `0.0`; return `0.0`; path `reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |

## RVT Overlap Branch

Sources:

- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`

| Item | Raw values |
| --- | --- |
| frozen RVT stage1 train | checkpoint `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt`; final train total `0.043179353826920445`; final val total `0.039591669984665984`; train seconds `2261.2839448451996` |
| frozen RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| frozen fixed-bounds RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| local overlap floor used for gate | `0.16` |
| stage2 run flag | `false` |

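The last two rows record a gate: the local overlap floor is `0.16` and the stage2 run flag is `false`. The sketch below is a hypothetical version of such a gate, not the RVT branch's code. It rests on three assumptions:

- stage2 runs only if a stage1 overlap eval's unweighted mean success strictly exceeds the floor;
- either eval variant clearing the floor would be enough;
- the name `should_run_stage2` is illustrative.

```python
from statistics import fmean

# Value from the "local overlap floor used for gate" row.
LOCAL_OVERLAP_FLOOR = 0.16


def should_run_stage2(per_task_success: dict[str, float], floor: float = LOCAL_OVERLAP_FLOOR) -> bool:
    """Hypothetical gate: unweighted mean overlap success must strictly exceed the floor."""
    return fmean(per_task_success.values()) > floor


# Overlap eval results copied from the table above.
frozen = {"push_box": 0.0, "lift_ball": 0.0, "dual_push_buttons": 0.0}
frozen_fixed_bounds = {"push_box": 0.0, "lift_ball": 0.0, "dual_push_buttons": 0.0}

# Assumption: stage2 is allowed if either stage1 variant clears the floor.
stage2 = any(should_run_stage2(result) for result in (frozen, frozen_fixed_bounds))
print(f"stage2 run flag: {stage2}")
```

With both evals at `0.0`, the gate stays closed whether the comparison is `>` or `>=`. The choice of operator only matters for a mean exactly at `0.16`.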
## Dual-Push Nonzero Branch

Source:

- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`

| Item | Raw values |
| --- | --- |
| direct rollout smoke planning | `5` episodes; `25` steps; mean success `0.0`; path `reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json` |
| controller sweep planning_c4 | `0.0` |
| controller sweep ik_c1 | `0.0` |
| controller sweep planning_c1_s05 | `0.0` |
| kNN top-1 planning | `5` episodes; `25` steps; mean success `0.0` |
| weighted rollout smoke planning | `5` episodes; `25` steps; mean success `0.0` |
| demo replay through absolute_action_from_delta | mean success `0.8`; mean return `0.8`; successful demo step counts `89`, `112`, `93`, `112` |
| weighted kNN top-1 planning length120 | `2` episodes; mean success `0.0` |
| chunk8 probe IK length120 | `1` episode; success `0.0`; return `0.0`; path recoveries `119`; noop fallbacks `1` |
| retargeted demo task_state smoke | `2` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep1 | `1` episode; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |

## Dual-Push Full-Architecture Hybrid

Sources:

- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`

| Item | Raw values |
| --- | --- |
| elastic checkpoint retargeted-demo probe | `1` episode; mean success `1.0`; mean return `1.0`; steps `94`; retrieved episode index `11`; retrieval similarity `0.9998629689216614` |
| full-architecture hybrid eval | `1` episode; mean success `1.0`; mean return `1.0`; steps `116`; path recoveries `0`; noop fallbacks `0`; first selected mode `residual::maintain_opening`; last selected mode `residual::base_action` |

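The probe row reports a `retrieved episode index` and a `retrieval similarity`, and the Dual-Push Nonzero Branch table lists kNN top-1 planning variants. The sketch below shows top-1 nearest-neighbour retrieval by cosine similarity over per-episode embedding vectors. This is a generic illustration, not the repository's retrieval code. The following are all hypothetical:

- cosine similarity as the metric;
- the embedding vectors and toy data;
- the names `cosine` and `retrieve_top1`.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return dot / norm


def retrieve_top1(query: list[float], bank: list[list[float]]) -> tuple[int, float]:
    """Return (index, similarity) of the bank entry most similar to the query."""
    if not bank:
        raise ValueError("empty retrieval bank")
    scores = [cosine(query, key) for key in bank]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy embeddings (hypothetical): one key vector per demo episode.
bank = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.1]]
index, similarity = retrieve_top1([0.69, 0.71, 0.1], bank)
print(index, round(similarity, 4))
```

Under this reading, a similarity close to `1.0` means the query embedding is nearly parallel to the retrieved episode's key. The sketch makes no claim about how the reported `0.9998629689216614` was computed.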
## Previous Repo Raw Results

Previous raw tables are preserved in:

- `history/VLAarchtests_previous_README.md`