# Results Raw

This file records exact values and exact partial statuses without additional conclusions.

## Proxy Sprint v7 Main Table

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

| Item | Raw values |
| --- | --- |
| base_model | mean success `0.28`; foliage `0.39`; bag `0.31`; cloth `0.14` |
| random | mean success `0.43333333333333335`; foliage `0.41`; bag `0.37`; cloth `0.52` |
| candidate0 | mean success `0.2`; foliage `0.24`; bag `0.22`; cloth `0.14` |
| oracle | mean success `0.4066666666666667`; foliage `0.5`; bag `0.42`; cloth `0.3` |
| scripted | mean success `1.0`; foliage `1.0`; bag `1.0`; cloth `1.0` |
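The mean-success figures in this table are numerically consistent with an unweighted average of the three per-task rates. The sketch below checks that arithmetic. The dictionary layout and the helper names `unweighted_mean` and `check_rows` are illustrative assumptions and do not reflect the schema of `reveal_sprint_summary_compact.json`.

```python
import math

# Values copied from the table above.
# The dict layout is hypothetical; it is NOT the schema of the source JSON.
MAIN_TABLE = {
    "base_model": {"mean": 0.28, "foliage": 0.39, "bag": 0.31, "cloth": 0.14},
    "random": {"mean": 0.43333333333333335, "foliage": 0.41, "bag": 0.37, "cloth": 0.52},
    "candidate0": {"mean": 0.2, "foliage": 0.24, "bag": 0.22, "cloth": 0.14},
    "oracle": {"mean": 0.4066666666666667, "foliage": 0.5, "bag": 0.42, "cloth": 0.3},
    "scripted": {"mean": 1.0, "foliage": 1.0, "bag": 1.0, "cloth": 1.0},
}

TASKS = ("foliage", "bag", "cloth")


def unweighted_mean(row):
    """Average the per-task success rates with equal weight per task."""
    return math.fsum(row[task] for task in TASKS) / len(TASKS)


def check_rows(table, tol=1e-9):
    """Map each row name to True if its reported mean matches the recomputed one."""
    return {
        name: math.isclose(unweighted_mean(row), row["mean"], abs_tol=tol)
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(MAIN_TABLE).items():
        print(name, "consistent" if ok else "MISMATCH")
```

A tolerance is used instead of exact equality because the reported means are floating-point values.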

## Proxy Sprint v7 Ablation Table

Source:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`

| Item | Raw values |
| --- | --- |
| no_planner | `0.2` |
| no_memory | `0.3233333333333333` |
| no_task_conditioning | `0.28` |
| no_geometry | `0.27` |
| no_camera_pose | `0.29333333333333333` |

## Selector Table

Sources:

- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`

| Item | Raw values |
| --- | --- |
| iter6 | mean success `0.4566666666666667`; foliage `0.46`; bag `0.4`; cloth `0.51` |
| iter7 | mean success `0.4666666666666666`; foliage `0.4`; bag `0.41`; cloth `0.59` |
| iter8 bag fixed slice | mean success `0.41`; nominal `0.45`; high_reocclusion `0.4`; camera_perturbation `0.5`; one_sided_slip `0.25` |
| routed controller | mean success `0.48666666666666664`; route `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`; foliage `0.46`; bag `0.41`; cloth `0.59` |

## Proxy Baseline Compare Table

Source:

- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`

| Item | Raw values |
| --- | --- |
| baseline_rgbd_stage3 | mean success `0.31`; foliage `0.21`; bag `0.15`; cloth `0.57` |
| iter5_selector | mean success `0.45`; foliage `0.44`; bag `0.4`; cloth `0.51` |

## RLBench Recovered Push-Box Comparator

Sources:

- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`

| Item | Raw values |
| --- | --- |
| current fair-step1 final | mean success `0.7`; mean return `0.7`; successes `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]` |
| historical push-box control | mean success `0.4`; mean return `0.4`; successes `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]` |
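Both mean-success values in this table can be recomputed directly from the per-episode success lists. A minimal sketch follows, with the lists copied from the table. The constant names and the `mean_success` helper are hypothetical and are not taken from the evaluation code that produced `rollout_eval.json`.

```python
import math

# Per-episode success flags copied from the table above (1.0 = success).
FAIR_STEP1_FINAL = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
HISTORICAL_CONTROL = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def mean_success(successes):
    """Fraction of successful episodes in a list of 0.0/1.0 flags."""
    if not successes:
        raise ValueError("need at least one episode")
    return math.fsum(successes) / len(successes)


if __name__ == "__main__":
    print("fair-step1 final:", mean_success(FAIR_STEP1_FINAL))
    print("historical control:", mean_success(HISTORICAL_CONTROL))
```

The lists hold 7 and 4 successes out of 10 episodes, which matches the reported `0.7` and `0.4`.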

## Official AnyBimanual Overlap Training Milestones

Sources:

- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md`

| Global step | Raw values |
| --- | --- |
| 300 | loss `40.91718`; sample time `0.093029`; step time `14.0686` |
| 400 | loss `33.26684`; sample time `0.073085`; step time `14.3032` |
| 500 | loss `36.07054`; sample time `0.048558`; step time `11.1376` |
| 600 | loss `35.32345`; sample time `0.040642`; step time `9.7719` |
| 700 | loss `28.50959`; sample time `0.057937`; step time `10.9347` |
| 800 | loss `23.60169`; sample time `0.032697`; step time `11.8652` |
| 900 | loss `15.28901`; sample time `0.051232`; step time `11.5073` |
| 1000 checkpoint | train reached `weights/1000` and exited cleanly |

## Official AnyBimanual Overlap Eval Final Output

Sources:

- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`

| Item | Raw values |
| --- | --- |
| local last complete step | `1000` |
| local mean success | `0.16` |
| coordinated_push_box | success `0.0`; return `0.0`; final score log line `0.0` |
| coordinated_lift_ball | success `0.0`; return `0.0`; final score log line `0.0` |
| dual_push_buttons | success `0.48`; return `12.0`; final score log line `12.0` |
| public best overlap step in local summary | step `60000`; mean success `0.6933333333333334` |
| public best overlap per-task success | coordinated_push_box `0.8`; coordinated_lift_ball `0.32`; dual_push_buttons `0.96` |
| delta vs public best mean success | `-0.5333333333333333` |
| delta vs public best per-task success | coordinated_push_box `-0.8`; coordinated_lift_ball `-0.32`; dual_push_buttons `-0.48` |
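The delta rows in this table match a plain subtraction of the public step-`60000` values from the local step-`1000` values. The sketch below reproduces them under that assumption. The constants and the helpers `mean_of`, `per_task_deltas`, and `mean_delta` are hypothetical names, not fields of `summary.json`.

```python
import math

# Per-task success rates copied from the table above.
LOCAL_STEP_1000 = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}
PUBLIC_STEP_60000 = {
    "coordinated_push_box": 0.8,
    "coordinated_lift_ball": 0.32,
    "dual_push_buttons": 0.96,
}


def mean_of(rates):
    """Unweighted mean over the tasks in a {task: success} mapping."""
    return math.fsum(rates.values()) / len(rates)


def per_task_deltas(local, public):
    """Local minus public success for every task in the public mapping."""
    return {task: local[task] - public[task] for task in public}


def mean_delta(local, public):
    """Difference between the local and public unweighted means."""
    return mean_of(local) - mean_of(public)


if __name__ == "__main__":
    print("local mean:", mean_of(LOCAL_STEP_1000))
    print("public mean:", mean_of(PUBLIC_STEP_60000))
    print("mean delta:", mean_delta(LOCAL_STEP_1000, PUBLIC_STEP_60000))
    print("per-task deltas:", per_task_deltas(LOCAL_STEP_1000, PUBLIC_STEP_60000))
```

Printed values may differ from the table in the last decimal places because of floating-point rounding, so compare them with a tolerance.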

## Validated General-Task Anchor: dual_push_buttons

Source:

- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`

| Item | Raw values |
| --- | --- |
| public AnyBimanual release | step `60000`; success `0.96`; return `24.0`; length `21.56` |
| local official single-task eval | step `60000`; episodes `25`; success `0.96`; return `24.0`; length `21.84` |
| local clip backbone-only | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local elastic reveal proxy iter6 | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local RVT hybrid frozen fixed-bounds | success `0.0`; return `0.0`; path `reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |

## RVT Overlap Branch

Sources:

- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`

| Item | Raw values |
| --- | --- |
| frozen RVT stage1 train | checkpoint `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt`; final train total `0.043179353826920445`; final val total `0.039591669984665984`; train seconds `2261.2839448451996` |
| frozen RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| frozen fixed-bounds RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| local overlap floor used for gate | `0.16` |
| stage2 run flag | `false` |

## Dual-Push Nonzero Branch

Source:

- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`

| Item | Raw values |
| --- | --- |
| direct rollout smoke planning | `5` episodes; `25` steps; mean success `0.0`; path `reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json` |
| controller sweep planning_c4 | `0.0` |
| controller sweep ik_c1 | `0.0` |
| controller sweep planning_c1_s05 | `0.0` |
| kNN top-1 planning | `5` episodes; `25` steps; mean success `0.0` |
| weighted rollout smoke planning | `5` episodes; `25` steps; mean success `0.0` |
| demo replay through absolute_action_from_delta | mean success `0.8`; mean return `0.8`; successful demo step counts `89`, `112`, `93`, `112` |
| weighted kNN top-1 planning length120 | `2` episodes; mean success `0.0` |
| chunk8 probe IK length120 | `1` episode; success `0.0`; return `0.0`; path recoveries `119`; noop fallbacks `1` |
| retargeted demo task_state smoke | `2` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep1 | `1` episode; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |

## Dual-Push Full-Architecture Hybrid

Sources:

- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`

| Item | Raw values |
| --- | --- |
| elastic checkpoint retargeted-demo probe | `1` episode; mean success `1.0`; mean return `1.0`; steps `94`; retrieved episode index `11`; retrieval similarity `0.9998629689216614` |
| full-architecture hybrid eval | `1` episode; mean success `1.0`; mean return `1.0`; steps `116`; path recoveries `0`; noop fallbacks `0`; first selected mode `residual::maintain_opening`; last selected mode `residual::base_action` |

## Previous Repo Raw Results

Previous raw tables are preserved in:

- `history/VLAarchtests_previous_README.md`
