# Results Raw
This file records exact values and exact partial statuses without additional conclusions.
## Proxy Sprint v7 Main Table
Source:
- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
| Item | Raw values |
| --- | --- |
| base_model | mean success `0.28`; foliage `0.39`; bag `0.31`; cloth `0.14` |
| random | mean success `0.43333333333333335`; foliage `0.41`; bag `0.37`; cloth `0.52` |
| candidate0 | mean success `0.2`; foliage `0.24`; bag `0.22`; cloth `0.14` |
| oracle | mean success `0.4066666666666667`; foliage `0.5`; bag `0.42`; cloth `0.3` |
| scripted | mean success `1.0`; foliage `1.0`; bag `1.0`; cloth `1.0` |
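
The mean success values in this table are consistent with an unweighted average of the three per-task values (foliage, bag, cloth). The following is a minimal sketch for rechecking that arithmetic. The unweighted-average rule is an assumption inferred from the numbers, not something read from the benchmark code, and the helper name `unweighted_mean` is hypothetical.

```python
import math
from statistics import fmean

# Per-task success values copied from the main table above.
per_task = {
    "base_model": {"foliage": 0.39, "bag": 0.31, "cloth": 0.14},
    "random": {"foliage": 0.41, "bag": 0.37, "cloth": 0.52},
    "candidate0": {"foliage": 0.24, "bag": 0.22, "cloth": 0.14},
    "oracle": {"foliage": 0.5, "bag": 0.42, "cloth": 0.3},
    "scripted": {"foliage": 1.0, "bag": 1.0, "cloth": 1.0},
}

# Mean success values as reported in the same table.
reported_mean = {
    "base_model": 0.28,
    "random": 0.43333333333333335,
    "candidate0": 0.2,
    "oracle": 0.4066666666666667,
    "scripted": 1.0,
}


def unweighted_mean(task_scores):
    """Average the per-task scores with equal weight (assumed rule)."""
    return fmean(task_scores.values())


# Compare each recomputed mean with the reported one, within float tolerance.
checks = {
    name: math.isclose(unweighted_mean(scores), reported_mean[name], abs_tol=1e-9)
    for name, scores in per_task.items()
}

for name, ok in checks.items():
    print(f"{name}: recomputed={unweighted_mean(per_task[name]):.6f} matches_reported={ok}")
```

The comparison uses a tolerance rather than exact equality because the reported values are floating-point numbers serialized at full precision.
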
## Proxy Sprint v7 Ablation Table
Source:
- `VLAarchtests/artifacts/reports/sprint_v7_summary/reveal_sprint_summary_compact.json`
| Item | Raw values |
| --- | --- |
| no_planner | `0.2` |
| no_memory | `0.3233333333333333` |
| no_task_conditioning | `0.28` |
| no_geometry | `0.27` |
| no_camera_pose | `0.29333333333333333` |
## Selector Table
Sources:
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/full_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/summary.md`
| Item | Raw values |
| --- | --- |
| iter6 | mean success `0.4566666666666667`; foliage `0.46`; bag `0.4`; cloth `0.51` |
| iter7 | mean success `0.4666666666666666`; foliage `0.4`; bag `0.41`; cloth `0.59` |
| iter8 bag fixed slice | mean success `0.41`; nominal `0.45`; high_reocclusion `0.4`; camera_perturbation `0.5`; one_sided_slip `0.25` |
| routed controller | mean success `0.48666666666666664`; route `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8`; foliage `0.46`; bag `0.41`; cloth `0.59` |
## Proxy Baseline Compare Table
Source:
- `VLAarchtests/artifacts/reports/real_baseline_compare_v7_full/reveal_benchmark.json`
| Item | Raw values |
| --- | --- |
| baseline_rgbd_stage3 | mean success `0.31`; foliage `0.21`; bag `0.15`; cloth `0.57` |
| iter5_selector | mean success `0.45`; foliage `0.44`; bag `0.4`; cloth `0.51` |
## RLBench Recovered Push-Box Comparator
Sources:
- `reports/rlbench_general_debug/rlbench_push_box_fair_step1_final_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
- `reports/rlbench_general_debug/rlbench_push_box_historical_step1_knn_ep10_x99_res224_len180_train80_fixed/bimanual_push_box/rollout_eval.json`
| Item | Raw values |
| --- | --- |
| current fair-step1 final | mean success `0.7`; mean return `0.7`; successes `[1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]` |
| historical push-box control | mean success `0.4`; mean return `0.4`; successes `[0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]` |
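
The mean success values above match the fraction of nonzero entries in each per-episode `successes` list. A minimal sketch of that check follows. It assumes the mean is a plain average over the ten listed episodes, and the function name `success_rate` is hypothetical.

```python
import math

# Per-episode success flags copied from the table above.
fair_step1 = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
historical = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]


def success_rate(successes):
    """Plain average of per-episode success flags (assumed rule)."""
    if not successes:
        raise ValueError("successes must not be empty")
    return sum(successes) / len(successes)


fair_rate = success_rate(fair_step1)
historical_rate = success_rate(historical)

print(f"current fair-step1 final: {int(sum(fair_step1))}/{len(fair_step1)} episodes, rate={fair_rate:.2f}")
print(f"historical push-box control: {int(sum(historical))}/{len(historical)} episodes, rate={historical_rate:.2f}")
```
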
## Official AnyBimanual Overlap Training Milestones
Sources:
- `baselines/AnyBimanual_overlap_runs/peract_bc_subset3_overlap_smoke200_fixpretrain_nowandb3/PERACT_BC/seed0/training.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/status.md`
| Global step | Raw values |
| --- | --- |
| 300 | loss `40.91718`; sample time `0.093029`; step time `14.0686` |
| 400 | loss `33.26684`; sample time `0.073085`; step time `14.3032` |
| 500 | loss `36.07054`; sample time `0.048558`; step time `11.1376` |
| 600 | loss `35.32345`; sample time `0.040642`; step time `9.7719` |
| 700 | loss `28.50959`; sample time `0.057937`; step time `10.9347` |
| 800 | loss `23.60169`; sample time `0.032697`; step time `11.8652` |
| 900 | loss `15.28901`; sample time `0.051232`; step time `11.5073` |
| 1000 checkpoint | train reached `weights/1000` and exited cleanly |
## Official AnyBimanual Overlap Eval Final Output
Sources:
- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`
| Item | Raw values |
| --- | --- |
| local last complete step | `1000` |
| local mean success | `0.16` |
| coordinated_push_box | success `0.0`; return `0.0`; final score log line `0.0` |
| coordinated_lift_ball | success `0.0`; return `0.0`; final score log line `0.0` |
| dual_push_buttons | success `0.48`; return `12.0`; final score log line `12.0` |
| public best overlap step in local summary | step `60000`; mean success `0.6933333333333334` |
| public best overlap per-task success | coordinated_push_box `0.8`; coordinated_lift_ball `0.32`; dual_push_buttons `0.96` |
| delta vs public best mean success | `-0.5333333333333333` |
| delta vs public best per-task success | coordinated_push_box `-0.8`; coordinated_lift_ball `-0.32`; dual_push_buttons `-0.48` |
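
The delta rows above are consistent with subtracting the public best values from the local values (local minus public). A minimal sketch of that arithmetic follows. The sign convention and the unweighted per-task mean are assumptions inferred from the listed numbers, and all variable names are illustrative.

```python
import math
from statistics import fmean

# Local step-1000 per-task success, copied from the table above.
local = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}

# Public best (step 60000) per-task success, copied from the table above.
public_best = {
    "coordinated_push_box": 0.8,
    "coordinated_lift_ball": 0.32,
    "dual_push_buttons": 0.96,
}

# Assumed convention: delta = local - public best.
per_task_delta = {task: local[task] - public_best[task] for task in local}

local_mean = fmean(local.values())
public_mean = fmean(public_best.values())
mean_delta = local_mean - public_mean

for task, delta in per_task_delta.items():
    print(f"{task}: delta={delta:+.2f}")
print(f"local mean={local_mean:.4f} public mean={public_mean:.4f} delta={mean_delta:+.4f}")
```
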
## Validated General-Task Anchor: dual_push_buttons
Source:
- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`
| Item | Raw values |
| --- | --- |
| public AnyBimanual release | step `60000`; success `0.96`; return `24.0`; length `21.56` |
| local official single-task eval | step `60000`; episodes `25`; success `0.96`; return `24.0`; length `21.84` |
| local clip backbone-only | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_backbone_only_clip_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local elastic reveal proxy iter6 | success `0.0`; return `0.0`; path `reports/true_baseline_compare_subset3_v1/rlbench_subset3_elastic_reveal_proxy_iter6_100demo_fair_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
| local RVT hybrid frozen fixed-bounds | success `0.0`; return `0.0`; path `reports/rvt_overlap_branch_fixedbounds_20260330/evals/rlbench_subset3_backbone_only_rvt_100demo_frozen_fixedbounds_seed17_noplan_split/bimanual_dual_push_buttons/rollout_eval.json` |
## RVT Overlap Branch
Sources:
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/status.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/summary.md`
| Item | Raw values |
| --- | --- |
| frozen RVT stage1 train | checkpoint `outputs/rlbench_rvt_branch/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17/checkpoint_best.pt`; final train total `0.043179353826920445`; final val total `0.039591669984665984`; train seconds `2261.2839448451996` |
| frozen RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| frozen fixed-bounds RVT overlap eval | mean success `0.0`; push_box `0.0`; lift_ball `0.0`; dual_push_buttons `0.0` |
| local overlap floor used for gate | `0.16` |
| stage2 run flag | `false` |
## Dual-Push Nonzero Branch
Source:
- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
| Item | Raw values |
| --- | --- |
| direct rollout smoke planning | `5` episodes; `25` steps; mean success `0.0`; path `reports/dual_push_nonzero_branch_20260330/smoke_planning/rollout_eval.json` |
| controller sweep planning_c4 | `0.0` |
| controller sweep ik_c1 | `0.0` |
| controller sweep planning_c1_s05 | `0.0` |
| kNN top-1 planning | `5` episodes; `25` steps; mean success `0.0` |
| weighted rollout smoke planning | `5` episodes; `25` steps; mean success `0.0` |
| demo replay through absolute_action_from_delta | mean success `0.8`; mean return `0.8`; successful demo step counts `89`, `112`, `93`, `112` |
| weighted kNN top-1 planning length120 | `2` episodes; mean success `0.0` |
| chunk8 probe IK length120 | `1` episode; success `0.0`; return `0.0`; path recoveries `119`; noop fallbacks `1` |
| retargeted demo task_state smoke | `2` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep1 | `1` episode; mean success `1.0`; mean return `1.0` |
| retargeted demo checkpoint-backbone vision ep5 | `5` episodes; mean success `1.0`; mean return `1.0` |
## Dual-Push Full-Architecture Hybrid
Sources:
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`
- `reports/dual_push_full_arch_probe_iter6_scene_ep1/summary.json`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep1/summary.json`
| Item | Raw values |
| --- | --- |
| elastic checkpoint retargeted-demo probe | `1` episode; mean success `1.0`; mean return `1.0`; steps `94`; retrieved episode index `11`; retrieval similarity `0.9998629689216614` |
| full-architecture hybrid eval | `1` episode; mean success `1.0`; mean return `1.0`; steps `116`; path recoveries `0`; noop fallbacks `0`; first selected mode `residual::maintain_opening`; last selected mode `residual::base_action` |
## Previous Repo Raw Results
Previous raw tables are preserved in:
- `history/VLAarchtests_previous_README.md`