# Change And Test Log

This file records the main code changes and executed test commands copied into this repo. Result statements below are raw command outcomes only.

## Previous Repo Work Included Here

Copied from `history/VLAarchtests_previous_README.md`:

- core model, memory, planner, and dataset changes under:
  - `VLAarchtests/code/reveal_vla_bimanual/models/`
  - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
  - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/`
  - `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py`
- training and eval paths under:
  - `VLAarchtests/code/reveal_vla_bimanual/train/`
  - `VLAarchtests/code/reveal_vla_bimanual/eval/`
- earlier test suite under:
  - `VLAarchtests/tests/`

## Current Session File Changes

### Core reveal/proxy path

- `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
- `VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
- `VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
- `VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py`
- `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
- `VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py`
- `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py`
- `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
- `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py`

### Training/eval wrappers and configs

- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml`
- `environment/reconstruct_anybimanual_overlap_replay.sh`

### Test additions or updates

- `VLAarchtests/tests/test_eval_toggle_paths_work.py`
- `VLAarchtests/tests/test_task_routed_model_eval.py`
- `VLAarchtests/tests/test_anybimanual_resume_logic.py`
- `VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
- `VLAarchtests/tests/test_candidate_ranking_loss.py`
- `VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
- `VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
- `VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
- `VLAarchtests/tests/test_proxy_scripted_bench.py`
- `VLAarchtests/tests/test_rvt_backbone_forward.py`
- `VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py`
- `VLAarchtests/tests/test_rlbench_init_checkpoint.py`
- `VLAarchtests/tests/test_rlbench_pickle_bootstrap.py`
- `VLAarchtests/tests/test_rlbench_task_resolver_aliases.py`
- `VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
- `VLAarchtests/tests/test_dual_push_retarget_utils.py`
- `VLAarchtests/tests/test_dual_push_full_arch_utils.py`

### Third-party baseline path changes

- `third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py`
- `third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py`
- `third_party/AnyBimanual/agents/peract_bc/launch_utils.py`
- `third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py`
- `third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py`

## Current Session Test Commands

Executed commands recorded in the workspace:

- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
- `PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
  - result: `11 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
  - result: `2 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py`
  - result: `4 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
  - result: `passed`
- `pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
  - result: `10 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py`
  - result: `passed`
- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py`
  - result: `6 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py`
  - result: `9 passed`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
- `PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py`
  - result: `4 passed`
- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
  - result: `passed`
- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
  - result: `passed`
- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
  - result: `passed`
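
The `python -m py_compile` checks above can also be driven from Python when many files need the same syntax check. The following is a minimal sketch, not code from this repo: `syntax_check` is a hypothetical helper name, and the two temporary files stand in for real module paths.

```python
import py_compile
import tempfile
from pathlib import Path


def syntax_check(paths):
    """Byte-compile each file; return (path, message) pairs for failures.

    Hypothetical helper for illustration, not part of the repo.
    """
    failures = []
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append((str(path), exc.msg))
    return failures


# Demonstration on throwaway files rather than real repo paths.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    bad = Path(tmp) / "bad.py"
    good.write_text("x = 1\n")
    bad.write_text("def broken(:\n")
    failures = syntax_check([good, bad])

print(failures)
```

An empty list corresponds to the `passed` outcome recorded above; any entry names the file that failed to compile.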

## Current Session Generated Reports

Current-session report roots staged in this repo:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/`
- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`

## HF Packaging Notes

Raw packaging changes applied to the staged HF export:

- `baselines/AnyBimanual_overlap_replay/multi/` was reshaped from one flat directory into shard subdirectories:
  - `00000-04999/`
  - `05000-09999/`
  - `10000-14999/`
- file count after reshape: `14034`
- reconstruction helper added at:
  - `environment/reconstruct_anybimanual_overlap_replay.sh`
- exact Hub rejection message received before the reshape:
  - `Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/`
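
The reshape described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the contents of `environment/reconstruct_anybimanual_overlap_replay.sh`: the shard size of 5000 is inferred from the shard names, files are assumed to be assigned to shards by sorted position, and the function names `shard_name`, `shard_directory`, and `flatten_directory` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

SHARD_SIZE = 5000  # assumption: inferred from names such as 00000-04999


def shard_name(index, shard_size=SHARD_SIZE):
    """Return the 'start-end' directory name for a zero-based file index."""
    start = (index // shard_size) * shard_size
    return f"{start:05d}-{start + shard_size - 1:05d}"


def shard_directory(flat_dir, shard_size=SHARD_SIZE):
    """Move each file in flat_dir into a shard subdirectory by sorted position."""
    flat_dir = Path(flat_dir)
    # Materialize the listing first so new shard directories are not revisited.
    files = sorted(p for p in flat_dir.iterdir() if p.is_file())
    for index, path in enumerate(files):
        target = flat_dir / shard_name(index, shard_size)
        target.mkdir(exist_ok=True)
        shutil.move(str(path), str(target / path.name))
    return len(files)


def flatten_directory(flat_dir):
    """Reverse shard_directory: move files back up and drop the empty shards."""
    flat_dir = Path(flat_dir)
    for shard in sorted(p for p in flat_dir.iterdir() if p.is_dir()):
        for path in list(shard.iterdir()):
            shutil.move(str(path), str(flat_dir / path.name))
        shard.rmdir()


# Small demonstration with shard_size=3 on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(7):
        (Path(tmp) / f"{i}.replay").write_text("")
    moved = shard_directory(tmp, shard_size=3)
    shards = sorted(p.name for p in Path(tmp).iterdir() if p.is_dir())
    flatten_directory(tmp)
    restored = sorted(p.name for p in Path(tmp).iterdir())
```

With a shard size of 5000, three shards hold up to 15000 files, which covers the recorded count of `14034` while keeping every directory under the Hub limit of 10000 files.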

## Current Session Logs

Main logs (plus one chain script) staged in this repo:

- `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log`
- `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log`
- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `reports/anybimanual_subset3_overlap_resume1000_summary.log`
- `reports/task_routed_proxy_v1_rerun.log`
- `reports/run_bag_selector_iter9_prebuild.log`
- `reports/anybimanual_release_subset3_eval_ep5.log`
- `reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh`
- `reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log`

## Official Overlap Eval Final Raw Outputs

Sources:

- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`

Raw values:

- step `1000`
- local mean success `0.16` (mean of the three per-task success values below)
- `coordinated_push_box`: success `0.0`, return `0.0`
- `coordinated_lift_ball`: success `0.0`, return `0.0`
- `dual_push_buttons`: success `0.48`, return `12.0`
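
As a quick consistency check on the values above, the reported mean success matches the unweighted mean of the three per-task success values. The snippet is a small sketch that only re-derives the number from the figures listed here; it does not read `summary.json`, and the unweighted mean is an assumption about how the summary was computed.

```python
import math
from statistics import mean

# Per-task success values copied from the raw values listed above.
per_task_success = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}

# Assumption: the reported mean is an unweighted average over tasks.
mean_success = mean(per_task_success.values())
matches_reported = math.isclose(mean_success, 0.16)
print(round(mean_success, 2), matches_reported)
```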

## General-Task Anchor Raw Outputs

Sources:

- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`

Raw values:

- public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
- local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
- local clip backbone-only result: success `0.0`, return `0.0`
- local elastic reveal proxy iter6 result: success `0.0`, return `0.0`
- local RVT frozen fixed-bounds result: success `0.0`, return `0.0`

## Dual-Push Branch Raw Outputs

Sources:

- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`

Raw values:

- demo replay through `absolute_action_from_delta`: mean success `0.8`, mean return `0.8`
- retargeted demo with checkpoint backbone retrieval and vision-only button localization, `5` episodes: mean success `1.0`, mean return `1.0`
- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, `1` episode: mean success `1.0`, mean return `1.0`
- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, `1` episode: mean success `1.0`, mean return `1.0`, steps `116`, path recoveries `0`, noop fallbacks `0`