| # Change And Test Log |
|
|
| This file records the main code changes copied into this repo and the test commands executed against them. Result statements below are raw command outcomes only, with no interpretation added. |
|
|
| ## Previous Repo Work Included Here |
|
|
| Copied from `history/VLAarchtests_previous_README.md`: |
|
|
| - core model, memory, planner, and dataset changes under: |
|   - `VLAarchtests/code/reveal_vla_bimanual/models/` |
|   - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py` |
|   - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/` |
|   - `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py` |
| - training and eval paths under: |
|   - `VLAarchtests/code/reveal_vla_bimanual/train/` |
|   - `VLAarchtests/code/reveal_vla_bimanual/eval/` |
| - earlier test suite under: |
|   - `VLAarchtests/tests/` |
|
|
| ## Current Session File Changes |
|
|
| ### Core reveal/proxy path |
|
|
| - `VLAarchtests/code/reveal_vla_bimanual/models/policy.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/models/backbones.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py` |
| - `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py` |
|
|
| ### Training/eval wrappers and configs |
|
|
| - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh` |
| - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh` |
| - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh` |
| - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh` |
| - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh` |
| - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh` |
| - `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml` |
| - `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml` |
| - `environment/reconstruct_anybimanual_overlap_replay.sh` |
|
|
| ### Test additions or updates |
|
|
| - `VLAarchtests/tests/test_eval_toggle_paths_work.py` |
| - `VLAarchtests/tests/test_task_routed_model_eval.py` |
| - `VLAarchtests/tests/test_anybimanual_resume_logic.py` |
| - `VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py` |
| - `VLAarchtests/tests/test_candidate_ranking_loss.py` |
| - `VLAarchtests/tests/test_compose_task_routed_proxy_summary.py` |
| - `VLAarchtests/tests/test_build_task_specialized_episode_specs.py` |
| - `VLAarchtests/tests/test_proposal_mode_names_label_base_action.py` |
| - `VLAarchtests/tests/test_proxy_scripted_bench.py` |
| - `VLAarchtests/tests/test_rvt_backbone_forward.py` |
| - `VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py` |
| - `VLAarchtests/tests/test_rlbench_init_checkpoint.py` |
| - `VLAarchtests/tests/test_rlbench_pickle_bootstrap.py` |
| - `VLAarchtests/tests/test_rlbench_task_resolver_aliases.py` |
| - `VLAarchtests/tests/test_summarize_rvt_overlap_branch.py` |
| - `VLAarchtests/tests/test_dual_push_retarget_utils.py` |
| - `VLAarchtests/tests/test_dual_push_full_arch_utils.py` |
|
|
| ### Third-party baseline path changes |
|
|
| - `third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py` |
| - `third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py` |
| - `third_party/AnyBimanual/agents/peract_bc/launch_utils.py` |
| - `third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py` |
| - `third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py` |
|
|
| ## Current Session Test Commands |
|
|
| Executed commands recorded in the workspace: |
|
|
| - `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py` |
| - `PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py` |
|   - result: `11 passed` |
| - `pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py` |
|   - result: `2 passed` |
| - `pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py` |
|   - result: `4 passed` |
| - `pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py` |
|   - result: `passed` |
| - `pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py` |
|   - result: `10 passed` |
| - `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py` |
|   - result: `passed` |
| - `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py` |
|   - result: `6 passed` |
| - `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py` |
|   - result: `9 passed` |
| - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh` |
| - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh` |
| - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh` |
| - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh` |
| - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh` |
| - `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh` |
| - `PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py` |
|   - result: `4 passed` |
| - `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py` |
|   - result: `passed` |
| - `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py` |
|   - result: `passed` |
| - `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py` |
|   - result: `passed` |
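
The `python -m py_compile` checks in this list only confirm that each file byte-compiles. They do not import or run it. A minimal sketch of the same check driven from Python, using the standard-library `py_compile` module, might look like the following. The `compile_check` helper and the throwaway demo files are hypothetical and are not part of this repo.

```python
import py_compile
import tempfile
from pathlib import Path
from typing import Iterable, List


def compile_check(paths: Iterable[Path]) -> List[str]:
    """Return one message per file that fails to byte-compile."""
    failures: List[str] = []
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append(f"{path}: {exc.msg}")
    return failures


# Demo on two throwaway files: one valid, one with a syntax error.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    bad = Path(tmp) / "bad.py"
    good.write_text("x = 1\n")
    bad.write_text("def broken(:\n")
    demo_failures = compile_check([good, bad])

print(len(demo_failures))
```

An empty return list corresponds to the `passed` outcome recorded for the commands above.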
|
|
| ## Current Session Generated Reports |
|
|
| Current-session report roots staged in this repo: |
|
|
| - `VLAarchtests/artifacts/reports/sprint_v7_summary/` |
| - `VLAarchtests/artifacts/reports/sprint_v7_followup/` |
| - `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/` |
| - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/` |
| - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/` |
| - `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/` |
| - `VLAarchtests/artifacts/reports/task_routed_proxy_v1/` |
| - `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/` |
| - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/` |
| - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/` |
| - `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/` |
| - `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/` |
| - `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/` |
| - `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/` |
| - `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/` |
| - `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/` |
|
|
| ## HF Packaging Notes |
|
|
| Raw packaging changes applied to the staged HF export: |
|
|
| - `baselines/AnyBimanual_overlap_replay/multi/` was reshaped from one flat directory into shard subdirectories: |
|   - `00000-04999/` |
|   - `05000-09999/` |
|   - `10000-14999/` |
| - file count after reshape: `14034` |
| - reconstruction helper added at: |
|   - `environment/reconstruct_anybimanual_overlap_replay.sh` |
| - exact Hub rejection error before the reshape: |
|   - `Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/` |
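
The reshape above keeps every directory under the Hub's 10000-file limit by grouping files into fixed-size shards. The sketch below shows one way such a reshape could be done. It is not the script used for this export: it assumes files are assigned to shards by their position in sorted filename order with a shard size of 5000, and `shard_name` and `shard_flat_directory` are hypothetical helper names.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def shard_name(index: int, shard_size: int = 5000, width: int = 5) -> str:
    """Name the shard holding a file position, e.g. 5000 -> '05000-09999'."""
    start = (index // shard_size) * shard_size
    end = start + shard_size - 1
    return f"{start:0{width}d}-{end:0{width}d}"


def shard_flat_directory(root: Path, shard_size: int = 5000) -> Dict[str, int]:
    """Move files in `root` into shard subdirectories; return files per shard.

    Assumption: shard membership follows sorted filename order.
    """
    # Materialize the listing first so moves do not disturb the iteration.
    files = sorted(p for p in root.iterdir() if p.is_file())
    counts: Dict[str, int] = {}
    for index, path in enumerate(files):
        name = shard_name(index, shard_size)
        dest = root / name
        dest.mkdir(exist_ok=True)
        shutil.move(str(path), str(dest / path.name))
        counts[name] = counts.get(name, 0) + 1
    return counts


# Demo on a throwaway directory with a tiny shard size.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    for i in range(5):
        (demo_root / f"file_{i}.bin").write_bytes(b"")
    demo_counts = shard_flat_directory(demo_root, shard_size=2)

print(demo_counts)
```

Under these assumptions, `14034` files would land in three shards of 5000, 5000, and 4034 files, which is consistent with the three shard names listed above.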
|
|
| ## Current Session Logs |
|
|
| Main logs staged in this repo: |
|
|
| - `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log` |
| - `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log` |
| - `reports/anybimanual_subset3_overlap_resume1000_eval.log` |
| - `reports/anybimanual_subset3_overlap_resume1000_summary.log` |
| - `reports/task_routed_proxy_v1_rerun.log` |
| - `reports/run_bag_selector_iter9_prebuild.log` |
| - `reports/anybimanual_release_subset3_eval_ep5.log` |
| - `reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh` |
| - `reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log` |
| - `reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log` |
|
|
| ## Official Overlap Eval Final Raw Outputs |
|
|
| Sources: |
|
|
| - `reports/anybimanual_subset3_overlap_resume1000_eval.log` |
| - `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json` |
|
|
| Raw values: |
|
|
| - step `1000` |
| - local mean success `0.16` |
| - `coordinated_push_box`: success `0.0`, return `0.0` |
| - `coordinated_lift_ball`: success `0.0`, return `0.0` |
| - `dual_push_buttons`: success `0.48`, return `12.0` |
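
The reported local mean success is consistent with an unweighted average of the three per-task success rates. A quick check, assuming equal task weighting (the variable names are illustrative and are not taken from `summary.json`):

```python
from statistics import mean

# Per-task success rates at step 1000, copied from the raw values above.
per_task_success = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}

# Assumption: the local mean weights every task equally.
local_mean_success = mean(per_task_success.values())
print(round(local_mean_success, 2))
```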
|
|
| ## General-Task Anchor Raw Outputs |
|
|
| Sources: |
|
|
| - `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json` |
|
|
| Raw values: |
|
|
| - public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56` |
| - local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84` |
| - local clip backbone-only result: success `0.0`, return `0.0` |
| - local elastic reveal proxy iter6 result: success `0.0`, return `0.0` |
| - local RVT frozen fixed-bounds result: success `0.0`, return `0.0` |
|
|
| ## Dual-Push Branch Raw Outputs |
|
|
| Sources: |
|
|
| - `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md` |
| - `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md` |
|
|
| Raw values: |
|
|
| - demo replay through `absolute_action_from_delta`: mean success `0.8`, mean return `0.8` |
| - retargeted demo with checkpoint backbone retrieval and vision-only button localization, `5` episodes: mean success `1.0`, mean return `1.0` |
| - elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, `1` episode: mean success `1.0`, mean return `1.0` |
| - full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, `1` episode: mean success `1.0`, mean return `1.0`, steps `116`, path recoveries `0`, noop fallbacks `0` |
|
|