	# Change And Test Log

	This file records the main code changes and executed test commands copied into this repo. Result statements below are raw command outcomes only.

	## Previous Repo Work Included Here

	Copied from `history/VLAarchtests_previous_README.md`:

	- core model, memory, planner, and dataset changes under:
	  - `VLAarchtests/code/reveal_vla_bimanual/models/`
	  - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
	  - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/`
	  - `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py`
	- training and eval paths under:
	  - `VLAarchtests/code/reveal_vla_bimanual/train/`
	  - `VLAarchtests/code/reveal_vla_bimanual/eval/`
	- earlier test suite under:
	  - `VLAarchtests/tests/`

	## Current Session File Changes

	### Core reveal/proxy path

	- `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
	- `VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
	- `VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
	- `VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py`
	- `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
	- `VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py`
	- `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py`
	- `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py`
	- `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
	- `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py`

	### Training/eval wrappers and configs

	- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
	- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
	- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
	- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh`
	- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
	- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
	- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml`
	- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml`
	- `environment/reconstruct_anybimanual_overlap_replay.sh`

	### Test additions or updates

	- `VLAarchtests/tests/test_eval_toggle_paths_work.py`
	- `VLAarchtests/tests/test_task_routed_model_eval.py`
	- `VLAarchtests/tests/test_anybimanual_resume_logic.py`
	- `VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
	- `VLAarchtests/tests/test_candidate_ranking_loss.py`
	- `VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
	- `VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
	- `VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
	- `VLAarchtests/tests/test_proxy_scripted_bench.py`
	- `VLAarchtests/tests/test_rvt_backbone_forward.py`
	- `VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py`
	- `VLAarchtests/tests/test_rlbench_init_checkpoint.py`
	- `VLAarchtests/tests/test_rlbench_pickle_bootstrap.py`
	- `VLAarchtests/tests/test_rlbench_task_resolver_aliases.py`
	- `VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
	- `VLAarchtests/tests/test_dual_push_retarget_utils.py`
	- `VLAarchtests/tests/test_dual_push_full_arch_utils.py`

	### Third-party baseline path changes

	- `third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py`
	- `third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py`
	- `third_party/AnyBimanual/agents/peract_bc/launch_utils.py`
	- `third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py`
	- `third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py`

	## Current Session Test Commands

	Executed commands recorded in the workspace:

	- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
	- `PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
	  - result: `11 passed`
	- `pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
	  - result: `2 passed`
	- `pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py`
	  - result: `4 passed`
	- `pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
	  - result: `passed`
	- `pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
	  - result: `10 passed`
	- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py`
	  - result: `passed`
	- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py`
	  - result: `6 passed`
	- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py`
	  - result: `9 passed`
	- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
	- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
	- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
	- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
	- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
	- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
	- `PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py`
	  - result: `4 passed`
	- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
	  - result: `passed`
	- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
	  - result: `passed`
	- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
	  - result: `passed`
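
The result lines above are the tail of each command's output. As a minimal sketch of how such counts could be extracted from `pytest -q` output automatically, the helper below parses the final summary line. The function name `parse_pytest_summary` and the sample timing string are hypothetical and are not part of this repo.

```python
import re

# Matches count/outcome pairs such as "11 passed" or "2 failed" in a summary line.
_OUTCOME = re.compile(
    r"(\d+) (passed|failed|errors?|skipped|xfailed|xpassed|deselected|warnings?)"
)


def parse_pytest_summary(output: str) -> dict:
    """Return outcome counts from the last non-empty line of `pytest -q` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        return {}
    return {name: int(count) for count, name in _OUTCOME.findall(lines[-1])}


# Hypothetical captured output; the timing value is illustrative only.
sample = "...........\n11 passed in 0.52s\n"
print(parse_pytest_summary(sample))  # {'passed': 11}
```

Only the last non-empty line is inspected, so progress dots and warnings printed earlier in the output do not affect the counts.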

	## Current Session Generated Reports

	Current-session report roots staged in this repo:

	- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
	- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
	- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
	- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
	- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
	- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
	- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/`
	- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
	- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
	- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
	- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
	- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
	- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`

	## HF Packaging Notes

	Raw packaging changes applied to the staged HF export:

	- `baselines/AnyBimanual_overlap_replay/multi/` was reshaped from one flat directory into shard subdirectories:
	  - `00000-04999/`
	  - `05000-09999/`
	  - `10000-14999/`
	- file count after reshape: `14034`
	- reconstruction helper added at:
	  - `environment/reconstruct_anybimanual_overlap_replay.sh`
	- exact Hub rejection message before the reshape:
	  - `Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/`
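
The reshape described above can be sketched as follows. This is an illustration under stated assumptions, not the script that was actually run: it assumes files are assigned to shards by their position in sorted filename order and that each shard holds 5000 entries (inferred from the shard names). The helper names `shard_name`, `shard_directory`, and `flatten_directory` are hypothetical; the real inverse step is `environment/reconstruct_anybimanual_overlap_replay.sh`, whose contents are not reproduced here.

```python
import shutil
from pathlib import Path

# Assumption: 5000 per shard, inferred from names like `00000-04999/`;
# this keeps each directory under the Hub's 10000-file limit.
SHARD_SIZE = 5000


def shard_name(index: int, shard_size: int = SHARD_SIZE) -> str:
    """Name of the shard directory that holds the file at position `index`."""
    start = (index // shard_size) * shard_size
    return f"{start:05d}-{start + shard_size - 1:05d}"


def shard_directory(root: Path, shard_size: int = SHARD_SIZE) -> None:
    """Move every file directly under `root` into its shard subdirectory."""
    # Snapshot the file list first so newly created shard dirs are not revisited.
    files = sorted(p for p in root.iterdir() if p.is_file())
    for index, path in enumerate(files):
        target = root / shard_name(index, shard_size)
        target.mkdir(exist_ok=True)
        shutil.move(str(path), str(target / path.name))


def flatten_directory(root: Path) -> None:
    """Inverse of `shard_directory`: move files back up and drop empty shards."""
    for shard in sorted(p for p in root.iterdir() if p.is_dir()):
        for path in list(shard.iterdir()):
            shutil.move(str(path), str(root / path.name))
        shard.rmdir()


print(shard_name(0))      # 00000-04999
print(shard_name(14033))  # 10000-14999
```

With `14034` files, positions `0` through `14033` fall into the three shard names listed above, and the last shard is only partly filled.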

	## Current Session Logs

	Main logs staged in this repo:

	- `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log`
	- `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log`
	- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
	- `reports/anybimanual_subset3_overlap_resume1000_summary.log`
	- `reports/task_routed_proxy_v1_rerun.log`
	- `reports/run_bag_selector_iter9_prebuild.log`
	- `reports/anybimanual_release_subset3_eval_ep5.log`
	- `reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh`
	- `reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log`
	- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log`

	## Official Overlap Eval Final Raw Outputs

	Sources:

	- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
	- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`

	Raw values:

	- step `1000`
	- local mean success `0.16`
	- `coordinated_push_box`: success `0.0`, return `0.0`
	- `coordinated_lift_ball`: success `0.0`, return `0.0`
	- `dual_push_buttons`: success `0.48`, return `12.0`
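
As a quick consistency check on the values above, and assuming the local mean is an unweighted average over the three listed tasks (the log does not state the averaging rule), the arithmetic works out as:

```python
from statistics import mean

# Per-task success values copied from the list above.
per_task_success = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}

# Assumption: unweighted mean across tasks.
local_mean_success = mean(per_task_success.values())
print(round(local_mean_success, 2))  # 0.16
```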

	## General-Task Anchor Raw Outputs

	Sources:

	- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`

	Raw values:

	- public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
	- local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
	- local clip backbone-only result: success `0.0`, return `0.0`
	- local elastic reveal proxy iter6 result: success `0.0`, return `0.0`
	- local RVT frozen fixed-bounds result: success `0.0`, return `0.0`

	## Dual-Push Branch Raw Outputs

	Sources:

	- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
	- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`

	Raw values:

	- demo replay through `absolute_action_from_delta`: mean success `0.8`, mean return `0.8`
	- retargeted demo with checkpoint backbone retrieval and vision-only button localization, `5` episodes: mean success `1.0`, mean return `1.0`
	- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, `1` episode: mean success `1.0`, mean return `1.0`
	- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, `1` episode: mean success `1.0`, mean return `1.0`, steps `116`, path recoveries `0`, noop fallbacks `0`