# Change And Test Log

This file records the main code changes and executed test commands copied into this repo. Result statements below are raw command outcomes only.

## Previous Repo Work Included Here

Copied from `history/VLAarchtests_previous_README.md`:

- core model, memory, planner, and dataset changes under:
  - `VLAarchtests/code/reveal_vla_bimanual/models/`
  - `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
  - `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/`
  - `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py`
- training and eval paths under:
  - `VLAarchtests/code/reveal_vla_bimanual/train/`
  - `VLAarchtests/code/reveal_vla_bimanual/eval/`
- earlier test suite under:
  - `VLAarchtests/tests/`

## Current Session File Changes

### Core reveal/proxy path

- `VLAarchtests/code/reveal_vla_bimanual/models/policy.py`
- `VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py`
- `VLAarchtests/code/reveal_vla_bimanual/models/backbones.py`
- `VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py`
- `VLAarchtests/code/reveal_vla_bimanual/train/losses.py`
- `VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_anybimanual_overlap_eval.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/compose_task_routed_proxy_summary.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_proposal_alignment_diagnostics.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_knn_task_sweep.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py`
- `VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py`
- `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/build_task_specialized_episode_specs.py`
- `VLAarchtests/code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
- `VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py`

### Training/eval wrappers and configs

- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_eval.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
- `VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter6.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter7.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter8.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_v7_selector_finetune_iter9_bag.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_100demo_fair_step1_full.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_frozen_seed17.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_rvt_100demo_unfreeze_top2_seed17.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_seed17.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_finetune_weighted_seed17.yaml`
- `VLAarchtests/code/reveal_vla_bimanual/train/configs/rlbench_dual_push_backbone_only_clip_chunk8_weighted_seed17.yaml`
- `environment/reconstruct_anybimanual_overlap_replay.sh`

### Test additions or updates

- `VLAarchtests/tests/test_eval_toggle_paths_work.py`
- `VLAarchtests/tests/test_task_routed_model_eval.py`
- `VLAarchtests/tests/test_anybimanual_resume_logic.py`
- `VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
- `VLAarchtests/tests/test_candidate_ranking_loss.py`
- `VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
- `VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
- `VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
- `VLAarchtests/tests/test_proxy_scripted_bench.py`
- `VLAarchtests/tests/test_rvt_backbone_forward.py`
- `VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py`
- `VLAarchtests/tests/test_rlbench_init_checkpoint.py`
- `VLAarchtests/tests/test_rlbench_pickle_bootstrap.py`
- `VLAarchtests/tests/test_rlbench_task_resolver_aliases.py`
- `VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
- `VLAarchtests/tests/test_dual_push_retarget_utils.py`
- `VLAarchtests/tests/test_dual_push_full_arch_utils.py`

### Third-party baseline path changes

- `third_party/AnyBimanual/third_party/YARR/yarr/runners/offline_train_runner.py`
- `third_party/AnyBimanual/third_party/YARR/yarr/runners/weight_init_utils.py`
- `third_party/AnyBimanual/agents/peract_bc/launch_utils.py`
- `third_party/AnyBimanual/agents/peract_bc/qattention_peract_bc_agent.py`
- `third_party/AnyBimanual/agents/peract_bimanual/qattention_peract_bc_agent.py`

## Current Session Test Commands

Executed commands recorded in the workspace:

- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/action_decoder.py /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py`
- `PYTHONPATH=/workspace/VLAarchtests/code/reveal_vla_bimanual pytest -q /workspace/VLAarchtests/tests/test_proposal_mode_names_label_base_action.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py`
  - result: `11 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_anybimanual_overlap_eval_summary.py`
  - result: `2 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_task_routed_model_eval.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py`
  - result: `4 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_rvt_backbone_forward.py /workspace/VLAarchtests/tests/test_rlbench_dataset_rgbd_geometry.py /workspace/VLAarchtests/tests/test_eval_toggle_paths_work.py /workspace/VLAarchtests/tests/test_rlbench_init_checkpoint.py /workspace/VLAarchtests/tests/test_rlbench_pickle_bootstrap.py /workspace/VLAarchtests/tests/test_rlbench_task_resolver_aliases.py /workspace/VLAarchtests/tests/test_summarize_rvt_overlap_branch.py`
  - result: `passed`
- `pytest -q /workspace/VLAarchtests/tests/test_build_task_specialized_episode_specs.py /workspace/VLAarchtests/tests/test_candidate_ranking_loss.py /workspace/VLAarchtests/tests/test_compose_task_routed_proxy_summary.py`
  - result: `10 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_rlbench_knn_eval_scene_kwargs.py`
  - result: `passed`
- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py`
  - result: `6 passed`
- `pytest -q /workspace/VLAarchtests/tests/test_dual_push_retarget_utils.py /workspace/VLAarchtests/tests/test_dual_push_full_arch_utils.py`
  - result: `9 passed`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_bag_selector_iter9.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_anybimanual_subset3_overlap_train.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_rvt_overlap_branch.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_retargeted_demo_eval.sh`
- `bash -n /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_dual_push_full_arch_hybrid_eval.sh`
- `PYTHONPATH=/workspace/third_party/AnyBimanual/third_party/YARR pytest -q /workspace/VLAarchtests/tests/test_anybimanual_resume_logic.py`
  - result: `4 passed`
- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/models/rvt_backbone.py /workspace/VLAarchtests/code/reveal_vla_bimanual/train/run_rlbench_experiment.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/dataset.py /workspace/VLAarchtests/code/reveal_vla_bimanual/sim_rlbench/task_resolver.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/summarize_rvt_overlap_branch.py`
  - result: `passed`
- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_retargeted_demo_eval.py`
  - result: `passed`
- `python -m py_compile /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_retarget_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/dual_push_full_arch_utils.py /workspace/VLAarchtests/code/reveal_vla_bimanual/eval/run_rlbench_dual_push_full_arch_hybrid_eval.py`
  - result: `passed`
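
The `python -m py_compile` checks above can also be driven from Python with the standard-library `py_compile` module. This is a minimal sketch, not a script from this repo. `compile_check` is a hypothetical helper name, and like the command-line form it only catches compile-time (syntax-level) errors, not import or runtime failures.

```python
import py_compile
import tempfile
from pathlib import Path


def compile_check(paths):
    """Byte-compile each file and return {path: error message or None}.

    Roughly mirrors `python -m py_compile <files...>`: a file passes when it
    compiles without raising py_compile.PyCompileError.
    """
    results = {}
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(str(path), doraise=True)
            results[str(path)] = None
        except py_compile.PyCompileError as exc:
            results[str(path)] = exc.msg
    return results


if __name__ == "__main__":
    # Hypothetical files in a temp dir, used only to demonstrate pass/fail.
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.py"
        bad = Path(tmp) / "bad.py"
        good.write_text("x = 1\n")
        bad.write_text("def broken(:\n")
        for path, err in compile_check([good, bad]).items():
            print("ok" if err is None else "FAIL", Path(path).name)
```

Passing the real module paths from the commands above in place of the temp files gives the same pass/fail signal without spawning a subprocess.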

## Current Session Generated Reports

Current-session report roots staged in this repo:

- `VLAarchtests/artifacts/reports/sprint_v7_summary/`
- `VLAarchtests/artifacts/reports/sprint_v7_followup/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iterations/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter7/`
- `VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/`
- `VLAarchtests/artifacts/reports/task_routed_proxy_v1/`
- `VLAarchtests/artifacts/reports/rlbench_general_debug_20260330/`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/`
- `VLAarchtests/artifacts/reports/bag_mode_specialization_20260330/`
- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_20260330/`
- `VLAarchtests/artifacts/reports/rvt_overlap_branch_fixedbounds_20260330/`
- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/`
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/`

## HF Packaging Notes

Raw packaging changes applied to the staged HF export:

- `baselines/AnyBimanual_overlap_replay/multi/` was reshaped from one flat directory into shard subdirectories:
  - `00000-04999/`
  - `05000-09999/`
  - `10000-14999/`
- file count after reshape: `14034`
- reconstruction helper added at:
  - `environment/reconstruct_anybimanual_overlap_replay.sh`
- exact Hub error that rejected the push before the reshape:
  - `Your push was rejected because it contains too many files per directory. Each directory in your git repo can only contain up to 10000 files. Offending directories: /baselines/AnyBimanual_overlap_replay/multi/`
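
A minimal sketch of how files could be bucketed into the shard directories listed above. It assumes, without confirmation from `environment/reconstruct_anybimanual_overlap_replay.sh`, that files are taken in sorted filename order and placed 5000 per directory. `shard_dir_name`, `plan_shards`, and `shard_counts` are hypothetical helpers, and the filenames in the demo are placeholders; only the count `14034` and the 10000-file limit come from the notes above.

```python
from collections import Counter

HUB_MAX_FILES_PER_DIR = 10_000  # limit quoted in the Hub rejection message
SHARD_SIZE = 5_000  # assumed from the shard names 00000-04999, 05000-09999, ...


def shard_dir_name(index: int, shard_size: int = SHARD_SIZE) -> str:
    """Return the shard directory name for the file at sorted position `index`."""
    start = (index // shard_size) * shard_size
    end = start + shard_size - 1
    return f"{start:05d}-{end:05d}"


def plan_shards(filenames, shard_size: int = SHARD_SIZE) -> dict:
    """Map each filename (taken in sorted order) to '<shard>/<filename>'."""
    return {
        name: f"{shard_dir_name(i, shard_size)}/{name}"
        for i, name in enumerate(sorted(filenames))
    }


def shard_counts(plan: dict) -> Counter:
    """Count how many files land in each shard directory."""
    return Counter(dest.split("/", 1)[0] for dest in plan.values())


if __name__ == "__main__":
    # Placeholder names; 14034 is the recorded post-reshape file count.
    names = [f"item_{i:05d}" for i in range(14034)]
    counts = shard_counts(plan_shards(names))
    print(sorted(counts.items()))
    # Every shard must stay under the Hub per-directory limit.
    assert max(counts.values()) <= HUB_MAX_FILES_PER_DIR
```

Under these assumptions the three shards hold 5000, 5000, and 4034 files, each well under the 10000-file limit that triggered the original rejection.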

## Current Session Logs

Main logs staged in this repo:

- `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train.log`
- `reports/anybimanual_subset3_overlap_smoke200_fixpretrain_nowandb3_train_presavefix.log`
- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `reports/anybimanual_subset3_overlap_resume1000_summary.log`
- `reports/task_routed_proxy_v1_rerun.log`
- `reports/run_bag_selector_iter9_prebuild.log`
- `reports/anybimanual_release_subset3_eval_ep5.log`
- `reports/rvt_overlap_branch_fixedbounds_20260330_chain.sh`
- `reports/dual_push_full_arch_hybrid_iter6_scene_ep5.log`
- `reports/dual_push_full_arch_hybrid_iter6_backbone_ep2_r005.log`

## Official Overlap Eval Final Raw Outputs

Sources:

- `reports/anybimanual_subset3_overlap_resume1000_eval.log`
- `VLAarchtests/artifacts/reports/anybimanual_overlap_baseline_20260330/resume1000_summary/summary.json`

Raw values:

- step `1000`
- local mean success `0.16`
- `coordinated_push_box`: success `0.0`, return `0.0`
- `coordinated_lift_ball`: success `0.0`, return `0.0`
- `dual_push_buttons`: success `0.48`, return `12.0`
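
As an arithmetic check, the recorded local mean success `0.16` matches the unweighted mean of the three per-task success values, (0.0 + 0.0 + 0.48) / 3. The sketch below assumes the summary weights each task equally; that is inferred from the numbers, not read from the code that wrote `summary.json`. `mean_success` is a hypothetical helper.

```python
# Per-task success values copied from the raw outputs above.
per_task_success = {
    "coordinated_push_box": 0.0,
    "coordinated_lift_ball": 0.0,
    "dual_push_buttons": 0.48,
}


def mean_success(per_task: dict) -> float:
    """Unweighted mean over tasks (assumption: each task counts equally)."""
    return sum(per_task.values()) / len(per_task)


print(f"{mean_success(per_task_success):.2f}")  # 0.16
```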

## General-Task Anchor Raw Outputs

Sources:

- `VLAarchtests/artifacts/reports/general_task_anchor_20260330_dual_push_buttons/summary.json`

Raw values:

- public AnyBimanual release, step `60000`: success `0.96`, return `24.0`, length `21.56`
- local official single-task eval, step `60000`, `25` episodes: success `0.96`, return `24.0`, length `21.84`
- local clip backbone-only result: success `0.0`, return `0.0`
- local elastic reveal proxy iter6 result: success `0.0`, return `0.0`
- local RVT frozen fixed-bounds result: success `0.0`, return `0.0`

## Dual-Push Branch Raw Outputs

Sources:

- `VLAarchtests/artifacts/reports/dual_push_nonzero_branch_20260330/summary.md`
- `VLAarchtests/artifacts/reports/dual_push_full_arch_hybrid_20260331/summary.md`

Raw values:

- demo replay through `absolute_action_from_delta`: mean success `0.8`, mean return `0.8`
- retargeted demo with checkpoint backbone retrieval and vision-only button localization, `5` episodes: mean success `1.0`, mean return `1.0`
- elastic checkpoint retargeted-demo probe with scene retrieval and vision-only button localization, `1` episode: mean success `1.0`, mean return `1.0`
- full-architecture hybrid eval with elastic controller checkpoint plus dual-push retrieval checkpoint, `1` episode: mean success `1.0`, mean return `1.0`, steps `116`, path recoveries `0`, noop fallbacks `0`