wsagi/ACT-PickOrange

@@ -32,19 +32,30 @@ _An [ACT (Action Chunking Transformer)](https://tonyzhaozh.github.io/aloha/) pol
 - **任务 / Task**：`Pick up the orange and place it on the plate` — SO-101 单臂依次夹起 3 颗橙子并放盘子。
   _Single-arm SO-101 picks 3 oranges sequentially and places each on a plate._
 - **数据集 / Dataset**：[`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) — 60 episode 遥操示范。
   _60 teleoperated demonstration episodes._
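-- **架构 / Architecture**：ACT chunk_size=100，~80M 参数，纯 vision + joint state → action chunk regression（无 LLM / 无 diffusion）。
-  _ACT with chunk_size=100, ~80M parameters; pure vision + joint state → action-chunk regression (no LLM, no diffusion)._
-- **训练 / Training**：batch=8 / lr=1e-5 / 10k step / **关闭图像增强**，~5h on RTX 4090。
-  _batch=8, lr=1e-5, 10k steps, **image augmentation disabled**, ~5 h on an RTX 4090._
-- **评测 / Eval**：Isaac Sim 5.1 + LeIsaac，**1/1 success @ 120s sim time**（3 颗全部放盘成功）。
-  _Isaac Sim 5.1 + LeIsaac, **1/1 success @ 120s sim time** (all 3 oranges placed on the plate)._
-- **⚠️ 关键 inference 配置 / Critical inference setting**：`policy_action_horizon=32`。
-  默认值 16 会让模型卡在第二颗橙子（爪子抖），8 会卡在第一颗。详见下方 [Inference caveat](#-推理关键配置--critical-inference-caveat)。
-  _The default of 16 stalls the model on the 2nd orange (gripper jitter), and 8 stalls it on the 1st. See the [Inference caveat](#-推理关键配置--critical-inference-caveat) below._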
 ## 模型亮点
 _Highlights_
-- **复刻 + 验证 [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) 的配方**，得到等价或更好的成功率。
-  _Reproduces and validates the [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) recipe with comparable or better success rate._
-- **暴露了 LeIsaac 默认 `policy_action_horizon=16` 的隐性陷阱**：chunk_size=100 的 ACT 需要 horizon ≥ 32 才能让宏观运动段完整执行，详见 README 的诊断章节。
-  _Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`: ACT models with chunk_size=100 require horizon ≥ 32 to let the macro-motion segment of each chunk execute._
 - 无 image augmentation、无 weight decay 调参、无 special trick：干净的 ACT baseline。
   _No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline._
 ## 训练配方
@@ -54,69 +65,97 @@ _Training recipe_
 |---|---|
 | Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
 | Policy | `act` (LeRobot 实现 / LeRobot impl.) |
 | Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
 | `chunk_size` | 100 |
 | `n_action_steps` | 100 |
 | Batch size | 8 |
 | Optimizer | AdamW |
 | Learning rate | 1e-5 (constant) |
-| Steps | 10,000 |
 | Image augmentation | **disabled** |
 | Hardware | RTX 4090 (24 GB) |
-| Wall-clock | ~5 hours |
-| Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) |
 训练入口脚本在我们的 LeIsaac fork：[`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)。
 _Training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)._
-## 评测结果
-_Eval results_
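-| 配置 / Config | 第 1 颗 / Orange 1 | 第 2 颗 / Orange 2 | 第 3 颗 / Orange 3 | Episode 成功率 / Episode success |
-|---|---|---|---|---|
-| horizon=8  | 🔴 卡死（夹住不动）/ stuck (gripper closed, no motion) | — | — | 0/1 |
-| horizon=16 | ✅ 成功 / success | 🟡 爪子抖 / gripper jitter | — | 0/1 |
-| **horizon=32** | ✅ 成功 / success | ✅ 折腾后成功 / success after struggling | ✅ 折腾后成功 / success after struggling | **1/1** ✅ |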
 测试环境 / Test setup：Isaac Sim 5.1，task `LeIsaac-SO101-PickOrange-v0`，`episode_length_s=120`，`step_hz=30`，dual-cam 观测。
 _Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations._
-**单 sample 警告 / Single-sample caveat**：以上 1/1 是单一 episode 结果，未跑统计意义上的多轮平均。但 horizon=8 / 16 / 32 三个失败模式的 monotonic 趋势 (失败 → 部分失败 → 成功) 足以做 falsification — 不是模型问题，是配置问题。
-_The 1/1 success rate is from a single episode, not statistically averaged. However, the monotonic failure-mode pattern across horizon=8/16/32 (stuck → jitter → success) is sufficient as a falsification: this is a configuration issue, not a model capability issue._
 ## ⚠️ 推理关键配置 / Critical inference caveat
-**ACT chunk_size=100 + 默认 horizon=16 = 第二颗橙子永远过不去。** 这不是 ACT 的弱点，是 LeIsaac 默认配置的隐性陷阱。
-_**ACT chunk_size=100 + the default horizon=16 will deadlock on the 2nd orange.** This is not an ACT weakness; it's a hidden trap in LeIsaac's default config._
 ### 根因 / Root cause
-ACT 每个 chunk 输出 100 步动作，是一段**完整规划**：前 ~10 步是"启动 / 加速"，中段 (step 20-80) 才是真正的**宏观运动**（接近 → 夹起 → 提起 → 运送 → 释放）。LeRobot async client 用滑动窗口 (receding horizon)，每 `policy_action_horizon` 步重新查询一次。
-_Each ACT chunk outputs a 100-step planned trajectory: the first ~10 steps are "startup", and steps 20-80 are the macro-motion (approach → grasp → lift → transport → release). The LeRobot async client uses a sliding window, re-querying every `policy_action_horizon` steps._
-- horizon=8 → 每次只执行前 8 步就丢掉重 query → 永远在执行"启动段"，**根本到不了宏观运动** → 卡死。
-  _horizon=8 → only the first 8 startup steps are ever executed → the macro-motion never fires → deadlock._
-- horizon=16 → 够第 1 颗的简单"靠近→夹起"，但第 2 颗的"放→后退→接近第 2 颗"复杂段需要更长执行窗 → 模型 OOD + 短 horizon 双重打击 → 抖。
-  _horizon=16 → enough for the simple "approach → grasp" of orange #1, but the post-1st-orange transition demands a longer execution window → OOD state + short horizon compound → jitter._
-- horizon=32 → 给 macro-motion 完整执行机会，1/1 通过。
-  _horizon=32 → the macro-motion gets a chance to execute in full; 1/1 passes._
 ### 推荐配置 / Recommended settings
 ```bash
 --policy_type=lerobot-act
---policy_action_horizon=32
---policy_checkpoint_path=<path-to-this-model>
---step_hz=30                  # 对齐 dataset 30Hz / matches dataset 30Hz
 --episode_length_s=120
 ```
 ## 使用方法
 _Usage_
-### 1. 启动 LeRobot async policy_server / Start the LeRobot async policy_server
 ```bash
-pip install lerobot
 python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
 ```
@@ -128,26 +167,43 @@ python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
 cd LeIsaac
 bash scripts/evaluation/run_eval.sh -- \
     --task=LeIsaac-SO101-PickOrange-v0 \
-    --eval_rounds=3 \
     --episode_length_s=120 \
     --step_hz=30 \
     --policy_type=lerobot-act \
     --policy_host=127.0.0.1 --policy_port=8080 \
     --policy_checkpoint_path=wsagi/ACT-PickOrange \
-    --policy_action_horizon=32 \
     --policy_language_instruction="Pick up the orange and place it on the plate" \
     --device=cuda --enable_cameras
 ```
-`run_eval.sh` 自动按 user-patience cap 计算 wall-clock timeout，避免无意义等待慢推理。
-_`run_eval.sh` auto-computes a user-patience wall-clock timeout so slow inference fails fast._
 ## 局限性
 _Limitations_
-- **数据集 OOD on 2nd-3rd orange**：dataset 60 episode × 每集 1 次"放第 N 颗"演示。第 2/3 颗的 state coverage 比第 1 颗稀疏一个数量级，model 在那里 monotonic 变难、动作变"折腾"。即便 horizon=32 救了形式上的成功率，**精度仍随颗数线性退化**。这是数据问题不是模型问题。
-  _**Dataset OOD on 2nd–3rd orange**: with 60 episodes × 1 "place N-th orange" demo each, state coverage drops by ~1 order of magnitude per orange. Even at horizon=32 the policy gets visibly more jittery on later oranges. This is a data issue, not a model issue._
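-- 三个独立架构 (我们的 ACT / Diffusion Policy / SmolVLA，外加公开 shadowHokage ACT) 在同一 dataset 上 **共同 OOD on 3rd orange**：全 family 共病。
-  _Three independent architectures (our ACT, Diffusion Policy, SmolVLA, plus the public shadowHokage ACT) all go **OOD on the 3rd orange** on the same dataset; the failure is shared across the whole family._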
 - 无图像增强、无 domain randomization → real-world transfer 可能弱。本 ckpt 仅用于 Isaac Sim 仿真验证，不保证真机 deploy。
   _No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed._
@@ -156,16 +212,16 @@ _Related_
 - 同任务对照 / Same-task comparisons：
   - [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange) — 自训 Diffusion Policy (267M, DDIM 32-step swap)
-  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) — 同配方公开 ckpt（我们的复刻参考）
-  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5 SOTA（30s 完成 3 颗）
-- 完整训练 + eval 配方：[vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
 ## 致谢
 _Acknowledgments_
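 - LeIsaac 团队 + LightwheelAI 提供任务环境和数据集
   _The LeIsaac team and LightwheelAI, for the task environment and dataset._
 - LeRobot 团队提供 ACT 实现 + async inference 框架
   _The LeRobot team, for the ACT implementation and the async inference framework._
-- shadowHokage 公开训练配方作为复刻基线
-  _shadowHokage, for publishing the training recipe used as the reproduction baseline._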
 ## 引用
 _Citation_

 - **任务 / Task**：`Pick up the orange and place it on the plate` — SO-101 单臂依次夹起 3 颗橙子并放盘子。
   _Single-arm SO-101 picks 3 oranges sequentially and places each on a plate._
 - **数据集 / Dataset**：[`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) — 60 episode 遥操示范。
   _60 teleoperated demonstration episodes._
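+- **架构 / Architecture**：ACT chunk_size=100，~52M 参数，纯 vision + joint state → action chunk regression（无 LLM / 无 diffusion）。
+  _ACT with chunk_size=100, ~52M parameters; pure vision + joint state → action-chunk regression (no LLM, no diffusion)._
+- **训练 / Training**：lerobot **v0.4.0**，batch=8 / lr=1e-5 / 20k step / 关闭图像增强，~10h on RTX 4090。**本 ckpt = step 18000** (sweet spot)。
+  _Trained with lerobot **v0.4.0**: batch=8, lr=1e-5, 20k steps, image augmentation disabled, ~10 h on an RTX 4090. **This ckpt is step 18000** (the sweet spot)._
+- **评测 / Eval**：Isaac Sim 5.1 + LeIsaac，**5-round × 5-run pooled** = 33/75 oranges = **44.0% per-orange success** (95% CI [29.5%, 58.5%])。
+  _Isaac Sim 5.1 + LeIsaac, pooled over 5 runs × 5 rounds: 33/75 oranges = **44.0% per-orange success** (95% CI [29.5%, 58.5%])._
+- **⚠️ 关键 inference 配置 / Critical inference setting**：`policy_action_horizon=70`（旧 v0.5.2 ckpt 的 horizon=32 不适用本 v0.4.0 ckpt，详见 [Inference caveat](#-推理关键配置--critical-inference-caveat)）。
+  _Use `policy_action_horizon=70`. The horizon=32 recommended for the old v0.5.2 ckpt does not apply to this v0.4.0 ckpt; see the [Inference caveat](#-推理关键配置--critical-inference-caveat)._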
+## 🌳 分支说明 / Branch layout
+本 repo 有两个 ckpt，分别记录 framework drift 故事的两端：
+_Two checkpoints are tracked in this repo, capturing both ends of the framework drift story:_
+| Branch | lerobot version | Training step | Best horizon | 🍊 per-orange p (5-run pool) | 备注 / Notes |
+|---|---|---|---|---|---|
+| **main** (本 ckpt) | **v0.4.0** | **18000** | **70** | **0.440** (33/75) | 当前推荐 / current canonical |
+| `lerobot-v052-ckpt-10k` | v0.5.2 | 10000 | 32 (旧推荐 / old) | 0.267 (4/15 single 5-round) | 历史对照 / archived for framework-drift study |
+详见下方 [Framework drift section](#framework-drift--lerobot-v04-vs-v05)。
+_See [Framework drift section](#framework-drift--lerobot-v04-vs-v05) below._
 ## 模型亮点
 _Highlights_
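+- **5-round × 5-run pooled 严格统计** confirmed: 44.0% per-orange (95% CI [29.5%, 58.5%])，显著优于 shadowHokage 公开 ckpt 18.3% (95% CI [10.6%, 26.0%])。Welch t-test (per-ep, 消除 episode-cluster) **p=0.034**，two-proportion Z test **p=0.008**。
+  _Pooled statistics over 5 runs × 5 rounds confirm 44.0% per-orange success (95% CI [29.5%, 58.5%]), significantly above the public shadowHokage ckpt at 18.3% (95% CI [10.6%, 26.0%]). Welch t-test (per-episode, which removes episode clustering): **p=0.034**; two-proportion Z test: **p=0.008**._
+- **暴露了 lerobot v0.4 → v0.5 framework drift**：同 dataset / 同 seed / 同 config，仅切换 lerobot 版本，v0.5.2 训出的 ckpt 跌到 18-27% per-orange（同 shadowHokage 真实水平），锁回 v0.4.0 才恢复 44%。详见底部 framework drift section。
+  _**Exposes lerobot v0.4 → v0.5 framework drift**: with the same dataset, seed, and config, switching only the lerobot version drops a v0.5.2-trained ckpt to 18-27% per-orange (the level shadowHokage actually reaches); pinning back to v0.4.0 restores 44%. See the framework drift section at the bottom._
+- **暴露了 LeIsaac 默认 `policy_action_horizon=16` 的隐性陷阱**：chunk_size=100 的 ACT 需要 per-ckpt sweep 找最优 h（本 ckpt h=70；不同训练曲线产出的 ckpt 最优 h 不同）。
+  _**Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`**: an ACT model with chunk_size=100 needs a per-ckpt sweep to find its best horizon (h=70 for this ckpt; ckpts from different training runs have different optima)._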
 - 无 image augmentation、无 weight decay 调参、无 special trick：干净的 ACT baseline。
   _No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline._
 ## 训练配方
 |---|---|
 | Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
 | Policy | `act` (LeRobot 实现 / LeRobot impl.) |
+| **lerobot version** | **v0.4.0** (锁版本以避免 framework drift / pinned to avoid framework drift) |
 | Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
 | `chunk_size` | 100 |
 | `n_action_steps` | 100 |
 | Batch size | 8 |
 | Optimizer | AdamW |
 | Learning rate | 1e-5 (constant) |
+| Steps | 20,000 (本 ckpt = step **18000**, 经 sweep 是 sweet spot / this ckpt is step **18000**, the sweet spot found by a sweep) |
 | Image augmentation | **disabled** |
 | Hardware | RTX 4090 (24 GB) |
+| Wall-clock | ~10 hours |
+| Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy)（v0.4 era 配方原型 / original v0.4-era recipe）|
 训练入口脚本在我们的 LeIsaac fork：[`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)。
 _Training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)._
+## 评测结果 / Eval results
+### 5-round × 5-run pooled stats (25 episodes total)
+5-round 协议在 ACT 上 single-run variance 实测 ±40%（同 ckpt 同 horizon 跨 5 runs 范围 2-13/15），所以 canonical 数字必须 pooled multi-run。
+_The 5-round protocol has ±40% single-run variance for ACT (same ckpt + same horizon, range 2-13/15 across 5 runs), so canonical numbers must be pooled across multiple runs._
+| 配置 / Config | 🍊 per-orange p | per-episode mean | 95% CI (per-orange) |
+|---|---|---|---|
+| **wsagi/ACT-PickOrange v0.4.0 ckpt-18k h=70** (本 ckpt, 5 runs) | **0.440** | 1.32/ep | **[0.295, 0.585]** |
+| shadowHokage/act_policy h={16,32,64,70} (4 runs) | 0.183 | 0.55/ep | [0.106, 0.260] |
+**显著性 / Significance**：
+- Two-proportion Z test (per-orange iid): Z = 2.67, **p = 0.008** ✅
+- Welch t-test (per-episode, 消 episode-cluster over-dispersion / removes episode-cluster over-dispersion): t = 2.13, df ≈ 38, **p = 0.034** ✅
+- Effect ratio: **2.20×**
+### 0-3 oranges per-episode 分布 / Per-episode oranges distribution
+ACT chunk-policy 是 trajectory-level 决策，不是 per-orange iid — 一旦 trajectory 进入正确模式 → 3 颗 cluster 连续成功；一旦偏 → 0 颗全废。**实际分布 bimodal 而非 binomial**：
+_ACT chunks make trajectory-level decisions, not per-orange iid — once the trajectory enters the correct mode, all 3 oranges cluster as a successful streak; once it goes off-track, the entire episode is wasted. **Observed distribution is bimodal, not binomial**:_
+| oranges/ep | observed (25 ep) | Binomial(3, 0.440) expected | observed / expected |
+|---|---|---|---|
+| 0 | 11 | 4.4 | **2.51×** (over-dispersed) |
+| 1 | 2 | 10.3 | 0.19× (under) |
+| 2 | 5 | 8.1 | 0.61× (under) |
+| 3 | 7 | 2.1 | **3.29×** (over-dispersed) |
+两端 (0 颗 / 3 颗) 比 binomial 预期多 2.5-3.3×，中间 (1 颗 / 2 颗) 只有预期的 0.2-0.6×：bimodal/U-shape 签名。
+_Both tails (0 and 3 oranges) occur 2.5-3.3× more often than the binomial expectation, while the middle bins (1 and 2 oranges) occur at only 0.2-0.6× the expected rate: a bimodal, U-shaped signature._
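The comparison in the table above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the pooled per-orange rate of 0.440 and the observed per-episode counts from the table; the function name `binomial_expected` is illustrative only.

```python
from math import comb

P_ORANGE = 0.440      # pooled per-orange success rate (33/75)
N_EPISODES = 25       # 5 runs x 5 rounds
ORANGES_PER_EP = 3

# Observed number of episodes by oranges placed, copied from the table.
observed = {0: 11, 1: 2, 2: 5, 3: 7}


def binomial_expected(k, n=ORANGES_PER_EP, p=P_ORANGE, episodes=N_EPISODES):
    """Expected episode count with exactly k successes if oranges were iid."""
    return episodes * comb(n, k) * p**k * (1 - p) ** (n - k)


rows = []
for k in range(ORANGES_PER_EP + 1):
    expected = binomial_expected(k)
    ratio = observed[k] / expected
    rows.append((k, observed[k], round(expected, 1), round(ratio, 2)))

for k, obs, exp, ratio in rows:
    print(f"{k} oranges: observed={obs:2d}  expected={exp:4.1f}  ratio={ratio:.2f}x")
```

Ratios well above 1 at both ends and below 1 in the middle are what separate the bimodal pattern from an iid per-orange model.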
+### Per-run 数据点 / Per-run datapoints
+ckpt-18k h=70 5 runs (25 episodes total)：
+```
+run1: [3, 3, 3, 2, 2] = 13/15 (lucky tail, P≈0.003% under binomial)
+run2: [1, 1, 0, 0, 0] =  2/15
+run3: [2, 0, 3, 0, 3] =  8/15
+run4: [3, 0, 0, 2, 0] =  5/15
+run5: [0, 0, 3, 2, 0] =  5/15
+```
+范围 2-13/15 = ±40% range，pooled mean = 33/75。
+_Per-run totals range from 2/15 to 13/15 (about ±40%); the pooled result is 33/75._
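As a sanity check, the pooled figures can be derived directly from the per-run lists above. The sketch below only aggregates the numbers shown; the variable names are illustrative and it does not attempt to reproduce the confidence intervals or significance tests.

```python
from collections import Counter

ORANGES_PER_EPISODE = 3

# Oranges placed per episode, copied from the per-run listing.
runs = {
    "run1": [3, 3, 3, 2, 2],
    "run2": [1, 1, 0, 0, 0],
    "run3": [2, 0, 3, 0, 3],
    "run4": [3, 0, 0, 2, 0],
    "run5": [0, 0, 3, 2, 0],
}

episodes = [n for run in runs.values() for n in run]
total_placed = sum(episodes)
total_attempted = ORANGES_PER_EPISODE * len(episodes)

pooled_p = total_placed / total_attempted          # per-orange success rate
per_episode_mean = total_placed / len(episodes)    # oranges per episode
per_run_totals = {name: sum(r) for name, r in runs.items()}
histogram = Counter(episodes)                      # episodes by oranges placed

print(f"pooled: {total_placed}/{total_attempted} = {pooled_p:.3f}")
print(f"per-episode mean: {per_episode_mean:.2f}")
print(f"per-run totals: {per_run_totals}")
print(f"histogram: {dict(sorted(histogram.items()))}")
```

Pooling at the episode level keeps the 0-to-3 histogram available, which is what the bimodality check relies on.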
 测试环境 / Test setup：Isaac Sim 5.1，task `LeIsaac-SO101-PickOrange-v0`，`episode_length_s=120`，`step_hz=30`，dual-cam 观测。
 _Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations._
 ## ⚠️ 推理关键配置 / Critical inference caveat
+**本 v0.4.0 ckpt 最优 horizon = 70**（不是旧 v0.5.2 ckpt 的 32！）。每个训练曲线产出的 ckpt 最优 inference horizon 不同，必须 per-ckpt sweep。
+_**The v0.4.0 ckpt's best horizon is 70** (not the old v0.5.2 ckpt's 32!). Each training run produces a ckpt with a different optimal inference horizon, so a per-ckpt sweep is required._
 ### 根因 / Root cause
+ACT 每个 chunk 输出 100 步动作，是一段**完整规划**。LeRobot async client 用滑动窗口 (receding horizon)，每 `policy_action_horizon` 步重新查询一次。**chunk 内 action 一致性** 决定了 best horizon：训练 framework drift 改了 dataloader RNG / loss normalization → ckpt 内化的 chunk 一致性不同 → 最优 replan 频率不同。
+_Each ACT chunk outputs a 100-step planned trajectory. The LeRobot async client uses a sliding window, re-querying every `policy_action_horizon` steps. **Chunk-internal action coherence** determines the best horizon — framework drift (dataloader RNG / loss normalization) changes the chunk coherence baked into the ckpt → optimal re-plan frequency shifts._
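To make the re-query pattern concrete, here is a simplified model of the behavior described above: the policy returns a 100-step chunk, only the first `horizon` actions are executed, and then the policy is queried again. It is a sketch under that assumption, not the LeRobot async client's actual code; `run_receding_horizon` and `dummy_policy` are hypothetical names.

```python
CHUNK_SIZE = 100  # actions per ACT chunk, as stated in the model card


def run_receding_horizon(policy, total_steps, horizon):
    """Execute total_steps actions, re-querying the policy every `horizon` steps.

    Returns the executed actions and the number of policy queries.
    """
    if not 1 <= horizon <= CHUNK_SIZE:
        raise ValueError("horizon must be between 1 and CHUNK_SIZE")
    executed, queries, step = [], 0, 0
    while step < total_steps:
        chunk = policy(step)                  # fresh 100-step plan
        queries += 1
        n = min(horizon, total_steps - step)  # keep only the head of the chunk
        executed.extend(chunk[:n])
        step += n
    return executed, queries


def dummy_policy(step):
    """Hypothetical stand-in: tags each action with (query step, index in chunk)."""
    return [(step, i) for i in range(CHUNK_SIZE)]


total_steps = 120 * 30  # episode_length_s * step_hz
for h in (16, 32, 70):
    actions, queries = run_receding_horizon(dummy_policy, total_steps, h)
    deepest = max(i for _, i in actions)
    print(f"horizon={h:3d}: {queries:3d} queries, deepest chunk index used = {deepest}")
```

With a small horizon the tail of every chunk is discarded, so only the earliest predicted actions ever reach the robot; a larger horizon lets more of each plan run before the next re-plan.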
 ### 推荐配置 / Recommended settings
 ```bash
 --policy_type=lerobot-act
+--policy_action_horizon=70                  # for THIS ckpt (v0.4.0 ckpt-18k); the old v0.5.2 ckpt uses 32
+--policy_checkpoint_path=wsagi/ACT-PickOrange
+--step_hz=30                                # 对齐 dataset 30Hz / matches dataset 30Hz
 --episode_length_s=120
 ```
 ## 使用方法
 _Usage_
+### 1. 启动 LeRobot async policy_server / Start the LeRobot async policy_server (lerobot v0.4.0)
 ```bash
+conda create -n lerobot-v040 python=3.10 -y && conda activate lerobot-v040
+pip install lerobot==0.4.0  # pin the version to avoid framework drift
 python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
 ```
 cd LeIsaac
 bash scripts/evaluation/run_eval.sh -- \
     --task=LeIsaac-SO101-PickOrange-v0 \
+    --eval_rounds=5 \
     --episode_length_s=120 \
     --step_hz=30 \
     --policy_type=lerobot-act \
     --policy_host=127.0.0.1 --policy_port=8080 \
     --policy_checkpoint_path=wsagi/ACT-PickOrange \
+    --policy_action_horizon=70 \
     --policy_language_instruction="Pick up the orange and place it on the plate" \
     --device=cuda --enable_cameras
 ```
+## Framework drift — lerobot v0.4 vs v0.5
+本 ckpt 重训于 lerobot **v0.4.0**（锁版本），而不是 main repo 最新 v0.5.x。原因：
+_This ckpt was retrained on lerobot **v0.4.0** (pinned version), not the latest v0.5.x main. Reason:_
+| Training framework | 5-round per-orange p | 显著性 / Significance |
+|---|---|---|
+| lerobot v0.4.0（本 ckpt）| **0.440** (5-run pool, 25 ep) | baseline |
+| lerobot v0.5.2 + 2 patches | 0.267 (4/15 single 5-round) | -39% vs v0.4.0 (left-tail p≈0.1%) |
+| shadowHokage (v0.4 era, 2026-01) | 0.183 (4-h sweep, 20 ep) | -58% vs v0.4.0, Z=2.67 **p=0.008** |
+**关键发现 / Key findings**：
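+- lerobot **PR #3406 (a8b72d96)** 改 dataloader (`persistent_workers/uint8/prefetch`)，2026-04-19 merge
+  _lerobot **PR #3406 (a8b72d96)** changes the dataloader (`persistent_workers/uint8/prefetch`); merged 2026-04-19._
+- lerobot **PR #3442 (1add4606)** 改 ACT padding loss，2026-04-23 merge
+  _lerobot **PR #3442 (1add4606)** changes the ACT padding loss; merged 2026-04-23._
+- 两个 PR 都 land 在 v0.5.0 (2026-04-26)；锁回 v0.4.0 可恢复 0.440 per-orange
+  _Both PRs landed in v0.5.0 (2026-04-26); pinning back to v0.4.0 restores 0.440 per-orange._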
+完整 ablation + 三模型 brainstorm 详见我们的设计文档：[`act_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/act_finetune_pick_orange.html)。
+_Full ablation + 3-model brainstorm in our design doc: [`act_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/act_finetune_pick_orange.html)._
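Since the results depend on the lerobot version, a small guard at the top of a training or serving script can catch an accidental upgrade. The sketch below uses only the standard library's `importlib.metadata`; the helper names `installed_version` and `is_pinned` are illustrative and not part of lerobot or LeIsaac.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_LEROBOT = "0.4.0"  # version this ckpt was trained and evaluated with


def installed_version(package="lerobot"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_pinned(installed, required=REQUIRED_LEROBOT):
    """True only when the installed version matches the pinned one exactly."""
    return installed is not None and installed == required


found = installed_version()
if is_pinned(found):
    print(f"lerobot=={found}: matches the pinned version")
else:
    print(f"expected lerobot=={REQUIRED_LEROBOT}, found {found}")
```

An exact string match is deliberately strict here, because the drift described above appeared between releases.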
 ## 局限性
 _Limitations_
+- **数据集 OOD on 2nd-3rd orange**：dataset 60 episode × 每集 1 次"放第 N 颗"演示。第 2/3 颗的 state coverage 比第 1 颗稀疏一个数量级。即便 horizon=70 + 5-run pooled，**精度仍随颗数线性退化**。这是数据问题不是模型问题。
+  _**Dataset OOD on 2nd–3rd orange**: with 60 episodes × 1 "place N-th orange" demo each, state coverage drops by ~1 order of magnitude per orange. Even at horizon=70 with 5-run pooling, accuracy degrades linearly across oranges. This is a data issue, not a model issue._
+- **5-round single-run variance ±40%** — 任何单次 5-round 数字（包括 13/15 lucky tail）都不构成证据；至少 ≥3 runs pool。
+  _**±40% single-run variance** — any single 5-round number (including 13/15 lucky tails) is noise; pool ≥3 runs._
 - 无图像增强、无 domain randomization → real-world transfer 可能弱。本 ckpt 仅用于 Isaac Sim 仿真验证，不保证真机 deploy。
   _No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed._
 - 同任务对照 / Same-task comparisons：
   - [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange) — 自训 Diffusion Policy (267M, DDIM 32-step swap)
+  - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) — v0.4 era 公开 ckpt / public v0.4-era ckpt（4-run pool = 18.3%）
+  - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5 baseline
+- 完整训练 + eval 配方 + framework drift 调研 / Full training + eval recipe and framework-drift investigation：[vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
 ## 致谢
 _Acknowledgments_
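 - LeIsaac 团队 + LightwheelAI 提供任务环境和数据集
   _The LeIsaac team and LightwheelAI, for the task environment and dataset._
 - LeRobot 团队提供 ACT 实现 + async inference 框架
   _The LeRobot team, for the ACT implementation and the async inference framework._
+- shadowHokage 公开训练配方作为复刻基线（暴露了 framework drift 问题）
+  _shadowHokage, for publishing the training recipe used as the reproduction baseline (which exposed the framework drift issue)._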
 ## 引用
 _Citation_