wsagi committed · Commit d3cb037 · verified · 1 Parent(s): 9bb4acc

README: v0.4.0 ckpt-18k h=70 5-run pool + framework drift section

Files changed (1): README.md +104 -48
README.md CHANGED
@@ -32,19 +32,30 @@ _An [ACT (Action Chunking Transformer)](https://tonyzhaozh.github.io/aloha/) pol
32
  - **任务 / Task**:`Pick up the orange and place it on the plate` — SO-101 单臂依次夹起 3 颗橙子并放盘子。
33
  _Single-arm SO-101 picks 3 oranges sequentially and places each on a plate._
34
  - **数据集 / Dataset**:[`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) — 60 episode 遥操示范。
35
- - **架构 / Architecture**:ACT chunk_size=100,~80M 参数,纯 vision + joint state → action chunk regression(无 LLM / 无 diffusion)。
36
- - **训练 / Training**:batch=8 / lr=1e-5 / 10k step / **关闭图像增强**,~5h on RTX 4090。
37
- - **评测 / Eval**:Isaac Sim 5.1 + LeIsaac,**1/1 success @ 120s sim time**(3 颗全部放盘成功)
38
- - **⚠️ 关键 inference 配置 / Critical inference setting**:`policy_action_horizon=32`。
39
- 默认值 16 会让模型卡在第二颗橙子(爪子抖),8 会卡在第一颗。详见下方 [Inference caveat](#-推理关键配置--critical-inference-caveat)。
40
 
41
  ## 模型亮点
42
  _Highlights_
43
 
44
- - **复刻 + 验证 [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) 的配方**,得到等价或更好的成功率
45
- _Reproduces and validates the [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) recipe with comparable or better success rate._
46
- - **暴露了 LeIsaac 默认 `policy_action_horizon=16` 的隐性陷阱**:chunk_size=100 的 ACT 需要 horizon 32 才能让宏观运动段完整执行,详见 README诊断章节
47
- _Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`: ACT models with chunk_size=100 require horizon ≥ 32 to let the macro-motion segment of each chunk execute._
48
  - No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.
49
 
50
  ## 训练配方
@@ -54,69 +65,97 @@ _Training recipe_
54
  |---|---|
55
  | Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
56
  | Policy | `act` (LeRobot 实现 / LeRobot impl.) |
 
57
  | Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
58
  | `chunk_size` | 100 |
59
  | `n_action_steps` | 100 |
60
  | Batch size | 8 |
61
  | Optimizer | AdamW |
62
  | Learning rate | 1e-5 (constant) |
63
- | Steps | 10,000 |
64
  | Image augmentation | **disabled** |
65
  | Hardware | RTX 4090 (24 GB) |
66
- | Wall-clock | ~5 hours |
67
- | Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) |
68
 
69
  训练入口脚本在我们的 LeIsaac fork:[`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)。
70
  _Training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)._
71
 
72
- ## 评测结果
73
- _Eval results_
74
 
75
- | 配置 / Config | Orange 1 | Orange 2 | Orange 3 | Episode success rate |
76
- |---|---|---|---|---|
77
- | horizon=8 | 🔴 stuck (gripper clamps the orange and stops moving) | — | — | 0/1 |
78
- | horizon=16 | ✅ success | 🟡 gripper jitter | — | 0/1 |
79
- | **horizon=32** | ✅ success | ✅ success after struggling | ✅ success after struggling | **1/1** ✅ |
80
 
81
  测试环境 / Test setup:Isaac Sim 5.1,task `LeIsaac-SO101-PickOrange-v0`,`episode_length_s=120`,`step_hz=30`,dual-cam 观测。
82
  _Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations._
83
 
84
- **单 sample 警告 / Single-sample caveat**:以上 1/1 是单一 episode 结果,未跑统计意义上的多轮平均。但 horizon=8 / 16 / 32 三个失败模式的 monotonic 趋势 (失败 → 部分失败 → 成功) 足以做 falsification — 不是模型问题,是配置问题。
85
- _The 1/1 success rate is from a single episode, not statistically averaged. However, the monotonic failure-mode pattern across horizon=8/16/32 (stuck → jitter → success) is sufficient as a falsification: this is a configuration issue, not a model capability issue._
86
-
87
  ## ⚠️ 推理关键配置 / Critical inference caveat
88
 
89
- **ACT chunk_size=100 + 默认 horizon=16 = 第二颗橙子永远过不去。** 这不是 ACT 的弱点,而是 LeIsaac 默认配置的隐性陷阱。
90
- _**ACT chunk_size=100 + the default horizon=16 will deadlock on the 2nd orange.** This is not an ACT weakness; it's a hidden trap in LeIsaac's default config._
91
 
92
  ### 根因 / Root cause
93
 
94
- ACT 每个 chunk 输出 100 步动作,是一段**完整规划**:前 ~10 步是"启动 / 加速",中段 (step 20-80) 才是真正的**宏观运动**(接近 → 夹起 → 提起 → 运送 → 释放)。LeRobot async client 用滑动窗口 (receding horizon),每 `policy_action_horizon` 步重新查询一次。
95
- _Each ACT chunk outputs a 100-step planned trajectory: the first ~10 steps are "startup", and steps 20-80 are the macro-motion (approach → grasp → lift → transport → release). The LeRobot async client uses a sliding window (receding horizon), re-querying every `policy_action_horizon` steps._
96
-
97
- - horizon=8 → 每次只执行前 8 步就丢掉重 query → 永远在执行"启动段",**根本到不了宏观运动** → 卡死。
98
- _horizon=8 → only the first 8 startup steps are ever executed → the macro-motion never fires → deadlock._
99
- - horizon=16 → 够第 1 颗的简单"靠近→夹起",但第 2 颗的"放→后退→接近第 2 颗"复杂段需要更长执行窗 → 模型 OOD + 短 horizon 双重打击 → 抖。
100
- _horizon=16 → enough for the simple "approach → grasp" of orange #1, but the post-1st-orange transition demands a longer execution window → OOD state + short horizon compound → jitter._
101
- - horizon=32 → 给 macro-motion 完整执行机会,1/1 通过。
102
 
103
  ### 推荐配置 / Recommended settings
104
 
105
  ```bash
106
  --policy_type=lerobot-act
107
- --policy_action_horizon=32
108
- --policy_checkpoint_path=<path-to-this-model>
109
- --step_hz=30 # 对齐 dataset 30Hz / matches dataset 30Hz
110
  --episode_length_s=120
111
  ```
112
 
113
  ## 使用方法
114
  _Usage_
115
 
116
- ### 1. 启动 LeRobot async policy_server
117
 
118
  ```bash
119
- pip install lerobot
 
120
  python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
121
  ```
122
 
@@ -128,26 +167,43 @@ python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
128
  cd LeIsaac
129
  bash scripts/evaluation/run_eval.sh -- \
130
  --task=LeIsaac-SO101-PickOrange-v0 \
131
- --eval_rounds=3 \
132
  --episode_length_s=120 \
133
  --step_hz=30 \
134
  --policy_type=lerobot-act \
135
  --policy_host=127.0.0.1 --policy_port=8080 \
136
  --policy_checkpoint_path=wsagi/ACT-PickOrange \
137
- --policy_action_horizon=32 \
138
  --policy_language_instruction="Pick up the orange and place it on the plate" \
139
  --device=cuda --enable_cameras
140
  ```
141
 
142
- `run_eval.sh` 自动按 user-patience cap 计算 wall-clock timeout,避免无意义等待慢推理。
143
- _`run_eval.sh` auto-computes a user-patience wall-clock timeout so slow inference fails fast._
144
 
145
  ## 局限性
146
  _Limitations_
147
 
148
- - **数据集 OOD on 2nd-3rd orange**:dataset 60 episode × 每集 1 次"放第 N 颗"演示。第 2/3 颗的 state coverage 比第 1 颗稀疏一个数量级,model 在那里 monotonic 变难、动作变"折腾"。即便 horizon=32 救了形式上的成功率,**精度仍随颗数线性退化**。这是数据问题不是模型问题。
149
- _**Dataset OOD on 2nd–3rd orange**: with 60 episodes × 1 "place N-th orange" demo each, state coverage drops by ~1 order of magnitude per orange. Even at horizon=32 the policy gets visibly more jittery on later oranges. This is a data issue, not a model issue._
150
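- - Three independent architectures (ACT, both our ckpt and the public shadowHokage ckpt; Diffusion Policy; SmolVLA) trained on the same dataset **all go OOD on the 3rd orange**, so the failure is shared across the whole model family.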
 
151
  - 无图像增强、无 domain randomization → real-world transfer 可能弱。本 ckpt 仅用于 Isaac Sim 仿真验证,不保证真机 deploy。
152
  _No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed._
153
 
@@ -156,16 +212,16 @@ _Related_
156
 
157
  - 同任务对照 / Same-task comparisons:
158
  - [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange) — 自训 Diffusion Policy (267M, DDIM 32-step swap)
159
- - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) — public ckpt trained with the same recipe (our replication reference)
160
- - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5 SOTA(30s 完成 3 颗)
161
- - 完整训练 + eval 配方:[vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
162
 
163
  ## 致谢
164
  _Acknowledgments_
165
 
166
  - LeIsaac 团队 + LightwheelAI 提供任务环境和数据集
167
  - LeRobot 团队提供 ACT 实现 + async inference 框架
168
- - shadowHokage 公开训练配方作为复刻基线
169
 
170
  ## 引用
171
  _Citation_
 
32
  - **任务 / Task**:`Pick up the orange and place it on the plate` — SO-101 单臂依次夹起 3 颗橙子并放盘子。
33
  _Single-arm SO-101 picks 3 oranges sequentially and places each on a plate._
34
  - **数据集 / Dataset**:[`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) — 60 episode 遥操示范。
35
+ - **架构 / Architecture**:ACT chunk_size=100,~52M 参数,纯 vision + joint state → action chunk regression(无 LLM / 无 diffusion)。
36
+ - **训练 / Training**:lerobot **v0.4.0**, batch=8 / lr=1e-5 / 20k step / 关闭图像增强,~10h on RTX 4090. **本 ckpt = step 18000** (sweet spot)
37
+ - **评测 / Eval**:Isaac Sim 5.1 + LeIsaac,**5-round × 5-run pooled** = 33/75 oranges = **44.0% per-orange success** (95% CI [29.5%, 58.5%])
38
+ - **⚠️ 关键 inference 配置 / Critical inference setting**:`policy_action_horizon=70`(旧 v0.5.2 ckpt 的 horizon=32 不适用本 v0.4.0 ckpt,详见 [Inference caveat](#-推理关键配置--critical-inference-caveat))
39
+
40
+ ## 🌳 分支说明 / Branch layout
41
+
42
+ 本 repo 有两个 ckpt,分别记录 framework drift 故事的两端:
43
+ _Two checkpoints are tracked in this repo, capturing both ends of the framework drift story:_
44
+
45
+ | Branch | lerobot version | Training step | Best horizon | 🍊 per-orange p | 备注 / Notes |
46
+ |---|---|---|---|---|---|
47
+ | **main** (本 ckpt / this ckpt) | **v0.4.0** | **18000** | **70** | **0.440** (33/75, 5-run pool) | 当前推荐 / current canonical |
48
+ | `lerobot-v052-ckpt-10k` | v0.5.2 | 10000 | 32 (旧推荐 / old) | 0.267 (4/15 single 5-round) | 历史对照 / archived for framework-drift study |
49
+
50
+ 详见下方 [Framework drift section](#framework-drift--lerobot-v04-vs-v05)。
51
+ _See [Framework drift section](#framework-drift--lerobot-v04-vs-v05) below._
52
 
53
  ## 模型亮点
54
  _Highlights_
55
 
56
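+ - **Pooled statistics over 5 runs × 5 rounds (25 episodes)**: 44.0% per-orange success (95% CI [29.5%, 58.5%]), significantly better than the public shadowHokage ckpt at 18.3% (95% CI [10.6%, 26.0%]). Welch t-test (per-episode, which removes episode-level clustering): **p=0.034**; two-proportion Z test: **p=0.008**.
57
+ - **Exposes a lerobot v0.4 → v0.5 framework drift**: with the same dataset, seed, and config, switching only the lerobot version makes the ckpt trained on v0.5.2 drop to 18-27% per-orange (about the level we measured for shadowHokage); pinning back to v0.4.0 restores 44%. See the framework drift section near the bottom.
58
+ - **Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`**: an ACT policy with chunk_size=100 needs a per-ckpt sweep to find its best horizon (h=70 for this ckpt; ckpts from different training runs have different optimal h).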
 
59
  - No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.
60
 
61
  ## 训练配方
 
65
  |---|---|
66
  | Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
67
  | Policy | `act` (LeRobot 实现 / LeRobot impl.) |
68
+ | **lerobot version** | **v0.4.0** (锁版本以避免 framework drift) |
69
  | Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
70
  | `chunk_size` | 100 |
71
  | `n_action_steps` | 100 |
72
  | Batch size | 8 |
73
  | Optimizer | AdamW |
74
  | Learning rate | 1e-5 (constant) |
75
+ | Steps | 20,000 (本 ckpt = step **18000**, 经 sweep 是 sweet spot) |
76
  | Image augmentation | **disabled** |
77
  | Hardware | RTX 4090 (24 GB) |
78
+ | Wall-clock | ~10 hours |
79
+ | Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy)(v0.4 era 配方原型)|
80
 
81
  训练入口脚本在我们的 LeIsaac fork:[`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)。
82
  _Training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)._
83
 
84
+ ## 评测结果 / Eval results
85
+
86
+ ### 5-round × 5-run pooled stats (25 episodes total)
87
+
88
+ 5-round 协议在 ACT 上 single-run variance 实测 ±40%(同 ckpt 同 horizon 跨 5 runs 范围 2-13/15),所以 canonical 数字必须 pooled multi-run。
89
+ _The 5-round protocol has ±40% single-run variance for ACT (same ckpt + same horizon, range 2-13/15 across 5 runs), so canonical numbers must be pooled across multiple runs._
90
+
91
+ | 配置 / Config | 🍊 per-orange p | per-episode mean | 95% CI (per-orange) |
92
+ |---|---|---|---|
93
+ | **wsagi/ACT-PickOrange v0.4.0 ckpt-18k h=70** (本 ckpt, 5 runs) | **0.440** | 1.32/ep | **[0.295, 0.585]** |
94
+ | shadowHokage/act_policy h={16,32,64,70} (4 runs) | 0.183 | 0.55/ep | [0.106, 0.260] |
95
+
96
+ **显著性 / Significance**:
97
+ - Two-proportion Z test (per-orange iid): Z = 2.67, **p = 0.008** ✅
98
+ - Welch t-test (per-episode, 消 episode-cluster over-dispersion): t = 2.13, df ≈ 38, **p = 0.034** ✅
99
+ - Effect ratio: **2.40×** (0.440 / 0.183)
100
+
101
+ ### 0-3 oranges per-episode 分布 / Per-episode oranges distribution
102
+
103
+ ACT chunk-policy 是 trajectory-level 决策,不是 per-orange iid — 一旦 trajectory 进入正确模式 → 3 颗 cluster 连续成功;一旦偏 → 0 颗全废。**实际分布 bimodal 而非 binomial**:
104
+ _ACT chunks make trajectory-level decisions, not per-orange iid — once the trajectory enters the correct mode, all 3 oranges cluster as a successful streak; once it goes off-track, the entire episode is wasted. **Observed distribution is bimodal, not binomial**:_
105
+
106
+ | oranges/ep | observed (25 ep) | Binomial(3, 0.440) expected | observed / expected |
107
+ |---|---|---|---|
108
+ | 0 | 11 | 4.4 | **2.51×** (over-dispersed) |
109
+ | 1 | 2 | 10.3 | 0.19× (under) |
110
+ | 2 | 5 | 8.1 | 0.61× (under) |
111
+ | 3 | 7 | 2.1 | **3.29×** (over-dispersed) |
112
 
113
+ 两端 (0/3) 比 binomial 预期多 2.5-3.3×,中间 (1/2) 只有预期的 0.19-0.61×:bimodal/U-shape 签名。
114
+ _Both tails (0 and 3 oranges) appear 2.5-3.3× more often than the binomial model predicts, while the middle bins (1 and 2 oranges) appear at only 0.19-0.61× the expected rate: a bimodal/U-shape signature._
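
The "expected" column above can be reproduced with a few lines of Python. This is a minimal sketch that assumes oranges are independent with success probability 0.440 (exactly the iid model the table argues against); the helper name `binomial_expected` is ours, not part of LeIsaac or lerobot.

```python
from math import comb


def binomial_expected(n_trials: int, p: float, n_episodes: int) -> list[float]:
    """Expected number of episodes with k successes (k = 0..n_trials) under an iid model."""
    return [
        n_episodes * comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_trials + 1)
    ]


# Episodes with 0, 1, 2, 3 oranges placed, copied from the table above.
observed = [11, 2, 5, 7]
expected = binomial_expected(n_trials=3, p=0.440, n_episodes=sum(observed))
ratios = [obs / exp for obs, exp in zip(observed, expected)]

for k, (obs, exp, ratio) in enumerate(zip(observed, expected, ratios)):
    print(f"{k} oranges: observed={obs:2d}  expected={exp:4.1f}  ratio={ratio:.2f}x")
```

A ratio well above 1 in both the 0 and 3 bins, with ratios below 1 in between, is what the text calls the U-shape.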
115
+
116
+ ### Per-run 数据点 / Per-run datapoints
117
+
118
+ ckpt-18k h=70 5 runs (25 episodes total):
119
+ ```
120
+ run1: [3, 3, 3, 2, 2] = 13/15 (lucky tail, P≈0.003% under binomial)
121
+ run2: [1, 1, 0, 0, 0] = 2/15
122
+ run3: [2, 0, 3, 0, 3] = 8/15
123
+ run4: [3, 0, 0, 2, 0] = 5/15
124
+ run5: [0, 0, 3, 2, 0] = 5/15
125
+ ```
126
+ Per-run totals range from 2/15 to 13/15 (about ±40 percentage points around the pooled mean of 33/75 = 0.440).
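
The pooled numbers can be recomputed directly from the per-run lists. The sketch below also reproduces the tail probability quoted for run1 under one assumption of ours: that it refers to Binomial(15, 1/3), where 1/3 is the pooled rate of the other four runs (20/60). The README does not state which p was used, so treat that reading as an inference.

```python
from collections import Counter
from math import comb

# Oranges placed per episode, copied from the five runs listed above.
runs = {
    "run1": [3, 3, 3, 2, 2],
    "run2": [1, 1, 0, 0, 0],
    "run3": [2, 0, 3, 0, 3],
    "run4": [3, 0, 0, 2, 0],
    "run5": [0, 0, 3, 2, 0],
}
ORANGES_PER_EPISODE = 3

per_run_totals = {name: sum(eps) for name, eps in runs.items()}
episodes = [n for eps in runs.values() for n in eps]
pooled_successes = sum(episodes)
pooled_attempts = ORANGES_PER_EPISODE * len(episodes)
pooled_rate = pooled_successes / pooled_attempts
histogram = Counter(episodes)  # episodes with 0, 1, 2, 3 oranges


def binomial_upper_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Assumption: compare run1 against the success rate of the other four runs.
other_successes = sum(t for name, t in per_run_totals.items() if name != "run1")
other_attempts = ORANGES_PER_EPISODE * sum(len(e) for n, e in runs.items() if n != "run1")
run1_tail = binomial_upper_tail(15, per_run_totals["run1"], other_successes / other_attempts)

print(per_run_totals)
print(f"pooled: {pooled_successes}/{pooled_attempts} = {pooled_rate:.3f}")
print(f"histogram: {dict(sorted(histogram.items()))}")
print(f"P(run1 >= 13/15 | p = 1/3) = {run1_tail:.4%}")
```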
127
 
128
  测试环境 / Test setup:Isaac Sim 5.1,task `LeIsaac-SO101-PickOrange-v0`,`episode_length_s=120`,`step_hz=30`,dual-cam 观测。
129
  _Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations._
130
 
 
 
 
131
  ## ⚠️ 推理关键配置 / Critical inference caveat
132
 
133
+ **本 v0.4.0 ckpt 最优 horizon = 70**(不是旧 v0.5.2 ckpt 的 32!)。每个训练曲线产出的 ckpt 最优 inference horizon 不同,必须 per-ckpt sweep。
134
+ _**The v0.4.0 ckpt's best horizon is 70** (not the old v0.5.2 ckpt's 32!). Each training trajectory produces a ckpt with different optimal inference horizon — per-ckpt sweep is required._
135
 
136
  ### 根因 / Root cause
137
 
138
+ ACT 每个 chunk 输出 100 步动作,是一段**完整规划**。LeRobot async client 用滑动窗口 (receding horizon),每 `policy_action_horizon` 步重新查询一次。**chunk 内 action 一致性** 决定了 best horizon:训练 framework drift 改了 dataloader RNG / loss normalization → ckpt 内化的 chunk 一致性不同 → 最优 replan 频率不同。
139
+ _Each ACT chunk outputs a 100-step planned trajectory. The LeRobot async client uses a sliding window (receding horizon), re-querying every `policy_action_horizon` steps. **Chunk-internal action coherence** determines the best horizon: framework drift (dataloader RNG / loss normalization) changes the chunk coherence baked into the ckpt → optimal re-plan frequency shifts._
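
To make the re-query mechanism concrete, here is a simplified, synchronous sketch of receding-horizon execution. It is not the LeRobot async client: `run_receding_horizon` and `dummy_policy` are hypothetical names, and the real client may overlap or aggregate chunks. It only illustrates how `policy_action_horizon` sets how much of each 100-step chunk is used and how often the policy is queried in a 120 s episode at 30 Hz.

```python
from typing import Callable, List, Sequence

Action = Sequence[float]


def run_receding_horizon(
    query_policy: Callable[[int], List[Action]],
    total_steps: int,
    horizon: int,
) -> tuple[list[Action], int]:
    """Execute the first `horizon` actions of each chunk, then re-query.

    Returns the executed actions and the number of policy queries.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    executed: list[Action] = []
    queries = 0
    step = 0
    while step < total_steps:
        chunk = query_policy(step)
        queries += 1
        n = min(horizon, len(chunk), total_steps - step)
        if n <= 0:
            raise RuntimeError("policy returned an empty chunk")
        executed.extend(chunk[:n])  # the rest of the chunk is discarded
        step += n
    return executed, queries


def dummy_policy(step: int, chunk_size: int = 100) -> List[Action]:
    """Stand-in for ACT: a chunk of 1-D actions tagged with their step index."""
    return [(float(step + i),) for i in range(chunk_size)]


TOTAL_STEPS = 120 * 30  # episode_length_s * step_hz
queries_by_horizon = {
    h: run_receding_horizon(dummy_policy, TOTAL_STEPS, h)[1] for h in (16, 32, 70)
}
print(queries_by_horizon)
```

With a larger horizon, more of each planned chunk is actually executed and the policy re-plans less often, which is the trade-off the per-ckpt sweep explores.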
140
 
141
  ### 推荐配置 / Recommended settings
142
 
143
  ```bash
144
  --policy_type=lerobot-act
145
+ --policy_action_horizon=70 # for THIS ckpt (v0.4.0 ckpt-18k); the old v0.5.2 ckpt uses 32
146
+ --policy_checkpoint_path=wsagi/ACT-PickOrange
147
+ --step_hz=30 # 对齐 dataset 30Hz / matches dataset 30Hz
148
  --episode_length_s=120
149
  ```
150
 
151
  ## 使用方法
152
  _Usage_
153
 
154
+ ### 1. 启动 LeRobot async policy_server (lerobot v0.4.0)
155
 
156
  ```bash
157
+ conda create -n lerobot-v040 python=3.10 -y && conda activate lerobot-v040
158
+ pip install lerobot==0.4.0 # pin the version to avoid framework drift
159
  python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
160
  ```
161
 
 
167
  cd LeIsaac
168
  bash scripts/evaluation/run_eval.sh -- \
169
  --task=LeIsaac-SO101-PickOrange-v0 \
170
+ --eval_rounds=5 \
171
  --episode_length_s=120 \
172
  --step_hz=30 \
173
  --policy_type=lerobot-act \
174
  --policy_host=127.0.0.1 --policy_port=8080 \
175
  --policy_checkpoint_path=wsagi/ACT-PickOrange \
176
+ --policy_action_horizon=70 \
177
  --policy_language_instruction="Pick up the orange and place it on the plate" \
178
  --device=cuda --enable_cameras
179
  ```
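
Because the best horizon differs per ckpt, a sweep amounts to repeating the command above with different `--policy_action_horizon` values. The following sketch only builds and prints those commands; the flag names are copied from the command above, while `build_eval_command` and the candidate list are illustrative choices of ours. Given the single-run variance reported in this README, each horizon would need several pooled runs before comparing.

```python
import shlex

# Flags copied from the eval command above; only the horizon varies per run.
BASE_FLAGS = {
    "task": "LeIsaac-SO101-PickOrange-v0",
    "eval_rounds": 5,
    "episode_length_s": 120,
    "step_hz": 30,
    "policy_type": "lerobot-act",
    "policy_host": "127.0.0.1",
    "policy_port": 8080,
    "policy_checkpoint_path": "wsagi/ACT-PickOrange",
    "policy_language_instruction": "Pick up the orange and place it on the plate",
    "device": "cuda",
}

# Illustrative candidates (the horizons mentioned in this README), not a recommendation.
CANDIDATE_HORIZONS = (16, 32, 64, 70)


def build_eval_command(horizon: int) -> list[str]:
    """Return the argv list for one evaluation run at the given horizon."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    cmd = ["bash", "scripts/evaluation/run_eval.sh", "--"]
    cmd += [f"--{name}={value}" for name, value in BASE_FLAGS.items()]
    cmd += [f"--policy_action_horizon={horizon}", "--enable_cameras"]
    return cmd


for h in CANDIDATE_HORIZONS:
    # Print a copy-pasteable command; run it from the LeIsaac checkout.
    print(shlex.join(build_eval_command(h)))
```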
180
 
181
+ ## Framework drift / lerobot v0.4 vs v0.5
182
+
183
+ 本 ckpt 重训于 lerobot **v0.4.0**(锁版本),而不是 main repo 最新 v0.5.x。原因:
184
+ _This ckpt was retrained on lerobot **v0.4.0** (pinned version), not the latest v0.5.x main. Reason:_
185
+
186
+ | Training framework | 5-round per-orange p | 显著性 |
187
+ |---|---|---|
188
+ | lerobot v0.4.0(本 ckpt)| **0.440** (5-run pool, 25 ep) | baseline |
189
+ | lerobot v0.5.2 + 2 patches | 0.267 (4/15 single 5-round) | -39% vs v0.4.0 (left-tail p≈0.1%) |
190
+ | shadowHokage (v0.4 era, 2026-01) | 0.183 (4-h sweep, 20 ep) | -58% vs v0.4.0, Z=2.67 **p=0.008** |
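
The relative drops in the last column follow from the per-orange rates in the table. A minimal arithmetic sketch (the helper name `relative_change` is ours):

```python
def relative_change(value: float, baseline: float) -> float:
    """Signed change of `value` relative to `baseline`, as a fraction."""
    return (value - baseline) / baseline


BASELINE = 0.440  # lerobot v0.4.0 ckpt, 5-run pool
rates = {
    "lerobot v0.5.2 + 2 patches": 0.267,
    "shadowHokage": 0.183,
}

changes = {name: relative_change(p, BASELINE) for name, p in rates.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.0%} vs v0.4.0")
```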
191
+
192
+ **关键发现 / Key findings**:
193
+ - lerobot **PR #3406 (a8b72d96)** changed the dataloader (`persistent_workers/uint8/prefetch`); merged 2026-04-19
194
+ - lerobot **PR #3442 (1add4606)** changed the ACT padding loss; merged 2026-04-23
195
+ - Both PRs landed in v0.5.0 (2026-04-26); pinning back to v0.4.0 restores 0.440 per-orange
196
+
197
+ 完整 ablation + 三模型 brainstorm 详见我们的设计文档:[`act_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/act_finetune_pick_orange.html)。
198
+ _Full ablation + 3-model brainstorm in our design doc: [`act_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/act_finetune_pick_orange.html)._
199
 
200
  ## 局限性
201
  _Limitations_
202
 
203
+ - **数据集 OOD on 2nd-3rd orange**:dataset 60 episode × 每集 1 次"放第 N 颗"演示。第 2/3 颗的 state coverage 比第 1 颗稀疏一个数量级。即便 horizon=70 + 5-run pooled,**精度仍随颗数线性退化**。这是数据问题不是模型问题。
204
+ _**Dataset OOD on 2nd–3rd orange**: with 60 episodes × 1 "place N-th orange" demo each, state coverage drops by ~1 order of magnitude per orange. Even at horizon=70 with 5-run pooling, accuracy degrades linearly across oranges. This is a data issue, not a model issue._
205
+ - **5-round single-run variance ±40%**:任何单次 5-round 数字(包括 13/15 lucky tail)都不构成证据;至少 pool 3 个 runs。
206
+ _**±40% single-run variance**: any single 5-round number (including the 13/15 lucky tail) is noise rather than evidence; pool at least 3 runs._
207
  - 无图像增强、无 domain randomization → real-world transfer 可能弱。本 ckpt 仅用于 Isaac Sim 仿真验证,不保证真机 deploy。
208
  _No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed._
209
 
 
212
 
213
  - 同任务对照 / Same-task comparisons:
214
  - [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange) — 自训 Diffusion Policy (267M, DDIM 32-step swap)
215
+ - [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) — v0.4-era public ckpt (4-run pool = 18.3% per-orange)
216
+ - [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5 baseline
217
+ - 完整训练 + eval 配方 + framework drift 调研:[vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
218
 
219
  ## 致谢
220
  _Acknowledgments_
221
 
222
  - LeIsaac 团队 + LightwheelAI 提供任务环境和数据集
223
  - LeRobot 团队提供 ACT 实现 + async inference 框架
224
+ - shadowHokage 公开训练配方作为复刻基线(暴露了 framework drift 问题)
225
 
226
  ## 引用
227
  _Citation_