README: v0.4.0 ckpt-18k h=70 5-run pool + framework drift section
README.md
CHANGED
|
@@ -32,19 +32,30 @@ _An [ACT (Action Chunking Transformer)](https://tonyzhaozh.github.io/aloha/) pol
|
|
| 32 |
- **任务 / Task**:`Pick up the orange and place it on the plate` — SO-101 单臂依次夹起 3 颗橙子并放盘子。
|
| 33 |
_Single-arm SO-101 picks 3 oranges sequentially and places each on a plate._
|
| 34 |
- **数据集 / Dataset**:[`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) — 60 episode 遥操示范。
|
| 35 |
-
- **架构 / Architecture**:ACT chunk_size=100,~
|
| 36 |
-
- **训练 / Training**:batch=8 / lr=1e-5 /
|
| 37 |
-
- **评测 / Eval**:Isaac Sim 5.1 + LeIsaac,**
|
| 38 |
-
- **⚠️ 关键 inference 配置 / Critical inference setting**:`policy_action_horizon=
|
| 39 |
-
| 40 |
|
| 41 |
## 模型亮点
|
| 42 |
_Highlights_
|
| 43 |
|
| 44 |
-
- **
|
| 45 |
-
|
| 46 |
-
- **暴露了 LeIsaac 默认 `policy_action_horizon=16` 的隐性陷阱**:chunk_size=100 的 ACT 需要
|
| 47 |
-
_Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`: ACT models with chunk_size=100 require horizon ≥ 32 to let the macro-motion segment of each chunk execute._
|
| 48 |
- No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.
|
| 49 |
|
| 50 |
## 训练配方
|
|
@@ -54,69 +65,97 @@ _Training recipe_
|
|
| 54 |
|---|---|
|
| 55 |
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
|
| 56 |
| Policy | `act` (LeRobot implementation) |
|
|
|
|
| 57 |
| Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
|
| 58 |
| `chunk_size` | 100 |
|
| 59 |
| `n_action_steps` | 100 |
|
| 60 |
| Batch size | 8 |
|
| 61 |
| Optimizer | AdamW |
|
| 62 |
| Learning rate | 1e-5 (constant) |
|
| 63 |
-
| Steps |
|
| 64 |
| Image augmentation | **disabled** |
|
| 65 |
| Hardware | RTX 4090 (24 GB) |
|
| 66 |
-
| Wall-clock | ~
|
| 67 |
-
| Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) |
|
| 68 |
|
| 69 |
训练入口脚本在我们的 LeIsaac fork:[`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)。
|
| 70 |
_Training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)._
|
| 71 |
|
| 72 |
-
## 评测结果
|
| 73 |
-
| 74 |
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
| 80 |
|
| 81 |
测试环境 / Test setup:Isaac Sim 5.1,task `LeIsaac-SO101-PickOrange-v0`,`episode_length_s=120`,`step_hz=30`,dual-cam 观测。
|
| 82 |
_Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations._
|
| 83 |
|
| 84 |
-
**单 sample 警告 / Single-sample caveat**:以上 1/1 是单一 episode 结果,未跑统计意义上的多轮平均。但 horizon=8 / 16 / 32 三个失败模式的 monotonic 趋势 (失败 → 部分失败 → 成功) 足以做 falsification — 不是模型问题,是配置问题。
|
| 85 |
-
_The 1/1 success rate is from a single episode, not statistically averaged. However, the monotonic failure-mode pattern across horizon=8/16/32 (stuck → jitter → success) is sufficient as a falsification: this is a configuration issue, not a model capability issue._
|
| 86 |
-
|
| 87 |
## ⚠️ 推理关键配置 / Critical inference caveat
|
| 88 |
|
| 89 |
-
**
|
| 90 |
-
_**
|
| 91 |
|
| 92 |
### 根因 / Root cause
|
| 93 |
|
| 94 |
-
ACT 每个 chunk 输出 100 步动作,是一段**完整规划**
|
| 95 |
-
_Each ACT chunk outputs a 100-step planned trajectory
|
| 96 |
-
|
| 97 |
-
- horizon=8 → 每次只执行前 8 步就丢掉重 query → 永远在执行"启动段",**根本到不了宏观运动** → 卡死。
|
| 98 |
-
_horizon=8 → only the first 8 startup steps are ever executed → the macro-motion never fires → deadlock._
|
| 99 |
-
- horizon=16 → 够第 1 颗的简单"靠近→夹起",但第 2 颗的"放→后退→接近第 2 颗"复杂段需要更长执行窗 → 模型 OOD + 短 horizon 双重打击 → 抖。
|
| 100 |
-
_horizon=16 → enough for the simple "approach → grasp" of orange #1, but the post-1st-orange transition demands a longer execution window → OOD state + short horizon compound → jitter._
|
| 101 |
-
- horizon=32 → gives the macro-motion a chance to execute fully; passes 1/1.
|
| 102 |
|
| 103 |
### 推荐配置 / Recommended settings
|
| 104 |
|
| 105 |
```bash
|
| 106 |
--policy_type=lerobot-act
|
| 107 |
-
--policy_action_horizon=32
|
| 108 |
-
--policy_checkpoint_path=
|
| 109 |
-
--step_hz=30
|
| 110 |
--episode_length_s=120
|
| 111 |
```
|
| 112 |
|
| 113 |
## 使用方法
|
| 114 |
_Usage_
|
| 115 |
|
| 116 |
-
### 1. Start the LeRobot async policy_server
|
| 117 |
|
| 118 |
```bash
|
| 119 |
-
|
|
|
|
| 120 |
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
|
| 121 |
```
|
| 122 |
|
|
@@ -128,26 +167,43 @@ python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
|
|
| 128 |
cd LeIsaac
|
| 129 |
bash scripts/evaluation/run_eval.sh -- \
|
| 130 |
--task=LeIsaac-SO101-PickOrange-v0 \
|
| 131 |
-
--eval_rounds=
|
| 132 |
--episode_length_s=120 \
|
| 133 |
--step_hz=30 \
|
| 134 |
--policy_type=lerobot-act \
|
| 135 |
--policy_host=127.0.0.1 --policy_port=8080 \
|
| 136 |
--policy_checkpoint_path=wsagi/ACT-PickOrange \
|
| 137 |
-
--policy_action_horizon=
|
| 138 |
--policy_language_instruction="Pick up the orange and place it on the plate" \
|
| 139 |
--device=cuda --enable_cameras
|
| 140 |
```
|
| 141 |
|
| 142 |
-
|
| 143 |
-
| 144 |
|
| 145 |
## 局限性
|
| 146 |
_Limitations_
|
| 147 |
|
| 148 |
-
- **数据集 OOD on 2nd-3rd orange**:dataset 60 episode × 每集 1 次"放第 N 颗"演示。第 2/3 颗的 state coverage 比第 1 颗稀疏一个数量级
|
| 149 |
-
_**Dataset OOD on 2nd–3rd orange**: with 60 episodes × 1 "place N-th orange" demo each, state coverage drops by ~1 order of magnitude per orange. Even at horizon=
|
| 150 |
-
-
|
|
|
|
| 151 |
- 无图像增强、无 domain randomization → real-world transfer 可能弱。本 ckpt 仅用于 Isaac Sim 仿真验证,不保证真机 deploy。
|
| 152 |
_No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed._
|
| 153 |
|
|
@@ -156,16 +212,16 @@ _Related_
|
|
| 156 |
|
| 157 |
- 同任务对照 / Same-task comparisons:
|
| 158 |
- [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange) — 自训 Diffusion Policy (267M, DDIM 32-step swap)
|
| 159 |
-
- [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy) —
|
| 160 |
-
- [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0) — GR00T N1.5
|
| 161 |
-
- 完整训练 + eval 配方:[vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
|
| 162 |
|
| 163 |
## 致谢
|
| 164 |
_Acknowledgments_
|
| 165 |
|
| 166 |
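- The LeIsaac team and LightwheelAI, for the task environment and dataset
- The LeRobot team, for the ACT implementation and the async inference framework
- shadowHokage, for publishing the training recipe used as the replication baseline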
|
| 169 |
|
| 170 |
## 引用
|
| 171 |
_Citation_
|
|
|
|
| 32 |
- **任务 / Task**:`Pick up the orange and place it on the plate` — SO-101 单臂依次夹起 3 颗橙子并放盘子。
|
| 33 |
_Single-arm SO-101 picks 3 oranges sequentially and places each on a plate._
|
| 34 |
- **数据集 / Dataset**:[`LightwheelAI/leisaac-pick-orange`](https://huggingface.co/datasets/LightwheelAI/leisaac-pick-orange) — 60 episode 遥操示范。
|
| 35 |
+
- **架构 / Architecture**: ACT with chunk_size=100, ~52M parameters; pure vision + joint state → action-chunk regression (no LLM, no diffusion).
|
| 36 |
+
- **训练 / Training**: lerobot **v0.4.0**, batch=8 / lr=1e-5 / 20k steps / image augmentation disabled, ~10 h on an RTX 4090. **This ckpt = step 18000** (the sweet spot).
|
| 37 |
+
- **评测 / Eval**: Isaac Sim 5.1 + LeIsaac, **5-round × 5-run pooled** = 33/75 oranges = **44.0% per-orange success** (95% CI [29.5%, 58.5%]).
|
| 38 |
+
- **⚠️ 关键 inference 配置 / Critical inference setting**: `policy_action_horizon=70` (the horizon=32 recommended for the old v0.5.2 ckpt does not apply to this v0.4.0 ckpt; see [Inference caveat](#-推理关键配置--critical-inference-caveat)).
|
| 39 |
+
|
| 40 |
+
## 🌳 分支说明 / Branch layout
|
| 41 |
+
|
| 42 |
+
本 repo 有两个 ckpt,分别记录 framework drift 故事的两端:
|
| 43 |
+
_Two checkpoints are tracked in this repo, capturing both ends of the framework drift story:_
|
| 44 |
+
|
| 45 |
+
| Branch | lerobot version | Training step | Best horizon | 🍊 per-orange p | Notes |
|---|---|---|---|---|---|
| **main** (this ckpt) | **v0.4.0** | **18000** | **70** | **0.440** (33/75, 5-run pool) | current canonical |
| `lerobot-v052-ckpt-10k` | v0.5.2 | 10000 | 32 (old recommendation) | 0.267 (4/15, single 5-round run) | archived for framework-drift study |
|
| 49 |
+
|
| 50 |
+
详见下方 [Framework drift section](#framework-drift--lerobot-v04-vs-v05)。
|
| 51 |
+
_See [Framework drift section](#framework-drift--lerobot-v04-vs-v05) below._
|
| 52 |
|
| 53 |
## 模型亮点
|
| 54 |
_Highlights_
|
| 55 |
|
| 56 |
+
- **5-round × 5-run pooled statistics**: 44.0% per-orange (95% CI [29.5%, 58.5%]), significantly better than the public shadowHokage ckpt at 18.3% (95% CI [10.6%, 26.0%]). Welch t-test (per-episode, which removes episode clustering): **p=0.034**; two-proportion Z test: **p=0.008**.
|
| 57 |
+
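- **Exposes lerobot v0.4 → v0.5 framework drift**: with the same dataset, seed, and config, and only the lerobot version changed, the ckpt trained on v0.5.2 drops to 18-27% per-orange (on par with shadowHokage's measured level); pinning back to v0.4.0 restores 44%. See the framework drift section near the bottom.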
|
| 58 |
+
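- **Exposes a hidden trap in LeIsaac's default `policy_action_horizon=16`**: an ACT model with chunk_size=100 needs a per-ckpt sweep to find its best horizon (h=70 for this ckpt; ckpts produced by different training curves have different best horizons).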
|
|
|
|
| 59 |
- No image augmentation, no weight-decay tuning, no special tricks: a clean ACT baseline.
|
| 60 |
|
| 61 |
## 训练配方
|
|
|
|
| 65 |
|---|---|
| Dataset | `LightwheelAI/leisaac-pick-orange` (60 ep, dual-cam 480×640 RGB + 6 DOF state, 30 Hz) |
| Policy | `act` (LeRobot implementation) |
| **lerobot version** | **v0.4.0** (pinned to avoid framework drift) |
| Backbone | ResNet18 vision encoder + Transformer encoder/decoder |
| `chunk_size` | 100 |
| `n_action_steps` | 100 |
| Batch size | 8 |
| Optimizer | AdamW |
| Learning rate | 1e-5 (constant) |
| Steps | 20,000 (this ckpt = step **18000**, the sweet spot found by sweep) |
| Image augmentation | **disabled** |
| Hardware | RTX 4090 (24 GB) |
| Wall-clock | ~10 hours |
| Recipe credit | [shadowHokage/act_policy](https://huggingface.co/shadowHokage/act_policy) (v0.4-era recipe prototype) |
|
| 80 |
|
| 81 |
训练入口脚本在我们的 LeIsaac fork:[`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)。
|
| 82 |
_Training entrypoint script lives in our LeIsaac fork: [`scripts/training/act/train.sh`](https://github.com/vitorcen/LeIsaac-Training/blob/main/scripts/training/act/train.sh)._
|
| 83 |
|
| 84 |
+
## 评测结果 / Eval results
|
| 85 |
+
|
| 86 |
+
### 5-round × 5-run pooled stats (25 episodes total)
|
| 87 |
+
|
| 88 |
+
5-round 协议在 ACT 上 single-run variance 实测 ±40%(同 ckpt 同 horizon 跨 5 runs 范围 2-13/15),所以 canonical 数字必须 pooled multi-run。
|
| 89 |
+
_Under the 5-round protocol, single-run results for ACT swing by about ±40% (same ckpt and same horizon: 2/15 to 13/15 across 5 runs), so canonical numbers must be pooled across multiple runs._
|
| 90 |
+
|
| 91 |
+
| 配置 / Config | 🍊 per-orange p | per-episode mean | 95% CI (per-orange) |
|---|---|---|---|
| **wsagi/ACT-PickOrange v0.4.0 ckpt-18k h=70** (this ckpt, 5 runs) | **0.440** | 1.32/ep | **[0.295, 0.585]** |
| shadowHokage/act_policy h={16,32,64,70} (4 runs) | 0.183 | 0.55/ep | [0.106, 0.260] |
|
| 95 |
+
|
| 96 |
+
**显著性 / Significance**:
- Two-proportion Z test (per-orange, treated as iid): Z = 2.67, **p = 0.008** ✅
- Welch t-test (per-episode, which removes episode-cluster over-dispersion): t = 2.13, df ≈ 38, **p = 0.034** ✅
- Effect ratio: **2.20×**
|
| 100 |
+
|
| 101 |
+
### 0-3 oranges per-episode 分布 / Per-episode oranges distribution
|
| 102 |
+
|
| 103 |
+
ACT chunk-policy 是 trajectory-level 决策,不是 per-orange iid — 一旦 trajectory 进入正确模式 → 3 颗 cluster 连续成功;一旦偏 → 0 颗全废。**实际分布 bimodal 而非 binomial**:
|
| 104 |
+
_ACT chunks make trajectory-level decisions, not per-orange iid — once the trajectory enters the correct mode, all 3 oranges cluster as a successful streak; once it goes off-track, the entire episode is wasted. **Observed distribution is bimodal, not binomial**:_
|
| 105 |
+
|
| 106 |
+
| oranges/ep | observed (25 ep) | Binomial(3, 0.440) expected | observed / expected |
|---|---|---|---|
| 0 | 11 | 4.4 | **2.51×** (over-dispersed) |
| 1 | 2 | 10.3 | 0.19× (under) |
| 2 | 5 | 8.1 | 0.61× (under) |
| 3 | 7 | 2.1 | **3.29×** (over-dispersed) |
|
| 112 |
|
| 113 |
+
两端 (0/3) 比 binomial 预期多 2.5-3.3×,中间 (1/2) 比预期少一半 — bimodal/U-shape 签名。
|
| 114 |
+
_Both tails (0/3) appear 2.5-3.3× more often than binomial; middle bins (1/2) appear at half the expected rate — bimodal/U-shape signature._
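
The expected column in the table above can be re-derived with a few lines of Python. This is a minimal sketch, assuming the expected counts are simply 25 episodes times the Binomial(3, 0.440) probability mass; the function name `binomial_expected` is ours, not part of any evaluation tooling.

```python
from math import comb


def binomial_expected(n_episodes, k_max, p):
    """Expected number of episodes with k successes (k = 0..k_max) under Binomial(k_max, p)."""
    return [
        n_episodes * comb(k_max, k) * p**k * (1 - p) ** (k_max - k)
        for k in range(k_max + 1)
    ]


# Observed episode counts for 0, 1, 2, 3 oranges, copied from the table above.
observed = [11, 2, 5, 7]
expected = binomial_expected(sum(observed), 3, 0.440)

for k, (obs, exp) in enumerate(zip(observed, expected)):
    print(f"{k} oranges: observed={obs:2d}  expected={exp:4.1f}  ratio={obs / exp:.2f}x")
```

The ratios it prints are the observed/expected factors quoted in the table; the excess mass at 0 and 3 is what the bimodal reading rests on.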
|
| 115 |
+
|
| 116 |
+
### Per-run 数据点 / Per-run datapoints
|
| 117 |
+
|
| 118 |
+
ckpt-18k h=70 5 runs (25 episodes total):
|
| 119 |
+
```
run1: [3, 3, 3, 2, 2] = 13/15  (lucky tail, P≈0.003% under binomial)
run2: [1, 1, 0, 0, 0] = 2/15
run3: [2, 0, 3, 0, 3] = 8/15
run4: [3, 0, 0, 2, 0] = 5/15
run5: [0, 0, 3, 2, 0] = 5/15
```
|
| 126 |
+
Range: 2/15 to 13/15 across runs (the ±40% single-run spread); pooled mean = 33/75.
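
As a cross-check, the pooled numbers can be recomputed from the per-run lists above. The sketch below only aggregates the five runs as listed; it assumes 3 oranges per episode (as in the task description) and does not attempt to reproduce the confidence intervals or test statistics.

```python
from collections import Counter

# Oranges placed per episode, copied from the per-run listing above.
runs = {
    "run1": [3, 3, 3, 2, 2],
    "run2": [1, 1, 0, 0, 0],
    "run3": [2, 0, 3, 0, 3],
    "run4": [3, 0, 0, 2, 0],
    "run5": [0, 0, 3, 2, 0],
}
ORANGES_PER_EPISODE = 3  # assumption: every episode offers exactly 3 oranges

per_run = {name: sum(eps) for name, eps in runs.items()}
episodes = [n for eps in runs.values() for n in eps]
pooled_successes = sum(episodes)
pooled_trials = ORANGES_PER_EPISODE * len(episodes)
histogram = Counter(episodes)

print(per_run)
print(f"pooled per-orange rate: {pooled_successes}/{pooled_trials} "
      f"= {pooled_successes / pooled_trials:.3f}")
print(f"per-episode mean: {pooled_successes / len(episodes):.2f}")
print({k: histogram[k] for k in range(ORANGES_PER_EPISODE + 1)})
```

Pooling at the episode level like this also yields the 0/1/2/3 histogram used in the distribution table.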
|
| 127 |
|
| 128 |
测试环境 / Test setup:Isaac Sim 5.1,task `LeIsaac-SO101-PickOrange-v0`,`episode_length_s=120`,`step_hz=30`,dual-cam 观测。
|
| 129 |
_Test setup: Isaac Sim 5.1, task `LeIsaac-SO101-PickOrange-v0`, `episode_length_s=120`, `step_hz=30`, dual-cam observations._
|
| 130 |
|
| 131 |
## ⚠️ 推理关键配置 / Critical inference caveat
|
| 132 |
|
| 133 |
+
**本 v0.4.0 ckpt 最优 horizon = 70**(不是旧 v0.5.2 ckpt 的 32!)。每个训练曲线产出的 ckpt 最优 inference horizon 不同,必须 per-ckpt sweep。
|
| 134 |
+
_**The v0.4.0 ckpt's best horizon is 70** (not the old v0.5.2 ckpt's 32!). Each training trajectory produces a ckpt with different optimal inference horizon — per-ckpt sweep is required._
|
| 135 |
|
| 136 |
### 根因 / Root cause
|
| 137 |
|
| 138 |
+
ACT 每个 chunk 输出 100 步动作,是一段**完整规划**。LeRobot async client 用直接窗口 (receding horizon),每 `policy_action_horizon` 步重新查询一次。**chunk 内 action 一致性** 决定了 best horizon — 训练 framework drift 改了 dataloader RNG / loss normalization → ckpt 内化的 chunk 一致性不同 → 最优 replan 频率不同。
|
| 139 |
+
_Each ACT chunk outputs a 100-step planned trajectory. The LeRobot async client uses a sliding window, re-querying every `policy_action_horizon` steps. **Chunk-internal action coherence** determines the best horizon — framework drift (dataloader RNG / loss normalization) changes the chunk coherence baked into the ckpt → optimal re-plan frequency shifts._
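
To make the re-query mechanism concrete, here is a toy sketch of the loop described above: each query returns a 100-step chunk, only the first `policy_action_horizon` actions are executed, and the rest are discarded. It is an illustration under our own assumptions, not the LeRobot async client's code; `run_receding_horizon` and `toy_policy` are hypothetical names.

```python
from typing import Callable, List, Tuple

CHUNK_SIZE = 100  # actions per ACT chunk, as stated above


def run_receding_horizon(
    policy: Callable[[int], List[int]], total_steps: int, horizon: int
) -> Tuple[List[int], int]:
    """Execute `horizon` actions from each chunk, then re-query the policy.

    Returns (executed_actions, number_of_policy_queries).
    """
    if not 1 <= horizon <= CHUNK_SIZE:
        raise ValueError("horizon must be in [1, CHUNK_SIZE]")
    executed: List[int] = []
    queries = 0
    t = 0
    while t < total_steps:
        chunk = policy(t)            # plan CHUNK_SIZE actions from the current step
        queries += 1
        n = min(horizon, total_steps - t)
        executed.extend(chunk[:n])   # keep the first n actions, discard the rest
        t += n
    return executed, queries


def toy_policy(t: int) -> List[int]:
    """Hypothetical stand-in for the model: action i of a chunk is just its timestep."""
    return list(range(t, t + CHUNK_SIZE))


total = 120 * 30  # episode_length_s * step_hz
for h in (16, 32, 70):
    _, q = run_receding_horizon(toy_policy, total, h)
    print(f"horizon={h}: {q} policy queries, {CHUNK_SIZE - h} actions discarded per chunk")
```

A smaller horizon means more frequent re-planning and a larger discarded tail per chunk, which is why the best value depends on how coherent each chunk is.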
| 140 |
|
| 141 |
### 推荐配置 / Recommended settings
|
| 142 |
|
| 143 |
```bash
--policy_type=lerobot-act
--policy_action_horizon=70       # for THIS ckpt (v0.4.0 ckpt-18k); the old v0.5.2 ckpt uses 32
--policy_checkpoint_path=wsagi/ACT-PickOrange
--step_hz=30                     # matches the dataset's 30 Hz
--episode_length_s=120
```
|
| 150 |
|
| 151 |
## 使用方法
|
| 152 |
_Usage_
|
| 153 |
|
| 154 |
+
### 1. Start the LeRobot async policy_server (lerobot v0.4.0)
|
| 155 |
|
| 156 |
```bash
conda create -n lerobot-v040 python=3.10 -y && conda activate lerobot-v040
pip install lerobot==0.4.0   # pin the version to avoid framework drift
python -m lerobot.async_inference.policy_server --host 0.0.0.0 --port 8080
```
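
Since the results depend on the pinned version, a quick guard before starting the server can catch a mismatched environment. The snippet below is a sketch using only the standard library; it assumes the package is installed under the distribution name `lerobot`, and `is_pinned` / `installed_lerobot_version` are helper names we made up.

```python
from importlib import metadata

REQUIRED = "0.4.0"  # version this checkpoint was trained and evaluated with


def is_pinned(installed, required=REQUIRED):
    """True when the installed version string matches the pinned one exactly."""
    return installed == required


def installed_lerobot_version():
    """Return the installed lerobot version, or None if the package is missing."""
    try:
        return metadata.version("lerobot")
    except metadata.PackageNotFoundError:
        return None


version = installed_lerobot_version()
if version is None:
    print("lerobot is not installed in this environment")
elif is_pinned(version):
    print(f"lerobot {version}: matches the pinned version")
else:
    print(f"lerobot {version}: expected {REQUIRED}; results may differ")
```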
|
| 161 |
|
|
|
|
| 167 |
cd LeIsaac
bash scripts/evaluation/run_eval.sh -- \
  --task=LeIsaac-SO101-PickOrange-v0 \
  --eval_rounds=5 \
  --episode_length_s=120 \
  --step_hz=30 \
  --policy_type=lerobot-act \
  --policy_host=127.0.0.1 --policy_port=8080 \
  --policy_checkpoint_path=wsagi/ACT-PickOrange \
  --policy_action_horizon=70 \
  --policy_language_instruction="Pick up the orange and place it on the plate" \
  --device=cuda --enable_cameras
|
| 179 |
```
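
Because the best horizon has to be found per checkpoint, it can help to generate one eval command per candidate value. The sketch below only builds and prints the command strings, reusing the flags from the command above; the candidate list `(16, 32, 64, 70)` and the helper `eval_command` are illustrative assumptions, and nothing is executed.

```python
import shlex


def eval_command(horizon, rounds=5, checkpoint="wsagi/ACT-PickOrange"):
    """Build the run_eval.sh command line for one horizon value (not executed here)."""
    args = [
        "bash", "scripts/evaluation/run_eval.sh", "--",
        "--task=LeIsaac-SO101-PickOrange-v0",
        f"--eval_rounds={rounds}",
        "--episode_length_s=120",
        "--step_hz=30",
        "--policy_type=lerobot-act",
        "--policy_host=127.0.0.1", "--policy_port=8080",
        f"--policy_checkpoint_path={checkpoint}",
        f"--policy_action_horizon={horizon}",
        "--policy_language_instruction=Pick up the orange and place it on the plate",
        "--device=cuda", "--enable_cameras",
    ]
    return shlex.join(args)  # quotes the instruction argument safely for a shell


for h in (16, 32, 64, 70):  # hypothetical sweep candidates
    print(eval_command(h))
```

Given the single-run spread reported above, each horizon would need several pooled runs before the comparison means anything.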
|
| 180 |
|
| 181 |
+
## Framework drift — lerobot v0.4 vs v0.5
|
| 182 |
+
|
| 183 |
+
本 ckpt 重训于 lerobot **v0.4.0**(锁版本),而不是 main repo 最新 v0.5.x。原因:
|
| 184 |
+
_This ckpt was retrained on lerobot **v0.4.0** (pinned version), not the latest v0.5.x main. Reason:_
|
| 185 |
+
|
| 186 |
+
| Training framework | 5-round per-orange p | Significance |
|---|---|---|
| lerobot v0.4.0 (this ckpt) | **0.440** (5-run pool, 25 ep) | baseline |
| lerobot v0.5.2 + 2 patches | 0.267 (4/15 single 5-round) | -39% vs v0.4.0 (left-tail p≈0.1%) |
| shadowHokage (v0.4 era, 2026-01) | 0.183 (4-h sweep, 20 ep) | -58% vs v0.4.0, Z=2.67 **p=0.008** |
|
| 191 |
+
|
| 192 |
+
**关键发现 / Key findings**:
- lerobot **PR #3406 (a8b72d96)** changed the dataloader (`persistent_workers/uint8/prefetch`); merged 2026-04-19.
- lerobot **PR #3442 (1add4606)** changed the ACT padding loss; merged 2026-04-23.
- Both PRs landed in v0.5.0 (2026-04-26); pinning back to v0.4.0 restores 0.440 per-orange.
|
| 196 |
+
|
| 197 |
+
完整 ablation + 三模型 brainstorm 详见我们的设计文档:[`act_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/act_finetune_pick_orange.html)。
|
| 198 |
+
_Full ablation + 3-model brainstorm in our design doc: [`act_finetune_pick_orange.html`](https://github.com/vitorcen/LeIsaac-Training/blob/main/docs/training/act_finetune_pick_orange.html)._
|
| 199 |
|
| 200 |
## 局限性
|
| 201 |
_Limitations_
|
| 202 |
|
| 203 |
+
- **数据集 OOD on 2nd-3rd orange**:dataset 60 episode × 每集 1 次"放第 N 颗"演示。第 2/3 颗的 state coverage 比第 1 颗稀疏一个数量级。即便 horizon=70 + 5-run pooled,**精度仍随颗数线性退化**。这是数据问题不是模型问题。
|
| 204 |
+
_**Dataset OOD on 2nd–3rd orange**: with 60 episodes × 1 "place N-th orange" demo each, state coverage drops by ~1 order of magnitude per orange. Even at horizon=70 with 5-run pooling, accuracy degrades linearly across oranges. This is a data issue, not a model issue._
|
| 205 |
+
- **5-round single-run variance ±40%** — 任何单次 5-round 数字(包括 13/15 lucky tail)都不构成证据;至少 ≥3 runs pool。
|
| 206 |
+
_**±40% single-run variance** — any single 5-round number (including 13/15 lucky tails) is noise; pool ≥3 runs._
|
| 207 |
- 无图像增强、无 domain randomization → real-world transfer 可能弱。本 ckpt 仅用于 Isaac Sim 仿真验证,不保证真机 deploy。
|
| 208 |
_No image augmentation or domain randomization → real-world transfer is likely weak. This checkpoint is only validated in Isaac Sim simulation; real-robot deployment is not guaranteed._
|
| 209 |
|
|
|
|
| 212 |
|
| 213 |
- 同任务对照 / Same-task comparisons:
|
| 214 |
- [`wsagi/DiffusionPolicy-PickOrange`](https://huggingface.co/wsagi/DiffusionPolicy-PickOrange): self-trained Diffusion Policy (267M, DDIM 32-step swap)
- [`shadowHokage/act_policy`](https://huggingface.co/shadowHokage/act_policy): public v0.4-era ckpt (4-run pool = 18.3%)
- [`LightwheelAI/leisaac-pick-orange-v0`](https://huggingface.co/LightwheelAI/leisaac-pick-orange-v0): GR00T N1.5 baseline
- Full training + eval recipe + framework drift investigation: [vitorcen/LeIsaac-Training](https://github.com/vitorcen/LeIsaac-Training) fork
|
| 218 |
|
| 219 |
## 致谢
|
| 220 |
_Acknowledgments_
|
| 221 |
|
| 222 |
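- The LeIsaac team and LightwheelAI, for the task environment and dataset
- The LeRobot team, for the ACT implementation and the async inference framework
- shadowHokage, for publishing the training recipe used as the replication baseline (which exposed the framework drift issue)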
|
| 225 |
|
| 226 |
## 引用
|
| 227 |
_Citation_
|