---
license: apache-2.0
language:
- en
tags:
- robotics
- vla
- pi05
- subtask
- openpi
- lerobot
- orbax
datasets:
- physical-intelligence/libero
pipeline_tag: robotics
---

# pi0.5 subtask fine-tune

A 100-step fine-tune of `pi05_base` for subtask generation, as described in the original [pi05 paper](https://www.pi.website/download/pi05.pdf).
We reproduce the steps from openpi community issue [#701](https://github.com/Physical-Intelligence/openpi/issues/701), which studies subtask generation.

## TL;DR

- **Start weights**: `gs://openpi-assets/checkpoints/pi05_base/params`
- **Config**: `pi05_subtask_libero` (adds a `Pi05Subtask` head: joint flow-matching + cross-entropy-on-subtask-tokens loss; see the loss sketch after this list)
- **Training**: 100 steps × batch size 8 on 30 LIBERO episodes, 1× H100 on Modal
- **Training loss**: 3.04 → 0.23
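
A minimal sketch of how the two objectives can combine, with hypothetical tensor names and an unweighted sum (the actual `Pi05Subtask` head lives in the training fork listed under References):

```python
import jax.numpy as jnp
import optax

def joint_loss(flow_matching_loss, subtask_logits, subtask_tokens, token_mask):
    # Cross-entropy over the subtask text tokens, averaged over valid positions.
    ce = optax.softmax_cross_entropy_with_integer_labels(subtask_logits, subtask_tokens)
    ce = jnp.sum(ce * token_mask) / jnp.maximum(jnp.sum(token_mask), 1)
    # Total loss: flow-matching (action) term plus the subtask CE term.
    return flow_matching_loss + ce
```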

## Loading

```python
from pathlib import Path
import tarfile

import jax
import jax.numpy as jnp
import flax.nnx as nnx
from huggingface_hub import hf_hub_download

from openpi.models import model as _model
from openpi.models.pi0_config import Pi0Config

# 1. Download and extract the checkpoint archive
tar = hf_hub_download("swatery/pi05-subtask", "jax/pi05_subtask.tar")
with tarfile.open(tar) as f:
    f.extractall(".")
ckpt = Path("99")  # the archive contains the step-99 checkpoint directory

# 2. Build the model and restore the fine-tuned weights
config = Pi0Config(pi05=True)
model = config.create(jax.random.key(0))
params = _model.restore_params(ckpt / "params", dtype=jnp.bfloat16)
nnx.update(model, nnx.State(params))
model.eval()  # disable train-time behavior (e.g. dropout) for inference
```

For end-to-end subtask generation (a JIT-compiled autoregressive decode with an ASCII vocabulary mask over PaliGemma's LM head), see the `SubtaskGenerator` implementation in [openpi/hosting](https://github.com/Hebbian-Robotics/openpi), `src/hosting/subtask_generator.py`.
That module loads a checkpoint like this one and calls `.generate(prompt, images)`.
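
The ASCII vocabulary mask can be built once and applied at every decode step. A minimal sketch of the idea, assuming a tokenizer with a `decode` method (the token filtering in the actual `SubtaskGenerator` may differ):

```python
import jax.numpy as jnp

def build_ascii_mask(tokenizer, vocab_size: int) -> jnp.ndarray:
    # Boolean mask that is True only for tokens decoding to printable ASCII,
    # so the subtask decode can never emit non-text tokens.
    allowed = [
        i for i in range(vocab_size)
        if (s := tokenizer.decode([i])).isascii() and s.isprintable()
    ]
    return jnp.zeros((vocab_size,), dtype=bool).at[jnp.array(allowed)].set(True)

def masked_greedy_step(logits: jnp.ndarray, ascii_mask: jnp.ndarray) -> jnp.ndarray:
    # Push disallowed logits to -inf before taking the argmax.
    return jnp.argmax(jnp.where(ascii_mask, logits, -jnp.inf), axis=-1)
```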

## Training details

| Setting | Value |
|---|---|
| Architecture | pi0.5 (PaliGemma + Gemma action expert) with a `Pi05Subtask` head |
| Loss | Flow-matching (actions) + cross-entropy (subtask tokens) |
| Knowledge insulation | Yes; the LM backbone receives only CE gradients (see the sketch below) |
| Steps | 100 |
| Batch size | 8 (global, single device) |
| Optimizer | AdamW, cosine schedule, peak LR 5e-5, 10k warmup steps (a 100-step run stays entirely inside warmup, so the LR ramps to only ~1% of peak) |
| EMA decay | 0.999 |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA H100 80GB (Modal) |
| Wall-clock | ~10 min training + ~5 min data/weight fetch |

### Data

- **Dataset**: first 30 episodes of `physical-intelligence/libero` chunk-000 (~8,294 frames)
- **Norm stats**: reused `pi05_libero`'s precomputed full-dataset stats from `gs://openpi-assets/checkpoints/pi05_libero/assets/`
- **Subtask annotation**: **identity**, i.e. `high_prompt = low_prompt = task_prompt`
  (real hierarchical subtask annotations for LIBERO are not publicly available; see the sketch below)
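
The identity annotation amounts to a trivial per-frame transform. A hedged sketch, with field names taken from the description above rather than the training fork's exact schema:

```python
def annotate_identity(frame: dict) -> dict:
    # No hierarchical subtask labels exist for LIBERO, so the high-level
    # (subtask) prompt and the low-level prompt both copy the task prompt.
    prompt = frame["task_prompt"]
    return {**frame, "high_prompt": prompt, "low_prompt": prompt}
```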

## References

- https://www.pi.website/blog/pi05 (pi0.5 blog post)
- https://github.com/Physical-Intelligence/openpi (upstream pi0.5 implementation)
- https://github.com/Physical-Intelligence/openpi/issues/701 (community issue thread reproducing subtask generation)
- https://github.com/LisavilaLee/openpi_with_subtask (fork with a training example)

## License

- Code & fine-tuned weights: Apache 2.0 (inherited from openpi)
- Gemma dependency: this checkpoint is derived from Google's Gemma via PaliGemma. Usage is subject to the Gemma Terms of Use in addition to Apache 2.0.