---
license: apache-2.0
language:
- en
tags:
- robotics
- vla
- pi05
- subtask
- openpi
- lerobot
- orbax
datasets:
- physical-intelligence/libero
pipeline_tag: robotics
---

# pi0.5 subtask fine-tune

A 100-step fine-tune of `pi05_base` for subtask generation as described in the original [pi0.5 paper](https://www.pi.website/download/pi05.pdf). We reproduced the steps from a community issue thread on openpi that studies this: [#701](https://github.com/Physical-Intelligence/openpi/issues/701).

## TL;DR

- **Start weights**: `gs://openpi-assets/checkpoints/pi05_base/params`
- **Config**: `pi05_subtask_libero` (adds a `Pi05Subtask` head: joint flow-matching + CE-on-subtask-tokens loss)
- **Training**: 100 steps × batch 8 on 30 LIBERO episodes, 1× H100 on Modal
- **Final loss**: 3.04 → 0.23

## Loading

```python
from pathlib import Path
import tarfile

import jax
import jax.numpy as jnp
import flax.nnx as nnx
from huggingface_hub import hf_hub_download

from openpi.models import model as _model
from openpi.models.pi0_config import Pi0Config

# 1. Download and extract the checkpoint archive
tar = hf_hub_download("swatery/pi05-subtask", "jax/pi05_subtask.tar")
tarfile.open(tar).extractall(".")
ckpt = Path("99")

# 2. Build the model and restore weights
config = Pi0Config(pi05=True)
model = config.create(jax.random.key(0))
params = _model.restore_params(ckpt / "params", dtype=jnp.bfloat16)
nnx.update(model, nnx.State(params))
model.eval()
```

For end-to-end subtask generation (JIT-compiled autoregressive decoding with an ASCII vocabulary mask over PaliGemma's LM head), see the `SubtaskGenerator` implementation in [openpi/hosting](https://github.com/Hebbian-Robotics/openpi), `src/hosting/subtask_generator.py`. That module loads a checkpoint like this one and calls `.generate(prompt, images)`.
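The ASCII vocabulary mask mentioned above can be sketched as follows. This is an illustrative reconstruction, not the actual `SubtaskGenerator` code: the toy `vocab` list stands in for PaliGemma's tokenizer, and the real implementation builds the mask from the full vocabulary once and reuses it inside the JIT-compiled decode loop.

```python
import jax.numpy as jnp

# Toy vocabulary standing in for PaliGemma's tokenizer (illustrative only).
vocab = ["pick", "up", "the", "cup", "\u00e9clair", "<eos>", "\ufffd"]

def ascii_mask(vocab: list[str]) -> jnp.ndarray:
    # Allow a token iff every character is printable ASCII;
    # "<eos>" passes this test too, so decoding can still terminate.
    return jnp.array([all(32 <= ord(c) < 127 for c in tok) for tok in vocab])

def masked_logits(logits: jnp.ndarray, mask: jnp.ndarray) -> jnp.ndarray:
    # Disallowed tokens get -inf, so softmax/argmax can never select them.
    return jnp.where(mask, logits, -jnp.inf)

mask = ascii_mask(vocab)
# Non-ASCII tokens score highest here, but the mask rules them out.
logits = jnp.array([0.1, 0.2, 0.3, 0.4, 5.0, 0.0, 9.0])
best = int(jnp.argmax(masked_logits(logits, mask)))
print(vocab[best])  # → "cup"
```

Masking logits with `-inf` before sampling is the standard way to constrain autoregressive decoding to a token subset without retraining the LM head.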
## Training details

| Setting | Value |
|---|---|
| Architecture | pi0.5 (PaliGemma + Gemma action expert) with `Pi05Subtask` head |
| Loss | Flow-matching (actions) + cross-entropy (subtask tokens) |
| Knowledge insulation | Yes (LM backbone receives only CE gradients) |
| Steps | 100 |
| Batch size | 8 (global, single device) |
| Optimizer | AdamW, cosine schedule, peak LR 5e-5, warmup 10k steps (only 100 steps run, so the LR stays on the warmup ramp) |
| EMA decay | 0.999 |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA H100 80GB (Modal) |
| Wall-clock | ~10 min training + ~5 min data/weight fetch |

### Data

- **Dataset**: first 30 episodes of `physical-intelligence/libero` chunk-000 (~8,294 frames)
- **Norm stats**: reused `pi05_libero`'s precomputed full-dataset stats from `gs://openpi-assets/checkpoints/pi05_libero/assets/`
- **Subtask annotation**: **identity**, i.e. `high_prompt = low_prompt = task_prompt` (real hierarchical subtask annotations for LIBERO are not publicly available)

## References

- https://www.pi.website/blog/pi05
- https://github.com/Physical-Intelligence/openpi (upstream pi0.5 implementation)
- https://github.com/Physical-Intelligence/openpi/issues/701 (community issue thread reproducing subtask generation)
- https://github.com/LisavilaLee/openpi_with_subtask (fork with training example)

## License

- Code & fine-tuned weights: Apache 2.0 (inherited from openpi)
- Gemma dependency: this checkpoint is derived from Google's Gemma via PaliGemma, so usage is subject to the Gemma Terms of Use in addition to Apache 2.0.
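### Appendix: knowledge-insulation sketch

The knowledge-insulation setting in the training table means the flow-matching (action) loss does not backpropagate into the LM backbone; only the subtask cross-entropy loss does. A minimal sketch of this pattern in JAX, with illustrative loss functions that are not openpi's actual API:

```python
import jax
import jax.numpy as jnp

def total_loss(backbone_feats, action_loss_fn, ce_loss_fn):
    # The action loss sees backbone features only through stop_gradient,
    # so its gradients never reach the LM backbone ("knowledge insulation").
    insulated = jax.lax.stop_gradient(backbone_feats)
    return action_loss_fn(insulated) + ce_loss_fn(backbone_feats)

# Tiny check: the gradient w.r.t. the features comes only from the CE term.
feats = jnp.ones(4)
grad = jax.grad(total_loss)(
    feats,
    lambda f: (f ** 2).sum(),   # stand-in for the flow-matching loss
    lambda f: (3.0 * f).sum(),  # stand-in for the subtask CE loss
)
print(grad)  # → [3. 3. 3. 3.]; the f**2 term contributes nothing
```

The same `stop_gradient` boundary generalizes to the real model, where the two losses share backbone features but only one is allowed to update the backbone.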