pi0.5 subtask fine-tune
A 100-step fine-tune of pi05_base for the subtask generation described in the original pi05 paper.
We reproduced the steps from a community issue thread on openpi that studies this setup (#701).
TL;DR
- Start weights: `gs://openpi-assets/checkpoints/pi05_base/params`
- Config: `pi05_subtask_libero` (adds the Pi05Subtask head: joint flow-matching + CE-on-subtask-tokens loss)
- Training: 100 steps × batch 8 on 30 LIBERO episodes, 1× H100 on Modal
- Final loss: 3.04 → 0.23
Loading
```python
from pathlib import Path
import tarfile

import jax
import jax.numpy as jnp
import flax.nnx as nnx
from huggingface_hub import hf_hub_download

from openpi.models import model as _model
from openpi.models.pi0 import Pi0
from openpi.models.pi0_config import Pi0Config

# 1. Download + extract the checkpoint tarball
tar = hf_hub_download("swatery/pi05-subtask", "jax/pi05_subtask.tar")
tarfile.open(tar).extractall(".")
ckpt = Path("99")  # checkpoint directory extracted from the tarball

# 2. Build the model and restore weights
config = Pi0Config(pi05=True)
model = config.create(jax.random.key(0))
params = _model.restore_params(ckpt / "params", dtype=jnp.bfloat16)
nnx.update(model, nnx.State(params))
model.eval()
```
For end-to-end subtask generation (JIT-compiled AR decode with an ASCII vocab mask over PaliGemma's LM head), see the SubtaskGenerator implementation in openpi/hosting (src/hosting/subtask_generator.py). That module loads a checkpoint like this one and calls `.generate(prompt, images)`.
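The vocab-mask idea can be sketched independently of the model: a greedy decoder adds `-inf` to every logit outside an allowed token set, so only tokens from that set can ever be emitted. This is a minimal illustration, not the actual SubtaskGenerator; `toy_logits` below is a stand-in for a forward pass, and the real implementation masks to the ASCII subset of PaliGemma's vocabulary.

```python
import numpy as np

def masked_greedy_decode(logits_fn, vocab_size, allowed_ids, eos_id, max_len=16):
    """Greedy AR decode that only ever emits tokens from `allowed_ids`."""
    mask = np.full(vocab_size, -np.inf)
    mask[list(allowed_ids)] = 0.0          # allowed tokens keep their logits
    tokens = []
    for _ in range(max_len):
        logits = logits_fn(tokens) + mask  # disallowed tokens become -inf
        next_id = int(np.argmax(logits))
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

# Toy stand-in for the model: prefers token 2 twice, then token 3 (EOS).
def toy_logits(tokens):
    logits = np.zeros(10)
    logits[2] = 5.0 if len(tokens) < 2 else 0.0
    logits[3] = 4.0
    logits[7] = 9.0  # highest-scoring token, but outside the allowed set
    return logits

print(masked_greedy_decode(toy_logits, 10, allowed_ids={2, 3}, eos_id=3))  # [2, 2, 3]
```

Note that token 7 is never emitted despite having the highest raw logit; the mask removes it from consideration before the argmax.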
Training details
| Setting | Value |
| --- | --- |
| Architecture | pi0.5: PaliGemma + Gemma action expert, with Pi05Subtask head |
| Loss | Flow-matching (action) + cross-entropy (subtask tokens) |
| Knowledge insulation | Yes (LM backbone receives only CE gradients) |
| Steps | 100 |
| Batch size | 8 (global, single device) |
| Optimizer | AdamW, cosine schedule, peak LR 5e-5, 10k-step warmup (with only 100 training steps, the run never leaves the warmup ramp) |
| EMA decay | 0.999 |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA H100 80GB (Modal) |
| Wall-clock | ~10 min training + ~5 min data/weight fetch |
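The knowledge-insulation row can be illustrated with a toy JAX loss: the action term sees backbone features only through `jax.lax.stop_gradient`, so the backbone's gradients come from the cross-entropy term alone. The model below is a made-up two-head toy, not pi0.5's architecture, and the MSE term merely stands in for the flow-matching loss.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, token_target, action_target):
    feats = jnp.tanh(x @ params["backbone"])            # shared backbone features
    logits = feats @ params["lm_head"]
    ce = -jax.nn.log_softmax(logits)[token_target]      # CE grads reach the backbone
    insulated = jax.lax.stop_gradient(feats)            # cut the gradient path
    action_pred = insulated @ params["action_head"]
    fm = jnp.mean((action_pred - action_target) ** 2)   # stand-in for flow matching
    return ce + fm

params = {
    "backbone": jnp.ones((3, 4)),
    "lm_head": jnp.arange(20.0).reshape(4, 5),
    "action_head": jnp.ones((4, 2)),
}
grads = jax.grad(loss_fn)(params, jnp.ones(3), 1, jnp.zeros(2))
# grads["backbone"] is identical to the gradient of a CE-only loss;
# grads["action_head"] is still nonzero, so the action head trains normally.
```

The action head still receives its full gradient; only the path from the action loss back into the shared backbone is severed.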
Data
- Dataset: first 30 episodes of `physical-intelligence/libero` (chunk-000, ~8,294 frames)
- Norm stats: reused pi05_libero's precomputed full-dataset stats from `gs://openpi-assets/checkpoints/pi05_libero/assets/`
- Subtask annotation: identity (`high_prompt = low_prompt = task_prompt`); real hierarchical subtask annotations for LIBERO are not publicly available
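Reusing precomputed norm stats means applying a stored mean/std to actions at train time and inverting the transform at inference. A minimal sketch, with an assumed stats layout; the actual key names and file format under openpi's assets directory may differ:

```python
import numpy as np

# Assumed stats layout; the real openpi asset files may be structured differently.
stats = {"mean": np.array([0.1, -0.2]), "std": np.array([0.5, 0.4])}

def normalize_actions(actions, stats, eps=1e-6):
    """Map raw actions into the normalized space the model was trained on."""
    return (actions - stats["mean"]) / (stats["std"] + eps)

def unnormalize_actions(norm, stats, eps=1e-6):
    """Invert normalization when decoding model outputs back to raw actions."""
    return norm * (stats["std"] + eps) + stats["mean"]
```

Because the stats were computed over the full pi05_libero dataset rather than the 30-episode subset, the normalized action distribution matches what the base checkpoint already expects.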
References
- https://www.pi.website/blog/pi05
- https://github.com/Physical-Intelligence/openpi (upstream pi0.5 implementation)
- https://github.com/Physical-Intelligence/openpi/issues/701 (community issue thread reproducing subtask generation)
- https://github.com/LisavilaLee/openpi_with_subtask (fork with training example)
License
- Code & fine-tuned weights: Apache 2.0 (inherited from openpi)
- Gemma dependency: this checkpoint is derived from Google's Gemma via PaliGemma. Usage is subject to the Gemma Terms of Use in addition to Apache 2.0.