pi0.5 subtask fine-tune

A 100-step fine-tune of pi05_base for subtask generation, following the recipe from the original pi0.5 paper. It reproduces the steps studied in openpi community issue #701.

TL;DR

  • Start weights: gs://openpi-assets/checkpoints/pi05_base/params
  • Config: pi05_subtask_libero (adds Pi05Subtask head: joint flow-matching + CE-on-subtask-tokens loss)
  • Training: 100 steps × batch 8 on 30 LIBERO episodes, 1× H100 on Modal
  • Final loss: 3.04 → 0.23

Loading

```python
from pathlib import Path
import tarfile

import jax
import jax.numpy as jnp
import flax.nnx as nnx
from huggingface_hub import hf_hub_download

from openpi.models import model as _model
from openpi.models.pi0_config import Pi0Config

# 1. Download + extract the checkpoint archive
tar = hf_hub_download("swatery/pi05-subtask", "jax/pi05_subtask.tar")
tarfile.open(tar).extractall(".")
ckpt = Path("99")  # final training step

# 2. Build the model and restore the fine-tuned weights
config = Pi0Config(pi05=True)
model = config.create(jax.random.key(0))
params = _model.restore_params(ckpt / "params", dtype=jnp.bfloat16)
nnx.update(model, nnx.State(params))
model.eval()
```

For end-to-end subtask generation (JIT-compiled autoregressive decoding with an ASCII vocabulary mask over PaliGemma's LM head), see the SubtaskGenerator implementation in openpi/hosting (src/hosting/subtask_generator.py). That module loads a checkpoint like this one and calls .generate(prompt, images).
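
The vocabulary-masking idea itself is simple to illustrate. A minimal NumPy sketch (the vocabulary format and function names here are hypothetical, not the SubtaskGenerator API): logits of non-ASCII tokens are set to minus infinity before the argmax, so greedy decoding can only ever emit ASCII pieces.

```python
import numpy as np

def ascii_token_mask(vocab):
    """Boolean mask over the vocabulary: True where the token's surface
    string is pure printable ASCII. `vocab` maps token id -> string."""
    mask = np.zeros(len(vocab), dtype=bool)
    for tok_id, piece in vocab.items():
        mask[tok_id] = all(32 <= ord(c) < 127 for c in piece)
    return mask

def masked_greedy_step(logits, mask):
    """Apply the ASCII mask before argmax so a decode step can only
    select tokens allowed by the mask."""
    masked = np.where(mask, logits, -np.inf)
    return int(np.argmax(masked))
```

Even if a non-ASCII token has the highest raw logit, the step falls back to the best ASCII token.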

Training details

| Setting | Value |
|---|---|
| Architecture | pi0.5: PaliGemma + Gemma action expert, with Pi05Subtask head |
| Loss | Flow-matching (actions) + cross-entropy (subtask tokens) |
| Knowledge insulation | Yes: the LM backbone receives only CE gradients |
| Steps | 100 |
| Batch size | 8 (global, single device) |
| Optimizer | AdamW, cosine schedule, peak LR 5e-5, warmup 10k (with only 100 steps, training never leaves the warmup ramp) |
| EMA decay | 0.999 |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA H100 80GB (Modal) |
| Wall-clock | ~10 min training + ~5 min data/weight fetch |

Data

  • Dataset: first 30 episodes of physical-intelligence/libero chunk-000 (~8,294 frames)
  • Norm stats: reused pi05_libero's precomputed full-dataset stats from gs://openpi-assets/checkpoints/pi05_libero/assets/
  • Subtask annotation: identity — high_prompt = low_prompt = task_prompt (real hierarchical subtask annotations for LIBERO are not publicly available)
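
The identity annotation amounts to a one-line data transform. A sketch, with field names taken from the description above (the actual openpi transform may differ):

```python
def annotate_identity(frame: dict) -> dict:
    """Fill both hierarchy levels with the flat task prompt, since LIBERO
    has no public hierarchical subtask labels."""
    out = dict(frame)
    out["high_prompt"] = frame["task_prompt"]
    out["low_prompt"] = frame["task_prompt"]
    return out
```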

License

  • Code & fine-tuned weights: Apache 2.0 (inherited from openpi)
  • Gemma dependency: this checkpoint is derived from Google's Gemma via PaliGemma. Usage is subject to the Gemma Terms of Use in addition to Apache 2.0.