pi0.5 subtask fine-tune

A 100-step fine-tune of pi05_base for subtask generation, following the recipe from the original pi0.5 paper. It reproduces the steps studied in openpi community issue #701.

TL;DR

  • Start weights: gs://openpi-assets/checkpoints/pi05_base/params
  • Config: pi05_subtask_libero (adds Pi05Subtask head: joint flow-matching + CE-on-subtask-tokens loss)
  • Training: 100 steps × batch 8 on 30 LIBERO episodes, 1× H100 on Modal
  • Final loss: 3.04 → 0.23

Loading

```python
from pathlib import Path
import tarfile

import jax
import jax.numpy as jnp
import flax.nnx as nnx
from huggingface_hub import hf_hub_download

from openpi.models import model as _model
from openpi.models.pi0_config import Pi0Config

# 1. Download + extract the checkpoint archive
tar = hf_hub_download("swatery/pi05-subtask", "jax/pi05_subtask.tar")
tarfile.open(tar).extractall(".")
ckpt = Path("99")  # final training step

# 2. Build the model and restore the fine-tuned weights
config = Pi0Config(pi05=True)
model = config.create(jax.random.key(0))
params = _model.restore_params(ckpt / "params", dtype=jnp.bfloat16)
nnx.update(model, nnx.State(params))
model.eval()
```

For end-to-end subtask generation (JIT-compiled autoregressive decoding with an ASCII vocabulary mask over PaliGemma's LM head), see the SubtaskGenerator implementation in openpi/hosting (src/hosting/subtask_generator.py). That module loads a checkpoint like this one and calls .generate(prompt, images).
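
The vocabulary-masking idea itself is simple to illustrate. A minimal NumPy sketch (the vocabulary format and function names here are hypothetical, not the SubtaskGenerator API): logits of non-ASCII tokens are set to minus infinity before the argmax, so greedy decoding can only ever emit ASCII pieces.

```python
import numpy as np

def ascii_token_mask(vocab):
    """Boolean mask over the vocabulary: True where the token's surface
    string is pure printable ASCII. `vocab` maps token id -> string."""
    mask = np.zeros(len(vocab), dtype=bool)
    for tok_id, piece in vocab.items():
        mask[tok_id] = all(32 <= ord(c) < 127 for c in piece)
    return mask

def masked_greedy_step(logits, mask):
    """Apply the ASCII mask before argmax so a decode step can only
    select tokens allowed by the mask."""
    masked = np.where(mask, logits, -np.inf)
    return int(np.argmax(masked))
```

Even if a non-ASCII token has the highest raw logit, the step falls back to the best ASCII token.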

Training details

| Setting | Value |
|---|---|
| Architecture | pi0.5: PaliGemma + Gemma action expert, with Pi05Subtask head |
| Loss | Flow-matching (actions) + cross-entropy (subtask tokens) |
| Knowledge insulation | Yes: the LM backbone receives only CE gradients |
| Steps | 100 |
| Batch size | 8 (global, single device) |
| Optimizer | AdamW, cosine schedule, peak LR 5e-5, warmup 10k (with only 100 steps, training never leaves the warmup ramp) |
| EMA decay | 0.999 |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA H100 80GB (Modal) |
| Wall-clock | ~10 min training + ~5 min data/weight fetch |

Data

  • Dataset: first 30 episodes of physical-intelligence/libero chunk-000 (~8,294 frames)
  • Norm stats: reused pi05_libero's precomputed full-dataset stats from gs://openpi-assets/checkpoints/pi05_libero/assets/
  • Subtask annotation: identity — high_prompt = low_prompt = task_prompt (real hierarchical subtask annotations for LIBERO are not publicly available)
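
The identity annotation amounts to a one-line data transform. A sketch, with field names taken from the description above (the actual openpi transform may differ):

```python
def annotate_identity(frame: dict) -> dict:
    """Fill both hierarchy levels with the flat task prompt, since LIBERO
    has no public hierarchical subtask labels."""
    out = dict(frame)
    out["high_prompt"] = frame["task_prompt"]
    out["low_prompt"] = frame["task_prompt"]
    return out
```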

License

  • Code & fine-tuned weights: Apache 2.0 (inherited from openpi)
  • Gemma dependency: this checkpoint is derived from Google's Gemma via PaliGemma. Usage is subject to the Gemma Terms of Use in addition to Apache 2.0.