---
license: apache-2.0
language:
- en
tags:
- robotics
- vla
- pi05
- subtask
- openpi
- lerobot
- orbax
datasets:
- physical-intelligence/libero
pipeline_tag: robotics
---
# pi0.5 subtask fine-tune
A 100-step fine-tune of `pi05_base` for subtask generation from the original [pi05 paper](https://www.pi.website/download/pi05.pdf).
This reproduces the recipe from an openpi community issue thread that studies subtask generation ([#701](https://github.com/Physical-Intelligence/openpi/issues/701)).
## TL;DR
- **Start weights**: `gs://openpi-assets/checkpoints/pi05_base/params`
- **Config**: `pi05_subtask_libero` (adds `Pi05Subtask` head: joint flow-matching + CE-on-subtask-tokens loss)
- **Training**: 100 steps × batch 8 on 30 LIBERO episodes, 1× H100 on Modal
- **Final loss**: 3.04 → 0.23
## Loading
```python
from pathlib import Path
import jax
import jax.numpy as jnp
import flax.nnx as nnx
from huggingface_hub import hf_hub_download
import tarfile
from openpi.models import model as _model
from openpi.models.pi0_config import Pi0Config

# 1. Download + extract the checkpoint tarball
tar = hf_hub_download("swatery/pi05-subtask", "jax/pi05_subtask.tar")
tarfile.open(tar).extractall(".")
ckpt = Path("99")
# 2. Build model and restore weights
config = Pi0Config(pi05=True)
model = config.create(jax.random.key(0))
params = _model.restore_params(ckpt / "params", dtype=jnp.bfloat16)
nnx.update(model, nnx.State(params))
model.eval()
```
For end-to-end subtask generation (JIT-compiled AR decode with ASCII vocab mask over PaliGemma's LM head), see the `SubtaskGenerator` implementation in [openpi/hosting](https://github.com/Hebbian-Robotics/openpi) `src/hosting/subtask_generator.py`.
That module loads a checkpoint like this one and calls `.generate(prompt, images)`.
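The vocab-masked AR decode mentioned above can be sketched in isolation. This is a minimal, hypothetical illustration of the idea (mask out non-ASCII tokens by setting their logits to -inf before greedy argmax), not the actual `SubtaskGenerator` API; the toy vocabulary and function names are assumptions.

```python
import numpy as np

def ascii_mask(vocab):
    """Build a boolean mask that keeps only tokens whose text is
    printable ASCII. `vocab` maps token id -> token string
    (a hypothetical toy vocabulary, not PaliGemma's real tokenizer)."""
    mask = np.zeros(len(vocab), dtype=bool)
    for tid, tok in vocab.items():
        mask[tid] = len(tok) > 0 and all(32 <= ord(c) < 127 for c in tok)
    return mask

def masked_greedy_step(logits, mask):
    """One greedy decode step: disallowed tokens get -inf logits,
    so argmax can only pick an allowed (ASCII) token."""
    logits = np.where(mask, logits, -np.inf)
    return int(np.argmax(logits))

# Example: the highest-logit token is non-ASCII, so it is skipped.
vocab = {0: "pick", 1: "\x00", 2: "place"}
logits = np.array([0.1, 5.0, 3.0])
next_id = masked_greedy_step(logits, ascii_mask(vocab))  # -> 2 ("place")
```

In the real generator the same masking is applied inside a JIT-compiled decode loop over PaliGemma's LM head.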
## Training details
| Setting | Value |
|---|---|
| Architecture | pi0.5 — PaliGemma + Gemma action expert, with `Pi05Subtask` head |
| Loss | Flow-matching (action) + cross-entropy (subtask tokens) |
| Knowledge insulation | Yes — LM backbone receives only CE gradients |
| Steps | 100 |
| Batch size | 8 (global, single device) |
| Optimizer | AdamW, cosine schedule, peak LR 5e-5, warmup 10k steps (the 100-step run stays entirely within warmup, so the LR ramps linearly and never reaches the peak) |
| EMA decay | 0.999 |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA H100 80GB (Modal) |
| Wall-clock | ~10 min training + ~5 min data/weight fetch |
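The knowledge-insulation row above can be made concrete with a small sketch: the flow-matching (action) loss sees a `stop_gradient` copy of the LM backbone features, so only the subtask cross-entropy gradients flow into the backbone. All function names here are illustrative, not the openpi `Pi05Subtask` implementation.

```python
import jax
import jax.numpy as jnp

def joint_loss(backbone_feats, flow_loss_fn, ce_loss_fn, ce_weight=1.0):
    """Sketch of the joint objective with knowledge insulation.
    `flow_loss_fn` / `ce_loss_fn` stand in for the flow-matching action
    loss and the subtask-token cross-entropy (hypothetical signatures)."""
    insulated = jax.lax.stop_gradient(backbone_feats)  # no FM grads to backbone
    fm = flow_loss_fn(insulated)   # trains the action expert only
    ce = ce_loss_fn(backbone_feats)  # CE grads do reach the backbone
    return fm + ce_weight * ce

# Toy check: with fm = x**2 and ce = 3*x, the gradient w.r.t. the
# backbone features is 3.0 -- the FM term contributes nothing.
g = jax.grad(lambda f: joint_loss(f, lambda x: x**2, lambda x: 3.0 * x))(
    jnp.array(2.0)
)
```

The same pattern (stop-gradient on one loss branch) is what lets the backbone keep its pretrained knowledge while the action expert adapts.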
### Data
- **Dataset**: first 30 episodes of `physical-intelligence/libero` chunk-000 (~8,294 frames)
- **Norm stats**: reused `pi05_libero`'s precomputed full-dataset stats from `gs://openpi-assets/checkpoints/pi05_libero/assets/`
- **Subtask annotation**: **identity** — `high_prompt = low_prompt = task_prompt`
(real hierarchical subtask annotations for LIBERO are not publicly available)
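The identity annotation above amounts to a trivial per-frame transform. This is a hedged sketch; the field names (`task_prompt`, `high_prompt`, `low_prompt`) are illustrative, not the exact LeRobot/openpi schema.

```python
def annotate_identity(frame: dict) -> dict:
    """Identity subtask annotation: lacking real hierarchical LIBERO
    subtask labels, set both the high-level and low-level prompts to
    the episode's task prompt. Field names are hypothetical."""
    out = dict(frame)
    out["high_prompt"] = frame["task_prompt"]
    out["low_prompt"] = frame["task_prompt"]
    return out

frame = {"task_prompt": "put the bowl in the sink"}
annotated = annotate_identity(frame)
# annotated["high_prompt"] == annotated["low_prompt"] == frame["task_prompt"]
```

With real subtask annotations, the transform would instead map each frame to its current subtask string for `low_prompt`.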
## References
- https://www.pi.website/blog/pi05
- https://github.com/Physical-Intelligence/openpi (upstream pi0.5 implementation)
- https://github.com/Physical-Intelligence/openpi/issues/701 (community issue thread reproducing subtask generation)
- https://github.com/LisavilaLee/openpi_with_subtask (fork with training example)
## License
- Code & fine-tuned weights: Apache 2.0 (inherited from openpi)
- Gemma dependency: this checkpoint is derived from Google's Gemma via PaliGemma. Usage is subject to the Gemma Terms of Use in addition to Apache 2.0.