---
license: apache-2.0
language:
- en
tags:
- robotics
- vla
- pi05
- subtask
- openpi
- lerobot
- orbax
datasets:
- physical-intelligence/libero
pipeline_tag: robotics
---
# pi0.5 subtask fine-tune
A 100-step fine-tune of `pi05_base` for subtask generation from the original [pi05 paper](https://www.pi.website/download/pi05.pdf).
This reproduces the recipe from an openpi community issue thread that studies subtask generation ([#701](https://github.com/Physical-Intelligence/openpi/issues/701)).
## TL;DR
- **Start weights**: `gs://openpi-assets/checkpoints/pi05_base/params`
- **Config**: `pi05_subtask_libero` (adds `Pi05Subtask` head: joint flow-matching + CE-on-subtask-tokens loss)
- **Training**: 100 steps × batch 8 on 30 LIBERO episodes, 1× H100 on Modal
- **Final loss**: 3.04 → 0.23
## Loading
```python
from pathlib import Path
import jax
import jax.numpy as jnp
import flax.nnx as nnx
from huggingface_hub import hf_hub_download
import tarfile
from openpi.models import model as _model
from openpi.models.pi0_config import Pi0Config

# 1. Download + extract the checkpoint tarball
tar = hf_hub_download("swatery/pi05-subtask", "jax/pi05_subtask.tar")
tarfile.open(tar).extractall(".")
ckpt = Path("99")
# 2. Build model and restore weights
config = Pi0Config(pi05=True)
model = config.create(jax.random.key(0))
params = _model.restore_params(ckpt / "params", dtype=jnp.bfloat16)
nnx.update(model, nnx.State(params))
model.eval()
```
For end-to-end subtask generation (JIT-compiled AR decode with ASCII vocab mask over PaliGemma's LM head), see the `SubtaskGenerator` implementation in [openpi/hosting](https://github.com/Hebbian-Robotics/openpi) `src/hosting/subtask_generator.py`.
That module loads a checkpoint like this one and calls `.generate(prompt, images)`.
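The vocab-masked AR decode mentioned above can be sketched in isolation. This is a minimal, hypothetical illustration of the idea (mask out non-ASCII tokens by setting their logits to -inf before greedy argmax), not the actual `SubtaskGenerator` API; the toy vocabulary and function names are assumptions.

```python
import numpy as np

def ascii_mask(vocab):
    """Build a boolean mask that keeps only tokens whose text is
    printable ASCII. `vocab` maps token id -> token string
    (a hypothetical toy vocabulary, not PaliGemma's real tokenizer)."""
    mask = np.zeros(len(vocab), dtype=bool)
    for tid, tok in vocab.items():
        mask[tid] = len(tok) > 0 and all(32 <= ord(c) < 127 for c in tok)
    return mask

def masked_greedy_step(logits, mask):
    """One greedy decode step: disallowed tokens get -inf logits,
    so argmax can only pick an allowed (ASCII) token."""
    logits = np.where(mask, logits, -np.inf)
    return int(np.argmax(logits))

# Example: the highest-logit token is non-ASCII, so it is skipped.
vocab = {0: "pick", 1: "\x00", 2: "place"}
logits = np.array([0.1, 5.0, 3.0])
next_id = masked_greedy_step(logits, ascii_mask(vocab))  # -> 2 ("place")
```

In the real generator the same masking is applied inside a JIT-compiled decode loop over PaliGemma's LM head.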
## Training details
| Setting | Value |
|---|---|
| Architecture | pi0.5 — PaliGemma + Gemma action expert, with `Pi05Subtask` head |
| Loss | Flow-matching (action) + cross-entropy (subtask tokens) |
| Knowledge insulation | Yes — LM backbone receives only CE gradients |
| Steps | 100 |
| Batch size | 8 (global, single device) |
| Optimizer | AdamW, cosine schedule, peak LR 5e-5, warmup 10k steps (the 100-step run stays entirely within warmup, so the LR ramps linearly and never reaches the peak) |
| EMA decay | 0.999 |
| Precision | bfloat16 |
| Hardware | 1× NVIDIA H100 80GB (Modal) |
| Wall-clock | ~10 min training + ~5 min data/weight fetch |
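The knowledge-insulation row above can be made concrete with a small sketch: the flow-matching (action) loss sees a `stop_gradient` copy of the LM backbone features, so only the subtask cross-entropy gradients flow into the backbone. All function names here are illustrative, not the openpi `Pi05Subtask` implementation.

```python
import jax
import jax.numpy as jnp

def joint_loss(backbone_feats, flow_loss_fn, ce_loss_fn, ce_weight=1.0):
    """Sketch of the joint objective with knowledge insulation.
    `flow_loss_fn` / `ce_loss_fn` stand in for the flow-matching action
    loss and the subtask-token cross-entropy (hypothetical signatures)."""
    insulated = jax.lax.stop_gradient(backbone_feats)  # no FM grads to backbone
    fm = flow_loss_fn(insulated)   # trains the action expert only
    ce = ce_loss_fn(backbone_feats)  # CE grads do reach the backbone
    return fm + ce_weight * ce

# Toy check: with fm = x**2 and ce = 3*x, the gradient w.r.t. the
# backbone features is 3.0 -- the FM term contributes nothing.
g = jax.grad(lambda f: joint_loss(f, lambda x: x**2, lambda x: 3.0 * x))(
    jnp.array(2.0)
)
```

The same pattern (stop-gradient on one loss branch) is what lets the backbone keep its pretrained knowledge while the action expert adapts.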
### Data
- **Dataset**: first 30 episodes of `physical-intelligence/libero` chunk-000 (~8,294 frames)
- **Norm stats**: reused `pi05_libero`'s precomputed full-dataset stats from `gs://openpi-assets/checkpoints/pi05_libero/assets/`
- **Subtask annotation**: **identity** — `high_prompt = low_prompt = task_prompt`
(real hierarchical subtask annotations for LIBERO are not publicly available)
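The identity annotation above amounts to a trivial per-frame transform. This is a hedged sketch; the field names (`task_prompt`, `high_prompt`, `low_prompt`) are illustrative, not the exact LeRobot/openpi schema.

```python
def annotate_identity(frame: dict) -> dict:
    """Identity subtask annotation: lacking real hierarchical LIBERO
    subtask labels, set both the high-level and low-level prompts to
    the episode's task prompt. Field names are hypothetical."""
    out = dict(frame)
    out["high_prompt"] = frame["task_prompt"]
    out["low_prompt"] = frame["task_prompt"]
    return out

frame = {"task_prompt": "put the bowl in the sink"}
annotated = annotate_identity(frame)
# annotated["high_prompt"] == annotated["low_prompt"] == frame["task_prompt"]
```

With real subtask annotations, the transform would instead map each frame to its current subtask string for `low_prompt`.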
## References
- https://www.pi.website/blog/pi05
- https://github.com/Physical-Intelligence/openpi (upstream pi0.5 implementation)
- https://github.com/Physical-Intelligence/openpi/issues/701 (community issue thread reproducing subtask generation)
- https://github.com/LisavilaLee/openpi_with_subtask (fork with training example)
## License
- Code & fine-tuned weights: Apache 2.0 (inherited from openpi)
- Gemma dependency: this checkpoint is derived from Google's Gemma via PaliGemma. Usage is subject to the Gemma Terms of Use in addition to Apache 2.0.