| --- |
| license: apache-2.0 |
| pipeline_tag: robotics |
| library_name: transformers |
| --- |
| |
| # Mixture of Horizons in Action Chunking |
|
|
| This repository hosts the official models and code for the paper: |
| [**Mixture of Horizons in Action Chunking**](https://huggingface.co/papers/2511.19433) |
|
|
| Project Page: https://timsty1.github.io/moh/ |
| Code Repository: https://github.com/Timsty1/MixtureOfHorizons/tree/main |
|
|
| ## Introduction |
| Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the **action chunk length** used during training, termed **horizon**. This paper proposes a **mixture of horizons (MoH)** strategy to mitigate the inherent trade-off between long-term foresight and short-term precision observed with fixed horizons. MoH rearranges action chunks into segments with different horizons, processes them in parallel with a shared action transformer, and fuses outputs. This approach allows MoH to exploit both long-term foresight and short-term precision jointly within a single model, improving performance and generalizability with minimal overhead. MoH also enables dynamic inference with adaptive horizons, achieving higher throughput while preserving superior performance. |
|
|
| <div align="center"> |
| <table border="0" cellspacing="0" cellpadding="0"> |
| <tr> |
| <td align="center" width="50%"> |
| <img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/study_of_horizons_pi0.png" alt="Trade-off Effect" width="100%"> |
| </td> |
| <td align="center" width="50%"> |
| <img src="https://huggingface.co/Timsty/mixture_of_horizons/resolve/main/figure/intro_motivation_v2.png" alt="Mixture of Horizons" width="100%"> |
| </td> |
| </tr> |
| <tr> |
| <td align="center" valign="top"> |
| Figure 1: Trade-off between long-term foresight and short-term precision induced by a single horizon |
| </td> |
| <td align="center" valign="top"> |
| Figure 2: Overview of the proposed mixture-of-horizons strategy |
| </td> |
| </tr> |
| </table> |
| </div> |
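
The rearrange, share, and fuse steps described in the introduction can be illustrated with a small toy sketch. Everything here is an assumption made for illustration and is not the repository's implementation: `toy_action_head` is a hypothetical stand-in for the shared action transformer, each horizon is assumed to see the first `h` steps of the chunk, and the fusion is assumed to be a per-step softmax over the horizons that cover that step. The loop runs the horizons one after another, whereas a real model would batch them.

```python
import math
from typing import Callable, List, Optional, Sequence

Action = List[float]
Chunk = List[Action]


def toy_action_head(segment: Chunk) -> Chunk:
    """Hypothetical stand-in for the shared action transformer.

    Pulls every action halfway toward the segment mean, so the output
    depends on how many steps (which horizon) the head was given.
    """
    dim = len(segment[0])
    mean = [sum(a[d] for a in segment) / len(segment) for d in range(dim)]
    return [[0.5 * a[d] + 0.5 * mean[d] for d in range(dim)] for a in segment]


def mixture_of_horizons(
    chunk: Chunk,
    horizons: Sequence[int],
    head: Callable[[Chunk], Chunk] = toy_action_head,
    logits: Optional[List[List[float]]] = None,
) -> Chunk:
    """Fuse per-horizon predictions into one chunk (illustrative only)."""
    max_h = len(chunk)
    if max(horizons) != max_h:
        raise ValueError("the longest horizon must equal the chunk length")

    # Same head for every horizon; horizon h sees only the first h steps.
    preds = [head(chunk[:h]) for h in horizons]

    # Assumed gating scores, one per (horizon, step); zeros mean uniform.
    if logits is None:
        logits = [[0.0] * max_h for _ in horizons]

    fused: Chunk = []
    for t in range(max_h):
        # Only horizons long enough to cover step t take part in the fusion.
        active = [k for k, h in enumerate(horizons) if t < h]
        top = max(logits[k][t] for k in active)
        weights = {k: math.exp(logits[k][t] - top) for k in active}
        total = sum(weights.values())
        fused.append([
            sum(weights[k] / total * preds[k][t][d] for k in active)
            for d in range(len(chunk[t]))
        ])
    return fused


if __name__ == "__main__":
    demo_chunk = [[float(t), float(2 * t)] for t in range(8)]
    for step, action in enumerate(mixture_of_horizons(demo_chunk, [2, 4, 8])):
        print(step, action)
```

With all-zero logits, every horizon that covers a step contributes equally, and steps beyond a short horizon are fused only from the longer ones.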
| |
| ## Quick Start |
|
|
| ### 1. Environment Setup |
|
|
| Clone the repository and set up the conda environment: |
|
|
| ```bash |
| git clone git@github.com:Timsty1/MixtureOfHorizons.git |
| conda create -n moh -y python=3.10 |
| conda activate moh |
| pip install uv |
| cd MixtureOfHorizons |
| uv pip install -r requirements.txt |
| pip install packages/libero |
| pip install packages/openpi-client |
| ``` |
|
|
| ### 2. Modify Transformers Library |
|
|
| This implementation requires patching the installed `transformers` library to support the PyTorch implementations of the $\pi$-series models, which rely on *gemma*, *paligemma*, and *siglip*. |
|
|
| First, locate your conda base directory (environments are created under its `envs/` folder by default): |
| ```bash |
| conda info --base |
| ``` |
| Then, copy the provided files to the transformers library directory (replace `YOUR_CONDA_DIR` with the path found above): |
| ```bash |
| cp -r ./src/openpi/models_pytorch/transformers_replace/* YOUR_CONDA_DIR/envs/moh/lib/python3.10/site-packages/transformers/ |
| ``` |
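
If the `moh` environment does not live under the conda base directory, the `transformers` location will differ from the path used above. The following is a minimal sketch for printing the actual install directory; run it inside the activated environment. The helper name `package_dir` is illustrative and not part of this repository.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the install directory of a top-level package, or None if absent."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    target = package_dir("transformers")
    if target is None:
        print("transformers is not installed in this environment")
    else:
        # Use this directory as the destination of the `cp -r` command.
        print(target)
```

The printed directory can be used in place of `YOUR_CONDA_DIR/envs/moh/lib/python3.10/site-packages/transformers/`.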
|
|
| ### 3. Inference with Code |
| The provided `eagenerate` method speeds up generation and is called the same way as `generate` from Hugging Face. For example: |
|
|
| ```python |
| import torch |
| from eagle.model.ea_model import EaModel |
| from fastchat.model import get_conversation_template |
| |
| # Replace with paths to your base model and EAGLE model checkpoints |
| # Example: base_model_path = "lmsys/vicuna-13b-v1.3", EAGLE_model_path = "Timsty/mixture_of_horizons" |
| base_model_path = "path/to/your/base_model" |
| EAGLE_model_path = "path/to/your/eagle_model" |
| |
| # Load the base model together with the EAGLE model |
| model = EaModel.from_pretrained( |
|     base_model_path=base_model_path, |
|     ea_model_path=EAGLE_model_path, |
|     torch_dtype=torch.float16, |
|     low_cpu_mem_usage=True, |
|     device_map="auto", |
|     total_token=-1, |
| ) |
| model.eval() |
| |
| # Build the prompt with the chat template that matches the base model |
| your_message = "Hello" |
| conv = get_conversation_template("vicuna")  # Use the correct template for your base model |
| conv.append_message(conv.roles[0], your_message) |
| conv.append_message(conv.roles[1], None) |
| prompt = conv.get_prompt() |
| |
| # Tokenize the prompt, generate, and decode the result |
| input_ids = model.tokenizer([prompt]).input_ids |
| input_ids = torch.as_tensor(input_ids).cuda() |
| output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512) |
| output = model.tokenizer.decode(output_ids[0]) |
| print(output) |
| ``` |
| **Note:** Vicuna, LLaMA2-Chat, and LLaMA3-Instruct are all chat models. Use the chat template that matches your base model; the wrong template causes abnormal model output and degrades the performance of EAGLE. |
|
|
| ## ❤️ Acknowledgment |
|
|
| We express our gratitude to [OpenPi](https://github.com/Physical-Intelligence/openpi/tree/main), [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO), and [RoboTwin](https://robotwin-platform.github.io/) for their open-source contributions. |
|
|
| ## 📝 Citation |
| If you find this paper, the models, or the code helpful, please cite our paper. Thank you for your support! |
|
|
| ```bibtex |
| @article{jing2025mixture_of_horizons, |
| title={Mixture of Horizons in Action Chunking}, |
| author={Jing, Dong and Wang, Gang and Liu, Jiaqi and Tang, Weiliang and Sun, Zelong and Yao, Yunchao and Wei, Zhenyu and Liu, Yunhui and Lu, Zhiwu and Ding, Mingyu}, |
| journal={arXiv preprint arXiv:2511.19433}, |
| year={2025} |
| } |
| ``` |