---
pipeline_tag: robotics
library_name: lerobot
tags:
- robotics
- manipulation
- transformer
---
# CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation
CroSTAta introduces a **State Transition Attention (STA)** mechanism for robotic manipulation policies. STA is designed to capture temporal dependencies and state transitions in sequential data, so that a policy can adapt its behavior to its execution history, particularly to failure and recovery patterns.
- **Paper:** [CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation](https://huggingface.co/papers/2510.00726)
- **Repository:** [https://github.com/iit-DLSLab/croSTAta](https://github.com/iit-DLSLab/croSTAta)
## Method Overview
The Cross-State Transition Attention Transformer modulates standard attention weights based on learned state evolution patterns. During training, this is combined with temporal masking, which encourages the policy to reason over its historical context. The approach is reported to achieve more than a 2x improvement over standard cross-attention on precision-critical tasks.
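To make the temporal masking idea concrete, here is a minimal NumPy sketch of standard cross-attention over a history of states in which some past timesteps are randomly hidden. It is not the STA formulation from the paper. The function names, the independent per-step drop probability, and the choice to always keep the latest step are assumptions made for illustration only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def sample_temporal_mask(num_steps, drop_prob, rng):
    # Assumed scheme: drop each past step independently with drop_prob.
    keep = rng.random(num_steps) >= drop_prob
    keep[-1] = True  # keep the latest step so at least one score is finite
    return keep

def masked_cross_attention(query, keys, values, keep_mask):
    # query: (d,), keys/values: (T, d), keep_mask: (T,) boolean.
    scores = keys @ query / np.sqrt(query.shape[-1])
    scores = np.where(keep_mask, scores, -np.inf)  # hidden steps get zero weight
    weights = softmax(scores)
    return weights @ values, weights

rng = np.random.default_rng(0)
T, d = 8, 16
keys = rng.normal(size=(T, d))
values = rng.normal(size=(T, d))
query = rng.normal(size=d)

keep = sample_temporal_mask(T, drop_prob=0.5, rng=rng)
context, weights = masked_cross_attention(query, keys, values, keep)
```

Because hidden steps receive exactly zero weight, the policy has to produce its output from whichever part of the history remains visible.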
## Installation
```bash
git clone https://github.com/iit-DLSLab/croSTAta.git
cd croSTAta/
pip install -r requirements.txt
```
## Usage
### Training
To train the model on a specific environment (e.g., PegInsertionSide-v1) using ManiSkill:
```bash
python train.py --task PegInsertionSide-v1 --envsim maniskill --num_envs 1 --val_episodes 100 --agent Maniskill/maniskill_sl_inference_cfg --device cuda --sim_device cuda
```
### Evaluation
To evaluate a saved checkpoint:
```bash
python predict.py --task PegInsertionSide-v1 --envsim maniskill --num_envs 1 --val_episodes 100 --agent Maniskill/ --device cuda --sim_device cuda --resume --checkpoint save/
```
Note: by default, `sl_agent` inference calls the policy's `predict_batch` method (batch prediction). For more efficient inference, use the policy's `predict` method instead (prediction with cache).
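One way to read the difference between the two methods is sketched below with a toy attention policy: the batch path re-projects the whole observation history on every call, while the cached path projects only the newest observation and reuses stored keys and values. `ToyCachedPolicy` is hypothetical and is not the repository's policy class. Only the method names `predict_batch` and `predict` come from the note above; their signatures and internals here are assumptions.

```python
import numpy as np

class ToyCachedPolicy:
    """Hypothetical stand-in, NOT the CroSTAta policy implementation."""

    def __init__(self, obs_dim, d_model, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(obs_dim)
        self.w_q = rng.normal(size=(obs_dim, d_model)) * scale
        self.w_k = rng.normal(size=(obs_dim, d_model)) * scale
        self.w_v = rng.normal(size=(obs_dim, d_model)) * scale
        self.reset()

    def reset(self):
        # Clear the per-episode key/value cache.
        self._keys, self._values = [], []

    @staticmethod
    def _attend(q, keys, values):
        scores = keys @ q / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max())
        return (w / w.sum()) @ values

    def predict_batch(self, history):
        # history: (T, obs_dim). Every step is projected again on each call.
        keys, values = history @ self.w_k, history @ self.w_v
        return self._attend(history[-1] @ self.w_q, keys, values)

    def predict(self, obs):
        # obs: (obs_dim,). Only the new step is projected; the rest is cached.
        self._keys.append(obs @ self.w_k)
        self._values.append(obs @ self.w_v)
        return self._attend(obs @ self.w_q,
                            np.stack(self._keys), np.stack(self._values))

episode = np.random.default_rng(1).normal(size=(5, 4))
policy = ToyCachedPolicy(obs_dim=4, d_model=8)
matches = []
for t in range(len(episode)):
    cached = policy.predict(episode[t])            # O(1) new projections per step
    full = policy.predict_batch(episode[: t + 1])  # O(t) projections per step
    matches.append(np.allclose(cached, full))
```

In this toy setting both paths return the same output at every step; the cached path simply avoids redoing work for observations it has already seen.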
## Data
The model was evaluated on the following recovery datasets, available on the Hugging Face Hub:
- [ManiSkill_StackCube-v1_recovery](https://huggingface.co/datasets/johnMinelli/ManiSkill_StackCube-v1_recovery)
- [ManiSkill_PegInsertionSide-v1_recovery](https://huggingface.co/datasets/johnMinelli/ManiSkill_PegInsertionSide-v1_recovery)
- [ManiSkill_TwoRobotStackCube-v1_recovery](https://huggingface.co/datasets/johnMinelli/ManiSkill_TwoRobotStackCube-v1_recovery)
- [ManiSkill_UnitreeG1TransportBox-v1_recovery](https://huggingface.co/datasets/johnMinelli/ManiSkill_UnitreeG1TransportBox-v1_recovery)
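The datasets listed above can be fetched with the `huggingface_hub` client, for example as in the sketch below. The helper names `recovery_repo_id` and `download_recovery_dataset` are hypothetical. The repository IDs follow the naming pattern of the links above, and `snapshot_download` is the library's general-purpose download call.

```python
# Owner and task names are taken from the dataset links above.
OWNER = "johnMinelli"
TASKS = [
    "StackCube-v1",
    "PegInsertionSide-v1",
    "TwoRobotStackCube-v1",
    "UnitreeG1TransportBox-v1",
]

def recovery_repo_id(task):
    # Build the Hub repo ID for one of the listed recovery datasets.
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return f"{OWNER}/ManiSkill_{task}_recovery"

def download_recovery_dataset(task, local_dir=None):
    # Imported lazily so the helper above works without the dependency.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    # Returns the local path of the downloaded dataset snapshot.
    return snapshot_download(
        repo_id=recovery_repo_id(task),
        repo_type="dataset",
        local_dir=local_dir,
    )
```

Calling `download_recovery_dataset("PegInsertionSide-v1")` returns the local path of the snapshot. How the training script expects these files to be laid out is not covered here; check the repository for that.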
## Citation
```bibtex
@article{minelli2025crostata,
  title={CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation},
  author={Minelli, Giovanni and Turrisi, Giulio and Barasuol, Victor and Semini, Claudio},
  journal={arXiv preprint arXiv:2510.00726},
  year={2025}
}
```