nielsr HF Staff committed on
Commit da8f422 · verified · 1 Parent(s): d8f2c24

Add model card for CroSTAta


This PR adds a comprehensive model card for CroSTAta. It includes:
- Metadata for the `robotics` pipeline and `lerobot` library.
- Links to the paper and official GitHub repository.
- A description of the novel State Transition Attention (STA) mechanism.
- Instructions for installation, training, and evaluation.
- Links to the relevant ManiSkill recovery datasets.
- Citation information.

Files changed (1)
  1. README.md +60 -0
README.md ADDED
---
pipeline_tag: robotics
library_name: lerobot
tags:
- robotics
- manipulation
- transformer
---

# CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation

CroSTAta introduces a novel **State Transition Attention (STA)** mechanism designed to improve robotic manipulation by better capturing temporal dependencies and state transitions in sequential data. It enables policies to adapt their behavior based on execution history, particularly failure and recovery patterns.

- **Paper:** [CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation](https://huggingface.co/papers/2510.00726)
- **Repository:** [https://github.com/iit-DLSLab/croSTAta](https://github.com/iit-DLSLab/croSTAta)

<div align="center">
<img src="https://raw.githubusercontent.com/iit-DLSLab/croSTAta/main/docs/banner.png" alt="CroSTAta Banner" width="600">
</div>

## Method Overview

The Cross-State Transition Attention Transformer modulates standard attention weights based on learned state-evolution patterns. Combined with temporal masking during training, this encourages temporal reasoning from historical context and achieves more than a 2x improvement over standard cross-attention on precision-critical tasks.

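The paper defines the exact STA formulation; as a rough illustration of the core idea only, the NumPy sketch below (all names hypothetical, not from the repository) additively modulates standard cross-attention logits with a per-pair transition score before the softmax:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sta_attention(q, k, v, transition_bias):
    """Cross-attention whose logits are additively modulated by
    state-transition scores (a conceptual sketch, not the paper's exact form)."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)      # (Tq, Tk) standard scaled dot-product logits
    logits = logits + transition_bias  # modulate with learned transition pattern
    weights = softmax(logits, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
Tq, Tk, d = 4, 6, 8
q = rng.normal(size=(Tq, d))
k = rng.normal(size=(Tk, d))
v = rng.normal(size=(Tk, d))
bias = 0.1 * rng.normal(size=(Tq, Tk))  # stand-in for learned transition scores
out, w = sta_attention(q, k, v, bias)
print(out.shape)  # (4, 8)
```

In a trained model the bias would come from a learned function of the state sequence; here it is random purely to keep the sketch self-contained.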
## Installation

```bash
git clone https://github.com/iit-DLSLab/croSTAta.git
cd croSTAta/
pip install -r requirements.txt
```

## Usage

### Training
To train the model on a specific environment (e.g., PegInsertionSide-v1) using ManiSkill:

```bash
python train.py --task PegInsertionSide-v1 --envsim maniskill --num_envs 1 --val_episodes 100 --agent Maniskill/maniskill_sl_inference_cfg --device cuda --sim_device cuda
```

### Evaluation
To evaluate a saved checkpoint:

```bash
python predict.py --task PegInsertionSide-v1 --envsim maniskill --num_envs 1 --val_episodes 100 --agent Maniskill/<cfg_file> --device cuda --sim_device cuda --resume --checkpoint save/<model_name>
```

Note: by default, `sl_agent` inference uses the policy's `predict_batch` method (batched prediction). For more efficient inference, use the policy's `predict` method (prediction with cache).

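The trade-off between the two paths can be mimicked with a toy policy (entirely hypothetical; the real method signatures live in the repository): `predict_batch` reprocesses the full observation history on every call, while `predict` caches each step's features and only computes the newest one:

```python
import numpy as np

class TinyPolicy:
    """Toy illustration of batched vs. cached inference paths."""
    def __init__(self, w):
        self.w = w
        self._cache = []                    # cached per-step features

    def _feature(self, obs):
        return np.tanh(self.w * obs)        # stand-in per-step computation

    def predict_batch(self, history):
        feats = [self._feature(o) for o in history]  # recompute everything
        return np.mean(feats, axis=0)

    def predict(self, obs):
        self._cache.append(self._feature(obs))       # compute only the new step
        return np.mean(self._cache, axis=0)

history = [1.0, 2.0, 3.0]
batched = [TinyPolicy(0.5).predict_batch(history[:i + 1]) for i in range(3)]
policy = TinyPolicy(0.5)
cached = [policy.predict(o) for o in history]
print(np.allclose(batched, cached))  # True: same outputs, less recomputation
```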
## Data

The model was evaluated using recovery datasets available on the Hugging Face Hub:
- [ManiSkill_StackCube-v1_recovery](https://huggingface.co/datasets/johnMinelli/ManiSkill_StackCube-v1_recovery)
- [ManiSkill_PegInsertionSide-v1_recovery](https://huggingface.co/datasets/johnMinelli/ManiSkill_PegInsertionSide-v1_recovery)
- [ManiSkill_TwoRobotStackCube-v1_recovery](https://huggingface.co/datasets/johnMinelli/ManiSkill_TwoRobotStackCube-v1_recovery)
- [ManiSkill_UnitreeG1TransportBox-v1_recovery](https://huggingface.co/datasets/johnMinelli/ManiSkill_UnitreeG1TransportBox-v1_recovery)

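The four repo ids above follow a single naming pattern, so they can be reconstructed programmatically (helper name is hypothetical) and then fetched with the standard `huggingface_hub.snapshot_download` API:

```python
TASKS = [
    "StackCube-v1",
    "PegInsertionSide-v1",
    "TwoRobotStackCube-v1",
    "UnitreeG1TransportBox-v1",
]

def recovery_repo_id(task: str) -> str:
    """Map a ManiSkill task name to its recovery-dataset repo id on the Hub."""
    return f"johnMinelli/ManiSkill_{task}_recovery"

repo_ids = [recovery_repo_id(t) for t in TASKS]
print(repo_ids[0])  # johnMinelli/ManiSkill_StackCube-v1_recovery

# Each dataset can then be downloaded with, e.g.:
#   from huggingface_hub import snapshot_download
#   snapshot_download(repo_ids[0], repo_type="dataset")
```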
## Citation

```bibtex
@article{minelli2025crostata,
  title={CroSTAta: Cross-State Transition Attention Transformer for Robotic Manipulation},
  author={Minelli, Giovanni and Turrisi, Giulio and Barasuol, Victor and Semini, Claudio},
  year={2025}
}
```