---
license: mit
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
pipeline_tag: robotics
library_name: pytorch
---

This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- Library: https://huggingface.co/phython96/ROCKET-1
- Docs: [More Information Needed]
- Paper: https://huggingface.co/papers/2410.17856
- GitHub: https://github.com/CraftJarvis/ROCKET-1
- Project: https://craftjarvis.github.io/ROCKET-1

## Usage
```python
import torch

from rocket.arm.models import ROCKET1
from rocket.stark_tech.env_interface import MinecraftWrapper

model = ROCKET1.from_pretrained("phython96/ROCKET-1").to("cuda")
memory = None  # no memory yet on the first step
input = {
    # random uint8 RGB frame, shape (H, W, C)
    "img": torch.randint(0, 256, (224, 224, 3), dtype=torch.uint8),
    "segment": {
        "obj_id": torch.tensor(6),  # specify the interaction type
        "obj_mask": torch.zeros(224, 224, dtype=torch.uint8),  # highlight the regions of interest
    },
}
# get_action returns the action together with the updated memory
agent_action, memory = model.get_action(input, memory, first=None, input_shape="*")
env_action = MinecraftWrapper.agent_action_to_env(agent_action)

# --------------------- the output --------------------- #
# agent_action = {'buttons': tensor([1], device='cuda:0'), 'camera': tensor([54], device='cuda:0')}
# env_action = {'attack': array(0), 'back': array(0), 'forward': array(0), 'jump': array(0), 'left': array(0), 'right': array(0), 'sneak': array(0), 'sprint': array(0), 'use': array(0), 'drop': array(0), 'inventory': array(0), 'hotbar.1': array(0), 'hotbar.2': array(0), 'hotbar.3': array(0), 'hotbar.4': array(0), 'hotbar.5': array(0), 'hotbar.6': array(0), 'hotbar.7': array(0), 'hotbar.8': array(0), 'hotbar.9': array(0), 'camera': array([-0.61539427, 10.])}
```

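The usage example passes an all-zero `obj_mask`, which highlights nothing. As a minimal sketch of how a region of interest might be supplied, the hypothetical helper below (`make_segment` is not part of the ROCKET-1 codebase) fills a rectangle in the mask. It assumes that nonzero pixels mark the highlighted region and that the mask matches the 224x224 frame used above; in practice the mask would more likely come from a segmentation model than from a hand-drawn box.

```python
import torch


def make_segment(obj_id: int, box: tuple[int, int, int, int], size: int = 224) -> dict:
    """Build a `segment` dict with a rectangular region of interest.

    `box` is (top, left, bottom, right) in pixels; bottom and right are exclusive.
    Assumption: nonzero mask pixels mark the highlighted region.
    """
    top, left, bottom, right = box
    if not (0 <= top < bottom <= size and 0 <= left < right <= size):
        raise ValueError(f"box {box} does not fit in a {size}x{size} frame")
    mask = torch.zeros(size, size, dtype=torch.uint8)
    mask[top:bottom, left:right] = 1  # mark the rectangle as the region of interest
    return {"obj_id": torch.tensor(obj_id), "obj_mask": mask}


# A 64x64 box near the centre of the frame, with obj_id 6 (Approach)
segment = make_segment(obj_id=6, box=(80, 96, 144, 160))
```

The resulting dict has the same keys, shapes, and dtypes as the `segment` entry in the usage example, so it can be dropped into the `input` dict in its place.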
## Interaction Details

The `obj_id` field of `segment` selects the interaction type. Here are some interaction types:

| Interaction | obj_id | Function |
| --- | --- | --- |
| Hunt | 0 | Approach the target animal, then kill it. |
| Mine | 2 | Approach and mine the target object. |
| Interact | 3 | Approach and right-click the target object. |
| Craft | 4 | Move the cursor to the item and click on it. |
| Switch | 5 | Highlight an item in the hotbar, then switch to holding it. |
| Approach | 6 | Approach the target object. |

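To avoid scattering bare integers through calling code, the `obj_id` values from the table can be given names. The sketch below is only an illustration: `Interaction` and `parse_interaction` are hypothetical names, not part of the ROCKET-1 package, and only the IDs listed in the table are included.

```python
from enum import IntEnum


class Interaction(IntEnum):
    """obj_id values copied from the interaction table."""
    HUNT = 0
    MINE = 2
    INTERACT = 3
    CRAFT = 4
    SWITCH = 5
    APPROACH = 6


def parse_interaction(name: str) -> Interaction:
    """Look up an interaction by case-insensitive name, e.g. "mine"."""
    try:
        return Interaction[name.strip().upper()]
    except KeyError:
        valid = ", ".join(member.name.lower() for member in Interaction)
        raise ValueError(f"unknown interaction {name!r}; expected one of: {valid}") from None


obj_id = int(parse_interaction("approach"))
```

Because `IntEnum` members behave like integers, a value such as `Interaction.MINE` can be passed wherever a plain `obj_id` integer is expected, for example to `torch.tensor(...)`.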
## Play ROCKET-1 with Gradio
Watch the following video to learn how to play ROCKET-1 with Gradio:
[ROCKET-1 Gradio demo video](https://www.youtube.com/embed/qXLWw81p-Y0)

```sh
cd rocket/arm
python eval_rocket.py --port 8110 --sam-path "/path/to/sam2-ckpt-directory"
```

## Citing ROCKET-1
If you use ROCKET-1 in your research, please use the following BibTeX entry.

```
@article{cai2024rocket,
  title={ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting},
  author={Cai, Shaofei and Wang, Zihao and Lian, Kewei and Mu, Zhancun and Ma, Xiaojian and Liu, Anji and Liang, Yitao},
  journal={arXiv preprint arXiv:2410.17856},
  year={2024}
}
```