---
license: cc-by-nc-sa-4.0
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
tags:
- robotics
- vision-language-action-model
- vision-language-model
library_name: transformers
---
# Model Card for InternVLA-M1

## Description
**InternVLA-M1** is an open-source, end-to-end **vision–language–action (VLA) framework** for building and researching generalist robot policies. The checkpoints in this repository were pretrained on the system2 dataset.

- 🌐 Homepage: [InternVLA-M1 Project Page](https://internrobotics.github.io/internvla-m1.github.io/)
- 💻 Codebase: [InternVLA-M1 GitHub Repo](https://github.com/InternRobotics/InternVLA-M1)
## Citation
```
@misc{internvla2024,
  title        = {InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy},
  author       = {InternVLA-M1 Contributors},
  year         = {2025},
  howpublished = {arXiv},
}
```