	---
	license: cc-by-nc-sa-4.0
	base_model:
	- Qwen/Qwen2.5-VL-3B-Instruct
	tags:
	- robotics
	- vision-language-action-model
	- vision-language-model
	library_name: transformers
	---
	# Model Card for InternVLA-M1

	## Description
	InternVLA-M1 is an open-source, end-to-end vision–language–action (VLA) framework for building and researching generalist robot policies. The checkpoints in this repository were pretrained on the system2 dataset.
	- 🌐 Homepage: [InternVLA-M1 Project Page](https://internrobotics.github.io/internvla-m1.github.io/)
	- 💻 Codebase: [InternVLA-M1 GitHub Repo](https://github.com/InternRobotics/InternVLA-M1)
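
To work with the checkpoints locally, the repository files first need to be fetched from the Hugging Face Hub. The snippet below is a minimal sketch, not an official loading recipe: it assumes the `huggingface_hub` package is installed and that the repository id is `InternRobotics/InternVLA-M1`. The helper name `download_checkpoint` is hypothetical. It only downloads files; building and running the policy is handled by the GitHub codebase linked above.

```python
from pathlib import Path
from typing import Optional

# Assumed repository id on the Hugging Face Hub (organization/model name).
REPO_ID = "InternRobotics/InternVLA-M1"


def download_checkpoint(local_dir: Optional[str] = None) -> Path:
    """Download all files of the model repository and return the local folder.

    Hypothetical helper for illustration. If local_dir is None, the files are
    stored in the default Hugging Face cache directory.
    """
    # Imported lazily so the module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` requires network access and returns the folder containing the downloaded files. Pass `local_dir="./InternVLA-M1"` (an example path) to place the files in a specific directory instead of the cache.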


	![InternVLA-M1 teaser figure](https://github.com/InternRobotics/InternVLA-M1/raw/InternVLA-M1/assets/teaser.png)



	## Citation
	```
	@misc{internvla2024,
	  title = {InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy},
	  author = {InternVLA-M1 Contributors},
	  year = {2025},
	  howpublished = {arXiv},
	}
	```