blankyang233 commited on
Commit
c8bae6a
Β·
verified Β·
1 Parent(s): f0e663d

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +5 -14
README.md CHANGED
@@ -9,12 +9,10 @@
9
 
10
 
11
  ## πŸ”₯ Overview
12
-
13
  Multimodal large language models (MLLMs) have demonstrated strong vision-language reasoning and increasingly underpin embodied agents. However, unified models that simultaneously support tasks in digital and physical spaces and generalize across embodiments remain scarce. To address this gap, we propose <b>Boundless Large Model (BLM<sub>0</sub>)</b>, a multimodal spatial foundation model that preserves native instruction-following and reasoning while injecting embodied knowledge and enabling robust cross-embodiment control. BLM<sub>0</sub> unifies three core capabilities: cross-space transfer, cross-task learning, and cross-embodiment generalization, which are realized through a two-stage training recipe. Stage I uses curated digital corpora to impart embodied knowledge to the MLLM while preserving language abilities. Stage II trains a policy module via an intent-bridging interface that extracts high-level semantics from the MLLM to guide control, avoiding MLLM fine-tuning. It uses a self-collected cross-embodiment demonstration suite spanning four robot embodiments and six increasingly challenging tasks. We evaluate BLM<sub>0</sub> as a single model on both digital and physical benchmarks and compare it against four families: Multimodal Large Language Models, Embodied Large Language Models, Vision-Language-Action models, and General Multimodal Large Models. BLM<sub>0</sub> improves digital-space tasks by approximately <b>6%</b> and physical-space tasks by approximately <b>3%</b>.
14
 
15
 
16
  ## πŸš€ Features
17
-
18
  - Achieve cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
19
  - Seamlessly migrate to cross-embodiment robot control while retaining native instruction-following capability.
20
  - A single model covers multiple embodiments, enabling cross-embodiment knowledge sharing and consistent control.
@@ -22,7 +20,6 @@ Multimodal large language models (MLLMs) have demonstrated strong vision-languag
22
 
23
 
24
  ## πŸ—žοΈ News
25
-
26
  - **`2025-09-25`**: πŸ€— [BLM-0 7B](https://huggingface.co/BLM-Lab/BLM-0) model checkpoint has been released in Huggingface.
27
 
28
 
@@ -39,7 +36,6 @@ pip install -r requirements.txt
39
 
40
 
41
  Install and launch VLLM
42
-
43
  ```bash
44
  # Install vllm package
45
  pip install vllm
@@ -54,7 +50,6 @@ vllm serve ./model \
54
  ```
55
 
56
  Run python script as example:
57
-
58
  ```python
59
  from openai import OpenAI
60
  import base64
@@ -99,12 +94,10 @@ print(response.choices[0].message.content)
99
  ## πŸ€– Evaluation
100
 
101
  ### Comparison with existing MLLMs and GMLMs on digital-space benchmarks
102
-
103
  <div align="center">
104
  <img src="images/digital-space.png" />
105
  </div>
106
 
107
-
108
  ### Comparison with existing VLAs on physical-space benchmarks
109
 
110
  <div align="center">
@@ -112,19 +105,17 @@ print(response.choices[0].message.content)
112
  </div>
113
 
114
 
115
-
116
  **†** denotes the training of independent models on four robots, with each model evaluated across six tasks.
117
  **β˜…** denotes training independent models for each of the six tasks associated with four robots (24 models in total), with evaluation on the corresponding tasks for each robot.
118
 
119
  ## πŸ“‘ Citation
120
-
121
  If you find this project useful, please consider citing our paper.
122
-
123
  ```bib
124
- @article{,
125
- title={},
126
- author={},
 
127
  journal={},
128
  year={2025}
129
  }
130
- ```
 
9
 
10
 
11
  ## πŸ”₯ Overview
 
12
  Multimodal large language models (MLLMs) have demonstrated strong vision-language reasoning and increasingly underpin embodied agents. However, unified models that simultaneously support tasks in digital and physical spaces and generalize across embodiments remain scarce. To address this gap, we propose <b>Boundless Large Model (BLM<sub>0</sub>)</b>, a multimodal spatial foundation model that preserves native instruction-following and reasoning while injecting embodied knowledge and enabling robust cross-embodiment control. BLM<sub>0</sub> unifies three core capabilities: cross-space transfer, cross-task learning, and cross-embodiment generalization, which are realized through a two-stage training recipe. Stage I uses curated digital corpora to impart embodied knowledge to the MLLM while preserving language abilities. Stage II trains a policy module via an intent-bridging interface that extracts high-level semantics from the MLLM to guide control, avoiding MLLM fine-tuning. It uses a self-collected cross-embodiment demonstration suite spanning four robot embodiments and six increasingly challenging tasks. We evaluate BLM<sub>0</sub> as a single model on both digital and physical benchmarks and compare it against four families: Multimodal Large Language Models, Embodied Large Language Models, Vision-Language-Action models, and General Multimodal Large Models. BLM<sub>0</sub> improves digital-space tasks by approximately <b>6%</b> and physical-space tasks by approximately <b>3%</b>.
13
 
14
 
15
  ## πŸš€ Features
 
16
  - Achieve cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
17
  - Seamlessly migrate to cross-embodiment robot control while retaining native instruction-following capability.
18
  - A single model covers multiple embodiments, enabling cross-embodiment knowledge sharing and consistent control.
 
20
 
21
 
22
  ## πŸ—žοΈ News
 
23
  - **`2025-09-25`**: πŸ€— [BLM-0 7B](https://huggingface.co/BLM-Lab/BLM-0) model checkpoint has been released in Huggingface.
24
 
25
 
 
36
 
37
 
38
  Install and launch VLLM
 
39
  ```bash
40
  # Install vllm package
41
  pip install vllm
 
50
  ```
51
 
52
  Run python script as example:
 
53
  ```python
54
  from openai import OpenAI
55
  import base64
 
94
  ## πŸ€– Evaluation
95
 
96
  ### Comparison with existing MLLMs and GMLMs on digital-space benchmarks
 
97
  <div align="center">
98
  <img src="images/digital-space.png" />
99
  </div>
100
 
 
101
  ### Comparison with existing VLAs on physical-space benchmarks
102
 
103
  <div align="center">
 
105
  </div>
106
 
107
 
 
108
  **†** denotes the training of independent models on four robots, with each model evaluated across six tasks.
109
  **β˜…** denotes training independent models for each of the six tasks associated with four robots (24 models in total), with evaluation on the corresponding tasks for each robot.
110
 
111
  ## πŸ“‘ Citation
 
112
  If you find this project useful, please consider citing our paper.
 
113
  ```bib
114
+ @article{
115
+ BLM-0,
116
+ title={BLM$_0$: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning},
117
+ author={WenTao Tan, Bowen Wang, Heng Zhi, Chenyu Liu, Zhe Li, Jian Liu, Zenrong Lin, Yukun Dai, Yipeng Chen, Wenjie Yang, Enci Xie, Hao Xue, Baixu Ji, Chen Xu, Zhibin Wang, Tianshi Wang, Lei Zhu, Hengtao Shen},
118
  journal={},
119
  year={2025}
120
  }
121
+ ```