---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- transformers
- multimodal
library_name: transformers
---

## ReVisual-R1 (7B): Open-Source Multimodal Reasoner

> **One cold-start, two RL stages, endless reasoning power.**

---

### Highlights

* **SOTA on 9 tough benchmarks** covering visual-math and text reasoning.
* **Three-Stage SRO Training**
  1. **Text Cold-Start**: seeds deep reflection
  2. **Multimodal RL**: aligns vision and logic
  3. **Text RL**: polishes fluency and brevity
* **PAD** (Prioritized Advantage Distillation) keeps gradients alive.
* **Efficient-Length Reward** encourages concise, self-reflective CoT.
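To make the PAD bullet above concrete, here is a toy sketch of the underlying idea: rollouts whose advantage is near zero contribute almost no policy gradient, so a sub-batch can be re-sampled with probability proportional to advantage magnitude. This is an illustration under that assumption, not the paper's actual implementation; the function name `pad_sample` and the temperature knob are hypothetical.

```python
import numpy as np

def pad_sample(advantages, k, temperature=1.0, seed=0):
    """Toy PAD-style sampler (illustrative only, not the paper's algorithm).

    Re-samples k rollout indices with probability proportional to
    |advantage| ** (1 / temperature), so near-zero-advantage rollouts
    are rarely selected and the policy gradient stays informative.
    """
    adv = np.asarray(advantages, dtype=np.float64)
    weights = np.abs(adv) ** (1.0 / temperature)
    if weights.sum() == 0:  # all advantages vanished: fall back to uniform
        weights = np.ones_like(weights)
    probs = weights / weights.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(len(adv), size=k, replace=False, p=probs)

# The rollout with advantage 0.0 has zero selection probability.
idx = pad_sample([0.9, -0.8, 0.01, 0.0, 0.7], k=3)
```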
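The Efficient-Length Reward can likewise be sketched as a shaped reward that pays a small brevity bonus on top of correctness. The exact formulation here (linear bonus decaying past a target length, and the `target_len`/`alpha` parameters) is an assumption for illustration, not the formula from the paper.

```python
def length_shaped_reward(correct, n_tokens, target_len=512, alpha=0.1):
    """Toy length-shaped reward (illustrative; not the paper's exact formula).

    Correct answers earn a base reward of 1.0 plus a bonus that shrinks
    linearly with response length, nudging the policy toward concise
    chains of thought without adding any penalty for wrong answers.
    """
    if not correct:
        return 0.0
    # Bonus in [0, alpha]: full at length 0, gone at 2 * target_len tokens.
    brevity = max(0.0, 1.0 - n_tokens / (2 * target_len))
    return 1.0 + alpha * brevity

short = length_shaped_reward(True, 128)   # concise correct answer
long_ = length_shaped_reward(True, 1024)  # verbose correct answer
```

A concise correct answer scores strictly higher than a verbose one, while incorrect answers score 0 regardless of length.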
---

### Resources

* [Paper](https://arxiv.org/abs/2506.04207)
* [Code](https://github.com/CSfufu/Revisual-R1)

---

### Citation

```bibtex
@article{chen2025advancing,
  title={Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning},
  author={Chen, Shuang and Guo, Yue and Su, Zhaochen and Li, Yafu and Wu, Yulun and Chen, Jiacheng and Chen, Jiayu and Wang, Weijie and Qu, Xiaoye and Cheng, Yu},
  journal={arXiv preprint arXiv:2506.04207},
  year={2025}
}
```

Take ReVisual-R1 for a spin and let us know what you build!