💡HF Papers Live 4: Multi Modal models - an AI-Insight Collection

updated Dec 3, 2025

internlm/Intern-S1

Image-Text-to-Text • 241B • Updated Mar 29 • 7.74k • 259
Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 274
MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3, 2024 • 96
openbmb/MiniCPM-V-4_5

Image-Text-to-Text • 9B • Updated Mar 10 • 462k • 1.1k
openbmb/MiniCPM-V-4

Image-Text-to-Text • 4B • Updated Sep 15, 2025 • 14.6k • 465
zai-org/GLM-4.5V

Image-Text-to-Text • 108B • Updated Oct 25, 2025 • 147k • 719
zai-org/GLM-4.1V-9B-Thinking

Image-Text-to-Text • 10B • Updated 5 days ago • 421k • 781
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1, 2025 • 257
Ovis2.5 Technical Report

Paper • 2508.11737 • Published Aug 15, 2025 • 116
ATH-MaaS/Ovis2.5-2B

Image-Text-to-Text • 3B • Updated Feb 13 • 10.6k • 201
stepfun-ai/step3

Image-Text-to-Text • 321B • Updated Jan 29 • 145k • 166