zai-org/AutoGLM-Phone-9B-Multilingual • Image-Text-to-Text • 934k • Updated 29 days ago • 7.57k • 223
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs Paper • 2510.18876 • Published Oct 21, 2025 • 37