Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 15 days ago • 114
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22, 2025 • 91
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness Paper • 2502.14914 • Published Feb 19, 2025
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25, 2025 • 104