Benchmarks

A collection of papers on benchmarks and datasets for evaluating multimodal large language models (arXiv IDs in parentheses).
• BLINK: Multimodal Large Language Models Can See but Not Perceive (arXiv:2404.12390)
• SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension (arXiv:2404.16790)
• Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots (arXiv:2405.07990)
• MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding (arXiv:2406.09411)
• CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark (arXiv:2406.05967)
• MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos (arXiv:2406.08407)
• ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation (arXiv:2406.09961)
• Needle In A Multimodal Haystack (arXiv:2406.07230)
• OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text (arXiv:2406.08418)
• SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages (arXiv:2406.10118)
• VideoGUI: A Benchmark for GUI Automation from Instructional Videos (arXiv:2406.10227)
• MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs (arXiv:2406.11833)
• Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning (arXiv:2406.12742)
• Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models (arXiv:2406.11230)
• MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding (arXiv:2406.14515)
• VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models (arXiv:2406.16338)
• CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs (arXiv:2406.18521)
• We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? (arXiv:2407.01284)
• MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation (arXiv:2407.00468)
• μ-Bench: A Vision-Language Benchmark for Microscopy Understanding (arXiv:2407.01791)
• HEMM: Holistic Evaluation of Multimodal Foundation Models (arXiv:2407.03418)