Benchmarks

A collection of papers on benchmarks and datasets for evaluating multimodal large language models (arXiv IDs in parentheses).
• BLINK: Multimodal Large Language Models Can See but Not Perceive (arXiv:2404.12390)
• SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension (arXiv:2404.16790)
• Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots (arXiv:2405.07990)
• MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding (arXiv:2406.09411)
• CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark (arXiv:2406.05967)
• MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos (arXiv:2406.08407)
• ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation (arXiv:2406.09961)
• Needle In A Multimodal Haystack (arXiv:2406.07230)
• OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text (arXiv:2406.08418)
• SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages (arXiv:2406.10118)
• VideoGUI: A Benchmark for GUI Automation from Instructional Videos (arXiv:2406.10227)
• MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs (arXiv:2406.11833)
• Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning (arXiv:2406.12742)
• Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models (arXiv:2406.11230)
• MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding (arXiv:2406.14515)
• VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models (arXiv:2406.16338)
• CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs (arXiv:2406.18521)
• We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? (arXiv:2407.01284)
• MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation (arXiv:2407.00468)
• μ-Bench: A Vision-Language Benchmark for Microscopy Understanding (arXiv:2407.01791)
• HEMM: Holistic Evaluation of Multimodal Foundation Models (arXiv:2407.03418)