MultiHaystack: Benchmarking Multimodal Retrieval and Reasoning over 40K Images, Videos, and Documents • Paper • 2603.05697 • Published Mar 5
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling • Paper • 2604.28185 • Published 7 days ago • 85
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration • Paper • 2512.02981 • Published Dec 2, 2025 • 1
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models • Paper • 2512.01949 • Published Dec 1, 2025 • 9
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation • Paper • 2503.19065 • Published Mar 24, 2025 • 11