PaperScope: A Multi-Modal Multi-Document Benchmark for Agentic Deep Research Across Massive Scientific Papers Paper • 2604.11307 • Published 21 days ago • 1
MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval Paper • 2509.26378 • Published Sep 30, 2025
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery Paper • 2604.25256 • Published 6 days ago • 28
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories Paper • 2602.10809 • Published Feb 11 • 59
MemoBrain: Executive Memory as an Agentic Brain for Reasoning Paper • 2601.08079 • Published Jan 12 • 39
ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration Paper • 2601.06860 • Published Jan 11 • 16
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis Paper • 2601.05808 • Published Jan 9 • 37