MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory • Paper 2605.15128 • Published 20 days ago • 62
EvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification • Paper 2604.01687 • Published Apr 2 • 8
M3-Bench: Multi-Modal, Multi-Hop, Multi-Threaded Tool-Using MLLM Agent Benchmark • Paper 2511.17729 • Published Nov 21, 2025 • 17
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph • Paper 2511.00086 • Published Oct 29, 2025 • 42