MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents Paper • 2605.18652 • Published 3 days ago • 4
Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty? Paper • 2605.12684 • Published 9 days ago • 11
ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding Paper • 2603.27064 • Published Mar 28 • 28
SPARC: Separating Perception And Reasoning Circuits for Test-time Scaling of VLMs Paper • 2602.06566 • Published Feb 6 • 3
MIRA: Multimodal Iterative Reasoning Agent for Image Editing Paper • 2511.21087 • Published Nov 26, 2025 • 10
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination Paper • 2511.17490 • Published Nov 21, 2025 • 22