AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models Paper • 2607.02269 • Published 3 days ago • 7
VIOLA: Towards Video In-Context Learning with Minimal Annotations Paper • 2601.15549 • Published Jan 22 • 5
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 19 days ago • 63
EarlyTom: Early Token Compression Completes Fast Video Understanding Paper • 2605.30010 • Published May 28 • 32
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published May 27 • 93