CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning Paper • 2606.09393 • Published 17 days ago
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games Paper • 2606.19338 • Published 8 days ago • 46
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation Paper • 2605.10912 • Published May 11 • 46
SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction Paper • 2605.20110 • Published May 19 • 4
DecQ: Detail-Condensing Queries for Enhanced Reconstruction and Generation in Representation Autoencoders Paper • 2605.22777 • Published May 21 • 5
LoMo: Local Modality Substitution for Deeper Vision-Language Fusion Paper • 2605.30265 • Published 28 days ago • 23
Not only where, But when: Temporal Scheduling for RLVR Paper • 2605.25381 • Published May 25 • 6
AdaCodec: A Predictive Visual Code for Video MLLMs Paper • 2606.02569 • Published 24 days ago • 5
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 15 days ago • 200
OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs Paper • 2606.03890 • Published 23 days ago • 31