Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games Paper • 2606.19338 • Published 1 day ago • 36
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 9 days ago • 184
EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models Paper • 2603.12252 • Published Mar 12 • 12
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing Paper • 2602.12205 • Published Feb 13 • 83
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published Dec 4, 2025 • 50
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation Paper • 2512.03036 • Published Dec 2, 2025 • 22
Think Visually, Reason Textually: Vision-Language Synergy in ARC Paper • 2511.15703 • Published Nov 19, 2025 • 9
Think Visually, Reason Textually: Vision-Language Synergy in ARC Paper • 2511.15703 • Published Nov 19, 2025 • 9 • 2
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Paper • 2510.27606 • Published Oct 31, 2025 • 31