YoCausal: How Far is Video Generation from World Model? A Causality Perspective • Paper • 2605.30346 • Published 3 days ago • 37
CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation • Paper • 2605.25378 • Published 6 days ago • 50
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models • Paper • 2605.30263 • Published 3 days ago • 46
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources • Paper • 2605.29250 • Published 3 days ago • 61
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments • Paper • 2605.30280 • Published 3 days ago • 91
AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security • Paper • 2605.29801 • Published 3 days ago • 106
SkillOpt: Executive Strategy for Self-Evolving Agent Skills • Paper • 2605.23904 • Published 9 days ago • 204
Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning • Paper • 2605.30231 • Published 3 days ago • 1
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent • Paper • 2508.05748 • Published Aug 7, 2025 • 143
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing • Paper • 2509.22186 • Published Sep 26, 2025 • 164
LongLive: Real-time Interactive Long Video Generation • Paper • 2509.22622 • Published Sep 26, 2025 • 189
Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing • Paper • 2507.05259 • Published Jul 7, 2025 • 6
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published Jan 28, 2025 • 125
Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs • Paper • 2504.15280 • Published Apr 21, 2025 • 25
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition • Paper • 2402.15504 • Published Feb 23, 2024 • 21
Magic-Me: Identity-Specific Video Customized Diffusion • Paper • 2402.09368 • Published Feb 14, 2024 • 31
Meta-Personalizing Vision-Language Models to Find Named Instances in Video • Paper • 2306.10169 • Published Jun 16, 2023 • 6