MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism Paper • 2606.07512 • Published 6 days ago • 35
WorldOlympiad: Can Your World Model Survive a Triathlon? Paper • 2606.11129 • Published 2 days ago • 27
Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration? Paper • 2606.01247 • Published 11 days ago • 30
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction Paper • 2605.26115 • Published 17 days ago • 51
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization Paper • 2605.15980 • Published 27 days ago • 36
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 28 days ago • 86
Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video Paper • 2605.15182 • Published 28 days ago • 39
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 118
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering Paper • 2604.08209 • Published Apr 9 • 26
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published Apr 6 • 114
Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO Paper • 2602.06422 • Published Feb 6 • 47
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published Jan 5 • 30
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025 • 51
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation Paper • 2511.20714 • Published Nov 25, 2025 • 51
BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation Paper • 2509.25077 • Published Sep 29, 2025 • 15
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning Paper • 2507.13347 • Published Jul 17, 2025 • 67
DeepVerse: 4D Autoregressive Video Generation as a World Model Paper • 2506.01103 • Published Jun 1, 2025
WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool Paper • 2509.05296 • Published Sep 5, 2025 • 8
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15, 2025 • 107