Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models Paper • 2603.15557 • Published 4 days ago • 28
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer Paper • 2603.15478 • Published 4 days ago • 24
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published Jan 12 • 52
SpotEdit: Selective Region Editing in Diffusion Transformers Paper • 2512.22323 • Published Dec 26, 2025 • 39
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion Paper • 2512.19678 • Published Dec 22, 2025 • 30
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published Dec 23, 2025 • 56
DeContext as Defense: Safe Image Editing in Diffusion Transformers Paper • 2512.16625 • Published Dec 18, 2025 • 25
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published Dec 16, 2025 • 72
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24, 2025 • 32