DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 74
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 67
Learning an Image Editing Model without Image Editing Pairs Paper • 2510.14978 • Published Oct 16, 2025 • 9
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 69
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 127
2D Gaussian Splatting with Semantic Alignment for Image Inpainting Paper • 2509.01964 • Published Sep 2, 2025 • 7
Enhanced Generative Structure Prior for Chinese Text Image Super-resolution Paper • 2508.07537 • Published Aug 11, 2025
Reconstructing 4D Spatial Intelligence: A Survey Paper • 2507.21045 • Published Jul 28, 2025 • 38
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training Paper • 2506.05301 • Published Jun 5, 2025 • 59