Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 3 days ago • 55
AutoScape: Geometry-Consistent Long-Horizon Scene Generation Paper • 2510.20726 • Published Oct 23, 2025
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published 3 days ago • 59
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published 3 days ago • 59
UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published Oct 9, 2025 • 81
EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing Paper • 2509.26346 • Published Sep 30, 2025 • 19
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion Paper • 2509.01215 • Published Sep 1, 2025 • 51
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering Paper • 2507.08776 • Published Jul 11, 2025 • 55
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering Paper • 2507.08776 • Published Jul 11, 2025 • 55