Representation Alignment for Just Image Transformers is not Easier than You Think • Paper • 2603.14366 • Published 13 days ago • 6
Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting • Paper • 2603.25745 • Published 2 days ago • 8
AVControl: Efficient Framework for Training Audio-Visual Controls • Paper • 2603.24793 • Published 3 days ago • 15
PixelSmile: Toward Fine-Grained Facial Expression Editing • Paper • 2603.25728 • Published 2 days ago • 105
Repurposing Geometric Foundation Models for Multi-view Diffusion • Paper • 2603.22275 • Published 5 days ago • 43
FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow • Paper • 2603.19598 • Published 9 days ago • 32
InCoder-32B: Code Foundation Model for Industrial Scenarios • Paper • 2603.16790 • Published 11 days ago • 301
Grounding World Simulation Models in a Real-World Metropolis • Paper • 2603.15583 • Published 12 days ago • 148
OmniForcing: Unleashing Real-time Joint Audio-Visual Generation • Paper • 2603.11647 • Published 17 days ago • 31
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs • Paper • 2603.09095 • Published 19 days ago • 28
Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion • Paper • 2603.06577 • Published 22 days ago • 48
Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence • Paper • 2603.07660 • Published 20 days ago • 84
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs • Paper • 2603.05890 • Published 23 days ago • 91