Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 4 days ago • 50
VideoNSA: Native Sparse Attention Scales Video Understanding Paper • 2510.02295 • Published Oct 2, 2025 • 10 • 2
Devil in the Number: Towards Robust Multi-modality Data Filter Paper • 2309.13770 • Published Sep 24, 2023
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark Paper • 2410.03051 • Published Oct 4, 2024 • 6
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark Paper • 2504.14693 • Published Apr 20, 2025
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11, 2025 • 55
Video-MMLU Collection A Massive Multi-Discipline Lecture Understanding Benchmark • 3 items • Updated Apr 27, 2025