MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier Paper • 2603.03756 • Published 4 days ago • 80
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 5 days ago • 79
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published 27 days ago • 50
OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models Paper • 2602.04804 • Published Feb 4 • 46
DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation Paper • 2601.09688 • Published Jan 14 • 127
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 9, 2025 • 133
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 121
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published Dec 19, 2025 • 67
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 66
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published Nov 19, 2025 • 231
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 239
AI Paper of the Day Collection A collection of papers that I think are interesting, one added each day • 613 items • Updated about 13 hours ago • 86
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models Paper • 2512.01949 • Published Dec 1, 2025 • 9
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 186