Efficient Autoregressive Video Diffusion with Dummy Head Paper • 2601.20499 • Published 9 days ago • 5
Quantifying the Gap between Understanding and Generation within Unified Multimodal Models Paper • 2602.02140 • Published 4 days ago • 9
SpatiaLab: Can Vision-Language Models Perform Spatial Reasoning in the Wild? Paper • 2602.03916 • Published 3 days ago • 10
Horizon-LM: A RAM-Centric Architecture for LLM Training Paper • 2602.04816 • Published 1 day ago • 16
Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives Paper • 2601.20833 • Published 9 days ago • 171
Executable Code Actions Elicit Better LLM Agents Paper • 2402.01030 • Published Feb 1, 2024 • 187
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models Paper • 2512.19526 • Published Dec 22, 2025 • 12
The Ultra-Scale Playbook 🌌 Running • 3.67k • The ultimate guide to training LLM on large GPU Clusters
Cosmos-Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated 1 day ago • 43
VGGHeads: A Large-Scale Synthetic Dataset for 3D Human Heads Paper • 2407.18245 • Published Jul 25, 2024 • 12
dima806/facial_emotions_image_detection Image Classification • 85.8M • Updated Oct 19, 2024 • 54.1k • 117
trpakov/vit-face-expression Image Classification • 85.8M • Updated Feb 20, 2025 • 238k • 86
SEED-Story: Multimodal Long Story Generation with Large Language Model Paper • 2407.08683 • Published Jul 11, 2024 • 24