UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors Paper • 2605.00658 • Published 5 days ago • 77
From Skill Text to Skill Structure: The Scheduling-Structural-Logical Representation for Agent Skills Paper • 2604.24026 • Published 9 days ago • 16
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 7 days ago • 97
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 9 days ago • 116
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company Paper • 2604.22446 • Published 12 days ago • 118
google/siglip-so400m-patch14-384 Zero-Shot Image Classification • 0.9B • Updated Sep 26, 2024 • 2.15M • 673
VBench Leaderboard 📊 Space • Agents • Running • 351 • Submit video model evaluation results to a public benchmark
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities Paper • 2601.21937 • Published Jan 29 • 19
Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models Paper • 2601.08955 • Published Jan 13 • 13