Official Benchmarks Leaderboard 2026 🏆 Explore and compare AI model scores across official benchmarks • Running • Agents • 7
SWE-rebench-V2 Collection SWE-rebench-V2 is a curated dataset of software-engineering tasks derived from real GitHub issues and pull requests. • 3 items • Updated Mar 3 • 11
mistralai/Voxtral-Mini-4B-Realtime-2602 Automatic Speech Recognition • 4B • Updated Mar 11 • 1.16M • 840
Ministral 3 Collection A collection of edge models, with Base, Instruct and Reasoning variants, in 3 different sizes: 3B, 8B and 14B. All with vision capabilities. • 9 items • Updated Dec 2, 2025 • 166
Image Arena Leaderboard 📊 Image Generation and Image Editing Arena & Leaderboard • Running • Featured • 595
LLM Performance Leaderboard 🐨 View the latest LLM performance leaderboard online • Running • Featured • 456
Open VLM Video Leaderboard 🌎 VLMEvalKit Eval Results in video understanding benchmark • Running • Agents • Featured • 134