OpenComputer: Verifiable Software Worlds for Computer-Use Agents Paper • 2605.19769 • Published 11 days ago • 81
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval Paper • 2503.04644 • Published Mar 6, 2025 • 21 • 2
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21, 2025 • 84
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval Paper • 2503.04644 • Published Mar 6, 2025 • 21
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21, 2025 • 84
Running on CPU Upgrade 14k Open LLM Leaderboard 🏆 14k Track, rank and evaluate open LLMs and chatbots