ClawBench: Browser Agent Benchmark Suite
Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces: everything you need to run, regrade, or compare on ClawBench.

NAIL-Group/ClawBench • Updated 2 days ago • 153 • 393 • 2
ClawBench Leaderboard 🦀 (Space, running): Live leaderboard for the ClawBench web-agent benchmark
NAIL-Group/ClawBenchV1Trace • Updated 2 days ago • 5.79k
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published Apr 9 • 263