CU-Benchmarks
visualwebbench/VisualWebBench
• 1.54k • 1.29k
• 18
rootsautomation/RICO-ScreenQA
• 86k • 177
• 11
rootsautomation/ScreenSpot
• 1.27k • 1.22k
• 44
osunlp/Multimodal-Mind2Web
• 14.2k • 3.26k
• 91
xlangai/ubuntu_osworld_file_cache
• 298k
• 3
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper
• 2409.08264
• 48
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Paper
• 2405.14573