Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification Paper • 2603.26648 • Published Mar 27 • 42
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation? Paper • 2604.27419 • Published 6 days ago • 11