K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 11 days ago • 56
ReviewScore: Misinformed Peer Review Detection with Large Language Models Paper • 2509.21679 • Published Sep 25, 2025 • 64
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think Paper • 2505.10185 • Published May 15, 2025 • 26