Running Agents MedRiskEval Seed Annotator 🩺 Validate and edit medical seed texts with versioned annotations
Running Agents 5 WC2026 LLM Prediction Leaderboard ⚽ 5 Explore LLM World Cup 2026 prediction leaderboard and results
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published Apr 7 • 122
Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran Paper • 2601.17880 • Published Jan 25 • 3
Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency Paper • 2510.18556 • Published Oct 21, 2025 • 2