TroyeML's Collections
🔥Hot Benchmarks

updated about 9 hours ago

  • SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

    Paper • 2602.12783 • Published 19 days ago • 147

  • MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

    Paper • 2602.22638 • Published 6 days ago • 103

  • CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

    Paper • 2601.22027 • Published Jan 29 • 83

  • ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

    Paper • 2601.11077 • Published Jan 16 • 65

  • Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

    Paper • 2512.13168 • Published Dec 15, 2025 • 52

  • AraLingBench: A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models

    Paper • 2511.14295 • Published Nov 18, 2025 • 73

  • UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation

    Paper • 2510.18701 • Published Oct 21, 2025 • 67

  • DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

    Paper • 2510.09116 • Published Oct 10, 2025 • 96

  • RubricBench: Aligning Model-Generated Rubrics with Human Standards

    Paper • 2603.01562 • Published 2 days ago • 47