Add evaluation results for GPQA-Diamond, MMLU-Pro

#6
by SaylorTwift HF Staff - opened

Evaluation Results

    This PR adds evaluation results extracted from the Model Card.

    **Benchmarks:**
    - MMLU-Pro: 81.02
    - GPQA-Diamond: 74.43

    **Files created:**
    - .eval_results/mmlu_pro.yaml
    - .eval_results/gpqa_diamond.yaml

    ---

    Extracted automatically using the [LLM-powered evaluation extractor](https://github.com/huggingface/community-evals).
    
Ready to merge
This branch is ready to get merged automatically.
