---
license: apache-2.0
tags:
- merge
- mergekit
---

# MetaModel

This model is a merge of the following models made with [mergekit](https://github.com/cg123/mergekit):
* [jeonsworld/CarbonVillain-en-10.7B-v4](https://huggingface.co/jeonsworld/CarbonVillain-en-10.7B-v4)
* [kekmodel/StopCarbon-10.7B-v5](https://huggingface.co/kekmodel/StopCarbon-10.7B-v5)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: jeonsworld/CarbonVillain-en-10.7B-v4
        layer_range: [0, 48]
      - model: kekmodel/StopCarbon-10.7B-v5
        layer_range: [0, 48]
merge_method: slerp
base_model: jeonsworld/CarbonVillain-en-10.7B-v4
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
```

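In this configuration, `t` is the SLERP interpolation factor between the two models. The sketch below illustrates what the settings mean under stated assumptions: that a list of `t` values is stretched linearly across the 48 layers, that `t = 0` keeps the base model (CarbonVillain-en-10.7B-v4) and `t = 1` takes StopCarbon-10.7B-v5, and that `filter` selects tensors whose names contain `self_attn` or `mlp`, with the bare `value: 0.5` covering everything else. The functions `layer_t` and `slerp` are hypothetical illustrations written with NumPy, not mergekit's own code.

```python
import numpy as np


def layer_t(anchors, layer_idx, num_layers):
    """Spread a list of t anchor values linearly over the layer stack (assumed behaviour)."""
    # Relative position of this layer in [0, 1]; anchors sit at evenly spaced points.
    pos = layer_idx / (num_layers - 1)
    return float(np.interp(pos, np.linspace(0.0, 1.0, len(anchors)), anchors))


def slerp(t, w0, w1, dot_threshold=0.9995, eps=1e-8):
    """Spherical linear interpolation between two weight tensors of the same shape."""
    # Angle between the flattened, normalised tensors.
    v0 = w0.ravel() / max(np.linalg.norm(w0), eps)
    v1 = w1.ravel() / max(np.linalg.norm(w1), eps)
    dot = float(np.clip(np.dot(v0, v1), -1.0, 1.0))
    # Nearly parallel tensors: fall back to ordinary linear interpolation.
    if abs(dot) > dot_threshold:
        return (1.0 - t) * w0 + t * w1
    theta = np.arccos(dot)
    s0 = np.sin((1.0 - t) * theta) / np.sin(theta)
    s1 = np.sin(t * theta) / np.sin(theta)
    # Apply the spherical weights to the original (unnormalised) tensors.
    return s0 * w0 + s1 * w1


# Per-layer t for self-attention and MLP tensors, using the anchors from the config.
attn_anchors = [0, 0.5, 0.3, 0.7, 1]
mlp_anchors = [1, 0.5, 0.7, 0.3, 0]
for idx in (0, 12, 24, 36, 47):
    print(idx, round(layer_t(attn_anchors, idx, 48), 3), round(layer_t(mlp_anchors, idx, 48), 3))
```

Under these assumptions, self-attention weights move from CarbonVillain-en-10.7B-v4 at the first layer toward StopCarbon-10.7B-v5 at the last, the MLP weights follow the mirrored schedule, and every other tensor is an even 50/50 blend.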
# Dataset Card for Evaluation run of gagan3012/MetaModel

This dataset was automatically created during the evaluation run of the model [gagan3012/MetaModel](https://huggingface.co/gagan3012/MetaModel) on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).

The dataset is composed of 63 configurations, each one corresponding to one of the evaluated tasks.

The dataset has been created from 1 run. Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run. The "train" split always points to the latest results.

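Because each run is stored as a split named after its timestamp, selecting the most recent run programmatically amounts to sorting those names. A minimal sketch, assuming split names use a timestamp format like the results file name in this card (`2024-01-04T14-09-43.780941`), with `_` or `:` also accepted as separators since the exact split-name format is not stated here; `parse_run_timestamp` and `latest_run_split` are hypothetical helpers.

```python
import re
from datetime import datetime

# Timestamps like "2024-01-04T14-09-43.780941"; "_" or ":" separators also accepted (assumption).
_TS = re.compile(r"^(\d{4})[-_](\d{2})[-_](\d{2})T(\d{2})[-_:](\d{2})[-_:](\d{2})\.(\d{1,6})$")


def parse_run_timestamp(name):
    """Return the datetime encoded in a run split name, or None for names like 'train'."""
    m = _TS.match(name)
    if m is None:
        return None
    year, month, day, hour, minute, second, frac = m.groups()
    return datetime(int(year), int(month), int(day), int(hour), int(minute), int(second),
                    int(frac.ljust(6, "0")))


def latest_run_split(split_names):
    """Pick the most recent timestamped split; aliases such as 'train' are ignored."""
    runs = [(ts, n) for n in split_names if (ts := parse_run_timestamp(n)) is not None]
    if not runs:
        raise ValueError("no timestamped run splits found")
    return max(runs)[1]


print(latest_run_split(["train", "2024-01-04T14-09-43.780941"]))  # prints 2024-01-04T14-09-43.780941
```

The list of split names for a configuration can be obtained with `datasets.get_dataset_split_names(repo_id, config_name)`; with a single run, as here, the helper simply returns that run's split.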
An additional configuration, "results", stores all the aggregated results of the run (and is used to compute and display the aggregated metrics on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)).

To load the details from a run, you can for instance do the following:
```python
from datasets import load_dataset

data = load_dataset("open-llm-leaderboard/details_gagan3012__MetaModel",
                    "harness_winogrande_5",
                    split="train")
```

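The configuration name in the example (`harness_winogrande_5`) appears to correspond to the results key `harness|winogrande|5`. Assuming the other configurations follow the same pattern, with `|`, `:` and `-` each replaced by `_` (an inference from this single example, not a documented rule), a hypothetical helper `config_name_for` can derive the name for any task key in the results listed in this card:

```python
import re


def config_name_for(results_key):
    """Map a results key such as 'harness|arc:challenge|25' to a configuration name.

    Assumption: '|', ':' and '-' all become '_', matching the
    'harness|winogrande|5' -> 'harness_winogrande_5' example in this card.
    """
    return re.sub(r"[|:\-]", "_", results_key)


for key in ["harness|winogrande|5", "harness|arc:challenge|25",
            "harness|hendrycksTest-abstract_algebra|5"]:
    print(key, "->", config_name_for(key))
```

Before passing a derived name to `load_dataset`, it can be checked against the list returned by `datasets.get_dataset_config_names("open-llm-leaderboard/details_gagan3012__MetaModel")`.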
## Latest results

These are the [latest results from run 2024-01-04T14:09:43.780941](https://huggingface.co/datasets/open-llm-leaderboard/details_gagan3012__MetaModel/blob/main/results_2024-01-04T14-09-43.780941.json) (note that there might be results for other tasks in the repository if successive evals didn't cover the same tasks; you can find each in the "results" configuration and in the "latest" split of each eval):

```json
{
    "all": {
        "acc": 0.6664380298886512,
        "acc_stderr": 0.031642195230944255,
        "acc_norm": 0.6671639222858992,
        "acc_norm_stderr": 0.03228745343467652,
        "mc1": 0.5691554467564259,
        "mc1_stderr": 0.01733527247533237,
        "mc2": 0.7184177934834866,
        "mc2_stderr": 0.014995634120330182
    },
    "harness|arc:challenge|25": {
        "acc": 0.6843003412969283,
        "acc_stderr": 0.013582571095815291,
        "acc_norm": 0.7107508532423208,
        "acc_norm_stderr": 0.01325001257939344
    },
    "harness|hellaswag|10": {
        "acc": 0.7132045409281019,
        "acc_stderr": 0.004513409114983828,
        "acc_norm": 0.8844851623182632,
        "acc_norm_stderr": 0.0031898897894046684
    },
    "harness|hendrycksTest-abstract_algebra|5": {
        "acc": 0.43,
        "acc_stderr": 0.049756985195624284,
        "acc_norm": 0.43,
        "acc_norm_stderr": 0.049756985195624284
    },
    "harness|hendrycksTest-anatomy|5": {
        "acc": 0.6148148148148148,
        "acc_stderr": 0.04203921040156279,
        "acc_norm": 0.6148148148148148,
        "acc_norm_stderr": 0.04203921040156279
    },
    "harness|hendrycksTest-astronomy|5": {
        "acc": 0.743421052631579,
        "acc_stderr": 0.0355418036802569,
        "acc_norm": 0.743421052631579,
        "acc_norm_stderr": 0.0355418036802569
    },
    "harness|hendrycksTest-business_ethics|5": {
        "acc": 0.75,
        "acc_stderr": 0.04351941398892446,
        "acc_norm": 0.75,
        "acc_norm_stderr": 0.04351941398892446
    },
    "harness|hendrycksTest-clinical_knowledge|5": {
        "acc": 0.6830188679245283,
        "acc_stderr": 0.02863723563980089,
        "acc_norm": 0.6830188679245283,
        "acc_norm_stderr": 0.02863723563980089
    },
    "harness|hendrycksTest-college_biology|5": {
        "acc": 0.7638888888888888,
        "acc_stderr": 0.03551446610810826,
        "acc_norm": 0.7638888888888888,
        "acc_norm_stderr": 0.03551446610810826
    },
    "harness|hendrycksTest-college_chemistry|5": {
        "acc": 0.47,
        "acc_stderr": 0.050161355804659205,
        "acc_norm": 0.47,
        "acc_norm_stderr": 0.050161355804659205
    },
    "harness|hendrycksTest-college_computer_science|5": {
        "acc": 0.48,
        "acc_stderr": 0.05021167315686781,
        "acc_norm": 0.48,
        "acc_norm_stderr": 0.05021167315686781
    },
    "harness|hendrycksTest-college_mathematics|5": {
        "acc": 0.32,
        "acc_stderr": 0.046882617226215034,
        "acc_norm": 0.32,
        "acc_norm_stderr": 0.046882617226215034
    },
    "harness|hendrycksTest-college_medicine|5": {
        "acc": 0.6647398843930635,
        "acc_stderr": 0.03599586301247077,
        "acc_norm": 0.6647398843930635,
        "acc_norm_stderr": 0.03599586301247077
    },
    "harness|hendrycksTest-college_physics|5": {
        "acc": 0.38235294117647056,
        "acc_stderr": 0.04835503696107223,
        "acc_norm": 0.38235294117647056,
        "acc_norm_stderr": 0.04835503696107223
    },
    "harness|hendrycksTest-computer_security|5": {
        "acc": 0.75,
        "acc_stderr": 0.04351941398892446,
        "acc_norm": 0.75,
        "acc_norm_stderr": 0.04351941398892446
    },
    "harness|hendrycksTest-conceptual_physics|5": {
        "acc": 0.625531914893617,
        "acc_stderr": 0.03163910665367291,
        "acc_norm": 0.625531914893617,
        "acc_norm_stderr": 0.03163910665367291
    },
    "harness|hendrycksTest-econometrics|5": {
        "acc": 0.4824561403508772,
        "acc_stderr": 0.04700708033551038,
        "acc_norm": 0.4824561403508772,
        "acc_norm_stderr": 0.04700708033551038
    },
    "harness|hendrycksTest-electrical_engineering|5": {
        "acc": 0.6413793103448275,
        "acc_stderr": 0.039966295748767186,
        "acc_norm": 0.6413793103448275,
        "acc_norm_stderr": 0.039966295748767186
    },
    "harness|hendrycksTest-elementary_mathematics|5": {
        "acc": 0.5,
        "acc_stderr": 0.025751310131230234,
        "acc_norm": 0.5,
        "acc_norm_stderr": 0.025751310131230234
    },
    "harness|hendrycksTest-formal_logic|5": {
        "acc": 0.42857142857142855,
        "acc_stderr": 0.0442626668137991,
        "acc_norm": 0.42857142857142855,
        "acc_norm_stderr": 0.0442626668137991
    },
    "harness|hendrycksTest-global_facts|5": {
        "acc": 0.35,
        "acc_stderr": 0.047937248544110196,
        "acc_norm": 0.35,
        "acc_norm_stderr": 0.047937248544110196
    },
    "harness|hendrycksTest-high_school_biology|5": {
        "acc": 0.8129032258064516,
        "acc_stderr": 0.022185710092252252,
        "acc_norm": 0.8129032258064516,
        "acc_norm_stderr": 0.022185710092252252
    },
    "harness|hendrycksTest-high_school_chemistry|5": {
        "acc": 0.5073891625615764,
        "acc_stderr": 0.035176035403610105,
        "acc_norm": 0.5073891625615764,
        "acc_norm_stderr": 0.035176035403610105
    },
    "harness|hendrycksTest-high_school_computer_science|5": {
        "acc": 0.72,
        "acc_stderr": 0.04512608598542128,
        "acc_norm": 0.72,
        "acc_norm_stderr": 0.04512608598542128
    },
    "harness|hendrycksTest-high_school_european_history|5": {
        "acc": 0.8121212121212121,
        "acc_stderr": 0.03050193405942914,
        "acc_norm": 0.8121212121212121,
        "acc_norm_stderr": 0.03050193405942914
    },
    "harness|hendrycksTest-high_school_geography|5": {
        "acc": 0.8636363636363636,
        "acc_stderr": 0.024450155973189835,
        "acc_norm": 0.8636363636363636,
        "acc_norm_stderr": 0.024450155973189835
    },
    "harness|hendrycksTest-high_school_government_and_politics|5": {
        "acc": 0.8963730569948186,
        "acc_stderr": 0.021995311963644244,
        "acc_norm": 0.8963730569948186,
        "acc_norm_stderr": 0.021995311963644244
    },
    "harness|hendrycksTest-high_school_macroeconomics|5": {
        "acc": 0.6692307692307692,
        "acc_stderr": 0.02385479568097114,
        "acc_norm": 0.6692307692307692,
        "acc_norm_stderr": 0.02385479568097114
    },
    "harness|hendrycksTest-high_school_mathematics|5": {
        "acc": 0.37037037037037035,
        "acc_stderr": 0.02944316932303154,
        "acc_norm": 0.37037037037037035,
        "acc_norm_stderr": 0.02944316932303154
    },
    "harness|hendrycksTest-high_school_microeconomics|5": {
        "acc": 0.7142857142857143,
        "acc_stderr": 0.029344572500634332,
        "acc_norm": 0.7142857142857143,
        "acc_norm_stderr": 0.029344572500634332
    },
    "harness|hendrycksTest-high_school_physics|5": {
        "acc": 0.3708609271523179,
        "acc_stderr": 0.03943966699183629,
        "acc_norm": 0.3708609271523179,
        "acc_norm_stderr": 0.03943966699183629
    },
    "harness|hendrycksTest-high_school_psychology|5": {
        "acc": 0.8422018348623853,
        "acc_stderr": 0.01563002297009246,
        "acc_norm": 0.8422018348623853,
        "acc_norm_stderr": 0.01563002297009246
    },
    "harness|hendrycksTest-high_school_statistics|5": {
        "acc": 0.5740740740740741,
        "acc_stderr": 0.03372343271653062,
        "acc_norm": 0.5740740740740741,
        "acc_norm_stderr": 0.03372343271653062
    },
    "harness|hendrycksTest-high_school_us_history|5": {
        "acc": 0.8578431372549019,
        "acc_stderr": 0.02450980392156862,
        "acc_norm": 0.8578431372549019,
        "acc_norm_stderr": 0.02450980392156862
    },
    "harness|hendrycksTest-high_school_world_history|5": {
        "acc": 0.8565400843881856,
        "acc_stderr": 0.022818291821017012,
        "acc_norm": 0.8565400843881856,
        "acc_norm_stderr": 0.022818291821017012
    },
    "harness|hendrycksTest-human_aging|5": {
        "acc": 0.672645739910314,
        "acc_stderr": 0.03149384670994131,
        "acc_norm": 0.672645739910314,
        "acc_norm_stderr": 0.03149384670994131
    },
    "harness|hendrycksTest-human_sexuality|5": {
        "acc": 0.7557251908396947,
        "acc_stderr": 0.03768335959728743,
        "acc_norm": 0.7557251908396947,
        "acc_norm_stderr": 0.03768335959728743
    },
    "harness|hendrycksTest-international_law|5": {
        "acc": 0.7851239669421488,
        "acc_stderr": 0.037494924487096966,
        "acc_norm": 0.7851239669421488,
        "acc_norm_stderr": 0.037494924487096966
    },
    "harness|hendrycksTest-jurisprudence|5": {
        "acc": 0.8055555555555556,
        "acc_stderr": 0.038260763248848646,
        "acc_norm": 0.8055555555555556,
        "acc_norm_stderr": 0.038260763248848646
    },
    "harness|hendrycksTest-logical_fallacies|5": {
        "acc": 0.754601226993865,
        "acc_stderr": 0.03380939813943354,
        "acc_norm": 0.754601226993865,
        "acc_norm_stderr": 0.03380939813943354
    },
    "harness|hendrycksTest-machine_learning|5": {
        "acc": 0.4732142857142857,
        "acc_stderr": 0.047389751192741546,
        "acc_norm": 0.4732142857142857,
        "acc_norm_stderr": 0.047389751192741546
    },
    "harness|hendrycksTest-management|5": {
        "acc": 0.8446601941747572,
        "acc_stderr": 0.035865947385739734,
        "acc_norm": 0.8446601941747572,
        "acc_norm_stderr": 0.035865947385739734
    },
    "harness|hendrycksTest-marketing|5": {
        "acc": 0.8589743589743589,
        "acc_stderr": 0.02280138253459753,
        "acc_norm": 0.8589743589743589,
        "acc_norm_stderr": 0.02280138253459753
    },
    "harness|hendrycksTest-medical_genetics|5": {
        "acc": 0.7,
        "acc_stderr": 0.046056618647183814,
        "acc_norm": 0.7,
        "acc_norm_stderr": 0.046056618647183814
    },
    "harness|hendrycksTest-miscellaneous|5": {
        "acc": 0.8084291187739464,
        "acc_stderr": 0.014072859310451949,
        "acc_norm": 0.8084291187739464,
        "acc_norm_stderr": 0.014072859310451949
    },
    "harness|hendrycksTest-moral_disputes|5": {
        "acc": 0.7572254335260116,
        "acc_stderr": 0.023083658586984204,
        "acc_norm": 0.7572254335260116,
        "acc_norm_stderr": 0.023083658586984204
    },
    "harness|hendrycksTest-moral_scenarios|5": {
        "acc": 0.39664804469273746,
        "acc_stderr": 0.016361354769822468,
        "acc_norm": 0.39664804469273746,
        "acc_norm_stderr": 0.016361354769822468
    },
    "harness|hendrycksTest-nutrition|5": {
        "acc": 0.7581699346405228,
        "acc_stderr": 0.024518195641879334,
        "acc_norm": 0.7581699346405228,
        "acc_norm_stderr": 0.024518195641879334
    },
    "harness|hendrycksTest-philosophy|5": {
        "acc": 0.7202572347266881,
        "acc_stderr": 0.025494259350694905,
        "acc_norm": 0.7202572347266881,
        "acc_norm_stderr": 0.025494259350694905
    },
    "harness|hendrycksTest-prehistory|5": {
        "acc": 0.7777777777777778,
        "acc_stderr": 0.02313237623454333,
        "acc_norm": 0.7777777777777778,
        "acc_norm_stderr": 0.02313237623454333
    },
    "harness|hendrycksTest-professional_accounting|5": {
        "acc": 0.5035460992907801,
        "acc_stderr": 0.02982674915328092,
        "acc_norm": 0.5035460992907801,
        "acc_norm_stderr": 0.02982674915328092
    },
    "harness|hendrycksTest-professional_law|5": {
        "acc": 0.49478487614080835,
        "acc_stderr": 0.012769541449652547,
        "acc_norm": 0.49478487614080835,
        "acc_norm_stderr": 0.012769541449652547
    },
    "harness|hendrycksTest-professional_medicine|5": {
        "acc": 0.75,
        "acc_stderr": 0.026303648393696036,
        "acc_norm": 0.75,
        "acc_norm_stderr": 0.026303648393696036
    },
    "harness|hendrycksTest-professional_psychology|5": {
        "acc": 0.6813725490196079,
        "acc_stderr": 0.018850084696468712,
        "acc_norm": 0.6813725490196079,
        "acc_norm_stderr": 0.018850084696468712
    },
    "harness|hendrycksTest-public_relations|5": {
        "acc": 0.6818181818181818,
        "acc_stderr": 0.04461272175910509,
        "acc_norm": 0.6818181818181818,
        "acc_norm_stderr": 0.04461272175910509
    },
    "harness|hendrycksTest-security_studies|5": {
        "acc": 0.746938775510204,
        "acc_stderr": 0.027833023871399677,
        "acc_norm": 0.746938775510204,
        "acc_norm_stderr": 0.027833023871399677
    },
    "harness|hendrycksTest-sociology|5": {
        "acc": 0.8258706467661692,
        "acc_stderr": 0.026814951200421603,
        "acc_norm": 0.8258706467661692,
        "acc_norm_stderr": 0.026814951200421603
    },
    "harness|hendrycksTest-us_foreign_policy|5": {
        "acc": 0.91,
        "acc_stderr": 0.028762349126466125,
        "acc_norm": 0.91,
        "acc_norm_stderr": 0.028762349126466125
    },
    "harness|hendrycksTest-virology|5": {
        "acc": 0.5783132530120482,
        "acc_stderr": 0.038444531817709175,
        "acc_norm": 0.5783132530120482,
        "acc_norm_stderr": 0.038444531817709175
    },
    "harness|hendrycksTest-world_religions|5": {
        "acc": 0.7777777777777778,
        "acc_stderr": 0.03188578017686398,
        "acc_norm": 0.7777777777777778,
        "acc_norm_stderr": 0.03188578017686398
    },
    "harness|truthfulqa:mc|0": {
        "mc1": 0.5691554467564259,
        "mc1_stderr": 0.01733527247533237,
        "mc2": 0.7184177934834866,
        "mc2_stderr": 0.014995634120330182
    },
    "harness|winogrande|5": {
        "acc": 0.8342541436464088,
        "acc_stderr": 0.010450899545370632
    },
    "harness|gsm8k|5": {
        "acc": 0.6535253980288097,
        "acc_stderr": 0.013107179054313398
    }
}
```
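Each `acc_stderr` can be used to put an approximate interval around the corresponding score. The values above appear consistent with the standard error of a mean of 0/1 scores computed with the n - 1 sample variance, sqrt(p(1 - p)/(n - 1)): for example, abstract_algebra scores 0.43 on what is assumed to be its 100 test questions, which gives 0.04976, matching the reported value. A minimal sketch of that formula and of a normal-approximation 95% interval; `binary_acc_stderr` and `normal_ci95` are hypothetical helper names.

```python
import math


def binary_acc_stderr(acc, n):
    """Standard error of an accuracy over n items scored 0/1, using the n - 1 sample variance."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def normal_ci95(acc, stderr):
    """Approximate 95% interval (normal approximation), clipped to [0, 1]."""
    half = 1.96 * stderr
    return max(0.0, acc - half), min(1.0, acc + half)


# abstract_algebra: accuracy 0.43 on an assumed 100 questions.
print(binary_acc_stderr(0.43, 100))
# Interval for the same task using the reported stderr.
lo, hi = normal_ci95(0.43, 0.049756985195624284)
print(f"{lo:.3f} to {hi:.3f}")
```

With roughly ten points of uncertainty either way on a 100-question subtask, small differences between individual MMLU subjects are best read with caution; the larger benchmarks such as HellaSwag have much tighter intervals.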

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_gagan3012__MetaModel).

| Metric              | Value |
|---------------------|-------|
| Avg.                | 74.40 |
| ARC (25-shot)       | 71.08 |
| HellaSwag (10-shot) | 88.45 |
| MMLU (5-shot)       | 66.26 |
| TruthfulQA (0-shot) | 71.84 |
| Winogrande (5-shot) | 83.43 |
| GSM8K (5-shot)      | 65.35 |
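The summary table can be recomputed from the per-task results listed in this card. The mapping used below is inferred from the numbers themselves rather than taken from leaderboard documentation: ARC and HellaSwag use `acc_norm`, TruthfulQA uses `mc2`, Winogrande and GSM8K use `acc`, and MMLU is the unweighted mean of the 57 `hendrycksTest` accuracies (which works out to 66.26 here); the average is the plain mean of the six scores. `leaderboard_summary` is a hypothetical helper, and the demo dictionary keeps only two MMLU subtasks, so its MMLU and Avg. values are not the card's.

```python
from statistics import mean


def leaderboard_summary(results):
    """Recompute the six leaderboard scores (in percent) and their average from a results dict."""
    # Unweighted mean over all MMLU subtasks present (raises if there are none).
    mmlu = [v["acc"] for k, v in results.items() if k.startswith("harness|hendrycksTest-")]
    scores = {
        "ARC (25-shot)": results["harness|arc:challenge|25"]["acc_norm"],
        "HellaSwag (10-shot)": results["harness|hellaswag|10"]["acc_norm"],
        "MMLU (5-shot)": mean(mmlu),
        "TruthfulQA (0-shot)": results["harness|truthfulqa:mc|0"]["mc2"],
        "Winogrande (5-shot)": results["harness|winogrande|5"]["acc"],
        "GSM8K (5-shot)": results["harness|gsm8k|5"]["acc"],
    }
    scores = {name: 100 * value for name, value in scores.items()}
    # Average of the unrounded scores, computed before the key is added.
    scores["Avg."] = mean(list(scores.values()))
    return {name: round(value, 2) for name, value in scores.items()}


# Demo: non-MMLU entries copied from this card, plus only two MMLU subtasks.
demo = {
    "harness|arc:challenge|25": {"acc_norm": 0.7107508532423208},
    "harness|hellaswag|10": {"acc_norm": 0.8844851623182632},
    "harness|hendrycksTest-abstract_algebra|5": {"acc": 0.43},
    "harness|hendrycksTest-anatomy|5": {"acc": 0.6148148148148148},
    "harness|truthfulqa:mc|0": {"mc2": 0.7184177934834866},
    "harness|winogrande|5": {"acc": 0.8342541436464088},
    "harness|gsm8k|5": {"acc": 0.6535253980288097},
}
print(leaderboard_summary(demo))
```

Passing the full per-task dictionary shown under "Latest results" instead of the demo should reproduce every row of the table, including the 74.40 average.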