Large-Scale Data Selection for Instruction Tuning
Datasets and models associated with the paper "Large-Scale Data Selection for Instruction Tuning" (https://arxiv.org/abs/2503.01807)
Paper • 2503.01807 • Published • 14
hamishivi/tulu-2-multitask-rrmax-326k-sft
7B • Updated • 6 • 1
Note: Above is our Llama 2 7B model, trained on the multitask mixture of Tulu 2 data linked below.
hamishivi/rds-sels-multitask-rrmax-top326k
Viewer • Updated • 326k • 13 • 1
hamishivi/llama-3.1-tulu-3-multitask-rrmax-939k-sft
Updated • 7
Note: Above is our Llama 3 8B model, trained on the multitask mixture of Tulu 3 data linked below.
hamishivi/rds-sels-tulu-3-multitask-rrmax-939k
Viewer • Updated • 939k • 8
Note: Below are our unfiltered datasets: multi-million-example instruction-tuning datasets made up of all the data considered for Tulu 2 and Tulu 3, respectively.
hamishivi/tulu-2-unfiltered
Viewer • Updated • 3.54M • 40 • 1
hamishivi/200k-tulu-2-unbalanced
Viewer • Updated • 200k • 7
hamishivi/tulu-3-unfiltered
Viewer • Updated • 4.88M • 86 • 2
Note: Below are other multitask-trained models and the data they were trained on. Tulu 2 models are based on Llama 2 7B, and Tulu 3 models on Llama 3 8B.
hamishivi/llama-3.1-tulu-3-arena-hard-939k-sft
Updated • 2
hamishivi/rds-sels-tulu-3-arena-hard-939k
Viewer • Updated • 939k • 12
hamishivi/tulu-2-arena-hard-326k-sft
7B • Updated • 3
hamishivi/rds-sels-arena-hard-top326k
Viewer • Updated • 326k • 5
hamishivi/tulu-2-wildchat-326k-sft
7B • Updated • 1
hamishivi/rds-sels-wildchat-top326k
Viewer • Updated • 326k • 13
Note: Below is the data selected by RDS+ with Llama 2 7B from the Tulu 2 unfiltered dataset; each dataset was selected for the evaluation given in its name.
hamishivi/rds-sels-alpacafarm-top326k
Viewer • Updated • 326k • 5
hamishivi/rds-sels-gsm8k-shots-top326k
Viewer • Updated • 326k • 3
hamishivi/rds-sels-codex-top326k
Viewer • Updated • 326k • 6 • 1
hamishivi/rds-sels-bbh-shots-top326k
Viewer • Updated • 326k • 1
hamishivi/rds-sels-mmlu-shots-top326k
Viewer • Updated • 326k • 9
hamishivi/rds-sels-squad-top326k
Viewer • Updated • 326k • 2
hamishivi/rds-sels-tydiqa-shots-top326k
Viewer • Updated • 326k • 1 -
hamishivi/lsds_data
Preview • Updated • 6
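
Since each RDS+ selection listed above encodes its target evaluation and size in the repository name, a small helper can recover both when scripting over the collection. The sketch below is only an illustration: the pattern `<owner>/rds-sels-<target>-top<size>` is inferred from the names on this page, and `parse_selection_name` is a hypothetical helper, not part of any released code.

```python
import re

# Repository ids copied from the listing above.
REPO_IDS = [
    "hamishivi/rds-sels-alpacafarm-top326k",
    "hamishivi/rds-sels-gsm8k-shots-top326k",
    "hamishivi/rds-sels-codex-top326k",
    "hamishivi/rds-sels-bbh-shots-top326k",
    "hamishivi/rds-sels-mmlu-shots-top326k",
    "hamishivi/rds-sels-squad-top326k",
    "hamishivi/rds-sels-tydiqa-shots-top326k",
]

# Assumed naming pattern (inferred from this page, not documented anywhere):
#   <owner>/rds-sels-<target>-top<size>
NAME_PATTERN = re.compile(
    r"^[^/]+/rds-sels-(?P<target>.+)-top(?P<size>\d+[kKmM]?)$"
)


def parse_selection_name(repo_id):
    """Return (target, size) for a name matching the assumed pattern, else None."""
    match = NAME_PATTERN.match(repo_id)
    if match is None:
        return None
    return match.group("target"), match.group("size")


if __name__ == "__main__":
    for repo_id in REPO_IDS:
        target, size = parse_selection_name(repo_id)
        print(f"{repo_id}: target={target}, size={size}")
```

Names that do not follow the pattern, such as the Tulu 3 selections without a `-top` suffix, return `None` rather than a guess.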