Open Datasets
Matthijs/cmu-arctic-xvectors • 7.93k • 25.1k • 63
parler-tts/libritts-r-filtered-speaker-descriptions • 359k • 2.07k • 8
alpindale/two-million-bluesky-posts • 2.11M • 1.09k • 202
arimalabs/2.3-million-bluesky-posts • 2.37M • 9 • 5
parler-tts/libritts_r_filtered • 359k • 3.45k • 21
opendiffusionai/cc12m-cleaned • 8.53M • 113 • 10
parler-tts/mls-eng-speaker-descriptions • 10.8M • 337 • 11
keremberke/license-plate-object-detection • 8.83k • 1.02k • 37
nebius/SWE-agent-trajectories • 80k • 2.39k • 81
cfahlgren1/react-code-instructions • 74.4k • 829 • 157
DAMO-NLP-SG/multimodal_textbook • 1.49k • 164
NovaSky-AI/Sky-T1_data_17k • 16.4k • 591 • 186
hoskinson-center/proof-pile • 363k • 2.14k • 67
HuggingFaceFW/fineweb-edu • 3.5B • 570k • 1.07k
EleutherAI/the_pile_deduplicated • 134M • 22.2k • 112
MohamedRashad/multilingual-tts • 25.5k • 292 • 48
facebook/multilingual_librispeech • 1.49M • 42.6k • 179
Fumika/Wikinews-multilingual • 15.2k • 74 • 7
ayymen/Weblate-Translations • 11.7M • 608 • 19
Helsinki-NLP/opus_wikipedia • 1.75M • 541 • 10
MLCommons/unsupervised_peoples_speech • 5.19k • 75
HKUSTAudio/Llasa_opensource_speech_data_160k_hours_tokenized • 388 • 30
allenai/RLVR-GSM-MATH-IF-Mixed-Constraints • 29.9k • 386 • 31
allenai/olmo-2-0325-32b-preference-mix • 181 • 15
allenai/tulu-3-sft-olmo-2-mixture-0225 • 866k • 916 • 22
Congliu/Chinese-DeepSeek-R1-Distill-data-110k • 110k • 1.17k • 749