Datasets
common-pile/arxiv_abstracts_filtered
• 2.5M rows • 178 downloads • 8 likes
common-pile/youtube_filtered
• 986k rows • 85 downloads • 5 likes
common-pile/wikiteam_filtered
• 10.2M rows • 93 downloads • 2 likes
common-pile/wikimedia_filtered
• 12.9M rows • 370 downloads • 6 likes
common-pile/uspto_filtered
• 14.4M rows • 1.36k downloads • 3 likes
common-pile/usgpo_filtered
• 2.34M rows • 311 downloads • 1 like
common-pile/uk_hansard_filtered
• 47.9k rows • 116 downloads • 1 like
common-pile/ubuntu_irc_filtered
• 216k rows • 149 downloads • 2 likes
common-pile/stackv2_html_filtered
• 1.67M rows • 27 downloads • 2 likes
common-pile/stackv2_edu_filtered
• 57M rows • 975 downloads • 6 likes
common-pile/stackexchange_filtered
• 27.5M rows • 927 downloads • 9 likes
common-pile/regulations_filtered
• 192k rows • 94 downloads
common-pile/python_enhancement_proposals_filtered
• 655 rows • 26 downloads • 1 like
common-pile/pubmed_filtered
• 4.77M rows • 206 downloads • 3 likes
common-pile/public_domain_review_filtered
• 1.41k rows • 90 downloads
common-pile/project_gutenberg_filtered
• 57.1k rows • 1.57k downloads • 2 likes
common-pile/pressbooks_filtered
• 54.5k rows • 49 downloads
common-pile/pre_1929_books_filtered
• 122k rows • 119 downloads
common-pile/peS2o_filtered
• 6.09M rows • 503 downloads • 1 like
common-pile/oercommons_filtered
• 5.25k rows • 36 downloads • 2 likes
common-pile/news_filtered
• 127k rows • 109 downloads • 2 likes
common-pile/libretexts_filtered
• 40k rows • 250 downloads • 3 likes
common-pile/library_of_congress_filtered
• 128k rows • 89 downloads • 2 likes
common-pile/github_archive_filtered
• 23.3M rows • 229 downloads • 2 likes
common-pile/foodista_filtered
• 28 downloads • 1 like
common-pile/doab_filtered
• 404k rows • 208 downloads • 2 likes
common-pile/data_provenance_initiative_filtered
• 3.51M rows • 43 downloads
common-pile/cccc_filtered
• 10.8M rows • 181 downloads • 2 likes
common-pile/caselaw_access_project_filtered
• 5.5M rows • 674 downloads • 12 likes
common-pile/biodiversity_heritage_library_filtered
• 16.5M rows • 237 downloads • 1 like
common-pile/arxiv_papers_filtered
• 309k rows • 575 downloads • 7 likes
togethercomputer/RedPajama-Data-V2
• 9.38k downloads • 404 likes
allenai/llama-3.1-tulu-3-405b-preference-mixture
• 361k rows • 32 downloads • 6 likes
HuggingFaceFW/fineweb-edu
• 3.5B rows • 386k downloads • 1.18k likes
nvidia/Llama-Nemotron-Post-Training-Dataset
• 3.91M rows • 4.94k downloads • 680 likes
open-thoughts/OpenThoughts3-1.2M
• 1.2M rows • 17.8k downloads • 247 likes