Pretraining Data
opencsg/Fineweb-Edu-Chinese-V2.1: 958M rows, 62.3k downloads, 63 likes
[dataset name missing]: 56.2M rows, 144k downloads, 29 likes
[dataset name missing]: 3.8B rows, 15.1k downloads, 106 likes
allenai/dolma3_dolmino_pool: 93.3k downloads, 7 likes
allenai/dolma3_longmino_pool: 50.7k downloads, 10 likes
[dataset name missing]: 476M rows, 36.6k downloads, 817 likes
[dataset name missing]: 4.48B rows, 78.5k downloads, 753 likes
[dataset name missing]: 61.6M rows, 7.27k downloads, 284 likes
[dataset name missing]: 819M rows, 53.6k downloads, 11 likes
tokyotech-llm/swallow-code-v2: 147M rows, 176k downloads, 31 likes
ByteDance-Seed/Code-Contests-Plus: 49.2k rows, 26.8k downloads, 60 likes
[dataset name missing]: 7.09M rows, 6.16k downloads, 157 likes
nvidia/Nemotron-Pretraining-Code-v2: 836M rows, 4.04k downloads, 103 likes
nvidia/Nemotron-Pretraining-Specialized-v1: 60.7M rows, 4.09k downloads, 69 likes
nvidia/Nemotron-CC-Math-v1: 190M rows, 3.83k downloads, 66 likes
nvidia/Nemotron-Pretraining-SFT-v1: 299M rows, 3.07k downloads, 62 likes
[dataset name missing]: 1.86M rows, 17.6k downloads, 225 likes
EssentialAI/essential-web-v1.0: 113k downloads, 218 likes
EssentialAI/eai-taxonomy-stem-w-dclm: 522 downloads, 6 likes
EssentialAI/eai-taxonomy-med-w-dclm: 81.2M rows, 497 downloads, 8 likes
EssentialAI/eai-taxonomy-code-w-dclm: 274M rows, 85.3k downloads, 9 likes
EssentialAI/eai-taxonomy-math-w-fm: 21.6M rows, 301 downloads, 5 likes
[dataset name missing]: 27.9B rows, 29 downloads, 3 likes
DataMuncher-Labs/UltiMath: 32.9B rows, 18.4k downloads, 40 likes
HuggingFaceFW/finetranslations: 3.33B rows, 56.1k downloads, 270 likes
[dataset name missing]: 517M rows, 55.4k downloads, 350 likes