Pretraining Data
opencsg/Fineweb-Edu-Chinese-V2.1: 958M rows, 40k downloads, 63 likes
[dataset name missing]: 56.2M rows, 43.6k downloads, 28 likes
[dataset name missing]: 3.8B rows, 28.9k downloads, 103 likes
allenai/dolma3_dolmino_pool: 70.3k downloads, 7 likes
allenai/dolma3_longmino_pool: 27.5k downloads, 10 likes
[dataset name missing]: 476M rows, 37.6k downloads, 814 likes
[dataset name missing]: 4.48B rows, 108k downloads, 742 likes
[dataset name missing]: 61.6M rows, 9.82k downloads, 278 likes
[dataset name missing]: 819M rows, 8.22k downloads, 11 likes
tokyotech-llm/swallow-code-v2: 147M rows, 16.4k downloads, 28 likes
ByteDance-Seed/Code-Contests-Plus: 49.2k rows, 6.86k downloads, 57 likes
[dataset name missing]: 6.76k downloads, 143 likes
nvidia/Nemotron-Pretraining-Code-v2: 836M rows, 6.28k downloads, 100 likes
nvidia/Nemotron-Pretraining-Specialized-v1: 60.7M rows, 7.15k downloads, 69 likes
nvidia/Nemotron-CC-Math-v1: 190M rows, 6.32k downloads, 63 likes
nvidia/Nemotron-Pretraining-SFT-v1: 299M rows, 5.05k downloads, 57 likes
[dataset name missing]: 1.86M rows, 4.29k downloads, 225 likes
EssentialAI/essential-web-v1.0: 17.3k downloads, 217 likes
EssentialAI/eai-taxonomy-stem-w-dclm: 1.43k downloads, 6 likes
EssentialAI/eai-taxonomy-med-w-dclm: 81.2M rows, 591 downloads, 8 likes
EssentialAI/eai-taxonomy-code-w-dclm: 274M rows, 469 downloads, 8 likes
EssentialAI/eai-taxonomy-math-w-fm: 21.6M rows, 435 downloads, 5 likes
[dataset name missing]: 27.9B rows, 24 downloads, 3 likes
DataMuncher-Labs/UltiMath: 32.9B rows, 16.1k downloads, 7 likes
HuggingFaceFW/finetranslations: 3.33B rows, 68.6k downloads, 262 likes
[dataset name missing]: 470M rows, 36.5k downloads, 335 likes