File size: 918 Bytes
a15c275 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | ---
datasets:
- bigcode/the-stack-v2
- yulan-team/YuLan-Mini-Datasets
- HuggingFaceFW/fineweb-edu
- bigcode/the-stack-v2
- mlfoundations/dclm-baseline-1.0
- math-ai/AutoMathText
- gair-prox/open-web-math-pro
- RUC-AIBOX/long_form_thought_data_5k
- internlm/Lean-Workbook
- internlm/Lean-Github
- deepseek-ai/DeepSeek-Prover-V1
- ScalableMath/Lean-STaR-base
- ScalableMath/Lean-STaR-plus
- ScalableMath/Lean-CoT-base
- ScalableMath/Lean-CoT-plus
- opencsg/chinese-fineweb-edu
- liwu/MNBVC
- vikp/textbook_quality_programming
- HuggingFaceTB/smollm-corpus
- OpenCoder-LLM/opc-annealing-corpus
- OpenCoder-LLM/opc-sft-stage1
- OpenCoder-LLM/opc-sft-stage2
- XinyaoHu/AMPS_mathematica
- deepmind/math_dataset
- mrfakename/basic-math-10m
- microsoft/orca-math-word-problems-200k
- AI-MO/NuminaMath-CoT
- HuggingFaceTB/cosmopedia
- MU-NLPC/Calc-ape210k
- manu/project_gutenberg
- storytracer/LoC-PD-Books
- allenai/dolma
--- |