I appreciate your time. A couple more quick questions:
- Would you consider adding https://huggingface.co/datasets/HuggingFaceFW/finepdfs-edu to the pre-training mixture in a future pre-training run?
- For SmolLM2, you reported experiments with different FineWeb-Edu/DCLM ratios (e.g., 60%/40%, and later 40%/60%), which were determined from the ablations you ran.
But in the SmolLM3 config (https://github.com/huggingface/smollm/blob/main/text/pretraining/smollm3/stage1_8T.yaml), FineWeb-Edu is at 33% and DCLM at 37% across all stages. Were these weights also arrived at through ablations, or determined some other way?