FineWeb: decanting the web for the finest text data at scale
Generate a curated web-text dataset for LLM training
Try out DeepSeek-OCR-2 on your PDFs or images
Explore and download a modern scientific paper template
A new open-source dataset for training VLMs
The secrets to building world-class LLMs
Spatial reasoning with vision-language models