david-thrower/HelixLM-tiny-10k-samples-s1-8942pt-s2-700it-20260428 Text Generation • 19.8M • Updated about 16 hours ago
david-thrower/HelixLM-tiny-10k-samples-s1-8942pt-s2-700it-20260427 Text Generation • 19.8M • Updated 1 day ago • 12
SmolLM3 pretraining datasets Collection datasets used in SmolLM3 pretraining • 15 items • Updated Aug 12, 2025 • 47
david-thrower/codelion-finemix-pdf-dclm-edu-1024-seq-len-15897-samples Viewer • Updated Jan 19 • 15.9k • 32