HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization Paper • 2606.20097 • Published 15 days ago • 18
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 15 items • Updated 21 days ago • 175
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 539k • 1.61k
datajuicer/the-pile-pubmed-central-refined-by-data-juicer Viewer • Updated Oct 23, 2023 • 100 • 12 • 2