Submitted by Stefan Schweter
Repetition over Diversity: High-Signal Data Filtering for Sample-Efficient German Language Modeling
Boldt