Where does output diversity collapse in post-training? • Paper • 2604.16027 • Published 4 days ago
Deconstructing Attention: Investigating Design Principles for Effective Language Modeling • Paper • 2510.11602 • Published Oct 13, 2025