Models of the Paper LogitRouter: a novel Attention variant for reducing Myopic Routing in Mixture of Experts
Felipe Rodríguez Bórquez
feliperodriguezborquez
AI & ML interests
Architectures, pre-training, post-training