Nemotron-TwoTower: Diffusion Language Modeling with Pretrained Autoregressive Context Paper • 2606.26493 • Published 8 days ago
Nemotron-Labs-TwoTower Collection • Diffusion Language Modeling with Pretrained Autoregressive Nemotron 3 Models • 1 item • Updated 2 days ago
Rethinking the Role of Efficient Attention in Hybrid Architectures Paper • 2606.15378 • Published 20 days ago