MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published Jan 12
Towards Automated Kernel Generation in the Era of LLMs Paper • 2601.15727 • Published 23 days ago
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding Paper • 2601.14724 • Published 24 days ago
Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening Paper • 2601.21590 • Published 16 days ago