Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models Paper • 2605.09630 • Published 3 days ago
CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling Paper • 2210.07661 • Published Oct 14, 2022