IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse Paper • 2603.12201 • Published Mar 12 • 60
yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF Text Generation • 12B • Updated about 3 hours ago • 147k • 1.51k
Article • Arcee Becomes the First Major American AI Lab to Replace AWS S3 with Hugging Face Private Storage, in a Multi-Million Dollar Commercial Partnership • clem • 9 days ago • 32
Article • Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP • ariG23498, ror, sergiopaniego, pcuenq, sayakpaul • 7 days ago • 43