ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation Paper • 2601.21420 • Published 13 days ago • 42
KaVa: Latent Reasoning via Compressed KV-Cache Distillation Paper • 2510.02312 • Published Oct 2, 2025 • 2
view article Article Provence: efficient and robust context pruning for retrieval-augmented generation Jan 28, 2025 • 25
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published Feb 10, 2025 • 21
FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration Paper • 2502.01068 • Published Feb 3, 2025 • 18