FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion
Abstract
FadeMem introduces a distance-aware key-value (KV) memory consolidation mechanism that organizes historical KV blocks into a temporal hierarchy, improving long-video generation by preserving recent context and long-range anchors under a fixed cache budget.
Autoregressive video generators synthesize long videos by generating successive temporal segments, but their historical KV cache grows with video length. Existing bounded-cache methods reduce this cost with local windows, sink tokens, or compressed memory states, yet they usually assign fixed roles to different parts of the history. We propose FadeMem, a distance-aware KV memory consolidation mechanism that organizes historical KV blocks into a temporal hierarchy under a fixed cache budget. This design is motivated by frequency-dependent temporal decay: fine details decorrelate quickly, while coarse scene structure and identity remain useful over longer horizons. During generation, new history is inserted as fine-grained entries, while older adjacent entries are progressively merged under a power-law temporal allocation schedule, yielding a dense-near, sparse-far memory within one cache. Without architectural changes, FadeMem preserves recent context for short-term dynamics and compact long-range anchors for identity and scene coherence. Experiments show improved subject consistency, background stability, and temporal coherence over existing bounded-cache strategies.
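As a rough illustration of a power-law temporal allocation schedule, the sketch below splits a history of `history_len` blocks across a fixed number of cache slots so that the span covered by a slot grows with its distance from the present. The function name `power_law_spans`, the exponent `alpha`, and the rounding rule are assumptions made for this example; the abstract does not specify the exact schedule.

```python
def power_law_spans(num_slots: int, history_len: int, alpha: float = 1.0) -> list[int]:
    """Return how many history blocks each cache slot covers, nearest slot first.

    Hypothetical schedule: the span of slot i grows roughly like (i + 1) ** alpha.
    """
    if num_slots <= 0 or history_len < 0:
        raise ValueError("num_slots must be positive and history_len non-negative")
    # While the history still fits, every block keeps its own fine-grained slot.
    if history_len <= num_slots:
        return [1] * history_len

    weights = [(i + 1) ** alpha for i in range(num_slots)]
    total = sum(weights)

    # Every slot covers at least one block; the surplus is shared by weight.
    extra = history_len - num_slots
    spans = [1 + int(extra * w / total) for w in weights]

    # Blocks lost to rounding go to the farthest slots, one each.
    remainder = history_len - sum(spans)
    i = num_slots - 1
    while remainder > 0:
        spans[i] += 1
        remainder -= 1
        i = i - 1 if i > 0 else num_slots - 1
    return spans


spans = power_law_spans(num_slots=4, history_len=20, alpha=1.0)
print(spans)
```

With `alpha = 0` every slot covers about the same span (uniform compression), while a larger `alpha` concentrates resolution on recent history and leaves only coarse anchors far back.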
Community
We introduce FadeMem, a distance-aware KV memory consolidation method for long autoregressive video diffusion.
The core idea is simple: not all past frames should be treated equally. Recent frames are kept at higher resolution for short-term dynamics, while older history is progressively merged into compact long-range memory that preserves scene structure and identity. This gives a dense-near / sparse-far temporal memory under a fixed KV cache budget.
FadeMem does not require architectural changes. In our experiments, it improves long-video consistency, background stability, and temporal coherence over existing bounded-cache strategies.
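The insert-then-merge behaviour described above can be sketched as a small bounded cache. This is a toy model under stated assumptions, not the paper's implementation: the class `FadingCache`, the span-weighted average used to merge two entries, and the cost rule that picks which adjacent pair to merge are all hypothetical, and plain float lists stand in for real KV blocks.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value: list[float]  # stand-in for one KV block
    span: int = 1       # number of original history blocks summarized here


class FadingCache:
    """Toy fixed-budget memory: fine-grained near the present, merged far back."""

    def __init__(self, budget: int, alpha: float = 1.0):
        if budget < 2:
            raise ValueError("budget must be at least 2")
        self.budget = budget
        self.alpha = alpha
        self.entries: list[Entry] = []  # ordered oldest first

    def insert(self, value: list[float]) -> None:
        # New history always enters as a fine-grained entry of span 1.
        self.entries.append(Entry(list(value)))
        while len(self.entries) > self.budget:
            self._merge_one_pair()

    def _merge_one_pair(self) -> None:
        # newer[k] = number of history blocks strictly newer than entry k.
        newer = [0] * len(self.entries)
        running = 0
        for k in range(len(self.entries) - 1, -1, -1):
            newer[k] = running
            running += self.entries[k].span

        def cost(k: int) -> float:
            # Assumed rule: a merged span is cheap when it is small relative
            # to a power law of its distance from the present.
            merged = self.entries[k].span + self.entries[k + 1].span
            return merged / (1 + newer[k + 1]) ** self.alpha

        k = min(range(len(self.entries) - 1), key=cost)
        a, b = self.entries[k], self.entries[k + 1]
        total = a.span + b.span
        # Assumed merge operator: span-weighted average of the two entries.
        value = [(a.span * x + b.span * y) / total for x, y in zip(a.value, b.value)]
        self.entries[k:k + 2] = [Entry(value, total)]


cache = FadingCache(budget=4, alpha=1.0)
for t in range(1, 11):
    cache.insert([float(t)])
print([e.span for e in cache.entries])
```

The cache never exceeds its budget, the newest entry always keeps span 1, and older entries absorb progressively more history, which is the dense-near / sparse-far layout in miniature.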