DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

We propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation.

Project Page | Paper | Code

Checkpoints

Download the Wan2.1 backbone (VAE + tokenizer weights used by the pipeline):

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B \
    --local-dir-use-symlinks False \
    --local-dir wan_models/Wan2.1-T2V-1.3B

Download the trained DecMem checkpoints from Hugging Face:

huggingface-cli download KlingTeam/DecMem --local-dir checkpoints

Checkpoint layout expected by training / inference scripts:

checkpoints/
└── decmem.pt             # released weights
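
Before launching a script, it can help to confirm that both downloads landed where the commands above put them. The following is a minimal sketch, not part of the released code: the helper name `missing_paths` is hypothetical, and the two paths are simply copied from the download commands and the layout shown above.

```python
from pathlib import Path

# Paths copied from the download commands and the layout above.
EXPECTED = [
    Path("wan_models/Wan2.1-T2V-1.3B"),  # Wan2.1 backbone directory
    Path("checkpoints/decmem.pt"),       # released DecMem weights
]


def missing_paths(root=".", expected=EXPECTED):
    """Return the expected paths that do not exist under `root`.

    Hypothetical helper for illustration; the repository may ship
    its own checks.
    """
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        for p in missing:
            print(f"missing: {p}")
    else:
        print("all expected checkpoint paths found")
```

Run it from the repository root; an empty result means both the backbone directory and `decmem.pt` are in place.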

Quick start

We provide example video-pose pairs for quick inference. Inference runs block by block, using causal denoising with a KV cache.

bash scripts/infer_example.sh
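
To make the control flow of block-by-block causal denoising concrete, here is a toy sketch in plain Python. It is not DecMem's implementation: the denoiser is a stand-in arithmetic rule, the cache stores finished blocks as a placeholder for attention keys and values, and the Sparse Global Memory and Anchored Local Memory modules are not modeled. All names (`denoise_step`, `generate`, `block_size`, `num_steps`) are assumptions made for illustration.

```python
import random


def denoise_step(noisy_block, context, pose, t, num_steps):
    """Toy denoiser: move each value toward a target derived from the
    cached context and the current pose. Stand-in for a real network."""
    context_mean = sum(context) / len(context) if context else 0.0
    target = context_mean + pose
    alpha = 1.0 / (num_steps - t)  # step size grows to 1.0 at the last step
    return [x + alpha * (target - x) for x in noisy_block]


def generate(poses, block_size=3, num_steps=4, seed=0):
    """Generate one block per pose, strictly in order.

    Each block starts from noise and is denoised for `num_steps` steps
    while attending only to already finished blocks (causal). A finished
    block is appended to the cache once and reused by every later block
    instead of being recomputed.
    """
    rng = random.Random(seed)
    cache = []   # placeholder for per-block keys/values
    video = []
    for pose in poses:
        block = [rng.gauss(0.0, 1.0) for _ in range(block_size)]
        # Context is read from the cache; past blocks are never re-denoised.
        context = [v for past in cache for v in past]
        for t in range(num_steps):
            block = denoise_step(block, context, pose, t, num_steps)
        cache.append(block)
        video.extend(block)
    return video, cache


if __name__ == "__main__":
    video, cache = generate([1.0, 2.0, 3.0])
    print(f"generated {len(video)} values in {len(cache)} cached blocks")
```

Because each block reads only the cache of earlier blocks, extending the pose sequence leaves the already generated blocks unchanged. That property is what allows a causal model to keep rolling out without recomputing its history.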

Citation

If you find our work helpful, please cite our paper:

@misc{yang2026decmemminutelongconsistentworld,
      title={DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory}, 
      author={Zhenhao Yang and Xiaoshi Wu and Zhengyao Lv and Xiaoyu Shi and Xintao Wang and Pengfei Wan and Kun Gai and Kwan-Yee K. Wong},
      year={2026},
      eprint={2605.31336},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.31336}, 
}

Paper: arXiv 2605.31336, published May 29.