DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory
We propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation.
Project Page | Paper | Code
Download the Wan2.1 backbone (VAE + tokenizer weights used by the pipeline):
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B \
--local-dir-use-symlinks False \
--local-dir wan_models/Wan2.1-T2V-1.3B
Download the trained DecMem checkpoints from Hugging Face:
huggingface-cli download KlingTeam/DecMem --local-dir checkpoints
Checkpoint layout expected by the training and inference scripts:
checkpoints/
└── decmem.pt # released weights
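Before running anything, it can help to confirm that both downloads landed where the scripts look for them. The following is a minimal sketch, not part of the released code: the helper name `missing_paths` is hypothetical, and the two paths are copied from the download commands and the layout above.

```python
from __future__ import annotations

from pathlib import Path

# Paths copied from the download commands and checkpoint layout above.
EXPECTED = [
    Path("wan_models/Wan2.1-T2V-1.3B"),  # Wan2.1 backbone directory
    Path("checkpoints/decmem.pt"),       # released DecMem weights
]


def missing_paths(root: Path = Path(".")) -> list[Path]:
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        for p in missing:
            print(f"missing: {p}")
    else:
        print("all expected files found")
```

Run it from the repository root. It only checks that the paths exist and does not validate the contents of the files.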
We provide example video-pose pairs for quick inference. Inference runs block by block, using causal denoising with a KV cache.
bash scripts/infer_example.sh
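To illustrate the control flow of block-by-block causal denoising with a cache, here is a toy sketch in plain Python. It is not the DecMem implementation: `toy_denoise_step` and `generate` are hypothetical names, the "denoiser" is a stand-in that nudges values toward the mean of the cached context, and a flat list stands in for the attention keys and values a real model would cache. It also does not model Sparse Global Memory or Anchored Local Memory.

```python
import random


def toy_denoise_step(block, context):
    """Stand-in for a real denoiser: move each value halfway toward the context mean."""
    target = sum(context) / len(context) if context else 0.0
    return [x + 0.5 * (target - x) for x in block]


def generate(num_blocks, block_size, num_steps, seed=0):
    """Generate blocks one at a time, each conditioned only on earlier finished blocks."""
    rng = random.Random(seed)
    kv_cache = []  # stand-in for cached keys/values of finished blocks
    video = []
    for _ in range(num_blocks):
        # Each new block starts from pure noise.
        block = [rng.gauss(0.0, 1.0) for _ in range(block_size)]
        for _ in range(num_steps):
            # Causal: the block sees only earlier blocks, through the cache.
            block = toy_denoise_step(block, kv_cache)
        # Finished blocks are appended once and never recomputed.
        kv_cache.extend(block)
        video.append(block)
    return video, kv_cache


if __name__ == "__main__":
    video, cache = generate(num_blocks=3, block_size=4, num_steps=5)
    print(len(video), len(cache))
```

Because each block depends only on blocks that came before it, generating more blocks leaves the earlier ones unchanged, which is what makes appending to a cache sufficient.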
If you find our work helpful, please cite our paper:
@misc{yang2026decmemminutelongconsistentworld,
title={DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory},
author={Zhenhao Yang and Xiaoshi Wu and Zhengyao Lv and Xiaoyu Shi and Xintao Wang and Pengfei Wan and Kun Gai and Kwan-Yee K. Wong},
year={2026},
eprint={2605.31336},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.31336},
}
Base model: Wan-AI/Wan2.1-T2V-1.3B