Video-to-Video
English

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

We propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation.

Project Page | Paper | Code

Checkpoints

Download the Wan2.1 backbone (VAE + tokenizer weights used by the pipeline):

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B \
    --local-dir-use-symlinks False \
    --local-dir wan_models/Wan2.1-T2V-1.3B

Download DecMem trained checkpoints from HuggingFace:

huggingface-cli download KlingTeam/DecMem --local-dir checkpoints

Checkpoint layout expected by training / inference scripts:

checkpoints/
└── decmem.pt             # released weights

Quick start

We provide the example video-pose pairs for quick inference. The inference is Block-by-block causal denoising manner with KV cache.

bash scripts/infer_example.sh

Citation

If you find our work helpful, please cite our paper:

@misc{yang2026decmemminutelongconsistentworld,
      title={DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory}, 
      author={Zhenhao Yang and Xiaoshi Wu and Zhengyao Lv and Xiaoyu Shi and Xintao Wang and Pengfei Wan and Kun Gai and Kwan-Yee K. Wong},
      year={2026},
      eprint={2605.31336},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.31336}, 
}
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for KlingTeam/DecMem

Finetuned
(53)
this model

Paper for KlingTeam/DecMem