---
license: mit
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
tags:
- Real-Time
- Long-Video
- Video-Diffusion-Model
- Autoregressive
---
<p align="center">
<h1 align="center">Rolling Forcing</h1>
<h3 align="center">Autoregressive Long Video Diffusion in Real Time</h3>
</p>
<p align="center">
<p align="center">
<a href="https://kunhao-liu.github.io/">Kunhao Liu</a><sup>1</sup>
·
<a href="https://wbhu.github.io/">Wenbo Hu</a><sup>2</sup>
·
<a href="https://bluestyle97.github.io/">Jiale Xu</a><sup>2</sup>
·
<a href="http://www.linkedin.com/in/YingShanProfile">Ying Shan</a><sup>2</sup>
·
<a href="https://personal.ntu.edu.sg/shijian.lu/">Shijian Lu</a><sup>1</sup><br>
<sup>1</sup>Nanyang Technological University <sup>2</sup>ARC Lab, Tencent PCG
</p>
<h3 align="center"><a href="https://arxiv.org/abs/2509.25161"><img src="https://img.shields.io/badge/ArXiv-Paper-brown"></a> <a href="https://kunhao-liu.github.io/Rolling_Forcing_Webpage/"><img src="https://img.shields.io/badge/Project-Webpage-brown"></a> <a href="https://github.com/TencentARC/RollingForcing"><img src="https://img.shields.io/badge/GitHub-Code-blue"></a> <a href="https://huggingface.co/TencentARC/RollingForcing"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow"></a></h3>
</p>

## 💡 TL;DR: REAL-TIME streaming generation of MULTI-MINUTE videos
<img src="https://github.com/user-attachments/assets/194bd647-508c-4dba-9ee9-979b54a0e230" />

- 🚀 Real-Time at 16 FPS: Stream high-quality video directly from text on a single GPU.
- 🎬 Minute-Long Videos: Generate coherent, multi-minute sequences with dramatically reduced drift.
- ⚙️ Rolling-Window Strategy: Denoise frames jointly within a rolling window so that they refine one another, breaking the chain of error accumulation.
- Long-Term Memory: An attention sink keeps the initial frames as a global anchor, preserving context over thousands of frames.
- State-of-the-Art Performance: Outperforms all comparable open-source models in quality and consistency.
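
The rolling-window and attention-sink ideas above can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions, not the Rolling Forcing implementation: it tracks single frames rather than chunks, models noise as integer levels (0 = clean), calls no model, and the names `rolling_window_schedule`, `attention_context`, `sink`, and `recent` are hypothetical. It only shows which frames are denoised together in each pass and which earlier frames a given frame might keep in context.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    index: int        # position of the frame in the output video
    noise_level: int  # hypothetical discretization: 0 = clean, `window` = pure noise


def rolling_window_schedule(num_frames: int, window: int) -> list[list[tuple[int, int]]]:
    """Toy schedule (hypothetical): which (frame, noise level) pairs share each pass.

    Assumes the window holds up to `window` frames at strictly increasing noise
    levels and that one joint denoising pass lowers every frame by one level.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    active: deque[Frame] = deque()
    passes: list[list[tuple[int, int]]] = []
    next_index = 0
    emitted = 0
    while emitted < num_frames:
        # Admit a new pure-noise frame at the tail while frames remain.
        if next_index < num_frames:
            active.append(Frame(next_index, window))
            next_index += 1
        # Record one joint denoising pass over every frame in the window.
        passes.append([(f.index, f.noise_level) for f in active])
        for f in active:
            f.noise_level -= 1
        # The head frame is the least noisy; emit it once it reaches level 0.
        if active[0].noise_level == 0:
            active.popleft()
            emitted += 1
    return passes


def attention_context(frame_index: int, sink: int, recent: int) -> list[int]:
    """Earlier frames kept in context for `frame_index` (hypothetical policy).

    The first `sink` frames are always kept as global anchors; beyond those,
    only the `recent` frames immediately preceding `frame_index` are kept.
    """
    anchors = list(range(min(sink, frame_index)))
    tail_start = max(sink, frame_index - recent)
    return anchors + list(range(tail_start, frame_index))


if __name__ == "__main__":
    for step, batch in enumerate(rolling_window_schedule(num_frames=4, window=3)):
        print(step, batch)
    print(attention_context(frame_index=10, sink=2, recent=3))  # [0, 1, 7, 8, 9]
```

In this sketch each frame shares every denoising pass with up to `window - 1` neighbors and leaves the window only once it is clean, a toy analogue of the joint refinement described above. The `sink` frames stay in context however long the video runs, while `recent` keeps the context size bounded.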

## 📚 Citation

If you find this codebase useful for your research, please cite our paper and consider giving the repo a ⭐️ on GitHub: https://github.com/TencentARC/RollingForcing

```bibtex
@article{liu2025rolling,
  title={Rolling Forcing: Autoregressive Long Video Diffusion in Real Time},
  author={Liu, Kunhao and Hu, Wenbo and Xu, Jiale and Shan, Ying and Lu, Shijian},
  journal={arXiv preprint arXiv:2509.25161},
  year={2025}
}
```