---
license: mit
pipeline_tag: image-to-video
---

# Causal Forcing

[**Causal Forcing**](https://huggingface.co/papers/2602.02214) is a framework for high-quality real-time interactive video generation. It distills pretrained bidirectional video diffusion models into few-step autoregressive (AR) models by bridging the architectural gap between bidirectional and causal attention.

- **Project Page:** [https://thu-ml.github.io/CausalForcing.github.io/](https://thu-ml.github.io/CausalForcing.github.io/)
- **Code:** [https://github.com/thu-ml/Causal-Forcing](https://github.com/thu-ml/Causal-Forcing)
- **Paper:** [Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation](https://huggingface.co/papers/2602.02214)

## Overview

Causal Forcing uses an autoregressive teacher for ODE initialization to bridge the architectural gap, then applies an asymmetric DMD procedure. It significantly outperforms existing baselines in visual quality and motion dynamics while maintaining inference efficiency. The frame-wise models natively support both Text-to-Video (T2V) and Image-to-Video (I2V) generation.

## Inference

Please refer to the [official GitHub repository](https://github.com/thu-ml/Causal-Forcing) for installation instructions.

### Text-to-Video (T2V)

To generate video using the chunk-wise model:

```bash
python inference.py \
    --config_path configs/causal_forcing_dmd_chunkwise.yaml \
    --output_folder output/chunkwise \
    --checkpoint_path checkpoints/chunkwise/causal_forcing.pt \
    --data_path prompts/demos.txt
```

### Image-to-Video (I2V)

The frame-wise setting natively supports I2V.
Use your conditioning image as the initial latent frame:

```bash
python inference.py \
    --config_path configs/causal_forcing_dmd_framewise.yaml \
    --output_folder output/framewise \
    --checkpoint_path checkpoints/framewise/causal_forcing.pt \
    --data_path prompts/i2v \
    --i2v \
    --use_ema
```

## Citation

If you find this work useful, please cite:

```bibtex
@article{zhu2026causal,
  title={Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation},
  author={Zhu, Hongzhou and Zhao, Min and He, Guande and Su, Hang and Li, Chongxuan and Zhu, Jun},
  journal={arXiv preprint arXiv:2602.02214},
  year={2026}
}

@article{zhao2026causal,
  title={Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation},
  author={Zhao, Min and Zhu, Hongzhou and Zheng, Kaiwen and Zhou, Zihan and Yan, Bokai and Li, Xinyuan and Yang, Xiao and Li, Chongxuan and Zhu, Jun},
  journal={arXiv preprint arXiv:2605.15141},
  year={2026}
}
```