On-Policy Adversarial Flow Distillation for Autoregressive Video Generation
Abstract
Adversarial Flow Distillation enables efficient distillation of heterogeneous video generation models by using on-policy feedback and forward-process flow-matching updates without requiring teacher scores or detailed trajectory information.
Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling strong black-box teachers into causal students remains difficult. The student must learn under its own rollout distribution, whereas practical teachers may expose only prompt-conditioned completed videos and may differ in architecture, capacity, temporal design, and sampling schedule. This interface makes supervised fine-tuning off-policy, score-based distillation inapplicable, and direct adversarial imitation too sparse for denoising-time credit assignment. We propose Adversarial Flow Distillation (AFD), an on-policy framework for heterogeneous black-box video distillation. AFD queries the teacher and rolls out the current student on the same prompts, trains a prompt-paired Bradley-Terry discriminator to estimate clean-sample teacher-student discrepancy, and converts the resulting on-policy advantage into forward-process flow-matching updates on the student's own noised states. Thus, AFD provides dense velocity-field supervision while requiring no teacher scores, latents, denoising trajectories, step alignment, or reverse-chain reinforcement learning. Experiments across two causal AR student families show that AFD consistently improves motion- and physics-sensitive generation while preserving general video quality, and ablations validate the importance of adaptive on-policy feedback and forward-process credit assignment. The method requires only clean teacher videos and student rollouts, providing a practical route for distilling proprietary or heterogeneous video generators into efficient autoregressive students.
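The abstract describes three moving parts: a prompt-paired Bradley-Terry discriminator, an on-policy advantage derived from it, and forward-process flow-matching updates on the student's own noised rollouts. The sketch below shows one way these pieces could fit together in plain Python. It is not the authors' implementation, and several parts are assumptions. The discriminator is reduced to scalar logits (`d_teacher`, `d_student`, hypothetical names). The advantage is taken as the batch-standardized student-minus-teacher logit gap. Advantages become regression weights through AWR-style exponential weighting, which is a swapped-in choice the paper does not confirm. The forward process uses rectified-flow linear interpolation, `x_t = (1 - t) * x0 + t * eps`, with target velocity `eps - x0`. None of these logits are teacher *model* scores: they come from the discriminator applied to clean teacher videos.

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bt_discriminator_loss(d_teacher, d_student):
    """Prompt-paired Bradley-Terry loss: mean of -log sigmoid(D(c, x_T) - D(c, x_S)).

    d_teacher[i] and d_student[i] are (hypothetical) discriminator logits for the
    teacher video and the student rollout generated from the same prompt i.
    """
    total = 0.0
    for s_t, s_s in zip(d_teacher, d_student):
        # -log(sigmoid(m)) == softplus(-m)
        total += softplus(-(s_t - s_s))
    return total / len(d_teacher)


def advantages(d_teacher, d_student, eps=1e-8):
    """Assumed advantage: student-minus-teacher logit gap, standardized over the batch."""
    raw = [s_s - s_t for s_t, s_s in zip(d_teacher, d_student)]
    mean = sum(raw) / len(raw)
    std = math.sqrt(sum((r - mean) ** 2 for r in raw) / len(raw)) + eps
    return [(r - mean) / std for r in raw]


def advantage_weights(adv, beta=1.0):
    """AWR-style exponential weights (an assumption), rescaled to mean 1."""
    m = max(adv)  # subtract the max for numerical stability
    w = [math.exp(beta * (a - m)) for a in adv]
    z = sum(w)
    return [len(adv) * wi / z for wi in w]


def forward_noise(x0, eps, t):
    """Rectified-flow forward process: x_t = (1 - t) * x0 + t * eps."""
    return [(1.0 - t) * a + t * e for a, e in zip(x0, eps)]


def weighted_flow_matching_loss(velocity_fn, rollouts, weights, rng):
    """Advantage-weighted flow matching on the student's own clean rollouts.

    Each rollout x0 is noised by the forward process at a random t, and the
    student's velocity prediction is regressed onto the target eps - x0.
    """
    total = 0.0
    for x0, w in zip(rollouts, weights):
        t = rng.random()
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        xt = forward_noise(x0, eps, t)
        target = [e - a for a, e in zip(x0, eps)]
        pred = velocity_fn(xt, t)
        mse = sum((p - g) ** 2 for p, g in zip(pred, target)) / len(x0)
        total += w * mse
    return total / len(rollouts)


if __name__ == "__main__":
    rng = random.Random(0)
    d_teacher = [1.2, 0.4, 0.9]   # toy logits on teacher videos
    d_student = [0.3, 0.5, -0.2]  # toy logits on student rollouts, same prompts
    print("BT loss:", bt_discriminator_loss(d_teacher, d_student))
    w = advantage_weights(advantages(d_teacher, d_student))
    rollouts = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
    zero_velocity = lambda xt, t: [0.0] * len(xt)  # stand-in for the student network
    print("weighted FM loss:", weighted_flow_matching_loss(zero_velocity, rollouts, w, rng))
```

In a real training loop, the discriminator step (minimizing `bt_discriminator_loss`) and the student step (minimizing `weighted_flow_matching_loss` over the network's parameters) would alternate on fresh rollouts, which keeps the feedback on-policy. Exponential weighting keeps every weight positive, so poorly rated rollouts are down-weighted rather than pushed away. This is one conservative way to turn an advantage into a regression signal, and the paper may use a different one.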
Community
the on-policy flow distillation approach is clever, turning the teacher-student discrepancy into a dense forward-flow update along the student's own denoising path. my main question is how robust the adaptive Bradley-Terry discriminator is to prompt drift and distribution shift between completed teacher videos and the student rollouts, especially as you explore longer horizons or unseen prompts. btw, arxivlens had a solid breakdown that helped me parse the method details, it’s a nice companion to the paper here: https://arxivlens.com/PaperView/Details/on-policy-adversarial-flow-distillation-for-autoregressive-video-generation-6553-45a38d20. a small ablation on how discriminator quality tracks with final physics and motion fidelity across different prompt regimes would seal the argument for me.
Get this paper in your agent:
hf papers read 2605.26105
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash