MegaFlow: Zero-Shot Large Displacement Optical Flow
Abstract
MegaFlow employs pre-trained Vision Transformer features to address large displacement optical flow estimation through global matching and iterative refinement, achieving superior zero-shot performance across multiple benchmarks.
Accurate estimation of large displacement optical flow remains a critical challenge. Existing methods typically rely on iterative local search and/or domain-specific fine-tuning, which severely limits their performance in large displacement and zero-shot generalization scenarios. To overcome this, we introduce MegaFlow, a simple yet powerful model for zero-shot large displacement optical flow. Rather than relying on highly complex, task-specific architectural designs, MegaFlow adapts powerful pre-trained vision priors to produce temporally consistent motion fields. In particular, we formulate flow estimation as a global matching problem by leveraging pre-trained global Vision Transformer features, which naturally capture large displacements. This is followed by a few lightweight iterative refinements that further improve sub-pixel accuracy. Extensive experiments demonstrate that MegaFlow achieves state-of-the-art zero-shot performance across multiple optical flow benchmarks. Our model also delivers highly competitive zero-shot performance on long-range point tracking benchmarks, demonstrating its robust transferability and suggesting a unified paradigm for generalizable motion estimation. Our project page is at: https://kristen-z.github.io/projects/megaflow.
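To make the idea of flow as a global matching problem concrete, here is a minimal NumPy sketch. It is not the MegaFlow implementation: the function name `global_matching_flow` is hypothetical, random arrays stand in for Vision Transformer features, and the softmax-weighted expected-coordinate readout is one common formulation of global matching (used, for example, in GMFlow) rather than something the abstract specifies. Every pixel in the first frame is compared against every pixel in the second, so the recoverable displacement is not bounded by a local search window.

```python
import numpy as np

def global_matching_flow(feat1, feat2):
    """Illustrative global matching (hypothetical helper, not the paper's code).

    feat1, feat2: (H, W, C) feature maps for frame 1 and frame 2.
    Returns a flow field of shape (H, W, 2) holding (dx, dy) per pixel.
    """
    H, W, C = feat1.shape
    f1 = feat1.reshape(H * W, C)
    f2 = feat2.reshape(H * W, C)

    # All-pairs similarity between frame-1 and frame-2 pixels.
    corr = f1 @ f2.T / np.sqrt(C)                      # (HW, HW)

    # Row-wise softmax, shifted by the row max for numerical stability.
    corr = corr - corr.max(axis=1, keepdims=True)
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)

    # Pixel coordinates as (x, y) pairs, in the same flattened order.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(H * W, 2).astype(np.float64)

    # Expected matching position in frame 2, minus the source position.
    matched = prob @ grid                              # (HW, 2)
    return (matched - grid).reshape(H, W, 2)

# Toy check: frame 2 is frame 1 shifted by 3 rows and 5 columns (with wrap-around).
rng = np.random.default_rng(0)
frame1 = 8.0 * rng.standard_normal((16, 16, 64))       # stand-in for ViT features
frame2 = np.roll(frame1, shift=(3, 5), axis=(0, 1))
flow = global_matching_flow(frame1, frame2)
# Pixels whose match does not wrap around recover approximately (dx, dy) = (5, 3).
```

The all-pairs correlation costs O((HW)^2) memory, which is why this kind of matching is normally applied to downsampled feature maps. The shift in the toy check is nearly a third of the 16-pixel frame width, a displacement that a small local search window would miss.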
Community
The zero-shot angle is compelling — most optical flow methods struggle when you move beyond their training domain. The large displacement problem is particularly interesting for video understanding in agentic systems where camera motion is unpredictable. Curious if MegaFlow handles occlusion boundaries differently than RAFT or GMFlow? The tradeoff between iteration count and displacement range is something we've wrestled with in real-time video pipelines. Any benchmarks on inference speed vs accuracy at 4K resolution?