SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation
Abstract
SparkVSR enables interactive video super-resolution by using sparse keyframes as control signals, combining latent-pixel two-stage training with motion-guided propagation for enhanced temporal consistency and restoration quality.
Video Super-Resolution (VSR) aims to restore high-quality video frames from low-resolution (LR) inputs, yet most existing VSR approaches behave like black boxes at inference time: users cannot reliably correct unexpected artifacts and can only accept whatever the model produces. In this paper, we propose SparkVSR, a novel interactive VSR framework that turns sparse keyframes into a simple yet expressive control signal. Specifically, users first super-resolve a small set of keyframes using any off-the-shelf image super-resolution (ISR) model; SparkVSR then propagates the keyframe priors to the entire video sequence while remaining grounded in the motion of the original LR video. Concretely, we introduce a keyframe-conditioned latent-pixel two-stage training pipeline that fuses LR video latents with sparsely encoded HR keyframe latents to learn robust cross-space propagation and refine perceptual details. At inference time, SparkVSR supports flexible keyframe selection (manual specification, codec I-frame extraction, or random sampling) and a reference-free guidance mechanism that continuously balances keyframe adherence against blind restoration, ensuring robust performance even when reference keyframes are absent or imperfect. Experiments on multiple VSR benchmarks demonstrate improved temporal consistency and strong restoration quality, surpassing baselines by up to 24.6%, 21.8%, and 5.6% on CLIP-IQA, DOVER, and MUSIQ, respectively, and enabling controllable, keyframe-driven video super-resolution. Moreover, SparkVSR serves as a generic interactive, keyframe-conditioned video processing framework: it applies out of the box to unseen tasks such as old-film restoration and video style transfer. Our project page is available at: https://sparkvsr.github.io/
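The reference-free guidance described above can be pictured as a simple interpolation between a keyframe-conditioned prediction and a blind-restoration prediction, falling back to the blind output when no keyframes are supplied. The sketch below is a minimal illustration of that idea; the function and parameter names (`guided_prediction`, `w`) are our own assumptions, not the paper's API, and the arrays stand in for per-frame latents.

```python
import numpy as np

def guided_prediction(pred_blind, pred_kf, w=0.7):
    """Blend a blind-restoration prediction with a keyframe-conditioned one.

    w in [0, 1] trades keyframe adherence (w -> 1) against blind
    restoration (w -> 0). If no keyframe-conditioned prediction is
    available, fall back to the blind output. Illustrative only.
    """
    if pred_kf is None:  # no (or rejected) reference keyframes
        return pred_blind
    return pred_blind + w * (pred_kf - pred_blind)

# Toy latents: blind vs. keyframe-conditioned estimates for one frame.
blind = np.zeros((2, 2))
kf = np.ones((2, 2))
print(guided_prediction(blind, kf, w=0.5))  # halfway blend of the two
```

A scheduler could vary `w` over the sequence, e.g. decaying it with temporal distance from the nearest keyframe, which is one natural way to read "continuously balances" in the abstract.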
Community
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- OSDEnhancer: Taming Real-World Space-Time Video Super-Resolution with One-Step Diffusion (2026)
- LUVE : Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts (2026)
- PISCO: Precise Video Instance Insertion with Sparse Control (2026)
- TextOVSR: Text-Guided Real-World Opera Video Super-Resolution (2026)
- FiDeSR: High-Fidelity and Detail-Preserving One-Step Diffusion Super-Resolution (2026)
- FC-VFI: Faithful and Consistent Video Frame Interpolation for High-FPS Slow Motion Video Generation (2026)
- Frames2Residual: Spatiotemporal Decoupling for Self-Supervised Video Denoising (2026)